npm - @rafter-security/cli - Versions diffs - 0.7.0 → 0.7.2 - Mend

@rafter-security/cli 0.7.0 → 0.7.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (56) hide show

package/README.md +20 -1
package/dist/commands/agent/audit-skill.js +2 -1
package/dist/commands/agent/audit.js +27 -0
package/dist/commands/agent/components.js +800 -0
package/dist/commands/agent/disable.js +47 -0
package/dist/commands/agent/enable.js +50 -0
package/dist/commands/agent/index.js +6 -0
package/dist/commands/agent/init.js +162 -164
package/dist/commands/agent/list.js +72 -0
package/dist/commands/brief.js +20 -0
package/dist/commands/docs/index.js +18 -0
package/dist/commands/docs/list.js +37 -0
package/dist/commands/docs/show.js +64 -0
package/dist/commands/mcp/server.js +84 -0
package/dist/commands/skill/index.js +14 -0
package/dist/commands/skill/install.js +89 -0
package/dist/commands/skill/list.js +79 -0
package/dist/commands/skill/registry.js +273 -0
package/dist/commands/skill/remote.js +333 -0
package/dist/commands/skill/review.js +975 -0
package/dist/commands/skill/uninstall.js +65 -0
package/dist/core/audit-logger.js +262 -21
package/dist/core/config-manager.js +3 -0
package/dist/core/docs-loader.js +148 -0
package/dist/core/policy-loader.js +72 -1
package/dist/index.js +6 -0
package/package.json +1 -1
package/resources/skills/rafter/SKILL.md +76 -96
package/resources/skills/rafter/docs/backend.md +106 -0
package/resources/skills/rafter/docs/cli-reference.md +199 -0
package/resources/skills/rafter/docs/finding-triage.md +79 -0
package/resources/skills/rafter/docs/guardrails.md +91 -0
package/resources/skills/rafter/docs/shift-left.md +64 -0
package/resources/skills/rafter-code-review/SKILL.md +91 -0
package/resources/skills/rafter-code-review/docs/api.md +90 -0
package/resources/skills/rafter-code-review/docs/asvs.md +120 -0
package/resources/skills/rafter-code-review/docs/cwe-top25.md +78 -0
package/resources/skills/rafter-code-review/docs/investigation-playbook.md +101 -0
package/resources/skills/rafter-code-review/docs/llm.md +87 -0
package/resources/skills/rafter-code-review/docs/web-app.md +84 -0
package/resources/skills/rafter-secure-design/SKILL.md +103 -0
package/resources/skills/rafter-secure-design/docs/api-design.md +97 -0
package/resources/skills/rafter-secure-design/docs/auth.md +67 -0
package/resources/skills/rafter-secure-design/docs/data-storage.md +90 -0
package/resources/skills/rafter-secure-design/docs/dependencies.md +101 -0
package/resources/skills/rafter-secure-design/docs/deployment.md +104 -0
package/resources/skills/rafter-secure-design/docs/ingestion.md +98 -0
package/resources/skills/rafter-secure-design/docs/standards-pointers.md +102 -0
package/resources/skills/rafter-secure-design/docs/threat-modeling.md +128 -0
package/resources/skills/rafter-skill-review/SKILL.md +106 -0
package/resources/skills/rafter-skill-review/docs/authorship-provenance.md +82 -0
package/resources/skills/rafter-skill-review/docs/changelog-review.md +99 -0
package/resources/skills/rafter-skill-review/docs/data-practices.md +88 -0
package/resources/skills/rafter-skill-review/docs/malware-indicators.md +79 -0
package/resources/skills/rafter-skill-review/docs/prompt-injection.md +85 -0
package/resources/skills/rafter-skill-review/docs/telemetry.md +78 -0

package/resources/skills/rafter-code-review/docs/llm.md ADDED Viewed

@@ -0,0 +1,87 @@
+# LLM-Integrated Code Review — OWASP LLM Top 10 (2025)
+For any code that sends prompts to a model, exposes tool calls, retrieves context (RAG), or ships model output to a downstream system. Walk as questions. Cite file:line.
+## LLM01 — Prompt Injection
+Assume every string that reaches the prompt — user input, retrieved documents, tool output, file contents, web pages — is adversarial.
+- Trace the prompt build. Concatenation of user input into the system prompt? String interpolation of retrieved chunks? Find every `system + user` join site.
+- Are there *structural* defenses? (Delimiters the model is trained to respect, role separation, XML tags, instruction hierarchies.) Note: none are airtight — defense is layered, not singular.
+- Indirect injection: is retrieved content (web page, email, PDF, repo file) ever fed to the model? Treat it as untrusted input, same as the user's message.
+- Output gating: is the model's output used to decide authz, invoke tools, or send messages? If yes — LLM01 merges with LLM06 (Excessive Agency).
+## LLM02 — Sensitive Information Disclosure
+- What goes into the prompt? Grep for prompts that include: PII, internal URLs, database rows, credentials, full request objects. "Just pass context" is the failure mode.
+- Is there a redaction step between "application data" and "prompt"? Can it be turned off by flag?
+- Does the model provider retain logs? Which tenant's data is crossing into the provider? Is that contractually allowed?
+- Model output: before returning to the user, is it scanned for data the caller shouldn't see (e.g. other tenants' data leaked from the context)?
+## LLM03 — Supply Chain
+- Model source: where does the model come from? Provider API (which account?) or self-hosted? If self-hosted, from which registry? Is the weights file checksummed?
+- Embedding model: same questions. Many RAG pipelines have *two* models; both are supply chain.
+- Prompt templates: if loaded from a shared registry (LangChain Hub, custom store), pinned and verified? Or pulled by name?
+- Plugins / tools / MCP servers registered with the agent — are they audited (see `rafter agent audit`) before install?
+## LLM04 — Data & Model Poisoning
+- Training / fine-tuning data: where from, how reviewed, who can write to the source? Can a user of the system influence future training (feedback loops)?
+- RAG corpus: same question. Can a user add documents to the retrieval index? If yes — those documents can issue instructions via LLM01.
+- Vector store: who can write? Who can update metadata (which drives filtering)? Metadata poisoning can bypass the retrieval filter.
+## LLM05 — Improper Output Handling
+Treat model output as untrusted input to whatever consumes it.
+- Markdown → HTML rendering: is the markdown sanitized? `![img](javascript:...)`, `<script>` in allowed tags, `<img onerror=>`?
+- Model output as code: passed to `eval`, `exec`, `Function()`, compiled and run, written as a shell script? That's RCE by way of prompt.
+- Model output as URL: used to fetch, redirect, or render? Same SSRF/XSS questions as elsewhere — plus: the model happily generates `javascript:` URLs.
+- Model output as SQL / shell / XPath: if the model writes queries, is the result parameterized / sandboxed / approved before execution?
+- Tool-call arguments from the model: validate shape, types, and values against a schema. Do not trust the model to stay in bounds.
+## LLM06 — Excessive Agency
+Tools + untrusted prompts = agent exfiltration / damage.
+- For each tool the agent can call, ask: (a) does it need to exist, (b) what's its blast radius, (c) is there a human-in-the-loop gate for irreversible actions?
+- Permissions scope: does the agent run with the calling user's permissions, or with service-account permissions that exceed any one user?
+- Destructive actions (send email, charge card, delete, write file, run shell): any of these reachable from a prompt? Use Rafter's command guardrails (`rafter agent exec`) as a pattern.
+- Chained calls: can tool A's output become tool B's input with no validation? Multi-step attacks live here.
+## LLM07 — System Prompt Leakage
+- Don't put secrets in system prompts. Grep the system prompt for API keys, customer-specific config, internal URLs.
+- Assume the system prompt is recoverable. The prompt is for *behavior*, not for *authz*. If the code relies on the user not knowing the prompt to enforce a policy — the policy is broken.
+- Different tenants / roles: different prompts, loaded server-side keyed by the *authenticated* principal, never from the request.
+## LLM08 — Vector & Embedding Weaknesses
+- Embedding-time injection: user content embedded without sanitization can be weaponized when retrieved.
+- Access control on retrieval: is the query filtered by tenant / user before the vector search, or filtered *after*? "After" often leaks via re-ranker.
+- Embedding collisions / adversarial embeddings: high-stakes retrieval (medical, legal) — is there a confidence floor on the similarity score before acting?
+## LLM09 — Misinformation & Overreliance
+A design question, but reviewable:
+- Does the UI make it clear the output is model-generated? Is there a confidence indicator where warranted?
+- For advice domains (medical, legal, financial), is there a disclaimer *and* a hard gate on actions?
+- Does the code treat model output as ground truth anywhere? Summaries, extractions, classifications used downstream should have a human review step or a fallback.
+## LLM10 — Unbounded Consumption
+- Token budgets per request, per user, per tenant, per day?
+- Max tokens on *output* (not just input) — unbounded generation is the classic DoS/cost footgun.
+- Streaming responses: timeout per chunk? Total timeout?
+- Parallel requests: queue depth, concurrency caps? Fan-out from a single user request to N model calls (RAG, ReAct loops) — bounded?
+---
+## Exit criteria
+- For every tool the agent can call: documented purpose, scope, and human-gate story.
+- For every retrieval/RAG path: write-access audit, injection defenses, tenant isolation.
+- For every model output sink: treated as untrusted, specific sanitization / validation cited.
+- Run `rafter agent audit` on any bundled skills/plugins. Pair with `rafter run` for SAST.

package/resources/skills/rafter-code-review/docs/web-app.md ADDED Viewed

@@ -0,0 +1,84 @@
+# Web Application Review — OWASP Top 10 (2021)
+Walk each category as questions. Cite file:line evidence before moving on. If you can't answer a question, that *is* the finding.
+## A01 — Broken Access Control
+The #1 risk. Every authenticated route must answer: "who is allowed?"
+- Grep for route handlers (`app.get`, `@app.route`, `router.handle`, controller annotations). For each: is there an explicit authz check? If you can't see one, trace the middleware chain — is it registered *before* this route?
+- For every `where user_id = ?` pattern, is the id from the session, or from the request? `?id=123` in the URL that controls the DB lookup is IDOR-shaped.
+- Are admin routes distinguished by URL prefix alone? If `/admin/*` is only protected by "don't tell users", that's not protection.
+- Does the app rely on HTTP verb restrictions (GET safe, POST protected)? Can you POST to a GET-only endpoint? Does it accept `X-HTTP-Method-Override`?
+- Is CORS configured with `Access-Control-Allow-Origin: *` alongside `Allow-Credentials: true`? That combination is almost always wrong.
+## A02 — Cryptographic Failures
+- What algorithms appear? Grep for `md5`, `sha1`, `des`, `rc4`, `ecb`. Any hit on user data, session tokens, or passwords is a finding.
+- How are passwords hashed? Look for `bcrypt`, `scrypt`, `argon2`, `pbkdf2`. Absence is the finding. `sha256(password + salt)` is not password hashing.
+- Are secrets in source? Run `rafter scan local .` first — but also grep for `private_key`, `api_key`, `BEGIN RSA`, `.pem`, `.p12`.
+- Is TLS enforced? Look for redirect middleware, HSTS headers, cookie `Secure` flag. Cookies without `Secure` + `HttpOnly` + `SameSite` — ask why.
+- Is randomness from `Math.random()` / `rand()` used for tokens, session ids, password resets? Must be `crypto.randomBytes` / `secrets.token_*` / `crypto/rand`.
+## A03 — Injection
+- SQL: every query that interpolates a variable (`f"SELECT ... {x}"`, backticks with `${x}`, `+` string concat into SQL). Must be parameterized. ORMs help but `.raw()` / `.query()` escape hatches don't.
+- Command injection: `exec`, `spawn`, `system`, `subprocess.run(shell=True)`, `child_process.exec`. Any user input reaching these? Prefer array form, never `shell=True` with input.
+- LDAP / NoSQL / XPath / template injection: same question — does user input reach a query language, and is it escaped by the library or by string concat?
+- XSS: where does user-controlled data reach HTML? React/Vue auto-escape; `dangerouslySetInnerHTML`, `v-html`, `innerHTML`, template literals rendered as HTML are the escape hatches. Server-side: is the template engine autoescaping? Jinja2 defaults off for `.txt`, on for `.html`.
+- Deserialization: `pickle.loads`, `yaml.load` (without SafeLoader), `Marshal.load`, Java's `ObjectInputStream`. Any of these on untrusted bytes is RCE-shaped.
+## A04 — Insecure Design
+Design smells that code review *can* catch:
+- Is there a single trust boundary, or does the same request cross it multiple times? (e.g. user → API → internal service that re-reads user input without re-validating.)
+- Are rate limits on authentication and password reset flows? Count attempts per account *and* per IP.
+- Does the password reset flow leak account existence? "Email sent if account exists" vs "no account with that email" — the latter is an oracle.
+- Is the "remember me" token a long-lived bearer? What invalidates it on password change?
+## A05 — Security Misconfiguration
+- Debug mode / stack traces in production? Grep for `DEBUG = True`, `app.debug`, `NODE_ENV` comparisons.
+- Default credentials in config files or seed scripts? Look in `seed.js`, `fixtures/`, `docker-compose.yml`.
+- Unused frameworks/features enabled? Directory listing? Admin consoles (`/admin`, `/actuator`, `/console`) without authn?
+- Security headers: CSP, X-Content-Type-Options, Referrer-Policy, Permissions-Policy. Is there a helmet/`secure` middleware registered?
+- Cloud metadata access — can the server be coerced into fetching `169.254.169.254`? (see also A10/SSRF.)
+## A06 — Vulnerable & Outdated Components
+- `rafter run` covers this via SCA. In review, check that the manifest is present (`package.json`, `requirements.txt`, `go.mod`, `pom.xml`) and that the lockfile is committed.
+- Is there a `postinstall` / `prepare` script running arbitrary code from dependencies? That's a supply-chain footgun.
+- Are any dependencies pulled from raw git URLs or non-registry sources without pinning?
+## A07 — Identification & Authentication Failures
+- Session management: where is the session created, stored, invalidated? Does logout actually invalidate server-side, or just drop the cookie?
+- Multi-factor: present on admin? On password change? On MFA enrollment itself (bypass via "add new device")?
+- Credential stuffing: lockout policy, captcha on repeated failures, generic error messages.
+- JWT: is `alg: none` accepted? Is the key confusion attack possible (HS256 verified against an RSA public key)? Is `kid` used to resolve arbitrary files?
+## A08 — Software & Data Integrity Failures
+- Update channels: does the app auto-update itself or pull config from remote? Is that channel signed and verified?
+- CI/CD: does the pipeline verify signatures on built artifacts? Are secrets scoped per-job or leaked across?
+- Deserialization (overlaps with A03): any untrusted blob fed to `pickle` / `yaml.load` / `unserialize` / `readObject`.
+## A09 — Security Logging & Monitoring Failures
+- Are authn failures logged with enough context (user id, ip, timestamp) to be useful?
+- Do logs leak secrets? Grep log statements for `password`, `token`, request bodies printed wholesale.
+- Is there a correlation id per request that survives across services?
+## A10 — Server-Side Request Forgery (SSRF)
+- Any endpoint that fetches a URL supplied by the user? (image proxy, webhook configurer, PDF-from-URL, OAuth callback that fetches `openid-configuration`.)
+- Is the URL's host allowlisted? Does the allowlist resolve the hostname and re-check against an internal-IP denylist (RFC1918 + link-local + cloud metadata)?
+- Does it follow redirects? Each redirect is a fresh SSRF check, not just the first URL.
+---
+## Exit criteria
+- For each category above, either a file:line citation proving it's handled, OR a finding logged with ruleId-shaped summary, OR an explicit "N/A — feature not present in this diff".
+- Pair with `rafter run` results: cross-reference scanner findings against your manual walk. Scanner-only hits are candidates for triage (`rafter/docs/finding-triage.md`); manual-only hits are the ones scanners miss.

package/resources/skills/rafter-secure-design/SKILL.md ADDED Viewed

@@ -0,0 +1,103 @@
+---
+name: rafter-secure-design
+description: "Shift-left, design-phase security — walk design decisions as a Choose-Your-Own-Adventure *before* the code exists. Router skill: pick what you're designing (auth, data storage, API surface, ingestion, deployment, dependencies) and Read the matching sub-doc. Each sub-doc is a set of questions a security engineer would ask at kickoff — what primitive to pick, what to refuse, what to threat-model. Pair with `rafter-code-review` (mid-lifecycle review) and the `rafter` skill (detection). Use at feature kickoff, architecture review, or whenever you're choosing between primitives."
+version: 0.1.0
+allowed-tools: [Read, Glob, Grep]
+---
+# Rafter Secure Design — Designing It Right The First Time
+A designer's skill, not a scanner. The goal is to catch the flaw in the whiteboard sketch, not three weeks later in a PR. Each sub-doc asks the questions a security engineer would ask at kickoff — "which primitive, which boundary, which default?"
+> Pair with `rafter-code-review` (structured review *during* PR) and the `rafter` skill (automated detection of what slipped through). This skill is the earliest stage — prevention before the code exists.
+## How to use this skill
+1. Identify what's being designed (below). If multiple apply, walk them in the order listed — `threat-modeling` last, as a capstone.
+2. `Read` only the matching sub-doc. Do not preload them all; pick-and-load keeps the conversation tight.
+3. Work through its questions against the *proposed* design. Capture the answer inline (architecture doc, design RFC, PR description). If you can't answer a question, that's a design gap — resolve it before writing code.
+4. When the design is stable, run the `threat-modeling` walk to stress-test it.
+5. Hand off to `rafter-code-review` during implementation.
+---
+## Choose Your Adventure
+### (1) Authentication & Authorization
+For: login, sessions, tokens, service-to-service identity, multi-tenant access, role-based permissions, anything that answers "who is this and what can they do?"
+- **Read `docs/auth.md`** — Primitive selection (session vs. JWT vs. OAuth), authZ model (RBAC / ABAC / ReBAC), token lifetime + revocation, MFA surface, service identity. Questions phrased as "pick one and say why".
+### (2) Data storage — at rest, in transit, PII
+For: database schema design, file storage, caches, logs, anything that decides *where* sensitive data lives and *who* holds the keys.
+- **Read `docs/data-storage.md`** — Classification (what is PII/PHI/PCI here?), encryption choices, key management, retention + deletion, backup scope, tenancy isolation. Anti-patterns: encrypt-everything-as-a-religion, homegrown crypto, keys next to data.
+### (3) API surface — REST / GraphQL / gRPC / webhooks
+For: designing new endpoints, shaping request/response schemas, choosing between resource styles, rate limiting, versioning, exposing internal services.
+- **Read `docs/api-design.md`** — Resource modeling for authz (is this endpoint BOLA-shaped?), write-vs-read boundaries, idempotency, rate-limit keys, error taxonomy (what leaks?), webhook delivery + replay.
+### (4) Ingestion — inputs, uploads, parsers, user content
+For: anything that accepts user-controlled bytes: form posts, file uploads, webhook payloads, imports, content rendering, search indexing.
+- **Read `docs/ingestion.md`** — Trust boundaries (where does untrusted become trusted?), parser choice (safe default vs. fast), size + shape limits, content sniffing, SSRF-adjacent fetchers, deserialization surface.
+### (5) Deployment — topology, network, secrets, runtime
+For: infra plan, service boundaries, secret distribution, egress policy, CI/CD pipeline, build-time vs. run-time separation.
+- **Read `docs/deployment.md`** — Network zones, least-privilege IAM, secret distribution (not "put it in env"), build provenance, runtime posture (read-only FS, non-root), multi-region / DR assumptions.
+### (6) Dependencies & supply chain
+For: picking a library, adopting a framework, pulling a container base image, introducing a new SaaS, wiring a postinstall script.
+- **Read `docs/dependencies.md`** — Pick-vs-write, maintenance signal, install-time execution, pinning + lockfiles, SBOM + SCA hooks, vendoring vs. registry, typosquat / slopsquat checks.
+### (7) Threat model — STRIDE walk of the full design
+For: the capstone pass *after* the above decisions are drafted. Also good for any greenfield service review.
+- **Read `docs/threat-modeling.md`** — STRIDE applied to the specific design (not the generic checklist). Trust boundaries, data-flow diagrams as prose, abuse cases, negative-space questions ("what did we implicitly assume?").
+### (8) Which standards / frameworks should bound this?
+For: scoping compliance, picking a baseline, answering "how much is enough?"
+- **Read `docs/standards-pointers.md`** — Pointers to ASVS (app sec), NIST SSDF (lifecycle), CSA CCM (cloud), OWASP SAMM (program maturity), plus the cheap-and-fast subset to start with.
+---
+## What this skill will NOT do
+- It will not write the design document for you. It walks *your* draft through structured questions.
+- It will not replace a dedicated threat-modeling session with the team. It prepares you for one.
+- It will not produce a checklist to mechanically tick through. Every question expects a deliberate answer; "N/A because..." is fine, "skip" is not.
+---
+## Fast path at feature kickoff
+```text
+1. Sketch the design (one-pager, box-and-arrow).
+2. Walk the sub-doc that matches the riskiest choice you're about to make.
+3. Walk threat-modeling.md as a capstone.
+4. Write the decisions into the design doc as "decided / rejected / why".
+5. Start coding — and loop in `rafter-code-review` when the PR lands.
+```
+If you're revisiting an existing design (refactor, migration), same flow: treat the current shape as "proposed" and walk the relevant sub-docs as questions.
+---
+## Tie-backs
+- Ready to review the code that implements the design? → `rafter-code-review`.
+- Implementation landed, need automated checks? → `rafter` skill, `rafter run` / `rafter scan local`.
+- Risky command came up mid-design (spike, data migration)? → `rafter` skill, `docs/guardrails.md`.
+- Have a specific finding from a scan? → `rafter` skill, `docs/finding-triage.md`.

package/resources/skills/rafter-secure-design/docs/api-design.md ADDED Viewed

@@ -0,0 +1,97 @@
+# API Design — Design Questions
+The shape of your API decides which vulnerabilities are *possible*. Good shape makes BOLA, BFLA, and mass assignment hard to write. Bad shape makes them hard to avoid.
+## Resource modeling — is this endpoint BOLA-shaped?
+- For each endpoint, what is the resource being named, and *how is it named*? `GET /orders/:id` names an order by global id — the caller can enumerate and try any id. Contrast with `GET /me/orders/:id` — scoped to the caller.
+- The scoping prefix (`/me`, `/org/:org_id`) doesn't enforce authZ by itself, but it makes the enforcement gap visible. "I forgot to check" is harder when the URL structure announces the scope.
+- Are identifiers **opaque** (random, unguessable) or **sequential**? Sequential ids aren't a security control, but combined with missing authZ they turn a 5-minute bug into a data breach. Opaque ids (UUIDv4, ULIDs with enough entropy) buy you a little defense-in-depth.
+- GraphQL: the resource boundary is per-field, not per-endpoint. You need authZ on every resolver that returns a resource, including nested resolvers. Think: "can a query walk from a public node to a private one via an edge?"
+## AuthZ enforcement point
+- Where does each endpoint check authorization?
+  - Before the handler (middleware / decorator): good for coarse checks (authenticated? role?).
+  - Inside the domain layer, against the specific resource: required for resource-level checks (can user X read order Y?).
+  - Both: middleware filters obvious unauthenticated traffic; domain checks the specific access.
+- Missing authZ checks are the #1 API bug class. Is there a test that *proves* every endpoint either returns 401 without auth or has an authZ test that denies a different user?
+- BFLA (broken function-level authz): admin actions on regular-user endpoints. Is there a single codepath that's reachable by multiple roles where only the check is different? That's the BFLA shape.
+## Request shape — mass assignment
+- Does the handler bind the full request body into a model, then save? `User.create(request.body)` is mass assignment — a client can set `is_admin: true` if the field exists on the model.
+- Explicit allowlist per endpoint, even if it's verbose. Frameworks that "automatically filter" are a landmine — the filter is correct until a field is added.
+- For updates: what fields are read-only? Created-at, created-by, tenant-id, owner-id — none of these should be settable by the client.
+## Idempotency & safety
+- Write endpoints: does the spec say idempotent or not? `PUT /things/:id` should be idempotent; `POST /things` usually isn't. Clients will retry — non-idempotent writes without an idempotency key will double-charge, double-send, double-create.
+- If you accept an `Idempotency-Key` header (Stripe-style): how long is the key scoped? Per-user, per-hour, per-day? Too short = legitimate retries fail; too long = stale dedup.
+- HTTP verb discipline: does the server accept verb-override headers (`X-HTTP-Method-Override`)? If yes, the "GET is safe" assumption breaks — any GET can become a POST.
+## Rate limiting & abuse
+- What are the **three** rate-limit keys? Per-IP (cheapest), per-API-key / per-user (account-level abuse), per-endpoint (expensive endpoints get lower limits).
+- Authentication endpoints (login, password reset, MFA): count per-account *and* per-IP. Per-IP alone misses credential stuffing with rotating IPs; per-account alone misses enumeration.
+- Webhook senders: self-rate-limit (queue, backoff). A storm of retries is a self-DoS.
+- Abuse cost: for expensive operations (file upload, image processing, LLM calls), what prevents one user from burning all the budget? Quotas > rate limits for cost control.
+## Error taxonomy — what leaks
+- Do errors distinguish "record not found" from "record exists but you can't see it"? They should **not** — both return 404. Revealing existence is an oracle.
+- Login errors: "invalid email" vs. "invalid password" = enumeration oracle. Both return "invalid credentials".
+- Stack traces, SQL errors, file paths in error responses — all debugging aids that become disclosure bugs in production. What's the production error shape? Do you have tests that assert it doesn't leak?
+- Error codes: are they stable and documented? "ERR_1042" isn't user-hostile; "database connection timeout on host db-prod-01.internal" is.
+## Pagination & bulk ops
+- Is there an upper bound on `limit`? Unbounded = trivial DoS and data exfil. What's the cap (1000 is common), and does the client know it was capped (via `has_next`)?
+- Cursor vs. offset: cursor-based is better for deep pagination and for immutable-once-read semantics. Offset lets attackers enumerate by incrementing.
+- Bulk ops (`POST /things/bulk`): per-element authZ, not just outer authZ. The handler might accept 500 resource ids and forget to check each one.
+## Webhooks (outbound)
+- Is the webhook destination user-supplied? If yes: this is SSRF-shaped. Allowlist the target (domain allowlist *plus* IP allowlist that excludes RFC1918, link-local, cloud metadata `169.254.169.254`).
+- Signed payloads: HMAC with a per-receiver secret, signature in a header, timestamp in the payload. Receivers should verify signature *and* reject stale timestamps (replay protection).
+- Retry policy: exponential backoff with a cap; max retries; dead-letter queue. Unbounded retries = self-DoS on receiver outages.
+- Does the delivery include PII? If yes, the receiver URL is now part of your data flow for compliance purposes. You need a deletion story for their side too.
+## Webhooks (inbound)
+- Verification: HMAC check, timestamp tolerance (< 5 min), replay cache (seen this signature recently?).
+- The payload is untrusted. Parse into a typed schema, reject unknown fields — don't echo into the DB.
+## Versioning & deprecation
+- How do you version? URL path (`/v1/`), header (`Accept: application/vnd.example.v1+json`), or query param? Pick one and stick with it.
+- How do you deprecate an endpoint? Sunset date, `Deprecation` header, metrics on which clients still call it. Deprecation without metrics = deprecation forever.
+- Old versions are old attack surface. Every live version is a maintenance cost.
+## API keys & client credentials
+- Scope per key (what endpoints, what data); expiration; revocation.
+- Does the key identify a **principal** (user / service) or just a **contract**? Per-principal is easier to audit; "contract keys" that many services share lose attribution.
+- Key display: show once at creation, store hashed. Rotation flow: overlap window (old + new valid) to avoid downtime.
+- Per-key audit log: every authenticated call names the key.
+## Refuse-list
+- Endpoints that accept the full request body into an ORM model without an allowlist.
+- 404 vs. 403 that leaks existence. (It's fine to 403 on a *permission* mismatch when the user knows the resource exists; not on resource-existence probes.)
+- Unbounded `limit` parameters.
+- User-supplied URLs fetched without an allowlist + IP denylist.
+- Login / password-reset endpoints without rate limits on both IP *and* account.
+- Error responses that include DB errors, file paths, or stack traces in production.
+- Webhook verification that's only "is there a signature header" without validating it.
+- API versioning schemes where v1 is never sunsetted (perpetual liability).
+---
+## Exit criteria
+- Every endpoint has a one-line authZ rule ("caller's user_id must equal the resource's owner_id, or the caller's role must be admin").
+- Mass-assignment story is explicit — allowlist, not auto-bind.
+- Rate limit keys are defined and justified per endpoint class.
+- Error taxonomy is in the spec, not up to the implementer.
+- Webhook designs (if any) specify signing, replay protection, and (outbound) SSRF defense.

package/resources/skills/rafter-secure-design/docs/auth.md ADDED Viewed

@@ -0,0 +1,67 @@
+# Authentication & Authorization — Design Questions
+Answer each block *before* you write code. If the answer is "we'll figure it out later", you have a design gap, not a plan. Cite the proposed primitive (library, spec, service) in your answer.
+## Identity — who is the user?
+- Is this **end users** (humans), **services** (internal/external), or **agents** (LLMs, automations)? The authN primitive differs for each; do not use one pipeline for all three.
+- For humans: are you federating (SSO / OIDC / SAML) or running your own password + MFA? Running your own is a maintenance burden — can you justify not federating?
+- For services: mTLS? Signed JWTs with a trusted issuer? Workload identity (SPIFFE, cloud IAM)? What *refuses* a service call — is absence of credential a 401 or a silent allow?
+- For agents: is the agent acting **as the user** (delegated) or **as itself** (service principal)? Delegated needs scoped tokens with user consent; as-itself needs audit trails that name the agent + the invoking user.
+## AuthN — choose the primitive and say why
+- Session cookies + server-side session store, or self-contained tokens (JWT / PASETO)?
+  - Sessions: easier to revoke, harder to scale across regions without sticky state.
+  - JWT: scales, harder to revoke — do you have a plan for revocation (short TTL + refresh, or a revocation list)?
+- If JWT: which algorithm are you signing with? **Refuse `alg: none`** and **refuse HS256 with any key the verifier can confuse with a public key.** Prefer `EdDSA` or `RS256`/`ES256` with a clearly separated key store.
+- If OAuth / OIDC: which flow? Authorization Code + PKCE for any client that isn't a trusted backend. **Never implicit flow in 2026.** If you have a reason, write it down.
+- Password policy: bcrypt / scrypt / argon2id — which one and what cost? Do you plan to rehash on login when cost parameters bump? What's the plan for credential stuffing (rate limit, captcha, breach-list checks)?
+- MFA: present at login only, or also at sensitive actions (password change, MFA enrollment, payment method change, export-all-data)? MFA enrollment itself needs anti-bypass (don't let "add a new device" bypass existing MFA).
+## AuthZ — model the access, don't reinvent it
+- Is the model **RBAC** (roles → permissions), **ABAC** (attributes → policy), or **ReBAC** (relationships, Zanzibar-style)?
+  - RBAC: cheap, coarse. Fails on "users can only see their own records" — that's a resource-ownership check, not a role.
+  - ABAC: flexible, hard to audit. If the policy is "user.org == resource.org AND (user.role == admin OR resource.owner == user)", write it as a policy engine input (Rego, Cedar), not scattered `if` statements.
+  - ReBAC: best for hierarchical sharing (docs, folders, workspaces). Expensive to bolt on later — decide now if you'll need it.
+- Where is authZ enforced? **At every entry point to the domain layer**, not per-route. Router-layer middleware checks authN, the domain layer checks authZ against the resource. If both live in the controller, the next developer will forget one.
+- IDOR / BOLA: for every resource access, is the ID scoped to the caller? `GET /orders/:id` that returns any order in the database is a bug. Are you checking `order.tenant_id == user.tenant_id` *and* `order.user_id == user.id` (or a delegated-access rule)?
+## Sessions / tokens — lifetime and revocation
+- Session lifetime: idle timeout and absolute timeout? "Remember me" — what invalidates it on password change, MFA reset, account deletion?
+- Refresh tokens: single-use (rotating) or replayable? Rotating + detection-on-reuse is the modern default. If you can't detect reuse, you lose the audit signal.
+- Revocation list: where does it live? Is it read on every authZ check (expensive) or pushed to token TTL only? If the latter, your TTL *is* your worst-case revocation delay — be honest about that.
+- Logout: does it actually invalidate server-side, or just clear the cookie? A logout that only clears the client is a lie.
+## Multi-tenant isolation
+- Tenancy is an authZ concern, not a DB trick. Is the tenant id on every query? What enforces that — raw SQL, ORM hook, policy engine? (ORM hook is easy to bypass with raw queries; policy engine with query-rewriting is strongest.)
+- Is the tenant id from the **session**, never from the **request**? `?tenant_id=X` in the URL is a footgun.
+- Cross-tenant sharing (delegation, impersonation for support, data export): designed explicitly, or accidental because of a missing check?
+## Service-to-service
+- Zero-trust posture: does every internal call still carry and verify identity, or is the internal network treated as trusted? (Treat-as-trusted has failed in every post-mortem for a decade.)
+- How is service identity bootstrapped? Static long-lived secrets in env vars are the weakest option — workload identity (AWS IAM, GCP WIF, Kubernetes SA tokens, SPIFFE) is strongest.
+- Does the callee log *which* service called it? Without that, you can't incident-respond.
+## Refuse-list — if any of these are in the proposal, stop and redesign
+- Homegrown password hashing (`sha256(pw + salt)` is not hashing).
+- Homegrown JWT signing/verification (pick a maintained library, prefer PASETO for new designs).
+- `alg: none` acceptance, or JWT libraries that don't pin algorithm.
+- "The internal network is trusted, so we skip auth between services."
+- Tenant id derived from the request path or query string rather than the session.
+- MFA that can be bypassed by enrolling a new device without re-verifying.
+- Password reset tokens that are long-lived, non-rotating, or tied to email only without rate-limit + recent-activity checks.
+---
+## Exit criteria
+- Each subsection above has a one-line answer, naming a specific primitive or library.
+- The refuse-list has been checked against the proposal; any hits are explicitly waived with a written "we accept this because...".
+- AuthZ model chosen and a first sketch of the policy (RBAC table / ABAC rules / ReBAC relations) exists.
+- You're ready to hand this section to the implementing engineer without ambiguity.

package/resources/skills/rafter-secure-design/docs/data-storage.md ADDED Viewed

@@ -0,0 +1,90 @@
+# Data Storage — Design Questions
+Where data lives determines blast radius. Every decision here is about making compromise cheap: key rotation, minimal retention, isolation by default.
+## Classify — what *is* this data?
+Before anything else, tag each field the design stores:
+- **Identifier**: email, username, account id. Useful to enumerate, often PII on its own.
+- **Credential**: password, API key, OAuth token, session id. Compromise = account takeover.
+- **PII / PHI / PCI**: personal / health / payment. Regulatory scope — GDPR, HIPAA, PCI-DSS apply. Which?
+- **Secret**: business-internal (encryption keys, signing keys, webhook secrets).
+- **Content**: user-generated text, files, comments. Defamation / CSAM risk is real — do you have a moderation path?
+- **Derived**: embeddings, summaries, ML features. Often treated as "not the original data" but can leak it — an embedding can sometimes reconstruct input.
+- **Audit / log**: who did what when. Usually keep-forever, but often contains identifiers — classify the fields, not just the collection.
+If a field doesn't fit a bucket, ask why it's being stored at all. **Data you don't have can't leak.**
+## Encryption at rest
+- Is the store's default disk-encryption enough (AWS RDS / GCP Cloud SQL / DynamoDB with CMK)? For most data, yes — don't add a second layer without a reason.
+- Application-level encryption is worth it when: (a) the DB operator is a different trust boundary than the app, (b) you need field-level access control tied to the app's authn, (c) compliance demands customer-managed keys. Don't encrypt application-side just to "feel safer" — you'll break queries, search, and analytics.
+- Envelope encryption (KMS-wrapped DEKs) is the pattern for app-side encryption. Who holds the KEK? Can you rotate it without re-encrypting every row?
+- Deterministic vs. randomized encryption: deterministic lets you query/join, but leaks equality. Decide per field.
+## Keys — the *actual* security boundary
+- Where do the keys live? KMS / HSM / Vault / env var? **Env var is weakest** — it ends up in logs, dumps, and `ps auxe` output.
+- Who can *use* the key (decrypt) vs. who can *manage* the key (rotate, destroy)? These must be separate IAM principals.
+- Rotation schedule: signing keys < 1 year, data keys rotated via envelope re-wrap (cheap), password hashing upgrade on login (transparent).
+- Key separation by tenant: single tenant per key is strongest (revoke = delete tenant key) but expensive. Per-tenant DEK with a shared KEK is a good middle ground.
+- Break-glass: how do you get out when KMS is down? Do you have a tested runbook, or will you find out during the incident?
+## Secrets (the application's own)
+- Application secrets (DB passwords, API keys, signing keys, webhook secrets) go in a secret manager (Vault, AWS Secrets Manager, GCP Secret Manager, Kubernetes Secrets with encryption-at-rest). **Not in env vars committed to repo; not in env vars set by deploy scripts that log them.**
+- Rotation: can you rotate without an outage? If the answer is "restart every service", that's OK for weekly but not daily. For rotation under pressure (leak detected), test the runbook *before* the leak.
+- Least privilege: each service gets its own secret with the smallest scope. The web frontend does not need the DB's admin password.
+## Encryption in transit
+- TLS everywhere, including internal. "Internal network is trusted" is not a posture, it's a wish.
+- What TLS version floor? TLS 1.2 is the practical minimum in 2026; 1.3 is the default for new designs.
+- Certificate management: automated (ACME, cert-manager, cloud-managed) or manual? Manual renewal is a recurring outage.
+- mTLS for service-to-service? Worth it when services are owned by different teams or spans security zones.
+## Retention & deletion
+- How long does each class live? Default to **shortest defensible** — you're not obligated to keep it forever. Data you delete can't subpoena, leak, or breach.
+- GDPR / CCPA deletion: when a user requests deletion, *what* is deleted? Logs, backups, analytics exports, ML training sets, embeddings? If the answer is "we'll figure it out", you'll fail an audit.
+- Soft-delete vs. hard-delete: soft-delete is good for recovery windows, bad for compliance. After the window closes, hard-delete and prove it (cryptographic erasure = destroy the key).
+- Backup scope: backups inherit the data's sensitivity. Are backups encrypted with a *different* key than live data (so a live-data compromise doesn't grant backups)?
+## Tenancy isolation
+- Row-level: single DB, tenant_id column. Cheapest, weakest — one missed `where tenant_id` = cross-tenant leak.
+- Schema-per-tenant: same DB, separate schemas. Mid-cost, mid-strength — ORM must respect search_path.
+- DB-per-tenant: separate DB per tenant. Most expensive, strongest — compromise of one DB doesn't leak others.
+- Decide by the cost of a cross-tenant leak, not by current scale. Upgrading later means a data migration.
+## Logs — the forgotten data store
+- Do logs contain the data classified above? Request bodies wholesale, error messages with stack traces containing secrets, URL query strings with tokens, user inputs echoed back — all common.
+- Who reads logs? A dev on their laptop? A SaaS log provider? Each hop is a trust boundary. Scrub secrets before the hop.
+- Log retention is often longer than the application's data retention — a GDPR deletion that misses logs is incomplete.
+## Caches, queues, search indices
+- Redis / Memcached / Elasticsearch / SQS / Kafka — each is a secondary data store. Classify what's in it.
+- Is the cache encrypted at rest? Accessible over TLS? Authenticated? **Unauthenticated Redis on a public IP is still the #1 cloud leak source in 2026.**
+- Search indices often copy data verbatim — a deleted record in the DB can linger in Elasticsearch unless you wire the deletion into both.
+## Refuse-list
+- Custom crypto primitives ("we xor with a rotating key"). Pick `libsodium` / `AEAD` via a maintained library.
+- Storing passwords reversibly. Ever.
+- Log statements that print request bodies, auth headers, or token-bearing URLs.
+- Backups in the same blast radius as live data (same account, same region, same key).
+- Tenant isolation enforced only at the ORM layer (raw queries bypass it).
+- Embeddings / ML features stored without the classification that the source data had.
+---
+## Exit criteria
+- Every stored field has a classification and a retention policy.
+- Encryption story names specific keys, a specific KMS, and a rotation cadence.
+- Secret distribution path is explicit — not "env vars set by Terraform".
+- Deletion path is defined for each data class, including logs and backups.
+- Tenant isolation level is chosen with a written justification.