npm - @rafter-security/cli - Versions diffs - 0.6.6 → 0.7.1 - Mend

@rafter-security/cli 0.6.6 → 0.7.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (70)

package/README.md +29 -10
package/dist/commands/agent/audit-skill.js +22 -20
package/dist/commands/agent/audit.js +27 -0
package/dist/commands/agent/components.js +800 -0
package/dist/commands/agent/config.js +2 -1
package/dist/commands/agent/disable.js +47 -0
package/dist/commands/agent/enable.js +50 -0
package/dist/commands/agent/exec.js +2 -0
package/dist/commands/agent/index.js +6 -0
package/dist/commands/agent/init.js +162 -163
package/dist/commands/agent/install-hook.js +15 -14
package/dist/commands/agent/list.js +72 -0
package/dist/commands/agent/scan.js +4 -3
package/dist/commands/agent/verify.js +1 -1
package/dist/commands/backend/run.js +12 -3
package/dist/commands/backend/scan-status.js +3 -2
package/dist/commands/brief.js +22 -2
package/dist/commands/ci/init.js +25 -21
package/dist/commands/completion.js +4 -3
package/dist/commands/docs/index.js +18 -0
package/dist/commands/docs/list.js +37 -0
package/dist/commands/docs/show.js +64 -0
package/dist/commands/mcp/server.js +84 -0
package/dist/commands/report.js +42 -41
package/dist/commands/scan/index.js +7 -5
package/dist/commands/skill/index.js +14 -0
package/dist/commands/skill/install.js +89 -0
package/dist/commands/skill/list.js +79 -0
package/dist/commands/skill/registry.js +273 -0
package/dist/commands/skill/remote.js +333 -0
package/dist/commands/skill/review.js +975 -0
package/dist/commands/skill/uninstall.js +65 -0
package/dist/core/audit-logger.js +262 -21
package/dist/core/config-manager.js +3 -0
package/dist/core/docs-loader.js +148 -0
package/dist/core/policy-loader.js +72 -1
package/dist/core/risk-rules.js +16 -3
package/dist/index.js +19 -9
package/dist/scanners/gitleaks.js +6 -2
package/package.json +1 -1
package/resources/skills/rafter/SKILL.md +77 -97
package/resources/skills/rafter/docs/backend.md +106 -0
package/resources/skills/rafter/docs/cli-reference.md +199 -0
package/resources/skills/rafter/docs/finding-triage.md +79 -0
package/resources/skills/rafter/docs/guardrails.md +91 -0
package/resources/skills/rafter/docs/shift-left.md +64 -0
package/resources/skills/rafter-agent-security/SKILL.md +1 -1
package/resources/skills/rafter-code-review/SKILL.md +91 -0
package/resources/skills/rafter-code-review/docs/api.md +90 -0
package/resources/skills/rafter-code-review/docs/asvs.md +120 -0
package/resources/skills/rafter-code-review/docs/cwe-top25.md +78 -0
package/resources/skills/rafter-code-review/docs/investigation-playbook.md +101 -0
package/resources/skills/rafter-code-review/docs/llm.md +87 -0
package/resources/skills/rafter-code-review/docs/web-app.md +84 -0
package/resources/skills/rafter-secure-design/SKILL.md +103 -0
package/resources/skills/rafter-secure-design/docs/api-design.md +97 -0
package/resources/skills/rafter-secure-design/docs/auth.md +67 -0
package/resources/skills/rafter-secure-design/docs/data-storage.md +90 -0
package/resources/skills/rafter-secure-design/docs/dependencies.md +101 -0
package/resources/skills/rafter-secure-design/docs/deployment.md +104 -0
package/resources/skills/rafter-secure-design/docs/ingestion.md +98 -0
package/resources/skills/rafter-secure-design/docs/standards-pointers.md +102 -0
package/resources/skills/rafter-secure-design/docs/threat-modeling.md +128 -0
package/resources/skills/rafter-skill-review/SKILL.md +106 -0
package/resources/skills/rafter-skill-review/docs/authorship-provenance.md +82 -0
package/resources/skills/rafter-skill-review/docs/changelog-review.md +99 -0
package/resources/skills/rafter-skill-review/docs/data-practices.md +88 -0
package/resources/skills/rafter-skill-review/docs/malware-indicators.md +79 -0
package/resources/skills/rafter-skill-review/docs/prompt-injection.md +85 -0
package/resources/skills/rafter-skill-review/docs/telemetry.md +78 -0

package/resources/skills/rafter-secure-design/docs/api-design.md ADDED

@@ -0,0 +1,97 @@
+# API Design — Design Questions
+The shape of your API decides which vulnerabilities are *possible*. Good shape makes BOLA, BFLA, and mass assignment hard to write. Bad shape makes them hard to avoid.
+## Resource modeling — is this endpoint BOLA-shaped?
+- For each endpoint, what is the resource being named, and *how is it named*? `GET /orders/:id` names an order by global id — the caller can enumerate and try any id. Contrast with `GET /me/orders/:id` — scoped to the caller.
+- The scoping prefix (`/me`, `/org/:org_id`) doesn't enforce authZ by itself, but it makes the enforcement gap visible. "I forgot to check" is harder when the URL structure announces the scope.
+- Are identifiers **opaque** (random, unguessable) or **sequential**? Sequential ids aren't a vulnerability on their own, but combined with missing authZ they turn a 5-minute bug into a data breach. Opaque ids (UUIDv4, ULIDs with enough entropy) aren't a security control either, but they buy you a little defense-in-depth.
+- GraphQL: the resource boundary is per-field, not per-endpoint. You need authZ on every resolver that returns a resource, including nested resolvers. Think: "can a query walk from a public node to a private one via an edge?"
+## AuthZ enforcement point
+- Where does each endpoint check authorization?
+  - Before the handler (middleware / decorator): good for coarse checks (authenticated? role?).
+  - Inside the domain layer, against the specific resource: required for resource-level checks (can user X read order Y?).
+  - Both: middleware filters obvious unauthenticated traffic; domain checks the specific access.
+- Missing authZ checks are the #1 API bug class. Is there a test that *proves* every endpoint either returns 401 without auth or has an authZ test that denies a different user?
+- BFLA (broken function-level authz): admin actions on regular-user endpoints. Is there a single codepath that's reachable by multiple roles where only the check is different? That's the BFLA shape.
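A minimal sketch of a resource-level check in the domain layer, not taken from the package. The `Order`, `Caller`, `ORDERS`, and `get_order` names are hypothetical, and the in-memory dict stands in for a real data store. Missing and inaccessible orders raise the same error, so the caller cannot probe for existence.

```python
from dataclasses import dataclass


class NotFound(Exception):
    """Raised for both missing and inaccessible orders."""


@dataclass(frozen=True)
class Order:
    id: str
    tenant_id: str
    owner_id: str


@dataclass(frozen=True)
class Caller:
    user_id: str
    tenant_id: str
    role: str = "user"


# Hypothetical in-memory store standing in for a database.
ORDERS = {
    "o-1": Order("o-1", tenant_id="t-1", owner_id="alice"),
    "o-2": Order("o-2", tenant_id="t-2", owner_id="bob"),
}


def get_order(caller: Caller, order_id: str) -> Order:
    order = ORDERS.get(order_id)
    # Tenant boundary first: a foreign tenant's order looks like a missing one.
    if order is None or order.tenant_id != caller.tenant_id:
        raise NotFound(order_id)
    # Ownership rule: the owner, or an admin of the same tenant.
    if order.owner_id != caller.user_id and caller.role != "admin":
        raise NotFound(order_id)
    return order
```

The test that "proves" the rule is then a denial test: a second user requests the first user's order and must get `NotFound`.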
+## Request shape — mass assignment
+- Does the handler bind the full request body into a model, then save? `User.create(request.body)` is mass assignment — a client can set `is_admin: true` if the field exists on the model.
+- Explicit allowlist per endpoint, even if it's verbose. Frameworks that "automatically filter" are a landmine — the filter is correct until a field is added.
+- For updates: what fields are read-only? Created-at, created-by, tenant-id, owner-id — none of these should be settable by the client.
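A minimal sketch of a per-endpoint allowlist, not taken from the package. The field names in `UPDATABLE_USER_FIELDS` and the `pick_allowed` helper are hypothetical. Rejecting unknown fields (rather than silently dropping them) is one possible policy; it makes client mistakes and probing visible.

```python
# Hypothetical allowlist for a "update my profile" endpoint.
UPDATABLE_USER_FIELDS = frozenset({"display_name", "email", "locale"})


def pick_allowed(body: dict, allowed: frozenset) -> dict:
    """Return a copy of body if every key is allowlisted, else raise."""
    unknown = set(body) - allowed
    if unknown:
        raise ValueError(f"fields not allowed: {sorted(unknown)}")
    return dict(body)
```

A handler would pass only `pick_allowed(request_body, UPDATABLE_USER_FIELDS)` to the model layer, so a new model field such as `is_admin` stays unsettable until someone adds it to the allowlist on purpose.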
+## Idempotency & safety
+- Write endpoints: does the spec say idempotent or not? `PUT /things/:id` should be idempotent; `POST /things` usually isn't. Clients will retry — non-idempotent writes without an idempotency key will double-charge, double-send, double-create.
+- If you accept an `Idempotency-Key` header (Stripe-style): what is the key scoped to (per-user?), and how long is it remembered (an hour, a day)? Too short = legitimate retries fail; too long = stale dedup.
+- HTTP verb discipline: does the server accept verb-override headers (`X-HTTP-Method-Override`)? If yes, the "GET is safe" assumption breaks — any GET can become a POST.
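A minimal sketch of idempotency-key deduplication, not taken from the package. `IdempotencyStore` is a hypothetical in-memory helper; the per-user scope and the 24-hour default lifetime are assumptions, and a real service would need a shared store and handling for concurrent in-flight requests.

```python
import time


class IdempotencyStore:
    """Remembers the response for (user_id, key) until the TTL expires."""

    def __init__(self, ttl_seconds: float = 24 * 3600, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._seen = {}  # (user_id, key) -> (expires_at, response)

    def run_once(self, user_id: str, key: str, operation):
        now = self._clock()
        entry = self._seen.get((user_id, key))
        if entry is not None and entry[0] > now:
            return entry[1]  # retry: replay the stored response
        response = operation()
        self._seen[(user_id, key)] = (now + self._ttl, response)
        return response
```

Scoping the key to the user keeps one client's key from colliding with, or replaying, another client's response.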
+## Rate limiting & abuse
+- What are the **three** rate-limit keys? Per-IP (cheapest), per-API-key / per-user (account-level abuse), per-endpoint (expensive endpoints get lower limits).
+- Authentication endpoints (login, password reset, MFA): count per-account *and* per-IP. Per-IP alone misses credential stuffing with rotating IPs; per-account alone misses enumeration.
+- Webhook senders: self-rate-limit (queue, backoff). A storm of retries is a self-DoS.
+- Abuse cost: for expensive operations (file upload, image processing, LLM calls), what prevents one user from burning all the budget? Quotas > rate limits for cost control.
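A minimal sketch of counting login attempts per-IP *and* per-account, not taken from the package. `TokenBucket` and `allow_login` are hypothetical names, the state is in-memory, and the capacities are illustrative assumptions.

```python
import time


class TokenBucket:
    """One bucket per key; each allowed request spends one token."""

    def __init__(self, capacity: int, refill_per_second: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill = refill_per_second
        self.clock = clock
        self._state = {}  # key -> (tokens, last_seen)

    def allow(self, key) -> bool:
        now = self.clock()
        tokens, last = self._state.get(key, (float(self.capacity), now))
        tokens = min(self.capacity, tokens + (now - last) * self.refill)
        if tokens < 1:
            self._state[key] = (tokens, now)
            return False
        self._state[key] = (tokens - 1, now)
        return True


def allow_login(per_ip: TokenBucket, per_account: TokenBucket, ip: str, account: str) -> bool:
    # Charge both buckets on every attempt, then require both to pass.
    ip_ok = per_ip.allow(("ip", ip))
    account_ok = per_account.allow(("account", account))
    return ip_ok and account_ok
```

With rotating source IPs the per-IP bucket never empties, but the per-account bucket does, which is the credential-stuffing case the per-IP limit alone misses.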
+## Error taxonomy — what leaks
+- Do errors distinguish "record not found" from "record exists but you can't see it"? They should **not** — both return 404. Revealing existence is an oracle.
+- Login errors: "invalid email" vs. "invalid password" = enumeration oracle. Both return "invalid credentials".
+- Stack traces, SQL errors, file paths in error responses — all debugging aids that become disclosure bugs in production. What's the production error shape? Do you have tests that assert it doesn't leak?
+- Error codes: are they stable and documented? "ERR_1042" isn't user-hostile; "database connection timeout on host db-prod-01.internal" is.
+## Pagination & bulk ops
+- Is there an upper bound on `limit`? Unbounded = trivial DoS and data exfil. What's the cap (1000 is common), and does the client know it was capped (via `has_next`)?
+- Cursor vs. offset: cursor-based is better for deep pagination and for immutable-once-read semantics. Offset lets attackers enumerate by incrementing.
+- Bulk ops (`POST /things/bulk`): per-element authZ, not just outer authZ. The handler might accept 500 resource ids and forget to check each one.
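A minimal sketch of a capped, cursor-based page, not taken from the package. `clamp_limit` and `page` are hypothetical helpers, the default of 50 is an assumption, and a sorted list of ids stands in for an indexed query. Fetching one extra row is how `has_next` is computed without a count query.

```python
import bisect

MAX_LIMIT = 1000     # the cap; clients learn they were capped via has_next
DEFAULT_LIMIT = 50   # assumption for illustration


def clamp_limit(requested):
    if requested is None:
        return DEFAULT_LIMIT
    return max(1, min(requested, MAX_LIMIT))


def page(sorted_ids, after, requested_limit):
    """Return up to `limit` ids strictly greater than the `after` cursor."""
    limit = clamp_limit(requested_limit)
    start = bisect.bisect_right(sorted_ids, after) if after is not None else 0
    window = sorted_ids[start:start + limit + 1]  # one extra row to detect more
    items = window[:limit]
    has_next = len(window) > limit
    return {
        "items": items,
        "has_next": has_next,
        "next_cursor": items[-1] if has_next else None,
    }
```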
+## Webhooks (outbound)
+- Is the webhook destination user-supplied? If yes: this is SSRF-shaped. Restrict the target (domain allowlist *plus* an IP denylist covering RFC1918, link-local, and cloud metadata `169.254.169.254`).
+- Signed payloads: HMAC with a per-receiver secret, signature in a header, timestamp in the payload. Receivers should verify signature *and* reject stale timestamps (replay protection).
+- Retry policy: exponential backoff with a cap; max retries; dead-letter queue. Unbounded retries = self-DoS on receiver outages.
+- Does the delivery include PII? If yes, the receiver URL is now part of your data flow for compliance purposes. You need a deletion story for their side too.
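A minimal sketch of the outbound SSRF check, not taken from the package. `ALLOWED_HOSTS`, `is_public_ip`, and `check_webhook_url` are hypothetical, and `hooks.example.com` is a placeholder. The resolver is injectable so the check can be tested without network access.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"hooks.example.com"}  # hypothetical domain allowlist


def is_public_ip(address: str) -> bool:
    ip = ipaddress.ip_address(address)
    return not (
        ip.is_private or ip.is_loopback or ip.is_link_local
        or ip.is_multicast or ip.is_reserved or ip.is_unspecified
    )


def check_webhook_url(url: str, resolve=socket.getaddrinfo) -> list:
    """Return the vetted IPs for url, or raise ValueError."""
    parts = urlsplit(url)
    if parts.scheme != "https" or parts.hostname is None:
        raise ValueError("webhook URL must be https")
    if parts.hostname not in ALLOWED_HOSTS:
        raise ValueError("host not on allowlist")
    infos = resolve(parts.hostname, parts.port or 443, proto=socket.IPPROTO_TCP)
    addresses = sorted({info[4][0] for info in infos})
    for address in addresses:
        if not is_public_ip(address):  # covers RFC1918 and 169.254.169.254
            raise ValueError(f"resolves to non-public address {address}")
    return addresses
```

The delivery code should connect to one of the returned addresses rather than resolving the hostname again; otherwise a DNS change between check and fetch reopens the gap.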
+## Webhooks (inbound)
+- Verification: HMAC check, timestamp tolerance (< 5 min), replay cache (seen this signature recently?).
+- The payload is untrusted. Parse into a typed schema, reject unknown fields — don't echo into the DB.
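A minimal sketch of inbound verification, not taken from the package. The signed message layout (`timestamp.body`), the `sign` and `verify` names, and the plain-dict replay cache are assumptions; real senders document their own signing scheme.

```python
import hashlib
import hmac
import time

TOLERANCE_SECONDS = 300  # the "< 5 min" window


def sign(secret: bytes, timestamp: int, body: bytes) -> str:
    message = str(timestamp).encode() + b"." + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(secret: bytes, timestamp: int, body: bytes, signature: str,
           seen: dict, now: float = None) -> bool:
    now = time.time() if now is None else now
    if abs(now - timestamp) > TOLERANCE_SECONDS:
        return False  # stale or future-dated
    expected = sign(secret, timestamp, body)
    if not hmac.compare_digest(expected.encode(), signature.encode()):
        return False  # constant-time comparison failed
    for old, expires_at in list(seen.items()):
        if expires_at < now:
            del seen[old]  # entry can no longer pass the timestamp check
    if signature in seen:
        return False  # replay inside the tolerance window
    seen[signature] = timestamp + TOLERANCE_SECONDS
    return True
```

The replay cache only has to remember a signature for as long as its timestamp would still be accepted, which keeps it small.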
+## Versioning & deprecation
+- How do you version? URL path (`/v1/`), header (`Accept: application/vnd.example.v1+json`), or query param? Pick one and stick with it.
+- How do you deprecate an endpoint? Sunset date, `Deprecation` header, metrics on which clients still call it. Deprecation without metrics = deprecation forever.
+- Old versions are old attack surface. Every live version is a maintenance cost.
+## API keys & client credentials
+- Scope per key (what endpoints, what data); expiration; revocation.
+- Does the key identify a **principal** (user / service) or just a **contract**? Per-principal is easier to audit; "contract keys" that many services share lose attribution.
+- Key display: show once at creation, store hashed. Rotation flow: overlap window (old + new valid) to avoid downtime.
+- Per-key audit log: every authenticated call names the key.
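A minimal sketch of "show once, store hashed", not taken from the package. The `key_id.secret` format and the `new_api_key` / `verify_api_key` names are hypothetical. A plain SHA-256 digest is used here on the assumption that the secret is long and random; that reasoning does not carry over to passwords.

```python
import hashlib
import hmac
import secrets


def new_api_key():
    """Return (plaintext shown once, record to persist)."""
    key_id = secrets.token_hex(8)
    secret = secrets.token_urlsafe(32)
    record = {"key_id": key_id, "digest": hashlib.sha256(secret.encode()).hexdigest()}
    return f"{key_id}.{secret}", record


def verify_api_key(presented: str, records: dict):
    """Return the key_id for the audit log, or None if the key is invalid."""
    key_id, _, secret = presented.partition(".")
    record = records.get(key_id)
    if record is None:
        return None
    digest = hashlib.sha256(secret.encode()).hexdigest()
    return key_id if hmac.compare_digest(digest, record["digest"]) else None
```

Because records are looked up by `key_id`, an old and a new key can both be present during a rotation overlap, and every authenticated call can name the key it used.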
+## Refuse-list
+- Endpoints that accept the full request body into an ORM model without an allowlist.
+- 404 vs. 403 that leaks existence. (It's fine to 403 on a *permission* mismatch when the user knows the resource exists; not on resource-existence probes.)
+- Unbounded `limit` parameters.
+- User-supplied URLs fetched without an allowlist + IP denylist.
+- Login / password-reset endpoints without rate limits on both IP *and* account.
+- Error responses that include DB errors, file paths, or stack traces in production.
+- Webhook verification that's only "is there a signature header" without validating it.
+- API versioning schemes where v1 is never sunset (perpetual liability).
+---
+## Exit criteria
+- Every endpoint has a one-line authZ rule ("caller's user_id must equal the resource's owner_id, or the caller's role must be admin").
+- Mass-assignment story is explicit — allowlist, not auto-bind.
+- Rate limit keys are defined and justified per endpoint class.
+- Error taxonomy is in the spec, not up to the implementer.
+- Webhook designs (if any) specify signing, replay protection, and (outbound) SSRF defense.

package/resources/skills/rafter-secure-design/docs/auth.md ADDED

@@ -0,0 +1,67 @@
+# Authentication & Authorization — Design Questions
+Answer each block *before* you write code. If the answer is "we'll figure it out later", you have a design gap, not a plan. Cite the proposed primitive (library, spec, service) in your answer.
+## Identity — who is the user?
+- Is this **end users** (humans), **services** (internal/external), or **agents** (LLMs, automations)? The authN primitive differs for each; do not use one pipeline for all three.
+- For humans: are you federating (SSO / OIDC / SAML) or running your own password + MFA? Running your own is a maintenance burden — can you justify not federating?
+- For services: mTLS? Signed JWTs with a trusted issuer? Workload identity (SPIFFE, cloud IAM)? What *refuses* a service call — is absence of credential a 401 or a silent allow?
+- For agents: is the agent acting **as the user** (delegated) or **as itself** (service principal)? Delegated needs scoped tokens with user consent; as-itself needs audit trails that name the agent + the invoking user.
+## AuthN — choose the primitive and say why
+- Session cookies + server-side session store, or self-contained tokens (JWT / PASETO)?
+  - Sessions: easier to revoke, harder to scale across regions without sticky state.
+  - JWT: scales, harder to revoke — do you have a plan for revocation (short TTL + refresh, or a revocation list)?
+- If JWT: which algorithm are you signing with? **Refuse `alg: none`** and **refuse HS256 with any key the verifier can confuse with a public key.** Prefer `EdDSA` or `RS256`/`ES256` with a clearly separated key store.
+- If OAuth / OIDC: which flow? Authorization Code + PKCE for any client that isn't a trusted backend. **Never implicit flow in 2026.** If you have a reason, write it down.
+- Password policy: bcrypt / scrypt / argon2id — which one and what cost? Do you plan to rehash on login when cost parameters bump? What's the plan for credential stuffing (rate limit, captcha, breach-list checks)?
+- MFA: present at login only, or also at sensitive actions (password change, MFA enrollment, payment method change, export-all-data)? MFA enrollment itself needs anti-bypass (don't let "add a new device" bypass existing MFA).
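A minimal sketch of rehash-on-login, not taken from the package. It uses `hashlib.scrypt` only because it ships with the standard library; the cost parameters in `CURRENT_PARAMS` and the record layout are illustrative assumptions, not recommendations from the source.

```python
import hashlib
import hmac
import os

CURRENT_PARAMS = {"n": 2**14, "r": 8, "p": 1}  # assumed cost; tune for your hardware


def hash_password(password: str, params: dict = None) -> dict:
    params = dict(params or CURRENT_PARAMS)
    salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode(), salt=salt, dklen=32, **params)
    return {"params": params, "salt": salt, "digest": digest}


def verify_and_maybe_rehash(password: str, record: dict):
    """Return (ok, record); the record is upgraded if its cost is outdated."""
    digest = hashlib.scrypt(password.encode(), salt=record["salt"], dklen=32,
                            **record["params"])
    if not hmac.compare_digest(digest, record["digest"]):
        return False, record
    if record["params"] != CURRENT_PARAMS:
        record = hash_password(password)  # plaintext is only available at login
    return True, record
```

Storing the parameters next to each hash is what makes the upgrade transparent: old records stay verifiable, and the caller persists the returned record whenever it changes.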
+## AuthZ — model the access, don't reinvent it
+- Is the model **RBAC** (roles → permissions), **ABAC** (attributes → policy), or **ReBAC** (relationships, Zanzibar-style)?
+  - RBAC: cheap, coarse. Fails on "users can only see their own records" — that's a resource-ownership check, not a role.
+  - ABAC: flexible, hard to audit. If the policy is "user.org == resource.org AND (user.role == admin OR resource.owner == user)", write it as a policy engine input (Rego, Cedar), not scattered `if` statements.
+  - ReBAC: best for hierarchical sharing (docs, folders, workspaces). Expensive to bolt on later — decide now if you'll need it.
+- Where is authZ enforced? **At every entry point to the domain layer**, not per-route. Router-layer middleware checks authN, the domain layer checks authZ against the resource. If both live in the controller, the next developer will forget one.
+- IDOR / BOLA: for every resource access, is the ID scoped to the caller? `GET /orders/:id` that returns any order in the database is a bug. Are you checking `order.tenant_id == user.tenant_id` *and* `order.user_id == user.id` (or a delegated-access rule)?
+## Sessions / tokens — lifetime and revocation
+- Session lifetime: idle timeout and absolute timeout? "Remember me" — what invalidates it on password change, MFA reset, account deletion?
+- Refresh tokens: single-use (rotating) or replayable? Rotating + detection-on-reuse is the modern default. If you can't detect reuse, you lose the audit signal.
+- Revocation list: where does it live? Is it read on every authZ check (expensive), or do you rely on token TTL alone? If the latter, your TTL *is* your worst-case revocation delay; be honest about that.
+- Logout: does it actually invalidate server-side, or just clear the cookie? A logout that only clears the client is a lie.
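A minimal sketch of rotating refresh tokens with detection-on-reuse, not taken from the package. `RefreshTokens` is a hypothetical in-memory store; a real one would persist token hashes rather than raw tokens and would emit an audit event when a family is revoked.

```python
import secrets


class RefreshTokens:
    """Single-use refresh tokens grouped into families (one per login)."""

    def __init__(self):
        self._tokens = {}  # token -> {"family": str, "used": bool}
        self._revoked_families = set()

    def issue(self, family: str = None) -> str:
        family = family or secrets.token_hex(8)
        token = secrets.token_urlsafe(32)
        self._tokens[token] = {"family": family, "used": False}
        return token

    def rotate(self, token: str):
        """Exchange a token for its successor, or return None."""
        entry = self._tokens.get(token)
        if entry is None or entry["family"] in self._revoked_families:
            return None
        if entry["used"]:
            # Reuse means the token leaked: revoke every descendant too.
            self._revoked_families.add(entry["family"])
            return None
        entry["used"] = True
        return self.issue(entry["family"])
```

Revoking the whole family on reuse is the detection signal: whichever party holds the stolen copy, both are forced back through a full login.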
+## Multi-tenant isolation
+- Tenancy is an authZ concern, not a DB trick. Is the tenant id on every query? What enforces that — raw SQL, ORM hook, policy engine? (ORM hook is easy to bypass with raw queries; policy engine with query-rewriting is strongest.)
+- Is the tenant id from the **session**, never from the **request**? `?tenant_id=X` in the URL is a footgun.
+- Cross-tenant sharing (delegation, impersonation for support, data export): designed explicitly, or accidental because of a missing check?
+## Service-to-service
+- Zero-trust posture: does every internal call still carry and verify identity, or is the internal network treated as trusted? (Treat-as-trusted has failed in every post-mortem for a decade.)
+- How is service identity bootstrapped? Static long-lived secrets in env vars are the weakest option — workload identity (AWS IAM, GCP WIF, Kubernetes SA tokens, SPIFFE) is strongest.
+- Does the callee log *which* service called it? Without that, you can't incident-respond.
+## Refuse-list — if any of these are in the proposal, stop and redesign
+- Homegrown password hashing (`sha256(pw + salt)` is a fast hash, not a password hash).
+- Homegrown JWT signing/verification (pick a maintained library, prefer PASETO for new designs).
+- `alg: none` acceptance, or JWT libraries that don't pin algorithm.
+- "The internal network is trusted, so we skip auth between services."
+- Tenant id derived from the request path or query string rather than the session.
+- MFA that can be bypassed by enrolling a new device without re-verifying.
+- Password reset tokens that are long-lived, non-rotating, or tied to email only without rate-limit + recent-activity checks.
+---
+## Exit criteria
+- Each subsection above has a one-line answer, naming a specific primitive or library.
+- The refuse-list has been checked against the proposal; any hits are explicitly waived with a written "we accept this because...".
+- AuthZ model chosen and a first sketch of the policy (RBAC table / ABAC rules / ReBAC relations) exists.
+- You're ready to hand this section to the implementing engineer without ambiguity.

package/resources/skills/rafter-secure-design/docs/data-storage.md ADDED

@@ -0,0 +1,90 @@
+# Data Storage — Design Questions
+Where data lives determines blast radius. Every decision here is about making a compromise cheap to contain: key rotation, minimal retention, isolation by default.
+## Classify — what *is* this data?
+Before anything else, tag each field the design stores:
+- **Identifier**: email, username, account id. Useful to enumerate, often PII on its own.
+- **Credential**: password, API key, OAuth token, session id. Compromise = account takeover.
+- **PII / PHI / PCI**: personal / health / payment. Regulatory scope — GDPR, HIPAA, PCI-DSS apply. Which?
+- **Secret**: business-internal (encryption keys, signing keys, webhook secrets).
+- **Content**: user-generated text, files, comments. Defamation / CSAM risk is real — do you have a moderation path?
+- **Derived**: embeddings, summaries, ML features. Often treated as "not the original data" but can leak it — an embedding can sometimes reconstruct input.
+- **Audit / log**: who did what when. Usually keep-forever, but often contains identifiers — classify the fields, not just the collection.
+If a field doesn't fit a bucket, ask why it's being stored at all. **Data you don't have can't leak.**
+## Encryption at rest
+- Is the store's default disk-encryption enough (AWS RDS / GCP Cloud SQL / DynamoDB with CMK)? For most data, yes — don't add a second layer without a reason.
+- Application-level encryption is worth it when: (a) the DB operator is a different trust boundary than the app, (b) you need field-level access control tied to the app's authn, (c) compliance demands customer-managed keys. Don't encrypt application-side just to "feel safer" — you'll break queries, search, and analytics.
+- Envelope encryption (KMS-wrapped DEKs) is the pattern for app-side encryption. Who holds the KEK? Can you rotate it without re-encrypting every row?
+- Deterministic vs. randomized encryption: deterministic lets you query/join, but leaks equality. Decide per field.
+## Keys — the *actual* security boundary
+- Where do the keys live? KMS / HSM / Vault / env var? **Env var is weakest** — it ends up in logs, dumps, and `ps auxe` output.
+- Who can *use* the key (decrypt) vs. who can *manage* the key (rotate, destroy)? These must be separate IAM principals.
+- Rotation schedule: signing keys < 1 year, data keys rotated via envelope re-wrap (cheap), password hashing upgrade on login (transparent).
+- Key separation by tenant: one key per tenant is strongest (revoke = delete tenant key) but expensive. Per-tenant DEK with a shared KEK is a good middle ground.
+- Break-glass: how do you get out when KMS is down? Do you have a tested runbook, or will you find out during the incident?
+## Secrets (the application's own)
+- Application secrets (DB passwords, API keys, signing keys, webhook secrets) go in a secret manager (Vault, AWS Secrets Manager, GCP Secret Manager, Kubernetes Secrets with encryption-at-rest). **Not in env vars committed to repo; not in env vars set by deploy scripts that log them.**
+- Rotation: can you rotate without an outage? If the answer is "restart every service", that's OK for weekly but not daily. For rotation under pressure (leak detected), test the runbook *before* the leak.
+- Least privilege: each service gets its own secret with the smallest scope. The web frontend does not need the DB's admin password.
+## Encryption in transit
+- TLS everywhere, including internal. "Internal network is trusted" is not a posture, it's a wish.
+- What TLS version floor? TLS 1.2 is the practical minimum in 2026; 1.3 is the default for new designs.
+- Certificate management: automated (ACME, cert-manager, cloud-managed) or manual? Manual renewal is a recurring outage.
+- mTLS for service-to-service? Worth it when services are owned by different teams or span security zones.
+## Retention & deletion
+- How long does each class live? Default to **shortest defensible**; you're not obligated to keep it forever. Data you delete can't be subpoenaed, leaked, or breached.
+- GDPR / CCPA deletion: when a user requests deletion, *what* is deleted? Logs, backups, analytics exports, ML training sets, embeddings? If the answer is "we'll figure it out", you'll fail an audit.
+- Soft-delete vs. hard-delete: soft-delete is good for recovery windows, bad for compliance. After the window closes, hard-delete and prove it (cryptographic erasure = destroy the key).
+- Backup scope: backups inherit the data's sensitivity. Are backups encrypted with a *different* key than live data (so a live-data compromise doesn't grant backups)?
+## Tenancy isolation
+- Row-level: single DB, tenant_id column. Cheapest, weakest — one missed `where tenant_id` = cross-tenant leak.
+- Schema-per-tenant: same DB, separate schemas. Mid-cost, mid-strength — ORM must respect search_path.
+- DB-per-tenant: separate DB per tenant. Most expensive, strongest — compromise of one DB doesn't leak others.
+- Decide by the cost of a cross-tenant leak, not by current scale. Upgrading later means a data migration.
+## Logs — the forgotten data store
+- Do logs contain the data classified above? Request bodies wholesale, error messages with stack traces containing secrets, URL query strings with tokens, user inputs echoed back — all common.
+- Who reads logs? A dev on their laptop? A SaaS log provider? Each hop is a trust boundary. Scrub secrets before the hop.
+- Log retention is often longer than the application's data retention — a GDPR deletion that misses logs is incomplete.
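A minimal sketch of scrubbing secrets before a log line leaves the process, not taken from the package. The two patterns (bearer tokens and a few token-bearing query parameters) and the `scrub` / `ScrubFilter` names are assumptions; a real deployment needs patterns matched to its own credential formats.

```python
import logging
import re

_PATTERNS = [
    (re.compile(r"(?i)(authorization:\s*bearer\s+)\S+"), r"\1[REDACTED]"),
    (re.compile(r"(?i)([?&](?:token|api_key|access_token)=)[^&\s]+"), r"\1[REDACTED]"),
]


def scrub(text: str) -> str:
    for pattern, replacement in _PATTERNS:
        text = pattern.sub(replacement, text)
    return text


class ScrubFilter(logging.Filter):
    """Redacts the fully formatted message, including interpolated args."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = scrub(record.getMessage())
        record.args = ()
        return True
```

Attaching the filter to the handler that ships logs off-host (`handler.addFilter(ScrubFilter())`) puts the scrub at the trust boundary. Pattern-based redaction is a backstop; not logging request bodies and auth headers in the first place is the primary control.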
+## Caches, queues, search indices
+- Redis / Memcached / Elasticsearch / SQS / Kafka — each is a secondary data store. Classify what's in it.
+- Is the cache encrypted at rest? Accessible over TLS? Authenticated? **Unauthenticated Redis on a public IP remains a recurring source of cloud data leaks.**
+- Search indices often copy data verbatim — a deleted record in the DB can linger in Elasticsearch unless you wire the deletion into both.
+## Refuse-list
+- Custom crypto primitives ("we xor with a rotating key"). Pick an AEAD construction from a maintained library such as `libsodium`.
+- Storing passwords reversibly. Ever.
+- Log statements that print request bodies, auth headers, or token-bearing URLs.
+- Backups in the same blast radius as live data (same account, same region, same key).
+- Tenant isolation enforced only at the ORM layer (raw queries bypass it).
+- Embeddings / ML features stored without the classification that the source data had.
+---
+## Exit criteria
+- Every stored field has a classification and a retention policy.
+- Encryption story names specific keys, a specific KMS, and a rotation cadence.
+- Secret distribution path is explicit — not "env vars set by Terraform".
+- Deletion path is defined for each data class, including logs and backups.
+- Tenant isolation level is chosen with a written justification.

package/resources/skills/rafter-secure-design/docs/dependencies.md ADDED

@@ -0,0 +1,101 @@
+# Dependencies & Supply Chain — Design Questions
+Every dependency is a trust transfer: their bugs become yours, and you come to depend on their maintainers' goodwill. The question at design time is "is this worth the transfer?"
+## Pick vs. write — which one
+- Cryptography, authN / authZ primitives, parsers for complex formats, protocol implementations: **pick, don't write.** The library has years of eyes and fuzz time.
+- Glue code, config loaders, small utility functions: **write, don't pick.** A 5-line helper beats a transitively-huge dependency.
+- The middle (rate limiters, retry logic, caches): depends on how mature your language's standard library is. Go stdlib + a small helper often beats pulling in a 300-line middleware framework.
+## Maintenance signal — before you adopt
+Read the repo before adopting. Answer these in one sitting:
+- When was the last commit, release, CVE response? Dormant ≠ dead, but "last release 2019" for a security-adjacent lib is a risk.
+- How many maintainers? Solo-maintainer packages are a bus-factor and takeover risk (npm `event-stream`, PyPI `ctx`).
+- Does the project publish a security policy (SECURITY.md, GHSA history)? Projects that have handled past CVEs well tend to handle the next one well.
+- Download count and reverse-dependency count: high-popularity packages get eyes on them; low-popularity is higher chance of silent badness.
+- Typosquat / slopsquat check: is this the real package name? LLM-generated install instructions now routinely hallucinate package names that bad actors then register. Verify from the project's own README / GitHub.
+## Install-time execution
+- `postinstall` / `preinstall` / `prepare` hooks in npm, arbitrary `setup.py` code in Python, Gradle init scripts, Cargo build scripts — all run with your developer's or CI's permissions.
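+- Does your package manager have a way to disable these? `npm install --ignore-scripts`; pnpm has `--ignore-scripts` too, or an allowlist of packages permitted to run build scripts via `onlyBuiltDependencies`. Pip has `--only-binary :all:`, which refuses source distributions (and so never runs `setup.py`), but it is less granular.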
+- CI should install with the strictest flags. Developers can run with scripts enabled *after* review.
+## Pinning & lockfiles
+- Lockfile (`package-lock.json`, `pnpm-lock.yaml`, `yarn.lock`, `poetry.lock`, `Cargo.lock`, `go.sum`) committed. No exceptions for "libraries" — downstream lockfiles are the user's responsibility, but your CI needs reproducibility.
+- Range pinning in the manifest (`^1.2.3`) is fine for libraries; applications benefit from exact pins + a lockfile for reproducibility.
+- Lockfile verification in CI (`npm ci`, `pnpm install --frozen-lockfile`, `yarn install --immutable`, `poetry check --lock`). Without verification, a drifted lockfile ships unknown code.
+## Vendoring vs. registry
+- Registry (npm, PyPI, Go proxy, crates.io): convenient, but the registry is a trust root. Compromise of a maintainer account has shipped malware repeatedly.
+- Registry mirror / proxy (Artifactory, Cloudsmith, Google Artifact Registry): lets you cache + scan + pin. Best-of-both for teams with infra.
+- Vendoring: committing dependency code into your repo. Highest control, highest cost. Justified for (a) critical dependencies you need to patch locally, (b) airgapped builds, (c) compliance requirements.
+## SCA — hook it in, don't treat it as a quarterly task
+- SCA on every PR and on main: Dependabot, Renovate, Snyk, Trivy, Grype, `rafter run` (which aggregates SCA).
+- Auto-PRs for dependency updates: accept them with tests gating. Batching 3 months of updates is worse than a weekly drip.
+- Critical CVEs (known-exploited, CVSS ≥ 9): page on detection, not "log and review later".
+- Noise management: not every CVE applies to how you use the library. Triage policy is part of the design — who decides what's accepted, and how is the decision logged?
+## Supply chain attacks to design against
+- **Typosquat / slopsquat**: package name misspellings, especially for names an LLM might generate. Pin from upstream README only.
+- **Dependency confusion**: your private package name registered publicly. Publish a placeholder of your internal package names, or use scoped packages with registry routing.
+- **Maintainer takeover**: compromised maintainer account publishes malware. Defenses: pin by digest (where supported), monitor for unexpected releases.
+- **Protestware / hacktivism**: maintainer deliberately ships malware or destructive code (e.g., `node-ipc`). Pinning catches it; SCA post-mortem confirms.
+- **Compromised CI**: build-time tamper that injects malware into your artifact. Defenses: reproducible builds, signed provenance (SLSA), isolated build environment.
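A minimal sketch of "pin by digest" for a downloaded artifact, not taken from the package. It checks bytes against an integrity string in the `algorithm-base64` form that npm lockfiles record; `sri_digest` and `verify_integrity` are hypothetical helper names.

```python
import base64
import hashlib
import hmac

_ALLOWED_ALGORITHMS = {"sha256", "sha384", "sha512"}


def sri_digest(data: bytes, algorithm: str = "sha512") -> str:
    raw = hashlib.new(algorithm, data).digest()
    return f"{algorithm}-{base64.b64encode(raw).decode()}"


def verify_integrity(data: bytes, pinned: str) -> None:
    """Raise ValueError unless data matches the pinned integrity string."""
    algorithm, _, _ = pinned.partition("-")
    if algorithm not in _ALLOWED_ALGORITHMS:
        raise ValueError(f"unsupported algorithm: {algorithm!r}")
    if not hmac.compare_digest(sri_digest(data, algorithm), pinned):
        raise ValueError("artifact does not match pinned digest")
```

A pinned digest turns a maintainer-takeover release into a hard failure instead of a silent upgrade: the republished bytes no longer match what was reviewed.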
+## Transitive depth
+- How deep is the dep tree? `npm ls` / `cargo tree` / `pipdeptree`. Dozens of transitive deps per direct dep = huge attack surface.
+- Does each direct dep pull in its own HTTP client, its own JSON parser, its own date library? Consolidate at the application level where possible.
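+- Transitive version conflicts: which wins? In npm / pnpm, nesting and hoisting rules decide, and several versions can coexist. In Python, an environment holds one version of each package, so the resolver has to find a single compatible version or fail. Explicit `overrides` / `resolutions` let you force a patched version.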
+## Container images as dependencies
+- Base images are dependencies — same maintenance questions apply. Distroless (Google-maintained) and Chainguard (security-first) are first-party; random Docker Hub images are not.
+- Pin by digest. `image:tag` is mutable.
+- Multi-stage builds: builder image can be heavy; final image should be minimal. Don't ship your build toolchain to prod.
+- Image scanning in CI: `trivy image`, `grype`, cloud-native scanners. Block deploys on critical findings for production.
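A minimal sketch of a CI check that flags base images not pinned by digest, not taken from the package. `unpinned_images` is a hypothetical helper that handles only simple `FROM` lines (optional `--platform`, optional `AS` alias); it is not a full Dockerfile parser and does not expand build arguments.

```python
import re

_FROM = re.compile(r"^\s*FROM\s+(?:--platform=\S+\s+)?(\S+)", re.IGNORECASE)
_ALIAS = re.compile(r"\s+AS\s+(\S+)\s*$", re.IGNORECASE)
_DIGEST = re.compile(r"@sha256:[0-9a-f]{64}$")


def unpinned_images(dockerfile_text: str) -> list:
    """Return FROM references that use a mutable tag instead of a digest."""
    stages = set()   # names of earlier build stages
    findings = []
    for line in dockerfile_text.splitlines():
        match = _FROM.match(line)
        if not match:
            continue
        image = match.group(1)
        is_stage = image in stages or image.lower() == "scratch"
        if not is_stage and not _DIGEST.search(image):
            findings.append(image)
        alias = _ALIAS.search(line)
        if alias:
            stages.add(alias.group(1))
    return findings
```

Failing the build when the returned list is non-empty applies the digest rule to builder stages as well as the final image.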
+## SaaS dependencies
+- Adopting a SaaS is also a dep: your data, their availability and security posture.
+- Do they publish a SOC 2 / ISO 27001 / security whitepaper? Not gospel, but absence is a signal.
+- Where does the data live (region, sub-processors)? For PII, this is a compliance question.
+- Offboarding: if they vanish or you churn, how do you migrate? Vendor lock-in is a security issue too (can't rotate away from a breach).
+## LLM / AI libraries — the new supply chain
+- Model weights are dependencies. Which model, which version, hosted where?
+- Inference SDKs (openai, anthropic, litellm) are dependencies with the standard risks *plus* credential-surface (API keys per provider).
+- Vector DB clients (pinecone, qdrant, chroma) are dependencies that also hold your embeddings — classify accordingly.
+- `prompt-injection-guard` style libraries are pattern-based and will never catch novel attacks — adopt but don't trust absolutely.
+## Refuse-list
+- Pulling a dependency from a raw git URL or GitHub tarball without pinning commit SHA.
+- Adopting a package because an LLM suggested the name, without verifying it exists upstream (slopsquat bait).
+- `:latest` tags on base images or dependency versions.
+- CI that installs with `postinstall` enabled on every run without script review.
+- Solo-maintained packages in your critical path (auth, crypto, payments) without a forking / vendoring plan.
+- Adopting a SaaS for a compliance-scoped workload without reviewing their posture.
+- Skipping the lockfile because "we're a library".
+- SCA as a quarterly scan rather than a PR-level gate.
+---
+## Exit criteria
+- Every new direct dependency has a one-line justification (pick vs. write, maintenance signal reviewed).
+- Install-time execution policy is specified for CI.
+- Lockfile + verification in CI is confirmed.
+- SCA tool is wired to PRs, with a triage policy for findings.
+- Base images are pinned by digest with a rebuild cadence.
+- If the design uses a SaaS or LLM provider, the data-flow and credential-scope are drawn.

package/resources/skills/rafter-secure-design/docs/deployment.md ADDED

@@ -0,0 +1,104 @@
+# Deployment — Design Questions
+Deployment is where "the app is secure" meets reality. Network boundaries, runtime posture, secret distribution, build provenance: each decision made here outlives every refactor of the code.
+## Network topology — zones, not flat
+- Sketch zones: public edge (LB / CDN / WAF), app tier, data tier, admin tier, third-party egress. Each is a distinct security zone.
+- What traffic is allowed **between** zones, and what's denied by default? Default-deny is the only sane starting point. If the default is allow and you block selectively, you're one misconfiguration from exposure.
+- Public edge: what terminates TLS? Is there a WAF in front? A WAF is a cheap first-pass filter, not a substitute for app-side validation.
+- Admin access (SSH, kube-exec, DB console): over the public internet? Over a VPN / zero-trust proxy (Tailscale, Cloudflare Access, Teleport)? The public-internet-with-a-bastion is a 2005 pattern.
+## Egress — the forgotten boundary
+- Can your app reach arbitrary internet destinations? Default should be "allowlist of known egress targets" (external APIs you integrate with, OS package mirrors, telemetry).
+- Egress control is the best SSRF defense *and* the best data-exfiltration defense. If a compromised app can only reach `api.stripe.com`, the blast radius is Stripe calls.
+- Metadata services (169.254.169.254): block at the network layer, not just in the app. IMDSv2 on AWS (session token required, response hop limit set to 1) blocks the rebinding variant.
+## Identity & IAM
+- Every compute workload has a workload identity (AWS IAM role, GCP service account, Kubernetes ServiceAccount + bound tokens, SPIFFE ID). **Not shared credentials, not long-lived keys.**
+- Least privilege per workload. "The web service has DB read + DB write + admin on this one table" is better than "the web service has AdminAccess".
+- Break-glass access: there's an auditable path for a human to gain emergency privileges. Not a shared `root` password.
+- IAM changes go through code review (Terraform PR, Pulumi PR). Click-ops IAM is how wide-open permissions persist.
+## Secret distribution
+- Where does each service get its secrets? Secret manager (Vault, AWS SM, GCP SM, Kubernetes Secrets with sealed / external-secrets), *not* Terraform-plan output, *not* env vars set by a deploy script that logs them.
+- Secrets rotate. Short-lived DB credentials (Vault dynamic secrets, IAM database auth) > long-lived passwords. If your design says "quarterly rotation of a static password", name who does it and how.
+- Secrets are scoped per service. The web tier doesn't have the admin DB credential.
+- Encryption-at-rest for the secret manager itself: on by default for cloud-managed services; verify it for self-hosted ones.
+- Secrets in CI: scoped per job, never printed to logs, masked in output. PR workflows triggered from forks don't see secrets.
+## Container / runtime posture
+- Run as non-root. If `USER 0` or `runAsUser: 0`, flag it.
+- Read-only root filesystem where possible. Writable mounts are explicit (`/tmp`, named volumes).
+- Capabilities: drop all, add back only what's needed. `CAP_NET_BIND_SERVICE` is the usual one.
+- Seccomp / AppArmor / SELinux profile: a real profile, not "Unconfined".
+- Resource limits: CPU and memory limits per container. No limit = one compromised pod can starve the node.
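The posture checklist above can be turned into a mechanical review step. The following is a minimal sketch, assuming the container spec has already been parsed into a dict that uses the Kubernetes field names (`securityContext`, `runAsUser`, `runAsNonRoot`, `readOnlyRootFilesystem`, `capabilities.drop`, `resources.limits`); the function name `posture_findings` and the finding strings are hypothetical.

```python
def posture_findings(container: dict) -> list[str]:
    """Return human-readable findings for one parsed container spec."""
    sc = container.get("securityContext", {})
    findings = []

    if sc.get("privileged"):
        findings.append("privileged container")

    # Root if explicitly UID 0, or if nothing forbids it.
    run_as_user = sc.get("runAsUser")
    if run_as_user == 0 or (run_as_user is None and not sc.get("runAsNonRoot", False)):
        findings.append("may run as root")

    if not sc.get("readOnlyRootFilesystem", False):
        findings.append("writable root filesystem")

    # "Drop all, add back only what's needed" means ALL must be in the drop list.
    if "ALL" not in sc.get("capabilities", {}).get("drop", []):
        findings.append("capabilities not dropped")

    limits = container.get("resources", {}).get("limits", {})
    for resource in ("cpu", "memory"):
        if resource not in limits:
            findings.append(f"no {resource} limit")

    return findings
```

An empty result means the spec meets the listed checks; it says nothing about the seccomp / AppArmor / SELinux profile, which this sketch does not inspect.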
+## Base images
+- Distroless / Alpine / minimal / scratch > Ubuntu full. Fewer packages = fewer CVEs, smaller attack surface.
+- Pin by digest (`image@sha256:...`), not tag. `:latest` and even `:v1.2.3` can be overwritten; digests are immutable.
+- SCA on base images in CI. Re-pull / rebuild cadence (weekly) to pick up upstream patches.
+- Who maintains the base image? First-party (your team) > team-adjacent > "some Docker Hub account". Unmaintained bases rot.
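The "pin by digest" rule is easy to enforce in CI. Below is a minimal sketch that flags image references not ending in a `sha256` digest; the helper names `is_digest_pinned` and `unpinned_images` are hypothetical, and extracting the references from manifests or Dockerfiles is assumed to happen elsewhere.

```python
import re

# A pinned reference ends in "@sha256:" followed by exactly 64 lowercase hex chars.
_DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}\Z")


def is_digest_pinned(image_ref: str) -> bool:
    """True for name[:tag]@sha256:<digest>, False for tag-only or bare names."""
    return _DIGEST_RE.search(image_ref) is not None


def unpinned_images(image_refs: list[str]) -> list[str]:
    """Return the references a CI gate should reject."""
    return [ref for ref in image_refs if not is_digest_pinned(ref)]
```

A CI job could fail the build whenever `unpinned_images(...)` returns a non-empty list.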
+## Build provenance & supply chain
+- Is the build reproducible? Given the same inputs, does a rebuild produce the same artifact? Not always achievable, but worth asking.
+- SLSA level: aim for SLSA Build L3 (hardened, isolated build platform whose provenance the build itself cannot forge) for anything shipping to production; L2 adds a hosted builder and signed provenance. L1 (provenance exists) is the minimum.
+- Artifact signing: Sigstore / Cosign / Notary. Signatures verified at deploy, not just at build.
+- Dependency pinning: lockfile committed, lockfile verified in CI.
+- `postinstall` / `prepare` scripts from dependencies: ban or audit. These execute arbitrary code on install — it's the npm supply-chain attack class.
+- SBOM generation at build time. Store it with the artifact.
+## CI/CD posture
+- Who can deploy to prod? Production deploys gated on approval, signed tags, or protected branch merges.
+- CI runners: ephemeral (fresh VM / container per job), not long-running hosts with persistent state.
+- Workflow permissions: least-privilege GITHUB_TOKEN / equivalent. Write-all is the click-to-compromise default.
+- Self-hosted runners + public repo = RCE: a pull request from a fork can run attacker-controlled code on your runner. Either make the repo private, use GitHub-hosted runners for public workflows, or lock runners to specific workflows.
+- Branch protection: required reviews, required status checks, no force-push to main. Linear history if you need audit simplicity.
+## Production-vs-staging parity
+- Same architecture in staging as prod, with masked / synthetic data. Staging that uses prod data = a second prod blast radius with half the controls.
+- Config differences are explicit and minimal. "We disable auth in staging" is how auth gets disabled in prod one day by accident.
+- Feature flags that default-off in prod and default-on in staging: tested in both states.
+## Multi-region / DR
+- If the design spans regions: is the active/passive or active/active model clear? What's replicated, what's per-region?
+- Encryption keys per region, or a global key? (Global is simpler but expands blast radius.)
+- Failover runbook exists and was tested in the last 12 months. Not-yet-tested = doesn't work.
+## Logging & monitoring posture
+- Structured logs, shipped to a separate system (not the same DB the app writes to). A compromise of app storage shouldn't delete the audit trail.
+- Authentication to the log system: workload identity, not shared token.
+- What paging signals exist? Login-anomaly rates, authZ denials, 5xx surges, unusual egress — without these, the breach is found by the customer.
+- Retention: logs often outlive production data. Classify log contents and apply retention accordingly.
+## Refuse-list
+- Long-lived static cloud credentials baked into container images or env vars.
+- Privileged containers (`privileged: true`, `runAsUser: 0` without justification).
+- `:latest` tags or unpinned base images in production manifests.
+- CI workflows with write-all GITHUB_TOKEN scope by default.
+- "We'll add network policy later" — network default-allow is not a plan.
+- Secrets set via Terraform variable with plan output visible in logs.
+- Shared SSH keys, shared `root` password, shared admin console.
+- Metadata service reachable from a public-facing container (IMDSv1, or IMDSv2 with unlimited hop count).
+---
+## Exit criteria
+- Zone diagram exists; cross-zone traffic is allowlisted, not denylisted.
+- Each workload has a named identity and a scoped IAM role.
+- Secret distribution names the secret manager and the rotation model.
+- Container runtime posture is specified: user, filesystem, capabilities, resource limits.
+- Build pipeline specifies provenance (SLSA), signing, and dependency pinning.
+- Log shipping + retention is set, independent of application storage.

package/resources/skills/rafter-secure-design/docs/ingestion.md ADDED

@@ -0,0 +1,98 @@
+# Ingestion — Design Questions
+Every byte crossing your trust boundary is a question: "who says this is safe, and how?" Most of the OWASP Top 10 lives at ingestion — parsers, decoders, fetchers, uploaders.
+## Trust boundaries — name them
+- Draw the boundary: external (internet, partner API, user upload) → your edge → your internal services → your storage.
+- Each boundary crossing is a *validation point*. Validation means: shape check (schema), size check (bytes / fields), semantic check (does this make sense here?).
+- Validation at the edge is necessary but not sufficient — internal services that re-read the data need to re-validate if the trust delta matters (e.g., a cached input re-used later as a filename).
+- Parsers *are* the boundary for complex formats. A "validated JSON blob" that contains an eval-able code path is still a hole.
+## Input schemas — declare, don't hand-parse
+- Have a typed schema for every external input: JSON Schema, Zod, Pydantic, protobuf, OpenAPI-generated types. Reject unknown fields (`additionalProperties: false`).
+- Accepting unknown fields is how mass-assignment bugs enter — the attacker ships `is_admin: true` and the schema silently accepts it.
+- Length / size / range bounds on every field. Strings have max lengths, numbers have ranges, arrays have max sizes, nesting has max depth. Unbounded = DoS shape.
+- Regex validation: anchor with `^` and `$`. Fear catastrophic backtracking — test with a regex-safety linter or prefer RE2-backed engines.
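To make the schema rules above concrete, here is a minimal stdlib-only sketch of a validator that rejects unknown fields, checks types, bounds a number, and matches a string against a whole-string pattern. The `username` / `age` fields, their limits, and the name `validate_signup` are hypothetical; a real design would more likely use one of the schema libraries named above.

```python
import re

# fullmatch() below requires the whole string to match, so no explicit anchors are needed.
USERNAME_RE = re.compile(r"[a-z0-9_]{3,32}")
FIELD_TYPES = {"username": str, "age": int}


def validate_signup(payload: dict) -> dict:
    """Return a cleaned copy of payload or raise ValueError."""
    if not isinstance(payload, dict):
        raise ValueError("payload must be an object")

    # Reject unknown fields: this is what stops `is_admin: true` from riding along.
    unknown = set(payload) - set(FIELD_TYPES)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    missing = set(FIELD_TYPES) - set(payload)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")

    for name, expected in FIELD_TYPES.items():
        # Exact type check, so that True is not accepted as an int.
        if type(payload[name]) is not expected:
            raise ValueError(f"{name}: wrong type")

    if USERNAME_RE.fullmatch(payload["username"]) is None:
        raise ValueError("username: bad characters or length")
    if not 13 <= payload["age"] <= 120:
        raise ValueError("age: out of range")

    return {name: payload[name] for name in FIELD_TYPES}
```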
+## Size limits — everywhere, early
+- Request body size cap at the edge (reverse proxy / API gateway). Don't rely on the framework to cap — it parses first, rejects second.
+- Per-field limits inside the body.
+- Upload size limits, file-count limits, total-request-size limits.
+- Decoder limits: JSON depth, XML entity count, zip expansion ratio (zip bomb). The default parser often has no cap — configure it explicitly.
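As one illustration of an explicit decoder limit, the sketch below pre-checks a zip archive's declared sizes before anything is extracted. The thresholds and the name `check_zip` are assumptions. Because declared sizes come from the archive itself and can be forged, the same caps should still be enforced on the bytes actually read during extraction.

```python
import io
import zipfile

MAX_TOTAL_BYTES = 50 * 1024 * 1024  # assumed cap on total uncompressed size
MAX_RATIO = 100                     # assumed cap on per-entry expansion ratio


def check_zip(data: bytes, max_total: int = MAX_TOTAL_BYTES, max_ratio: int = MAX_RATIO) -> int:
    """Return the declared uncompressed size, or raise ValueError if limits are exceeded."""
    total = 0
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            total += info.file_size
            if total > max_total:
                raise ValueError("archive expands beyond the size limit")
            # Skip the ratio check for empty entries to avoid dividing by zero.
            if info.compress_size > 0 and info.file_size / info.compress_size > max_ratio:
                raise ValueError(f"suspicious compression ratio in {info.filename!r}")
    return total
```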
+## Parser selection — safe default, not fast default
+- JSON: language-standard parser with strict mode. Reject duplicate keys (behavior varies across parsers — pick one that matches what your schema validator sees).
+- YAML: `yaml.safe_load` in Python, `js-yaml` `load` on 4.x (safe by default; use `safeLoad` on 3.x) or an explicit schema, `serde_yaml` with explicit types. **Never `yaml.load` without `SafeLoader`.**
+- XML: disable external entity resolution (XXE). `defusedxml` in Python, libraries with XXE off by default. If your design needs XML, flag this explicitly and pick the right library.
+- CSV: beware formula injection (`=CMD(...)` in a field opened by Excel). When exporting, prefix any field that starts with `= + - @ \t \r` with a single quote (`'`) so spreadsheets treat it as text.
+- Protobuf / Thrift / MessagePack: safe-by-construction for schema violations, but size limits still needed.
+- Regex-heavy parsers: ReDoS risk. Prefer PEG / EBNF grammars for untrusted input where possible.
+- HTML / Markdown: never assign untrusted markup to `innerHTML` raw; always sanitize first (DOMPurify, bleach). Markdown renderers have inline-HTML modes; disable them for untrusted content.
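The CSV export rule in the list above can be sketched as follows. This assumes a leading single quote as the neutralizing prefix, which is one common mitigation; the names `neutralize_cell` and `export_rows` are hypothetical.

```python
import csv
import io

# Leading characters that spreadsheet applications may interpret as a formula.
FORMULA_PREFIXES = ("=", "+", "-", "@", "\t", "\r")


def neutralize_cell(value: str) -> str:
    """Prefix risky cells with a single quote so they are treated as text."""
    if value.startswith(FORMULA_PREFIXES):
        return "'" + value
    return value


def export_rows(rows) -> str:
    """Serialize rows to CSV text with every cell neutralized."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    for row in rows:
        writer.writerow([neutralize_cell(str(cell)) for cell in row])
    return buffer.getvalue()
```

One trade-off of this blanket approach: legitimate values such as negative numbers (`-5`) are also prefixed, so numeric columns may need a separate, type-aware path.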
+## Deserialization — the silent RCE
+- Any of `pickle.loads`, `yaml.load` (default), Java `ObjectInputStream`, PHP `unserialize`, .NET `BinaryFormatter`, `Marshal.load` — on untrusted bytes — is RCE-shaped.
+- If you *need* cross-language serialization: JSON, Protobuf, MessagePack, Avro. If you *need* native: sign the payload (HMAC) so only your own emitters are accepted, and still validate after deserialization.
+- Node `JSON.parse` + object assignment: prototype pollution via `__proto__` / `constructor` / `prototype` keys. Use `Object.create(null)` for dictionaries or a library that filters.
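The "sign the payload (HMAC)" advice can be sketched with the standard library. This is a minimal illustration, assuming a shared secret key distributed out of band and a simple `tag || payload` framing; the names `sign` and `verify` are hypothetical. The point is that deserialization only happens after the tag has been checked.

```python
import hashlib
import hmac

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes for HMAC-SHA256


def sign(payload: bytes, key: bytes) -> bytes:
    """Prepend an HMAC-SHA256 tag to the payload."""
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload


def verify(blob: bytes, key: bytes) -> bytes:
    """Return the payload if the tag is valid, else raise ValueError."""
    tag, payload = blob[:TAG_LEN], blob[TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # compare_digest runs in constant time to avoid leaking how many bytes matched.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature mismatch: refusing to deserialize")
    return payload
```

Only the bytes returned by `verify` should be handed to the native deserializer, and the resulting object still needs validation afterwards.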
+## File uploads
+- What file types are accepted? Allowlist by **content sniff + declared MIME + extension**, not any one of them alone.
+- Storage: write under a random name (UUID) — never preserve the client-supplied filename in the path. Preserving it enables path traversal and overwrite attacks.
+- Scanner: for user-to-user content, run an AV / malware scan. For images, re-encode to strip EXIF + polyglot tricks.
+- Serving: serve from a different origin / subdomain than your app (so a rendered SVG or HTML can't steal same-origin cookies). Set `Content-Disposition: attachment` for anything that isn't trusted media.
+- Size: per-file and per-user/per-day quotas. Unbounded upload = cheap DoS + storage bomb.
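A minimal sketch of the allowlist and naming rules above: the extension, the declared MIME type, and the leading bytes must all agree, and the stored name is a random UUID that never includes the client's filename. The accepted types and the name `storage_name` are assumptions; a production design would likely use a dedicated content-sniffing library rather than this small magic-byte table.

```python
import uuid
from pathlib import PurePosixPath

# extension -> (declared MIME type, leading magic bytes)
ALLOWED = {
    "png": ("image/png", b"\x89PNG\r\n\x1a\n"),
    "jpg": ("image/jpeg", b"\xff\xd8\xff"),
    "pdf": ("application/pdf", b"%PDF-"),
}


def storage_name(client_filename: str, declared_mime: str, head: bytes) -> str:
    """Return a random storage name, or raise ValueError if the checks disagree."""
    ext = PurePosixPath(client_filename).suffix.lower().lstrip(".")
    if ext == "jpeg":
        ext = "jpg"
    if ext not in ALLOWED:
        raise ValueError("extension not allowed")

    expected_mime, magic = ALLOWED[ext]
    if declared_mime != expected_mime:
        raise ValueError("declared MIME type does not match extension")
    if not head.startswith(magic):
        raise ValueError("content does not match declared type")

    # Only the vetted extension survives; the client-supplied name is discarded.
    return f"{uuid.uuid4().hex}.{ext}"
```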
+## Server-side fetchers — SSRF-shaped
+If any part of the design does "take a URL from user, fetch it":
+- Is there a concrete business reason? Image proxy, webhook configurer, PDF-from-URL, OAuth metadata fetch — each is a known SSRF vector.
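+- Allowlist the destination **after** DNS resolution. `https://attacker.com` that DNS-resolves to `127.0.0.1` passes any hostname check, so resolve first, validate the resulting IP, then connect to that exact IP. A second lookup at connect time is what DNS rebinding exploits.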
+- Deny: RFC1918 (10/8, 172.16/12, 192.168/16), link-local (169.254/16), loopback (127/8, ::1), cloud metadata (169.254.169.254, metadata.google.internal, fd00:ec2::254), IPv6 equivalents, and any internal CIDR you own.
+- Redirects are fresh SSRF checks per hop. Disable redirects or re-validate each one.
+- Timeouts + max-response-size: unbounded fetches = DoS.
+- Response parsing: the fetched content is *still untrusted*. Don't eval it, don't template it, don't copy it to storage unsanitized.
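The resolve-then-decide step from the list above can be sketched with the standard library. This is an illustration only: the names `is_forbidden` and `resolve_checked` are hypothetical, it relies on Python's `ipaddress` classification rather than an explicit CIDR list, and any internal ranges you own would need to be added separately.

```python
import ipaddress
import socket


def is_forbidden(ip_text: str) -> bool:
    """True for loopback, private, link-local, metadata and other non-global addresses."""
    ip = ipaddress.ip_address(ip_text.split("%", 1)[0])  # drop any IPv6 zone id
    # Unwrap IPv4-mapped IPv6 (::ffff:a.b.c.d) so it cannot bypass the IPv4 rules.
    mapped = getattr(ip, "ipv4_mapped", None)
    if mapped is not None:
        ip = mapped
    return (not ip.is_global) or ip.is_multicast


def resolve_checked(host: str, port: int = 443) -> list[str]:
    """Resolve host once and return its addresses, or raise if any is forbidden."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addresses = sorted({info[4][0] for info in infos})
    if not addresses or any(is_forbidden(addr) for addr in addresses):
        raise ValueError(f"destination {host!r} resolves to a forbidden address")
    return addresses
```

The caller should connect to one of the returned addresses directly instead of passing the hostname to the HTTP client again, and should repeat the check for every redirect hop.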
+## Content rendering — templates, markdown, rich text
+- Which template engine? Autoescape on by default for HTML (`{{ user }}` escapes). The unsafe marker is `|safe` (Jinja), `{!!  !!}` (Blade), `dangerouslySetInnerHTML` (React), `v-html` (Vue). Every use of the unsafe marker is a review point.
+- Markdown: does the renderer allow inline HTML? For untrusted authors, disable it or sanitize post-render with a DOMPurify-equivalent.
+- Rich text (TinyMCE, Quill, Slate): sanitize the HTML output *server-side* before storing. Client-side sanitization is advisory, not authoritative.
+- SVG: SVGs can embed scripts. Re-render to PNG server-side, or sanitize with a tool that strips `<script>`, event handlers, and external references.
+## Search inputs
+- Full-text search: user input goes into a query parser (Lucene syntax, etc.). Is there an injection risk (`field:*` to bypass scoping)? Sanitize or use parameterized search API.
+- Sort / filter parameters: if user-controlled, allowlist the column names. `ORDER BY {user_input}` is SQL injection even if the rest of the query is parameterized.
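The column allowlist for sort parameters might look like the sketch below. The column names, the mapping, and `order_by_clause` are hypothetical; the idea is that user input only ever selects from fixed strings and is never interpolated into SQL itself.

```python
# Public sort keys mapped to real column names; nothing else can reach the query.
SORTABLE_COLUMNS = {
    "created": "created_at",
    "name": "name",
    "price": "price_cents",
}
DIRECTIONS = {"asc": "ASC", "desc": "DESC"}


def order_by_clause(sort_key: str, direction: str = "asc") -> str:
    """Build an ORDER BY clause from allowlisted values or raise ValueError."""
    try:
        column = SORTABLE_COLUMNS[sort_key]
        direction_sql = DIRECTIONS[direction.lower()]
    except KeyError:
        raise ValueError("unsupported sort parameter") from None
    return f"ORDER BY {column} {direction_sql}"
```

The rest of the query would still use bound parameters for values; this helper only covers the identifier position, which placeholders cannot fill.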
+## Imports (batch data)
+- CSV / XLS / JSON imports are trust-boundary crossings at scale. Same rules — schema, size, field limits — applied per row.
+- Streaming vs. load-all: streaming is kinder to memory and enables early rejection. Load-all with a 1GB file = OOM.
+- Partial-failure semantics: if row 500 is bad, does the import roll back rows 1-499? Either answer can be right, but it must be *decided*, not accidental.
+## Refuse-list
+- `yaml.load` / `pickle.loads` / `Marshal.load` on any externally-sourced bytes.
+- XML parsers with external entity resolution enabled.
+- Uploads stored under client-supplied filenames.
+- Server-side URL fetchers without an allowlist + post-DNS IP denylist.
+- Schemas that accept unknown fields (`additionalProperties: true` by default).
+- Unbounded sizes: no request body cap, no per-field length, no decoder depth limit.
+- Markdown / HTML rendering of untrusted content without server-side sanitization.
+- Regex patterns without anchors or on backtracking engines with untrusted input.
+---
+## Exit criteria
+- Every external input has a named schema and a size/shape limit.
+- Parser choices are listed with the safe variant selected.
+- If any fetcher is in the design, its allowlist + IP denylist + redirect policy is specified.
+- File upload flow names the content-sniff library, the storage-naming scheme, and the serving origin.
+- The design identifies every "untrusted bytes → executable context" path and closes it.