npm - @rafter-security/cli - Versions diffs - 0.6.6 → 0.7.1 - Mend

@rafter-security/cli 0.6.6 → 0.7.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (70)

package/README.md +29 -10
package/dist/commands/agent/audit-skill.js +22 -20
package/dist/commands/agent/audit.js +27 -0
package/dist/commands/agent/components.js +800 -0
package/dist/commands/agent/config.js +2 -1
package/dist/commands/agent/disable.js +47 -0
package/dist/commands/agent/enable.js +50 -0
package/dist/commands/agent/exec.js +2 -0
package/dist/commands/agent/index.js +6 -0
package/dist/commands/agent/init.js +162 -163
package/dist/commands/agent/install-hook.js +15 -14
package/dist/commands/agent/list.js +72 -0
package/dist/commands/agent/scan.js +4 -3
package/dist/commands/agent/verify.js +1 -1
package/dist/commands/backend/run.js +12 -3
package/dist/commands/backend/scan-status.js +3 -2
package/dist/commands/brief.js +22 -2
package/dist/commands/ci/init.js +25 -21
package/dist/commands/completion.js +4 -3
package/dist/commands/docs/index.js +18 -0
package/dist/commands/docs/list.js +37 -0
package/dist/commands/docs/show.js +64 -0
package/dist/commands/mcp/server.js +84 -0
package/dist/commands/report.js +42 -41
package/dist/commands/scan/index.js +7 -5
package/dist/commands/skill/index.js +14 -0
package/dist/commands/skill/install.js +89 -0
package/dist/commands/skill/list.js +79 -0
package/dist/commands/skill/registry.js +273 -0
package/dist/commands/skill/remote.js +333 -0
package/dist/commands/skill/review.js +975 -0
package/dist/commands/skill/uninstall.js +65 -0
package/dist/core/audit-logger.js +262 -21
package/dist/core/config-manager.js +3 -0
package/dist/core/docs-loader.js +148 -0
package/dist/core/policy-loader.js +72 -1
package/dist/core/risk-rules.js +16 -3
package/dist/index.js +19 -9
package/dist/scanners/gitleaks.js +6 -2
package/package.json +1 -1
package/resources/skills/rafter/SKILL.md +77 -97
package/resources/skills/rafter/docs/backend.md +106 -0
package/resources/skills/rafter/docs/cli-reference.md +199 -0
package/resources/skills/rafter/docs/finding-triage.md +79 -0
package/resources/skills/rafter/docs/guardrails.md +91 -0
package/resources/skills/rafter/docs/shift-left.md +64 -0
package/resources/skills/rafter-agent-security/SKILL.md +1 -1
package/resources/skills/rafter-code-review/SKILL.md +91 -0
package/resources/skills/rafter-code-review/docs/api.md +90 -0
package/resources/skills/rafter-code-review/docs/asvs.md +120 -0
package/resources/skills/rafter-code-review/docs/cwe-top25.md +78 -0
package/resources/skills/rafter-code-review/docs/investigation-playbook.md +101 -0
package/resources/skills/rafter-code-review/docs/llm.md +87 -0
package/resources/skills/rafter-code-review/docs/web-app.md +84 -0
package/resources/skills/rafter-secure-design/SKILL.md +103 -0
package/resources/skills/rafter-secure-design/docs/api-design.md +97 -0
package/resources/skills/rafter-secure-design/docs/auth.md +67 -0
package/resources/skills/rafter-secure-design/docs/data-storage.md +90 -0
package/resources/skills/rafter-secure-design/docs/dependencies.md +101 -0
package/resources/skills/rafter-secure-design/docs/deployment.md +104 -0
package/resources/skills/rafter-secure-design/docs/ingestion.md +98 -0
package/resources/skills/rafter-secure-design/docs/standards-pointers.md +102 -0
package/resources/skills/rafter-secure-design/docs/threat-modeling.md +128 -0
package/resources/skills/rafter-skill-review/SKILL.md +106 -0
package/resources/skills/rafter-skill-review/docs/authorship-provenance.md +82 -0
package/resources/skills/rafter-skill-review/docs/changelog-review.md +99 -0
package/resources/skills/rafter-skill-review/docs/data-practices.md +88 -0
package/resources/skills/rafter-skill-review/docs/malware-indicators.md +79 -0
package/resources/skills/rafter-skill-review/docs/prompt-injection.md +85 -0
package/resources/skills/rafter-skill-review/docs/telemetry.md +78 -0

package/resources/skills/rafter-code-review/docs/asvs.md ADDED

@@ -0,0 +1,120 @@
+# OWASP ASVS — Picking a Level, Spot-Checking It
+The Application Security Verification Standard (ASVS 4.0 / 5.0) is a catalog of verification requirements. Unlike Top 10 lists, it's exhaustive — which is why you pick a level first and spot-check, not walk every item.
+## Step 1 — Pick the right level
+| Level | Use for | Rough test |
+|---|---|---|
+| **L1** — Opportunistic | Low-value apps, internal tooling, marketing sites. "Protect against casual, opportunistic attackers." | Would losing this data be annoying but not damaging? |
+| **L2** — Standard | Most apps that handle user data, B2B SaaS, line-of-business apps. "Default for apps handling sensitive data." | PII, payment, auth, health-adjacent, B2B tenants? |
+| **L3** — Advanced | Apps where compromise leads to real harm: financial transactions, healthcare records, critical infrastructure, high-trust platforms. | Regulatory scrutiny? Lives/money at risk? |
+**Rule of thumb**: pick the lowest level that matches the *highest-sensitivity* data flow in the scope. Don't average. A single admin endpoint that touches payment data is enough to pull the whole scope to L2; if the rest of the service really is L1, review that endpoint as its own L2 scope.
+---
+## Step 2 — Spot-check (not walk-every-item)
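+ASVS has 280+ requirements. You will not walk them all in a PR review. Instead, for the level you picked, ask the questions below for each category. (The chapter numbers follow ASVS 4.0; 5.0 renumbers and regroups them, so cite the version you checked against.)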
+### V1 — Architecture, Design, Threat Modeling
+- Is there a threat model for this feature? (L2+: yes; L3: reviewed and signed.)
+- Are all components inventoried (deps, services, data stores)?
+- Does the design document trust boundaries and assumptions?
+### V2 — Authentication
+- Password policy: min length 8 (L1) / 12 (L2) / 12+MFA (L3); hashed with argon2/bcrypt/scrypt; never truncated or case-normalized in storage.
+- MFA: not required at L1; required for admin/sensitive at L2; required for all at L3.
+- Credential recovery: does not bypass MFA; uses time-limited, single-use tokens; does not leak account existence.
+### V3 — Session Management
+- Server-side session store (or stateless token with revocation)?
+- Session rotation on login / privilege change / logout?
+- Cookie flags: `Secure`, `HttpOnly`, `SameSite=Lax` or stricter; `Domain` scoped tightly.
+### V4 — Access Control
+- Deny-by-default on new endpoints?
+- Per-object authz (BOLA) enforced where user ids appear in URLs?
+- Admin functions require re-authentication at L2+; require MFA step-up at L3.
+### V5 — Validation, Sanitization, Encoding
+- Input: validated at the trust boundary (not downstream) against a positive spec (allowlist, schema, or fully anchored regex).
+- Output: context-aware encoding (HTML / URL / JSON / shell / SQL). Templating auto-escapes?
+- Parsers: safe loaders for YAML/XML/JSON; no string-concat into query languages.
+### V6 — Stored Cryptography
+- Algorithms: AES-GCM, ChaCha20-Poly1305, SHA-256+, HMAC-SHA-256+, PBKDF2/argon2 for passwords.
+- Key management: keys not in source, rotated, scoped per-environment, stored in a managed KMS at L2+.
+- Randomness: `secrets` / `crypto.randomBytes` / `crypto/rand` — never `rand()` / `Math.random()`.
+### V7 — Error Handling & Logging
+- No secrets / PII in logs (grep log statements).
+- No stack traces to the user in production.
+- Authn, authz, and sensitive actions logged with who/when/what.
+### V8 — Data Protection
+- PII classified? Access to it logged?
+- Data at rest encrypted (disk, DB, backups, cache)? Keys managed separately?
+- Data in transit: TLS 1.2+ with modern ciphers; HSTS.
+### V9 — Communications
+- TLS everywhere, including internal hops? At L3, mutual TLS between services.
+- Certificate validation not disabled anywhere (grep `InsecureSkipVerify`, `verify=False`, `rejectUnauthorized: false`).
+### V10 — Malicious Code
+- No hardcoded backdoors, debug logins, "magic" accounts.
+- Build pipeline integrity: signed artifacts, locked deps, reproducible where possible.
+### V11 — Business Logic
+- Sequential flows (checkout, signup, reset): can steps be skipped? Replayed? Reordered?
+- Anti-automation on abuse-prone flows (captcha, proof of work, device fingerprint)?
+### V12 — Files & Resources
+- Uploads: type-sniffed (not extension-trusted), size-limited, stored outside web root, name-randomized.
+- Downloads: path-confined; no `../` traversal; MIME set explicitly.
+- Archives: zip-slip / tar-slip prevention when extracting.
+### V13 — API & Web Service
+- OpenAPI / GraphQL schema present and matches implementation?
+- Auth per-endpoint, not per-service?
+- Rate limits per-endpoint tier?
+### V14 — Configuration
+- Production config reviewed (debug off, defaults rotated, unused features off)?
+- Dependencies: SCA in CI, lockfile committed, base images pinned.
+- Secrets: none in source, rotated on exposure, scoped per environment.
+---
+## Step 3 — What to produce
+Not an ASVS report. A list of **citations** keyed by (level, category, requirement), plus **gaps**. Example:
+```
+L2 / V4.1.3 (deny by default) — OK, `authMiddleware` registered globally in app.ts:41
+L2 / V4.2.1 (per-object authz) — GAP, /orders/:id handler (orders.ts:88) lacks owner check
+L2 / V5.1.4 (schema validation) — OK, zod schemas in routes/*.ts
+```
+---
+## Tie-backs
+- Concrete vulnerabilities (not requirements): go to `web-app.md` / `api.md` / `llm.md`.
+- Specific finding investigation: `investigation-playbook.md`.
+- Automated coverage: `rafter run --mode plus` produces ASVS-tagged findings.
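ASVS V2 and V6 in `asvs.md` above name acceptable password-hashing schemes without showing correct use. The following is a minimal standard-library sketch, not Rafter code. It uses PBKDF2-HMAC-SHA256 (one of the options V6 lists), a fresh random salt per password, and a constant-time comparison. The 600,000-iteration default is an assumption taken from OWASP's Password Storage Cheat Sheet for this algorithm, so re-check it against current guidance. Where you can, prefer argon2id through a vetted library. The `pbkdf2_sha256$iterations$salt$digest` storage format is hypothetical.

```python
import hashlib
import hmac
import secrets

DEFAULT_ITERATIONS = 600_000  # assumption: OWASP guidance for PBKDF2-HMAC-SHA256


def hash_password(password: str, iterations: int = DEFAULT_ITERATIONS) -> str:
    """Return a self-describing hash: scheme$iterations$salt_hex$digest_hex."""
    salt = secrets.token_bytes(16)  # fresh CSPRNG salt for every password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return f"pbkdf2_sha256${iterations}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute with the stored salt and iteration count, then compare safely."""
    scheme, iterations, salt_hex, digest_hex = stored.split("$")
    if scheme != "pbkdf2_sha256":
        return False
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))
```

Storing the iteration count with the hash lets you raise the default later and rehash on the next successful login. The password is hashed exactly as entered, never truncated or case-normalized, which matches the V2 bullet.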

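The V5 bullet in `asvs.md` above asks for validation against a positive spec, including a fully anchored regex. In Python the anchoring detail matters: `$` also matches just before a trailing newline, so `re.match(r"^[a-z]+$", "abc\n")` succeeds. The sketch below avoids that with `re.fullmatch`, which anchors both ends with no newline exception. It assumes a hypothetical username rule of 3 to 32 characters drawn from `[a-z0-9_]`.

```python
import re

# Hypothetical allowlist rule for illustration: 3-32 chars of [a-z0-9_].
USERNAME_RE = re.compile(r"[a-z0-9_]{3,32}")


def validate_username(raw: object) -> str:
    """Validate at the trust boundary; return the value or raise ValueError."""
    if not isinstance(raw, str):
        raise ValueError("username must be a string")
    # fullmatch anchors both ends and, unlike `^...$`, rejects a trailing "\n".
    if USERNAME_RE.fullmatch(raw) is None:
        raise ValueError("username does not match the allowlist")
    return raw


def weak_check(raw: str) -> bool:
    """The pattern this sketch argues against: `$` tolerates a trailing newline."""
    return re.match(r"^[a-z0-9_]{3,32}$", raw) is not None
```

Returning the validated value, or raising, keeps the check at the trust boundary. Downstream code only receives strings that already satisfy the allowlist.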
package/resources/skills/rafter-code-review/docs/cwe-top25.md ADDED

@@ -0,0 +1,78 @@
+# CWE Top 25 — Language-Keyed Checklist
+MITRE's CWE Top 25 is weakness-level, not risk-level. Use this for CLI tools, libraries, IaC, and anything where OWASP's web/API framing doesn't fit. Pick the language section; pair with language-specific linters.
+## How to use
+- Grep the patterns below against the diff. Each hit is a question, not a verdict.
+- Cross-reference with `rafter run` — the backend catches many of these via SAST; this doc covers the ones that take context to judge.
+---
+## Cross-language (applies everywhere)
+- **CWE-79 XSS / CWE-89 SQLi / CWE-78 OS Command Injection** — any user input reaching a query language, shell, or HTML sink. Fix at the sink: parameterize, array-exec, autoescape.
+- **CWE-22 Path Traversal** — any `open(path)`, `fs.readFile(path)`, `os.path.join(base, user_input)`. Canonicalize (`realpath` / `filepath.Abs`) and verify the result stays under an allow-root.
+- **CWE-352 CSRF** — state-changing endpoints: is there a token or origin check? SameSite cookies reduce the risk but are not sufficient on their own. Older browsers ignore the attribute, and requests from sibling subdomains count as same-site.
+- **CWE-287 Improper Authentication / CWE-862 Missing Authorization** — covered in web-app.md / api.md.
+- **CWE-798 Hardcoded Credentials** — `rafter scan local .` catches literal secrets; manually check env-var defaults (`API_KEY = os.environ.get("KEY", "dev-fallback-abc123")` ships the fallback).
+- **CWE-918 SSRF** — any user-supplied URL fetched server-side. See web-app.md A10.
+---
+## Python
+- **CWE-502 Insecure Deserialization** — `pickle.load`, `pickle.loads`, `yaml.load` without `SafeLoader`, `shelve`, `marshal`. Any of these on untrusted bytes is RCE.
+- **CWE-78 / Subprocess** — `subprocess.run(..., shell=True)` with user input. Use list form: `subprocess.run(["cmd", arg])`, never `shell=True` with interpolated input.
+- **CWE-94 Code Injection** — `eval`, `exec`, `compile`, `__import__` with user input. Also `pd.eval`, `numexpr.evaluate`.
+- **CWE-611 XXE** — `xml.etree.ElementTree` is safe by default in 3.7+, but `lxml.etree.parse` with `resolve_entities=True` is not. Prefer `defusedxml`.
+- **CWE-327 Weak Crypto** — `hashlib.md5` / `sha1` on passwords; `random` (not `secrets`) for tokens; `Crypto.Cipher.DES`, `AES.MODE_ECB`.
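+- **CWE-20 Input Validation** — type-coercion surprises: `int(x)` raises `ValueError` on bad input (a 500 if uncaught); `int(x, 16)` also accepts a leading `0x`; `int()` tolerates surrounding whitespace and `_` digit separators; `float(x)` accepts `"inf"` and `"nan"`.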
+## JavaScript / TypeScript
+- **CWE-1321 Prototype Pollution** — `_.merge`, `Object.assign` with user-controlled source, recursive deep-merge on user JSON. Node: affects the whole process.
+- **CWE-79 XSS** — `innerHTML`, `outerHTML`, `document.write`, `dangerouslySetInnerHTML`, `v-html`, `$sce.trustAsHtml`. React's default is safe; anything that bypasses it is the finding.
+- **CWE-94 Code Injection** — `eval`, `new Function(str)`, `setTimeout(str, ...)`, `setInterval(str, ...)`, `vm.runInThisContext` with user input.
+- **CWE-22 Path Traversal** — `path.join(base, userInput)` does not protect. Must resolve and verify containment.
+- **CWE-1333 / CWE-400 Regex DoS (ReDoS)** — catastrophic-backtracking patterns such as `(a+)+` and `(.*)*`. Especially dangerous when the regex itself is user-provided.
+- **CWE-346 Origin Validation** — `postMessage` handlers that don't check `event.origin`. `addEventListener("message", ...)` without origin check is the bug.
+## Go
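+- **CWE-369 Divide-by-Zero / CWE-190 Integer Overflow** — integer division by zero panics, which is a crash (DoS) if the divisor is attacker-controlled. Integer overflow does not panic; it wraps silently. Watch computed sizes: `make([]byte, headerLen)` with an attacker-controlled `headerLen` can allocate huge buffers, or panic if the value is negative.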
+- **CWE-362 Race Conditions** — map writes without mutex; goroutines sharing non-channel state; `context.Value` for mutable data. Run `go test -race`.
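+- **CWE-74 Injection / CWE-78** — `exec.Command(name, args...)` does not invoke a shell, so metacharacters in `args` stay literal. A user-controlled `name`, or args starting with `-`, are still problems. `sh -c <string>` is a shell, so check for `exec.Command("sh", "-c", userInput)`.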
+- **CWE-295 Improper Certificate Validation** — `tls.Config{InsecureSkipVerify: true}` outside tests.
+- **CWE-400 Resource Consumption** — `io.ReadAll` on untrusted streams with no `io.LimitReader`. Goroutine leaks: for every `go f()`, how does it exit?
+- **CWE-665 Improper Initialization** — zero-value structs used as "valid" config; `sync.Mutex` copied by value.
+## Rust
+- **CWE-119 Buffer Issues** — `unsafe` blocks. Every `unsafe` needs a comment explaining the invariant; missing comments are findings.
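+- **CWE-362 Race Conditions** — the borrow checker prevents data races, not logic races. Look for a `std::sync::Mutex` guard held across `.await` (deadlock risk) and for check-then-act sequences split across separate lock acquisitions. Also flag `unsafe impl Sync` used to share a `RefCell` across threads where `Mutex`/`RwLock` belongs.
+- **CWE-674 Uncontrolled Recursion** — deeply nested input to recursive deserializers. `serde_json` caps depth at 128 by default, so look for `disable_recursion_limit`, and for other formats or hand-written recursive parsers without a depth limit.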
+- **CWE-400 Resource Consumption** — `.collect::<Vec<_>>()` on untrusted iterator; `Bytes::from` without length cap.
+- **CWE-704 Incorrect Type Conversion** — `as` casts that truncate silently (`u64 as u32`). Prefer `try_into()`.
+## Java / Kotlin
+- **CWE-502 Insecure Deserialization** — `ObjectInputStream.readObject` on untrusted bytes; XMLDecoder; Jackson with default typing (`@JsonTypeInfo(use = Id.CLASS)` + polymorphic).
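+- **CWE-611 XXE** — `DocumentBuilderFactory` / `SAXParserFactory` without disabling DOCTYPEs or external entities. The JDK parsers allow external entities unless configured otherwise, so check that `disallow-doctype-decl` (or equivalent feature flags) is set.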
+- **CWE-22 Path Traversal** — `Paths.get(base, userInput)` doesn't check containment; use `toRealPath().startsWith(base)`.
+- **CWE-917 Expression Language Injection** — SpEL, OGNL, MVEL with user input (classic Struts-style RCE).
+## IaC (Terraform / CloudFormation / Kubernetes)
+- **CWE-284 Improper Access Control** — security groups with `0.0.0.0/0` on admin ports (22, 3389, db ports); S3 buckets public; IAM policies with `Resource: "*"` + `Action: "*"`.
+- **CWE-732 Incorrect Permissions** — file modes `0777`, world-writable volumes, ConfigMaps holding secrets.
+- **CWE-319 Cleartext Transmission** — ELB listeners on port 80 without redirect; storage without encryption at rest; TLS versions < 1.2.
+- **CWE-798 Hardcoded Credentials** — secrets in `*.tf`, `*.yaml` environment, `docker-compose.yml`.
+- **CWE-1104 Unmaintained 3rd-Party** — Docker base images on `latest` or another mutable tag instead of a pinned digest; Helm charts from unreviewed repos.
+---
+## Exit criteria
+- For each language in the diff, walked the relevant section. For each hit, either a file:line citation showing it's safe, or a finding filed.
+- Run language-specific linters in CI (`bandit`, `semgrep`, `golangci-lint`, `cargo clippy`, `spotbugs`) — this skill complements, doesn't replace them.
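The CWE-22 advice in `cwe-top25.md` above is to canonicalize, then verify the result stays under an allow-root. A minimal Python sketch of that follows. `resolve_under` is a hypothetical helper name. `os.path.realpath` resolves `..` and symlinks, and `os.path.commonpath` compares whole path components.

```python
import os


def resolve_under(base_dir: str, user_path: str) -> str:
    """Join user_path onto base_dir and refuse results that escape base_dir."""
    root = os.path.realpath(base_dir)
    # realpath collapses ".." and follows symlinks before the containment check.
    # An absolute user_path makes os.path.join discard root; the check catches it.
    candidate = os.path.realpath(os.path.join(root, user_path))
    # commonpath compares path components, not raw string prefixes.
    if os.path.commonpath([root, candidate]) != root:
        raise PermissionError(f"path escapes {root!r}: {user_path!r}")
    return candidate
```

A plain `candidate.startswith(root)` string check would accept `/srv/data-evil` for a root of `/srv/data`; `commonpath` avoids that. The check only holds at the moment it runs. If an attacker can create symlinks under the root between the check and the open, that is a separate race to review.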

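The Python CWE-78 entry in `cwe-top25.md` above recommends list-form `subprocess.run`. The demonstration below shows why. Each list element reaches the child process as one argv entry, so shell metacharacters in user input stay literal. `sys.executable` is used as the child command only so the sketch runs anywhere Python does. `echo_arg` is a hypothetical name.

```python
import subprocess
import sys


def echo_arg(user_input: str) -> str:
    """Run a child process with user input as a single argv element (no shell)."""
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", user_input],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit
        timeout=10,  # bound the child's runtime
    )
    return result.stdout.rstrip("\n")
```

With `shell=True` and the same string interpolated into a command line, the shell would interpret the `;`, `&&`, and `$(...)` in the test payload instead of passing them through.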
package/resources/skills/rafter-code-review/docs/investigation-playbook.md ADDED

@@ -0,0 +1,101 @@
+# Investigation Playbook — Canonical Questions per Category
+When one finding, suspicious pattern, or vague "this looks wrong" needs a follow-up — use this. Each section is a question you can actually answer with Grep / Read / trace.
+## Reachability: "Can untrusted input get here?"
+Before fixing anything, prove it's reachable.
+- Where does the input originate? Trace upward from the sink: `app.post(...)` → handler → service → ... → the line in question.
+- Are there layers that filter or transform along the way? Allowlist validator? JSON schema? ORM serializer?
+- Is this code path actually called in production? Or is it a dead branch left from a refactor?
+- If it's an internal service — is it exposed via a misconfigured ingress, reachable from the internet, accessible from a compromised pod? "Internal" is a policy, not a security boundary.
+- Test: can you write a failing request/input that triggers the line? If you can write it in 5 minutes, an attacker can.
+---
+## Authz coverage: "Is every path checked?"
+For an authz-critical operation (read user X, delete resource Y, invoke admin action):
+- List every entry point (HTTP handler, gRPC method, queue worker, CLI command, background job). For each: is the check present?
+- For HTTP: is the check in middleware (applies to all), or per-handler (easy to miss)? Grep for the check; count matches; count routes; compare.
+- Does the check use *request-supplied* identity or *session-derived* identity? `X-User-Id: 42` header is not identity.
+- Privilege escalation corner: can a user modify their own `role`, `tenant_id`, `permissions` via the same endpoint that updates their profile? (mass-assignment + authz = disaster.)
+- Is authz re-checked after redirects / async continuations / token refreshes? Identity is not sticky across those.
+---
+## Data flow: "Does tainted data reach a dangerous sink?"
+Source → sinks, both directions:
+- **Top-down**: pick a source (request body, query string, file upload, DB read from a user-writable table). Grep how that variable flows. Does it reach a sink (SQL, shell, HTML, URL fetch, deserializer)?
+- **Bottom-up**: pick a sink (every `subprocess.run`, every `db.raw`, every `innerHTML`). Trace backward. Is the input at the sink derivable from a source?
+- Don't trust "it's validated upstream" without proof. Read the validator; check that the type after validation is strong enough (strings are weak, `UUID` is strong).
+- Does the data go through serialization round-trips that could re-introduce metacharacters? JSON round-trip, URL-decoding at the wrong layer, base64 → string → SQL.
+---
+## Trust boundaries: "Where does untrusted become trusted?"
+- Draw the boundary. What's on each side?
+- At the boundary: is there validation (shape, type, range, allowlist)? Is there normalization (Unicode NFKC, lowercasing, path canonicalization)?
+- Is the same boundary crossed more than once? (Controller → service → repo — does the repo re-validate, or trust the service?)
+- Cross-service: does Service B trust Service A's payload? If A is compromised, what can B do?
+- Cross-tenant: if a single process serves multiple tenants, where is the tenant id enforced? On every query? Or only at the top of the handler?
+---
+## Error paths: "What happens when this fails?"
+- Every try/except / error branch: does it leak information (stack, internal IDs, DB errors) to the caller?
+- Does the failure leave the system in a broken state (half-written file, partial DB row, orphaned session)?
+- Does the failure log enough for you to debug a real incident? Generic "failed to process" without context is a blind spot.
+- Are retries bounded? Does the retry code path itself re-authenticate, or reuse a possibly-stale token?
+---
+## Concurrency: "What happens with two of these at once?"
+- Is there shared mutable state (module globals, singletons, caches, files)? Protected by a lock?
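+- Check-then-act races: `if not exists: create` lets two requests both pass the check. Enforce uniqueness in the database (a unique constraint plus `INSERT ... ON CONFLICT`). Otherwise, use a transaction whose isolation level, or explicit lock, actually serializes the check and the write.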
+- Idempotency: can the client retry safely? Is there an idempotency key? Repeated payment, duplicate email, double-spend patterns.
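+- Locks held across `.await`: a thread-blocking lock (Rust's `std::sync::Mutex`, Python's `threading.Lock`) held across an await point can deadlock or stall the event loop. Use the async-aware lock (`tokio::sync::Mutex`, `asyncio.Lock`), or release before awaiting. Go has no `.await`, but holding a mutex across a blocking channel operation or I/O call causes the same kind of stall.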
+---
+## Secrets lifecycle: "Where does this credential live, and who can read it?"
+- Creation: how is it generated (entropy source)? Who knows it at creation time?
+- Storage: env var, config file, KMS, DB, vault? File permissions?
+- Transit: does it appear in logs, metrics, error messages, request bodies?
+- Rotation: is there a story for rotating it? Automated or manual? What breaks during rotation?
+- Revocation: if it leaks today, what's the time-to-revoke? Minutes, hours, or "we'd have to redeploy"?
+---
+## Input shape: "Can I break the parser?"
+- Size: is there a max? What happens at the max+1? At 10×max?
+- Depth: for JSON/XML/nested structures, what is the max depth? Billion-laughs entity expansion can exhaust memory, and deeply nested arrays or dicts can exhaust the stack in recursive parsers.
+- Encoding: UTF-8 vs UTF-16 vs Latin-1; BOM handling; surrogate pairs; null bytes in paths.
+- Numeric: NaN, Infinity, -0, integer overflow, very large floats losing precision.
+- Arrays: empty, one element, duplicate keys, sparse arrays, non-integer indices.
+---
+## How to record the outcome
+For each finding that survives investigation, produce a one-line summary in this shape:
+```
+[severity] [ruleId or ad-hoc tag] file:line — <one-sentence issue> — <one-sentence fix direction>
+```
+Example:
+```
+[high] IDOR /orders/:id (orders.ts:88) — handler loads order by URL id without comparing to session user — add owner check before load, 404 (not 403) on mismatch
+```
+Feed these into the PR review comment or back to `rafter` for triage follow-up (`rafter/docs/finding-triage.md`).
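The check-then-act and idempotency questions in the Concurrency section of `investigation-playbook.md` above come down to one move: let the database enforce uniqueness instead of checking first. The sqlite3 sketch below assumes a hypothetical `payments` table keyed by a client-supplied idempotency key. It uses `INSERT ... ON CONFLICT DO NOTHING` (SQLite 3.24 or later) and reads `rowcount` to learn whether this request was the first.

```python
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    # The PRIMARY KEY constraint is what makes the insert atomic; no prior SELECT.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments ("
        " idempotency_key TEXT PRIMARY KEY,"
        " amount_cents INTEGER NOT NULL)"
    )


def record_payment(conn: sqlite3.Connection, key: str, amount_cents: int) -> bool:
    """Return True if this call created the payment, False if it was a replay."""
    with conn:  # commits on success, rolls back on exception
        cur = conn.execute(
            "INSERT INTO payments (idempotency_key, amount_cents) VALUES (?, ?) "
            "ON CONFLICT(idempotency_key) DO NOTHING",
            (key, amount_cents),
        )
    return cur.rowcount == 1  # 0 rows changed means the key already existed
```

The database rejects the duplicate atomically, so two concurrent requests with the same key cannot both see `True`. A `SELECT`-then-`INSERT` sequence cannot guarantee that.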

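The Input shape questions in `investigation-playbook.md` above (max size, max depth, NaN and Infinity, duplicate keys) can be enforced in a few lines. The sketch below uses hypothetical limits of 64 KiB and depth 20:

- The size check runs before any parsing work.
- Python's `json` module is told to reject the non-standard `NaN`/`Infinity` constants it accepts by default.
- Duplicate keys are rejected; by default the last one silently wins.
- Depth is measured after parsing with an explicit stack, because `json.loads` has no depth option.

```python
import json
from typing import Any

MAX_BYTES = 64 * 1024  # hypothetical size limit
MAX_DEPTH = 20  # hypothetical nesting limit


class InputRejected(ValueError):
    """Raised when a payload violates a size, encoding, or shape rule."""


def _reject_constant(name: str) -> Any:
    # json calls this for NaN, Infinity, and -Infinity, which it accepts by default.
    raise InputRejected(f"non-finite number {name} not allowed")


def _reject_duplicate_keys(pairs: list[tuple[str, Any]]) -> dict[str, Any]:
    obj: dict[str, Any] = {}
    for key, value in pairs:
        if key in obj:
            raise InputRejected(f"duplicate key {key!r}")
        obj[key] = value
    return obj


def _depth(value: Any) -> int:
    """Nesting depth of lists/dicts, computed with an explicit stack (no recursion)."""
    deepest = 0
    stack = [(value, 1)]
    while stack:
        node, depth = stack.pop()
        if isinstance(node, (list, dict)):
            deepest = max(deepest, depth)
            children = node.values() if isinstance(node, dict) else node
            stack.extend((child, depth + 1) for child in children)
    return deepest


def parse_bounded_json(raw: bytes, max_bytes: int = MAX_BYTES, max_depth: int = MAX_DEPTH) -> Any:
    if len(raw) > max_bytes:  # cheap check before any parsing work
        raise InputRejected(f"payload is {len(raw)} bytes; limit is {max_bytes}")
    try:
        value = json.loads(
            raw.decode("utf-8"),  # strict decoding: invalid UTF-8 is rejected
            parse_constant=_reject_constant,
            object_pairs_hook=_reject_duplicate_keys,
        )
    except (UnicodeDecodeError, json.JSONDecodeError, RecursionError) as exc:
        raise InputRejected(f"unparseable payload ({type(exc).__name__})") from exc
    if _depth(value) > max_depth:
        raise InputRejected(f"nesting deeper than {max_depth}")
    return value
```

Every failure surfaces as one exception type, `InputRejected`. The handler can map that to a single 400 response without leaking parser internals.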
package/resources/skills/rafter-code-review/docs/llm.md ADDED

@@ -0,0 +1,87 @@
+# LLM-Integrated Code Review — OWASP LLM Top 10 (2025)
+For any code that sends prompts to a model, exposes tool calls, retrieves context (RAG), or ships model output to a downstream system. Walk as questions. Cite file:line.
+## LLM01 — Prompt Injection
+Assume every string that reaches the prompt — user input, retrieved documents, tool output, file contents, web pages — is adversarial.
+- Trace the prompt build. Concatenation of user input into the system prompt? String interpolation of retrieved chunks? Find every `system + user` join site.
+- Are there *structural* defenses? (Delimiters the model is trained to respect, role separation, XML tags, instruction hierarchies.) Note: none are airtight — defense is layered, not singular.
+- Indirect injection: is retrieved content (web page, email, PDF, repo file) ever fed to the model? Treat it as untrusted input, same as the user's message.
+- Output gating: is the model's output used to decide authz, invoke tools, or send messages? If yes — LLM01 merges with LLM06 (Excessive Agency).
+## LLM02 — Sensitive Information Disclosure
+- What goes into the prompt? Grep for prompts that include: PII, internal URLs, database rows, credentials, full request objects. "Just pass context" is the failure mode.
+- Is there a redaction step between "application data" and "prompt"? Can it be turned off by flag?
+- Does the model provider retain logs? Which tenant's data is crossing into the provider? Is that contractually allowed?
+- Model output: before returning to the user, is it scanned for data the caller shouldn't see (e.g. other tenants' data leaked from the context)?
+## LLM03 — Supply Chain
+- Model source: where does the model come from? Provider API (which account?) or self-hosted? If self-hosted, from which registry? Is the weights file checksummed?
+- Embedding model: same questions. Many RAG pipelines have *two* models; both are supply chain.
+- Prompt templates: if loaded from a shared registry (LangChain Hub, custom store), pinned and verified? Or pulled by name?
+- Plugins / tools / MCP servers registered with the agent — are they audited (see `rafter agent audit`) before install?
+## LLM04 — Data & Model Poisoning
+- Training / fine-tuning data: where from, how reviewed, who can write to the source? Can a user of the system influence future training (feedback loops)?
+- RAG corpus: same question. Can a user add documents to the retrieval index? If yes — those documents can issue instructions via LLM01.
+- Vector store: who can write? Who can update metadata (which drives filtering)? Metadata poisoning can bypass the retrieval filter.
+## LLM05 — Improper Output Handling
+Treat model output as untrusted input to whatever consumes it.
+- Markdown → HTML rendering: is the markdown sanitized? `![img](javascript:...)`, `<script>` in allowed tags, `<img onerror=>`?
+- Model output as code: passed to `eval`, `exec`, `Function()`, compiled and run, written as a shell script? That's RCE by way of prompt.
+- Model output as URL: used to fetch, redirect, or render? Same SSRF/XSS questions as elsewhere — plus: the model happily generates `javascript:` URLs.
+- Model output as SQL / shell / XPath: if the model writes queries, is the result parameterized / sandboxed / approved before execution?
+- Tool-call arguments from the model: validate shape, types, and values against a schema. Do not trust the model to stay in bounds.
+## LLM06 — Excessive Agency
+Tools + untrusted prompts = agent exfiltration / damage.
+- For each tool the agent can call, ask: (a) does it need to exist, (b) what's its blast radius, (c) is there a human-in-the-loop gate for irreversible actions?
+- Permissions scope: does the agent run with the calling user's permissions, or with service-account permissions that exceed any one user?
+- Destructive actions (send email, charge card, delete, write file, run shell): any of these reachable from a prompt? Use Rafter's command guardrails (`rafter agent exec`) as a pattern.
+- Chained calls: can tool A's output become tool B's input with no validation? Multi-step attacks live here.
+## LLM07 — System Prompt Leakage
+- Don't put secrets in system prompts. Grep the system prompt for API keys, customer-specific config, internal URLs.
+- Assume the system prompt is recoverable. The prompt is for *behavior*, not for *authz*. If the code relies on the user not knowing the prompt to enforce a policy — the policy is broken.
+- Different tenants / roles: different prompts, loaded server-side keyed by the *authenticated* principal, never from the request.
+## LLM08 — Vector & Embedding Weaknesses
+- Embedding-time injection: user content embedded without sanitization can be weaponized when retrieved.
+- Access control on retrieval: is the query filtered by tenant / user before the vector search, or filtered *after*? "After" often leaks via re-ranker.
+- Embedding collisions / adversarial embeddings: high-stakes retrieval (medical, legal) — is there a confidence floor on the similarity score before acting?
+## LLM09 — Misinformation & Overreliance
+A design question, but reviewable:
+- Does the UI make it clear the output is model-generated? Is there a confidence indicator where warranted?
+- For advice domains (medical, legal, financial), is there a disclaimer *and* a hard gate on actions?
+- Does the code treat model output as ground truth anywhere? Summaries, extractions, classifications used downstream should have a human review step or a fallback.
+## LLM10 — Unbounded Consumption
+- Token budgets per request, per user, per tenant, per day?
+- Max tokens on *output* (not just input) — unbounded generation is the classic DoS/cost footgun.
+- Streaming responses: timeout per chunk? Total timeout?
+- Parallel requests: queue depth, concurrency caps? Fan-out from a single user request to N model calls (RAG, ReAct loops) — bounded?
+---
+## Exit criteria
+- For every tool the agent can call: documented purpose, scope, and human-gate story.
+- For every retrieval/RAG path: write-access audit, injection defenses, tenant isolation.
+- For every model output sink: treated as untrusted, specific sanitization / validation cited.
+- Run `rafter agent audit` on any bundled skills/plugins. Pair with `rafter run` for SAST.
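LLM05 and LLM06 in `llm.md` above say to validate model-issued tool calls against a schema and to gate irreversible actions behind a human. The sketch below assumes hypothetical tools `search_docs` and `send_email`, a hypothetical `example.com` recipient policy, and a hand-rolled spec table. A real project might use JSON Schema or pydantic instead.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical recipient policy: a single address on one allowlisted domain.
EMAIL_RE = re.compile(r"[a-z0-9._%+-]+@example\.com")


@dataclass(frozen=True)
class ToolSpec:
    args: dict[str, Callable[[Any], bool]]  # arg name -> predicate its value must pass
    destructive: bool = False  # irreversible actions need a human approval flag


TOOLS: dict[str, ToolSpec] = {
    "search_docs": ToolSpec(
        args={"query": lambda v: isinstance(v, str) and 0 < len(v) <= 200},
    ),
    "send_email": ToolSpec(
        args={
            "to": lambda v: isinstance(v, str) and EMAIL_RE.fullmatch(v) is not None,
            "body": lambda v: isinstance(v, str) and len(v) <= 2000,
        },
        destructive=True,
    ),
}


class ToolCallRejected(Exception):
    """Raised when a model-issued tool call falls outside the allowlisted spec."""


def check_tool_call(name: str, arguments: Any, human_approved: bool = False) -> None:
    spec = TOOLS.get(name)
    if spec is None:
        raise ToolCallRejected(f"unknown tool {name!r}")
    if not isinstance(arguments, dict):
        raise ToolCallRejected(f"{name}: arguments must be an object")
    # Exact key match: no missing arguments and no extra, unreviewed ones.
    if set(arguments) != set(spec.args):
        raise ToolCallRejected(
            f"{name}: expected {sorted(spec.args)}, got {sorted(map(str, arguments))}"
        )
    for arg, is_valid in spec.args.items():
        if not is_valid(arguments[arg]):
            raise ToolCallRejected(f"{name}: invalid value for {arg!r}")
    if spec.destructive and not human_approved:
        raise ToolCallRejected(f"{name}: requires human approval before running")
```

The design fails closed. Unknown tools, unexpected arguments, and destructive calls without an approval flag are all rejected, so a prompt-injected model cannot widen its reach by inventing parameters. Where the approval flag comes from (a UI confirmation, a ticket) is outside this sketch.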

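LLM05 in `llm.md` above notes that models readily produce `javascript:` URLs. One reviewable control is a scheme allowlist applied before any model-generated link is rendered. The sketch below uses `urllib.parse.urlsplit`, which lowercases the scheme. `ALLOWED_SCHEMES` and `is_safe_link` are hypothetical names. The check covers link rendering only; server-side fetching of model output still needs SSRF controls.

```python
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}  # hypothetical policy: web links only


def is_safe_link(url: str) -> bool:
    """Allow only absolute http(s) URLs with a host; reject everything else."""
    try:
        parts = urlsplit(url.strip())
    except ValueError:  # e.g. an unbalanced IPv6 bracket such as "http://[::1"
        return False
    # Allowlist, not blocklist: javascript:, data:, vbscript:, file: all fail here.
    return parts.scheme in ALLOWED_SCHEMES and bool(parts.netloc)
```

An allowlist fails closed on schemes nobody thought to block. A blocklist containing only `javascript:` would still pass `data:` and `vbscript:` links.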
package/resources/skills/rafter-code-review/docs/web-app.md ADDED

@@ -0,0 +1,84 @@
+# Web Application Review — OWASP Top 10 (2021)
+Walk each category as questions. Cite file:line evidence before moving on. If you can't answer a question, that *is* the finding.
+## A01 — Broken Access Control
+The #1 risk. Every authenticated route must answer: "who is allowed?"
+- Grep for route handlers (`app.get`, `@app.route`, `router.handle`, controller annotations). For each: is there an explicit authz check? If you can't see one, trace the middleware chain — is it registered *before* this route?
+- For every `where user_id = ?` pattern, is the id from the session, or from the request? `?id=123` in the URL that controls the DB lookup is IDOR-shaped.
+- Are admin routes distinguished by URL prefix alone? If `/admin/*` is only protected by "don't tell users", that's not protection.
+- Does the app rely on HTTP verb restrictions (GET safe, POST protected)? Can you POST to a GET-only endpoint? Does it accept `X-HTTP-Method-Override`?
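+- Is CORS configured with `Access-Control-Allow-Origin: *` alongside `Allow-Credentials: true`? Browsers refuse that pair, so the version that actually works (and is dangerous) is the server reflecting the request's `Origin`, or allowing `null`, with credentials enabled. Any site can then make authenticated cross-origin reads.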
+## A02 — Cryptographic Failures
+- What algorithms appear? Grep for `md5`, `sha1`, `des`, `rc4`, `ecb`. Any hit on user data, session tokens, or passwords is a finding.
+- How are passwords hashed? Look for `bcrypt`, `scrypt`, `argon2`, `pbkdf2`. Absence is the finding. `sha256(password + salt)` is not password hashing.
+- Are secrets in source? Run `rafter scan local .` first — but also grep for `private_key`, `api_key`, `BEGIN RSA`, `.pem`, `.p12`.
+- Is TLS enforced? Look for redirect middleware, HSTS headers, cookie `Secure` flag. Cookies without `Secure` + `HttpOnly` + `SameSite` — ask why.
+- Is randomness from `Math.random()` / `rand()` used for tokens, session ids, password resets? Must be `crypto.randomBytes` / `secrets.token_*` / `crypto/rand`.
+## A03 — Injection
+- SQL: every query that interpolates a variable (`f"SELECT ... {x}"`, backticks with `${x}`, `+` string concat into SQL). Must be parameterized. ORMs help but `.raw()` / `.query()` escape hatches don't.
+- Command injection: `exec`, `spawn`, `system`, `subprocess.run(shell=True)`, `child_process.exec`. Any user input reaching these? Prefer array form, never `shell=True` with input.
+- LDAP / NoSQL / XPath / template injection: same question — does user input reach a query language, and is it escaped by the library or by string concat?
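+- XSS: where does user-controlled data reach HTML? React/Vue auto-escape; `dangerouslySetInnerHTML`, `v-html`, `innerHTML`, and template literals rendered as HTML are the escape hatches. Server-side: is the template engine autoescaping? A bare Jinja2 `Environment` does not autoescape unless configured (for example with `select_autoescape`). Flask turns autoescaping on for HTML and XML templates (`.html`, `.htm`, `.xml`, and similar) and leaves it off for others such as `.txt`.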
+- Deserialization: `pickle.loads`, `yaml.load` (without SafeLoader), `Marshal.load`, Java's `ObjectInputStream`. Any of these on untrusted bytes is RCE-shaped.
+## A04 — Insecure Design
+Design smells that code review *can* catch:
+- Is there a single trust boundary, or does the same request cross it multiple times? (e.g. user → API → internal service that re-reads user input without re-validating.)
+- Are rate limits on authentication and password reset flows? Count attempts per account *and* per IP.
+- Does the password reset flow leak account existence? "Email sent if account exists" vs "no account with that email" — the latter is an oracle.
+- Is the "remember me" token a long-lived bearer? What invalidates it on password change?
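The following Python sketch shows one way to satisfy the per-account *and* per-IP rule. It assumes a single process with in-memory state; a real deployment would keep the counters in a shared store. `SlidingWindowLimiter` and `login_attempt_allowed` are hypothetical names, and the limits in the test are placeholders.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """At most `limit` admitted attempts per key within any `window`-second span."""

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock                  # injectable for tests
        self.attempts = defaultdict(deque)  # key -> timestamps of recent admitted attempts

    def allow(self, key) -> bool:
        now = self.clock()
        q = self.attempts[key]
        while q and now - q[0] >= self.window:
            q.popleft()                     # forget attempts older than the window
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True


def login_attempt_allowed(per_account, per_ip, account, ip) -> bool:
    # Call both limiters (no short-circuit) so each one sees every attempt:
    # per-account stops distributed guessing, per-IP stops one client spraying accounts.
    account_ok = per_account.allow(account)
    ip_ok = per_ip.allow(ip)
    return account_ok and ip_ok
```

The same limiter can guard the password-reset endpoint. Pair it with a uniform "email sent if the account exists" response so that throttling does not itself become an account-existence oracle.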
+## A05 — Security Misconfiguration
+- Debug mode / stack traces in production? Grep for `DEBUG = True`, `app.debug`, `NODE_ENV` comparisons.
+- Default credentials in config files or seed scripts? Look in `seed.js`, `fixtures/`, `docker-compose.yml`.
+- Unused frameworks/features enabled? Directory listing? Admin consoles (`/admin`, `/actuator`, `/console`) without authn?
+- Security headers: CSP, X-Content-Type-Options, Referrer-Policy, Permissions-Policy. Is a header middleware (`helmet`, `secure`) registered?
+- Cloud metadata access — can the server be coerced into fetching `169.254.169.254`? (see also A10/SSRF.)
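This is a hedged sketch of the header-middleware check, written as a plain WSGI wrapper in Python (Express apps would typically reach for `helmet` instead). The header values are illustrative starting points, not a vetted policy. `security_headers`, `demo_app` and `collect_headers` are hypothetical helpers.

```python
# Illustrative values only; a real CSP has to be built around the app's actual assets.
SECURITY_HEADERS = [
    ("Content-Security-Policy", "default-src 'self'"),
    ("X-Content-Type-Options", "nosniff"),
    ("Referrer-Policy", "strict-origin-when-cross-origin"),
    ("Permissions-Policy", "camera=(), microphone=(), geolocation=()"),
]


def security_headers(app):
    """WSGI middleware: add each header unless the wrapped app already set it."""
    def wrapped(environ, start_response):
        def start(status, headers, exc_info=None):
            present = {name.lower() for name, _ in headers}
            extra = [(n, v) for n, v in SECURITY_HEADERS if n.lower() not in present]
            return start_response(status, list(headers) + extra, exc_info)
        return app(environ, start)
    return wrapped


def demo_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Referrer-Policy", "no-referrer")])  # app's own choice wins
    return [b"ok"]


def collect_headers(app):
    """Run a WSGI app once with an empty environ; return (status, headers, body)."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"], captured["headers"] = status, dict(headers)

    body = b"".join(app({}, start_response))
    return captured["status"], captured["headers"], body
```

In review, the question is less "is this exact code present" than "is something equivalent registered once, at the outermost layer, for every response".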
+## A06 — Vulnerable & Outdated Components
+- `rafter run` covers this via SCA. In review, check that the manifest is present (`package.json`, `requirements.txt`, `go.mod`, `pom.xml`) and that the lockfile is committed.
+- Is there a `postinstall` / `prepare` script running arbitrary code from dependencies? That's a supply-chain footgun.
+- Are any dependencies pulled from raw git URLs or non-registry sources without pinning?
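The last two bullets can be checked with a reviewer-side helper, sketched here in Python. The helper is hypothetical and not part of `rafter`. It flags install-time lifecycle scripts in a `package.json` and non-registry dependency specs, noting whether each one is pinned to a full commit SHA. It also lists packages that an npm v2/v3 `package-lock.json` marks with `hasInstallScript`. The prefix list is an assumption and deliberately incomplete; bare `user/repo` GitHub shorthand, for example, is missed.

```python
import json
import re

# Lifecycle scripts npm may run automatically during install.
INSTALL_HOOKS = ("preinstall", "install", "postinstall", "prepare")
# Dependency specs that bypass the registry (not exhaustive).
NON_REGISTRY_PREFIXES = ("git+", "git://", "github:", "http://", "https://", "file:")
FULL_SHA = re.compile(r"#[0-9a-f]{40}$")


def review_manifest(manifest_text: str) -> list[str]:
    pkg = json.loads(manifest_text)
    findings = []
    scripts = pkg.get("scripts", {})
    for hook in INSTALL_HOOKS:
        if hook in scripts:
            findings.append(f"install-time script: {hook}")
    for section in ("dependencies", "devDependencies", "optionalDependencies"):
        for name, spec in pkg.get(section, {}).items():
            if spec.startswith(NON_REGISTRY_PREFIXES):
                state = "pinned to commit" if FULL_SHA.search(spec) else "UNPINNED"
                findings.append(f"non-registry dependency {name} ({state}): {spec}")
    return findings


def install_script_packages(lock_text: str) -> list[str]:
    """Package paths that a v2/v3 package-lock.json flags with hasInstallScript."""
    lock = json.loads(lock_text)
    return [path for path, meta in lock.get("packages", {}).items()
            if meta.get("hasInstallScript")]
```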
+## A07 — Identification & Authentication Failures
+- Session management: where is the session created, stored, invalidated? Does logout actually invalidate server-side, or just drop the cookie?
+- Multi-factor: present on admin? On password change? On MFA enrollment itself (bypass via "add new device")?
+- Credential stuffing: lockout policy, captcha on repeated failures, generic error messages.
+- JWT: is `alg: none` accepted? Is algorithm confusion possible (an RS256 verifier that also accepts HS256, using the RSA public key as the HMAC secret)? Is `kid` used to resolve arbitrary files?
+## A08 — Software & Data Integrity Failures
+- Update channels: does the app auto-update itself or pull config from remote? Is that channel signed and verified?
+- CI/CD: does the pipeline verify signatures on built artifacts? Are secrets scoped per job, or leaked across jobs?
+- Deserialization (overlaps with A03): any untrusted blob fed to `pickle` / `yaml.load` / `unserialize` / `readObject`.
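For the update-channel bullet, the simplest verify-before-apply shape is digest pinning, sketched below in Python. This is an integrity check against a SHA-256 value obtained out of band (for example, from a signed release manifest). It is not a signature scheme; verifying real signatures such as Ed25519 needs a library outside the standard library. `verify_artifact` is a hypothetical name.

```python
import hashlib
import hmac


def verify_artifact(data: bytes, pinned_sha256_hex: str) -> bytes:
    """Return data only if its SHA-256 matches the pinned digest; otherwise refuse to apply it."""
    actual = hashlib.sha256(data).hexdigest()
    if not hmac.compare_digest(actual, pinned_sha256_hex.strip().lower()):
        raise ValueError("digest mismatch: refusing to apply update/config")
    return data
```

The review question is where `pinned_sha256_hex` comes from. If it arrives over the same channel as the artifact, an attacker who controls the channel controls both.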
+## A09 — Security Logging & Monitoring Failures
+- Are authn failures logged with enough context (user id, ip, timestamp) to be useful?
+- Do logs leak secrets? Grep log statements for `password`, `token`, request bodies printed wholesale.
+- Is there a correlation id per request that survives across services?
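Both A09 concerns can be sketched with Python's `logging` module. A filter redacts secret-looking `key=value` pairs as a backstop; the better fix is not to log secrets or whole request bodies at all. The same filter stamps every record with a per-request correlation id. The regex is a heuristic assumption, and all names are hypothetical.

```python
import contextvars
import logging
import re
import uuid

# Per-request correlation id; contextvars keeps the value isolated per thread and per asyncio task.
request_id = contextvars.ContextVar("request_id", default="-")

# Heuristic: key=value / key: value pairs whose key looks secret. Not exhaustive.
SECRET_KV = re.compile(r"(?i)(password|passwd|token|api_key|secret)\b(\s*[=:]\s*)(\S+)")


def begin_request(incoming_id=None):
    """Adopt an upstream id (e.g. from a request header) or mint one; returns a reset token."""
    return request_id.set(incoming_id or uuid.uuid4().hex)


class RedactAndCorrelate(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        rendered = record.getMessage()  # apply %-style args once
        record.msg = SECRET_KV.sub(r"\1\2[REDACTED]", rendered)
        record.args = None              # message is already fully rendered
        record.request_id = request_id.get()
        return True


def make_logger(stream) -> logging.Logger:
    logger = logging.getLogger("app.security")
    handler = logging.StreamHandler(stream)
    handler.addFilter(RedactAndCorrelate())
    handler.setFormatter(logging.Formatter("%(request_id)s %(levelname)s %(message)s"))
    logger.handlers[:] = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```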
+## A10 — Server-Side Request Forgery (SSRF)
+- Any endpoint that fetches a URL supplied by the user? (image proxy, webhook configurer, PDF-from-URL, OAuth callback that fetches `openid-configuration`.)
+- Is the URL's host allowlisted? Does the allowlist resolve the hostname and re-check against an internal-IP denylist (RFC1918 + link-local + cloud metadata)?
+- Does it follow redirects? Each redirect is a fresh SSRF check, not just the first URL.
+---
+## Exit criteria
+- For each category above, either a file:line citation proving it's handled, OR a finding logged with a ruleId-shaped summary, OR an explicit "N/A — feature not present in this diff".
+- Pair with `rafter run` results: cross-reference scanner findings against your manual walk. Scanner-only hits are candidates for triage (`rafter/docs/finding-triage.md`); manual-only hits are the ones scanners miss.

package/resources/skills/rafter-secure-design/SKILL.md ADDED

@@ -0,0 +1,103 @@
+---
+name: rafter-secure-design
+description: "Shift-left, design-phase security — walk design decisions as a Choose-Your-Own-Adventure *before* the code exists. Router skill: pick what you're designing (auth, data storage, API surface, ingestion, deployment, dependencies) and Read the matching sub-doc. Each sub-doc is a set of questions a security engineer would ask at kickoff — what primitive to pick, what to refuse, what to threat-model. Pair with `rafter-code-review` (mid-lifecycle review) and the `rafter` skill (detection). Use at feature kickoff, architecture review, or whenever you're choosing between primitives."
+version: 0.1.0
+allowed-tools: [Read, Glob, Grep]
+---
+# Rafter Secure Design — Designing It Right The First Time
+A designer's skill, not a scanner. The goal is to catch the flaw in the whiteboard sketch, not three weeks later in a PR. Each sub-doc asks the questions a security engineer would ask at kickoff — "which primitive, which boundary, which default?"
+> Pair with `rafter-code-review` (structured review *during* PR) and the `rafter` skill (automated detection of what slipped through). This skill is the earliest stage — prevention before the code exists.
+## How to use this skill
+1. Identify what's being designed (below). If multiple apply, walk them in the order listed — `threat-modeling` last, as a capstone.
+2. `Read` only the matching sub-doc. Do not preload them all; pick-and-load keeps the conversation tight.
+3. Work through its questions against the *proposed* design. Capture the answer inline (architecture doc, design RFC, PR description). If you can't answer a question, that's a design gap — resolve it before writing code.
+4. When the design is stable, run the `threat-modeling` walk to stress-test it.
+5. Hand off to `rafter-code-review` during implementation.
+---
+## Choose Your Adventure
+### (1) Authentication & Authorization
+For: login, sessions, tokens, service-to-service identity, multi-tenant access, role-based permissions, anything that answers "who is this and what can they do?"
+- **Read `docs/auth.md`** — Primitive selection (session vs. JWT vs. OAuth), authZ model (RBAC / ABAC / ReBAC), token lifetime + revocation, MFA surface, service identity. Questions phrased as "pick one and say why".
+### (2) Data storage — at rest, in transit, PII
+For: database schema design, file storage, caches, logs, anything that decides *where* sensitive data lives and *who* holds the keys.
+- **Read `docs/data-storage.md`** — Classification (what is PII/PHI/PCI here?), encryption choices, key management, retention + deletion, backup scope, tenancy isolation. Anti-patterns: encrypt-everything-as-a-religion, homegrown crypto, keys next to data.
+### (3) API surface — REST / GraphQL / gRPC / webhooks
+For: designing new endpoints, shaping request/response schemas, choosing between resource styles, rate limiting, versioning, exposing internal services.
+- **Read `docs/api-design.md`** — Resource modeling for authz (is this endpoint BOLA-shaped?), write-vs-read boundaries, idempotency, rate-limit keys, error taxonomy (what leaks?), webhook delivery + replay.
+### (4) Ingestion — inputs, uploads, parsers, user content
+For: anything that accepts user-controlled bytes: form posts, file uploads, webhook payloads, imports, content rendering, search indexing.
+- **Read `docs/ingestion.md`** — Trust boundaries (where does untrusted become trusted?), parser choice (safe default vs. fast), size + shape limits, content sniffing, SSRF-adjacent fetchers, deserialization surface.
+### (5) Deployment — topology, network, secrets, runtime
+For: infra plan, service boundaries, secret distribution, egress policy, CI/CD pipeline, build-time vs. run-time separation.
+- **Read `docs/deployment.md`** — Network zones, least-privilege IAM, secret distribution (not "put it in env"), build provenance, runtime posture (read-only FS, non-root), multi-region / DR assumptions.
+### (6) Dependencies & supply chain
+For: picking a library, adopting a framework, pulling a container base image, introducing a new SaaS, wiring a postinstall script.
+- **Read `docs/dependencies.md`** — Pick-vs-write, maintenance signal, install-time execution, pinning + lockfiles, SBOM + SCA hooks, vendoring vs. registry, typosquat / slopsquat checks.
+### (7) Threat model — STRIDE walk of the full design
+For: the capstone pass *after* the above decisions are drafted. Also good for any greenfield service review.
+- **Read `docs/threat-modeling.md`** — STRIDE applied to the specific design (not the generic checklist). Trust boundaries, data-flow diagrams as prose, abuse cases, negative-space questions ("what did we implicitly assume?").
+### (8) Which standards / frameworks should bound this?
+For: scoping compliance, picking a baseline, answering "how much is enough?"
+- **Read `docs/standards-pointers.md`** — Pointers to ASVS (app sec), NIST SSDF (lifecycle), CSA CCM (cloud), OWASP SAMM (program maturity), plus the cheap-and-fast subset to start with.
+---
+## What this skill will NOT do
+- It will not write the design document for you. It walks *your* draft through structured questions.
+- It will not replace a dedicated threat-modeling session with the team. It prepares you for one.
+- It will not produce a checklist to mechanically tick through. Every question expects a deliberate answer; "N/A because..." is fine, "skip" is not.
+---
+## Fast path at feature kickoff
+```text
+1. Sketch the design (one-pager, box-and-arrow).
+2. Walk the sub-doc that matches the riskiest choice you're about to make.
+3. Walk threat-modeling.md as a capstone.
+4. Write the decisions into the design doc as "decided / rejected / why".
+5. Start coding — and loop in `rafter-code-review` when the PR lands.
+```
+If you're revisiting an existing design (refactor, migration), same flow: treat the current shape as "proposed" and walk the relevant sub-docs as questions.
+---
+## Tie-backs
+- Ready to review the code that implements the design? → `rafter-code-review`.
+- Implementation landed, need automated checks? → `rafter` skill, `rafter run` / `rafter scan local`.
+- Risky command came up mid-design (spike, data migration)? → `rafter` skill, `docs/guardrails.md`.
+- Have a specific finding from a scan? → `rafter` skill, `docs/finding-triage.md`.