npm - @rafter-security/cli - Versions diffs - 0.7.0 → 0.7.1 - Mend

@rafter-security/cli 0.7.0 → 0.7.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (56)

package/README.md +20 -1
package/dist/commands/agent/audit-skill.js +2 -1
package/dist/commands/agent/audit.js +27 -0
package/dist/commands/agent/components.js +800 -0
package/dist/commands/agent/disable.js +47 -0
package/dist/commands/agent/enable.js +50 -0
package/dist/commands/agent/index.js +6 -0
package/dist/commands/agent/init.js +162 -164
package/dist/commands/agent/list.js +72 -0
package/dist/commands/brief.js +20 -0
package/dist/commands/docs/index.js +18 -0
package/dist/commands/docs/list.js +37 -0
package/dist/commands/docs/show.js +64 -0
package/dist/commands/mcp/server.js +84 -0
package/dist/commands/skill/index.js +14 -0
package/dist/commands/skill/install.js +89 -0
package/dist/commands/skill/list.js +79 -0
package/dist/commands/skill/registry.js +273 -0
package/dist/commands/skill/remote.js +333 -0
package/dist/commands/skill/review.js +975 -0
package/dist/commands/skill/uninstall.js +65 -0
package/dist/core/audit-logger.js +262 -21
package/dist/core/config-manager.js +3 -0
package/dist/core/docs-loader.js +148 -0
package/dist/core/policy-loader.js +72 -1
package/dist/index.js +6 -0
package/package.json +1 -1
package/resources/skills/rafter/SKILL.md +76 -96
package/resources/skills/rafter/docs/backend.md +106 -0
package/resources/skills/rafter/docs/cli-reference.md +199 -0
package/resources/skills/rafter/docs/finding-triage.md +79 -0
package/resources/skills/rafter/docs/guardrails.md +91 -0
package/resources/skills/rafter/docs/shift-left.md +64 -0
package/resources/skills/rafter-code-review/SKILL.md +91 -0
package/resources/skills/rafter-code-review/docs/api.md +90 -0
package/resources/skills/rafter-code-review/docs/asvs.md +120 -0
package/resources/skills/rafter-code-review/docs/cwe-top25.md +78 -0
package/resources/skills/rafter-code-review/docs/investigation-playbook.md +101 -0
package/resources/skills/rafter-code-review/docs/llm.md +87 -0
package/resources/skills/rafter-code-review/docs/web-app.md +84 -0
package/resources/skills/rafter-secure-design/SKILL.md +103 -0
package/resources/skills/rafter-secure-design/docs/api-design.md +97 -0
package/resources/skills/rafter-secure-design/docs/auth.md +67 -0
package/resources/skills/rafter-secure-design/docs/data-storage.md +90 -0
package/resources/skills/rafter-secure-design/docs/dependencies.md +101 -0
package/resources/skills/rafter-secure-design/docs/deployment.md +104 -0
package/resources/skills/rafter-secure-design/docs/ingestion.md +98 -0
package/resources/skills/rafter-secure-design/docs/standards-pointers.md +102 -0
package/resources/skills/rafter-secure-design/docs/threat-modeling.md +128 -0
package/resources/skills/rafter-skill-review/SKILL.md +106 -0
package/resources/skills/rafter-skill-review/docs/authorship-provenance.md +82 -0
package/resources/skills/rafter-skill-review/docs/changelog-review.md +99 -0
package/resources/skills/rafter-skill-review/docs/data-practices.md +88 -0
package/resources/skills/rafter-skill-review/docs/malware-indicators.md +79 -0
package/resources/skills/rafter-skill-review/docs/prompt-injection.md +85 -0
package/resources/skills/rafter-skill-review/docs/telemetry.md +78 -0

package/resources/skills/rafter/docs/guardrails.md ADDED

@@ -0,0 +1,91 @@
+# Rafter Guardrails — PreToolUse Hooks & Command Risk
+How Rafter intercepts agent tool calls before they execute, how it decides what to block, and how to override safely.
+## The Shape
+Rafter exposes two hook handlers over stdio:
+- `rafter hook pretool` — read a JSON event on stdin (from Claude Code, etc.), emit an approve / ask / block decision on stdout.
+- `rafter hook posttool` — read a JSON event after a tool ran; log to audit trail, optionally rescan written files for secrets.
+For platforms without hooks, the same classifier is reachable as:
+- `rafter agent exec --dry-run -- <command>` (returns risk, exits 0/1)
+- `rafter mcp serve` → MCP tool `evaluate_command`
+## Risk Tiers
+Every command (Bash-like tool call) gets classified into one of four tiers by `src/core/risk-rules.ts`:
+| Tier | What it means | Default hook behavior |
+|---|---|---|
+| `low` | Read-only, safe prefix (`ls`, `cat`, `grep`, `git status` …), no chaining | **approve** silently |
+| `medium` | State-changing but recoverable (package installs, git commits on current branch) | **approve with note** in audit log |
+| `high` | Destructive or privileged (force push, `sudo`, broad file deletion, `curl \| sh`) | **prompt** the agent / user for approval |
+| `critical` | Likely irreversible damage (`rm -rf /`, DB drop, wiping .git, repo-wide chmod) | **block** hard |
+Tiers are derived from regex patterns in `risk-rules.ts` (`CRITICAL_PATTERNS`, `HIGH_PATTERNS`, `MEDIUM_PATTERNS`) plus a `SAFE_PREFIX` allowlist. Presence of chain operators (`&&`, `||`, `;`, `|`) disqualifies the safe-prefix shortcut.
+## Policy Overrides
+`.rafter.yml` (project) and `~/.rafter/config.yml` (global) can override defaults:
+```yaml
+risk:
+  blocked_patterns:
+    - "terraform destroy"
+  require_approval:
+    - "^npm publish"
+  allow:
+    - "^pnpm run test"     # force low regardless of content
+```
+Merge order (most specific wins): project `.rafter.yml` > global config > built-in defaults. Dump the effective merged policy with `rafter policy export`.
+## How to Interpret a Block
+When a hook blocks a command, the JSON response includes:
+- `decision`: `"block" | "approve" | "ask"`
+- `riskLevel`: `"critical" | "high" | "medium" | "low"`
+- `reason`: the matched pattern or policy rule
+- `ruleId`: stable ID you can reference in overrides / suppressions
+**Before overriding, ask: is there a safer form of this command?** Examples:
+- `rm -rf $DIR` with unvalidated `$DIR` → use an explicit path, quote it as `"${DIR:?}"` so an unset or empty variable aborts the command, or add `--one-file-system`.
+- `curl <url> | sh` → download, inspect, then run.
+- `git push --force` → `git push --force-with-lease`.
+## How to Request an Override
+If the block is a false positive **for this specific context**, the right path is:
+1. Add an allow pattern scoped to the project in `.rafter.yml`:
+   ```yaml
+   risk:
+     allow:
+       - "^terraform destroy -target=module\\.sandbox"
+   ```
+2. Or run once with an explicit ack flag: `rafter agent exec --force -- <command>` (logged to audit trail; still shows up in `rafter agent audit` history).
+3. Never disable the hook globally to get past one command — that silently drops protection for every future call.
+## Audit Trail
+Every hook decision (approve / ask / block) is appended to the JSONL audit log:
+- Location: `rafter agent status` prints the path.
+- Read: `rafter agent audit --log` (or MCP `read_audit_log`).
+- Use it for postmortems: *why did this command run?*, *what did the agent try before the block?*
+## Platform Notes
+- **Claude Code**: `rafter agent init --with-claude-code` wires `pretool` + `posttool` into `~/.claude/settings.json`. Hook timeout is 5s; long scans defer to the async posttool path.
+- **MCP clients (Gemini, Cursor, …)**: no native hook; instruct the agent in its system prompt to call the `evaluate_command` MCP tool before running Bash (e.g. "before Bash, call rafter.evaluate_command").
+- **CI**: hooks don't fire in CI. Use `rafter scan` + `rafter policy validate` in the pipeline instead.
+## Common Pitfalls
+- A `low` classification is not a safety guarantee — it means "no known-bad pattern matched". Still review unusual commands.
+- Chaining defeats the safe-prefix allowlist on purpose (`ls && rm -rf /` is not low-risk).
+- `sudo` always escalates to at least `high` regardless of the wrapped command.
+- Secret leaks in arguments (`curl -H "Authorization: Bearer abc..."`) are flagged by posttool scanning, not by the pretool risk classifier.
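
The tiering in `guardrails.md` can be sketched as a small classifier. This is a minimal illustration under stated assumptions, not the contents of `src/core/risk-rules.ts`:

- The regexes are invented examples.
- The safe-prefix list is truncated.
- Returning `low` when nothing matches is assumed from the pitfall that `low` only means no known-bad pattern matched.

```typescript
// Hypothetical sketch of tiered command classification. The pattern contents
// are illustrative assumptions, not Rafter's real rule set.
type RiskLevel = "low" | "medium" | "high" | "critical";

const CRITICAL_PATTERNS: RegExp[] = [
  /\brm\s+-[rRf]+\s+\/(\s|$)/, // rm -rf / (the root itself)
  /\bdrop\s+(database|table)\b/i, // SQL drop
];

const HIGH_PATTERNS: RegExp[] = [
  /\bsudo\b/, // privileged: always at least high
  /\bgit\s+push\b.*--force(?!-with-lease)/, // force push without a lease
  /\bcurl\b[^|]*\|\s*(ba)?sh\b/, // curl ... | sh
  /\brm\s+-[rRf]*[rR][rRf]*\s/, // any recursive delete
];

const MEDIUM_PATTERNS: RegExp[] = [
  /\b(npm|pnpm|yarn)\s+(install|add)\b/, // package installs
  /\bgit\s+(commit|push)\b/, // state-changing but recoverable
];

const SAFE_PREFIX: string[] = ["ls", "cat", "grep", "git status"];

// Any chain operator disqualifies the safe-prefix shortcut.
const CHAIN_OPERATORS = /&&|\|\||;|\|/;

function classify(command: string): RiskLevel {
  const cmd = command.trim();
  const hasSafePrefix = SAFE_PREFIX.some((p) => cmd === p || cmd.startsWith(p + " "));
  // Shortcut: a lone read-only command is low without further checks.
  if (hasSafePrefix && !CHAIN_OPERATORS.test(cmd)) return "low";
  // Otherwise check from most to least severe; the first match wins.
  if (CRITICAL_PATTERNS.some((re) => re.test(cmd))) return "critical";
  if (HIGH_PATTERNS.some((re) => re.test(cmd))) return "high";
  if (MEDIUM_PATTERNS.some((re) => re.test(cmd))) return "medium";
  // No known-bad pattern matched. This is not a safety guarantee.
  return "low";
}
```

With these assumptions, `classify("ls && rm -rf /")` skips the safe-prefix shortcut because of `&&` and returns `critical`. `classify("sudo apt-get update")` returns `high` because of the `sudo` rule.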

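The policy overrides and the hook's response fields can be combined in a second sketch that takes the built-in tier as input. Only the field names `blocked_patterns`, `require_approval`, `allow`, `decision`, `riskLevel` and `reason` come from `guardrails.md`. The rest is assumed:

- Merging replaces each key per layer.
- Patterns are treated as JavaScript regexes.
- `allow` is checked before `blocked_patterns`, so a scoped allow such as `^terraform destroy -target=module\\.sandbox` can override a broader block.

`ruleId` is omitted because its format is not documented here.

```typescript
// Hypothetical sketch of applying .rafter.yml-style overrides to a built-in
// tier. Merge semantics and precedence below are assumptions.
type RiskLevel = "low" | "medium" | "high" | "critical";
type Decision = "approve" | "ask" | "block";

interface RiskPolicy {
  blocked_patterns?: string[];
  require_approval?: string[];
  allow?: string[];
}

interface HookResult {
  decision: Decision;
  riskLevel: RiskLevel;
  reason: string;
}

// Default tier-to-decision mapping from the risk table.
const DEFAULT_DECISION: Record<RiskLevel, Decision> = {
  low: "approve",
  medium: "approve", // approved, with a note in the audit log
  high: "ask",
  critical: "block",
};

// Most specific wins: a key set in a later layer replaces the earlier value
// (assumed per-key replacement, not list concatenation).
function mergePolicies(defaults: RiskPolicy, globalCfg: RiskPolicy, project: RiskPolicy): RiskPolicy {
  return { ...defaults, ...globalCfg, ...project };
}

function firstMatch(patterns: string[] | undefined, command: string): string | undefined {
  return patterns?.find((p) => new RegExp(p).test(command));
}

function decide(command: string, builtinTier: RiskLevel, policy: RiskPolicy): HookResult {
  // Assumed precedence: allow, then blocked_patterns, then require_approval.
  const allowed = firstMatch(policy.allow, command);
  if (allowed !== undefined) {
    // "force low regardless of content"
    return { decision: "approve", riskLevel: "low", reason: `allow: ${allowed}` };
  }
  const blocked = firstMatch(policy.blocked_patterns, command);
  if (blocked !== undefined) {
    return { decision: "block", riskLevel: builtinTier, reason: `blocked_patterns: ${blocked}` };
  }
  const approval = firstMatch(policy.require_approval, command);
  // A policy prompt never softens a built-in critical block.
  if (approval !== undefined && builtinTier !== "critical") {
    return { decision: "ask", riskLevel: builtinTier, reason: `require_approval: ${approval}` };
  }
  return { decision: DEFAULT_DECISION[builtinTier], riskLevel: builtinTier, reason: `built-in ${builtinTier} tier` };
}
```

Because `allow` wins in this ordering, an unanchored allow pattern would silently approve far more than intended. Keep allows anchored and narrow, as in the `.rafter.yml` examples.
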
package/resources/skills/rafter/docs/shift-left.md ADDED

@@ -0,0 +1,64 @@
+# Shift-Left — Secure Design & Code Review Skills
+`rafter` (this skill) handles **detection**: scanners, hooks, risk classifiers. Two sibling skills cover the earlier stages of the lifecycle — use them when prevention or structured review is more valuable than another scan pass.
+## Decision Tree
+| You're trying to … | Reach for |
+|---|---|
+| Write code that **doesn't have the flaw in the first place** (design phase, picking primitives, shaping APIs) | `rafter-secure-design` |
+| **Review existing code** against OWASP Top 10 / MITRE ATT&CK / ASVS with a structured walkthrough | `rafter-code-review` |
+| Find concrete bugs / leaks / CVEs automatically | stay in this skill — see branch (a) in SKILL.md |
+The three skills compose: design well (secure-design) → write it → review it (code-review) → detect what slipped through (rafter scan + guardrails).
+## `rafter-secure-design` (landed, rf-bcr)
+Use at feature kickoff or during architecture review, *before* code exists. It's a choose-your-own-adventure (CYOA) router over design decisions:
+- Authn / authz primitives: which to pick, which to refuse (e.g. homegrown JWT signing).
+- Input boundaries: where to validate, where to escape, where to parameterize.
+- Secrets handling: storage, rotation, scoping of least-privilege credentials.
+- Data-in-transit / data-at-rest defaults per language/framework.
+- Threat modeling prompts: STRIDE-style walks you can run with an agent.
+Invoke it by name in platforms that auto-trigger skills, or:
+```bash
+rafter brief shift-left      # this doc
+# and then load the sibling:
+#   Read skills/rafter-secure-design/SKILL.md
+```
+## `rafter-code-review` (landed)
+Use during code review — your own or an AI's. A CYOA router into OWASP / MITRE / ASVS walkthroughs phrased as *questions*, not as monolithic audits. It's the *analysis* counterpart to automated scanning.
+Pick the category that matches the code in front of you:
+- **Web app** → `rafter-code-review/docs/web-app.md` (OWASP Top 10 2021).
+- **REST / GraphQL / gRPC API** → `rafter-code-review/docs/api.md` (OWASP API Top 10 2023).
+- **LLM-integrated feature** → `rafter-code-review/docs/llm.md` (OWASP LLM Top 10 2025).
+- **CLI / library / IaC** → `rafter-code-review/docs/cwe-top25.md` (MITRE CWE Top 25, keyed by language).
+- **Need to pick review depth** → `rafter-code-review/docs/asvs.md` (ASVS L1/L2/L3 selection + spot-checks).
+- **Single suspicious finding to chase** → `rafter-code-review/docs/investigation-playbook.md`.
+Start at `rafter-code-review/SKILL.md` — it's a router; Read only the one sub-doc you need so you don't flood context.
+Pair with `rafter run --mode plus` when you want both a human-style walkthrough and the backend's deep pass on the same diff.
+## When to use which (cheat sheet)
+- Designing a new service → **secure-design**.
+- Reviewing a teammate's PR by eye → **code-review**.
+- CI gate / pre-push / scheduled scan → **rafter** (this skill), `rafter run` / `rafter scan local`.
+- "I have a finding, now what?" → **rafter**, `docs/finding-triage.md`.
+- "I have a risky command, is it safe?" → **rafter**, `docs/guardrails.md`.
+Do not duplicate. If a sibling skill already owns the topic, Read it and stop — don't re-derive the checklist here.
+## Status
+- `rafter-code-review` — **landed** (rf-z7j). Ships alongside this skill; invoke directly.
+- `rafter-secure-design` — **landed** (rf-bcr). Ships alongside this skill; invoke directly. Router skill with sub-docs for auth, data storage, API design, ingestion, deployment, dependencies, threat modeling, and standards pointers.
+Both are installed — prefer invoking them directly for structured output over re-deriving checklists here.

package/resources/skills/rafter-code-review/SKILL.md ADDED

@@ -0,0 +1,91 @@
+---
+name: rafter-code-review
+description: "Structured security code review — OWASP / MITRE / ASVS walkthroughs as questions, not audits. Router skill: pick what kind of code you're reviewing (web app, REST/GraphQL API, LLM-integrated, CLI/library/IaC) and Read the matching sub-doc. Designed to pair with `rafter scan` / `rafter run` — the scanner finds known-bad patterns, this skill asks the questions that patterns miss. Use during PR review, refactoring risky modules, or pre-release hardening."
+version: 0.7.0
+allowed-tools: [Bash, Read, Glob, Grep]
+---
+# Rafter Code Review — Structured Security Walkthroughs
+A reviewer's skill, not an audit generator. Each sub-doc is a set of **questions** to run against the code — what to grep for, what to trace, what to ask before you sign off. No monolithic reports.
+> Pair with the `rafter` skill (detection: `rafter scan`, `rafter run`) and `rafter-secure-design` (prevention: design-phase walks). This skill is the middle stage — review before merge.
+## How to use this skill
+1. Identify the category of code in front of you (below).
+2. `Read` only the matching sub-doc — do not preload them all.
+3. Work through its questions against the specific files/diff. Cite file:line evidence as you go.
+4. When in doubt on a single finding, jump to `docs/investigation-playbook.md` for canonical follow-up questions.
+5. Finish with `rafter run --mode plus` on the same diff if the stakes warrant a deep automated pass.
+---
+## Choose Your Adventure
+### (1) Web application (server-rendered, session-based, or SPA backend)
+For: login flows, session/cookie handling, form handlers, template rendering, admin panels, anything browser-facing.
+- **Read `docs/web-app.md`** — OWASP Top 10 (2021) walk: broken access control, crypto failures, injection, insecure design, misconfig, vulnerable components, authn failures, integrity failures, logging gaps, SSRF.
+### (2) REST / GraphQL / gRPC API (machine-to-machine, mobile backend, public API)
+For: endpoint surface that isn't primarily rendering HTML — tokens instead of sessions, authz-per-endpoint, rate limiting.
+- **Read `docs/api.md`** — OWASP API Security Top 10 (2023): BOLA, broken authn, BOPLA, unrestricted resource consumption, BFLA, unrestricted access to sensitive business flows, SSRF, misconfig, improper inventory, unsafe consumption of third-party APIs.
+### (3) LLM-integrated feature (prompts, agents, tools, RAG, embeddings)
+For: anything that sends user text to a model, uses tool calls, retrieves untrusted context, or ships model output to a downstream system.
+- **Read `docs/llm.md`** — OWASP LLM Top 10 (2025): prompt injection, sensitive info disclosure, supply chain, data/model poisoning, improper output handling, excessive agency, system prompt leakage, vector/embedding weaknesses, misinformation, unbounded consumption.
+### (4) CLI, library, or infra-as-code
+For: build tooling, developer CLIs, shared SDK packages, Terraform / CloudFormation / Kubernetes manifests, shell scripts.
+- **Read `docs/cwe-top25.md`** — MITRE CWE Top 25, keyed by language (Python / JS / Go / Rust / Java) and by IaC primitive. Focus on injection, memory safety, path traversal, race conditions, privilege mismanagement.
+### (5) I need to pick the right depth for this review
+For: "how hard should I look?", scoping a review before starting, compliance-adjacent changes.
+- **Read `docs/asvs.md`** — OWASP ASVS L1 / L2 / L3. Picks the level based on risk tier of the code, then gives spot-check questions per level.
+### (6) I have one specific question to investigate
+For: single-finding follow-up, tracing a suspicious call, "is this input reachable from outside?".
+- **Read `docs/investigation-playbook.md`** — canonical questions: reachability, authz coverage, data-flow direction, trust boundary placement.
+---
+## What this skill will NOT do
+- It will not generate a monolithic "security audit report". If you need a report, run `rafter run --mode plus` — the backend is better at that.
+- It will not replace automated scanning. Always pair with `rafter scan local .` (secrets) and `rafter run` (SAST/SCA) before review.
+- It will not produce recommendations without evidence. Every question expects a file:line answer before moving on.
+---
+## Fast path for a typical PR review
+```bash
+# 1. Run deterministic checks first — cheap, catches the obvious
+rafter scan local .
+rafter run                    # remote SAST/SCA, if RAFTER_API_KEY set
+# 2. Then pick the category and walk the questions
+#    Read docs/<category>.md
+```
+If the diff spans categories (e.g. a web app that also has an LLM feature), Read both sub-docs and walk them sequentially. Don't try to merge the checklists.
+---
+## Tie-backs
+- Finding from the scanner you don't understand? → `rafter` skill, `docs/finding-triage.md`.
+- Designing a new feature instead of reviewing one? → `rafter-secure-design`.
+- Risky command came up mid-review? → `rafter` skill, `docs/guardrails.md`.
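
The Choose Your Adventure section of `rafter-code-review/SKILL.md` is effectively a lookup table. The following is a hypothetical encoding of it. The category keys are invented labels, and the paths are the sub-docs the skill names. It also follows the skill's rule for diffs that span categories: Read each matching sub-doc once, in order, and walk them one after another rather than merging them.

```typescript
// Hypothetical encoding of the router; category keys are invented labels.
type ReviewCategory = "web-app" | "api" | "llm" | "cli-lib-iac" | "depth" | "single-finding";

const SUB_DOC: Record<ReviewCategory, string> = {
  "web-app": "docs/web-app.md",
  api: "docs/api.md",
  llm: "docs/llm.md",
  "cli-lib-iac": "docs/cwe-top25.md",
  depth: "docs/asvs.md",
  "single-finding": "docs/investigation-playbook.md",
};

// Sub-docs to Read for a diff, each once, in the order the categories were
// identified. The caller walks them sequentially instead of merging them.
function docsToRead(categories: ReviewCategory[]): string[] {
  const ordered: string[] = [];
  for (const category of categories) {
    const doc = SUB_DOC[category];
    if (!ordered.includes(doc)) ordered.push(doc);
  }
  return ordered;
}
```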

package/resources/skills/rafter-code-review/docs/api.md ADDED

@@ -0,0 +1,90 @@
+# API Review — OWASP API Security Top 10 (2023)
+REST / GraphQL / gRPC review: authz-per-endpoint, per-object and per-field; rate limiting; bulk operations. Walk each category as questions. Cite file:line before moving on.
+## API1 — Broken Object Level Authorization (BOLA)
+The most common API vuln. Per-object authz, not per-endpoint.
+- For every handler that takes an id (`/orders/:id`, `/users/:user_id/settings`), is there a check that the id belongs to the caller? "Authenticated" is not "authorized".
+- Grep patterns: `findById`, `SELECT ... WHERE id = ?`, `get_object_or_404`. Is the caller's identity in the query, or compared after?
+- GraphQL: authz at the resolver level for *each* field that returns a user-owned object. Schema-level auth is not enough if resolvers fan out.
+- UUIDs do not save you. They only slow discovery; they do not provide authorization.
+## API2 — Broken Authentication
+- Every unauthenticated endpoint — is it supposed to be? List them: grep for `@AnonymousAllowed`, `permission_classes = []`, middleware skips.
+- Token lifetime, refresh, and revocation: where is a token invalidated on logout / password change / user deletion?
+- JWT-specific: is `alg` pinned? Is the key rotated? Is `iss` / `aud` checked? Is clock skew bounded?
+- API keys: how are they generated (entropy), stored (hashed?), scoped (per-tenant? per-capability?), rotated?
+- Credential endpoints (login, reset, MFA enroll) — rate-limited separately from normal endpoints? Return generic errors? Constant-time compare?
+## API3 — Broken Object Property Level Authorization (BOPLA)
+Covers both mass-assignment and excessive data exposure.
+- Serialization: when returning an object, are sensitive fields (`password_hash`, `mfa_secret`, `internal_notes`, `role`) explicitly excluded? "Return the model" is a red flag; "return a DTO" is the fix.
+- Mass assignment: can the client set fields they shouldn't? `User.objects.update(**request.data)`, `req.body` spread into an ORM constructor, Rails `params.permit!`. Check every update/create path.
+- GraphQL: schema exposes fields; are resolvers authz-checked per field? Can a non-admin introspect admin-only fields?
+## API4 — Unrestricted Resource Consumption
+- Pagination on every list endpoint? Max page size enforced server-side (not just a default)?
+- Rate limits: per-user, per-IP, per-endpoint. Token bucket? What happens at the limit — 429 with `Retry-After`, or silent 500?
+- Request size limits: body size, file upload size, JSON depth, GraphQL query depth / complexity.
+- Expensive operations: image processing, PDF generation, report export — are they queued, timeboxed, cost-accounted?
+- Amplification: does one API call trigger N outbound calls (email, SMS, push)? Can that N be user-controlled?
+## API5 — Broken Function Level Authorization (BFLA)
+Different from BOLA — this is "can a regular user invoke an admin function at all?", not "can user A touch user B's data?".
+- List admin / privileged endpoints. For each, is there a role check? Is the role from a trusted source (session/token claim) or from the request (`X-Role: admin`)?
+- HTTP verb confusion: does the handler accept PUT/PATCH/DELETE when only GET was authz'd? Are method restrictions on the router or in the handler?
+- Feature flags: does the flag gate *access* or only *visibility*? If the endpoint is reachable when the flag is off, the flag isn't security.
+## API6 — Unrestricted Access to Sensitive Business Flows
+- Identify flows worth abusing: signup, promo code redemption, ticket/inventory purchase, "add friend", "send invite".
+- Per-flow: is there anti-automation (captcha, proof of work, device fingerprint, delay between steps)? Rate limit per account *and* per payment instrument *and* per IP range?
+- Does the flow leak enumeration? Signup "email already registered" is a known tradeoff — is it the right one here?
+## API7 — Server-Side Request Forgery
+(Same question set as web-app A10 — see `web-app.md`.)
+- Webhook configurators, URL-based imports, OAuth discovery endpoints, image fetchers: any user-supplied URL that the server fetches?
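+- DNS rebinding: is the hostname resolved once, validated, and that same IP used for the connection, or is it re-resolved at connect time or on each redirect (giving an attacker a second, different answer)? Are redirects followed blindly?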
+- Cloud metadata (`169.254.169.254`, `metadata.google.internal`) explicitly blocked?
+## API8 — Security Misconfiguration
+- Error responses: do they include stack traces, SQL fragments, internal hostnames? Production should return stable error shapes only.
+- CORS per-endpoint: any endpoint with `Allow-Credentials: true` *and* a reflected / wildcard origin?
+- Default routes from frameworks still mounted (`/actuator/*`, `/debug/*`, `/_next/*` in dev mode)?
+- Are OPTIONS responses correctly restrictive? Do HEAD and OPTIONS follow the same authz as GET?
+- TLS: are older APIs allowed to accept plain HTTP for backwards compat? If yes — is that documented and scoped?
+## API9 — Improper Inventory Management
+A governance issue, but reviewable:
+- Is there an API version registry? When this PR adds or changes an endpoint, is it documented (OpenAPI / GraphQL schema committed)?
+- Are deprecated endpoints marked and scheduled for removal? Still reachable in production?
+- Non-prod environments (staging, sandbox) — do they share data, credentials, or network paths with prod? Often the weakest link.
+## API10 — Unsafe Consumption of Third-Party APIs
+- Outbound API calls: is the response validated before use (schema, size, type)? "Trust the third party" is the failure mode.
+- Credentials to third parties: scoped to least privilege? Rotated? Not shared across tenants?
+- What happens on timeout / 5xx from the third party? Fallback to cached data? Log and surface?
+- If the third party is compromised, what is the blast radius here? Does our data flow into untrusted callbacks?
+---
+## Exit criteria
+- Every endpoint touched by the diff has a documented answer for API1 (per-object authz) and API5 (per-function authz).
+- Every new third-party integration has answers for API10.
+- Every new flow has a rate-limit story (API4) and an abuse story (API6).
+- Scanner cross-check: run `rafter run` and reconcile SAST findings against this walk.
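
The following is a minimal in-memory sketch of three questions from `api.md`, assuming a hypothetical `OrderRecord` shape:

- **API1:** the caller's id goes into the lookup itself.
- **API3:** the handler returns an explicit DTO instead of the stored record.
- **API4:** the server clamps the page size.

It covers only the read side of API3. The mass-assignment half needs the same allowlist applied to incoming update fields.

```typescript
// Hypothetical in-memory sketch; OrderRecord/OrderDTO and the limits are
// illustrative, not from any particular framework.
type OrderStatus = "open" | "shipped";

interface OrderRecord {
  id: string;
  ownerId: string;
  total: number;
  status: OrderStatus;
  internalNotes: string; // server-only field
}

// API3: the response shape lists exactly what a caller may see.
interface OrderDTO {
  id: string;
  total: number;
  status: OrderStatus;
}

const DEFAULT_PAGE_SIZE = 20;
const MAX_PAGE_SIZE = 50;

function toOrderDTO(order: OrderRecord): OrderDTO {
  return { id: order.id, total: order.total, status: order.status };
}

// API1: the caller's id is part of the lookup, not a check bolted on after it.
function getOrder(orders: OrderRecord[], callerId: string, orderId: string): OrderDTO | undefined {
  const found = orders.find((o) => o.id === orderId && o.ownerId === callerId);
  return found === undefined ? undefined : toOrderDTO(found);
}

// API4: the server enforces the cap no matter what limit the client asks for.
function listOrders(orders: OrderRecord[], callerId: string, requestedLimit: number): OrderDTO[] {
  const limit =
    Number.isInteger(requestedLimit) && requestedLimit > 0
      ? Math.min(requestedLimit, MAX_PAGE_SIZE)
      : DEFAULT_PAGE_SIZE;
  return orders
    .filter((o) => o.ownerId === callerId)
    .slice(0, limit)
    .map(toOrderDTO);
}
```

Returning `undefined` for someone else's order, rather than a distinct "forbidden" result, is a design choice here: it avoids confirming that the id exists. Whether the HTTP layer answers 403 or 404 is a per-API decision.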

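For the API7 questions in `api.md`, the core check is on the *resolved* address, not the URL string. The sketch below assumes the caller has already resolved the hostname and will connect to exactly the address it validated, so a rebinding answer cannot slip in between check and connect. Every redirect target would go through the same check.

The blocklist covers common IPv4 private, loopback, link-local (including `169.254.169.254`) and reserved ranges. It is illustrative rather than exhaustive. Anything that is not a canonical dotted-quad IPv4 address, including every IPv6 address, is rejected rather than handled.

```typescript
// Hypothetical SSRF guard: is an already-resolved address safe to connect to?
// IPv4 only; anything else (including all IPv6) fails closed in this sketch.
const BLOCKED_V4: Array<[string, number]> = [
  ["0.0.0.0", 8], // "this" network
  ["10.0.0.0", 8], // private
  ["100.64.0.0", 10], // carrier-grade NAT
  ["127.0.0.0", 8], // loopback
  ["169.254.0.0", 16], // link-local, includes the 169.254.169.254 metadata IP
  ["172.16.0.0", 12], // private
  ["192.168.0.0", 16], // private
  ["224.0.0.0", 4], // multicast
  ["240.0.0.0", 4], // reserved, includes 255.255.255.255
];

// Strict dotted-quad parser: octets with leading zeros are rejected.
function parseIPv4(addr: string): number | undefined {
  const parts = addr.split(".");
  if (parts.length !== 4) return undefined;
  let value = 0;
  for (const part of parts) {
    if (!/^(0|[1-9]\d{0,2})$/.test(part)) return undefined;
    const octet = Number(part);
    if (octet > 255) return undefined;
    value = value * 256 + octet; // plain arithmetic avoids signed 32-bit bitwise ops
  }
  return value;
}

function inBlock(ip: number, base: string, prefixLen: number): boolean {
  const size = 2 ** (32 - prefixLen);
  const baseValue = parseIPv4(base) as number; // table entries are valid literals
  return Math.floor(ip / size) === Math.floor(baseValue / size);
}

function isAllowedTarget(resolvedAddr: string): boolean {
  const ip = parseIPv4(resolvedAddr);
  if (ip === undefined) return false;
  return !BLOCKED_V4.some(([base, prefixLen]) => inBlock(ip, base, prefixLen));
}
```

Rejecting octets with leading zeros matters because some parsers read `0177.0.0.1` as octal, that is, as `127.0.0.1`.
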
package/resources/skills/rafter-code-review/docs/asvs.md ADDED

@@ -0,0 +1,120 @@
+# OWASP ASVS — Picking a Level, Spot-Checking It
+The Application Security Verification Standard (ASVS 4.0 / 5.0) is a catalog of verification requirements. Unlike Top 10 lists, it's exhaustive — which is why you pick a level first and spot-check, not walk every item.
+## Step 1 — Pick the right level
+| Level | Use for | Rough test |
+|---|---|---|
+| **L1** — Opportunistic | Low-value apps, internal tooling, marketing sites. "Protect against casual, opportunistic attackers." | Would losing this data be annoying but not damaging? |
+| **L2** — Standard | Most apps that handle user data, B2B SaaS, line-of-business apps. "Default for apps handling sensitive data." | PII, payment, auth, health-adjacent, B2B tenants? |
+| **L3** — Advanced | Apps where compromise leads to real harm: financial transactions, healthcare records, critical infrastructure, high-trust platforms. | Regulatory scrutiny? Lives/money at risk? |
+**Rule of thumb**: pick the lowest level that matches the *highest-sensitivity* data flow in the scope. Don't average: a single admin endpoint that touches payment data pulls the whole review scope to L2, even if every other endpoint would qualify for L1.
+---
+## Step 2 — Spot-check (not walk-every-item)
+ASVS has 280+ requirements. You will not walk them all in a PR review. Instead, for the level you picked, work through the spot-check questions below for each category.
+### V1 — Architecture, Design, Threat Modeling
+- Is there a threat model for this feature? (L2+: yes; L3: reviewed and signed.)
+- Are all components inventoried (deps, services, data stores)?
+- Does the design document trust boundaries and assumptions?
+### V2 — Authentication
+- Password policy: min length 8 (L1) / 12 (L2) / 12+MFA (L3); hashed with argon2/bcrypt/scrypt; never truncated or case-normalized in storage.
+- MFA: not required at L1; required for admin/sensitive at L2; required for all at L3.
+- Credential recovery: does not bypass MFA; uses time-limited, single-use tokens; does not leak account existence.
+### V3 — Session Management
+- Server-side session store (or stateless token with revocation)?
+- Session rotation on login / privilege change / logout?
+- Cookie flags: `Secure`, `HttpOnly`, `SameSite=Lax` or stricter; `Domain` scoped tightly.
+### V4 — Access Control
+- Deny-by-default on new endpoints?
+- Per-object authz (BOLA) enforced where user ids appear in URLs?
+- Admin functions require re-authentication at L2+; require MFA step-up at L3.
+### V5 — Validation, Sanitization, Encoding
+- Input: validated at the trust boundary (not downstream) against a positive spec (allowlist, schema, regex anchored).
+- Output: context-aware encoding (HTML / URL / JSON / shell / SQL). Templating auto-escapes?
+- Parsers: safe loaders for YAML/XML/JSON; no string-concat into query languages.
+### V6 — Stored Cryptography
+- Algorithms: AES-GCM, ChaCha20-Poly1305, SHA-256+, HMAC-SHA-256+, PBKDF2/argon2 for passwords.
+- Key management: keys not in source, rotated, scoped per-environment, stored in a managed KMS at L2+.
+- Randomness: `secrets` / `crypto.randomBytes` / `crypto/rand` — never `rand()` / `Math.random()`.
+### V7 — Error Handling & Logging
+- No secrets / PII in logs (grep log statements).
+- No stack traces to the user in production.
+- Authn, authz, and sensitive actions logged with who/when/what.
+### V8 — Data Protection
+- PII classified? Access to it logged?
+- Data at rest encrypted (disk, DB, backups, cache)? Keys managed separately?
+- Data in transit: TLS 1.2+ with modern ciphers; HSTS.
+### V9 — Communications
+- TLS everywhere, including internal hops? At L3, mutual TLS between services.
+- Certificate validation not disabled anywhere (grep `InsecureSkipVerify`, `verify=False`, `rejectUnauthorized: false`).
+### V10 — Malicious Code
+- No hardcoded backdoors, debug logins, "magic" accounts.
+- Build pipeline integrity: signed artifacts, locked deps, reproducible where possible.
+### V11 — Business Logic
+- Sequential flows (checkout, signup, reset): can steps be skipped? Replayed? Reordered?
+- Anti-automation on abuse-prone flows (captcha, proof of work, device fingerprint)?
+### V12 — Files & Resources
+- Uploads: type-sniffed (not extension-trusted), size-limited, stored outside web root, name-randomized.
+- Downloads: path-confined; no `../` traversal; MIME set explicitly.
+- Archives: zip-slip / tar-slip prevention when extracting.
+### V13 — API & Web Service
+- OpenAPI / GraphQL schema present and matches implementation?
+- Auth per-endpoint, not per-service?
+- Rate limits per-endpoint tier?
+### V14 — Configuration
+- Production config reviewed (debug off, defaults rotated, unused features off)?
+- Dependencies: SCA in CI, lockfile committed, base images pinned.
+- Secrets: none in source, rotated on exposure, scoped per environment.
+---
+## Step 3 — What to produce
+Not an ASVS report. A list of **citations** keyed by (level, category, requirement), plus **gaps**. Example:
+```
+L2 / V4.1.3 (deny by default) — OK, `authMiddleware` registered globally in app.ts:41
+L2 / V4.2.1 (per-object authz) — GAP, /orders/:id handler (orders.ts:88) lacks owner check
+L2 / V5.1.1 (schema validation) — OK, zod schemas in routes/*.ts
+```
+---
+## Tie-backs
+- Concrete vulnerabilities (not requirements): go to `web-app.md` / `api.md` / `llm.md`.
+- Specific finding investigation: `investigation-playbook.md`.
+- Automated coverage: `rafter run --mode plus` produces ASVS-tagged findings.
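
For the V2 and V6 questions in `asvs.md` (password hashing and randomness), Node's built-in `crypto` module is enough for a salted scrypt hash with a constant-time check. This is a sketch. The `scrypt$salt$hash` storage format and the cost parameters are illustrative choices, not values tuned for any particular hardware.

```typescript
import { Buffer } from "node:buffer";
import { randomBytes, scryptSync, timingSafeEqual } from "node:crypto";

// Illustrative cost parameters; raise N as far as your login latency allows.
// maxmem must exceed roughly 128 * N * r bytes (32 MiB here).
const SCRYPT_OPTIONS = { N: 2 ** 15, r: 8, p: 1, maxmem: 64 * 1024 * 1024 };
const KEY_LEN = 32;

// Stored as "scrypt$<salt hex>$<hash hex>"; the format is an example only.
function hashPassword(password: string): string {
  const salt = randomBytes(16); // CSPRNG, never Math.random()
  const hash = scryptSync(password, salt, KEY_LEN, SCRYPT_OPTIONS);
  return `scrypt$${salt.toString("hex")}$${hash.toString("hex")}`;
}

function verifyPassword(password: string, stored: string): boolean {
  const [scheme, saltHex, hashHex] = stored.split("$");
  if (scheme !== "scrypt" || !saltHex || !hashHex) return false;
  const expected = Buffer.from(hashHex, "hex");
  if (expected.length !== KEY_LEN) return false;
  const actual = scryptSync(password, Buffer.from(saltHex, "hex"), KEY_LEN, SCRYPT_OPTIONS);
  // Constant-time comparison of two equal-length buffers.
  return timingSafeEqual(actual, expected);
}
```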

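ASVS V12's "path-confined; no `../` traversal" is the same check as CWE-22 in `cwe-top25.md`: resolve the candidate path and verify it stays under an allowed root. Below is a minimal sketch with Node's `path` module. It is purely lexical, so symlinks inside the root are not handled.

```typescript
import * as path from "node:path";

// Resolve a user-supplied path under a fixed root; undefined means it escapes.
// Lexical only: symlinks inside the root are not followed here.
function resolveInside(root: string, userPath: string): string | undefined {
  const base = path.resolve(root);
  const target = path.resolve(base, userPath);
  const rel = path.relative(base, target);
  // Escaping shows up as a leading ".." segment, or (on Windows) as an
  // absolute path when the target is on a different drive.
  if (rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel)) {
    return undefined;
  }
  return target;
}
```

To cover symlinks, canonicalize both paths with `fs.realpathSync` once the file exists and compare those results instead. Comparing with `path.relative` also avoids the classic prefix bug where `/srv/files-private` passes a `startsWith("/srv/files")` test.
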
package/resources/skills/rafter-code-review/docs/cwe-top25.md ADDED

@@ -0,0 +1,78 @@
+# CWE Top 25 — Language-Keyed Checklist
+MITRE's CWE Top 25 is weakness-level, not risk-level. Use this for CLI tools, libraries, IaC, and anything where OWASP's web/API framing doesn't fit. Pick the language section; pair with language-specific linters.
+## How to use
+- Grep the patterns below against the diff. Each hit is a question, not a verdict.
+- Cross-reference with `rafter run` — the backend catches many of these via SAST; this doc covers the ones that take context to judge.
+---
+## Cross-language (applies everywhere)
+- **CWE-79 XSS / CWE-89 SQLi / CWE-78 OS Command Injection** — any user input reaching a query language, shell, or HTML sink. Fix at the sink: parameterize, array-exec, autoescape.
+- **CWE-22 Path Traversal** — any `open(path)`, `fs.readFile(path)`, `os.path.join(base, user_input)`. Canonicalize (`realpath` / `filepath.Abs`) and verify the result stays under an allow-root.
+- **CWE-352 CSRF** — state-changing endpoints: is there a token check? SameSite cookies are necessary but not sufficient for cross-site POSTs in older browsers / API clients.
+- **CWE-287 Improper Authentication / CWE-862 Missing Authorization** — covered in web-app.md / api.md.
+- **CWE-798 Hardcoded Credentials** — `rafter scan local .` catches literal secrets; manually check env-var defaults (`API_KEY = os.environ.get("KEY", "dev-fallback-abc123")` ships the fallback).
+- **CWE-918 SSRF** — any user-supplied URL fetched server-side. See web-app.md A10.
+---
+## Python
+- **CWE-502 Insecure Deserialization** — `pickle.load`, `pickle.loads`, `yaml.load` without `SafeLoader`, `shelve`, `marshal`. Any of these on untrusted bytes is RCE.
+- **CWE-78 / Subprocess** — `subprocess.run(..., shell=True)` with user input. Use list form: `subprocess.run(["cmd", arg])`, never `shell=True` with interpolated input.
+- **CWE-94 Code Injection** — `eval`, `exec`, `compile`, `__import__` with user input. Also `pd.eval`, `numexpr.evaluate`.
+- **CWE-611 XXE** — `xml.etree.ElementTree` is safe by default in 3.7+, but `lxml.etree.parse` with `resolve_entities=True` is not. Prefer `defusedxml`.
+- **CWE-327 Weak Crypto** — `hashlib.md5` / `sha1` on passwords; `random` (not `secrets`) for tokens; `Crypto.Cipher.DES`, `AES.MODE_ECB`.
+- **CWE-20 Input Validation** — type coercion pitfalls: `int(x)` raises, `int(x, 16)` accepts leading `0x`, `float("inf")`.
+## JavaScript / TypeScript
+- **CWE-1321 Prototype Pollution** — `_.merge`, `Object.assign` with user-controlled source, recursive deep-merge on user JSON. Node: affects the whole process.
+- **CWE-79 XSS** — `innerHTML`, `outerHTML`, `document.write`, `dangerouslySetInnerHTML`, `v-html`, `$sce.trustAsHtml`. React's default is safe; anything that bypasses it is the finding.
+- **CWE-94 Code Injection** — `eval`, `new Function(str)`, `setTimeout(str, ...)`, `setInterval(str, ...)`, `vm.runInThisContext` with user input.
+- **CWE-22 Path Traversal** — `path.join(base, userInput)` does not protect. Must resolve and verify containment.
+- **CWE-400 Regex DoS (ReDoS)** — catastrophic backtracking patterns: `(a+)+`, `(.*)*`. Especially user-provided regexes.
+- **CWE-346 Origin Validation** — `postMessage` handlers that don't check `event.origin`. `addEventListener("message", ...)` without origin check is the bug.
+## Go
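+- **CWE-369 Divide-by-Zero / CWE-190 Integer Overflow** — integer division by zero panics (a DoS if the divisor is attacker-controlled), but integer overflow does not panic: it wraps silently. Watch allocations and slices sized by computed values, e.g. `make([]byte, headerLen)` where `headerLen` is attacker-controlled.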
+- **CWE-362 Race Conditions** — map writes without mutex; goroutines sharing non-channel state; `context.Value` for mutable data. Run `go test -race`.
+- **CWE-74 / CWE-78 Command Injection** — `exec.Command(name, args...)` does not invoke a shell, so metacharacters in `args` are inert. `exec.Command("sh", "-c", userInput)` reintroduces the shell and is the finding.
+- **CWE-295 Improper Certificate Validation** — `tls.Config{InsecureSkipVerify: true}` outside tests.
+- **CWE-400 Resource Consumption** — `io.ReadAll` on untrusted streams with no `io.LimitReader`. Goroutine leaks: for every `go f()`, how does it exit?
+- **CWE-665 Improper Initialization** — zero-value structs used as "valid" config; `sync.Mutex` copied by value.
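The Go bullets on `make([]byte, headerLen)` and `io.ReadAll` without `io.LimitReader` share one rule: check an attacker-supplied length against a cap before allocating or reading. Here is a minimal sketch of that guard in Python. The framing (a 4-byte big-endian length header followed by the payload) and the 1 MiB cap are hypothetical.

```python
import io
import struct
from typing import BinaryIO

MAX_FRAME = 1 << 20  # hypothetical 1 MiB cap; choose per protocol


def read_frame(stream: BinaryIO, max_len: int = MAX_FRAME) -> bytes:
    """Read one length-prefixed frame, refusing oversized lengths up front."""
    header = stream.read(4)
    if len(header) != 4:
        raise ValueError("truncated header")
    (length,) = struct.unpack(">I", header)
    # Validate the attacker-controlled length BEFORE allocating or reading.
    if length > max_len:
        raise ValueError(f"frame length {length} exceeds cap {max_len}")
    payload = stream.read(length)
    if len(payload) != length:
        raise ValueError("truncated payload")
    return payload


frame = read_frame(io.BytesIO(struct.pack(">I", 3) + b"abc"))
```

In Go, the equivalent is a bounds check on `headerLen` before `make`, plus `io.LimitReader` around any stream read to EOF.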
+## Rust
+- **CWE-119 Buffer Issues** — `unsafe` blocks. Every `unsafe` needs a comment explaining the invariant; missing comments are findings.
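+- **CWE-362 Race Conditions** — despite the borrow checker: `Arc<Mutex<T>>` misuse (holding the guard across `.await`), and interior mutability ported from single-threaded code (`RefCell` is not `Sync`; use `Mutex` or `RwLock`).
+- **CWE-674 Uncontrolled Recursion** — deeply nested input through `serde` formats without a depth limit (`serde_json` enforces one by default, and `disable_recursion_limit` removes it), and hand-written recursive parsers without a depth limit.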
+- **CWE-400 Resource Consumption** — `.collect::<Vec<_>>()` on untrusted iterator; `Bytes::from` without length cap.
+- **CWE-704 Incorrect Type Conversion** — `as` casts that truncate silently (`u64 as u32`). Prefer `try_into()`.
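The sketch below models, in Python, the arithmetic behind the `u64 as u32` bullet, so the silent truncation is visible. It is not Rust code. `as_u32` mimics what the `as` cast does (keep the low 32 bits), and `try_into_u32` mimics `u32::try_from`, which returns an error when the value does not fit. Both names are hypothetical.

```python
U32_MAX = 0xFFFF_FFFF


def as_u32(value: int) -> int:
    """Model of Rust's `value as u32` for a u64 input: keeps the low 32 bits."""
    return value & U32_MAX


def try_into_u32(value: int) -> int:
    """Model of `u32::try_from(value)`: refuses values that do not fit."""
    if not 0 <= value <= U32_MAX:
        raise OverflowError(f"{value} does not fit in u32")
    return value


big = 2**32 + 5     # a u64 length just over 4 GiB
print(as_u32(big))  # 5: silently truncated
```

A length check performed on the truncated value passes even though the real length is over 4 GiB. That is why the bullet prefers `try_into()`.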
+## Java / Kotlin
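+- **CWE-502 Insecure Deserialization** — `ObjectInputStream.readObject` on untrusted bytes; `XMLDecoder`; Jackson polymorphic typing (`enableDefaultTyping()`, or `@JsonTypeInfo(use = Id.CLASS)` on a broadly typed field).
+- **CWE-611 XXE** — `DocumentBuilderFactory` / `SAXParserFactory` without disabling DTDs or external entities. Default JAXP factories process them unless configured otherwise, so each factory needs explicit hardening (e.g. the `http://apache.org/xml/features/disallow-doctype-decl` feature).
+- **CWE-22 Path Traversal** — `Paths.get(base, userInput)` doesn't check containment. Resolve with `toRealPath()` (or `normalize()` for paths that may not exist) and check `resolved.startsWith(baseRealPath)`. `Path.startsWith` compares whole path components, unlike `String.startsWith`.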
+- **CWE-917 Expression Language Injection** — SpEL, OGNL, MVEL with user input (classic Struts-style RCE).
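The path-traversal checks in the JavaScript and Java sections use the same pattern: resolve first, then test containment component by component rather than with a string prefix. A string prefix would accept `/srv/data-evil` for base `/srv/data`. This is a minimal Python sketch of that pattern (`Path.is_relative_to` needs Python 3.9+). `safe_join` is a hypothetical helper name. The check does not close a race in which a symlink is swapped in after it runs.

```python
from pathlib import Path


def safe_join(base: Path, user_path: str) -> Path:
    """Resolve user_path under base and refuse anything that escapes it.

    resolve() collapses '..' segments and follows symlinks; the containment
    check then compares path components, not string prefixes.
    """
    base_real = base.resolve()
    candidate = (base_real / user_path).resolve()
    if not candidate.is_relative_to(base_real):  # Python 3.9+
        raise PermissionError(f"path escapes base directory: {user_path!r}")
    return candidate
```

Absolute inputs such as `/etc/passwd` are also rejected, because joining an absolute path onto `base` discards `base`.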
+## IaC (Terraform / CloudFormation / Kubernetes)
+- **CWE-284 Improper Access Control** — security groups with `0.0.0.0/0` on admin ports (22, 3389, database ports); public S3 buckets; IAM policies with `Resource: "*"` + `Action: "*"`.
+- **CWE-732 Incorrect Permissions** — file modes `0777`, world-writable volumes, ConfigMaps holding secrets.
+- **CWE-319 Cleartext Transmission** — ELB listeners on port 80 without redirect; storage without encryption at rest; TLS versions < 1.2.
+- **CWE-798 Hardcoded Credentials** — secrets in `*.tf` files, environment entries in `*.yaml` manifests, and `docker-compose.yml`.
+- **CWE-1104 Unmaintained 3rd-Party** — Docker base images tagged `latest` or not pinned by digest; Helm charts from unreviewed repos.
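A policy scanner usually covers this, but the `0.0.0.0/0`-on-admin-ports rule is simple enough to state as code. The sketch runs over a hypothetical, already-parsed rule shape rather than real Terraform or CloudFormation output. The port set is illustrative only.

```python
from __future__ import annotations

import ipaddress

ADMIN_PORTS = {22, 3389, 3306, 5432, 6379, 27017}  # SSH, RDP, common DB ports


def open_admin_ingress(rules: list[dict]) -> list[dict]:
    """Return ingress rules that expose an admin port to the whole internet.

    `rules` is a hypothetical shape, e.g.
    {"cidr": "0.0.0.0/0", "from_port": 22, "to_port": 22}.
    """
    findings = []
    for rule in rules:
        net = ipaddress.ip_network(rule["cidr"], strict=False)
        world_open = net.prefixlen == 0  # 0.0.0.0/0 or ::/0
        ports = range(rule["from_port"], rule["to_port"] + 1)
        if world_open and any(p in ports for p in ADMIN_PORTS):
            findings.append(rule)
    return findings
```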
+---
+## Exit criteria
+- For each language in the diff, you walked the relevant section. Every hit has either a file:line citation showing it is safe or a filed finding.
+- Run language-specific linters in CI (`bandit`, `semgrep`, `golangci-lint`, `cargo clippy`, `spotbugs`); this skill complements them, it does not replace them.

package/resources/skills/rafter-code-review/docs/investigation-playbook.md ADDED

@@ -0,0 +1,101 @@
+# Investigation Playbook — Canonical Questions per Category
+Use this when a finding, a suspicious pattern, or a vague "this looks wrong" needs follow-up. Each section is a question you can actually answer with Grep / Read / trace.
+## Reachability: "Can untrusted input get here?"
+Before fixing anything, prove it's reachable.
+- Where does the input originate? Trace backward from the sink to the entry point: `app.post(...)` → handler → service → ... → the line in question.
+- Are there layers that filter or transform along the way? Allowlist validator? JSON schema? ORM serializer?
+- Is this code path actually called in production? Or is it a dead branch left from a refactor?
+- If it's an internal service — is it exposed via a misconfigured ingress, reachable from the internet, accessible from a compromised pod? "Internal" is a policy, not a security boundary.
+- Test: can you write a failing request/input that triggers the line? If you can write it in 5 minutes, an attacker can.
+---
+## Authz coverage: "Is every path checked?"
+For an authz-critical operation (read user X, delete resource Y, invoke admin action):
+- List every entry point (HTTP handler, gRPC method, queue worker, CLI command, background job). For each: is the check present?
+- For HTTP: is the check in middleware (applies to all), or per-handler (easy to miss)? Grep for the check; count matches; count routes; compare.
+- Does the check use *request-supplied* identity or *session-derived* identity? `X-User-Id: 42` header is not identity.
+- Privilege escalation corner: can a user modify their own `role`, `tenant_id`, `permissions` via the same endpoint that updates their profile? (mass-assignment + authz = disaster.)
+- Is authz re-checked after redirects / async continuations / token refreshes? Identity is not sticky across those.
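The usual defence against the mass-assignment corner above is a per-endpoint allowlist of writable fields. This is a minimal sketch that assumes records are plain dicts. The field names are hypothetical. Whether to drop unknown keys silently (as here) or reject the request with a 400 is a design choice; rejecting makes probing attempts visible.

```python
# Hypothetical: the only profile fields a user may change about themselves.
UPDATABLE_PROFILE_FIELDS = frozenset({"display_name", "email", "timezone"})


def apply_profile_update(user: dict, payload: dict) -> dict:
    """Copy only allowlisted keys from the request body onto a copy of the user.

    Keys such as "role", "tenant_id", or "permissions" are ignored, so the
    profile endpoint cannot double as a privilege-escalation endpoint.
    """
    updated = dict(user)
    for key, value in payload.items():
        if key in UPDATABLE_PROFILE_FIELDS:
            updated[key] = value
    return updated
```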
+---
+## Data flow: "Does tainted data reach a dangerous sink?"
+Trace between sources and sinks in both directions:
+- **Top-down**: pick a source (request body, query string, file upload, DB read from a user-writable table). Grep how that variable flows. Does it reach a sink (SQL, shell, HTML, URL fetch, deserializer)?
+- **Bottom-up**: pick a sink (every `subprocess.run`, every `db.raw`, every `innerHTML`). Trace backward. Is the input at the sink derivable from a source?
+- Don't trust "it's validated upstream" without proof. Read the validator; check that the type after validation is strong enough (strings are weak, `UUID` is strong).
+- Does the data go through serialization round-trips that could re-introduce metacharacters? JSON round-trip, URL-decoding at the wrong layer, base64 → string → SQL.
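"Strings are weak, `UUID` is strong" looks like this in practice: the boundary returns a parsed type, and the sink still binds the value rather than concatenating it. This is a minimal sketch using the standard `uuid` and `sqlite3` modules. The `orders` table, its columns, and both function names are hypothetical.

```python
import sqlite3
import uuid


def parse_order_id(raw: str) -> uuid.UUID:
    """Validate at the boundary and return a strong type, not a checked string."""
    return uuid.UUID(raw)  # raises ValueError on anything that is not a UUID


def load_order(conn: sqlite3.Connection, raw_id: str):
    order_id = parse_order_id(raw_id)
    # Parameterized query: the value is bound, never concatenated into SQL.
    return conn.execute(
        "SELECT id, total FROM orders WHERE id = ?", (str(order_id),)
    ).fetchone()
```

`str(order_id)` also canonicalizes the value (lowercase, hyphenated), so two spellings of the same ID cannot slip past a later string comparison.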
+---
+## Trust boundaries: "Where does untrusted become trusted?"
+- Draw the boundary. What's on each side?
+- At the boundary: is there validation (shape, type, range, allowlist)? Is there normalization (Unicode NFKC, lowercasing, path canonicalization)?
+- Is the same boundary crossed more than once? (Controller → service → repo — does the repo re-validate, or trust the service?)
+- Cross-service: does Service B trust Service A's payload? If A is compromised, what can B do?
+- Cross-tenant: if a single process serves multiple tenants, where is the tenant id enforced? On every query? Or only at the top of the handler?
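On normalization at the boundary: the order matters, normalize first and validate second. This is a minimal sketch with a hypothetical username rule and reserved-name list. It uses `unicodedata.normalize("NFKC", ...)` followed by `casefold()`, so fullwidth look-alikes such as `ａｄｍｉｎ` cannot get past a reserved-name check that a later, normalizing layer would treat as `admin`.

```python
import re
import unicodedata

# Hypothetical username rule and reserved names, applied AFTER normalization.
USERNAME_RE = re.compile(r"[a-z0-9_]{3,32}")
RESERVED = frozenset({"admin", "root"})


def canonical_username(raw: str) -> str:
    # NFKC folds compatibility characters (e.g. fullwidth letters) to their
    # plain forms; casefold() then removes case differences.
    name = unicodedata.normalize("NFKC", raw).casefold()
    if not USERNAME_RE.fullmatch(name):
        raise ValueError("invalid username")
    if name in RESERVED:
        raise ValueError("reserved username")
    return name
```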
+---
+## Error paths: "What happens when this fails?"
+- Every try/except / error branch: does it leak information (stack, internal IDs, DB errors) to the caller?
+- Does the failure leave the system in a broken state (half-written file, partial DB row, orphaned session)?
+- Does the failure log enough for you to debug a real incident? Generic "failed to process" without context is a blind spot.
+- Are retries bounded? Does the retry code path itself re-authenticate, or reuse a possibly-stale token?
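The two retry questions translate into a small helper: cap the number of attempts, and fetch the credential inside the loop rather than once outside it. This is a minimal sketch. `TransientError`, `call_with_retries`, and the `get_token` callback are hypothetical names. Exponential backoff is one common choice, not the only one.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures that are safe to retry."""


def call_with_retries(
    op: Callable[[str], T],
    get_token: Callable[[], str],
    max_attempts: int = 3,
    base_delay: float = 0.1,
) -> T:
    """Retry a bounded number of times, fetching a fresh token for each attempt."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    attempt = 1
    while True:
        token = get_token()  # re-authenticate; never reuse a possibly-stale token
        try:
            return op(token)
        except TransientError:
            if attempt >= max_attempts:
                raise  # bounded: surface the failure instead of looping forever
            time.sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
            attempt += 1
```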
+---
+## Concurrency: "What happens with two of these at once?"
+- Is there shared mutable state (module globals, singletons, caches, files)? Protected by a lock?
+- Check-then-act races: `if not exists: create` — two requests can both pass the check. Use `INSERT ... ON CONFLICT` or transactions.
+- Idempotency: can the client retry safely? Is there an idempotency key? Repeated payment, duplicate email, double-spend patterns.
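+- Locks held across a suspension point: in Rust, a `std::sync::Mutex` guard held across `.await`, or in Python a `threading.Lock` held across `await`, can deadlock the executor. Go has no `await`, but holding a mutex while blocking on a channel or I/O causes contention, and deadlocks if the blocked-on work needs the same lock.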
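Here is a sketch of the `INSERT ... ON CONFLICT` fix for check-then-act, combined with an idempotency key. It uses the standard-library `sqlite3` module, and the upsert syntax requires SQLite 3.24 or newer. The `payments` table, its columns, and `record_payment` are hypothetical. PostgreSQL accepts the same `ON CONFLICT (...) DO NOTHING` clause.

```python
import sqlite3


def record_payment(conn: sqlite3.Connection, idempotency_key: str, amount: int) -> bool:
    """Insert once per key; return True only for the call that actually inserted.

    The PRIMARY KEY constraint makes the database arbitrate, so two concurrent
    requests cannot both pass an `if not exists` check.
    """
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "INSERT INTO payments (idempotency_key, amount) VALUES (?, ?) "
            "ON CONFLICT(idempotency_key) DO NOTHING",
            (idempotency_key, amount),
        )
    return cur.rowcount == 1
```

A retried request with the same key returns `False`, and the caller can answer with the original result instead of charging twice.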
+---
+## Secrets lifecycle: "Where does this credential live, and who can read it?"
+- Creation: how is it generated (entropy source)? Who knows it at creation time?
+- Storage: env var, config file, KMS, DB, vault? File permissions?
+- Transit: does it appear in logs, metrics, error messages, request bodies?
+- Rotation: is there a story for rotating it? Automated or manual? What breaks during rotation?
+- Revocation: if it leaks today, what's the time-to-revoke? Minutes, hours, or "we'd have to redeploy"?
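Of the lifecycle questions, "Transit" is the easiest to back up mechanically. This is a minimal sketch built on the standard `logging.Filter` hook. The regexes are hypothetical examples of secret shapes, and pattern-based redaction will miss formats nobody anticipated, so it supplements keeping secrets out of log calls rather than replacing it. Attach the filter to the handler: logger-level filters are not applied to records propagated from child loggers.

```python
import logging
import re

# Hypothetical secret shapes; extend with your own token formats.
SECRET_PATTERNS = [
    re.compile(r"(?i)(authorization:\s*bearer\s+)\S+"),
    re.compile(r"(?i)((?:api[_-]?key|password|token)\s*[=:]\s*)\S+"),
]


class RedactSecretsFilter(logging.Filter):
    """Mask credential-looking substrings before the handler writes the record."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # merge msg % args first
        for pattern in SECRET_PATTERNS:
            message = pattern.sub(r"\1[REDACTED]", message)
        record.msg, record.args = message, None
        return True
```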
+---
+## Input shape: "Can I break the parser?"
+- Size: is there a max? What happens at max+1? At 10× max?
+- Depth: for JSON/XML/nested structures, is there a max depth? Billion-laughs entity expansion (XML) and deeply nested structures can exhaust memory or the stack.
+- Encoding: UTF-8 vs UTF-16 vs Latin-1; BOM handling; surrogate pairs; null bytes in paths.
+- Numeric: NaN, Infinity, -0, integer overflow, very large floats losing precision.
+- Arrays and objects: empty, one element, duplicate keys, sparse arrays, non-integer indices.
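Several of the input-shape questions apply directly to JSON parsing in Python. The `json` module accepts `NaN` and `Infinity` by default, and CPython's decoder recurses on nesting. This is a minimal sketch that enforces size, depth, and finite numbers. `MAX_BYTES`, `MAX_DEPTH`, and the function names are hypothetical. Every failure surfaces as `ValueError`, because `JSONDecodeError` and `UnicodeDecodeError` both subclass it.

```python
import json

MAX_BYTES = 64 * 1024  # hypothetical limits; pick per endpoint
MAX_DEPTH = 32


def _reject_constant(name: str):
    # json.loads accepts NaN, Infinity and -Infinity unless told otherwise.
    raise ValueError(f"non-finite number {name} not allowed")


def _depth(value) -> int:
    """Nesting depth (a scalar counts as 1), computed without recursion."""
    deepest, stack = 0, [(value, 1)]
    while stack:
        node, level = stack.pop()
        deepest = max(deepest, level)
        if isinstance(node, dict):
            stack.extend((child, level + 1) for child in node.values())
        elif isinstance(node, list):
            stack.extend((child, level + 1) for child in node)
    return deepest


def parse_untrusted_json(raw: bytes):
    if len(raw) > MAX_BYTES:  # size check before any parsing work
        raise ValueError("payload too large")
    try:
        value = json.loads(raw, parse_constant=_reject_constant)
    except RecursionError:  # the decoder recurses on nested arrays/objects
        raise ValueError("payload nested too deeply") from None
    if _depth(value) > MAX_DEPTH:
        raise ValueError("payload nested too deeply")
    return value
```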
+---
+## How to record the outcome
+For each finding that survives investigation, produce a one-line summary in this shape:
+```
+[severity] [ruleId or ad-hoc tag] file:line — <one-sentence issue> — <one-sentence fix direction>
+```
+Example:
+```
+[high] IDOR /orders/:id (orders.ts:88) — handler loads order by URL id without comparing to session user — add owner check before load, 404 (not 403) on mismatch
+```
+Feed these into the PR review comment or back to `rafter` for triage follow-up (`rafter/docs/finding-triage.md`).