npm - @kiwidata/grimoire - Versions diffs - 0.1.3 → 0.1.5 - Mend

@kiwidata/grimoire 0.1.3 → 0.1.5

Files changed (159)

package/AGENTS.md +56 -4
package/README.md +107 -59
package/dist/cli/index.js +7 -7
package/dist/cli/index.js.map +1 -1
package/dist/commands/check.js +1 -1
package/dist/commands/check.js.map +1 -1
package/dist/commands/configure.d.ts +3 -0
package/dist/commands/configure.d.ts.map +1 -0
package/dist/commands/configure.js +19 -0
package/dist/commands/configure.js.map +1 -0
package/dist/commands/init.d.ts.map +1 -1
package/dist/commands/init.js +2 -0
package/dist/commands/init.js.map +1 -1
package/dist/core/check.d.ts.map +1 -1
package/dist/core/check.js +165 -111
package/dist/core/check.js.map +1 -1
package/dist/core/ci.d.ts.map +1 -1
package/dist/core/ci.js +50 -69
package/dist/core/ci.js.map +1 -1
package/dist/core/configure.d.ts +14 -0
package/dist/core/configure.d.ts.map +1 -0
package/dist/core/configure.js +434 -0
package/dist/core/configure.js.map +1 -0
package/dist/core/detect.d.ts.map +1 -1
package/dist/core/detect.js +153 -26
package/dist/core/detect.js.map +1 -1
package/dist/core/diff.d.ts.map +1 -1
package/dist/core/diff.js +62 -93
package/dist/core/diff.js.map +1 -1
package/dist/core/doc-style.d.ts +0 -4
package/dist/core/doc-style.d.ts.map +1 -1
package/dist/core/doc-style.js +103 -22
package/dist/core/doc-style.js.map +1 -1
package/dist/core/docs.js +202 -170
package/dist/core/docs.js.map +1 -1
package/dist/core/health.d.ts +6 -0
package/dist/core/health.d.ts.map +1 -1
package/dist/core/health.js +133 -96
package/dist/core/health.js.map +1 -1
package/dist/core/hooks.d.ts +0 -3
package/dist/core/hooks.d.ts.map +1 -1
package/dist/core/hooks.js +11 -16
package/dist/core/hooks.js.map +1 -1
package/dist/core/init.d.ts +2 -0
package/dist/core/init.d.ts.map +1 -1
package/dist/core/init.js +230 -406
package/dist/core/init.js.map +1 -1
package/dist/core/list.d.ts.map +1 -1
package/dist/core/list.js +55 -65
package/dist/core/list.js.map +1 -1
package/dist/core/risk-register.d.ts +17 -0
package/dist/core/risk-register.d.ts.map +1 -0
package/dist/core/risk-register.js +73 -0
package/dist/core/risk-register.js.map +1 -0
package/dist/core/shared-setup.d.ts +0 -40
package/dist/core/shared-setup.d.ts.map +1 -1
package/dist/core/shared-setup.js +92 -56
package/dist/core/shared-setup.js.map +1 -1
package/dist/core/status.d.ts.map +1 -1
package/dist/core/status.js +42 -52
package/dist/core/status.js.map +1 -1
package/dist/core/test-quality.d.ts +0 -8
package/dist/core/test-quality.d.ts.map +1 -1
package/dist/core/test-quality.js +24 -30
package/dist/core/test-quality.js.map +1 -1
package/dist/core/trace.d.ts.map +1 -1
package/dist/core/trace.js +67 -75
package/dist/core/trace.js.map +1 -1
package/dist/core/update.d.ts.map +1 -1
package/dist/core/update.js +61 -11
package/dist/core/update.js.map +1 -1
package/dist/core/validate.d.ts +1 -4
package/dist/core/validate.d.ts.map +1 -1
package/dist/core/validate.js +126 -148
package/dist/core/validate.js.map +1 -1
package/dist/index.d.ts +0 -3
package/dist/index.d.ts.map +1 -1
package/dist/index.js +0 -3
package/dist/index.js.map +1 -1
package/dist/utils/config.d.ts +15 -5
package/dist/utils/config.d.ts.map +1 -1
package/dist/utils/config.js +63 -42
package/dist/utils/config.js.map +1 -1
package/dist/utils/fs.d.ts +0 -12
package/dist/utils/fs.d.ts.map +1 -1
package/dist/utils/fs.js +0 -12
package/dist/utils/fs.js.map +1 -1
package/dist/utils/paths.d.ts +0 -6
package/dist/utils/paths.d.ts.map +1 -1
package/dist/utils/paths.js +0 -6
package/dist/utils/paths.js.map +1 -1
package/dist/utils/spawn.d.ts +0 -3
package/dist/utils/spawn.d.ts.map +1 -1
package/dist/utils/spawn.js +0 -3
package/dist/utils/spawn.js.map +1 -1
package/package.json +1 -1
package/skills/grimoire-apply/SKILL.md +89 -25
package/skills/grimoire-audit/SKILL.md +21 -1
package/skills/grimoire-bug/SKILL.md +48 -9
package/skills/grimoire-commit/SKILL.md +3 -2
package/skills/grimoire-design/SKILL.md +259 -0
package/skills/grimoire-design-consult/SKILL.md +200 -0
package/skills/grimoire-discover/SKILL.md +139 -109
package/skills/grimoire-draft/SKILL.md +131 -15
package/skills/grimoire-plan/SKILL.md +119 -46
package/skills/grimoire-pr/SKILL.md +7 -10
package/skills/grimoire-pr-review/SKILL.md +46 -115
package/skills/grimoire-precommit-review/SKILL.md +205 -0
package/skills/grimoire-refactor/SKILL.md +6 -6
package/skills/grimoire-review/SKILL.md +95 -156
package/skills/grimoire-verify/SKILL.md +40 -7
package/skills/grimoire-vuln-remediate/SKILL.md +107 -0
package/skills/grimoire-vuln-triage/SKILL.md +109 -0
package/skills/references/adversarial-personas.md +225 -0
package/skills/references/brand-tokens-format.md +186 -0
package/skills/references/code-quality.md +172 -0
package/skills/references/container-scan-triage.md +102 -0
package/skills/references/dependency-vuln-triage.md +236 -0
package/skills/references/design-heuristics.md +138 -0
package/skills/references/design-input-formats.md +190 -0
package/skills/references/pattern-guard.md +180 -0
package/skills/references/principles.md +82 -0
package/skills/references/refactor-scan-categories.md +154 -2
package/skills/references/review-personas.md +406 -0
package/skills/references/security-compliance.md +22 -1
package/skills/references/testing-contracts.md +1 -1
package/skills/references/visual-fidelity.md +206 -0
package/templates/accepted-risks.yml +47 -0
package/templates/brand-tokens-example.json +13 -0
package/templates/brand-voice-example.md +22 -0
package/templates/constraints.md +25 -0
package/templates/design-tool-setup-stub.md +59 -0
package/dist/commands/archive.d.ts +0 -3
package/dist/commands/archive.d.ts.map +0 -1
package/dist/commands/archive.js +0 -22
package/dist/commands/archive.js.map +0 -1
package/dist/commands/log.d.ts +0 -3
package/dist/commands/log.d.ts.map +0 -1
package/dist/commands/log.js +0 -15
package/dist/commands/log.js.map +0 -1
package/dist/commands/map.d.ts +0 -3
package/dist/commands/map.d.ts.map +0 -1
package/dist/commands/map.js +0 -17
package/dist/commands/map.js.map +0 -1
package/dist/core/archive.d.ts +0 -9
package/dist/core/archive.d.ts.map +0 -1
package/dist/core/archive.js +0 -92
package/dist/core/archive.js.map +0 -1
package/dist/core/log.d.ts +0 -8
package/dist/core/log.d.ts.map +0 -1
package/dist/core/log.js +0 -150
package/dist/core/log.js.map +0 -1
package/dist/core/map.d.ts +0 -9
package/dist/core/map.d.ts.map +0 -1
package/dist/core/map.js +0 -302
package/dist/core/map.js.map +0 -1
package/templates/dupignore +0 -93
package/templates/mapignore +0 -58
package/templates/mapkeys +0 -65

package/skills/references/dependency-vuln-triage.md ADDED

@@ -0,0 +1,236 @@
+# Vulnerability Triage Reference
+Loaded by `grimoire-vuln-triage` (and later `grimoire-vuln-remediate`). Turns **any** vulnerability scan — `npm audit`, `pip-audit`, `osv-scanner`, Trivy, Grype, Snyk, Dependabot, a SARIF file, or a CSV/markdown report a teammate forwards — into per-advisory verdicts whose single most important output is one decision:
+> **Drop everything and hotfix now, or let it ride the normal testing / release cycle?**
+Everything below exists to answer that, honestly, for *our* deployment — not in the abstract. The skill is **scanner-agnostic**: it normalizes whatever it's handed into one canonical model, then triages that. Covers application dependencies (npm/PyPI/Go/Cargo/…), OS packages (Debian/Alpine/RPM from container scans), and container/IaC findings alike.
+## Why raw scanner severity is not the answer
+Scanners rank by **CVSS base score**, which describes the vulnerability in a vacuum. It knows nothing about whether our code reaches the vulnerable function, whether the package even runs in production, whether the service is internet-facing, what controls sit in front of it, or **whether we already upgraded past it**. CVSS alone over-escalates: most "high"/"critical" findings are not actionable in a given deployment. Vendors of commercial reachability tooling report suppressing roughly 70–90% of findings for exactly this reason. We get most of that signal for free from reconciliation + KEV + EPSS + reachability + our own context.
+The triage rubric is **Threat × Exposure × Impact**:
+- **Threat** — is it actually being exploited / likely to be? (KEV, EPSS)
+- **Exposure** — can an attacker reach the vulnerable code in our deployment? (reachability + network exposure)
+- **Impact** — what is the blast radius if they do? (data sensitivity, privilege)
+## Step 1 — Normalize the scan into the canonical advisory model
+**Do this before anything else, regardless of source.** Different scanners emit wildly different shapes; triage logic must never be coupled to one format. Map each finding to:
+| Field | Meaning |
+|---|---|
+| `id` | Primary advisory id (CVE, GHSA, OSV, vendor id) |
+| `aliases` | All other ids (so KEV/EPSS lookups can find the CVE) |
+| `cve` | The CVE alias if any (KEV/EPSS key); may be absent |
+| `component` | Package / module / OS-package / image name |
+| `component_type` | `library` \| `os-package` \| `container` \| `iac` \| `runtime` — drives how reachability is judged |
+| `installed_version` | What the scan saw |
+| `fixed_version` | First fixed version, or `none` |
+| `severity` / `cvss` | Scanner-reported, treated as a prior only |
+| `target` | Where it was found (lockfile, image layer, Dockerfile, repo) |
+| `scanner` | Which tool produced it |
+| `advisory_url` / `description` | For reading what the bug actually is |
+### Format adapters
+Read the right fields per scanner — **do not** assume one tool's field names apply to another:
+- **npm audit** (`--json`): `vulnerabilities{}` keyed by package → `severity`, `via[]` (string or advisory object with `title`/`url`), `isDirect`, `fixAvailable`, `nodes[]`. Dev deps appear via `effects`/dependency graph (no single `dev` flag on every entry).
+- **pip-audit** (`-f json`): `dependencies[]` → `{name, version, vulns[]}`; each vuln has `id`, `aliases[]` (find the `CVE-` one), `fix_versions[]`, `description`. **No dev flag in the data** — infer dev/runtime from lockfile groups (`[tool.uv] dev-dependencies`, poetry `group.dev`, `requirements-dev.txt`).
+- **osv-scanner** (`--format json`): `results[].packages[].vulnerabilities[]` with OSV ids + `aliases`; `results[].source.path` is the manifest.
+- **Trivy** (`--format json`): `Results[]` each with a `Class` (`os-pkgs` \| `lang-pkgs` \| `config`) and `Type` (debian, alpine, gobinary, python-pkg, …); `Results[].Vulnerabilities[]` → `VulnerabilityID`, `PkgName`, `InstalledVersion`, `FixedVersion`, `Severity`, `CVSS{}`. **`Class: os-pkgs` → `component_type: os-package`** (base-image OS cruft — judged differently from app deps). `Results[].Class: config` → `component_type: iac` (Dockerfile/k8s misconfig, not a CVE — triage on exposure, no KEV/EPSS).
+- **Grype** (`-o json`): `matches[].vulnerability` (`id`, `severity`, `fix.versions[]`) + `matches[].artifact` (`name`, `version`, `type`).
+- **Snyk / Dependabot / SARIF**: pull `ruleId`/`cve`, the package coordinate, and fixed version from `results[]` / alerts / `runs[].results[]`. SARIF `level` maps to severity.
+- **Unknown / freeform (CSV, markdown, pasted text):** extract the minimum — `id` (CVE/GHSA), `component`, `installed_version`, `fixed_version` if stated. Anything you can't fill is `unknown`, not a guess. Triage proceeds on what you have; record `scanner: <described>` and note reduced confidence.
+If you genuinely can't parse a format, say so and ask for `--json`/SARIF rather than guessing at a table.
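A minimal sketch of one adapter, assuming a parsed `pip-audit -f json` report shaped as described above. The `Advisory` class and `from_pip_audit` function are hypothetical names used for illustration, not part of the package. Taking the first entry of `fix_versions[]` as `fixed_version` assumes the list is ordered oldest-first, which should be checked against real output.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Advisory:
    """Hypothetical canonical advisory; field names follow the table above."""
    id: str
    component: str
    installed_version: str
    component_type: str = "library"
    aliases: list[str] = field(default_factory=list)
    cve: Optional[str] = None
    fixed_version: str = "none"
    severity: str = "unknown"   # pip-audit reports no severity, so it stays unknown
    target: str = "unknown"
    scanner: str = "unknown"
    description: str = ""


def from_pip_audit(report: dict, target: str = "unknown") -> list[Advisory]:
    """Map a parsed pip-audit JSON report onto the canonical model."""
    advisories = []
    for dep in report.get("dependencies", []):
        for vuln in dep.get("vulns", []):
            ids = [vuln["id"], *vuln.get("aliases", [])]
            # KEV/EPSS lookups key on the CVE, so pull it out of the id set.
            cve = next((i for i in ids if i.startswith("CVE-")), None)
            fixes = vuln.get("fix_versions", [])
            advisories.append(Advisory(
                id=vuln["id"],
                aliases=[i for i in ids if i != vuln["id"]],
                cve=cve,
                component=dep["name"],
                installed_version=dep["version"],
                # Assumption: fix_versions is ordered oldest-first.
                fixed_version=fixes[0] if fixes else "none",
                target=target,
                scanner="pip-audit",
                description=vuln.get("description", ""),
            ))
    return advisories


# Hypothetical report: one vulnerable dependency, one clean one.
sample = {"dependencies": [
    {"name": "examplepkg", "version": "1.0.0", "vulns": [
        {"id": "GHSA-xxxx-yyyy-zzzz", "aliases": ["CVE-2099-0001"],
         "fix_versions": ["1.0.1", "1.1.0"], "description": "hypothetical"},
    ]},
    {"name": "cleanpkg", "version": "2.0.0", "vulns": []},
]}
normalized = from_pip_audit(sample, target="requirements.txt")
```

Fields the scanner does not supply keep their `unknown`/`none` defaults rather than a guess, which matches the rule for freeform input.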
+### Deduplicate before triaging
+Scanners — Trivy especially — emit the **same CVE once per affected package** (e.g. `CVE-2026-40393` listed against `libgbm1`, `libgl1-mesa-dri`, `libglx-mesa0`, `mesa-libgallium` = 4 findings, 1 vulnerability). Collapse to **unique `(id, component_type)`**, keeping the list of affected packages on the single entry. Triage and count the deduplicated set — a "200 CVE" image scan is often 30 real advisories. Report both numbers (raw findings → unique advisories) so the noise reduction is visible.
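The collapse can be sketched as follows, assuming findings have already been normalized into plain dicts with `id`, `component_type`, and `component` keys. The `dedupe` helper and the second advisory id are hypothetical.

```python
def dedupe(findings):
    """Collapse raw findings to unique (id, component_type) advisories,
    keeping every affected package on the single surviving entry."""
    unique = {}
    for f in findings:
        key = (f["id"], f["component_type"])
        entry = unique.setdefault(key, {
            "id": f["id"],
            "component_type": f["component_type"],
            "affected_packages": [],
        })
        if f["component"] not in entry["affected_packages"]:
            entry["affected_packages"].append(f["component"])
    return list(unique.values())


# The Mesa example from the text: one CVE reported against four packages.
raw = [
    {"id": "CVE-2026-40393", "component_type": "os-package", "component": name}
    for name in ("libgbm1", "libgl1-mesa-dri", "libglx-mesa0", "mesa-libgallium")
]
# Hypothetical second advisory so the summary shows both numbers.
raw.append({"id": "CVE-2099-0002", "component_type": "library", "component": "examplepkg"})

advisories = dedupe(raw)
summary = f"{len(raw)} raw findings -> {len(advisories)} unique advisories"
```

Reporting `summary` alongside the verdicts keeps the noise reduction visible, as the paragraph above asks.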
+### Container scans also carry non-CVE results — don't drop them
+Trivy/Grype image scans include result classes that are **not** package CVEs and need routing, not triage:
+- **secret** (`Class: secret`) — a credential/key found in an image layer (e.g. an `.env` file baked in). Even zero-hit secret *targets* are a smell (why is an env file in the image?). Route to infra/`grimoire-draft`, treat any real hit as a confidential security issue.
+- **config / misconfig** (`Class: config`, `component_type: iac`) — Dockerfile/k8s findings (root user, no resource limits, exposed port, `:latest` base). Triage on exposure; route persistent ones to infra/`grimoire-draft`. Not an app hotfix.
+## Step 2 — Reconcile against the current tree FIRST  *(mandatory, highest-leverage)*
+**Scan artifacts go stale.** A report from last week was taken against versions you may have already upgraded. Before spending any effort on enrichment or reachability, compare each advisory's `installed_version` against what the repo resolves **right now**:
+- Read the live lockfile / manifest: `package-lock.json` / `pnpm-lock.yaml` / `yarn.lock`, `uv.lock` / `poetry.lock` / `requirements.txt`, `go.mod`/`go.sum`, `Cargo.lock`, `Gemfile.lock`. For container/OS findings, the equivalent is "is this image still deployed?" — check the current image tag / Dockerfile base.
+- If the **currently resolved version ≥ `fixed_version`**, the advisory is **`fixed`** → drop it from the queue before enrichment. Record it in the "Already fixed" section as the audit trail.
+- If a manifest comment or prior triage already dismisses the CVE (e.g. `urllib3>=2.7.0  # CVE-2026-44431`), treat as `fixed`/known and don't re-litigate.
+This single pass routinely clears the majority of a stale scan and saves the expensive work for findings that are actually still present. **Never file remediation for an advisory without confirming it still exists in the current tree.**
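A rough sketch of the reconciliation pass, assuming the live lockfile has already been read into a `{component: resolved_version}` mapping. The version parser is deliberately naive (dotted numeric components only); real ecosystems have pre-release tags, epochs, and distro revisions, so an ecosystem-aware comparator should replace it. All function names and sample ids are hypothetical.

```python
import re


def parse_version(version: str) -> tuple[int, ...]:
    """Naive parse: leading dotted numeric components only (an assumption)."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip().lstrip("v"))
    if not match:
        raise ValueError(f"cannot parse version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_fixed(resolved: str, fixed_version: str) -> bool:
    """True when the currently resolved version is at or past the fix."""
    if fixed_version in ("none", "unknown"):
        return False  # no known fix, so nothing to reconcile against
    a, b = parse_version(resolved), parse_version(fixed_version)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))  # pad so 2.7 compares equal to 2.7.0
    b += (0,) * (width - len(b))
    return a >= b


def reconcile(advisories, resolved_versions):
    """Split advisories into (still_present, already_fixed)."""
    still_present, already_fixed = [], []
    for adv in advisories:
        current = resolved_versions.get(adv["component"])
        if current is not None and is_fixed(current, adv["fixed_version"]):
            already_fixed.append({**adv, "status": "fixed", "resolved_version": current})
        else:
            # Components missing from the lockfile stay in the queue here;
            # deciding they are no longer present is left to the reviewer.
            still_present.append(adv)
    return still_present, already_fixed


# Hypothetical stale scan versus the current lockfile.
scan = [
    {"id": "CVE-2099-0003", "component": "examplepkg",
     "installed_version": "1.2.0", "fixed_version": "1.2.5"},
    {"id": "CVE-2099-0004", "component": "otherpkg",
     "installed_version": "3.0.0", "fixed_version": "none"},
]
lock = {"examplepkg": "1.10.0", "otherpkg": "3.0.0"}
still, fixed = reconcile(scan, lock)
```

Comparing parsed tuples rather than strings matters: `"1.10.0" >= "1.2.5"` is false as a string comparison but true as a version comparison.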
+## The enrichment signals (for advisories that survive reconciliation)
+Gather what you can. Degrade gracefully — a missing signal is "unknown", not "safe". OS-package and library findings both get KEV/EPSS (they're CVEs); IaC/config findings skip threat-intel and triage on exposure.
+### KEV — CISA Known Exploited Vulnerabilities  *(Threat, binary)*
+Fetch once per run and match every advisory's `cve`/`aliases`:
+`https://www.cisa.gov/sites/default/files/feeds/known_exploited_vulnerabilities.json`
+KEV membership is binary, evidence-grounded, auditor-defensible. **A reachable KEV vulnerability is a hotfix candidate regardless of CVSS.**
+### EPSS — Exploit Prediction Scoring System  *(Threat, probability)*
+Fetch per CVE (batchable, comma-separated):
+`https://api.first.org/data/v1/epss?cve=CVE-2024-XXXX,CVE-2024-YYYY`
+Daily-refreshed probability (0–1) of exploitation in the next 30 days. Ranks the long tail KEV is silent on. Rough bands: `≥0.5` high, `0.1–0.5` elevated, `<0.1` low. A prior, not a verdict.
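The two threat signals can be combined along these lines. This is a sketch that works on already-fetched JSON, so it makes no network calls. The response field names (`vulnerabilities[].cveID` for the KEV feed, `data[].cve` and `data[].epss` for the EPSS API) are stated from memory of those feeds rather than from this reference and should be confirmed against a live response. Helper names and sample ids are hypothetical.

```python
EPSS_API = "https://api.first.org/data/v1/epss"


def epss_url(cves):
    """Build one batched EPSS query (comma-separated CVE ids)."""
    return f"{EPSS_API}?cve={','.join(cves)}"


def kev_ids(kev_feed: dict) -> set[str]:
    """Assumed feed shape: {"vulnerabilities": [{"cveID": "CVE-..."}, ...]}."""
    return {entry["cveID"] for entry in kev_feed.get("vulnerabilities", [])}


def epss_scores(epss_response: dict) -> dict[str, float]:
    """Assumed response shape: {"data": [{"cve": "...", "epss": "0.12"}, ...]}."""
    return {row["cve"]: float(row["epss"]) for row in epss_response.get("data", [])}


def epss_band(score):
    """Rough bands from the text: >=0.5 high, 0.1 to 0.5 elevated, <0.1 low."""
    if score is None:
        return "unknown"  # a missing signal is unknown, not safe
    if score >= 0.5:
        return "high"
    if score >= 0.1:
        return "elevated"
    return "low"


def threat_signal(cve, kev, scores):
    """Threat summary for one advisory; without a CVE alias nothing can be looked up."""
    if cve is None:
        return {"kev": "unknown", "epss": None, "epss_band": "unknown"}
    score = scores.get(cve)
    return {"kev": cve in kev, "epss": score, "epss_band": epss_band(score)}


# Hypothetical fetched payloads.
kev = kev_ids({"vulnerabilities": [{"cveID": "CVE-2099-0001"}]})
scores = epss_scores({"data": [
    {"cve": "CVE-2099-0001", "epss": "0.62"},
    {"cve": "CVE-2099-0002", "epss": "0.04"},
]})
signals = {cve: threat_signal(cve, kev, scores)
           for cve in ("CVE-2099-0001", "CVE-2099-0002", "CVE-2099-0003")}
```

Keeping `unknown` distinct from `low` preserves the graceful-degradation rule when a feed is offline or an advisory has no CVE alias.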
+### Reachability — is the vulnerable code in our execute path?  *(Exposure)* — judge by `component_type`
+The strongest noise filter. **How you judge reachability depends on what kind of component it is:**
+- **`library` (app dependency):**
+  - *Dev/test only?* Not shipped or run in prod → **not_affected** in prod (`vulnerable_code_not_in_execute_path`). Infer dev/runtime from lockfile groups — there is usually no single flag.
+  - *Imported at all?* `search_graph(name_pattern=<pkg>)` / `search_code(<pkg>)`. Unused transitive → low exposure.
+  - *Vulnerable function reached?* When the advisory names the affected API, `trace_path` / `search_code` to confirm our code calls *that* surface. Not present / not on a reachable path → **not_affected**.
+- **`os-package` (Debian/Alpine/RPM from a container scan):** judge **two separate axes** — *reachability* (is the vulnerable code path called by untrusted input?) drives urgency; *removability* (can we delete it, and what breaks?) drives remediation. **They are not the same — unreachable ≠ removable.** A headless API can't reach Mesa's OpenGL code *and* can't remove Mesa if it arrived transitively behind a package it needs. A C library is only reachable if something in the running app binds to it, so grep the **consumer**, not the C package name (the app never `import`s `libexpat1`). Honor the scanner's fix-state: `will_not_fix`/`end_of_life`/no-`FixedVersion` means **no fix exists** → the lever is accept-with-expiry or a base-image bump/rebuild, **never** an "upgrade X" ticket. **Before recommending any removal or "slim the base image," trace how the package is installed (explicit / transitive / base-image / builder-only) and name the post-change test** — see `./container-scan-triage.md` for the full discipline, the transitive-source and consumer maps, and the anti-patterns. Route image-structure changes to infra/`grimoire-draft`, not app remediation.
+- **`runtime` (the interpreter/build tool itself — e.g. `pip`, `node`, `setuptools`):** is it invoked **at runtime** or only at **build time**? A build-time tool not present/called in the running container → **not_affected** at runtime (`vulnerable_code_not_in_execute_path`); note it as build-image hygiene at most. Check the container entrypoint/CMD.
+- **`container` / `iac` (Dockerfile, k8s, compose misconfig):** not a CVE — no KEV/EPSS. Triage on exposure + control: does the misconfig (root user, no resource limits, exposed port, `:latest` base) actually create reachable risk in our deployment? Route persistent ones to `grimoire-draft`/infra, not an app hotfix.
+**Grep can lie — verify the binding.** A bare symbol/name match is not proof: `Price.fromstring()` (the `price-parser` library) is not XML `etree.fromstring()`; a package named in a comment isn't a call. Confirm the match is the actual vulnerable binding (right import, right call site) before asserting `not_affected` **or** `affected`.
+**Resolve unknowns now — `under_investigation` is a last resort, not a default.** When grep doesn't settle reachability, work the question before deferring it:
+1. **Trace deeper** — `codebase-memory-mcp` `trace_path` from the entry points (routes/handlers) to the vulnerable binding; read the actual call site, not just its name.
+2. **Ask the decisive question.** Most reachability unknowns collapse to one yes/no the human can answer instantly — ask it inline rather than filing a task. The pattern: *"Does <surface> ever receive attacker-controlled <input>?"* e.g. "Does any endpoint parse user-supplied XML (SAML/SSO metadata, uploads, external responses)?" → "no" makes expat/libxml2 `not_affected` on the spot; "yes" makes them `affected`. One question can clear several findings **and** spare a register entry and a follow-up task.
+3. Only when the answer genuinely needs work nobody in the session can do — a runtime check, a teammate's knowledge, an external dependency — mark **`under_investigation`**, time-box it, and name exactly what must be checked. Don't force a verdict to look decisive; but don't punt one you could resolve with a trace or a question, either.
+Record reachability provenance: `graph-verified` (codebase-memory-mcp), `grep-asserted` (fallback), `image-layer` (container scan), or `unknown`.
+### Exposure & Controls — our deployment, our mitigations  *(Exposure + Impact damping)*
+Read these — do **not** invent a controls config file; the truth already lives here:
+- **`.grimoire/docs/context.yml`** — `deployment` (internet-facing vs internal vs lambda/batch), `infrastructure`, `services`. An internal-only service behind an auth gateway has far less exposure than a public endpoint.
+- **MADR decisions** (`.grimoire/decisions/*.md`) — `Security (CIA)` quality-attribute rows and security-relevant decisions (WAF, network isolation, input validation, auth model, tenancy). A documented control that breaks the attack path is a legitimate VEX `inline_mitigations_already_exist` / severity damper.
+- **App config that satisfies (or fails) an advisory's precondition** — many CVEs are conditional ("only when `SESSION_SAVE_EVERY_REQUEST=True`", "only on ASGI", "only if middleware X enabled"). Read the actual setting. A precondition that is **met** raises urgency; one that is **absent** is a clean `not_affected` (`vulnerable_code_cannot_be_controlled_by_adversary` / not in execute path).
+If a control that *would* change the verdict is assumed but **not recorded anywhere**, do not credit it silently. Flag the gap and recommend recording it as a MADR via `grimoire-draft` — an undocumented control is not defensible in an audit.
+## VEX verdict — the per-advisory status
+Assign each surviving advisory a [VEX](https://www.cisa.gov/sites/default/files/2023-04/minimum-requirements-for-vex-508c.pdf) status. This is the noise-suppression layer: only `affected` items become dev work.
+| Status | Meaning | Justification codes (for `not_affected`) |
+|---|---|---|
+| `fixed` | Already remediated — current tree resolves ≥ the fixed version (from Step 2). | — |
+| `not_affected` | We are not exploitable. **No dev work.** | `component_not_present`, `vulnerable_code_not_present`, `vulnerable_code_not_in_execute_path`, `vulnerable_code_cannot_be_controlled_by_adversary`, `inline_mitigations_already_exist` |
+| `affected` | Exploitable in our deployment. **Needs a remediation action.** | — (carries an urgency, see below) |
+| `under_investigation` | Can't determine yet (no graph, ambiguous advisory). Time-box it. | — |
+Every verdict records *why*. A `not_affected` with a justification code is the auditor-defensible way to dismiss noise; a bare "looks fine" is not.
+## Urgency — the decision that matters (for `affected` items only)
+| Urgency | Trigger | Action |
+|---|---|---|
+| **hotfix-now** | In **KEV** AND reachable AND exposed; **or** EPSS high + reachable + internet-exposed + no mitigating control; **or** active exploitation against a high-impact surface (auth, PII, RCE on a public endpoint). | Drop everything. Expedited fix branch, out-of-band release. Notify security owner. |
+| **next-release** | Reachable but exposure is damped (internal-only / behind a control), **or** no KEV and low/elevated EPSS, **or** fix requires a non-trivial upgrade / image rebuild with no active-exploitation pressure. | File remediation for the normal testing / release cycle. |
+| **accept (risk-accepted)** | `affected` but low real risk and **no fix available** (no patched version / no newer base image yet). | Record justification + an **expiry / revisit date**. Re-triage on expiry or when a fix ships. Don't let it become permanent. |
+Decision tree, in order:
+1. Already `fixed` (Step 2)? → done, drop. No enrichment needed.
+2. `not_affected` (reachability / precondition absent)? → done, it's noise. No urgency.
+3. Reachable + (KEV **or** high EPSS) + internet-exposed + no breaking control? → **hotfix-now**.
+4. `affected` but damped (not exposed / control breaks the path / low EPSS / dev-or-build-time only) → **next-release**.
+5. `affected`, no fix exists, low risk → **accept** with expiry.
+Default bias: unknown reachability + internet-facing + KEV → **hotfix-now** (fail safe). Unknown reachability + no KEV + low EPSS → **under_investigation** + next-release — don't manufacture an emergency.
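The decision tree and default bias can be sketched as a single function. It assumes each advisory is a dict whose keys (`vex`, `kev`, `epss`, `reachable`, `internet_exposed`, `breaking_control`, `fix_available`, `low_risk`) are hypothetical names for the signals gathered above. Two choices here are assumptions, not rules from the text: unknown-reachability combinations the default bias does not name fall to under-investigation, and the accept case is checked before the next-release catch-all so that it stays reachable.

```python
def decide(adv: dict) -> str:
    """Return the urgency for one advisory, following the tree in order."""
    vex = adv["vex"]
    if vex == "fixed":
        return "drop"           # step 1: reconciled out, no enrichment needed
    if vex == "not_affected":
        return "none"           # step 2: noise, no urgency

    kev = adv.get("kev") is True
    high_epss = (adv.get("epss") or 0.0) >= 0.5
    internet = adv.get("internet_exposed", False)
    exposed = internet and not adv.get("breaking_control", False)
    reachable = adv.get("reachable", "unknown")

    # Step 3: reachable + (KEV or high EPSS) + exposed with no breaking control.
    if reachable == "yes" and (kev or high_epss) and exposed:
        return "hotfix-now"

    # Default bias for unknown reachability.
    if reachable == "unknown":
        if kev and internet:
            return "hotfix-now"  # fail safe
        # Assumption: every other unknown combination is investigated, not escalated.
        return "under_investigation+next-release"

    # Step 5, checked ahead of the catch-all: no fix exists and real risk is low.
    if not adv.get("fix_available", True) and adv.get("low_risk", False):
        return "accept"

    # Step 4: affected but damped.
    return "next-release"


verdicts = [
    decide({"vex": "fixed"}),
    decide({"vex": "affected", "kev": True, "reachable": "yes", "internet_exposed": True}),
    decide({"vex": "affected", "kev": True, "reachable": "yes",
            "internet_exposed": True, "breaking_control": True}),
    decide({"vex": "affected", "kev": False, "epss": 0.02,
            "reachable": "unknown", "internet_exposed": True}),
    decide({"vex": "affected", "epss": 0.03, "reachable": "yes",
            "fix_available": False, "low_risk": True}),
]
```

The result is an input to the Contrarian pass, not a final verdict: any `hotfix-now` it returns still has to survive calibration.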
+## The Contrarian pass — calibrate before you escalate
+Before finalizing **any** `hotfix-now` or `affected` verdict, run the **Contrarian calibration pass** (`./review-personas.md` §4.8) over the escalated findings. The Contrarian adds no findings; it challenges the ones we're about to act on. For each escalation ask:
+1. **Steel-man "we are not affected."** Strongest case this doesn't matter here — dev/build-only, function never called, base-OS cruft, behind auth + network isolation, precondition absent, input never attacker-controlled. If it holds, drop the urgency (often to `not_affected`).
+2. **Name the assumption.** "Assumes the parser is fed untrusted input." "Assumes this endpoint is public." If it contradicts `context.yml`/a MADR/the actual config, the finding is mis-calibrated.
+3. **Inversion.** If we hotfixed this, what *new* risk ships — a rushed major bump, an untested base-image swap, a breaking transitive change? Is the cure riskier than the disease before the next release window?
+4. **Is doing-nothing-until-release right?** Symptom vs root cause; will it actually trigger; cost of "fix now" vs "fix when it hurts".
+5. **Is severity calibrated?** A `hotfix-now` must clear all of: reachable, exploitable-as-deployed, real blast radius.
+Emit per escalation: `[hotfix upheld]` / `[hotfix → next-release]` / `[finding dropped]` with one line of evidence. Summary counts are **post-Contrarian**. Calibration, not veto — a surviving harm path tied to `context.yml`/KEV stands.
+## Triage record format
+Write `.grimoire/security/vulns/<run-date>/triage.md`:
+```markdown
+---
+scanners: [<npm-audit|pip-audit|osv-scanner|trivy|grype|snyk|sarif|other>]
+scan_dates: [<YYYY-MM-DD per source>]
+triaged_date: <YYYY-MM-DD>
+reconciled_against: <lockfile/manifest/image checked>
+kev-feed: <date fetched, or "offline">
+epss-fetched: <true|false>
+reachability: <graph-verified|grep-asserted|image-layer|unknown>
+totals: { raw_findings: N, unique_advisories: N, fixed: N, not_affected: N, affected: N, hotfix_now: N, accepted: N, under_investigation: N }
+---
+# Vulnerability Triage — <run-date>
+## Hotfix now (drop everything)
+<!-- omit section if empty -->
+### <id> — <component> <version> (<component_type>)
+- **VEX**: affected · **Urgency**: hotfix-now
+- **KEV**: yes/no · **EPSS**: 0.NN · **CVSS**: N.N (<severity>) · **Scanner**: <tool>
+- **Reachable**: <yes — calls X / image-layer / no / unknown> (<provenance>)
+- **Exposure**: <internet-facing endpoint / internal-only / dev/build-only / base-OS>
+- **Controls**: <none that break the path / WAF per ADR-00NN / behind auth gateway>
+- **Precondition**: <CVE condition — met / absent, with the setting checked>
+- **Blast radius**: <RCE / PII read / DoS / info disclosure>
+- **Contrarian**: [hotfix upheld] <one line>
+- **Fix**: upgrade <component> <from> → <to> / rebuild on <base> (or: no fix — mitigation: <...>)
+## Next release cycle
+### <id> — <component> (<component_type>)
+- (same fields) · **Contrarian**: [hotfix → next-release] <why damped>
+## Risk-accepted (revisit by <date>)
+### <id> — <component>
+- **VEX**: affected · no fix available · **Expiry**: <YYYY-MM-DD> · **Justification**: <...>
+## Already fixed (reconciled out — scan was stale)
+<!-- audit trail: dropped before enrichment because current tree is past the fix -->
+- <id> — <component>: current tree resolves <ver> ≥ fixed <ver>
+- <id> — <component>: dismissed in manifest (<comment/prior triage>)
+## Not affected (suppressed noise)
+- <id> — <component>: not_affected (`vulnerable_code_not_in_execute_path` — dev-only / build-time / base-OS not invoked)
+- <id> — <component>: not_affected (`vulnerable_code_cannot_be_controlled_by_adversary` — precondition absent: <setting>)
+## Under investigation (time-boxed to <date>)
+- <id> — <component>: <what's blocking the call>
+## Control gaps surfaced
+- <control> assumed for <id> but not in any decision record → suggest `grimoire-draft`.
+## Infra follow-ups (root-cause, not per-CVE)
+<!-- container/IaC hygiene — route to infra/grimoire-draft, not app remediation. Each must state how-installed + post-change test, per container-scan-triage.md. -->
+- <package> (CVE <id>): installed via <explicit line N / transitive via PARENT / base image / builder-only>; depends: <evidence>; removable: <yes-safe / yes-after-removing-PARENT / no-base-OS / no-required-by-Y>; recommendation: <Dockerfile edit / bump-base / accept+document> — test: <build / import / DB connect>.
+- Secret/config result: <e.g. `.env` baked into image layer> → remove from image, route to infra.
+```
+## Supply-chain note (separate from CVE triage)
+A *known CVE* in a component is what this reference triages. A **dependency add/upgrade** (new package, version bump, floating range, missing lockfile/integrity hashes) is a different risk class — covered by `security-compliance.md` § Supply Chain Defense, a review-time blocker, not a CVE-triage output. Keep them distinct: triage answers "is this known CVE a hotfix?"; supply-chain defense answers "should this change merge at all?".
+## Principles
+- **Reconcile first.** A stale scan is mostly already-fixed findings. Confirm each advisory still exists in the current tree before any other work — it's the cheapest, highest-leverage pass.
+- **Scanner-agnostic by construction.** Normalize any tool's output into the canonical model, then triage that. Never couple the verdict logic to npm's or pip's field names.
+- **CVSS ranks the world; we triage our deployment.** The whole job is that gap. A "critical" in a dev-only, unreachable, or base-OS-cruft component is noise; a "medium" KEV hit on a public endpoint is a hotfix.
+- **Reachability is type-aware.** Library imports, OS-package usage, runtime-vs-build, and IaC misconfig are judged differently. A flagged base-image OS lib the app never calls is not a prod emergency.
+- **`not_affected` needs a justification code, not a vibe.** That line is the audit trail.
+- **Controls must be recorded to count.** Undocumented WAF/auth/isolation can't damp a verdict — flag the gap.
+- **Fail safe on unknowns, but don't manufacture emergencies.**
+- **The Contrarian calibrates escalations; it does not suppress real signal.**
+- **`accept` is not `ignore`.** It carries an expiry and gets re-triaged.

package/skills/references/design-heuristics.md ADDED

@@ -0,0 +1,138 @@
+# Design Heuristics Reference
+Loaded by `grimoire-design` (variant generation, state enumeration) and review skills running visual-fidelity checks. A compact checklist of the heuristics, laws, and minimum-viable rules an AI agent or reviewer should know before generating or critiquing UI.
+The goal is **calibration**, not exhaustiveness. Every heuristic below has a trigger condition — if the trigger doesn't apply to the change under review, skip it. Heuristics fired indiscriminately become noise; the materiality gate from `./review-personas.md` §2 applies here too.
+---
+## 1. Nielsen's 10 Usability Heuristics
+The classic baseline. Each line: heuristic, one-line definition, trigger.
+| # | Heuristic | Trigger (when it applies) |
+|---|---|---|
+| 1 | **Visibility of system status** — keep users informed of what's happening | Any async action; loading > 1s; multi-step flows |
+| 2 | **Match between system and real world** — speak the user's language | Naming any concept users will see; error messages |
+| 3 | **User control and freedom** — escape hatches, undo, cancel | Any destructive action; multi-step wizards; modal dialogs |
+| 4 | **Consistency and standards** — follow platform and product conventions | New component where a similar one exists; reinventing standard controls |
+| 5 | **Error prevention** — design out errors before they happen | Forms; destructive actions; irreversible operations |
+| 6 | **Recognition rather than recall** — show options, don't make users remember | Multi-step flows; command palettes; settings spread across pages |
+| 7 | **Flexibility and efficiency of use** — accelerators for power users | Frequently-used flows; keyboard shortcuts; bulk operations |
+| 8 | **Aesthetic and minimalist design** — every extra element competes for attention | Crowded UI; multiple CTAs; decorative elements without purpose |
+| 9 | **Help users recognize, diagnose, and recover from errors** — plain language, named cause, suggested fix | Every error path; form validation; API failures |
+| 10 | **Help and documentation** — searchable, task-focused, concrete steps | Onboarding; complex features; first-use moments |
+Findings cite the heuristic by number: "Violates H#9 — error message names no recovery path."
+---
+## 2. WCAG 2.2 AA Quick Reference
+The minimum bar for any web or mobile UI claiming accessibility. Numbers are AA-level; AAA is stricter and rarely required outside regulated domains.
+### Contrast
+- **Body text vs background**: 4.5:1 minimum
+- **Large text** (≥18pt regular or ≥14pt bold) vs background: 3:1 minimum
+- **UI components** (buttons, form borders, focus indicators, icons that convey meaning) vs adjacent colors: 3:1 minimum
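+- **Tools**: measure with a contrast checker rather than estimating by eye. Low-emphasis or disabled-looking text that still carries meaning must meet the text thresholds above (4.5:1, or 3:1 for large text); only genuinely inactive components are exempt.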
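The thresholds above can be checked programmatically. The following is a minimal sketch of the WCAG relative-luminance and contrast-ratio formulas, assuming six-digit `#rrggbb` input; the function names (`contrastRatio`, `meetsAA`) are illustrative and not part of grimoire.

```typescript
// Linearize one 8-bit sRGB channel (0-255) per the WCAG relative-luminance definition.
function channelToLinear(value8bit: number): number {
  const s = value8bit / 255;
  return s <= 0.04045 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
}

// Relative luminance of a "#rrggbb" color (assumption: six-digit hex only).
function relativeLuminance(hex: string): number {
  const clean = hex.replace("#", "");
  const r = parseInt(clean.slice(0, 2), 16);
  const g = parseInt(clean.slice(2, 4), 16);
  const b = parseInt(clean.slice(4, 6), 16);
  return 0.2126 * channelToLinear(r) + 0.7152 * channelToLinear(g) + 0.0722 * channelToLinear(b);
}

// Contrast ratio: (lighter + 0.05) / (darker + 0.05), ranging from 1 to 21.
function contrastRatio(foreground: string, background: string): number {
  const a = relativeLuminance(foreground);
  const b = relativeLuminance(background);
  return (Math.max(a, b) + 0.05) / (Math.min(a, b) + 0.05);
}

type ContrastKind = "body" | "large" | "ui";

// AA minimums from the list above: 4.5:1 for body text, 3:1 for large text and UI components.
function meetsAA(foreground: string, background: string, kind: ContrastKind): boolean {
  const minimum = kind === "body" ? 4.5 : 3;
  return contrastRatio(foreground, background) >= minimum;
}
```

For example, `meetsAA("#767676", "#ffffff", "body")` passes (about 4.54:1), while `#777777` on white falls just short (about 4.48:1) and is acceptable only as large text or a UI component.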
+### Target size
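+- **Minimum interactive target**: 24×24 CSS pixels (added in WCAG 2.2 as SC 2.5.8; platform guidelines ask for more: 44×44pt in Apple's HIG, 48×48dp in Material)
+- **Spacing**: targets smaller than 24×24 pass only if a 24px-diameter circle centered on each one does not overlap another target or another undersized target's circle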
+- **Exceptions**: inline links in body text; user-agent-controlled (native `<select>`); essential controls where size is dictated by the underlying content
+### Focus
+- Focus indicator must be **visible** on every interactive element
+- Focus order must match visual reading order (left-to-right, top-to-bottom for LTR; right-to-left, top-to-bottom for RTL)
+- No focus traps except in modal dialogs (where trap must be escapable via Esc)
+### Forms
+- Every input has a programmatically-associated `<label>`
+- Required fields are marked beyond color (asterisk, "required" text)
+- Errors are announced to screen readers (live region or focus shift) and named in text near the field
+### Motion
+- No content that flashes >3 times per second (seizure risk)
+- Respect `prefers-reduced-motion`; provide a non-animated alternative for essential motion
+---
+## 3. Deceptive Patterns (Brignull Taxonomy)
+Dark patterns to **avoid** in the design, and to **flag** during review. Source: deceptive.design (Harry Brignull's taxonomy). Findings here are usually blockers — the project's stage and audience determine severity, but never normalize them.
+### Patterns
+- **Roach motel** — easy to get into, hard to get out (e.g. one-click signup, multi-step cancel flow). Trigger: review any sign-up / subscription / account-deletion flow.
+- **Confirmshaming** — guilt the user out of opting out (e.g. "No thanks, I hate saving money"). Trigger: any opt-out / decline button.
+- **Sneak into basket** — adds items the user did not select (e.g. donation pre-checked, add-on default-enabled at checkout). Trigger: any cart / order-review flow.
+- **Hidden costs** — final price revealed only at last step (fees, shipping, taxes appear at checkout). Trigger: any purchase / pricing flow.
+- **Forced continuity** — free trial silently rolls to paid without notice. Trigger: any trial / subscription onboarding.
+- **Disguised ads** — ads styled to look like content or controls. Trigger: any ad-supported UI.
+- **Friend spam** — uses contact list to send unsolicited invites under the user's name. Trigger: any contact-import / referral flow.
+- **Privacy zuckering** — tricks users into sharing more data than intended. Trigger: any consent flow, permission prompt, default privacy setting.
+- **Misdirection** — uses visual emphasis to distract from a deceptive choice. Trigger: A/B-tested CTA layouts; "recommended" defaults.
+- **Trick questions** — confusingly-worded questions where the obvious answer is the opposite of intent. Trigger: any settings toggle, consent checkbox.
+### Reviewer rule
+For each pattern, ask: "If a regulator (FTC, CMA, EU DSA enforcement) saw this flow tomorrow, would the company defend it or change it?" If the answer is "change it," the pattern is a blocker.
+---
+## 4. Cognitive Laws (apply when relevant)
+Named laws that compress empirical findings about human-UI interaction. Each one is a single sentence plus when to apply it.
+- **Fitts's Law** — time to acquire a target is a function of distance and size. *Apply when*: placing primary actions (put them where the cursor / thumb already is, make them large); reviewing dense toolbars.
+- **Hick's Law** — decision time grows with the log of the number of choices. *Apply when*: menus with >7 items; settings pages; onboarding step sequences. Reduce, group, or progressively disclose.
+- **Miller's 7±2** — short-term memory holds roughly 7 items. *Apply when*: navigation breadth (top-level menu items), groups within a form, items shown without scrolling. Chunk when over the limit.
+- **Jakob's Law** — users spend most of their time on other sites, so they expect yours to work the same way. *Apply when*: inventing a new pattern where a standard one exists. Most users expect the search box top-right, the logo top-left, the cart icon top-right, the sign-out under a profile menu. Deviate only with reason.
+- **Doherty Threshold** — productivity soars when system response is under 400ms. *Apply when*: any interactive action — feedback within 100ms, completion under 400ms where possible, skeleton/loader for anything longer.
+- **Tesler's Law** — every system has irreducible complexity; the question is who absorbs it (user, designer, engineer). *Apply when*: simplifying — never delete complexity, only shift it. Don't push to users what the system can decide.
+- **Postel's Law (UI variant)** — be liberal in what you accept, conservative in what you produce. *Apply when*: form inputs (accept "(555) 123-4567" or "5551234567"); display formatting (canonicalize on output).
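The Postel's Law example above (accept loosely, canonicalize on output) might look like the sketch below. It assumes 10-digit North American numbers with an optional leading `1`; `normalizeUsPhone` is a hypothetical helper, not a grimoire API.

```typescript
// Accept any punctuation or spacing the user types; emit one canonical format.
// Assumption: 10-digit North American numbers, optionally prefixed with country code 1.
function normalizeUsPhone(input: string): string | null {
  const digits = input.replace(/\D/g, ""); // liberal in what we accept
  const national = digits.length === 11 && digits.startsWith("1") ? digits.slice(1) : digits;
  if (national.length !== 10) {
    return null; // not recoverable: let the caller show a field-level error
  }
  // conservative in what we produce: always the same display format
  return `(${national.slice(0, 3)}) ${national.slice(3, 6)}-${national.slice(6)}`;
}
```

Both `"(555) 123-4567"` and `"5551234567"` come back as `"(555) 123-4567"`; an input with too few digits returns `null` so the form can explain what is missing.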
+---
+## 5. Empty / Error / Loading State Rules
+For every interactive component, the design must address these states. Missing states are the single most common omission in AI-generated UI; treat them as a checklist.
+### Required states (all interactive components)
+| State | Minimum-viable handling |
+|---|---|
+| **Default** | The component at rest, ready for input or display. |
+| **Loading** | Visible feedback if action takes >300ms. Skeleton, spinner, or progress bar. Never a blank screen. |
+| **Empty** | Component is visible but holds no data. Show a brief explanation of what would normally appear and how to populate it (the "zero state"). Never silent. |
+| **Error** | Component cannot fulfill its job. Show what went wrong in plain language, with a concrete recovery action (retry, contact, alternative path). Never just "An error occurred." |
+### Conditional states (apply per component type)
+| State | Applies when | Handling |
+|---|---|---|
+| **Success** | Action has a meaningful completed state (form submit, file upload) | Brief acknowledgement + next step. No celebration confetti unless the action genuinely warrants it. |
+| **Disabled** | Action is unavailable in the current context | Visually muted; tooltip or label explains *why* it's disabled and what would enable it. Never disable without explanation. |
+| **Read-only** | Component shows data the user can't edit | Visually distinct from editable (no input border, no cursor); copy-to-clipboard if data is referenceable. |
+| **Over-limit** | Input has a max length, count, or quota | Live counter visible at ≥80% capacity; clear feedback when limit is hit and what the user can do (delete, upgrade). |
+| **Partial / degraded** | Component depends on a service that's slow or down | Show what's available + named outage explanation; do not pretend everything is fine. |
+### Reviewer rule
+For each interactive component in the design, walk the four required states. If any is missing, that's a finding — severity depends on the criticality of the component. Login forms missing an error state = blocker. Footer-link list missing a loading state = drop (loading isn't a thing for static content).
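The walk described in the reviewer rule can be sketched as a small check. The `ComponentDesign` shape and the state names as lowercase strings are assumptions made for illustration; severity still has to be judged per component, as described above.

```typescript
// The four required states from the table above.
const REQUIRED_STATES = ["default", "loading", "empty", "error"] as const;
type RequiredState = (typeof REQUIRED_STATES)[number];

// Hypothetical shape: a component name plus the states the design actually covers.
interface ComponentDesign {
  name: string;
  states: string[];
}

// Return the required states the design does not address, in checklist order.
function missingRequiredStates(component: ComponentDesign): RequiredState[] {
  const present = new Set(component.states.map((state) => state.toLowerCase()));
  return REQUIRED_STATES.filter((state) => !present.has(state));
}
```

A login form designed with only `Default` and `Loading` would report `empty` and `error` as missing; the reviewer then decides whether each gap is a blocker or can be dropped.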
+---
+## Notes for AI Agents Generating UI
+- Default to **fewer choices** (Hick) and **standard patterns** (Jakob). The benefit of a novel layout almost never outweighs the cost of users not recognizing it.
+- Default to **visible feedback within 100ms** for any interaction. Even a focus-ring change counts.
+- Default to **calm, blameless error messages** with a named recovery path. "Couldn't reach the server. Retry?" beats "An error occurred."
+- When in doubt about contrast, check; do not estimate. AI-generated palettes routinely miss 4.5:1.
+- When in doubt about target size, go to 44×44. The 24×24 minimum is the floor, not the goal.

package/skills/references/design-input-formats.md ADDED

@@ -0,0 +1,190 @@
+# Design Input Formats Reference
+Loaded by `grimoire-design` (variant generation) and `grimoire-draft` (Figma snapshot consumption). Defines the input sources grimoire-design can consume, in precedence order, and the fallbacks when none are available.
+The precedence is **Figma MCP → other MCPs → static HTML → ASCII**. Higher-fidelity inputs win; lower-fidelity outputs are emitted only when nothing better is available. See ADR-0018 for the Figma-primary decision.
+---
+## 1. Figma MCP (primary)
+When `project.design_tool.mcp` is configured with the Figma server, grimoire-design queries Figma directly for frame data and component metadata.
+### Setup
+Configured at `grimoire init` time. Stored as:
+```yaml
+project:
+  design_tool:
+    name: figma
+    mcp:
+      name: figma-developer
+      command: npx
+      args: ["-y", "figma-developer-mcp@latest"]
+```
+The access token is **never** written to config. The MCP server reads `FIGMA_ACCESS_TOKEN` from the shell environment.
+### What to query
+- **Frame data** — given a Figma URL or node ID, fetch frame structure (children, sizes, positions). Use for converting a designed screen into a Gherkin scenario set.
+- **Variables** — Figma Variables → DTCG tokens. If the project's `.grimoire/brand/tokens.json` is missing and Figma Variables exist, offer to seed `tokens.json` from them via Tokens Studio export.
+- **Components** — query the file's components inventory. Cross-reference with `.grimoire/docs/components.md` to detect drift or net-new components.
+### Cache
+When grimoire-design or grimoire-draft fetches frame data, cache the response at `.grimoire/changes/<change-id>/designs/figma-snapshot.json`. Reuse cache for subsequent skills on the same change-id; refresh on user request.
+### Graceful degradation
+If the MCP is configured but the call fails (network, expired token, missing file permission):
+- Emit a one-line notice: "Figma MCP unreachable — `<error>`. Falling back to static HTML."
+- Continue with the HTML fallback (§5 below). Do not crash the workflow.
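One way to express this degradation path is a wrapper that never throws. This is a sketch under assumptions: `fetchFrame` stands in for whatever call reaches the Figma MCP, and the `DesignInput` result type is hypothetical rather than grimoire's actual interface.

```typescript
// Hypothetical result type: either Figma frame data or a notice that HTML fallback is in use.
type DesignInput<T> =
  | { source: "figma"; frame: T }
  | { source: "html-fallback"; notice: string };

// fetchFrame is a placeholder for the real MCP call; any failure degrades instead of crashing.
async function loadDesignInput<T>(fetchFrame: () => Promise<T>): Promise<DesignInput<T>> {
  try {
    return { source: "figma", frame: await fetchFrame() };
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    return {
      source: "html-fallback",
      notice: `Figma MCP unreachable — \`${reason}\`. Falling back to static HTML.`,
    };
  }
}
```

Returning a tagged result rather than rethrowing keeps the caller's control flow linear: it prints `notice` once and carries on with the HTML fallback.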
+---
+## 2. shadcn-ui MCP (optional)
+When the project uses shadcn-ui (detected via `components.json` or `@radix-ui/*` deps) and the shadcn MCP is installed, grimoire-design can fetch component source by name.
+### What to query
+- **Component fetch** — given a component name (e.g. `Button`, `DialogClose`), retrieve the canonical source. Use when generating variants to ensure they reference the actual project component shape, not a generic one.
+- **Variants list** — enumerate variants the project's component library exposes (e.g. `Button` → `default`, `destructive`, `ghost`, `outline`).
+### Activation
+Only engaged when `.grimoire/docs/components.md` lists shadcn-ui as the component library. Otherwise skip.
+---
+## 3. Storybook MCP (optional)
+When the project has Storybook (`.storybook/` directory, `*.stories.*` files), the Storybook MCP can extract story metadata.
+### What to query
+- **Story enumeration** — list all stories with their args, controls, and parameters. Use to derive states per component (default / loading / empty / error).
+- **Story rendering** — for a given story, fetch the rendered HTML snapshot (if Storybook is running locally with the addon installed).
+### Use case
+The richest source of per-component state coverage. When available, prefer over manually enumerating states in §9 of the grimoire-design workflow.
+---
+## 4. design-extract (optional)
+URL-to-tokens scraper. Given a live site URL, produces DTCG-format `tokens.json`.
+### When to use
+- Bootstrapping `tokens.json` from an existing site (e.g. migrating to grimoire mid-project)
+- Sanity-checking that hand-edited tokens match what's actually on a deployed page
+### Output
+Writes to stdout or a path; pipe to `.grimoire/brand/tokens.json` (or to a temp file for diffing).
+### Limitations
+- Computed styles only — no semantic grouping. Output is flat; group manually.
+- Misses tokens not present on the scanned page (e.g. error states never rendered).
+---
+## 5. HTML Fallback
+When no MCP is available, grimoire-design emits self-contained static HTML files at `.grimoire/changes/<change-id>/designs/variant-{n}.html`.
+### Structure
+```html
+<!DOCTYPE html>
+<html lang="en">
+<head>
+  <meta charset="utf-8" />
+  <title>Variant 1 — <change-id></title>
+  <style>
+    :root {
+      /* Brand tokens injected from .grimoire/brand/tokens.json */
+      --color-primary: #0066ff;
+      --color-text: #111827;
+      --spacing-base: 8px;
+      --font-family-base: Inter, sans-serif;
+    }
+    body { font-family: var(--font-family-base); color: var(--color-text); }
+    .button-primary { background: var(--color-primary); padding: var(--spacing-base); }
+    /* ... */
+  </style>
+</head>
+<body>
+  <main>
+    <!-- Variant markup -->
+  </main>
+</body>
+</html>
+```
+### Rules
+- **Self-contained** — no external CSS, no CDN scripts, no remote fonts. Designer opens the file directly; offline must work.
+- **CSS variables only** — every color, spacing, font value must reference a `--token` CSS variable defined in `:root`. The `:root` block is the bridge between `tokens.json` and rendered output.
+- **No JS** unless the variant is demonstrating an interaction that can't be shown statically. Prefer multiple HTML files showing each state over one file with JS state toggling.
+- **One file per variant** — `variant-1.html`, `variant-2.html`, `variant-3.html` by default. A `preview.html` file at the same level renders all variants × all states in a grid for side-by-side review.
+### Token referencing
+Generate the `:root` block by reading `.grimoire/brand/tokens.json` and emitting one CSS custom property per token. Mapping rule: `color.primary` → `--color-primary`, `font.family.base` → `--font-family-base`. Dot becomes hyphen, kebab-case throughout.
+If `tokens.json` is absent, emit neutral defaults (white background, system font, 8px spacing) and note in a top-of-file comment: `/* No brand tokens — using neutral defaults. Run grimoire-design --capture-brand. */`
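The mapping rule can be sketched as a recursive flatten over the token tree. This assumes DTCG-style leaves that carry a primitive `$value`; the function names and the neutral-default variable names are illustrative, not grimoire's actual implementation.

```typescript
type TokenTree = { [key: string]: unknown };

function isObject(value: unknown): value is TokenTree {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// camelCase segments become kebab-case so the output is kebab-case throughout.
function toKebab(segment: string): string {
  return segment.replace(/([a-z0-9])([A-Z])/g, "$1-$2").toLowerCase();
}

// Walk the tree; every node holding a $value becomes one custom property.
// Assumption: $value is a primitive (composite DTCG values are not handled here).
function tokensToCssVars(tree: TokenTree, path: string[] = []): string[] {
  const declarations: string[] = [];
  for (const [key, value] of Object.entries(tree)) {
    if (key.startsWith("$") || !isObject(value)) continue; // skip $type, $description, etc.
    const nextPath = [...path, toKebab(key)];
    if ("$value" in value) {
      declarations.push(`--${nextPath.join("-")}: ${String(value["$value"])};`);
    } else {
      declarations.push(...tokensToCssVars(value, nextPath));
    }
  }
  return declarations;
}

// Pass null when tokens.json is absent; the default variable names are assumptions.
function renderRootBlock(tree: TokenTree | null): string {
  const declarations = tree === null
    ? ["--color-background: #ffffff;", "--font-family-base: system-ui, sans-serif;", "--spacing-base: 8px;"]
    : tokensToCssVars(tree);
  const note = tree === null
    ? "/* No brand tokens — using neutral defaults. Run grimoire-design --capture-brand. */\n"
    : "";
  return `${note}:root {\n${declarations.map((line) => `  ${line}`).join("\n")}\n}`;
}
```

Given `{ color: { primary: { $value: "#0066ff" } } }`, the flatten step yields `--color-primary: #0066ff;`, matching the mapping rule above.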
+---
+## 6. ASCII Fallback
+For trivial scope (level 1-2 changes touching a single existing component), ASCII art in a markdown table is the right tier. Faster to author and read than HTML for low-stakes layout changes.
+### When to use
+- Single component, single state change
+- Pure layout reordering (no new visual treatment)
+- TUI surface (where HTML preview is irrelevant)
+- Quick sketch for a consult conversation, not a final spec
+### Convention
+```markdown
+## Variant 1 — login form, error state
+| Element        | Layout                              |
+|---             |---                                  |
+| Header         | [Logo]                  [Help link] |
+| Form           | Email:    [____________________]    |
+|                | Password: [____________________]    |
+|                | [!] Invalid credentials             |
+|                | [ Sign in ]      Forgot password?   |
+| Footer         | Terms · Privacy · v2.4              |
+```
+Use `[Element]` for interactive controls, `[!]` for error states, plain text for static labels. Markdown tables keep the structure readable in any viewer.
+### When NOT to use
+- Web or mobile surface with new visual treatment → use HTML
+- Multi-component or multi-state designs → ASCII collapses; use HTML grid
+- Anything a designer needs to react to visually → ASCII underspecifies
+---
+## Selection Rule (precedence)
+Grimoire-design picks the highest-fidelity output the environment supports:
+1. Figma MCP configured → render in Figma (no local artifact written)
+2. shadcn / Storybook MCP available + UI codebase detected → HTML using actual component source
+3. Otherwise → static HTML with brand-token CSS variables
+4. Override to ASCII only when scope is trivial OR surface is TUI
+User can override via conversational invocation: "use HTML" or "give me ASCII". The selection is a default, not a lock.
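The selection rule reads naturally as a small decision function. The sketch below is one interpretation: the `DesignEnvironment` fields and tier names are hypothetical, and it assumes that a user override wins first and that the trivial-scope/TUI override takes priority over Figma.

```typescript
type OutputTier = "figma" | "html-components" | "html-static" | "ascii";

// Hypothetical snapshot of what the environment supports for the current change.
interface DesignEnvironment {
  figmaMcpConfigured: boolean;
  componentMcpAvailable: boolean; // shadcn or Storybook MCP
  uiCodebaseDetected: boolean;
  trivialScope: boolean;
  tuiSurface: boolean;
  userOverride?: "html" | "ascii"; // "use HTML" / "give me ASCII"
}

function selectOutputTier(env: DesignEnvironment): OutputTier {
  // Rules 2 and 3: HTML from real component source when possible, otherwise static HTML.
  const htmlTier: OutputTier =
    env.componentMcpAvailable && env.uiCodebaseDetected ? "html-components" : "html-static";

  // The selection is a default, not a lock: an explicit user request wins.
  if (env.userOverride === "ascii") return "ascii";
  if (env.userOverride === "html") return htmlTier;

  // Rule 4 (assumed to outrank rule 1): trivial scope or a TUI surface drops to ASCII.
  if (env.trivialScope || env.tuiSurface) return "ascii";

  // Rule 1: Figma when configured.
  if (env.figmaMcpConfigured) return "figma";
  return htmlTier;
}
```

With nothing configured the result is `"html-static"`; configuring the Figma MCP flips it to `"figma"` unless the surface is a TUI or the user asks for something else.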