npm - @kiwidata/grimoire - Versions diffs - 0.2.2 → 0.3.1 - Mend

@kiwidata/grimoire 0.2.2 → 0.3.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (18) hide show

package/AGENTS.md +20 -10
package/README.md +11 -16
package/package.json +1 -1
package/skills/grimoire-apply/SKILL.md +1 -0
package/skills/grimoire-audit/SKILL.md +12 -1
package/skills/grimoire-bug/SKILL.md +3 -0
package/skills/grimoire-design/SKILL.md +6 -4
package/skills/grimoire-design-consult/SKILL.md +3 -3
package/skills/grimoire-draft/SKILL.md +45 -112
package/skills/grimoire-plan/SKILL.md +103 -13
package/skills/grimoire-review/SKILL.md +2 -0
package/skills/grimoire-verify/SKILL.md +20 -0
package/skills/references/artifact-map.md +13 -1
package/skills/references/design-spine.md +166 -0
package/skills/references/elicitation-personas.md +26 -2
package/skills/references/red-flags.md +62 -0
package/skills/references/review-personas.md +30 -2
package/templates/draft.md +6 -1

package/AGENTS.md CHANGED Viewed

@@ -103,22 +103,22 @@ User has a request
 ├─ "I want to add / change / remove functionality"
 │  │
 │  ├─ Adding new behavior?
-│  │  → /grimoire:draft → write new .feature file
+│  │  → /grimoire:draft → design the new behavior (plan projects the .feature)
 │  │
 │  ├─ Changing existing behavior?
-│  │  → /grimoire:draft → modify existing .feature file
+│  │  → /grimoire:draft → design the change (plan projects it into the .feature)
 │  │
 │  ├─ Removing a feature?
 │  │  → /grimoire:remove → tracked removal with impact assessment
 │  │
 │  └─ Does it also involve a technology/architecture choice?
-│     → Draft BOTH: .feature file + MADR decision record in the same change
+│     → Draft BOTH the behavior and the decision in one change
 │
 ├─ "We should use X instead of Y" / "How should we architect this?"
-│  → /grimoire:draft → MADR decision record (not a feature)
+│  → /grimoire:draft → design the decision (plan projects a MADR, not a feature)
 │
 ├─ "We need to handle X concurrent users / meet Y compliance"
-│  → /grimoire:draft → MADR decision record (non-functional requirement)
+│  → /grimoire:draft → design the requirement (plan projects a MADR / constraint)
 │
 ├─ "What do we have? What's documented?"
 │  → /grimoire:audit → discover undocumented features and decisions
@@ -156,9 +156,11 @@ The end-to-end flow for adding or modifying behavior is six stages, each owned b
 **Draft** (`/grimoire:draft`) → **Plan** (`/grimoire:plan`) → **Review** (`/grimoire:review`, optional) → **Apply** (`/grimoire:apply`) → **Verify** (`/grimoire:verify`) → **PR** (`grimoire pr`).
+Draft's single job is to design the change on `draft.md`. **Projection** — turning that agreed design into features, constraints, MADRs, `data.yml`, and the manifest — is the **first step of Plan**, not the end of Draft.
 Each skill's SKILL.md is the authoritative home for that stage's mechanics; the README "Workflow" section is the narrative walkthrough. Do not re-derive stage steps here — invoke the skill. The operational invariants that bind every stage:
-- **Manifest status tracks progress:** `approved` after draft, `implementing` during apply, `accepted` at PR.
+- **Manifest status tracks progress:** the manifest is created at plan's projection step; `approved` once the design is agreed and projected, `implementing` during apply, `accepted` at PR.
 - **Live on the branch.** Features, decisions, constraints, and schema are edited directly on the feature branch — no copy-into-change-folder, no promote step.
 - **No archive step.** The PR diff *is* the change; git history plus the `Change: <id>` commit trailer are the record. PR finalize just flips decision status to `accepted` and removes the ephemeral change folder.
 - **The user drives the pace.** Review mode (default) approves every file change before writing; autonomous mode works the full task list, stopping only on blockers.
@@ -204,9 +206,9 @@ project-root/
 ## Conventions
 ### Manifest Status Lifecycle
-Every manifest has a `status` field in YAML frontmatter:
-- `draft` — being written, not yet reviewed
-- `approved` — reviewed by user, ready for planning/implementation
+Every manifest has a `status` field in YAML frontmatter (the manifest is created at plan's projection step, from the already-agreed design):
+- `draft` — just projected from the agreed `draft.md`
+- `approved` — design agreed and projected, ready for implementation
 - `implementing` — tasks are being worked on
 Update the status as the change progresses. The CLI reads this to report change state. There is no `complete`/archive state — finalize removes the ephemeral change folder once the PR is opened; git history is the record.
@@ -248,7 +250,15 @@ This is what makes `grimoire trace` work. Without it, the commit is invisible to
 ### Decision Numbering
 - Sequential, zero-padded: `0001-`, `0002-`, etc.
 - Never reuse numbers
-- Superseded decisions keep their number, status updated to `superseded by NNNN`
+### Decision Lifecycle
+Status moves `proposed → accepted → (deprecated | superseded by NNNN)`:
+- `proposed` — drafted, not yet adopted.
+- `accepted` — in force; treated as a constraint by every stage.
+- `deprecated` — no longer recommended, with no direct replacement (the need went away).
+- `superseded by NNNN` — replaced by a newer decision.
+Supersession is **two-way and explicit**: the superseding ADR back-links the one it replaces (in Context or Decision Drivers), and the superseded ADR keeps its number with status set to `superseded by NNNN`. This is the only home for the link — don't restate it elsewhere.
 ### Step Definitions
 Organize by **domain concept**, NOT by feature file. Check the project's existing test setup and match its BDD framework conventions. See the active skill's testing reference for ecosystem-specific patterns.

package/README.md CHANGED Viewed

@@ -77,8 +77,8 @@ Then talk to your AI assistant:
 ```
 You: "Users should be able to log in with 2FA"
-→ /grimoire:draft    Creates login.feature with Given/When/Then scenarios
-→ /grimoire:plan     Generates tasks: write the test, then production code
+→ /grimoire:draft    Designs the change on one living draft.md (Given/When/Then take shape here)
+→ /grimoire:plan     Projects the design into login.feature + decisions, then generates tasks
 → /grimoire:review   (optional) Product, security, engineering + principles review
 → /grimoire:apply    Implements test-first (BDD for behavior, unit for invariants)
 → /grimoire:verify   Confirms all scenarios pass, no regressions
@@ -115,11 +115,11 @@ Grimoire routes your request to its one correct home (an admission test keeps ea
 - **"The login page is broken"** → `/grimoire:bug` (reproduce first, then fix)
 - **"A tester found a problem"** → `/grimoire:bug-report` → `/grimoire:bug-triage` → routed fix
-A `.feature` is allowed only if it has an external actor, is observable without reading code/logs, uses domain language, and survives a reimplementation. Security controls, NFRs, and observability guarantees are invariants → they live in the constraints register. Produces `.feature` files (with security tags like `@security`, `@auth`, `@pii`, `@pci-dss` when applicable), constraint entries, decision records, `data.yml` for schema changes, and a manifest tracking the change.
+A `.feature` is allowed only if it has an external actor, is observable without reading code/logs, uses domain language, and survives a reimplementation. Security controls, NFRs, and observability guarantees are invariants → they live in the constraints register. You design all this on one living `draft.md`; it is **projected** into its homes — `.feature` files (with security tags like `@security`, `@auth`, `@pii`, `@pci-dss` when applicable), constraint entries, decision records, `data.yml` for schema changes, and a manifest — at the **start of Plan**, so Draft's one job is to design the change.
-### 2. Plan — Generate concrete tasks
+### 2. Plan — Project the design, then generate concrete tasks
-Every scenario becomes a pair: write the step definition (test), then write the production code. Tasks reference exact file paths, exact assertions, and real patterns from area docs. Data changes (models, migrations) are ordered before feature code.
+Plan opens by **projecting** the agreed `draft.md` into its homes (features, constraints, decisions, `data.yml`, manifest), running the admission test and principles gate as it goes. Then every scenario becomes a pair: write the step definition (test), then write the production code. Tasks reference exact file paths, exact assertions, and real patterns from area docs, ordered along the technical spine (dependencies → data → API → logic → UI → verification).
 The plan skill reads area docs for conventions and boundaries, and queries the code graph for reusable utilities and exact symbols — so the AI plans with real codebase knowledge, not guesses. Each task is tagged with its verification level: `scenario` (behavior), `unit-invariant` (a constraint), or `characterization` (internal/refactor).
@@ -176,7 +176,7 @@ Grimoire treats design as a first-class spec input, not an afterthought.
 - **Brand capture at init** — `grimoire init` offers to capture colors, type, spacing, and voice into `.grimoire/brand/` (DTCG tokens). Skip-able; can be added later via `grimoire-design --capture-brand`.
 - **Consult (optional)** — `/grimoire:design-consult` runs a pre-design Q&A. Security and data personas interview the designer about the proposed change *before* any artifacts exist, surfacing assumptions and constraints early. No findings, no blockers — just questions whose answers will shape the design.
 - **Design** — `/grimoire:design` walks: problem statement → user flow & pain points → variants (Figma MCP, static HTML, or ASCII) → required component states (default/loading/empty/error) → proposed Gherkin scenarios for each (component × state).
-- **Handoff** — accepted scenarios feed `/grimoire:draft` (manifest + ADRs), then `/grimoire:plan` (tasks), then `/grimoire:review` — **mandatory at complexity 4** with surface-conditional adversarial personas (keyboard, screen-reader, contrast on web; touch + gesture on mobile; keyboard-only on TUI).
+- **Handoff** — accepted scenarios feed `/grimoire:draft` (design), then `/grimoire:plan` (projects manifest + ADRs, then tasks), then `/grimoire:review` — **mandatory at complexity 4** with surface-conditional adversarial personas (keyboard, screen-reader, contrast on web; touch + gesture on mobile; keyboard-only on TUI).
 - **Revision** — `/grimoire:design --revise` re-enters an existing design without restarting. Shows current variants and Gherkin, asks what to change, regenerates only the affected artifacts. Previously-accepted scenarios are not overwritten without confirmation.
 Brand-drift lint (`grimoire-design --lint`) cross-references hardcoded colors / px / fonts against `.grimoire/brand/tokens.json` and suggests token replacements. Wired into precommit-review when tokens exist.
@@ -194,19 +194,14 @@ Full grimoire cycle end-to-end — adding two-factor authentication to an existi
 You: "Users should verify their identity with a TOTP code after entering their password"
 ```
-The AI runs `/grimoire:draft` and produces:
+The AI runs `/grimoire:draft` and designs the change on one living `draft.md` — the decision ledger (Y-statements), behavioral sketches, and an open/decided ledger — iterating with you until the design is agreed:
 ```
 .grimoire/changes/add-2fa-login/
-├── manifest.md              # Why, what's changing, scope
-├── features/
-│   └── auth/
-│       └── login.feature    # Updated with 2FA scenarios
-└── decisions/
-    └── 0003-totp-library.md # Chose pyotp over django-otp
+└── draft.md   # the agreed design; scenarios take shape here before projection
 ```
-**login.feature:**
+The scenarios that emerge (projected into `login.feature` at the start of Plan):
 ```gherkin
 Feature: Login with two-factor authentication
   As a user
@@ -235,11 +230,11 @@ Feature: Login with two-factor authentication
     And I should remain on the verification page
 ```
-You review and approve. Manifest status: `draft` → `approved`.
+You review and approve the **design**. Nothing is written to `features/` or `decisions/` yet — projection happens next, in Plan.
 ### Plan
-The AI runs `/grimoire:plan`, reads the approved features + area docs + data schema, and generates `tasks.md`:
+The AI runs `/grimoire:plan`, which **first projects** the agreed `draft.md` into its homes — `manifest.md`, `features/auth/login.feature` (the scenarios above), and `decisions/0003-totp-library.md` — running the admission test as it routes each fact. It then reads those homes + area docs + the code graph and generates `tasks.md`, ordered along the technical spine:
 ```markdown
 # Tasks: add-2fa-login

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "@kiwidata/grimoire",
-  "version": "0.2.2",
+  "version": "0.3.1",
   "description": "Gherkin + MADR spec-driven development for AI coding assistants",
   "type": "module",
   "bin": {

package/skills/grimoire-apply/SKILL.md CHANGED Viewed

@@ -384,6 +384,7 @@ Present a brief summary:
 ## Important
 - **Tests are not optional.** Every task produces both production code and passing step definitions. No exceptions.
 - **Red-green is mandatory, not aspirational.** A test must fail before it passes. If it doesn't fail, it's not a real test. Fix it before moving on.
+- **Code-before-test is the most common bypass.** "I'll add the test after" / "let me see it work first" are the *Code before the test* rationalization in `../references/red-flags.md`. If you wrote code before the test, delete the code and start from red.
 - **A test that always passes is worse than no test.** It gives false confidence. If you can't make a step definition fail, you don't understand what it's testing.
 - The feature file is the spec. If a test fails, fix the code, not the feature.
 - If implementation reveals that a scenario is wrong or missing, STOP and go back to draft. Don't silently change features.

package/skills/grimoire-audit/SKILL.md CHANGED Viewed

@@ -107,6 +107,9 @@ For confirmed items, create a grimoire change:
 Group related items into single changes — don't create one change per discovery.
 ### 6. Dead Feature Detection
+**Detection is deterministic.** Every dead/stale finding cites exact `file:line` (or ADR id) evidence from a reproducible check — codebase-memory-mcp graph queries (`search_graph` / `get_architecture`) per [0029]/[0030], with `grep` / `git blame` only where the graph has no answer (e.g. `@skip` age). The same commit yields the same findings. The LLM summarizes and interviews; it does not score the codebase by impression.
 Check for documented features and decisions that may no longer be accurate:
 **Dead features** — feature files that describe behavior the code no longer implements:
@@ -137,7 +140,15 @@ After the interview, summarize:
 - How many decisions are documented vs. undocumented
 - How many decisions are stale
 - How many conventions files drifted vs. up-to-date
-- Suggest which areas to address first (highest risk / most complex / most frequently changed)
+Then emit a **Top Actions** list — most-risk first, each with the exact path and the single next move. The ranking comes from the deterministic checks (§6), not impression, so the same commit yields the same list:
+```markdown
+## Top Actions
+1. `features/billing/invoice.feature` — dead (InvoiceView deleted ~3mo ago); create a removal change.
+2. `.grimoire/decisions/0007-search-backend.md` — stale (library no longer in deps); deprecate or update.
+3. `.grimoire/docs/conventions/api.md` — drifted (views moved to `src/api/handlers/`); refresh.
+```
 ## Important
 - This is a COLLABORATIVE process, not a dump. Interview, don't lecture.

package/skills/grimoire-bug/SKILL.md CHANGED Viewed

@@ -55,6 +55,8 @@ Before touching any production code:
 2. Run it — **it MUST FAIL**, reproducing the bug
 3. If it passes, your test doesn't actually reproduce the bug. Fix the test until it fails for the right reason.
+**Name it after the bug.** This repro test stays as the permanent regression test — name it so the bug is obvious (`test_password_reset_special_chars`; scenario "Password reset with plus-sign email"). One bug → one named regression test. This is how the same bug doesn't come back: a future change that reintroduces it goes red on a test that names the defect.
 This is non-negotiable. A bug fix without a reproduction test is a guess that might work. A failing test is proof you understand the problem.
 ### 4. Document the Bug
@@ -193,6 +195,7 @@ Report to the user:
 ## Important
 - **Reproduce before you fix.** No exceptions. If you can't reproduce it, you don't understand it, and your fix is a guess.
+- **The test is the source of truth, not your self-review.** When the same agent writes a fix and then reviews it, the same wrong assumption rides into both steps — "looks correct" is not evidence. The red→green of the named regression test (and the configured suites) is the proof. Don't declare a bug fixed on a code re-read; declare it fixed when the mechanical gate flips and stays green.
 - **Small fixes only.** If the bug fix requires significant architectural changes, it's not a bug fix — route to `grimoire-draft` for a proper change.
 - **Don't over-document.** The test is the documentation. A one-line comment in the test explaining the bug is enough. Don't create tracking files, bug reports, or manifests for a bug fix.
 - **The feature file is truth.** If a scenario describes behavior the user now says is wrong, that's a spec change, not a bug. Handle it through `grimoire-draft`.

package/skills/grimoire-design/SKILL.md CHANGED Viewed

@@ -61,6 +61,8 @@ Render the chosen template for the user to fill in. Write the result to `problem
 ### 3. User Flow & Success Metrics
 Ask for a **friction-log narrative** as the default minimum-viable level — a short prose description of the current user journey and where it hurts. Offer (but never force) two upgrades: Mermaid journey diagram, then service blueprint.
+Walk the flow on the **UX-workflow spine** (`../references/design-spine.md`): pick a direction — **backward** from the goal ("what must be true for the user to reach here?") when the goal is clear but the path isn't, or **forward** from what the user already knows when documenting an existing happy path — then validate by walking the other way; gaps where the two don't meet are missing or assumed steps. Reconstruct the *real* sequence with laddering / the Mom Test (`../references/elicitation-personas.md`), not an imagined one. Keep the simplicity bias: a surfaced step is a candidate, and a step that serves no part of the stated goal is cut, not designed.
 Separately — not bundled — ask "What are the user's current pain points?" Accept a bulleted list, free text, or "none known". Capture pain points in `problem.md` under a dedicated `## Pain Points` section. Variants generated in step 6 must each state which pain points they address (or explicitly mark "deferred").
 Ask for at least one measurable success metric (e.g., "reduce support tickets about lockouts by 50%"). If the user cannot articulate one, note `no success metric — design effectiveness will be hard to evaluate` as an assumption in `problem.md`.
@@ -73,7 +75,7 @@ If `.grimoire/docs/components.md` is absent AND the project has UI code, ask the
 - MUI / Chakra / Mantine / Radix imports in `package.json`
 - `*.stories.{ts,tsx,jsx,js}` (Storybook stories)
-Write findings to `.grimoire/docs/components.md` listing detected components with file paths and known variants. Skip the scan entirely if no UI signals are present (greenfield or non-UI surface). Subsequent variants prefer existing components over net-new designs and flag net-new explicitly.
+Read at **interface altitude** — detect components from these manifests, imports, and stories, not by reading component source bodies (`../references/artifact-map.md` → Reading altitude). Write findings to `.grimoire/docs/components.md` listing detected components with file paths and known variants. Skip the scan entirely if no UI signals are present (greenfield or non-UI surface). Subsequent variants prefer existing components over net-new designs and flag net-new explicitly.
 ### 5. Brand Grounding
 Read `.grimoire/brand/tokens.json` and `.grimoire/brand/voice.md` if they exist. Use the format documented in `../references/brand-tokens-format.md`. Required groups: `color.*`, `font.family.*`, `font.size.*`, `spacing.*`.
@@ -128,7 +130,7 @@ For HTML output, write a single `.grimoire/changes/<change-id>/designs/preview.h
 Skip preview rendering when output is Figma (the Figma file IS the preview) or ASCII (the markdown table IS the preview).
 ### 9. Derive Gherkin
-Propose draft scenarios as a **design artifact** at `.grimoire/changes/<change-id>/designs/scenarios.feature` (a proposal, not the live baseline). One Scenario per (component × state), Given / When / Then grounded in the design. `grimoire-draft` writes the user-approved scenarios live into `features/` — design does not edit the live baseline directly. Every proposed scenario must still pass draft's feature admission test (external actor, observable, domain language).
+Propose draft scenarios as a **design artifact** at `.grimoire/changes/<change-id>/designs/scenarios.feature` (a proposal, not the live baseline). One Scenario per (component × state), Given / When / Then grounded in the design. These are a proposal: `grimoire-draft` carries them into the design, and `grimoire-plan` projects the user-approved scenarios live into `features/` — design does not edit the live baseline directly. Every proposed scenario must still pass the feature admission test (external actor, observable, domain language).
 Apply surface-conditional adversarial scenarios per `../references/adversarial-personas.md`:
@@ -143,7 +145,7 @@ Present the proposed scenarios for review: "Review proposed scenarios — accept
 ### 10. Handoff
 When the user accepts proposed scenarios, the change folder is populated. Suggest the next step:
-> Run `grimoire-draft` to refine the manifest and ADRs, or `grimoire-plan` to break into tasks.
+> Run `grimoire-draft` to refine the behavioral design, then `grimoire-plan` to project it (features/constraints/ADRs/manifest) and break into tasks.
 Skill is done.
@@ -256,4 +258,4 @@ Do not error — absence is a valid state for projects that haven't onboarded br
 - **Brand drift findings are suggestions, not blockers.** Lint mode proposes token replacements; it does not auto-rewrite code. The user decides whether to apply.
 ## Done
-When the user accepts proposed Gherkin scenarios and the change folder contains `problem.md`, `designs/`, and `features/`, the workflow is complete. Suggest `grimoire-draft` (manifest + ADRs) or `grimoire-plan` (task breakdown) next.
+When the user accepts proposed Gherkin scenarios and the change folder contains `problem.md` and `designs/`, the workflow is complete. Suggest `grimoire-draft` (refine the behavioral design) next, then `grimoire-plan` (project to features/ADRs/manifest + task breakdown).

package/skills/grimoire-design-consult/SKILL.md CHANGED Viewed

@@ -181,9 +181,9 @@ Do not invent content for empty sections. If the designer skipped "Proposed user
 ### 6. Handoff
 Tell the user what runs next and what those skills will do with `consult.md`:
-- **`grimoire-design`** on the same `change-id` will read `consult.md` first, propagate assumptions and givens into prompts for variant generation (e.g., "exclude patterns that violate givens"), and copy the lists into `manifest.md` when the designer accepts a direction.
-- **`grimoire-draft`** on the same `change-id` will read `consult.md` and copy "Inferred assumptions" + "Inferred givens" into `manifest.md` (Assumptions section, plus a new Givens section) at level 3-4 complexity.
-- **Open questions** travel into `manifest.md` as unvalidated assumptions for the designer/PM to resolve before plan.
+- **`grimoire-design`** on the same `change-id` will read `consult.md` first and propagate assumptions and givens into prompts for variant generation (e.g., "exclude patterns that violate givens"); the lists carry into the design and are projected into `manifest.md` by `grimoire-plan`.
+- **`grimoire-draft`** on the same `change-id` will read `consult.md` and carry "Inferred assumptions" + "Inferred givens" into the `draft.md` design (assumptions → Decided/Open; givens → Decisions-ledger context). `grimoire-plan` then projects these into the manifest's Assumptions section at level 3-4 complexity.
+- **Open questions** stay in `consult.md` as designer follow-up items — they are not copied forward.
 Do not invoke the next skill automatically. Confirm with the user, then suggest the next command.

package/skills/grimoire-draft/SKILL.md CHANGED Viewed

@@ -1,6 +1,6 @@
 ---
 name: grimoire-draft
-description: Design a change collaboratively on one living draft.md, then project it into Gherkin features, constraints, and MADR decisions. Use when the user describes new functionality, requirements, or architecture choices.
+description: Design a change collaboratively on one living draft.md. grimoire-plan then projects it into Gherkin features, constraints, and MADR decisions. Use when the user describes new functionality, requirements, or architecture choices.
 compatibility: Designed for Claude Code (or similar products)
 metadata:
   author: kiwi-data
@@ -9,12 +9,13 @@ metadata:
 # grimoire-draft
-Design a change on **one living document** (`draft.md`), iterating with the user, then
-**project** the agreed design into its durable homes (features, constraints, decisions).
+Design a change on **one living document** (`draft.md`), iterating with the user until the
+design is agreed. `grimoire-plan` then **projects** that design into its durable homes
+(features, constraints, decisions) — draft itself does not write them.
 The core idea: spread-out artifacts hinder the thinking. So you do all the designing in a
 single coherent doc — diagram/sketch, a decision ledger, pseudo-code, an open-question
-ledger — and only fragment it into separate homes **after agreement**. `draft.md` is
+ledger — and it is fragmented into separate homes (by plan) only **after agreement**. `draft.md` is
 ephemeral: retained as reference through the pipeline, deleted when `grimoire-apply` clears
 the change folder. Git history preserves it.
@@ -27,7 +28,7 @@ the change folder. Git history preserves it.
 ## Routing (coarse — up front)
 Decide only whether to design at all, and in which skill. The **fine** routing (which fact
-becomes a feature vs. a constraint vs. a decision) happens later, at projection (step 7).
+becomes a feature vs. a constraint vs. a decision) happens later, at projection — now `grimoire-plan`'s first step.
 - Bug report ("something is broken") → `grimoire-bug` or `grimoire-bug-report`
 - Pure refactoring (no behavior change) → no grimoire artifact needed. Suggest an ADR only if architecturally significant.
@@ -40,7 +41,7 @@ becomes a feature vs. a constraint vs. a decision) happens later, at projection
 Confirm this is a change worth designing, and which skill owns it (table above). You do
 **not** need to assign each fact to a home yet — during design everything lives in one
-`draft.md`; per-fact routing is a projection concern (step 7, D13).
+`draft.md`; per-fact routing is a projection concern, handled in `grimoire-plan`.
 The one up-front question that matters: **is this a behavior/feature/architecture change**
 (→ design it here), or a bug / pure refactor / config tweak (→ route away, per the table)?
@@ -54,9 +55,9 @@ the design exists. Up front, make only one binary call:
 - **Trivial** — config, typo, copy change, single-file fix, dependency bump. Skip the
   `draft.md` loop: make the change directly, record a minimal `manifest.md` (Why + file
   list), done.
-- **Non-trivial** — anything else. Build a `draft.md` and design the change (steps 3–8).
+- **Non-trivial** — anything else. Build a `draft.md` and design the change (steps 3–7).
-The full **complexity level (1–4)** is scored at **projection** (step 7), once the design
+The full **complexity level (1–4)** is scored at **projection** (`grimoire-plan`'s first step), once the design
 is settled, and written to `manifest.md` — not before (a premature number biases the design
 to fit it). During design, use the table below only as a rough guide for how deep to
 research and elicit; depth grows with the change, it is not pre-allocated.
@@ -79,7 +80,7 @@ Before designing, research what already exists. Do not ask the user to research
   built-ins / first-party check; architecture decisions, new dependencies, or cross-cutting
   concerns need full research across all categories.
-Follow the methodology in `../references/build-vs-buy.md`. The findings feed the `draft.md`
+Follow the methodology in `../references/build-vs-buy.md`. Read candidates at **interface altitude** — public API, types, and docs, not their source or tests (`../references/artifact-map.md` → Reading altitude). The findings feed the `draft.md`
 **Why** (and, for an adopt/build/hybrid call, the manifest **Prior Art** at projection).
 Present findings to the user and get agreement on direction before designing deeply.
@@ -111,17 +112,29 @@ This single loop replaces what used to be separate "elicit requirements", "draft
 "collaborate" steps. **Interviewing IS iterating on `draft.md`.** There is no gather-then-
 transcribe split — requirements surface, get questioned, and resolve inside the doc.
+**Walk the spine.** Pick the spine this change rides (`../references/design-spine.md`): the **technical spine** (process/constraints → data model → API/contract → UI) for behavioral/technical work, or the **UX-workflow spine** (backward from the goal, or forward from what the user knows) for a user-facing flow. Then **always walk its layers in order** — at each layer: elicit with that layer's lens, record its decisions, and validate the prior layer against what you just learned (a required field must trace to a downstream need; the data model must satisfy the process constraints). An empty layer is a one-line skip, not a silent omission. Ceremony scales to constraints: lightweight by default, but once the change introduces **more than 2 constraints**, walk every layer formally — that 3rd constraint is complexity surfacing.
 Iterate with the user, directly on `draft.md`:
 ```
 loop:
-  propose   → decisions into the Decisions ledger; shapes into Sketches; a diagram/sketch into At a glance
+  propose   → decisions into the Decisions ledger as Y-statements (../references/design-spine.md); shapes into Sketches; a diagram/sketch into At a glance
   question  → unknowns become rows under Open (use ../references/elicitation-personas.md as lenses)
   user reacts → answers / edits the doc
   resolve   → strike the Open row IN PLACE: `RESOLVED: <answer> (Dn)` — never delete it
 until Decided is stable AND Open is empty-or-deferred.
 ```
+**Explore before you converge.** When the design approach is genuinely open — more than one
+reasonable shape exists — sketch **2–3 candidate approaches** at a high level (one or two
+lines each: the idea + its main trade-off) and let the user pick a direction *before* you
+deep-dive the ledger on one. Don't silently commit to the first idea that works; the first
+idea is rarely the best, and an unexamined commitment is the *Silently filling a gap* red
+flag at design scale (`../references/red-flags.md`). Keep this lightweight: it is a quick
+approach-level fork, not a variants matrix — **visual/UI variants are `grimoire-design`'s
+job**. When the approach is obvious or forced (one viable shape), say so in one line and
+proceed; don't manufacture alternatives.
 Discipline for the loop:
 1. **Outcome & Non-goals first.** Pin these (into *Why*) before anything else — they set scope. Restate them back to the user.
@@ -131,124 +144,44 @@ Discipline for the loop:
 5. **Disambiguate immediately.** If an answer is vague ("handle errors gracefully"), ask the specific follow-up and record the concrete answer. Never leave a vague answer in the ledger.
 6. **Capture, don't extrapolate.** "Out of scope for now" → record as a non-goal and stop. Don't design a scenario "just in case".
 7. **When the user delegates** ("just write something reasonable"), record it explicitly as an Open→RESOLVED row: "Defaulting to <choice> per user delegation — flag in review if wrong." The assumption stays visible.
-8. **Sort facts by kind as they emerge.** An invariant (security control, NFR, performance budget, observability guarantee) is not a behavior — capture it in the *Constraints* section, not as a behavioral sketch. Apply the rough behaviour-vs-invariant test as you design (does an external actor observe it without reading code/logs?) so projection's admission test (step 7) gets clean input instead of slop to reroute. The fine fact-to-home routing still happens at projection; this just keeps the design honest while you think.
+8. **Sort facts by kind as they emerge.** An invariant (security control, NFR, performance budget, observability guarantee) is not a behavior — capture it in the *Constraints* section, not as a behavioral sketch. Apply the rough behaviour-vs-invariant test as you design (does an external actor observe it without reading code/logs?) so projection's admission test (in `grimoire-plan`) gets clean input instead of slop to reroute. The fine fact-to-home routing still happens at projection; this just keeps the design honest while you think.
 **Never silently fill an open question.** Either ask it (as an *Open* row), defer it to a non-goal, or record the inference explicitly in *Decided*. The *Decided/Open* ledger IS the requirements summary — before declaring the design done, walk it back to the user so they see every call and every guess.
 **Nothing is written to `features/`, `.grimoire/docs/constraints.md`, or `.grimoire/decisions/` during this loop.** Everything lives in `draft.md`. The design is "done" when *Decided* is stable and *Open* is empty-or-deferred — and the user agrees.
-Do NOT proceed to projection without explicit user approval of the design.
-### 7. Projection — generate the homes from draft.md
-Once the user agrees the design is settled, project `draft.md` into its durable homes. This
-is where the **fine routing** happens (each fact → its one home) and where the admission
-test + principles gate run. Artifacts are written **live in their real locations** on the
-branch — `git diff` is the staging area; there is no copy-into-the-change-folder.
-First, **score the complexity level (1–4)** now that the design is settled, and write it to
-`manifest.md` frontmatter as `complexity: <1-4>`.
-Then project each kind of fact:
-**Behaviors → `features/*.feature`.** For each behavioral fact in the design:
-*The feature-file admission test* — a scenario may be written **only if it passes all four gates**; if it fails any, it is a constraint or a decision, not a feature:
-1. **External actor, outside the system boundary** — an end user, an operator, or a *third-party* system integrating with you does the thing. "External" means outside *your* system, not outside one module: a sibling service, an internal queue consumer, or another module in the same repo calling this one is **internal**, even though it's a separate process. Internal actor → contract test or constraint/decision, never a `.feature`.
-2. **Observable** — the actor sees the outcome without reading code or logs. "<200ms", "logs scrubbed of PII" → fails → constraint.
-3. **Domain language** — domain nouns, zero implementation detail. Names a library/log-level/table (`loguru`, `INFO`, `bcrypt`, `users` table) → fails → leaking implementation.
-4. **Survives reimplementation** — rewrite the internals from scratch; would the scenario still read the same? If it would change, it's pinned to implementation → not a feature.
-**Internal protocols and service-to-service contracts are NOT features.** A change to how two of your own components talk — an internal RPC/queue/event shape, a module API, a wire format between your services — is a *contract*, verified by a contract/integration test (`verify: unit-invariant` at plan stage), not by Gherkin. It fails gate 1: there is no external actor, only your own code on both ends. If a third-party integrates against the protocol it's external and may be a feature; two of your own services is internal. This is the second-biggest source of feature-file slop after invariants.
-Common slop this catches: invariants (→ `constraints.md`) — "PII is scrubbed from logs", "all endpoints require auth", "responses are gzipped", "errors logged with a trace id"; internal protocols (→ contract test) — "service A publishes an OrderPlaced event B consumes", "the worker accepts a job payload with these fields", "module X returns this struct to module Y".
-*Extend vs. new — default is always extend; new files are the exception and require justification.* List existing feature files first (**required, not skippable** — do not write any scenario until this triage table is complete):
-```
-Existing feature files:
-  features/auth/login.feature         — "User Login"
-  features/billing/invoices.feature   — "Invoice Management"
-```
-For each scenario, decide extend-or-new and show it:
-```
-  "Admin resets a user's password"  → extend features/auth/login.feature (same actor domain: auth)
-  "User configures SSO provider"    → NEW (no existing file owns SSO configuration)
-```
-Signals to extend: same actor, same domain object, same entry point, same HTTP resource or screen. Signals genuinely new: new actor type with no existing file, entirely new domain object, or the existing Feature title would need "and" to cover both. If unsure, extend. A new file requires stating which files were considered and why none fit.
-Then write Gherkin (Feature title + user story; Background for shared preconditions; one scenario per behavior; Given/When/Then describing WHAT, never HOW). Apply security tags per `../references/security-compliance.md` (only when there's a security surface; compliance tags only when `project.compliance` is set). When design input grounded the scenarios (step 4): use brand-token **names** not hex values when `.grimoire/brand/tokens.json` applies; prefer existing component names when `.grimoire/docs/components.md` exists, and flag any net-new component ("new component required — confirm before plan stage").
-**Constraints → `.grimoire/docs/constraints.md`.** Every invariant that failed the admission test (it's a security control / NFR / observability / compliance rule, not an actor-observable behavior) becomes one row: **assertion · rationale · how-verified · links**. The assertion is a flat statement ("Log output never contains PII or secrets"), not Given/When/Then. `how-verified` names the test that proves it (a `unit-invariant` the plan stage will create) — never a Gherkin scenario. If it stems from a decision, link the MADR; don't restate it. Create the file from `templates/constraints.md` if absent.
-**Decisions → `.grimoire/decisions/NNNN-*.md`.** Project each Decisions-ledger entry, applying the **novelty gate**: a MADR is for a decision with a real, project-specific trade-off between viable alternatives — not for industry-default tooling picks or ecosystem-forced conventions. Ask: *would a competent engineer on this stack make a different choice, and need our reasoning to understand ours?* If no, skip it. Obvious tooling/convention picks fold into the existing `Tooling and convention baseline` ADR (one line: choice → why), not a new sequential record. Genuine trade-offs get the next sequential number, status `proposed` (`grimoire-apply` flips to `accepted` at finalize), using `.grimoire/decisions/template.md`.
-**Data changes → `.grimoire/changes/<change-id>/data.yml`.** If the change adds/modifies/removes data models, fields, indexes, or external API integrations, write `data.yml` (same YAML shape as `schema.yml`, only what's changing, `action:` on each entry):
-```yaml
-# Proposed data changes for: add-user-profiles
-users:
-  action: modify
-  source: src/models/user.py
-  fields:
-    avatar_url: { action: add, type: varchar, nullable: true }
-    legacy_name: { action: remove }
-profiles:
-  action: add
-  type: collection
-  fields:
-    user_id: { type: objectId, ref: users }
-    bio: { type: string, max_length: 500 }
-github_api:
-  action: add
-  type: external_api
-  provider: GitHub
-  schema_ref: https://docs.github.com/en/rest
-  client: src/integrations/github.py
-  endpoints:
-    get_user:
-      method: GET
-      path: /users/{username}
-      request:
-        headers: { Authorization: "Bearer {token}" }
-      response:
-        login: { type: string, required: true }
-        avatar_url: { type: string, required: true }
-        name: { type: string, nullable: true }
-      error_response:
-        message: { type: string }
-        status: { type: integer }
-```
+Do NOT hand off to `grimoire-plan` without explicit user approval of the design.
-**Contract documentation is mandatory for external APIs.** Every endpoint must document `request` (what you send), `response` (fields you read, `required: true` for those your code depends on), and `error_response` (the error shape you handle). Downstream skills generate contract tests from this. If you don't know the exact shape, reference `schema_ref` and document the subset your client uses — that subset is the contract. No data impact → skip `data.yml` entirely.
+### 7. Hand off — projection happens in plan
-**Manifest (`manifest.md`).** Generate it from `draft.md` as the durable plan-input glue: `complexity` (just scored), Why + Non-goals, the artifact list (added/modified/removed features, decisions, constraints), and a **Prior Art** section summarizing step 3's research (what was found/evaluated, why adopt/build/hybrid; if building, what's borrowed). **Level 3–4** also carry **Assumptions** (what must be true; mark evidence vs. unvalidated; flag unvalidated ones on the critical path) and a **Pre-Mortem** (2–5 plausible failure modes 6 months out, with mitigations or "accepted"). These come straight from the `draft.md` Decided/Open and Cut sections.
+Draft ends when the design on `draft.md` is agreed. **Projection — turning the design into its
+durable homes (features, constraints, MADRs, `data.yml`, manifest) — is now the first step of
+`grimoire-plan`**, co-located with the planning that consumes those homes. A two-phase draft
+(design *then* project) was one job too many; draft now does one thing — design the change —
+and hands the agreed `draft.md` to plan.
-**Do NOT delete `draft.md`.** Retain it read-only as the agreed reference through plan → … → apply. `grimoire-apply` removes it with the change folder at finalize.
+So draft does **not** write `features/`, `.grimoire/docs/constraints.md`,
+`.grimoire/decisions/`, `data.yml`, or the full `manifest.md`. What it leaves for plan:
-### 8. Validate (at projection)
+- `draft.md` — the agreed design: the Decisions ledger (Y-statements), Decided/Open, Sketches,
+  Constraints, and Cut sections. This is the single source plan projects from.
+- The change folder and the feature branch.
-- `.feature` files have valid Gherkin; every Feature has a user story; every Scenario has at least Given + When + Then.
-- MADR records have valid YAML frontmatter (status, date).
-- Manifest is complete and accurate; `complexity` is set.
-- **Re-run the admission test on every scenario you wrote**: external actor, observable, domain language, survives reimplementation. Any scenario that now fails is slop — move it to `constraints.md` or a MADR.
-- **Principles gate** (`../references/principles.md`): no fact written to two homes (DRY), no second way to do an existing thing (one right way), no reinvented wheel, no artifact created past the stated scope (KISS). Note: `draft.md` co-existing with the homes is **not** a DRY violation — it is the (soon-deleted) source the homes were projected from, not a parallel authority.
+**Exception — trivial changes** (the step-2 triviality gate) skip plan entirely: draft makes
+the change directly and records the minimal `manifest.md` (Why + file list) itself.
 ## Important
 - ONE change at a time. Don't combine unrelated changes.
-- **`draft.md` is the only surface you design on.** Features, constraints, MADRs, and the manifest are **generated from it** at projection — never authored by hand in parallel during design.
+- **Catch the rationalizations.** "Too small to spec", "I'll just assume a reasonable default" are the named excuses in `../references/red-flags.md` (*Skipping the spec*, *Silently filling a gap*). The urge to skip is the signal to do the step — not skip it.
+- **`draft.md` is the only surface you design on.** Features, constraints, MADRs, and the manifest are **generated from it** at projection (`grimoire-plan`'s first step) — never authored by hand in parallel during design, and not written by draft at all.
 - **Features describe actor-observable behavior, not implementation, and not invariants.** No external actor, not observable, or names a library/log-level/table → it's a constraint (→ `constraints.md`) or a decision (→ MADR). An internal protocol or service-to-service contract (your own components talking) is a contract test, not a `.feature` — "external" means outside your system, not outside one module. These two — invariants and internal protocols — are the top sources of feature-file slop.
 - **One fact, one home** (`../references/principles.md`). A capability lives in one `.feature`; a control in one constraint row; a decision in one MADR. Never the same fact in two homes (at rest).
-- Decisions live in **one inline ledger** in `draft.md` while designing; they project to separate MADRs only at step 7. This is how coupled decisions stay legible during the thinking.
-- Artifacts (post-projection) are edited **live on the branch** — never copied into `.grimoire/changes/`. `git diff` is the staging area.
+- Decisions live in **one inline ledger** in `draft.md` while designing (Y-statements); they project to separate MADRs only at projection, in `grimoire-plan`. This is how coupled decisions stay legible during the thinking.
+- Projected artifacts are edited **live on the branch** — never copied into `.grimoire/changes/`. `git diff` is the staging area.
 - **Figma access token is read from `FIGMA_ACCESS_TOKEN` by the MCP server.** Never log it, never write it to config or any artifact (`manifest.md`, `consult.md`, `figma-snapshot.json`, `draft.md`). The MCP handles auth transparently.
 ## Done
-When the user approves the design and it has been projected, the workflow is complete.
-`draft.md` remains as reference until `grimoire-apply` clears it. Present the change
-directory path and suggest next steps:
-- `grimoire-plan` to generate implementation tasks
+When the user approves the design on `draft.md`, the workflow is complete — draft does not
+project. Present the change directory path and suggest next steps:
+- `grimoire-plan` — projects the design into features/constraints/MADRs/manifest, then generates tasks
 - Or further iteration on `draft.md` if the user wants changes

package/skills/grimoire-plan/SKILL.md CHANGED Viewed

@@ -1,6 +1,6 @@
 ---
 name: grimoire-plan
-description: Derive implementation tasks from approved Gherkin features and MADR decisions. Use when features are approved and ready for task breakdown.
+description: Project an agreed draft.md into its homes (features, constraints, decisions, manifest), then derive implementation tasks from them. Use after the design is approved in grimoire-draft.
 compatibility: Designed for Claude Code (or similar products)
 metadata:
   author: kiwi-data
@@ -9,7 +9,7 @@ metadata:
 # grimoire-plan
-Derive implementation tasks from approved Gherkin features and MADR decisions. The output must be detailed enough that any LLM can execute the tasks without further planning.
+Plan opens by **projecting** the agreed `draft.md` into its durable homes (features, constraints, decisions, `data.yml`, manifest), then derives implementation tasks from them. The output must be detailed enough that any LLM can execute the tasks without further planning.
 ## Triggers
 - User has approved a grimoire draft and wants to plan implementation
@@ -22,9 +22,7 @@ Derive implementation tasks from approved Gherkin features and MADR decisions. T
 - User wants to review the design → `grimoire-review` (after plan, before apply)
 ## Prerequisites
-- A change exists in `.grimoire/changes/<change-id>/` with:
-  - `manifest.md` (approved)
-  - At least one `.feature` file or decision record
+- A change exists in `.grimoire/changes/<change-id>/` with an agreed `draft.md` — the user has approved the design in `grimoire-draft`. Plan's first step **projects** that design into its homes (features, constraints, MADRs, `data.yml`, manifest); those do not need to exist yet.
 ## Workflow
@@ -56,14 +54,104 @@ The plan implements what's approved. It does not expand scope to hit a checklist
 These are gates, not aspirations — a task that adds a duplicate home or a reinvented wheel is rejected, not refined.
-### 1. Select Change
-- List active changes in `.grimoire/changes/`
-- If multiple, ask user which one to plan
-- If only one, confirm it
+### 1. Project the Design from draft.md
+**Select the change.** List active changes in `.grimoire/changes/`; if multiple, ask which to plan; if one, confirm it. Read its `draft.md` — the agreed design is the source of truth for this step. (If a change arrives already projected and has no `draft.md` — e.g. from `grimoire-refactor`, which authors its own register and artifacts — there is nothing to project: skip to step 2.)
+**Project `draft.md` into its durable homes.** This is where the **fine routing** happens (each fact → its one home) and where the admission test + principles gate run. Artifacts are written **live in their real locations** on the branch — `git diff` is the staging area; there is no copy-into-the-change-folder. (Projection used to close `grimoire-draft`; it now opens plan, co-located with the planning that consumes these homes.)
+First, **score the complexity level (1–4)** now that the design is settled, and write it to `manifest.md` frontmatter as `complexity: <1-4>` (use the level table in `grimoire-draft` step 2 as the rubric). Then project each kind of fact:
+**Behaviors → `features/*.feature`.** For each behavioral fact in the design:
+*The feature-file admission test* — a scenario may be written **only if it passes all four gates**; if it fails any, it is a constraint or a decision, not a feature:
+1. **External actor, outside the system boundary** — an end user, an operator, or a *third-party* system integrating with you does the thing. "External" means outside *your* system, not outside one module: a sibling service, an internal queue consumer, or another module in the same repo calling this one is **internal**, even though it's a separate process. Internal actor → contract test or constraint/decision, never a `.feature`.
+2. **Observable** — the actor sees the outcome without reading code or logs. "<200ms", "logs scrubbed of PII" → fails → constraint.
+3. **Domain language** — domain nouns, zero implementation detail. Names a library/log-level/table (`loguru`, `INFO`, `bcrypt`, `users` table) → fails → leaking implementation.
+4. **Survives reimplementation** — rewrite the internals from scratch; would the scenario still read the same? If it would change, it's pinned to implementation → not a feature.
+**Internal protocols and service-to-service contracts are NOT features.** A change to how two of your own components talk — an internal RPC/queue/event shape, a module API, a wire format between your services — is a *contract*, verified by a contract/integration test (`verify: unit-invariant`), not by Gherkin. It fails gate 1: there is no external actor, only your own code on both ends. If a third-party integrates against the protocol it's external and may be a feature; two of your own services is internal. This is the second-biggest source of feature-file slop after invariants.
+Common slop this catches: invariants (→ `constraints.md`) — "PII is scrubbed from logs", "all endpoints require auth", "responses are gzipped", "errors logged with a trace id"; internal protocols (→ contract test) — "service A publishes an OrderPlaced event B consumes", "the worker accepts a job payload with these fields", "module X returns this struct to module Y".
+*Extend vs. new — default is always extend; new files are the exception and require justification.* List existing feature files first (**required, not skippable** — do not write any scenario until this triage table is complete):
+```
+Existing feature files:
+  features/auth/login.feature         — "User Login"
+  features/billing/invoices.feature   — "Invoice Management"
+```
+For each scenario, decide extend-or-new and show it:
+```
+  "Admin resets a user's password"  → extend features/auth/login.feature (same actor domain: auth)
+  "User configures SSO provider"    → NEW (no existing file owns SSO configuration)
+```
+Signals to extend: same actor, same domain object, same entry point, same HTTP resource or screen. Signals genuinely new: new actor type with no existing file, entirely new domain object, or the existing Feature title would need "and" to cover both. If unsure, extend. A new file requires stating which files were considered and why none fit.
+Then write Gherkin (Feature title + user story; Background for shared preconditions; one scenario per behavior; Given/When/Then describing WHAT, never HOW). Apply security tags per `../references/security-compliance.md` (only when there's a security surface; compliance tags only when `project.compliance` is set). When design input grounded the scenarios (the change's `designs/` folder): use brand-token **names** not hex values when `.grimoire/brand/tokens.json` applies; prefer existing component names when `.grimoire/docs/components.md` exists, and flag any net-new component ("new component required — confirm before generating tasks").
+**Constraints → `.grimoire/docs/constraints.md`.** Every invariant that failed the admission test (it's a security control / NFR / observability / compliance rule, not an actor-observable behavior) becomes one row: **assertion · rationale · how-verified · links**. The assertion is a flat statement ("Log output never contains PII or secrets"), not Given/When/Then. `how-verified` names the test that proves it (a `unit-invariant` this plan creates) — never a Gherkin scenario. If it stems from a decision, link the MADR; don't restate it. Create the file from `templates/constraints.md` if absent.
+**Decisions → `.grimoire/decisions/NNNN-*.md`.** Project each Decisions-ledger entry (a Y-statement in `draft.md`), applying the **novelty gate**: a MADR is for a decision with a real, project-specific trade-off between viable alternatives — not for industry-default tooling picks or ecosystem-forced conventions. Ask: *would a competent engineer on this stack make a different choice, and need our reasoning to understand ours?* If no, skip it. Obvious tooling/convention picks fold into the existing `Tooling and convention baseline` ADR (one line: choice → why), not a new sequential record. Genuine trade-offs get the next sequential number, status `proposed` (`grimoire-apply` flips to `accepted` at finalize), using `.grimoire/decisions/template.md` — the Y-statement's context clause becomes the ADR's *Context and Problem Statement*.
+**Data changes → `.grimoire/changes/<change-id>/data.yml`.** If the change adds/modifies/removes data models, fields, indexes, or external API integrations, write `data.yml` (same YAML shape as `schema.yml`, only what's changing, `action:` on each entry):
+```yaml
+# Proposed data changes for: add-user-profiles
+users:
+  action: modify
+  source: src/models/user.py
+  fields:
+    avatar_url: { action: add, type: varchar, nullable: true }
+    legacy_name: { action: remove }
+profiles:
+  action: add
+  type: collection
+  fields:
+    user_id: { type: objectId, ref: users }
+    bio: { type: string, max_length: 500 }
+github_api:
+  action: add
+  type: external_api
+  provider: GitHub
+  schema_ref: https://docs.github.com/en/rest
+  client: src/integrations/github.py
+  endpoints:
+    get_user:
+      method: GET
+      path: /users/{username}
+      request:
+        headers: { Authorization: "Bearer {token}" }
+      response:
+        login: { type: string, required: true }
+        avatar_url: { type: string, required: true }
+        name: { type: string, nullable: true }
+      error_response:
+        message: { type: string }
+        status: { type: integer }
+```
+**Contract documentation is mandatory for external APIs.** Every endpoint must document `request` (what you send), `response` (fields you read, `required: true` for those your code depends on), and `error_response` (the error shape you handle). The task-generation step below turns this into contract tests. If you don't know the exact shape, reference `schema_ref` and document the subset your client uses — that subset is the contract. No data impact → skip `data.yml` entirely.
+**Manifest (`manifest.md`).** Generate it from `draft.md` as the durable plan glue: `complexity` (just scored), Why + Non-goals, the artifact list (added/modified/removed features, decisions, constraints), and a **Prior Art** section summarizing the build-vs-buy research captured in `draft.md` (what was found/evaluated, why adopt/build/hybrid; if building, what's borrowed). **Level 3–4** also carry **Assumptions** (what must be true; mark evidence vs. unvalidated; flag unvalidated ones on the critical path) and a **Pre-Mortem** (2–5 plausible failure modes 6 months out, with mitigations or "accepted"). These come straight from the `draft.md` Decided/Open and Cut sections.
+**Do NOT delete `draft.md`.** Retain it read-only as the agreed reference through the rest of plan → apply. `grimoire-apply` removes it with the change folder at finalize.
+**Validate the projection** before moving on:
+- `.feature` files have valid Gherkin; every Feature has a user story; every Scenario has at least Given + When + Then.
+- MADR records have valid YAML frontmatter (status, date).
+- Manifest is complete and accurate; `complexity` is set.
+- **Re-run the admission test on every scenario you wrote**: external actor, observable, domain language, survives reimplementation. Any scenario that now fails is slop — move it to `constraints.md` or a MADR.
+- **Principles gate** (`../references/principles.md`): no fact written to two homes (DRY), no second way to do an existing thing (one right way), no reinvented wheel, no artifact created past the stated scope (KISS). `draft.md` co-existing with the homes is **not** a DRY violation — it is the (soon-deleted) source the homes were projected from.
+The homes now exist; the rest of plan reads and breaks them into tasks.
 ### 2. Read All Artifacts
-Read the change's artifacts following `../references/artifact-map.md` — it defines what each file is, the grimoire-docs-first / graph-for-structure discipline, and the staleness gate. Plan-specific reading on top of that:
+Read the change's artifacts following `../references/artifact-map.md` — it defines what each file is, the grimoire-docs-first / graph-for-structure discipline, the **reading-altitude** rule (read contracts and signatures, not internal source or unit tests), and the staleness gate. Plan-specific reading on top of that:
 - `.grimoire/docs/constraints.md` — any constraints (security/NFR/observability) this change touches. These produce `unit-invariant` tasks, not scenarios.
 - The current baseline (`features/`, `.grimoire/decisions/`) via `git diff main` — exactly what this change adds vs. what already existed.
@@ -120,6 +208,8 @@ Level 1-2 changes with minor gaps may proceed; level 3-4 with multiple gaps shou
 ### 4. Generate Tasks
 Create `.grimoire/changes/<change-id>/tasks.md`. **Every task produces both production code AND a test — but the test level matches the artifact the task derives from.** Tasks are structured as pairs: the failing test first, then the production code.
+**Order tasks by the technical spine** (`../references/design-spine.md`): dependencies → data/schema → API/contract → business logic → UI by component → verification, **test-first within each layer**. This is the same order the change was designed on, so the plan's shape mirrors the design and stays predictable across changes.
 **Tag every implementation task with a `verify:` level** — this tells `grimoire-apply` which test vehicle to use. Match the artifact:
 | Task derives from | `verify:` | Test vehicle |
@@ -341,7 +431,7 @@ Before presenting to the user, verify the plan:
 - [ ] Every test task describes what to assert (no "write a test")
 - [ ] Every implementation task describes what to create/modify (no "add the code")
 - [ ] The verification section has the exact commands to run
-- [ ] Tasks are ordered: shared steps → test → production code → verification
+- [ ] Tasks follow the technical-spine order (`../references/design-spine.md`): dependencies → data/schema → API/contract → logic → UI → verification, test-first within each layer
 - [ ] No task requires the LLM to make architectural decisions — those should already be in the ADR
 - [ ] **Principles gate** (`../references/principles.md`): no task introduces a duplicate home for an existing fact (DRY), a second way to do an existing thing (one right way), a reinvented wheel where a tool/library/proven pattern exists (don't reinvent), or an abstraction/dependency justified only by a hypothetical (KISS). Any that does has a stated reason.
@@ -367,10 +457,10 @@ Check `.grimoire/config.yaml` for the configured agents:
 - If the user has configured separate thinking/coding agents, note this in the tasks.md header so the apply stage knows which agent to use
 ## Important
-- **Specificity is the whole point.** A vague plan is worse than no plan — it gives false confidence and the LLM will re-plan anyway. Every task must be executable without thinking.
+- **Specificity is the whole point.** A vague plan is worse than no plan — it gives false confidence and the LLM will re-plan anyway. Every task must be executable without thinking. "Implement the feature" is not a task — it's the *Skipping the plan / vague tasks* rationalization in `../references/red-flags.md`.
 - Tasks should be small and specific — one logical unit of work each
 - Every task traces back to a scenario or decision
-- Order matters: dependencies first, verification last
+- Order matters: tasks follow the technical-spine order (`../references/design-spine.md`); verification last
 - Don't generate tasks for things that already work (check the baseline)
 - Read the actual codebase before writing tasks. Reference real file paths, real patterns, real conventions. Don't guess.

package/skills/grimoire-review/SKILL.md CHANGED Viewed

@@ -177,9 +177,11 @@ Recommendation: Fix blockers, then proceed to apply.
 ## Important
 - This is a design review, not a code review. Focus on the specifications and plan, not hypothetical implementation details.
 - Be direct. Don't pad findings with praise or soften blockers. The goal is to catch problems before code is written, when they're cheap to fix.
+- **Ground every finding — ladder it.** Use the interview techniques in `../references/elicitation-personas.md` pointed at the design: ladder a decision *up* to the goal it serves (or expose it as Tunnel Vision), ladder a finding *down* to the concrete behavior that breaks, 5-Whys it to root cause. Also check each decision walks the spine in context (`../references/design-spine.md`) — names its layer, validates the prior. A laddered finding earns its severity; a bare verdict doesn't.
 - A blocker means "if we code this as-is, we'll have to come back and redo work." A suggestion means "this would improve the design but isn't blocking."
 - Keep each persona's review focused and short. Three bullet points that matter are better than ten that don't.
 - If the change is trivial (e.g., rename a field, fix a typo in a feature), say so and don't manufacture issues.
+- **Don't self-exempt by feel.** "It looks fine" / "I reviewed as I wrote it" are the *Skipping review* rationalization in `../references/red-flags.md`. Trivial-exempt is the skill's call, not a vibe.
 - All persona evaluation criteria, the materiality gate, the briefing structure, and the complexity-depth table live in `../references/review-personas.md`. Don't duplicate them here — read that file when running a persona.
 ## Done

package/skills/grimoire-verify/SKILL.md CHANGED Viewed

@@ -118,9 +118,27 @@ For each step definition:
    - **[warning]** `test_auth.py:58` — step "Then user should exist" only asserts `is not None` — check the actual user properties
    ```
+**Regression coverage:** When verifying a bug fix, confirm the fix ships with a regression test **named after the bug** (see `grimoire-bug`). A bug fix with no test that goes red-without-the-fix and pins the defect → WARNING — the bug can silently return.
 If `grimoire test-quality` CLI command is available, suggest running it for a comprehensive analysis.
 To run tests directly: use `config.tools.bdd_test` for BDD and `config.tools.unit_test` for unit tests.
+### 3.E Behavioral Verification *(optional — user-facing changes only)*
+Sections 3.B–3.D verify statically (code exists, asserts, follows decisions) and run the configured suites. They do **not** drive the running app. When the change is user-facing and the app can be run, add a behavioral pass; otherwise skip and say so. This mode adds **no mandatory dependency** — if there's no way to drive the app, mark it INCONCLUSIVE and rely on 3.A–3.D.
+**Read-only by default.** Read-only navigation/inspection needs no opt-in. Any state-changing action requires explicit user opt-in **and** a non-production target (local/staging URL, seeded creds). Never run mutations against production.
+**Verdict.** Every behavioral pass ends in exactly one:
+- **SHIP** — behavior matches the spec; no material issues.
+- **SHIP WITH FIXES** — works, with the non-blocking issues listed.
+- **DO NOT SHIP** — a scenario's promised outcome does not hold.
+- **INCONCLUSIVE** — could not verify (no baseline, app wouldn't run, tooling absent).
+**No baseline ⇒ INCONCLUSIVE, never a silent PASS.** Same rule as §3.C2: without a reference state you cannot claim behavior is correct. Report INCONCLUSIVE and fall back to static verification — do not dress up "I couldn't check" as a pass.
+**Click-path final-state check.** For each touchpoint the change affects, build a side-effect map — `action → {state it sets, state it resets}` — then trace the sequence and ask: *is the FINAL state what the label/spec promises?* This catches the silent-undo class (action B resets what action A just set) that static reading and single-assert tests miss.
 ### 4. Security Compliance Verification
 Verify that security guidance from plan and review stages was followed in implementation. Read `../references/security-compliance.md` for the full checklist.
@@ -222,6 +240,7 @@ Produce a structured report:
 - Scenarios verified: X
 - Decisions verified: X
 - Security checks: X passed, X failed
+- Behavioral verdict: <SHIP | SHIP WITH FIXES | DO NOT SHIP | INCONCLUSIVE | n/a (static only)>
 - Issues found: X critical, X warnings, X suggestions
 ## Critical Issues
@@ -255,6 +274,7 @@ Based on the report:
 ## Important
 - Verify is read-only. Do NOT fix issues — only report them. The user decides what to do.
+- **"Should pass" is not evidence.** Declaring done without running is the *Declaring done without verifying* rationalization in `../references/red-flags.md`. Observe state, don't predict it.
 - Be specific: reference file paths and line numbers for every issue.
 - A scenario without a step definition is always CRITICAL — the spec is not tested.
 - A step definition with no assertions is always CRITICAL — it's a false positive.

package/skills/references/artifact-map.md CHANGED Viewed

@@ -8,7 +8,7 @@ Loaded by skills that read a change's specs before acting (`grimoire-plan`, `gri
 Per-change (under `.grimoire/changes/<change-id>/`):
-- **`draft.md`** — the living design doc the change was designed on (diagram/sketch, decision ledger, pseudo-code, Decided/Open ledger). The single source the other artifacts were **projected** from at the end of `grimoire-draft`. Ephemeral: retained read-only as the agreed-design reference through the pipeline, deleted when `grimoire-apply` clears the change folder. Read it for the *intent and rationale* behind the projected artifacts; the features/constraints/decisions remain the authoritative homes.
+- **`draft.md`** — the living design doc the change was designed on (diagram/sketch, decision ledger of Y-statements, pseudo-code, Decided/Open ledger). The single source the other artifacts are **projected** from at the start of `grimoire-plan`. Ephemeral: retained read-only as the agreed-design reference through the pipeline, deleted when `grimoire-apply` clears the change folder. Read it for the *intent and rationale* behind the projected artifacts; the features/constraints/decisions remain the authoritative homes.
 - **`manifest.md`** — change summary, complexity level, and the Why. Level 3-4 also carry Assumptions, Pre-Mortem, and **Prior Art** (the build-vs-buy rationale). Generated from `draft.md` at projection.
 - **`features/*.feature`** — behavioral specifications. Edited live in `features/` on the branch.
 - **decision records** — architectural choices for this change, edited live in `.grimoire/decisions/`, including Cost of Ownership sections.
@@ -35,6 +35,18 @@ Project-wide (under `.grimoire/`):
 ---
+## Reading altitude — design reads contracts, debugging reads internals
+When you read code during **design** (`grimoire-draft`, `grimoire-design`, `grimoire-plan`), read at the **published-interface altitude** — what a caller needs to integrate, not how the callee works inside:
+- **Third-party library or service** — its public API, types, and docs. Not its source, and not its tests. The contract is what you design against; the internals are the maintainer's concern.
+- **Your own system** — the touched area's exported symbols, API endpoints, and data schema, plus the relevant feature files and `constraints.md`. Not the whole backend's source, and not its unit tests.
+- **Prefer the graph for structure without bodies.** `search_graph` / `get_architecture` give signatures, callers, and call edges — the *shape* of the interface — without spending context on implementation bodies. That is the altitude design needs.
+**Reading full source bodies and unit tests is a *debugging* activity** — justified when a behavior is wrong and you need root cause (`grimoire-bug`), not when you need to know how an interface is used. In design the question is "what is the contract?", and the contract lives in signatures, schemas, and specs — not in line-by-line implementation. Exhaustively reading internals at design time burns context and rarely improves the design. This is the rule above, sharpened: even when you *do* read source, read the seam, not the guts.
+---
 ## Staleness gate
 For each area doc you load, compare its `last_updated` against `git log -1 --format=%ci <directory>`. If the doc is older than the most recent commit to its directory, it's stale — its paths, utility names, and patterns may be wrong.

package/skills/references/design-spine.md ADDED Viewed

@@ -0,0 +1,166 @@
+# Design Spine
+The ordered path a design walks — in the interview (`grimoire-draft`, `grimoire-design`), in
+evaluation (`grimoire-review`), and in task order (`grimoire-plan`). One spine, walked the
+same way everywhere, so the structure is predictable: the user learns where each kind of
+decision happens.
+This is the single home for *how* a design proceeds. Skills cite the relevant section; none
+restate it (DRY). It pairs with `principles.md` (what every artifact must satisfy) and
+`red-flags.md` (the excuses to skip a stage) — this file is the *sequence* the work follows.
+The methods below are named on purpose. They are established practice — lean on the name
+(Working Backwards, inside-out layering, stepwise refinement, Y-statement) so the work
+inherits a known, well-understood discipline instead of an ad-hoc one.
+---
+## Bias: complete, not over-built
+The spine and its personas are a **surfacing** tool — they raise candidates (steps,
+constraints, "what happens if" cases). Surfacing tools bias toward *more*: every question
+invites a handler, every constraint feels like rigor. Left ungoverned, walking the spine
+*manufactures* the over-engineering YAGNI warns against. So the walk has one governing rule:
+**Surface broadly, build narrowly.** The spine raises the candidate; `principles.md` §4
+(KISS/YAGNI) decides whether it earns a place. They are a pair — never run the surfacing
+without the prune.
+- **Every surfaced item gets a disposition, and the default is *don't build it*.** Each
+  candidate resolves to one of: *build* (a present, concrete need requires it), *won't build*
+  (record as a one-line non-goal), or *defer*. When unsure, it's a non-goal. Recording "we
+  considered X and chose not to" is completeness; building X "just in case" is not.
+- **Lean simple when you must lean.** Under-building is cheap to add later; over-building is
+  expensive to remove. If the call is genuinely balanced, choose the simpler, less-complete
+  option and say so — complexity layers in cleanly later, but rarely comes back out.
+- **"Complete" means the stated outcome plus the failures whose cost the user would actually
+  feel — not every conceivable case.** Completeness is measured against the outcome, not
+  against an exhaustive enumeration of edge cases.
+- **Constraints are surface area, not virtue.** Each must earn its place with a present,
+  concrete *why* (a downstream need, a real corruption risk). A constraint justified only by
+  "might need to" is YAGNI — cut it. This is why the ceremony gate counts constraints as a
+  *cost* signal, not a rigor score.
+---
+## Pick the spine
+Two spines. Pick by what the change touches; **once picked, always walk it in order** (next
+section). A mixed change uses the technical spine and expands its UI layer with the UX spine.
+| Change | Spine | Home skill |
+|--------|-------|-----------|
+| User-facing flow / screen / UI | **UX-workflow spine** | `grimoire-design` |
+| Behavioral / technical (API, data, logic) | **technical spine** | `grimoire-draft` |
+| Mixed (UI + backend) | **technical**, UI layer via UX | draft + design for the UI layer |
+### UX-workflow spine — traversal direction
+Walk the user's process in one of two directions. State which you're using.
+- **Backward — "Working Backwards"** (Cooper goal-directed design; Amazon PR/FAQ). Start at
+  the **goal / end-state** (a JTBD outcome: "when *situation*, I want *motivation*, so I can
+  *outcome*"). At each step ask **"what must be true for the user to reach here?"** Best when
+  the goal is clear but the path is contested — it surfaces unknown prerequisites and prunes
+  steps that serve no downstream need.
+- **Forward — "forward chaining"** (skills-forward). Start from **what the user reasonably
+  knows or has at the start** and step toward the goal. Best when the starting state is well
+  defined but the goal is emergent, or when documenting an existing happy path.
+**Reconcile (the discipline):** define the end-state, chain **backward** to the required
+prerequisites, then **validate forward** by walking a real user from their actual starting
+knowledge (the Mom Test — see `elicitation-personas.md`). Where the two traversals don't meet
+are the missing or assumed steps — capture each as an Open row.
+### Technical spine — layer order
+Design **process/constraints → data model → API/contract → UI, component by component.** This
+is **inside-out** layering (DDD layered architecture / Clean Architecture dependency rule):
+the domain and its rules sit at the core; the API and UI are outer detail that depend inward,
+never the reverse.
+| Layer | What you settle here |
+|-------|----------------------|
+| **1 · Process / constraints** | The invariants and limits that must always hold — business rules, security controls, NFRs, what must *never* happen (data-corruption guards). These bound everything downstream. |
+| **2 · Data model** | Entities, fields, relationships, and each field's constraints (required / unique / nullable / range). Every constraint states its *why* — the downstream need or corruption risk that justifies it (→ a `constraints.md` row). |
+| **3 · API / contract** | The interface other code or clients use. Design it as a deliberate contract that **hides** the data model — this reconciles "data-first" with "API-first": the API is a versioned abstraction over the schema, not a mirror of it. |
+| **4 · UI, by component** | The surface, one component at a time. For a user-facing surface, expand this layer with the UX-workflow spine above. |
+**Each layer constrains the next** (*stepwise refinement* — every decision narrows the
+solution space for the layers below). **Building a layer validates the prior** (*consumer-
+driven contracts* / model–implementation feedback): designing the API tests the data model;
+designing the UI tests the API. When a lower layer can't satisfy an upper one, the upper one
+was wrong — go back and fix it, don't patch around it downstream.
+## Walk it — always, in order
+Whatever spine is chosen, **traverse its layers/steps in order; do not jump around.** At each
+layer:
+1. **Elicit** with that layer's lens — personas are the *who* (`elicitation-personas.md`),
+   techniques are the *how* (laddering / Mom Test / 5 Whys, same file).
+2. **Record decisions** for the layer in the `draft.md` ledger as Y-statements (below).
+3. **Validate the prior** layer against what you just learned — restate the check explicitly
+   ("this required field traces to *downstream need X*"; "this data model satisfies process
+   constraint *C*"). A failed validation sends you back up, not forward.
+An **empty layer is fine** — say so in one line and skip (a change with no data impact skips
+layer 2). Skipping is a stated call, never a silent omission.
+## Ceremony gate — scale to the constraints, not first impressions
+Full layer-by-layer ceremony is for changes that earn it; trivial ones don't.
+- **Default (lightweight):** walk the spine, but elicit only what the change needs and skip
+  empty layers in one line. Most level-1–2 changes finish here.
+- **Escalate to full ceremony when the change introduces more than 2 constraints** (data
+  invariants, security controls, cross-layer dependencies). The **3rd** new constraint is the
+  signal that real complexity is hiding — from there, walk every layer formally, record a
+  decision per layer, and (level 3–4) add the manifest Pre-Mortem.
+Constraint count is a measurable trigger that complements the complexity level: complexity is
+an *output* of design, and the count is that output surfacing mid-interview. A nominally
+"simple" change that trips the gate is not simple — let the count, not the first impression,
+set the depth.
+## Decisions — Y-statement, in context
+Every decision in the `draft.md` ledger is recorded as a **Y-statement**, so its context is
+forced into the record. This defeats the *Tunnel Vision* anti-pattern — a decision that reads
+well in isolation but is wrong for the surrounding workflow:
+> **D*n*:** In the context of *spine layer / use-case*, facing *concern / force*, we chose
+> *option* over *alternatives*, to achieve *quality*, accepting *downside* — **because
+> *why*.**
+The **because** is mandatory. The **context** clause ties the decision to the spine layer it
+emerged from, so the user evaluates it *in the situation it serves*, not as an abstract claim
+("sounds good" decided in a vacuum is exactly how ADRs go wrong). Coupled decisions
+cross-reference by ID (D7 cites D3). At projection (`grimoire-plan`'s first step) each ledger entry
+maps cleanly to a MADR: the context clause becomes the ADR's *Context and Problem Statement*,
+the chosen/neglected options its *Considered Options / Decision Outcome*, the accepted
+downside its *Consequences*. The novelty gate still applies — industry-default picks fold into
+the baseline ADR, not a new record.
+## Plan task order follows the technical spine
+`grimoire-plan` emits tasks in the **same layer order** the change was designed on, so the
+plan's shape mirrors the design:
+```
+dependencies → data/schema → API/contract → business logic → UI by component → verification
+```
+Within each layer, **test first** — the failing test for that layer-pair before its code.
+This is the per-layer red-green unit, not a single global acceptance test up front. The order
+is fixed on purpose: over time the user learns that schema changes always land before API
+changes, contract tests before clients, UI last.
+---
+## How skills cite this
+- **design** — UX-workflow spine (traversal direction) at the user-flow step.
+- **draft** — pick + walk the spine in the design loop; Y-statement ledger; ceremony gate.
+- **plan** — task order = technical-spine order; test-first per layer.
+- **review / verify** — check that decisions name their context and each layer validates the prior.
+Each skill links its own section; none restate the spine. This is the one home (DRY).

package/skills/references/elicitation-personas.md CHANGED Viewed

@@ -6,7 +6,7 @@ Persona-driven questions to surface requirements. Used by draft (gather requirem
 - **In draft**: Ask these questions to gather requirements before drafting.
 - **In plan**: Use as a completeness checklist — flag gaps in the specs, don't ask the user.
-- **In review**: Use as evaluation criteria — check if the design addresses each concern.
+- **In review**: Use as evaluation criteria — check if the design addresses each concern. Apply the interview techniques below to *interrogate the design itself* (not the user) so a finding is reasoned, not asserted.
 ## Depth by Complexity Level
@@ -19,6 +19,27 @@ Persona-driven questions to surface requirements. Used by draft (gather requirem
 Don't ask every question — only ask questions whose answers aren't already clear.
+**Questions surface candidates, not requirements.** A persona question raises something to *decide*, and the default decision is "out of scope" — record it as a one-line non-goal and move on. Don't convert every answered question into built handling or a new constraint; that is how eliciting turns into over-engineering. Promote to a scenario or constraint only what the stated outcome actually needs — everything else is a non-goal. See the simplicity bias in `design-spine.md`.
+## Interview Techniques — How to Ask
+The personas are the *who* (which concerns to probe); these are the *how* (question shapes that surface a real answer). Reach for them especially when walking a spine in `design-spine.md`.
+- **Laddering** — the bidirectional workhorse. Ladder **up** ("why is that important to you?") to climb from a feature to the goal it serves; ladder **down** ("how would you do that?" / "what would that look like?") to descend from a goal to a concrete step. Up drives the *backward* UX traversal; down drives the *forward* one.
+- **The Mom Test** — anchor every question in **past behavior, not hypotheticals**: "When was the last time you hit this? Walk me through exactly what you did." Beats "would you use…?" / "is this a good idea?" — those invite flattery, not facts. The best way to reconstruct the *real* workflow instead of the imagined one.
+- **Critical Incident Technique** — ask for a **specific real incident** in detail ("tell me about the last time X went wrong"). People recall concrete incidents far more accurately than general process, so this surfaces edge cases and the actual happy path.
+- **5 Whys** — repeat "why" to climb from a symptom or step to its root motivation. Use it to find the end-state a backward traversal should start from.
+Don't run all four — pick the one that fits: laddering to trace a workflow up or down the spine, Mom Test / CIT to reconstruct what really happens, 5 Whys to find the underlying goal.
+**In review, point these inward at the design.** They make a critique credible — reasoned rather than asserted:
+- **Ladder a decision up** — does it trace to a real goal, or is it *Tunnel Vision* (reads well in isolation, wrong for the surrounding workflow)? A decision whose ladder-up dead-ends serves no goal.
+- **Ladder a finding down** — "how does this concretely fail?" A finding you can ladder to a specific broken behavior is real; one you can't is a hunch — say so or drop it.
+- **5-Whys a finding to root cause** before grading it, so the blocker names the underlying defect, not a symptom.
+A laddered finding ("fails because → because → because") earns its severity. A bare verdict ("[blocker] this is wrong") does not — and the materiality gate in `review-personas.md` rejects it.
 ## Outcome & Scope — Always Ask First
 Before diving into persona questions, establish the outcome and boundaries. These two questions prevent the most common spec failures — building the wrong thing and building too much:
@@ -48,6 +69,7 @@ Ask when: the change has user-facing behavior.
 Ask when: the change introduces new components, services, dependencies, or data flows.
+- **Follow the technical spine** (`design-spine.md`): design inside-out — process/constraints → data model → API contract → UI, each layer constraining the next. Flag anything that inverts it: a UI shape dictating the data model, or an API that mirrors the schema instead of abstracting over it.
 - What's the deployment context? Does this run in the same service or cross service boundaries?
 - What existing components does this touch? Are there shared modules, APIs, or databases involved?
 - Are there concurrency concerns? Multiple users or processes acting on the same data?
@@ -75,6 +97,7 @@ Ask when: the change involves authentication, authorization, user input, sensiti
 Ask when: the change has complex behavior, multiple paths, or integration points.
+- **Walk the workflow step by step and ask *what happens if* at each step** — the user abandons here, the input is invalid, the network drops, the action runs twice, they arrive without the previous step's precondition? This surfaces candidates, not requirements: **most answers are "nothing — out of scope," recorded as a one-line non-goal.** Promote a "what happens if" to a scenario or constraint only when its failure carries a cost the user would actually feel for *this* change (the simplicity bias in `design-spine.md`).
 - What are the boundary values? Min/max lengths, zero vs. one, empty collections, null states?
 - What are the timing edge cases? Concurrent edits, race conditions, timeout during processing?
 - What external dependencies could fail? How should the system behave when they do — retry, fallback, error?
@@ -85,10 +108,11 @@ Ask when: the change has complex behavior, multiple paths, or integration points
 Ask when: the change creates, modifies, or removes data models, or integrates with external APIs.
 - What data entities are involved? What are the relationships between them?
-- What are the field constraints? Required, unique, nullable, max length, valid ranges, enums?
+- What are the field constraints — required, unique, nullable, max length, valid ranges, enums? For each, **what is the *why*** — the downstream need it serves or the corruption risk it prevents ("required because the billing job reads it", "unique to stop duplicate ledgers")? Record that justification as the rationale on a `constraints.md` row, not as an unstated assumption.
 - How does this data grow? Is there a retention policy, archival strategy, or cleanup needed?
 - Is there existing data that needs migrating? Can the migration run live or does it need downtime?
 - Are there external API contracts? What fields does the client read, and what happens if the schema changes?
+- **Does the data model supply everything the API and UI need?** For each field an endpoint or screen requires, confirm a backing source exists in the model with the right type and nullability — the data layer *validates* against the layers above it (`design-spine.md`). A required API/UI value with no data-model source is a gap to fix in the model now, not a `null` to discover in production.
 ## Requirements Summary Template

package/skills/references/red-flags.md ADDED Viewed

@@ -0,0 +1,62 @@
+# Anti-Rationalization Red Flags
+Under time pressure, sunk cost, or false confidence, an AI talks itself out of the
+discipline it was told to follow — and does it convincingly. The rationalizations are
+predictable and few. This file names them.
+This is the single home for the "excuses to skip a stage." It is the sibling of the
+**Anti-Loop Protocol** in `AGENTS.md`: that governs loops *inside* a stage; this governs
+skipping a stage *entirely*. Skills cite the relevant section rather than restating it.
+**The rule:** when you catch yourself forming one of these thoughts, that is the signal to
+*do the step*, not skip it. The urge to skip is the evidence the step is needed. A stage is
+skipped only by an explicit, recorded decision (a gate that says skip) — never by a silent
+rationalization that it "isn't worth it this time."
+---
+## Skipping the spec (→ grimoire-draft)
+**Catch yourself saying:** "Too small / too obvious to draft." · "I already know what to build." · "I'll spec it after I see it work."
+**Why it's wrong:** "Too small" is a complexity judgment, and complexity is an *output* of design, not an input — you can't score it honestly before designing. Code-first means the spec gets reverse-engineered to match whatever you built, so it documents the bug instead of catching it.
+**Instead:** Run the triviality gate (draft step 2). If it's genuinely trivial (config/typo/single-file) the gate says skip — that's a *recorded* call. Anything else: draft it.
+## Silently filling a gap (→ grimoire-draft)
+**Catch yourself saying:** "A reasonable default is obvious here." · "I understand enough, no need to ask." · "I'll just assume X."
+**Why it's wrong:** A silent assumption the user would have corrected becomes a bug whose paper trail says "intended." One question costs seconds; a wrong guess costs a rebuild.
+**Instead:** Ask it (an *Open* row), defer it to a non-goal, or record the inference explicitly as `RESOLVED: defaulting to X per delegation`. Never leave an unrecorded guess in the design.
+## Skipping the plan / vague tasks (→ grimoire-plan)
+**Catch yourself saying:** "Planning is overhead, I'll work it out as I go." · "The task is just 'implement the feature'."
+**Why it's wrong:** A vague plan is worse than none — it gives false confidence and you re-plan mid-implementation anyway, now with code already written the wrong way. "Implement the feature" is not a task; it restates the goal.
+**Instead:** Every task names exact files and one approach, small enough to execute without thinking (grimoire-plan). If a task needs thought to start, it isn't planned yet.
+## Code before the test (→ grimoire-apply)
+**Catch yourself saying:** "I'll add the test after." · "Let me just see it work first." · "The test is trivial, red-first is ceremony."
+**Why it's wrong:** A test written after the code is shaped to pass the code, not to catch its bugs — it asserts what you built, not what was required. A test that never failed has never proven anything. This is the single most common discipline bypass.
+**Instead:** Red first — watch it fail for the right reason, then make it pass (grimoire-apply). If you wrote code before the test, the honest move is to delete the code and start from red.
+## Skipping review (→ grimoire-review)
+**Catch yourself saying:** "It looks fine." · "I reviewed it as I wrote it." · "Too small to need review."
+**Why it's wrong:** "Looks fine" is the feeling that precedes every shipped bug. Reviewing your own work as you write it is the weakest review there is — same blind spots, same assumptions, at the moment you're most committed to the approach.
+**Instead:** Run the persona pass at the depth the complexity calls for (grimoire-review). Trivial changes are exempt *by the skill's own rule* — say so and move on; don't self-exempt by feel.
+## Declaring done without verifying (→ grimoire-verify)
+**Catch yourself saying:** "Tests should pass." · "That obviously works." · "I'll trust the run I did earlier."
+**Why it's wrong:** "Should pass" is a prediction, not evidence. Done is a claim about observed state; an unobserved claim is a guess in a confident voice.
+**Instead:** Run it. Confirm every scenario has a real step definition with real assertions and no regressions (grimoire-verify). Evidence over claims — every time.
+## Doing more than the task (→ principles.md §4)
+**Catch yourself saying:** "While I'm in here I'll also…" · "We'll need it later."
+This is scope creep / YAGNI — its home is **`principles.md` §4 (KISS/YAGNI)**, not here. The named flags ("while I'm here", "for a future caller") live there; cut the speculative work and say so in one line.
+---
+## How skills cite this
+- **draft** — *Skipping the spec*, *Silently filling a gap*
+- **plan** — *Skipping the plan / vague tasks*
+- **apply** — *Code before the test*
+- **review** — *Skipping review*
+- **verify** — *Declaring done without verifying*
+Each skill links its own section; none restate the list. This is the one home (DRY).

package/skills/references/review-personas.md CHANGED Viewed

@@ -116,6 +116,31 @@ Severity inflation patterns to avoid:
 - "Untested edge case" when no scenario in the briefing covers it → not a blocker.
 - "Missing observability" on a level 1-2 change → suggestion, never blocker.
+## 2c. Pre-Report Gate *(diff-review personas: Senior Engineer code-level, Security code-level scan, Code Style)*
+The materiality gate (§2) asks "does this matter to *this* project". The Pre-Report Gate asks the prior question: "is this *even a real issue*". Both apply; this one runs first on code-level findings. Before writing any finding, answer four questions:
+1. **Exact line** — can you cite the precise `file:line` the finding lives at?
+2. **Concrete failure mode** — can you state input → state → bad outcome? Not "could be unsafe" — the actual trigger and consequence.
+3. **Context read** — have you read the callers, imports, and tests around the line, not just the hunk? Trace the type and the caller before claiming a flaw.
+4. **Severity defensible** — would the §2b severity survive the Contrarian?
+Any "no / unsure" → downgrade or drop. A **blocker** additionally requires the offending snippet, the failure scenario, and **why existing guards don't already catch it** (neighbor code, framework default, narrowing on the prior line). If you can't write that, it is not a blocker.
+## 2d. Common False Positives — skip these
+Recurring LLM mis-flags. Each has a disqualifying condition — check it before filing. The fix is always *trace it*, not *pattern-match the syntax*.
+- **"Possible null deref"** when the preceding line narrows the type (`if (!x) return`, early-return, `?.` already guarding) → trace the type flow; drop.
+- **"N+1 query"** on a fixed-cardinality loop (known small constant) or a DataLoader / batched path → not N+1; drop.
+- **"Missing await"** on an intentionally detached call (`void promise`, fire-and-forget with a comment, a queued job) → check for `void` / comment first; drop.
+- **"Unhandled promise rejection"** on a promise that is `.catch`-chained or `await`ed in a `try` → trace the chain; drop.
+- **"Math.random() is insecure"** in a non-crypto context (jitter, sampling, test data, cache-bust) → security theater; drop. Flag only on tokens/keys/IDs.
+- **"Missing input validation"** when a traced caller already validates at the boundary → trace one caller; internal code may trust it (errors-at-the-boundary). Drop or route as a boundary note.
+- **"Magic number / no constant"**, **"add a comment"**, **"could be more generic"** with no project anchor → style preference; drop (or §4.6 suggestion at most).
+Closing test for any code-level finding: **would a senior engineer on this team actually change this in review?** If no, skip.
 ---
 ## 3. Complexity Gating
@@ -170,12 +195,13 @@ Evaluate:
 ### 4.2 Senior Engineer
-Treat accepted decisions as constraints — cite ADR ID before suggesting an override.
+Treat accepted decisions as constraints — cite ADR ID before suggesting an override. On PR/pre-commit, run the Pre-Report Gate (§2c) and check §2d before filing any code-level finding.
 Evaluate:
 - **Build vs Buy** *(design only)*: Was prior art research thorough? If a well-maintained library exists that the manifest doesn't mention, **blocker**.
 - **Simplicity (YAGNI ladder)**: Walk `../references/principles.md` §4 in order — could it not exist (YAGNI), does the stdlib do it, does a native platform feature cover it, does an installed dep solve it, is it one line? Flag the first rung the code skipped: unnecessary abstraction, indirection, premature generalization, config-driven where a direct call would do. Abstract on the third real use, not the first (**Rule of Three**) — two copies is not yet a pattern. Every finding **names the concrete replacement** (`stdlib: 27-line validator → "@" in email, 1 line`), not just "this seems complex" — a finding the author can't act on is noise.
 - **Architecture**: Decisions sensible for this codebase? Will this paint us into a corner?
+- **Unrecorded decision**: Does the change make an architectural or technology choice — new dependency, pattern, module boundary, NFR target — with no ADR recorded? An architectural decision without a decision record → finding; route to an ADR via `grimoire-draft`, and check it satisfies the capability-surface rule (ADR-0036). Apply the novelty gate (`grimoire-audit` §3) — don't flag default tooling picks.
 - **Conventions** *(PR/pre-commit)*: Does new code match file layout, naming, and patterns already in the touched areas? Check `.grimoire/docs/<area>.md` if present.
 - **Reuse / reinvention**: Existing utilities re-implemented (`grep` similar names; area-doc reusable lists), or stdlib / native-platform / installed-dep functionality hand-rolled (principles.md §3 — don't reinvent the wheel). Name what already does the job.
 - **Dead code** *(PR/pre-commit)*: Functions added but not called, imports unused, commented-out code, stubs with no implementation.
@@ -221,6 +247,8 @@ Most reviews: only **Data disclosure** + **Linking/Identifying** apply — skip
 #### Code-level scan *(PR/pre-commit only)*
+Run the Pre-Report Gate (§2c) and check §2d before filing — security findings draw the most reflexive false positives (theoretical injection on validated input, `Math.random()` outside crypto).
 - **Secrets**: Grep diff for hardcoded keys, tokens, passwords, cloud credentials, JWT secrets. Any hit = **blocker**.
 - **Injection**: Raw SQL with string concatenation, shell-exec with user input, `eval`/`exec`, unsafe deserialization. Tag OWASP + CWE.
 - **Input validation**: New endpoints without schema validation, file uploads without size/type limits, path params used directly in filesystem calls.
@@ -288,7 +316,7 @@ Verify the diff matches the project's code-style and comment standards. This is
 4. Lint/format config in repo root: `.editorconfig`, `eslint.config.*`, `.prettierrc*`, `pyproject.toml` (ruff/black), `.rubocop.yml`, `rustfmt.toml`, `.golangci.yml`, etc.
 5. **Neighboring files** in the touched directories — derive convention from what already exists when no config exists
-If none of the above pin a rule, **don't invent one**. Style preferences without a project anchor are dropped.
+If none of the above pin a rule, **don't invent one**. Style preferences without a project anchor are dropped. The Pre-Report Gate (§2c) and §2d apply here too — most style nits without a config anchor are §2d false positives.
 #### Evaluate

package/templates/draft.md CHANGED Viewed

@@ -60,9 +60,14 @@ kind: greenfield | refactor
   and cross-references (D7 cites D3) freely — this is how coupled decisions stay legible
   in one place. At projection, each NOVEL decision becomes a MADR (novelty gate applies —
   obvious tooling picks fold into the baseline ADR, they don't mint a record).
+  Phrase each row as a Y-statement (see the design-spine reference): the Decision cell states
+  "in the context of <spine layer / use-case>, facing <force>, chose <option> over
+  <alternatives>, accepting <downside>"; the Why cell is the "because". The context clause
+  ties the decision to the layer it serves so it's judged in situation, not in a vacuum.
 -->
-| #  | Decision | Why |
+| #  | Decision (Y-statement: context · option over alternatives · accepting downside) | Why (because…) |
 |----|----------|-----|
 | D1 |          |     |