@kiwidata/grimoire 0.2.2 → 0.3.1

package/AGENTS.md CHANGED
@@ -103,22 +103,22 @@ User has a request
  ├─ "I want to add / change / remove functionality"
  │ │
  │ ├─ Adding new behavior?
- │ │ → /grimoire:draft → write new .feature file
+ │ │ → /grimoire:draft → design the new behavior (plan projects the .feature)
  │ │
  │ ├─ Changing existing behavior?
- │ │ → /grimoire:draft → modify existing .feature file
+ │ │ → /grimoire:draft → design the change (plan projects it into the .feature)
  │ │
  │ ├─ Removing a feature?
  │ │ → /grimoire:remove → tracked removal with impact assessment
  │ │
  │ └─ Does it also involve a technology/architecture choice?
- │ → Draft BOTH: .feature file + MADR decision record in the same change
+ │ → Draft BOTH the behavior and the decision in one change
 
  ├─ "We should use X instead of Y" / "How should we architect this?"
- │ → /grimoire:draft → MADR decision record (not a feature)
+ │ → /grimoire:draft → design the decision (plan projects a MADR, not a feature)
 
  ├─ "We need to handle X concurrent users / meet Y compliance"
- │ → /grimoire:draft → MADR decision record (non-functional requirement)
+ │ → /grimoire:draft → design the requirement (plan projects a MADR / constraint)
 
  ├─ "What do we have? What's documented?"
  │ → /grimoire:audit → discover undocumented features and decisions
@@ -156,9 +156,11 @@ The end-to-end flow for adding or modifying behavior is six stages, each owned b
 
  **Draft** (`/grimoire:draft`) → **Plan** (`/grimoire:plan`) → **Review** (`/grimoire:review`, optional) → **Apply** (`/grimoire:apply`) → **Verify** (`/grimoire:verify`) → **PR** (`grimoire pr`).
 
+ Draft's single job is to design the change on `draft.md`. **Projection** — turning that agreed design into features, constraints, MADRs, `data.yml`, and the manifest — is the **first step of Plan**, not the end of Draft.
+
  Each skill's SKILL.md is the authoritative home for that stage's mechanics; the README "Workflow" section is the narrative walkthrough. Do not re-derive stage steps here — invoke the skill. The operational invariants that bind every stage:
 
- - **Manifest status tracks progress:** `approved` after draft, `implementing` during apply, `accepted` at PR.
+ - **Manifest status tracks progress:** the manifest is created at plan's projection step; `approved` once the design is agreed and projected, `implementing` during apply, `accepted` at PR.
  - **Live on the branch.** Features, decisions, constraints, and schema are edited directly on the feature branch — no copy-into-change-folder, no promote step.
  - **No archive step.** The PR diff *is* the change; git history plus the `Change: <id>` commit trailer are the record. PR finalize just flips decision status to `accepted` and removes the ephemeral change folder.
  - **The user drives the pace.** Review mode (default) approves every file change before writing; autonomous mode works the full task list, stopping only on blockers.
@@ -204,9 +206,9 @@ project-root/
  ## Conventions
 
  ### Manifest Status Lifecycle
- Every manifest has a `status` field in YAML frontmatter:
- - `draft` — being written, not yet reviewed
- - `approved` — reviewed by user, ready for planning/implementation
+ Every manifest has a `status` field in YAML frontmatter (the manifest is created at plan's projection step, from the already-agreed design):
+ - `draft` — just projected from the agreed `draft.md`
+ - `approved` — design agreed and projected, ready for implementation
  - `implementing` — tasks are being worked on
 
  Update the status as the change progresses. The CLI reads this to report change state. There is no `complete`/archive state — finalize removes the ephemeral change folder once the PR is opened; git history is the record.
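
The lifecycle in this hunk can be read as a small forward-only state machine. The following is a minimal Python sketch, assuming the statuses are exactly the three listed (`draft`, `approved`, `implementing`) and that a manifest advances one step at a time. The `ManifestStatus` enum and the `advance` helper are hypothetical names for illustration and are not part of grimoire's CLI.

```python
from enum import Enum


class ManifestStatus(Enum):
    """Statuses listed under "Manifest Status Lifecycle"."""
    DRAFT = "draft"
    APPROVED = "approved"
    IMPLEMENTING = "implementing"


# Assumption: a manifest only moves forward, one step at a time.
_NEXT = {
    ManifestStatus.DRAFT: ManifestStatus.APPROVED,
    ManifestStatus.APPROVED: ManifestStatus.IMPLEMENTING,
}


def advance(status: ManifestStatus) -> ManifestStatus:
    """Return the status that follows `status`, or raise if there is none.

    There is no terminal `complete` status: per the text, finalize removes
    the ephemeral change folder instead of recording a final state.
    """
    if status not in _NEXT:
        raise ValueError(f"no status follows {status.value!r}")
    return _NEXT[status]
```

Calling `advance(ManifestStatus.DRAFT)` yields `ManifestStatus.APPROVED`; calling it on `IMPLEMENTING` raises, which mirrors the absence of a `complete`/archive state.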
@@ -248,7 +250,15 @@ This is what makes `grimoire trace` work. Without it, the commit is invisible to
  ### Decision Numbering
  - Sequential, zero-padded: `0001-`, `0002-`, etc.
  - Never reuse numbers
- - Superseded decisions keep their number, status updated to `superseded by NNNN`
+
+ ### Decision Lifecycle
+ Status moves `proposed → accepted → (deprecated | superseded by NNNN)`:
+ - `proposed` — drafted, not yet adopted.
+ - `accepted` — in force; treated as a constraint by every stage.
+ - `deprecated` — no longer recommended, with no direct replacement (the need went away).
+ - `superseded by NNNN` — replaced by a newer decision.
+
+ Supersession is **two-way and explicit**: the superseding ADR back-links the one it replaces (in Context or Decision Drivers), and the superseded ADR keeps its number with status set to `superseded by NNNN`. This is the only home for the link — don't restate it elsewhere.
 
  ### Step Definitions
  Organize by **domain concept**, NOT by feature file. Check the project's existing test setup and match its BDD framework conventions. See the active skill's testing reference for ecosystem-specific patterns.
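
The "two-way and explicit" supersession rule in the Decision Lifecycle lines above lends itself to a mechanical check. Below is a hedged Python sketch, assuming four-digit decision numbers and assuming the back-link appears as the replaced number somewhere in the superseding ADR's text. The `Decision` record and `supersession_problems` function are hypothetical and do not describe how grimoire itself validates ADRs.

```python
import re
from dataclasses import dataclass


@dataclass
class Decision:
    number: str  # zero-padded id, e.g. "0003"
    status: str  # "proposed", "accepted", "deprecated", or "superseded by NNNN"
    body: str    # Context / Decision Drivers text (assumed to hold the back-link)


_SUPERSEDED = re.compile(r"^superseded by (\d{4})$")


def supersession_problems(decisions: list[Decision]) -> list[str]:
    """Report superseded decisions whose successor is missing or lacks a back-link."""
    by_number = {d.number: d for d in decisions}
    problems = []
    for d in decisions:
        match = _SUPERSEDED.match(d.status)
        if match is None:
            continue  # not superseded, nothing to cross-check
        successor = by_number.get(match.group(1))
        if successor is None:
            problems.append(f"{d.number}: successor {match.group(1)} not found")
        elif d.number not in successor.body:
            problems.append(f"{successor.number}: missing back-link to {d.number}")
    return problems
```

An empty result means every `superseded by NNNN` status points at an existing decision that mentions the one it replaces.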
package/README.md CHANGED
@@ -77,8 +77,8 @@ Then talk to your AI assistant:
  ```
  You: "Users should be able to log in with 2FA"
 
- → /grimoire:draft Creates login.feature with Given/When/Then scenarios
- → /grimoire:plan Generates tasks: write the test, then production code
+ → /grimoire:draft Designs the change on one living draft.md (Given/When/Then take shape here)
+ → /grimoire:plan Projects the design into login.feature + decisions, then generates tasks
  → /grimoire:review (optional) Product, security, engineering + principles review
  → /grimoire:apply Implements test-first (BDD for behavior, unit for invariants)
  → /grimoire:verify Confirms all scenarios pass, no regressions
@@ -115,11 +115,11 @@ Grimoire routes your request to its one correct home (an admission test keeps ea
  - **"The login page is broken"** → `/grimoire:bug` (reproduce first, then fix)
  - **"A tester found a problem"** → `/grimoire:bug-report` → `/grimoire:bug-triage` → routed fix
 
- A `.feature` is allowed only if it has an external actor, is observable without reading code/logs, uses domain language, and survives a reimplementation. Security controls, NFRs, and observability guarantees are invariants → they live in the constraints register. Produces `.feature` files (with security tags like `@security`, `@auth`, `@pii`, `@pci-dss` when applicable), constraint entries, decision records, `data.yml` for schema changes, and a manifest tracking the change.
+ A `.feature` is allowed only if it has an external actor, is observable without reading code/logs, uses domain language, and survives a reimplementation. Security controls, NFRs, and observability guarantees are invariants → they live in the constraints register. You design all this on one living `draft.md`; it is **projected** into its homes — `.feature` files (with security tags like `@security`, `@auth`, `@pii`, `@pci-dss` when applicable), constraint entries, decision records, `data.yml` for schema changes, and a manifest at the **start of Plan**, so Draft's one job is to design the change.
 
- ### 2. Plan — Generate concrete tasks
+ ### 2. Plan — Project the design, then generate concrete tasks
 
- Every scenario becomes a pair: write the step definition (test), then write the production code. Tasks reference exact file paths, exact assertions, and real patterns from area docs. Data changes (models, migrations) are ordered before feature code.
+ Plan opens by **projecting** the agreed `draft.md` into its homes (features, constraints, decisions, `data.yml`, manifest), running the admission test and principles gate as it goes. Then every scenario becomes a pair: write the step definition (test), then write the production code. Tasks reference exact file paths, exact assertions, and real patterns from area docs, ordered along the technical spine (dependencies → data → API → logic → UI → verification).
 
  The plan skill reads area docs for conventions and boundaries, and queries the code graph for reusable utilities and exact symbols — so the AI plans with real codebase knowledge, not guesses. Each task is tagged with its verification level: `scenario` (behavior), `unit-invariant` (a constraint), or `characterization` (internal/refactor).
 
@@ -176,7 +176,7 @@ Grimoire treats design as a first-class spec input, not an afterthought.
  - **Brand capture at init** — `grimoire init` offers to capture colors, type, spacing, and voice into `.grimoire/brand/` (DTCG tokens). Skip-able; can be added later via `grimoire-design --capture-brand`.
  - **Consult (optional)** — `/grimoire:design-consult` runs a pre-design Q&A. Security and data personas interview the designer about the proposed change *before* any artifacts exist, surfacing assumptions and constraints early. No findings, no blockers — just questions whose answers will shape the design.
  - **Design** — `/grimoire:design` walks: problem statement → user flow & pain points → variants (Figma MCP, static HTML, or ASCII) → required component states (default/loading/empty/error) → proposed Gherkin scenarios for each (component × state).
- - **Handoff** — accepted scenarios feed `/grimoire:draft` (manifest + ADRs), then `/grimoire:plan` (tasks), then `/grimoire:review` — **mandatory at complexity 4** with surface-conditional adversarial personas (keyboard, screen-reader, contrast on web; touch + gesture on mobile; keyboard-only on TUI).
+ - **Handoff** — accepted scenarios feed `/grimoire:draft` (design), then `/grimoire:plan` (projects manifest + ADRs, then tasks), then `/grimoire:review` — **mandatory at complexity 4** with surface-conditional adversarial personas (keyboard, screen-reader, contrast on web; touch + gesture on mobile; keyboard-only on TUI).
  - **Revision** — `/grimoire:design --revise` re-enters an existing design without restarting. Shows current variants and Gherkin, asks what to change, regenerates only the affected artifacts. Previously-accepted scenarios are not overwritten without confirmation.
 
  Brand-drift lint (`grimoire-design --lint`) cross-references hardcoded colors / px / fonts against `.grimoire/brand/tokens.json` and suggests token replacements. Wired into precommit-review when tokens exist.
@@ -194,19 +194,14 @@ Full grimoire cycle end-to-end — adding two-factor authentication to an existi
  You: "Users should verify their identity with a TOTP code after entering their password"
  ```
 
- The AI runs `/grimoire:draft` and produces:
+ The AI runs `/grimoire:draft` and designs the change on one living `draft.md` — the decision ledger (Y-statements), behavioral sketches, and an open/decided ledger — iterating with you until the design is agreed:
 
  ```
  .grimoire/changes/add-2fa-login/
- ├── manifest.md # Why, what's changing, scope
- ├── features/
- │ └── auth/
- │ └── login.feature # Updated with 2FA scenarios
- └── decisions/
- └── 0003-totp-library.md # Chose pyotp over django-otp
+ └── draft.md # the agreed design; scenarios take shape here before projection
  ```
 
- **login.feature:**
+ The scenarios that emerge (projected into `login.feature` at the start of Plan):
  ```gherkin
  Feature: Login with two-factor authentication
  As a user
@@ -235,11 +230,11 @@ Feature: Login with two-factor authentication
  And I should remain on the verification page
  ```
 
- You review and approve. Manifest status: `draft` → `approved`.
+ You review and approve the **design**. Nothing is written to `features/` or `decisions/` yet — projection happens next, in Plan.
 
  ### Plan
 
- The AI runs `/grimoire:plan`, reads the approved features + area docs + data schema, and generates `tasks.md`:
+ The AI runs `/grimoire:plan`, which **first projects** the agreed `draft.md` into its homes — `manifest.md`, `features/auth/login.feature` (the scenarios above), and `decisions/0003-totp-library.md` — running the admission test as it routes each fact. It then reads those homes + area docs + the code graph and generates `tasks.md`, ordered along the technical spine:
 
  ```markdown
  # Tasks: add-2fa-login
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@kiwidata/grimoire",
- "version": "0.2.2",
+ "version": "0.3.1",
  "description": "Gherkin + MADR spec-driven development for AI coding assistants",
  "type": "module",
  "bin": {
@@ -384,6 +384,7 @@ Present a brief summary:
  ## Important
  - **Tests are not optional.** Every task produces both production code and passing step definitions. No exceptions.
  - **Red-green is mandatory, not aspirational.** A test must fail before it passes. If it doesn't fail, it's not a real test. Fix it before moving on.
+ - **Code-before-test is the most common bypass.** "I'll add the test after" / "let me see it work first" are the *Code before the test* rationalization in `../references/red-flags.md`. If you wrote code before the test, delete the code and start from red.
  - **A test that always passes is worse than no test.** It gives false confidence. If you can't make a step definition fail, you don't understand what it's testing.
  - The feature file is the spec. If a test fails, fix the code, not the feature.
  - If implementation reveals that a scenario is wrong or missing, STOP and go back to draft. Don't silently change features.
@@ -107,6 +107,9 @@ For confirmed items, create a grimoire change:
  Group related items into single changes — don't create one change per discovery.
 
  ### 6. Dead Feature Detection
+
+ **Detection is deterministic.** Every dead/stale finding cites exact `file:line` (or ADR id) evidence from a reproducible check — codebase-memory-mcp graph queries (`search_graph` / `get_architecture`) per [0029]/[0030], with `grep` / `git blame` only where the graph has no answer (e.g. `@skip` age). The same commit yields the same findings. The LLM summarizes and interviews; it does not score the codebase by impression.
+
  Check for documented features and decisions that may no longer be accurate:
 
  **Dead features** — feature files that describe behavior the code no longer implements:
@@ -137,7 +140,15 @@ After the interview, summarize:
  - How many decisions are documented vs. undocumented
  - How many decisions are stale
  - How many conventions files drifted vs. up-to-date
- - Suggest which areas to address first (highest risk / most complex / most frequently changed)
+
+ Then emit a **Top Actions** list — most-risk first, each with the exact path and the single next move. The ranking comes from the deterministic checks (§6), not impression, so the same commit yields the same list:
+
+ ```markdown
+ ## Top Actions
+ 1. `features/billing/invoice.feature` — dead (InvoiceView deleted ~3mo ago); create a removal change.
+ 2. `.grimoire/decisions/0007-search-backend.md` — stale (library no longer in deps); deprecate or update.
+ 3. `.grimoire/docs/conventions/api.md` — drifted (views moved to `src/api/handlers/`); refresh.
+ ```
 
  ## Important
  - This is a COLLABORATIVE process, not a dump. Interview, don't lecture.
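
The claim that "the same commit yields the same list" depends on the ranking being a pure function of the findings. One way to sketch that in Python is to sort on an explicit key with a stable tie-breaker. The `Finding` record, the `_RISK` ordering (dead before stale before drifted, mirroring the sample list), and the `top_actions` helper are all assumptions for illustration; the source only says "most-risk first".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    path: str       # exact path the finding points at
    kind: str       # "dead", "stale", or "drifted"
    next_move: str  # the single next action


# Hypothetical risk order; lower number means higher risk.
_RISK = {"dead": 0, "stale": 1, "drifted": 2}


def top_actions(findings: list[Finding]) -> list[str]:
    """Rank findings by (risk, path) so input order never changes the output."""
    ordered = sorted(findings, key=lambda f: (_RISK[f.kind], f.path))
    return [
        f"{i}. `{f.path}` ({f.kind}): {f.next_move}"
        for i, f in enumerate(ordered, start=1)
    ]
```

Because the sort key uses only fields of the finding, shuffling the input produces the identical list, which is the property the audit text asks for.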
@@ -55,6 +55,8 @@ Before touching any production code:
  2. Run it — **it MUST FAIL**, reproducing the bug
  3. If it passes, your test doesn't actually reproduce the bug. Fix the test until it fails for the right reason.
 
+ **Name it after the bug.** This repro test stays as the permanent regression test — name it so the bug is obvious (`test_password_reset_special_chars`; scenario "Password reset with plus-sign email"). One bug → one named regression test. This is how the same bug doesn't come back: a future change that reintroduces it goes red on a test that names the defect.
+
  This is non-negotiable. A bug fix without a reproduction test is a guess that might work. A failing test is proof you understand the problem.
 
  ### 4. Document the Bug
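
To make "name it after the bug" concrete, here is a hedged Python sketch of what such a regression test could look like. The test name `test_password_reset_special_chars` comes from the text; everything else is assumed for illustration, including the bug itself (a plus sign in an email address being decoded as a space in a reset link) and the hypothetical helper `build_reset_query`.

```python
from urllib.parse import parse_qs, quote


def build_reset_query(email: str) -> str:
    """Hypothetical helper under test: query string of a password-reset link."""
    # The assumed fix: percent-encode the address so "+" survives decoding.
    return "email=" + quote(email, safe="")


def test_password_reset_special_chars() -> None:
    """Regression: plus-sign emails were mangled in password-reset links."""
    email = "ada+test@example.com"
    parsed = parse_qs(build_reset_query(email))
    # Before the assumed fix, the raw "+" would decode to a space here.
    assert parsed["email"] == [email]


test_password_reset_special_chars()
```

If a later change drops the encoding, this test goes red under a name that states the defect, which is the point the paragraph above makes.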
@@ -193,6 +195,7 @@ Report to the user:
 
  ## Important
  - **Reproduce before you fix.** No exceptions. If you can't reproduce it, you don't understand it, and your fix is a guess.
+ - **The test is the source of truth, not your self-review.** When the same agent writes a fix and then reviews it, the same wrong assumption rides into both steps — "looks correct" is not evidence. The red→green of the named regression test (and the configured suites) is the proof. Don't declare a bug fixed on a code re-read; declare it fixed when the mechanical gate flips and stays green.
  - **Small fixes only.** If the bug fix requires significant architectural changes, it's not a bug fix — route to `grimoire-draft` for a proper change.
  - **Don't over-document.** The test is the documentation. A one-line comment in the test explaining the bug is enough. Don't create tracking files, bug reports, or manifests for a bug fix.
  - **The feature file is truth.** If a scenario describes behavior the user now says is wrong, that's a spec change, not a bug. Handle it through `grimoire-draft`.
@@ -61,6 +61,8 @@ Render the chosen template for the user to fill in. Write the result to `problem
  ### 3. User Flow & Success Metrics
  Ask for a **friction-log narrative** as the default minimum-viable level — a short prose description of the current user journey and where it hurts. Offer (but never force) two upgrades: Mermaid journey diagram, then service blueprint.
 
+ Walk the flow on the **UX-workflow spine** (`../references/design-spine.md`): pick a direction — **backward** from the goal ("what must be true for the user to reach here?") when the goal is clear but the path isn't, or **forward** from what the user already knows when documenting an existing happy path — then validate by walking the other way; gaps where the two don't meet are missing or assumed steps. Reconstruct the *real* sequence with laddering / the Mom Test (`../references/elicitation-personas.md`), not an imagined one. Keep the simplicity bias: a surfaced step is a candidate, and a step that serves no part of the stated goal is cut, not designed.
+
  Separately — not bundled — ask "What are the user's current pain points?" Accept a bulleted list, free text, or "none known". Capture pain points in `problem.md` under a dedicated `## Pain Points` section. Variants generated in step 6 must each state which pain points they address (or explicitly mark "deferred").
 
  Ask for at least one measurable success metric (e.g., "reduce support tickets about lockouts by 50%"). If the user cannot articulate one, note `no success metric — design effectiveness will be hard to evaluate` as an assumption in `problem.md`.
@@ -73,7 +75,7 @@ If `.grimoire/docs/components.md` is absent AND the project has UI code, ask the
  - MUI / Chakra / Mantine / Radix imports in `package.json`
  - `*.stories.{ts,tsx,jsx,js}` (Storybook stories)
 
- Write findings to `.grimoire/docs/components.md` listing detected components with file paths and known variants. Skip the scan entirely if no UI signals are present (greenfield or non-UI surface). Subsequent variants prefer existing components over net-new designs and flag net-new explicitly.
+ Read at **interface altitude** — detect components from these manifests, imports, and stories, not by reading component source bodies (`../references/artifact-map.md` → Reading altitude). Write findings to `.grimoire/docs/components.md` listing detected components with file paths and known variants. Skip the scan entirely if no UI signals are present (greenfield or non-UI surface). Subsequent variants prefer existing components over net-new designs and flag net-new explicitly.
 
  ### 5. Brand Grounding
  Read `.grimoire/brand/tokens.json` and `.grimoire/brand/voice.md` if they exist. Use the format documented in `../references/brand-tokens-format.md`. Required groups: `color.*`, `font.family.*`, `font.size.*`, `spacing.*`.
@@ -128,7 +130,7 @@ For HTML output, write a single `.grimoire/changes/<change-id>/designs/preview.h
  Skip preview rendering when output is Figma (the Figma file IS the preview) or ASCII (the markdown table IS the preview).
 
  ### 9. Derive Gherkin
- Propose draft scenarios as a **design artifact** at `.grimoire/changes/<change-id>/designs/scenarios.feature` (a proposal, not the live baseline). One Scenario per (component × state), Given / When / Then grounded in the design. `grimoire-draft` writes the user-approved scenarios live into `features/` — design does not edit the live baseline directly. Every proposed scenario must still pass draft's feature admission test (external actor, observable, domain language).
+ Propose draft scenarios as a **design artifact** at `.grimoire/changes/<change-id>/designs/scenarios.feature` (a proposal, not the live baseline). One Scenario per (component × state), Given / When / Then grounded in the design. These are a proposal: `grimoire-draft` carries them into the design, and `grimoire-plan` projects the user-approved scenarios live into `features/` — design does not edit the live baseline directly. Every proposed scenario must still pass the feature admission test (external actor, observable, domain language).
 
  Apply surface-conditional adversarial scenarios per `../references/adversarial-personas.md`:
 
@@ -143,7 +145,7 @@ Present the proposed scenarios for review: "Review proposed scenarios — accept
  ### 10. Handoff
  When the user accepts proposed scenarios, the change folder is populated. Suggest the next step:
 
- > Run `grimoire-draft` to refine the manifest and ADRs, or `grimoire-plan` to break into tasks.
+ > Run `grimoire-draft` to refine the behavioral design, then `grimoire-plan` to project it (features/constraints/ADRs/manifest) and break into tasks.
 
  Skill is done.
 
@@ -256,4 +258,4 @@ Do not error — absence is a valid state for projects that haven't onboarded br
  - **Brand drift findings are suggestions, not blockers.** Lint mode proposes token replacements; it does not auto-rewrite code. The user decides whether to apply.
 
  ## Done
- When the user accepts proposed Gherkin scenarios and the change folder contains `problem.md`, `designs/`, and `features/`, the workflow is complete. Suggest `grimoire-draft` (manifest + ADRs) or `grimoire-plan` (task breakdown) next.
+ When the user accepts proposed Gherkin scenarios and the change folder contains `problem.md` and `designs/`, the workflow is complete. Suggest `grimoire-draft` (refine the behavioral design) next, then `grimoire-plan` (project to features/ADRs/manifest + task breakdown).
@@ -181,9 +181,9 @@ Do not invent content for empty sections. If the designer skipped "Proposed user
  ### 6. Handoff
  Tell the user what runs next and what those skills will do with `consult.md`:
 
- - **`grimoire-design`** on the same `change-id` will read `consult.md` first, propagate assumptions and givens into prompts for variant generation (e.g., "exclude patterns that violate givens"), and copy the lists into `manifest.md` when the designer accepts a direction.
- - **`grimoire-draft`** on the same `change-id` will read `consult.md` and copy "Inferred assumptions" + "Inferred givens" into `manifest.md` (Assumptions section, plus a new Givens section) at level 3-4 complexity.
- - **Open questions** travel into `manifest.md` as unvalidated assumptions for the designer/PM to resolve before plan.
+ - **`grimoire-design`** on the same `change-id` will read `consult.md` first and propagate assumptions and givens into prompts for variant generation (e.g., "exclude patterns that violate givens"); the lists carry into the design and are projected into `manifest.md` by `grimoire-plan`.
+ - **`grimoire-draft`** on the same `change-id` will read `consult.md` and carry "Inferred assumptions" + "Inferred givens" into the `draft.md` design (assumptions Decided/Open; givens Decisions-ledger context). `grimoire-plan` then projects these into the manifest's Assumptions section at level 3-4 complexity.
+ - **Open questions** stay in `consult.md` as designer follow-up items; they are not copied forward.
 
  Do not invoke the next skill automatically. Confirm with the user, then suggest the next command.
 
@@ -1,6 +1,6 @@
  ---
  name: grimoire-draft
- description: Design a change collaboratively on one living draft.md, then project it into Gherkin features, constraints, and MADR decisions. Use when the user describes new functionality, requirements, or architecture choices.
+ description: Design a change collaboratively on one living draft.md. grimoire-plan then projects it into Gherkin features, constraints, and MADR decisions. Use when the user describes new functionality, requirements, or architecture choices.
  compatibility: Designed for Claude Code (or similar products)
  metadata:
  author: kiwi-data
@@ -9,12 +9,13 @@ metadata:
 
  # grimoire-draft
 
- Design a change on **one living document** (`draft.md`), iterating with the user, then
- **project** the agreed design into its durable homes (features, constraints, decisions).
+ Design a change on **one living document** (`draft.md`), iterating with the user until the
+ design is agreed. `grimoire-plan` then **projects** that design into its durable homes
+ (features, constraints, decisions) — draft itself does not write them.
 
  The core idea: spread-out artifacts hinder the thinking. So you do all the designing in a
  single coherent doc — diagram/sketch, a decision ledger, pseudo-code, an open-question
- ledger — and only fragment it into separate homes **after agreement**. `draft.md` is
+ ledger — and it is fragmented into separate homes (by plan) only **after agreement**. `draft.md` is
  ephemeral: retained as reference through the pipeline, deleted when `grimoire-apply` clears
  the change folder. Git history preserves it.
 
@@ -27,7 +28,7 @@ the change folder. Git history preserves it.
  ## Routing (coarse — up front)
 
  Decide only whether to design at all, and in which skill. The **fine** routing (which fact
- becomes a feature vs. a constraint vs. a decision) happens later, at projection (step 7).
+ becomes a feature vs. a constraint vs. a decision) happens later, at projection — now `grimoire-plan`'s first step.
 
  - Bug report ("something is broken") → `grimoire-bug` or `grimoire-bug-report`
  - Pure refactoring (no behavior change) → no grimoire artifact needed. Suggest an ADR only if architecturally significant.
@@ -40,7 +41,7 @@ becomes a feature vs. a constraint vs. a decision) happens later, at projection
 
  Confirm this is a change worth designing, and which skill owns it (table above). You do
  **not** need to assign each fact to a home yet — during design everything lives in one
- `draft.md`; per-fact routing is a projection concern (step 7, D13).
+ `draft.md`; per-fact routing is a projection concern, handled in `grimoire-plan`.
 
  The one up-front question that matters: **is this a behavior/feature/architecture change**
  (→ design it here), or a bug / pure refactor / config tweak (→ route away, per the table)?
@@ -54,9 +55,9 @@ the design exists. Up front, make only one binary call:
  - **Trivial** — config, typo, copy change, single-file fix, dependency bump. Skip the
  `draft.md` loop: make the change directly, record a minimal `manifest.md` (Why + file
  list), done.
- - **Non-trivial** — anything else. Build a `draft.md` and design the change (steps 3–8).
+ - **Non-trivial** — anything else. Build a `draft.md` and design the change (steps 3–7).
 
- The full **complexity level (1–4)** is scored at **projection** (step 7), once the design
+ The full **complexity level (1–4)** is scored at **projection** (`grimoire-plan`'s first step), once the design
  is settled, and written to `manifest.md` — not before (a premature number biases the design
  to fit it). During design, use the table below only as a rough guide for how deep to
  research and elicit; depth grows with the change, it is not pre-allocated.
@@ -79,7 +80,7 @@ Before designing, research what already exists. Do not ask the user to research
  built-ins / first-party check; architecture decisions, new dependencies, or cross-cutting
  concerns need full research across all categories.
 
- Follow the methodology in `../references/build-vs-buy.md`. The findings feed the `draft.md`
+ Follow the methodology in `../references/build-vs-buy.md`. Read candidates at **interface altitude** — public API, types, and docs, not their source or tests (`../references/artifact-map.md` → Reading altitude). The findings feed the `draft.md`
  **Why** (and, for an adopt/build/hybrid call, the manifest **Prior Art** at projection).
  Present findings to the user and get agreement on direction before designing deeply.
 
@@ -111,17 +112,29 @@ This single loop replaces what used to be separate "elicit requirements", "draft
  "collaborate" steps. **Interviewing IS iterating on `draft.md`.** There is no gather-then-
  transcribe split — requirements surface, get questioned, and resolve inside the doc.
 
+ **Walk the spine.** Pick the spine this change rides (`../references/design-spine.md`): the **technical spine** (process/constraints → data model → API/contract → UI) for behavioral/technical work, or the **UX-workflow spine** (backward from the goal, or forward from what the user knows) for a user-facing flow. Then **always walk its layers in order** — at each layer: elicit with that layer's lens, record its decisions, and validate the prior layer against what you just learned (a required field must trace to a downstream need; the data model must satisfy the process constraints). An empty layer is a one-line skip, not a silent omission. Ceremony scales to constraints: lightweight by default, but once the change introduces **more than 2 constraints**, walk every layer formally — that 3rd constraint is complexity surfacing.
+
  Iterate with the user, directly on `draft.md`:
 
  ```
  loop:
- propose → decisions into the Decisions ledger; shapes into Sketches; a diagram/sketch into At a glance
+ propose → decisions into the Decisions ledger as Y-statements (../references/design-spine.md); shapes into Sketches; a diagram/sketch into At a glance
  question → unknowns become rows under Open (use ../references/elicitation-personas.md as lenses)
  user reacts → answers / edits the doc
  resolve → strike the Open row IN PLACE: `RESOLVED: <answer> (Dn)` — never delete it
  until Decided is stable AND Open is empty-or-deferred.
  ```
 
+ **Explore before you converge.** When the design approach is genuinely open — more than one
+ reasonable shape exists — sketch **2–3 candidate approaches** at a high level (one or two
+ lines each: the idea + its main trade-off) and let the user pick a direction *before* you
+ deep-dive the ledger on one. Don't silently commit to the first idea that works; the first
+ idea is rarely the best, and an unexamined commitment is the *Silently filling a gap* red
+ flag at design scale (`../references/red-flags.md`). Keep this lightweight: it is a quick
+ approach-level fork, not a variants matrix — **visual/UI variants are `grimoire-design`'s
+ job**. When the approach is obvious or forced (one viable shape), say so in one line and
+ proceed; don't manufacture alternatives.
+
  Discipline for the loop:
 
  1. **Outcome & Non-goals first.** Pin these (into *Why*) before anything else — they set scope. Restate them back to the user.
@@ -131,124 +144,44 @@ Discipline for the loop:
131
144
  5. **Disambiguate immediately.** If an answer is vague ("handle errors gracefully"), ask the specific follow-up and record the concrete answer. Never leave a vague answer in the ledger.
132
145
  6. **Capture, don't extrapolate.** "Out of scope for now" → record as a non-goal and stop. Don't design a scenario "just in case".
133
146
  7. **When the user delegates** ("just write something reasonable"), record it explicitly as an Open→RESOLVED row: "Defaulting to <choice> per user delegation — flag in review if wrong." The assumption stays visible.
134
- 8. **Sort facts by kind as they emerge.** An invariant (security control, NFR, performance budget, observability guarantee) is not a behavior — capture it in the *Constraints* section, not as a behavioral sketch. Apply the rough behaviour-vs-invariant test as you design (does an external actor observe it without reading code/logs?) so projection's admission test (step 7) gets clean input instead of slop to reroute. The fine fact-to-home routing still happens at projection; this just keeps the design honest while you think.
147
+ 8. **Sort facts by kind as they emerge.** An invariant (security control, NFR, performance budget, observability guarantee) is not a behavior — capture it in the *Constraints* section, not as a behavioral sketch. Apply the rough behaviour-vs-invariant test as you design (does an external actor observe it without reading code/logs?) so projection's admission test (in `grimoire-plan`) gets clean input instead of slop to reroute. The fine fact-to-home routing still happens at projection; this just keeps the design honest while you think.
135
148
 
136
149
  **Never silently fill an open question.** Either ask it (as an *Open* row), defer it to a non-goal, or record the inference explicitly in *Decided*. The *Decided/Open* ledger IS the requirements summary — before declaring the design done, walk it back to the user so they see every call and every guess.
137
150
 
138
151
  **Nothing is written to `features/`, `.grimoire/docs/constraints.md`, or `.grimoire/decisions/` during this loop.** Everything lives in `draft.md`. The design is "done" when *Decided* is stable and *Open* is empty-or-deferred — and the user agrees.
139
152
 
140
- Do NOT proceed to projection without explicit user approval of the design.
141
-
142
- ### 7. Projection — generate the homes from draft.md
143
-
144
- Once the user agrees the design is settled, project `draft.md` into its durable homes. This
145
- is where the **fine routing** happens (each fact → its one home) and where the admission
146
- test + principles gate run. Artifacts are written **live in their real locations** on the
147
- branch — `git diff` is the staging area; there is no copy-into-the-change-folder.
148
-
149
- First, **score the complexity level (1–4)** now that the design is settled, and write it to
150
- `manifest.md` frontmatter as `complexity: <1-4>`.
151
-
152
- Then project each kind of fact:
153
-
154
- **Behaviors → `features/*.feature`.** For each behavioral fact in the design:
155
-
156
- *The feature-file admission test* — a scenario may be written **only if it passes all four gates**; if it fails any, it is a constraint or a decision, not a feature:
157
- 1. **External actor, outside the system boundary** — an end user, an operator, or a *third-party* system integrating with you does the thing. "External" means outside *your* system, not outside one module: a sibling service, an internal queue consumer, or another module in the same repo calling this one is **internal**, even though it's a separate process. Internal actor → contract test or constraint/decision, never a `.feature`.
158
- 2. **Observable** — the actor sees the outcome without reading code or logs. "<200ms", "logs scrubbed of PII" → fails → constraint.
159
- 3. **Domain language** — domain nouns, zero implementation detail. Names a library/log-level/table (`loguru`, `INFO`, `bcrypt`, `users` table) → fails → leaking implementation.
160
- 4. **Survives reimplementation** — rewrite the internals from scratch; would the scenario still read the same? If it would change, it's pinned to implementation → not a feature.
161
-
162
- **Internal protocols and service-to-service contracts are NOT features.** A change to how two of your own components talk — an internal RPC/queue/event shape, a module API, a wire format between your services — is a *contract*, verified by a contract/integration test (`verify: unit-invariant` at plan stage), not by Gherkin. It fails gate 1: there is no external actor, only your own code on both ends. If a third-party integrates against the protocol it's external and may be a feature; two of your own services is internal. This is the second-biggest source of feature-file slop after invariants.
163
-
164
- Common slop this catches: invariants (→ `constraints.md`) — "PII is scrubbed from logs", "all endpoints require auth", "responses are gzipped", "errors logged with a trace id"; internal protocols (→ contract test) — "service A publishes an OrderPlaced event B consumes", "the worker accepts a job payload with these fields", "module X returns this struct to module Y".
165
-
166
- *Extend vs. new — default is always extend; new files are the exception and require justification.* List existing feature files first (**required, not skippable** — do not write any scenario until this triage table is complete):
167
-
168
- ```
169
- Existing feature files:
170
- features/auth/login.feature — "User Login"
171
- features/billing/invoices.feature — "Invoice Management"
172
- ```
173
-
174
- For each scenario, decide extend-or-new and show it:
175
-
176
- ```
177
- "Admin resets a user's password" → extend features/auth/login.feature (same actor domain: auth)
178
- "User configures SSO provider" → NEW (no existing file owns SSO configuration)
179
- ```
180
-
181
- Signals to extend: same actor, same domain object, same entry point, same HTTP resource or screen. Signals genuinely new: new actor type with no existing file, entirely new domain object, or the existing Feature title would need "and" to cover both. If unsure, extend. A new file requires stating which files were considered and why none fit.
182
-
183
- Then write Gherkin (Feature title + user story; Background for shared preconditions; one scenario per behavior; Given/When/Then describing WHAT, never HOW). Apply security tags per `../references/security-compliance.md` (only when there's a security surface; compliance tags only when `project.compliance` is set). When design input grounded the scenarios (step 4): use brand-token **names** not hex values when `.grimoire/brand/tokens.json` applies; prefer existing component names when `.grimoire/docs/components.md` exists, and flag any net-new component ("new component required — confirm before plan stage").
184
-
185
- **Constraints → `.grimoire/docs/constraints.md`.** Every invariant that failed the admission test (it's a security control / NFR / observability / compliance rule, not an actor-observable behavior) becomes one row: **assertion · rationale · how-verified · links**. The assertion is a flat statement ("Log output never contains PII or secrets"), not Given/When/Then. `how-verified` names the test that proves it (a `unit-invariant` the plan stage will create) — never a Gherkin scenario. If it stems from a decision, link the MADR; don't restate it. Create the file from `templates/constraints.md` if absent.
186
-
187
- **Decisions → `.grimoire/decisions/NNNN-*.md`.** Project each Decisions-ledger entry, applying the **novelty gate**: a MADR is for a decision with a real, project-specific trade-off between viable alternatives — not for industry-default tooling picks or ecosystem-forced conventions. Ask: *would a competent engineer on this stack make a different choice, and need our reasoning to understand ours?* If no, skip it. Obvious tooling/convention picks fold into the existing `Tooling and convention baseline` ADR (one line: choice → why), not a new sequential record. Genuine trade-offs get the next sequential number, status `proposed` (`grimoire-apply` flips to `accepted` at finalize), using `.grimoire/decisions/template.md`.
188
-
189
- **Data changes → `.grimoire/changes/<change-id>/data.yml`.** If the change adds/modifies/removes data models, fields, indexes, or external API integrations, write `data.yml` (same YAML shape as `schema.yml`, only what's changing, `action:` on each entry):
190
-
191
- ```yaml
192
- # Proposed data changes for: add-user-profiles
193
- users:
194
- action: modify
195
- source: src/models/user.py
196
- fields:
197
- avatar_url: { action: add, type: varchar, nullable: true }
198
- legacy_name: { action: remove }
199
- profiles:
200
- action: add
201
- type: collection
202
- fields:
203
- user_id: { type: objectId, ref: users }
204
- bio: { type: string, max_length: 500 }
205
- github_api:
206
- action: add
207
- type: external_api
208
- provider: GitHub
209
- schema_ref: https://docs.github.com/en/rest
210
- client: src/integrations/github.py
211
- endpoints:
212
- get_user:
213
- method: GET
214
- path: /users/{username}
215
- request:
216
- headers: { Authorization: "Bearer {token}" }
217
- response:
218
- login: { type: string, required: true }
219
- avatar_url: { type: string, required: true }
220
- name: { type: string, nullable: true }
221
- error_response:
222
- message: { type: string }
223
- status: { type: integer }
224
- ```
153
+ Do NOT hand off to `grimoire-plan` without explicit user approval of the design.
225
154
 
226
- **Contract documentation is mandatory for external APIs.** Every endpoint must document `request` (what you send), `response` (fields you read, `required: true` for those your code depends on), and `error_response` (the error shape you handle). Downstream skills generate contract tests from this. If you don't know the exact shape, reference `schema_ref` and document the subset your client uses — that subset is the contract. No data impact → skip `data.yml` entirely.
155
+ ### 7. Hand off — projection happens in plan
227
156
 
228
- **Manifest (`manifest.md`).** Generate it from `draft.md` as the durable plan-input glue: `complexity` (just scored), Why + Non-goals, the artifact list (added/modified/removed features, decisions, constraints), and a **Prior Art** section summarizing step 3's research (what was found/evaluated, why adopt/build/hybrid; if building, what's borrowed). **Level 3–4** also carry **Assumptions** (what must be true; mark evidence vs. unvalidated; flag unvalidated ones on the critical path) and a **Pre-Mortem** (2–5 plausible failure modes 6 months out, with mitigations or "accepted"). These come straight from the `draft.md` Decided/Open and Cut sections.
157
+ Draft ends when the design on `draft.md` is agreed. **Projection — turning the design into its
158
+ durable homes (features, constraints, MADRs, `data.yml`, manifest) — is now the first step of
159
+ `grimoire-plan`**, co-located with the planning that consumes those homes. A two-phase draft
160
+ (design *then* project) was one job too many; draft now does one thing — design the change —
161
+ and hands the agreed `draft.md` to plan.
229
162
 
230
- **Do NOT delete `draft.md`.** Retain it read-only as the agreed reference through plan → … → apply. `grimoire-apply` removes it with the change folder at finalize.
163
+ So draft does **not** write `features/`, `.grimoire/docs/constraints.md`,
164
+ `.grimoire/decisions/`, `data.yml`, or the full `manifest.md`. What it leaves for plan:
231
165
 
232
- ### 8. Validate (at projection)
166
+ - `draft.md` — the agreed design: the Decisions ledger (Y-statements), Decided/Open, Sketches,
167
+ Constraints, and Cut sections. This is the single source plan projects from.
168
+ - The change folder and the feature branch.
233
169
 
234
- - `.feature` files have valid Gherkin; every Feature has a user story; every Scenario has at least Given + When + Then.
235
- - MADR records have valid YAML frontmatter (status, date).
236
- - Manifest is complete and accurate; `complexity` is set.
237
- - **Re-run the admission test on every scenario you wrote**: external actor, observable, domain language, survives reimplementation. Any scenario that now fails is slop — move it to `constraints.md` or a MADR.
238
- - **Principles gate** (`../references/principles.md`): no fact written to two homes (DRY), no second way to do an existing thing (one right way), no reinvented wheel, no artifact created past the stated scope (KISS). Note: `draft.md` co-existing with the homes is **not** a DRY violation — it is the (soon-deleted) source the homes were projected from, not a parallel authority.
170
+ **Exception — trivial changes** (the step-2 triviality gate) skip plan entirely: draft makes
171
+ the change directly and records the minimal `manifest.md` (Why + file list) itself.
239
172
 
240
173
  ## Important
241
174
  - ONE change at a time. Don't combine unrelated changes.
242
- - **`draft.md` is the only surface you design on.** Features, constraints, MADRs, and the manifest are **generated from it** at projection — never authored by hand in parallel during design.
175
+ - **Catch the rationalizations.** "Too small to spec", "I'll just assume a reasonable default" are the named excuses in `../references/red-flags.md` (*Skipping the spec*, *Silently filling a gap*). The urge to skip is the signal to do the step — not skip it.
176
+ - **`draft.md` is the only surface you design on.** Features, constraints, MADRs, and the manifest are **generated from it** at projection (`grimoire-plan`'s first step) — never authored by hand in parallel during design, and not written by draft at all.
243
177
  - **Features describe actor-observable behavior, not implementation, and not invariants.** No external actor, not observable, or names a library/log-level/table → it's a constraint (→ `constraints.md`) or a decision (→ MADR). An internal protocol or service-to-service contract (your own components talking) is a contract test, not a `.feature` — "external" means outside your system, not outside one module. These two — invariants and internal protocols — are the top sources of feature-file slop.
244
178
  - **One fact, one home** (`../references/principles.md`). A capability lives in one `.feature`; a control in one constraint row; a decision in one MADR. Never the same fact in two homes (at rest).
245
- - Decisions live in **one inline ledger** in `draft.md` while designing; they project to separate MADRs only at step 7. This is how coupled decisions stay legible during the thinking.
246
- - Artifacts (post-projection) are edited **live on the branch** — never copied into `.grimoire/changes/`. `git diff` is the staging area.
179
+ - Decisions live in **one inline ledger** in `draft.md` while designing (Y-statements); they project to separate MADRs only at projection, in `grimoire-plan`. This is how coupled decisions stay legible during the thinking.
180
+ - Projected artifacts are edited **live on the branch** — never copied into `.grimoire/changes/`. `git diff` is the staging area.
247
181
  - **Figma access token is read from `FIGMA_ACCESS_TOKEN` by the MCP server.** Never log it, never write it to config or any artifact (`manifest.md`, `consult.md`, `figma-snapshot.json`, `draft.md`). The MCP handles auth transparently.
248
182
 
249
183
  ## Done
250
- When the user approves the design and it has been projected, the workflow is complete.
251
- `draft.md` remains as reference until `grimoire-apply` clears it. Present the change
252
- directory path and suggest next steps:
253
- - `grimoire-plan` to generate implementation tasks
184
+ When the user approves the design on `draft.md`, the workflow is complete — draft does not
185
+ project. Present the change directory path and suggest next steps:
186
+ - `grimoire-plan` projects the design into features/constraints/MADRs/manifest, then generates tasks
254
187
  - Or further iteration on `draft.md` if the user wants changes
@@ -1,6 +1,6 @@
1
1
  ---
2
2
  name: grimoire-plan
3
- description: Derive implementation tasks from approved Gherkin features and MADR decisions. Use when features are approved and ready for task breakdown.
3
+ description: Project an agreed draft.md into its homes (features, constraints, decisions, manifest), then derive implementation tasks from them. Use after the design is approved in grimoire-draft.
4
4
  compatibility: Designed for Claude Code (or similar products)
5
5
  metadata:
6
6
  author: kiwi-data
@@ -9,7 +9,7 @@ metadata:
9
9
 
10
10
  # grimoire-plan
11
11
 
12
- Derive implementation tasks from approved Gherkin features and MADR decisions. The output must be detailed enough that any LLM can execute the tasks without further planning.
12
+ Plan opens by **projecting** the agreed `draft.md` into its durable homes (features, constraints, decisions, `data.yml`, manifest), then derives implementation tasks from them. The output must be detailed enough that any LLM can execute the tasks without further planning.
13
13
 
14
14
  ## Triggers
15
15
  - User has approved a grimoire draft and wants to plan implementation
@@ -22,9 +22,7 @@ Derive implementation tasks from approved Gherkin features and MADR decisions. T
22
22
  - User wants to review the design → `grimoire-review` (after plan, before apply)
23
23
 
24
24
  ## Prerequisites
25
- - A change exists in `.grimoire/changes/<change-id>/` with:
26
- - `manifest.md` (approved)
27
- - At least one `.feature` file or decision record
25
+ - A change exists in `.grimoire/changes/<change-id>/` with an agreed `draft.md` — the user has approved the design in `grimoire-draft`. Plan's first step **projects** that design into its homes (features, constraints, MADRs, `data.yml`, manifest); those do not need to exist yet.
28
26
 
29
27
  ## Workflow
30
28
 
@@ -56,14 +54,104 @@ The plan implements what's approved. It does not expand scope to hit a checklist
56
54
 
57
55
  These are gates, not aspirations — a task that adds a duplicate home or a reinvented wheel is rejected, not refined.
58
56
 
59
- ### 1. Select Change
60
- - List active changes in `.grimoire/changes/`
61
- - If multiple, ask user which one to plan
62
- - If only one, confirm it
57
+ ### 1. Project the Design from draft.md
58
+
59
+ **Select the change.** List active changes in `.grimoire/changes/`; if multiple, ask which to plan; if one, confirm it. Read its `draft.md` — the agreed design is the source of truth for this step. (If a change arrives already projected and has no `draft.md` — e.g. from `grimoire-refactor`, which authors its own register and artifacts — there is nothing to project: skip to step 2.)
60
+
61
+ **Project `draft.md` into its durable homes.** This is where the **fine routing** happens (each fact → its one home) and where the admission test + principles gate run. Artifacts are written **live in their real locations** on the branch — `git diff` is the staging area; there is no copy-into-the-change-folder. (Projection used to close `grimoire-draft`; it now opens plan, co-located with the planning that consumes these homes.)
62
+
63
+ First, **score the complexity level (1–4)** now that the design is settled, and write it to `manifest.md` frontmatter as `complexity: <1-4>` (use the level table in `grimoire-draft` step 2 as the rubric). Then project each kind of fact:
64
+
65
+ **Behaviors → `features/*.feature`.** For each behavioral fact in the design:
66
+
67
+ *The feature-file admission test* — a scenario may be written **only if it passes all four gates**; if it fails any, it is a constraint or a decision, not a feature:
68
+ 1. **External actor, outside the system boundary** — an end user, an operator, or a *third-party* system integrating with you does the thing. "External" means outside *your* system, not outside one module: a sibling service, an internal queue consumer, or another module in the same repo calling this one is **internal**, even though it's a separate process. Internal actor → contract test or constraint/decision, never a `.feature`.
69
+ 2. **Observable** — the actor sees the outcome without reading code or logs. "<200ms", "logs scrubbed of PII" → fails → constraint.
70
+ 3. **Domain language** — domain nouns, zero implementation detail. Names a library/log-level/table (`loguru`, `INFO`, `bcrypt`, `users` table) → fails → leaking implementation.
71
+ 4. **Survives reimplementation** — rewrite the internals from scratch; would the scenario still read the same? If it would change, it's pinned to implementation → not a feature.
72
+
73
+ **Internal protocols and service-to-service contracts are NOT features.** A change to how two of your own components talk — an internal RPC/queue/event shape, a module API, a wire format between your services — is a *contract*, verified by a contract/integration test (`verify: unit-invariant`), not by Gherkin. It fails gate 1: there is no external actor, only your own code on both ends. If a third-party integrates against the protocol it's external and may be a feature; two of your own services is internal. This is the second-biggest source of feature-file slop after invariants.
74
+
75
+ Common slop this catches: invariants (→ `constraints.md`) — "PII is scrubbed from logs", "all endpoints require auth", "responses are gzipped", "errors logged with a trace id"; internal protocols (→ contract test) — "service A publishes an OrderPlaced event B consumes", "the worker accepts a job payload with these fields", "module X returns this struct to module Y".
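The four gates above act as a conjunction: a candidate scenario is admitted only when every gate passes. A minimal Python sketch of that routing, assuming a hypothetical `Candidate` record whose boolean fields hold a reviewer's answers (none of these names come from grimoire itself):

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record of a reviewer's answers for one candidate scenario."""
    text: str
    external_actor: bool             # gate 1: actor is outside the system boundary
    observable: bool                 # gate 2: outcome visible without code or logs
    domain_language: bool            # gate 3: no library, log-level, or table names
    survives_reimplementation: bool  # gate 4: unchanged by an internal rewrite


def failed_gates(c: Candidate) -> list[int]:
    """Return the numbers of the gates the candidate fails (empty means admit)."""
    answers = [c.external_actor, c.observable,
               c.domain_language, c.survives_reimplementation]
    return [i for i, ok in enumerate(answers, start=1) if not ok]


def route(c: Candidate) -> str:
    """Admit to a .feature only when all four gates pass."""
    return "feature" if not failed_gates(c) else "not-a-feature"
```

For example, "PII is scrubbed from logs" answered as `external_actor=False, observable=False` fails gates 1 and 2, so it routes away from `features/`. The sketch does not pick between `constraints.md`, a MADR, and a contract test; that choice still follows the prose rules.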
76
+
77
+ *Extend vs. new — default is always extend; new files are the exception and require justification.* List existing feature files first (**required, not skippable** — do not write any scenario until this triage table is complete):
78
+
79
+ ```
80
+ Existing feature files:
81
+ features/auth/login.feature — "User Login"
82
+ features/billing/invoices.feature — "Invoice Management"
83
+ ```
84
+
85
+ For each scenario, decide extend-or-new and show it:
86
+
87
+ ```
88
+ "Admin resets a user's password" → extend features/auth/login.feature (same actor domain: auth)
89
+ "User configures SSO provider" → NEW (no existing file owns SSO configuration)
90
+ ```
91
+
92
+ Signals to extend: same actor, same domain object, same entry point, same HTTP resource or screen. Signals genuinely new: new actor type with no existing file, entirely new domain object, or the existing Feature title would need "and" to cover both. If unsure, extend. A new file requires stating which files were considered and why none fit.
93
+
94
+ Then write Gherkin (Feature title + user story; Background for shared preconditions; one scenario per behavior; Given/When/Then describing WHAT, never HOW). Apply security tags per `../references/security-compliance.md` (only when there's a security surface; compliance tags only when `project.compliance` is set). When design input grounded the scenarios (the change's `designs/` folder): use brand-token **names** not hex values when `.grimoire/brand/tokens.json` applies; prefer existing component names when `.grimoire/docs/components.md` exists, and flag any net-new component ("new component required — confirm before generating tasks").
95
+
96
+ **Constraints → `.grimoire/docs/constraints.md`.** Every invariant that failed the admission test (it's a security control / NFR / observability / compliance rule, not an actor-observable behavior) becomes one row: **assertion · rationale · how-verified · links**. The assertion is a flat statement ("Log output never contains PII or secrets"), not Given/When/Then. `how-verified` names the test that proves it (a `unit-invariant` this plan creates) — never a Gherkin scenario. If it stems from a decision, link the MADR; don't restate it. Create the file from `templates/constraints.md` if absent.
97
+
98
+ **Decisions → `.grimoire/decisions/NNNN-*.md`.** Project each Decisions-ledger entry (a Y-statement in `draft.md`), applying the **novelty gate**: a MADR is for a decision with a real, project-specific trade-off between viable alternatives — not for industry-default tooling picks or ecosystem-forced conventions. Ask: *would a competent engineer on this stack make a different choice, and need our reasoning to understand ours?* If no, skip it. Obvious tooling/convention picks fold into the existing `Tooling and convention baseline` ADR (one line: choice → why), not a new sequential record. Genuine trade-offs get the next sequential number, status `proposed` (`grimoire-apply` flips to `accepted` at finalize), using `.grimoire/decisions/template.md` — the Y-statement's context clause becomes the ADR's *Context and Problem Statement*.
99
+
100
+ **Data changes → `.grimoire/changes/<change-id>/data.yml`.** If the change adds/modifies/removes data models, fields, indexes, or external API integrations, write `data.yml` (same YAML shape as `schema.yml`, only what's changing, `action:` on each entry):
101
+
102
+ ```yaml
103
+ # Proposed data changes for: add-user-profiles
104
+ users:
105
+ action: modify
106
+ source: src/models/user.py
107
+ fields:
108
+ avatar_url: { action: add, type: varchar, nullable: true }
109
+ legacy_name: { action: remove }
110
+ profiles:
111
+ action: add
112
+ type: collection
113
+ fields:
114
+ user_id: { type: objectId, ref: users }
115
+ bio: { type: string, max_length: 500 }
116
+ github_api:
117
+ action: add
118
+ type: external_api
119
+ provider: GitHub
120
+ schema_ref: https://docs.github.com/en/rest
121
+ client: src/integrations/github.py
122
+ endpoints:
123
+ get_user:
124
+ method: GET
125
+ path: /users/{username}
126
+ request:
127
+ headers: { Authorization: "Bearer {token}" }
128
+ response:
129
+ login: { type: string, required: true }
130
+ avatar_url: { type: string, required: true }
131
+ name: { type: string, nullable: true }
132
+ error_response:
133
+ message: { type: string }
134
+ status: { type: integer }
135
+ ```
136
+
137
+ **Contract documentation is mandatory for external APIs.** Every endpoint must document `request` (what you send), `response` (fields you read, `required: true` for those your code depends on), and `error_response` (the error shape you handle). The task-generation step below turns this into contract tests. If you don't know the exact shape, reference `schema_ref` and document the subset your client uses — that subset is the contract. No data impact → skip `data.yml` entirely.
138
+
139
+ **Manifest (`manifest.md`).** Generate it from `draft.md` as the durable plan glue: `complexity` (just scored), Why + Non-goals, the artifact list (added/modified/removed features, decisions, constraints), and a **Prior Art** section summarizing the build-vs-buy research captured in `draft.md` (what was found/evaluated, why adopt/build/hybrid; if building, what's borrowed). **Level 3–4** also carry **Assumptions** (what must be true; mark evidence vs. unvalidated; flag unvalidated ones on the critical path) and a **Pre-Mortem** (2–5 plausible failure modes 6 months out, with mitigations or "accepted"). These come straight from the `draft.md` Decided/Open and Cut sections.
140
+
141
+ **Do NOT delete `draft.md`.** Retain it read-only as the agreed reference through the rest of plan → apply. `grimoire-apply` removes it with the change folder at finalize.
142
+
143
+ **Validate the projection** before moving on:
144
+ - `.feature` files have valid Gherkin; every Feature has a user story; every Scenario has at least Given + When + Then.
145
+ - MADR records have valid YAML frontmatter (status, date).
146
+ - Manifest is complete and accurate; `complexity` is set.
147
+ - **Re-run the admission test on every scenario you wrote**: external actor, observable, domain language, survives reimplementation. Any scenario that now fails is slop — move it to `constraints.md` or a MADR.
148
+ - **Principles gate** (`../references/principles.md`): no fact written to two homes (DRY), no second way to do an existing thing (one right way), no reinvented wheel, no artifact created past the stated scope (KISS). `draft.md` co-existing with the homes is **not** a DRY violation — it is the (soon-deleted) source the homes were projected from.
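The "every Scenario has at least Given + When + Then" check can be approximated mechanically. The sketch below is a rough line-based scan in Python, not a real Gherkin parser; the function name is hypothetical, and it ignores `Background` steps and `And`/`But` continuation lines:

```python
import re


def scenarios_missing_steps(feature_text: str) -> dict[str, set[str]]:
    """Map each Scenario title to the required step keywords it lacks."""
    required = {"Given", "When", "Then"}
    missing: dict[str, set[str]] = {}
    title = None
    for line in feature_text.splitlines():
        stripped = line.strip()
        match = re.match(r"Scenario(?: Outline)?:\s*(.*)", stripped)
        if match:
            # A new scenario starts: assume it lacks everything until seen.
            title = match.group(1)
            missing[title] = set(required)
        elif title is not None and stripped:
            # The first word of a step line is its keyword.
            missing[title].discard(stripped.split(maxsplit=1)[0])
    # Report only scenarios that still lack at least one keyword.
    return {t: kws for t, kws in missing.items() if kws}
```

An empty result means every scenario carries all three keywords. It says nothing about whether the Gherkin is otherwise valid or whether each scenario passes the admission test.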
149
+
150
+ The homes now exist; the rest of plan reads and breaks them into tasks.
63
151
 
64
152
  ### 2. Read All Artifacts
65
153
 
66
- Read the change's artifacts following `../references/artifact-map.md` — it defines what each file is, the grimoire-docs-first / graph-for-structure discipline, and the staleness gate. Plan-specific reading on top of that:
154
+ Read the change's artifacts following `../references/artifact-map.md` — it defines what each file is, the grimoire-docs-first / graph-for-structure discipline, the **reading-altitude** rule (read contracts and signatures, not internal source or unit tests), and the staleness gate. Plan-specific reading on top of that:
67
155
 
68
156
  - `.grimoire/docs/constraints.md` — any constraints (security/NFR/observability) this change touches. These produce `unit-invariant` tasks, not scenarios.
69
157
  - The current baseline (`features/`, `.grimoire/decisions/`) via `git diff main` — exactly what this change adds vs. what already existed.
@@ -120,6 +208,8 @@ Level 1-2 changes with minor gaps may proceed; level 3-4 with multiple gaps shou
120
208
  ### 4. Generate Tasks
121
209
  Create `.grimoire/changes/<change-id>/tasks.md`. **Every task produces both production code AND a test — but the test level matches the artifact the task derives from.** Tasks are structured as pairs: the failing test first, then the production code.
122
210
 
211
+ **Order tasks by the technical spine** (`../references/design-spine.md`): dependencies → data/schema → API/contract → business logic → UI by component → verification, **test-first within each layer**. This is the same order the change was designed on, so the plan's shape mirrors the design and stays predictable across changes.
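The ordering rule above amounts to a two-part sort key: spine layer first, then test before production code within the layer. A minimal Python sketch, assuming each task is a dict with hypothetical `layer` and `kind` fields (the real `tasks.md` is Markdown, so this shape is purely illustrative):

```python
# Layer names follow the paragraph above, shortened; the task dict shape is an assumption.
SPINE = ["dependencies", "data/schema", "API/contract",
         "business logic", "UI", "verification"]


def order_tasks(tasks: list[dict]) -> list[dict]:
    """Sort by spine layer, placing each layer's test tasks before its code tasks."""
    def key(task: dict) -> tuple[int, int]:
        return (SPINE.index(task["layer"]), 0 if task["kind"] == "test" else 1)
    # sorted() is stable, so tasks with equal keys keep their original order.
    return sorted(tasks, key=key)
```

A task whose `layer` is not in `SPINE` raises `ValueError`, which in this sketch is deliberate: a task that does not sit on the spine should be questioned, not sorted silently.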
212
+
123
213
  **Tag every implementation task with a `verify:` level** — this tells `grimoire-apply` which test vehicle to use. Match the artifact:
124
214
 
125
215
  | Task derives from | `verify:` | Test vehicle |
@@ -341,7 +431,7 @@ Before presenting to the user, verify the plan:
341
431
  - [ ] Every test task describes what to assert (no "write a test")
342
432
  - [ ] Every implementation task describes what to create/modify (no "add the code")
343
433
  - [ ] The verification section has the exact commands to run
344
- - [ ] Tasks are ordered: shared steps → test → production code → verification
434
+ - [ ] Tasks follow the technical-spine order (`../references/design-spine.md`): dependencies → data/schema → API/contract → logic → UI → verification, test-first within each layer
345
435
  - [ ] No task requires the LLM to make architectural decisions — those should already be in the ADR
346
436
  - [ ] **Principles gate** (`../references/principles.md`): no task introduces a duplicate home for an existing fact (DRY), a second way to do an existing thing (one right way), a reinvented wheel where a tool/library/proven pattern exists (don't reinvent), or an abstraction/dependency justified only by a hypothetical (KISS). Any that does has a stated reason.
347
437
 
@@ -367,10 +457,10 @@ Check `.grimoire/config.yaml` for the configured agents:
367
457
  - If the user has configured separate thinking/coding agents, note this in the tasks.md header so the apply stage knows which agent to use
368
458
 
369
459
  ## Important
370
- - **Specificity is the whole point.** A vague plan is worse than no plan — it gives false confidence and the LLM will re-plan anyway. Every task must be executable without thinking.
460
+ - **Specificity is the whole point.** A vague plan is worse than no plan — it gives false confidence and the LLM will re-plan anyway. Every task must be executable without thinking. "Implement the feature" is not a task — it's the *Skipping the plan / vague tasks* rationalization in `../references/red-flags.md`.
371
461
  - Tasks should be small and specific — one logical unit of work each
372
462
  - Every task traces back to a scenario or decision
373
- - Order matters: dependencies first, verification last
463
+ - Order matters: tasks follow the technical-spine order (`../references/design-spine.md`); verification last
374
464
  - Don't generate tasks for things that already work (check the baseline)
375
465
  - Read the actual codebase before writing tasks. Reference real file paths, real patterns, real conventions. Don't guess.
376
466
 
@@ -177,9 +177,11 @@ Recommendation: Fix blockers, then proceed to apply.
177
177
  ## Important
178
178
  - This is a design review, not a code review. Focus on the specifications and plan, not hypothetical implementation details.
179
179
  - Be direct. Don't pad findings with praise or soften blockers. The goal is to catch problems before code is written, when they're cheap to fix.
180
+ - **Ground every finding — ladder it.** Use the interview techniques in `../references/elicitation-personas.md` pointed at the design: ladder a decision *up* to the goal it serves (or expose it as Tunnel Vision), ladder a finding *down* to the concrete behavior that breaks, 5-Whys it to root cause. Also check each decision walks the spine in context (`../references/design-spine.md`) — names its layer, validates the prior. A laddered finding earns its severity; a bare verdict doesn't.
180
181
  - A blocker means "if we code this as-is, we'll have to come back and redo work." A suggestion means "this would improve the design but isn't blocking."
181
182
  - Keep each persona's review focused and short. Three bullet points that matter are better than ten that don't.
182
183
  - If the change is trivial (e.g., rename a field, fix a typo in a feature), say so and don't manufacture issues.
184
+ - **Don't self-exempt by feel.** "It looks fine" / "I reviewed as I wrote it" are the *Skipping review* rationalization in `../references/red-flags.md`. Trivial-exempt is the skill's call, not a vibe.
183
185
  - All persona evaluation criteria, the materiality gate, the briefing structure, and the complexity-depth table live in `../references/review-personas.md`. Don't duplicate them here — read that file when running a persona.
184
186
 
185
187
  ## Done
@@ -118,9 +118,27 @@ For each step definition:
118
118
  - **[warning]** `test_auth.py:58` — step "Then user should exist" only asserts `is not None` — check the actual user properties
119
119
  ```
120
120
 
121
+ **Regression coverage:** When verifying a bug fix, confirm the fix ships with a regression test **named after the bug** (see `grimoire-bug`). A bug fix with no test that goes red-without-the-fix and pins the defect → WARNING — the bug can silently return.
122
+
121
123
  If `grimoire test-quality` CLI command is available, suggest running it for a comprehensive analysis.
122
124
  To run tests directly: use `config.tools.bdd_test` for BDD and `config.tools.unit_test` for unit tests.
123
125
 
126
+ ### 3.E Behavioral Verification *(optional — user-facing changes only)*
127
+
128
+ Sections 3.B–3.D verify statically (code exists, asserts, follows decisions) and run the configured suites. They do **not** drive the running app. When the change is user-facing and the app can be run, add a behavioral pass; otherwise skip and say so. This mode adds **no mandatory dependency** — if there's no way to drive the app, mark it INCONCLUSIVE and rely on 3.A–3.D.
129
+
130
+ **Read-only by default.** Read-only navigation/inspection needs no opt-in. Any state-changing action requires explicit user opt-in **and** a non-production target (local/staging URL, seeded creds). Never run mutations against production.
131
+
132
+ **Verdict.** Every behavioral pass ends in exactly one:
133
+ - **SHIP** — behavior matches the spec; no material issues.
134
+ - **SHIP WITH FIXES** — works, with the non-blocking issues listed.
135
+ - **DO NOT SHIP** — a scenario's promised outcome does not hold.
136
+ - **INCONCLUSIVE** — could not verify (no baseline, app wouldn't run, tooling absent).
137
+
138
+ **No baseline ⇒ INCONCLUSIVE, never a silent PASS.** Same rule as §3.C2: without a reference state you cannot claim behavior is correct. Report INCONCLUSIVE and fall back to static verification — do not dress up "I couldn't check" as a pass.
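The verdict rules above can be sketched as a small decision function. This is a minimal illustration, not part of grimoire: the names `Verdict` and `behavioral_verdict` and the four inputs are hypothetical, chosen only to show that a missing baseline or an app that would not run short-circuits to INCONCLUSIVE before any pass/fail judgment is made.

```python
from enum import Enum


class Verdict(Enum):
    SHIP = "SHIP"
    SHIP_WITH_FIXES = "SHIP WITH FIXES"
    DO_NOT_SHIP = "DO NOT SHIP"
    INCONCLUSIVE = "INCONCLUSIVE"


def behavioral_verdict(has_baseline, app_ran, failed_scenarios, minor_issues):
    """Return exactly one verdict for a behavioral pass (hypothetical helper)."""
    # Without a reference state, or without a running app, nothing was
    # observed, so the result is never reported as a pass.
    if not has_baseline or not app_ran:
        return Verdict.INCONCLUSIVE
    # A scenario whose promised outcome does not hold blocks shipping.
    if failed_scenarios:
        return Verdict.DO_NOT_SHIP
    # The behavior works, but non-blocking issues must be listed.
    if minor_issues:
        return Verdict.SHIP_WITH_FIXES
    return Verdict.SHIP
```

The order of the checks carries the rule: the INCONCLUSIVE test comes first, so an empty failure list can never be mistaken for a verified pass.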
139
+
140
+ **Click-path final-state check.** For each touchpoint the change affects, build a side-effect map — `action → {state it sets, state it resets}` — then trace the sequence and ask: *is the FINAL state what the label/spec promises?* This catches the silent-undo class (action B resets what action A just set) that static reading and single-assert tests miss.
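A minimal sketch of the side-effect map and final-state trace described above. Everything here is an assumption made for illustration: the `Effect` class, the action names (`apply_filter`, `change_tab`), and the state keys are hypothetical, and within a single action the sketch applies sets before resets.

```python
from dataclasses import dataclass, field


@dataclass
class Effect:
    sets: dict = field(default_factory=dict)  # state keys the action assigns
    resets: tuple = ()                        # state keys the action clears


def final_state(side_effects, click_path, initial=None):
    """Trace the click path and return the state left at the end."""
    state = dict(initial or {})
    for action in click_path:
        effect = side_effects[action]
        state.update(effect.sets)
        for key in effect.resets:
            state.pop(key, None)
    return state


def silent_undos(side_effects, click_path, promised):
    """Return {key: (promised, actual)} for every promise the final state breaks."""
    actual = final_state(side_effects, click_path)
    return {k: (v, actual.get(k)) for k, v in promised.items() if actual.get(k) != v}


# Hypothetical touchpoint: changing tabs quietly clears the filter just applied.
SIDE_EFFECTS = {
    "apply_filter": Effect(sets={"filter": "active"}),
    "change_tab": Effect(sets={"tab": "archive"}, resets=("filter",)),
}
broken = silent_undos(
    SIDE_EFFECTS,
    ["apply_filter", "change_tab"],
    promised={"filter": "active", "tab": "archive"},
)
```

Here `broken` reports the `filter` key: each action passes its own single-assert test, yet the final state no longer matches what the sequence promises.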
141
+
124
142
  ### 4. Security Compliance Verification
125
143
 
126
144
  Verify that security guidance from plan and review stages was followed in implementation. Read `../references/security-compliance.md` for the full checklist.
@@ -222,6 +240,7 @@ Produce a structured report:
222
240
  - Scenarios verified: X
223
241
  - Decisions verified: X
224
242
  - Security checks: X passed, X failed
243
+ - Behavioral verdict: <SHIP | SHIP WITH FIXES | DO NOT SHIP | INCONCLUSIVE | n/a (static only)>
225
244
  - Issues found: X critical, X warnings, X suggestions
226
245
 
227
246
  ## Critical Issues
@@ -255,6 +274,7 @@ Based on the report:
255
274
 
256
275
  ## Important
257
276
  - Verify is read-only. Do NOT fix issues — only report them. The user decides what to do.
277
+ - **"Should pass" is not evidence.** Declaring done without running is the *Declaring done without verifying* rationalization in `../references/red-flags.md`. Observe state, don't predict it.
258
278
  - Be specific: reference file paths and line numbers for every issue.
259
279
  - A scenario without a step definition is always CRITICAL — the spec is not tested.
260
280
  - A step definition with no assertions is always CRITICAL — it's a false positive.
@@ -8,7 +8,7 @@ Loaded by skills that read a change's specs before acting (`grimoire-plan`, `gri
8
8
 
9
9
  Per-change (under `.grimoire/changes/<change-id>/`):
10
10
 
11
- - **`draft.md`** — the living design doc the change was designed on (diagram/sketch, decision ledger, pseudo-code, Decided/Open ledger). The single source the other artifacts were **projected** from at the end of `grimoire-draft`. Ephemeral: retained read-only as the agreed-design reference through the pipeline, deleted when `grimoire-apply` clears the change folder. Read it for the *intent and rationale* behind the projected artifacts; the features/constraints/decisions remain the authoritative homes.
11
+ - **`draft.md`** — the living design doc the change was designed on (diagram/sketch, decision ledger of Y-statements, pseudo-code, Decided/Open ledger). The single source the other artifacts are **projected** from at the start of `grimoire-plan`. Ephemeral: retained read-only as the agreed-design reference through the pipeline, deleted when `grimoire-apply` clears the change folder. Read it for the *intent and rationale* behind the projected artifacts; the features/constraints/decisions remain the authoritative homes.
12
12
  - **`manifest.md`** — change summary, complexity level, and the Why. Level 3-4 also carry Assumptions, Pre-Mortem, and **Prior Art** (the build-vs-buy rationale). Generated from `draft.md` at projection.
13
13
  - **`features/*.feature`** — behavioral specifications. Edited live in `features/` on the branch.
14
14
  - **decision records** — architectural choices for this change, edited live in `.grimoire/decisions/`, including Cost of Ownership sections.
@@ -35,6 +35,18 @@ Project-wide (under `.grimoire/`):
35
35
 
36
36
  ---
37
37
 
38
+ ## Reading altitude — design reads contracts, debugging reads internals
39
+
40
+ When you read code during **design** (`grimoire-draft`, `grimoire-design`, `grimoire-plan`), read at the **published-interface altitude** — what a caller needs to integrate, not how the callee works inside:
41
+
42
+ - **Third-party library or service** — its public API, types, and docs. Not its source, and not its tests. The contract is what you design against; the internals are the maintainer's concern.
43
+ - **Your own system** — the touched area's exported symbols, API endpoints, and data schema, plus the relevant feature files and `constraints.md`. Not the whole backend's source, and not its unit tests.
44
+ - **Prefer the graph for structure without bodies.** `search_graph` / `get_architecture` give signatures, callers, and call edges — the *shape* of the interface — without spending context on implementation bodies. That is the altitude design needs.
45
+
46
+ **Reading full source bodies and unit tests is a *debugging* activity** — justified when a behavior is wrong and you need root cause (`grimoire-bug`), not when you need to know how an interface is used. In design the question is "what is the contract?", and the contract lives in signatures, schemas, and specs — not in line-by-line implementation. Exhaustively reading internals at design time burns context and rarely improves the design. This is the rule above, sharpened: even when you *do* read source, read the seam, not the guts.
47
+
48
+ ---
49
+
38
50
  ## Staleness gate
39
51
 
40
52
  For each area doc you load, compare its `last_updated` against `git log -1 --format=%ci <directory>`. If the doc is older than the most recent commit to its directory, it's stale — its paths, utility names, and patterns may be wrong.
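The comparison above can be sketched in Python. This is an illustration under stated assumptions: the helper names are hypothetical, `last_updated` is assumed to be an ISO 8601 date or datetime string, and a value without a timezone is treated as UTC. The only external call is `git log -1 --format=%ci`, whose output looks like `2024-05-01 12:34:56 +0200`.

```python
import subprocess
from datetime import datetime, timezone


def last_commit_time(directory):
    """Committer date of the newest commit touching `directory`, or None if there is none."""
    out = subprocess.run(
        ["git", "log", "-1", "--format=%ci", "--", directory],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    return datetime.strptime(out, "%Y-%m-%d %H:%M:%S %z") if out else None


def is_stale(last_updated, last_commit):
    """True when the doc's `last_updated` is older than the latest commit."""
    if last_commit is None:
        return False  # no commits to the directory, so nothing can be newer
    doc_time = datetime.fromisoformat(last_updated)
    if doc_time.tzinfo is None:
        # Assumption: a bare date or naive timestamp is interpreted as UTC.
        doc_time = doc_time.replace(tzinfo=timezone.utc)
    return doc_time < last_commit
```

A date-only `last_updated` resolves to midnight, so a commit later the same day marks the doc stale. That errs on the side of re-checking the doc.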
@@ -0,0 +1,166 @@
1
+ # Design Spine
2
+
3
+ The ordered path a design walks — in the interview (`grimoire-draft`, `grimoire-design`), in
4
+ evaluation (`grimoire-review`), and in task order (`grimoire-plan`). One spine, walked the
5
+ same way everywhere, so the structure is predictable: the user learns where each kind of
6
+ decision happens.
7
+
8
+ This is the single home for *how* a design proceeds. Skills cite the relevant section; none
9
+ restate it (DRY). It pairs with `principles.md` (what every artifact must satisfy) and
10
+ `red-flags.md` (the excuses to skip a stage) — this file is the *sequence* the work follows.
11
+
12
+ The methods below are named on purpose. They are established practice — lean on the name
13
+ (Working Backwards, inside-out layering, stepwise refinement, Y-statement) so the work
14
+ inherits a known, well-understood discipline instead of an ad-hoc one.
15
+
16
+ ---
17
+
18
+ ## Bias: complete, not over-built
19
+
20
+ The spine and its personas are a **surfacing** tool — they raise candidates (steps,
21
+ constraints, "what happens if" cases). Surfacing tools bias toward *more*: every question
22
+ invites a handler, every constraint feels like rigor. Left ungoverned, walking the spine
23
+ *manufactures* the over-engineering YAGNI warns against. So the walk has one governing rule:
24
+
25
+ **Surface broadly, build narrowly.** The spine raises the candidate; `principles.md` §4
26
+ (KISS/YAGNI) decides whether it earns a place. They are a pair — never run the surfacing
27
+ without the prune.
28
+
29
+ - **Every surfaced item gets a disposition, and the default is *don't build it*.** Each
30
+ candidate resolves to one of: *build* (a present, concrete need requires it), *won't build*
31
+ (record as a one-line non-goal), or *defer*. When unsure, it's a non-goal. Recording "we
32
+ considered X and chose not to" is completeness; building X "just in case" is not.
33
+ - **Lean simple when you must lean.** Under-building is cheap to add later; over-building is
34
+ expensive to remove. If the call is genuinely balanced, choose the simpler, less-complete
35
+ option and say so — complexity layers in cleanly later, but rarely comes back out.
36
+ - **"Complete" means the stated outcome plus the failures whose cost the user would actually
37
+ feel — not every conceivable case.** Completeness is measured against the outcome, not
38
+ against an exhaustive enumeration of edge cases.
39
+ - **Constraints are surface area, not virtue.** Each must earn its place with a present,
40
+ concrete *why* (a downstream need, a real corruption risk). A constraint justified only by
41
+ "might need to" is YAGNI — cut it. This is why the ceremony gate counts constraints as a
42
+ *cost* signal, not a rigor score.
43
+
44
+ ---
45
+
46
+ ## Pick the spine
47
+
48
+ Two spines. Pick by what the change touches; **once picked, always walk it in order** (next
49
+ section). A mixed change uses the technical spine and expands its UI layer with the UX spine.
50
+
51
+ | Change | Spine | Home skill |
52
+ |--------|-------|-----------|
53
+ | User-facing flow / screen / UI | **UX-workflow spine** | `grimoire-design` |
54
+ | Behavioral / technical (API, data, logic) | **technical spine** | `grimoire-draft` |
55
+ | Mixed (UI + backend) | **technical**, UI layer via UX | draft + design for the UI layer |
56
+
57
+ ### UX-workflow spine — traversal direction
58
+ Walk the user's process in one of two directions. State which you're using.
59
+
60
+ - **Backward — "Working Backwards"** (Cooper goal-directed design; Amazon PR/FAQ). Start at
61
+ the **goal / end-state** (a JTBD outcome: "when *situation*, I want *motivation*, so I can
62
+ *outcome*"). At each step ask **"what must be true for the user to reach here?"** Best when
63
+ the goal is clear but the path is contested — it surfaces unknown prerequisites and prunes
64
+ steps that serve no downstream need.
65
+ - **Forward — "forward chaining"** (skills-forward). Start from **what the user reasonably
66
+ knows or has at the start** and step toward the goal. Best when the starting state is well
67
+ defined but the goal is emergent, or when documenting an existing happy path.
68
+
69
+ **Reconcile (the discipline):** define the end-state, chain **backward** to the required
70
+ prerequisites, then **validate forward** by walking a real user from their actual starting
71
+ knowledge (the Mom Test, see `elicitation-personas.md`). The points where the two traversals
72
+ don't meet are the missing or assumed steps; capture each as an Open row.
73
+
74
+ ### Technical spine — layer order
75
+ Design **process/constraints → data model → API/contract → UI, component by component.** This
76
+ is **inside-out** layering (DDD layered architecture / Clean Architecture dependency rule):
77
+ the domain and its rules sit at the core; the API and UI are outer detail that depend inward,
78
+ never the reverse.
79
+
80
+ | Layer | What you settle here |
81
+ |-------|----------------------|
82
+ | **1 · Process / constraints** | The invariants and limits that must always hold — business rules, security controls, NFRs, what must *never* happen (data-corruption guards). These bound everything downstream. |
83
+ | **2 · Data model** | Entities, fields, relationships, and each field's constraints (required / unique / nullable / range). Every constraint states its *why* — the downstream need or corruption risk that justifies it (→ a `constraints.md` row). |
84
+ | **3 · API / contract** | The interface other code or clients use. Design it as a deliberate contract that **hides** the data model — this reconciles "data-first" with "API-first": the API is a versioned abstraction over the schema, not a mirror of it. |
85
+ | **4 · UI, by component** | The surface, one component at a time. For a user-facing surface, expand this layer with the UX-workflow spine above. |
86
+
87
+ **Each layer constrains the next** (*stepwise refinement* — every decision narrows the
88
+ solution space for the layers below). **Building a layer validates the prior** (*consumer-
89
+ driven contracts* / model–implementation feedback): designing the API tests the data model;
90
+ designing the UI tests the API. When a lower layer can't satisfy an upper one, the upper one
91
+ was wrong — go back and fix it, don't patch around it downstream.
92
+
93
+ ## Walk it — always, in order
94
+
95
+ Whatever spine is chosen, **traverse its layers/steps in order; do not jump around.** At each
96
+ layer:
97
+
98
+ 1. **Elicit** with that layer's lens — personas are the *who* (`elicitation-personas.md`),
99
+ techniques are the *how* (laddering / Mom Test / 5 Whys, same file).
100
+ 2. **Record decisions** for the layer in the `draft.md` ledger as Y-statements (below).
101
+ 3. **Validate the prior** layer against what you just learned — restate the check explicitly
102
+ ("this required field traces to *downstream need X*"; "this data model satisfies process
103
+ constraint *C*"). A failed validation sends you back up, not forward.
104
+
105
+ An **empty layer is fine** — say so in one line and skip (a change with no data impact skips
106
+ layer 2). Skipping is a stated call, never a silent omission.
107
+
108
+ ## Ceremony gate — scale to the constraints, not first impressions
109
+
110
+ Full layer-by-layer ceremony is for changes that earn it; trivial ones don't.
111
+
112
+ - **Default (lightweight):** walk the spine, but elicit only what the change needs and skip
113
+ empty layers in one line. Most level-1–2 changes finish here.
114
+ - **Escalate to full ceremony when the change introduces more than 2 constraints** (data
115
+ invariants, security controls, cross-layer dependencies). The **3rd** new constraint is the
116
+ signal that real complexity is hiding — from there, walk every layer formally, record a
117
+ decision per layer, and (level 3–4) add the manifest Pre-Mortem.
118
+
119
+ Constraint count is a measurable trigger that complements the complexity level: complexity is
120
+ an *output* of design, and the count is that output surfacing mid-interview. A nominally
121
+ "simple" change that trips the gate is not simple — let the count, not the first impression,
122
+ set the depth.
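The gate above reduces to a count check. The sketch below is illustrative only: `ceremony_depth` and its return shape are hypothetical names, and the constraint list stands for whatever new constraints the interview has surfaced so far.

```python
def ceremony_depth(new_constraints, complexity_level):
    """Pick the ceremony depth from the count of newly introduced constraints."""
    # More than 2 new constraints: the 3rd one is the escalation signal.
    full = len(new_constraints) > 2
    return {
        "ceremony": "full" if full else "lightweight",
        # The manifest Pre-Mortem is added only for level 3-4 changes.
        "pre_mortem": full and complexity_level >= 3,
    }


# A nominally simple (level 2) change that still trips the gate.
depth = ceremony_depth(
    ["email must be unique", "tokens expire in 15 minutes", "audit every role change"],
    complexity_level=2,
)
```

The first impression (level 2) does not lower the depth: the count alone sets `ceremony` to `full`, and the level only decides whether the Pre-Mortem is added.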
123
+
124
+ ## Decisions — Y-statement, in context
125
+
126
+ Every decision in the `draft.md` ledger is recorded as a **Y-statement**, so its context is
127
+ forced into the record. This defeats the *Tunnel Vision* anti-pattern — a decision that reads
128
+ well in isolation but is wrong for the surrounding workflow:
129
+
130
+ > **D*n*:** In the context of *spine layer / use-case*, facing *concern / force*, we chose
131
+ > *option* over *alternatives*, to achieve *quality*, accepting *downside* — **because
132
+ > *why*.**
133
+
134
+ The **because** is mandatory. The **context** clause ties the decision to the spine layer it
135
+ emerged from, so the user evaluates it *in the situation it serves*, not as an abstract claim
136
+ ("sounds good" decided in a vacuum is exactly how ADRs go wrong). Coupled decisions
137
+ cross-reference by ID (D7 cites D3). At projection (`grimoire-plan`'s first step) each ledger entry
138
+ maps cleanly to a MADR: the context clause becomes the ADR's *Context and Problem Statement*,
139
+ the chosen/neglected options its *Considered Options / Decision Outcome*, the accepted
140
+ downside its *Consequences*. The novelty gate still applies — industry-default picks fold into
141
+ the baseline ADR, not a new record.
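One way to picture a ledger entry and its projection is a small record type. This is a sketch under assumptions, not grimoire's data model: the `YStatement` class and its field names are hypothetical, the rendered sentence joins the final clause with a comma, and placing the concern alongside the context in the first MADR section is a choice made here for illustration.

```python
from dataclasses import dataclass


@dataclass
class YStatement:
    id: str
    context: str        # spine layer / use-case
    concern: str
    chosen: str
    alternatives: list
    quality: str
    downside: str
    because: str

    def __post_init__(self):
        # The "because" clause is mandatory.
        if not self.because.strip():
            raise ValueError(f"{self.id}: a Y-statement needs a 'because'")

    def render(self):
        return (
            f"**{self.id}:** In the context of {self.context}, facing {self.concern}, "
            f"we chose {self.chosen} over {', '.join(self.alternatives)}, "
            f"to achieve {self.quality}, accepting {self.downside}, "
            f"because {self.because}."
        )

    def to_madr(self):
        """Map the entry onto MADR section headings."""
        return {
            "Context and Problem Statement": f"{self.context}: {self.concern}",
            "Considered Options": [self.chosen, *self.alternatives],
            "Decision Outcome": f"{self.chosen}, because {self.because}",
            "Consequences": self.downside,
        }


d3 = YStatement(
    id="D3",
    context="data model / invoice storage",
    concern="duplicate ledgers",
    chosen="a unique constraint on invoice number",
    alternatives=["application-level check"],
    quality="integrity under concurrent writes",
    downside="a migration on the existing table",
    because="the billing job reads invoice number as a key",
)
```

Rejecting an empty `because` at construction time mirrors the rule that the clause is mandatory: an entry without it never reaches the ledger.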
142
+
143
+ ## Plan task order follows the technical spine
144
+
145
+ `grimoire-plan` emits tasks in the **same layer order** the change was designed on, so the
146
+ plan's shape mirrors the design:
147
+
148
+ ```
149
+ dependencies → data/schema → API/contract → business logic → UI by component → verification
150
+ ```
151
+
152
+ Within each layer, **test first** — the failing test for that layer-pair before its code.
153
+ This is the per-layer red-green unit, not a single global acceptance test up front. The order
154
+ is fixed on purpose: over time the user learns that schema changes always land before API
155
+ changes, contract tests before clients, UI last.
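The ordering rule can be expressed as a sort key. The sketch below assumes a hypothetical task shape (`layer`, `pair`, `kind`, `name`); it is not the `tasks.md` format. It sorts by spine layer, keeps each test/code pair together, and puts the failing test before its production code.

```python
SPINE = [
    "dependencies",
    "data/schema",
    "API/contract",
    "business logic",
    "UI by component",
    "verification",
]


def order_tasks(tasks):
    """Sort tasks by spine layer, then by pair, with the test first in each pair."""
    return sorted(
        tasks,
        key=lambda t: (SPINE.index(t["layer"]), t["pair"], t["kind"] != "test"),
    )


tasks = [
    {"layer": "API/contract", "pair": 1, "kind": "code", "name": "add endpoint"},
    {"layer": "verification", "pair": 1, "kind": "code", "name": "run suites"},
    {"layer": "data/schema", "pair": 1, "kind": "code", "name": "add column"},
    {"layer": "API/contract", "pair": 1, "kind": "test", "name": "contract test"},
    {"layer": "data/schema", "pair": 1, "kind": "test", "name": "migration test"},
]
ordered = [t["name"] for t in order_tasks(tasks)]
```

Sorting on the pair number before the test/code flag keeps the unit per-layer and per-pair, so a layer with several pairs still alternates test, code, test, code rather than front-loading every test.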
156
+
157
+ ---
158
+
159
+ ## How skills cite this
160
+
161
+ - **design** — UX-workflow spine (traversal direction) at the user-flow step.
162
+ - **draft** — pick + walk the spine in the design loop; Y-statement ledger; ceremony gate.
163
+ - **plan** — task order = technical-spine order; test-first per layer.
164
+ - **review / verify** — check that decisions name their context and each layer validates the prior.
165
+
166
+ Each skill links its own section; none restate the spine. This is the one home (DRY).
@@ -6,7 +6,7 @@ Persona-driven questions to surface requirements. Used by draft (gather requirem
6
6
 
7
7
  - **In draft**: Ask these questions to gather requirements before drafting.
8
8
  - **In plan**: Use as a completeness checklist — flag gaps in the specs, don't ask the user.
9
- - **In review**: Use as evaluation criteria — check if the design addresses each concern.
9
+ - **In review**: Use as evaluation criteria — check if the design addresses each concern. Apply the interview techniques below to *interrogate the design itself* (not the user) so a finding is reasoned, not asserted.
10
10
 
11
11
  ## Depth by Complexity Level
12
12
 
@@ -19,6 +19,27 @@ Persona-driven questions to surface requirements. Used by draft (gather requirem
19
19
 
20
20
  Don't ask every question — only ask questions whose answers aren't already clear.
21
21
 
22
+ **Questions surface candidates, not requirements.** A persona question raises something to *decide*, and the default decision is "out of scope" — record it as a one-line non-goal and move on. Don't convert every answered question into built handling or a new constraint; that is how eliciting turns into over-engineering. Promote to a scenario or constraint only what the stated outcome actually needs — everything else is a non-goal. See the simplicity bias in `design-spine.md`.
23
+
24
+ ## Interview Techniques — How to Ask
25
+
26
+ The personas are the *who* (which concerns to probe); these are the *how* (question shapes that surface a real answer). Reach for them especially when walking a spine in `design-spine.md`.
27
+
28
+ - **Laddering** — the bidirectional workhorse. Ladder **up** ("why is that important to you?") to climb from a feature to the goal it serves; ladder **down** ("how would you do that?" / "what would that look like?") to descend from a goal to a concrete step. Up drives the *backward* UX traversal; down drives the *forward* one.
29
+ - **The Mom Test** — anchor every question in **past behavior, not hypotheticals**: "When was the last time you hit this? Walk me through exactly what you did." Beats "would you use…?" / "is this a good idea?" — those invite flattery, not facts. The best way to reconstruct the *real* workflow instead of the imagined one.
30
+ - **Critical Incident Technique** — ask for a **specific real incident** in detail ("tell me about the last time X went wrong"). People recall concrete incidents far more accurately than general process, so this surfaces edge cases and the actual happy path.
31
+ - **5 Whys** — repeat "why" to climb from a symptom or step to its root motivation. Use it to find the end-state a backward traversal should start from.
32
+
33
+ Don't run all four — pick the one that fits: laddering to trace a workflow up or down the spine, Mom Test / CIT to reconstruct what really happens, 5 Whys to find the underlying goal.
34
+
35
+ **In review, point these inward at the design.** They make a critique credible — reasoned rather than asserted:
36
+
37
+ - **Ladder a decision up** — does it trace to a real goal, or is it *Tunnel Vision* (reads well in isolation, wrong for the surrounding workflow)? A decision whose ladder-up dead-ends serves no goal.
38
+ - **Ladder a finding down** — "how does this concretely fail?" A finding you can ladder to a specific broken behavior is real; one you can't is a hunch — say so or drop it.
39
+ - **5-Whys a finding to root cause** before grading it, so the blocker names the underlying defect, not a symptom.
40
+
41
+ A laddered finding ("fails because → because → because") earns its severity. A bare verdict ("[blocker] this is wrong") does not — and the materiality gate in `review-personas.md` rejects it.
42
+
22
43
  ## Outcome & Scope — Always Ask First
23
44
 
24
45
  Before diving into persona questions, establish the outcome and boundaries. These two questions prevent the most common spec failures — building the wrong thing and building too much:
@@ -48,6 +69,7 @@ Ask when: the change has user-facing behavior.
48
69
 
49
70
  Ask when: the change introduces new components, services, dependencies, or data flows.
50
71
 
72
+ - **Follow the technical spine** (`design-spine.md`): design inside-out — process/constraints → data model → API contract → UI, each layer constraining the next. Flag anything that inverts it: a UI shape dictating the data model, or an API that mirrors the schema instead of abstracting over it.
51
73
  - What's the deployment context? Does this run in the same service or cross service boundaries?
52
74
  - What existing components does this touch? Are there shared modules, APIs, or databases involved?
53
75
  - Are there concurrency concerns? Multiple users or processes acting on the same data?
@@ -75,6 +97,7 @@ Ask when: the change involves authentication, authorization, user input, sensiti
75
97
 
76
98
  Ask when: the change has complex behavior, multiple paths, or integration points.
77
99
 
100
+ - **Walk the workflow step by step and ask *what happens if* at each step** — the user abandons here, the input is invalid, the network drops, the action runs twice, they arrive without the previous step's precondition? This surfaces candidates, not requirements: **most answers are "nothing — out of scope," recorded as a one-line non-goal.** Promote a "what happens if" to a scenario or constraint only when its failure carries a cost the user would actually feel for *this* change (the simplicity bias in `design-spine.md`).
78
101
  - What are the boundary values? Min/max lengths, zero vs. one, empty collections, null states?
79
102
  - What are the timing edge cases? Concurrent edits, race conditions, timeout during processing?
80
103
  - What external dependencies could fail? How should the system behave when they do — retry, fallback, error?
@@ -85,10 +108,11 @@ Ask when: the change has complex behavior, multiple paths, or integration points
85
108
  Ask when: the change creates, modifies, or removes data models, or integrates with external APIs.
86
109
 
87
110
  - What data entities are involved? What are the relationships between them?
88
- - What are the field constraints? Required, unique, nullable, max length, valid ranges, enums?
111
+ - What are the field constraints — required, unique, nullable, max length, valid ranges, enums? For each, **what is the *why*** — the downstream need it serves or the corruption risk it prevents ("required because the billing job reads it", "unique to stop duplicate ledgers")? Record that justification as the rationale on a `constraints.md` row, not as an unstated assumption.
89
112
  - How does this data grow? Is there a retention policy, archival strategy, or cleanup needed?
90
113
  - Is there existing data that needs migrating? Can the migration run live or does it need downtime?
91
114
  - Are there external API contracts? What fields does the client read, and what happens if the schema changes?
115
+ - **Does the data model supply everything the API and UI need?** For each field an endpoint or screen requires, confirm a backing source exists in the model with the right type and nullability — the data layer *validates* against the layers above it (`design-spine.md`). A required API/UI value with no data-model source is a gap to fix in the model now, not a `null` to discover in production.
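The last check in the list above, whether the data model backs every value the API and UI require, can be sketched as a comparison of two field maps. The dictionary shape (`type`, `nullable`) and the field names are hypothetical, chosen only to illustrate the gap types named there.

```python
def model_gaps(model, required):
    """List (field, reason) for each required API/UI value the model cannot back."""
    gaps = []
    for name, need in required.items():
        have = model.get(name)
        if have is None:
            gaps.append((name, "no backing field"))
        elif have["type"] != need["type"]:
            gaps.append((name, "type mismatch"))
        elif have["nullable"] and not need["nullable"]:
            gaps.append((name, "nullable source for a required value"))
    return gaps


# Hypothetical model and the values an endpoint needs from it.
MODEL = {
    "email": {"type": "str", "nullable": False},
    "display_name": {"type": "str", "nullable": True},
}
NEEDED = {
    "email": {"type": "str", "nullable": False},
    "display_name": {"type": "str", "nullable": False},
    "avatar_url": {"type": "str", "nullable": True},
}
gaps = model_gaps(MODEL, NEEDED)
```

Each reported gap is a fix to make in the model during design, the point at which the data layer is validated against the layers that consume it.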
92
116
 
93
117
  ## Requirements Summary Template
94
118
 
@@ -0,0 +1,62 @@
1
+ # Anti-Rationalization Red Flags
2
+
3
+ Under time pressure, sunk cost, or false confidence, an AI talks itself out of the
4
+ discipline it was told to follow — and does it convincingly. The rationalizations are
5
+ predictable and few. This file names them.
6
+
7
+ This is the single home for the "excuses to skip a stage." It is the sibling of the
8
+ **Anti-Loop Protocol** in `AGENTS.md`: that governs loops *inside* a stage; this governs
9
+ skipping a stage *entirely*. Skills cite the relevant section rather than restating it.
10
+
11
+ **The rule:** when you catch yourself forming one of these thoughts, that is the signal to
12
+ *do the step*, not skip it. The urge to skip is the evidence the step is needed. A stage is
13
+ skipped only by an explicit, recorded decision (a gate that says skip) — never by a silent
14
+ rationalization that it "isn't worth it this time."
15
+
16
+ ---
17
+
18
+ ## Skipping the spec (→ grimoire-draft)
19
+ **Catch yourself saying:** "Too small / too obvious to draft." · "I already know what to build." · "I'll spec it after I see it work."
20
+ **Why it's wrong:** "Too small" is a complexity judgment, and complexity is an *output* of design, not an input — you can't score it honestly before designing. Code-first means the spec gets reverse-engineered to match whatever you built, so it documents the bug instead of catching it.
21
+ **Instead:** Run the triviality gate (draft step 2). If it's genuinely trivial (config/typo/single-file) the gate says skip — that's a *recorded* call. Anything else: draft it.
22
+
23
+ ## Silently filling a gap (→ grimoire-draft)
24
+ **Catch yourself saying:** "A reasonable default is obvious here." · "I understand enough, no need to ask." · "I'll just assume X."
25
+ **Why it's wrong:** A silent assumption the user would have corrected becomes a bug whose paper trail says "intended." One question costs seconds; a wrong guess costs a rebuild.
26
+ **Instead:** Ask it (an *Open* row), defer it to a non-goal, or record the inference explicitly as `RESOLVED: defaulting to X per delegation`. Never leave an unrecorded guess in the design.
27
+
28
+ ## Skipping the plan / vague tasks (→ grimoire-plan)
29
+ **Catch yourself saying:** "Planning is overhead, I'll work it out as I go." · "The task is just 'implement the feature'."
30
+ **Why it's wrong:** A vague plan is worse than none — it gives false confidence and you re-plan mid-implementation anyway, now with code already written the wrong way. "Implement the feature" is not a task; it restates the goal.
31
+ **Instead:** Every task names exact files and one approach, small enough to execute without thinking (grimoire-plan). If a task needs thought to start, it isn't planned yet.
32
+
33
+ ## Code before the test (→ grimoire-apply)
34
+ **Catch yourself saying:** "I'll add the test after." · "Let me just see it work first." · "The test is trivial, red-first is ceremony."
35
+ **Why it's wrong:** A test written after the code is shaped to pass the code, not to catch its bugs — it asserts what you built, not what was required. A test that never failed has never proven anything. This is the single most common discipline bypass.
36
+ **Instead:** Red first — watch it fail for the right reason, then make it pass (grimoire-apply). If you wrote code before the test, the honest move is to delete the code and start from red.
37
+
38
+ ## Skipping review (→ grimoire-review)
39
+ **Catch yourself saying:** "It looks fine." · "I reviewed it as I wrote it." · "Too small to need review."
40
+ **Why it's wrong:** "Looks fine" is the feeling that precedes every shipped bug. Reviewing your own work as you write it is the weakest review there is — same blind spots, same assumptions, at the moment you're most committed to the approach.
41
+ **Instead:** Run the persona pass at the depth the complexity calls for (grimoire-review). Trivial changes are exempt *by the skill's own rule* — say so and move on; don't self-exempt by feel.
42
+
43
+ ## Declaring done without verifying (→ grimoire-verify)
44
+ **Catch yourself saying:** "Tests should pass." · "That obviously works." · "I'll trust the run I did earlier."
45
+ **Why it's wrong:** "Should pass" is a prediction, not evidence. Done is a claim about observed state; an unobserved claim is a guess in a confident voice.
46
+ **Instead:** Run it. Confirm every scenario has a real step definition with real assertions and no regressions (grimoire-verify). Evidence over claims — every time.
47
+
48
+ ## Doing more than the task (→ principles.md §4)
49
+ **Catch yourself saying:** "While I'm in here I'll also…" · "We'll need it later."
50
+ This is scope creep / YAGNI — its home is **`principles.md` §4 (KISS/YAGNI)**, not here. The named flags ("while I'm here", "for a future caller") live there; cut the speculative work and say so in one line.
51
+
52
+ ---
53
+
54
+ ## How skills cite this
55
+
56
+ - **draft** — *Skipping the spec*, *Silently filling a gap*
57
+ - **plan** — *Skipping the plan / vague tasks*
58
+ - **apply** — *Code before the test*
59
+ - **review** — *Skipping review*
60
+ - **verify** — *Declaring done without verifying*
61
+
62
+ Each skill links its own section; none restate the list. This is the one home (DRY).
@@ -116,6 +116,31 @@ Severity inflation patterns to avoid:
116
116
  - "Untested edge case" when no scenario in the briefing covers it → not a blocker.
117
117
  - "Missing observability" on a level 1-2 change → suggestion, never blocker.
118
118
 
119
+ ## 2c. Pre-Report Gate *(diff-review personas: Senior Engineer code-level, Security code-level scan, Code Style)*
120
+
121
+ The materiality gate (§2) asks "does this matter to *this* project". The Pre-Report Gate asks the prior question: "is this *even a real issue*". Both apply; this one runs first on code-level findings. Before writing any finding, answer four questions:
122
+
123
+ 1. **Exact line** — can you cite the precise `file:line` the finding lives at?
124
+ 2. **Concrete failure mode** — can you state input → state → bad outcome? Not "could be unsafe" — the actual trigger and consequence.
125
+ 3. **Context read** — have you read the callers, imports, and tests around the line, not just the hunk? Trace the type and the caller before claiming a flaw.
126
+ 4. **Severity defensible** — would the §2b severity survive the Contrarian?
127
+
128
+ Any "no / unsure" → downgrade or drop. A **blocker** additionally requires the offending snippet, the failure scenario, and **why existing guards don't already catch it** (neighbor code, framework default, narrowing on the prior line). If you can't write that, it is not a blocker.
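As an illustration only (this is not part of the diffed file), the gate described above can be read as a small decision function. The sketch below assumes hypothetical names (`GateAnswers`, `BlockerEvidence`, `preReportGate`) that do not come from the package. It encodes only what the text states: any "no / unsure" means downgrade or drop, and a blocker additionally needs all three pieces of evidence.

```typescript
// Hypothetical types for illustration; none of these names exist in grimoire.
type Answer = "yes" | "no" | "unsure";
type Severity = "blocker" | "suggestion";
type GateResult = "file" | "downgrade-or-drop" | "not-a-blocker";

interface GateAnswers {
  exactLine: Answer;           // 1. precise file:line cited?
  concreteFailureMode: Answer; // 2. input -> state -> bad outcome stated?
  contextRead: Answer;         // 3. callers, imports, and tests read?
  severityDefensible: Answer;  // 4. would the severity survive the Contrarian?
}

interface BlockerEvidence {
  snippet: string;         // the offending code
  failureScenario: string; // the concrete trigger and consequence
  whyGuardsMiss: string;   // why existing guards don't already catch it
}

function preReportGate(
  severity: Severity,
  answers: GateAnswers,
  evidence: Partial<BlockerEvidence> = {},
): GateResult {
  const all: Answer[] = [
    answers.exactLine,
    answers.concreteFailureMode,
    answers.contextRead,
    answers.severityDefensible,
  ];
  // Any "no" or "unsure" fails the gate.
  if (all.some((a) => a !== "yes")) return "downgrade-or-drop";

  // A blocker needs all three evidence fields to be non-empty.
  if (severity === "blocker") {
    const complete =
      Boolean(evidence.snippet) &&
      Boolean(evidence.failureScenario) &&
      Boolean(evidence.whyGuardsMiss);
    if (!complete) return "not-a-blocker";
  }
  return "file";
}
```

A finding with four "yes" answers but no explanation of why existing guards miss it comes back as `"not-a-blocker"`, which matches the rule that a blocker you can't fully justify is filed at a lower severity or not at all.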
129
+
130
+ ## 2d. Common False Positives — skip these
131
+
132
+ Recurring LLM mis-flags. Each has a disqualifying condition — check it before filing. The fix is always *trace it*, not *pattern-match the syntax*.
133
+
134
+ - **"Possible null deref"** when the preceding line narrows the type (`if (!x) return`, early-return, `?.` already guarding) → trace the type flow; drop.
135
+ - **"N+1 query"** on a fixed-cardinality loop (known small constant) or a DataLoader / batched path → not N+1; drop.
136
+ - **"Missing await"** on an intentionally detached call (`void promise`, fire-and-forget with a comment, a queued job) → check for `void` / comment first; drop.
137
+ - **"Unhandled promise rejection"** on a promise that is `.catch`-chained or `await`ed in a `try` → trace the chain; drop.
138
+ - **"Math.random() is insecure"** in a non-crypto context (jitter, sampling, test data, cache-bust) → security theater; drop. Flag only on tokens/keys/IDs.
139
+ - **"Missing input validation"** when a traced caller already validates at the boundary → trace one caller; internal code may trust it (errors-at-the-boundary). Drop or route as a boundary note.
140
+ - **"Magic number / no constant"**, **"add a comment"**, **"could be more generic"** with no project anchor → style preference; drop (or §4.6 suggestion at most).
141
+
142
+ Closing test for any code-level finding: **would a senior engineer on this team actually change this in review?** If no, skip.
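For illustration only (not part of the diffed file), here is a minimal TypeScript sketch of three patterns from the list above that look suspicious on a syntax match but hold up once traced. All function and type names are hypothetical.

```typescript
// Hypothetical examples for illustration; not code from grimoire.
interface User {
  name: string;
}

// "Possible null deref": the early return narrows `user` to `User`,
// so the property access on the next line is safe.
function displayName(user: User | undefined): string {
  if (!user) return "anonymous";
  return user.name.toUpperCase();
}

// "Missing await": the `void` operator marks the call as intentionally
// detached, so the missing `await` is a choice, not an oversight.
async function recordMetric(sink: string[], name: string): Promise<void> {
  sink.push(name);
}

function handleRequest(sink: string[]): string {
  void recordMetric(sink, "request"); // fire-and-forget by design
  return "ok";
}

// "Math.random() is insecure": retry jitter is not a crypto context.
// Returns a delay in the half-open range [ceiling / 2, ceiling).
function backoffWithJitter(baseMs: number, attempt: number): number {
  const ceiling = baseMs * 2 ** attempt;
  return ceiling / 2 + Math.random() * (ceiling / 2);
}
```

The same `Math.random()` call would be worth flagging if its result became a session token, key, or ID, which is the distinction the list draws.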
143
+
119
144
  ---
120
145
 
121
146
  ## 3. Complexity Gating
@@ -170,12 +195,13 @@ Evaluate:
170
195
 
171
196
  ### 4.2 Senior Engineer
172
197
 
173
- Treat accepted decisions as constraints — cite ADR ID before suggesting an override.
198
+ Treat accepted decisions as constraints — cite ADR ID before suggesting an override. On PR/pre-commit, run the Pre-Report Gate (§2c) and check §2d before filing any code-level finding.
174
199
 
175
200
  Evaluate:
176
201
  - **Build vs Buy** *(design only)*: Was prior art research thorough? If a well-maintained library exists that the manifest doesn't mention, **blocker**.
177
202
  - **Simplicity (YAGNI ladder)**: Walk `../references/principles.md` §4 in order — could it not exist (YAGNI), does the stdlib do it, does a native platform feature cover it, does an installed dep solve it, is it one line? Flag the first rung the code skipped: unnecessary abstraction, indirection, premature generalization, config-driven where a direct call would do. Abstract on the third real use, not the first (**Rule of Three**) — two copies is not yet a pattern. Every finding **names the concrete replacement** (`stdlib: 27-line validator → "@" in email, 1 line`), not just "this seems complex" — a finding the author can't act on is noise.
178
203
  - **Architecture**: Decisions sensible for this codebase? Will this paint us into a corner?
204
+ - **Unrecorded decision**: Does the change make an architectural or technology choice — new dependency, pattern, module boundary, NFR target — with no ADR recorded? An architectural decision without a decision record → finding; route to an ADR via `grimoire-draft`, and check it satisfies the capability-surface rule (ADR-0036). Apply the novelty gate (`grimoire-audit` §3) — don't flag default tooling picks.
179
205
  - **Conventions** *(PR/pre-commit)*: Does new code match file layout, naming, and patterns already in the touched areas? Check `.grimoire/docs/<area>.md` if present.
180
206
  - **Reuse / reinvention**: Existing utilities re-implemented (`grep` similar names; area-doc reusable lists), or stdlib / native-platform / installed-dep functionality hand-rolled (principles.md §3 — don't reinvent the wheel). Name what already does the job.
181
207
  - **Dead code** *(PR/pre-commit)*: Functions added but not called, imports unused, commented-out code, stubs with no implementation.
@@ -221,6 +247,8 @@ Most reviews: only **Data disclosure** + **Linking/Identifying** apply — skip
221
247
 
222
248
  #### Code-level scan *(PR/pre-commit only)*
223
249
 
250
+ Run the Pre-Report Gate (§2c) and check §2d before filing — security findings draw the most reflexive false positives (theoretical injection on validated input, `Math.random()` outside crypto).
251
+
224
252
  - **Secrets**: Grep diff for hardcoded keys, tokens, passwords, cloud credentials, JWT secrets. Any hit = **blocker**.
225
253
  - **Injection**: Raw SQL with string concatenation, shell-exec with user input, `eval`/`exec`, unsafe deserialization. Tag OWASP + CWE.
226
254
  - **Input validation**: New endpoints without schema validation, file uploads without size/type limits, path params used directly in filesystem calls.
@@ -288,7 +316,7 @@ Verify the diff matches the project's code-style and comment standards. This is
288
316
  4. Lint/format config in repo root: `.editorconfig`, `eslint.config.*`, `.prettierrc*`, `pyproject.toml` (ruff/black), `.rubocop.yml`, `rustfmt.toml`, `.golangci.yml`, etc.
289
317
  5. **Neighboring files** in the touched directories — derive convention from what already exists when no config exists
290
318
 
291
- If none of the above pin a rule, **don't invent one**. Style preferences without a project anchor are dropped.
319
+ If none of the above pin a rule, **don't invent one**. Style preferences without a project anchor are dropped. The Pre-Report Gate (§2c) and §2d apply here too — most style nits without a config anchor are §2d false positives.
292
320
 
293
321
  #### Evaluate
294
322
 
@@ -60,9 +60,14 @@ kind: greenfield | refactor
60
60
  and cross-references (D7 cites D3) freely — this is how coupled decisions stay legible
61
61
  in one place. At projection, each NOVEL decision becomes a MADR (novelty gate applies —
62
62
  obvious tooling picks fold into the baseline ADR, they don't mint a record).
63
+
64
+ Phrase each row as a Y-statement (see the design-spine reference): the Decision cell states
65
+ "in the context of <spine layer / use-case>, facing <force>, chose <option> over
66
+ <alternatives>, accepting <downside>"; the Why cell is the "because". The context clause
67
+ ties the decision to the layer it serves so it's judged in situation, not in a vacuum.
63
68
  -->
64
69
 
65
- | # | Decision | Why |
70
+ | # | Decision (Y-statement: context · option over alternatives · accepting downside) | Why (because…) |
66
71
  |----|----------|-----|
67
72
  | D1 | | |
68
73