npm - @kiwidata/grimoire - Versions diffs - 0.2.1 → 0.3.0 - Mend

@kiwidata/grimoire 0.2.1 → 0.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (8) hide show

package/AGENTS.md +16 -3
package/package.json +1 -1
package/skills/grimoire-apply/SKILL.md +62 -10
package/skills/grimoire-audit/SKILL.md +12 -1
package/skills/grimoire-bug/SKILL.md +3 -0
package/skills/grimoire-verify/SKILL.md +19 -0
package/skills/references/review-personas.md +30 -2
package/templates/learnings.md +40 -0

package/AGENTS.md CHANGED Viewed

@@ -61,6 +61,10 @@ After any failure, state what you observe before proposing a fix. One sentence:
 This applies especially to test failures. "The test failed" is not a diagnosis. "The test expected `302` but got `200` because the redirect middleware isn't registered in the test client" is.
+### Loop-level breaker (autonomous apply)
+The attempt budget above is per-problem. Autonomous `grimoire-apply` adds a run-level circuit breaker and cross-section thrash detection on top of it — see `grimoire-apply` SKILL.md. Don't duplicate the per-problem rules there; the breaker is the loop-scale backstop, this protocol is the per-problem one.
 ## When to Use Grimoire
 Use grimoire when the user's request involves:
@@ -178,7 +182,7 @@ If a task seems wrong or impossible during apply:
 ## Directory Structure
-Features, decisions, constraints, and schema are edited **live on the feature branch** — `git diff` is the staging area. A change folder holds only the ephemeral coordination artifacts (manifest + tasks) and is removed at finalize; the PR diff and git history are the record. There is no proposed-copy tree and no archive tree.
+Features, decisions, constraints, and schema are edited **live on the feature branch** — `git diff` is the staging area. A change folder holds only the ephemeral coordination artifacts (manifest, tasks, and the apply-maintained learnings file) and is removed at finalize; the PR diff and git history are the record. There is no proposed-copy tree and no archive tree.
 ```
 project-root/
@@ -193,7 +197,8 @@ project-root/
 │   └── changes/              # ephemeral per-change coordination — removed at finalize
 │       └── <change-id>/
 │           ├── manifest.md
-│           └── tasks.md
+│           ├── tasks.md
+│           └── learnings.md  # apply working memory: failure-mode notes + discovered facts
 ```
 ## Conventions
@@ -243,7 +248,15 @@ This is what makes `grimoire trace` work. Without it, the commit is invisible to
 ### Decision Numbering
 - Sequential, zero-padded: `0001-`, `0002-`, etc.
 - Never reuse numbers
-- Superseded decisions keep their number, status updated to `superseded by NNNN`
+### Decision Lifecycle
+Status moves `proposed → accepted → (deprecated | superseded by NNNN)`:
+- `proposed` — drafted, not yet adopted.
+- `accepted` — in force; treated as a constraint by every stage.
+- `deprecated` — no longer recommended, with no direct replacement (the need went away).
+- `superseded by NNNN` — replaced by a newer decision.
+Supersession is **two-way and explicit**: the superseding ADR back-links the one it replaces (in Context or Decision Drivers), and the superseded ADR keeps its number with status set to `superseded by NNNN`. This is the only home for the link — don't restate it elsewhere.
 ### Step Definitions
 Organize by **domain concept**, NOT by feature file. Check the project's existing test setup and match its BDD framework conventions. See the active skill's testing reference for ecosystem-specific patterns.

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "@kiwidata/grimoire",
-  "version": "0.2.1",
+  "version": "0.3.0",
   "description": "Gherkin + MADR spec-driven development for AI coding assistants",
   "type": "module",
   "bin": {

package/skills/grimoire-apply/SKILL.md CHANGED Viewed

@@ -21,7 +21,7 @@ Implement tasks from a planned grimoire change using **test-first discipline at
 Do NOT write a `.feature` scenario for a `unit-invariant` or `characterization` task — forcing Gherkin where a unit test is correct is the antipattern that fills feature files with slop. One right way: behavior → scenario, everything else → unit test.
-**Artifacts are edited live on the feature branch.** Features, decisions, constraints, and schema are real files in `features/`, `.grimoire/decisions/`, `.grimoire/docs/`. There is no copy-into-change-folder and no promote step — `git diff` is the staging area. The change folder holds only ephemeral process scaffolding (`manifest.md`, `tasks.md`).
+**Artifacts are edited live on the feature branch.** Features, decisions, constraints, and schema are real files in `features/`, `.grimoire/decisions/`, `.grimoire/docs/`. There is no copy-into-change-folder and no promote step — `git diff` is the staging area. The change folder holds only ephemeral process scaffolding (`manifest.md`, `tasks.md`, and the apply-maintained `learnings.md`).
 ## CRITICAL: Two Rules That Must Not Be Broken
@@ -84,16 +84,26 @@ If the user doesn't specify, default to review mode.
 **Both modes:** Update `tasks.md` in real time as work progresses. Mark tasks `- [x]` the moment they pass. If a task is split, reordered, or new tasks are discovered during implementation, update `tasks.md` immediately so it always reflects the current state. The task list is the source of truth for progress — if the session is interrupted, the next agent should be able to read `tasks.md` and know exactly where to resume.
+### Working Memory: `learnings.md`
+Apply keeps one ephemeral file, `.grimoire/changes/<change-id>/learnings.md` (create it from `templates/learnings.md` the first time you need it). It is the loop's memory between attempts and sessions, and it is **removed at finalize** with the rest of the change folder — nothing in it reaches the repo. Two sections, two lifecycles:
+- **Failure-mode notes** — transient. After a failed attempt, append one line: what you tried and why it failed. Before any retry, read this section so you don't repeat a dead end. Prune a task's notes the moment it goes green. Never promote them.
+- **Discovered facts** — durable facts about the project learned while implementing (a build flag, a convention, an undocumented contract). Stage them here with their destination home; at finalize they are reconciled into that one home and cleared. Do **not** write them into `AGENTS.md`.
+Subagents and fresh sessions read and append to this file the same way they use `tasks.md` — it is shared state on disk, not context-window memory.
 ### Stuck Detection & Recovery
 **You MUST track failed attempts per task.** If a test won't go green, count your attempts:
+- **Before any attempt past the first:** read the task's **failure-mode notes** in `learnings.md`. Do not repeat an approach already recorded there as failed.
 - **Attempt 1:** Try the straightforward implementation from the task description.
-- **Attempt 2:** If attempt 1 failed, re-read the error carefully. Try a *different* approach — not the same code with minor tweaks. State what you're doing differently and why.
-- **Attempt 3 (final):** If attempt 2 failed, try one more *fundamentally different* approach. If the same error recurs, the problem is likely not in your implementation.
+- **Attempt 2:** If attempt 1 failed, append a failure-mode note (`<task-id> · tried … · failed: …`), re-read the error carefully, then try a *different* approach — not the same code with minor tweaks. State what you're doing differently and why.
+- **Attempt 3 (final):** If attempt 2 failed, append the second dead end as a failure-mode note, then try one more *fundamentally different* approach. If the same error recurs, the problem is likely not in your implementation.
 **After 3 failed attempts on a single task, STOP.** Do not continue. Instead:
-1. Add a comment to `tasks.md` under the task: `<!-- BLOCKED: <summary of what was tried and what failed> -->`
+1. Add a comment to `tasks.md` under the task: `<!-- BLOCKED: <summary> -->` (the full trail is already in the failure-mode notes)
 2. Present to the user:
    - What the task requires
    - What you tried (all 3 approaches, briefly)
@@ -115,10 +125,29 @@ If the user doesn't specify, default to review mode.
 **Never silently retry the same approach.** If your implementation produced error X and you're about to write code that will produce error X again, stop and think about why. If you can't identify what would change the outcome, stop and ask.
+### Circuit Breaker & Cross-Section Thrash (Autonomous Mode)
+The per-task 3-attempt cap bounds a single task; it cannot see the *run* cycling. Autonomous mode adds a loop-level breaker the parent orchestrator checks **between sections**. Caps live under `llm.coding.limits` in `.grimoire/config.yaml`:
+| Cap | Default | Kind |
+|-----|---------|------|
+| `max_sections_without_checkpoint` | 5 | followable — halt and checkpoint with the user |
+| `consecutive_blocked` | 2 | followable — two BLOCKED sections in a row → halt |
+| `max_cost_usd` | null (opt-in) | **soft** — self-reported; not harness-enforced in v1 |
+| `max_wallclock_min` | null (opt-in) | **soft** — self-reported; not harness-enforced in v1 |
+**Cross-section thrash detection:** halt the whole run — don't just retry locally — when the last two sections both ended BLOCKED, **or** when a section's failure-mode error class repeats the prior section's (read the failure-mode notes in `learnings.md` to compare). A failed attempt always leaves a note, so the thrash signal accumulates across sections; the breaker is the last resort once that signal shows the loop is stuck, not the first line of defense.
+**On any trip:** stop, state the trip reason and a one-line diagnosis (what cycled, what was tried), and hand to the user. Do not continue past a tripped breaker.
+> **Enforcement honesty:** the section and BLOCKED caps are orchestrator behavior the agent follows; the cost and wall-clock caps are *soft* — the agent self-reports against them and they are not enforced by the harness in v1. A hard, code-enforced breaker is a deferred follow-up.
 ### Session Management — MANDATORY Fresh Context Per Section
 **Do NOT implement all tasks in a single conversation context.** Context accumulates across tasks and degrades output quality — the LLM starts hallucinating based on stale file contents it read 5 tasks ago. This is not a suggestion. Fresh context per task section is required.
+**Size one task to one context.** The goal is not statelessness for its own sake — a task should be small enough that one coherent context carries it start to finish (stateful *within* a task), and context is reset *between* tasks. If a single task overflows its context mid-flight, that is a **smell that the task is too big** — split the spec, don't paper over it with a stateless restart loop. Fresh-context-per-section gives you the "reset between" half for free; keeping tasks small gives you the "continuity within" half.
 Each task section in `tasks.md` has a `<!-- context: ... -->` block listing the exact files needed. This is the loading list for that section's fresh context.
 #### Claude Code: Subagent Per Section
@@ -132,6 +161,12 @@ The parent agent is the **orchestrator only** — it does NOT implement tasks it
    find section <N>, and implement all unchecked tasks in that section.
    Follow the red-green BDD cycle for each task. Mark tasks [x] when done.
+   Use `.grimoire/changes/<change-id>/learnings.md` as working memory: read a
+   task's failure-mode notes before retrying it and don't repeat a recorded dead
+   end; append a failure-mode note after any failed attempt; prune them when the
+   task goes green; append durable project facts to Discovered facts with their
+   home (never to AGENTS.md). Never weaken or delete a test to force green.
    Before writing any production code, read `../references/code-quality.md`,
    `../references/testing-contracts.md`, and `../references/pattern-guard.md`.
    Apply the code-quality rules WHILE you write (not after) — reuse before write,
@@ -255,11 +290,14 @@ Work through `tasks.md` sequentially. **Every task follows the same cycle: test
    - Assertions check behavior, not just types or existence — "response status is 302 and redirect URL is /dashboard/" not "response is not None"
    - If you wrote a test that would pass against a null/trivial implementation, strengthen it
 10. **Code quality check:** Walk the seven-point checklist in `../references/code-quality.md` against every file you changed. Any fail → fix code, re-run tests, re-check. Do not mark `[x]` while a check fails.
-11. Mark complete: `- [ ]` → `- [x]`
-12. Move to next task
+11. **Reconcile working memory:** prune this task's failure-mode notes from `learnings.md` — it's green, they've served their purpose. If you learned a durable project fact while implementing (a build flag, a convention, an undocumented contract, an architectural constraint), append it to the **Discovered facts** section with its destination home — don't write it into `AGENTS.md` and don't leave it only in context.
+12. Mark complete: `- [ ]` → `- [x]`
+13. Move to next task
 **This is strict red-green BDD.** A test that has never been red has never proven it can catch a failure. The red step is NOT a formality — it is the proof that the test works. If you skip it or the test passes immediately, you have a false positive that provides zero safety.
+**Never game the gate (reward-hack guard).** When a test won't pass, fix the production code — never weaken or delete the test to force green. Deleting a test, loosening an assertion to match wrong output, narrowing what it checks, or skipping/`xfail`-ing it to get a green run is **stop-and-flag**, not a valid completion. The gate is the convergence signal; gaming it produces plausible-wrong code faster. If a test genuinely encodes the wrong expectation, that is a spec problem — STOP and go back to draft, don't quietly edit the test to pass.
 **Step definition rules:**
 - Organize by domain concept, not by feature file
 - Shared steps go in the project's common step location (check existing test setup)
@@ -290,16 +328,30 @@ When all tests are green. Features, decisions, and constraints were edited live
 2. Constraints (`.grimoire/docs/constraints.md`) were edited in place — nothing to move.
 3. If the change has a `data.yml` (schema delta), apply its `add`/`modify`/`remove` entries to the live `.grimoire/docs/data/schema.yml` so the baseline schema stays current. `data.yml` is a migration-delta spec (ephemeral scaffolding carrying nullability/safety/ordering intent a raw diff wouldn't), not a copy of the schema — `schema.yml` is the live target; the delta is discarded with the change folder.
 4. Refresh the project overview: run `grimoire docs`. It regenerates `.grimoire/docs/OVERVIEW.md` (the human entry point) from the now-current features, constraints, decisions, and schema — superseded decisions drop out automatically. This is the existing `docs` command, not a new one.
-5. Remove the change directory `.grimoire/changes/<change-id>/`. Its `manifest.md` + `tasks.md` (+ any `data.yml`) and the `draft.md` design doc are ephemeral process scaffolding. `draft.md` was retained read-only through the pipeline as the agreed-design reference; this is its closing deletion. The durable record is the branch, the PR, and `git log` — linked by the `Change: <change-id>` trailer; git history still preserves `draft.md` if ever needed. **There is no archive tree** (don't reinvent git history).
+5. Reconcile `learnings.md`: for each entry under **Discovered facts**, write it into the home it names — an area doc (`.grimoire/docs/<area>.md`), a decision, a constraint, or `schema.yml`. Confirm the routing with the user (it's correctable) and drop stale ones. Failure-mode notes are discarded, not promoted. This is the one place facts learned during apply enter the durable record — `AGENTS.md` is never the destination.
+6. Remove the change directory `.grimoire/changes/<change-id>/`. Its `manifest.md` + `tasks.md` + `learnings.md` (+ any `data.yml`) and the `draft.md` design doc are ephemeral process scaffolding. `draft.md` was retained read-only through the pipeline as the agreed-design reference; this is its closing deletion.
+   **Guard — never delete uncommitted scaffolding.** `git log` only preserves what was committed. If `draft.md`/`tasks.md`/`manifest.md`/`learnings.md` were never committed (e.g. draft and plan ran without intermediate commits), deleting them now loses them permanently — there is no recovering an untracked file. Before removing the folder, verify it is in history:
+   ```
+   git ls-files --error-unmatch .grimoire/changes/<change-id>/draft.md
+   ```
+   If that errors (untracked), or `git status` shows uncommitted edits under the change folder, **commit the scaffolding first** (see step 8 — this becomes the first of two commits), then delete. If you cannot commit, STOP and tell the user rather than deleting.
+   The durable record is the branch, the PR, and `git log` — linked by the `Change: <change-id>` trailer; once committed, git history preserves `draft.md` if ever needed. **There is no archive tree** (don't reinvent git history).
 ### 8. Commit
-Finalize must be complete before committing — the commit captures the finished state (accepted decisions, cleared scaffolding), not mid-flight change artefacts.
+The commit captures the finished state — accepted decisions, live artifacts, cleared scaffolding — not mid-flight change artefacts.
+**Order depends on whether the scaffolding is already in history (see step 6's guard):**
+- **Scaffolding already committed** (draft/plan committed earlier, the normal case): finalize fully — including the folder removal — then make one commit capturing the accepted state and the deletion.
+- **Scaffolding NOT yet committed** (this is the change's first commit): you cannot delete-then-commit, or the scaffolding is lost forever. Make **two commits**: (1) commit the implementation, live artifacts, and the still-present change folder so history preserves `draft.md`/`tasks.md`; (2) remove the folder and commit the deletion. Both carry the `Change: <change-id>` trailer.
-Stage the live artifacts and the scaffolding removal:
+Stage the live artifacts (and, in the single-commit case, the scaffolding removal):
 ```
 git add features/ .grimoire/decisions/ .grimoire/docs/ src/ tests/
-git add -u  # picks up the removed change directory
+git add -u  # picks up the removed change directory (single-commit case)
 ```
 Then commit using `/grimoire:commit` (reads change context for the message) or write a manual message following `AGENTS.md` commit trailer conventions:

package/skills/grimoire-audit/SKILL.md CHANGED Viewed

@@ -107,6 +107,9 @@ For confirmed items, create a grimoire change:
 Group related items into single changes — don't create one change per discovery.
 ### 6. Dead Feature Detection
+**Detection is deterministic.** Every dead/stale finding cites exact `file:line` (or ADR id) evidence from a reproducible check — codebase-memory-mcp graph queries (`search_graph` / `get_architecture`) per [0029]/[0030], with `grep` / `git blame` only where the graph has no answer (e.g. `@skip` age). The same commit yields the same findings. The LLM summarizes and interviews; it does not score the codebase by impression.
 Check for documented features and decisions that may no longer be accurate:
 **Dead features** — feature files that describe behavior the code no longer implements:
@@ -137,7 +140,15 @@ After the interview, summarize:
 - How many decisions are documented vs. undocumented
 - How many decisions are stale
 - How many conventions files drifted vs. up-to-date
-- Suggest which areas to address first (highest risk / most complex / most frequently changed)
+Then emit a **Top Actions** list — most-risk first, each with the exact path and the single next move. The ranking comes from the deterministic checks (§6), not impression, so the same commit yields the same list:
+```markdown
+## Top Actions
+1. `features/billing/invoice.feature` — dead (InvoiceView deleted ~3mo ago); create a removal change.
+2. `.grimoire/decisions/0007-search-backend.md` — stale (library no longer in deps); deprecate or update.
+3. `.grimoire/docs/conventions/api.md` — drifted (views moved to `src/api/handlers/`); refresh.
+```
 ## Important
 - This is a COLLABORATIVE process, not a dump. Interview, don't lecture.

package/skills/grimoire-bug/SKILL.md CHANGED Viewed

@@ -55,6 +55,8 @@ Before touching any production code:
 2. Run it — **it MUST FAIL**, reproducing the bug
 3. If it passes, your test doesn't actually reproduce the bug. Fix the test until it fails for the right reason.
+**Name it after the bug.** This repro test stays as the permanent regression test — name it so the bug is obvious (`test_password_reset_special_chars`; scenario "Password reset with plus-sign email"). One bug → one named regression test. This is how the same bug doesn't come back: a future change that reintroduces it goes red on a test that names the defect.
 This is non-negotiable. A bug fix without a reproduction test is a guess that might work. A failing test is proof you understand the problem.
 ### 4. Document the Bug
@@ -193,6 +195,7 @@ Report to the user:
 ## Important
 - **Reproduce before you fix.** No exceptions. If you can't reproduce it, you don't understand it, and your fix is a guess.
+- **The test is the source of truth, not your self-review.** When the same agent writes a fix and then reviews it, the same wrong assumption rides into both steps — "looks correct" is not evidence. The red→green of the named regression test (and the configured suites) is the proof. Don't declare a bug fixed on a code re-read; declare it fixed when the mechanical gate flips and stays green.
 - **Small fixes only.** If the bug fix requires significant architectural changes, it's not a bug fix — route to `grimoire-draft` for a proper change.
 - **Don't over-document.** The test is the documentation. A one-line comment in the test explaining the bug is enough. Don't create tracking files, bug reports, or manifests for a bug fix.
 - **The feature file is truth.** If a scenario describes behavior the user now says is wrong, that's a spec change, not a bug. Handle it through `grimoire-draft`.

package/skills/grimoire-verify/SKILL.md CHANGED Viewed

@@ -118,9 +118,27 @@ For each step definition:
    - **[warning]** `test_auth.py:58` — step "Then user should exist" only asserts `is not None` — check the actual user properties
    ```
+**Regression coverage:** When verifying a bug fix, confirm the fix ships with a regression test **named after the bug** (see `grimoire-bug`). A bug fix with no test that goes red-without-the-fix and pins the defect → WARNING — the bug can silently return.
 If `grimoire test-quality` CLI command is available, suggest running it for a comprehensive analysis.
 To run tests directly: use `config.tools.bdd_test` for BDD and `config.tools.unit_test` for unit tests.
+### 3.E Behavioral Verification *(optional — user-facing changes only)*
+Sections 3.B–3.D verify statically (code exists, asserts, follows decisions) and run the configured suites. They do **not** drive the running app. When the change is user-facing and the app can be run, add a behavioral pass; otherwise skip and say so. This mode adds **no mandatory dependency** — if there's no way to drive the app, mark it INCONCLUSIVE and rely on 3.A–3.D.
+**Read-only by default.** Read-only navigation/inspection needs no opt-in. Any state-changing action requires explicit user opt-in **and** a non-production target (local/staging URL, seeded creds). Never run mutations against production.
+**Verdict.** Every behavioral pass ends in exactly one:
+- **SHIP** — behavior matches the spec; no material issues.
+- **SHIP WITH FIXES** — works, with the non-blocking issues listed.
+- **DO NOT SHIP** — a scenario's promised outcome does not hold.
+- **INCONCLUSIVE** — could not verify (no baseline, app wouldn't run, tooling absent).
+**No baseline ⇒ INCONCLUSIVE, never a silent PASS.** Same rule as §3.C2: without a reference state you cannot claim behavior is correct. Report INCONCLUSIVE and fall back to static verification — do not dress up "I couldn't check" as a pass.
+**Click-path final-state check.** For each touchpoint the change affects, build a side-effect map — `action → {state it sets, state it resets}` — then trace the sequence and ask: *is the FINAL state what the label/spec promises?* This catches the silent-undo class (action B resets what action A just set) that static reading and single-assert tests miss.
 ### 4. Security Compliance Verification
 Verify that security guidance from plan and review stages was followed in implementation. Read `../references/security-compliance.md` for the full checklist.
@@ -222,6 +240,7 @@ Produce a structured report:
 - Scenarios verified: X
 - Decisions verified: X
 - Security checks: X passed, X failed
+- Behavioral verdict: <SHIP | SHIP WITH FIXES | DO NOT SHIP | INCONCLUSIVE | n/a (static only)>
 - Issues found: X critical, X warnings, X suggestions
 ## Critical Issues

package/skills/references/review-personas.md CHANGED Viewed

@@ -116,6 +116,31 @@ Severity inflation patterns to avoid:
 - "Untested edge case" when no scenario in the briefing covers it → not a blocker.
 - "Missing observability" on a level 1-2 change → suggestion, never blocker.
+## 2c. Pre-Report Gate *(diff-review personas: Senior Engineer code-level, Security code-level scan, Code Style)*
+The materiality gate (§2) asks "does this matter to *this* project". The Pre-Report Gate asks the prior question: "is this *even a real issue*". Both apply; this one runs first on code-level findings. Before writing any finding, answer four questions:
+1. **Exact line** — can you cite the precise `file:line` the finding lives at?
+2. **Concrete failure mode** — can you state input → state → bad outcome? Not "could be unsafe" — the actual trigger and consequence.
+3. **Context read** — have you read the callers, imports, and tests around the line, not just the hunk? Trace the type and the caller before claiming a flaw.
+4. **Severity defensible** — would the §2b severity survive the Contrarian?
+Any "no / unsure" → downgrade or drop. A **blocker** additionally requires the offending snippet, the failure scenario, and **why existing guards don't already catch it** (neighbor code, framework default, narrowing on the prior line). If you can't write that, it is not a blocker.
+## 2d. Common False Positives — skip these
+Recurring LLM mis-flags. Each has a disqualifying condition — check it before filing. The fix is always *trace it*, not *pattern-match the syntax*.
+- **"Possible null deref"** when the preceding line narrows the type (`if (!x) return`, early-return, `?.` already guarding) → trace the type flow; drop.
+- **"N+1 query"** on a fixed-cardinality loop (known small constant) or a DataLoader / batched path → not N+1; drop.
+- **"Missing await"** on an intentionally detached call (`void promise`, fire-and-forget with a comment, a queued job) → check for `void` / comment first; drop.
+- **"Unhandled promise rejection"** on a promise that is `.catch`-chained or `await`ed in a `try` → trace the chain; drop.
+- **"Math.random() is insecure"** in a non-crypto context (jitter, sampling, test data, cache-bust) → security theater; drop. Flag only on tokens/keys/IDs.
+- **"Missing input validation"** when a traced caller already validates at the boundary → trace one caller; internal code may trust it (errors-at-the-boundary). Drop or route as a boundary note.
+- **"Magic number / no constant"**, **"add a comment"**, **"could be more generic"** with no project anchor → style preference; drop (or §4.6 suggestion at most).
+Closing test for any code-level finding: **would a senior engineer on this team actually change this in review?** If no, skip.
 ---
 ## 3. Complexity Gating
@@ -170,12 +195,13 @@ Evaluate:
 ### 4.2 Senior Engineer
-Treat accepted decisions as constraints — cite ADR ID before suggesting an override.
+Treat accepted decisions as constraints — cite ADR ID before suggesting an override. On PR/pre-commit, run the Pre-Report Gate (§2c) and check §2d before filing any code-level finding.
 Evaluate:
 - **Build vs Buy** *(design only)*: Was prior art research thorough? If a well-maintained library exists that the manifest doesn't mention, **blocker**.
 - **Simplicity (YAGNI ladder)**: Walk `../references/principles.md` §4 in order — could it not exist (YAGNI), does the stdlib do it, does a native platform feature cover it, does an installed dep solve it, is it one line? Flag the first rung the code skipped: unnecessary abstraction, indirection, premature generalization, config-driven where a direct call would do. Abstract on the third real use, not the first (**Rule of Three**) — two copies is not yet a pattern. Every finding **names the concrete replacement** (`stdlib: 27-line validator → "@" in email, 1 line`), not just "this seems complex" — a finding the author can't act on is noise.
 - **Architecture**: Decisions sensible for this codebase? Will this paint us into a corner?
+- **Unrecorded decision**: Does the change make an architectural or technology choice — new dependency, pattern, module boundary, NFR target — with no ADR recorded? An architectural decision without a decision record → finding; route to an ADR via `grimoire-draft`, and check it satisfies the capability-surface rule (ADR-0036). Apply the novelty gate (`grimoire-audit` §3) — don't flag default tooling picks.
 - **Conventions** *(PR/pre-commit)*: Does new code match file layout, naming, and patterns already in the touched areas? Check `.grimoire/docs/<area>.md` if present.
 - **Reuse / reinvention**: Existing utilities re-implemented (`grep` similar names; area-doc reusable lists), or stdlib / native-platform / installed-dep functionality hand-rolled (principles.md §3 — don't reinvent the wheel). Name what already does the job.
 - **Dead code** *(PR/pre-commit)*: Functions added but not called, imports unused, commented-out code, stubs with no implementation.
@@ -221,6 +247,8 @@ Most reviews: only **Data disclosure** + **Linking/Identifying** apply — skip
 #### Code-level scan *(PR/pre-commit only)*
+Run the Pre-Report Gate (§2c) and check §2d before filing — security findings draw the most reflexive false positives (theoretical injection on validated input, `Math.random()` outside crypto).
 - **Secrets**: Grep diff for hardcoded keys, tokens, passwords, cloud credentials, JWT secrets. Any hit = **blocker**.
 - **Injection**: Raw SQL with string concatenation, shell-exec with user input, `eval`/`exec`, unsafe deserialization. Tag OWASP + CWE.
 - **Input validation**: New endpoints without schema validation, file uploads without size/type limits, path params used directly in filesystem calls.
@@ -288,7 +316,7 @@ Verify the diff matches the project's code-style and comment standards. This is
 4. Lint/format config in repo root: `.editorconfig`, `eslint.config.*`, `.prettierrc*`, `pyproject.toml` (ruff/black), `.rubocop.yml`, `rustfmt.toml`, `.golangci.yml`, etc.
 5. **Neighboring files** in the touched directories — derive convention from what already exists when no config exists
-If none of the above pin a rule, **don't invent one**. Style preferences without a project anchor are dropped.
+If none of the above pin a rule, **don't invent one**. Style preferences without a project anchor are dropped. The Pre-Report Gate (§2c) and §2d apply here too — most style nits without a config anchor are §2d false positives.
 #### Evaluate

package/templates/learnings.md ADDED Viewed

@@ -0,0 +1,40 @@
+# Learnings — <change-id>
+<!--
+  Ephemeral working memory for this change. Lives only in
+  `.grimoire/changes/<change-id>/` and is **removed at finalize** with the rest
+  of the scaffolding — nothing here persists to the repo. Re-read it at the start
+  of every task section and before every retry.
+  Two sections, two lifecycles. Keep them separate; never write either into
+  `AGENTS.md`.
+-->
+## Failure-mode notes
+<!--
+  Transient. One line per dead end: what was tried and why it failed, so the next
+  attempt does not repeat it. This is the antidote to thrashing — a stuck retry
+  MUST read this section first. Pruned per task: delete a task's entries the
+  moment that task goes green. Never promoted anywhere.
+-->
+Format: `- <task-id> · tried <approach> · failed: <observed error / why>`
+- 2.2 · tried mocking the client wrapper · failed: mock satisfied an assertion prod code never reaches — mock at the HTTP boundary instead
+## Discovered facts
+<!--
+  Durable facts about the project learned while implementing — a build flag, a
+  convention, an undocumented contract, an architectural constraint. Staged here
+  only until reconciled into the one home that owns that fact at finalize, then
+  cleared. Recording the destination home makes reconciliation mechanical and
+  lets the user correct the routing — that reconciliation is what keeps the fact
+  from going stale, because it then lives where the project's own changes keep it
+  honest.
+-->
+Format: `- fact: <what was learned> → home: <area doc | decision | constraint | schema | feature>`
+- fact: the bdd suite needs `TZ=UTC` or time-based scenarios flake → home: `.grimoire/docs/<area>.md`