@kiwidata/grimoire 0.2.1 → 0.2.2

Files changed (4)

package/AGENTS.md +7 -2
package/package.json +1 -1
package/skills/grimoire-apply/SKILL.md +62 -10
package/templates/learnings.md +40 -0

package/AGENTS.md CHANGED

@@ -61,6 +61,10 @@ After any failure, state what you observe before proposing a fix. One sentence:
 This applies especially to test failures. "The test failed" is not a diagnosis. "The test expected `302` but got `200` because the redirect middleware isn't registered in the test client" is.
+### Loop-level breaker (autonomous apply)
+The attempt budget above is per-problem. Autonomous `grimoire-apply` adds a run-level circuit breaker and cross-section thrash detection on top of it — see `grimoire-apply` SKILL.md. Don't duplicate the per-problem rules there; the breaker is the loop-scale backstop, this protocol is the per-problem one.
 ## When to Use Grimoire
 Use grimoire when the user's request involves:
@@ -178,7 +182,7 @@ If a task seems wrong or impossible during apply:
 ## Directory Structure
-Features, decisions, constraints, and schema are edited **live on the feature branch** — `git diff` is the staging area. A change folder holds only the ephemeral coordination artifacts (manifest + tasks) and is removed at finalize; the PR diff and git history are the record. There is no proposed-copy tree and no archive tree.
+Features, decisions, constraints, and schema are edited **live on the feature branch** — `git diff` is the staging area. A change folder holds only the ephemeral coordination artifacts (manifest, tasks, and the apply-maintained learnings file) and is removed at finalize; the PR diff and git history are the record. There is no proposed-copy tree and no archive tree.
 ```
 project-root/
@@ -193,7 +197,8 @@ project-root/
 │   └── changes/              # ephemeral per-change coordination — removed at finalize
 │       └── <change-id>/
 │           ├── manifest.md
-│           └── tasks.md
+│           ├── tasks.md
+│           └── learnings.md  # apply working memory: failure-mode notes + discovered facts
 ```
 ## Conventions

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@kiwidata/grimoire",
-  "version": "0.2.1",
+  "version": "0.2.2",
   "description": "Gherkin + MADR spec-driven development for AI coding assistants",
   "type": "module",
   "bin": {

package/skills/grimoire-apply/SKILL.md CHANGED

@@ -21,7 +21,7 @@ Implement tasks from a planned grimoire change using **test-first discipline at
 Do NOT write a `.feature` scenario for a `unit-invariant` or `characterization` task — forcing Gherkin where a unit test is correct is the antipattern that fills feature files with slop. One right way: behavior → scenario, everything else → unit test.
-**Artifacts are edited live on the feature branch.** Features, decisions, constraints, and schema are real files in `features/`, `.grimoire/decisions/`, `.grimoire/docs/`. There is no copy-into-change-folder and no promote step — `git diff` is the staging area. The change folder holds only ephemeral process scaffolding (`manifest.md`, `tasks.md`).
+**Artifacts are edited live on the feature branch.** Features, decisions, constraints, and schema are real files in `features/`, `.grimoire/decisions/`, `.grimoire/docs/`. There is no copy-into-change-folder and no promote step — `git diff` is the staging area. The change folder holds only ephemeral process scaffolding (`manifest.md`, `tasks.md`, and the apply-maintained `learnings.md`).
 ## CRITICAL: Two Rules That Must Not Be Broken
@@ -84,16 +84,26 @@ If the user doesn't specify, default to review mode.
 **Both modes:** Update `tasks.md` in real time as work progresses. Mark tasks `- [x]` the moment they pass. If a task is split, reordered, or new tasks are discovered during implementation, update `tasks.md` immediately so it always reflects the current state. The task list is the source of truth for progress — if the session is interrupted, the next agent should be able to read `tasks.md` and know exactly where to resume.
+### Working Memory: `learnings.md`
+Apply keeps one ephemeral file, `.grimoire/changes/<change-id>/learnings.md` (create it from `templates/learnings.md` the first time you need it). It is the loop's memory between attempts and sessions, and it is **removed at finalize** with the rest of the change folder — nothing in it reaches the repo. Two sections, two lifecycles:
+- **Failure-mode notes** — transient. After a failed attempt, append one line: what you tried and why it failed. Before any retry, read this section so you don't repeat a dead end. Prune a task's notes the moment it goes green. Never promote them.
+- **Discovered facts** — durable facts about the project learned while implementing (a build flag, a convention, an undocumented contract). Stage them here with their destination home; at finalize they are reconciled into that one home and cleared. Do **not** write them into `AGENTS.md`.
+Subagents and fresh sessions read and append to this file the same way they use `tasks.md` — it is shared state on disk, not context-window memory.
 ### Stuck Detection & Recovery
 **You MUST track failed attempts per task.** If a test won't go green, count your attempts:
+- **Before any attempt past the first:** read the task's **failure-mode notes** in `learnings.md`. Do not repeat an approach already recorded there as failed.
 - **Attempt 1:** Try the straightforward implementation from the task description.
-- **Attempt 2:** If attempt 1 failed, re-read the error carefully. Try a *different* approach — not the same code with minor tweaks. State what you're doing differently and why.
-- **Attempt 3 (final):** If attempt 2 failed, try one more *fundamentally different* approach. If the same error recurs, the problem is likely not in your implementation.
+- **Attempt 2:** If attempt 1 failed, append a failure-mode note (`<task-id> · tried … · failed: …`), re-read the error carefully, then try a *different* approach — not the same code with minor tweaks. State what you're doing differently and why.
+- **Attempt 3 (final):** If attempt 2 failed, append the second dead end as a failure-mode note, then try one more *fundamentally different* approach. If the same error recurs, the problem is likely not in your implementation.
 **After 3 failed attempts on a single task, STOP.** Do not continue. Instead:
-1. Add a comment to `tasks.md` under the task: `<!-- BLOCKED: <summary of what was tried and what failed> -->`
+1. Add a comment to `tasks.md` under the task: `<!-- BLOCKED: <summary> -->` (the full trail is already in the failure-mode notes)
 2. Present to the user:
    - What the task requires
    - What you tried (all 3 approaches, briefly)
@@ -115,10 +125,29 @@ If the user doesn't specify, default to review mode.
 **Never silently retry the same approach.** If your implementation produced error X and you're about to write code that will produce error X again, stop and think about why. If you can't identify what would change the outcome, stop and ask.
+### Circuit Breaker & Cross-Section Thrash (Autonomous Mode)
+The per-task 3-attempt cap bounds a single task; it cannot see the *run* cycling. Autonomous mode adds a loop-level breaker the parent orchestrator checks **between sections**. Caps live under `llm.coding.limits` in `.grimoire/config.yaml`:
+| Cap | Default | Kind |
+|-----|---------|------|
+| `max_sections_without_checkpoint` | 5 | followable — halt and checkpoint with the user |
+| `consecutive_blocked` | 2 | followable — two BLOCKED sections in a row → halt |
+| `max_cost_usd` | null (opt-in) | **soft** — self-reported; not harness-enforced in v1 |
+| `max_wallclock_min` | null (opt-in) | **soft** — self-reported; not harness-enforced in v1 |
+**Cross-section thrash detection:** halt the whole run — don't just retry locally — when the last two sections both ended BLOCKED, **or** when a section's failure-mode error class repeats the prior section's (read the failure-mode notes in `learnings.md` to compare). A failed attempt always leaves a note, so the thrash signal accumulates across sections; the breaker is the last resort once that signal shows the loop is stuck, not the first line of defense.
+**On any trip:** stop, state the trip reason and a one-line diagnosis (what cycled, what was tried), and hand to the user. Do not continue past a tripped breaker.
+> **Enforcement honesty:** the section and BLOCKED caps are orchestrator behavior the agent follows; the cost and wall-clock caps are *soft* — the agent self-reports against them and they are not enforced by the harness in v1. A hard, code-enforced breaker is a deferred follow-up.
 ### Session Management — MANDATORY Fresh Context Per Section
 **Do NOT implement all tasks in a single conversation context.** Context accumulates across tasks and degrades output quality — the LLM starts hallucinating based on stale file contents it read 5 tasks ago. This is not a suggestion. Fresh context per task section is required.
+**Size one task to one context.** The goal is not statelessness for its own sake — a task should be small enough that one coherent context carries it start to finish (stateful *within* a task), and context is reset *between* tasks. If a single task overflows its context mid-flight, that is a **smell that the task is too big** — split the spec, don't paper over it with a stateless restart loop. Fresh-context-per-section gives you the "reset between" half for free; keeping tasks small gives you the "continuity within" half.
 Each task section in `tasks.md` has a `<!-- context: ... -->` block listing the exact files needed. This is the loading list for that section's fresh context.
 #### Claude Code: Subagent Per Section
@@ -132,6 +161,12 @@ The parent agent is the **orchestrator only** — it does NOT implement tasks it
    find section <N>, and implement all unchecked tasks in that section.
    Follow the red-green BDD cycle for each task. Mark tasks [x] when done.
+   Use `.grimoire/changes/<change-id>/learnings.md` as working memory: read a
+   task's failure-mode notes before retrying it and don't repeat a recorded dead
+   end; append a failure-mode note after any failed attempt; prune them when the
+   task goes green; append durable project facts to Discovered facts with their
+   home (never to AGENTS.md). Never weaken or delete a test to force green.
    Before writing any production code, read `../references/code-quality.md`,
    `../references/testing-contracts.md`, and `../references/pattern-guard.md`.
    Apply the code-quality rules WHILE you write (not after) — reuse before write,
@@ -255,11 +290,14 @@ Work through `tasks.md` sequentially. **Every task follows the same cycle: test
    - Assertions check behavior, not just types or existence — "response status is 302 and redirect URL is /dashboard/" not "response is not None"
    - If you wrote a test that would pass against a null/trivial implementation, strengthen it
 10. **Code quality check:** Walk the seven-point checklist in `../references/code-quality.md` against every file you changed. Any fail → fix code, re-run tests, re-check. Do not mark `[x]` while a check fails.
-11. Mark complete: `- [ ]` → `- [x]`
-12. Move to next task
+11. **Reconcile working memory:** prune this task's failure-mode notes from `learnings.md` — it's green, they've served their purpose. If you learned a durable project fact while implementing (a build flag, a convention, an undocumented contract, an architectural constraint), append it to the **Discovered facts** section with its destination home — don't write it into `AGENTS.md` and don't leave it only in context.
+12. Mark complete: `- [ ]` → `- [x]`
+13. Move to next task
 **This is strict red-green BDD.** A test that has never been red has never proven it can catch a failure. The red step is NOT a formality — it is the proof that the test works. If you skip it or the test passes immediately, you have a false positive that provides zero safety.
+**Never game the gate (reward-hack guard).** When a test won't pass, fix the production code — never weaken or delete the test to force green. Deleting a test, loosening an assertion to match wrong output, narrowing what it checks, or skipping/`xfail`-ing it to get a green run is **stop-and-flag**, not a valid completion. The gate is the convergence signal; gaming it produces plausible-wrong code faster. If a test genuinely encodes the wrong expectation, that is a spec problem — STOP and go back to draft, don't quietly edit the test to pass.
 **Step definition rules:**
 - Organize by domain concept, not by feature file
 - Shared steps go in the project's common step location (check existing test setup)
@@ -290,16 +328,30 @@ When all tests are green. Features, decisions, and constraints were edited live
 2. Constraints (`.grimoire/docs/constraints.md`) were edited in place — nothing to move.
 3. If the change has a `data.yml` (schema delta), apply its `add`/`modify`/`remove` entries to the live `.grimoire/docs/data/schema.yml` so the baseline schema stays current. `data.yml` is a migration-delta spec (ephemeral scaffolding carrying nullability/safety/ordering intent a raw diff wouldn't), not a copy of the schema — `schema.yml` is the live target; the delta is discarded with the change folder.
 4. Refresh the project overview: run `grimoire docs`. It regenerates `.grimoire/docs/OVERVIEW.md` (the human entry point) from the now-current features, constraints, decisions, and schema — superseded decisions drop out automatically. This is the existing `docs` command, not a new one.
-5. Remove the change directory `.grimoire/changes/<change-id>/`. Its `manifest.md` + `tasks.md` (+ any `data.yml`) and the `draft.md` design doc are ephemeral process scaffolding. `draft.md` was retained read-only through the pipeline as the agreed-design reference; this is its closing deletion. The durable record is the branch, the PR, and `git log` — linked by the `Change: <change-id>` trailer; git history still preserves `draft.md` if ever needed. **There is no archive tree** (don't reinvent git history).
+5. Reconcile `learnings.md`: for each entry under **Discovered facts**, write it into the home it names — an area doc (`.grimoire/docs/<area>.md`), a decision, a constraint, or `schema.yml`. Confirm the routing with the user (it's correctable) and drop stale ones. Failure-mode notes are discarded, not promoted. This is the one place facts learned during apply enter the durable record — `AGENTS.md` is never the destination.
+6. Remove the change directory `.grimoire/changes/<change-id>/`. Its `manifest.md` + `tasks.md` + `learnings.md` (+ any `data.yml`) and the `draft.md` design doc are ephemeral process scaffolding. `draft.md` was retained read-only through the pipeline as the agreed-design reference; this is its closing deletion.
+   **Guard — never delete uncommitted scaffolding.** `git log` only preserves what was committed. If `draft.md`/`tasks.md`/`manifest.md`/`learnings.md` were never committed (e.g. draft and plan ran without intermediate commits), deleting them now loses them permanently — there is no recovering an untracked file. Before removing the folder, verify it is in history:
+   ```
+   git ls-files --error-unmatch .grimoire/changes/<change-id>/draft.md
+   ```
+   If that errors (untracked), or `git status` shows uncommitted edits under the change folder, **commit the scaffolding first** (see step 8 — this becomes the first of two commits), then delete. If you cannot commit, STOP and tell the user rather than deleting.
+   The durable record is the branch, the PR, and `git log` — linked by the `Change: <change-id>` trailer; once committed, git history preserves `draft.md` if ever needed. **There is no archive tree** (don't reinvent git history).
 ### 8. Commit
-Finalize must be complete before committing — the commit captures the finished state (accepted decisions, cleared scaffolding), not mid-flight change artefacts.
+The commit captures the finished state — accepted decisions, live artifacts, cleared scaffolding — not mid-flight change artefacts.
+**Order depends on whether the scaffolding is already in history (see step 6's guard):**
+- **Scaffolding already committed** (draft/plan committed earlier, the normal case): finalize fully — including the folder removal — then make one commit capturing the accepted state and the deletion.
+- **Scaffolding NOT yet committed** (this is the change's first commit): you cannot delete-then-commit, or the scaffolding is lost forever. Make **two commits**: (1) commit the implementation, live artifacts, and the still-present change folder so history preserves `draft.md`/`tasks.md`; (2) remove the folder and commit the deletion. Both carry the `Change: <change-id>` trailer.
-Stage the live artifacts and the scaffolding removal:
+Stage the live artifacts (and, in the single-commit case, the scaffolding removal):
 ```
 git add features/ .grimoire/decisions/ .grimoire/docs/ src/ tests/
-git add -u  # picks up the removed change directory
+git add -u  # picks up the removed change directory (single-commit case)
 ```
 Then commit using `/grimoire:commit` (reads change context for the message) or write a manual message following `AGENTS.md` commit trailer conventions:
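
Illustration (not part of the package): the loop-level breaker that this SKILL.md diff adds for autonomous mode could be sketched roughly as follows. The cap names and defaults come from the table in the diff; everything else is an assumption made for the sketch, including the `SectionResult` type, the `error_class` label, the function name, and the choice of `>=` for the section cap. The soft cost and wall-clock caps are left out because the diff says they are self-reported rather than enforced.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Limits:
    # Field names and defaults mirror the cap table in the diff.
    max_sections_without_checkpoint: int = 5
    consecutive_blocked: int = 2


@dataclass
class SectionResult:
    # Hypothetical record of how one task section ended.
    blocked: bool
    # Hypothetical label derived from the section's failure-mode notes.
    error_class: Optional[str] = None


def breaker_trip_reason(
    history: List[SectionResult],
    sections_since_checkpoint: int,
    limits: Optional[Limits] = None,
) -> Optional[str]:
    """Return a trip reason, or None if the run may continue."""
    limits = limits or Limits()

    # Followable cap: too many sections since the last user checkpoint.
    if sections_since_checkpoint >= limits.max_sections_without_checkpoint:
        return "max_sections_without_checkpoint"

    # Followable cap: the last N sections all ended BLOCKED.
    n = limits.consecutive_blocked
    if n > 0 and len(history) >= n and all(r.blocked for r in history[-n:]):
        return "consecutive_blocked"

    # Cross-section thrash: this section's error class repeats the prior one's.
    if len(history) >= 2:
        prev, last = history[-2], history[-1]
        if last.error_class is not None and last.error_class == prev.error_class:
            return "repeated_error_class"

    return None
```

In this sketch the orchestrator would call `breaker_trip_reason` between sections and, on any non-`None` result, stop and hand the reason plus a one-line diagnosis to the user.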

package/templates/learnings.md ADDED

@@ -0,0 +1,40 @@
+# Learnings — <change-id>
+<!--
+  Ephemeral working memory for this change. Lives only in
+  `.grimoire/changes/<change-id>/` and is **removed at finalize** with the rest
+  of the scaffolding — nothing here persists to the repo. Re-read it at the start
+  of every task section and before every retry.
+  Two sections, two lifecycles. Keep them separate; never write either into
+  `AGENTS.md`.
+-->
+## Failure-mode notes
+<!--
+  Transient. One line per dead end: what was tried and why it failed, so the next
+  attempt does not repeat it. This is the antidote to thrashing — a stuck retry
+  MUST read this section first. Pruned per task: delete a task's entries the
+  moment that task goes green. Never promoted anywhere.
+-->
+Format: `- <task-id> · tried <approach> · failed: <observed error / why>`
+- 2.2 · tried mocking the client wrapper · failed: mock satisfied an assertion prod code never reaches — mock at the HTTP boundary instead
+## Discovered facts
+<!--
+  Durable facts about the project learned while implementing — a build flag, a
+  convention, an undocumented contract, an architectural constraint. Staged here
+  only until reconciled into the one home that owns that fact at finalize, then
+  cleared. Recording the destination home makes reconciliation mechanical and
+  lets the user correct the routing — that reconciliation is what keeps the fact
+  from going stale, because it then lives where the project's own changes keep it
+  honest.
+-->
+Format: `- fact: <what was learned> → home: <area doc | decision | constraint | schema | feature>`
+- fact: the bdd suite needs `TZ=UTC` or time-based scenarios flake → home: `.grimoire/docs/<area>.md`
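
Illustration (not part of the package): the one-line failure-mode note format in this template is regular enough to parse and prune mechanically. The sketch below assumes the notes follow the `- <task-id> · tried <approach> · failed: <why>` shape exactly and that task ids contain no whitespace; `FailureNote`, `parse_note`, and `prune_task` are hypothetical helper names, not grimoire APIs.

```python
import re
from typing import List, NamedTuple, Optional

# Matches the template's format line:
#   - <task-id> · tried <approach> · failed: <observed error / why>
NOTE_RE = re.compile(
    r"^- (?P<task>\S+) · tried (?P<tried>.+?) · failed: (?P<failed>.+)$"
)


class FailureNote(NamedTuple):
    task_id: str
    tried: str
    failed: str


def parse_note(line: str) -> Optional[FailureNote]:
    """Parse one failure-mode note; return None for any other line."""
    m = NOTE_RE.match(line.strip())
    if m is None:
        return None
    return FailureNote(m["task"], m["tried"], m["failed"])


def prune_task(lines: List[str], task_id: str) -> List[str]:
    """Drop the failure-mode notes for task_id; keep every other line as-is."""
    kept = []
    for line in lines:
        note = parse_note(line)
        if note is not None and note.task_id == task_id:
            continue  # the task went green, so its dead ends are no longer needed
        kept.append(line)
    return kept
```

Lines under Discovered facts do not match the pattern, so `prune_task` leaves them untouched; they are only cleared by the finalize reconciliation the diff describes.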