@kiwidata/grimoire 0.2.1 → 0.2.2

package/AGENTS.md CHANGED
@@ -61,6 +61,10 @@ After any failure, state what you observe before proposing a fix. One sentence:
61
61
 
62
62
  This applies especially to test failures. "The test failed" is not a diagnosis. "The test expected `302` but got `200` because the redirect middleware isn't registered in the test client" is.
63
63
 
64
+ ### Loop-level breaker (autonomous apply)
65
+
66
+ The attempt budget above is per-problem. Autonomous `grimoire-apply` adds a run-level circuit breaker and cross-section thrash detection on top of it — see `grimoire-apply` SKILL.md. Don't duplicate the per-problem rules there; the breaker is the loop-scale backstop, this protocol is the per-problem one.
67
+
64
68
  ## When to Use Grimoire
65
69
 
66
70
  Use grimoire when the user's request involves:
@@ -178,7 +182,7 @@ If a task seems wrong or impossible during apply:
178
182
 
179
183
  ## Directory Structure
180
184
 
181
- Features, decisions, constraints, and schema are edited **live on the feature branch** — `git diff` is the staging area. A change folder holds only the ephemeral coordination artifacts (manifest + tasks) and is removed at finalize; the PR diff and git history are the record. There is no proposed-copy tree and no archive tree.
185
+ Features, decisions, constraints, and schema are edited **live on the feature branch** — `git diff` is the staging area. A change folder holds only the ephemeral coordination artifacts (manifest, tasks, and the apply-maintained learnings file) and is removed at finalize; the PR diff and git history are the record. There is no proposed-copy tree and no archive tree.
182
186
 
183
187
  ```
184
188
  project-root/
@@ -193,7 +197,8 @@ project-root/
193
197
  │ └── changes/ # ephemeral per-change coordination — removed at finalize
194
198
  │ └── <change-id>/
195
199
  │ ├── manifest.md
196
- └── tasks.md
200
+ ├── tasks.md
201
+ │ └── learnings.md # apply working memory: failure-mode notes + discovered facts
197
202
  ```
198
203
 
199
204
  ## Conventions
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@kiwidata/grimoire",
3
- "version": "0.2.1",
3
+ "version": "0.2.2",
4
4
  "description": "Gherkin + MADR spec-driven development for AI coding assistants",
5
5
  "type": "module",
6
6
  "bin": {
@@ -21,7 +21,7 @@ Implement tasks from a planned grimoire change using **test-first discipline at
21
21
 
22
22
  Do NOT write a `.feature` scenario for a `unit-invariant` or `characterization` task — forcing Gherkin where a unit test is correct is the antipattern that fills feature files with slop. One right way: behavior → scenario, everything else → unit test.
23
23
 
24
- **Artifacts are edited live on the feature branch.** Features, decisions, constraints, and schema are real files in `features/`, `.grimoire/decisions/`, `.grimoire/docs/`. There is no copy-into-change-folder and no promote step — `git diff` is the staging area. The change folder holds only ephemeral process scaffolding (`manifest.md`, `tasks.md`).
24
+ **Artifacts are edited live on the feature branch.** Features, decisions, constraints, and schema are real files in `features/`, `.grimoire/decisions/`, `.grimoire/docs/`. There is no copy-into-change-folder and no promote step — `git diff` is the staging area. The change folder holds only ephemeral process scaffolding (`manifest.md`, `tasks.md`, and the apply-maintained `learnings.md`).
25
25
 
26
26
  ## CRITICAL: Two Rules That Must Not Be Broken
27
27
 
@@ -84,16 +84,26 @@ If the user doesn't specify, default to review mode.
84
84
 
85
85
  **Both modes:** Update `tasks.md` in real time as work progresses. Mark tasks `- [x]` the moment they pass. If a task is split, reordered, or new tasks are discovered during implementation, update `tasks.md` immediately so it always reflects the current state. The task list is the source of truth for progress — if the session is interrupted, the next agent should be able to read `tasks.md` and know exactly where to resume.
86
86
 
87
+ ### Working Memory: `learnings.md`
88
+
89
+ Apply keeps one ephemeral file, `.grimoire/changes/<change-id>/learnings.md` (create it from `templates/learnings.md` the first time you need it). It is the loop's memory between attempts and sessions, and it is **removed at finalize** with the rest of the change folder — nothing in it reaches the repo. Two sections, two lifecycles:
90
+
91
+ - **Failure-mode notes** — transient. After a failed attempt, append one line: what you tried and why it failed. Before any retry, read this section so you don't repeat a dead end. Prune a task's notes the moment it goes green. Never promote them.
92
+ - **Discovered facts** — durable facts about the project learned while implementing (a build flag, a convention, an undocumented contract). Stage them here with their destination home; at finalize they are reconciled into that one home and cleared. Do **not** write them into `AGENTS.md`.
93
+
94
+ Subagents and fresh sessions read and append to this file the same way they use `tasks.md` — it is shared state on disk, not context-window memory.
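The append/prune lifecycle described above could be sketched in Python as follows. This is an illustration under stated assumptions, not grimoire's implementation: the helper names `append_failure_note` and `prune_task_notes` are hypothetical, notes are assumed to be single lines under a `## Failure-mode notes` heading, and the note shape follows the `<task-id> · tried … · failed: …` pattern quoted in this diff.

```python
FAILURE_HEADING = "## Failure-mode notes"  # heading taken from the learnings template


def append_failure_note(text: str, task_id: str, tried: str, failed: str) -> str:
    """Return `text` with one failure-mode note added at the end of its section."""
    note = f"- {task_id} · tried {tried} · failed: {failed}"
    lines = text.splitlines()
    start = lines.index(FAILURE_HEADING)  # raises ValueError if the section is missing
    # The section ends at the next level-2 heading, or at end of file.
    end = next(
        (i for i in range(start + 1, len(lines)) if lines[i].startswith("## ")),
        len(lines),
    )
    # Step back over trailing blank lines so the note sits next to existing notes.
    insert_at = end
    while insert_at > start + 1 and not lines[insert_at - 1].strip():
        insert_at -= 1
    lines.insert(insert_at, note)
    return "\n".join(lines) + "\n"


def prune_task_notes(text: str, task_id: str) -> str:
    """Return `text` without the failure-mode notes recorded for `task_id`."""
    prefix = f"- {task_id} · "
    kept = []
    in_section = False
    for line in text.splitlines():
        if line.startswith("## "):
            in_section = line.strip() == FAILURE_HEADING
        if in_section and line.startswith(prefix):
            continue  # drop this task's dead-end note; the task is green
        kept.append(line)
    return "\n".join(kept) + "\n"
```

Both helpers work on the file's text rather than on open file handles, so a subagent or fresh session can read the file, transform it, and write it back without holding any state of its own.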
95
+
87
96
  ### Stuck Detection & Recovery
88
97
 
89
98
  **You MUST track failed attempts per task.** If a test won't go green, count your attempts:
90
99
 
100
+ - **Before any attempt past the first:** read the task's **failure-mode notes** in `learnings.md`. Do not repeat an approach already recorded there as failed.
91
101
  - **Attempt 1:** Try the straightforward implementation from the task description.
92
- - **Attempt 2:** If attempt 1 failed, re-read the error carefully. Try a *different* approach — not the same code with minor tweaks. State what you're doing differently and why.
93
- - **Attempt 3 (final):** If attempt 2 failed, try one more *fundamentally different* approach. If the same error recurs, the problem is likely not in your implementation.
102
+ - **Attempt 2:** If attempt 1 failed, append a failure-mode note (`<task-id> · tried … · failed: …`), re-read the error carefully, then try a *different* approach — not the same code with minor tweaks. State what you're doing differently and why.
103
+ - **Attempt 3 (final):** If attempt 2 failed, append the second dead end as a failure-mode note, then try one more *fundamentally different* approach. If the same error recurs, the problem is likely not in your implementation.
94
104
 
95
105
  **After 3 failed attempts on a single task, STOP.** Do not continue. Instead:
96
- 1. Add a comment to `tasks.md` under the task: `<!-- BLOCKED: <summary of what was tried and what failed> -->`
106
+ 1. Add a comment to `tasks.md` under the task: `<!-- BLOCKED: <summary> -->` (the full trail is already in the failure-mode notes)
97
107
  2. Present to the user:
98
108
  - What the task requires
99
109
  - What you tried (all 3 approaches, briefly)
@@ -115,10 +125,29 @@ If the user doesn't specify, default to review mode.
115
125
 
116
126
  **Never silently retry the same approach.** If your implementation produced error X and you're about to write code that will produce error X again, stop and think about why. If you can't identify what would change the outcome, stop and ask.
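The per-task attempt budget and the "never repeat a recorded dead end" rule can be pictured with a small Python sketch. The class name `TaskAttempts` and its methods are hypothetical and exist only for illustration; the cap of 3 and the `<!-- BLOCKED: ... -->` comment shape come from the protocol text in this diff.

```python
from dataclasses import dataclass, field

MAX_ATTEMPTS = 3  # per-task cap stated in the protocol


@dataclass
class TaskAttempts:
    """Tracks failed approaches for one task (illustrative, not grimoire's code)."""

    task_id: str
    failures: list = field(default_factory=list)  # (approach, error) pairs

    def may_try(self, approach: str) -> bool:
        """True only if budget remains and the approach is not a recorded dead end."""
        if self.blocked:
            return False
        return all(approach != tried for tried, _ in self.failures)

    def record_failure(self, approach: str, error: str) -> None:
        self.failures.append((approach, error))

    @property
    def blocked(self) -> bool:
        return len(self.failures) >= MAX_ATTEMPTS

    def blocked_comment(self) -> str:
        """Build the one-line comment to place under the task in tasks.md."""
        summary = "; ".join(f"{tried} -> {error}" for tried, error in self.failures)
        return f"<!-- BLOCKED: {summary} -->"
```

Comparing approaches by exact string is a simplification; in practice the agent judges whether two approaches are meaningfully different.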
117
127
 
128
+ ### Circuit Breaker & Cross-Section Thrash (Autonomous Mode)
129
+
130
+ The per-task 3-attempt cap bounds a single task; it cannot see the *run* cycling. Autonomous mode adds a loop-level breaker the parent orchestrator checks **between sections**. Caps live under `llm.coding.limits` in `.grimoire/config.yaml`:
131
+
132
+ | Cap | Default | Kind |
133
+ |-----|---------|------|
134
+ | `max_sections_without_checkpoint` | 5 | followable — halt and checkpoint with the user |
135
+ | `consecutive_blocked` | 2 | followable — two BLOCKED sections in a row → halt |
136
+ | `max_cost_usd` | null (opt-in) | **soft** — self-reported; not harness-enforced in v1 |
137
+ | `max_wallclock_min` | null (opt-in) | **soft** — self-reported; not harness-enforced in v1 |
138
+
139
+ **Cross-section thrash detection:** halt the whole run — don't just retry locally — when the last two sections both ended BLOCKED, **or** when a section's failure-mode error class repeats the prior section's (read the failure-mode notes in `learnings.md` to compare). A failed attempt always leaves a note, so the thrash signal accumulates across sections; the breaker is the last resort once that signal shows the loop is stuck, not the first line of defense.
140
+
141
+ **On any trip:** stop, state the trip reason and a one-line diagnosis (what cycled, what was tried), and hand to the user. Do not continue past a tripped breaker.
142
+
143
+ > **Enforcement honesty:** the section and BLOCKED caps are orchestrator behavior the agent follows; the cost and wall-clock caps are *soft* — the agent self-reports against them and they are not enforced by the harness in v1. A hard, code-enforced breaker is a deferred follow-up.
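As a rough illustration of the between-sections check, here is a Python sketch of how an orchestrator might evaluate the caps and the thrash rule. It is not the package's implementation (the text above says a code-enforced breaker is deferred). The names `Limits`, `SectionResult`, and `trip_reason` are hypothetical, the `error_class` label is assumed to be derived from the failure-mode notes, and treating a cap as tripped when the count reaches it (`>=`) is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Limits:
    # Defaults copied from the caps table; cost and wall-clock are opt-in (None = off).
    max_sections_without_checkpoint: int = 5
    consecutive_blocked: int = 2
    max_cost_usd: Optional[float] = None
    max_wallclock_min: Optional[float] = None


@dataclass
class SectionResult:
    blocked: bool
    error_class: Optional[str] = None  # hypothetical label read from failure-mode notes


def trip_reason(
    history: List[SectionResult],
    limits: Limits,
    sections_since_checkpoint: int,
    cost_usd: float = 0.0,
    elapsed_min: float = 0.0,
) -> Optional[str]:
    """Return the name of the tripped cap, or None if the run may continue."""
    n = limits.consecutive_blocked
    if n > 0 and len(history) >= n and all(s.blocked for s in history[-n:]):
        return "consecutive_blocked"
    # Thrash: this section's error class repeats the prior section's.
    if len(history) >= 2:
        prev, last = history[-2], history[-1]
        if last.error_class is not None and last.error_class == prev.error_class:
            return "repeated_error_class"
    if sections_since_checkpoint >= limits.max_sections_without_checkpoint:
        return "max_sections_without_checkpoint"
    # Soft caps: values are self-reported by the agent, not measured by a harness.
    if limits.max_cost_usd is not None and cost_usd >= limits.max_cost_usd:
        return "max_cost_usd"
    if limits.max_wallclock_min is not None and elapsed_min >= limits.max_wallclock_min:
        return "max_wallclock_min"
    return None
```

A non-`None` result corresponds to "stop, state the trip reason and a one-line diagnosis, and hand to the user".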
144
+
118
145
  ### Session Management — MANDATORY Fresh Context Per Section
119
146
 
120
147
  **Do NOT implement all tasks in a single conversation context.** Context accumulates across tasks and degrades output quality — the LLM starts hallucinating based on stale file contents it read 5 tasks ago. This is not a suggestion. Fresh context per task section is required.
121
148
 
149
+ **Size one task to one context.** The goal is not statelessness for its own sake — a task should be small enough that one coherent context carries it start to finish (stateful *within* a task), and context is reset *between* tasks. If a single task overflows its context mid-flight, that is a **smell that the task is too big** — split the spec, don't paper over it with a stateless restart loop. Fresh-context-per-section gives you the "reset between" half for free; keeping tasks small gives you the "continuity within" half.
150
+
122
151
  Each task section in `tasks.md` has a `<!-- context: ... -->` block listing the exact files needed. This is the loading list for that section's fresh context.
123
152
 
124
153
  #### Claude Code: Subagent Per Section
@@ -132,6 +161,12 @@ The parent agent is the **orchestrator only** — it does NOT implement tasks it
132
161
  find section <N>, and implement all unchecked tasks in that section.
133
162
  Follow the red-green BDD cycle for each task. Mark tasks [x] when done.
134
163
 
164
+ Use `.grimoire/changes/<change-id>/learnings.md` as working memory: read a
165
+ task's failure-mode notes before retrying it and don't repeat a recorded dead
166
+ end; append a failure-mode note after any failed attempt; prune them when the
167
+ task goes green; append durable project facts to Discovered facts with their
168
+ home (never to AGENTS.md). Never weaken or delete a test to force green.
169
+
135
170
  Before writing any production code, read `../references/code-quality.md`,
136
171
  `../references/testing-contracts.md`, and `../references/pattern-guard.md`.
137
172
  Apply the code-quality rules WHILE you write (not after) — reuse before write,
@@ -255,11 +290,14 @@ Work through `tasks.md` sequentially. **Every task follows the same cycle: test
255
290
  - Assertions check behavior, not just types or existence — "response status is 302 and redirect URL is /dashboard/" not "response is not None"
256
291
  - If you wrote a test that would pass against a null/trivial implementation, strengthen it
257
292
  10. **Code quality check:** Walk the seven-point checklist in `../references/code-quality.md` against every file you changed. Any fail → fix code, re-run tests, re-check. Do not mark `[x]` while a check fails.
258
- 11. Mark complete: `- [ ]` `- [x]`
259
- 12. Move to next task
293
+ 11. **Reconcile working memory:** prune this task's failure-mode notes from `learnings.md`: it's green, so they've served their purpose. If you learned a durable project fact while implementing (a build flag, a convention, an undocumented contract, an architectural constraint), append it to the **Discovered facts** section with its destination home. Don't write it into `AGENTS.md` and don't leave it only in context.
294
+ 12. Mark complete: `- [ ]` → `- [x]`
295
+ 13. Move to next task
260
296
 
261
297
  **This is strict red-green BDD.** A test that has never been red has never proven it can catch a failure. The red step is NOT a formality — it is the proof that the test works. If you skip it or the test passes immediately, you have a false positive that provides zero safety.
262
298
 
299
+ **Never game the gate (reward-hack guard).** When a test won't pass, fix the production code — never weaken or delete the test to force green. Deleting a test, loosening an assertion to match wrong output, narrowing what it checks, or skipping/`xfail`-ing it to get a green run is **stop-and-flag**, not a valid completion. The gate is the convergence signal; gaming it produces plausible-wrong code faster. If a test genuinely encodes the wrong expectation, that is a spec problem — STOP and go back to draft, don't quietly edit the test to pass.
300
+
263
301
  **Step definition rules:**
264
302
  - Organize by domain concept, not by feature file
265
303
  - Shared steps go in the project's common step location (check existing test setup)
@@ -290,16 +328,30 @@ When all tests are green. Features, decisions, and constraints were edited live
290
328
  2. Constraints (`.grimoire/docs/constraints.md`) were edited in place — nothing to move.
291
329
  3. If the change has a `data.yml` (schema delta), apply its `add`/`modify`/`remove` entries to the live `.grimoire/docs/data/schema.yml` so the baseline schema stays current. `data.yml` is a migration-delta spec (ephemeral scaffolding carrying nullability/safety/ordering intent a raw diff wouldn't), not a copy of the schema — `schema.yml` is the live target; the delta is discarded with the change folder.
292
330
  4. Refresh the project overview: run `grimoire docs`. It regenerates `.grimoire/docs/OVERVIEW.md` (the human entry point) from the now-current features, constraints, decisions, and schema — superseded decisions drop out automatically. This is the existing `docs` command, not a new one.
293
- 5. Remove the change directory `.grimoire/changes/<change-id>/`. Its `manifest.md` + `tasks.md` (+ any `data.yml`) and the `draft.md` design doc are ephemeral process scaffolding. `draft.md` was retained read-only through the pipeline as the agreed-design reference; this is its closing deletion. The durable record is the branch, the PR, and `git log` linked by the `Change: <change-id>` trailer; git history still preserves `draft.md` if ever needed. **There is no archive tree** (don't reinvent git history).
331
+ 5. Reconcile `learnings.md`: for each entry under **Discovered facts**, write it into the home it names: an area doc (`.grimoire/docs/<area>.md`), a decision, a constraint, or `schema.yml`. Confirm the routing with the user (it's correctable) and drop stale ones. Failure-mode notes are discarded, not promoted. This is the one place facts learned during apply enter the durable record; `AGENTS.md` is never the destination.
332
+ 6. Remove the change directory `.grimoire/changes/<change-id>/`. Its `manifest.md` + `tasks.md` + `learnings.md` (+ any `data.yml`) and the `draft.md` design doc are ephemeral process scaffolding. `draft.md` was retained read-only through the pipeline as the agreed-design reference; this is its closing deletion.
333
+
334
+ **Guard — never delete uncommitted scaffolding.** `git log` only preserves what was committed. If `draft.md`/`tasks.md`/`manifest.md`/`learnings.md` were never committed (e.g. draft and plan ran without intermediate commits), deleting them now loses them permanently — there is no recovering an untracked file. Before removing the folder, verify it is in history:
335
+ ```
336
+ git ls-files --error-unmatch .grimoire/changes/<change-id>/draft.md
337
+ ```
338
+ If that errors (untracked), or `git status` shows uncommitted edits under the change folder, **commit the scaffolding first** (see step 8 — this becomes the first of two commits), then delete. If you cannot commit, STOP and tell the user rather than deleting.
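The guard above could be automated along these lines. This is a Python sketch with hypothetical helper names (`is_tracked`, `has_uncommitted_changes`, `safe_to_delete`). It relies only on `git ls-files --error-unmatch`, which exits non-zero for an untracked path, and `git status --porcelain`, which prints nothing when the given path is clean. The `run` parameter is an assumption added so the check can be exercised without a real repository.

```python
import subprocess


def is_tracked(path: str, run=subprocess.run) -> bool:
    """True if git tracks `path` (`git ls-files --error-unmatch` exits 0)."""
    result = run(
        ["git", "ls-files", "--error-unmatch", path],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


def has_uncommitted_changes(folder: str, run=subprocess.run) -> bool:
    """True if `git status --porcelain` reports anything under `folder`."""
    result = run(
        ["git", "status", "--porcelain", "--", folder],
        capture_output=True,
        text=True,
    )
    return bool(result.stdout.strip())


def safe_to_delete(folder: str, files, run=subprocess.run) -> bool:
    """The folder may be removed only if every file is tracked and nothing is pending."""
    all_tracked = all(is_tracked(f"{folder}/{name}", run) for name in files)
    return all_tracked and not has_uncommitted_changes(folder, run)
```

If `safe_to_delete` returns `False`, the text above applies: commit the scaffolding first, or stop and tell the user.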
339
+
340
+ The durable record is the branch, the PR, and `git log` — linked by the `Change: <change-id>` trailer; once committed, git history preserves `draft.md` if ever needed. **There is no archive tree** (don't reinvent git history).
294
341
 
295
342
  ### 8. Commit
296
343
 
297
- Finalize must be complete before committing — the commit captures the finished state (accepted decisions, cleared scaffolding), not mid-flight change artefacts.
344
+ The commit captures the finished state (accepted decisions, live artifacts, cleared scaffolding), not mid-flight change artefacts.
345
+
346
+ **Order depends on whether the scaffolding is already in history (see step 6's guard):**
347
+
348
+ - **Scaffolding already committed** (draft/plan committed earlier, the normal case): finalize fully — including the folder removal — then make one commit capturing the accepted state and the deletion.
349
+ - **Scaffolding NOT yet committed** (this is the change's first commit): you cannot delete-then-commit, or the scaffolding is lost forever. Make **two commits**: (1) commit the implementation, live artifacts, and the still-present change folder so history preserves `draft.md`/`tasks.md`; (2) remove the folder and commit the deletion. Both carry the `Change: <change-id>` trailer.
298
350
 
299
- Stage the live artifacts and the scaffolding removal:
351
+ Stage the live artifacts (and, in the single-commit case, the scaffolding removal):
300
352
  ```
301
353
  git add features/ .grimoire/decisions/ .grimoire/docs/ src/ tests/
302
- git add -u # picks up the removed change directory
354
+ git add -u # picks up the removed change directory (single-commit case)
303
355
  ```
304
356
 
305
357
  Then commit using `/grimoire:commit` (reads change context for the message) or write a manual message following `AGENTS.md` commit trailer conventions:
@@ -0,0 +1,40 @@
1
+ # Learnings — <change-id>
2
+
3
+ <!--
4
+ Ephemeral working memory for this change. Lives only in
5
+ `.grimoire/changes/<change-id>/` and is **removed at finalize** with the rest
6
+ of the scaffolding — nothing here persists to the repo. Re-read it at the start
7
+ of every task section and before every retry.
8
+
9
+ Two sections, two lifecycles. Keep them separate; never write either into
10
+ `AGENTS.md`.
11
+ -->
12
+
13
+ ## Failure-mode notes
14
+
15
+ <!--
16
+ Transient. One line per dead end: what was tried and why it failed, so the next
17
+ attempt does not repeat it. This is the antidote to thrashing — a stuck retry
18
+ MUST read this section first. Pruned per task: delete a task's entries the
19
+ moment that task goes green. Never promoted anywhere.
20
+ -->
21
+
22
+ Format: `- <task-id> · tried <approach> · failed: <observed error / why>`
23
+
24
+ - 2.2 · tried mocking the client wrapper · failed: mock satisfied an assertion prod code never reaches — mock at the HTTP boundary instead
25
+
26
+ ## Discovered facts
27
+
28
+ <!--
29
+ Durable facts about the project learned while implementing — a build flag, a
30
+ convention, an undocumented contract, an architectural constraint. Staged here
31
+ only until reconciled into the one home that owns that fact at finalize, then
32
+ cleared. Recording the destination home makes reconciliation mechanical and
33
+ lets the user correct the routing — that reconciliation is what keeps the fact
34
+ from going stale, because it then lives where the project's own changes keep it
35
+ honest.
36
+ -->
37
+
38
+ Format: `- fact: <what was learned> → home: <area doc | decision | constraint | schema | feature>`
39
+
40
+ - fact: the bdd suite needs `TZ=UTC` or time-based scenarios flake → home: `.grimoire/docs/<area>.md`
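Because each Discovered-facts entry records its destination, finalize-time reconciliation can start from a simple parse. The following Python sketch assumes the `- fact: … → home: …` line format shown in the template; `Fact`, `parse_fact`, and `group_by_home` are hypothetical names, not part of grimoire.

```python
import re
from typing import Dict, Iterable, List, NamedTuple, Optional


class Fact(NamedTuple):
    fact: str
    home: str


# Matches the template's entry format: "- fact: <what> → home: <where>"
FACT_RE = re.compile(r"^- fact: (?P<fact>.+?) → home: (?P<home>.+)$")


def parse_fact(line: str) -> Optional[Fact]:
    """Parse one Discovered-facts entry; return None for any other line."""
    match = FACT_RE.match(line.strip())
    if match is None:
        return None
    # Strip the backticks that may wrap the home path in the markdown source.
    return Fact(match["fact"].strip(), match["home"].strip().strip("`"))


def group_by_home(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Collect facts per destination so each home is edited once at finalize."""
    grouped: Dict[str, List[str]] = {}
    for line in lines:
        parsed = parse_fact(line)
        if parsed is not None:
            grouped.setdefault(parsed.home, []).append(parsed.fact)
    return grouped
```

The grouped result is only a routing proposal; the finalize step still confirms each destination with the user before writing.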