@kiwidata/grimoire 0.2.1 → 0.2.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/AGENTS.md +7 -2
- package/package.json +1 -1
- package/skills/grimoire-apply/SKILL.md +62 -10
- package/templates/learnings.md +40 -0
package/AGENTS.md
CHANGED
````diff
@@ -61,6 +61,10 @@ After any failure, state what you observe before proposing a fix. One sentence:
 
 This applies especially to test failures. "The test failed" is not a diagnosis. "The test expected `302` but got `200` because the redirect middleware isn't registered in the test client" is.
 
+### Loop-level breaker (autonomous apply)
+
+The attempt budget above is per-problem. Autonomous `grimoire-apply` adds a run-level circuit breaker and cross-section thrash detection on top of it — see `grimoire-apply` SKILL.md. Don't duplicate the per-problem rules there; the breaker is the loop-scale backstop, this protocol is the per-problem one.
+
 ## When to Use Grimoire
 
 Use grimoire when the user's request involves:
````
````diff
@@ -178,7 +182,7 @@ If a task seems wrong or impossible during apply:
 
 ## Directory Structure
 
-Features, decisions, constraints, and schema are edited **live on the feature branch** — `git diff` is the staging area. A change folder holds only the ephemeral coordination artifacts (manifest
+Features, decisions, constraints, and schema are edited **live on the feature branch** — `git diff` is the staging area. A change folder holds only the ephemeral coordination artifacts (manifest, tasks, and the apply-maintained learnings file) and is removed at finalize; the PR diff and git history are the record. There is no proposed-copy tree and no archive tree.
 
 ```
 project-root/
````
````diff
@@ -193,7 +197,8 @@ project-root/
 │ └── changes/ # ephemeral per-change coordination — removed at finalize
 │ └── <change-id>/
 │ ├── manifest.md
-│
+│ ├── tasks.md
+│ └── learnings.md # apply working memory: failure-mode notes + discovered facts
 ```
 
 ## Conventions
````
package/skills/grimoire-apply/SKILL.md
CHANGED
````diff
@@ -21,7 +21,7 @@ Implement tasks from a planned grimoire change using **test-first discipline at
 
 Do NOT write a `.feature` scenario for a `unit-invariant` or `characterization` task — forcing Gherkin where a unit test is correct is the antipattern that fills feature files with slop. One right way: behavior → scenario, everything else → unit test.
 
-**Artifacts are edited live on the feature branch.** Features, decisions, constraints, and schema are real files in `features/`, `.grimoire/decisions/`, `.grimoire/docs/`. There is no copy-into-change-folder and no promote step — `git diff` is the staging area. The change folder holds only ephemeral process scaffolding (`manifest.md`, `tasks.md`).
+**Artifacts are edited live on the feature branch.** Features, decisions, constraints, and schema are real files in `features/`, `.grimoire/decisions/`, `.grimoire/docs/`. There is no copy-into-change-folder and no promote step — `git diff` is the staging area. The change folder holds only ephemeral process scaffolding (`manifest.md`, `tasks.md`, and the apply-maintained `learnings.md`).
 
 ## CRITICAL: Two Rules That Must Not Be Broken
 
````
````diff
@@ -84,16 +84,26 @@ If the user doesn't specify, default to review mode.
 
 **Both modes:** Update `tasks.md` in real time as work progresses. Mark tasks `- [x]` the moment they pass. If a task is split, reordered, or new tasks are discovered during implementation, update `tasks.md` immediately so it always reflects the current state. The task list is the source of truth for progress — if the session is interrupted, the next agent should be able to read `tasks.md` and know exactly where to resume.
 
+### Working Memory: `learnings.md`
+
+Apply keeps one ephemeral file, `.grimoire/changes/<change-id>/learnings.md` (create it from `templates/learnings.md` the first time you need it). It is the loop's memory between attempts and sessions, and it is **removed at finalize** with the rest of the change folder — nothing in it reaches the repo. Two sections, two lifecycles:
+
+- **Failure-mode notes** — transient. After a failed attempt, append one line: what you tried and why it failed. Before any retry, read this section so you don't repeat a dead end. Prune a task's notes the moment it goes green. Never promote them.
+- **Discovered facts** — durable facts about the project learned while implementing (a build flag, a convention, an undocumented contract). Stage them here with their destination home; at finalize they are reconciled into that one home and cleared. Do **not** write them into `AGENTS.md`.
+
+Subagents and fresh sessions read and append to this file the same way they use `tasks.md` — it is shared state on disk, not context-window memory.
+
 ### Stuck Detection & Recovery
 
 **You MUST track failed attempts per task.** If a test won't go green, count your attempts:
 
+- **Before any attempt past the first:** read the task's **failure-mode notes** in `learnings.md`. Do not repeat an approach already recorded there as failed.
 - **Attempt 1:** Try the straightforward implementation from the task description.
-- **Attempt 2:** If attempt 1 failed, re-read the error carefully
-- **Attempt 3 (final):** If attempt 2 failed, try one more *fundamentally different* approach. If the same error recurs, the problem is likely not in your implementation.
+- **Attempt 2:** If attempt 1 failed, append a failure-mode note (`<task-id> · tried … · failed: …`), re-read the error carefully, then try a *different* approach — not the same code with minor tweaks. State what you're doing differently and why.
+- **Attempt 3 (final):** If attempt 2 failed, append the second dead end as a failure-mode note, then try one more *fundamentally different* approach. If the same error recurs, the problem is likely not in your implementation.
 
 **After 3 failed attempts on a single task, STOP.** Do not continue. Instead:
-1. Add a comment to `tasks.md` under the task: `<!-- BLOCKED: <summary
+1. Add a comment to `tasks.md` under the task: `<!-- BLOCKED: <summary> -->` (the full trail is already in the failure-mode notes)
 2. Present to the user:
 - What the task requires
 - What you tried (all 3 approaches, briefly)
````
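The attempt budget and the failure-mode note lifecycle in this hunk are instructions to the agent; the package ships no code for them. As a rough illustration only, the rules can be read as a small decision function over the notes in `learnings.md`. This Python sketch assumes the template's one-line note format and exact-string comparison of approaches; every function name in it (`notes_for`, `append_note`, `prune_task`, `next_step`) is hypothetical.

```python
import re
from typing import Dict, List, Optional

# Assumed note format, taken from templates/learnings.md:
#   - <task-id> · tried <approach> · failed: <observed error / why>
NOTE_RE = re.compile(r"^- (?P<task>\S+) · tried (?P<approach>.+?) · failed: (?P<why>.+)$")

MAX_ATTEMPTS = 3  # "After 3 failed attempts on a single task, STOP."


def _task_of(line: str) -> Optional[str]:
    """Task id of a note line, or None if the line is not a note."""
    m = NOTE_RE.match(line)
    return m["task"] if m else None


def notes_for(notes: List[str], task_id: str) -> List[Dict[str, str]]:
    """Parsed failure-mode notes for one task; read these before any retry."""
    return [NOTE_RE.match(n).groupdict() for n in notes if _task_of(n) == task_id]


def append_note(notes: List[str], task_id: str, approach: str, why: str) -> List[str]:
    """Record one dead end after a failed attempt."""
    return notes + [f"- {task_id} · tried {approach} · failed: {why}"]


def prune_task(notes: List[str], task_id: str) -> List[str]:
    """Drop a task's notes the moment it goes green; other tasks keep theirs."""
    return [n for n in notes if _task_of(n) != task_id]


def next_step(notes: List[str], task_id: str, proposed: str) -> str:
    """'attempt', 'repeat' (already recorded as failed), or 'blocked' (budget spent)."""
    dead_ends = notes_for(notes, task_id)
    if len(dead_ends) >= MAX_ATTEMPTS:
        return "blocked"  # add <!-- BLOCKED: <summary> --> to tasks.md, ask the user
    if any(d["approach"] == proposed for d in dead_ends):
        return "repeat"  # exact-match simplification of "same approach"
    return "attempt"
```

Exact string matching is the weakest possible notion of "same approach". In practice the agent makes that judgment itself, which is why the skill asks it to state what it is doing differently and why.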
````diff
@@ -115,10 +125,29 @@ If the user doesn't specify, default to review mode.
 
 **Never silently retry the same approach.** If your implementation produced error X and you're about to write code that will produce error X again, stop and think about why. If you can't identify what would change the outcome, stop and ask.
 
+### Circuit Breaker & Cross-Section Thrash (Autonomous Mode)
+
+The per-task 3-attempt cap bounds a single task; it cannot see the *run* cycling. Autonomous mode adds a loop-level breaker the parent orchestrator checks **between sections**. Caps live under `llm.coding.limits` in `.grimoire/config.yaml`:
+
+| Cap | Default | Kind |
+|-----|---------|------|
+| `max_sections_without_checkpoint` | 5 | followable — halt and checkpoint with the user |
+| `consecutive_blocked` | 2 | followable — two BLOCKED sections in a row → halt |
+| `max_cost_usd` | null (opt-in) | **soft** — self-reported; not harness-enforced in v1 |
+| `max_wallclock_min` | null (opt-in) | **soft** — self-reported; not harness-enforced in v1 |
+
+**Cross-section thrash detection:** halt the whole run — don't just retry locally — when the last two sections both ended BLOCKED, **or** when a section's failure-mode error class repeats the prior section's (read the failure-mode notes in `learnings.md` to compare). A failed attempt always leaves a note, so the thrash signal accumulates across sections; the breaker is the last resort once that signal shows the loop is stuck, not the first line of defense.
+
+**On any trip:** stop, state the trip reason and a one-line diagnosis (what cycled, what was tried), and hand to the user. Do not continue past a tripped breaker.
+
+> **Enforcement honesty:** the section and BLOCKED caps are orchestrator behavior the agent follows; the cost and wall-clock caps are *soft* — the agent self-reports against them and they are not enforced by the harness in v1. A hard, code-enforced breaker is a deferred follow-up.
+
 ### Session Management — MANDATORY Fresh Context Per Section
 
 **Do NOT implement all tasks in a single conversation context.** Context accumulates across tasks and degrades output quality — the LLM starts hallucinating based on stale file contents it read 5 tasks ago. This is not a suggestion. Fresh context per task section is required.
 
+**Size one task to one context.** The goal is not statelessness for its own sake — a task should be small enough that one coherent context carries it start to finish (stateful *within* a task), and context is reset *between* tasks. If a single task overflows its context mid-flight, that is a **smell that the task is too big** — split the spec, don't paper over it with a stateless restart loop. Fresh-context-per-section gives you the "reset between" half for free; keeping tasks small gives you the "continuity within" half.
+
 Each task section in `tasks.md` has a `<!-- context: ... -->` block listing the exact files needed. This is the loading list for that section's fresh context.
 
 #### Claude Code: Subagent Per Section
````
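The breaker is described as orchestrator behavior, with the cost and wall-clock caps explicitly soft. A minimal sketch of the check a parent orchestrator might run between sections is below. It assumes the defaults from the cap table in this hunk, assumes each section reports whether it ended BLOCKED plus an error class distilled from its failure-mode notes, and does not read `.grimoire/config.yaml`; `SectionResult`, `RunState`, and `check_breaker` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Defaults mirror the cap table; reading llm.coding.limits from
# .grimoire/config.yaml is deliberately left out of this sketch.
DEFAULT_LIMITS = {
    "max_sections_without_checkpoint": 5,
    "consecutive_blocked": 2,
    "max_cost_usd": None,       # soft, opt-in, self-reported
    "max_wallclock_min": None,  # soft, opt-in, self-reported
}


@dataclass
class SectionResult:
    blocked: bool
    error_class: Optional[str] = None  # assumed: distilled from failure-mode notes


@dataclass
class RunState:
    results: List[SectionResult] = field(default_factory=list)
    sections_since_checkpoint: int = 0
    cost_usd: float = 0.0       # self-reported, not measured by a harness
    wallclock_min: float = 0.0  # self-reported, not measured by a harness


def check_breaker(state: RunState, limits: Optional[dict] = None) -> Optional[str]:
    """Run between sections. Returns a trip reason, or None to continue."""
    limits = limits or DEFAULT_LIMITS
    if state.sections_since_checkpoint >= limits["max_sections_without_checkpoint"]:
        return "checkpoint: section cap reached"
    # Thrash signal 1: the last N sections all ended BLOCKED.
    n = limits["consecutive_blocked"]
    recent = state.results[-n:]
    if len(recent) == n and all(r.blocked for r in recent):
        return f"thrash: last {n} sections BLOCKED"
    # Thrash signal 2: this section's error class repeats the previous one's.
    if len(state.results) >= 2:
        prev, cur = state.results[-2], state.results[-1]
        if cur.error_class is not None and cur.error_class == prev.error_class:
            return f"thrash: error class repeated ({cur.error_class})"
    # Soft caps only trip if the user opted in with a number.
    cost_cap = limits["max_cost_usd"]
    if cost_cap is not None and state.cost_usd >= cost_cap:
        return "soft cap: cost"
    time_cap = limits["max_wallclock_min"]
    if time_cap is not None and state.wallclock_min >= time_cap:
        return "soft cap: wall-clock"
    return None
```

Incrementing `sections_since_checkpoint`, and resetting it after a user checkpoint, is left to the caller. Any non-`None` return means stop, report the reason with a one-line diagnosis, and hand to the user.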
````diff
@@ -132,6 +161,12 @@ The parent agent is the **orchestrator only** — it does NOT implement tasks it
 find section <N>, and implement all unchecked tasks in that section.
 Follow the red-green BDD cycle for each task. Mark tasks [x] when done.
 
+Use `.grimoire/changes/<change-id>/learnings.md` as working memory: read a
+task's failure-mode notes before retrying it and don't repeat a recorded dead
+end; append a failure-mode note after any failed attempt; prune them when the
+task goes green; append durable project facts to Discovered facts with their
+home (never to AGENTS.md). Never weaken or delete a test to force green.
+
 Before writing any production code, read `../references/code-quality.md`,
 `../references/testing-contracts.md`, and `../references/pattern-guard.md`.
 Apply the code-quality rules WHILE you write (not after) — reuse before write,
````
````diff
@@ -255,11 +290,14 @@ Work through `tasks.md` sequentially. **Every task follows the same cycle: test
 - Assertions check behavior, not just types or existence — "response status is 302 and redirect URL is /dashboard/" not "response is not None"
 - If you wrote a test that would pass against a null/trivial implementation, strengthen it
 10. **Code quality check:** Walk the seven-point checklist in `../references/code-quality.md` against every file you changed. Any fail → fix code, re-run tests, re-check. Do not mark `[x]` while a check fails.
-11.
-12.
+11. **Reconcile working memory:** prune this task's failure-mode notes from `learnings.md` — it's green, they've served their purpose. If you learned a durable project fact while implementing (a build flag, a convention, an undocumented contract, an architectural constraint), append it to the **Discovered facts** section with its destination home — don't write it into `AGENTS.md` and don't leave it only in context.
+12. Mark complete: `- [ ]` → `- [x]`
+13. Move to next task
 
 **This is strict red-green BDD.** A test that has never been red has never proven it can catch a failure. The red step is NOT a formality — it is the proof that the test works. If you skip it or the test passes immediately, you have a false positive that provides zero safety.
 
+**Never game the gate (reward-hack guard).** When a test won't pass, fix the production code — never weaken or delete the test to force green. Deleting a test, loosening an assertion to match wrong output, narrowing what it checks, or skipping/`xfail`-ing it to get a green run is **stop-and-flag**, not a valid completion. The gate is the convergence signal; gaming it produces plausible-wrong code faster. If a test genuinely encodes the wrong expectation, that is a spec problem — STOP and go back to draft, don't quietly edit the test to pass.
+
 **Step definition rules:**
 - Organize by domain concept, not by feature file
 - Shared steps go in the project's common step location (check existing test setup)
````
````diff
@@ -290,16 +328,30 @@ When all tests are green. Features, decisions, and constraints were edited live
 2. Constraints (`.grimoire/docs/constraints.md`) were edited in place — nothing to move.
 3. If the change has a `data.yml` (schema delta), apply its `add`/`modify`/`remove` entries to the live `.grimoire/docs/data/schema.yml` so the baseline schema stays current. `data.yml` is a migration-delta spec (ephemeral scaffolding carrying nullability/safety/ordering intent a raw diff wouldn't), not a copy of the schema — `schema.yml` is the live target; the delta is discarded with the change folder.
 4. Refresh the project overview: run `grimoire docs`. It regenerates `.grimoire/docs/OVERVIEW.md` (the human entry point) from the now-current features, constraints, decisions, and schema — superseded decisions drop out automatically. This is the existing `docs` command, not a new one.
-5.
+5. Reconcile `learnings.md`: for each entry under **Discovered facts**, write it into the home it names — an area doc (`.grimoire/docs/<area>.md`), a decision, a constraint, or `schema.yml`. Confirm the routing with the user (it's correctable) and drop stale ones. Failure-mode notes are discarded, not promoted. This is the one place facts learned during apply enter the durable record — `AGENTS.md` is never the destination.
+6. Remove the change directory `.grimoire/changes/<change-id>/`. Its `manifest.md` + `tasks.md` + `learnings.md` (+ any `data.yml`) and the `draft.md` design doc are ephemeral process scaffolding. `draft.md` was retained read-only through the pipeline as the agreed-design reference; this is its closing deletion.
+
+**Guard — never delete uncommitted scaffolding.** `git log` only preserves what was committed. If `draft.md`/`tasks.md`/`manifest.md`/`learnings.md` were never committed (e.g. draft and plan ran without intermediate commits), deleting them now loses them permanently — there is no recovering an untracked file. Before removing the folder, verify it is in history:
+```
+git ls-files --error-unmatch .grimoire/changes/<change-id>/draft.md
+```
+If that errors (untracked), or `git status` shows uncommitted edits under the change folder, **commit the scaffolding first** (see step 8 — this becomes the first of two commits), then delete. If you cannot commit, STOP and tell the user rather than deleting.
+
+The durable record is the branch, the PR, and `git log` — linked by the `Change: <change-id>` trailer; once committed, git history preserves `draft.md` if ever needed. **There is no archive tree** (don't reinvent git history).
 
 ### 8. Commit
 
-
+The commit captures the finished state — accepted decisions, live artifacts, cleared scaffolding — not mid-flight change artefacts.
+
+**Order depends on whether the scaffolding is already in history (see step 6's guard):**
+
+- **Scaffolding already committed** (draft/plan committed earlier, the normal case): finalize fully — including the folder removal — then make one commit capturing the accepted state and the deletion.
+- **Scaffolding NOT yet committed** (this is the change's first commit): you cannot delete-then-commit, or the scaffolding is lost forever. Make **two commits**: (1) commit the implementation, live artifacts, and the still-present change folder so history preserves `draft.md`/`tasks.md`; (2) remove the folder and commit the deletion. Both carry the `Change: <change-id>` trailer.
 
-Stage the live artifacts and the scaffolding removal:
+Stage the live artifacts (and, in the single-commit case, the scaffolding removal):
 ```
 git add features/ .grimoire/decisions/ .grimoire/docs/ src/ tests/
-git add -u # picks up the removed change directory
+git add -u # picks up the removed change directory (single-commit case)
 ```
 
 Then commit using `/grimoire:commit` (reads change context for the message) or write a manual message following `AGENTS.md` commit trailer conventions:
````
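Step 6's guard and the two commit orders in step 8 reduce to one question: is the scaffolding already safely in history? A minimal sketch, assuming the scaffolding lives at the path the skill names and that `git` is on `PATH`. The function names are hypothetical, and the "if you cannot commit, STOP" branch is left to the caller.

```python
import subprocess
from typing import List


def scaffolding_state(change_dir: str, repo: str = ".") -> dict:
    """Report whether draft.md is tracked and whether the change folder has uncommitted edits."""
    # Exit status 0 means git knows the file; non-zero means it is untracked.
    tracked = subprocess.run(
        ["git", "ls-files", "--error-unmatch", f"{change_dir}/draft.md"],
        cwd=repo, capture_output=True,
    ).returncode == 0
    # Any porcelain output under the folder means modified or untracked files.
    status = subprocess.run(
        ["git", "status", "--porcelain", "--", change_dir],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    return {"tracked": tracked, "dirty": bool(status.strip())}


def commit_plan(tracked: bool, dirty: bool) -> List[str]:
    """Map the guard's answer onto the single- or two-commit order from step 8."""
    if tracked and not dirty:
        # Normal case: history already holds the scaffolding.
        return ["remove change folder", "commit accepted state + deletion"]
    # First commit for this change, or uncommitted edits: preserve, then delete.
    return [
        "commit implementation + live artifacts + change folder",
        "remove change folder",
        "commit deletion",
    ]
```

Usage would be along the lines of `commit_plan(**scaffolding_state(".grimoire/changes/<change-id>"))`; every commit in either plan still carries the `Change: <change-id>` trailer.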
package/templates/learnings.md
ADDED
````diff
@@ -0,0 +1,40 @@
+# Learnings — <change-id>
+
+<!--
+Ephemeral working memory for this change. Lives only in
+`.grimoire/changes/<change-id>/` and is **removed at finalize** with the rest
+of the scaffolding — nothing here persists to the repo. Re-read it at the start
+of every task section and before every retry.
+
+Two sections, two lifecycles. Keep them separate; never write either into
+`AGENTS.md`.
+-->
+
+## Failure-mode notes
+
+<!--
+Transient. One line per dead end: what was tried and why it failed, so the next
+attempt does not repeat it. This is the antidote to thrashing — a stuck retry
+MUST read this section first. Pruned per task: delete a task's entries the
+moment that task goes green. Never promoted anywhere.
+-->
+
+Format: `- <task-id> · tried <approach> · failed: <observed error / why>`
+
+- 2.2 · tried mocking the client wrapper · failed: mock satisfied an assertion prod code never reaches — mock at the HTTP boundary instead
+
+## Discovered facts
+
+<!--
+Durable facts about the project learned while implementing — a build flag, a
+convention, an undocumented contract, an architectural constraint. Staged here
+only until reconciled into the one home that owns that fact at finalize, then
+cleared. Recording the destination home makes reconciliation mechanical and
+lets the user correct the routing — that reconciliation is what keeps the fact
+from going stale, because it then lives where the project's own changes keep it
+honest.
+-->
+
+Format: `- fact: <what was learned> → home: <area doc | decision | constraint | schema | feature>`
+
+- fact: the bdd suite needs `TZ=UTC` or time-based scenarios flake → home: `.grimoire/docs/<area>.md`
````
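Finalize step 5 routes each Discovered fact to the home it names, and the template's `→ home:` suffix is what makes that mechanical. A sketch of the parsing half, assuming facts follow the template's one-line format exactly and that guidance text sits in HTML comments as it does in the template. `facts_by_home` is a hypothetical helper; actually writing each fact into its home, with the user confirming the routing, is not shown.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Assumed fact format, taken from the template:
#   - fact: <what was learned> → home: <area doc | decision | constraint | schema | feature>
FACT_RE = re.compile(r"^- fact: (?P<fact>.+?) → home: (?P<home>.+)$")


def facts_by_home(learnings_md: str) -> Dict[str, List[str]]:
    """Group Discovered facts by destination home for finalize-time reconciliation."""
    # Strip guidance comments so commented-out examples are never routed.
    text = re.sub(r"<!--.*?-->", "", learnings_md, flags=re.S)
    grouped: Dict[str, List[str]] = defaultdict(list)
    in_facts = False
    for line in text.splitlines():
        if line.startswith("## "):
            # Only Discovered facts are durable; failure-mode notes are discarded.
            in_facts = line.strip() == "## Discovered facts"
            continue
        m = FACT_RE.match(line) if in_facts else None
        if not m:
            continue
        home = m["home"].strip().strip("`")
        if home.endswith("AGENTS.md"):
            raise ValueError(f"AGENTS.md is never a destination: {m['fact']}")
        grouped[home].append(m["fact"])
    return dict(grouped)
```

Treating an `AGENTS.md` destination as a hard error, rather than a warning, is a design choice of this sketch that encodes the template's "never write either into `AGENTS.md`" rule.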