@pilotspace/add 1.1.0 → 1.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (61)
  1. package/CHANGELOG.md +81 -0
  2. package/GETTING-STARTED.md +187 -139
  3. package/README.md +13 -7
  4. package/bin/cli.js +96 -5
  5. package/docs/01-principles.md +3 -3
  6. package/docs/02-the-flow.md +19 -12
  7. package/docs/03-step-1-specify.md +15 -13
  8. package/docs/04-step-2-scenarios.md +2 -2
  9. package/docs/05-step-3-contract.md +3 -3
  10. package/docs/06-step-4-tests.md +10 -2
  11. package/docs/07-step-5-build.md +3 -1
  12. package/docs/08-step-6-verify.md +25 -5
  13. package/docs/09-the-loop.md +12 -6
  14. package/docs/10-setup-and-stages.md +27 -13
  15. package/docs/11-governance.md +6 -2
  16. package/docs/12-roles.md +3 -3
  17. package/docs/13-adoption.md +1 -1
  18. package/docs/14-foundation.md +15 -15
  19. package/docs/15-foundations-and-lineage.md +106 -0
  20. package/docs/README.md +4 -0
  21. package/docs/appendix-a-templates.md +3 -3
  22. package/docs/appendix-b-prompts.md +40 -5
  23. package/docs/appendix-c-glossary.md +49 -12
  24. package/docs/appendix-d-worked-example.md +2 -2
  25. package/docs/appendix-e-checklists.md +16 -4
  26. package/docs/appendix-f-requirements-matrix.md +8 -8
  27. package/docs/appendix-g-references.md +106 -0
  28. package/package.json +1 -1
  29. package/skill/add/SKILL.md +41 -38
  30. package/skill/add/adopt.md +13 -11
  31. package/skill/add/deltas.md +8 -6
  32. package/skill/add/fold.md +19 -17
  33. package/skill/add/graduate.md +74 -0
  34. package/skill/add/intake.md +22 -7
  35. package/skill/add/loop.md +59 -0
  36. package/skill/add/phases/0-ground.md +66 -0
  37. package/skill/add/phases/0-setup.md +32 -25
  38. package/skill/add/phases/1-specify.md +28 -13
  39. package/skill/add/phases/2-scenarios.md +14 -4
  40. package/skill/add/phases/3-contract.md +27 -12
  41. package/skill/add/phases/4-tests.md +15 -5
  42. package/skill/add/phases/5-build.md +33 -4
  43. package/skill/add/phases/6-verify.md +40 -2
  44. package/skill/add/phases/7-observe.md +13 -5
  45. package/skill/add/report-template.md +65 -7
  46. package/skill/add/run.md +93 -39
  47. package/skill/add/scope.md +10 -6
  48. package/skill/add/setup-review.md +13 -10
  49. package/skill/add/streams.md +88 -23
  50. package/tooling/add.py +1817 -90
  51. package/tooling/templates/CONVENTIONS.md.tmpl +1 -1
  52. package/tooling/templates/DESIGN.md.tmpl +66 -0
  53. package/tooling/templates/GLOSSARY.md.tmpl +29 -0
  54. package/tooling/templates/MILESTONE.md.tmpl +1 -0
  55. package/tooling/templates/PROJECT.md.tmpl +6 -3
  56. package/tooling/templates/TASK.md.tmpl +55 -15
  57. package/tooling/templates/catalog.sample.json +38 -0
  58. package/tooling/templates/prototype.sample.json +48 -0
  59. package/tooling/templates/tokens.sample.json +55 -0
  60. package/tooling/templates/udd-catalog.md +122 -0
  61. package/tooling/templates/udd-tokens.md +79 -0
@@ -23,8 +23,8 @@ Reject:
  - source == destination -> "same_account"
  - balance < amount -> "insufficient_funds"
  - account not mine -> "forbidden"
- Assumptions — least-sure first:
- ⚠ same currency only (no FX) in v1 — least sure because the ticket never said; if wrong: the amount/rounding model changes and this contract is wrong
+ Assumptions — lowest-confidence first:
+ ⚠ same currency only (no FX) in v1 — lowest confidence because the ticket never said; if wrong: the amount/rounding model changes and this contract is wrong
  - [x] no daily limit in v1 — confirmed: out of scope for v1
  ```
 
@@ -18,7 +18,8 @@ Every exit check in the book, collected for quick use. Print this page.
  - [ ] Every required behavior stated explicitly.
  - [ ] Every rejection has a named error code.
  - [ ] Success state-change described.
- - [ ] Assumptions ranked least-sure first; the 1–2 most-likely-wrong ⚠-flagged with why + cost (or an honest "none material" that still names the single biggest risk).
+ - [ ] Assumptions ranked lowest-confidence first; the 1–2 most-likely-wrong ⚠-flagged with why + cost (or an honest "none material" that still names the single biggest risk).
+ - [ ] "Existing behavior" assumptions carry grep/line citations; wiring claims name the production caller chain.
 
  ## Step 2 — Scenarios
 
@@ -40,6 +41,9 @@ Every exit check in the book, collected for quick use. Print this page.
  - [ ] Suite runs in the pipeline and is red for the right reason.
  - [ ] Tests assert behavior, not internals.
  - [ ] Coverage target recorded.
+ - [ ] No `should_panic` lying reds — unimplemented paths use `todo!()` so they fail.
+ - [ ] Collateral tests for globally-enumerated things listed by exact name.
+ - [ ] Arithmetic checked: fixtures can reach green against frozen constants.
 
  ## Step 5 — Build
 
@@ -55,7 +59,10 @@ Every exit check in the book, collected for quick use. Print this page.
  - [ ] Concurrency/timing of the risky operation is safe.
  - [ ] No exposed secrets, injection, or unexpected dependencies.
  - [ ] Layering and dependencies follow `CONVENTIONS.md`.
- - [ ] A person reviewed and approved.
+ - [ ] Deep check: wiring trace recorded (every new symbol reachable from production entry point) and no dead code introduced.
+ - [ ] Was the green earned? Adversarial refute-read on the unchanged suite (no overfit, no vacuous asserts, no stubbed logic).
+ - [ ] Full-suite rerun by orchestrator (not only the agent's scoped run).
+ - [ ] A person reviewed and approved, **or** auto-resolved by the run (under `autonomy: auto`, no residue).
  - [ ] Outcome recorded (`PASS` / `RISK-ACCEPTED` / `HARD-STOP`).
 
  ## The loop
@@ -70,11 +77,16 @@ Every exit check in the book, collected for quick use. Print this page.
 
  A feature is shippable only when all are true:
 
- - [ ] Spec complete: behavior stated, rejections named, assumptions ranked least-sure first with the biggest risk flagged.
+ - [ ] Spec complete: behavior stated, rejections named, assumptions ranked lowest-confidence first with the biggest risk flagged.
+ - [ ] Wiring and "existing behavior" assumptions carry grep/line citations; wiring claims name the production caller chain.
  - [ ] Every rule has a scenario.
  - [ ] Contract frozen; contract tests green.
- - [ ] A test per scenario; suite was red before the build.
+ - [ ] A test per scenario; suite was red before the build (no `should_panic` lying reds).
+ - [ ] Collateral tests listed by exact name; arithmetic checked against frozen constants.
  - [ ] All tests green; coverage held; tests and contract untouched by the AI.
+ - [ ] Wiring trace recorded: every new symbol reachable from production entry point.
+ - [ ] Adversarial refute-read confirms the green was earned (no overfit, no vacuous asserts, no stubbed logic).
+ - [ ] Full-suite rerun by orchestrator; not just the agent's scoped run.
  - [ ] Concurrency, security, and architecture checked by a person.
  - [ ] Gate outcome recorded with an accountable owner.
  - [ ] Released behind a flag, with monitors in place.
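Reviewer note: the wiring-trace items added in this hunk ask that every new symbol be reachable from a production entry point. A minimal sketch of such a check, assuming the call graph is already available as a plain dictionary. The function name, the graph, and the symbol names are hypothetical illustrations, and nothing here is taken from `add.py`:

```python
from collections import deque


def unreachable_symbols(call_graph, entry_point, new_symbols):
    """Return the new symbols that cannot be reached from entry_point.

    call_graph maps a caller name to the names it calls (hypothetical shape).
    """
    seen = {entry_point}
    queue = deque([entry_point])
    # Breadth-first walk over caller -> callee edges.
    while queue:
        caller = queue.popleft()
        for callee in call_graph.get(caller, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return sorted(s for s in new_symbols if s not in seen)


# Hypothetical example: format_receipt is only called from dead code.
call_graph = {
    "main": ["handle_transfer"],
    "handle_transfer": ["validate_transfer", "apply_transfer"],
    "legacy_helper": ["format_receipt"],
}
new_symbols = {"validate_transfer", "apply_transfer", "format_receipt"}
dead = unreachable_symbols(call_graph, "main", new_symbols)
print(dead)
```

An empty result would correspond to the "no dead code introduced" item; any name in the list is a symbol with no production caller chain.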
@@ -10,11 +10,11 @@ This appendix maps every AIDD document to a three-level project hierarchy, so th
 
  | Level | What it is | AIDD meaning | Spans |
  |-------|-----------|--------------|-------|
- | **Project** | the whole product or engagement | the survivor layer — documents created once and kept for the life of the product | all milestones |
+ | **Project** | the whole product or engagement | the living documentation — documents created once and kept for the life of the product | all milestones |
  | **Milestone** | a stage or release | one pass of the flow at a chosen depth: Prototype, POC, MVP, or Production-Ready; groups many tasks | many tasks |
  | **Task** | one feature through the flow | a single pass of Specify → … → Verify → Observe; the smallest unit with its own gate records | the seven steps |
 
- A **project** sets up the survivor-layer documents once. A **milestone** is a depth-bounded goal that groups tasks and has its own entry and exit document gates. A **task** is one feature, and it produces the per-feature artifacts.
+ A **project** sets up the living documentation once. A **milestone** is a depth-bounded goal that groups tasks and has its own entry and exit document gates. A **task** is one feature, and it produces the per-feature artifacts.
 
  ## How the hierarchy decomposes
 
@@ -53,12 +53,12 @@ Which document lives at which level, who is accountable for it, and how long it
  | `SLO.md` (objectives) | Milestone (MVP+) | from MVP | from MVP onward | DevOps / SRE |
  | `SPEC.md` | Task | per feature | living | Product / Domain |
  | `features/*.feature` | Task | per feature | living | QA / Test |
- | `contracts/*.md` | Task → **Project** | per feature, then frozen | survivor (promoted to project) | Architect / Lead |
+ | `contracts/*.md` | Task → **Project** | per feature, then frozen | living doc (promoted to project) | Architect / Lead |
  | `tests/*` | Task | per feature | living | QA / Engineer |
  | Source code | Task | per feature | **disposable** | Engineer |
  | Gate outcome records | Task | per step | kept for audit | the reviewer |
 
- > Note the one promotion: a **contract** is authored at task level but, once frozen, becomes part of the project's survivor layer — other tasks depend on it. That promotion is why a contract change is a project-level change request, not a task-local edit.
+ > Note the one promotion: a **contract** is authored at task level but, once frozen, becomes part of the project's living documentation — other tasks depend on it. That promotion is why a contract change is a project-level change request, not a task-local edit.
 
  ---
 
@@ -93,13 +93,13 @@ Every task, regardless of milestone, produces this artifact chain. The depth var
 
  | Step | Required document | Exit gate (the proof) | Detail |
  |------|-------------------|------------------------|--------|
- | 1 Specify | `SPEC.md` | rules + named rejections, assumptions ranked least-sure first (biggest risk ⚠-flagged) | [03](./03-step-1-specify.md) |
+ | 1 Specify | `SPEC.md` | rules + named rejections, assumptions ranked lowest-confidence first (biggest risk ⚠-flagged) | [03](./03-step-1-specify.md) |
  | 2 Scenarios | `features/<task>.feature` | one scenario per rule | [04](./04-step-2-scenarios.md) |
  | 3 Contract | `contracts/<task>.md` | frozen + contract tests green | [05](./05-step-3-contract.md) |
  | 4 Tests | `tests/<task>_*` | one test per scenario, red first | [06](./06-step-4-tests.md) |
  | 5 Build | source code + evidence bundle | all tests green, nothing weakened | [07](./07-step-5-build.md) |
  | 6 Verify | gate outcome record | `PASS` / `RISK-ACCEPTED` / `HARD-STOP` (auto-resolved on evidence under `autonomy: auto`; security always escalates) | [08](./08-step-6-verify.md) |
- | 7 Observe | `TASK.md` §7 OBSERVE block | released behind a flag; scenario-monitors live; spec delta + competency deltas captured | [09](./09-the-loop.md) |
+ | 7 Observe | `TASK.md` §7 OBSERVE block | released behind a flag; scenario-monitors live; spec delta + lessons learned captured | [09](./09-the-loop.md) |
 
  A task is **done** when the build's documents exist and the Verify record reads `PASS` (or a signed `RISK-ACCEPTED`); the seventh step — **Observe** (§7) — then runs in production and feeds the next loop's Specify. See the master shippable checklist in [Appendix E](./appendix-e-checklists.md).
 
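Reviewer note: the Verify row in this hunk says the gate is auto-resolved on evidence under `autonomy: auto` and that security always escalates. A rough sketch of that decision, under stated assumptions: the `Evidence` fields, the `verify_gate` function, and the `ESCALATE-TO-HUMAN` label are hypothetical, and mapping a security finding to `HARD-STOP` is one reading of the text rather than the package's confirmed behavior.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Hypothetical summary of what a run collected before the gate."""
    all_tests_green: bool
    security_finding: bool = False
    residue: bool = False  # open concurrency / architecture concerns


def verify_gate(evidence, autonomy="auto"):
    """Return a gate decision string (illustrative only)."""
    # Assumption: any security finding stops the run outright.
    if evidence.security_finding:
        return "HARD-STOP"
    # Auto-resolution needs complete evidence and the default autonomy level.
    if autonomy == "auto" and evidence.all_tests_green and not evidence.residue:
        return "PASS (auto-resolved)"
    # Everything else goes to a person, who records the final outcome.
    return "ESCALATE-TO-HUMAN"


clean = verify_gate(Evidence(all_tests_green=True))
risky = verify_gate(Evidence(all_tests_green=True, security_finding=True))
manual = verify_gate(Evidence(all_tests_green=True), autonomy="conservative")
print(clean, risky, manual, sep=" | ")
```

The point of the ordering is that the security check comes first, so no autonomy setting can turn a security finding into an automatic `PASS`.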
@@ -136,13 +136,13 @@ The tests are the source of truth; this table is their index. If a row here is e
 
  ## Worked example — the hierarchy filled in
 
- - **Project:** *Mobile Banking App.* Survivor-layer documents: `CONVENTIONS.md`, `GLOSSARY.md` (defines *account*, *balance*, *transfer*), `MODEL_REGISTRY.md`, `dependencies.allowlist`, `playbook/`.
+ - **Project:** *Mobile Banking App.* Living documentation: `CONVENTIONS.md`, `GLOSSARY.md` (defines *account*, *balance*, *transfer*), `MODEL_REGISTRY.md`, `dependencies.allowlist`, `playbook/`.
  - **Milestone:** *MVP — core money movement.* Exit requires the full per-feature document set for each task below, plus a light `SLO.md` and a milestone exit report.
  - **Task:** *Transfer between own accounts* → `SPEC.md`, `features/transfer.feature`, `contracts/transfer.md` (frozen at v1), `tests/transfer_test.py`, code, and a `PASS` gate record. (The full set is in [Appendix D](./appendix-d-worked-example.md).)
  - **Task:** *View balance* → its own SPEC, feature, contract, tests, code, record.
  - **Task:** *Transaction history* → its own set.
 
- When all three tasks read `PASS` and the milestone documents exist, the MVP milestone exits — and the frozen `transfer` contract is now a project-level survivor artifact the next milestone builds on.
+ When all three tasks read `PASS` and the milestone documents exist, the MVP milestone exits — and the frozen `transfer` contract is now a project-level living-documentation artifact the next milestone builds on.
 
  ---
 
@@ -0,0 +1,106 @@
+ # Appendix G — References & Lineage
+
+ ADD did not appear from nowhere. It sits at the meeting point of three currents:
+ the **recursive self-improvement** thesis (AI that helps build the next AI), the
+ **spec-driven development** movement (the specification, not the code, is the
+ source of truth), and a decade of **agentic + tests-first** research showing that
+ a generate→check→refine loop, constrained by executable tests, turns fluent model
+ output into trustworthy software. This appendix is the curated, verified grounding
+ for that lineage — every source below is reachable and annotated with a `↔ ADD:`
+ line saying exactly how it relates to the method.
+
+ **The frame — "closing the loop."** Anthropic's recursive-self-improvement picture
+ runs from autonomous agents delegating to workers *today* toward a future where
+ Claude improves Claude. ADD is a deliberately **human-gated, evidence-trusted**
+ instance of that loop: the AI drives spec→build→verify→observe, but a human owns the
+ frozen contract and the verify gate, and trust comes from passing tests and
+ re-resolved evidence — never from a plausible-looking diff. The sources here are
+ the shoulders that posture stands on.
+
+ The four sections below are the four currents. The comparison table places ADD next
+ to its two closest peers — GitHub's **spec-kit** and **GSD (Get Shit Done)** — and
+ names where ADD diverges. Read "How to cite" first; the rest of the book cites into
+ the keys defined here.
+
+ ## How to cite
+
+ The book uses one inline citation form — **author-year** — and every entry's lead
+ `(Author Year)` *is* its cite-key. Resolve any inline `[…]` to the matching entry below.
+
+ | Authors | Inline form | Example |
+ |---|---|---|
+ | one author | `[Surname Year]` | `[Schmidhuber 2003]` |
+ | two authors | `[Surname & Surname Year]` | `[Mathews & Nagappan 2024]` |
+ | three or more | `[Surname et al. Year]` | `[Zelikman et al. 2023]` |
+ | an organisation | `[Org Year]` | `[Anthropic 2026a]` · `[GitHub 2025]` |
+ | several at once | joined by `; ` | `[Schmidhuber 2003; Zelikman et al. 2023]` |
+ | same author, same year | add a `Year`-letter suffix | `[Anthropic 2025a]` / `[Anthropic 2025b]` |
+
+ The 3+-author rule becomes **et al.**; an organisation stands in as the author
+ when no individual is credited; and when two org-authored sources collide on a year
+ (several Anthropic 2025/2026 items do, below) a trailing letter disambiguates them.
+ There is exactly one entry per cite-key.
+
+ ## spec-kit ↔ ADD (and GSD)
+
+ ADD shares the spec-first DNA of GitHub's **spec-kit** and the Claude-Code,
+ context-rot-fighting niche of **GSD**. The phase models line up closely:
+
+ | ADD phase | spec-kit command | GSD phase |
+ |---|---|---|
+ | foundation · principles | `/speckit.constitution` → `constitution.md` | (project setup / `CLAUDE.md`-level) |
+ | §1 specify (what / why) | `/speckit.specify` → `spec.md` | **discuss** — capture decisions before planning |
+ | §3 contract (how, frozen) | `/speckit.plan` → `plan.md`, `contracts/` | **plan** — research, decompose, fit fresh context |
+ | milestone tasks / waves | `/speckit.tasks` → `tasks.md` | (phases → parallel waves) |
+ | §5 build | `/speckit.implement` | **execute** — parallel waves, fresh 200k-token context each |
+ | §6 verify | `/speckit.analyze` + `/speckit.checklist` | **verify** — walk what was built, fix before declaring done |
+
+ **Where ADD diverges.** spec-kit stops at `implement`; GSD ends at verify (GSD Core
+ adds a fifth *ship* phase). ADD closes the loop past both by adding three things
+ neither has as a first-class gate: a **failing-tests-first** gate (§4 — no build
+ starts until the tests are red for the right reason), an **observe→`fold`**
+ self-improvement step (§7 — confirmed learnings consolidate into a versioned foundation),
+ and an engine-tracked **dynamic goal-loop** that will hold a milestone open and
+ reopen tasks until its exit criteria are met. ADD also deliberately targets **less
+ doc-time than GSD** — a lean foundation and one human approval per task, rather than
+ a document per phase. The shared lineage is real; the tests-first gate, the `fold`,
+ and the goal-loop are ADD's contribution.
+
+ ## 1. Recursive self-improvement
+
+ - **When AI builds itself** (Favaro & Clark 2026) — https://www.anthropic.com/institute/recursive-self-improvement — essay. The RSI thesis: by 2026 >80% of code merged at Anthropic was Claude-authored and the 50%-task time-horizon keeps doubling; recursive self-improvement would shift humans from builders to validators. ↔ ADD: the seed source — ADD is the human-gated, evidence-trusted way to run a spec→build→verify→observe loop while the human stays the validator.
+ - **Automated Alignment Researchers** (Anthropic 2026a) — https://www.anthropic.com/research/automated-alignment-researchers — research. Nine parallel Claude agents recovered ~97% of the human-expert gap on an alignment task in 5 days versus 7 for the human team. ↔ ADD: the strongest evidence the recursive loop is not speculative — parallel agents under review are exactly ADD's wave-plus-verify shape.
+ - **Machines of Loving Grace** (Amodei 2024) — https://www.darioamodei.com/essay/machines-of-loving-grace — essay. A "country of geniuses in a datacenter," argued with a measured, bounded position on recursive self-improvement. ↔ ADD: the intent framing behind milestoning — bound the loop with human direction rather than let it run open.
+ - **Gödel Machines: Self-Referential Universal Problem Solvers** (Schmidhuber 2003) — https://arxiv.org/abs/cs/0309048 — paper. A provably-optimal self-modifying agent that rewrites itself only when it can prove the rewrite helps. ↔ ADD: the mathematical anchor of the lineage — and a precedent for "only change on proof," which ADD enforces socially via the never-weaken-a-test rule.
+ - **STOP: Self-Taught Optimizer** (Zelikman et al. 2023) — https://arxiv.org/abs/2310.02304 — paper. A scaffolding program recursively improves the code that improves code. ↔ ADD: the algorithmic kin of the `fold` step — consolidate confirmed learnings back into the method that produced them.
+ - **Self-Refine: Iterative Refinement with Self-Feedback** (Madaan et al. 2023) — https://arxiv.org/abs/2303.17651 — paper. Generate→critique→refine with the same model lifts quality ~20% with no extra training. ↔ ADD: the micro-loop inside build→verify — produce, check against the contract, refine.
+ - **Self-Rewarding Language Models** (Yuan et al. 2024) — https://arxiv.org/abs/2401.10020 — paper. A model acts as its own reward judge to improve across iterations. ↔ ADD: the risk ADD answers — a self-judging loop needs an external gate; ADD makes tests and a human the reward signal, not the model's own opinion.
+ - **Reflexion: Language Agents with Verbal Reinforcement Learning** (Shinn et al. 2023) — https://arxiv.org/abs/2303.11366 — paper. Agents keep verbal reflections in episodic memory and retry, reaching 91% on HumanEval. ↔ ADD: the principle behind "reopen the task if criteria are unmet" — a failed check becomes feedback for the next attempt, not a dead end.
+ - **Voyager: An Open-Ended Embodied Agent with LLMs** (Wang et al. 2023) — https://arxiv.org/abs/2305.16291 — paper. An auto-curriculum agent that grows a reusable skill library over time. ↔ ADD: the growing foundation — each milestone's consolidated deltas are ADD's accumulating skill library.
+ - **AlphaEvolve: A Coding Agent for Scientific and Algorithmic Discovery** (Novikov et al. 2025) — https://arxiv.org/abs/2506.13131 — paper. An evolutionary coding agent that beat a long-standing matrix-multiplication record and shipped a production scheduler improvement. ↔ ADD: the end-state evidence — a generate-and-verify loop can exceed human baselines when every candidate is checked.
+
+ ## 2. Autonomous & agentic workflows
+
+ - **Building Effective Agents** (Schluntz & Zhang 2024) — https://www.anthropic.com/research/building-effective-agents — blog. The canonical taxonomy: prompt-chaining, routing, orchestrator-workers, and the evaluator-optimizer loop. ↔ ADD: the architecture cite — evaluator-optimizer is build→verify→refine; orchestrator-workers is ADD's wave parallelism.
+ - **Enabling Claude Code to work more autonomously** (Anthropic 2025a) — https://www.anthropic.com/news/enabling-claude-code-to-work-more-autonomously — news. Checkpoints, subagents, hooks, background tasks, and `/rewind` rollback. ↔ ADD: checkpoint/rewind is the rollback strategy behind phase gates; hooks are where the engine enforces them.
+ - **How we built our multi-agent research system** (Anthropic 2025b) — https://www.anthropic.com/engineering/multi-agent-research-system — blog. An Opus lead orchestrating Sonnet subagents, with an LLM acting as judge, lifting task performance ~90%. ↔ ADD: the lead-plus-subagents-plus-judge pattern is exactly ADD's wave execution under a verify gate.
+ - **ReAct: Synergizing Reasoning and Acting in Language Models** (Yao et al. 2022) — https://arxiv.org/abs/2210.03629 — paper. Interleaving think→act→observe turns a model into an agent. ↔ ADD: the base loop every ADD phase runs on.
+ - **Toolformer: Language Models Can Teach Themselves to Use Tools** (Schick et al. 2023) — https://arxiv.org/abs/2302.04761 — paper. Self-supervised learning of when and how to call external tools. ↔ ADD: the capability that lets an agent run its own tests, linters, and builds — the evidence ADD trusts.
+ - **SWE-agent: Agent–Computer Interfaces Enable Automated Software Engineering** (Yang et al. 2024) — https://arxiv.org/abs/2405.15793 — paper. A designed agent–computer interface materially improves autonomous issue resolution. ↔ ADD: the structured agent↔environment contract — ADD's `add.py` engine is that interface for the method.
+ - **The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery** (Lu et al. 2024) — https://arxiv.org/abs/2408.06292 — paper. A full idea→experiment→write→review research loop at ~$15 per paper. ↔ ADD: the research analog of ADD's loop — and a reminder that an automated reviewer is the weak link a human gate protects.
+
+ ## 3. Spec-driven development & spec-kit
+
+ - **GitHub Spec Kit** (GitHub 2025) — https://github.com/github/spec-kit — repo. The reference SDD toolkit: the phase model is `constitution` → `specify` → `plan` → `tasks` → `implement`, with the spec as the executable source of truth. ↔ ADD: the closest spec-first sibling — ADD's specify and contract phases map onto specify and plan; see the comparison table for the divergence.
+ - **Spec-driven development with AI: get started with a new open-source toolkit** (Delimarsky 2025) — https://github.blog/ai-and-ml/generative-ai/spec-driven-development-with-ai-get-started-with-a-new-open-source-toolkit/ — blog. The spec-kit launch post; frames `tasks` as "TDD for your AI agent." ↔ ADD: independent articulation of why decomposing a spec into checkable units beats one big prompt.
+ - **Spec-driven development: using Markdown as a programming language when building with AI** (Vesely 2025) — https://github.blog/ai-and-ml/generative-ai/spec-driven-development-using-markdown-as-a-programming-language-when-building-with-ai/ — blog. Spec-as-source, with context-rot named as the failure SDD exists to solve. ↔ ADD: the rationale for the frozen contract — a stable written spec is what survives when the model's context degrades.
+ - **Get Shit Done (GSD)** (GSD 2025) — https://github.com/open-gsd/gsd-core — repo. A meta-prompting, context-engineering, spec-driven system for Claude Code; its `discuss` → `plan` → `execute` → `verify` cycle runs each phase in a fresh subagent context to fight context-rot (originally `gsd-build/get-shit-done`, now continued as GSD Core). ↔ ADD: ADD's closest peer — same Claude-Code, context-rot niche; ADD diverges with the tests-first gate, the observe→`fold` step, and the dynamic goal-loop, and aims for less doc-time than GSD.
+ - **Beyond Vibe Coding: Amazon Introduces Kiro, the Spec-Driven Agentic IDE** (InfoQ 2025) — https://www.infoq.com/news/2025/08/aws-kiro-spec-driven-agent/ — blog. Kiro structures work as requirements→design→tasks with execution hooks. ↔ ADD: cross-vendor confirmation that spec-first is converging across the industry, not a single-tool idea.
+ - **Spec-Driven Development: From Code to Contract in the Age of AI Coding Assistants** (Piskala 2026) — https://arxiv.org/abs/2602.00180 — paper. A taxonomy of SDD rigor — Spec-First, Spec-Anchored, Spec-as-Source — reporting human-refined specs can cut LLM code errors substantially, with BDD as SDD's ancestor. ↔ ADD: places ADD as "Spec-Anchored" and gives the academic vocabulary for the contract-freeze decision.
+
+ ## 4. Tests-first & verification
+
+ - **Test-Driven Development for Code Generation** (Mathews & Nagappan 2024) — https://arxiv.org/abs/2402.13521 — paper. Supplying tests alongside the prompt measurably lifts pass rates on MBPP and HumanEval. ↔ ADD: the empirical backbone of the failing-tests-first gate — tests as the constraint that makes generation verifiable.
+ - **SWE-bench: Can Language Models Resolve Real-World GitHub Issues?** (Jimenez et al. 2023) — https://arxiv.org/abs/2310.06770 — paper. 2,294 real issues judged by whether the project's own tests pass; <2% solved at release. ↔ ADD: the yardstick that proves the point — "done" means the tests pass, which is exactly how ADD gates a feature.
+ - **Our framework for developing safe and trustworthy agents** (Anthropic 2025c) — https://www.anthropic.com/news/our-framework-for-developing-safe-and-trustworthy-agents — news. Five principles: human control, transparency, alignment, privacy, and security. ↔ ADD: the frozen-contract gate and never-weaken-a-test rule are human control and transparency made concrete; the security HARD-STOP is the security principle.
+ - **Responsible Scaling Policy v3.0** (Anthropic 2026b) — https://www.anthropic.com/news/responsible-scaling-policy-v3 — policy. The AI Safety Level framework; ASL-3 governs autonomous R&D capability. ↔ ADD: the governance ceiling that makes ADD's discipline necessary — as the loop gets more capable, the gates and the human-owned verify matter more, not less.
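Reviewer note: the "How to cite" rules in the Appendix G hunk above (one author, two authors, three or more, organisation, year-letter suffix, several joined by `; `) are mechanical enough to sketch as code. The helper names `cite_key` and `inline` are hypothetical and are not part of the package; the sketch only mirrors the table in the added file.

```python
def cite_key(authors, year, suffix=""):
    """Build an author-year cite-key.

    authors: list of surnames, or a one-item list holding an organisation name.
    suffix:  optional letter used when the same author collides on a year.
    """
    if not authors:
        raise ValueError("at least one author or organisation is required")
    if len(authors) == 1:
        lead = authors[0]
    elif len(authors) == 2:
        lead = f"{authors[0]} & {authors[1]}"
    else:
        # Three or more authors collapse to "et al."
        lead = f"{authors[0]} et al."
    return f"{lead} {year}{suffix}"


def inline(*keys):
    """Wrap one or more cite-keys in the bracketed inline form."""
    return "[" + "; ".join(keys) + "]"


single = inline(cite_key(["Schmidhuber"], 2003))
pair = inline(cite_key(["Mathews", "Nagappan"], 2024))
org = inline(cite_key(["Anthropic"], 2025, "a"))
# The second and third author names are placeholders; only the first is used.
several = inline(
    cite_key(["Schmidhuber"], 2003),
    cite_key(["Zelikman", "Author2", "Author3"], 2023),
)
print(single, pair, org, several)
```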
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@pilotspace/add",
- "version": "1.1.0",
+ "version": "1.3.0",
  "description": "ADD (AI-Driven Development) — a minimal, state-tracked Claude Code skill that drives every feature through Specify → Scenarios → Contract → Tests → Build → Verify → Observe. Ships the AIDD book as its trust layer.",
  "bin": {
  "add": "bin/cli.js"
@@ -20,7 +20,7 @@ You are the orchestrator. ADD keeps the AI fast *and* safe by fixing direction
  the result through passing evidence rather than a plausible-looking diff.
 
  **One file = one task.** Each feature lives in a single `.add/tasks/<slug>/TASK.md`
- with seven sections. You fill them top to bottom; the Python tool tracks where
+ with a §0 ground preamble and seven step sections. You fill them top to bottom; the Python tool tracks where
  you are so context never rots across sessions.
 
  ## Always start here (orient — do not skip)
@@ -34,7 +34,7 @@ python3 .add/tooling/add.py status
  - **No `.add/state.json` yet** (a fresh install drops tooling + docs but does *not* init — so `status` says
  `no .add/ project found`) → enter **autonomous setup**: YOU run init yourself —
  `add.py init --name "<inferred>" --stage <picked> --await-lock` (don't tell the human to) — then read
- `phases/0-setup.md` and draft the foundation + first scope + first contract through to the human lock-down.
+ `phases/0-setup.md` and draft the foundation + first scope + first contract through to the human baseline approval.
  - **A task is active** → open `.add/tasks/<active>/TASK.md`, look at its `phase:`
  marker, and read the matching `phases/<n>-<phase>.md`. Work *only* that phase.
  - **No active task** → first SIZE the request (see Intake below), then create the
@@ -45,8 +45,9 @@ python3 .add/tooling/add.py status
  When the user brings a raw request, classify it BEFORE making a milestone or task:
  read `intake.md` and place it in exactly one bucket — `new-major` · `sub-milestone`
  · `task` · `change-request` — then propose `{ bucket, rationale, command }` and let
- the human confirm. This is the intake altitude (request → versioned scope); see
- `intake.md` for the rubric, the tie-break order, and worked examples.
+ the human confirm. This is the intake level (request → versioned scope); see
+ `intake.md` for the rubric, the tie-break order, and worked examples. A question or
+ unsharp intent? **Interview before you size** — explore and suggest first (`intake.md`).
 
  Once a request is classified `new-major`/`sub-milestone`, drafting the actual
  `MILESTONE.md` (goal · scope · exit criteria · breadth-first tasks) is the second
@@ -59,57 +60,56 @@ Load the phase guide **only for the phase you are in** (progressive disclosure):
59
60
 
60
61
  | Phase | Guide | Produces (TASK.md section) | Who leads |
61
62
  |-------|-------|----------------------------|-----------|
62
- | setup | `phases/0-setup.md` | `.add/` + survivors + first §1–§3 + `SETUP-REVIEW.md` | AI drafts → **human locks** (the lock-down) |
63
- | specify | `phases/1-specify.md` | §1 rules + ranked least-sure flag | AI drafts (co-specify) |
63
+ | setup | `phases/0-setup.md` | `.add/` + living docs + first §1–§3 + `SETUP-REVIEW.md` | AI drafts → **human locks** (the baseline approval) |
64
+ | ground | `phases/0-ground.md` | §0 GROUND map (real files · symbols · the anchors §3 cites) | **AI** (the §0 preamble — no new gate) |
65
+ | specify | `phases/1-specify.md` | §1 rules + ranked lowest-confidence flag | AI drafts (co-specify)† |
64
66
  | scenarios | `phases/2-scenarios.md` | §2 Given/When/Then | AI drafts† |
65
- | contract | `phases/3-contract.md` | §3 frozen shape | AI drafts → **human approves once** (the seam)† |
67
+ | contract | `phases/3-contract.md` | §3 frozen shape | AI drafts → **human approves once** (the decision point)† |
66
68
  | tests | `phases/4-tests.md` | §4 + red suite in `tests/` | AI drafts† |
67
69
  | build | `phases/5-build.md` | code in `src/`, tests green | **AI** |
68
70
  | verify | `phases/6-verify.md` | §6 checks + gate record | **AI auto-gates on evidence**; human on residue/security‡ |
69
71
  | observe | `phases/7-observe.md` | §7 spec delta | human + AI |
70
72
 
71
- † **One-approval front (v7).** §1–§4 are drafted by the AI as a single bundle and frozen
72
- together; the human gives **one approval, at the contract freeze** (the autonomy seam), not
73
- three separate sign-offs. The AI presents the bundle least-sure-first. See `run.md`.
74
- ‡ **Verify auto-gate (v6–v7).** Under `autonomy: auto` (the default) a run may auto-PASS once
75
- the evidence is complete (all tests green · loops dry · no residue), recorded as *auto-resolved*,
76
- an explicit PASS, not a skip. **Security always escalates** (HARD-STOP), as do concurrency /
77
- architecture residue and `conservative` autonomy. See `run.md`.
73
+ † **The specification bundle (v7).** §1–§4 are one bundle; the human gives **one approval at the
74
+ contract freeze** (the decision point), presented lowest-confidence-first. See `run.md`.
75
+ ‡ **Verify auto-gate (v6–v7).** Under `autonomy: auto` (the default) a run may auto-PASS on
76
+ complete evidence — recorded as *auto-resolved*, an explicit PASS, not a skip. **Security always
77
+ escalates** (HARD-STOP); so do concurrency / architecture residue and a lowered autonomy level (`conservative` / `manual`).
78
+ See `run.md`.
78
79
 
79
- Whenever you present a seam to the human in chat (intake · front approval · gate ·
80
- milestone close), follow `report-template.md`: SUMMARY → DECISION → FLAGS →
81
- EVIDENCE → NEXT, engine-sourced facts, show-before-ask, never pre-stamp a seam.
80
+ Whenever you present a decision point to the human in chat (intake · bundle approval · gate ·
81
+ milestone close), follow `report-template.md` — open with the ARC (goal · done · plan,
82
+ engine-sourced), then SUMMARY → DECISION → ⚠ FLAGS → EVIDENCE → NEXT, show-before-ask, never
83
+ pre-stamp a decision point — and the question is a summary, never the artifact.
82
84
 
83
- In **observe**, also emit **competency deltas** — learnings tagged by which of the five
85
+ In **observe**, also emit **lessons learned** — learnings tagged by which of the five
84
86
  (`DDD · SDD · UDD · TDD · ADD`) they improve — so the foundation self-improves across loops.
85
- You write them as `open`; the human folds them into `PROJECT.md`. Read `deltas.md` for the
86
- grammar and the status lifecycle. At milestone close (or on demand), run the fold ritual that
87
+ You write them as `open`; the human consolidates them into `PROJECT.md`. Read `deltas.md` for the
88
+ grammar and the status lifecycle. At milestone close (or on demand), run the retrospective consolidation that
87
89
  gathers confirmed deltas into a versioned foundation — read `fold.md`.
88
90
 
89
- ## The dynamic run (v6–v7)
91
+ ## Beyond the bundle — load on demand
90
92
 
91
- Once **§3 CONTRACT is FROZEN**, the build→verify half runs as a dynamic, auto-gated run,
92
- with fan-out + in-run convergence, instead of a manual build (`autonomy: auto` is the default; lower
93
- to `conservative` to keep a human at the gate). Read `run.md` for the trigger, the touch-boundary,
94
- the evidence auto-gate, and the autonomy dial. The human-led front still owns *direction*, but v7
95
- compresses it to a **single approval at the contract seam**; the run never edits a frozen contract
96
- and never auto-passes a security finding.
93
+ Once **§3 CONTRACT is FROZEN**, the build→verify half is a dynamic, auto-gated run
94
+ (`autonomy: auto` default, lowered to `conservative` or `manual` for a human gate); read `run.md`. To
95
+ pipeline several ready tasks behind their own frozen contracts, read `streams.md`.
97
96
 
98
- ## Parallel streams: pipelining independent tasks (opt-in)
97
+ When a milestone's tasks are all done but its **goal** (the `MILESTONE.md` exit criteria) is not
98
+ yet met, `milestone-done` holds the milestone open — read `loop.md` for the dynamic loop that turns
99
+ open deltas + extras into the next tasks, proposed by you and confirmed by the human, until the goal is met.
99
100
 
100
- The default is one task at a time. When a milestone has several tasks whose `deps=` are
101
- already `PASS` and a human is ready to review, you MAY run them concurrently: read
102
- `streams.md`. It changes no `add.py` code: you compute a READY-QUEUE from `status`,
103
- spawn one worker per ready task (each in a worktree, building behind its own frozen
104
- contract), and keep the human seams (front approval · escalated Verify) on one serial
105
- REVIEW-QUEUE. The honest gain is pipelining (the reviewer never waits on a build), not
106
- N× speed; the autonomy dial sets how much actually overlaps.
101
+ When `add.py status` prints **`MVP covered → propose graduation`** (every milestone done AND the
102
+ stage-goal-criteria all `[x]`), the project is ready to graduate its stage; read `graduate.md` for the
103
+ orchestration: gather `graduation-report` analytics → co-specify interview → draft ≥1 production
104
+ milestone → human confirm → then (and only then) `stage production`. The flip is guarded
105
+ (`stage_no_roadmap`) and is the FINAL step, never a bare label change.
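
The readiness condition in the added paragraph above (every milestone done AND every stage-goal criterion checked `[x]`) can be sketched as a small predicate. This is an illustration only, not how `add.py status` computes its line: the function name `ready_to_graduate`, its inputs, and the `- [x]` checkbox shape are assumptions.

```python
def ready_to_graduate(milestone_done: list[bool], criteria_lines: list[str]) -> bool:
    """Return True only when every milestone is done AND every criterion is `[x]`.

    Hypothetical inputs: one boolean per milestone, one Markdown checkbox
    line per stage-goal criterion (for example "- [x] users can sign in").
    """
    # Assumption: an empty roadmap or an empty criteria list is never "covered".
    if not milestone_done or not criteria_lines:
        return False
    all_checked = all(line.strip().startswith("- [x]") for line in criteria_lines)
    return all(milestone_done) and all_checked
```

Even when this predicate holds, the text above keeps `stage production` as the last step, after the human has confirmed at least one production milestone.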
107
106
 
108
107
  ## Non-negotiable rules (from the method)
109
108
 
109
+ <constraints>
110
110
  1. **Direction before speed.** Never start Build until §1–§4 exist and tests are red.
111
111
  2. **Trust evidence, not inspection.** A feature is trusted because its tests pass
112
- and the blind-spots (concurrency, security, architecture) were checked — not
112
+ and the non-functional risks (concurrency, security, architecture) were checked — not
113
113
  because the code reads plausibly.
114
114
  3. **Never weaken a test or edit a frozen contract to make the build pass.** That
115
115
  inverts the method. A real change is a *change request* back to Specify.
@@ -117,6 +117,7 @@ N× speed; the autonomy dial sets how much actually overlaps.
117
117
  `PASS`, `RISK-ACCEPTED` (signed, non-security only), or `HARD-STOP`. A security
118
118
  finding is always `HARD-STOP`.
119
119
  5. **Ask, don't guess.** If a requirement is unclear, stop and ask the user.
120
+ </constraints>
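
The gate behaviour described in this file (security is always `HARD-STOP`; concurrency or architecture residue and a lowered autonomy level go to the human; otherwise complete evidence may auto-PASS) can be sketched as one decision function. The names `Gate`, `auto_gate`, and the `ESCALATE` marker are hypothetical, and escalating on incomplete evidence is an assumption rather than something the text states.

```python
from enum import Enum


class Gate(Enum):
    PASS = "PASS"
    RISK_ACCEPTED = "RISK-ACCEPTED"  # signed by a human, non-security only
    HARD_STOP = "HARD-STOP"
    ESCALATE = "ESCALATE"  # hypothetical marker: the human decides the outcome


def auto_gate(*, security_finding: bool, residue: bool,
              evidence_complete: bool, autonomy: str = "auto") -> Gate:
    """Decide what the AI may record on its own at Verify."""
    if security_finding:
        return Gate.HARD_STOP  # a security finding is never auto-passed
    if residue or autonomy != "auto":
        return Gate.ESCALATE  # residue or a lowered autonomy level needs a human
    if not evidence_complete:
        return Gate.ESCALATE  # assumption: incomplete evidence also goes to the human
    return Gate.PASS  # recorded as auto-resolved: an explicit PASS, not a skip
```

Note that the function never returns `RISK_ACCEPTED`: that outcome requires a human signature, so it cannot be produced automatically.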
120
121
 
121
122
  ## Advancing
122
123
 
@@ -136,9 +137,11 @@ The steps never change; their depth does. Read the stage from `add.py status`:
136
137
  - **prototype** — run light; code is throwaway; design/experience is the point.
137
138
  - **poc** — run contract/tests/build deeply on the single riskiest slice only.
138
139
  - **mvp** — full flow, narrow scope, light observation.
139
- - **production** — every step at full rigor + the observe loop.
140
+ - **production** — every step at full rigor + the observe loop. Reach it via the graduation
141
+ orchestration (`graduate.md`) when status shows `MVP covered → propose graduation`, never a bare
142
+ `stage production` flip — the transition is guarded behind a human-confirmed roadmap.
140
143
 
141
- ## The trust layer
144
+ ## The method rationale
142
145
 
143
146
  The full method (the *why* behind every rule) is the AIDD book in `.add/docs/`.
144
147
  When a phase decision is genuinely unclear, read the linked chapter — each phase
package/skill/add/adopt.md CHANGED
@@ -3,8 +3,8 @@
3
3
  When ADD is pointed at a repo that already has code, onboarding is **silent**: the code
4
4
  answers the questions a greenfield interview would ask, so you read it rather than ask.
5
5
  This is the **brownfield path** of setup (the greenfield path keeps the 4-lens interview —
6
- see `phases/0-setup.md`). You fill the survivor files from evidence, then stop at the one
7
- human gate: the **lock-down** (`add.py lock`).
6
+ see `phases/0-setup.md`). You fill the living-documentation files from evidence, then stop at the one
7
+ human gate: the **baseline approval** (`add.py lock`).
8
8
 
9
9
  ## The signal — and arming the gate
10
10
 
@@ -14,7 +14,7 @@ Enter a brownfield repo with `--await-lock`:
14
14
  python3 .add/tooling/add.py init --await-lock
15
15
  ```
16
16
 
17
- `--await-lock` does two things. It seeds an **unlocked** setup, which *arms the lock-down gate*
17
+ `--await-lock` does two things. It seeds an **unlocked** setup, which *arms the baseline-approval gate*
18
18
  — the engine then refuses a second task, crossing into build, and recording a gate until you
19
19
  `lock`. And init, being brownfield-aware, prints a line that begins:
20
20
 
@@ -29,9 +29,9 @@ code (a mechanical fact); it never reads or fills it — interpreting it is your
29
29
 
30
30
  ## The silent mapping
31
31
 
32
- Fill each survivor file in `.add/` from what the code actually shows — **ask nothing**:
32
+ Fill each living-doc file in `.add/` from what the code actually shows — **ask nothing**:
33
33
 
34
- | Survivor | Read it from |
34
+ | Living doc | Read it from |
35
35
  |----------|--------------|
36
36
  | `PROJECT.md` (foundation) | the domain nouns, entry points, the README, the first milestone the code implies |
37
37
  | `CONVENTIONS.md` | the languages, folder layout, naming, lint config, error style already in the tree |
@@ -41,19 +41,21 @@ Fill each survivor file in `.add/` from what the code actually shows — **ask n
41
41
 
42
42
  Two rules that never bend:
43
43
 
44
- 1. **Never clobber a survivor.** `init` already skips any survivor that exists; if a human
44
+ <constraints>
45
+ 1. **Never clobber a living doc.** `init` already skips any living-doc file that exists; if a human
45
46
  already wrote `PROJECT.md`, you READ it, you do not overwrite it. Add, never replace.
46
47
  2. **Tag every drafted decision `evidence-grounded` vs `guessed`.** A line you read from the
47
48
  code is *evidence-grounded* (cite the file). A line you inferred because the code was silent
48
- is *guessed*. The human's single lock-down is only honest if they can see which is which —
49
+ is *guessed*. The human's single baseline approval is only honest if they can see which is which —
49
50
  the guesses are what they actually need to check. (The tags feed `SETUP-REVIEW.md`.)
51
+ </constraints>
50
52
 
51
- ## Where it ends — the lock-down
53
+ ## Where it ends — the baseline approval
52
54
 
53
55
  Brownfield onboarding draws no per-step approvals. You map the foundation, then draft the
54
- first milestone's scope and the first task's candidate front exactly as greenfield does, and
55
- present it all at **one** human gate. The human reviews the decisions (least-sure / `guessed`
56
- first) and signs:
56
+ first milestone's scope and the first task's candidate specification bundle exactly as greenfield does, and
57
+ present it all at **one** human gate. The human reviews the decisions (lowest-confidence / `guessed`
58
+ first) and confirms in conversation; you run the lock with their name:
57
59
 
58
60
  ```bash
59
61
  python3 .add/tooling/add.py lock --by "<name>"
package/skill/add/deltas.md CHANGED
@@ -1,12 +1,12 @@
1
- # Competency deltas — how each loop sharpens the foundation
1
+ # Lessons learned — how each loop sharpens the foundation
2
2
 
3
- A **competency delta** is a single learning a task produces, tagged by which of ADD's five
3
+ A **lesson learned** is a single learning a task produces, tagged by which of ADD's five
4
4
  competencies it improves. You write deltas in a task's **OBSERVE** phase; later, the
5
- `foundation-update-loop` gathers the confirmed ones and folds them into a versioned `PROJECT.md`.
5
+ `foundation-update-loop` gathers the confirmed ones and consolidates them into a versioned `PROJECT.md`.
6
6
  This is how `DDD · SDD · UDD · TDD · ADD` stop being write-once and start converging.
7
7
 
8
8
  You (the AI) **emit** deltas as `open`. Only the **human** moves a delta to `folded` or `rejected`
9
- (folding into the foundation is judgment — see the verify/observe seam). You never self-fold.
9
+ (consolidating into the foundation is judgment — see the verify/observe decision point). You never self-approve a consolidation.
10
10
 
11
11
  ## The grammar (frozen)
12
12
 
@@ -50,7 +50,7 @@ That is its home. Split genuinely separate learnings into separate deltas; never
50
50
  ```
51
51
  emit (OBSERVE) human review (foundation-update-loop)
52
52
  open ───────────▶ folded (the learning is merged into PROJECT.md; version bumps)
53
- └──────────▶ rejected (considered and deliberately NOT folded — the trail is kept)
53
+ └──────────▶ rejected (considered and deliberately NOT consolidated — the trail is kept)
54
54
  ```
55
55
 
56
56
  An `open` delta is a pending signal. `folded` and `rejected` are both human decisions; a `rejected`
@@ -60,9 +60,11 @@ delta is left in place (not deleted) so "we saw this and chose not to act" stays
60
60
 
61
61
  There is no engine validator yet, so before you record a delta, self-check it:
62
62
 
63
+ <reject_codes>
63
64
  - `unknown_competency` — the tag is missing or not one of `DDD · SDD · UDD · TDD · ADD`. Fix the tag.
64
65
  - `no_evidence` — the `(evidence: …)` pointer is missing or empty. Add the proof, or drop the line.
65
66
  - `unknown_status` — the status is not `open | folded | rejected`. A fresh delta is `open`.
67
+ </reject_codes>
66
68
 
67
69
  ## Worked example
68
70
 
@@ -75,5 +77,5 @@ A task that built a tenancy feature finished its OBSERVE phase with:
75
77
  ```
76
78
 
77
79
  Three learnings, three competencies, each with a pointer. At the next foundation update the human
78
- folded the DDD and TDD deltas into `PROJECT.md` (→ `folded`) and rejected the ADD one as a one-off
80
+ consolidated the DDD and TDD deltas into `PROJECT.md` (→ `folded`) and rejected the ADD one as a one-off
79
81
  (→ `rejected`). The foundation got sharper; nothing was silently lost.
package/skill/add/fold.md CHANGED
@@ -1,29 +1,29 @@
1
- # Folding deltas — how the foundation self-improves
1
+ # Consolidating deltas — how the foundation self-improves
2
2
 
3
- This **closes the loop**. `deltas.md` lets a task EMIT learnings (`open` competency deltas in its
4
- OBSERVE phase); folding gathers the confirmed ones and writes them into a **versioned foundation**,
3
+ This **closes the loop**. `deltas.md` lets a task EMIT learnings (`open` lessons learned in its
4
+ OBSERVE phase); the retrospective consolidation gathers the confirmed ones and writes them into a **versioned foundation**,
5
5
  so `DDD · SDD · UDD · TDD · ADD` sharpen across milestones instead of drifting.
6
6
 
7
- You (the AI) **gather and propose**; the **human confirms**; you then write the **append-only** fold.
8
- You never self-fold; folding is judgment (see the verify/observe seam).
7
+ You (the AI) **gather and propose**; the **human confirms**; you then write the **append-only** consolidation.
8
+ You never self-approve a consolidation; consolidating is judgment (see the verify/observe decision point).
9
9
 
10
- ## When to fold
10
+ ## When to consolidate
11
11
 
12
12
  At **milestone close** (the natural "version bump to the foundation"), or **on demand** when open
13
- deltas have piled up. This is a convention, not a command — there is no `add.py fold`; the ritual
13
+ deltas have piled up. This is a convention, not a command — there is no `add.py fold`; the consolidation
14
14
  lives here so the engine stays judgment-free.
15
15
 
16
16
  ## The ritual
17
17
 
18
- 1. **Gather** — scan every task's OBSERVE `### Competency deltas` block for lines still `open`.
18
+ 1. **Gather** — scan every task's §7 OBSERVE block for lesson-learned lines still `open` (`add.py deltas` reads them by the machine heading).
19
19
  2. **Group** — bucket them by competency (`DDD · SDD · UDD · TDD · ADD`).
20
20
  3. **Propose** — for each, draft the exact foundation edit (see routing) and show the human.
21
21
  4. **Confirm** — the human accepts or declines each delta. No write happens without this.
22
22
  5. **Write** — append the accepted edits, flip each delta's status, and bump the version.
23
23
 
24
- ## Fold routing (every competency has a home)
24
+ ## Consolidation routing (every competency has a home)
25
25
 
26
- | competency | folds into | how |
26
+ | competency | consolidates into | how |
27
27
  |------------|-----------|-----|
28
28
  | `DDD` | `PROJECT.md` §Domain (DDD) | refine/append a model bullet |
29
29
  | `SDD` | `PROJECT.md` §Spec / Living Document (SDD) | refine/append a settled-vs-open line |
@@ -31,7 +31,7 @@ lives here so the engine stays judgment-free.
31
31
  | `TDD` | `CONVENTIONS.md` | append a testing convention (no PROJECT.md section — it is the engine) |
32
32
  | `ADD` | `CONVENTIONS.md` | append a build/harness convention (likewise the engine) |
33
33
 
34
- **Every** fold — whatever the competency — ALSO appends one row to `PROJECT.md` **§Key Decisions**
34
+ **Every** consolidation — whatever the competency — ALSO appends one row to `PROJECT.md` **§Key Decisions**
35
35
  (date · decision · why · outcome): the universal, auditable trail of what the foundation learned.
36
36
 
37
37
  ## Status transitions & version
@@ -39,16 +39,18 @@ lives here so the engine stays judgment-free.
39
39
  - on **confirm**: the delta moves `open` → `folded` (and its edit is appended to the routed target).
40
40
  - on **decline**: the delta moves `open` → `rejected` and is **left in place** — never deleted —
41
41
  so "we considered this and chose not to act" stays auditable.
42
- - a fold is **append-only**: it adds bullets/rows; it never silently rewrites existing foundation text.
43
- - each fold session **bumps** the `foundation-version:` marker in `PROJECT.md` by one (monotonic int).
42
+ - a consolidation is **append-only**: it adds bullets/rows; it never silently rewrites existing foundation text.
43
+ - each consolidation session **bumps** the `foundation-version:` marker in `PROJECT.md` by one (monotonic int).
44
44
 
45
45
  ## Reject codes (the AI is first check, the human the backstop)
46
46
 
47
+ <reject_codes>
47
48
  - `no_open_deltas` — nothing is `open` anywhere. The ritual is a no-op; do **not** bump the version.
48
49
  - `unconfirmed_fold` — a write was attempted without recorded human confirmation. The AI proposes;
49
- it never self-folds. Stop and get confirmation.
50
- - `unroutable_delta` — a delta's competency is not one of the five, so it has no fold target. Fix the
51
- delta (it is malformed per `deltas.md`) before folding.
50
+ it never self-approves one. Stop and get confirmation.
51
+ - `unroutable_delta` — a delta's competency is not one of the five, so it has no consolidation target. Fix the
52
+ delta (it is malformed per `deltas.md`) before consolidating.
53
+ </reject_codes>
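
The ritual, the status transitions, and the three reject codes above can be sketched together as one function over in-memory records. This is a minimal illustration under assumptions, not engine behaviour (the text says there is no `add.py fold`): `consolidate`, the dict-shaped deltas, and the `decisions` mapping are hypothetical, and the `UDD` target is left as a placeholder because that routing row falls outside the hunks shown here.

```python
# Targets copied from the routing table; the UDD row is not visible in this diff.
ROUTES = {
    "DDD": "PROJECT.md §Domain (DDD)",
    "SDD": "PROJECT.md §Spec / Living Document (SDD)",
    "UDD": "(row not visible in this diff)",
    "TDD": "CONVENTIONS.md",
    "ADD": "CONVENTIONS.md",
}


def consolidate(deltas: list[dict], decisions: dict[str, bool], version: int):
    """Return (reject_code, new_version, writes) for one consolidation session.

    deltas: hypothetical records {"id", "competency", "status"}.
    decisions: recorded human answers, id -> True (confirm) / False (decline).
    """
    open_deltas = [d for d in deltas if d["status"] == "open"]
    if not open_deltas:
        return "no_open_deltas", version, []  # no-op: the version is not bumped
    for d in open_deltas:
        if d["competency"] not in ROUTES:
            return "unroutable_delta", version, []
        if d["id"] not in decisions:
            return "unconfirmed_fold", version, []  # no write without confirmation
    writes = []
    for d in open_deltas:
        if decisions[d["id"]]:
            d["status"] = "folded"
            writes.append((ROUTES[d["competency"]], d["id"]))  # append-only edit
        else:
            d["status"] = "rejected"  # left in place, never deleted
    # Each session bumps the version by one, as the text states.
    return None, version + 1, writes
```

The sketch checks every open delta before changing any status, so a rejected session leaves the records untouched. It omits the extra §Key Decisions row that every consolidation also appends.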
52
54
 
53
55
  ## Worked example (from this repo's own history)
54
56
 
@@ -60,7 +62,7 @@ which have no PROJECT.md section:
60
62
  - [TDD · open] structural tests guard canonical artifacts but not their dogfood twins (evidence: scope-loop note + this build)
61
63
  ```
62
64
 
63
- At the next fold the human confirms both. Routing sends each to `CONVENTIONS.md` (a "sync the dogfood
65
+ At the next consolidation the human confirms both. Routing sends each to `CONVENTIONS.md` (a "sync the dogfood
64
66
  tree + assert md5 parity" convention), appends a §Key Decisions row for each, flips them to `folded`,
65
67
  and bumps `foundation-version` 1 → 2. The two competencies the foundation never tracked before now
66
68
  have a home — which is exactly why v5 routes TDD/ADD to `CONVENTIONS.md`.