@pilotspace/add 1.1.0 → 1.3.0
- package/CHANGELOG.md +81 -0
- package/GETTING-STARTED.md +187 -139
- package/README.md +13 -7
- package/bin/cli.js +96 -5
- package/docs/01-principles.md +3 -3
- package/docs/02-the-flow.md +19 -12
- package/docs/03-step-1-specify.md +15 -13
- package/docs/04-step-2-scenarios.md +2 -2
- package/docs/05-step-3-contract.md +3 -3
- package/docs/06-step-4-tests.md +10 -2
- package/docs/07-step-5-build.md +3 -1
- package/docs/08-step-6-verify.md +25 -5
- package/docs/09-the-loop.md +12 -6
- package/docs/10-setup-and-stages.md +27 -13
- package/docs/11-governance.md +6 -2
- package/docs/12-roles.md +3 -3
- package/docs/13-adoption.md +1 -1
- package/docs/14-foundation.md +15 -15
- package/docs/15-foundations-and-lineage.md +106 -0
- package/docs/README.md +4 -0
- package/docs/appendix-a-templates.md +3 -3
- package/docs/appendix-b-prompts.md +40 -5
- package/docs/appendix-c-glossary.md +49 -12
- package/docs/appendix-d-worked-example.md +2 -2
- package/docs/appendix-e-checklists.md +16 -4
- package/docs/appendix-f-requirements-matrix.md +8 -8
- package/docs/appendix-g-references.md +106 -0
- package/package.json +1 -1
- package/skill/add/SKILL.md +41 -38
- package/skill/add/adopt.md +13 -11
- package/skill/add/deltas.md +8 -6
- package/skill/add/fold.md +19 -17
- package/skill/add/graduate.md +74 -0
- package/skill/add/intake.md +22 -7
- package/skill/add/loop.md +59 -0
- package/skill/add/phases/0-ground.md +66 -0
- package/skill/add/phases/0-setup.md +32 -25
- package/skill/add/phases/1-specify.md +28 -13
- package/skill/add/phases/2-scenarios.md +14 -4
- package/skill/add/phases/3-contract.md +27 -12
- package/skill/add/phases/4-tests.md +15 -5
- package/skill/add/phases/5-build.md +33 -4
- package/skill/add/phases/6-verify.md +40 -2
- package/skill/add/phases/7-observe.md +13 -5
- package/skill/add/report-template.md +65 -7
- package/skill/add/run.md +93 -39
- package/skill/add/scope.md +10 -6
- package/skill/add/setup-review.md +13 -10
- package/skill/add/streams.md +88 -23
- package/tooling/add.py +1817 -90
- package/tooling/templates/CONVENTIONS.md.tmpl +1 -1
- package/tooling/templates/DESIGN.md.tmpl +66 -0
- package/tooling/templates/GLOSSARY.md.tmpl +29 -0
- package/tooling/templates/MILESTONE.md.tmpl +1 -0
- package/tooling/templates/PROJECT.md.tmpl +6 -3
- package/tooling/templates/TASK.md.tmpl +55 -15
- package/tooling/templates/catalog.sample.json +38 -0
- package/tooling/templates/prototype.sample.json +48 -0
- package/tooling/templates/tokens.sample.json +55 -0
- package/tooling/templates/udd-catalog.md +122 -0
- package/tooling/templates/udd-tokens.md +79 -0
@@ -6,34 +6,34 @@ This chapter covers two operational matters: what you set up once per project, a

 ---

-## Setup: the AI drafts, you
+## Setup: the AI drafts, you approve the baseline

-Before the first feature, the project needs a foundation — but standing it up is no longer your chore. Point ADD at the repo and **the AI does the drafting**: it runs `init` itself, reads what is there, and fills the foundation the whole project depends on. Your single act is the **
+Before the first feature, the project needs a foundation — but standing it up is no longer your chore. Point ADD at the repo and **the AI does the drafting**: it runs `init` itself, reads what is there, and fills the foundation the whole project depends on. Your single act is the **baseline approval** — the one human gate that freezes it.

-**What the AI drafts.** From an existing codebase it works **silently** — the code answers the questions a setup interview would ask. On an empty repo it runs a short **four-lens interview** (domain · spec · users · decisions), then drafts. Either way it fills the
+**What the AI drafts.** From an existing codebase it works **silently** — the code answers the questions a setup interview would ask. On an empty repo it runs a short **four-lens interview** (domain · spec · users · decisions), then drafts. Either way it fills the living documentation — the files that outlive all code — and drafts the first milestone's scope and the first task's candidate contract:

 | Item | File | Purpose |
 |------|------|---------|
 | Foundation | `PROJECT.md` | domain · active spec · UI/UX · key decisions — the context every task reads first |
-| Conventions | `CONVENTIONS.md` | naming, layout, language, formatter —
+| Conventions | `CONVENTIONS.md` | naming, layout, language, formatter — living documentation |
 | Model record | `MODEL_REGISTRY.md` | which AI model and version the project uses, for reproducibility and audit |
 | Dependency allow-list | `dependencies.allowlist` | the packages the AI may use; the pipeline rejects others |
 | Prompt playbook | `playbook/` | the six prompts from [Appendix B](./appendix-b-prompts.md) |
 | Repository + pipeline | — | runs the gates on every change |

-Every drafted decision is tagged **evidence-grounded** (read from the code) or **guessed** (thin or inferred) and listed
+Every drafted decision is tagged **evidence-grounded** (read from the code) or **guessed** (thin or inferred) and listed lowest-confidence-first in a `SETUP-REVIEW.md`, so the one signature you give is informed rather than given without reading.
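The lowest-confidence-first ordering described above can be sketched as a plain sort. This is only an illustration: the `Decision` record, its numeric `confidence` field, and the `review_order` helper are hypothetical names, and the diff does not show how `add.py` actually scores or stores drafted decisions.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    text: str
    tag: str           # "evidence-grounded" or "guessed", as in the text above
    confidence: float  # hypothetical 0..1 score; not taken from the package


def review_order(decisions: list[Decision]) -> list[Decision]:
    """Order rows so the reviewer meets the shakiest decision first.

    Ties on confidence put `guessed` rows ahead of evidence-grounded ones.
    """
    return sorted(decisions, key=lambda d: (d.confidence, d.tag != "guessed"))
```

Sorting rather than filtering keeps every drafted decision in the review, so the ordering only changes where the reader's attention lands first.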

-**The
+**The baseline approval.** The AI presents `SETUP-REVIEW.md`; you check the `guessed` rows; you **lock** — once. That single act freezes the foundation, the first scope, and the first contract together. It is the setup-level analog of the [contract freeze](./05-step-3-contract.md), and it doubles as the first task's contract approval — so there is no separate sign-off. Before the lock the engine lets the AI draft but refuses to cross into build; after it, the build opens.

 **Setup exit check**

-- [ ] Foundation +
-- [ ] `SETUP-REVIEW.md` lists every drafted decision
+- [ ] Foundation + living docs drafted (brownfield: from the code, evidence-tagged; greenfield: from the interview, gaps flagged `guessed`).
+- [ ] `SETUP-REVIEW.md` lists every drafted decision lowest-confidence-first.
 - [ ] The model is pinned; the allow-list exists and the pipeline fails on any package outside it.
 - [ ] The pipeline runs and is green on the empty skeleton.
 - [ ] The human **locked down** — and only then did the first feature's build open.

-Do not start a feature until the pipeline is green and the foundation is locked. The
+Do not start a feature until the pipeline is green and the foundation is locked. The baseline approval turns the AI's draft into committed direction; the pipeline enforces every later exit check without anyone having to remember to.

 ---

@@ -80,7 +80,21 @@ The durable thing is never the code:
 | POC → MVP | the spike code | the validated approach + the risky-interface contract |
 | MVP → Production | nothing | everything; the code is real and is hardened |

-The
+The living documentation thickens as you move right: a prototype leaves you a validated design; a proof of concept adds a proven approach and a contract; the MVP adds real, kept code. By production, you are hardening, not rebuilding.
+
+### Graduating between stages
+
+Moving up a stage — most consequentially MVP → Production — is its own scope level, the fourth after setup, intake, and the milestone loop. It is *not* a label someone types: a project earns production through a human-confirmed roadmap of the hardening work, never through a bare flip. The `add` skill drives this in `graduate.md`; the shape is five steps.
+
+**The cue.** When every milestone is `done` *and* the human's **stage-goal-criteria** in `PROJECT.md` are all `[x]`, `add.py status` prints `→ MVP covered → propose graduation`. Until both tallies complete, nothing here applies — a project with no stage-goal-criteria block behaves exactly as before.
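The cue is two tallies that must both complete. A minimal sketch, assuming the stage-goal-criteria are written as Markdown checkboxes (`- [ ]` / `- [x]`) and that milestone states are available as plain strings; the `graduation_cue` name, the input shapes, and the treatment of an empty milestone list are assumptions, not the engine's actual code.

```python
import re

# Hypothetical checkbox grammar: "- [ ] ..." is open, "- [x] ..." is met.
_BOX = re.compile(r"^\s*- \[([ xX])\]", re.MULTILINE)


def graduation_cue(milestone_states: list[str], criteria_block: str) -> bool:
    """True only when every milestone is done AND every criterion is checked."""
    boxes = _BOX.findall(criteria_block)
    if not boxes:
        # No stage-goal-criteria block: the project behaves exactly as before.
        return False
    all_done = bool(milestone_states) and all(s == "done" for s in milestone_states)
    all_checked = all(b.lower() == "x" for b in boxes)
    return all_done and all_checked
```

Like the guard it feeds, this counts records and makes no claim about whether the project is actually ready.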
+
+1. **Gather the analytics.** `add.py graduation-report` clusters the whole MVP loop's evidence into five labeled record-sets — open deltas by competency, open RISK-ACCEPTED waivers by expiry, RETRO records, verify residue, and observe-loop coverage gaps. It *gathers, never judges*: there is no readiness verdict, only the records you reason from.
+2. **Interview.** Synthesize *what production means here* with the human, using those records as the agenda. This synthesis is the judgment the engine refuses to make.
+3. **Draft the roadmap.** For each production outcome the interview surfaces, draft a production milestone with the existing command — `add.py new-milestone <slug> --stage production --goal "…"` — and write its exit criteria. The roadmap is **≥1** milestone; the hardening work itself is what those milestones contain.
+4. **Human confirms.** The human accepts, edits, or declines each draft. Nothing is created on an unconfirmed draft.
+5. **Flip — the final step.** Only now run `add.py stage production`.
+
+**The floor the engine enforces.** `add.py stage production` is guarded: it refuses with `stage_no_roadmap` (non-zero exit, state byte-unchanged) when no milestone has `stage: production`. The check is a *tally* — does a production-roadmap record exist? — never a readiness judgment, mirroring the milestone goal-gate. `--force` overrides it for grandfathered or edge cases; use it deliberately, not as the normal path. The guard is on the `→production` transition only; flips to prototype/poc/mvp are unchanged. The engine never advances the stage on its own — it gathers, counts, and holds the floor while the human judges and confirms.
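The guard described above can be sketched as a single existence check. The `Milestone` record and the `check_stage_flip` helper are hypothetical; only the `stage_no_roadmap` code, the `--force` override, and the production-only scope come from the text, and the real `add.py` may structure this differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Milestone:
    slug: str
    stage: str  # "prototype", "poc", "mvp" or "production"


def check_stage_flip(target: str, milestones: list[Milestone],
                     force: bool = False) -> Optional[str]:
    """Return an error code when the flip must be refused, else None.

    The check is a tally (does a production-roadmap record exist?),
    never a judgment about readiness.
    """
    if target != "production":
        return None  # flips to prototype/poc/mvp are not guarded
    if force:
        return None  # deliberate override for grandfathered or edge cases
    has_roadmap = any(m.stage == "production" for m in milestones)
    return None if has_roadmap else "stage_no_roadmap"
```

A caller would map a returned code to a non-zero exit and leave the stored state untouched, which is what makes the refusal safe to retry.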

 ---

@@ -88,14 +102,14 @@ The survivor layer thickens as you move right: a prototype leaves you a validate

 The default is one task at a time. But when a milestone holds several tasks whose dependencies are already `PASS` and a reviewer is ready, you may run them **concurrently** — one worker per ready task, each building behind its own frozen contract.

-**Be honest about the gain.** With one human reviewer you cannot beat `review_time × N_tasks`; the human-led
+**Be honest about the gain.** With one human reviewer you cannot beat `review_time × N_tasks`; the human-led decision points are serial. So the win is **not throughput** — it is that the reviewer is *never blocked waiting on a build*. While a person reviews task A's specification bundle, the builds for B, C, and D run behind *their* frozen contracts. You hide build latency under human latency; do not promise more.

 **Two queues, no new state** — both read from `add.py status`:

 - **READY-QUEUE** — tasks in the active milestone where the phase is not `done` and every dependency already reads `gate=PASS`. These are the only tasks a worker may pick up; a task finishing `PASS` unblocks its dependents on the next `status`.
-- **REVIEW-QUEUE** — the irreducibly serial part: the **
+- **REVIEW-QUEUE** — the irreducibly serial part: the **bundle approval** (contract freeze) and any **Verify escalation**. One human, one queue, presented one at a time — never a batch that invites approval without reading.
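The READY-QUEUE rule above is a pure function of task state, which is why it needs no new state. A minimal sketch, assuming each task carries a phase, a gate value, and a list of dependency slugs; the `Task` record, the `ready_queue` name, and every gate value other than `PASS` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    slug: str
    phase: str  # e.g. "build", "verify", "done"
    gate: str   # "PASS" per the text; other values here are placeholders
    depends_on: list[str] = field(default_factory=list)


def ready_queue(tasks: list[Task]) -> list[str]:
    """Slugs of tasks a worker may pick up right now."""
    by_slug = {t.slug: t for t in tasks}
    return [
        t.slug
        for t in tasks
        if t.phase != "done"
        # An unknown dependency counts as not passed, so the task stays blocked.
        and all(d in by_slug and by_slug[d].gate == "PASS" for d in t.depends_on)
    ]
```

Because the queue is recomputed from current state on every call, a task that finishes `PASS` unblocks its dependents on the next read without any bookkeeping.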

-**The autonomy
+**The autonomy level is the throttle** — an explicit, overridable per-task token on an ordered ladder `manual < conservative < auto`. At `manual`, the human owns every gate and nothing auto-resolves (the strict floor). At `conservative`, both gates queue on the human (pure pipelining — builds overlap, nothing auto-resolves). At `auto` (the seeded default), only the bundle-approval decision point and residue escalations queue; Verify auto-PASSes on evidence, so real concurrency follows. The floor never drops below **one human approval per task, at the contract decision point**.
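One way to read the ladder is as an ordered enum plus two questions: does the contract need a human, and does Verify? The sketch below is an interpretation of the paragraph above, not the engine's code; the `Autonomy` enum and both helper names are hypothetical.

```python
from enum import IntEnum


class Autonomy(IntEnum):
    # Integer values encode the ordering manual < conservative < auto.
    MANUAL = 0
    CONSERVATIVE = 1
    AUTO = 2


def contract_needs_human(level: Autonomy) -> bool:
    """The floor: one human approval per task at the contract, at every level."""
    return True


def verify_needs_human(level: Autonomy, has_residue: bool) -> bool:
    """Below auto, Verify always queues; at auto, only residue escalates."""
    if level < Autonomy.AUTO:
        return True
    return has_residue
```

Using `IntEnum` makes "lowering" a scope an ordinary comparison, so a rule such as "never above conservative for this scope" can be written as `level <= Autonomy.CONSERVATIVE`.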

 **Design for failure (required).** Lease each task to its worker with a timeout — if a worker dies, release the claim back to READY rather than trusting partial work. A worker that hits a stop-and-escalate blocks only its own task; siblings keep running. And if several workers fail in one wave, trip a circuit-breaker and fall back to sequential — repeated failure means the scope was wrong, not the parallelism.
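The lease and the circuit-breaker can be sketched together as one small in-memory board. Everything here is illustrative: the `LeaseBoard` class, its method names, the timeout, and the failure threshold are assumptions, and the clock is passed in explicitly so the behaviour is easy to check.

```python
from dataclasses import dataclass


@dataclass
class Lease:
    worker: str
    expires_at: float


class LeaseBoard:
    """Tracks which worker holds which task, plus failures in the current wave."""

    def __init__(self, timeout_s: float, breaker_threshold: int) -> None:
        self.timeout_s = timeout_s
        self.breaker_threshold = breaker_threshold
        self.leases: dict[str, Lease] = {}
        self.failures_this_wave = 0

    def release_expired(self, now: float) -> list[str]:
        """Drop leases past their deadline; those tasks return to READY."""
        expired = [t for t, lease in self.leases.items() if lease.expires_at <= now]
        for task in expired:
            del self.leases[task]
        return expired

    def claim(self, task: str, worker: str, now: float) -> bool:
        """Lease a task to a worker unless another live lease holds it."""
        self.release_expired(now)
        if task in self.leases:
            return False
        self.leases[task] = Lease(worker, now + self.timeout_s)
        return True

    def record_failure(self, task: str) -> None:
        """A failed worker frees only its own task; siblings are untouched."""
        self.leases.pop(task, None)
        self.failures_this_wave += 1

    @property
    def sequential_fallback(self) -> bool:
        """True once enough workers failed in this wave to trip the breaker."""
        return self.failures_this_wave >= self.breaker_threshold
```

Expiry discards the claim rather than resuming it, which matches the advice not to trust partial work from a worker that died.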
package/docs/11-governance.md
CHANGED
@@ -19,7 +19,11 @@ How much the AI is allowed to do is not one switch; it is a setting chosen per a

 The governing rule, restated from the principles: **operate only at the level your review capacity can sustain.** If the AI produces more than the team can verify, drop a level.

-The **per-scope default is auto-with-evidence behind a one-approval
+The **per-scope default is auto-with-evidence behind a one-approval decision point**: the AI drafts the specification bundle, a human approves the frozen contract once, and the build auto-gates on evidence. You *lower* a scope toward draft-and-review or suggest wherever risk is high or evidence is thin — and a high-risk or method-defining scope is *always* lowered (it is never auto-run). The default sets where you start; review capacity and risk set where you stay.
+
+The engine expresses this per task as an explicit three-rung level — `autonomy: manual | conservative | auto`, an ordered ladder `manual < conservative < auto` declared in the `TASK.md` header and reviewed at the freeze. `auto` is auto-with-evidence behind the one approval (the seeded default); `conservative` is the deliberate lowering that keeps a person at the verify gate; `manual` is the strict floor where the human owns the gate and nothing auto-resolves. A high-risk or method-defining scope refuses an unguarded `auto` (`unguarded_high_risk_auto`) — it must be lowered to `conservative` or `manual`. The prose here and that engine token are one rule: prose ≡ enforcement.
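The refusal can be sketched as a check on the declared level. This assumes the `TASK.md` header holds a line of the form `autonomy: <level>` and that a missing line means the seeded default; the exact header grammar, the `high_risk` flag, and both function names are hypothetical, and the sketch treats any `auto` on a high-risk scope as unguarded.

```python
import re
from typing import Optional

LADDER = ("manual", "conservative", "auto")  # ordered low to high

# Assumed header grammar: a line reading "autonomy: <level>".
_AUTONOMY_LINE = re.compile(r"^autonomy:\s*(\S+)\s*$", re.MULTILINE)


def read_autonomy(task_md: str) -> str:
    """Return the declared level, or the seeded default when none is declared."""
    match = _AUTONOMY_LINE.search(task_md)
    level = match.group(1) if match else "auto"
    if level not in LADDER:
        raise ValueError(f"unknown autonomy level: {level!r}")
    return level


def check_autonomy(task_md: str, high_risk: bool) -> Optional[str]:
    """Return the refusal code for a high-risk task left at auto, else None."""
    if high_risk and read_autonomy(task_md) == "auto":
        return "unguarded_high_risk_auto"
    return None
```

Note that a high-risk task with no `autonomy:` line is refused too, since the seeded default is `auto`; lowering has to be written down to count.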
+
+**Autonomy is earned by goal-clarity — the auto-ready goal.** The autonomy level decides *who* resolves Verify; an **auto-ready goal** decides whether a self-verifying run is even *meaningful*. A milestone goal is *auto-ready* when **every exit criterion cites a verifier** — `(verify: <test | command | metric>)` — so the engine can check the result against the goal without human judgment. `add.py check` raises a `goal_not_auto_ready` WARN (never red, the active milestone only) while the goal has criteria not all cited, and `status` surfaces a `goal-ready:` line every session, so the goal-clarity gap stays visible. The WARN *measures*, it never blocks: it changes neither the freeze gate nor the autonomy level — clarifying the goal is the prerequisite that *earns* trust, not a new gate (a zero-criteria goal reads not-auto-ready and is milestone-shaping's nudge, not this one's). The lint raises the floor — a citation slot per criterion — but cannot prove the citation is honest: a human can still write `(verify: it works)`, and closing that is a person's judgment, not the engine's.
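The auto-ready lint amounts to checking each exit criterion for a `(verify: …)` slot. A minimal sketch, assuming criteria arrive as plain strings; the regular expression and both function names are illustrative and are not lifted from `add.py check`.

```python
import re

# A criterion counts as cited when it carries a non-empty "(verify: ...)" slot.
_VERIFY = re.compile(r"\(verify:\s*[^)\s][^)]*\)")


def uncited(criteria: list[str]) -> list[str]:
    """Criteria that name no verifier; these are what the WARN would list."""
    return [c for c in criteria if not _VERIFY.search(c)]


def goal_auto_ready(criteria: list[str]) -> bool:
    """True when there is at least one criterion and every one cites a verifier."""
    if not criteria:
        return False  # a zero-criteria goal reads not-auto-ready
    return not uncited(criteria)
```

The check is purely syntactic, which is the limitation the text names: `(verify: it works)` satisfies it, so whether a citation is honest remains a person's call.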

 ## The gate-fail protocol and the three reports

@@ -46,7 +50,7 @@ When someone proposes skipping a step "to go faster," this table is the answer:

 ## The continuous concerns

-Four concerns are not steps but threads that run through every step, starting at project setup. Pulling them
+Four concerns are not steps but threads that run through every step, starting at project setup. Pulling them forward ("shifting left") is far cheaper than bolting them on at the end.

 | Concern | Begins at | Enforced at the build gate by |
 |---------|-----------|-------------------------------|
package/docs/12-roles.md
CHANGED
@@ -11,8 +11,8 @@ Everyone on an AIDD team becomes, in part, a *verifier*; most also become *autho
 - **Mission:** ensure the right thing gets built. You guard the problem.
 - **Leads:** Specify. **Contributes to:** Scenarios; the loop (deciding what the next cycle addresses).
 - **Owns:** the problem definition, the glossary of domain terms, the prioritized backlog.
-- **Done means:** the spec states real user value with no disputed terms and its assumptions ranked
-- **Apply it:** run the Specify prompt against a real ticket or interview, then read the AI's
+- **Done means:** the spec states real user value with no disputed terms and its assumptions ranked lowest-confidence first — the one or two most likely wrong flagged with *why* and *what they cost*; after release, you have decided what the next loop must address.
+- **Apply it:** run the Specify prompt against a real ticket or interview, then read the AI's lowest-confidence flag *first* and decide the one or two load-bearing assumptions before skimming the low-stakes tail. If you cannot confirm a load-bearing rule, it is not ready to build.

 ## Architect / Engineering Lead

@@ -40,7 +40,7 @@ Everyone on an AIDD team becomes, in part, a *verifier*; most also become *autho

 ## QA / Test Engineer

-- **Mission:** make "done" machine-checkable; you are the
+- **Mission:** make "done" machine-checkable; you are the guardrail for AI-written code.
 - **Leads:** Tests. **Contributes to:** Scenarios (turning rules into checkable form); the loop (production monitors).
 - **Owns:** the test suite, the scenario files, the coverage target, the test report at each gate.
 - **Done means:** every scenario has a test that was red before the build; the suite is honest (nothing passes by default); coverage never regresses.
package/docs/13-adoption.md
CHANGED
@@ -19,7 +19,7 @@ Adopt the method on one real product, not as an all-at-once mandate.

 | Choose… | When… |
 |---------|-------|
-| **Express** | startup, spike, or internal tool; speed of learning dominates; small
+| **Express** | startup, spike, or internal tool; speed of learning dominates; small scope of impact |
 | **Standard** | a normal product with real users and ordinary risk |
 | **Regulated** | finance, health, or anything audited; failure is expensive or legally consequential |

package/docs/14-foundation.md
CHANGED
@@ -1,6 +1,6 @@
 # 14 · The foundation: project context across milestones

-[← 13 Adoption](./13-adoption.md) · [Contents](./README.md) · Next: [
+[← 13 Adoption](./13-adoption.md) · [Contents](./README.md) · Next: [15 Foundations & Lineage →](./15-foundations-and-lineage.md)

 ---

@@ -18,7 +18,7 @@ guesses. That is the same failure the method exists to prevent ([00](./00-introd
 one level up.

 The **foundation** is the layer that holds this context and *outlives every
-milestone*. It is not new ceremony; it is the [
+milestone*. It is not new ceremony; it is the [living documentation](./appendix-f-requirements-matrix.md)
 the method already names, made explicit as three concerns.

 ## Three concerns, one foundation
@@ -53,14 +53,14 @@ fifth, where the AI executes on it:

 

-> The diagram's foundation (DDD · SDD · UDD) and the method's own words —
->
+> The diagram's foundation (DDD · SDD · UDD) and the method's own words — living
+> documentation · the foundation document · ubiquitous language — name the same three ideas. This
 > chapter is where the diagram and the text finally meet.

 ## One file, not three

 A foundation that takes a week to write is a foundation no one keeps current. So
-ADD realizes all three concerns as **one
+ADD realizes all three concerns as **one living document — `PROJECT.md`** — with
 one short section each, plus an append-only record of key decisions:

 ```
@@ -76,8 +76,8 @@ the detail belongs in a milestone or a contract, not the foundation. The foundat
 is the *thin, durable* context the engine reads first — not a place to relocate the
 work. And you do not hand-write it: at setup the AI **drafts** all four sections —
 silently from an existing codebase, or from a short four-lens interview on a
-greenfield repo — and a single human **
-direction (the setup-
+greenfield repo — and a single human **baseline approval** freezes that draft as committed
+direction (the setup-level analog of a contract freeze).

 ## How it feeds the engine — and takes feedback back

@@ -101,24 +101,24 @@ life of the product, owned above any single milestone.

 | Tier | Lives in | Lifespan | Holds |
 |------|----------|----------|-------|
-| **Project** (foundation) | `.add/PROJECT.md` +
+| **Project** (foundation) | `.add/PROJECT.md` + living-doc files | whole product | domain, spec stance, users, decisions |
 | **Milestone** | `.add/milestones/<slug>/MILESTONE.md` | one depth-bounded goal | scope, shared contracts, exit criteria |
 | **Task** | `.add/tasks/<slug>/TASK.md` | one feature | the seven-step artifacts |

 A milestone is a *version bump* to the foundation, not a fresh start: when it
-closes,
+closes, consolidate what it validated into `PROJECT.md` (a decision, a settled domain
 term, a confirmed user journey) and open the next one against the same, now-richer,
-ground. The
+ground. The consolidation is not informal: each loop emits **lessons learned** (tagged
 `DDD · SDD · UDD · TDD · ADD`) in its Observe step, and at milestone close a person
-gathers the open ones and
-bumped — into the foundation. See [09 · The loop](./09-the-loop.md#
+gathers the open ones and consolidates them — append-only, with the `foundation-version:`
+bumped — into the foundation. See [09 · The loop](./09-the-loop.md#lessons-learned-and-the-retrospective-consolidation)
 for the grammar, the ritual, and the tooling (`add.py deltas`, `add.py check`).

 ## In the tooling

-- `add.py init` scaffolds `PROJECT.md` as a
-content and a single human **
-
+- `add.py init` scaffolds `PROJECT.md` as a living-doc file; the AI then drafts its
+content and a single human **baseline approval** (`add.py lock`) freezes it. Like every
+living-doc file, `init` **never overwrites a hand-edited one**.
 - `add.py status` shows a one-line pointer to the foundation, so a fresh session
 re-orients on context before code.
 - The guideline block written into `CLAUDE.md` / `AGENTS.md` tells any agent the
@@ -0,0 +1,106 @@
+# 15 · Foundations & Lineage
+
+[← 14 The foundation](./14-foundation.md) · [Contents](./README.md) · Next: [Appendix A Templates →](./appendix-a-templates.md)
+
+---
+
+ADD did not appear from nowhere. It sits where four currents meet: the **recursive
+self-improvement** thesis (AI that helps build the next AI), a decade of **autonomous and
+agentic** research, the **spec-driven development** movement (the specification, not the
+code, is the source of truth), and the **tests-first** discipline that constrains a
+generate→check→refine loop with executable tests — turning fluent model output into
+trustworthy software. This chapter tells that story; [Appendix G](./appendix-g-references.md)
+is the verified source list it cites into. Every `[Author Year]` here resolves to an entry
+there.
+
+## The frame — "closing the loop"
+
+Anthropic's recursive-self-improvement picture runs from autonomous agents delegating to
+workers *today* toward a future where Claude improves Claude — *closing the loop* on the
+work of building AI itself [Favaro & Clark 2026]. That is the backdrop ADD is built for, and
+its position inside that picture is deliberately narrow: ADD is a **human-gated,
+evidence-trusted** instance of recursive self-improvement. The AI drives the whole inner
+cycle — specify → build → verify → observe — but a human owns the frozen contract and the
+verify gate, and trust comes from passing tests and re-resolved evidence, never from a
+diff that merely reads plausibly. The argument is not that the loop should stay open
+forever; it is that the loop should be *bounded by human direction* rather than left to run
+unattended [Amodei 2024]. ADD is one concrete shape for that bound.
+
+## The four currents
+
+**Recursive self-improvement.** The mathematical anchor is the Gödel machine — a
+self-modifying agent that rewrites itself *only when it can prove the rewrite helps*
+[Schmidhuber 2003]. ADD enforces the same discipline socially rather than formally: the
+never-weaken-a-test rule is "only change on proof" expressed as a gate. The algorithmic kin
+arrived later — a scaffolding program that improves the code that improves code
+[Zelikman et al. 2023], a generate→critique→refine micro-loop [Madaan et al. 2023], agents
+that keep verbal reflections and retry [Shinn et al. 2023], an agent that grows a reusable
+skill library over time [Wang et al. 2023], and an evolutionary coder that beat a
+long-standing matrix-multiplication record under continuous checking
+[Novikov et al. 2025]. And where a self-rewarding loop has the model judge its own reward
+[Yuan et al. 2024], ADD diverges by design — it makes the tests and a human the reward
+signal, not the model's own opinion.
+
+**Autonomous and agentic workflows.** The architecture vocabulary comes from the canonical
+taxonomy of prompt-chaining, routing, orchestrator-workers, and the evaluator-optimizer loop
+[Schluntz & Zhang 2024] — where evaluator-optimizer *is* build→verify→refine and
+orchestrator-workers is ADD's wave parallelism. Underneath it sit the base agent loop of
+interleaved think→act→observe [Yao et al. 2022], the self-supervised tool use that lets an
+agent run its own tests and builds [Schick et al. 2023], and the designed agent–computer
+interface that materially lifts autonomous issue resolution [Yang et al. 2024] — the role
+ADD's `add.py` engine plays for the method. The production reports close the gap from theory
+to practice: checkpoints, subagents, and rollback for autonomous work [Anthropic 2025a], and
+a lead orchestrating subagents under an LLM judge [Anthropic 2025b].
+
+**Spec-driven development.** ADD's closest siblings are explicit specification systems.
+GitHub's **spec-kit** runs `constitution` → `specify` → `plan` → `tasks` → `implement` with
+the spec as the executable source of truth [GitHub 2025]; its launch framed task
+decomposition as "TDD for your AI agent" [Delimarsky 2025], and its rationale named the
+failure spec-driven work exists to solve — context degrading over a long session
+[Vesely 2025]. The academic vocabulary followed, with a taxonomy of Spec-First,
+Spec-Anchored, and Spec-as-Source rigor [Piskala 2026], and the pattern is converging across
+vendors [InfoQ 2025]. Nearest of all is **GSD** — a spec-driven, context-engineering system
+for the same Claude-Code niche [GSD 2025].
+
+**Tests-first and verification.** The empirical backbone is direct: supplying tests
+alongside the prompt measurably lifts pass rates [Mathews & Nagappan 2024], and the field's
+yardstick judges a fix solely by whether the project's own tests pass [Jimenez et al. 2023].
+"Done" means the tests pass — which is exactly how ADD gates a feature. The safety framing
+completes the current: human control and transparency made concrete [Anthropic 2025c], under
+a governance ceiling that grows *more* binding, not less, as the loop gets more capable
+[Anthropic 2026b].
+
+## Where ADD diverges
+
+The shared lineage is real, but ADD is not a re-skin of its siblings. spec-kit stops at
+`implement`; GSD ends at verify. ADD closes the loop past both by adding three things
+neither spec-kit [GitHub 2025] nor GSD [GSD 2025] carries as a first-class gate:
+
+- a **failing-tests-first gate** — no build starts until the tests are red for the right
+reason, so the contract is proven executable before any code exists;
+- an **observe → `fold`** step — confirmed lessons learned consolidate back into a versioned
+foundation, so the method improves itself across loops (retrospective consolidation is the
+recursive-self-improvement current turned inward on ADD);
+- a **dynamic goal-loop** — the engine holds a milestone open and reopens tasks until its
+exit criteria are met, rather than declaring done when a checklist empties.
+
+ADD also deliberately targets **less doc-time than GSD** — a lean foundation and one human
+approval per task instead of a document per phase. The tests-first gate, the `fold`, and the
+goal-loop are ADD's contribution; everything beneath them is inherited.
+
+## The evidence chain — the loop already runs
+
+The case that this is not speculative rests on three measured facts. First, the task
+time-horizon: the length of work models complete unaided keeps doubling [Favaro & Clark 2026].
+Second, the authorship share: by 2026 more than 80% of the code merged at Anthropic was
+Claude-authored [Favaro & Clark 2026]. Third, the **Automated Alignment Researchers** result:
+nine parallel Claude agents recovered roughly 97% of the human-expert gap on an alignment task
+in five days against the human team's seven [Anthropic 2026a] — parallel agents working under
+review, which is precisely ADD's wave-plus-verify shape. The loop already runs.
+
+What it does *not* yet supply is the discipline to trust the output. That is ADD's
+contribution: the frozen contract, the never-weaken-a-test rule, the evidence-over-inspection
+gate, and the security HARD-STOP that no autonomy level may auto-pass [Anthropic 2025c],
+held beneath the responsible-scaling governance ceiling [Anthropic 2026b]. As the loop grows
+more capable, those gates and the human-owned verify matter more, not less. ADD is the human-gated, evidence-trusted way to stand inside the
+closing loop and still own the result.
package/docs/README.md
CHANGED
|
@@ -51,6 +51,9 @@ For every feature, before AI writes any code, you write four short artifacts in
|
|
|
51
51
|
- [13 · Adoption and onboarding](./13-adoption.md)
|
|
52
52
|
- [14 · The foundation: project context across milestones](./14-foundation.md)
|
|
53
53
|
|
|
54
|
+
**Lineage**
|
|
55
|
+
- [15 · Foundations & Lineage](./15-foundations-and-lineage.md)
|
|
56
|
+
|
|
54
57
|
**Part IV — Reference**
|
|
55
58
|
- [Appendix A · Templates](./appendix-a-templates.md)
|
|
56
59
|
- [Appendix B · Prompt library](./appendix-b-prompts.md)
|
|
@@ -58,6 +61,7 @@ For every feature, before AI writes any code, you write four short artifacts in
|
|
|
58
61
|
- [Appendix D · The worked example, end to end](./appendix-d-worked-example.md)
|
|
59
62
|
- [Appendix E · Checklists](./appendix-e-checklists.md)
|
|
60
63
|
- [Appendix F · Document requirements matrix (Project → Milestone → Task)](./appendix-f-requirements-matrix.md)
|
|
64
|
+
- [Appendix G · References & lineage](./appendix-g-references.md)
|
|
61
65
|
|
|
62
66
|
---
|
|
63
67
|
|
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
# Appendix A · Templates
|
|
2
2
|
|
|
3
|
-
[←
|
|
3
|
+
[← 15 Foundations & Lineage](./15-foundations-and-lineage.md) · [Contents](./README.md) · Next: [Appendix B Prompts →](./appendix-b-prompts.md)
|
|
4
4
|
|
|
5
5
|
Copy-paste blanks. Project-level templates are filled once at setup; feature-level templates are filled once per feature.
|
|
6
6
|
|
|
@@ -46,8 +46,8 @@ Reject:
|
|
|
46
46
|
- <bad input / situation> -> "<error_code>"
|
|
47
47
|
After:
|
|
48
48
|
- <state true once it succeeds>
|
|
49
|
-
Assumptions —
|
|
50
|
-
⚠ <most-likely-wrong assumption> —
|
|
49
|
+
Assumptions — lowest-confidence first:
|
|
50
|
+
⚠ <most-likely-wrong assumption> — lowest confidence because <why>; if wrong: <cost>
|
|
51
51
|
- [x] <confirmed / low-stakes assumption> — <one line>
|
|
52
52
|
```
|
|
53
53
|
|
|
@@ -7,6 +7,9 @@ The contents of the `playbook/` folder. Each prompt is plain text that names the
|
|
|
7
7
|
---
|
|
8
8
|
|
|
9
9
|
### `playbook/1_specify.md`
|
|
10
|
+
|
|
11
|
+
<prompt>
|
|
12
|
+
|
|
10
13
|
```
|
|
11
14
|
Role: a domain analyst who brainstorms, then asks rather than assumes.
|
|
12
15
|
Read first: ./PRD/* , ./GLOSSARY.md , ./inputs/ (tickets, interviews, contracts)
|
|
@@ -19,15 +22,20 @@ Steps:
|
|
|
19
22
|
giving each refusal a named error code.
|
|
20
23
|
# why: named errors become scenarios and contract responses; "handle bad input" does not.
|
|
21
24
|
2. State the success state-change (After).
|
|
22
|
-
3. List the assumptions you had to make, RANKED
|
|
23
|
-
|
|
24
|
-
# why: a flat all-equal list gets
|
|
25
|
-
Exit: a domain owner disputes none of it; assumptions ranked
|
|
25
|
+
3. List the assumptions you had to make, RANKED lowest-confidence first; flag the 1–2 where
|
|
26
|
+
your confidence is lowest as `⚠ <assumption> — lowest confidence because <why>; if wrong: <cost>`.
|
|
27
|
+
# why: a flat all-equal list gets approved without reading; a ranked one aims my attention at the risk.
|
|
28
|
+
Exit: a domain owner disputes none of it; assumptions ranked lowest-confidence first, the 1–2 ⚠ flags
|
|
26
29
|
carrying why + cost — or an honest "none material" that still names the single biggest risk.
|
|
27
|
-
Never: resolve an ambiguity by guessing — ask. Never a blank "none" or a flat
|
|
30
|
+
Never: resolve an ambiguity by guessing — ask. Never a blank "none" or a flat list of equal ticks.
|
|
28
31
|
```
|
|
29
32
|
|
|
33
|
+
</prompt>
|
|
34
|
+
|
|
30
35
|
### `playbook/2_scenarios.md`
|
|
36
|
+
|
|
37
|
+
<prompt>
|
|
38
|
+
|
|
31
39
|
```
|
|
32
40
|
Role: a specification tester.
|
|
33
41
|
Read first: ./SPEC.md , ./GLOSSARY.md
|
|
@@ -41,7 +49,12 @@ Exit: every rule has at least one scenario with an observable result.
|
|
|
41
49
|
Never: write a vague result ("then it works").
|
|
42
50
|
```
|
|
43
51
|
|
|
52
|
+
</prompt>
|
|
53
|
+
|
|
44
54
|
### `playbook/3_contract.md`
|
|
55
|
+
|
|
56
|
+
<prompt>
|
|
57
|
+
|
|
45
58
|
```
|
|
46
59
|
Role: an interface/contract architect; contracts are immutable once frozen.
|
|
47
60
|
Read first: ./SPEC.md , ./features/*.feature , ./GLOSSARY.md
|
|
@@ -57,7 +70,12 @@ Exit: contract tests pass against the mock; every spec rejection has a response.
|
|
|
57
70
|
Never: change a frozen contract — a change is a request that reopens Specify.
|
|
58
71
|
```
|
|
59
72
|
|
|
73
|
+
</prompt>
|
|
74
|
+
|
|
60
75
|
### `playbook/4_tests.md`
|
|
76
|
+
|
|
77
|
+
<prompt>
|
|
78
|
+
|
|
61
79
|
```
|
|
62
80
|
Role: a test author who writes tests before code.
|
|
63
81
|
Read first: ./features/*.feature , ./contracts/*
|
|
@@ -73,7 +91,12 @@ Exit: one test per scenario; suite red for the right reason; target recorded.
|
|
|
73
91
|
Never: assert on internals; write the implementation here.
|
|
74
92
|
```
|
|
75
93
|
|
|
94
|
+
</prompt>
|
|
95
|
+
|
|
76
96
|
### `playbook/5_build.md`
|
|
97
|
+
|
|
98
|
+
<prompt>
|
|
99
|
+
|
|
77
100
|
```
|
|
78
101
|
Role: an execution agent. The human commands; you implement and report.
|
|
79
102
|
Read first: ./SPEC.md , ./contracts/* , ./tests/* , ./CONVENTIONS.md
|
|
@@ -90,7 +113,12 @@ Never: change a test or the contract; add an unlisted dependency; exceed the tas
|
|
|
90
113
|
without escalating; guess when unclear — ask.
|
|
91
114
|
```
|
|
92
115
|
|
|
116
|
+
</prompt>
|
|
117
|
+
|
|
93
118
|
### `playbook/6_observe.md`
|
|
119
|
+
|
|
120
|
+
<prompt>
|
|
121
|
+
|
|
94
122
|
```
|
|
95
123
|
Role: a reliability analyst feeding the next cycle.
|
|
96
124
|
Read first: telemetry exports , service-objective definitions , incident tickets
|
|
@@ -104,9 +132,14 @@ Exit: a reviewed SPEC delta linked into the backlog.
|
|
|
104
132
|
Never: auto-roll back — recommend; a human owns the production decision.
|
|
105
133
|
```
|
|
106
134
|
|
|
135
|
+
</prompt>
|
|
136
|
+
|
|
107
137
|
---
|
|
108
138
|
|
|
109
139
|
### Master prompt skeleton
|
|
140
|
+
|
|
141
|
+
<prompt>
|
|
142
|
+
|
|
110
143
|
```
|
|
111
144
|
Role: <one line — who the agent is for this step>
|
|
112
145
|
Read first: <explicit repository paths — never chat memory>
|
|
@@ -117,3 +150,5 @@ Exit: <conditions a person or the pipeline can check>
|
|
|
117
150
|
Never: <what the agent must not do>
|
|
118
151
|
Evidence: <artifacts to attach for review>
|
|
119
152
|
```
|
|
153
|
+
|
|
154
|
+
</prompt>
|
|
@@ -10,31 +10,43 @@
|
|
|
10
10
|
|
|
11
11
|
**Artifact** — a durable work product: the spec, the scenarios, the contract, the tests. The artifacts survive; the code is disposable.
|
|
12
12
|
|
|
13
|
-
**
|
|
13
|
+
**Lesson learned** (formerly "competency delta") — a single learning a loop produces, tagged by which of the five competencies (`DDD · SDD · UDD · TDD · ADD`) it improves, written in a task's OBSERVE phase as `- [<COMPETENCY> · <status>] <learning> (evidence: …)`. Emitted `open` by the AI; the human folds it into a versioned `PROJECT.md` (`folded`) or declines it (`rejected`). The mechanism by which the foundation self-improves instead of drifting. See the `add` skill's `deltas.md`.
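
As an illustration of the line format in the entry above, here is a minimal Python sketch of a parser. It is not the `add` skill's own code: `LESSON_RE`, `Lesson`, and `parse_lesson` are hypothetical names, and the sketch assumes the separator is a literal ` · ` and that the only valid statuses are the three the glossary names (`open`, `folded`, `rejected`).

```python
import re
from typing import NamedTuple, Optional

# The five competencies and three statuses named in the glossary entry.
COMPETENCIES = {"DDD", "SDD", "UDD", "TDD", "ADD"}
STATUSES = {"open", "folded", "rejected"}

# Hypothetical pattern for: - [<COMPETENCY> · <status>] <learning> (evidence: ...)
LESSON_RE = re.compile(
    r"^- \[(?P<competency>\w+) · (?P<status>\w+)\] "
    r"(?P<learning>.+?) \(evidence: (?P<evidence>.+)\)$"
)


class Lesson(NamedTuple):
    competency: str
    status: str
    learning: str
    evidence: str


def parse_lesson(line: str) -> Optional[Lesson]:
    """Return a Lesson for a well-formed line, or None if it does not fit."""
    m = LESSON_RE.match(line.strip())
    if m is None:
        return None
    if m["competency"] not in COMPETENCIES or m["status"] not in STATUSES:
        return None
    return Lesson(m["competency"], m["status"], m["learning"], m["evidence"])
```

A line such as `- [TDD · open] assert on outputs, not internals (evidence: tests/test_x.py)` parses into its four fields; a line with an unknown competency or status yields `None` rather than a guess.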
|
|
14
14
|
|
|
15
15
|
**Contract** — the fixed external shape of a feature: interfaces, data structures, names, and error cases. Frozen before the build, it is the surface the AI builds against.
|
|
16
16
|
|
|
17
|
-
**Co-specification** — how a spec is made in ADD: the AI and the human **brainstorm the shape together** (diverge), the AI **drafts** it, and the human **validates with the AI's advice** (validate). The AI's decisive advice is the *
|
|
17
|
+
**Co-specification** — how a spec is made in ADD: the AI and the human **brainstorm the shape together** (diverge), the AI **drafts** it, and the human **validates with the AI's advice** (validate). The AI's decisive advice is the *lowest-confidence flag*. It replaces dictation-by-one-side — the human owns the decision, the AI owns surfacing what it does not yet know. See [03 Specify](./03-step-1-specify.md).
|
|
18
18
|
|
|
19
19
|
**Disposable code** — the view that code is one regenerable implementation of the artifacts, not a durable asset to be preserved.
|
|
20
20
|
|
|
21
21
|
**Evidence bundle** — the proof attached to a change (passing tests, clean security scan, no coverage loss) that justifies trusting it and may unlock more AI autonomy.
|
|
22
22
|
|
|
23
|
-
**Foundation version** — a monotonic integer marker in `PROJECT.md` that advances by one each time confirmed
|
|
23
|
+
**Foundation version** — a monotonic integer marker in `PROJECT.md` that advances by one each time confirmed lessons learned are consolidated into the foundation. It makes the living documentation's evolution auditable: a rising version with fewer new deltas per milestone is the signal that a competency is converging rather than drifting. Bumped only by the retrospective consolidation (see the `add` skill's `fold.md`).
|
|
24
24
|
|
|
25
25
|
**Gate** — a checkpoint with an explicit pass/fail exit. Its outcome is `PASS`, `RISK-ACCEPTED`, or `HARD-STOP`.
|
|
26
26
|
|
|
27
|
+
**Ground (phase-0 preamble)** — the per-task phase *before* Specify in which the AI gathers the real current codebase the task touches — files, symbols, signatures, patterns, conventions — into a lean **grounding map**, surfacing the **anchors** the frozen contract will cite. It is AI-owned and adds no approval (the one approval stays at the contract freeze); it precedes the seven steps as step 0 so the contract, tests, and build are grounded in the code as it actually is, not in assumption. Lives in the `add` skill's `phases/0-ground.md`.
|
|
28
|
+
|
|
29
|
+
**Grounding map / anchors** — the §0 GROUND artifact: the real files, symbols, and conventions a task touches, plus the **anchors** — the symbols the frozen contract names. Task-specific delta only: it defers to `PROJECT.md` / `CONVENTIONS.md` for architecture and never re-runs the setup brownfield scan. `add.py status` / `check` surface whether the active task's contract is grounded (measure, never block — the contract-freeze checklist asks the human to confirm it).
|
|
30
|
+
|
|
27
31
|
**`HARD-STOP`** — a gate outcome meaning work cannot proceed; triggered by any failing test or security finding.
|
|
28
32
|
|
|
29
|
-
**Intake** — the step *before* a task: sizing a raw request into versioned scope by classifying it into one **request bucket**. The AI proposes `{bucket, rationale, command}`; the human confirms. Lives in the `add` skill's `intake.md` (the intake
|
|
33
|
+
**Intake** — the step *before* a task: sizing a raw request into versioned scope by classifying it into one **request bucket**. The AI proposes `{bucket, rationale, command}`; the human confirms. Lives in the `add` skill's `intake.md` (the intake level, above the per-task flow).
|
|
30
34
|
|
|
31
|
-
**
|
|
35
|
+
**Lowest-confidence flag** (formerly "least-sure flag") — the AI's ranked declaration of the **1–2 things most likely to be wrong** in what it is asking a human to approve, each carrying *why* it is uncertain and *what it costs if wrong* (`⚠ [spec|scenario|contract|test] … — because …; if wrong: …`). It reshapes the old flat assumptions list into a ranked one, so a single approval aims the reviewer's attention at the real risk instead of a flat list of equal-looking ticks. Bundle-wide at the contract-freeze decision point; the §1 assumptions are its first input. If nothing is materially uncertain it still names the single biggest risk — never a blank "none". It makes a genuine review cheap and a lazy one visibly negligent, but cannot *force* the read. The "AI advises" half of **co-specification**.
|
|
32
36
|
|
|
33
37
|
**Living document** — an artifact expected to change as the loop learns; never frozen forever (the one exception being a versioned contract, which changes only via a change request).
|
|
34
38
|
|
|
35
|
-
**
|
|
39
|
+
**Onboarding** (formerly "on-ramp") — the path a new user walks from install to their first milestone: install → `/add` → describe the goal → the agent runs intake (sizing the request into a milestone the human confirms) → the specification bundle → the self-driving run. The AI-first entry to the method; the human talks to the agent rather than hand-typing `add.py`.
|
|
40
|
+
|
|
41
|
+
**Decision point** (formerly "seam") — a place where the flow stops for human judgment: the contract-freeze approval (the one approval), an escalated verify gate, intake confirmation, milestone close. The machine layer keeps the legacy name: the `--json` owner enum `seam`, the decide-digest key `seam`, and the `seam-audit` CI job.
|
|
42
|
+
|
|
43
|
+
**The decision arc** — the three engine-sourced lines a gate report opens with at every **decision point**: `goal:` the milestone goal the work serves · `done:` the achievement, the proven progress toward it (the gate reports render this line as `done`) · `plan:` what comes next. What `done` reports adapts per gate (verify: tests + evidence · milestone close: exit-criteria met · intake: the request sized) while the three-part shape stays constant. Rendered first, above the report's summary, so the human confirms with sight of the whole trajectory, not a local snapshot. Engine-sourced like all evidence — goal · done · plan are pulled from `add.py` output, never re-typed. Presentation only: it never adds a gate or changes a `PASS` / `RISK-ACCEPTED` / `HARD-STOP` / freeze outcome. The report it opens is the chat report a person reads at a decision point — distinct from the three Test/Quality/Risk reports a verify gate produces ([11 Governance](./11-governance.md)). See the `add` skill's `report-template.md`.
|
|
44
|
+
|
|
45
|
+
**Specification bundle** (formerly "the one-approval front") — §1–§4 of a task (spec · scenarios · contract · failing tests) drafted by the AI as one piece and approved by a person **once**, at the contract freeze. Rejecting any part returns the whole bundle to draft. The single approval it carries is the bundle approval.
|
|
36
46
|
|
|
37
|
-
**
|
|
47
|
+
**Retrospective consolidation** (formerly "the fold / fold ritual") — the milestone-close (or on-demand) step where a person gathers `open` lessons learned, confirms each, and the AI writes them append-only into the versioned foundation, bumping `foundation-version:`. The AI never self-approves a consolidation. The machine names keep their names: `fold.md`, the `folded` delta status, and `add.py deltas`.
|
|
48
|
+
|
|
49
|
+
**Owner (of a phase)** — who drives a phase, exposed by `add.py … --json` as `human`, `seam`, or `ai` (machine enum values that keep their names; in prose, the concept behind the `seam` value is now called the **decision point**). It tells an autonomous harness where it may run (`ai`) and where it must checkpoint to a person (`human`/`seam`), following the who-does-what table (Verify is always `human`).
|
|
38
50
|
|
|
39
51
|
**Profile** — the intensity at which the method is run: Express, Standard, or Regulated.
|
|
40
52
|
|
|
@@ -48,17 +60,41 @@
|
|
|
48
60
|
|
|
49
61
|
**Spec (`SPEC.md`)** — the plain-language statement of what a feature must do, must reject, and assumes.
|
|
50
62
|
|
|
51
|
-
**
|
|
63
|
+
**Cross-cutting concern** (formerly "spine / continuous concern") — a concern that runs through every step rather than being one step: security, testing, observability, cost.
|
|
52
64
|
|
|
53
65
|
**Stage** — one pass through the flow at a chosen depth: Prototype, Proof of Concept, MVP, or Production-Ready.
|
|
54
66
|
|
|
55
|
-
**
|
|
67
|
+
**Stage graduation** — the orchestration loop that proposes the move to the next **stage** as a human-confirmed roadmap, never a bare flip; the 4th scope level after setup · intake · milestone-loop. The cue is every milestone `done` with the **stage-goal-criteria** all `[x]`; the flow is gather **graduation analytics** → interview *what production means here* → draft ≥1 production milestone → human confirms → `add.py stage production` as the final step. The →production flip is guarded: it refuses with `stage_no_roadmap` (a tally, not a readiness judgment) until ≥1 production milestone exists; `--force` overrides. Lives in the `add` skill's `graduate.md`.
|
|
68
|
+
|
|
69
|
+
**Graduation analytics** — the five record-sets `add.py graduation-report` clusters from the whole MVP loop for the graduation interview: open deltas by competency · open RISK-ACCEPTED waivers by expiry · RETRO records · verify residue · observe-loop coverage gaps. It gathers, never judges — there is no readiness verdict, only the records the human reasons from (gather-not-judge).
|
|
70
|
+
|
|
71
|
+
**Stage-goal-criteria** — the human-authored `[x]` checklist in `PROJECT.md` that defines "MVP covered" for this project; when every milestone is `done` and these are all checked, `add.py status` prints the graduation cue. Authored by the human (judgment), never inferred by the engine.
|
|
72
|
+
|
|
73
|
+
**Baseline approval** (formerly "the lock-down") — the single human gate ending autonomous setup: an explicit yes that freezes the foundation, first scope, and first contract together; runs as `add.py lock --by <name>`.
|
|
74
|
+
|
|
75
|
+
**Scope level** (formerly "altitude") — the granularity a decision lives at: intake level (request → versioned scope) · milestone level · setup/foundation level · task level. (A cross-stage decision lives one level out, at the **stage-graduation** loop — which `graduate.md` also numbers as a scope level; see **Stage graduation**.) One ⚠-assumption notation is shared across every scope level.
|
|
76
|
+
|
|
77
|
+
**Autonomy level** (formerly "autonomy dial") — the explicit per-task setting (`autonomy: manual | conservative | auto`, an ordered ladder manual < conservative < auto) choosing who resolves Verify: `auto` auto-PASSes on complete evidence, `conservative` keeps a human at the gate, `manual` is the strict floor (the human owns the gate; nothing auto-resolves). A high-risk scope refuses an unguarded `auto` — it must be lowered to `manual` or `conservative`. New tasks seed a visible, overridable `autonomy: auto`; a live task with no level warns (`implicit_autonomy`), a token outside the set is rejected (`unknown_autonomy_level`).
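
The ordered ladder and the two lint codes named above can be sketched in a few lines of Python. This is a hedged illustration, not the engine's implementation: `LADDER`, `lint_autonomy`, and `at_most` are hypothetical names, and the high-risk refusal is deliberately left out because the entry does not say what "unguarded" means.

```python
from typing import Optional

# Ordered ladder from the glossary: manual < conservative < auto.
LADDER = ("manual", "conservative", "auto")


def lint_autonomy(level: Optional[str]) -> Optional[str]:
    """Return the code the glossary names for a bad level, or None if valid."""
    if level is None:
        return "implicit_autonomy"        # a live task with no level: warning
    if level not in LADDER:
        return "unknown_autonomy_level"   # token outside the set: rejected
    return None


def at_most(level: str, ceiling: str) -> bool:
    """True if `level` sits at or below `ceiling` on the ladder."""
    return LADDER.index(level) <= LADDER.index(ceiling)
```

Keeping the ladder as an ordered tuple means a comparison like `at_most("auto", "conservative")` is a plain index check, which is one simple way to express "must be lowered to `manual` or `conservative`".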
|
|
78
|
+
|
|
79
|
+
**Auto-ready goal** — a milestone goal whose every exit criterion **cites a verifier** (`(verify: <test|command|metric>)`), so the engine can self-verify the result against the goal without human judgment. It is the prerequisite by which **autonomy is earned by goal-clarity**: the **autonomy level** governs *who* resolves Verify, but a clarified, machine-checkable goal is what makes a self-verifying run meaningful. `add.py check` raises a `goal_not_auto_ready` **WARN** (never red) for the active milestone until it has an auto-ready goal (≥1 exit criterion and every one cited), and `status` surfaces it (`goal-ready: auto-ready ✓` / cited-of-total); a zero-criteria goal reads not-auto-ready and is milestone-shaping's nudge, not this warning's. The lint forces a citation *slot* per criterion — it raises the floor but **cannot prove the citation is real** (a human can write `(verify: it works)`): citation-theater is the accepted irreducible floor, and the freeze gate and autonomy behavior are unchanged by it.
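
The auto-ready rule (at least one exit criterion, and every one carrying a `(verify: …)` citation) can be sketched as below. This is an assumption-laden illustration rather than the `add.py check` lint itself: `VERIFY_RE` and `goal_readiness` are hypothetical names, and the regular expression only checks that the citation slot is non-empty.

```python
import re
from typing import List, Tuple

# Matches a non-empty "(verify: ...)" slot; it cannot tell whether the
# cited test, command, or metric actually exists.
VERIFY_RE = re.compile(r"\(verify:\s*\S[^)]*\)")


def goal_readiness(criteria: List[str]) -> Tuple[bool, int, int]:
    """Return (auto_ready, cited, total) for a milestone's exit criteria."""
    total = len(criteria)
    cited = sum(1 for c in criteria if VERIFY_RE.search(c))
    # Auto-ready needs >= 1 criterion and every criterion cited.
    return (total >= 1 and cited == total, cited, total)
```

Note that a criterion ending in `(verify: it works)` counts as cited here too, which mirrors the citation-theater floor the entry describes: the check fills a slot, it does not prove the citation is real.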
|
|
80
|
+
|
|
81
|
+
**Automated quality gate** (formerly "evidence auto-gate") — the Verify resolver under `autonomy: auto`: a run may auto-PASS on complete evidence, recorded as *auto-resolved*; a security finding always escalates (`HARD-STOP`).
|
|
82
|
+
|
|
83
|
+
**Change scope** (formerly "touch-boundary") — the hard boundary of a locked run: what it may edit (code, tests-to-green, evidence) and must not (the frozen contract, locked scope, any test weakening). The `<touch_boundary>` XML prompt tag keeps its name.
|
|
84
|
+
|
|
85
|
+
**Non-functional review** (formerly "blind-spot checks") — the deliberate verify-time check of the risks tests rarely catch: concurrency, security, architecture. Security findings always escalate.
|
|
86
|
+
|
|
87
|
+
**Failing-first suite** (formerly "red safety net") — the per-feature test suite written before any code and confirmed red for the right reason (a missing implementation, not a broken test); the TDD red phase at ADD step 4.
|
|
88
|
+
|
|
89
|
+
**Method rationale** (formerly "trust layer") — the *why* behind every rule: the AIDD book in `.add/docs/`, read on demand via each phase guide's chapter pointer, never auto-loaded.
|
|
90
|
+
|
|
91
|
+
**Working state** (formerly "state surface" — one of the two record surfaces) — everything an agent loads every session: the `add` skill (router `SKILL.md` + the active phase) and the lean operational docs — `PROJECT.md`, the active `MILESTONE.md` and `TASK.md`, and `state.json`. Kept small to avoid context rot. Contrast **audit trail**.
|
|
56
92
|
|
|
57
93
|
**Stop signal** — the boolean an autonomous harness reads from `add.py … --json` (`stop = owner != "ai"`): true means pause for a person before proceeding. The irreducible stops are the contract freeze and the Verify gate. See **Owner (of a phase)**.
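
A harness-side reading of the rule `stop = owner != "ai"` might look like the following sketch. The JSON key `owner` and the function names are assumptions made for illustration; only the three enum values and the rule itself come from the entry above.

```python
import json

# Owner enum values named in the glossary.
OWNERS = {"human", "seam", "ai"}


def stop_signal(owner: str) -> bool:
    """Apply the glossary rule: stop unless the AI owns the phase."""
    if owner not in OWNERS:
        raise ValueError(f"unknown owner: {owner!r}")
    return owner != "ai"


def should_pause(status_json: str) -> bool:
    """Read a status payload (assumed to carry an "owner" key) and decide."""
    payload = json.loads(status_json)
    return stop_signal(payload["owner"])
```

Rejecting an unrecognised owner instead of defaulting to "run" keeps the sketch on the safe side: an unexpected value never lets an autonomous run proceed past a checkpoint.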
|
|
58
94
|
|
|
59
|
-
**
|
|
95
|
+
**Audit trail** (formerly "story surface") — the book (`docs/*`): the whole method, read once by a person to trust ADD, then referenced by a pointer and **never auto-loaded** into agent context. Contrast **working state**.
|
|
60
96
|
|
|
61
|
-
**
|
|
97
|
+
**Living documentation** (formerly "survivor layer") — the set of durable artifacts (conventions, glossary, frozen contracts) that outlives any particular code.
|
|
62
98
|
|
|
63
99
|
**Trust ladder / autonomy ladder** — the graduated levels of AI autonomy, earned with evidence and verification capacity.
|
|
64
100
|
|
|
@@ -73,6 +109,7 @@ This book uses plain step names. Teams connecting it to a larger formal standard
|
|
|
73
109
|
| Plain step (this book) | Formal phase name |
|
|
74
110
|
|------------------------|-------------------|
|
|
75
111
|
| Project setup | Foundation |
|
|
112
|
+
| Ground (preamble) | Codebase Discovery (the §0 grounding map) |
|
|
76
113
|
| Specify | Domain Discovery + Spec Definition |
|
|
77
114
|
| (design portion) | UX-Driven Design |
|
|
78
115
|
| Scenarios | Behavior specification (Given/When/Then) |
|
|
@@ -82,4 +119,4 @@ This book uses plain step names. Teams connecting it to a larger formal standard
|
|
|
82
119
|
| Verify | the review gate within the build |
|
|
83
120
|
| Observe (loop) | Operate and Learn |
|
|
84
121
|
|
|
85
|
-
The formal standard also names the *foundation* and *design* work as full phases in their own right; this book
|
|
122
|
+
The formal standard also names the *foundation* and *design* work as full phases in their own right; this book merges them into project setup and the Specify step (and the Prototype stage) to keep the flow to six memorable steps.
|