@pilotspace/add 1.1.0 → 1.2.0
- package/CHANGELOG.md +40 -0
- package/GETTING-STARTED.md +165 -139
- package/README.md +13 -7
- package/bin/cli.js +13 -4
- package/docs/01-principles.md +3 -3
- package/docs/02-the-flow.md +15 -11
- package/docs/03-step-1-specify.md +13 -13
- package/docs/04-step-2-scenarios.md +2 -2
- package/docs/05-step-3-contract.md +3 -3
- package/docs/06-step-4-tests.md +2 -2
- package/docs/07-step-5-build.md +1 -1
- package/docs/08-step-6-verify.md +14 -5
- package/docs/09-the-loop.md +12 -6
- package/docs/10-setup-and-stages.md +27 -13
- package/docs/11-governance.md +2 -2
- package/docs/12-roles.md +3 -3
- package/docs/13-adoption.md +1 -1
- package/docs/14-foundation.md +15 -15
- package/docs/15-foundations-and-lineage.md +106 -0
- package/docs/README.md +4 -0
- package/docs/appendix-a-templates.md +3 -3
- package/docs/appendix-b-prompts.md +40 -5
- package/docs/appendix-c-glossary.md +42 -12
- package/docs/appendix-d-worked-example.md +2 -2
- package/docs/appendix-e-checklists.md +2 -2
- package/docs/appendix-f-requirements-matrix.md +8 -8
- package/docs/appendix-g-references.md +106 -0
- package/package.json +1 -1
- package/skill/add/SKILL.md +39 -37
- package/skill/add/adopt.md +13 -11
- package/skill/add/deltas.md +8 -6
- package/skill/add/fold.md +19 -17
- package/skill/add/graduate.md +74 -0
- package/skill/add/intake.md +22 -7
- package/skill/add/loop.md +59 -0
- package/skill/add/phases/0-setup.md +29 -24
- package/skill/add/phases/1-specify.md +23 -13
- package/skill/add/phases/2-scenarios.md +14 -4
- package/skill/add/phases/3-contract.md +24 -11
- package/skill/add/phases/4-tests.md +15 -5
- package/skill/add/phases/5-build.md +11 -4
- package/skill/add/phases/6-verify.md +24 -2
- package/skill/add/phases/7-observe.md +13 -5
- package/skill/add/report-template.md +65 -7
- package/skill/add/run.md +45 -34
- package/skill/add/scope.md +10 -6
- package/skill/add/setup-review.md +13 -10
- package/skill/add/streams.md +69 -19
- package/tooling/add.py +476 -34
- package/tooling/templates/CONVENTIONS.md.tmpl +1 -1
- package/tooling/templates/GLOSSARY.md.tmpl +23 -0
- package/tooling/templates/MILESTONE.md.tmpl +1 -0
- package/tooling/templates/PROJECT.md.tmpl +4 -3
- package/tooling/templates/TASK.md.tmpl +33 -12
package/docs/02-the-flow.md
CHANGED
|
@@ -6,15 +6,15 @@
|
|
|
6
6
|
|
|
7
7
|
## The flow
|
|
8
8
|
|
|
9
|
-
AIDD is one repeatable flow of **seven steps**: six build the feature — Specify → Scenarios → Contract → Tests → Build → Verify — and the seventh, **Observe**, feeds what production teaches back into the next Specify. In the default flow the AI drafts the
|
|
9
|
+
AIDD is one repeatable flow of **seven steps**: six build the feature — Specify → Scenarios → Contract → Tests → Build → Verify — and the seventh, **Observe**, feeds what production teaches back into the next Specify. In the default flow the AI drafts the specification bundle (steps 1–4) and a person approves it **once**, at the contract freeze; the AI performs the Build; and Verify is resolved on evidence under `autonomy: auto`, with a person owning any residue. (See [11 Governance](./11-governance.md) for the autonomy level and the one-approval decision point.)
|
|
10
10
|
|
|
11
|
-

|
|
12
12
|
|
|
13
13
|
```mermaid
|
|
14
14
|
flowchart LR
|
|
15
15
|
S1["1 Specify<br/>the rules"] --> S2["2 Scenarios<br/>pass/fail cases"]
|
|
16
16
|
S2 --> S3["3 Contract<br/>freeze the shape"]
|
|
17
|
-
S3 --> S4["4 Tests<br/>
|
|
17
|
+
S3 --> S4["4 Tests<br/>failing-first (red)"]
|
|
18
18
|
S4 --> S5["5 Build<br/>AI writes code"]
|
|
19
19
|
S5 --> S6["6 Verify<br/>evidence + checks"]
|
|
20
20
|
S6 --> OBS["Observe<br/>in production"]
|
|
@@ -23,14 +23,14 @@ flowchart LR
|
|
|
23
23
|
S5 -. "a missing rule → back to Specify" .-> S1
|
|
24
24
|
OBS -. "what you learn becomes the next spec" .-> S1
|
|
25
25
|
classDef human fill:#FAEEDA,stroke:#BA7517,color:#633806;
|
|
26
|
-
classDef
|
|
26
|
+
classDef decision fill:#E1F5EE,stroke:#0F6E56,color:#04342C;
|
|
27
27
|
classDef machine fill:#E6F1FB,stroke:#185FA5,color:#042C53;
|
|
28
28
|
class S1,S2 human;
|
|
29
|
-
class S3,S4
|
|
29
|
+
class S3,S4 decision;
|
|
30
30
|
class S5,S6 machine;
|
|
31
31
|
```
|
|
32
32
|
|
|
33
|
-
> **Solid arrows are the
|
|
33
|
+
> **Solid arrows are the primary flow** — you never start a phase before its input exists (forward-skip forbidden). **Dashed arrows are backward correction** — any phase may return to an earlier one to repair its artifact (the long loop, Observe → Specify, is the same rule at milestone scale). The tight Tests ⇄ Build cycle is the per-feature red/green engine.
|
|
34
34
|
|
|
35
35
|
```text
|
|
36
36
|
human-led ─────────────────►│◄─────────── machine-led ──► human verify
|
|
@@ -45,9 +45,9 @@ flowchart LR
|
|
|
45
45
|
└─────────────────────────┘ becomes the next Specify
|
|
46
46
|
```
|
|
47
47
|
|
|
48
|
-
The shape is deliberate: the human-led steps establish direction, a frozen contract forms the
|
|
48
|
+
The shape is deliberate: the human-led steps establish direction, a frozen contract forms the decision point in the middle, and the AI-led build runs fast and safely on the far side because everything it needs is already fixed.
|
|
49
49
|
|
|
50
|
-
> **What changed in v7 (the diagrams above show the structural
|
|
50
|
+
> **What changed in v7 (the diagrams above show the structural flow, which is unchanged).** The *steps* and their order are exactly as drawn — only **who resolves them** moved. The AI now drafts the whole specification bundle (steps 1–4) and a person approves it **once**, at the contract freeze (not a sign-off at each step); and **Verify is auto-gated on evidence** under `autonomy: auto` (the default), escalating security — always a `HARD-STOP` — and other residue to a person. Lower the autonomy level to `conservative` to keep a human at the Verify gate. See [11 Governance](./11-governance.md).
|
|
51
51
|
|
|
52
52
|
## Why the order is the order
|
|
53
53
|
|
|
@@ -58,7 +58,7 @@ Each step produces exactly one artifact, and each artifact is the input to the n
|
|
|
58
58
|
| 1 Specify | the rules | scenarios, and everything after |
|
|
59
59
|
| 2 Scenarios | pass/fail cases | the tests |
|
|
60
60
|
| 3 Contract | the fixed shape | the tests and the build |
|
|
61
|
-
| 4 Tests | the failing
|
|
61
|
+
| 4 Tests | the failing-first suite | the build and the verification |
|
|
62
62
|
| 5 Build | the code | the verification |
|
|
63
63
|
| 6 Verify | a trusted, releasable change | the release and the next loop |
|
|
64
64
|
|
|
@@ -66,17 +66,21 @@ The single rule of discipline follows directly: **do not begin a step until the
|
|
|
66
66
|
|
|
67
67
|
The flow runs in two directions under two rules that never conflict. **Backward correction is always allowed:** any phase may send you back to an earlier one to repair its artifact — a failing Build that exposes a missing rule sends you back to Specify, and that is the loop working ([principle 4](./01-principles.md)), not a failure. **Forward-skipping is forbidden:** you never start a phase before its input artifact exists. Correct backward freely; never skip forward.
|
|
68
68
|
|
|
69
|
+
**`done` is terminal — except via the recorded reopen.** Backward correction moves a *live* task; a task at `done` has already passed its gate. The one way back from `done` is the recorded `reopen` action (`add.py reopen <task> --to <phase> --reason "..."`): it returns the task to an earlier phase, resets the gate, and writes down *why* — so a done verdict is never quietly un-done. This is the same backward-correction rule, made explicit at the one state where it would otherwise be bypassed silently.
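The two direction rules and the recorded reopen can be read as a small state machine. The following is a minimal sketch under stated assumptions, not the logic of `add.py`: the `Task` class, the `move` and `reopen` functions, the phase names, and the placement of `done` directly after Verify are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

# Assumed phase order: the six build steps, then the terminal state.
PHASES = ["specify", "scenarios", "contract", "tests", "build", "verify", "done"]


@dataclass
class Task:
    phase: str = "specify"
    artifacts: set = field(default_factory=set)    # phases whose artifact exists
    gate: Optional[str] = None                     # recorded verify outcome, if any
    reopen_log: list = field(default_factory=list)


def move(task: Task, target: str) -> None:
    """Move a live task: backward freely, forward only when every input exists."""
    if task.phase == "done":
        raise ValueError("done is terminal; use reopen()")
    here, there = PHASES.index(task.phase), PHASES.index(target)
    if there > here:
        missing = [p for p in PHASES[:there] if p not in task.artifacts]
        if missing:
            raise ValueError(f"forward-skip forbidden, missing artifacts: {missing}")
    task.phase = target  # backward correction needs no precondition


def reopen(task: Task, to: str, reason: str) -> None:
    """The one way back from done: return to a phase, reset the gate, record why."""
    if task.phase != "done":
        raise ValueError("reopen applies only to a done task")
    if to not in PHASES[:-1]:
        raise ValueError(f"unknown phase: {to}")
    if not reason.strip():
        raise ValueError("a reopen must record a reason")
    task.reopen_log.append((to, reason))
    task.gate = None
    task.phase = to
```

The design point the sketch tries to mirror is that `move` refuses to touch a done task at all, so the only path out of `done` is the one that writes a reason down.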
|
|
70
|
+
|
|
69
71
|
## Who does what
|
|
70
72
|
|
|
71
73
|
| Step | Person's job | AI's job |
|
|
72
74
|
|------|--------------|----------|
|
|
73
75
|
| 1 Specify | confirm the rules (part of the one approval) | draft; list assumptions to confirm |
|
|
74
76
|
| 2 Scenarios | confirm what "correct" looks like (part of the one approval) | draft scenarios |
|
|
75
|
-
| 3 Contract | **approve & freeze the whole bundle (§1–§4) once — the
|
|
77
|
+
| 3 Contract | **approve & freeze the whole bundle (§1–§4) once — the decision point** | draft the contract and mocks |
|
|
76
78
|
| 4 Tests | confirm the targets (part of the one approval) | draft the failing tests |
|
|
77
79
|
| 5 Build | direct in small batches | implement until tests pass |
|
|
78
80
|
| 6 Verify | own the residue (security · concurrency · architecture); approve when `conservative` | gather evidence; **auto-PASS on complete evidence** under `autonomy: auto` |
|
|
79
|
-
| 7 Observe | read the signal;
|
|
81
|
+
| 7 Observe | read the signal; consolidate confirmed deltas into PROJECT.md | run behind a flag; emit lessons learned |
|
|
82
|
+
|
|
83
|
+
**What the human sees when it is their turn — the decision arc.** Whenever the flow stops for the human — the baseline approval that ends setup, the contract-freeze decision point and an escalated verify gate within each task, and the wider decision points of the loop (intake · scope · milestone close · stage graduation) — the AI opens its report with the **decision arc**: three engine-sourced lines — `goal:` the milestone goal the work serves · `done:` the proven progress toward it · `plan:` what comes next. The arc renders first, above the report's summary, so the human confirms with sight of the whole trajectory rather than a local snapshot. It is presentation only — it never adds a gate or changes an outcome. See [Appendix C](./appendix-c-glossary.md).
|
|
80
84
|
|
|
81
85
|
## What survives, and what is disposable
|
|
82
86
|
|
package/docs/03-step-1-specify.md
CHANGED
|
@@ -4,7 +4,7 @@
|
|
|
4
4
|
|
|
5
5
|
> **Purpose:** state, in plain language, what the feature must do and what it must reject, with no ambiguity left for the AI to resolve by guessing.
|
|
6
6
|
> **Produces:** `SPEC.md` for the feature.
|
|
7
|
-
> **How it works — co-specification:** AI and human **brainstorm the shape together**; the AI drafts; the **human validates, with the AI's advice.** The decisive advice is a *
|
|
7
|
+
> **How it works — co-specification:** AI and human **brainstorm the shape together**; the AI drafts; the **human validates, with the AI's advice.** The decisive advice is a *lowest-confidence flag* — the AI names the one or two things most likely to be wrong, so the human's attention lands where it matters. The human owns the decision; the AI owns surfacing what it does not yet know.
|
|
8
8
|
|
|
9
9
|
---
|
|
10
10
|
|
|
@@ -19,10 +19,10 @@ There is also a diagnostic value: **if you cannot write the spec, you do not yet
|
|
|
19
19
|
A specification is not dictated by one side. It is made in three moves:
|
|
20
20
|
|
|
21
21
|
1. **Diverge — brainstorm by both.** Before drafting, the AI surfaces the *decision space*: the two or three genuine ways to frame the feature, and the open questions it would otherwise resolve by guessing. You react — add, kill, redirect. This is the brainstorm, and it lives in the conversation, not in a new document.
|
|
22
|
-
2. **Converge — the AI drafts, and ranks its own uncertainty.** The AI writes the spec below, then ranks
|
|
22
|
+
2. **Converge — the AI drafts, and ranks its own uncertainty.** The AI writes the spec below, then ranks where its confidence is lowest. It does not hand you a flat list of equal-looking assumptions to nod through; it tells you *where it is most likely wrong, and what that would cost.*
|
|
23
23
|
3. **Validate — you decide, with the AI's advice.** You read the ranked uncertainty first, then confirm, correct, or send it back. Your approval is real because your attention was aimed.
|
|
24
24
|
|
|
25
|
-
The brainstorm leaves a *light trace, not a document.* What you chose becomes a rule; what you weighed and dropped becomes a one-line **`Framings weighed:`** note; what stayed genuinely uncertain becomes a **
|
|
25
|
+
The brainstorm leaves a *light trace, not a document.* What you chose becomes a rule; what you weighed and dropped becomes a one-line **`Framings weighed:`** note; what stayed genuinely uncertain becomes a **lowest-confidence flag**. Nothing new to maintain — the residue lands in the spec you were writing anyway.
|
|
26
26
|
|
|
27
27
|
## What a good specification contains
|
|
28
28
|
|
|
@@ -31,7 +31,7 @@ Four parts, kept short:
|
|
|
31
31
|
1. **Must** — the behaviors the feature is required to perform.
|
|
32
32
|
2. **Reject** — the inputs or situations it must refuse, each paired with a named error.
|
|
33
33
|
3. **After** — the state that is true once it succeeds (what changed).
|
|
34
|
-
4. **Assumptions —
|
|
34
|
+
4. **Assumptions — lowest-confidence first** — the things you are taking for granted, **ranked so the most-likely-wrong come first.** The top one or two carry a `⚠` flag with *why it is uncertain* and *what it costs if wrong*; the rest are the low-stakes tail. A spec with genuinely nothing uncertain still names its single biggest risk, however small — the AI never claims a blank mind.
|
|
35
35
|
|
|
36
36
|
Naming the errors matters. "Reject bad amounts" is an instruction to guess; `amount <= 0 -> "amount_invalid"` is a rule that produces a testable scenario and a defined contract response.
|
|
37
37
|
|
|
@@ -47,8 +47,8 @@ Reject:
|
|
|
47
47
|
- <bad input / situation> -> "<error_code>"
|
|
48
48
|
After:
|
|
49
49
|
- <what is true once it succeeds>
|
|
50
|
-
Assumptions —
|
|
51
|
-
⚠ <most-likely-wrong assumption> —
|
|
50
|
+
Assumptions — lowest-confidence first:
|
|
51
|
+
⚠ <most-likely-wrong assumption> — lowest confidence because <why>; if wrong: <cost>
|
|
52
52
|
- [x] <confirmed / low-stakes assumption> — <one line>
|
|
53
53
|
```
|
|
54
54
|
|
|
@@ -69,8 +69,8 @@ Reject:
|
|
|
69
69
|
- source == destination -> "same_account"
|
|
70
70
|
- balance < amount -> "insufficient_funds"
|
|
71
71
|
- account not mine -> "forbidden"
|
|
72
|
-
Assumptions —
|
|
73
|
-
⚠ same currency only (no FX) in v1 —
|
|
72
|
+
Assumptions — lowest-confidence first:
|
|
73
|
+
⚠ same currency only (no FX) in v1 — lowest confidence because the ticket never said; if wrong: the whole amount/rounding model changes and this contract is wrong
|
|
74
74
|
- [x] no daily limit in v1 — confirmed: out of scope for v1
|
|
75
75
|
```
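To show why named error codes matter downstream, here is a minimal sketch of how the Reject rules visible in the example could map onto code. It is an illustration under assumptions, not part of the package: the function name `check_transfer`, its parameters, the integer amount, the order in which rules are checked, and reading "account not mine" as a check on the source account are all hypothetical. The `amount_invalid` rule comes from the prose earlier in the chapter.

```python
from typing import Optional


def check_transfer(source: str, destination: str, amount: int,
                   balance: int, my_accounts: set) -> Optional[str]:
    """Return the named error code of the first rule violated, or None if accepted."""
    if amount <= 0:
        return "amount_invalid"
    if source == destination:
        return "same_account"
    if source not in my_accounts:       # assumption: ownership is checked on the source
        return "forbidden"
    if balance < amount:
        return "insufficient_funds"
    return None
```

Because every rejection is a fixed code rather than a sentence, each branch corresponds to exactly one scenario and one contract response.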
|
|
76
76
|
|
|
@@ -78,16 +78,16 @@ The `Framings weighed:` line shows what was considered and dropped, so the chose
|
|
|
78
78
|
|
|
79
79
|
## The AI's role here
|
|
80
80
|
|
|
81
|
-
Use the AI to **open the space and then narrow it honestly.** First it brainstorms the genuine framings with you (diverge). Then it drafts the spec from whatever raw material you have — a ticket, an interview, a contract document — listing every assumption it had to make, **ranked
|
|
81
|
+
Use the AI to **open the space and then narrow it honestly.** First it brainstorms the genuine framings with you (diverge). Then it drafts the spec from whatever raw material you have — a ticket, an interview, a contract document — listing every assumption it had to make, **ranked lowest-confidence first**, and flagging the one or two it is least confident in with *why* and *what it costs if wrong*. Its instinct is to fill gaps silently and present a confident wall; the method forces those gaps into the open, and forces the confident wall to declare its own soft spots. See `playbook/1_specify.md` in [Appendix B](./appendix-b-prompts.md).
|
|
82
82
|
|
|
83
|
-
The defining instruction: *if a requirement is unclear, ask — do not resolve it by guessing — and of the things you must assume, say plainly
|
|
83
|
+
The defining instruction: *if a requirement is unclear, ask — do not resolve it by guessing — and of the things you must assume, say plainly where your confidence is lowest.*
|
|
84
84
|
|
|
85
85
|
## Common mistakes
|
|
86
86
|
|
|
87
87
|
- **Stating only the happy path.** The "Reject" list is where most real complexity lives; an empty one usually means it has not been thought through.
|
|
88
88
|
- **Free-text errors.** Errors must be named codes, not sentences, so they can become scenarios and contract responses.
|
|
89
89
|
- **Hidden assumptions.** If an assumption is not written down, it is not confirmed — it is a future bug with a delay timer.
|
|
90
|
-
- **A flat
|
|
90
|
+
- **A flat list of "confirmed" assumptions.** Eight equal-looking ticks invite a reflex approval. Rank them; flag the one or two that are load-bearing. An unranked list hides the risk inside the noise.
|
|
91
91
|
|
|
92
92
|
## Exit check
|
|
93
93
|
|
|
@@ -96,7 +96,7 @@ A spec is done when:
|
|
|
96
96
|
- [ ] Every required behavior is stated explicitly.
|
|
97
97
|
- [ ] Every rejection has a named error code.
|
|
98
98
|
- [ ] The success state-change is described.
|
|
99
|
-
- [ ] The assumptions are ordered
|
|
99
|
+
- [ ] The assumptions are ordered lowest-confidence first, and the one or two `⚠` flags carry *why* + *cost* — or, for genuinely trivial scope, an honest "none material" that still names the single biggest risk.
|
|
100
100
|
|
|
101
101
|
The shift from older practice: you no longer pre-confirm every assumption to advance. You confirm that the AI has *ranked* its uncertainty and that you have *engaged the top of the rank.* Stated honestly: the flag makes a genuine review cheap and a lazy one visibly negligent — it cannot force the read. That is the most a lightweight check can buy.
|
|
102
102
|
|
|
@@ -108,7 +108,7 @@ If you cannot state a rule clearly, the feature is not ready to build. Stop, tak
|
|
|
108
108
|
|
|
109
109
|
## The one approval, and where the flag really lands
|
|
110
110
|
|
|
111
|
-
In the one-approval
|
|
111
|
+
In the one-approval flow, you do not approve the spec alone — you approve the whole frozen bundle (spec, scenarios, contract, tests) once, at the contract freeze. So the lowest-confidence flag is **bundle-wide**: at that single decision point the AI leads with *"of everything I'm asking you to freeze, these one or two points are most likely wrong"* — and a flag may point at an uncovered scenario or the contract shape, not only a spec assumption. The ranking you do here in Specify is the first input into that one gate. See [05 Contract](./05-step-3-contract.md) and the `add` skill's `run.md`.
|
|
112
112
|
|
|
113
113
|
---
|
|
114
114
|
|
package/docs/04-step-2-scenarios.md
CHANGED
|
@@ -6,7 +6,7 @@
|
|
|
6
6
|
> **Produces:** `features/<name>.feature`.
|
|
7
7
|
> **Person's job:** decide what "correct" looks like in concrete situations. **AI's job:** draft the scenarios.
|
|
8
8
|
|
|
9
|
-
> **Part of the
|
|
9
|
+
> **Part of the specification bundle (v7).** In the default flow these scenarios are drafted by the AI alongside the spec, contract, and failing tests as **one bundle**, approved by a person **once** (the one approval), at the contract freeze — not signed off step by step. This chapter is how to get the scenarios *right*; [05 Contract](./05-step-3-contract.md) is where the bundle is frozen. See [11 Governance](./11-governance.md).
|
|
10
10
|
|
|
11
11
|
---
|
|
12
12
|
|
|
@@ -14,7 +14,7 @@
|
|
|
14
14
|
|
|
15
15
|
A plain rule is still open to interpretation. "Source must have enough balance" leaves open: enough for what, exactly? What happens to the balances when it is *not* enough? A scenario removes the interpretation by pinning a specific situation to a specific expected result.
|
|
16
16
|
|
|
17
|
-
Scenarios occupy a unique position: they are **readable by people and checkable by machines at the same time.** A product owner can confirm a scenario is what they meant; a test can be generated directly from it. This makes them the bridge between the human-led
|
|
17
|
+
Scenarios occupy a unique position: they are **readable by people and checkable by machines at the same time.** A product owner can confirm a scenario is what they meant; a test can be generated directly from it. This makes them the bridge between the human-led half of the flow and the machine-led back. They are the single most leverage-bearing artifact in the method, because everything downstream — the tests, and through them the build's definition of success — is generated from them.
|
|
18
18
|
|
|
19
19
|
## The form
|
|
20
20
|
|
package/docs/05-step-3-contract.md
CHANGED
|
@@ -6,13 +6,13 @@
|
|
|
6
6
|
> **Produces:** `contracts/<name>.md` (plus a mock and contract tests).
|
|
7
7
|
> **Person's job:** approve and freeze the shape. **AI's job:** generate the first draft, the mock, and the contract tests.
|
|
8
8
|
|
|
9
|
-
> **The one approval lands here (v7).** In the default flow the AI drafts
|
|
9
|
+
> **The one approval lands here (v7).** In the default flow the AI drafts spec, scenarios, this contract, and the failing tests as **one specification bundle**, and a person gives a **single approval at this freeze**. Freezing the contract is the one human gate of the bundle, not the third of three sign-offs; reject any part and the whole bundle returns to draft (backward correction, not failure). See [11 Governance](./11-governance.md).
|
|
10
10
|
|
|
11
11
|
---
|
|
12
12
|
|
|
13
|
-
## The
|
|
13
|
+
## The decision point of the whole method
|
|
14
14
|
|
|
15
|
-
This step is the
|
|
15
|
+
This step is the decision point between the human-led and machine-led halves of the flow, and it is what makes everything after it safe.
|
|
16
16
|
|
|
17
17
|
The reasoning is simple. The AI is allowed to write and rewrite code quickly. That is only safe if there is a stable surface that the rest of the system depends on and that the AI is not allowed to disturb. The frozen contract is that surface. Below it, the code is disposable and can be regenerated freely; above it, nothing breaks, because the shape it depends on does not move.
|
|
18
18
|
|
package/docs/06-step-4-tests.md
CHANGED
|
@@ -6,7 +6,7 @@
|
|
|
6
6
|
> **Produces:** a failing (red) automated test suite.
|
|
7
7
|
> **Person's job:** set the targets and coverage. **AI's job:** generate the tests.
|
|
8
8
|
|
|
9
|
-
> **Part of the
|
|
9
|
+
> **Part of the specification bundle (v7).** In the default flow these tests are drafted by the AI as part of the specification **bundle** (spec · scenarios · contract · tests) and approved by a person **once**, at the contract freeze — the tests are part of what that one approval covers. They still must be **red before the build**. See [11 Governance](./11-governance.md).
|
|
10
10
|
|
|
11
11
|
---
|
|
12
12
|
|
|
@@ -18,7 +18,7 @@ The reason is mechanical. If code is written first and tests after, the tests ar
|
|
|
18
18
|
|
|
19
19
|
## The must-fail principle
|
|
20
20
|
|
|
21
|
-
After generating the tests, you run them — and they must **fail**, because no implementation exists yet. This sounds trivial and is not. A test that passes before any code is written is testing nothing; it is a false reassurance that will later wave bad code through. Confirming the suite is "red for the right reason" (a missing implementation, not a broken test) is what makes it
|
|
21
|
+
After generating the tests, you run them — and they must **fail**, because no implementation exists yet. This sounds trivial and is not. A test that passes before any code is written is testing nothing; it is a false reassurance that will later wave bad code through. Confirming the suite is "red for the right reason" (a missing implementation, not a broken test) is what makes it genuinely protective.
|
|
22
22
|
|
|
23
23
|
## What to test
|
|
24
24
|
|
package/docs/07-step-5-build.md
CHANGED
|
@@ -63,7 +63,7 @@ The autonomy granted in this step should match the evidence and your review capa
|
|
|
63
63
|
|
|
64
64
|
## Common mistakes
|
|
65
65
|
|
|
66
|
-
- **Batches too large to review.** Shrinks verification to
|
|
66
|
+
- **Batches too large to review.** Shrinks verification to approving without reading.
|
|
67
67
|
- **Letting the AI add unknown dependencies.** The allow-list check in the pipeline should block this automatically; if it does not, the supply-chain risk is real (an AI may invent a plausible package name that an attacker has registered).
|
|
68
68
|
- **Accepting "all tests pass" without reading the change.** Passing tests are necessary, not sufficient — the next step exists for exactly this reason.
|
|
69
69
|
|
package/docs/08-step-6-verify.md
CHANGED
|
@@ -12,16 +12,16 @@
|
|
|
12
12
|
|
|
13
13
|
The build produced passing tests. That is necessary but not sufficient. Verification is where a person establishes trust — and the principle governing it is *trust through evidence, not inspection.*
|
|
14
14
|
|
|
15
|
-
This needs care, because it is easy to misread. "Not by inspection" does not mean "do not look at the code." It means the *basis* of trust is the passing evidence plus a deliberate check of the specific things tests cannot easily catch — not a general impression that the code reads plausibly. Plausibility is exactly the trap: AI code is frequently plausible and wrong. So verification has two parts: confirm the evidence, then check the known
|
|
15
|
+
This needs care, because it is easy to misread. "Not by inspection" does not mean "do not look at the code." It means the *basis* of trust is the passing evidence plus a deliberate check of the specific things tests cannot easily catch — not a general impression that the code reads plausibly. Plausibility is exactly the trap: AI code is frequently plausible and wrong. So verification has two parts: confirm the evidence, then check the known non-functional risks.
|
|
16
16
|
|
|
17
|
-
## Who resolves Verify — the
|
|
17
|
+
## Who resolves Verify — the automated quality gate
|
|
18
18
|
|
|
19
|
-
Verify can be resolved two ways, set per task by the `autonomy:` header (see [governance](./11-governance.md) and the autonomy
|
|
19
|
+
Verify can be resolved two ways, set per task by the `autonomy:` header (see [governance](./11-governance.md) and the autonomy level):
|
|
20
20
|
|
|
21
21
|
- **Auto (the default).** When `autonomy: auto`, the run resolves the gate on **evidence** rather than waiting for a person — but only when *all* of these hold: every test green, coverage not decreased, no test weakened and no contract edited, the convergence loops dry, and **no residue** (security, concurrency, or architecture). It records `PASS` as *auto-resolved*, naming the run as the accountable owner — an explicit pass, not a skip. This is principle 7: a gate may be resolved by evidence when that evidence is sufficient and the result is logged.
|
|
22
22
|
- **Human.** When `autonomy: conservative`, or whenever the run finds residue it cannot judge, the gate stops for a person; the two parts below are theirs.
|
|
23
23
|
|
|
24
|
-
**Security is always a `HARD-STOP` and is never auto-passed, at any autonomy level.** The two parts that follow — confirm the evidence, then check the
|
|
24
|
+
**Security is always a `HARD-STOP` and is never auto-passed, at any autonomy level.** The two parts that follow — confirm the evidence, then check the non-functional risks — are what *either* resolver works through; the only question is whether a person or the recorded run signs the outcome.
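The auto-resolution conditions listed above can be sketched as a single decision function. This is a hedged illustration only: the `Evidence` fields, the `resolve_verify` name, and the `ESCALATE` label for "the gate stops for a person" are hypothetical and are not taken from `add.py`. Only `PASS`, `HARD-STOP`, `auto`, and `conservative` appear in the text.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    tests_green: bool
    coverage_decreased: bool
    test_weakened: bool
    contract_edited: bool
    loops_dry: bool
    residue: set = field(default_factory=set)  # e.g. {"security", "concurrency"}


def resolve_verify(ev: Evidence, autonomy: str = "auto") -> str:
    """Resolve the Verify gate on evidence, or hand it to a person."""
    # Security is checked first so that no autonomy level can auto-pass it.
    if "security" in ev.residue:
        return "HARD-STOP"
    complete = (
        ev.tests_green
        and not ev.coverage_decreased
        and not ev.test_weakened
        and not ev.contract_edited
        and ev.loops_dry
        and not ev.residue
    )
    if autonomy == "auto" and complete:
        return "PASS"      # recorded as auto-resolved, with the run as owner
    return "ESCALATE"      # hypothetical label: a person resolves the gate
```

Checking security before anything else keeps the ordering honest: a complete set of green evidence never gets the chance to mask a security finding.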
|
|
25
25
|
|
|
26
26
|
## Part one — confirm the evidence
|
|
27
27
|
|
|
@@ -40,6 +40,14 @@ Automated tests are excellent at behavior on defined inputs and poor at a few sp
|
|
|
40
40
|
- **Security.** Are there exposed secrets, injection openings, or unexpected dependencies? AI-generated code is known to hardcode secrets and to pull in packages by plausible-but-wrong names.
|
|
41
41
|
- **Architecture conformance.** Does the change respect the layering and dependency rules in `CONVENTIONS.md`? Speed with no architectural check produces a fast-growing tangle that becomes unmaintainable within months.
|
|
42
42
|
|
|
43
|
+
## Part three — the deep check (do not skim)
|
|
44
|
+
|
|
45
|
+
Two failures slip straight past green tests. The first is code that is never *wired in* — a new function that nothing calls, an endpoint no route reaches: the tests for it pass in isolation while the feature is, in practice, absent. The second is the opposite — code left *dead* behind a path nothing exercises, quietly rotting. And for a change that produced prose rather than code, the equivalent failure is signing off on a claim you never actually read in full. Plausibility hides all three. So verification carries one explicit requirement beyond the non-functional review:
|
|
46
|
+
|
|
47
|
+
> Deep check — do not skim. If the task produced code, record that every new symbol is referenced (wiring) and that no new dead/unused code was introduced. If it produced prose or non-code, record a semantic read — what you read in full and what it confirmed. Which path applies is the resolver's judgement; the engine never classifies.
|
|
48
|
+
|
|
49
|
+
This is *evidence*, not impression: a reference search showing where each new symbol is called, a scan confirming nothing new is orphaned, or — for prose — a note of exactly what was read and what it confirmed. An unfilled deep check is a **shallow verify**, not a pass. The engine cannot judge wiring, dead code, or whether prose was truly read; the resolver records the evidence, and a person (under `conservative`) or the recorded run (under `auto`) signs it.
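For a Python codebase, the reference search described above could be gathered with a short script. The sketch below is an assumption-laden helper for the resolver, not something the engine runs: `unreferenced_symbols` is a hypothetical name, the match is by name only (so a function that only calls itself, or a name shared with an unrelated attribute, counts as referenced), and its output is a starting point for the recorded evidence rather than a verdict.

```python
import ast


def unreferenced_symbols(new_source: str, all_sources: list) -> list:
    """Names of functions/classes defined in new_source that no source references."""
    defined = {
        node.name
        for node in ast.walk(ast.parse(new_source))
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    }
    referenced = set()
    for source in all_sources:
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
                referenced.add(node.id)     # plain use: used()
            elif isinstance(node, ast.Attribute):
                referenced.add(node.attr)   # qualified use: module.used()
    return sorted(defined - referenced)
```

An empty result supports the wiring half of the deep check; any name it returns is a candidate for "never wired in" or dead code that the resolver still has to judge.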
|
|
50
|
+
|
|
43
51
|
## Recording the outcome
|
|
44
52
|
|
|
45
53
|
Every verification ends with exactly one recorded outcome, with an accountable owner — never a silent pass:
|
|
@@ -58,12 +66,13 @@ A security finding is always a `HARD-STOP`; it is never waved through with a wai
|
|
|
58
66
|
- [ ] Concurrency/timing of the risky operation is safe.
|
|
59
67
|
- [ ] No exposed secrets, injection openings, or unexpected dependencies.
|
|
60
68
|
- [ ] Layering and dependencies follow `CONVENTIONS.md`.
|
|
69
|
+
- [ ] Deep check (do not skim): for code, every new symbol is referenced (wiring) and no new dead/unused code was introduced; for prose/non-code, a semantic read is recorded.
|
|
61
70
|
- [ ] The change is approved — by a person, **or** (under `autonomy: auto`, no residue) auto-resolved by the run as the recorded accountable owner.
|
|
62
71
|
- [ ] An outcome is recorded (`PASS` / `RISK-ACCEPTED` / `HARD-STOP`).
|
|
63
72
|
|
|
64
73
|
## Common mistakes
|
|
65
74
|
|
|
66
|
-
- **Shipping on plausibility.** Reading the diff, finding it reasonable, and approving — without the evidence and the
|
|
75
|
+
- **Shipping on plausibility.** Reading the diff, finding it reasonable, and approving — without the evidence and the non-functional review — is the precise failure the method exists to prevent.
|
|
67
76
|
- **Treating a security gap as acceptable risk.** It is a `HARD-STOP`, not a waiver.
|
|
68
77
|
- **Skipping the concurrency check** because the tests are green. Tests rarely exercise simultaneity; this is a manual check by design.
|
|
69
78
|
|
package/docs/09-the-loop.md
CHANGED
|
@@ -15,7 +15,7 @@ That information is the input to the next cycle. What you learn in production be
|
|
|
15
15
|
|
|
16
16
|
## Release deliberately
|
|
17
17
|
|
|
18
|
-
Release behind a mechanism that limits the
|
|
18
|
+
Release behind a mechanism that limits the scope of impact of a mistake — a feature flag, a gradual rollout, or both. The verification step established that the feature is correct against everything you anticipated; a controlled release is your protection against what you did not anticipate. If something is wrong, you want to affect a few users and roll back, not affect everyone and scramble.
|
|
19
19
|
|
|
20
20
|
## Reuse the scenarios as monitors
|
|
21
21
|
|
|
@@ -33,9 +33,9 @@ Every defect, surprise, or new need is written up as a change to the specificati
|
|
|
33
33
|
|
|
34
34
|
This is also where the AI returns to a useful role: summarizing telemetry, clustering errors into themes, and drafting the proposed spec delta for a person to review. But the production decisions — what to roll back, what to prioritize — remain human.
|
|
35
35
|
|
|
36
|
-
##
|
|
36
|
+
## Lessons learned and the retrospective consolidation
|
|
37
37
|
|
|
38
|
-
A spec delta feeds the *next feature*. But a loop also teaches the **method itself** — that the domain model missed a boundary, that a whole class of scenario was never tested, that a build convention helped or hurt. AIDD captures those as **
|
|
38
|
+
A spec delta feeds the *next feature*. But a loop also teaches the **method itself**: that the domain model missed a boundary, that a whole class of scenario was never tested, that a build convention helped or hurt. AIDD captures those as **lessons learned**, each a single tagged learning, written in the Observe step, that marks which of the five competencies it sharpens.
|
|
39
39
|
|
|
40
40
|
| tag | competency | a delta here means you learned something about… |
|
|
41
41
|
|-----|------------|--------------------------------------------------|
|
|
@@ -45,11 +45,11 @@ A spec delta feeds the *next feature*. But a loop also teaches the **method itse
|
|
|
45
45
|
| `TDD` | Test | how we prove correctness — a missing scenario, a flaky or hollow test |
|
|
46
46
|
| `ADD` | AI/build | how the AI builds — a harness, prompt, or convention that helped or hurt |
|
|
47
47
|
|
|
48
|
-
Each delta is one tagged entry — `- [COMPETENCY · status] the learning (evidence: a pointer)` — and the evidence is **required**: a failing scenario, a production signal, a review note. No evidence means it is an opinion, not a delta. The AI **emits** deltas as `open`; it never
|
|
48
|
+
Each delta is one tagged entry — `- [COMPETENCY · status] the learning (evidence: a pointer)` — and the evidence is **required**: a failing scenario, a production signal, a review note. No evidence means it is an opinion, not a delta. The AI **emits** deltas as `open`; it never consolidates its own. Consolidation is judgment, and judgment is the human's — the same verify/observe decision point that keeps the AI from grading its own work.
|
|
49
49
|
|
|
50
|
-
**The
|
|
50
|
+
**The consolidation.** At milestone close (or on demand, when open deltas pile up), a person runs the retrospective consolidation: **gather** every `open` delta across the milestone's tasks, **group** them by competency, **propose** the exact foundation edit for each, **confirm** with the human one by one, then **write** — append-only — flipping each delta to `folded` (merged) or `rejected` (considered and deliberately not merged, left in place so the trail survives), and bumping the `foundation-version:` marker. `DDD`/`SDD`/`UDD` deltas consolidate into the matching section of `PROJECT.md`; `TDD`/`ADD` consolidate into `CONVENTIONS.md` (they sharpen the engine, not the product); and **every** consolidation also appends one row to `PROJECT.md` §Key Decisions — the universal, auditable record of what the foundation learned.
|
|
51
51
|
|
|
52
|
-
**Tooling.** `add.py deltas` lists every open delta across the project (so nothing waiting to be
|
|
52
|
+
**Tooling.** `add.py deltas` lists every open delta across the project (so nothing waiting to be consolidated is invisible); `add.py check` lints each delta's well-formedness — known competency tag, valid status, non-empty evidence. There is deliberately **no `add.py fold`**: the engine stays judgment-free, and the ritual lives with the human who owns it.
|
|
53
53
|
|
|
54
54
|
## Re-entrancy: the loop is the whole point
|
|
55
55
|
|
|
@@ -57,5 +57,11 @@ Two principles converge here. *The flow is re-entrant* — any step can send you
|
|
|
57
57
|
|
|
58
58
|
A team operating this way does not experience requirements changing as a failure of planning. It experiences it as the system working: reality is teaching the specification, and the specification is teaching the next build.
|
|
59
59
|
|
|
60
|
+
## The milestone holds until its goal is met
|
|
61
|
+
|
|
62
|
+
A single feature loops through Observe back to Specify; a **milestone** has the same shape at a larger scale, and a gate to match. A milestone is not finished when its tasks are done — it is finished when its **goal** is met, expressed as the exit criteria in `MILESTONE.md`. So `add.py milestone-done` is **goal-gated**: it refuses to close a milestone while any exit criterion is still unchecked, and **holds until** every box is checked. Those checkboxes are the human's affirmation that the goal is genuinely met — the engine reads the tally, it never judges the goal itself. (A milestone with no exit criteria closes as before; `milestone-done` is the only path to `done`, and archiving refuses anything not yet done — so the one gate cannot be slipped.)
|
|
63
|
+
|
|
64
|
+
While the milestone is held open, the work each task leaves behind — open lessons, and items discovered but out of scope — becomes its next tasks: the AI proposes them, the human confirms, and the loop continues until the goal is reached. The milestone is the loop made concrete; the exit criteria are its finish line.
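The goal gate can be pictured as a plain checkbox tally. The sketch below is an illustration under assumptions, not the package's code: it treats every Markdown task-list item in the file text as an exit criterion, and the function names are hypothetical.

```python
import re

# Matches "- [ ] text" and "- [x] text"; treating all such lines as exit
# criteria is an assumption made for this sketch.
CHECKBOX_RE = re.compile(r"^\s*- \[( |x|X)\]\s+(.*)$")


def unchecked_criteria(milestone_text):
    """Return the text of every criterion whose box is still unchecked."""
    pending = []
    for line in milestone_text.splitlines():
        match = CHECKBOX_RE.match(line)
        if match and match.group(1) == " ":
            pending.append(match.group(2).strip())
    return pending


def may_close(milestone_text):
    """Tally only: closing is allowed when no box is left unchecked."""
    return not unchecked_criteria(milestone_text)
```

A file with no checkboxes yields an empty tally and may close, which matches the "closes as before" case in the text. The function counts boxes and never evaluates whether the goal itself is met.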
|
|
65
|
+
|
|
60
66
|
> **Do:** release small, watch the scenarios, and feed every learning back into the spec.
|
|
61
67
|
> **Don't:** treat shipping as the end. The most valuable information about a feature arrives *after* it ships.
|
|
package/docs/10-setup-and-stages.md
CHANGED
|
@@ -6,34 +6,34 @@ This chapter covers two operational matters: what you set up once per project, a
|
|
|
6
6
|
|
|
7
7
|
---
|
|
8
8
|
|
|
9
|
-
## Setup: the AI drafts, you
|
|
9
|
+
## Setup: the AI drafts, you approve the baseline
|
|
10
10
|
|
|
11
|
-
Before the first feature, the project needs a foundation — but standing it up is no longer your chore. Point ADD at the repo and **the AI does the drafting**: it runs `init` itself, reads what is there, and fills the foundation the whole project depends on. Your single act is the **
|
|
11
|
+
Before the first feature, the project needs a foundation — but standing it up is no longer your chore. Point ADD at the repo and **the AI does the drafting**: it runs `init` itself, reads what is there, and fills the foundation the whole project depends on. Your single act is the **baseline approval** — the one human gate that freezes it.
|
|
12
12
|
|
|
13
|
-
**What the AI drafts.** From an existing codebase it works **silently** — the code answers the questions a setup interview would ask. On an empty repo it runs a short **four-lens interview** (domain · spec · users · decisions), then drafts. Either way it fills the
|
|
13
|
+
**What the AI drafts.** From an existing codebase it works **silently** — the code answers the questions a setup interview would ask. On an empty repo it runs a short **four-lens interview** (domain · spec · users · decisions), then drafts. Either way it fills the living documentation — the files that outlive all code — and drafts the first milestone's scope and the first task's candidate contract:
|
|
14
14
|
|
|
15
15
|
| Item | File | Purpose |
|
|
16
16
|
|------|------|---------|
|
|
17
17
|
| Foundation | `PROJECT.md` | domain · active spec · UI/UX · key decisions — the context every task reads first |
|
|
18
|
-
| Conventions | `CONVENTIONS.md` | naming, layout, language, formatter —
|
|
18
|
+
| Conventions | `CONVENTIONS.md` | naming, layout, language, formatter — living documentation |
|
|
19
19
|
| Model record | `MODEL_REGISTRY.md` | which AI model and version the project uses, for reproducibility and audit |
|
|
20
20
|
| Dependency allow-list | `dependencies.allowlist` | the packages the AI may use; the pipeline rejects others |
|
|
21
21
|
| Prompt playbook | `playbook/` | the six prompts from [Appendix B](./appendix-b-prompts.md) |
|
|
22
22
|
| Repository + pipeline | — | runs the gates on every change |
|
|
23
23
|
|
|
24
|
-
Every drafted decision is tagged **evidence-grounded** (read from the code) or **guessed** (thin or inferred) and listed
|
|
24
|
+
Every drafted decision is tagged **evidence-grounded** (read from the code) or **guessed** (thin or inferred) and listed lowest-confidence-first in a `SETUP-REVIEW.md`, so the one signature you give is informed rather than given without reading.
|
|
25
25
|
|
|
26
|
-
**The
|
|
26
|
+
**The baseline approval.** The AI presents `SETUP-REVIEW.md`; you check the `guessed` rows; you **lock** — once. That single act freezes the foundation, the first scope, and the first contract together. It is the setup-level analog of the [contract freeze](./05-step-3-contract.md), and it doubles as the first task's contract approval — so there is no separate sign-off. Before the lock the engine lets the AI draft but refuses to cross into build; after it, the build opens.
|
|
27
27
|
|
|
28
28
|
**Setup exit check**
|
|
29
29
|
|
|
30
|
-
- [ ] Foundation +
|
|
31
|
-
- [ ] `SETUP-REVIEW.md` lists every drafted decision
|
|
30
|
+
- [ ] Foundation + living docs drafted (brownfield: from the code, evidence-tagged; greenfield: from the interview, gaps flagged `guessed`).
|
|
31
|
+
- [ ] `SETUP-REVIEW.md` lists every drafted decision lowest-confidence-first.
|
|
32
32
|
- [ ] The model is pinned; the allow-list exists and the pipeline fails on any package outside it.
|
|
33
33
|
- [ ] The pipeline runs and is green on the empty skeleton.
|
|
34
34
|
- [ ] The human **locked down** — and only then did the first feature's build open.
|
|
35
35
|
|
|
36
|
-
Do not start a feature until the pipeline is green and the foundation is locked. The
|
|
36
|
+
Do not start a feature until the pipeline is green and the foundation is locked. The baseline approval turns the AI's draft into committed direction; the pipeline enforces every later exit check without anyone having to remember to.
|
|
37
37
|
|
|
38
38
|
---
|
|
39
39
|
|
|
@@ -80,7 +80,21 @@ The durable thing is never the code:
|
|
|
80
80
|
| POC → MVP | the spike code | the validated approach + the risky-interface contract |
|
|
81
81
|
| MVP → Production | nothing | everything; the code is real and is hardened |
|
|
82
82
|
|
|
83
|
-
The
|
|
83
|
+
The living documentation thickens as you move right: a prototype leaves you a validated design; a proof of concept adds a proven approach and a contract; the MVP adds real, kept code. By production, you are hardening, not rebuilding.
|
|
84
|
+
|
|
85
|
+
### Graduating between stages
|
|
86
|
+
|
|
87
|
+
Moving up a stage — most consequentially MVP → Production — is its own scope level, the fourth after setup, intake, and the milestone loop. It is *not* a label someone types: a project earns production through a human-confirmed roadmap of the hardening work, never through a bare flip. The `add` skill drives this in `graduate.md`; the shape is five steps.
|
|
88
|
+
|
|
89
|
+
**The cue.** When every milestone is `done` *and* the human's **stage-goal-criteria** in `PROJECT.md` are all `[x]`, `add.py status` prints `→ MVP covered → propose graduation`. Until both tallies are complete, nothing here applies; a project with no stage-goal-criteria block behaves exactly as before.
|
|
90
|
+
|
|
91
|
+
1. **Gather the analytics.** `add.py graduation-report` clusters the whole MVP loop's evidence into five labeled record-sets — open deltas by competency, open RISK-ACCEPTED waivers by expiry, RETRO records, verify residue, and observe-loop coverage gaps. It *gathers, never judges*: there is no readiness verdict, only the records you reason from.
|
|
92
|
+
2. **Interview.** Synthesize *what production means here* with the human, using those records as the agenda. This synthesis is the judgment the engine refuses to make.
|
|
93
|
+
3. **Draft the roadmap.** For each production outcome the interview surfaces, draft a production milestone with the existing command — `add.py new-milestone <slug> --stage production --goal "…"` — and write its exit criteria. The roadmap is **≥1** milestone; the hardening work itself is what those milestones contain.
|
|
94
|
+
4. **Human confirms.** The human accepts, edits, or declines each draft. Nothing is created on an unconfirmed draft.
|
|
95
|
+
5. **Flip — the final step.** Only now run `add.py stage production`.
|
|
96
|
+
|
|
97
|
+
**The floor the engine enforces.** `add.py stage production` is guarded: it refuses with `stage_no_roadmap` (non-zero exit, state byte-unchanged) when no milestone has `stage: production`. The check is a *tally* — does a production-roadmap record exist? — never a readiness judgment, mirroring the milestone goal-gate. `--force` overrides it for grandfathered or edge cases; use it deliberately, not as the normal path. The guard is on the `→production` transition only; flips to prototype/poc/mvp are unchanged. The engine never advances the stage on its own — it gathers, counts, and holds the floor while the human judges and confirms.
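As a rough illustration of the guard described above, the sketch below models project state as a plain dictionary. The state layout, the `set_stage` function, and the `StageRefused` exception are hypothetical; only the `stage_no_roadmap` refusal, the `--force` override, and the production-only scope come from the text.

```python
class StageRefused(Exception):
    """Raised when a stage flip is refused; the message carries the reason code."""


def set_stage(state, target, force=False):
    """Return a new state with the stage changed, or raise without touching `state`."""
    if target == "production" and not force:
        # A tally, not a judgment: does any milestone carry stage "production"?
        has_roadmap = any(
            milestone.get("stage") == "production"
            for milestone in state["milestones"]
        )
        if not has_roadmap:
            raise StageRefused("stage_no_roadmap")
    new_state = dict(state)  # copy, so a refusal leaves the input unchanged
    new_state["stage"] = target
    return new_state
```

Returning a copy rather than mutating in place is one simple way to keep the refusal path side-effect free; a real command-line tool would additionally map the refusal to a non-zero exit code.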
|
|
84
98
|
|
|
85
99
|
---
|
|
86
100
|
|
|
@@ -88,14 +102,14 @@ The survivor layer thickens as you move right: a prototype leaves you a validate
|
|
|
88
102
|
|
|
89
103
|
The default is one task at a time. But when a milestone holds several tasks whose dependencies are already `PASS` and a reviewer is ready, you may run them **concurrently** — one worker per ready task, each building behind its own frozen contract.
|
|
90
104
|
|
|
91
|
-
**Be honest about the gain.** With one human reviewer you cannot beat `review_time × N_tasks`; the human-led
|
|
105
|
+
**Be honest about the gain.** With one human reviewer you cannot beat `review_time × N_tasks`; the human-led decision points are serial. So the win is **not throughput** — it is that the reviewer is *never blocked waiting on a build*. While a person reviews task A's specification bundle, the builds for B, C, and D run behind *their* frozen contracts. You hide build latency under human latency; do not promise more.
|
|
92
106
|
|
|
93
107
|
**Two queues, no new state** — both read from `add.py status`:
|
|
94
108
|
|
|
95
109
|
- **READY-QUEUE** — tasks in the active milestone where the phase is not `done` and every dependency already reads `gate=PASS`. These are the only tasks a worker may pick up; a task finishing `PASS` unblocks its dependents on the next `status`.
|
|
96
|
-
- **REVIEW-QUEUE** — the irreducibly serial part: the **
|
|
110
|
+
- **REVIEW-QUEUE** — the irreducibly serial part: the **bundle approval** (contract freeze) and any **Verify escalation**. One human, one queue, presented one at a time — never a batch that invites approval without reading.
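The READY-QUEUE rule above amounts to a filter over task records. This is a sketch under assumptions: the dictionary layout (`phase`, `gate`, `deps`) and the `ready_queue` name are hypothetical stand-ins for whatever `add.py status` actually reads.

```python
def ready_queue(tasks):
    """Return slugs of tasks that are not done and whose dependencies all read PASS.

    `tasks` maps slug -> {"phase": str, "gate": str or None, "deps": [slug, ...]}.
    """
    ready = []
    for slug, task in tasks.items():
        if task["phase"] == "done":
            continue
        if all(tasks[dep]["gate"] == "PASS" for dep in task["deps"]):
            ready.append(slug)
    return sorted(ready)
```

Because the queue is recomputed from current state on every call, a task finishing `PASS` unblocks its dependents on the next read, with no extra bookkeeping.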
|
|
97
111
|
|
|
98
|
-
**The autonomy
|
|
112
|
+
**The autonomy level is the throttle.** At `conservative`, both gates queue on the human (pure pipelining — builds overlap, nothing auto-resolves). At `auto` (the default), only the bundle-approval decision point and residue escalations queue; Verify auto-PASSes on evidence, so real concurrency follows. The floor never drops below **one human approval per task, at the contract decision point**.
|
|
99
113
|
|
|
100
114
|
**Design for failure (required).** Lease each task to its worker with a timeout — if a worker dies, release the claim back to READY rather than trusting partial work. A worker that hits a stop-and-escalate blocks only its own task; siblings keep running. And if several workers fail in one wave, trip a circuit-breaker and fall back to sequential — repeated failure means the scope was wrong, not the parallelism.
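The lease and circuit-breaker ideas above could look roughly like this. Both helpers are hypothetical, and the failure threshold of two is an assumption chosen for illustration; the text only says "several".

```python
def expired_leases(leases, now, timeout_s):
    """Return slugs whose lease is older than the timeout.

    `leases` maps task slug -> lease start time in seconds. Expired tasks
    should be released back to READY rather than resumed from partial work.
    """
    return sorted(
        slug for slug, started in leases.items() if now - started > timeout_s
    )


def should_fall_back(failures_in_wave, threshold=2):
    """Trip the breaker once a wave has at least `threshold` worker failures."""
    return failures_in_wave >= threshold
```

A caller would pass its own clock reading as `now`, which keeps the helpers deterministic and easy to test.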
|
|
101
115
|
|
package/docs/11-governance.md
CHANGED
|
@@ -19,7 +19,7 @@ How much the AI is allowed to do is not one switch; it is a setting chosen per a
|
|
|
19
19
|
|
|
20
20
|
The governing rule, restated from the principles: **operate only at the level your review capacity can sustain.** If the AI produces more than the team can verify, drop a level.
|
|
21
21
|
|
|
22
|
-
The **per-scope default is auto-with-evidence behind a one-approval
|
|
22
|
+
The **per-scope default is auto-with-evidence behind a one-approval decision point**: the AI drafts the specification bundle, a human approves the frozen contract once, and the build auto-gates on evidence. You *lower* a scope toward draft-and-review or suggest wherever risk is high or evidence is thin — and a high-risk or method-defining scope is *always* lowered (it is never auto-run). The default sets where you start; review capacity and risk set where you stay.
|
|
23
23
|
|
|
24
24
|
## The gate-fail protocol and the three reports
|
|
25
25
|
|
|
@@ -46,7 +46,7 @@ When someone proposes skipping a step "to go faster," this table is the answer:
|
|
|
46
46
|
|
|
47
47
|
## The continuous concerns
|
|
48
48
|
|
|
49
|
-
Four concerns are not steps but threads that run through every step, starting at project setup. Pulling them
|
|
49
|
+
Four concerns are not steps but threads that run through every step, starting at project setup. Pulling them forward ("shifting left") is far cheaper than bolting them on at the end.
|
|
50
50
|
|
|
51
51
|
| Concern | Begins at | Enforced at the build gate by |
|
|
52
52
|
|---------|-----------|-------------------------------|
|
package/docs/12-roles.md
CHANGED
|
@@ -11,8 +11,8 @@ Everyone on an AIDD team becomes, in part, a *verifier*; most also become *autho
|
|
|
11
11
|
- **Mission:** ensure the right thing gets built. You guard the problem.
|
|
12
12
|
- **Leads:** Specify. **Contributes to:** Scenarios; the loop (deciding what the next cycle addresses).
|
|
13
13
|
- **Owns:** the problem definition, the glossary of domain terms, the prioritized backlog.
|
|
14
|
-
- **Done means:** the spec states real user value with no disputed terms and its assumptions ranked
|
|
15
|
-
- **Apply it:** run the Specify prompt against a real ticket or interview, then read the AI's
|
|
14
|
+
- **Done means:** the spec states real user value with no disputed terms and its assumptions ranked lowest-confidence first — the one or two most likely wrong flagged with *why* and *what they cost*; after release, you have decided what the next loop must address.
|
|
15
|
+
- **Apply it:** run the Specify prompt against a real ticket or interview, then read the AI's lowest-confidence flag *first* and decide the one or two load-bearing assumptions before skimming the low-stakes tail. If you cannot confirm a load-bearing rule, it is not ready to build.
|
|
16
16
|
|
|
17
17
|
## Architect / Engineering Lead
|
|
18
18
|
|
|
@@ -40,7 +40,7 @@ Everyone on an AIDD team becomes, in part, a *verifier*; most also become *autho
|
|
|
40
40
|
|
|
41
41
|
## QA / Test Engineer
|
|
42
42
|
|
|
43
|
-
- **Mission:** make "done" machine-checkable; you are the
|
|
43
|
+
- **Mission:** make "done" machine-checkable; you are the guardrail for AI-written code.
|
|
44
44
|
- **Leads:** Tests. **Contributes to:** Scenarios (turning rules into checkable form); the loop (production monitors).
|
|
45
45
|
- **Owns:** the test suite, the scenario files, the coverage target, the test report at each gate.
|
|
46
46
|
- **Done means:** every scenario has a test that was red before the build; the suite is honest (nothing passes by default); coverage never regresses.
|
package/docs/13-adoption.md
CHANGED
|
@@ -19,7 +19,7 @@ Adopt the method on one real product, not as an all-at-once mandate.
|
|
|
19
19
|
|
|
20
20
|
| Choose… | When… |
|
|
21
21
|
|---------|-------|
|
|
22
|
-
| **Express** | startup, spike, or internal tool; speed of learning dominates; small
|
|
22
|
+
| **Express** | startup, spike, or internal tool; speed of learning dominates; small scope of impact |
|
|
23
23
|
| **Standard** | a normal product with real users and ordinary risk |
|
|
24
24
|
| **Regulated** | finance, health, or anything audited; failure is expensive or legally consequential |
|
|
25
25
|
|
package/docs/14-foundation.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
# 14 · The foundation: project context across milestones
|
|
2
2
|
|
|
3
|
-
[← 13 Adoption](./13-adoption.md) · [Contents](./README.md) · Next: [
|
|
3
|
+
[← 13 Adoption](./13-adoption.md) · [Contents](./README.md) · Next: [15 Foundations & Lineage →](./15-foundations-and-lineage.md)
|
|
4
4
|
|
|
5
5
|
---
|
|
6
6
|
|
|
@@ -18,7 +18,7 @@ guesses. That is the same failure the method exists to prevent ([00](./00-introd
|
|
|
18
18
|
one level up.
|
|
19
19
|
|
|
20
20
|
The **foundation** is the layer that holds this context and *outlives every
|
|
21
|
-
milestone*. It is not new ceremony; it is the [
|
|
21
|
+
milestone*. It is not new ceremony; it is the [living documentation](./appendix-f-requirements-matrix.md)
|
|
22
22
|
the method already names, made explicit as three concerns.
|
|
23
23
|
|
|
24
24
|
## Three concerns, one foundation
|
|
@@ -53,14 +53,14 @@ fifth, where the AI executes on it:
|
|
|
53
53
|
|
|
54
54
|

|
|
55
55
|
|
|
56
|
-
> The diagram's foundation (DDD · SDD · UDD) and the method's own words —
|
|
57
|
-
>
|
|
56
|
+
> The diagram's foundation (DDD · SDD · UDD) and the method's own words — living
|
|
57
|
+
> documentation · the foundation document · ubiquitous language — name the same three ideas. This
|
|
58
58
|
> chapter is where the diagram and the text finally meet.
|
|
59
59
|
|
|
60
60
|
## One file, not three
|
|
61
61
|
|
|
62
62
|
A foundation that takes a week to write is a foundation no one keeps current. So
|
|
63
|
-
ADD realizes all three concerns as **one
|
|
63
|
+
ADD realizes all three concerns as **one living document — `PROJECT.md`** — with
|
|
64
64
|
one short section each, plus an append-only record of key decisions:
|
|
65
65
|
|
|
66
66
|
```
|
|
@@ -76,8 +76,8 @@ the detail belongs in a milestone or a contract, not the foundation. The foundat
|
|
|
76
76
|
is the *thin, durable* context the engine reads first — not a place to relocate the
|
|
77
77
|
work. And you do not hand-write it: at setup the AI **drafts** all four sections —
|
|
78
78
|
silently from an existing codebase, or from a short four-lens interview on a
|
|
79
|
-
greenfield repo — and a single human **
|
|
80
|
-
direction (the setup-
|
|
79
|
+
greenfield repo — and a single human **baseline approval** freezes that draft as committed
|
|
80
|
+
direction (the setup-level analog of a contract freeze).
|
|
81
81
|
|
|
82
82
|
## How it feeds the engine — and takes feedback back
|
|
83
83
|
|
|
@@ -101,24 +101,24 @@ life of the product, owned above any single milestone.
|
|
|
101
101
|
|
|
102
102
|
| Tier | Lives in | Lifespan | Holds |
|
|
103
103
|
|------|----------|----------|-------|
|
|
104
|
-
| **Project** (foundation) | `.add/PROJECT.md` +
|
|
104
|
+
| **Project** (foundation) | `.add/PROJECT.md` + living-doc files | whole product | domain, spec stance, users, decisions |
|
|
105
105
|
| **Milestone** | `.add/milestones/<slug>/MILESTONE.md` | one depth-bounded goal | scope, shared contracts, exit criteria |
|
|
106
106
|
| **Task** | `.add/tasks/<slug>/TASK.md` | one feature | the seven-step artifacts |
|
|
107
107
|
|
|
108
108
|
A milestone is a *version bump* to the foundation, not a fresh start: when it
|
|
109
|
-
closes,
|
|
109
|
+
closes, consolidate what it validated into `PROJECT.md` (a decision, a settled domain
|
|
110
110
|
term, a confirmed user journey) and open the next one against the same, now-richer,
|
|
111
|
-
ground. The
|
|
111
|
+
ground. The consolidation is not informal: each loop emits **lessons learned** (tagged
|
|
112
112
|
`DDD · SDD · UDD · TDD · ADD`) in its Observe step, and at milestone close a person
|
|
113
|
-
gathers the open ones and
|
|
114
|
-
bumped — into the foundation. See [09 · The loop](./09-the-loop.md#
|
|
113
|
+
gathers the open ones and consolidates them — append-only, with the `foundation-version:`
|
|
114
|
+
bumped — into the foundation. See [09 · The loop](./09-the-loop.md#lessons-learned-and-the-retrospective-consolidation)
|
|
115
115
|
for the grammar, the ritual, and the tooling (`add.py deltas`, `add.py check`).
|
|
116
116
|
|
|
117
117
|
## In the tooling
|
|
118
118
|
|
|
119
|
-
- `add.py init` scaffolds `PROJECT.md` as a
|
|
120
|
-
content and a single human **
|
|
121
|
-
|
|
119
|
+
- `add.py init` scaffolds `PROJECT.md` as a living-doc file; the AI then drafts its
|
|
120
|
+
content and a single human **baseline approval** (`add.py lock`) freezes it. Like every
|
|
121
|
+
living-doc file, `init` **never overwrites a hand-edited one**.
|
|
122
122
|
- `add.py status` shows a one-line pointer to the foundation, so a fresh session
|
|
123
123
|
re-orients on context before code.
|
|
124
124
|
- The guideline block written into `CLAUDE.md` / `AGENTS.md` tells any agent the
|