@pilotspace/add 1.0.0 → 1.2.0

Files changed (54)
  1. package/CHANGELOG.md +88 -0
  2. package/GETTING-STARTED.md +172 -84
  3. package/README.md +14 -8
  4. package/bin/cli.js +39 -38
  5. package/docs/01-principles.md +3 -3
  6. package/docs/02-the-flow.md +20 -13
  7. package/docs/03-step-1-specify.md +13 -13
  8. package/docs/04-step-2-scenarios.md +3 -1
  9. package/docs/05-step-3-contract.md +4 -2
  10. package/docs/06-step-4-tests.md +3 -1
  11. package/docs/07-step-5-build.md +1 -1
  12. package/docs/08-step-6-verify.md +22 -4
  13. package/docs/09-the-loop.md +25 -1
  14. package/docs/10-setup-and-stages.md +52 -9
  15. package/docs/11-governance.md +2 -2
  16. package/docs/12-roles.md +3 -3
  17. package/docs/13-adoption.md +3 -3
  18. package/docs/14-foundation.md +19 -11
  19. package/docs/15-foundations-and-lineage.md +106 -0
  20. package/docs/README.md +4 -0
  21. package/docs/appendix-a-templates.md +3 -3
  22. package/docs/appendix-b-prompts.md +40 -5
  23. package/docs/appendix-c-glossary.md +42 -12
  24. package/docs/appendix-d-worked-example.md +2 -2
  25. package/docs/appendix-e-checklists.md +2 -2
  26. package/docs/appendix-f-requirements-matrix.md +12 -11
  27. package/docs/appendix-g-references.md +106 -0
  28. package/package.json +5 -3
  29. package/skill/add/SKILL.md +50 -21
  30. package/skill/add/adopt.md +67 -0
  31. package/skill/add/deltas.md +20 -8
  32. package/skill/add/fold.md +19 -17
  33. package/skill/add/graduate.md +74 -0
  34. package/skill/add/intake.md +22 -7
  35. package/skill/add/loop.md +59 -0
  36. package/skill/add/phases/0-setup.md +92 -24
  37. package/skill/add/phases/1-specify.md +23 -13
  38. package/skill/add/phases/2-scenarios.md +14 -4
  39. package/skill/add/phases/3-contract.md +38 -9
  40. package/skill/add/phases/4-tests.md +29 -5
  41. package/skill/add/phases/5-build.md +14 -4
  42. package/skill/add/phases/6-verify.md +38 -4
  43. package/skill/add/phases/7-observe.md +13 -5
  44. package/skill/add/report-template.md +106 -0
  45. package/skill/add/run.md +53 -34
  46. package/skill/add/scope.md +24 -2
  47. package/skill/add/setup-review.md +65 -0
  48. package/skill/add/streams.md +256 -0
  49. package/tooling/add.py +1388 -62
  50. package/tooling/templates/CONVENTIONS.md.tmpl +1 -1
  51. package/tooling/templates/GLOSSARY.md.tmpl +23 -0
  52. package/tooling/templates/MILESTONE.md.tmpl +1 -0
  53. package/tooling/templates/PROJECT.md.tmpl +4 -3
  54. package/tooling/templates/TASK.md.tmpl +39 -11
package/skill/add/report-template.md ADDED
@@ -0,0 +1,106 @@
1
+ # Chat reports — the decision-point template (for the AI, not for add.py)
2
+
3
+ The engine renders artifacts (`report`, `report --decide`, `status`); this file
4
+ governs the CHAT MESSAGE you wrap around them. The digest is the artifact BEHIND
5
+ your presentation, never a replacement for it — and your prose is never a
6
+ replacement for the digest.
7
+
8
+ Use it every time you report at or near a decision point: an intake proposal, a
9
+ bundle approval, a verify gate, a task completion, a milestone close.
10
+
11
+ ## The decision arc — rendered first, above the five blocks
12
+
13
+ Every report at a human gate opens with the **ARC** — three labelled lines that
14
+ place the decision in the work's whole arc, so the human confirms with sight of
15
+ where this is going, not just the step in front of them. Render it first, then a
16
+ separator, then the unchanged five blocks below:
17
+
18
+ ```
19
+ ARC goal: <the milestone / project goal this decision serves>
20
+ done: <proven progress — tasks done · exit-criteria met · what this gate proves>
21
+ plan: <this gate → the next step → the goal>
22
+ ```
23
+
24
+ - **goal** — the milestone or project goal the decision serves, read from the
25
+ `m-goal` line in `add.py status`; never re-typed from memory.
26
+ - **done** — proven progress only: exit-criteria met/total and tasks done from
27
+ the rollup, plus what this gate proves. An honest fact, never a hope.
28
+ - **plan** — this gate → the next step → the goal, mirroring the rollup's
29
+ `DECIDE NEXT` line.
30
+
31
+ The arc is required at every human gate: **baseline-lock · contract-freeze ·
32
+ verify · intake · scope · milestone-close · graduation**. The three labels stay
33
+ constant; their content adapts to the gate. The arc is presentation only — it
34
+ adds no gate and changes no PASS / RISK-ACCEPTED / HARD-STOP / freeze outcome.
35
+
36
+ Its facts are engine-sourced, exactly like EVIDENCE below: goal = `m-goal` ·
37
+ done = exit-criteria met/total + tasks done · plan = `DECIDE NEXT`. If your arc
38
+ and `add.py` output disagree, the engine wins — fix the arc, not the engine.
39
+
40
+ ### Per-gate examples — one shape, gate-specific content
41
+
42
+ - **verify** — `goal:` ship the decision arc · `done:` report-arc tests 6/6
43
+ green, gate ready · `plan:` PASS this gate → wire the arc into every gate → goal.
44
+ - **contract-freeze** — `goal:` … · `done:` bundle drafted, lowest-confidence
45
+ flag surfaced · `plan:` freeze §3 → build → goal.
46
+ - **milestone-close** — `goal:` … · `done:` exit-criteria 3/3 met, all tasks
47
+ done · `plan:` close → archive → the next milestone.
48
+ - **intake** — `goal:` the sized request · `done:` classified new-major,
49
+ rationale stated · `plan:` create the milestone → first contract → goal.
50
+
51
+ ## The five blocks, in order
52
+
53
+ ```
54
+ SUMMARY one line: intent + target + where we are
55
+ DECISION what you need from the human (or "none — FYI")
56
+ ⚠ FLAGS lowest-confidence first, why + cost-if-wrong
57
+ EVIDENCE small table: tests · gates · parity · check — engine-sourced
58
+ NEXT the single next action + what it unlocks
59
+ ```
60
+
61
+ 1. **SUMMARY** — one line carrying intent + target + position, e.g.
62
+ "v13 task 2/3 — tests-declared-fallback is green, gate PASS." The reader
63
+ knows where they are before they read anything else.
64
+ 2. **DECISION** — the question the human must answer, stated plainly; exactly
65
+ one decision per report, or an explicit "none — FYI". If a decision exists,
66
+ ask it AFTER everything below has been shown (show-before-ask).
67
+ 3. **⚠ FLAGS** — lowest-confidence first, each with *why* confidence is lowest and the
68
+ *cost if wrong*. Where TASK.md markers exist (`⚠` / `- [~]` / `- [ ]`),
69
+ quote them verbatim and keep their document order — extraction ≠ judgment.
70
+ 4. **EVIDENCE** — engine-sourced facts pasted from `add.py` output, never
71
+ re-typed from memory. If your prose and the engine disagree, the engine
72
+ wins: fix the engine or the data, not the sentence.
73
+ 5. **NEXT** — one action and what it unlocks. Mirror the rollup's DECIDE NEXT
74
+ line when it is right; overrule it only with a stated reason (e.g. planned
75
+ tasks the state file cannot see yet).
76
+
77
+ **The ask itself** — when block 2's decision becomes a literal question component
78
+ (option picker, numbered menu), compose it as a summary: the detail stays in the
79
+ report above, the question carries intent + what "yes" means + the flag count.
80
+
81
+ ## Hard rules
82
+
83
+ <constraints>
84
+ - **Summary-first.** Never bury the decision under a task list or a diff.
85
+ - **Show before ask.** Render the artifact (digest · diff · report) before any
86
+ approval question; the human decides on what they can see.
87
+ - **Reconcile the count.** Before the ask, your ⚠ FLAGS must reconcile with
88
+ `add.py report --decide`'s open-item count. If your prose calls an item
89
+ resolved while the digest still counts it open, the engine wins — fix the data
90
+ (the TASK.md markers the digest reads), not the sentence. A report whose flag
91
+ count disagrees with the engine is the un-transparent gate the ARC exists to close.
92
+ - **Never pre-stamp a human decision point.** Freeze / gate / lock fields stay DRAFT or
93
+ blank until the answer returns: show → ask → stamp → advance. An artifact
94
+ must never claim an approval that has not happened.
95
+ - **One report per decision point.** After an approval, point at the frozen artifact —
96
+ do not re-render the whole bundle.
97
+ - **Honest scope.** "Done" means the request, not the last task: report
98
+ "task 2/3", never "done" while approved scope remains.
99
+ - **The question is a summary, never the artifact.** Every approval ask carries
100
+ two layers: a compact SUMMARY · DECISION · ⚠ FLAGS block sits in chat
101
+ immediately before the ask (positional), and the question text itself is a
102
+ summary of two lines at most — intent + what "yes" means + the flag count —
103
+ pointing at the report above (compositional). The full bundle, diff, or
104
+ artifact lives only in the chat report; a question that re-carries it buries
105
+ the decision.
106
+ </constraints>
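The report shape described in `report-template.md` (the ARC, a separator, then the five blocks, with the flag count reconciled against the engine) can be sketched as below. This is a minimal illustration, not part of the package: `Flag`, `render_report`, the `---` separator, and the `engine_open_count` parameter are hypothetical names, and how the open-item count is read from `add.py report --decide` is left out because the diff does not show that output format.

```python
from dataclasses import dataclass


@dataclass
class Flag:
    text: str           # the uncertain point, quoted from TASK.md where a marker exists
    why: str            # why confidence is lowest here
    cost_if_wrong: str  # what it costs if the guess is wrong


def render_report(goal, done, plan, summary, decision, flags, evidence,
                  next_action, engine_open_count):
    """Compose ARC + five blocks; refuse to render if the flag count disagrees."""
    if len(flags) != engine_open_count:
        # The engine wins: fix the TASK.md markers, not the sentence.
        raise ValueError(
            f"flag count {len(flags)} != engine open-item count {engine_open_count}"
        )
    lines = [
        f"ARC  goal: {goal}",
        f"     done: {done}",
        f"     plan: {plan}",
        "---",  # assumed separator; the template only says "a separator"
        f"SUMMARY   {summary}",
        f"DECISION  {decision}",
    ]
    # Flags are expected lowest-confidence first; this sketch keeps caller order.
    for flag in flags:
        lines.append(
            f"⚠ FLAGS   {flag.text} (why: {flag.why}; if wrong: {flag.cost_if_wrong})"
        )
    lines.append(f"EVIDENCE  {evidence}")
    lines.append(f"NEXT      {next_action}")
    return "\n".join(lines)
```

Raising on a count mismatch mirrors the "Reconcile the count" rule: the sketch refuses to produce a report whose flags disagree with the engine rather than letting the prose win.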
package/skill/add/run.md CHANGED
@@ -1,25 +1,24 @@
1
1
  # The dynamic run — executing a locked scope
2
2
 
3
3
  Once a task's CONTRACT is frozen (phase 3), the scope is *locked*: the external shape will not move.
4
- That lock is ADD's autonomy seam — below it code is disposable; above it nothing breaks. This rubric
5
- covers what runs on the far side of the seam: the **build->verify half, executed as a dynamic,
6
- self-improving run** instead of a manual, sequential build. The human-led FRONT (Specify · Scenarios
7
- · Contract) still owns *direction*, but v7 compresses it to a **single human approval at the seam**
8
- (see "The one-approval front" below) — the AI drafts the whole front, a human approves it once.
4
+ That lock is ADD's autonomy decision point — below it code is disposable; above it nothing breaks. This rubric
5
+ covers what runs on the far side of the decision point: the **build->verify half, executed as a dynamic,
6
+ self-improving run** instead of a manual, sequential build. The human-led **specification bundle** (Specify · Scenarios
7
+ · Contract) still owns *direction*, but v7 compresses it to a **single human approval at the decision point**
8
+ (see "The specification bundle" below) — the AI drafts the whole bundle, a human approves it once.
9
9
 
10
10
  > **Self-improving = within-run convergence + emit v5 deltas** — same definition as v5: tracked,
11
11
  > evidence-backed, never autonomous training. The run converges in-turn AND feeds the human-gated
12
- > fold loop (`deltas.md` · `fold.md`). The engine stays judgment-free: this is a rubric, not `add.py`.
12
+ > consolidation loop (`deltas.md` · `fold.md`). The engine stays judgment-free: this is a rubric, not `add.py`.
13
13
 
14
- ## The one-approval front (v7)
14
+ ## The specification bundle (v7)
15
15
 
16
- The human-led front used to be three separate approvals — Specify, then Scenarios, then the Contract
17
- freeze. v7 compresses it to **one**. From the user's input the AI **drafts the whole front as a single
18
- bundle** the Spec, the Scenarios, the Contract, and the failing Tests and presents it together. The
19
- human gives **one approval, at the frozen contract** (the seam). That single approval is the green light
16
+ The specification bundle used to be three separate approvals — Specify, then Scenarios, then the Contract
17
+ freeze. v7 compresses it to **one**. From the user's input the AI **drafts the whole specification bundle in one pass** — the Spec, the Scenarios, the Contract, and the failing Tests — and presents it together. The
18
+ human gives **one approval, at the frozen contract** (the decision point). That single approval is the green light
20
19
  for the self-driving run.
21
20
 
22
- Why one approval and not zero: the contract freeze is the autonomy seam, and the seam **stays human**.
21
+ Why one approval and not zero: the contract freeze is the autonomy decision point, and the decision point **stays human**.
23
22
  The AI *drafts* the contract but never *freezes its own* — a person approves the frozen shape before any
24
23
  auto-run touches code. This is exactly what keeps "never self-gate a human-led gate" true under an auto
25
24
  default: the one gate that remains is human. Drop it to zero and the AI would freeze the interface it
@@ -28,10 +27,11 @@ then builds against and self-gate the result — the circular trust v6's dogfood
28
27
  What the human is actually approving in that one gate: that the drafted Spec captures the real intent,
29
28
  that the Scenarios cover the cases that matter, and that the Contract shape is the one to freeze. Reject
30
29
  any part and the bundle goes back to draft — that is backward-correction (principle 4), not failure.
31
- Approve, and the run begins.
30
+ Approve, and the run begins. The decision-point guide (`phases/3-contract.md`) carries the
31
+ **freeze review checklist** — six lines that walk the human through exactly this, ⚠-first.
32
32
 
33
- **The least-sure flag — aiming the one approval.** A single approval over a whole bundle invites a
34
- rubber stamp. So the AI presents the bundle **least-sure first**: of everything it is asking the human
33
+ **The lowest-confidence flag — aiming the one approval.** A single approval over a whole bundle is easy to
34
+ grant without reading. So the AI presents the bundle **lowest-confidence first**: of everything it is asking the human
35
35
  to freeze, it names the **1–2 points most likely to be wrong**, tagged by part
36
36
  (`⚠ [spec|scenario|contract|test] … — because …; if wrong: …`), each with *why* it is uncertain and
37
37
  *what it costs if wrong*. The §1 assumptions feed it, but a flag may equally point at an uncovered
@@ -39,7 +39,7 @@ scenario or the contract shape. If nothing is materially uncertain, the AI still
39
39
  biggest risk, however small — never a blank "none". Honest about its limit: the flag records that the
40
40
  human approved with the soft spots **in front of them**, eyes open; it makes a real review cheap and a
41
41
  lazy one visibly negligent, but it cannot *force* engagement — and the AI never asserts that the human
42
- engaged when it cannot know (a self-asserted gate would just be the rubber stamp one level up). Closing
42
+ engaged when it cannot know (a self-asserted gate would just move the unread approval one level up). Closing
43
43
  that enforcement gap is the job of a CI checker, not of prose.
44
44
 
45
45
  ## When the run begins — the scope-lock trigger
@@ -49,17 +49,18 @@ The trigger is the **frozen contract**, nothing else. A run may start only when:
49
49
  - §3 CONTRACT is marked `FROZEN @ vN` (the shape is fixed), AND
50
50
  - §4 TESTS exist and are RED for the right reason (the target the run drives to green).
51
51
 
52
- No frozen contract -> no run: you are still on the human-led front, and starting early is the
52
+ No frozen contract -> no run: you are still inside the specification bundle, and starting early is the
53
53
  forward-skip the flow forbids. The lock is what makes autonomous execution *safe* — the AI cannot
54
54
  drift the interface, because the interface is frozen above it.
55
55
 
56
- ## The touch-boundary — what the run may and may not touch
56
+ ## The change scope — what the run may and may not touch
57
57
 
58
+ <constraints>
58
59
  A locked run has a hard boundary. It MAY:
59
60
 
60
- - write and rewrite **code** (`src/`) — code is disposable below the seam;
61
+ - write and rewrite **code** (`src/`) — code is disposable below the decision point;
61
62
  - drive the **tests** to green WITHOUT weakening them (a weakened test is a method violation);
62
- - gather **evidence** for the verify gate (test output, blind-spot checks).
63
+ - gather **evidence** for the verify gate (test output, non-functional review).
63
64
 
64
65
  It MUST NOT:
65
66
 
@@ -67,10 +68,11 @@ It MUST NOT:
67
68
  the run STOPS and hands back to a human to reopen Specify (principle 4). The run never re-locks
68
69
  scope on its own.
69
70
  - weaken, delete, or skip a **test** to make the build pass (that inverts the method).
70
- - touch the **human-led front artifacts** (§1–§3) except to halt and escalate.
71
+ - touch the **specification-bundle artifacts** (§1–§3) except to halt and escalate.
72
+ </constraints>
71
73
 
72
74
  Crossing the boundary is not a fast run; it is an unverified one. When the run hits something only the
73
- front can resolve, it stops — and that stop is the loop working, not failing.
75
+ specification bundle can resolve, it stops — and that stop is the loop working, not failing.
74
76
 
75
77
  ## The dynamic run — fan-out and in-run convergence
76
78
 
@@ -82,21 +84,28 @@ on a trustworthy result with three loops:
82
84
  Stopping at the first green is how defects survive; the run stops only when the well runs dry.
83
85
  - **adversarial verify** — for every "done" claim, an independent skeptic tries to REFUTE it. The
84
86
  claim survives only if it withstands refutation, not because one pass looked plausible.
85
- - **completeness-critic** — a final pass that asks "what did we NOT cover — a scenario, a blind-spot,
87
+ - **completeness-critic** — a final pass that asks "what did we NOT cover — a scenario, a non-functional risk,
86
88
  an unstated assumption?" Whatever it finds re-enters the run.
87
89
 
88
90
  The run ends only when the loops go dry AND the auto-gate's evidence is satisfied. This is the run
89
91
  **self-improving within the turn** — the same convergence the foundation loop runs across milestones,
90
92
  compressed into one task.
91
93
 
92
- ## The evidence auto-gate
94
+ ## The automated quality gate
93
95
 
96
+ <constraints>
94
97
  The verify gate may be resolved by **evidence** rather than by a person — when the evidence is
95
98
  sufficient and the result is recorded (principle 7, reframed: an automated, recorded pass is an
96
99
  explicit pass, not a skip).
97
100
 
98
101
  - **Auto-PASS requires ALL of:** every test green; coverage not decreased; no test weakened and no
99
- contract edited; the convergence loops dry; the completeness-critic found nothing open.
102
+ contract edited; the convergence loops dry; the completeness-critic found nothing open; and the
103
+ deep check below recorded.
104
+ - **The deep check (every gate, no skim).** Deep check — do not skim. If the task produced code, record
105
+ that every new symbol is referenced (wiring) and that no new dead/unused code was introduced. If it
106
+ produced prose or non-code, record a semantic read — what you read in full and what it confirmed.
107
+ Which path applies is the resolver's judgement; the engine never classifies. An unfilled deep check is
108
+ a **shallow verify**, not an auto-PASS — evidence the work is wired, not merely plausible.
100
109
  - **Always escalates to a human (never auto-passed):** any **security** finding (HARD-STOP, always);
101
110
  a **concurrency**/timing risk the tests cannot exercise; an **architecture**/layering violation; and
102
111
  any failing test. These are the residue principle 2 names — automation cannot judge them.
@@ -106,22 +115,24 @@ explicit pass, not a skip).
106
115
 
107
116
  The auto-gate NEVER writes a human signature it did not get. An auto-PASS is logged as *auto-resolved*,
108
117
  honestly — the line between a pass and a skip is the recorded outcome, not a forged name.
118
+ </constraints>
109
119
 
110
120
  ## Emitting deltas — feeding the foundation back
111
121
 
112
122
  The completeness-critic does not discard what it finds. Every gap, surprise, or convention that helped
113
- or hurt becomes an **`open` competency delta** in the task's OBSERVE block, in the `deltas.md` grammar,
123
+ or hurt becomes an **`open` lesson learned** in the task's OBSERVE block, in the `deltas.md` grammar,
114
124
  tagged by competency:
115
125
 
116
126
  - a finding the run FIXED but that taught the foundation something (a missing scenario -> `TDD`);
117
127
  - a finding the run could NOT fix — a residue escalation -> a delta AND the escalation to a human.
118
128
 
119
- These `open` deltas feed v5's human-gated fold (`fold.md`) at milestone close: the run emits `open`;
120
- the human folds. That is the loop closing — **v6 run -> v5 foundation** — so a dynamic run sharpens the
129
+ These `open` deltas feed v5's human-gated consolidation (`fold.md`) at milestone close: the run emits `open`;
130
+ the human consolidates. That is the loop closing — **v6 run -> v5 foundation** — so a dynamic run sharpens the
121
131
  five competencies instead of letting its findings evaporate at end-of-run.
122
132
 
123
- ## The autonomy dial
133
+ ## The autonomy level
124
134
 
135
+ <constraints>
125
136
  How much a run may auto-gate is a **per-scope setting**, not a global switch (principle 5: trust is
126
137
  earned per scope). A task declares its level in its `TASK.md` header:
127
138
 
@@ -137,16 +148,24 @@ autonomy: auto | conservative
137
148
 
138
149
  > **v7 reversal (recorded, not hidden).** Earlier the default was `conservative` and `auto` was the
139
150
  > earned exception; v7 flips this — `auto` is the default, `conservative` is the deliberate lowering.
140
- > What did **not** change is principle 5: the dial is still **per-scope**, the level still lives in the
151
+ > What did **not** change is principle 5: the autonomy level is still **per-scope**, and it still lives in the
141
152
  > `TASK.md` header, and you still lower it anywhere risk demands. Only the starting point moved.
142
153
 
143
- **The high-risk guard — `auto` is refused where it matters most.** The dial is not a blank cheque. On a
154
+ **The high-risk guard — `auto` is refused where it matters most.** The autonomy level is not a blank cheque. On a
144
155
  **high-risk or method-defining scope** — anything where a wrong-but-plausible result is expensive or
145
156
  hard to reverse (auth, money, data-loss paths, the method/trust-layer itself) — `auto` must be lowered
146
157
  to `conservative`; leaving it at `auto` there is the reject code **`unguarded_high_risk_auto`**. This
147
- closes the v6 dogfood blind-spot, where the whole milestone ran at `auto` on the riskiest possible
158
+ closes the v6 dogfood gap, where the whole milestone ran at `auto` on the riskiest possible
148
159
  scope (defining the method) with no friction. The default is `auto` *for ordinary, well-tested scope*;
149
160
  high risk still earns a human gate.
150
161
 
151
- The dial is a **rubric convention** read by the human and the run; it is **not an `add.py` flag** (the
152
- engine stays judgment-free); the level lives in the `TASK.md` header where the run already reads.
162
+ Judging *what* is high-risk stays human: the scope declares **`risk: high`** in the same `TASK.md`
163
+ header where the autonomy level lives, reviewed at the freeze like every header line (the engine never
164
+ classifies scope). **Since v14 the guard is mechanical for the declared case:**
165
+ the engine refuses the declared combination — `add.py gate` will not complete (`PASS`/`RISK-ACCEPTED`) a task whose header
166
+ carries `risk: high` without `autonomy: conservative` (error `unguarded_high_risk_auto`; `HARD-STOP`
167
+ always records — stopping is never blocked), and `add.py audit` flags the same code on a finished
168
+ record whose header was tampered or whose GATE RECORD reviewer is the auto-gate — which CI enforces
169
+ (audit-ci). The honest limit mirrors the audit's: an **undeclared** high-risk scope passes; declaring
170
+ is the human decision point, the engine enforces what was declared.
171
+ </constraints>
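The declared-case guard in `run.md` (a `risk: high` header without `autonomy: conservative` refuses `PASS` and `RISK-ACCEPTED`, while `HARD-STOP` always records) can be sketched as follows. This is an illustration under stated assumptions, not the code in `add.py`: `parse_header` and `check_gate` are hypothetical names, and the header is assumed to be plain `key: value` lines.

```python
import re

# Outcomes that complete a task; HARD-STOP is deliberately not among them.
COMPLETING = {"PASS", "RISK-ACCEPTED"}


def parse_header(task_md: str) -> dict:
    """Collect `risk:` and `autonomy:` lines from TASK.md-like text (assumed layout)."""
    fields = {}
    for line in task_md.splitlines():
        match = re.match(r"^(risk|autonomy):\s*(\S+)", line.strip())
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


def check_gate(task_md: str, outcome: str):
    """Return the reject code for a refused gate, or None if the outcome may record."""
    if outcome == "HARD-STOP":
        return None  # stopping is never blocked
    header = parse_header(task_md)
    high_risk = header.get("risk") == "high"
    guarded = header.get("autonomy") == "conservative"
    if outcome in COMPLETING and high_risk and not guarded:
        return "unguarded_high_risk_auto"
    return None
```

Note the limit the text itself names: a scope that never declares `risk: high` passes this check, because declaring is the human's call and the check only enforces what was declared.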
package/skill/add/scope.md CHANGED
@@ -20,6 +20,26 @@ scope drafting honors intake's classification — it never re-sizes a request:
20
20
  means one drafting pass, NOT auto-creation. Nothing is written to disk — single draft or the
21
21
  whole batch — until the human confirms. You propose; you wait.
22
22
 
23
+ ## Brainstorm before you draft — co-specify at milestone level
24
+
25
+ Don't draft a MILESTONE.md from thin input. Run the same three-move co-specify as a
26
+ task's §1 (`phases/1-specify.md`) — Diverge (framings + open questions) → Converge
27
+ (draft + rank) → Validate (show flags first) — raised to milestone scope. Ask only
28
+ what moves the goal, the In/Out line, or the task list; skip what PROJECT.md settles.
29
+ Draft the WHOLE milestone before showing; nothing hits disk until the human confirms.
30
+
31
+ Diverge seeds (pick the live ones):
32
+ - **Outcome** — done means a user can do *what* they can't today? (goal sentence)
33
+ - **Edge of scope** — nearest thing assumed IN that you want OUT? (Out list)
34
+ - **Riskiest decision point** — which contract, if wrong, costs the most rework? (freeze-first)
35
+ - **Done-looks-like** — how do we SEE each outcome without reading code? (exit criteria)
36
+ - **First slice** — which task unblocks the rest? (breadth-first order)
37
+
38
+ Rank assumptions lowest-confidence first; the top 1–2 get the flag the human reads at confirm:
39
+ `⚠ <assumption> — lowest confidence because <why>; if wrong: <cost>`. Present the draft via
40
+ `report-template.md` — open with the ARC (goal · done · plan): the goal this milestone serves,
41
+ what is already covered, and the plan its task list lays out.
42
+
23
43
  ## Drafting a good MILESTONE.md (section by section)
24
44
 
25
45
  - **goal** — ONE sentence, an outcome not an output ("a user can size any request", not "write
@@ -27,8 +47,8 @@ whole batch — until the human confirms. You propose; you wait.
27
47
  - **Scope In/Out** — the explicit anti-creep deferral list. Naming what is OUT is as important
28
48
  as what is IN; an empty Out list usually means the scope is not yet thought through.
29
49
  - **Shared decisions & glossary deltas** — cross-cutting rules every task must honor, named from
30
- the glossary. New terms get a glossary entry (the survivor layer stays honest).
31
- - **Shared / risky contracts to freeze first** — the seams between tasks; name the owning task.
50
+ the glossary. New terms get a glossary entry (the living documentation stays honest).
51
+ - **Shared / risky contracts to freeze first** — the decision points between tasks; name the owning task.
32
52
  - **Tasks (breadth-first)** — `slug · depends-on · one line` each. Decompose by deliverable, not
33
53
  by phase; keep each task one-file-sized. Order by dependency, not by guesswork.
34
54
  - **Exit criteria** — observable, and **every exit criterion maps to a declared task slug**
@@ -36,6 +56,7 @@ whole batch — until the human confirms. You propose; you wait.
36
56
 
37
57
  ## Reject codes (emit `{ reject, rationale }`, create nothing)
38
58
 
59
+ <reject_codes>
39
60
  - `not_classified` — the request has not been through intake yet. Classify it first; you cannot
40
61
  draft scope for an unclassified request.
41
62
  - `dangling_criterion` — a drafted MILESTONE.md has an exit criterion that maps to no declared
@@ -43,6 +64,7 @@ whole batch — until the human confirms. You propose; you wait.
43
64
  a malformed milestone. With no engine lint, you are the first check and the human is the backstop.
44
65
  - `no_milestone` — intake routed the request to `task` or `change-request`; scope drafting
45
66
  creates NO milestone. Honor the classification; do not invent milestone-sized scope.
67
+ </reject_codes>
46
68
 
47
69
  ## Worked example (from this repo's own history)
48
70
 
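The `dangling_criterion` reject code in `scope.md` (every exit criterion must map to a declared task slug, otherwise emit `{ reject, rationale }` and create nothing) can be sketched as below. The data shapes are assumptions for illustration: `check_exit_criteria` is a hypothetical helper, and the mapping from criterion text to task slug stands in for however a drafted MILESTONE.md actually records that link.

```python
def check_exit_criteria(task_slugs, exit_criteria):
    """Return a reject record if any exit criterion maps to no declared task.

    task_slugs: iterable of declared task slugs.
    exit_criteria: dict of criterion text -> the task slug it claims (assumed shape).
    """
    declared = set(task_slugs)
    dangling = [text for text, slug in exit_criteria.items() if slug not in declared]
    if dangling:
        return {
            "reject": "dangling_criterion",
            "rationale": "exit criteria map to no declared task: " + "; ".join(dangling),
        }
    return None  # nothing to reject; the draft may go to the human for confirmation
```

Since the text says there is no engine lint for this, a check like this would run on the drafting side, with the human as the backstop.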
package/skill/add/setup-review.md ADDED
@@ -0,0 +1,65 @@
1
+ # Setup review — the one page the human signs
2
+
3
+ Autonomous setup ends at a single human gate: the **baseline approval** (`add.py lock`). Before that
4
+ signature is honest, the human needs to see *what you drafted and how sure you were* — not re-derive
5
+ it. `SETUP-REVIEW.md` is that page: every decision you made while drafting the foundation, first-scope,
6
+ and the first contract, **ordered lowest-confidence-first** so the riskiest guesses meet their eye first.
7
+
8
+ This is the setup-level analog of presenting a task's specification bundle lowest-confidence-first at the contract freeze.
9
+ The engine never reads this file — `add.py lock` is judgment-free, the signature *is* the gate (see
10
+ `setup-lock-state`). The human **reading** this page is the review; your job is to make the reading honest.
11
+
12
+ ## Where it lives
13
+
14
+ Write **one** artifact at `.add/SETUP-REVIEW.md`. **Never clobber a human-edited one** — if it already
15
+ exists with hand edits, append/update, don't overwrite (the same non-clobber rule `init` applies to
16
+ living docs). It is a per-onboarding, setup-level artifact; it sits beside `PROJECT.md`, not under a task.
17
+
18
+ ## The template
19
+
20
+ ```markdown
21
+ # SETUP REVIEW — <project>
22
+
23
+ <stage> · <brownfield | greenfield> · drafted by <model> @ <date>
24
+
25
+ | # | Decision | Lands in | Tag | Why / Evidence |
26
+ |---|----------|----------|-----|----------------|
27
+ | 1 | <the drafted decision> | PROJECT.md \| scope \| first-contract | `guessed` | <the inference + why you had to guess> |
28
+ | 2 | <…> | <…> | `evidence-grounded` | <cite the source file/line you read it from> |
29
+
30
+ Sign: confirm in chat → the agent runs `add.py lock --by "<name>"` (typing it yourself works too)
31
+ ```
32
+
33
+ Rows are numbered for reference at the gate ("row 1 is where my confidence is lowest").
34
+
35
+ ## The two rules that make it honest
36
+
37
+ <constraints>
38
+ 1. **Lowest-confidence-first.** Order rows by confidence **ascending**. A `guessed` row always floats above an
39
+ `evidence-grounded` one. The point is not completeness theatre — it is to spend the human's attention
40
+ where it changes outcomes: the top of the table is the part they actually need to challenge.
41
+
42
+ 2. **Every row is tagged — `guessed` or `evidence-grounded`.**
43
+ - `evidence-grounded` — you read it from the code/repo. **Cite the file** (e.g. `pyproject.toml`,
44
+ `src/orders/models.py`). Brownfield onboarding (see `adopt.md`) is mostly these.
45
+ - `guessed` — the repo was silent, so you inferred it. **State the inference and why.** Thin-greenfield
46
+ onboarding (a near-empty repo, only the 4-lens answers) produces these. These are what the human
47
+ must check; that is why they sit on top.
48
+
49
+ The tag vocabulary is shared with `adopt.md` — the brownfield map tags each filled living-doc decision
50
+ `guessed`/`evidence-grounded`, and those tags flow straight into this table.
51
+ </constraints>
52
+
53
+ ## Where it ends
54
+
55
+ `SETUP-REVIEW.md` is **read-only context** for the baseline approval. You do not ask the human to approve it
56
+ field-by-field; you present it, lowest-confidence-first; they confirm in conversation, and you run the lock
57
+ with their name:
58
+
59
+ ```bash
60
+ python3 .add/tooling/add.py lock --by "<name>"
61
+ ```
62
+
63
+ `lock` records the lock layers and opens the build — it does **not** parse or validate this file (the
64
+ engine stays judgment-free). The review lives in the human's reading of the page, not in the tool. Make
65
+ the top of the table the truth they most need, and the one signature is informed.
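The lowest-confidence-first rule for `SETUP-REVIEW.md` can be sketched as a stable sort on the tag followed by rendering the table from the template. This is an illustrative sketch only: `order_rows`, `render_table`, and the row dictionary keys are hypothetical, and it orders by tag alone, leaving any finer ranking within a tag to the author's judgement.

```python
# `guessed` rows must float above `evidence-grounded` ones.
TAG_ORDER = {"guessed": 0, "evidence-grounded": 1}


def order_rows(rows):
    """Stable sort: guessed first, caller order preserved within each tag."""
    for row in rows:
        if row["tag"] not in TAG_ORDER:
            raise ValueError(f"untagged or unknown tag: {row['tag']!r}")
    return sorted(rows, key=lambda row: TAG_ORDER[row["tag"]])


def render_table(rows):
    """Render the decision table with rows renumbered after ordering."""
    lines = [
        "| # | Decision | Lands in | Tag | Why / Evidence |",
        "|---|----------|----------|-----|----------------|",
    ]
    for number, row in enumerate(order_rows(rows), start=1):
        lines.append(
            f"| {number} | {row['decision']} | {row['lands_in']} "
            f"| `{row['tag']}` | {row['why']} |"
        )
    return "\n".join(lines)
```

Rejecting unknown tags enforces the second rule (every row is tagged) before the first rule (ordering) is applied.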