@pilotspace/add 1.1.0 → 1.2.0

Files changed (54)
  1. package/CHANGELOG.md +40 -0
  2. package/GETTING-STARTED.md +165 -139
  3. package/README.md +13 -7
  4. package/bin/cli.js +13 -4
  5. package/docs/01-principles.md +3 -3
  6. package/docs/02-the-flow.md +15 -11
  7. package/docs/03-step-1-specify.md +13 -13
  8. package/docs/04-step-2-scenarios.md +2 -2
  9. package/docs/05-step-3-contract.md +3 -3
  10. package/docs/06-step-4-tests.md +2 -2
  11. package/docs/07-step-5-build.md +1 -1
  12. package/docs/08-step-6-verify.md +14 -5
  13. package/docs/09-the-loop.md +12 -6
  14. package/docs/10-setup-and-stages.md +27 -13
  15. package/docs/11-governance.md +2 -2
  16. package/docs/12-roles.md +3 -3
  17. package/docs/13-adoption.md +1 -1
  18. package/docs/14-foundation.md +15 -15
  19. package/docs/15-foundations-and-lineage.md +106 -0
  20. package/docs/README.md +4 -0
  21. package/docs/appendix-a-templates.md +3 -3
  22. package/docs/appendix-b-prompts.md +40 -5
  23. package/docs/appendix-c-glossary.md +42 -12
  24. package/docs/appendix-d-worked-example.md +2 -2
  25. package/docs/appendix-e-checklists.md +2 -2
  26. package/docs/appendix-f-requirements-matrix.md +8 -8
  27. package/docs/appendix-g-references.md +106 -0
  28. package/package.json +1 -1
  29. package/skill/add/SKILL.md +39 -37
  30. package/skill/add/adopt.md +13 -11
  31. package/skill/add/deltas.md +8 -6
  32. package/skill/add/fold.md +19 -17
  33. package/skill/add/graduate.md +74 -0
  34. package/skill/add/intake.md +22 -7
  35. package/skill/add/loop.md +59 -0
  36. package/skill/add/phases/0-setup.md +29 -24
  37. package/skill/add/phases/1-specify.md +23 -13
  38. package/skill/add/phases/2-scenarios.md +14 -4
  39. package/skill/add/phases/3-contract.md +24 -11
  40. package/skill/add/phases/4-tests.md +15 -5
  41. package/skill/add/phases/5-build.md +11 -4
  42. package/skill/add/phases/6-verify.md +24 -2
  43. package/skill/add/phases/7-observe.md +13 -5
  44. package/skill/add/report-template.md +65 -7
  45. package/skill/add/run.md +45 -34
  46. package/skill/add/scope.md +10 -6
  47. package/skill/add/setup-review.md +13 -10
  48. package/skill/add/streams.md +69 -19
  49. package/tooling/add.py +476 -34
  50. package/tooling/templates/CONVENTIONS.md.tmpl +1 -1
  51. package/tooling/templates/GLOSSARY.md.tmpl +23 -0
  52. package/tooling/templates/MILESTONE.md.tmpl +1 -0
  53. package/tooling/templates/PROJECT.md.tmpl +4 -3
  54. package/tooling/templates/TASK.md.tmpl +33 -12
package/skill/add/run.md CHANGED
@@ -1,25 +1,24 @@
1
1
  # The dynamic run — executing a locked scope
2
2
 
3
3
  Once a task's CONTRACT is frozen (phase 3), the scope is *locked*: the external shape will not move.
4
- That lock is ADD's autonomy seam — below it code is disposable; above it nothing breaks. This rubric
5
- covers what runs on the far side of the seam: the **build->verify half, executed as a dynamic,
6
- self-improving run** instead of a manual, sequential build. The human-led FRONT (Specify · Scenarios
7
- · Contract) still owns *direction*, but v7 compresses it to a **single human approval at the seam**
8
- (see "The one-approval front" below) — the AI drafts the whole front, a human approves it once.
4
+ That lock is ADD's autonomy decision point — below it code is disposable; above it nothing breaks. This rubric
5
+ covers what runs on the far side of the decision point: the **build->verify half, executed as a dynamic,
6
+ self-improving run** instead of a manual, sequential build. The human-led **specification bundle** (Specify · Scenarios
7
+ · Contract) still owns *direction*, but v7 compresses it to a **single human approval at the decision point**
8
+ (see "The specification bundle" below) — the AI drafts the whole bundle, a human approves it once.
9
9
 
10
10
  > **Self-improving = within-run convergence + emit v5 deltas** — same definition as v5: tracked,
11
11
  > evidence-backed, never autonomous training. The run converges in-turn AND feeds the human-gated
12
- > fold loop (`deltas.md` · `fold.md`). The engine stays judgment-free: this is a rubric, not `add.py`.
12
+ > consolidation loop (`deltas.md` · `fold.md`). The engine stays judgment-free: this is a rubric, not `add.py`.
13
13
 
14
- ## The one-approval front (v7)
14
+ ## The specification bundle (v7)
15
15
 
16
- The human-led front used to be three separate approvals — Specify, then Scenarios, then the Contract
17
- freeze. v7 compresses it to **one**. From the user's input the AI **drafts the whole front as a single
18
- bundle** the Spec, the Scenarios, the Contract, and the failing Tests and presents it together. The
19
- human gives **one approval, at the frozen contract** (the seam). That single approval is the green light
16
+ The specification bundle used to be three separate approvals — Specify, then Scenarios, then the Contract
17
+ freeze. v7 compresses it to **one**. From the user's input the AI **drafts the whole specification bundle in one pass** — the Spec, the Scenarios, the Contract, and the failing Tests — and presents it together. The
18
+ human gives **one approval, at the frozen contract** (the decision point). That single approval is the green light
20
19
  for the self-driving run.
21
20
 
22
- Why one approval and not zero: the contract freeze is the autonomy seam, and the seam **stays human**.
21
+ Why one approval and not zero: the contract freeze is the autonomy decision point, and the decision point **stays human**.
23
22
  The AI *drafts* the contract but never *freezes its own* — a person approves the frozen shape before any
24
23
  auto-run touches code. This is exactly what keeps "never self-gate a human-led gate" true under an auto
25
24
  default: the one gate that remains is human. Drop it to zero and the AI would freeze the interface it
@@ -28,11 +27,11 @@ then builds against and self-gate the result — the circular trust v6's dogfood
28
27
  What the human is actually approving in that one gate: that the drafted Spec captures the real intent,
29
28
  that the Scenarios cover the cases that matter, and that the Contract shape is the one to freeze. Reject
30
29
  any part and the bundle goes back to draft — that is backward-correction (principle 4), not failure.
31
- Approve, and the run begins. The seam guide (`phases/3-contract.md`) carries the
30
+ Approve, and the run begins. The decision-point guide (`phases/3-contract.md`) carries the
32
31
  **freeze review checklist** — six lines that walk the human through exactly this, ⚠-first.
33
32
 
34
- **The least-sure flag — aiming the one approval.** A single approval over a whole bundle invites a
35
- rubber stamp. So the AI presents the bundle **least-sure first**: of everything it is asking the human
33
+ **The lowest-confidence flag — aiming the one approval.** A single approval over a whole bundle is easy to
34
+ grant without reading. So the AI presents the bundle **lowest-confidence first**: of everything it is asking the human
36
35
  to freeze, it names the **1–2 points most likely to be wrong**, tagged by part
37
36
  (`⚠ [spec|scenario|contract|test] … — because …; if wrong: …`), each with *why* it is uncertain and
38
37
  *what it costs if wrong*. The §1 assumptions feed it, but a flag may equally point at an uncovered
@@ -40,7 +39,7 @@ scenario or the contract shape. If nothing is materially uncertain, the AI still
40
39
  biggest risk, however small — never a blank "none". Honest about its limit: the flag records that the
41
40
  human approved with the soft spots **in front of them**, eyes open; it makes a real review cheap and a
42
41
  lazy one visibly negligent, but it cannot *force* engagement — and the AI never asserts that the human
43
- engaged when it cannot know (a self-asserted gate would just be the rubber stamp one level up). Closing
42
+ engaged when it cannot know (a self-asserted gate would just move the unread approval one level up). Closing
44
43
  that enforcement gap is the job of a CI checker, not of prose.
45
44
 
46
45
  ## When the run begins — the scope-lock trigger
@@ -50,17 +49,18 @@ The trigger is the **frozen contract**, nothing else. A run may start only when:
50
49
  - §3 CONTRACT is marked `FROZEN @ vN` (the shape is fixed), AND
51
50
  - §4 TESTS exist and are RED for the right reason (the target the run drives to green).
52
51
 
53
- No frozen contract -> no run: you are still on the human-led front, and starting early is the
52
+ No frozen contract -> no run: you are still inside the specification bundle, and starting early is the
54
53
  forward-skip the flow forbids. The lock is what makes autonomous execution *safe* — the AI cannot
55
54
  drift the interface, because the interface is frozen above it.
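The two trigger conditions above can be read as a simple predicate. The sketch below is illustrative only: `may_start_run` and its parameters are hypothetical names, not part of `add.py`, and "RED for the right reason" remains a judgement the caller supplies rather than something the code infers.

```python
import re

# Marker format `FROZEN @ vN` is taken from the text; everything else here
# (function name, parameters) is an assumption made for illustration.
FROZEN_MARKER = re.compile(r"FROZEN @ v\d+")


def may_start_run(contract_status: str,
                  tests_exist: bool,
                  tests_red_for_right_reason: bool) -> bool:
    """Return True only when both scope-lock conditions hold."""
    contract_frozen = FROZEN_MARKER.search(contract_status) is not None
    # No frozen contract -> no run, regardless of the state of the tests.
    return contract_frozen and tests_exist and tests_red_for_right_reason
```

Under these assumptions, a contract still in draft never starts a run, even if failing tests already exist.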
56
55
 
57
- ## The touch-boundary — what the run may and may not touch
56
+ ## The change scope — what the run may and may not touch
58
57
 
58
+ <constraints>
59
59
  A locked run has a hard boundary. It MAY:
60
60
 
61
- - write and rewrite **code** (`src/`) — code is disposable below the seam;
61
+ - write and rewrite **code** (`src/`) — code is disposable below the decision point;
62
62
  - drive the **tests** to green WITHOUT weakening them (a weakened test is a method violation);
63
- - gather **evidence** for the verify gate (test output, blind-spot checks).
63
+ - gather **evidence** for the verify gate (test output, non-functional review).
64
64
 
65
65
  It MUST NOT:
66
66
 
@@ -68,10 +68,11 @@ It MUST NOT:
68
68
  the run STOPS and hands back to a human to reopen Specify (principle 4). The run never re-locks
69
69
  scope on its own.
70
70
  - weaken, delete, or skip a **test** to make the build pass (that inverts the method).
71
- - touch the **human-led front artifacts** (§1–§3) except to halt and escalate.
71
+ - touch the **specification-bundle artifacts** (§1–§3) except to halt and escalate.
72
+ </constraints>
72
73
 
73
74
  Crossing the boundary is not a fast run; it is an unverified one. When the run hits something only the
74
- front can resolve, it stops — and that stop is the loop working, not failing.
75
+ specification bundle can resolve, it stops — and that stop is the loop working, not failing.
75
76
 
76
77
  ## The dynamic run — fan-out and in-run convergence
77
78
 
@@ -83,21 +84,28 @@ on a trustworthy result with three loops:
83
84
  Stopping at the first green is how defects survive; the run stops only when the well runs dry.
84
85
  - **adversarial verify** — for every "done" claim, an independent skeptic tries to REFUTE it. The
85
86
  claim survives only if it withstands refutation, not because one pass looked plausible.
86
- - **completeness-critic** — a final pass that asks "what did we NOT cover — a scenario, a blind-spot,
87
+ - **completeness-critic** — a final pass that asks "what did we NOT cover — a scenario, a non-functional risk,
87
88
  an unstated assumption?" Whatever it finds re-enters the run.
88
89
 
89
90
  The run ends only when the loops go dry AND the auto-gate's evidence is satisfied. This is the run
90
91
  **self-improving within the turn** — the same convergence the foundation loop runs across milestones,
91
92
  compressed into one task.
92
93
 
93
- ## The evidence auto-gate
94
+ ## The automated quality gate
94
95
 
96
+ <constraints>
95
97
  The verify gate may be resolved by **evidence** rather than by a person — when the evidence is
96
98
  sufficient and the result is recorded (principle 7, reframed: an automated, recorded pass is an
97
99
  explicit pass, not a skip).
98
100
 
99
101
  - **Auto-PASS requires ALL of:** every test green; coverage not decreased; no test weakened and no
100
- contract edited; the convergence loops dry; the completeness-critic found nothing open.
102
+ contract edited; the convergence loops dry; the completeness-critic found nothing open; and the
103
+ deep check below recorded.
104
+ - **The deep check (every gate, no skim).** Deep check — do not skim. If the task produced code, record
105
+ that every new symbol is referenced (wiring) and that no new dead/unused code was introduced. If it
106
+ produced prose or non-code, record a semantic read — what you read in full and what it confirmed.
107
+ Which path applies is the resolver's judgement; the engine never classifies. An unfilled deep check is
108
+ a **shallow verify**, not an auto-PASS — evidence the work is wired, not merely plausible.
101
109
  - **Always escalates to a human (never auto-passed):** any **security** finding (HARD-STOP, always);
102
110
  a **concurrency**/timing risk the tests cannot exercise; an **architecture**/layering violation; and
103
111
  any failing test. These are the residue principle 2 names — automation cannot judge them.
@@ -107,22 +115,24 @@ explicit pass, not a skip).
107
115
 
108
116
  The auto-gate NEVER writes a human signature it did not get. An auto-PASS is logged as *auto-resolved*,
109
117
  honestly — the line between a pass and a skip is the recorded outcome, not a forged name.
118
+ </constraints>
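As a reading aid, the gate rules above can be sketched as a decision function. This is not the engine (the text is explicit that the rubric is not `add.py`); the `GateEvidence` fields and the returned labels are hypothetical names chosen for this example, and the sketch covers only the conditions visible in this section.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GateEvidence:
    """Hypothetical record of what the run gathered for the verify gate."""
    all_tests_green: bool
    coverage_not_decreased: bool
    test_weakened: bool
    contract_edited: bool
    loops_dry: bool
    critic_open_items: int
    deep_check_recorded: bool
    # Residue that always goes to a human, e.g. "security", "concurrency".
    residue_findings: List[str] = field(default_factory=list)


def resolve_gate(ev: GateEvidence) -> str:
    # Residue findings and failing tests are never auto-passed.
    if ev.residue_findings or not ev.all_tests_green:
        return "escalate-to-human"
    conditions_met = (
        ev.coverage_not_decreased
        and not ev.test_weakened
        and not ev.contract_edited
        and ev.loops_dry
        and ev.critic_open_items == 0
    )
    if not conditions_met:
        return "no-auto-pass"
    # An unfilled deep check is a shallow verify, not an auto-PASS.
    if not ev.deep_check_recorded:
        return "shallow-verify"
    # Logged as auto-resolved; never as a human signature.
    return "auto-resolved"
```

The ordering is deliberate in this sketch: escalation is checked first so that no combination of otherwise green evidence can mask a residue finding.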
110
119
 
111
120
  ## Emitting deltas — feeding the foundation back
112
121
 
113
122
  The completeness-critic does not discard what it finds. Every gap, surprise, or convention that helped
114
- or hurt becomes an **`open` competency delta** in the task's OBSERVE block, in the `deltas.md` grammar,
123
+ or hurt becomes an **`open` lesson learned** in the task's OBSERVE block, in the `deltas.md` grammar,
115
124
  tagged by competency:
116
125
 
117
126
  - a finding the run FIXED but that taught the foundation something (a missing scenario -> `TDD`);
118
127
  - a finding the run could NOT fix — a residue escalation -> a delta AND the escalation to a human.
119
128
 
120
- These `open` deltas feed v5's human-gated fold (`fold.md`) at milestone close: the run emits `open`;
121
- the human folds. That is the loop closing — **v6 run -> v5 foundation** — so a dynamic run sharpens the
129
+ These `open` deltas feed v5's human-gated consolidation (`fold.md`) at milestone close: the run emits `open`;
130
+ the human consolidates. That is the loop closing — **v6 run -> v5 foundation** — so a dynamic run sharpens the
122
131
  five competencies instead of letting its findings evaporate at end-of-run.
123
132
 
124
- ## The autonomy dial
133
+ ## The autonomy level
125
134
 
135
+ <constraints>
126
136
  How much a run may auto-gate is a **per-scope setting**, not a global switch (principle 5: trust is
127
137
  earned per scope). A task declares its level in its `TASK.md` header:
128
138
 
@@ -138,23 +148,24 @@ autonomy: auto | conservative
138
148
 
139
149
  > **v7 reversal (recorded, not hidden).** Earlier the default was `conservative` and `auto` was the
140
150
  > earned exception; v7 flips this — `auto` is the default, `conservative` is the deliberate lowering.
141
- > What did **not** change is principle 5: the dial is still **per-scope**, the level still lives in the
151
+ > What did **not** change is principle 5: the autonomy level is still **per-scope**, and it still lives in the
142
152
  > `TASK.md` header, and you still lower it anywhere risk demands. Only the starting point moved.
143
153
 
144
- **The high-risk guard — `auto` is refused where it matters most.** The dial is not a blank cheque. On a
154
+ **The high-risk guard — `auto` is refused where it matters most.** The autonomy level is not a blank cheque. On a
145
155
  **high-risk or method-defining scope** — anything where a wrong-but-plausible result is expensive or
146
156
  hard to reverse (auth, money, data-loss paths, the method/trust-layer itself) — `auto` must be lowered
147
157
  to `conservative`; leaving it at `auto` there is the reject code **`unguarded_high_risk_auto`**. This
148
- closes the v6 dogfood blind-spot, where the whole milestone ran at `auto` on the riskiest possible
158
+ closes the v6 dogfood gap, where the whole milestone ran at `auto` on the riskiest possible
149
159
  scope (defining the method) with no friction. The default is `auto` *for ordinary, well-tested scope*;
150
160
  high risk still earns a human gate.
151
161
 
152
162
  Judging *what* is high-risk stays human — the scope declares **`risk: high`** in the same `TASK.md`
153
- header where the dial lives, reviewed at the freeze like every header line (the engine never
163
+ header where the autonomy level lives, reviewed at the freeze like every header line (the engine never
154
164
  classifies scope). **Since v14 the guard is mechanical for the declared case:**
155
165
  the engine refuses the declared combination — `add.py gate` will not complete (`PASS`/`RISK-ACCEPTED`) a task whose header
156
166
  carries `risk: high` without `autonomy: conservative` (error `unguarded_high_risk_auto`; `HARD-STOP`
157
167
  always records — stopping is never blocked), and `add.py audit` flags the same code on a finished
158
168
  record whose header was tampered or whose GATE RECORD reviewer is the auto-gate — which CI enforces
159
169
  (audit-ci). The honest limit mirrors the audit's: an **undeclared** high-risk scope passes; declaring
160
- is the human seam, the engine enforces what was declared.
170
+ is the human decision point, the engine enforces what was declared.
171
+ </constraints>
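The declared-case guard described above might look roughly like the following. It is a minimal sketch, assuming the `TASK.md` header has already been parsed into a dictionary; `guard_error` is a hypothetical helper and does not reproduce how `add.py gate` is actually implemented.

```python
from typing import Dict, Optional


def guard_error(header: Dict[str, str], outcome: str) -> Optional[str]:
    """Return the reject code if the declared combination must be refused."""
    # HARD-STOP always records: stopping is never blocked.
    if outcome == "HARD-STOP":
        return None
    declared_high_risk = header.get("risk") == "high"
    # `auto` is the v7 default when the header does not lower it.
    autonomy = header.get("autonomy", "auto")
    if declared_high_risk and autonomy != "conservative":
        return "unguarded_high_risk_auto"
    # An undeclared high-risk scope passes: the guard only sees the header.
    return None
```

Note that the function has no notion of what *is* high-risk. It enforces only what a human declared, which mirrors the honest limit stated in the text.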
package/skill/add/scope.md CHANGED
@@ -20,7 +20,7 @@ scope drafting honors intake's classification — it never re-sizes a request:
20
20
  means one drafting pass, NOT auto-creation. Nothing is written to disk — single draft or the
21
21
  whole batch — until the human confirms. You propose; you wait.
22
22
 
23
- ## Brainstorm before you draft — co-specify at milestone altitude
23
+ ## Brainstorm before you draft — co-specify at milestone level
24
24
 
25
25
  Don't draft a MILESTONE.md from thin input. Run the same three-move co-specify as a
26
26
  task's §1 (`phases/1-specify.md`) — Diverge (framings + open questions) → Converge
@@ -31,12 +31,14 @@ Draft the WHOLE milestone before showing; nothing hits disk until the human conf
31
31
  Diverge seeds (pick the live ones):
32
32
  - **Outcome** — done means a user can do *what* they can't today? (goal sentence)
33
33
  - **Edge of scope** — nearest thing assumed IN that you want OUT? (Out list)
34
- - **Riskiest seam** — which contract, if wrong, costs the most rework? (freeze-first)
34
+ - **Riskiest decision point** — which contract, if wrong, costs the most rework? (freeze-first)
35
35
  - **Done-looks-like** — how do we SEE each outcome without reading code? (exit criteria)
36
36
  - **First slice** — which task unblocks the rest? (breadth-first order)
37
37
 
38
- Rank assumptions least-sure first; the top 1–2 get the flag the human reads at confirm:
39
- `⚠ <assumption> — least sure because <why>; if wrong: <cost>`.
38
+ Rank assumptions lowest-confidence first; the top 1–2 get the flag the human reads at confirm:
39
+ `⚠ <assumption> — lowest confidence because <why>; if wrong: <cost>`. Present the draft via
40
+ `report-template.md` — open with the ARC (goal · done · plan): the goal this milestone serves,
41
+ what is already covered, and the plan its task list lays out.
40
42
 
41
43
  ## Drafting a good MILESTONE.md (section by section)
42
44
 
@@ -45,8 +47,8 @@ Rank assumptions least-sure first; the top 1–2 get the flag the human reads at
45
47
  - **Scope In/Out** — the explicit anti-creep deferral list. Naming what is OUT is as important
46
48
  as what is IN; an empty Out list usually means the scope is not yet thought through.
47
49
  - **Shared decisions & glossary deltas** — cross-cutting rules every task must honor, named from
48
- the glossary. New terms get a glossary entry (the survivor layer stays honest).
49
- - **Shared / risky contracts to freeze first** — the seams between tasks; name the owning task.
50
+ the glossary. New terms get a glossary entry (the living documentation stays honest).
51
+ - **Shared / risky contracts to freeze first** — the decision points between tasks; name the owning task.
50
52
  - **Tasks (breadth-first)** — `slug · depends-on · one line` each. Decompose by deliverable, not
51
53
  by phase; keep each task one-file-sized. Order by dependency, not by guesswork.
52
54
  - **Exit criteria** — observable, and **every exit criterion maps to a declared task slug**
@@ -54,6 +56,7 @@ Rank assumptions least-sure first; the top 1–2 get the flag the human reads at
54
56
 
55
57
  ## Reject codes (emit `{ reject, rationale }`, create nothing)
56
58
 
59
+ <reject_codes>
57
60
  - `not_classified` — the request has not been through intake yet. Classify it first; you cannot
58
61
  draft scope for an unclassified request.
59
62
  - `dangling_criterion` — a drafted MILESTONE.md has an exit criterion that maps to no declared
@@ -61,6 +64,7 @@ Rank assumptions least-sure first; the top 1–2 get the flag the human reads at
61
64
  a malformed milestone. With no engine lint, you are the first check and the human is the backstop.
62
65
  - `no_milestone` — intake routed the request to `task` or `change-request`; scope drafting
63
66
  creates NO milestone. Honor the classification; do not invent milestone-sized scope.
67
+ </reject_codes>
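The `dangling_criterion` check is mechanical enough to sketch. The text notes there is no engine lint for it, so the code below is purely illustrative: the function name and the criterion-to-slug mapping are assumptions, and only the `{ reject, rationale }` shape and the code name come from the section above.

```python
from typing import Dict, Optional, Set


def check_exit_criteria(criteria: Dict[str, str],
                        task_slugs: Set[str]) -> Optional[Dict[str, str]]:
    """criteria maps each exit criterion to the task slug it claims to map to.

    Returns a reject record if any criterion points at an undeclared slug,
    otherwise None (the draft may be shown to the human).
    """
    dangling = [c for c, slug in criteria.items() if slug not in task_slugs]
    if dangling:
        return {
            "reject": "dangling_criterion",
            "rationale": "no declared task for: " + "; ".join(dangling),
        }
    return None
```

A caller would run this on the drafted milestone before presenting it, and create nothing on disk when a reject record comes back.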
64
68
 
65
69
  ## Worked example (from this repo's own history)
66
70
 
package/skill/add/setup-review.md CHANGED
@@ -1,11 +1,11 @@
1
1
  # Setup review — the one page the human signs
2
2
 
3
- Autonomous setup ends at a single human gate: the **lock-down** (`add.py lock`). Before that
3
+ Autonomous setup ends at a single human gate: the **baseline approval** (`add.py lock`). Before that
4
4
  signature is honest, the human needs to see *what you drafted and how sure you were* — not re-derive
5
5
  it. `SETUP-REVIEW.md` is that page: every decision you made while drafting the foundation, first-scope,
6
- and the first contract, **ordered least-sure-first** so the riskiest guesses meet their eye first.
6
+ and the first contract, **ordered lowest-confidence-first** so the riskiest guesses meet their eye first.
7
7
 
8
- This is the setup-altitude analog of presenting a task's front least-sure-first at the contract freeze.
8
+ This is the setup-level analog of presenting a task's specification bundle lowest-confidence-first at the contract freeze.
9
9
  The engine never reads this file — `add.py lock` is judgment-free, the signature *is* the gate (see
10
10
  `setup-lock-state`). The human **reading** this page is the review; your job is to make the reading honest.
11
11
 
@@ -13,7 +13,7 @@ The engine never reads this file — `add.py lock` is judgment-free, the signatu
13
13
 
14
14
  Write **one** artifact at `.add/SETUP-REVIEW.md`. **Never clobber a human-edited one** — if it already
15
15
  exists with hand edits, append/update, don't overwrite (the same non-clobber rule `init` applies to
16
- survivors). It is a per-onboarding, setup-altitude artifact; it sits beside `PROJECT.md`, not under a task.
16
+ living docs). It is a per-onboarding, setup-level artifact; it sits beside `PROJECT.md`, not under a task.
17
17
 
18
18
  ## The template
19
19
 
@@ -27,14 +27,15 @@ survivors). It is a per-onboarding, setup-altitude artifact; it sits beside `PRO
27
27
  | 1 | <the drafted decision> | PROJECT.md \| scope \| first-contract | `guessed` | <the inference + why you had to guess> |
28
28
  | 2 | <…> | <…> | `evidence-grounded` | <cite the source file/line you read it from> |
29
29
 
30
- Sign: reviewed the above → `add.py lock --by "<name>"`
30
+ Sign: confirm in chat → the agent runs `add.py lock --by "<name>"` (typing it yourself works too)
31
31
  ```
32
32
 
33
- Rows are numbered for reference at the gate ("row 1 is the one I'm least sure about").
33
+ Rows are numbered for reference at the gate ("row 1 is where my confidence is lowest").
34
34
 
35
35
  ## The two rules that make it honest
36
36
 
37
- 1. **Least-sure-first.** Order rows by confidence **ascending**. A `guessed` row always floats above an
37
+ <constraints>
38
+ 1. **Lowest-confidence-first.** Order rows by confidence **ascending**. A `guessed` row always floats above an
38
39
  `evidence-grounded` one. The point is not completeness theatre — it is to spend the human's attention
39
40
  where it changes outcomes: the top of the table is the part they actually need to challenge.
40
41
 
@@ -45,13 +46,15 @@ Rows are numbered for reference at the gate ("row 1 is the one I'm least sure ab
45
46
  onboarding (a near-empty repo, only the 4-lens answers) produces these. These are what the human
46
47
  must check; that is why they sit on top.
47
48
 
48
- The tag vocabulary is shared with `adopt.md` — the brownfield map tags each filled survivor decision
49
+ The tag vocabulary is shared with `adopt.md` — the brownfield map tags each filled living-doc decision
49
50
  `guessed`/`evidence-grounded`, and those tags flow straight into this table.
51
+ </constraints>
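The ordering rule can be expressed as a stable sort over the table rows. This is a sketch under assumptions: rows are modelled as dictionaries with a `confidence` key holding one of the two tags from the text, and `order_rows` is a hypothetical helper, not something the package ships.

```python
from typing import Dict, List

# Lower rank sorts first: `guessed` rows float above `evidence-grounded` ones.
CONFIDENCE_RANK = {"guessed": 0, "evidence-grounded": 1}


def order_rows(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Order review rows by confidence ascending (lowest-confidence-first)."""
    # sorted() is stable, so rows sharing a tag keep their drafted order.
    return sorted(rows, key=lambda row: CONFIDENCE_RANK[row["confidence"]])
```

An unknown tag raises `KeyError` here, which seems preferable to silently placing an unclassified row somewhere in the table.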
50
52
 
51
53
  ## Where it ends
52
54
 
53
- `SETUP-REVIEW.md` is **read-only context** for the lock-down. You do not ask the human to approve it
54
- field-by-field; you present it, least-sure-first, and they sign once:
55
+ `SETUP-REVIEW.md` is **read-only context** for the baseline approval. You do not ask the human to approve it
56
+ field-by-field; you present it, lowest-confidence-first; they confirm in conversation, and you run the lock
57
+ with their name:
55
58
 
56
59
  ```bash
57
60
  python3 .add/tooling/add.py lock --by "<name>"
package/skill/add/streams.md CHANGED
@@ -11,9 +11,9 @@ orchestrator*, drive several tasks at once by reading the dependency DAG that
11
11
  ## The honest frame — this is pipelining, not N× speed
12
12
 
13
13
  With **one human reviewer** you cannot beat `review_time × N_tasks` (the human-led
14
- seams are serial — `docs/10-setup-and-stages.md:91`). So the win is **not throughput**:
14
+ decision points are serial — `docs/10-setup-and-stages.md:91`). So the win is **not throughput**:
15
15
  it is that the reviewer is **never blocked waiting on a build**. While the human reviews
16
- task A's frozen front, the builds for B·C·D run behind *their* frozen contracts. You hide
16
+ task A's frozen bundle, the builds for B·C·D run behind *their* frozen contracts. You hide
17
17
  build latency under human latency. Do not promise more than that.
18
18
 
19
19
  ## The two queues
@@ -24,33 +24,34 @@ Compute both from one `python3 .add/tooling/add.py status` — no new state:
24
24
  `deps=` task already shows `gate=PASS`. These are the only tasks a worker may pick up.
25
25
  A task with unmet deps stays queued; a task finishing PASS unblocks its dependents on
26
26
  the next `status`.
27
- - **REVIEW-QUEUE** — the irreducibly serial part: the **one-approval front** (contract
27
+ - **REVIEW-QUEUE** — the irreducibly serial part: the **bundle approval** (contract
28
28
  freeze) and any **Verify escalation**. One human, one queue. Present these one at a
29
- time, never in a batch the human will rubber-stamp.
29
+ time, never in a batch the human will approve without reading.
30
30
 
31
31
  ```
32
32
  add.py status ─► READY-QUEUE ──spawn workers──► builds run ──► REVIEW-QUEUE ──► done
33
- (deps=PASS?) (machine span) (concurrent) (human seams,
33
+ (deps=PASS?) (machine span) (concurrent) (decision points,
34
34
  ▲ strictly serial)
35
35
  └──────────────── a task gating PASS unblocks its dependents ──────────────┘
36
36
  ```
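Deriving the READY-QUEUE from the dependency DAG can be sketched as below. The real `add.py status` output format is not shown in this diff, so the sketch assumes it has already been parsed into a mapping of slug to `deps` and `gate`; that structure and the `ready_queue` name are hypothetical.

```python
from typing import Dict, List, Optional, TypedDict


class TaskStatus(TypedDict):
    deps: List[str]
    gate: Optional[str]  # e.g. "PASS", or None while the task is unfinished


def ready_queue(tasks: Dict[str, TaskStatus]) -> List[str]:
    """Return unfinished tasks whose every dependency already shows gate=PASS."""
    ready = []
    for slug, task in tasks.items():
        if task["gate"] == "PASS":
            continue  # already done, nothing for a worker to pick up
        deps_passed = all(
            tasks.get(dep, {"deps": [], "gate": None})["gate"] == "PASS"
            for dep in task["deps"]
        )
        if deps_passed:
            ready.append(slug)
    return ready
```

Because the queue is recomputed from status each time, a task gating PASS unblocks its dependents on the next call without any extra state.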
37
37
 
38
- ## The dial is the throttle (not a new flag)
38
+ ## The autonomy level is the throttle (not a new flag)
39
39
 
40
40
  How much concurrency you actually get is set by each task's `autonomy:` header
41
41
  (`run.md`), not by this rubric:
42
42
 
43
43
  | `autonomy` (TASK.md) | What serializes on the human | Concurrency |
44
44
  |----------------------|------------------------------|-------------|
45
- | `conservative` | one-approval front **+** every Verify | pure pipelining — builds overlap, both gates queue |
46
- | `auto` (default) | one-approval front **only**; Verify auto-PASSes on evidence | real concurrency — only the seam + residue escalations queue |
45
+ | `conservative` | bundle approval **+** every Verify | pure pipelining — builds overlap, both gates queue |
46
+ | `auto` (default) | bundle approval **only**; Verify auto-PASSes on evidence | real concurrency — only the decision point + residue escalations queue |
47
47
  | `auto` but **high-risk** | refused → forced `conservative` (`unguarded_high_risk_auto`) | back to pipelining, by design |
48
48
 
49
- The irreducible floor is **one human approval per task at the contract seam** — the seam
49
+ The irreducible floor is **one human approval per task at the contract decision point** — the decision point
50
50
  never drops to zero (`run.md:22`). That floor is correct; do not engineer around it.
51
51
 
52
52
  ## Who writes what — the hard boundary
53
53
 
54
+ <constraints>
54
55
  - **You (orchestrator)** own all shared writes: `MILESTONE.md`, and every
55
56
  `add.py advance <slug>` / `add.py gate <outcome> <slug>` call. **Always pass the explicit
56
57
  `<slug>`** — `advance`/`gate`/`phase` all take an optional task slug and act on it
@@ -62,21 +63,70 @@ never drops to zero (`run.md:22`). That floor is correct; do not engineer around
62
63
  - **Isolation**: spawn each worker with `isolation="worktree"` so concurrent builds
63
64
  cannot collide. The worktree is discarded on failure; the task resets to its last-good
64
65
  phase.
66
+ </constraints>
65
67
 
66
68
  ## Design for failure (required)
67
69
 
68
70
  - **Fresh worktree base (verify base == HEAD)** — create each worker's worktree from current
69
- `HEAD` **after** you commit the task's frozen front (spec · scenarios · contract · tests). A
71
+ `HEAD` **after** you commit the task's frozen specification bundle (spec · scenarios · contract · tests). A
70
72
  worktree forked from a stale base forces the worker to recreate the frozen artifacts by hand
71
73
  (the v10 dogfood hit exactly this). Before the worker starts, confirm `git -C <worktree>
72
74
  rev-parse HEAD` equals the orchestrator's `HEAD`; if it drifted, `git merge` the base in first.
73
- - **Lease + timeout** — record which worker holds which task; if a worker dies, release
74
- the claim back to READY (re-spawn, do not assume partial work is sound).
75
+ - **Lease + timeout** — record which worker holds which task (in the wave ledger, below);
76
+ if a worker dies, release the claim back to READY (re-spawn, do not assume partial work is sound).
75
77
  - **Failure isolates** — a worker that hits a STOP-and-escalate (below) blocks only its
76
78
  own task. Siblings keep running; the escalation joins the REVIEW-QUEUE.
77
79
  - **Circuit-breaker** — if N workers fail in a wave, stop fanning out and fall back to
78
80
  sequential. Repeated failure means the scope was wrong, not the parallelism.
79
81
 
+ ## Wave ledger — the wave's resume point
+
+ A single task resumes from `state.json`; a wave used to resume from nothing — the
+ task ↔ lease ↔ fork-base ↔ autonomy ↔ merge-order mapping lived only in the orchestrator's
+ chat context, and the v12-1 recurrence proved that discipline without an artifact fails
+ (the base check existed in prose and never ran). The ledger fixes both: it is the file you
+ re-orient from, and its evidence cells cannot be filled without executing the checks.
+
+ **The file** — `.add/milestones/<m>/WAVE.md`, orchestrator-owned like `MILESTONE.md` and
+ `state.json`. ONE live wave per milestone at a time; opening a second while one is live is
+ refused (`wave_already_live`). **Workers never read WAVE.md** — the orchestrator copies the
+ relevant mid-wave decisions into each worker's PROMPT.md at spawn/respawn, so the worker
+ contract below stays unchanged and no worker widens into sibling state.
+
+ ```markdown
+ # WAVE.md — transient wave ledger (orchestrator-owned · one live wave per milestone)
+ wave: <n> · opened: <date> · status: live|merging
+ base: <orchestrator HEAD at spawn — the sha every fork must equal>
+
+ ### Roster (lease ledger)
+ | task | lease (worker) | fork-base (pasted) | autonomy | spawned | timeout |
+ |--------|----------------|---------------------------------------------|----------|---------|---------|
+ | <slug> | wt-a | <paste `git -C <wt> rev-parse HEAD` output> | auto | <time> | <dur> |
+
+ ### Mid-wave decisions
+ - <date> <decision a later or respawned worker must honor — copy it into that worker's PROMPT.md>
+
+ ### Merge order (serial; integration Verify per merge)
+ 1. <slug> → 2. <slug>
+ ```
+
+ **Evidence cells, not ticks.** The fork-base cell holds the PASTED output of
+ `git -C <worktree> rev-parse HEAD`, and it must equal `base:`. A tick is not evidence; a row
+ you can only fill by running the command is the fresh-worktree-base check EXECUTING — the
+ v12-1 lesson (words-exist ≠ method-works) closed structurally. Spawning a worker whose roster
+ row lacks that evidence is refused (`unverified_fork_base`).
+
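As a rough illustration of why a tick cannot stand in for evidence, the sketch below accepts a fork-base cell only when it looks like `git rev-parse HEAD` output and matches `base:`. The helper name `check_fork_base` and the hex-length pattern are assumptions. Only the refusal code `unverified_fork_base` comes from the text above, and refusing a mismatched SHA with that same code is also an assumption.

```python
import re
from typing import Optional

# Assumption: a full object name is 40 hex chars (SHA-1) or 64 (SHA-256 repos).
FULL_SHA = re.compile(r"[0-9a-f]{40}(?:[0-9a-f]{24})?")


def check_fork_base(base: str, pasted_cell: str) -> Optional[str]:
    """Return a refusal code, or None when the roster row carries usable evidence."""
    pasted = pasted_cell.strip()
    if not FULL_SHA.fullmatch(pasted):
        # A tick, a blank, or a placeholder is not pasted command output.
        return "unverified_fork_base"
    if pasted != base.strip():
        # The fork drifted from the orchestrator's HEAD at spawn.
        return "unverified_fork_base"
    return None
```

The orchestrator would run this per roster row before spawning, so a row filled in by hand with anything other than the command's output is rejected.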
+ **Lifecycle — open → consume → digest → delete.** Open the ledger when the first worker
+ spawns. The serial integration Verify consumes it (the merge order is read from it, one
+ worktree at a time). At wave close, absorb the evidence digest — wave base · roster→fork-base
+ evidence · merge order · integration-Verify outcome — into `MILESTONE.md` as an append-only
+ `## Wave log` block (this is the integration-Verify *record*, previously homeless), and only
+ then remove the file. Removing WAVE.md before the digest is absorbed is refused
+ (`digest_not_absorbed`) — the proof the checks ran must outlive the file.
+
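The digest-before-delete ordering can be sketched as two pure functions over the text of `MILESTONE.md`. This is an illustrative assumption, not the package's code: `absorb_digest` and `removal_refusal` are hypothetical names, and checking for the digest by substring is a simplification.

```python
from typing import Optional

WAVE_LOG_HEADING = "## Wave log"


def absorb_digest(milestone_text: str, digest: str) -> str:
    """Append the digest as a `## Wave log` block; earlier text is left untouched."""
    return milestone_text.rstrip("\n") + f"\n\n{WAVE_LOG_HEADING}\n{digest.strip()}\n"


def removal_refusal(milestone_text: str, digest: str) -> Optional[str]:
    """Return 'digest_not_absorbed' unless the digest already sits in MILESTONE.md."""
    absorbed = (
        WAVE_LOG_HEADING in milestone_text and digest.strip() in milestone_text
    )
    return None if absorbed else "digest_not_absorbed"
```

Only when `removal_refusal` returns `None` would the orchestrator go on to delete `WAVE.md`.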
+ **Resume rule.** On session start, a live WAVE.md is the wave's resume point: re-orient from
+ the file — roster, bases, decisions, merge order — never from conversational memory.
+
  ## Merge is serial — integration Verify
 
  Parallel build, **serial integration**. After workers return, you merge the worktrees
@@ -85,8 +135,8 @@ checks that `run.md:102` says automation cannot judge. Two green tasks in isolat
  still conflict when merged; this step is where that surfaces. Never auto-pass it.
 
  Each worktree carries a full copy of `.add/`. Merge back **only** `src/`, `tests/`, and the
- worker's own `.add/tasks/<slug>/` (TASK.md · SUMMARY.md) — `.add/state.json` and
- `MILESTONE.md` stay orchestrator-owned, or a parallel merge will drag stale state back.
+ worker's own `.add/tasks/<slug>/` (TASK.md · SUMMARY.md) — `.add/state.json`, `MILESTONE.md`,
+ and the live `WAVE.md` stay orchestrator-owned, or a parallel merge will drag stale state back.
 
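The merge-back rule amounts to a path allowlist. The sketch below is one hedged way to express it: `may_merge_back` is a hypothetical helper, and it assumes paths are given relative to the worktree root with POSIX separators.

```python
from pathlib import PurePosixPath


def may_merge_back(path: str, slug: str) -> bool:
    """True only for src/, tests/, and the worker's own .add/tasks/<slug>/ files."""
    parts = PurePosixPath(path).parts
    if not parts:
        return False
    if parts[0] in ("src", "tests"):
        return True
    # Everything else under .add/ (state.json, MILESTONE.md, WAVE.md,
    # sibling task dirs) stays orchestrator-owned.
    return parts[:3] == (".add", "tasks", slug)
```

Filtering the list of changed files through this check before each serial merge keeps stale orchestrator state out of the result.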
  ## The worker contract — portable across coding agents
 
@@ -107,7 +157,7 @@ changes. Fill every `{{...}}` per stream. The ADD-specific value is `<touch_boun
  Execute the LOCKED dynamic run for task '{{TASK_SLUG}}' in milestone {{MILESTONE}}:
  drive §4 TESTS red→green against the FROZEN contract {{CONTRACT_VERSION}}, converge, and
  resolve verify per autonomy={{AUTONOMY}}. You own ONLY the machine-led span — the two human
- seams (front approval · escalated Verify) are NOT yours.
+ decision points (bundle approval · escalated Verify) are NOT yours.
  </objective>
 
  <persona>
@@ -126,7 +176,7 @@ Self-Eval; if any < 0.9, refine before returning.
  <touch_boundary> <!-- from run.md:56-73; the worker's contract, identical on every runner -->
  MAY: rewrite code in src/ · drive tests green WITHOUT weakening them · gather verify evidence.
  MUST NOT: edit the frozen CONTRACT or locked scope · weaken/delete/skip any test ·
- touch §1–§3 front artifacts · write MILESTONE.md / state.json / any sibling stream.
+ touch §1–§3 bundle artifacts · write MILESTONE.md / state.json / any sibling stream.
  STOP-and-escalate (return your findings; do not decide):
  • a discovered scope/contract gap → backward-correction, reopen Specify (principle 4)
  • any SECURITY finding → HARD-STOP, always
@@ -156,7 +206,7 @@ ripgrep otherwise. Design every IO path for failure — timeouts, retries, rollb
  <return> <!-- the worker PROPOSES; the orchestrator RECORDS. A worker never runs add.py. -->
  End with a structured verdict AND write the same into SUMMARY.md in the task dir:
  { task, outcome: PASS|RISK-ACCEPTED|HARD-STOP|ESCALATE, evidence: <tests+coverage>,
- residue: [security|concurrency|architecture findings], deltas: [open competency deltas] }.
+ residue: [security|concurrency|architecture findings], deltas: [open lessons learned] }.
  Do NOT touch add.py or any shared file — the orchestrator gates on your verdict.
  </return>
  ```
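On the orchestrator side, gating on the returned verdict might look like the sketch below. It assumes the worker emits the verdict as a JSON object with the five fields named in `<return>`. `parse_verdict` is a hypothetical helper, not a function of `add.py`.

```python
import json

# Outcome values and field names are taken from the <return> block above.
OUTCOMES = {"PASS", "RISK-ACCEPTED", "HARD-STOP", "ESCALATE"}
REQUIRED_FIELDS = ("task", "outcome", "evidence", "residue", "deltas")


def parse_verdict(raw: str) -> dict:
    """Parse a worker's verdict and reject anything the orchestrator cannot gate on."""
    verdict = json.loads(raw)
    if not isinstance(verdict, dict):
        raise ValueError("verdict must be a JSON object")
    missing = [name for name in REQUIRED_FIELDS if name not in verdict]
    if missing:
        raise ValueError(f"verdict missing fields: {missing}")
    if verdict["outcome"] not in OUTCOMES:
        raise ValueError(f"unknown outcome: {verdict['outcome']!r}")
    return verdict
```

A verdict that fails to parse is a natural candidate for escalation, since the worker proposes and only the orchestrator records.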
@@ -169,7 +219,7 @@ The contract is identical whichever model runs it (the model is disposable, like
  | Tier | When | Claude Code | Any other runner |
  |------|------|-------------|------------------|
  | **mid** | ordinary, well-tested scope; clear contract | `sonnet` | the runner's balanced model |
- | **top** | complex / ambiguous / cross-cutting / large blast radius | `opus` | the runner's strongest reasoning model |
+ | **top** | complex / ambiguous / cross-cutting / broad scope of impact | `opus` | the runner's strongest reasoning model |
 
  Two rules sit **above** model choice and never bend:
  - **High-risk ⇒ `conservative` autonomy, regardless of model** (`run.md` high-risk guard). A
@@ -186,7 +236,7 @@ worktree, then points the agent at that directory.
  |-----------|----------|----------------------------------|-----------------------------------------------|
  | spawn a worker | prompt + label | `Task(description=…, prompt=…)` | `cd $WT && <agent> run --prompt-file PROMPT.md` |
  | pick the model | tier → id | `model="opus"\|"sonnet"` | a `--model <id>` flag |
- | isolate | worktree | `isolation="worktree"` | `git worktree add $WT HEAD` (after committing the front; verify base == HEAD), then run inside it |
+ | isolate | worktree | `isolation="worktree"` | `git worktree add $WT HEAD` (after committing the bundle; verify base == HEAD), then run inside it |
  | load context | files / cwd | `<context_files>` + repo cwd | run inside `$WT`; paths are relative |
  | domain expertise | skill / preamble | a Claude skill in `<expertise>` | a system-prompt / profile preamble |
  | return a verdict | structured | final message (optionally a schema) | stdout JSON the orchestrator parses |