@pilotspace/add 1.1.0 → 1.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +40 -0
- package/GETTING-STARTED.md +165 -139
- package/README.md +13 -7
- package/bin/cli.js +13 -4
- package/docs/01-principles.md +3 -3
- package/docs/02-the-flow.md +15 -11
- package/docs/03-step-1-specify.md +13 -13
- package/docs/04-step-2-scenarios.md +2 -2
- package/docs/05-step-3-contract.md +3 -3
- package/docs/06-step-4-tests.md +2 -2
- package/docs/07-step-5-build.md +1 -1
- package/docs/08-step-6-verify.md +14 -5
- package/docs/09-the-loop.md +12 -6
- package/docs/10-setup-and-stages.md +27 -13
- package/docs/11-governance.md +2 -2
- package/docs/12-roles.md +3 -3
- package/docs/13-adoption.md +1 -1
- package/docs/14-foundation.md +15 -15
- package/docs/15-foundations-and-lineage.md +106 -0
- package/docs/README.md +4 -0
- package/docs/appendix-a-templates.md +3 -3
- package/docs/appendix-b-prompts.md +40 -5
- package/docs/appendix-c-glossary.md +42 -12
- package/docs/appendix-d-worked-example.md +2 -2
- package/docs/appendix-e-checklists.md +2 -2
- package/docs/appendix-f-requirements-matrix.md +8 -8
- package/docs/appendix-g-references.md +106 -0
- package/package.json +1 -1
- package/skill/add/SKILL.md +39 -37
- package/skill/add/adopt.md +13 -11
- package/skill/add/deltas.md +8 -6
- package/skill/add/fold.md +19 -17
- package/skill/add/graduate.md +74 -0
- package/skill/add/intake.md +22 -7
- package/skill/add/loop.md +59 -0
- package/skill/add/phases/0-setup.md +29 -24
- package/skill/add/phases/1-specify.md +23 -13
- package/skill/add/phases/2-scenarios.md +14 -4
- package/skill/add/phases/3-contract.md +24 -11
- package/skill/add/phases/4-tests.md +15 -5
- package/skill/add/phases/5-build.md +11 -4
- package/skill/add/phases/6-verify.md +24 -2
- package/skill/add/phases/7-observe.md +13 -5
- package/skill/add/report-template.md +65 -7
- package/skill/add/run.md +45 -34
- package/skill/add/scope.md +10 -6
- package/skill/add/setup-review.md +13 -10
- package/skill/add/streams.md +69 -19
- package/tooling/add.py +476 -34
- package/tooling/templates/CONVENTIONS.md.tmpl +1 -1
- package/tooling/templates/GLOSSARY.md.tmpl +23 -0
- package/tooling/templates/MILESTONE.md.tmpl +1 -0
- package/tooling/templates/PROJECT.md.tmpl +4 -3
- package/tooling/templates/TASK.md.tmpl +33 -12
package/skill/add/run.md
CHANGED
````diff
@@ -1,25 +1,24 @@
 # The dynamic run — executing a locked scope

 Once a task's CONTRACT is frozen (phase 3), the scope is *locked*: the external shape will not move.
-That lock is ADD's autonomy
-covers what runs on the far side of the
-self-improving run** instead of a manual, sequential build. The human-led
-· Contract) still owns *direction*, but v7 compresses it to a **single human approval at the
-(see "The
+That lock is ADD's autonomy decision point — below it code is disposable; above it nothing breaks. This rubric
+covers what runs on the far side of the decision point: the **build->verify half, executed as a dynamic,
+self-improving run** instead of a manual, sequential build. The human-led **specification bundle** (Specify · Scenarios
+· Contract) still owns *direction*, but v7 compresses it to a **single human approval at the decision point**
+(see "The specification bundle" below) — the AI drafts the whole bundle, a human approves it once.

 > **Self-improving = within-run convergence + emit v5 deltas** — same definition as v5: tracked,
 > evidence-backed, never autonomous training. The run converges in-turn AND feeds the human-gated
->
+> consolidation loop (`deltas.md` · `fold.md`). The engine stays judgment-free: this is a rubric, not `add.py`.

-## The
+## The specification bundle (v7)

-The
-freeze. v7 compresses it to **one**. From the user's input the AI **drafts the whole
-
-human gives **one approval, at the frozen contract** (the seam). That single approval is the green light
+The specification bundle used to be three separate approvals — Specify, then Scenarios, then the Contract
+freeze. v7 compresses it to **one**. From the user's input the AI **drafts the whole specification bundle in one pass** — the Spec, the Scenarios, the Contract, and the failing Tests — and presents it together. The
+human gives **one approval, at the frozen contract** (the decision point). That single approval is the green light
 for the self-driving run.

-Why one approval and not zero: the contract freeze is the autonomy
+Why one approval and not zero: the contract freeze is the autonomy decision point, and the decision point **stays human**.
 The AI *drafts* the contract but never *freezes its own* — a person approves the frozen shape before any
 auto-run touches code. This is exactly what keeps "never self-gate a human-led gate" true under an auto
 default: the one gate that remains is human. Drop it to zero and the AI would freeze the interface it
````
````diff
@@ -28,11 +27,11 @@ then builds against and self-gate the result — the circular trust v6's dogfood
 What the human is actually approving in that one gate: that the drafted Spec captures the real intent,
 that the Scenarios cover the cases that matter, and that the Contract shape is the one to freeze. Reject
 any part and the bundle goes back to draft — that is backward-correction (principle 4), not failure.
-Approve, and the run begins. The
+Approve, and the run begins. The decision-point guide (`phases/3-contract.md`) carries the
 **freeze review checklist** — six lines that walk the human through exactly this, ⚠-first.

-**The
-
+**The lowest-confidence flag — aiming the one approval.** A single approval over a whole bundle is easy to
+grant without reading. So the AI presents the bundle **lowest-confidence first**: of everything it is asking the human
 to freeze, it names the **1–2 points most likely to be wrong**, tagged by part
 (`⚠ [spec|scenario|contract|test] … — because …; if wrong: …`), each with *why* it is uncertain and
 *what it costs if wrong*. The §1 assumptions feed it, but a flag may equally point at an uncovered
````
````diff
@@ -40,7 +39,7 @@ scenario or the contract shape. If nothing is materially uncertain, the AI still
 biggest risk, however small — never a blank "none". Honest about its limit: the flag records that the
 human approved with the soft spots **in front of them**, eyes open; it makes a real review cheap and a
 lazy one visibly negligent, but it cannot *force* engagement — and the AI never asserts that the human
-engaged when it cannot know (a self-asserted gate would just
+engaged when it cannot know (a self-asserted gate would just move the unread approval one level up). Closing
 that enforcement gap is the job of a CI checker, not of prose.

 ## When the run begins — the scope-lock trigger
````
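The lowest-confidence flag described in the run.md hunks is a small rank-then-format rule. Below is a minimal Python sketch of it. It assumes a hypothetical `Doubt` record with a numeric confidence score; the rubric itself names no score or data structure. The sketch sorts ascending, keeps the top one or two, and refuses an empty list rather than printing "none".

```python
from dataclasses import dataclass

PARTS = ("spec", "scenario", "contract", "test")


@dataclass
class Doubt:
    part: str          # which bundle part the flag points at (one of PARTS)
    point: str         # the drafted claim that may be wrong
    confidence: float  # hypothetical scale: 0.0 = pure guess, 1.0 = certain
    why: str           # why it is uncertain
    cost: str          # what it costs if wrong


def lowest_confidence_flags(doubts, limit=2):
    """Return the `limit` least-confident points as flag lines, lowest first."""
    if not doubts:
        # The rubric forbids a blank "none": name the biggest risk, however small.
        raise ValueError("no doubts given; name at least the single biggest risk")
    for d in doubts:
        if d.part not in PARTS:
            raise ValueError(f"unknown bundle part: {d.part!r}")
    ranked = sorted(doubts, key=lambda d: d.confidence)
    return [
        f"⚠ [{d.part}] {d.point} — because {d.why}; if wrong: {d.cost}"
        for d in ranked[:limit]
    ]


flags = lowest_confidence_flags([
    Doubt("contract", "returns a list, not a generator", 0.9, "spec example shows a list", "one rename"),
    Doubt("scenario", "empty input is out of scope", 0.3, "the request never mentions it", "crash on first real use"),
    Doubt("spec", "only admins call this", 0.5, "inferred from route names", "auth rework"),
])
for line in flags:
    print(line)
```

The scale is arbitrary. What matters is that the sort is ascending, so the human meets the weakest point first.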
````diff
@@ -50,17 +49,18 @@ The trigger is the **frozen contract**, nothing else. A run may start only when:
 - §3 CONTRACT is marked `FROZEN @ vN` (the shape is fixed), AND
 - §4 TESTS exist and are RED for the right reason (the target the run drives to green).

-No frozen contract -> no run: you are still
+No frozen contract -> no run: you are still inside the specification bundle, and starting early is the
 forward-skip the flow forbids. The lock is what makes autonomous execution *safe* — the AI cannot
 drift the interface, because the interface is frozen above it.

-## The
+## The change scope — what the run may and may not touch

+<constraints>
 A locked run has a hard boundary. It MAY:

-- write and rewrite **code** (`src/`) — code is disposable below the
+- write and rewrite **code** (`src/`) — code is disposable below the decision point;
 - drive the **tests** to green WITHOUT weakening them (a weakened test is a method violation);
-- gather **evidence** for the verify gate (test output,
+- gather **evidence** for the verify gate (test output, non-functional review).

 It MUST NOT:

````
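The scope-lock trigger reduces to a two-part predicate. The sketch below is hypothetical: `may_start_run` and its arguments are illustrative names. It assumes the test failures have already been collected into a name-to-reason map. It checks only two things: that the contract text carries a `FROZEN @ vN` marker, and that every test is red for its expected reason.

```python
import re

FROZEN_MARK = re.compile(r"FROZEN @ v\d+")


def may_start_run(contract_text, failures, expected_reasons):
    """Hypothetical scope-lock check.

    contract_text    -- the text of the §3 CONTRACT section
    failures         -- test name -> observed failure reason, or None if it passed
    expected_reasons -- test name -> the reason that test is supposed to fail for
    Returns (allowed, explanation).
    """
    if not FROZEN_MARK.search(contract_text):
        return False, "no frozen contract -> no run"
    if not failures:
        return False, "no tests: nothing for the run to drive to green"
    for name, reason in failures.items():
        if reason is None:
            return False, f"{name} already passes, so it is not a red target"
        if reason != expected_reasons.get(name):
            return False, f"{name} is red for the wrong reason: {reason}"
    return True, "scope locked: the run may start"


print(may_start_run("§3 CONTRACT: FROZEN @ v2",
                    {"test_login": "NotImplementedError"},
                    {"test_login": "NotImplementedError"}))
```

A test that is red because of an `ImportError` rather than the missing behaviour is deliberately rejected. That is the "for the right reason" half of the trigger.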
````diff
@@ -68,10 +68,11 @@ It MUST NOT:
 the run STOPS and hands back to a human to reopen Specify (principle 4). The run never re-locks
 scope on its own.
 - weaken, delete, or skip a **test** to make the build pass (that inverts the method).
-- touch the **
+- touch the **specification-bundle artifacts** (§1–§3) except to halt and escalate.
+</constraints>

 Crossing the boundary is not a fast run; it is an unverified one. When the run hits something only the
-
+specification bundle can resolve, it stops — and that stop is the loop working, not failing.

 ## The dynamic run — fan-out and in-run convergence

````
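One way to make the "MUST NOT touch §1–§3" boundary checkable is to fingerprint the bundle sections before the run and compare them afterwards. This sketch rests on a loud assumption: that `TASK.md` marks its sections with headings of the form `## §N`. The diff does not show the real layout. Any difference in the fingerprints means the run must halt and escalate.

```python
import hashlib
import re

# Assumption: TASK.md sections start with headings like "## §1 SPECIFY".
SECTION_HEADING = re.compile(r"^## §(\d+)", re.MULTILINE)
BUNDLE = {1, 2, 3}  # Specify, Scenarios, Contract


def bundle_fingerprint(task_md):
    """Map each §1-§3 section number to a SHA-256 of its full text."""
    starts = [(int(m.group(1)), m.start()) for m in SECTION_HEADING.finditer(task_md)]
    prints = {}
    for i, (num, start) in enumerate(starts):
        end = starts[i + 1][1] if i + 1 < len(starts) else len(task_md)
        if num in BUNDLE:
            prints[num] = hashlib.sha256(task_md[start:end].encode("utf-8")).hexdigest()
    return prints


def bundle_untouched(before, after):
    """True if the run left §1-§3 byte-for-byte unchanged; False means halt and escalate."""
    return bundle_fingerprint(before) == bundle_fingerprint(after)
```

Edits to later sections (for example §5 build notes) shift character offsets but leave the §1-§3 hashes unchanged. The check therefore flags only bundle edits.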
````diff
@@ -83,21 +84,28 @@ on a trustworthy result with three loops:
 Stopping at the first green is how defects survive; the run stops only when the well runs dry.
 - **adversarial verify** — for every "done" claim, an independent skeptic tries to REFUTE it. The
 claim survives only if it withstands refutation, not because one pass looked plausible.
-- **completeness-critic** — a final pass that asks "what did we NOT cover — a scenario, a
+- **completeness-critic** — a final pass that asks "what did we NOT cover — a scenario, a non-functional risk,
 an unstated assumption?" Whatever it finds re-enters the run.

 The run ends only when the loops go dry AND the auto-gate's evidence is satisfied. This is the run
 **self-improving within the turn** — the same convergence the foundation loop runs across milestones,
 compressed into one task.

-## The
+## The automated quality gate

+<constraints>
 The verify gate may be resolved by **evidence** rather than by a person — when the evidence is
 sufficient and the result is recorded (principle 7, reframed: an automated, recorded pass is an
 explicit pass, not a skip).

 - **Auto-PASS requires ALL of:** every test green; coverage not decreased; no test weakened and no
-contract edited; the convergence loops dry; the completeness-critic found nothing open
+contract edited; the convergence loops dry; the completeness-critic found nothing open; and the
+deep check below recorded.
+- **The deep check (every gate, no skim).** Deep check — do not skim. If the task produced code, record
+that every new symbol is referenced (wiring) and that no new dead/unused code was introduced. If it
+produced prose or non-code, record a semantic read — what you read in full and what it confirmed.
+Which path applies is the resolver's judgement; the engine never classifies. An unfilled deep check is
+a **shallow verify**, not an auto-PASS — evidence the work is wired, not merely plausible.
 - **Always escalates to a human (never auto-passed):** any **security** finding (HARD-STOP, always);
 a **concurrency**/timing risk the tests cannot exercise; an **architecture**/layering violation; and
 any failing test. These are the residue principle 2 names — automation cannot judge them.
````
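The auto-PASS and escalation rules read naturally as a decision function. This is a minimal sketch with a hypothetical `Evidence` record and made-up outcome strings. The real gate outcomes and how they are recorded live in `add.py`, which this does not reproduce. The escalation checks run first, so no combination of green evidence can auto-pass a security finding. When any other auto-PASS condition is unmet, the sketch hands the gate to a human. That fallback is an assumption, because the rubric does not name that outcome.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    # Hypothetical record of what the run collected for the verify gate.
    tests_total: int
    tests_green: int
    coverage_before: float
    coverage_after: float
    test_weakened: bool
    contract_edited: bool
    loops_dry: bool
    critic_open_items: int
    deep_check: str                 # recorded wiring check or semantic read; "" = unfilled
    security_findings: int = 0
    concurrency_risk: bool = False  # a timing risk the tests cannot exercise
    architecture_violation: bool = False


def resolve_verify(ev):
    """Return an illustrative outcome string for the verify gate."""
    # Residue first: these always go to a human, whatever else is green.
    if ev.security_findings:
        return "HARD-STOP"
    if ev.tests_green < ev.tests_total or ev.concurrency_risk or ev.architecture_violation:
        return "ESCALATE"
    # An unfilled deep check is a shallow verify, never an auto-PASS.
    if not ev.deep_check.strip():
        return "SHALLOW-VERIFY"
    all_met = (
        ev.coverage_after >= ev.coverage_before
        and not ev.test_weakened
        and not ev.contract_edited
        and ev.loops_dry
        and ev.critic_open_items == 0
    )
    # Logged as auto-resolved; no human signature is ever written for it.
    return "AUTO-PASS" if all_met else "ESCALATE"
```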
````diff
@@ -107,22 +115,24 @@ explicit pass, not a skip).

 The auto-gate NEVER writes a human signature it did not get. An auto-PASS is logged as *auto-resolved*,
 honestly — the line between a pass and a skip is the recorded outcome, not a forged name.
+</constraints>

 ## Emitting deltas — feeding the foundation back

 The completeness-critic does not discard what it finds. Every gap, surprise, or convention that helped
-or hurt becomes an **`open`
+or hurt becomes an **`open` lesson learned** in the task's OBSERVE block, in the `deltas.md` grammar,
 tagged by competency:

 - a finding the run FIXED but that taught the foundation something (a missing scenario -> `TDD`);
 - a finding the run could NOT fix — a residue escalation -> a delta AND the escalation to a human.

-These `open` deltas feed v5's human-gated
-the human
+These `open` deltas feed v5's human-gated consolidation (`fold.md`) at milestone close: the run emits `open`;
+the human consolidates. That is the loop closing — **v6 run -> v5 foundation** — so a dynamic run sharpens the
 five competencies instead of letting its findings evaporate at end-of-run.

-## The autonomy
+## The autonomy level

+<constraints>
 How much a run may auto-gate is a **per-scope setting**, not a global switch (principle 5: trust is
 earned per scope). A task declares its level in its `TASK.md` header:

````
````diff
@@ -138,23 +148,24 @@ autonomy: auto | conservative

 > **v7 reversal (recorded, not hidden).** Earlier the default was `conservative` and `auto` was the
 > earned exception; v7 flips this — `auto` is the default, `conservative` is the deliberate lowering.
-> What did **not** change is principle 5: the
+> What did **not** change is principle 5: the autonomy level is still **per-scope**, and it still lives in the
 > `TASK.md` header, and you still lower it anywhere risk demands. Only the starting point moved.

-**The high-risk guard — `auto` is refused where it matters most.** The
+**The high-risk guard — `auto` is refused where it matters most.** The autonomy level is not a blank cheque. On a
 **high-risk or method-defining scope** — anything where a wrong-but-plausible result is expensive or
 hard to reverse (auth, money, data-loss paths, the method/trust-layer itself) — `auto` must be lowered
 to `conservative`; leaving it at `auto` there is the reject code **`unguarded_high_risk_auto`**. This
-closes the v6 dogfood
+closes the v6 dogfood gap, where the whole milestone ran at `auto` on the riskiest possible
 scope (defining the method) with no friction. The default is `auto` *for ordinary, well-tested scope*;
 high risk still earns a human gate.

 Judging *what* is high-risk stays human — the scope declares **`risk: high`** in the same `TASK.md`
-header where the
+header where the autonomy level lives, reviewed at the freeze like every header line (the engine never
 classifies scope). **Since v14 the guard is mechanical for the declared case:**
 the engine refuses the declared combination — `add.py gate` will not complete (`PASS`/`RISK-ACCEPTED`) a task whose header
 carries `risk: high` without `autonomy: conservative` (error `unguarded_high_risk_auto`; `HARD-STOP`
 always records — stopping is never blocked), and `add.py audit` flags the same code on a finished
 record whose header was tampered or whose GATE RECORD reviewer is the auto-gate — which CI enforces
 (audit-ci). The honest limit mirrors the audit's: an **undeclared** high-risk scope passes; declaring
-is the human
+is the human decision point, the engine enforces what was declared.
+</constraints>
````
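The declared-case guard is mechanical enough to sketch. This sketch makes three assumptions. First, `autonomy:` and `risk:` appear as plain `key: value` lines in `TASK.md`; the diff shows `autonomy: auto | conservative` but not the full header. Second, a missing `autonomy:` line means the v7 default, `auto`. Third, the guard returns the reject code as a string. It is an illustration, not the `add.py gate` implementation.

```python
import re

# Assumption: header fields are plain "key: value" lines somewhere in TASK.md.
HEADER_FIELD = re.compile(r"^(autonomy|risk):\s*(\S+)\s*$", re.MULTILINE)
COMPLETING = {"PASS", "RISK-ACCEPTED"}  # HARD-STOP is never blocked


def high_risk_guard(task_md, outcome):
    """Return 'unguarded_high_risk_auto' if this outcome must be refused, else None."""
    fields = dict(HEADER_FIELD.findall(task_md))
    autonomy = fields.get("autonomy", "auto")  # v7 default
    if outcome in COMPLETING and fields.get("risk") == "high" and autonomy != "conservative":
        return "unguarded_high_risk_auto"
    return None
```

Consistent with the stated honest limit, a task that never declares `risk: high` passes this guard no matter what it touches.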
package/skill/add/scope.md
CHANGED
````diff
@@ -20,7 +20,7 @@ scope drafting honors intake's classification — it never re-sizes a request:
 means one drafting pass, NOT auto-creation. Nothing is written to disk — single draft or the
 whole batch — until the human confirms. You propose; you wait.

-## Brainstorm before you draft — co-specify at milestone
+## Brainstorm before you draft — co-specify at milestone level

 Don't draft a MILESTONE.md from thin input. Run the same three-move co-specify as a
 task's §1 (`phases/1-specify.md`) — Diverge (framings + open questions) → Converge
````
````diff
@@ -31,12 +31,14 @@ Draft the WHOLE milestone before showing; nothing hits disk until the human conf
 Diverge seeds (pick the live ones):
 - **Outcome** — done means a user can do *what* they can't today? (goal sentence)
 - **Edge of scope** — nearest thing assumed IN that you want OUT? (Out list)
-- **Riskiest
+- **Riskiest decision point** — which contract, if wrong, costs the most rework? (freeze-first)
 - **Done-looks-like** — how do we SEE each outcome without reading code? (exit criteria)
 - **First slice** — which task unblocks the rest? (breadth-first order)

-Rank assumptions
-`⚠ <assumption> —
+Rank assumptions lowest-confidence first; the top 1–2 get the flag the human reads at confirm:
+`⚠ <assumption> — lowest confidence because <why>; if wrong: <cost>`. Present the draft via
+`report-template.md` — open with the ARC (goal · done · plan): the goal this milestone serves,
+what is already covered, and the plan its task list lays out.

 ## Drafting a good MILESTONE.md (section by section)

````
````diff
@@ -45,8 +47,8 @@ Rank assumptions least-sure first; the top 1–2 get the flag the human reads at
 - **Scope In/Out** — the explicit anti-creep deferral list. Naming what is OUT is as important
 as what is IN; an empty Out list usually means the scope is not yet thought through.
 - **Shared decisions & glossary deltas** — cross-cutting rules every task must honor, named from
-the glossary. New terms get a glossary entry (the
-- **Shared / risky contracts to freeze first** — the
+the glossary. New terms get a glossary entry (the living documentation stays honest).
+- **Shared / risky contracts to freeze first** — the decision points between tasks; name the owning task.
 - **Tasks (breadth-first)** — `slug · depends-on · one line` each. Decompose by deliverable, not
 by phase; keep each task one-file-sized. Order by dependency, not by guesswork.
 - **Exit criteria** — observable, and **every exit criterion maps to a declared task slug**
````
````diff
@@ -54,6 +56,7 @@ Rank assumptions least-sure first; the top 1–2 get the flag the human reads at

 ## Reject codes (emit `{ reject, rationale }`, create nothing)

+<reject_codes>
 - `not_classified` — the request has not been through intake yet. Classify it first; you cannot
 draft scope for an unclassified request.
 - `dangling_criterion` — a drafted MILESTONE.md has an exit criterion that maps to no declared
````
````diff
@@ -61,6 +64,7 @@ Rank assumptions least-sure first; the top 1–2 get the flag the human reads at
 a malformed milestone. With no engine lint, you are the first check and the human is the backstop.
 - `no_milestone` — intake routed the request to `task` or `change-request`; scope drafting
 creates NO milestone. Honor the classification; do not invent milestone-sized scope.
+</reject_codes>

 ## Worked example (from this repo's own history)

````
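The `dangling_criterion` reject is a set-membership check. This sketch assumes two things: each exit criterion has already been paired with the slug it claims, and task lines follow the `slug · depends-on · one line` shape named in the section guide. `check_milestone` is a hypothetical name. The returned dict mirrors the `{ reject, rationale }` shape.

```python
def declared_slugs(task_lines):
    """Task lines look like 'slug · depends-on · one line'; keep the slug field."""
    return {line.split("·")[0].strip() for line in task_lines if line.strip()}


def check_milestone(criteria, task_lines):
    """criteria: exit criterion -> slug it claims. Returns a reject dict or None."""
    slugs = declared_slugs(task_lines)
    dangling = [c for c, slug in criteria.items() if slug not in slugs]
    if dangling:
        return {
            "reject": "dangling_criterion",
            "rationale": "no declared task for: " + "; ".join(dangling),
        }
    return None


tasks = ["login · - · email+password login", "reset · login · password reset flow"]
print(check_milestone({"a user can log in": "login", "an audit trail exists": "audit"}, tasks))
```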
````diff
@@ -1,11 +1,11 @@
 # Setup review — the one page the human signs

-Autonomous setup ends at a single human gate: the **
+Autonomous setup ends at a single human gate: the **baseline approval** (`add.py lock`). Before that
 signature is honest, the human needs to see *what you drafted and how sure you were* — not re-derive
 it. `SETUP-REVIEW.md` is that page: every decision you made while drafting the foundation, first-scope,
-and the first contract, **ordered
+and the first contract, **ordered lowest-confidence-first** so the riskiest guesses meet their eye first.

-This is the setup-
+This is the setup-level analog of presenting a task's specification bundle lowest-confidence-first at the contract freeze.
 The engine never reads this file — `add.py lock` is judgment-free, the signature *is* the gate (see
 `setup-lock-state`). The human **reading** this page is the review; your job is to make the reading honest.

````
````diff
@@ -13,7 +13,7 @@ The engine never reads this file — `add.py lock` is judgment-free, the signatu

 Write **one** artifact at `.add/SETUP-REVIEW.md`. **Never clobber a human-edited one** — if it already
 exists with hand edits, append/update, don't overwrite (the same non-clobber rule `init` applies to
-
+living docs). It is a per-onboarding, setup-level artifact; it sits beside `PROJECT.md`, not under a task.

 ## The template

````
````diff
@@ -27,14 +27,15 @@ survivors). It is a per-onboarding, setup-altitude artifact; it sits beside `PRO
 | 1 | <the drafted decision> | PROJECT.md \| scope \| first-contract | `guessed` | <the inference + why you had to guess> |
 | 2 | <…> | <…> | `evidence-grounded` | <cite the source file/line you read it from> |

-Sign:
+Sign: confirm in chat → the agent runs `add.py lock --by "<name>"` (typing it yourself works too)
 ```

-Rows are numbered for reference at the gate ("row 1 is
+Rows are numbered for reference at the gate ("row 1 is where my confidence is lowest").

 ## The two rules that make it honest

-
+<constraints>
+1. **Lowest-confidence-first.** Order rows by confidence **ascending**. A `guessed` row always floats above an
 `evidence-grounded` one. The point is not completeness theatre — it is to spend the human's attention
 where it changes outcomes: the top of the table is the part they actually need to challenge.

````
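The lowest-confidence-first rule for `SETUP-REVIEW.md` amounts to a stable sort on the tag. Below is a hypothetical helper. It assumes only the two tags shown (`guessed`, `evidence-grounded`). It numbers rows after sorting, so row 1 is always the least certain decision.

```python
# Lower rank = lower confidence = higher in the table.
CONFIDENCE_RANK = {"guessed": 0, "evidence-grounded": 1}


def review_table(rows):
    """rows: (decision, source, tag, note) tuples in drafting order.

    Returns numbered markdown rows, lowest confidence first. sorted() is stable,
    so rows with the same tag keep their drafting order.
    """
    ordered = sorted(rows, key=lambda row: CONFIDENCE_RANK[row[2]])
    return [
        f"| {n} | {decision} | {source} | `{tag}` | {note} |"
        for n, (decision, source, tag, note) in enumerate(ordered, start=1)
    ]


for line in review_table([
    ("stack is Django 4", "PROJECT.md", "evidence-grounded", "requirements.txt line 3"),
    ("first scope is signup", "scope", "guessed", "no issue tracker to read"),
]):
    print(line)
```

An unknown tag raises `KeyError` rather than silently sinking to the bottom. For a table whose whole purpose is honest ordering, that is arguably the safer failure.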
````diff
@@ -45,13 +46,15 @@ Rows are numbered for reference at the gate ("row 1 is the one I'm least sure ab
 onboarding (a near-empty repo, only the 4-lens answers) produces these. These are what the human
 must check; that is why they sit on top.

-The tag vocabulary is shared with `adopt.md` — the brownfield map tags each filled
+The tag vocabulary is shared with `adopt.md` — the brownfield map tags each filled living-doc decision
 `guessed`/`evidence-grounded`, and those tags flow straight into this table.
+</constraints>

 ## Where it ends

-`SETUP-REVIEW.md` is **read-only context** for the
-field-by-field; you present it,
+`SETUP-REVIEW.md` is **read-only context** for the baseline approval. You do not ask the human to approve it
+field-by-field; you present it, lowest-confidence-first; they confirm in conversation, and you run the lock
+with their name:

 ```bash
 python3 .add/tooling/add.py lock --by "<name>"
````
package/skill/add/streams.md
CHANGED
````diff
@@ -11,9 +11,9 @@ orchestrator*, drive several tasks at once by reading the dependency DAG that
 ## The honest frame — this is pipelining, not N× speed

 With **one human reviewer** you cannot beat `review_time × N_tasks` (the human-led
-
+decision points are serial — `docs/10-setup-and-stages.md:91`). So the win is **not throughput**:
 it is that the reviewer is **never blocked waiting on a build**. While the human reviews
-task A's frozen
+task A's frozen bundle, the builds for B·C·D run behind *their* frozen contracts. You hide
 build latency under human latency. Do not promise more than that.

 ## The two queues
````
````diff
@@ -24,33 +24,34 @@ Compute both from one `python3 .add/tooling/add.py status` — no new state:
 `deps=` task already shows `gate=PASS`. These are the only tasks a worker may pick up.
 A task with unmet deps stays queued; a task finishing PASS unblocks its dependents on
 the next `status`.
-- **REVIEW-QUEUE** — the irreducibly serial part: the **
+- **REVIEW-QUEUE** — the irreducibly serial part: the **bundle approval** (contract
 freeze) and any **Verify escalation**. One human, one queue. Present these one at a
-time, never in a batch the human will
+time, never in a batch the human will approve without reading.

 ```
 add.py status ─► READY-QUEUE ──spawn workers──► builds run ──► REVIEW-QUEUE ──► done
-(deps=PASS?) (machine span) (concurrent) (
+(deps=PASS?) (machine span) (concurrent) (decision points,
 ▲ strictly serial)
 └──────────────── a task gating PASS unblocks its dependents ──────────────┘
 ```

-## The
+## The autonomy level is the throttle (not a new flag)

 How much concurrency you actually get is set by each task's `autonomy:` header
 (`run.md`), not by this rubric:

 | `autonomy` (TASK.md) | What serializes on the human | Concurrency |
 |----------------------|------------------------------|-------------|
-| `conservative` |
-| `auto` (default) |
+| `conservative` | bundle approval **+** every Verify | pure pipelining — builds overlap, both gates queue |
+| `auto` (default) | bundle approval **only**; Verify auto-PASSes on evidence | real concurrency — only the decision point + residue escalations queue |
 | `auto` but **high-risk** | refused → forced `conservative` (`unguarded_high_risk_auto`) | back to pipelining, by design |

-The irreducible floor is **one human approval per task at the contract
+The irreducible floor is **one human approval per task at the contract decision point** — the decision point
 never drops to zero (`run.md:22`). That floor is correct; do not engineer around it.

 ## Who writes what — the hard boundary

+<constraints>
 - **You (orchestrator)** own all shared writes: `MILESTONE.md`, and every
 `add.py advance <slug>` / `add.py gate <outcome> <slug>` call. **Always pass the explicit
 `<slug>`** — `advance`/`gate`/`phase` all take an optional task slug and act on it
````
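Computing the READY-QUEUE is a filter over the dependency map. The line format parsed here (`<slug> gate=<outcome> deps=<a,b>`) is an assumption made for illustration. The real `add.py status` output is not shown in this diff, so treat `parse_status` as a placeholder to adapt. `ready_queue` is the part that follows the rubric: a task is ready when it has not yet passed and every dependency already shows `gate=PASS`.

```python
def parse_status(text):
    """Parse 'slug gate=<outcome> deps=a,b' lines (an assumed, illustrative format)."""
    tasks = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        fields = dict(p.split("=", 1) for p in parts[1:] if "=" in p)
        deps = [d for d in fields.get("deps", "").split(",") if d]
        tasks[parts[0]] = {"gate": fields.get("gate", ""), "deps": deps}
    return tasks


def ready_queue(tasks):
    """Unfinished tasks whose every dependency already shows gate=PASS."""
    return sorted(
        slug
        for slug, t in tasks.items()
        if t["gate"] != "PASS"
        and all(tasks.get(dep, {}).get("gate") == "PASS" for dep in t["deps"])
    )


tasks = parse_status("auth gate=PASS deps=\nprofile gate=- deps=auth\nbilling gate=- deps=auth,profile\n")
print(ready_queue(tasks))
```

Re-running the same two functions after a task gates PASS is what "unblocks its dependents on the next `status`" looks like. No extra state is kept between calls.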
@@ -62,21 +63,70 @@ never drops to zero (`run.md:22`). That floor is correct; do not engineer around
|
|
|
62
63
|
- **Isolation**: spawn each worker with `isolation="worktree"` so concurrent builds
|
|
63
64
|
cannot collide. The worktree is discarded on failure; the task resets to its last-good
|
|
64
65
|
phase.
|
|
66
|
+
</constraints>
|
|
65
67
|
|
|
66
68
|
## Design for failure (required)
|
|
67
69
|
|
|
68
70
|
- **Fresh worktree base (verify base == HEAD)** — create each worker's worktree from current
|
|
69
|
-
`HEAD` **after** you commit the task's frozen
|
|
71
|
+
`HEAD` **after** you commit the task's frozen specification bundle (spec · scenarios · contract · tests). A
|
|
70
72
|
worktree forked from a stale base forces the worker to recreate the frozen artifacts by hand
|
|
71
73
|
(the v10 dogfood hit exactly this). Before the worker starts, confirm `git -C <worktree>
|
|
72
74
|
rev-parse HEAD` equals the orchestrator's `HEAD`; if it drifted, `git merge` the base in first.
|
|
73
|
-
- **Lease + timeout** — record which worker holds which task
|
|
74
|
-
the claim back to READY (re-spawn, do not assume partial work is sound).
|
|
75
|
+
- **Lease + timeout** — record which worker holds which task (in the wave ledger, below);
|
|
76
|
+
if a worker dies, release the claim back to READY (re-spawn, do not assume partial work is sound).
|
|
75
77
|
- **Failure isolates** — a worker that hits a STOP-and-escalate (below) blocks only its
|
|
76
78
|
own task. Siblings keep running; the escalation joins the REVIEW-QUEUE.
|
|
77
79
|
- **Circuit-breaker** — if N workers fail in a wave, stop fanning out and fall back to
|
|
78
80
|
sequential. Repeated failure means the scope was wrong, not the parallelism.
|
|
79
81
|
|
|
82
|
+
## Wave ledger — the wave's resume point
|
|
83
|
+
|
|
84
|
+
A single task resumes from `state.json`; a wave used to resume from nothing — the
|
|
85
|
+
task ↔ lease ↔ fork-base ↔ autonomy ↔ merge-order mapping lived only in the orchestrator's
|
|
86
|
+
chat context, and the v12-1 recurrence proved that discipline without an artifact fails
|
|
87
|
+
(the base check existed in prose and never ran). The ledger fixes both: it is the file you
|
|
88
|
+
re-orient from, and its evidence cells cannot be filled without executing the checks.
|
|
89
|
+
|
|
90
|
+
**The file** — `.add/milestones/<m>/WAVE.md`, orchestrator-owned like `MILESTONE.md` and
|
|
91
|
+
`state.json`. ONE live wave per milestone at a time; opening a second while one is live is
|
|
92
|
+
refused (`wave_already_live`). **Workers never read WAVE.md** — the orchestrator copies the
|
|
93
|
+
relevant mid-wave decisions into each worker's PROMPT.md at spawn/respawn, so the worker
|
|
94
|
+
contract below stays unchanged and no worker widens into sibling state.
|
|
95
|
+
|
|
96
|
+
```markdown
|
|
97
|
+
# WAVE.md — transient wave ledger (orchestrator-owned · one live wave per milestone)
|
|
98
|
+
wave: <n> · opened: <date> · status: live|merging
|
|
99
|
+
base: <orchestrator HEAD at spawn — the sha every fork must equal>
|
|
100
|
+
|
|
101
|
+
### Roster (lease ledger)
|
|
102
|
+
| task | lease (worker) | fork-base (pasted) | autonomy | spawned | timeout |
|
|
103
|
+
|--------|----------------|---------------------------------------------|----------|---------|---------|
|
|
104
|
+
| <slug> | wt-a | <paste `git -C <wt> rev-parse HEAD` output> | auto | <time> | <dur> |
|
|
105
|
+
|
|
106
|
+
### Mid-wave decisions
|
|
107
|
+
- <date> <decision a later or respawned worker must honor — copy it into that worker's PROMPT.md>
|
|
108
|
+
|
|
109
|
+
### Merge order (serial; integration Verify per merge)
|
|
110
|
+
1. <slug> → 2. <slug>
|
|
111
|
+
```
|
|
112
|
+
|
|
113
|
+
**Evidence cells, not ticks.** The fork-base cell holds the PASTED output of
|
|
114
|
+
`git -C <worktree> rev-parse HEAD`, and it must equal `base:`. A tick is not evidence; a row
|
|
115
|
+
you can only fill by running the command is the fresh-worktree-base check EXECUTING — the
|
|
116
|
+
v12-1 lesson (words-exist ≠ method-works) closed structurally. Spawning a worker whose roster
|
|
117
|
+
row lacks that evidence is refused (`unverified_fork_base`).
+
+**Lifecycle — open → consume → digest → delete.** Open the ledger when the first worker
+spawns. The serial integration Verify consumes it (the merge order is read from it, one
+worktree at a time). At wave close, absorb the evidence digest — wave base · roster→fork-base
+evidence · merge order · integration-Verify outcome — into `MILESTONE.md` as an append-only
+`## Wave log` block (this is the integration-Verify *record*, previously homeless), and only
+then remove the file. Removing WAVE.md before the digest is absorbed is refused
+(`digest_not_absorbed`) — the proof the checks ran must outlive the file.
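Read as a state machine, the lifecycle has two guarded transitions: opening while a wave is live, and deleting before the digest lands. The sketch below keeps one milestone in memory; `Milestone`, its methods, the `run_one` callback (assumed to merge one worktree and run its integration Verify, human checks included), and the digest text are all hypothetical. A real orchestrator would write `WAVE.md` and the `## Wave log` block to disk.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


class Refused(Exception):
    """Hypothetical refusal carrying a ledger code."""

    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.code = code


@dataclass
class Milestone:
    """One milestone in memory: its live WAVE.md (if any) and its `## Wave log`."""

    wave_log: list = field(default_factory=list)  # append-only digests (MILESTONE.md)
    wave: Optional[dict] = None                   # live ledger, None when no wave is open
    digest_absorbed: bool = False

    def open_wave(self, n: int, base: str) -> None:
        if self.wave is not None:  # one live wave per milestone
            raise Refused("wave_already_live")
        self.wave = {"n": n, "base": base, "roster": {}, "merge_order": [], "verify": None}
        self.digest_absorbed = False

    def integration_verify(self, run_one: Callable[[str], bool]) -> None:
        # Consume: walk the ledger's merge order serially, stopping at the first failure.
        for slug in self.wave["merge_order"]:
            if not run_one(slug):
                self.wave["verify"] = f"failed at {slug}"
                return
        self.wave["verify"] = "passed"

    def absorb_digest(self) -> None:
        w = self.wave
        self.wave_log.append(
            f"wave {w['n']} · base {w['base']} · roster {w['roster']} · "
            f"order {' → '.join(w['merge_order'])} · verify {w['verify']}"
        )
        self.digest_absorbed = True

    def delete_wave(self) -> None:
        if not self.digest_absorbed:  # the proof the checks ran must outlive the file
            raise Refused("digest_not_absorbed")
        self.wave = None
```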
+
+**Resume rule.** On session start, a live WAVE.md is the wave's resume point: re-orient from
+the file — roster, bases, decisions, merge order — never from conversational memory.
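The resume rule only works if the ledger can be read back without conversational memory. A hypothetical parser for the template's layout follows; it assumes the exact section headings, the `base:` line, and the roster column order shown in the template, and it is not a loader the package ships. `unverified_rows` then lists roster tasks whose pasted fork-base differs from `base:`.

```python
import re


def load_wave(text: str) -> dict:
    """Rebuild wave state from a WAVE.md laid out like the template (hypothetical parser)."""
    state = {"base": None, "roster": [], "decisions": [], "merge_order": []}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("base:"):
            state["base"] = line.split(":", 1)[1].strip()
        elif line.startswith("### "):
            # "### Roster (lease ledger)" -> "roster"; "### Merge order (...)" -> "merge order"
            section = line[4:].split(" (")[0].strip().lower()
        elif section == "roster" and line.startswith("|"):
            cells = [c.strip() for c in line.strip("|").split("|")]
            if cells[0] == "task" or set(cells[0]) <= set("-:"):
                continue  # header row or separator row
            state["roster"].append({"task": cells[0], "lease": cells[1], "fork_base": cells[2]})
        elif section == "mid-wave decisions" and line.startswith("- "):
            state["decisions"].append(line[2:])
        elif section == "merge order" and line:
            # "1. task-a → 2. task-b" -> ["task-a", "task-b"]
            state["merge_order"] = re.findall(r"\d+\.\s*([^\s→]+)", line)
    return state


def unverified_rows(state: dict) -> list:
    """Roster tasks whose pasted fork-base does not equal the wave's base."""
    return [row["task"] for row in state["roster"] if row["fork_base"] != state["base"]]
```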
+
 ## Merge is serial — integration Verify
 
 Parallel build, **serial integration**. After workers return, you merge the worktrees
@@ -85,8 +135,8 @@ checks that `run.md:102` says automation cannot judge. Two green tasks in isolat
 still conflict when merged; this step is where that surfaces. Never auto-pass it.
 
 Each worktree carries a full copy of `.add/`. Merge back **only** `src/`, `tests/`, and the
-worker's own `.add/tasks/<slug>/` (TASK.md · SUMMARY.md) — `.add/state.json`
-`
+worker's own `.add/tasks/<slug>/` (TASK.md · SUMMARY.md) — `.add/state.json`, `MILESTONE.md`,
+and the live `WAVE.md` stay orchestrator-owned, or a parallel merge will drag stale state back.
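The merge-back rule amounts to a path allowlist. The sketch assumes the orchestrator already has the list of paths a worker changed (for example from `git diff --name-only` against the wave base); `merge_back_paths` is a hypothetical helper, and it admits everything under the worker's own `.add/tasks/<slug>/` rather than only TASK.md and SUMMARY.md. Comparing whole path components, and rejecting any `..` segment, keeps `srcx/` or `src/../.add/state.json` from passing as `src/`.

```python
from pathlib import PurePosixPath


def merge_back_paths(changed: list, slug: str) -> tuple:
    """Split a worktree's changed paths into (merge back, leave for the orchestrator)."""
    roots = [PurePosixPath("src"), PurePosixPath("tests"), PurePosixPath(".add", "tasks", slug)]
    keep, drop = [], []
    for p in changed:
        path = PurePosixPath(p)
        # Whole-component match: "srcx/a.py" is not under "src", and ".." is never trusted.
        inside = ".." not in path.parts and any(
            path == root or root in path.parents for root in roots
        )
        # Dropped paths include state.json, MILESTONE.md, WAVE.md and sibling task dirs.
        (keep if inside else drop).append(p)
    return keep, drop
```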
 
 ## The worker contract — portable across coding agents
 
@@ -107,7 +157,7 @@ changes. Fill every `{{...}}` per stream. The ADD-specific value is `<touch_boun
 Execute the LOCKED dynamic run for task '{{TASK_SLUG}}' in milestone {{MILESTONE}}:
 drive §4 TESTS red→green against the FROZEN contract {{CONTRACT_VERSION}}, converge, and
 resolve verify per autonomy={{AUTONOMY}}. You own ONLY the machine-led span — the two human
-
+decision points (bundle approval · escalated Verify) are NOT yours.
 </objective>
 
 <persona>
@@ -126,7 +176,7 @@ Self-Eval; if any < 0.9, refine before returning.
 <touch_boundary> <!-- from run.md:56-73; the worker's contract, identical on every runner -->
 MAY: rewrite code in src/ · drive tests green WITHOUT weakening them · gather verify evidence.
 MUST NOT: edit the frozen CONTRACT or locked scope · weaken/delete/skip any test ·
-touch §1–§3
+touch §1–§3 bundle artifacts · write MILESTONE.md / state.json / any sibling stream.
 STOP-and-escalate (return your findings; do not decide):
 • a discovered scope/contract gap → backward-correction, reopen Specify (principle 4)
 • any SECURITY finding → HARD-STOP, always
@@ -156,7 +206,7 @@ ripgrep otherwise. Design every IO path for failure — timeouts, retries, rollb
 <return> <!-- the worker PROPOSES; the orchestrator RECORDS. A worker never runs add.py. -->
 End with a structured verdict AND write the same into SUMMARY.md in the task dir:
 { task, outcome: PASS|RISK-ACCEPTED|HARD-STOP|ESCALATE, evidence: <tests+coverage>,
-residue: [security|concurrency|architecture findings], deltas: [open
+residue: [security|concurrency|architecture findings], deltas: [open lessons learned] }.
 Do NOT touch add.py or any shared file — the orchestrator gates on your verdict.
 </return>
 ```
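Because the orchestrator gates on the worker's verdict, it is worth checking the verdict's shape before recording it. A minimal validator follows, assuming the verdict arrives as a JSON object carrying the five keys named in `<return>` and that each `residue` entry is an object with a `kind` field (the prompt names the residue categories but not their encoding). `parse_verdict` is hypothetical; its last check mirrors the prompt's rule that any security finding is a HARD-STOP.

```python
import json

REQUIRED = {"task", "outcome", "evidence", "residue", "deltas"}
OUTCOMES = {"PASS", "RISK-ACCEPTED", "HARD-STOP", "ESCALATE"}
RESIDUE_KINDS = {"security", "concurrency", "architecture"}


def parse_verdict(raw: str, expected_task: str) -> dict:
    """Validate a worker's proposed verdict before the orchestrator records it."""
    verdict = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(verdict, dict):
        raise ValueError("verdict must be a JSON object")
    missing = REQUIRED - set(verdict)
    if missing:
        raise ValueError(f"verdict missing keys: {sorted(missing)}")
    if verdict["task"] != expected_task:
        raise ValueError(f"verdict is for {verdict['task']!r}, expected {expected_task!r}")
    if verdict["outcome"] not in OUTCOMES:
        raise ValueError(f"unknown outcome {verdict['outcome']!r}")
    kinds = [f.get("kind") if isinstance(f, dict) else None for f in verdict["residue"]]
    if any(kind not in RESIDUE_KINDS for kind in kinds):
        raise ValueError(f"unknown residue kind in {kinds!r}")
    # "any SECURITY finding -> HARD-STOP, always": refuse a softer outcome.
    if "security" in kinds and verdict["outcome"] != "HARD-STOP":
        raise ValueError("a security finding must come back as HARD-STOP")
    return verdict
```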
@@ -169,7 +219,7 @@ The contract is identical whichever model runs it (the model is disposable, like
 | Tier | When | Claude Code | Any other runner |
 |------|------|-------------|------------------|
 | **mid** | ordinary, well-tested scope; clear contract | `sonnet` | the runner's balanced model |
-| **top** | complex / ambiguous / cross-cutting /
+| **top** | complex / ambiguous / cross-cutting / broad scope of impact | `opus` | the runner's strongest reasoning model |
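The tier table reduces to a lookup: any one top-tier signal selects `top`, otherwise `mid`. The `Scope` flags are hypothetical inputs (the text does not say how an orchestrator detects complexity or breadth of impact), and only the Claude Code column has concrete ids, `sonnet` and `opus`; other runners would map the tier to their own model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scope:
    """Hypothetical signals read off a task's bundle."""

    is_complex: bool = False
    ambiguous: bool = False
    cross_cutting: bool = False
    broad_impact: bool = False


# Concrete ids exist only for the Claude Code column of the table.
CLAUDE_CODE_MODEL = {"mid": "sonnet", "top": "opus"}


def pick_tier(scope: Scope) -> str:
    # Any single top-tier signal lifts the task out of "mid" (ordinary scope, clear contract).
    if scope.is_complex or scope.ambiguous or scope.cross_cutting or scope.broad_impact:
        return "top"
    return "mid"


def claude_code_model(scope: Scope) -> str:
    return CLAUDE_CODE_MODEL[pick_tier(scope)]
```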
 
 Two rules sit **above** model choice and never bend:
 - **High-risk ⇒ `conservative` autonomy, regardless of model** (`run.md` high-risk guard). A
@@ -186,7 +236,7 @@ worktree, then points the agent at that directory.
 |-----------|----------|----------------------------------|-----------------------------------------------|
 | spawn a worker | prompt + label | `Task(description=…, prompt=…)` | `cd $WT && <agent> run --prompt-file PROMPT.md` |
 | pick the model | tier → id | `model="opus"\|"sonnet"` | a `--model <id>` flag |
-| isolate | worktree | `isolation="worktree"` | `git worktree add $WT HEAD` (after committing the
+| isolate | worktree | `isolation="worktree"` | `git worktree add $WT HEAD` (after committing the bundle; verify base == HEAD), then run inside it |
 | load context | files / cwd | `<context_files>` + repo cwd | run inside `$WT`; paths are relative |
 | domain expertise | skill / preamble | a Claude skill in `<expertise>` | a system-prompt / profile preamble |
 | return a verdict | structured | final message (optionally a schema) | stdout JSON the orchestrator parses |