@pilotspace/add 1.0.0

Files changed (53)
  1. package/GETTING-STARTED.md +238 -0
  2. package/LICENSE +20 -0
  3. package/README.md +106 -0
  4. package/bin/cli.js +131 -0
  5. package/docs/00-introduction.md +46 -0
  6. package/docs/01-principles.md +71 -0
  7. package/docs/02-the-flow.md +93 -0
  8. package/docs/03-step-1-specify.md +117 -0
  9. package/docs/04-step-2-scenarios.md +78 -0
  10. package/docs/05-step-3-contract.md +78 -0
  11. package/docs/06-step-4-tests.md +71 -0
  12. package/docs/07-step-5-build.md +80 -0
  13. package/docs/08-step-6-verify.md +63 -0
  14. package/docs/09-the-loop.md +43 -0
  15. package/docs/10-setup-and-stages.md +75 -0
  16. package/docs/11-governance.md +87 -0
  17. package/docs/12-roles.md +99 -0
  18. package/docs/13-adoption.md +67 -0
  19. package/docs/14-foundation.md +121 -0
  20. package/docs/README.md +70 -0
  21. package/docs/add-competencies.png +0 -0
  22. package/docs/add-flow.png +0 -0
  23. package/docs/add-foundation.png +0 -0
  24. package/docs/add-hierarchy.png +0 -0
  25. package/docs/appendix-a-templates.md +88 -0
  26. package/docs/appendix-b-prompts.md +119 -0
  27. package/docs/appendix-c-glossary.md +85 -0
  28. package/docs/appendix-d-worked-example.md +152 -0
  29. package/docs/appendix-e-checklists.md +80 -0
  30. package/docs/appendix-f-requirements-matrix.md +170 -0
  31. package/package.json +47 -0
  32. package/skill/add/SKILL.md +118 -0
  33. package/skill/add/deltas.md +69 -0
  34. package/skill/add/fold.md +66 -0
  35. package/skill/add/intake.md +49 -0
  36. package/skill/add/phases/0-setup.md +35 -0
  37. package/skill/add/phases/1-specify.md +55 -0
  38. package/skill/add/phases/2-scenarios.md +36 -0
  39. package/skill/add/phases/3-contract.md +41 -0
  40. package/skill/add/phases/4-tests.md +37 -0
  41. package/skill/add/phases/5-build.md +38 -0
  42. package/skill/add/phases/6-verify.md +39 -0
  43. package/skill/add/phases/7-observe.md +32 -0
  44. package/skill/add/run.md +152 -0
  45. package/skill/add/scope.md +58 -0
  46. package/tooling/add.py +1573 -0
  47. package/tooling/templates/CONVENTIONS.md.tmpl +8 -0
  48. package/tooling/templates/GLOSSARY.md.tmpl +3 -0
  49. package/tooling/templates/MILESTONE.md.tmpl +25 -0
  50. package/tooling/templates/MODEL_REGISTRY.md.tmpl +6 -0
  51. package/tooling/templates/PROJECT.md.tmpl +42 -0
  52. package/tooling/templates/TASK.md.tmpl +111 -0
  53. package/tooling/templates/dependencies.allowlist.tmpl +2 -0
package/skill/add/run.md
@@ -0,0 +1,152 @@
1
+ # The dynamic run — executing a locked scope
2
+
3
+ Once a task's CONTRACT is frozen (phase 3), the scope is *locked*: the external shape will not move.
4
+ That lock is ADD's autonomy seam — below it code is disposable; above it nothing breaks. This rubric
5
+ covers what runs on the far side of the seam: the **build->verify half, executed as a dynamic,
6
+ self-improving run** instead of a manual, sequential build. The human-led FRONT (Specify · Scenarios
7
+ · Contract) still owns *direction*, but v7 compresses it to a **single human approval at the seam**
8
+ (see "The one-approval front" below): the AI drafts the whole front, and a human approves it once.
9
+
10
+ > **Self-improving = within-run convergence + emit v5 deltas** — same definition as v5: tracked,
11
+ > evidence-backed, never autonomous training. The run converges in-turn AND feeds the human-gated
12
+ > fold loop (`deltas.md` · `fold.md`). The engine stays judgment-free: this is a rubric, not `add.py`.
13
+
14
+ ## The one-approval front (v7)
15
+
16
+ The human-led front used to be three separate approvals — Specify, then Scenarios, then the Contract
17
+ freeze. v7 compresses it to **one**. From the user's input the AI **drafts the whole front as a single
18
+ bundle** — the Spec, the Scenarios, the Contract, and the failing Tests — and presents it together. The
19
+ human gives **one approval, at the frozen contract** (the seam). That single approval is the green light
20
+ for the self-driving run.
21
+
22
+ Why one approval and not zero: the contract freeze is the autonomy seam, and the seam **stays human**.
23
+ The AI *drafts* the contract but never *freezes its own* — a person approves the frozen shape before any
24
+ auto-run touches code. This is exactly what keeps "never self-gate a human-led gate" true under an auto
25
+ default: the one gate that remains is human. Drop it to zero and the AI would freeze the interface it
26
+ then builds against and self-gate the result — the circular trust v6's dogfood warned against.
27
+
28
+ What the human is actually approving in that one gate: that the drafted Spec captures the real intent,
29
+ that the Scenarios cover the cases that matter, and that the Contract shape is the one to freeze. Reject
30
+ any part and the bundle goes back to draft — that is backward-correction (principle 4), not failure.
31
+ Approve, and the run begins.
32
+
33
+ **The least-sure flag — aiming the one approval.** A single approval over a whole bundle invites a
34
+ rubber stamp. So the AI presents the bundle **least-sure first**: of everything it is asking the human
35
+ to freeze, it names the **1–2 points most likely to be wrong**, tagged by part
36
+ (`⚠ [spec|scenario|contract|test] … — because …; if wrong: …`), each with *why* it is uncertain and
37
+ *what it costs if wrong*. The §1 assumptions feed it, but a flag may equally point at an uncovered
38
+ scenario or the contract shape. If nothing is materially uncertain, the AI still names the single
39
+ biggest risk, however small, and never a blank "none". The flag has an honest limit: it records that the
40
+ human approved with the soft spots **in front of them**, eyes open; it makes a real review cheap and a
41
+ lazy one visibly negligent, but it cannot *force* engagement — and the AI never asserts that the human
42
+ engaged when it cannot know (a self-asserted gate would just be the rubber stamp one level up). Closing
43
+ that enforcement gap is the job of a CI checker, not of prose.
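The flag template above can be sketched as a small formatter. This is a minimal illustration and not part of `add.py`: the `LeastSureFlag` class, its field names, and `present_flags` are hypothetical. Only the four part tags, the `because` / `if wrong` shape, and the "one or two points, never a blank none" rule come from the text.

```python
from __future__ import annotations

from dataclasses import dataclass

PARTS = {"spec", "scenario", "contract", "test"}
DASH = "\u2014"  # the dash used in the flag template above


@dataclass(frozen=True)
class LeastSureFlag:
    part: str      # which part of the bundle the doubt is about
    point: str     # what might be wrong
    because: str   # why the AI is uncertain
    if_wrong: str  # what it costs if the point is wrong

    def render(self) -> str:
        if self.part not in PARTS:
            raise ValueError(f"unknown part: {self.part!r}")
        return (f"\u26a0 [{self.part}] {self.point} {DASH} "
                f"because {self.because}; if wrong: {self.if_wrong}")


def present_flags(flags: list[LeastSureFlag]) -> list[str]:
    """Render the one or two least-sure points; refuse an empty list."""
    if not 1 <= len(flags) <= 2:
        raise ValueError("name one or two least-sure points, never zero")
    return [flag.render() for flag in flags]
```

Refusing an empty list mirrors the rule that the AI still names its single biggest risk when nothing is materially uncertain. The sketch cannot check whether the human actually read the flags, which is the limit the text itself admits.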
44
+
45
+ ## When the run begins — the scope-lock trigger
46
+
47
+ The trigger is the **frozen contract**, nothing else. A run may start only when:
48
+
49
+ - §3 CONTRACT is marked `FROZEN @ vN` (the shape is fixed), AND
50
+ - §4 TESTS exist and are RED for the right reason (the target the run drives to green).
51
+
52
+ No frozen contract -> no run: you are still on the human-led front, and starting early is the
53
+ forward-skip the flow forbids. The lock is what makes autonomous execution *safe* — the AI cannot
54
+ drift the interface, because the interface is frozen above it.
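The two trigger conditions above can be read as a single predicate. A minimal sketch, assuming the contract status is available as a plain string and test states as `"red"` / `"green"` labels; the function name and both parameters are hypothetical, and nothing here is an `add.py` API.

```python
from __future__ import annotations

import re

# Matches the status marker quoted above, e.g. "FROZEN @ v3".
FROZEN_RE = re.compile(r"^FROZEN @ v\d+$")


def may_start_run(contract_status: str, test_results: dict[str, str]) -> bool:
    """Return True only when the contract is frozen and the tests are red.

    Whether a test is red "for the right reason" is a judgment this sketch
    cannot make. It only checks that tests exist and that every one is red,
    which is an assumption about how strictly to read the rule.
    """
    frozen = bool(FROZEN_RE.match(contract_status.strip()))
    has_tests = len(test_results) > 0
    all_red = all(state == "red" for state in test_results.values())
    return frozen and has_tests and all_red
```

A draft contract or an empty test set both return `False`, which corresponds to "no frozen contract, no run".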
55
+
56
+ ## The touch-boundary — what the run may and may not touch
57
+
58
+ A locked run has a hard boundary. It MAY:
59
+
60
+ - write and rewrite **code** (`src/`) — code is disposable below the seam;
61
+ - drive the **tests** to green WITHOUT weakening them (a weakened test is a method violation);
62
+ - gather **evidence** for the verify gate (test output, blind-spot checks).
63
+
64
+ It MUST NOT:
65
+
66
+ - change the **frozen contract** or the **locked scope** — a discovered gap is backward-correction:
67
+ the run STOPS and hands back to a human to reopen Specify (principle 4). The run never re-locks
68
+ scope on its own.
69
+ - weaken, delete, or skip a **test** to make the build pass (that inverts the method).
70
+ - touch the **human-led front artifacts** (§1–§3) except to halt and escalate.
71
+
72
+ Crossing the boundary is not a fast run; it is an unverified one. When the run hits something only the
73
+ front can resolve, it stops — and that stop is the loop working, not failing.
74
+
75
+ ## The dynamic run — fan-out and in-run convergence
76
+
77
+ Once it starts, the run does not crawl the build in one linear pass. It **fans out** the independent
78
+ work — several build attempts, several test-fix loops, several checks at once — and then **converges**
79
+ on a trustworthy result with three loops:
80
+
81
+ - **loop-until-dry** — keep hunting failures and gaps until N consecutive passes find nothing new.
82
+ Stopping at the first green is how defects survive; the run stops only when the well runs dry.
83
+ - **adversarial verify** — for every "done" claim, an independent skeptic tries to REFUTE it. The
84
+ claim survives only if it withstands refutation, not because one pass looked plausible.
85
+ - **completeness-critic** — a final pass that asks "what did we NOT cover — a scenario, a blind-spot,
86
+ an unstated assumption?" Whatever it finds re-enters the run.
87
+
88
+ The run ends only when the loops go dry AND the auto-gate's evidence is satisfied. This is the run
89
+ **self-improving within the turn** — the same convergence the foundation loop runs across milestones,
90
+ compressed into one task.
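The loop-until-dry rule above can be sketched as follows. This is an illustration under stated assumptions: `find_issues` stands in for whatever pass the run performs (tests, skeptic, critic), `dry_passes` plays the role of N, and `max_passes` is an added safety cap that the text does not mention.

```python
from __future__ import annotations

from typing import Callable, Iterable


def loop_until_dry(find_issues: Callable[[], Iterable[str]],
                   dry_passes: int = 2,
                   max_passes: int = 50) -> set[str]:
    """Repeat passes until `dry_passes` consecutive passes find nothing new."""
    seen: set[str] = set()
    consecutive_dry = 0
    for _ in range(max_passes):
        new = set(find_issues()) - seen
        if new:
            seen |= new
            consecutive_dry = 0  # a fresh finding resets the dry counter
        else:
            consecutive_dry += 1
            if consecutive_dry >= dry_passes:
                return seen
    raise RuntimeError("did not converge within max_passes")
```

Resetting the counter on any new finding is what separates this from stopping at the first green pass: one quiet pass is not enough to end the run.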
91
+
92
+ ## The evidence auto-gate
93
+
94
+ The verify gate may be resolved by **evidence** rather than by a person — when the evidence is
95
+ sufficient and the result is recorded (principle 7, reframed: an automated, recorded pass is an
96
+ explicit pass, not a skip).
97
+
98
+ - **Auto-PASS requires ALL of:** every test green; coverage not decreased; no test weakened and no
99
+ contract edited; the convergence loops dry; the completeness-critic found nothing open.
100
+ - **Always escalates to a human (never auto-passed):** any **security** finding (HARD-STOP, always);
101
+ a **concurrency**/timing risk the tests cannot exercise; an **architecture**/layering violation; and
102
+ any failing test. These are the residue principle 2 names — automation cannot judge them.
103
+ - **Records exactly one outcome** (no silent skip): `PASS` (evidence + the named run as accountable
104
+ owner) · `RISK-ACCEPTED` (non-security, signed) · `HARD-STOP`. The record states it was
105
+ auto-resolved, names the run, and lists the residue checks performed.
106
+
107
+ The auto-gate NEVER writes a human signature it did not get. An auto-PASS is logged as *auto-resolved*,
108
+ honestly — the line between a pass and a skip is the recorded outcome, not a forged name.
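The gate rules above can be sketched as one decision function. This is a hedged illustration, not the method's implementation: the `Evidence` fields, `resolve_gate`, and the record keys are hypothetical, and `"ESCALATE"` is a placeholder for "hand to a human" rather than one of the three recorded outcomes.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    """Inputs to the gate; all field names are hypothetical."""
    all_tests_green: bool
    coverage_not_decreased: bool
    no_test_weakened: bool
    contract_unedited: bool
    loops_dry: bool
    critic_open_items: int
    # Findings automation cannot judge, e.g. "security", "concurrency",
    # "architecture".
    residue: set = field(default_factory=set)


def resolve_gate(ev: Evidence, run_name: str) -> dict:
    """Return the record the gate would write for this evidence."""
    record = {"run": run_name,
              "residue_findings": sorted(ev.residue),
              "human_signature": None}  # the run never fills this in
    if "security" in ev.residue:
        return {**record, "outcome": "HARD-STOP", "auto_resolved": False}
    if ev.residue or not ev.all_tests_green:
        # What the human then records is outside this sketch.
        return {**record, "outcome": "ESCALATE", "auto_resolved": False}
    sufficient = (ev.coverage_not_decreased and ev.no_test_weakened
                  and ev.contract_unedited and ev.loops_dry
                  and ev.critic_open_items == 0)
    if sufficient:
        return {**record, "outcome": "PASS", "auto_resolved": True}
    return {**record, "outcome": "ESCALATE", "auto_resolved": False}
```

`RISK-ACCEPTED` is deliberately unreachable here because it needs a human signature, and the gate never writes one it did not get.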
109
+
110
+ ## Emitting deltas — feeding the foundation back
111
+
112
+ The completeness-critic does not discard what it finds. Every gap, surprise, or convention that helped
113
+ or hurt becomes an **`open` competency delta** in the task's OBSERVE block, in the `deltas.md` grammar,
114
+ tagged by competency:
115
+
116
+ - a finding the run FIXED but that taught the foundation something (a missing scenario -> `TDD`);
117
+ - a finding the run could NOT fix — a residue escalation -> a delta AND the escalation to a human.
118
+
119
+ These `open` deltas feed v5's human-gated fold (`fold.md`) at milestone close: the run emits `open`;
120
+ the human folds. That is the loop closing — **v6 run -> v5 foundation** — so a dynamic run sharpens the
121
+ five competencies instead of letting its findings evaporate at end-of-run.
122
+
123
+ ## The autonomy dial
124
+
125
+ How much a run may auto-gate is a **per-scope setting**, not a global switch (principle 5: trust is
126
+ earned per scope). A task declares its level in its `TASK.md` header:
127
+
128
+ ```
129
+ autonomy: auto | conservative
130
+ ```
131
+
132
+ - **auto (the default)** — the run may auto-PASS when the evidence + residue checks above are
133
+ satisfied. Security still always escalates. Under this default, a frozen contract
134
+ flips the task into a self-driving run that converges and auto-gates on evidence.
135
+ - **conservative** — the deliberate *lowering*: the run does all the work and converges, but STOPS at
136
+ the verify gate for a human. Auto-PASS is disabled. Choose it wherever evidence is thin or risk is high.
137
+
138
+ > **v7 reversal (recorded, not hidden).** Earlier the default was `conservative` and `auto` was the
139
+ > earned exception; v7 flips this — `auto` is the default, `conservative` is the deliberate lowering.
140
+ > What did **not** change is principle 5: the dial is still **per-scope**, the level still lives in the
141
+ > `TASK.md` header, and you still lower it anywhere risk demands. Only the starting point moved.
142
+
143
+ **The high-risk guard — `auto` is refused where it matters most.** The dial is not a blank cheque. On a
144
+ **high-risk or method-defining scope** — anything where a wrong-but-plausible result is expensive or
145
+ hard to reverse (auth, money, data-loss paths, the method/trust-layer itself) — `auto` must be lowered
146
+ to `conservative`; leaving it at `auto` there is the reject code **`unguarded_high_risk_auto`**. This
147
+ closes the v6 dogfood blind-spot, where the whole milestone ran at `auto` on the riskiest possible
148
+ scope (defining the method) with no friction. The default is `auto` *for ordinary, well-tested scope*;
149
+ high risk still earns a human gate.
150
+
151
+ The dial is a **rubric convention** read by the human and the run — it is **not an `add.py` flag** (the
152
+ engine stays judgment-free); the level lives in the `TASK.md` header where the run already reads.
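A reader of the dial could be sketched like this. Because the dial is a rubric convention and not an `add.py` flag, this is only an illustration of the rule. It assumes the header carries a line such as `autonomy: auto`, and it takes `high_risk` as an input, since deciding what counts as high risk is a judgment the code does not make. The function names are hypothetical.

```python
import re

AUTONOMY_RE = re.compile(r"^autonomy:\s*(auto|conservative)\s*$", re.MULTILINE)


def read_autonomy(task_md: str) -> str:
    """Return the declared level, falling back to the v7 default of auto."""
    match = AUTONOMY_RE.search(task_md)
    return match.group(1) if match else "auto"


def check_dial(task_md: str, high_risk: bool) -> dict:
    """Apply the high-risk guard to a task's declared autonomy level."""
    level = read_autonomy(task_md)
    if high_risk and level == "auto":
        return {"reject": "unguarded_high_risk_auto",
                "rationale": "high-risk scope must be lowered to conservative"}
    return {"autonomy": level}
```

For example, `check_dial("autonomy: auto\n", high_risk=True)` returns the reject record, while the same header on an ordinary scope returns `{"autonomy": "auto"}`.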
package/skill/add/scope.md
@@ -0,0 +1,58 @@
1
+ # Scope drafting — turn a classified request into a versioned MILESTONE.md
2
+
3
+ This is the **second half of intake**. `intake.md` CLASSIFIES a request into a bucket; scope
4
+ drafting turns that classified request into a confirmed, well-formed, versioned `MILESTONE.md`
5
+ through discussion. The MILESTONE.md template is the SHAPE; this rubric is HOW to fill it well.
6
+ You (the AI) **propose**; the human **confirms before anything is created**.
7
+
8
+ ## What to do per intake outcome
9
+
10
+ Scope drafting honors intake's classification; it never re-sizes a request:
11
+
12
+ | intake outcome | scope-loop action | creates (after confirm) |
13
+ |----------------|-------------------|-------------------------|
14
+ | `new-major` / `sub-milestone` | draft ONE MILESTONE.md (fill the template via discussion) | 1 milestone |
15
+ | `task` | route to `add.py new-task <slug>` (it fits the active milestone) | 0 milestones |
16
+ | `change-request` | route to SPECIFY/CONTRACT of the affected task | 0 milestones |
17
+ | `split_required` | draft ALL N items as a batch in ONE pass | N milestones/tasks |
18
+
19
+ **Confirm before create is the invariant.** It holds in the one-pass split case too: "one pass"
20
+ means one drafting pass, NOT auto-creation. Nothing is written to disk — single draft or the
21
+ whole batch — until the human confirms. You propose; you wait.
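The routing table and the confirm-before-create invariant can be sketched together. A minimal illustration: the `ROUTES` mapping restates the table, while `propose`, its return keys, and the choice to treat an unrecognized outcome as `not_classified` are assumptions of this sketch.

```python
from __future__ import annotations

# intake outcome -> (action, what is created after the human confirms)
ROUTES = {
    "new-major": ("draft one MILESTONE.md", "1 milestone"),
    "sub-milestone": ("draft one MILESTONE.md", "1 milestone"),
    "task": ("route to add.py new-task <slug>", "0 milestones"),
    "change-request": ("route to SPECIFY/CONTRACT of the affected task",
                       "0 milestones"),
    "split_required": ("draft all N items as one batch", "N milestones/tasks"),
}


def propose(outcome: str | None, confirmed: bool = False) -> dict:
    """Return the proposed action; creation is allowed only once confirmed."""
    if outcome not in ROUTES:
        return {"reject": "not_classified",
                "rationale": "classify the request through intake first"}
    action, creates = ROUTES[outcome]
    return {"action": action, "creates": creates, "may_create": confirmed}
```

`may_create` stays `False` for every outcome, the batch case included, until the caller passes `confirmed=True`.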
22
+
23
+ ## Drafting a good MILESTONE.md (section by section)
24
+
25
+ - **goal** — ONE sentence, an outcome not an output ("a user can size any request", not "write
26
+ intake.md"). If it needs an "and", it is probably two milestones.
27
+ - **Scope In/Out** — the explicit anti-creep deferral list. Naming what is OUT is as important
28
+ as what is IN; an empty Out list usually means the scope is not yet thought through.
29
+ - **Shared decisions & glossary deltas** — cross-cutting rules every task must honor, named from
30
+ the glossary. New terms get a glossary entry (the survivor layer stays honest).
31
+ - **Shared / risky contracts to freeze first** — the seams between tasks; name the owning task.
32
+ - **Tasks (breadth-first)** — `slug · depends-on · one line` each. Decompose by deliverable, not
33
+ by phase; keep each task one-file-sized. Order by dependency, not by guesswork.
34
+ - **Exit criteria** — observable, and **every exit criterion maps to a declared task slug**
35
+ (no dangling criterion). Each line answers "which task delivers this, and how would we see it?"
36
+
37
+ ## Reject codes (emit `{ reject, rationale }`, create nothing)
38
+
39
+ - `not_classified` — the request has not been through intake yet. Classify it first; you cannot
40
+ draft scope for an unclassified request.
41
+ - `dangling_criterion` — a drafted MILESTONE.md has an exit criterion that maps to no declared
42
+ task slug. FIX the draft (add the task or drop the criterion) before proposing — never propose
43
+ a malformed milestone. With no engine lint, you are the first check and the human is the backstop.
44
+ - `no_milestone` — intake routed the request to `task` or `change-request`; scope drafting
45
+ creates NO milestone. Honor the classification; do not invent milestone-sized scope.
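The `dangling_criterion` check lends itself to a small well-formedness test. A sketch under stated assumptions: the draft is represented as a set of task slugs plus a mapping from each exit criterion to the slug it claims. This representation and the function name are hypothetical, since the text notes there is no engine lint for it.

```python
from __future__ import annotations


def check_exit_criteria(task_slugs: set[str],
                        criteria: dict[str, str | None]) -> dict | None:
    """Return a reject record for the first dangling criterion, else None."""
    for text, slug in criteria.items():
        if slug not in task_slugs:
            return {"reject": "dangling_criterion",
                    "rationale": f"exit criterion {text!r} maps to no "
                                 "declared task slug"}
    return None
```

A `None` result means every criterion names a declared task. Any other result follows the `{ reject, rationale }` shape, and nothing should be proposed until the draft is fixed.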
46
+
47
+ ## Worked example (from this repo's own history)
48
+
49
+ Request: *"open the Interface & Intake milestone"* → intake classified it `sub-milestone` of the
50
+ live v4 self-driving theme → scope drafting produced **`.add/milestones/v4-1/MILESTONE.md`**:
51
+
52
+ - **goal**: make ADD harness-drivable and self-scoping — machine-readable state plus an
53
+ AI-facilitated request→versioned-milestone intake loop (the real v4-1 goal, one outcome sentence).
54
+ - **tasks** (breadth-first): `machine-state-json` · `versioning-policy` · `scope-loop`.
55
+ - **exit criteria** — each maps to its task slug: `--json` emits owner+stop (← machine-state-json),
56
+ the AI proposes a bucket with rationale (← versioning-policy), the AI drafts a versioned
57
+ MILESTONE.md via discussion (← scope-loop). Every criterion names the task that delivers it —
58
+ which is exactly the well-formedness rule above, checkable against the real file.