@pilotspace/add 1.0.0
- package/GETTING-STARTED.md +238 -0
- package/LICENSE +20 -0
- package/README.md +106 -0
- package/bin/cli.js +131 -0
- package/docs/00-introduction.md +46 -0
- package/docs/01-principles.md +71 -0
- package/docs/02-the-flow.md +93 -0
- package/docs/03-step-1-specify.md +117 -0
- package/docs/04-step-2-scenarios.md +78 -0
- package/docs/05-step-3-contract.md +78 -0
- package/docs/06-step-4-tests.md +71 -0
- package/docs/07-step-5-build.md +80 -0
- package/docs/08-step-6-verify.md +63 -0
- package/docs/09-the-loop.md +43 -0
- package/docs/10-setup-and-stages.md +75 -0
- package/docs/11-governance.md +87 -0
- package/docs/12-roles.md +99 -0
- package/docs/13-adoption.md +67 -0
- package/docs/14-foundation.md +121 -0
- package/docs/README.md +70 -0
- package/docs/add-competencies.png +0 -0
- package/docs/add-flow.png +0 -0
- package/docs/add-foundation.png +0 -0
- package/docs/add-hierarchy.png +0 -0
- package/docs/appendix-a-templates.md +88 -0
- package/docs/appendix-b-prompts.md +119 -0
- package/docs/appendix-c-glossary.md +85 -0
- package/docs/appendix-d-worked-example.md +152 -0
- package/docs/appendix-e-checklists.md +80 -0
- package/docs/appendix-f-requirements-matrix.md +170 -0
- package/package.json +47 -0
- package/skill/add/SKILL.md +118 -0
- package/skill/add/deltas.md +69 -0
- package/skill/add/fold.md +66 -0
- package/skill/add/intake.md +49 -0
- package/skill/add/phases/0-setup.md +35 -0
- package/skill/add/phases/1-specify.md +55 -0
- package/skill/add/phases/2-scenarios.md +36 -0
- package/skill/add/phases/3-contract.md +41 -0
- package/skill/add/phases/4-tests.md +37 -0
- package/skill/add/phases/5-build.md +38 -0
- package/skill/add/phases/6-verify.md +39 -0
- package/skill/add/phases/7-observe.md +32 -0
- package/skill/add/run.md +152 -0
- package/skill/add/scope.md +58 -0
- package/tooling/add.py +1573 -0
- package/tooling/templates/CONVENTIONS.md.tmpl +8 -0
- package/tooling/templates/GLOSSARY.md.tmpl +3 -0
- package/tooling/templates/MILESTONE.md.tmpl +25 -0
- package/tooling/templates/MODEL_REGISTRY.md.tmpl +6 -0
- package/tooling/templates/PROJECT.md.tmpl +42 -0
- package/tooling/templates/TASK.md.tmpl +111 -0
- package/tooling/templates/dependencies.allowlist.tmpl +2 -0
package/skill/add/run.md
ADDED
@@ -0,0 +1,152 @@

# The dynamic run — executing a locked scope

Once a task's CONTRACT is frozen (phase 3), the scope is *locked*: the external shape will not move. That lock is ADD's autonomy seam — below it code is disposable; above it nothing breaks. This rubric covers what runs on the far side of the seam: the **build->verify half, executed as a dynamic, self-improving run** instead of a manual, sequential build. The human-led FRONT (Specify · Scenarios · Contract) still owns *direction*, but v7 compresses it to a **single human approval at the seam** (see "The one-approval front" below) — the AI drafts the whole front, a human approves it once.

> **Self-improving = within-run convergence + emit v5 deltas** — same definition as v5: tracked, evidence-backed, never autonomous training. The run converges in-turn AND feeds the human-gated fold loop (`deltas.md` · `fold.md`). The engine stays judgment-free: this is a rubric, not `add.py`.

## The one-approval front (v7)

The human-led front used to be three separate approvals — Specify, then Scenarios, then the Contract freeze. v7 compresses it to **one**. From the user's input the AI **drafts the whole front as a single bundle** — the Spec, the Scenarios, the Contract, and the failing Tests — and presents it together. The human gives **one approval, at the frozen contract** (the seam). That single approval is the green light for the self-driving run.

Why one approval and not zero: the contract freeze is the autonomy seam, and the seam **stays human**. The AI *drafts* the contract but never *freezes its own* — a person approves the frozen shape before any auto-run touches code. This is exactly what keeps "never self-gate a human-led gate" true under an auto default: the one gate that remains is human. Drop it to zero and the AI would freeze the interface it then builds against and self-gate the result — the circular trust v6's dogfood warned against.

What the human is actually approving in that one gate: that the drafted Spec captures the real intent, that the Scenarios cover the cases that matter, and that the Contract shape is the one to freeze. Reject any part and the bundle goes back to draft — that is backward-correction (principle 4), not failure. Approve, and the run begins.
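
Here is a minimal sketch of that single gate. It assumes a hypothetical `Review` record that holds one human verdict per drafted part. The names are illustrative only. They are not part of `add.py`, which the rubric keeps judgment-free.

```python
from dataclasses import dataclass

# The four parts the AI drafts and presents as one bundle (illustrative names).
BUNDLE_PARTS = ("spec", "scenarios", "contract", "tests")


@dataclass
class Review:
    """One human verdict per drafted part: True = approve, False = reject."""
    verdicts: dict[str, bool]


def next_state(review: Review) -> str:
    """Return 'run' when every part is approved, else 'draft'."""
    missing = [part for part in BUNDLE_PARTS if part not in review.verdicts]
    if missing:
        # A review that skips a part is not an approval of the whole bundle.
        raise ValueError(f"review must cover every part; missing: {missing}")
    if all(review.verdicts[part] for part in BUNDLE_PARTS):
        return "run"  # the single approval at the seam: the run may start
    return "draft"  # any rejection sends the bundle back (backward-correction)
```

Rejecting a single part sends the whole bundle back, which matches the rubric: a partial approval never starts the run.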

**The least-sure flag — aiming the one approval.** A single approval over a whole bundle invites a rubber stamp. So the AI presents the bundle **least-sure first**: of everything it is asking the human to freeze, it names the **1–2 points most likely to be wrong**, tagged by part (`⚠ [spec|scenario|contract|test] … — because …; if wrong: …`), each with *why* it is uncertain and *what it costs if wrong*. The §1 assumptions feed it, but a flag may equally point at an uncovered scenario or the contract shape. If nothing is materially uncertain, the AI still names the single biggest risk, however small — never a blank "none". The flag is honest about its limit: it records that the human approved with the soft spots **in front of them**, eyes open; it makes a real review cheap and a lazy one visibly negligent, but it cannot *force* engagement — and the AI never asserts that the human engaged when it cannot know (a self-asserted gate would just be the rubber stamp one level up). Closing that enforcement gap is the job of a CI checker, not of prose.
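
The rubric leaves enforcement to a CI checker. The sketch below shows one shape such a checker could take. It assumes each flag sits on its own line of the presented bundle and follows the template above. `check_flags` and that one-line-per-flag layout are hypothetical. The check has a limit: it confirms that 1–2 well-formed flags were shown, never that the human read them.

```python
import re

# Matches one flag line: warning sign, [part], the point, an em dash (U+2014),
# then "because <why>; if wrong: <cost>". The part tags come from the rubric.
FLAG_RE = re.compile(
    r"^\u26a0\ufe0f?\s*\[(?P<part>spec|scenario|contract|test)\]\s*(?P<point>.+?)"
    r"\s*\u2014\s*because\s+(?P<why>.+?);\s*if wrong:\s*(?P<cost>.+)$"
)


def check_flags(bundle_text: str) -> list[str]:
    """Return problems with the least-sure flags; an empty list means they pass."""
    lines = [ln.strip() for ln in bundle_text.splitlines()]
    flag_lines = [ln for ln in lines if ln.startswith("\u26a0")]
    problems = []
    for ln in flag_lines:
        if FLAG_RE.match(ln) is None:
            problems.append(f"malformed flag (needs [part], 'because', 'if wrong'): {ln}")
    if not flag_lines:
        problems.append("no flag: name the single biggest risk, never a blank 'none'")
    elif len(flag_lines) > 2:
        problems.append(f"{len(flag_lines)} flags: name only the 1-2 least-sure points")
    return problems
```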

## When the run begins — the scope-lock trigger

The trigger is the **frozen contract**, nothing else. A run may start only when:

- §3 CONTRACT is marked `FROZEN @ vN` (the shape is fixed), AND
- §4 TESTS exist and are RED for the right reason (the target the run drives to green).

No frozen contract -> no run: you are still on the human-led front, and starting early is the forward-skip the flow forbids. The lock is what makes autonomous execution *safe* — the AI cannot drift the interface, because the interface is frozen above it.
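
This is a hypothetical pre-run check for the two trigger conditions. It assumes §3 carries the literal marker `FROZEN @ vN` and that the caller supplies the §4 test status. Whether the tests are RED *for the right reason* is a judgment, so the sketch takes it as an input rather than computing it.

```python
import re

# Assumed marker format in §3 CONTRACT, e.g. "FROZEN @ v3".
FROZEN_RE = re.compile(r"\bFROZEN @ v(\d+)\b")


def may_start_run(contract_section: str, tests_exist: bool,
                  tests_red_for_right_reason: bool) -> tuple[bool, str]:
    """Both trigger conditions must hold; anything else means no run."""
    frozen = FROZEN_RE.search(contract_section)
    if frozen is None:
        return False, "contract not frozen: still on the human-led front"
    if not tests_exist:
        return False, "no §4 tests: nothing for the run to drive to green"
    if not tests_red_for_right_reason:
        # A human or a test report decides this; the sketch only consumes it.
        return False, "tests are not RED for the right reason"
    return True, f"scope locked at contract v{frozen.group(1)}"
```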

## The touch-boundary — what the run may and may not touch

A locked run has a hard boundary. It MAY:

- write and rewrite **code** (`src/`) — code is disposable below the seam;
- drive the **tests** to green WITHOUT weakening them (a weakened test is a method violation);
- gather **evidence** for the verify gate (test output, blind-spot checks).

It MUST NOT:

- change the **frozen contract** or the **locked scope** — a discovered gap is backward-correction: the run STOPS and hands back to a human to reopen Specify (principle 4). The run never re-locks scope on its own.
- weaken, delete, or skip a **test** to make the build pass (that inverts the method).
- touch the **human-led front artifacts** (§1–§3) except to halt and escalate.

Crossing the boundary is not a fast run; it is an unverified one. When the run hits something only the front can resolve, it stops — and that stop is the loop working, not failing.
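
This sketch expresses the touch-boundary as a check over the run's changes. The `Change` record, the `tests/` directory, and the `removes_assertion` flag are assumptions for illustration. `removes_assertion` is only a crude proxy for a weakened test. Only `src/` and the §1–§3 front sections come from the rubric.

```python
from dataclasses import dataclass
from typing import Optional

FRONT_SECTIONS = {"§1", "§2", "§3"}  # Specify, Scenarios, Contract


@dataclass(frozen=True)
class Change:
    path: str                        # file the run touched
    section: Optional[str] = None    # TASK.md section, e.g. "§3" (assumed)
    removes_assertion: bool = False  # crude stand-in for "weakened a test"


def boundary_violations(changes: list[Change]) -> list[str]:
    """List every change that crosses the run's touch-boundary."""
    problems = []
    for c in changes:
        if c.path.endswith("TASK.md"):
            # Front artifacts are read-only to the run: halt and escalate.
            if c.section in FRONT_SECTIONS:
                problems.append(f"{c.path} {c.section}: front artifact edited")
        elif c.path.startswith("tests/"):
            if c.removes_assertion:
                problems.append(f"{c.path}: test weakened")
        elif not c.path.startswith("src/"):
            problems.append(f"{c.path}: outside src/ and tests/")
    return problems
```

Edits to `TASK.md` outside §1–§3 pass here, because the run still writes its OBSERVE-block deltas into the task file.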

## The dynamic run — fan-out and in-run convergence

Once it starts, the run does not crawl the build in one linear pass. It **fans out** the independent work — several build attempts, several test-fix loops, several checks at once — and then **converges** on a trustworthy result with three loops:

- **loop-until-dry** — keep hunting failures and gaps until N consecutive passes find nothing new. Stopping at the first green is how defects survive; the run stops only when the well runs dry.
- **adversarial verify** — for every "done" claim, an independent skeptic tries to REFUTE it. The claim survives only if it withstands refutation, not because one pass looked plausible.
- **completeness-critic** — a final pass that asks "what did we NOT cover — a scenario, a blind-spot, an unstated assumption?" Whatever it finds re-enters the run.

The run ends only when the loops go dry AND the auto-gate's evidence is satisfied. This is the run **self-improving within the turn** — the same convergence the foundation loop runs across milestones, compressed into one task.
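
Here is a minimal sketch of loop-until-dry, with `hunt` and `fix` as hypothetical callables. A pass counts as dry only when the hunt returns no open findings. This is slightly stricter than "nothing new", so an unfixed failure can never count as dry. The rubric leaves N open, which makes `dry_passes=2` an arbitrary default. `max_passes` is an added safety valve.

```python
from typing import Callable, Iterable


def loop_until_dry(hunt: Callable[[], Iterable[str]],
                   fix: Callable[[str], None],
                   dry_passes: int = 2,
                   max_passes: int = 50) -> list[str]:
    """Hunt and fix until `dry_passes` consecutive hunts return nothing."""
    handled: list[str] = []
    consecutive_dry = 0
    for _ in range(max_passes):
        findings = list(hunt())
        if not findings:
            consecutive_dry += 1
            if consecutive_dry >= dry_passes:
                return handled  # the well has run dry
            continue
        consecutive_dry = 0  # a fresh finding resets the count
        for finding in findings:
            fix(finding)
            handled.append(finding)
    # Never converged: stop and hand back rather than spin forever.
    raise RuntimeError("no convergence within max_passes; escalate to a human")


# Toy usage: each pass surfaces one open issue; fixing removes it.
open_issues = ["missing scenario", "unchecked null input"]
handled = loop_until_dry(lambda: open_issues[:1], open_issues.remove)
print(handled)  # ['missing scenario', 'unchecked null input']
```

A run that never converges raises instead of returning. This follows the rubric's stance that a stuck run stops and hands back to a human.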

## The evidence auto-gate

The verify gate may be resolved by **evidence** rather than by a person — when the evidence is sufficient and the result is recorded (principle 7, reframed: an automated, recorded pass is an explicit pass, not a skip).

- **Auto-PASS requires ALL of:** every test green; coverage not decreased; no test weakened and no contract edited; the convergence loops dry; the completeness-critic found nothing open.
- **Always escalates to a human (never auto-passed):** any **security** finding (HARD-STOP, always); a **concurrency**/timing risk the tests cannot exercise; an **architecture**/layering violation; and any failing test. These are the residue principle 2 names — automation cannot judge them.
- **Records exactly one outcome** (no silent skip): `PASS` (evidence + the named run as accountable owner) · `RISK-ACCEPTED` (non-security, signed) · `HARD-STOP`. The record states it was auto-resolved, names the run, and lists the residue checks performed.

The auto-gate NEVER writes a human signature it did not get. An auto-PASS is logged as *auto-resolved*, honestly — the line between a pass and a skip is the recorded outcome, not a forged name.
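
One way to encode the gate, as a sketch. The `Evidence` fields and the record keys are hypothetical, but the ordering follows the rubric:

- Security short-circuits to `HARD-STOP`.
- Any residue or unmet Auto-PASS condition escalates with no outcome set, since `RISK-ACCEPTED` needs a human signature.
- Only a clean run records an auto-resolved `PASS`.
- `signed_by` is never filled in.

```python
from dataclasses import dataclass
from typing import Optional

RESIDUE_CHECKS = ("security", "concurrency/timing", "architecture/layering", "failing tests")


@dataclass
class Evidence:
    all_tests_green: bool
    coverage_before: float
    coverage_after: float
    test_weakened: bool
    contract_edited: bool
    loops_dry: bool
    critic_open_items: int
    security_finding: bool = False
    untestable_concurrency_risk: bool = False
    layering_violation: bool = False


def _record(outcome: Optional[str], run_id: str, reasons: list[str]) -> dict:
    return {
        "outcome": outcome,                  # None means a human must decide
        "auto_resolved": outcome == "PASS",  # logged honestly as automated
        "escalate": outcome != "PASS",
        "run": run_id,                       # the named, accountable run
        "signed_by": None,                   # never a forged human signature
        "reasons": reasons,
        "residue_checks": list(RESIDUE_CHECKS),
    }


def auto_gate(ev: Evidence, run_id: str, autonomy: str = "auto") -> dict:
    if ev.security_finding:
        return _record("HARD-STOP", run_id, ["security finding"])
    reasons = []
    if not ev.all_tests_green:
        reasons.append("failing test")
    if ev.untestable_concurrency_risk:
        reasons.append("concurrency/timing risk the tests cannot exercise")
    if ev.layering_violation:
        reasons.append("architecture/layering violation")
    if ev.coverage_after < ev.coverage_before:
        reasons.append("coverage decreased")
    if ev.test_weakened or ev.contract_edited:
        reasons.append("test weakened or contract edited")
    if not ev.loops_dry or ev.critic_open_items:
        reasons.append("convergence not finished")
    if autonomy == "conservative":
        reasons.append("conservative scope: verify gate stops for a human")
    if reasons:
        return _record(None, run_id, reasons)
    return _record("PASS", run_id, [])
```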

## Emitting deltas — feeding the foundation back

The completeness-critic does not discard what it finds. Every gap, surprise, or convention that helped or hurt becomes an **`open` competency delta** in the task's OBSERVE block, in the `deltas.md` grammar, tagged by competency:

- a finding the run FIXED but that taught the foundation something (a missing scenario -> `TDD`);
- a finding the run could NOT fix — a residue escalation -> a delta AND the escalation to a human.

These `open` deltas feed v5's human-gated fold (`fold.md`) at milestone close: the run emits `open`; the human folds. That is the loop closing — **v6 run -> v5 foundation** — so a dynamic run sharpens the five competencies instead of letting its findings evaporate at end-of-run.

## The autonomy dial

How much a run may auto-gate is a **per-scope setting**, not a global switch (principle 5: trust is earned per scope). A task declares its level in its `TASK.md` header:

```
autonomy: auto | conservative
```

- **auto (the default)** — the run may auto-PASS when the evidence + residue checks above are satisfied. Security still always escalates. This is the default starting point: a frozen contract flips the task into a self-driving run that converges and auto-gates on evidence.
- **conservative** — the deliberate *lowering*: the run does all the work and converges, but STOPS at the verify gate for a human. Auto-PASS is disabled. Choose it wherever evidence is thin or risk is high.

> **v7 reversal (recorded, not hidden).** Earlier the default was `conservative` and `auto` was the earned exception; v7 flips this — `auto` is the default, `conservative` is the deliberate lowering. What did **not** change is principle 5: the dial is still **per-scope**, the level still lives in the `TASK.md` header, and you still lower it anywhere risk demands. Only the starting point moved.

**The high-risk guard — `auto` is refused where it matters most.** The dial is not a blank cheque. On a **high-risk or method-defining scope** — anything where a wrong-but-plausible result is expensive or hard to reverse (auth, money, data-loss paths, the method/trust-layer itself) — `auto` must be lowered to `conservative`; leaving it at `auto` there is the reject code **`unguarded_high_risk_auto`**. This closes the v6 dogfood blind-spot, where the whole milestone ran at `auto` on the riskiest possible scope (defining the method) with no friction. The default is `auto` *for ordinary, well-tested scope*; high risk still earns a human gate.

The dial is a **rubric convention** read by the human and the run — it is **not an `add.py` flag** (the engine stays judgment-free); the level lives in the `TASK.md` header where the run already reads.
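
The dial stays a rubric convention, so the following is not `add.py` behavior. It sketches how a hypothetical reviewer script might read the header and apply the high-risk guard. Two points are assumptions. Treating a missing `autonomy:` line as `auto` follows the v7 default but is not stated in the rubric. `high_risk` is the human's judgment, passed in rather than decided by the code.

```python
import re
from typing import Optional

AUTONOMY_RE = re.compile(r"^autonomy:[ \t]*(.*?)[ \t]*$", re.MULTILINE)
LEVELS = ("auto", "conservative")


def read_autonomy(task_header: str) -> str:
    """Read the dial from a TASK.md header."""
    m = AUTONOMY_RE.search(task_header)
    if m is None:
        return "auto"  # assumed: no declaration means the v7 default
    level = m.group(1)
    if level not in LEVELS:
        raise ValueError(f"autonomy must be one of {LEVELS}, got {level!r}")
    return level


def high_risk_guard(task_header: str, high_risk: bool) -> Optional[dict]:
    """Reject `auto` on a scope a human has judged high-risk."""
    if high_risk and read_autonomy(task_header) == "auto":
        return {"reject": "unguarded_high_risk_auto",
                "rationale": "high-risk or method-defining scope must be lowered to conservative"}
    return None
```

The sketch deliberately rejects the unfilled template line `autonomy: auto | conservative`. A placeholder left as-is should not silently become `auto`.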

package/skill/add/scope.md
ADDED
@@ -0,0 +1,58 @@

# Scope drafting — turn a classified request into a versioned MILESTONE.md

This is the **second half of intake**. `intake.md` CLASSIFIES a request into a bucket; scope drafting turns that classified request into a confirmed, well-formed, versioned `MILESTONE.md` through discussion. The MILESTONE.md template is the SHAPE; this rubric is HOW to fill it well. You (the AI) **propose**; the human **confirms before anything is created**.

## What to do per intake outcome

Scope drafting honors intake's classification — it never re-sizes a request:

| intake outcome | scope-loop action | creates (after confirm) |
|----------------|-------------------|-------------------------|
| `new-major` / `sub-milestone` | draft ONE MILESTONE.md (fill the template via discussion) | 1 milestone |
| `task` | route to `add.py new-task <slug>` (it fits the active milestone) | 0 milestones |
| `change-request` | route to SPECIFY/CONTRACT of the affected task | 0 milestones |
| `split_required` | draft ALL N items as a batch in ONE pass | N milestones/tasks |

**Confirm before create is the invariant.** It holds in the one-pass split case too: "one pass" means one drafting pass, NOT auto-creation. Nothing is written to disk — single draft or the whole batch — until the human confirms. You propose; you wait.
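
The table and the confirm-before-create invariant can be sketched together. `Proposal`, `propose`, and `create` are hypothetical names. The sketch only builds proposals and refuses to create anything until `confirmed` is set. The `add.py new-task <slug>` route appears only as the text of a proposal and is never invoked.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Proposal:
    action: str
    creates: str
    confirmed: bool = False  # set only when the human confirms


def propose(outcome: str, slug: Optional[str] = None,
            split_items: Optional[list[str]] = None) -> Proposal:
    """Map an intake outcome to a scope-loop action; never re-sizes the request."""
    if outcome in ("new-major", "sub-milestone"):
        return Proposal("draft ONE MILESTONE.md via discussion", "1 milestone")
    if outcome == "task":
        return Proposal(f"route to `add.py new-task {slug}`", "0 milestones")
    if outcome == "change-request":
        return Proposal("route to SPECIFY/CONTRACT of the affected task", "0 milestones")
    if outcome == "split_required":
        n = len(split_items or [])
        return Proposal(f"draft all {n} items as a batch in one pass", f"{n} milestones/tasks")
    raise ValueError(f"not_classified: unknown intake outcome {outcome!r}")


def create(p: Proposal) -> str:
    """Refuse to write anything until the proposal is confirmed."""
    if not p.confirmed:
        raise PermissionError("confirm before create: nothing written")
    return f"creating {p.creates}"  # real file writes are out of scope here
```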

## Drafting a good MILESTONE.md (section by section)

- **goal** — ONE sentence, an outcome not an output ("a user can size any request", not "write intake.md"). If it needs an "and", it is probably two milestones.
- **Scope In/Out** — the explicit anti-creep deferral list. Naming what is OUT is as important as what is IN; an empty Out list usually means the scope is not yet thought through.
- **Shared decisions & glossary deltas** — cross-cutting rules every task must honor, named from the glossary. New terms get a glossary entry (the survivor layer stays honest).
- **Shared / risky contracts to freeze first** — the seams between tasks; name the owning task.
- **Tasks (breadth-first)** — `slug · depends-on · one line` each. Decompose by deliverable, not by phase; keep each task one-file-sized. Order by dependency, not by guesswork.
- **Exit criteria** — observable, and **every exit criterion maps to a declared task slug** (no dangling criterion). Each line answers "which task delivers this, and how would we see it?"

## Reject codes (emit `{ reject, rationale }`, create nothing)

- `not_classified` — the request has not been through intake yet. Classify it first; you cannot draft scope for an unclassified request.
- `dangling_criterion` — a drafted MILESTONE.md has an exit criterion that maps to no declared task slug. FIX the draft (add the task or drop the criterion) before proposing — never propose a malformed milestone. With no engine lint, you are the first check and the human is the backstop.
- `no_milestone` — intake routed the request to `task` or `change-request`; scope drafting creates NO milestone. Honor the classification; do not invent milestone-sized scope.
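
The three reject codes can be sketched as one check. It returns `{ reject, rationale }`, or `None` when the draft may be proposed. The input shape is an assumption: a bucket name, the declared slugs, and a map from each exit criterion to its delivering slug. The usage at the end replays the v4-1 worked example from the next section.

```python
from typing import Optional


def check_scope(bucket: Optional[str], task_slugs: list[str],
                criteria: dict[str, str]) -> Optional[dict]:
    """Return {reject, rationale} if the draft must not be proposed, else None.

    `criteria` maps each exit criterion to the task slug said to deliver it.
    """
    if bucket is None:
        return {"reject": "not_classified",
                "rationale": "classify the request through intake first"}
    if bucket in ("task", "change-request"):
        return {"reject": "no_milestone",
                "rationale": f"intake routed this to {bucket}; create no milestone"}
    dangling = [c for c, slug in criteria.items() if slug not in task_slugs]
    if dangling:
        return {"reject": "dangling_criterion",
                "rationale": "no declared task delivers: " + "; ".join(dangling)}
    return None


# The v4-1 worked example: every exit criterion names a declared task.
V41_TASKS = ["machine-state-json", "versioning-policy", "scope-loop"]
V41_CRITERIA = {
    "--json emits owner+stop": "machine-state-json",
    "the AI proposes a bucket with rationale": "versioning-policy",
    "the AI drafts a versioned MILESTONE.md via discussion": "scope-loop",
}
print(check_scope("sub-milestone", V41_TASKS, V41_CRITERIA))  # None
```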

## Worked example (from this repo's own history)

Request: *"open the Interface & Intake milestone"* → intake classified it `sub-milestone` of the live v4 self-driving theme → scope drafting produced **`.add/milestones/v4-1/MILESTONE.md`**:

- **goal**: make ADD harness-drivable and self-scoping — machine-readable state plus an AI-facilitated request→versioned-milestone intake loop (the real v4-1 goal, one outcome sentence).
- **tasks** (breadth-first): `machine-state-json` · `versioning-policy` · `scope-loop`.
- **exit criteria** — each maps to its task slug: `--json` emits owner+stop (← machine-state-json), the AI proposes a bucket with rationale (← versioning-policy), the AI drafts a versioned MILESTONE.md via discussion (← scope-loop). Every criterion names the task that delivers it — which is exactly the well-formedness rule above, checkable against the real file.