@pilotspace/add 1.0.0 → 1.1.0
- package/CHANGELOG.md +48 -0
- package/GETTING-STARTED.md +66 -4
- package/README.md +2 -2
- package/bin/cli.js +27 -35
- package/docs/02-the-flow.md +9 -6
- package/docs/04-step-2-scenarios.md +2 -0
- package/docs/05-step-3-contract.md +2 -0
- package/docs/06-step-4-tests.md +2 -0
- package/docs/08-step-6-verify.md +11 -2
- package/docs/09-the-loop.md +18 -0
- package/docs/10-setup-and-stages.md +36 -7
- package/docs/13-adoption.md +2 -2
- package/docs/14-foundation.md +12 -4
- package/docs/appendix-f-requirements-matrix.md +5 -4
- package/package.json +5 -3
- package/skill/add/SKILL.md +40 -13
- package/skill/add/adopt.md +65 -0
- package/skill/add/deltas.md +12 -2
- package/skill/add/phases/0-setup.md +87 -24
- package/skill/add/phases/3-contract.md +16 -0
- package/skill/add/phases/4-tests.md +14 -0
- package/skill/add/phases/5-build.md +3 -0
- package/skill/add/phases/6-verify.md +15 -3
- package/skill/add/report-template.md +48 -0
- package/skill/add/run.md +11 -3
- package/skill/add/scope.md +18 -0
- package/skill/add/setup-review.md +62 -0
- package/skill/add/streams.md +206 -0
- package/tooling/add.py +940 -56
- package/tooling/templates/TASK.md.tmpl +7 -0
package/skill/add/SKILL.md
CHANGED
|
@@ -31,7 +31,10 @@ Run the tool to find the resume point instead of re-reading the repo:
|
|
|
31
31
|
python3 .add/tooling/add.py status
|
|
32
32
|
```
|
|
33
33
|
|
|
34
|
-
- **No `.add
|
|
34
|
+
- **No `.add/state.json` yet** (a fresh install drops tooling + docs but does *not* init — so `status` says
|
|
35
|
+
`no .add/ project found`) → enter **autonomous setup**: YOU run init yourself —
|
|
36
|
+
`add.py init --name "<inferred>" --stage <picked> --await-lock` (don't tell the human to) — then read
|
|
37
|
+
`phases/0-setup.md` and draft the foundation + first scope + first contract through to the human lock-down.
|
|
35
38
|
- **A task is active** → open `.add/tasks/<active>/TASK.md`, look at its `phase:`
|
|
36
39
|
marker, and read the matching `phases/<n>-<phase>.md`. Work *only* that phase.
|
|
37
40
|
- **No active task** → first SIZE the request (see Intake below), then create the
|
|
@@ -56,28 +59,51 @@ Load the phase guide **only for the phase you are in** (progressive disclosure):
|
|
|
56
59
|
|
|
57
60
|
| Phase | Guide | Produces (TASK.md section) | Who leads |
|
|
58
61
|
|-------|-------|----------------------------|-----------|
|
|
59
|
-
| setup | `phases/0-setup.md` | `.add/` +
|
|
60
|
-
| specify | `phases/1-specify.md` | §1 rules + ranked least-sure flag |
|
|
61
|
-
| scenarios | `phases/2-scenarios.md` | §2 Given/When/Then |
|
|
62
|
-
| contract | `phases/3-contract.md` | §3 frozen shape | human
|
|
63
|
-
| tests | `phases/4-tests.md` | §4 + red suite in `tests/` |
|
|
62
|
+
| setup | `phases/0-setup.md` | `.add/` + survivors + first §1–§3 + `SETUP-REVIEW.md` | AI drafts → **human locks** (the lock-down) |
|
|
63
|
+
| specify | `phases/1-specify.md` | §1 rules + ranked least-sure flag | AI drafts (co-specify)† |
|
|
64
|
+
| scenarios | `phases/2-scenarios.md` | §2 Given/When/Then | AI drafts† |
|
|
65
|
+
| contract | `phases/3-contract.md` | §3 frozen shape | AI drafts → **human approves once** (the seam)† |
|
|
66
|
+
| tests | `phases/4-tests.md` | §4 + red suite in `tests/` | AI drafts† |
|
|
64
67
|
| build | `phases/5-build.md` | code in `src/`, tests green | **AI** |
|
|
65
|
-
| verify | `phases/6-verify.md` | §6 checks + gate record | **human
|
|
68
|
+
| verify | `phases/6-verify.md` | §6 checks + gate record | **AI auto-gates on evidence**; human on residue/security‡ |
|
|
66
69
|
| observe | `phases/7-observe.md` | §7 spec delta | human + AI |
|
|
67
70
|
|
|
71
|
+
† **One-approval front (v7).** §1–§4 are drafted by the AI as a single bundle and frozen
|
|
72
|
+
together; the human gives **one approval, at the contract freeze** (the autonomy seam) — not
|
|
73
|
+
three separate sign-offs. The AI presents the bundle least-sure-first. See `run.md`.
|
|
74
|
+
‡ **Verify auto-gate (v6–v7).** Under `autonomy: auto` (the default) a run may auto-PASS once
|
|
75
|
+
the evidence is complete (all tests green · loops dry · no residue) — recorded as *auto-resolved*,
|
|
76
|
+
an explicit PASS, not a skip. **Security always escalates** (HARD-STOP), as do concurrency /
|
|
77
|
+
architecture residue and `conservative` autonomy. See `run.md`.
|
|
78
|
+
|
|
79
|
+
Whenever you present a seam to the human in chat (intake · front approval · gate ·
|
|
80
|
+
milestone close), follow `report-template.md` — SUMMARY → DECISION → ⚠ FLAGS →
|
|
81
|
+
EVIDENCE → NEXT, engine-sourced facts, show-before-ask, never pre-stamp a seam.
|
|
82
|
+
|
|
68
83
|
In **observe**, also emit **competency deltas** — learnings tagged by which of the five
|
|
69
84
|
(`DDD · SDD · UDD · TDD · ADD`) they improve — so the foundation self-improves across loops.
|
|
70
85
|
You write them as `open`; the human folds them into `PROJECT.md`. Read `deltas.md` for the
|
|
71
86
|
grammar and the status lifecycle. At milestone close (or on demand), run the fold ritual that
|
|
72
87
|
gathers confirmed deltas into a versioned foundation — read `fold.md`.
|
|
73
88
|
|
|
74
|
-
## The dynamic run (v6)
|
|
89
|
+
## The dynamic run (v6–v7)
|
|
90
|
+
|
|
91
|
+
Once **§3 CONTRACT is FROZEN**, the build→verify half runs as a dynamic, auto-gated run —
|
|
92
|
+
fan-out + in-run convergence — instead of a manual build (`autonomy: auto` is the default; lower
|
|
93
|
+
to `conservative` to keep a human at the gate). Read `run.md` for the trigger, the touch-boundary,
|
|
94
|
+
the evidence auto-gate, and the autonomy dial. The human-led front still owns *direction*, but v7
|
|
95
|
+
compresses it to a **single approval at the contract seam**; the run never edits a frozen contract
|
|
96
|
+
and never auto-passes a security finding.
|
|
97
|
+
|
|
98
|
+
## Parallel streams — pipelining independent tasks (opt-in)
|
|
75
99
|
|
|
76
|
-
|
|
77
|
-
|
|
78
|
-
|
|
79
|
-
|
|
80
|
-
|
|
100
|
+
The default is one task at a time. When a milestone has several tasks whose `deps=` are
|
|
101
|
+
already `PASS` and a human is ready to review, you MAY run them concurrently: read
|
|
102
|
+
`streams.md`. It changes no `add.py` code — you compute a READY-QUEUE from `status`,
|
|
103
|
+
spawn one worker per ready task (each in a worktree, building behind its own frozen
|
|
104
|
+
contract), and keep the human seams (front approval · escalated Verify) on one serial
|
|
105
|
+
REVIEW-QUEUE. The honest gain is pipelining (the reviewer never waits on a build), not
|
|
106
|
+
N× speed; the autonomy dial sets how much actually overlaps.
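
To illustrate the READY-QUEUE idea described above, here is a minimal sketch. It is not code from `add.py` or `streams.md`: the function name `ready_queue` and the shape of the `tasks` mapping (`deps`, `gate`) are assumptions made for the example, standing in for whatever `status` actually reports.

```python
def ready_queue(tasks):
    """Return slugs that are not yet PASS and whose deps are all PASS.

    `tasks` is a hypothetical mapping: slug -> {"deps": [...], "gate": str | None}.
    """
    passed = {slug for slug, task in tasks.items() if task.get("gate") == "PASS"}
    return [
        slug
        for slug, task in tasks.items()
        if slug not in passed and all(dep in passed for dep in task.get("deps", []))
    ]


# Illustrative data only; slugs are invented for the example.
tasks = {
    "auth": {"deps": [], "gate": "PASS"},
    "export": {"deps": ["auth"], "gate": None},
    "billing": {"deps": ["auth"], "gate": None},
    "invoices": {"deps": ["billing"], "gate": None},
}
print(ready_queue(tasks))  # ['export', 'billing']
```

Each ready slug would get its own worker, while the human seams still drain through one serial REVIEW-QUEUE.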
|
|
81
107
|
|
|
82
108
|
## Non-negotiable rules (from the method)
|
|
83
109
|
|
|
@@ -100,6 +126,7 @@ inside TASK.md):
|
|
|
100
126
|
```bash
|
|
101
127
|
python3 .add/tooling/add.py advance # next phase of the active task
|
|
102
128
|
python3 .add/tooling/add.py gate PASS # at verify: records PASS, marks done
|
|
129
|
+
python3 .add/tooling/add.py use <slug> # switch the active task (e.g. across parallel streams)
|
|
103
130
|
```
|
|
104
131
|
|
|
105
132
|
## Depth by stage
|
package/skill/add/adopt.md
ADDED
|
@@ -0,0 +1,65 @@
|
|
|
1
|
+
# Adopt — map an existing repo into the foundation (silent)
|
|
2
|
+
|
|
3
|
+
When ADD is pointed at a repo that already has code, onboarding is **silent**: the code
|
|
4
|
+
answers the questions a greenfield interview would ask, so you read it rather than ask.
|
|
5
|
+
This is the **brownfield path** of setup (the greenfield path keeps the 4-lens interview —
|
|
6
|
+
see `phases/0-setup.md`). You fill the survivor files from evidence, then stop at the one
|
|
7
|
+
human gate: the **lock-down** (`add.py lock`).
|
|
8
|
+
|
|
9
|
+
## The signal — and arming the gate
|
|
10
|
+
|
|
11
|
+
Enter a brownfield repo with `--await-lock`:
|
|
12
|
+
|
|
13
|
+
```bash
|
|
14
|
+
python3 .add/tooling/add.py init --await-lock
|
|
15
|
+
```
|
|
16
|
+
|
|
17
|
+
`--await-lock` does two things. It seeds an **unlocked** setup, which *arms the lock-down gate*
|
|
18
|
+
— the engine then refuses a second task, crossing into build, and recording a gate until you
|
|
19
|
+
`lock`. And init, being brownfield-aware, prints a line that begins:
|
|
20
|
+
|
|
21
|
+
```
|
|
22
|
+
brownfield: existing code detected — the `add` skill maps it into your foundation …
|
|
23
|
+
```
|
|
24
|
+
|
|
25
|
+
That line is your cue to run this guide. **Always use `--await-lock` for brownfield onboarding**:
|
|
26
|
+
a plain `init` writes no setup and is grandfathered-locked, so its gate never arms *and* the
|
|
27
|
+
closing `lock` below would refuse with `already_locked`. The engine only *detects* the existing
|
|
28
|
+
code (a mechanical fact); it never reads or fills it — interpreting it is your job.
|
|
29
|
+
|
|
30
|
+
## The silent mapping
|
|
31
|
+
|
|
32
|
+
Fill each survivor file in `.add/` from what the code actually shows — **ask nothing**:
|
|
33
|
+
|
|
34
|
+
| Survivor | Read it from |
|
|
35
|
+
|----------|--------------|
|
|
36
|
+
| `PROJECT.md` (foundation) | the domain nouns, entry points, the README, the first milestone the code implies |
|
|
37
|
+
| `CONVENTIONS.md` | the languages, folder layout, naming, lint config, error style already in the tree |
|
|
38
|
+
| `GLOSSARY.md` | the recurring names in modules, models, and public APIs (one name per concept) |
|
|
39
|
+
| `MODEL_REGISTRY.md` | leave the active model record; note any AI-authored code you can detect |
|
|
40
|
+
| `dependencies.allowlist` | the manifests already in the repo (package.json, pyproject, go.mod, …) |
|
|
41
|
+
|
|
42
|
+
Two rules that never bend:
|
|
43
|
+
|
|
44
|
+
1. **Never clobber a survivor.** `init` already skips any survivor that exists; if a human
|
|
45
|
+
already wrote `PROJECT.md`, you READ it, you do not overwrite it. Add, never replace.
|
|
46
|
+
2. **Tag every drafted decision `evidence-grounded` vs `guessed`.** A line you read from the
|
|
47
|
+
code is *evidence-grounded* (cite the file). A line you inferred because the code was silent
|
|
48
|
+
is *guessed*. The human's single lock-down is only honest if they can see which is which —
|
|
49
|
+
the guesses are what they actually need to check. (The tags feed `SETUP-REVIEW.md`.)
|
|
50
|
+
|
|
51
|
+
## Where it ends — the lock-down
|
|
52
|
+
|
|
53
|
+
Brownfield onboarding draws no per-step approvals. You map the foundation, then draft the
|
|
54
|
+
first milestone's scope and the first task's candidate front exactly as greenfield does, and
|
|
55
|
+
present it all at **one** human gate. The human reviews the decisions (least-sure / `guessed`
|
|
56
|
+
first) and signs:
|
|
57
|
+
|
|
58
|
+
```bash
|
|
59
|
+
python3 .add/tooling/add.py lock --by "<name>"
|
|
60
|
+
```
|
|
61
|
+
|
|
62
|
+
`lock` freezes the foundation + scope + first contract in one atomic write and opens the build.
|
|
63
|
+
Until it is run, the engine refuses a second task, crossing into build, and recording a gate —
|
|
64
|
+
so nothing is built on an unreviewed map. That gate is the only thing brownfield onboarding asks
|
|
65
|
+
of a human; everything before it, you did from the code.
|
package/skill/add/deltas.md
CHANGED
|
@@ -10,7 +10,7 @@ You (the AI) **emit** deltas as `open`. Only the **human** moves a delta to `fol
|
|
|
10
10
|
|
|
11
11
|
## The grammar (frozen)
|
|
12
12
|
|
|
13
|
-
Each delta is ONE line, exactly:
|
|
13
|
+
Each delta begins on its own **tag line**; the learning may wrap onto continuation lines:
|
|
14
14
|
|
|
15
15
|
```
|
|
16
16
|
- [<COMPETENCY> · <status>] <learning> (evidence: <pointer>)
|
|
@@ -18,10 +18,20 @@ Each delta is ONE line, exactly:
|
|
|
18
18
|
|
|
19
19
|
- `<COMPETENCY>` — exactly one of the five (below).
|
|
20
20
|
- `<status>` — `open` | `folded` | `rejected`. A **newly emitted delta is `open`**.
|
|
21
|
-
- `<learning>` — the insight
|
|
21
|
+
- `<learning>` — the insight ("the domain model missed multi-tenancy"). It may run past one line;
|
|
22
|
+
the `- [COMPETENCY · status]` tag line must come **first**, and the `(evidence: …)` clause must
|
|
23
|
+
**close** the delta (on its last line).
|
|
22
24
|
- `(evidence: …)` — **required**, non-empty: a failing scenario, a production signal, a review
|
|
23
25
|
note. No evidence → it is an opinion, not a delta.
|
|
24
26
|
|
|
27
|
+
A long learning may wrap — the lint (`add.py check`) joins continuation lines, so this is **one**
|
|
28
|
+
delta, not two:
|
|
29
|
+
|
|
30
|
+
```
|
|
31
|
+
- [SDD · open] the export endpoint must reject a tenant-scoped token used cross-tenant,
|
|
32
|
+
returning `forbidden` (not `not_found`) (evidence: scenario_cross_tenant_export failed)
|
|
33
|
+
```
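
The wrap rule above can be sketched as a small parser. This is an illustration under stated assumptions, not the lint that `add.py check` runs: the names `join_deltas` and `parse_delta` and both regular expressions are hypothetical, written only from the grammar as described here.

```python
import re

COMPETENCIES = {"DDD", "SDD", "UDD", "TDD", "ADD"}
STATUSES = {"open", "folded", "rejected"}

# Tag line: "- [<COMPETENCY> · <status>] <learning...>"
TAG = re.compile(r"^- \[(\w+) · (\w+)\]\s*(.*)$")
# The evidence clause must close the delta.
EVIDENCE = re.compile(r"\(evidence:\s*(.+)\)\s*$")


def join_deltas(lines):
    """Group each tag line with its continuation lines into one string."""
    deltas = []
    for line in lines:
        if TAG.match(line):
            deltas.append(line.strip())
        elif deltas and line.strip():
            deltas[-1] += " " + line.strip()
    return deltas


def parse_delta(text):
    """Return (competency, status, learning, evidence) or raise ValueError."""
    tag = TAG.match(text)
    if not tag:
        raise ValueError("missing tag line")
    competency, status, rest = tag.groups()
    if competency not in COMPETENCIES or status not in STATUSES:
        raise ValueError("unknown competency or status")
    evidence = EVIDENCE.search(rest)
    if not evidence or not evidence.group(1).strip():
        raise ValueError("missing evidence")
    return competency, status, rest[: evidence.start()].strip(), evidence.group(1).strip()


raw = [
    "- [SDD · open] the export endpoint must reject a tenant-scoped token used cross-tenant,",
    "  returning `forbidden` (not `not_found`) (evidence: scenario_cross_tenant_export failed)",
]
joined = join_deltas(raw)
print(len(joined))  # 1
```

Joining before parsing is what lets the wrapped example count as one delta rather than two.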
|
|
34
|
+
|
|
25
35
|
## The five competencies (pick exactly one per delta)
|
|
26
36
|
|
|
27
37
|
| tag | competency | a delta here means you learned something about… |
|
package/skill/add/phases/0-setup.md
CHANGED
|
@@ -1,35 +1,98 @@
|
|
|
1
|
-
# Phase 0 — Setup (
|
|
1
|
+
# Phase 0 — Setup (autonomous draft → one human lock-down)
|
|
2
2
|
|
|
3
|
-
Goal:
|
|
3
|
+
Goal: point ADD at a repo and **you** draft the whole foundation — domain, first-milestone scope,
|
|
4
|
+
and the first task's contract — then hand the human exactly one decision: the **lock-down**. Brownfield
|
|
5
|
+
is silent (the code answers the questions); greenfield keeps a short interview. Either way, the human's
|
|
6
|
+
only gate is `add.py lock`. This is the setup-altitude analog of a task's one-approval contract freeze.
|
|
4
7
|
|
|
5
|
-
##
|
|
8
|
+
## 1 · Zero-touch entry — you run init yourself
|
|
6
9
|
|
|
7
|
-
|
|
10
|
+
When there is no `.add/state.json`, do **not** tell the human to initialise — run it yourself. Infer the
|
|
11
|
+
project name and stage from the repo, and **arm the lock-down gate** with `--await-lock`:
|
|
12
|
+
|
|
13
|
+
```bash
|
|
14
|
+
python3 .add/tooling/add.py init --name "<inferred from repo/dir>" --stage <prototype|poc|mvp|production> --await-lock
|
|
15
|
+
```
|
|
16
|
+
|
|
17
|
+
- `--await-lock` is **required** here: it seeds an *unlocked* setup, which arms the gate so the engine
|
|
18
|
+
refuses a second task / crossing into build / a `gate` until you `lock`. A plain `init` is
|
|
19
|
+
grandfathered-locked — its gate never arms, and the closing `lock` would error `already_locked`.
|
|
20
|
+
- name + stage are **your judgment** (read them from the dir name, README, manifests); the engine stays
|
|
21
|
+
mechanical. Pick the stage from the ambition you hear: throwaway → `prototype`, one risky slice → `poc`,
|
|
22
|
+
narrow-but-real → `mvp`, full rigor → `production`.
|
|
23
|
+
|
|
24
|
+
`init` prints one of two things — **that is your branch**:
|
|
25
|
+
- a line starting `brownfield:` → there is existing code (go to **2a**);
|
|
26
|
+
- the greenfield closing (no `brownfield:`) → an empty repo (go to **2b**).
|
|
27
|
+
|
|
28
|
+
## 2a · Brownfield — map it silently
|
|
29
|
+
|
|
30
|
+
The code answers the questions a greenfield interview would ask, so **read it, don't ask**. Open
|
|
31
|
+
`adopt.md` and follow it: fill each survivor file from the code, never clobber an existing one, and tag
|
|
32
|
+
every decision `evidence-grounded` (cite the file) or `guessed`. Ask the human **nothing** at this step.
|
|
33
|
+
|
|
34
|
+
## 2b · Greenfield — the 4-lens interview (kept): co-specify at foundation altitude
|
|
35
|
+
|
|
36
|
+
An empty repo has no code to read, so run the short interview. This is the **co-specify at foundation
|
|
37
|
+
altitude** move — the same diverge → converge → validate brainstorm a task's §1 uses (`phases/1-specify.md`),
|
|
38
|
+
lifted to the foundation. Ask the one load-bearing question per lens (diverge), draft the foundation
|
|
39
|
+
(converge), then rank what you're least sure of and show the top flag first (validate):
|
|
40
|
+
|
|
41
|
+
| Lens | The one question that unblocks the section |
|
|
42
|
+
|------|--------------------------------------------|
|
|
43
|
+
| Domain (DDD) | The 3–5 core nouns, and the one invariant that must NEVER break? |
|
|
44
|
+
| Spec (SDD) | The first milestone's outcome — and what's explicitly NOT in v1? |
|
|
45
|
+
| Users (UDD) | The primary user and the one job they hire this for? (or "no UI — surface is X") |
|
|
46
|
+
| Decisions | What's already decided that you'd regret re-litigating? (first Key Decision row) |
|
|
47
|
+
|
|
48
|
+
Ask only the live ones; skip what the request already answers. Rank your drafts least-sure-first using the
|
|
49
|
+
one notation every altitude shares — `⚠ <assumption> — least sure because <why>; if wrong: <cost>` — and
|
|
50
|
+
tag thin or inferred answers `guessed`.
|
|
51
|
+
|
|
52
|
+
## 3 · Draft to the lock (both paths)
|
|
53
|
+
|
|
54
|
+
1. **Fill the survivors** (they outlive all code): `.add/PROJECT.md` (the foundation — Domain · Spec/active
|
|
55
|
+
milestone · UI/UX · Key Decisions, one screen), `CONVENTIONS.md`, `GLOSSARY.md`, `MODEL_REGISTRY.md`,
|
|
56
|
+
`dependencies.allowlist`. Brownfield: from the code. Greenfield: from the interview, gaps flagged `guessed`.
|
|
57
|
+
2. **Size the first milestone** (read `scope.md`) and draft its `MILESTONE.md` — goal · scope · exit criteria
|
|
58
|
+
· breadth-first tasks.
|
|
59
|
+
3. **Create the first task and draft its candidate front.** `new-task` is allowed pre-lock:
|
|
8
60
|
```bash
|
|
9
|
-
python3 .add/tooling/add.py
|
|
61
|
+
python3 .add/tooling/add.py new-task <slug> --title "<first feature>"
|
|
10
62
|
```
|
|
11
|
-
|
|
12
|
-
|
|
13
|
-
|
|
14
|
-
|
|
15
|
-
|
|
16
|
-
|
|
17
|
-
|
|
18
|
-
|
|
19
|
-
|
|
20
|
-
|
|
21
|
-
|
|
63
|
+
Draft §1 (specify) · §2 (scenarios) · §3 (contract). **Leave §3 `Status: DRAFT`** — the lock is its
|
|
64
|
+
approval (see §5). You MAY `advance` through specify → scenarios → contract → tests pre-lock, but the
|
|
65
|
+
engine **refuses crossing into build** until you `lock` (`setup_unlocked`). Sequence: front → lock → build.
|
|
66
|
+
4. **Write `.add/SETUP-REVIEW.md`** per `setup-review.md`: every decision you drafted (foundation, scope,
|
|
67
|
+
first contract), **least-sure-first**, each tagged `guessed` | `evidence-grounded`.
|
|
68
|
+
|
|
69
|
+
## 4 · The one human gate — the lock-down
|
|
70
|
+
|
|
71
|
+
Present `SETUP-REVIEW.md` least-sure-first (the `guessed` rows are what the human must actually check). They
|
|
72
|
+
sign **once**:
|
|
73
|
+
|
|
74
|
+
```bash
|
|
75
|
+
python3 .add/tooling/add.py lock --by "<name>"
|
|
76
|
+
```
|
|
77
|
+
|
|
78
|
+
`lock` records the lock layers (foundation · scope · contract) in one atomic write and opens the build. It is
|
|
79
|
+
judgment-free — it does **not** parse `SETUP-REVIEW.md`; the human *reading* it is the review.
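
The gate behaviour described in this guide can be modelled as a toy state machine. This is a sketch for intuition only, not the engine: the class `SetupGate` and its methods are hypothetical, and reusing `setup_unlocked` for the second-task and gate refusals is an assumption (the text names that code only for crossing into build).

```python
class SetupGate:
    """Toy model of the lock-down gate; not the real engine."""

    def __init__(self, await_lock):
        # A plain init is grandfathered-locked; --await-lock seeds it unlocked.
        self.locked = not await_lock
        self.tasks = []

    def new_task(self, slug):
        # The first task is allowed pre-lock; a second one is refused.
        if not self.locked and self.tasks:
            raise RuntimeError("setup_unlocked")  # assumed error code
        self.tasks.append(slug)

    def advance(self, to_phase):
        # specify, scenarios, contract and tests are reachable pre-lock; build is not.
        if to_phase == "build" and not self.locked:
            raise RuntimeError("setup_unlocked")
        return to_phase

    def gate(self, outcome):
        if not self.locked:
            raise RuntimeError("setup_unlocked")  # assumed error code
        return outcome

    def lock(self, by):
        if self.locked:
            raise RuntimeError("already_locked")
        self.locked = True
        return by
```

The model makes the sequence visible: front, then lock, then build, with a plain `init` unable to reach `lock` at all.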
|
|
80
|
+
|
|
81
|
+
## 5 · After the lock
|
|
82
|
+
|
|
83
|
+
- The lock **is** the first task's contract approval — the v7 one-approval-front and the lock-down collapse
|
|
84
|
+
into this single signature. Do **not** ask for a separate contract-freeze sign-off (that double-gates).
|
|
85
|
+
- Stamp the first task's §3 `Status: FROZEN @ v1` (lock-authorized), then read `phases/5-build.md` — build is
|
|
86
|
+
now open. Everything before this signature, you drafted.
|
|
22
87
|
|
|
23
88
|
## Exit gate
|
|
24
89
|
|
|
25
|
-
- [ ] `.add/state.json` exists (`
|
|
26
|
-
- [ ]
|
|
27
|
-
- [ ]
|
|
28
|
-
- [ ]
|
|
90
|
+
- [ ] `.add/state.json` exists; setup was seeded unlocked (`--await-lock`) then locked.
|
|
91
|
+
- [ ] Survivors filled (brownfield: from code, tagged evidence-grounded; greenfield: from the interview).
|
|
92
|
+
- [ ] First task created; §1–§3 drafted; `.add/SETUP-REVIEW.md` written least-sure-first.
|
|
93
|
+
- [ ] Human signed `add.py lock`; first task §3 `FROZEN @ v1`; build open.
|
|
29
94
|
|
|
30
95
|
## Next
|
|
31
96
|
|
|
32
|
-
|
|
33
|
-
|
|
34
|
-
```
|
|
35
|
-
Then read `phases/1-specify.md`. · Book: `docs/10-setup-and-stages.md`.
|
|
97
|
+
After the lock, read `phases/5-build.md` (build is open). · Book: `docs/10-setup-and-stages.md`
|
|
98
|
+
*(note: book chapters 10 / 13 / 14 still describe the older human-led setup until `book-align` lands).*
|
package/skill/add/phases/3-contract.md
CHANGED
|
@@ -20,6 +20,22 @@ whole bundle (§1–§4). Before asking for it, present the bundle **least-sure
|
|
|
20
20
|
most likely wrong (`⚠ [spec|scenario|contract|test] … — because …; if wrong: …`) — aim the human's
|
|
21
21
|
eye before they freeze. See `run.md`.
|
|
22
22
|
|
|
23
|
+
## The freeze review checklist
|
|
24
|
+
|
|
25
|
+
The human's one minute, aimed. Walk these six before saying yes:
|
|
26
|
+
|
|
27
|
+
- **⚠ flags first** — read the least-sure flags; accept each knowing its cost if wrong.
|
|
28
|
+
- **Intent** — does §1 say what you actually want built (and is anything you expected missing)?
|
|
29
|
+
- **Cases** — does every Must and Reject have an observable §2 scenario you care about?
|
|
30
|
+
- **Shape** — glossary names, error codes, additive vs breaking: is THIS the shape to freeze?
|
|
31
|
+
- **Risk** — is this scope high-risk or method-defining? Then require
|
|
32
|
+
`risk: high · autonomy: conservative` in the TASK.md header — the engine refuses an unguarded completion.
|
|
33
|
+
- **Tests** — will §4 go red for the right reason, asserting behavior rather than internals?
|
|
34
|
+
|
|
35
|
+
This checklist AIMS the one approval — never a second gate, no sign-off forms, no
|
|
36
|
+
extra documents. Reject any line and the bundle goes back to draft; that is
|
|
37
|
+
backward-correction, not failure.
|
|
38
|
+
|
|
23
39
|
## AI prompt
|
|
24
40
|
|
|
25
41
|
> Role: an interface architect; frozen contracts are immutable. Read §1, §2,
|
package/skill/add/phases/4-tests.md
CHANGED
|
@@ -17,6 +17,20 @@ before code exists is testing nothing and will wave bad code through later.
|
|
|
17
17
|
- Side-effect assertions on rejection paths (`assert balance unchanged`).
|
|
18
18
|
- A recorded coverage target in §4.
|
|
19
19
|
|
|
20
|
+
## Declaring where tests live
|
|
21
|
+
|
|
22
|
+
§4's `Tests live in:` line is machine-read: when a task has no local `tests/`,
|
|
23
|
+
`add.py report` counts test functions at the declared path(s) instead. The FIRST
|
|
24
|
+
line matching `Tests live in:` is read; paths are its backticked tokens.
|
|
25
|
+
Resolution: `./…` → this task's dir · a token containing `/` → the project root
|
|
26
|
+
(the parent of `.add/`) · a bare name → a sibling of the previous token's
|
|
27
|
+
directory (else the task dir). A directory token counts the `*.py` files directly
|
|
28
|
+
inside it (non-recursive); a `.py` file token counts itself; anything else is
|
|
29
|
+
ignored. Resolved files are deduped, and reports mark declared counts with `†`.
|
|
30
|
+
Paths are confined: anything resolving (symlinks followed)
|
|
31
|
+
outside the project root counts 0 — `..` traversal, absolute paths, and
|
|
32
|
+
symlink escapes are never read.
|
|
33
|
+
|
|
20
34
|
## AI prompt
|
|
21
35
|
|
|
22
36
|
> Role: a test author who writes tests before code. Read §2 and §3. Turn each
|
package/skill/add/phases/5-build.md
CHANGED
|
@@ -36,3 +36,6 @@ change request back to Specify. Honor the feature-specific safety rule named in
|
|
|
36
36
|
|
|
37
37
|
`python3 .add/tooling/add.py advance` → read `phases/6-verify.md`.
|
|
38
38
|
Book: `docs/07-step-5-build.md`.
|
|
39
|
+
|
|
40
|
+
> Under `autonomy: auto` (the default) Build and Verify run together as one dynamic,
|
|
41
|
+
> evidence-auto-gated run — not two manual stops. See `run.md`.
|
package/skill/add/phases/6-verify.md
CHANGED
|
@@ -1,8 +1,16 @@
|
|
|
1
1
|
# Phase 6 — Verify (evidence + blind-spot checks)
|
|
2
2
|
|
|
3
3
|
Goal: establish trust and record an outcome. Passing tests are necessary, not
|
|
4
|
-
sufficient.
|
|
5
|
-
|
|
4
|
+
sufficient. Fill **§6** in TASK.md including the GATE RECORD.
|
|
5
|
+
|
|
6
|
+
> **Who resolves this gate depends on the `autonomy:` header (see `run.md`).**
|
|
7
|
+
> Under `autonomy: auto` (the default) a run auto-PASSes once the evidence is
|
|
8
|
+
> complete — every test green, the convergence loops dry, and **no residue**
|
|
9
|
+
> (security · concurrency · architecture) — recording it as *auto-resolved* with
|
|
10
|
+
> the named run as accountable owner: an explicit PASS, not a skip. **Security is
|
|
11
|
+
> always a HARD-STOP and is never auto-passed.** Under `autonomy: conservative`,
|
|
12
|
+
> or whenever residue is found, this phase is **human-led** and the checks below
|
|
13
|
+
> are the human's.
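
The note above amounts to a small decision rule, sketched here for clarity. It is an interpretation, not engine code: the function name `resolve_verify_gate`, its parameters, the three return labels, and the order of the checks are assumptions.

```python
def resolve_verify_gate(autonomy, tests_green, loops_dry, residue):
    """Decide how the Verify gate resolves.

    `residue` is a set drawn from {"security", "concurrency", "architecture"}.
    Returns "escalate" (human-led), "not-ready" (back to Build), or "auto-pass".
    """
    if "security" in residue:
        return "escalate"  # security is always a HARD-STOP, never auto-passed
    if not (tests_green and loops_dry):
        return "not-ready"  # evidence incomplete: nothing to verify yet
    if autonomy == "conservative" or residue:
        return "escalate"  # a human leads on residue or a lowered dial
    return "auto-pass"  # recorded as auto-resolved: an explicit PASS, not a skip


print(resolve_verify_gate("auto", True, True, set()))             # auto-pass
print(resolve_verify_gate("auto", True, True, {"concurrency"}))   # escalate
```

Only the last branch lets a run close its own gate, and it is reachable only with complete evidence and an empty residue set.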
|
|
6
14
|
|
|
7
15
|
## Part one — confirm the evidence
|
|
8
16
|
|
|
@@ -18,6 +26,9 @@ If any is false, stop and return to Build — there is nothing to verify yet.
|
|
|
18
26
|
and miss races.) This is usually the single most important check.
|
|
19
27
|
- **Security** — exposed secrets, injection openings, unexpected/invented
|
|
20
28
|
dependencies. A security finding is always `HARD-STOP`, never a waiver.
|
|
29
|
+
Writing ANY note on this line means the gate escalates to the human — and
|
|
30
|
+
start it with `NOTE` or `⚠` so `add.py audit` can see it: a marked security
|
|
31
|
+
note reviewed by the auto-gate is an audit finding (`unescalated_security_note`).
|
|
21
32
|
- **Architecture** — does it respect layering/dependency rules in CONVENTIONS.md?
|
|
22
33
|
|
|
23
34
|
## Record exactly one outcome (no silent pass)
|
|
@@ -30,7 +41,8 @@ If any is false, stop and return to Build — there is nothing to verify yet.
|
|
|
30
41
|
|
|
31
42
|
## Exit gate / Next
|
|
32
43
|
|
|
33
|
-
- [ ] Evidence confirmed, blind-spots checked, a person approved,
|
|
44
|
+
- [ ] Evidence confirmed, blind-spots checked, outcome recorded — a person approved, or
|
|
45
|
+
(under `autonomy: auto` with no residue) the run auto-resolved as the accountable owner.
|
|
34
46
|
|
|
35
47
|
```bash
|
|
36
48
|
python3 .add/tooling/add.py gate PASS # marks the task done
|
package/skill/add/report-template.md
ADDED
|
@@ -0,0 +1,48 @@
|
|
|
1
|
+
# Chat reports — the seam template (for the AI, not for add.py)
|
|
2
|
+
|
|
3
|
+
The engine renders artifacts (`report`, `report --decide`, `status`); this file
|
|
4
|
+
governs the CHAT MESSAGE you wrap around them. The digest is the artifact BEHIND
|
|
5
|
+
your presentation, never a replacement for it — and your prose is never a
|
|
6
|
+
replacement for the digest.
|
|
7
|
+
|
|
8
|
+
Use it every time you report at or near a decision seam: an intake proposal, a
|
|
9
|
+
bundle/front approval, a verify gate, a task completion, a milestone close.
|
|
10
|
+
|
|
11
|
+
## The five blocks, in order
|
|
12
|
+
|
|
13
|
+
```
|
|
14
|
+
SUMMARY one line: intent + target + where we are
|
|
15
|
+
DECISION what you need from the human (or "none — FYI")
|
|
16
|
+
⚠ FLAGS least-sure first, why + cost-if-wrong
|
|
17
|
+
EVIDENCE small table: tests · gates · parity · check — engine-sourced
|
|
18
|
+
NEXT the single next action + what it unlocks
|
|
19
|
+
```
|
|
20
|
+
|
|
21
|
+
1. **SUMMARY** — one line carrying intent + target + position, e.g.
|
|
22
|
+
"v13 task 2/3 — tests-declared-fallback is green, gate PASS." The reader
|
|
23
|
+
knows where they are before they read anything else.
|
|
24
|
+
2. **DECISION** — the question the human must answer, stated plainly; exactly
|
|
25
|
+
one decision per report, or an explicit "none — FYI". If a decision exists,
|
|
26
|
+
ask it AFTER everything below has been shown (show-before-ask).
|
|
27
|
+
3. **⚠ FLAGS** — least-sure first, each with *why* it is least sure and the
|
|
28
|
+
*cost if wrong*. Where TASK.md markers exist (`⚠` / `- [~]` / `- [ ]`),
|
|
29
|
+
quote them verbatim and keep their document order — extraction ≠ judgment.
|
|
30
|
+
4. **EVIDENCE** — engine-sourced facts pasted from `add.py` output, never
|
|
31
|
+
re-typed from memory. If your prose and the engine disagree, the engine
|
|
32
|
+
wins: fix the engine or the data, not the sentence.
|
|
33
|
+
5. **NEXT** — one action and what it unlocks. Mirror the rollup's DECIDE NEXT
|
|
34
|
+
line when it is right; overrule it only with a stated reason (e.g. planned
|
|
35
|
+
tasks the state file cannot see yet).
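
The block order above can be made concrete with a tiny renderer. This is only a sketch of the layout, with a hypothetical function name and signature; the template is written for the AI's chat message, not for any tool, and nothing here is part of `add.py`.

```python
def render_report(summary, decision, flags, evidence, next_action):
    """Assemble the five blocks in their fixed order.

    `flags` is expected to arrive least-sure-first; `evidence` is a list of
    (label, value) pairs pasted from engine output, never re-typed.
    """
    lines = [f"SUMMARY   {summary}", f"DECISION  {decision}", "⚠ FLAGS"]
    lines += [f"  - {flag}" for flag in flags] or ["  (none)"]
    lines.append("EVIDENCE")
    lines += [f"  {label}: {value}" for label, value in evidence]
    lines.append(f"NEXT      {next_action}")
    return "\n".join(lines)
```

Keeping the order in one function means a report cannot bury the decision under the evidence table.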
|
|
36
|
+
|
|
37
|
+
## Hard rules
|
|
38
|
+
|
|
39
|
+
- **Summary-first.** Never bury the decision under a task list or a diff.
|
|
40
|
+
- **Show before ask.** Render the artifact (digest · diff · report) before any
|
|
41
|
+
approval question; the human decides on what they can see.
|
|
42
|
+
- **Never pre-stamp a human seam.** Freeze / gate / lock fields stay DRAFT or
|
|
43
|
+
blank until the answer returns: show → ask → stamp → advance. An artifact
|
|
44
|
+
must never claim an approval that has not happened.
|
|
45
|
+
- **One report per seam.** After an approval, point at the frozen artifact —
|
|
46
|
+
do not re-render the whole bundle.
|
|
47
|
+
- **Honest scope.** "Done" means the request, not the last task: report
|
|
48
|
+
"task 2/3", never "done" while approved scope remains.
|
package/skill/add/run.md
CHANGED
|
@@ -28,7 +28,8 @@ then builds against and self-gate the result — the circular trust v6's dogfood
|
|
|
28
28
|
What the human is actually approving in that one gate: that the drafted Spec captures the real intent,
|
|
29
29
|
that the Scenarios cover the cases that matter, and that the Contract shape is the one to freeze. Reject
|
|
30
30
|
any part and the bundle goes back to draft — that is backward-correction (principle 4), not failure.
|
|
31
|
-
Approve, and the run begins.
|
|
31
|
+
Approve, and the run begins. The seam guide (`phases/3-contract.md`) carries the
|
|
32
|
+
**freeze review checklist** — six lines that walk the human through exactly this, ⚠-first.
|
|
32
33
|
|
|
33
34
|
**The least-sure flag — aiming the one approval.** A single approval over a whole bundle invites a
|
|
34
35
|
rubber stamp. So the AI presents the bundle **least-sure first**: of everything it is asking the human
|
|
@@ -148,5 +149,12 @@ closes the v6 dogfood blind-spot, where the whole milestone ran at `auto` on the
|
|
|
148
149
|
scope (defining the method) with no friction. The default is `auto` *for ordinary, well-tested scope*;
|
|
149
150
|
high risk still earns a human gate.
|
|
150
151
|
|
|
151
|
-
|
|
152
|
-
|
|
152
|
+
Judging *what* is high-risk stays human — the scope declares **`risk: high`** in the same `TASK.md`
|
|
153
|
+
header where the dial lives, reviewed at the freeze like every header line (the engine never
|
|
154
|
+
classifies scope). **Since v14 the guard is mechanical for the declared case:**
|
|
155
|
+
the engine refuses the declared combination — `add.py gate` will not complete (`PASS`/`RISK-ACCEPTED`) a task whose header
|
|
156
|
+
carries `risk: high` without `autonomy: conservative` (error `unguarded_high_risk_auto`; `HARD-STOP`
|
|
157
|
+
always records — stopping is never blocked), and `add.py audit` flags the same code on a finished
|
|
158
|
+
record whose header was tampered with or whose GATE RECORD reviewer is the auto-gate — which CI enforces
|
|
159
|
+
(audit-ci). The honest limit mirrors the audit's: an **undeclared** high-risk scope passes; declaring
|
|
160
|
+
is the human seam, the engine enforces what was declared.
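
The declared-case guard described above can be sketched in a few lines. This is an illustration, not the engine's implementation: `parse_header` and `check_gate` are hypothetical names, and the single-line header format split on `·` is an assumption based on the `risk: high · autonomy: conservative` example.

```python
def parse_header(line):
    """Parse a header line such as 'risk: high · autonomy: conservative'."""
    fields = {}
    for part in line.split("·"):
        key, sep, value = part.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def check_gate(header, outcome):
    """Return None if the outcome may be recorded, else an error code."""
    if outcome == "HARD-STOP":
        return None  # stopping is never blocked
    autonomy = header.get("autonomy", "auto")  # `auto` is the default dial
    if header.get("risk") == "high" and autonomy != "conservative":
        return "unguarded_high_risk_auto"
    return None


print(check_gate(parse_header("risk: high"), "PASS"))  # unguarded_high_risk_auto
```

Note what the sketch cannot do: a header with no `risk:` field passes untouched, which is the honest limit stated above.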
|
package/skill/add/scope.md
CHANGED
|
@@ -20,6 +20,24 @@ scope drafting honors intake's classification — it never re-sizes a request:
|
|
|
20
20
|
means one drafting pass, NOT auto-creation. Nothing is written to disk — single draft or the
|
|
21
21
|
whole batch — until the human confirms. You propose; you wait.
|
|
22
22
|
|
|
23
|
+
## Brainstorm before you draft — co-specify at milestone altitude
|
|
24
|
+
|
|
25
|
+
Don't draft a MILESTONE.md from thin input. Run the same three-move co-specify as a
|
|
26
|
+
task's §1 (`phases/1-specify.md`) — Diverge (framings + open questions) → Converge
|
|
27
|
+
(draft + rank) → Validate (show flags first) — raised to milestone scope. Ask only
|
|
28
|
+
what moves the goal, the In/Out line, or the task list; skip what PROJECT.md settles.
|
|
29
|
+
Draft the WHOLE milestone before showing; nothing hits disk until the human confirms.
|
|
30
|
+
|
|
31
|
+
Diverge seeds (pick the live ones):
|
|
32
|
+
- **Outcome** — done means a user can do *what* they can't today? (goal sentence)
|
|
33
|
+
- **Edge of scope** — nearest thing assumed IN that you want OUT? (Out list)
|
|
34
|
+
- **Riskiest seam** — which contract, if wrong, costs the most rework? (freeze-first)
|
|
35
|
+
- **Done-looks-like** — how do we SEE each outcome without reading code? (exit criteria)
|
|
36
|
+
- **First slice** — which task unblocks the rest? (breadth-first order)
|
|
37
|
+
|
|
38
|
+
Rank assumptions least-sure first; the top 1–2 get the flag the human reads at confirm:
|
|
39
|
+
`⚠ <assumption> — least sure because <why>; if wrong: <cost>`.
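As a rough sketch of the ranking and flag format above: the `Assumption` class, its numeric `confidence` field, and the `confirm_flags` helper are all hypothetical names introduced for illustration; only the flag wording and the "top 1–2" cutoff come from the text.

```python
from dataclasses import dataclass
from typing import List

# Same flag format as in the text; \u26a0 is the warning sign, \u2014 the dash.
FLAG_TEMPLATE = "\u26a0 {text} \u2014 least sure because {why}; if wrong: {cost}"


@dataclass
class Assumption:
    text: str
    confidence: float  # hypothetical scale: 0.0 = pure guess, 1.0 = certain
    why: str           # why this is the least-sure part
    cost: str          # what it costs if the assumption is wrong


def confirm_flags(assumptions: List[Assumption], top_n: int = 2) -> List[str]:
    """Rank least-sure first and format flags for the top `top_n` entries."""
    ranked = sorted(assumptions, key=lambda a: a.confidence)
    return [
        FLAG_TEMPLATE.format(text=a.text, why=a.why, cost=a.cost)
        for a in ranked[:top_n]
    ]
```

Only the flagged lines are surfaced at confirm; the remaining assumptions stay in the draft in ranked order.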
|
|
40
|
+
|
|
23
41
|
## Drafting a good MILESTONE.md (section by section)
|
|
24
42
|
|
|
25
43
|
- **goal** — ONE sentence, an outcome not an output ("a user can size any request", not "write
|
|
package/skill/add/setup-review.md
@@ -0,0 +1,62 @@
|
|
|
1
|
+
# Setup review — the one page the human signs
|
|
2
|
+
|
|
3
|
+
Autonomous setup ends at a single human gate: the **lock-down** (`add.py lock`). Before that
|
|
4
|
+
signature is honest, the human needs to see *what you drafted and how sure you were* — not re-derive
|
|
5
|
+
it. `SETUP-REVIEW.md` is that page: every decision you made while drafting the foundation, first-scope,
|
|
6
|
+
and the first contract, **ordered least-sure-first** so the riskiest guesses meet their eye first.
|
|
7
|
+
|
|
8
|
+
This is the setup-altitude analog of presenting a task's front least-sure-first at the contract freeze.
|
|
9
|
+
The engine never reads this file — `add.py lock` is judgment-free, the signature *is* the gate (see
|
|
10
|
+
`setup-lock-state`). The human **reading** this page is the review; your job is to make the reading honest.
|
|
11
|
+
|
|
12
|
+
## Where it lives
|
|
13
|
+
|
|
14
|
+
Write **one** artifact at `.add/SETUP-REVIEW.md`. **Never clobber a human-edited one** — if it already
|
|
15
|
+
exists with hand edits, append/update, don't overwrite (the same non-clobber rule `init` applies to
|
|
16
|
+
survivors). It is a per-onboarding, setup-altitude artifact; it sits beside `PROJECT.md`, not under a task.
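One way to honor the non-clobber rule is sketched below. The helper name `write_setup_review` is hypothetical, and the sketch makes a conservative assumption the text does not spell out: any existing file is treated as possibly hand-edited, so new content is appended rather than written over it.

```python
from pathlib import Path


def write_setup_review(root: Path, body: str) -> str:
    """Write .add/SETUP-REVIEW.md without overwriting an existing file.

    Returns "created", "appended", or "unchanged".
    """
    path = root / ".add" / "SETUP-REVIEW.md"
    path.parent.mkdir(parents=True, exist_ok=True)

    if not path.exists():
        path.write_text(body, encoding="utf-8")
        return "created"

    existing = path.read_text(encoding="utf-8")
    # Avoid stacking duplicates when the same draft is written twice.
    if body.strip() in existing:
        return "unchanged"

    # Assumption: an existing file may hold hand edits, so keep it intact
    # and add the new draft below it.
    path.write_text(existing.rstrip("\n") + "\n\n" + body, encoding="utf-8")
    return "appended"
```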
|
|
17
|
+
|
|
18
|
+
## The template
|
|
19
|
+
|
|
20
|
+
```markdown
|
|
21
|
+
# SETUP REVIEW — <project>
|
|
22
|
+
|
|
23
|
+
<stage> · <brownfield | greenfield> · drafted by <model> @ <date>
|
|
24
|
+
|
|
25
|
+
| # | Decision | Lands in | Tag | Why / Evidence |
|
|
26
|
+
|---|----------|----------|-----|----------------|
|
|
27
|
+
| 1 | <the drafted decision> | PROJECT.md \| scope \| first-contract | `guessed` | <the inference + why you had to guess> |
|
|
28
|
+
| 2 | <…> | <…> | `evidence-grounded` | <cite the source file/line you read it from> |
|
|
29
|
+
|
|
30
|
+
Sign: reviewed the above → `add.py lock --by "<name>"`
|
|
31
|
+
```
|
|
32
|
+
|
|
33
|
+
Rows are numbered for reference at the gate ("row 1 is the one I'm least sure about").
|
|
34
|
+
|
|
35
|
+
## The two rules that make it honest
|
|
36
|
+
|
|
37
|
+
1. **Least-sure-first.** Order rows by confidence **ascending**. A `guessed` row always floats above an
|
|
38
|
+
`evidence-grounded` one. The point is not completeness theatre — it is to spend the human's attention
|
|
39
|
+
where it changes outcomes: the top of the table is the part they actually need to challenge.
|
|
40
|
+
|
|
41
|
+
2. **Every row is tagged — `guessed` or `evidence-grounded`.**
|
|
42
|
+
- `evidence-grounded` — you read it from the code/repo. **Cite the file** (e.g. `pyproject.toml`,
|
|
43
|
+
`src/orders/models.py`). Brownfield onboarding (see `adopt.md`) is mostly these.
|
|
44
|
+
- `guessed` — the repo was silent, so you inferred it. **State the inference and why.** Thin-greenfield
|
|
45
|
+
onboarding (a near-empty repo, only the 4-lens answers) produces these. These are what the human
|
|
46
|
+
must check; that is why they sit on top.
|
|
47
|
+
|
|
48
|
+
The tag vocabulary is shared with `adopt.md` — the brownfield map tags each filled survivor decision
|
|
49
|
+
`guessed`/`evidence-grounded`, and those tags flow straight into this table.
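The two rules can be sketched as a small renderer. This is illustrative only: the function name `render_rows`, the dict keys, and the numeric `confidence` value are assumptions; the column headers and the two tags come from the template above.

```python
from typing import Dict, List

# `guessed` always sorts above `evidence-grounded`.
TAG_ORDER = {"guessed": 0, "evidence-grounded": 1}


def render_rows(decisions: List[Dict]) -> str:
    """Render the review table, least-sure first.

    Each decision is a dict with hypothetical keys:
    decision, lands_in, tag, why, confidence (lower = less sure).
    """
    for d in decisions:
        if d["tag"] not in TAG_ORDER:
            # Rule 2: every row carries one of the two tags.
            raise ValueError(f"untagged or unknown tag: {d['tag']!r}")

    # Rule 1: tag first, then confidence ascending within a tag.
    ordered = sorted(decisions, key=lambda d: (TAG_ORDER[d["tag"]], d["confidence"]))

    lines = [
        "| # | Decision | Lands in | Tag | Why / Evidence |",
        "|---|----------|----------|-----|----------------|",
    ]
    for i, d in enumerate(ordered, start=1):
        lines.append(
            f"| {i} | {d['decision']} | {d['lands_in']} | `{d['tag']}` | {d['why']} |"
        )
    return "\n".join(lines)
```

Sorting on the tag before the confidence value guarantees that a `guessed` row floats above an `evidence-grounded` one even if the numeric estimates disagree.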
|
|
50
|
+
|
|
51
|
+
## Where it ends
|
|
52
|
+
|
|
53
|
+
`SETUP-REVIEW.md` is **read-only context** for the lock-down. You do not ask the human to approve it
|
|
54
|
+
field-by-field; you present it, least-sure-first, and they sign once:
|
|
55
|
+
|
|
56
|
+
```bash
|
|
57
|
+
python3 .add/tooling/add.py lock --by "<name>"
|
|
58
|
+
```
|
|
59
|
+
|
|
60
|
+
`lock` records the lock layers and opens the build — it does **not** parse or validate this file (the
|
|
61
|
+
engine stays judgment-free). The review lives in the human's reading of the page, not in the tool. Make
|
|
62
|
+
the top of the table the truth they most need, and the one signature is informed.
|