@pilotspace/add 1.0.0 → 1.1.0
- package/CHANGELOG.md +48 -0
- package/GETTING-STARTED.md +66 -4
- package/README.md +2 -2
- package/bin/cli.js +27 -35
- package/docs/02-the-flow.md +9 -6
- package/docs/04-step-2-scenarios.md +2 -0
- package/docs/05-step-3-contract.md +2 -0
- package/docs/06-step-4-tests.md +2 -0
- package/docs/08-step-6-verify.md +11 -2
- package/docs/09-the-loop.md +18 -0
- package/docs/10-setup-and-stages.md +36 -7
- package/docs/13-adoption.md +2 -2
- package/docs/14-foundation.md +12 -4
- package/docs/appendix-f-requirements-matrix.md +5 -4
- package/package.json +5 -3
- package/skill/add/SKILL.md +40 -13
- package/skill/add/adopt.md +65 -0
- package/skill/add/deltas.md +12 -2
- package/skill/add/phases/0-setup.md +87 -24
- package/skill/add/phases/3-contract.md +16 -0
- package/skill/add/phases/4-tests.md +14 -0
- package/skill/add/phases/5-build.md +3 -0
- package/skill/add/phases/6-verify.md +15 -3
- package/skill/add/report-template.md +48 -0
- package/skill/add/run.md +11 -3
- package/skill/add/scope.md +18 -0
- package/skill/add/setup-review.md +62 -0
- package/skill/add/streams.md +206 -0
- package/tooling/add.py +940 -56
- package/tooling/templates/TASK.md.tmpl +7 -0
package/CHANGELOG.md
ADDED
@@ -0,0 +1,48 @@
+# Changelog
+
+All notable changes to the ADD method (`@pilotspace/add` on npm,
+`pilotspace-add` on PyPI) are documented here. The format follows
+[Keep a Changelog](https://keepachangelog.com/); versions follow semver.
+
+## [1.1.0] — 2026-06-05
+
+Production-ready enforcement: the gates are now verified by machinery distinct
+from the agent, and any AI agent can follow the method through the CLI alone.
+
+### Added
+- **`add.py audit [--json]`** — judgment-free, read-only verification that
+human seams left well-formed records: a named human at every contract freeze,
+exactly one gate outcome per done task, a human reviewer wherever the
+security line carries a `NOTE`/`⚠` marker, no waivers on security. Exit 0
+clean / exit 1 with `{task, code, detail}` findings.
+- **Seam audit in CI** — a `seam-audit` job (this repo) plus a copy-paste
+workflow for consumer projects (GETTING-STARTED "Enforce the seams in CI"):
+a malformed seam record fails CI on a machine the agent does not control
+(*never self-gate*, enforced).
+- **The mechanized high-risk guard** — declare `risk: high` in a TASK.md
+header and the engine refuses to complete the task (`PASS`/`RISK-ACCEPTED`)
+until the dial is lowered to `autonomy: conservative`; error and audit
+finding `unguarded_high_risk_auto`. Judging *what* is high-risk stays human;
+the declared combination is enforced. `HARD-STOP` is never blocked.
+- **Agent portability** — `add.py guide` now names the exact phase-guide file
+to read (`guide : .claude/skills/add/phases/<n>-<phase>.md`, never a dead
+pointer; additive `"guide"` key in `--json`), and the AGENTS.md/CLAUDE.md
+block routes any agent — Claude, Cursor, Copilot, Codex — through the CLI
+alone.
+- **The freeze review checklist** — six ⚠-first lines inside the contract
+phase guide that aim the human's one approval (intent · cases · shape ·
+risk declaration · tests), never a second gate.
+
+### Changed
+- GitHub Actions bumped off the deprecated Node-20 runtimes
+(checkout v5, setup-python v6, setup-node v5).
+- GETTING-STARTED: CI enforcement section + `guide :` orientation.
+
+## [1.0.0] — 2026-06-04
+
+First public release: the seven-phase flow (specify → scenarios → contract →
+tests → build → verify → observe) driven by one `TASK.md` per task, the
+`add.py` state tracker (init · status · guide · report · check · gates ·
+milestones · competency deltas · fold), the `add` skill for Claude Code, and
+the full method book (`.add/docs/`). Installable via
+`npx @pilotspace/add init` or `pip install pilotspace-add`.
|
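The high-risk guard described in the changelog above reduces to a check on a declared combination. The following is a minimal sketch, assuming the `TASK.md` header has already been parsed into plain strings. `guard_finding` and its parameters are hypothetical names used for illustration, not functions in `add.py`; only the outcome names, the `unguarded_high_risk_auto` code, and the `{task, code, detail}` finding shape come from the changelog text.

```python
from typing import Optional

# Outcomes that complete a task. HARD-STOP is never blocked by the guard.
COMPLETING_OUTCOMES = {"PASS", "RISK-ACCEPTED"}


def guard_finding(task: str, risk: str, autonomy: str, outcome: str) -> Optional[dict]:
    """Return a {task, code, detail} finding, or None when the combination is allowed.

    Hypothetical helper: it checks only the declared combination and never
    judges whether a task *is* high-risk (that stays a human decision).
    """
    if outcome not in COMPLETING_OUTCOMES:
        return None  # HARD-STOP passes through untouched
    if risk == "high" and autonomy != "conservative":
        return {
            "task": task,
            "code": "unguarded_high_risk_auto",
            "detail": f"risk: high needs autonomy: conservative before {outcome}",
        }
    return None
```

Returning a finding instead of raising keeps the same check usable for both the completion refusal and a read-only audit pass, which is one plausible reason the changelog names the error and the audit finding identically.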
package/GETTING-STARTED.md
CHANGED
@@ -42,20 +42,35 @@ contract → review the result.** Everything between is the agent.
 
 ## 0 · Prerequisites
 
-- **
+- **Python 3.10+** — required; the tool itself is stdlib-only (no pip dependencies).
+- **One installer**, whichever you already have: **Node.js ≥ 18** (for `npx`) *or*
+**pip** (Python). Both install the exact same `.add/` runtime.
 - A project folder. It can be empty or an existing repo.
 
+> **Windows:** use `py` wherever this guide writes `python3` (the Python launcher on
+> Windows) — e.g. `py .add\tooling\add.py status`. Both installers handle the install
+> step for you; only the by-hand `add.py` commands below differ.
+
 ---
 
 ## 1 · Install
 
-From your project root:
+From your project root, pick **one** path — both produce the same install:
+
+**Option A — npm (Node.js ≥ 18):**
 
 ```bash
 npx @pilotspace/add init --name "My App" --stage prototype
 ```
 
-
+**Option B — pip (Python 3.10+):**
+
+```bash
+pip install pilotspace-add
+pilotspace-add init --name "My App" --stage prototype
+```
+
+Either one creates `.add/` (your runtime), drops the `add` skill into
 `.claude/skills/add/`, and bundles the book into `.add/docs/`. Pick the stage that
 matches your intent — `prototype`, `poc`, `mvp`, or `production`. You can change it
 later with `python3 .add/tooling/add.py stage mvp`.
@@ -87,7 +102,9 @@ python3 .add/tooling/add.py guide
 
 `status` tells you *where* you are; `guide` tells you *what to do next* — the active
 task's phase, the one concrete next action, the chapter to read, and the exact command
-to run once that phase is done.
+to run once that phase is done. Its `guide :` line names the phase-guide file to
+read for the current phase (`.claude/skills/add/phases/…` — plain markdown), which is
+how **any** agent — Claude, Cursor, Copilot, Codex — follows ADD through the CLI alone.
 
 ---
 
@@ -211,6 +228,51 @@ code 0 means healthy — handy as a CI gate.
 
 ---
 
+## Enforce the seams in CI
+
+`add.py audit` re-verifies every recorded human gate on your board — a named
+human at each contract freeze, exactly one gate outcome per done task, a human
+reviewer wherever the security line carries a note. It exits non-zero naming
+the task and the finding, which makes it a CI gate: enforcement runs on a
+machine the agent does not control, so the agent can never stamp its own work
+green (*never self-gate*).
+
+Drop this workflow into `.github/workflows/seam-audit.yml`:
+
+```yaml
+name: seam-audit
+
+on:
+  push:
+    branches: [main]
+  pull_request:
+
+permissions:
+  contents: read
+
+jobs:
+  seam-audit:
+    name: Seam audit (recorded human gates)
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - uses: actions/setup-python@v5
+        with:
+          python-version: '3.12'
+
+      - name: Audit recorded human seams
+        run: python3 .add/tooling/add.py audit
+```
+
+The command is the same one you can run locally — the installer already placed
+`add.py` at `.add/tooling/add.py`, and the audit is read-only (it never edits
+your board). A red `seam-audit` job means a seam record is malformed or a
+security note was left to the auto-gate; fix the record (or escalate the gate
+to a human), never the auditor.
+
+---
+
 ## 6 · Resume next session
 
 Close your laptop, come back tomorrow, run:
|
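The "Enforce the seams in CI" section in the diff above notes that the audit command is the same one you can run locally, and the Windows note says to swap `python3` for `py`. A minimal sketch of a local wrapper that combines the two, under stated assumptions: `launcher_for`, `audit_command`, and `run_audit` are hypothetical helper names, and the only documented behaviour relied on is the `.add/tooling/add.py audit` invocation and its exit code (0 clean, non-zero on findings).

```python
import subprocess
import sys

ADD_PY = ".add/tooling/add.py"  # where the guide says the installer places the tool


def launcher_for(platform: str) -> str:
    """Pick the interpreter name the guide recommends for a platform string."""
    return "py" if platform == "win32" else "python3"


def audit_command(platform: str = sys.platform) -> list:
    """Build the argv for the read-only seam audit (hypothetical helper)."""
    return [launcher_for(platform), ADD_PY, "audit"]


def run_audit() -> bool:
    """Run the audit from the project root; True means exit code 0 (clean)."""
    result = subprocess.run(audit_command(), check=False)
    return result.returncode == 0
```

Calling `run_audit()` from a pre-push hook would give the same signal as the CI job, though only the CI run happens on a machine the agent does not control.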
package/README.md
CHANGED
@@ -39,8 +39,8 @@ npx @pilotspace/add init --name "My App" --stage prototype
 
 ```bash
 # Python / pip
-pip install add
-add
+pip install pilotspace-add
+pilotspace-add init --name "My App" --stage prototype
 ```
 
 **New here?** Follow the [10-minute Quickstart](./GETTING-STARTED.md) — it walks
|
package/bin/cli.js
CHANGED
@@ -10,15 +10,17 @@
  * <target>/.claude/skills/add/ (the skill Claude loads)
  * <target>/.add/tooling/ (add.py scaffolder + state tracker)
  * <target>/.add/docs/ (the AIDD book — the trust layer)
- *
+ * It DROPS FILES ONLY — it does NOT run `add.py init`. Initialisation is deferred to
+ * the AI (via `/add`, which runs `init --await-lock` to arm the v12 lock-down gate) or
+ * to a CLI user. A pre-run plain init would grandfather-lock the gate before `/add` runs
+ * AND consume the brownfield signal in the terminal, where the AI never sees it.
  *
- * Zero npm dependencies. Designed for failure:
- *
+ * Zero npm dependencies, no Python needed at install time. Designed for failure:
+ * verifies sources exist before copying, never clobbers an existing skill.
  */
 
 const fs = require("fs");
 const path = require("path");
-const { spawnSync } = require("child_process");
 
 const PKG_ROOT = path.resolve(__dirname, "..");
 
@@ -39,24 +41,23 @@ function parseArgs(argv) {
   return args;
 }
 
-function copyDir(src, dest, { skipIfExists } = {}) {
+function copyDir(src, dest, { skipIfExists, cleanReplace } = {}) {
   if (!fs.existsSync(src)) fail("missing packaged source: " + src);
   if (skipIfExists && fs.existsSync(dest)) {
     warn(dest + " exists — leaving it untouched");
     return;
   }
+  // Clean replace: drop a stale dest before copying so a `--force` re-install can
+  // never leave orphaned files from a previous version behind. fs.cpSync merges
+  // (it never removes), so without this `--force` is a merge, not a replace. Mirrors
+  // _installer.py's `shutil.rmtree(skill_dest)` so npm and pip behave identically.
+  if (cleanReplace && fs.existsSync(dest)) {
+    fs.rmSync(dest, { recursive: true, force: true });
+  }
   fs.mkdirSync(path.dirname(dest), { recursive: true });
   fs.cpSync(src, dest, { recursive: true });
 }
 
-function hasPython() {
-  for (const py of ["python3", "python"]) {
-    const r = spawnSync(py, ["--version"], { stdio: "ignore" });
-    if (r.status === 0) return py;
-  }
-  return null;
-}
-
 function cmdInit(args) {
   const target = path.resolve(args._[0] || ".");
   if (!fs.existsSync(target)) fail("target directory does not exist: " + target);
@@ -66,7 +67,7 @@ function cmdInit(args) {
   copyDir(
     path.join(PKG_ROOT, "skill", "add"),
     path.join(target, ".claude", "skills", "add"),
-    { skipIfExists: !args.force }
+    { skipIfExists: !args.force, cleanReplace: args.force }
   );
   log(" ✓ skill -> .claude/skills/add/");
 
@@ -88,27 +89,18 @@ function cmdInit(args) {
     { skipIfExists: false });
   log(" ✓ trust docs -> .add/docs/ (the AIDD book)");
 
-  // 4
-
-
-
-
-
-
-
-
-
-
-
-  if (args.force) initArgs.push("--force");
-  const r = spawnSync(py, initArgs, { stdio: "inherit" });
-  if (r.status !== 0 && r.status !== null) {
-    warn("`add.py init` exited non-zero (state may already exist). Run `add.py status` to check.");
-  }
-
-  log("\nDone. In Claude Code, the `add` skill is now installed.");
-  log("Next: open Claude Code, run `/add`, and say what you want to build —");
-  log(" the agent sizes it into a milestone and drives the build with you.");
+  // NO step 4: the installer DROPS FILES ONLY. Initialisation is deferred to the AI
+  // (via `/add`) or a CLI user — a pre-run plain `add.py init` would grandfather-lock
+  // the v12 lock-down gate before `/add` runs (see file header). So no Python is run here.
+  log("\nDone. The `add` skill + tooling are installed (no project state yet — that's intentional).");
+  log("Next: open Claude Code, run `/add`, and say what you want to build — the agent");
+  log(" sets up the foundation, sizes it into a milestone, and drives the build with you;");
+  log(" you sign off once, at the lock-down.");
+  log("");
+  log("Prefer the CLI / not using Claude Code? Initialise it yourself (this arms the lock-down):");
+  const launcher = process.platform === "win32" ? "py" : "python3";
+  log(` ${launcher} .add/tooling/add.py init --await-lock --stage ${args.stage}` +
+      (args.name ? ` --name "${args.name}"` : ""));
 }
 
 function main() {
|
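The `copyDir` comment in the diff above says the clean-replace path mirrors `shutil.rmtree(skill_dest)` on the pip side. The pip installer itself is not shown in this diff, so the following is only a sketch of the same merge-versus-replace distinction in Python. `copy_dir` and its keyword arguments are hypothetical names chosen to parallel the JavaScript, not the real `_installer.py` code.

```python
import shutil
from pathlib import Path


def copy_dir(src: Path, dest: Path, skip_if_exists: bool = False,
             clean_replace: bool = False) -> bool:
    """Copy src to dest; return False when an existing dest was left untouched."""
    if not src.exists():
        raise FileNotFoundError(f"missing packaged source: {src}")
    if skip_if_exists and dest.exists():
        return False  # never clobber an existing install
    if clean_replace and dest.exists():
        # Drop stale files first so the copy is a replace, not a merge.
        shutil.rmtree(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    # dirs_exist_ok=True merges into an existing dest, like fs.cpSync.
    shutil.copytree(src, dest, dirs_exist_ok=True)
    return True
```

Without `clean_replace`, a file deleted in a newer package version would survive a forced re-install, which is the orphaned-file case the JavaScript comment describes.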
package/docs/02-the-flow.md
CHANGED
@@ -6,7 +6,7 @@
 
 ## The flow
 
-AIDD is one repeatable flow of six
+AIDD is one repeatable flow of **seven steps**: six build the feature — Specify → Scenarios → Contract → Tests → Build → Verify — and the seventh, **Observe**, feeds what production teaches back into the next Specify. In the default flow the AI drafts the front (steps 1–4) and a person approves it **once**, at the contract freeze; the AI performs the Build; and Verify is resolved on evidence under `autonomy: auto`, with a person owning any residue. (See [11 Governance](./11-governance.md) for the autonomy dial and the one-approval seam.)
 
 
 
@@ -47,6 +47,8 @@ flowchart LR
 
 The shape is deliberate: the human-led steps establish direction, a frozen contract forms the seam in the middle, and the AI-led build runs fast and safely on the far side because everything it needs is already fixed.
 
+> **What changed in v7 (the diagrams above show the structural spine, which is unchanged).** The *steps* and their order are exactly as drawn — only **who resolves them** moved. The AI now drafts the whole front (steps 1–4) and a person approves it **once**, at the contract freeze (not a sign-off at each step); and **Verify is auto-gated on evidence** under `autonomy: auto` (the default), escalating security — always a `HARD-STOP` — and other residue to a person. Lower the dial to `conservative` to keep a human at the Verify gate. See [11 Governance](./11-governance.md).
+
 ## Why the order is the order
 
 Each step produces exactly one artifact, and each artifact is the input to the next step. The order is not a preference; it is a dependency chain.
@@ -68,12 +70,13 @@ The flow runs in two directions under two rules that never conflict. **Backward
 
 | Step | Person's job | AI's job |
 |------|--------------|----------|
-| 1 Specify |
-| 2 Scenarios |
-| 3 Contract | approve
-| 4 Tests |
+| 1 Specify | confirm the rules (part of the one approval) | draft; list assumptions to confirm |
+| 2 Scenarios | confirm what "correct" looks like (part of the one approval) | draft scenarios |
+| 3 Contract | **approve & freeze the whole bundle (§1–§4) once — the seam** | draft the contract and mocks |
+| 4 Tests | confirm the targets (part of the one approval) | draft the failing tests |
 | 5 Build | direct in small batches | implement until tests pass |
-| 6 Verify |
+| 6 Verify | own the residue (security · concurrency · architecture); approve when `conservative` | gather evidence; **auto-PASS on complete evidence** under `autonomy: auto` |
+| 7 Observe | read the signal; fold confirmed deltas into PROJECT.md | run behind a flag; emit competency deltas |
 
 ## What survives, and what is disposable
 
|
package/docs/04-step-2-scenarios.md
CHANGED
@@ -6,6 +6,8 @@
 > **Produces:** `features/<name>.feature`.
 > **Person's job:** decide what "correct" looks like in concrete situations. **AI's job:** draft the scenarios.
 
+> **Part of the one-approval front (v7).** In the default flow these scenarios are drafted by the AI alongside the spec, contract, and failing tests as **one bundle**, approved by a person **once**, at the contract freeze — not signed off step by step. This chapter is how to get the scenarios *right*; [05 Contract](./05-step-3-contract.md) is where the bundle is frozen. See [11 Governance](./11-governance.md).
+
 ---
 
 ## Why turn rules into scenarios
|
package/docs/05-step-3-contract.md
CHANGED
@@ -6,6 +6,8 @@
 > **Produces:** `contracts/<name>.md` (plus a mock and contract tests).
 > **Person's job:** approve and freeze the shape. **AI's job:** generate the first draft, the mock, and the contract tests.
 
+> **The one approval lands here (v7).** In the default flow the AI drafts the whole front — spec, scenarios, this contract, and the failing tests — as **one bundle**, and a person gives a **single approval at this freeze**. Freezing the contract is the one human gate of the front, not the third of three sign-offs; reject any part and the whole bundle returns to draft (backward correction, not failure). See [11 Governance](./11-governance.md).
+
 ---
 
 ## The seam of the whole method
|
package/docs/06-step-4-tests.md
CHANGED
@@ -6,6 +6,8 @@
 > **Produces:** a failing (red) automated test suite.
 > **Person's job:** set the targets and coverage. **AI's job:** generate the tests.
 
+> **Part of the one-approval front (v7).** In the default flow these tests are drafted by the AI as part of the front **bundle** (spec · scenarios · contract · tests) and approved by a person **once**, at the contract freeze — the tests are part of what that single approval covers. They still must be **red before the build**. See [11 Governance](./11-governance.md).
+
 ---
 
 ## Why tests come before code
|
package/docs/08-step-6-verify.md
CHANGED
@@ -4,7 +4,7 @@
 
 > **Purpose:** confirm the result is correct and safe to release.
 > **Produces:** a reviewed change with a recorded outcome, ready to release.
-> **
+> **Who resolves it:** set per task by the `autonomy:` header. Under `autonomy: auto` (the default) the run resolves the gate on evidence; under `conservative`, or for any residue, it is the human's check. **Security always escalates to a human.**
 
 ---
 
@@ -14,6 +14,15 @@ The build produced passing tests. That is necessary but not sufficient. Verifica
 
 This needs care, because it is easy to misread. "Not by inspection" does not mean "do not look at the code." It means the *basis* of trust is the passing evidence plus a deliberate check of the specific things tests cannot easily catch — not a general impression that the code reads plausibly. Plausibility is exactly the trap: AI code is frequently plausible and wrong. So verification has two parts: confirm the evidence, then check the known blind spots.
 
+## Who resolves Verify — the evidence auto-gate
+
+Verify can be resolved two ways, set per task by the `autonomy:` header (see [governance](./11-governance.md) and the autonomy dial):
+
+- **Auto (the default).** When `autonomy: auto`, the run resolves the gate on **evidence** rather than waiting for a person — but only when *all* of these hold: every test green, coverage not decreased, no test weakened and no contract edited, the convergence loops dry, and **no residue** (security, concurrency, or architecture). It records `PASS` as *auto-resolved*, naming the run as the accountable owner — an explicit pass, not a skip. This is principle 7: a gate may be resolved by evidence when that evidence is sufficient and the result is logged.
+- **Human.** When `autonomy: conservative`, or whenever the run finds residue it cannot judge, the gate stops for a person; the two parts below are theirs.
+
+**Security is always a `HARD-STOP` and is never auto-passed, at any autonomy level.** The two parts that follow — confirm the evidence, then check the blind spots — are what *either* resolver works through; the only question is whether a person or the recorded run signs the outcome.
+
 ## Part one — confirm the evidence
 
 - [ ] All tests pass.
@@ -49,7 +58,7 @@ A security finding is always a `HARD-STOP`; it is never waved through with a wai
 - [ ] Concurrency/timing of the risky operation is safe.
 - [ ] No exposed secrets, injection openings, or unexpected dependencies.
 - [ ] Layering and dependencies follow `CONVENTIONS.md`.
-- [ ]
+- [ ] The change is approved — by a person, **or** (under `autonomy: auto`, no residue) auto-resolved by the run as the recorded accountable owner.
 - [ ] An outcome is recorded (`PASS` / `RISK-ACCEPTED` / `HARD-STOP`).
 
 ## Common mistakes
|
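The evidence auto-gate described in the Verify chapter above is a conjunction of conditions with one override for security. A minimal sketch of that decision, assuming the evidence has already been collected into booleans: the `Evidence` class, `resolve_verify`, and the routing labels `AUTO-PASS` and `HUMAN` are hypothetical names for illustration. The chapter itself only names the recorded outcomes `PASS`, `RISK-ACCEPTED`, and `HARD-STOP`.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    tests_green: bool
    coverage_decreased: bool
    test_weakened: bool
    contract_edited: bool
    loops_dry: bool
    # Residue categories found by the run, e.g. {"security", "concurrency"}.
    residue: set = field(default_factory=set)


def resolve_verify(evidence: Evidence, autonomy: str = "auto") -> str:
    """Route the Verify gate to 'HARD-STOP', 'AUTO-PASS', or 'HUMAN'."""
    if "security" in evidence.residue:
        return "HARD-STOP"  # never auto-passed, at any autonomy level
    complete = (
        evidence.tests_green
        and not evidence.coverage_decreased
        and not evidence.test_weakened
        and not evidence.contract_edited
        and evidence.loops_dry
        and not evidence.residue
    )
    if autonomy == "auto" and complete:
        return "AUTO-PASS"  # would be recorded as PASS, auto-resolved by the run
    return "HUMAN"  # conservative dial, incomplete evidence, or other residue
```

Checking security before anything else means no combination of green evidence and dial setting can reach the auto-pass branch with a security finding present.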
package/docs/09-the-loop.md
CHANGED
@@ -33,6 +33,24 @@ Every defect, surprise, or new need is written up as a change to the specificati
 
 This is also where the AI returns to a useful role: summarizing telemetry, clustering errors into themes, and drafting the proposed spec delta for a person to review. But the production decisions — what to roll back, what to prioritize — remain human.
 
+## Competency deltas and the foundation fold
+
+A spec delta feeds the *next feature*. But a loop also teaches the **method itself** — that the domain model missed a boundary, that a whole class of scenario was never tested, that a build convention helped or hurt. AIDD captures those as **competency deltas**: a single tagged learning, written in the Observe step, marking which of the five competencies it sharpens.
+
+| tag | competency | a delta here means you learned something about… |
+|-----|------------|--------------------------------------------------|
+| `DDD` | Domain | the domain model — an entity, rule, or boundary the spec assumed wrong |
+| `SDD` | Spec | what the feature must do or reject — a missing or wrong requirement |
+| `UDD` | UI/UX | the user-facing shape — a flow, affordance, or wording that misled |
+| `TDD` | Test | how we prove correctness — a missing scenario, a flaky or hollow test |
+| `ADD` | AI/build | how the AI builds — a harness, prompt, or convention that helped or hurt |
+
+Each delta is one tagged entry — `- [COMPETENCY · status] the learning (evidence: a pointer)` — and the evidence is **required**: a failing scenario, a production signal, a review note. No evidence means it is an opinion, not a delta. The AI **emits** deltas as `open`; it never folds its own. Folding is judgment, and judgment is the human's — the same verify/observe seam that keeps the AI from grading its own work.
+
+**The fold.** At milestone close (or on demand, when open deltas pile up), a person runs the fold ritual: **gather** every `open` delta across the milestone's tasks, **group** them by competency, **propose** the exact foundation edit for each, **confirm** with the human one by one, then **write** — append-only — flipping each delta to `folded` (merged) or `rejected` (considered and deliberately not merged, left in place so the trail survives), and bumping the `foundation-version:` marker. `DDD`/`SDD`/`UDD` deltas fold into the matching section of `PROJECT.md`; `TDD`/`ADD` fold into `CONVENTIONS.md` (they sharpen the engine, not the product); and **every** fold also appends one row to `PROJECT.md` §Key Decisions — the universal, auditable record of what the foundation learned.
+
+**Tooling.** `add.py deltas` lists every open delta across the project (so nothing waiting to be folded is invisible); `add.py check` lints each delta's well-formedness — known competency tag, valid status, non-empty evidence. There is deliberately **no `add.py fold`**: the engine stays judgment-free, and the ritual lives with the human who owns it.
+
 ## Re-entrancy: the loop is the whole point
 
 Two principles converge here. *The flow is re-entrant* — any step can send you back to an earlier one — and *the flow is a loop* — production feeds the next specification. Together they mean the artifacts you built are never "finished"; they are living documents that the next cycle refines.
|
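The delta entry format and the three well-formedness rules named in the chapter above (known competency tag, valid status, non-empty evidence) are concrete enough to sketch as a linter. This is not the `add.py check` implementation, which this diff does not show in full. `lint_delta`, `DELTA_RE`, and `FOLD_TARGET` are hypothetical names, and the regular expression is one assumed reading of the `- [COMPETENCY · status] the learning (evidence: a pointer)` shape.

```python
import re

COMPETENCIES = {"DDD", "SDD", "UDD", "TDD", "ADD"}
STATUSES = {"open", "folded", "rejected"}

# Where each competency folds, per the chapter text.
FOLD_TARGET = {
    "DDD": "PROJECT.md", "SDD": "PROJECT.md", "UDD": "PROJECT.md",
    "TDD": "CONVENTIONS.md", "ADD": "CONVENTIONS.md",
}

DELTA_RE = re.compile(
    r"^- \[(?P<tag>[^·\]]+)·(?P<status>[^\]]+)\]\s*(?P<learning>.*?)\s*"
    r"\(evidence:(?P<evidence>[^)]*)\)\s*$"
)


def lint_delta(line: str) -> list:
    """Return a list of problems for one delta entry (empty list means well-formed)."""
    match = DELTA_RE.match(line.strip())
    if not match:
        return ["malformed delta entry"]
    problems = []
    if match["tag"].strip() not in COMPETENCIES:
        problems.append("unknown competency tag")
    if match["status"].strip() not in STATUSES:
        problems.append("invalid status")
    if not match["evidence"].strip():
        problems.append("empty evidence")
    return problems
```

The linter only checks shape. Deciding whether a delta should be folded or rejected stays outside it, consistent with the chapter's point that there is deliberately no automated fold.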
|
package/docs/10-setup-and-stages.md
CHANGED
@@ -6,26 +6,34 @@ This chapter covers two operational matters: what you set up once per project, a
 
 ---
 
-##
+## Setup: the AI drafts, you lock down
 
-Before the first feature,
+Before the first feature, the project needs a foundation — but standing it up is no longer your chore. Point ADD at the repo and **the AI does the drafting**: it runs `init` itself, reads what is there, and fills the foundation the whole project depends on. Your single act is the **lock-down** — the one human gate that freezes it.
+
+**What the AI drafts.** From an existing codebase it works **silently** — the code answers the questions a setup interview would ask. On an empty repo it runs a short **four-lens interview** (domain · spec · users · decisions), then drafts. Either way it fills the survivor layer — the files that outlive all code — and drafts the first milestone's scope and the first task's candidate contract:
 
 | Item | File | Purpose |
 |------|------|---------|
-|
+| Foundation | `PROJECT.md` | domain · active spec · UI/UX · key decisions — the context every task reads first |
 | Conventions | `CONVENTIONS.md` | naming, layout, language, formatter — the survivor layer |
 | Model record | `MODEL_REGISTRY.md` | which AI model and version the project uses, for reproducibility and audit |
 | Dependency allow-list | `dependencies.allowlist` | the packages the AI may use; the pipeline rejects others |
 | Prompt playbook | `playbook/` | the six prompts from [Appendix B](./appendix-b-prompts.md) |
+| Repository + pipeline | — | runs the gates on every change |
+
+Every drafted decision is tagged **evidence-grounded** (read from the code) or **guessed** (thin or inferred) and listed least-sure-first in a `SETUP-REVIEW.md`, so the one signature you give is informed rather than a rubber stamp.
+
+**The lock-down.** The AI presents `SETUP-REVIEW.md`; you check the `guessed` rows; you **lock** — once. That single act freezes the foundation, the first scope, and the first contract together. It is the setup-altitude analog of the [contract freeze](./05-step-3-contract.md), and it doubles as the first task's contract approval — so there is no separate sign-off. Before the lock the engine lets the AI draft but refuses to cross into build; after it, the build opens.
 
 **Setup exit check**
 
+- [ ] Foundation + survivors drafted (brownfield: from the code, evidence-tagged; greenfield: from the interview, gaps flagged `guessed`).
+- [ ] `SETUP-REVIEW.md` lists every drafted decision least-sure-first.
+- [ ] The model is pinned; the allow-list exists and the pipeline fails on any package outside it.
 - [ ] The pipeline runs and is green on the empty skeleton.
-- [ ] The
-- [ ] The allow-list exists and the pipeline fails on any package outside it.
-- [ ] The playbook is present.
+- [ ] The human **locked down** — and only then did the first feature's build open.
 
-Do not start a feature until the pipeline is green
+Do not start a feature until the pipeline is green and the foundation is locked. The lock-down turns the AI's draft into committed direction; the pipeline enforces every later exit check without anyone having to remember to.
 
 ---
 
@@ -73,3 +81,24 @@ The durable thing is never the code:
|
|
|
73
81
|
| MVP → Production | nothing | everything; the code is real and is hardened |
|
|
74
82
|
|
|
75
83
|
The survivor layer thickens as you move right: a prototype leaves you a validated design; a proof of concept adds a proven approach and a contract; the MVP adds real, kept code. By production, you are hardening, not rebuilding.
|
|
84
|
+
|
|
85
|
+
---
|
|
86
|
+
|
|
87
|
+
## Parallel streams (opt-in)
|
|
88
|
+
|
|
89
|
+
The default is one task at a time. But when a milestone holds several tasks whose dependencies are already `PASS` and a reviewer is ready, you may run them **concurrently** — one worker per ready task, each building behind its own frozen contract.
|
|
90
|
+
|
|
91
|
+
**Be honest about the gain.** With one human reviewer you cannot beat `review_time × N_tasks`; the human-led seams are serial. So the win is **not throughput** — it is that the reviewer is *never blocked waiting on a build*. While a person reviews task A's front, the builds for B, C, and D run behind *their* frozen contracts. You hide build latency under human latency; do not promise more.
|
|
92
|
+
|
|
93
|
+
**Two queues, no new state** — both read from `add.py status`:
|
|
94
|
+
|
|
95
|
+
- **READY-QUEUE** — tasks in the active milestone where the phase is not `done` and every dependency already reads `gate=PASS`. These are the only tasks a worker may pick up; a task finishing `PASS` unblocks its dependents on the next `status`.
|
|
96
|
+
- **REVIEW-QUEUE** — the irreducibly serial part: the **one-approval front** (contract freeze) and any **Verify escalation**. One human, one queue, presented one at a time — never a batch that invites a rubber stamp.
|
|
97
|
+
|
|
98
|
+
**The autonomy dial is the throttle.** At `conservative`, both gates queue on the human (pure pipelining — builds overlap, nothing auto-resolves). At `auto` (the default), only the front seam and residue escalations queue; Verify auto-PASSes on evidence, so real concurrency follows. The floor never drops below **one human approval per task, at the contract seam**.
|
|
99
|
+
|
|
100
|
+
**Design for failure (required).** Lease each task to its worker with a timeout — if a worker dies, release the claim back to READY rather than trusting partial work. A worker that hits a stop-and-escalate blocks only its own task; siblings keep running. And if several workers fail in one wave, trip a circuit-breaker and fall back to sequential — repeated failure means the scope was wrong, not the parallelism.
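The lease and circuit-breaker ideas can be sketched in a few lines. This is an illustrative in-memory model under stated assumptions, not the orchestrator shipped in `streams.md`: the class `LeaseBoard`, its fields, and the failure threshold are all hypothetical, and time is passed in explicitly so the behaviour is deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class Lease:
    worker: str
    expires_at: float


@dataclass
class LeaseBoard:
    timeout_s: float
    max_failures_per_wave: int = 2  # hypothetical threshold
    leases: dict[str, Lease] = field(default_factory=dict)
    failures: int = 0

    def claim(self, slug: str, worker: str, now: float) -> bool:
        """Lease a task to a worker; refuse if a live lease exists."""
        self.reap(now)
        if slug in self.leases:
            return False
        self.leases[slug] = Lease(worker, now + self.timeout_s)
        return True

    def reap(self, now: float) -> list[str]:
        """Release expired leases back to READY and count each as a failure."""
        expired = [s for s, l in self.leases.items() if l.expires_at <= now]
        for slug in expired:
            del self.leases[slug]  # partial work is not trusted
            self.failures += 1
        return expired

    def tripped(self) -> bool:
        """Circuit-breaker: too many failures means fall back to sequential."""
        return self.failures >= self.max_failures_per_wave
```

A dead worker never has to report its own death in this design: the lease simply expires and the task returns to READY. A real orchestrator would also reset `failures` at the start of each wave, which this sketch leaves out.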
|
|
101
|
+
|
|
102
|
+
**The hard boundary.** The orchestrator owns every shared write — `state.json`, `MILESTONE.md`, and each `add.py advance`/`gate` call (always with the explicit task slug). A worker owns only its own task directory and is isolated in a git worktree, so concurrent builds cannot collide. Merge is **serial**: bring worktrees back one at a time and run an **integration Verify** for the concurrency and architecture conflicts that two-green-in-isolation tasks can still produce — automation never auto-passes that step.
|
|
103
|
+
|
|
104
|
+
The full, agent-agnostic worker contract (the prompt a worker runs) and the per-runner spawn adapter live in the skill's `streams.md`; this section is the *why* and the safety frame, not the operational recipe.
|
package/docs/13-adoption.md
CHANGED
|
@@ -10,7 +10,7 @@ How a team starts using AIDD, and how a new person becomes productive in it.
|
|
|
10
10
|
|
|
11
11
|
Adopt the method on one real product, not as an all-at-once mandate.
|
|
12
12
|
|
|
13
|
-
1. **Days 1–15 —
|
|
13
|
+
1. **Days 1–15 — Lock the foundation.** On one pilot service, let the AI draft the foundation — conventions, glossary, dependency allow-list, model record — from the existing code (or the four-lens interview if greenfield), then **lock it down** with one signature. The prompt playbook is [Appendix B](./appendix-b-prompts.md).
|
|
14
14
|
2. **Days 16–45 — One feature, end to end.** Run a single feature through the whole flow at the **Express** profile. Capture friction; tune the prompts' golden cases as you go.
|
|
15
15
|
3. **Days 46–75 — Turn on the gates.** Wire the three reports and the gate-fail protocol into the pipeline; introduce the autonomy ladder at the generate-behind-gate level.
|
|
16
16
|
4. **Days 76–90 — Promote.** Move the pilot to the **Standard** profile, draft the **Regulated** variant for any compliance-bound product, and publish the prompts as a shared, versioned playbook.
|
|
@@ -53,7 +53,7 @@ Switching tools changes the discovery convention and nothing structural.
|
|
|
53
53
|
| Role | First-week task |
|
|
54
54
|
|------|------------------|
|
|
55
55
|
| Product / Domain | run the Specify prompt on a real input; produce a glossary you would defend |
|
|
56
|
-
| Architect / Lead |
|
|
56
|
+
| Architect / Lead | review the AI's setup draft and lock it down (the first contract freezes with it); wire the architecture check into the pipeline |
|
|
57
57
|
| Engineer (Senior) | run the Build prompt on one small task; produce a full evidence bundle |
|
|
58
58
|
| Engineer (Junior) | take a handed-over spec; make a red test green without weakening it |
|
|
59
59
|
| QA / Test | convert one rule into a scenario, then a failing test |
|
package/docs/14-foundation.md
CHANGED
|
@@ -74,7 +74,10 @@ one short section each, plus an append-only record of key decisions:
|
|
|
74
74
|
Keep it to one screen. If a section wants to grow into a manual, that is a signal
|
|
75
75
|
the detail belongs in a milestone or a contract, not the foundation. The foundation
|
|
76
76
|
is the *thin, durable* context the engine reads first — not a place to relocate the
|
|
77
|
-
work.
|
|
77
|
+
work. And you do not hand-write it: at setup the AI **drafts** all four sections —
|
|
78
|
+
silently from an existing codebase, or from a short four-lens interview on a
|
|
79
|
+
greenfield repo — and a single human **lock-down** freezes that draft as committed
|
|
80
|
+
direction (the setup-altitude analog of a contract freeze).
|
|
78
81
|
|
|
79
82
|
## How it feeds the engine — and takes feedback back
|
|
80
83
|
|
|
@@ -105,12 +108,17 @@ life of the product, owned above any single milestone.
|
|
|
105
108
|
A milestone is a *version bump* to the foundation, not a fresh start: when it
|
|
106
109
|
closes, fold what it validated into `PROJECT.md` (a decision, a settled domain
|
|
107
110
|
term, a confirmed user journey) and open the next one against the same, now-richer,
|
|
108
|
-
ground.
|
|
111
|
+
ground. The fold is not informal: each loop emits **competency deltas** (tagged
|
|
112
|
+
`DDD · SDD · UDD · TDD · ADD`) in its Observe step, and at milestone close a person
|
|
113
|
+
gathers the open ones and folds them — append-only, with the `foundation-version:`
|
|
114
|
+
bumped — into the foundation. See [09 · The loop](./09-the-loop.md#competency-deltas-and-the-foundation-fold)
|
|
115
|
+
for the grammar, the ritual, and the tooling (`add.py deltas`, `add.py check`).
|
|
109
116
|
|
|
110
117
|
## In the tooling
|
|
111
118
|
|
|
112
|
-
- `add.py init` scaffolds `PROJECT.md` as a survivor file
|
|
113
|
-
|
|
119
|
+
- `add.py init` scaffolds `PROJECT.md` as a survivor file; the AI then drafts its
|
|
120
|
+
content and a single human **lock-down** (`add.py lock`) freezes it. Like every
|
|
121
|
+
survivor file, `init` **never overwrites a hand-edited one**.
|
|
114
122
|
- `add.py status` shows a one-line pointer to the foundation, so a fresh session
|
|
115
123
|
re-orients on context before code.
|
|
116
124
|
- The guideline block written into `CLAUDE.md` / `AGENTS.md` tells any agent the
|
package/docs/appendix-f-requirements-matrix.md
CHANGED
|
@@ -12,7 +12,7 @@ This appendix maps every AIDD document to a three-level project hierarchy, so th
|
|
|
12
12
|
|-------|-----------|--------------|-------|
|
|
13
13
|
| **Project** | the whole product or engagement | the survivor layer — documents created once and kept for the life of the product | all milestones |
|
|
14
14
|
| **Milestone** | a stage or release | one pass of the flow at a chosen depth: Prototype, POC, MVP, or Production-Ready; groups many tasks | many tasks |
|
|
15
|
-
| **Task** | one feature through the flow | a single pass of Specify → … → Verify; the smallest unit with its own gate records | the
|
|
15
|
+
| **Task** | one feature through the flow | a single pass of Specify → … → Verify → Observe; the smallest unit with its own gate records | the seven steps |
|
|
16
16
|
|
|
17
17
|
A **project** sets up the survivor-layer documents once. A **milestone** is a depth-bounded goal that groups tasks and has its own entry and exit document gates. A **task** is one feature, and it produces the per-feature artifacts.
|
|
18
18
|
|
|
@@ -87,7 +87,7 @@ Which documents must exist, and at what depth, to **exit** each milestone. Depth
|
|
|
87
87
|
|
|
88
88
|
---
|
|
89
89
|
|
|
90
|
-
## Matrix 3 — Documents required per task (the
|
|
90
|
+
## Matrix 3 — Documents required per task (the seven steps)
|
|
91
91
|
|
|
92
92
|
Every task, regardless of milestone, produces this artifact chain. The depth varies by milestone (Matrix 2); the *sequence and exit gate* do not.
|
|
93
93
|
|
|
@@ -98,9 +98,10 @@ Every task, regardless of milestone, produces this artifact chain. The depth var
|
|
|
98
98
|
| 3 Contract | `contracts/<task>.md` | frozen + contract tests green | [05](./05-step-3-contract.md) |
|
|
99
99
|
| 4 Tests | `tests/<task>_*` | one test per scenario, red first | [06](./06-step-4-tests.md) |
|
|
100
100
|
| 5 Build | source code + evidence bundle | all tests green, nothing weakened | [07](./07-step-5-build.md) |
|
|
101
|
-
| 6 Verify | gate outcome record | `PASS` / `RISK-ACCEPTED` / `HARD-STOP` | [08](./08-step-6-verify.md) |
|
|
101
|
+
| 6 Verify | gate outcome record | `PASS` / `RISK-ACCEPTED` / `HARD-STOP` (auto-resolved on evidence under `autonomy: auto`; security always escalates) | [08](./08-step-6-verify.md) |
|
|
102
|
+
| 7 Observe | `TASK.md` §7 OBSERVE block | released behind a flag; scenario-monitors live; spec delta + competency deltas captured | [09](./09-the-loop.md) |
|
|
102
103
|
|
|
103
|
-
A task is **done**
|
|
104
|
+
A task is **done** when the build's documents exist and the Verify record reads `PASS` (or a signed `RISK-ACCEPTED`); the seventh step — **Observe** (§7) — then runs in production and feeds the next loop's Specify. See the master shippable checklist in [Appendix E](./appendix-e-checklists.md).
|
|
104
105
|
|
|
105
106
|
---
|
|
106
107
|
|
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "@pilotspace/add",
|
|
3
|
-
"version": "1.
|
|
3
|
+
"version": "1.1.0",
|
|
4
4
|
"description": "ADD (AI-Driven Development) — a minimal, state-tracked Claude Code skill that drives every feature through Specify → Scenarios → Contract → Tests → Build → Verify → Observe. Ships the AIDD book as its trust layer.",
|
|
5
5
|
"bin": {
|
|
6
6
|
"add": "bin/cli.js"
|
|
@@ -9,7 +9,8 @@
|
|
|
9
9
|
"access": "public"
|
|
10
10
|
},
|
|
11
11
|
"scripts": {
|
|
12
|
-
"test": "python3 -m unittest discover -s tooling -p 'test_*.py'"
|
|
12
|
+
"test": "python3 -m unittest discover -s tooling -p 'test_*.py'",
|
|
13
|
+
"prepublishOnly": "python3 -m unittest discover -s tooling -p 'test_packaging.py'"
|
|
13
14
|
},
|
|
14
15
|
"files": [
|
|
15
16
|
"bin/",
|
|
@@ -18,7 +19,8 @@
|
|
|
18
19
|
"tooling/templates/",
|
|
19
20
|
"docs/",
|
|
20
21
|
"README.md",
|
|
21
|
-
"GETTING-STARTED.md"
|
|
22
|
+
"GETTING-STARTED.md",
|
|
23
|
+
"CHANGELOG.md"
|
|
22
24
|
],
|
|
23
25
|
"keywords": [
|
|
24
26
|
"ai-driven-development",
|