@pilotspace/add 1.0.0 → 1.1.0

package/CHANGELOG.md ADDED
@@ -0,0 +1,48 @@
1
+ # Changelog
2
+
3
+ All notable changes to the ADD method (`@pilotspace/add` on npm,
4
+ `pilotspace-add` on PyPI) are documented here. The format follows
5
+ [Keep a Changelog](https://keepachangelog.com/); versions follow semver.
6
+
7
+ ## [1.1.0] — 2026-06-05
8
+
9
+ Production-ready enforcement: the gates are now verified by machinery distinct
10
+ from the agent, and any AI agent can follow the method through the CLI alone.
11
+
12
+ ### Added
13
+ - **`add.py audit [--json]`** — judgment-free, read-only verification that
14
+ human seams left well-formed records: a named human at every contract freeze,
15
+ exactly one gate outcome per done task, a human reviewer wherever the
16
+ security line carries a `NOTE`/`⚠` marker, no waivers on security. Exit 0
17
+ clean / exit 1 with `{task, code, detail}` findings.
18
+ - **Seam audit in CI** — a `seam-audit` job (this repo) plus a copy-paste
19
+ workflow for consumer projects (GETTING-STARTED "Enforce the seams in CI"):
20
+ a malformed seam record fails CI on a machine the agent does not control
21
+ (*never self-gate*, enforced).
22
+ - **The mechanized high-risk guard** — declare `risk: high` in a TASK.md
23
+ header and the engine refuses to complete the task (`PASS`/`RISK-ACCEPTED`)
24
+ until the dial is lowered to `autonomy: conservative`; error and audit
25
+ finding `unguarded_high_risk_auto`. Judging *what* is high-risk stays human;
26
+ the declared combination is enforced. `HARD-STOP` is never blocked.
27
+ - **Agent portability** — `add.py guide` now names the exact phase-guide file
28
+ to read (`guide : .claude/skills/add/phases/<n>-<phase>.md`, never a dead
29
+ pointer; additive `"guide"` key in `--json`), and the AGENTS.md/CLAUDE.md
30
+ block routes any agent — Claude, Cursor, Copilot, Codex — through the CLI
31
+ alone.
32
+ - **The freeze review checklist** — six ⚠-first lines inside the contract
33
+ phase guide that aim the human's one approval (intent · cases · shape ·
34
+ risk declaration · tests), never a second gate.
35
+
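The high-risk guard described in the Added list can be read as a small decision rule. The sketch below is one way to express it, not the engine's actual code: the `TaskHeader` class, the `guard_finding` function, and the `"normal"` default for `risk` are hypothetical names chosen for illustration. Only `risk: high`, `autonomy: conservative`, the three outcomes, and the `unguarded_high_risk_auto` code come from the changelog.

```python
from dataclasses import dataclass
from typing import Optional

# Outcomes that complete a task; HARD-STOP is never blocked by the guard.
COMPLETING_OUTCOMES = {"PASS", "RISK-ACCEPTED"}


@dataclass(frozen=True)
class TaskHeader:
    """Hypothetical view of the two TASK.md header fields the guard reads."""
    risk: str = "normal"      # assumed default; only "high" matters here
    autonomy: str = "auto"    # "auto" or "conservative"


def guard_finding(header: TaskHeader, outcome: str) -> Optional[str]:
    """Return the finding code if the outcome must be refused, else None."""
    if outcome not in COMPLETING_OUTCOMES:
        return None  # e.g. HARD-STOP: stopping is always allowed
    if header.risk == "high" and header.autonomy != "conservative":
        return "unguarded_high_risk_auto"
    return None
```

The rule only checks the declared combination. Deciding whether a task deserves `risk: high` in the first place stays with a person, as the changelog entry says.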
36
+ ### Changed
37
+ - GitHub Actions bumped off the deprecated Node-20 runtimes
38
+ (checkout v5, setup-python v6, setup-node v5).
39
+ - GETTING-STARTED: CI enforcement section + `guide :` orientation.
40
+
41
+ ## [1.0.0] — 2026-06-04
42
+
43
+ First public release: the seven-phase flow (specify → scenarios → contract →
44
+ tests → build → verify → observe) driven by one `TASK.md` per task, the
45
+ `add.py` state tracker (init · status · guide · report · check · gates ·
46
+ milestones · competency deltas · fold), the `add` skill for Claude Code, and
47
+ the full method book (`.add/docs/`). Installable via
48
+ `npx @pilotspace/add init` or `pip install pilotspace-add`.
@@ -42,20 +42,35 @@ contract → review the result.** Everything between is the agent.
42
42
 
43
43
  ## 0 · Prerequisites
44
44
 
45
- - **Node.js ≥ 16** (to install) and **Python 3.12+** (the tool is stdlib-only).
45
+ - **Python 3.10+** — required; the tool itself is stdlib-only (no pip dependencies).
46
+ - **One installer**, whichever you already have: **Node.js ≥ 18** (for `npx`) *or*
47
+ **pip** (Python). Both install the exact same `.add/` runtime.
46
48
  - A project folder. It can be empty or an existing repo.
47
49
 
50
+ > **Windows:** use `py` wherever this guide writes `python3` (the Python launcher on
51
+ > Windows) — e.g. `py .add\tooling\add.py status`. Both installers handle the install
52
+ > step for you; only the by-hand `add.py` commands below differ.
53
+
48
54
  ---
49
55
 
50
56
  ## 1 · Install
51
57
 
52
- From your project root:
58
+ From your project root, pick **one** path — both produce the same install:
59
+
60
+ **Option A — npm (Node.js ≥ 18):**
53
61
 
54
62
  ```bash
55
63
  npx @pilotspace/add init --name "My App" --stage prototype
56
64
  ```
57
65
 
58
- This creates `.add/` (your runtime), drops the `add` skill into
66
+ **Option B — pip (Python 3.10+):**
67
+
68
+ ```bash
69
+ pip install pilotspace-add
70
+ pilotspace-add init --name "My App" --stage prototype
71
+ ```
72
+
73
+ Either one creates `.add/` (your runtime), drops the `add` skill into
59
74
  `.claude/skills/add/`, and bundles the book into `.add/docs/`. Pick the stage that
60
75
  matches your intent — `prototype`, `poc`, `mvp`, or `production`. You can change it
61
76
  later with `python3 .add/tooling/add.py stage mvp`.
@@ -87,7 +102,9 @@ python3 .add/tooling/add.py guide
87
102
 
88
103
  `status` tells you *where* you are; `guide` tells you *what to do next* — the active
89
104
  task's phase, the one concrete next action, the chapter to read, and the exact command
90
- to run once that phase is done.
105
+ to run once that phase is done. Its `guide :` line names the phase-guide file to
106
+ read for the current phase (`.claude/skills/add/phases/…` — plain markdown), which is
107
+ how **any** agent — Claude, Cursor, Copilot, Codex — follows ADD through the CLI alone.
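An agent or script that wants the phase-guide path from the plain-text output could pick out the `guide :` line along these lines. This is a rough sketch under assumptions: the exact layout of the `add.py guide` output is not shown in this diff, so the `key : value` parsing, the `guide_path` helper, and the sample file name `3-contract.md` are illustrative guesses. The `--json` output with its `"guide"` key is probably the more robust route.

```python
from typing import Optional


def guide_path(output: str) -> Optional[str]:
    """Return the value of the first `guide : <path>` line, or None if absent."""
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "guide":
            return value.strip()
    return None


# Hypothetical sample output; real output will contain more lines.
sample_output = "phase : contract\nguide : .claude/skills/add/phases/3-contract.md"
print(guide_path(sample_output))
```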
91
108
 
92
109
  ---
93
110
 
@@ -211,6 +228,51 @@ code 0 means healthy — handy as a CI gate.
211
228
 
212
229
  ---
213
230
 
231
+ ## Enforce the seams in CI
232
+
233
+ `add.py audit` re-verifies every recorded human gate on your board — a named
234
+ human at each contract freeze, exactly one gate outcome per done task, a human
235
+ reviewer wherever the security line carries a note. It exits non-zero naming
236
+ the task and the finding, which makes it a CI gate: enforcement runs on a
237
+ machine the agent does not control, so the agent can never stamp its own work
238
+ green (*never self-gate*).
239
+
240
+ Drop this workflow into `.github/workflows/seam-audit.yml`:
241
+
242
+ ```yaml
243
+ name: seam-audit
244
+
245
+ on:
246
+   push:
247
+     branches: [main]
248
+   pull_request:
249
+
250
+ permissions:
251
+   contents: read
252
+
253
+ jobs:
254
+   seam-audit:
255
+     name: Seam audit (recorded human gates)
256
+     runs-on: ubuntu-latest
257
+     steps:
258
+       - uses: actions/checkout@v4
259
+
260
+       - uses: actions/setup-python@v5
261
+         with:
262
+           python-version: '3.12'
263
+
264
+       - name: Audit recorded human seams
265
+         run: python3 .add/tooling/add.py audit
266
+ ```
267
+
268
+ The command is the same one you can run locally — the installer already placed
269
+ `add.py` at `.add/tooling/add.py`, and the audit is read-only (it never edits
270
+ your board). A red `seam-audit` job means a seam record is malformed or a
271
+ security note was left to the auto-gate; fix the record (or escalate the gate
272
+ to a human), never the auditor.
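When a `seam-audit` job goes red, it can help to group the findings by task before fixing records. The sketch below assumes that `add.py audit --json` prints a JSON list of objects carrying the `task`, `code`, and `detail` keys named in the changelog. The top-level shape is an assumption, and the sample task slug and detail text are invented for illustration.

```python
import json
from collections import defaultdict


def group_findings(raw_json: str) -> dict:
    """Group audit findings by task as 'code: detail' strings."""
    findings = json.loads(raw_json)  # assumed: a JSON list of finding objects
    by_task = defaultdict(list)
    for finding in findings:
        by_task[finding["task"]].append(f"{finding['code']}: {finding['detail']}")
    return dict(by_task)


# Hypothetical finding; only the three key names come from the changelog.
sample = json.dumps([
    {"task": "login-form",
     "code": "unguarded_high_risk_auto",
     "detail": "risk: high with autonomy: auto"},
])
print(group_findings(sample))
```

In CI the plain command and its exit code are enough. A helper like this is only for reading a long findings list locally.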
273
+
274
+ ---
275
+
214
276
  ## 6 · Resume next session
215
277
 
216
278
  Close your laptop, come back tomorrow, run:
package/README.md CHANGED
@@ -39,8 +39,8 @@ npx @pilotspace/add init --name "My App" --stage prototype
39
39
 
40
40
  ```bash
41
41
  # Python / pip
42
- pip install add-method
43
- add-method init --name "My App" --stage prototype
42
+ pip install pilotspace-add
43
+ pilotspace-add init --name "My App" --stage prototype
44
44
  ```
45
45
 
46
46
  **New here?** Follow the [10-minute Quickstart](./GETTING-STARTED.md) — it walks
package/bin/cli.js CHANGED
@@ -10,15 +10,17 @@
10
10
  * <target>/.claude/skills/add/ (the skill Claude loads)
11
11
  * <target>/.add/tooling/ (add.py scaffolder + state tracker)
12
12
  * <target>/.add/docs/ (the AIDD book — the trust layer)
13
- * Then runs `add.py init` to create .add/state.json and survivor files.
13
+ * It DROPS FILES ONLY — it does NOT run `add.py init`. Initialisation is deferred to
14
+ * the AI (via `/add`, which runs `init --await-lock` to arm the v12 lock-down gate) or
15
+ * to a CLI user. A pre-run plain init would grandfather-lock the gate before `/add` runs
16
+ * AND consume the brownfield signal in the terminal, where the AI never sees it.
14
17
  *
15
- * Zero npm dependencies. Designed for failure: verifies sources, never clobbers
16
- * an existing state.json, and degrades gracefully if python3 is absent.
18
+ * Zero npm dependencies, no Python needed at install time. Designed for failure:
19
+ * verifies sources exist before copying, never clobbers an existing skill.
17
20
  */
18
21
 
19
22
  const fs = require("fs");
20
23
  const path = require("path");
21
- const { spawnSync } = require("child_process");
22
24
 
23
25
  const PKG_ROOT = path.resolve(__dirname, "..");
24
26
 
@@ -39,24 +41,23 @@ function parseArgs(argv) {
39
41
  return args;
40
42
  }
41
43
 
42
- function copyDir(src, dest, { skipIfExists } = {}) {
44
+ function copyDir(src, dest, { skipIfExists, cleanReplace } = {}) {
43
45
  if (!fs.existsSync(src)) fail("missing packaged source: " + src);
44
46
  if (skipIfExists && fs.existsSync(dest)) {
45
47
  warn(dest + " exists — leaving it untouched");
46
48
  return;
47
49
  }
50
+ // Clean replace: drop a stale dest before copying so a `--force` re-install can
51
+ // never leave orphaned files from a previous version behind. fs.cpSync merges
52
+ // (it never removes), so without this `--force` is a merge, not a replace. Mirrors
53
+ // _installer.py's `shutil.rmtree(skill_dest)` so npm and pip behave identically.
54
+ if (cleanReplace && fs.existsSync(dest)) {
55
+ fs.rmSync(dest, { recursive: true, force: true });
56
+ }
48
57
  fs.mkdirSync(path.dirname(dest), { recursive: true });
49
58
  fs.cpSync(src, dest, { recursive: true });
50
59
  }
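The comment above says the clean replace mirrors the `shutil.rmtree(skill_dest)` call in `_installer.py`. That file is not part of this diff, so the following is only a sketch of the same merge-versus-replace distinction in Python, not the actual installer. `copy_dir` and `demo` are hypothetical names.

```python
import shutil
import tempfile
from pathlib import Path


def copy_dir(src: Path, dest: Path, *, clean_replace: bool = False) -> None:
    """Copy src into dest; with clean_replace, delete dest first."""
    if not src.exists():
        raise FileNotFoundError(f"missing packaged source: {src}")
    if clean_replace and dest.exists():
        shutil.rmtree(dest)  # drop stale files so nothing is orphaned
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copytree(src, dest, dirs_exist_ok=True)  # merges when dest exists


def demo() -> tuple:
    """Return (stale file survives merge, stale file survives clean replace)."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        src, dest = root / "src", root / "dest"
        src.mkdir()
        (src / "new.md").write_text("v2")
        dest.mkdir()
        (dest / "old.md").write_text("v1")  # left over from a previous version
        copy_dir(src, dest)
        after_merge = (dest / "old.md").exists()
        copy_dir(src, dest, clean_replace=True)
        after_replace = (dest / "old.md").exists()
        return after_merge, after_replace
```

A plain recursive copy leaves `old.md` in place, which is the orphaned-file case the comment warns about. Removing the destination first is what turns `--force` into a true replace.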
51
60
 
52
- function hasPython() {
53
- for (const py of ["python3", "python"]) {
54
- const r = spawnSync(py, ["--version"], { stdio: "ignore" });
55
- if (r.status === 0) return py;
56
- }
57
- return null;
58
- }
59
-
60
61
  function cmdInit(args) {
61
62
  const target = path.resolve(args._[0] || ".");
62
63
  if (!fs.existsSync(target)) fail("target directory does not exist: " + target);
@@ -66,7 +67,7 @@ function cmdInit(args) {
66
67
  copyDir(
67
68
  path.join(PKG_ROOT, "skill", "add"),
68
69
  path.join(target, ".claude", "skills", "add"),
69
- { skipIfExists: !args.force }
70
+ { skipIfExists: !args.force, cleanReplace: args.force }
70
71
  );
71
72
  log(" ✓ skill -> .claude/skills/add/");
72
73
 
@@ -88,27 +89,18 @@ function cmdInit(args) {
88
89
  { skipIfExists: false });
89
90
  log(" ✓ trust docs -> .add/docs/ (the AIDD book)");
90
91
 
91
- // 4. run add.py init (idempotent — add.py refuses to clobber state.json)
92
- const py = hasPython();
93
- const addPy = path.join(toolingDest, "add.py");
94
- if (!py) {
95
- warn("python3 not found; skipping `add.py init`.");
96
- log("\nFinish setup manually once Python is available:");
97
- log(" python3 .add/tooling/add.py init" +
98
- (args.name ? ` --name "${args.name}"` : "") + ` --stage ${args.stage}`);
99
- return;
100
- }
101
- const initArgs = [addPy, "init", "--dir", target, "--stage", args.stage];
102
- if (args.name) initArgs.push("--name", args.name);
103
- if (args.force) initArgs.push("--force");
104
- const r = spawnSync(py, initArgs, { stdio: "inherit" });
105
- if (r.status !== 0 && r.status !== null) {
106
- warn("`add.py init` exited non-zero (state may already exist). Run `add.py status` to check.");
107
- }
108
-
109
- log("\nDone. In Claude Code, the `add` skill is now installed.");
110
- log("Next: open Claude Code, run `/add`, and say what you want to build —");
111
- log(" the agent sizes it into a milestone and drives the build with you.");
92
+ // NO step 4: the installer DROPS FILES ONLY. Initialisation is deferred to the AI
93
+ // (via `/add`) or a CLI user — a pre-run plain `add.py init` would grandfather-lock
94
+ // the v12 lock-down gate before `/add` runs (see file header). So no Python is run here.
95
+ log("\nDone. The `add` skill + tooling are installed (no project state yet — that's intentional).");
96
+ log("Next: open Claude Code, run `/add`, and say what you want to build — the agent");
97
+ log(" sets up the foundation, sizes it into a milestone, and drives the build with you;");
98
+ log(" you sign off once, at the lock-down.");
99
+ log("");
100
+ log("Prefer the CLI / not using Claude Code? Initialise it yourself (this arms the lock-down):");
101
+ const launcher = process.platform === "win32" ? "py" : "python3";
102
+ log(` ${launcher} .add/tooling/add.py init --await-lock --stage ${args.stage}` +
103
+ (args.name ? ` --name "${args.name}"` : ""));
112
104
  }
113
105
 
114
106
  function main() {
@@ -6,7 +6,7 @@
6
6
 
7
7
  ## The flow
8
8
 
9
- AIDD is one repeatable flow of six steps, followed by an observation loop. People perform the first four steps (with AI assistance), the AI performs the fifth (under direction), and people perform the sixth.
9
+ AIDD is one repeatable flow of **seven steps**: six build the feature — Specify → Scenarios → Contract → Tests → Build → Verify — and the seventh, **Observe**, feeds what production teaches back into the next Specify. In the default flow the AI drafts the front (steps 1–4) and a person approves it **once**, at the contract freeze; the AI performs the Build; and Verify is resolved on evidence under `autonomy: auto`, with a person owning any residue. (See [11 Governance](./11-governance.md) for the autonomy dial and the one-approval seam.)
10
10
 
11
11
  ![The ADD flow — a solid forward spine Specify→Scenarios→Contract→Tests→Build→Verify→Observe, with dashed backward-correction arrows (any phase may return to an earlier one), a Tests⇄Build red/green engine, and Observe looping back to the next Specify](./add-flow.png)
12
12
 
@@ -47,6 +47,8 @@ flowchart LR
47
47
 
48
48
  The shape is deliberate: the human-led steps establish direction, a frozen contract forms the seam in the middle, and the AI-led build runs fast and safely on the far side because everything it needs is already fixed.
49
49
 
50
+ > **What changed in v7 (the diagrams above show the structural spine, which is unchanged).** The *steps* and their order are exactly as drawn — only **who resolves them** moved. The AI now drafts the whole front (steps 1–4) and a person approves it **once**, at the contract freeze (not a sign-off at each step); and **Verify is auto-gated on evidence** under `autonomy: auto` (the default), escalating security — always a `HARD-STOP` — and other residue to a person. Lower the dial to `conservative` to keep a human at the Verify gate. See [11 Governance](./11-governance.md).
51
+
50
52
  ## Why the order is the order
51
53
 
52
54
  Each step produces exactly one artifact, and each artifact is the input to the next step. The order is not a preference; it is a dependency chain.
@@ -68,12 +70,13 @@ The flow runs in two directions under two rules that never conflict. **Backward
68
70
 
69
71
  | Step | Person's job | AI's job |
70
72
  |------|--------------|----------|
71
- | 1 Specify | decide and confirm the rules | draft; list assumptions to confirm |
72
- | 2 Scenarios | decide what "correct" looks like | draft scenarios |
73
- | 3 Contract | approve and freeze the shape | generate the contract and mocks |
74
- | 4 Tests | set the targets | generate failing tests |
73
+ | 1 Specify | confirm the rules (part of the one approval) | draft; list assumptions to confirm |
74
+ | 2 Scenarios | confirm what "correct" looks like (part of the one approval) | draft scenarios |
75
+ | 3 Contract | **approve & freeze the whole bundle (§1–§4) once — the seam** | draft the contract and mocks |
76
+ | 4 Tests | confirm the targets (part of the one approval) | draft the failing tests |
75
77
  | 5 Build | direct in small batches | implement until tests pass |
76
- | 6 Verify | confirm via evidence + judgment | (none; this is the human check) |
78
+ | 6 Verify | own the residue (security · concurrency · architecture); approve when `conservative` | gather evidence; **auto-PASS on complete evidence** under `autonomy: auto` |
79
+ | 7 Observe | read the signal; fold confirmed deltas into PROJECT.md | run behind a flag; emit competency deltas |
77
80
 
78
81
  ## What survives, and what is disposable
79
82
 
@@ -6,6 +6,8 @@
6
6
  > **Produces:** `features/<name>.feature`.
7
7
  > **Person's job:** decide what "correct" looks like in concrete situations. **AI's job:** draft the scenarios.
8
8
 
9
+ > **Part of the one-approval front (v7).** In the default flow these scenarios are drafted by the AI alongside the spec, contract, and failing tests as **one bundle**, approved by a person **once**, at the contract freeze — not signed off step by step. This chapter is how to get the scenarios *right*; [05 Contract](./05-step-3-contract.md) is where the bundle is frozen. See [11 Governance](./11-governance.md).
10
+
9
11
  ---
10
12
 
11
13
  ## Why turn rules into scenarios
@@ -6,6 +6,8 @@
6
6
  > **Produces:** `contracts/<name>.md` (plus a mock and contract tests).
7
7
  > **Person's job:** approve and freeze the shape. **AI's job:** generate the first draft, the mock, and the contract tests.
8
8
 
9
+ > **The one approval lands here (v7).** In the default flow the AI drafts the whole front — spec, scenarios, this contract, and the failing tests — as **one bundle**, and a person gives a **single approval at this freeze**. Freezing the contract is the one human gate of the front, not the third of three sign-offs; reject any part and the whole bundle returns to draft (backward correction, not failure). See [11 Governance](./11-governance.md).
10
+
9
11
  ---
10
12
 
11
13
  ## The seam of the whole method
@@ -6,6 +6,8 @@
6
6
  > **Produces:** a failing (red) automated test suite.
7
7
  > **Person's job:** set the targets and coverage. **AI's job:** generate the tests.
8
8
 
9
+ > **Part of the one-approval front (v7).** In the default flow these tests are drafted by the AI as part of the front **bundle** (spec · scenarios · contract · tests) and approved by a person **once**, at the contract freeze — the tests are part of what that single approval covers. They still must be **red before the build**. See [11 Governance](./11-governance.md).
10
+
9
11
  ---
10
12
 
11
13
  ## Why tests come before code
@@ -4,7 +4,7 @@
4
4
 
5
5
  > **Purpose:** confirm the result is correct and safe to release.
6
6
  > **Produces:** a reviewed change with a recorded outcome, ready to release.
7
- > **Person's job:** this entire step. There is no AI role here; it is the human check.
7
+ > **Who resolves it:** set per task by the `autonomy:` header. Under `autonomy: auto` (the default) the run resolves the gate on evidence; under `conservative`, or for any residue, it is the human's check. **Security always escalates to a human.**
8
8
 
9
9
  ---
10
10
 
@@ -14,6 +14,15 @@ The build produced passing tests. That is necessary but not sufficient. Verifica
14
14
 
15
15
  This needs care, because it is easy to misread. "Not by inspection" does not mean "do not look at the code." It means the *basis* of trust is the passing evidence plus a deliberate check of the specific things tests cannot easily catch — not a general impression that the code reads plausibly. Plausibility is exactly the trap: AI code is frequently plausible and wrong. So verification has two parts: confirm the evidence, then check the known blind spots.
16
16
 
17
+ ## Who resolves Verify — the evidence auto-gate
18
+
19
+ Verify can be resolved two ways, set per task by the `autonomy:` header (see [governance](./11-governance.md) and the autonomy dial):
20
+
21
+ - **Auto (the default).** When `autonomy: auto`, the run resolves the gate on **evidence** rather than waiting for a person — but only when *all* of these hold: every test green, coverage not decreased, no test weakened and no contract edited, the convergence loops dry, and **no residue** (security, concurrency, or architecture). It records `PASS` as *auto-resolved*, naming the run as the accountable owner — an explicit pass, not a skip. This is principle 7: a gate may be resolved by evidence when that evidence is sufficient and the result is logged.
22
+ - **Human.** When `autonomy: conservative`, or whenever the run finds residue it cannot judge, the gate stops for a person; the two parts below are theirs.
23
+
24
+ **Security is always a `HARD-STOP` and is never auto-passed, at any autonomy level.** The two parts that follow — confirm the evidence, then check the blind spots — are what *either* resolver works through; the only question is whether a person or the recorded run signs the outcome.
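The conditions above can be sketched as a single decision function. This is an illustration of the rule as the chapter states it, not the engine's code. The `Evidence` fields and the `ESCALATE` label (meaning "stop for a person") are hypothetical names, and treating incomplete evidence under `auto` as an escalation is an assumption, since the text only says when the gate may auto-pass.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    """Hypothetical evidence bundle; field names are illustrative."""
    all_tests_green: bool
    coverage_decreased: bool
    test_weakened: bool
    contract_edited: bool
    loops_dry: bool
    residue: set = field(default_factory=set)  # e.g. {"security", "concurrency"}


def resolve_verify(autonomy: str, ev: Evidence) -> str:
    """Return 'HARD-STOP', 'ESCALATE' (a person decides) or 'PASS' (auto-resolved)."""
    if "security" in ev.residue:
        return "HARD-STOP"          # never auto-passed, at any autonomy level
    if autonomy != "auto" or ev.residue:
        return "ESCALATE"           # conservative, or residue the run cannot judge
    complete = (ev.all_tests_green and not ev.coverage_decreased
                and not ev.test_weakened and not ev.contract_edited
                and ev.loops_dry)
    return "PASS" if complete else "ESCALATE"
```

The ordering matters: the security check comes first so that no autonomy setting can reach the auto-pass branch with a security residue present.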
25
+
17
26
  ## Part one — confirm the evidence
18
27
 
19
28
  - [ ] All tests pass.
@@ -49,7 +58,7 @@ A security finding is always a `HARD-STOP`; it is never waved through with a wai
49
58
  - [ ] Concurrency/timing of the risky operation is safe.
50
59
  - [ ] No exposed secrets, injection openings, or unexpected dependencies.
51
60
  - [ ] Layering and dependencies follow `CONVENTIONS.md`.
52
- - [ ] A person has reviewed and approved the change.
61
+ - [ ] The change is approved — by a person, **or** (under `autonomy: auto`, no residue) auto-resolved by the run as the recorded accountable owner.
53
62
  - [ ] An outcome is recorded (`PASS` / `RISK-ACCEPTED` / `HARD-STOP`).
54
63
 
55
64
  ## Common mistakes
@@ -33,6 +33,24 @@ Every defect, surprise, or new need is written up as a change to the specificati
33
33
 
34
34
  This is also where the AI returns to a useful role: summarizing telemetry, clustering errors into themes, and drafting the proposed spec delta for a person to review. But the production decisions — what to roll back, what to prioritize — remain human.
35
35
 
36
+ ## Competency deltas and the foundation fold
37
+
38
+ A spec delta feeds the *next feature*. But a loop also teaches the **method itself** — that the domain model missed a boundary, that a whole class of scenario was never tested, that a build convention helped or hurt. AIDD captures those as **competency deltas**: each is a single tagged learning, written in the Observe step, that marks which of the five competencies it sharpens.
39
+
40
+ | tag | competency | a delta here means you learned something about… |
41
+ |-----|------------|--------------------------------------------------|
42
+ | `DDD` | Domain | the domain model — an entity, rule, or boundary the spec assumed wrong |
43
+ | `SDD` | Spec | what the feature must do or reject — a missing or wrong requirement |
44
+ | `UDD` | UI/UX | the user-facing shape — a flow, affordance, or wording that misled |
45
+ | `TDD` | Test | how we prove correctness — a missing scenario, a flaky or hollow test |
46
+ | `ADD` | AI/build | how the AI builds — a harness, prompt, or convention that helped or hurt |
47
+
48
+ Each delta is one tagged entry — `- [COMPETENCY · status] the learning (evidence: a pointer)` — and the evidence is **required**: a failing scenario, a production signal, a review note. No evidence means it is an opinion, not a delta. The AI **emits** deltas as `open`; it never folds its own. Folding is judgment, and judgment is the human's — the same verify/observe seam that keeps the AI from grading its own work.
49
+
50
+ **The fold.** At milestone close (or on demand, when open deltas pile up), a person runs the fold ritual: **gather** every `open` delta across the milestone's tasks, **group** them by competency, **propose** the exact foundation edit for each, **confirm** with the human one by one, then **write** — append-only — flipping each delta to `folded` (merged) or `rejected` (considered and deliberately not merged, left in place so the trail survives), and bumping the `foundation-version:` marker. `DDD`/`SDD`/`UDD` deltas fold into the matching section of `PROJECT.md`; `TDD`/`ADD` fold into `CONVENTIONS.md` (they sharpen the engine, not the product); and **every** fold also appends one row to `PROJECT.md` §Key Decisions — the universal, auditable record of what the foundation learned.
51
+
52
+ **Tooling.** `add.py deltas` lists every open delta across the project (so nothing waiting to be folded is invisible); `add.py check` lints each delta's well-formedness — known competency tag, valid status, non-empty evidence. There is deliberately **no `add.py fold`**: the engine stays judgment-free, and the ritual lives with the human who owns it.
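The well-formedness rules for a delta entry are simple enough to sketch with a regular expression. This is not how `add.py check` is implemented (its source is not shown here). The `lint_delta` helper, the regex, and the sample evidence pointer are assumptions for illustration. The tags, the three statuses, and the fold targets are taken from the text above.

```python
import re

COMPETENCIES = {"DDD", "SDD", "UDD", "TDD", "ADD"}
STATUSES = {"open", "folded", "rejected"}
# Fold targets as described in the chapter.
FOLD_TARGET = {"DDD": "PROJECT.md", "SDD": "PROJECT.md", "UDD": "PROJECT.md",
               "TDD": "CONVENTIONS.md", "ADD": "CONVENTIONS.md"}

# Matches: - [COMPETENCY · status] the learning (evidence: a pointer)
DELTA_RE = re.compile(
    r"^- \[(?P<tag>\w+) · (?P<status>\w+)\] (?P<learning>.+) "
    r"\(evidence: (?P<evidence>.*)\)$"
)


def lint_delta(line: str) -> list:
    """Return a list of problems; an empty list means the entry is well-formed."""
    match = DELTA_RE.match(line.strip())
    if not match:
        return ["not a delta entry"]
    problems = []
    if match["tag"] not in COMPETENCIES:
        problems.append("unknown competency tag")
    if match["status"] not in STATUSES:
        problems.append("invalid status")
    if not match["evidence"].strip():
        problems.append("empty evidence")
    return problems
```

A lint like this checks form only. Whether the evidence actually supports the learning is the judgment the fold ritual leaves to a person.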
53
+
36
54
  ## Re-entrancy: the loop is the whole point
37
55
 
38
56
  Two principles converge here. *The flow is re-entrant* — any step can send you back to an earlier one — and *the flow is a loop* — production feeds the next specification. Together they mean the artifacts you built are never "finished"; they are living documents that the next cycle refines.
@@ -6,26 +6,34 @@ This chapter covers two operational matters: what you set up once per project, a
6
6
 
7
7
  ---
8
8
 
9
- ## One-time setup
9
+ ## Setup: the AI drafts, you lock down
10
10
 
11
- Before the first feature, establish the foundation the whole project depends on. Done once, it makes every later checkpoint enforceable automatically.
11
+ Before the first feature, the project needs a foundation — but standing it up is no longer your chore. Point ADD at the repo and **the AI does the drafting**: it runs `init` itself, reads what is there, and fills the foundation the whole project depends on. Your single act is the **lock-down**: the one human gate that freezes it.
12
+
13
+ **What the AI drafts.** From an existing codebase it works **silently** — the code answers the questions a setup interview would ask. On an empty repo it runs a short **four-lens interview** (domain · spec · users · decisions), then drafts. Either way it fills the survivor layer — the files that outlive all code — and drafts the first milestone's scope and the first task's candidate contract:
12
14
 
13
15
  | Item | File | Purpose |
14
16
  |------|------|---------|
15
- | Repository + pipeline | | runs the gates on every change |
17
+ | Foundation | `PROJECT.md` | domain · active spec · UI/UX · key decisions — the context every task reads first |
16
18
  | Conventions | `CONVENTIONS.md` | naming, layout, language, formatter — the survivor layer |
17
19
  | Model record | `MODEL_REGISTRY.md` | which AI model and version the project uses, for reproducibility and audit |
18
20
  | Dependency allow-list | `dependencies.allowlist` | the packages the AI may use; the pipeline rejects others |
19
21
  | Prompt playbook | `playbook/` | the six prompts from [Appendix B](./appendix-b-prompts.md) |
22
+ | Repository + pipeline | — | runs the gates on every change |
23
+
24
+ Every drafted decision is tagged **evidence-grounded** (read from the code) or **guessed** (thin or inferred) and listed least-sure-first in a `SETUP-REVIEW.md`, so the one signature you give is informed rather than a rubber stamp.
25
+
26
+ **The lock-down.** The AI presents `SETUP-REVIEW.md`; you check the `guessed` rows; you **lock** — once. That single act freezes the foundation, the first scope, and the first contract together. It is the setup-altitude analog of the [contract freeze](./05-step-3-contract.md), and it doubles as the first task's contract approval — so there is no separate sign-off. Before the lock the engine lets the AI draft but refuses to cross into build; after it, the build opens.
20
27
 
21
28
  **Setup exit check**
22
29
 
30
+ - [ ] Foundation + survivors drafted (brownfield: from the code, evidence-tagged; greenfield: from the interview, gaps flagged `guessed`).
31
+ - [ ] `SETUP-REVIEW.md` lists every drafted decision least-sure-first.
32
+ - [ ] The model is pinned; the allow-list exists and the pipeline fails on any package outside it.
23
33
  - [ ] The pipeline runs and is green on the empty skeleton.
24
- - [ ] The model is pinned.
25
- - [ ] The allow-list exists and the pipeline fails on any package outside it.
26
- - [ ] The playbook is present.
34
+ - [ ] The human **locked down** — and only then did the first feature's build open.
27
35
 
28
- Do not start a feature until the pipeline is green. It is the thing that will enforce every later exit check without anyone having to remember to.
36
+ Do not start a feature until the pipeline is green and the foundation is locked. The lock-down turns the AI's draft into committed direction; the pipeline enforces every later exit check without anyone having to remember to.
29
37
 
30
38
  ---
31
39
 
@@ -73,3 +81,24 @@ The durable thing is never the code:
73
81
  | MVP → Production | nothing | everything; the code is real and is hardened |
74
82
 
75
83
  The survivor layer thickens as you move right: a prototype leaves you a validated design; a proof of concept adds a proven approach and a contract; the MVP adds real, kept code. By production, you are hardening, not rebuilding.
84
+
85
+ ---
86
+
87
+ ## Parallel streams (opt-in)
88
+
89
+ The default is one task at a time. But when a milestone holds several tasks whose dependencies are already `PASS` and a reviewer is ready, you may run them **concurrently** — one worker per ready task, each building behind its own frozen contract.
90
+
91
+ **Be honest about the gain.** With one human reviewer you cannot beat `review_time × N_tasks`; the human-led seams are serial. So the win is **not throughput** — it is that the reviewer is *never blocked waiting on a build*. While a person reviews task A's front, the builds for B, C, and D run behind *their* frozen contracts. You hide build latency under human latency; do not promise more.
92
+
93
+ **Two queues, no new state** — both read from `add.py status`:
94
+
95
+ - **READY-QUEUE** — tasks in the active milestone where the phase is not `done` and every dependency already reads `gate=PASS`. These are the only tasks a worker may pick up; a task finishing `PASS` unblocks its dependents on the next `status`.
96
+ - **REVIEW-QUEUE** — the irreducibly serial part: the **one-approval front** (contract freeze) and any **Verify escalation**. One human, one queue, presented one at a time — never a batch that invites a rubber stamp.
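The READY-QUEUE rule can be sketched as a filter over task records. The `Task` class and its fields below are hypothetical. In practice the data would come from `add.py status`, whose output format is not shown here, and restricting the list to the active milestone is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Task:
    """Hypothetical task record; the real fields come from `add.py status`."""
    slug: str
    phase: str                      # e.g. "build", "verify", "done"
    gate: Optional[str] = None      # "PASS", "RISK-ACCEPTED", "HARD-STOP" or None
    depends_on: list = field(default_factory=list)


def ready_queue(tasks: list) -> list:
    """Slugs of tasks that are not done and whose dependencies all read gate=PASS."""
    by_slug = {task.slug: task for task in tasks}
    return [
        task.slug for task in tasks
        if task.phase != "done"
        and all(by_slug[dep].gate == "PASS" for dep in task.depends_on)
    ]
```

Because the queue is recomputed from current state each time, a task finishing `PASS` unblocks its dependents on the next call without any extra bookkeeping.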
97
+
98
+ **The autonomy dial is the throttle.** At `conservative`, both gates queue on the human (pure pipelining — builds overlap, nothing auto-resolves). At `auto` (the default), only the front seam and residue escalations queue; Verify auto-PASSes on evidence, so real concurrency follows. The floor never drops below **one human approval per task, at the contract seam**.
99
+
100
+ **Design for failure (required).** Lease each task to its worker with a timeout — if a worker dies, release the claim back to READY rather than trusting partial work. A worker that hits a stop-and-escalate blocks only its own task; siblings keep running. And if several workers fail in one wave, trip a circuit-breaker and fall back to sequential — repeated failure means the scope was wrong, not the parallelism.
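A lease with a timeout and a simple circuit-breaker might look like the following. This is a minimal in-memory sketch, not the orchestrator described in the skill's `streams.md`. The `Leases` class, the `should_fall_back` helper, and the failure threshold of 2 are all assumptions for illustration.

```python
import time
from typing import Optional


class Leases:
    """In-memory lease table: slug -> (worker, expiry). Illustrative only."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._held = {}

    def claim(self, slug: str, worker: str, now: Optional[float] = None) -> bool:
        """Lease a task to a worker; fail if a live lease already exists."""
        now = time.monotonic() if now is None else now
        holder = self._held.get(slug)
        if holder and holder[1] > now:
            return False                      # still leased to a live worker
        self._held[slug] = (worker, now + self.ttl)
        return True

    def release_expired(self, now: Optional[float] = None) -> list:
        """Drop expired leases and return their slugs (back to READY)."""
        now = time.monotonic() if now is None else now
        expired = [slug for slug, (_, expiry) in self._held.items() if expiry <= now]
        for slug in expired:
            del self._held[slug]
        return expired


def should_fall_back(failures_in_wave: int, threshold: int = 2) -> bool:
    """Circuit-breaker: several failures in one wave means go sequential."""
    return failures_in_wave >= threshold
```

Releasing an expired claim returns the task to READY rather than resuming the dead worker's partial output, which matches the rule of not trusting partial work.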
101
+
102
+ **The hard boundary.** The orchestrator owns every shared write — `state.json`, `MILESTONE.md`, and each `add.py advance`/`gate` call (always with the explicit task slug). A worker owns only its own task directory and is isolated in a git worktree, so concurrent builds cannot collide. Merge is **serial**: bring worktrees back one at a time and run an **integration Verify** for the concurrency and architecture conflicts that two-green-in-isolation tasks can still produce — automation never auto-passes that step.
103
+
104
+ The full, agent-agnostic worker contract (the prompt a worker runs) and the per-runner spawn adapter live in the skill's `streams.md`; this section is the *why* and the safety frame, not the operational recipe.
@@ -10,7 +10,7 @@ How a team starts using AIDD, and how a new person becomes productive in it.
10
10
 
11
11
  Adopt the method on one real product, not as an all-at-once mandate.
12
12
 
13
- 1. **Days 1–15 — Set the foundation.** Stand up the one-time setup on one pilot service: conventions, glossary, dependency allow-list, model record, and the prompt playbook ([Appendix B](./appendix-b-prompts.md)).
13
+ 1. **Days 1–15 — Lock the foundation.** On one pilot service, let the AI draft the foundation (conventions, glossary, dependency allow-list, model record) from the existing code, or from the four-lens interview if greenfield, then **lock it down** with one signature. The prompt playbook is [Appendix B](./appendix-b-prompts.md).
14
14
  2. **Days 16–45 — One feature, end to end.** Run a single feature through the whole flow at the **Express** profile. Capture friction; tune the prompts' golden cases as you go.
15
15
  3. **Days 46–75 — Turn on the gates.** Wire the three reports and the gate-fail protocol into the pipeline; introduce the autonomy ladder at the generate-behind-gate level.
16
16
  4. **Days 76–90 — Promote.** Move the pilot to the **Standard** profile, draft the **Regulated** variant for any compliance-bound product, and publish the prompts as a shared, versioned playbook.
@@ -53,7 +53,7 @@ Switching tools changes the discovery convention and nothing structural.
53
53
  | Role | First-week task |
54
54
  |------|------------------|
55
55
  | Product / Domain | run the Specify prompt on a real input; produce a glossary you would defend |
56
- | Architect / Lead | stand up setup and freeze one contract; wire the architecture check into the pipeline |
56
+ | Architect / Lead | review the AI's setup draft and lock it down (the first contract freezes with it); wire the architecture check into the pipeline |
57
57
  | Engineer (Senior) | run the Build prompt on one small task; produce a full evidence bundle |
58
58
  | Engineer (Junior) | take a handed-over spec; make a red test green without weakening it |
59
59
  | QA / Test | convert one rule into a scenario, then a failing test |
@@ -74,7 +74,10 @@ one short section each, plus an append-only record of key decisions:
74
74
  Keep it to one screen. If a section wants to grow into a manual, that is a signal
75
75
  the detail belongs in a milestone or a contract, not the foundation. The foundation
76
76
  is the *thin, durable* context the engine reads first — not a place to relocate the
77
- work.
77
+ work. And you do not hand-write it: at setup the AI **drafts** all four sections —
78
+ silently from an existing codebase, or from a short four-lens interview on a
79
+ greenfield repo — and a single human **lock-down** freezes that draft as committed
80
+ direction (the setup-altitude analog of a contract freeze).
78
81
 
79
82
  ## How it feeds the engine — and takes feedback back
80
83
 
@@ -105,12 +108,17 @@ life of the product, owned above any single milestone.
105
108
  A milestone is a *version bump* to the foundation, not a fresh start: when it
106
109
  closes, fold what it validated into `PROJECT.md` (a decision, a settled domain
107
110
  term, a confirmed user journey) and open the next one against the same, now-richer,
108
- ground.
111
+ ground. The fold is not informal: each loop emits **competency deltas** (tagged
112
+ `DDD · SDD · UDD · TDD · ADD`) in its Observe step, and at milestone close a person
113
+ gathers the open ones and folds them — append-only, with the `foundation-version:`
114
+ bumped — into the foundation. See [09 · The loop](./09-the-loop.md#competency-deltas-and-the-foundation-fold)
115
+ for the grammar, the ritual, and the tooling (`add.py deltas`, `add.py check`).
109
116
 
110
117
  ## In the tooling
111
118
 
112
- - `add.py init` scaffolds `PROJECT.md` as a survivor file and, like every
113
- survivor file, **never overwrites a hand-edited one**.
119
+ - `add.py init` scaffolds `PROJECT.md` as a survivor file; the AI then drafts its
120
+ content and a single human **lock-down** (`add.py lock`) freezes it. Like every
121
+ survivor file, `init` **never overwrites a hand-edited one**.
114
122
  - `add.py status` shows a one-line pointer to the foundation, so a fresh session
115
123
  re-orients on context before code.
116
124
  - The guideline block written into `CLAUDE.md` / `AGENTS.md` tells any agent the
@@ -12,7 +12,7 @@ This appendix maps every AIDD document to a three-level project hierarchy, so th
12
12
  |-------|-----------|--------------|-------|
13
13
  | **Project** | the whole product or engagement | the survivor layer — documents created once and kept for the life of the product | all milestones |
14
14
  | **Milestone** | a stage or release | one pass of the flow at a chosen depth: Prototype, POC, MVP, or Production-Ready; groups many tasks | many tasks |
15
- | **Task** | one feature through the flow | a single pass of Specify → … → Verify; the smallest unit with its own gate records | the six steps |
15
+ | **Task** | one feature through the flow | a single pass of Specify → … → Verify → Observe; the smallest unit with its own gate records | the seven steps |
16
16
 
17
17
  A **project** sets up the survivor-layer documents once. A **milestone** is a depth-bounded goal that groups tasks and has its own entry and exit document gates. A **task** is one feature, and it produces the per-feature artifacts.
18
18
 
@@ -87,7 +87,7 @@ Which documents must exist, and at what depth, to **exit** each milestone. Depth
87
87
 
88
88
  ---
89
89
 
90
- ## Matrix 3 — Documents required per task (the six steps)
90
+ ## Matrix 3 — Documents required per task (the seven steps)
91
91
 
92
92
  Every task, regardless of milestone, produces this artifact chain. The depth varies by milestone (Matrix 2); the *sequence and exit gate* do not.
93
93
 
@@ -98,9 +98,10 @@ Every task, regardless of milestone, produces this artifact chain. The depth var
98
98
  | 3 Contract | `contracts/<task>.md` | frozen + contract tests green | [05](./05-step-3-contract.md) |
99
99
  | 4 Tests | `tests/<task>_*` | one test per scenario, red first | [06](./06-step-4-tests.md) |
100
100
  | 5 Build | source code + evidence bundle | all tests green, nothing weakened | [07](./07-step-5-build.md) |
101
- | 6 Verify | gate outcome record | `PASS` / `RISK-ACCEPTED` / `HARD-STOP` | [08](./08-step-6-verify.md) |
101
+ | 6 Verify | gate outcome record | `PASS` / `RISK-ACCEPTED` / `HARD-STOP` (auto-resolved on evidence under `autonomy: auto`; security always escalates) | [08](./08-step-6-verify.md) |
102
+ | 7 Observe | `TASK.md` §7 OBSERVE block | released behind a flag; scenario-monitors live; spec delta + competency deltas captured | [09](./09-the-loop.md) |
102
103
 
103
- A task is **done** only when all six documents exist and the Verify record reads `PASS` (or a signed `RISK-ACCEPTED`). See the master shippable checklist in [Appendix E](./appendix-e-checklists.md).
104
+ A task is **done** when the build's documents exist and the Verify record reads `PASS` (or a signed `RISK-ACCEPTED`); the seventh step — **Observe** (§7) — then runs in production and feeds the next loop's Specify. See the master shippable checklist in [Appendix E](./appendix-e-checklists.md).
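The done rule stated above can be read as a small predicate. The sketch below is only an illustration under assumed names: `VerifyRecord`, `signed_by`, and `task_is_done` are hypothetical and do not reflect the record schema that `add.py` actually writes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VerifyRecord:
    outcome: str                      # "PASS", "RISK-ACCEPTED", or "HARD-STOP"
    signed_by: Optional[str] = None   # hypothetical field: who signed a risk acceptance


def task_is_done(build_docs_exist: bool, record: Optional[VerifyRecord]) -> bool:
    """Done = build documents exist AND Verify reads PASS or a signed RISK-ACCEPTED."""
    if not build_docs_exist or record is None:
        return False
    if record.outcome == "PASS":
        return True
    if record.outcome == "RISK-ACCEPTED":
        # An unsigned risk acceptance does not count.
        return bool(record.signed_by)
    return False  # HARD-STOP or any unknown outcome
```

Observe is deliberately absent from the predicate: per the text, it runs after the task is done and feeds the next loop's Specify.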
104
105
 
105
106
  ---
106
107
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@pilotspace/add",
3
- "version": "1.0.0",
3
+ "version": "1.1.0",
4
4
  "description": "ADD (AI-Driven Development) — a minimal, state-tracked Claude Code skill that drives every feature through Specify → Scenarios → Contract → Tests → Build → Verify → Observe. Ships the AIDD book as its trust layer.",
5
5
  "bin": {
6
6
  "add": "bin/cli.js"
@@ -9,7 +9,8 @@
9
9
  "access": "public"
10
10
  },
11
11
  "scripts": {
12
- "test": "python3 -m unittest discover -s tooling -p 'test_*.py'"
12
+ "test": "python3 -m unittest discover -s tooling -p 'test_*.py'",
13
+ "prepublishOnly": "python3 -m unittest discover -s tooling -p 'test_packaging.py'"
13
14
  },
14
15
  "files": [
15
16
  "bin/",
@@ -18,7 +19,8 @@
18
19
  "tooling/templates/",
19
20
  "docs/",
20
21
  "README.md",
21
- "GETTING-STARTED.md"
22
+ "GETTING-STARTED.md",
23
+ "CHANGELOG.md"
22
24
  ],
23
25
  "keywords": [
24
26
  "ai-driven-development",