@pilotspace/add 1.0.0 → 1.2.0

Files changed (54)
  1. package/CHANGELOG.md +88 -0
  2. package/GETTING-STARTED.md +172 -84
  3. package/README.md +14 -8
  4. package/bin/cli.js +39 -38
  5. package/docs/01-principles.md +3 -3
  6. package/docs/02-the-flow.md +20 -13
  7. package/docs/03-step-1-specify.md +13 -13
  8. package/docs/04-step-2-scenarios.md +3 -1
  9. package/docs/05-step-3-contract.md +4 -2
  10. package/docs/06-step-4-tests.md +3 -1
  11. package/docs/07-step-5-build.md +1 -1
  12. package/docs/08-step-6-verify.md +22 -4
  13. package/docs/09-the-loop.md +25 -1
  14. package/docs/10-setup-and-stages.md +52 -9
  15. package/docs/11-governance.md +2 -2
  16. package/docs/12-roles.md +3 -3
  17. package/docs/13-adoption.md +3 -3
  18. package/docs/14-foundation.md +19 -11
  19. package/docs/15-foundations-and-lineage.md +106 -0
  20. package/docs/README.md +4 -0
  21. package/docs/appendix-a-templates.md +3 -3
  22. package/docs/appendix-b-prompts.md +40 -5
  23. package/docs/appendix-c-glossary.md +42 -12
  24. package/docs/appendix-d-worked-example.md +2 -2
  25. package/docs/appendix-e-checklists.md +2 -2
  26. package/docs/appendix-f-requirements-matrix.md +12 -11
  27. package/docs/appendix-g-references.md +106 -0
  28. package/package.json +5 -3
  29. package/skill/add/SKILL.md +50 -21
  30. package/skill/add/adopt.md +67 -0
  31. package/skill/add/deltas.md +20 -8
  32. package/skill/add/fold.md +19 -17
  33. package/skill/add/graduate.md +74 -0
  34. package/skill/add/intake.md +22 -7
  35. package/skill/add/loop.md +59 -0
  36. package/skill/add/phases/0-setup.md +92 -24
  37. package/skill/add/phases/1-specify.md +23 -13
  38. package/skill/add/phases/2-scenarios.md +14 -4
  39. package/skill/add/phases/3-contract.md +38 -9
  40. package/skill/add/phases/4-tests.md +29 -5
  41. package/skill/add/phases/5-build.md +14 -4
  42. package/skill/add/phases/6-verify.md +38 -4
  43. package/skill/add/phases/7-observe.md +13 -5
  44. package/skill/add/report-template.md +106 -0
  45. package/skill/add/run.md +53 -34
  46. package/skill/add/scope.md +24 -2
  47. package/skill/add/setup-review.md +65 -0
  48. package/skill/add/streams.md +256 -0
  49. package/tooling/add.py +1388 -62
  50. package/tooling/templates/CONVENTIONS.md.tmpl +1 -1
  51. package/tooling/templates/GLOSSARY.md.tmpl +23 -0
  52. package/tooling/templates/MILESTONE.md.tmpl +1 -0
  53. package/tooling/templates/PROJECT.md.tmpl +4 -3
  54. package/tooling/templates/TASK.md.tmpl +39 -11
@@ -0,0 +1,59 @@
1
+ # The dynamic loop — open deltas and extras become the next tasks
2
+
3
+ A milestone is not done when its tasks are done — it is done when its **GOAL** is met.
4
+ This guide is the loop that drives a milestone toward that goal: turn what each task
5
+ leaves behind (open lessons, and work discovered but out of scope) into the next tasks,
6
+ and keep going until the exit criteria are all met.
7
+
8
+ You (the AI) **gather and propose**; the **human confirms**; the existing `add.py new-task`
9
+ creates each one. The engine never decides what the next task is — that is judgment.
10
+
11
+ ## The goal-gate (what holds the loop open)
12
+
13
+ `add.py milestone-done <slug>` REFUSES to close a milestone while its exit criteria are not
14
+ all met — it stops with `milestone_goal_unmet` and the milestone stays active. The exit-criteria
15
+ checkboxes in `MILESTONE.md` ARE the human's goal-met affirmation: the engine reads the
16
+ `- [x]`/`- [ ]` tally, it never judges whether the goal is met (the same trust model as reading a
17
+ recorded `PASS`). Checking the last box is the deliberate act that releases the gate.
18
+
19
+ The gate fires only when criteria exist. A milestone with no exit-criteria checkboxes closes as
20
+ before — write criteria into `MILESTONE.md` if you want the goal-gate to hold the milestone open.
21
+
22
+ `milestone-done` is the only way a milestone reaches `done`; `archive-milestone` and `compact`
23
+ both refuse a milestone that is not done. So the one gate is enough — there is no quiet way around it.
24
+
25
+ ## The loop
26
+
27
+ When every task is done but the goal is not, `add.py status` shows
28
+ `goal not met (m/n exit criteria)` where it would otherwise prompt to archive. That is the cue:
29
+
30
+ 1. **Gather** the carried inventory:
31
+ - open lessons — `add.py deltas` (the §7 OBSERVE deltas still `open`);
32
+ - the planned-but-unscaffolded tasks — the plan-vs-state line in `add.py status`;
33
+ - any reopened task — one a deepened verify returned to the flow (see below).
34
+ 2. **Propose** the next tasks: for each carried item worth doing now, draft a one-line task
35
+ (slug + title + why) and show the human. Group the trivial ones; do not propose noise.
36
+ 3. **Confirm** — the human accepts, edits, or declines each. No task is created without this.
37
+ 4. **Create** each accepted task — `add.py new-task <slug> --title "..."` — and run it through
38
+ the normal flow (specify → … → verify).
39
+ 5. **Repeat** until the work the goal needs is done.
40
+ 6. **Close** — when the goal is genuinely met, check the exit-criteria boxes in `MILESTONE.md`,
41
+ then `add.py milestone-done <slug>` succeeds (then consolidate the open deltas and archive).
42
+ Present the close via `report-template.md` — open with the ARC (goal · done · plan): the
43
+ milestone goal, the exit-criteria met that prove it, and the plan beyond the close.
44
+
45
+ ## Reopen is the verb; this loop is the trigger
46
+
47
+ When a deepened verify (the no-skim wiring / dead-code / semantic check) finds a criterion unmet
48
+ on a task already marked done, `add.py reopen <task> --to <phase> --reason "..."` returns it to the
49
+ flow with a recorded reason and a reset gate. `reopen` is the recorded action; deciding WHEN to
50
+ fire it — because a goal criterion is unmet — is this loop's job.
51
+
52
+ ## The reactivation residual (deferred)
53
+
54
+ A reopen fired inside the loop happens while the milestone is still **active** — the goal-gate held
55
+ it open, so it never reached done, and no reactivation is needed. The one residual — reopening a
56
+ task inside a milestone that was already closed — is surfaced by `add.py check` (a done milestone
57
+ with a live task reads as incoherent). Re-activating a closed milestone is **deferred**: resolve it
58
+ by hand for now (the loop's own design keeps in-flight milestones open), until a later task makes
59
+ milestone reactivation first-class.
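
The goal-gate described in the `loop.md` hunk above comes down to a checkbox tally. A minimal Python sketch of that idea, assuming the exit criteria are plain `- [x]` / `- [ ]` list items and that every checkbox in the text counts as an exit criterion. The names `tally_exit_criteria` and `can_close` are hypothetical and the regex is illustrative; none of this is taken from the real `add.py`:

```python
import re

# Hypothetical pattern: a markdown task-list item, checked or unchecked.
CHECKBOX = re.compile(r"^\s*- \[( |x|X)\]", re.MULTILINE)


def tally_exit_criteria(milestone_md: str) -> tuple[int, int]:
    """Return (met, total) for the checkbox items found in the text."""
    marks = CHECKBOX.findall(milestone_md)
    met = sum(1 for mark in marks if mark.lower() == "x")
    return met, len(marks)


def can_close(milestone_md: str) -> tuple[bool, str]:
    """Mirror the documented rule: refuse while any criterion is unchecked."""
    met, total = tally_exit_criteria(milestone_md)
    if total == 0:
        # No criteria written: the gate does not fire.
        return True, "no exit criteria"
    if met < total:
        return False, f"milestone_goal_unmet ({met}/{total} exit criteria)"
    return True, f"goal met ({met}/{total} exit criteria)"
```

The sketch only counts marks; like the documented engine, it never judges whether the goal itself is met.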
@@ -1,35 +1,103 @@
1
- # Phase 0 — Setup (once per project)
1
+ # Phase 0 — Setup (autonomous draft → one human baseline approval)
2
2
 
3
- Goal: make every later gate enforceable automatically. Do this once.
3
+ Goal: point ADD at a repo and **you** draft the whole foundation — domain, first-milestone scope,
4
+ and the first task's contract — then hand the human exactly one decision: the **baseline approval**. Brownfield
5
+ is silent (the code answers the questions); greenfield keeps a short interview. Either way, the human's
6
+ only gate is `add.py lock`. This is the setup-level analog of a task's one-approval contract freeze.
4
7
 
5
- ## Do
8
+ ## 1 · Zero-touch entry — you run init yourself
6
9
 
7
- 1. Initialise the runtime (creates `.add/` + survivor-layer files):
10
+ When there is no `.add/state.json`, do **not** tell the human to initialise; run it yourself. Infer the
11
+ project name and stage from the repo, and **arm the baseline-approval gate** with `--await-lock`:
12
+
13
+ ```bash
14
+ python3 .add/tooling/add.py init --name "<inferred from repo/dir>" --stage <prototype|poc|mvp|production> --await-lock
15
+ ```
16
+
17
+ - `--await-lock` is **required** here: it seeds an *unlocked* setup, which arms the gate so the engine
18
+ refuses a second task / crossing into build / a `gate` until you `lock`. A plain `init` is
19
+ grandfathered-locked — its gate never arms, and the closing `lock` would error `already_locked`.
20
+ - name + stage are **your judgment** (read them from the dir name, README, manifests); the engine stays
21
+ mechanical. Pick the stage from the ambition you hear: throwaway → `prototype`, one risky slice → `poc`,
22
+ narrow-but-real → `mvp`, full rigor → `production`.
23
+
24
+ `init` prints one of two things — **that is your branch**:
25
+ - a line starting `brownfield:` → there is existing code (go to **2a**);
26
+ - the greenfield closing (no `brownfield:`) → an empty repo (go to **2b**).
27
+
28
+ ## 2a · Brownfield — map it silently
29
+
30
+ The code answers the questions a greenfield interview would ask, so **read it instead of asking**. Open
31
+ `adopt.md` and follow it: fill each living-doc file from the code, never clobber an existing one, and tag
32
+ every decision `evidence-grounded` (cite the file) or `guessed`. Ask the human **nothing** at this step.
33
+
34
+ ## 2b · Greenfield — the 4-lens interview (kept): co-specify at foundation level
35
+
36
+ An empty repo has no code to read, so run the short interview. This is the **co-specify at foundation
37
+ level** move — the same diverge → converge → validate brainstorm a task's §1 uses (`phases/1-specify.md`),
38
+ lifted to the foundation. Ask the one load-bearing question per lens (diverge), draft the foundation
39
+ (converge), then rank where your confidence is lowest and show the top flag first (validate):
40
+
41
+ | Lens | The one question that unblocks the section |
42
+ |------|--------------------------------------------|
43
+ | Domain (DDD) | The 3–5 core nouns, and the one invariant that must NEVER break? |
44
+ | Spec (SDD) | The first milestone's outcome — and what's explicitly NOT in v1? |
45
+ | Users (UDD) | The primary user and the one job they hire this for? (or "no UI — surface is X") |
46
+ | Decisions | What's already decided that you'd regret re-litigating? (first Key Decision row) |
47
+
48
+ Ask only the live ones; skip what the request already answers. Rank your drafts lowest-confidence-first using the
49
+ one notation every scope level shares — `⚠ <assumption> — lowest confidence because <why>; if wrong: <cost>` — and
50
+ tag thin or inferred answers `guessed`.
51
+
52
+ ## 3 · Draft to the lock (both paths)
53
+
54
+ 1. **Fill the living documentation** (it outlives all code): `.add/PROJECT.md` (the foundation — Domain · Spec/active
55
+ milestone · UI/UX · Key Decisions, one screen), `CONVENTIONS.md`, `GLOSSARY.md`, `MODEL_REGISTRY.md`,
56
+ `dependencies.allowlist`. Brownfield: from the code. Greenfield: from the interview, gaps flagged `guessed`.
57
+ 2. **Size the first milestone** (read `scope.md`) and draft its `MILESTONE.md` — goal · scope · exit criteria
58
+ · breadth-first tasks.
59
+ 3. **Create the first task and draft its candidate specification bundle.** `new-task` is allowed pre-lock:
8
60
  ```bash
9
- python3 .add/tooling/add.py init --name "<project>" --stage prototype
61
+ python3 .add/tooling/add.py new-task <slug> --title "<first feature>"
10
62
  ```
11
- If the tool isn't there yet, the installer (`npx @pilotspace/add init`) placed it at
12
- `.add/tooling/add.py`.
13
- 2. Fill the survivor-layer files (they outlive all code):
14
- `.add/PROJECT.md`, **the foundation**: Domain (DDD) · Spec/Living-Document (SDD,
15
- active milestone) · UI/UX (UDD) · Key Decisions. Cross-milestone context the
16
- engine reads first. Keep it to one screen. Book: `docs/14-foundation.md`.
17
- `.add/CONVENTIONS.md`: language, folders, naming, lint, error-code style, architecture.
18
- - `.add/GLOSSARY.md` — one name per concept; used in specs, contracts, and code.
19
- `.add/MODEL_REGISTRY.md`: which AI model/version writes this project.
20
- `.add/dependencies.allowlist`: packages the AI may use; CI rejects others.
21
- 3. Confirm CI runs green on the empty skeleton before the first feature.
63
+ Draft §1 (specify) · §2 (scenarios) · §3 (contract). **Leave §3 `Status: DRAFT`**: the lock is its
64
+ approval (see §5). You MAY `advance` through specify → scenarios → contract → tests pre-lock, but the
65
+ engine **refuses crossing into build** until you `lock` (`setup_unlocked`). Sequence: bundle → lock → build.
66
+ 4. **Write `.add/SETUP-REVIEW.md`** per `setup-review.md`: every decision you drafted (foundation, scope,
67
+ first contract), **lowest-confidence-first**, each tagged `guessed` | `evidence-grounded`.
68
+
69
+ ## 4 · The one human gate: the baseline approval
70
+
71
+ Open the report with the ARC (goal · done · plan) per `report-template.md`, then present
72
+ `SETUP-REVIEW.md` lowest-confidence-first (the `guessed` rows are what the human must actually check). They
73
+ confirm **once**: an explicit yes to the baseline approval itself, in conversation; ambient agreement mid-stream is
74
+ not a confirmation. On that recorded confirmation, you run the lock with their name:
75
+
76
+ ```bash
77
+ python3 .add/tooling/add.py lock --by "<name>"
78
+ ```
79
+
80
+ Typing the command themselves stays the **escape hatch** — the decision is always the human's; you just
81
+ execute it. `lock` records the lock layers (foundation · scope · contract) in one atomic write and opens the
82
+ build. It is judgment-free — it does **not** parse `SETUP-REVIEW.md`; the human *reading* it is the review.
83
+
84
+ ## 5 · After the lock
85
+
86
+ - The lock **is** the first task's contract approval — the v7 specification-bundle approval and the baseline approval collapse
87
+ into this single signature. Do **not** ask for a separate contract-freeze sign-off (that double-gates).
88
+ - Stamp the first task's §3 `Status: FROZEN @ v1` (lock-authorized), then read `phases/5-build.md` — build is
89
+ now open. Everything before this signature, you drafted.
22
90
 
23
91
  ## Exit gate
24
92
 
25
- - [ ] `.add/state.json` exists (`add.py status` works).
26
- - [ ] `.add/PROJECT.md` foundation filled (domain · spec · UI/UX).
27
- - [ ] CONVENTIONS, GLOSSARY, MODEL_REGISTRY, allowlist filled.
28
- - [ ] Pipeline green on the skeleton.
93
+ <exit_gate>
94
+ - [ ] `.add/state.json` exists; setup was seeded unlocked (`--await-lock`) then locked.
95
+ - [ ] Living docs filled (brownfield: from code, tagged evidence-grounded; greenfield: from the interview).
96
+ - [ ] First task created; §1–§3 drafted; `.add/SETUP-REVIEW.md` written lowest-confidence-first.
97
+ - [ ] Human confirmed the baseline approval and `add.py lock --by` ran with their name; first task §3 `FROZEN @ v1`; build open.
98
+ </exit_gate>
29
99
 
30
100
  ## Next
31
101
 
32
- ```bash
33
- python3 .add/tooling/add.py new-task <slug> --title "<feature>"
34
- ```
35
- Then read `phases/1-specify.md`. · Book: `docs/10-setup-and-stages.md`.
102
+ After the lock, read `phases/5-build.md` (build is open). · Book: `docs/10-setup-and-stages.md`
103
+ *(note: book chapters 10 / 13 / 14 still describe the older human-led setup until `book-align` lands).*
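
The baseline-approval gate in the Phase 0 hunk above behaves like a small state machine. The sketch below models only the behavior the text states: `--await-lock` seeds an unlocked setup, a plain `init` is grandfathered-locked, `lock` on a locked setup errors `already_locked`, and crossing into build before the lock is refused with `setup_unlocked`. The class and function names are hypothetical, and the real `add.py` persists this in `.add/state.json` rather than in memory:

```python
from dataclasses import dataclass
from typing import Optional


class SetupError(Exception):
    """Carries one of the error codes named in the text."""


@dataclass
class Setup:
    locked: bool
    locked_by: Optional[str] = None


def init(await_lock: bool = False) -> Setup:
    # With --await-lock the setup starts unlocked, which arms the gate.
    # A plain init is grandfathered-locked, so its gate never arms.
    return Setup(locked=not await_lock)


def lock(setup: Setup, by: str) -> None:
    if setup.locked:
        raise SetupError("already_locked")
    setup.locked = True
    setup.locked_by = by


def advance_to_build(setup: Setup) -> str:
    # specify, scenarios, contract and tests may be advanced pre-lock;
    # only the step into build is refused.
    if not setup.locked:
        raise SetupError("setup_unlocked")
    return "build"
```

This makes the documented sequence (bundle, then lock, then build) visible: the only call that changes `locked` is the one made on the human's confirmation.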
@@ -11,42 +11,52 @@ understand the feature — that is information, not an obstacle. Stop and ask.
11
11
 
12
12
  1. **Diverge** — before drafting, surface the decision space: the 2–3 genuine framings of the
13
13
  feature + the open questions you would otherwise guess. Invite the user to add, kill,
14
- redirect. (Conversational — no new file. At prototype/poc this collapses to one sentence.)
15
- 2. **Converge** — draft §1, then RANK what you are least sure about (below).
14
+ redirect. (Conversational — no new file. At prototype/poc this shortens to one sentence.)
15
+ 2. **Converge** — draft §1, then RANK where your confidence is lowest (below).
16
16
  3. **Validate** — present the ranked uncertainty first; the user confirms, corrects, or sends back.
17
17
 
18
18
  ## Produce (in TASK.md §1)
19
19
 
20
+ <output_format>
20
21
  - **Framings weighed** — a one-line trace of what you considered: `X (chosen) · Y · Z`.
21
22
  - **Must** — each required behavior.
22
23
  - **Reject** — each refused input/situation, paired with a **named error code**
23
24
  (`amount <= 0 -> "amount_invalid"`, never "handle bad input").
24
25
  - **After** — the state that is true once it succeeds.
25
- - **Assumptions — least-sure first** — ranked most-likely-wrong → least. The top 1–2 carry a
26
- `⚠` flag: `⚠ <assumption> — least sure because <why>; if wrong: <cost>`. The rest are the
27
- low-stakes `[x]` tail. Never a flat wall of equal `[x]` ticks; that is what gets rubber-stamped.
26
+ - **Assumptions — lowest-confidence first** — ranked most-likely-wrong → least. The top 1–2 carry a
27
+ `⚠` flag: `⚠ <assumption> — lowest confidence because <why>; if wrong: <cost>`. The rest are the
28
+ low-stakes `[x]` tail. Keep the ranking visible — a flat list of equal `[x]` ticks gets approved without reading.
29
+ </output_format>
28
30
 
29
- ## The least-sure flag is bundle-wide
31
+ ## The lowest-confidence flag is bundle-wide
30
32
 
31
33
  The single human approval happens once, at the contract freeze, over the whole bundle. So your
32
- §1 ranking is the FIRST FEEDER into a bundle-level flag the user reads at the seam (`run.md`):
34
+ §1 ranking is the first input into a bundle-level flag the user reads at the decision point (`run.md`):
33
35
  *"of everything I'm asking you to freeze, these 1–2 are most likely wrong."* A flag may point at
34
36
  a §1 assumption, an uncovered scenario, or the contract shape.
35
37
 
36
38
  ## AI prompt
37
39
 
38
- > Role: a domain analyst who brainstorms, then asks rather than assumes. Read CONVENTIONS,
39
- > GLOSSARY, and the user's raw input. First surface 2–3 framings + open questions and let me
40
- > react. Then produce §1: Framings weighed, every Must, every Reject with a named error code,
41
- > the After state, and the Assumptions RANKED least-sure first; flag the 1–2 you are least
42
- > sure about with why + cost. Never resolve an ambiguity by guessing.
40
+ <prompt>
41
+ Role: a domain analyst who brainstorms, then asks rather than assumes.
42
+ Read first: CONVENTIONS · GLOSSARY · the user's raw input.
43
+ Objective: fill §1 SPECIFY with zero ambiguity left for the AI to resolve by guessing.
44
+ Steps:
45
+ 1. Surface 2–3 framings + the open questions; let the user react before you draft.
46
+ 2. Produce §1 — Framings weighed, every Must, every Reject with a named error code, the
47
+ After state, and the Assumptions RANKED lowest-confidence first.
48
+ 3. Flag the 1–2 where your confidence is lowest, each with why + cost.
49
+ Never: resolve an ambiguity by guessing.
50
+ </prompt>
43
51
 
44
52
  ## Exit gate
45
53
 
54
+ <exit_gate>
46
55
  - [ ] Framings weighed noted; every required behavior stated.
47
56
  - [ ] Every rejection has a named error code; success state-change described.
48
- - [ ] Assumptions ordered least-sure first; the 1–2 `⚠` flags carry why + cost — or an honest
57
+ - [ ] Assumptions ordered lowest-confidence first; the 1–2 `⚠` flags carry why + cost — or an honest
49
58
  "none material" that still names the single biggest risk (never a blank "none").
59
+ </exit_gate>
50
60
 
51
61
  ## Next
52
62
 
@@ -6,6 +6,7 @@ generated from it. Fill **§2 SCENARIOS** in TASK.md.
6
6
 
7
7
  ## Produce (in TASK.md §2)
8
8
 
9
+ <output_format>
9
10
  ```gherkin
10
11
  Scenario: <short name>
11
12
  Given <starting situation>
@@ -15,20 +16,29 @@ Scenario: <short name>
15
16
  ```
16
17
 
17
18
  The `And ... unchanged` clause catches corrupting partial failures (e.g. a balance
18
- deducted before a check fails). Never omit it on a rejection.
19
+ deducted before a check fails). Include it on every rejection.
20
+ </output_format>
19
21
 
20
22
  ## AI prompt
21
23
 
22
- > Role: a specification tester. Read §1 and GLOSSARY. Write one scenario per Must
23
- > and per Reject rule. For every rejection add an And-clause asserting what must NOT
24
- > change. Results must be specific and observable — never "then it works".
24
+ <prompt>
25
+ Role: a specification tester.
26
+ Read first: §1 · GLOSSARY.
27
+ Objective: one scenario per Must and per Reject rule, each result specific and observable.
28
+ Steps:
29
+ 1. Write one scenario per Must rule and one per Reject rule.
30
+ 2. For every rejection add an And-clause asserting what must NOT change.
31
+ Never: settle for a vague result ("then it works") — results must be specific and observable.
32
+ </prompt>
25
33
 
26
34
  ## Exit gate
27
35
 
36
+ <exit_gate>
28
37
  - [ ] One scenario per Must rule.
29
38
  - [ ] One scenario per Reject rule.
30
39
  - [ ] Each result is a specific, observable fact.
31
40
  - [ ] Every rejection asserts what stays unchanged.
41
+ </exit_gate>
32
42
 
33
43
  ## Next
34
44
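
The `And ... unchanged` clause from the scenarios hunk above translates directly into a side-effect assertion. A small sketch under stated assumptions: the `Account` class, the `Rejected` exception, and the `insufficient_funds` code are hypothetical illustrations, while `amount_invalid` is the named error code used as the example in the Phase 1 text:

```python
class Rejected(Exception):
    """A refusal carrying a named error code."""


class Account:
    def __init__(self, balance: int) -> None:
        self.balance = balance

    def withdraw(self, amount: int) -> None:
        # Validate everything before mutating, so a rejection
        # cannot leave a partial change behind.
        if amount <= 0:
            raise Rejected("amount_invalid")
        if amount > self.balance:
            raise Rejected("insufficient_funds")
        self.balance -= amount


def test_rejection_leaves_balance_unchanged() -> None:
    account = Account(balance=100)
    try:
        account.withdraw(0)
    except Rejected as err:
        assert str(err) == "amount_invalid"
    else:
        raise AssertionError("expected a rejection")
    # The `And ... unchanged` clause: state is untouched after the refusal.
    assert account.balance == 100
```

Ordering the checks before the mutation is one way to satisfy the clause; a transactional rollback would be another.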
 
@@ -1,12 +1,13 @@
1
1
  # Phase 3 — Contract (freeze the shape)
2
2
 
3
3
  Goal: fix the external shape — interfaces, data, names, error cases — and FREEZE
4
- it. This is the seam that makes the AI-led build safe: below it code is
4
+ it. This is the decision point that makes the AI-led build safe: below it code is
5
5
  disposable; above it nothing breaks because the shape does not move. Fill
6
6
  **§3 CONTRACT** in TASK.md.
7
7
 
8
8
  ## Produce (in TASK.md §3)
9
9
 
10
+ <output_format>
10
11
  - Interfaces (endpoints/functions/messages) with inputs/outputs.
11
12
  - Request/response shapes + persistent schema (note transactional needs).
12
13
  - Names drawn from `GLOSSARY.md` (same concept = same name everywhere).
@@ -14,26 +15,54 @@ disposable; above it nothing breaks because the shape does not move. Fill
14
15
 
15
16
  Then mark `Status: FROZEN @ v1`. Generate a mock + contract tests so dependent
16
17
  work can start before the real code exists.
18
+ </output_format>
17
19
 
18
- **The freeze is the one approval.** This seam is where the single human approval lands, over the
19
- whole bundle (§1–§4). Before asking for it, present the bundle **least-sure first**: the 1–2 points
20
+ **The freeze is the one approval.** This decision point is where the single human approval lands, over the
21
+ whole bundle (§1–§4). Before asking for it, present the bundle **lowest-confidence first**: the 1–2 points
20
22
  most likely wrong (`⚠ [spec|scenario|contract|test] … — because …; if wrong: …`) — aim the human's
21
- eye before they freeze. See `run.md`.
23
+ eye before they freeze. Open that report with the ARC (goal · done · plan) per `report-template.md` so the
24
+ human sees the goal this freeze serves and the plan beyond it, not just the bundle. See `run.md`.
25
+
26
+ ## The freeze review checklist
27
+
28
+ The human's one minute, aimed. Walk these six before saying yes:
29
+
30
+ - **⚠ flags first** — read the lowest-confidence flags; accept each knowing its cost if wrong.
31
+ The engine refuses an unflagged freeze before build: a frozen §3 with no well-formed
32
+ lowest-confidence flag is rejected (`unflagged_freeze`), and `audit` re-checks it on every
33
+ record that crossed.
34
+ - **Intent** — does §1 say what you actually want built (and is anything you expected missing)?
35
+ - **Cases** — does every Must and Reject have an observable §2 scenario you care about?
36
+ - **Shape** — glossary names, error codes, additive vs breaking: is THIS the shape to freeze?
37
+ - **Risk** — is this scope high-risk or method-defining? Then require
38
+ `risk: high · autonomy: conservative` in the TASK.md header — the engine refuses an unguarded completion.
39
+ - **Tests** — will §4 go red for the right reason, asserting behavior rather than internals?
40
+
41
+ This checklist AIMS the one approval — the freeze stays the only gate: no sign-off forms, no
42
+ extra documents. Reject any line and the bundle goes back to draft; that is
43
+ backward-correction, not failure.
22
44
 
23
45
  ## AI prompt
24
46
 
25
- > Role: an interface architect; frozen contracts are immutable. Read §1, §2,
26
- > GLOSSARY. Produce §3: interfaces, shapes, schema named from the glossary; a
27
- > response for every Reject code; a mock returning the contracted shapes and
28
- > contract tests pinning them. Mark FROZEN. No business logic. Never change a
29
- > frozen contract — a change reopens Specify.
47
+ <prompt>
48
+ Role: an interface architect; frozen contracts are immutable.
49
+ Read first: §1 · §2 · GLOSSARY.
50
+ Objective: produce §3: the frozen external shape, nothing more.
51
+ Steps:
52
+ 1. Define interfaces, shapes, and schema named from the glossary, with a response for every Reject code.
53
+ 2. Generate a mock returning the contracted shapes and contract tests pinning them.
54
+ 3. Mark FROZEN. No business logic.
55
+ Never: change a frozen contract — a change reopens Specify.
56
+ </prompt>
30
57
 
31
58
  ## Exit gate
32
59
 
60
+ <exit_gate>
33
61
  - [ ] Versioned and marked `FROZEN`.
34
62
  - [ ] Contract tests pass against the mock.
35
63
  - [ ] Every name matches the glossary.
36
64
  - [ ] Every spec rejection has a contracted response.
65
+ </exit_gate>
37
66
 
38
67
  ## Next
39
68
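
The `unflagged_freeze` rule in the contract hunk above can be pictured as a pattern check. The diff does not say what the engine accepts as "well-formed" or where it looks for the flag, so the sketch below assumes the quoted notation (`⚠`, then a `because` clause, then `; if wrong:` with a cost) and assumes the flag sits in the same text as the `Status:` line. `FLAG` and `check_freeze` are hypothetical names:

```python
import re
from typing import Optional

# Assumed shape of a well-formed flag, taken from the notation quoted in the text:
#   ⚠ <assumption> ... because <why>; if wrong: <cost>
FLAG = re.compile(r"⚠\s+\S.*?\bbecause\s+\S.*?;\s*if wrong:\s*\S")


def check_freeze(section_text: str) -> Optional[str]:
    """Return "unflagged_freeze" for a frozen section with no well-formed flag."""
    frozen = "Status: FROZEN" in section_text
    flagged = any(FLAG.search(line) for line in section_text.splitlines())
    if frozen and not flagged:
        return "unflagged_freeze"
    return None
```

A draft section passes through untouched; only the combination of a freeze and a missing or malformed flag is refused.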
 
@@ -1,4 +1,4 @@
1
- # Phase 4 — Tests (red safety net)
1
+ # Phase 4 — Tests (failing-first suite)
2
2
 
3
3
  Goal: turn scenarios + contract into automated tests and confirm they FAIL before
4
4
  any code exists. This operationalizes red/green TDD: red now, green only after
@@ -12,24 +12,48 @@ before code exists is testing nothing and will wave bad code through later.
12
12
 
13
13
  ## Produce
14
14
 
15
+ <output_format>
15
16
  - One executable test per scenario (§2), asserting **behavior, not internals**.
16
17
  - Contract-conformance tests (shapes + error responses from §3).
17
18
  - Side-effect assertions on rejection paths (`assert balance unchanged`).
18
19
  - A recorded coverage target in §4.
20
+ </output_format>
21
+
22
+ ## Declaring where tests live
23
+
24
+ §4's `Tests live in:` line is machine-read: when a task has no local `tests/`,
25
+ `add.py report` counts test functions at the declared path(s) instead. The FIRST
26
+ line matching `Tests live in:` is read; paths are its backticked tokens.
27
+ Resolution: `./…` → this task's dir · a token containing `/` → the project root
28
+ (the parent of `.add/`) · a bare name → a sibling of the previous token's
29
+ directory (else the task dir). A directory token counts the `*.py` files directly
30
+ inside it (non-recursive); a `.py` file token counts itself; anything else is
31
+ ignored. Resolved files are deduped, and reports mark declared counts with `†`.
32
+ Paths are confined: anything resolving (symlinks followed)
33
+ outside the project root counts 0 — `..` traversal, absolute paths, and
34
+ symlink escapes are never read.
19
35
 
20
36
  ## AI prompt
21
37
 
22
- > Role: a test author who writes tests before code. Read §2 and §3. Turn each
23
- > scenario into an executable test; add contract-conformance and edge-case tests;
24
- > run the suite and confirm it fails for the right reason. Record a coverage
25
- > target. Do NOT implement the feature. Never assert on internals.
38
+ <prompt>
39
+ Role: a test author who writes tests before code.
40
+ Read first: §2 · §3.
41
+ Objective: a red suite that fails for the right reason: behavior, not internals.
42
+ Steps:
43
+ 1. Turn each scenario into an executable test.
44
+ 2. Add contract-conformance and edge-case tests.
45
+ 3. Run the suite and confirm it fails for the right reason; record a coverage target.
46
+ Never: implement the feature, or assert on internals.
47
+ </prompt>
26
48
 
27
49
  ## Exit gate
28
50
 
51
+ <exit_gate>
29
52
  - [ ] One test per scenario.
30
53
  - [ ] Suite runs and is **red for the right reason**.
31
54
  - [ ] Tests assert observable behavior.
32
55
  - [ ] Coverage target recorded.
56
+ </exit_gate>
33
57
 
34
58
  ## Next
35
59
 
@@ -19,20 +19,30 @@ change request back to Specify. Honor the feature-specific safety rule named in
19
19
 
20
20
  ## AI prompt
21
21
 
22
- > Read §1, §3, §4, and CONVENTIONS. Make EVERY failing test pass, one small batch
23
- > at a time. Constraints: do NOT change any test; do NOT change the contract; honor
24
- > the §5 safety rule; use only allow-listed packages; stop and ask if unclear.
25
- > Report which tests pass and exactly what changed.
22
+ <prompt>
23
+ Role: implement the feature so EVERY failing test passes (the build phase).
24
+ Read first: §1 · §3 · §4 · CONVENTIONS.
25
+ Objective: every §4 test green, one small batch at a time.
26
+ Steps:
27
+ 1. Make EVERY failing test pass, one small batch at a time, honoring the §5 safety rule.
28
+ 2. Report which tests pass and exactly what changed.
29
+ Never: change a test or the contract; use a package off the allow-list; or push past something unclear instead of asking.
30
+ </prompt>
26
31
 
27
32
  ## Exit gate
28
33
 
34
+ <exit_gate>
29
35
  - [ ] All tests pass.
30
36
  - [ ] Coverage did not decrease.
31
37
  - [ ] No test and no contract modified by the AI.
32
38
  - [ ] No dependency outside the allow-list.
33
39
  - [ ] Change small enough to review in full.
40
+ </exit_gate>
34
41
 
35
42
  ## Next
36
43
 
37
44
  `python3 .add/tooling/add.py advance` → read `phases/6-verify.md`.
38
45
  Book: `docs/07-step-5-build.md`.
46
+
47
+ > Under `autonomy: auto` (the default) Build and Verify run together as one dynamic,
48
+ > evidence-auto-gated run — not two manual stops. See `run.md`.
@@ -1,8 +1,16 @@
1
- # Phase 6 — Verify (evidence + blind-spot checks)
1
+ # Phase 6 — Verify (evidence + non-functional review)
2
2
 
3
3
  Goal: establish trust and record an outcome. Passing tests are necessary, not
4
- sufficient. This phase is **human-led**; there is no AI role. Fill **§6** in
5
- TASK.md including the GATE RECORD.
4
+ sufficient. Fill **§6** in TASK.md including the GATE RECORD.
5
+
6
+ > **Who resolves this gate depends on the `autonomy:` header (see `run.md`).**
7
+ > Under `autonomy: auto` (the default) a run auto-PASSes once the evidence is
8
+ > complete — every test green, the convergence loops dry, and **no residue**
9
+ > (security · concurrency · architecture) — recording it as *auto-resolved* with
10
+ > the named run as accountable owner: an explicit PASS, not a skip. **Security is
11
+ > always a HARD-STOP and is never auto-passed.** Under `autonomy: conservative`,
12
+ > or whenever residue is found, this phase is **human-led** and the checks below
13
+ > are the human's.
6
14
 
7
15
  ## Part one — confirm the evidence
8
16
 
@@ -18,10 +26,33 @@ If any is false, stop and return to Build — there is nothing to verify yet.
18
26
  and miss races.) This is usually the single most important check.
19
27
  - **Security** — exposed secrets, injection openings, unexpected/invented
20
28
  dependencies. A security finding is always `HARD-STOP`, never a waiver.
29
+ Writing ANY note on this line means the gate escalates to the human — and
30
+ start it with `NOTE` or `⚠` so `add.py audit` can see it: a marked security
31
+ note reviewed by the auto-gate is an audit finding (`unescalated_security_note`).
21
32
  - **Architecture** — does it respect layering/dependency rules in CONVENTIONS.md?
22
33
 
34
+ ## Part three — the deep check (do not skim)
35
+
36
+ Green tests prove behavior on the inputs you thought of. They do not prove the change
37
+ is *wired in*, nor that you did not leave a dead end behind — and for a non-coding change
38
+ they prove nothing about whether you actually *read* the thing you signed off. So one more
39
+ requirement, every gate:
40
+
41
+ Deep check — do not skim. If the task produced code, record that every new symbol is
42
+ referenced (wiring) and that no new dead/unused code was introduced. If it produced prose
43
+ or non-code, record a semantic read — what you read in full and what it confirmed. Which
44
+ path applies is the resolver's judgement; the engine never classifies.
45
+
46
+ Record it in the §6 **Deep checks** block — where each new symbol is called (a reference
47
+ search), the dead-code scan result, or the prose you read in full and what it confirmed.
48
+ An unfilled Deep checks block is a **shallow verify**, not a PASS.
49
+
23
50
  ## Record exactly one outcome (no silent pass)
24
51
 
52
+ When you present this gate to the human, open with the ARC (goal · done · plan) per
53
+ `report-template.md`, and reconcile its FLAGS with `add.py report --decide`'s open-item count
54
+ before the ask — per that file's reconcile rule (verify is where a flag-vs-digest mismatch bites).
55
+
25
56
  | Outcome | When |
26
57
  |---------|------|
27
58
  | `PASS` | all checks met |
@@ -30,7 +61,10 @@ If any is false, stop and return to Build — there is nothing to verify yet.
30
61
 
31
62
  ## Exit gate / Next
32
63
 
33
- - [ ] Evidence confirmed, blind-spots checked, a person approved, outcome recorded.
64
+ <exit_gate>
65
+ - [ ] Evidence confirmed, non-functional risks checked, outcome recorded — a person approved, or
66
+ (under `autonomy: auto` with no residue) the run auto-resolved as the accountable owner.
67
+ </exit_gate>
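
The autonomy rule stated at the top of this Verify hunk can be read as a three-way decision. A minimal sketch, assuming the evidence reduces to two booleans plus a list of residue categories; `Evidence` and `resolve_gate` are hypothetical names, and incomplete evidence simply falls through to the human here, whereas the text sends it back to Build:

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    tests_green: bool
    loops_dry: bool
    # Residue categories found, e.g. ["security", "concurrency", "architecture"].
    residue: list[str] = field(default_factory=list)


def resolve_gate(autonomy: str, ev: Evidence) -> str:
    """Decide who resolves the gate, following the rule quoted in the text."""
    if "security" in ev.residue:
        # Security is always a hard stop and is never auto-passed.
        return "HARD-STOP"
    complete = ev.tests_green and ev.loops_dry and not ev.residue
    if autonomy == "auto" and complete:
        return "auto-PASS"  # recorded as auto-resolved, with the run as owner
    # autonomy: conservative, any residue, or incomplete evidence.
    return "human-led"
```

Checking security first keeps the hard stop independent of the autonomy setting, which is the property the text insists on.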
34
68
 
35
69
  ```bash
36
70
  python3 .add/tooling/add.py gate PASS # marks the task done
@@ -6,7 +6,7 @@ about the feature finally appears. Fill **§7** in TASK.md.
6
6
 
7
7
  ## Do
8
8
 
9
- 1. **Release behind a blast-radius limit** — feature flag and/or gradual rollout.
9
+ 1. **Release behind a scope-of-impact limit** — feature flag and/or gradual rollout.
10
10
  2. **Reuse scenarios as monitors** — the §2 scenarios that defined "correct" now
11
11
  define what you alert on: overall error rate, each rejection's rate (a spike in
12
12
  one is a signal), latency of the risky operation under load.
@@ -15,16 +15,24 @@ about the feature finally appears. Fill **§7** in TASK.md.
15
15
 
16
16
  ## AI prompt
17
17
 
18
- > Role: a reliability analyst feeding the next cycle. Read telemetry, objectives,
19
- > incidents. Report error-budget burn; cluster errors and surface the top
20
- > real-world failures; draft a SPEC delta with evidence links. Never auto-roll-back:
21
- > recommend; a human owns the production decision.
18
+ <prompt>
19
+ Role: a reliability analyst feeding the next cycle.
20
+ Read first: telemetry · objectives · incidents.
21
+ Objective: turn what production shows into the next SPEC delta.
22
+ Steps:
23
+ 1. Report error-budget burn.
24
+ 2. Cluster errors and surface the top real-world failures.
25
+ 3. Draft a SPEC delta with evidence links.
26
+ Never: auto-roll-back — recommend; a human owns the production decision.
27
+ </prompt>
22
28
 
23
29
  ## Exit gate
24
30
 
31
+ <exit_gate>
25
32
  - [ ] Released behind a flag/rollout.
26
33
  - [ ] Scenario-based monitors live.
27
34
  - [ ] A reviewed spec delta captured (becomes the next `new-task`).
35
+ </exit_gate>
28
36
 
29
37
  ## Next
30
38