@pilotspace/add 1.1.0 → 1.3.0

Files changed (61)
  1. package/CHANGELOG.md +81 -0
  2. package/GETTING-STARTED.md +187 -139
  3. package/README.md +13 -7
  4. package/bin/cli.js +96 -5
  5. package/docs/01-principles.md +3 -3
  6. package/docs/02-the-flow.md +19 -12
  7. package/docs/03-step-1-specify.md +15 -13
  8. package/docs/04-step-2-scenarios.md +2 -2
  9. package/docs/05-step-3-contract.md +3 -3
  10. package/docs/06-step-4-tests.md +10 -2
  11. package/docs/07-step-5-build.md +3 -1
  12. package/docs/08-step-6-verify.md +25 -5
  13. package/docs/09-the-loop.md +12 -6
  14. package/docs/10-setup-and-stages.md +27 -13
  15. package/docs/11-governance.md +6 -2
  16. package/docs/12-roles.md +3 -3
  17. package/docs/13-adoption.md +1 -1
  18. package/docs/14-foundation.md +15 -15
  19. package/docs/15-foundations-and-lineage.md +106 -0
  20. package/docs/README.md +4 -0
  21. package/docs/appendix-a-templates.md +3 -3
  22. package/docs/appendix-b-prompts.md +40 -5
  23. package/docs/appendix-c-glossary.md +49 -12
  24. package/docs/appendix-d-worked-example.md +2 -2
  25. package/docs/appendix-e-checklists.md +16 -4
  26. package/docs/appendix-f-requirements-matrix.md +8 -8
  27. package/docs/appendix-g-references.md +106 -0
  28. package/package.json +1 -1
  29. package/skill/add/SKILL.md +41 -38
  30. package/skill/add/adopt.md +13 -11
  31. package/skill/add/deltas.md +8 -6
  32. package/skill/add/fold.md +19 -17
  33. package/skill/add/graduate.md +74 -0
  34. package/skill/add/intake.md +22 -7
  35. package/skill/add/loop.md +59 -0
  36. package/skill/add/phases/0-ground.md +66 -0
  37. package/skill/add/phases/0-setup.md +32 -25
  38. package/skill/add/phases/1-specify.md +28 -13
  39. package/skill/add/phases/2-scenarios.md +14 -4
  40. package/skill/add/phases/3-contract.md +27 -12
  41. package/skill/add/phases/4-tests.md +15 -5
  42. package/skill/add/phases/5-build.md +33 -4
  43. package/skill/add/phases/6-verify.md +40 -2
  44. package/skill/add/phases/7-observe.md +13 -5
  45. package/skill/add/report-template.md +65 -7
  46. package/skill/add/run.md +93 -39
  47. package/skill/add/scope.md +10 -6
  48. package/skill/add/setup-review.md +13 -10
  49. package/skill/add/streams.md +88 -23
  50. package/tooling/add.py +1817 -90
  51. package/tooling/templates/CONVENTIONS.md.tmpl +1 -1
  52. package/tooling/templates/DESIGN.md.tmpl +66 -0
  53. package/tooling/templates/GLOSSARY.md.tmpl +29 -0
  54. package/tooling/templates/MILESTONE.md.tmpl +1 -0
  55. package/tooling/templates/PROJECT.md.tmpl +6 -3
  56. package/tooling/templates/TASK.md.tmpl +55 -15
  57. package/tooling/templates/catalog.sample.json +38 -0
  58. package/tooling/templates/prototype.sample.json +48 -0
  59. package/tooling/templates/tokens.sample.json +55 -0
  60. package/tooling/templates/udd-catalog.md +122 -0
  61. package/tooling/templates/udd-tokens.md +79 -0
@@ -0,0 +1,74 @@
1
+ # Stage graduation — propose the move to production as a roadmap, never a flip
2
+
3
+ A project does not become "production" because someone typed a new label. It graduates when
4
+ the MVP is genuinely covered AND a human-confirmed roadmap of production work exists. This guide
5
+ is the **4th scope level** — after setup (`phases/0-setup.md`), intake (`intake.md` / `scope.md`),
6
+ and the milestone loop (`loop.md`). It turns the bare `add.py stage` flip into the **final step** of
7
+ an analytics-driven, interview-led orchestration.
8
+
9
+ You (the AI) **gather and propose**; the **human confirms and judges**; the engine only counts
10
+ tallies and enforces the floor. The engine never decides that the project is "ready" — that is
11
+ judgment, and it belongs to the interview.
12
+
13
+ ## The cue (what starts this)
14
+
15
+ When every milestone is `done` AND the human's stage-goal-criteria in `PROJECT.md` are all `[x]`,
16
+ `add.py status` prints:
17
+
18
+ ```
19
+ → MVP covered → propose graduation
20
+ ```
21
+
22
+ That line is the trigger. Before both tallies complete, status is silent and nothing here applies
23
+ (a project with no stage-goal-criteria block behaves exactly as today — grandfathered, zero change).
24
+
25
+ ## The flow
26
+
27
+ 1. **Gather the analytics** — run `add.py graduation-report` (add `--json` to branch on it). It
28
+ clusters the whole MVP loop's evidence into five labeled record-sets: open deltas by competency ·
29
+ open RISK-ACCEPTED waivers by expiry · RETRO records · verify residue · observe-loop coverage gaps.
30
+ It **gathers, never judges** — there is no readiness verdict to read; the records are what you
31
+ reason from.
32
+ 2. **Co-specify interview** — synthesize *"what production means HERE"* WITH the human, using the
33
+ gathered records as the agenda (the residue to harden, the coverage gaps to monitor, the open
34
+ deltas to consolidate). This synthesis is the judgment the engine refuses to make. Interview to real confidence —
35
+ do not guess what "production-ready" means for this project.
36
+ 3. **Draft the roadmap** — for each production outcome the interview surfaces, draft a production
37
+ milestone with the EXISTING command and goal-gate criteria:
38
+ `add.py new-milestone <slug> --stage production --goal "…"`, then write its exit criteria. The
39
+ roadmap is **≥1** milestone — the hardening work itself (SLOs, rollback tests, incident runbooks)
40
+ is what these milestones *contain*; this guide proposes them, it does not do them.
41
+ 4. **Human confirms** — present the roadmap via `report-template.md`, opening with the ARC
42
+ (goal · done · plan): the stage-graduation goal, the MVP coverage that earns the move, and the
43
+ plan the production milestones lay out. The human accepts, edits, or declines each drafted
44
+ milestone. No milestone is created without this; nothing advances on a draft the human has not confirmed.
45
+ 5. **Flip — the final step** — only now run `add.py stage production`. Because ≥1 production milestone
46
+ now exists, the guard passes and the transition is recorded. This is the orchestration's last act.
47
+
48
+ ## The floor (what the engine enforces)
49
+
50
+ `add.py stage production` is **guarded**: it refuses with `stage_no_roadmap` (non-zero exit, state
51
+ byte-unchanged) when zero milestones have `stage: production`. The check is a **tally** — "does a
52
+ production-roadmap record exist?" — never a readiness judgment (gather-not-judge at stage level;
53
+ it mirrors the milestone goal-gate's `milestone_goal_unmet`). `--force` overrides it, preserving human
54
+ authority for grandfathered/edge cases; use it deliberately, not as the normal path.
55
+
56
+ Scope: the guard is on the `→production` **transition** only. Flips to prototype/poc/mvp are the
57
+ existing bare flip, unchanged. `add.py init --stage production` is an explicit at-creation declaration
58
+ (the same authority as `--force`), not a transition — it is out of scope of the guard by design.
59
+
60
+ ## Invariants (never break these)
61
+
62
+ - **The flip is the final step**, never called outside this confirmed-roadmap path. A bare flip with
63
+ no roadmap is the symptom this scope level removes.
64
+ - **The engine never auto-flips.** Every step here is human-confirmed; the engine gathers, counts, and
65
+ enforces the floor — it does not advance the stage on its own.
66
+ - **The flow is continuous, not cue-reentrant.** The moment you draft the first production milestone,
67
+ `status` stops printing the cue (the "every milestone done" tally breaks). That is expected — do NOT
68
+ re-await the cue after drafting; carry the flow straight through to confirm and flip.
69
+
70
+ ## Depth and reuse
71
+
72
+ The same orchestration serves prototype→poc and poc→mvp; **mvp→production** is the rigorous proof
73
+ case (every step at full depth + the observe loop). At lower stages, run it light — the shape is the
74
+ same, the depth is less.
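
The floor described in this guide can be sketched as a pure tally. This is a minimal illustration, not the code in `tooling/add.py`: the `Milestone` record, its `stage` field, and the `check_stage_flip` function are hypothetical names introduced here; only the `stage_no_roadmap` code and the `--force` override come from the text above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Milestone:
    slug: str
    stage: str  # hypothetical field standing in for `stage: production`


def check_stage_flip(target: str, milestones: List[Milestone],
                     force: bool = False) -> Optional[str]:
    """Return a refusal code, or None when the flip may proceed.

    Assumption: the guard counts records and never judges readiness.
    """
    # Only the transition to production is guarded; force is the human override.
    if target != "production" or force:
        return None
    roadmap = [m for m in milestones if m.stage == "production"]
    # One or more production milestones is enough for the tally to pass.
    return None if roadmap else "stage_no_roadmap"
```

Under these assumptions, a flip with no production milestone is refused, while the same flip succeeds once a single confirmed production milestone exists. Nothing in the check inspects the milestone's content.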
@@ -1,10 +1,19 @@
1
1
  # Intake — size a request into versioned scope
2
2
 
3
3
  Before a task exists, ADD turns a raw request into correctly-sized, versioned scope.
4
- This is the **intake altitude**: the per-task flow is phases 0–7; intake is the step
4
+ This is the **intake level**: the per-task flow is phases 0–7; intake is the step
5
5
  *before* a task — request → milestone or task. You (the AI) **propose**; the human
6
6
  **confirms**. Never create scope without a confirmed proposal.
7
7
 
8
+ ## Interview before you size
9
+
10
+ When the request arrives as a question, or its intent is not yet sharp enough to
11
+ place in one bucket: explore it WITH the user before classifying. Reflect the
12
+ intent you heard, name what seems in and out of scope, and offer 2–3 sized options
13
+ with your own recommendation. Only then emit `{ bucket, rationale, command }`.
14
+ `ask_human` stays the floor: when interviewing cannot sharpen the request,
15
+ reject — never guess a bucket.
16
+
8
17
  ## The four buckets
9
18
 
10
19
  Classify every request into exactly ONE bucket:
@@ -24,17 +33,23 @@ First ask "does this change already-frozen scope?" → if yes, it is a `change-r
24
33
 
25
34
  ## What you emit (the proposal)
26
35
 
36
+ Present the proposal to the human via `report-template.md` — open with the ARC (goal · done ·
37
+ plan): the goal this request serves, what is already covered, and the plan the chosen bucket sets up.
38
+
27
39
  For every request, emit ONE of:
28
40
 
29
41
  - **a classification** — `{ bucket, rationale, command }` — where `rationale` names WHY
30
42
  (the theme, the slice, the fit, or the frozen scope touched) and `command` is the exact
31
43
  `add.py …` from the table. The human confirms or overrides before you run it.
32
- - **a rejection** — `{ reject, rationale }` — and you create nothing:
33
- - `ask_human` — too ambiguous/underspecified to size. Ask the human; never guess a bucket.
34
- - `frozen_scope` — it changes frozen scope; route it as a `change-request` back to
35
- SPECIFY/CONTRACT of the affected task; never spawn a parallel milestone that forks the truth.
36
- - `split_required` — it spans more than one bucket; propose the SMALLEST set of correctly-sized
37
- items, each with its own rationale; never force it into one milestone.
44
+ - **a rejection** — `{ reject, rationale }` — and you create nothing, emitting one of the closed set:
45
+
46
+ <reject_codes>
47
+ - `ask_human`: too ambiguous/underspecified to size. Ask the human; never guess a bucket.
48
+ - `frozen_scope` — it changes frozen scope; route it as a `change-request` back to
49
+ SPECIFY/CONTRACT of the affected task; never spawn a parallel milestone that forks the truth.
50
+ - `split_required` — it spans more than one bucket; propose the SMALLEST set of correctly-sized
51
+ items, each with its own rationale; never force it into one milestone.
52
+ </reject_codes>
38
53
 
39
54
  When confirmed, record the `rationale` in the artifact you create or affect — the new
40
55
  MILESTONE.md goal/body, the new TASK.md, or a note in the affected TASK.md — never in state.json.
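
The two proposal shapes in this hunk can be checked mechanically before they are shown to the human. The sketch below is an assumption-laden illustration: `validate_proposal` and `REJECT_CODES` are hypothetical names, and the bucket value is not validated because the bucket table is not part of this hunk. Only the key sets and the three reject codes are taken from the text.

```python
# The closed set of rejection codes listed in the <reject_codes> block.
REJECT_CODES = frozenset({"ask_human", "frozen_scope", "split_required"})


def _non_empty_text(value) -> bool:
    """True for a string with at least one non-whitespace character."""
    return isinstance(value, str) and bool(value.strip())


def validate_proposal(proposal: dict) -> bool:
    """Accept exactly one of the two shapes: a classification or a rejection."""
    keys = set(proposal)
    if keys == {"bucket", "rationale", "command"}:
        # A classification: every field must carry real text.
        return all(_non_empty_text(proposal[k]) for k in keys)
    if keys == {"reject", "rationale"}:
        # A rejection: the code must come from the closed set.
        return (proposal["reject"] in REJECT_CODES
                and _non_empty_text(proposal["rationale"]))
    return False
```

A proposal that mixes the shapes, omits the rationale, or uses an unlisted reject code fails the check, which matches the rule that nothing is created without a well-formed, confirmed proposal.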
@@ -0,0 +1,59 @@
1
+ # The dynamic loop — open deltas and extras become the next tasks
2
+
3
+ A milestone is not done when its tasks are done — it is done when its **GOAL** is met.
4
+ This guide is the loop that drives a milestone toward that goal: turn what each task
5
+ leaves behind (open lessons, and work discovered but out of scope) into the next tasks,
6
+ and keep going until the exit criteria are all met.
7
+
8
+ You (the AI) **gather and propose**; the **human confirms**; the existing `add.py new-task`
9
+ creates each one. The engine never decides what the next task is — that is judgment.
10
+
11
+ ## The goal-gate (what holds the loop open)
12
+
13
+ `add.py milestone-done <slug>` REFUSES to close a milestone while its exit criteria are not
14
+ all met — it stops with `milestone_goal_unmet` and the milestone stays active. The exit-criteria
15
+ checkboxes in `MILESTONE.md` ARE the human's goal-met affirmation: the engine reads the
16
+ `- [x]`/`- [ ]` tally, it never judges whether the goal is met (the same trust model as reading a
17
+ recorded `PASS`). Checking the last box is the deliberate act that releases the gate.
18
+
19
+ The gate fires only when criteria exist. A milestone with no exit-criteria checkboxes closes as
20
+ before — write criteria into `MILESTONE.md` if you want the goal-gate to hold the milestone open.
21
+
22
+ `milestone-done` is the only way a milestone reaches `done`; `archive-milestone` and `compact`
23
+ both refuse a milestone that is not done. So the one gate is enough — there is no quiet way around it.
24
+
25
+ ## The loop
26
+
27
+ When every task is done but the goal is not, `add.py status` shows
28
+ `goal not met (m/n exit criteria)` where it would otherwise prompt to archive. That is the cue:
29
+
30
+ 1. **Gather** the carried inventory:
31
+ - open lessons — `add.py deltas` (the §7 OBSERVE deltas still `open`);
32
+ - the planned-but-unscaffolded tasks — the plan-vs-state line in `add.py status`;
33
+ - any reopened task — one a deepened verify returned to the flow (see below).
34
+ 2. **Propose** the next tasks: for each carried item worth doing now, draft a one-line task
35
+ (slug + title + why) and show the human. Group the trivial ones; do not propose noise.
36
+ 3. **Confirm** — the human accepts, edits, or declines each. No task is created without this.
37
+ 4. **Create** each accepted task — `add.py new-task <slug> --title "..."` — and run it through
38
+ the normal flow (specify → … → verify).
39
+ 5. **Repeat** until the work the goal needs is done.
40
+ 6. **Close** — when the goal is genuinely met, check the exit-criteria boxes in `MILESTONE.md`,
41
+ then `add.py milestone-done <slug>` succeeds (then consolidate the open deltas and archive).
42
+ Present the close via `report-template.md` — open with the ARC (goal · done · plan): the
43
+ milestone goal, the exit-criteria met that prove it, and the plan beyond the close.
44
+
45
+ ## Reopen is the verb; this loop is the trigger
46
+
47
+ When a deepened verify (the no-skim wiring / dead-code / semantic check) finds a criterion unmet
48
+ on a task already marked done, `add.py reopen <task> --to <phase> --reason "..."` returns it to the
49
+ flow with a recorded reason and a reset gate. `reopen` is the recorded action; deciding WHEN to
50
+ fire it — because a goal criterion is unmet — is this loop's job.
51
+
52
+ ## The reactivation residual (deferred)
53
+
54
+ A reopen fired inside the loop happens while the milestone is still **active** — the goal-gate held
55
+ it open, so it never reached done, and no reactivation is needed. The one residual — reopening a
56
+ task inside a milestone that was already closed — is surfaced by `add.py check` (a done milestone
57
+ with a live task reads as incoherent). Re-activating a closed milestone is **deferred**: resolve it
58
+ by hand for now (the loop's own design keeps in-flight milestones open), until a later task makes
59
+ milestone reactivation first-class.
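
The goal-gate's tally can be sketched in a few lines. This is an illustration under stated assumptions, not the engine's implementation: `exit_criteria_tally` and `milestone_done_refusal` are hypothetical names, and the sketch counts every `- [x]` / `- [ ]` line in the text it is given, whereas the real tool presumably scopes the count to the exit-criteria section of `MILESTONE.md`.

```python
import re
from typing import Optional, Tuple

# Matches a markdown checkbox at the start of a line: "- [x]" or "- [ ]".
CHECKBOX = re.compile(r"^\s*- \[( |x)\]", re.MULTILINE)


def exit_criteria_tally(milestone_md: str) -> Tuple[int, int]:
    """Return (met, total) for the checkboxes found in the text."""
    marks = CHECKBOX.findall(milestone_md)
    met = sum(1 for mark in marks if mark == "x")
    return met, len(marks)


def milestone_done_refusal(milestone_md: str) -> Optional[str]:
    """Return the refusal code, or None when the milestone may close."""
    met, total = exit_criteria_tally(milestone_md)
    # The gate fires only when criteria exist and at least one box is open.
    if total and met < total:
        return "milestone_goal_unmet"
    return None
```

The sketch reads checkboxes and nothing else. Whether the goal is actually met stays a human judgment expressed by ticking the last box.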
@@ -0,0 +1,66 @@
1
+ # Phase 0 — Ground (the real codebase)
2
+
3
+ Goal: before you specify anything, gather the REAL current working folder the task will
4
+ touch — the actual files, symbols, signatures, docs, todos, config, data, patterns, and conventions — so the
5
+ contract, tests, and build are grounded in what exists, not in what you assume.
6
+ Fill **§0 GROUND** in TASK.md. Ground is a per-task preamble to the seven steps;
7
+ it is **AI-owned** — no human gate here (the one approval stays at the §3 freeze).
8
+
9
+ If you cannot name the files and symbols the task touches, you do not yet understand
10
+ the work — gathering them IS the job, not a detour.
11
+
12
+ ## Gather (in TASK.md §0)
13
+
14
+ - **Touches** — the real files · symbols · signatures the task will read or change,
15
+ named from the actual code (use your code-navigation tools — grep / symbol search,
16
+ never memory). Each as `path:symbol — what it is / how it is keyed`.
17
+ - **Context (working folder)** — beyond code, the NON-code artifacts the task touches:
18
+ docs/textbase (README · `*.md` · design notes) · TODOs (`TODO.md` · `FIXME`/`TODO`/`HACK`
19
+ comments · task lists) · config/manifests (configs · `.env.example` · `pyproject`/`package`
20
+ · CI) · data/fixtures (samples · fixtures · schemas). Gather only the TASK-SPECIFIC
21
+ delta — never index the whole repo.
22
+ - **Honors** — the patterns and conventions the work must respect, cited from
23
+ `PROJECT.md` / `CONVENTIONS.md`. Gather only the TASK-SPECIFIC delta — never
24
+ re-derive the architecture or re-run the setup brownfield scan.
25
+ - **Anchors the contract cites** — the specific symbols §3 CONTRACT will name. The
26
+ contract may cite only anchors that appear here.
27
+
28
+ **How — gather efficiently:** for the BROAD sweep, prefer a small-model subagent / fast
29
+ index / skim (offload to a cheap context, return a compact map); then DEEPEN on what THIS
30
+ task specifically needs — never lock a shallow first pass. A recommendation: the engine
31
+ never spawns a subagent (tool-agnostic), so the orchestrating agent chooses.
32
+
33
+ ## Greenfield / first task
34
+
35
+ The first task of a project runs ground too. When there is little or no code yet
36
+ (greenfield), or you are mid-setup, your grounding IS the foundation docs / brownfield
37
+ scan you just produced — point at them; do not re-scan. An honest "new module, no
38
+ existing code; honors CONVENTIONS.md §X" is a complete grounding.
39
+
40
+ ## AI prompt
41
+
42
+ <prompt>
43
+ Role: an engineer who reads the real code before designing against it.
44
+ Read first: PROJECT.md · CONVENTIONS.md · the actual files the task touches.
45
+ Objective: fill §0 GROUND with the real files/symbols/signatures + the conventions to
46
+ honor + the anchor points the contract will cite — gathered from the codebase, never assumed.
47
+ Steps:
48
+ 0. Sweep broad cheaply first — prefer a small-model subagent / fast index / skim — then deepen task-specifically.
49
+ 1. Locate the files and symbols the task reads or changes (code tools, not memory).
50
+ 2. Record their signatures / how they are keyed; cite the conventions to honor (task delta only).
51
+ 3. Name the anchors §3 will cite.
52
+ Never: invent a file, symbol, or signature you have not opened.
53
+ </prompt>
54
+
55
+ ## Exit gate
56
+
57
+ <exit_gate>
58
+ - [ ] The real files/symbols the task touches are named (from the code, not assumed).
59
+ - [ ] The conventions to honor are cited (task-delta only; no architecture re-scan).
60
+ - [ ] The anchors §3 will cite are listed — §3 names only anchors that exist here.
61
+ </exit_gate>
62
+
63
+ ## Next
64
+
65
+ `python3 .add/tooling/add.py advance` → read `phases/1-specify.md`.
66
+ Book: `docs/02-the-flow.md` (the flow; ground is the §0 preamble to the seven steps).
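
The third exit-gate item, that §3 names only anchors listed in §0, amounts to a set-membership check. The sketch below assumes anchors are plain `path:symbol` strings already extracted from TASK.md; the function name and the sample anchors are hypothetical, and no such check is claimed to exist in `add.py`.

```python
from typing import Iterable, List


def ungrounded_anchors(ground_anchors: Iterable[str],
                       contract_anchors: Iterable[str]) -> List[str]:
    """Return the contract anchors that do not appear in the ground list.

    An empty result means every anchor the contract cites was gathered first.
    """
    known = set(ground_anchors)
    # Preserve the contract's order so the report reads top to bottom.
    return [anchor for anchor in contract_anchors if anchor not in known]
```

For example, with a ground list of `src/ledger.py:transfer` and a contract that also cites `src/ledger.py:refund`, the function returns the second anchor, signalling that the contract cites a symbol nobody opened.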
@@ -1,14 +1,14 @@
1
- # Phase 0 — Setup (autonomous draft → one human lock-down)
1
+ # Phase 0 — Setup (autonomous draft → one human baseline approval)
2
2
 
3
3
  Goal: point ADD at a repo and **you** draft the whole foundation — domain, first-milestone scope,
4
- and the first task's contract — then hand the human exactly one decision: the **lock-down**. Brownfield
4
+ and the first task's contract — then hand the human exactly one decision: the **baseline approval**. Brownfield
5
5
  is silent (the code answers the questions); greenfield keeps a short interview. Either way, the human's
6
- only gate is `add.py lock`. This is the setup-altitude analog of a task's one-approval contract freeze.
6
+ only gate is `add.py lock`. This is the setup-level analog of a task's one-approval contract freeze.
7
7
 
8
8
  ## 1 · Zero-touch entry — you run init yourself
9
9
 
10
10
  When there is no `.add/state.json`, do **not** tell the human to initialise — run it yourself. Infer the
11
- project name and stage from the repo, and **arm the lock-down gate** with `--await-lock`:
11
+ project name and stage from the repo, and **arm the baseline-approval gate** with `--await-lock`:
12
12
 
13
13
  ```bash
14
14
  python3 .add/tooling/add.py init --name "<inferred from repo/dir>" --stage <prototype|poc|mvp|production> --await-lock
@@ -27,16 +27,16 @@ python3 .add/tooling/add.py init --name "<inferred from repo/dir>" --stage <prot
27
27
 
28
28
  ## 2a · Brownfield — map it silently
29
29
 
30
- The code answers the questions a greenfield interview would ask, so **read it, don't ask**. Open
31
- `adopt.md` and follow it: fill each survivor file from the code, never clobber an existing one, and tag
30
+ The code answers the questions a greenfield interview would ask, so **read it instead of asking**. Open
31
+ `adopt.md` and follow it: fill each living-doc file from the code, never clobber an existing one, and tag
32
32
  every decision `evidence-grounded` (cite the file) or `guessed`. Ask the human **nothing** at this step.
33
33
 
34
- ## 2b · Greenfield — the 4-lens interview (kept): co-specify at foundation altitude
34
+ ## 2b · Greenfield — the 4-lens interview (kept): co-specify at foundation level
35
35
 
36
36
  An empty repo has no code to read, so run the short interview. This is the **co-specify at foundation
37
- altitude** move — the same diverge → converge → validate brainstorm a task's §1 uses (`phases/1-specify.md`),
37
+ level** move — the same diverge → converge → validate brainstorm a task's §1 uses (`phases/1-specify.md`),
38
38
  lifted to the foundation. Ask the one load-bearing question per lens (diverge), draft the foundation
39
- (converge), then rank what you're least sure of and show the top flag first (validate):
39
+ (converge), then rank where your confidence is lowest and show the top flag first (validate):
40
40
 
41
41
  | Lens | The one question that unblocks the section |
42
42
  |------|--------------------------------------------|
@@ -45,52 +45,59 @@ lifted to the foundation. Ask the one load-bearing question per lens (diverge),
45
45
  | Users (UDD) | The primary user and the one job they hire this for? (or "no UI — surface is X") |
46
46
  | Decisions | What's already decided that you'd regret re-litigating? (first Key Decision row) |
47
47
 
48
- Ask only the live ones; skip what the request already answers. Rank your drafts least-sure-first using the
49
- one notation every altitude shares — `⚠ <assumption> — least sure because <why>; if wrong: <cost>` — and
48
+ Ask only the live ones; skip what the request already answers. Rank your drafts lowest-confidence-first using the
49
+ one notation every scope level shares — `⚠ <assumption> — lowest confidence because <why>; if wrong: <cost>` — and
50
50
  tag thin or inferred answers `guessed`.
51
51
 
52
52
  ## 3 · Draft to the lock (both paths)
53
53
 
54
- 1. **Fill the survivors** (they outlive all code): `.add/PROJECT.md` (the foundation — Domain · Spec/active
54
+ 1. **Fill the living documentation** (it outlives all code): `.add/PROJECT.md` (the foundation — Domain · Spec/active
55
55
  milestone · UI/UX · Key Decisions, one screen), `CONVENTIONS.md`, `GLOSSARY.md`, `MODEL_REGISTRY.md`,
56
- `dependencies.allowlist`. Brownfield: from the code. Greenfield: from the interview, gaps flagged `guessed`.
56
+ `dependencies.allowlist`, and for a UI project — `DESIGN.md` (the design source of truth: identity ·
57
+ principles · screens · the named-set foundation pointers + render recipe; delete it if there's no UI).
58
+ Brownfield: from the code. Greenfield: from the interview, gaps flagged `guessed`.
57
59
  2. **Size the first milestone** (read `scope.md`) and draft its `MILESTONE.md` — goal · scope · exit criteria
58
60
  · breadth-first tasks.
59
- 3. **Create the first task and draft its candidate front.** `new-task` is allowed pre-lock:
61
+ 3. **Create the first task and draft its candidate specification bundle.** `new-task` is allowed pre-lock:
60
62
  ```bash
61
63
  python3 .add/tooling/add.py new-task <slug> --title "<first feature>"
62
64
  ```
63
65
  Draft §1 (specify) · §2 (scenarios) · §3 (contract). **Leave §3 `Status: DRAFT`** — the lock is its
64
66
  approval (see §5). You MAY `advance` through specify → scenarios → contract → tests pre-lock, but the
65
- engine **refuses crossing into build** until you `lock` (`setup_unlocked`). Sequence: front → lock → build.
67
+ engine **refuses crossing into build** until you `lock` (`setup_unlocked`). Sequence: bundle → lock → build.
66
68
  4. **Write `.add/SETUP-REVIEW.md`** per `setup-review.md`: every decision you drafted (foundation, scope,
67
- first contract), **least-sure-first**, each tagged `guessed` | `evidence-grounded`.
69
+ first contract), **lowest-confidence-first**, each tagged `guessed` | `evidence-grounded`.
68
70
 
69
- ## 4 · The one human gate — the lock-down
71
+ ## 4 · The one human gate — the baseline approval
70
72
 
71
- Present `SETUP-REVIEW.md` least-sure-first (the `guessed` rows are what the human must actually check). They
72
- sign **once**:
73
+ Open the report with the ARC (goal · done · plan) per `report-template.md`, then present
74
+ `SETUP-REVIEW.md` lowest-confidence-first (the `guessed` rows are what the human must actually check). They
75
+ confirm **once** — an explicit yes to the baseline approval itself, in conversation; ambient agreement mid-stream is
76
+ not a confirmation. On that recorded confirmation, you run the lock with their name:
73
77
 
74
78
  ```bash
75
79
  python3 .add/tooling/add.py lock --by "<name>"
76
80
  ```
77
81
 
78
- `lock` records the lock layers (foundation · scope · contract) in one atomic write and opens the build. It is
79
- judgment-free: it does **not** parse `SETUP-REVIEW.md`; the human *reading* it is the review.
82
+ Typing the command themselves stays the **escape hatch**: the decision is always the human's; you just
83
+ execute it. `lock` records the lock layers (foundation · scope · contract) in one atomic write and opens the
84
+ build. It is judgment-free — it does **not** parse `SETUP-REVIEW.md`; the human *reading* it is the review.
80
85
 
81
86
  ## 5 · After the lock
82
87
 
83
- - The lock **is** the first task's contract approval — the v7 one-approval-front and the lock-down collapse
88
+ - The lock **is** the first task's contract approval — the v7 specification-bundle approval and the baseline approval collapse
84
89
  into this single signature. Do **not** ask for a separate contract-freeze sign-off (that double-gates).
85
90
  - Stamp the first task's §3 `Status: FROZEN @ v1` (lock-authorized), then read `phases/5-build.md` — build is
86
91
  now open. Everything before this signature, you drafted.
87
92
 
88
93
  ## Exit gate
89
94
 
95
+ <exit_gate>
90
96
  - [ ] `.add/state.json` exists; setup was seeded unlocked (`--await-lock`) then locked.
91
- - [ ] Survivors filled (brownfield: from code, tagged evidence-grounded; greenfield: from the interview).
92
- - [ ] First task created; §1–§3 drafted; `.add/SETUP-REVIEW.md` written least-sure-first.
93
- - [ ] Human signed `add.py lock`; first task §3 `FROZEN @ v1`; build open.
97
+ - [ ] Living docs filled (brownfield: from code, tagged evidence-grounded; greenfield: from the interview).
98
+ - [ ] First task created; §1–§3 drafted; `.add/SETUP-REVIEW.md` written lowest-confidence-first.
99
+ - [ ] Human confirmed the baseline approval and `add.py lock --by` ran with their name; first task §3 `FROZEN @ v1`; build open.
100
+ </exit_gate>
94
101
 
95
102
  ## Next
96
103
 
@@ -11,42 +11,57 @@ understand the feature — that is information, not an obstacle. Stop and ask.
11
11
 
12
12
  1. **Diverge** — before drafting, surface the decision space: the 2–3 genuine framings of the
13
13
  feature + the open questions you would otherwise guess. Invite the user to add, kill,
14
- redirect. (Conversational — no new file. At prototype/poc this collapses to one sentence.)
15
- 2. **Converge** — draft §1, then RANK what you are least sure about (below).
14
+ redirect. (Conversational — no new file. At prototype/poc this shortens to one sentence.)
15
+ 2. **Converge** — draft §1, then RANK where your confidence is lowest (below).
16
16
  3. **Validate** — present the ranked uncertainty first; the user confirms, corrects, or sends back.
17
17
 
18
+ **Identity is direction, not default (UDD).** For UI/design work, identity values — the brand
19
+ color, the core palette, the typeface — are human-owned. Surface them for discussion during
20
+ Diverge; never assume a brand value. The UDD token dialect checks a token's *shape*; its *value*
21
+ is the user's call (`udd-tokens.md`).
22
+
18
23
  ## Produce (in TASK.md §1)
19
24
 
25
+ <output_format>
20
26
  - **Framings weighed** — a one-line trace of what you considered: `X (chosen) · Y · Z`.
21
27
  - **Must** — each required behavior.
22
28
  - **Reject** — each refused input/situation, paired with a **named error code**
23
29
  (`amount <= 0 -> "amount_invalid"`, never "handle bad input").
24
30
  - **After** — the state that is true once it succeeds.
25
- - **Assumptions — least-sure first** — ranked most-likely-wrong → least. The top 1–2 carry a
26
- `⚠` flag: `⚠ <assumption> — least sure because <why>; if wrong: <cost>`. The rest are the
27
- low-stakes `[x]` tail. Never a flat wall of equal `[x]` ticks; that is what gets rubber-stamped.
31
+ - **Assumptions — lowest-confidence first** — ranked most-likely-wrong → least. The top 1–2 carry a
32
+ `⚠` flag: `⚠ <assumption> — lowest confidence because <why>; if wrong: <cost>`. The rest are the
33
+ low-stakes `[x]` tail. Keep the ranking visible — a flat list of equal `[x]` ticks gets approved without reading.
34
+ </output_format>
28
35
 
29
- ## The least-sure flag is bundle-wide
36
+ ## The lowest-confidence flag is bundle-wide
30
37
 
31
38
  The single human approval happens once, at the contract freeze, over the whole bundle. So your
32
- §1 ranking is the FIRST FEEDER into a bundle-level flag the user reads at the seam (`run.md`):
39
+ §1 ranking is the first input into a bundle-level flag the user reads at the decision point (`run.md`):
33
40
  *"of everything I'm asking you to freeze, these 1–2 are most likely wrong."* A flag may point at
34
41
  a §1 assumption, an uncovered scenario, or the contract shape.
35
42
 
36
43
  ## AI prompt
37
44
 
38
- > Role: a domain analyst who brainstorms, then asks rather than assumes. Read CONVENTIONS,
39
- > GLOSSARY, and the user's raw input. First surface 2–3 framings + open questions and let me
40
- > react. Then produce §1: Framings weighed, every Must, every Reject with a named error code,
41
- > the After state, and the Assumptions RANKED least-sure first; flag the 1–2 you are least
42
- > sure about with why + cost. Never resolve an ambiguity by guessing.
45
+ <prompt>
46
+ Role: a domain analyst who brainstorms, then asks rather than assumes.
47
+ Read first: CONVENTIONS · GLOSSARY · the user's raw input.
48
+ Objective: fill §1 SPECIFY with zero ambiguity left for the AI to resolve by guessing.
49
+ Steps:
50
+ 1. Surface 2–3 framings + the open questions; let the user react before you draft.
51
+ 2. Produce §1 — Framings weighed, every Must, every Reject with a named error code, the
52
+ After state, and the Assumptions RANKED lowest-confidence first.
53
+ 3. Flag the 1–2 where your confidence is lowest, each with why + cost.
54
+ Never: resolve an ambiguity by guessing.
55
+ </prompt>
43
56
 
44
57
  ## Exit gate
45
58
 
59
+ <exit_gate>
46
60
  - [ ] Framings weighed noted; every required behavior stated.
47
61
  - [ ] Every rejection has a named error code; success state-change described.
48
- - [ ] Assumptions ordered least-sure first; the 1–2 `⚠` flags carry why + cost — or an honest
62
+ - [ ] Assumptions ordered lowest-confidence first; the 1–2 `⚠` flags carry why + cost — or an honest
49
63
  "none material" that still names the single biggest risk (never a blank "none").
64
+ </exit_gate>
50
65
 
51
66
  ## Next
52
67
 
@@ -6,6 +6,7 @@ generated from it. Fill **§2 SCENARIOS** in TASK.md.
6
6
 
7
7
  ## Produce (in TASK.md §2)
8
8
 
9
+ <output_format>
9
10
  ```gherkin
10
11
  Scenario: <short name>
11
12
  Given <starting situation>
@@ -15,20 +16,29 @@ Scenario: <short name>
15
16
  ```
16
17
 
17
18
  The `And ... unchanged` clause catches corrupting partial failures (e.g. a balance
18
- deducted before a check fails). Never omit it on a rejection.
19
+ deducted before a check fails). Include it on every rejection.
20
+ </output_format>
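To illustrate the `And ... unchanged` clause, here is a small sketch of a rejection check that also asserts the balance did not move. `Account`, `withdraw`, and `InsufficientFunds` are hypothetical names chosen for this example, not part of the package.

```python
from dataclasses import dataclass


class InsufficientFunds(Exception):
    """Hypothetical named rejection used only in this sketch."""


@dataclass
class Account:
    balance: int


def withdraw(account: Account, amount: int) -> None:
    # Validate before mutating, so a rejection leaves no partial state behind.
    if amount > account.balance:
        raise InsufficientFunds("insufficient_funds")
    account.balance -= amount


def check_rejection_leaves_balance_unchanged() -> bool:
    # Given an account holding 50
    account = Account(balance=50)
    before = account.balance
    # When a withdrawal of 80 is attempted
    try:
        withdraw(account, 80)
    except InsufficientFunds:
        rejected = True
    else:
        rejected = False
    # Then it is rejected, And the balance is unchanged
    return rejected and account.balance == before
```

If `withdraw` deducted first and validated afterwards, the rejection would still be raised but the final comparison would fail, which is exactly the partial failure the clause is meant to catch.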
19
21
 
20
22
  ## AI prompt
21
23
 
22
- > Role: a specification tester. Read §1 and GLOSSARY. Write one scenario per Must
23
- > and per Reject rule. For every rejection add an And-clause asserting what must NOT
24
- > change. Results must be specific and observable — never "then it works".
24
+ <prompt>
25
+ Role: a specification tester.
26
+ Read first: §1 · GLOSSARY.
27
+ Objective: one scenario per Must and per Reject rule, each result specific and observable.
28
+ Steps:
29
+ 1. Write one scenario per Must rule and one per Reject rule.
30
+ 2. For every rejection add an And-clause asserting what must NOT change.
31
+ Never: settle for a vague result ("then it works") — results must be specific and observable.
32
+ </prompt>
25
33
 
26
34
  ## Exit gate
27
35
 
36
+ <exit_gate>
28
37
  - [ ] One scenario per Must rule.
29
38
  - [ ] One scenario per Reject rule.
30
39
  - [ ] Each result is a specific, observable fact.
31
40
  - [ ] Every rejection asserts what stays unchanged.
41
+ </exit_gate>
32
42
 
33
43
  ## Next
34
44
 
@@ -1,12 +1,13 @@
1
1
  # Phase 3 — Contract (freeze the shape)
2
2
 
3
3
  Goal: fix the external shape — interfaces, data, names, error cases — and FREEZE
4
- it. This is the seam that makes the AI-led build safe: below it code is
4
+ it. This is the decision point that makes the AI-led build safe: below it code is
5
5
  disposable; above it nothing breaks because the shape does not move. Fill
6
6
  **§3 CONTRACT** in TASK.md.
7
7
 
8
8
  ## Produce (in TASK.md §3)
9
9
 
10
+ <output_format>
10
11
  - Interfaces (endpoints/functions/messages) with inputs/outputs.
11
12
  - Request/response shapes + persistent schema (note transactional needs).
12
13
  - Names drawn from `GLOSSARY.md` (same concept = same name everywhere).
@@ -14,42 +15,56 @@ disposable; above it nothing breaks because the shape does not move. Fill
14
15
 
15
16
  Then mark `Status: FROZEN @ v1`. Generate a mock + contract tests so dependent
16
17
  work can start before the real code exists.
18
+ </output_format>
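As a rough illustration of "a mock + contract tests", the sketch below pins response shapes in a table and checks a mock against them. The `CONTRACT_V1` layout, the field names, and the `insufficient_funds` code are assumptions made for the example; the real §3 format is whatever TASK.md defines.

```python
# Hypothetical frozen shapes: field name -> expected type.
CONTRACT_V1 = {
    "ok": {"status": str, "balance": int},
    "error": {"status": str, "code": str},
}


def mock_withdraw(balance: int, amount: int) -> dict:
    """Mock returning contracted shapes only; no real business logic."""
    if amount > balance:
        return {"status": "rejected", "code": "insufficient_funds"}
    return {"status": "ok", "balance": balance - amount}


def conforms(response: dict, shape: dict) -> bool:
    """True when the response has exactly the contracted fields and types."""
    return set(response) == set(shape) and all(
        isinstance(response[key], expected) for key, expected in shape.items()
    )
```

Dependent work can code against `mock_withdraw` while the same `conforms` check later runs unchanged against the real implementation.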
17
19
 
18
- **The freeze is the one approval.** This seam is where the single human approval lands, over the
19
- whole bundle (§1–§4). Before asking for it, present the bundle **least-sure first**: the 1–2 points
20
+ **The freeze is the one approval.** This decision point is where the single human approval lands, over the
21
+ whole bundle (§1–§4). Before asking for it, present the bundle **lowest-confidence first**: the 1–2 points
20
22
  most likely wrong (`⚠ [spec|scenario|contract|test] … — because …; if wrong: …`) — aim the human's
21
- eye before they freeze. See `run.md`.
23
+ eye before they freeze. Open that report with the ARC (goal · done · plan) per `report-template.md` so the
24
+ human sees the goal this freeze serves and the plan beyond it, not just the bundle. See `run.md`.
25
+ The approval also freezes the §5 Scope (may touch) + Strategy declarations — the bundle covers them.
22
26
 
23
27
  ## The freeze review checklist
24
28
 
25
- The human's one minute, aimed. Walk these six before saying yes:
29
+ The human's one minute, aimed. Walk these seven before saying yes:
26
30
 
27
- - **⚠ flags first** — read the least-sure flags; accept each knowing its cost if wrong.
31
+ - **⚠ flags first** — read the lowest-confidence flags; accept each knowing its cost if wrong.
32
+ The engine refuses an unflagged freeze before build: a frozen §3 with no well-formed
33
+ lowest-confidence flag is rejected (`unflagged_freeze`), and `audit` re-checks it on every
34
+ record that crossed.
28
35
  - **Intent** — does §1 say what you actually want built (and is anything you expected missing)?
29
36
  - **Cases** — does every Must and Reject have an observable §2 scenario you care about?
30
37
  - **Shape** — glossary names, error codes, additive vs breaking: is THIS the shape to freeze?
38
+ - **Grounded** — does §3 cite anchors that exist in the §0 GROUND map (real files/symbols), not invented ones? `status`/`check` surface this — measure, never block.
31
39
  - **Risk** — is this scope high-risk or method-defining? Then require
32
40
  `risk: high · autonomy: conservative` in the TASK.md header — the engine refuses an unguarded completion.
33
41
  - **Tests** — will §4 go red for the right reason, asserting behavior rather than internals?
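The flag shape quoted earlier in this section can be checked mechanically. The sketch below is one possible validator, assuming a flag is a single line; the regular expression and the `can_freeze` helper are illustrative guesses, not the engine's actual `unflagged_freeze` implementation.

```python
import re

# Hypothetical pattern: warning sign, [area], claim, dash, "because <why>; if wrong: <cost>".
# \u26a0 is the warning sign and \u2014 is the long dash used in the flag format.
FLAG = re.compile(
    r"^\u26a0\ufe0f?\s*\[(?P<area>spec|scenario|contract|test)\]\s+(?P<claim>.+?)"
    r"\s+\u2014\s+because\s+(?P<why>.+?);\s*if wrong:\s*(?P<cost>.+)$"
)


def well_formed_flags(lines):
    """Return only the lines that match the full flag shape."""
    return [line for line in lines if FLAG.match(line.strip())]


def can_freeze(lines):
    # Mirrors the rule in the text: no well-formed flag, no freeze.
    return len(well_formed_flags(lines)) >= 1
```

A flag that names the area but omits the reason or the cost fails the match, so it does not count toward the freeze.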
34
42
 
35
- This checklist AIMS the one approval — never a second gate, no sign-off forms, no
43
+ This checklist AIMS the one approval — the freeze stays the only gate: no sign-off forms, no
36
44
  extra documents. Reject any line and the bundle goes back to draft; that is
37
45
  backward-correction, not failure.
38
46
 
39
47
  ## AI prompt
40
48
 
41
- > Role: an interface architect; frozen contracts are immutable. Read §1, §2,
42
- > GLOSSARY. Produce §3: interfaces, shapes, schema named from the glossary; a
43
- > response for every Reject code; a mock returning the contracted shapes and
44
- > contract tests pinning them. Mark FROZEN. No business logic. Never change a
45
- > frozen contract — a change reopens Specify.
49
+ <prompt>
50
+ Role: an interface architect; frozen contracts are immutable.
51
+ Read first: §1 · §2 · GLOSSARY.
52
+ Objective: produce §3, the frozen external shape, nothing more.
53
+ Steps:
54
+ 1. Define interfaces, shapes, and schema named from the glossary, with a response for every Reject code.
55
+ 2. Generate a mock returning the contracted shapes and contract tests pinning them.
56
+ 3. Mark FROZEN. No business logic.
57
+ Never: change a frozen contract — a change reopens Specify.
58
+ </prompt>
46
59
 
47
60
  ## Exit gate
48
61
 
62
+ <exit_gate>
49
63
  - [ ] Versioned and marked `FROZEN`.
50
64
  - [ ] Contract tests pass against the mock.
51
65
  - [ ] Every name matches the glossary.
52
66
  - [ ] Every spec rejection has a contracted response.
67
+ </exit_gate>
53
68
 
54
69
  ## Next
55
70
 
@@ -1,4 +1,4 @@
1
- # Phase 4 — Tests (red safety net)
1
+ # Phase 4 — Tests (failing-first suite)
2
2
 
3
3
  Goal: turn scenarios + contract into automated tests and confirm they FAIL before
4
4
  any code exists. This operationalizes red/green TDD: red now, green only after
@@ -12,10 +12,12 @@ before code exists is testing nothing and will wave bad code through later.
12
12
 
13
13
  ## Produce
14
14
 
15
+ <output_format>
15
16
  - One executable test per scenario (§2), asserting **behavior, not internals**.
16
17
  - Contract-conformance tests (shapes + error responses from §3).
17
18
  - Side-effect assertions on rejection paths (`assert balance unchanged`).
18
19
  - A recorded coverage target in §4.
20
+ </output_format>
19
21
 
20
22
  ## Declaring where tests live
21
23
 
@@ -33,17 +35,25 @@ symlink escapes are never read.
33
35
 
34
36
  ## AI prompt
35
37
 
36
- > Role: a test author who writes tests before code. Read §2 and §3. Turn each
37
- > scenario into an executable test; add contract-conformance and edge-case tests;
38
- > run the suite and confirm it fails for the right reason. Record a coverage
39
- > target. Do NOT implement the feature. Never assert on internals.
38
+ <prompt>
39
+ Role: a test author who writes tests before code.
40
+ Read first: §2 · §3.
41
+ Objective: a red suite that fails for the right reason and asserts behavior, not internals.
42
+ Steps:
43
+ 1. Turn each scenario into an executable test.
44
+ 2. Add contract-conformance and edge-case tests.
45
+ 3. Run the suite and confirm it fails for the right reason; record a coverage target.
46
+ Never: implement the feature, or assert on internals.
47
+ </prompt>
40
48
 
41
49
  ## Exit gate
42
50
 
51
+ <exit_gate>
43
52
  - [ ] One test per scenario.
44
53
  - [ ] Suite runs and is **red for the right reason**.
45
54
  - [ ] Tests assert observable behavior.
46
55
  - [ ] Coverage target recorded.
56
+ </exit_gate>
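One way to read "red for the right reason" is that a test fails on its behavioral assertion, or on an explicit not-implemented stub, rather than on an import or setup error. The classifier below sketches that interpretation; `red_reason` and its labels are hypothetical and not part of the package.

```python
def red_reason(test_fn):
    """Run one test callable and classify how it fails (or does not)."""
    try:
        test_fn()
    except (AssertionError, NotImplementedError):
        # The behavior is missing or wrong: the failure we want before build.
        return "red: right reason"
    except Exception as exc:
        # Broken imports, typos, or bad fixtures: the test is not testing behavior yet.
        return f"red: wrong reason ({type(exc).__name__})"
    # A test that passes before any code exists is testing nothing.
    return "green: suspicious before code exists"
```

Running each new test through a check like this before Build separates a genuine safety net from a suite that merely fails to run.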
47
57
 
48
58
  ## Next
49
59