@pilotspace/add 1.0.0

Files changed (53)
  1. package/GETTING-STARTED.md +238 -0
  2. package/LICENSE +20 -0
  3. package/README.md +106 -0
  4. package/bin/cli.js +131 -0
  5. package/docs/00-introduction.md +46 -0
  6. package/docs/01-principles.md +71 -0
  7. package/docs/02-the-flow.md +93 -0
  8. package/docs/03-step-1-specify.md +117 -0
  9. package/docs/04-step-2-scenarios.md +78 -0
  10. package/docs/05-step-3-contract.md +78 -0
  11. package/docs/06-step-4-tests.md +71 -0
  12. package/docs/07-step-5-build.md +80 -0
  13. package/docs/08-step-6-verify.md +63 -0
  14. package/docs/09-the-loop.md +43 -0
  15. package/docs/10-setup-and-stages.md +75 -0
  16. package/docs/11-governance.md +87 -0
  17. package/docs/12-roles.md +99 -0
  18. package/docs/13-adoption.md +67 -0
  19. package/docs/14-foundation.md +121 -0
  20. package/docs/README.md +70 -0
  21. package/docs/add-competencies.png +0 -0
  22. package/docs/add-flow.png +0 -0
  23. package/docs/add-foundation.png +0 -0
  24. package/docs/add-hierarchy.png +0 -0
  25. package/docs/appendix-a-templates.md +88 -0
  26. package/docs/appendix-b-prompts.md +119 -0
  27. package/docs/appendix-c-glossary.md +85 -0
  28. package/docs/appendix-d-worked-example.md +152 -0
  29. package/docs/appendix-e-checklists.md +80 -0
  30. package/docs/appendix-f-requirements-matrix.md +170 -0
  31. package/package.json +47 -0
  32. package/skill/add/SKILL.md +118 -0
  33. package/skill/add/deltas.md +69 -0
  34. package/skill/add/fold.md +66 -0
  35. package/skill/add/intake.md +49 -0
  36. package/skill/add/phases/0-setup.md +35 -0
  37. package/skill/add/phases/1-specify.md +55 -0
  38. package/skill/add/phases/2-scenarios.md +36 -0
  39. package/skill/add/phases/3-contract.md +41 -0
  40. package/skill/add/phases/4-tests.md +37 -0
  41. package/skill/add/phases/5-build.md +38 -0
  42. package/skill/add/phases/6-verify.md +39 -0
  43. package/skill/add/phases/7-observe.md +32 -0
  44. package/skill/add/run.md +152 -0
  45. package/skill/add/scope.md +58 -0
  46. package/tooling/add.py +1573 -0
  47. package/tooling/templates/CONVENTIONS.md.tmpl +8 -0
  48. package/tooling/templates/GLOSSARY.md.tmpl +3 -0
  49. package/tooling/templates/MILESTONE.md.tmpl +25 -0
  50. package/tooling/templates/MODEL_REGISTRY.md.tmpl +6 -0
  51. package/tooling/templates/PROJECT.md.tmpl +42 -0
  52. package/tooling/templates/TASK.md.tmpl +111 -0
  53. package/tooling/templates/dependencies.allowlist.tmpl +2 -0
@@ -0,0 +1,118 @@
1
+ ---
2
+ name: add
3
+ description: >-
4
+ ADD (AI-Driven Development) — a minimal, state-tracked workflow for building
5
+ software where the AI writes the code and the human owns direction and
6
+ verification. Drives every feature through one lean TASK.md: Specify →
7
+ Scenarios → Contract → Tests → Build → Verify → Observe, with red/green TDD
8
+ built in. Use this skill whenever working in a repo that has a `.add/`
9
+ directory, when the user says "add", "start a task", "next phase", "specify
10
+ this feature", "ADD method", or "AI-driven development", or when scaffolding a
11
+ new feature and you want spec/tests-first discipline instead of vague-prompt
12
+ coding. Also use it to resume work across sessions (it reads `.add/state.json`
13
+ so you never re-read the whole repo).
14
+ ---
15
+
16
+ # ADD — the orchestration engine
17
+
18
+ You are the orchestrator. ADD keeps the AI fast *and* safe by fixing direction
19
+ (spec, scenarios, contract, failing tests) **before** the build, and trusting
20
+ the result through passing evidence rather than a plausible-looking diff.
21
+
22
+ **One file = one task.** Each feature lives in a single `.add/tasks/<slug>/TASK.md`
23
+ with seven sections. You fill them top to bottom; the Python tool tracks where
24
+ you are so context never rots across sessions.
25
+
26
+ ## Always start here (orient — do not skip)
27
+
28
+ Run the tool to find the resume point instead of re-reading the repo:
29
+
30
+ ```bash
+ python3 .add/tooling/add.py status
+ ```
33
+
34
+ - **No `.add/` yet** → go to **phase 0 (setup)**: read `phases/0-setup.md`.
35
+ - **A task is active** → open `.add/tasks/<active>/TASK.md`, look at its `phase:`
36
+ marker, and read the matching `phases/<n>-<phase>.md`. Work *only* that phase.
37
+ - **No active task** → first SIZE the request (see Intake below), then create the
38
+ right scope: `python3 .add/tooling/add.py new-task <slug> --title "..."`.
39
+
40
+ ## Intake — size a request before creating scope
41
+
42
+ When the user brings a raw request, classify it BEFORE making a milestone or task:
43
+ read `intake.md` and place it in exactly one bucket — `new-major` · `sub-milestone`
44
+ · `task` · `change-request` — then propose `{ bucket, rationale, command }` and let
45
+ the human confirm. This is the intake altitude (request → versioned scope); see
46
+ `intake.md` for the rubric, the tie-break order, and worked examples.
47
+
48
+ Once a request is classified `new-major`/`sub-milestone`, drafting the actual
49
+ `MILESTONE.md` (goal · scope · exit criteria · breadth-first tasks) is the second
50
+ half of intake: read `scope.md` for how to fill it well, the per-outcome behavior,
51
+ and the confirm-before-create rule. You propose the draft; the human confirms.
52
+
53
+ ## The flow and which file to load
54
+
55
+ Load the phase guide **only for the phase you are in** (progressive disclosure):
56
+
57
+ | Phase | Guide | Produces (TASK.md section) | Who leads |
58
+ |-------|-------|----------------------------|-----------|
59
+ | setup | `phases/0-setup.md` | `.add/` + survivor files | human |
60
+ | specify | `phases/1-specify.md` | §1 rules + ranked least-sure flag | human + AI (co-specify) |
61
+ | scenarios | `phases/2-scenarios.md` | §2 Given/When/Then | human |
62
+ | contract | `phases/3-contract.md` | §3 frozen shape | human + AI |
63
+ | tests | `phases/4-tests.md` | §4 + red suite in `tests/` | human sets, AI writes |
64
+ | build | `phases/5-build.md` | code in `src/`, tests green | **AI** |
65
+ | verify | `phases/6-verify.md` | §6 checks + gate record | **human** |
66
+ | observe | `phases/7-observe.md` | §7 spec delta | human + AI |
67
+
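The table above is effectively an ordered lookup from phase name to guide file. A minimal Python sketch of that lookup follows. `PHASES`, `GUIDES`, and `next_phase` are hypothetical names used only for illustration; they are not taken from `add.py`, whose internals this excerpt does not show.

```python
# Phase order as listed in the flow table (setup is phase 0).
PHASES = ["setup", "specify", "scenarios", "contract",
          "tests", "build", "verify", "observe"]

# Guide path per phase, following the `phases/<n>-<phase>.md` pattern.
GUIDES = {name: f"phases/{i}-{name}.md" for i, name in enumerate(PHASES)}


def next_phase(current: str):
    """Return the phase after `current`, or None once the loop is complete."""
    i = PHASES.index(current)
    return PHASES[i + 1] if i + 1 < len(PHASES) else None
```

The sketch deliberately ignores exit gates; in the method itself a phase only advances after its gate is met.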
68
+ In **observe**, also emit **competency deltas** — learnings tagged by which of the five
69
+ (`DDD · SDD · UDD · TDD · ADD`) they improve — so the foundation self-improves across loops.
70
+ You write them as `open`; the human folds them into `PROJECT.md`. Read `deltas.md` for the
71
+ grammar and the status lifecycle. At milestone close (or on demand), run the fold ritual that
72
+ gathers confirmed deltas into a versioned foundation — read `fold.md`.
73
+
74
+ ## The dynamic run (v6)
75
+
76
+ Once **§3 CONTRACT is FROZEN**, the build→verify half MAY run as a dynamic, auto-gated run —
77
+ fan-out + in-run convergence — instead of a manual build. Read `run.md` for the trigger, the
78
+ touch-boundary, the evidence auto-gate, and the autonomy dial. The human-led front
79
+ (specify·scenarios·contract) is unchanged; the run never edits a frozen contract and never
80
+ auto-passes a security finding.
81
+
82
+ ## Non-negotiable rules (from the method)
83
+
84
+ 1. **Direction before speed.** Never start Build until §1–§4 exist and tests are red.
85
+ 2. **Trust evidence, not inspection.** A feature is trusted because its tests pass
86
+ and the blind-spots (concurrency, security, architecture) were checked — not
87
+ because the code reads plausibly.
88
+ 3. **Never weaken a test or edit a frozen contract to make the build pass.** That
89
+ inverts the method. A real change is a *change request* back to Specify.
90
+ 4. **No silent skips.** Every Verify ends in exactly one recorded outcome:
91
+ `PASS`, `RISK-ACCEPTED` (signed, non-security only), or `HARD-STOP`. A security
92
+ finding is always `HARD-STOP`.
93
+ 5. **Ask, don't guess.** If a requirement is unclear, stop and ask the user.
94
+
95
+ ## Advancing
96
+
97
+ After a phase's exit gate is met, advance the state (this also syncs the marker
98
+ inside TASK.md):
99
+
100
+ ```bash
+ python3 .add/tooling/add.py advance # next phase of the active task
+ python3 .add/tooling/add.py gate PASS # at verify: records PASS, marks done
+ ```
104
+
105
+ ## Depth by stage
106
+
107
+ The steps never change; their depth does. Read the stage from `add.py status`:
108
+
109
+ - **prototype** — run light; code is throwaway; design/experience is the point.
110
+ - **poc** — run contract/tests/build deeply on the single riskiest slice only.
111
+ - **mvp** — full flow, narrow scope, light observation.
112
+ - **production** — every step at full rigor + the observe loop.
113
+
114
+ ## The trust layer
115
+
116
+ The full method (the *why* behind every rule) is the AIDD book in `.add/docs/`.
117
+ When a phase decision is genuinely unclear, read the linked chapter — each phase
118
+ guide points to its chapter. Do not duplicate the book here; load it on demand.
@@ -0,0 +1,69 @@
1
+ # Competency deltas — how each loop sharpens the foundation
2
+
3
+ A **competency delta** is a single learning a task produces, tagged by which of ADD's five
4
+ competencies it improves. You write deltas in a task's **OBSERVE** phase; later, the
5
+ `foundation-update-loop` gathers the confirmed ones and folds them into a versioned `PROJECT.md`.
6
+ This is how `DDD · SDD · UDD · TDD · ADD` stop being write-once and start converging.
7
+
8
+ You (the AI) **emit** deltas as `open`. Only the **human** moves a delta to `folded` or `rejected`
9
+ (folding into the foundation is judgment — see the verify/observe seam). You never self-fold.
10
+
11
+ ## The grammar (frozen)
12
+
13
+ Each delta is ONE line, exactly:
14
+
15
+ ```
+ - [<COMPETENCY> · <status>] <learning> (evidence: <pointer>)
+ ```
18
+
19
+ - `<COMPETENCY>` — exactly one of the five (below).
20
+ - `<status>` — `open` | `folded` | `rejected`. A **newly emitted delta is `open`**.
21
+ - `<learning>` — the insight, in one phrase ("the domain model missed multi-tenancy").
22
+ - `(evidence: …)` — **required**, non-empty: a failing scenario, a production signal, a review
23
+ note. No evidence → it is an opinion, not a delta.
24
+
25
+ ## The five competencies (pick exactly one per delta)
26
+
27
+ | tag | competency | a delta here means you learned something about… |
28
+ |-----|------------|--------------------------------------------------|
29
+ | `DDD` | Domain | the domain model — an entity, rule, or boundary the spec assumed wrong |
30
+ | `SDD` | Spec | what the feature must do / must reject — a missing or wrong requirement |
31
+ | `UDD` | UI/UX | the user-facing shape — a flow, affordance, or wording that misled |
32
+ | `TDD` | Test | how we prove correctness — a missing scenario, a flaky or hollow test |
33
+ | `ADD` | AI/build | how the AI builds — a harness, prompt, or convention that helped or hurt |
34
+
35
+ If a learning seems to touch two, ask "which competency, once updated, would have PREVENTED this?"
36
+ That is its home. Split genuinely separate learnings into separate deltas; never tag one twice.
37
+
38
+ ## Status lifecycle
39
+
40
+ ```
+ emit (OBSERVE)      human review (foundation-update-loop)
+   open ───────────▶ folded   (the learning is merged into PROJECT.md; version bumps)
+        └──────────▶ rejected (considered and deliberately NOT folded — the trail is kept)
+ ```
45
+
46
+ An `open` delta is a pending signal. `folded` and `rejected` are both human decisions; a `rejected`
47
+ delta is left in place (not deleted) so "we saw this and chose not to act" stays auditable.
48
+
49
+ ## Reject codes (well-formedness — you are the first check, the human is the backstop)
50
+
51
+ There is no engine validator yet, so before you record a delta, self-check it:
52
+
53
+ - `unknown_competency` — the tag is missing or not one of `DDD · SDD · UDD · TDD · ADD`. Fix the tag.
54
+ - `no_evidence` — the `(evidence: …)` pointer is missing or empty. Add the proof, or drop the line.
55
+ - `unknown_status` — the status is not `open | folded | rejected`. A fresh delta is `open`.
56
+
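Since the text says there is no engine validator yet, the self-check could be sketched in Python roughly as below. This is an illustration under stated assumptions, not part of `add.py`: the regular expressions and the `check_delta` function are hypothetical, and treating a line with no recognisable `[tag · status]` prefix as `unknown_competency` is a simplification.

```python
import re

COMPETENCIES = {"DDD", "SDD", "UDD", "TDD", "ADD"}
STATUSES = {"open", "folded", "rejected"}

# Hypothetical patterns for: - [<COMPETENCY> · <status>] <learning> (evidence: <pointer>)
DELTA_RE = re.compile(r"^- \[(?P<tag>[^\]·]*)·(?P<status>[^\]]*)\]\s*(?P<rest>.*)$")
EVIDENCE_RE = re.compile(r"\(evidence:\s*(?P<pointer>[^)]*)\)\s*$")


def check_delta(line: str) -> list[str]:
    """Return the reject codes that apply to one delta line ([] = well-formed)."""
    m = DELTA_RE.match(line.strip())
    if m is None:
        return ["unknown_competency"]  # no tag could be found at all
    codes = []
    if m["tag"].strip() not in COMPETENCIES:
        codes.append("unknown_competency")
    if m["status"].strip() not in STATUSES:
        codes.append("unknown_status")
    evidence = EVIDENCE_RE.search(m["rest"])
    if evidence is None or not evidence["pointer"].strip():
        codes.append("no_evidence")
    return codes
```

A line that returns an empty list is well-formed; it still needs the human's judgment before it is ever folded.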
57
+ ## Worked example
58
+
59
+ A task that built a tenancy feature finished its OBSERVE phase with:
60
+
61
+ ```
+ - [DDD · open] the account model conflated org and workspace (evidence: scenario_cross_tenant_read failed)
+ - [TDD · open] no scenario covered a deleted tenant's dangling sessions (evidence: review note, PR thread)
+ - [ADD · open] the scaffold's allow-list missed the tenancy lib, slowing build (evidence: build log retry)
+ ```
66
+
67
+ Three learnings, three competencies, each with a pointer. At the next foundation update the human
68
+ folded the DDD and TDD deltas into `PROJECT.md` (→ `folded`) and rejected the ADD one as a one-off
69
+ (→ `rejected`). The foundation got sharper; nothing was silently lost.
@@ -0,0 +1,66 @@
1
+ # Folding deltas — how the foundation self-improves
2
+
3
+ This **closes the loop**. `deltas.md` lets a task EMIT learnings (`open` competency deltas in its
4
+ OBSERVE phase); folding gathers the confirmed ones and writes them into a **versioned foundation**,
5
+ so `DDD · SDD · UDD · TDD · ADD` sharpen across milestones instead of drifting.
6
+
7
+ You (the AI) **gather and propose**; the **human confirms**; you then write the **append-only** fold.
8
+ You never self-fold — folding is judgment (see the verify/observe seam).
9
+
10
+ ## When to fold
11
+
12
+ At **milestone close** (the natural "version bump to the foundation"), or **on demand** when open
13
+ deltas have piled up. This is a convention, not a command — there is no `add.py fold`; the ritual
14
+ lives here so the engine stays judgment-free.
15
+
16
+ ## The ritual
17
+
18
+ 1. **Gather** — scan every task's OBSERVE `### Competency deltas` block for lines still `open`.
19
+ 2. **Group** — bucket them by competency (`DDD · SDD · UDD · TDD · ADD`).
20
+ 3. **Propose** — for each, draft the exact foundation edit (see routing) and show the human.
21
+ 4. **Confirm** — the human accepts or declines each delta. No write happens without this.
22
+ 5. **Write** — append the accepted edits, flip each delta's status, and bump the version.
23
+
24
+ ## Fold routing (every competency has a home)
25
+
26
+ | competency | folds into | how |
27
+ |------------|-----------|-----|
28
+ | `DDD` | `PROJECT.md` §Domain (DDD) | refine/append a model bullet |
29
+ | `SDD` | `PROJECT.md` §Spec / Living Document (SDD) | refine/append a settled-vs-open line |
30
+ | `UDD` | `PROJECT.md` §Users (UDD) | refine/append a UX line |
31
+ | `TDD` | `CONVENTIONS.md` | append a testing convention (no PROJECT.md section — it is the engine) |
32
+ | `ADD` | `CONVENTIONS.md` | append a build/harness convention (likewise the engine) |
33
+
34
+ **Every** fold — whatever the competency — ALSO appends one row to `PROJECT.md` **§Key Decisions**
35
+ (date · decision · why · outcome): the universal, auditable trail of what the foundation learned.
36
+
37
+ ## Status transitions & version
38
+
39
+ - on **confirm**: the delta moves `open` → `folded` (and its edit is appended to the routed target).
40
+ - on **decline**: the delta moves `open` → `rejected` and is **left in place** — never deleted —
41
+ so "we considered this and chose not to act" stays auditable.
42
+ - a fold is **append-only**: it adds bullets/rows; it never silently rewrites existing foundation text.
43
+ - each fold session **bumps** the `foundation-version:` marker in `PROJECT.md` by one (monotonic int).
44
+
45
+ ## Reject codes (the AI is first check, the human the backstop)
46
+
47
+ - `no_open_deltas` — nothing is `open` anywhere. The ritual is a no-op; do **not** bump the version.
48
+ - `unconfirmed_fold` — a write was attempted without recorded human confirmation. The AI proposes;
49
+ it never self-folds. Stop and get confirmation.
50
+ - `unroutable_delta` — a delta's competency is not one of the five, so it has no fold target. Fix the
51
+ delta (it is malformed per `deltas.md`) before folding.
52
+
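The routing table, status transitions, and reject codes above can be read together as one small procedure. The Python sketch below is only an illustration of that reading: `Delta`, `FoldRejected`, and `fold` are hypothetical names, the text states there is no `add.py fold` command, and bumping the version once per session even when every delta is declined is an assumption about "each fold session".

```python
from dataclasses import dataclass, replace

# Routing targets copied from the fold-routing table.
ROUTES = {
    "DDD": "PROJECT.md §Domain (DDD)",
    "SDD": "PROJECT.md §Spec / Living Document (SDD)",
    "UDD": "PROJECT.md §Users (UDD)",
    "TDD": "CONVENTIONS.md",
    "ADD": "CONVENTIONS.md",
}


@dataclass(frozen=True)
class Delta:
    competency: str
    status: str
    learning: str


class FoldRejected(Exception):
    """Carries one of the reject codes listed above."""


def fold(deltas, decisions, version):
    """Apply recorded human decisions (learning -> True/False) to open deltas.

    Returns (updated deltas, appended edits, new foundation version).
    """
    open_deltas = [d for d in deltas if d.status == "open"]
    if not open_deltas:
        raise FoldRejected("no_open_deltas")      # no-op: version is not bumped
    for d in open_deltas:
        if d.competency not in ROUTES:
            raise FoldRejected("unroutable_delta")
        if d.learning not in decisions:
            raise FoldRejected("unconfirmed_fold")
    updated, edits = [], []
    for d in deltas:
        if d.status != "open":
            updated.append(d)
        elif decisions[d.learning]:
            updated.append(replace(d, status="folded"))
            edits.append((ROUTES[d.competency], d.learning))
            edits.append(("PROJECT.md §Key Decisions", d.learning))
        else:
            updated.append(replace(d, status="rejected"))  # kept, never deleted
    return updated, edits, version + 1
```

Because `edits` is only ever a list of things to append, the sketch cannot rewrite existing foundation text, which mirrors the append-only rule.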
53
+ ## Worked example (from this repo's own history)
54
+
55
+ The `competency-deltas` task closed its OBSERVE with two deltas — the homeless ones, `TDD`/`ADD`,
56
+ which have no PROJECT.md section:
57
+
58
+ ```
+ - [ADD · open] dogfood .add/tooling template can silently diverge from canonical (evidence: md5 mismatch this build)
+ - [TDD · open] structural tests guard canonical artifacts but not their dogfood twins (evidence: scope-loop note + this build)
+ ```
62
+
63
+ At the next fold the human confirms both. Routing sends each to `CONVENTIONS.md` (a "sync the dogfood
64
+ tree + assert md5 parity" convention), appends a §Key Decisions row for each, flips them to `folded`,
65
+ and bumps `foundation-version` 1 → 2. The two competencies the foundation never tracked before now
66
+ have a home — which is exactly why v5 routes TDD/ADD to `CONVENTIONS.md`.
@@ -0,0 +1,49 @@
1
+ # Intake — size a request into versioned scope
2
+
3
+ Before a task exists, ADD turns a raw request into correctly-sized, versioned scope.
4
+ This is the **intake altitude**: the per-task flow is phases 0–7; intake is the step
5
+ *before* a task — request → milestone or task. You (the AI) **propose**; the human
6
+ **confirms**. Never create scope without a confirmed proposal.
7
+
8
+ ## The four buckets
9
+
10
+ Classify every request into exactly ONE bucket:
11
+
12
+ | Bucket | Decision test | Implied command |
13
+ |--------|---------------|-----------------|
14
+ | `new-major` | a new product theme/pillar no active milestone's goal covers | `add.py new-milestone vN` |
15
+ | `sub-milestone` | a slice of an EXISTING major theme, too big for one task | `add.py new-milestone vN-M` |
16
+ | `task` | fits within the ACTIVE milestone's stated scope | `add.py new-task <slug>` |
17
+ | `change-request` | modifies ALREADY-FROZEN scope (a frozen contract or a shipped promise) | `add.py phase specify\|contract <affected>` |
18
+
19
+ **Tie-break order: the frozen-scope test runs FIRST, before the size test.**
20
+ First ask "does this change already-frozen scope?" → if yes, it is a `change-request`
21
+ (never re-size frozen work as new scope). Only if no, apply the size test: a new theme
22
+ → `new-major`; a slice of a live theme → `sub-milestone`; fits the active milestone
23
+ → `task`.
24
+
25
+ ## What you emit (the proposal)
26
+
27
+ For every request, emit ONE of:
28
+
29
+ - **a classification** — `{ bucket, rationale, command }` — where `rationale` names WHY
30
+ (the theme, the slice, the fit, or the frozen scope touched) and `command` is the exact
31
+ `add.py …` from the table. The human confirms or overrides before you run it.
32
+ - **a rejection** — `{ reject, rationale }` — and you create nothing:
33
+ - `ask_human` — too ambiguous/underspecified to size. Ask the human; never guess a bucket.
34
+ - `frozen_scope` — it changes frozen scope; route it as a `change-request` back to
35
+ SPECIFY/CONTRACT of the affected task — never spawn a parallel milestone that forks the truth.
36
+ - `split_required` — it spans more than one bucket; propose the SMALLEST set of correctly-sized
37
+ items, each with its own rationale; never force it into one milestone.
38
+
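The tie-break order and the proposal shape above could be sketched as follows. This is an illustration, not the tool's behavior: `propose` and its three yes/no inputs are hypothetical, the command strings are the placeholder templates from the bucket table, and only the `ask_human` rejection is modelled.

```python
# Command templates copied from the bucket table (placeholders left as-is).
COMMANDS = {
    "new-major": "add.py new-milestone vN",
    "sub-milestone": "add.py new-milestone vN-M",
    "task": "add.py new-task <slug>",
    "change-request": "add.py phase specify|contract <affected>",
}


def propose(touches_frozen_scope, is_new_theme, fits_active_milestone, rationale):
    """Return a classification proposal, or an `ask_human` rejection.

    Each flag is True, False, or None (None = the decision test cannot be answered).
    """
    if None in (touches_frozen_scope, is_new_theme, fits_active_milestone):
        return {"reject": "ask_human", "rationale": "too ambiguous to size"}
    if touches_frozen_scope:          # frozen-scope test runs FIRST
        bucket = "change-request"
    elif is_new_theme:                # then the size test
        bucket = "new-major"
    elif fits_active_milestone:
        bucket = "task"
    else:
        bucket = "sub-milestone"      # a slice of a live theme, too big for one task
    return {"bucket": bucket, "rationale": rationale, "command": COMMANDS[bucket]}
```

The returned dictionary is still only a proposal; nothing should be created until the human confirms it.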
39
+ When confirmed, record the `rationale` in the artifact you create or affect — the new
40
+ MILESTONE.md goal/body, the new TASK.md, or a note in the affected TASK.md — never in state.json.
41
+
42
+ ## Worked examples (from this project's own history)
43
+
44
+ | request | bucket | rationale |
45
+ |---------|--------|-----------|
46
+ | give ADD a hosted web dashboard | new-major | a new product theme no active milestone's goal covers → a fresh major line (v5) |
47
+ | add the build corridor + tests-red-before-build | sub-milestone | a slice of the live v4 "self-driving" theme, too big for one task → v4-2 |
48
+ | expose owner/stop as --json | task | fits the active v4-1 (intake interface) scope → one task |
49
+ | guide --json phase/gate should be nullable | change-request | changes the FROZEN machine-state-json contract → reopen its CONTRACT, do not make a new milestone |
@@ -0,0 +1,35 @@
1
+ # Phase 0 — Setup (once per project)
2
+
3
+ Goal: make every later gate enforceable automatically. Do this once.
4
+
5
+ ## Do
6
+
7
+ 1. Initialise the runtime (creates `.add/` + survivor-layer files):
8
+ ```bash
+ python3 .add/tooling/add.py init --name "<project>" --stage prototype
+ ```
11
+ If the tool isn't there yet, the installer (`npx @pilotspace/add init`) placed it at
12
+ `.add/tooling/add.py`.
13
+ 2. Fill the survivor-layer files (they outlive all code):
14
+ - `.add/PROJECT.md` — **the foundation**: Domain (DDD) · Spec/Living-Document (SDD,
15
+ → active milestone) · UI/UX (UDD) · Key Decisions. Cross-milestone context the
16
+ engine reads first. Keep it to one screen. Book: `docs/14-foundation.md`.
17
+ - `.add/CONVENTIONS.md` — language, folders, naming, lint, error-code style, architecture.
18
+ - `.add/GLOSSARY.md` — one name per concept; used in specs, contracts, and code.
19
+ - `.add/MODEL_REGISTRY.md` — which AI model/version writes this project.
20
+ - `.add/dependencies.allowlist` — packages the AI may use; CI rejects others.
21
+ 3. Confirm CI runs green on the empty skeleton before the first feature.
22
+
23
+ ## Exit gate
24
+
25
+ - [ ] `.add/state.json` exists (`add.py status` works).
26
+ - [ ] `.add/PROJECT.md` foundation filled (domain · spec · UI/UX).
27
+ - [ ] CONVENTIONS, GLOSSARY, MODEL_REGISTRY, allowlist filled.
28
+ - [ ] Pipeline green on the skeleton.
29
+
30
+ ## Next
31
+
32
+ ```bash
+ python3 .add/tooling/add.py new-task <slug> --title "<feature>"
+ ```
35
+ Then read `phases/1-specify.md`. · Book: `docs/10-setup-and-stages.md`.
@@ -0,0 +1,55 @@
1
+ # Phase 1 — Specify (the rules)
2
+
3
+ Goal: state what the feature MUST do and what it must REJECT, with zero ambiguity
4
+ for the AI to resolve by guessing. Fill **§1 SPECIFY** in TASK.md.
5
+
6
+ Specify is **co-specification**: brainstorm the shape WITH the user, draft it, then let
7
+ the user validate with your advice. If you cannot write the spec, you do not yet
8
+ understand the feature — that is information, not an obstacle. Stop and ask.
9
+
10
+ ## Co-specify in three moves
11
+
12
+ 1. **Diverge** — before drafting, surface the decision space: the 2–3 genuine framings of the
13
+ feature + the open questions you would otherwise guess. Invite the user to add, kill,
14
+ redirect. (Conversational — no new file. At prototype/poc this collapses to one sentence.)
15
+ 2. **Converge** — draft §1, then RANK what you are least sure about (below).
16
+ 3. **Validate** — present the ranked uncertainty first; the user confirms, corrects, or sends back.
17
+
18
+ ## Produce (in TASK.md §1)
19
+
20
+ - **Framings weighed** — a one-line trace of what you considered: `X (chosen) · Y · Z`.
21
+ - **Must** — each required behavior.
22
+ - **Reject** — each refused input/situation, paired with a **named error code**
23
+ (`amount <= 0 -> "amount_invalid"`, never "handle bad input").
24
+ - **After** — the state that is true once it succeeds.
25
+ - **Assumptions — least-sure first** — ranked most-likely-wrong → least. The top 1–2 carry a
26
+ `⚠` flag: `⚠ <assumption> — least sure because <why>; if wrong: <cost>`. The rest are the
27
+ low-stakes `[x]` tail. Never a flat wall of equal `[x]` ticks — that is what gets rubber-stamped.
28
+
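To make the "least-sure first" ordering concrete, here is a small Python sketch that renders an assumptions list in the flag format quoted above. The `Assumption` type, its numeric `doubt` score, and `render_assumptions` are hypothetical; the method itself only asks for a ranking, not a score.

```python
from dataclasses import dataclass


@dataclass
class Assumption:
    text: str
    doubt: float      # hypothetical score: higher = more likely wrong
    why: str = ""
    cost: str = ""


def render_assumptions(items, flagged=2):
    """Rank most-likely-wrong first; flag the top `flagged`, tick the rest."""
    ranked = sorted(items, key=lambda a: a.doubt, reverse=True)
    lines = []
    for position, a in enumerate(ranked):
        if position < flagged:
            lines.append(f"⚠ {a.text} — least sure because {a.why}; if wrong: {a.cost}")
        else:
            lines.append(f"[x] {a.text}")
    return lines
```

Keeping `flagged` at 1 or 2 is the point of the exercise: a longer flagged list tends to be skimmed like the flat wall of ticks it replaces.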
29
+ ## The least-sure flag is bundle-wide
30
+
31
+ The single human approval happens once, at the contract freeze, over the whole bundle. So your
32
+ §1 ranking is the FIRST FEEDER into a bundle-level flag the user reads at the seam (`run.md`):
33
+ *"of everything I'm asking you to freeze, these 1–2 are most likely wrong."* A flag may point at
34
+ a §1 assumption, an uncovered scenario, or the contract shape.
35
+
36
+ ## AI prompt
37
+
38
+ > Role: a domain analyst who brainstorms, then asks rather than assumes. Read CONVENTIONS,
39
+ > GLOSSARY, and the user's raw input. First surface 2–3 framings + open questions and let me
40
+ > react. Then produce §1: Framings weighed, every Must, every Reject with a named error code,
41
+ > the After state, and the Assumptions RANKED least-sure first — flag the 1–2 you are least
42
+ > sure about with why + cost. Never resolve an ambiguity by guessing.
43
+
44
+ ## Exit gate
45
+
46
+ - [ ] Framings weighed noted; every required behavior stated.
47
+ - [ ] Every rejection has a named error code; success state-change described.
48
+ - [ ] Assumptions ordered least-sure first; the 1–2 `⚠` flags carry why + cost — or an honest
49
+ "none material" that still names the single biggest risk (never a blank "none").
50
+
51
+ ## Next
52
+
53
+ `python3 .add/tooling/add.py advance` → read `phases/2-scenarios.md`.
54
+ Book: `docs/03-step-1-specify.md`. (UI feature? also sketch flows + every screen
55
+ state: loading/empty/error/success.)
@@ -0,0 +1,36 @@
1
+ # Phase 2 — Scenarios (pass/fail cases)
2
+
3
+ Goal: rewrite each rule as a concrete Given/When/Then that is readable by people
4
+ and checkable by machines. This is the highest-leverage artifact — the tests are
5
+ generated from it. Fill **§2 SCENARIOS** in TASK.md.
6
+
7
+ ## Produce (in TASK.md §2)
8
+
9
+ ```gherkin
+ Scenario: <short name>
+   Given <starting situation>
+   When <action>
+   Then <observable result>
+   And <what must remain unchanged> # REQUIRED for every rejection
+ ```
16
+
17
+ The `And ... unchanged` clause catches corrupting partial failures (e.g. a balance
18
+ deducted before a check fails). Never omit it on a rejection.
19
+
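A short Python sketch shows why the unchanged clause matters. Everything here is hypothetical illustration (the `Account` class, the `limit_exceeded` code, and both withdraw functions); it simply contrasts a rejection that mutates state first with one that checks first.

```python
class Rejected(Exception):
    def __init__(self, code):
        super().__init__(code)
        self.code = code


class Account:
    def __init__(self, balance, daily_limit=100):
        self.balance = balance
        self.daily_limit = daily_limit


def withdraw_buggy(account, amount):
    account.balance -= amount              # mutation happens first...
    if amount > account.daily_limit:       # ...then the check fails
        raise Rejected("limit_exceeded")


def withdraw_safe(account, amount):
    if amount > account.daily_limit:       # check before touching state
        raise Rejected("limit_exceeded")
    account.balance -= amount


def rejection_leaves_balance_unchanged(withdraw):
    """Given balance 500, When withdrawing 250 over the limit, Then rejected, And unchanged."""
    account = Account(balance=500)
    try:
        withdraw(account, 250)
    except Rejected as err:
        assert err.code == "limit_exceeded"   # Then: the observable result
    return account.balance == 500             # And: what must remain unchanged
```

Both functions raise the same error code, so a scenario with only a `Then` clause cannot tell them apart; the `And` check is what exposes the buggy one.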
20
+ ## AI prompt
21
+
22
+ > Role: a specification tester. Read §1 and GLOSSARY. Write one scenario per Must
23
+ > and per Reject rule. For every rejection add an And-clause asserting what must NOT
24
+ > change. Results must be specific and observable — never "then it works".
25
+
26
+ ## Exit gate
27
+
28
+ - [ ] One scenario per Must rule.
29
+ - [ ] One scenario per Reject rule.
30
+ - [ ] Each result is a specific, observable fact.
31
+ - [ ] Every rejection asserts what stays unchanged.
32
+
33
+ ## Next
34
+
35
+ `python3 .add/tooling/add.py advance` → read `phases/3-contract.md`.
36
+ Book: `docs/04-step-2-scenarios.md`.
@@ -0,0 +1,41 @@
1
+ # Phase 3 — Contract (freeze the shape)
2
+
3
+ Goal: fix the external shape — interfaces, data, names, error cases — and FREEZE
4
+ it. This is the seam that makes the AI-led build safe: below it code is
5
+ disposable; above it nothing breaks because the shape does not move. Fill
6
+ **§3 CONTRACT** in TASK.md.
7
+
8
+ ## Produce (in TASK.md §3)
9
+
10
+ - Interfaces (endpoints/functions/messages) with inputs/outputs.
11
+ - Request/response shapes + persistent schema (note transactional needs).
12
+ - Names drawn from `GLOSSARY.md` (same concept = same name everywhere).
13
+ - A response for **every** Reject error code from §1.
14
+
15
+ Then mark `Status: FROZEN @ v1`. Generate a mock + contract tests so dependent
16
+ work can start before the real code exists.
17
+
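As a rough illustration of "a mock plus contract tests", the Python sketch below pins a tiny frozen shape and checks a mock against it. The contract contents, the two reject codes, and the names `CONTRACT_V1`, `mock_withdraw`, and `conforms` are all hypothetical; a real §3 would draw its names from `GLOSSARY.md`.

```python
# Hypothetical §1 reject codes and a frozen v1 shape for a withdraw operation.
SPEC_REJECT_CODES = {"amount_invalid", "insufficient_funds"}

CONTRACT_V1 = {
    "status": "FROZEN @ v1",
    "success": {"ok": bool, "balance": int},
    "errors": frozenset({"amount_invalid", "insufficient_funds"}),
}


def mock_withdraw(amount: int, balance: int = 100) -> dict:
    """Return contracted shapes only; no real business logic lives here."""
    if amount <= 0:
        return {"ok": False, "error": "amount_invalid"}
    if amount > balance:
        return {"ok": False, "error": "insufficient_funds"}
    return {"ok": True, "balance": balance - amount}


def conforms(response: dict) -> bool:
    """Contract test: does a response match the frozen success or error shape?"""
    if response.get("ok"):
        return all(isinstance(response.get(key), kind)
                   for key, kind in CONTRACT_V1["success"].items())
    return response.get("error") in CONTRACT_V1["errors"]
```

Comparing `SPEC_REJECT_CODES` with `CONTRACT_V1["errors"]` is one cheap way to check the exit-gate item that every spec rejection has a contracted response.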
18
+ **The freeze is the one approval.** This seam is where the single human approval lands, over the
19
+ whole bundle (§1–§4). Before asking for it, present the bundle **least-sure first**: the 1–2 points
20
+ most likely wrong (`⚠ [spec|scenario|contract|test] … — because …; if wrong: …`) — aim the human's
21
+ eye before they freeze. See `run.md`.
22
+
23
+ ## AI prompt
24
+
25
+ > Role: an interface architect; frozen contracts are immutable. Read §1, §2,
26
+ > GLOSSARY. Produce §3: interfaces, shapes, schema named from the glossary; a
27
+ > response for every Reject code; a mock returning the contracted shapes and
28
+ > contract tests pinning them. Mark FROZEN. No business logic. Never change a
29
+ > frozen contract — a change reopens Specify.
30
+
31
+ ## Exit gate
32
+
33
+ - [ ] Versioned and marked `FROZEN`.
34
+ - [ ] Contract tests pass against the mock.
35
+ - [ ] Every name matches the glossary.
36
+ - [ ] Every spec rejection has a contracted response.
37
+
38
+ ## Next
39
+
40
+ `python3 .add/tooling/add.py advance` → read `phases/4-tests.md`.
41
+ Book: `docs/05-step-3-contract.md`.
@@ -0,0 +1,37 @@
1
+ # Phase 4 — Tests (red safety net)
2
+
3
+ Goal: turn scenarios + contract into automated tests and confirm they FAIL before
4
+ any code exists. This operationalizes red/green TDD: red now, green only after
5
+ Build. Fill **§4 TESTS** and write the suite into `.add/tasks/<slug>/tests/`.
6
+
7
+ ## The must-fail principle
8
+
9
+ Run the suite now, with no implementation — it must be **red for the right
10
+ reason** (missing implementation, not a broken harness). A test that passes
11
+ before code exists is testing nothing and will wave bad code through later.
12
+
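One way to tell "red for the right reason" from a broken harness is to look at how each test fails. The Python sketch below assumes a convention that is not stated in the source: unimplemented code raises `NotImplementedError`. The `classify_red` function and its result labels are hypothetical.

```python
def classify_red(test_fn) -> str:
    """Run one test before any implementation exists and label the outcome."""
    try:
        test_fn()
    except NotImplementedError:
        return "red_missing_implementation"   # the right reason
    except Exception as exc:                   # import errors, bad fixtures, etc.
        return f"red_other:{type(exc).__name__}"
    return "green_too_early"                   # passes with no code: tests nothing
```

Anything other than `red_missing_implementation` deserves a look before advancing: `red_other` usually points at the harness, and `green_too_early` at a hollow test.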
13
+ ## Produce
14
+
15
+ - One executable test per scenario (§2), asserting **behavior, not internals**.
16
+ - Contract-conformance tests (shapes + error responses from §3).
17
+ - Side-effect assertions on rejection paths (`assert balance unchanged`).
18
+ - A recorded coverage target in §4.
19
+
20
+ ## AI prompt
21
+
22
+ > Role: a test author who writes tests before code. Read §2 and §3. Turn each
23
+ > scenario into an executable test; add contract-conformance and edge-case tests;
24
+ > run the suite and confirm it fails for the right reason. Record a coverage
25
+ > target. Do NOT implement the feature. Never assert on internals.
26
+
27
+ ## Exit gate
28
+
29
+ - [ ] One test per scenario.
30
+ - [ ] Suite runs and is **red for the right reason**.
31
+ - [ ] Tests assert observable behavior.
32
+ - [ ] Coverage target recorded.
33
+
34
+ ## Next
35
+
36
+ `python3 .add/tooling/add.py advance` → read `phases/5-build.md`.
37
+ Book: `docs/06-step-4-tests.md`.
@@ -0,0 +1,38 @@
1
+ # Phase 5 — Build (AI writes the code)
2
+
3
+ Goal: implement the feature so EVERY failing test passes — without changing any
4
+ test or the contract. This is the only phase the AI leads. It works because §1–§4
5
+ removed all ambiguity. Write code into `.add/tasks/<slug>/src/`.
6
+
7
+ ## Work in small batches
8
+
9
+ Pick ONE task-sized slice, restate the tests it must satisfy, implement, run
10
+ tests, iterate to green. Keep each batch small enough to review in full — you
11
+ cannot move faster than you can verify.
12
+
13
+ ## The cardinal rule
14
+
15
+ **Never weaken or delete a test to make it pass, and never edit the frozen
16
+ contract.** That makes the code judge itself. A genuine need to change either is a
17
+ change request back to Specify. Honor the feature-specific safety rule named in §5
18
+ (e.g. atomic balance update) — the one property tests alone may not force.
19
+
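The "no test and no contract modified" part of the cardinal rule can be checked mechanically by fingerprinting the protected files before and after the build. The Python sketch below is an assumption about how one might do that, not something `add.py` is shown to do; `fingerprint` and `changed_files` are hypothetical helpers that take file contents as an in-memory mapping.

```python
import hashlib


def fingerprint(files: dict[str, bytes]) -> dict[str, str]:
    """Map each path to the SHA-256 hex digest of its contents."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def changed_files(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Paths that were added, removed, or edited between two fingerprints."""
    paths = before.keys() | after.keys()
    return sorted(p for p in paths if before.get(p) != after.get(p))
```

Taking one fingerprint of the tests and `TASK.md` §3 when the contract is frozen, and another at the end of Build, gives the reviewer a concrete list instead of a promise.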
20
+ ## AI prompt
21
+
22
+ > Read §1, §3, §4, and CONVENTIONS. Make EVERY failing test pass, one small batch
23
+ > at a time. Constraints: do NOT change any test; do NOT change the contract; honor
24
+ > the §5 safety rule; use only allow-listed packages; stop and ask if unclear.
25
+ > Report which tests pass and exactly what changed.
26
+
27
+ ## Exit gate
28
+
29
+ - [ ] All tests pass.
30
+ - [ ] Coverage did not decrease.
31
+ - [ ] No test and no contract modified by the AI.
32
+ - [ ] No dependency outside the allow-list.
33
+ - [ ] Change small enough to review in full.
34
+
35
+ ## Next
36
+
37
+ `python3 .add/tooling/add.py advance` → read `phases/6-verify.md`.
38
+ Book: `docs/07-step-5-build.md`.
@@ -0,0 +1,39 @@
1
+ # Phase 6 — Verify (evidence + blind-spot checks)
2
+
3
+ Goal: establish trust and record an outcome. Passing tests are necessary, not
4
+ sufficient. This phase is **human-led** — there is no AI role. Fill **§6** in
5
+ TASK.md including the GATE RECORD.
6
+
7
+ ## Part one — confirm the evidence
8
+
9
+ - [ ] All tests pass.
10
+ - [ ] Coverage did not decrease.
11
+ - [ ] No test or contract was altered during build.
12
+
13
+ If any is false, stop and return to Build — there is nothing to verify yet.
14
+
15
+ ## Part two — check what tests miss
16
+
17
+ - **Concurrency/timing** — is it correct when two run at once? (Tests run serially
18
+ and miss races.) This is usually the single most important check.
19
+ - **Security** — exposed secrets, injection openings, unexpected/invented
20
+ dependencies. A security finding is always `HARD-STOP`, never a waiver.
21
+ - **Architecture** — does it respect layering/dependency rules in CONVENTIONS.md?
22
+
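For the concurrency check, a reviewer might run the risky operation from many threads at once and assert an invariant afterwards. The Python sketch below is a hypothetical example of that idea; the `Account` class and the `hammer` helper are illustrative, and the lock shows one way to make the check-and-update step atomic.

```python
import threading


class Account:
    def __init__(self, balance: int):
        self.balance = balance
        self._lock = threading.Lock()

    def withdraw(self, amount: int) -> bool:
        with self._lock:                  # check and update as one atomic step
            if amount > self.balance:
                return False
            self.balance -= amount
            return True


def hammer(account: Account, amount: int, workers: int) -> int:
    """Run `workers` concurrent withdrawals; return how many succeeded."""
    results = []

    def worker():
        results.append(account.withdraw(amount))

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results.count(True)
```

With a balance of 100 and 50 workers each withdrawing 10, exactly ten withdrawals should succeed and the balance should end at zero; without the lock, that invariant is no longer guaranteed.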
23
+ ## Record exactly one outcome (no silent pass)
24
+
25
+ | Outcome | When |
26
+ |---------|------|
27
+ | `PASS` | all checks met |
28
+ | `RISK-ACCEPTED` | a **non-security** gap, with signed owner + ticket + expiry |
29
+ | `HARD-STOP` | any failing test or any security finding |
30
+
31
+ ## Exit gate / Next
32
+
33
+ - [ ] Evidence confirmed, blind-spots checked, a person approved, outcome recorded.
34
+
35
+ ```bash
+ python3 .add/tooling/add.py gate PASS # marks the task done
+ # or: add.py gate RISK-ACCEPTED | add.py gate HARD-STOP (return to Build)
+ ```
39
+ Then read `phases/7-observe.md`. Book: `docs/08-step-6-verify.md`.
@@ -0,0 +1,32 @@
1
+ # Phase 7 — Observe (feed the next loop)
2
+
3
+ Goal: release deliberately, watch reality, and turn what you learn into the next
4
+ spec. Release is not the finish line — it is where the most reliable information
5
+ about the feature finally appears. Fill **§7** in TASK.md.
6
+
7
+ ## Do
8
+
9
+ 1. **Release behind a blast-radius limit** — feature flag and/or gradual rollout.
10
+ 2. **Reuse scenarios as monitors** — the §2 scenarios that defined "correct" now
11
+ define what you alert on: overall error rate, each rejection's rate (a spike in
12
+ one is a signal), latency of the risky operation under load.
13
+ 3. **Draft the next spec delta** — every defect, surprise, or new need becomes a
14
+ concrete change that re-enters the flow at Specify (a new task).
15
+
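"Reuse scenarios as monitors" can be sketched as a per-rejection rate check. The Python below is illustrative only: the event format (a list of outcome strings), the baseline rates, and the `factor` threshold are all assumptions, and a real setup would read these from the project's own telemetry.

```python
from collections import Counter


def rejection_rates(events: list[str]) -> dict[str, float]:
    """Share of all events taken by each rejection code ('ok' = success)."""
    total = len(events)
    if total == 0:
        return {}
    counts = Counter(e for e in events if e != "ok")
    return {code: n / total for code, n in counts.items()}


def spikes(rates: dict[str, float], baseline: dict[str, float],
           factor: float = 3.0) -> list[str]:
    """Codes whose rate exceeds `factor` times their baseline (unknown codes count as 0)."""
    return sorted(code for code, rate in rates.items()
                  if rate > factor * baseline.get(code, 0.0))
```

A code that appears in `spikes` is a candidate for the next spec delta; the sketch only recommends, in keeping with the rule that a human owns the production decision.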
16
+ ## AI prompt
17
+
18
+ > Role: a reliability analyst feeding the next cycle. Read telemetry, objectives,
19
+ > incidents. Report error-budget burn; cluster errors and surface the top
20
+ > real-world failures; draft a SPEC delta with evidence links. Never auto-roll-back
21
+ > — recommend; a human owns the production decision.
22
+
23
+ ## Exit gate
24
+
25
+ - [ ] Released behind a flag/rollout.
26
+ - [ ] Scenario-based monitors live.
27
+ - [ ] A reviewed spec delta captured (becomes the next `new-task`).
28
+
29
+ ## Next
30
+
31
+ Loop. The artifacts you built are living documents the next cycle refines.
32
+ Book: `docs/09-the-loop.md`.