@pilotspace/add 1.0.0
- package/GETTING-STARTED.md +238 -0
- package/LICENSE +20 -0
- package/README.md +106 -0
- package/bin/cli.js +131 -0
- package/docs/00-introduction.md +46 -0
- package/docs/01-principles.md +71 -0
- package/docs/02-the-flow.md +93 -0
- package/docs/03-step-1-specify.md +117 -0
- package/docs/04-step-2-scenarios.md +78 -0
- package/docs/05-step-3-contract.md +78 -0
- package/docs/06-step-4-tests.md +71 -0
- package/docs/07-step-5-build.md +80 -0
- package/docs/08-step-6-verify.md +63 -0
- package/docs/09-the-loop.md +43 -0
- package/docs/10-setup-and-stages.md +75 -0
- package/docs/11-governance.md +87 -0
- package/docs/12-roles.md +99 -0
- package/docs/13-adoption.md +67 -0
- package/docs/14-foundation.md +121 -0
- package/docs/README.md +70 -0
- package/docs/add-competencies.png +0 -0
- package/docs/add-flow.png +0 -0
- package/docs/add-foundation.png +0 -0
- package/docs/add-hierarchy.png +0 -0
- package/docs/appendix-a-templates.md +88 -0
- package/docs/appendix-b-prompts.md +119 -0
- package/docs/appendix-c-glossary.md +85 -0
- package/docs/appendix-d-worked-example.md +152 -0
- package/docs/appendix-e-checklists.md +80 -0
- package/docs/appendix-f-requirements-matrix.md +170 -0
- package/package.json +47 -0
- package/skill/add/SKILL.md +118 -0
- package/skill/add/deltas.md +69 -0
- package/skill/add/fold.md +66 -0
- package/skill/add/intake.md +49 -0
- package/skill/add/phases/0-setup.md +35 -0
- package/skill/add/phases/1-specify.md +55 -0
- package/skill/add/phases/2-scenarios.md +36 -0
- package/skill/add/phases/3-contract.md +41 -0
- package/skill/add/phases/4-tests.md +37 -0
- package/skill/add/phases/5-build.md +38 -0
- package/skill/add/phases/6-verify.md +39 -0
- package/skill/add/phases/7-observe.md +32 -0
- package/skill/add/run.md +152 -0
- package/skill/add/scope.md +58 -0
- package/tooling/add.py +1573 -0
- package/tooling/templates/CONVENTIONS.md.tmpl +8 -0
- package/tooling/templates/GLOSSARY.md.tmpl +3 -0
- package/tooling/templates/MILESTONE.md.tmpl +25 -0
- package/tooling/templates/MODEL_REGISTRY.md.tmpl +6 -0
- package/tooling/templates/PROJECT.md.tmpl +42 -0
- package/tooling/templates/TASK.md.tmpl +111 -0
- package/tooling/templates/dependencies.allowlist.tmpl +2 -0
@@ -0,0 +1,118 @@
---
name: add
description: >-
  ADD (AI-Driven Development) — a minimal, state-tracked workflow for building
  software where the AI writes the code and the human owns direction and
  verification. Drives every feature through one lean TASK.md: Specify →
  Scenarios → Contract → Tests → Build → Verify → Observe, with red/green TDD
  built in. Use this skill whenever working in a repo that has a `.add/`
  directory, when the user says "add", "start a task", "next phase", "specify
  this feature", "ADD method", or "AI-driven development", or when scaffolding a
  new feature and you want spec/tests-first discipline instead of vague-prompt
  coding. Also use it to resume work across sessions (it reads `.add/state.json`
  so you never re-read the whole repo).
---

# ADD — the orchestration engine

You are the orchestrator. ADD keeps the AI fast *and* safe by fixing direction
(spec, scenarios, contract, failing tests) **before** the build, and trusting
the result through passing evidence rather than a plausible-looking diff.

**One file = one task.** Each feature lives in a single `.add/tasks/<slug>/TASK.md`
with seven sections. You fill them top to bottom; the Python tool tracks where
you are so context never rots across sessions.

## Always start here (orient — do not skip)

Run the tool to find the resume point instead of re-reading the repo:

```bash
python3 .add/tooling/add.py status
```

- **No `.add/` yet** → go to **phase 0 (setup)**: read `phases/0-setup.md`.
- **A task is active** → open `.add/tasks/<active>/TASK.md`, look at its `phase:`
  marker, and read the matching `phases/<n>-<phase>.md`. Work *only* that phase.
- **No active task** → first SIZE the request (see Intake below), then create the
  right scope: `python3 .add/tooling/add.py new-task <slug> --title "..."`.

## Intake — size a request before creating scope

When the user brings a raw request, classify it BEFORE making a milestone or task:
read `intake.md` and place it in exactly one bucket — `new-major` · `sub-milestone`
· `task` · `change-request` — then propose `{ bucket, rationale, command }` and let
the human confirm. This is the intake altitude (request → versioned scope); see
`intake.md` for the rubric, the tie-break order, and worked examples.

Once a request is classified `new-major`/`sub-milestone`, drafting the actual
`MILESTONE.md` (goal · scope · exit criteria · breadth-first tasks) is the second
half of intake: read `scope.md` for how to fill it well, the per-outcome behavior,
and the confirm-before-create rule. You propose the draft; the human confirms.

## The flow and which file to load

Load the phase guide **only for the phase you are in** (progressive disclosure):

| Phase | Guide | Produces (TASK.md section) | Who leads |
|-------|-------|----------------------------|-----------|
| setup | `phases/0-setup.md` | `.add/` + survivor files | human |
| specify | `phases/1-specify.md` | §1 rules + ranked least-sure flag | human + AI (co-specify) |
| scenarios | `phases/2-scenarios.md` | §2 Given/When/Then | human |
| contract | `phases/3-contract.md` | §3 frozen shape | human + AI |
| tests | `phases/4-tests.md` | §4 + red suite in `tests/` | human sets, AI writes |
| build | `phases/5-build.md` | code in `src/`, tests green | **AI** |
| verify | `phases/6-verify.md` | §6 checks + gate record | **human** |
| observe | `phases/7-observe.md` | §7 spec delta | human + AI |
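
A minimal sketch of the lookup the table implies: one `phase:` marker maps to exactly one guide file. The helper name `guide_for` is hypothetical and is not part of `add.py`; only the phase names and paths are taken from the table.

```python
# Hypothetical helper: maps a TASK.md phase marker to its guide file.
# Phase names and their order come from the table; this is not add.py's API.
PHASES = ["setup", "specify", "scenarios", "contract", "tests", "build", "verify", "observe"]


def guide_for(phase: str) -> str:
    """Return the single guide to load for the given phase."""
    if phase not in PHASES:
        raise ValueError(f"unknown phase: {phase!r}")
    return f"phases/{PHASES.index(phase)}-{phase}.md"


print(guide_for("contract"))  # phases/3-contract.md
```

Deriving the path from the position in the list keeps the numbering and the name in one place, so a guide can never be loaded for a phase that does not exist.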

In **observe**, also emit **competency deltas** — learnings tagged by which of the five
(`DDD · SDD · UDD · TDD · ADD`) they improve — so the foundation self-improves across loops.
You write them as `open`; the human folds them into `PROJECT.md`. Read `deltas.md` for the
grammar and the status lifecycle. At milestone close (or on demand), run the fold ritual that
gathers confirmed deltas into a versioned foundation — read `fold.md`.

## The dynamic run (v6)

Once **§3 CONTRACT is FROZEN**, the build→verify half MAY run as a dynamic, auto-gated run —
fan-out + in-run convergence — instead of a manual build. Read `run.md` for the trigger, the
touch-boundary, the evidence auto-gate, and the autonomy dial. The human-led front
(specify·scenarios·contract) is unchanged; the run never edits a frozen contract and never
auto-passes a security finding.

## Non-negotiable rules (from the method)

1. **Direction before speed.** Never start Build until §1–§4 exist and tests are red.
2. **Trust evidence, not inspection.** A feature is trusted because its tests pass
   and the blind-spots (concurrency, security, architecture) were checked — not
   because the code reads plausibly.
3. **Never weaken a test or edit a frozen contract to make the build pass.** That
   inverts the method. A real change is a *change request* back to Specify.
4. **No silent skips.** Every Verify ends in exactly one recorded outcome:
   `PASS`, `RISK-ACCEPTED` (signed, non-security only), or `HARD-STOP`. A security
   finding is always `HARD-STOP`.
5. **Ask, don't guess.** If a requirement is unclear, stop and ask the user.
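
Rule 4 can be read as a small decision check. The sketch below is an illustration only: `check_gate`, `security_finding`, and `signed_by` are hypothetical names, and how `add.py gate` actually validates an outcome is not shown here.

```python
from typing import Optional

# The three outcomes named in rule 4.
OUTCOMES = {"PASS", "RISK-ACCEPTED", "HARD-STOP"}


def check_gate(outcome: str, security_finding: bool, signed_by: Optional[str] = None) -> str:
    """Return the outcome if rule 4 allows recording it, else raise ValueError."""
    if outcome not in OUTCOMES:
        raise ValueError(f"not a recordable outcome: {outcome!r}")
    if security_finding and outcome != "HARD-STOP":
        # A security finding can be neither passed nor risk-accepted.
        raise ValueError("a security finding is always HARD-STOP")
    if outcome == "RISK-ACCEPTED" and not signed_by:
        raise ValueError("RISK-ACCEPTED must be signed")
    return outcome
```

The security check runs before the signature check on purpose: a signed `RISK-ACCEPTED` is still refused when the finding is a security one.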

## Advancing

After a phase's exit gate is met, advance the state (this also syncs the marker
inside TASK.md):

```bash
python3 .add/tooling/add.py advance     # next phase of the active task
python3 .add/tooling/add.py gate PASS   # at verify: records PASS, marks done
```

## Depth by stage

The steps never change; their depth does. Read the stage from `add.py status`:

- **prototype** — run light; code is throwaway; design/experience is the point.
- **poc** — run contract/tests/build deeply on the single riskiest slice only.
- **mvp** — full flow, narrow scope, light observation.
- **production** — every step at full rigor + the observe loop.

## The trust layer

The full method (the *why* behind every rule) is the AIDD book in `.add/docs/`.
When a phase decision is genuinely unclear, read the linked chapter — each phase
guide points to its chapter. Do not duplicate the book here; load it on demand.
@@ -0,0 +1,69 @@
# Competency deltas — how each loop sharpens the foundation

A **competency delta** is a single learning a task produces, tagged by which of ADD's five
competencies it improves. You write deltas in a task's **OBSERVE** phase; later, the
`foundation-update-loop` gathers the confirmed ones and folds them into a versioned `PROJECT.md`.
This is how `DDD · SDD · UDD · TDD · ADD` stop being write-once and start converging.

You (the AI) **emit** deltas as `open`. Only the **human** moves a delta to `folded` or `rejected`
(folding into the foundation is judgment — see the verify/observe seam). You never self-fold.

## The grammar (frozen)

Each delta is ONE line, exactly:

```
- [<COMPETENCY> · <status>] <learning> (evidence: <pointer>)
```

- `<COMPETENCY>` — exactly one of the five (below).
- `<status>` — `open` | `folded` | `rejected`. A **newly emitted delta is `open`**.
- `<learning>` — the insight, in one phrase ("the domain model missed multi-tenancy").
- `(evidence: …)` — **required**, non-empty: a failing scenario, a production signal, a review
  note. No evidence → it is an opinion, not a delta.

## The five competencies (pick exactly one per delta)

| tag | competency | a delta here means you learned something about… |
|-----|------------|--------------------------------------------------|
| `DDD` | Domain | the domain model — an entity, rule, or boundary the spec assumed wrong |
| `SDD` | Spec | what the feature must do / must reject — a missing or wrong requirement |
| `UDD` | UI/UX | the user-facing shape — a flow, affordance, or wording that misled |
| `TDD` | Test | how we prove correctness — a missing scenario, a flaky or hollow test |
| `ADD` | AI/build | how the AI builds — a harness, prompt, or convention that helped or hurt |

If a learning seems to touch two, ask "which competency, once updated, would have PREVENTED this?"
That is its home. Split genuinely separate learnings into separate deltas; never tag one twice.

## Status lifecycle

```
      emit (OBSERVE)    human review (foundation-update-loop)
open ───────────▶ folded    (the learning is merged into PROJECT.md; version bumps)
     └──────────▶ rejected  (considered and deliberately NOT folded — the trail is kept)
```

An `open` delta is a pending signal. `folded` and `rejected` are both human decisions; a `rejected`
delta is left in place (not deleted) so "we saw this and chose not to act" stays auditable.

## Reject codes (well-formedness — you are the first check, the human is the backstop)

There is no engine validator yet, so before you record a delta, self-check it:

- `unknown_competency` — the tag is missing or not one of `DDD · SDD · UDD · TDD · ADD`. Fix the tag.
- `no_evidence` — the `(evidence: …)` pointer is missing or empty. Add the proof, or drop the line.
- `unknown_status` — the status is not `open | folded | rejected`. A fresh delta is `open`.
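
Since there is no engine validator yet, the self-check can be sketched as a small function over the frozen grammar. This is an illustration under assumptions, not a shipped tool: the function name `reject_code`, the regular expressions, and the order in which the codes are tested are all choices made for this example.

```python
import re
from typing import Optional

COMPETENCIES = {"DDD", "SDD", "UDD", "TDD", "ADD"}
STATUSES = {"open", "folded", "rejected"}

# Grammar: - [<COMPETENCY> · <status>] <learning> (evidence: <pointer>)
LINE = re.compile(r"^- \[(?P<head>[^\]]*)\]\s*(?P<rest>.*)$")
EVIDENCE = re.compile(r"\(evidence:\s*(?P<pointer>.*)\)\s*$")


def reject_code(line: str) -> Optional[str]:
    """Return the first reject code that applies, or None if the line is well-formed."""
    match = LINE.match(line.strip())
    head = match["head"] if match else ""
    tag, _, status = head.partition("·")
    if tag.strip() not in COMPETENCIES:
        return "unknown_competency"
    if status.strip() not in STATUSES:
        return "unknown_status"
    evidence = EVIDENCE.search(match["rest"])
    if evidence is None or not evidence["pointer"].strip():
        return "no_evidence"
    return None


print(reject_code("- [TDD · open] no scenario covered dangling sessions"))  # no_evidence
```

A `None` result only means the line parses. Whether the learning is worth folding remains a human decision.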

## Worked example

A task that built a tenancy feature finished its OBSERVE phase with:

```
- [DDD · open] the account model conflated org and workspace (evidence: scenario_cross_tenant_read failed)
- [TDD · open] no scenario covered a deleted tenant's dangling sessions (evidence: review note, PR thread)
- [ADD · open] the scaffold's allow-list missed the tenancy lib, slowing build (evidence: build log retry)
```

Three learnings, three competencies, each with a pointer. At the next foundation update the human
folded the DDD and TDD deltas into `PROJECT.md` (→ `folded`) and rejected the ADD one as a one-off
(→ `rejected`). The foundation got sharper; nothing was silently lost.
@@ -0,0 +1,66 @@
# Folding deltas — how the foundation self-improves

This **closes the loop**. `deltas.md` lets a task EMIT learnings (`open` competency deltas in its
OBSERVE phase); folding gathers the confirmed ones and writes them into a **versioned foundation**,
so `DDD · SDD · UDD · TDD · ADD` sharpen across milestones instead of drifting.

You (the AI) **gather and propose**; the **human confirms**; you then write the **append-only** fold.
You never self-fold — folding is judgment (see the verify/observe seam).

## When to fold

At **milestone close** (the natural "version bump to the foundation"), or **on demand** when open
deltas have piled up. This is a convention, not a command — there is no `add.py fold`; the ritual
lives here so the engine stays judgment-free.

## The ritual

1. **Gather** — scan every task's OBSERVE `### Competency deltas` block for lines still `open`.
2. **Group** — bucket them by competency (`DDD · SDD · UDD · TDD · ADD`).
3. **Propose** — for each, draft the exact foundation edit (see routing) and show the human.
4. **Confirm** — the human accepts or declines each delta. No write happens without this.
5. **Write** — append the accepted edits, flip each delta's status, and bump the version.

## Fold routing (every competency has a home)

| competency | folds into | how |
|------------|-----------|-----|
| `DDD` | `PROJECT.md` §Domain (DDD) | refine/append a model bullet |
| `SDD` | `PROJECT.md` §Spec / Living Document (SDD) | refine/append a settled-vs-open line |
| `UDD` | `PROJECT.md` §Users (UDD) | refine/append a UX line |
| `TDD` | `CONVENTIONS.md` | append a testing convention (no PROJECT.md section — it is the engine) |
| `ADD` | `CONVENTIONS.md` | append a build/harness convention (likewise the engine) |

**Every** fold — whatever the competency — ALSO appends one row to `PROJECT.md` **§Key Decisions**
(date · decision · why · outcome): the universal, auditable trail of what the foundation learned.

## Status transitions & version

- on **confirm**: the delta moves `open` → `folded` (and its edit is appended to the routed target).
- on **decline**: the delta moves `open` → `rejected` and is **left in place** — never deleted —
  so "we considered this and chose not to act" stays auditable.
- a fold is **append-only**: it adds bullets/rows; it never silently rewrites existing foundation text.
- each fold session **bumps** the `foundation-version:` marker in `PROJECT.md` by one (monotonic int).

## Reject codes (the AI is first check, the human the backstop)

- `no_open_deltas` — nothing is `open` anywhere. The ritual is a no-op; do **not** bump the version.
- `unconfirmed_fold` — a write was attempted without recorded human confirmation. The AI proposes;
  it never self-folds. Stop and get confirmation.
- `unroutable_delta` — a delta's competency is not one of the five, so it has no fold target. Fix the
  delta (it is malformed per `deltas.md`) before folding.
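
The gather, group, and route steps, together with the three reject codes, can be sketched as one planning function. This is a simplified illustration: `plan_fold` is a hypothetical name, deltas are reduced to `(competency, learning)` pairs, and confirmation is modelled as a single flag for the whole session, whereas the ritual has the human accept or decline each delta. There is no `add.py fold`, so nothing here mirrors a real command.

```python
from collections import defaultdict

# Routing targets copied from the fold routing table.
ROUTES = {
    "DDD": "PROJECT.md §Domain (DDD)",
    "SDD": "PROJECT.md §Spec / Living Document (SDD)",
    "UDD": "PROJECT.md §Users (UDD)",
    "TDD": "CONVENTIONS.md",
    "ADD": "CONVENTIONS.md",
}


def plan_fold(open_deltas, confirmed):
    """Group open deltas by routed target; never plan a write without confirmation.

    open_deltas: list of (competency, learning) pairs still marked `open`.
    confirmed:   True only once the human has accepted the proposal.
    Returns (code, plan), where code is "ok" or one of the reject codes.
    """
    if not open_deltas:
        return "no_open_deltas", {}
    plan = defaultdict(list)
    for competency, learning in open_deltas:
        if competency not in ROUTES:
            return "unroutable_delta", {}
        plan[ROUTES[competency]].append(learning)
    if not confirmed:
        # The plan is still returned so it can be shown to the human as the proposal.
        return "unconfirmed_fold", dict(plan)
    return "ok", dict(plan)


def next_version(current, code):
    """Bump the foundation version by one only when a fold was actually written."""
    return current + 1 if code == "ok" else current
```

For example, `plan_fold([("TDD", "a"), ("ADD", "b")], confirmed=True)` groups both learnings under `CONVENTIONS.md`, and `next_version(1, "ok")` yields `2`.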

## Worked example (from this repo's own history)

The `competency-deltas` task closed its OBSERVE with two deltas — the homeless ones, `TDD`/`ADD`,
which have no PROJECT.md section:

```
- [ADD · open] dogfood .add/tooling template can silently diverge from canonical (evidence: md5 mismatch this build)
- [TDD · open] structural tests guard canonical artifacts but not their dogfood twins (evidence: scope-loop note + this build)
```

At the next fold the human confirms both. Routing sends each to `CONVENTIONS.md` (a "sync the dogfood
tree + assert md5 parity" convention), appends a §Key Decisions row for each, flips them to `folded`,
and bumps `foundation-version` 1 → 2. The two competencies the foundation never tracked before now
have a home — which is exactly why v5 routes TDD/ADD to `CONVENTIONS.md`.
@@ -0,0 +1,49 @@
# Intake — size a request into versioned scope

Before a task exists, ADD turns a raw request into correctly-sized, versioned scope.
This is the **intake altitude**: the per-task flow is phases 0–7; intake is the step
*before* a task — request → milestone or task. You (the AI) **propose**; the human
**confirms**. Never create scope without a confirmed proposal.

## The four buckets

Classify every request into exactly ONE bucket:

| Bucket | Decision test | Implied command |
|--------|---------------|-----------------|
| `new-major` | a new product theme/pillar no active milestone's goal covers | `add.py new-milestone vN` |
| `sub-milestone` | a slice of an EXISTING major theme, too big for one task | `add.py new-milestone vN-M` |
| `task` | fits within the ACTIVE milestone's stated scope | `add.py new-task <slug>` |
| `change-request` | modifies ALREADY-FROZEN scope (a frozen contract or a shipped promise) | `add.py phase specify\|contract <affected>` |

**Tie-break order: the frozen-scope test runs FIRST, before the size test.**
First ask "does this change already-frozen scope?" → if yes, it is a `change-request`
(never re-size frozen work as new scope). Only if no, apply the size test: a new theme
→ `new-major`; a slice of a live theme → `sub-milestone`; fits the active milestone
→ `task`.
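
The tie-break order can be written down as a short decision function. This is a sketch under assumptions: the three boolean inputs stand for judgments the AI and the human make about the request, and the name `classify` and its parameters are hypothetical, not part of `add.py`.

```python
def classify(changes_frozen_scope: bool, new_theme: bool, fits_active_milestone: bool) -> str:
    """Apply the frozen-scope test first, then the size test."""
    if changes_frozen_scope:
        # Frozen work is never re-sized as new scope.
        return "change-request"
    if new_theme:
        return "new-major"
    if fits_active_milestone:
        return "task"
    # An existing theme, but too big for one task.
    return "sub-milestone"


print(classify(changes_frozen_scope=True, new_theme=True, fits_active_milestone=False))
# change-request
```

Because the frozen-scope check is the first branch, a request that both touches a frozen contract and looks like a new theme still comes back as `change-request`.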

## What you emit (the proposal)

For every request, emit ONE of:

- **a classification** — `{ bucket, rationale, command }` — where `rationale` names WHY
  (the theme, the slice, the fit, or the frozen scope touched) and `command` is the exact
  `add.py …` from the table. The human confirms or overrides before you run it.
- **a rejection** — `{ reject, rationale }` — and you create nothing:
  - `ask_human` — too ambiguous/underspecified to size. Ask the human; never guess a bucket.
  - `frozen_scope` — it changes frozen scope; route it as a `change-request` back to
    SPECIFY/CONTRACT of the affected task — never spawn a parallel milestone that forks the truth.
  - `split_required` — it spans more than one bucket; propose the SMALLEST set of correctly-sized
    items, each with its own rationale; never force it into one milestone.

When confirmed, record the `rationale` in the artifact you create or affect — the new
MILESTONE.md goal/body, the new TASK.md, or a note in the affected TASK.md — never in state.json.

## Worked examples (from this project's own history)

| request | bucket | rationale |
|---------|--------|-----------|
| give ADD a hosted web dashboard | new-major | a new product theme no active milestone's goal covers → a fresh major line (v5) |
| add the build corridor + tests-red-before-build | sub-milestone | a slice of the live v4 "self-driving" theme, too big for one task → v4-2 |
| expose owner/stop as --json | task | fits the active v4-1 (intake interface) scope → one task |
| guide --json phase/gate should be nullable | change-request | changes the FROZEN machine-state-json contract → reopen its CONTRACT, do not make a new milestone |
@@ -0,0 +1,35 @@
# Phase 0 — Setup (once per project)

Goal: make every later gate enforceable automatically. Do this once.

## Do

1. Initialise the runtime (creates `.add/` + survivor-layer files):
   ```bash
   python3 .add/tooling/add.py init --name "<project>" --stage prototype
   ```
   If the tool isn't there yet, run the installer (`npx @pilotspace/add init`), which places
   it at `.add/tooling/add.py`.
2. Fill the survivor-layer files (they outlive all code):
   - `.add/PROJECT.md` — **the foundation**: Domain (DDD) · Spec/Living-Document (SDD,
     → active milestone) · UI/UX (UDD) · Key Decisions. Cross-milestone context the
     engine reads first. Keep it to one screen. Book: `docs/14-foundation.md`.
   - `.add/CONVENTIONS.md` — language, folders, naming, lint, error-code style, architecture.
   - `.add/GLOSSARY.md` — one name per concept; used in specs, contracts, and code.
   - `.add/MODEL_REGISTRY.md` — which AI model/version writes this project.
   - `.add/dependencies.allowlist` — packages the AI may use; CI rejects others.
3. Confirm CI runs green on the empty skeleton before the first feature.

## Exit gate

- [ ] `.add/state.json` exists (`add.py status` works).
- [ ] `.add/PROJECT.md` foundation filled (domain · spec · UI/UX).
- [ ] CONVENTIONS, GLOSSARY, MODEL_REGISTRY, allowlist filled.
- [ ] Pipeline green on the skeleton.

## Next

```bash
python3 .add/tooling/add.py new-task <slug> --title "<feature>"
```
Then read `phases/1-specify.md`. · Book: `docs/10-setup-and-stages.md`.
@@ -0,0 +1,55 @@
# Phase 1 — Specify (the rules)

Goal: state what the feature MUST do and what it must REJECT, with zero ambiguity
for the AI to resolve by guessing. Fill **§1 SPECIFY** in TASK.md.

Specify is **co-specification**: brainstorm the shape WITH the user, draft it, then let
the user validate with your advice. If you cannot write the spec, you do not yet
understand the feature — that is information, not an obstacle. Stop and ask.

## Co-specify in three moves

1. **Diverge** — before drafting, surface the decision space: the 2–3 genuine framings of the
   feature + the open questions you would otherwise guess. Invite the user to add, kill,
   redirect. (Conversational — no new file. At prototype/poc this collapses to one sentence.)
2. **Converge** — draft §1, then RANK what you are least sure about (below).
3. **Validate** — present the ranked uncertainty first; the user confirms, corrects, or sends back.

## Produce (in TASK.md §1)

- **Framings weighed** — a one-line trace of what you considered: `X (chosen) · Y · Z`.
- **Must** — each required behavior.
- **Reject** — each refused input/situation, paired with a **named error code**
  (`amount <= 0 -> "amount_invalid"`, never "handle bad input").
- **After** — the state that is true once it succeeds.
- **Assumptions — least-sure first** — ranked most-likely-wrong → least. The top 1–2 carry a
  `⚠` flag: `⚠ <assumption> — least sure because <why>; if wrong: <cost>`. The rest are the
  low-stakes `[x]` tail. Never a flat wall of equal `[x]` ticks — that is what gets rubber-stamped.
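
To make the Reject convention concrete, here is a minimal sketch in which every refusal is a condition paired with a named code. Only `amount_invalid` comes from the guide; `amount_too_large`, its limit of 10 000, and the names `REJECTS` and `first_reject` are invented for illustration.

```python
from typing import Callable, List, Optional, Tuple

# Each Reject rule pairs a condition with a named error code.
# "amount_invalid" is the guide's example; the second rule is made up.
REJECTS: List[Tuple[Callable[[float], bool], str]] = [
    (lambda amount: amount <= 0, "amount_invalid"),
    (lambda amount: amount > 10_000, "amount_too_large"),
]


def first_reject(amount: float) -> Optional[str]:
    """Return the named error code of the first rule that refuses the input."""
    for refuses, code in REJECTS:
        if refuses(amount):
            return code
    return None  # no Reject rule applies; the Must path continues


print(first_reject(0))  # amount_invalid
```

Writing rejections this way leaves no room for "handle bad input": each refusal has a testable condition and a code that the scenarios, the contract, and the tests can all name.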

## The least-sure flag is bundle-wide

The single human approval happens once, at the contract freeze, over the whole bundle. So your
§1 ranking is the FIRST FEEDER into a bundle-level flag the user reads at the seam (`run.md`):
*"of everything I'm asking you to freeze, these 1–2 are most likely wrong."* A flag may point at
a §1 assumption, an uncovered scenario, or the contract shape.

## AI prompt

> Role: a domain analyst who brainstorms, then asks rather than assumes. Read CONVENTIONS,
> GLOSSARY, and the user's raw input. First surface 2–3 framings + open questions and let me
> react. Then produce §1: Framings weighed, every Must, every Reject with a named error code,
> the After state, and the Assumptions RANKED least-sure first — flag the 1–2 you are least
> sure about with why + cost. Never resolve an ambiguity by guessing.

## Exit gate

- [ ] Framings weighed noted; every required behavior stated.
- [ ] Every rejection has a named error code; success state-change described.
- [ ] Assumptions ordered least-sure first; the 1–2 `⚠` flags carry why + cost — or an honest
  "none material" that still names the single biggest risk (never a blank "none").

## Next

`python3 .add/tooling/add.py advance` → read `phases/2-scenarios.md`.
Book: `docs/03-step-1-specify.md`. (UI feature? also sketch flows + every screen
state: loading/empty/error/success.)
@@ -0,0 +1,36 @@
# Phase 2 — Scenarios (pass/fail cases)

Goal: rewrite each rule as a concrete Given/When/Then that is readable by people
and checkable by machines. This is the highest-leverage artifact — the tests are
generated from it. Fill **§2 SCENARIOS** in TASK.md.

## Produce (in TASK.md §2)

```gherkin
Scenario: <short name>
  Given <starting situation>
  When  <action>
  Then  <observable result>
  And   <what must remain unchanged>   # REQUIRED for every rejection
```

The `And ... unchanged` clause catches corrupting partial failures (e.g. a balance
deducted before a check fails). Never omit it on a rejection.
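
A rejection scenario and its unchanged-state clause might translate into a test like the sketch below. `Account`, `withdraw`, and the `insufficient_funds` code are hypothetical and exist only for this illustration; `amount_invalid` is the error code used as an example in the Specify guide.

```python
class Account:
    """Hypothetical domain object used only for this illustration."""

    def __init__(self, balance: int) -> None:
        self.balance = balance


def withdraw(account: Account, amount: int) -> str:
    # Validate before mutating, so a rejection cannot leave a partial update.
    if amount <= 0:
        return "amount_invalid"
    if amount > account.balance:
        return "insufficient_funds"
    account.balance -= amount
    return "ok"


def test_rejects_overdraft_and_leaves_balance_unchanged() -> None:
    account = Account(balance=100)         # Given an account holding 100
    result = withdraw(account, 150)        # When 150 is withdrawn
    assert result == "insufficient_funds"  # Then the named error is returned
    assert account.balance == 100          # And the balance is unchanged


test_rejects_overdraft_and_leaves_balance_unchanged()
```

If `withdraw` deducted first and checked afterwards, the first assertion could still pass while the last one failed, which is the partial-failure bug the extra clause exists to catch.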

## AI prompt

> Role: a specification tester. Read §1 and GLOSSARY. Write one scenario per Must
> and per Reject rule. For every rejection add an And-clause asserting what must NOT
> change. Results must be specific and observable — never "then it works".

## Exit gate

- [ ] One scenario per Must rule.
- [ ] One scenario per Reject rule.
- [ ] Each result is a specific, observable fact.
- [ ] Every rejection asserts what stays unchanged.

## Next

`python3 .add/tooling/add.py advance` → read `phases/3-contract.md`.
Book: `docs/04-step-2-scenarios.md`.
@@ -0,0 +1,41 @@
# Phase 3 — Contract (freeze the shape)

Goal: fix the external shape — interfaces, data, names, error cases — and FREEZE
it. This is the seam that makes the AI-led build safe: below it code is
disposable; above it nothing breaks because the shape does not move. Fill
**§3 CONTRACT** in TASK.md.

## Produce (in TASK.md §3)

- Interfaces (endpoints/functions/messages) with inputs/outputs.
- Request/response shapes + persistent schema (note transactional needs).
- Names drawn from `GLOSSARY.md` (same concept = same name everywhere).
- A response for **every** Reject error code from §1.

Then mark `Status: FROZEN @ v1`. Generate a mock + contract tests so dependent
work can start before the real code exists.
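
One possible shape for a mock and a contract test is sketched below. All names are assumptions made for illustration: `TransferResponse`, `mock_transfer`, and the `ERROR_CODES` set are not taken from the package, and the only error code used is the `amount_invalid` example from the Specify guide.

```python
from dataclasses import dataclass

CONTRACT_STATUS = "FROZEN @ v1"
# Every Reject code from §1 must have a contracted response.
ERROR_CODES = {"amount_invalid"}


@dataclass(frozen=True)
class TransferResponse:
    """Contracted response shape (hypothetical)."""

    ok: bool
    error: str = ""


def mock_transfer(amount: int) -> TransferResponse:
    """Mock: returns the contracted shapes only, with no real business logic."""
    if amount <= 0:
        return TransferResponse(ok=False, error="amount_invalid")
    return TransferResponse(ok=True)


def test_contract_shapes() -> None:
    rejected = mock_transfer(0)
    assert rejected == TransferResponse(ok=False, error="amount_invalid")
    assert rejected.error in ERROR_CODES
    assert mock_transfer(5) == TransferResponse(ok=True)


test_contract_shapes()
```

Dependent work can code against `mock_transfer` today. When the real implementation arrives in Build, the same contract test is pointed at it unchanged.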

**The freeze is the one approval.** This seam is where the single human approval lands, over the
whole bundle (§1–§4). Before asking for it, present the bundle **least-sure first**: the 1–2 points
most likely wrong (`⚠ [spec|scenario|contract|test] … — because …; if wrong: …`) — aim the human's
eye before they freeze. See `run.md`.

## AI prompt

> Role: an interface architect; frozen contracts are immutable. Read §1, §2,
> GLOSSARY. Produce §3: interfaces, shapes, schema named from the glossary; a
> response for every Reject code; a mock returning the contracted shapes and
> contract tests pinning them. Mark FROZEN. No business logic. Never change a
> frozen contract — a change reopens Specify.

## Exit gate

- [ ] Versioned and marked `FROZEN`.
- [ ] Contract tests pass against the mock.
- [ ] Every name matches the glossary.
- [ ] Every spec rejection has a contracted response.

## Next

`python3 .add/tooling/add.py advance` → read `phases/4-tests.md`.
Book: `docs/05-step-3-contract.md`.
@@ -0,0 +1,37 @@
# Phase 4 — Tests (red safety net)

Goal: turn scenarios + contract into automated tests and confirm they FAIL before
any code exists. This operationalizes red/green TDD: red now, green only after
Build. Fill **§4 TESTS** and write the suite into `.add/tasks/<slug>/tests/`.

## The must-fail principle

Run the suite now, with no implementation — it must be **red for the right
reason** (missing implementation, not a broken harness). A test that passes
before code exists is testing nothing and will wave bad code through later.

## Produce

- One executable test per scenario (§2), asserting **behavior, not internals**.
- Contract-conformance tests (shapes + error responses from §3).
- Side-effect assertions on rejection paths (`assert balance unchanged`).
- A recorded coverage target in §4.
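
The side-effect assertion in the list above might look like the sketch below. The `Account` class and `InsufficientFunds` exception are hypothetical stand-ins, included only so the example runs; in Phase 4 no implementation would exist and this test would be red.

```python
class InsufficientFunds(Exception):
    """Hypothetical rejection raised when a withdrawal exceeds the balance."""


class Account:
    """Stand-in implementation so the sketch runs; not part of the suite."""

    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        if amount > self.balance:
            raise InsufficientFunds()
        self.balance -= amount


def test_overdraft_is_rejected_and_balance_unchanged():
    account = Account(balance=100)
    before = account.balance

    try:
        account.withdraw(500)
    except InsufficientFunds:
        rejected = True
    else:
        rejected = False

    assert rejected                    # observable behavior: the request is refused
    assert account.balance == before   # side effect: nothing was debited


test_overdraft_is_rejected_and_balance_unchanged()
```

The test checks only what a caller can observe (the rejection and the resulting balance), not how `withdraw` is implemented.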

## AI prompt

> Role: a test author who writes tests before code. Read §2 and §3. Turn each
> scenario into an executable test; add contract-conformance and edge-case tests;
> run the suite and confirm it fails for the right reason. Record a coverage
> target. Do NOT implement the feature. Never assert on internals.

## Exit gate

- [ ] One test per scenario.
- [ ] Suite runs and is **red for the right reason**.
- [ ] Tests assert observable behavior.
- [ ] Coverage target recorded.

## Next

`python3 .add/tooling/add.py advance` → read `phases/5-build.md`.
Book: `docs/06-step-4-tests.md`.
@@ -0,0 +1,38 @@
# Phase 5 — Build (AI writes the code)

Goal: implement the feature so EVERY failing test passes — without changing any
test or the contract. This is the only phase the AI leads. It works because §1–§4
removed all ambiguity. Write code into `.add/tasks/<slug>/src/`.

## Work in small batches

Pick ONE task-sized slice, restate the tests it must satisfy, implement, run
tests, iterate to green. Keep each batch small enough to review in full — you
cannot move faster than you can verify.

## The cardinal rule

**Never weaken or delete a test to make it pass, and never edit the frozen
contract.** Doing either lets the code judge itself. A genuine need to change either
is a change request back to Specify. Honor the feature-specific safety rule named in §5
(e.g. atomic balance update) — the one property tests alone may not force.
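
A reviewer could check the cardinal rule mechanically by fingerprinting the tests and the contract when Build starts and comparing afterwards. The sketch below is one possible approach and is not a description of what `add.py` does; the file names and helper functions are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def fingerprint(paths):
    """Map each file path to the SHA-256 digest of its bytes."""
    return {str(p): hashlib.sha256(Path(p).read_bytes()).hexdigest() for p in paths}


def changed_files(before, after):
    """Paths that were added, removed, or modified between two fingerprints."""
    return sorted(p for p in before.keys() | after.keys()
                  if before.get(p) != after.get(p))


with tempfile.TemporaryDirectory() as tmp:
    test_file = Path(tmp) / "test_withdraw.py"
    contract_file = Path(tmp) / "contract.md"
    test_file.write_text("assert withdraw(100, 500) == 'INSUFFICIENT_FUNDS'\n")
    contract_file.write_text("FROZEN v1\n")
    guarded = [test_file, contract_file]

    before = fingerprint(guarded)           # taken when Build starts
    test_file.write_text("assert True\n")   # a weakened test
    tampered = changed_files(before, fingerprint(guarded))
```

Here `tampered` lists only the rewritten test file. An empty list is the evidence the exit gate asks for.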

## AI prompt

> Read §1, §3, §4, and CONVENTIONS. Make EVERY failing test pass, one small batch
> at a time. Constraints: do NOT change any test; do NOT change the contract; honor
> the §5 safety rule; use only allow-listed packages; stop and ask if unclear.
> Report which tests pass and exactly what changed.

## Exit gate

- [ ] All tests pass.
- [ ] Coverage did not decrease.
- [ ] No test and no contract modified by the AI.
- [ ] No dependency outside the allow-list.
- [ ] Change small enough to review in full.

## Next

`python3 .add/tooling/add.py advance` → read `phases/6-verify.md`.
Book: `docs/07-step-5-build.md`.
@@ -0,0 +1,39 @@
# Phase 6 — Verify (evidence + blind-spot checks)

Goal: establish trust and record an outcome. Passing tests are necessary, not
sufficient. This phase is **human-led** — there is no AI role. Fill **§6** in
TASK.md including the GATE RECORD.

## Part one — confirm the evidence

- [ ] All tests pass.
- [ ] Coverage did not decrease.
- [ ] No test or contract was altered during build.

If any is false, stop and return to Build — there is nothing to verify yet.

## Part two — check what tests miss

- **Concurrency/timing** — is it correct when two operations run at once? (Tests
  usually run serially and miss races.) This is usually the single most important check.
- **Security** — exposed secrets, injection openings, unexpected/invented
  dependencies. A security finding is always `HARD-STOP`, never a waiver.
- **Architecture** — does it respect layering/dependency rules in CONVENTIONS.md?
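
To illustrate the concurrency check in the list above, the sketch below shows a check-then-act withdrawal that passes any serial test but can overdraw when two threads run it at once, next to a lock-guarded version. Both classes are hypothetical, and the `time.sleep` call exists only to widen the race window for demonstration.

```python
import threading
import time


class UnsafeAccount:
    """Check-then-act without a lock: correct serially, racy in parallel."""

    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        if self.balance >= amount:   # check
            time.sleep(0.05)         # widen the race window (demo only)
            self.balance -= amount   # act
            return True
        return False


class SafeAccount(UnsafeAccount):
    """Same logic, but the check and the update happen under one lock."""

    def __init__(self, balance):
        super().__init__(balance)
        self._lock = threading.Lock()

    def withdraw(self, amount):
        with self._lock:
            return super().withdraw(amount)


def run_two_withdrawals(account, amount):
    """Start two concurrent withdrawals; return their results and the final balance."""
    results = []

    def worker():
        results.append(account.withdraw(amount))

    threads = [threading.Thread(target=worker) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results), account.balance


safe_results, safe_balance = run_two_withdrawals(SafeAccount(100), 100)
unsafe_results, unsafe_balance = run_two_withdrawals(UnsafeAccount(100), 100)
```

With the lock, exactly one withdrawal succeeds and the balance ends at 0. Without it, both threads will typically pass the check and drive the balance negative, though the exact interleaving depends on scheduling.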

## Record exactly one outcome (no silent pass)

| Outcome | When |
|---------|------|
| `PASS` | all checks met |
| `RISK-ACCEPTED` | a **non-security** gap, with signed owner + ticket + expiry |
| `HARD-STOP` | any failing test or any security finding |
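
The table above can be read as a small decision function. The sketch below is one interpretation: `Waiver`, `gate_outcome`, and their parameters are hypothetical names. The table does not say what to record for a non-security gap that has no signed waiver, so the sketch refuses to pick an outcome in that case rather than invent a rule.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Waiver:
    owner: str     # who signed the risk acceptance
    ticket: str    # tracking reference
    expiry: date   # when the acceptance lapses


def gate_outcome(tests_pass: bool, security_findings: int,
                 other_gaps: int, waiver: Optional[Waiver] = None) -> str:
    """Return exactly one outcome, following the table row by row."""
    if not tests_pass or security_findings > 0:
        return "HARD-STOP"        # never waivable
    if other_gaps == 0:
        return "PASS"
    if waiver is not None and waiver.owner and waiver.ticket:
        return "RISK-ACCEPTED"
    raise ValueError("non-security gap without a signed waiver: no outcome applies")


# Placeholder values for illustration only.
waiver = Waiver(owner="hypothetical-owner", ticket="TICKET-0", expiry=date(2030, 1, 1))
```

Checking the `HARD-STOP` conditions first means a waiver can never mask a failing test or a security finding.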

## Exit gate / Next

- [ ] Evidence confirmed, blind spots checked, a person approved, outcome recorded.

```bash
python3 .add/tooling/add.py gate PASS # marks the task done
# or: add.py gate RISK-ACCEPTED | add.py gate HARD-STOP (return to Build)
```
Then read `phases/7-observe.md`. Book: `docs/08-step-6-verify.md`.
@@ -0,0 +1,32 @@
# Phase 7 — Observe (feed the next loop)

Goal: release deliberately, watch reality, and turn what you learn into the next
spec. Release is not the finish line — it is where the most reliable information
about the feature finally appears. Fill **§7** in TASK.md.

## Do

1. **Release behind a blast-radius limit** — feature flag and/or gradual rollout.
2. **Reuse scenarios as monitors** — the §2 scenarios that defined "correct" now
   define what you alert on: overall error rate, each rejection's rate (a spike in
   one is a signal), latency of the risky operation under load.
3. **Draft the next spec delta** — every defect, surprise, or new need becomes a
   concrete change that re-enters the flow at Specify (a new task).
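
Step 2 above could be sketched as a per-rejection rate monitor. The event format (a list of outcome strings, `"OK"` or a Reject code), the function names, and the `factor` and `floor` thresholds are all assumptions made for this example.

```python
from collections import Counter


def rejection_rates(events):
    """Rate of each Reject code among all events ("OK" or a reject code)."""
    events = list(events)
    if not events:
        return {}
    counts = Counter(e for e in events if e != "OK")
    return {code: n / len(events) for code, n in counts.items()}


def spikes(baseline, current, factor=3.0, floor=0.01):
    """Reject codes whose current rate is at least `factor` times the baseline."""
    return sorted(
        code for code, rate in current.items()
        if rate >= floor and rate >= factor * baseline.get(code, 0.0)
    )


baseline = rejection_rates(["OK"] * 98 + ["INSUFFICIENT_FUNDS"] * 2)
current = rejection_rates(["OK"] * 90 + ["INSUFFICIENT_FUNDS"] * 10)
alerts = spikes(baseline, current)
```

Tracking each Reject code separately is what lets a spike in one rejection stand out even when the overall error rate looks normal. The `floor` keeps very rare codes from alerting on noise.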

## AI prompt

> Role: a reliability analyst feeding the next cycle. Read telemetry, objectives,
> incidents. Report error-budget burn; cluster errors and surface the top
> real-world failures; draft a SPEC delta with evidence links. Never auto-roll-back
> — recommend; a human owns the production decision.
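
For the "error-budget burn" the prompt above mentions, a common reading is the share of allowed failures already used up in a window. The sketch below assumes the objective is a success-rate target (for example 99.9%); the function name and parameters are hypothetical.

```python
def error_budget_burn(slo_target: float, total: int, failed: int) -> float:
    """Fraction of the window's error budget consumed so far (1.0 = fully spent)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total <= 0:
        raise ValueError("total must be positive")
    allowed_failures = (1.0 - slo_target) * total
    return failed / allowed_failures


# A 99.9% target over 1,000,000 requests allows about 1,000 failures;
# 250 failures therefore uses roughly a quarter of the budget.
burn = error_budget_burn(slo_target=0.999, total=1_000_000, failed=250)
```

A value above 1.0 means the budget is exhausted. Per the prompt, the analyst reports this and recommends, and a human decides what happens in production.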

## Exit gate

- [ ] Released behind a flag/rollout.
- [ ] Scenario-based monitors live.
- [ ] A reviewed spec delta captured (becomes the next `new-task`).

## Next

Loop. The artifacts you built are living documents the next cycle refines.
Book: `docs/09-the-loop.md`.