npm - @pilotspace/add - Versions diffs - 1.0.0 - Mend

@pilotspace/add 1.0.0

Files changed (53)

package/GETTING-STARTED.md +238 -0
package/LICENSE +20 -0
package/README.md +106 -0
package/bin/cli.js +131 -0
package/docs/00-introduction.md +46 -0
package/docs/01-principles.md +71 -0
package/docs/02-the-flow.md +93 -0
package/docs/03-step-1-specify.md +117 -0
package/docs/04-step-2-scenarios.md +78 -0
package/docs/05-step-3-contract.md +78 -0
package/docs/06-step-4-tests.md +71 -0
package/docs/07-step-5-build.md +80 -0
package/docs/08-step-6-verify.md +63 -0
package/docs/09-the-loop.md +43 -0
package/docs/10-setup-and-stages.md +75 -0
package/docs/11-governance.md +87 -0
package/docs/12-roles.md +99 -0
package/docs/13-adoption.md +67 -0
package/docs/14-foundation.md +121 -0
package/docs/README.md +70 -0
package/docs/add-competencies.png +0 -0
package/docs/add-flow.png +0 -0
package/docs/add-foundation.png +0 -0
package/docs/add-hierarchy.png +0 -0
package/docs/appendix-a-templates.md +88 -0
package/docs/appendix-b-prompts.md +119 -0
package/docs/appendix-c-glossary.md +85 -0
package/docs/appendix-d-worked-example.md +152 -0
package/docs/appendix-e-checklists.md +80 -0
package/docs/appendix-f-requirements-matrix.md +170 -0
package/package.json +47 -0
package/skill/add/SKILL.md +118 -0
package/skill/add/deltas.md +69 -0
package/skill/add/fold.md +66 -0
package/skill/add/intake.md +49 -0
package/skill/add/phases/0-setup.md +35 -0
package/skill/add/phases/1-specify.md +55 -0
package/skill/add/phases/2-scenarios.md +36 -0
package/skill/add/phases/3-contract.md +41 -0
package/skill/add/phases/4-tests.md +37 -0
package/skill/add/phases/5-build.md +38 -0
package/skill/add/phases/6-verify.md +39 -0
package/skill/add/phases/7-observe.md +32 -0
package/skill/add/run.md +152 -0
package/skill/add/scope.md +58 -0
package/tooling/add.py +1573 -0
package/tooling/templates/CONVENTIONS.md.tmpl +8 -0
package/tooling/templates/GLOSSARY.md.tmpl +3 -0
package/tooling/templates/MILESTONE.md.tmpl +25 -0
package/tooling/templates/MODEL_REGISTRY.md.tmpl +6 -0
package/tooling/templates/PROJECT.md.tmpl +42 -0
package/tooling/templates/TASK.md.tmpl +111 -0
package/tooling/templates/dependencies.allowlist.tmpl +2 -0

package/docs/appendix-b-prompts.md ADDED

@@ -0,0 +1,119 @@
+# Appendix B · Prompt library
+[← Appendix A Templates](./appendix-a-templates.md) · [Contents](./README.md) · Next: [Appendix C Glossary →](./appendix-c-glossary.md)
+This appendix reproduces the contents of the `playbook/` folder. Each prompt is plain text that names the files to read, states a single task, and lists the rules. The inline `# why:` notes are annotations; keep them, because they encode the judgment behind each instruction. These prompts are themselves versioned, tested artifacts (see [11 Governance](./11-governance.md)).
+---
+### `playbook/1_specify.md`
+```
+Role: a domain analyst who brainstorms, then asks rather than assumes.
+Read first: ./PRD/* , ./GLOSSARY.md , ./inputs/ (tickets, interviews, contracts)
+Task: co-specify SPEC.md WITH me. No solutions, no code.
+Steps:
+  0. Diverge first: surface 2–3 genuine framings of the feature + the open questions, and let
+     me react before you draft. Record the result as `Framings weighed: X (chosen) · Y · Z`.
+     # why: a spec dictated by one side is a guess; brainstormed, it is a decision.
+  1. List every required behavior (Must) and every situation to refuse (Reject),
+     giving each refusal a named error code.
+     # why: named errors become scenarios and contract responses; "handle bad input" does not.
+  2. State the success state-change (After).
+  3. List the assumptions you had to make, RANKED least-sure first; flag the 1–2 you are least
+     sure about as `⚠ <assumption> — least sure because <why>; if wrong: <cost>`.
+     # why: a flat all-equal list gets rubber-stamped; a ranked one aims my attention at the risk.
+Exit: a domain owner disputes none of it; assumptions ranked least-sure first, the 1–2 ⚠ flags
+      carrying why + cost — or an honest "none material" that still names the single biggest risk.
+Never: resolve an ambiguity by guessing — ask. Never a blank "none" or a flat wall of equal ticks.
+```
+### `playbook/2_scenarios.md`
+```
+Role: a specification tester.
+Read first: ./SPEC.md , ./GLOSSARY.md
+Task: produce features/<name>.feature.
+Steps:
+  1. For each Must and each Reject rule, write a Given/When/Then scenario.
+     # why: a rule with no scenario will never be verified.
+  2. For every rejection, add an And-clause asserting what must NOT change.
+     # why: catches corrupting partial failures that a result-only check misses.
+Exit: every rule has at least one scenario with an observable result.
+Never: write a vague result ("then it works").
+```
+### `playbook/3_contract.md`
+```
+Role: an interface/contract architect; contracts are immutable once frozen.
+Read first: ./SPEC.md , ./features/*.feature , ./GLOSSARY.md
+Task: produce contracts/<name>.md, a mock server, and contract tests. No business logic.
+Steps:
+  1. Define interfaces, request/response shapes, and the schema, named from the glossary.
+     # why: consistent names prevent the subtle mismatches that cause silent bugs.
+  2. Define a response for every Reject error code in the spec.
+  3. Generate a mock returning the contracted shapes, and contract tests pinning them.
+     # why: the mock unblocks dependent work; the tests become a regression baseline.
+  4. Mark the contract FROZEN at a version.
+Exit: contract tests pass against the mock; every spec rejection has a response.
+Never: change a frozen contract — a change is a request that reopens Specify.
+```
+### `playbook/4_tests.md`
+```
+Role: a test author who writes tests before code.
+Read first: ./features/*.feature , ./contracts/*
+Task: produce a failing (red) test suite. Do NOT implement the feature.
+Steps:
+  1. Turn each scenario into an executable test.
+     # why: closes spec -> scenario -> test with no human translation loss.
+  2. Add contract-conformance and edge-case tests.
+  3. Run the suite; confirm it fails for the right reason (missing implementation).
+     # why: a test that passes before code exists is testing nothing.
+  4. Record a coverage target.
+Exit: one test per scenario; suite red for the right reason; target recorded.
+Never: assert on internals; write the implementation here.
+```
+### `playbook/5_build.md`
+```
+Role: an execution agent. The human commands; you implement and report.
+Read first: ./SPEC.md , ./contracts/* , ./tests/* , ./CONVENTIONS.md
+Task: make EVERY failing test pass, one small task at a time.
+Steps:
+  1. Pick ONE task; restate the tests it must satisfy before coding.
+     # why: small batches keep human review able to keep up.
+  2. Implement; run tests; iterate to green WITHOUT weakening any test.
+     # why: editing a test to pass makes the code judge itself — the cardinal sin.
+  3. Honor the feature-specific safety rule (e.g. atomic balance update).
+  4. Run security and allow-list checks; attach the evidence bundle; open the change.
+Exit: all green; coverage held; no test/contract changed; no out-of-allow-list package.
+Never: change a test or the contract; add an unlisted dependency; exceed the task budget
+       without escalating; guess when unclear — ask.
+```
+### `playbook/6_observe.md`
+```
+Role: a reliability analyst feeding the next cycle.
+Read first: telemetry exports , service-objective definitions , incident tickets
+Task: turn production reality into the next SPEC delta.
+Steps:
+  1. Report objective status and error-budget burn vs target.
+  2. Cluster errors and usage; surface the top real-world failures.
+  3. Draft a SPEC delta — what the next loop should add or fix — with evidence links.
+     # why: closes the loop; production learning becomes the next specification.
+Exit: a reviewed SPEC delta linked into the backlog.
+Never: auto-roll back — recommend; a human owns the production decision.
+```
+---
+### Master prompt skeleton
+```
+Role: <one line — who the agent is for this step>
+Read first: <explicit repository paths — never chat memory>
+Task: <the single outcome; state what is OUT of scope>
+Steps:
+  1. <action>      # why: <the judgment this encodes>
+Exit: <conditions a person or the pipeline can check>
+Never: <what the agent must not do>
+Evidence: <artifacts to attach for review>
+```
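The skeleton above lends itself to a mechanical check. The following is a minimal sketch and not part of the package: `REQUIRED_FIELDS`, `OPTIONAL_FIELDS`, and `missing_fields` are hypothetical names, and treating `Evidence:` as optional is an assumption made only because the six playbook prompts above do not carry that line.

```python
import re

# Field labels are taken from the master prompt skeleton; the split into
# required and optional is an assumption of this sketch.
REQUIRED_FIELDS = ("Role", "Read first", "Task", "Steps", "Exit", "Never")
OPTIONAL_FIELDS = ("Evidence",)


def missing_fields(prompt_text: str) -> list[str]:
    """Return the required skeleton fields that do not start any line."""
    present = set()
    for line in prompt_text.splitlines():
        match = re.match(r"\s*([A-Z][A-Za-z ]*?):", line)
        if match:
            present.add(match.group(1))
    return [field for field in REQUIRED_FIELDS if field not in present]


example = """Role: a specification tester.
Read first: ./SPEC.md , ./GLOSSARY.md
Task: produce features/<name>.feature.
Steps:
  1. For each Must and each Reject rule, write a Given/When/Then scenario.
Exit: every rule has at least one scenario with an observable result.
"""
print(missing_fields(example))  # ['Never']
```

A check like this could run in the pipeline alongside the other prompt tests, so a prompt that loses its `Never:` line fails before it reaches an agent.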

package/docs/appendix-c-glossary.md ADDED

@@ -0,0 +1,85 @@
+# Appendix C · Glossary
+[← Appendix B Prompts](./appendix-b-prompts.md) · [Contents](./README.md) · Next: [Appendix D Worked example →](./appendix-d-worked-example.md)
+---
+## Terms
+**AIDD (AI-Driven Development)** — a method of building software in which an AI agent writes most of the code and people direct and verify the work.
+**Artifact** — a durable work product: the spec, the scenarios, the contract, the tests. The artifacts survive; the code is disposable.
+**Competency delta** — a single learning a loop produces, tagged by which of the five competencies (`DDD · SDD · UDD · TDD · ADD`) it improves, written in a task's OBSERVE phase as `- [<COMPETENCY> · <status>] <learning> (evidence: …)`. Emitted `open` by the AI; the human folds it into a versioned `PROJECT.md` (`folded`) or declines it (`rejected`). The mechanism by which the foundation self-improves instead of drifting. See the `add` skill's `deltas.md`.
+**Contract** — the fixed external shape of a feature: interfaces, data structures, names, and error cases. Frozen before the build, it is the surface the AI builds against.
+**Co-specification** — how a spec is made in ADD: the AI and the human **brainstorm the shape together** (diverge), the AI **drafts** it, and the human **validates with the AI's advice** (validate). The AI's decisive advice is the *least-sure flag*. It replaces dictation-by-one-side — the human owns the decision, the AI owns surfacing what it does not yet know. See [03 Specify](./03-step-1-specify.md).
+**Disposable code** — the view that code is one regenerable implementation of the artifacts, not a durable asset to be preserved.
+**Evidence bundle** — the proof attached to a change (passing tests, clean security scan, no coverage loss) that justifies trusting it and may unlock more AI autonomy.
+**Foundation version** — a monotonic integer marker in `PROJECT.md` that advances by one each time confirmed competency deltas are folded into the foundation. It makes the survivor layer's evolution auditable: a rising version with fewer new deltas per milestone is the signal that a competency is converging rather than drifting. Bumped only by the fold ritual (see the `add` skill's `fold.md`).
+**Gate** — a checkpoint with an explicit pass/fail exit. Its outcome is `PASS`, `RISK-ACCEPTED`, or `HARD-STOP`.
+**`HARD-STOP`** — a gate outcome meaning work cannot proceed; triggered by any failing test or security finding.
+**Intake** — the step *before* a task: sizing a raw request into versioned scope by classifying it into one **request bucket**. The AI proposes `{bucket, rationale, command}`; the human confirms. Lives in the `add` skill's `intake.md` (the intake altitude, above the per-task flow).
+**Least-sure flag** — the AI's ranked declaration of the **1–2 things most likely to be wrong** in what it is asking a human to approve, each carrying *why* it is uncertain and *what it costs if wrong* (`⚠ [spec|scenario|contract|test] … — because …; if wrong: …`). It reshapes the old flat assumptions list into a ranked one, so a single approval aims the reviewer's attention at the real risk instead of a wall of equal-looking ticks. Bundle-wide at the one-approval freeze seam; the §1 assumptions are its first feeder. If nothing is materially uncertain it still names the single biggest risk — never a blank "none". It makes a genuine review cheap and a lazy one visibly negligent, but cannot *force* the read. The "AI advises" half of **co-specification**.
+**Living document** — an artifact expected to change as the loop learns; never frozen forever (the one exception being a versioned contract, which changes only via a change request).
+**On-ramp** — the path a new user walks from install to their first milestone: install → `/add` → describe the goal → the agent runs intake (sizing the request into a milestone the human confirms) → the one-approval front → the self-driving run. The AI-first entry to the method; the human talks to the agent rather than hand-typing `add.py`.
+**Owner (of a phase)** — who drives a phase, exposed by `add.py … --json` as `human`, `seam`, or `ai`. It tells an autonomous harness where it may run (`ai`) and where it must checkpoint to a person (`human`/`seam`), following the who-does-what table (Verify is always `human`).
+**Profile** — the intensity at which the method is run: Express, Standard, or Regulated.
+**Request bucket** — one of the four intake classifications — `new-major`, `sub-milestone`, `task`, or `change-request` — chosen by the tie-break order (the frozen-scope test runs before the size test). A request too vague to size is rejected `ask_human`; one that touches frozen scope, `frozen_scope`; one spanning buckets, `split_required`.
+**`RISK-ACCEPTED`** — a gate outcome meaning work proceeds with a signed waiver (owner, ticket, expiry); allowed for non-security gaps only.
+**Scenario** — a single rule expressed as Given/When/Then; readable by people and checkable by machines; the bridge between spec and tests.
+**Scope drafting (scope-loop)** — the second half of **intake**: once a request is classified `new-major`/`sub-milestone`, turning it into a confirmed, well-formed `MILESTONE.md` (goal · scope · exit criteria · breadth-first tasks) through discussion. Every exit criterion maps to a declared task slug; the AI proposes the draft, the human confirms before anything is created. Lives in the `add` skill's `scope.md`.
+**Spec (`SPEC.md`)** — the plain-language statement of what a feature must do, must reject, and assumes.
+**Spine / continuous concern** — a concern that runs through every step rather than being one step: security, testing, observability, cost.
+**Stage** — one pass through the flow at a chosen depth: Prototype, Proof of Concept, MVP, or Production-Ready.
+**State surface** — everything an agent loads every session: the `add` skill (router `SKILL.md` + the active phase) and the lean operational docs — `PROJECT.md`, the active `MILESTONE.md` and `TASK.md`, and `state.json`. Kept small to avoid context rot. Contrast **Story surface**.
+**Stop signal** — the boolean an autonomous harness reads from `add.py … --json` (`stop = owner != "ai"`): true means pause for a person before proceeding. The irreducible stops are the contract freeze and the Verify gate. See **Owner (of a phase)**.
+**Story surface** — the book (`docs/*`): the whole method, read once by a person to trust ADD, then referenced by a pointer and **never auto-loaded** into agent context. Contrast **State surface**.
+**Survivor layer** — the set of durable artifacts (conventions, glossary, frozen contracts) that outlive any particular code.
+**Trust ladder / autonomy ladder** — the graduated levels of AI autonomy, earned with evidence and verification capacity.
+**Verification capacity / review throughput** — the rate at which a team can confirm AI output is correct; the real ceiling on safe speed.
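The **Stop signal** and **Owner (of a phase)** entries describe a rule an autonomous harness can apply directly. Below is a minimal sketch of that rule. The payload shape (a JSON object with a top-level `owner` key) and the helper name `should_stop` are assumptions for illustration; the glossary gives only the rule `stop = owner != "ai"` and the three owner values, so the real `add.py … --json` output may look different.

```python
import json

OWNERS = {"human", "seam", "ai"}  # owner values named in the glossary


def should_stop(status_json: str) -> bool:
    """Apply the glossary rule stop = owner != "ai" to one status payload.

    Assumes (hypothetically) a JSON object with a top-level "owner" key.
    """
    owner = json.loads(status_json)["owner"]
    if owner not in OWNERS:
        raise ValueError(f"unknown owner: {owner!r}")
    return owner != "ai"


for payload in ('{"owner": "ai"}', '{"owner": "seam"}', '{"owner": "human"}'):
    print(payload, "->", should_stop(payload))
```

Rejecting unknown owner values, rather than treating them as `ai`, keeps the harness on the safe side: anything it does not recognize is surfaced instead of run.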
+---
+## Optional mapping to formal phase names
+This book uses plain step names. Teams connecting it to a larger formal standard may use these equivalents. The mapping is optional; the plain flow is complete on its own.
+| Plain step (this book) | Formal phase name |
+|------------------------|-------------------|
+| Project setup | Foundation |
+| Specify | Domain Discovery + Spec Definition |
+| (design portion) | UX-Driven Design |
+| Scenarios | Behavior specification (Given/When/Then) |
+| Contract | Contract Freeze |
+| Tests | Test-Driven Verification |
+| Build | AI-Driven Development (the engine) |
+| Verify | the review gate within the build |
+| Observe (loop) | Operate and Learn |
+The formal standard also names the *foundation* and *design* work as full phases in their own right; this book folds them into project setup and the Specify step (and the Prototype stage) to keep the flow to six memorable steps.

package/docs/appendix-d-worked-example.md ADDED

@@ -0,0 +1,152 @@
+# Appendix D · The worked example, end to end
+[← Appendix C Glossary](./appendix-c-glossary.md) · [Contents](./README.md) · Next: [Appendix E Checklists →](./appendix-e-checklists.md)
+The running example, assembled in one place so you can see a complete pass through the flow without flipping between chapters. The feature: **transfer money between a user's own accounts.**
+---
+## Step 1 — Specify → `SPEC.md`
+```
+Feature: Transfer money between my own accounts
+Framings weighed: synchronous single-currency transfer (chosen) · queued transfer · multi-currency with FX
+Must:
+  - move an amount from one of my accounts to another of mine
+  - amount > 0
+  - source and destination are different accounts
+  - source has enough balance
+After:
+  - source balance -= amount, destination balance += amount
+Reject:
+  - amount <= 0           -> "amount_invalid"
+  - source == destination -> "same_account"
+  - balance < amount      -> "insufficient_funds"
+  - account not mine      -> "forbidden"
+Assumptions — least-sure first:
+  ⚠ same currency only (no FX) in v1 — least sure because the ticket never said; if wrong: the amount/rounding model changes and this contract is wrong
+  - [x] no daily limit in v1 — confirmed: out of scope for v1
+```
+The product owner read the flagged assumption first — the single-currency choice, the one most likely to be wrong and most expensive if it were — and confirmed it: v1 is single-currency with no daily limit.
+## Step 2 — Scenarios → `features/transfer.feature`
+```
+Scenario: successful transfer
+  Given A has 100 and B has 0, both mine
+  When I transfer 30 from A to B
+  Then A has 70 and B has 30
+Scenario: amount must be positive
+  Given A has 100, mine
+  When I transfer 0 from A to B
+  Then it is rejected "amount_invalid"
+  And no balance changes
+Scenario: same account
+  Given A has 100, mine
+  When I transfer 10 from A to A
+  Then it is rejected "same_account"
+  And no balance changes
+Scenario: insufficient funds
+  Given A has 20, mine
+  When I transfer 50 from A to B
+  Then it is rejected "insufficient_funds"
+  And no balance changes
+Scenario: not my account
+  Given account C is not mine
+  When I transfer 10 from C to B
+  Then it is rejected "forbidden"
+  And no balance changes
+```
+Five scenarios for four rejections plus the happy path — every rule from the spec is covered.
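The playbook rule that every rejection also asserts what must not change can be checked mechanically. The following is a minimal sketch under stated assumptions: it expects the plain-text scenario layout used in this appendix, `rejections_missing_guard` is a hypothetical helper, and matching on the literal phrases `it is rejected` and `And no balance changes` is specific to this example rather than a general Gherkin parser.

```python
def rejections_missing_guard(feature_text: str) -> list[str]:
    """Names of rejection scenarios that lack a 'no balance changes' clause."""
    scenarios: dict[str, list[str]] = {}
    current = None
    for raw in feature_text.splitlines():
        line = raw.strip()
        if line.startswith("Scenario:"):
            current = line[len("Scenario:"):].strip()
            scenarios[current] = []
        elif current is not None and line:
            scenarios[current].append(line)
    return [
        name
        for name, steps in scenarios.items()
        if any("it is rejected" in step for step in steps)
        and not any(step.startswith("And no balance changes") for step in steps)
    ]


# Illustrative input only; the last scenario is deliberately unguarded.
sample = """
Scenario: successful transfer
  Given A has 100 and B has 0, both mine
  When I transfer 30 from A to B
  Then A has 70 and B has 30
Scenario: same account
  Given A has 100, mine
  When I transfer 10 from A to A
  Then it is rejected "same_account"
  And no balance changes
Scenario: unguarded rejection
  Given A has 100, mine
  When I transfer 0 from A to B
  Then it is rejected "amount_invalid"
"""
print(rejections_missing_guard(sample))  # ['unguarded rejection']
```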
+## Step 3 — Contract → `contracts/transfer.md`
+```
+POST /transfers   body: { fromAccountId, toAccountId, amount }
+  200 -> { transferId, fromBalance, toBalance }
+  400 -> { error: "amount_invalid" | "same_account" | "insufficient_funds" }
+  403 -> { error: "forbidden" }
+Schema: accounts.balance (read + write, must be transactional)
+Status: FROZEN @ v1
+```
+Frozen at v1. The schema note flags the atomicity requirement the verification step will check.
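The contract's error-to-status mapping can be written as a lookup table, which is roughly what a mock needs in order to return the contracted shapes. This is a sketch, not the package's mock server: `ERROR_STATUS` and `mock_response` are hypothetical names, and the success values are placeholders.

```python
# Status codes and error names are copied from the frozen contract above;
# the constant and function names are illustrative only.
ERROR_STATUS = {
    "amount_invalid": 400,
    "same_account": 400,
    "insufficient_funds": 400,
    "forbidden": 403,
}


def mock_response(error=None):
    """Return (status, body) in the contracted shape for POST /transfers."""
    if error is None:
        # Placeholder values: a mock only has to honor the response shape.
        return 200, {"transferId": "t-1", "fromBalance": 70, "toBalance": 30}
    if error not in ERROR_STATUS:
        raise ValueError(f"{error!r} is not in the v1 contract")
    return ERROR_STATUS[error], {"error": error}


print(mock_response())
print(mock_response("forbidden"))  # (403, {'error': 'forbidden'})
```

Refusing an error code that the contract does not list mirrors the freeze: a new rejection is a change request, not something the mock should quietly accept.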
+## Step 4 — Tests → `tests/transfer_test.py` (run first; all fail)
+```python
+def test_successful_transfer():
+    a = account(balance=100, owner=me); b = account(balance=0, owner=me)
+    r = transfer(a.id, b.id, 30)
+    assert r.status == 200
+    assert a.balance == 70 and b.balance == 30
+def test_amount_must_be_positive():
+    a = account(balance=100, owner=me); b = account(balance=0, owner=me)
+    r = transfer(a.id, b.id, 0)
+    assert r.status == 400 and r.error == "amount_invalid"
+    assert a.balance == 100 and b.balance == 0
+def test_same_account():
+    a = account(balance=100, owner=me)
+    r = transfer(a.id, a.id, 10)
+    assert r.status == 400 and r.error == "same_account"
+    assert a.balance == 100
+def test_insufficient_funds():
+    a = account(balance=20, owner=me); b = account(balance=0, owner=me)
+    r = transfer(a.id, b.id, 50)
+    assert r.status == 400 and r.error == "insufficient_funds"
+    assert a.balance == 20 and b.balance == 0
+def test_not_my_account():
+    c = account(balance=100, owner=someone_else); b = account(balance=0, owner=me)
+    r = transfer(c.id, b.id, 10)
+    assert r.status == 403 and r.error == "forbidden"
+    assert c.balance == 100 and b.balance == 0
+```
+Run now, with no implementation: all five fail. That is the honest baseline.
+## Step 5 — Build → the prompt given to the AI
+```
+Read SPEC.md, contracts/transfer.md, and tests/transfer_test.py.
+Implement POST /transfers so that EVERY test passes.
+Constraints:
+  - Do NOT change any test.
+  - Do NOT change the contract.
+  - Make the balance update atomic: debit and credit in a single transaction,
+    and re-check the balance inside the transaction.
+  - Stop and ask if any requirement is unclear — do not guess.
+  - Use only packages in dependencies.allowlist.
+Report which tests pass and exactly what you changed.
+```
+The AI implements, runs the suite, iterates, and reports all five green, listing the files it changed.
+## Step 6 — Verify → the human checks
+- **Evidence:** all five tests pass; coverage held; no test or contract was altered. ✓
+- **Concurrency (the key check):** two simultaneous transfers from account A must not both pass the balance check and overdraw it. The reviewer confirms the balance re-check happens *inside* the transaction and that the row is locked for the update — so a race cannot double-spend. ✓
+- **Security:** no hardcoded secrets; inputs validated; no new dependency added. ✓
+- **Architecture:** the change respects the layering in `CONVENTIONS.md`. ✓
+- **Outcome recorded:** `PASS`, reviewed by the senior engineer.
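The concurrency check above can be illustrated in memory. This is a sketch under loud assumptions, not the implementation the AI produced: a `threading.Lock` stands in for the database transaction and row lock, `Ledger` is a hypothetical class, and the ownership (`forbidden`) rule is left out to keep the example short.

```python
import threading


class Ledger:
    """In-memory stand-in for accounts.balance; not the package's code."""

    def __init__(self, balances):
        self._balances = dict(balances)
        self._lock = threading.Lock()  # plays the role of the row lock

    def balance(self, account_id):
        with self._lock:
            return self._balances[account_id]

    def transfer(self, src, dst, amount):
        """Debit and credit as one unit; returns an error code or None."""
        if amount <= 0:
            return "amount_invalid"
        if src == dst:
            return "same_account"
        with self._lock:
            # The balance check happens inside the critical section, so two
            # concurrent transfers cannot both pass it.
            if self._balances[src] < amount:
                return "insufficient_funds"
            self._balances[src] -= amount
            self._balances[dst] += amount
        return None


ledger = Ledger({"A": 100, "B": 0})
results = []
threads = [
    threading.Thread(target=lambda: results.append(ledger.transfer("A", "B", 80)))
    for _ in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # one None and one 'insufficient_funds', in either order
print(ledger.balance("A"), ledger.balance("B"))  # 20 80
```

If the balance check were moved outside the `with` block, both threads could read 100, both could pass, and account A would end at -60. That is the race the reviewer is looking for.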
+## The loop — observe
+Released behind a feature flag to 5% of users. Monitored:
+- transfer error rate (target: well under 0.1% of attempts);
+- the rate of each rejection — a spike in `insufficient_funds` would suggest a UX problem (users not seeing their balance) rather than a code defect;
+- latency of the atomic update under load.
+A week later, telemetry shows an unexpectedly high `forbidden` rate. The `6_observe` prompt clusters it: users are trying to transfer *into* a shared account they can see but do not own. That observation becomes a `SPEC.md` delta — "support transfers into accounts I am authorized on, not only accounts I own" — and the flow returns to Step 1 for the next cycle.
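The per-rejection rates monitored above reduce to a small tally. The sketch below uses made-up sample data, not real telemetry, and `rejection_rates` is a hypothetical helper; it assumes each attempt is recorded as its error code, or `None` on success.

```python
from collections import Counter


def rejection_rates(outcomes):
    """Share of attempts ending in each error code (None means success)."""
    total = len(outcomes)
    if total == 0:
        return {}
    counts = Counter(code for code in outcomes if code is not None)
    return {code: n / total for code, n in counts.most_common()}


# Made-up sample: 10 attempts, 3 forbidden, 1 insufficient_funds.
attempts = [None] * 6 + ["forbidden"] * 3 + ["insufficient_funds"]
for code, rate in rejection_rates(attempts).items():
    print(f"{code}: {rate:.0%}")
```

Sorting by frequency puts the dominant failure first, which is the cluster worth reading before drafting a `SPEC.md` delta.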
+---
+This is the whole method in one feature: four artifacts written in order, an AI build bounded by them, a verification grounded in evidence plus the one check tests miss, and a loop that turns production reality into the next specification.

package/docs/appendix-e-checklists.md ADDED

@@ -0,0 +1,80 @@
+# Appendix E · Checklists
+[← Appendix D Worked example](./appendix-d-worked-example.md) · [Contents](./README.md) · Next: [Appendix F Requirements matrix →](./appendix-f-requirements-matrix.md)
+Every exit check in the book, collected for quick use. Print this page.
+---
+## Setup (once per project)
+- [ ] Pipeline runs and is green on the empty skeleton.
+- [ ] AI model pinned in `MODEL_REGISTRY.md`.
+- [ ] Dependency allow-list exists; pipeline fails on anything outside it.
+- [ ] `playbook/` contains the six prompts.
+## Step 1 — Specify
+- [ ] Every required behavior stated explicitly.
+- [ ] Every rejection has a named error code.
+- [ ] Success state-change described.
+- [ ] Assumptions ranked least-sure first; the 1–2 most-likely-wrong ⚠-flagged with why + cost (or an honest "none material" that still names the single biggest risk).
+## Step 2 — Scenarios
+- [ ] Every "Must" rule has a scenario.
+- [ ] Every "Reject" rule has a scenario.
+- [ ] Each result is a specific, observable fact.
+- [ ] Rejections assert what must stay unchanged.
+## Step 3 — Contract
+- [ ] Contract versioned and `FROZEN`.
+- [ ] Contract tests pass against the mock.
+- [ ] Names match the glossary.
+- [ ] Every spec rejection has a contracted response.
+## Step 4 — Tests
+- [ ] One test per scenario.
+- [ ] Suite runs in the pipeline and is red for the right reason.
+- [ ] Tests assert behavior, not internals.
+- [ ] Coverage target recorded.
+## Step 5 — Build
+- [ ] All tests pass.
+- [ ] Coverage did not decrease.
+- [ ] No test or contract modified by the AI.
+- [ ] No package outside the allow-list added.
+- [ ] Change is small enough to review in full.
+## Step 6 — Verify
+- [ ] All tests pass (the evidence).
+- [ ] Concurrency/timing of the risky operation is safe.
+- [ ] No exposed secrets, injection, or unexpected dependencies.
+- [ ] Layering and dependencies follow `CONVENTIONS.md`.
+- [ ] A person reviewed and approved.
+- [ ] Outcome recorded (`PASS` / `RISK-ACCEPTED` / `HARD-STOP`).
+## The loop
+- [ ] Released behind a flag or gradual rollout.
+- [ ] Scenarios reused as production monitors.
+- [ ] Learnings written back as a `SPEC.md` delta.
+---
+## Master shippable checklist
+A feature is shippable only when all are true:
+- [ ] Spec complete: behavior stated, rejections named, assumptions ranked least-sure first with the biggest risk flagged.
+- [ ] Every rule has a scenario.
+- [ ] Contract frozen; contract tests green.
+- [ ] A test per scenario; suite was red before the build.
+- [ ] All tests green; coverage held; tests and contract untouched by the AI.
+- [ ] Concurrency, security, and architecture checked by a person.
+- [ ] Gate outcome recorded with an accountable owner.
+- [ ] Released behind a flag, with monitors in place.
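Because the checklists use standard Markdown task-list syntax, the "all are true" rule can be evaluated by a script. This is a minimal sketch; `unchecked_items` and `shippable` are hypothetical helpers and are not commands of the `add` engine.

```python
import re

# Matches "- [ ] text" and "- [x] text" task-list lines.
CHECKBOX = re.compile(r"^\s*- \[([ xX])\] (.+)$")


def unchecked_items(markdown: str) -> list[str]:
    """Text of every task-list item that is still open."""
    items = []
    for line in markdown.splitlines():
        match = CHECKBOX.match(line)
        if match and match.group(1) == " ":
            items.append(match.group(2).strip())
    return items


def shippable(markdown: str) -> bool:
    """True only when at least one checkbox exists and none is open."""
    has_boxes = any(CHECKBOX.match(line) for line in markdown.splitlines())
    return has_boxes and not unchecked_items(markdown)


filled_in = """
- [x] Every rule has a scenario.
- [x] Contract frozen; contract tests green.
- [ ] Released behind a flag, with monitors in place.
"""
print(unchecked_items(filled_in))
print(shippable(filled_in))  # False
```

An empty document counts as not shippable here, so a missing checklist cannot pass by accident.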

package/docs/appendix-f-requirements-matrix.md ADDED

@@ -0,0 +1,170 @@
+# Appendix F · Document requirements matrix (Project → Milestone → Task)
+[← Appendix E Checklists](./appendix-e-checklists.md) · [Contents](./README.md)
+This appendix maps every AIDD document to a three-level project hierarchy, so that at any level a team can answer three questions: **which documents must exist, who owns them, and what proves the level is complete.** It is the traceability backbone of the method — read it alongside the stage-depth matrix in [10 Setup and stages](./10-setup-and-stages.md), which covers *step* depth; this appendix covers *document* requirements.
+---
+## The three levels
+| Level | What it is | AIDD meaning | Spans |
+|-------|-----------|--------------|-------|
+| **Project** | the whole product or engagement | the survivor layer — documents created once and kept for the life of the product | all milestones |
+| **Milestone** | a stage or release | one pass of the flow at a chosen depth: Prototype, POC, MVP, or Production-Ready; groups many tasks | many tasks |
+| **Task** | one feature through the flow | a single pass of Specify → … → Verify; the smallest unit with its own gate records | the six steps |
+A **project** sets up the survivor-layer documents once. A **milestone** is a depth-bounded goal that groups tasks and has its own entry and exit document gates. A **task** is one feature, and it produces the per-feature artifacts.
+## How the hierarchy decomposes
+```mermaid
+flowchart TD
+  P["PROJECT — the product<br/>PROJECT.md (foundation) · CONVENTIONS · GLOSSARY · MODEL_REGISTRY · allowlist · playbook"]
+  P --> M1["MILESTONE · Prototype"]
+  P --> M2["MILESTONE · POC"]
+  P --> M3["MILESTONE · MVP"]
+  P --> M4["MILESTONE · Production-Ready"]
+  M3 --> T1["TASK · Transfer between accounts<br/>SPEC · feature · contract · tests · code · gate records"]
+  M3 --> T2["TASK · View balance"]
+  M3 --> T3["TASK · Transaction history"]
+  classDef p fill:#F1EFE8,stroke:#5F5E5A,color:#2C2C2A;
+  classDef m fill:#FAEEDA,stroke:#BA7517,color:#633806;
+  classDef t fill:#E6F1FB,stroke:#185FA5,color:#042C53;
+  class P p; class M1,M2,M3,M4 m; class T1,T2,T3 t;
+```
+---
+## Matrix 1 — Documents by level (ownership and lifespan)
+Which document lives at which level, who is accountable for it, and how long it lasts.
+| Document | Level | Created | Lifespan | Accountable owner |
+|----------|:-----:|---------|----------|-------------------|
+| `PROJECT.md` (foundation: domain · spec · UI/UX) | Project | setup, grows | whole project | Product / Architect |
+| `CONVENTIONS.md` | Project | setup | whole project | Architect / Lead |
+| `GLOSSARY.md` | Project | setup, grows | whole project | Product / Domain |
+| `MODEL_REGISTRY.md` | Project | setup | whole project | Architect / Lead |
+| `dependencies.allowlist` | Project | setup | whole project | Security |
+| `playbook/*.md` (prompts) | Project | setup, versioned | whole project | Eng Lead |
+| Stage plan / roadmap | Milestone | per milestone | the milestone | EM / Delivery |
+| Milestone exit report | Milestone | milestone end | the milestone | EM / Delivery |
+| `SLO.md` (objectives) | Milestone (MVP+) | from MVP | from MVP onward | DevOps / SRE |
+| `SPEC.md` | Task | per feature | living | Product / Domain |
+| `features/*.feature` | Task | per feature | living | QA / Test |
+| `contracts/*.md` | Task → **Project** | per feature, then frozen | survivor (promoted to project) | Architect / Lead |
+| `tests/*` | Task | per feature | living | QA / Engineer |
+| Source code | Task | per feature | **disposable** | Engineer |
+| Gate outcome records | Task | per step | kept for audit | the reviewer |
+> Note the one promotion: a **contract** is authored at task level but, once frozen, becomes part of the project's survivor layer — other tasks depend on it. That promotion is why a contract change is a project-level change request, not a task-local edit.
+---
+## Matrix 2 — Documents required by milestone
+Which documents must exist, and at what depth, to **exit** each milestone. Depth: **Deep** · **Core** · **Light** · **—** (not required).
+| Document | Prototype | POC | MVP | Production-Ready |
+|----------|:---------:|:---:|:---:|:----------------:|
+| `CONVENTIONS.md` | Light | Core | Required | Required |
+| `GLOSSARY.md` | seed | Core | Required | Required |
+| `MODEL_REGISTRY.md` | Required | Required | Required | Required |
+| `dependencies.allowlist` | optional | Required | Required | Required |
+| `playbook/*.md` | Required | Required | Required | Required |
+| `SPEC.md` | Light | Deep (risky slice) | Required (full) | Required (full) |
+| Design: flows + screen states | **Deep** | Light | Core | Deep |
+| `features/*.feature` | — | Core | Required | Exhaustive |
+| `contracts/*.md` (frozen) | — | Core (risky slice) | Required (frozen) | Required (versioned) |
+| `tests/*` | — | Core | Core | Full coverage |
+| `SLO.md` | — | — | Light | Required |
+| Gate outcome records | — | Core | Required | Required (all `HARD-STOP`) |
+| Operate / observe report | — | — | Light | Required |
+| Milestone exit report | Light | Core | Required | Required |
+**Reading it:** a Prototype exits on a deep design and little else; a POC adds a deep spec plus core scenarios, contract, and tests on the risky slice; an MVP requires the full per-feature document set plus light operations; Production-Ready requires everything at full depth with operations and audit-grade gate records.
+---
+## Matrix 3 — Documents required per task (the six steps)
+Every task, regardless of milestone, produces this artifact chain. The depth varies by milestone (Matrix 2); the *sequence and exit gate* do not.
+| Step | Required document | Exit gate (the proof) | Detail |
+|------|-------------------|------------------------|--------|
+| 1 Specify | `SPEC.md` | rules + named rejections, assumptions ranked least-sure first (biggest risk ⚠-flagged) | [03](./03-step-1-specify.md) |
+| 2 Scenarios | `features/<task>.feature` | one scenario per rule | [04](./04-step-2-scenarios.md) |
+| 3 Contract | `contracts/<task>.md` | frozen + contract tests green | [05](./05-step-3-contract.md) |
+| 4 Tests | `tests/<task>_*` | one test per scenario, red first | [06](./06-step-4-tests.md) |
+| 5 Build | source code + evidence bundle | all tests green, nothing weakened | [07](./07-step-5-build.md) |
+| 6 Verify | gate outcome record | `PASS` / `RISK-ACCEPTED` / `HARD-STOP` | [08](./08-step-6-verify.md) |
+A task is **done** only when all six documents exist and the Verify record reads `PASS` (or a signed `RISK-ACCEPTED`). See the master shippable checklist in [Appendix E](./appendix-e-checklists.md).
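The "done" rule above can be sketched as a small predicate. This is an illustration under stated assumptions, not the engine's code: `REQUIRED_STEPS`, `is_task_done`, and the dictionary shape are hypothetical names chosen for this example.

```python
# Hypothetical sketch of the Matrix 3 "done" rule; not the add engine's actual code.
REQUIRED_STEPS = ("specify", "scenarios", "contract", "tests", "build", "verify")


def is_task_done(artifacts, gate, waiver_signed=False):
    """Return True only if all six step artifacts exist and the gate allows it.

    artifacts: dict mapping step name -> True if that step's document exists.
    gate: the Verify record, one of "PASS", "RISK-ACCEPTED", "HARD-STOP".
    waiver_signed: whether a RISK-ACCEPTED record carries a signed waiver.
    """
    if not all(artifacts.get(step, False) for step in REQUIRED_STEPS):
        return False
    if gate == "PASS":
        return True
    if gate == "RISK-ACCEPTED":
        return waiver_signed
    return False  # HARD-STOP or anything unrecognized never counts as done


complete = {step: True for step in REQUIRED_STEPS}
print(is_task_done(complete, "PASS"))
print(is_task_done(complete, "RISK-ACCEPTED"))
print(is_task_done(complete, "HARD-STOP"))
```

An unsigned `RISK-ACCEPTED` is treated the same as a missing document here: the task simply is not done yet.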
+---
+## Matrix 4 — Executable proofs (the claims the engine enforces)
+The rows above are the method's *promises*. A promise a tool quietly breaks is worse than none — so the `add` engine ships a proof-harness: each invariant below is pinned by an automated test that fails loudly if the **Story** (this book) and the **State** (the engine) drift apart. This table is the coverage *so far*, not a completeness claim — but the minimalism-and-coverage audit has now run once over Matrices 1–3 (see **Sweep findings** below); what it could cheaply prove, it added; what it deliberately left unenforced, it recorded.
+| Claim (where it lives) | The engine enforces | Proof test |
+|------------------------|---------------------|------------|
+| No silent skips (principle 7) · "done only when Verify reads `PASS`" (Matrix 3) | `gate PASS` is **refused** unless the task has reached `verify` | `test_gate_pass_refused_before_verify` |
+| A passed task is genuinely done | `gate PASS` at `verify` advances to `done` | `test_gate_pass_at_verify_reaches_done` |
+| Deliberate ≠ silent | the explicit `phase` command is a logged escape hatch the guardrail does not block | `test_phase_override_escape_hatch` |
+| "A security finding is ALWAYS `HARD-STOP`" | `HARD-STOP` is recordable from any phase and never forces `done` | `test_hardstop_recordable_mid_build` |
+| "done … or a signed `RISK-ACCEPTED`" (Matrix 3) | `gate RISK-ACCEPTED` at `verify` advances to `done` (same guard as `PASS`) | `test_risk_accepted_complete_reaches_done` |
+| A waived task **can complete its milestone** (the point of the waiver) | the completeness predicate counts a signed `RISK-ACCEPTED` as done, so `milestone-done` / `ready` / `check` / `archive` accept it — it does not silently block | `test_milestone_done_accepts_a_waived_task` · `test_check_tolerates_a_recorded_waiver` |
+| A waiver is **signed** (owner · ticket · expiry) | `gate RISK-ACCEPTED` is refused without all three; they are stored in state | `test_risk_accepted_requires_waiver` · `test_risk_accepted_partial_waiver_refused` |
+| A waiver can **expire** — a lapsed one is caught, not trusted forever | `check` **FAILS** a `RISK-ACCEPTED` task whose stored `expires` is before today; fail-closed on a missing/unparseable date (`waiver_expired`) | `test_check_flags_expired_waiver` · `test_check_passes_unexpired_waiver` · `test_check_failclosed_on_unparseable_expires` |
+| **The Story is never auto-loaded** (principle 9, the *Minimal* pillar) | **no** command reads a `docs/` chapter at runtime — and the spy runs *every* subcommand the parser exposes, so "no command" is universal, not a subset; a project with **no** `docs/` runs the whole lifecycle | `test_full_lifecycle_runs_with_no_story` · `test_no_command_reads_a_docs_chapter` · `test_every_subcommand_is_covered` |
+| The book's gate outcomes are the engine's | `PASS` · `RISK-ACCEPTED` · `HARD-STOP` exist in both prose and `GATES` | `test_book_gate_outcomes_match_engine` |
+The tests are the source of truth; this table is their index. If a row here is ever unproven, that is a gap in the method, not a detail — the proof-harness exists to make such gaps fail loudly. (Tests: `add-method/tooling/test_proof_harness.py`, `test_waiver.py`.)
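The guard behavior listed in the table can be sketched roughly as follows. Everything here is an assumption made for illustration: `record_gate`, `WAIVER_FIELDS`, and the task dictionary are hypothetical and are not taken from `tooling/add.py`.

```python
# Hypothetical sketch of the gate guard described in the table; names are illustrative.
WAIVER_FIELDS = ("owner", "ticket", "expires")


def record_gate(task, outcome, waiver=None):
    """Record a gate outcome on a task dict, mutating and returning it.

    Raises ValueError when the guard refuses the outcome.
    """
    if outcome == "HARD-STOP":
        # Recordable from any phase; never advances the task to done.
        task["gate"] = outcome
        return task
    if outcome not in ("PASS", "RISK-ACCEPTED"):
        raise ValueError(f"unknown gate outcome: {outcome}")
    if task["phase"] != "verify":
        raise ValueError(f"{outcome} refused: task is at {task['phase']}, not verify")
    if outcome == "RISK-ACCEPTED":
        # A waiver must be fully signed: owner, ticket, and expiry.
        missing = [f for f in WAIVER_FIELDS if not (waiver or {}).get(f)]
        if missing:
            raise ValueError(f"waiver missing: {', '.join(missing)}")
        task["waiver"] = dict(waiver)  # stored in state for a later check
    task["gate"] = outcome
    task["phase"] = "done"
    return task
```

In this sketch `PASS` and `RISK-ACCEPTED` share one phase guard, mirroring the "same guard as `PASS`" wording in the table, while `HARD-STOP` bypasses it without ever changing the phase.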
+**Now closed:** an earlier version of this table flagged a `RISK-ACCEPTED` gap — the engine advanced only `PASS` to `done`, so a waived task could not complete its milestone, and the waiver fields were uncaptured. The `RISK-ACCEPTED` rows above close it: a signed waiver (owner · ticket · expiry) now completes a verify-phase task and is stored in state for a later `check` to expire. Closing it took *two* edits, not one — advancing the gate to `done` was necessary but not sufficient, because the shared completeness predicate (`milestone-done` / `ready` / `check` / `archive` all read it) still counted only `PASS`; a waived task reached `done` yet silently blocked its milestone until that predicate was taught to count a signed `RISK-ACCEPTED` too. The end-to-end row above is what catches that class of half-fix — proving the *task* completes is not proving the *milestone* can. The pattern that found it — book-claim → engine-enforces → named test — is the standing way to audit the remaining rows.
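The half-fix described above is easy to reproduce in miniature. The sketch below is hypothetical (the function names and task fields are assumptions, not the engine's predicate); it contrasts a `PASS`-only completeness check with one that also counts a signed `RISK-ACCEPTED`.

```python
# Hypothetical sketch; the real predicate lives in the add engine and may differ.
WAIVER_FIELDS = ("owner", "ticket", "expires")


def counts_as_done_pass_only(task):
    """The half-fix: only PASS counts, so a waived task blocks its milestone."""
    return task.get("phase") == "done" and task.get("gate") == "PASS"


def counts_as_done(task):
    """The full fix: PASS or a signed RISK-ACCEPTED both count."""
    if task.get("phase") != "done":
        return False
    if task.get("gate") == "PASS":
        return True
    if task.get("gate") == "RISK-ACCEPTED":
        waiver = task.get("waiver") or {}
        return all(waiver.get(field) for field in WAIVER_FIELDS)
    return False


def milestone_done(tasks, predicate=counts_as_done):
    """A milestone is complete when it has tasks and every one satisfies the predicate."""
    return bool(tasks) and all(predicate(t) for t in tasks)


tasks = [
    {"title": "View balance", "phase": "done", "gate": "PASS"},
    {
        "title": "Transfer between own accounts",
        "phase": "done",
        "gate": "RISK-ACCEPTED",
        "waiver": {"owner": "alice", "ticket": "RISK-1", "expires": "2099-01-01"},
    },
]
print(milestone_done(tasks, counts_as_done_pass_only))  # False: the waived task blocks
print(milestone_done(tasks))  # True: the signed waiver counts
```

Both tasks sit at `done` in either run; only the shared predicate differs, which is why a task-level test alone would not have caught the gap.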
+**Sweep findings (minimalism-and-coverage audit, v2):** the audit walked Matrices 1–3 for claims the engine *could* enforce but did not yet prove.
+- **Proved and added** (rows above): the *Minimal* pillar's headline — "the Story is never auto-loaded" (principle 9) — was written but unproven; it is now pinned behaviorally (the engine runs the whole lifecycle with no `docs/` present, and a read-spy over *every* subcommand confirms none reads a chapter at runtime). And waiver **expiry**, which Matrix 4 already promised the state captured "for a later `check` to expire," is now enforced.
+- **Recorded, deliberately *not* enforced:** Matrix 3 says a task is done only when "all six documents exist." The engine checks that `TASK.md` *exists* and that its phase marker matches state — it does **not** parse the file to confirm each of the seven sections is filled. Teaching it to grade section completeness would push the engine toward reading and judging the Story it is supposed to keep off the runtime path — an anti-minimal move. The reviewer owns section completeness at the Verify gate (the human-led half of the method); the engine owns the cheap structural invariants. This is a chosen boundary, not an oversight.
+- **Lean check:** the audit confirmed `state.json` carries no redundant per-task fields — `title`, `phase`, `gate`, `milestone`, `depends_on`, the two timestamps, and a `waiver` only once one is signed; nothing to trim. A clean bill is a finding too.
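The fail-closed expiry rule from the sweep can be sketched in a few lines. This assumes the stored `expires` value is an ISO `YYYY-MM-DD` string, which the text does not state; `waiver_expired` here is a hypothetical helper named after the `waiver_expired` label in Matrix 4, not the engine's function.

```python
from datetime import date


def waiver_expired(waiver, today):
    """Return True if the waiver has lapsed or its expiry cannot be trusted.

    Assumption: `expires` is stored as an ISO date string (YYYY-MM-DD).
    Fail-closed: a missing or unparseable date is treated as expired.
    """
    raw = (waiver or {}).get("expires")
    try:
        expires = date.fromisoformat(raw)
    except (TypeError, ValueError):
        return True  # missing or malformed: do not trust the waiver
    return expires < today  # expiring today still counts as valid


today = date(2025, 6, 1)
print(waiver_expired({"expires": "2025-05-31"}, today))
print(waiver_expired({"expires": "2025-06-01"}, today))
print(waiver_expired({"expires": "next quarter"}, today))
```

Passing `today` in explicitly, rather than calling `date.today()` inside the function, keeps the check deterministic under test.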
+---
+## Worked example — the hierarchy filled in
+- **Project:** *Mobile Banking App.* Survivor-layer documents: `CONVENTIONS.md`, `GLOSSARY.md` (defines *account*, *balance*, *transfer*), `MODEL_REGISTRY.md`, `dependencies.allowlist`, `playbook/`.
+- **Milestone:** *MVP — core money movement.* Exit requires the full per-feature document set for each task below, plus a light `SLO.md` and a milestone exit report.
+  - **Task:** *Transfer between own accounts* → `SPEC.md`, `features/transfer.feature`, `contracts/transfer.md` (frozen at v1), `tests/transfer_test.py`, code, and a `PASS` gate record. (The full set is in [Appendix D](./appendix-d-worked-example.md).)
+  - **Task:** *View balance* → its own SPEC, feature, contract, tests, code, record.
+  - **Task:** *Transaction history* → its own set.
+When all three tasks read `PASS` and the milestone documents exist, the MVP milestone exits — and the frozen `transfer` contract is now a project-level survivor artifact the next milestone builds on.
+---
+## Traceability chain
+The hierarchy gives a clean line of evidence from a business goal down to a passing test — which is what makes an AIDD project auditable:
+```
+PROJECT goal           "let customers move their own money safely"
+  └─ MILESTONE (MVP)    "core money movement"
+       └─ TASK          "transfer between own accounts"
+            └─ SPEC rule        "source must have enough balance"
+                 └─ SCENARIO    "insufficient funds -> rejected, no change"
+                      └─ TEST   test_insufficient_funds (was red, now green)
+                           └─ VERIFY record   PASS (atomicity checked)
+```
+Every level points down to the evidence beneath it and up to the goal above it. To audit any claim — "we handle insufficient funds correctly" — you follow the chain to a specific test and a specific gate record. Nothing rests on assertion.
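One way to picture "points down to the evidence and up to the goal" is a doubly linked chain. The sketch below is purely illustrative: `Link`, `build_chain`, `trace_down`, and `trace_up` are hypothetical names, and the method itself does not prescribe any such data structure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)  # identity comparison only; avoids recursing through links
class Link:
    level: str
    text: str
    parent: Optional["Link"] = None
    child: Optional["Link"] = None


def build_chain(levels):
    """Link (level, text) pairs top-down and return the head (the project goal)."""
    head = prev = None
    for level, text in levels:
        node = Link(level, text, parent=prev)
        if prev is None:
            head = node
        else:
            prev.child = node
        prev = node
    return head


def trace_down(node):
    """Follow child links from a claim to the evidence beneath it."""
    path = []
    while node is not None:
        path.append(node.level)
        node = node.child
    return path


def trace_up(node):
    """Follow parent links from a piece of evidence to the goal above it."""
    path = []
    while node is not None:
        path.append(node.level)
        node = node.parent
    return path


LEVELS = [
    ("PROJECT goal", "let customers move their own money safely"),
    ("MILESTONE (MVP)", "core money movement"),
    ("TASK", "transfer between own accounts"),
    ("SPEC rule", "source must have enough balance"),
    ("SCENARIO", "insufficient funds -> rejected, no change"),
    ("TEST", "test_insufficient_funds (was red, now green)"),
    ("VERIFY record", "PASS (atomicity checked)"),
]
goal = build_chain(LEVELS)
print(" -> ".join(trace_down(goal)))
```

An audit of the "insufficient funds" claim would start at the `SCENARIO` link and call `trace_down` to reach the test and gate record, or `trace_up` to reach the goal it serves.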
+---
+*This matrix is the requirements view of the method. The flow ([Part II](./02-the-flow.md)) tells you the order; the stages ([10](./10-setup-and-stages.md)) tell you the depth; this appendix tells you, at each level of the project, exactly which documents must exist and who owns them.*
+---
+*End of book. AIDD is one repeatable loop — Specify → Scenarios → Contract → Tests → Build → Verify → observe, then repeat. People own direction and verification; the AI owns the build; the artifacts are the asset and the code is disposable.*

package/package.json ADDED

@@ -0,0 +1,47 @@
+{
+  "name": "@pilotspace/add",
+  "version": "1.0.0",
+  "description": "ADD (AI-Driven Development) — a minimal, state-tracked Claude Code skill that drives every feature through Specify → Scenarios → Contract → Tests → Build → Verify → Observe. Ships the AIDD book as its trust layer.",
+  "bin": {
+    "add": "bin/cli.js"
+  },
+  "publishConfig": {
+    "access": "public"
+  },
+  "scripts": {
+    "test": "python3 -m unittest discover -s tooling -p 'test_*.py'"
+  },
+  "files": [
+    "bin/",
+    "skill/",
+    "tooling/add.py",
+    "tooling/templates/",
+    "docs/",
+    "README.md",
+    "GETTING-STARTED.md"
+  ],
+  "keywords": [
+    "ai-driven-development",
+    "add",
+    "aidd",
+    "claude-code",
+    "skill",
+    "tdd",
+    "spec-driven",
+    "agent",
+    "methodology"
+  ],
+  "engines": {
+    "node": ">=18"
+  },
+  "license": "MIT",
+  "author": "Tin Dang <tindang.ht97@gmail.com>",
+  "repository": {
+    "type": "git",
+    "url": "git+https://github.com/pilotspace/ADD.git"
+  },
+  "homepage": "https://github.com/pilotspace/ADD#readme",
+  "bugs": {
+    "url": "https://github.com/pilotspace/ADD/issues"
+  }
+}