npm - @pilotspace/add - Versions diffs - 1.1.0 → 1.2.0 - Mend

@pilotspace/add 1.1.0 → 1.2.0


Files changed (54)

package/CHANGELOG.md +40 -0
package/GETTING-STARTED.md +165 -139
package/README.md +13 -7
package/bin/cli.js +13 -4
package/docs/01-principles.md +3 -3
package/docs/02-the-flow.md +15 -11
package/docs/03-step-1-specify.md +13 -13
package/docs/04-step-2-scenarios.md +2 -2
package/docs/05-step-3-contract.md +3 -3
package/docs/06-step-4-tests.md +2 -2
package/docs/07-step-5-build.md +1 -1
package/docs/08-step-6-verify.md +14 -5
package/docs/09-the-loop.md +12 -6
package/docs/10-setup-and-stages.md +27 -13
package/docs/11-governance.md +2 -2
package/docs/12-roles.md +3 -3
package/docs/13-adoption.md +1 -1
package/docs/14-foundation.md +15 -15
package/docs/15-foundations-and-lineage.md +106 -0
package/docs/README.md +4 -0
package/docs/appendix-a-templates.md +3 -3
package/docs/appendix-b-prompts.md +40 -5
package/docs/appendix-c-glossary.md +42 -12
package/docs/appendix-d-worked-example.md +2 -2
package/docs/appendix-e-checklists.md +2 -2
package/docs/appendix-f-requirements-matrix.md +8 -8
package/docs/appendix-g-references.md +106 -0
package/package.json +1 -1
package/skill/add/SKILL.md +39 -37
package/skill/add/adopt.md +13 -11
package/skill/add/deltas.md +8 -6
package/skill/add/fold.md +19 -17
package/skill/add/graduate.md +74 -0
package/skill/add/intake.md +22 -7
package/skill/add/loop.md +59 -0
package/skill/add/phases/0-setup.md +29 -24
package/skill/add/phases/1-specify.md +23 -13
package/skill/add/phases/2-scenarios.md +14 -4
package/skill/add/phases/3-contract.md +24 -11
package/skill/add/phases/4-tests.md +15 -5
package/skill/add/phases/5-build.md +11 -4
package/skill/add/phases/6-verify.md +24 -2
package/skill/add/phases/7-observe.md +13 -5
package/skill/add/report-template.md +65 -7
package/skill/add/run.md +45 -34
package/skill/add/scope.md +10 -6
package/skill/add/setup-review.md +13 -10
package/skill/add/streams.md +69 -19
package/tooling/add.py +476 -34
package/tooling/templates/CONVENTIONS.md.tmpl +1 -1
package/tooling/templates/GLOSSARY.md.tmpl +23 -0
package/tooling/templates/MILESTONE.md.tmpl +1 -0
package/tooling/templates/PROJECT.md.tmpl +4 -3
package/tooling/templates/TASK.md.tmpl +33 -12

package/skill/add/run.md

@@ -1,25 +1,24 @@
 # The dynamic run — executing a locked scope
 Once a task's CONTRACT is frozen (phase 3), the scope is *locked*: the external shape will not move.
-That lock is ADD's autonomy seam — below it code is disposable; above it nothing breaks. This rubric
-covers what runs on the far side of the seam: the **build->verify half, executed as a dynamic,
-self-improving run** instead of a manual, sequential build. The human-led FRONT (Specify · Scenarios
-· Contract) still owns *direction*, but v7 compresses it to a **single human approval at the seam**
-(see "The one-approval front" below) — the AI drafts the whole front, a human approves it once.
+That lock is ADD's autonomy decision point — below it code is disposable; above it nothing breaks. This rubric
+covers what runs on the far side of the decision point: the **build->verify half, executed as a dynamic,
+self-improving run** instead of a manual, sequential build. The human-led **specification bundle** (Specify · Scenarios
+· Contract) still owns *direction*, but v7 compresses it to a **single human approval at the decision point**
+(see "The specification bundle" below) — the AI drafts the whole bundle, a human approves it once.
 > **Self-improving = within-run convergence + emit v5 deltas** — same definition as v5: tracked,
 > evidence-backed, never autonomous training. The run converges in-turn AND feeds the human-gated
-> fold loop (`deltas.md` · `fold.md`). The engine stays judgment-free: this is a rubric, not `add.py`.
+> consolidation loop (`deltas.md` · `fold.md`). The engine stays judgment-free: this is a rubric, not `add.py`.
-## The one-approval front (v7)
+## The specification bundle (v7)
-The human-led front used to be three separate approvals — Specify, then Scenarios, then the Contract
-freeze. v7 compresses it to **one**. From the user's input the AI **drafts the whole front as a single
-bundle** — the Spec, the Scenarios, the Contract, and the failing Tests — and presents it together. The
-human gives **one approval, at the frozen contract** (the seam). That single approval is the green light
+The specification bundle used to be three separate approvals — Specify, then Scenarios, then the Contract
+freeze. v7 compresses it to **one**. From the user's input the AI **drafts the whole specification bundle in one pass** — the Spec, the Scenarios, the Contract, and the failing Tests — and presents it together. The
+human gives **one approval, at the frozen contract** (the decision point). That single approval is the green light
 for the self-driving run.
-Why one approval and not zero: the contract freeze is the autonomy seam, and the seam **stays human**.
+Why one approval and not zero: the contract freeze is the autonomy decision point, and the decision point **stays human**.
 The AI *drafts* the contract but never *freezes its own* — a person approves the frozen shape before any
 auto-run touches code. This is exactly what keeps "never self-gate a human-led gate" true under an auto
 default: the one gate that remains is human. Drop it to zero and the AI would freeze the interface it
@@ -28,11 +27,11 @@ then builds against and self-gate the result — the circular trust v6's dogfood
 What the human is actually approving in that one gate: that the drafted Spec captures the real intent,
 that the Scenarios cover the cases that matter, and that the Contract shape is the one to freeze. Reject
 any part and the bundle goes back to draft — that is backward-correction (principle 4), not failure.
-Approve, and the run begins. The seam guide (`phases/3-contract.md`) carries the
+Approve, and the run begins. The decision-point guide (`phases/3-contract.md`) carries the
 **freeze review checklist** — six lines that walk the human through exactly this, ⚠-first.
-**The least-sure flag — aiming the one approval.** A single approval over a whole bundle invites a
-rubber stamp. So the AI presents the bundle **least-sure first**: of everything it is asking the human
+**The lowest-confidence flag — aiming the one approval.** A single approval over a whole bundle is easy to
+grant without reading. So the AI presents the bundle **lowest-confidence first**: of everything it is asking the human
 to freeze, it names the **1–2 points most likely to be wrong**, tagged by part
 (`⚠ [spec|scenario|contract|test] … — because …; if wrong: …`), each with *why* it is uncertain and
 *what it costs if wrong*. The §1 assumptions feed it, but a flag may equally point at an uncovered
@@ -40,7 +39,7 @@ scenario or the contract shape. If nothing is materially uncertain, the AI still
 biggest risk, however small — never a blank "none". Honest about its limit: the flag records that the
 human approved with the soft spots **in front of them**, eyes open; it makes a real review cheap and a
 lazy one visibly negligent, but it cannot *force* engagement — and the AI never asserts that the human
-engaged when it cannot know (a self-asserted gate would just be the rubber stamp one level up). Closing
+engaged when it cannot know (a self-asserted gate would just move the unread approval one level up). Closing
 that enforcement gap is the job of a CI checker, not of prose.
 ## When the run begins — the scope-lock trigger
@@ -50,17 +49,18 @@ The trigger is the **frozen contract**, nothing else. A run may start only when:
 - §3 CONTRACT is marked `FROZEN @ vN` (the shape is fixed), AND
 - §4 TESTS exist and are RED for the right reason (the target the run drives to green).
-No frozen contract -> no run: you are still on the human-led front, and starting early is the
+No frozen contract -> no run: you are still inside the specification bundle, and starting early is the
 forward-skip the flow forbids. The lock is what makes autonomous execution *safe* — the AI cannot
 drift the interface, because the interface is frozen above it.
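The scope-lock trigger can be sketched as a small predicate. This is an illustration of the rubric, not code from `add.py`: the `TaskState` type and its field names are assumptions, and it covers only the two conditions visible in this hunk.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskState:
    """Hypothetical snapshot of a task; every field name is an assumption."""
    contract_frozen_version: Optional[int]  # N in `FROZEN @ vN`, None if not frozen
    tests_exist: bool
    tests_red_for_right_reason: bool


def may_start_run(task: TaskState) -> bool:
    """A run may start only behind a frozen contract with RED tests."""
    if task.contract_frozen_version is None:
        # No frozen contract -> no run: still inside the specification bundle.
        return False
    return task.tests_exist and task.tests_red_for_right_reason
```

Whether tests are red "for the right reason" stays a human or resolver judgement; the sketch only records that judgement as a boolean.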
-## The touch-boundary — what the run may and may not touch
+## The change scope — what the run may and may not touch
+<constraints>
 A locked run has a hard boundary. It MAY:
-- write and rewrite **code** (`src/`) — code is disposable below the seam;
+- write and rewrite **code** (`src/`) — code is disposable below the decision point;
 - drive the **tests** to green WITHOUT weakening them (a weakened test is a method violation);
-- gather **evidence** for the verify gate (test output, blind-spot checks).
+- gather **evidence** for the verify gate (test output, non-functional review).
 It MUST NOT:
@@ -68,10 +68,11 @@ It MUST NOT:
   the run STOPS and hands back to a human to reopen Specify (principle 4). The run never re-locks
   scope on its own.
 - weaken, delete, or skip a **test** to make the build pass (that inverts the method).
-- touch the **human-led front artifacts** (§1–§3) except to halt and escalate.
+- touch the **specification-bundle artifacts** (§1–§3) except to halt and escalate.
+</constraints>
 Crossing the boundary is not a fast run; it is an unverified one. When the run hits something only the
-front can resolve, it stops — and that stop is the loop working, not failing.
+specification bundle can resolve, it stops — and that stop is the loop working, not failing.
 ## The dynamic run — fan-out and in-run convergence
@@ -83,21 +84,28 @@ on a trustworthy result with three loops:
   Stopping at the first green is how defects survive; the run stops only when the well runs dry.
 - **adversarial verify** — for every "done" claim, an independent skeptic tries to REFUTE it. The
   claim survives only if it withstands refutation, not because one pass looked plausible.
-- **completeness-critic** — a final pass that asks "what did we NOT cover — a scenario, a blind-spot,
+- **completeness-critic** — a final pass that asks "what did we NOT cover — a scenario, a non-functional risk,
   an unstated assumption?" Whatever it finds re-enters the run.
 The run ends only when the loops go dry AND the auto-gate's evidence is satisfied. This is the run
 **self-improving within the turn** — the same convergence the foundation loop runs across milestones,
 compressed into one task.
-## The evidence auto-gate
+## The automated quality gate
+<constraints>
 The verify gate may be resolved by **evidence** rather than by a person — when the evidence is
 sufficient and the result is recorded (principle 7, reframed: an automated, recorded pass is an
 explicit pass, not a skip).
 - **Auto-PASS requires ALL of:** every test green; coverage not decreased; no test weakened and no
-  contract edited; the convergence loops dry; the completeness-critic found nothing open.
+  contract edited; the convergence loops dry; the completeness-critic found nothing open; and the
+  deep check below recorded.
+- **The deep check (every gate, no skim).** Deep check — do not skim. If the task produced code, record
+  that every new symbol is referenced (wiring) and that no new dead/unused code was introduced. If it
+  produced prose or non-code, record a semantic read — what you read in full and what it confirmed.
+  Which path applies is the resolver's judgement; the engine never classifies. An unfilled deep check is
+  a **shallow verify**, not an auto-PASS — evidence the work is wired, not merely plausible.
 - **Always escalates to a human (never auto-passed):** any **security** finding (HARD-STOP, always);
   a **concurrency**/timing risk the tests cannot exercise; an **architecture**/layering violation; and
   any failing test. These are the residue principle 2 names — automation cannot judge them.
@@ -107,22 +115,24 @@ explicit pass, not a skip).
 The auto-gate NEVER writes a human signature it did not get. An auto-PASS is logged as *auto-resolved*,
 honestly — the line between a pass and a skip is the recorded outcome, not a forged name.
+</constraints>
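One way to read the gate rules above is as a decision function over recorded evidence. The sketch below is hypothetical: the `Evidence` fields, the finding names, and the `NOT-YET` label are assumptions made for illustration, and the rubric itself says this judgement does not live in `add.py`.

```python
from dataclasses import dataclass, field
from typing import List

# Findings the rubric says are never auto-passed.
ALWAYS_ESCALATE = {"security", "concurrency", "architecture"}


@dataclass
class Evidence:
    """Hypothetical evidence record; every field name is an assumption."""
    tests_green: bool
    coverage_not_decreased: bool
    no_test_weakened: bool
    contract_unedited: bool
    loops_dry: bool
    critic_nothing_open: bool
    deep_check_recorded: bool
    findings: List[str] = field(default_factory=list)


def resolve_verify(ev: Evidence) -> str:
    """Return an illustrative outcome label for the verify gate."""
    if "security" in ev.findings:
        return "HARD-STOP"  # security always stops, never auto-passed
    if not ev.tests_green or ALWAYS_ESCALATE.intersection(ev.findings):
        return "ESCALATE"  # residue that only a human may judge
    auto_pass = all([
        ev.coverage_not_decreased, ev.no_test_weakened, ev.contract_unedited,
        ev.loops_dry, ev.critic_nothing_open, ev.deep_check_recorded,
    ])
    # Incomplete evidence (for example an unfilled deep check) is not a pass.
    return "AUTO-PASS" if auto_pass else "NOT-YET"
```

The ordering matters: escalation checks run before the auto-PASS test, so a complete evidence record can never mask a security, concurrency, or architecture finding.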
 ## Emitting deltas — feeding the foundation back
 The completeness-critic does not discard what it finds. Every gap, surprise, or convention that helped
-or hurt becomes an **`open` competency delta** in the task's OBSERVE block, in the `deltas.md` grammar,
+or hurt becomes an **`open` lesson learned** in the task's OBSERVE block, in the `deltas.md` grammar,
 tagged by competency:
 - a finding the run FIXED but that taught the foundation something (a missing scenario -> `TDD`);
 - a finding the run could NOT fix — a residue escalation -> a delta AND the escalation to a human.
-These `open` deltas feed v5's human-gated fold (`fold.md`) at milestone close: the run emits `open`;
-the human folds. That is the loop closing — **v6 run -> v5 foundation** — so a dynamic run sharpens the
+These `open` deltas feed v5's human-gated consolidation (`fold.md`) at milestone close: the run emits `open`;
+the human consolidates. That is the loop closing — **v6 run -> v5 foundation** — so a dynamic run sharpens the
 five competencies instead of letting its findings evaporate at end-of-run.
-## The autonomy dial
+## The autonomy level
+<constraints>
 How much a run may auto-gate is a **per-scope setting**, not a global switch (principle 5: trust is
 earned per scope). A task declares its level in its `TASK.md` header:
@@ -138,23 +148,24 @@ autonomy: auto | conservative
 > **v7 reversal (recorded, not hidden).** Earlier the default was `conservative` and `auto` was the
 > earned exception; v7 flips this — `auto` is the default, `conservative` is the deliberate lowering.
-> What did **not** change is principle 5: the dial is still **per-scope**, the level still lives in the
+> What did **not** change is principle 5: the autonomy level is still **per-scope**, and it still lives in the
 > `TASK.md` header, and you still lower it anywhere risk demands. Only the starting point moved.
-**The high-risk guard — `auto` is refused where it matters most.** The dial is not a blank cheque. On a
+**The high-risk guard — `auto` is refused where it matters most.** The autonomy level is not a blank cheque. On a
 **high-risk or method-defining scope** — anything where a wrong-but-plausible result is expensive or
 hard to reverse (auth, money, data-loss paths, the method/trust-layer itself) — `auto` must be lowered
 to `conservative`; leaving it at `auto` there is the reject code **`unguarded_high_risk_auto`**. This
-closes the v6 dogfood blind-spot, where the whole milestone ran at `auto` on the riskiest possible
+closes the v6 dogfood gap, where the whole milestone ran at `auto` on the riskiest possible
 scope (defining the method) with no friction. The default is `auto` *for ordinary, well-tested scope*;
 high risk still earns a human gate.
 Judging *what* is high-risk stays human — the scope declares **`risk: high`** in the same `TASK.md`
-header where the dial lives, reviewed at the freeze like every header line (the engine never
+header where the autonomy level lives, reviewed at the freeze like every header line (the engine never
 classifies scope). **Since v14 the guard is mechanical for the declared case:**
 the engine refuses the declared combination — `add.py gate` will not complete (`PASS`/`RISK-ACCEPTED`) a task whose header
 carries `risk: high` without `autonomy: conservative` (error `unguarded_high_risk_auto`; `HARD-STOP`
 always records — stopping is never blocked), and `add.py audit` flags the same code on a finished
 record whose header was tampered or whose GATE RECORD reviewer is the auto-gate — which CI enforces
 (audit-ci). The honest limit mirrors the audit's: an **undeclared** high-risk scope passes; declaring
-is the human seam, the engine enforces what was declared.
+is the human decision point, the engine enforces what was declared.
+</constraints>
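The declared-case guard can be illustrated with a short check. This is a sketch under assumptions, not the actual `add.py gate` implementation: the function name, the dict-shaped header, and the fallback to `auto` when the header omits `autonomy` are all hypothetical.

```python
from typing import Dict, Optional


def gate_reject_code(header: Dict[str, str], outcome: str) -> Optional[str]:
    """Return a reject code, or None if the outcome may be recorded."""
    if outcome == "HARD-STOP":
        return None  # stopping is never blocked
    autonomy = header.get("autonomy", "auto")  # assumed default, per the v7 reversal
    if header.get("risk") == "high" and autonomy != "conservative":
        return "unguarded_high_risk_auto"
    return None
```

An undeclared high-risk scope passes this check, which matches the honest limit stated above: the guard enforces only what the header declares.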

package/skill/add/scope.md

@@ -20,7 +20,7 @@ scope drafting honors intake's classification — it never re-sizes a request:
 means one drafting pass, NOT auto-creation. Nothing is written to disk — single draft or the
 whole batch — until the human confirms. You propose; you wait.
-## Brainstorm before you draft — co-specify at milestone altitude
+## Brainstorm before you draft — co-specify at milestone level
 Don't draft a MILESTONE.md from thin input. Run the same three-move co-specify as a
 task's §1 (`phases/1-specify.md`) — Diverge (framings + open questions) → Converge
@@ -31,12 +31,14 @@ Draft the WHOLE milestone before showing; nothing hits disk until the human conf
 Diverge seeds (pick the live ones):
 - **Outcome** — done means a user can do *what* they can't today? (goal sentence)
 - **Edge of scope** — nearest thing assumed IN that you want OUT? (Out list)
-- **Riskiest seam** — which contract, if wrong, costs the most rework? (freeze-first)
+- **Riskiest decision point** — which contract, if wrong, costs the most rework? (freeze-first)
 - **Done-looks-like** — how do we SEE each outcome without reading code? (exit criteria)
 - **First slice** — which task unblocks the rest? (breadth-first order)
-Rank assumptions least-sure first; the top 1–2 get the flag the human reads at confirm:
-`⚠ <assumption> — least sure because <why>; if wrong: <cost>`.
+Rank assumptions lowest-confidence first; the top 1–2 get the flag the human reads at confirm:
+`⚠ <assumption> — lowest confidence because <why>; if wrong: <cost>`. Present the draft via
+`report-template.md` — open with the ARC (goal · done · plan): the goal this milestone serves,
+what is already covered, and the plan its task list lays out.
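The ranking and flag format can be sketched as follows. The `Assumption` type and its numeric confidence scale are hypothetical; the rubric only asks for an ordering and a flag on the top 1–2 items.

```python
from typing import List, NamedTuple


class Assumption(NamedTuple):
    """Hypothetical record; the 0.0-1.0 confidence scale is an assumption."""
    text: str
    confidence: float
    why: str
    cost_if_wrong: str


def confirm_flags(assumptions: List[Assumption], limit: int = 2) -> List[str]:
    """Rank lowest-confidence first and format the top `limit` as flags."""
    ranked = sorted(assumptions, key=lambda a: a.confidence)
    return [
        f"⚠ {a.text} — lowest confidence because {a.why}; if wrong: {a.cost_if_wrong}"
        for a in ranked[:limit]
    ]
```

The flags are for the human to read at confirm; nothing here decides whether the human actually engaged with them.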
 ## Drafting a good MILESTONE.md (section by section)
@@ -45,8 +47,8 @@ Rank assumptions least-sure first; the top 1–2 get the flag the human reads at
 - **Scope In/Out** — the explicit anti-creep deferral list. Naming what is OUT is as important
   as what is IN; an empty Out list usually means the scope is not yet thought through.
 - **Shared decisions & glossary deltas** — cross-cutting rules every task must honor, named from
-  the glossary. New terms get a glossary entry (the survivor layer stays honest).
-- **Shared / risky contracts to freeze first** — the seams between tasks; name the owning task.
+  the glossary. New terms get a glossary entry (the living documentation stays honest).
+- **Shared / risky contracts to freeze first** — the decision points between tasks; name the owning task.
 - **Tasks (breadth-first)** — `slug · depends-on · one line` each. Decompose by deliverable, not
   by phase; keep each task one-file-sized. Order by dependency, not by guesswork.
 - **Exit criteria** — observable, and **every exit criterion maps to a declared task slug**
@@ -54,6 +56,7 @@ Rank assumptions least-sure first; the top 1–2 get the flag the human reads at
 ## Reject codes (emit `{ reject, rationale }`, create nothing)
+<reject_codes>
 - `not_classified` — the request has not been through intake yet. Classify it first; you cannot
   draft scope for an unclassified request.
 - `dangling_criterion` — a drafted MILESTONE.md has an exit criterion that maps to no declared
@@ -61,6 +64,7 @@ Rank assumptions least-sure first; the top 1–2 get the flag the human reads at
   a malformed milestone. With no engine lint, you are the first check and the human is the backstop.
 - `no_milestone` — intake routed the request to `task` or `change-request`; scope drafting
   creates NO milestone. Honor the classification; do not invent milestone-sized scope.
+</reject_codes>
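Because the text notes there is no engine lint for this, the `dangling_criterion` check could be approximated by hand along these lines. The data shapes are assumptions: exit criteria are modelled as a dict from criterion text to the task slugs it maps to.

```python
from typing import Dict, List, Optional, Set


def find_dangling(criteria: Dict[str, List[str]], declared: Set[str]) -> Optional[dict]:
    """Return a `{reject, rationale}` dict for the first dangling criterion, else None."""
    for criterion, slugs in criteria.items():
        missing = [s for s in slugs if s not in declared]
        if not slugs or missing:
            detail = f"unknown slugs {missing}" if missing else "no task slug"
            return {
                "reject": "dangling_criterion",
                "rationale": f"exit criterion '{criterion}' maps to {detail}",
            }
    return None
```

Following the heading's rule, a caller that gets a reject back writes nothing to disk.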
 ## Worked example (from this repo's own history)

package/skill/add/setup-review.md

@@ -1,11 +1,11 @@
 # Setup review — the one page the human signs
-Autonomous setup ends at a single human gate: the **lock-down** (`add.py lock`). Before that
+Autonomous setup ends at a single human gate: the **baseline approval** (`add.py lock`). Before that
 signature is honest, the human needs to see *what you drafted and how sure you were* — not re-derive
 it. `SETUP-REVIEW.md` is that page: every decision you made while drafting the foundation, first-scope,
-and the first contract, **ordered least-sure-first** so the riskiest guesses meet their eye first.
+and the first contract, **ordered lowest-confidence-first** so the riskiest guesses meet their eye first.
-This is the setup-altitude analog of presenting a task's front least-sure-first at the contract freeze.
+This is the setup-level analog of presenting a task's specification bundle lowest-confidence-first at the contract freeze.
 The engine never reads this file — `add.py lock` is judgment-free, the signature *is* the gate (see
 `setup-lock-state`). The human **reading** this page is the review; your job is to make the reading honest.
@@ -13,7 +13,7 @@ The engine never reads this file — `add.py lock` is judgment-free, the signatu
 Write **one** artifact at `.add/SETUP-REVIEW.md`. **Never clobber a human-edited one** — if it already
 exists with hand edits, append/update, don't overwrite (the same non-clobber rule `init` applies to
-survivors). It is a per-onboarding, setup-altitude artifact; it sits beside `PROJECT.md`, not under a task.
+living docs). It is a per-onboarding, setup-level artifact; it sits beside `PROJECT.md`, not under a task.
 ## The template
@@ -27,14 +27,15 @@ survivors). It is a per-onboarding, setup-altitude artifact; it sits beside `PRO
 | 1 | <the drafted decision> | PROJECT.md \| scope \| first-contract | `guessed` | <the inference + why you had to guess> |
 | 2 | <…> | <…> | `evidence-grounded` | <cite the source file/line you read it from> |
-Sign: reviewed the above → `add.py lock --by "<name>"`
+Sign: confirm in chat → the agent runs `add.py lock --by "<name>"` (typing it yourself works too)
 ```
-Rows are numbered for reference at the gate ("row 1 is the one I'm least sure about").
+Rows are numbered for reference at the gate ("row 1 is where my confidence is lowest").
 ## The two rules that make it honest
-1. **Least-sure-first.** Order rows by confidence **ascending**. A `guessed` row always floats above an
+<constraints>
+1. **Lowest-confidence-first.** Order rows by confidence **ascending**. A `guessed` row always floats above an
    `evidence-grounded` one. The point is not completeness theatre — it is to spend the human's attention
    where it changes outcomes: the top of the table is the part they actually need to challenge.
@@ -45,13 +46,15 @@ Rows are numbered for reference at the gate ("row 1 is the one I'm least sure ab
      onboarding (a near-empty repo, only the 4-lens answers) produces these. These are what the human
      must check; that is why they sit on top.
-   The tag vocabulary is shared with `adopt.md` — the brownfield map tags each filled survivor decision
+   The tag vocabulary is shared with `adopt.md` — the brownfield map tags each filled living-doc decision
    `guessed`/`evidence-grounded`, and those tags flow straight into this table.
+</constraints>
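The ordering rule can be sketched as a stable sort on the confidence tag. The row shape and the key names (`confidence`, `n`) are assumptions, and the sketch handles only the two tags named in this hunk.

```python
from typing import Dict, List

# Lower rank sorts first: a `guessed` row always floats above an `evidence-grounded` one.
CONFIDENCE_RANK = {"guessed": 0, "evidence-grounded": 1}


def order_rows(rows: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Sort rows by confidence ascending and renumber them from 1."""
    ordered = sorted(rows, key=lambda r: CONFIDENCE_RANK[r["confidence"]])
    return [dict(row, n=i) for i, row in enumerate(ordered, start=1)]
```

Python's `sorted` is stable, so rows with the same tag keep the order in which they were drafted.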
 ## Where it ends
-`SETUP-REVIEW.md` is **read-only context** for the lock-down. You do not ask the human to approve it
-field-by-field; you present it, least-sure-first, and they sign once:
+`SETUP-REVIEW.md` is **read-only context** for the baseline approval. You do not ask the human to approve it
+field-by-field; you present it, lowest-confidence-first; they confirm in conversation, and you run the lock
+with their name:
 ```bash
 python3 .add/tooling/add.py lock --by "<name>"

package/skill/add/streams.md

@@ -11,9 +11,9 @@ orchestrator*, drive several tasks at once by reading the dependency DAG that
 ## The honest frame — this is pipelining, not N× speed
 With **one human reviewer** you cannot beat `review_time × N_tasks` (the human-led
-seams are serial — `docs/10-setup-and-stages.md:91`). So the win is **not throughput**:
+decision points are serial — `docs/10-setup-and-stages.md:91`). So the win is **not throughput**:
 it is that the reviewer is **never blocked waiting on a build**. While the human reviews
-task A's frozen front, the builds for B·C·D run behind *their* frozen contracts. You hide
+task A's frozen bundle, the builds for B·C·D run behind *their* frozen contracts. You hide
 build latency under human latency. Do not promise more than that.
 ## The two queues
@@ -24,33 +24,34 @@ Compute both from one `python3 .add/tooling/add.py status` — no new state:
   `deps=` task already shows `gate=PASS`. These are the only tasks a worker may pick up.
   A task with unmet deps stays queued; a task finishing PASS unblocks its dependents on
   the next `status`.
-- **REVIEW-QUEUE** — the irreducibly serial part: the **one-approval front** (contract
+- **REVIEW-QUEUE** — the irreducibly serial part: the **bundle approval** (contract
   freeze) and any **Verify escalation**. One human, one queue. Present these one at a
-  time, never in a batch the human will rubber-stamp.
+  time, never in a batch the human will approve without reading.
 ```
   add.py status ─► READY-QUEUE ──spawn workers──► builds run ──► REVIEW-QUEUE ──► done
-  (deps=PASS?)     (machine span)                 (concurrent)   (human seams,
+  (deps=PASS?)     (machine span)                 (concurrent)   (decision points,
        ▲                                                          strictly serial)
        └──────────────── a task gating PASS unblocks its dependents ──────────────┘
 ```
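The dependency condition for the READY-QUEUE can be sketched like this. The input shape is an assumption: the format of `add.py status` output is not shown here, so the sketch takes an already-parsed dict of `slug -> {"deps": [...], "gate": ...}`. Skipping tasks that already have a gate outcome is also an assumption.

```python
from typing import Dict, List


def ready_queue(tasks: Dict[str, dict]) -> List[str]:
    """Return slugs whose every dependency already shows gate=PASS."""
    ready = []
    for slug, task in tasks.items():
        if task["gate"] is not None:
            continue  # assumed: an already-gated task is not picked up again
        if all(tasks[dep]["gate"] == "PASS" for dep in task["deps"]):
            ready.append(slug)
    return ready
```

Recomputing this after each gate is what lets a task finishing PASS unblock its dependents on the next `status`.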
-## The dial is the throttle (not a new flag)
+## The autonomy level is the throttle (not a new flag)
 How much concurrency you actually get is set by each task's `autonomy:` header
 (`run.md`), not by this rubric:
 | `autonomy` (TASK.md) | What serializes on the human | Concurrency |
 |----------------------|------------------------------|-------------|
-| `conservative` | one-approval front **+** every Verify | pure pipelining — builds overlap, both gates queue |
-| `auto` (default) | one-approval front **only**; Verify auto-PASSes on evidence | real concurrency — only the seam + residue escalations queue |
+| `conservative` | bundle approval **+** every Verify | pure pipelining — builds overlap, both gates queue |
+| `auto` (default) | bundle approval **only**; Verify auto-PASSes on evidence | real concurrency — only the decision point + residue escalations queue |
 | `auto` but **high-risk** | refused → forced `conservative` (`unguarded_high_risk_auto`) | back to pipelining, by design |
-The irreducible floor is **one human approval per task at the contract seam** — the seam
+The irreducible floor is **one human approval per task at the contract decision point** — the decision point
 never drops to zero (`run.md:22`). That floor is correct; do not engineer around it.
 ## Who writes what — the hard boundary
+<constraints>
 - **You (orchestrator)** own all shared writes: `MILESTONE.md`, and every
   `add.py advance <slug>` / `add.py gate <outcome> <slug>` call. **Always pass the explicit
   `<slug>`** — `advance`/`gate`/`phase` all take an optional task slug and act on it
@@ -62,21 +63,70 @@ never drops to zero (`run.md:22`). That floor is correct; do not engineer around
 - **Isolation**: spawn each worker with `isolation="worktree"` so concurrent builds
   cannot collide. The worktree is discarded on failure; the task resets to its last-good
   phase.
+</constraints>
 ## Design for failure (required)
 - **Fresh worktree base (verify base == HEAD)** — create each worker's worktree from current
-  `HEAD` **after** you commit the task's frozen front (spec · scenarios · contract · tests). A
+  `HEAD` **after** you commit the task's frozen specification bundle (spec · scenarios · contract · tests). A
   worktree forked from a stale base forces the worker to recreate the frozen artifacts by hand
   (the v10 dogfood hit exactly this). Before the worker starts, confirm `git -C <worktree>
   rev-parse HEAD` equals the orchestrator's `HEAD`; if it drifted, `git merge` the base in first.
-- **Lease + timeout** — record which worker holds which task; if a worker dies, release
-  the claim back to READY (re-spawn, do not assume partial work is sound).
+- **Lease + timeout** — record which worker holds which task (in the wave ledger, below);
+  if a worker dies, release the claim back to READY (re-spawn, do not assume partial work is sound).
 - **Failure isolates** — a worker that hits a STOP-and-escalate (below) blocks only its
   own task. Siblings keep running; the escalation joins the REVIEW-QUEUE.
 - **Circuit-breaker** — if N workers fail in a wave, stop fanning out and fall back to
   sequential. Repeated failure means the scope was wrong, not the parallelism.
+## Wave ledger — the wave's resume point
+A single task resumes from `state.json`; a wave used to resume from nothing — the
+task ↔ lease ↔ fork-base ↔ autonomy ↔ merge-order mapping lived only in the orchestrator's
+chat context, and the v12-1 recurrence proved that discipline without an artifact fails
+(the base check existed in prose and never ran). The ledger fixes both: it is the file you
+re-orient from, and its evidence cells cannot be filled without executing the checks.
+**The file** — `.add/milestones/<m>/WAVE.md`, orchestrator-owned like `MILESTONE.md` and
+`state.json`. ONE live wave per milestone at a time; opening a second while one is live is
+refused (`wave_already_live`). **Workers never read WAVE.md** — the orchestrator copies the
+relevant mid-wave decisions into each worker's PROMPT.md at spawn/respawn, so the worker
+contract below stays unchanged and no worker widens into sibling state.
+```markdown
+# WAVE.md — transient wave ledger (orchestrator-owned · one live wave per milestone)
+wave: <n> · opened: <date> · status: live|merging
+base: <orchestrator HEAD at spawn — the sha every fork must equal>
+### Roster (lease ledger)
+| task   | lease (worker) | fork-base (pasted)                          | autonomy | spawned | timeout |
+|--------|----------------|---------------------------------------------|----------|---------|---------|
+| <slug> | wt-a           | <paste `git -C <wt> rev-parse HEAD` output> | auto     | <time>  | <dur>   |
+### Mid-wave decisions
+- <date> <decision a later or respawned worker must honor — copy it into that worker's PROMPT.md>
+### Merge order (serial; integration Verify per merge)
+1. <slug> → 2. <slug>
+```
+**Evidence cells, not ticks.** The fork-base cell holds the PASTED output of
+`git -C <worktree> rev-parse HEAD`, and it must equal `base:`. A tick is not evidence; a row
+you can only fill by running the command is the fresh-worktree-base check EXECUTING — the
+v12-1 lesson (words-exist ≠ method-works) closed structurally. Spawning a worker whose roster
+row lacks that evidence is refused (`unverified_fork_base`).
+**Lifecycle — open → consume → digest → delete.** Open the ledger when the first worker
+spawns. The serial integration Verify consumes it (the merge order is read from it, one
+worktree at a time). At wave close, absorb the evidence digest — wave base · roster→fork-base
+evidence · merge order · integration-Verify outcome — into `MILESTONE.md` as an append-only
+`## Wave log` block (this is the integration-Verify *record*, previously homeless), and only
+then remove the file. Removing WAVE.md before the digest is absorbed is refused
+(`digest_not_absorbed`) — the proof the checks ran must outlive the file.
+**Resume rule.** On session start, a live WAVE.md is the wave's resume point: re-orient from
+the file — roster, bases, decisions, merge order — never from conversational memory.
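The evidence-cell rule can be sketched as a check over the roster. This is a hypothetical helper, not part of `add.py`: the roster is modelled as `slug -> pasted fork-base cell`, and `base_drift` is an illustrative label rather than a reject code from the source.

```python
import re
from typing import Dict, List, Tuple

# `git rev-parse HEAD` prints a full object id: 40 hex chars (SHA-1) or 64 (SHA-256).
HEX_SHA = re.compile(r"[0-9a-f]{40}|[0-9a-f]{64}")


def check_roster(base: str, roster: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (slug, problem) pairs for rows that must not be spawned."""
    problems = []
    for slug, cell in roster.items():
        pasted = cell.strip()
        if not HEX_SHA.fullmatch(pasted):
            # A tick or an empty cell is not evidence that the command ran.
            problems.append((slug, "unverified_fork_base"))
        elif pasted != base:
            # Evidence exists but the fork drifted from the wave base.
            problems.append((slug, "base_drift"))
    return problems
```

A pasted SHA shows the command was run against some worktree; it does not by itself prove which one, so the orchestrator still owns that discipline.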
 ## Merge is serial — integration Verify
 Parallel build, **serial integration**. After workers return, you merge the worktrees
@@ -85,8 +135,8 @@ checks that `run.md:102` says automation cannot judge. Two green tasks in isolat
 still conflict when merged; this step is where that surfaces. Never auto-pass it.
 Each worktree carries a full copy of `.add/`. Merge back **only** `src/`, `tests/`, and the
-worker's own `.add/tasks/<slug>/` (TASK.md · SUMMARY.md) — `.add/state.json` and
-`MILESTONE.md` stay orchestrator-owned, or a parallel merge will drag stale state back.
+worker's own `.add/tasks/<slug>/` (TASK.md · SUMMARY.md) — `.add/state.json`, `MILESTONE.md`,
+and the live `WAVE.md` stay orchestrator-owned, or a parallel merge will drag stale state back.
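One way to enforce the merge-back rule is an allow-list over worktree-relative paths. The sketch below is hypothetical (the function name `is_mergeable` is not from the package) and assumes POSIX-style relative paths; anything outside `src/`, `tests/`, and the worker's own `.add/tasks/<slug>/` is rejected, which keeps orchestrator-owned files out by default.

```python
from pathlib import PurePosixPath


def is_mergeable(path: str, slug: str) -> bool:
    """True if a worktree-relative path may be merged back for task `slug`."""
    parts = PurePosixPath(path).parts
    # Reject empty paths and any attempt to climb out of the worktree.
    if not parts or ".." in parts:
        return False
    if parts[0] in ("src", "tests"):
        return True
    # Only files inside the worker's own task directory, never a sibling's.
    return len(parts) > 3 and parts[:3] == (".add", "tasks", slug)
```

For example, `is_mergeable(".add/state.json", "login")` is `False`, while `is_mergeable(".add/tasks/login/SUMMARY.md", "login")` is `True`.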
 ## The worker contract — portable across coding agents
@@ -107,7 +157,7 @@ changes. Fill every `{{...}}` per stream. The ADD-specific value is `<touch_boun
 Execute the LOCKED dynamic run for task '{{TASK_SLUG}}' in milestone {{MILESTONE}}:
 drive §4 TESTS red→green against the FROZEN contract {{CONTRACT_VERSION}}, converge, and
 resolve verify per autonomy={{AUTONOMY}}. You own ONLY the machine-led span — the two human
-seams (front approval · escalated Verify) are NOT yours.
+decision points (bundle approval · escalated Verify) are NOT yours.
 </objective>
 <persona>
@@ -126,7 +176,7 @@ Self-Eval; if any < 0.9, refine before returning.
 <touch_boundary>   <!-- from run.md:56-73; the worker's contract, identical on every runner -->
 MAY:  rewrite code in src/ · drive tests green WITHOUT weakening them · gather verify evidence.
 MUST NOT: edit the frozen CONTRACT or locked scope · weaken/delete/skip any test ·
-          touch §1–§3 front artifacts · write MILESTONE.md / state.json / any sibling stream.
+          touch §1–§3 bundle artifacts · write MILESTONE.md / state.json / any sibling stream.
 STOP-and-escalate (return your findings; do not decide):
   • a discovered scope/contract gap  → backward-correction, reopen Specify (principle 4)
   • any SECURITY finding              → HARD-STOP, always
@@ -156,7 +206,7 @@ ripgrep otherwise. Design every IO path for failure — timeouts, retries, rollb
 <return>   <!-- the worker PROPOSES; the orchestrator RECORDS. A worker never runs add.py. -->
 End with a structured verdict AND write the same into SUMMARY.md in the task dir:
 { task, outcome: PASS|RISK-ACCEPTED|HARD-STOP|ESCALATE, evidence: <tests+coverage>,
-  residue: [security|concurrency|architecture findings], deltas: [open competency deltas] }.
+  residue: [security|concurrency|architecture findings], deltas: [open lessons learned] }.
 Do NOT touch add.py or any shared file — the orchestrator gates on your verdict.
 </return>
 ```
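On the orchestrator side, the `<return>` verdict could be validated before anything is recorded. This sketch assumes the worker serializes the verdict as a JSON object with exactly the field names shown in the prompt; `parse_verdict` is a hypothetical helper, and the types of `evidence`, `residue`, and `deltas` are left unchecked because the source does not pin them down.

```python
import json

OUTCOMES = {"PASS", "RISK-ACCEPTED", "HARD-STOP", "ESCALATE"}
REQUIRED = ("task", "outcome", "evidence", "residue", "deltas")


def parse_verdict(raw: str) -> dict:
    """Parse a worker's verdict and reject anything malformed.

    Raises ValueError on invalid JSON, a missing field, or an unknown outcome.
    """
    verdict = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(verdict, dict):
        raise ValueError("verdict must be a JSON object")
    missing = [key for key in REQUIRED if key not in verdict]
    if missing:
        raise ValueError(f"verdict missing fields: {missing}")
    if verdict["outcome"] not in OUTCOMES:
        raise ValueError(f"unknown outcome: {verdict['outcome']!r}")
    return verdict
```

The orchestrator would then gate on `verdict["outcome"]` itself; the worker only proposes.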
@@ -169,7 +219,7 @@ The contract is identical whichever model runs it (the model is disposable, like
 | Tier | When | Claude Code | Any other runner |
 |------|------|-------------|------------------|
 | **mid** | ordinary, well-tested scope; clear contract | `sonnet` | the runner's balanced model |
-| **top** | complex / ambiguous / cross-cutting / large blast radius | `opus` | the runner's strongest reasoning model |
+| **top** | complex / ambiguous / cross-cutting / broad scope of impact | `opus` | the runner's strongest reasoning model |
 Two rules sit **above** model choice and never bend:
 - **High-risk ⇒ `conservative` autonomy, regardless of model** (`run.md` high-risk guard). A
@@ -186,7 +236,7 @@ worktree, then points the agent at that directory.
 |-----------|----------|----------------------------------|-----------------------------------------------|
 | spawn a worker | prompt + label | `Task(description=…, prompt=…)` | `cd $WT && <agent> run --prompt-file PROMPT.md` |
 | pick the model | tier → id | `model="opus"\|"sonnet"` | a `--model <id>` flag |
-| isolate | worktree | `isolation="worktree"` | `git worktree add $WT HEAD` (after committing the front; verify base == HEAD), then run inside it |
+| isolate | worktree | `isolation="worktree"` | `git worktree add $WT HEAD` (after committing the bundle; verify base == HEAD), then run inside it |
 | load context | files / cwd | `<context_files>` + repo cwd | run inside `$WT`; paths are relative |
 | domain expertise | skill / preamble | a Claude skill in `<expertise>` | a system-prompt / profile preamble |
 | return a verdict | structured | final message (optionally a schema) | stdout JSON the orchestrator parses |