@pilotspace/add 1.1.0 → 1.3.0

Files changed (61)
  1. package/CHANGELOG.md +81 -0
  2. package/GETTING-STARTED.md +187 -139
  3. package/README.md +13 -7
  4. package/bin/cli.js +96 -5
  5. package/docs/01-principles.md +3 -3
  6. package/docs/02-the-flow.md +19 -12
  7. package/docs/03-step-1-specify.md +15 -13
  8. package/docs/04-step-2-scenarios.md +2 -2
  9. package/docs/05-step-3-contract.md +3 -3
  10. package/docs/06-step-4-tests.md +10 -2
  11. package/docs/07-step-5-build.md +3 -1
  12. package/docs/08-step-6-verify.md +25 -5
  13. package/docs/09-the-loop.md +12 -6
  14. package/docs/10-setup-and-stages.md +27 -13
  15. package/docs/11-governance.md +6 -2
  16. package/docs/12-roles.md +3 -3
  17. package/docs/13-adoption.md +1 -1
  18. package/docs/14-foundation.md +15 -15
  19. package/docs/15-foundations-and-lineage.md +106 -0
  20. package/docs/README.md +4 -0
  21. package/docs/appendix-a-templates.md +3 -3
  22. package/docs/appendix-b-prompts.md +40 -5
  23. package/docs/appendix-c-glossary.md +49 -12
  24. package/docs/appendix-d-worked-example.md +2 -2
  25. package/docs/appendix-e-checklists.md +16 -4
  26. package/docs/appendix-f-requirements-matrix.md +8 -8
  27. package/docs/appendix-g-references.md +106 -0
  28. package/package.json +1 -1
  29. package/skill/add/SKILL.md +41 -38
  30. package/skill/add/adopt.md +13 -11
  31. package/skill/add/deltas.md +8 -6
  32. package/skill/add/fold.md +19 -17
  33. package/skill/add/graduate.md +74 -0
  34. package/skill/add/intake.md +22 -7
  35. package/skill/add/loop.md +59 -0
  36. package/skill/add/phases/0-ground.md +66 -0
  37. package/skill/add/phases/0-setup.md +32 -25
  38. package/skill/add/phases/1-specify.md +28 -13
  39. package/skill/add/phases/2-scenarios.md +14 -4
  40. package/skill/add/phases/3-contract.md +27 -12
  41. package/skill/add/phases/4-tests.md +15 -5
  42. package/skill/add/phases/5-build.md +33 -4
  43. package/skill/add/phases/6-verify.md +40 -2
  44. package/skill/add/phases/7-observe.md +13 -5
  45. package/skill/add/report-template.md +65 -7
  46. package/skill/add/run.md +93 -39
  47. package/skill/add/scope.md +10 -6
  48. package/skill/add/setup-review.md +13 -10
  49. package/skill/add/streams.md +88 -23
  50. package/tooling/add.py +1817 -90
  51. package/tooling/templates/CONVENTIONS.md.tmpl +1 -1
  52. package/tooling/templates/DESIGN.md.tmpl +66 -0
  53. package/tooling/templates/GLOSSARY.md.tmpl +29 -0
  54. package/tooling/templates/MILESTONE.md.tmpl +1 -0
  55. package/tooling/templates/PROJECT.md.tmpl +6 -3
  56. package/tooling/templates/TASK.md.tmpl +55 -15
  57. package/tooling/templates/catalog.sample.json +38 -0
  58. package/tooling/templates/prototype.sample.json +48 -0
  59. package/tooling/templates/tokens.sample.json +55 -0
  60. package/tooling/templates/udd-catalog.md +122 -0
  61. package/tooling/templates/udd-tokens.md +79 -0
@@ -11,9 +11,9 @@ orchestrator*, drive several tasks at once by reading the dependency DAG that
  ## The honest frame — this is pipelining, not N× speed

  With **one human reviewer** you cannot beat `review_time × N_tasks` (the human-led
- seams are serial — `docs/10-setup-and-stages.md:91`). So the win is **not throughput**:
+ decision points are serial — `docs/10-setup-and-stages.md:91`). So the win is **not throughput**:
  it is that the reviewer is **never blocked waiting on a build**. While the human reviews
- task A's frozen front, the builds for B·C·D run behind *their* frozen contracts. You hide
+ task A's frozen bundle, the builds for B·C·D run behind *their* frozen contracts. You hide
  build latency under human latency. Do not promise more than that.

  ## The two queues
@@ -24,33 +24,34 @@ Compute both from one `python3 .add/tooling/add.py status` — no new state:
  `deps=` task already shows `gate=PASS`. These are the only tasks a worker may pick up.
  A task with unmet deps stays queued; a task finishing PASS unblocks its dependents on
  the next `status`.
- - **REVIEW-QUEUE** — the irreducibly serial part: the **one-approval front** (contract
+ - **REVIEW-QUEUE** — the irreducibly serial part: the **bundle approval** (contract
  freeze) and any **Verify escalation**. One human, one queue. Present these one at a
- time, never in a batch the human will rubber-stamp.
+ time, never in a batch the human will approve without reading.

  ```
  add.py status ─► READY-QUEUE ──spawn workers──► builds run ──► REVIEW-QUEUE ──► done
-                  (deps=PASS?)  (machine span)   (concurrent)   (human seams,
+                  (deps=PASS?)  (machine span)   (concurrent)   (decision points,
        ▲                                                         strictly serial)
        └──────────────── a task gating PASS unblocks its dependents ──────────────┘
  ```
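
As an illustration of the READY-QUEUE rule quoted in this hunk, here is a minimal Python sketch. It assumes the `status` output has already been parsed into per-task records; the `Task` type, its field names, and `ready_queue` are hypothetical and are not part of `add.py`.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Task:
    # Hypothetical record; the real `add.py status` format is not shown in the diff.
    slug: str
    deps: list[str] = field(default_factory=list)
    gate: str = ""  # "PASS" once the task has passed its gate


def ready_queue(tasks: dict[str, Task]) -> list[str]:
    """Return slugs that have not passed yet and whose deps all show gate=PASS."""
    ready = []
    for slug, task in tasks.items():
        if task.gate == "PASS":
            continue  # already done, nothing to pick up
        # An unknown dependency counts as unmet, so the task stays queued.
        if all(d in tasks and tasks[d].gate == "PASS" for d in task.deps):
            ready.append(slug)
    return ready


tasks = {
    "a": Task("a", gate="PASS"),
    "b": Task("b", deps=["a"]),
    "c": Task("c", deps=["b"]),
    "d": Task("d"),
}
print(ready_queue(tasks))  # → ['b', 'd']
```

Recomputing the list after each gate outcome mirrors the "unblocks its dependents on the next `status`" behaviour: once `b` shows `PASS`, `c` becomes ready without any extra state.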

- ## The dial is the throttle (not a new flag)
+ ## The autonomy level is the throttle (not a new flag)

  How much concurrency you actually get is set by each task's `autonomy:` header
  (`run.md`), not by this rubric:

  | `autonomy` (TASK.md) | What serializes on the human | Concurrency |
  |----------------------|------------------------------|-------------|
- | `conservative` | one-approval front **+** every Verify | pure pipelining — builds overlap, both gates queue |
- | `auto` (default) | one-approval front **only**; Verify auto-PASSes on evidence | real concurrency — only the seam + residue escalations queue |
- | `auto` but **high-risk** | refused → forced `conservative` (`unguarded_high_risk_auto`) | back to pipelining, by design |
+ | `conservative` / `manual` | bundle approval **+** every Verify | pure pipelining — builds overlap, both gates queue (`manual` is the strict floor; same streams behaviour) |
+ | `auto` (default) | bundle approval **only**; Verify auto-PASSes on evidence | real concurrency — only the decision point + residue escalations queue |
+ | `auto` but **high-risk** | refused → must lower to `conservative` / `manual` (`unguarded_high_risk_auto`) | back to pipelining, by design |

- The irreducible floor is **one human approval per task at the contract seam** — the seam
+ The irreducible floor is **one human approval per task at the contract decision point** — the decision point
  never drops to zero (`run.md:22`). That floor is correct; do not engineer around it.
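
The added (`+`) rows of the table can be read as a small lookup plus one refusal. The Python sketch below is only an illustration of that reading: `human_gates`, `AutonomyRefused`, and the `high_risk` flag are hypothetical names, and how `add.py` actually detects high risk is not shown in this diff.

```python
# Which steps queue on the human for each autonomy level (per the "+" table rows).
HUMAN_GATES = {
    "conservative": ("bundle approval", "verify"),
    "manual": ("bundle approval", "verify"),
    "auto": ("bundle approval",),  # Verify auto-PASSes on evidence
}


class AutonomyRefused(ValueError):
    """Raised when the requested autonomy level is not allowed for the task."""


def human_gates(autonomy: str, high_risk: bool) -> tuple[str, ...]:
    """Return the human-serialized steps for one task, or refuse the combination."""
    if autonomy not in HUMAN_GATES:
        raise ValueError(f"unknown autonomy level: {autonomy!r}")
    if autonomy == "auto" and high_risk:
        # The new text refuses instead of silently forcing `conservative`.
        raise AutonomyRefused("unguarded_high_risk_auto")
    return HUMAN_GATES[autonomy]
```

Every branch returns at least the bundle approval, which matches the stated floor of one human approval per task.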

  ## Who writes what — the hard boundary

+ <constraints>
  - **You (orchestrator)** own all shared writes: `MILESTONE.md`, and every
  `add.py advance <slug>` / `add.py gate <outcome> <slug>` call. **Always pass the explicit
  `<slug>`** — `advance`/`gate`/`phase` all take an optional task slug and act on it
@@ -62,21 +63,83 @@ never drops to zero (`run.md:22`). That floor is correct; do not engineer around
  - **Isolation**: spawn each worker with `isolation="worktree"` so concurrent builds
  cannot collide. The worktree is discarded on failure; the task resets to its last-good
  phase.
+ </constraints>

  ## Design for failure (required)

  - **Fresh worktree base (verify base == HEAD)** — create each worker's worktree from current
- `HEAD` **after** you commit the task's frozen front (spec · scenarios · contract · tests). A
+ `HEAD` **after** you commit the task's frozen specification bundle (spec · scenarios · contract · tests). A
  worktree forked from a stale base forces the worker to recreate the frozen artifacts by hand
  (the v10 dogfood hit exactly this). Before the worker starts, confirm `git -C <worktree>
  rev-parse HEAD` equals the orchestrator's `HEAD`; if it drifted, `git merge` the base in first.
- - **Lease + timeout** record which worker holds which task; if a worker dies, release
- the claim back to READY (re-spawn, do not assume partial work is sound).
+ On a runner that creates each worktree **at spawn** from a pool (e.g. Claude Code), that pool can hand
+ out a STALE base, so the pre-spawn `rev-parse` evidence cell is unsatisfiable. The `unverified_fork_base`
+ check then **shifts** — it never skips: the worker's **step-0** syncs to base (`git merge` the orchestrator's
+ `HEAD`) and re-echoes `rev-parse HEAD`, which the orchestrator verifies at **merge-time**, before merge-back.
+ The pre-spawn check stays the DEFAULT for fresh-`HEAD`-worktree runners; the merge-time path is the additive
+ ALTERNATIVE for spawn-time runners — never a replacement of the pre-spawn rule.
+ **The engine executes this gate** (engine-merge-base-enforcement): run
+ `python3 .add/tooling/add.py wave-verify` before the first merge-back — it refuses a mismatched or
+ pending echo (`unverified_fork_base`) and an off-template ledger (`wave_ledger_malformed`, fail-closed);
+ `add.py check` is the standing monitor (red at `status: merging`, `fork_base_pending` WARN at `live`).
+ - **Lease + timeout** — record which worker holds which task (in the wave ledger, below);
+ if a worker dies, release the claim back to READY (re-spawn, do not assume partial work is sound).
  - **Failure isolates** — a worker that hits a STOP-and-escalate (below) blocks only its
  own task. Siblings keep running; the escalation joins the REVIEW-QUEUE.
  - **Circuit-breaker** — if N workers fail in a wave, stop fanning out and fall back to
  sequential. Repeated failure means the scope was wrong, not the parallelism.

+ ## Wave ledger — the wave's resume point
+
+ A single task resumes from `state.json`; a wave used to resume from nothing — the
+ task ↔ lease ↔ fork-base ↔ autonomy ↔ merge-order mapping lived only in the orchestrator's
+ chat context, and the v12-1 recurrence proved that discipline without an artifact fails
+ (the base check existed in prose and never ran). The ledger fixes both: it is the file you
+ re-orient from, and its evidence cells cannot be filled without executing the checks.
+
+ **The file** — `.add/milestones/<m>/WAVE.md`, orchestrator-owned like `MILESTONE.md` and
+ `state.json`. ONE live wave per milestone at a time; opening a second while one is live is
+ refused (`wave_already_live`). **Workers never read WAVE.md** — the orchestrator copies the
+ relevant mid-wave decisions into each worker's PROMPT.md at spawn/respawn, so the worker
+ contract below stays unchanged and no worker widens into sibling state.
+
+ ```markdown
+ # WAVE.md — transient wave ledger (orchestrator-owned · one live wave per milestone)
+ wave: <n> · opened: <date> · status: live|merging
+ base: <orchestrator HEAD at spawn — the sha every fork must equal>
+
+ ### Roster (lease ledger)
+ | task | lease (worker) | fork-base (pasted) | autonomy | spawned | timeout |
+ |--------|----------------|---------------------------------------------|----------|---------|---------|
+ | <slug> | wt-a | <paste `git -C <wt> rev-parse HEAD` output> | auto | <time> | <dur> |
+
+ ### Mid-wave decisions
+ - <date> <decision a later or respawned worker must honor — copy it into that worker's PROMPT.md>
+
+ ### Merge order (serial; integration Verify per merge)
+ 1. <slug> → 2. <slug>
+ ```
+
+ **Evidence cells, not ticks.** The fork-base cell holds the PASTED output of
+ `git -C <worktree> rev-parse HEAD`, and it must equal `base:`. A tick is not evidence; a row
+ you can only fill by running the command is the fresh-worktree-base check EXECUTING — the
+ v12-1 lesson (words-exist ≠ method-works) closed structurally. Spawning a worker whose roster
+ row lacks that evidence is refused (`unverified_fork_base`). On a spawn-time pool runner this
+ PRE-spawn paste is unsatisfiable (the pooled base is stale until the worker syncs), so the cell
+ instead holds the worker's **step-0** post-sync echo (still `== base:`) and the `unverified_fork_base`
+ refusal **shifts to merge-time**, before merge-back — it shifts, it never lifts.
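
The evidence-cell rule above amounts to comparing the pasted `rev-parse` output with the wave's `base:` sha. The Python sketch below illustrates that comparison under stated assumptions: `rev_parse_head` and `check_fork_base` are hypothetical helpers, and this is not how `add.py wave-verify` is implemented, only the same idea in miniature.

```python
import subprocess


def rev_parse_head(worktree: str) -> str:
    """Run `git -C <worktree> rev-parse HEAD` and return the sha it prints."""
    result = subprocess.run(
        ["git", "-C", worktree, "rev-parse", "HEAD"],
        capture_output=True,
        text=True,
        check=True,  # a failing git call raises instead of yielding empty evidence
    )
    return result.stdout.strip()


def check_fork_base(base: str, pasted: str) -> None:
    """Refuse when the roster's fork-base cell is empty or differs from `base:`."""
    evidence = pasted.strip()
    if not evidence or evidence != base.strip():
        # Refusal code taken from the diff text; raising an exception is an assumption.
        raise RuntimeError("unverified_fork_base")
```

A typical call would be `check_fork_base(base, rev_parse_head(worktree))`, run pre-spawn on fresh-`HEAD` runners or at merge-time on spawn-time pool runners. An unfilled template cell fails the equality test, so a placeholder can never pass as evidence.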
+
+ **Lifecycle — open → consume → digest → delete.** Open the ledger when the first worker
+ spawns. The serial integration Verify consumes it (the merge order is read from it, one
+ worktree at a time). At wave close, absorb the evidence digest — wave base · roster→fork-base
+ evidence · merge order · integration-Verify outcome — into `MILESTONE.md` as an append-only
+ `## Wave log` block (this is the integration-Verify *record*, previously homeless), and only
+ then remove the file. Removing WAVE.md before the digest is absorbed is refused
+ (`digest_not_absorbed`) — the proof the checks ran must outlive the file.
+
+ **Resume rule.** On session start, a live WAVE.md is the wave's resume point: re-orient from
+ the file — roster, bases, decisions, merge order — never from conversational memory.
+
  ## Merge is serial — integration Verify

  Parallel build, **serial integration**. After workers return, you merge the worktrees
@@ -85,8 +148,8 @@ checks that `run.md:102` says automation cannot judge. Two green tasks in isolat
  still conflict when merged; this step is where that surfaces. Never auto-pass it.

  Each worktree carries a full copy of `.add/`. Merge back **only** `src/`, `tests/`, and the
- worker's own `.add/tasks/<slug>/` (TASK.md · SUMMARY.md) — `.add/state.json` and
- `MILESTONE.md` stay orchestrator-owned, or a parallel merge will drag stale state back.
+ worker's own `.add/tasks/<slug>/` (TASK.md · SUMMARY.md) — `.add/state.json`, `MILESTONE.md`,
+ and the live `WAVE.md` stay orchestrator-owned, or a parallel merge will drag stale state back.
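
One way to picture the merge-back rule is as a path allow-list applied to each file a worktree wants to bring back. The Python sketch below is an assumption-laden illustration: `may_merge_back` is a hypothetical helper, and the diff does not say how the orchestrator actually filters paths.

```python
from pathlib import PurePosixPath


def may_merge_back(path: str, slug: str) -> bool:
    """True if a repo-relative path is one a worker's worktree may merge back."""
    p = PurePosixPath(path)
    # Reject absolute paths and ".." segments so nothing escapes the allow-list.
    if p.is_absolute() or ".." in p.parts:
        return False
    allowed = (
        PurePosixPath("src"),
        PurePosixPath("tests"),
        PurePosixPath(".add/tasks") / slug,  # the worker's own task dir only
    )
    return any(root == p or root in p.parents for root in allowed)
```

Under this filter `.add/state.json`, `MILESTONE.md`, and a live `WAVE.md` are all rejected because none of them sits under an allowed root, while `.add/tasks/<slug>/SUMMARY.md` passes for the owning worker only.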

  ## The worker contract — portable across coding agents

@@ -107,7 +170,7 @@ changes. Fill every `{{...}}` per stream. The ADD-specific value is `<touch_boun
  Execute the LOCKED dynamic run for task '{{TASK_SLUG}}' in milestone {{MILESTONE}}:
  drive §4 TESTS red→green against the FROZEN contract {{CONTRACT_VERSION}}, converge, and
  resolve verify per autonomy={{AUTONOMY}}. You own ONLY the machine-led span — the two human
- seams (front approval · escalated Verify) are NOT yours.
+ decision points (bundle approval · escalated Verify) are NOT yours.
  </objective>

  <persona>
@@ -126,12 +189,12 @@ Self-Eval; if any < 0.9, refine before returning.
  <touch_boundary> <!-- from run.md:56-73; the worker's contract, identical on every runner -->
  MAY: rewrite code in src/ · drive tests green WITHOUT weakening them · gather verify evidence.
  MUST NOT: edit the frozen CONTRACT or locked scope · weaken/delete/skip any test ·
- touch §1–§3 front artifacts · write MILESTONE.md / state.json / any sibling stream.
+ touch §1–§3 bundle artifacts · write MILESTONE.md / state.json / any sibling stream.
  STOP-and-escalate (return your findings; do not decide):
  • a discovered scope/contract gap → backward-correction, reopen Specify (principle 4)
  • any SECURITY finding → HARD-STOP, always
  • a concurrency/timing OR architecture/layering risk the tests cannot exercise
- • [include this bullet ONLY when autonomy=conservative] the verify gate itself — STOP for the human
+ • [include this bullet when autonomy is conservative OR manual — any lowered rung] the verify gate itself — STOP for the human
  Auto-PASS only if autonomy=auto AND: all tests green · coverage not decreased · no test weakened ·
  no contract edited · loops dry · completeness-critic clean · no residue above. Log it as
  auto-resolved, naming this run as owner — never forge a human signature.
@@ -154,9 +217,11 @@ ripgrep otherwise. Design every IO path for failure — timeouts, retries, rollb
  </tools>

  <return> <!-- the worker PROPOSES; the orchestrator RECORDS. A worker never runs add.py. -->
- End with a structured verdict AND write the same into SUMMARY.md in the task dir:
+ End with a structured verdict AND write the same into SUMMARY.md in the task dir, then
+ **commit SUMMARY.md + deltas.md** in the worktree (uncommitted worktree files survive only by
+ harness courtesy — commit them so the serial-integration merge-back carries your report):
  { task, outcome: PASS|RISK-ACCEPTED|HARD-STOP|ESCALATE, evidence: <tests+coverage>,
- residue: [security|concurrency|architecture findings], deltas: [open competency deltas] }.
+ residue: [security|concurrency|architecture findings], deltas: [open lessons learned] }.
  Do NOT touch add.py or any shared file — the orchestrator gates on your verdict.
  </return>
  ```
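
On the orchestrator side, the verdict shape in `<return>` lends itself to a strict parse before anything is gated. The Python sketch below assumes the worker emits the verdict as a JSON object; `parse_verdict` is a hypothetical helper, and the PASS-with-residue check is one reading of the "no residue above" auto-PASS condition, not a rule stated as such in the diff.

```python
import json

# Outcome values and field names are taken from the <return> block in the diff.
OUTCOMES = {"PASS", "RISK-ACCEPTED", "HARD-STOP", "ESCALATE"}
REQUIRED = ("task", "outcome", "evidence", "residue", "deltas")


def parse_verdict(raw: str) -> dict:
    """Parse a worker's verdict and reject anything off-shape (fail closed)."""
    verdict = json.loads(raw)
    if not isinstance(verdict, dict):
        raise ValueError("verdict must be a JSON object")
    missing = [key for key in REQUIRED if key not in verdict]
    if missing:
        raise ValueError(f"verdict missing fields: {missing}")
    outcome = verdict["outcome"]
    if not isinstance(outcome, str) or outcome not in OUTCOMES:
        raise ValueError(f"unknown outcome: {outcome!r}")
    for key in ("residue", "deltas"):
        if not isinstance(verdict[key], list):
            raise ValueError(f"{key} must be a list")
    # Assumption: a PASS that still carries residue contradicts the auto-PASS rule.
    if outcome == "PASS" and verdict["residue"]:
        raise ValueError("PASS verdict carries residue; escalate instead")
    return verdict
```

Failing closed on a malformed verdict keeps the worker in the proposing role: the orchestrator records a gate outcome only once the report is well-formed.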
@@ -169,10 +234,10 @@ The contract is identical whichever model runs it (the model is disposable, like
  | Tier | When | Claude Code | Any other runner |
  |------|------|-------------|------------------|
  | **mid** | ordinary, well-tested scope; clear contract | `sonnet` | the runner's balanced model |
- | **top** | complex / ambiguous / cross-cutting / large blast radius | `opus` | the runner's strongest reasoning model |
+ | **top** | complex / ambiguous / cross-cutting / broad scope of impact | `opus` | the runner's strongest reasoning model |

  Two rules sit **above** model choice and never bend:
- - **High-risk ⇒ `conservative` autonomy, regardless of model** (`run.md` high-risk guard). A
+ - **High-risk ⇒ a lowered rung (`conservative` or `manual`), regardless of model** (`run.md` high-risk guard). A
  stronger model does not buy back the human gate.
  - **Security residue always escalates** — no tier and no model auto-passes it.

@@ -186,7 +251,7 @@ worktree, then points the agent at that directory.
  |-----------|----------|----------------------------------|-----------------------------------------------|
  | spawn a worker | prompt + label | `Task(description=…, prompt=…)` | `cd $WT && <agent> run --prompt-file PROMPT.md` |
  | pick the model | tier → id | `model="opus"\|"sonnet"` | a `--model <id>` flag |
- | isolate | worktree | `isolation="worktree"` | `git worktree add $WT HEAD` (after committing the front; verify base == HEAD), then run inside it |
+ | isolate | worktree | `isolation="worktree"` | `git worktree add $WT HEAD` (after committing the bundle; verify base == HEAD), then run inside it |
  | load context | files / cwd | `<context_files>` + repo cwd | run inside `$WT`; paths are relative |
  | domain expertise | skill / preamble | a Claude skill in `<expertise>` | a system-prompt / profile preamble |
  | return a verdict | structured | final message (optionally a schema) | stdout JSON the orchestrator parses |