@pilotspace/add 1.0.0 → 1.2.0

Files changed (54)
  1. package/CHANGELOG.md +88 -0
  2. package/GETTING-STARTED.md +172 -84
  3. package/README.md +14 -8
  4. package/bin/cli.js +39 -38
  5. package/docs/01-principles.md +3 -3
  6. package/docs/02-the-flow.md +20 -13
  7. package/docs/03-step-1-specify.md +13 -13
  8. package/docs/04-step-2-scenarios.md +3 -1
  9. package/docs/05-step-3-contract.md +4 -2
  10. package/docs/06-step-4-tests.md +3 -1
  11. package/docs/07-step-5-build.md +1 -1
  12. package/docs/08-step-6-verify.md +22 -4
  13. package/docs/09-the-loop.md +25 -1
  14. package/docs/10-setup-and-stages.md +52 -9
  15. package/docs/11-governance.md +2 -2
  16. package/docs/12-roles.md +3 -3
  17. package/docs/13-adoption.md +3 -3
  18. package/docs/14-foundation.md +19 -11
  19. package/docs/15-foundations-and-lineage.md +106 -0
  20. package/docs/README.md +4 -0
  21. package/docs/appendix-a-templates.md +3 -3
  22. package/docs/appendix-b-prompts.md +40 -5
  23. package/docs/appendix-c-glossary.md +42 -12
  24. package/docs/appendix-d-worked-example.md +2 -2
  25. package/docs/appendix-e-checklists.md +2 -2
  26. package/docs/appendix-f-requirements-matrix.md +12 -11
  27. package/docs/appendix-g-references.md +106 -0
  28. package/package.json +5 -3
  29. package/skill/add/SKILL.md +50 -21
  30. package/skill/add/adopt.md +67 -0
  31. package/skill/add/deltas.md +20 -8
  32. package/skill/add/fold.md +19 -17
  33. package/skill/add/graduate.md +74 -0
  34. package/skill/add/intake.md +22 -7
  35. package/skill/add/loop.md +59 -0
  36. package/skill/add/phases/0-setup.md +92 -24
  37. package/skill/add/phases/1-specify.md +23 -13
  38. package/skill/add/phases/2-scenarios.md +14 -4
  39. package/skill/add/phases/3-contract.md +38 -9
  40. package/skill/add/phases/4-tests.md +29 -5
  41. package/skill/add/phases/5-build.md +14 -4
  42. package/skill/add/phases/6-verify.md +38 -4
  43. package/skill/add/phases/7-observe.md +13 -5
  44. package/skill/add/report-template.md +106 -0
  45. package/skill/add/run.md +53 -34
  46. package/skill/add/scope.md +24 -2
  47. package/skill/add/setup-review.md +65 -0
  48. package/skill/add/streams.md +256 -0
  49. package/tooling/add.py +1388 -62
  50. package/tooling/templates/CONVENTIONS.md.tmpl +1 -1
  51. package/tooling/templates/GLOSSARY.md.tmpl +23 -0
  52. package/tooling/templates/MILESTONE.md.tmpl +1 -0
  53. package/tooling/templates/PROJECT.md.tmpl +4 -3
  54. package/tooling/templates/TASK.md.tmpl +39 -11
@@ -0,0 +1,256 @@
1
+ # Parallel streams — pipelining independent tasks
2
+
3
+ Load this **only** when a milestone has more than one task and you want to run them
4
+ concurrently. The default ADD path is one task at a time; this rubric is the opt-in
5
+ escape hatch for when independent tasks are queued and a human is ready to review.
6
+
7
+ It changes **no `add.py` code and no phase semantics**. It is a way *you, the
8
+ orchestrator*, drive several tasks at once by reading the dependency DAG that
9
+ `add.py status` already prints, and spawning one worker per ready task.
10
+
11
+ ## The honest frame — this is pipelining, not N× speed
12
+
13
+ With **one human reviewer** you cannot beat `review_time × N_tasks` (the human-led
14
+ decision points are serial — `docs/10-setup-and-stages.md:91`). So the win is **not throughput**:
15
+ it is that the reviewer is **never blocked waiting on a build**. While the human reviews
16
+ task A's frozen bundle, the builds for B·C·D run behind *their* frozen contracts. You hide
17
+ build latency under human latency. Do not promise more than that.
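A crude timing model makes this frame concrete. The sketch below is illustrative only: it assumes uniform build and review durations, one reviewer, and that only the bundle approval serializes on the human (the `auto` case). The function names and the numbers in the usage note are assumptions, not part of ADD.

```python
def sequential_minutes(n_tasks: int, build: float, review: float) -> float:
    """One task at a time: each review is followed by its build before the next starts."""
    return n_tasks * (review + build)


def pipelined_minutes(n_tasks: int, build: float, review: float) -> float:
    """Reviews stay strictly serial; each build starts as soon as its bundle is approved.

    Task i is approved at i * review and built by i * review + build, so the
    wave ends when the last-approved task finishes building.
    """
    return n_tasks * review + build


def review_floor(n_tasks: int, review: float) -> float:
    """The bound that no amount of parallelism beats with a single reviewer."""
    return n_tasks * review
```

With four tasks, 30-minute builds and 20-minute reviews, this model gives 200 minutes sequentially and 110 pipelined, against a floor of 80. The saving is the hidden build time; the review time is untouched.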
18
+
19
+ ## The two queues
20
+
21
+ Compute both from one `python3 .add/tooling/add.py status` — no new state:
22
+
23
+ - **READY-QUEUE** — tasks in the active milestone where `phase ≠ done` **and** every
24
+ `deps=` task already shows `gate=PASS`. These are the only tasks a worker may pick up.
25
+ A task with unmet deps stays queued; a task finishing PASS unblocks its dependents on
26
+ the next `status`.
27
+ - **REVIEW-QUEUE** — the irreducibly serial part: the **bundle approval** (contract
28
+ freeze) and any **Verify escalation**. One human, one queue. Present these one at a
29
+ time, never in a batch the human will approve without reading.
30
+
31
+ ```
32
+ add.py status ─► READY-QUEUE ──spawn workers──► builds run ──► REVIEW-QUEUE ──► done
33
+                  (deps=PASS?)  (machine span)  (concurrent)   (decision points,
34
+       ▲                                                        strictly serial)
35
+       └──────────────── a task gating PASS unblocks its dependents ──────────────┘
36
+ ```
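The READY-QUEUE rule above can be sketched as a small filter. This is a minimal sketch: the `Task` record is a hypothetical in-memory shape for what the orchestrator reads off `add.py status`, not the tool's real output format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Task:
    # Hypothetical record; field names mirror the phase / gate / deps= columns.
    slug: str
    phase: str
    gate: Optional[str] = None
    deps: List[str] = field(default_factory=list)


def ready_queue(tasks: List[Task]) -> List[str]:
    """Return slugs whose phase is not done and whose deps all show gate=PASS."""
    passed = {t.slug for t in tasks if t.gate == "PASS"}
    return [
        t.slug
        for t in tasks
        if t.phase != "done" and all(dep in passed for dep in t.deps)
    ]
```

Recomputing the list after every gate is what lets a task finishing PASS unblock its dependents without any extra state.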
37
+
38
+ ## The autonomy level is the throttle (not a new flag)
39
+
40
+ How much concurrency you actually get is set by each task's `autonomy:` header
41
+ (`run.md`), not by this rubric:
42
+
43
+ | `autonomy` (TASK.md) | What serializes on the human | Concurrency |
44
+ |----------------------|------------------------------|-------------|
45
+ | `conservative` | bundle approval **+** every Verify | pure pipelining — builds overlap, both gates queue |
46
+ | `auto` (default) | bundle approval **only**; Verify auto-PASSes on evidence | real concurrency — only the decision point + residue escalations queue |
47
+ | `auto` but **high-risk** | refused → forced `conservative` (`unguarded_high_risk_auto`) | back to pipelining, by design |
48
+
49
+ The irreducible floor is **one human approval per task at the contract decision point**: that
50
+ approval is never skipped (`run.md:22`). That floor is correct; do not engineer around it.
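The table reduces to two small rules, sketched here under stated assumptions: the function names are hypothetical, and the real guard lives in `run.md` and `add.py`, not in this code.

```python
from typing import List

VALID_AUTONOMY = ("auto", "conservative")


def effective_autonomy(declared: str, high_risk: bool) -> str:
    """Apply the high-risk guard: `auto` on a high-risk task is forced to `conservative`."""
    if declared not in VALID_AUTONOMY:
        raise ValueError(f"unknown autonomy: {declared!r}")
    if declared == "auto" and high_risk:
        return "conservative"
    return declared


def human_gates(declared: str, high_risk: bool) -> List[str]:
    """List what always serializes on the human for one task.

    Bundle approval is always present. Residue escalations under `auto`
    also queue, but only when they occur, so they are not listed here.
    """
    gates = ["bundle approval"]
    if effective_autonomy(declared, high_risk) == "conservative":
        gates.append("verify")
    return gates
```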
51
+
52
+ ## Who writes what — the hard boundary
53
+
54
+ <constraints>
55
+ - **You (orchestrator)** own all shared writes: `MILESTONE.md`, and every
56
+ `add.py advance <slug>` / `add.py gate <outcome> <slug>` call. **Always pass the explicit
57
+ `<slug>`** — `advance`/`gate`/`phase` all take an optional task slug and act on it
58
+ (`add.py` `_resolve_task`); omitting it falls back to the single `active_task`, which
59
+ races once more than one stream is live. Name the task every time. Workers never run these.
60
+ - **A worker** owns only its own `.add/tasks/<slug>/` — it builds `src/`, drives the
61
+ tests green, gathers evidence, and writes `SUMMARY.md` + OBSERVE deltas. It touches
62
+ **no sibling stream and no shared file**.
63
+ - **Isolation**: spawn each worker with `isolation="worktree"` so concurrent builds
64
+ cannot collide. The worktree is discarded on failure; the task resets to its last-good
65
+ phase.
66
+ </constraints>
67
+
68
+ ## Design for failure (required)
69
+
70
+ - **Fresh worktree base (verify base == HEAD)** — create each worker's worktree from current
71
+ `HEAD` **after** you commit the task's frozen specification bundle (spec · scenarios · contract · tests). A
72
+ worktree forked from a stale base forces the worker to recreate the frozen artifacts by hand
73
+ (the v10 dogfood hit exactly this). Before the worker starts, confirm `git -C <worktree>
74
+ rev-parse HEAD` equals the orchestrator's `HEAD`; if it drifted, `git merge` the base in first.
75
+ - **Lease + timeout** — record which worker holds which task (in the wave ledger, below);
76
+ if a worker dies, release the claim back to READY (re-spawn, do not assume partial work is sound).
77
+ - **Failure isolates** — a worker that hits a STOP-and-escalate (below) blocks only its
78
+ own task. Siblings keep running; the escalation joins the REVIEW-QUEUE.
79
+ - **Circuit-breaker** — if N workers fail in a wave, stop fanning out and fall back to
80
+ sequential. Repeated failure means the scope was wrong, not the parallelism.
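Two of these checks are mechanical enough to sketch. The helper names below are hypothetical, and the circuit-breaker threshold is an assumption the orchestrator would choose; only `git -C <path> rev-parse HEAD` comes from the text above.

```python
import subprocess
from typing import Callable


def git_head(path: str) -> str:
    """Return the commit sha that HEAD points to in the repo or worktree at `path`."""
    result = subprocess.run(
        ["git", "-C", path, "rev-parse", "HEAD"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def fork_base_matches(
    worktree: str,
    orchestrator_repo: str,
    head_of: Callable[[str], str] = git_head,
) -> bool:
    """True when the worker's worktree sits on the orchestrator's current HEAD."""
    return head_of(worktree) == head_of(orchestrator_repo)


def should_fall_back(failures_in_wave: int, threshold: int) -> bool:
    """Circuit-breaker: stop fanning out once failures reach the chosen threshold."""
    return failures_in_wave >= threshold
```

`head_of` is injectable so the comparison can be exercised without a real repository; in use, the default calls git directly.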
81
+
82
+ ## Wave ledger — the wave's resume point
83
+
84
+ A single task resumes from `state.json`; a wave used to resume from nothing — the
85
+ task ↔ lease ↔ fork-base ↔ autonomy ↔ merge-order mapping lived only in the orchestrator's
86
+ chat context, and the v12-1 recurrence proved that discipline without an artifact fails
87
+ (the base check existed in prose and never ran). The ledger fixes both: it is the file you
88
+ re-orient from, and its evidence cells cannot be filled without executing the checks.
89
+
90
+ **The file** — `.add/milestones/<m>/WAVE.md`, orchestrator-owned like `MILESTONE.md` and
91
+ `state.json`. ONE live wave per milestone at a time; opening a second while one is live is
92
+ refused (`wave_already_live`). **Workers never read WAVE.md** — the orchestrator copies the
93
+ relevant mid-wave decisions into each worker's PROMPT.md at spawn/respawn, so the worker
94
+ contract below stays unchanged and no worker widens into sibling state.
95
+
96
+ ```markdown
97
+ # WAVE.md — transient wave ledger (orchestrator-owned · one live wave per milestone)
98
+ wave: <n> · opened: <date> · status: live|merging
99
+ base: <orchestrator HEAD at spawn — the sha every fork must equal>
100
+
101
+ ### Roster (lease ledger)
102
+ | task | lease (worker) | fork-base (pasted) | autonomy | spawned | timeout |
103
+ |--------|----------------|---------------------------------------------|----------|---------|---------|
104
+ | <slug> | wt-a | <paste `git -C <wt> rev-parse HEAD` output> | auto | <time> | <dur> |
105
+
106
+ ### Mid-wave decisions
107
+ - <date> <decision a later or respawned worker must honor — copy it into that worker's PROMPT.md>
108
+
109
+ ### Merge order (serial; integration Verify per merge)
110
+ 1. <slug> → 2. <slug>
111
+ ```
112
+
113
+ **Evidence cells, not ticks.** The fork-base cell holds the PASTED output of
114
+ `git -C <worktree> rev-parse HEAD`, and it must equal `base:`. A tick is not evidence; a row
115
+ you can only fill by running the command is the fresh-worktree-base check EXECUTING — the
116
+ v12-1 lesson (words-exist ≠ method-works) closed structurally. Spawning a worker whose roster
117
+ row lacks that evidence is refused (`unverified_fork_base`).
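A minimal sketch of what "evidence, not a tick" could mean mechanically. It assumes `base:` is recorded as a full commit sha; the function name is hypothetical, and the real refusal is `unverified_fork_base` inside ADD's own tooling.

```python
import re

# Full object names as printed by `git rev-parse HEAD`:
# 40 hex characters for SHA-1 repositories, 64 for SHA-256 repositories.
FULL_SHA = re.compile(r"[0-9a-f]{40}|[0-9a-f]{64}")


def fork_base_evidence_ok(cell: str, base: str) -> bool:
    """Accept a roster cell only if it is a full sha and equals the wave base."""
    value = cell.strip()
    return FULL_SHA.fullmatch(value) is not None and value == base.strip()
```

A tick, an empty cell, or the unfilled template placeholder all fail the pattern, so the row can only pass once the command's output has been pasted in.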
118
+
119
+ **Lifecycle — open → consume → digest → delete.** Open the ledger when the first worker
120
+ spawns. The serial integration Verify consumes it (the merge order is read from it, one
121
+ worktree at a time). At wave close, absorb the evidence digest — wave base · roster→fork-base
122
+ evidence · merge order · integration-Verify outcome — into `MILESTONE.md` as an append-only
123
+ `## Wave log` block (this is the integration-Verify *record*, previously homeless), and only
124
+ then remove the file. Removing WAVE.md before the digest is absorbed is refused
125
+ (`digest_not_absorbed`) — the proof the checks ran must outlive the file.
126
+
127
+ **Resume rule.** On session start, a live WAVE.md is the wave's resume point: re-orient from
128
+ the file — roster, bases, decisions, merge order — never from conversational memory.
129
+
130
+ ## Merge is serial — integration Verify
131
+
132
+ Parallel build, **serial integration**. After workers return, you merge the worktrees
133
+ one at a time and run the **integration** Verify — the concurrency / architecture / layering
134
+ checks that `run.md:102` says automation cannot judge. Two green tasks in isolation can
135
+ still conflict when merged; this step is where that surfaces. Never auto-pass it.
136
+
137
+ Each worktree carries a full copy of `.add/`. Merge back **only** `src/`, `tests/`, and the
138
+ worker's own `.add/tasks/<slug>/` (TASK.md · SUMMARY.md) — `.add/state.json`, `MILESTONE.md`,
139
+ and the live `WAVE.md` stay orchestrator-owned, or a parallel merge will drag stale state back.
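The merge-back rule can be expressed as an allowlist over repo-relative paths. This is a sketch under assumptions: the function name is hypothetical, paths are POSIX-style and relative to the worktree root, and the whole `.add/tasks/<slug>/` directory is allowed rather than only TASK.md and SUMMARY.md.

```python
from pathlib import PurePosixPath


def may_merge_back(path: str, slug: str) -> bool:
    """True if a worktree file may be merged back for the task `slug`."""
    p = PurePosixPath(path)
    # Reject absolute paths and parent traversal before checking prefixes.
    if p.is_absolute() or ".." in p.parts:
        return False
    allowed_roots = (
        PurePosixPath("src"),
        PurePosixPath("tests"),
        PurePosixPath(".add/tasks") / slug,
    )
    return any(p == root or root in p.parents for root in allowed_roots)
```

Everything else under `.add/`, including `state.json`, `MILESTONE.md`, `WAVE.md`, and sibling task directories, falls outside the allowlist and stays with the orchestrator.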
140
+
141
+ ## The worker contract — portable across coding agents
142
+
143
+ A worker **is** the dynamic run (`run.md`) for one task. Keep two things separate:
144
+
145
+ - **The contract** (below) — the prompt. It is **agent-agnostic**: it names no vendor tool,
146
+ no model, no spawn API. It is a durable ADD artifact, like the spec and the tests.
147
+ - **The adapter** (next sections) — the thin, swappable mapping that tells *one* runner
148
+ (Claude Code · Codex · opencode · pi-mono · any CLI agent) how to launch the contract.
149
+
150
+ This split is the whole point: the same frozen contract runs on any agent; only the adapter
151
+ changes. Fill every `{{...}}` per stream. The ADD-specific value is `<touch_boundary>` + the
152
+ "return a verdict, never write shared state" rule — they are identical on every runner.
153
+
154
+ ```xml
155
+ <!-- PROMPT.md — dropped into the worker's worktree, or passed inline. No runner-specific tokens. -->
156
+ <objective>
157
+ Execute the LOCKED dynamic run for task '{{TASK_SLUG}}' in milestone {{MILESTONE}}:
158
+ drive §4 TESTS red→green against the FROZEN contract {{CONTRACT_VERSION}}, converge, and
159
+ resolve verify per autonomy={{AUTONOMY}}. You own ONLY the machine-led span — the two human
160
+ decision points (bundle approval · escalated Verify) are NOT yours.
161
+ </objective>
162
+
163
+ <persona>
164
+ You are a {{DOMAIN}} engineer with 15 years building {{DOMAIN_DETAIL}}.
165
+ A wrong-but-plausible result here is expensive; correctness over speed.
166
+ Work step by step:
167
+ 1. Load the context files. Confirm the start gate: §3 CONTRACT FROZEN @ {{CONTRACT_VERSION}}
168
+ AND §4 TESTS RED for the right reason. If not → STOP and escalate (forward-skip forbidden).
169
+ 2. Build in small batches in src/ until the red tests pass — never weaken or skip a test.
170
+ 3. Converge: loop-until-dry · adversarial-verify every 'done' claim · completeness-critic.
171
+ 4. Resolve verify per the boundary. Write SUMMARY.md + OBSERVE deltas (deltas.md grammar).
172
+ Score confidence (0-1) on Completeness · Clarity · Practicality · Optimization · EdgeCases ·
173
+ Self-Eval; if any < 0.9, refine before returning.
174
+ </persona>
175
+
176
+ <touch_boundary> <!-- from run.md:56-73; the worker's contract, identical on every runner -->
177
+ MAY: rewrite code in src/ · drive tests green WITHOUT weakening them · gather verify evidence.
178
+ MUST NOT: edit the frozen CONTRACT or locked scope · weaken/delete/skip any test ·
179
+ touch §1–§3 bundle artifacts · write MILESTONE.md / state.json / any sibling stream.
180
+ STOP-and-escalate (return your findings; do not decide):
181
+ • a discovered scope/contract gap → backward-correction, reopen Specify (principle 4)
182
+ • any SECURITY finding → HARD-STOP, always
183
+ • a concurrency/timing OR architecture/layering risk the tests cannot exercise
184
+ • [include this bullet ONLY when autonomy=conservative] the verify gate itself — STOP for the human
185
+ Auto-PASS only if autonomy=auto AND: all tests green · coverage not decreased · no test weakened ·
186
+ no contract edited · loops dry · completeness-critic clean · no residue above. Log it as
187
+ auto-resolved, naming this run as owner — never forge a human signature.
188
+ </touch_boundary>
189
+
190
+ <context_files> <!-- paths relative to the worktree root -->
191
+ .add/PROJECT.md · .add/milestones/{{MILESTONE}}/MILESTONE.md (READ-ONLY) ·
192
+ .add/tasks/{{TASK_SLUG}}/TASK.md · .claude/skills/add/run.md · .claude/skills/add/deltas.md
193
+ </context_files>
194
+
195
+ <expertise>
196
+ Adopt the persona above. If your runner supports specialist injection — a Claude Code skill,
197
+ a Codex/opencode system-prompt preamble, an agent profile — load the one matching {{DOMAIN}}.
198
+ If it does not, the persona IS your expertise.
199
+ </expertise>
200
+
201
+ <tools>
202
+ Navigate with your runner's code-intelligence: mcp__serena under Claude Code; LSP / ctags /
203
+ ripgrep otherwise. Design every IO path for failure — timeouts, retries, rollback.
204
+ </tools>
205
+
206
+ <return> <!-- the worker PROPOSES; the orchestrator RECORDS. A worker never runs add.py. -->
207
+ End with a structured verdict AND write the same into SUMMARY.md in the task dir:
208
+ { task, outcome: PASS|RISK-ACCEPTED|HARD-STOP|ESCALATE, evidence: <tests+coverage>,
209
+ residue: [security|concurrency|architecture findings], deltas: [open lessons learned] }.
210
+ Do NOT touch add.py or any shared file — the orchestrator gates on your verdict.
211
+ </return>
212
+ ```
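Filling the contract per stream is plain template substitution. The sketch below is one way to do it that refuses to emit a prompt with any placeholder left unfilled; the function name is hypothetical and ADD does not prescribe a templating mechanism.

```python
import re
from typing import Dict

# Matches the contract's upper-case placeholders, e.g. {{TASK_SLUG}}.
PLACEHOLDER = re.compile(r"\{\{([A-Z_]+)\}\}")


def fill_contract(template: str, values: Dict[str, str]) -> str:
    """Substitute every {{NAME}} in the template, failing if any value is missing."""
    needed = {match.group(1) for match in PLACEHOLDER.finditer(template)}
    missing = sorted(needed - set(values))
    if missing:
        raise KeyError("unfilled placeholders: " + ", ".join(missing))
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], template)
```

Failing loudly on a missing value keeps a half-filled prompt from ever reaching a worker.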
213
+
214
+ ## Choosing the model — vendor-neutral tiers
215
+
216
+ ADD picks a **tier** from the scope's nature; the adapter maps the tier to the runner's model id.
217
+ The contract is identical whichever model runs it (the model is disposable, like the code):
218
+
219
+ | Tier | When | Claude Code | Any other runner |
220
+ |------|------|-------------|------------------|
221
+ | **mid** | ordinary, well-tested scope; clear contract | `sonnet` | the runner's balanced model |
222
+ | **top** | complex / ambiguous / cross-cutting / broad scope of impact | `opus` | the runner's strongest reasoning model |
223
+
224
+ Two rules sit **above** model choice and never bend:
225
+ - **High-risk ⇒ `conservative` autonomy, regardless of model** (`run.md` high-risk guard). A
226
+ stronger model does not buy back the human gate.
227
+ - **Security residue always escalates** — no tier and no model auto-passes it.
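As a sketch, the tier choice and the adapter mapping are two separate lookups. Only the Claude Code ids (`sonnet`, `opus`) come from the table above; the `"claude-code"` runner key, the function names, and the single boolean input are assumptions made for illustration.

```python
from typing import Dict

# Adapter-owned mapping; other runners would add their own entries.
TIER_TO_MODEL: Dict[str, Dict[str, str]] = {
    "claude-code": {"mid": "sonnet", "top": "opus"},
}


def pick_tier(complex_or_broad: bool) -> str:
    """Choose the tier from the scope's nature, independent of any runner."""
    return "top" if complex_or_broad else "mid"


def model_for(runner: str, tier: str) -> str:
    """Map a vendor-neutral tier to one runner's model id."""
    try:
        return TIER_TO_MODEL[runner][tier]
    except KeyError:
        raise LookupError(f"no model mapping for runner={runner!r}, tier={tier!r}")
```

Neither function touches autonomy: the high-risk guard and the security escalation sit above model choice, so they are deliberately absent here.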
228
+
229
+ ## The spawn adapter — one thin mapping per runner
230
+
231
+ ADD needs six capabilities from any runner. **Isolation is the one ADD owns itself** (a git
232
+ worktree), so streams stay portable even on a runner with no native sandbox — ADD makes the
233
+ worktree, then points the agent at that directory.
234
+
235
+ | ADD needs | Abstract | Claude Code (verified reference) | Any CLI agent — Codex · opencode · pi-mono · … |
236
+ |-----------|----------|----------------------------------|-----------------------------------------------|
237
+ | spawn a worker | prompt + label | `Task(description=…, prompt=…)` | `cd $WT && <agent> run --prompt-file PROMPT.md` |
238
+ | pick the model | tier → id | `model="opus"\|"sonnet"` | a `--model <id>` flag |
239
+ | isolate | worktree | `isolation="worktree"` | `git worktree add $WT HEAD` (after committing the bundle; verify base == HEAD), then run inside it |
240
+ | load context | files / cwd | `<context_files>` + repo cwd | run inside `$WT`; paths are relative |
241
+ | domain expertise | skill / preamble | a Claude skill in `<expertise>` | a system-prompt / profile preamble |
242
+ | return a verdict | structured | final message (optionally a schema) | stdout JSON the orchestrator parses |
243
+
244
+ The **`Task` spawn** in the Claude Code column is the worked reference. For any other
245
+ agent the recipe is the same shape: `git worktree add` → point the agent CLI at that dir with
246
+ the chosen model → it reads `PROMPT.md` → you parse its verdict.
247
+
248
+ > **Honesty:** only the Claude Code column is verified. The CLI forms for Codex/opencode/pi-mono
249
+ > are *illustrative shapes*, not confirmed flags — exact syntax differs per runner and version;
250
+ > confirm with the `find-docs` skill. The portable, durable parts are the **contract** and the
251
+ > **six-capability mapping**, never any one runner's flags.
252
+
253
+ When workers return, **you** record each outcome with the explicit slug — `add.py advance <slug>`
254
+ as evidence lands, `add.py gate PASS|RISK-ACCEPTED|HARD-STOP <slug>` at verify — then re-read
255
+ `status` to refill the READY-QUEUE. The worker proposes a verdict; the orchestrator records it.
256
+ That split is exactly what lets a non-Claude worker take part without ever touching shared state.
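On the orchestrator side, parsing a returned verdict might look like the sketch below. It assumes the worker emits the `<return>` shape as strict JSON on stdout; the function name and the exact validation rules are illustrative, not ADD's implementation.

```python
import json
from typing import Any, Dict

ALLOWED_OUTCOMES = {"PASS", "RISK-ACCEPTED", "HARD-STOP", "ESCALATE"}
REQUIRED_KEYS = {"task", "outcome", "evidence", "residue", "deltas"}


def parse_verdict(raw: str, expected_slug: str) -> Dict[str, Any]:
    """Validate a worker's proposed verdict before the orchestrator records it."""
    verdict = json.loads(raw)
    if not isinstance(verdict, dict):
        raise ValueError("verdict must be a JSON object")
    missing = sorted(REQUIRED_KEYS - set(verdict))
    if missing:
        raise ValueError("verdict is missing keys: " + ", ".join(missing))
    if verdict["task"] != expected_slug:
        raise ValueError("verdict names a different task than the one leased")
    if verdict["outcome"] not in ALLOWED_OUTCOMES:
        raise ValueError(f"unknown outcome: {verdict['outcome']!r}")
    # Auto-PASS requires no residue, so a PASS that carries residue is inconsistent.
    if verdict["outcome"] == "PASS" and verdict["residue"]:
        raise ValueError("PASS with residue must be escalated, not recorded")
    return verdict
```

Only after this check passes would the orchestrator call `add.py gate` with the explicit slug; an `ESCALATE` outcome would join the REVIEW-QUEUE instead of being gated.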