@pilotspace/add 1.0.0 → 1.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +88 -0
- package/GETTING-STARTED.md +172 -84
- package/README.md +14 -8
- package/bin/cli.js +39 -38
- package/docs/01-principles.md +3 -3
- package/docs/02-the-flow.md +20 -13
- package/docs/03-step-1-specify.md +13 -13
- package/docs/04-step-2-scenarios.md +3 -1
- package/docs/05-step-3-contract.md +4 -2
- package/docs/06-step-4-tests.md +3 -1
- package/docs/07-step-5-build.md +1 -1
- package/docs/08-step-6-verify.md +22 -4
- package/docs/09-the-loop.md +25 -1
- package/docs/10-setup-and-stages.md +52 -9
- package/docs/11-governance.md +2 -2
- package/docs/12-roles.md +3 -3
- package/docs/13-adoption.md +3 -3
- package/docs/14-foundation.md +19 -11
- package/docs/15-foundations-and-lineage.md +106 -0
- package/docs/README.md +4 -0
- package/docs/appendix-a-templates.md +3 -3
- package/docs/appendix-b-prompts.md +40 -5
- package/docs/appendix-c-glossary.md +42 -12
- package/docs/appendix-d-worked-example.md +2 -2
- package/docs/appendix-e-checklists.md +2 -2
- package/docs/appendix-f-requirements-matrix.md +12 -11
- package/docs/appendix-g-references.md +106 -0
- package/package.json +5 -3
- package/skill/add/SKILL.md +50 -21
- package/skill/add/adopt.md +67 -0
- package/skill/add/deltas.md +20 -8
- package/skill/add/fold.md +19 -17
- package/skill/add/graduate.md +74 -0
- package/skill/add/intake.md +22 -7
- package/skill/add/loop.md +59 -0
- package/skill/add/phases/0-setup.md +92 -24
- package/skill/add/phases/1-specify.md +23 -13
- package/skill/add/phases/2-scenarios.md +14 -4
- package/skill/add/phases/3-contract.md +38 -9
- package/skill/add/phases/4-tests.md +29 -5
- package/skill/add/phases/5-build.md +14 -4
- package/skill/add/phases/6-verify.md +38 -4
- package/skill/add/phases/7-observe.md +13 -5
- package/skill/add/report-template.md +106 -0
- package/skill/add/run.md +53 -34
- package/skill/add/scope.md +24 -2
- package/skill/add/setup-review.md +65 -0
- package/skill/add/streams.md +256 -0
- package/tooling/add.py +1388 -62
- package/tooling/templates/CONVENTIONS.md.tmpl +1 -1
- package/tooling/templates/GLOSSARY.md.tmpl +23 -0
- package/tooling/templates/MILESTONE.md.tmpl +1 -0
- package/tooling/templates/PROJECT.md.tmpl +4 -3
- package/tooling/templates/TASK.md.tmpl +39 -11
package/skill/add/report-template.md
@@ -0,0 +1,106 @@
+# Chat reports — the decision-point template (for the AI, not for add.py)
+
+The engine renders artifacts (`report`, `report --decide`, `status`); this file
+governs the CHAT MESSAGE you wrap around them. The digest is the artifact BEHIND
+your presentation, never a replacement for it — and your prose is never a
+replacement for the digest.
+
+Use it every time you report at or near a decision point: an intake proposal, a
+bundle approval, a verify gate, a task completion, a milestone close.
+
+## The decision arc — rendered first, above the five blocks
+
+Every report at a human gate opens with the **ARC** — three labelled lines that
+place the decision in the work's whole arc, so the human confirms with sight of
+where this is going, not just the step in front of them. Render it first, then a
+separator, then the unchanged five blocks below:
+
+```
+ARC goal: <the milestone / project goal this decision serves>
+done: <proven progress — tasks done · exit-criteria met · what this gate proves>
+plan: <this gate → the next step → the goal>
+```
+
+- **goal** — the milestone or project goal the decision serves, read from the
+`m-goal` line in `add.py status`; never re-typed from memory.
+- **done** — proven progress only: exit-criteria met/total and tasks done from
+the rollup, plus what this gate proves. An honest fact, never a hope.
+- **plan** — this gate → the next step → the goal, mirroring the rollup's
+`DECIDE NEXT` line.
+
+The arc is required at every human gate: **baseline-lock · contract-freeze ·
+verify · intake · scope · milestone-close · graduation**. The three labels stay
+constant; their content adapts to the gate. The arc is presentation only — it
+adds no gate and changes no PASS / RISK-ACCEPTED / HARD-STOP / freeze outcome.
+
+Its facts are engine-sourced, exactly like EVIDENCE below: goal = `m-goal` ·
+done = exit-criteria met/total + tasks done · plan = `DECIDE NEXT`. If your arc
+and `add.py` output disagree, the engine wins — fix the arc, not the engine.
+
+### Per-gate examples — one shape, gate-specific content
+
+- **verify** — `goal:` ship the decision arc · `done:` report-arc tests 6/6
+green, gate ready · `plan:` PASS this gate → wire the arc into every gate → goal.
+- **contract-freeze** — `goal:` … · `done:` bundle drafted, lowest-confidence
+flag surfaced · `plan:` freeze §3 → build → goal.
+- **milestone-close** — `goal:` … · `done:` exit-criteria 3/3 met, all tasks
+done · `plan:` close → archive → the next milestone.
+- **intake** — `goal:` the sized request · `done:` classified new-major,
+rationale stated · `plan:` create the milestone → first contract → goal.
+
+## The five blocks, in order
+
+```
+SUMMARY one line: intent + target + where we are
+DECISION what you need from the human (or "none — FYI")
+⚠ FLAGS lowest-confidence first, why + cost-if-wrong
+EVIDENCE small table: tests · gates · parity · check — engine-sourced
+NEXT the single next action + what it unlocks
+```
+
+1. **SUMMARY** — one line carrying intent + target + position, e.g.
+"v13 task 2/3 — tests-declared-fallback is green, gate PASS." The reader
+knows where they are before they read anything else.
+2. **DECISION** — the question the human must answer, stated plainly; exactly
+one decision per report, or an explicit "none — FYI". If a decision exists,
+ask it AFTER everything below has been shown (show-before-ask).
+3. **⚠ FLAGS** — lowest-confidence first, each with *why* confidence is lowest and the
+*cost if wrong*. Where TASK.md markers exist (`⚠` / `- [~]` / `- [ ]`),
+quote them verbatim and keep their document order — extraction ≠ judgment.
+4. **EVIDENCE** — engine-sourced facts pasted from `add.py` output, never
+re-typed from memory. If your prose and the engine disagree, the engine
+wins: fix the engine or the data, not the sentence.
+5. **NEXT** — one action and what it unlocks. Mirror the rollup's DECIDE NEXT
+line when it is right; overrule it only with a stated reason (e.g. planned
+tasks the state file cannot see yet).
+
+**The ask itself** — when block 2's decision becomes a literal question component
+(option picker, numbered menu), compose it as a summary: the detail stays in the
+report above, the question carries intent + what "yes" means + the flag count.
+
+## Hard rules
+
+<constraints>
+- **Summary-first.** Never bury the decision under a task list or a diff.
+- **Show before ask.** Render the artifact (digest · diff · report) before any
+approval question; the human decides on what they can see.
+- **Reconcile the count.** Before the ask, your ⚠ FLAGS must reconcile with
+`add.py report --decide`'s open-item count. If your prose calls an item
+resolved while the digest still counts it open, the engine wins — fix the data
+(the TASK.md markers the digest reads), not the sentence. A report whose flag
+count disagrees with the engine is the un-transparent gate the ARC exists to close.
+- **Never pre-stamp a human decision point.** Freeze / gate / lock fields stay DRAFT or
+blank until the answer returns: show → ask → stamp → advance. An artifact
+must never claim an approval that has not happened.
+- **One report per decision point.** After an approval, point at the frozen artifact —
+do not re-render the whole bundle.
+- **Honest scope.** "Done" means the request, not the last task: report
+"task 2/3", never "done" while approved scope remains.
+- **The question is a summary, never the artifact.** Every approval ask carries
+two layers: a compact SUMMARY · DECISION · ⚠ FLAGS block sits in chat
+immediately before the ask (positional), and the question text itself is a
+summary of two lines at most — intent + what "yes" means + the flag count —
+pointing at the report above (compositional). The full bundle, diff, or
+artifact lives only in the chat report; a question that re-carries it buries
+the decision.
+</constraints>
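The template above is guidance for the AI, not engine code. The Python below is a rough sketch only. It assumes the ARC and block values have already been pasted from `add.py status` and `add.py report --decide` output. It assembles one report in the prescribed order and refuses to render when the ⚠ FLAGS count differs from the engine's open-item count. Several things are assumptions for illustration, and none of them is an add.py API: the `Arc` and `Report` classes, `render_report`, `CountMismatch`, and reading "reconcile" as "equal counts".

```python
from dataclasses import dataclass, field


@dataclass
class Arc:
    goal: str  # pasted from the `m-goal` line of `add.py status`
    done: str  # exit-criteria met/total + tasks done + what this gate proves
    plan: str  # this gate -> next step -> goal (mirrors DECIDE NEXT)


@dataclass
class Report:
    summary: str      # intent + target + where we are
    decision: str     # one question, or the explicit "none" FYI marker
    flags: list = field(default_factory=list)     # lowest-confidence first
    evidence: dict = field(default_factory=dict)  # pasted from add.py output
    next_action: str = ""                         # one action + what it unlocks


class CountMismatch(ValueError):
    """The prose flags disagree with the engine's open-item count."""


def render_report(arc: Arc, report: Report, engine_open_count: int) -> str:
    # "Reconcile the count": the engine wins, so refuse to render a report
    # whose flag count contradicts `add.py report --decide`.
    if len(report.flags) != engine_open_count:
        raise CountMismatch(
            f"{len(report.flags)} flag(s) in prose, "
            f"{engine_open_count} open item(s) in the digest"
        )
    lines = [
        f"ARC goal: {arc.goal}",
        f"    done: {arc.done}",
        f"    plan: {arc.plan}",
        "---",  # separator: the ARC sits above the five blocks
        f"SUMMARY   {report.summary}",
        f"DECISION  {report.decision}",
    ]
    lines += [f"⚠ FLAGS   {flag}" for flag in report.flags]
    evidence = " · ".join(f"{k}: {v}" for k, v in report.evidence.items())
    lines.append(f"EVIDENCE  {evidence}")
    lines.append(f"NEXT      {report.next_action}")
    return "\n".join(lines)


if __name__ == "__main__":
    arc = Arc(goal="ship the decision arc",
              done="report-arc tests 6/6 green, gate ready",
              plan="PASS this gate -> wire the arc into every gate -> goal")
    report = Report(summary="task 2/3: report-arc is green, gate ready",
                    decision="Approve PASS for this verify gate?",
                    flags=["[test] arc parity checked on one gate only"],
                    evidence={"tests": "6/6", "gate": "ready"},
                    next_action="wire the arc into every gate")
    print(render_report(arc, report, engine_open_count=1))
```

Raising instead of quietly rendering mirrors the "engine wins" rule: the fix belongs in the TASK.md markers the digest reads, not in the prose.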
package/skill/add/run.md
CHANGED
@@ -1,25 +1,24 @@
 # The dynamic run — executing a locked scope

 Once a task's CONTRACT is frozen (phase 3), the scope is *locked*: the external shape will not move.
-That lock is ADD's autonomy
-covers what runs on the far side of the
-self-improving run** instead of a manual, sequential build. The human-led
-· Contract) still owns *direction*, but v7 compresses it to a **single human approval at the
-(see "The
+That lock is ADD's autonomy decision point — below it code is disposable; above it nothing breaks. This rubric
+covers what runs on the far side of the decision point: the **build->verify half, executed as a dynamic,
+self-improving run** instead of a manual, sequential build. The human-led **specification bundle** (Specify · Scenarios
+· Contract) still owns *direction*, but v7 compresses it to a **single human approval at the decision point**
+(see "The specification bundle" below) — the AI drafts the whole bundle, a human approves it once.

 > **Self-improving = within-run convergence + emit v5 deltas** — same definition as v5: tracked,
 > evidence-backed, never autonomous training. The run converges in-turn AND feeds the human-gated
->
+> consolidation loop (`deltas.md` · `fold.md`). The engine stays judgment-free: this is a rubric, not `add.py`.

-## The
+## The specification bundle (v7)

-The
-freeze. v7 compresses it to **one**. From the user's input the AI **drafts the whole
-
-human gives **one approval, at the frozen contract** (the seam). That single approval is the green light
+The specification bundle used to be three separate approvals — Specify, then Scenarios, then the Contract
+freeze. v7 compresses it to **one**. From the user's input the AI **drafts the whole specification bundle in one pass** — the Spec, the Scenarios, the Contract, and the failing Tests — and presents it together. The
+human gives **one approval, at the frozen contract** (the decision point). That single approval is the green light
 for the self-driving run.

-Why one approval and not zero: the contract freeze is the autonomy
+Why one approval and not zero: the contract freeze is the autonomy decision point, and the decision point **stays human**.
 The AI *drafts* the contract but never *freezes its own* — a person approves the frozen shape before any
 auto-run touches code. This is exactly what keeps "never self-gate a human-led gate" true under an auto
 default: the one gate that remains is human. Drop it to zero and the AI would freeze the interface it
@@ -28,10 +27,11 @@ then builds against and self-gate the result — the circular trust v6's dogfood
 What the human is actually approving in that one gate: that the drafted Spec captures the real intent,
 that the Scenarios cover the cases that matter, and that the Contract shape is the one to freeze. Reject
 any part and the bundle goes back to draft — that is backward-correction (principle 4), not failure.
-Approve, and the run begins.
+Approve, and the run begins. The decision-point guide (`phases/3-contract.md`) carries the
+**freeze review checklist** — six lines that walk the human through exactly this, ⚠-first.

-**The
-
+**The lowest-confidence flag — aiming the one approval.** A single approval over a whole bundle is easy to
+grant without reading. So the AI presents the bundle **lowest-confidence first**: of everything it is asking the human
 to freeze, it names the **1–2 points most likely to be wrong**, tagged by part
 (`⚠ [spec|scenario|contract|test] … — because …; if wrong: …`), each with *why* it is uncertain and
 *what it costs if wrong*. The §1 assumptions feed it, but a flag may equally point at an uncovered
@@ -39,7 +39,7 @@ scenario or the contract shape. If nothing is materially uncertain, the AI still
 biggest risk, however small — never a blank "none". Honest about its limit: the flag records that the
 human approved with the soft spots **in front of them**, eyes open; it makes a real review cheap and a
 lazy one visibly negligent, but it cannot *force* engagement — and the AI never asserts that the human
-engaged when it cannot know (a self-asserted gate would just
+engaged when it cannot know (a self-asserted gate would just move the unread approval one level up). Closing
 that enforcement gap is the job of a CI checker, not of prose.

 ## When the run begins — the scope-lock trigger
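The part-tagged flag grammar in the hunk above is regular enough to check mechanically. This is a minimal sketch that assumes one flag per line. The regex, the `Flag` tuple, and `parse_flags` are hypothetical: run.md describes the format but does not say any tool parses it. The sketch also turns the "1 to 2 points, never a blank none" rule into a count check.

```python
import re
from typing import NamedTuple

# Grammar from run.md; "\u26a0" is the warning sign and "\u2014" the long
# dash the grammar uses between the claim and "because".
FLAG_RE = re.compile(
    r"^\u26a0 \[(?P<part>spec|scenario|contract|test)\] "
    r"(?P<claim>.+?) \u2014 because (?P<why>.+?); if wrong: (?P<cost>.+)$"
)


class Flag(NamedTuple):
    part: str   # which bundle part the flag points at
    claim: str  # the point most likely to be wrong
    why: str    # why confidence is low
    cost: str   # what it costs if wrong


def parse_flags(lines):
    """Parse bundle flags (hypothetical helper; not part of add.py)."""
    flags = []
    for line in lines:
        match = FLAG_RE.match(line.strip())
        if match is None:
            raise ValueError(f"malformed flag: {line!r}")
        flags.append(Flag(**match.groupdict()))
    # run.md asks for the 1-2 weakest points, and never a blank "none".
    if not 1 <= len(flags) <= 2:
        raise ValueError(f"expected 1-2 flags, got {len(flags)}")
    return flags
```

The lazy `.+?` groups stop at the first `because` and `; if wrong:` separators, so a claim that itself contains those words would need a stricter grammar.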
@@ -49,17 +49,18 @@ The trigger is the **frozen contract**, nothing else. A run may start only when:
 - §3 CONTRACT is marked `FROZEN @ vN` (the shape is fixed), AND
 - §4 TESTS exist and are RED for the right reason (the target the run drives to green).

-No frozen contract -> no run: you are still
+No frozen contract -> no run: you are still inside the specification bundle, and starting early is the
 forward-skip the flow forbids. The lock is what makes autonomous execution *safe* — the AI cannot
 drift the interface, because the interface is frozen above it.

-## The
+## The change scope — what the run may and may not touch

+<constraints>
 A locked run has a hard boundary. It MAY:

-- write and rewrite **code** (`src/`) — code is disposable below the
+- write and rewrite **code** (`src/`) — code is disposable below the decision point;
 - drive the **tests** to green WITHOUT weakening them (a weakened test is a method violation);
-- gather **evidence** for the verify gate (test output,
+- gather **evidence** for the verify gate (test output, non-functional review).

 It MUST NOT:

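The two trigger conditions listed above reduce to a simple predicate. This sketch is an illustration built on assumptions, and it is not add.py behavior. It supposes the §3 status line carries the literal `FROZEN @ vN` marker. It also supposes that whether each §4 test is RED "for the right reason" has already been judged and recorded per test. `DeclaredTest`, `may_start_run`, and that field are invented names.

```python
import re
from dataclasses import dataclass

# Assumes the §3 status line contains the literal marker "FROZEN @ vN".
FROZEN_RE = re.compile(r"\bFROZEN @ v(\d+)\b")


@dataclass
class DeclaredTest:
    name: str
    passed: bool
    red_for_right_reason: bool  # judged and recorded beforehand, not inferred here


def may_start_run(contract_status: str, tests: list) -> tuple:
    """Return (ok, reason) for the scope-lock trigger (illustrative only)."""
    frozen = FROZEN_RE.search(contract_status)
    if frozen is None:
        return False, "no frozen contract: still inside the specification bundle"
    if not tests:
        return False, "no §4 tests exist yet"
    not_red = [t.name for t in tests if t.passed or not t.red_for_right_reason]
    if not_red:
        return False, "not RED for the right reason: " + ", ".join(not_red)
    return True, f"contract frozen at v{frozen.group(1)}; {len(tests)} RED test(s) to drive green"
```

A test that already passes counts as a failed trigger here, because a green test gives the run no target to drive toward.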
@@ -67,10 +68,11 @@ It MUST NOT:
 the run STOPS and hands back to a human to reopen Specify (principle 4). The run never re-locks
 scope on its own.
 - weaken, delete, or skip a **test** to make the build pass (that inverts the method).
-- touch the **
+- touch the **specification-bundle artifacts** (§1–§3) except to halt and escalate.
+</constraints>

 Crossing the boundary is not a fast run; it is an unverified one. When the run hits something only the
-
+specification bundle can resolve, it stops — and that stop is the loop working, not failing.

 ## The dynamic run — fan-out and in-run convergence

@@ -82,21 +84,28 @@ on a trustworthy result with three loops:
 Stopping at the first green is how defects survive; the run stops only when the well runs dry.
 - **adversarial verify** — for every "done" claim, an independent skeptic tries to REFUTE it. The
 claim survives only if it withstands refutation, not because one pass looked plausible.
-- **completeness-critic** — a final pass that asks "what did we NOT cover — a scenario, a
+- **completeness-critic** — a final pass that asks "what did we NOT cover — a scenario, a non-functional risk,
 an unstated assumption?" Whatever it finds re-enters the run.

 The run ends only when the loops go dry AND the auto-gate's evidence is satisfied. This is the run
 **self-improving within the turn** — the same convergence the foundation loop runs across milestones,
 compressed into one task.

-## The
+## The automated quality gate

+<constraints>
 The verify gate may be resolved by **evidence** rather than by a person — when the evidence is
 sufficient and the result is recorded (principle 7, reframed: an automated, recorded pass is an
 explicit pass, not a skip).

 - **Auto-PASS requires ALL of:** every test green; coverage not decreased; no test weakened and no
-contract edited; the convergence loops dry; the completeness-critic found nothing open
+contract edited; the convergence loops dry; the completeness-critic found nothing open; and the
+deep check below recorded.
+- **The deep check (every gate, no skim).** Deep check — do not skim. If the task produced code, record
+that every new symbol is referenced (wiring) and that no new dead/unused code was introduced. If it
+produced prose or non-code, record a semantic read — what you read in full and what it confirmed.
+Which path applies is the resolver's judgement; the engine never classifies. An unfilled deep check is
+a **shallow verify**, not an auto-PASS — evidence the work is wired, not merely plausible.
 - **Always escalates to a human (never auto-passed):** any **security** finding (HARD-STOP, always);
 a **concurrency**/timing risk the tests cannot exercise; an **architecture**/layering violation; and
 any failing test. These are the residue principle 2 names — automation cannot judge them.
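The auto-gate rules above can be read as a small decision function. This is a hypothetical sketch. `GateEvidence`, `resolve_gate`, and the `ESCALATE` / `SHALLOW-VERIFY` labels are invented here; only PASS, HARD-STOP, and the auto-resolved note come from the text. It also assumes that anything short of an auto-PASS goes to a human.

```python
from dataclasses import dataclass


@dataclass
class GateEvidence:
    tests_green: bool
    coverage_before: float
    coverage_after: float
    test_weakened: bool
    contract_edited: bool
    loops_dry: bool
    critic_open_items: int
    deep_check: str  # wiring/dead-code note or semantic-read note; "" if unfilled
    security_finding: bool = False
    untestable_concurrency_risk: bool = False
    architecture_violation: bool = False


def resolve_gate(ev: GateEvidence) -> str:
    """Hypothetical resolver for the rules quoted above; not add.py's API."""
    # The residue principle 2 names is never auto-passed.
    if ev.security_finding:
        return "HARD-STOP"  # security always hard-stops
    if (not ev.tests_green or ev.untestable_concurrency_risk
            or ev.architecture_violation):
        return "ESCALATE"  # hand to a human
    if not ev.deep_check.strip():
        return "SHALLOW-VERIFY"  # an unfilled deep check is not an auto-PASS
    auto_pass = (
        ev.coverage_after >= ev.coverage_before
        and not ev.test_weakened
        and not ev.contract_edited
        and ev.loops_dry
        and ev.critic_open_items == 0
    )
    # An auto-PASS is recorded as auto-resolved, never under a human's name.
    return "PASS (auto-resolved)" if auto_pass else "ESCALATE"
```

The function checks the residue (security, concurrency, architecture, failing tests) before the auto-PASS conditions. That ordering means no combination of otherwise green evidence can mask an escalation.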
@@ -106,22 +115,24 @@ explicit pass, not a skip).

 The auto-gate NEVER writes a human signature it did not get. An auto-PASS is logged as *auto-resolved*,
 honestly — the line between a pass and a skip is the recorded outcome, not a forged name.
+</constraints>

 ## Emitting deltas — feeding the foundation back

 The completeness-critic does not discard what it finds. Every gap, surprise, or convention that helped
-or hurt becomes an **`open`
+or hurt becomes an **`open` lesson learned** in the task's OBSERVE block, in the `deltas.md` grammar,
 tagged by competency:

 - a finding the run FIXED but that taught the foundation something (a missing scenario -> `TDD`);
 - a finding the run could NOT fix — a residue escalation -> a delta AND the escalation to a human.

-These `open` deltas feed v5's human-gated
-the human
+These `open` deltas feed v5's human-gated consolidation (`fold.md`) at milestone close: the run emits `open`;
+the human consolidates. That is the loop closing — **v6 run -> v5 foundation** — so a dynamic run sharpens the
 five competencies instead of letting its findings evaporate at end-of-run.

-## The autonomy
+## The autonomy level

+<constraints>
 How much a run may auto-gate is a **per-scope setting**, not a global switch (principle 5: trust is
 earned per scope). A task declares its level in its `TASK.md` header:

@@ -137,16 +148,24 @@ autonomy: auto | conservative

 > **v7 reversal (recorded, not hidden).** Earlier the default was `conservative` and `auto` was the
 > earned exception; v7 flips this — `auto` is the default, `conservative` is the deliberate lowering.
-> What did **not** change is principle 5: the
+> What did **not** change is principle 5: the autonomy level is still **per-scope**, and it still lives in the
 > `TASK.md` header, and you still lower it anywhere risk demands. Only the starting point moved.

-**The high-risk guard — `auto` is refused where it matters most.** The
+**The high-risk guard — `auto` is refused where it matters most.** The autonomy level is not a blank cheque. On a
 **high-risk or method-defining scope** — anything where a wrong-but-plausible result is expensive or
 hard to reverse (auth, money, data-loss paths, the method/trust-layer itself) — `auto` must be lowered
 to `conservative`; leaving it at `auto` there is the reject code **`unguarded_high_risk_auto`**. This
-closes the v6 dogfood
+closes the v6 dogfood gap, where the whole milestone ran at `auto` on the riskiest possible
 scope (defining the method) with no friction. The default is `auto` *for ordinary, well-tested scope*;
 high risk still earns a human gate.

-
-
+Judging *what* is high-risk stays human — the scope declares **`risk: high`** in the same `TASK.md`
+header where the autonomy level lives, reviewed at the freeze like every header line (the engine never
+classifies scope). **Since v14 the guard is mechanical for the declared case:**
+the engine refuses the declared combination — `add.py gate` will not complete (`PASS`/`RISK-ACCEPTED`) a task whose header
+carries `risk: high` without `autonomy: conservative` (error `unguarded_high_risk_auto`; `HARD-STOP`
+always records — stopping is never blocked), and `add.py audit` flags the same code on a finished
+record whose header was tampered or whose GATE RECORD reviewer is the auto-gate — which CI enforces
+(audit-ci). The honest limit mirrors the audit's: an **undeclared** high-risk scope passes; declaring
+is the human decision point, the engine enforces what was declared.
+</constraints>
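The v14 guard described above amounts to a header check made before a gate outcome is recorded. This is a minimal sketch, not the engine's code. It assumes the TASK.md header is a block of `key: value` lines ending at the first blank line. It also assumes a missing `autonomy:` line means the v7 default, `auto`. `parse_header` and `check_gate` are hypothetical names.

```python
def parse_header(text: str) -> dict:
    """Collect `key: value` lines up to the first blank line (assumed header shape)."""
    header = {}
    for line in text.splitlines():
        if not line.strip():
            break
        key, sep, value = line.partition(":")
        if sep:
            header[key.strip().lower()] = value.strip().lower()
    return header


def check_gate(header_text: str, outcome: str):
    """Return an error code, or None when recording `outcome` is allowed."""
    if outcome == "HARD-STOP":
        return None  # stopping is never blocked
    header = parse_header(header_text)
    # A missing autonomy line is treated as the v7 default, `auto`.
    autonomy = header.get("autonomy", "auto")
    if (outcome in ("PASS", "RISK-ACCEPTED")
            and header.get("risk") == "high"
            and autonomy != "conservative"):
        return "unguarded_high_risk_auto"
    return None
```

A header without `risk: high` returns `None` even on a genuinely risky scope. That is the honest limit the text names: the engine enforces only what was declared.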
package/skill/add/scope.md
CHANGED
@@ -20,6 +20,26 @@ scope drafting honors intake's classification — it never re-sizes a request:
 means one drafting pass, NOT auto-creation. Nothing is written to disk — single draft or the
 whole batch — until the human confirms. You propose; you wait.

+## Brainstorm before you draft — co-specify at milestone level
+
+Don't draft a MILESTONE.md from thin input. Run the same three-move co-specify as a
+task's §1 (`phases/1-specify.md`) — Diverge (framings + open questions) → Converge
+(draft + rank) → Validate (show flags first) — raised to milestone scope. Ask only
+what moves the goal, the In/Out line, or the task list; skip what PROJECT.md settles.
+Draft the WHOLE milestone before showing; nothing hits disk until the human confirms.
+
+Diverge seeds (pick the live ones):
+- **Outcome** — done means a user can do *what* they can't today? (goal sentence)
+- **Edge of scope** — nearest thing assumed IN that you want OUT? (Out list)
+- **Riskiest decision point** — which contract, if wrong, costs the most rework? (freeze-first)
+- **Done-looks-like** — how do we SEE each outcome without reading code? (exit criteria)
+- **First slice** — which task unblocks the rest? (breadth-first order)
+
+Rank assumptions lowest-confidence first; the top 1–2 get the flag the human reads at confirm:
+`⚠ <assumption> — lowest confidence because <why>; if wrong: <cost>`. Present the draft via
+`report-template.md` — open with the ARC (goal · done · plan): the goal this milestone serves,
+what is already covered, and the plan its task list lays out.
+
 ## Drafting a good MILESTONE.md (section by section)

 - **goal** — ONE sentence, an outcome not an output ("a user can size any request", not "write
@@ -27,8 +47,8 @@ whole batch — until the human confirms. You propose; you wait.
 - **Scope In/Out** — the explicit anti-creep deferral list. Naming what is OUT is as important
 as what is IN; an empty Out list usually means the scope is not yet thought through.
 - **Shared decisions & glossary deltas** — cross-cutting rules every task must honor, named from
-the glossary. New terms get a glossary entry (the
-- **Shared / risky contracts to freeze first** — the
+the glossary. New terms get a glossary entry (the living documentation stays honest).
+- **Shared / risky contracts to freeze first** — the decision points between tasks; name the owning task.
 - **Tasks (breadth-first)** — `slug · depends-on · one line` each. Decompose by deliverable, not
 by phase; keep each task one-file-sized. Order by dependency, not by guesswork.
 - **Exit criteria** — observable, and **every exit criterion maps to a declared task slug**
@@ -36,6 +56,7 @@ whole batch — until the human confirms. You propose; you wait.

 ## Reject codes (emit `{ reject, rationale }`, create nothing)

+<reject_codes>
 - `not_classified` — the request has not been through intake yet. Classify it first; you cannot
 draft scope for an unclassified request.
 - `dangling_criterion` — a drafted MILESTONE.md has an exit criterion that maps to no declared
@@ -43,6 +64,7 @@ whole batch — until the human confirms. You propose; you wait.
 a malformed milestone. With no engine lint, you are the first check and the human is the backstop.
 - `no_milestone` — intake routed the request to `task` or `change-request`; scope drafting
 creates NO milestone. Honor the classification; do not invent milestone-sized scope.
+</reject_codes>

 ## Worked example (from this repo's own history)

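The three reject codes above are mechanical enough to sketch. The hypothetical `check_scope_draft` below emits the `{ reject, rationale }` shape. It assumes intake's label arrives as a string, or `None` when intake has not run. It also assumes each exit criterion has already been mapped to the slug it claims. The helper `parse_task_line` splits the `slug · depends-on · one line` format, and treating `-` as "no dependency" is an assumption. Neither function is part of the package, and scope.md itself says there is no engine lint.

```python
def parse_task_line(line: str):
    """Split a `slug · depends-on · one line` entry (the format scope.md names)."""
    slug, depends, summary = (part.strip() for part in line.split("·", 2))
    # Assumption: "-", "none" or an empty field means no dependency.
    deps = [] if depends in ("", "-", "none") else [d.strip() for d in depends.split(",")]
    return slug, deps, summary


def check_scope_draft(classification, task_lines, criteria):
    """Return a `{reject, rationale}` dict, or None if the draft passes.

    classification: intake's label, or None if intake has not run (assumed values).
    criteria: maps each exit criterion to the task slug it claims (assumed shape).
    """
    if classification is None:
        return {"reject": "not_classified", "rationale": "run intake first"}
    if classification in ("task", "change-request"):
        return {"reject": "no_milestone",
                "rationale": f"intake routed this to {classification}"}
    slugs = {parse_task_line(line)[0] for line in task_lines}
    dangling = [c for c, slug in criteria.items() if slug not in slugs]
    if dangling:
        return {"reject": "dangling_criterion",
                "rationale": "no declared task for: " + "; ".join(dangling)}
    return None
```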
package/skill/add/setup-review.md
@@ -0,0 +1,65 @@
+# Setup review — the one page the human signs
+
+Autonomous setup ends at a single human gate: the **baseline approval** (`add.py lock`). Before that
+signature is honest, the human needs to see *what you drafted and how sure you were* — not re-derive
+it. `SETUP-REVIEW.md` is that page: every decision you made while drafting the foundation, first-scope,
+and the first contract, **ordered lowest-confidence-first** so the riskiest guesses meet their eye first.
+
+This is the setup-level analog of presenting a task's specification bundle lowest-confidence-first at the contract freeze.
+The engine never reads this file — `add.py lock` is judgment-free, the signature *is* the gate (see
+`setup-lock-state`). The human **reading** this page is the review; your job is to make the reading honest.
+
+## Where it lives
+
+Write **one** artifact at `.add/SETUP-REVIEW.md`. **Never clobber a human-edited one** — if it already
+exists with hand edits, append/update, don't overwrite (the same non-clobber rule `init` applies to
+living docs). It is a per-onboarding, setup-level artifact; it sits beside `PROJECT.md`, not under a task.
+
+## The template
+
+```markdown
+# SETUP REVIEW — <project>
+
+<stage> · <brownfield | greenfield> · drafted by <model> @ <date>
+
+| # | Decision | Lands in | Tag | Why / Evidence |
+|---|----------|----------|-----|----------------|
+| 1 | <the drafted decision> | PROJECT.md \| scope \| first-contract | `guessed` | <the inference + why you had to guess> |
+| 2 | <…> | <…> | `evidence-grounded` | <cite the source file/line you read it from> |
+
+Sign: confirm in chat → the agent runs `add.py lock --by "<name>"` (typing it yourself works too)
+```
+
+Rows are numbered for reference at the gate ("row 1 is where my confidence is lowest").
+
+## The two rules that make it honest
+
+<constraints>
+1. **Lowest-confidence-first.** Order rows by confidence **ascending**. A `guessed` row always floats above an
+`evidence-grounded` one. The point is not completeness theatre — it is to spend the human's attention
+where it changes outcomes: the top of the table is the part they actually need to challenge.
+
+2. **Every row is tagged — `guessed` or `evidence-grounded`.**
+- `evidence-grounded` — you read it from the code/repo. **Cite the file** (e.g. `pyproject.toml`,
+`src/orders/models.py`). Brownfield onboarding (see `adopt.md`) is mostly these.
+- `guessed` — the repo was silent, so you inferred it. **State the inference and why.** Thin-greenfield
+onboarding (a near-empty repo, only the 4-lens answers) produces these. These are what the human
+must check; that is why they sit on top.
+
+The tag vocabulary is shared with `adopt.md` — the brownfield map tags each filled living-doc decision
+`guessed`/`evidence-grounded`, and those tags flow straight into this table.
+</constraints>
+
+## Where it ends
+
+`SETUP-REVIEW.md` is **read-only context** for the baseline approval. You do not ask the human to approve it
+field-by-field; you present it, lowest-confidence-first; they confirm in conversation, and you run the lock
+with their name:
+
+```bash
+python3 .add/tooling/add.py lock --by "<name>"
+```
+
+`lock` records the lock layers and opens the build — it does **not** parse or validate this file (the
+engine stays judgment-free). The review lives in the human's reading of the page, not in the tool. Make
+the top of the table the truth they most need, and the one signature is informed.
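The two honesty rules above translate directly into a validation pass and a sort. This is a hedged sketch of drafting-side tooling, not something the engine does; the text is explicit that `add.py lock` never reads this file. The hypothetical `Row` and `render_review` reject untagged rows and require an evidence-grounded row to carry some citation. Treating any non-empty text as a citation is a weak stand-in. They then float `guessed` rows to the top before numbering them.

```python
from dataclasses import dataclass

# Lower rank = lower confidence = higher in the table.
TAG_RANK = {"guessed": 0, "evidence-grounded": 1}


@dataclass
class Row:
    decision: str
    lands_in: str  # e.g. "PROJECT.md", "scope", "first-contract"
    tag: str       # "guessed" or "evidence-grounded"
    why: str       # the inference, or the cited source file


def _cell(text: str) -> str:
    return text.replace("|", r"\|")  # stop pipes from splitting table columns


def render_review(project: str, header: str, rows: list) -> str:
    for row in rows:
        if row.tag not in TAG_RANK:
            raise ValueError(f"untagged row: {row.decision!r}")
        if row.tag == "evidence-grounded" and not row.why.strip():
            raise ValueError(f"evidence-grounded row cites no file: {row.decision!r}")
    ordered = sorted(rows, key=lambda row: TAG_RANK[row.tag])  # stable sort
    out = [
        f"# SETUP REVIEW \u2014 {project}",
        "",
        header,
        "",
        "| # | Decision | Lands in | Tag | Why / Evidence |",
        "|---|----------|----------|-----|----------------|",
    ]
    for i, row in enumerate(ordered, start=1):
        out.append(f"| {i} | {_cell(row.decision)} | {_cell(row.lands_in)} "
                   f"| `{row.tag}` | {_cell(row.why)} |")
    out += ["", 'Sign: confirm in chat → the agent runs `add.py lock --by "<name>"`']
    return "\n".join(out)
```

Python's `sorted` is stable, so rows that share a tag keep their drafting order. Any finer confidence ranking applied beforehand therefore survives the sort.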