@pilotspace/add 1.2.0 → 1.3.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +41 -0
- package/GETTING-STARTED.md +22 -0
- package/bin/cli.js +84 -2
- package/docs/02-the-flow.md +4 -1
- package/docs/03-step-1-specify.md +2 -0
- package/docs/06-step-4-tests.md +8 -0
- package/docs/07-step-5-build.md +2 -0
- package/docs/08-step-6-verify.md +11 -0
- package/docs/10-setup-and-stages.md +1 -1
- package/docs/11-governance.md +4 -0
- package/docs/appendix-c-glossary.md +8 -1
- package/docs/appendix-e-checklists.md +14 -2
- package/package.json +1 -1
- package/skill/add/SKILL.md +4 -3
- package/skill/add/phases/0-ground.md +66 -0
- package/skill/add/phases/0-setup.md +3 -1
- package/skill/add/phases/1-specify.md +5 -0
- package/skill/add/phases/3-contract.md +3 -1
- package/skill/add/phases/5-build.md +22 -0
- package/skill/add/phases/6-verify.md +16 -0
- package/skill/add/run.md +48 -5
- package/skill/add/streams.md +21 -6
- package/tooling/add.py +1348 -63
- package/tooling/templates/DESIGN.md.tmpl +66 -0
- package/tooling/templates/GLOSSARY.md.tmpl +7 -1
- package/tooling/templates/PROJECT.md.tmpl +3 -1
- package/tooling/templates/TASK.md.tmpl +23 -4
- package/tooling/templates/catalog.sample.json +38 -0
- package/tooling/templates/prototype.sample.json +48 -0
- package/tooling/templates/tokens.sample.json +55 -0
- package/tooling/templates/udd-catalog.md +122 -0
- package/tooling/templates/udd-tokens.md +79 -0
package/CHANGELOG.md
CHANGED
|
@@ -4,6 +4,47 @@ All notable changes to the ADD method (`@pilotspace/add` on npm,
|
|
|
4
4
|
`pilotspace-add` on PyPI) are documented here. The format follows
|
|
5
5
|
[Keep a Changelog](https://keepachangelog.com/); versions follow semver.
|
|
6
6
|
|
|
7
|
+
## [1.3.0] — 2026-06-13
|
|
8
|
+
|
|
9
|
+
The render-ready-foundation release: a UI project now gets a lintable design
|
|
10
|
+
foundation the AI drafts from, a build's declared scope is enforced as a gate,
|
|
11
|
+
every command names who drives the next step, and the new update command
|
|
12
|
+
refreshes an installed project in place. All additive; no breaking changes
|
|
13
|
+
(SemVer MINOR).
|
|
14
|
+
|
|
15
|
+
### Added
|
|
16
|
+
- **Render-ready UDD foundation** — a `DESIGN.md` prose front-door plus a JSON
|
|
17
|
+
foundation (3-layer design tokens · a component catalog · flat prototype
|
|
18
|
+
content trees) the AI drafts UI from, wired into 0-setup. `add.py check` now
|
|
19
|
+
lints the named set under `.add/design/`, going red with a named code on any
|
|
20
|
+
layer, catalog, tree, or cross-file token-resolution violation — and staying
|
|
21
|
+
silent when a project has no design set, so non-UI projects are unaffected.
|
|
22
|
+
A `udd-tokens.md` + `udd-catalog.md` pair documents the compact-DTCG dialect
|
|
23
|
+
and the json-render render recipe.
|
|
24
|
+
- **The scope gate** — a task's `§5 Scope (may touch)` declaration is frozen
|
|
25
|
+
into a snapshot at tests→build and enforced at the gate: an out-of-scope touch
|
|
26
|
+
heals the task back to BUILD for an honest redo (counting against a per-task
|
|
27
|
+
cap), while erased gate evidence fails closed. Scope creep can no longer ride a
|
|
28
|
+
green suite into a merge.
|
|
29
|
+
- **Engine next-step footer + the driver marker** — every completing command now
|
|
30
|
+
prints exactly one engine-sourced `next:` line, and names who owns it:
|
|
31
|
+
`[you drive]` when the AI proceeds, `[human gate]` at a decision point. The
|
|
32
|
+
driver marker resolves from one place (autonomy × phase), so the next step and
|
|
33
|
+
its owner are never ambiguous across a session.
|
|
34
|
+
- **The `update` command** — `npx @pilotspace/add update` (and the
|
|
35
|
+
`pilotspace-add update` command on PyPI) re-materializes the managed layer
|
|
36
|
+
(skill · tooling · docs) to the installed package version without a re-install.
|
|
37
|
+
It never touches your work — `state.json`, `PROJECT.md`, milestones, tasks, and
|
|
38
|
+
archive are preserved (state is backed up first regardless) — is idempotent via
|
|
39
|
+
a `.add-version` stamp, and offers `--check` to report version drift without
|
|
40
|
+
writing.
|
|
41
|
+
|
|
42
|
+
### Changed
|
|
43
|
+
- The foundation self-improved across these milestones: closing
|
|
44
|
+
`udd-design-foundation` folded its OBSERVE backlog into the versioned
|
|
45
|
+
CONVENTIONS/PROJECT foundation (foundation-version 29), sharpening the
|
|
46
|
+
contract-completeness, adversarial-refute, and engine-pin conventions.
|
|
47
|
+
|
|
7
48
|
## [1.2.0] — 2026-06-10
|
|
8
49
|
|
|
9
50
|
The decision-arc release: the method now narrates the build as one continuous
|
package/GETTING-STARTED.md
CHANGED
|
@@ -63,6 +63,28 @@ handoff — from here on it's conversation, not terminal commands.
|
|
|
63
63
|
> Why stages exist: the steps never change, only how *deeply* you run them.
|
|
64
64
|
> See `.add/docs/10-setup-and-stages.md`.
|
|
65
65
|
|
|
66
|
+
### Updating to a newer ADD — no re-install
|
|
67
|
+
|
|
68
|
+
When a new ADD version ships, refresh a project in two steps: bump the package
|
|
69
|
+
(your package manager), then re-materialize it into the project with `update`:
|
|
70
|
+
|
|
71
|
+
```bash
|
|
72
|
+
# npm — one shot (npx fetches latest, then re-materializes into this project):
|
|
73
|
+
npx @pilotspace/add@latest update
|
|
74
|
+
|
|
75
|
+
# pip — one shot via pipx (the npx analog: fetch latest + run):
|
|
76
|
+
pipx run pilotspace-add update
|
|
77
|
+
|
|
78
|
+
# …or plain pip, in two steps:
|
|
79
|
+
pip install -U pilotspace-add && pilotspace-add update
|
|
80
|
+
```
|
|
81
|
+
|
|
82
|
+
`update` clean-replaces the managed layer (`skill` · `.add/tooling` · `.add/docs`)
|
|
83
|
+
and **never touches your work** — `state.json`, `PROJECT.md`, milestones, tasks and
|
|
84
|
+
archive are left exactly as they were (it backs `state.json` up first regardless). It
|
|
85
|
+
is idempotent (same version twice is a no-op) and writes a `.add/.add-version` stamp.
|
|
86
|
+
Run `… update --check` to see whether a project is behind the installed package.
|
|
87
|
+
|
|
66
88
|
---
|
|
67
89
|
|
|
68
90
|
## 2 · Your first feature — talk to the agent
|
package/bin/cli.js
CHANGED
|
@@ -32,10 +32,11 @@ function parseArgs(argv) {
|
|
|
32
32
|
// stage/name stay null unless EXPLICITLY passed — the engine's own `init`
|
|
33
33
|
// defaults the stage and infers the name from the folder, so the manual-init
|
|
34
34
|
// hint only echoes flags the user actually chose (shortest true command).
|
|
35
|
-
const args = { _: [], force: false, stage: null, name: null };
|
|
35
|
+
const args = { _: [], force: false, check: false, stage: null, name: null };
|
|
36
36
|
for (let i = 0; i < argv.length; i++) {
|
|
37
37
|
const a = argv[i];
|
|
38
38
|
if (a === "--force") args.force = true;
|
|
39
|
+
else if (a === "--check") args.check = true;
|
|
39
40
|
else if (a === "--stage" || a === "--name") {
|
|
40
41
|
const v = argv[++i];
|
|
41
42
|
// fail loudly on a trailing/abutting flag — never silently drop a value
|
|
@@ -112,6 +113,82 @@ function cmdInit(args) {
|
|
|
112
113
|
(args.name ? ` --name "${args.name}"` : ""));
|
|
113
114
|
}
|
|
114
115
|
|
|
116
|
+
// --- update: re-materialize the managed layer without a re-install -----------
|
|
117
|
+
// The managed trees (ship-controlled). `update` clean-replaces each, so a file removed
|
|
118
|
+
// upstream leaves no orphan — and never touches .add/state.json, PROJECT.md, milestones,
|
|
119
|
+
// tasks, or archive (user data). Pure file-copy (npm <-> pip parity with _installer.py).
|
|
120
|
+
const MANAGED = [
|
|
121
|
+
["skill/add", [".claude", "skills", "add"], false],
|
|
122
|
+
["tooling", [".add", "tooling"], true],
|
|
123
|
+
["docs", [".add", "docs"], false],
|
|
124
|
+
];
|
|
125
|
+
const STAMP_FILE = ".add-version";
|
|
126
|
+
|
|
127
|
+
function pkgVersion() {
|
|
128
|
+
try { return require(path.join(PKG_ROOT, "package.json")).version; }
|
|
129
|
+
catch (_e) { return "0.0.0"; }
|
|
130
|
+
}
|
|
131
|
+
|
|
132
|
+
function readStamp(addDir) {
|
|
133
|
+
const p = path.join(addDir, STAMP_FILE);
|
|
134
|
+
if (!fs.existsSync(p)) return null;
|
|
135
|
+
try { return JSON.parse(fs.readFileSync(p, "utf8")); } catch (_e) { return null; }
|
|
136
|
+
}
|
|
137
|
+
|
|
138
|
+
function writeStamp(addDir, version) {
|
|
139
|
+
fs.mkdirSync(addDir, { recursive: true });
|
|
140
|
+
fs.writeFileSync(
|
|
141
|
+
path.join(addDir, STAMP_FILE),
|
|
142
|
+
JSON.stringify({ version: version, channel: "npm", installed_at: new Date().toISOString() }, null, 2) + "\n"
|
|
143
|
+
);
|
|
144
|
+
}
|
|
145
|
+
|
|
146
|
+
function cleanReplaceTree(src, dest, stripTests) {
|
|
147
|
+
if (!fs.existsSync(src)) fail("missing packaged source: " + src);
|
|
148
|
+
fs.mkdirSync(path.dirname(dest), { recursive: true });
|
|
149
|
+
if (fs.existsSync(dest)) fs.rmSync(dest, { recursive: true, force: true });
|
|
150
|
+
fs.cpSync(src, dest, { recursive: true });
|
|
151
|
+
if (stripTests) {
|
|
152
|
+
fs.rmSync(path.join(dest, "__pycache__"), { recursive: true, force: true });
|
|
153
|
+
for (const entry of fs.readdirSync(dest)) {
|
|
154
|
+
if (/^test_.*\.py$/.test(entry)) fs.rmSync(path.join(dest, entry), { force: true });
|
|
155
|
+
}
|
|
156
|
+
}
|
|
157
|
+
}
|
|
158
|
+
|
|
159
|
+
function cmdUpdate(args) {
|
|
160
|
+
const target = path.resolve(args._[0] || ".");
|
|
161
|
+
const addDir = path.join(target, ".add");
|
|
162
|
+
if (!fs.existsSync(path.join(addDir, "tooling")) && !fs.existsSync(path.join(addDir, "state.json"))) {
|
|
163
|
+
fail("no ADD project at " + target + " (.add/ not found) — run `init` first");
|
|
164
|
+
}
|
|
165
|
+
const version = pkgVersion();
|
|
166
|
+
const stamp = readStamp(addDir);
|
|
167
|
+
const cur = stamp && stamp.version ? stamp.version : null;
|
|
168
|
+
|
|
169
|
+
if (args.check) {
|
|
170
|
+
if (cur === version) log("ADD is current: project and package both at " + version + ".");
|
|
171
|
+
else if (cur === null) log("ADD project is unstamped; installed package is " + version + ". Run `update`.");
|
|
172
|
+
else log("ADD update available: project on " + cur + ", package is " + version + ". Run `update`.");
|
|
173
|
+
return;
|
|
174
|
+
}
|
|
175
|
+
if (cur === version && !args.force) {
|
|
176
|
+
log("ADD already at " + version + " — nothing to update (use --force to re-materialize).");
|
|
177
|
+
return;
|
|
178
|
+
}
|
|
179
|
+
// design-for-failure: back up state BEFORE touching anything.
|
|
180
|
+
const stateFile = path.join(addDir, "state.json");
|
|
181
|
+
if (fs.existsSync(stateFile)) {
|
|
182
|
+
fs.copyFileSync(stateFile, path.join(addDir, "pre-update-state.bak.json"));
|
|
183
|
+
}
|
|
184
|
+
for (const [sub, destParts, stripTests] of MANAGED) {
|
|
185
|
+
cleanReplaceTree(path.join(PKG_ROOT, sub), path.join(target, ...destParts), stripTests);
|
|
186
|
+
}
|
|
187
|
+
writeStamp(addDir, version);
|
|
188
|
+
log("ADD updated " + (cur || "(unstamped)") + " -> " + version +
|
|
189
|
+
" · skill · tooling · docs refreshed · your project state untouched.");
|
|
190
|
+
}
|
|
191
|
+
|
|
115
192
|
function main() {
|
|
116
193
|
const argv = process.argv.slice(2);
|
|
117
194
|
const cmd = argv[0] && !argv[0].startsWith("--") ? argv.shift() : "init";
|
|
@@ -120,9 +197,14 @@ function main() {
|
|
|
120
197
|
case "init":
|
|
121
198
|
cmdInit(args);
|
|
122
199
|
break;
|
|
200
|
+
case "update":
|
|
201
|
+
cmdUpdate(args);
|
|
202
|
+
break;
|
|
123
203
|
case "help":
|
|
124
204
|
case "--help":
|
|
125
|
-
log("usage: npx @pilotspace/add init [targetDir] [--force] [--
|
|
205
|
+
log("usage: npx @pilotspace/add <init|update> [targetDir] [--force] [--check]");
|
|
206
|
+
log(" init install the ADD skill + tooling + book into a project");
|
|
207
|
+
log(" update re-materialize skill/tooling/docs to this package version (preserves your state)");
|
|
126
208
|
break;
|
|
127
209
|
default:
|
|
128
210
|
fail("unknown command '" + cmd + "'. Try: npx @pilotspace/add init");
|
package/docs/02-the-flow.md
CHANGED
|
@@ -8,10 +8,13 @@
|
|
|
8
8
|
|
|
9
9
|
AIDD is one repeatable flow of **seven steps**: six build the feature — Specify → Scenarios → Contract → Tests → Build → Verify — and the seventh, **Observe**, feeds what production teaches back into the next Specify. In the default flow the AI drafts the specification bundle (steps 1–4) and a person approves it **once**, at the contract freeze; the AI performs the Build; and Verify is resolved on evidence under `autonomy: auto`, with a person owning any residue. (See [11 Governance](./11-governance.md) for the autonomy level and the one-approval decision point.)
|
|
10
10
|
|
|
11
|
+
**Before those seven steps comes a phase-0 preamble: `ground`.** Before it specifies anything, the AI gathers the real current codebase the task touches — the actual files, symbols, signatures, patterns, and conventions — into a lean §0 *grounding map*, surfacing the **anchors** the frozen contract will later cite. Ground is AI-owned and adds no new approval (the one approval stays at the contract freeze); it aims the specification bundle at reality instead of assumption, so the contract, tests, and build are grounded in the code as it actually is. The seven steps keep their numbering and brand — ground precedes them as step 0 (it is drawn as node 0 in the diagram below).
|
|
12
|
+
|
|
11
13
|

|
|
12
14
|
|
|
13
15
|
```mermaid
|
|
14
16
|
flowchart LR
|
|
17
|
+
S0["0 Ground<br/>the real codebase"] --> S1["1 Specify<br/>the rules"]
|
|
15
18
|
S1["1 Specify<br/>the rules"] --> S2["2 Scenarios<br/>pass/fail cases"]
|
|
16
19
|
S2 --> S3["3 Contract<br/>freeze the shape"]
|
|
17
20
|
S3 --> S4["4 Tests<br/>failing-first (red)"]
|
|
@@ -27,7 +30,7 @@ flowchart LR
|
|
|
27
30
|
classDef machine fill:#E6F1FB,stroke:#185FA5,color:#042C53;
|
|
28
31
|
class S1,S2 human;
|
|
29
32
|
class S3,S4 decision;
|
|
30
|
-
class S5,S6 machine;
|
|
33
|
+
class S0,S5,S6 machine;
|
|
31
34
|
```
|
|
32
35
|
|
|
33
36
|
> **Solid arrows are the primary flow** — you never start a phase before its input exists (forward-skip forbidden). **Dashed arrows are backward correction** — any phase may return to an earlier one to repair its artifact (the long loop, Observe → Specify, is the same rule at milestone scale). The tight Tests ⇄ Build cycle is the per-feature red/green engine.
|
|
@@ -88,6 +88,8 @@ The defining instruction: *if a requirement is unclear, ask — do not resolve i
|
|
|
88
88
|
- **Free-text errors.** Errors must be named codes, not sentences, so they can become scenarios and contract responses.
|
|
89
89
|
- **Hidden assumptions.** If an assumption is not written down, it is not confirmed — it is a future bug with a delay timer.
|
|
90
90
|
- **A flat list of "confirmed" assumptions.** Eight equal-looking ticks invite a reflex approval. Rank them; flag the one or two that are load-bearing. An unranked list hides the risk inside the noise.
|
|
91
|
+
- **"Existing behavior" claims without a citation.** An assumption row that asserts "this is how X works today" is describing intent, not code. Any wiring claim or assumption that depends on the current state of an existing path must carry a grep/line citation (e.g. `file.rs:203`) — otherwise it is a future bug in disguise.
|
|
92
|
+
- **Wiring claims that name a symbol, not a caller chain.** Verifying that a function exists is not the same as verifying it is reachable. A wiring claim is only valid when it names the production caller chain from an actual entry point — not just the symbol's location in a file. A function that nothing calls is dead, not wired.
|
|
91
93
|
|
|
92
94
|
## Exit check
|
|
93
95
|
|
package/docs/06-step-4-tests.md
CHANGED
|
@@ -60,6 +60,11 @@ The AI generates the test suite from the scenarios and contract. Your job is to
|
|
|
60
60
|
- **A green suite before the build.** Means the tests are not actually exercising the missing feature — fix them now.
|
|
61
61
|
- **Skipping the side-effect assertions.** Without `assert a.balance == 20` on the rejection path, a corrupting partial failure passes silently.
|
|
62
62
|
- **No coverage target.** Without a recorded target, coverage can quietly erode during the build.
|
|
63
|
+
- **`should_panic` as a red test.** Marking a test `#[should_panic(expected = "implement in green wave")]` (or the equivalent in any language) passes immediately and stays green while red — it is a lying red. Declare unimplemented paths with `todo!()` (or `unimplemented!()`) so the test actually fails. If a test is intentionally designed to flip from red to green during the build, say so with a comment: `// flip authorized at green wave`.
|
|
64
|
+
- **Collateral tests named by category, not by exact name.** When a spec adds a slash command, a new CLI subcommand, or any other globally-enumerated thing, there is a fixed collateral set of tests that count or enumerate it (e.g. a command-registry count test, a help-text snapshot, an autocomplete positional assert). Pre-list these tests by their **exact test names** in §4 — not categories — so the build agent's edits to those "pre-existing" tests are expected and the count is right. Naming only the category means the agent finds the wrong test or misses one.
|
|
65
|
+
- **Arithmetic not checked against frozen constants.** Before freezing, check that the red suite can reach green: a fixture with N bytes fails a hard-coded M-byte budget if N > M — the suite can never pass. Run the numbers before freeze, and add an additive override (e.g. `set_budget`) when the scenario implies a limit the production constant cannot satisfy in test.
|
|
66
|
+
- **Non-hermetic tests that read real user state.** Tests that call a loader with `None` (defaulting to `~/.helios/settings.json` or the real home dir) become torn-read flakes under a parallel suite and assert nothing useful. Red tests that create or read production paths must redirect them to a temp dir; grep new tests for `home_dir`, `~/.config`, real-path defaults before freeze.
|
|
67
|
+
- **Tests that share a per-machine singleton without isolation.** Background services (embedded servers, filesystem watchers) bind to fixed ports or paths. Tests that start such a service must tear it down, or they collide with a parallel run or an already-running dev instance. If the singleton cannot be isolated, gate those tests as serial (one thread, no parallel execution) and document it.
|
|
63
68
|
|
|
64
69
|
## Exit check
|
|
65
70
|
|
|
@@ -67,6 +72,9 @@ The AI generates the test suite from the scenarios and contract. Your job is to
|
|
|
67
72
|
- [ ] The suite runs in the pipeline and is **red for the right reason**.
|
|
68
73
|
- [ ] Tests assert observable behavior, not internals.
|
|
69
74
|
- [ ] A coverage target is recorded.
|
|
75
|
+
- [ ] No `should_panic` lying reds — unimplemented paths use `todo!()` or equivalent so they actually fail.
|
|
76
|
+
- [ ] Collateral tests for globally-enumerated things (command counts, help snapshots) are listed by exact name.
|
|
77
|
+
- [ ] Arithmetic checked: the red fixtures can reach green against the frozen constants.
|
|
70
78
|
|
|
71
79
|
## If the check fails
|
|
72
80
|
|
package/docs/07-step-5-build.md
CHANGED
|
@@ -78,3 +78,5 @@ The autonomy granted in this step should match the evidence and your review capa
|
|
|
78
78
|
## If the check fails
|
|
79
79
|
|
|
80
80
|
If the AI weakened a test, reject and re-prompt. If it added an out-of-allow-list package, the pipeline blocks it; have the AI find an approved alternative or raise the package for human approval. If the batch is too large to review, ask the AI to split the work and resubmit. Only once the exit check passes does the change proceed to verification.
|
|
81
|
+
|
|
82
|
+
And in the other direction: if the *verify* gate later finds a confirmed cheat — a tamper, or a build that gamed the green (overfit to the fixtures, vacuous asserts, stubbed-away logic) — the task returns *here* for an honest redo. That return is the **bounded self-heal loop** (see the run chapter): revert the tampered file or de-overfit the code, then advance again. It is capped — after the cap a confirmed cheat HARD-STOPs to the human rather than looping forever, and a gamed green is never auto-passed.
|
package/docs/08-step-6-verify.md
CHANGED
|
@@ -48,6 +48,14 @@ Two failures slip straight past green tests. The first is code that is never *wi
|
|
|
48
48
|
|
|
49
49
|
This is *evidence*, not impression: a reference search showing where each new symbol is called, a scan confirming nothing new is orphaned, or — for prose — a note of exactly what was read and what it confirmed. An unfilled deep check is a **shallow verify**, not a pass. The engine cannot judge wiring, dead code, or whether prose was truly read; the resolver records the evidence, and a person (under `conservative`) or the recorded run (under `auto`) signs it.
|
|
50
50
|
|
|
51
|
+
**The wiring trace is a named step, not a free-form note.** For every new hook, closure, or middleware registered in this task: trace from the process entry point to the call site and record it explicitly — symbol, file, line. A symbol that is only reachable via a test helper or `make_config` but not via the production entry point (e.g. `build_harness_with_dispatcher`, `interactive_mode`) is not wired. This is the third repeated class in production: "runtime-activation-order/silent-noop" — the code exists and the unit tests pass, but the feature is absent in the running program. The wiring trace is how you catch it before a user does.
|
|
52
|
+
|
|
53
|
+
## Part four — was the green earned?
|
|
54
|
+
|
|
55
|
+
Passing tests say the code satisfies the cases you wrote down. They do not say it earned that pass honestly — and the mechanical tamper tripwire (Step 6's floor) only catches an *edited* test or contract, not a build that gamed the *unchanged* suite. The same rubric the phase guide carries names what the tripwire cannot see:
|
|
56
|
+
|
|
57
|
+
A green suite proves the tests pass — not that the build EARNED them. Three judgment cheats pass the unchanged suite without earning it: src overfit to the test fixtures (special-cased to the literal inputs, not the general behavior §1 asked for), vacuous asserts (tautological — green even against an empty implementation), and real logic stubbed away (the function returns a constant the tests happen to accept). These cheats are invisible to the mechanical tamper tripwire, which only sees edited files. Score them with an adversarial refute-read: an independent reviewer — a subagent under `autonomy: auto` is recommended, the engine never spawns one — prompted to argue the green was NOT earned from outside the build context. This is the verify-gate, whole-suite specialization of run.md's adversarial verify (see run.md), not a new discipline. A confirmed earned-green failure is HARD-STOP-class: never auto-passed, never RISK-ACCEPTED — but a first cheat is a chance to redo: a confirmed cheat (mechanical tamper or a reported earned-green failure) enters the bounded self-heal loop — it returns to build for an honest redo, and only after the loop's cap does it HARD-STOP to the human (the loop lives in run.md).
|
|
58
|
+
|
|
51
59
|
## Recording the outcome
|
|
52
60
|
|
|
53
61
|
Every verification ends with exactly one recorded outcome, with an accountable owner — never a silent pass:
|
|
@@ -75,6 +83,9 @@ A security finding is always a `HARD-STOP`; it is never waved through with a wai
|
|
|
75
83
|
- **Shipping on plausibility.** Reading the diff, finding it reasonable, and approving — without the evidence and the non-functional review — is the precise failure the method exists to prevent.
|
|
76
84
|
- **Treating a security gap as acceptable risk.** It is a `HARD-STOP`, not a waiver.
|
|
77
85
|
- **Skipping the concurrency check** because the tests are green. Tests rarely exercise simultaneity; this is a manual check by design.
|
|
86
|
+
- **Trusting the green agent's self-reported test count.** A build agent running a filtered suite (e.g. `-E 'test(theme)'`) only sees tests inside the filter. Collateral failures outside the filter — a stale count in `all_commands_in_registry`, an e2e snapshot the agent did not touch — are invisible. The orchestrator's **full-suite rerun is load-bearing**; never skip it on the grounds that the scoped run was green.
|
|
87
|
+
- **User-observable-only failures escalate to the human before exhausting discriminating probes.** When a symptom is only observable by a person (a TCC dialog, a visual flicker, an OS-level prompt), do not respond by running the suite again. Instead, design two or three targeted probes that let the user distinguish cause A from cause B in one interaction each. Three AskUser probes resolve what three blind reruns cannot.
|
|
88
|
+
- **Background-process hangs misdiagnosed as test failures.** A test that never exits is not a failure in the test logic — it is a hang. The diagnosis recipe: background the test process, run `pgrep` to find it, use the platform profiler (`sample <pid>` on macOS, `perf` on Linux) to sample the stack, then `lsof -p <pid>` to see open files. Run an isolation experiment (suspect line on/off, 3×3) before reading any code. Entry-count caps do not bound wall time — a single huge directory or a blocking syscall inside a `spawn_blocking` call can hang indefinitely even when the entry cap is satisfied.
|
|
78
89
|
|
|
79
90
|
## If the check fails
|
|
80
91
|
|
|
@@ -109,7 +109,7 @@ The default is one task at a time. But when a milestone holds several tasks whos
|
|
|
109
109
|
- **READY-QUEUE** — tasks in the active milestone where the phase is not `done` and every dependency already reads `gate=PASS`. These are the only tasks a worker may pick up; a task finishing `PASS` unblocks its dependents on the next `status`.
|
|
110
110
|
- **REVIEW-QUEUE** — the irreducibly serial part: the **bundle approval** (contract freeze) and any **Verify escalation**. One human, one queue, presented one at a time — never a batch that invites approval without reading.
|
|
111
111
|
|
|
112
|
-
**The autonomy level is the throttle
|
|
112
|
+
**The autonomy level is the throttle** — an explicit, overridable per-task token on an ordered ladder `manual < conservative < auto`. At `manual`, the human owns every gate and nothing auto-resolves (the strict floor). At `conservative`, both gates queue on the human (pure pipelining — builds overlap, nothing auto-resolves). At `auto` (the seeded default), only the bundle-approval decision point and residue escalations queue; Verify auto-PASSes on evidence, so real concurrency follows. The floor never drops below **one human approval per task, at the contract decision point**.
|
|
113
113
|
|
|
114
114
|
**Design for failure (required).** Lease each task to its worker with a timeout — if a worker dies, release the claim back to READY rather than trusting partial work. A worker that hits a stop-and-escalate blocks only its own task; siblings keep running. And if several workers fail in one wave, trip a circuit-breaker and fall back to sequential — repeated failure means the scope was wrong, not the parallelism.
|
|
115
115
|
|
package/docs/11-governance.md
CHANGED
|
@@ -21,6 +21,10 @@ The governing rule, restated from the principles: **operate only at the level yo
|
|
|
21
21
|
|
|
22
22
|
The **per-scope default is auto-with-evidence behind a one-approval decision point**: the AI drafts the specification bundle, a human approves the frozen contract once, and the build auto-gates on evidence. You *lower* a scope toward draft-and-review or suggest wherever risk is high or evidence is thin — and a high-risk or method-defining scope is *always* lowered (it is never auto-run). The default sets where you start; review capacity and risk set where you stay.
|
|
23
23
|
|
|
24
|
+
The engine expresses this per task as an explicit three-rung level — `autonomy: manual | conservative | auto`, an ordered ladder `manual < conservative < auto` declared in the `TASK.md` header and reviewed at the freeze. `auto` is auto-with-evidence behind the one approval (the seeded default); `conservative` is the deliberate lowering that keeps a person at the verify gate; `manual` is the strict floor where the human owns the gate and nothing auto-resolves. A high-risk or method-defining scope refuses an unguarded `auto` (`unguarded_high_risk_auto`) — it must be lowered to `conservative` or `manual`. The prose here and that engine token are one rule: prose ≡ enforcement.
|
|
25
|
+
|
|
26
|
+
**Autonomy is earned by goal-clarity — the auto-ready goal.** The autonomy level decides *who* resolves Verify; an **auto-ready goal** decides whether a self-verifying run is even *meaningful*. A milestone goal is *auto-ready* when **every exit criterion cites a verifier** — `(verify: <test | command | metric>)` — so the engine can check the result against the goal without human judgment. `add.py check` raises a `goal_not_auto_ready` WARN (never red, the active milestone only) while the goal has criteria not all cited, and `status` surfaces a `goal-ready:` line every session, so the goal-clarity gap stays visible. The WARN *measures*, it never blocks: it changes neither the freeze gate nor the autonomy level — clarifying the goal is the prerequisite that *earns* trust, not a new gate (a zero-criteria goal reads not-auto-ready and is milestone-shaping's nudge, not this one's). The lint raises the floor — a citation slot per criterion — but cannot prove the citation is honest: a human can still write `(verify: it works)`, and closing that is a person's judgment, not the engine's.
|
|
27
|
+
|
|
24
28
|
## The gate-fail protocol and the three reports
|
|
25
29
|
|
|
26
30
|
Every checkpoint produces three short reports — **Test** (does it pass?), **Quality** (is it well-made and conformant?), and **Risk** (what could go wrong, and who owns it?) — and resolves to exactly one outcome:
|
|
@@ -24,6 +24,10 @@
|
|
|
24
24
|
|
|
25
25
|
**Gate** — a checkpoint with an explicit pass/fail exit. Its outcome is `PASS`, `RISK-ACCEPTED`, or `HARD-STOP`.
|
|
26
26
|
|
|
27
|
+
**Ground (phase-0 preamble)** — the per-task phase *before* Specify in which the AI gathers the real current codebase the task touches — files, symbols, signatures, patterns, conventions — into a lean **grounding map**, surfacing the **anchors** the frozen contract will cite. It is AI-owned and adds no approval (the one approval stays at the contract freeze); it precedes the seven steps as step 0 so the contract, tests, and build are grounded in the code as it actually is, not in assumption. Lives in the `add` skill's `phases/0-ground.md`.
|
|
28
|
+
|
|
29
|
+
**Grounding map / anchors** — the §0 GROUND artifact: the real files, symbols, and conventions a task touches, plus the **anchors** — the symbols the frozen contract names. Task-specific delta only: it defers to `PROJECT.md` / `CONVENTIONS.md` for architecture and never re-runs the setup brownfield scan. `add.py status` / `check` surface whether the active task's contract is grounded (measure, never block — the contract-freeze checklist asks the human to confirm it).
|
|
30
|
+
|
|
27
31
|
**`HARD-STOP`** — a gate outcome meaning work cannot proceed; triggered by any failing test or security finding.
|
|
28
32
|
|
|
29
33
|
**Intake** — the step *before* a task: sizing a raw request into versioned scope by classifying it into one **request bucket**. The AI proposes `{bucket, rationale, command}`; the human confirms. Lives in the `add` skill's `intake.md` (the intake level, above the per-task flow).
|
|
@@ -70,7 +74,9 @@
|
|
|
70
74
|
|
|
71
75
|
**Scope level** (formerly "altitude") — the granularity a decision lives at: intake level (request → versioned scope) · milestone level · setup/foundation level · task level. (A cross-stage decision lives one level out, at the **stage-graduation** loop — which `graduate.md` also numbers as a scope level; see **Stage graduation**.) One ⚠-assumption notation is shared across every scope level.
|
|
72
76
|
|
|
73
|
-
**Autonomy level** (formerly "autonomy dial") — the per-task setting (`autonomy:
|
|
77
|
+
**Autonomy level** (formerly "autonomy dial") — the explicit per-task setting (`autonomy: manual | conservative | auto`, an ordered ladder manual < conservative < auto) choosing who resolves Verify: `auto` auto-PASSes on complete evidence, `conservative` keeps a human at the gate, `manual` is the strict floor (the human owns the gate; nothing auto-resolves). A high-risk scope refuses an unguarded `auto` — it must be lowered to `manual` or `conservative`. New tasks seed a visible, overridable `autonomy: auto`; a live task with no level warns (`implicit_autonomy`), a token outside the set is rejected (`unknown_autonomy_level`).
|
|
78
|
+
|
|
79
|
+
**Auto-ready goal** — a milestone goal whose every exit criterion **cites a verifier** (`(verify: <test|command|metric>)`), so the engine can self-verify the result against the goal without human judgment. It is the prerequisite by which **autonomy is earned by goal-clarity**: the **autonomy level** governs *who* resolves Verify, but a clarified, machine-checkable goal is what makes a self-verifying run meaningful. `add.py check` raises a `goal_not_auto_ready` **WARN** (never red) for the active milestone until it has an auto-ready goal (≥1 exit criterion and every one cited), and `status` surfaces it (`goal-ready: auto-ready ✓` / cited-of-total); a zero-criteria goal reads not-auto-ready and is milestone-shaping's nudge, not this warning's. The lint forces a citation *slot* per criterion — it raises the floor but **cannot prove the citation is real** (a human can write `(verify: it works)`): citation-theater is the accepted irreducible floor, and the freeze gate and autonomy behavior are unchanged by it.
|
|
74
80
|
|
|
75
81
|
**Automated quality gate** (formerly "evidence auto-gate") — the Verify resolver under `autonomy: auto`: a run may auto-PASS on complete evidence, recorded as *auto-resolved*; a security finding always escalates (`HARD-STOP`).
|
|
76
82
|
|
|
@@ -103,6 +109,7 @@ This book uses plain step names. Teams connecting it to a larger formal standard
|
|
|
103
109
|
| Plain step (this book) | Formal phase name |
|
|
104
110
|
|------------------------|-------------------|
|
|
105
111
|
| Project setup | Foundation |
|
|
112
|
+
| Ground (preamble) | Codebase Discovery (the §0 grounding map) |
|
|
106
113
|
| Specify | Domain Discovery + Spec Definition |
|
|
107
114
|
| (design portion) | UX-Driven Design |
|
|
108
115
|
| Scenarios | Behavior specification (Given/When/Then) |
|
|
@@ -19,6 +19,7 @@ Every exit check in the book, collected for quick use. Print this page.
|
|
|
19
19
|
- [ ] Every rejection has a named error code.
|
|
20
20
|
- [ ] Success state-change described.
|
|
21
21
|
- [ ] Assumptions ranked lowest-confidence first; the 1–2 most-likely-wrong ⚠-flagged with why + cost (or an honest "none material" that still names the single biggest risk).
|
|
22
|
+
- [ ] "Existing behavior" assumptions carry grep/line citations; wiring claims name the production caller chain.
|
|
22
23
|
|
|
23
24
|
## Step 2 — Scenarios
|
|
24
25
|
|
|
@@ -40,6 +41,9 @@ Every exit check in the book, collected for quick use. Print this page.
|
|
|
40
41
|
- [ ] Suite runs in the pipeline and is red for the right reason.
|
|
41
42
|
- [ ] Tests assert behavior, not internals.
|
|
42
43
|
- [ ] Coverage target recorded.
|
|
44
|
+
- [ ] No `should_panic` lying reds — unimplemented paths use `todo!()` so they fail.
|
|
45
|
+
- [ ] Collateral tests for globally-enumerated things listed by exact name.
|
|
46
|
+
- [ ] Arithmetic checked: fixtures can reach green against frozen constants.
|
|
43
47
|
|
|
44
48
|
## Step 5 — Build
|
|
45
49
|
|
|
@@ -55,7 +59,10 @@ Every exit check in the book, collected for quick use. Print this page.
|
|
|
55
59
|
- [ ] Concurrency/timing of the risky operation is safe.
|
|
56
60
|
- [ ] No exposed secrets, injection, or unexpected dependencies.
|
|
57
61
|
- [ ] Layering and dependencies follow `CONVENTIONS.md`.
|
|
58
|
-
- [ ]
|
|
62
|
+
- [ ] Deep check: wiring trace recorded (every new symbol reachable from production entry point) and no dead code introduced.
|
|
63
|
+
- [ ] Was the green earned? Adversarial refute-read on the unchanged suite (no overfit, no vacuous asserts, no stubbed logic).
|
|
64
|
+
- [ ] Full-suite rerun by orchestrator (not only the agent's scoped run).
|
|
65
|
+
- [ ] A person reviewed and approved, **or** auto-resolved by the run (under `autonomy: auto`, no residue).
|
|
59
66
|
- [ ] Outcome recorded (`PASS` / `RISK-ACCEPTED` / `HARD-STOP`).
|
|
60
67
|
|
|
61
68
|
## The loop
|
|
@@ -71,10 +78,15 @@ Every exit check in the book, collected for quick use. Print this page.
|
|
|
71
78
|
A feature is shippable only when all are true:
|
|
72
79
|
|
|
73
80
|
- [ ] Spec complete: behavior stated, rejections named, assumptions ranked lowest-confidence first with the biggest risk flagged.
|
|
81
|
+
- [ ] Wiring and "existing behavior" assumptions carry grep/line citations; wiring claims name the production caller chain.
|
|
74
82
|
- [ ] Every rule has a scenario.
|
|
75
83
|
- [ ] Contract frozen; contract tests green.
|
|
76
|
-
- [ ] A test per scenario; suite was red before the build.
|
|
84
|
+
- [ ] A test per scenario; suite was red before the build (no `should_panic` lying reds).
|
|
85
|
+
- [ ] Collateral tests listed by exact name; arithmetic checked against frozen constants.
|
|
77
86
|
- [ ] All tests green; coverage held; tests and contract untouched by the AI.
|
|
87
|
+
- [ ] Wiring trace recorded: every new symbol reachable from production entry point.
|
|
88
|
+
- [ ] Adversarial refute-read confirms the green was earned (no overfit, no vacuous asserts, no stubbed logic).
|
|
89
|
+
- [ ] Full-suite rerun by orchestrator; not just the agent's scoped run.
|
|
78
90
|
- [ ] Concurrency, security, and architecture checked by a person.
|
|
79
91
|
- [ ] Gate outcome recorded with an accountable owner.
|
|
80
92
|
- [ ] Released behind a flag, with monitors in place.
|
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "@pilotspace/add",
|
|
3
|
-
"version": "1.
|
|
3
|
+
"version": "1.3.0",
|
|
4
4
|
"description": "ADD (AI-Driven Development) — a minimal, state-tracked Claude Code skill that drives every feature through Specify → Scenarios → Contract → Tests → Build → Verify → Observe. Ships the AIDD book as its trust layer.",
|
|
5
5
|
"bin": {
|
|
6
6
|
"add": "bin/cli.js"
|
package/skill/add/SKILL.md
CHANGED
|
@@ -20,7 +20,7 @@ You are the orchestrator. ADD keeps the AI fast *and* safe by fixing direction
|
|
|
20
20
|
the result through passing evidence rather than a plausible-looking diff.
|
|
21
21
|
|
|
22
22
|
**One file = one task.** Each feature lives in a single `.add/tasks/<slug>/TASK.md`
|
|
23
|
-
with seven sections. You fill them top to bottom; the Python tool tracks where
|
|
23
|
+
with a §0 ground preamble and seven step sections. You fill them top to bottom; the Python tool tracks where
|
|
24
24
|
you are so context never rots across sessions.
|
|
25
25
|
|
|
26
26
|
## Always start here (orient — do not skip)
|
|
@@ -61,6 +61,7 @@ Load the phase guide **only for the phase you are in** (progressive disclosure):
|
|
|
61
61
|
| Phase | Guide | Produces (TASK.md section) | Who leads |
|
|
62
62
|
|-------|-------|----------------------------|-----------|
|
|
63
63
|
| setup | `phases/0-setup.md` | `.add/` + living docs + first §1–§3 + `SETUP-REVIEW.md` | AI drafts → **human locks** (the baseline approval) |
|
|
64
|
+
| ground | `phases/0-ground.md` | §0 GROUND map (real files · symbols · the anchors §3 cites) | **AI** (the §0 preamble — no new gate) |
|
|
64
65
|
| specify | `phases/1-specify.md` | §1 rules + ranked lowest-confidence flag | AI drafts (co-specify)† |
|
|
65
66
|
| scenarios | `phases/2-scenarios.md` | §2 Given/When/Then | AI drafts† |
|
|
66
67
|
| contract | `phases/3-contract.md` | §3 frozen shape | AI drafts → **human approves once** (the decision point)† |
|
|
@@ -73,7 +74,7 @@ Load the phase guide **only for the phase you are in** (progressive disclosure):
|
|
|
73
74
|
contract freeze** (the decision point), presented lowest-confidence-first. See `run.md`.
|
|
74
75
|
‡ **Verify auto-gate (v6–v7).** Under `autonomy: auto` (the default) a run may auto-PASS on
|
|
75
76
|
complete evidence — recorded as *auto-resolved*, an explicit PASS, not a skip. **Security always
|
|
76
|
-
escalates** (HARD-STOP); so do concurrency / architecture residue and `conservative`
|
|
77
|
+
escalates** (HARD-STOP); so do concurrency / architecture residue and a lowered autonomy level (`conservative` / `manual`).
|
|
77
78
|
See `run.md`.
|
|
78
79
|
|
|
79
80
|
Whenever you present a decision point to the human in chat (intake · bundle approval · gate ·
|
|
@@ -90,7 +91,7 @@ gathers confirmed deltas into a versioned foundation — read `fold.md`.
|
|
|
90
91
|
## Beyond the bundle — load on demand
|
|
91
92
|
|
|
92
93
|
Once **§3 CONTRACT is FROZEN**, the build→verify half is a dynamic, auto-gated run
|
|
93
|
-
(`autonomy: auto` default, lowered to `conservative` for a human gate) — read `run.md`. To
|
|
94
|
+
(`autonomy: auto` default, lowered to `conservative` or `manual` for a human gate) — read `run.md`. To
|
|
94
95
|
pipeline several ready tasks behind their own frozen contracts, read `streams.md`.
|
|
95
96
|
|
|
96
97
|
When a milestone's tasks are all done but its **goal** (the `MILESTONE.md` exit criteria) is not
|
|
@@ -0,0 +1,66 @@
|
|
|
1
|
+
# Phase 0 — Ground (the real codebase)
|
|
2
|
+
|
|
3
|
+
Goal: before you specify anything, gather the REAL current working folder the task will
|
|
4
|
+
touch — the actual files, symbols, signatures, docs, todos, config, data, patterns, and conventions — so the
|
|
5
|
+
contract, tests, and build are grounded in what exists, not in what you assume.
|
|
6
|
+
Fill **§0 GROUND** in TASK.md. Ground is a per-task preamble to the seven steps;
|
|
7
|
+
it is **AI-owned** — no human gate here (the one approval stays at the §3 freeze).
|
|
8
|
+
|
|
9
|
+
If you cannot name the files and symbols the task touches, you do not yet understand
|
|
10
|
+
the work — gathering them IS the job, not a detour.
|
|
11
|
+
|
|
12
|
+
## Gather (in TASK.md §0)
|
|
13
|
+
|
|
14
|
+
- **Touches** — the real files · symbols · signatures the task will read or change,
|
|
15
|
+
named from the actual code (use your code-navigation tools — grep / symbol search,
|
|
16
|
+
never memory). Each as `path:symbol — what it is / how it is keyed`.
|
|
17
|
+
- **Context (working folder)** — beyond code, the NON-code artifacts the task touches:
|
|
18
|
+
docs/textbase (README · `*.md` · design notes) · TODOs (`TODO.md` · `FIXME`/`TODO`/`HACK`
|
|
19
|
+
comments · task lists) · config/manifests (configs · `.env.example` · `pyproject`/`package`
|
|
20
|
+
· CI) · data/fixtures (samples · fixtures · schemas). Gather only the TASK-SPECIFIC
|
|
21
|
+
delta — never index the whole repo.
|
|
22
|
+
- **Honors** — the patterns and conventions the work must respect, cited from
|
|
23
|
+
`PROJECT.md` / `CONVENTIONS.md`. Gather only the TASK-SPECIFIC delta — never
|
|
24
|
+
re-derive the architecture or re-run the setup brownfield scan.
|
|
25
|
+
- **Anchors the contract cites** — the specific symbols §3 CONTRACT will name. The
|
|
26
|
+
contract may cite only anchors that appear here.
|
|
27
|
+
|
|
28
|
+
**How — gather efficiently:** for the BROAD sweep, prefer a small-model subagent / fast
|
|
29
|
+
index / skim (offload to a cheap context, return a compact map); then DEEPEN on what THIS
|
|
30
|
+
task specifically needs — never lock a shallow first pass. A recommendation: the engine
|
|
31
|
+
never spawns a subagent (tool-agnostic), so the orchestrating agent chooses.
|
|
32
|
+
|
|
33
|
+
## Greenfield / first task
|
|
34
|
+
|
|
35
|
+
The first task of a project runs ground too. When there is little or no code yet
|
|
36
|
+
(greenfield), or you are mid-setup, your grounding IS the foundation docs / brownfield
|
|
37
|
+
scan you just produced — point at them; do not re-scan. An honest "new module, no
|
|
38
|
+
existing code; honors CONVENTIONS.md §X" is a complete grounding.
|
|
39
|
+
|
|
40
|
+
## AI prompt
|
|
41
|
+
|
|
42
|
+
<prompt>
|
|
43
|
+
Role: an engineer who reads the real code before designing against it.
|
|
44
|
+
Read first: PROJECT.md · CONVENTIONS.md · the actual files the task touches.
|
|
45
|
+
Objective: fill §0 GROUND with the real files/symbols/signatures + the conventions to
|
|
46
|
+
honor + the anchor points the contract will cite — gathered from the codebase, never assumed.
|
|
47
|
+
Steps:
|
|
48
|
+
0. Sweep broad cheaply first — prefer a small-model subagent / fast index / skim — then deepen task-specifically.
|
|
49
|
+
1. Locate the files and symbols the task reads or changes (code tools, not memory).
|
|
50
|
+
2. Record their signatures / how they are keyed; cite the conventions to honor (task delta only).
|
|
51
|
+
3. Name the anchors §3 will cite.
|
|
52
|
+
Never: invent a file, symbol, or signature you have not opened.
|
|
53
|
+
</prompt>
|
|
54
|
+
|
|
55
|
+
## Exit gate
|
|
56
|
+
|
|
57
|
+
<exit_gate>
|
|
58
|
+
- [ ] The real files/symbols the task touches are named (from the code, not assumed).
|
|
59
|
+
- [ ] The conventions to honor are cited (task-delta only; no architecture re-scan).
|
|
60
|
+
- [ ] The anchors §3 will cite are listed — §3 names only anchors that exist here.
|
|
61
|
+
</exit_gate>
|
|
62
|
+
|
|
63
|
+
## Next
|
|
64
|
+
|
|
65
|
+
`python3 .add/tooling/add.py advance` → read `phases/1-specify.md`.
|
|
66
|
+
Book: `docs/02-the-flow.md` (the flow; ground is the §0 preamble to the seven steps).
|
|
@@ -53,7 +53,9 @@ tag thin or inferred answers `guessed`.
|
|
|
53
53
|
|
|
54
54
|
1. **Fill the living documentation** (it outlives all code): `.add/PROJECT.md` (the foundation — Domain · Spec/active
|
|
55
55
|
milestone · UI/UX · Key Decisions, one screen), `CONVENTIONS.md`, `GLOSSARY.md`, `MODEL_REGISTRY.md`,
|
|
56
|
-
`dependencies.allowlist
|
|
56
|
+
`dependencies.allowlist`, and — for a UI project — `DESIGN.md` (the design source of truth: identity ·
|
|
57
|
+
principles · screens · the named-set foundation pointers + render recipe; delete it if there's no UI).
|
|
58
|
+
Brownfield: from the code. Greenfield: from the interview, gaps flagged `guessed`.
|
|
57
59
|
2. **Size the first milestone** (read `scope.md`) and draft its `MILESTONE.md` — goal · scope · exit criteria
|
|
58
60
|
· breadth-first tasks.
|
|
59
61
|
3. **Create the first task and draft its candidate specification bundle.** `new-task` is allowed pre-lock:
|
|
@@ -15,6 +15,11 @@ understand the feature — that is information, not an obstacle. Stop and ask.
|
|
|
15
15
|
2. **Converge** — draft §1, then RANK where your confidence is lowest (below).
|
|
16
16
|
3. **Validate** — present the ranked uncertainty first; the user confirms, corrects, or sends back.
|
|
17
17
|
|
|
18
|
+
**Identity is direction, not default (UDD).** For UI/design work, identity values — the brand
|
|
19
|
+
color, the core palette, the typeface — are human-owned. Surface them for discussion during
|
|
20
|
+
Diverge; never assume a brand value. The UDD token dialect checks a token's *shape*; its *value*
|
|
21
|
+
is the user's call (`udd-tokens.md`).
|
|
22
|
+
|
|
18
23
|
## Produce (in TASK.md §1)
|
|
19
24
|
|
|
20
25
|
<output_format>
|
|
@@ -22,10 +22,11 @@ whole bundle (§1–§4). Before asking for it, present the bundle **lowest-conf
|
|
|
22
22
|
most likely wrong (`⚠ [spec|scenario|contract|test] … — because …; if wrong: …`) — aim the human's
|
|
23
23
|
eye before they freeze. Open that report with the ARC (goal · done · plan) per `report-template.md` so the
|
|
24
24
|
human sees the goal this freeze serves and the plan beyond it, not just the bundle. See `run.md`.
|
|
25
|
+
The approval also freezes the §5 Scope (may touch) + Strategy declarations — the bundle covers them.
|
|
25
26
|
|
|
26
27
|
## The freeze review checklist
|
|
27
28
|
|
|
28
|
-
The human's one minute, aimed. Walk these
|
|
29
|
+
The human's one minute, aimed. Walk these seven before saying yes:
|
|
29
30
|
|
|
30
31
|
- **⚠ flags first** — read the lowest-confidence flags; accept each knowing its cost if wrong.
|
|
31
32
|
The engine refuses an unflagged freeze before build: a frozen §3 with no well-formed
|
|
@@ -34,6 +35,7 @@ The human's one minute, aimed. Walk these six before saying yes:
|
|
|
34
35
|
- **Intent** — does §1 say what you actually want built (and is anything you expected missing)?
|
|
35
36
|
- **Cases** — does every Must and Reject have an observable §2 scenario you care about?
|
|
36
37
|
- **Shape** — glossary names, error codes, additive vs breaking: is THIS the shape to freeze?
|
|
38
|
+
- **Grounded** — does §3 cite anchors that exist in the §0 GROUND map (real files/symbols), not invented ones? `status`/`check` surface this — measure, never block.
|
|
37
39
|
- **Risk** — is this scope high-risk or method-defining? Then require
|
|
38
40
|
`risk: high · autonomy: conservative` in the TASK.md header — the engine refuses an unguarded completion.
|
|
39
41
|
- **Tests** — will §4 go red for the right reason, asserting behavior rather than internals?
|
|
@@ -10,6 +10,21 @@ Pick ONE task-sized slice, restate the tests it must satisfy, implement, run
|
|
|
10
10
|
tests, iterate to green. Keep each batch small enough to review in full — you
|
|
11
11
|
cannot move faster than you can verify.
|
|
12
12
|
|
|
13
|
+
## Declaring the scope of impact (Scope + Strategy)
|
|
14
|
+
|
|
15
|
+
§5 of TASK.md opens with two declarations, drafted WITH the specification bundle
|
|
16
|
+
and frozen by the one §3 approval — never invented mid-build:
|
|
17
|
+
|
|
18
|
+
- **Scope (may touch)** — the allowlist of every file the build may write
|
|
19
|
+
(backticked tokens; grammar in the template comment). During build, needing a
|
|
20
|
+
file outside the declared Scope is a **STOP → change request** back to Specify,
|
|
21
|
+
never improvisation.
|
|
22
|
+
- **Strategy (ordered batches)** — the planned build order. Guidance, not
|
|
23
|
+
enforced: it aims the small-batches loop, it does not gate it.
|
|
24
|
+
|
|
25
|
+
Deferral, named: the engine gate (touched ⊆ declared) lands in the
|
|
26
|
+
`scope-gate-enforce` task — until it ships this section is prose discipline.
|
|
27
|
+
|
|
13
28
|
## The cardinal rule
|
|
14
29
|
|
|
15
30
|
**Never weaken or delete a test to make it pass, and never edit the frozen
|
|
@@ -36,6 +51,7 @@ Never: change a test or the contract; use a package off the allow-list; or push
|
|
|
36
51
|
- [ ] Coverage did not decrease.
|
|
37
52
|
- [ ] No test and no contract modified by the AI.
|
|
38
53
|
- [ ] No dependency outside the allow-list.
|
|
54
|
+
- [ ] No file outside the declared §5 Scope was touched.
|
|
39
55
|
- [ ] Change small enough to review in full.
|
|
40
56
|
</exit_gate>
|
|
41
57
|
|
|
@@ -46,3 +62,9 @@ Book: `docs/07-step-5-build.md`.
|
|
|
46
62
|
|
|
47
63
|
> Under `autonomy: auto` (the default) Build and Verify run together as one dynamic,
|
|
48
64
|
> evidence-auto-gated run — not two manual stops. See `run.md`.
|
|
65
|
+
>
|
|
66
|
+
> **Honest redo.** If the verify gate finds a confirmed cheat (a tamper, or a reported
|
|
67
|
+
> earned-green failure), the task returns HERE for an honest redo — revert the tampered
|
|
68
|
+
> file or de-overfit src, then advance again. This is the bounded self-heal loop (`run.md`),
|
|
69
|
+
> capped: after the cap a confirmed cheat HARD-STOPs to the human. Never weaken a test or
|
|
70
|
+
> edit the frozen contract to pass.
|
|
@@ -47,6 +47,22 @@ Record it in the §6 **Deep checks** block — where each new symbol is called (
|
|
|
47
47
|
search), the dead-code scan result, or the prose you read in full and what it confirmed.
|
|
48
48
|
An unfilled Deep checks block is a **shallow verify**, not a PASS.
|
|
49
49
|
|
|
50
|
+
## Part four — was the green earned?
|
|
51
|
+
|
|
52
|
+
A green suite proves the tests pass — not that the build EARNED them. Three judgment cheats
|
|
53
|
+
pass the unchanged suite without earning it: src overfit to the test fixtures (special-cased
|
|
54
|
+
to the literal inputs, not the general behavior §1 asked for), vacuous asserts (tautological —
|
|
55
|
+
green even against an empty implementation), and real logic stubbed away (the function returns
|
|
56
|
+
a constant the tests happen to accept). These cheats are invisible to the mechanical tamper
|
|
57
|
+
tripwire, which only sees edited files. Score them with an adversarial refute-read: an
|
|
58
|
+
independent reviewer — a subagent under `autonomy: auto` is recommended, the engine never
|
|
59
|
+
spawns one — prompted to argue the green was NOT earned from outside the build context. This
|
|
60
|
+
is the verify-gate, whole-suite specialization of run.md's adversarial verify (see run.md), not
|
|
61
|
+
a new discipline. A confirmed earned-green failure is HARD-STOP-class: never auto-passed, never
|
|
62
|
+
RISK-ACCEPTED — but a first cheat is a chance to redo: a confirmed cheat (mechanical tamper or a
|
|
63
|
+
reported earned-green failure) enters the bounded self-heal loop — it returns to build for an honest
|
|
64
|
+
redo, and only after the loop's cap does it HARD-STOP to the human (the loop lives in run.md).
|
|
65
|
+
|
|
50
66
|
## Record exactly one outcome (no silent pass)
|
|
51
67
|
|
|
52
68
|
When you present this gate to the human, open with the ARC (goal · done · plan) per
|