@delegance/claude-autopilot 5.5.2 → 7.2.0
- package/CHANGELOG.md +1776 -6
- package/README.md +65 -1
- package/bin/_launcher.js +38 -23
- package/dist/src/adapters/council/openai.js +12 -6
- package/dist/src/adapters/deploy/_http.d.ts +43 -0
- package/dist/src/adapters/deploy/_http.js +99 -0
- package/dist/src/adapters/deploy/fly.d.ts +206 -0
- package/dist/src/adapters/deploy/fly.js +696 -0
- package/dist/src/adapters/deploy/index.d.ts +2 -0
- package/dist/src/adapters/deploy/index.js +33 -0
- package/dist/src/adapters/deploy/render.d.ts +181 -0
- package/dist/src/adapters/deploy/render.js +550 -0
- package/dist/src/adapters/deploy/types.d.ts +67 -3
- package/dist/src/adapters/deploy/vercel.d.ts +17 -1
- package/dist/src/adapters/deploy/vercel.js +29 -49
- package/dist/src/adapters/pricing.d.ts +36 -0
- package/dist/src/adapters/pricing.js +40 -0
- package/dist/src/adapters/review-engine/codex.js +10 -7
- package/dist/src/cli/autopilot.d.ts +75 -0
- package/dist/src/cli/autopilot.js +750 -0
- package/dist/src/cli/brainstorm.d.ts +23 -0
- package/dist/src/cli/brainstorm.js +131 -0
- package/dist/src/cli/costs.d.ts +15 -1
- package/dist/src/cli/costs.js +99 -10
- package/dist/src/cli/dashboard/index.d.ts +5 -0
- package/dist/src/cli/dashboard/index.js +49 -0
- package/dist/src/cli/dashboard/login.d.ts +22 -0
- package/dist/src/cli/dashboard/login.js +260 -0
- package/dist/src/cli/dashboard/logout.d.ts +12 -0
- package/dist/src/cli/dashboard/logout.js +45 -0
- package/dist/src/cli/dashboard/status.d.ts +30 -0
- package/dist/src/cli/dashboard/status.js +65 -0
- package/dist/src/cli/dashboard/upload.d.ts +16 -0
- package/dist/src/cli/dashboard/upload.js +48 -0
- package/dist/src/cli/deploy.d.ts +3 -3
- package/dist/src/cli/deploy.js +34 -9
- package/dist/src/cli/engine-flag-deprecation.d.ts +14 -0
- package/dist/src/cli/engine-flag-deprecation.js +20 -0
- package/dist/src/cli/fix.d.ts +18 -0
- package/dist/src/cli/fix.js +105 -11
- package/dist/src/cli/help-text.d.ts +52 -0
- package/dist/src/cli/help-text.js +416 -0
- package/dist/src/cli/implement.d.ts +91 -0
- package/dist/src/cli/implement.js +196 -0
- package/dist/src/cli/index.d.ts +2 -1
- package/dist/src/cli/index.js +774 -245
- package/dist/src/cli/json-envelope.d.ts +187 -0
- package/dist/src/cli/json-envelope.js +270 -0
- package/dist/src/cli/json-mode.d.ts +33 -0
- package/dist/src/cli/json-mode.js +201 -0
- package/dist/src/cli/migrate.d.ts +111 -0
- package/dist/src/cli/migrate.js +305 -0
- package/dist/src/cli/plan.d.ts +81 -0
- package/dist/src/cli/plan.js +149 -0
- package/dist/src/cli/pr.d.ts +106 -0
- package/dist/src/cli/pr.js +191 -19
- package/dist/src/cli/preflight.js +26 -0
- package/dist/src/cli/review.d.ts +27 -0
- package/dist/src/cli/review.js +126 -0
- package/dist/src/cli/runs-watch-renderer.d.ts +45 -0
- package/dist/src/cli/runs-watch-renderer.js +275 -0
- package/dist/src/cli/runs-watch.d.ts +41 -0
- package/dist/src/cli/runs-watch.js +395 -0
- package/dist/src/cli/runs.d.ts +122 -0
- package/dist/src/cli/runs.js +902 -0
- package/dist/src/cli/scaffold.d.ts +39 -0
- package/dist/src/cli/scaffold.js +287 -0
- package/dist/src/cli/scan.d.ts +93 -0
- package/dist/src/cli/scan.js +166 -40
- package/dist/src/cli/setup.d.ts +30 -0
- package/dist/src/cli/setup.js +137 -0
- package/dist/src/cli/spec.d.ts +66 -0
- package/dist/src/cli/spec.js +132 -0
- package/dist/src/cli/validate.d.ts +29 -0
- package/dist/src/cli/validate.js +131 -0
- package/dist/src/core/config/schema.d.ts +9 -0
- package/dist/src/core/config/schema.js +7 -0
- package/dist/src/core/config/types.d.ts +11 -0
- package/dist/src/core/council/runner.d.ts +10 -1
- package/dist/src/core/council/runner.js +25 -3
- package/dist/src/core/council/types.d.ts +7 -0
- package/dist/src/core/errors.d.ts +1 -1
- package/dist/src/core/errors.js +11 -0
- package/dist/src/core/logging/redaction.d.ts +13 -0
- package/dist/src/core/logging/redaction.js +20 -0
- package/dist/src/core/migrate/schema-validator.js +15 -1
- package/dist/src/core/phases/static-rules.d.ts +5 -1
- package/dist/src/core/phases/static-rules.js +2 -5
- package/dist/src/core/run-state/budget.d.ts +88 -0
- package/dist/src/core/run-state/budget.js +141 -0
- package/dist/src/core/run-state/cli-internal.d.ts +21 -0
- package/dist/src/core/run-state/cli-internal.js +174 -0
- package/dist/src/core/run-state/events.d.ts +59 -0
- package/dist/src/core/run-state/events.js +512 -0
- package/dist/src/core/run-state/lock.d.ts +61 -0
- package/dist/src/core/run-state/lock.js +206 -0
- package/dist/src/core/run-state/phase-context.d.ts +60 -0
- package/dist/src/core/run-state/phase-context.js +108 -0
- package/dist/src/core/run-state/phase-registry.d.ts +137 -0
- package/dist/src/core/run-state/phase-registry.js +162 -0
- package/dist/src/core/run-state/phase-runner.d.ts +80 -0
- package/dist/src/core/run-state/phase-runner.js +447 -0
- package/dist/src/core/run-state/provider-readback.d.ts +130 -0
- package/dist/src/core/run-state/provider-readback.js +426 -0
- package/dist/src/core/run-state/replay-decision.d.ts +69 -0
- package/dist/src/core/run-state/replay-decision.js +144 -0
- package/dist/src/core/run-state/resolve-engine.d.ts +45 -0
- package/dist/src/core/run-state/resolve-engine.js +74 -0
- package/dist/src/core/run-state/resume-preflight.d.ts +66 -0
- package/dist/src/core/run-state/resume-preflight.js +116 -0
- package/dist/src/core/run-state/run-phase-with-lifecycle.d.ts +69 -0
- package/dist/src/core/run-state/run-phase-with-lifecycle.js +193 -0
- package/dist/src/core/run-state/runs.d.ts +57 -0
- package/dist/src/core/run-state/runs.js +288 -0
- package/dist/src/core/run-state/snapshot.d.ts +14 -0
- package/dist/src/core/run-state/snapshot.js +114 -0
- package/dist/src/core/run-state/state.d.ts +40 -0
- package/dist/src/core/run-state/state.js +164 -0
- package/dist/src/core/run-state/types.d.ts +284 -0
- package/dist/src/core/run-state/types.js +19 -0
- package/dist/src/core/run-state/ulid.d.ts +11 -0
- package/dist/src/core/run-state/ulid.js +95 -0
- package/dist/src/core/schema-alignment/extractor/index.d.ts +1 -1
- package/dist/src/core/schema-alignment/extractor/index.js +2 -2
- package/dist/src/core/schema-alignment/extractor/prisma.d.ts +13 -1
- package/dist/src/core/schema-alignment/extractor/prisma.js +65 -10
- package/dist/src/core/schema-alignment/git-history.d.ts +19 -0
- package/dist/src/core/schema-alignment/git-history.js +53 -0
- package/dist/src/core/static-rules/rules/brand-tokens.js +2 -2
- package/dist/src/core/static-rules/rules/schema-alignment.js +14 -4
- package/dist/src/dashboard/auto-upload.d.ts +26 -0
- package/dist/src/dashboard/auto-upload.js +107 -0
- package/dist/src/dashboard/config.d.ts +22 -0
- package/dist/src/dashboard/config.js +109 -0
- package/dist/src/dashboard/upload/canonical.d.ts +3 -0
- package/dist/src/dashboard/upload/canonical.js +16 -0
- package/dist/src/dashboard/upload/chain.d.ts +9 -0
- package/dist/src/dashboard/upload/chain.js +27 -0
- package/dist/src/dashboard/upload/snapshot.d.ts +23 -0
- package/dist/src/dashboard/upload/snapshot.js +66 -0
- package/dist/src/dashboard/upload/uploader.d.ts +54 -0
- package/dist/src/dashboard/upload/uploader.js +330 -0
- package/package.json +19 -3
- package/scripts/autoregress.ts +1 -1
- package/scripts/test-runner.mjs +4 -0
- package/skills/claude-autopilot.md +1 -1
- package/skills/make-interfaces-feel-better/SKILL.md +104 -0
- package/skills/simplify-ui/SKILL.md +103 -0
- package/skills/ui/SKILL.md +117 -0
- package/skills/ui-ux-pro-max/SKILL.md +90 -0
package/CHANGELOG.md
CHANGED
|
@@ -1,14 +1,1784 @@
|
|
|
1
|
-
|
|
1
|
+
## Unreleased
|
|
2
|
+
|
|
3
|
+
- v5.6 Phase 7 (docs reconciliation) — pending.
|
|
4
|
+
|
|
5
|
+
## 7.2.0 (2026-05-10)
|
|
6
|
+
|
|
7
|
+
**v7.2.0 — `claude-autopilot scaffold --from-spec <path>`.** Closes
|
|
8
|
+
the biggest remaining day-1 friction the v7.1.6 blank-repo benchmark
|
|
9
|
+
identified. Even with auto-scaffolded `CLAUDE.md` + `.gitignore`
|
|
10
|
+
(v7.1.7), a fresh repo still needs a hand-written `package.json`,
|
|
11
|
+
`tsconfig.json`, and directory skeleton before any feature work
|
|
12
|
+
happens. The new verb collapses that step.
|
|
13
|
+
|
|
14
|
+
**New verb** reads a spec markdown file's `## Files` section and:
|
|
15
|
+
|
|
16
|
+
* Creates listed directories (`mkdir -p`).
|
|
17
|
+
* Creates empty placeholder files for each path in the section.
|
|
18
|
+
* Generates a starter `package.json` (Node 22 ESM defaults +
|
|
19
|
+
hint-merged `bin` / `dependencies` / `scripts` parsed loosely
|
|
20
|
+
from the spec prose).
|
|
21
|
+
* Generates a starter `tsconfig.json` — JS-flavor (`allowJs +
|
|
22
|
+
checkJs + noEmit`) when the spec lists predominantly `.js` files,
|
|
23
|
+
TS-flavor (compiled to `dist/`) for `.ts` files.
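As a rough illustration of the parsing step above, here is a minimal sketch. It assumes the `## Files` section is a bullet list whose paths are wrapped in backticks, and it reads "predominantly `.js`" as a strict majority over `.ts`. The names `parseFilesSection` and `pickTsconfigFlavor` are hypothetical and are not taken from `src/cli/scaffold.ts`.

```typescript
// Hypothetical helpers; the shipped parser may accept other bullet shapes.
function parseFilesSection(specMarkdown: string): string[] {
  const paths: string[] = [];
  let inFiles = false;
  for (const line of specMarkdown.split(/\r?\n/)) {
    const heading = /^(#{1,6})\s+(.*)$/.exec(line);
    if (heading) {
      // Enter on "## Files"; any other heading closes the section.
      const title = (heading[2] ?? "").trim().toLowerCase();
      inFiles = heading[1] === "##" && title === "files";
      continue;
    }
    if (!inFiles) continue;
    // Assumed bullet shape: "- `src/cli.js` optional description".
    const path = /^\s*[-*]\s+`([^`]+)`/.exec(line)?.[1];
    if (path) paths.push(path);
  }
  return paths;
}

type TsconfigFlavor = "js" | "ts";

function pickTsconfigFlavor(paths: string[]): TsconfigFlavor {
  const js = paths.filter((p) => p.endsWith(".js")).length;
  const ts = paths.filter((p) => p.endsWith(".ts")).length;
  // Assumption: "predominantly .js" means more .js than .ts entries.
  return js > ts ? "js" : "ts";
}
```

Ties and specs with no source files fall through to the TS flavor in this sketch; the real verb may decide differently.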
|
|
24
|
+
|
|
25
|
+
**Never overwrites existing files** — operator opted into autopilot,
|
|
26
|
+
not into us nuking their package.json. Reports `· exists` for each
|
|
27
|
+
preserved file. Idempotent: re-running on a partially-scaffolded
|
|
28
|
+
repo only fills the gaps.
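The never-overwrite and idempotency behaviour can be pictured as a planning pass that runs before any write. This is a sketch under assumptions: `planScaffold` and `formatPlan` are hypothetical names, the existence check is injected rather than read from disk, and only the `· exists` marker comes from the notes above (the `+ create` marker is invented for the example).

```typescript
type ScaffoldAction = { path: string; action: "create" | "exists" };

// `exists` is injected so the plan can be computed without touching disk,
// which is also roughly what a dry run needs.
function planScaffold(paths: string[], exists: (p: string) => boolean): ScaffoldAction[] {
  return paths.map((path): ScaffoldAction => ({
    path,
    action: exists(path) ? "exists" : "create",
  }));
}

function formatPlan(plan: ScaffoldAction[]): string[] {
  // "· exists" mirrors the changelog wording; "+ create" is a made-up marker.
  return plan.map((a) => (a.action === "exists" ? `· exists ${a.path}` : `+ create ${a.path}`));
}
```

Running the plan a second time after the `create` actions have been applied yields only `exists` entries, which is the gap-filling property described above.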
|
|
29
|
+
|
|
30
|
+
`--dry-run` flag logs what would happen without writing.
|
|
31
|
+
|
|
32
|
+
**End-to-end smoke**: scaffold from the actual v7.1.6 benchmark
|
|
33
|
+
spec produces a 100%-correct skeleton in ~50ms (3 dirs + 5
|
|
34
|
+
placeholder files + matching package.json bin/deps/scripts).
|
|
35
|
+
|
|
36
|
+
**Out of scope (deferred to v8):**
|
|
37
|
+
|
|
38
|
+
* Per-stack scaffolding (Python `pyproject.toml`, Go `go.mod`,
|
|
39
|
+
Rust `Cargo.toml`). v7.2.0 ships Node ESM only — covers the
|
|
40
|
+
v7.1.6 benchmark stack and the most common starter case.
|
|
41
|
+
* Running `npm install`. Operator picks the package manager.
|
|
42
|
+
|
|
43
|
+
11 new tests (4 parser + 2 builder + 5 end-to-end). 1548 → 1559
|
|
44
|
+
CLI tests; tsc clean; build clean. New verb registered in
|
|
45
|
+
`src/cli/index.ts` + listed in `Pipeline:` help group. Version
|
|
46
|
+
7.1.9 → 7.2.0 (minor bump for new verb surface).
|
|
47
|
+
|
|
48
|
+
## 7.1.9 (2026-05-10)
|
|
49
|
+
|
|
50
|
+
**v7.1.9 — build fix + Generic-stack next-steps hint.** Two
|
|
51
|
+
micro-fixes from the v7.1.8 benchmark re-run.
|
|
52
|
+
|
|
53
|
+
* **`canonicalize` declared at root** (`package.json`). The CLI's
|
|
54
|
+
`src/dashboard/upload/canonical.ts` (RFC 8785 / JCS parity copy
|
|
55
|
+
of `apps/web/lib/upload/canonical.ts`) imports `canonicalize`
|
|
56
|
+
but the module was only declared in `apps/web/package.json`.
|
|
57
|
+
Root build hit `TS2307: Cannot find module 'canonicalize'` even
|
|
58
|
+
though the package was actually installed via npm hoisting. Now
|
|
59
|
+
declared at root — `npm run build` from a fresh clone is clean.
|
|
60
|
+
* **Generic+low-confidence next-steps hint** (`src/cli/setup.ts`).
|
|
61
|
+
The v7.1.8 benchmark re-run on a truly blank repo reported
|
|
62
|
+
"Detected: Generic (low confidence)" with no actionable next
|
|
63
|
+
step. Setup now surfaces a one-liner:
|
|
64
|
+
`npm init -y` → `npx claude-autopilot setup --force`. Skipped
|
|
65
|
+
silently on high-confidence detections (the common case).
|
|
66
|
+
|
|
67
|
+
2 new tests (`tests/setup.test.ts`); 1546 → 1548 CLI tests; tsc
|
|
68
|
+
clean; build clean. Version 7.1.8 → 7.1.9.
|
|
69
|
+
|
|
70
|
+
## 7.1.8 (2026-05-10)
|
|
71
|
+
|
|
72
|
+
**v7.1.8 — blank-repo benchmark re-run on v7.1.7.** Docs-only PR.
|
|
73
|
+
Friction-reduction delta measurement after the v7.1.7 polish PR.
|
|
74
|
+
|
|
75
|
+
**All three v7.1.7 fixes verified end-to-end** on a fresh `git init`
|
|
76
|
+
repo:
|
|
77
|
+
|
|
78
|
+
* `.gitignore` auto-created with `.guardrail-cache/` + `node_modules/`.
|
|
79
|
+
* `CLAUDE.md` auto-scaffolded with detected stack, test command,
|
|
80
|
+
Conventional Commits convention, error class shape, branch
|
|
81
|
+
naming, TODO slots.
|
|
82
|
+
* Deprecation banner deduped per UTC day via
|
|
83
|
+
`~/.claude-autopilot/.deprecation-shown` stamp.
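The per-UTC-day dedup reduces to a small pure decision once the stamp file has been read. A minimal sketch, assuming the stamp holds a bare `YYYY-MM-DD` string and that the `always` / `never` override values behave as their names suggest; `shouldShowDeprecation` is a hypothetical name, not the launcher's actual function.

```typescript
type DeprecationOverride = "always" | "never" | undefined;

function utcDay(now: Date): string {
  // toISOString() is always UTC, so the first 10 characters are YYYY-MM-DD.
  return now.toISOString().slice(0, 10);
}

// `stamp` is the stamp file's contents, or null when the file is missing.
function shouldShowDeprecation(
  stamp: string | null,
  now: Date,
  override?: DeprecationOverride,
): boolean {
  if (override === "always") return true;
  if (override === "never") return false;
  // Show when there is no stamp yet or the stamp is from an earlier UTC day.
  return stamp === null || stamp.trim() !== utcDay(now);
}
```

Because the key is the calendar day rather than the parent process, a fresh shell per git hook no longer resets the decision.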
|
|
84
|
+
|
|
85
|
+
**Friction score: 3 of 6 v7.1.6 friction points closed; 1 partially
|
|
86
|
+
closed; 2 deferred.** Matches v7.1.6 prediction ("would close ~5 of
|
|
87
|
+
6") with minor over-promise.
|
|
88
|
+
|
|
89
|
+
**New friction surfaced:**
|
|
90
|
+
|
|
91
|
+
* Stale `dist/` after merge requires `npm run build` for local
|
|
92
|
+
contributors (invisible to `npm install -g` users).
|
|
93
|
+
* Build hits one stale TS error (`canonicalize` not declared at
|
|
94
|
+
root level) — 4 v7.1.7 helpers compiled, setup ran end-to-end,
|
|
95
|
+
filing as separate followup.
|
|
96
|
+
* `Detected: Generic (low confidence)` on truly blank repos —
|
|
97
|
+
honest but suggests next-step "scaffold a `package.json` first
|
|
98
|
+
for higher-confidence detection."
|
|
99
|
+
|
|
100
|
+
**New recommendations:** suggest stack-scaffold step in `setup`
|
|
101
|
+
next-steps when detection is `Generic` (~20min ship);
|
|
102
|
+
`scaffold --from-spec` verb (deferred from v7.1.6, ~1-day);
|
|
103
|
+
per-stack starter `tsconfig.json` / `pyproject.toml` (~2-4hr per
|
|
104
|
+
stack).
|
|
105
|
+
|
|
106
|
+
**Methodology caveat:** Phase B (impl agent) NOT re-run — wall-clock
|
|
107
|
+
impact is downstream and would need another full agent dispatch to
|
|
108
|
+
measure precisely. The friction-point table tells most of the story.
|
|
109
|
+
|
|
110
|
+
Full report at `docs/benchmarks/2026-05-10-blank-repo-v7.1.7.md`.
|
|
111
|
+
No code change; bumping to 7.1.8 to keep CHANGELOG/version line in
|
|
112
|
+
lockstep with master HEAD.
|
|
113
|
+
|
|
114
|
+
## 7.1.7 (2026-05-10)
|
|
115
|
+
|
|
116
|
+
**v7.1.7 — `setup` verb day-1 polish.** Three fixes from the v7.1.6
|
|
117
|
+
blank-repo benchmark report. Operator-facing improvements; no
|
|
118
|
+
breaking changes; no migration.
|
|
119
|
+
|
|
120
|
+
* **Per-calendar-day deprecation dedup** (`bin/_launcher.js`). The
|
|
121
|
+
v6.3+ stamp was keyed by `process.ppid + tty/pipe` — fine for
|
|
122
|
+
interactive shells, broken for git hooks (fresh shell per hook =
|
|
123
|
+
fresh ppid = stamp re-created every commit, notice printed every
|
|
124
|
+
commit). New stamp at `~/.claude-autopilot/.deprecation-shown`
|
|
125
|
+
contains `YYYY-MM-DD` and dedups by UTC day per machine.
|
|
126
|
+
Override env vars (`CLAUDE_AUTOPILOT_DEPRECATION=always|never`)
|
|
127
|
+
preserved.
|
|
128
|
+
* **Auto-add `node_modules/` + `.guardrail-cache/` to `.gitignore`**
|
|
129
|
+
on `setup` (`src/cli/setup.ts`). New `ensureGitignoreEntries()`
|
|
130
|
+
helper: idempotent (re-running never duplicates), preserves
|
|
131
|
+
existing entries, creates `.gitignore` from scratch if missing.
|
|
132
|
+
* **Auto-scaffold starter `CLAUDE.md`** when one doesn't exist
|
|
133
|
+
(`src/cli/setup.ts`). New `ensureStarterClaudeMd()` helper writes
|
|
134
|
+
~35 lines covering: detected stack + confidence, test command,
|
|
135
|
+
Conventional Commits convention, error class shape, branch naming,
|
|
136
|
+
TODO slots for "patterns to mimic" + "common pitfalls". Closes
|
|
137
|
+
~5 of 6 friction points the benchmark agent reported. Never
|
|
138
|
+
overwrites an existing `CLAUDE.md`.
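The idempotent `.gitignore` update can be sketched as a pure merge over the file's text. This is an illustration only: `mergeGitignoreEntries` is a hypothetical name, and the real `ensureGitignoreEntries()` may differ in signature and in how it treats comments or negated patterns.

```typescript
// `existing` is the current .gitignore text, or null when the file is missing.
function mergeGitignoreEntries(existing: string | null, wanted: string[]): string {
  const lines = existing === null ? [] : existing.split(/\r?\n/);
  // Drop the trailing empty element produced by a final newline.
  if (lines.length > 0 && lines[lines.length - 1] === "") lines.pop();
  const present = new Set(lines.map((l) => l.trim()));
  for (const entry of wanted) {
    if (!present.has(entry)) {
      lines.push(entry); // Append only; existing entries keep their order.
      present.add(entry);
    }
  }
  return lines.length > 0 ? lines.join("\n") + "\n" : "";
}
```

Feeding the output back in as `existing` returns the same text, which is the re-run safety the helper is described as having.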
|
|
139
|
+
|
|
140
|
+
13 new tests (4 setup + 6 launcher + 3 idempotency / overwrite-safety).
|
|
141
|
+
1539 → 1546 CLI tests. tsc clean. Version bump 7.1.6 → 7.1.7 to
|
|
142
|
+
keep CHANGELOG/version line in lockstep with master HEAD.
|
|
143
|
+
|
|
144
|
+
## 7.1.6 (2026-05-09)
|
|
145
|
+
|
|
146
|
+
**v7.1.6 — blank-repo benchmark report.** Docs-only PR. Captures
|
|
147
|
+
the day-1 experience of using `claude-autopilot` on a true `git init`
|
|
148
|
+
repo, end-to-end from "empty directory" to "feature shipped + tests
|
|
149
|
+
passing." Triggered by codex W5 from the autopilot product-direction
|
|
150
|
+
brainstorm.
|
|
151
|
+
|
|
152
|
+
**Headline:** ~17 minutes from `git init` to working MVP (small CLI,
|
|
153
|
+
Node 22 ESM, with a real Anthropic API call). Setup itself is ~6
|
|
154
|
+
seconds. Pre-commit static-rules hook caught accidentally-staged
|
|
155
|
+
secrets on day 1 (real-world value, not theoretical).
|
|
156
|
+
|
|
157
|
+
**Top friction points:** no `CLAUDE.md` scaffolded by `setup`;
|
|
158
|
+
deprecation banner prints on every commit; `.gitignore` doesn't
|
|
159
|
+
auto-add `node_modules/` or `.guardrail-cache/`; no `scaffold
|
|
160
|
+
--from-spec` verb.
|
|
161
|
+
|
|
162
|
+
**Top recommendations:** dedup deprecation banner (~30min ship),
|
|
163
|
+
auto-add cache dirs to `.gitignore` (~10min ship), auto-scaffold
|
|
164
|
+
starter `CLAUDE.md` on `setup` (~2-4hr ship). Fully-autonomous-from-
|
|
165
|
+
blank requires Option C (standalone daemon) work first — flagged as
|
|
166
|
+
v8 dependency.
|
|
167
|
+
|
|
168
|
+
Full report at `docs/benchmarks/2026-05-09-blank-repo.md`. Bumping
|
|
169
|
+
to 7.1.6 to keep CHANGELOG/version line in lockstep with master HEAD.
|
|
170
|
+
|
|
171
|
+
## 7.1.5 (2026-05-09)
|
|
172
|
+
|
|
173
|
+
**v7.1.5 — change-aware CI matrix.** CI infra optimization;
|
|
174
|
+
no application code change; no test additions.
|
|
175
|
+
|
|
176
|
+
The v7.0+ repo runs 6 GitHub Actions workflows on every PR
|
|
177
|
+
(bin smoke ×6 OS×Node + Test Node 22 + Delegance regression +
|
|
178
|
+
tarball check + apps/web typecheck/build/tests + RLS). Many of
|
|
179
|
+
those are irrelevant to PRs that only touch a different layer
|
|
180
|
+
(apps/web-only PRs don't need bin smoke; CLI-only PRs don't
|
|
181
|
+
need apps/web tests; docs-only PRs don't need anything).
|
|
182
|
+
|
|
183
|
+
Each workflow's `pull_request:` trigger now includes a `paths:`
|
|
184
|
+
filter — GitHub Actions skips the workflow entirely on PRs that
|
|
185
|
+
don't touch any matching file:
|
|
186
|
+
|
|
187
|
+
* `ci.yml` (Test Node 22), `bin-parity.yml` (bin smoke ×6),
|
|
188
|
+
`delegance-regression.yml`: triggered by CLI changes (`src/**`,
|
|
189
|
+
`bin/**`, `tests/**` for ci.yml, `scripts/**`, `presets/**`)
|
|
190
|
+
and conservative shared paths (`tsconfig*`, `package.json`,
|
|
191
|
+
`package-lock.json`, the workflow file itself).
|
|
192
|
+
* `web-tests.yml`: triggered by `apps/**`, `tsconfig*`,
|
|
193
|
+
`package.json`, `package-lock.json`, the workflow file.
|
|
194
|
+
* `db-tests.yml`: triggered by `db/**`, `tests/rls/**`,
|
|
195
|
+
`package.json`, `package-lock.json`, the workflow file.
|
|
196
|
+
* `npm-tarball-check.yml`: triggered by anything that affects
|
|
197
|
+
the published artifact (`package.json`, `.npmignore`,
|
|
198
|
+
`package-lock.json`, CLI source).
|
|
199
|
+
|
|
200
|
+
**Codex pass W4 safety net:** `push:` triggers (master + tag
|
|
201
|
+
pushes) deliberately have NO `paths:` filter. Every master merge
|
|
202
|
+
runs the full matrix, catching anything that slipped past the
|
|
203
|
+
PR-level filter (e.g. a config change in a directory we forgot
|
|
204
|
+
to enumerate). The PR-level filter is a latency optimization,
|
|
205
|
+
not a correctness boundary.
|
|
206
|
+
|
|
207
|
+
**Expected effect:** apps/web-only PRs (Phase 5.7-7.1.4 polish
|
|
208
|
+
shape) drop from ~12-15min CI wall clock to ~5-7min. Docs-only
|
|
209
|
+
PRs become a no-op CI run.
|
|
210
|
+
|
|
211
|
+
No package code change; bumping to 7.1.5 to keep CHANGELOG/
|
|
212
|
+
version-line in lockstep with master HEAD.
|
|
213
|
+
|
|
214
|
+
## 7.1.4 (2026-05-09)
|
|
215
|
+
|
|
216
|
+
**v7.1.4 — fix recurring PGRST002 RLS workflow flake.** CI infra
|
|
217
|
+
fix; no application code change; no test additions. Phase 5.1, 5.7,
|
|
218
|
+
and 7.1.3 all hit the same intermittent failure in the RLS negative
|
|
219
|
+
tests workflow:
|
|
2
220
|
|
|
3
|
-
|
|
221
|
+
```
|
|
222
|
+
PGRST002 — Could not query the database for the schema cache. Retrying.
|
|
223
|
+
```
|
|
224
|
+
|
|
225
|
+
PostgREST caches the database schema asynchronously AFTER
|
|
226
|
+
`supabase db reset` returns. The first SDK queries from the test
|
|
227
|
+
runner often arrive before the cache has finished warming, hard-
|
|
228
|
+
failing instead of waiting.
|
|
229
|
+
|
|
230
|
+
Fix: new "Wait for PostgREST schema cache to warm up" workflow step
|
|
231
|
+
between `Apply migrations` and `Run RLS tests`. Polls
|
|
232
|
+
`GET /rest/v1/` (PostgREST OpenAPI doc) up to 60s; succeeds on the
|
|
233
|
+
first response that parses as JSON with an `info` field. Times out
|
|
234
|
+
with diagnostic body if the cache doesn't warm.
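The actual fix lives in a workflow step in `db-tests.yml`; the same wait logic is restated below in TypeScript purely as an illustration. `isSchemaCacheWarm` and `waitForSchemaCache` are hypothetical names, the body fetcher is injected, and the one-second poll interval is an assumption (only the 60s ceiling comes from the notes above).

```typescript
// Ready means: the body parses as JSON and the document has an `info` field.
function isSchemaCacheWarm(body: string): boolean {
  try {
    const doc: unknown = JSON.parse(body);
    return typeof doc === "object" && doc !== null && "info" in doc;
  } catch {
    return false; // Not JSON yet, for example an empty body or an HTML error page.
  }
}

// `fetchBody` is expected to GET /rest/v1/ and resolve with the raw response text.
async function waitForSchemaCache(
  fetchBody: () => Promise<string>,
  timeoutMs = 60_000,
  intervalMs = 1_000, // Assumed interval; the workflow may poll at another rate.
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  let lastBody = "";
  while (Date.now() < deadline) {
    try {
      lastBody = await fetchBody();
      if (isSchemaCacheWarm(lastBody)) return;
    } catch (err) {
      lastBody = String(err); // Keep the failure for the timeout diagnostic.
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`schema cache did not warm within ${timeoutMs}ms; last body: ${lastBody}`);
}
```

A PGRST002 error payload is itself JSON but carries no `info` field, so the predicate keeps polling instead of treating it as ready.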
|
|
235
|
+
|
|
236
|
+
Changes only `.github/workflows/db-tests.yml`. No package code
|
|
237
|
+
change, but bumping to 7.1.4 to keep version-line/CHANGELOG in
|
|
238
|
+
lockstep with master HEAD.
|
|
239
|
+
|
|
240
|
+
## 7.1.3 (2026-05-09)
|
|
241
|
+
|
|
242
|
+
**v7.1.3 — `/api/health/v7-readiness` deploy-verification endpoint.**
|
|
243
|
+
Hosted product (`apps/web/`) only. Operator-facing improvement; no
|
|
244
|
+
breaking changes; no migration.
|
|
245
|
+
|
|
246
|
+
* New `GET /api/health/v7-readiness` route, gated by
|
|
247
|
+
`Authorization: Bearer ${CRON_SECRET}` (constant-time compare via
|
|
248
|
+
`crypto.timingSafeEqual`).
|
|
249
|
+
* Verifies in one HTTP call:
|
|
250
|
+
- `check_membership_status` RPC is present + executable (closes
|
|
251
|
+
codex PR #141 PR-pass WARNING #3 — the Phase 6 migration must
|
|
252
|
+
be applied before deploying any v7.0+ web image, or every
|
|
253
|
+
org-scoped dashboard request returns `check_failed` within 60s).
|
|
254
|
+
- All 12 required env vars are set (Supabase, Stripe, WorkOS,
|
|
255
|
+
JWT/SSO/cookie secrets meeting ≥32-byte minimums where
|
|
256
|
+
applicable).
|
|
257
|
+
* Response: `200 {ok: true, totalChecks, passed, failed: 0, checks}`
|
|
258
|
+
on full pass; `503 {ok: false, ...}` with per-check
|
|
259
|
+
`{name, status, required, message?}` diagnostic on any required
|
|
260
|
+
failure.
|
|
261
|
+
* Operator runbook updated with `curl -fsSL` example for an
|
|
262
|
+
automated deploy-step gate.
|
|
263
|
+
* 8 new tests in `apps/web/__tests__/api/health/v7-readiness.test.ts`
|
|
264
|
+
covering happy path, missing env, too-short secret, RPC missing,
|
|
265
|
+
three auth-failure modes (no header, wrong secret, malformed
|
|
266
|
+
Bearer), and missing CRON_SECRET → 500.
|
|
267
|
+
* 613 → 621 web tests; 1536 CLI unchanged; tsc clean.
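The response contract above can be sketched as a pure summarizer over the per-check results. Assumptions are labeled: `buildReadinessResponse` is a hypothetical name, the `pass` / `fail` status values are invented for the example, and the sketch returns 503 only when a required check fails, which is one reading of "on any required failure".

```typescript
type ReadinessCheck = {
  name: string;
  status: "pass" | "fail"; // Assumed status vocabulary.
  required: boolean;
  message?: string;
};

function buildReadinessResponse(checks: ReadinessCheck[]) {
  const failed = checks.filter((c) => c.status === "fail");
  // Only required failures flip the endpoint to 503 in this sketch.
  const ok = failed.every((c) => !c.required);
  return {
    httpStatus: ok ? 200 : 503,
    body: {
      ok,
      totalChecks: checks.length,
      passed: checks.length - failed.length,
      failed: failed.length,
      checks,
    },
  };
}
```

A deploy step using `curl -fsSL` would then fail on the 503, since `-f` makes curl exit non-zero on HTTP error statuses.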
|
|
268
|
+
|
|
269
|
+
## 7.1.2 (2026-05-09)
|
|
270
|
+
|
|
271
|
+
**v7.1.2 — configurable membership-check TTL.** Hosted product
|
|
272
|
+
(`apps/web/`) only. Operator-facing improvement; no breaking changes;
|
|
273
|
+
no migration.
|
|
274
|
+
|
|
275
|
+
* New optional env var `MEMBERSHIP_CHECK_TTL_SECONDS` overrides the
|
|
276
|
+
default 60s `cao_membership_check` cookie TTL. Bounded `[1, 3600]`.
|
|
277
|
+
* Lower TTL = tighter revocation window (≤N seconds for a disabled
|
|
278
|
+
member to see 403 on next dashboard request) at the cost of more
|
|
279
|
+
`check_membership_status` RPC calls per dashboard navigation.
|
|
280
|
+
* Higher TTL = fewer RPC calls, but lengthens the revocation window
|
|
281
|
+
beyond the v7.0 documented "≤60s revocation latency" guarantee.
|
|
282
|
+
* Invalid values (non-integer, < 1, > 3600) silently fall back to 60
|
|
283
|
+
with a one-shot warn (same pattern as the v7.1.1 PREVIOUS-secret
|
|
284
|
+
validator).
|
|
285
|
+
* 6 new tests in `cookie-hmac.test.ts` cover: default 60 when unset;
|
|
286
|
+
valid integer in range; non-numeric falls back; float falls back;
|
|
287
|
+
out-of-range (0, negative, > 3600) falls back; signed cookie exp
|
|
288
|
+
respects the configured TTL via sign+verify roundtrip.
|
|
289
|
+
* 607 → 613 web tests; 1536 CLI unchanged; tsc clean.
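The bounds and fallback rules for `MEMBERSHIP_CHECK_TTL_SECONDS` can be sketched as a small parser. This is a hedged illustration: `resolveMembershipTtl` is a hypothetical name, and the one-shot behaviour of the warning is left to the injected `warn` callback rather than implemented here.

```typescript
const DEFAULT_TTL_SECONDS = 60;
const MIN_TTL_SECONDS = 1;
const MAX_TTL_SECONDS = 3600;

function resolveMembershipTtl(
  raw: string | undefined,
  warn: (message: string) => void = () => {},
): number {
  if (raw === undefined) return DEFAULT_TTL_SECONDS; // Unset: default, no warning.
  // Digits only: rejects floats ("1.5"), signs ("-5"), and junk ("abc").
  if (!/^\d+$/.test(raw)) {
    warn(`MEMBERSHIP_CHECK_TTL_SECONDS is not an integer; using ${DEFAULT_TTL_SECONDS}`);
    return DEFAULT_TTL_SECONDS;
  }
  const ttl = Number(raw);
  if (ttl < MIN_TTL_SECONDS || ttl > MAX_TTL_SECONDS) {
    warn(`MEMBERSHIP_CHECK_TTL_SECONDS is out of range; using ${DEFAULT_TTL_SECONDS}`);
    return DEFAULT_TTL_SECONDS;
  }
  return ttl;
}
```

Falling back instead of throwing keeps a typo in the env var from taking the dashboard down, at the cost of the misconfiguration being visible only in logs.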
|
|
290
|
+
|
|
291
|
+
## 7.1.1 (2026-05-09)
|
|
292
|
+
|
|
293
|
+
**v7.1.1 — dual-secret rotation for `MEMBERSHIP_CHECK_COOKIE_SECRET`.**
|
|
294
|
+
Hosted product (`apps/web/`) only. Operator-facing improvement;
|
|
295
|
+
no breaking changes; no migration; no new tests fail/skip.
|
|
296
|
+
|
|
297
|
+
* New optional env var `MEMBERSHIP_CHECK_COOKIE_SECRET_PREVIOUS`.
|
|
298
|
+
When set, `verifyMembershipCookie()` tries `CURRENT` first; on
|
|
299
|
+
signature mismatch, tries `PREVIOUS`. New cookies always sign
|
|
300
|
+
with `CURRENT`. Closes the v7.0 runbook follow-up where rotating
|
|
301
|
+
the secret invalidated every outstanding cookie at once = a
|
|
302
|
+
thundering-herd of `check_membership_status` RPC calls on every
|
|
303
|
+
active dashboard session.
|
|
304
|
+
* Operator rotation flow (4 steps) documented in `docs/v7/runbook.md`
|
|
305
|
+
+ `apps/web/.env.example`.
|
|
306
|
+
* `MEMBERSHIP_CHECK_COOKIE_SECRET_PREVIOUS` validation: same
|
|
307
|
+
≥32-byte minimum as `CURRENT`. Malformed/too-short `PREVIOUS`
|
|
308
|
+
is ignored with a one-shot warn — does not break the happy path.
|
|
309
|
+
* 5 new tests in `apps/web/__tests__/lib/middleware/cookie-hmac.test.ts`
|
|
310
|
+
cover: PREVIOUS verifies during rotation; new cookies sign with
|
|
311
|
+
CURRENT not PREVIOUS; forged-third-secret fails even with both;
|
|
312
|
+
PREVIOUS unset behaves identically to v7.1.0; PREVIOUS too short
|
|
313
|
+
is ignored without breaking CURRENT.
|
|
314
|
+
* 602 → 607 web tests; 1536 CLI unchanged; tsc clean.
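The CURRENT-then-PREVIOUS verification order can be sketched with Node's `crypto` module. This is a minimal illustration, not the code in `cookie-hmac.ts`: `sign`, `signaturesMatch`, and `verifyWithRotation` are hypothetical names, the hex encoding is an assumption, and expiry and identity checks on the cookie payload are omitted.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// New cookies are always signed with the CURRENT secret.
function sign(payload: string, secret: string): string {
  return createHmac("sha256", secret).update(payload).digest("hex");
}

function signaturesMatch(a: string, b: string): boolean {
  const bufA = Buffer.from(a, "hex");
  const bufB = Buffer.from(b, "hex");
  // timingSafeEqual throws on unequal lengths, so compare lengths first.
  return bufA.length === bufB.length && timingSafeEqual(bufA, bufB);
}

function verifyWithRotation(
  payload: string,
  signature: string,
  current: string,
  previous?: string,
): boolean {
  if (signaturesMatch(signature, sign(payload, current))) return true;
  // A missing or too-short PREVIOUS is ignored rather than treated as an error.
  if (previous === undefined || Buffer.byteLength(previous) < 32) return false;
  return signaturesMatch(signature, sign(payload, previous));
}
```

During a rotation window, cookies signed under the old secret keep verifying through the `previous` branch, so sessions refresh gradually instead of all at once.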
|
|
315
|
+
|
|
316
|
+
## 7.1.0 (2026-05-09)
|
|
317
|
+
|
|
318
|
+
**v7.1 — symmetric ingest revocation closure.** Hosted product
|
|
319
|
+
(`apps/web/`) only. Closes the JWT-authenticated ingest gap that v7.0
|
|
320
|
+
Phase 6 explicitly deferred: collapses the per-request revocation
|
|
321
|
+
window from ≤15min (the JWT TTL) to **≤1 request** for org-scoped runs.
|
|
322
|
+
|
|
323
|
+
### apps/web — JWT-authenticated ingest membership re-check
|
|
324
|
+
|
|
325
|
+
- New helper `assertActiveMembership(claims)` in
|
|
326
|
+
`apps/web/lib/upload/membership-recheck.ts` — calls the existing
|
|
327
|
+
Phase 6 `check_membership_status` RPC and maps statuses to typed
|
|
328
|
+
errors. Personal runs short-circuit via `!claims.org_id`. Authority
|
|
329
|
+
is `claims.org_id`; the new `mint_status` claim is observability-
|
|
330
|
+
only (codex pass-1 CRITICAL #2 — closed bypass where a v7.0 token
|
|
331
|
+
could skip the check).
|
|
332
|
+
- New orchestrator `verifyTokenAndAssertRunMembership(token, runId,
|
|
333
|
+
supabase)` in `apps/web/lib/upload/auth.ts` — single chokepoint that
|
|
334
|
+
every JWT-authenticated ingest route calls. Combines (1) JWT shape
|
|
335
|
+
+ signature verify, (2) JWT.run_id ↔ route runId consistency,
|
|
336
|
+
(3) persisted runs lookup, (4) JWT.org_id ↔ run.organization_id
|
|
337
|
+
consistency (closes cross-org JWT replay AND personal-shortcut
|
|
338
|
+
bypass — codex pass-3 CRITICAL #2), and (5) per-request membership
|
|
339
|
+
re-check.
|
|
340
|
+
- `PUT /api/runs/:runId/events/:seq` and `POST /api/runs/:runId/finalize`
|
|
341
|
+
both call the orchestrator before any side-effect RPC / Storage
|
|
342
|
+
write. Disabled / inactive / no-membership returns 403; transient
|
|
343
|
+
RPC failure returns retryable 503; opaque 404 for run mismatches
|
|
344
|
+
(no enumeration leakage).
|
|
345
|
+
- `POST /api/upload-session` does its own pre-mint
|
|
346
|
+
`check_membership_status` RPC for org-scoped runs. Non-active
|
|
347
|
+
members get 403 `member_not_active` + `audit_events` row with
|
|
348
|
+
`action: 'ingest.mint_refused'`. No upload session created on
|
|
349
|
+
refusal. RPC failure → 503 (retryable parity with event-write/
|
|
350
|
+
finalize, codex pass-2 WARNING #2).
|
|
351
|
+
- JWT shape: `UploadTokenClaims.org_id` is now `string | null` (verify
|
|
352
|
+
normalizes wire-format `''` → `null`); new optional
|
|
353
|
+
`mint_status: 'active' | 'personal'` claim. `MintInput.mintStatus`
|
|
354
|
+
is required.
|
|
355
|
+
- `verifyUploadToken()` is preserved for the JWT-shape unit tests but
|
|
356
|
+
marked `@deprecated`. Routes under `app/api/runs/**` are blocked
|
|
357
|
+
from importing it directly via ESLint `no-restricted-imports`
|
|
358
|
+
(`apps/web/.eslintrc.json`). Defense-in-depth chokepoint
|
|
359
|
+
(codex pass-3 WARNING #5).
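The ordering of the chokepoint's checks and their HTTP outcomes can be pictured with a synchronous sketch. Everything here is illustrative: `authorizeIngest` is a hypothetical name, step (1) (JWT shape and signature verification) is assumed to have already produced `claims`, the membership lookup is injected instead of calling the RPC, and apart from `member_not_active` the error codes are invented.

```typescript
type IngestClaims = { run_id: string; org_id: string | null };
type RunRow = { id: string; organization_id: string | null };
type MembershipResult = "active" | "disabled" | "inactive" | "no_row" | "rpc_error";
type IngestDecision = { status: 200 | 403 | 404 | 503; code: string };

function authorizeIngest(
  claims: IngestClaims,
  routeRunId: string,
  run: RunRow | null,
  lookupMembership: (orgId: string) => MembershipResult,
): IngestDecision {
  const notFound: IngestDecision = { status: 404, code: "not_found" };
  // (2) The token must have been minted for the run named in the route.
  if (claims.run_id !== routeRunId) return notFound;
  // (3) The run must exist in persisted state.
  if (run === null) return notFound;
  // (4) Token org must equal the persisted run org; null matches null only.
  if (claims.org_id !== run.organization_id) return notFound;
  // Personal run: both sides are null, so there is no membership to re-check.
  if (claims.org_id === null) return { status: 200, code: "ok" };
  // (5) Per-request membership re-check for org-scoped runs.
  const membership = lookupMembership(claims.org_id);
  if (membership === "active") return { status: 200, code: "ok" };
  if (membership === "rpc_error") return { status: 503, code: "membership_check_failed" };
  return { status: 403, code: "member_not_active" };
}
```

Comparing `claims.org_id` against the persisted row before taking the personal shortcut is what stops a token with an empty org from writing into an org-scoped run.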
|
|
360
|
+
|
|
361
|
+
### Tests
|
|
362
|
+
|
|
363
|
+
- 32 new/modified web tests (566 → 598). Coverage: mint-time
|
|
364
|
+
membership snapshot (4), event-write re-check (8 — incl. ordering
|
|
365
|
+
spy + v7.0-shape regression), finalize re-check (4), helper unit
|
|
366
|
+
(10 — status enum + RPC error + personal shortcut + v7.0
|
|
367
|
+
back-compat), end-to-end disable-mid-session (1), identity invariant
|
|
368
|
+
(3), JWT shape (4 modified).
|
|
369
|
+
- `__tests__/_helpers/supabase-stub.ts` adds a
|
|
370
|
+
`check_membership_status` RPC handler that reads from the seeded
|
|
371
|
+
`memberships` table.
|
|
372
|
+
|
|
373
|
+
### Documentation
|
|
374
|
+
|
|
375
|
+
- `docs/v7/breaking-changes.md` — appended "v7.0 → v7.1" section
|
|
376
|
+
covering the rollout (no coordinated cutover; in-flight org-scoped
|
|
377
|
+
tokens enforce immediately).
|
|
378
|
+
- `apps/web/lib/dashboard/auth.ts` — extended the API-key audit
|
|
379
|
+
comment block with the new ingest-API JWT caller list and the
|
|
380
|
+
invariant.
|
|
381
|
+
|
|
382
|
+
### No SQL migration
|
|
383
|
+
|
|
384
|
+
Phase 6's `check_membership_status` RPC is reused verbatim. v7.1 ships
|
|
385
|
+
pure TypeScript + a single test-stub change.
|
|
386
|
+
|
|
387
|
+
## 7.0.0 (2026-05-09)
|
|
388
|
+
|
|
389
|
+
**v7.0 — hosted product MVP cutover.** First major bump since v6.0
|
|
390
|
+
(2026-04-22). Drops the engine-off code path, ships the autopilot.dev
|
|
391
|
+
hosted dashboard MVP, closes the last operational gap in dashboard
|
|
392
|
+
session revocation, and bumps the run-state schema_version to mark the
|
|
393
|
+
v7 era.
|
|
394
|
+
|
|
395
|
+
### Breaking changes (read this first)
|
|
396
|
+
|
|
397
|
+
See [docs/v7/breaking-changes.md](docs/v7/breaking-changes.md) for the
|
|
398
|
+
full migration checklist. The shortlist:
|
|
399
|
+
|
|
400
|
+
- **`--no-engine` removed.** Exits 1 with `invalid_config` if passed.
|
|
401
|
+
The engine is unconditionally on.
|
|
402
|
+
- **`CLAUDE_AUTOPILOT_ENGINE=off` removed (soft).** The env value is
|
|
403
|
+
ignored — engine still runs — but a one-shot stderr deprecation
|
|
404
|
+
banner fires + a `run.warning` event with code `engine_off_removed`
|
|
405
|
+
is emitted into the durable run log. Softer than `--no-engine`
|
|
406
|
+
because env vars in CI are sticky.
|
|
407
|
+
- **`ENGINE_DEFAULT_V6_0` and `ENGINE_DEFAULT_V6_1` exports removed**
|
|
408
|
+
from `src/core/run-state/resolve-engine.ts`. Direct importers must
|
|
409
|
+
replace with literal `true`. `resolveEngineEnabled()` itself is
|
|
410
|
+
preserved for source compatibility but always returns
|
|
411
|
+
`{enabled: true, source: 'default'}`.
|
|
412
|
+
- **`runEngineOff` callback on `runPhaseWithLifecycle` is preserved as
|
|
413
|
+
optional**, but the helper NEVER invokes it in v7.0. New call sites
|
|
414
|
+
should omit it.
|
|
415
|
+
- **`RUN_STATE_SCHEMA_VERSION` bumped 1 → 2.** v6.x runs are still
|
|
416
|
+
readable on v7 (`MIN_SUPPORTED` stays at 1). v6 binaries reading v7
|
|
417
|
+
runs hit a `corrupted_state` error with a "downgrade resume is not
|
|
418
|
+
supported" hint + `[1..1]` range.
|
|
419
|
+
- **`--engine` becomes a no-op shim** with one-shot per-process
|
|
420
|
+
stderr deprecation banner. Flag preserved so existing scripts don't
|
|
421
|
+
break; remove at your leisure (slated for v8).
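The schema-version rule in the list above (v7 reads versions 1 and 2, a v6 binary with range `[1..1]` rejects version 2) can be sketched as a range check. `checkSchemaVersion` and the exact hint wording are hypothetical; only the `corrupted_state` code and the "downgrade resume is not supported" phrase come from the notes.

```typescript
type VersionCheck =
  | { ok: true }
  | { ok: false; code: "corrupted_state"; hint: string };

// `min` and `max` are the binary's supported range, e.g. [1..2] on v7, [1..1] on v6.
function checkSchemaVersion(found: number, min: number, max: number): VersionCheck {
  if (found >= min && found <= max) return { ok: true };
  const hint =
    found > max
      ? `downgrade resume is not supported (supported range [${min}..${max}])`
      : `run state is older than the supported range [${min}..${max}]`;
  return { ok: false, code: "corrupted_state", hint };
}
```

Keeping the minimum at 1 is what lets v7 resume v6.x runs, while the bumped maximum is what makes older binaries refuse v7 runs instead of misreading them.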

### apps/web — real-time membership revocation

- New middleware extension on `/dashboard/**` and `/api/dashboard/**`.
  Verifies the `cao_active_org` cookie + the HMAC-signed
  `cao_membership_check` cookie cache; on miss/expired/wrong-identity,
  calls the new `check_membership_status(p_org_id, p_user_id)` RPC
  (1.5s timeout, fail-closed on error).
- Worst-case revocation window collapses from ≤1h (= access-token
  expiry, the v6 baseline) to ≤60s (= cookie cache TTL).
- New env var: `MEMBERSHIP_CHECK_COOKIE_SECRET` (≥32 bytes;
  `openssl rand -hex 32`). Lazy/runtime validation — `next build` in
  CI without the secret won't crash; middleware fails closed at
  request time if missing.
- Middleware runtime explicitly set to `nodejs` (was Edge default).
  Required for `node:crypto` HMAC + `crypto.timingSafeEqual`.
- New page: `/access-revoked?reason=<code>` (Server Component, NOT
  auth-gated, does NOT auto-forward authenticated users to avoid
  redirect loops). Renders one of four reasons with a Sign-out form.
- Status → reason mapping table is the single source of truth (codex
  pass-3 WARNING #5):
  - `disabled` → `member_disabled`
  - `inactive` / `invite_pending` → `member_inactive`
  - `no_row` → `no_membership`
  - RPC error / timeout → `check_failed`
- New SQL migration: `data/deltas/20260509200000_phase6_check_membership_rpc.sql`.
  `SECURITY INVOKER` (NOT DEFINER per codex pass-2 WARNING #5 +
  pass-3 WARNING #2 — `service_role` bypasses RLS already, so DEFINER
  would only widen blast radius). REVOKE'd from PUBLIC/anon/authenticated;
  GRANT EXECUTE to `service_role` only.
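
A minimal sketch of how an HMAC-signed cookie cache like `cao_membership_check` could be encoded and verified with `node:crypto`. The payload layout, the function names, and the `payload.mac` wire format are assumptions for illustration; the changelog only states that the cookie is HMAC-signed, compared with `crypto.timingSafeEqual`, and cached for 60 seconds.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical payload shape; the real cookie layout is not shown in the changelog.
interface MembershipCheck {
  orgId: string;
  userId: string;
  status: string;
  expiresAt: number; // epoch milliseconds
}

const TTL_MS = 60_000; // 60s cache window, per the changelog

function sign(payload: string, secret: string): string {
  return createHmac("sha256", secret).update(payload).digest("base64url");
}

function encodeCheck(check: MembershipCheck, secret: string): string {
  const payload = Buffer.from(JSON.stringify(check)).toString("base64url");
  return `${payload}.${sign(payload, secret)}`;
}

// Returns the cached check, or null on miss, tampering, expiry, or wrong identity.
// A null result means the caller should fall back to the membership RPC.
function decodeCheck(
  cookie: string | undefined,
  secret: string,
  orgId: string,
  userId: string,
  now: number,
): MembershipCheck | null {
  if (!cookie) return null;
  const [payload, mac] = cookie.split(".");
  if (!payload || !mac) return null;
  const expected = Buffer.from(sign(payload, secret));
  const actual = Buffer.from(mac);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (expected.length !== actual.length || !timingSafeEqual(expected, actual)) return null;
  const check = JSON.parse(Buffer.from(payload, "base64url").toString("utf8")) as MembershipCheck;
  if (check.orgId !== orgId || check.userId !== userId) return null;
  if (check.expiresAt <= now) return null;
  return check;
}
```

Binding the org and user IDs into the signed payload is what lets the verifier treat a cookie minted for a different identity as a cache miss rather than a hit.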

### Deferred to v7.1

- `MEMBERSHIP_CHECK_TTL_SECONDS` env var to let enterprise customers
  tighten the 60s cache window.
- Server-side cache invalidation on `change_member_role` /
  `disable_member` (would tighten role-change visibility from ≤60s to
  immediate).
- Phase 2.2 ingest API JWT mint embeds `mint_membership_status` so
  finalize/event endpoints can refuse disabled members within the
  ≤30min JWT TTL.

### Documentation

- New: `docs/v7/breaking-changes.md` — explicit v6 → v7 migration
  checklist.
- New: `docs/v7/runbook.md` — production deployment runbook for the
  hosted product (Vercel env vars grouped by purpose, WorkOS dashboard
  hookups, Stripe products + webhook config, cron secret rotation,
  first-deploy checklist).
- README — new "Hosted product (v7)" section pointing at autopilot.dev,
  install snippet updated to
  `npm install -g @delegance/claude-autopilot@latest`.
- `docs/v6/migration-guide.md` — appended v6.2.x → v7.0 section.

### CI / publishing

- `.github/workflows/ci.yml` now tags pushes matching
  `v[0-9]+.[0-9]+.[0-9]+` (no suffix) with `--tag latest`; everything
  else stays `--tag next`. `package.json` `publishConfig.tag` stays at
  `next` as a hand-publish fallback only — the workflow is the source
  of truth.
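
The tag rule above can be expressed as a small pure function. This is a sketch, not the workflow's actual implementation: `distTagFor` is a hypothetical name, and the dots are escaped here so they match a literal `.`, whereas the pattern quoted in the changelog leaves them bare.

```typescript
// Hypothetical helper mirroring the publishing rule: plain vX.Y.Z tags go to
// `latest`, anything with a suffix (pre-releases and the like) goes to `next`.
const STABLE_TAG = /^v[0-9]+\.[0-9]+\.[0-9]+$/;

function distTagFor(gitTag: string): "latest" | "next" {
  return STABLE_TAG.test(gitTag) ? "latest" : "next";
}
```

For example, `distTagFor("v7.0.0")` selects `latest`, while `distTagFor("v6.3.0-pre.13")` selects `next`.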

### Phase rollup (v7.0 cycle)

- **Phase 1** (schema/RLS) — multi-tenant Postgres + RLS policies for
  the hosted product.
- **Phase 2.1** (Next.js scaffold) — `apps/web/` workspace, Vercel
  deploy.
- **Phase 2.2** (ingest API) — signed-session JWT pipeline for
  CLI → dashboard run uploads.
- **Phase 2.3** (CLI dashboard verbs) —
  `dashboard {login,logout,status,upload}` + cli-auth loopback OAuth.
- **Phase 3** (Stripe) — entitlements, tiered pricing, webhook.
- **Phase 4** (dashboard UI + cli-auth hardening) — homepage, auth,
  CSP-locked /cli-auth.
- **Phases 5.1-5.4** (org admin / WorkOS setup) — members, audit, cost,
  per-tenant SSO connection management.
- **Phase 5.6** (WorkOS sign-in) — domain verification, SSO
  enforcement chokepoint.
- **Phase 5.7** (admin lifecycle) — disable_member, sso_disconnect,
  enable_member, last-owner race protection.
- **Phase 5.8** (lifecycle gap closure) — disabled-API-key
  authorization fix + Vercel cron for cleanup_expired_sso_states.
- **Phase 6** (this release) — engine-off removal, schema bump,
  real-time membership revocation, runbook, breaking-changes docs.

### Tests

- 1500+ existing CLI tests pass (engine-off tests collapsed to
  always-on; net delta near zero).
- 510 → 566 web tests (+56 across cookie-hmac, check-membership, RPC
  privilege grep, middleware revocation surface, response composition,
  matcher, integration).
- tsc clean across both `@delegance/claude-autopilot` and
  `@delegance/claude-autopilot-web`.

## 6.3.0-pre.13 (2026-05-09)

**v7.0 Phase 5.8 — Lifecycle gap closure.** Closes the two known gaps from Phase 5.7:

1. **Disabled-API-key authorization fix.** The Phase 2.2 `upload-session` and Phase 4 `artifact` routes had `let allowed = run.user_id === auth.userId` as the first authorization check. This allowed a member who got disabled AFTER creating an org-scoped run to keep uploading via their API key. Both routes now ALWAYS require active membership when `run.organization_id` is set, regardless of ownership. Personal (un-org-scoped) runs still use the ownership check. Regression test (`__tests__/api/dashboard/runs/disabled-api-key.test.ts`, 4 cases) locks this in.
2. **Vercel cron wiring for `cleanup_expired_sso_states` RPC.** New `GET /api/cron/cleanup-expired-sso-state` route (Vercel cron-secret-gated; rejects any caller without `Authorization: Bearer ${CRON_SECRET}`). Schedule `0 3 * * *` (daily 03:00 UTC) added to `vercel.json`. Calls the Phase 5.7 RPC with default args (24h state age, 30d event age). 4-test coverage (auth happy/fail paths + missing env).
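
The authorization fix in item 1 can be sketched as a decision function. The types and the name `canAccessRun` are hypothetical, and the sketch assumes active membership is the deciding check for org-scoped runs; the real routes may layer further role checks on top.

```typescript
// Hypothetical shapes; only `user_id` and `organization_id` appear in the changelog.
interface Run {
  user_id: string;
  organization_id: string | null;
}

type MembershipStatus = "active" | "disabled" | "inactive" | "invite_pending" | null;

// Org-scoped runs: ownership alone is never enough, membership must be active.
// Personal runs: fall back to the plain ownership check.
function canAccessRun(run: Run, authUserId: string, membership: MembershipStatus): boolean {
  if (run.organization_id !== null) {
    return membership === "active";
  }
  return run.user_id === authUserId;
}
```

With the pre-fix logic, a disabled member who owned an org-scoped run passed the ownership test first; here that case returns `false`.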

New env: `CRON_SECRET` (Vercel sets automatically on production cron-attached projects; local-dev override via `.env.local`). Documented in `.env.example`.

Tests: 502 → 510 web. tsc clean.

## 6.3.0-pre.12 (2026-05-09)

**v7.0 Phase 5.7 — Admin lifecycle controls + session revocation.** Closes the lifecycle/revocation gap that Phases 5.4 and 5.6 explicitly deferred.

Three lifecycle controls:

1. **Admin disable-user** — `POST /api/dashboard/orgs/:orgId/members/:userId/disable` flips `memberships.status='disabled'`, captures `disabled_at`/`disabled_by`, deletes `auth.refresh_tokens` for the user. Existing access tokens expire ≤1h (Supabase default; documented in spec). Idempotent on already-disabled (returns `noop:true`, no duplicate audit, no duplicate revocation). Owner-protection (admin cannot disable owner) + last-owner guard.
2. **SSO disconnect cascade** — `apply_workos_event(connection.deleted)` set-based DELETE of refresh tokens for org members (status active OR disabled per codex plan-pass WARNING #1) with verified-domain emails. Audit metadata captures `cascadeRevokedUserCount` + `cascadeRevokedTokenCount` (no user IDs per plan-pass WARNING #5).
3. **`cleanup_expired_sso_states` RPC** — service-role only, called via `scripts/cleanup-expired-sso-state.ts` (no HTTP route per codex pass-1 CRITICAL #3). Phase 6 wires a cron.

Migration `data/deltas/20260509140000_phase5_7_lifecycle.sql`:
- ALTER `memberships.status` CHECK extended with `'disabled'` + `disabled_at`/`disabled_by` columns.
- 4 new SECURITY DEFINER RPCs (REVOKE FROM PUBLIC,anon,authenticated; GRANT TO service_role): `revoke_user_sessions`, `disable_member`, `enable_member`, `cleanup_expired_sso_states`.
- 2 RPC REPLACEs: `record_workos_sign_in` now refuses `member_disabled` / `member_inactive` / `invite_pending` (codex pass-2 WARNING #1); `apply_workos_event` adds set-based cascade DELETE on `connection.deleted`.

Surfaces:
- `POST /api/dashboard/orgs/:orgId/members/:userId/disable` (admin/owner-gated).
- `POST /api/dashboard/orgs/:orgId/members/:userId/enable` (admin/owner-gated, symmetric owner protection — only owners can re-enable owners per pass-2 WARNING #3).
- `GET /api/auth/sso/callback` modified to redirect 302 → `/login/sso?reason={member_disabled|member_inactive|invite_pending}` instead of returning 403 JSON.

`/login/sso` page renders 3 new banner reasons. `lib/dashboard/membership-guard.ts` MAP gains 10 new error codes. `package.json` 6.3.0-pre.11 → 6.3.0-pre.12.

Tests: 6 new test files (49 tests). disable.test.ts (11), enable.test.ts (4), webhook-cascade.test.ts (5), sso-signin-phase5-7.test.ts (4), phase5-7-privilege.test.ts (16 grep assertions), cleanup-expired-sso-state.test.ts (4), disabled-user-jwt.test.ts (4 — codex plan-pass CRITICAL #2 regression: proves disabled member with still-valid JWT can't access dashboard routes via 4 representative paths). 451 → 500 web tests. tsc clean.

**Known gaps (Phase 5.8):**
- API keys (Phase 2.3) are user-scoped not org-scoped; disabling membership in org A doesn't auto-revoke. Phase 5.8 will add a membership-active check in the API-key auth helper.
- Access-token expiry is the upper bound on revocation latency (≤1h Supabase default). Real-time revocation requires a request-time denylist + middleware (Phase 6).
- Cleanup script not yet cron-scheduled (Phase 6).

**Codex passes folded:** spec pass-1 (3C+5W+2N), pass-2 (1C+6W), plan-pass (2C+6W+2N). Highlights: dropped global API-key revocation due to cross-tenant blast (gap explicitly documented + deferred); cascade scope includes `'disabled'` per plan WARNING #1; audit metadata drops user IDs sample per plan WARNING #5; explicit disabled-user-JWT regression test proves spec's enforcement-audit table is correct.

## 6.3.0-pre.11 (2026-05-09)

**v7.0 Phase 5.6 — WorkOS SSO sign-in flow.** End-to-end SSO sign-in built on the Phase 5.4 foundation. Three sub-features that ship together (any subset is unusable):

- **Domain claim with DNS TXT challenge.** Admin-gated `POST/DELETE /api/dashboard/orgs/:orgId/sso/domains` + `POST .../verify`. Codex pass-1 CRITICAL #1 — `ever_verified` flag + unique partial index on `(lower(domain)) WHERE ever_verified=TRUE` blocks revoke-then-takeover by another org.
- **Sign-in flow.** Public `POST /api/auth/sso/start` (email-only — `orgId`-mode removed for anti-enumeration per codex pass-2 WARNING #8) → `GET /api/auth/sso/callback`. State binding (codex pass-2 CRITICAL #2): single canonical protocol — cookie holds HMAC-signed `{stateId, nonce}`, WorkOS state param = stateId only, server-stored `sso_authentication_states` row + atomic `consume_sso_authentication_state` RPC validates `(stateId, sha256(nonce))` + workos org/connection match. Session minted via admin-mediated magic link (codex pass-1 CRITICAL #4 — `verifyOtp` uses `token_hash` not `token`); session-user-mismatch verification revokes + audits + 500.
- **`sso_required` toggle.** Owner-only `PATCH /api/dashboard/orgs/:orgId/sso/required`. Asymmetric guard (codex pass-1 WARNING #7): turning OFF always allowed; turning ON requires active SSO. UI banner per codex pass-2 NOTE #2 explains the asymmetric state.

Single chokepoint enforcement: `enforceSsoRequired()` helper called from `/api/auth/callback` after every Google/magic-link `exchangeCodeForSession`. Sign-in surface registry table in spec documents the auth boundary.

Identity link (codex pass-1 WARNING #6): `workos_user_identities` table preserves `(workos_user_id, workos_organization_id) → user_id` mapping so future sign-ins re-use the same Supabase user even if IdP email changes. Magic link minted with the linked Supabase user's CURRENT email (looked up via `auth.admin.getUserById`), not the WorkOS profile email.

Migration `data/deltas/20260509120000_phase5_6_workos_signin.sql`:
- ALTER `organization_settings` ADD `sso_required BOOLEAN DEFAULT FALSE`.
- 3 new tables (`organization_domain_claims`, `sso_authentication_states`, `workos_user_identities`) with RLS + service-role grants.
- 6 SECURITY DEFINER RPCs: `claim_domain`, `mark_domain_verified`, `revoke_domain_claim`, `set_sso_required`, `consume_sso_authentication_state` (atomic UPDATE...RETURNING per codex plan-pass WARNING #5), `record_workos_sign_in` (verified-domain match required per codex pass-1 CRITICAL #3). All REVOKE FROM PUBLIC,anon,authenticated; GRANT TO service_role.

New deps: `tldts` (maintained PSL package per codex pass-1 NOTE #1).
New env vars: `SSO_STATE_SIGNING_SECRET` (≥32 bytes, module-load validation per codex plan-pass WARNING #4), `WORKOS_CLIENT_ID` (required by `workos.sso.getAuthorizationUrl`).

Helpers:
- `lib/dns/normalize-domain.ts` — `normalizeDomain` + `normalizeEmailDomain` (IDN, public-suffix-aware) used by every domain-touching surface.
- `lib/dns/verify-txt.ts` — `Promise.race`-bounded TXT lookup (codex pass-2 WARNING #4 — `node:dns/promises.resolveTxt` doesn't honor AbortSignal).
- `lib/auth/enforce-sso-required.ts` — sign-in surface chokepoint.
- `lib/workos/sign-in.ts` — `getSsoStateSigningSecret` (length-validated singleton), `signStateCookie` / `parseStateCookie` (HMAC), `buildAuthorizeUrl` (passes clientId per codex plan-pass CRITICAL #3).
- `lib/dashboard/membership-guard.ts` MAP gains 13 new error codes.
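
The `Promise.race`-bounded TXT lookup mentioned in the helpers list could look roughly like the following. This is a sketch under stated assumptions: `withTimeout` and `hasTxtToken` are hypothetical names, the 3-second default is invented for illustration, and the comparison assumes the challenge token is the entire TXT record value.

```typescript
import { resolveTxt } from "node:dns/promises";

// Races `work` against a timer. The underlying work is not cancelled (resolveTxt
// cannot be aborted), but the caller stops waiting once the deadline passes.
async function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("timeout")), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    clearTimeout(timer); // avoid keeping the event loop alive after a fast answer
  }
}

// Hypothetical check: true when any TXT record on `host` equals `token`.
// resolveTxt returns each record as an array of chunks, so join before comparing.
async function hasTxtToken(host: string, token: string, ms = 3000): Promise<boolean> {
  const records = await withTimeout(resolveTxt(host), ms);
  return records.some((chunks) => chunks.join("") === token);
}
```

The race only bounds how long the request waits; the DNS query itself keeps running in the background until the resolver gives up.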

UI:
- `/login/sso` page + `<SsoSignInForm>` client component.
- `<SsoDomainsCard>` + `<SsoRequiredToggle>` embedded in admin SSO page (toggle renders even when SSO inactive per codex pass-1 WARNING #7).

Tests: 5 new test files (54 tests). domains.test.ts (11), required.test.ts (4), start.test.ts (5), callback.test.ts (10), sso-signin-privilege.test.ts (13), normalize-domain.test.ts (19), verify-txt.test.ts (6), enforce-sso-required.test.ts (7), sign-in.test.ts (11). Stub extensions for 7 new RPCs (`claim_domain`, `mark_domain_verified`, `revoke_domain_claim`, `set_sso_required`, `consume_sso_authentication_state`, `record_workos_sign_in`, `audit_append`) + 3 new tables + `auth.admin.{getUserById,createUser,generateLink,signOut}` + `auth.verifyOtp` mocks.

## 6.3.0-pre.10 (2026-05-08)

**v7.0 Phase 5.4 — WorkOS SSO setup.** Foundational SSO wiring: server-owned WorkOS organization correlation, admin-gated portal link, signature-verified lifecycle webhook, owner-gated disconnect.

New env vars: `WORKOS_API_KEY`, `WORKOS_WEBHOOK_SECRET`.

Migration `data/deltas/20260508180000_phase5_4_workos_setup.sql`:
- ALTER `organization_settings` adds 7 SSO columns (workos_organization_id, workos_connection_id, sso_connection_status, sso_connected_at, sso_disabled_at, sso_last_workos_event_at, sso_last_workos_event_id) + unique partial indexes on workos_organization_id and workos_connection_id.
- New `processed_workos_events` ledger with claim/lease/complete columns (status, processing_started_at, locked_until, attempt_count) — enables idempotent webhook retry.
- Three SECURITY DEFINER RPCs (REVOKE FROM PUBLIC,anon,authenticated; GRANT service_role): `record_sso_setup_initiated` (admin-gated, raises `workos_org_already_bound` if a different active WorkOS org would be swapped), `apply_workos_event` (claim/lease/complete + lifecycle ordering via sso_last_workos_event_at + state transition + audit append in one txn — connection.deleted always wins over older updated), `disable_sso_connection` (owner-only soft-disable).

Surfaces:
- `POST /api/dashboard/orgs/:orgId/sso/setup` — 6-step admin-gated portal-link sequence. Server-creates the WorkOS org via `externalId=orgId` so correlation is server-owned; idempotent on retry. Returns `{ portalUrl, workosOrganizationId }` with `Cache-Control: private, no-store`.
- `DELETE /api/dashboard/orgs/:orgId/sso` — owner-only two-step disconnect (RPC sets status='disabled'; route then calls `workos.sso.deleteConnection`; failure non-fatal — eventual `connection.deleted` webhook clears connection_id via apply_workos_event).
- `POST /api/workos/webhook` — runtime nodejs, raw `req.text()` body, HMAC verified via `workos.webhooks.constructEvent` (5-min tolerance). Maps connection.activated/deactivated/deleted (and dsync.* variants) through apply_workos_event RPC. 401 on bad signature, 500 on RPC error so WorkOS retries.
- `/dashboard/admin/sso` page (owner-only, 404 otherwise) + `<SsoSetupCard>` client component.

Helpers:
- `lib/workos/client.ts` — lazy `getWorkOS()` singleton + async `verifyWorkOSSignature()` wrapper (returns `{ok, event} | {ok:false, reason}`).
- `lib/dashboard/membership-guard.ts` MAP gains `workos_org_already_bound: 422`, `bad_workos_org_id: 422`, `webhook_signature_invalid: 401`.

Sidebar: admin layout adds "SSO" link.

Tests: 5 new test files (40 tests). setup.test.ts (11), disconnect.test.ts (6), webhook.test.ts (6), client.test.ts (6), sso-privilege.test.ts (11 — REVOKE/GRANT, SECURITY DEFINER, schema-qualified refs, claim/lease/complete columns, lifecycle handlers). Stub extensions for `record_sso_setup_initiated`, `apply_workos_event`, `disable_sso_connection` RPCs + `processed_workos_events` table behavior.

## 6.3.0-pre.9 (2026-05-08)

**v7.0 Phase 5.3 — Org switcher.** Replaces the "first admin/owner membership" hack across `/dashboard` + `/dashboard/admin/*` with a real org switcher backed by an HTTP-only cookie.

- New: `POST /api/dashboard/active-org` sets `cao_active_org` cookie (HttpOnly Secure SameSite=Lax 14d). Body `{ orgId }` validates caller is active member; `{ orgId: null }` clears.
- New: `lib/dashboard/active-org.ts` exports `resolveActiveOrg(svc, userId)` (cookie → first-membership fallback) and `listActiveOrgs(svc, userId)` (with names + roles).
- New: `<OrgSwitcher>` client component in dashboard sidebar (only shows when caller has 2+ active memberships).
- Modified: `/dashboard/layout.tsx`, `/dashboard/page.tsx`, `/dashboard/billing/page.tsx`, `/dashboard/admin/layout.tsx` all now consult `resolveActiveOrg` instead of `memberships[0]`.
- Admin layout cookie restricted to admin/owner orgs — cannot escalate a member-only org into the admin surface.
- 11 new tests (6 backend route + 5 helper). Stale-cookie test asserts the membership check rejects removed members.
- No new env vars, no migration.

## 6.3.0-pre.8 (2026-05-08)

**v7.0 Phase 5.2 — Audit log viewer + cost reporting (CSV export).** Closes the audit half of the original Phase 5 scope.

New surfaces:
1. `/dashboard/admin/audit` — server-rendered, role-gated. Paginated audit log with single-action filter, cursor-based pagination, prev_hash/this_hash exposed for chain-replay debugging.
2. `/dashboard/admin/cost` — owner/admin only. Per-user cost breakdown for a YYYY-MM period, default current UTC month. Download CSV button.

3 new API routes (all under `/api/dashboard/orgs/:orgId/`):
- `GET /audit` — list_audit_events RPC; cursor decode + ISO since/until validation route-side; nextCursor base64-re-encoded
- `GET /cost` — org_cost_report RPC; period response normalized to `{ since, until, sinceTs, untilTs }`
- `GET /cost.csv` — same RPC, formats as RFC 4180 CSV (CRLF, UTF-8 no BOM, double-quote escape); filename `cost-<orgId>-<since>-<until>.csv` (no org-name interpolation)

2 SECURITY DEFINER Postgres RPCs in `data/deltas/20260508160000_phase5_2_audit_cost_rpcs.sql`:
- `list_audit_events` — keyset pagination on `(occurred_at DESC, id DESC)` with index `audit_events_org_keyset_idx`. LEFT JOIN auth.users for actor_email.
- `org_cost_report` — aggregates runs by user_id; coalesce-in-coalesce-out NULL safety; LEFT JOIN auth.users.
- Both `SECURITY DEFINER SET search_path = public, audit, auth, pg_temp`. `REVOKE ALL FROM PUBLIC, anon, authenticated; GRANT EXECUTE TO service_role` only.

3 new helpers:
- `lib/dashboard/period.ts` — YYYY-MM parser converting `since/until` to (sinceTs, untilTs exclusive). UTC. Default to current month when both null.
- `lib/dashboard/cost-csv.ts` — RFC 4180 encoder + safe filename builder (validates against `[a-zA-Z0-9._-]`).
- `lib/dashboard/audit-cursor.ts` — base64 JSON cursor encode/decode + ISO 8601 UTC validator.
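
As an illustration of the CSV helper described above, here is a minimal RFC 4180-style encoder with a filename guard. The function names and the column names in the usage note are hypothetical; the CRLF line endings, double-quote escaping, filename pattern, and allowed character class come from the changelog.

```typescript
type Cell = string | number | null;

// Quote a field only when it contains a comma, double quote, CR, or LF;
// embedded double quotes are escaped by doubling them.
function csvField(value: Cell): string {
  const text = value === null ? "" : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Rows are joined with CRLF and the output ends with a trailing CRLF.
function toCsv(header: string[], rows: Cell[][]): string {
  const all: Cell[][] = [header, ...rows];
  return all.map((row) => row.map(csvField).join(",")).join("\r\n") + "\r\n";
}

// Builds cost-<orgId>-<since>-<until>.csv and rejects anything outside
// [a-zA-Z0-9._-], which blocks path separators and header injection.
function costFilename(orgId: string, since: string, until: string): string {
  const name = `cost-${orgId}-${since}-${until}.csv`;
  if (!/^[a-zA-Z0-9._-]+$/.test(name)) throw new Error("unsafe filename");
  return name;
}
```

For instance, `toCsv(["email", "cost_usd"], [["a@x.io", 1.5]])` yields two CRLF-terminated lines, and `costFilename("../etc", "2026-05", "2026-06")` throws because of the slash.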

Codex passes folded:
- Spec pass 1 (3 CRITICAL: cost period semantics, runs.created_at vs nonexistent occurred_at, CSV filename injection + 7 WARNING + 2 NOTE)
- Spec pass 2 (2 CRITICAL: filename surface contradiction, deployment target clarification + 7 WARNING: cursor validation in route, error contract, JSON shape unification, audit period parsing, cache headers + 2 NOTE)
- Plan pass (2 CRITICAL: schema-qualify audit.events + SET search_path lock, runs.cost_usd ordering guard + 6 WARNING)

All routes return `Cache-Control: private, no-store`. All pages declare `force-dynamic`.

39 new tests (Phase 5.1's 237 → **276 web tests**). Helper unit tests: 21. Backend route tests: 32. Integration: 4. Privilege: 7 (incl. SECURITY DEFINER + search_path + schema-qualification + Phase 4 dependency check).

**Operator follow-up:** run `/migrate` to apply `20260508160000_phase5_2_audit_cost_rpcs.sql` against dev → QA → prod.

## 6.3.0-pre.7 (2026-05-08)

**v7.0 Phase 5.1 — Members management + RBAC enforcement.** First Org-tier user-visible surface. After Phase 5.1 ships, an Org-tier admin (small/mid Stripe plans, or free org owner) can actually manage their team.

New surfaces:
1. `/dashboard/admin/members` — server-rendered, role-gated. Lists active members with email, role dropdown per row, remove button. Embedded invite form (email + role).
2. `/dashboard/admin/settings` — owner-only. Edit org name (1..100 chars).
3. `/dashboard/admin/layout.tsx` — sidebar nav. 404s if signed-out OR caller has no admin/owner membership in any org.
4. Sidebar "Admin" link in `/dashboard/layout.tsx` — visible only when caller has admin/owner membership somewhere.

5 new API routes (all under `/api/dashboard/orgs/`):
- `GET /:orgId/members` — list active members + emails.
- `POST /:orgId/members/invite` — admin/owner invites by email; reactivates removed members.
- `PATCH /:orgId/members/:userId` — change role (matrix-gated).
- `DELETE /:orgId/members/:userId` — soft-remove (admin: members only; owner: any).
- `PATCH /:orgId` — owner updates org name.

4 SECURITY DEFINER Postgres RPCs in `data/deltas/20260508140000_phase5_1_member_rpcs.sql`:
- `invite_member`, `change_member_role`, `remove_member`, `update_org_name`.
- Each acquires `FOR UPDATE` lock on `memberships` rows for the org BEFORE re-reading caller role + authorizing. Atomically count + write + audit.append in one transaction.
- `REVOKE ALL FROM PUBLIC, anon, authenticated; GRANT EXECUTE TO service_role;` — direct authenticated RPC calls fail with `42501 permission denied`.
- Codex pass 1 CRITICAL (TOCTOU last-owner race) + codex pass 2 CRITICAL (caller-spoofing + lock-before-authorize) + codex plan-pass CRITICAL (`update_org_name` NULL-role guard) — all folded.

CSRF: `assertSameOrigin(req)` on every mutating route (5 of 5).

Audit events: `org.member.invited`, `org.member.role_changed`, `org.member.removed`, `org.settings.updated`. Written by RPCs only — never in route code.

35 backend tests + 4 integration tests = 39 new web tests. Existing 180 still pass. Concurrency test (#31) proves serial last-owner check via stub mutex; static migration test (#31b) proves REVOKE/GRANT.

No new env vars. No CLI changes.

**Operator follow-up:** run `/migrate` to apply `20260508140000_phase5_1_member_rpcs.sql` against dev → QA → prod.

## 6.3.0-pre.6 (2026-05-08)

**v7.0 Phase 4 — Free tier dashboard UI + `/cli-auth` page + public share-by-URL.** Closes the loop on Phase 3's commercially load-bearing 402: free users now SEE "you've used 87/100 this month" and are one click away from upgrading.

Eight new UI surfaces:
1. `/dashboard` overview — auth-gated, server-rendered. Run count this month, cost MTD, current plan, recent runs (5), 30-day cost chart (inline SVG, no library).
2. `/dashboard/runs` — paginated list (20/page, offset-based via `range()`).
3. `/dashboard/runs/[runId]` — detail page with manifest-driven event replay (lazy chunk loading, hard 1000-event cap for MVP), state inspector, cost breakdown, visibility toggle.
4. `/dashboard/billing` — current plan/caps/usage; Upgrade/Manage subscription buttons that POST to Phase 3 endpoints.
5. `/dashboard/billing/success` — post-checkout polling page.
6. `/cli-auth` (DEFERRED FROM 2.3) — completes the CLI dashboard login flow. Server-validates `cb` (loopback only, port 56000-56050) + `nonce` (32 hex). Authenticated user clicks "Sign in CLI" → mints API key via `/api/dashboard/api-keys/mint` → POSTs to loopback with `mode: 'cors'`. CLI loopback listener (Phase 2.3, EXTENDED) gains OPTIONS preflight + `Access-Control-Allow-Origin` matching the configured `AUTOPILOT_PUBLIC_BASE_URL`.
7. `/runs/[runShareId]` — public share-by-URL. Server-side anon Supabase client (NOT createBrowserClient). Read-only events replay + state.
8. `PATCH /api/dashboard/runs/:runId/visibility` — narrow owner-only endpoint with explicit owner check + assertSameOrigin guard. NOT direct UPDATE on runs from client.

Plus required infrastructure:
- **Authorized signed-URL minter** at `GET /api/dashboard/runs/:runId/artifact?kind=manifest|chunk|state[&seq=N]` — verifies owner OR `visibility='public'` BEFORE calling `storage.from('run-uploads').createSignedUrl(path, 60)`. Bucket stays fully private. Chunk seq bounded against `upload_session_chunks` count → 422 on out-of-range. Path derived ONLY from DB-trusted values via `chunkPath()` helper.
- **assertSameOrigin guard** on cookie-authenticated mutating routes (mint, revoke, visibility, checkout, portal). Compares `Origin` header against `loadPublicBillingConfig().AUTOPILOT_PUBLIC_BASE_URL`. Skipped when API-key bearer auth is used.
- **`/cli-auth` security headers via middleware** — `Cache-Control: no-store`, `Referrer-Policy: no-referrer`, `X-Frame-Options: DENY`, and CSP including exact `connect-src 'self' http://127.0.0.1:* http://localhost:*` for the loopback POST. Headers set in middleware.ts (Server Component `headers()` reads request, not response).
- **Finalize handler** persists sanitized `cost_usd`/`duration_ms`/`run_status` from CLI state.json. TS-side bounds + enum validation BEFORE DB UPDATE so a buggy CLI doesn't trip the new CHECK and bring down the whole UPDATE. Wrapped in try/catch for graceful degradation during the rollout window before `/migrate` applies the new columns. Display-only — labeled "Reported by CLI", no entitlement/billing logic reads them.
- **safeRedirect** allowlist accepts `/cli-auth` AND preserves the full `?cb=&nonce=` query string when bouncing through Supabase Auth.
- **Env unification** — `AUTOPILOT_PUBLIC_BASE_URL` is now the canonical name everywhere (web AND CLI). The CLI's older `AUTOPILOT_DASHBOARD_BASE_URL` is a deprecated alias (warn-once on use).
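
The `/cli-auth` parameter validation described in surface 6 can be sketched as follows. `validateCliAuthParams` is a hypothetical name; the port range and the 32-hex nonce come from the changelog, while the `http:` scheme, the two accepted hostnames (taken from the CSP `connect-src` entry), and the lowercase-only hex check are assumptions.

```typescript
// Returns true only for a loopback callback on ports 56000-56050 paired with
// a 32-character hex nonce. Anything unparseable is rejected.
function validateCliAuthParams(cb: string, nonce: string): boolean {
  // Assumption: lowercase hex only; the source just says "32 hex".
  if (!/^[0-9a-f]{32}$/.test(nonce)) return false;

  let url: URL;
  try {
    url = new URL(cb);
  } catch {
    return false;
  }

  if (url.protocol !== "http:") return false;
  // Checking the parsed hostname defeats userinfo tricks such as
  // http://127.0.0.1:56000@evil.example/.
  if (url.hostname !== "127.0.0.1" && url.hostname !== "localhost") return false;

  const port = Number(url.port); // "" (default port) becomes 0 and is rejected
  return Number.isInteger(port) && port >= 56000 && port <= 56050;
}
```

Validating against the parsed URL rather than the raw string matters here, because prefix or substring checks on `cb` are easy to bypass.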

Component breakdown: `<RunListItem>` server, `<EventReplay>` client (manifest-driven, lazy chunks, 1000-event cap), `<StateInspector>` client (recursive tree, no JSON-tree library), `<CostChart>` server (inline SVG, ~80 LOC), `<PlanCard>` server with client `<UpgradeButtons>`/`<ManageSubscriptionButton>`, `<VisibilityToggle>` client (optimistic update + confirmation modal).

30+ new tests: 6 visibility (incl. CSRF) + 14 artifact (9 base + 3 RLS + 2 seq-bounds) + 1 finalize-persists + 9 sanitize + 1 finalize-malformed-status + 3 cli-auth validate + 4 cli-auth headers + 1 cli-auth redirect round-trip + 2 cost-chart + 6 dashboard-pages integration + 4 origin-mismatch (mint/revoke/checkout/portal) + 1 CLI OPTIONS preflight = ~52 added tests across web + CLI.

**Migration:** `data/deltas/20260508120000_phase4_runs_metadata.sql` — `runs.cost_usd NUMERIC(12,4)`, `duration_ms INTEGER`, `run_status TEXT` with CHECK enum; cost-chart partial indexes (user vs org); `runs_select_public` policy for anon/authenticated on `visibility='public'`; column-level GRANT to anon (only safe public columns, NOT `SELECT *`). Operator runs `/migrate` post-merge BEFORE the code deploy fully exercises the new columns; finalize handler graceful-drops if columns missing.

**No new env vars** — all reuse Phase 2.1 + 2.3 + 3 vars. Consider standardizing `AUTOPILOT_PUBLIC_BASE_URL` in any custom CLI deployments (Phase 2.3's `AUTOPILOT_DASHBOARD_BASE_URL` still works but logs deprecation warning).

**Operator follow-ups:**
- Run `/migrate` to apply `data/deltas/20260508120000_phase4_runs_metadata.sql`.
- (Optional) Configure Stripe Customer Portal in dashboard if not already (allows cancellation, payment update from `/dashboard/billing`).
|
|
731
|
+
|
|
732
|
+
## 6.3.0-pre.5 (2026-05-08)
|
|
733
|
+
|
|
734
|
+
**v7.0 Phase 3 — Stripe entitlement enforcement.** Makes the cryptographic credibility boundary commercially load-bearing: every engine-on `autopilot --mode full` upload is now gated on the org's monthly run cap and retained-storage cap.
|
|
735
|
+
|
|
736
|
+
Five new surfaces:
|
|
737
|
+
1. `POST /api/stripe/webhook` — `runtime='nodejs'`, raw-body signature verification, claim/lease/complete idempotency (status='processing' + locked_until+attempt_count, stale leases reclaimed atomically), `last_stripe_event_at` watermark for out-of-order delivery. Handles `checkout.session.completed`, `customer.subscription.updated`, `customer.subscription.deleted`, `invoice.payment_failed`.
|
|
738
|
+
2. `POST /api/dashboard/billing/checkout` — Supabase session auth with role check (owner/admin), Stripe Checkout Session create with `idempotencyKey='${orgId}:${tier}:${interval}'` and customer reuse via `billing_customers.stripe_customer_id`. Returns `{ url }`.
|
|
739
|
+
3. `POST /api/dashboard/billing/portal` — same auth, returns Stripe Customer Portal session URL.
|
|
740
|
+
4. `POST /api/upload-session` — Phase 2.2 endpoint extended with entitlement gate between ownership pass and JWT mint. Returns 402 `{ error: 'limit_reached', limit, current, max, upgrade_url }`. New body field `expectedBytes` from `fs.stat(events.ndjson).size` for storage cap preflight (catches the 4.9-of-5GiB user uploading 20GiB pattern).
|
|
741
|
+
5. CLI uploader catches 402 → throws typed `UploadLimitError`. Auto-upload entry point (`auto-upload.ts`) detects, prints friendly message, returns `reason='limit-reached'` without bubbling. Run's exit code preserved.
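The CLI side of item 5 can be sketched roughly as follows. This is a minimal illustration, assuming the 402 body has the shape listed in item 4; only `UploadLimitError` and `reason='limit-reached'` come from the notes above, and every other name (`LimitReachedBody`, `checkUploadSessionResponse`, `autoUpload`) is hypothetical rather than the package's actual API.

```typescript
// Sketch only: every name except UploadLimitError and 'limit-reached' is hypothetical.
interface LimitReachedBody {
  error: 'limit_reached';
  limit: string; // which cap was hit; the exact value set is an assumption
  current: number;
  max: number;
  upgrade_url: string;
}

class UploadLimitError extends Error {
  readonly body: LimitReachedBody;
  constructor(body: LimitReachedBody) {
    super(`upload blocked: ${body.limit} cap at ${body.current}/${body.max}`);
    this.name = 'UploadLimitError';
    this.body = body;
  }
}

// Name-based guard so the check also survives ES5 down-levelling of Error subclasses.
function isUploadLimitError(err: unknown): err is UploadLimitError {
  return err instanceof Error && err.name === 'UploadLimitError';
}

// Uploader side: turn an HTTP 402 into the typed error.
function checkUploadSessionResponse(status: number, body: unknown): void {
  if (status === 402) throw new UploadLimitError(body as LimitReachedBody);
}

interface AutoUploadResult {
  uploaded: boolean;
  reason?: 'limit-reached';
}

// Auto-upload side: report the limit and return; any other error still propagates.
function autoUpload(attempt: () => void): AutoUploadResult {
  try {
    attempt();
    return { uploaded: true };
  } catch (err) {
    if (isUploadLimitError(err)) {
      console.error(
        `Upload skipped: plan limit reached (${err.body.current}/${err.body.max}). ` +
          `Upgrade: ${err.body.upgrade_url}`,
      );
      return { uploaded: false, reason: 'limit-reached' };
    }
    throw err;
  }
}
```

Because `autoUpload` returns a result instead of rethrowing, the caller can leave the run's own exit code untouched.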
|
|
742
|
+
|
|
743
|
+
Pricing tiers (per v7.0 MVP): Free (100 runs/mo, 5 GiB, $0), Org Small (1000, 50 GiB, $99/mo or $990/yr), Org Mid (10000, 500 GiB, $499/mo or $4990/yr), Enterprise (NULL caps = no enforcement, sales-led). PLAN_MAP keys by `(tier, interval)` for all 4 price IDs. Free organizations DO exist and share an org-level cap (NOT each-user-gets-personal-cap) — seeded by AFTER INSERT trigger on `organizations`.
|
|
744
|
+
|
|
745
|
+
Run-count cap uses STRICT `>` comparison (the runs row already exists when /api/upload-session is called, so count=100 is the 100th and is allowed; reject only at 101+). Storage cap = `sum_retained_bytes(orgId, userId, 90 days)` SQL aggregate, with `expectedBytes` preflight at mint time.
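The comparison rules above can be illustrated with a small sketch. The helper names are hypothetical, `null` stands in for the Enterprise NULL caps, and the exact form of the storage comparison (retained plus expected bytes against the cap) is an assumption based on the preflight described for `expectedBytes`.

```typescript
// Hypothetical helpers; `null` models the Enterprise "NULL caps = no enforcement" tier.
function isRunCapExceeded(countThisMonth: number, maxRuns: number | null): boolean {
  if (maxRuns === null) return false; // no enforcement
  // Strict comparison: the runs row already exists, so count === max is the last allowed run.
  return countThisMonth > maxRuns;
}

// Assumed preflight form: already-retained bytes plus the declared upload size must fit the cap.
function isStorageCapExceeded(
  retainedBytes: number,
  expectedBytes: number,
  maxBytes: number | null,
): boolean {
  if (maxBytes === null) return false;
  return retainedBytes + expectedBytes > maxBytes;
}

const GiB = 1024 * 1024 * 1024;
console.log(isRunCapExceeded(100, 100)); // 100th run on the Free tier
console.log(isRunCapExceeded(101, 100)); // 101st run on the Free tier
console.log(isStorageCapExceeded(4.9 * GiB, 20 * GiB, 5 * GiB));
```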
|
|
746
|
+
|
|
747
|
+
`loadBillingConfig()` validates Stripe env at runtime with zod; `loadPublicBillingConfig()` only reads `AUTOPILOT_PUBLIC_BASE_URL` so missing Stripe env doesn't break the upload-session entitlement gate. Subscription state grace logic: canceled-and-past-period-end → free; cancel_at past → free; payment_failed_at older than 7 days → free.
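The three grace rules can be read as a pure function from subscription state to an effective tier. The sketch below is illustrative only: the `SubscriptionState` field names and the `effectiveTier` helper are assumptions, not the schema or API from the migration.

```typescript
// Hypothetical shape; the real entitlement columns are not shown in these notes.
type Tier = 'free' | 'small' | 'mid' | 'enterprise';

interface SubscriptionState {
  tier: Tier;
  status: string; // e.g. 'active' or 'canceled'
  currentPeriodEnd: Date | null;
  cancelAt: Date | null;
  paymentFailedAt: Date | null;
}

const PAYMENT_GRACE_MS = 7 * 24 * 60 * 60 * 1000; // 7 days

function effectiveTier(s: SubscriptionState, now: Date): Tier {
  const t = now.getTime();
  // Rule 1: canceled and already past the paid period.
  if (s.status === 'canceled' && s.currentPeriodEnd !== null && s.currentPeriodEnd.getTime() < t) {
    return 'free';
  }
  // Rule 2: a scheduled cancellation date that has passed.
  if (s.cancelAt !== null && s.cancelAt.getTime() < t) return 'free';
  // Rule 3: a failed payment older than the 7-day grace window.
  if (s.paymentFailedAt !== null && t - s.paymentFailedAt.getTime() > PAYMENT_GRACE_MS) {
    return 'free';
  }
  return s.tier;
}
```

Passing `now` in explicitly keeps the function deterministic, which makes each rule straightforward to test.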
|
|
748
|
+
|
|
749
|
+
31 new tests: 8 webhook + 4 checkout + 3 portal + 10 checkEntitlement + 2 plan-map + 2 upload-session integration (web) + 3 CLI 402 handling.
|
|
750
|
+
|
|
751
|
+
**Migration:** `data/deltas/20260507180000_phase3_billing.sql` — `billing_customers`, augments `entitlements` with Stripe state + caps + watermark, `stripe_webhook_events` with claim/lease, `personal_entitlements`, augments `runs` with `total_bytes`+`deleted_at`, `sum_retained_bytes` + `count_runs_this_month` + `seed_free_entitlements` SECURITY DEFINER RPCs/trigger. CHECK constraint enforces free/small/mid have explicit caps and enterprise has NULLs. Backfills existing rows BEFORE adding the constraint. Operator runs `/migrate` post-merge.
|
|
752
|
+
|
|
753
|
+
**New env vars (Vercel):**
|
|
754
|
+
- `STRIPE_SECRET_KEY`
|
|
755
|
+
- `STRIPE_WEBHOOK_SECRET`
|
|
756
|
+
- `STRIPE_PRICE_SMALL_MONTHLY`
|
|
757
|
+
- `STRIPE_PRICE_SMALL_YEARLY`
|
|
758
|
+
- `STRIPE_PRICE_MID_MONTHLY`
|
|
759
|
+
- `STRIPE_PRICE_MID_YEARLY`
|
|
760
|
+
|
|
761
|
+
**Operator follow-ups:**
|
|
762
|
+
- Run `/migrate` to apply the migration through dev → QA → prod.
|
|
763
|
+
- Set the 6 Stripe env vars above in Vercel.
|
|
764
|
+
- Configure Stripe webhook in dashboard pointing at `https://autopilot.dev/api/stripe/webhook` and subscribe to: `checkout.session.completed`, `customer.subscription.updated`, `customer.subscription.deleted`, `invoice.payment_failed`.
|
|
765
|
+
- Create Stripe Products + 4 Prices: small ($99/mo + $990/yr), mid ($499/mo + $4990/yr).
|
|
766
|
+
|
|
767
|
+
## 6.3.0-pre.4 (2026-05-07)
|
|
768
|
+
|
|
769
|
+
**v7.0 Phase 2.3 — CLI dashboard verbs + auto-upload at run.complete.** Connects v6.x autopilot pipeline to Phase 2.2's ingest API.
|
|
770
|
+
|
|
771
|
+
Four new CLI verbs: `claude-autopilot dashboard {login,logout,status,upload}`. After `dashboard login`, every engine-on `autopilot --mode full` automatically uploads to autopilot.dev when `run.complete` fires. The login flow uses a 128-bit nonce-bound loopback HTTP listener (ports 56000-56050) with strict server-side `callbackUrl` validation, `crypto.timingSafeEqual` nonce verification, and an atomic config write at `~/.claude-autopilot/dashboard.json` (mode 0600, dir 0700). The uploader snapshots before upload (events.ndjson + state.json copied to `<runDir>/.upload-snapshot/` with stat-before/stat-after defense) so streaming writers can't tear the chunk reads. Auto-upload is a foreground await with SIGINT/AbortController; on failure it prints the `claude-autopilot dashboard upload <runId>` resume command and never overrides the run's exit code. An empty events.ndjson skips upload cleanly. Opt out per-run with `--no-upload` or globally with `CLAUDE_AUTOPILOT_UPLOAD=off`.
|
|
772
|
+
|
|
773
|
+
Web side adds four new endpoints under `/api/dashboard/`: `POST api-keys/mint` (Supabase session auth → atomic `mint_api_key_with_nonce` RPC, 128-bit `clp_<64-hex>` keys, SHA256-hashed at rest, 12-char prefix display), `POST api-keys/revoke` (idempotent, ownership-scoped), `GET me` (memberships + lastUploadAt), `GET runs/:runId/upload-session` (resume in-flight session). Centralized `authViaApiKey()` helper in `apps/web/lib/dashboard/auth.ts` looks up keys by deterministic hash with `eq + maybeSingle` (O(1)) and filters revoked keys. Strict `validateCallbackUrl()` regex restricts callbacks to `http://(127.0.0.1|localhost):560(0[0-9]|[1-4][0-9]|50)/cli-callback` with double-parse defense.
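A rough sketch of a validator matching the callback pattern quoted above. The notes do not spell out what "double-parse defense" means; the version below assumes one plausible reading (anchored regex first, then a `URL` parse whose components are re-checked), so treat it as an illustration, not the shipped `validateCallbackUrl()`.

```typescript
// Anchored form of the pattern quoted above (dots escaped, ^...$ added).
const CALLBACK_RE =
  /^http:\/\/(127\.0\.0\.1|localhost):560(0[0-9]|[1-4][0-9]|50)\/cli-callback$/;

// Assumed interpretation of "double-parse": regex match, then structural re-check via URL.
function validateCallbackUrl(raw: string): boolean {
  if (!CALLBACK_RE.test(raw)) return false;
  let parsed: URL;
  try {
    parsed = new URL(raw);
  } catch {
    return false;
  }
  const port = Number(parsed.port);
  return (
    parsed.protocol === 'http:' &&
    (parsed.hostname === '127.0.0.1' || parsed.hostname === 'localhost') &&
    port >= 56000 &&
    port <= 56050 &&
    parsed.pathname === '/cli-callback' &&
    parsed.search === '' &&
    parsed.hash === '' &&
    parsed.username === '' &&
    parsed.password === ''
  );
}
```

The second pass is redundant when the regex is correct; its value is that a later loosening of the regex cannot silently admit a different host, port, or path.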
|
|
774
|
+
|
|
775
|
+
CLI ↔ web parity guaranteed by shared fixtures: `apps/web/lib/upload/__fixtures__/{chain-vectors,state-canonicalization-vectors}.json` are loaded byte-for-byte by `tests/dashboard/parity.test.ts`. Identical chain-root and JCS-canonical sha256 in both directions.
|
|
776
|
+
|
|
777
|
+
**Migration:** `data/deltas/20260507120000_phase2_3_api_keys.sql` — adds `api_keys` (RLS, key_hash regex check, prefix_display regex check), `api_key_mint_nonces` (RLS, service-role-only), `expire_mint_nonces()` SECURITY DEFINER RPC, and the atomic `mint_api_key_with_nonce()` SECURITY DEFINER RPC that fuses sweep + dedup-check + insert key + insert nonce in a single transaction. Operator runs `/migrate` post-merge.
|
|
778
|
+
|
|
779
|
+
**New env vars:**
|
|
780
|
+
- Web (Vercel): `NEXT_PUBLIC_AUTOPILOT_BASE_URL` — used by the `cli-auth` web page (deferred to Phase 4 dashboard UI) to display loopback callback URL.
|
|
781
|
+
- CLI: `AUTOPILOT_DASHBOARD_BASE_URL` (defaults to `https://autopilot.dev`); `CLAUDE_AUTOPILOT_HOME` (defaults to `~/.claude-autopilot`); `CLAUDE_AUTOPILOT_UPLOAD=off` opts out of auto-upload; `CLAUDE_AUTOPILOT_UPLOAD_RETRY_MS` overrides retry backoff (test seam).
|
|
782
|
+
|
|
783
|
+
**Operator follow-ups:**
|
|
784
|
+
- Run `/migrate` to apply the migration through dev → QA → prod.
|
|
785
|
+
- Set `NEXT_PUBLIC_AUTOPILOT_BASE_URL=https://autopilot.dev` in Vercel.
|
|
786
|
+
- Implement the `/cli-auth` web page in Phase 4 dashboard UI. The page must mint via `POST /api/dashboard/api-keys/mint` then POST `{ apiKey, fingerprint, accountEmail, nonce }` to the loopback callback (URL passed in `?cb=`). Phase 2.3 tests use a mock handler that simulates this flow end-to-end.
|
|
787
|
+
|
|
788
|
+
## 6.3.0-pre.3 (2026-05-07)
|
|
789
|
+
|
|
790
|
+
**v7.0 Phase 2.2 — ingest API + tamper-evident events.** First server endpoints in the repo. Three routes (`POST /api/upload-session`, `PUT /api/runs/:runId/events/:seq`, `POST /api/runs/:runId/finalize`) implement signed-session uploads with hash-chain verification and idempotent finalize. Per-chunk immutable Storage objects, DB row lock + unique constraint + Storage `upsert: false` triple-defense against concurrent corruption. Two-phase write ordering with `upload_session_chunks.status` for crash recovery. Dedicated `UPLOAD_SESSION_JWT_SECRET` (HS256, 15-min TTL, full claim hardening). RFC 8785 (JCS) state canonicalization. 38 new tests across upload-session, events-chunk, finalize, hash-chain vectors, JCS vectors, JWT, and storage helpers.
|
|
791
|
+
|
|
792
|
+
**Migration:** `data/deltas/20260507000000_phase2_2_ingest.sql` — adds `upload_session_chunks` table, augments `upload_sessions` with `next_expected_seq` + `chain_tip_hash`, adds `runs.state_sha256` + `runs.events_index_path`, partial unique index on `upload_sessions(run_id) WHERE consumed_at IS NULL`, CHECK constraints on hash-format columns, plus `claim_chunk_slot` and `mark_chunk_persisted` SECURITY DEFINER RPCs. Operator runs via `/migrate` post-merge.
|
|
793
|
+
|
|
794
|
+
**New env var:** `UPLOAD_SESSION_JWT_SECRET` — set in Vercel + local `.env.local`. Generate with `openssl rand -hex 32`. NOT shared with `SUPABASE_JWT_SECRET`.
|
|
795
|
+
|
|
796
|
+
**Storage bucket:** `run-uploads` — operator one-time setup in the Supabase project (private; service-role-only writes).
|
|
797
|
+
|
|
798
|
+
## 6.3.0-pre.2 (2026-05-07)
|
|
799
|
+
|
|
800
|
+
**v7.0 Phase 2.1 — Next.js scaffold + Supabase Auth (Free tier sign-in).**
|
|
801
|
+
|
|
802
|
+
First sub-PR of v7.0 Phase 2 (Ingest API + CLI integration). Pure foundation; no API endpoints related to ingest, no CLI dashboard verbs.
|
|
803
|
+
|
|
804
|
+
**What landed:**
|
|
805
|
+
- `apps/web/` Next.js 16 App Router app with React 19 + Tailwind v4
|
|
806
|
+
- npm workspaces (`workspaces: ["apps/*", "packages/*"]`) — CLI deps stay where they are; web deps live in `apps/web/package.json`
|
|
807
|
+
- `tsconfig.base.json` shared between CLI and web; `apps/web/` uses `bundler` module resolution, CLI keeps `NodeNext`
|
|
808
|
+
- Supabase Auth Google sign-in via PKCE callback (`/api/auth/callback`)
|
|
809
|
+
- Sign-out (`/api/auth/sign-out`) clears only configured project ref's cookies — never `sb-*` wildcard
|
|
810
|
+
- `safeRedirect` whitelist with documented change policy
|
|
811
|
+
- Scoped middleware matcher: refreshes session on page + `/api/auth/*` routes ONLY; excludes static assets, `/api/health`, and non-auth `/api/*` (ingest endpoints in 2.2 handle their own auth)
|
|
812
|
+
- Health endpoint `/api/health` for platform health checks
|
|
813
|
+
- 22 web tests via Vitest (10 redirect + 5 callback + 2 signout + 4 matcher + 1 typecheck-guard)
|
|
814
|
+
- `web-tests.yml` workflow runs typecheck + Next.js build + tests on every PR
|
|
815
|
+
- `npm-tarball-check.yml` workflow asserts `apps/` is excluded from the published CLI tarball
|
|
816
|
+
- `vercel.json` configured for monorepo build with `apps/web/` root
|
|
817
|
+
|
|
818
|
+
**Spec:** `docs/specs/v7.0-phase2.1-nextjs-scaffold.md` (PR #116)
|
|
819
|
+
**Plan:** `docs/superpowers/plans/2026-05-07-v7.0-phase2.1-nextjs-scaffold.md`
|
|
820
|
+
|
|
821
|
+
Pre-release on the npm `next` tag. `latest` stays on `6.2.2`.
|
|
822
|
+
|
|
823
|
+
## 6.3.0-pre.1 (2026-05-07)
|
|
824
|
+
|
|
825
|
+
**v7.0 Phase 1 — Foundation: schema + RLS + cross-tenant negative tests.**
|
|
826
|
+
|
|
827
|
+
First step toward the v7.0 hosted product. Database-only PR; no endpoints, no UI, no Stripe integration.
|
|
828
|
+
|
|
829
|
+
**What landed:**
|
|
830
|
+
|
|
831
|
+
- `db/supabase/` Supabase project bootstrap with 8 numbered migrations
|
|
832
|
+
- 7 multi-tenant tables: `organizations`, `memberships`, `runs`, `upload_sessions`, `entitlements`, `audit_events`, `organization_settings`
|
|
833
|
+
- RLS enabled on every table with two-branch pattern: `(organization_id IS NOT NULL AND active member)` OR `(organization_id IS NULL AND user_id = auth.uid())`
|
|
834
|
+
- `audit.append()` SQL function with hash-chain immutability; app roles get INSERT only via the function
|
|
835
|
+
- Supabase Storage buckets `org-runs` and `user-runs` with tenant-scoped path-prefix RLS
|
|
836
|
+
- `entitlements.plan` CHECK constraint matching `organizations.plan` exactly
|
|
837
|
+
- `upload_sessions` stores only `jti` + token hash (never raw signing material)
|
|
838
|
+
- 7 RLS negative test files covering: runs cross-tenant, free-vs-org-tier branches, audit immutability, storage path isolation, entitlements admin-only, membership edge cases, upload_sessions single-use
|
|
839
|
+
- CI workflow `.github/workflows/db-tests.yml` runs the test suite against a Dockerized Supabase on every PR
|
|
840
|
+
|
|
841
|
+
**Spec:** `docs/specs/v7.0-hosted-product-mvp.md` (PR #114)
|
|
842
|
+
**Plan:** `docs/superpowers/plans/2026-05-07-v7.0-phase1-foundation.md`
|
|
843
|
+
|
|
844
|
+
Pre-release on the npm `next` tag. `latest` stays on `6.2.2`.
|
|
845
|
+
|
|
846
|
+
## 6.2.2 — `claude-autopilot autopilot --json` envelope + cache version policy (2026-05-07)
|
|
847
|
+
|
|
848
|
+
**Headline.** Closes out the v6.2.x track. `claude-autopilot autopilot --json` now emits exactly one machine-readable envelope on stdout — successful runs, pre-run failures, and mid-pipeline failures all produce the same shape so CI consumers can branch on `.exitCode` / `.failedPhase` / `.errorCode` directly without parsing stderr NDJSON. The cache contract gains a `MIN_SUPPORTED..MAX_SUPPORTED` schema-version window so a stale run dir from a future binary fails with a clear error instead of an opaque shape crash. The migration guide gets a new "v6.1 → v6.2: one runId across the pipeline" section.
|
|
849
|
+
|
|
850
|
+
**Motivation — Codex review of the v6.2 spec (3 WARNING + 3 NOTE).** The v6.2 orchestrator spec reserved `--json` for v6.2.2; the spec for this PR (Codex 5.3-reviewed) folded back three warnings (strict equality on schemaVersion blocks rolling deploys, exactly-once envelope needs uncaughtException coverage, exit-code taxonomy ambiguous for pre-run failures) and three notes (six-phases vs four-phases migration text, `errorCode` union too loose, stdout purity test under stderr load).
|
|
851
|
+
|
|
852
|
+
**What's in (the 9 deliverables from the spec's "Scope" section).**
|
|
853
|
+
|
|
854
|
+
- **Outer JSON envelope** for `claude-autopilot autopilot --json`. New `AutopilotJsonEnvelope` shape (`version: '1'`, `verb: 'autopilot'`, `runId | null`, `status`, `exitCode`, `phases[]`, `totalCostUSD`, `durationMs`, `errorCode?`, `errorMessage?`, `failedAtPhase?`, `failedPhaseName?`). Pre-run failures get `runId: null` + populated `errorCode`. Mid-pipeline failures get `failedAtPhase` + `failedPhaseName`.
|
|
855
|
+
- **Bounded `AutopilotErrorCode` enum.** Exact strings: `invalid_config | budget_exceeded | lock_held | corrupted_state | partial_write | needs_human | phase_failed | internal_error`. CI consumers can rely on these specific values; new codes ship as minor versions of the envelope schema. Per codex NOTE #5.
|
|
856
|
+
- **Single-write latch + uncaughtException / unhandledRejection handlers.** Module-scoped boolean in `src/cli/json-envelope.ts` flips BEFORE writing so subsequent calls no-op. The orchestrator's `runAutopilotWithJsonEnvelope` installs process-level fatal handlers that consult the latch — if an envelope already shipped, they exit silently; otherwise they emit a fallback `internal_error` envelope before exiting `1`. Test seam `__testInstallProcessHandlers: false` keeps the handlers from leaking across the suite. Per codex WARNING #2.
|
|
857
|
+
- **Deterministic exit-code-to-errorCode mapping** via `computeAutopilotExitCode`. `0` success / `1` `invalid_config | phase_failed | internal_error` / `2` `lock_held | corrupted_state | partial_write` / `78` `budget_exceeded | needs_human`. Per codex WARNING #3.
|
|
858
|
+
- **Cache contract version policy** in `src/core/run-state/state.ts` + the replay path in `events.ts`. New exports `RUN_STATE_MIN_SUPPORTED_SCHEMA_VERSION = 1` and `RUN_STATE_MAX_SUPPORTED_SCHEMA_VERSION = RUN_STATE_SCHEMA_VERSION`. `replayState()` throws `corrupted_state` when the persisted `schema_version` falls outside the window, with a message naming both bounds for operator triage. Future minor versions can additively expand the schema while preserving forward-read compatibility (bump writer, leave reader); major bumps reset `MIN_SUPPORTED` to break with the past explicitly. Per codex WARNING #1.
|
|
859
|
+
- **Migration guide section.** New "v6.1 → v6.2: one runId across the pipeline" section in `docs/v6/migration-guide.md` walks through the per-verb → orchestrator collapse, the `--json` envelope shape (success / pre-run failure / mid-pipeline failure examples), the `AutopilotErrorCode` taxonomy table, and the cache version policy. Flags the v6.2.0 vs v6.2.1 phase-set difference per codex NOTE #4 — examples assume the v6.2.1 6-phase set (`scan → spec → plan → implement → migrate → pr`).
|
|
860
|
+
- **Channel discipline preserved.** The envelope is the only thing on stdout in `--json` mode (orchestrator runs with `__silent: true`). NDJSON events continue to flow to stderr unchanged via the existing v6 Phase 5 helpers.
|
|
861
|
+
- **Dispatcher wiring.** `src/cli/index.ts` plumbs `--json` through to `runAutopilotWithJsonEnvelope`; pre-run validation failures (`--mode`, `--budget`) emit envelopes too so CI never sees free-text errors when `--json` is on.
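The exit-code mapping from the list above can be written as an exhaustive switch over the error-code union. This is a sketch: the union members and numeric codes are taken from the entries above, but `exitCodeFor` is a hypothetical stand-in, since the real `computeAutopilotExitCode` signature is not shown in these notes.

```typescript
// Union copied from the "Bounded AutopilotErrorCode enum" entry.
type AutopilotErrorCode =
  | 'invalid_config'
  | 'budget_exceeded'
  | 'lock_held'
  | 'corrupted_state'
  | 'partial_write'
  | 'needs_human'
  | 'phase_failed'
  | 'internal_error';

// Hypothetical helper: undefined means the run succeeded.
function exitCodeFor(errorCode?: AutopilotErrorCode): number {
  if (errorCode === undefined) return 0;
  switch (errorCode) {
    case 'lock_held':
    case 'corrupted_state':
    case 'partial_write':
      return 2; // engine errors
    case 'budget_exceeded':
    case 'needs_human':
      return 78;
    case 'invalid_config':
    case 'phase_failed':
    case 'internal_error':
      return 1;
  }
}
```

Listing every member explicitly, with no `default` branch, lets the compiler flag the switch when a new code joins the union without an assigned exit code.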
|
|
862
|
+
|
|
863
|
+
**Tests.** Baseline 1534 → 1548 (+14 net new):
|
|
864
|
+
|
|
865
|
+
- 9 envelope tests in `tests/cli/autopilot-json-envelope.test.ts` covering the 6 spec scenarios (success, pre-run failure, mid-pipeline failure, no-ANSI on stdout, stdout purity under stderr load, single-write latch + uncaughtException) plus 1 latch sanity test and 2 exit-code/enum mapping tests.
|
|
866
|
+
- 5 schema-version range tests in `tests/run-state/state.test.ts` covering the bounds export plus accept-in-range, reject-below-MIN, reject-above-MAX, and message-names-both-bounds.
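The range check these tests exercise might look roughly like the sketch below. The two bound constants use the export names from the cache-contract entry; `RUN_STATE_SCHEMA_VERSION = 1` is assumed from the note that v6.x has only one schema version, and `assertSupportedSchemaVersion` plus the message wording are hypothetical.

```typescript
// Assumed current writer version (v6.x is described as having a single schema version).
const RUN_STATE_SCHEMA_VERSION = 1;
const RUN_STATE_MIN_SUPPORTED_SCHEMA_VERSION = 1;
const RUN_STATE_MAX_SUPPORTED_SCHEMA_VERSION = RUN_STATE_SCHEMA_VERSION;

// Hypothetical helper: rejects versions outside the window and names both bounds.
function assertSupportedSchemaVersion(persisted: number): void {
  const min = RUN_STATE_MIN_SUPPORTED_SCHEMA_VERSION;
  const max = RUN_STATE_MAX_SUPPORTED_SCHEMA_VERSION;
  if (persisted < min || persisted > max) {
    throw new Error(
      `corrupted_state: schema_version ${persisted} is outside the supported range ${min}..${max}`,
    );
  }
}
```

A run dir written by a newer binary then fails here with both bounds in the message, instead of crashing later on an unexpected shape.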
|
|
867
|
+
|
|
868
|
+
**Engine-off path unchanged.** The schema-version range check applies inside `replayState()` (engine-on territory). Engine-off invocations don't read run dirs and are byte-for-byte identical to v6.2.1.
|
|
869
|
+
|
|
870
|
+
**Out of scope (deliberate, see spec for full list).**
|
|
871
|
+
- `--json` envelope on individual wrapped verbs other than `autopilot`. They already emit per-verb envelopes via the v6 Phase 5 helper; no change needed.
|
|
872
|
+
- Streaming JSON (newline-delimited progress events on stdout). Deferred to v6.3; it would need a major channel-discipline change.
|
|
873
|
+
- Schema migration tooling. v6.x has only one schema version; migration tooling is reserved for the v7 layout change.
|
|
874
|
+
|
|
875
|
+
**Spec.** docs/specs/v6.2.2-json-envelope-and-docs.md (3 WARNING + 3 NOTE folded from the Codex 5.3 review).
|
|
876
|
+
|
|
877
|
+
## 6.2.1 — Side-effect phase idempotency contracts (`migrate` + `pr`) (2026-05-07)
|
|
878
|
+
|
|
879
|
+
**Headline.** Side-effecting phases now satisfy a registry-enforced two-step contract — record a deterministic "I'm starting this work" breadcrumb BEFORE the side-effect, then one reconciliation ref per durable artifact AFTER. With the contract in place, `migrate` and `pr` enter the orchestrator's `--mode=full` registry, expanding the v6.2.0 `scan → spec → plan → implement` pipeline to the full **6-phase** flow `scan → spec → plan → implement → migrate → pr` under one runId.
|
|
880
|
+
|
|
881
|
+
**Motivation — Codex CRITICAL gate from v6.2.** The v6.2 orchestrator spec flagged side-effect resume as the riskiest property to certify before adding `migrate` or `pr`: a partial crash mid-dispatch could leave the engine blind to applied work, causing the resume preflight to either silently re-run side effects (data loss) or pessimistically refuse every retry (operability tax). v6.2.1 closes the gap with a uniform contract every side-effecting phase must declare AND a registry-time guard that throws if the declaration is missing.
|
|
882
|
+
|
|
883
|
+
**What's in (the 7 deliverables from spec section "Scope of THIS PR").**
|
|
884
|
+
|
|
885
|
+
- **New `migration-batch` ref kind** in `ExternalRefKind` (`src/core/run-state/types.ts`). Documented semantics: "deterministic id covers a planned migration batch; emitted BEFORE dispatch so a partial crash leaves a resume target." Joins `migration-version` (the post-effect reconciliation ref).
|
|
886
|
+
- **`migrate` pre-effect breadcrumb.** `src/cli/migrate.ts` now emits a `migration-batch` ref BEFORE `dispatchFn(input)` — a partial crash leaves the orchestrator a resume target. The post-success `migration-version` refs stay (one per applied migration). Per the v6.2.1 spec, the batch id uses the `${env}:pre-dispatch:${Date.now()}` fallback form because no Delegance migrate skill (Supabase, Rails, Alembic, …) exposes its planned set pre-dispatch — the deterministic-id form `sha256(env+plannedMigrations)` is reserved for a follow-up that adds a planning verb to the skill protocol.
|
|
887
|
+
- **Provider readback for `migration-batch`** in `src/core/run-state/provider-readback.ts`. Queries the dispatcher's ledger for the planned set + applied set, returns `merged` (all applied), `open` (some pending), `failed` (any errored), or `unknown` (fail closed on missing fetcher / throw / null). New `MigrationBatchFetcher` interface + `registerMigrationBatchFetcher` seam alongside the existing `MigrationStateFetcher`.
|
|
888
|
+
- **Registry-time enforcement** in `src/core/run-state/phase-registry.ts`. New `registerPhase()` helper throws `Error: registry: side-effect phase <name> missing idempotency contract` when a `hasSideEffects: true` registration omits `preEffectRefKinds` or `postEffectRefKinds`. Applied to all six entries; the four read-only phases (scan/spec/plan/implement) omit the arrays without complaint.
|
|
889
|
+
- **`buildMigratePhase` and `buildPrPhase` builders** extracted following the v6.2.0 builder pattern (scan/spec/plan/implement). Each verb's existing `runX(options)` continues to delegate to its builder — direct CLI behavior is byte-for-byte identical to v6.2.0. The full registry now has: `scan / spec / plan / implement / migrate / pr`.
|
|
890
|
+
- **Resume preflight in orchestrator** (`src/cli/autopilot.ts` + new `src/core/run-state/resume-preflight.ts`). Before invoking `runPhase` on any side-effecting phase, the orchestrator collects prior `phase.success` + `phase.externalRef` events from `events.ndjson` and routes per the spec decision matrix: all post-effect refs `merged`/`live` → emit synthetic `phase.success` and skip; pre-effect breadcrumb `open` → retry (the phase body's own ledger handles dedup); otherwise → emit `replay.override` + throw `GuardrailError('needs_human')`. New error code `needs_human` joins the taxonomy in `src/core/errors.ts`.
|
|
891
|
+
- **`--mode=full` extended** to 6 phases (`DEFAULT_FULL_PHASES` in `phase-registry.ts`). After v6.2.1, `claude-autopilot autopilot` runs the entire pipeline under one runId — the YC-demo win deferred from v6.2.0.
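The registry-time guard described above can be sketched as follows. The error message and the field names `hasSideEffects`, `preEffectRefKinds`, and `postEffectRefKinds` come from the entry on registry-time enforcement; the `PhaseRegistration` shape here is heavily simplified and should be read as an assumption, not the real generic type.

```typescript
// Simplified, assumed shape; the real PhaseRegistration<I, O> carries per-phase I/O types.
interface PhaseRegistration {
  name: string;
  hasSideEffects: boolean;
  preEffectRefKinds?: readonly string[];
  postEffectRefKinds?: readonly string[];
}

// Throws at registration time when a side-effecting phase omits either array.
// An empty array counts as a declaration; only a missing one is rejected.
function registerPhase<T extends PhaseRegistration>(reg: T): T {
  const missing = reg.preEffectRefKinds === undefined || reg.postEffectRefKinds === undefined;
  if (reg.hasSideEffects && missing) {
    throw new Error(`registry: side-effect phase ${reg.name} missing idempotency contract`);
  }
  return reg;
}

// Read-only phase: no arrays needed.
const scanPhase = registerPhase({ name: 'scan', hasSideEffects: false });

// Side-effecting phase with both halves of the contract declared.
const migratePhase = registerPhase({
  name: 'migrate',
  hasSideEffects: true,
  preEffectRefKinds: ['migration-batch'],
  postEffectRefKinds: ['migration-version'],
});
```

Failing at registration rather than at dispatch means a phase with an incomplete contract can never reach the orchestrator.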
|
|
892
|
+
|
|
893
|
+
**Tests.** Baseline 1509 → 1532 (+23 net new):
|
|
894
|
+
|
|
895
|
+
- 9 gating tests in `tests/cli/autopilot-side-effect-resume.test.ts` covering the 6 spec scenarios (migrate partial-crash retry, migrate full-success skip, pr-open skip, pr-closed needs-human, registry rejection, run-scope budget no-double-charge) plus 3 edge cases (proceed-fresh, prior success without refs, errored-ledger needs-human).
|
|
896
|
+
- 8 unit tests in `tests/run-state/provider-readback.test.ts` covering the new `migration-batch` readback (merged / open / failed / empty plan / null fetcher / throw / no fetcher / default-registry routing).
|
|
897
|
+
- 2 updated tests in `tests/cli/migrate-engine-smoke.test.ts` to account for the new pre-effect breadcrumb (now `1 + N` refs per run instead of `N`).
|
|
898
|
+
- 4 new test variants for the contract guard (`hasSideEffects: true` with each missing array, plus the empty-postEffect / read-only positive cases).
|
|
899
|
+
|
|
900
|
+
**Engine-off path unchanged.** Existing `migrate`/`pr` invocations without `--engine` continue byte-for-byte identical. The engine-off escape hatch threads through `executeMigratePhase(input, null)` / `executePrPhase(input, null)`, where a null `ctx` makes `emitExternalRef` a no-op — same precedent as every other wrapped verb.
|
|
901
|
+
|
|
902
|
+
**Out of scope (deliberate, see spec for full list).**
|
|
903
|
+
- Deterministic batch id (`sha256(env + plannedMigrations)`) — requires extracting a `planMigrations()` verb from each migrate skill's protocol. v6.2.x follow-up.
|
|
904
|
+
- `implement`'s `git-remote-push` ref (declared in the spec table but not yet emitted by `implement.ts`). v6.2.x follow-up.
|
|
905
|
+
- Cross-run ref dedup (e.g. recognizing two pre-dispatch breadcrumbs as the same operation across runs). Not needed for orchestrator MVP.
|
|
906
|
+
- Provider readback for non-Delegance migrate skills (Rails, Alembic, …). v6.2.1 ships the contract; per-skill readback is per-skill follow-up work.
|
|
907
|
+
|
|
908
|
+
**Spec.** docs/specs/v6.2.1-side-effect-idempotency.md (Codex CRITICAL gate from v6.2 — folded back as the foundation for this PR).
|
|
909
|
+
|
|
910
|
+
## 6.2.0 — Multi-phase orchestrator (`claude-autopilot autopilot`) (2026-05-07)
|
|
911
|
+
|
|
912
|
+
**Headline.** New top-level `claude-autopilot autopilot` verb runs `scan → spec → plan → implement` under **one runId**. The pre-v6.2 chain (`scan && spec && plan && implement`) created four separate runs with no parent — the orchestrator collapses them into a single ledger so `claude-autopilot runs watch <id>` covers the whole pipeline and a `--budget=$25` cap ticks down across phases instead of resetting per verb.
|
|
913
|
+
|
|
914
|
+
**What's in.**
|
|
915
|
+
- **`claude-autopilot autopilot [options]`** — sequential N-phase orchestrator. Engine-on REQUIRED (rejected at pre-flight if `--no-engine` / `CLAUDE_AUTOPILOT_ENGINE=off` / `engine.enabled: false`). Lifecycle: `createRun({ phases })` → per-phase `buildPhase + runPhase` → emit `run.complete` exactly once → refresh state snapshot → release lock in `finally`. Non-interactive (a `pause` budget decision becomes hard-fail) so it works in CI without prompting.
|
|
916
|
+
- **`build<Phase>Phase()` builders** extracted from `scan`, `spec`, `plan`, `implement`. Each verb's existing `runX(options)` continues to call its builder internally — direct CLI behavior is byte-for-byte identical to v6.1. Per-verb parity tests (`tests/cli/<verb>-builder-parity.test.ts`) compare stdout / stderr / `events.ndjson` between the legacy entry and the explicit builder + `runPhaseWithLifecycle` path.
|
|
917
|
+
- **Phase registry** at `src/core/run-state/phase-registry.ts`. `as const` + per-entry `satisfies PhaseRegistration<I, O>` preserves per-phase I/O typing through dynamic dispatch (per codex review NOTE #5). `getPhase(name)`, `listPhaseNames()`, and `validatePhaseNames(names)` are the public surface; `--phases=<csv>` validation lives here.
|
|
918
|
+
- **Run-scope budget** — `BudgetConfig.scope: 'phase' | 'run'` (default `'phase'` for back-compat). When `scope === 'run'` the orchestrator's per-phase budget gates resolve against cross-phase `phase.cost` totals so the `$25` demo narrative ticks down across the whole pipeline. `sumPhaseCost(events, '*')` cross-phase overload added. Both `BudgetCheck.scope` and `BudgetCheckEvent.scope` carry the resolution forward to observers (`runs show <id> --events`, future cost dashboards). Per codex review WARNING #2 — pulled forward into v6.2.0 (was deferred to v6.2.2 in the initial draft).
|
|
919
|
+
- **Exit-code matrix** (per codex review WARNING #3) — 0 success, 78 budget_exceeded, 2 engine error (`lock_held` / `corrupted_state` / `partial_write`), 1 everything else. Phase failure wins over finalization error.
|
|
920
|
+
- **CLI surface**: `--mode=full` (default — `scan → spec → plan → implement`), `--phases=<csv>` for custom lists, `--budget=<usd>` for the run-scope cap. `--mode=fix` and `--mode=review` reserved for v6.2.1+; `--json` envelope reserved for v6.2.2.
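To make the phase-scope versus run-scope distinction concrete, here is a minimal sketch. `sumPhaseCost(events, '*')` and the `'phase' | 'run'` scope values come from the run-scope budget entry; the `PhaseCostEvent` shape and the `remainingBudget` helper are assumptions made for illustration.

```typescript
// Assumed event shape; the real phase.cost payload is not shown in these notes.
interface PhaseCostEvent {
  type: 'phase.cost';
  phase: string;
  costUSD: number;
}

// '*' sums across every phase; any other value restricts the sum to that phase.
function sumPhaseCost(events: readonly PhaseCostEvent[], phase: string): number {
  return events
    .filter((e) => phase === '*' || e.phase === phase)
    .reduce((total, e) => total + e.costUSD, 0);
}

// Hypothetical helper: how much of the cap is left under each scope.
function remainingBudget(
  events: readonly PhaseCostEvent[],
  capUSD: number,
  scope: 'phase' | 'run',
  currentPhase: string,
): number {
  const spent = sumPhaseCost(events, scope === 'run' ? '*' : currentPhase);
  return capUSD - spent;
}

const ledger: PhaseCostEvent[] = [
  { type: 'phase.cost', phase: 'scan', costUSD: 5 },
  { type: 'phase.cost', phase: 'spec', costUSD: 7 },
  { type: 'phase.cost', phase: 'plan', costUSD: 3 },
];
console.log(remainingBudget(ledger, 25, 'run', 'plan')); // cap shared across the pipeline
console.log(remainingBudget(ledger, 25, 'phase', 'plan')); // cap resets per phase
```

Under `'run'` scope the cap is drawn down by every earlier phase, which is the ticking-down behaviour the headline describes.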
|
|
921
|
+
|
|
922
|
+
**Tests.** Baseline 1492 → 1509 (+17 new):
|
|
923
|
+
- 4 builder-parity tests (`scan`, `spec`, `plan`, `implement`) covering stdout / stderr / events triple-snapshot.
|
|
924
|
+
- 6 run-scope budget tests in `tests/run-state/budget.test.ts` covering scope flag default, run-scope happy path, run-scope cap exceeded across phases, Layer 1 advisory in run-scope, and phase/run scope math equivalence (regression guard).
|
|
925
|
+
- 7 orchestrator integration tests in `tests/cli/autopilot.test.ts` covering: 3-phase happy path, scan-failure phase 0, run-scope budget exceeded → exit 78, resume lookup `already-complete` short-circuit, `--phases=invalid,scan` → exit 1 invalid_config no run dir, `CLAUDE_AUTOPILOT_ENGINE=off` → exit 1 invalid_config, `cliEngine: false` → exit 1 invalid_config.
|
|
926
|
+
|
|
927
|
+
**Out of scope (deliberate, see spec for full list).**
|
|
928
|
+
- `migrate`, `pr` — gated on per-phase idempotency contracts (preflight readback + externalRef recorded BEFORE side-effect). v6.2.1.
|
|
929
|
+
- `--mode=fix`, `--mode=review` — v6.2.1+.
|
|
930
|
+
- `--json` envelope — v6.2.2.
|
|
931
|
+
- Parallel phase execution. Sequential by design.
|
|
932
|
+
- Interactive prompts inside the orchestrator. CI/scripts get deterministic exit codes; pause budget decisions hard-fail.
|
|
933
|
+
|
|
934
|
+
**Spec.** docs/specs/v6.2-multi-phase-orchestrator.md (Codex-reviewed: 1 CRITICAL + 3 WARNING + 3 NOTE folded back into the spec before implementation).
|
|
935
|
+
|
|
936
|
+
## 6.1.0 — Default flip: engine on by default + `--no-engine` deprecated (2026-05-07)
|
|
937
|
+
|
|
938
|
+
**Headline.** The Run State Engine is now ON by default. Bare
|
|
939
|
+
`claude-autopilot <verb>` invocations create a `.guardrail-cache/runs/<ulid>/`
|
|
940
|
+
directory, emit typed NDJSON events on stderr, apply budget gates if
|
|
941
|
+
`budgets:` is configured, and write a state snapshot — without any opt-in
|
|
942
|
+
config. v6.0 shipped the engine OFF behind an explicit `engine.enabled: true`
|
|
943
|
+
opt-in to give users control during a stabilization window; v6.1 closes
|
|
944
|
+
that window.
|
|
945
|
+
|
|
946
|
+
**Motivation — v6.0 stabilization criteria met.**
|
|
947
|
+
- 10 of 10 pipeline phases wrapped through `runPhaseWithLifecycle`
|
|
948
|
+
(`scan` v6.0.1, `costs`/`fix` v6.0.2, `brainstorm`/`spec` v6.0.3,
|
|
949
|
+
`plan`/`review` v6.0.4, `validate` v6.0.5, `implement` v6.0.7,
|
|
950
|
+
`migrate` v6.0.8 — first side-effecting wrap with `migration-version`
|
|
951
|
+
externalRefs, `pr` v6.0.9 — second side-effecting wrap with `github-pr`
|
|
952
|
+
externalRefs).
|
|
953
|
+
- Lifecycle helper extracted (v6.0.6) so all 10 wraps share the same
|
|
954
|
+
byte-for-byte engine-on / engine-off behavior.
|
|
955
|
+
- Side-effecting wraps proven (`migrate` + `pr`) — externalRef ledger
|
|
956
|
+
+ provider readback semantics exercised end-to-end.
|
|
957
|
+
- Live adapter cert suite green (Vercel + Fly + Render).
|
|
958
|
+
- `runs watch <id>` live cost/budget meter shipped (this release's
|
|
959
|
+
`v6.1.0-pre` entry below) — the YC-demo moment for the events stream.
|
|
960
|
+
- `npm test` baseline: 1469 → 1492 (+23 net new this release; all green).
|
|
961
|
+
|
|
962
|
+
**Deprecation.** `--no-engine`, `CLAUDE_AUTOPILOT_ENGINE=off|false|0|no`,
|
|
963
|
+
and `engine.enabled: false` continue to work as the legacy escape hatch
|
|
964
|
+
in v6.1.x. Each invocation that resolves to engine-off via one of those
|
|
965
|
+
explicit opt-outs now prints a single-line stderr deprecation notice:
|
|
966
|
+
|
|
967
|
+
```
|
|
968
|
+
[deprecation] --no-engine / engine.enabled: false will be removed in v7. Migrate to engine-on (default).
|
|
969
|
+
```
|
|
970
|
+
|
|
971
|
+
The notice fires only on user-driven opt-outs (`source: 'cli' | 'env' |
|
|
972
|
+
'config'`); the new (engine-on) default never trips it. **v7 removes
|
|
973
|
+
the escape hatch** — `engine.enabled: false` becomes a config validation
|
|
974
|
+
error and `--no-engine` / `CLAUDE_AUTOPILOT_ENGINE=off` are silently
|
|
975
|
+
ignored.
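The notice predicate described above can be sketched as below. The `'cli' | 'env' | 'config'` sources, the env values, and the notice text are taken from this entry; the `'default'` source label, the `EngineResolution` shape, and the helper names are assumptions, and precedence between sources is deliberately left out.

```typescript
// 'default' is an assumed label for "no explicit opt-out was given".
type EngineSource = 'cli' | 'env' | 'config' | 'default';

interface EngineResolution {
  enabled: boolean;
  source: EngineSource;
}

const DEPRECATION_NOTICE =
  '[deprecation] --no-engine / engine.enabled: false will be removed in v7. Migrate to engine-on (default).';

// Values of CLAUDE_AUTOPILOT_ENGINE listed above as explicit opt-outs (exact match assumed).
function isEnvOptOut(value: string | undefined): boolean {
  return value === 'off' || value === 'false' || value === '0' || value === 'no';
}

// Fires only when the engine is off because the user asked for it.
function shouldWarnDeprecation(r: EngineResolution): boolean {
  return !r.enabled && r.source !== 'default';
}

// `write` is injected so tests can capture output instead of touching process.stderr.
function maybeWarn(r: EngineResolution, write: (line: string) => void): void {
  if (shouldWarnDeprecation(r)) write(DEPRECATION_NOTICE + '\n');
}
```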
|
|
976
|
+
|
|
977
|
+
**Spec.** [`docs/specs/v6.1-default-flip.md`](docs/specs/v6.1-default-flip.md)
|
|
978
|
+
is the canonical reference for what flipped, why, and the v7 follow-up.
|
|
979
|
+
|
|
980
|
+
**Migration tips.**
|
|
981
|
+
- If your CI parses stderr as free-form text and relies on the v5.x
|
|
982
|
+
shape, set `CLAUDE_AUTOPILOT_ENGINE=off` (or pass `--no-engine`)
|
|
983
|
+
to pin the legacy behavior. You'll see the deprecation notice on
|
|
984
|
+
every invocation until you remove it — that's expected.
|
|
985
|
+
- If you opt out via config (`engine.enabled: false`), the same notice
|
|
986
|
+
fires on every invocation. Plan to remove that line before bumping
|
|
987
|
+
to v7.
|
|
988
|
+
- If you already set `engine.enabled: true`, nothing changes; your config
|
|
989
|
+
still wins via the same precedence rules.
|
|
990
|
+
- See [`docs/v6/migration-guide.md#migrating-from-v60-to-v61`](docs/v6/migration-guide.md)
|
|
991
|
+
for the full upgrade walkthrough.
|
|
992
|
+
|
|
993
|
+
**Test surface.**
|
|
994
|
+
- `tests/run-state/resolve-engine.test.ts` — flipped 4 default-related
|
|
995
|
+
cases. New `v6.1 default-flip` describe block + `v6.1 deprecation
|
|
996
|
+
warning` describe block covering the predicate, the emitter, the
|
|
997
|
+
default `process.stderr` branch, and the `builtInDefault` override
|
|
998
|
+
path.
|
|
999
|
+
- `tests/run-state/run-phase-with-lifecycle.test.ts` — added 4 new
|
|
1000
|
+
cases pinning engine-on as the new default + the deprecation banner
|
|
1001
|
+
firing on opt-out / staying silent on the new default.
|
|
1002
|
+
- 9 engine-smoke tests (`brainstorm`, `costs`, `implement`, `migrate`,
|
|
1003
|
+
`plan`, `pr`, `review`, `spec`, `validate`) updated — the
|
|
1004
|
+
"engine off (default)" cases are now "engine on (v6.1 default)";
|
|
1005
|
+
the matching `cliEngine: false` cases stay as legacy-escape-hatch
|
|
1006
|
+
coverage.
|
|
1007
|
+
|
|
1008
|
+
**Files changed.**
|
|
1009
|
+
- `src/core/run-state/resolve-engine.ts` — new active default constant
|
|
1010
|
+
`ENGINE_DEFAULT_V6_1 = true`. The deprecated `ENGINE_DEFAULT_V6_0`
|
|
1011
|
+
export keeps its historical value (`false`) so out-of-tree consumers
|
|
1012
|
+
who pinned that symbol get what the name promises; both constants are
|
|
1013
|
+
removed in v7. New `emitEngineOffDeprecationWarning` helper +
|
|
1014
|
+
`shouldWarnEngineOffDeprecation` predicate +
|
|
1015
|
+
`ENGINE_OFF_DEPRECATION_MESSAGE` stable copy.
|
|
1016
|
+
- `src/core/run-state/run-phase-with-lifecycle.ts` — wires the
|
|
1017
|
+
deprecation helper into the engine-off branch.
|
|
1018
|
+
- `docs/v6/migration-guide.md` — new "Migrating from v6.0 to v6.1"
|
|
1019
|
+
section, updated precedence matrix, refreshed default-flip plan,
|
|
1020
|
+
relabeled "What changes" table.
|
|
1021
|
+
- `README.md` — v6 section updated (engine on by default + v7 removal
|
|
1022
|
+
timeline).
|
|
1023
|
+
- `package.json` — version `5.5.2` → `6.1.0`.
|
|
1024
|
+
|
|
1025
|
+
## v6.1.0-pre — `runs watch <id>` live cost meter (2026-05-07)
|
|
1026
|
+
|
|
1027
|
+
**The YC-demo moment.** v6.0.x hardened the events.ndjson stream across
|
|
1028
|
+
all 10 wrapped phases; v6.1 makes that stream visible in real time.
|
|
1029
|
+
`runs watch <runId>` tails events.ndjson via `fs.watchFile` (1s poll —
|
|
1030
|
+
inotify/FSEvents are unreliable for tiny appends across our matrix) and
|
|
1031
|
+
pretty-renders each event with a running cost/budget meter so a user
|
|
1032
|
+
running `claude-autopilot autopilot ...` in one terminal can `runs watch`
|
|
1033
|
+
in another and watch their $25 budget tick down while phases ship code.
|
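The polling tail described above needs to cope with a poll landing mid-write, when the last line of events.ndjson is only partially flushed. The sketch below shows one way to handle that with a carry buffer. `consumeAppend`, `TailState`, and `TailResult` are hypothetical names, and re-reading the whole file as a string on each poll is a simplification; a real tail would more likely read from a byte offset inside the `fs.watchFile` callback.

```typescript
// Hypothetical helper, not the package's actual tail implementation.
interface TailState {
  offset: number; // characters of the file already consumed
  carry: string;  // trailing partial line still waiting for its newline
}

interface TailResult<E> {
  state: TailState;
  events: E[];
}

function consumeAppend<E>(state: TailState, fileContents: string): TailResult<E> {
  // Each poll hands over the full file; only the part past `offset` is new.
  const fresh = fileContents.slice(state.offset);
  const pieces = (state.carry + fresh).split('\n');
  // The last piece is '' when the file ends on a newline, else a partial line.
  const carry = pieces.pop() ?? '';
  const events: E[] = [];
  for (const line of pieces) {
    if (line.trim() === '') continue; // tolerate blank lines
    events.push(JSON.parse(line) as E);
  }
  return { state: { offset: fileContents.length, carry }, events };
}
```

Each poll would call `consumeAppend` with the previous state and render the returned events, so a half-written line is never parsed until its newline arrives.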
|
1034
|
+
|
|
1035
|
+
**Demo transcript.** Live tail of a fixture run, ANSI-stripped:
|
|
1036
|
+
|
|
1037
|
+
```
|
|
1038
|
+
* run 01HZK7P3D8Q9V00000000000AB
|
|
1039
|
+
phases: spec -> plan -> implement -> pr
|
|
1040
|
+
budget: $0.00 / $25.00 (0%)
|
|
1041
|
+
[12:00:01] phase.start spec
|
|
1042
|
+
[12:00:42] phase.cost spec +$0.07 (in: 1.2k, out: 3.4k) total: $0.07
|
|
1043
|
+
[12:00:45] phase.success spec OK 44.2s
|
|
1044
|
+
[12:00:46] phase.start plan
|
|
1045
|
+
[12:01:12] phase.cost plan +$0.21 (in: 4.1k, out: 8.2k) total: $0.28
|
|
1046
|
+
[12:01:15] phase.success plan OK 29.0s
|
|
1047
|
+
[12:08:33] phase.externalRef pr -> github-pr#123
|
|
1048
|
+
[12:08:34] run.complete status=success totalCostUSD=$4.20 duration=8m32s
|
|
1049
|
+
|
|
1050
|
+
done run 01HZK7P3D8Q9V00000000000AB
|
|
1051
|
+
status=success totalCostUSD=$4.20 duration=8m33s
|
|
1052
|
+
```
|
|
1053
|
+
|
|
1054
|
+
**Modes.**
|
|
1055
|
+
|
|
1056
|
+
- `runs watch <id>` — live tail, exits on `run.complete` / Ctrl-C
|
|
1057
|
+
- `runs watch <id> --since <seq>` — replay forward from a specific seq
|
|
1058
|
+
(resume after disconnect)
|
|
1059
|
+
- `runs watch <id> --no-follow` — render snapshot once and exit (CI /
|
|
1060
|
+
scripting)
|
|
1061
|
+
- `runs watch <id> --json` — emit raw NDJSON to stdout (one event per
|
|
1062
|
+
line) for piping to `jq` or external dashboards. ANSI suppressed.
|
|
1063
|
+
- `runs watch <id> --no-color` — force ANSI off even on a TTY
|
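A rough sketch of how `--since` replay and the exit-on-`run.complete` rule could combine when rendering a snapshot. `replayFrom` and `SeqEvent` are hypothetical names, and treating `--since` as inclusive is an assumption; the entry does not say whether the given seq itself is replayed.

```typescript
// Hypothetical event shape: only the fields this sketch needs.
interface SeqEvent {
  seq: number;
  type: string;
}

// Assumption: --since is inclusive, so `since = 3` replays seq 3 onward.
function replayFrom<E extends SeqEvent>(events: E[], since?: number): E[] {
  const picked: E[] = [];
  for (const e of events) {
    if (since !== undefined && e.seq < since) continue;
    picked.push(e);
    // Stop at the terminal event, mirroring "exits on run.complete".
    if (e.type === 'run.complete') break;
  }
  return picked;
}
```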
|
1064
|
+
|
|
1065
|
+
**Pretty rendering.** Color thresholds on the budget bar — green <50%,
|
|
1066
|
+
yellow 50-90%, red >90%. Per-event coloring: cyan for phase.start, yellow
|
|
1067
|
+
for phase.cost, green for phase.success, red for phase.failed, magenta
|
|
1068
|
+
for phase.externalRef + lock.takeover + replay.override, bold-green for
|
|
1069
|
+
run.complete success, bold-red for run.complete failed/aborted. ANSI
|
|
1070
|
+
auto-strips when stdout is not a TTY (CI), when `--no-color` or `--json`
|
|
1071
|
+
is set, or when `NO_COLOR` env var is present.
|
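The thresholds and the ANSI rules above reduce to two small pure functions. This is a sketch under assumptions: `budgetColor`, `ansiEnabled`, and the `AnsiInputs` shape are hypothetical names, a missing or zero budget is treated as 0% spent, and `NO_COLOR` counts as "present" whenever it is defined.

```typescript
type BudgetColor = 'green' | 'yellow' | 'red';

// green <50%, yellow 50-90%, red >90%
function budgetColor(spentUSD: number, budgetUSD: number): BudgetColor {
  // Assumption: no positive budget means nothing to colour against.
  const pct = budgetUSD > 0 ? (spentUSD / budgetUSD) * 100 : 0;
  if (pct < 50) return 'green';
  if (pct <= 90) return 'yellow';
  return 'red';
}

// Hypothetical bundle of everything the ANSI decision depends on.
interface AnsiInputs {
  isTTY: boolean;
  noColorFlag: boolean; // --no-color
  jsonFlag: boolean;    // --json
  env: { NO_COLOR?: string };
}

function ansiEnabled(i: AnsiInputs): boolean {
  if (!i.isTTY) return false;                       // CI / piped output
  if (i.noColorFlag || i.jsonFlag) return false;    // explicit flags
  if (i.env.NO_COLOR !== undefined) return false;   // env var present
  return true;
}
```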
|
1072
|
+
|
|
1073
|
+
**Pure renderer.** `src/cli/runs-watch-renderer.ts` is referentially
|
|
1074
|
+
transparent — `renderEventLine(event, runningTotal, opts)` is the core
|
|
1075
|
+
primitive, exported and 100% pure. Tests run as string-equality
|
|
1076
|
+
assertions in <300ms.
|
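To illustrate why a pure renderer makes string-equality tests cheap, here is a reduced sketch covering three event types from the demo transcript. Only the `renderEventLine(event, runningTotal, opts)` call shape comes from this entry; the event field names, the `Rendered` return shape, the `opts.ansi` flag, and the single-space column layout are assumptions.

```typescript
// Hypothetical event shapes; the real events.ndjson schema may differ.
type WatchEvent =
  | { type: 'phase.start'; ts: string; phase: string }
  | { type: 'phase.cost'; ts: string; phase: string; costUSD: number; inputTokens: number; outputTokens: number }
  | { type: 'phase.success'; ts: string; phase: string; durationMs: number };

interface Rendered {
  line: string;
  runningTotal: number; // total after this event, fed into the next call
}

// cyan for phase.start, yellow for phase.cost, green for phase.success
const ANSI_CODE: Record<WatchEvent['type'], string> = {
  'phase.start': '36',
  'phase.cost': '33',
  'phase.success': '32',
};

function usd(n: number): string {
  return '$' + n.toFixed(2);
}

function kTokens(n: number): string {
  return (n / 1000).toFixed(1) + 'k';
}

function plainLine(event: WatchEvent, total: number): string {
  const prefix = '[' + event.ts + '] ' + event.type + ' ' + event.phase;
  switch (event.type) {
    case 'phase.start':
      return prefix;
    case 'phase.cost':
      return prefix + ' +' + usd(event.costUSD) +
        ' (in: ' + kTokens(event.inputTokens) + ', out: ' + kTokens(event.outputTokens) + ')' +
        ' total: ' + usd(total);
    case 'phase.success':
      return prefix + ' OK ' + (event.durationMs / 1000).toFixed(1) + 's';
  }
}

// No I/O, no clock, no globals: same inputs always give the same output.
function renderEventLine(event: WatchEvent, runningTotal: number, opts: { ansi: boolean }): Rendered {
  const total = event.type === 'phase.cost' ? runningTotal + event.costUSD : runningTotal;
  const plain = plainLine(event, total);
  const line = opts.ansi ? '\u001b[' + ANSI_CODE[event.type] + 'm' + plain + '\u001b[0m' : plain;
  return { line, runningTotal: total };
}
```

Because the running total is threaded through as an argument and returned, a test can replay a fixture event list through the function and compare the joined lines against a golden string.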
|
1077
|
+
|
|
1078
|
+
**Engine modules untouched.** This is purely a consumer of the existing
|
|
1079
|
+
event stream — no changes to `src/core/run-state/**`, no changes to the
|
|
1080
|
+
10 wrapped phase verbs, no changes to `runPhaseWithLifecycle`.
|
|
1081
|
+
|
|
1082
|
+
**Tests.** +43 new tests:
|
|
1083
|
+
- `tests/cli/runs-watch-renderer.test.ts` — 29 pure-renderer cases
|
|
1084
|
+
covering every event-line variant, the three budget-bar color
|
|
1085
|
+
thresholds, ANSI on/off symmetry, and the final-summary block
|
|
1086
|
+
- `tests/cli/runs-watch.test.ts` — 14 verb-level cases covering
|
|
1087
|
+
`--no-follow` snapshot, `--since` replay, `--json` mode, run-not-found
|
|
1088
|
+
(exit 2), invalid-ULID, live tail picks up appended events,
|
|
1089
|
+
budget rendering with/without `BudgetConfig`, plural `budgets` config
|
|
1090
|
+
alias, ANSI behavior, and run-complete short-circuit on already-
|
|
1091
|
+
terminated runs
|
|
1092
|
+
|
|
1093
|
+
**CLI plumbing.** New sub-verb on the `runs` umbrella: `runs watch <id>`.
|
|
1094
|
+
Help block surfaces `--since`, `--no-follow`, `--json`, `--no-color`
|
|
1095
|
+
plus a behavior summary + exit-code key. Exit codes: 0 success / clean
|
|
1096
|
+
exit, 1 invalid input or stream error, 2 not_found.
|
|
1097
|
+
|
|
1098
|
+
## v6.0.9 — wrap `pr` through `runPhaseWithLifecycle` (2026-05-06)
|
|
1099
|
+
|
|
1100
|
+
**First side-effecting phase wrapped.** v6.0.1 → v6.0.5 wrapped read-only
|
|
1101
|
+
verbs (`scan`, `costs`, `fix`, `brainstorm`, `spec`, `plan`, `review`,
|
|
1102
|
+
`validate`); v6.0.6 extracted the lifecycle helper. v6.0.9 wraps `pr` —
|
|
1103
|
+
the first verb that mutates state on the platform of record (GitHub
|
|
1104
|
+
issue comments + PR reviews). This proves the helper's `ctx.emitExternalRef`
|
|
1105
|
+
plumbing for genuinely side-effecting phases without any helper-shape
|
|
1106
|
+
changes.
|
|
1107
|
+
|
|
1108
|
+
**Declarations.** Match the v6 spec table exactly:
|
|
1109
|
+
|
|
1110
|
+
- `idempotent: false` — re-running posts a NEW PR review ID each time
|
|
1111
|
+
(`postReviewComments` dismisses prior + creates new). PR comment
|
|
1112
|
+
posting (`postPrComment`) is marker-deduped on the body but the
|
|
1113
|
+
underlying `gh` API call is still mutating.
|
|
1114
|
+
- `hasSideEffects: true` — posts to GitHub via the `gh` CLI inside the
|
|
1115
|
+
inner `runCommand` invocation.
|
|
1116
|
+
- `externalRefs: github-pr` — recorded BEFORE the inner `runCommand`
|
|
1117
|
+
runs so a crash mid-pipeline still leaves a breadcrumb pointing at
|
|
1118
|
+
the PR. The engine path's Phase 6 resume logic can `gh pr view <id>`
|
|
1119
|
+
to confirm the PR is still open before deciding whether a replay
|
|
1120
|
+
is safe.
|
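The ordering rule in the last declaration (record the ref, then run the side-effecting command) can be sketched as follows. The `RunPhase` and `PhaseContext` shapes here are simplified, synchronous stand-ins, and `makePrPhase` is a hypothetical factory; only the field names `idempotent`, `hasSideEffects`, `ctx.emitExternalRef`, and the `github-pr` kind come from this entry.

```typescript
// Simplified stand-ins for the engine's types; the real ones may differ.
interface ExternalRef {
  kind: string;
  id: string;
  provider?: string;
}

interface PhaseContext {
  emitExternalRef(ref: ExternalRef): void;
}

interface RunPhase<I, O> {
  name: string;
  idempotent: boolean;
  hasSideEffects: boolean;
  run(input: I, ctx: PhaseContext): O;
}

interface PrInput { prNumber: number }
interface PrOutput { exitCode: number }

// `runCommand` is injected so the sketch never shells out to `gh`.
function makePrPhase(runCommand: (pr: number) => number): RunPhase<PrInput, PrOutput> {
  return {
    name: 'pr',
    idempotent: false,
    hasSideEffects: true,
    run(input, ctx) {
      // Breadcrumb first: if runCommand crashes, the ref is already recorded.
      ctx.emitExternalRef({ kind: 'github-pr', id: String(input.prNumber), provider: 'github' });
      // A non-zero exit is returned as data, not thrown, so the phase itself
      // still completes (pipeline result is not the same as phase failure).
      return { exitCode: runCommand(input.prNumber) };
    },
  };
}
```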
|
1121
|
+
|
|
1122
|
+
**Engine-off byte-for-byte unchanged.** All `gh pr view` + `git fetch` +
|
|
1123
|
+
`runCommand` behavior preserved. The wrap adds two test seams
|
|
1124
|
+
(`__testPrMeta` to short-circuit PR metadata lookup, `__testRunCommand`
|
|
1125
|
+
to stub the inner pipeline) so the smoke test exercises the engine
|
|
1126
|
+
lifecycle without `gh` or a real review pipeline. Production callers
|
|
1127
|
+
must not pass these — they're documented "test only" with a comment
|
|
1128
|
+
mirroring scan / fix's `__testReviewEngine` precedent.
|
|
1129
|
+
|
|
1130
|
+
**CLI plumbing.** The `pr` dispatcher arm now threads `cliEngine` from
|
|
1131
|
+
`parseEngineCliFlag()` and `envEngine` from
|
|
1132
|
+
`process.env.CLAUDE_AUTOPILOT_ENGINE`, mirroring every other wrapped
|
|
1133
|
+
verb. The per-verb help block (`claude-autopilot help pr`) gains
|
|
1134
|
+
`--engine` / `--no-engine` lines plus a side-effects note (engine-on
|
|
1135
|
+
records a `github-pr` externalRef; future replays gate on the spec's
|
|
1136
|
+
"side-effect readback" rule). `GLOBAL_FLAGS_BLOCK` adds "v6.0.9: wired
|
|
1137
|
+
for `pr`" to its breadcrumb list.
|
|
1138
|
+
|
|
1139
|
+
**Smoke test.** New `tests/cli/pr-engine-smoke.test.ts`, 6 cases:
|
|
1140
|
+
- engine off (default): no run dir / no engine artifacts; runCommand
|
|
1141
|
+
still invoked
|
|
1142
|
+
- engine off (`cliEngine: false`): no run dir
|
|
1143
|
+
- engine on (`--engine`): state.json + events.ndjson + lifecycle in
|
|
1144
|
+
order (run.start → phase.start → phase.externalRef → phase.success
|
|
1145
|
+
→ run.complete); externalRef recorded with kind=`github-pr`,
|
|
1146
|
+
id=`42`, provider=`github`; `idempotent: false, hasSideEffects: true`
|
|
1147
|
+
reflected on the phase
|
|
1148
|
+
- env precedence (`CLAUDE_AUTOPILOT_ENGINE=on` without CLI flag)
|
|
1149
|
+
- CLI override (`--no-engine` beats env on)
|
|
1150
|
+
- runCommand returning 1 surfaces as verb exit 1 WITHOUT marking the
|
|
1151
|
+
engine phase as failed (pipeline result ≠ phase failure, same
|
|
1152
|
+
precedent as scan)
|
|
1153
|
+
|
|
1154
|
+
**Why no follow-up `github-comment` externalRef yet.** A potential
|
|
1155
|
+
extension is to record one externalRef per posted comment / review
|
|
1156
|
+
(`github-comment`). That requires plumbing the post-comment URL out
|
|
1157
|
+
of `runCommand` (currently only logged) — deferred to a follow-up PR.
|
|
1158
|
+
For v6.0.9 the `github-pr` ref is sufficient for the spec's readback
|
|
1159
|
+
rule: a Phase 6 resume can verify the PR is still open before
|
|
1160
|
+
deciding whether to retry.
|
|
1161
|
+
|
|
1162
|
+
**Files changed.** `src/cli/pr.ts` (270 insertions / 22 deletions),
|
|
1163
|
+
`src/cli/index.ts` (+12 lines for engine knob plumbing),
|
|
1164
|
+
`src/cli/help-text.ts` (+8 lines for the per-verb Options block +
|
|
1165
|
+
breadcrumb), `tests/cli/pr-engine-smoke.test.ts` (new, 306 lines),
|
|
1166
|
+
`docs/v6/wrapping-pipeline-phases.md` (status header + table row +
|
|
1167
|
+
deviation note), `docs/v6/migration-guide.md` ("what works today" list
|
|
1168
|
+
adds `pr`), `docs/specs/v6-run-state-engine.md` (reconciliation block
|
|
1169
|
+
appended). Total: ~600 lines added, ~25 lines removed.
|
|
1170
|
+
|
|
1171
|
+
**Status after v6.0.9.** Nine of 10 phases wrapped. Remaining:
|
|
1172
|
+
`implement` (v6.0.7) and `migrate` (v6.0.8) — both side-effecting,
|
|
1173
|
+
both wrapped concurrently with this PR by parallel agents.
|
|
1174
|
+
- **Bundled UI polish skills** — ships `/ui`, `/simplify-ui`, `/ui-ux-pro-max`,
|
|
1175
|
+
`/make-interfaces-feel-better` so consumers get them via `npm install` instead
|
|
1176
|
+
of needing user-level skill installs. `/ui` runs the chained pass (audit →
|
|
1177
|
+
simplify → align → polish); the other three are individual lenses. Auto-
|
|
1178
|
+
discovered via the existing `skills/` directory in the package `files`
|
|
1179
|
+
allowlist. Pairs with the design context loader
|
|
1180
|
+
(`src/core/ui/design-context-loader.ts`) — both gate on the same
|
|
1181
|
+
`hasFrontendFiles()` predicate so they only fire when frontend files change.
|
|
1182
|
+
|
|
1183
|
+
## v6.0.7 — wrap `implement` through `runPhaseWithLifecycle` (2026-05-07)
|
|
1184
|
+
|
|
1185
|
+
**Wraps the ninth pipeline phase.** Mechanical wrap following the v6.0.6
|
|
1186
|
+
helper recipe. Engine-off path is byte-for-byte unchanged (advisory print
|
|
1187
|
+
pointing at the Claude Code `claude-autopilot` skill); engine-on path
|
|
1188
|
+
creates a run dir + emits run.start / phase.start / phase.success /
|
|
1189
|
+
run.complete events. Concurrent dispatch — landed alongside v6.0.8
|
|
1190
|
+
(`migrate`) and v6.0.9 (`pr`).
|
|
1191
|
+
|
|
1192
|
+
- New `src/cli/implement.ts` — `RunPhase<ImplementInput, ImplementOutput>`
|
|
1193
|
+
with `idempotent: true, hasSideEffects: false`. **Documented deviation
|
|
1194
|
+
from spec table:** the spec at line 159 of
|
|
1195
|
+
`docs/specs/v6-run-state-engine.md` lists `implement` with
|
|
1196
|
+
`idempotent: partial, hasSideEffects: yes, externalRefs: git-remote-push`.
|
|
1197
|
+
That declaration assumes the verb itself writes commits and pushes them
|
|
1198
|
+
to a remote. The v6.0.7 CLI verb does **not** write code, run tests,
|
|
1199
|
+
commit, or push to a remote — all of that lives in the Claude Code
|
|
1200
|
+
`claude-autopilot` skill (and its delegates: `subagent-driven-development`,
|
|
1201
|
+
`commit-push-pr`, `using-git-worktrees`). The CLI verb is the engine-wrap
|
|
1202
|
+
shell — its only side effect is writing the local
|
|
1203
|
+
`.guardrail-cache/implement/<ts>-implement.md` log stub. If a future PR
|
|
1204
|
+
inlines the implement loop into the CLI verb, the declarations flip to
|
|
1205
|
+
match the spec table and a `ctx.emitExternalRef({ kind: 'git-remote-push',
|
|
1206
|
+
id: '<commit-sha>' })` call lands after each push.
|
|
1207
|
+
- CLI dispatcher in `src/cli/index.ts` — wires `--engine` / `--no-engine` /
|
|
1208
|
+
`--context` / `--plan` / `--output` / `--config` through the helper
|
|
1209
|
+
alongside `process.env.CLAUDE_AUTOPILOT_ENGINE`. Mirrors the validate /
|
|
1210
|
+
review / plan dispatcher shape.
|
|
1211
|
+
- Help text in `src/cli/help-text.ts` — adds `implement` to the Pipeline
|
|
1212
|
+
group + per-verb Options block. Bumps `GLOBAL_FLAGS_BLOCK` to cite
|
|
1213
|
+
v6.0.7 alongside v6.0.1 → v6.0.5.
|
|
1214
|
+
- New smoke test `tests/cli/implement-engine-smoke.test.ts` (6 cases) —
|
|
1215
|
+
asserts state.json + events.ndjson lifecycle, idempotent /
|
|
1216
|
+
hasSideEffects flags, env / CLI precedence, log file location.
|
|
1217
|
+
- Test count: 1408 → 1414 (+6). `npm test` clean. `npx tsc --noEmit`
|
|
1218
|
+
clean except pre-existing fixture errors.
|
|
1219
|
+
|
|
1220
|
+
## v6.0.8 — wrap `migrate` through `runPhaseWithLifecycle` (2026-05-06)
|
|
1221
|
+
|
|
1222
|
+
**First side-effecting phase under the engine.** v6.0.1 → v6.0.5 wrapped
|
|
1223
|
+
eight read-only / advisory verbs (`scan`, `costs`, `fix`, `brainstorm`,
|
|
1224
|
+
`spec`, `plan`, `review`, `validate`). v6.0.8 wraps `migrate` — the
|
|
1225
|
+
first verb that mutates external state (database schema). Builds on the
|
|
1226
|
+
`runPhaseWithLifecycle` helper landed in v6.0.6 plus
|
|
1227
|
+
`ctx.emitExternalRef()` from inside the phase body for the
|
|
1228
|
+
`migration-version` ledger. No helper-shape changes needed.
|
|
1229
|
+
|
|
1230
|
+
**Phase declarations** match the spec table at line 162 of
|
|
1231
|
+
`docs/specs/v6-run-state-engine.md`:
|
|
1232
|
+
|
|
1233
|
+
```
|
|
1234
|
+
idempotent: false — dispatcher output varies by ledger state
|
|
1235
|
+
(N applied on attempt 1, 0 on attempt 2 even
|
|
1236
|
+
though both are operationally safe)
|
|
1237
|
+
hasSideEffects: true — applies migrations, writes audit log,
|
|
1238
|
+
regenerates types, refreshes schema cache
|
|
1239
|
+
externalRefs: migration-version, scoped `<env>:<name>` per applied
|
|
1240
|
+
migration. Phase 6's resume gate will read these back
|
|
1241
|
+
against the live `migration_state` to decide
|
|
1242
|
+
skip-already-applied vs retry vs needs-human.
|
|
1243
|
+
```
|
|
1244
|
+
|
|
1245
|
+
**Why `idempotent: false` even though the underlying Delegance migrate
|
|
1246
|
+
skill is ledger-guarded against double-apply:** at the *engine
|
|
1247
|
+
semantics* layer, `idempotent: true` means "re-running the phase against
|
|
1248
|
+
the same input produces equivalent output." A dispatch invocation that
|
|
1249
|
+
previously applied N migrations on attempt 1 and applies 0 on attempt 2
|
|
1250
|
+
(everything already in the ledger) DOES produce different output
|
|
1251
|
+
(different `appliedMigrations` list, different `status`). The spec's
|
|
1252
|
+
`idempotent: false` is correct.
|
|
1253
|
+
|
|
1254
|
+
**Engine-off path is byte-for-byte identical to v6.0.7.** Same dispatch
|
|
1255
|
+
shape (`src/core/migrate/dispatcher.ts` unchanged), same render lines,
|
|
1256
|
+
same `--json` payload callback. CI / scripts that don't pass `--engine`
|
|
1257
|
+
are unaffected.
|
|
1258
|
+
|
|
1259
|
+
| File | Role |
|
|
1260
|
+
|---|---|
|
|
1261
|
+
| `src/cli/migrate.ts` (new) | Engine-wrap shell calling `runMigrate(opts) → { exitCode, result }`. Defines `MigrateInput` / `MigrateOutput` (JSON-serializable), `RunPhase<MigrateInput, MigrateOutput>` with `name: 'migrate'`, `idempotent: false`, `hasSideEffects: true`. Phase body invokes the dispatcher and emits one `migration-version` externalRef per applied migration via `ctx.emitExternalRef({ kind: 'migration-version', id: '<env>:<name>' })`. Test seam: `__testDispatch` injects a fake dispatcher so smoke tests can exercise the engine-wrap path without spawning a child process or hitting a real database |
|
|
1262
|
+
| `src/cli/index.ts` | dispatcher case for `migrate` routes through `runMigrate` instead of inlining `runMigrateDispatch`; threads `cliEngine` + `envEngine`. Engine-off byte-for-byte unchanged — same `--json` payload callback, same render |
|
|
1263
|
+
| `src/cli/help-text.ts` | per-verb Options block for `migrate` documents `--engine` / `--no-engine` + `--config`; GLOBAL_FLAGS_BLOCK breadcrumb cites v6.0.8 |
|
|
1264
|
+
| `tests/cli/migrate-engine-smoke.test.ts` (new) | 6 cases: engine off (default — no run dir), engine on (lifecycle events, state.json shape, idempotent: false + hasSideEffects: true declaration), externalRef emission per applied migration scoped by env, skipped status (zero externalRefs), dispatcher error → exit 1 + engine still records phase.success (domain failure ≠ engine failure), CLI `--no-engine` beats env on |
|
|
1265
|
+
| `docs/v6/wrapping-pipeline-phases.md` | phase-status table flips `migrate` to "WRAPPED in v6.0.8"; status line at top moves to "NINE phases wrapped"; new deviation note documents the ledger-vs-engine-semantics rationale |
|
|
1266
|
+
| `docs/v6/migration-guide.md` | "What works today" updated — three knobs now honored by `scan`, `costs`, `fix`, `brainstorm`, `spec`, `plan`, `review`, `validate`, `migrate` |
|
|
1267
|
+
| `docs/specs/v6-run-state-engine.md` | new "What was actually built (v6.0.8)" reconciliation block |
|
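The phase body described in the `src/cli/migrate.ts` row can be sketched as below. It is an illustration under assumptions: `migratePhaseBody`, the `DispatchResult` shape, and the `status` values are hypothetical, while the `migration-version` kind, the `<env>:<name>` id format, `appliedMigrations`, and the `{ exitCode, result }` return shape come from this entry.

```typescript
interface ExternalRef {
  kind: string;
  id: string;
}

interface PhaseContext {
  emitExternalRef(ref: ExternalRef): void;
}

// Hypothetical dispatcher result; the real one likely carries more fields.
interface DispatchResult {
  status: 'applied' | 'skipped' | 'error';
  env: string;
  appliedMigrations: string[];
}

interface MigrateOutput {
  exitCode: number;
  result: DispatchResult;
}

// `dispatch` is injected, in the spirit of the __testDispatch seam.
function migratePhaseBody(dispatch: () => DispatchResult, ctx: PhaseContext): MigrateOutput {
  const result = dispatch();
  // One ref per applied migration, scoped by environment.
  for (const name of result.appliedMigrations) {
    ctx.emitExternalRef({ kind: 'migration-version', id: result.env + ':' + name });
  }
  // Domain failure is reported via exitCode; the phase still returns normally.
  return { exitCode: result.status === 'error' ? 1 : 0, result };
}
```

Running the body twice against the same ledger shows why the declaration is `idempotent: false`: the first call returns the applied list and emits refs, the second returns an empty list and emits none, so the outputs differ even though both runs are safe.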
|
1268
|
+
|
|
1269
|
+
**Test delta:** 1408 → 1414 (+6). Typecheck clean. All 1408 existing
|
|
1270
|
+
tests pass unchanged — the engine-off path for `migrate` is byte-for-
|
|
1271
|
+
byte identical to v6.0.7 (same dispatch shape, same render).
|
|
1272
|
+
|
|
1273
|
+
**Concurrency note.** v6.0.7 (`implement`) and v6.0.9 (`pr`) are in
|
|
1274
|
+
flight on parallel worktrees, both targeting shared docs (CHANGELOG,
|
|
1275
|
+
recipe table, migration-guide) and `src/cli/{index,help-text}.ts`. The
|
|
1276
|
+
rebase contract: on push rejection, fetch + rebase + resolve conflicts
|
|
1277
|
+
keeping all wraps' contributions, re-test, push with `--force-with-lease`.
|
|
1278
|
+
|
|
1279
|
+
**Not done in v6.0.8 — explicit non-goals:**
|
|
1280
|
+
- Wrapping `implement` and `pr`. Continues across v6.0.7 / v6.0.9
|
|
1281
|
+
using the same helper plus `ctx.emitExternalRef()` for
|
|
1282
|
+
`git-remote-push` (implement) and `github-pr` (pr).
|
|
1283
|
+
- Wiring Phase 6's `migration_state` read-back. The engine PERSISTS
|
|
1284
|
+
`migration-version` externalRefs in v6.0.8; consulting them on
|
|
1285
|
+
resume ships in Phase 6+. Until then, retries on side-effecting
|
|
1286
|
+
phases require `--force-replay`.
|
|
1287
|
+
- Multi-phase pipeline orchestrator (autopilot's full
|
|
1288
|
+
`brainstorm → spec → plan → ... → migrate → ...` flow under one runId).
|
|
1289
|
+
- Flipping the v6.0 built-in default to ON. v6.1 territory.
|
|
1290
|
+
|
|
1291
|
+
## v6.0.6 — `runPhaseWithLifecycle` helper (2026-05-06)
|
|
1292
|
+
|
|
1293
|
+
**Tech-debt refactor, no behavior change.** v6.0.1 → v6.0.5 wrapped eight
|
|
1294
|
+
CLI verbs (`scan`, `costs`, `fix`, `brainstorm`, `spec`, `plan`, `review`,
|
|
1295
|
+
`validate`) by hand-rolling the same ~100-line lifecycle pattern in each
|
|
1296
|
+
file: `createRun → optional run.warning → runPhase → run.complete →
|
|
1297
|
+
state.json refresh → best-effort lock release in finally`. Bugbot caught
|
|
1298
|
+
the duplication on PR #97 (LOW severity, deferred) with the explicit
|
|
1299
|
+
note: "extracting from 5 of 10 examples risks getting the abstraction
|
|
1300
|
+
wrong; from 10 of 10 the pattern is fully evidenced." At 8 of 10, the
|
|
1301
|
+
pattern is sufficiently evidenced that the remaining three side-effecting
|
|
1302
|
+
phases (`implement`, `migrate`, `pr`) can use the same helper plus
|
|
1303
|
+
`ctx.emitExternalRef()` from inside their phase body — no helper-shape
|
|
1304
|
+
changes needed.
|
|
1305
|
+
|
|
1306
|
+
**The helper.** New `src/core/run-state/run-phase-with-lifecycle.ts` sits
|
|
1307
|
+
on top of the existing `runPhase()` API (which is unchanged). Callers
|
|
1308
|
+
continue to define their own `RunPhase<I, O>` with per-phase
|
|
1309
|
+
`idempotent` / `hasSideEffects` / `run`, and pass it in alongside the
|
|
1310
|
+
input, the loaded config, the engine knobs, and a `runEngineOff`
|
|
1311
|
+
escape-hatch callback. The helper:
|
|
1312
|
+
|
|
1313
|
+
- Resolves engine on/off via the canonical CLI > env > config > default
|
|
1314
|
+
precedence
|
|
1315
|
+
- On engine-off: invokes `runEngineOff()` and returns its result with
|
|
1316
|
+
`runId/runDir: null`
|
|
1317
|
+
- On engine-on: creates a run dir, optionally emits `run.warning` for
|
|
1318
|
+
invalid env, runs the phase, emits `run.complete` (success or failed),
|
|
1319
|
+
refreshes `state.json` from replayed events, releases the lock in
|
|
1320
|
+
`finally` (idempotent), and returns `{ output, runId, runDir }`
|
|
1321
|
+
- On phase failure: emits `run.complete` with `status: 'failed'`, prints
|
|
1322
|
+
the legacy `[<phase>] engine: phase failed — <msg>` banner to stderr
|
|
1323
|
+
byte-for-byte, releases the lock, and re-throws
|
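The four behaviors listed above can be condensed into a small control-flow sketch. It is deliberately reduced and is not the helper's real signature: `runPhaseWithLifecycleSketch` and `LifecycleDeps` are hypothetical, the engine decision is passed in already resolved, the body is synchronous, the run id and directory are placeholders, and the stderr banner and `state.json` refresh are omitted.

```typescript
interface LifecycleEvent {
  type: string;
  status?: 'success' | 'failed';
}

// Hypothetical dependency bundle standing in for the real engine pieces.
interface LifecycleDeps<O> {
  engineEnabled: boolean;            // resolved via CLI > env > config > default
  runPhaseBody: () => O;             // stands in for runPhase(phase, input)
  runEngineOff: () => O;             // legacy escape-hatch path
  emit: (e: LifecycleEvent) => void; // stands in for the events.ndjson writer
  releaseLock: () => void;
}

interface LifecycleResult<O> {
  output: O;
  runId: string | null;
  runDir: string | null;
}

function runPhaseWithLifecycleSketch<O>(deps: LifecycleDeps<O>): LifecycleResult<O> {
  if (!deps.engineEnabled) {
    // Engine-off: no run dir, no events, legacy behavior only.
    return { output: deps.runEngineOff(), runId: null, runDir: null };
  }
  const runId = 'run-sketch';      // placeholder; the real engine mints a ULID
  const runDir = 'runs/' + runId;  // placeholder path
  deps.emit({ type: 'run.start' });
  try {
    deps.emit({ type: 'phase.start' });
    const output = deps.runPhaseBody();
    deps.emit({ type: 'phase.success' });
    deps.emit({ type: 'run.complete', status: 'success' });
    return { output, runId, runDir };
  } catch (err) {
    deps.emit({ type: 'phase.failed' });
    deps.emit({ type: 'run.complete', status: 'failed' });
    throw err; // re-throw with the original error preserved
  } finally {
    deps.releaseLock(); // runs on both the success and failure paths
  }
}
```

Keeping `runEngineOff` as a callback is what lets each verb preserve its legacy path untouched while sharing everything else.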
|
1324
|
+
|
|
1325
|
+
**Migrated phases.** All eight wrapped verbs reduced. Each `runX(opts)`
|
|
1326
|
+
function shrinks: keep the per-phase `RunPhase<I, O>` definition + the
|
|
1327
|
+
engine-off path body; delete the lifecycle boilerplate; call
|
|
1328
|
+
`runPhaseWithLifecycle` once. Total reduction across `src/cli/`:
|
|
1329
|
+
|
|
1330
|
+
- `scan.ts` 498 → 429 lines (-69)
|
|
1331
|
+
- `costs.ts` 297 → 231 lines (-66)
|
|
1332
|
+
- `fix.ts` 473 → 415 lines (-58)
|
|
1333
|
+
- `brainstorm.ts` 251 → 189 lines (-62)
|
|
1334
|
+
- `spec.ts` 216 → 159 lines (-57)
|
|
1335
|
+
- `plan.ts` 269 → 199 lines (-70)
|
|
1336
|
+
- `review.ts` 256 → 189 lines (-67)
|
|
1337
|
+
- `validate.ts` 262 → 196 lines (-66)
|
|
1338
|
+
- **Total: 2522 → 2007 lines (515 lines saved)**
|
|
1339
|
+
|
|
1340
|
+
**Engine-off path is byte-for-byte unchanged.** All eight existing
|
|
1341
|
+
`tests/cli/<verb>-engine-smoke.test.ts` smokes pass without modification
|
|
1342
|
+
(44 cases). The helper accepts a `runEngineOff` callback so the legacy
|
|
1343
|
+
code path stays intact even when the phase body's call shape would
|
|
1344
|
+
otherwise pin it.
|
|
1345
|
+
|
|
1346
|
+
### Test count
|
|
1347
|
+
|
|
1348
|
+
After v6.0.5 baseline: 1396 → 1408 (+12). +12 cases for the new
|
|
1349
|
+
`tests/run-state/run-phase-with-lifecycle.test.ts` covering: engine-off
|
|
1350
|
+
(default + CLI > env > config precedence); engine-on success (lifecycle
|
|
1351
|
+
events, state.json shape, env / config resolution, costUSD pass-through,
|
|
1352
|
+
costUSD-absent fallback to 0); engine-on failure (run.complete failed,
|
|
1353
|
+
state.json refresh, error re-thrown with original message preserved,
|
|
1354
|
+
lock released through finally); invalid env value falling through to
|
|
1355
|
+
config-resolved engine-on with `run.warning`. Existing 44 phase smokes
|
|
1356
|
+
unchanged. Typecheck clean. Bugbot LOW from PR #97 addressed.
|
|
1357
|
+
|
|
1358
|
+
### Deliberately deferred
|
|
1359
|
+
|
|
1360
|
+
- Wrapping the remaining pipeline phases (`implement`, `migrate`, `pr`).
|
|
1361
|
+
Side-effecting phases need careful externalRef plumbing — they will
|
|
1362
|
+
build against `runPhaseWithLifecycle` plus `ctx.emitExternalRef()`
|
|
1363
|
+
from inside their phase body. Helper signature does not need to grow
|
|
1364
|
+
for them; documented in the helper's header comment.
|
|
1365
|
+
- Multi-phase pipeline orchestrator (autopilot's full
|
|
1366
|
+
`brainstorm → spec → plan → ...` flow under one runId). The single-
|
|
1367
|
+
phase shape stays — multi-phase wrapping is a separate v6.x lift.
|
|
1368
|
+
- Flipping the v6.0 built-in default to ON. v6.1 territory.
|
|
1369
|
+
|
|
1370
|
+
## v6.0.5 — Engine wire-up Part E (2026-05-06)
|
|
1371
|
+
|
|
1372
|
+
**The headline.** v6.0.4 wrapped `plan` and `review`. v6.0.5 continues the
|
|
1373
|
+
mechanical wrap pattern from the recipe at
|
|
1374
|
+
[`docs/v6/wrapping-pipeline-phases.md`](docs/v6/wrapping-pipeline-phases.md)
|
|
1375
|
+
with one more single-shot, read-only verb:
|
|
1376
|
+
|
|
1377
|
+
- **`validate`** — new CLI verb. Engine-wrap shell for the validate
|
|
1378
|
+
pipeline phase. Writes a validate log stub under
|
|
1379
|
+
`.guardrail-cache/validate/`; the actual validation work (static
|
|
1380
|
+
checks, auto-fix, tests, Codex review with auto-fix, bugbot triage) is
|
|
1381
|
+
owned by the Claude Code `/validate` skill. Declared `idempotent: true,
|
|
1382
|
+
hasSideEffects: false` (local file write only; no provider calls, no
|
|
1383
|
+
git push, no PR comment, no SARIF upload).
|
|
1384
|
+
|
|
1385
|
+
**Documented deviation from the spec table.** The v6 spec
|
|
1386
|
+
([docs/specs/v6-run-state-engine.md](docs/specs/v6-run-state-engine.md),
|
|
1387
|
+
line 161) lists `validate` with externalRefs `sarif-artifact`. The
|
|
1388
|
+
v6.0.5 wrap matches the `idempotent: true, hasSideEffects: false`
|
|
1389
|
+
declaration but does **not** plumb a `sarif-artifact` externalRef — the
|
|
1390
|
+
v6.0.5 `validate` CLI verb does not emit a SARIF artifact. SARIF
|
|
1391
|
+
emission lives in `claude-autopilot run --format sarif --output <path>`
|
|
1392
|
+
(a separate verb). The SARIF reference is local-only file output (no
|
|
1393
|
+
remote upload), so the engine doesn't need a readback rule for it on
|
|
1394
|
+
resume — `idempotent: true` covers replay safety. If a future PR adds
|
|
1395
|
+
SARIF emission directly to this verb, the wrap can add a
|
|
1396
|
+
`ctx.emitExternalRef({ kind: 'sarif-artifact', ... })` call after the
|
|
1397
|
+
file write lands. Documented inline in `src/cli/validate.ts` and in the
|
|
1398
|
+
wrapping recipe's deviation note.
|
|
1399
|
+
|
|
1400
|
+
The engine-off code path is byte-for-byte unchanged; the `validate`
|
|
1401
|
+
verb is brand new in v6.0.5 (validation previously lived only as a
|
|
1402
|
+
Claude Code skill).
|
|
1403
|
+
|
|
1404
|
+
### Test count
|
|
1405
|
+
|
|
1406
|
+
After v6.0.4 baseline: 1390 → 1396 (+6). +6 cases for
|
|
1407
|
+
`validate-engine-smoke.test.ts`, mirroring the
|
|
1408
|
+
`review-engine-smoke.test.ts` shape: engine off → no run dir + log
|
|
1409
|
+
written; engine off (cliEngine: false); engine on → state.json +
|
|
1410
|
+
events.ndjson with the right lifecycle (`run.start` →
|
|
1411
|
+
`phase.start` → `phase.success` → `run.complete`); engine on with
|
|
1412
|
+
explicit `--context`; env-resolved; CLI override beats env. Typecheck
|
|
1413
|
+
clean.
|
|
1414
|
+
|
|
1415
|
+
### Deliberately deferred
|
|
1416
|
+
|
|
1417
|
+
- Wrapping the remaining pipeline phases (`implement`, `migrate`,
|
|
1418
|
+
`pr`). Side-effecting phases need careful externalRef plumbing per
|
|
1419
|
+
the recipe's "side effects" gate; wrap them last.
|
|
1420
|
+
- Adding SARIF emission directly to the `validate` verb. Lives in
|
|
1421
|
+
`claude-autopilot run --format sarif` (separate verb).
|
|
1422
|
+
- Extracting a shared `runPhaseWithLifecycle` helper across the eight
|
|
1423
|
+
wrapped verbs. Separate refactor PR — out of scope for v6.0.5.
|
|
1424
|
+
- Flipping the v6.0 built-in default to ON. v6.1 territory.
|
|
1425
|
+
|
|
1426
|
+
## v6.0.4 — Engine wire-up Part D (2026-05-06)
|
|
1427
|
+
|
|
1428
|
+
**The headline.** v6.0.3 wrapped `brainstorm` and `spec`. v6.0.4 continues
|
|
1429
|
+
the mechanical wrap pattern from the recipe at
|
|
1430
|
+
[`docs/v6/wrapping-pipeline-phases.md`](docs/v6/wrapping-pipeline-phases.md)
|
|
1431
|
+
with two more single-shot verbs:
|
|
1432
|
+
|
|
1433
|
+
- **`plan`** ([#98](https://github.com/axledbetter/claude-autopilot/pull/98)) —
|
|
1434
|
+
new CLI verb. Engine-wrap shell for the plan pipeline phase. Writes a
|
|
1435
|
+
plan markdown stub under `.guardrail-cache/plans/`; the actual
|
|
1436
|
+
LLM-driven planning content is owned by the Claude Code
|
|
1437
|
+
superpowers:writing-plans skill. Declared `idempotent: true,
|
|
1438
|
+
hasSideEffects: false` (local file write only; no provider calls, no
|
|
1439
|
+
git push, no PR comment).
|
|
1440
|
+
- **`review`** ([#98](https://github.com/axledbetter/claude-autopilot/pull/98)) —
|
|
1441
|
+
new CLI verb. Engine-wrap shell for the review pipeline phase. Writes
|
|
1442
|
+
a review log stub under `.guardrail-cache/reviews/`; the actual
|
|
1443
|
+
LLM-driven review content is owned by the Claude Code review skills
|
|
1444
|
+
(`/review`, `/review-2pass`, `pr-review-toolkit:review-pr`). Declared
|
|
1445
|
+
`idempotent: true, hasSideEffects: false`.
|
|
1446
|
+
|
|
1447
|
+
**Documented deviation from the spec table.** The v6 spec
|
|
1448
|
+
([docs/specs/v6-run-state-engine.md](docs/specs/v6-run-state-engine.md))
|
|
1449
|
+
lists `review` with externalRefs `review-comments`, implying PR-side
|
|
1450
|
+
comment posting (which would force `hasSideEffects: true`). The v6.0.4
|
|
1451
|
+
`review` verb does **not** post anywhere — PR-side comment posting
|
|
1452
|
+
lives in `claude-autopilot pr --inline-comments` /
|
|
1453
|
+
`--post-comments` (a separate verb). If a future PR adds platform-side
|
|
1454
|
+
comment posting to this verb, both declarations will need to flip and
|
|
1455
|
+
the readback rules will need to plumb a `review-comments` externalRef.
|
|
1456
|
+
Documented inline in `src/cli/review.ts`.
|
|
1457
|
+
|
|
1458
|
+
**Backward-compat — `review` grouping prefix preserved.**
|
|
1459
|
+
`claude-autopilot review` (no args) still prints the alpha.2 prefix
|
|
1460
|
+
help banner per the V16 v4-compat test. Flat-verb invocation requires
|
|
1461
|
+
at least one flag, e.g. `claude-autopilot review --engine`.
|
|
1462
|
+
`claude-autopilot help review` continues to surface the flat-verb
|
|
1463
|
+
Options block via `buildCommandHelpText`.
|
|
1464
|
+
|
|
1465
|
+
Engine-off code paths are unchanged for both verbs.
|
|
1466
|
+
|
|
1467
|
+
### Test count
|
|
1468
|
+
|
|
1469
|
+
After v6.0.3 baseline: 1378 → 1390 (+12). +6 cases for
|
|
1470
|
+
`plan-engine-smoke.test.ts`, +6 cases for `review-engine-smoke.test.ts`.
|
|
1471
|
+
Both mirror `costs-engine-smoke.test.ts`: engine off → no run dir;
|
|
1472
|
+
engine on → state.json + events.ndjson with the right lifecycle
|
|
1473
|
+
(`run.start` → `phase.start` → `phase.success` → `run.complete`);
|
|
1474
|
+
env-resolved; CLI override beats env. Typecheck clean.
|
|
1475
|
+
|
|
1476
|
+
### Deliberately deferred
|
|
1477
|
+
|
|
1478
|
+
- Wrapping the remaining pipeline phases (`implement`, `migrate`,
|
|
1479
|
+
`validate`, `pr`). Side-effecting phases (`implement`, `migrate`,
|
|
1480
|
+
`pr`) need careful externalRef plumbing per the recipe's "side
|
|
1481
|
+
effects" gate; wrap them last.
|
|
1482
|
+
- Flipping the v6.0 built-in default to ON. v6.1 territory.
|
|
1483
|
+
|
|
1484
|
+
## v6.0.3 — Wrap brainstorm + spec through runPhase (2026-05-05)
|
|
1485
|
+
|
|
1486
|
+
**The headline.** v6.0.3 continues the mechanical phase-wrap pattern from
|
|
1487
|
+
the recipe at
|
|
1488
|
+
[`docs/v6/wrapping-pipeline-phases.md`](docs/v6/wrapping-pipeline-phases.md)
|
|
1489
|
+
with two more pipeline verbs:
|
|
1490
|
+
|
|
1491
|
+
- **`brainstorm`** — the pipeline entry point. Implemented primarily as
|
|
1492
|
+
a Claude Code skill (`/brainstorm` → `superpowers:brainstorming`); the
|
|
1493
|
+
CLI verb is an advisory shim pointing the user there. The wrap declares
|
|
1494
|
+
`idempotent: true, hasSideEffects: false`. Engine-off path is
|
|
1495
|
+
byte-for-byte identical to v6.0.2 (the same advisory banner). Engine-on
|
|
1496
|
+
path creates a run dir + emits `run.start` / `phase.start` /
|
|
1497
|
+
`phase.success` / `run.complete`. `--json` envelope shape is preserved
|
|
1498
|
+
for back-compat with the WS7 welcome regression guard and
|
|
1499
|
+
`json-channel-discipline.test.ts`.
|
|
1500
|
+
- **`spec`** — same shape as brainstorm. New top-level subcommand (it
|
|
1501
|
+
was previously absent from `SUBCOMMANDS`); the CLI verb is an advisory
|
|
1502
|
+
shim pointing at the autopilot/brainstorm Claude Code flow. Same wrap
|
|
1503
|
+
flags + same engine lifecycle.
|
|
1504
|
+
|
|
1505
|
+
**Documented deviation from the spec table.** The
|
|
1506
|
+
[v6 spec table](docs/specs/v6-run-state-engine.md) declares both
|
|
1507
|
+
`brainstorm` and `spec` `idempotent: no` because the LLM dialogue
|
|
1508
|
+
produces new content each invocation. v6.0.3 declares `idempotent: true`
|
|
1509
|
+
because the CLI verbs themselves are static advisory prints with no LLM
|
|
1510
|
+
call and no externalRefs to reconcile — the engine's idempotency check
|
|
1511
|
+
is "safe to retry without reconciliation," not "produces byte-identical
|
|
1512
|
+
output." Justified inline at the top of `src/cli/brainstorm.ts` and
|
|
1513
|
+
`src/cli/spec.ts` plus a deviation block in the recipe. Once the CLI
|
|
1514
|
+
verbs grow real LLM bodies (a future v6.x lift), the declaration may
|
|
1515
|
+
flip and a `spec-file` externalRef will land on every successful run.
|
|
1516
|
+
|
|
1517
|
+
Engine-off code paths are unchanged for both verbs; existing tests pass
|
|
1518
|
+
without modification.
|
|
1519
|
+
|
|
1520
|
+
### Test count
|
|
1521
|
+
|
|
1522
|
+
1367 → 1378 (+11). +5 cases for `brainstorm-engine-smoke.test.ts`, +5
|
|
1523
|
+
cases for `spec-engine-smoke.test.ts`, +1 case for `spec` joining
|
|
1524
|
+
`MIGRATED_VERBS` in `json-channel-discipline.test.ts`. Both new smoke
|
|
1525
|
+
files mirror `costs-engine-smoke.test.ts`: engine off → no run dir;
|
|
1526
|
+
engine on → state.json + events.ndjson with the right lifecycle
|
|
1527
|
+
(`run.start` → `phase.start` → `phase.success` → `run.complete`);
|
|
1528
|
+
env-resolved; CLI override beats env. Typecheck clean.
|
|
1529
|
+
|
|
1530
|
+
### Deliberately deferred
|
|
1531
|
+
|
|
1532
|
+
- Wrapping the six remaining pipeline phases (`plan`, `implement`,
|
|
1533
|
+
`migrate`, `validate`, `pr`, `review`). One or two per release across
|
|
1534
|
+
v6.0.4+. A parallel agent works `plan` + `review` for v6.0.4.
|
|
1535
|
+
- Promoting `brainstorm`/`spec` from advisory shims to full LLM-bearing
|
|
1536
|
+
CLI verbs. The Claude Code skill remains the user-facing entry point;
|
|
1537
|
+
the CLI wraps exist so the engine has a place to record run-state for
|
|
1538
|
+
future multi-phase orchestration.
|
|
1539
|
+
|
|
1540
|
+
## v6.0.2 — Engine wire-up Part B (2026-05-06)
|
|
1541
|
+
|
|
1542
|
+
**The headline.** v6.0.1 wrapped the first pipeline phase (`scan`) through
|
|
1543
|
+
`runPhase`. v6.0.2 continues the mechanical wrap pattern from the recipe at
|
|
1544
|
+
[`docs/v6/wrapping-pipeline-phases.md`](docs/v6/wrapping-pipeline-phases.md)
|
|
1545
|
+
with two more single-shot verbs:
|
|
1546
|
+
|
|
1547
|
+
- **`costs`** ([#96](https://github.com/axledbetter/claude-autopilot/pull/96)) —
|
|
1548
|
+
pure read-only summary of the local cost ledger. The cleanest possible
|
|
1549
|
+
wrap: `idempotent: true, hasSideEffects: false`, no provider, no LLM,
|
|
1550
|
+
no file writes. CLI dispatcher passes `cliEngine` + `envEngine` through;
|
|
1551
|
+
`--config` flag also wired since the engine resolver consults config.
|
|
1552
|
+
- **`fix`** ([#96](https://github.com/axledbetter/claude-autopilot/pull/96)) —
|
|
1553
|
+
applies LLM-generated patches to local files. Declared
|
|
1554
|
+
`idempotent: true` (same finding + same file content → same patch) and
|
|
1555
|
+
`hasSideEffects: false` (no remote / git push / PR creation in the
|
|
1556
|
+
existing flow — purely local file edits, which the recipe defines as
|
|
1557
|
+
platform-side-effect-free). If/when fix grows a `--push` mode it will
|
|
1558
|
+
flip to `hasSideEffects: true` with a `git-remote-push` externalRef.
|
|
1559
|
+
|
|
1560
|
+
**Documented deviation from the recipe.** Both wraps follow the recipe
|
|
1561
|
+
mechanically. `fix` adds one explicit deviation: its phase body emits
|
|
1562
|
+
per-finding console output and reads a [y/n/q] confirmation via
|
|
1563
|
+
`readline`. Pure side-effect-free phase bodies are the recipe default,
|
|
1564
|
+
but interactive verbs are an explicit exception (same precedent as
|
|
1565
|
+
`scan` keeping its LLM call inside `executeScanPhase`). The summary line
|
|
1566
|
+
and exit-code logic still lives in `renderFixOutput` so the engine path's
|
|
1567
|
+
idempotency isn't coupled to the final stdout shape. See the new "Note
|
|
1568
|
+
on interactive verbs" section at the bottom of the wrapping recipe.
|
|
1569
|
+
|
|
1570
|
+
Engine-off code paths are byte-for-byte unchanged for both verbs;
|
|
1571
|
+
existing tests pass without modification.
|
|
1572
|
+
|
|
1573
|
+
### Test count
|
|
1574
|
+
|
|
1575
|
+
1356 → 1367 (+11). +6 cases for `costs-engine-smoke.test.ts`, +5 cases
|
|
1576
|
+
for `fix-engine-smoke.test.ts`. Both mirror `scan-engine-smoke.test.ts`:
|
|
1577
|
+
engine off → no run dir; engine on → state.json + events.ndjson with
|
|
1578
|
+
the right lifecycle (`run.start` → `phase.start` → `phase.success` →
|
|
1579
|
+
`run.complete`); env-resolved; CLI override beats env. Typecheck clean.
|
|
1580
|
+
|
|
1581
|
+
### Deliberately deferred
|
|
1582
|
+
|
|
1583
|
+
- Wrapping the seven remaining pipeline phases (`brainstorm`, `plan`,
|
|
1584
|
+
`implement`, `migrate`, `validate`, `pr`, `review`). One or two per
|
|
1585
|
+
release across v6.0.3+.
|
|
1586
|
+
- Flipping the v6.0 built-in default to ON. v6.1 territory.
|
|
1587
|
+
|
|
1588
|
+
## v6.0.1 — Engine wire-up Part A (2026-05-05)
|
|
1589
|
+
|
|
1590
|
+
**The headline.** v6.0 shipped the engine modules but left the user-facing
|
|
1591
|
+
knobs un-wired. This release lights up the three knobs (`--engine` /
|
|
1592
|
+
`--no-engine` CLI flag, `CLAUDE_AUTOPILOT_ENGINE` env var,
|
|
1593
|
+
`engine.enabled` config key) with explicit precedence (CLI > env > config >
|
|
1594
|
+
built-in default) and wraps the **first** pipeline phase (`scan`)
|
|
1595
|
+
through `runPhase`. Every other pipeline phase still bypasses the engine;
|
|
1596
|
+
those land one or two per PR across subsequent v6.0.x releases following
|
|
1597
|
+
the recipe at [`docs/v6/wrapping-pipeline-phases.md`](docs/v6/wrapping-pipeline-phases.md).
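
The precedence chain described above (CLI > env > config > built-in default) can be sketched as a pure function. This is a minimal illustration under stated assumptions, not the code in `src/core/run-state/resolve-engine.ts`: the names `resolveEngineSketch`, `parseEnvBoolean`, and the `Resolution` shape are hypothetical, and the accepted env spellings are the ones this release's notes list.

```typescript
// Hypothetical sketch of a CLI > env > config > default resolver.
type Source = "cli" | "env" | "config" | "default";

interface Resolution {
  enabled: boolean;
  source: Source;
  invalidEnvValue?: string; // raw env string when it could not be parsed
}

const TRUE_FORMS = new Set(["on", "true", "1", "yes"]);
const FALSE_FORMS = new Set(["off", "false", "0", "no"]);

// Returns undefined for unrecognized values so the caller can fall through.
function parseEnvBoolean(raw: string): boolean | undefined {
  const v = raw.trim().toLowerCase();
  if (TRUE_FORMS.has(v)) return true;
  if (FALSE_FORMS.has(v)) return false;
  return undefined;
}

function resolveEngineSketch(input: {
  cliEngine?: boolean;
  envValue?: string;
  configEnabled?: boolean;
  builtInDefault?: boolean;
}): Resolution {
  // Layer 1: an explicit CLI flag always wins.
  if (input.cliEngine !== undefined) {
    return { enabled: input.cliEngine, source: "cli" };
  }
  // Layer 2: a parseable env value; an invalid one is remembered and skipped.
  let invalidEnvValue: string | undefined;
  if (input.envValue !== undefined) {
    const parsed = parseEnvBoolean(input.envValue);
    if (parsed !== undefined) return { enabled: parsed, source: "env" };
    invalidEnvValue = input.envValue;
  }
  // Layer 3: the config key.
  if (input.configEnabled !== undefined) {
    return { enabled: input.configEnabled, source: "config", invalidEnvValue };
  }
  // Layer 4: the built-in default (assumed OFF here, matching v6.0.x).
  return { enabled: input.builtInDefault ?? false, source: "default", invalidEnvValue };
}
```

Returning the raw invalid string rather than throwing keeps the function free of IO, so the caller decides whether to emit a warning.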
|
|
1598
|
+
|
|
1599
|
+
The engine still ships **OFF** by default in v6.0.x. The default flip to
|
|
1600
|
+
**ON** lands in v6.1 per [`docs/specs/v6.1-default-flip.md`](docs/specs/v6.1-default-flip.md).
|
|
1601
|
+
|
|
1602
|
+
### What landed (PR #95)
|
|
1603
|
+
|
|
1604
|
+
- **`resolveEngineEnabled()` precedence resolver.** Pure / no-IO function
|
|
1605
|
+
in `src/core/run-state/resolve-engine.ts`. Inputs:
|
|
1606
|
+
`{cliEngine?, envValue?, configEnabled?, builtInDefault?}`. Outputs:
|
|
1607
|
+
`{enabled, source, reason, invalidEnvValue?}`. Accepts case-insensitive
|
|
1608
|
+
env values `on/off/true/false/1/0/yes/no` (plus whitespace tolerance);
|
|
1609
|
+
invalid values fall through to the next-lowest precedence layer and
|
|
1610
|
+
surface the raw string in `invalidEnvValue` so the caller can emit a
|
|
1611
|
+
`run.warning`. **+45 unit tests** covering every precedence layer, every
|
|
1612
|
+
accepted env form, the conflict rules, and the invalid-env fallthrough.
|
|
1613
|
+
- **CLI flag parsing in `src/cli/index.ts`.** New `parseEngineCliFlag()`
|
|
1614
|
+
helper rejects the conflict case (both `--engine` AND `--no-engine`)
|
|
1615
|
+
with `invalid_config` exit 1. Wired into the `scan` case to pass
|
|
1616
|
+
`cliEngine` + `envEngine` (from `process.env.CLAUDE_AUTOPILOT_ENGINE`)
|
|
1617
|
+
through to `runScan`.
|
|
1618
|
+
- **Config schema** (`src/core/config/types.ts` + `schema.ts`). New
|
|
1619
|
+
optional `engine.enabled: boolean` knob; schema rejects unknown
|
|
1620
|
+
sub-keys (`additionalProperties: false`).
|
|
1621
|
+
- **Help text** (`src/cli/help-text.ts`). New `GLOBAL_FLAGS_BLOCK`
|
|
1622
|
+
documents `--json` / `--engine` / `--no-engine` + the precedence
|
|
1623
|
+
matrix + scope (scan only in v6.0.1; rest follows the recipe). Per-verb
|
|
1624
|
+
`scan` Options block adds the new flags so `claude-autopilot help scan`
|
|
1625
|
+
is self-contained.
|
|
1626
|
+
- **`scan` pilot phase wrapping** (`src/cli/scan.ts`). Refactored the
|
|
1627
|
+
LLM-call-and-finding-processing portion into `executeScanPhase(input)`
|
|
1628
|
+
→ `ScanOutput` (pure, no console output, no exit-code logic). Defined
|
|
1629
|
+
`RunPhase<ScanInput, ScanOutput>` with `name: 'scan'`,
|
|
1630
|
+
`idempotent: true`, `hasSideEffects: false`. Engine-on path:
|
|
1631
|
+
`createRun()` → `runPhase()` → `run.complete` event +
|
|
1632
|
+
`replayState`/`writeStateSnapshot` refresh + best-effort lock release
|
|
1633
|
+
in `finally`. Engine-off path: `executeScanPhase(input)` directly,
|
|
1634
|
+
byte-for-byte unchanged from v6.0. Rendering extracted into
|
|
1635
|
+
`renderScanOutput()` so the engine path's idempotency isn't coupled
|
|
1636
|
+
to console output. Test seam (`__testReviewEngine`) lets the smoke test
|
|
1637
|
+
inject a fake without an LLM key.
|
|
1638
|
+
- **End-to-end smoke test** (`tests/cli/scan-engine-smoke.test.ts`).
|
|
1639
|
+
Drives `runScan` with the engine on against a tmp project; asserts
|
|
1640
|
+
`state.status === 'success'`, single `scan` phase with the right
|
|
1641
|
+
`idempotent` / `hasSideEffects` flags, monotonic seq numbers, and the
|
|
1642
|
+
full lifecycle (`run.start` → `phase.start` → `phase.success` →
|
|
1643
|
+
`run.complete`). Five cases including engine-off (no run dir),
|
|
1644
|
+
env-resolved, CLI override, and invalid-env-fallthrough warning.
|
|
1645
|
+
- **Wrapping recipe doc** (`docs/v6/wrapping-pipeline-phases.md`).
|
|
1646
|
+
Six-step recipe + phase-status table + idempotency decision tree +
|
|
1647
|
+
worked example (scan) + a checklist subsequent v6.0.x PRs follow when
|
|
1648
|
+
wrapping the remaining nine pipeline phases (`brainstorm`, `plan`,
|
|
1649
|
+
`implement`, `migrate`, `validate`, `pr`, `review`, `fix`, `costs`).
|
|
1650
|
+
- **Migration guide** (`docs/v6/migration-guide.md`). "What works today"
|
|
1651
|
+
list updated — three knobs move from "wiring pending" to "wired (limited
|
|
1652
|
+
to scan)". Other phases still tracked under "wiring pending."
|
|
1653
|
+
- **Spec reconciliation** (`docs/specs/v6-run-state-engine.md`). New "What
|
|
1654
|
+
was actually built (v6.0.1 — Part A)" block.
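
The lifecycle that the smoke test in the list above asserts (`run.start` → `phase.start` → `phase.success` → `run.complete`, with monotonic seq numbers) could be checked over an NDJSON log roughly as follows. The field names `event` and `seq` and both function names are assumptions made for this sketch; the actual event schema lives in the engine's types.

```typescript
// Hypothetical event shape: only the two fields this check needs.
interface LoggedEvent {
  seq: number;
  event: string;
}

const EXPECTED_LIFECYCLE = ["run.start", "phase.start", "phase.success", "run.complete"];

// Parse one JSON object per non-empty line.
function parseNdjson(text: string): LoggedEvent[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as LoggedEvent);
}

// Returns a list of problems; an empty list means the log looks healthy.
function checkLifecycle(events: LoggedEvent[]): string[] {
  const problems: string[] = [];
  for (let i = 1; i < events.length; i++) {
    if (events[i].seq <= events[i - 1].seq) {
      problems.push(`seq not increasing at index ${i}`);
    }
  }
  // Ignore other event kinds (warnings, budget checks) when comparing order.
  const seen = events
    .map((e) => e.event)
    .filter((name) => EXPECTED_LIFECYCLE.indexOf(name) !== -1);
  if (seen.join(" > ") !== EXPECTED_LIFECYCLE.join(" > ")) {
    problems.push(`unexpected lifecycle: ${seen.join(" > ")}`);
  }
  return problems;
}
```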
|
|
1655
|
+
|
|
1656
|
+
### Test count
|
|
1657
|
+
|
|
1658
|
+
1306 → 1356 (+50). Typecheck clean. Existing 1306 tests continue to pass
|
|
1659
|
+
unchanged — the engine-off code path for `scan` is byte-for-byte
|
|
1660
|
+
identical to v6.0.
|
|
1661
|
+
|
|
1662
|
+
### Deliberately deferred
|
|
1663
|
+
|
|
1664
|
+
- Wrapping of any other pipeline phase. Lands one or two per PR across
|
|
1665
|
+
v6.0.2+ following the recipe.
|
|
1666
|
+
- Flipping the v6.0 built-in default to ON. v6.1 territory.
|
|
1667
|
+
- Removing `--no-engine`. v7 territory.
|
|
1668
|
+
|
|
1669
|
+
## v6.0 — Run State Engine (2026-05-05)
|
|
1670
|
+
|
|
1671
|
+
**The headline.** Autopilot moves from a stateless command-stream to a
|
|
1672
|
+
checkpointed, resumable, budget-bounded, observable pipeline. Every run gets
|
|
1673
|
+
a ULID and a per-project directory at `.guardrail-cache/runs/<ulid>/`.
|
|
1674
|
+
Every state transition appends a typed event to `events.ndjson` and updates
|
|
1675
|
+
`state.json` atomically. Two-layer budget enforcement (advisory `estimateCost`
|
|
1676
|
+
preflight + mandatory runtime guard) hard-stops runaway spend before it
|
|
1677
|
+
happens. Every CLI verb grows a `--json` flag with strict stdout/stderr
|
|
1678
|
+
channel discipline so CI consumers can drive the pipeline programmatically.
|
|
1679
|
+
Side-effect phase replay decisions consult persisted `externalRefs` plus a
|
|
1680
|
+
live provider read-back so resume is safe by construction. **v6.0 ships
|
|
1681
|
+
with the engine OFF by default — opt-in via `engine.enabled: true` (config
|
|
1682
|
+
wiring across 6.0.x point releases). Default flips to ON in v6.1.** See
|
|
1683
|
+
[`docs/v6/migration-guide.md`](docs/v6/migration-guide.md) for the v5.x → v6
|
|
1684
|
+
walkthrough and [`docs/v6/quickstart.md`](docs/v6/quickstart.md) for the
|
|
1685
|
+
five-minute version.
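
The "updates `state.json` atomically" claim above rests on the usual write-to-temp-then-rename pattern. A minimal sketch using Node's `fs` module follows; the function name `writeJsonAtomic` and the `.tmp` suffix are illustrative assumptions, and the directory fsync described in the Phase 1 notes is deliberately omitted here.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper: readers see either the old file or the new one,
// never a half-written file, because rename replaces the target in one step.
function writeJsonAtomic(targetPath: string, value: unknown): void {
  const tmpPath = `${targetPath}.tmp`;
  const fd = fs.openSync(tmpPath, "w");
  try {
    fs.writeSync(fd, JSON.stringify(value, null, 2));
    fs.fsyncSync(fd); // flush file contents before the rename makes them visible
  } finally {
    fs.closeSync(fd);
  }
  fs.renameSync(tmpPath, targetPath);
}

// Usage against a throwaway directory.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "run-state-"));
writeJsonAtomic(path.join(demoDir, "state.json"), { status: "success" });
```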
|
|
1686
|
+
|
|
1687
|
+
### Per-phase landings
|
|
1688
|
+
|
|
1689
|
+
- **Phase 1 — Run State Engine persistence layer ([#86](https://github.com/axledbetter/claude-autopilot/pull/86)).** `RunState` / `RunEvent` / `PhaseSnapshot` / `ExternalRef` / `WriterId` types in `src/core/run-state/types.ts`. Pure-TS 26-char Crockford Base32 ULID generator (`ulid.ts`). Per-run advisory lock via `proper-lockfile` + `.lock-meta.json` sidecar with PID + SHA-256-hashed hostname; off-host writers default to alive (fail closed) so a network-mounted lock can't be stolen. Durable append protocol for `events.ndjson` (`open(O_APPEND)` → `write` → `fsync(fd)` → `close` per event) with monotonic `seq` via `.seq` sidecar. Truncated last-line detection emits `run.recovery(reason: 'recovered-from-partial-write')` and continues; mid-file corruption throws `partial_write` immediately. Atomic snapshot writer for `state.json` (`open(.tmp)` → `fsync(fd)` → `rename` → `fsync(dirfd)`; tmpfs/SMB compatibility via swallowed EISDIR/EPERM/ENOTSUP on the dir-fsync). `recoverState` falls back to events replay when `state.json` is missing/corrupt. `createRun` / `listRuns` / `gcRuns` lifecycle helpers; symlink-safe GC. New `ErrorCode` variants: `lock_held`, `corrupted_state`, `partial_write`. **+56 tests.**
|
|
1690
|
+
- **Phase 2 — Phase wrapper + lifecycle ([#87](https://github.com/axledbetter/claude-autopilot/pull/87)).** `RunPhase<I, O>` interface (`idempotent` / `hasSideEffects` / `estimateCost?` / `run` / `onResume?`). `runPhase` orchestrator emits `phase.start` → `phase.success`/`failed` and gates idempotent short-circuit + side-effecting replay. Atomic per-phase snapshot writer (`writePhaseSnapshot` with path-traversal rejection on phase names). Hidden CLI verb `claude-autopilot internal log-phase-event` exposed via `cli-internal.ts` so markdown-driven skills can append events without importing the engine. Sub-phase nesting via synthetic `phaseIdx` encoding (`parentIdx * 1000 + childOrdinal`). **+27 tests.** Spec deviation: idempotent-replay short-circuit emits `run.warning(details.reason: 'idempotent-replay')` instead of a new `phase.skipped` event variant — durable log doesn't need a new shape since the snapshot is identical.
|
|
1691
|
+
- **Phase 3 — `runs` / `run resume` CLI ([#88](https://github.com/axledbetter/claude-autopilot/pull/88)).** Six verbs: `runs list` (newest-first, `--status` filter), `runs show <id>` (state + optional events tail), `runs gc` (default 30-day cutoff, confirmation gate), `runs delete <id>` (terminal-status guard + lock acquisition), `runs doctor` (replay vs snapshot drift; `--fix` rewrites), `run resume <id>` (**lookup-only** in v6.0 — identifies next phase + decision rationale; live execution wires in 6.1+). Every verb supports `--json` envelope output (v1 schema). New `Engine` group in `HELP_GROUPS`. Decision vocabulary (`retry` / `skip-idempotent` / `needs-human` / `already-complete`) preserved as a thin wrapper around the canonical `decideReplay` matrix introduced in Phase 6. **No changes to existing CLI verbs.**
|
|
1692
|
+
- **Phase 4 — Budget enforcement ([#89](https://github.com/axledbetter/claude-autopilot/pull/89)).** `BudgetConfig` (`perRunUSD`, `perPhaseUSD?`, `councilMaxRecursionDepth?`, `bgAutopilotMaxRoundsPerSelfEat?`, `conservativePhaseReserveUSD?`). `checkPhaseBudget` pure decision function with two-layer policy: (1) advisory — uses `estimateCost.high` if the phase declares one; (2) mandatory — runs regardless, enforces `actualSoFar + conservativePhaseReserveUSD <= perRunUSD` so phases without `estimateCost` still trigger budget gates. `runPhase` emits a `budget.check` event with full decision rationale (`{phase, phaseIdx, estimatedHigh, actualSoFar, reserveApplied, capRemaining, decision, reason}`) before every spawn; throws `GuardrailError(budget_exceeded)` on hard-fail. Council synthesizer recursion bounded via `councilMaxRecursionDepth` — exceeded calls return `status: 'partial'` rather than continuing. **+25-30 tests.**
|
|
1693
|
+
- **Phase 5 — Typed JSON events + strict `--json` channel discipline ([#90](https://github.com/axledbetter/claude-autopilot/pull/90)).** `--json` flag now lives on every Review / Pipeline / Deploy / Migrate / Diagnostics verb. Strict channel contract enforced by a dispatcher-level wrapper (`runUnderJsonMode` in `src/cli/json-envelope.ts`): exactly **one** JSON envelope on stdout per invocation; **only** NDJSON event lines on stderr (synthetic `run.warning` for legacy text via `installJsonModeChannelDiscipline` console-wrap); ANSI color codes stripped; interactive prompts hard-fail with `EXIT_NEEDS_HUMAN = 78` and the envelope's `nextActions` field carries the resume hint. Text-mode behavior unchanged. **`tests/cli/json-channel-discipline.test.ts` asserts the invariants per migrated verb.**
|
|
1694
|
+
- **Phase 6 — Idempotency contracts + provider read-back ([#91](https://github.com/axledbetter/claude-autopilot/pull/91)).** `decideReplay` pure decision matrix in `replay-decision.ts` maps `(priorSuccess, idempotent, hasSideEffects, refs, readbacks, forceReplay)` → `'retry' | 'skip-already-applied' | 'needs-human' | 'abort'`. Pluggable `ProviderReadback` registry in `provider-readback.ts` with built-in read-backs for `github` (via `gh` CLI), `vercel` / `fly` / `render` (via the deploy adapters), `supabase` (via `migration_state`). All read-backs **fail closed** — any throw, parse failure, or unrecognized state collapses to `existsOnPlatform=false, currentState='unknown'` so the matrix routes to `needs-human` instead of a silent skip. `runPhase` wires `decideReplay` (replaces Phase 2's hard-coded throw). New `replay.override` event variant emitted when `--force-replay` flips a refusal into a retry; `foldEvents` records overrides on `phase.meta.replayOverrides`. `PhaseSnapshot.result` field added so `skip-already-applied` returns the prior output without re-execution. CLI lookup (`runRunResume`) delegates to the same `decideReplay` so prediction matches live execution. **+55 tests.**
|
|
1695
|
+
- **Phase 7 — Live adapter certification suite ([#92](https://github.com/axledbetter/claude-autopilot/pull/92)).** Five live assertions × three providers (Vercel + Fly + Render): deploy success, auth failure, 404, rollback, log streaming with redaction-on-planted-secret. Env-gated via `resolveProviderEnv()` — runs report `skipped` until the operator adds the seven `*_TEST` GitHub Secrets per `docs/adapters/cert-suite.md`. Flake-control harness (`tests/adapters/live/_harness.ts`) implements per-provider 3-attempt retry budget with exp backoff (1s / 4s / 16s) on transient categories, hard-fail (no retry) on auth/404/schema-mismatch, soft-fail with 3-strike escalation on rollout/log-streaming flakes; **+42 unit tests** for the harness alone (run under regular `npm test`, no live creds required). Nightly CI workflow at `.github/workflows/adapter-cert.yml` (09:00 UTC + manual `workflow_dispatch`); uploads `events.ndjson` + `log-tail.txt` artifacts on every run. **Spec deviation:** Fly cert needs a third env var (`FLY_IMAGE_TEST`) since the Fly adapter doesn't build images per the v5.6 design.
|
|
1696
|
+
- **Phase 8 — Docs + migration guide ([#94](https://github.com/axledbetter/claude-autopilot/pull/94), this PR).** `docs/v6/migration-guide.md` walks v5.x users through the opt-in flow with a precedence matrix, troubleshooting recipes, the per-phase idempotency table, and the v6.0 → v6.1 default-flip plan. `docs/v6/quickstart.md` is the five-minute version. README gains a "Run State Engine (v6)" section. CHANGELOG (this entry) bundles every phase. Spec gets a Phase 8 reconciliation block + a Status column on the implementation phases table. New `docs/specs/v6.1-default-flip.md` outlines the stabilization criteria for flipping `engine.enabled` to `true` by default and removing `--no-engine`.
|
|
1697
|
+
- **Spec — Codex-reviewed twice ([#85](https://github.com/axledbetter/claude-autopilot/pull/85)).** Two passes through Codex 5.3 hardened the persistence protocol (durable append + atomic snapshot ordering), promoted `events.ndjson` to source-of-truth with `state.json` as a derived cache, mandated copy-not-symlink for artifacts, added the two-layer budget policy with a mandatory runtime guard, formalized the strict `--json` channel discipline, defined the external-operation ledger for replay safety (`ExternalRef` + provider read-back), pinned the precedence matrix, and added flake-control parameters for the live adapter cert suite.
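
The two-layer budget policy from Phase 4 in the list above can be sketched as a pure decision function. This is an illustration under assumptions, not the shipped `checkPhaseBudget`: the name `checkBudgetSketch`, the three-valued `decision`, and the choice to treat the advisory layer as a warning rather than a hard stop are all hypothetical. Only the mandatory inequality `actualSoFar + conservativePhaseReserveUSD <= perRunUSD` comes from the notes.

```typescript
// Hypothetical subset of the budget config described in the notes.
interface BudgetConfigSketch {
  perRunUSD: number;
  conservativePhaseReserveUSD?: number;
}

interface BudgetDecision {
  decision: "allow" | "warn" | "deny";
  reason: string;
}

function checkBudgetSketch(
  cfg: BudgetConfigSketch,
  actualSoFar: number,
  estimatedHigh?: number,
): BudgetDecision {
  // Mandatory layer: runs regardless of whether the phase declares an estimate.
  const reserve = cfg.conservativePhaseReserveUSD ?? 0;
  if (actualSoFar + reserve > cfg.perRunUSD) {
    return { decision: "deny", reason: "reserve-exceeds-cap" };
  }
  // Advisory layer (assumed to warn only): uses the phase's high estimate if present.
  if (estimatedHigh !== undefined && actualSoFar + estimatedHigh > cfg.perRunUSD) {
    return { decision: "warn", reason: "estimate-exceeds-cap" };
  }
  return { decision: "allow", reason: "within-budget" };
}
```

Keeping the check pure makes it easy to log the full rationale before each phase starts, which is what the `budget.check` event is described as carrying.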
|
|
1698
|
+
|
|
1699
|
+
### Codex / council pricing — from the GPT-5.5 swap ([#93](https://github.com/axledbetter/claude-autopilot/pull/93))
|
|
1700
|
+
|
|
1701
|
+
- **Default codex/council model bumped `gpt-5.3-codex` → `gpt-5.5`.** OpenAI
|
|
1702
|
+
released GPT-5.5 (codename Spud) on 2026-04-23 — better at coding than 5.4
|
|
1703
|
+
with fewer tokens, available via standard Responses/Chat Completions API
|
|
1704
|
+
at `gpt-5.5` (no `-codex` suffix). Pricing **doubles** to $5/1M input +
|
|
1705
|
+
$30/1M output, so the per-adapter `COST_PER_M_INPUT/OUTPUT` defaults moved
|
|
1706
|
+
in lockstep — without this, every cost-ledger entry would silently halve.
|
|
1707
|
+
New canonical pricing table at `src/adapters/pricing.ts` keeps the legacy
|
|
1708
|
+
`gpt-5.3-codex` and `gpt-5.4` entries for back-compat with pinned
|
|
1709
|
+
`CODEX_MODEL`/`council.models[].model` configs. Override via env vars
|
|
1710
|
+
(`CODEX_MODEL`, `CODEX_COST_INPUT_PER_M`, `CODEX_COST_OUTPUT_PER_M`).
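
To make the pricing arithmetic above concrete, here is a small sketch of a per-call cost estimate at the stated `gpt-5.5` rates ($5 per 1M input tokens, $30 per 1M output tokens). The table shape and the name `estimateCostUSD` are assumptions for illustration; they are not taken from `src/adapters/pricing.ts`.

```typescript
// Hypothetical pricing table: USD per one million tokens.
interface ModelPrice {
  inputPerM: number;
  outputPerM: number;
}

const PRICING_SKETCH: Record<string, ModelPrice> = {
  "gpt-5.5": { inputPerM: 5, outputPerM: 30 },
};

function estimateCostUSD(
  model: string,
  inputTokens: number,
  outputTokens: number,
  table: Record<string, ModelPrice> = PRICING_SKETCH,
): number {
  const price = table[model];
  if (!price) throw new Error(`no pricing entry for model: ${model}`);
  // Multiply first, divide once, to limit floating-point drift.
  return (inputTokens * price.inputPerM + outputTokens * price.outputPerM) / 1_000_000;
}

// 200k input + 50k output tokens: $1.00 + $1.50
console.log(estimateCostUSD("gpt-5.5", 200_000, 50_000)); // 2.5
```

If the table kept the old rates after the model swap, the same call would be recorded at half this amount, which is the silent under-reporting the entry describes.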
|
|
1711
|
+
|
|
1712
|
+
## v5.6.0 — Fly.io + Render deploy adapters (2026-05-04)
|
|
1713
|
+
|
|
1714
|
+
### Added
|
|
1715
|
+
|
|
1716
|
+
- **`@delegance/claude-autopilot deploy --adapter fly`** — first-class Fly.io adapter. Image-based releases via the Machines API (image must be pre-pushed via `fly deploy --build-only --push`), polling-based status, **WebSocket log streaming**, **native rollback** with simulated fallback when the API endpoint is unavailable. `FLY_API_TOKEN` env var; auth doctor warns when missing.
|
|
1717
|
+
- **`@delegance/claude-autopilot deploy --adapter render`** — first-class Render adapter. REST API deploys (with optional `clearCache`), service-scoped status polling at `GET /v1/services/{serviceId}/deploys/{deployId}`, REST-polling log stream with `(timestamp, logId)` cursor dedup, **simulated rollback** by re-deploying the previous successful commit. `RENDER_API_KEY` env var; auth doctor warns when missing.
|
|
1718
|
+
- **`DeployAdapterCapabilities` interface** — adapters declare `streamMode: 'websocket' | 'polling' | 'none'` and `nativeRollback: boolean`. CLI prints a one-line stderr notice for polling-mode adapters under `--watch` so users understand why log lines arrive in batches.
|
|
1719
|
+
- **Bounded auto-rollback orchestration in `src/cli/deploy.ts`** — when health check fails after deploy and `rollbackOn: [healthCheckFailure]` is configured, the CLI fires exactly one rollback (no chains), with `runHealthCheck` capped at 5 attempts × 6s backoff (~30s window). New terminal `DeployResult.status` values: `fail_rolled_back` and `fail_rollback_failed`.
|
|
1720
|
+
- **HTTP-status error taxonomy** — new `not_found` `ErrorCode` joins the union; per-adapter mapping: 401/403→`auth`, 404→`not_found`, 422/400→`invalid_config`, 5xx→`transient_network` (retryable). Provider request-id headers (`Fly-Request-Id`, `x-request-id`) captured into `error.details` for support tickets.
|
|
1721
|
+
- **Mandatory log redaction across all adapters** — every log line surfaced into `DeployResult.output` or PR-comment bodies runs through `redactLogLines()` (defaults: `AKIA…`, `sk-…`, `eyJ…`, `ghp_`, `xoxb-`, plus user-configurable `config.persistence.redactionPatterns`). Closes a real existing security hazard in the v5.4 Vercel adapter that was emitting unredacted logs into PR comments.
|
|
1722
|
+
- **Shared `src/adapters/deploy/_http.ts`** — extracted `fetchWithRetry` + `safeReadBody` helpers used by Vercel, Fly, and Render adapters; one canonical retry implementation to maintain.
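
The HTTP-status error taxonomy from the list above maps naturally onto a small classifier. The sketch below follows the documented mapping (401/403 → `auth`, 404 → `not_found`, 400/422 → `invalid_config`, 5xx → `transient_network`); the function name, the `retryable` field, and returning `undefined` for unmapped statuses are assumptions for illustration.

```typescript
type ErrorCodeSketch = "auth" | "not_found" | "invalid_config" | "transient_network";

interface ClassifiedStatus {
  code: ErrorCodeSketch;
  retryable: boolean;
}

function classifyHttpStatus(status: number): ClassifiedStatus | undefined {
  if (status === 401 || status === 403) return { code: "auth", retryable: false };
  if (status === 404) return { code: "not_found", retryable: false };
  if (status === 400 || status === 422) return { code: "invalid_config", retryable: false };
  if (status >= 500 && status <= 599) return { code: "transient_network", retryable: true };
  // Statuses outside the documented mapping are left for the caller to handle.
  return undefined;
}
```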
|
|
1723
|
+
|
|
1724
|
+
### Fixed
|
|
1725
|
+
|
|
1726
|
+
- **Bugbot caught + autopilot fixed 4 real bugs across the v5.6 self-eat phases.** HIGH on Phase 2 (Render service-scoped URL — `pollUntilTerminal` and `status()` were using shorthand `/v1/deploys/{id}` which doesn't exist on Render's API). MEDIUM on Phase 3 (Render cursor dedup wasn't sorting same-ms entries by id, silently dropping out-of-order siblings). LOW on Phase 4 (`printAutoRollback` hardcoded "failed 3x" but the constant is now 5). LOW on Phase 5 (`getPreviousFileContent` was being called for `.sql` files where `previousContent` is ignored, wasting a `git show` spawn per migration).
|
|
1727
|
+
- **Schema-alignment diff-aware Prisma parsing (PR #44, schema-alignment cleanup)** — `getPreviousFileContent` now defaults to a CI-aware base ref (`GITHUB_BASE_REF` → `origin/<base>`, then `CI_MERGE_REQUEST_TARGET_BRANCH_NAME`, fallback `HEAD~1`) instead of always reading from `HEAD` (which gave empty diffs in CI). Dropped models now emit `drop_column` for every field of the removed model.
|
|
1728
|
+
- **Tombstone CLI no longer crashes with a stack trace when presets are missing (PR #82)** — schema-validator was running file IO at module load time, so every `claude-autopilot --version` call eagerly read `presets/aliases.lock.json` + `presets/schemas/migrate.schema.json`; missing presets crashed the CLI before it could format an error. Now lazy-init via memoized `getValidator()`.
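
The last fix above replaces module-load-time file IO with a memoized lazy initializer. The general pattern looks roughly like this; `lazy` and `getValidatorSketch` are hypothetical names, and the stand-in validator does no real schema work.

```typescript
// Wrap an expensive initializer so it runs at most once, on first use.
function lazy<T>(init: () => T): () => T {
  let cached: { value: T } | undefined;
  return () => {
    if (cached === undefined) {
      // If init throws, nothing is cached and the next call retries.
      cached = { value: init() };
    }
    return cached.value;
  };
}

let loadCount = 0;

// Stand-in for reading preset files and compiling a schema validator.
const getValidatorSketch = lazy(() => {
  loadCount += 1;
  return { validate: (x: unknown): boolean => typeof x === "object" && x !== null };
});
```

With this shape, a command such as `--version` that never calls the getter never touches the preset files, so a missing preset can only fail on the code path that actually needs it.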
|
|
1729
|
+
|
|
1730
|
+
## v5.5.2 — Framework-agnostic /migrate (2026-04-30)
|
|
1731
|
+
|
|
1732
|
+
### Added
|
|
1733
|
+
|
|
1734
|
+
- **Working examples for Rails, Alembic, Django, golang-migrate, Prisma, Drizzle, dbmate, Flyway, supabase-cli, custom scripts** in `skills/migrate/SKILL.md`. The dispatcher was always framework-agnostic, but the prior doc text only described the Supabase path.
|
|
1735
|
+
- **Detector `defaultCommand` fills** for `prisma-push`, `drizzle-push`, `golang-migrate`, `typeorm` so `claude-autopilot init` produces a working `stack.md` on first try for these toolchains.
|
|
1736
|
+
|
|
1737
|
+
### Fixed
|
|
1738
|
+
|
|
1739
|
+
- **`/migrate` skill description rewritten** as a generic dispatcher description with a "when to use migrate-supabase instead" callout. Anyone running `migrate@1` in a non-Supabase repo no longer sees Supabase-specific instructions.
|
|
1740
|
+
|
|
1741
|
+
## v5.5.1 — `openai` SDK now optional (2026-04-30)
|
|
1742
|
+
|
|
1743
|
+
### Changed
|
|
1744
|
+
|
|
1745
|
+
- **`openai` moved to `optionalDependencies`** alongside `@anthropic-ai/sdk`, `@google/generative-ai`, `@modelcontextprotocol/sdk`. All four LLM SDKs are now optional. Installing with `npm install --omit=optional` now saves **~26 MB** (was ~13 MB after v5.5.0). `scripts/autoregress.ts` migrated to `loadOpenAI()`; it was the last direct `import OpenAI` outside the adapter layer.
|
|
1746
|
+
|
|
1747
|
+
### Notes
|
|
1748
|
+
|
|
1749
|
+
- Council runner already handles missing-synth-SDK gracefully — returns `status: 'partial'` with the friendly install hint surfaced via the synthesis error field. Users with only `ANTHROPIC_API_KEY` get a partial result with model responses preserved.
|
|
1750
|
+
|
|
1751
|
+
## v5.5.0 — Lazy-load LLM SDKs + Vercel auth doctor (2026-04-30)
|
|
4
1752
|
|
|
5
1753
|
### Added
|
|
6
1754
|
|
|
7
|
-
- **`
|
|
8
|
-
-
|
|
9
|
-
-
|
|
1755
|
+
- **`src/adapters/sdk-loader.ts`** with `loadAnthropic` / `loadOpenAI` / `loadGoogleGenerativeAI` + `isSdkInstalled` helper. Friendly `GuardrailError` on `MODULE_NOT_FOUND` points at the exact `npm install` command.
|
|
1756
|
+
- **Phase 6 of v5.4 spec — Vercel auth doctor.** `claude-autopilot doctor` detects `deploy.adapter: vercel` in `guardrail.config.yaml` and warns when `VERCEL_TOKEN` is missing.
|
|
1757
|
+
- **LLM SDK install-state surface in doctor** — shows which optional LLM SDKs are actually installed.
|
|
1758
|
+
|
|
1759
|
+
### Changed
|
|
1760
|
+
|
|
1761
|
+
- **`@anthropic-ai/sdk`, `@google/generative-ai`, `@modelcontextprotocol/sdk` moved to `optionalDependencies`**. Six adapters converted from top-level import to dynamic load. Users with `--omit=optional` shed ~13 MB and only need the SDK matching their API key.
|
|
1762
|
+
|
|
1763
|
+
## v5.4.0 — Vercel first-class deploy adapter (2026-04-30)
|
|
1764
|
+
|
|
1765
|
+
### Added
|
|
1766
|
+
|
|
1767
|
+
- **`@delegance/claude-autopilot deploy --adapter vercel`** — first-class Vercel adapter via the v13 deployments API. Returns `dpl_xxx` IDs, polls status until terminal, populates `deployUrl` / `buildLogsUrl` / `output`. Auth via `VERCEL_TOKEN`.
|
|
1768
|
+
- **`--watch` SSE+NDJSON log streaming** — subscribes to `/v2/deployments/<id>/events?builds=1`, prints to stderr in real time. Reconnects once with exp backoff on disconnect.
|
|
1769
|
+
- **`claude-autopilot deploy rollback` + `deploy status`** — CLI subverbs over the adapter's `rollback()` / `status()` methods. `--to <id>` overrides "previous prod deploy" lookup.
|
|
1770
|
+
- **Auto-rollback on health-check failure** — when `rollbackOn: [healthCheckFailure]` is set in config, the CLI promotes the previous prod deploy if the post-deploy health check fails. PR comment shows both URLs (new + rolled-back-to).
|
|
1771
|
+
- **`<!-- claude-autopilot-deploy -->` upserting PR comment** — single comment is updated in place across deploy → log-stream → health-check → rollback, instead of spamming the PR with multiple comments.
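
The upserting PR comment in the last item above relies on a hidden HTML marker to find the comment to update. A minimal sketch of that decision, kept pure so it does not depend on any GitHub client, might look like the following; the `PrComment` shape and the name `planUpsert` are assumptions, and only the marker string comes from the notes.

```typescript
const DEPLOY_MARKER = "<!-- claude-autopilot-deploy -->";

// Hypothetical minimal view of an existing PR comment.
interface PrComment {
  id: number;
  body: string;
}

type UpsertPlan =
  | { action: "create"; body: string }
  | { action: "update"; id: number; body: string };

// Decide whether to edit the existing marked comment or post a new one.
function planUpsert(existing: PrComment[], content: string): UpsertPlan {
  const body = `${DEPLOY_MARKER}\n${content}`;
  const prior = existing.find((c) => c.body.includes(DEPLOY_MARKER));
  return prior
    ? { action: "update", id: prior.id, body }
    : { action: "create", body };
}
```

Because every body written this way carries the marker, each later stage (log stream, health check, rollback) resolves to an update of the same comment.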
|
|
1772
|
+
|
|
1773
|
+
### Fixed
|
|
1774
|
+
|
|
1775
|
+
- **Bugbot caught that an explicit `--config <missing>` was silently ignored on PR #63 (Phase 3).** Autopilot fixed it with a regression test in 4 minutes.
|
|
1776
|
+
- **Phase 4 introduced a regression in Phase 2's `--watch` test surface, caught via `npm test` before the PR opened.** Autopilot adapted its spec interpretation (making the health check opt-in instead of falling back to `deployUrl`) and documented the deviation.
|
|
1777
|
+
|
|
1778
|
+
### Notes
|
|
10
1779
|
|
|
11
|
-
|
|
1780
|
+
- This release was **shipped as four self-eat PRs** (#59, #61, #63, #64) where autopilot implemented its own next phase end-to-end. Cumulative cost ~\$17.50, wall clock ~82 min, 47 new tests. See [DEMO.md](DEMO.md) for the full proof set.
|
|
1781
|
+
- v5.3 "deploy phase" was superseded by v5.4 — the adapter pattern subsumed the generic-command-only design from the in-flight v5.3 spec.
|
|
12
1782
|
|
|
13
1783
|
## v5.2.2 — Demo polish
|
|
14
1784
|
|