@delegance/claude-autopilot 5.5.2 → 7.2.0

Files changed (150)
  1. package/CHANGELOG.md +1776 -6
  2. package/README.md +65 -1
  3. package/bin/_launcher.js +38 -23
  4. package/dist/src/adapters/council/openai.js +12 -6
  5. package/dist/src/adapters/deploy/_http.d.ts +43 -0
  6. package/dist/src/adapters/deploy/_http.js +99 -0
  7. package/dist/src/adapters/deploy/fly.d.ts +206 -0
  8. package/dist/src/adapters/deploy/fly.js +696 -0
  9. package/dist/src/adapters/deploy/index.d.ts +2 -0
  10. package/dist/src/adapters/deploy/index.js +33 -0
  11. package/dist/src/adapters/deploy/render.d.ts +181 -0
  12. package/dist/src/adapters/deploy/render.js +550 -0
  13. package/dist/src/adapters/deploy/types.d.ts +67 -3
  14. package/dist/src/adapters/deploy/vercel.d.ts +17 -1
  15. package/dist/src/adapters/deploy/vercel.js +29 -49
  16. package/dist/src/adapters/pricing.d.ts +36 -0
  17. package/dist/src/adapters/pricing.js +40 -0
  18. package/dist/src/adapters/review-engine/codex.js +10 -7
  19. package/dist/src/cli/autopilot.d.ts +75 -0
  20. package/dist/src/cli/autopilot.js +750 -0
  21. package/dist/src/cli/brainstorm.d.ts +23 -0
  22. package/dist/src/cli/brainstorm.js +131 -0
  23. package/dist/src/cli/costs.d.ts +15 -1
  24. package/dist/src/cli/costs.js +99 -10
  25. package/dist/src/cli/dashboard/index.d.ts +5 -0
  26. package/dist/src/cli/dashboard/index.js +49 -0
  27. package/dist/src/cli/dashboard/login.d.ts +22 -0
  28. package/dist/src/cli/dashboard/login.js +260 -0
  29. package/dist/src/cli/dashboard/logout.d.ts +12 -0
  30. package/dist/src/cli/dashboard/logout.js +45 -0
  31. package/dist/src/cli/dashboard/status.d.ts +30 -0
  32. package/dist/src/cli/dashboard/status.js +65 -0
  33. package/dist/src/cli/dashboard/upload.d.ts +16 -0
  34. package/dist/src/cli/dashboard/upload.js +48 -0
  35. package/dist/src/cli/deploy.d.ts +3 -3
  36. package/dist/src/cli/deploy.js +34 -9
  37. package/dist/src/cli/engine-flag-deprecation.d.ts +14 -0
  38. package/dist/src/cli/engine-flag-deprecation.js +20 -0
  39. package/dist/src/cli/fix.d.ts +18 -0
  40. package/dist/src/cli/fix.js +105 -11
  41. package/dist/src/cli/help-text.d.ts +52 -0
  42. package/dist/src/cli/help-text.js +416 -0
  43. package/dist/src/cli/implement.d.ts +91 -0
  44. package/dist/src/cli/implement.js +196 -0
  45. package/dist/src/cli/index.d.ts +2 -1
  46. package/dist/src/cli/index.js +774 -245
  47. package/dist/src/cli/json-envelope.d.ts +187 -0
  48. package/dist/src/cli/json-envelope.js +270 -0
  49. package/dist/src/cli/json-mode.d.ts +33 -0
  50. package/dist/src/cli/json-mode.js +201 -0
  51. package/dist/src/cli/migrate.d.ts +111 -0
  52. package/dist/src/cli/migrate.js +305 -0
  53. package/dist/src/cli/plan.d.ts +81 -0
  54. package/dist/src/cli/plan.js +149 -0
  55. package/dist/src/cli/pr.d.ts +106 -0
  56. package/dist/src/cli/pr.js +191 -19
  57. package/dist/src/cli/preflight.js +26 -0
  58. package/dist/src/cli/review.d.ts +27 -0
  59. package/dist/src/cli/review.js +126 -0
  60. package/dist/src/cli/runs-watch-renderer.d.ts +45 -0
  61. package/dist/src/cli/runs-watch-renderer.js +275 -0
  62. package/dist/src/cli/runs-watch.d.ts +41 -0
  63. package/dist/src/cli/runs-watch.js +395 -0
  64. package/dist/src/cli/runs.d.ts +122 -0
  65. package/dist/src/cli/runs.js +902 -0
  66. package/dist/src/cli/scaffold.d.ts +39 -0
  67. package/dist/src/cli/scaffold.js +287 -0
  68. package/dist/src/cli/scan.d.ts +93 -0
  69. package/dist/src/cli/scan.js +166 -40
  70. package/dist/src/cli/setup.d.ts +30 -0
  71. package/dist/src/cli/setup.js +137 -0
  72. package/dist/src/cli/spec.d.ts +66 -0
  73. package/dist/src/cli/spec.js +132 -0
  74. package/dist/src/cli/validate.d.ts +29 -0
  75. package/dist/src/cli/validate.js +131 -0
  76. package/dist/src/core/config/schema.d.ts +9 -0
  77. package/dist/src/core/config/schema.js +7 -0
  78. package/dist/src/core/config/types.d.ts +11 -0
  79. package/dist/src/core/council/runner.d.ts +10 -1
  80. package/dist/src/core/council/runner.js +25 -3
  81. package/dist/src/core/council/types.d.ts +7 -0
  82. package/dist/src/core/errors.d.ts +1 -1
  83. package/dist/src/core/errors.js +11 -0
  84. package/dist/src/core/logging/redaction.d.ts +13 -0
  85. package/dist/src/core/logging/redaction.js +20 -0
  86. package/dist/src/core/migrate/schema-validator.js +15 -1
  87. package/dist/src/core/phases/static-rules.d.ts +5 -1
  88. package/dist/src/core/phases/static-rules.js +2 -5
  89. package/dist/src/core/run-state/budget.d.ts +88 -0
  90. package/dist/src/core/run-state/budget.js +141 -0
  91. package/dist/src/core/run-state/cli-internal.d.ts +21 -0
  92. package/dist/src/core/run-state/cli-internal.js +174 -0
  93. package/dist/src/core/run-state/events.d.ts +59 -0
  94. package/dist/src/core/run-state/events.js +512 -0
  95. package/dist/src/core/run-state/lock.d.ts +61 -0
  96. package/dist/src/core/run-state/lock.js +206 -0
  97. package/dist/src/core/run-state/phase-context.d.ts +60 -0
  98. package/dist/src/core/run-state/phase-context.js +108 -0
  99. package/dist/src/core/run-state/phase-registry.d.ts +137 -0
  100. package/dist/src/core/run-state/phase-registry.js +162 -0
  101. package/dist/src/core/run-state/phase-runner.d.ts +80 -0
  102. package/dist/src/core/run-state/phase-runner.js +447 -0
  103. package/dist/src/core/run-state/provider-readback.d.ts +130 -0
  104. package/dist/src/core/run-state/provider-readback.js +426 -0
  105. package/dist/src/core/run-state/replay-decision.d.ts +69 -0
  106. package/dist/src/core/run-state/replay-decision.js +144 -0
  107. package/dist/src/core/run-state/resolve-engine.d.ts +45 -0
  108. package/dist/src/core/run-state/resolve-engine.js +74 -0
  109. package/dist/src/core/run-state/resume-preflight.d.ts +66 -0
  110. package/dist/src/core/run-state/resume-preflight.js +116 -0
  111. package/dist/src/core/run-state/run-phase-with-lifecycle.d.ts +69 -0
  112. package/dist/src/core/run-state/run-phase-with-lifecycle.js +193 -0
  113. package/dist/src/core/run-state/runs.d.ts +57 -0
  114. package/dist/src/core/run-state/runs.js +288 -0
  115. package/dist/src/core/run-state/snapshot.d.ts +14 -0
  116. package/dist/src/core/run-state/snapshot.js +114 -0
  117. package/dist/src/core/run-state/state.d.ts +40 -0
  118. package/dist/src/core/run-state/state.js +164 -0
  119. package/dist/src/core/run-state/types.d.ts +284 -0
  120. package/dist/src/core/run-state/types.js +19 -0
  121. package/dist/src/core/run-state/ulid.d.ts +11 -0
  122. package/dist/src/core/run-state/ulid.js +95 -0
  123. package/dist/src/core/schema-alignment/extractor/index.d.ts +1 -1
  124. package/dist/src/core/schema-alignment/extractor/index.js +2 -2
  125. package/dist/src/core/schema-alignment/extractor/prisma.d.ts +13 -1
  126. package/dist/src/core/schema-alignment/extractor/prisma.js +65 -10
  127. package/dist/src/core/schema-alignment/git-history.d.ts +19 -0
  128. package/dist/src/core/schema-alignment/git-history.js +53 -0
  129. package/dist/src/core/static-rules/rules/brand-tokens.js +2 -2
  130. package/dist/src/core/static-rules/rules/schema-alignment.js +14 -4
  131. package/dist/src/dashboard/auto-upload.d.ts +26 -0
  132. package/dist/src/dashboard/auto-upload.js +107 -0
  133. package/dist/src/dashboard/config.d.ts +22 -0
  134. package/dist/src/dashboard/config.js +109 -0
  135. package/dist/src/dashboard/upload/canonical.d.ts +3 -0
  136. package/dist/src/dashboard/upload/canonical.js +16 -0
  137. package/dist/src/dashboard/upload/chain.d.ts +9 -0
  138. package/dist/src/dashboard/upload/chain.js +27 -0
  139. package/dist/src/dashboard/upload/snapshot.d.ts +23 -0
  140. package/dist/src/dashboard/upload/snapshot.js +66 -0
  141. package/dist/src/dashboard/upload/uploader.d.ts +54 -0
  142. package/dist/src/dashboard/upload/uploader.js +330 -0
  143. package/package.json +19 -3
  144. package/scripts/autoregress.ts +1 -1
  145. package/scripts/test-runner.mjs +4 -0
  146. package/skills/claude-autopilot.md +1 -1
  147. package/skills/make-interfaces-feel-better/SKILL.md +104 -0
  148. package/skills/simplify-ui/SKILL.md +103 -0
  149. package/skills/ui/SKILL.md +117 -0
  150. package/skills/ui-ux-pro-max/SKILL.md +90 -0
package/CHANGELOG.md CHANGED
@@ -1,14 +1,1784 @@
1
- # Changelog
1
+ ## Unreleased
2
+
3
+ - v5.6 Phase 7 (docs reconciliation) — pending.
4
+
5
+ ## 7.2.0 (2026-05-10)
6
+
7
+ **v7.2.0 — `claude-autopilot scaffold --from-spec <path>`.** Closes
8
+ the biggest remaining day-1 friction the v7.1.6 blank-repo benchmark
9
+ identified. Even with auto-scaffolded `CLAUDE.md` + `.gitignore`
10
+ (v7.1.7), a fresh repo still needs a hand-written `package.json`,
11
+ `tsconfig.json`, and directory skeleton before any feature work
12
+ happens. The new verb collapses that step.
13
+
14
+ **New verb** reads a spec markdown file's `## Files` section and:
15
+
16
+ * Creates listed directories (`mkdir -p`).
17
+ * Creates empty placeholder files for each path in the section.
18
+ * Generates a starter `package.json` (Node 22 ESM defaults +
19
+ hint-merged `bin` / `dependencies` / `scripts` parsed loosely
20
+ from the spec prose).
21
+ * Generates a starter `tsconfig.json` — JS-flavor (`allowJs +
22
+ checkJs + noEmit`) when the spec lists predominantly `.js` files,
23
+ TS-flavor (compiled to `dist/`) for `.ts` files.
24
+
25
+ **Never overwrites existing files** — operator opted into autopilot,
26
+ not into us nuking their package.json. Reports `· exists` for each
27
+ preserved file. Idempotent: re-running on a partially-scaffolded
28
+ repo only fills the gaps.
29
+
30
+ `--dry-run` flag logs what would happen without writing.
31
+
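The scaffold behavior described above (read the `## Files` section, create what is missing, never overwrite) can be sketched roughly as follows. This is a minimal illustration, not the package's parser: the function names `parseFilesSection` and `planScaffold`, the assumption that paths appear as backtick-quoted list items, and the convention that a trailing `/` marks a directory are all hypothetical.

```typescript
// Hypothetical sketch; the real `scaffold --from-spec` parser may differ.
interface ScaffoldPlan {
  dirs: string[];   // directories to create (mkdir -p semantics)
  files: string[];  // placeholder files to create
}

/** Collect backtick-quoted paths from list items under a "## Files" heading. */
function parseFilesSection(specMarkdown: string): string[] {
  const paths: string[] = [];
  let inFiles = false;
  for (const line of specMarkdown.split(/\r?\n/)) {
    if (/^##\s+/.test(line)) {
      // Any level-2 heading either opens or closes the section.
      inFiles = /^##\s+Files\s*$/i.test(line);
      continue;
    }
    if (!inFiles) continue;
    const match = /^\s*[-*]\s+`([^`]+)`/.exec(line);
    if (match) paths.push(match[1]);
  }
  return paths;
}

/** Plan only what is missing, so existing files are never overwritten. */
function planScaffold(paths: string[], existing: ReadonlySet<string>): ScaffoldPlan {
  const dirs = new Set<string>();
  const files: string[] = [];
  for (const p of paths) {
    if (p.endsWith("/")) {
      // Assumption: a trailing slash marks a directory entry.
      dirs.add(p.replace(/\/+$/, ""));
      continue;
    }
    const slash = p.lastIndexOf("/");
    if (slash > 0) dirs.add(p.slice(0, slash));
    if (!existing.has(p)) files.push(p);
  }
  return { dirs: Array.from(dirs).sort(), files };
}
```

Keeping the plan separate from the file-system writes is one way to get a `--dry-run` mode cheaply: the same plan can be printed or executed.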
32
+ **End-to-end smoke**: scaffold from the actual v7.1.6 benchmark
33
+ spec produces a 100%-correct skeleton in ~50ms (3 dirs + 5
34
+ placeholder files + matching package.json bin/deps/scripts).
35
+
36
+ **Out of scope (deferred to v8):**
37
+
38
+ * Per-stack scaffolding (Python `pyproject.toml`, Go `go.mod`,
39
+ Rust `Cargo.toml`). v7.2.0 ships Node ESM only — covers the
40
+ v7.1.6 benchmark stack and the most common starter case.
41
+ * Running `npm install`. Operator picks the package manager.
42
+
43
+ 11 new tests (4 parser + 2 builder + 5 end-to-end). 1548 → 1559
44
+ CLI tests; tsc clean; build clean. New verb registered in
45
+ `src/cli/index.ts` + listed in `Pipeline:` help group. Version
46
+ 7.1.9 → 7.2.0 (minor bump for new verb surface).
47
+
48
+ ## 7.1.9 (2026-05-10)
49
+
50
+ **v7.1.9 — build fix + Generic-stack next-steps hint.** Two
51
+ micro-fixes from the v7.1.8 benchmark re-run.
52
+
53
+ * **`canonicalize` declared at root** (`package.json`). The CLI's
54
+ `src/dashboard/upload/canonical.ts` (RFC 8785 / JCS parity copy
55
+ of `apps/web/lib/upload/canonical.ts`) imports `canonicalize`
56
+ but the module was only declared in `apps/web/package.json`.
57
+ Root build hit `TS2307: Cannot find module 'canonicalize'` even
58
+ though the package was actually installed via npm hoisting. Now
59
+ declared at root — `npm run build` from a fresh clone is clean.
60
+ * **Generic+low-confidence next-steps hint** (`src/cli/setup.ts`).
61
+ The v7.1.8 benchmark re-run on a truly blank repo reported
62
+ "Detected: Generic (low confidence)" with no actionable next
63
+ step. Setup now surfaces a one-liner:
64
+ `npm init -y` → `npx claude-autopilot setup --force`. Skipped
65
+ silently on high-confidence detections (the common case).
66
+
67
+ 2 new tests (`tests/setup.test.ts`); 1546 → 1548 CLI tests; tsc
68
+ clean; build clean. Version 7.1.8 → 7.1.9.
69
+
70
+ ## 7.1.8 (2026-05-10)
71
+
72
+ **v7.1.8 — blank-repo benchmark re-run on v7.1.7.** Docs-only PR.
73
+ Friction-reduction delta measurement after the v7.1.7 polish PR.
74
+
75
+ **All three v7.1.7 fixes verified end-to-end** on a fresh `git init`
76
+ repo:
77
+
78
+ * `.gitignore` auto-created with `.guardrail-cache/` + `node_modules/`.
79
+ * `CLAUDE.md` auto-scaffolded with detected stack, test command,
80
+ Conventional Commits convention, error class shape, branch
81
+ naming, TODO slots.
82
+ * Deprecation banner deduped per UTC day via
83
+ `~/.claude-autopilot/.deprecation-shown` stamp.
84
+
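The idempotent `.gitignore` handling verified above could look something like the sketch below. It is written as a pure function over file contents so it is easy to test; `mergeGitignoreEntries` is a hypothetical name and is not the package's `ensureGitignoreEntries()` helper.

```typescript
// Hypothetical sketch of an idempotent .gitignore merge.
// `current` is the existing file content, or null when the file is missing.
function mergeGitignoreEntries(current: string | null, required: readonly string[]): string {
  const lines = current === null ? [] : current.split(/\r?\n/);
  // Drop the empty element produced by a trailing newline.
  if (lines.length > 0 && lines[lines.length - 1] === "") lines.pop();
  const present = new Set(lines.map((l) => l.trim()));
  for (const entry of required) {
    if (!present.has(entry)) {
      lines.push(entry); // append only what is missing; existing lines are untouched
      present.add(entry);
    }
  }
  return lines.length > 0 ? lines.join("\n") + "\n" : "";
}
```

Running the function on its own output returns the same string, which is the property the changelog describes as "re-running never duplicates".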
85
+ **Friction score: 3 of 6 v7.1.6 friction points closed; 1 partially
86
+ closed; 2 deferred.** Matches v7.1.6 prediction ("would close ~5 of
87
+ 6") with minor over-promise.
88
+
89
+ **New friction surfaced:**
90
+
91
+ * Stale `dist/` after merge requires `npm run build` for local
92
+ contributors (invisible to `npm install -g` users).
93
+ * Build hits one stale TS error (`canonicalize` not declared at
94
+ root level) — 4 v7.1.7 helpers compiled, setup ran end-to-end,
95
+ filing as separate followup.
96
+ * `Detected: Generic (low confidence)` on truly blank repos —
97
+ honest but suggests next-step "scaffold a `package.json` first
98
+ for higher-confidence detection."
99
+
100
+ **New recommendations:** suggest stack-scaffold step in `setup`
101
+ next-steps when detection is `Generic` (~20min ship);
102
+ `scaffold --from-spec` verb (deferred from v7.1.6, ~1-day);
103
+ per-stack starter `tsconfig.json` / `pyproject.toml` (~2-4hr per
104
+ stack).
105
+
106
+ **Methodology caveat:** Phase B (impl agent) NOT re-run — wall-clock
107
+ impact is downstream and would need another full agent dispatch to
108
+ measure precisely. The friction-point table tells most of the story.
109
+
110
+ Full report at `docs/benchmarks/2026-05-10-blank-repo-v7.1.7.md`.
111
+ No code change; bumping to 7.1.8 to keep CHANGELOG/version line in
112
+ lockstep with master HEAD.
113
+
114
+ ## 7.1.7 (2026-05-10)
115
+
116
+ **v7.1.7 — `setup` verb day-1 polish.** Three fixes from the v7.1.6
117
+ blank-repo benchmark report. Operator-facing improvements; no
118
+ breaking changes; no migration.
119
+
120
+ * **Per-calendar-day deprecation dedup** (`bin/_launcher.js`). The
121
+ v6.3+ stamp was keyed by `process.ppid + tty/pipe` — fine for
122
+ interactive shells, broken for git hooks (fresh shell per hook =
123
+ fresh ppid = stamp re-created every commit, notice printed every
124
+ commit). New stamp at `~/.claude-autopilot/.deprecation-shown`
125
+ contains `YYYY-MM-DD` and dedups by UTC day per machine.
126
+ Override env vars (`CLAUDE_AUTOPILOT_DEPRECATION=always|never`)
127
+ preserved.
128
+ * **Auto-add `node_modules/` + `.guardrail-cache/` to `.gitignore`**
129
+ on `setup` (`src/cli/setup.ts`). New `ensureGitignoreEntries()`
130
+ helper: idempotent (re-running never duplicates), preserves
131
+ existing entries, creates `.gitignore` from scratch if missing.
132
+ * **Auto-scaffold starter `CLAUDE.md`** when one doesn't exist
133
+ (`src/cli/setup.ts`). New `ensureStarterClaudeMd()` helper writes
134
+ ~35 lines covering: detected stack + confidence, test command,
135
+ Conventional Commits convention, error class shape, branch naming,
136
+ TODO slots for "patterns to mimic" + "common pitfalls". Closes
137
+ ~5 of 6 friction points the benchmark agent reported. Never
138
+ overwrites an existing `CLAUDE.md`.
139
+
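The per-UTC-day dedup in the first bullet can be illustrated with a small decision function. This is a sketch under assumptions: `shouldShowDeprecation` and `utcDay` are hypothetical names, the stamp is assumed to hold a bare `YYYY-MM-DD` string, and reading or writing `~/.claude-autopilot/.deprecation-shown` is left to the caller.

```typescript
// Hypothetical sketch; the real launcher logic in bin/_launcher.js may differ.

/** Format a date as YYYY-MM-DD in UTC. */
function utcDay(now: Date): string {
  return now.toISOString().slice(0, 10);
}

/**
 * Decide whether to print the deprecation notice.
 * `stamp` is the stamp file content (null when the file is missing);
 * `override` mirrors the CLAUDE_AUTOPILOT_DEPRECATION env var.
 */
function shouldShowDeprecation(stamp: string | null, now: Date, override?: string): boolean {
  if (override === "always") return true;
  if (override === "never") return false;
  return stamp === null || stamp.trim() !== utcDay(now);
}
```

Because the key is the calendar day rather than `process.ppid`, a fresh shell per git hook no longer resets the dedup.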
140
+ 13 new tests (4 setup + 6 launcher + 3 idempotency / overwrite-safety).
141
+ 1539 → 1546 CLI tests. tsc clean. Version bump 7.1.6 → 7.1.7 to
142
+ keep CHANGELOG/version line in lockstep with master HEAD.
143
+
144
+ ## 7.1.6 (2026-05-09)
145
+
146
+ **v7.1.6 — blank-repo benchmark report.** Docs-only PR. Captures
147
+ the day-1 experience of using `claude-autopilot` on a true `git init`
148
+ repo, end-to-end from "empty directory" to "feature shipped + tests
149
+ passing." Triggered by codex W5 from the autopilot product-direction
150
+ brainstorm.
151
+
152
+ **Headline:** ~17 minutes from `git init` to working MVP (small CLI,
153
+ Node 22 ESM, with a real Anthropic API call). Setup itself is ~6
154
+ seconds. Pre-commit static-rules hook caught accidentally-staged
155
+ secrets on day 1 (real-world value, not theoretical).
156
+
157
+ **Top friction points:** no `CLAUDE.md` scaffolded by `setup`;
158
+ deprecation banner prints on every commit; `.gitignore` doesn't
159
+ auto-add `node_modules/` or `.guardrail-cache/`; no `scaffold
160
+ --from-spec` verb.
161
+
162
+ **Top recommendations:** dedup deprecation banner (~30min ship),
163
+ auto-add cache dirs to `.gitignore` (~10min ship), auto-scaffold
164
+ starter `CLAUDE.md` on `setup` (~2-4hr ship). Fully-autonomous-from-
165
+ blank requires Option C (standalone daemon) work first — flagged as
166
+ v8 dependency.
167
+
168
+ Full report at `docs/benchmarks/2026-05-09-blank-repo.md`. Bumping
169
+ to 7.1.6 to keep CHANGELOG/version line in lockstep with master HEAD.
170
+
171
+ ## 7.1.5 (2026-05-09)
172
+
173
+ **v7.1.5 — change-aware CI matrix.** CI infra optimization;
174
+ no application code change; no test additions.
175
+
176
+ The v7.0+ repo runs 6 GitHub Actions workflows on every PR
177
+ (bin smoke ×6 OS×Node + Test Node 22 + Delegance regression +
178
+ tarball check + apps/web typecheck/build/tests + RLS). Many of
179
+ those are irrelevant to PRs that only touch a different layer
180
+ (apps/web-only PRs don't need bin smoke; CLI-only PRs don't
181
+ need apps/web tests; docs-only PRs don't need anything).
182
+
183
+ Each workflow's `pull_request:` trigger now includes a `paths:`
184
+ filter — GitHub Actions skips the workflow entirely on PRs that
185
+ don't touch any matching file:
186
+
187
+ * `ci.yml` (Test Node 22), `bin-parity.yml` (bin smoke ×6),
188
+ `delegance-regression.yml`: triggered by CLI changes (`src/**`,
189
+ `bin/**`, `tests/**` for ci.yml, `scripts/**`, `presets/**`)
190
+ and conservative shared paths (`tsconfig*`, `package.json`,
191
+ `package-lock.json`, the workflow file itself).
192
+ * `web-tests.yml`: triggered by `apps/**`, `tsconfig*`,
193
+ `package.json`, `package-lock.json`, the workflow file.
194
+ * `db-tests.yml`: triggered by `db/**`, `tests/rls/**`,
195
+ `package.json`, `package-lock.json`, the workflow file.
196
+ * `npm-tarball-check.yml`: triggered by anything that affects
197
+ the published artifact (`package.json`, `.npmignore`,
198
+ `package-lock.json`, CLI source).
199
+
200
+ **Codex pass W4 safety net:** `push:` triggers (master + tag
201
+ pushes) deliberately have NO `paths:` filter. Every master merge
202
+ runs the full matrix, catching anything that slipped past the
203
+ PR-level filter (e.g. a config change in a directory we forgot
204
+ to enumerate). The PR-level filter is a latency optimization,
205
+ not a correctness boundary.
206
+
207
+ **Expected effect:** apps/web-only PRs (Phase 5.7-7.1.4 polish
208
+ shape) drop from ~12-15min CI wall clock to ~5-7min. Docs-only
209
+ PRs become a no-op CI run.
210
+
211
+ No package code change; bumping to 7.1.5 to keep CHANGELOG/
212
+ version-line in lockstep with master HEAD.
213
+
214
+ ## 7.1.4 (2026-05-09)
215
+
216
+ **v7.1.4 — fix recurring PGRST002 RLS workflow flake.** CI infra
217
+ fix; no application code change; no test additions. Phase 5.1, 5.7,
218
+ and 7.1.3 all hit the same intermittent failure in the RLS negative
219
+ tests workflow:
2
220
 
3
- ## v5.3.0 — Deploy phase (in flight, not yet shipped)
221
+ ```
222
+ PGRST002 — Could not query the database for the schema cache. Retrying.
223
+ ```
224
+
225
+ PostgREST caches the database schema asynchronously AFTER
226
+ `supabase db reset` returns. The first SDK queries from the test
227
+ runner often arrive before the cache has finished warming, hard-
228
+ failing instead of waiting.
229
+
230
+ Fix: new "Wait for PostgREST schema cache to warm up" workflow step
231
+ between `Apply migrations` and `Run RLS tests`. Polls
232
+ `GET /rest/v1/` (PostgREST OpenAPI doc) up to 60s; succeeds on the
233
+ first response that parses as JSON with an `info` field. Times out
234
+ with diagnostic body if the cache doesn't warm.
235
+
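The warm-up step is a workflow step, so the real implementation is presumably shell inside `db-tests.yml`. The same polling logic, sketched in TypeScript for illustration only, might look like this. `isSchemaCacheWarm` and `waitForSchemaCache` are hypothetical names, and `probe` stands in for whatever issues `GET /rest/v1/` and returns the response body.

```typescript
// Hypothetical sketch of the "wait for schema cache" poll.

/** True when the body parses as JSON and carries an `info` field. */
function isSchemaCacheWarm(body: string): boolean {
  try {
    const doc: unknown = JSON.parse(body);
    return typeof doc === "object" && doc !== null && "info" in doc;
  } catch {
    return false; // not JSON yet (e.g. an error page while PostgREST starts)
  }
}

/** Poll `probe` until the cache is warm or the deadline passes. */
async function waitForSchemaCache(
  probe: () => Promise<string>,
  timeoutMs = 60_000,
  intervalMs = 1_000,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  let lastBody = "";
  while (Date.now() < deadline) {
    try {
      lastBody = await probe();
      if (isSchemaCacheWarm(lastBody)) return;
    } catch (err) {
      lastBody = String(err); // keep the failure for the timeout diagnostic
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`schema cache did not warm within ${timeoutMs}ms; last body: ${lastBody}`);
}
```

Including the last response body in the timeout error matches the "diagnostic body" behavior the changelog describes.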
236
+ Changes only `.github/workflows/db-tests.yml`. No package code
237
+ change, but bumping to 7.1.4 to keep version-line/CHANGELOG in
238
+ lockstep with master HEAD.
239
+
240
+ ## 7.1.3 (2026-05-09)
241
+
242
+ **v7.1.3 — `/api/health/v7-readiness` deploy-verification endpoint.**
243
+ Hosted product (`apps/web/`) only. Operator-facing improvement; no
244
+ breaking changes; no migration.
245
+
246
+ * New `GET /api/health/v7-readiness` route, gated by
247
+ `Authorization: Bearer ${CRON_SECRET}` (constant-time compare via
248
+ `crypto.timingSafeEqual`).
249
+ * Verifies in one HTTP call:
250
+ - `check_membership_status` RPC is present + executable (closes
251
+ codex PR #141 PR-pass WARNING #3 — the Phase 6 migration must
252
+ be applied before deploying any v7.0+ web image, or every
253
+ org-scoped dashboard request returns `check_failed` within 60s).
254
+ - All 12 required env vars are set (Supabase, Stripe, WorkOS,
255
+ JWT/SSO/cookie secrets meeting ≥32-byte minimums where
256
+ applicable).
257
+ * Response: `200 {ok: true, totalChecks, passed, failed: 0, checks}`
258
+ on full pass; `503 {ok: false, ...}` with per-check
259
+ `{name, status, required, message?}` diagnostic on any required
260
+ failure.
261
+ * Operator runbook updated with `curl -fsSL` example for an
262
+ automated deploy-step gate.
263
+ * 8 new tests in `apps/web/__tests__/api/health/v7-readiness.test.ts`
264
+ covering happy path, missing env, too-short secret, RPC missing,
265
+ three auth-failure modes (no header, wrong secret, malformed
266
+ Bearer), and missing CRON_SECRET → 500.
267
+ * 613 → 621 web tests; 1536 CLI unchanged; tsc clean.
268
+
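A constant-time Bearer check of the kind the first bullet mentions can be sketched with Node's `crypto.timingSafeEqual`. The function name `isAuthorized` is hypothetical, and hashing both values first is one common way to satisfy `timingSafeEqual`'s equal-length requirement; the route's actual approach is not shown in the changelog. The missing-`CRON_SECRET` case (which the tests map to 500) is left to the caller.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Hypothetical sketch of a constant-time Authorization: Bearer check.
function isAuthorized(header: string | undefined, secret: string): boolean {
  const prefix = "Bearer ";
  if (header === undefined || !header.startsWith(prefix)) return false;
  const presented = header.slice(prefix.length);
  // timingSafeEqual throws on unequal lengths, so compare fixed-size digests.
  const a = createHash("sha256").update(presented).digest();
  const b = createHash("sha256").update(secret).digest();
  return timingSafeEqual(a, b);
}
```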
269
+ ## 7.1.2 (2026-05-09)
270
+
271
+ **v7.1.2 — configurable membership-check TTL.** Hosted product
272
+ (`apps/web/`) only. Operator-facing improvement; no breaking changes;
273
+ no migration.
274
+
275
+ * New optional env var `MEMBERSHIP_CHECK_TTL_SECONDS` overrides the
276
+ default 60s `cao_membership_check` cookie TTL. Bounded `[1, 3600]`.
277
+ * Lower TTL = tighter revocation window (≤N seconds for a disabled
278
+ member to see 403 on next dashboard request) at the cost of more
279
+ `check_membership_status` RPC calls per dashboard navigation.
280
+ * Higher TTL = fewer RPC calls but extends the v7.0 documented
281
+ "≤60s revocation latency" guarantee.
282
+ * Invalid values (non-integer, < 1, > 3600) silently fall back to 60
283
+ with a one-shot warn (same pattern as the v7.1.1 PREVIOUS-secret
284
+ validator).
285
+ * 6 new tests in `cookie-hmac.test.ts` cover: default 60 when unset;
286
+ valid integer in range; non-numeric falls back; float falls back;
287
+ out-of-range (< 1, < 0, > 3600) falls back; signed cookie exp
288
+ respects the configured TTL via sign+verify roundtrip.
289
+ * 607 → 613 web tests; 1536 CLI unchanged; tsc clean.
290
+
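The bounded parse with fallback described in these bullets is easy to get subtly wrong (floats, negatives, empty strings), so a small sketch may help. `resolveMembershipTtl` is a hypothetical name, and the one-shot warning on invalid values is omitted here.

```typescript
// Hypothetical sketch of parsing MEMBERSHIP_CHECK_TTL_SECONDS.
const DEFAULT_TTL_SECONDS = 60;
const MIN_TTL_SECONDS = 1;
const MAX_TTL_SECONDS = 3600;

function resolveMembershipTtl(raw: string | undefined): number {
  if (raw === undefined) return DEFAULT_TTL_SECONDS;
  const trimmed = raw.trim();
  // Digits only: rejects floats ("1.5"), negatives ("-1"), and non-numeric input.
  if (!/^\d+$/.test(trimmed)) return DEFAULT_TTL_SECONDS;
  const n = Number(trimmed);
  return n >= MIN_TTL_SECONDS && n <= MAX_TTL_SECONDS ? n : DEFAULT_TTL_SECONDS;
}
```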
291
+ ## 7.1.1 (2026-05-09)
292
+
293
+ **v7.1.1 — dual-secret rotation for `MEMBERSHIP_CHECK_COOKIE_SECRET`.**
294
+ Hosted product (`apps/web/`) only. Operator-facing improvement;
295
+ no breaking changes; no migration; no new tests fail/skip.
296
+
297
+ * New optional env var `MEMBERSHIP_CHECK_COOKIE_SECRET_PREVIOUS`.
298
+ When set, `verifyMembershipCookie()` tries `CURRENT` first; on
299
+ signature mismatch, tries `PREVIOUS`. New cookies always sign
300
+ with `CURRENT`. Closes the v7.0 runbook follow-up where rotating
301
+ the secret invalidated every outstanding cookie at once = a
302
+ thundering-herd of `check_membership_status` RPC calls on every
303
+ active dashboard session.
304
+ * Operator rotation flow (4 steps) documented in `docs/v7/runbook.md`
305
+ + `apps/web/.env.example`.
306
+ * `MEMBERSHIP_CHECK_COOKIE_SECRET_PREVIOUS` validation: same
307
+ ≥32-byte minimum as `CURRENT`. Malformed/too-short `PREVIOUS`
308
+ is ignored with a one-shot warn — does not break the happy path.
309
+ * 5 new tests in `apps/web/__tests__/lib/middleware/cookie-hmac.test.ts`
310
+ cover: PREVIOUS verifies during rotation; new cookies sign with
311
+ CURRENT not PREVIOUS; forged-third-secret fails even with both;
312
+ PREVIOUS unset behaves identically to v7.1.0; PREVIOUS too short
313
+ is ignored without breaking CURRENT.
314
+ * 602 → 607 web tests; 1536 CLI unchanged; tsc clean.
315
+
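The CURRENT-then-PREVIOUS verification order can be sketched as below. This is an illustration only: the HMAC-SHA256 hex signature format, the payload shape, and the names `sign`, `matches`, and `verifyWithRotation` are assumptions, not the signature of the real `verifyMembershipCookie()`.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical sketch of dual-secret verification during rotation.
const MIN_SECRET_BYTES = 32;

/** New cookies are always signed with the CURRENT secret. */
function sign(payload: string, secret: string): string {
  return createHmac("sha256", secret).update(payload).digest("hex");
}

function matches(payload: string, signature: string, secret: string): boolean {
  const expected = Buffer.from(sign(payload, secret), "hex");
  const actual = Buffer.from(signature, "hex");
  // Length guard first: timingSafeEqual throws on unequal lengths.
  return expected.length === actual.length && timingSafeEqual(expected, actual);
}

/** Try CURRENT first; fall back to PREVIOUS only if it is set and long enough. */
function verifyWithRotation(
  payload: string,
  signature: string,
  current: string,
  previous?: string,
): boolean {
  if (matches(payload, signature, current)) return true;
  if (previous === undefined || Buffer.byteLength(previous) < MIN_SECRET_BYTES) return false;
  return matches(payload, signature, previous);
}
```

With this shape, cookies signed before the rotation keep verifying until they expire, which avoids invalidating every outstanding cookie at once.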
316
+ ## 7.1.0 (2026-05-09)
317
+
318
+ **v7.1 — symmetric ingest revocation closure.** Hosted product
319
+ (`apps/web/`) only. Closes the JWT-authenticated ingest gap that v7.0
320
+ Phase 6 explicitly deferred: collapses the per-request revocation
321
+ window from ≤15min (the JWT TTL) to **≤1 request** for org-scoped runs.
322
+
323
+ ### apps/web — JWT-authenticated ingest membership re-check
324
+
325
+ - New helper `assertActiveMembership(claims)` in
326
+ `apps/web/lib/upload/membership-recheck.ts` — calls the existing
327
+ Phase 6 `check_membership_status` RPC and maps statuses to typed
328
+ errors. Personal runs short-circuit via `!claims.org_id`. Authority
329
+ is `claims.org_id`; the new `mint_status` claim is observability-
330
+ only (codex pass-1 CRITICAL #2 — closed bypass where a v7.0 token
331
+ could skip the check).
332
+ - New orchestrator `verifyTokenAndAssertRunMembership(token, runId,
333
+ supabase)` in `apps/web/lib/upload/auth.ts` — single chokepoint that
334
+ every JWT-authenticated ingest route calls. Combines (1) JWT shape
335
+ + signature verify, (2) JWT.run_id ↔ route runId consistency,
336
+ (3) persisted runs lookup, (4) JWT.org_id ↔ run.organization_id
337
+ consistency (closes cross-org JWT replay AND personal-shortcut
338
+ bypass — codex pass-3 CRITICAL #2), and (5) per-request membership
339
+ re-check.
340
+ - `PUT /api/runs/:runId/events/:seq` and `POST /api/runs/:runId/finalize`
341
+ both call the orchestrator before any side-effect RPC / Storage
342
+ write. Disabled / inactive / no-membership returns 403; transient
343
+ RPC failure returns retryable 503; opaque 404 for run mismatches
344
+ (no enumeration leakage).
345
+ - `POST /api/upload-session` does its own pre-mint
346
+ `check_membership_status` RPC for org-scoped runs. Non-active
347
+ members get 403 `member_not_active` + `audit_events` row with
348
+ `action: 'ingest.mint_refused'`. No upload session created on
349
+ refusal. RPC failure → 503 (retryable parity with event-write/
350
+ finalize, codex pass-2 WARNING #2).
351
+ - JWT shape: `UploadTokenClaims.org_id` is now `string | null` (verify
352
+ normalizes wire-format `''` → `null`); new optional
353
+ `mint_status: 'active' | 'personal'` claim. `MintInput.mintStatus`
354
+ is required.
355
+ - `verifyUploadToken()` is preserved for the JWT-shape unit tests but
356
+ marked `@deprecated`. Routes under `app/api/runs/**` are blocked
357
+ from importing it directly via ESLint `no-restricted-imports`
358
+ (`apps/web/.eslintrc.json`). Defense-in-depth chokepoint
359
+ (codex pass-3 WARNING #5).
360
+
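The ordering of checks in the orchestrator, and how outcomes map to HTTP statuses, can be sketched as a pure decision function. Everything here is hypothetical: the names, the simplified claim and row shapes, and the synchronous membership lookup. Step (1), JWT shape and signature verification, is assumed to have already succeeded, and mapping an org mismatch to the same opaque 404 as a run mismatch is an assumption.

```typescript
// Hypothetical sketch; the real verifyTokenAndAssertRunMembership() is async
// and talks to Supabase. Shapes are simplified for illustration.
type MembershipResult = "active" | "disabled" | "inactive" | "no_row" | "rpc_error";

interface Claims { run_id: string; org_id: string | null; }
interface RunRow { id: string; organization_id: string | null; }
type Decision = { ok: true } | { ok: false; http: 403 | 404 | 503 };

function decideIngest(
  claims: Claims,
  routeRunId: string,
  run: RunRow | null,
  checkMembership: (orgId: string) => MembershipResult,
): Decision {
  // (2) Token run_id must match the route's runId.
  if (claims.run_id !== routeRunId) return { ok: false, http: 404 };
  // (3) The run must exist in persisted state.
  if (run === null) return { ok: false, http: 404 };
  // (4) Token org must match the run's org; also blocks a null-org token
  //     from taking the personal shortcut on an org-scoped run.
  if (claims.org_id !== run.organization_id) return { ok: false, http: 404 };
  // Personal runs have no membership to re-check.
  if (claims.org_id === null) return { ok: true };
  // (5) Per-request membership re-check.
  const status = checkMembership(claims.org_id);
  if (status === "rpc_error") return { ok: false, http: 503 }; // retryable
  return status === "active" ? { ok: true } : { ok: false, http: 403 };
}
```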
361
+ ### Tests
362
+
363
+ - 32 new/modified web tests (566 → 598). Coverage: mint-time
364
+ membership snapshot (4), event-write re-check (8 — incl. ordering
365
+ spy + v7.0-shape regression), finalize re-check (4), helper unit
366
+ (10 — status enum + RPC error + personal shortcut + v7.0
367
+ back-compat), end-to-end disable-mid-session (1), identity invariant
368
+ (3), JWT shape (4 modified).
369
+ - `__tests__/_helpers/supabase-stub.ts` adds a
370
+ `check_membership_status` RPC handler that reads from the seeded
371
+ `memberships` table.
372
+
373
+ ### Documentation
374
+
375
+ - `docs/v7/breaking-changes.md` — appended "v7.0 → v7.1" section
376
+ covering the rollout (no coordinated cutover; in-flight org-scoped
377
+ tokens enforce immediately).
378
+ - `apps/web/lib/dashboard/auth.ts` — extended the API-key audit
379
+ comment block with the new ingest-API JWT caller list and the
380
+ invariant.
381
+
382
+ ### No SQL migration
383
+
384
+ Phase 6's `check_membership_status` RPC is reused verbatim. v7.1 ships
385
+ pure TypeScript + a single test-stub change.
386
+
387
+ ## 7.0.0 (2026-05-09)
388
+
389
+ **v7.0 — hosted product MVP cutover.** First major bump since v6.0
390
+ (2026-04-22). Drops the engine-off code path, ships the autopilot.dev
391
+ hosted dashboard MVP, closes the last operational gap in dashboard
392
+ session revocation, and bumps the run-state schema_version to mark the
393
+ v7 era.
394
+
395
+ ### Breaking changes (read this first)
396
+
397
+ See [docs/v7/breaking-changes.md](docs/v7/breaking-changes.md) for the
398
+ full migration checklist. The shortlist:
399
+
400
+ - **`--no-engine` removed.** Exits 1 with `invalid_config` if passed.
401
+ The engine is unconditionally on.
402
+ - **`CLAUDE_AUTOPILOT_ENGINE=off` removed (soft).** The env value is
403
+ ignored — engine still runs — but a one-shot stderr deprecation
404
+ banner fires + a `run.warning` event with code `engine_off_removed`
405
+ is emitted into the durable run log. Softer than `--no-engine`
406
+ because env vars in CI are sticky.
407
+ - **`ENGINE_DEFAULT_V6_0` and `ENGINE_DEFAULT_V6_1` exports removed**
408
+ from `src/core/run-state/resolve-engine.ts`. Direct importers must
409
+ replace with literal `true`. `resolveEngineEnabled()` itself is
410
+ preserved for source compatibility but always returns
411
+ `{enabled: true, source: 'default'}`.
412
+ - **`runEngineOff` callback on `runPhaseWithLifecycle` is preserved as
413
+ optional**, but the helper NEVER invokes it in v7.0. New call sites
414
+ should omit it.
415
+ - **`RUN_STATE_SCHEMA_VERSION` bumped 1 → 2.** v6.x runs are still
416
+ readable on v7 (`MIN_SUPPORTED` stays at 1). v6 binaries reading v7
417
+ runs hit a `corrupted_state` error with a "downgrade resume is not
418
+ supported" hint + `[1..1]` range.
419
+ - **`--engine` becomes a no-op shim** with one-shot per-process
420
+ stderr deprecation banner. Flag preserved so existing scripts don't
421
+ break; remove at your leisure (slated for v8).
422
+
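The schema-version compatibility rule in the breaking-changes list (v7 reads versions 1 through 2; a v6 binary supports only `[1..1]`) can be sketched as a range check. The function name and message wording are hypothetical; only the `corrupted_state` code, the downgrade hint, and the range notation come from the changelog.

```typescript
// Hypothetical sketch of a run-state schema_version range check.
type VersionCheck =
  | { ok: true }
  | { ok: false; code: "corrupted_state"; message: string };

function checkSchemaVersion(found: number, min: number, max: number): VersionCheck {
  if (Number.isInteger(found) && found >= min && found <= max) return { ok: true };
  // A version newer than this binary understands means a downgrade resume.
  const hint = found > max ? " (downgrade resume is not supported)" : "";
  return {
    ok: false,
    code: "corrupted_state",
    message: `schema_version ${found} outside supported range [${min}..${max}]${hint}`,
  };
}
```

Under this sketch a v7 binary would call `checkSchemaVersion(v, 1, 2)` and a v6 binary `checkSchemaVersion(v, 1, 1)`.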
423
+ ### apps/web — real-time membership revocation
424
+
425
+ - New middleware extension on `/dashboard/**` and `/api/dashboard/**`.
426
+ Verifies the `cao_active_org` cookie + the HMAC-signed
427
+ `cao_membership_check` cookie cache; on miss/expired/wrong-identity,
428
+ calls the new `check_membership_status(p_org_id, p_user_id)` RPC
429
+ (1.5s timeout, fail-closed on error).
430
+ - Worst-case revocation window collapses from ≤1h (= access-token
431
+ expiry, the v6 baseline) to ≤60s (= cookie cache TTL).
432
+ - New env var: `MEMBERSHIP_CHECK_COOKIE_SECRET` (≥32 bytes;
433
+ `openssl rand -hex 32`). Lazy/runtime validation — `next build` in
434
+ CI without the secret won't crash; middleware fails closed at
435
+ request time if missing.
436
+ - Middleware runtime explicitly set to `nodejs` (was Edge default).
437
+ Required for `node:crypto` HMAC + `crypto.timingSafeEqual`.
438
+ - New page: `/access-revoked?reason=<code>` (Server Component, NOT
439
+ auth-gated, does NOT auto-forward authenticated users to avoid
440
+ redirect loops). Renders one of four reasons with a Sign-out form.
441
+ - Status → reason mapping table is the single source of truth (codex
442
+ pass-3 WARNING #5):
443
+ - `disabled` → `member_disabled`
444
+ - `inactive` / `invite_pending` → `member_inactive`
445
+ - `no_row` → `no_membership`
446
+ - RPC error / timeout → `check_failed`
447
+ - New SQL migration: `data/deltas/20260509200000_phase6_check_membership_rpc.sql`.
448
+ `SECURITY INVOKER` (NOT DEFINER per codex pass-2 WARNING #5 +
449
+ pass-3 WARNING #2 — `service_role` bypasses RLS already, so DEFINER
450
+ would only widen blast radius). REVOKE'd from PUBLIC/anon/authenticated;
451
+ GRANT EXECUTE to `service_role` only.
452
+
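The status-to-reason table above translates directly into an exhaustive switch. The type and function names are hypothetical, as is the `"active"` status (assumed to be the non-revoked case) and the `"error"` sentinel standing in for an RPC error or timeout.

```typescript
// Hypothetical sketch of the status -> reason mapping.
type MembershipStatus = "active" | "disabled" | "inactive" | "invite_pending" | "no_row";
type RevokeReason = "member_disabled" | "member_inactive" | "no_membership" | "check_failed";

/** Returns null when access is allowed, otherwise the /access-revoked reason code. */
function reasonFor(status: MembershipStatus | "error"): RevokeReason | null {
  switch (status) {
    case "active":
      return null;
    case "disabled":
      return "member_disabled";
    case "inactive":
    case "invite_pending":
      return "member_inactive";
    case "no_row":
      return "no_membership";
    case "error":
      return "check_failed"; // fail closed on RPC error or timeout
  }
}
```

Keeping the mapping in one function is one way to honor the "single source of truth" requirement: middleware and the `/access-revoked` page can both import it.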
453
+ ### Deferred to v7.1
454
+
455
+ - `MEMBERSHIP_CHECK_TTL_SECONDS` env var to let enterprise customers
456
+ tighten the 60s cache window.
457
+ - Server-side cache invalidation on `change_member_role` /
458
+ `disable_member` (would tighten role-change visibility from ≤60s to
459
+ immediate).
460
+ - Phase 2.2 ingest API JWT mint embeds `mint_membership_status` so
461
+ finalize/event endpoints can refuse disabled members within the
462
+ ≤30min JWT TTL.
463
+
464
+ ### Documentation
465
+
466
+ - New: `docs/v7/breaking-changes.md` — explicit v6 → v7 migration
467
+ checklist.
468
+ - New: `docs/v7/runbook.md` — production deployment runbook for the
469
+ hosted product (Vercel env vars grouped by purpose, WorkOS dashboard
470
+ hookups, Stripe products + webhook config, cron secret rotation,
471
+ first-deploy checklist).
472
+ - README — new "Hosted product (v7)" section pointing at autopilot.dev,
473
+ install snippet updated to `npm install -g
474
+ @delegance/claude-autopilot@latest`.
475
+ - `docs/v6/migration-guide.md` — appended v6.2.x → v7.0 section.
476
+
477
+ ### CI / publishing
478
+
479
+ - `.github/workflows/ci.yml` now tags pushes matching
480
+ `v[0-9]+.[0-9]+.[0-9]+` (no suffix) with `--tag latest`; everything
481
+ else stays `--tag next`. `package.json` `publishConfig.tag` stays at
482
+ `next` as a hand-publish fallback only — the workflow is the source
483
+ of truth.
484
+
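The tag-to-dist-tag rule can be sketched as a small function. The name `distTagFor` is hypothetical, and the regex escapes the dots and anchors both ends, which is an assumption about the intent of the workflow pattern quoted above.

```typescript
type DistTag = "latest" | "next";

// Plain vMAJOR.MINOR.PATCH with no suffix publishes as "latest";
// pre-releases and anything else stay on "next".
function distTagFor(gitTag: string): DistTag {
  return /^v\d+\.\d+\.\d+$/.test(gitTag) ? "latest" : "next";
}
```

Under this rule `v7.2.0` would map to `latest`, while `v6.3.0-pre.13` would stay on `next`.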
485
+ ### Phase rollup (v7.0 cycle)
486
+
487
+ - **Phase 1** (schema/RLS) — multi-tenant Postgres + RLS policies for
488
+ the hosted product.
489
+ - **Phase 2.1** (Next.js scaffold) — `apps/web/` workspace, Vercel
490
+ deploy.
491
+ - **Phase 2.2** (ingest API) — signed-session JWT pipeline for
492
+ CLI → dashboard run uploads.
493
+ - **Phase 2.3** (CLI dashboard verbs) — `dashboard {login,logout,
494
+ status,upload}` + cli-auth loopback OAuth.
495
+ - **Phase 3** (Stripe) — entitlements, tiered pricing, webhook.
496
+ - **Phase 4** (dashboard UI + cli-auth hardening) — homepage, auth,
497
+ CSP-locked /cli-auth.
498
+ - **Phases 5.1-5.4** (org admin / WorkOS setup) — members, audit, cost,
499
+ per-tenant SSO connection management.
500
+ - **Phase 5.6** (WorkOS sign-in) — domain verification, SSO
501
+ enforcement chokepoint.
502
+ - **Phase 5.7** (admin lifecycle) — disable_member, sso_disconnect,
503
+ enable_member, last-owner race protection.
504
+ - **Phase 5.8** (lifecycle gap closure) — disabled-API-key
505
+ authorization fix + Vercel cron for cleanup_expired_sso_states.
506
+ - **Phase 6** (this release) — engine-off removal, schema bump, real-
507
+ time membership revocation, runbook, breaking-changes docs.
508
+
509
+ ### Tests
510
+
511
+ - 1500+ existing CLI tests pass (engine-off tests collapsed to
512
+ always-on; net delta near zero).
513
+ - 510 → 566 web tests (+56 across cookie-hmac, check-membership, RPC
514
+ privilege grep, middleware revocation surface, response composition,
515
+ matcher, integration).
516
+ - tsc clean across both `@delegance/claude-autopilot` and
517
+ `@delegance/claude-autopilot-web`.
518
+
519
+ ## 6.3.0-pre.13 (2026-05-09)
520
+
521
+ **v7.0 Phase 5.8 — Lifecycle gap closure.** Closes the two known gaps from Phase 5.7:
522
+
523
+ 1. **Disabled-API-key authorization fix.** The Phase 2.2 `upload-session` and Phase 4 `artifact` routes had `let allowed = run.user_id === auth.userId` as the first authorization check. This allowed a member who got disabled AFTER creating an org-scoped run to keep uploading via their API key. Both routes now ALWAYS require active membership when `run.organization_id` is set, regardless of ownership. Personal (un-org-scoped) runs still use the ownership check. Regression test (`__tests__/api/dashboard/runs/disabled-api-key.test.ts`, 4 cases) locks this in.
524
+ 2. **Vercel cron wiring for `cleanup_expired_sso_states` RPC.** New `GET /api/cron/cleanup-expired-sso-state` route (Vercel cron-secret-gated; rejects any caller without `Authorization: Bearer ${CRON_SECRET}`). Schedule `0 3 * * *` (daily 03:00 UTC) added to `vercel.json`. Calls the Phase 5.7 RPC with default args (24h state age, 30d event age). 4-test coverage (auth happy/fail paths + missing env).
525
+
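A rough sketch of the corrected authorization order for the upload and artifact routes. The `Run` and `Auth` shapes and the `isActiveMember` callback are hypothetical, and treating active membership as sufficient for org-scoped runs is an assumption; the changelog only states that it is always required.

```typescript
interface Run { user_id: string; organization_id: string | null }
interface Auth { userId: string }

// Org-scoped runs: ownership alone is never enough, membership is always consulted.
// Personal runs: fall back to the plain ownership check.
function canAccessRun(
  run: Run,
  auth: Auth,
  isActiveMember: (orgId: string, userId: string) => boolean,
): boolean {
  if (run.organization_id !== null) {
    return isActiveMember(run.organization_id, auth.userId);
  }
  return run.user_id === auth.userId;
}
```

The key difference from the pre-fix logic is that `run.user_id === auth.userId` no longer short-circuits the membership check.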
526
+ New env: `CRON_SECRET` (Vercel sets automatically on production cron-attached projects; local-dev override via `.env.local`). Documented in `.env.example`.
527
+
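The cron gate might look roughly like the following. The function name is hypothetical and the constant-time comparison is an added precaution, not something the notes above confirm; the fail-closed behavior on a missing `CRON_SECRET` mirrors the "missing env" test case.

```typescript
import { timingSafeEqual } from "node:crypto";

// authorization: raw value of the Authorization header, or null when absent.
function isAuthorizedCron(authorization: string | null, cronSecret: string | undefined): boolean {
  if (!cronSecret) return false;            // missing env: fail closed
  if (authorization === null) return false; // no header: reject
  const expected = Buffer.from(`Bearer ${cronSecret}`);
  const given = Buffer.from(authorization);
  return given.length === expected.length && timingSafeEqual(given, expected);
}
```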
528
+ Tests: 502 → 510 web. tsc clean.
529
+
530
+ ## 6.3.0-pre.12 (2026-05-09)
531
+
532
+ **v7.0 Phase 5.7 — Admin lifecycle controls + session revocation.** Closes the lifecycle/revocation gap that Phases 5.4 and 5.6 explicitly deferred.
533
+
534
+ Three lifecycle controls:
535
+
536
+ 1. **Admin disable-user** — `POST /api/dashboard/orgs/:orgId/members/:userId/disable` flips `memberships.status='disabled'`, captures `disabled_at`/`disabled_by`, deletes `auth.refresh_tokens` for the user. Existing access tokens expire ≤1h (Supabase default; documented in spec). Idempotent on already-disabled (returns `noop:true`, no duplicate audit, no duplicate revocation). Owner-protection (admin cannot disable owner) + last-owner guard.
537
+ 2. **SSO disconnect cascade** — `apply_workos_event(connection.deleted)` set-based DELETE of refresh tokens for org members (status active OR disabled per codex plan-pass WARNING #1) with verified-domain emails. Audit metadata captures `cascadeRevokedUserCount` + `cascadeRevokedTokenCount` (no user IDs per plan-pass WARNING #5).
538
+ 3. **`cleanup_expired_sso_states` RPC** — service-role only, called via `scripts/cleanup-expired-sso-state.ts` (no HTTP route per codex pass-1 CRITICAL #3). Phase 6 wires a cron.
539
+
540
+ Migration `data/deltas/20260509140000_phase5_7_lifecycle.sql`:
541
+ - ALTER `memberships.status` CHECK extended with `'disabled'` + `disabled_at`/`disabled_by` columns.
542
+ - 4 new SECURITY DEFINER RPCs (REVOKE FROM PUBLIC,anon,authenticated; GRANT TO service_role): `revoke_user_sessions`, `disable_member`, `enable_member`, `cleanup_expired_sso_states`.
543
+ - 2 RPC REPLACEs: `record_workos_sign_in` now refuses `member_disabled` / `member_inactive` / `invite_pending` (codex pass-2 WARNING #1); `apply_workos_event` adds set-based cascade DELETE on `connection.deleted`.
544
+
545
+ Surfaces:
546
+ - `POST /api/dashboard/orgs/:orgId/members/:userId/disable` (admin/owner-gated).
547
+ - `POST /api/dashboard/orgs/:orgId/members/:userId/enable` (admin/owner-gated, symmetric owner protection — only owners can re-enable owners per pass-2 WARNING #3).
548
+ - `GET /api/auth/sso/callback` modified to redirect 302 → `/login/sso?reason={member_disabled|member_inactive|invite_pending}` instead of returning 403 JSON.
549
+
550
+ `/login/sso` page renders 3 new banner reasons. `lib/dashboard/membership-guard.ts` MAP gains 10 new error codes. `package.json` 6.3.0-pre.11 → 6.3.0-pre.12.
551
+
552
+ Tests: 6 new test files (49 tests). disable.test.ts (11), enable.test.ts (4), webhook-cascade.test.ts (5), sso-signin-phase5-7.test.ts (4), phase5-7-privilege.test.ts (16 grep assertions), cleanup-expired-sso-state.test.ts (4), disabled-user-jwt.test.ts (4 — codex plan-pass CRITICAL #2 regression: proves disabled member with still-valid JWT can't access dashboard routes via 4 representative paths). 451 → 500 web tests. tsc clean.
553
+
554
+ **Known gaps (Phase 5.8):**
555
+ - API keys (Phase 2.3) are user-scoped not org-scoped; disabling membership in org A doesn't auto-revoke. Phase 5.8 will add a membership-active check in the API-key auth helper.
556
+ - Access-token expiry is the upper bound on revocation latency (≤1h Supabase default). Real-time revocation requires a request-time denylist + middleware (Phase 6).
557
+ - Cleanup script not yet cron-scheduled (Phase 6).
558
+
559
+ **Codex passes folded:** spec pass-1 (3C+5W+2N), pass-2 (1C+6W), plan-pass (2C+6W+2N). Highlights: dropped global API-key revocation due to cross-tenant blast (gap explicitly documented + deferred); cascade scope includes `'disabled'` per plan WARNING #1; audit metadata drops user IDs sample per plan WARNING #5; explicit disabled-user-JWT regression test proves spec's enforcement-audit table is correct.
560
+
561
+ ## 6.3.0-pre.11 (2026-05-09)
562
+
563
+ **v7.0 Phase 5.6 — WorkOS SSO sign-in flow.** End-to-end SSO sign-in built on the Phase 5.4 foundation. Three sub-features that ship together (any subset is unusable):
564
+
565
+ - **Domain claim with DNS TXT challenge.** Admin-gated `POST/DELETE /api/dashboard/orgs/:orgId/sso/domains` + `POST .../verify`. Codex pass-1 CRITICAL #1 — `ever_verified` flag + unique partial index on `(lower(domain)) WHERE ever_verified=TRUE` blocks revoke-then-takeover by another org.
566
+ - **Sign-in flow.** Public `POST /api/auth/sso/start` (email-only — `orgId`-mode removed for anti-enumeration per codex pass-2 WARNING #8) → `GET /api/auth/sso/callback`. State binding (codex pass-2 CRITICAL #2): single canonical protocol — cookie holds HMAC-signed `{stateId, nonce}`, WorkOS state param = stateId only, server-stored `sso_authentication_states` row + atomic `consume_sso_authentication_state` RPC validates `(stateId, sha256(nonce))` + workos org/connection match. Session minted via admin-mediated magic link (codex pass-1 CRITICAL #4 — `verifyOtp` uses `token_hash` not `token`); session-user-mismatch verification revokes + audits + 500.
567
+ - **`sso_required` toggle.** Owner-only `PATCH /api/dashboard/orgs/:orgId/sso/required`. Asymmetric guard (codex pass-1 WARNING #7): turning OFF always allowed; turning ON requires active SSO. UI banner per codex pass-2 NOTE #2 explains the asymmetric state.
568
+
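The state-binding protocol in the sign-in bullet can be illustrated with an in-memory stand-in for the `sso_authentication_states` table. Everything here (function names, cookie layout, the `Map`) is a simplified assumption; the real flow also checks the WorkOS org/connection match and performs the consume step atomically in an RPC.

```typescript
import { createHash, createHmac, randomBytes, timingSafeEqual } from "node:crypto";

interface StoredState { nonceHash: string; consumed: boolean }
const states = new Map<string, StoredState>(); // stand-in for the server-side table

const sha256Hex = (s: string): string => createHash("sha256").update(s).digest("hex");
const mac = (secret: string, data: string): string =>
  createHmac("sha256", secret).update(data).digest("base64url");

// Start: the cookie carries signed {stateId, nonce}; the IdP only ever sees stateId.
function startSso(secret: string): { cookie: string; stateParam: string } {
  const stateId = randomBytes(16).toString("hex");
  const nonce = randomBytes(16).toString("hex");
  states.set(stateId, { nonceHash: sha256Hex(nonce), consumed: false });
  const payload = Buffer.from(JSON.stringify({ stateId, nonce })).toString("base64url");
  return { cookie: `${payload}.${mac(secret, payload)}`, stateParam: stateId };
}

// Callback: verify the cookie signature, bind it to the returned state param,
// then consume the stored row exactly once.
function finishSso(secret: string, cookie: string, stateParam: string): boolean {
  const dot = cookie.indexOf(".");
  if (dot <= 0) return false;
  const payload = cookie.slice(0, dot);
  const given = Buffer.from(cookie.slice(dot + 1));
  const expected = Buffer.from(mac(secret, payload));
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return false;

  let claims: { stateId?: unknown; nonce?: unknown };
  try {
    claims = JSON.parse(Buffer.from(payload, "base64url").toString("utf8"));
  } catch {
    return false;
  }
  if (typeof claims.stateId !== "string" || typeof claims.nonce !== "string") return false;
  if (claims.stateId !== stateParam) return false;

  const row = states.get(claims.stateId);
  if (!row || row.consumed || row.nonceHash !== sha256Hex(claims.nonce)) return false;
  row.consumed = true; // single use: a replayed callback fails from here on
  return true;
}
```

Storing only `sha256(nonce)` server-side means a leaked state row alone is not enough to complete a sign-in; the browser cookie is still needed.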
569
+ Single chokepoint enforcement: `enforceSsoRequired()` helper called from `/api/auth/callback` after every Google/magic-link `exchangeCodeForSession`. Sign-in surface registry table in spec documents the auth boundary.
570
+
571
+ Identity link (codex pass-1 WARNING #6): `workos_user_identities` table preserves `(workos_user_id, workos_organization_id) → user_id` mapping so future sign-ins re-use the same Supabase user even if IdP email changes. Magic link minted with the linked Supabase user's CURRENT email (looked up via `auth.admin.getUserById`), not the WorkOS profile email.
572
+
573
+ Migration `data/deltas/20260509120000_phase5_6_workos_signin.sql`:
574
+ - ALTER `organization_settings` ADD `sso_required BOOLEAN DEFAULT FALSE`.
575
+ - 3 new tables (`organization_domain_claims`, `sso_authentication_states`, `workos_user_identities`) with RLS + service-role grants.
576
+ - 6 SECURITY DEFINER RPCs: `claim_domain`, `mark_domain_verified`, `revoke_domain_claim`, `set_sso_required`, `consume_sso_authentication_state` (atomic UPDATE...RETURNING per codex plan-pass WARNING #5), `record_workos_sign_in` (verified-domain match required per codex pass-1 CRITICAL #3). All REVOKE FROM PUBLIC,anon,authenticated; GRANT TO service_role.
577
+
578
+ New deps: `tldts` (maintained PSL package per codex pass-1 NOTE #1).
579
+ New env vars: `SSO_STATE_SIGNING_SECRET` (≥32 bytes, module-load validation per codex plan-pass WARNING #4), `WORKOS_CLIENT_ID` (required by `workos.sso.getAuthorizationUrl`).
580
+
581
+ Helpers:
582
+ - `lib/dns/normalize-domain.ts` — `normalizeDomain` + `normalizeEmailDomain` (IDN, public-suffix-aware) used by every domain-touching surface.
583
+ - `lib/dns/verify-txt.ts` — `Promise.race`-bounded TXT lookup (codex pass-2 WARNING #4 — `node:dns/promises.resolveTxt` doesn't honor AbortSignal).
584
+ - `lib/auth/enforce-sso-required.ts` — sign-in surface chokepoint.
585
+ - `lib/workos/sign-in.ts` — `getSsoStateSigningSecret` (length-validated singleton), `signStateCookie` / `parseStateCookie` (HMAC), `buildAuthorizeUrl` (passes clientId per codex plan-pass CRITICAL #3).
586
+ - `lib/dashboard/membership-guard.ts` MAP gains 13 new error codes.
587
+
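The `Promise.race` bounding mentioned for `lib/dns/verify-txt.ts` follows a common pattern, sketched here with a hypothetical `withTimeout` helper. This is a generic illustration, not the package's actual code.

```typescript
// Rejects if `work` has not settled within `ms`; the underlying operation is
// not cancelled, it is simply no longer awaited.
function withTimeout<T>(work: Promise<T>, ms: number, label = "operation"): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  return Promise.race([work, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer); // avoid keeping the event loop alive
  });
}
```

A TXT lookup could then be bounded as `withTimeout(resolveTxt(host), 3000, "TXT lookup")`, with `resolveTxt` imported from `node:dns/promises`. The 3000 ms figure is only an example value.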
588
+ UI:
589
+ - `/login/sso` page + `<SsoSignInForm>` client component.
590
+ - `<SsoDomainsCard>` + `<SsoRequiredToggle>` embedded in admin SSO page (toggle renders even when SSO inactive per codex pass-1 WARNING #7).
591
+
592
+ Tests: 5 new test files (54 tests). domains.test.ts (11), required.test.ts (4), start.test.ts (5), callback.test.ts (10), sso-signin-privilege.test.ts (13), normalize-domain.test.ts (19), verify-txt.test.ts (6), enforce-sso-required.test.ts (7), sign-in.test.ts (11). Stub extensions for 7 new RPCs (`claim_domain`, `mark_domain_verified`, `revoke_domain_claim`, `set_sso_required`, `consume_sso_authentication_state`, `record_workos_sign_in`, `audit_append`) + 3 new tables + `auth.admin.{getUserById,createUser,generateLink,signOut}` + `auth.verifyOtp` mocks.
593
+
594
+ ## 6.3.0-pre.10 (2026-05-08)
595
+
596
+ **v7.0 Phase 5.4 — WorkOS SSO setup.** Foundational SSO wiring: server-owned WorkOS organization correlation, admin-gated portal link, signature-verified lifecycle webhook, owner-gated disconnect.
597
+
598
+ New env vars: `WORKOS_API_KEY`, `WORKOS_WEBHOOK_SECRET`.
599
+
600
+ Migration `data/deltas/20260508180000_phase5_4_workos_setup.sql`:
601
+ - ALTER `organization_settings` adds 7 SSO columns (workos_organization_id, workos_connection_id, sso_connection_status, sso_connected_at, sso_disabled_at, sso_last_workos_event_at, sso_last_workos_event_id) + unique partial indexes on workos_organization_id and workos_connection_id.
602
+ - New `processed_workos_events` ledger with claim/lease/complete columns (status, processing_started_at, locked_until, attempt_count) — enables idempotent webhook retry.
603
+ - Three SECURITY DEFINER RPCs (REVOKE FROM PUBLIC,anon,authenticated; GRANT service_role): `record_sso_setup_initiated` (admin-gated, raises `workos_org_already_bound` if a different active WorkOS org would be swapped), `apply_workos_event` (claim/lease/complete + lifecycle ordering via sso_last_workos_event_at + state transition + audit append in one txn — connection.deleted always wins over older updated), `disable_sso_connection` (owner-only soft-disable).
604
+
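To make the claim/lease/complete idea concrete, here is an in-memory sketch of the ledger semantics. The class and method names are hypothetical; the real ledger is a Postgres table driven by `apply_workos_event`, where the claim step would need to be a single atomic statement.

```typescript
interface LedgerRow {
  status: "processing" | "completed";
  lockedUntil: number;   // lease expiry, ms since epoch
  attemptCount: number;
}

class EventLedger {
  private rows = new Map<string, LedgerRow>();
  private leaseMs: number;

  constructor(leaseMs: number) {
    this.leaseMs = leaseMs;
  }

  // Returns true when the caller now owns the event and should process it.
  claim(eventId: string, nowMs: number): boolean {
    const row = this.rows.get(eventId);
    if (!row) {
      this.rows.set(eventId, { status: "processing", lockedUntil: nowMs + this.leaseMs, attemptCount: 1 });
      return true;
    }
    if (row.status === "completed") return false; // duplicate delivery: no-op
    if (row.lockedUntil > nowMs) return false;    // another worker holds a live lease
    row.lockedUntil = nowMs + this.leaseMs;       // stale lease: reclaim and retry
    row.attemptCount += 1;
    return true;
  }

  complete(eventId: string): void {
    const row = this.rows.get(eventId);
    if (row) row.status = "completed";
  }

  attempts(eventId: string): number {
    return this.rows.get(eventId)?.attemptCount ?? 0;
  }
}
```

A webhook retry that arrives after a crashed first attempt reclaims the stale lease, while a retry that arrives after completion is acknowledged without reprocessing.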
605
+ Surfaces:
606
+ - `POST /api/dashboard/orgs/:orgId/sso/setup` — 6-step admin-gated portal-link sequence. Server-creates the WorkOS org via `externalId=orgId` so correlation is server-owned; idempotent on retry. Returns `{ portalUrl, workosOrganizationId }` with `Cache-Control: private, no-store`.
607
+ - `DELETE /api/dashboard/orgs/:orgId/sso` — owner-only two-step disconnect (RPC sets status='disabled'; route then calls `workos.sso.deleteConnection`; failure non-fatal — eventual `connection.deleted` webhook clears connection_id via apply_workos_event).
608
+ - `POST /api/workos/webhook` — runtime nodejs, raw `req.text()` body, HMAC verified via `workos.webhooks.constructEvent` (5-min tolerance). Maps connection.activated/deactivated/deleted (and dsync.* variants) through apply_workos_event RPC. 401 on bad signature, 500 on RPC error so WorkOS retries.
609
+ - `/dashboard/admin/sso` page (owner-only, 404 otherwise) + `<SsoSetupCard>` client component.
610
+
611
+ Helpers:
612
+ - `lib/workos/client.ts` — lazy `getWorkOS()` singleton + async `verifyWorkOSSignature()` wrapper (returns `{ok, event} | {ok:false, reason}`).
613
+ - `lib/dashboard/membership-guard.ts` MAP gains `workos_org_already_bound: 422`, `bad_workos_org_id: 422`, `webhook_signature_invalid: 401`.
614
+
615
+ Sidebar: admin layout adds "SSO" link.
616
+
617
+ Tests: 5 new test files (40 tests). setup.test.ts (11), disconnect.test.ts (6), webhook.test.ts (6), client.test.ts (6), sso-privilege.test.ts (11 — REVOKE/GRANT, SECURITY DEFINER, schema-qualified refs, claim/lease/complete columns, lifecycle handlers). Stub extensions for `record_sso_setup_initiated`, `apply_workos_event`, `disable_sso_connection` RPCs + `processed_workos_events` table behavior.
618
+
619
+ ## 6.3.0-pre.9 (2026-05-08)
620
+
621
+ **v7.0 Phase 5.3 — Org switcher.** Replaces the "first admin/owner membership" hack across `/dashboard` + `/dashboard/admin/*` with a real org switcher backed by an HTTP-only cookie.
622
+
623
+ - New: `POST /api/dashboard/active-org` sets `cao_active_org` cookie (HttpOnly Secure SameSite=Lax 14d). Body `{ orgId }` validates caller is active member; `{ orgId: null }` clears.
624
+ - New: `lib/dashboard/active-org.ts` exports `resolveActiveOrg(svc, userId)` (cookie → first-membership fallback) and `listActiveOrgs(svc, userId)` (with names + roles).
625
+ - New: `<OrgSwitcher>` client component in dashboard sidebar (only shows when caller has 2+ active memberships).
626
+ - Modified: `/dashboard/layout.tsx`, `/dashboard/page.tsx`, `/dashboard/billing/page.tsx`, `/dashboard/admin/layout.tsx` all now consult `resolveActiveOrg` instead of `memberships[0]`.
627
+ - Admin layout cookie restricted to admin/owner orgs — cannot escalate a member-only org into the admin surface.
628
+ - 11 new tests (6 backend route + 5 helper). Stale-cookie test asserts the membership check rejects removed members.
629
+ - No new env vars, no migration.
630
+
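The cookie → first-membership fallback can be sketched as a pure function. The real `resolveActiveOrg(svc, userId)` takes a service client and reads the cookie itself; the simplified signature, the `Membership` shape, and the status values below are assumptions for illustration.

```typescript
type Role = "owner" | "admin" | "member";
interface Membership { orgId: string; role: Role; status: "active" | "inactive" }

// Honor the cookie only if it still names an active membership;
// otherwise fall back to the first active membership, or null if there is none.
function resolveActiveOrgId(cookieOrgId: string | null, memberships: Membership[]): string | null {
  const active = memberships.filter((m) => m.status === "active");
  const fromCookie = active.find((m) => m.orgId === cookieOrgId);
  if (fromCookie) return fromCookie.orgId;
  return active[0]?.orgId ?? null;
}
```

Re-validating the cookie on every resolve is what lets a stale cookie for a removed member degrade to the fallback instead of granting access.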
631
+ ## 6.3.0-pre.8 (2026-05-08)
632
+
633
+ **v7.0 Phase 5.2 — Audit log viewer + cost reporting (CSV export).** Closes the audit half of the original Phase 5 scope.
634
+
635
+ New surfaces:
636
+ 1. `/dashboard/admin/audit` — server-rendered, role-gated. Paginated audit log with single-action filter, cursor-based pagination, prev_hash/this_hash exposed for chain-replay debugging.
637
+ 2. `/dashboard/admin/cost` — owner/admin only. Per-user cost breakdown for a YYYY-MM period, default current UTC month. Download CSV button.
638
+
639
+ 3 new API routes (all under `/api/dashboard/orgs/:orgId/`):
640
+ - `GET /audit` — list_audit_events RPC; cursor decode + ISO since/until validation route-side; nextCursor base64-re-encoded
641
+ - `GET /cost` — org_cost_report RPC; period response normalized to `{ since, until, sinceTs, untilTs }`
642
+ - `GET /cost.csv` — same RPC, formats as RFC 4180 CSV (CRLF, UTF-8 no BOM, double-quote escape); filename `cost-<orgId>-<since>-<until>.csv` (no org-name interpolation)
643
+
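A small sketch of the RFC 4180 encoding and filename rules listed for `/cost.csv`. Function names and the column set are hypothetical; the CRLF line endings, double-quote escaping, and the `[a-zA-Z0-9._-]` filename allowlist follow the notes in this entry.

```typescript
type Cell = string | number | null;

// Quote a field only when it contains a comma, quote, CR, or LF; double embedded quotes.
function csvField(value: Cell): string {
  const text = value === null ? "" : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Every record, including the last, is terminated by CRLF.
function toCsv(header: string[], rows: Cell[][]): string {
  return [header, ...rows].map((r) => r.map(csvField).join(",")).join("\r\n") + "\r\n";
}

// Build cost-<orgId>-<since>-<until>.csv and refuse anything outside the allowlist.
function costCsvFilename(orgId: string, since: string, until: string): string {
  const name = `cost-${orgId}-${since}-${until}.csv`;
  if (!/^[a-zA-Z0-9._-]+$/.test(name)) throw new Error("unsafe filename");
  return name;
}
```

Building the filename only from the org ID and period, never the org name, keeps user-controlled text out of the `Content-Disposition` header.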
644
+ 2 SECURITY DEFINER Postgres RPCs in `data/deltas/20260508160000_phase5_2_audit_cost_rpcs.sql`:
645
+ - `list_audit_events` — keyset pagination on `(occurred_at DESC, id DESC)` with index `audit_events_org_keyset_idx`. LEFT JOIN auth.users for actor_email.
646
+ - `org_cost_report` — aggregates runs by user_id; coalesce-in-coalesce-out NULL safety; LEFT JOIN auth.users.
647
+ - Both `SECURITY DEFINER SET search_path = public, audit, auth, pg_temp`. `REVOKE ALL FROM PUBLIC, anon, authenticated; GRANT EXECUTE TO service_role` only.
648
+
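Keyset pagination on `(occurred_at DESC, id DESC)` needs an opaque cursor carrying the last row's sort key. The sketch below shows one way a base64 JSON cursor with route-side validation could work; the field names and the exact ISO 8601 pattern are assumptions.

```typescript
interface AuditCursor { occurredAt: string; id: string }

// Accept only UTC timestamps of the form YYYY-MM-DDTHH:MM:SS[.fff]Z that also parse.
function isIsoUtc(s: string): boolean {
  if (!/^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d{1,3})?Z$/.test(s)) return false;
  return !Number.isNaN(Date.parse(s));
}

function encodeCursor(c: AuditCursor): string {
  return Buffer.from(JSON.stringify(c), "utf8").toString("base64");
}

// Returns null for anything that is not a well-formed cursor, so the route
// can answer with a 4xx instead of passing junk to the RPC.
function decodeCursor(raw: string): AuditCursor | null {
  try {
    const parsed: unknown = JSON.parse(Buffer.from(raw, "base64").toString("utf8"));
    if (typeof parsed !== "object" || parsed === null) return null;
    const { occurredAt, id } = parsed as Record<string, unknown>;
    if (typeof occurredAt !== "string" || typeof id !== "string") return null;
    if (!isIsoUtc(occurredAt)) return null;
    return { occurredAt, id };
  } catch {
    return null;
  }
}
```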
649
+ 3 new helpers:
650
+ - `lib/dashboard/period.ts` — YYYY-MM parser converting `since/until` to (sinceTs, untilTs exclusive). UTC. Default to current month when both null.
651
+ - `lib/dashboard/cost-csv.ts` — RFC 4180 encoder + safe filename builder (validates against `[a-zA-Z0-9._-]`).
652
+ - `lib/dashboard/audit-cursor.ts` — base64 JSON cursor encode/decode + ISO 8601 UTC validator.
653
+
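One possible reading of the `YYYY-MM` period rules is sketched below. Treating `until` as an inclusive month (so `untilTs` is the start of the following month) and collapsing a single supplied bound to a one-month period are both assumptions; only the UTC handling, the exclusive `untilTs`, and the current-month default are stated above.

```typescript
interface Period { sinceTs: string; untilTs: string }

const YYYY_MM = /^(\d{4})-(0[1-9]|1[0-2])$/;

// First instant (UTC) of the given month, optionally shifted by whole months.
function monthStartUtc(ym: string, addMonths = 0): Date {
  const m = YYYY_MM.exec(ym);
  if (!m) throw new Error(`invalid period: ${ym}`);
  return new Date(Date.UTC(Number(m[1]), Number(m[2]) - 1 + addMonths, 1));
}

function resolvePeriod(since: string | null, until: string | null, now: Date): Period {
  const current = `${now.getUTCFullYear()}-${String(now.getUTCMonth() + 1).padStart(2, "0")}`;
  const from = since ?? until ?? current;
  const to = until ?? since ?? current;
  const sinceDate = monthStartUtc(from);
  const untilDate = monthStartUtc(to, 1); // exclusive upper bound
  if (sinceDate.getTime() >= untilDate.getTime()) throw new Error("since must not be after until");
  return { sinceTs: sinceDate.toISOString(), untilTs: untilDate.toISOString() };
}
```

An exclusive upper bound lets the SQL side use `created_at >= sinceTs AND created_at < untilTs` without worrying about the last millisecond of a month.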
654
+ Codex passes folded:
655
+ - Spec pass 1 (3 CRITICAL: cost period semantics, runs.created_at vs nonexistent occurred_at, CSV filename injection + 7 WARNING + 2 NOTE)
656
+ - Spec pass 2 (2 CRITICAL: filename surface contradiction, deployment target clarification + 7 WARNING: cursor validation in route, error contract, JSON shape unification, audit period parsing, cache headers + 2 NOTE)
657
+ - Plan pass (2 CRITICAL: schema-qualify audit.events + SET search_path lock, runs.cost_usd ordering guard + 6 WARNING)
658
+
659
+ All routes return `Cache-Control: private, no-store`. All pages declare `force-dynamic`.
660
+
661
+ 39 new tests (Phase 5.1's 237 → **276 web tests**). Helper unit tests: 21. Backend route tests: 32. Integration: 4. Privilege: 7 (incl. SECURITY DEFINER + search_path + schema-qualification + Phase 4 dependency check).
662
+
663
+ **Operator follow-up:** run `/migrate` to apply `20260508160000_phase5_2_audit_cost_rpcs.sql` against dev → QA → prod.
664
+
665
+ ## 6.3.0-pre.7 (2026-05-08)
666
+
667
+ **v7.0 Phase 5.1 — Members management + RBAC enforcement.** First Org-tier user-visible surface. After Phase 5.1 ships, an Org-tier admin (small/mid Stripe plans, or free org owner) can actually manage their team.
668
+
669
+ New surfaces:
670
+ 1. `/dashboard/admin/members` — server-rendered, role-gated. Lists active members with email, role dropdown per row, remove button. Embedded invite form (email + role).
671
+ 2. `/dashboard/admin/settings` — owner-only. Edit org name (1..100 chars).
672
+ 3. `/dashboard/admin/layout.tsx` — sidebar nav. 404s if signed-out OR caller has no admin/owner membership in any org.
673
+ 4. Sidebar "Admin" link in `/dashboard/layout.tsx` — visible only when caller has admin/owner membership somewhere.
674
+
675
+ 5 new API routes (all under `/api/dashboard/orgs/`):
676
+ - `GET /:orgId/members` — list active members + emails.
677
+ - `POST /:orgId/members/invite` — admin/owner invites by email; reactivates removed members.
678
+ - `PATCH /:orgId/members/:userId` — change role (matrix-gated).
679
+ - `DELETE /:orgId/members/:userId` — soft-remove (admin: members only; owner: any).
680
+ - `PATCH /:orgId` — owner updates org name.
681
+
682
+ 4 SECURITY DEFINER Postgres RPCs in `data/deltas/20260508140000_phase5_1_member_rpcs.sql`:
683
+ - `invite_member`, `change_member_role`, `remove_member`, `update_org_name`.
684
+ - Each acquires `FOR UPDATE` lock on `memberships` rows for the org BEFORE re-reading caller role + authorizing. Atomically count + write + audit.append in one transaction.
685
+ - `REVOKE ALL FROM PUBLIC, anon, authenticated; GRANT EXECUTE TO service_role;` — direct authenticated RPC calls fail with `42501 permission denied`.
686
+ - Codex pass 1 CRITICAL (TOCTOU last-owner race) + codex pass 2 CRITICAL (caller-spoofing + lock-before-authorize) + codex plan-pass CRITICAL (`update_org_name` NULL-role guard) — all folded.
687
+
688
+ CSRF: `assertSameOrigin(req)` on every mutating route (5 of 5).
689
+
690
+ Audit events: `org.member.invited`, `org.member.role_changed`, `org.member.removed`, `org.settings.updated`. Written by RPCs only — never in route code.
691
+
692
+ 35 backend tests + 4 integration tests = 39 new web tests. Existing 180 still pass. Concurrency test (#31) proves serial last-owner check via stub mutex; static migration test (#31b) proves REVOKE/GRANT.
693
+
694
+ No new env vars. No CLI changes.
695
+
696
+ **Operator follow-up:** run `/migrate` to apply `20260508140000_phase5_1_member_rpcs.sql` against dev → QA → prod.
697
+
698
+ ## 6.3.0-pre.6 (2026-05-08)
699
+
700
+ **v7.0 Phase 4 — Free tier dashboard UI + `/cli-auth` page + public share-by-URL.** Closes the loop on Phase 3's commercially load-bearing 402: free users now SEE "you've used 87/100 this month" and are one click away from upgrading.
701
+
702
+ Eight new UI surfaces:
703
+ 1. `/dashboard` overview — auth-gated, server-rendered. Run count this month, cost MTD, current plan, recent runs (5), 30-day cost chart (inline SVG, no library).
704
+ 2. `/dashboard/runs` — paginated list (20/page, offset-based via `range()`).
705
+ 3. `/dashboard/runs/[runId]` — detail page with manifest-driven event replay (lazy chunk loading, hard 1000-event cap for MVP), state inspector, cost breakdown, visibility toggle.
706
+ 4. `/dashboard/billing` — current plan/caps/usage; Upgrade/Manage subscription buttons that POST to Phase 3 endpoints.
707
+ 5. `/dashboard/billing/success` — post-checkout polling page.
708
+ 6. `/cli-auth` (DEFERRED FROM 2.3) — completes the CLI dashboard login flow. Server-validates `cb` (loopback only, port 56000-56050) + `nonce` (32 hex). Authenticated user clicks "Sign in CLI" → mints API key via `/api/dashboard/api-keys/mint` → POSTs to loopback with `mode: 'cors'`. CLI loopback listener (Phase 2.3, EXTENDED) gains OPTIONS preflight + `Access-Control-Allow-Origin` matching the configured `AUTOPILOT_PUBLIC_BASE_URL`.
709
+ 7. `/runs/[runShareId]` — public share-by-URL. Server-side anon Supabase client (NOT createBrowserClient). Read-only events replay + state.
710
+ 8. `PATCH /api/dashboard/runs/:runId/visibility` — narrow owner-only endpoint with explicit owner check + assertSameOrigin guard. NOT direct UPDATE on runs from client.
711
+
712
+ Plus required infrastructure:
713
+ - **Authorized signed-URL minter** at `GET /api/dashboard/runs/:runId/artifact?kind=manifest|chunk|state[&seq=N]` — verifies owner OR `visibility='public'` BEFORE calling `storage.from('run-uploads').createSignedUrl(path, 60)`. Bucket stays fully private. Chunk seq bounded against `upload_session_chunks` count → 422 on out-of-range. Path derived ONLY from DB-trusted values via `chunkPath()` helper.
714
+ - **assertSameOrigin guard** on cookie-authenticated mutating routes (mint, revoke, visibility, checkout, portal). Compares `Origin` header against `loadPublicBillingConfig().AUTOPILOT_PUBLIC_BASE_URL`. Skipped when API-key bearer auth is used.
715
+ - **`/cli-auth` security headers via middleware** — `Cache-Control: no-store`, `Referrer-Policy: no-referrer`, `X-Frame-Options: DENY`, and CSP including exact `connect-src 'self' http://127.0.0.1:* http://localhost:*` for the loopback POST. Headers set in middleware.ts (Server Component `headers()` reads request, not response).
716
+ - **Finalize handler** persists sanitized `cost_usd`/`duration_ms`/`run_status` from CLI state.json. TS-side bounds + enum validation BEFORE DB UPDATE so a buggy CLI doesn't trip the new CHECK and bring down the whole UPDATE. Wrapped in try/catch for graceful degradation during the rollout window before `/migrate` applies the new columns. Display-only — labeled "Reported by CLI", no entitlement/billing logic reads them.
717
+ - **safeRedirect** allowlist accepts `/cli-auth` AND preserves the full `?cb=&nonce=` query string when bouncing through Supabase Auth.
718
+ - **Env unification** — `AUTOPILOT_PUBLIC_BASE_URL` is now the canonical name everywhere (web AND CLI). The CLI's older `AUTOPILOT_DASHBOARD_BASE_URL` is a deprecated alias (warn-once on use).
719
+
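The same-origin guard described in the infrastructure list could be approximated as follows. The real `assertSameOrigin(req)` takes a request object; this sketch uses a simplified, hypothetical signature, and rejecting requests with no `Origin` header is an assumption.

```typescript
// originHeader: value of the Origin request header, or null when absent.
// publicBaseUrl: the configured AUTOPILOT_PUBLIC_BASE_URL.
// usesApiKeyBearer: true when the request authenticates with an API key, not a cookie.
function isSameOrigin(originHeader: string | null, publicBaseUrl: string, usesApiKeyBearer: boolean): boolean {
  if (usesApiKeyBearer) return true; // CSRF does not apply without ambient cookies
  if (originHeader === null) return false;
  try {
    return new URL(originHeader).origin === new URL(publicBaseUrl).origin;
  } catch {
    return false; // unparsable Origin (including the literal "null") is rejected
  }
}
```

Comparing parsed origins rather than raw strings avoids prefix tricks such as `https://autopilot.dev.example.com`.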
720
+ Component breakdown: `<RunListItem>` server, `<EventReplay>` client (manifest-driven, lazy chunks, 1000-event cap), `<StateInspector>` client (recursive tree, no JSON-tree library), `<CostChart>` server (inline SVG, ~80 LOC), `<PlanCard>` server with client `<UpgradeButtons>`/`<ManageSubscriptionButton>`, `<VisibilityToggle>` client (optimistic update + confirmation modal).
721
+
722
+ 30+ new tests: 6 visibility (incl. CSRF) + 14 artifact (9 base + 3 RLS + 2 seq-bounds) + 1 finalize-persists + 9 sanitize + 1 finalize-malformed-status + 3 cli-auth validate + 4 cli-auth headers + 1 cli-auth redirect round-trip + 2 cost-chart + 6 dashboard-pages integration + 4 origin-mismatch (mint/revoke/checkout/portal) + 1 CLI OPTIONS preflight = ~52 added tests across web + CLI.
723
+
724
+ **Migration:** `data/deltas/20260508120000_phase4_runs_metadata.sql` — `runs.cost_usd NUMERIC(12,4)`, `duration_ms INTEGER`, `run_status TEXT` with CHECK enum; cost-chart partial indexes (user vs org); `runs_select_public` policy for anon/authenticated on `visibility='public'`; column-level GRANT to anon (only safe public columns, NOT `SELECT *`). Operator runs `/migrate` post-merge BEFORE the code deploy fully exercises the new columns; finalize handler graceful-drops if columns missing.
725
+
726
+ **No new env vars** — all reuse Phase 2.1 + 2.3 + 3 vars. Consider standardizing on `AUTOPILOT_PUBLIC_BASE_URL` in any custom CLI deployments (Phase 2.3's `AUTOPILOT_DASHBOARD_BASE_URL` still works but logs a deprecation warning).
727
+
728
+ **Operator follow-ups:**
729
+ - Run `/migrate` to apply `data/deltas/20260508120000_phase4_runs_metadata.sql`.
730
+ - (Optional) Configure the Stripe Customer Portal in the Stripe dashboard if it is not already configured (allows cancellation and payment-method updates from `/dashboard/billing`).
731
+
732
+ ## 6.3.0-pre.5 (2026-05-08)
733
+
734
+ **v7.0 Phase 3 — Stripe entitlement enforcement.** Makes the cryptographic credibility boundary commercially load-bearing: every engine-on `autopilot --mode full` upload is now gated on the org's monthly run cap and retained-storage cap.
735
+
736
+ Five new surfaces:
737
+ 1. `POST /api/stripe/webhook` — `runtime='nodejs'`, raw-body signature verification, claim/lease/complete idempotency (status='processing' + locked_until+attempt_count, stale leases reclaimed atomically), `last_stripe_event_at` watermark for out-of-order delivery. Handles `checkout.session.completed`, `customer.subscription.updated`, `customer.subscription.deleted`, `invoice.payment_failed`.
738
+ 2. `POST /api/dashboard/billing/checkout` — Supabase session auth with role check (owner/admin), Stripe Checkout Session create with `idempotencyKey='${orgId}:${tier}:${interval}'` and customer reuse via `billing_customers.stripe_customer_id`. Returns `{ url }`.
739
+ 3. `POST /api/dashboard/billing/portal` — same auth, returns Stripe Customer Portal session URL.
740
+ 4. `POST /api/upload-session` — Phase 2.2 endpoint extended with entitlement gate between ownership pass and JWT mint. Returns 402 `{ error: 'limit_reached', limit, current, max, upgrade_url }`. New body field `expectedBytes` from `fs.stat(events.ndjson).size` for storage cap preflight (catches the 4.9-of-5GiB user uploading 20GiB pattern).
741
+ 5. CLI uploader catches 402 → throws typed `UploadLimitError`. Auto-upload entry point (`auto-upload.ts`) detects, prints friendly message, returns `reason='limit-reached'` without bubbling. Run's exit code preserved.
742
+
743
+ Pricing tiers (per v7.0 MVP): Free (100 runs/mo, 5 GiB, $0), Org Small (1000, 50 GiB, $99/mo or $990/yr), Org Mid (10000, 500 GiB, $499/mo or $4990/yr), Enterprise (NULL caps = no enforcement, sales-led). PLAN_MAP keys by `(tier, interval)` for all 4 price IDs. Free organizations DO exist and share an org-level cap (NOT each-user-gets-personal-cap) — seeded by AFTER INSERT trigger on `organizations`.
744
+
745
+ Run-count cap uses STRICT `>` comparison (the runs row already exists when /api/upload-session is called, so count=100 is the 100th and is allowed; reject only at 101+). Storage cap = `sum_retained_bytes(orgId, userId, 90 days)` SQL aggregate, with `expectedBytes` preflight at mint time.
746
+
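The strict `>` run-count comparison and the `expectedBytes` preflight can be illustrated with a small gate function. Type and field names are hypothetical, as are the `"runs"` / `"storage"` limit labels; the 402 body shape and the NULL-caps-means-unlimited rule follow the Phase 3 notes.

```typescript
interface Caps { maxRuns: number | null; maxBytes: number | null } // null = no enforcement
interface Usage { runCount: number; retainedBytes: number; expectedBytes: number }
interface LimitReached {
  error: "limit_reached";
  limit: "runs" | "storage";
  current: number;
  max: number;
  upgrade_url: string;
}

// Returns null when the upload may proceed, or the 402 body otherwise.
function checkEntitlement(caps: Caps, usage: Usage, upgradeUrl: string): LimitReached | null {
  // The runs row already exists when this runs, so runCount === maxRuns is
  // the last allowed run; only strictly-greater is rejected.
  if (caps.maxRuns !== null && usage.runCount > caps.maxRuns) {
    return { error: "limit_reached", limit: "runs", current: usage.runCount, max: caps.maxRuns, upgrade_url: upgradeUrl };
  }
  // Preflight: count the bytes about to be uploaded against the storage cap.
  const projected = usage.retainedBytes + usage.expectedBytes;
  if (caps.maxBytes !== null && projected > caps.maxBytes) {
    return { error: "limit_reached", limit: "storage", current: projected, max: caps.maxBytes, upgrade_url: upgradeUrl };
  }
  return null;
}
```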
747
+ `loadBillingConfig()` validates Stripe env at runtime with zod; `loadPublicBillingConfig()` only reads `AUTOPILOT_PUBLIC_BASE_URL` so missing Stripe env doesn't break the upload-session entitlement gate. Subscription state grace logic: canceled-and-past-period-end → free; cancel_at past → free; payment_failed_at older than 7 days → free.
748
+
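The three grace rules can be expressed as a single tier-resolution function. This is a sketch under assumed field names (`currentPeriodEnd`, `cancelAt`, `paymentFailedAt` as epoch milliseconds); only the three downgrade conditions and the 7-day window come from the text.

```typescript
type Tier = "free" | "small" | "mid" | "enterprise";

interface SubscriptionState {
  tier: Tier;
  status: "active" | "canceled" | "past_due";
  currentPeriodEnd: number | null; // epoch ms
  cancelAt: number | null;         // epoch ms
  paymentFailedAt: number | null;  // epoch ms
}

const PAYMENT_GRACE_MS = 7 * 24 * 60 * 60 * 1000;

// Returns the tier whose caps should be enforced right now.
function effectiveTier(sub: SubscriptionState, nowMs: number): Tier {
  if (sub.status === "canceled" && sub.currentPeriodEnd !== null && sub.currentPeriodEnd < nowMs) return "free";
  if (sub.cancelAt !== null && sub.cancelAt < nowMs) return "free";
  if (sub.paymentFailedAt !== null && nowMs - sub.paymentFailedAt > PAYMENT_GRACE_MS) return "free";
  return sub.tier;
}
```

A canceled subscription therefore keeps its paid caps until the period it already paid for runs out.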
749
+ 31 new tests: 8 webhook + 4 checkout + 3 portal + 10 checkEntitlement + 2 plan-map + 2 upload-session integration (web) + 3 CLI 402 handling.
750
+
751
+ **Migration:** `data/deltas/20260507180000_phase3_billing.sql` — `billing_customers`, augments `entitlements` with Stripe state + caps + watermark, `stripe_webhook_events` with claim/lease, `personal_entitlements`, augments `runs` with `total_bytes`+`deleted_at`, `sum_retained_bytes` + `count_runs_this_month` + `seed_free_entitlements` SECURITY DEFINER RPCs/trigger. CHECK constraint enforces free/small/mid have explicit caps and enterprise has NULLs. Backfills existing rows BEFORE adding the constraint. Operator runs `/migrate` post-merge.
752
+
753
+ **New env vars (Vercel):**
754
+ - `STRIPE_SECRET_KEY`
755
+ - `STRIPE_WEBHOOK_SECRET`
756
+ - `STRIPE_PRICE_SMALL_MONTHLY`
757
+ - `STRIPE_PRICE_SMALL_YEARLY`
758
+ - `STRIPE_PRICE_MID_MONTHLY`
759
+ - `STRIPE_PRICE_MID_YEARLY`
760
+
761
+ **Operator follow-ups:**
762
+ - Run `/migrate` to apply the migration through dev → QA → prod.
763
+ - Set the 6 Stripe env vars above in Vercel.
764
+ - Configure Stripe webhook in dashboard pointing at `https://autopilot.dev/api/stripe/webhook` and subscribe to: `checkout.session.completed`, `customer.subscription.updated`, `customer.subscription.deleted`, `invoice.payment_failed`.
765
+ - Create Stripe Products + 4 Prices: small ($99/mo + $990/yr), mid ($499/mo + $4990/yr).
766
+
767
+ ## 6.3.0-pre.4 (2026-05-07)
768
+
769
+ **v7.0 Phase 2.3 — CLI dashboard verbs + auto-upload at run.complete.** Connects the v6.x autopilot pipeline to Phase 2.2's ingest API.
770
+
771
+ Four new CLI verbs: `claude-autopilot dashboard {login,logout,status,upload}`. After `dashboard login`, every engine-on `autopilot --mode full` automatically uploads to autopilot.dev when `run.complete` fires. Login flow uses 128-bit nonce-bound loopback HTTP listener (port 56000-56050) with strict server-side `callbackUrl` validation, `crypto.timingSafeEqual` nonce verify, and atomic config write at `~/.claude-autopilot/dashboard.json` (mode 0600, dir 0700). Snapshot-before-upload (events.ndjson + state.json copied to `<runDir>/.upload-snapshot/` with stat-before/stat-after defense) so streaming writers can't tear the chunk reads. Auto-upload is foreground await with SIGINT/AbortController; failure prints `claude-autopilot dashboard upload <runId>` resume command and never overrides the run's exit code. Empty events.ndjson skips upload cleanly. Opt out per-run with `--no-upload` or globally with `CLAUDE_AUTOPILOT_UPLOAD=off`.
772
+
773
+ Web side adds four new endpoints under `/api/dashboard/`: `POST api-keys/mint` (Supabase session auth → atomic `mint_api_key_with_nonce` RPC, 128-bit `clp_<64-hex>` keys, SHA256-hashed at rest, 12-char prefix display), `POST api-keys/revoke` (idempotent, ownership-scoped), `GET me` (memberships + lastUploadAt), `GET runs/:runId/upload-session` (resume in-flight session). Centralized `authViaApiKey()` helper in `apps/web/lib/dashboard/auth.ts` looks up keys by deterministic hash with `eq + maybeSingle` (O(1)) and filters revoked keys. Strict `validateCallbackUrl()` regex restricts callbacks to `http://(127.0.0.1|localhost):560(0[0-9]|[1-4][0-9]|50)/cli-callback` with double-parse defense.
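
One plausible reading of the "regex plus double-parse" callback check is sketched below. The function name `validateCallbackUrlSketch` is hypothetical, and the second pass with the WHATWG `URL` parser is an assumption about what "double-parse defense" means; only the pattern itself comes from the entry above.

```typescript
// Pattern from the changelog entry, with literal dots escaped and anchors added.
const CALLBACK_RE =
  /^http:\/\/(127\.0\.0\.1|localhost):560(0[0-9]|[1-4][0-9]|50)\/cli-callback$/;

// Hypothetical sketch: accept a callback URL only if the regex AND a structural
// re-parse agree that it targets the loopback listener on ports 56000-56050.
function validateCallbackUrlSketch(raw: string): boolean {
  if (!CALLBACK_RE.test(raw)) return false;

  let parsed: URL;
  try {
    parsed = new URL(raw);
  } catch {
    return false;
  }

  const port = Number(parsed.port);
  const loopbackHost =
    parsed.hostname === "127.0.0.1" || parsed.hostname === "localhost";

  return (
    parsed.protocol === "http:" &&
    loopbackHost &&
    port >= 56000 &&
    port <= 56050 &&
    parsed.pathname === "/cli-callback" &&
    parsed.search === "" &&
    parsed.hash === "" &&
    parsed.username === "" &&
    parsed.password === ""
  );
}
```

The second pass guards against inputs that a regex accepts but a URL parser interprets differently, such as embedded credentials.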
774
+
775
+ CLI ↔ web parity guaranteed by shared fixtures: `apps/web/lib/upload/__fixtures__/{chain-vectors,state-canonicalization-vectors}.json` are loaded byte-for-byte by `tests/dashboard/parity.test.ts`. Identical chain-root and JCS-canonical sha256 in both directions.
776
+
777
+ **Migration:** `data/deltas/20260507120000_phase2_3_api_keys.sql` — adds `api_keys` (RLS, key_hash regex check, prefix_display regex check), `api_key_mint_nonces` (RLS, service-role-only), `expire_mint_nonces()` SECURITY DEFINER RPC, and the atomic `mint_api_key_with_nonce()` SECURITY DEFINER RPC that fuses sweep + dedup-check + insert key + insert nonce in a single transaction. Operator runs `/migrate` post-merge.
778
+
779
+ **New env vars:**
780
+ - Web (Vercel): `NEXT_PUBLIC_AUTOPILOT_BASE_URL` — used by the `cli-auth` web page (deferred to Phase 4 dashboard UI) to display loopback callback URL.
781
+ - CLI: `AUTOPILOT_DASHBOARD_BASE_URL` (defaults `https://autopilot.dev`); `CLAUDE_AUTOPILOT_HOME` (defaults `~/.claude-autopilot`); `CLAUDE_AUTOPILOT_UPLOAD=off` opts out of auto-upload; `CLAUDE_AUTOPILOT_UPLOAD_RETRY_MS` overrides retry backoff (test seam).
782
+
783
+ **Operator follow-ups:**
784
+ - Run `/migrate` to apply the migration through dev → QA → prod.
785
+ - Set `NEXT_PUBLIC_AUTOPILOT_BASE_URL=https://autopilot.dev` in Vercel.
786
+ - Implement the `/cli-auth` web page in Phase 4 dashboard UI. The page must mint via `POST /api/dashboard/api-keys/mint` then POST `{ apiKey, fingerprint, accountEmail, nonce }` to the loopback callback (URL passed in `?cb=`). Phase 2.3 tests use a mock handler that simulates this flow end-to-end.
787
+
788
+ ## 6.3.0-pre.3 (2026-05-07)
789
+
790
+ **v7.0 Phase 2.2 — ingest API + tamper-evident events.** First server endpoints in the repo. Three routes (`POST /api/upload-session`, `PUT /api/runs/:runId/events/:seq`, `POST /api/runs/:runId/finalize`) implement signed-session uploads with hash-chain verification and idempotent finalize. Per-chunk immutable Storage objects, DB row lock + unique constraint + Storage `upsert: false` triple-defense against concurrent corruption. Two-phase write ordering with `upload_session_chunks.status` for crash recovery. Dedicated `UPLOAD_SESSION_JWT_SECRET` (HS256, 15-min TTL, full claim hardening). RFC 8785 (JCS) state canonicalization. 38 new tests across upload-session, events-chunk, finalize, hash-chain vectors, JCS vectors, JWT, and storage helpers.
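
To illustrate the general idea of hash-chain verification over uploaded chunks, here is a minimal sketch. The chain formula (`tip = sha256(prevTip + sha256(chunk))`), the all-zero genesis value, and every name below are assumptions made for illustration; the entry above does not specify the actual construction.

```typescript
import { createHash } from "node:crypto";

const sha256Hex = (data: string | Buffer): string =>
  createHash("sha256").update(data).digest("hex");

// Assumed genesis tip: 64 zero hex digits.
const GENESIS_TIP = "0".repeat(64);

// Assumed chain step: fold the chunk's digest into the previous tip.
function nextTipSketch(prevTip: string, chunk: Buffer): string {
  return sha256Hex(prevTip + sha256Hex(chunk));
}

// Recompute the chain root over all chunks in sequence order.
function chainRootSketch(chunks: readonly Buffer[]): string {
  return chunks.reduce((tip, chunk) => nextTipSketch(tip, chunk), GENESIS_TIP);
}

// A server could accept chunk `seq` only if the client's claimed previous tip
// equals the stored tip, then store the returned value as the new tip.
function acceptChunkSketch(
  storedTip: string,
  claimedPrevTip: string,
  chunk: Buffer,
): string | null {
  if (storedTip !== claimedPrevTip) return null;
  return nextTipSketch(storedTip, chunk);
}
```

Because each tip depends on every earlier chunk, altering or reordering any chunk changes the final root.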
791
+
792
+ **Migration:** `data/deltas/20260507000000_phase2_2_ingest.sql` — adds `upload_session_chunks` table, augments `upload_sessions` with `next_expected_seq` + `chain_tip_hash`, adds `runs.state_sha256` + `runs.events_index_path`, partial unique index on `upload_sessions(run_id) WHERE consumed_at IS NULL`, CHECK constraints on hash-format columns, plus `claim_chunk_slot` and `mark_chunk_persisted` SECURITY DEFINER RPCs. Operator runs via `/migrate` post-merge.
793
+
794
+ **New env var:** `UPLOAD_SESSION_JWT_SECRET` — set in Vercel + local `.env.local`. Generate with `openssl rand -hex 32`. NOT shared with `SUPABASE_JWT_SECRET`.
795
+
796
+ **Storage bucket:** `run-uploads` — operator one-time setup in the Supabase project (private; service-role-only writes).
797
+
798
+ ## 6.3.0-pre.2 (2026-05-07)
799
+
800
+ **v7.0 Phase 2.1 — Next.js scaffold + Supabase Auth (Free tier sign-in).**
801
+
802
+ First sub-PR of v7.0 Phase 2 (Ingest API + CLI integration). Pure foundation; no API endpoints related to ingest, no CLI dashboard verbs.
803
+
804
+ **What landed:**
805
+ - `apps/web/` Next.js 16 App Router app with React 19 + Tailwind v4
806
+ - npm workspaces (`workspaces: ["apps/*", "packages/*"]`) — CLI deps stay where they are; web deps live in `apps/web/package.json`
807
+ - `tsconfig.base.json` shared between CLI and web; `apps/web/` uses `bundler` module resolution, CLI keeps `NodeNext`
808
+ - Supabase Auth Google sign-in via PKCE callback (`/api/auth/callback`)
809
+ - Sign-out (`/api/auth/sign-out`) clears only configured project ref's cookies — never `sb-*` wildcard
810
+ - `safeRedirect` whitelist with documented change policy
811
+ - Scoped middleware matcher: refreshes session on page + `/api/auth/*` routes ONLY; excludes static assets, `/api/health`, and non-auth `/api/*` (ingest endpoints in 2.2 handle their own auth)
812
+ - Health endpoint `/api/health` for platform health checks
813
+ - 22 web tests via Vitest (10 redirect + 5 callback + 2 signout + 4 matcher + 1 typecheck-guard)
814
+ - `web-tests.yml` workflow runs typecheck + Next.js build + tests on every PR
815
+ - `npm-tarball-check.yml` workflow asserts `apps/` is excluded from the published CLI tarball
816
+ - `vercel.json` configured for monorepo build with `apps/web/` root
817
+
818
+ **Spec:** `docs/specs/v7.0-phase2.1-nextjs-scaffold.md` (PR #116)
819
+ **Plan:** `docs/superpowers/plans/2026-05-07-v7.0-phase2.1-nextjs-scaffold.md`
820
+
821
+ Pre-release on the npm `next` tag. `latest` stays on `6.2.2`.
822
+
823
+ ## 6.3.0-pre.1 (2026-05-07)
824
+
825
+ **v7.0 Phase 1 — Foundation: schema + RLS + cross-tenant negative tests.**
826
+
827
+ First step toward the v7.0 hosted product. Database-only PR; no endpoints, no UI, no Stripe integration.
828
+
829
+ **What landed:**
830
+
831
+ - `db/supabase/` Supabase project bootstrap with 8 numbered migrations
832
+ - 7 multi-tenant tables: `organizations`, `memberships`, `runs`, `upload_sessions`, `entitlements`, `audit_events`, `organization_settings`
833
+ - RLS enabled on every table with two-branch pattern: `(organization_id IS NOT NULL AND active member)` OR `(organization_id IS NULL AND user_id = auth.uid())`
834
+ - `audit.append()` SQL function with hash-chain immutability; app roles get INSERT only via the function
835
+ - Supabase Storage buckets `org-runs` and `user-runs` with tenant-scoped path-prefix RLS
836
+ - `entitlements.plan` CHECK constraint matching `organizations.plan` exactly
837
+ - `upload_sessions` stores only `jti` + token hash (never raw signing material)
838
+ - 7 RLS negative test files covering: runs cross-tenant, free-vs-org-tier branches, audit immutability, storage path isolation, entitlements admin-only, membership edge cases, upload_sessions single-use
839
+ - CI workflow `.github/workflows/db-tests.yml` runs the test suite against a Dockerized Supabase on every PR
840
+
841
+ **Spec:** `docs/specs/v7.0-hosted-product-mvp.md` (PR #114)
842
+ **Plan:** `docs/superpowers/plans/2026-05-07-v7.0-phase1-foundation.md`
843
+
844
+ Pre-release on the npm `next` tag. `latest` stays on `6.2.2`.
845
+
846
+ ## 6.2.2 — `claude-autopilot autopilot --json` envelope + cache version policy (2026-05-07)
847
+
848
+ **Headline.** Closes out the v6.2.x track. `claude-autopilot autopilot --json` now emits exactly one machine-readable envelope on stdout: successful runs, pre-run failures, and mid-pipeline failures all produce the same shape, so CI consumers can branch on `.exitCode` / `.failedAtPhase` / `.errorCode` directly without parsing stderr NDJSON. The cache contract gains a `MIN_SUPPORTED..MAX_SUPPORTED` schema-version window so a run dir written by a newer binary fails with a clear error instead of an opaque shape crash. The migration guide gets a new "v6.1 → v6.2: one runId across the pipeline" section.
849
+
850
+ **Motivation — Codex review of the v6.2 spec (3 WARNING + 3 NOTE).** The v6.2 orchestrator spec reserved `--json` for v6.2.2; the spec for this PR (Codex 5.3-reviewed) folded back three warnings (strict equality on schemaVersion blocks rolling deploys, exactly-once envelope needs uncaughtException coverage, exit-code taxonomy ambiguous for pre-run failures) and three notes (six-phases vs four-phases migration text, `errorCode` union too loose, stdout purity test under stderr load).
851
+
852
+ **What's in (the 9 deliverables from the spec's "Scope" section).**
853
+
854
+ - **Outer JSON envelope** for `claude-autopilot autopilot --json`. New `AutopilotJsonEnvelope` shape (`version: '1'`, `verb: 'autopilot'`, `runId | null`, `status`, `exitCode`, `phases[]`, `totalCostUSD`, `durationMs`, `errorCode?`, `errorMessage?`, `failedAtPhase?`, `failedPhaseName?`). Pre-run failures get `runId: null` + populated `errorCode`. Mid-pipeline failures get `failedAtPhase` + `failedPhaseName`.
855
+ - **Bounded `AutopilotErrorCode` enum.** Exact strings: `invalid_config | budget_exceeded | lock_held | corrupted_state | partial_write | needs_human | phase_failed | internal_error`. CI consumers can rely on these specific values; new codes ship as minor versions of the envelope schema. Per codex NOTE #5.
856
+ - **Single-write latch + uncaughtException / unhandledRejection handlers.** Module-scoped boolean in `src/cli/json-envelope.ts` flips BEFORE writing so subsequent calls no-op. The orchestrator's `runAutopilotWithJsonEnvelope` installs process-level fatal handlers that consult the latch — if an envelope already shipped, they exit silently; otherwise they emit a fallback `internal_error` envelope before exiting `1`. Test seam `__testInstallProcessHandlers: false` keeps the handlers from leaking across the suite. Per codex WARNING #2.
857
+ - **Deterministic exit-code-to-errorCode mapping** via `computeAutopilotExitCode`. `0` success / `1` `invalid_config | phase_failed | internal_error` / `2` `lock_held | corrupted_state | partial_write` / `78` `budget_exceeded | needs_human`. Per codex WARNING #3.
858
+ - **Cache contract version policy** in `src/core/run-state/state.ts` + the replay path in `events.ts`. New exports `RUN_STATE_MIN_SUPPORTED_SCHEMA_VERSION = 1` and `RUN_STATE_MAX_SUPPORTED_SCHEMA_VERSION = RUN_STATE_SCHEMA_VERSION`. `replayState()` throws `corrupted_state` when the persisted `schema_version` falls outside the window, with a message naming both bounds for operator triage. Future minor versions can additively expand the schema while preserving forward-read compatibility (bump writer, leave reader); major bumps reset `MIN_SUPPORTED` to break with the past explicitly. Per codex WARNING #1.
859
+ - **Migration guide section.** New "v6.1 → v6.2: one runId across the pipeline" section in `docs/v6/migration-guide.md` walks through the per-verb → orchestrator collapse, the `--json` envelope shape (success / pre-run failure / mid-pipeline failure examples), the `AutopilotErrorCode` taxonomy table, and the cache version policy. Flags the v6.2.0 vs v6.2.1 phase-set difference per codex NOTE #4 — examples assume the v6.2.1 6-phase set (`scan → spec → plan → implement → migrate → pr`).
860
+ - **Channel discipline preserved.** The envelope is the only thing on stdout in `--json` mode (orchestrator runs with `__silent: true`). NDJSON events continue to flow to stderr unchanged via the existing v6 Phase 5 helpers.
861
+ - **Dispatcher wiring.** `src/cli/index.ts` plumbs `--json` through to `runAutopilotWithJsonEnvelope`; pre-run validation failures (`--mode`, `--budget`) emit envelopes too so CI never sees free-text errors when `--json` is on.
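
The exit-code mapping and the single-write latch described in the list above can be sketched as follows. The error-code strings and exit codes are taken from this entry; the names `exitCodeForSketch` and `writeEnvelopeOnceSketch` are hypothetical, and the real `computeAutopilotExitCode` signature is not shown in this changelog.

```typescript
type AutopilotErrorCode =
  | "invalid_config"
  | "budget_exceeded"
  | "lock_held"
  | "corrupted_state"
  | "partial_write"
  | "needs_human"
  | "phase_failed"
  | "internal_error";

// Mapping as listed in the entry: 1 general failure, 2 engine error, 78 blocked.
const EXIT_CODES: Record<AutopilotErrorCode, 1 | 2 | 78> = {
  invalid_config: 1,
  phase_failed: 1,
  internal_error: 1,
  lock_held: 2,
  corrupted_state: 2,
  partial_write: 2,
  budget_exceeded: 78,
  needs_human: 78,
};

// No error code means success (exit 0).
function exitCodeForSketch(errorCode?: AutopilotErrorCode): number {
  return errorCode === undefined ? 0 : EXIT_CODES[errorCode];
}

// Module-scoped latch: flipped BEFORE writing so re-entrant calls become no-ops.
let envelopeWritten = false;

function writeEnvelopeOnceSketch(
  envelope: object,
  write: (line: string) => void,
): boolean {
  if (envelopeWritten) return false;
  envelopeWritten = true;
  write(JSON.stringify(envelope) + "\n");
  return true;
}
```

Flipping the latch before the write means a fatal handler that fires during the write will see the latch set and stay silent, which is what keeps the envelope exactly-once.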
862
+
863
+ **Tests.** Baseline 1534 → 1548 (+14 net new):
864
+
865
+ - 9 envelope tests in `tests/cli/autopilot-json-envelope.test.ts` covering the 6 spec scenarios (success, pre-run failure, mid-pipeline failure, no-ANSI on stdout, stdout purity under stderr load, single-write latch + uncaughtException) plus 1 latch sanity test and 2 exit-code/enum mapping tests.
866
+ - 5 schema-version range tests in `tests/run-state/state.test.ts` covering the bounds export plus accept-in-range, reject-below-MIN, reject-above-MAX, and message-names-both-bounds.
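
The range check that these schema-version tests exercise might look roughly like this. The constant names match the exports named in this entry, but `RUN_STATE_SCHEMA_VERSION = 1` is an assumed value, and `CorruptedStateErrorSketch` and `assertSupportedSchemaVersionSketch` are hypothetical names.

```typescript
// Assumed current writer version; the entry says v6.x has only one schema version.
const RUN_STATE_SCHEMA_VERSION = 1;
const RUN_STATE_MIN_SUPPORTED_SCHEMA_VERSION = 1;
const RUN_STATE_MAX_SUPPORTED_SCHEMA_VERSION = RUN_STATE_SCHEMA_VERSION;

// Hypothetical error type carrying the `corrupted_state` code.
class CorruptedStateErrorSketch extends Error {
  readonly code = "corrupted_state";
}

// Reject any persisted schema_version outside the supported window, naming
// both bounds in the message so an operator can triage quickly.
function assertSupportedSchemaVersionSketch(schemaVersion: number): void {
  const min = RUN_STATE_MIN_SUPPORTED_SCHEMA_VERSION;
  const max = RUN_STATE_MAX_SUPPORTED_SCHEMA_VERSION;
  if (!Number.isInteger(schemaVersion) || schemaVersion < min || schemaVersion > max) {
    throw new CorruptedStateErrorSketch(
      `schema_version ${schemaVersion} is outside the supported window [${min}..${max}]`,
    );
  }
}
```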
867
+
868
+ **Engine-off path unchanged.** The schema-version range check applies inside `replayState()` (engine-on territory). Engine-off invocations don't read run dirs and are byte-for-byte identical to v6.2.1.
869
+
870
+ **Out of scope (deliberate, see spec for full list).**
871
+ - `--json` envelope on individual wrapped verbs other than `autopilot`. They already emit per-verb envelopes via the v6 Phase 5 helper; no change needed.
872
+ - Streaming JSON (newline-delimited progress events on stdout). v6.3 — would need a major channel-discipline change.
873
+ - Schema migration tooling. v6.x has only one schema version; migration tooling is reserved for the v7 layout change.
874
+
875
+ **Spec.** docs/specs/v6.2.2-json-envelope-and-docs.md (3 WARNING + 3 NOTE folded from the Codex 5.3 review).
876
+
877
+ ## 6.2.1 — Side-effect phase idempotency contracts (`migrate` + `pr`) (2026-05-07)
878
+
879
+ **Headline.** Side-effecting phases now satisfy a registry-enforced two-step contract — record a deterministic "I'm starting this work" breadcrumb BEFORE the side-effect, then one reconciliation ref per durable artifact AFTER. With the contract in place, `migrate` and `pr` enter the orchestrator's `--mode=full` registry, expanding the v6.2.0 `scan → spec → plan → implement` pipeline to the full **6-phase** flow `scan → spec → plan → implement → migrate → pr` under one runId.
880
+
881
+ **Motivation — Codex CRITICAL gate from v6.2.** The v6.2 orchestrator spec flagged side-effect resume as the riskiest property to certify before adding `migrate` or `pr`: a partial crash mid-dispatch could leave the engine blind to applied work, causing the resume preflight to either silently re-run side effects (data loss) or pessimistically refuse every retry (operability tax). v6.2.1 closes the gap with a uniform contract every side-effecting phase must declare AND a registry-time guard that throws if the declaration is missing.
882
+
883
+ **What's in (the 7 deliverables from spec section "Scope of THIS PR").**
884
+
885
+ - **New `migration-batch` ref kind** in `ExternalRefKind` (`src/core/run-state/types.ts`). Documented semantics: "deterministic id covers a planned migration batch; emitted BEFORE dispatch so a partial crash leaves a resume target." Joins `migration-version` (the post-effect reconciliation ref).
886
+ - **`migrate` pre-effect breadcrumb.** `src/cli/migrate.ts` now emits a `migration-batch` ref BEFORE `dispatchFn(input)` — a partial crash leaves the orchestrator a resume target. The post-success `migration-version` refs stay (one per applied migration). Per the v6.2.1 spec, the batch id uses the `${env}:pre-dispatch:${Date.now()}` fallback form because no Delegance migrate skill (Supabase, Rails, Alembic, …) exposes its planned set pre-dispatch — the deterministic-id form `sha256(env+plannedMigrations)` is reserved for a follow-up that adds a planning verb to the skill protocol.
887
+ - **Provider readback for `migration-batch`** in `src/core/run-state/provider-readback.ts`. Queries the dispatcher's ledger for the planned set + applied set, returns `merged` (all applied), `open` (some pending), `failed` (any errored), or `unknown` (fail closed on missing fetcher / throw / null). New `MigrationBatchFetcher` interface + `registerMigrationBatchFetcher` seam alongside the existing `MigrationStateFetcher`.
888
+ - **Registry-time enforcement** in `src/core/run-state/phase-registry.ts`. New `registerPhase()` helper throws `Error: registry: side-effect phase <name> missing idempotency contract` when a `hasSideEffects: true` registration omits `preEffectRefKinds` or `postEffectRefKinds`. Applied to all six entries; the four read-only phases (scan/spec/plan/implement) omit the arrays without complaint.
889
+ - **`buildMigratePhase` and `buildPrPhase` builders** extracted following the v6.2.0 builder pattern (scan/spec/plan/implement). Each verb's existing `runX(options)` continues to delegate to its builder — direct CLI behavior is byte-for-byte identical to v6.2.0. The full registry now has: `scan / spec / plan / implement / migrate / pr`.
890
+ - **Resume preflight in orchestrator** (`src/cli/autopilot.ts` + new `src/core/run-state/resume-preflight.ts`). Before invoking `runPhase` on any side-effecting phase, the orchestrator collects prior `phase.success` + `phase.externalRef` events from `events.ndjson` and routes per the spec decision matrix: all post-effect refs `merged`/`live` → emit synthetic `phase.success` and skip; pre-effect breadcrumb `open` → retry (the phase body's own ledger handles dedup); otherwise → emit `replay.override` + throw `GuardrailError('needs_human')`. New error code `needs_human` joins the taxonomy in `src/core/errors.ts`.
891
+ - **`--mode=full` extended** to 6 phases (`DEFAULT_FULL_PHASES` in `phase-registry.ts`). After v6.2.1, `claude-autopilot autopilot` runs the entire pipeline under one runId — the YC-demo win deferred from v6.2.0.
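
The registry-time guard from the list above can be sketched as follows. The error message and field names come from this entry; the `PhaseRegistrationSketch` shape and the `registerPhaseSketch` name are simplified, hypothetical stand-ins for the real types.

```typescript
// Simplified, hypothetical registration shape.
interface PhaseRegistrationSketch {
  name: string;
  hasSideEffects: boolean;
  preEffectRefKinds?: readonly string[];
  postEffectRefKinds?: readonly string[];
}

// Throw at registration time when a side-effecting phase omits either array.
// An empty array still counts as a declaration; only omission is rejected.
function registerPhaseSketch(
  reg: PhaseRegistrationSketch,
): PhaseRegistrationSketch {
  const missingContract =
    reg.preEffectRefKinds === undefined || reg.postEffectRefKinds === undefined;
  if (reg.hasSideEffects && missingContract) {
    throw new Error(
      `registry: side-effect phase ${reg.name} missing idempotency contract`,
    );
  }
  return reg;
}

// Read-only phase: no arrays needed.
registerPhaseSketch({ name: "scan", hasSideEffects: false });

// Side-effecting phase with the ref kinds named in this entry.
registerPhaseSketch({
  name: "migrate",
  hasSideEffects: true,
  preEffectRefKinds: ["migration-batch"],
  postEffectRefKinds: ["migration-version"],
});
```

Failing at registration rather than at dispatch means a phase without a contract can never reach the orchestrator.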
892
+
893
+ **Tests.** Baseline 1509 → 1532 (+23 net new):
894
+
895
+ - 9 gating tests in `tests/cli/autopilot-side-effect-resume.test.ts` covering the 6 spec scenarios (migrate partial-crash retry, migrate full-success skip, pr-open skip, pr-closed needs-human, registry rejection, run-scope budget no-double-charge) plus 3 edge cases (proceed-fresh, prior success without refs, errored-ledger needs-human).
896
+ - 8 unit tests in `tests/run-state/provider-readback.test.ts` covering the new `migration-batch` readback (merged / open / failed / empty plan / null fetcher / throw / no fetcher / default-registry routing).
897
+ - 2 updated tests in `tests/cli/migrate-engine-smoke.test.ts` to account for the new pre-effect breadcrumb (now `1 + N` refs per run instead of `N`).
898
+ - 4 new test variants for the contract guard (`hasSideEffects: true` with each missing array, plus the empty-postEffect / read-only positive cases).
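
The resume-preflight decision matrix that the gating tests cover can be approximated with the sketch below. The three documented outcomes (skip, retry, `needs_human`) follow the order given in the "Resume preflight in orchestrator" bullet; the `proceed` outcome for a phase with no prior refs is an assumption inferred from the "proceed-fresh" test name, and all type and function names are hypothetical. The sketch looks only at readback states and ignores prior `phase.success` events.

```typescript
type ReadbackStateSketch = "merged" | "live" | "open" | "failed" | "unknown";
type ResumeDecisionSketch = "proceed" | "skip" | "retry" | "needs_human";

function decideResumeSketch(
  preEffect: readonly ReadbackStateSketch[],
  postEffect: readonly ReadbackStateSketch[],
): ResumeDecisionSketch {
  // Assumption: nothing recorded for this phase yet, so run it normally.
  if (preEffect.length === 0 && postEffect.length === 0) return "proceed";

  // All post-effect refs settled: emit a synthetic phase.success and skip.
  const settled = (s: ReadbackStateSketch) => s === "merged" || s === "live";
  if (postEffect.length > 0 && postEffect.every(settled)) return "skip";

  // Pre-effect breadcrumb still open: retry and let the phase's ledger dedup.
  if (preEffect.length > 0 && preEffect.every((s) => s === "open")) return "retry";

  // Anything else (failed, unknown, mixed) fails closed.
  return "needs_human";
}
```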
899
+
900
+ **Engine-off path unchanged.** Existing `migrate`/`pr` invocations without `--engine` continue byte-for-byte identical. The engine-off escape hatch threads through `executeMigratePhase(input, null)` / `executePrPhase(input, null)`, where a null `ctx` makes `emitExternalRef` a no-op — same precedent as every other wrapped verb.
901
+
902
+ **Out of scope (deliberate, see spec for full list).**
903
+ - Deterministic batch id (`sha256(env + plannedMigrations)`) — requires extracting a `planMigrations()` verb from each migrate skill's protocol. v6.2.x follow-up.
904
+ - `implement`'s `git-remote-push` ref (declared in the spec table but not yet emitted by `implement.ts`). v6.2.x follow-up.
905
+ - Cross-run ref dedup (e.g. recognizing two pre-dispatch breadcrumbs as the same operation across runs). Not needed for orchestrator MVP.
906
+ - Provider readback for non-Delegance migrate skills (Rails, Alembic, …). v6.2.1 ships the contract; per-skill readback is per-skill follow-up work.
907
+
908
+ **Spec.** docs/specs/v6.2.1-side-effect-idempotency.md (Codex CRITICAL gate from v6.2 — folded back as the foundation for this PR).
909
+
910
+ ## 6.2.0 — Multi-phase orchestrator (`claude-autopilot autopilot`) (2026-05-07)
911
+
912
+ **Headline.** New top-level `claude-autopilot autopilot` verb runs `scan → spec → plan → implement` under **one runId**. The pre-v6.2 chain (`scan && spec && plan && implement`) created four separate runs with no parent — the orchestrator collapses them into a single ledger so `claude-autopilot runs watch <id>` covers the whole pipeline and a `--budget=$25` cap ticks down across phases instead of resetting per verb.
913
+
914
+ **What's in.**
915
+ - **`claude-autopilot autopilot [options]`** — sequential N-phase orchestrator. Engine-on REQUIRED (rejected at pre-flight if `--no-engine` / `CLAUDE_AUTOPILOT_ENGINE=off` / `engine.enabled: false`). Lifecycle: `createRun({ phases })` → per-phase `buildPhase + runPhase` → emit `run.complete` exactly once → refresh state snapshot → release lock in `finally`. Non-interactive (a `pause` budget decision becomes hard-fail) so it works in CI without prompting.
916
+ - **`build<Phase>Phase()` builders** extracted from `scan`, `spec`, `plan`, `implement`. Each verb's existing `runX(options)` continues to call its builder internally — direct CLI behavior is byte-for-byte identical to v6.1. Per-verb parity tests (`tests/cli/<verb>-builder-parity.test.ts`) compare stdout / stderr / `events.ndjson` between the legacy entry and the explicit builder + `runPhaseWithLifecycle` path.
917
+ - **Phase registry** at `src/core/run-state/phase-registry.ts`. `as const` + per-entry `satisfies PhaseRegistration<I, O>` preserves per-phase I/O typing through dynamic dispatch (per codex review NOTE #5). `getPhase(name)`, `listPhaseNames()`, and `validatePhaseNames(names)` are the public surface; `--phases=<csv>` validation lives here.
918
+ - **Run-scope budget** — `BudgetConfig.scope: 'phase' | 'run'` (default `'phase'` for back-compat). When `scope === 'run'` the orchestrator's per-phase budget gates resolve against cross-phase `phase.cost` totals so the `$25` demo narrative ticks down across the whole pipeline. `sumPhaseCost(events, '*')` cross-phase overload added. Both `BudgetCheck.scope` and `BudgetCheckEvent.scope` carry the resolution forward to observers (`runs show <id> --events`, future cost dashboards). Per codex review WARNING #2 — pulled forward into v6.2.0 (was deferred to v6.2.2 in the initial draft).
919
+ - **Exit-code matrix** (per codex review WARNING #3) — 0 success, 78 budget_exceeded, 2 engine error (`lock_held` / `corrupted_state` / `partial_write`), 1 everything else. Phase failure wins over finalization error.
920
+ - **CLI surface**: `--mode=full` (default — `scan → spec → plan → implement`), `--phases=<csv>` for custom lists, `--budget=<usd>` for the run-scope cap. `--mode=fix` and `--mode=review` reserved for v6.2.1+; `--json` envelope reserved for v6.2.2.
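
The difference between phase-scope and run-scope budgeting described above comes down to which `phase.cost` events are summed. A minimal sketch, assuming a simplified event shape (`costUSD` and `phase` fields are guesses, and the `Sketch`-suffixed names are hypothetical):

```typescript
// Simplified, assumed event shape; the real events carry more fields.
interface RunEventSketch {
  type: string;
  phase?: string;
  costUSD?: number;
}

// Sum phase.cost events for one phase, or for every phase when phase is '*'.
function sumPhaseCostSketch(
  events: readonly RunEventSketch[],
  phase: string,
): number {
  return events
    .filter((e) => e.type === "phase.cost" && (phase === "*" || e.phase === phase))
    .reduce((total, e) => total + (e.costUSD ?? 0), 0);
}

// Resolve how much of the cap is spent for a given scope.
function spentForScopeSketch(
  events: readonly RunEventSketch[],
  currentPhase: string,
  scope: "phase" | "run",
): number {
  return sumPhaseCostSketch(events, scope === "run" ? "*" : currentPhase);
}
```

With `scope: 'run'`, a cap set through `--budget` is compared against the total across phases, so earlier phases reduce what later phases may spend.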
921
+
922
+ **Tests.** Baseline 1492 → 1509 (+17 new):
923
+ - 4 builder-parity tests (`scan`, `spec`, `plan`, `implement`) covering stdout / stderr / events triple-snapshot.
924
+ - 6 run-scope budget tests in `tests/run-state/budget.test.ts` covering scope flag default, run-scope happy path, run-scope cap exceeded across phases, Layer 1 advisory in run-scope, and phase/run scope math equivalence (regression guard).
925
+ - 7 orchestrator integration tests in `tests/cli/autopilot.test.ts` covering: 3-phase happy path, scan-failure phase 0, run-scope budget exceeded → exit 78, resume lookup `already-complete` short-circuit, `--phases=invalid,scan` → exit 1 invalid_config no run dir, `CLAUDE_AUTOPILOT_ENGINE=off` → exit 1 invalid_config, `cliEngine: false` → exit 1 invalid_config.
926
+
927
+ **Out of scope (deliberate, see spec for full list).**
928
+ - `migrate`, `pr` — gated on per-phase idempotency contracts (preflight readback + externalRef recorded BEFORE side-effect). v6.2.1.
929
+ - `--mode=fix`, `--mode=review` — v6.2.1+.
930
+ - `--json` envelope — v6.2.2.
931
+ - Parallel phase execution. Sequential by design.
932
+ - Interactive prompts inside the orchestrator. CI/scripts get deterministic exit codes; pause budget decisions hard-fail.
933
+
934
+ **Spec.** docs/specs/v6.2-multi-phase-orchestrator.md (Codex-reviewed: 1 CRITICAL + 3 WARNING + 3 NOTE folded back into the spec before implementation).
935
+
936
+ ## 6.1.0 — Default flip: engine on by default + `--no-engine` deprecated (2026-05-07)
937
+
938
+ **Headline.** The Run State Engine is now ON by default. Bare
939
+ `claude-autopilot <verb>` invocations create a `.guardrail-cache/runs/<ulid>/`
940
+ directory, emit typed NDJSON events on stderr, apply budget gates if
941
+ `budgets:` is configured, and write a state snapshot — without any opt-in
942
+ config. v6.0 shipped the engine OFF behind an explicit `engine.enabled: true`
943
+ opt-in to give users control during a stabilization window; v6.1 closes
944
+ that window.
945
+
946
+ **Motivation — v6.0 stabilization criteria met.**
947
+ - 10 of 10 pipeline phases wrapped through `runPhaseWithLifecycle`
948
+ (`scan` v6.0.1, `costs`/`fix` v6.0.2, `brainstorm`/`spec` v6.0.3,
949
+ `plan`/`review` v6.0.4, `validate` v6.0.5, `implement` v6.0.7,
950
+ `migrate` v6.0.8 — first side-effecting wrap with `migration-version`
951
+ externalRefs, `pr` v6.0.9 — second side-effecting wrap with `github-pr`
952
+ externalRefs).
953
+ - Lifecycle helper extracted (v6.0.6) so all 10 wraps share the same
954
+ byte-for-byte engine-on / engine-off behavior.
955
+ - Side-effecting wraps proven (`migrate` + `pr`) — externalRef ledger
956
+ + provider readback semantics exercised end-to-end.
957
+ - Live adapter cert suite green (Vercel + Fly + Render).
958
+ - `runs watch <id>` live cost/budget meter shipped (this release's
959
+ `v6.1.0-pre` entry below) — the YC-demo moment for the events stream.
960
+ - `npm test` baseline: 1469 → 1492 (+23 net new this release; all green).
961
+
962
+ **Deprecation.** `--no-engine`, `CLAUDE_AUTOPILOT_ENGINE=off|false|0|no`,
963
+ and `engine.enabled: false` continue to work as the legacy escape hatch
964
+ in v6.1.x. Each invocation that resolves to engine-off via one of those
965
+ explicit opt-outs now prints a single-line stderr deprecation notice:
966
+
967
+ ```
968
+ [deprecation] --no-engine / engine.enabled: false will be removed in v7. Migrate to engine-on (default).
969
+ ```
970
+
971
+ The notice fires only on user-driven opt-outs (`source: 'cli' | 'env' |
972
+ 'config'`); the new (engine-on) default never trips it. **v7 removes
973
+ the escape hatch** — `engine.enabled: false` becomes a config validation
974
+ error and `--no-engine` / `CLAUDE_AUTOPILOT_ENGINE=off` are silently
975
+ ignored.
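
The rule for when the notice fires can be sketched as a small predicate. The message text is copied from the block above; the `'default'` source label, the resolution shape, and the `Sketch`-suffixed names are assumptions, since the real `shouldWarnEngineOffDeprecation` signature is not shown here.

```typescript
// 'cli' | 'env' | 'config' come from this entry; 'default' is an assumed label
// for the built-in default.
type EngineSourceSketch = "cli" | "env" | "config" | "default";

interface EngineResolutionSketch {
  enabled: boolean;
  source: EngineSourceSketch;
}

const ENGINE_OFF_DEPRECATION_MESSAGE_SKETCH =
  "[deprecation] --no-engine / engine.enabled: false will be removed in v7. Migrate to engine-on (default).";

// Warn only when the engine is off AND the user explicitly asked for that.
function shouldWarnSketch(resolution: EngineResolutionSketch): boolean {
  return !resolution.enabled && resolution.source !== "default";
}

// Emit at most one line per invocation through the supplied writer.
function emitDeprecationSketch(
  resolution: EngineResolutionSketch,
  write: (line: string) => void,
): void {
  if (shouldWarnSketch(resolution)) {
    write(ENGINE_OFF_DEPRECATION_MESSAGE_SKETCH + "\n");
  }
}
```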
976
+
977
+ **Spec.** [`docs/specs/v6.1-default-flip.md`](docs/specs/v6.1-default-flip.md)
978
+ is the canonical reference for what flipped, why, and the v7 follow-up.
979
+
980
+ **Migration tips.**
981
+ - If your CI parses stderr as free-form text and relies on the v5.x
982
+ shape, set `CLAUDE_AUTOPILOT_ENGINE=off` (or pass `--no-engine`)
983
+ to pin the legacy behavior. You'll see the deprecation notice on
984
+ every invocation until you remove it — that's expected.
985
+ - If you opt out via config (`engine.enabled: false`), the same notice
986
+ fires on every invocation. Plan to remove that line before bumping
987
+ to v7.
988
+ - Existing users on `engine.enabled: true` are no-op'd — your config
989
+ still wins via the same precedence rules.
990
+ - See [`docs/v6/migration-guide.md#migrating-from-v60-to-v61`](docs/v6/migration-guide.md)
991
+ for the full upgrade walkthrough.
992
+
993
+ **Test surface.**
994
+ - `tests/run-state/resolve-engine.test.ts` — flipped 4 default-related
995
+ cases. New `v6.1 default-flip` describe block + `v6.1 deprecation
996
+ warning` describe block covering the predicate, the emitter, the
997
+ default `process.stderr` branch, and the `builtInDefault` override
998
+ path.
999
+ - `tests/run-state/run-phase-with-lifecycle.test.ts` — added 4 new
1000
+ cases pinning engine-on as the new default + the deprecation banner
1001
+ firing on opt-out / staying silent on the new default.
1002
+ - 9 engine-smoke tests (`brainstorm`, `costs`, `implement`, `migrate`,
1003
+ `plan`, `pr`, `review`, `spec`, `validate`) updated — the
1004
+ "engine off (default)" cases are now "engine on (v6.1 default)";
1005
+ the matching `cliEngine: false` cases stay as legacy-escape-hatch
1006
+ coverage.
1007
+
1008
+ **Files changed.**
1009
+ - `src/core/run-state/resolve-engine.ts` — new active default constant
1010
+ `ENGINE_DEFAULT_V6_1 = true`. The deprecated `ENGINE_DEFAULT_V6_0`
1011
+ export keeps its historical value (`false`) so out-of-tree consumers
1012
+ who pinned that symbol get what the name promises; both constants are
1013
+ removed in v7. New `emitEngineOffDeprecationWarning` helper +
1014
+ `shouldWarnEngineOffDeprecation` predicate +
1015
+ `ENGINE_OFF_DEPRECATION_MESSAGE` stable copy.
1016
+ - `src/core/run-state/run-phase-with-lifecycle.ts` — wires the
1017
+ deprecation helper into the engine-off branch.
1018
+ - `docs/v6/migration-guide.md` — new "Migrating from v6.0 to v6.1"
1019
+ section, updated precedence matrix, refreshed default-flip plan,
1020
+ relabeled "What changes" table.
1021
+ - `README.md` — v6 section updated (engine on by default + v7 removal
1022
+ timeline).
1023
+ - `package.json` — version `5.5.2` → `6.1.0`.
1024
+
1025
+ ## v6.1.0-pre — `runs watch <id>` live cost meter (2026-05-07)
1026
+
1027
+ **The YC-demo moment.** v6.0.x hardened the events.ndjson stream across
1028
+ all 10 wrapped phases; v6.1 makes that stream visible in real time.
1029
+ `runs watch <runId>` tails events.ndjson via `fs.watchFile` (1s poll —
1030
+ inotify/FSEvents are unreliable for tiny appends across our matrix) and
1031
+ pretty-renders each event with a running cost/budget meter so a user
1032
+ running `claude-autopilot autopilot ...` in one terminal can `runs watch`
1033
+ in another and watch their $25 budget tick down while phases ship code.
1034
+
1035
+ **Demo transcript.** Live tail of a fixture run, ANSI-stripped:
1036
+
1037
+ ```
1038
+ * run 01HZK7P3D8Q9V00000000000AB
1039
+ phases: spec -> plan -> implement -> pr
1040
+ budget: $0.00 / $25.00 (0%)
1041
+ [12:00:01] phase.start spec
1042
+ [12:00:42] phase.cost spec +$0.07 (in: 1.2k, out: 3.4k) total: $0.07
1043
+ [12:00:45] phase.success spec OK 44.2s
1044
+ [12:00:46] phase.start plan
1045
+ [12:01:12] phase.cost plan +$0.21 (in: 4.1k, out: 8.2k) total: $0.28
1046
+ [12:01:15] phase.success plan OK 29.0s
1047
+ [12:08:33] phase.externalRef pr -> github-pr#123
1048
+ [12:08:34] run.complete status=success totalCostUSD=$4.20 duration=8m32s
1049
+
1050
+ done run 01HZK7P3D8Q9V00000000000AB
1051
+ status=success totalCostUSD=$4.20 duration=8m33s
1052
+ ```
1053
+
1054
+ **Modes.**
1055
+
1056
+ - `runs watch <id>` — live tail, exits on `run.complete` / Ctrl-C
1057
+ - `runs watch <id> --since <seq>` — replay forward from a specific seq
1058
+ (resume after disconnect)
1059
+ - `runs watch <id> --no-follow` — render snapshot once and exit (CI /
1060
+ scripting)
1061
+ - `runs watch <id> --json` — emit raw NDJSON to stdout (one event per
1062
+ line) for piping to `jq` or external dashboards. ANSI suppressed.
1063
+ - `runs watch <id> --no-color` — force ANSI off even on a TTY
1064
+
1065
+ **Pretty rendering.** Color thresholds on the budget bar — green <50%,
1066
+ yellow 50-90%, red >90%. Per-event coloring: cyan for phase.start, yellow
1067
+ for phase.cost, green for phase.success, red for phase.failed, magenta
1068
+ for phase.externalRef + lock.takeover + replay.override, bold-green for
1069
+ run.complete success, bold-red for run.complete failed/aborted. ANSI
1070
+ auto-strips when stdout is not a TTY (CI), when `--no-color` or `--json`
1071
+ is set, or when `NO_COLOR` env var is present.
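
The color thresholds and ANSI suppression rules above reduce to two small pure functions. This is a sketch under assumptions: the function names and option shape are hypothetical, and the treatment of exactly 50% and exactly 90% as yellow follows the "yellow 50-90%" wording.

```typescript
type BarColorSketch = "green" | "yellow" | "red";

// green <50%, yellow 50-90%, red >90%. A non-positive cap is treated as 0% spent.
function budgetBarColorSketch(spentUSD: number, capUSD: number): BarColorSketch {
  const pct = capUSD > 0 ? (spentUSD * 100) / capUSD : 0;
  if (pct < 50) return "green";
  if (pct <= 90) return "yellow";
  return "red";
}

interface AnsiOptionsSketch {
  isTTY: boolean;
  noColorFlag: boolean;
  json: boolean;
  env: Record<string, string | undefined>;
}

// ANSI is used only on a TTY, without --no-color or --json, and without NO_COLOR.
function useAnsiSketch(opts: AnsiOptionsSketch): boolean {
  const noColorEnv = opts.env["NO_COLOR"] !== undefined;
  return opts.isTTY && !opts.noColorFlag && !opts.json && !noColorEnv;
}
```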
1072
+
1073
+ **Pure renderer.** `src/cli/runs-watch-renderer.ts` is referentially
1074
+ transparent — `renderEventLine(event, runningTotal, opts)` is the core
1075
+ primitive, exported and 100% pure. Tests run as string-equality
1076
+ assertions in <300ms.
1077
+
1078
+ **Engine modules untouched.** This is purely a consumer of the existing
1079
+ event stream — no changes to `src/core/run-state/**`, no changes to the
1080
+ 10 wrapped phase verbs, no changes to `runPhaseWithLifecycle`.
1081
+
1082
+ **Tests.** +43 new tests:
1083
+ - `tests/cli/runs-watch-renderer.test.ts` — 29 pure-renderer cases
1084
+ covering every event-line variant, the three budget-bar color
1085
+ thresholds, ANSI on/off symmetry, and the final-summary block
1086
+ - `tests/cli/runs-watch.test.ts` — 14 verb-level cases covering
1087
+ `--no-follow` snapshot, `--since` replay, `--json` mode, run-not-found
1088
+ (exit 2), invalid-ULID, live tail picks up appended events,
1089
+ budget rendering with/without `BudgetConfig`, plural `budgets` config
1090
+ alias, ANSI behavior, and run-complete short-circuit on already-
1091
+ terminated runs
1092
+
1093
+ **CLI plumbing.** New sub-verb on the `runs` umbrella: `runs watch <id>`.
1094
+ Help block surfaces `--since`, `--no-follow`, `--json`, `--no-color`
1095
+ plus a behavior summary + exit-code key. Exit codes: 0 success / clean
1096
+ exit, 1 invalid input or stream error, 2 not_found.
1097
+
1098
+ ## v6.0.9 — wrap `pr` through `runPhaseWithLifecycle` (2026-05-06)
1099
+
1100
+ **First side-effecting phase wrapped.** v6.0.1 → v6.0.5 wrapped read-only
1101
+ verbs (`scan`, `costs`, `fix`, `brainstorm`, `spec`, `plan`, `review`,
1102
+ `validate`); v6.0.6 extracted the lifecycle helper. v6.0.9 wraps `pr` —
1103
+ the first verb that mutates state on the platform of record (GitHub
1104
+ issue comments + PR reviews). This proves the helper's `ctx.emitExternalRef`
1105
+ plumbing for genuinely side-effecting phases without any helper-shape
1106
+ changes.
1107
+
1108
+ **Declarations.** Match the v6 spec table exactly:
1109
+
1110
+ - `idempotent: false` — re-running posts a NEW PR review ID each time
1111
+ (`postReviewComments` dismisses prior + creates new). PR comment
1112
+ posting (`postPrComment`) is marker-deduped on the body but the
1113
+ underlying `gh` API call is still mutating.
1114
+ - `hasSideEffects: true` — posts to GitHub via the `gh` CLI inside the
1115
+ inner `runCommand` invocation.
1116
+ - `externalRefs: github-pr` — recorded BEFORE the inner `runCommand`
1117
+ runs so a crash mid-pipeline still leaves a breadcrumb pointing at
1118
+ the PR. The engine path's Phase 6 resume logic can `gh pr view <id>`
1119
+ to confirm the PR is still open before deciding whether a replay
1120
+ is safe.
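
The ordering in the `externalRefs` bullet (record the ref first, then run the mutating command) can be sketched as follows. The interfaces are simplified stand-ins, not the real `RunPhase` or context types, and `makePrPhase` and the synchronous `runCommand` signature are hypothetical; the declared flags and the `github-pr` / `github` values follow the changelog.

```typescript
// Simplified, hypothetical stand-ins for the engine types.
interface ExternalRef {
  kind: string;
  id: string;
  provider?: string;
}
interface PhaseContext {
  emitExternalRef(ref: ExternalRef): void;
}
interface RunPhase<I, O> {
  name: string;
  idempotent: boolean;
  hasSideEffects: boolean;
  run(input: I, ctx: PhaseContext): O;
}

interface PrInput {
  prNumber: number;
}
interface PrOutput {
  exitCode: number;
}

// runCommand is injected so the sketch can run without `gh`.
function makePrPhase(runCommand: (prNumber: number) => number): RunPhase<PrInput, PrOutput> {
  return {
    name: "pr",
    idempotent: false,
    hasSideEffects: true,
    run(input, ctx) {
      // Record the breadcrumb BEFORE the side-effecting call, so a crash
      // inside runCommand still leaves a pointer at the PR.
      ctx.emitExternalRef({ kind: "github-pr", id: String(input.prNumber), provider: "github" });
      // A non-zero pipeline result is returned as output, not thrown.
      return { exitCode: runCommand(input.prNumber) };
    },
  };
}
```

Returning the exit code as ordinary output is one way to keep a pipeline result of 1 from being recorded as a phase failure.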
1121
+
1122
+ **Engine-off byte-for-byte unchanged.** All `gh pr view` + `git fetch` +
1123
+ `runCommand` behavior preserved. The wrap adds two test seams
1124
+ (`__testPrMeta` to short-circuit PR metadata lookup, `__testRunCommand`
1125
+ to stub the inner pipeline) so the smoke test exercises the engine
1126
+ lifecycle without `gh` or a real review pipeline. Production callers
1127
+ must not pass these — they're documented "test only" with a comment
1128
+ mirroring scan / fix's `__testReviewEngine` precedent.
1129
+
1130
+ **CLI plumbing.** The `pr` dispatcher arm now threads `cliEngine` from
1131
+ `parseEngineCliFlag()` and `envEngine` from
1132
+ `process.env.CLAUDE_AUTOPILOT_ENGINE`, mirroring every other wrapped
1133
+ verb. The per-verb help block (`claude-autopilot help pr`) gains
1134
+ `--engine` / `--no-engine` lines plus a side-effects note (engine-on
1135
+ records a `github-pr` externalRef; future replays gate on the spec's
1136
+ "side-effect readback" rule). `GLOBAL_FLAGS_BLOCK` adds "v6.0.9: wired
1137
+ for `pr`" to its breadcrumb list.
1138
+
1139
+ **Smoke test.** New `tests/cli/pr-engine-smoke.test.ts`, 6 cases:
1140
+ - engine off (default): no run dir / no engine artifacts; runCommand
1141
+ still invoked
1142
+ - engine off (`cliEngine: false`): no run dir
1143
+ - engine on (`--engine`): state.json + events.ndjson + lifecycle in
1144
+ order (run.start → phase.start → phase.externalRef → phase.success
1145
+ → run.complete); externalRef recorded with kind=`github-pr`,
1146
+ id=`42`, provider=`github`; `idempotent: false, hasSideEffects: true`
1147
+ reflected on the phase
1148
+ - env precedence (`CLAUDE_AUTOPILOT_ENGINE=on` without CLI flag)
1149
+ - CLI override (`--no-engine` beats env on)
1150
+ - runCommand returning 1 surfaces as verb exit 1 WITHOUT marking the
1151
+ engine phase as failed (pipeline result ≠ phase failure, same
1152
+ precedent as scan)
1153
+
1154
+ **Why no follow-up `github-comment` externalRef yet.** A potential
1155
+ extension is to record one externalRef per posted comment / review
1156
+ (`github-comment`). That requires plumbing the post-comment URL out
1157
+ of `runCommand` (currently only logged) — deferred to a follow-up PR.
1158
+ For v6.0.9 the `github-pr` ref is sufficient for the spec's readback
1159
+ rule: a Phase 6 resume can verify the PR is still open before
1160
+ deciding whether to retry.
1161
+
1162
+ **Files changed.** `src/cli/pr.ts` (270 insertions / 22 deletions),
1163
+ `src/cli/index.ts` (+12 lines for engine knob plumbing),
1164
+ `src/cli/help-text.ts` (+8 lines for the per-verb Options block +
1165
+ breadcrumb), `tests/cli/pr-engine-smoke.test.ts` (new, 306 lines),
1166
+ `docs/v6/wrapping-pipeline-phases.md` (status header + table row +
1167
+ deviation note), `docs/v6/migration-guide.md` ("what works today" list
1168
+ adds `pr`), `docs/specs/v6-run-state-engine.md` (reconciliation block
1169
+ appended). Total: ~600 lines added, ~25 lines removed.
1170
+
1171
+ **Status after v6.0.9.** Nine of 10 phases wrapped. Remaining:
1172
+ `implement` (v6.0.7) and `migrate` (v6.0.8) — both side-effecting,
1173
+ both wrapped concurrently with this PR by parallel agents.
1174
+ - **Bundled UI polish skills** — ships `/ui`, `/simplify-ui`, `/ui-ux-pro-max`,
1175
+ `/make-interfaces-feel-better` so consumers get them via `npm install` instead
1176
+ of needing user-level skill installs. `/ui` runs the chained pass (audit →
1177
+ simplify → align → polish); the other three are individual lenses. Auto-
1178
+ discovered via the existing `skills/` directory in the package `files`
1179
+ allowlist. Pairs with the design context loader
1180
+ (`src/core/ui/design-context-loader.ts`) — both gate on the same
1181
+ `hasFrontendFiles()` predicate so they only fire when frontend files change.
1182
+
1183
+ ## v6.0.7 — wrap `implement` through `runPhaseWithLifecycle` (2026-05-07)
1184
+
1185
+ **Wraps the ninth pipeline phase.** Mechanical wrap following the v6.0.6
1186
+ helper recipe. Engine-off path is byte-for-byte unchanged (advisory print
1187
+ pointing at the Claude Code `claude-autopilot` skill); engine-on path
1188
+ creates a run dir + emits run.start / phase.start / phase.success /
1189
+ run.complete events. Concurrent dispatch — landed alongside v6.0.8
1190
+ (`migrate`) and v6.0.9 (`pr`).
1191
+
1192
+ - New `src/cli/implement.ts` — `RunPhase<ImplementInput, ImplementOutput>`
1193
+ with `idempotent: true, hasSideEffects: false`. **Documented deviation
1194
+ from spec table:** the spec at line 159 of
1195
+ `docs/specs/v6-run-state-engine.md` lists `implement` with
1196
+ `idempotent: partial, hasSideEffects: yes, externalRefs: git-remote-push`.
1197
+ That declaration assumes the verb itself writes commits and pushes them
1198
+ to a remote. The v6.0.7 CLI verb does **not** write code, run tests,
1199
+ commit, or push to a remote — all of that lives in the Claude Code
1200
+ `claude-autopilot` skill (and its delegates: `subagent-driven-development`,
1201
+ `commit-push-pr`, `using-git-worktrees`). The CLI verb is the engine-wrap
1202
+ shell — its only side effect is writing the local
1203
+ `.guardrail-cache/implement/<ts>-implement.md` log stub. If a future PR
1204
+ inlines the implement loop into the CLI verb, the declarations flip to
1205
+ match the spec table and a `ctx.emitExternalRef({ kind: 'git-remote-push',
1206
+ id: '<commit-sha>' })` call lands after each push.
1207
+ - CLI dispatcher in `src/cli/index.ts` — wires `--engine` / `--no-engine` /
1208
+ `--context` / `--plan` / `--output` / `--config` through the helper
1209
+ alongside `process.env.CLAUDE_AUTOPILOT_ENGINE`. Mirrors the validate /
1210
+ review / plan dispatcher shape.
1211
+ - Help text in `src/cli/help-text.ts` — adds `implement` to the Pipeline
1212
+ group + per-verb Options block. Bumps `GLOBAL_FLAGS_BLOCK` to cite
1213
+ v6.0.7 alongside v6.0.1 → v6.0.5.
1214
+ - New smoke test `tests/cli/implement-engine-smoke.test.ts` (6 cases) —
1215
+ asserts state.json + events.ndjson lifecycle, idempotent /
1216
+ hasSideEffects flags, env / CLI precedence, log file location.
1217
+ - Test count: 1408 → 1414 (+6). `npm test` clean. `npx tsc --noEmit`
1218
+ clean except pre-existing fixture errors.
1219
+
1220
+ ## v6.0.8 — wrap `migrate` through `runPhaseWithLifecycle` (2026-05-06)
1221
+
1222
+ **First side-effecting phase under the engine.** v6.0.1 → v6.0.6 wrapped
1223
+ eight read-only / advisory verbs (`scan`, `costs`, `fix`, `brainstorm`,
1224
+ `spec`, `plan`, `review`, `validate`). v6.0.8 wraps `migrate` — the
1225
+ first verb that mutates external state (database schema). Builds on the
1226
+ `runPhaseWithLifecycle` helper landed in v6.0.6 plus
1227
+ `ctx.emitExternalRef()` from inside the phase body for the
1228
+ `migration-version` ledger. No helper-shape changes needed.
1229
+
1230
+ **Phase declarations** match the spec table at line 162 of
1231
+ `docs/specs/v6-run-state-engine.md`:
1232
+
1233
+ ```
1234
+ idempotent: false — dispatcher output varies by ledger state
1235
+ (N applied on attempt 1, 0 on attempt 2 even
1236
+ though both are operationally safe)
1237
+ hasSideEffects: true — applies migrations, writes audit log,
1238
+ regenerates types, refreshes schema cache
1239
+ externalRefs: migration-version, scoped `<env>:<name>` per applied
1240
+ migration. Phase 6's resume gate will read these back
1241
+ against the live `migration_state` to decide
1242
+ skip-already-applied vs retry vs needs-human.
1243
+ ```
1244
+
1245
+ **Why `idempotent: false` even though the underlying Delegance migrate
1246
+ skill is ledger-guarded against double-apply:** at the *engine
1247
+ semantics* layer, `idempotent: true` means "re-running the phase against
1248
+ the same input produces equivalent output." A dispatch invocation that
1249
+ previously applied N migrations on attempt 1 and applies 0 on attempt 2
1250
+ (everything already in the ledger) DOES produce different output
1251
+ (different `appliedMigrations` list, different `status`). The spec's
1252
+ `idempotent: false` is correct.
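
A small sketch of the point above: two attempts against the same input yield different outputs, and therefore different externalRef sets. `MigrateResult` is a simplified stand-in, the `status` values and migration names are made up for illustration, and `migrationRefs` is a hypothetical helper; the `<env>:<name>` id format and the `appliedMigrations` / `status` field names come from the changelog.

```typescript
// Simplified, hypothetical result shape.
interface MigrateResult {
  status: string;
  appliedMigrations: string[];
}

interface MigrationRef {
  kind: "migration-version";
  id: string; // "<env>:<name>"
}

// One ref per applied migration, scoped by environment.
function migrationRefs(env: string, result: MigrateResult): MigrationRef[] {
  return result.appliedMigrations.map((name) => ({
    kind: "migration-version" as const,
    id: `${env}:${name}`,
  }));
}

// Attempt 1 applies two migrations; attempt 2 finds them already in the ledger.
const attempt1: MigrateResult = { status: "applied", appliedMigrations: ["001_init", "002_users"] };
const attempt2: MigrateResult = { status: "skipped", appliedMigrations: [] };

const refsAttempt1 = migrationRefs("dev", attempt1); // two refs
const refsAttempt2 = migrationRefs("dev", attempt2); // zero refs
```

Both attempts are operationally safe, yet their outputs differ, which is the sense in which the phase is declared `idempotent: false`.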
1253
+
1254
+ **Engine-off path is byte-for-byte identical to v6.0.7.** Same dispatch
1255
+ shape (`src/core/migrate/dispatcher.ts` unchanged), same render lines,
1256
+ same `--json` payload callback. CI / scripts that don't pass `--engine`
1257
+ are unaffected.
1258
+
1259
+ | File | Role |
1260
+ |---|---|
1261
+ | `src/cli/migrate.ts` (new) | Engine-wrap shell calling `runMigrate(opts) → { exitCode, result }`. Defines `MigrateInput` / `MigrateOutput` (JSON-serializable), `RunPhase<MigrateInput, MigrateOutput>` with `name: 'migrate'`, `idempotent: false`, `hasSideEffects: true`. Phase body invokes the dispatcher and emits one `migration-version` externalRef per applied migration via `ctx.emitExternalRef({ kind: 'migration-version', id: '<env>:<name>' })`. Test seam: `__testDispatch` injects a fake dispatcher so smoke tests can exercise the engine-wrap path without spawning a child process or hitting a real database |
1262
+ | `src/cli/index.ts` | dispatcher case for `migrate` routes through `runMigrate` instead of inlining `runMigrateDispatch`; threads `cliEngine` + `envEngine`. Engine-off byte-for-byte unchanged — same `--json` payload callback, same render |
1263
+ | `src/cli/help-text.ts` | per-verb Options block for `migrate` documents `--engine` / `--no-engine` + `--config`; GLOBAL_FLAGS_BLOCK breadcrumb cites v6.0.8 |
1264
+ | `tests/cli/migrate-engine-smoke.test.ts` (new) | 6 cases: engine off (default — no run dir), engine on (lifecycle events, state.json shape, idempotent: false + hasSideEffects: true declaration), externalRef emission per applied migration scoped by env, skipped status (zero externalRefs), dispatcher error → exit 1 + engine still records phase.success (domain failure ≠ engine failure), CLI `--no-engine` beats env on |
1265
+ | `docs/v6/wrapping-pipeline-phases.md` | phase-status table flips `migrate` to "WRAPPED in v6.0.8"; status line at top moves to "NINE phases wrapped"; new deviation note documents the ledger-vs-engine-semantics rationale |
1266
+ | `docs/v6/migration-guide.md` | "What works today" updated — three knobs now honored by `scan`, `costs`, `fix`, `brainstorm`, `spec`, `plan`, `review`, `validate`, `migrate` |
1267
+ | `docs/specs/v6-run-state-engine.md` | new "What was actually built (v6.0.8)" reconciliation block |
1268
+
1269
+ **Test delta:** 1408 → 1414 (+6). Typecheck clean. All 1408 existing
1270
+ tests pass unchanged — the engine-off path for `migrate` is byte-for-
1271
+ byte identical to v6.0.7 (same dispatch shape, same render).
1272
+
1273
+ **Concurrency note.** v6.0.7 (`implement`) and v6.0.9 (`pr`) are in
1274
+ flight on parallel worktrees, both targeting shared docs (CHANGELOG,
1275
+ recipe table, migration-guide) and `src/cli/{index,help-text}.ts`. The
1276
+ rebase contract: on push rejection, fetch + rebase + resolve conflicts
1277
+ keeping all wraps' contributions, re-test, push with `--force-with-lease`.
1278
+
1279
+ **Not done in v6.0.8 — explicit non-goals:**
1280
+ - Wrapping `implement` and `pr`. Continues across v6.0.7 / v6.0.9
1281
+ using the same helper plus `ctx.emitExternalRef()` for
1282
+ `git-remote-push` (implement) and `github-pr` (pr).
1283
+ - Wiring Phase 6's `migration_state` read-back. The engine PERSISTS
1284
+ `migration-version` externalRefs in v6.0.8; consulting them on
1285
+ resume ships in Phase 6+. Until then, retries on side-effecting
1286
+ phases require `--force-replay`.
1287
+ - Multi-phase pipeline orchestrator (autopilot's full
1288
+ `brainstorm → spec → plan → ... → migrate → ...` flow under one runId).
1289
+ - Flipping the v6.0 built-in default to ON. v6.1 territory.
1290
+
1291
+ ## v6.0.6 — `runPhaseWithLifecycle` helper (2026-05-06)
1292
+
1293
+ **Tech-debt refactor, no behavior change.** v6.0.1 → v6.0.5 wrapped eight
1294
+ CLI verbs (`scan`, `costs`, `fix`, `brainstorm`, `spec`, `plan`, `review`,
1295
+ `validate`) by hand-rolling the same ~100-line lifecycle pattern in each
1296
+ file: `createRun → optional run.warning → runPhase → run.complete →
1297
+ state.json refresh → best-effort lock release in finally`. Bugbot caught
1298
+ the duplication on PR #97 (LOW severity, deferred) with the explicit
1299
+ note: "extracting from 5 of 10 examples risks getting the abstraction
1300
+ wrong; from 10 of 10 the pattern is fully evidenced." At 8 of 10, the
1301
+ pattern is sufficiently evidenced that the remaining three side-effecting
1302
+ phases (`implement`, `migrate`, `pr`) can use the same helper plus
1303
+ `ctx.emitExternalRef()` from inside their phase body — no helper-shape
1304
+ changes needed.
1305
+
1306
+ **The helper.** New `src/core/run-state/run-phase-with-lifecycle.ts` sits
1307
+ on top of the existing `runPhase()` API (which is unchanged). Callers
1308
+ continue to define their own `RunPhase<I, O>` with per-phase
1309
+ `idempotent` / `hasSideEffects` / `run`, and pass it in alongside the
1310
+ input, the loaded config, the engine knobs, and a `runEngineOff`
1311
+ escape-hatch callback. The helper:
1312
+
1313
+ - Resolves engine on/off via the canonical CLI > env > config > default
1314
+ precedence
1315
+ - On engine-off: invokes `runEngineOff()` and returns its result with
1316
+ `runId/runDir: null`
1317
+ - On engine-on: creates a run dir, optionally emits `run.warning` for
1318
+ invalid env, runs the phase, emits `run.complete` (success or failed),
1319
+ refreshes `state.json` from replayed events, releases the lock in
1320
+ `finally` (idempotent), and returns `{ output, runId, runDir }`
1321
+ - On phase failure: emits `run.complete` with `status: 'failed'`, prints
1322
+ the legacy `[<phase>] engine: phase failed — <msg>` banner to stderr
1323
+ byte-for-byte, releases the lock, and re-throws
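
The control flow in the list above can be sketched in a few lines. This is a deliberately simplified, synchronous stand-in and not the helper's real signature: the argument names, the in-memory `events` array (standing in for `events.ndjson`), the placeholder run id, and the run directory path are all assumptions, and it omits `run.warning`, the `state.json` refresh, and the stderr banner.

```typescript
// Hypothetical, simplified types for illustration only.
type EngineEvent = { type: string; phase?: string; status?: string };

interface LifecycleArgs<O> {
  engineEnabled: boolean;   // already resolved via CLI > env > config > default
  phaseName: string;
  runPhase: () => O;        // engine-on phase body
  runEngineOff: () => O;    // legacy path, left untouched
  events: EngineEvent[];    // stand-in for the event log
  releaseLock: () => void;
}

interface LifecycleResult<O> {
  output: O;
  runId: string | null;
  runDir: string | null;
}

function runPhaseWithLifecycleSketch<O>(a: LifecycleArgs<O>): LifecycleResult<O> {
  if (!a.engineEnabled) {
    // Engine off: no run dir, no events.
    return { output: a.runEngineOff(), runId: null, runDir: null };
  }
  const runId = "run-0001";       // placeholder id
  const runDir = `runs/${runId}`; // hypothetical location
  a.events.push({ type: "run.start" });
  try {
    a.events.push({ type: "phase.start", phase: a.phaseName });
    const output = a.runPhase();
    a.events.push({ type: "phase.success", phase: a.phaseName });
    a.events.push({ type: "run.complete", status: "success" });
    return { output, runId, runDir };
  } catch (err) {
    a.events.push({ type: "phase.failed", phase: a.phaseName });
    a.events.push({ type: "run.complete", status: "failed" });
    throw err; // re-throw so the caller sees the original error
  } finally {
    a.releaseLock(); // runs on both success and failure
  }
}
```

Keeping `runEngineOff` as a separate callback means the engine-off branch never touches the event log or the lock, which is what lets that path stay unchanged.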
1324
+
1325
+ **Migrated phases.** All eight wrapped verbs now use the helper. Each `runX(opts)`
1326
+ function shrinks: keep the per-phase `RunPhase<I, O>` definition + the
1327
+ engine-off path body; delete the lifecycle boilerplate; call
1328
+ `runPhaseWithLifecycle` once. Total reduction across `src/cli/`:
1329
+
1330
+ - `scan.ts` 498 → 429 lines (-69)
1331
+ - `costs.ts` 297 → 231 lines (-66)
1332
+ - `fix.ts` 473 → 415 lines (-58)
1333
+ - `brainstorm.ts` 251 → 189 lines (-62)
1334
+ - `spec.ts` 216 → 159 lines (-57)
1335
+ - `plan.ts` 269 → 199 lines (-70)
1336
+ - `review.ts` 256 → 189 lines (-67)
1337
+ - `validate.ts` 262 → 196 lines (-66)
1338
+ - **Total: 2522 → 2007 lines (515 lines saved)**
1339
+
1340
+ **Engine-off path is byte-for-byte unchanged.** All eight existing
1341
+ `tests/cli/<verb>-engine-smoke.test.ts` smokes pass without modification
1342
+ (44 cases). The helper supplies a `runEngineOff` callback so the legacy
1343
+ code path stays intact even when the phase body's call shape would
1344
+ otherwise pin it.
1345
+
1346
+ ### Test count
1347
+
1348
+ After v6.0.5 baseline: 1396 → 1408 (+12). +12 cases for the new
1349
+ `tests/run-state/run-phase-with-lifecycle.test.ts` covering: engine-off
1350
+ (default + CLI > env > config precedence); engine-on success (lifecycle
1351
+ events, state.json shape, env / config resolution, costUSD pass-through,
1352
+ costUSD-absent fallback to 0); engine-on failure (run.complete failed,
1353
+ state.json refresh, error re-thrown with original message preserved,
1354
+ lock released through finally); invalid env value falling through to
1355
+ config-resolved engine-on with `run.warning`. Existing 44 phase smokes
1356
+ unchanged. Typecheck clean. Bugbot LOW from PR #97 addressed.
1357
+
1358
+ ### Deliberately deferred
1359
+
1360
+ - Wrapping the remaining pipeline phases (`implement`, `migrate`, `pr`).
1361
+ Side-effecting phases need careful externalRef plumbing — they will
1362
+ build against `runPhaseWithLifecycle` plus `ctx.emitExternalRef()`
1363
+ from inside their phase body. Helper signature does not need to grow
1364
+ for them; documented in the helper's header comment.
1365
+ - Multi-phase pipeline orchestrator (autopilot's full
1366
+ `brainstorm → spec → plan → ...` flow under one runId). The single-
1367
+ phase shape stays — multi-phase wrapping is a separate v6.x lift.
1368
+ - Flipping the v6.0 built-in default to ON. v6.1 territory.
1369
+
1370
+ ## v6.0.5 — Engine wire-up Part E (2026-05-06)
1371
+
1372
+ **The headline.** v6.0.4 wrapped `plan` and `review`. v6.0.5 continues the
1373
+ mechanical wrap pattern from the recipe at
1374
+ [`docs/v6/wrapping-pipeline-phases.md`](docs/v6/wrapping-pipeline-phases.md)
1375
+ with one more single-shot, read-only verb:
1376
+
1377
+ - **`validate`** — new CLI verb. Engine-wrap shell for the validate
1378
+ pipeline phase. Writes a validate log stub under
1379
+ `.guardrail-cache/validate/`; the actual validation work (static
1380
+ checks, auto-fix, tests, Codex review with auto-fix, bugbot triage) is
1381
+ owned by the Claude Code `/validate` skill. Declared `idempotent: true,
1382
+ hasSideEffects: false` (local file write only; no provider calls, no
1383
+ git push, no PR comment, no SARIF upload).
1384
+
1385
+ **Documented deviation from the spec table.** The v6 spec
1386
+ ([docs/specs/v6-run-state-engine.md](docs/specs/v6-run-state-engine.md),
1387
+ line 161) lists `validate` with externalRefs `sarif-artifact`. The
1388
+ v6.0.5 wrap matches the `idempotent: true, hasSideEffects: false`
1389
+ declaration but does **not** plumb a `sarif-artifact` externalRef — the
1390
+ v6.0.5 `validate` CLI verb does not emit a SARIF artifact. SARIF
1391
+ emission lives in `claude-autopilot run --format sarif --output <path>`
1392
+ (a separate verb). The SARIF reference is local-only file output (no
1393
+ remote upload), so the engine doesn't need a readback rule for it on
1394
+ resume — `idempotent: true` covers replay safety. If a future PR adds
1395
+ SARIF emission directly to this verb, the wrap can add a
1396
+ `ctx.emitExternalRef({ kind: 'sarif-artifact', ... })` call after the
1397
+ file write lands. Documented inline in `src/cli/validate.ts` and in the
1398
+ wrapping recipe's deviation note.
1399
+
1400
+ The engine-off code path is byte-for-byte unchanged; the `validate`
1401
+ verb is brand new in v6.0.5 (validation previously lived only as a
1402
+ Claude Code skill).
1403
+
1404
+ ### Test count
1405
+
1406
+ After v6.0.4 baseline: 1390 → 1396 (+6). +6 cases for
1407
+ `validate-engine-smoke.test.ts`, mirroring the
1408
+ `review-engine-smoke.test.ts` shape: engine off → no run dir + log
1409
+ written; engine off (cliEngine: false); engine on → state.json +
1410
+ events.ndjson with the right lifecycle (`run.start` →
1411
+ `phase.start` → `phase.success` → `run.complete`); engine on with
1412
+ explicit `--context`; env-resolved; CLI override beats env. Typecheck
1413
+ clean.
1414
+
1415
+ ### Deliberately deferred
1416
+
1417
+ - Wrapping the remaining pipeline phases (`implement`, `migrate`,
1418
+ `pr`). Side-effecting phases need careful externalRef plumbing per
1419
+ the recipe's "side effects" gate; wrap them last.
1420
+ - Adding SARIF emission directly to the `validate` verb. Lives in
1421
+ `claude-autopilot run --format sarif` (separate verb).
1422
+ - Extracting a shared `runPhaseWithLifecycle` helper across the eight
1423
+ wrapped verbs. Separate refactor PR — out of scope for v6.0.5.
1424
+ - Flipping the v6.0 built-in default to ON. v6.1 territory.
1425
+
1426
+ ## v6.0.4 — Engine wire-up Part D (2026-05-06)
1427
+
1428
+ **The headline.** v6.0.3 wrapped `brainstorm` and `spec`. v6.0.4 continues
1429
+ the mechanical wrap pattern from the recipe at
1430
+ [`docs/v6/wrapping-pipeline-phases.md`](docs/v6/wrapping-pipeline-phases.md)
1431
+ with two more single-shot verbs:
1432
+
1433
+ - **`plan`** ([#98](https://github.com/axledbetter/claude-autopilot/pull/98)) —
1434
+ new CLI verb. Engine-wrap shell for the plan pipeline phase. Writes a
1435
+ plan markdown stub under `.guardrail-cache/plans/`; the actual
1436
+ LLM-driven planning content is owned by the Claude Code
1437
+ superpowers:writing-plans skill. Declared `idempotent: true,
1438
+ hasSideEffects: false` (local file write only; no provider calls, no
1439
+ git push, no PR comment).
1440
+ - **`review`** ([#98](https://github.com/axledbetter/claude-autopilot/pull/98)) —
1441
+ new CLI verb. Engine-wrap shell for the review pipeline phase. Writes
1442
+ a review log stub under `.guardrail-cache/reviews/`; the actual
1443
+ LLM-driven review content is owned by the Claude Code review skills
1444
+ (`/review`, `/review-2pass`, `pr-review-toolkit:review-pr`). Declared
1445
+ `idempotent: true, hasSideEffects: false`.
1446
+
1447
+ **Documented deviation from the spec table.** The v6 spec
1448
+ ([docs/specs/v6-run-state-engine.md](docs/specs/v6-run-state-engine.md))
1449
+ lists `review` with externalRefs `review-comments`, implying PR-side
1450
+ comment posting (which would force `hasSideEffects: true`). The v6.0.4
1451
+ `review` verb does **not** post anywhere — PR-side comment posting
1452
+ lives in `claude-autopilot pr --inline-comments` /
1453
+ `--post-comments` (a separate verb). If a future PR adds platform-side
1454
+ comment posting to this verb, both declarations will need to flip and
1455
+ the readback rules will need to plumb a `review-comments` externalRef.
1456
+ Documented inline in `src/cli/review.ts`.
1457
+
1458
+ **Backward-compat — `review` grouping prefix preserved.**
1459
+ `claude-autopilot review` (no args) still prints the alpha.2 prefix
1460
+ help banner per the V16 v4-compat test. Flat-verb invocation requires
1461
+ at least one flag, e.g. `claude-autopilot review --engine`.
1462
+ `claude-autopilot help review` continues to surface the flat-verb
1463
+ Options block via `buildCommandHelpText`.
1464
+
1465
+ Engine-off code paths are unchanged for both verbs.
1466
+
1467
+ ### Test count
1468
+
1469
+ After v6.0.3 baseline: 1378 → 1390 (+12). +6 cases for
1470
+ `plan-engine-smoke.test.ts`, +6 cases for `review-engine-smoke.test.ts`.
1471
+ Both mirror `costs-engine-smoke.test.ts`: engine off → no run dir;
1472
+ engine on → state.json + events.ndjson with the right lifecycle
1473
+ (`run.start` → `phase.start` → `phase.success` → `run.complete`);
1474
+ env-resolved; CLI override beats env. Typecheck clean.
1475
+
1476
+ ### Deliberately deferred
1477
+
1478
+ - Wrapping the remaining pipeline phases (`implement`, `migrate`,
1479
+ `validate`, `pr`). Side-effecting phases (`implement`, `migrate`,
1480
+ `pr`) need careful externalRef plumbing per the recipe's "side
1481
+ effects" gate; wrap them last.
1482
+ - Flipping the v6.0 built-in default to ON. v6.1 territory.
1483
+
1484
+ ## v6.0.3 — Wrap brainstorm + spec through runPhase (2026-05-05)
1485
+
1486
+ **The headline.** v6.0.3 continues the mechanical phase-wrap pattern from
1487
+ the recipe at
1488
+ [`docs/v6/wrapping-pipeline-phases.md`](docs/v6/wrapping-pipeline-phases.md)
1489
+ with two more pipeline verbs:
1490
+
1491
+ - **`brainstorm`** — the pipeline entry point. Implemented primarily as
1492
+ a Claude Code skill (`/brainstorm` → `superpowers:brainstorming`); the
1493
+ CLI verb is an advisory shim pointing the user there. The wrap declares
1494
+ `idempotent: true, hasSideEffects: false`. Engine-off path is
1495
+ byte-for-byte identical to v6.0.2 (the same advisory banner). Engine-on
1496
+ path creates a run dir + emits `run.start` / `phase.start` /
1497
+ `phase.success` / `run.complete`. `--json` envelope shape is preserved
1498
+ for back-compat with the WS7 welcome regression guard and
1499
+ `json-channel-discipline.test.ts`.
1500
+ - **`spec`** — same shape as brainstorm. New top-level subcommand (it
1501
+ was previously absent from `SUBCOMMANDS`); the CLI verb is an advisory
1502
+ shim pointing at the autopilot/brainstorm Claude Code flow. Same wrap
1503
+ flags + same engine lifecycle.
1504
+
1505
+ **Documented deviation from the spec table.** The
1506
+ [v6 spec table](docs/specs/v6-run-state-engine.md) declares both
1507
+ `brainstorm` and `spec` `idempotent: no` because the LLM dialogue
1508
+ produces new content each invocation. v6.0.3 declares `idempotent: true`
1509
+ because the CLI verbs themselves are static advisory prints with no LLM
1510
+ call and no externalRefs to reconcile — the engine's idempotency check
1511
+ is "safe to retry without reconciliation," not "produces byte-identical
1512
+ output." Justified inline at the top of `src/cli/brainstorm.ts` and
1513
+ `src/cli/spec.ts` plus a deviation block in the recipe. Once the CLI
1514
+ verbs grow real LLM bodies (a future v6.x lift), the declaration may
1515
+ flip and a `spec-file` externalRef will land on every successful run.
1516
+
1517
+ Engine-off code paths are unchanged for both verbs; existing tests pass
1518
+ without modification.
1519
+
1520
+ ### Test count
1521
+
1522
+ 1367 → 1378 (+11). +5 cases for `brainstorm-engine-smoke.test.ts`, +5
1523
+ cases for `spec-engine-smoke.test.ts`, +1 case for `spec` joining
1524
+ `MIGRATED_VERBS` in `json-channel-discipline.test.ts`. Both new smoke
1525
+ files mirror `costs-engine-smoke.test.ts`: engine off → no run dir;
1526
+ engine on → state.json + events.ndjson with the right lifecycle
1527
+ (`run.start` → `phase.start` → `phase.success` → `run.complete`);
1528
+ env-resolved; CLI override beats env. Typecheck clean.
1529
+
1530
+ ### Deliberately deferred
1531
+
1532
+ - Wrapping the six remaining pipeline phases (`plan`, `implement`,
1533
+ `migrate`, `validate`, `pr`, `review`). One or two per release across
1534
+ v6.0.4+. A parallel agent works `plan` + `review` for v6.0.4.
1535
+ - Promoting `brainstorm`/`spec` from advisory shims to full LLM-bearing
1536
+ CLI verbs. The Claude Code skill remains the user-facing entry point;
1537
+ the CLI wraps exist so the engine has a place to record run-state for
1538
+ future multi-phase orchestration.
1539
+
1540
+ ## v6.0.2 — Engine wire-up Part B (2026-05-06)
1541
+
1542
+ **The headline.** v6.0.1 wrapped the first pipeline phase (`scan`) through
1543
+ `runPhase`. v6.0.2 continues the mechanical wrap pattern from the recipe at
1544
+ [`docs/v6/wrapping-pipeline-phases.md`](docs/v6/wrapping-pipeline-phases.md)
1545
+ with two more single-shot verbs:
1546
+
1547
+ - **`costs`** ([#96](https://github.com/axledbetter/claude-autopilot/pull/96)) —
1548
+ pure read-only summary of the local cost ledger. The cleanest possible
1549
+ wrap: `idempotent: true, hasSideEffects: false`, no provider, no LLM,
1550
+ no file writes. CLI dispatcher passes `cliEngine` + `envEngine` through;
1551
+ `--config` flag also wired since the engine resolver consults config.
1552
+ - **`fix`** ([#96](https://github.com/axledbetter/claude-autopilot/pull/96)) —
1553
+ applies LLM-generated patches to local files. Declared
1554
+ `idempotent: true` (same finding + same file content → same patch) and
1555
+ `hasSideEffects: false` (no remote / git push / PR creation in the
1556
+ existing flow — purely local file edits, which the recipe defines as
1557
+ platform-side-effect-free). If/when fix grows a `--push` mode it will
1558
+ flip to `hasSideEffects: true` with a `git-remote-push` externalRef.
1559
+
1560
+ **Documented deviation from the recipe.** Both wraps follow the recipe
1561
+ mechanically. `fix` adds one explicit deviation: its phase body emits
1562
+ per-finding console output and reads a [y/n/q] confirmation via
1563
+ `readline`. Pure side-effect-free phase bodies are the recipe default,
1564
+ but interactive verbs are an explicit exception (same precedent as
1565
+ `scan` keeping its LLM call inside `executeScanPhase`). The summary line
1566
+ + exit-code logic still lives in `renderFixOutput` so the engine path's
1567
+ idempotency isn't coupled to the final stdout shape. See the new "Note
1568
+ on interactive verbs" section at the bottom of the wrapping recipe.
1569
+
1570
+ Engine-off code paths are byte-for-byte unchanged for both verbs;
1571
+ existing tests pass without modification.
1572
+
1573
+ ### Test count
1574
+
1575
+ 1356 → 1367 (+11). +6 cases for `costs-engine-smoke.test.ts`, +5 cases
1576
+ for `fix-engine-smoke.test.ts`. Both mirror `scan-engine-smoke.test.ts`:
1577
+ engine off → no run dir; engine on → state.json + events.ndjson with
1578
+ the right lifecycle (`run.start` → `phase.start` → `phase.success` →
1579
+ `run.complete`); env-resolved; CLI override beats env. Typecheck clean.
1580
+
1581
+ ### Deliberately deferred
1582
+
1583
+ - Wrapping the seven remaining pipeline phases (`brainstorm`, `plan`,
1584
+ `implement`, `migrate`, `validate`, `pr`, `review`). One or two per
1585
+ release across v6.0.3+.
1586
+ - Flipping the v6.0 built-in default to ON. v6.1 territory.
1587
+
1588
+ ## v6.0.1 — Engine wire-up Part A (2026-05-05)
1589
+
1590
+ **The headline.** v6.0 shipped the engine modules but left the user-facing
1591
+ knobs un-wired. This release lights up the three knobs (`--engine` /
1592
+ `--no-engine` CLI flag, `CLAUDE_AUTOPILOT_ENGINE` env var,
1593
+ `engine.enabled` config key) with explicit precedence (CLI > env > config
1594
+ > built-in default) and wraps the **first** pipeline phase — `scan` —
1595
+ through `runPhase`. Every other pipeline phase still bypasses the engine;
1596
+ those land one or two per PR across subsequent v6.0.x releases following
1597
+ the recipe at [`docs/v6/wrapping-pipeline-phases.md`](docs/v6/wrapping-pipeline-phases.md).
1598
+
1599
+ The engine still ships **OFF** by default in v6.0.x. The default flip to
1600
+ **ON** lands in v6.1 per [`docs/specs/v6.1-default-flip.md`](docs/specs/v6.1-default-flip.md).
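
The CLI > env > config > built-in default precedence can be sketched as a pure function. This is an illustrative reimplementation, not the code in `resolve-engine.ts`: the input field names and the accepted env spellings follow this changelog entry, while the function name, the omission of the `reason` field, and the treatment of an empty env string as invalid are assumptions.

```typescript
type EngineSource = "cli" | "env" | "config" | "default";

interface ResolveInput {
  cliEngine?: boolean;      // --engine => true, --no-engine => false
  envValue?: string;        // raw CLAUDE_AUTOPILOT_ENGINE value
  configEnabled?: boolean;  // engine.enabled from config
  builtInDefault?: boolean; // OFF in v6.0.x
}

interface ResolveResult {
  enabled: boolean;
  source: EngineSource;
  invalidEnvValue?: string;
}

const ENV_ON = new Set(["on", "true", "1", "yes"]);
const ENV_OFF = new Set(["off", "false", "0", "no"]);

function resolveEngineSketch(input: ResolveInput): ResolveResult {
  // 1. An explicit CLI flag wins over everything else.
  if (input.cliEngine !== undefined) return { enabled: input.cliEngine, source: "cli" };

  // 2. Env var: case-insensitive, surrounding whitespace tolerated.
  let invalidEnvValue: string | undefined;
  if (input.envValue !== undefined) {
    const v = input.envValue.trim().toLowerCase();
    if (ENV_ON.has(v)) return { enabled: true, source: "env" };
    if (ENV_OFF.has(v)) return { enabled: false, source: "env" };
    invalidEnvValue = input.envValue; // fall through, but remember the raw string
  }

  // 3. Config, then 4. the built-in default.
  const base: ResolveResult =
    input.configEnabled !== undefined
      ? { enabled: input.configEnabled, source: "config" }
      : { enabled: input.builtInDefault ?? false, source: "default" };
  return invalidEnvValue !== undefined ? { ...base, invalidEnvValue } : base;
}
```

Surfacing the raw invalid string alongside the fallback result lets the caller decide whether to emit a warning without the resolver doing any I/O itself.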
1601
+
1602
+ ### What landed (PR #95)
1603
+
1604
+ - **`resolveEngineEnabled()` precedence resolver.** Pure / no-IO function
1605
+ in `src/core/run-state/resolve-engine.ts`. Inputs:
1606
+ `{cliEngine?, envValue?, configEnabled?, builtInDefault?}`. Outputs:
1607
+ `{enabled, source, reason, invalidEnvValue?}`. Accepts case-insensitive
1608
+ env values `on/off/true/false/1/0/yes/no` (plus whitespace tolerance);
1609
+ invalid values fall through to the next-lowest precedence layer and
1610
+ surface the raw string in `invalidEnvValue` so the caller can emit a
1611
+ `run.warning`. **+45 unit tests** covering every precedence layer, every
1612
+ accepted env form, the conflict rules, and the invalid-env fallthrough.
1613
+ - **CLI flag parsing in `src/cli/index.ts`.** New `parseEngineCliFlag()`
1614
+ helper rejects the conflict case (both `--engine` AND `--no-engine`)
1615
+ with `invalid_config` exit 1. Wired into the `scan` case to pass
1616
+ `cliEngine` + `envEngine` (from `process.env.CLAUDE_AUTOPILOT_ENGINE`)
1617
+ through to `runScan`.
1618
+ - **Config schema** (`src/core/config/types.ts` + `schema.ts`). New
1619
+ optional `engine.enabled: boolean` knob; schema rejects unknown
1620
+ sub-keys (`additionalProperties: false`).
1621
+ - **Help text** (`src/cli/help-text.ts`). New `GLOBAL_FLAGS_BLOCK`
1622
+ documents `--json` / `--engine` / `--no-engine` + the precedence
1623
+ matrix + scope (scan only in v6.0.1; rest follows the recipe). Per-verb
1624
+ `scan` Options block adds the new flags so `claude-autopilot help scan`
1625
+ is self-contained.
1626
+ - **`scan` pilot phase wrapping** (`src/cli/scan.ts`). Refactored the
1627
+ LLM-call-and-finding-processing portion into `executeScanPhase(input)`
1628
+ → `ScanOutput` (pure, no console output, no exit-code logic). Defined
1629
+ `RunPhase<ScanInput, ScanOutput>` with `name: 'scan'`,
1630
+ `idempotent: true`, `hasSideEffects: false`. Engine-on path:
1631
+ `createRun()` → `runPhase()` → `run.complete` event +
1632
+ `replayState`/`writeStateSnapshot` refresh + best-effort lock release
1633
+ in `finally`. Engine-off path: `executeScanPhase(input)` directly,
1634
+ byte-for-byte unchanged from v6.0. Rendering extracted into
1635
+ `renderScanOutput()` so the engine path's idempotency isn't coupled
1636
+ to console output. Test seam (`__testReviewEngine`) lets the smoke test
1637
+ inject a fake without an LLM key.
1638
+ - **End-to-end smoke test** (`tests/cli/scan-engine-smoke.test.ts`).
1639
+ Drives `runScan` with the engine on against a tmp project; asserts
1640
+ `state.status === 'success'`, single `scan` phase with the right
1641
+ `idempotent` / `hasSideEffects` flags, monotonic seq numbers, and the
1642
+ full lifecycle (`run.start` → `phase.start` → `phase.success` →
1643
+ `run.complete`). Five cases including engine-off (no run dir),
1644
+ env-resolved, CLI override, and invalid-env-fallthrough warning.
1645
+ - **Wrapping recipe doc** (`docs/v6/wrapping-pipeline-phases.md`).
1646
+ Six-step recipe + phase-status table + idempotency decision tree +
1647
+ worked example (scan) + a checklist subsequent v6.0.x PRs follow when
1648
+ wrapping the remaining nine pipeline phases (`brainstorm`, `plan`,
1649
+ `implement`, `migrate`, `validate`, `pr`, `review`, `fix`, `costs`).
1650
+ - **Migration guide** (`docs/v6/migration-guide.md`). "What works today"
1651
+ list updated — three knobs move from "wiring pending" to "wired (limited
1652
+ to scan)". Other phases still tracked under "wiring pending."
1653
+ - **Spec reconciliation** (`docs/specs/v6-run-state-engine.md`). New "What
1654
+ was actually built (v6.0.1 — Part A)" block.
1655
+
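The precedence rule described for `resolveEngineEnabled()` (CLI > env > config > built-in default, with invalid env values falling through) can be sketched roughly as below. This is an illustrative reconstruction, not the package's source: the type names, the `resolveEngineSketch` function, and the choice to omit the `reason` field are assumptions made for the example.

```typescript
// Hypothetical sketch of the documented precedence; names are illustrative.
type EngineSourceSketch = "cli" | "env" | "config" | "default";

interface EngineInputsSketch {
  cliEngine?: boolean;
  envValue?: string;
  configEnabled?: boolean;
  builtInDefault?: boolean;
}

interface EngineResolutionSketch {
  enabled: boolean;
  source: EngineSourceSketch;
  invalidEnvValue?: string;
}

// Accepted env forms listed in the changelog entry.
const TRUTHY = new Set(["on", "true", "1", "yes"]);
const FALSY = new Set(["off", "false", "0", "no"]);

function parseEnvSketch(raw: string): boolean | undefined {
  const v = raw.trim().toLowerCase(); // case-insensitive, whitespace-tolerant
  if (TRUTHY.has(v)) return true;
  if (FALSY.has(v)) return false;
  return undefined; // anything else counts as invalid
}

function resolveEngineSketch(input: EngineInputsSketch): EngineResolutionSketch {
  // 1. CLI flag wins outright.
  if (input.cliEngine !== undefined) {
    return { enabled: input.cliEngine, source: "cli" };
  }
  // 2. Env var, if it parses; otherwise remember the raw string and fall through.
  let invalid: string | undefined;
  if (input.envValue !== undefined) {
    const parsed = parseEnvSketch(input.envValue);
    if (parsed !== undefined) return { enabled: parsed, source: "env" };
    invalid = input.envValue;
  }
  const extra = invalid !== undefined ? { invalidEnvValue: invalid } : {};
  // 3. Config key.
  if (input.configEnabled !== undefined) {
    return { enabled: input.configEnabled, source: "config", ...extra };
  }
  // 4. Built-in default (OFF in v6.0.x per this changelog).
  return { enabled: input.builtInDefault ?? false, source: "default", ...extra };
}
```

In this sketch, a caller that sees `invalidEnvValue` set would be the one to emit the `run.warning`; the resolver itself stays free of IO.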
1656
+ ### Test count
1657
+
1658
+ 1306 → 1356 (+50). Typecheck clean. Existing 1306 tests continue to pass
1659
+ unchanged — the engine-off code path for `scan` is byte-for-byte
1660
+ identical to v6.0.
1661
+
1662
+ ### Deliberately deferred
1663
+
1664
+ - Wrapping of any other pipeline phase. Lands one or two per PR across
1665
+ v6.0.2+ following the recipe.
1666
+ - Flipping the v6.0 built-in default to ON. v6.1 territory.
1667
+ - Removing `--no-engine`. v7 territory.
1668
+
1669
+ ## v6.0 — Run State Engine (2026-05-05)
1670
+
1671
+ **The headline.** Autopilot moves from a stateless command-stream to a
1672
+ checkpointed, resumable, budget-bounded, observable pipeline. Every run gets
1673
+ a ULID and a per-project directory at `.guardrail-cache/runs/<ulid>/`.
1674
+ Every state transition appends a typed event to `events.ndjson` and updates
1675
+ `state.json` atomically. Two-layer budget enforcement (advisory `estimateCost`
1676
+ preflight + mandatory runtime guard) hard-stops runaway spend before it
1677
+ happens. Every CLI verb grows a `--json` flag with strict stdout/stderr
1678
+ channel discipline so CI consumers can drive the pipeline programmatically.
1679
+ Side-effect phase replay decisions consult persisted `externalRefs` plus a
1680
+ live provider read-back so resume is safe by construction. **v6.0 ships
1681
+ with the engine OFF by default — opt-in via `engine.enabled: true` (config
1682
+ wiring across 6.0.x point releases). Default flips to ON in v6.1.** See
1683
+ [`docs/v6/migration-guide.md`](docs/v6/migration-guide.md) for the v5.x → v6
1684
+ walkthrough and [`docs/v6/quickstart.md`](docs/v6/quickstart.md) for the
1685
+ five-minute version.
1686
+
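Since `events.ndjson` is treated as the source of truth and `state.json` as a derived cache, the state can in principle be rebuilt by folding the event log. The sketch below illustrates that idea only; the event and state shapes (`SketchEvent`, `SketchState`) and the `foldSketch` function are assumptions for the example, and the real `RunEvent` / `RunState` types carry more fields than shown here.

```typescript
// Hypothetical event shapes; the real RunEvent union is richer.
type SketchEvent =
  | { seq: number; type: "run.start" }
  | { seq: number; type: "phase.start"; phase: string }
  | { seq: number; type: "phase.success"; phase: string }
  | { seq: number; type: "phase.failed"; phase: string }
  | { seq: number; type: "run.complete" };

interface SketchState {
  status: "pending" | "running" | "success" | "failed";
  phases: Record<string, "running" | "success" | "failed">;
  lastSeq: number;
}

// Rebuild a state snapshot from NDJSON text, one JSON event per line.
function foldSketch(ndjson: string): SketchState {
  const state: SketchState = { status: "pending", phases: {}, lastSeq: 0 };
  for (const line of ndjson.split("\n")) {
    if (line.trim() === "") continue; // tolerate a trailing newline
    const ev = JSON.parse(line) as SketchEvent;
    // Sequence numbers are expected to increase strictly.
    if (ev.seq <= state.lastSeq) {
      throw new Error(`non-monotonic seq ${ev.seq}`);
    }
    state.lastSeq = ev.seq;
    switch (ev.type) {
      case "run.start":
        state.status = "running";
        break;
      case "phase.start":
        state.phases[ev.phase] = "running";
        break;
      case "phase.success":
        state.phases[ev.phase] = "success";
        break;
      case "phase.failed":
        state.phases[ev.phase] = "failed";
        state.status = "failed";
        break;
      case "run.complete":
        if (state.status !== "failed") state.status = "success";
        break;
    }
  }
  return state;
}
```

Feeding the four-event lifecycle (`run.start`, `phase.start`, `phase.success`, `run.complete`) through this fold yields a `success` status with a single successful phase.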
1687
+ ### Per-phase landings
1688
+
1689
+ - **Phase 1 — Run State Engine persistence layer ([#86](https://github.com/axledbetter/claude-autopilot/pull/86)).** `RunState` / `RunEvent` / `PhaseSnapshot` / `ExternalRef` / `WriterId` types in `src/core/run-state/types.ts`. Pure-TS 26-char Crockford Base32 ULID generator (`ulid.ts`). Per-run advisory lock via `proper-lockfile` + `.lock-meta.json` sidecar with PID + SHA-256-hashed hostname; off-host writers default to alive (fail closed) so a network-mounted lock can't be stolen. Durable append protocol for `events.ndjson` (`open(O_APPEND)` → `write` → `fsync(fd)` → `close` per event) with monotonic `seq` via `.seq` sidecar. Truncated last-line detection emits `run.recovery(reason: 'recovered-from-partial-write')` and continues; mid-file corruption throws `partial_write` immediately. Atomic snapshot writer for `state.json` (`open(.tmp)` → `fsync(fd)` → `rename` → `fsync(dirfd)`; tmpfs/SMB compatibility via swallowed EISDIR/EPERM/ENOTSUP on the dir-fsync). `recoverState` falls back to events replay when `state.json` is missing/corrupt. `createRun` / `listRuns` / `gcRuns` lifecycle helpers; symlink-safe GC. New `ErrorCode` variants: `lock_held`, `corrupted_state`, `partial_write`. **+56 tests.**
1690
+ - **Phase 2 — Phase wrapper + lifecycle ([#87](https://github.com/axledbetter/claude-autopilot/pull/87)).** `RunPhase<I, O>` interface (`idempotent` / `hasSideEffects` / `estimateCost?` / `run` / `onResume?`). `runPhase` orchestrator emits `phase.start` → `phase.success`/`failed` and gates idempotent short-circuit + side-effecting replay. Atomic per-phase snapshot writer (`writePhaseSnapshot` with path-traversal rejection on phase names). Hidden CLI verb `claude-autopilot internal log-phase-event` exposed via `cli-internal.ts` so markdown-driven skills can append events without importing the engine. Sub-phase nesting via synthetic `phaseIdx` encoding (`parentIdx * 1000 + childOrdinal`). **+27 tests.** Spec deviation: idempotent-replay short-circuit emits `run.warning(details.reason: 'idempotent-replay')` instead of a new `phase.skipped` event variant — durable log doesn't need a new shape since the snapshot is identical.
1691
+ - **Phase 3 — `runs` / `run resume` CLI ([#88](https://github.com/axledbetter/claude-autopilot/pull/88)).** Six verbs: `runs list` (newest-first, `--status` filter), `runs show <id>` (state + optional events tail), `runs gc` (default 30-day cutoff, confirmation gate), `runs delete <id>` (terminal-status guard + lock acquisition), `runs doctor` (replay vs snapshot drift; `--fix` rewrites), `run resume <id>` (**lookup-only** in v6.0 — identifies next phase + decision rationale; live execution wires in 6.1+). Every verb supports `--json` envelope output (v1 schema). New `Engine` group in `HELP_GROUPS`. Decision vocabulary (`retry` / `skip-idempotent` / `needs-human` / `already-complete`) preserved as a thin wrapper around the canonical `decideReplay` matrix introduced in Phase 6. **No changes to existing CLI verbs.**
1692
+ - **Phase 4 — Budget enforcement ([#89](https://github.com/axledbetter/claude-autopilot/pull/89)).** `BudgetConfig` (`perRunUSD`, `perPhaseUSD?`, `councilMaxRecursionDepth?`, `bgAutopilotMaxRoundsPerSelfEat?`, `conservativePhaseReserveUSD?`). `checkPhaseBudget` pure decision function with two-layer policy: (1) advisory — uses `estimateCost.high` if the phase declares one; (2) mandatory — runs regardless, enforces `actualSoFar + conservativePhaseReserveUSD <= perRunUSD` so phases without `estimateCost` still trigger budget gates. `runPhase` emits a `budget.check` event with full decision rationale (`{phase, phaseIdx, estimatedHigh, actualSoFar, reserveApplied, capRemaining, decision, reason}`) before every spawn; throws `GuardrailError(budget_exceeded)` on hard-fail. Council synthesizer recursion bounded via `councilMaxRecursionDepth` — exceeded calls return `status: 'partial'` rather than continuing. **+25-30 tests.**
1693
+ - **Phase 5 — Typed JSON events + strict `--json` channel discipline ([#90](https://github.com/axledbetter/claude-autopilot/pull/90)).** `--json` flag now lives on every Review / Pipeline / Deploy / Migrate / Diagnostics verb. Strict channel contract enforced by a dispatcher-level wrapper (`runUnderJsonMode` in `src/cli/json-envelope.ts`): exactly **one** JSON envelope on stdout per invocation; **only** NDJSON event lines on stderr (synthetic `run.warning` for legacy text via `installJsonModeChannelDiscipline` console-wrap); ANSI color codes stripped; interactive prompts hard-fail with `EXIT_NEEDS_HUMAN = 78` and the envelope's `nextActions` field carries the resume hint. Text-mode behavior unchanged. **`tests/cli/json-channel-discipline.test.ts` asserts the invariants per migrated verb.**
1694
+ - **Phase 6 — Idempotency contracts + provider read-back ([#91](https://github.com/axledbetter/claude-autopilot/pull/91)).** `decideReplay` pure decision matrix in `replay-decision.ts` maps `(priorSuccess, idempotent, hasSideEffects, refs, readbacks, forceReplay)` → `'retry' | 'skip-already-applied' | 'needs-human' | 'abort'`. Pluggable `ProviderReadback` registry in `provider-readback.ts` with built-in read-backs for `github` (via `gh` CLI), `vercel` / `fly` / `render` (via the deploy adapters), `supabase` (via `migration_state`). All read-backs **fail closed** — any throw, parse failure, or unrecognized state collapses to `existsOnPlatform=false, currentState='unknown'` so the matrix routes to `needs-human` instead of a silent skip. `runPhase` wires `decideReplay` (replaces Phase 2's hard-coded throw). New `replay.override` event variant emitted when `--force-replay` flips a refusal into a retry; `foldEvents` records overrides on `phase.meta.replayOverrides`. `PhaseSnapshot.result` field added so `skip-already-applied` returns the prior output without re-execution. CLI lookup (`runRunResume`) delegates to the same `decideReplay` so prediction matches live execution. **+55 tests.**
1695
+ - **Phase 7 — Live adapter certification suite ([#92](https://github.com/axledbetter/claude-autopilot/pull/92)).** Five live assertions × three providers (Vercel + Fly + Render): deploy success, auth failure, 404, rollback, log streaming with redaction-on-planted-secret. Env-gated via `resolveProviderEnv()` — runs report `skipped` until the operator adds the seven `*_TEST` GitHub Secrets per `docs/adapters/cert-suite.md`. Flake-control harness (`tests/adapters/live/_harness.ts`) implements per-provider 3-attempt retry budget with exp backoff (1s / 4s / 16s) on transient categories, hard-fail (no retry) on auth/404/schema-mismatch, soft-fail with 3-strike escalation on rollout/log-streaming flakes; **+42 unit tests** for the harness alone (run under regular `npm test`, no live creds required). Nightly CI workflow at `.github/workflows/adapter-cert.yml` (09:00 UTC + manual `workflow_dispatch`); uploads `events.ndjson` + `log-tail.txt` artifacts on every run. **Spec deviation:** Fly cert needs a third env var (`FLY_IMAGE_TEST`) since the Fly adapter doesn't build images per the v5.6 design.
1696
+ - **Phase 8 — Docs + migration guide ([#94](https://github.com/axledbetter/claude-autopilot/pull/94), this PR).** `docs/v6/migration-guide.md` walks v5.x users through the opt-in flow with a precedence matrix, troubleshooting recipes, the per-phase idempotency table, and the v6.0 → v6.1 default-flip plan. `docs/v6/quickstart.md` is the five-minute version. README gains a "Run State Engine (v6)" section. CHANGELOG (this entry) bundles every phase. Spec gets a Phase 8 reconciliation block + a Status column on the implementation phases table. New `docs/specs/v6.1-default-flip.md` outlines the stabilization criteria for flipping `engine.enabled` to `true` by default and removing `--no-engine`.
1697
+ - **Spec — Codex-reviewed twice ([#85](https://github.com/axledbetter/claude-autopilot/pull/85)).** Two passes through Codex 5.3 hardened the persistence protocol (durable append + atomic snapshot ordering), promoted `events.ndjson` to source-of-truth with `state.json` as a derived cache, mandated copy-not-symlink for artifacts, added the two-layer budget policy with a mandatory runtime guard, formalized the strict `--json` channel discipline, defined the external-operation ledger for replay safety (`ExternalRef` + provider read-back), pinned the precedence matrix, and added flake-control parameters for the live adapter cert suite.
1698
+
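The two-layer budget policy from Phase 4 can be illustrated with a small pure function. Only the mandatory inequality (`actualSoFar + conservativePhaseReserveUSD <= perRunUSD`) comes from the entry above; the decision labels, the treatment of the advisory layer as a warning, and the function name `checkBudgetSketch` are assumptions for this sketch.

```typescript
// Hypothetical shapes; only perRunUSD and conservativePhaseReserveUSD
// are named in the changelog.
interface BudgetSketchConfig {
  perRunUSD: number;
  conservativePhaseReserveUSD?: number;
}

interface BudgetSketchInput {
  actualSoFar: number;
  estimatedHigh?: number; // present only if the phase declares estimateCost
}

interface BudgetSketchDecision {
  decision: "proceed" | "warn" | "hard-fail";
  reason: string;
  capRemaining: number;
}

function checkBudgetSketch(
  cfg: BudgetSketchConfig,
  input: BudgetSketchInput,
): BudgetSketchDecision {
  const reserve = cfg.conservativePhaseReserveUSD ?? 0;
  const capRemaining = cfg.perRunUSD - input.actualSoFar;

  // Mandatory layer: runs whether or not the phase supplies an estimate.
  if (input.actualSoFar + reserve > cfg.perRunUSD) {
    return { decision: "hard-fail", reason: "reserve exceeds remaining cap", capRemaining };
  }

  // Advisory layer (assumed here to warn rather than block).
  if (
    input.estimatedHigh !== undefined &&
    input.actualSoFar + input.estimatedHigh > cfg.perRunUSD
  ) {
    return { decision: "warn", reason: "high estimate exceeds remaining cap", capRemaining };
  }

  return { decision: "proceed", reason: "within budget", capRemaining };
}
```

The point of the mandatory layer is visible in the first branch: a phase that never declares an estimate can still be stopped once the conservative reserve no longer fits under the per-run cap.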
1699
+ ### Codex / council pricing — from the GPT-5.5 swap ([#93](https://github.com/axledbetter/claude-autopilot/pull/93))
1700
+
1701
+ - **Default codex/council model bumped `gpt-5.3-codex` → `gpt-5.5`.** OpenAI
1702
+ released GPT-5.5 (codename Spud) on 2026-04-23 — better at coding than 5.4
1703
+ with fewer tokens, available via standard Responses/Chat Completions API
1704
+ at `gpt-5.5` (no `-codex` suffix). Pricing **doubles** to $5/1M input +
1705
+ $30/1M output, so the per-adapter `COST_PER_M_INPUT/OUTPUT` defaults moved
1706
+ in lockstep — without this, every cost-ledger entry would silently halve.
1707
+ New canonical pricing table at `src/adapters/pricing.ts` keeps the legacy
1708
+ `gpt-5.3-codex` and `gpt-5.4` entries for back-compat with pinned
1709
+ `CODEX_MODEL`/`council.models[].model` configs. Override via env vars
1710
+ (`CODEX_MODEL`, `CODEX_COST_INPUT_PER_M`, `CODEX_COST_OUTPUT_PER_M`).
1711
+
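To make the pricing change concrete, the sketch below computes a ledger cost from the `gpt-5.5` rates quoted above ($5 per 1M input tokens, $30 per 1M output tokens) and applies the `CODEX_COST_INPUT_PER_M` / `CODEX_COST_OUTPUT_PER_M` overrides. The helper names (`ratesFromEnvSketch`, `costUsdSketch`) and the rule that unparseable overrides fall back to the default are assumptions, not the contents of `src/adapters/pricing.ts`.

```typescript
interface RateSketch {
  inputPerM: number;  // USD per 1M input tokens
  outputPerM: number; // USD per 1M output tokens
}

// Rates quoted in the changelog entry for gpt-5.5.
const GPT_5_5_RATES: RateSketch = { inputPerM: 5, outputPerM: 30 };

// Parse an override, falling back when it is missing or not a
// non-negative number (fallback behavior is an assumption).
function parseRateSketch(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const n = Number(raw);
  return Number.isFinite(n) && n >= 0 ? n : fallback;
}

// The env map is passed in explicitly so the function stays pure.
function ratesFromEnvSketch(
  env: Record<string, string | undefined>,
  fallback: RateSketch,
): RateSketch {
  return {
    inputPerM: parseRateSketch(env["CODEX_COST_INPUT_PER_M"], fallback.inputPerM),
    outputPerM: parseRateSketch(env["CODEX_COST_OUTPUT_PER_M"], fallback.outputPerM),
  };
}

function costUsdSketch(
  inputTokens: number,
  outputTokens: number,
  rates: RateSketch,
): number {
  return (
    (inputTokens / 1_000_000) * rates.inputPerM +
    (outputTokens / 1_000_000) * rates.outputPerM
  );
}
```

At the default rates, a call with 200,000 input tokens and 50,000 output tokens costs 0.2 × 5 + 0.05 × 30 = $2.50. If the rates had stayed at half that value, the same call would have been recorded at $1.25, which is the silent halving the entry warns about.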
1712
+ ## v5.6.0 — Fly.io + Render deploy adapters (2026-05-04)
1713
+
1714
+ ### Added
1715
+
1716
+ - **`@delegance/claude-autopilot deploy --adapter fly`** — first-class Fly.io adapter. Image-based releases via the Machines API (image must be pre-pushed via `fly deploy --build-only --push`), polling-based status, **WebSocket log streaming**, **native rollback** with simulated fallback when the API endpoint is unavailable. `FLY_API_TOKEN` env var; auth doctor warns when missing.
1717
+ - **`@delegance/claude-autopilot deploy --adapter render`** — first-class Render adapter. REST API deploys (with optional `clearCache`), service-scoped status polling at `GET /v1/services/{serviceId}/deploys/{deployId}`, REST-polling log stream with `(timestamp, logId)` cursor dedup, **simulated rollback** by re-deploying the previous successful commit. `RENDER_API_KEY` env var; auth doctor warns when missing.
1718
+ - **`DeployAdapterCapabilities` interface** — adapters declare `streamMode: 'websocket' | 'polling' | 'none'` and `nativeRollback: boolean`. CLI prints a one-line stderr notice for polling-mode adapters under `--watch` so users understand why log lines arrive in batches.
1719
+ - **Bounded auto-rollback orchestration in `src/cli/deploy.ts`** — when health check fails after deploy and `rollbackOn: [healthCheckFailure]` is configured, the CLI fires exactly one rollback (no chains), with `runHealthCheck` capped at 5 attempts × 6s backoff (~30s window). New terminal `DeployResult.status` values: `fail_rolled_back` and `fail_rollback_failed`.
1720
+ - **HTTP-status error taxonomy** — new `not_found` `ErrorCode` joins the union; per-adapter mapping: 401/403→`auth`, 404→`not_found`, 422/400→`invalid_config`, 5xx→`transient_network` (retryable). Provider request-id headers (`Fly-Request-Id`, `x-request-id`) captured into `error.details` for support tickets.
1721
+ - **Mandatory log redaction across all adapters** — every log line surfaced into `DeployResult.output` or PR-comment bodies runs through `redactLogLines()` (defaults: `AKIA…`, `sk-…`, `eyJ…`, `ghp_`, `xoxb-`, plus user-configurable `config.persistence.redactionPatterns`). Closes a real existing security hazard in the v5.4 Vercel adapter that was emitting unredacted logs into PR comments.
1722
+ - **Shared `src/adapters/deploy/_http.ts`** — extracted `fetchWithRetry` + `safeReadBody` helpers used by Vercel, Fly, and Render adapters; one canonical retry implementation to maintain.
1723
+
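The HTTP-status taxonomy listed above maps naturally onto a small classifier. The sketch below follows the stated mapping (401/403, 404, 400/422, 5xx) and the two request-id header names; the `unknown` fallback code, the `retryable` flag on non-5xx results, and both function names are assumptions for illustration.

```typescript
// "unknown" is a hypothetical catch-all; the other codes are named in the entry.
type DeployErrorCodeSketch =
  | "auth"
  | "not_found"
  | "invalid_config"
  | "transient_network"
  | "unknown";

interface ClassifiedErrorSketch {
  code: DeployErrorCodeSketch;
  retryable: boolean;
}

function classifyHttpStatusSketch(status: number): ClassifiedErrorSketch {
  if (status === 401 || status === 403) return { code: "auth", retryable: false };
  if (status === 404) return { code: "not_found", retryable: false };
  if (status === 400 || status === 422) return { code: "invalid_config", retryable: false };
  // Only 5xx is described as retryable.
  if (status >= 500 && status <= 599) return { code: "transient_network", retryable: true };
  return { code: "unknown", retryable: false };
}

// Pull a provider request id out of response headers (case-insensitive),
// e.g. for inclusion in error details.
function pickRequestIdSketch(headers: Record<string, string>): string | undefined {
  for (const [name, value] of Object.entries(headers)) {
    const lower = name.toLowerCase();
    if (lower === "fly-request-id" || lower === "x-request-id") return value;
  }
  return undefined;
}
```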
1724
+ ### Fixed
1725
+
1726
+ - **Bugbot caught + autopilot fixed 4 real bugs across the v5.6 self-eat phases.** HIGH on Phase 2 (Render service-scoped URL — `pollUntilTerminal` and `status()` were using shorthand `/v1/deploys/{id}` which doesn't exist on Render's API). MEDIUM on Phase 3 (Render cursor dedup wasn't sorting same-ms entries by id, silently dropping out-of-order siblings). LOW on Phase 4 (`printAutoRollback` hardcoded "failed 3x" but the constant is now 5). LOW on Phase 5 (`getPreviousFileContent` was being called for `.sql` files where `previousContent` is ignored, wasting a `git show` spawn per migration).
1727
+ - **Schema-alignment diff-aware Prisma parsing (PR #44, schema-alignment cleanup)** — `getPreviousFileContent` now defaults to a CI-aware base ref (`GITHUB_BASE_REF` → `origin/<base>`, then `CI_MERGE_REQUEST_TARGET_BRANCH_NAME`, fallback `HEAD~1`) instead of always reading from `HEAD` (which gave empty diffs in CI). Dropped models now emit `drop_column` for every field of the removed model.
1728
+ - **Tombstone CLI no longer crashes with a stack trace when presets are missing (PR #82)** — schema-validator was running file IO at module load time, so every `claude-autopilot --version` call eagerly read `presets/aliases.lock.json` + `presets/schemas/migrate.schema.json`; missing presets crashed the CLI before it could format an error. Now lazy-init via memoized `getValidator()`.
1729
+
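The CI-aware base-ref fallback described for `getPreviousFileContent` can be sketched as a pure lookup over environment variables. The order (`GITHUB_BASE_REF`, then `CI_MERGE_REQUEST_TARGET_BRANCH_NAME`, then `HEAD~1`) follows the entry above; prefixing the GitLab branch with `origin/` and the function name `resolveBaseRefSketch` are assumptions.

```typescript
// The env map is a parameter so the function is easy to test.
function resolveBaseRefSketch(env: Record<string, string | undefined>): string {
  // GitHub Actions sets GITHUB_BASE_REF on pull_request events.
  const githubBase = env["GITHUB_BASE_REF"]?.trim();
  if (githubBase) return `origin/${githubBase}`;

  // GitLab merge-request pipelines expose the target branch name.
  // Assumption: it is resolved against origin/ as well.
  const gitlabBase = env["CI_MERGE_REQUEST_TARGET_BRANCH_NAME"]?.trim();
  if (gitlabBase) return `origin/${gitlabBase}`;

  // Local fallback: compare against the previous commit.
  return "HEAD~1";
}
```

A ref resolved this way would then be handed to something like `git show <ref>:<path>` to read the previous file content, so a CI checkout is compared against the merge target rather than against itself.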
1730
+ ## v5.5.2 — Framework-agnostic /migrate (2026-04-30)
1731
+
1732
+ ### Added
1733
+
1734
+ - **Working examples for Rails, Alembic, Django, golang-migrate, Prisma, Drizzle, dbmate, Flyway, supabase-cli, custom scripts** in `skills/migrate/SKILL.md`. The dispatcher was always framework-agnostic, but the prior doc text only described the Supabase path.
1735
+ - **Detector `defaultCommand` fills** for `prisma-push`, `drizzle-push`, `golang-migrate`, `typeorm` so `claude-autopilot init` produces a working `stack.md` on first try for these toolchains.
1736
+
1737
+ ### Fixed
1738
+
1739
+ - **`/migrate` skill description rewritten** as a generic dispatcher description with a "when to use migrate-supabase instead" callout. Anyone running `migrate@1` in a non-Supabase repo no longer sees Supabase-specific instructions.
1740
+
1741
+ ## v5.5.1 — `openai` SDK now optional (2026-04-30)
1742
+
1743
+ ### Changed
1744
+
1745
+ - **`openai` moved to `optionalDependencies`** alongside `@anthropic-ai/sdk`, `@google/generative-ai`, `@modelcontextprotocol/sdk`. All four LLM SDKs are now optional. `npm install --omit=optional` shed grows to **~26 MB** (was ~13 MB after v5.5.0). `scripts/autoregress.ts` migrated to `loadOpenAI()` — the last direct `import OpenAI` outside the adapter layer.
1746
+
1747
+ ### Notes
1748
+
1749
+ - Council runner already handles missing-synth-SDK gracefully — returns `status: 'partial'` with the friendly install hint surfaced via the synthesis error field. Users with only `ANTHROPIC_API_KEY` get a partial result with model responses preserved.
1750
+
1751
+ ## v5.5.0 — Lazy-load LLM SDKs + Vercel auth doctor (2026-04-30)
4
1752
 
5
1753
  ### Added
6
1754
 
7
- - **`deploy` phase** — adapter-agnostic deploy step that runs your existing deploy command, extracts the URL from stdout, optionally polls a `healthCheckUrl`, and (optionally) posts result to a PR. Closes the loop from "PR merged" to "PR merged + deployed + smoke-tested + URL on the PR".
8
- - **`deployCommand` + `healthCheckUrl` config keys**: anything that works in your terminal works as `deployCommand` (`vercel --prod`, `flyctl deploy`, `kubectl apply`, `gh workflow run`, `make deploy`).
9
- - **`claude-autopilot deploy [--dry-run|--command|--health-url|--pr <n>]`**: CLI surface. PR comment integration via `gh pr comment`.
1755
+ - **`src/adapters/sdk-loader.ts`** with `loadAnthropic` / `loadOpenAI` / `loadGoogleGenerativeAI` + `isSdkInstalled` helper. Friendly `GuardrailError` on `MODULE_NOT_FOUND` points at the exact `npm install` command.
1756
+ - **Phase 6 of v5.4 spec: Vercel auth doctor.** `claude-autopilot doctor` detects `deploy.adapter: vercel` in `guardrail.config.yaml` and warns when `VERCEL_TOKEN` is missing.
1757
+ - **LLM SDK install-state surface in doctor** shows which optional LLM SDKs are actually installed.
1758
+
1759
+ ### Changed
1760
+
1761
+ - **`@anthropic-ai/sdk`, `@google/generative-ai`, `@modelcontextprotocol/sdk` moved to `optionalDependencies`**. Six adapters converted from top-level import to dynamic load. Users with `--omit=optional` shed ~13 MB and only need the SDK matching their API key.
1762
+
1763
+ ## v5.4.0 — Vercel first-class deploy adapter (2026-04-30)
1764
+
1765
+ ### Added
1766
+
1767
+ - **`@delegance/claude-autopilot deploy --adapter vercel`** — first-class Vercel adapter via the v13 deployments API. Returns `dpl_xxx` IDs, polls status until terminal, populates `deployUrl` / `buildLogsUrl` / `output`. Auth via `VERCEL_TOKEN`.
1768
+ - **`--watch` SSE+NDJSON log streaming** — subscribes to `/v2/deployments/<id>/events?builds=1`, prints to stderr in real time. Reconnects once with exp backoff on disconnect.
1769
+ - **`claude-autopilot deploy rollback` + `deploy status`** — CLI subverbs over the adapter's `rollback()` / `status()` methods. `--to <id>` overrides "previous prod deploy" lookup.
1770
+ - **Auto-rollback on health-check failure** — when `rollbackOn: [healthCheckFailure]` is set in config, the CLI promotes the previous prod deploy if the post-deploy health check fails. PR comment shows both URLs (new + rolled-back-to).
1771
+ - **`<!-- claude-autopilot-deploy -->` upserting PR comment** — single comment is updated in place across deploy → log-stream → health-check → rollback, instead of spamming the PR with multiple comments.
1772
+
1773
+ ### Fixed
1774
+
1775
+ - **On PR #63 (Phase 3), Bugbot caught that an explicit `--config <missing>` was silently ignored** — autopilot fixed it with a regression test in 4 minutes.
1776
+ - **Phase 4 introduced a regression in Phase 2's `--watch` test surface, caught via `npm test` before the PR was opened.** Autopilot adapted its spec interpretation (making the health check opt-in instead of falling back to `deployUrl`) and documented the deviation.
1777
+
1778
+ ### Notes
10
1779
 
11
- First-class provider adapters (Vercel/Fly/Render with API-level deploy IDs + rollback hooks) are queued for v5.4.
1780
+ - This release was **shipped as four self-eat PRs** (#59, #61, #63, #64) where autopilot implemented its own next phase end-to-end. Cumulative cost ~\$17.50, wall clock ~82 min, 47 new tests. See [DEMO.md](DEMO.md) for the full proof set.
1781
+ - v5.3 "deploy phase" was superseded by v5.4 — the adapter pattern subsumed the generic-command-only design from the in-flight v5.3 spec.
12
1782
 
13
1783
  ## v5.2.2 — Demo polish
14
1784