@delegance/claude-autopilot 5.5.2 → 6.2.2

Files changed (119)
  1. package/CHANGELOG.md +935 -6
  2. package/README.md +55 -0
  3. package/dist/src/adapters/council/openai.js +12 -6
  4. package/dist/src/adapters/deploy/_http.d.ts +43 -0
  5. package/dist/src/adapters/deploy/_http.js +99 -0
  6. package/dist/src/adapters/deploy/fly.d.ts +206 -0
  7. package/dist/src/adapters/deploy/fly.js +696 -0
  8. package/dist/src/adapters/deploy/index.d.ts +2 -0
  9. package/dist/src/adapters/deploy/index.js +33 -0
  10. package/dist/src/adapters/deploy/render.d.ts +181 -0
  11. package/dist/src/adapters/deploy/render.js +550 -0
  12. package/dist/src/adapters/deploy/types.d.ts +67 -3
  13. package/dist/src/adapters/deploy/vercel.d.ts +17 -1
  14. package/dist/src/adapters/deploy/vercel.js +29 -49
  15. package/dist/src/adapters/pricing.d.ts +36 -0
  16. package/dist/src/adapters/pricing.js +40 -0
  17. package/dist/src/adapters/review-engine/codex.js +10 -7
  18. package/dist/src/cli/autopilot.d.ts +71 -0
  19. package/dist/src/cli/autopilot.js +735 -0
  20. package/dist/src/cli/brainstorm.d.ts +23 -0
  21. package/dist/src/cli/brainstorm.js +131 -0
  22. package/dist/src/cli/costs.d.ts +15 -1
  23. package/dist/src/cli/costs.js +99 -10
  24. package/dist/src/cli/deploy.d.ts +3 -3
  25. package/dist/src/cli/deploy.js +34 -9
  26. package/dist/src/cli/fix.d.ts +18 -0
  27. package/dist/src/cli/fix.js +105 -11
  28. package/dist/src/cli/help-text.d.ts +52 -0
  29. package/dist/src/cli/help-text.js +400 -0
  30. package/dist/src/cli/implement.d.ts +91 -0
  31. package/dist/src/cli/implement.js +196 -0
  32. package/dist/src/cli/index.js +719 -245
  33. package/dist/src/cli/json-envelope.d.ts +187 -0
  34. package/dist/src/cli/json-envelope.js +270 -0
  35. package/dist/src/cli/json-mode.d.ts +33 -0
  36. package/dist/src/cli/json-mode.js +201 -0
  37. package/dist/src/cli/migrate.d.ts +111 -0
  38. package/dist/src/cli/migrate.js +305 -0
  39. package/dist/src/cli/plan.d.ts +81 -0
  40. package/dist/src/cli/plan.js +149 -0
  41. package/dist/src/cli/pr.d.ts +106 -0
  42. package/dist/src/cli/pr.js +191 -19
  43. package/dist/src/cli/preflight.js +26 -0
  44. package/dist/src/cli/review.d.ts +27 -0
  45. package/dist/src/cli/review.js +126 -0
  46. package/dist/src/cli/runs-watch-renderer.d.ts +45 -0
  47. package/dist/src/cli/runs-watch-renderer.js +275 -0
  48. package/dist/src/cli/runs-watch.d.ts +41 -0
  49. package/dist/src/cli/runs-watch.js +395 -0
  50. package/dist/src/cli/runs.d.ts +122 -0
  51. package/dist/src/cli/runs.js +902 -0
  52. package/dist/src/cli/scan.d.ts +93 -0
  53. package/dist/src/cli/scan.js +166 -40
  54. package/dist/src/cli/spec.d.ts +66 -0
  55. package/dist/src/cli/spec.js +132 -0
  56. package/dist/src/cli/validate.d.ts +29 -0
  57. package/dist/src/cli/validate.js +131 -0
  58. package/dist/src/core/config/schema.d.ts +9 -0
  59. package/dist/src/core/config/schema.js +7 -0
  60. package/dist/src/core/config/types.d.ts +11 -0
  61. package/dist/src/core/council/runner.d.ts +10 -1
  62. package/dist/src/core/council/runner.js +25 -3
  63. package/dist/src/core/council/types.d.ts +7 -0
  64. package/dist/src/core/errors.d.ts +1 -1
  65. package/dist/src/core/errors.js +11 -0
  66. package/dist/src/core/logging/redaction.d.ts +13 -0
  67. package/dist/src/core/logging/redaction.js +20 -0
  68. package/dist/src/core/migrate/schema-validator.js +15 -1
  69. package/dist/src/core/phases/static-rules.d.ts +5 -1
  70. package/dist/src/core/phases/static-rules.js +2 -5
  71. package/dist/src/core/run-state/budget.d.ts +88 -0
  72. package/dist/src/core/run-state/budget.js +141 -0
  73. package/dist/src/core/run-state/cli-internal.d.ts +21 -0
  74. package/dist/src/core/run-state/cli-internal.js +174 -0
  75. package/dist/src/core/run-state/events.d.ts +59 -0
  76. package/dist/src/core/run-state/events.js +504 -0
  77. package/dist/src/core/run-state/lock.d.ts +61 -0
  78. package/dist/src/core/run-state/lock.js +206 -0
  79. package/dist/src/core/run-state/phase-context.d.ts +60 -0
  80. package/dist/src/core/run-state/phase-context.js +108 -0
  81. package/dist/src/core/run-state/phase-registry.d.ts +137 -0
  82. package/dist/src/core/run-state/phase-registry.js +162 -0
  83. package/dist/src/core/run-state/phase-runner.d.ts +80 -0
  84. package/dist/src/core/run-state/phase-runner.js +447 -0
  85. package/dist/src/core/run-state/provider-readback.d.ts +130 -0
  86. package/dist/src/core/run-state/provider-readback.js +426 -0
  87. package/dist/src/core/run-state/replay-decision.d.ts +69 -0
  88. package/dist/src/core/run-state/replay-decision.js +144 -0
  89. package/dist/src/core/run-state/resolve-engine.d.ts +100 -0
  90. package/dist/src/core/run-state/resolve-engine.js +190 -0
  91. package/dist/src/core/run-state/resume-preflight.d.ts +66 -0
  92. package/dist/src/core/run-state/resume-preflight.js +116 -0
  93. package/dist/src/core/run-state/run-phase-with-lifecycle.d.ts +73 -0
  94. package/dist/src/core/run-state/run-phase-with-lifecycle.js +186 -0
  95. package/dist/src/core/run-state/runs.d.ts +57 -0
  96. package/dist/src/core/run-state/runs.js +288 -0
  97. package/dist/src/core/run-state/snapshot.d.ts +14 -0
  98. package/dist/src/core/run-state/snapshot.js +114 -0
  99. package/dist/src/core/run-state/state.d.ts +40 -0
  100. package/dist/src/core/run-state/state.js +164 -0
  101. package/dist/src/core/run-state/types.d.ts +278 -0
  102. package/dist/src/core/run-state/types.js +13 -0
  103. package/dist/src/core/run-state/ulid.d.ts +11 -0
  104. package/dist/src/core/run-state/ulid.js +95 -0
  105. package/dist/src/core/schema-alignment/extractor/index.d.ts +1 -1
  106. package/dist/src/core/schema-alignment/extractor/index.js +2 -2
  107. package/dist/src/core/schema-alignment/extractor/prisma.d.ts +13 -1
  108. package/dist/src/core/schema-alignment/extractor/prisma.js +65 -10
  109. package/dist/src/core/schema-alignment/git-history.d.ts +19 -0
  110. package/dist/src/core/schema-alignment/git-history.js +53 -0
  111. package/dist/src/core/static-rules/rules/brand-tokens.js +2 -2
  112. package/dist/src/core/static-rules/rules/schema-alignment.js +14 -4
  113. package/package.json +2 -1
  114. package/scripts/autoregress.ts +1 -1
  115. package/skills/claude-autopilot.md +1 -1
  116. package/skills/make-interfaces-feel-better/SKILL.md +104 -0
  117. package/skills/simplify-ui/SKILL.md +103 -0
  118. package/skills/ui/SKILL.md +117 -0
  119. package/skills/ui-ux-pro-max/SKILL.md +90 -0
package/CHANGELOG.md CHANGED
@@ -1,14 +1,943 @@
1
- # Changelog
1
+ ## Unreleased
2
2
 
3
- ## v5.3.0 Deploy phase (in flight, not yet shipped)
3
+ - v5.6 Phase 7 (docs reconciliation) pending.
4
+
5
+ ## 6.2.2 — `claude-autopilot autopilot --json` envelope + cache version policy (2026-05-07)
6
+
7
+ **Headline.** Closes out the v6.2.x track. `claude-autopilot autopilot --json` now emits exactly one machine-readable envelope on stdout: successful runs, pre-run failures, and mid-pipeline failures all produce the same shape, so CI consumers can branch on `.exitCode` / `.failedAtPhase` / `.errorCode` directly without parsing stderr NDJSON. The cache contract gains a `MIN_SUPPORTED..MAX_SUPPORTED` schema-version window so a run dir written by a newer binary fails with a clear error instead of an opaque shape crash. The migration guide gets a new "v6.1 → v6.2: one runId across the pipeline" section.
8
+
9
+ **Motivation — Codex review of the v6.2 spec (3 WARNING + 3 NOTE).** The v6.2 orchestrator spec reserved `--json` for v6.2.2; the spec for this PR (Codex 5.3-reviewed) folded back three warnings (strict equality on schemaVersion blocks rolling deploys, exactly-once envelope needs uncaughtException coverage, exit-code taxonomy ambiguous for pre-run failures) and three notes (six-phases vs four-phases migration text, `errorCode` union too loose, stdout purity test under stderr load).
10
+
11
+ **What's in (the 9 deliverables from the spec's "Scope" section).**
12
+
13
+ - **Outer JSON envelope** for `claude-autopilot autopilot --json`. New `AutopilotJsonEnvelope` shape (`version: '1'`, `verb: 'autopilot'`, `runId | null`, `status`, `exitCode`, `phases[]`, `totalCostUSD`, `durationMs`, `errorCode?`, `errorMessage?`, `failedAtPhase?`, `failedPhaseName?`). Pre-run failures get `runId: null` + populated `errorCode`. Mid-pipeline failures get `failedAtPhase` + `failedPhaseName`.
14
+ - **Bounded `AutopilotErrorCode` enum.** Exact strings: `invalid_config | budget_exceeded | lock_held | corrupted_state | partial_write | needs_human | phase_failed | internal_error`. CI consumers can rely on these specific values; new codes ship as minor versions of the envelope schema. Per codex NOTE #5.
15
+ - **Single-write latch + uncaughtException / unhandledRejection handlers.** Module-scoped boolean in `src/cli/json-envelope.ts` flips BEFORE writing so subsequent calls no-op. The orchestrator's `runAutopilotWithJsonEnvelope` installs process-level fatal handlers that consult the latch — if an envelope already shipped, they exit silently; otherwise they emit a fallback `internal_error` envelope before exiting `1`. Test seam `__testInstallProcessHandlers: false` keeps the handlers from leaking across the suite. Per codex WARNING #2.
16
+ - **Deterministic exit-code-to-errorCode mapping** via `computeAutopilotExitCode`. `0` success / `1` `invalid_config | phase_failed | internal_error` / `2` `lock_held | corrupted_state | partial_write` / `78` `budget_exceeded | needs_human`. Per codex WARNING #3.
17
+ - **Cache contract version policy** in `src/core/run-state/state.ts` + the replay path in `events.ts`. New exports `RUN_STATE_MIN_SUPPORTED_SCHEMA_VERSION = 1` and `RUN_STATE_MAX_SUPPORTED_SCHEMA_VERSION = RUN_STATE_SCHEMA_VERSION`. `replayState()` throws `corrupted_state` when the persisted `schema_version` falls outside the window, with a message naming both bounds for operator triage. Future minor versions can additively expand the schema while preserving forward-read compatibility (bump writer, leave reader); major bumps reset `MIN_SUPPORTED` to break with the past explicitly. Per codex WARNING #1.
18
+ - **Migration guide section.** New "v6.1 → v6.2: one runId across the pipeline" section in `docs/v6/migration-guide.md` walks through the per-verb → orchestrator collapse, the `--json` envelope shape (success / pre-run failure / mid-pipeline failure examples), the `AutopilotErrorCode` taxonomy table, and the cache version policy. Flags the v6.2.0 vs v6.2.1 phase-set difference per codex NOTE #4 — examples assume the v6.2.1 6-phase set (`scan → spec → plan → implement → migrate → pr`).
19
+ - **Channel discipline preserved.** The envelope is the only thing on stdout in `--json` mode (orchestrator runs with `__silent: true`). NDJSON events continue to flow to stderr unchanged via the existing v6 Phase 5 helpers.
20
+ - **Dispatcher wiring.** `src/cli/index.ts` plumbs `--json` through to `runAutopilotWithJsonEnvelope`; pre-run validation failures (`--mode`, `--budget`) emit envelopes too so CI never sees free-text errors when `--json` is on.
21
+
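The exit-code mapping listed above can be sketched as a lookup table. This is a minimal illustration, not the package's `computeAutopilotExitCode`: the error-code strings and exit codes are taken from the changelog entry, while `EXIT_CODE_BY_ERROR` and `exitCodeFor` are hypothetical names.

```typescript
// Error-code strings copied from the changelog's AutopilotErrorCode list.
type AutopilotErrorCode =
  | 'invalid_config'
  | 'budget_exceeded'
  | 'lock_held'
  | 'corrupted_state'
  | 'partial_write'
  | 'needs_human'
  | 'phase_failed'
  | 'internal_error';

// Hypothetical table: every error code maps to exactly one non-zero exit code.
const EXIT_CODE_BY_ERROR: Record<AutopilotErrorCode, 1 | 2 | 78> = {
  invalid_config: 1,
  phase_failed: 1,
  internal_error: 1,
  lock_held: 2,
  corrupted_state: 2,
  partial_write: 2,
  budget_exceeded: 78,
  needs_human: 78,
};

// Hypothetical helper: no error code means success (exit 0).
function exitCodeFor(errorCode?: AutopilotErrorCode): number {
  return errorCode === undefined ? 0 : EXIT_CODE_BY_ERROR[errorCode];
}
```

Typing the table as a `Record` over the union means adding a new error code without choosing its exit code is a compile error, which fits the entry's goal of a deterministic mapping.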
22
+ **Tests.** Baseline 1534 → 1548 (+14 net new):
23
+
24
+ - 9 envelope tests in `tests/cli/autopilot-json-envelope.test.ts` covering the 6 spec scenarios (success, pre-run failure, mid-pipeline failure, no-ANSI on stdout, stdout purity under stderr load, single-write latch + uncaughtException) plus 1 latch sanity test and 2 exit-code/enum mapping tests.
25
+ - 5 schema-version range tests in `tests/run-state/state.test.ts` covering the bounds export plus accept-in-range, reject-below-MIN, reject-above-MAX, and message-names-both-bounds.
26
+
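The single-write latch exercised by these tests could look roughly like the sketch below. It assumes only what the entry states (a boolean that flips before the write, so later calls are no-ops). `createEnvelopeWriter` and its injected `write` callback are hypothetical, and the sketch uses a closure where the entry describes a module-scoped boolean in `src/cli/json-envelope.ts`.

```typescript
// Hypothetical factory: returns a writer that emits at most one envelope.
function createEnvelopeWriter(write: (line: string) => void) {
  let written = false; // the latch

  return function writeOnce(envelope: object): boolean {
    if (written) return false; // an envelope already shipped: no-op
    // Flip BEFORE writing so that a throw inside write(), or a fatal handler
    // firing mid-write, can never lead to a second envelope.
    written = true;
    write(JSON.stringify(envelope) + '\n');
    return true;
  };
}
```

A fatal-error handler can call the same `writeOnce` with a fallback `internal_error` envelope. If the normal path has already written, the call returns `false` and nothing further reaches stdout.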
27
+ **Engine-off path unchanged.** The schema-version range check applies inside `replayState()` (engine-on territory). Engine-off invocations don't read run dirs and are byte-for-byte identical to v6.2.1.
28
+
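As a rough illustration of the schema-version window, the check might be written as below. The constant names are shortened stand-ins for the exports named in the entry. Setting both bounds to `1` reflects the entry's statement that v6.x has only one schema version. `assertSchemaVersionSupported` and the error shape are assumptions.

```typescript
// Stand-ins for RUN_STATE_MIN_SUPPORTED_SCHEMA_VERSION / RUN_STATE_MAX_SUPPORTED_SCHEMA_VERSION.
const MIN_SUPPORTED = 1;
const MAX_SUPPORTED = 1; // assumption: equals the current writer version

// Hypothetical guard: rejects any persisted version outside [min, max] and
// names both bounds in the message so an operator can triage quickly.
function assertSchemaVersionSupported(
  schemaVersion: number,
  min: number = MIN_SUPPORTED,
  max: number = MAX_SUPPORTED,
): void {
  const inRange =
    Number.isInteger(schemaVersion) && schemaVersion >= min && schemaVersion <= max;
  if (!inRange) {
    throw Object.assign(
      new Error(
        `schema_version ${schemaVersion} is outside the supported range ${min}..${max}`,
      ),
      { code: 'corrupted_state' },
    );
  }
}
```

A range check, unlike strict equality, lets a newer reader raise `max` while still accepting run dirs written by older binaries. The entry gives this as the reason for the window.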
29
+ **Out of scope (deliberate, see spec for full list).**
30
+ - `--json` envelope on individual wrapped verbs other than `autopilot`. They already emit per-verb envelopes via the v6 Phase 5 helper; no change needed.
31
+ - Streaming JSON (newline-delimited progress events on stdout). v6.3 — would need a major channel-discipline change.
32
+ - Schema migration tooling. v6.x has only one schema version; migration tooling is reserved for the v7 layout change.
33
+
34
+ **Spec.** docs/specs/v6.2.2-json-envelope-and-docs.md (3 WARNING + 3 NOTE folded from the Codex 5.3 review).
35
+
36
+ ## 6.2.1 — Side-effect phase idempotency contracts (`migrate` + `pr`) (2026-05-07)
37
+
38
+ **Headline.** Side-effecting phases now satisfy a registry-enforced two-step contract — record a deterministic "I'm starting this work" breadcrumb BEFORE the side-effect, then one reconciliation ref per durable artifact AFTER. With the contract in place, `migrate` and `pr` enter the orchestrator's `--mode=full` registry, expanding the v6.2.0 `scan → spec → plan → implement` pipeline to the full **6-phase** flow `scan → spec → plan → implement → migrate → pr` under one runId.
39
+
40
+ **Motivation — Codex CRITICAL gate from v6.2.** The v6.2 orchestrator spec flagged side-effect resume as the riskiest property to certify before adding `migrate` or `pr`: a partial crash mid-dispatch could leave the engine blind to applied work, causing the resume preflight to either silently re-run side effects (data loss) or pessimistically refuse every retry (operability tax). v6.2.1 closes the gap with a uniform contract every side-effecting phase must declare AND a registry-time guard that throws if the declaration is missing.
41
+
42
+ **What's in (the 7 deliverables from spec section "Scope of THIS PR").**
43
+
44
+ - **New `migration-batch` ref kind** in `ExternalRefKind` (`src/core/run-state/types.ts`). Documented semantics: "deterministic id covers a planned migration batch; emitted BEFORE dispatch so a partial crash leaves a resume target." Joins `migration-version` (the post-effect reconciliation ref).
45
+ - **`migrate` pre-effect breadcrumb.** `src/cli/migrate.ts` now emits a `migration-batch` ref BEFORE `dispatchFn(input)` — a partial crash leaves the orchestrator a resume target. The post-success `migration-version` refs stay (one per applied migration). Per the v6.2.1 spec, the batch id uses the `${env}:pre-dispatch:${Date.now()}` fallback form because no Delegance migrate skill (Supabase, Rails, Alembic, …) exposes its planned set pre-dispatch — the deterministic-id form `sha256(env+plannedMigrations)` is reserved for a follow-up that adds a planning verb to the skill protocol.
46
+ - **Provider readback for `migration-batch`** in `src/core/run-state/provider-readback.ts`. Queries the dispatcher's ledger for the planned set + applied set, returns `merged` (all applied), `open` (some pending), `failed` (any errored), or `unknown` (fail closed on missing fetcher / throw / null). New `MigrationBatchFetcher` interface + `registerMigrationBatchFetcher` seam alongside the existing `MigrationStateFetcher`.
47
+ - **Registry-time enforcement** in `src/core/run-state/phase-registry.ts`. New `registerPhase()` helper throws `Error: registry: side-effect phase <name> missing idempotency contract` when a `hasSideEffects: true` registration omits `preEffectRefKinds` or `postEffectRefKinds`. Applied to all six entries; the four read-only phases (scan/spec/plan/implement) omit the arrays without complaint.
48
+ - **`buildMigratePhase` and `buildPrPhase` builders** extracted following the v6.2.0 builder pattern (scan/spec/plan/implement). Each verb's existing `runX(options)` continues to delegate to its builder — direct CLI behavior is byte-for-byte identical to v6.2.0. The full registry now has: `scan / spec / plan / implement / migrate / pr`.
49
+ - **Resume preflight in orchestrator** (`src/cli/autopilot.ts` + new `src/core/run-state/resume-preflight.ts`). Before invoking `runPhase` on any side-effecting phase, the orchestrator collects prior `phase.success` + `phase.externalRef` events from `events.ndjson` and routes per the spec decision matrix: all post-effect refs `merged`/`live` → emit synthetic `phase.success` and skip; pre-effect breadcrumb `open` → retry (the phase body's own ledger handles dedup); otherwise → emit `replay.override` + throw `GuardrailError('needs_human')`. New error code `needs_human` joins the taxonomy in `src/core/errors.ts`.
50
+ - **`--mode=full` extended** to 6 phases (`DEFAULT_FULL_PHASES` in `phase-registry.ts`). After v6.2.1, `claude-autopilot autopilot` runs the entire pipeline under one runId — the YC-demo win deferred from v6.2.0.
51
+
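The registry-time guard described in the list above can be sketched as follows. The error message and the `hasSideEffects` / `preEffectRefKinds` / `postEffectRefKinds` field names come from the entry. The interface, the `Map`-based registry, and the `registerPhaseSketch` name are hypothetical simplifications.

```typescript
// Hypothetical, trimmed-down registration shape.
interface PhaseRegistrationSketch {
  name: string;
  hasSideEffects: boolean;
  preEffectRefKinds?: readonly string[];
  postEffectRefKinds?: readonly string[];
}

// Throws at registration time when a side-effecting phase omits either array;
// read-only phases may leave both out.
function registerPhaseSketch(
  registry: Map<string, PhaseRegistrationSketch>,
  phase: PhaseRegistrationSketch,
): void {
  const missingContract =
    phase.preEffectRefKinds === undefined || phase.postEffectRefKinds === undefined;
  if (phase.hasSideEffects && missingContract) {
    throw new Error(
      `registry: side-effect phase ${phase.name} missing idempotency contract`,
    );
  }
  registry.set(phase.name, phase);
}
```

The guard tests for an omitted array rather than an empty one, since the test list for this release treats an empty post-effect array as a valid case.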
52
+ **Tests.** Baseline 1509 → 1532 (+23 net new):
53
+
54
+ - 9 gating tests in `tests/cli/autopilot-side-effect-resume.test.ts` covering the 6 spec scenarios (migrate partial-crash retry, migrate full-success skip, pr-open skip, pr-closed needs-human, registry rejection, run-scope budget no-double-charge) plus 3 edge cases (proceed-fresh, prior success without refs, errored-ledger needs-human).
55
+ - 8 unit tests in `tests/run-state/provider-readback.test.ts` covering the new `migration-batch` readback (merged / open / failed / empty plan / null fetcher / throw / no fetcher / default-registry routing).
56
+ - 2 updated tests in `tests/cli/migrate-engine-smoke.test.ts` to account for the new pre-effect breadcrumb (now `1 + N` refs per run instead of `N`).
57
+ - 4 new test variants for the contract guard (`hasSideEffects: true` with each missing array, plus the empty-postEffect / read-only positive cases).
58
+
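The resume-preflight routing that these tests cover can be approximated by a pure decision function. It follows only the three rules quoted in the entry, checked in the order listed. The `proceed_fresh` branch for a phase with no prior refs, the type names, and `decideResume` itself are assumptions, and the real decision matrix in the spec likely distinguishes more cases.

```typescript
// Readback statuses mentioned in the entry.
type ReadbackStatus = 'merged' | 'live' | 'open' | 'failed' | 'unknown';
type ResumeDecision = 'proceed_fresh' | 'skip' | 'retry' | 'needs_human';

// Hypothetical decision function over the statuses read back for a phase's
// pre-effect breadcrumbs and post-effect reconciliation refs.
function decideResume(
  preEffect: readonly ReadbackStatus[],
  postEffect: readonly ReadbackStatus[],
): ResumeDecision {
  // Assumption: nothing recorded means the phase never started.
  if (preEffect.length === 0 && postEffect.length === 0) return 'proceed_fresh';

  // All durable artifacts confirmed: skip, treating the phase as already done.
  const settled = (s: ReadbackStatus) => s === 'merged' || s === 'live';
  if (postEffect.length > 0 && postEffect.every(settled)) return 'skip';

  // Breadcrumb still open: retry, relying on the phase's own ledger for dedup.
  if (preEffect.length > 0 && preEffect.every((s) => s === 'open')) return 'retry';

  // Anything else is ambiguous; fail closed and ask a human.
  return 'needs_human';
}
```

Keeping the decision pure, with statuses in and a verdict out, separates the provider calls from the routing logic, so each scenario can be tested without a live provider.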
59
+ **Engine-off path unchanged.** Existing `migrate`/`pr` invocations without `--engine` continue byte-for-byte identical. The engine-off escape hatch threads through `executeMigratePhase(input, null)` / `executePrPhase(input, null)`, where a null `ctx` makes `emitExternalRef` a no-op — same precedent as every other wrapped verb.
60
+
61
+ **Out of scope (deliberate, see spec for full list).**
62
+ - Deterministic batch id (`sha256(env + plannedMigrations)`) — requires extracting a `planMigrations()` verb from each migrate skill's protocol. v6.2.x follow-up.
63
+ - `implement`'s `git-remote-push` ref (declared in the spec table but not yet emitted by `implement.ts`). v6.2.x follow-up.
64
+ - Cross-run ref dedup (e.g. recognizing two pre-dispatch breadcrumbs as the same operation across runs). Not needed for orchestrator MVP.
65
+ - Provider readback for non-Delegance migrate skills (Rails, Alembic, …). v6.2.1 ships the contract; per-skill readback is per-skill follow-up work.
66
+
67
+ **Spec.** docs/specs/v6.2.1-side-effect-idempotency.md (Codex CRITICAL gate from v6.2 — folded back as the foundation for this PR).
68
+
69
+ ## 6.2.0 — Multi-phase orchestrator (`claude-autopilot autopilot`) (2026-05-07)
70
+
71
+ **Headline.** New top-level `claude-autopilot autopilot` verb runs `scan → spec → plan → implement` under **one runId**. The pre-v6.2 chain (`scan && spec && plan && implement`) created four separate runs with no parent — the orchestrator collapses them into a single ledger so `claude-autopilot runs watch <id>` covers the whole pipeline and a `--budget=$25` cap ticks down across phases instead of resetting per verb.
72
+
73
+ **What's in.**
74
+ - **`claude-autopilot autopilot [options]`** — sequential N-phase orchestrator. Engine-on REQUIRED (rejected at pre-flight if `--no-engine` / `CLAUDE_AUTOPILOT_ENGINE=off` / `engine.enabled: false`). Lifecycle: `createRun({ phases })` → per-phase `buildPhase + runPhase` → emit `run.complete` exactly once → refresh state snapshot → release lock in `finally`. Non-interactive (a `pause` budget decision becomes hard-fail) so it works in CI without prompting.
75
+ - **`build<Phase>Phase()` builders** extracted from `scan`, `spec`, `plan`, `implement`. Each verb's existing `runX(options)` continues to call its builder internally — direct CLI behavior is byte-for-byte identical to v6.1. Per-verb parity tests (`tests/cli/<verb>-builder-parity.test.ts`) compare stdout / stderr / `events.ndjson` between the legacy entry and the explicit builder + `runPhaseWithLifecycle` path.
76
+ - **Phase registry** at `src/core/run-state/phase-registry.ts`. `as const` + per-entry `satisfies PhaseRegistration<I, O>` preserves per-phase I/O typing through dynamic dispatch (per codex review NOTE #5). `getPhase(name)`, `listPhaseNames()`, and `validatePhaseNames(names)` are the public surface; `--phases=<csv>` validation lives here.
77
+ - **Run-scope budget** — `BudgetConfig.scope: 'phase' | 'run'` (default `'phase'` for back-compat). When `scope === 'run'` the orchestrator's per-phase budget gates resolve against cross-phase `phase.cost` totals so the `$25` demo narrative ticks down across the whole pipeline. `sumPhaseCost(events, '*')` cross-phase overload added. Both `BudgetCheck.scope` and `BudgetCheckEvent.scope` carry the resolution forward to observers (`runs show <id> --events`, future cost dashboards). Per codex review WARNING #2 — pulled forward into v6.2.0 (was deferred to v6.2.2 in the initial draft).
78
+ - **Exit-code matrix** (per codex review WARNING #3) — 0 success, 78 budget_exceeded, 2 engine error (`lock_held` / `corrupted_state` / `partial_write`), 1 everything else. Phase failure wins over finalization error.
79
+ - **CLI surface**: `--mode=full` (default — `scan → spec → plan → implement`), `--phases=<csv>` for custom lists, `--budget=<usd>` for the run-scope cap. `--mode=fix` and `--mode=review` reserved for v6.2.1+; `--json` envelope reserved for v6.2.2.
80
+
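To make the phase-scope versus run-scope distinction concrete, here is a small sketch of a cost sum with a `'*'` wildcard. Only the `phase.cost` event name, the `'*'` argument, and the `'phase' | 'run'` scope values come from the entry. The event shape, the `costUSD` field, and both function names are hypothetical.

```typescript
// Hypothetical event shapes; the real events carry more fields.
interface PhaseCostEvent {
  type: 'phase.cost';
  phase: string;
  costUSD: number;
}
interface OtherPhaseEvent {
  type: 'phase.start' | 'phase.success';
  phase: string;
}
type RunEventSketch = PhaseCostEvent | OtherPhaseEvent;

// Sums phase.cost events for one phase, or for every phase when phase is '*'.
function sumPhaseCostSketch(events: readonly RunEventSketch[], phase: string): number {
  let total = 0;
  for (const event of events) {
    if (event.type === 'phase.cost' && (phase === '*' || event.phase === phase)) {
      total += event.costUSD;
    }
  }
  return total;
}

// Remaining budget under either scope: 'run' counts spend from all phases,
// 'phase' counts only the current phase's spend.
function remainingBudgetUSD(
  events: readonly RunEventSketch[],
  capUSD: number,
  scope: 'phase' | 'run',
  currentPhase: string,
): number {
  return capUSD - sumPhaseCostSketch(events, scope === 'run' ? '*' : currentPhase);
}
```

With `scope: 'run'`, spend recorded by earlier phases reduces what later phases may use, so a single cap ticks down across the whole pipeline.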
81
+ **Tests.** Baseline 1492 → 1509 (+17 new):
82
+ - 4 builder-parity tests (`scan`, `spec`, `plan`, `implement`) covering stdout / stderr / events triple-snapshot.
83
+ - 6 run-scope budget tests in `tests/run-state/budget.test.ts` covering scope flag default, run-scope happy path, run-scope cap exceeded across phases, Layer 1 advisory in run-scope, and phase/run scope math equivalence (regression guard).
84
+ - 7 orchestrator integration tests in `tests/cli/autopilot.test.ts` covering: 3-phase happy path, scan-failure phase 0, run-scope budget exceeded → exit 78, resume lookup `already-complete` short-circuit, `--phases=invalid,scan` → exit 1 invalid_config no run dir, `CLAUDE_AUTOPILOT_ENGINE=off` → exit 1 invalid_config, `cliEngine: false` → exit 1 invalid_config.
85
+
86
+ **Out of scope (deliberate, see spec for full list).**
87
+ - `migrate`, `pr` — gated on per-phase idempotency contracts (preflight readback + externalRef recorded BEFORE side-effect). v6.2.1.
88
+ - `--mode=fix`, `--mode=review` — v6.2.1+.
89
+ - `--json` envelope — v6.2.2.
90
+ - Parallel phase execution. Sequential by design.
91
+ - Interactive prompts inside the orchestrator. CI/scripts get deterministic exit codes; pause budget decisions hard-fail.
92
+
93
+ **Spec.** docs/specs/v6.2-multi-phase-orchestrator.md (Codex-reviewed: 1 CRITICAL + 3 WARNING + 3 NOTE folded back into the spec before implementation).
94
+
95
+ ## 6.1.0 — Default flip: engine on by default + `--no-engine` deprecated (2026-05-07)
96
+
97
+ **Headline.** The Run State Engine is now ON by default. Bare
98
+ `claude-autopilot <verb>` invocations create a `.guardrail-cache/runs/<ulid>/`
99
+ directory, emit typed NDJSON events on stderr, apply budget gates if
100
+ `budgets:` is configured, and write a state snapshot — without any opt-in
101
+ config. v6.0 shipped the engine OFF behind an explicit `engine.enabled: true`
102
+ opt-in to give users control during a stabilization window; v6.1 closes
103
+ that window.
104
+
105
+ **Motivation — v6.0 stabilization criteria met.**
106
+ - 10 of 10 pipeline phases wrapped through `runPhaseWithLifecycle`
107
+ (`scan` v6.0.1, `costs`/`fix` v6.0.2, `brainstorm`/`spec` v6.0.3,
108
+ `plan`/`review` v6.0.4, `validate` v6.0.5, `implement` v6.0.7,
109
+ `migrate` v6.0.8 — first side-effecting wrap with `migration-version`
110
+ externalRefs, `pr` v6.0.9 — second side-effecting wrap with `github-pr`
111
+ externalRefs).
112
+ - Lifecycle helper extracted (v6.0.6) so all 10 wraps share the same
113
+ byte-for-byte engine-on / engine-off behavior.
114
+ - Side-effecting wraps proven (`migrate` + `pr`) — externalRef ledger
115
+ + provider readback semantics exercised end-to-end.
116
+ - Live adapter cert suite green (Vercel + Fly + Render).
117
+ - `runs watch <id>` live cost/budget meter shipped (this release's
118
+ `v6.1.0-pre` entry below) — the YC-demo moment for the events stream.
119
+ - `npm test` baseline: 1469 → 1492 (+23 net new this release; all green).
120
+
121
+ **Deprecation.** `--no-engine`, `CLAUDE_AUTOPILOT_ENGINE=off|false|0|no`,
122
+ and `engine.enabled: false` continue to work as the legacy escape hatch
123
+ in v6.1.x. Each invocation that resolves to engine-off via one of those
124
+ explicit opt-outs now prints a single-line stderr deprecation notice:
125
+
126
+ ```
127
+ [deprecation] --no-engine / engine.enabled: false will be removed in v7. Migrate to engine-on (default).
128
+ ```
129
+
130
+ The notice fires only on user-driven opt-outs (`source: 'cli' | 'env' |
131
+ 'config'`); the new (engine-on) default never trips it. **v7 removes
132
+ the escape hatch** — `engine.enabled: false` becomes a config validation
133
+ error and `--no-engine` / `CLAUDE_AUTOPILOT_ENGINE=off` are silently
134
+ ignored.
135
+
136
+ **Spec.** [`docs/specs/v6.1-default-flip.md`](docs/specs/v6.1-default-flip.md)
137
+ is the canonical reference for what flipped, why, and the v7 follow-up.
138
+
139
+ **Migration tips.**
140
+ - If your CI parses stderr as free-form text and relies on the v5.x
141
+ shape, set `CLAUDE_AUTOPILOT_ENGINE=off` (or pass `--no-engine`)
142
+ to pin the legacy behavior. You'll see the deprecation notice on
143
+ every invocation until you remove it — that's expected.
144
+ - If you opt out via config (`engine.enabled: false`), the same notice
145
+ fires on every invocation. Plan to remove that line before bumping
146
+ to v7.
147
+ - Existing users on `engine.enabled: true` are no-op'd — your config
148
+ still wins via the same precedence rules.
149
+ - See [`docs/v6/migration-guide.md#migrating-from-v60-to-v61`](docs/v6/migration-guide.md)
150
+ for the full upgrade walkthrough.
151
+
152
+ **Test surface.**
153
+ - `tests/run-state/resolve-engine.test.ts` — flipped 4 default-related
154
+ cases. New `v6.1 default-flip` describe block + `v6.1 deprecation
155
+ warning` describe block covering the predicate, the emitter, the
156
+ default `process.stderr` branch, and the `builtInDefault` override
157
+ path.
158
+ - `tests/run-state/run-phase-with-lifecycle.test.ts` — added 4 new
159
+ cases pinning engine-on as the new default + the deprecation banner
160
+ firing on opt-out / staying silent on the new default.
161
+ - 9 engine-smoke tests (`brainstorm`, `costs`, `implement`, `migrate`,
162
+ `plan`, `pr`, `review`, `spec`, `validate`) updated — the
163
+ "engine off (default)" cases are now "engine on (v6.1 default)";
164
+ the matching `cliEngine: false` cases stay as legacy-escape-hatch
165
+ coverage.
166
+
167
+ **Files changed.**
168
+ - `src/core/run-state/resolve-engine.ts` — new active default constant
169
+ `ENGINE_DEFAULT_V6_1 = true`. The deprecated `ENGINE_DEFAULT_V6_0`
170
+ export keeps its historical value (`false`) so out-of-tree consumers
171
+ who pinned that symbol get what the name promises; both constants are
172
+ removed in v7. New `emitEngineOffDeprecationWarning` helper +
173
+ `shouldWarnEngineOffDeprecation` predicate +
174
+ `ENGINE_OFF_DEPRECATION_MESSAGE` stable copy.
175
+ - `src/core/run-state/run-phase-with-lifecycle.ts` — wires the
176
+ deprecation helper into the engine-off branch.
177
+ - `docs/v6/migration-guide.md` — new "Migrating from v6.0 to v6.1"
178
+ section, updated precedence matrix, refreshed default-flip plan,
179
+ relabeled "What changes" table.
180
+ - `README.md` — v6 section updated (engine on by default + v7 removal
181
+ timeline).
182
+ - `package.json` — version `5.5.2` → `6.1.0`.
183
+
184
+ ## v6.1.0-pre — `runs watch <id>` live cost meter (2026-05-07)
185
+
186
+ **The YC-demo moment.** v6.0.x hardened the events.ndjson stream across
187
+ all 10 wrapped phases; v6.1 makes that stream visible in real time.
188
+ `runs watch <runId>` tails events.ndjson via `fs.watchFile` (1s poll —
189
+ inotify/FSEvents are unreliable for tiny appends across our matrix) and
190
+ pretty-renders each event with a running cost/budget meter so a user
191
+ running `claude-autopilot autopilot ...` in one terminal can `runs watch`
192
+ in another and watch their $25 budget tick down while phases ship code.
193
+
194
+ **Demo transcript.** Live tail of a fixture run, ANSI-stripped:
195
+
196
+ ```
197
+ * run 01HZK7P3D8Q9V00000000000AB
198
+ phases: spec -> plan -> implement -> pr
199
+ budget: $0.00 / $25.00 (0%)
200
+ [12:00:01] phase.start spec
201
+ [12:00:42] phase.cost spec +$0.07 (in: 1.2k, out: 3.4k) total: $0.07
202
+ [12:00:45] phase.success spec OK 44.2s
203
+ [12:00:46] phase.start plan
204
+ [12:01:12] phase.cost plan +$0.21 (in: 4.1k, out: 8.2k) total: $0.28
205
+ [12:01:15] phase.success plan OK 29.0s
206
+ [12:08:33] phase.externalRef pr -> github-pr#123
207
+ [12:08:34] run.complete status=success totalCostUSD=$4.20 duration=8m32s
208
+
209
+ done run 01HZK7P3D8Q9V00000000000AB
210
+ status=success totalCostUSD=$4.20 duration=8m33s
211
+ ```
212
+
213
+ **Modes.**
214
+
215
+ - `runs watch <id>` — live tail, exits on `run.complete` / Ctrl-C
216
+ - `runs watch <id> --since <seq>` — replay forward from a specific seq
217
+ (resume after disconnect)
218
+ - `runs watch <id> --no-follow` — render snapshot once and exit (CI /
219
+ scripting)
220
+ - `runs watch <id> --json` — emit raw NDJSON to stdout (one event per
221
+ line) for piping to `jq` or external dashboards. ANSI suppressed.
222
+ - `runs watch <id> --no-color` — force ANSI off even on a TTY
223
+
224
+ **Pretty rendering.** Color thresholds on the budget bar — green <50%,
225
+ yellow 50-90%, red >90%. Per-event coloring: cyan for phase.start, yellow
226
+ for phase.cost, green for phase.success, red for phase.failed, magenta
227
+ for phase.externalRef + lock.takeover + replay.override, bold-green for
228
+ run.complete success, bold-red for run.complete failed/aborted. ANSI
229
+ auto-strips when stdout is not a TTY (CI), when `--no-color` or `--json`
230
+ is set, or when `NO_COLOR` env var is present.
231
+
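The budget-bar thresholds above reduce to a small pure function. This sketch assumes the boundaries are read as "below 50% is green, 50% through 90% inclusive is yellow, above 90% is red". `budgetBarColor` is a hypothetical name and is not taken from `runs-watch-renderer.ts`.

```typescript
type BarColor = 'green' | 'yellow' | 'red';

// Hypothetical helper: picks the bar color from spend relative to the cap.
function budgetBarColor(spentUSD: number, capUSD: number): BarColor {
  const ratio = spentUSD / capUSD;
  if (ratio < 0.5) return 'green';   // under 50%
  if (ratio <= 0.9) return 'yellow'; // 50% through 90%
  return 'red';                      // over 90%
}
```

Because the function has no I/O and no ANSI handling, tests can check the result with plain string equality, in line with the renderer's pure design.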
232
+ **Pure renderer.** `src/cli/runs-watch-renderer.ts` is referentially
233
+ transparent — `renderEventLine(event, runningTotal, opts)` is the core
234
+ primitive, exported and 100% pure. Tests run as string-equality
235
+ assertions in <300ms.
236
+
237
+ **Engine modules untouched.** This is purely a consumer of the existing
238
+ event stream — no changes to `src/core/run-state/**`, no changes to the
239
+ 10 wrapped phase verbs, no changes to `runPhaseWithLifecycle`.
240
+
241
+ **Tests.** +43 new tests:
242
+ - `tests/cli/runs-watch-renderer.test.ts` — 29 pure-renderer cases
243
+ covering every event-line variant, the three budget-bar color
244
+ thresholds, ANSI on/off symmetry, and the final-summary block
245
+ - `tests/cli/runs-watch.test.ts` — 14 verb-level cases covering
246
+ `--no-follow` snapshot, `--since` replay, `--json` mode, run-not-found
247
+ (exit 2), invalid-ULID, live tail picks up appended events,
248
+ budget rendering with/without `BudgetConfig`, plural `budgets` config
249
+ alias, ANSI behavior, and run-complete short-circuit on already-
250
+ terminated runs
251
+
252
+ **CLI plumbing.** New sub-verb on the `runs` umbrella: `runs watch <id>`.
253
+ Help block surfaces `--since`, `--no-follow`, `--json`, `--no-color`
254
+ plus a behavior summary + exit-code key. Exit codes: 0 success / clean
255
+ exit, 1 invalid input or stream error, 2 not_found.
256
+
257
+ ## v6.0.9 — wrap `pr` through `runPhaseWithLifecycle` (2026-05-06)
258
+
259
+ **First side-effecting phase wrapped.** v6.0.1 → v6.0.5 wrapped read-only
260
+ verbs (`scan`, `costs`, `fix`, `brainstorm`, `spec`, `plan`, `review`,
261
+ `validate`); v6.0.6 extracted the lifecycle helper. v6.0.9 wraps `pr` —
262
+ the first verb that mutates state on the platform of record (GitHub
263
+ issue comments + PR reviews). This proves the helper's `ctx.emitExternalRef`
264
+ plumbing for genuinely side-effecting phases without any helper-shape
265
+ changes.
266
+
267
+ **Declarations.** Match the v6 spec table exactly:
268
+
269
+ - `idempotent: false` — re-running posts a NEW PR review ID each time
270
+ (`postReviewComments` dismisses prior + creates new). PR comment
271
+ posting (`postPrComment`) is marker-deduped on the body but the
272
+ underlying `gh` API call is still mutating.
273
+ - `hasSideEffects: true` — posts to GitHub via the `gh` CLI inside the
274
+ inner `runCommand` invocation.
275
+ - `externalRefs: github-pr` — recorded BEFORE the inner `runCommand`
276
+ runs so a crash mid-pipeline still leaves a breadcrumb pointing at
277
+ the PR. The engine path's Phase 6 resume logic can `gh pr view <id>`
278
+ to confirm the PR is still open before deciding whether a replay
279
+ is safe.
280
+
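A rough sketch of the ordering described above, where the `github-pr` externalRef is recorded before the inner command runs. The `PhaseDecl`, `PhaseCtx`, and `makePrPhase` shapes are assumptions made for illustration (the real `RunPhase` type is not shown in this changelog), and the phase body is synchronous here only to keep the example short.

```typescript
// Hypothetical shapes; the real RunPhase / ctx types are not shown in the changelog.
interface ExternalRef { kind: string; id: string; provider?: string }
interface PhaseCtx { emitExternalRef(ref: ExternalRef): void }
interface PhaseDecl<I, O> {
  name: string;
  idempotent: boolean;
  hasSideEffects: boolean;
  run(input: I, ctx: PhaseCtx): O;
}

// Stand-in for the inner review pipeline; returns a process-style exit code.
type RunCommand = () => number;

function makePrPhase(runCommand: RunCommand): PhaseDecl<{ prNumber: number }, { exitCode: number }> {
  return {
    name: 'pr',
    idempotent: false,    // each re-run posts a new PR review
    hasSideEffects: true, // posts to GitHub
    run(input, ctx) {
      // Record the breadcrumb first, so a crash inside runCommand still
      // leaves a pointer to the PR in the run's event log.
      ctx.emitExternalRef({ kind: 'github-pr', id: String(input.prNumber), provider: 'github' });
      // A non-zero exit code is returned as data, not thrown: a failing
      // pipeline result is not the same thing as a failed engine phase.
      return { exitCode: runCommand() };
    },
  };
}
```

With this ordering, a resume step that reads the recorded ref can check whether the PR is still open before deciding whether a replay is safe.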
281
+ **Engine-off byte-for-byte unchanged.** All `gh pr view` + `git fetch` +
282
+ `runCommand` behavior preserved. The wrap adds two test seams
283
+ (`__testPrMeta` to short-circuit PR metadata lookup, `__testRunCommand`
284
+ to stub the inner pipeline) so the smoke test exercises the engine
285
+ lifecycle without `gh` or a real review pipeline. Production callers
286
+ must not pass these — they're documented "test only" with a comment
287
+ mirroring scan / fix's `__testReviewEngine` precedent.
288
+
289
+ **CLI plumbing.** The `pr` dispatcher arm now threads `cliEngine` from
290
+ `parseEngineCliFlag()` and `envEngine` from
291
+ `process.env.CLAUDE_AUTOPILOT_ENGINE`, mirroring every other wrapped
292
+ verb. The per-verb help block (`claude-autopilot help pr`) gains
293
+ `--engine` / `--no-engine` lines plus a side-effects note (engine-on
294
+ records a `github-pr` externalRef; future replays gate on the spec's
295
+ "side-effect readback" rule). `GLOBAL_FLAGS_BLOCK` adds "v6.0.9: wired
296
+ for `pr`" to its breadcrumb list.
297
+
298
+ **Smoke test.** New `tests/cli/pr-engine-smoke.test.ts`, 6 cases:
299
+ - engine off (default): no run dir / no engine artifacts; runCommand
300
+ still invoked
301
+ - engine off (`cliEngine: false`): no run dir
302
+ - engine on (`--engine`): state.json + events.ndjson + lifecycle in
303
+ order (run.start → phase.start → phase.externalRef → phase.success
304
+ → run.complete); externalRef recorded with kind=`github-pr`,
305
+ id=`42`, provider=`github`; `idempotent: false, hasSideEffects: true`
306
+ reflected on the phase
307
+ - env precedence (`CLAUDE_AUTOPILOT_ENGINE=on` without CLI flag)
308
+ - CLI override (`--no-engine` beats env on)
309
+ - runCommand returning 1 surfaces as verb exit 1 WITHOUT marking the
310
+ engine phase as failed (pipeline result ≠ phase failure, same
311
+ precedent as scan)
312
+
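The lifecycle assertion in the engine-on case amounts to comparing an ordered list of event names read from `events.ndjson`. A minimal sketch follows, assuming each NDJSON line carries an `event` name and a numeric `seq`. Those field names are assumptions, since the changelog does not show the on-disk event shape.

```typescript
// Assumed event shape: one JSON object per line with `seq` and `event` fields.
interface RunEvent { seq: number; event: string }

// Parse NDJSON text (one JSON document per non-empty line).
function parseNdjson(text: string): RunEvent[] {
  return text
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line) as RunEvent);
}

// True when the event names match the expected lifecycle exactly, in order.
function lifecycleMatches(events: RunEvent[], expected: string[]): boolean {
  const names = events.map((e) => e.event);
  return names.length === expected.length && names.every((n, i) => n === expected[i]);
}

// True when every seq is strictly greater than the one before it.
function seqIsMonotonic(events: RunEvent[]): boolean {
  return events.every((e, i) => i === 0 || e.seq > (events[i - 1] as RunEvent).seq);
}

// Lifecycle the `pr` engine-on smoke case expects, per the list above.
const PR_LIFECYCLE = [
  'run.start',
  'phase.start',
  'phase.externalRef',
  'phase.success',
  'run.complete',
];
```

A smoke test can then read the run directory's `events.ndjson`, pass the text through `parseNdjson`, and assert both `lifecycleMatches(events, PR_LIFECYCLE)` and `seqIsMonotonic(events)`.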
313
+ **Why no follow-up `github-comment` externalRef yet.** A potential
314
+ extension is to record one externalRef per posted comment / review
315
+ (`github-comment`). That requires plumbing the post-comment URL out
316
+ of `runCommand` (currently only logged) — deferred to a follow-up PR.
317
+ For v6.0.9 the `github-pr` ref is sufficient for the spec's readback
318
+ rule: a Phase 6 resume can verify the PR is still open before
319
+ deciding whether to retry.
320
+
321
+ **Files changed.** `src/cli/pr.ts` (270 insertions / 22 deletions),
322
+ `src/cli/index.ts` (+12 lines for engine knob plumbing),
323
+ `src/cli/help-text.ts` (+8 lines for the per-verb Options block +
324
+ breadcrumb), `tests/cli/pr-engine-smoke.test.ts` (new, 306 lines),
325
+ `docs/v6/wrapping-pipeline-phases.md` (status header + table row +
326
+ deviation note), `docs/v6/migration-guide.md` ("what works today" list
327
+ adds `pr`), `docs/specs/v6-run-state-engine.md` (reconciliation block
328
+ appended). Total: ~600 lines added, ~25 lines removed.
329
+
330
+ **Status after v6.0.9.** Nine of 10 phases wrapped. Remaining:
331
+ `implement` (v6.0.7) and `migrate` (v6.0.8) — both side-effecting,
332
+ both wrapped concurrently with this PR by parallel agents.
333
+ - **Bundled UI polish skills** — ships `/ui`, `/simplify-ui`, `/ui-ux-pro-max`,
334
+ `/make-interfaces-feel-better` so consumers get them via `npm install` instead
335
+ of needing user-level skill installs. `/ui` runs the chained pass (audit →
336
+ simplify → align → polish); the other three are individual lenses. Auto-
337
+ discovered via the existing `skills/` directory in the package `files`
338
+ allowlist. Pairs with the design context loader
339
+ (`src/core/ui/design-context-loader.ts`) — both gate on the same
340
+ `hasFrontendFiles()` predicate so they only fire when frontend files change.
341
+
342
+ ## v6.0.7 — wrap `implement` through `runPhaseWithLifecycle` (2026-05-07)
343
+
344
+ **Wraps the ninth pipeline phase.** Mechanical wrap following the v6.0.6
345
+ helper recipe. Engine-off path is byte-for-byte unchanged (advisory print
346
+ pointing at the Claude Code `claude-autopilot` skill); engine-on path
347
+ creates a run dir + emits run.start / phase.start / phase.success /
348
+ run.complete events. Concurrent dispatch — landed alongside v6.0.8
349
+ (`migrate`) and v6.0.9 (`pr`).
350
+
351
+ - New `src/cli/implement.ts` — `RunPhase<ImplementInput, ImplementOutput>`
352
+ with `idempotent: true, hasSideEffects: false`. **Documented deviation
353
+ from spec table:** the spec at line 159 of
354
+ `docs/specs/v6-run-state-engine.md` lists `implement` with
355
+ `idempotent: partial, hasSideEffects: yes, externalRefs: git-remote-push`.
356
+ That declaration assumes the verb itself writes commits and pushes them
357
+ to a remote. The v6.0.7 CLI verb does **not** write code, run tests,
358
+ commit, or push to a remote — all of that lives in the Claude Code
359
+ `claude-autopilot` skill (and its delegates: `subagent-driven-development`,
360
+ `commit-push-pr`, `using-git-worktrees`). The CLI verb is the engine-wrap
361
+ shell — its only side effect is writing the local
362
+ `.guardrail-cache/implement/<ts>-implement.md` log stub. If a future PR
363
+ inlines the implement loop into the CLI verb, the declarations flip to
364
+ match the spec table and a `ctx.emitExternalRef({ kind: 'git-remote-push',
365
+ id: '<commit-sha>' })` call lands after each push.
366
+ - CLI dispatcher in `src/cli/index.ts` — wires `--engine` / `--no-engine` /
367
+ `--context` / `--plan` / `--output` / `--config` through the helper
368
+ alongside `process.env.CLAUDE_AUTOPILOT_ENGINE`. Mirrors the validate /
369
+ review / plan dispatcher shape.
370
+ - Help text in `src/cli/help-text.ts` — adds `implement` to the Pipeline
371
+ group + per-verb Options block. Bumps `GLOBAL_FLAGS_BLOCK` to cite
372
+ v6.0.7 alongside v6.0.1 → v6.0.5.
373
+ - New smoke test `tests/cli/implement-engine-smoke.test.ts` (6 cases) —
374
+ asserts state.json + events.ndjson lifecycle, idempotent /
375
+ hasSideEffects flags, env / CLI precedence, log file location.
376
+ - Test count: 1408 → 1414 (+6). `npm test` clean. `npx tsc --noEmit`
377
+ clean except pre-existing fixture errors.
378
+
379
+ ## v6.0.8 — wrap `migrate` through `runPhaseWithLifecycle` (2026-05-06)
380
+
381
+ **First side-effecting phase under the engine.** v6.0.1 → v6.0.6 wrapped
382
+ eight read-only / advisory verbs (`scan`, `costs`, `fix`, `brainstorm`,
383
+ `spec`, `plan`, `review`, `validate`). v6.0.8 wraps `migrate` — the
384
+ first verb that mutates external state (database schema). Builds on the
385
+ `runPhaseWithLifecycle` helper landed in v6.0.6 plus
386
+ `ctx.emitExternalRef()` from inside the phase body for the
387
+ `migration-version` ledger. No helper-shape changes needed.
388
+
389
+ **Phase declarations** match the spec table at line 162 of
390
+ `docs/specs/v6-run-state-engine.md`:
391
+
392
+ ```
393
+ idempotent: false — dispatcher output varies by ledger state
394
+ (N applied on attempt 1, 0 on attempt 2 even
395
+ though both are operationally safe)
396
+ hasSideEffects: true — applies migrations, writes audit log,
397
+ regenerates types, refreshes schema cache
398
+ externalRefs: migration-version, scoped `<env>:<name>` per applied
399
+ migration. Phase 6's resume gate will read these back
400
+ against the live `migration_state` to decide
401
+ skip-already-applied vs retry vs needs-human.
402
+ ```
403
+
404
+ **Why `idempotent: false` even though the underlying Delegance migrate
405
+ skill is ledger-guarded against double-apply:** at the *engine
406
+ semantics* layer, `idempotent: true` means "re-running the phase against
407
+ the same input produces equivalent output." A dispatch invocation that
408
+ previously applied N migrations on attempt 1 and applies 0 on attempt 2
409
+ (everything already in the ledger) DOES produce different output
410
+ (different `appliedMigrations` list, different `status`). The spec's
411
+ `idempotent: false` is correct.
412
+
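The idempotency argument above can be made concrete with a small in-memory model. In this sketch `makeFakeDispatcher`, `refsForApplied`, and the `DispatchResult` status values are hypothetical. Only the `appliedMigrations` and `status` field names and the `<env>:<name>` scoping come from the changelog.

```typescript
// Assumed result shape; the changelog names `appliedMigrations` and `status`.
interface DispatchResult {
  status: 'applied' | 'skipped';
  appliedMigrations: string[];
}
interface MigrationRef { kind: 'migration-version'; id: string }

// Fake ledger-guarded dispatcher: applies whatever the ledger has not seen yet.
function makeFakeDispatcher(pending: string[]): () => DispatchResult {
  const ledger = new Set<string>();
  return () => {
    const toApply = pending.filter((name) => !ledger.has(name));
    toApply.forEach((name) => ledger.add(name));
    return {
      status: toApply.length > 0 ? 'applied' : 'skipped',
      appliedMigrations: toApply,
    };
  };
}

// One externalRef per applied migration, scoped as `<env>:<name>`.
function refsForApplied(env: string, result: DispatchResult): MigrationRef[] {
  return result.appliedMigrations.map(
    (name): MigrationRef => ({ kind: 'migration-version', id: `${env}:${name}` }),
  );
}

const dispatch = makeFakeDispatcher(['001_init', '002_add_index']);
const attempt1 = dispatch(); // applies both migrations
const attempt2 = dispatch(); // ledger already has both, so nothing is applied
```

Both attempts are operationally safe, yet `attempt1` and `attempt2` differ in `status` and `appliedMigrations`, which is the sense in which the phase is declared `idempotent: false`.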
413
+ **Engine-off path is byte-for-byte identical to v6.0.7.** Same dispatch
414
+ shape (`src/core/migrate/dispatcher.ts` unchanged), same render lines,
415
+ same `--json` payload callback. CI / scripts that don't pass `--engine`
416
+ are unaffected.
417
+
418
+ | File | Role |
419
+ |---|---|
420
+ | `src/cli/migrate.ts` (new) | Engine-wrap shell calling `runMigrate(opts) → { exitCode, result }`. Defines `MigrateInput` / `MigrateOutput` (JSON-serializable), `RunPhase<MigrateInput, MigrateOutput>` with `name: 'migrate'`, `idempotent: false`, `hasSideEffects: true`. Phase body invokes the dispatcher and emits one `migration-version` externalRef per applied migration via `ctx.emitExternalRef({ kind: 'migration-version', id: '<env>:<name>' })`. Test seam: `__testDispatch` injects a fake dispatcher so smoke tests can exercise the engine-wrap path without spawning a child process or hitting a real database |
421
+ | `src/cli/index.ts` | dispatcher case for `migrate` routes through `runMigrate` instead of inlining `runMigrateDispatch`; threads `cliEngine` + `envEngine`. Engine-off byte-for-byte unchanged — same `--json` payload callback, same render |
422
+ | `src/cli/help-text.ts` | per-verb Options block for `migrate` documents `--engine` / `--no-engine` + `--config`; GLOBAL_FLAGS_BLOCK breadcrumb cites v6.0.8 |
423
+ | `tests/cli/migrate-engine-smoke.test.ts` (new) | 6 cases: engine off (default — no run dir), engine on (lifecycle events, state.json shape, idempotent: false + hasSideEffects: true declaration), externalRef emission per applied migration scoped by env, skipped status (zero externalRefs), dispatcher error → exit 1 + engine still records phase.success (domain failure ≠ engine failure), CLI `--no-engine` beats env on |
424
+ | `docs/v6/wrapping-pipeline-phases.md` | phase-status table flips `migrate` to "WRAPPED in v6.0.8"; status line at top moves to "NINE phases wrapped"; new deviation note documents the ledger-vs-engine-semantics rationale |
425
+ | `docs/v6/migration-guide.md` | "What works today" updated — three knobs now honored by `scan`, `costs`, `fix`, `brainstorm`, `spec`, `plan`, `review`, `validate`, `migrate` |
426
+ | `docs/specs/v6-run-state-engine.md` | new "What was actually built (v6.0.8)" reconciliation block |
427
+
428
+ **Test delta:** 1408 → 1414 (+6). Typecheck clean. All 1408 existing
429
+ tests pass unchanged — the engine-off path for `migrate` is byte-for-
430
+ byte identical to v6.0.7 (same dispatch shape, same render).
431
+
432
+ **Concurrency note.** v6.0.7 (`implement`) and v6.0.9 (`pr`) are in
433
+ flight on parallel worktrees, both targeting shared docs (CHANGELOG,
434
+ recipe table, migration-guide) and `src/cli/{index,help-text}.ts`. The
435
+ rebase contract: on push rejection, fetch + rebase + resolve conflicts
436
+ keeping all wraps' contributions, re-test, push with `--force-with-lease`.
437
+
438
+ **Not done in v6.0.8 — explicit non-goals:**
439
+ - Wrapping `implement` and `pr`. Continues across v6.0.7 / v6.0.9
440
+ using the same helper plus `ctx.emitExternalRef()` for
441
+ `git-remote-push` (implement) and `github-pr` (pr).
442
+ - Wiring Phase 6's `migration_state` read-back. The engine PERSISTS
443
+ `migration-version` externalRefs in v6.0.8; consulting them on
444
+ resume ships in Phase 6+. Until then, retries on side-effecting
445
+ phases require `--force-replay`.
446
+ - Multi-phase pipeline orchestrator (autopilot's full
447
+ `brainstorm → spec → plan → ... → migrate → ...` flow under one runId).
448
+ - Flipping the v6.0 built-in default to ON. v6.1 territory.
449
+
450
+ ## v6.0.6 — `runPhaseWithLifecycle` helper (2026-05-06)
451
+
452
+ **Tech-debt refactor, no behavior change.** v6.0.1 → v6.0.5 wrapped eight
453
+ CLI verbs (`scan`, `costs`, `fix`, `brainstorm`, `spec`, `plan`, `review`,
454
+ `validate`) by hand-rolling the same ~100-line lifecycle pattern in each
455
+ file: `createRun → optional run.warning → runPhase → run.complete →
456
+ state.json refresh → best-effort lock release in finally`. Bugbot caught
457
+ the duplication on PR #97 (LOW severity, deferred) with the explicit
458
+ note: "extracting from 5 of 10 examples risks getting the abstraction
459
+ wrong; from 10 of 10 the pattern is fully evidenced." At 8 of 10, the
460
+ pattern is sufficiently evidenced that the remaining three side-effecting
461
+ phases (`implement`, `migrate`, `pr`) can use the same helper plus
462
+ `ctx.emitExternalRef()` from inside their phase body — no helper-shape
463
+ changes needed.
464
+
465
+ **The helper.** New `src/core/run-state/run-phase-with-lifecycle.ts` sits
466
+ on top of the existing `runPhase()` API (which is unchanged). Callers
467
+ continue to define their own `RunPhase<I, O>` with per-phase
468
+ `idempotent` / `hasSideEffects` / `run`, and pass it in alongside the
469
+ input, the loaded config, the engine knobs, and a `runEngineOff`
470
+ escape-hatch callback. The helper:
471
+
472
+ - Resolves engine on/off via the canonical CLI > env > config > default
473
+ precedence
474
+ - On engine-off: invokes `runEngineOff()` and returns its result with
475
+ `runId/runDir: null`
476
+ - On engine-on: creates a run dir, optionally emits `run.warning` for
477
+ invalid env, runs the phase, emits `run.complete` (success or failed),
478
+ refreshes `state.json` from replayed events, releases the lock in
479
+ `finally` (idempotent), and returns `{ output, runId, runDir }`
480
+ - On phase failure: emits `run.complete` with `status: 'failed'`, prints
481
+ the legacy `[<phase>] engine: phase failed — <msg>` banner to stderr
482
+ byte-for-byte, releases the lock, and re-throws
483
+
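The helper's control flow, as listed above, can be approximated with an in-memory stand-in. This is a simplified, synchronous sketch and not the contents of `run-phase-with-lifecycle.ts`: `runWithLifecycleSketch`, the `log` array (standing in for `events.ndjson`), the `lock` object, and the fixed run id are all hypothetical. It also omits the `run.warning`, `state.json` refresh, and stderr banner steps.

```typescript
// Hypothetical, simplified shapes for illustration only.
type EngineEvent = { event: string; phase?: string; status?: 'success' | 'failed' };

interface SketchPhase<I, O> {
  name: string;
  idempotent: boolean;
  hasSideEffects: boolean;
  run(input: I): O;
}

interface LifecycleOpts<I, O> {
  phase: SketchPhase<I, O>;
  input: I;
  engineEnabled: boolean;   // assumed already resolved via CLI > env > config > default
  runEngineOff: () => O;    // legacy path, invoked untouched when the engine is off
  log: EngineEvent[];       // stands in for events.ndjson
  lock: { held: boolean };  // stands in for the run lock
}

function runWithLifecycleSketch<I, O>(opts: LifecycleOpts<I, O>): { output: O; runId: string | null } {
  if (!opts.engineEnabled) {
    // Engine off: no run dir, no events, just the legacy result.
    return { output: opts.runEngineOff(), runId: null };
  }
  const runId = 'run-sketch-1'; // placeholder; a real run id would be generated
  opts.lock.held = true;
  opts.log.push({ event: 'run.start' });
  try {
    opts.log.push({ event: 'phase.start', phase: opts.phase.name });
    const output = opts.phase.run(opts.input);
    opts.log.push({ event: 'phase.success', phase: opts.phase.name });
    opts.log.push({ event: 'run.complete', status: 'success' });
    return { output, runId };
  } catch (err) {
    // Phase failure: record it, then re-throw the original error.
    opts.log.push({ event: 'phase.failed', phase: opts.phase.name });
    opts.log.push({ event: 'run.complete', status: 'failed' });
    throw err;
  } finally {
    // Lock release runs on both the success and the failure path.
    opts.lock.held = false;
  }
}
```

Putting the lock release in `finally` is what lets the failure path re-throw without leaking the lock, and the engine-off branch returning before any event is written is what keeps the legacy path unchanged.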
484
+ **Migrated phases.** All eight wrapped verbs reduced. Each `runX(opts)`
485
+ function shrinks: keep the per-phase `RunPhase<I, O>` definition + the
486
+ engine-off path body; delete the lifecycle boilerplate; call
487
+ `runPhaseWithLifecycle` once. Total reduction across `src/cli/`:
488
+
489
+ - `scan.ts` 498 → 429 lines (-69)
490
+ - `costs.ts` 297 → 231 lines (-66)
491
+ - `fix.ts` 473 → 415 lines (-58)
492
+ - `brainstorm.ts` 251 → 189 lines (-62)
493
+ - `spec.ts` 216 → 159 lines (-57)
494
+ - `plan.ts` 269 → 199 lines (-70)
495
+ - `review.ts` 256 → 189 lines (-67)
496
+ - `validate.ts` 262 → 196 lines (-66)
497
+ - **Total: 2522 → 2007 lines (515 lines saved)**
498
+
499
+ **Engine-off path is byte-for-byte unchanged.** All eight existing
500
+ `tests/cli/<verb>-engine-smoke.test.ts` smokes pass without modification
501
+ (44 cases). The helper takes a `runEngineOff` callback so the legacy
502
+ code path stays intact even when the phase body's call shape would
503
+ otherwise pin it.
504
+
505
+ ### Test count
506
+
507
+ After v6.0.5 baseline: 1396 → 1408 (+12). +12 cases for the new
508
+ `tests/run-state/run-phase-with-lifecycle.test.ts` covering: engine-off
509
+ (default + CLI > env > config precedence); engine-on success (lifecycle
510
+ events, state.json shape, env / config resolution, costUSD pass-through,
511
+ costUSD-absent fallback to 0); engine-on failure (run.complete failed,
512
+ state.json refresh, error re-thrown with original message preserved,
513
+ lock released through finally); invalid env value falling through to
514
+ config-resolved engine-on with `run.warning`. Existing 44 phase smokes
515
+ unchanged. Typecheck clean. Bugbot LOW from PR #97 addressed.
516
+
517
+ ### Deliberately deferred
518
+
519
+ - Wrapping the remaining pipeline phases (`implement`, `migrate`, `pr`).
520
+ Side-effecting phases need careful externalRef plumbing — they will
521
+ build against `runPhaseWithLifecycle` plus `ctx.emitExternalRef()`
522
+ from inside their phase body. Helper signature does not need to grow
523
+ for them; documented in the helper's header comment.
524
+ - Multi-phase pipeline orchestrator (autopilot's full
525
+ `brainstorm → spec → plan → ...` flow under one runId). The single-
526
+ phase shape stays — multi-phase wrapping is a separate v6.x lift.
527
+ - Flipping the v6.0 built-in default to ON. v6.1 territory.
528
+
529
+ ## v6.0.5 — Engine wire-up Part E (2026-05-06)
530
+
531
+ **The headline.** v6.0.4 wrapped `plan` and `review`. v6.0.5 continues the
532
+ mechanical wrap pattern from the recipe at
533
+ [`docs/v6/wrapping-pipeline-phases.md`](docs/v6/wrapping-pipeline-phases.md)
534
+ with one more single-shot, read-only verb:
535
+
536
+ - **`validate`** — new CLI verb. Engine-wrap shell for the validate
537
+ pipeline phase. Writes a validate log stub under
538
+ `.guardrail-cache/validate/`; the actual validation work (static
539
+ checks, auto-fix, tests, Codex review with auto-fix, bugbot triage) is
540
+ owned by the Claude Code `/validate` skill. Declared `idempotent: true,
541
+ hasSideEffects: false` (local file write only; no provider calls, no
542
+ git push, no PR comment, no SARIF upload).
543
+
544
+ **Documented deviation from the spec table.** The v6 spec
545
+ ([docs/specs/v6-run-state-engine.md](docs/specs/v6-run-state-engine.md),
546
+ line 161) lists `validate` with externalRefs `sarif-artifact`. The
547
+ v6.0.5 wrap matches the `idempotent: true, hasSideEffects: false`
548
+ declaration but does **not** plumb a `sarif-artifact` externalRef — the
549
+ v6.0.5 `validate` CLI verb does not emit a SARIF artifact. SARIF
550
+ emission lives in `claude-autopilot run --format sarif --output <path>`
551
+ (a separate verb). The SARIF reference is local-only file output (no
552
+ remote upload), so the engine doesn't need a readback rule for it on
553
+ resume — `idempotent: true` covers replay safety. If a future PR adds
554
+ SARIF emission directly to this verb, the wrap can add a
555
+ `ctx.emitExternalRef({ kind: 'sarif-artifact', ... })` call after the
556
+ file write lands. Documented inline in `src/cli/validate.ts` and in the
557
+ wrapping recipe's deviation note.
558
+
559
+ The engine-off code path is byte-for-byte unchanged; the `validate`
560
+ verb is brand new in v6.0.5 (validation previously lived only as a
561
+ Claude Code skill).
562
+
563
+ ### Test count
564
+
565
+ After v6.0.4 baseline: 1390 → 1396 (+6). +6 cases for
566
+ `validate-engine-smoke.test.ts`, mirroring the
567
+ `review-engine-smoke.test.ts` shape: engine off → no run dir + log
568
+ written; engine off (cliEngine: false); engine on → state.json +
569
+ events.ndjson with the right lifecycle (`run.start` →
570
+ `phase.start` → `phase.success` → `run.complete`); engine on with
571
+ explicit `--context`; env-resolved; CLI override beats env. Typecheck
572
+ clean.
573
+
574
+ ### Deliberately deferred
575
+
576
+ - Wrapping the remaining pipeline phases (`implement`, `migrate`,
577
+ `pr`). Side-effecting phases need careful externalRef plumbing per
578
+ the recipe's "side effects" gate; wrap them last.
579
+ - Adding SARIF emission directly to the `validate` verb. Lives in
580
+ `claude-autopilot run --format sarif` (separate verb).
581
+ - Extracting a shared `runPhaseWithLifecycle` helper across the eight
582
+ wrapped verbs. Separate refactor PR — out of scope for v6.0.5.
583
+ - Flipping the v6.0 built-in default to ON. v6.1 territory.
584
+
585
+ ## v6.0.4 — Engine wire-up Part D (2026-05-06)
586
+
587
+ **The headline.** v6.0.3 wrapped `brainstorm` and `spec`. v6.0.4 continues
588
+ the mechanical wrap pattern from the recipe at
589
+ [`docs/v6/wrapping-pipeline-phases.md`](docs/v6/wrapping-pipeline-phases.md)
590
+ with two more single-shot verbs:
591
+
592
+ - **`plan`** ([#98](https://github.com/axledbetter/claude-autopilot/pull/98)) —
593
+ new CLI verb. Engine-wrap shell for the plan pipeline phase. Writes a
594
+ plan markdown stub under `.guardrail-cache/plans/`; the actual
595
+ LLM-driven planning content is owned by the Claude Code
596
+ superpowers:writing-plans skill. Declared `idempotent: true,
597
+ hasSideEffects: false` (local file write only; no provider calls, no
598
+ git push, no PR comment).
599
+ - **`review`** ([#98](https://github.com/axledbetter/claude-autopilot/pull/98)) —
600
+ new CLI verb. Engine-wrap shell for the review pipeline phase. Writes
601
+ a review log stub under `.guardrail-cache/reviews/`; the actual
602
+ LLM-driven review content is owned by the Claude Code review skills
603
+ (`/review`, `/review-2pass`, `pr-review-toolkit:review-pr`). Declared
604
+ `idempotent: true, hasSideEffects: false`.
605
+
606
+ **Documented deviation from the spec table.** The v6 spec
607
+ ([docs/specs/v6-run-state-engine.md](docs/specs/v6-run-state-engine.md))
608
+ lists `review` with externalRefs `review-comments`, implying PR-side
609
+ comment posting (which would force `hasSideEffects: true`). The v6.0.4
610
+ `review` verb does **not** post anywhere — PR-side comment posting
611
+ lives in `claude-autopilot pr --inline-comments` /
612
+ `--post-comments` (a separate verb). If a future PR adds platform-side
613
+ comment posting to this verb, both declarations will need to flip and
614
+ the readback rules will need to plumb a `review-comments` externalRef.
615
+ Documented inline in `src/cli/review.ts`.
616
+
617
+ **Backward-compat — `review` grouping prefix preserved.**
618
+ `claude-autopilot review` (no args) still prints the alpha.2 prefix
619
+ help banner per the V16 v4-compat test. Flat-verb invocation requires
620
+ at least one flag, e.g. `claude-autopilot review --engine`.
621
+ `claude-autopilot help review` continues to surface the flat-verb
622
+ Options block via `buildCommandHelpText`.
623
+
624
+ Engine-off code paths are unchanged for both verbs.
625
+
626
+ ### Test count
627
+
628
+ After v6.0.3 baseline: 1378 → 1390 (+12). +6 cases for
629
+ `plan-engine-smoke.test.ts`, +6 cases for `review-engine-smoke.test.ts`.
630
+ Both mirror `costs-engine-smoke.test.ts`: engine off → no run dir;
631
+ engine on → state.json + events.ndjson with the right lifecycle
632
+ (`run.start` → `phase.start` → `phase.success` → `run.complete`);
633
+ env-resolved; CLI override beats env. Typecheck clean.
634
+
635
+ ### Deliberately deferred
636
+
637
+ - Wrapping the remaining pipeline phases (`implement`, `migrate`,
638
+ `validate`, `pr`). Side-effecting phases (`implement`, `migrate`,
639
+ `pr`) need careful externalRef plumbing per the recipe's "side
640
+ effects" gate; wrap them last.
641
+ - Flipping the v6.0 built-in default to ON. v6.1 territory.
642
+
643
+ ## v6.0.3 — Wrap brainstorm + spec through runPhase (2026-05-05)
644
+
645
+ **The headline.** v6.0.3 continues the mechanical phase-wrap pattern from
646
+ the recipe at
647
+ [`docs/v6/wrapping-pipeline-phases.md`](docs/v6/wrapping-pipeline-phases.md)
648
+ with two more pipeline verbs:
649
+
650
+ - **`brainstorm`** — the pipeline entry point. Implemented primarily as
651
+ a Claude Code skill (`/brainstorm` → `superpowers:brainstorming`); the
652
+ CLI verb is an advisory shim pointing the user there. The wrap declares
653
+ `idempotent: true, hasSideEffects: false`. Engine-off path is
654
+ byte-for-byte identical to v6.0.2 (the same advisory banner). Engine-on
655
+ path creates a run dir + emits `run.start` / `phase.start` /
656
+ `phase.success` / `run.complete`. `--json` envelope shape is preserved
657
+ for back-compat with the WS7 welcome regression guard and
658
+ `json-channel-discipline.test.ts`.
659
+ - **`spec`** — same shape as brainstorm. New top-level subcommand (it
660
+ was previously absent from `SUBCOMMANDS`); the CLI verb is an advisory
661
+ shim pointing at the autopilot/brainstorm Claude Code flow. Same wrap
662
+ flags + same engine lifecycle.
663
+
664
+ **Documented deviation from the spec table.** The
665
+ [v6 spec table](docs/specs/v6-run-state-engine.md) declares both
666
+ `brainstorm` and `spec` `idempotent: no` because the LLM dialogue
667
+ produces new content each invocation. v6.0.3 declares `idempotent: true`
668
+ because the CLI verbs themselves are static advisory prints with no LLM
669
+ call and no externalRefs to reconcile — the engine's idempotency check
670
+ is "safe to retry without reconciliation," not "produces byte-identical
671
+ output." Justified inline at the top of `src/cli/brainstorm.ts` and
672
+ `src/cli/spec.ts` plus a deviation block in the recipe. Once the CLI
673
+ verbs grow real LLM bodies (a future v6.x lift), the declaration may
674
+ flip and a `spec-file` externalRef will land on every successful run.
675
+
676
+ Engine-off code paths are unchanged for both verbs; existing tests pass
677
+ without modification.
678
+
679
+ ### Test count
680
+
681
+ 1367 → 1378 (+11). +5 cases for `brainstorm-engine-smoke.test.ts`, +5
682
+ cases for `spec-engine-smoke.test.ts`, +1 case for `spec` joining
683
+ `MIGRATED_VERBS` in `json-channel-discipline.test.ts`. Both new smoke
684
+ files mirror `costs-engine-smoke.test.ts`: engine off → no run dir;
685
+ engine on → state.json + events.ndjson with the right lifecycle
686
+ (`run.start` → `phase.start` → `phase.success` → `run.complete`);
687
+ env-resolved; CLI override beats env. Typecheck clean.
688
+
689
+ ### Deliberately deferred
690
+
691
+ - Wrapping the six remaining pipeline phases (`plan`, `implement`,
692
+ `migrate`, `validate`, `pr`, `review`). One or two per release across
693
+ v6.0.4+. A parallel agent works `plan` + `review` for v6.0.4.
694
+ - Promoting `brainstorm`/`spec` from advisory shims to full LLM-bearing
695
+ CLI verbs. The Claude Code skill remains the user-facing entry point;
696
+ the CLI wraps exist so the engine has a place to record run-state for
697
+ future multi-phase orchestration.
698
+
699
+ ## v6.0.2 — Engine wire-up Part B (2026-05-06)
700
+
701
+ **The headline.** v6.0.1 wrapped the first pipeline phase (`scan`) through
702
+ `runPhase`. v6.0.2 continues the mechanical wrap pattern from the recipe at
703
+ [`docs/v6/wrapping-pipeline-phases.md`](docs/v6/wrapping-pipeline-phases.md)
704
+ with two more single-shot verbs:
705
+
706
+ - **`costs`** ([#96](https://github.com/axledbetter/claude-autopilot/pull/96)) —
707
+ pure read-only summary of the local cost ledger. The cleanest possible
708
+ wrap: `idempotent: true, hasSideEffects: false`, no provider, no LLM,
709
+ no file writes. CLI dispatcher passes `cliEngine` + `envEngine` through;
710
+ `--config` flag also wired since the engine resolver consults config.
711
+ - **`fix`** ([#96](https://github.com/axledbetter/claude-autopilot/pull/96)) —
712
+ applies LLM-generated patches to local files. Declared
713
+ `idempotent: true` (same finding + same file content → same patch) and
714
+ `hasSideEffects: false` (no remote / git push / PR creation in the
715
+ existing flow — purely local file edits, which the recipe defines as
716
+ platform-side-effect-free). If/when fix grows a `--push` mode it will
717
+ flip to `hasSideEffects: true` with a `git-remote-push` externalRef.
718
+
719
+ **Documented deviation from the recipe.** Both wraps follow the recipe
720
+ mechanically. `fix` adds one explicit deviation: its phase body emits
721
+ per-finding console output and reads a [y/n/q] confirmation via
722
+ `readline`. Pure side-effect-free phase bodies are the recipe default,
723
+ but interactive verbs are an explicit exception (same precedent as
724
+ `scan` keeping its LLM call inside `executeScanPhase`). The summary line
725
+ + exit-code logic still lives in `renderFixOutput` so the engine path's
726
+ idempotency isn't coupled to the final stdout shape. See the new "Note
727
+ on interactive verbs" section at the bottom of the wrapping recipe.
728
+
729
+ Engine-off code paths are byte-for-byte unchanged for both verbs;
730
+ existing tests pass without modification.
731
+
732
+ ### Test count
733
+
734
+ 1356 → 1367 (+11). +6 cases for `costs-engine-smoke.test.ts`, +5 cases
735
+ for `fix-engine-smoke.test.ts`. Both mirror `scan-engine-smoke.test.ts`:
736
+ engine off → no run dir; engine on → state.json + events.ndjson with
737
+ the right lifecycle (`run.start` → `phase.start` → `phase.success` →
738
+ `run.complete`); env-resolved; CLI override beats env. Typecheck clean.
739
+
740
+ ### Deliberately deferred
741
+
742
+ - Wrapping the seven remaining pipeline phases (`brainstorm`, `plan`,
743
+ `implement`, `migrate`, `validate`, `pr`, `review`). One or two per
744
+ release across v6.0.3+.
745
+ - Flipping the v6.0 built-in default to ON. v6.1 territory.
746
+
747
+ ## v6.0.1 — Engine wire-up Part A (2026-05-05)
748
+
749
+ **The headline.** v6.0 shipped the engine modules but left the user-facing
750
+ knobs un-wired. This release lights up the three knobs (`--engine` /
751
+ `--no-engine` CLI flag, `CLAUDE_AUTOPILOT_ENGINE` env var,
752
+ `engine.enabled` config key) with explicit precedence (CLI > env > config
753
+ > built-in default) and wraps the **first** pipeline phase — `scan` —
754
+ through `runPhase`. Every other pipeline phase still bypasses the engine;
755
+ those land one or two per PR across subsequent v6.0.x releases following
756
+ the recipe at [`docs/v6/wrapping-pipeline-phases.md`](docs/v6/wrapping-pipeline-phases.md).
757
+
758
+ The engine still ships **OFF** by default in v6.0.x. The default flip to
759
+ **ON** lands in v6.1 per [`docs/specs/v6.1-default-flip.md`](docs/specs/v6.1-default-flip.md).
760
+
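The precedence chain (CLI > env > config > built-in default) can be sketched as a single pure function. This is an illustration under stated assumptions, not the code in `src/core/run-state/resolve-engine.ts`: `resolveEngineSketch` is a hypothetical name, the `source` string values are invented for the example, and the `reason` field is omitted. The input field names and the accepted env forms (`on/off/true/false/1/0/yes/no`, case-insensitive, whitespace-tolerant) follow this release's notes.

```typescript
type EngineSource = 'cli' | 'env' | 'config' | 'default'; // assumed labels

interface EngineInputs {
  cliEngine?: boolean;       // true for --engine, false for --no-engine
  envValue?: string;         // raw CLAUDE_AUTOPILOT_ENGINE value
  configEnabled?: boolean;   // engine.enabled from config
  builtInDefault?: boolean;  // OFF in v6.0.x
}

interface EngineResolution {
  enabled: boolean;
  source: EngineSource;
  invalidEnvValue?: string;  // raw string, so the caller can emit a run.warning
}

const ENV_ON = new Set(['on', 'true', '1', 'yes']);
const ENV_OFF = new Set(['off', 'false', '0', 'no']);

function resolveEngineSketch(i: EngineInputs): EngineResolution {
  // 1. An explicit CLI flag always wins.
  if (i.cliEngine !== undefined) return { enabled: i.cliEngine, source: 'cli' };

  // 2. Env var, if it parses; otherwise remember the raw value and fall through.
  let invalid: { invalidEnvValue?: string } = {};
  if (i.envValue !== undefined) {
    const normalized = i.envValue.trim().toLowerCase();
    if (ENV_ON.has(normalized)) return { enabled: true, source: 'env' };
    if (ENV_OFF.has(normalized)) return { enabled: false, source: 'env' };
    invalid = { invalidEnvValue: i.envValue };
  }

  // 3. Config key, then 4. the built-in default (assumed false when not given).
  if (i.configEnabled !== undefined) {
    return { enabled: i.configEnabled, source: 'config', ...invalid };
  }
  return { enabled: i.builtInDefault ?? false, source: 'default', ...invalid };
}
```

Because the function performs no I/O, each precedence layer can be checked with plain equality assertions, and the caller stays responsible for reading `process.env` and emitting any warning.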
761
+ ### What landed (PR #95)
762
+
763
+ - **`resolveEngineEnabled()` precedence resolver.** Pure / no-IO function
764
+ in `src/core/run-state/resolve-engine.ts`. Inputs:
765
+ `{cliEngine?, envValue?, configEnabled?, builtInDefault?}`. Outputs:
766
+ `{enabled, source, reason, invalidEnvValue?}`. Accepts case-insensitive
767
+ env values `on/off/true/false/1/0/yes/no` (plus whitespace tolerance);
768
+ invalid values fall through to the next-lowest precedence layer and
769
+ surface the raw string in `invalidEnvValue` so the caller can emit a
770
+ `run.warning`. **+45 unit tests** covering every precedence layer, every
771
+ accepted env form, the conflict rules, and the invalid-env fallthrough.
772
+ - **CLI flag parsing in `src/cli/index.ts`.** New `parseEngineCliFlag()`
773
+ helper rejects the conflict case (both `--engine` AND `--no-engine`)
774
+ with `invalid_config` exit 1. Wired into the `scan` case to pass
775
+ `cliEngine` + `envEngine` (from `process.env.CLAUDE_AUTOPILOT_ENGINE`)
776
+ through to `runScan`.
777
+ - **Config schema** (`src/core/config/types.ts` + `schema.ts`). New
778
+ optional `engine.enabled: boolean` knob; schema rejects unknown
779
+ sub-keys (`additionalProperties: false`).
780
+ - **Help text** (`src/cli/help-text.ts`). New `GLOBAL_FLAGS_BLOCK`
781
+ documents `--json` / `--engine` / `--no-engine` + the precedence
782
+ matrix + scope (scan only in v6.0.1; rest follows the recipe). Per-verb
783
+ `scan` Options block adds the new flags so `claude-autopilot help scan`
784
+ is self-contained.
785
+ - **`scan` pilot phase wrapping** (`src/cli/scan.ts`). Refactored the
786
+ LLM-call-and-finding-processing portion into `executeScanPhase(input)`
787
+ → `ScanOutput` (pure, no console output, no exit-code logic). Defined
788
+ `RunPhase<ScanInput, ScanOutput>` with `name: 'scan'`,
789
+ `idempotent: true`, `hasSideEffects: false`. Engine-on path:
790
+ `createRun()` → `runPhase()` → `run.complete` event +
791
+ `replayState`/`writeStateSnapshot` refresh + best-effort lock release
792
+ in `finally`. Engine-off path: `executeScanPhase(input)` directly,
793
+ byte-for-byte unchanged from v6.0. Rendering extracted into
794
+ `renderScanOutput()` so the engine path's idempotency isn't coupled
795
+ to console output. Test seam (`__testReviewEngine`) lets the smoke test
796
+ inject a fake without an LLM key.
797
+ - **End-to-end smoke test** (`tests/cli/scan-engine-smoke.test.ts`).
798
+ Drives `runScan` with the engine on against a tmp project; asserts
799
+ `state.status === 'success'`, single `scan` phase with the right
800
+ `idempotent` / `hasSideEffects` flags, monotonic seq numbers, and the
801
+ full lifecycle (`run.start` → `phase.start` → `phase.success` →
802
+ `run.complete`). Five cases including engine-off (no run dir),
803
+ env-resolved, CLI override, and invalid-env-fallthrough warning.
804
+ - **Wrapping recipe doc** (`docs/v6/wrapping-pipeline-phases.md`).
805
+ Six-step recipe + phase-status table + idempotency decision tree +
806
+ worked example (scan) + a checklist subsequent v6.0.x PRs follow when
807
+ wrapping the remaining pipeline phases (`brainstorm`, `plan`,
808
+ `implement`, `migrate`, `validate`, `pr`, `review`, `fix`, `costs`).
809
+ - **Migration guide** (`docs/v6/migration-guide.md`). "What works today"
810
+ list updated — three knobs move from "wiring pending" to "wired (limited
811
+ to scan)". Other phases still tracked under "wiring pending."
812
+ - **Spec reconciliation** (`docs/specs/v6-run-state-engine.md`). New "What
813
+ was actually built (v6.0.1 — Part A)" block.
814
+
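The flag-conflict rule from the CLI bullet above can be illustrated with a small parser. This is a sketch under assumptions: the real `parseEngineCliFlag()` signature is not shown in this changelog, and `InvalidConfigError` is a hypothetical stand-in for however the CLI reports `invalid_config` with exit code 1.

```typescript
// Hypothetical error type; the changelog only states that the conflict
// surfaces as `invalid_config` with exit 1.
class InvalidConfigError extends Error {
  readonly code = 'invalid_config';
  readonly exitCode = 1;
}

// Returns true for --engine, false for --no-engine, undefined when neither
// flag is present (so lower-precedence layers can decide).
function parseEngineCliFlag(argv: readonly string[]): boolean | undefined {
  const on = argv.indexOf('--engine') !== -1;
  const off = argv.indexOf('--no-engine') !== -1;
  if (on && off) {
    throw new InvalidConfigError('--engine and --no-engine cannot be combined');
  }
  if (on) return true;
  if (off) return false;
  return undefined;
}
```

Returning `undefined` rather than `false` when no flag is given keeps "not specified" distinct from "explicitly off", which the precedence resolver depends on.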
815
+ ### Test count
816
+
817
+ 1306 → 1356 (+50). Typecheck clean. Existing 1306 tests continue to pass
818
+ unchanged — the engine-off code path for `scan` is byte-for-byte
819
+ identical to v6.0.
820
+
821
+ ### Deliberately deferred
822
+
823
+ - Wrapping of any other pipeline phase. Lands one or two per PR across
824
+ v6.0.2+ following the recipe.
825
+ - Flipping the v6.0 built-in default to ON. v6.1 territory.
826
+ - Removing `--no-engine`. v7 territory.
827
+
828
+ ## v6.0 — Run State Engine (2026-05-05)
829
+
830
+ **The headline.** Autopilot moves from a stateless command-stream to a
831
+ checkpointed, resumable, budget-bounded, observable pipeline. Every run gets
832
+ a ULID and a per-project directory at `.guardrail-cache/runs/<ulid>/`.
833
+ Every state transition appends a typed event to `events.ndjson` and updates
834
+ `state.json` atomically. Two-layer budget enforcement (advisory `estimateCost`
835
+ preflight + mandatory runtime guard) hard-stops runaway spend before it
836
+ happens. Every CLI verb grows a `--json` flag with strict stdout/stderr
837
+ channel discipline so CI consumers can drive the pipeline programmatically.
838
+ Side-effect phase replay decisions consult persisted `externalRefs` plus a
839
+ live provider read-back so resume is safe by construction. **v6.0 ships
840
+ with the engine OFF by default — opt-in via `engine.enabled: true` (config
841
+ wiring across 6.0.x point releases). Default flips to ON in v6.1.** See
842
+ [`docs/v6/migration-guide.md`](docs/v6/migration-guide.md) for the v5.x → v6
843
+ walkthrough and [`docs/v6/quickstart.md`](docs/v6/quickstart.md) for the
844
+ five-minute version.
845
+
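A minimal sketch of the two-layer budget policy summarized above. The mandatory guard uses the inequality stated in the Phase 4 entry (`actualSoFar + conservativePhaseReserveUSD <= perRunUSD`). Everything else is an assumption made for illustration: the decision labels, the default reserve of 0, and the choice to treat the advisory layer as non-blocking.

```typescript
interface BudgetConfig {
  perRunUSD: number;
  conservativePhaseReserveUSD?: number;
}

interface BudgetDecision {
  estimatedHigh: number | null;
  actualSoFar: number;
  reserveApplied: number;
  capRemaining: number;
  decision: 'proceed' | 'hard-fail'; // hypothetical labels
  reason: string;
}

function checkPhaseBudget(
  cfg: BudgetConfig,
  actualSoFar: number,
  estimatedHigh?: number,
): BudgetDecision {
  const reserveApplied = cfg.conservativePhaseReserveUSD ?? 0; // assumed default
  const capRemaining = cfg.perRunUSD - actualSoFar;
  const base = {
    estimatedHigh: estimatedHigh ?? null,
    actualSoFar,
    reserveApplied,
    capRemaining,
  };

  // Layer 2 (mandatory): runs whether or not the phase declares an estimate.
  if (actualSoFar + reserveApplied > cfg.perRunUSD) {
    return { ...base, decision: 'hard-fail', reason: 'reserve would exceed per-run cap' };
  }

  // Layer 1 (advisory): only informs the rationale in this sketch.
  if (estimatedHigh !== undefined && estimatedHigh > capRemaining) {
    return { ...base, decision: 'proceed', reason: 'advisory: estimate exceeds remaining cap' };
  }

  return { ...base, decision: 'proceed', reason: 'within budget' };
}
```

For example, with `perRunUSD: 10` and a reserve of 2, a run that has already spent 9 is stopped before the next phase spawns, even if that phase never declared `estimateCost`.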
846
+ ### Per-phase landings
847
+
848
+ - **Phase 1 — Run State Engine persistence layer ([#86](https://github.com/axledbetter/claude-autopilot/pull/86)).** `RunState` / `RunEvent` / `PhaseSnapshot` / `ExternalRef` / `WriterId` types in `src/core/run-state/types.ts`. Pure-TS 26-char Crockford Base32 ULID generator (`ulid.ts`). Per-run advisory lock via `proper-lockfile` + `.lock-meta.json` sidecar with PID + SHA-256-hashed hostname; off-host writers default to alive (fail closed) so a network-mounted lock can't be stolen. Durable append protocol for `events.ndjson` (`open(O_APPEND)` → `write` → `fsync(fd)` → `close` per event) with monotonic `seq` via `.seq` sidecar. Truncated last-line detection emits `run.recovery(reason: 'recovered-from-partial-write')` and continues; mid-file corruption throws `partial_write` immediately. Atomic snapshot writer for `state.json` (`open(.tmp)` → `fsync(fd)` → `rename` → `fsync(dirfd)`; tmpfs/SMB compatibility via swallowed EISDIR/EPERM/ENOTSUP on the dir-fsync). `recoverState` falls back to events replay when `state.json` is missing/corrupt. `createRun` / `listRuns` / `gcRuns` lifecycle helpers; symlink-safe GC. New `ErrorCode` variants: `lock_held`, `corrupted_state`, `partial_write`. **+56 tests.**
849
+ - **Phase 2 — Phase wrapper + lifecycle ([#87](https://github.com/axledbetter/claude-autopilot/pull/87)).** `RunPhase<I, O>` interface (`idempotent` / `hasSideEffects` / `estimateCost?` / `run` / `onResume?`). `runPhase` orchestrator emits `phase.start` → `phase.success`/`failed` and gates idempotent short-circuit + side-effecting replay. Atomic per-phase snapshot writer (`writePhaseSnapshot` with path-traversal rejection on phase names). Hidden CLI verb `claude-autopilot internal log-phase-event` exposed via `cli-internal.ts` so markdown-driven skills can append events without importing the engine. Sub-phase nesting via synthetic `phaseIdx` encoding (`parentIdx * 1000 + childOrdinal`). **+27 tests.** Spec deviation: idempotent-replay short-circuit emits `run.warning(details.reason: 'idempotent-replay')` instead of a new `phase.skipped` event variant — durable log doesn't need a new shape since the snapshot is identical.
850
+ - **Phase 3 — `runs` / `run resume` CLI ([#88](https://github.com/axledbetter/claude-autopilot/pull/88)).** Six verbs: `runs list` (newest-first, `--status` filter), `runs show <id>` (state + optional events tail), `runs gc` (default 30-day cutoff, confirmation gate), `runs delete <id>` (terminal-status guard + lock acquisition), `runs doctor` (replay vs snapshot drift; `--fix` rewrites), `run resume <id>` (**lookup-only** in v6.0 — identifies next phase + decision rationale; live execution wires in 6.1+). Every verb supports `--json` envelope output (v1 schema). New `Engine` group in `HELP_GROUPS`. Decision vocabulary (`retry` / `skip-idempotent` / `needs-human` / `already-complete`) preserved as a thin wrapper around the canonical `decideReplay` matrix introduced in Phase 6. **No changes to existing CLI verbs.**
851
+ - **Phase 4 — Budget enforcement ([#89](https://github.com/axledbetter/claude-autopilot/pull/89)).** `BudgetConfig` (`perRunUSD`, `perPhaseUSD?`, `councilMaxRecursionDepth?`, `bgAutopilotMaxRoundsPerSelfEat?`, `conservativePhaseReserveUSD?`). `checkPhaseBudget` pure decision function with two-layer policy: (1) advisory — uses `estimateCost.high` if the phase declares one; (2) mandatory — runs regardless, enforces `actualSoFar + conservativePhaseReserveUSD <= perRunUSD` so phases without `estimateCost` still trigger budget gates. `runPhase` emits a `budget.check` event with full decision rationale (`{phase, phaseIdx, estimatedHigh, actualSoFar, reserveApplied, capRemaining, decision, reason}`) before every spawn; throws `GuardrailError(budget_exceeded)` on hard-fail. Council synthesizer recursion bounded via `councilMaxRecursionDepth` — exceeded calls return `status: 'partial'` rather than continuing. **+25-30 tests.**
852
+ - **Phase 5 — Typed JSON events + strict `--json` channel discipline ([#90](https://github.com/axledbetter/claude-autopilot/pull/90)).** `--json` flag now lives on every Review / Pipeline / Deploy / Migrate / Diagnostics verb. Strict channel contract enforced by a dispatcher-level wrapper (`runUnderJsonMode` in `src/cli/json-envelope.ts`): exactly **one** JSON envelope on stdout per invocation; **only** NDJSON event lines on stderr (synthetic `run.warning` for legacy text via `installJsonModeChannelDiscipline` console-wrap); ANSI color codes stripped; interactive prompts hard-fail with `EXIT_NEEDS_HUMAN = 78` and the envelope's `nextActions` field carries the resume hint. Text-mode behavior unchanged. **`tests/cli/json-channel-discipline.test.ts` asserts the invariants per migrated verb.**
853
+ - **Phase 6 — Idempotency contracts + provider read-back ([#91](https://github.com/axledbetter/claude-autopilot/pull/91)).** `decideReplay` pure decision matrix in `replay-decision.ts` maps `(priorSuccess, idempotent, hasSideEffects, refs, readbacks, forceReplay)` → `'retry' | 'skip-already-applied' | 'needs-human' | 'abort'`. Pluggable `ProviderReadback` registry in `provider-readback.ts` with built-in read-backs for `github` (via `gh` CLI), `vercel` / `fly` / `render` (via the deploy adapters), `supabase` (via `migration_state`). All read-backs **fail closed** — any throw, parse failure, or unrecognized state collapses to `existsOnPlatform=false, currentState='unknown'` so the matrix routes to `needs-human` instead of a silent skip. `runPhase` wires `decideReplay` (replaces Phase 2's hard-coded throw). New `replay.override` event variant emitted when `--force-replay` flips a refusal into a retry; `foldEvents` records overrides on `phase.meta.replayOverrides`. `PhaseSnapshot.result` field added so `skip-already-applied` returns the prior output without re-execution. CLI lookup (`runRunResume`) delegates to the same `decideReplay` so prediction matches live execution. **+55 tests.**
854
+ - **Phase 7 — Live adapter certification suite ([#92](https://github.com/axledbetter/claude-autopilot/pull/92)).** Five live assertions × three providers (Vercel + Fly + Render): deploy success, auth failure, 404, rollback, log streaming with redaction-on-planted-secret. Env-gated via `resolveProviderEnv()` — runs report `skipped` until the operator adds the seven `*_TEST` GitHub Secrets per `docs/adapters/cert-suite.md`. Flake-control harness (`tests/adapters/live/_harness.ts`) implements per-provider 3-attempt retry budget with exp backoff (1s / 4s / 16s) on transient categories, hard-fail (no retry) on auth/404/schema-mismatch, soft-fail with 3-strike escalation on rollout/log-streaming flakes; **+42 unit tests** for the harness alone (run under regular `npm test`, no live creds required). Nightly CI workflow at `.github/workflows/adapter-cert.yml` (09:00 UTC + manual `workflow_dispatch`); uploads `events.ndjson` + `log-tail.txt` artifacts on every run. **Spec deviation:** Fly cert needs a third env var (`FLY_IMAGE_TEST`) since the Fly adapter doesn't build images per the v5.6 design.
855
+ - **Phase 8 — Docs + migration guide ([#94](https://github.com/axledbetter/claude-autopilot/pull/94), this PR).** `docs/v6/migration-guide.md` walks v5.x users through the opt-in flow with a precedence matrix, troubleshooting recipes, the per-phase idempotency table, and the v6.0 → v6.1 default-flip plan. `docs/v6/quickstart.md` is the five-minute version. README gains a "Run State Engine (v6)" section. CHANGELOG (this entry) bundles every phase. Spec gets a Phase 8 reconciliation block + a Status column on the implementation phases table. New `docs/specs/v6.1-default-flip.md` outlines the stabilization criteria for flipping `engine.enabled` to `true` by default and removing `--no-engine`.
856
+ - **Spec — Codex-reviewed twice ([#85](https://github.com/axledbetter/claude-autopilot/pull/85)).** Two passes through Codex 5.3 hardened the persistence protocol (durable append + atomic snapshot ordering), promoted `events.ndjson` to source-of-truth with `state.json` as a derived cache, mandated copy-not-symlink for artifacts, added the two-layer budget policy with a mandatory runtime guard, formalized the strict `--json` channel discipline, defined the external-operation ledger for replay safety (`ExternalRef` + provider read-back), pinned the precedence matrix, and added flake-control parameters for the live adapter cert suite.
857
+
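The flake-control schedule from the Phase 7 entry (1s / 4s / 16s on transient failures, no retry on auth/404/schema-mismatch) can be sketched as a pure function. This assumes the three delays correspond to up to three retries with a base of 1 s and a factor of 4; the category names and function name are hypothetical, and the real harness in `tests/adapters/live/_harness.ts` may be structured differently.

```typescript
// Hypothetical category labels for illustration.
type FailureCategory = 'transient' | 'auth' | 'not_found' | 'schema_mismatch';

const MAX_RETRIES = 3;
const BASE_DELAY_MS = 1000;
const BACKOFF_FACTOR = 4;

// Returns how long to wait before retry number `retryNumber` (1-based),
// or null when the failure must not be retried.
function retryDelayMs(category: FailureCategory, retryNumber: number): number | null {
  // Hard-fail categories never retry.
  if (category !== 'transient') return null;
  // The retry budget is exhausted after MAX_RETRIES.
  if (retryNumber < 1 || retryNumber > MAX_RETRIES) return null;
  // 1000 * 4^0, 4^1, 4^2 -> 1s, 4s, 16s.
  return BASE_DELAY_MS * Math.pow(BACKOFF_FACTOR, retryNumber - 1);
}
```

Separating the schedule from the code that actually sleeps lets the harness logic be tested without live credentials or real timers.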
858
+ ### Codex / council pricing — from the GPT-5.5 swap ([#93](https://github.com/axledbetter/claude-autopilot/pull/93))
859
+
860
+ - **Default codex/council model bumped `gpt-5.3-codex` → `gpt-5.5`.** OpenAI
861
+ released GPT-5.5 (codename Spud) on 2026-04-23 — better at coding than 5.4
862
+ with fewer tokens, available via standard Responses/Chat Completions API
863
+ at `gpt-5.5` (no `-codex` suffix). Pricing **doubles** to $5/1M input +
864
+ $30/1M output, so the per-adapter `COST_PER_M_INPUT/OUTPUT` defaults moved
865
+ in lockstep — without this, every cost-ledger entry would silently halve.
866
+ New canonical pricing table at `src/adapters/pricing.ts` keeps the legacy
867
+ `gpt-5.3-codex` and `gpt-5.4` entries for back-compat with pinned
868
+ `CODEX_MODEL`/`council.models[].model` configs. Override via env vars
869
+ (`CODEX_MODEL`, `CODEX_COST_INPUT_PER_M`, `CODEX_COST_OUTPUT_PER_M`).
870
+
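To make the pricing change concrete, here is a rough sketch of a per-call cost estimate using the `gpt-5.5` rates quoted above ($5/1M input, $30/1M output) and the two cost env-var overrides. The function names and the `ModelPrice` shape are assumptions for illustration and are not taken from `src/adapters/pricing.ts`.

```typescript
interface ModelPrice {
  inputPerM: number;  // USD per 1M input tokens
  outputPerM: number; // USD per 1M output tokens
}

// Rates quoted in the changelog entry for gpt-5.5.
const GPT_5_5_PRICE: ModelPrice = { inputPerM: 5, outputPerM: 30 };

// Parses a non-negative number; anything else is treated as "not set".
function parseRate(raw: string | undefined): number | undefined {
  if (raw === undefined || raw.trim() === '') return undefined;
  const n = Number(raw);
  return Number.isFinite(n) && n >= 0 ? n : undefined;
}

// Env overrides win over the built-in defaults.
function resolveCodexPrice(env: Record<string, string | undefined>): ModelPrice {
  return {
    inputPerM: parseRate(env['CODEX_COST_INPUT_PER_M']) ?? GPT_5_5_PRICE.inputPerM,
    outputPerM: parseRate(env['CODEX_COST_OUTPUT_PER_M']) ?? GPT_5_5_PRICE.outputPerM,
  };
}

function estimateCostUSD(inputTokens: number, outputTokens: number, price: ModelPrice): number {
  return (inputTokens * price.inputPerM + outputTokens * price.outputPerM) / 1_000_000;
}
```

With the defaults, a call using 200,000 input tokens and 10,000 output tokens comes to 1.00 + 0.30 = 1.30 USD, which is why leaving the old rates in place would have under-reported every ledger entry.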
871
+ ## v5.6.0 — Fly.io + Render deploy adapters (2026-05-04)
872
+
873
+ ### Added
874
+
875
+ - **`@delegance/claude-autopilot deploy --adapter fly`** — first-class Fly.io adapter. Image-based releases via the Machines API (image must be pre-pushed via `fly deploy --build-only --push`), polling-based status, **WebSocket log streaming**, **native rollback** with simulated fallback when the API endpoint is unavailable. `FLY_API_TOKEN` env var; auth doctor warns when missing.
876
+ - **`@delegance/claude-autopilot deploy --adapter render`** — first-class Render adapter. REST API deploys (with optional `clearCache`), service-scoped status polling at `GET /v1/services/{serviceId}/deploys/{deployId}`, REST-polling log stream with `(timestamp, logId)` cursor dedup, **simulated rollback** by re-deploying the previous successful commit. `RENDER_API_KEY` env var; auth doctor warns when missing.
877
+ - **`DeployAdapterCapabilities` interface** — adapters declare `streamMode: 'websocket' | 'polling' | 'none'` and `nativeRollback: boolean`. CLI prints a one-line stderr notice for polling-mode adapters under `--watch` so users understand why log lines arrive in batches.
878
+ - **Bounded auto-rollback orchestration in `src/cli/deploy.ts`** — when health check fails after deploy and `rollbackOn: [healthCheckFailure]` is configured, the CLI fires exactly one rollback (no chains), with `runHealthCheck` capped at 5 attempts × 6s backoff (~30s window). New terminal `DeployResult.status` values: `fail_rolled_back` and `fail_rollback_failed`.
879
+ - **HTTP-status error taxonomy** — new `not_found` `ErrorCode` joins the union; per-adapter mapping: 401/403→`auth`, 404→`not_found`, 422/400→`invalid_config`, 5xx→`transient_network` (retryable). Provider request-id headers (`Fly-Request-Id`, `x-request-id`) captured into `error.details` for support tickets.
880
+ - **Mandatory log redaction across all adapters** — every log line surfaced into `DeployResult.output` or PR-comment bodies runs through `redactLogLines()` (defaults: `AKIA…`, `sk-…`, `eyJ…`, `ghp_`, `xoxb-`, plus user-configurable `config.persistence.redactionPatterns`). Closes a real existing security hazard in the v5.4 Vercel adapter that was emitting unredacted logs into PR comments.
881
+ - **Shared `src/adapters/deploy/_http.ts`** — extracted `fetchWithRetry` + `safeReadBody` helpers used by Vercel, Fly, and Render adapters; one canonical retry implementation to maintain.
882
+
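The HTTP-status taxonomy in the list above maps cleanly to a lookup function. The sketch below follows the stated mapping (401/403, 404, 422/400, 5xx) and the two request-id headers. The `unknown` fallback code, the `MappedError` shape, and the function names are assumptions that do not appear in the changelog.

```typescript
// 'unknown' is a hypothetical fallback for statuses the changelog does not cover.
type DeployErrorCode = 'auth' | 'not_found' | 'invalid_config' | 'transient_network' | 'unknown';

interface MappedError {
  code: DeployErrorCode;
  retryable: boolean;
  requestId?: string;
}

const REQUEST_ID_HEADERS = ['fly-request-id', 'x-request-id'];

// Case-insensitive lookup of the provider request id, if any.
function findRequestId(headers: Record<string, string>): string | undefined {
  for (const name of Object.keys(headers)) {
    const value = headers[name];
    if (value !== undefined && REQUEST_ID_HEADERS.indexOf(name.toLowerCase()) !== -1) {
      return value;
    }
  }
  return undefined;
}

function mapHttpError(status: number, headers: Record<string, string> = {}): MappedError {
  const requestId = findRequestId(headers);
  const make = (code: DeployErrorCode, retryable: boolean): MappedError =>
    requestId !== undefined ? { code, retryable, requestId } : { code, retryable };

  if (status === 401 || status === 403) return make('auth', false);
  if (status === 404) return make('not_found', false);
  if (status === 400 || status === 422) return make('invalid_config', false);
  if (status >= 500 && status <= 599) return make('transient_network', true);
  return make('unknown', false);
}
```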
883
+ ### Fixed
884
+
885
+ - **Bugbot caught + autopilot fixed 4 real bugs across the v5.6 self-eat phases.** HIGH on Phase 2 (Render service-scoped URL — `pollUntilTerminal` and `status()` were using shorthand `/v1/deploys/{id}` which doesn't exist on Render's API). MEDIUM on Phase 3 (Render cursor dedup wasn't sorting same-ms entries by id, silently dropping out-of-order siblings). LOW on Phase 4 (`printAutoRollback` hardcoded "failed 3x" but the constant is now 5). LOW on Phase 5 (`getPreviousFileContent` was being called for `.sql` files where `previousContent` is ignored, wasting a `git show` spawn per migration).
886
+ - **Schema-alignment diff-aware Prisma parsing (PR #44, schema-alignment cleanup)** — `getPreviousFileContent` now defaults to a CI-aware base ref (`GITHUB_BASE_REF` → `origin/<base>`, then `CI_MERGE_REQUEST_TARGET_BRANCH_NAME`, fallback `HEAD~1`) instead of always reading from `HEAD` (which gave empty diffs in CI). Dropped models now emit `drop_column` for every field of the removed model.
887
+ - **Tombstone CLI no longer crashes with a stack trace when presets are missing (PR #82)** — schema-validator was running file IO at module load time, so every `claude-autopilot --version` call eagerly read `presets/aliases.lock.json` + `presets/schemas/migrate.schema.json`; missing presets crashed the CLI before it could format an error. Now lazy-init via memoized `getValidator()`.
888
+
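The CI-aware base-ref fallback from the schema-alignment fix above can be sketched as follows. The order (`GITHUB_BASE_REF`, then `CI_MERGE_REQUEST_TARGET_BRANCH_NAME`, then `HEAD~1`) comes from the entry. Prefixing the GitLab branch with `origin/` is an assumption, since the changelog only spells out that mapping for the GitHub variable, and `resolveBaseRef` is a hypothetical name.

```typescript
// Picks the git ref to diff against, preferring CI-provided base branches.
function resolveBaseRef(env: Record<string, string | undefined>): string {
  const githubBase = env['GITHUB_BASE_REF']?.trim();
  if (githubBase) return `origin/${githubBase}`;

  // Assumption: the GitLab target branch is also resolved against origin.
  const gitlabBase = env['CI_MERGE_REQUEST_TARGET_BRANCH_NAME']?.trim();
  if (gitlabBase) return `origin/${gitlabBase}`;

  // Local fallback when no CI variable is present.
  return 'HEAD~1';
}
```

Passing the environment in as a parameter, rather than reading `process.env` directly, keeps the function deterministic under test.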
889
+ ## v5.5.2 — Framework-agnostic /migrate (2026-04-30)
890
+
891
+ ### Added
892
+
893
+ - **Working examples for Rails, Alembic, Django, golang-migrate, Prisma, Drizzle, dbmate, Flyway, supabase-cli, custom scripts** in `skills/migrate/SKILL.md`. The dispatcher was always framework-agnostic, but the prior doc text only described the Supabase path.
894
+ - **Detector `defaultCommand` fills** for `prisma-push`, `drizzle-push`, `golang-migrate`, `typeorm` so `claude-autopilot init` produces a working `stack.md` on first try for these toolchains.
895
+
896
+ ### Fixed
897
+
898
+ - **`/migrate` skill description rewritten** as a generic dispatcher description with a "when to use migrate-supabase instead" callout. Anyone running `migrate@1` in a non-Supabase repo no longer sees Supabase-specific instructions.
899
+
900
+ ## v5.5.1 — `openai` SDK now optional (2026-04-30)
901
+
902
+ ### Changed
903
+
904
+ - **`openai` moved to `optionalDependencies`** alongside `@anthropic-ai/sdk`, `@google/generative-ai`, `@modelcontextprotocol/sdk`. All four LLM SDKs are now optional. `npm install --omit=optional` now sheds **~26 MB** (was ~13 MB after v5.5.0). `scripts/autoregress.ts` migrated to `loadOpenAI()`, removing the last direct `import OpenAI` outside the adapter layer.
905
+
906
+ ### Notes
907
+
908
+ - Council runner already handles missing-synth-SDK gracefully — returns `status: 'partial'` with the friendly install hint surfaced via the synthesis error field. Users with only `ANTHROPIC_API_KEY` get a partial result with model responses preserved.
909
+
910
+ ## v5.5.0 — Lazy-load LLM SDKs + Vercel auth doctor (2026-04-30)
911
+
912
+ ### Added
913
+
914
+ - **`src/adapters/sdk-loader.ts`** with `loadAnthropic` / `loadOpenAI` / `loadGoogleGenerativeAI` + `isSdkInstalled` helper. Friendly `GuardrailError` on `MODULE_NOT_FOUND` points at the exact `npm install` command.
915
+ - **Phase 6 of v5.4 spec — Vercel auth doctor.** `claude-autopilot doctor` detects `deploy.adapter: vercel` in `guardrail.config.yaml` and warns when `VERCEL_TOKEN` is missing.
916
+ - **LLM SDK install-state surface in doctor** — shows which optional LLM SDKs are actually installed.
917
+
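The lazy-loading pattern behind the SDK loader can be illustrated with a generic helper. This is a sketch only: `MissingSdkError` stands in for the `GuardrailError` mentioned above, `loadOptionalSdk` and its injectable `importer` parameter are hypothetical, and the exact wording of the install hint is assumed. The two error codes checked are the ones Node.js uses for a missing module under CommonJS (`MODULE_NOT_FOUND`) and ES modules (`ERR_MODULE_NOT_FOUND`).

```typescript
// Hypothetical stand-in for the friendly error described in the changelog.
class MissingSdkError extends Error {
  readonly pkg: string;
  constructor(pkg: string) {
    super(`Optional SDK "${pkg}" is not installed. Run: npm install ${pkg}`);
    this.pkg = pkg;
  }
}

// True when the error is Node's "module not found" under CJS or ESM.
function isModuleNotFound(err: unknown): boolean {
  const code = (err as { code?: unknown } | null | undefined)?.code;
  return code === 'MODULE_NOT_FOUND' || code === 'ERR_MODULE_NOT_FOUND';
}

// Loads a package on demand; the importer is injected so tests can fake it.
async function loadOptionalSdk<T>(
  pkg: string,
  importer: (name: string) => Promise<T>,
): Promise<T> {
  try {
    return await importer(pkg);
  } catch (err) {
    if (isModuleNotFound(err)) throw new MissingSdkError(pkg);
    throw err; // anything else is a real failure and should surface as-is
  }
}

// Typical call site (not executed here):
//   const sdk = await loadOptionalSdk('openai', (name) => import(name));
```

Because nothing is imported until a loader is called, users who install with `--omit=optional` only hit the friendly error for the provider they actually try to use.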
918
+ ### Changed
919
+
920
+ - **`@anthropic-ai/sdk`, `@google/generative-ai`, `@modelcontextprotocol/sdk` moved to `optionalDependencies`**. Six adapters converted from top-level import to dynamic load. Users with `--omit=optional` shed ~13 MB and only need the SDK matching their API key.
921
+
922
+ ## v5.4.0 — Vercel first-class deploy adapter (2026-04-30)
4
923
 
5
924
  ### Added
6
925
 
7
- - **`deploy` phase** — adapter-agnostic deploy step that runs your existing deploy command, extracts the URL from stdout, optionally polls a `healthCheckUrl`, and (optionally) posts result to a PR. Closes the loop from "PR merged" to "PR merged + deployed + smoke-tested + URL on the PR".
8
- - **`deployCommand` + `healthCheckUrl` config keys** — anything that works in your terminal works as `deployCommand` (`vercel --prod`, `flyctl deploy`, `kubectl apply`, `gh workflow run`, `make deploy`).
9
- - **`claude-autopilot deploy [--dry-run|--command|--health-url|--pr <n>]`** — CLI surface. PR comment integration via `gh pr comment`.
926
+ - **`@delegance/claude-autopilot deploy --adapter vercel`** is a first-class Vercel adapter that uses the v13 deployments API. Returns `dpl_xxx` IDs, polls status until terminal, populates `deployUrl` / `buildLogsUrl` / `output`. Auth via `VERCEL_TOKEN`.
927
+ - **`--watch` SSE+NDJSON log streaming** — subscribes to `/v2/deployments/<id>/events?builds=1`, prints to stderr in real time. Reconnects once with exp backoff on disconnect.
928
+ - **`claude-autopilot deploy rollback` + `deploy status`** — CLI subverbs over the adapter's `rollback()` / `status()` methods. `--to <id>` overrides "previous prod deploy" lookup.
929
+ - **Auto-rollback on health-check failure** — when `rollbackOn: [healthCheckFailure]` is set in config, the CLI promotes the previous prod deploy if the post-deploy health check fails. PR comment shows both URLs (new + rolled-back-to).
930
+ - **`<!-- claude-autopilot-deploy -->` upserting PR comment** — single comment is updated in place across deploy → log-stream → health-check → rollback, instead of spamming the PR with multiple comments.
931
+
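The upserting PR comment from the list above relies on a hidden HTML marker to find the comment to update. A minimal sketch of that decision, assuming comments are available as `{ id, body }` records; the `PrComment` and `UpsertPlan` types and the function name are hypothetical, and the actual GitHub call is left out.

```typescript
const DEPLOY_MARKER = '<!-- claude-autopilot-deploy -->';

interface PrComment {
  id: number;
  body: string;
}

type UpsertPlan =
  | { action: 'create'; body: string }
  | { action: 'update'; id: number; body: string };

// Decides whether to edit the existing marked comment or post a new one.
function planDeployCommentUpsert(existing: readonly PrComment[], content: string): UpsertPlan {
  // The marker is invisible in rendered Markdown but searchable in the raw body.
  const body = `${DEPLOY_MARKER}\n${content}`;
  for (const comment of existing) {
    if (comment.body.indexOf(DEPLOY_MARKER) !== -1) {
      return { action: 'update', id: comment.id, body };
    }
  }
  return { action: 'create', body };
}
```

Each stage of the deploy (log stream, health check, rollback) can call this with fresh content, so the PR carries one evolving comment instead of several.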
932
+ ### Fixed
933
+
934
+ - **Bugbot caught that an explicit `--config <missing>` was silently ignored (PR #63, Phase 3).** Autopilot fixed it with a regression test in 4 minutes.
935
+ - **Phase 4 introduced a regression in Phase 2's `--watch` test surface, caught via `npm test` before the PR opened.** Autopilot adapted its spec interpretation (making the health check opt-in instead of falling back to `deployUrl`) and documented the deviation.
936
+
937
+ ### Notes
10
938
 
11
- First-class provider adapters (Vercel/Fly/Render with API-level deploy IDs + rollback hooks) are queued for v5.4.
939
+ - This release was **shipped as four self-eat PRs** (#59, #61, #63, #64) where autopilot implemented its own next phase end-to-end. Cumulative cost ~\$17.50, wall clock ~82 min, 47 new tests. See [DEMO.md](DEMO.md) for the full proof set.
940
+ - v5.3 "deploy phase" was superseded by v5.4 — the adapter pattern subsumed the generic-command-only design from the in-flight v5.3 spec.
12
941
 
13
942
  ## v5.2.2 — Demo polish
14
943