@delegance/claude-autopilot 7.8.0 → 7.9.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md CHANGED
@@ -2,6 +2,27 @@
2
2
 
3
3
  - v5.6 Phase 7 (docs reconciliation) — pending.
4
4
 
5
+ ## 7.9.1 — 2026-05-13 (correctness hotfix)
6
+
7
+ ### Fixed
8
+ - **`skills/autopilot/SKILL.md` ran migrate BEFORE validate.** On stacks that auto-promote (Supabase-script-specific), this could leave production with new schema and no working code if validate or PR review later failed. Resequenced: validate is now Step 4, migrate-dev is Step 5. PR + Codex + bugbot follow. Production migration is explicitly handed off to the user's CI/CD pipeline.
9
+ - **Removed misleading "dev → QA → prod auto-promote" claim.** That behavior is Supabase-stack-specific, not a generic CLI capability. The skill now references the four real `migrate.policy` keys (`allow_prod_in_ci`, `require_clean_git`, `require_manual_approval`, `require_dry_run_first`) and explains how to wire them in `.autopilot/stack.md`.
10
+
11
+ ### Out of scope (filed as v7.10.0 + v7.11.0 candidates)
12
+ - Expand/contract migration classification (additive vs destructive enforcement)
13
+ - v6 run-state engine integration into the autopilot skill (4,873 LOC of checkpoint/resume infra currently unused by the skill)
14
+ - Retry-loop sameness detector ("same fingerprint twice → escalate to human")
15
+
16
+ ## 7.9.0 — 2026-05-12
17
+
18
+ ### Changed
19
+ - `skills/autopilot/SKILL.md` rewrite: merge "idea → spec" Step 0 (Step 0 brainstorming with per-step Codex validation) with the risk-tiered codex pass policy from v7.8.0. Adds entry decision tree (idea vs spec), operational preflight (gh auth + push-permission check, strict worktree-clean gate, Codex CLI resolution with package-fallback, portable test-runner detection), tightened CRITICAL-finding remediation semantics ("must remediate, not just acknowledge"), missing-`risk:` frontmatter backward-compat (default medium + keyword auto-escalation to high, resolved at preflight not mid-pipeline), entry-path-aware risk-tier pass counts (idea-entry vs approved-spec-entry), and `mkdtemp`-based secure tempfile handling (0700 dir + 0600 file + cleanup in finally) to replace the predictable timestamp/pid pattern. Net 242→292 lines.
20
+
21
+ ### Documentation
22
+ - Each pipeline step now states its risk-tier behavior explicitly; Step 0 brainstorming substeps use 1 codex pass each, Step 1 onward uses the risk-tiered policy.
23
+
24
+ No CLI/code changes. Skill content only.
25
+
5
26
  ## 7.8.0 — 2026-05-11
6
27
 
7
28
  ### Changed
package/README.md CHANGED
@@ -6,14 +6,11 @@
6
6
 
7
7
  **Open source, MIT-licensed, runs on your machine with your API keys.** No hosted agent, no per-seat subscription — `npm install -g @delegance/claude-autopilot@latest` and you're done.
8
8
 
9
- ## Hosted product (v7)
9
+ ## Hosted dashboard (early access)
10
10
 
11
- A hosted dashboard at **[autopilot.dev](https://autopilot.dev)** complements the self-hosted CLI. The CLI keeps doing all the work locally on your machine — running models, writing code, opening PRs — and optionally uploads each completed run's state + cost summary to the hosted dashboard so your team can see what's been shipped, audit cost, and manage memberships from the browser.
11
+ A hosted dashboard for team-wide run history, cost roll-up, and member management is in design-partner phase not yet open for self-serve signup. The CLI is and stays fully usable without it.
12
12
 
13
- - **CLI (this package)** local-first, no telemetry by default, your machine + your API keys. Pipeline stays the same.
14
- - **Dashboard ([autopilot.dev](https://autopilot.dev))** — opt-in. After `claude-autopilot dashboard login` mints a personal API key via a loopback OAuth flow, every engine-on autopilot run uploads its `state.json` + `events.ndjson` + cost roll-up at run.complete. Org admin (members, billing, SSO) lives there.
15
-
16
- The CLI works offline; the dashboard is purely additive. See [docs/v7/runbook.md](./docs/v7/runbook.md) for operating the hosted product and [docs/v7/breaking-changes.md](./docs/v7/breaking-changes.md) for the v6→v7 migration checklist.
13
+ If you're interested in early access, open an issue or email alex@delegance.com. Otherwise the rest of this README covers everything you need to run the CLI locally with your own API keys.
17
14
 
18
15
  ```bash
19
16
  claude-autopilot brainstorm "add SSO with SAML for enterprise tenants"
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@delegance/claude-autopilot",
3
- "version": "7.8.0",
3
+ "version": "7.9.1",
4
4
  "type": "module",
5
5
  "publishConfig": {
6
6
  "tag": "next"
@@ -1,127 +1,201 @@
1
1
  ---
2
2
  name: autopilot
3
- description: After spec approval, automatically execute the full pipeline — plan → implement → migratevalidate → PR → Codex review. No manual intervention required.
3
+ description: End-to-end pipeline brainstorm spec plan → implement → validatemigrate dev → PR → Codex review → bugbot → merge. Risk-tiered. No manual intervention after spec approval until PR merge; production deploy/migration gates are handled by the user's CI/CD pipeline.
4
4
  ---
5
5
 
6
- # Autopilot — Spec to PR Pipeline
6
+ # Autopilot — Idea to Merged PR Pipeline
7
7
 
8
- After the user approves a spec during brainstorming, this skill runs the full pipeline automatically.
8
+ Drives the full flow from raw user idea (or an existing spec) through merged PR. The ONLY pause is explicit user approval of the spec after Step 0; everything after that runs unattended unless blocked by an unrecoverable failure, missing credentials, or an unresolved CRITICAL Codex finding.
9
9
 
10
- ## Prerequisites
10
+ ## Entry decision tree
11
11
 
12
- - Approved spec file at `docs/superpowers/specs/YYYY-MM-DD-<topic>-design.md`
13
- - Superpowers plugin installed (`writing-plans`, `using-git-worktrees`, `subagent-driven-development`)
14
- - Scripts installed and dependencies present (run step 0 preflight to verify)
12
+ Pick the entry point ONCE at the start:
15
13
 
16
- ## CRITICAL: Do Not Pause
14
+ - **Approved spec path provided** (e.g. `docs/superpowers/specs/YYYY-MM-DD-<topic>-design.md`) → skip Step 0, jump to Step 1.
15
+ - **Only an idea provided** (no spec) → run Step 0 to brainstorm + spec it.
16
+ - **Neither** → ask the user once for either the spec path or the idea. This is the only allowed pre-pipeline pause.
17
+
18
+ ## Prerequisites (hard-gate)
19
+
20
+ The pipeline ABORTS with a clear actionable error if any of these are missing:
21
+
22
+ - **Required skills:** `superpowers:brainstorming`, `superpowers:writing-plans`, `superpowers:subagent-driven-development`, `superpowers:using-git-worktrees` (for Step 2)
23
+ - **Required project scripts:** `scripts/codex-review.ts`, `scripts/codex-pr-review.ts`, `scripts/bugbot.ts`, `scripts/validate.ts` — OR equivalent CLI verbs from `@delegance/claude-autopilot` if running the package version
24
+ - **Required env:** `OPENAI_API_KEY` (for Codex passes), `GITHUB_TOKEN` or `gh auth status` (for PR creation), `ANTHROPIC_API_KEY` (for impl agents)
25
+
26
+ If any superpowers skill is missing, print:
27
+ ```
28
+ Autopilot requires superpowers plugin. Install with: claude plugin install superpowers
29
+ ```
30
+ and exit. Do NOT half-run with a missing dependency.
31
+
32
+ ## Operational preflight (run before Step 0/1)
33
+
34
+ Each numbered check below is a HARD GATE. ALL must pass before any LLM call. There are no "OR" conditions — every condition listed inside a check must hold.
35
+
36
+ 1. **Branch is NOT `main`/`master`** — independent hard gate. Run `git rev-parse --abbrev-ref HEAD`; if the result is `main` or `master`, ABORT immediately. The skill never commits or mutates `main`/`master` directly. Resolution: create a feature branch or worktree first.
37
+ 2. **Working tree is clean (tracked AND untracked)** — `git diff --quiet` AND `git diff --cached --quiet` must both succeed. Run `git status --porcelain`; abort if any untracked files are present unless the user explicitly opted in via `--allow-untracked`.
38
+ 3. **GitHub auth resolved** — accept ANY of:
39
+ - `gh auth status` succeeds (interactive CLI login), OR
40
+ - `GITHUB_TOKEN`/`GH_TOKEN` is set AND `gh api user` (using that token) returns 200.
41
+
42
+ Then verify push permission for the target repo: `gh repo view <owner>/<repo>` succeeds AND either (a) `gh api repos/<owner>/<repo>/collaborators/<user>/permission` returns `admin`/`maintain`/`write`, or (b) a no-op `git push --dry-run origin HEAD:refs/heads/<probe-branch>` succeeds. Abort if no push permission.
43
+ 4. **Codex CLI resolution** — Resolve the codex-review command ONCE here and cache the resolved invocation for the rest of the pipeline. Try in order:
44
+ - `npx tsx scripts/codex-review.ts --help` (project-local script), OR
45
+ - `npx @delegance/claude-autopilot codex-review --help` (package CLI, also exposes `codex-pr-review`, `bugbot`, `validate`).
46
+
47
+ Use whichever resolves first; cache the resolved command. Abort if neither does. The accepted package CLI verbs are: `codex-review`, `codex-pr-review`, `bugbot`, `validate` — pinned to the major version of `@delegance/claude-autopilot` declared in the host project's `package.json`.
48
+ 5. **Test runner reachable** — Detect from `package.json`: if `scripts.test` is defined, run `npm run test -- --help` or framework-specific probe (`vitest --version`, `jest --help`, `playwright --version`). Do NOT assume `--dry-run` is supported. If no `test` script exists, accept presence of `scripts/validate.ts` or `scripts.validate` instead. Abort only if neither test nor validation entrypoint is reachable.
49
+ 6. **Implementation-agent credentials present** — `ANTHROPIC_API_KEY` is required because the skill drives Claude Code subagents for implementation; this is independent of which provider the host project uses for application LLM calls. If your impl-agent backend is intentionally non-Anthropic, document the override in `.claude/autopilot.config.json` and the preflight will accept the configured alternative.
50
+ 7. **Migration tool reachable** (if `data/deltas/` exists) — `/migrate` skill or `supabase` CLI
51
+
52
+ If any preflight check fails, abort with the specific check and remediation hint. Do NOT proceed and discover the failure mid-pipeline.
53
+
54
+ **Risk-tier confirmation is preflight, not mid-pipeline.** If the entry path is an approved spec and the spec's frontmatter is missing `risk:`, run the keyword auto-escalation rule (see below) and surface the inferred tier as a single preflight prompt BEFORE Step 1 begins. This is the same pre-pipeline pause class as "ask for the spec path"; it is not a mid-pipeline check-in.
55
+
56
+ ## CRITICAL invariant — Do Not Pause (after spec approval)
57
+
58
+ There are exactly TWO allowed pre-pipeline pause classes, both occurring BEFORE Step 1:
59
+
60
+ 1. **Step 0 brainstorming user input** — the user picks an approach in substep 1 and explicitly approves the final spec at the end. These are intrinsic to brainstorming, not mid-pipeline check-ins.
61
+ 2. **Preflight prompts** — missing entry inputs (spec path or idea), and risk-tier confirmation when frontmatter is missing `risk:`. These resolve before Step 1 starts.
62
+
63
+ After Step 1 begins, do NOT:
17
64
 
18
- **Run the entire pipeline without stopping.** Do NOT:
19
65
  - Ask "want me to continue?" between steps
20
66
  - Show intermediate results or ask for confirmation
21
67
  - Pause to report progress mid-pipeline
22
68
  - Wait for user input between any steps
23
69
 
24
- The ONLY time you stop is if a step **fails and cannot be recovered**. Otherwise, execute all steps sequentially and report ONCE at the end (Step 9).
70
+ The pipeline halts ONLY for:
71
+ - Unrecoverable step failure (retries exhausted)
72
+ - Missing credentials surfaced mid-run (despite preflight)
73
+ - An **unresolved CRITICAL Codex finding** (see acceptance rules below)
74
+ - An external system hard-block (TCC permission revoked, network outage, etc.)
25
75
 
26
- Brief status lines like `[autopilot] Step 3: Executing plan...` are fine. Full summaries, questions, or check-ins are not.
76
+ Brief status lines like `[autopilot] Step 3: Executing plan...` are fine. Full summaries, questions, or check-ins are not. Report ONCE at the end (Step 9).
27
77
 
28
78
  ## Codex pass policy (risk-tiered)
29
79
 
30
- > Adopted from the v7.4.1 strategic review (see
31
- > `docs/strategy/2026-05-11-codex-pivot.md`, codex finding N2).
32
- >
33
- > The v8 spec pass-2 finding 3 CRITICALs the original spec missed
34
- > (especially sandbox / credential exfiltration) was concrete evidence
35
- > that 1 codex pass is insufficient for security-sensitive architecture.
36
- > But running 3 passes on every CLI polish spec adds latency without
37
- > proportional value.
38
-
39
- **Tier the spec by risk; pass count follows.**
80
+ > Adopted from the v7.4.1 strategic review. The v8 spec pass-2
81
+ > finding 3 CRITICALs the original spec missed (sandbox / credential
82
+ > exfiltration) was concrete evidence that 1 codex pass is
83
+ > insufficient for security-sensitive architecture. But running 3
84
+ > passes on every CLI polish spec adds latency without value.
40
85
 
41
- | Spec risk | Triggers | # of codex passes |
42
- |---|---|---|
43
- | **Low** | CLI UX changes, doc-only PRs, scaffolding extensions, config polish, CI workflow tweaks | **1 pass** (this skill's existing pattern — codex on the committed spec) |
44
- | **Medium** | New execution modes, auth changes, billing flows, data-access patterns, new env vars, API contracts | **2 passes** (1 on the draft spec, 1 on the merged spec after edits) |
45
- | **High** | Sandboxing, multi-tenancy, auto-merge, anything that mutates user repos, new secrets-handling, RPC/SECURITY DEFINER changes | **3 passes** + external review (1 draft, 1 post-edit, 1 on the impl PR diff) |
46
-
47
- **How to apply.** Spec docs declare risk in their frontmatter:
48
-
49
- ```markdown
86
+ **Specs must declare risk in YAML frontmatter:**
87
+ ```yaml
50
88
  ---
51
- title: <topic>
89
+ title: <spec title>
52
90
  risk: low | medium | high
53
91
  ---
54
92
  ```
55
93
 
56
- If the spec's `risk:` is omitted, default to **medium** (safer than
57
- defaulting to low; matches the v8 spec pattern where pass-1 was
58
- clearly insufficient).
94
+ **Pass-count rules:**
59
95
 
60
- The brainstorming skill's per-step codex pass (approach selection,
61
- architecture, components, error handling, implementation prep) is
62
- ALWAYS run it's how we get a draft spec good enough to merge.
63
- This tier policy applies to the **post-brainstorm** passes and to
64
- the codex PR review at Step 7 below.
96
+ | Spec risk | Triggers | Codex passes (idea-entry) | Codex passes (approved-spec-entry) |
97
+ |---|---|---|---|
98
+ | **Low** | CLI UX, doc-only PRs, scaffolding extensions, config polish, CI workflow tweaks | **1 pass** on the committed spec | **1 pass** on the committed spec (Step 1) |
99
+ | **Medium** | New execution modes, auth changes, billing flows, data-access patterns, new env vars, API contracts | **2 passes** (1 draft in Step 0, 1 on merged spec in Step 1) + Step 7 PR review | **2 passes** on the committed spec in Step 1 (back-to-back, with remediation between) + Step 7 PR review |
100
+ | **High** | Sandboxing, multi-tenancy, auto-merge, repo mutation, new secrets handling, RPC/SECURITY DEFINER changes | **3 passes** (1 draft, 1 post-edit in Step 1, 1 pre-impl) + Step 7 PR review | **3 passes** on the committed spec in Step 1 (each followed by remediation) + Step 7 PR review |
65
101
 
66
- **Examples from v7.x:**
102
+ **Entry-path semantics:** When entering with an approved spec (no Step 0 draft pass available), run the full required pass count starting in Step 1, applying remediation and re-running until CRITICALs are clean. Optionally, the spec may declare prior Codex passes in frontmatter (`codex_passes_completed: <n>`) to credit toward the requirement.
67
103
 
68
- * v7.1.7 (setup polish — CLAUDE.md scaffold + .gitignore + dedup): low.
69
- 1 pass on the committed spec. Caught zero CRITICALs in practice.
70
- * v7.4.0 (Python/FastAPI scaffold extension): low. 1 pass. Found
71
- 2 CRITICALs (FastAPI precedence, dangling entrypoint) — both
72
- fixed pre-impl, no PR-pass surprises.
73
- * v7.0 Phase 6 (engine-off removal + middleware revocation): high.
74
- 3 passes (spec, post-edit, PR diff). Each pass surfaced new
75
- trust-boundary issues; without all three the launch would have
76
- shipped with the credential-exfiltration vector C3.
77
- * v8.0 spec (standalone daemon): high. 2 passes so far + needs a
78
- 3rd before any v8 alpha implementation.
104
+ **Backward-compatibility for missing `risk:`:**
79
105
 
80
- ## Pipeline
106
+ If the spec frontmatter is missing `risk:`, default to `medium` AND emit a warning. Auto-escalate to `high` only when the spec content crosses a trust boundary — match on these specific phrases (not the bare word `auth`, which is too broad and conflicts with the medium-risk "auth changes" classification in the table above):
81
107
 
82
- Execute these steps in order. Do NOT pause between steps unless a step fails.
108
+ - `multi-tenant`, `tenancy`, `RLS bypass`, `SECURITY DEFINER`
109
+ - `secret handling`, `credential handling`, `vault`, `token storage`, `session token`
110
+ - `auto-merge`, `automerge`, `sandbox`, `sandboxed`
111
+ - `SSO provisioning`, `SAML assertion`, `OAuth provider integration`, `privilege escalation`
112
+ - `repo mutation`, `force-push`
83
113
 
84
- ### Step 0: Preflight
114
+ The risk-tier confirmation happens during preflight (see "Risk-tier confirmation is preflight, not mid-pipeline" above), not as a separate mid-pipeline pause.
115
+
116
+ **Step 0 substep passes are ALWAYS 1 initial pass each, regardless of spec risk.** Brainstorming is draft-stage feedback, not load-bearing security review. The risk-tiered policy applies starting at Step 1 (where the spec is committed). Note that CRITICAL findings in a Step 0 substep still trigger remediation + re-pass per the acceptance rules below — "1 initial pass" means "1 pass to surface findings," not "1 pass total even with unresolved CRITICALs."
117
+
118
+ ## Acceptance rules for Codex findings (CRITICAL — remediation semantics)
119
+
120
+ This is the load-bearing rule. Misreading it can ship vulnerable code.
121
+
122
+ - **CRITICAL findings: must be REMEDIATED, not just acknowledged.** Apply the fix to the spec/plan/code, then re-run the Codex pass. The pipeline MAY NOT proceed while any CRITICAL finding remains unresolved. "Auto-accept" never means "continue past."
123
+ - **WARNING findings:** remediate by default. Skip ONLY if the finding directly contradicts a locked user requirement; in that case, document the skip in the PR description with the reason.
124
+ - **NOTE findings:** discretionary. Roll into a "post-launch follow-ups" appendix if relevant; otherwise ignore.
125
+
126
+ After each Codex pass, present a single-line summary table to the user (severity + title + remediation status). Do NOT pause for "should I incorporate these?" — apply the rules above and continue.
127
+
128
+ ## Tempfile naming (avoid concurrent-session collisions + secure-by-default)
129
+
130
+ Codex passes write to temporary input files. Use a per-run isolated directory with restrictive permissions, NOT a shared `/tmp` path:
85
131
 
86
- ```bash
87
- npx tsx scripts/preflight.ts
88
132
  ```
133
+ <dir> = mkdtemp("${TMPDIR:-/tmp}/claude-autopilot-XXXXXX") # mode 0700
134
+ <file> = $dir/codex-input-<topic-slug>-<step>.md # mode 0600, exclusive create (O_EXCL)
135
+ ```
136
+
137
+ The timestamp+pid pattern from earlier drafts is INSUFFICIENT — it is predictable, world-readable in shared `/tmp`, and vulnerable to symlink races. Always use `mkdtemp` (or platform-equivalent) to get a randomized, exclusively-created directory before writing the file.
138
+
139
+ Example resolved path: `/tmp/claude-autopilot-aB3xQ9/codex-input-v8-daemon-step-arch.md`
140
+
141
+ Rules:
142
+ - Use `mkdtemp` (Node: `fs.mkdtempSync(path.join(os.tmpdir(), 'claude-autopilot-'))`) — never a hand-built path
143
+ - Set file mode `0600` on creation; set dir mode `0700`
144
+ - Clean up the entire temp directory in a `finally` block (success OR failure)
145
+ - Specs may contain sensitive architecture details (tenant/RLS behavior, auth flows, schema names, internal endpoints). The "no secrets" rule extends to "no production-sensitive design content in tempfiles you'd be uncomfortable surfacing"; paraphrase such content before writing
146
+ - Never write secrets, API keys, or production credentials to tempfiles regardless of location
147
+
148
+ ## Step 0: Brainstorming with per-step Codex validation
149
+
150
+ **Skip this step entirely if a spec already exists.** Otherwise:
89
151
 
90
- If any check **fails** (red ✗): stop and tell the user what to fix before continuing.
91
- If checks only **warn** (yellow !): proceed — degraded steps will be noted in the final report.
92
- If all pass: continue immediately, no user interaction needed.
152
+ Drive `superpowers:brainstorming` from the user's idea. **At each substep below, automatically run `/codex-review` and incorporate findings before moving on.** One pass per substep. Do not wait to be prompted.
93
153
 
94
- ### Step 1: Write Implementation Plan
154
+ Codex-validate after each of these brainstorming substeps:
95
155
 
156
+ 1. **Approach selection** — after presenting 2–3 approaches and the user picks one, write the chosen approach + rejected alternatives to a tempfile (per pattern above) and run the resolved codex-review command (from preflight, typically `npx tsx scripts/codex-review.ts <tempfile>` for project installs, or the package CLI fallback). Apply CRITICAL findings before proceeding (remediate, then re-pass if needed).
157
+ 2. **Architecture section** — after presenting the top-level architecture (boxes/arrows + key principles), Codex-validate; remediate CRITICALs.
158
+ 3. **Components + data flow section** — after detailing components, schemas, and data flow, Codex-validate; remediate CRITICALs.
159
+ 4. **Error handling + testing section** — after specifying failure modes and test strategy, Codex-validate; remediate CRITICALs.
160
+ 5. **Prepare final spec draft** — once the spec doc is written and self-reviewed, capture WARNINGs/NOTEs into a "post-launch follow-ups" appendix. (The load-bearing final spec validation happens in Step 1, NOT here — Step 0 produces a draft ready for the risk-tiered pass.)
161
+
162
+ Each substep uses ONE initial Codex pass for fast design feedback. Additional revalidation passes are required ONLY when CRITICAL findings are remediated (per the acceptance rules above) — "one initial pass" is not a cap on re-passes after remediation.
163
+
164
+ **Exit Step 0 with:**
165
+ - Committed spec at `docs/superpowers/specs/YYYY-MM-DD-<topic>-design.md`
166
+ - Spec includes `risk: low|medium|high` in frontmatter (or accept the default-medium per the backward-compat rule)
167
+ - User has explicitly approved the spec (this is the ONLY allowed pause in the entire pipeline)
168
+
169
+ ## Pipeline
170
+
171
+ Execute these steps in order. Do NOT pause between steps unless a step fails per the acceptance rules.
172
+
173
+ ### Step 1: Risk-tiered final spec validation + write implementation plan
174
+
175
+ **First — risk-tiered pass on the committed spec** per the policy table above:
176
+ - Low risk: 1 pass on the spec
177
+ - Medium risk: 2 passes (this one + Step 7 codex PR review serves as pass 2)
178
+ - High risk: 3 passes (this + Step 7 + an explicit pre-implementation pass)
179
+
180
+ Remediate CRITICALs in-place on the spec before moving on.
181
+
182
+ **Then write the plan:**
96
183
  ```
97
184
  Invoke: superpowers:writing-plans
98
- Input: The approved spec file
185
+ Input: The approved + validated spec
99
186
  Output: Plan at docs/superpowers/plans/YYYY-MM-DD-<topic>.md
100
187
  ```
101
188
 
102
- Commit the plan. Do NOT ask the user for execution choice always use subagent-driven development.
189
+ After the plan is written but BEFORE committing it, run `npx tsx scripts/codex-review.ts <plan-path>`. Apply CRITICAL findings (sequencing errors, missing test coverage on a load-bearing path, schema/migration ordering bugs) to the plan inline. Then commit. Always use subagent-driven development for execution — do not ask the user.
103
190
 
104
- ### Step 2: Set Up Worktree
191
+ ### Step 2: Set up worktree
105
192
 
106
193
  ```
107
194
  Invoke: superpowers:using-git-worktrees
108
195
  Branch: feature/<topic-slug>
109
196
  ```
110
197
 
111
- After the worktree is created, symlink the local env file into it so scripts
112
- (validate, Codex review, migrate) can read secrets:
113
-
114
- ```bash
115
- # Detect which env file the project uses
116
- ENV_FILE=$(ls .env.local .env.dev .env.development .env 2>/dev/null | head -1)
117
- if [ -n "$ENV_FILE" ]; then
118
- ln -sf "$(pwd)/$ENV_FILE" ".claude/worktrees/<branch>/$ENV_FILE"
119
- fi
120
- ```
121
-
122
- If no env file is found, note it in the preflight output (step 0 will have caught this).
123
-
124
- ### Step 3: Execute Plan
198
+ ### Step 3: Execute plan
125
199
 
126
200
  ```
127
201
  Invoke: superpowers:subagent-driven-development
@@ -135,74 +209,114 @@ For each task:
135
209
  - Skip formal spec/quality review to maintain speed (the validate step catches issues)
136
210
  - If subagent fails to write to worktree: implement directly
137
211
 
138
- ### Step 4: Migrate
212
+ ### Step 4: Validate
213
+
214
+ Run both checks in order. **Autofix runs first** so the static review sees the final post-autofix diff (otherwise the LLM review is stale on autofix mutations):
215
+
216
+ ```bash
217
+ # 1. Full project validation FIRST (autofix, tests, codex, gate) — mutates files
218
+ npx tsx scripts/validate.ts --commit-autofix --allow-dirty
139
219
 
140
- After implementation creates schema changes, the autopilot pipeline runs the migrate phase via the canonical dispatcher contract. The dispatcher:
220
+ # 2. THEN static rules + LLM review on the final diff
221
+ npx autopilot run --base main
222
+ ```
141
223
 
142
- 1. Reads `.autopilot/stack.md` looks up `migrate.skill` (default: `migrate@1`)
143
- 2. Resolves the skill via the alias map (path-escape protected)
144
- 3. Performs version handshake (skill manifest must declare a runtime range that includes the current `claude-autopilot` version, and an API version major matching the envelope contract)
145
- 4. Builds an invocation envelope (`{ contractVersion, invocationId, nonce, env, repoRoot, changedFiles, gitBase, gitHead, ci, ... }`) passed to the skill via env vars + stdin
146
- 5. Enforces policy (4-flag CI prod gate + clean-git + manual-approval + dry-run-first)
147
- 6. Executes the configured command via `spawn(shell:false)` — structured argv only, no shell injection
148
- 7. Parses the result artifact (file-first, nonce-bound stdout fallback only if skill manifest opts in)
149
- 8. Writes a hash-chained audit log entry to `.autopilot/audit.log`
150
- 9. Branches on `nextActions[]` from the result (e.g. `regenerate-types` triggers `npm run typecheck`)
224
+ The `validate.ts` Phase 1 includes a **tsc regression check**: it runs `npx tsc --noEmit` against both the PR and the merge-base (cached at `.claude/.tsc-baseline-cache.json`) and surfaces only files where the PR introduces *new* TypeScript errors versus the baseline. Forward-pressure check — type errors are warnings, not blockers.
151
225
 
152
- For the autopilot pipeline, this is invoked once per migration affected by the implementation changes.
226
+ If either FAIL:
227
+ - Read findings / validation report at `.claude/validation-report.json`
228
+ - Fix the blocking issues
229
+ - Re-run the failing check
230
+ - Max 3 retry iterations
153
231
 
154
- #### CI prod safety floor
232
+ If both PASS: proceed to Step 5.
155
233
 
156
- Running `--env prod` from CI requires **all four** of these (skills cannot relax):
157
- 1. `--yes` flag explicit
158
- 2. `AUTOPILOT_CI_POLICY=allow-prod` env var
159
- 3. `AUTOPILOT_TARGET_ENV=prod` env var (must match `--env`)
160
- 4. `migrate.policy.allow_prod_in_ci: true` in stack.md
234
+ ### Step 5: Migrate dev-only (verify SQL parses + applies)
161
235
 
162
- Plus a recognized CI provider env (GitHub Actions / GitLab CI / CircleCI / Buildkite / Jenkins)or an explicit `AUTOPILOT_CI_PROVIDER=<name>` override for self-hosted CI.
236
+ For any `.sql` files created in `data/deltas/` during implementation, run the migration against the dev database ONLY. Always invoke explicitly do not rely on the CLI default:
163
237
 
164
- #### Configuration source of truth
238
+ ```bash
239
+ /migrate --env=dev
240
+ ```
165
241
 
166
- All migrate behavior is configured in `.autopilot/stack.md`. The autopilot skill never invokes a specific runner directly; it always dispatches through `claude-autopilot dispatch`. See:
167
- - `docs/superpowers/specs/2026-04-29-migrate-skill-generalization-design.md` — full spec
168
- - `docs/skills/rich-migrate-contract.md` — envelope + result artifact contract for skill authors
169
- - `docs/skills/version-compatibility.md` — runtime/skill version compatibility matrix
170
- - `presets/schemas/migrate.schema.json` — JSON Schema for the stack.md migrate block
242
+ This verifies the SQL parses and applies against the real schema shape before the PR opens. Prod migration is **deferred to the user's CI/CD pipeline** after the PR merges — autopilot does NOT orchestrate prod deploys or migrations directly, because deployment ordering varies wildly across user infrastructure (ECS, CodeBuild, blue/green, manual approvals).
171
243
 
172
- ### Step 5: Validate
244
+ **Post-migration re-validate** (catches type/test failures that only surface after the schema actually changes):
173
245
 
174
246
  ```bash
175
- npx tsx scripts/validate.ts --commit-autofix --allow-dirty
247
+ # Regenerate any stack-specific DB types if migration introduced schema changes
248
+ # (e.g. Supabase: scripts/gen-types.ts). Stack-specific — skip if N/A.
249
+ # REQUIRED for any stack with generated DB-typed code: missing regeneration leaves
250
+ # stale types and validate.ts may not catch runtime/schema mismatches if the PR
251
+ # does not directly touch the changed columns.
252
+
253
+ # Re-run validation against the post-migration dev state (no autofix this time):
254
+ npx tsx scripts/validate.ts --allow-dirty
255
+
256
+ # If migration or type generation produced new file changes, re-run the
257
+ # static/LLM review against the final diff — the Step 4 review is now stale.
258
+ git diff --quiet HEAD -- || npx autopilot run --base main
176
259
  ```
177
260
 
178
- If FAIL:
179
- - Read the validation report at `.claude/validation-report.json`
180
- - Fix the blocking issues
181
- - Re-run validate
261
+ > **v7.10+ candidate:** automate detection by reading a `post_migrate_dev: [...]` hook list from `.autopilot/stack.md` and failing Step 5 if schema changes are detected but no type-generation hook is configured. Today this is advisory — operator responsibility.
262
+
263
+ **Dev database drift** — `/migrate --env=dev` applies schema changes to dev BEFORE the PR is merged. If the PR is later rejected, substantially changed, or abandoned, dev will contain unmerged schema. Mitigations:
264
+
265
+ - **Preferred:** target an isolated/ephemeral dev database per branch (per-PR Supabase project, Postgres schema namespace, or local container).
266
+ - **Shared dev DB fallback:** before running Step 5, capture the current migration_state head; on PR abandonment, run a corrective migration that brings dev back to that head. Document the policy in `.autopilot/stack.md` so the team knows the cleanup contract.
267
+
268
+ If migration fails:
269
+ - **Before retrying:** confirm the dev migration rolled back cleanly. Some migration tools leave partial state on failure (non-transactional DDL, drift in migration_state table). If partial state exists, reset the dev database to the pre-migration baseline OR write a corrective migration that brings dev back to a known state.
270
+ - Fix the SQL
271
+ - Re-run Step 4 (validate) against the corrected code
272
+ - Re-run Step 5 (migrate dev)
182
273
  - Max 3 retry iterations
183
274
 
184
- If PASS: proceed to PR.
275
+ **For prod-safety policy**, projects should set in `.autopilot/stack.md`:
276
+
277
+ ```yaml
278
+ migrate:
279
+ skill: <your-skill>@1
280
+ policy:
281
+ require_dry_run_first: true # always preview before apply
282
+ require_manual_approval: true # CI gates prod on human approval
283
+ require_clean_git: true # never apply against dirty tree
284
+ ```
285
+
286
+ As of v7.9.1, the valid keys per `presets/schemas/migrate.schema.json` are: `allow_prod_in_ci`, `require_clean_git`, `require_manual_approval`, `require_dry_run_first`. If the schema changes in a future release, that file is the source of truth.
287
+
288
+ These keys are **declarative policy inputs**, not autopilot enforcement. Your CI/CD pipeline must explicitly invoke a migrate runner that reads `.autopilot/stack.md` AND enforces equivalent checks (e.g. require-clean-git, dry-run-before-apply, manual-approval-for-prod). Setting them in stack.md alone does not protect production unless your pipeline reads them.
289
+
290
+ **Minimum CI/CD migrate-runner contract** (fail closed on all of these):
291
+
292
+ 1. Refuse to apply if `.autopilot/stack.md` cannot be read or parsed.
293
+ 2. Refuse to apply if the policy block contains unknown keys (schema-validate against `presets/schemas/migrate.schema.json`).
294
+ 3. If `require_dry_run_first: true`: refuse apply without a matching dry-run artifact for the current git head + target env.
295
+ 4. If `require_manual_approval: true` and `env != dev`: require an explicit human approval signal (CI approval gate, signed commit, etc.) before apply.
296
+ 5. If `require_clean_git: true`: refuse to apply against a dirty working tree (untracked or unstaged changes). This is intentionally limited to working-tree cleanliness — commit topology (squash vs rebase vs merge vs tag) is a separate concern not enforced by this key.
297
+ 6. If `allow_prod_in_ci: false` (the default): refuse prod apply from any CI context. **Note:** teams whose intended prod migration path is CI/CD with manual approval must explicitly set `allow_prod_in_ci: true` alongside `require_manual_approval: true` and `require_dry_run_first: true`. `allow_prod_in_ci: false` is for manual/operator-run production migration workflows only.
298
+ 7. On any policy-read or schema-validation failure, exit non-zero and surface the specific check that failed.
185
299
 
186
- ### Step 6: Push + Create PR
300
+ ### Step 6: Push + create PR
187
301
 
188
302
  ```bash
189
303
  git push -u origin <branch>
190
304
  gh pr create --title "<concise title>" --body "<generated PR body with spec link, test plan>"
191
305
  ```
192
306
 
193
- ### Step 7: Codex PR Review
307
+ ### Step 7: Codex PR review
194
308
 
195
309
  ```bash
196
310
  npx tsx scripts/codex-pr-review.ts <pr-number>
197
311
  ```
198
312
 
199
- Posts Codex review as a GitHub PR comment. If critical findings:
200
- - Fix them on the branch
313
+ Posts Codex review as a GitHub PR comment. **This serves as the second risk-tiered pass for medium-risk specs and the third pass for high-risk specs.** Remediate CRITICAL findings:
314
+ - Fix on the branch
201
315
  - Push
202
316
  - Re-run Codex review
203
317
  - Max 2 iterations
204
318
 
205
- ### Step 8: Bugbot Triage + Fix
319
+ ### Step 8: Bugbot triage + fix
206
320
 
207
321
  Wait 60 seconds for Cursor bugbot to post comments, then:
208
322
 
@@ -222,21 +336,25 @@ Tell the user:
222
336
  - PR URL
223
337
  - Test count
224
338
  - Validation verdict
225
- - Codex review summary
339
+ - Codex review summary (passes run, CRITICALs remediated, WARNINGs skipped + reason)
226
340
  - Bugbot triage summary (fixed / dismissed / needs-human)
227
- - Any human-required items that couldn't be auto-fixed
341
+ - Any human-required items that couldn't be auto-resolved
228
342
 
229
343
  ## Error Recovery
230
344
 
231
- - **Subagent failure:** Re-dispatch with more context or implement directly
232
- - **Migration failure:** Fix the migration source (per the configured `migrate.skill` in stack.md), re-dispatch via `claude-autopilot dispatch migrate`
233
- - **Validate failure:** Fix issues, re-run (max 3 retries)
234
- - **Codex critical findings:** Fix, push, re-review (max 2 retries)
235
- - **Bugbot findings:** /bugbot handles triage + fix automatically (max 3 rounds)
236
- - **Unrecoverable error:** Stop, report what was completed, show remaining work
345
+ - **Preflight failure:** Surface the specific check, exit. Do not partially run.
346
+ - **Missing skill/credential:** Exit with install/auth hint.
347
+ - **Subagent failure:** Re-dispatch with more context or implement directly.
348
+ - **Migration failure (Step 5 — dev only):** Fix SQL, re-run Step 4 (validate) against corrected code, then re-run Step 5 (migrate dev). Prod stays untouched. Max 3 retries.
349
+ - **Prod migration:** Out of scope for autopilot. Handed off to user's CI/CD pipeline via `migrate.policy` (require_manual_approval, require_dry_run_first).
350
+ - **Validate failure:** Fix issues, re-run (max 3 retries).
351
+ - **Codex CRITICAL findings:** REMEDIATE (apply fix), push, re-review (max 2 retries). Do NOT continue past unremediated CRITICALs.
352
+ - **Bugbot findings:** `/bugbot` handles triage + fix automatically (max 3 rounds).
353
+ - **External hard-block** (TCC, network, etc.): Stop, report what was completed, surface the blocker.
237
354
 
238
- ## When NOT to Use
355
+ ## When NOT to use
239
356
 
240
- - During brainstorming (this runs AFTER spec approval)
357
+ - During brainstorming if you haven't approved the spec yet (this skill runs AFTER spec approval)
241
358
  - For hotfixes (too heavy — just commit and push)
242
- - When the user wants manual control over each step
359
+ - When the user wants manual control over each step (use individual phase skills instead)
360
+ - When required credentials/dependencies are missing and you don't want to be told (preflight will catch them)