@delegance/claude-autopilot 7.8.0 → 7.9.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md CHANGED
@@ -2,6 +2,16 @@
2
2
 
3
3
  - v5.6 Phase 7 (docs reconciliation) — pending.
4
4
 
5
+ ## 7.9.0 — 2026-05-12
6
+
7
+ ### Changed
8
+ - `skills/autopilot/SKILL.md` rewrite: merge "idea → spec" Step 0 (Step 0 brainstorming with per-step Codex validation) with the risk-tiered codex pass policy from v7.8.0. Adds entry decision tree (idea vs spec), operational preflight (gh auth + push-permission check, strict worktree-clean gate, Codex CLI resolution with package-fallback, portable test-runner detection), tightened CRITICAL-finding remediation semantics ("must remediate, not just acknowledge"), missing-`risk:` frontmatter backward-compat (default medium + keyword auto-escalation to high, resolved at preflight not mid-pipeline), entry-path-aware risk-tier pass counts (idea-entry vs approved-spec-entry), and `mkdtemp`-based secure tempfile handling (0700 dir + 0600 file + cleanup in finally) to replace the predictable timestamp/pid pattern. Net 242→292 lines.
9
+
10
+ ### Documentation
11
+ - Each pipeline step now states its risk-tier behavior explicitly; Step 0 brainstorming substeps use 1 codex pass each, Step 1 onward uses the risk-tiered policy.
12
+
13
+ No CLI/code changes. Skill content only.
14
+
5
15
  ## 7.8.0 — 2026-05-11
6
16
 
7
17
  ### Changed
package/README.md CHANGED
@@ -6,14 +6,11 @@
6
6
 
7
7
  **Open source, MIT-licensed, runs on your machine with your API keys.** No hosted agent, no per-seat subscription — `npm install -g @delegance/claude-autopilot@latest` and you're done.
8
8
 
9
- ## Hosted product (v7)
9
+ ## Hosted dashboard (early access)
10
10
 
11
- A hosted dashboard at **[autopilot.dev](https://autopilot.dev)** complements the self-hosted CLI. The CLI keeps doing all the work locally on your machine — running models, writing code, opening PRs — and optionally uploads each completed run's state + cost summary to the hosted dashboard so your team can see what's been shipped, audit cost, and manage memberships from the browser.
11
+ A hosted dashboard for team-wide run history, cost roll-up, and member management is in design-partner phase not yet open for self-serve signup. The CLI is and stays fully usable without it.
12
12
 
13
- - **CLI (this package)** local-first, no telemetry by default, your machine + your API keys. Pipeline stays the same.
14
- - **Dashboard ([autopilot.dev](https://autopilot.dev))** — opt-in. After `claude-autopilot dashboard login` mints a personal API key via a loopback OAuth flow, every engine-on autopilot run uploads its `state.json` + `events.ndjson` + cost roll-up at run.complete. Org admin (members, billing, SSO) lives there.
15
-
16
- The CLI works offline; the dashboard is purely additive. See [docs/v7/runbook.md](./docs/v7/runbook.md) for operating the hosted product and [docs/v7/breaking-changes.md](./docs/v7/breaking-changes.md) for the v6→v7 migration checklist.
13
+ If you're interested in early access, open an issue or email alex@delegance.com. Otherwise the rest of this README covers everything you need to run the CLI locally with your own API keys.
17
14
 
18
15
  ```bash
19
16
  claude-autopilot brainstorm "add SSO with SAML for enterprise tenants"
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@delegance/claude-autopilot",
3
- "version": "7.8.0",
3
+ "version": "7.9.0",
4
4
  "type": "module",
5
5
  "publishConfig": {
6
6
  "tag": "next"
@@ -1,127 +1,201 @@
1
1
  ---
2
2
  name: autopilot
3
- description: After spec approval, automatically execute the full pipeline — plan → implement → migrate → validate → PR → Codex review. No manual intervention required.
3
+ description: End-to-end pipeline brainstorm spec plan → implement → migrate → validate → PR → Codex review → bugbot. Risk-tiered. No manual intervention after spec approval.
4
4
  ---
5
5
 
6
- # Autopilot — Spec to PR Pipeline
6
+ # Autopilot — Idea to Merged PR Pipeline
7
7
 
8
- After the user approves a spec during brainstorming, this skill runs the full pipeline automatically.
8
+ Drives the full flow from raw user idea (or an existing spec) through merged PR. The ONLY pause is explicit user approval of the spec after Step 0; everything after that runs unattended unless blocked by an unrecoverable failure, missing credentials, or an unresolved CRITICAL Codex finding.
9
9
 
10
- ## Prerequisites
10
+ ## Entry decision tree
11
11
 
12
- - Approved spec file at `docs/superpowers/specs/YYYY-MM-DD-<topic>-design.md`
13
- - Superpowers plugin installed (`writing-plans`, `using-git-worktrees`, `subagent-driven-development`)
14
- - Scripts installed and dependencies present (run step 0 preflight to verify)
12
+ Pick the entry point ONCE at the start:
15
13
 
16
- ## CRITICAL: Do Not Pause
14
+ - **Approved spec path provided** (e.g. `docs/superpowers/specs/YYYY-MM-DD-<topic>-design.md`) → skip Step 0, jump to Step 1.
15
+ - **Only an idea provided** (no spec) → run Step 0 to brainstorm + spec it.
16
+ - **Neither** → ask the user once for either the spec path or the idea. This is the only allowed pre-pipeline pause.
17
+
18
+ ## Prerequisites (hard-gate)
19
+
20
+ The pipeline ABORTS with a clear actionable error if any of these are missing:
21
+
22
+ - **Required skills:** `superpowers:brainstorming`, `superpowers:writing-plans`, `superpowers:subagent-driven-development`, `superpowers:using-git-worktrees` (for Step 2)
23
+ - **Required project scripts:** `scripts/codex-review.ts`, `scripts/codex-pr-review.ts`, `scripts/bugbot.ts`, `scripts/validate.ts` — OR equivalent CLI verbs from `@delegance/claude-autopilot` if running the package version
24
+ - **Required env:** `OPENAI_API_KEY` (for Codex passes), `GITHUB_TOKEN` or `gh auth status` (for PR creation), `ANTHROPIC_API_KEY` (for impl agents)
25
+
26
+ If any superpowers skill is missing, print:
27
+ ```
28
+ Autopilot requires superpowers plugin. Install with: claude plugin install superpowers
29
+ ```
30
+ and exit. Do NOT half-run with a missing dependency.
31
+
32
+ ## Operational preflight (run before Step 0/1)
33
+
34
+ Each numbered check below is a HARD GATE. ALL must pass before any LLM call. There are no "OR" conditions — every condition listed inside a check must hold.
35
+
36
+ 1. **Branch is NOT `main`/`master`** — independent hard gate. Run `git rev-parse --abbrev-ref HEAD`; if the result is `main` or `master`, ABORT immediately. The skill never commits or mutates `main`/`master` directly. Resolution: create a feature branch or worktree first.
37
+ 2. **Working tree is clean (tracked AND untracked)** — `git diff --quiet` AND `git diff --cached --quiet` must both succeed. Run `git status --porcelain`; abort if any untracked files are present unless the user explicitly opted in via `--allow-untracked`.
38
+ 3. **GitHub auth resolved** — accept ANY of:
39
+ - `gh auth status` succeeds (interactive CLI login), OR
40
+ - `GITHUB_TOKEN`/`GH_TOKEN` is set AND `gh api user` (using that token) returns 200.
41
+
42
+ Then verify push permission for the target repo: `gh repo view <owner>/<repo>` succeeds AND either (a) `gh api repos/<owner>/<repo>/collaborators/<user>/permission` returns `admin`/`maintain`/`write`, or (b) a no-op `git push --dry-run origin HEAD:refs/heads/<probe-branch>` succeeds. Abort if no push permission.
43
+ 4. **Codex CLI resolution** — Resolve the codex-review command ONCE here and cache the resolved invocation for the rest of the pipeline. Try in order:
44
+ - `npx tsx scripts/codex-review.ts --help` (project-local script), OR
45
+ - `npx @delegance/claude-autopilot codex-review --help` (package CLI, also exposes `codex-pr-review`, `bugbot`, `validate`).
46
+
47
+ Use whichever resolves first; cache the resolved command. Abort if neither does. The accepted package CLI verbs are: `codex-review`, `codex-pr-review`, `bugbot`, `validate` — pinned to the major version of `@delegance/claude-autopilot` declared in the host project's `package.json`.
48
+ 5. **Test runner reachable** — Detect from `package.json`: if `scripts.test` is defined, run `npm run test -- --help` or framework-specific probe (`vitest --version`, `jest --help`, `playwright --version`). Do NOT assume `--dry-run` is supported. If no `test` script exists, accept presence of `scripts/validate.ts` or `scripts.validate` instead. Abort only if neither test nor validation entrypoint is reachable.
49
+ 6. **Implementation-agent credentials present** — `ANTHROPIC_API_KEY` is required because the skill drives Claude Code subagents for implementation; this is independent of which provider the host project uses for application LLM calls. If your impl-agent backend is intentionally non-Anthropic, document the override in `.claude/autopilot.config.json` and the preflight will accept the configured alternative.
50
+ 7. **Migration tool reachable** (if `data/deltas/` exists) — `/migrate` skill or `supabase` CLI
51
+
52
+ If any preflight check fails, abort with the specific check and remediation hint. Do NOT proceed and discover the failure mid-pipeline.
53
+
54
+ **Risk-tier confirmation is preflight, not mid-pipeline.** If the entry path is an approved spec and the spec's frontmatter is missing `risk:`, run the keyword auto-escalation rule (see below) and surface the inferred tier as a single preflight prompt BEFORE Step 1 begins. This is the same pre-pipeline pause class as "ask for the spec path"; it is not a mid-pipeline check-in.
55
+
56
+ ## CRITICAL invariant — Do Not Pause (after spec approval)
57
+
58
+ There are exactly TWO allowed pre-pipeline pause classes, both occurring BEFORE Step 1:
59
+
60
+ 1. **Step 0 brainstorming user input** — the user picks an approach in substep 1 and explicitly approves the final spec at the end. These are intrinsic to brainstorming, not mid-pipeline check-ins.
61
+ 2. **Preflight prompts** — missing entry inputs (spec path or idea), and risk-tier confirmation when frontmatter is missing `risk:`. These resolve before Step 1 starts.
62
+
63
+ After Step 1 begins, do NOT:
17
64
 
18
- **Run the entire pipeline without stopping.** Do NOT:
19
65
  - Ask "want me to continue?" between steps
20
66
  - Show intermediate results or ask for confirmation
21
67
  - Pause to report progress mid-pipeline
22
68
  - Wait for user input between any steps
23
69
 
24
- The ONLY time you stop is if a step **fails and cannot be recovered**. Otherwise, execute all steps sequentially and report ONCE at the end (Step 9).
70
+ The pipeline halts ONLY for:
71
+ - Unrecoverable step failure (retries exhausted)
72
+ - Missing credentials surfaced mid-run (despite preflight)
73
+ - An **unresolved CRITICAL Codex finding** (see acceptance rules below)
74
+ - An external system hard-block (TCC permission revoked, network outage, etc.)
25
75
 
26
- Brief status lines like `[autopilot] Step 3: Executing plan...` are fine. Full summaries, questions, or check-ins are not.
76
+ Brief status lines like `[autopilot] Step 3: Executing plan...` are fine. Full summaries, questions, or check-ins are not. Report ONCE at the end (Step 9).
27
77
 
28
78
  ## Codex pass policy (risk-tiered)
29
79
 
30
- > Adopted from the v7.4.1 strategic review (see
31
- > `docs/strategy/2026-05-11-codex-pivot.md`, codex finding N2).
32
- >
33
- > The v8 spec pass-2 finding 3 CRITICALs the original spec missed
34
- > (especially sandbox / credential exfiltration) was concrete evidence
35
- > that 1 codex pass is insufficient for security-sensitive architecture.
36
- > But running 3 passes on every CLI polish spec adds latency without
37
- > proportional value.
38
-
39
- **Tier the spec by risk; pass count follows.**
40
-
41
- | Spec risk | Triggers | # of codex passes |
42
- |---|---|---|
43
- | **Low** | CLI UX changes, doc-only PRs, scaffolding extensions, config polish, CI workflow tweaks | **1 pass** (this skill's existing pattern — codex on the committed spec) |
44
- | **Medium** | New execution modes, auth changes, billing flows, data-access patterns, new env vars, API contracts | **2 passes** (1 on the draft spec, 1 on the merged spec after edits) |
45
- | **High** | Sandboxing, multi-tenancy, auto-merge, anything that mutates user repos, new secrets-handling, RPC/SECURITY DEFINER changes | **3 passes** + external review (1 draft, 1 post-edit, 1 on the impl PR diff) |
80
+ > Adopted from the v7.4.1 strategic review. The v8 spec pass-2
81
+ > finding 3 CRITICALs the original spec missed (sandbox / credential
82
+ > exfiltration) was concrete evidence that 1 codex pass is
83
+ > insufficient for security-sensitive architecture. But running 3
84
+ > passes on every CLI polish spec adds latency without value.
46
85
 
47
- **How to apply.** Spec docs declare risk in their frontmatter:
48
-
49
- ```markdown
86
+ **Specs must declare risk in YAML frontmatter:**
87
+ ```yaml
50
88
  ---
51
- title: <topic>
89
+ title: <spec title>
52
90
  risk: low | medium | high
53
91
  ---
54
92
  ```
55
93
 
56
- If the spec's `risk:` is omitted, default to **medium** (safer than
57
- defaulting to low; matches the v8 spec pattern where pass-1 was
58
- clearly insufficient).
94
+ **Pass-count rules:**
59
95
 
60
- The brainstorming skill's per-step codex pass (approach selection,
61
- architecture, components, error handling, implementation prep) is
62
- ALWAYS run it's how we get a draft spec good enough to merge.
63
- This tier policy applies to the **post-brainstorm** passes and to
64
- the codex PR review at Step 7 below.
96
+ | Spec risk | Triggers | Codex passes (idea-entry) | Codex passes (approved-spec-entry) |
97
+ |---|---|---|---|
98
+ | **Low** | CLI UX, doc-only PRs, scaffolding extensions, config polish, CI workflow tweaks | **1 pass** on the committed spec | **1 pass** on the committed spec (Step 1) |
99
+ | **Medium** | New execution modes, auth changes, billing flows, data-access patterns, new env vars, API contracts | **2 passes** (1 draft in Step 0, 1 on merged spec in Step 1) + Step 7 PR review | **2 passes** on the committed spec in Step 1 (back-to-back, with remediation between) + Step 7 PR review |
100
+ | **High** | Sandboxing, multi-tenancy, auto-merge, repo mutation, new secrets handling, RPC/SECURITY DEFINER changes | **3 passes** (1 draft, 1 post-edit in Step 1, 1 pre-impl) + Step 7 PR review | **3 passes** on the committed spec in Step 1 (each followed by remediation) + Step 7 PR review |
65
101
 
66
- **Examples from v7.x:**
102
+ **Entry-path semantics:** When entering with an approved spec (no Step 0 draft pass available), run the full required pass count starting in Step 1, applying remediation and re-running until CRITICALs are clean. Optionally, the spec may declare prior Codex passes in frontmatter (`codex_passes_completed: <n>`) to credit toward the requirement.
67
103
 
68
- * v7.1.7 (setup polish — CLAUDE.md scaffold + .gitignore + dedup): low.
69
- 1 pass on the committed spec. Caught zero CRITICALs in practice.
70
- * v7.4.0 (Python/FastAPI scaffold extension): low. 1 pass. Found
71
- 2 CRITICALs (FastAPI precedence, dangling entrypoint) — both
72
- fixed pre-impl, no PR-pass surprises.
73
- * v7.0 Phase 6 (engine-off removal + middleware revocation): high.
74
- 3 passes (spec, post-edit, PR diff). Each pass surfaced new
75
- trust-boundary issues; without all three the launch would have
76
- shipped with the credential-exfiltration vector C3.
77
- * v8.0 spec (standalone daemon): high. 2 passes so far + needs a
78
- 3rd before any v8 alpha implementation.
104
+ **Backward-compatibility for missing `risk:`:**
79
105
 
80
- ## Pipeline
106
+ If the spec frontmatter is missing `risk:`, default to `medium` AND emit a warning. Auto-escalate to `high` only when the spec content crosses a trust boundary — match on these specific phrases (not the bare word `auth`, which is too broad and conflicts with the medium-risk "auth changes" classification in the table above):
81
107
 
82
- Execute these steps in order. Do NOT pause between steps unless a step fails.
108
+ - `multi-tenant`, `tenancy`, `RLS bypass`, `SECURITY DEFINER`
109
+ - `secret handling`, `credential handling`, `vault`, `token storage`, `session token`
110
+ - `auto-merge`, `automerge`, `sandbox`, `sandboxed`
111
+ - `SSO provisioning`, `SAML assertion`, `OAuth provider integration`, `privilege escalation`
112
+ - `repo mutation`, `force-push`
83
113
 
84
- ### Step 0: Preflight
114
+ The risk-tier confirmation happens during preflight (see "Risk-tier confirmation is preflight, not mid-pipeline" above), not as a separate mid-pipeline pause.
85
115
 
86
- ```bash
87
- npx tsx scripts/preflight.ts
116
+ **Step 0 substep passes are ALWAYS 1 initial pass each, regardless of spec risk.** Brainstorming is draft-stage feedback, not load-bearing security review. The risk-tiered policy applies starting at Step 1 (where the spec is committed). Note that CRITICAL findings in a Step 0 substep still trigger remediation + re-pass per the acceptance rules below — "1 initial pass" means "1 pass to surface findings," not "1 pass total even with unresolved CRITICALs."
117
+
118
+ ## Acceptance rules for Codex findings (CRITICAL — remediation semantics)
119
+
120
+ This is the load-bearing rule. Misreading it can ship vulnerable code.
121
+
122
+ - **CRITICAL findings: must be REMEDIATED, not just acknowledged.** Apply the fix to the spec/plan/code, then re-run the Codex pass. The pipeline MAY NOT proceed while any CRITICAL finding remains unresolved. "Auto-accept" never means "continue past."
123
+ - **WARNING findings:** remediate by default. Skip ONLY if the finding directly contradicts a locked user requirement; in that case, document the skip in the PR description with the reason.
124
+ - **NOTE findings:** discretionary. Roll into a "post-launch follow-ups" appendix if relevant; otherwise ignore.
125
+
126
+ After each Codex pass, present a single-line summary table to the user (severity + title + remediation status). Do NOT pause for "should I incorporate these?" — apply the rules above and continue.
127
+
128
+ ## Tempfile naming (avoid concurrent-session collisions + secure-by-default)
129
+
130
+ Codex passes write to temporary input files. Use a per-run isolated directory with restrictive permissions, NOT a shared `/tmp` path:
131
+
132
+ ```
133
+ <dir> = mkdtemp("${TMPDIR:-/tmp}/claude-autopilot-XXXXXX") # mode 0700
134
+ <file> = $dir/codex-input-<topic-slug>-<step>.md # mode 0600, exclusive create (O_EXCL)
88
135
  ```
89
136
 
90
- If any check **fails** (red ✗): stop and tell the user what to fix before continuing.
91
- If checks only **warn** (yellow !): proceed — degraded steps will be noted in the final report.
92
- If all pass: continue immediately, no user interaction needed.
137
+ The timestamp+pid pattern from earlier drafts is INSUFFICIENT — it is predictable, world-readable in shared `/tmp`, and vulnerable to symlink races. Always use `mkdtemp` (or platform-equivalent) to get a randomized, exclusively-created directory before writing the file.
93
138
 
94
- ### Step 1: Write Implementation Plan
139
+ Example resolved path: `/tmp/claude-autopilot-aB3xQ9/codex-input-v8-daemon-step-arch.md`
95
140
 
141
+ Rules:
142
+ - Use `mkdtemp` (Node: `fs.mkdtempSync(path.join(os.tmpdir(), 'claude-autopilot-'))`) — never a hand-built path
143
+ - Set file mode `0600` on creation; set dir mode `0700`
144
+ - Clean up the entire temp directory in a `finally` block (success OR failure)
145
+ - Specs may contain sensitive architecture details (tenant/RLS behavior, auth flows, schema names, internal endpoints). The "no secrets" rule extends to "no production-sensitive design content in tempfiles you'd be uncomfortable surfacing"; paraphrase such content before writing
146
+ - Never write secrets, API keys, or production credentials to tempfiles regardless of location
147
+
148
+ ## Step 0: Brainstorming with per-step Codex validation
149
+
150
+ **Skip this step entirely if a spec already exists.** Otherwise:
151
+
152
+ Drive `superpowers:brainstorming` from the user's idea. **At each substep below, automatically run `/codex-review` and incorporate findings before moving on.** One pass per substep. Do not wait to be prompted.
153
+
154
+ Codex-validate after each of these brainstorming substeps:
155
+
156
+ 1. **Approach selection** — after presenting 2–3 approaches and the user picks one, write the chosen approach + rejected alternatives to a tempfile (per pattern above) and run the resolved codex-review command (from preflight, typically `npx tsx scripts/codex-review.ts <tempfile>` for project installs, or the package CLI fallback). Apply CRITICAL findings before proceeding (remediate, then re-pass if needed).
157
+ 2. **Architecture section** — after presenting the top-level architecture (boxes/arrows + key principles), Codex-validate; remediate CRITICALs.
158
+ 3. **Components + data flow section** — after detailing components, schemas, and data flow, Codex-validate; remediate CRITICALs.
159
+ 4. **Error handling + testing section** — after specifying failure modes and test strategy, Codex-validate; remediate CRITICALs.
160
+ 5. **Prepare final spec draft** — once the spec doc is written and self-reviewed, capture WARNINGs/NOTEs into a "post-launch follow-ups" appendix. (The load-bearing final spec validation happens in Step 1, NOT here — Step 0 produces a draft ready for the risk-tiered pass.)
161
+
162
+ Each substep uses ONE initial Codex pass for fast design feedback. Additional revalidation passes are required ONLY when CRITICAL findings are remediated (per the acceptance rules above) — "one initial pass" is not a cap on re-passes after remediation.
163
+
164
+ **Exit Step 0 with:**
165
+ - Committed spec at `docs/superpowers/specs/YYYY-MM-DD-<topic>-design.md`
166
+ - Spec includes `risk: low|medium|high` in frontmatter (or accept the default-medium per the backward-compat rule)
167
+ - User has explicitly approved the spec (this is the ONLY allowed pause in the entire pipeline)
168
+
169
+ ## Pipeline
170
+
171
+ Execute these steps in order. Do NOT pause between steps unless a step fails per the acceptance rules.
172
+
173
+ ### Step 1: Risk-tiered final spec validation + write implementation plan
174
+
175
+ **First — risk-tiered pass on the committed spec** per the policy table above:
176
+ - Low risk: 1 pass on the spec
177
+ - Medium risk: 2 passes (this one + Step 7 codex PR review serves as pass 2)
178
+ - High risk: 3 passes (this + Step 7 + an explicit pre-implementation pass)
179
+
180
+ Remediate CRITICALs in-place on the spec before moving on.
181
+
182
+ **Then write the plan:**
96
183
  ```
97
184
  Invoke: superpowers:writing-plans
98
- Input: The approved spec file
185
+ Input: The approved + validated spec
99
186
  Output: Plan at docs/superpowers/plans/YYYY-MM-DD-<topic>.md
100
187
  ```
101
188
 
102
- Commit the plan. Do NOT ask the user for execution choice always use subagent-driven development.
189
+ After the plan is written but BEFORE committing it, run `npx tsx scripts/codex-review.ts <plan-path>`. Apply CRITICAL findings (sequencing errors, missing test coverage on a load-bearing path, schema/migration ordering bugs) to the plan inline. Then commit. Always use subagent-driven development for execution — do not ask the user.
103
190
 
104
- ### Step 2: Set Up Worktree
191
+ ### Step 2: Set up worktree
105
192
 
106
193
  ```
107
194
  Invoke: superpowers:using-git-worktrees
108
195
  Branch: feature/<topic-slug>
109
196
  ```
110
197
 
111
- After the worktree is created, symlink the local env file into it so scripts
112
- (validate, Codex review, migrate) can read secrets:
113
-
114
- ```bash
115
- # Detect which env file the project uses
116
- ENV_FILE=$(ls .env.local .env.dev .env.development .env 2>/dev/null | head -1)
117
- if [ -n "$ENV_FILE" ]; then
118
- ln -sf "$(pwd)/$ENV_FILE" ".claude/worktrees/<branch>/$ENV_FILE"
119
- fi
120
- ```
121
-
122
- If no env file is found, note it in the preflight output (step 0 will have caught this).
123
-
124
- ### Step 3: Execute Plan
198
+ ### Step 3: Execute plan
125
199
 
126
200
  ```
127
201
  Invoke: superpowers:subagent-driven-development
@@ -135,74 +209,58 @@ For each task:
135
209
  - Skip formal spec/quality review to maintain speed (the validate step catches issues)
136
210
  - If subagent fails to write to worktree: implement directly
137
211
 
138
- ### Step 4: Migrate
139
-
140
- After implementation creates schema changes, the autopilot pipeline runs the migrate phase via the canonical dispatcher contract. The dispatcher:
141
-
142
- 1. Reads `.autopilot/stack.md` → looks up `migrate.skill` (default: `migrate@1`)
143
- 2. Resolves the skill via the alias map (path-escape protected)
144
- 3. Performs version handshake (skill manifest must declare a runtime range that includes the current `claude-autopilot` version, and an API version major matching the envelope contract)
145
- 4. Builds an invocation envelope (`{ contractVersion, invocationId, nonce, env, repoRoot, changedFiles, gitBase, gitHead, ci, ... }`) passed to the skill via env vars + stdin
146
- 5. Enforces policy (4-flag CI prod gate + clean-git + manual-approval + dry-run-first)
147
- 6. Executes the configured command via `spawn(shell:false)` — structured argv only, no shell injection
148
- 7. Parses the result artifact (file-first, nonce-bound stdout fallback only if skill manifest opts in)
149
- 8. Writes a hash-chained audit log entry to `.autopilot/audit.log`
150
- 9. Branches on `nextActions[]` from the result (e.g. `regenerate-types` triggers `npm run typecheck`)
151
-
152
- For the autopilot pipeline, this is invoked once per migration affected by the implementation changes.
212
+ ### Step 4: Auto-migrate
153
213
 
154
- #### CI prod safety floor
214
+ For any `.sql` files created in `data/deltas/` during implementation:
155
215
 
156
- Running `--env prod` from CI requires **all four** of these (skills cannot relax):
157
- 1. `--yes` flag explicit
158
- 2. `AUTOPILOT_CI_POLICY=allow-prod` env var
159
- 3. `AUTOPILOT_TARGET_ENV=prod` env var (must match `--env`)
160
- 4. `migrate.policy.allow_prod_in_ci: true` in stack.md
161
-
162
- Plus a recognized CI provider env (GitHub Actions / GitLab CI / CircleCI / Buildkite / Jenkins) — or an explicit `AUTOPILOT_CI_PROVIDER=<name>` override for self-hosted CI.
163
-
164
- #### Configuration source of truth
216
+ ```bash
217
+ /migrate
218
+ ```
165
219
 
166
- All migrate behavior is configured in `.autopilot/stack.md`. The autopilot skill never invokes a specific runner directly; it always dispatches through `claude-autopilot dispatch`. See:
167
- - `docs/superpowers/specs/2026-04-29-migrate-skill-generalization-design.md` — full spec
168
- - `docs/skills/rich-migrate-contract.md` — envelope + result artifact contract for skill authors
169
- - `docs/skills/version-compatibility.md` — runtime/skill version compatibility matrix
170
- - `presets/schemas/migrate.schema.json` — JSON Schema for the stack.md migrate block
220
+ Run against dev QA prod with auto-promote. If migration fails, fix the SQL and retry.
171
221
 
172
222
  ### Step 5: Validate
173
223
 
224
+ Run both checks in order:
225
+
174
226
  ```bash
227
+ # 1. Static rules + LLM review on changed files
228
+ npx autopilot run --base main
229
+
230
+ # 2. Full project validation (autofix, tests, codex, gate)
175
231
  npx tsx scripts/validate.ts --commit-autofix --allow-dirty
176
232
  ```
177
233
 
178
- If FAIL:
179
- - Read the validation report at `.claude/validation-report.json`
234
+ The `validate.ts` Phase 1 includes a **tsc regression check**: it runs `npx tsc --noEmit` against both the PR and the merge-base (cached at `.claude/.tsc-baseline-cache.json`) and surfaces only files where the PR introduces *new* TypeScript errors versus the baseline. Forward-pressure check — type errors are warnings, not blockers.
235
+
236
+ If either FAIL:
237
+ - Read findings / validation report at `.claude/validation-report.json`
180
238
  - Fix the blocking issues
181
- - Re-run validate
239
+ - Re-run the failing check
182
240
  - Max 3 retry iterations
183
241
 
184
- If PASS: proceed to PR.
242
+ If both PASS: proceed to PR.
185
243
 
186
- ### Step 6: Push + Create PR
244
+ ### Step 6: Push + create PR
187
245
 
188
246
  ```bash
189
247
  git push -u origin <branch>
190
248
  gh pr create --title "<concise title>" --body "<generated PR body with spec link, test plan>"
191
249
  ```
192
250
 
193
- ### Step 7: Codex PR Review
251
+ ### Step 7: Codex PR review
194
252
 
195
253
  ```bash
196
254
  npx tsx scripts/codex-pr-review.ts <pr-number>
197
255
  ```
198
256
 
199
- Posts Codex review as a GitHub PR comment. If critical findings:
200
- - Fix them on the branch
257
+ Posts Codex review as a GitHub PR comment. **This serves as the second risk-tiered pass for medium-risk specs and the third pass for high-risk specs.** Remediate CRITICAL findings:
258
+ - Fix on the branch
201
259
  - Push
202
260
  - Re-run Codex review
203
261
  - Max 2 iterations
204
262
 
205
- ### Step 8: Bugbot Triage + Fix
263
+ ### Step 8: Bugbot triage + fix
206
264
 
207
265
  Wait 60 seconds for Cursor bugbot to post comments, then:
208
266
 
@@ -222,21 +280,24 @@ Tell the user:
222
280
  - PR URL
223
281
  - Test count
224
282
  - Validation verdict
225
- - Codex review summary
283
+ - Codex review summary (passes run, CRITICALs remediated, WARNINGs skipped + reason)
226
284
  - Bugbot triage summary (fixed / dismissed / needs-human)
227
- - Any human-required items that couldn't be auto-fixed
285
+ - Any human-required items that couldn't be auto-resolved
228
286
 
229
287
  ## Error Recovery
230
288
 
231
- - **Subagent failure:** Re-dispatch with more context or implement directly
232
- - **Migration failure:** Fix the migration source (per the configured `migrate.skill` in stack.md), re-dispatch via `claude-autopilot dispatch migrate`
233
- - **Validate failure:** Fix issues, re-run (max 3 retries)
234
- - **Codex critical findings:** Fix, push, re-review (max 2 retries)
235
- - **Bugbot findings:** /bugbot handles triage + fix automatically (max 3 rounds)
236
- - **Unrecoverable error:** Stop, report what was completed, show remaining work
289
+ - **Preflight failure:** Surface the specific check, exit. Do not partially run.
290
+ - **Missing skill/credential:** Exit with install/auth hint.
291
+ - **Subagent failure:** Re-dispatch with more context or implement directly.
292
+ - **Migration failure:** Fix SQL, re-run `/migrate`.
293
+ - **Validate failure:** Fix issues, re-run (max 3 retries).
294
+ - **Codex CRITICAL findings:** REMEDIATE (apply fix), push, re-review (max 2 retries). Do NOT continue past unremediated CRITICALs.
295
+ - **Bugbot findings:** `/bugbot` handles triage + fix automatically (max 3 rounds).
296
+ - **External hard-block** (TCC, network, etc.): Stop, report what was completed, surface the blocker.
237
297
 
238
- ## When NOT to Use
298
+ ## When NOT to use
239
299
 
240
- - During brainstorming (this runs AFTER spec approval)
300
+ - During brainstorming if you haven't approved the spec yet (this skill runs AFTER spec approval)
241
301
  - For hotfixes (too heavy — just commit and push)
242
- - When the user wants manual control over each step
302
+ - When the user wants manual control over each step (use individual phase skills instead)
303
+ - When required credentials/dependencies are missing and you don't want to be told (preflight will catch them)