@delegance/claude-autopilot 7.8.0 → 7.9.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +21 -0
- package/README.md +3 -6
- package/package.json +1 -1
- package/skills/autopilot/SKILL.md +243 -125
package/CHANGELOG.md
CHANGED
|
@@ -2,6 +2,27 @@
|
|
|
2
2
|
|
|
3
3
|
- v5.6 Phase 7 (docs reconciliation) — pending.
|
|
4
4
|
|
|
5
|
+
## 7.9.1 — 2026-05-13 (correctness hotfix)
|
|
6
|
+
|
|
7
|
+
### Fixed
|
|
8
|
+
- **`skills/autopilot/SKILL.md` ran migrate BEFORE validate.** On stacks that auto-promote (Supabase-script-specific), this could leave production with new schema and no working code if validate or PR review later failed. Resequenced: validate is now Step 4, migrate-dev is Step 5. PR + Codex + bugbot follow. Production migration is explicitly handed off to the user's CI/CD pipeline.
|
|
9
|
+
- **Removed misleading "dev → QA → prod auto-promote" claim.** That behavior is Supabase-stack-specific, not a generic CLI capability. The skill now references the four real `migrate.policy` keys (`allow_prod_in_ci`, `require_clean_git`, `require_manual_approval`, `require_dry_run_first`) and explains how to wire them in `.autopilot/stack.md`.
|
|
10
|
+
|
|
11
|
+
### Out of scope (filed as v7.10.0 + v7.11.0 candidates)
|
|
12
|
+
- Expand/contract migration classification (additive vs destructive enforcement)
|
|
13
|
+
- v6 run-state engine integration into the autopilot skill (4,873 LOC of checkpoint/resume infra currently unused by the skill)
|
|
14
|
+
- Retry-loop sameness detector ("same fingerprint twice → escalate to human")
|
|
15
|
+
|
|
16
|
+
## 7.9.0 — 2026-05-12
|
|
17
|
+
|
|
18
|
+
### Changed
|
|
19
|
+
- `skills/autopilot/SKILL.md` rewrite: merge "idea → spec" Step 0 (Step 0 brainstorming with per-step Codex validation) with the risk-tiered codex pass policy from v7.8.0. Adds entry decision tree (idea vs spec), operational preflight (gh auth + push-permission check, strict worktree-clean gate, Codex CLI resolution with package-fallback, portable test-runner detection), tightened CRITICAL-finding remediation semantics ("must remediate, not just acknowledge"), missing-`risk:` frontmatter backward-compat (default medium + keyword auto-escalation to high, resolved at preflight not mid-pipeline), entry-path-aware risk-tier pass counts (idea-entry vs approved-spec-entry), and `mkdtemp`-based secure tempfile handling (0700 dir + 0600 file + cleanup in finally) to replace the predictable timestamp/pid pattern. Net 242→292 lines.
|
|
20
|
+
|
|
21
|
+
### Documentation
|
|
22
|
+
- Each pipeline step now states its risk-tier behavior explicitly; Step 0 brainstorming substeps use 1 codex pass each, Step 1 onward uses the risk-tiered policy.
|
|
23
|
+
|
|
24
|
+
No CLI/code changes. Skill content only.
|
|
25
|
+
|
|
5
26
|
## 7.8.0 — 2026-05-11
|
|
6
27
|
|
|
7
28
|
### Changed
|
package/README.md
CHANGED
|
@@ -6,14 +6,11 @@
|
|
|
6
6
|
|
|
7
7
|
**Open source, MIT-licensed, runs on your machine with your API keys.** No hosted agent, no per-seat subscription — `npm install -g @delegance/claude-autopilot@latest` and you're done.
|
|
8
8
|
|
|
9
|
-
## Hosted
|
|
9
|
+
## Hosted dashboard (early access)
|
|
10
10
|
|
|
11
|
-
A hosted dashboard
|
|
11
|
+
A hosted dashboard for team-wide run history, cost roll-up, and member management is in design-partner phase — not yet open for self-serve signup. The CLI is and stays fully usable without it.
|
|
12
12
|
|
|
13
|
-
|
|
14
|
-
- **Dashboard ([autopilot.dev](https://autopilot.dev))** — opt-in. After `claude-autopilot dashboard login` mints a personal API key via a loopback OAuth flow, every engine-on autopilot run uploads its `state.json` + `events.ndjson` + cost roll-up at run.complete. Org admin (members, billing, SSO) lives there.
|
|
15
|
-
|
|
16
|
-
The CLI works offline; the dashboard is purely additive. See [docs/v7/runbook.md](./docs/v7/runbook.md) for operating the hosted product and [docs/v7/breaking-changes.md](./docs/v7/breaking-changes.md) for the v6→v7 migration checklist.
|
|
13
|
+
If you're interested in early access, open an issue or email alex@delegance.com. Otherwise the rest of this README covers everything you need to run the CLI locally with your own API keys.
|
|
17
14
|
|
|
18
15
|
```bash
|
|
19
16
|
claude-autopilot brainstorm "add SSO with SAML for enterprise tenants"
|
package/package.json
CHANGED
|
@@ -1,127 +1,201 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: autopilot
|
|
3
|
-
description:
|
|
3
|
+
description: End-to-end pipeline — brainstorm → spec → plan → implement → validate → migrate dev → PR → Codex review → bugbot → merge. Risk-tiered. No manual intervention after spec approval until PR merge; production deploy/migration gates are handled by the user's CI/CD pipeline.
|
|
4
4
|
---
|
|
5
5
|
|
|
6
|
-
# Autopilot —
|
|
6
|
+
# Autopilot — Idea to Merged PR Pipeline
|
|
7
7
|
|
|
8
|
-
|
|
8
|
+
Drives the full flow from raw user idea (or an existing spec) through merged PR. The ONLY pause is explicit user approval of the spec after Step 0; everything after that runs unattended unless blocked by an unrecoverable failure, missing credentials, or an unresolved CRITICAL Codex finding.
|
|
9
9
|
|
|
10
|
-
##
|
|
10
|
+
## Entry decision tree
|
|
11
11
|
|
|
12
|
-
|
|
13
|
-
- Superpowers plugin installed (`writing-plans`, `using-git-worktrees`, `subagent-driven-development`)
|
|
14
|
-
- Scripts installed and dependencies present (run step 0 preflight to verify)
|
|
12
|
+
Pick the entry point ONCE at the start:
|
|
15
13
|
|
|
16
|
-
|
|
14
|
+
- **Approved spec path provided** (e.g. `docs/superpowers/specs/YYYY-MM-DD-<topic>-design.md`) → skip Step 0, jump to Step 1.
|
|
15
|
+
- **Only an idea provided** (no spec) → run Step 0 to brainstorm + spec it.
|
|
16
|
+
- **Neither** → ask the user once for either the spec path or the idea. This is the only allowed pre-pipeline pause.
|
|
17
|
+
|
|
18
|
+
## Prerequisites (hard-gate)
|
|
19
|
+
|
|
20
|
+
The pipeline ABORTS with a clear actionable error if any of these are missing:
|
|
21
|
+
|
|
22
|
+
- **Required skills:** `superpowers:brainstorming`, `superpowers:writing-plans`, `superpowers:subagent-driven-development`, `superpowers:using-git-worktrees` (for Step 2)
|
|
23
|
+
- **Required project scripts:** `scripts/codex-review.ts`, `scripts/codex-pr-review.ts`, `scripts/bugbot.ts`, `scripts/validate.ts` — OR equivalent CLI verbs from `@delegance/claude-autopilot` if running the package version
|
|
24
|
+
- **Required env:** `OPENAI_API_KEY` (for Codex passes), `GITHUB_TOKEN` or `gh auth status` (for PR creation), `ANTHROPIC_API_KEY` (for impl agents)
|
|
25
|
+
|
|
26
|
+
If any superpowers skill is missing, print:
|
|
27
|
+
```
|
|
28
|
+
Autopilot requires superpowers plugin. Install with: claude plugin install superpowers
|
|
29
|
+
```
|
|
30
|
+
and exit. Do NOT half-run with a missing dependency.
|
|
31
|
+
|
|
32
|
+
## Operational preflight (run before Step 0/1)
|
|
33
|
+
|
|
34
|
+
Each numbered check below is a HARD GATE. ALL must pass before any LLM call. There are no "OR" conditions — every condition listed inside a check must hold.
|
|
35
|
+
|
|
36
|
+
1. **Branch is NOT `main`/`master`** — independent hard gate. Run `git rev-parse --abbrev-ref HEAD`; if the result is `main` or `master`, ABORT immediately. The skill never commits or mutates `main`/`master` directly. Resolution: create a feature branch or worktree first.
|
|
37
|
+
2. **Working tree is clean (tracked AND untracked)** — `git diff --quiet` AND `git diff --cached --quiet` must both succeed. Run `git status --porcelain`; abort if any untracked files are present unless the user explicitly opted in via `--allow-untracked`.
|
|
38
|
+
3. **GitHub auth resolved** — accept ANY of:
|
|
39
|
+
- `gh auth status` succeeds (interactive CLI login), OR
|
|
40
|
+
- `GITHUB_TOKEN`/`GH_TOKEN` is set AND `gh api user` (using that token) returns 200.
|
|
41
|
+
|
|
42
|
+
Then verify push permission for the target repo: `gh repo view <owner>/<repo>` succeeds AND either (a) `gh api repos/<owner>/<repo>/collaborators/<user>/permission` returns `admin`/`maintain`/`write`, or (b) a no-op `git push --dry-run origin HEAD:refs/heads/<probe-branch>` succeeds. Abort if no push permission.
|
|
43
|
+
4. **Codex CLI resolution** — Resolve the codex-review command ONCE here and cache the resolved invocation for the rest of the pipeline. Try in order:
|
|
44
|
+
- `npx tsx scripts/codex-review.ts --help` (project-local script), OR
|
|
45
|
+
- `npx @delegance/claude-autopilot codex-review --help` (package CLI, also exposes `codex-pr-review`, `bugbot`, `validate`).
|
|
46
|
+
|
|
47
|
+
Use whichever resolves first; cache the resolved command. Abort if neither does. The accepted package CLI verbs are: `codex-review`, `codex-pr-review`, `bugbot`, `validate` — pinned to the major version of `@delegance/claude-autopilot` declared in the host project's `package.json`.
|
|
48
|
+
5. **Test runner reachable** — Detect from `package.json`: if `scripts.test` is defined, run `npm run test -- --help` or framework-specific probe (`vitest --version`, `jest --help`, `playwright --version`). Do NOT assume `--dry-run` is supported. If no `test` script exists, accept presence of `scripts/validate.ts` or `scripts.validate` instead. Abort only if neither test nor validation entrypoint is reachable.
|
|
49
|
+
6. **Implementation-agent credentials present** — `ANTHROPIC_API_KEY` is required because the skill drives Claude Code subagents for implementation; this is independent of which provider the host project uses for application LLM calls. If your impl-agent backend is intentionally non-Anthropic, document the override in `.claude/autopilot.config.json` and the preflight will accept the configured alternative.
|
|
50
|
+
7. **Migration tool reachable** (if `data/deltas/` exists) — `/migrate` skill or `supabase` CLI
|
|
51
|
+
|
|
52
|
+
If any preflight check fails, abort with the specific check and remediation hint. Do NOT proceed and discover the failure mid-pipeline.
|
|
53
|
+
|
|
54
|
+
**Risk-tier confirmation is preflight, not mid-pipeline.** If the entry path is an approved spec and the spec's frontmatter is missing `risk:`, run the keyword auto-escalation rule (see below) and surface the inferred tier as a single preflight prompt BEFORE Step 1 begins. This is the same pre-pipeline pause class as "ask for the spec path"; it is not a mid-pipeline check-in.
|
|
55
|
+
|
|
56
|
+
## CRITICAL invariant — Do Not Pause (after spec approval)
|
|
57
|
+
|
|
58
|
+
There are exactly TWO allowed pre-pipeline pause classes, both occurring BEFORE Step 1:
|
|
59
|
+
|
|
60
|
+
1. **Step 0 brainstorming user input** — the user picks an approach in substep 1 and explicitly approves the final spec at the end. These are intrinsic to brainstorming, not mid-pipeline check-ins.
|
|
61
|
+
2. **Preflight prompts** — missing entry inputs (spec path or idea), and risk-tier confirmation when frontmatter is missing `risk:`. These resolve before Step 1 starts.
|
|
62
|
+
|
|
63
|
+
After Step 1 begins, do NOT:
|
|
17
64
|
|
|
18
|
-
**Run the entire pipeline without stopping.** Do NOT:
|
|
19
65
|
- Ask "want me to continue?" between steps
|
|
20
66
|
- Show intermediate results or ask for confirmation
|
|
21
67
|
- Pause to report progress mid-pipeline
|
|
22
68
|
- Wait for user input between any steps
|
|
23
69
|
|
|
24
|
-
The
|
|
70
|
+
The pipeline halts ONLY for:
|
|
71
|
+
- Unrecoverable step failure (retries exhausted)
|
|
72
|
+
- Missing credentials surfaced mid-run (despite preflight)
|
|
73
|
+
- An **unresolved CRITICAL Codex finding** (see acceptance rules below)
|
|
74
|
+
- An external system hard-block (TCC permission revoked, network outage, etc.)
|
|
25
75
|
|
|
26
|
-
Brief status lines like `[autopilot] Step 3: Executing plan...` are fine. Full summaries, questions, or check-ins are not.
|
|
76
|
+
Brief status lines like `[autopilot] Step 3: Executing plan...` are fine. Full summaries, questions, or check-ins are not. Report ONCE at the end (Step 9).
|
|
27
77
|
|
|
28
78
|
## Codex pass policy (risk-tiered)
|
|
29
79
|
|
|
30
|
-
> Adopted from the v7.4.1 strategic review
|
|
31
|
-
>
|
|
32
|
-
>
|
|
33
|
-
>
|
|
34
|
-
>
|
|
35
|
-
> that 1 codex pass is insufficient for security-sensitive architecture.
|
|
36
|
-
> But running 3 passes on every CLI polish spec adds latency without
|
|
37
|
-
> proportional value.
|
|
38
|
-
|
|
39
|
-
**Tier the spec by risk; pass count follows.**
|
|
80
|
+
> Adopted from the v7.4.1 strategic review. The v8 spec pass-2
|
|
81
|
+
> finding 3 CRITICALs the original spec missed (sandbox / credential
|
|
82
|
+
> exfiltration) was concrete evidence that 1 codex pass is
|
|
83
|
+
> insufficient for security-sensitive architecture. But running 3
|
|
84
|
+
> passes on every CLI polish spec adds latency without value.
|
|
40
85
|
|
|
41
|
-
|
|
42
|
-
|
|
43
|
-
| **Low** | CLI UX changes, doc-only PRs, scaffolding extensions, config polish, CI workflow tweaks | **1 pass** (this skill's existing pattern — codex on the committed spec) |
|
|
44
|
-
| **Medium** | New execution modes, auth changes, billing flows, data-access patterns, new env vars, API contracts | **2 passes** (1 on the draft spec, 1 on the merged spec after edits) |
|
|
45
|
-
| **High** | Sandboxing, multi-tenancy, auto-merge, anything that mutates user repos, new secrets-handling, RPC/SECURITY DEFINER changes | **3 passes** + external review (1 draft, 1 post-edit, 1 on the impl PR diff) |
|
|
46
|
-
|
|
47
|
-
**How to apply.** Spec docs declare risk in their frontmatter:
|
|
48
|
-
|
|
49
|
-
```markdown
|
|
86
|
+
**Specs must declare risk in YAML frontmatter:**
|
|
87
|
+
```yaml
|
|
50
88
|
---
|
|
51
|
-
title: <
|
|
89
|
+
title: <spec title>
|
|
52
90
|
risk: low | medium | high
|
|
53
91
|
---
|
|
54
92
|
```
|
|
55
93
|
|
|
56
|
-
|
|
57
|
-
defaulting to low; matches the v8 spec pattern where pass-1 was
|
|
58
|
-
clearly insufficient).
|
|
94
|
+
**Pass-count rules:**
|
|
59
95
|
|
|
60
|
-
|
|
61
|
-
|
|
62
|
-
|
|
63
|
-
|
|
64
|
-
|
|
96
|
+
| Spec risk | Triggers | Codex passes (idea-entry) | Codex passes (approved-spec-entry) |
|
|
97
|
+
|---|---|---|---|
|
|
98
|
+
| **Low** | CLI UX, doc-only PRs, scaffolding extensions, config polish, CI workflow tweaks | **1 pass** on the committed spec | **1 pass** on the committed spec (Step 1) |
|
|
99
|
+
| **Medium** | New execution modes, auth changes, billing flows, data-access patterns, new env vars, API contracts | **2 passes** (1 draft in Step 0, 1 on merged spec in Step 1) + Step 7 PR review | **2 passes** on the committed spec in Step 1 (back-to-back, with remediation between) + Step 7 PR review |
|
|
100
|
+
| **High** | Sandboxing, multi-tenancy, auto-merge, repo mutation, new secrets handling, RPC/SECURITY DEFINER changes | **3 passes** (1 draft, 1 post-edit in Step 1, 1 pre-impl) + Step 7 PR review | **3 passes** on the committed spec in Step 1 (each followed by remediation) + Step 7 PR review |
|
|
65
101
|
|
|
66
|
-
**
|
|
102
|
+
**Entry-path semantics:** When entering with an approved spec (no Step 0 draft pass available), run the full required pass count starting in Step 1, applying remediation and re-running until CRITICALs are clean. Optionally, the spec may declare prior Codex passes in frontmatter (`codex_passes_completed: <n>`) to credit toward the requirement.
|
|
67
103
|
|
|
68
|
-
|
|
69
|
-
1 pass on the committed spec. Caught zero CRITICALs in practice.
|
|
70
|
-
* v7.4.0 (Python/FastAPI scaffold extension): low. 1 pass. Found
|
|
71
|
-
2 CRITICALs (FastAPI precedence, dangling entrypoint) — both
|
|
72
|
-
fixed pre-impl, no PR-pass surprises.
|
|
73
|
-
* v7.0 Phase 6 (engine-off removal + middleware revocation): high.
|
|
74
|
-
3 passes (spec, post-edit, PR diff). Each pass surfaced new
|
|
75
|
-
trust-boundary issues; without all three the launch would have
|
|
76
|
-
shipped with the credential-exfiltration vector C3.
|
|
77
|
-
* v8.0 spec (standalone daemon): high. 2 passes so far + needs a
|
|
78
|
-
3rd before any v8 alpha implementation.
|
|
104
|
+
**Backward-compatibility for missing `risk:`:**
|
|
79
105
|
|
|
80
|
-
|
|
106
|
+
If the spec frontmatter is missing `risk:`, default to `medium` AND emit a warning. Auto-escalate to `high` only when the spec content crosses a trust boundary — match on these specific phrases (not the bare word `auth`, which is too broad and conflicts with the medium-risk "auth changes" classification in the table above):
|
|
81
107
|
|
|
82
|
-
|
|
108
|
+
- `multi-tenant`, `tenancy`, `RLS bypass`, `SECURITY DEFINER`
|
|
109
|
+
- `secret handling`, `credential handling`, `vault`, `token storage`, `session token`
|
|
110
|
+
- `auto-merge`, `automerge`, `sandbox`, `sandboxed`
|
|
111
|
+
- `SSO provisioning`, `SAML assertion`, `OAuth provider integration`, `privilege escalation`
|
|
112
|
+
- `repo mutation`, `force-push`
|
|
83
113
|
|
|
84
|
-
|
|
114
|
+
The risk-tier confirmation happens during preflight (see "Risk-tier confirmation is preflight, not mid-pipeline" above), not as a separate mid-pipeline pause.
|
|
115
|
+
|
|
116
|
+
**Step 0 substep passes are ALWAYS 1 initial pass each, regardless of spec risk.** Brainstorming is draft-stage feedback, not load-bearing security review. The risk-tiered policy applies starting at Step 1 (where the spec is committed). Note that CRITICAL findings in a Step 0 substep still trigger remediation + re-pass per the acceptance rules below — "1 initial pass" means "1 pass to surface findings," not "1 pass total even with unresolved CRITICALs."
|
|
117
|
+
|
|
118
|
+
## Acceptance rules for Codex findings (CRITICAL — remediation semantics)
|
|
119
|
+
|
|
120
|
+
This is the load-bearing rule. Misreading it can ship vulnerable code.
|
|
121
|
+
|
|
122
|
+
- **CRITICAL findings: must be REMEDIATED, not just acknowledged.** Apply the fix to the spec/plan/code, then re-run the Codex pass. The pipeline MAY NOT proceed while any CRITICAL finding remains unresolved. "Auto-accept" never means "continue past."
|
|
123
|
+
- **WARNING findings:** remediate by default. Skip ONLY if the finding directly contradicts a locked user requirement; in that case, document the skip in the PR description with the reason.
|
|
124
|
+
- **NOTE findings:** discretionary. Roll into a "post-launch follow-ups" appendix if relevant; otherwise ignore.
|
|
125
|
+
|
|
126
|
+
After each Codex pass, present a single-line summary table to the user (severity + title + remediation status). Do NOT pause for "should I incorporate these?" — apply the rules above and continue.
|
|
127
|
+
|
|
128
|
+
## Tempfile naming (avoid concurrent-session collisions + secure-by-default)
|
|
129
|
+
|
|
130
|
+
Codex passes write to temporary input files. Use a per-run isolated directory with restrictive permissions, NOT a shared `/tmp` path:
|
|
85
131
|
|
|
86
|
-
```bash
|
|
87
|
-
npx tsx scripts/preflight.ts
|
|
88
132
|
```
|
|
133
|
+
<dir> = mkdtemp("${TMPDIR:-/tmp}/claude-autopilot-XXXXXX") # mode 0700
|
|
134
|
+
<file> = $dir/codex-input-<topic-slug>-<step>.md # mode 0600, exclusive create (O_EXCL)
|
|
135
|
+
```
|
|
136
|
+
|
|
137
|
+
The timestamp+pid pattern from earlier drafts is INSUFFICIENT — it is predictable, world-readable in shared `/tmp`, and vulnerable to symlink races. Always use `mkdtemp` (or platform-equivalent) to get a randomized, exclusively-created directory before writing the file.
|
|
138
|
+
|
|
139
|
+
Example resolved path: `/tmp/claude-autopilot-aB3xQ9/codex-input-v8-daemon-step-arch.md`
|
|
140
|
+
|
|
141
|
+
Rules:
|
|
142
|
+
- Use `mkdtemp` (Node: `fs.mkdtempSync(path.join(os.tmpdir(), 'claude-autopilot-'))`) — never a hand-built path
|
|
143
|
+
- Set file mode `0600` on creation; set dir mode `0700`
|
|
144
|
+
- Clean up the entire temp directory in a `finally` block (success OR failure)
|
|
145
|
+
- Specs may contain sensitive architecture details (tenant/RLS behavior, auth flows, schema names, internal endpoints). The "no secrets" rule extends to "no production-sensitive design content in tempfiles you'd be uncomfortable surfacing"; paraphrase such content before writing
|
|
146
|
+
- Never write secrets, API keys, or production credentials to tempfiles regardless of location
|
|
147
|
+
|
|
148
|
+
## Step 0: Brainstorming with per-step Codex validation
|
|
149
|
+
|
|
150
|
+
**Skip this step entirely if a spec already exists.** Otherwise:
|
|
89
151
|
|
|
90
|
-
|
|
91
|
-
If checks only **warn** (yellow !): proceed — degraded steps will be noted in the final report.
|
|
92
|
-
If all pass: continue immediately, no user interaction needed.
|
|
152
|
+
Drive `superpowers:brainstorming` from the user's idea. **At each substep below, automatically run `/codex-review` and incorporate findings before moving on.** One pass per substep. Do not wait to be prompted.
|
|
93
153
|
|
|
94
|
-
|
|
154
|
+
Codex-validate after each of these brainstorming substeps:
|
|
95
155
|
|
|
156
|
+
1. **Approach selection** — after presenting 2–3 approaches and the user picks one, write the chosen approach + rejected alternatives to a tempfile (per pattern above) and run the resolved codex-review command (from preflight, typically `npx tsx scripts/codex-review.ts <tempfile>` for project installs, or the package CLI fallback). Apply CRITICAL findings before proceeding (remediate, then re-pass if needed).
|
|
157
|
+
2. **Architecture section** — after presenting the top-level architecture (boxes/arrows + key principles), Codex-validate; remediate CRITICALs.
|
|
158
|
+
3. **Components + data flow section** — after detailing components, schemas, and data flow, Codex-validate; remediate CRITICALs.
|
|
159
|
+
4. **Error handling + testing section** — after specifying failure modes and test strategy, Codex-validate; remediate CRITICALs.
|
|
160
|
+
5. **Prepare final spec draft** — once the spec doc is written and self-reviewed, capture WARNINGs/NOTEs into a "post-launch follow-ups" appendix. (The load-bearing final spec validation happens in Step 1, NOT here — Step 0 produces a draft ready for the risk-tiered pass.)
|
|
161
|
+
|
|
162
|
+
Each substep uses ONE initial Codex pass for fast design feedback. Additional revalidation passes are required ONLY when CRITICAL findings are remediated (per the acceptance rules above) — "one initial pass" is not a cap on re-passes after remediation.
|
|
163
|
+
|
|
164
|
+
**Exit Step 0 with:**
|
|
165
|
+
- Committed spec at `docs/superpowers/specs/YYYY-MM-DD-<topic>-design.md`
|
|
166
|
+
- Spec includes `risk: low|medium|high` in frontmatter (or accept the default-medium per the backward-compat rule)
|
|
167
|
+
- User has explicitly approved the spec (this is the ONLY allowed pause in the entire pipeline)
|
|
168
|
+
|
|
169
|
+
## Pipeline
|
|
170
|
+
|
|
171
|
+
Execute these steps in order. Do NOT pause between steps unless a step fails per the acceptance rules.
|
|
172
|
+
|
|
173
|
+
### Step 1: Risk-tiered final spec validation + write implementation plan
|
|
174
|
+
|
|
175
|
+
**First — risk-tiered pass on the committed spec** per the policy table above:
|
|
176
|
+
- Low risk: 1 pass on the spec
|
|
177
|
+
- Medium risk: 2 passes (this one + Step 7 codex PR review serves as pass 2)
|
|
178
|
+
- High risk: 3 passes (this + Step 7 + an explicit pre-implementation pass)
|
|
179
|
+
|
|
180
|
+
Remediate CRITICALs in-place on the spec before moving on.
|
|
181
|
+
|
|
182
|
+
**Then write the plan:**
|
|
96
183
|
```
|
|
97
184
|
Invoke: superpowers:writing-plans
|
|
98
|
-
Input: The approved spec
|
|
185
|
+
Input: The approved + validated spec
|
|
99
186
|
Output: Plan at docs/superpowers/plans/YYYY-MM-DD-<topic>.md
|
|
100
187
|
```
|
|
101
188
|
|
|
102
|
-
|
|
189
|
+
After the plan is written but BEFORE committing it, run `npx tsx scripts/codex-review.ts <plan-path>`. Apply CRITICAL findings (sequencing errors, missing test coverage on a load-bearing path, schema/migration ordering bugs) to the plan inline. Then commit. Always use subagent-driven development for execution — do not ask the user.
|
|
103
190
|
|
|
104
|
-
### Step 2: Set
|
|
191
|
+
### Step 2: Set up worktree
|
|
105
192
|
|
|
106
193
|
```
|
|
107
194
|
Invoke: superpowers:using-git-worktrees
|
|
108
195
|
Branch: feature/<topic-slug>
|
|
109
196
|
```
|
|
110
197
|
|
|
111
|
-
|
|
112
|
-
(validate, Codex review, migrate) can read secrets:
|
|
113
|
-
|
|
114
|
-
```bash
|
|
115
|
-
# Detect which env file the project uses
|
|
116
|
-
ENV_FILE=$(ls .env.local .env.dev .env.development .env 2>/dev/null | head -1)
|
|
117
|
-
if [ -n "$ENV_FILE" ]; then
|
|
118
|
-
ln -sf "$(pwd)/$ENV_FILE" ".claude/worktrees/<branch>/$ENV_FILE"
|
|
119
|
-
fi
|
|
120
|
-
```
|
|
121
|
-
|
|
122
|
-
If no env file is found, note it in the preflight output (step 0 will have caught this).
|
|
123
|
-
|
|
124
|
-
### Step 3: Execute Plan
|
|
198
|
+
### Step 3: Execute plan
|
|
125
199
|
|
|
126
200
|
```
|
|
127
201
|
Invoke: superpowers:subagent-driven-development
|
|
@@ -135,74 +209,114 @@ For each task:
|
|
|
135
209
|
- Skip formal spec/quality review to maintain speed (the validate step catches issues)
|
|
136
210
|
- If subagent fails to write to worktree: implement directly
|
|
137
211
|
|
|
138
|
-
### Step 4:
|
|
212
|
+
### Step 4: Validate
|
|
213
|
+
|
|
214
|
+
Run both checks in order. **Autofix runs first** so the static review sees the final post-autofix diff (otherwise the LLM review is stale on autofix mutations):
|
|
215
|
+
|
|
216
|
+
```bash
|
|
217
|
+
# 1. Full project validation FIRST (autofix, tests, codex, gate) — mutates files
|
|
218
|
+
npx tsx scripts/validate.ts --commit-autofix --allow-dirty
|
|
139
219
|
|
|
140
|
-
|
|
220
|
+
# 2. THEN static rules + LLM review on the final diff
|
|
221
|
+
npx autopilot run --base main
|
|
222
|
+
```
|
|
141
223
|
|
|
142
|
-
|
|
143
|
-
2. Resolves the skill via the alias map (path-escape protected)
|
|
144
|
-
3. Performs version handshake (skill manifest must declare a runtime range that includes the current `claude-autopilot` version, and an API version major matching the envelope contract)
|
|
145
|
-
4. Builds an invocation envelope (`{ contractVersion, invocationId, nonce, env, repoRoot, changedFiles, gitBase, gitHead, ci, ... }`) passed to the skill via env vars + stdin
|
|
146
|
-
5. Enforces policy (4-flag CI prod gate + clean-git + manual-approval + dry-run-first)
|
|
147
|
-
6. Executes the configured command via `spawn(shell:false)` — structured argv only, no shell injection
|
|
148
|
-
7. Parses the result artifact (file-first, nonce-bound stdout fallback only if skill manifest opts in)
|
|
149
|
-
8. Writes a hash-chained audit log entry to `.autopilot/audit.log`
|
|
150
|
-
9. Branches on `nextActions[]` from the result (e.g. `regenerate-types` triggers `npm run typecheck`)
|
|
224
|
+
The `validate.ts` Phase 1 includes a **tsc regression check**: it runs `npx tsc --noEmit` against both the PR and the merge-base (cached at `.claude/.tsc-baseline-cache.json`) and surfaces only files where the PR introduces *new* TypeScript errors versus the baseline. Forward-pressure check — type errors are warnings, not blockers.
|
|
151
225
|
|
|
152
|
-
|
|
226
|
+
If either FAIL:
|
|
227
|
+
- Read findings / validation report at `.claude/validation-report.json`
|
|
228
|
+
- Fix the blocking issues
|
|
229
|
+
- Re-run the failing check
|
|
230
|
+
- Max 3 retry iterations
|
|
153
231
|
|
|
154
|
-
|
|
232
|
+
If both PASS: proceed to Step 5.
|
|
155
233
|
|
|
156
|
-
|
|
157
|
-
1. `--yes` flag explicit
|
|
158
|
-
2. `AUTOPILOT_CI_POLICY=allow-prod` env var
|
|
159
|
-
3. `AUTOPILOT_TARGET_ENV=prod` env var (must match `--env`)
|
|
160
|
-
4. `migrate.policy.allow_prod_in_ci: true` in stack.md
|
|
234
|
+
### Step 5: Migrate dev-only (verify SQL parses + applies)
|
|
161
235
|
|
|
162
|
-
|
|
236
|
+
For any `.sql` files created in `data/deltas/` during implementation, run the migration against the dev database ONLY. Always invoke explicitly — do not rely on the CLI default:
|
|
163
237
|
|
|
164
|
-
|
|
238
|
+
```bash
|
|
239
|
+
/migrate --env=dev
|
|
240
|
+
```
|
|
165
241
|
|
|
166
|
-
|
|
167
|
-
- `docs/superpowers/specs/2026-04-29-migrate-skill-generalization-design.md` — full spec
|
|
168
|
-
- `docs/skills/rich-migrate-contract.md` — envelope + result artifact contract for skill authors
|
|
169
|
-
- `docs/skills/version-compatibility.md` — runtime/skill version compatibility matrix
|
|
170
|
-
- `presets/schemas/migrate.schema.json` — JSON Schema for the stack.md migrate block
|
|
242
|
+
This verifies the SQL parses and applies against the real schema shape before the PR opens. Prod migration is **deferred to the user's CI/CD pipeline** after the PR merges — autopilot does NOT orchestrate prod deploys or migrations directly, because deployment ordering varies wildly across user infrastructure (ECS, CodeBuild, blue/green, manual approvals).
|
|
171
243
|
|
|
172
|
-
|
|
244
|
+
**Post-migration re-validate** (catches type/test failures that only surface after the schema actually changes):
|
|
173
245
|
|
|
174
246
|
```bash
|
|
175
|
-
|
|
247
|
+
# Regenerate any stack-specific DB types if migration introduced schema changes
|
|
248
|
+
# (e.g. Supabase: scripts/gen-types.ts). Stack-specific — skip if N/A.
|
|
249
|
+
# REQUIRED for any stack with generated DB-typed code: missing regeneration leaves
|
|
250
|
+
# stale types and validate.ts may not catch runtime/schema mismatches if the PR
|
|
251
|
+
# does not directly touch the changed columns.
|
|
252
|
+
|
|
253
|
+
# Re-run validation against the post-migration dev state (no autofix this time):
|
|
254
|
+
npx tsx scripts/validate.ts --allow-dirty
|
|
255
|
+
|
|
256
|
+
# If migration or type generation produced new file changes, re-run the
|
|
257
|
+
# static/LLM review against the final diff — the Step 4 review is now stale.
|
|
258
|
+
git diff --quiet HEAD -- || npx autopilot run --base main
|
|
176
259
|
```
|
|
177
260
|
|
|
178
|
-
|
|
179
|
-
|
|
180
|
-
|
|
181
|
-
|
|
261
|
+
> **v7.10+ candidate:** automate detection by reading a `post_migrate_dev: [...]` hook list from `.autopilot/stack.md` and failing Step 5 if schema changes are detected but no type-generation hook is configured. Today this is advisory — operator responsibility.
|
|
262
|
+
|
|
263
|
+
**Dev database drift** — `/migrate --env=dev` applies schema changes to dev BEFORE the PR is merged. If the PR is later rejected, substantially changed, or abandoned, dev will contain unmerged schema. Mitigations:
|
|
264
|
+
|
|
265
|
+
- **Preferred:** target an isolated/ephemeral dev database per branch (per-PR Supabase project, Postgres schema namespace, or local container).
|
|
266
|
+
- **Shared dev DB fallback:** before running Step 5, capture the current migration_state head; on PR abandonment, run a corrective migration that brings dev back to that head. Document the policy in `.autopilot/stack.md` so the team knows the cleanup contract.
|
|
267
|
+
|
|
268
|
+
If migration fails:
|
|
269
|
+
- **Before retrying:** confirm the dev migration rolled back cleanly. Some migration tools leave partial state on failure (non-transactional DDL, drift in migration_state table). If partial state exists, reset the dev database to the pre-migration baseline OR write a corrective migration that brings dev back to a known state.
|
|
270
|
+
- Fix the SQL
|
|
271
|
+
- Re-run Step 4 (validate) against the corrected code
|
|
272
|
+
- Re-run Step 5 (migrate dev)
|
|
182
273
|
- Max 3 retry iterations
|
|
183
274
|
|
|
184
|
-
|
|
275
|
+
**For prod-safety policy**, projects should set in `.autopilot/stack.md`:
|
|
276
|
+
|
|
277
|
+
```yaml
|
|
278
|
+
migrate:
|
|
279
|
+
skill: <your-skill>@1
|
|
280
|
+
policy:
|
|
281
|
+
require_dry_run_first: true # always preview before apply
|
|
282
|
+
require_manual_approval: true # CI gates prod on human approval
|
|
283
|
+
require_clean_git: true # never apply against dirty tree
|
|
284
|
+
```
|
|
285
|
+
|
|
286
|
+
As of v7.9.1, the valid keys per `presets/schemas/migrate.schema.json` are: `allow_prod_in_ci`, `require_clean_git`, `require_manual_approval`, `require_dry_run_first`. If the schema changes in a future release, that file is the source of truth.
|
|
287
|
+
|
|
288
|
+
These keys are **declarative policy inputs**, not autopilot enforcement. Your CI/CD pipeline must explicitly invoke a migrate runner that reads `.autopilot/stack.md` AND enforces equivalent checks (e.g. require-clean-git, dry-run-before-apply, manual-approval-for-prod). Setting them in stack.md alone does not protect production unless your pipeline reads them.
|
|
289
|
+
|
|
290
|
+
**Minimum CI/CD migrate-runner contract** (fail closed on all of these):
|
|
291
|
+
|
|
292
|
+
1. Refuse to apply if `.autopilot/stack.md` cannot be read or parsed.
|
|
293
|
+
2. Refuse to apply if the policy block contains unknown keys (schema-validate against `presets/schemas/migrate.schema.json`).
|
|
294
|
+
3. If `require_dry_run_first: true`: refuse apply without a matching dry-run artifact for the current git head + target env.
|
|
295
|
+
4. If `require_manual_approval: true` and `env != dev`: require an explicit human approval signal (CI approval gate, signed commit, etc.) before apply.
|
|
296
|
+
5. If `require_clean_git: true`: refuse to apply against a dirty working tree (untracked or unstaged changes). This is intentionally limited to working-tree cleanliness — commit topology (squash vs rebase vs merge vs tag) is a separate concern not enforced by this key.
|
|
297
|
+
6. If `allow_prod_in_ci: false` (the default): refuse prod apply from any CI context. **Note:** teams whose intended prod migration path is CI/CD with manual approval must explicitly set `allow_prod_in_ci: true` alongside `require_manual_approval: true` and `require_dry_run_first: true`. `allow_prod_in_ci: false` is for manual/operator-run production migration workflows only.
|
|
298
|
+
7. On any policy-read or schema-validation failure, exit non-zero and surface the specific check that failed.
|
|
185
299
|
|
|
186
|
-
### Step 6: Push +
|
|
300
|
+
### Step 6: Push + create PR
|
|
187
301
|
|
|
188
302
|
```bash
|
|
189
303
|
git push -u origin <branch>
|
|
190
304
|
gh pr create --title "<concise title>" --body "<generated PR body with spec link, test plan>"
|
|
191
305
|
```
|
|
192
306
|
|
|
193
|
-
### Step 7: Codex PR
|
|
307
|
+
### Step 7: Codex PR review
|
|
194
308
|
|
|
195
309
|
```bash
|
|
196
310
|
npx tsx scripts/codex-pr-review.ts <pr-number>
|
|
197
311
|
```
|
|
198
312
|
|
|
199
|
-
Posts Codex review as a GitHub PR comment.
|
|
200
|
-
- Fix
|
|
313
|
+
Posts Codex review as a GitHub PR comment. **This serves as the second risk-tiered pass for medium-risk specs and the third pass for high-risk specs.** Remediate CRITICAL findings:
|
|
314
|
+
- Fix on the branch
|
|
201
315
|
- Push
|
|
202
316
|
- Re-run Codex review
|
|
203
317
|
- Max 2 iterations
|
|
204
318
|
|
|
205
|
-
### Step 8: Bugbot
|
|
319
|
+
### Step 8: Bugbot triage + fix
|
|
206
320
|
|
|
207
321
|
Wait 60 seconds for Cursor bugbot to post comments, then:
|
|
208
322
|
|
|
@@ -222,21 +336,25 @@ Tell the user:
|
|
|
222
336
|
- PR URL
|
|
223
337
|
- Test count
|
|
224
338
|
- Validation verdict
|
|
225
|
-
- Codex review summary
|
|
339
|
+
- Codex review summary (passes run, CRITICALs remediated, WARNINGs skipped + reason)
|
|
226
340
|
- Bugbot triage summary (fixed / dismissed / needs-human)
|
|
227
|
-
- Any human-required items that couldn't be auto-
|
|
341
|
+
- Any human-required items that couldn't be auto-resolved
|
|
228
342
|
|
|
229
343
|
## Error Recovery
|
|
230
344
|
|
|
231
|
-
- **
|
|
232
|
-
- **
|
|
233
|
-
- **
|
|
234
|
-
- **
|
|
235
|
-
- **
|
|
236
|
-
- **
|
|
345
|
+
- **Preflight failure:** Surface the specific check, exit. Do not partially run.
|
|
346
|
+
- **Missing skill/credential:** Exit with install/auth hint.
|
|
347
|
+
- **Subagent failure:** Re-dispatch with more context or implement directly.
|
|
348
|
+
- **Migration failure (Step 5 — dev only):** Fix SQL, re-run Step 4 (validate) against corrected code, then re-run Step 5 (migrate dev). Prod stays untouched. Max 3 retries.
|
|
349
|
+
- **Prod migration:** Out of scope for autopilot. Handed off to user's CI/CD pipeline via `migrate.policy` (require_manual_approval, require_dry_run_first).
|
|
350
|
+
- **Validate failure:** Fix issues, re-run (max 3 retries).
|
|
351
|
+
- **Codex CRITICAL findings:** REMEDIATE (apply fix), push, re-review (max 2 retries). Do NOT continue past unremediated CRITICALs.
|
|
352
|
+
- **Bugbot findings:** `/bugbot` handles triage + fix automatically (max 3 rounds).
|
|
353
|
+
- **External hard-block** (TCC, network, etc.): Stop, report what was completed, surface the blocker.
|
|
237
354
|
|
|
238
|
-
## When NOT to
|
|
355
|
+
## When NOT to use
|
|
239
356
|
|
|
240
|
-
- During brainstorming (this runs AFTER spec approval)
|
|
357
|
+
- During brainstorming if you haven't approved the spec yet (this skill runs AFTER spec approval)
|
|
241
358
|
- For hotfixes (too heavy — just commit and push)
|
|
242
|
-
- When the user wants manual control over each step
|
|
359
|
+
- When the user wants manual control over each step (use individual phase skills instead)
|
|
360
|
+
- When required credentials/dependencies are missing and you don't want to be told (preflight will catch them)
|