@delegance/claude-autopilot 7.4.0 → 7.4.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +93 -0
- package/package.json +1 -1
- package/skills/autopilot/SKILL.md +52 -0
package/CHANGELOG.md
CHANGED
@@ -2,6 +2,99 @@
 
 - v5.6 Phase 7 (docs reconciliation) — pending.
 
+## 7.4.2 (2026-05-11)
+
+**v7.4.2 — risk-tiered codex pass policy in autopilot skill.**
+Docs-only PR. Codifies finding N2 from the v7.4.1 codex strategic
+review into `skills/autopilot/SKILL.md`.
+
+**New policy table** in the skill:
+
+| Spec risk | # of codex passes |
+|---|---|
+| **Low** (CLI UX, doc-only, scaffolding, CI tweaks) | 1 |
+| **Medium** (new exec modes, auth, billing, data-access, env vars, API contracts) | 2 |
+| **High** (sandboxing, multi-tenancy, auto-merge, repo-mutation, secrets, RPC/SECURITY DEFINER) | 3 + external review |
+
+**Convention:** spec docs declare `risk: low | medium | high` in
+frontmatter. Omitted defaults to **medium** (safer than defaulting
+to low).
+
+**v7.x examples** included in the skill text:
+* v7.1.7 (low) — 1 pass, 0 CRITICALs in practice.
+* v7.4.0 (low) — 1 pass, 2 CRITICALs caught pre-impl.
+* v7.0 Phase 6 (high) — 3 passes, would have shipped credential-
+exfiltration vector C3 without all three.
+* v8.0 spec (high) — 2 passes done, needs 3rd before v8 alpha.
+
+No code change. Bumping to 7.4.2.
+
+## 7.4.1 (2026-05-11)
+
+**v7.4.1 — strategic pivot doc from codex 5.5 review.** Docs-only
+PR. Records the decision to pause v8 daemon implementation pending
+customer discovery, plus 8 other findings from the codex strategic
+review of full project state on 2026-05-11.
+
+**Key outcome:** "ship v8 daemon" is NOT the next milestone. The CLI
+chat-session loop is the validated asset; v8 is unvalidated. New
+priority order: (1) customer discovery sprint, (2) hosted beta
+readiness slice (operational), (3) org-tier revocation completion,
+(4) risk-tiered codex pass policy in the autopilot skill.
+
+**Process changes adopted:**
+
+* **Risk-tiered codex passes** (1 for low-risk CLI UX, 2 for new
+exec/auth/billing/data-access modes, 3 for sandboxing /
+multi-tenancy / repo-mutation).
+* **Strategic codex review every ~10 PRs** (separate from per-spec
+passes — catches "ship more without validating demand" trap).
+* **Bounded benchmark suite gate** (4 repo shapes only, run
+pre-release + after major workflow changes — already in v8 spec).
+
+**v8 IF customer discovery validates demand:** local-only alpha
+first (per W5 of codex review). NO hosted workers, NO billing, NO
+auto-merge until alpha demand is proven.
+
+Full doc at `docs/strategy/2026-05-11-codex-pivot.md`.
+
+## 7.4.0 (2026-05-11)
+
+**v7.4.0 — scaffold per-stack support (Python + FastAPI).** Closes
+the v7.1.6/v7.1.8 benchmark caveat ("n=1, Node 22 ESM only —
+Python/Rust/Go remain v8 follow-ups") and gates v8 spec
+stabilization criteria #2 (4-repo benchmark suite).
+
+* **Stack detection precedence** (codex C1): explicit `--stack` >
+FastAPI > Python > Node > detected-but-unsupported > Node fallback.
+FastAPI checked BEFORE Python so FastAPI specs that include
+`pyproject.toml` aren't mis-classified.
+* **FastAPI scaffold completeness** (codex C2): generates a runnable
+`src/<package>/main.py` with `app = FastAPI()`, `/health` route,
+`run()` function, plus `tests/test_main.py` (otherwise the
+`[project.scripts]` entry was dangling).
+* **Name normalization** (codex W1): PEP 503 distribution name +
+valid Python identifier package name. `my-pkg-2` → distribution
+`my-pkg-2`, package `my_pkg_2`. Hatchling explicit `packages`
+config always present.
+* **Detected-but-unsupported** (codex W2): Go/Rust/Ruby specs →
+exit 3 with diagnostic, NOT silent fallback to Node.
+* **Polyglot guard** (codex W3): specs listing both `package.json`
+AND `pyproject.toml` without `--stack` → exit 3.
+* **Narrow dep extraction** (codex W6): 3 patterns only, no inferred
+versions, dedup by PEP 503 normalized name. FastAPI auto-includes
+`fastapi>=0.110` + `uvicorn[standard]>=0.27`.
+* **Module split**: `scaffold.ts` is now the dispatcher;
+per-stack scaffolders live under
+`src/cli/scaffold/{node,python,types}.ts`.
+* **New flags**: `--stack <node|python|fastapi>`, `--list-stacks`.
+* **Integration test** (codex N3): scaffolds FastAPI + creates
+isolated venv (handles PEP 668) + `pip install -e .` + import-
+app. Skipped cleanly when `python3` unavailable.
+
+1563 → 1597 CLI tests; tsc clean; build clean. PR #155 spec +
+#156 impl. Version 7.3.0 → 7.4.0.
+
 ## 7.3.0 (2026-05-10)
 
 **v7.3.0 — library export surface for v8 daemon.** Minor bump
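The 7.4.0 entry above describes stack detection precedence, the polyglot guard, and name normalization in prose only; the diff does not include the implementation. The following is a minimal TypeScript sketch of those rules, not the package's actual code. The names `SpecSignals`, `Detection`, `detectStack`, and `normalizeNames` are hypothetical, as is the shape of the detection input. Running the polyglot guard before the FastAPI check, and prefixing an underscore when a package name would start with a digit, are assumptions; the changelog does not specify either.

```typescript
// Hypothetical sketch of the rules listed in the 7.4.0 changelog entry.
type Stack = "node" | "python" | "fastapi";

interface SpecSignals {
  explicitStack?: Stack;    // value passed via --stack, if any
  mentionsFastapi: boolean; // the spec names FastAPI
  hasPyproject: boolean;    // the spec lists pyproject.toml
  hasPackageJson: boolean;  // the spec lists package.json
  unsupported?: string;     // e.g. "go", "rust", "ruby"
}

type Detection =
  | { ok: true; stack: Stack }
  | { ok: false; exitCode: number; reason: string };

function detectStack(s: SpecSignals): Detection {
  // 1. An explicit --stack always wins.
  if (s.explicitStack) return { ok: true, stack: s.explicitStack };
  // Polyglot guard (assumed to run before any inference): refuse to guess.
  if (s.hasPackageJson && s.hasPyproject) {
    return { ok: false, exitCode: 3, reason: "polyglot spec: pass --stack" };
  }
  // 2. FastAPI is checked before Python so a FastAPI spec that also
  //    lists pyproject.toml is not classified as plain Python.
  if (s.mentionsFastapi) return { ok: true, stack: "fastapi" };
  // 3. Python, then 4. Node.
  if (s.hasPyproject) return { ok: true, stack: "python" };
  if (s.hasPackageJson) return { ok: true, stack: "node" };
  // 5. Detected-but-unsupported: exit 3 with a diagnostic, no silent fallback.
  if (s.unsupported) {
    return { ok: false, exitCode: 3, reason: `unsupported stack: ${s.unsupported}` };
  }
  // 6. Nothing detected: fall back to Node.
  return { ok: true, stack: "node" };
}

function normalizeNames(raw: string): { distribution: string; pkg: string } {
  // PEP 503: collapse runs of "-", "_", "." into a single "-" and lowercase.
  const distribution = raw.replace(/[-_.]+/g, "-").toLowerCase();
  // Package name: a valid Python identifier derived from the distribution name.
  let pkg = distribution.replace(/-/g, "_");
  if (/^[0-9]/.test(pkg)) pkg = `_${pkg}`; // assumption: identifiers cannot start with a digit
  return { distribution, pkg };
}

const names = normalizeNames("my-pkg-2");
console.log(names.distribution, names.pkg); // prints "my-pkg-2 my_pkg_2"
```

The sketch assumes the raw name contains only letters, digits, `-`, `_`, and `.`; a real scaffolder would also need to reject or sanitize other characters and Python keywords.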
package/package.json
CHANGED
package/skills/autopilot/SKILL.md
CHANGED
@@ -25,6 +25,58 @@ The ONLY time you stop is if a step **fails and cannot be recovered**. Otherwise
 
 Brief status lines like `[autopilot] Step 3: Executing plan...` are fine. Full summaries, questions, or check-ins are not.
 
+## Codex pass policy (risk-tiered)
+
+> Adopted from the v7.4.1 strategic review (see
+> `docs/strategy/2026-05-11-codex-pivot.md`, codex finding N2).
+>
+> The v8 spec pass-2 finding 3 CRITICALs the original spec missed
+> (especially sandbox / credential exfiltration) was concrete evidence
+> that 1 codex pass is insufficient for security-sensitive architecture.
+> But running 3 passes on every CLI polish spec adds latency without
+> proportional value.
+
+**Tier the spec by risk; pass count follows.**
+
+| Spec risk | Triggers | # of codex passes |
+|---|---|---|
+| **Low** | CLI UX changes, doc-only PRs, scaffolding extensions, config polish, CI workflow tweaks | **1 pass** (this skill's existing pattern — codex on the committed spec) |
+| **Medium** | New execution modes, auth changes, billing flows, data-access patterns, new env vars, API contracts | **2 passes** (1 on the draft spec, 1 on the merged spec after edits) |
+| **High** | Sandboxing, multi-tenancy, auto-merge, anything that mutates user repos, new secrets-handling, RPC/SECURITY DEFINER changes | **3 passes** + external review (1 draft, 1 post-edit, 1 on the impl PR diff) |
+
+**How to apply.** Spec docs declare risk in their frontmatter:
+
+```markdown
+---
+title: <topic>
+risk: low | medium | high
+---
+```
+
+If the spec's `risk:` is omitted, default to **medium** (safer than
+defaulting to low; matches the v8 spec pattern where pass-1 was
+clearly insufficient).
+
+The brainstorming skill's per-step codex pass (approach selection,
+architecture, components, error handling, implementation prep) is
+ALWAYS run — it's how we get a draft spec good enough to merge.
+This tier policy applies to the **post-brainstorm** passes and to
+the codex PR review at Step 7 below.
+
+**Examples from v7.x:**
+
+* v7.1.7 (setup polish — CLAUDE.md scaffold + .gitignore + dedup): low.
+1 pass on the committed spec. Caught zero CRITICALs in practice.
+* v7.4.0 (Python/FastAPI scaffold extension): low. 1 pass. Found
+2 CRITICALs (FastAPI precedence, dangling entrypoint) — both
+fixed pre-impl, no PR-pass surprises.
+* v7.0 Phase 6 (engine-off removal + middleware revocation): high.
+3 passes (spec, post-edit, PR diff). Each pass surfaced new
+trust-boundary issues; without all three the launch would have
+shipped with the credential-exfiltration vector C3.
+* v8.0 spec (standalone daemon): high. 2 passes so far + needs a
+3rd before any v8 alpha implementation.
+
 ## Pipeline
 
 Execute these steps in order. Do NOT pause between steps unless a step fails.
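The policy added in the SKILL.md hunk above maps a spec's `risk:` frontmatter value to a number of codex passes, with **medium** as the default when the key is omitted. The skill states this as instructions for the agent, not as code, so the following is only an illustrative TypeScript sketch. `Risk`, `PassPlan`, `PASS_POLICY`, and `readRisk` are hypothetical names. Treating an unrecognized value (including the literal template `low | medium | high`) the same as an omitted one is an assumption.

```typescript
// Hypothetical sketch of the risk-tier table; not part of the package.
type Risk = "low" | "medium" | "high";

interface PassPlan {
  passes: number;          // number of codex passes
  externalReview: boolean; // only the high tier adds external review
  stages: string[];        // what each pass reviews, per the table
}

const PASS_POLICY: Record<Risk, PassPlan> = {
  low: { passes: 1, externalReview: false, stages: ["committed spec"] },
  medium: {
    passes: 2,
    externalReview: false,
    stages: ["draft spec", "merged spec after edits"],
  },
  high: {
    passes: 3,
    externalReview: true,
    stages: ["draft spec", "post-edit spec", "impl PR diff"],
  },
};

function readRisk(specText: string): Risk {
  // Frontmatter is assumed to be the first block, delimited by "---" lines.
  const frontmatter = /^---\r?\n([\s\S]*?)\r?\n---/.exec(specText);
  if (!frontmatter) return "medium"; // omitted: default to medium
  const match = /^risk:[ \t]*(low|medium|high)[ \t\r]*$/m.exec(frontmatter[1] ?? "");
  // Assumption: anything unrecognized is treated like an omitted value.
  return match ? (match[1] as Risk) : "medium";
}

const spec = "---\ntitle: standalone daemon\nrisk: high\n---\n# Spec body";
const plan = PASS_POLICY[readRisk(spec)];
console.log(plan.passes, plan.externalReview); // prints "3 true"
```

Keeping the table as data makes the default explicit in one place: every path that fails to find a valid tier returns `"medium"`, which matches the stated rationale that defaulting to low is the less safe choice.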