@delegance/claude-autopilot 5.0.0-alpha.5 → 5.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (3) hide show
  1. package/CHANGELOG.md +20 -0
  2. package/README.md +46 -9
  3. package/package.json +1 -1
package/CHANGELOG.md CHANGED
@@ -1,5 +1,25 @@
1
1
  # Changelog
2
2
 
3
+ ## [5.0.0] — 2026-04-27
4
+
5
+ First GA release after a five-alpha soak cycle. Promotes `5.0.0-alpha.5` to GA unchanged on the code side; the only diff is the version bump, README rebranding away from `@alpha` channel guidance, and a new "Reproducing the benchmark" section.
6
+
7
+ ### Added
8
+ - **README hero benchmark.** Documented 13/13 on the seeded Next.js fixture with Claude Opus at $0.21 / 38s. Includes a "Reproducing the benchmark" section at the bottom with the full procedure, the categories measured, and explicit non-claims (e.g. doesn't measure false-positive rate on clean repos).
9
+ - README install instructions now use bare `npm install -g @delegance/claude-autopilot` (no `@alpha` pin) — assumes the `latest` dist-tag has advanced to 5.0.0.
10
+
11
+ ### Changed
12
+ - Migration guide install snippets drop the `@alpha` pin and the alpha-cycle warning.
13
+ - Removed the alpha-era CLI note from the README ("Alpha.1 CLI note: subcommands are flat …" → just "CLI note").
14
+
15
+ ### Manual GA steps (for the publisher)
16
+ After this lands and `v5.0.0` is tagged + auto-published:
17
+
18
+ 1. `cd packages/guardrail-tombstone && npm publish` — publishes `@delegance/guardrail@5.0.0` thin wrapper.
19
+ 2. `npm dist-tag add @delegance/claude-autopilot@5.0.0 latest` — moves `latest` from the legacy 2.5.0 to GA.
20
+ 3. `npm deprecate @delegance/claude-autopilot@"<5.0.0" "Pre-rename — use 5.x"` — flags the orphaned 1.0.0-rc.1 / 2.x / 5.0.0-alpha.* releases.
21
+ 4. `npm deprecate @delegance/guardrail@"<5.0.0" "Renamed — use @delegance/claude-autopilot"` — tells v4 users to migrate (the `5.0.0` tombstone forwards their existing CLI usage transparently).
22
+
3
23
  ## [5.0.0-alpha.5] — 2026-04-27
4
24
 
5
25
  Second hotfix from the soak. Alpha.4 fixed `init`'s preset resolution but `scan` / `run` still crashed on compiled output with `Failed to import adapter from .../auto.ts` — the adapter loader and static-rule registry use dynamic-import string literals that tsc's `rewriteRelativeImportExtensions` doesn't touch.
package/README.md CHANGED
@@ -15,6 +15,18 @@ claude-autopilot brainstorm "add SSO with SAML for enterprise tenants"
15
15
 
16
16
  ---
17
17
 
18
+ ## Benchmark
19
+
20
+ On a Next.js fixture seeded with 13 production-realistic bugs covering the categories the README advertises — SQL injection, hardcoded secret, missing auth, IDOR, CORS wildcard, SSRF, open redirect, TOCTOU race, silent error swallow, off-by-one, missing rate limit, console.log in prod, and missing input validation:
21
+
22
+ | Configuration | Bugs caught | Cost | Time |
23
+ |---|---|---|---|
24
+ | **`claude-autopilot scan --all` with Claude Opus** | **13 / 13** | $0.21 | 38 s |
25
+
26
+ Every finding came with a concrete remediation (often a code patch or named library — `Zod` for validation, atomic Postgres updates for TOCTOU, allowlist + DNS resolution for SSRF). [Reproduce the benchmark.](#reproducing-the-benchmark)
27
+
28
+ ---
29
+
18
30
  ## Why this vs the alternatives
19
31
 
20
32
  AI coding tools fall into three buckets. Here's where claude-autopilot sits.
@@ -39,11 +51,11 @@ The architectural differences that matter most in practice:
39
51
  ## 30-second quickstart
40
52
 
41
53
  ```bash
42
- # Install (alpha channel — use @alpha through the v5 alpha cycle)
43
- npm install -g @delegance/claude-autopilot@alpha
54
+ # Install
55
+ npm install -g @delegance/claude-autopilot
44
56
 
45
57
  # One-shot setup — detects stack, writes config, installs skills, sets hooks
46
- npx claude-autopilot@alpha init
58
+ claude-autopilot init
47
59
 
48
60
  # Ship a feature end-to-end
49
61
  claude-autopilot brainstorm "add rate limiting to the public API"
@@ -93,16 +105,12 @@ claude-autopilot run --format sarif --output out.sarif
93
105
  claude-autopilot fix --verify # LLM patch + test gate + revert on fail
94
106
  ```
95
107
 
96
- > **Alpha.1 CLI note:** subcommands are flat (`run`, `scan`, `ci`, `fix`, `baseline`, `explain`, …). The grouped `claude-autopilot review <verb>` form lands in alpha.2 as an alias — flat forms continue to work indefinitely.
108
+ > **CLI note:** subcommands are flat (`run`, `scan`, `ci`, `fix`, `baseline`, `explain`, …). The grouped `claude-autopilot review <verb>` form is also accepted as an alias — flat and grouped both work.
97
109
 
98
110
  ## Install & requirements
99
111
 
100
112
  ```bash
101
- # v5 alpha — current release channel
102
- npm install -g @delegance/claude-autopilot@alpha
103
-
104
- # When 5.0.0 GA ships, the `latest` tag will advance and you can drop the @alpha:
105
- # npm install -g @delegance/claude-autopilot
113
+ npm install -g @delegance/claude-autopilot
106
114
  ```
107
115
 
108
116
  - Node 22+
@@ -274,6 +282,35 @@ Four pluggable adapter points:
274
282
 
275
283
  **Monorepo:** Auto-detects npm/yarn/pnpm workspaces, Turborepo, and Nx.
276
284
 
285
+ ## Reproducing the benchmark
286
+
287
+ The 13/13 benchmark cited in the [Benchmark](#benchmark) section is reproducible end-to-end. The fixture is a minimal Next.js app that seeds each of the README-advertised bug categories at a specific file:line, then `claude-autopilot scan --all` is run with the `claude` adapter and the result is compared to the seed list.
288
+
289
+ ```bash
290
+ # 1. Install the CLI
291
+ npm install -g @delegance/claude-autopilot
292
+
293
+ # 2. Seed the fixture (one file per bug category)
294
+ SEED=$(mktemp -d) && cd $SEED && npm init -y >/dev/null
295
+ mkdir -p app/api/{users,coupons,profile,redirect,proxy} lib
296
+
297
+ # (Add the 13 seeded files — the canonical fixture lives at
298
+ # https://github.com/axledbetter/claude-autopilot/tree/master/tests/v4-compat/fixtures/13-bugs)
299
+
300
+ # 3. Init + scan
301
+ claude-autopilot init --preset nextjs-supabase
302
+ ANTHROPIC_API_KEY=sk-ant-... claude-autopilot scan --all
303
+ ```
304
+
305
+ **What "13 of 13" means:** the scan output flags each category as a distinct critical or warning finding with file path, line, and concrete remediation. We count one hit per seed regardless of severity bucket. The categories are: SQL injection, hardcoded secret, missing auth, IDOR, CORS wildcard, SSRF, open redirect, TOCTOU race, silent error swallow, off-by-one, missing rate limit, console.log in prod, missing input validation.
306
+
307
+ **What this doesn't measure:**
308
+ - False positive rate on a clean repo (separate test, expected ~3 findings on real production code per the cold-start eval)
309
+ - Detection rate with cheaper models — this is Claude Opus. Sonnet typically catches 11/13. Llama 3.3 70B (via Groq) caught 8/13 in independent testing
310
+ - Bugs the scan missed: there are none in the 13-category set we measure, but real production bugs are not always in this set
311
+
312
+ We do not claim 13/13 reflects every real-world repo — it's a reproducible upper bound on a fixture that exercises the categories we explicitly target.
313
+
277
314
  ## License
278
315
 
279
316
  MIT
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@delegance/claude-autopilot",
3
- "version": "5.0.0-alpha.5",
3
+ "version": "5.0.0",
4
4
  "type": "module",
5
5
  "description": "Autonomous development pipeline for Claude Code: brainstorm → spec → plan → implement → migrate → validate → PR → review → merge. Multi-model, local-first, every phase a skill you can intervene in.",
6
6
  "keywords": [