npm - @delegance/claude-autopilot - Versions diffs - 5.0.0-alpha.5 → 5.0.0 - Mend

@delegance/claude-autopilot 5.0.0-alpha.5 → 5.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (3) hide show

package/CHANGELOG.md CHANGED Viewed

@@ -1,5 +1,25 @@
 # Changelog
+## [5.0.0] — 2026-04-27
+First GA release after a five-alpha soak cycle. Promotes `5.0.0-alpha.5` to GA unchanged on the code side; the only diff is the version bump, README rebranding away from `@alpha` channel guidance, and a new "Reproducing the benchmark" section.
+### Added
+- **README hero benchmark.** Documented 13/13 on the seeded Next.js fixture with Claude Opus at $0.21 / 38s. Includes a "Reproducing the benchmark" section at the bottom with the full procedure, the categories measured, and explicit non-claims (e.g. doesn't measure false-positive rate on clean repos).
+- README install instructions now use bare `npm install -g @delegance/claude-autopilot` (no `@alpha` pin) — assumes the `latest` dist-tag has advanced to 5.0.0.
+### Changed
+- Migration guide install snippets drop the `@alpha` pin and the alpha-cycle warning.
+- Removed the alpha-era CLI note from the README ("Alpha.1 CLI note: subcommands are flat …" → just "CLI note").
+### Manual GA steps (for the publisher)
+After this lands and `v5.0.0` is tagged + auto-published:
+1. `cd packages/guardrail-tombstone && npm publish` — publishes `@delegance/guardrail@5.0.0` thin wrapper.
+2. `npm dist-tag add @delegance/claude-autopilot@5.0.0 latest` — moves `latest` from the legacy 2.5.0 to GA.
+3. `npm deprecate @delegance/claude-autopilot@"<5.0.0" "Pre-rename — use 5.x"` — flags the orphaned 1.0.0-rc.1 / 2.x / 5.0.0-alpha.* releases.
+4. `npm deprecate @delegance/guardrail@"<5.0.0" "Renamed — use @delegance/claude-autopilot"` — tells v4 users to migrate (the `5.0.0` tombstone forwards their existing CLI usage transparently).
 ## [5.0.0-alpha.5] — 2026-04-27
 Second hotfix from the soak. Alpha.4 fixed `init`'s preset resolution but `scan` / `run` still crashed on compiled output with `Failed to import adapter from .../auto.ts` — the adapter loader and static-rule registry use dynamic-import string literals that tsc's `rewriteRelativeImportExtensions` doesn't touch.

package/README.md CHANGED Viewed

@@ -15,6 +15,18 @@ claude-autopilot brainstorm "add SSO with SAML for enterprise tenants"
 ---
+## Benchmark
+On a Next.js fixture seeded with 13 production-realistic bugs covering the categories the README advertises — SQL injection, hardcoded secret, missing auth, IDOR, CORS wildcard, SSRF, open redirect, TOCTOU race, silent error swallow, off-by-one, missing rate limit, console.log in prod, and missing input validation:
+| Configuration | Bugs caught | Cost | Time |
+|---|---|---|---|
+| **`claude-autopilot scan --all` with Claude Opus** | **13 / 13** | $0.21 | 38 s |
+Every finding came with a concrete remediation (often a code patch or named library — `Zod` for validation, atomic Postgres updates for TOCTOU, allowlist + DNS resolution for SSRF). [Reproduce the benchmark.](#reproducing-the-benchmark)
+---
 ## Why this vs the alternatives
 AI coding tools fall into three buckets. Here's where claude-autopilot sits.
@@ -39,11 +51,11 @@ The architectural differences that matter most in practice:
 ## 30-second quickstart
 ```bash
-# Install (alpha channel — use @alpha through the v5 alpha cycle)
-npm install -g @delegance/claude-autopilot@alpha
+# Install
+npm install -g @delegance/claude-autopilot
 # One-shot setup — detects stack, writes config, installs skills, sets hooks
-npx claude-autopilot@alpha init
+claude-autopilot init
 # Ship a feature end-to-end
 claude-autopilot brainstorm "add rate limiting to the public API"
@@ -93,16 +105,12 @@ claude-autopilot run --format sarif --output out.sarif
 claude-autopilot fix --verify                    # LLM patch + test gate + revert on fail
 ```
-> **Alpha.1 CLI note:** subcommands are flat (`run`, `scan`, `ci`, `fix`, `baseline`, `explain`, …). The grouped `claude-autopilot review <verb>` form lands in alpha.2 as an alias — flat forms continue to work indefinitely.
+> **CLI note:** subcommands are flat (`run`, `scan`, `ci`, `fix`, `baseline`, `explain`, …). The grouped `claude-autopilot review <verb>` form is also accepted as an alias — flat and grouped both work.
 ## Install & requirements
 ```bash
-# v5 alpha — current release channel
-npm install -g @delegance/claude-autopilot@alpha
-# When 5.0.0 GA ships, the `latest` tag will advance and you can drop the @alpha:
-# npm install -g @delegance/claude-autopilot
+npm install -g @delegance/claude-autopilot
 ```
 - Node 22+
@@ -274,6 +282,35 @@ Four pluggable adapter points:
 **Monorepo:** Auto-detects npm/yarn/pnpm workspaces, Turborepo, and Nx.
+## Reproducing the benchmark
+The 13/13 benchmark cited in the [Benchmark](#benchmark) section is reproducible end-to-end. The fixture is a minimal Next.js app that seeds each of the README-advertised bug categories at a specific file:line, then `claude-autopilot scan --all` is run with the `claude` adapter and the result is compared to the seed list.
+```bash
+# 1. Install the CLI
+npm install -g @delegance/claude-autopilot
+# 2. Seed the fixture (one file per bug category)
+SEED=$(mktemp -d) && cd $SEED && npm init -y >/dev/null
+mkdir -p app/api/{users,coupons,profile,redirect,proxy} lib
+# (Add the 13 seeded files — the canonical fixture lives at
+#  https://github.com/axledbetter/claude-autopilot/tree/master/tests/v4-compat/fixtures/13-bugs)
+# 3. Init + scan
+claude-autopilot init --preset nextjs-supabase
+ANTHROPIC_API_KEY=sk-ant-... claude-autopilot scan --all
+```
+**What "13 of 13" means:** the scan output flags each category as a distinct critical or warning finding with file path, line, and concrete remediation. We count one hit per seed regardless of severity bucket. The categories are: SQL injection, hardcoded secret, missing auth, IDOR, CORS wildcard, SSRF, open redirect, TOCTOU race, silent error swallow, off-by-one, missing rate limit, console.log in prod, missing input validation.
+**What this doesn't measure:**
+- False positive rate on a clean repo (separate test, expected ~3 findings on real production code per the cold-start eval)
+- Detection rate with cheaper models — this is Claude Opus. Sonnet typically catches 11/13. Llama 3.3 70B (via Groq) caught 8/13 in independent testing
+- Bugs the scan missed: there are none in the 13-category set we measure, but real production bugs are not always in this set
+We do not claim 13/13 reflects every real-world repo — it's a reproducible upper bound on a fixture that exercises the categories we explicitly target.
 ## License
 MIT

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "@delegance/claude-autopilot",
-  "version": "5.0.0-alpha.5",
+  "version": "5.0.0",
   "type": "module",
   "description": "Autonomous development pipeline for Claude Code: brainstorm → spec → plan → implement → migrate → validate → PR → review → merge. Multi-model, local-first, every phase a skill you can intervene in.",
   "keywords": [