@delegance/claude-autopilot 7.9.1 → 7.10.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +46 -1
- package/README.md +19 -13
- package/dist/src/cli/examples.d.ts +15 -0
- package/dist/src/cli/examples.js +108 -0
- package/dist/src/cli/help-text.js +13 -0
- package/dist/src/cli/index.js +12 -1
- package/dist/src/core/run-state/sameness-detector.d.ts +82 -0
- package/dist/src/core/run-state/sameness-detector.js +146 -0
- package/examples/specs/fastapi.md +37 -0
- package/examples/specs/go-cli.md +32 -0
- package/examples/specs/node-cli.md +36 -0
- package/examples/specs/python-cli.md +35 -0
- package/examples/specs/rust-cli.md +34 -0
- package/package.json +18 -3
- package/skills/autopilot/SKILL.md +40 -3
package/CHANGELOG.md
CHANGED
@@ -2,6 +2,52 @@
 
 - v5.6 Phase 7 (docs reconciliation) — pending.
 
+## 7.10.1 — 2026-05-13
+
+**v7.10.1 — `examples` verb.** Patch release. Closes the discoverability
+gap a new user hits between `setup` and `scaffold --from-spec`: the
+new `claude-autopilot examples` verb prints sample specs for each
+supported stack (node / python / fastapi / go / rust) so an operator
+can pipe one to a file and feed it back into the scaffolder.
+
+**New:** `claude-autopilot examples` lists all five stacks with a
+5-line preview. `claude-autopilot examples <stack>` prints the full
+spec for one stack to stdout — pipe-friendly:
+
+```bash
+claude-autopilot examples fastapi > docs/specs/my-api.md
+# edit the spec...
+claude-autopilot scaffold --from-spec docs/specs/my-api.md
+```
+
+**Bundled examples.** Five spec markdown files ship in
+`examples/specs/{node-cli,python-cli,fastapi,go-cli,rust-cli}.md` —
+one per supported scaffold target. Each shows the `## Files` section
+shape the scaffolder reads, plus a `## Goal` and `## How to use`
+section for human readers.
+
+**Package shape.** `examples/` is added to the published `files:`
+array in package.json so the directory ships in the npm tarball — the
+verb resolves spec paths via `findPackageRoot` and works under both
+source and globally installed CLI invocations.
+
+## 7.10.0 — 2026-05-13
+
+### Added
+- **Retry-loop sameness detector** (`src/core/run-state/sameness-detector.ts`) — new pure-TS module exporting `computeFingerprint`, `isSameFailure`, `shouldEscalate`, and `stripVolatileTokens`. A failure fingerprint is `{ phase, errorType, errorLocation, errorMessage, hash }` where `hash` is `sha256(JSON.stringify([phase, errorType, errorLocation, normalize(message[:200])]))`. JSON encoding is delimiter-safe — pipes, quotes, and embedded JSON in fields cannot produce cross-tuple collisions. `shouldEscalate(history)` returns `{ escalate: true }` when the last two entries have identical hashes — the signal that retries are making no progress.
+- **Volatile-token stripping built into the detector.** Both `errorLocation` and `errorMessage` are scrubbed of UUIDs, ISO timestamps, 13-digit epoch ms, sha1/sha256 hex digests, /tmp + /var/folders paths, and localhost:port before hashing — so a retry whose message differs only in a per-run UUID or temp directory still hashes to the same fingerprint. Exported as `stripVolatileTokens()` for callers that want to scrub before constructing a fingerprint.
+- **Public package subpath export.** `package.json` adds `./run-state/sameness-detector` pointing at `dist/src/core/run-state/sameness-detector.{js,d.ts}` so consumers can `import { computeFingerprint, shouldEscalate } from '@delegance/claude-autopilot/run-state/sameness-detector'` without deep-importing into compiled paths.
+- **Pipeline halts when retries make no progress, even if you have retries remaining.** `skills/autopilot/SKILL.md` Step 4 (validate), Step 7 (Codex PR review), and Step 8 (bugbot) now consult the detector before consuming a retry. If the same failure fingerprint fires twice in a row inside any retry loop, the pipeline stops and surfaces the matching fingerprint to the user instead of burning the remaining retry budget. This catches the class of bug where validate retries fix nothing because the underlying type error is unreachable from the change set.
+- Tests: `tests/run-state/sameness-detector.test.ts` (32 cases) covers the three issue-#181 acceptance scenarios (same × 2 escalates, same × 1 continues, different × 3 continues), edge cases (empty history, ABA pattern, message truncation, all three phases), delimiter-safety (fields containing `|`), and volatile-token scrubbing (UUIDs, timestamps, tmpdirs). `tests/run-state/sameness-detector-integration.test.ts` (7 cases) verifies SKILL.md retry-block references and that the compiled subpath export is importable + functional.
+
+### Notes
+- Persistence is intentionally in-memory only in v7.10.0. Per-retry-loop history is held in the autopilot skill execution scope; bugbot and validate do not share a history. The v6 run-state events.ndjson integration is tracked separately as issue #180.
+- Released as v7.10.0 even though issue #181 was originally labeled v7.11.0 — this ships before #178 and #179, so it gets the next minor.
+
+### Out of scope (still pending)
+- Expand/contract migration classification (additive vs destructive enforcement) — v7.11.0 candidate
+- v6 run-state engine integration into the autopilot skill (4,873 LOC of checkpoint/resume infra currently unused by the skill) — issue #180
+
 ## 7.9.1 — 2026-05-13 (correctness hotfix)
 
 ### Fixed
@@ -74,7 +120,6 @@ usage except for the bundled `tsx` deprecation warning.
 A1-A8 covering ESM/CJS safety, CLI parser scope, PATH self-pointer,
 AST audit, type-only imports, hand-rolled PATH lookup dropping the
 `which` dep, XDG state dir, npm-only --omit=optional documentation).
-
 ## 7.7.0 (2026-05-11)
 
 **v7.7.0 — Rust scaffold support.** Minor release. Promotes Rust from
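The 7.10.0 entry above says the fingerprint hash is taken over a JSON-encoded tuple so that delimiters inside a field cannot cause collisions. The following is a minimal sketch of that idea only, not the module's code: `pipeJoinedHash` and `jsonEncodedHash` are hypothetical names, and the normalization, truncation, and volatile-token stripping steps are deliberately left out.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical tuple shape for this sketch: [phase, errorType, errorLocation, errorMessage].
type FailureTuple = [string, string, string, string];

// Naive variant: join the fields with a pipe. A pipe inside a field shifts the
// field boundary, so two different tuples can hash to the same value.
function pipeJoinedHash(t: FailureTuple): string {
  return createHash('sha256').update(t.join('|')).digest('hex');
}

// Variant described in the changelog: hash the JSON encoding of the tuple.
// Each field is quoted and escaped, so field boundaries stay unambiguous.
function jsonEncodedHash(t: FailureTuple): string {
  return createHash('sha256').update(JSON.stringify(t)).digest('hex');
}

// Two distinct failures whose pipe-joined strings are identical.
const a: FailureTuple = ['validate', 'tsc_error', 'src/a.ts:1|x', 'y'];
const b: FailureTuple = ['validate', 'tsc_error', 'src/a.ts:1', 'x|y'];

console.log(pipeJoinedHash(a) === pipeJoinedHash(b));   // true: the naive join collides
console.log(jsonEncodedHash(a) === jsonEncodedHash(b)); // false: JSON keeps them apart
```

The sample tuples differ only in where the `|` falls, which is the case the changelog's delimiter-safety tests are described as covering.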
package/README.md
CHANGED
@@ -17,8 +17,11 @@ claude-autopilot brainstorm "add SSO with SAML for enterprise tenants"
 # → writes spec (reviewed by Codex) → writes plan (reviewed by Codex) →
 # → creates branch → implements with subagents → runs migrations →
 # → runs full test + lint + type + security gate → opens PR →
-# →
-# →
+# → runs risk-tiered Codex PR review (1/2/3 passes by spec risk) →
+# → triages bugbot findings, auto-fixes real bugs, re-runs validate →
+# → merges with your configured permissions (default is admin-squash;
+# configure branch protection + required checks if you need to enforce
+# reviews/CI gates that the autopilot agent should not bypass)
 ```
 
 *No hosted agent. No per-seat subscription. Runs locally on your machine, against your real repo, using your API keys. Every phase is a Claude Code skill you can intervene in, rewire, or run by itself.*
@@ -50,13 +53,14 @@ Every finding came with a concrete remediation (often a code patch or named libr
 | **Cursor BugBot / CodeRabbit** | Hosted | Per-PR or seat | Vendor's model | Review only | Post-hoc |
 | **Aider / Cline** | Local CLI | Free + your API key | User's choice | None | Continuous |
 | **OpenHands / SWE-agent** | Local research | Free | User's choice | Agent decides | Rare |
-| **claude-autopilot** | **Local CLI, your repo** | **
+| **claude-autopilot** | **Local CLI, your repo** | **Open source CLI + your model/API costs (Claude / Codex / Gemini / Groq / Ollama-local)** | **Multi-model per role (Claude + Codex + Gemini)** | **Skill-per-phase, rewireable** | **Every phase, all state on disk** |
 
-
+Four things only this product gives you:
 
-1. **
-2. **
+1. **No hosted workspace or remote sandbox.** Your repo stays on your machine. No third-party agent runtime, no SaaS-side orchestration, no per-seat markup. Model prompts (diffs, file context, design questions) are sent to whichever LLM providers you've configured (Anthropic / OpenAI / Google / Groq / Ollama-local). For a truly local-only setup, you must point _every_ model used by the entire execution path at a local endpoint: that includes the Claude Code agent runtime itself (configure a local Claude Code provider) AND the autopilot review adapter (`openai-compatible` pointed at Ollama). Pointing only the review adapter at Ollama still ships prompts/diffs to Anthropic via Claude Code. For most teams, local-only isn't the goal; "no hosted orchestration + your existing provider keys" is.
+2. **Risk-tiered review depth (policy-driven).** Specs declare `risk: low | medium | high` in frontmatter. The autopilot skill runs 1 / 2 / 3 sequential Codex passes accordingly, each with a remediation cycle in between. Enforcement is encoded in the skill (an LLM-driven instruction set, not a hard CLI gate) so it's auditable and editable: read `.claude/skills/autopilot/SKILL.md`, swap the tier rules for your codebase, expand the auto-escalation keyword list. Designed for teams that want review depth to scale with change risk instead of running forensic-grade review on every typo.
 3. **Ships as a Claude Code skill, not a competing IDE.** `/brainstorm`, `/autopilot`, `/migrate`, `/validate` are first-class Claude Code commands. As Claude Code grows, autopilot rides that adoption. You don't switch tools to use it; it's already there.
+4. **Multi-model council, available as a verb.** `claude-autopilot council` dispatches the same diff or design question to Claude + Codex + Gemini in parallel and synthesizes the consensus. Wire it into the autopilot pipeline by editing `.claude/skills/autopilot/SKILL.md` Step 7, or invoke standalone for one-off design decisions. The default pipeline uses sequential Codex review (cheaper, faster, often sufficient for routine changes); council is the higher-rigor option when you want broader model diversity.
 
 Plus the four practical differences:
 
@@ -128,9 +132,9 @@ Each phase is a Claude Code skill (`.claude/skills/<name>/SKILL.md`). You can in
 | **Migrate** | `migrate` | Dispatches to the configured migration skill (see [Migrate phase](#migrate-phase)) — runs your migration tool dev → QA → prod with per-env validation | Deterministic |
 | **Validate** | `validate` | Static rules + tests + type check + security scan + LLM review | Any |
 | **PR** | `commit-push-pr` | Opens the PR with auto-generated title, summary, and test plan | Claude |
-| **Review** | `review
+| **Review** | `codex-pr-review` (default) or `council` (opt-in) | Sequential Codex pass on the diff with risk-tiered iteration count (1/2/3 passes for low/medium/high). Swap in `council` for parallel multi-model dispatch if you want higher rigor. | Codex (default) or multi-model |
 | **Triage** | `bugbot` | Fetches automated reviewer findings, auto-fixes real bugs, dismisses false positives | Claude |
-| **Deploy** | `deploy` | Deploys via configured adapter (`vercel` \| `fly` \| `render` \| `generic`) with optional log streaming, health check, and bounded auto-rollback (see [Deploy phase](#deploy-phase)) | Deterministic |
+| **Deploy** (opt-in) | `deploy` | Deploys via configured adapter (`vercel` \| `fly` \| `render` \| `generic`) with optional log streaming, health check, and bounded auto-rollback (see [Deploy phase](#deploy-phase)). Not on the default `/autopilot` critical path: the autopilot loop ends at merge, and your CI/CD handles prod. Invoke `claude-autopilot deploy` directly, or wire it into the autopilot skill as Step 10. | Deterministic |
 
 ### Migrate phase
 
@@ -181,11 +185,13 @@ deploy:
 
 Features that are hard or impossible to find in the competitive set:
 
-- **
-- **
-- **
-- **
-- **
+- **Risk-tiered review depth (policy-driven).** Specs are tagged `risk: low | medium | high` in their frontmatter, with auto-escalation by keyword detection for sensitive categories (auth, multi-tenancy, sandboxing, billing, secrets, migrations, RLS, deploy/IAM, vector-DB tenancy — extend the list in the skill for your codebase). The pipeline runs 1 / 2 / 3 sequential Codex passes accordingly, each with a remediation cycle in between. Enforcement is encoded in `.claude/skills/autopilot/SKILL.md` (LLM-driven instructions, not a hard CLI gate), so it's auditable and editable. For teams that need hard enforcement, gate the merge step on the configured pass count by extending the skill or wrapping the CLI.
+- **Retry-loop sameness detector.** Validate / Codex / bugbot retry loops compute a failure fingerprint before consuming each retry. If the same fingerprint fires twice in a row, the pipeline halts and surfaces it to you — instead of burning the remaining retry budget on attempts that are making no progress. Available as a public subpath import (`@delegance/claude-autopilot/run-state/sameness-detector`) for embedding into your own retry loops.
+- **Multi-model council, available as a verb.** `claude-autopilot council` dispatches the same prompt to 3+ models in parallel and synthesizes the consensus. Opt-in for the autopilot pipeline (wire it into Step 7 of the autopilot skill), or invoke standalone for design decisions and architecture questions.
+- **Fix with test verification.** `claude-autopilot fix --verify` runs your full test suite after every patch and reverts on failure. Safer than any tool that proposes fixes without running your tests.
+- **Bug-bot auto-triage.** Watches Cursor BugBot / Copilot comments on your PR, triages each (real bug vs false positive), auto-fixes confirmed bugs, dismisses noise with explanations.
+- **Schema alignment rule.** Ensures DB migrations, backend types, and frontend types stay in sync. Custom static rule, not something any competitor ships.
+- **SARIF output + GitHub Code Scanning integration.** Findings appear as annotations in the PR and in the Security tab.
 
 ## Just the review layer
 
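The README changes above describe the review depth as 1 / 2 / 3 passes for `low` / `medium` / `high` risk, with keyword-based auto-escalation, and note that the rule lives in the skill text rather than in a CLI gate. For a team that wants a hard check in a wrapper, the mapping could be sketched roughly as below. Everything here is hypothetical: the names `PASSES_BY_RISK`, `effectiveRisk`, and `reviewPasses` are invented, the keyword list is a small subset of the categories the README names, and escalating a keyword hit straight to `high` is an assumption the README does not confirm.

```typescript
type Risk = 'low' | 'medium' | 'high';

// 1 / 2 / 3 sequential review passes for low / medium / high, as the README states.
const PASSES_BY_RISK: Record<Risk, number> = { low: 1, medium: 2, high: 3 };

// Hypothetical subset of the sensitive categories listed in the README.
const SENSITIVE_KEYWORDS = ['auth', 'billing', 'secrets', 'migrations'];

// Assumption: any keyword hit escalates to `high`. The README says escalation
// happens but does not say to which tier.
function effectiveRisk(declared: Risk, specText: string): Risk {
  const lower = specText.toLowerCase();
  return SENSITIVE_KEYWORDS.some((k) => lower.includes(k)) ? 'high' : declared;
}

// Number of review passes a wrapper could require before allowing a merge.
function reviewPasses(declared: Risk, specText: string): number {
  return PASSES_BY_RISK[effectiveRisk(declared, specText)];
}

console.log(reviewPasses('low', 'Fix a typo in the README'));       // 1
console.log(reviewPasses('low', 'Rotate billing webhook secrets')); // 3
```

Plain substring matching is crude (for example, `auth` also matches `author`), so a real gate would likely want word-boundary matching or an explicit tag in the spec frontmatter.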
package/dist/src/cli/examples.d.ts
ADDED
@@ -0,0 +1,15 @@
+/** Supported stack id → relative path under `examples/specs/`. */
+export declare const EXAMPLE_STACKS: Record<string, string>;
+/** Public stack ids in the order we want them listed. */
+export declare const EXAMPLE_STACK_IDS: readonly ["node", "python", "fastapi", "go", "rust"];
+/** Resolve the absolute on-disk path for a stack's spec file. */
+export declare function resolveExamplePath(stack: string): string | null;
+/**
+ * Run the `examples` verb. With no `stack`, prints an intro + a summary card
+ * for each stack (path + first ~5 lines of the spec). With a stack id, prints
+ * the full spec content to stdout (suitable for piping into a file).
+ *
+ * Returns the exit code so the dispatcher can `process.exit(code)`.
+ */
+export declare function runExamples(stack?: string): number;
+//# sourceMappingURL=examples.d.ts.map
package/dist/src/cli/examples.js
ADDED
@@ -0,0 +1,108 @@
+// v7.7.1 — `claude-autopilot examples [<stack>]`
+//
+// Discoverability bridge between `setup` and `scaffold --from-spec`. A new
+// user runs `setup`, then `scaffold --from-spec ???` and has no idea what a
+// spec looks like. This verb prints sample specs — one per supported stack —
+// straight to stdout so the operator can pipe to a file, edit, and feed back
+// into `scaffold`.
+//
+// claude-autopilot examples → list all 5 stacks
+// claude-autopilot examples node → print just the Node spec
+// claude-autopilot examples fastapi > foo.md → spec-as-template via shell
+//
+// The spec files ship in the published tarball via the `files: ["examples/"]`
+// entry in package.json. At runtime we resolve them relative to the package
+// root (found by `findPackageRoot`), so `examples` works whether the CLI is
+// run from source, the built dist/, or a globally installed `npm i -g`
+// invocation.
+import * as fs from 'node:fs';
+import * as path from 'node:path';
+import { findPackageRoot } from "./_pkg-root.js";
+/** Supported stack id → relative path under `examples/specs/`. */
+export const EXAMPLE_STACKS = {
+    node: 'examples/specs/node-cli.md',
+    python: 'examples/specs/python-cli.md',
+    fastapi: 'examples/specs/fastapi.md',
+    go: 'examples/specs/go-cli.md',
+    rust: 'examples/specs/rust-cli.md',
+};
+/** Public stack ids in the order we want them listed. */
+export const EXAMPLE_STACK_IDS = ['node', 'python', 'fastapi', 'go', 'rust'];
+const BOLD = (t) => `\x1b[1m${t}\x1b[0m`;
+const DIM = (t) => `\x1b[2m${t}\x1b[0m`;
+/** Resolve the absolute on-disk path for a stack's spec file. */
+export function resolveExamplePath(stack) {
+    const rel = EXAMPLE_STACKS[stack];
+    if (!rel)
+        return null;
+    const root = findPackageRoot(import.meta.url);
+    if (!root)
+        return null;
+    return path.join(root, rel);
+}
+/** Print the first N non-empty lines of a file for the listing summary. */
+function previewHead(absPath, n) {
+    try {
+        const body = fs.readFileSync(absPath, 'utf8');
+        const lines = body.split('\n');
+        const head = [];
+        for (const line of lines) {
+            head.push(line);
+            if (head.length >= n)
+                break;
+        }
+        return head.join('\n');
+    }
+    catch {
+        return '(failed to read example file)';
+    }
+}
+/**
+ * Run the `examples` verb. With no `stack`, prints an intro + a summary card
+ * for each stack (path + first ~5 lines of the spec). With a stack id, prints
+ * the full spec content to stdout (suitable for piping into a file).
+ *
+ * Returns the exit code so the dispatcher can `process.exit(code)`.
+ */
+export function runExamples(stack) {
+    if (!stack) {
+        console.log('');
+        console.log(BOLD('Sample specs for each supported stack.'));
+        console.log(DIM('Pass `examples <stack>` to print just one. Pipe to a file to use as a template:'));
+        console.log(DIM(' claude-autopilot examples node > docs/specs/my-feature.md'));
+        console.log(DIM(' claude-autopilot scaffold --from-spec docs/specs/my-feature.md'));
+        console.log('');
+        for (const id of EXAMPLE_STACK_IDS) {
+            const abs = resolveExamplePath(id);
+            if (!abs || !fs.existsSync(abs)) {
+                console.log(`${BOLD(id)} ${DIM('(example file not found)')}`);
+                console.log('');
+                continue;
+            }
+            console.log(BOLD(id));
+            console.log(DIM(` ${abs}`));
+            const head = previewHead(abs, 5);
+            for (const line of head.split('\n')) {
+                console.log(` ${line}`);
+            }
+            console.log('');
+        }
+        return 0;
+    }
+    const abs = resolveExamplePath(stack);
+    if (!abs) {
+        process.stderr.write(`\x1b[31m[claude-autopilot] unknown stack "${stack}" — valid: ${EXAMPLE_STACK_IDS.join(', ')}\x1b[0m\n`);
+        return 1;
+    }
+    if (!fs.existsSync(abs)) {
+        process.stderr.write(`\x1b[31m[claude-autopilot] example file missing on disk: ${abs}\x1b[0m\n`);
+        process.stderr.write(`\x1b[2m Did the published tarball include the "examples/" directory? See package.json "files".\x1b[0m\n`);
+        return 1;
+    }
+    const body = fs.readFileSync(abs, 'utf8');
+    process.stdout.write(body);
+    if (!body.endsWith('\n'))
+        process.stdout.write('\n');
+    return 0;
+}
+//# sourceMappingURL=examples.js.map
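The listing preview in the file above takes the first `n` lines of each spec. The sketch below mirrors the loop in `previewHead` but works on a string instead of a file path, so it runs without any bundled files; `headLines` and the sample body are hypothetical. It also shows that the loop as written counts blank lines toward `n`, even though the doc comment on `previewHead` says "non-empty".

```typescript
// Same loop shape as `previewHead` in the diff, but taking the file body as a
// string so the sketch needs no file system access. `headLines` is a made-up name.
function headLines(body: string, n: number): string {
  const head: string[] = [];
  for (const line of body.split('\n')) {
    head.push(line); // every line is kept, including blank ones
    if (head.length >= n) break;
  }
  return head.join('\n');
}

// Hypothetical spec body; the section names follow the changelog's description.
const sample = '# Title\n\n## Goal\nBuild a CLI.\n\n## Files\n';

// The five-line preview ends on a blank line and never reaches "## Files".
console.log(JSON.stringify(headLines(sample, 5)));
```

If the preview is meant to skip blank lines, the loop would need a `line.trim() === ''` check before the push; whether that is intended is not clear from the diff.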
package/dist/src/cli/help-text.js
CHANGED
@@ -29,6 +29,7 @@ export const HELP_GROUPS = [
 { verb: 'init', summary: 'Scaffold guardrail.config.yaml + auto-detect migrate stack (writes .autopilot/stack.md)' },
 { verb: 'setup', summary: 'Auto-detect stack, write config, install pre-push hook' },
 { verb: 'scaffold', summary: 'Scaffold project skeleton from a spec markdown (--from-spec <path> [--stack node|python|fastapi|go|rust])' },
+{ verb: 'examples', summary: 'Print sample specs for each supported stack (use as starter templates for `scaffold --from-spec`)' },
 { verb: 'autopilot', summary: 'Multi-phase orchestrator — run scan → spec → plan → implement under one runId (v6.2.0)' },
 { verb: 'brainstorm', summary: 'Pipeline entry point (Claude Code skill — see /brainstorm)' },
 { verb: 'spec', summary: 'Spec-writing pointer (Claude Code skill — see /brainstorm)' },
@@ -213,6 +214,18 @@ export const HELP_OPTIONS = {
 claude-autopilot autopilot
 claude-autopilot autopilot --budget 25
 claude-autopilot autopilot --phases=scan,spec,plan`,
+examples: `Options (examples):
+[<stack>] Optional: print only the spec for one stack (node|python|fastapi|go|rust)
+With no arg, lists all supported stacks with a 5-line preview each.
+
+Behavior: read-only. Reads bundled spec files from the package's
+\`examples/specs/\` directory and prints them to stdout. Pipe
+to a file to use as a starter template:
+
+claude-autopilot examples node > docs/specs/my-feature.md
+claude-autopilot scaffold --from-spec docs/specs/my-feature.md
+
+Exit codes: 0 success, 1 unknown stack id or missing bundled file.`,
 dashboard: `Options (dashboard):
 login Open browser, mint API key via loopback callback
 logout Revoke server-side, delete local config
package/dist/src/cli/index.js
CHANGED
@@ -226,7 +226,7 @@ These are aliases for the flat subcommands; they still work without the 'advance
 // gc, delete, doctor) are dispatched inside its case block. The singular
 // `run resume` form is handled BEFORE the default `run` -> review dispatch
 // kicks in (see disambiguation block just below).
-const SUBCOMMANDS = ['init', 'run', 'runs', 'scan', 'report', 'explain', 'ignore', 'ci', 'pr', 'fix', 'costs', 'watch', 'hook', 'autoregress', 'baseline', 'triage', 'lsp', 'worker', 'mcp', 'test-gen', 'pr-desc', 'doctor', 'preflight', 'setup', 'council', 'migrate-v4', 'migrate', 'migrate-doctor', 'deploy', 'brainstorm', 'spec', 'plan', 'implement', 'review', 'validate', 'autopilot', 'internal', 'help', '--help', '-h'];
+const SUBCOMMANDS = ['init', 'run', 'runs', 'scan', 'report', 'explain', 'ignore', 'ci', 'pr', 'fix', 'costs', 'watch', 'hook', 'autoregress', 'baseline', 'triage', 'lsp', 'worker', 'mcp', 'test-gen', 'pr-desc', 'doctor', 'preflight', 'setup', 'council', 'migrate-v4', 'migrate', 'migrate-doctor', 'deploy', 'brainstorm', 'spec', 'plan', 'implement', 'review', 'validate', 'autopilot', 'examples', 'internal', 'help', '--help', '-h'];
 const VALUE_FLAGS = ['base', 'config', 'files', 'format', 'output', 'debounce', 'ask', 'focus', 'fail-on', 'note', 'reason', 'expires', 'profile', 'severity', 'prompt', 'context-file', 'path', 'adapter', 'ref', 'sha', 'spec', 'context', 'mode', 'phases', 'budget', 'stack'];
 // Bare invocation — no subcommand, no flags → show welcome guide
 if (args.length === 0) {
@@ -993,6 +993,17 @@ switch (subcommand) {
         process.exit(0);
         break;
     }
+    case 'examples': {
+        // v7.7.1 — `claude-autopilot examples [<stack>]`. Bridges the
+        // discoverability gap between `setup` and `scaffold --from-spec` —
+        // new users don't know what a spec looks like. Optional positional
+        // arg selects a single stack; no arg lists all five.
+        const { runExamples } = await import("./examples.js");
+        const target = args[1] && !args[1].startsWith('--') ? args[1] : undefined;
+        const code = runExamples(target);
+        process.exit(code);
+        break;
+    }
     case 'council': {
         const config = flag('config');
         const prompt = flag('prompt');
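The `examples` case above picks its optional positional argument with a single expression. To make its edge cases easier to see, the sketch below lifts that expression into a standalone helper; `pickStackArg` is a hypothetical name, and the sample `args` arrays assume the subcommand sits at index 0, as the `args[1]` lookup suggests.

```typescript
// Same expression as in the `examples` case of the dispatcher, wrapped in a
// hypothetical helper so it can be exercised without running the CLI.
function pickStackArg(args: readonly string[]): string | undefined {
  return args[1] && !args[1].startsWith('--') ? args[1] : undefined;
}

console.log(pickStackArg(['examples', 'fastapi'])); // fastapi
console.log(pickStackArg(['examples']));            // undefined: lists all stacks
console.log(pickStackArg(['examples', '--help']));  // undefined: long flags are skipped
console.log(pickStackArg(['examples', '-h']));      // -h: single-dash tokens pass through
```

Taken on its own, the expression treats a single-dash token such as `-h` as a stack id, which `runExamples` would then reject as unknown. Whether the dispatcher intercepts `-h` before reaching this case is not visible in this diff.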
package/dist/src/core/run-state/sameness-detector.d.ts
ADDED
@@ -0,0 +1,82 @@
+/** Which loop in the autopilot pipeline produced the failure. The three
+ * Step-4 / Step-7 / Step-8 retry loops in `skills/autopilot/SKILL.md` are
+ * the call sites that consult the detector before consuming a retry. */
+export type FailurePhase = 'validate' | 'codex-review' | 'bugbot';
+/** A normalized identity for a single failure occurrence. Two fingerprints
+ * are "the same failure" iff their `hash` is equal — phase, errorType,
+ * errorLocation, and the truncated/normalized errorMessage all feed into
+ * the hash, so any meaningful change between retries produces a new hash. */
+export interface FailureFingerprint {
+    phase: FailurePhase;
+    /** Discriminator inside the phase. Examples:
+     * - validate: 'tsc_error' | 'test_failure' | 'lint_error'
+     * - codex-review: 'codex_critical' | 'codex_warning'
+     * - bugbot: 'bugbot_high' | 'bugbot_medium' */
+    errorType: string;
+    /** Where the failure points. `file:line` for tsc/lint, test name for tests,
+     * finding-id for codex, comment-id for bugbot. Whatever uniquely locates
+     * the problem within the phase. */
+    errorLocation: string;
+    /** First 200 chars of the canonical message, whitespace-collapsed. The
+     * truncation is what makes the fingerprint stable across runs that differ
+     * only in trailing stack-frame noise. */
+    errorMessage: string;
+    /** sha256 hex of `${phase}|${errorType}|${errorLocation}|${errorMessage}`.
+     * This is the equality key. */
+    hash: string;
+}
+/** Maximum length of the normalized error message that feeds the hash.
+ * Anything beyond this is dropped — picked to match the issue spec and to
+ * keep the hash stable across runs that differ only in trailing noise. */
+export declare const FINGERPRINT_MESSAGE_MAX = 200;
+export interface ComputeFingerprintInput {
+    phase: FailurePhase;
+    errorType: string;
+    errorLocation: string;
+    errorMessage: string;
+}
+/** Strip known volatile / per-run tokens from a free-form string so that two
+ * retries that differ only in transient data (UUIDs, ports, epoch
+ * timestamps, ISO timestamps, hex SHAs, absolute temp paths) produce the
+ * same canonical form. Order matters — broader patterns run first so they
+ * can swallow embedded delimiters before narrower patterns see them.
+ *
+ * Exported because callers building locations/messages outside this module
+ * may want to apply the same scrubbing before constructing a fingerprint
+ * (e.g. when assembling an `errorLocation` from a tool output that embeds
+ * a run-id). */
+export declare function stripVolatileTokens(s: string): string;
+/** Compute a stable fingerprint for a single failure occurrence. The
+ * returned `hash` is the equality key — two failures with equal hashes
+ * are considered "the same failure" for retry-loop escalation purposes. */
+export declare function computeFingerprint(input: ComputeFingerprintInput): FailureFingerprint;
+/** Compare two fingerprints. They are "the same failure" iff their hashes
+ * match — phase/type/location/message all feed the hash, so equal hash
+ * means equal across all observable identity. */
+export declare function isSameFailure(a: FailureFingerprint, b: FailureFingerprint): boolean;
+export interface EscalationDecision {
+    /** True iff the caller should STOP consuming retries and surface to a
+     * human, because the last two recorded attempts produced the same
+     * failure (no progress between retries). */
+    escalate: boolean;
+    /** Set when `escalate` is true — human-readable explanation. */
+    reason?: string;
+    /** Set when `escalate` is true — the offending fingerprint that fired
+     * twice. Callers should display this to the operator so they can see
+     * what's stuck. */
+    fingerprint?: FailureFingerprint;
+}
+/** Decide whether to escalate a retry loop to a human, given the history
+ * of failure fingerprints recorded so far in this retry loop.
+ *
+ * Rule (per issue #181): escalate iff `history.length >= 2` AND the last
+ * two fingerprints have identical hashes. Anything else — first failure,
+ * different failures across retries, longer streak of different failures
+ * — returns `{ escalate: false }`.
+ *
+ * Rationale: a single retry on the SAME failure means we tried, fixed
+ * nothing, and failed identically. Retries that keep failing on
+ * *different* things are still making progress (each one is a new fix).
+ * Only no-progress retries should consume the escalation budget. */
+export declare function shouldEscalate(history: readonly FailureFingerprint[]): EscalationDecision;
+//# sourceMappingURL=sameness-detector.d.ts.map
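The doc comment on `shouldEscalate` above states the rule as: escalate if and only if the history has at least two entries and the last two hashes are identical. The sketch below re-implements that rule locally and wraps it in a retry loop, purely to illustrate the control flow. It does not import the package: `Fingerprint`, `Decision`, `shouldEscalateSketch`, and `runWithRetries` are all hypothetical stand-ins, and representing a failed attempt by its hash string is an assumption made to keep the example small.

```typescript
// Minimal stand-ins for the declared types; only `hash` matters for the rule.
interface Fingerprint { hash: string }
interface Decision { escalate: boolean; fingerprint?: Fingerprint }

// Local re-implementation of the documented rule, for illustration only.
function shouldEscalateSketch(history: readonly Fingerprint[]): Decision {
  if (history.length < 2) return { escalate: false };
  const last = history[history.length - 1]!;
  const prev = history[history.length - 2]!;
  return last.hash === prev.hash
    ? { escalate: true, fingerprint: last }
    : { escalate: false };
}

type Outcome = 'ok' | 'escalated' | 'exhausted';

// Hypothetical retry loop. `attempt` returns a failure hash, or null on success.
// The history is local to this loop, matching the in-memory-only note in the changelog.
function runWithRetries(attempt: (n: number) => string | null, maxRetries: number): Outcome {
  const history: Fingerprint[] = [];
  for (let n = 0; n <= maxRetries; n++) {
    const failure = attempt(n);
    if (failure === null) return 'ok';
    history.push({ hash: failure });
    // Stop early when the last two failures are identical, even with retries left.
    if (shouldEscalateSketch(history).escalate) return 'escalated';
  }
  return 'exhausted';
}

console.log(runWithRetries(() => 'same-hash', 5));    // escalated (after the second attempt)
console.log(runWithRetries((n) => `hash-${n}`, 2));   // exhausted (every failure differs)
```

Because only the last two entries are compared, an A, B, A sequence does not escalate, which matches the "ABA pattern" edge case the changelog lists among the tests.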
package/dist/src/core/run-state/sameness-detector.js
ADDED
@@ -0,0 +1,146 @@
+// src/core/run-state/sameness-detector.ts
+//
+// Retry-loop sameness detector — escalates when the same failure fingerprint
+// fires twice in a row during a retry loop (validate, codex PR review, or
+// bugbot). The pipeline halts when retries make no progress, even if you have
+// retries remaining.
+//
+// Issue: #181 (v7.11.0 — released as v7.10.0).
+//
+// Design:
+// - `FailureFingerprint` is a hashable identity for a failure. Same hash
+// across two attempts means "we tried, failed for the same reason, fixed
+// nothing". That is the signal to stop burning retries and surface to a
+// human.
+// - Storage is in-memory only. The v6 run-state events.ndjson integration
+// is tracked separately as issue #180; explicitly deferred here so the
+// pipeline can adopt the detector without waiting on persistence.
+// - All functions are pure (modulo `crypto.createHash`), making this easy
+// to unit-test under node:test.
+//
+// Who calls this:
+// The detector is consumed by the autopilot skill agent (an LLM following
+// `skills/autopilot/SKILL.md`), NOT by the `scripts/validate.ts`,
+// `scripts/codex-pr-review.ts`, or `scripts/bugbot.ts` CLI scripts. Those
+// scripts are stateless per-invocation; the retry loop lives one layer
+// above them, inside the skill execution. Wiring this into the CLIs would
+// not catch repeated failures because each CLI invocation is a clean
+// process. The skill agent is the durable retry-loop scope.
+import { createHash } from 'node:crypto';
+/** Maximum length of the normalized error message that feeds the hash.
+ * Anything beyond this is dropped — picked to match the issue spec and to
+ * keep the hash stable across runs that differ only in trailing noise. */
+export const FINGERPRINT_MESSAGE_MAX = 200;
+/** Strip known volatile / per-run tokens from a free-form string so that two
+ * retries that differ only in transient data (UUIDs, ports, epoch
+ * timestamps, ISO timestamps, hex SHAs, absolute temp paths) produce the
+ * same canonical form. Order matters — broader patterns run first so they
+ * can swallow embedded delimiters before narrower patterns see them.
+ *
+ * Exported because callers building locations/messages outside this module
+ * may want to apply the same scrubbing before constructing a fingerprint
+ * (e.g. when assembling an `errorLocation` from a tool output that embeds
+ * a run-id). */
+export function stripVolatileTokens(s) {
+    if (typeof s !== 'string')
+        return '';
+    return (s
+        // ISO-8601 timestamps (e.g. 2026-05-13T07:00:00.000Z)
|
|
49
|
+
.replace(/\b\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d{1,6})?(?:Z|[+-]\d{2}:?\d{2})?\b/g, '<ts>')
|
|
50
|
+
// 13-digit epoch ms
|
|
51
|
+
.replace(/\b\d{13}\b/g, '<ts>')
|
|
52
|
+
// UUIDs (any version; matches the 8-4-4-4-12 hex shape)
|
|
53
|
+
.replace(/\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b/g, '<uuid>')
|
|
54
|
+
// 40-char (sha1) or 64-char (sha256) hex digests
|
|
55
|
+
.replace(/\b[0-9a-fA-F]{40}\b/g, '<sha>')
|
|
56
|
+
.replace(/\b[0-9a-fA-F]{64}\b/g, '<sha>')
|
|
57
|
+
// macOS / Linux temp paths (/tmp, /var/folders) up to the next whitespace or quote character
|
|
58
|
+
.replace(/\/(?:tmp|var\/folders)\/[^\s'"`]+/g, '<tmpdir>')
|
|
59
|
+
// localhost ports like :49213
|
|
60
|
+
.replace(/\b(?:127\.0\.0\.1|localhost):\d{2,5}\b/g, '<host:port>'));
|
|
61
|
+
}
|
|
62
|
+
/** Normalize a free-form error message: strip volatile tokens, trim, collapse
|
|
63
|
+
* all runs of whitespace (including newlines/tabs) to single spaces, and
|
|
64
|
+
* truncate to `FINGERPRINT_MESSAGE_MAX` characters. The truncation is what
|
|
65
|
+
* makes the fingerprint stable across runs whose messages differ only in
|
|
66
|
+
* trailing stack-frame noise. */
|
|
67
|
+
function normalizeMessage(msg) {
|
|
68
|
+
if (typeof msg !== 'string') {
|
|
69
|
+
return '';
|
|
70
|
+
}
|
|
71
|
+
const scrubbed = stripVolatileTokens(msg);
|
|
72
|
+
const collapsed = scrubbed.replace(/\s+/g, ' ').trim();
|
|
73
|
+
if (collapsed.length <= FINGERPRINT_MESSAGE_MAX) {
|
|
74
|
+
return collapsed;
|
|
75
|
+
}
|
|
76
|
+
return collapsed.slice(0, FINGERPRINT_MESSAGE_MAX);
|
|
77
|
+
}
|
|
78
|
+
/** Normalize an `errorLocation` (file path / test name / etc.) by applying
|
|
79
|
+
* the same volatile-token scrubbing as the message, plus whitespace trim.
|
|
80
|
+
* Does NOT truncate — locations are short by construction. */
|
|
81
|
+
function normalizeLocation(loc) {
|
|
82
|
+
if (typeof loc !== 'string')
|
|
83
|
+
return '';
|
|
84
|
+
return stripVolatileTokens(loc).trim();
|
|
85
|
+
}
|
|
86
|
+
/** Compute a stable fingerprint for a single failure occurrence. The
|
|
87
|
+
* returned `hash` is the equality key — two failures with equal hashes
|
|
88
|
+
* are considered "the same failure" for retry-loop escalation purposes. */
|
|
89
|
+
export function computeFingerprint(input) {
|
|
90
|
+
const phase = input.phase;
|
|
91
|
+
const errorType = (input.errorType ?? '').toString();
|
|
92
|
+
const errorLocation = normalizeLocation((input.errorLocation ?? '').toString());
|
|
93
|
+
const errorMessage = normalizeMessage(input.errorMessage ?? '');
|
|
94
|
+
// Use JSON.stringify of a 4-tuple as the canonical pre-hash serialization.
|
|
95
|
+
// This is unambiguous under any field content — pipe characters, quotes,
|
|
96
|
+
// braces, embedded JSON, etc. all serialize unambiguously and cannot
|
|
97
|
+
// produce collisions across different `[phase, type, location, message]`
|
|
98
|
+
// tuples. (Earlier drafts used a pipe-delimited string; that was vulnerable
|
|
99
|
+
// to delimiter ambiguity when, e.g., a test name legitimately contained
|
|
100
|
+
// '|'. The JSON form has no such edge case.)
|
|
101
|
+
const canonical = JSON.stringify([phase, errorType, errorLocation, errorMessage]);
|
|
102
|
+
const hash = createHash('sha256').update(canonical).digest('hex');
|
|
103
|
+
return { phase, errorType, errorLocation, errorMessage, hash };
|
|
104
|
+
}
|
|
105
|
+
/** Compare two fingerprints. They are "the same failure" iff their hashes
|
|
106
|
+
* match — phase/type/location/message all feed the hash, so equal hash
|
|
107
|
+
* means equal across all observable identity. */
|
|
108
|
+
export function isSameFailure(a, b) {
|
|
109
|
+
if (!a || !b)
|
|
110
|
+
return false;
|
|
111
|
+
return a.hash === b.hash;
|
|
112
|
+
}
|
|
113
|
+
/** Decide whether to escalate a retry loop to a human, given the history
|
|
114
|
+
* of failure fingerprints recorded so far in this retry loop.
|
|
115
|
+
*
|
|
116
|
+
* Rule (per issue #181): escalate iff `history.length >= 2` AND the last
|
|
117
|
+
* two fingerprints have identical hashes. Anything else — first failure,
|
|
118
|
+
* different failures across retries, longer streak of different failures
|
|
119
|
+
* — returns `{ escalate: false }`.
|
|
120
|
+
*
|
|
121
|
+
* Rationale: a single retry on the SAME failure means we tried, fixed
|
|
122
|
+
* nothing, and failed identically. Retries that keep failing on
|
|
123
|
+
* *different* things are still making progress (each one is a new fix).
|
|
124
|
+
* Only no-progress retries should consume the escalation budget. */
|
|
125
|
+
export function shouldEscalate(history) {
|
|
126
|
+
if (!Array.isArray(history) || history.length < 2) {
|
|
127
|
+
return { escalate: false };
|
|
128
|
+
}
|
|
129
|
+
const last = history[history.length - 1];
|
|
130
|
+
const prev = history[history.length - 2];
|
|
131
|
+
if (!last || !prev) {
|
|
132
|
+
return { escalate: false };
|
|
133
|
+
}
|
|
134
|
+
if (isSameFailure(prev, last)) {
|
|
135
|
+
return {
|
|
136
|
+
escalate: true,
|
|
137
|
+
reason: `Retry loop produced the same failure twice in a row ` +
|
|
138
|
+
`(phase=${last.phase}, errorType=${last.errorType}, ` +
|
|
139
|
+
`errorLocation=${last.errorLocation}). The pipeline is not making ` +
|
|
140
|
+
`progress — surfacing to human instead of consuming another retry.`,
|
|
141
|
+
fingerprint: last,
|
|
142
|
+
};
|
|
143
|
+
}
|
|
144
|
+
return { escalate: false };
|
|
145
|
+
}
|
|
146
|
+
//# sourceMappingURL=sameness-detector.js.map
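To show why the scrubbing step matters, here is a minimal sketch of two failure messages that differ only in a port and a run UUID hashing to the same value. It is a standalone approximation, not the module itself: `scrub` copies just two of the patterns from `stripVolatileTokens` above, and `fingerprintHash` is a hypothetical helper that follows the same JSON 4-tuple plus SHA-256 scheme. The sample messages and paths are made up.

```ts
import { createHash } from 'node:crypto';

// Local copy of two scrubbing patterns from the module (UUID and localhost
// port), so the sketch runs without the package installed.
function scrub(s: string): string {
  return s
    .replace(/\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b/g, '<uuid>')
    .replace(/\b(?:127\.0\.0\.1|localhost):\d{2,5}\b/g, '<host:port>');
}

// Same canonical form as the module: JSON-serialized 4-tuple, then sha256.
function fingerprintHash(phase: string, type: string, location: string, message: string): string {
  const normalized = scrub(message).replace(/\s+/g, ' ').trim().slice(0, 200);
  const canonical = JSON.stringify([phase, type, scrub(location).trim(), normalized]);
  return createHash('sha256').update(canonical).digest('hex');
}

const first = fingerprintHash('validate', 'test_failure', 'tests/api.test.ts',
  'connect ECONNREFUSED localhost:49213 (run 123e4567-e89b-12d3-a456-426614174000)');
const second = fingerprintHash('validate', 'test_failure', 'tests/api.test.ts',
  'connect ECONNREFUSED localhost:50871 (run 9b2c1d3e-0f4a-4b6c-8d7e-1a2b3c4d5e6f)');
const other = fingerprintHash('validate', 'tsc_error', 'src/index.ts:10', 'TS2322: Type mismatch');

console.log(first === second); // true: only volatile tokens differed
console.log(first === other);  // false: different type, location, and message
```

Because the error type and location also feed the hash, two failures with the same message but a different location would still count as different failures.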
|
|
@@ -0,0 +1,37 @@
|
|
|
1
|
+
# tasks-api — FastAPI service
|
|
2
|
+
|
|
3
|
+
## Goal
|
|
4
|
+
|
|
5
|
+
A small FastAPI HTTP service exposing a `/tasks` CRUD endpoint with
|
|
6
|
+
in-memory storage. Demonstrates the FastAPI scaffold path:
|
|
7
|
+
`pyproject.toml` + `src/<pkg>/main.py` layout, dependencies on
|
|
8
|
+
`fastapi` and `uvicorn[standard]`, and pytest with `httpx.AsyncClient`
|
|
9
|
+
for endpoint tests.
|
|
10
|
+
|
|
11
|
+
The scaffolder auto-classifies as FastAPI when prose mentions
|
|
12
|
+
`fastapi` AND a `main.py` is listed.
|
|
13
|
+
|
|
14
|
+
## Files
|
|
15
|
+
|
|
16
|
+
* `pyproject.toml` — hatchling build backend, depends on `fastapi`, `uvicorn[standard]`, and `httpx` (test-only)
|
|
17
|
+
* `requirements.txt` — Pinned lock file
|
|
18
|
+
* `.gitignore` — `__pycache__/`, `.venv/`, `dist/`, `*.egg-info/`, `.pytest_cache/`
|
|
19
|
+
* `src/tasks_api/__init__.py` — Package marker
|
|
20
|
+
* `src/tasks_api/main.py` — FastAPI app (`app = FastAPI()`), defines `GET/POST/DELETE /tasks`
|
|
21
|
+
* `src/tasks_api/models.py` — Pydantic `Task` model
|
|
22
|
+
* `tests/test_api.py` — Async pytest tests using `httpx.AsyncClient(app=app)`
|
|
23
|
+
* `README.md` — Usage: `uvicorn tasks_api.main:app --reload`
|
|
24
|
+
|
|
25
|
+
## How to use
|
|
26
|
+
|
|
27
|
+
```bash
|
|
28
|
+
claude-autopilot examples fastapi > spec.md
|
|
29
|
+
claude-autopilot scaffold --from-spec spec.md
|
|
30
|
+
python -m pip install -e .
|
|
31
|
+
pytest
|
|
32
|
+
uvicorn tasks_api.main:app --reload
|
|
33
|
+
```
|
|
34
|
+
|
|
35
|
+
The scaffolder writes a FastAPI skeleton with one working endpoint as
|
|
36
|
+
a starting point. Replace the in-memory dict with a real backing
|
|
37
|
+
store (SQLite, Postgres, Redis) when you outgrow it.
|
|
@@ -0,0 +1,32 @@
|
|
|
1
|
+
# greet — Go 1.22 CLI
|
|
2
|
+
|
|
3
|
+
## Goal
|
|
4
|
+
|
|
5
|
+
A small Go CLI that prints a greeting. Demonstrates the standard Go
|
|
6
|
+
module layout: `go.mod` at the repo root, `main.go` with a `main()`
|
|
7
|
+
function, and table-driven tests in `main_test.go`. No external
|
|
8
|
+
dependencies — uses only `fmt`, `os`, `flag` from the standard library.
|
|
9
|
+
|
|
10
|
+
This is the simplest Go scaffold target. For a multi-binary repo,
|
|
11
|
+
add `cmd/<name>/main.go` paths to the spec instead of root `main.go`.
|
|
12
|
+
|
|
13
|
+
## Files
|
|
14
|
+
|
|
15
|
+
* `go.mod` — `module greet`, `go 1.22`, no `require` block (stdlib only)
|
|
16
|
+
* `.gitignore` — `vendor/`, `*.test`, `*.out`, binary output names
|
|
17
|
+
* `main.go` — `package main`, defines `main()` and a pure `greet(name string) string`
|
|
18
|
+
* `main_test.go` — Table-driven tests for `greet()` plus a smoke test for `main()`
|
|
19
|
+
* `README.md` — Usage example: `go run . --name=World`
|
|
20
|
+
|
|
21
|
+
## How to use
|
|
22
|
+
|
|
23
|
+
```bash
|
|
24
|
+
claude-autopilot examples go > spec.md
|
|
25
|
+
claude-autopilot scaffold --from-spec spec.md
|
|
26
|
+
go test ./...
|
|
27
|
+
go run . --name=World
|
|
28
|
+
```
|
|
29
|
+
|
|
30
|
+
The scaffolder writes a working "hello, name" skeleton with a
|
|
31
|
+
table-driven test you can extend. Edit `go.mod`'s module path before
|
|
32
|
+
publishing (e.g. `module github.com/you/greet`).
|
|
@@ -0,0 +1,36 @@
|
|
|
1
|
+
# url-summarizer — Node 22 ESM CLI
|
|
2
|
+
|
|
3
|
+
## Goal
|
|
4
|
+
|
|
5
|
+
A small Node 22 ESM CLI that takes a URL on the command line, fetches
|
|
6
|
+
the page, calls an LLM for a 3-bullet markdown summary, and prints the
|
|
7
|
+
result to stdout. Demonstrates the v7.1.6 benchmark layout: a thin
|
|
8
|
+
`bin/` entry that imports a pure handler from `src/`, JS-only with
|
|
9
|
+
`tsconfig.json` set up for `allowJs + checkJs + noEmit` typechecking.
|
|
10
|
+
|
|
11
|
+
This is the simplest scaffold target — it lights up the Node ESM path
|
|
12
|
+
in `claude-autopilot scaffold --from-spec`.
|
|
13
|
+
|
|
14
|
+
## Files
|
|
15
|
+
|
|
16
|
+
* `package.json` — Node 22 ESM, `type: "module"`, bin: { url-summarizer: bin/url-summarizer.js }, scripts: { test: "node --test --import=tsx tests/", typecheck: "tsc --noEmit" }
|
|
17
|
+
* `tsconfig.json` — `allowJs + checkJs + noEmit`, `types: ["node"]`
|
|
18
|
+
* `.gitignore` — `node_modules/`, `.env.local`, `.guardrail-cache/`
|
|
19
|
+
* `bin/url-summarizer.js` — CLI entry; parses `argv`, calls handler, prints result
|
|
20
|
+
* `src/handler.js` — Pure async function `summarize(url): Promise<string>` (fetch + LLM call)
|
|
21
|
+
* `tests/handler.test.js` — Unit tests with mocked fetch + LLM
|
|
22
|
+
* `tests/cli.test.js` — CLI subprocess tests (uses `child_process.spawn`, NOT `spawnSync`)
|
|
23
|
+
* `README.md` — Usage example: `node bin/url-summarizer.js https://example.com`
|
|
24
|
+
|
|
25
|
+
## How to use
|
|
26
|
+
|
|
27
|
+
```bash
|
|
28
|
+
claude-autopilot examples node > spec.md
|
|
29
|
+
claude-autopilot scaffold --from-spec spec.md
|
|
30
|
+
npm test
|
|
31
|
+
```
|
|
32
|
+
|
|
33
|
+
The scaffolder writes a working skeleton; you (or your impl agent) fill
|
|
34
|
+
in the handler body. The pre-existing tests should fail until the
|
|
35
|
+
handler is implemented — that's intentional, it gives the impl agent a
|
|
36
|
+
clear target.
|
|
@@ -0,0 +1,35 @@
|
|
|
1
|
+
# wordcount — Python 3.11+ CLI
|
|
2
|
+
|
|
3
|
+
## Goal
|
|
4
|
+
|
|
5
|
+
A bare Python CLI that counts words in a text file. Demonstrates the
|
|
6
|
+
modern `pyproject.toml` + `src/<pkg>/` layout (the import-isolation
|
|
7
|
+
pattern that pytest + hatchling both recommend), with a console
|
|
8
|
+
script entry point and pytest-driven tests.
|
|
9
|
+
|
|
10
|
+
No FastAPI, no web framework — this is the path for "Python script /
|
|
11
|
+
library / CLI" specs. For HTTP services see `fastapi.md`.
|
|
12
|
+
|
|
13
|
+
## Files
|
|
14
|
+
|
|
15
|
+
* `pyproject.toml` — `[project]` table with hatchling build backend, `requires-python = ">=3.11"`, `dependencies = []`, `[project.scripts] wordcount = "wordcount.cli:main"`
|
|
16
|
+
* `requirements.txt` — Lock file; pinned via `pip-compile` or `uv pip compile`
|
|
17
|
+
* `.gitignore` — `__pycache__/`, `.venv/`, `dist/`, `*.egg-info/`, `.pytest_cache/`
|
|
18
|
+
* `src/wordcount/__init__.py` — Package marker
|
|
19
|
+
* `src/wordcount/cli.py` — `main()` entry; parses `argv`, calls counter, prints result
|
|
20
|
+
* `src/wordcount/counter.py` — Pure function `count_words(text: str) -> int`
|
|
21
|
+
* `tests/test_counter.py` — Unit tests for the counter
|
|
22
|
+
* `tests/test_cli.py` — CLI smoke test via `subprocess.run`
|
|
23
|
+
* `README.md` — Usage example: `wordcount path/to/file.txt`
|
|
24
|
+
|
|
25
|
+
## How to use
|
|
26
|
+
|
|
27
|
+
```bash
|
|
28
|
+
claude-autopilot examples python > spec.md
|
|
29
|
+
claude-autopilot scaffold --from-spec spec.md
|
|
30
|
+
python -m pip install -e .
|
|
31
|
+
pytest
|
|
32
|
+
```
|
|
33
|
+
|
|
34
|
+
The scaffolder writes the package skeleton + a pinned `requirements.txt`
|
|
35
|
+
stub. Add your runtime deps to `pyproject.toml` then re-lock.
|
|
@@ -0,0 +1,34 @@
|
|
|
1
|
+
# greet — Rust 2021 binary crate
|
|
2
|
+
|
|
3
|
+
## Goal
|
|
4
|
+
|
|
5
|
+
A small Rust binary crate that prints a greeting. Demonstrates the
|
|
6
|
+
default `cargo init` layout: `Cargo.toml` + `src/main.rs` for the
|
|
7
|
+
binary target + `tests/integration_test.rs` for the integration
|
|
8
|
+
smoke test. No external crate dependencies — uses only `std`.
|
|
9
|
+
|
|
10
|
+
This is the v7.7.0 binary-only path. For a library, list ONLY
|
|
11
|
+
`src/lib.rs` (no `main.rs`) and the scaffolder switches to library
|
|
12
|
+
mode and excludes `Cargo.lock` via `.gitignore` per Cargo's
|
|
13
|
+
documented convention.
|
|
14
|
+
|
|
15
|
+
## Files
|
|
16
|
+
|
|
17
|
+
* `Cargo.toml` — `[package]` name = "greet", edition = "2021", no `[dependencies]` block (stdlib only)
|
|
18
|
+
* `.gitignore` — `target/` (Cargo.lock NOT excluded — binary crate commits it)
|
|
19
|
+
* `src/main.rs` — `fn main()` plus a pure `fn greet(name: &str) -> String`
|
|
20
|
+
* `tests/integration_test.rs` — Integration smoke test that invokes the binary via `assert_cmd` (or `std::process::Command` for stdlib-only)
|
|
21
|
+
* `README.md` — Usage example: `cargo run -- --name=World`
|
|
22
|
+
|
|
23
|
+
## How to use
|
|
24
|
+
|
|
25
|
+
```bash
|
|
26
|
+
claude-autopilot examples rust > spec.md
|
|
27
|
+
claude-autopilot scaffold --from-spec spec.md
|
|
28
|
+
cargo test
|
|
29
|
+
cargo run -- --name=World
|
|
30
|
+
```
|
|
31
|
+
|
|
32
|
+
The scaffolder writes a working "hello, name" binary skeleton with an
|
|
33
|
+
integration test stub you can extend. Rename the `[package].name` in
|
|
34
|
+
`Cargo.toml` before publishing to crates.io.
|
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "@delegance/claude-autopilot",
|
|
3
|
-
"version": "7.
|
|
3
|
+
"version": "7.10.1",
|
|
4
4
|
"type": "module",
|
|
5
5
|
"publishConfig": {
|
|
6
6
|
"tag": "next"
|
|
@@ -15,7 +15,17 @@
|
|
|
15
15
|
"llm",
|
|
16
16
|
"sarif",
|
|
17
17
|
"cli",
|
|
18
|
-
"pipeline"
|
|
18
|
+
"pipeline",
|
|
19
|
+
"coding-agent",
|
|
20
|
+
"devin-alternative",
|
|
21
|
+
"cursor-alternative",
|
|
22
|
+
"autonomous-coding",
|
|
23
|
+
"multi-model",
|
|
24
|
+
"codex",
|
|
25
|
+
"mit-license",
|
|
26
|
+
"local-first",
|
|
27
|
+
"developer-tools",
|
|
28
|
+
"ci-cd"
|
|
19
29
|
],
|
|
20
30
|
"license": "MIT",
|
|
21
31
|
"workspaces": [
|
|
@@ -39,6 +49,10 @@
|
|
|
39
49
|
"types": "./dist/src/index.d.ts",
|
|
40
50
|
"default": "./dist/src/index.js"
|
|
41
51
|
},
|
|
52
|
+
"./run-state/sameness-detector": {
|
|
53
|
+
"types": "./dist/src/core/run-state/sameness-detector.d.ts",
|
|
54
|
+
"default": "./dist/src/core/run-state/sameness-detector.js"
|
|
55
|
+
},
|
|
42
56
|
"./bin/claude-autopilot.js": "./bin/claude-autopilot.js",
|
|
43
57
|
"./bin/guardrail.js": "./bin/guardrail.js",
|
|
44
58
|
"./package.json": "./package.json"
|
|
@@ -48,6 +62,7 @@
|
|
|
48
62
|
"dist/**/*.js",
|
|
49
63
|
"dist/**/*.d.ts",
|
|
50
64
|
"dist/**/*.json",
|
|
65
|
+
"examples/",
|
|
51
66
|
"presets/",
|
|
52
67
|
"skills/",
|
|
53
68
|
"scripts/test-runner.mjs",
|
|
@@ -80,7 +95,7 @@
|
|
|
80
95
|
"ulid": "^3.0.2"
|
|
81
96
|
},
|
|
82
97
|
"optionalDependencies": {
|
|
83
|
-
"@anthropic-ai/sdk": "^0.
|
|
98
|
+
"@anthropic-ai/sdk": "^0.96.0",
|
|
84
99
|
"@google/generative-ai": "^0.24.1",
|
|
85
100
|
"@modelcontextprotocol/sdk": "^1.29.0",
|
|
86
101
|
"@supabase/supabase-js": "^2.97.0",
|
|
@@ -226,6 +226,13 @@ The `validate.ts` Phase 1 includes a **tsc regression check**: it runs `npx tsc
|
|
|
226
226
|
If either FAIL:
|
|
227
227
|
- Read findings / validation report at `.claude/validation-report.json`
|
|
228
228
|
- Fix the blocking issues
|
|
229
|
+
- **Before consuming a retry, compute the failure fingerprint** with `computeFingerprint({ phase: 'validate', errorType, errorLocation, errorMessage })` from `src/core/run-state/sameness-detector.ts`:
|
|
230
|
+
- `errorType`: `tsc_error` | `test_failure` | `lint_error` (whichever class caused the FAIL)
|
|
231
|
+
- `errorLocation`: prefer stable identifiers — `<repo-relative-path>:<line>` for tsc/lint, test-suite-name+test-name for tests. Avoid absolute paths and transient temp dirs; use the repo-relative form so the hash survives across working-tree moves.
|
|
232
|
+
- `errorMessage`: first 200 chars of the canonical message. Strip any embedded UUIDs, ports, or epoch timestamps from the message before computing — those rotate per run and will mask true sameness.
|
|
233
|
+
|
|
234
|
+
**Known limitation:** if the underlying fix shifts line numbers, two retries can produce different fingerprints even when the root cause is identical (false negative — no escalation). The current behavior is conservative: it never falsely escalates, but it may let the loop run its full retry budget on a shifting-line-number failure. If you see this pattern, prefer test name + diagnostic code over `file:line`.
|
|
235
|
+
- Append the fingerprint to an in-memory list for this retry loop, then call `shouldEscalate(history)`. If `escalate === true` (the last two attempts produced the same fingerprint), STOP — do not consume another retry. Surface the matching fingerprint to the user and stop the pipeline.
|
|
229
236
|
- Re-run the failing check
|
|
230
237
|
- Max 3 retry iterations
|
|
231
238
|
|
|
@@ -313,6 +320,7 @@ npx tsx scripts/codex-pr-review.ts <pr-number>
|
|
|
313
320
|
Posts Codex review as a GitHub PR comment. **This serves as the second risk-tiered pass for medium-risk specs and the third pass for high-risk specs.** Remediate CRITICAL findings:
|
|
314
321
|
- Fix on the branch
|
|
315
322
|
- Push
|
|
323
|
+
- **Before consuming a re-review iteration, compute the failure fingerprint** with `computeFingerprint({ phase: 'codex-review', errorType: 'codex_critical', errorLocation, errorMessage })` from `src/core/run-state/sameness-detector.ts`. For `errorLocation` prefer a stable composite — `${normalized-title}::${repo-relative-path}` — over the transport-level finding ID (Codex regenerates finding IDs on each review run, so the raw ID is not stable). For `errorMessage` use the first sentence of the finding body, stripping any per-run identifiers (run-id, timestamps, file SHAs). Append to an in-memory list for this retry loop and call `shouldEscalate(history)`. If `escalate === true` — the same critical finding fired twice after a remediation attempt — STOP and surface to the user. Do not consume another re-review.
|
|
316
324
|
- Re-run Codex review
|
|
317
325
|
- Max 2 iterations
|
|
318
326
|
|
|
@@ -327,6 +335,7 @@ npx tsx scripts/bugbot.ts --pr <pr-number>
|
|
|
327
335
|
Triages each finding (real bug vs false positive), auto-fixes real bugs, dismisses false positives with GitHub replies. If fixes applied:
|
|
328
336
|
- Push
|
|
329
337
|
- Wait for new bugbot comments (30s)
|
|
338
|
+
- **Before consuming a bugbot retry, compute the failure fingerprint** with `computeFingerprint({ phase: 'bugbot', errorType: <'bugbot_high' | 'bugbot_medium'>, errorLocation, errorMessage })` from `src/core/run-state/sameness-detector.ts`. For `errorLocation` prefer the stable `<repo-relative-path>:<line>` rather than the GitHub comment ID — Cursor reposts findings with fresh comment IDs after each push, so comment ID is not stable across retries. The comment ID can be surfaced as metadata for human inspection but should not enter the hash. For `errorMessage` use the first 200 chars of the finding body. Append to an in-memory list for this retry loop and call `shouldEscalate(history)`. If `escalate === true` — a "fixed" finding re-fired identically — STOP and surface the fingerprint to the user. Do not consume another round.
|
|
330
339
|
- Re-run /bugbot
|
|
331
340
|
- Max 3 rounds
|
|
332
341
|
|
|
@@ -340,6 +349,33 @@ Tell the user:
|
|
|
340
349
|
- Bugbot triage summary (fixed / dismissed / needs-human)
|
|
341
350
|
- Any human-required items that couldn't be auto-resolved
|
|
342
351
|
|
|
352
|
+
## Retry-loop sameness detector (Steps 4, 7, 8)
|
|
353
|
+
|
|
354
|
+
As of v7.10.0, the three retry loops (Step 4 validate, Step 7 Codex PR review, Step 8 bugbot) MUST consult `src/core/run-state/sameness-detector.ts` before consuming a retry. The pipeline halts when retries make no progress — even if you have retries remaining.
|
|
355
|
+
|
|
356
|
+
**The detector is consumed by the autopilot skill agent itself** (the LLM following this skill), not by the underlying `scripts/validate.ts` / `scripts/codex-pr-review.ts` / `scripts/bugbot.ts` CLI scripts. Those scripts are stateless per-invocation; the retry loop with its history lives inside this skill execution scope. That is why the integration is in this document and not in the CLI source.
|
|
357
|
+
|
|
358
|
+
```ts
|
|
359
|
+
// Public package import (uses the subpath export added in v7.10.0):
|
|
360
|
+
import {
|
|
361
|
+
computeFingerprint,
|
|
362
|
+
shouldEscalate,
|
|
363
|
+
} from '@delegance/claude-autopilot/run-state/sameness-detector';
|
|
364
|
+
|
|
365
|
+
// Or from a local checkout:
|
|
366
|
+
import { computeFingerprint, shouldEscalate } from './src/core/run-state/sameness-detector.ts';
|
|
367
|
+
```
|
|
368
|
+
|
|
369
|
+
How to use inside each retry loop:
|
|
370
|
+
|
|
371
|
+
1. On a FAIL, derive `{ phase, errorType, errorLocation, errorMessage }` for the most salient blocker.
|
|
372
|
+
2. `const fp = computeFingerprint({...})`.
|
|
373
|
+
3. Append `fp` to an in-memory list for THIS retry loop (no cross-loop state — bugbot and validate keep separate histories).
|
|
374
|
+
4. `const decision = shouldEscalate(history)` — if `decision.escalate === true`, STOP. Surface `decision.fingerprint` and `decision.reason` to the user. Do not consume another retry attempt even if the per-loop max isn't reached.
|
|
375
|
+
5. Otherwise apply the fix, run again.
|
|
376
|
+
|
|
377
|
+
Persistence is in-memory only in v7.10.0. The v6 run-state events.ndjson integration is tracked as issue #180.
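The five steps above can be sketched as a small retry driver. This is an illustrative sketch under stated assumptions, not the skill's actual control flow: `Failure`, `Outcome`, `retryLoop`, `runCheck`, and `applyFix` are hypothetical names, the hash is assumed to come from `computeFingerprint`, and `maxAttempts` is interpreted here as the total number of check runs.

```ts
// Hypothetical shapes for illustration only.
type Failure = { hash: string; summary: string };
type Outcome = { status: 'pass' | 'escalated' | 'exhausted'; attempts: number };

// runCheck returns null on PASS or a Failure on FAIL; applyFix attempts a remedy.
function retryLoop(
  runCheck: () => Failure | null,
  applyFix: (f: Failure) => void,
  maxAttempts: number,
): Outcome {
  const history: Failure[] = []; // per-loop history, never shared across loops
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const failure = runCheck();
    if (failure === null) return { status: 'pass', attempts: attempt };
    history.push(failure);
    // Same fingerprint twice in a row: stop before spending another retry.
    const prev = history[history.length - 2];
    if (prev !== undefined && prev.hash === failure.hash) {
      return { status: 'escalated', attempts: attempt };
    }
    applyFix(failure);
  }
  return { status: 'exhausted', attempts: maxAttempts };
}

// A check that fails identically every time halts on the second attempt.
const stuck = retryLoop(() => ({ hash: 'h1', summary: 'TS2322' }), () => {}, 3);
console.log(stuck.status, stuck.attempts); // escalated 2

// Different failures on each attempt keep the loop going until it passes.
const results: (Failure | null)[] = [
  { hash: 'h1', summary: 'TS2322' },
  { hash: 'h2', summary: 'lint: no-unused-vars' },
  null,
];
const progressing = retryLoop(() => results.shift() ?? null, () => {}, 3);
console.log(progressing.status, progressing.attempts); // pass 3
```

Keeping `history` local to the function reflects the requirement that each retry loop (validate, Codex review, bugbot) tracks its own fingerprints.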
|
|
378
|
+
|
|
343
379
|
## Error Recovery
|
|
344
380
|
|
|
345
381
|
- **Preflight failure:** Surface the specific check, exit. Do not partially run.
|
|
@@ -347,9 +383,10 @@ Tell the user:
|
|
|
347
383
|
- **Subagent failure:** Re-dispatch with more context or implement directly.
|
|
348
384
|
- **Migration failure (Step 5 — dev only):** Fix SQL, re-run Step 4 (validate) against corrected code, then re-run Step 5 (migrate dev). Prod stays untouched. Max 3 retries.
|
|
349
385
|
- **Prod migration:** Out of scope for autopilot. Handed off to user's CI/CD pipeline via `migrate.policy` (require_manual_approval, require_dry_run_first).
|
|
350
|
-
- **Validate failure:** Fix issues, re-run (max 3 retries).
|
|
351
|
-
- **Codex CRITICAL findings:** REMEDIATE (apply fix), push, re-review (max 2 retries). Do NOT continue past unremediated CRITICALs.
|
|
352
|
-
- **Bugbot findings:** `/bugbot` handles triage + fix automatically (max 3 rounds).
|
|
386
|
+
- **Validate failure:** Fix issues, re-run (max 3 retries, sameness-detector may halt earlier).
|
|
387
|
+
- **Codex CRITICAL findings:** REMEDIATE (apply fix), push, re-review (max 2 retries, sameness-detector may halt earlier). Do NOT continue past unremediated CRITICALs.
|
|
388
|
+
- **Bugbot findings:** `/bugbot` handles triage + fix automatically (max 3 rounds, sameness-detector may halt earlier).
|
|
389
|
+
- **Sameness detector halt:** The same failure fingerprint fired in two consecutive retry attempts — the loop is not making progress. Stop, surface the fingerprint, ask the human to inspect.
|
|
353
390
|
- **External hard-block** (TCC, network, etc.): Stop, report what was completed, surface the blocker.
|
|
354
391
|
|
|
355
392
|
## When NOT to use
|