agent-gauntlet 0.1.11 → 0.1.12
- package/README.md +55 -87
- package/package.json +1 -1
- package/src/bun-plugins.d.ts +4 -0
- package/src/cli-adapters/github-copilot.ts +1 -1
- package/src/commands/check.ts +23 -2
- package/src/commands/ci/index.ts +15 -0
- package/src/commands/ci/init.ts +96 -0
- package/src/commands/ci/list-jobs.ts +78 -0
- package/src/commands/detect.ts +16 -1
- package/src/commands/help.ts +1 -0
- package/src/commands/index.ts +1 -0
- package/src/commands/init.test.ts +2 -0
- package/src/commands/rerun.ts +16 -1
- package/src/commands/review.ts +23 -2
- package/src/commands/run.ts +23 -2
- package/src/config/ci-loader.ts +33 -0
- package/src/config/ci-schema.ts +52 -0
- package/src/config/schema.ts +0 -1
- package/src/config/types.ts +13 -0
- package/src/core/change-detector.ts +3 -3
- package/src/core/entry-point.ts +37 -0
- package/src/core/runner.ts +4 -1
- package/src/gates/review.test.ts +162 -59
- package/src/gates/review.ts +9 -1
- package/src/index.ts +2 -0
- package/src/output/logger.ts +5 -3
- package/src/templates/workflow.yml +77 -0
package/README.md
CHANGED
|
@@ -1,122 +1,90 @@
|
|
|
1
1
|
# Agent Gauntlet
|
|
2
2
|
|
|
3
|
-
|
|
3
|
+
> Don't just review the agent's code — put it through the gauntlet.
|
|
4
4
|
|
|
5
|
-
|
|
6
|
-
- **Entry points** (paths in your repo)
|
|
7
|
-
- **Check gates** (shell commands: tests, linters, typecheck, etc.)
|
|
8
|
-
- **Review gates** (AI CLI tools run on diffs, with regex-based pass/fail)
|
|
5
|
+
Agent Gauntlet is a configurable “feedback loop” runner for AI-assisted development workflows.
|
|
9
6
|
|
|
10
|
-
|
|
7
|
+
You configure which paths in your repo should trigger which validations — shell commands like tests and linters, plus AI-powered code reviews. When files change, Gauntlet automatically runs the relevant validations and reports results.
|
|
11
8
|
|
|
12
|
-
|
|
9
|
+
For AI reviews, it uses the CLI tool of your choice: Gemini, Codex, Claude Code, GitHub Copilot, or Cursor.
|
|
13
10
|
|
|
14
|
-
|
|
15
|
-
- **Leverage existing subscriptions**: Use the tools you are already paying for.
|
|
16
|
-
- **Dynamic Context**: Agents are invoked in a non-interactive, read-only mode where they can use their own file-reading and search tools to pull additional context from your repository as needed.
|
|
17
|
-
- **Security**: By using standard CLI tools with strict flags (like `--sandbox` or `--allowed-tools`), Agent Gauntlet ensures that agents can read your code to review it without being able to modify your files or escape the repository scope.
|
|
11
|
+
## Features
|
|
18
12
|
|
|
19
|
-
|
|
13
|
+
- **Agent validation loop**: Keep your coding agent on track with automated feedback loops. Detect problems — deterministically and/or non-deterministically — and let your agent fix and Gauntlet verify.
|
|
14
|
+
- **Multi-agent collaboration**: Enable one AI agent to automatically request code reviews from another. For example, if Claude made changes, Gauntlet can request a review from Codex or Gemini — spreading token usage across your subscriptions instead of burning through one.
|
|
15
|
+
- **Leverage existing subscriptions**: Agent Gauntlet is *free* and tool-agnostic, leveraging the AI CLI tools you already have installed.
|
|
16
|
+
- **Easy CI setup**: Define your checks once, run them locally and in GitHub.
|
|
20
17
|
|
|
21
|
-
|
|
22
|
-
- **git** (change detection and diffs)
|
|
23
|
-
- For review gates: one or more supported AI CLIs installed (`gemini`, `codex`, `claude`, `github-copilot`, `cursor`). For the full list of tools and how they are used, see [CLI Invocation Details](docs/cli-invocation-details.md)
|
|
18
|
+
## Usage Patterns
|
|
24
19
|
|
|
25
|
-
|
|
20
|
+
Agent Gauntlet supports three primary usage patterns, each suited for different development workflows:
|
|
21
|
+
1. Run CLI: `agent-gauntlet run`
|
|
22
|
+
2. Run agent command: `/gauntlet`
|
|
23
|
+
3. Automatically run after agent completes task
|
|
26
24
|
|
|
27
|
-
|
|
25
|
+
The use cases below illustrate when each of these patterns may be used.
|
|
28
26
|
|
|
29
|
-
|
|
30
|
-
```bash
|
|
31
|
-
bun add -g agent-gauntlet
|
|
32
|
-
```
|
|
27
|
+
### 1. Planning Mode
|
|
33
28
|
|
|
34
|
-
**
|
|
35
|
-
```bash
|
|
36
|
-
npm install -g agent-gauntlet
|
|
37
|
-
```
|
|
29
|
+
**Use case:** Generate and review high-level implementation plans before coding.
|
|
38
30
|
|
|
39
|
-
|
|
31
|
+
**Problem Gauntlet solves:** Catch architectural issues and requirement misunderstandings before coding to avoid costly rework.
|
|
40
32
|
|
|
41
|
-
|
|
33
|
+
**Workflow:**
|
|
42
34
|
|
|
43
|
-
|
|
44
|
-
agent-gauntlet
|
|
45
|
-
|
|
35
|
+
1. Create a plan document in your project directory
|
|
36
|
+
2. Run `agent-gauntlet run` from the terminal
|
|
37
|
+
3. Gauntlet detects the new or modified plan and invokes configured AI CLIs to review it
|
|
38
|
+
4. *(Optional)* Ask your assistant to refine the plan based on review feedback
|
|
46
39
|
|
|
47
|
-
|
|
40
|
+
**Note:** Review configuration and prompts are fully customizable. Example prompt: *"Review this plan for completeness and potential issues."*
|
|
48
41
|
|
|
49
|
-
|
|
50
|
-
agent-gauntlet
|
|
51
|
-
```
|
|
42
|
+
### 2. AI-Assisted Development
|
|
52
43
|
|
|
53
|
-
|
|
44
|
+
**Use case:** Pair with an AI coding assistant to implement features with continuous quality checks.
|
|
54
45
|
|
|
55
|
-
-
|
|
46
|
+
**Problem Gauntlet solves:** Catch AI-introduced bugs and quality issues through automated checks and multi-LLM review.
|
|
56
47
|
|
|
57
|
-
|
|
58
|
-
bun install
|
|
59
|
-
```
|
|
48
|
+
**Workflow:**
|
|
60
49
|
|
|
61
|
-
|
|
50
|
+
1. Collaborate with your assistant to implement code changes
|
|
51
|
+
2. Run `/gauntlet` from chat
|
|
52
|
+
3. Gauntlet detects changed files and runs configured checks (linter, tests, type checking, etc.)
|
|
53
|
+
4. Simultaneously, Gauntlet invokes AI CLIs for code review
|
|
54
|
+
5. Assistant reviews results, fixes identified issues, and runs `agent-gauntlet rerun`
|
|
55
|
+
6. Gauntlet verifies fixes and checks for new issues
|
|
56
|
+
7. Process repeats automatically (up to 3 reruns) until all gates pass
|
|
62
57
|
|
|
63
|
-
|
|
64
|
-
bun run build
|
|
65
|
-
```
|
|
58
|
+
### 3. Agentic Implementation
|
|
66
59
|
|
|
67
|
-
|
|
60
|
+
**Use case:** Delegate well-defined tasks to a coding agent for autonomous implementation.
|
|
68
61
|
|
|
69
|
-
-
|
|
62
|
+
**Problem Gauntlet solves:** Enable autonomous agent development with built-in quality gates, eliminating the validation gap when humans aren't in the loop.
|
|
70
63
|
|
|
71
|
-
|
|
72
|
-
agent-gauntlet run
|
|
73
|
-
```
|
|
64
|
+
**Workflow:**
|
|
74
65
|
|
|
75
|
-
|
|
66
|
+
1. Configure your agent to automatically run `/gauntlet` after completing implementation:
|
|
67
|
+
- **Rules files:** Add to `.cursorrules`, `AGENT.md`, or similar
|
|
68
|
+
- **Custom commands:** Create a `/my-dev-workflow` that includes gauntlet
|
|
69
|
+
- **Git hooks:** Use pre-commit hooks to trigger gauntlet
|
|
70
|
+
- **Agent hooks:** Leverage platform features (e.g., Claude's Stop event)
|
|
71
|
+
2. Assign the task to your agent and step away
|
|
72
|
+
3. When you return: the task is complete, reviewed by a different LLM, all issues fixed, and CI checks passing
|
|
76
73
|
|
|
77
|
-
|
|
78
|
-
agent-gauntlet run --gate lint
|
|
79
|
-
```
|
|
74
|
+
**Benefit:** Fully autonomous quality assurance without manual intervention.
|
|
80
75
|
|
|
81
|
-
|
|
76
|
+
## Quick Start
|
|
82
77
|
|
|
83
|
-
|
|
84
|
-
agent-gauntlet
|
|
85
|
-
|
|
78
|
+
1. **Install**: `bun add -g agent-gauntlet`
|
|
79
|
+
2. **Initialize**: `agent-gauntlet init`
|
|
80
|
+
3. **Run**: `agent-gauntlet run`
|
|
86
81
|
|
|
87
|
-
|
|
82
|
+
For basic usage and configuration guide, see the [Quick Start Guide](docs/quick-start.md).
|
|
88
83
|
|
|
89
|
-
|
|
90
|
-
agent-gauntlet health
|
|
91
|
-
```
|
|
92
|
-
|
|
93
|
-
### Agent loop rules
|
|
94
|
-
|
|
95
|
-
The `.gauntlet/run_gauntlet.md` file defines how AI agents should interact with the gauntlet. By default, agents will terminate after 4 runs (1 initial + 3 fix attempts). You can increase this limit by manually editing the termination conditions in that file.
|
|
96
|
-
|
|
97
|
-
### Configuration layout
|
|
98
|
-
|
|
99
|
-
Agent Gauntlet loads configuration from your repository:
|
|
100
|
-
|
|
101
|
-
```text
|
|
102
|
-
.gauntlet/
|
|
103
|
-
config.yml
|
|
104
|
-
checks/
|
|
105
|
-
*.yml
|
|
106
|
-
reviews/
|
|
107
|
-
*.md
|
|
108
|
-
```
|
|
109
|
-
|
|
110
|
-
- **Project config**: `.gauntlet/config.yml`
|
|
111
|
-
- **Check definitions**: `.gauntlet/checks/*.yml`
|
|
112
|
-
- **Review definitions**: `.gauntlet/reviews/*.md` (filename is the review gate name)
|
|
113
|
-
|
|
114
|
-
### Logs
|
|
115
|
-
|
|
116
|
-
Each job writes a log file under `log_dir` (default: `.gauntlet_logs/`). Filenames are derived from the job id (sanitized).
|
|
117
|
-
|
|
118
|
-
### Documentation
|
|
84
|
+
## Documentation
|
|
119
85
|
|
|
86
|
+
- [Quick Start Guide](docs/quick-start.md) — installation, basic usage, and config layout
|
|
120
87
|
- [User Guide](docs/user-guide.md) — full usage details
|
|
121
88
|
- [Configuration Reference](docs/config-reference.md) — all configuration fields + defaults
|
|
122
89
|
- [CLI Invocation Details](docs/cli-invocation-details.md) — how we securely invoke AI CLIs
|
|
90
|
+
- [Development Guide](docs/development.md) — how to build and develop this project
|
package/package.json
CHANGED
|
package/src/cli-adapters/github-copilot.ts
CHANGED
|
@@ -139,7 +139,7 @@ export class GitHubCopilotAdapter implements CLIAdapter {
|
|
|
139
139
|
// because copilot requires stdin input. The tmpFile path is system-controlled
|
|
140
140
|
// (os.tmpdir() + Date.now() + process.pid), not user-supplied, eliminating injection risk.
|
|
141
141
|
// Double quotes handle paths with spaces. This pattern matches claude.ts:131.
|
|
142
|
-
const cmd = `cat "${tmpFile}" | copilot --allow-tool shell(cat) --allow-tool shell(grep) --allow-tool shell(ls) --allow-tool shell(find) --allow-tool shell(head) --allow-tool shell(tail)`;
|
|
142
|
+
const cmd = `cat "${tmpFile}" | copilot --allow-tool "shell(cat)" --allow-tool "shell(grep)" --allow-tool "shell(ls)" --allow-tool "shell(find)" --allow-tool "shell(head)" --allow-tool "shell(tail)"`;
|
|
143
143
|
const { stdout } = await execAsync(cmd, {
|
|
144
144
|
timeout: opts.timeoutMs,
|
|
145
145
|
maxBuffer: MAX_BUFFER_BYTES,
|
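The quoting change in the hunk above matters because the command string appears to be handed to a shell (`execAsync` looks like a promisified `exec`), and unquoted parentheses are shell metacharacters in POSIX shells. A minimal sketch of building the same command from a list of tool names follows; `buildCopilotCommand` and `READ_ONLY_TOOLS` are hypothetical names used for illustration and are not part of the package.

```typescript
// Hypothetical helper (not part of agent-gauntlet): builds the piped command
// string with every --allow-tool value wrapped in double quotes.
const READ_ONLY_TOOLS: readonly string[] = [
  "cat",
  "grep",
  "ls",
  "find",
  "head",
  "tail",
];

function buildCopilotCommand(
  tmpFile: string,
  tools: readonly string[] = READ_ONLY_TOOLS,
): string {
  // Quote each value so the shell passes `shell(cat)` through as one literal
  // argument instead of trying to interpret the parentheses.
  const allowFlags = tools
    .map((tool) => `--allow-tool "shell(${tool})"`)
    .join(" ");
  // The temp file path is quoted as well, so paths with spaces survive.
  return `cat "${tmpFile}" | copilot ${allowFlags}`;
}

console.log(buildCopilotCommand("/tmp/prompt.txt", ["cat", "grep"]));
// prints: cat "/tmp/prompt.txt" | copilot --allow-tool "shell(cat)" --allow-tool "shell(grep)"
```

Generating the flags from a list keeps the quoting rule in one place, so adding another read-only tool cannot reintroduce an unquoted value.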
package/src/commands/check.ts
CHANGED
|
@@ -13,6 +13,10 @@ export function registerCheckCommand(program: Command): void {
|
|
|
13
13
|
program
|
|
14
14
|
.command("check")
|
|
15
15
|
.description("Run only applicable checks for detected changes")
|
|
16
|
+
.option(
|
|
17
|
+
"-b, --base-branch <branch>",
|
|
18
|
+
"Override base branch for change detection",
|
|
19
|
+
)
|
|
16
20
|
.option("-g, --gate <name>", "Run specific check gate only")
|
|
17
21
|
.option("-c, --commit <sha>", "Use diff for a specific commit")
|
|
18
22
|
.option(
|
|
@@ -26,7 +30,17 @@ export function registerCheckCommand(program: Command): void {
|
|
|
26
30
|
// Rotate logs before starting
|
|
27
31
|
await rotateLogs(config.project.log_dir);
|
|
28
32
|
|
|
29
|
-
|
|
33
|
+
// Determine effective base branch
|
|
34
|
+
// Priority: CLI override > CI env var > config
|
|
35
|
+
const effectiveBaseBranch =
|
|
36
|
+
options.baseBranch ||
|
|
37
|
+
(process.env.GITHUB_BASE_REF &&
|
|
38
|
+
(process.env.CI === "true" || process.env.GITHUB_ACTIONS === "true")
|
|
39
|
+
? process.env.GITHUB_BASE_REF
|
|
40
|
+
: null) ||
|
|
41
|
+
config.project.base_branch;
|
|
42
|
+
|
|
43
|
+
const changeDetector = new ChangeDetector(effectiveBaseBranch, {
|
|
30
44
|
commit: options.commit,
|
|
31
45
|
uncommitted: options.uncommitted,
|
|
32
46
|
});
|
|
@@ -65,7 +79,14 @@ export function registerCheckCommand(program: Command): void {
|
|
|
65
79
|
|
|
66
80
|
const logger = new Logger(config.project.log_dir);
|
|
67
81
|
const reporter = new ConsoleReporter();
|
|
68
|
-
const runner = new Runner(
|
|
82
|
+
const runner = new Runner(
|
|
83
|
+
config,
|
|
84
|
+
logger,
|
|
85
|
+
reporter,
|
|
86
|
+
undefined,
|
|
87
|
+
undefined,
|
|
88
|
+
effectiveBaseBranch,
|
|
89
|
+
);
|
|
69
90
|
|
|
70
91
|
const success = await runner.run(jobs);
|
|
71
92
|
process.exit(success ? 0 : 1);
|
package/src/commands/ci/index.ts
ADDED
|
@@ -0,0 +1,15 @@
|
|
|
1
|
+
import type { Command } from "commander";
|
|
2
|
+
import { initCI } from "./init.js";
|
|
3
|
+
import { listJobs } from "./list-jobs.js";
|
|
4
|
+
|
|
5
|
+
export function registerCICommand(program: Command): void {
|
|
6
|
+
const ci = program.command("ci").description("Manage CI integration");
|
|
7
|
+
|
|
8
|
+
ci.command("init")
|
|
9
|
+
.description("Initialize CI workflow and configuration")
|
|
10
|
+
.action(initCI);
|
|
11
|
+
|
|
12
|
+
ci.command("list-jobs")
|
|
13
|
+
.description("List CI jobs (used by workflow)")
|
|
14
|
+
.action(listJobs);
|
|
15
|
+
}
|
package/src/commands/ci/init.ts
ADDED
|
@@ -0,0 +1,96 @@
|
|
|
1
|
+
import fs from "node:fs/promises";
|
|
2
|
+
import path from "node:path";
|
|
3
|
+
import chalk from "chalk";
|
|
4
|
+
import YAML from "yaml";
|
|
5
|
+
import { loadCIConfig } from "../../config/ci-loader.js";
|
|
6
|
+
import type { CIConfig } from "../../config/types.js";
|
|
7
|
+
import workflowTemplate from "../../templates/workflow.yml" with {
|
|
8
|
+
type: "text",
|
|
9
|
+
};
|
|
10
|
+
|
|
11
|
+
export async function initCI(): Promise<void> {
|
|
12
|
+
const workflowDir = path.join(process.cwd(), ".github", "workflows");
|
|
13
|
+
const workflowPath = path.join(workflowDir, "gauntlet.yml");
|
|
14
|
+
const gauntletDir = path.join(process.cwd(), ".gauntlet");
|
|
15
|
+
const ciConfigPath = path.join(gauntletDir, "ci.yml");
|
|
16
|
+
|
|
17
|
+
// 1. Ensure .gauntlet/ci.yml exists
|
|
18
|
+
if (!(await fileExists(ciConfigPath))) {
|
|
19
|
+
console.log(chalk.yellow("Creating starter .gauntlet/ci.yml..."));
|
|
20
|
+
await fs.mkdir(gauntletDir, { recursive: true });
|
|
21
|
+
const starterContent = `# CI Configuration for Agent Gauntlet
|
|
22
|
+
# Define runtimes, services, and which checks to run in CI.
|
|
23
|
+
|
|
24
|
+
runtimes:
|
|
25
|
+
# ruby:
|
|
26
|
+
# version: "3.3"
|
|
27
|
+
# bundler_cache: true
|
|
28
|
+
|
|
29
|
+
services:
|
|
30
|
+
# postgres:
|
|
31
|
+
# image: postgres:16
|
|
32
|
+
# ports: ["5432:5432"]
|
|
33
|
+
|
|
34
|
+
setup:
|
|
35
|
+
# - name: Global Setup
|
|
36
|
+
# run: echo "Setting up..."
|
|
37
|
+
|
|
38
|
+
checks:
|
|
39
|
+
# - name: linter
|
|
40
|
+
# requires_runtimes: [ruby]
|
|
41
|
+
`;
|
|
42
|
+
await fs.writeFile(ciConfigPath, starterContent);
|
|
43
|
+
} else {
|
|
44
|
+
console.log(chalk.dim("Found existing .gauntlet/ci.yml"));
|
|
45
|
+
}
|
|
46
|
+
|
|
47
|
+
// 2. Load CI config to get services
|
|
48
|
+
let ciConfig: CIConfig | undefined;
|
|
49
|
+
try {
|
|
50
|
+
ciConfig = await loadCIConfig();
|
|
51
|
+
} catch (_e) {
|
|
52
|
+
console.warn(
|
|
53
|
+
chalk.yellow(
|
|
54
|
+
"Could not load CI config to inject services. Workflow will have no services defined.",
|
|
55
|
+
),
|
|
56
|
+
);
|
|
57
|
+
}
|
|
58
|
+
|
|
59
|
+
// 3. Generate workflow file
|
|
60
|
+
console.log(chalk.dim(`Generating ${workflowPath}...`));
|
|
61
|
+
await fs.mkdir(workflowDir, { recursive: true });
|
|
62
|
+
|
|
63
|
+
let templateContent = workflowTemplate;
|
|
64
|
+
|
|
65
|
+
// Inject services
|
|
66
|
+
if (ciConfig?.services && Object.keys(ciConfig.services).length > 0) {
|
|
67
|
+
const servicesYaml = YAML.stringify({ services: ciConfig.services });
|
|
68
|
+
// Indent services
|
|
69
|
+
const indentedServices = servicesYaml
|
|
70
|
+
.split("\n")
|
|
71
|
+
.map((line) => (line.trim() ? ` ${line}` : line))
|
|
72
|
+
.join("\n");
|
|
73
|
+
|
|
74
|
+
templateContent = templateContent.replace(
|
|
75
|
+
"# Services will be injected here by agent-gauntlet",
|
|
76
|
+
indentedServices,
|
|
77
|
+
);
|
|
78
|
+
} else {
|
|
79
|
+
templateContent = templateContent.replace(
|
|
80
|
+
" # Services will be injected here by agent-gauntlet\n",
|
|
81
|
+
"",
|
|
82
|
+
);
|
|
83
|
+
}
|
|
84
|
+
|
|
85
|
+
await fs.writeFile(workflowPath, templateContent);
|
|
86
|
+
console.log(chalk.green("Successfully generated GitHub Actions workflow!"));
|
|
87
|
+
}
|
|
88
|
+
|
|
89
|
+
async function fileExists(path: string): Promise<boolean> {
|
|
90
|
+
try {
|
|
91
|
+
const stat = await fs.stat(path);
|
|
92
|
+
return stat.isFile();
|
|
93
|
+
} catch {
|
|
94
|
+
return false;
|
|
95
|
+
}
|
|
96
|
+
}
|
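The service-injection step in `initCI` above can be exercised in isolation. The sketch below is a dependency-free approximation, not the package's code: `servicesYaml` stands in for the output of `YAML.stringify({ services })`, the four-space indent and the sample template are assumptions (the real `workflow.yml` template is not shown in this hunk), and, unlike the original, it replaces the whole placeholder line including its indentation.

```typescript
// Hypothetical, dependency-free approximation of the injection step in initCI.
const PLACEHOLDER = "# Services will be injected here by agent-gauntlet";

function indentBlock(text: string, indent: string): string {
  // Indent non-blank lines only, so blank lines stay free of trailing spaces.
  return text
    .split("\n")
    .map((line) => (line.trim() ? `${indent}${line}` : line))
    .join("\n");
}

function injectServices(
  template: string,
  servicesYaml: string | undefined,
  indent = "    ", // assumed width; the real template may differ
): string {
  const placeholderLine = `${indent}${PLACEHOLDER}\n`;
  if (!servicesYaml || !servicesYaml.trim()) {
    // No services configured: drop the placeholder line entirely.
    return template.replace(placeholderLine, "");
  }
  const block = indentBlock(servicesYaml.replace(/\s+$/, ""), indent);
  // A replacer function keeps any "$" in the YAML from being treated as a
  // special replacement pattern.
  return template.replace(placeholderLine, () => `${block}\n`);
}

// Assumed template shape, for illustration only.
const sampleTemplate = [
  "jobs:",
  "  checks:",
  "    runs-on: ubuntu-latest",
  "    # Services will be injected here by agent-gauntlet",
  "    steps: []",
  "",
].join("\n");

console.log(
  injectServices(
    sampleTemplate,
    "services:\n  postgres:\n    image: postgres:16\n",
  ),
);
```

With the sample inputs, the `services:` block lands at the same indentation as `runs-on:`, and passing `undefined` instead of the YAML string removes the placeholder line.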
package/src/commands/ci/list-jobs.ts
ADDED
|
@@ -0,0 +1,78 @@
|
|
|
1
|
+
import { loadCIConfig } from "../../config/ci-loader.js";
|
|
2
|
+
import { loadConfig } from "../../config/loader.js";
|
|
3
|
+
import type { CISetupStep } from "../../config/types.js";
|
|
4
|
+
import { EntryPointExpander } from "../../core/entry-point.js";
|
|
5
|
+
|
|
6
|
+
export async function listJobs(): Promise<void> {
|
|
7
|
+
try {
|
|
8
|
+
const config = await loadConfig();
|
|
9
|
+
const ciConfig = await loadCIConfig();
|
|
10
|
+
const expander = new EntryPointExpander();
|
|
11
|
+
const expandedEntryPoints = await expander.expandAll(
|
|
12
|
+
config.project.entry_points,
|
|
13
|
+
);
|
|
14
|
+
|
|
15
|
+
const matrixJobs = [];
|
|
16
|
+
|
|
17
|
+
const globalSetup = formatSetup(ciConfig.setup || undefined);
|
|
18
|
+
|
|
19
|
+
if (ciConfig.checks) {
|
|
20
|
+
for (const ep of expandedEntryPoints) {
|
|
21
|
+
// Get checks enabled for this entry point
|
|
22
|
+
const allowedChecks = new Set(ep.config.checks || []);
|
|
23
|
+
|
|
24
|
+
for (const check of ciConfig.checks) {
|
|
25
|
+
if (allowedChecks.has(check.name)) {
|
|
26
|
+
// Check definition from .gauntlet/checks/*.yml
|
|
27
|
+
const checkDef = config.checks[check.name];
|
|
28
|
+
if (!checkDef) {
|
|
29
|
+
console.warn(
|
|
30
|
+
`Warning: Check '${check.name}' found in CI config but not defined in checks/*.yml`,
|
|
31
|
+
);
|
|
32
|
+
continue;
|
|
33
|
+
}
|
|
34
|
+
|
|
35
|
+
const id = `${check.name}-${ep.path.replace(/\//g, "-")}`;
|
|
36
|
+
|
|
37
|
+
matrixJobs.push({
|
|
38
|
+
id,
|
|
39
|
+
name: check.name,
|
|
40
|
+
entry_point: ep.path,
|
|
41
|
+
working_directory: checkDef.working_directory || ep.path,
|
|
42
|
+
command: checkDef.command,
|
|
43
|
+
runtimes: check.requires_runtimes || [],
|
|
44
|
+
services: check.requires_services || [],
|
|
45
|
+
setup: formatSetup(check.setup || undefined),
|
|
46
|
+
global_setup: globalSetup,
|
|
47
|
+
});
|
|
48
|
+
}
|
|
49
|
+
}
|
|
50
|
+
}
|
|
51
|
+
}
|
|
52
|
+
|
|
53
|
+
const output = {
|
|
54
|
+
matrix: matrixJobs,
|
|
55
|
+
services: ciConfig.services || {},
|
|
56
|
+
runtimes: ciConfig.runtimes || {},
|
|
57
|
+
};
|
|
58
|
+
|
|
59
|
+
console.log(JSON.stringify(output));
|
|
60
|
+
} catch (e) {
|
|
61
|
+
console.error("Error generating CI jobs:", e);
|
|
62
|
+
process.exit(1);
|
|
63
|
+
}
|
|
64
|
+
}
|
|
65
|
+
|
|
66
|
+
const formatSetup = (steps: CISetupStep[] | null | undefined): string => {
|
|
67
|
+
if (!steps || steps.length === 0) return "";
|
|
68
|
+
return steps
|
|
69
|
+
.map((s) => {
|
|
70
|
+
const cmd = s.working_directory
|
|
71
|
+
? `(cd "${s.working_directory}" && ${s.run})`
|
|
72
|
+
: s.run;
|
|
73
|
+
return `echo "::group::${s.name}"
|
|
74
|
+
${cmd}
|
|
75
|
+
echo "::endgroup::"`;
|
|
76
|
+
})
|
|
77
|
+
.join("\n");
|
|
78
|
+
};
|
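To make the matrix output of `list-jobs` easier to picture, the sketch below reproduces two of its pieces as standalone functions: the job id derivation and the setup-step formatting, which wraps each step in GitHub Actions `::group::` / `::endgroup::` log markers. `SetupStep` and `jobId` are illustrative names introduced here; the sample check and path are made up.

```typescript
// Standalone approximation of two pieces of listJobs; names are illustrative.
interface SetupStep {
  name: string;
  run: string;
  working_directory?: string;
}

// One shell snippet per step, each wrapped in a collapsible log group.
function formatSetup(steps: SetupStep[] | null | undefined): string {
  if (!steps || steps.length === 0) return "";
  return steps
    .map((s) => {
      // Run in a subshell so the directory change does not leak into later steps.
      const cmd = s.working_directory
        ? `(cd "${s.working_directory}" && ${s.run})`
        : s.run;
      return [`echo "::group::${s.name}"`, cmd, `echo "::endgroup::"`].join("\n");
    })
    .join("\n");
}

// Job ids combine the check name with the entry point path, slashes flattened.
function jobId(checkName: string, entryPointPath: string): string {
  return `${checkName}-${entryPointPath.replace(/\//g, "-")}`;
}

console.log(jobId("linter", "apps/api"));
// prints: linter-apps-api

console.log(
  formatSetup([
    { name: "Install", run: "bun install", working_directory: "apps/api" },
  ]),
);
// prints:
// echo "::group::Install"
// (cd "apps/api" && bun install)
// echo "::endgroup::"
```

Note that a root entry point (`.`) contains no slashes, so under this scheme its id would be `linter-.`.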
package/src/commands/detect.ts
CHANGED
|
@@ -11,6 +11,10 @@ export function registerDetectCommand(program: Command): void {
|
|
|
11
11
|
.description(
|
|
12
12
|
"Show what gates would run for detected changes (without executing them)",
|
|
13
13
|
)
|
|
14
|
+
.option(
|
|
15
|
+
"-b, --base-branch <branch>",
|
|
16
|
+
"Override base branch for change detection",
|
|
17
|
+
)
|
|
14
18
|
.option("-c, --commit <sha>", "Use diff for a specific commit")
|
|
15
19
|
.option(
|
|
16
20
|
"-u, --uncommitted",
|
|
@@ -19,7 +23,18 @@ export function registerDetectCommand(program: Command): void {
|
|
|
19
23
|
.action(async (options) => {
|
|
20
24
|
try {
|
|
21
25
|
const config = await loadConfig();
|
|
22
|
-
|
|
26
|
+
|
|
27
|
+
// Determine effective base branch
|
|
28
|
+
// Priority: CLI override > CI env var > config
|
|
29
|
+
const effectiveBaseBranch =
|
|
30
|
+
options.baseBranch ||
|
|
31
|
+
(process.env.GITHUB_BASE_REF &&
|
|
32
|
+
(process.env.CI === "true" || process.env.GITHUB_ACTIONS === "true")
|
|
33
|
+
? process.env.GITHUB_BASE_REF
|
|
34
|
+
: null) ||
|
|
35
|
+
config.project.base_branch;
|
|
36
|
+
|
|
37
|
+
const changeDetector = new ChangeDetector(effectiveBaseBranch, {
|
|
23
38
|
commit: options.commit,
|
|
24
39
|
uncommitted: options.uncommitted,
|
|
25
40
|
});
|
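The same base-branch resolution appears in `check`, `detect`, `rerun`, `review`, and `run`. As a sketch of the stated priority (CLI override, then CI environment variable, then config), the logic could be expressed as one pure function. `resolveBaseBranch` is a hypothetical name, and the environment is passed in as a parameter here rather than read from `process.env`, purely to make the example testable.

```typescript
// Hypothetical helper (not part of agent-gauntlet) capturing the priority:
// CLI override > GITHUB_BASE_REF when running in CI > configured base branch.
type Env = Record<string, string | undefined>;

function resolveBaseBranch(
  cliBaseBranch: string | undefined,
  env: Env,
  configBaseBranch: string,
): string {
  const inCI = env["CI"] === "true" || env["GITHUB_ACTIONS"] === "true";
  // GITHUB_BASE_REF only counts when it is non-empty and we are in CI.
  const ciBaseRef =
    inCI && env["GITHUB_BASE_REF"] ? env["GITHUB_BASE_REF"] : undefined;
  return cliBaseBranch || ciBaseRef || configBaseBranch;
}

// CLI flag wins even inside a pull-request build.
console.log(
  resolveBaseBranch("develop", { CI: "true", GITHUB_BASE_REF: "main" }, "master"),
); // prints: develop

// Without the flag, the pull request's target branch is used in CI.
console.log(
  resolveBaseBranch(undefined, { GITHUB_ACTIONS: "true", GITHUB_BASE_REF: "release" }, "main"),
); // prints: release

// Outside CI, GITHUB_BASE_REF is ignored and the config value applies.
console.log(resolveBaseBranch(undefined, { GITHUB_BASE_REF: "release" }, "main"));
// prints: main
```

Keeping the function free of direct `process.env` access is a design choice of this sketch; the hunks above read the environment inline.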
package/src/commands/help.ts
CHANGED
|
@@ -24,6 +24,7 @@ export function registerHelpCommand(program: Command): void {
|
|
|
24
24
|
console.log(" list List configured gates");
|
|
25
25
|
console.log(" health Check CLI tool availability");
|
|
26
26
|
console.log(" init Initialize .gauntlet configuration");
|
|
27
|
+
console.log(" ci CI integration commands (init, list-jobs)");
|
|
27
28
|
console.log(" help Show this help message\n");
|
|
28
29
|
console.log(
|
|
29
30
|
"For more information, see: https://github.com/your-repo/agent-gauntlet",
|
package/src/commands/index.ts
CHANGED
|
package/src/commands/init.test.ts
CHANGED
|
@@ -40,6 +40,8 @@ mock.module("../cli-adapters/index.js", () => ({
|
|
|
40
40
|
getAllAdapters: () => mockAdapters,
|
|
41
41
|
getProjectCommandAdapters: () => mockAdapters,
|
|
42
42
|
getUserCommandAdapters: () => [],
|
|
43
|
+
getAdapter: (name: string) => mockAdapters.find((a) => a.name === name),
|
|
44
|
+
getValidCLITools: () => mockAdapters.map((a) => a.name),
|
|
43
45
|
}));
|
|
44
46
|
|
|
45
47
|
// Import after mocking
|
package/src/commands/rerun.ts
CHANGED
|
@@ -19,6 +19,10 @@ export function registerRerunCommand(program: Command): void {
|
|
|
19
19
|
.description(
|
|
20
20
|
"Rerun gates (checks & reviews) with previous failures as context (defaults to uncommitted changes)",
|
|
21
21
|
)
|
|
22
|
+
.option(
|
|
23
|
+
"-b, --base-branch <branch>",
|
|
24
|
+
"Override base branch for change detection",
|
|
25
|
+
)
|
|
22
26
|
.option("-g, --gate <name>", "Run specific gate only")
|
|
23
27
|
.option(
|
|
24
28
|
"-c, --commit <sha>",
|
|
@@ -71,6 +75,16 @@ export function registerRerunCommand(program: Command): void {
|
|
|
71
75
|
// Rotate logs before starting the new run
|
|
72
76
|
await rotateLogs(config.project.log_dir);
|
|
73
77
|
|
|
78
|
+
// Determine effective base branch
|
|
79
|
+
// Priority: CLI override > CI env var > config
|
|
80
|
+
const effectiveBaseBranch =
|
|
81
|
+
options.baseBranch ||
|
|
82
|
+
(process.env.GITHUB_BASE_REF &&
|
|
83
|
+
(process.env.CI === "true" || process.env.GITHUB_ACTIONS === "true")
|
|
84
|
+
? process.env.GITHUB_BASE_REF
|
|
85
|
+
: null) ||
|
|
86
|
+
config.project.base_branch;
|
|
87
|
+
|
|
74
88
|
// Detect changes (default to uncommitted unless --commit is specified)
|
|
75
89
|
// Note: Rerun defaults to uncommitted changes for faster iteration loops,
|
|
76
90
|
// unlike 'run' which defaults to base_branch comparison.
|
|
@@ -80,7 +94,7 @@ export function registerRerunCommand(program: Command): void {
|
|
|
80
94
|
};
|
|
81
95
|
|
|
82
96
|
const changeDetector = new ChangeDetector(
|
|
83
|
-
|
|
97
|
+
effectiveBaseBranch,
|
|
84
98
|
changeOptions,
|
|
85
99
|
);
|
|
86
100
|
const expander = new EntryPointExpander();
|
|
@@ -132,6 +146,7 @@ export function registerRerunCommand(program: Command): void {
|
|
|
132
146
|
reporter,
|
|
133
147
|
failuresMap, // Pass previous failures map
|
|
134
148
|
changeOptions, // Pass change detection options
|
|
149
|
+
effectiveBaseBranch, // Pass effective base branch
|
|
135
150
|
);
|
|
136
151
|
|
|
137
152
|
const success = await runner.run(jobs);
|
package/src/commands/review.ts
CHANGED
|
@@ -13,6 +13,10 @@ export function registerReviewCommand(program: Command): void {
|
|
|
13
13
|
program
|
|
14
14
|
.command("review")
|
|
15
15
|
.description("Run only applicable reviews for detected changes")
|
|
16
|
+
.option(
|
|
17
|
+
"-b, --base-branch <branch>",
|
|
18
|
+
"Override base branch for change detection",
|
|
19
|
+
)
|
|
16
20
|
.option("-g, --gate <name>", "Run specific review gate only")
|
|
17
21
|
.option("-c, --commit <sha>", "Use diff for a specific commit")
|
|
18
22
|
.option(
|
|
@@ -26,7 +30,17 @@ export function registerReviewCommand(program: Command): void {
|
|
|
26
30
|
// Rotate logs before starting
|
|
27
31
|
await rotateLogs(config.project.log_dir);
|
|
28
32
|
|
|
29
|
-
|
|
33
|
+
// Determine effective base branch
|
|
34
|
+
// Priority: CLI override > CI env var > config
|
|
35
|
+
const effectiveBaseBranch =
|
|
36
|
+
options.baseBranch ||
|
|
37
|
+
(process.env.GITHUB_BASE_REF &&
|
|
38
|
+
(process.env.CI === "true" || process.env.GITHUB_ACTIONS === "true")
|
|
39
|
+
? process.env.GITHUB_BASE_REF
|
|
40
|
+
: null) ||
|
|
41
|
+
config.project.base_branch;
|
|
42
|
+
|
|
43
|
+
const changeDetector = new ChangeDetector(effectiveBaseBranch, {
|
|
30
44
|
commit: options.commit,
|
|
31
45
|
uncommitted: options.uncommitted,
|
|
32
46
|
});
|
|
@@ -65,7 +79,14 @@ export function registerReviewCommand(program: Command): void {
|
|
|
65
79
|
|
|
66
80
|
const logger = new Logger(config.project.log_dir);
|
|
67
81
|
const reporter = new ConsoleReporter();
|
|
68
|
-
const runner = new Runner(
|
|
82
|
+
const runner = new Runner(
|
|
83
|
+
config,
|
|
84
|
+
logger,
|
|
85
|
+
reporter,
|
|
86
|
+
undefined,
|
|
87
|
+
undefined,
|
|
88
|
+
effectiveBaseBranch,
|
|
89
|
+
);
|
|
69
90
|
|
|
70
91
|
const success = await runner.run(jobs);
|
|
71
92
|
process.exit(success ? 0 : 1);
|
package/src/commands/run.ts
CHANGED
|
@@ -13,6 +13,10 @@ export function registerRunCommand(program: Command): void {
|
|
|
13
13
|
program
|
|
14
14
|
.command("run")
|
|
15
15
|
.description("Run gates for detected changes")
|
|
16
|
+
.option(
|
|
17
|
+
"-b, --base-branch <branch>",
|
|
18
|
+
"Override base branch for change detection",
|
|
19
|
+
)
|
|
16
20
|
.option("-g, --gate <name>", "Run specific gate only")
|
|
17
21
|
.option("-c, --commit <sha>", "Use diff for a specific commit")
|
|
18
22
|
.option(
|
|
@@ -26,7 +30,17 @@ export function registerRunCommand(program: Command): void {
|
|
|
26
30
|
// Rotate logs before starting
|
|
27
31
|
await rotateLogs(config.project.log_dir);
|
|
28
32
|
|
|
29
|
-
|
|
33
|
+
// Determine effective base branch
|
|
34
|
+
// Priority: CLI override > CI env var > config
|
|
35
|
+
const effectiveBaseBranch =
|
|
36
|
+
options.baseBranch ||
|
|
37
|
+
(process.env.GITHUB_BASE_REF &&
|
|
38
|
+
(process.env.CI === "true" || process.env.GITHUB_ACTIONS === "true")
|
|
39
|
+
? process.env.GITHUB_BASE_REF
|
|
40
|
+
: null) ||
|
|
41
|
+
config.project.base_branch;
|
|
42
|
+
|
|
43
|
+
const changeDetector = new ChangeDetector(effectiveBaseBranch, {
|
|
30
44
|
commit: options.commit,
|
|
31
45
|
uncommitted: options.uncommitted,
|
|
32
46
|
});
|
|
@@ -62,7 +76,14 @@ export function registerRunCommand(program: Command): void {
|
|
|
62
76
|
|
|
63
77
|
const logger = new Logger(config.project.log_dir);
|
|
64
78
|
const reporter = new ConsoleReporter();
|
|
65
|
-
const runner = new Runner(
|
|
79
|
+
const runner = new Runner(
|
|
80
|
+
config,
|
|
81
|
+
logger,
|
|
82
|
+
reporter,
|
|
83
|
+
undefined,
|
|
84
|
+
undefined,
|
|
85
|
+
effectiveBaseBranch,
|
|
86
|
+
);
|
|
66
87
|
|
|
67
88
|
const success = await runner.run(jobs);
|
|
68
89
|
process.exit(success ? 0 : 1);
|
package/src/config/ci-loader.ts
ADDED
|
@@ -0,0 +1,33 @@
|
|
|
1
|
+
import fs from "node:fs/promises";
|
|
2
|
+
import path from "node:path";
|
|
3
|
+
import YAML from "yaml";
|
|
4
|
+
import { ciConfigSchema } from "./ci-schema.js";
|
|
5
|
+
import type { CIConfig } from "./types.js";
|
|
6
|
+
|
|
7
|
+
const GAUNTLET_DIR = ".gauntlet";
|
|
8
|
+
const CI_FILE = "ci.yml";
|
|
9
|
+
|
|
10
|
+
export async function loadCIConfig(
|
|
11
|
+
rootDir: string = process.cwd(),
|
|
12
|
+
): Promise<CIConfig> {
|
|
13
|
+
const ciPath = path.join(rootDir, GAUNTLET_DIR, CI_FILE);
|
|
14
|
+
|
|
15
|
+
if (!(await fileExists(ciPath))) {
|
|
16
|
+
throw new Error(
|
|
17
|
+
`CI configuration file not found at ${ciPath}. Run 'agent-gauntlet ci init' to create it.`,
|
|
18
|
+
);
|
|
19
|
+
}
|
|
20
|
+
|
|
21
|
+
const content = await fs.readFile(ciPath, "utf-8");
|
|
22
|
+
const raw = YAML.parse(content);
|
|
23
|
+
return ciConfigSchema.parse(raw);
|
|
24
|
+
}
|
|
25
|
+
|
|
26
|
+
async function fileExists(path: string): Promise<boolean> {
|
|
27
|
+
try {
|
|
28
|
+
const stat = await fs.stat(path);
|
|
29
|
+
return stat.isFile();
|
|
30
|
+
} catch {
|
|
31
|
+
return false;
|
|
32
|
+
}
|
|
33
|
+
}
|
package/src/config/ci-schema.ts
ADDED
|
@@ -0,0 +1,52 @@
|
|
|
1
|
+
import { z } from "zod";
|
|
2
|
+
|
|
3
|
+
export const runtimeConfigSchema = z.record(
|
|
4
|
+
z.string(),
|
|
5
|
+
z
|
|
6
|
+
.object({
|
|
7
|
+
version: z.string().min(1),
|
|
8
|
+
bundler_cache: z.boolean().optional(),
|
|
9
|
+
})
|
|
10
|
+
.passthrough(),
|
|
11
|
+
);
|
|
12
|
+
|
|
13
|
+
export const serviceConfigSchema = z.record(
|
|
14
|
+
z.string(),
|
|
15
|
+
z
|
|
16
|
+
.object({
|
|
17
|
+
image: z.string().min(1),
|
|
18
|
+
env: z.record(z.string()).optional(),
|
|
19
|
+
ports: z.array(z.string()).optional(),
|
|
20
|
+
options: z.string().optional(),
|
|
21
|
+
health_check: z
|
|
22
|
+
.object({
|
|
23
|
+
cmd: z.string().optional(),
|
|
24
|
+
interval: z.string().optional(),
|
|
25
|
+
timeout: z.string().optional(),
|
|
26
|
+
retries: z.number().optional(),
|
|
27
|
+
})
|
|
28
|
+
.optional(),
|
|
29
|
+
})
|
|
30
|
+
.passthrough(),
|
|
31
|
+
);
|
|
32
|
+
|
|
33
|
+
export const ciSetupStepSchema = z.object({
|
|
34
|
+
name: z.string().min(1),
|
|
35
|
+
run: z.string().min(1),
|
|
36
|
+
working_directory: z.string().optional(),
|
|
37
|
+
if: z.string().optional(),
|
|
38
|
+
});
|
|
39
|
+
|
|
40
|
+
export const ciCheckConfigSchema = z.object({
|
|
41
|
+
name: z.string().min(1),
|
|
42
|
+
requires_runtimes: z.array(z.string()).optional(),
|
|
43
|
+
requires_services: z.array(z.string()).optional(),
|
|
44
|
+
setup: z.array(ciSetupStepSchema).optional(),
|
|
45
|
+
});
|
|
46
|
+
|
|
47
|
+
export const ciConfigSchema = z.object({
|
|
48
|
+
runtimes: runtimeConfigSchema.nullable().optional(),
|
|
49
|
+
services: serviceConfigSchema.nullable().optional(),
|
|
50
|
+
setup: z.array(ciSetupStepSchema).nullable().optional(),
|
|
51
|
+
checks: z.array(ciCheckConfigSchema).nullable().optional(),
|
|
52
|
+
});
|
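For readers who want to see the shape that `ciConfigSchema` accepts without running Zod, the sketch below mirrors the schema as plain TypeScript interfaces and builds the object that the starter `.gauntlet/ci.yml` would roughly correspond to if its commented examples were enabled. The top-level fields are presumably nullable because a YAML key with only comments under it parses to `null`. The `undeclaredRuntimes` helper is hypothetical: it shows how `requires_runtimes` relates to `runtimes`, and it is not a check the schema itself performs.

```typescript
// Plain-TypeScript mirror of the Zod schemas above (illustrative only).
interface RuntimeConfig {
  version: string;
  bundler_cache?: boolean;
  [extra: string]: unknown; // passthrough: unknown keys are kept
}

interface ServiceConfig {
  image: string;
  env?: Record<string, string>;
  ports?: string[];
  options?: string;
  health_check?: { cmd?: string; interval?: string; timeout?: string; retries?: number };
  [extra: string]: unknown;
}

interface CISetupStep {
  name: string;
  run: string;
  working_directory?: string;
  if?: string;
}

interface CICheckConfig {
  name: string;
  requires_runtimes?: string[];
  requires_services?: string[];
  setup?: CISetupStep[];
}

interface CIConfig {
  runtimes?: Record<string, RuntimeConfig> | null;
  services?: Record<string, ServiceConfig> | null;
  setup?: CISetupStep[] | null;
  checks?: CICheckConfig[] | null;
}

// The starter ci.yml with its commented examples enabled.
const exampleConfig: CIConfig = {
  runtimes: { ruby: { version: "3.3", bundler_cache: true } },
  services: { postgres: { image: "postgres:16", ports: ["5432:5432"] } },
  setup: [{ name: "Global Setup", run: 'echo "Setting up..."' }],
  checks: [{ name: "linter", requires_runtimes: ["ruby"] }],
};

// Hypothetical cross-reference: runtimes a check asks for but nobody declares.
function undeclaredRuntimes(config: CIConfig): string[] {
  const declared = new Set(Object.keys(config.runtimes ?? {}));
  const missing: string[] = [];
  for (const check of config.checks ?? []) {
    for (const name of check.requires_runtimes ?? []) {
      if (!declared.has(name) && missing.indexOf(name) === -1) missing.push(name);
    }
  }
  return missing;
}

console.log(undeclaredRuntimes(exampleConfig)); // prints an empty array
```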
package/src/config/schema.ts
CHANGED
|
@@ -11,7 +11,6 @@ export const checkGateSchema = z
|
|
|
11
11
|
command: z.string().min(1),
|
|
12
12
|
working_directory: z.string().optional(),
|
|
13
13
|
parallel: z.boolean().default(false),
|
|
14
|
-
run_in_ci: z.boolean().default(true),
|
|
15
14
|
run_locally: z.boolean().default(true),
|
|
16
15
|
timeout: z.number().optional(),
|
|
17
16
|
fail_fast: z.boolean().optional(),
|
package/src/config/types.ts
CHANGED
|
@@ -1,4 +1,11 @@
|
|
|
1
1
|
import type { z } from "zod";
|
|
2
|
+
import type {
|
|
3
|
+
ciCheckConfigSchema,
|
|
4
|
+
ciConfigSchema,
|
|
5
|
+
ciSetupStepSchema,
|
|
6
|
+
runtimeConfigSchema,
|
|
7
|
+
serviceConfigSchema,
|
|
8
|
+
} from "./ci-schema.js";
|
|
2
9
|
import type {
|
|
3
10
|
checkGateSchema,
|
|
4
11
|
cliConfigSchema,
|
|
@@ -17,6 +24,12 @@ export type EntryPointConfig = z.infer<typeof entryPointSchema>;
|
|
|
17
24
|
export type GauntletConfig = z.infer<typeof gauntletConfigSchema>;
|
|
18
25
|
export type CLIConfig = z.infer<typeof cliConfigSchema>;
|
|
19
26
|
|
|
27
|
+
export type CIConfig = z.infer<typeof ciConfigSchema>;
|
|
28
|
+
export type CICheckConfig = z.infer<typeof ciCheckConfigSchema>;
|
|
29
|
+
export type CISetupStep = z.infer<typeof ciSetupStepSchema>;
|
|
30
|
+
export type RuntimeConfig = z.infer<typeof runtimeConfigSchema>;
|
|
31
|
+
export type ServiceConfig = z.infer<typeof serviceConfigSchema>;
|
|
32
|
+
|
|
20
33
|
// Combined type for the fully loaded configuration
|
|
21
34
|
export interface LoadedConfig {
|
|
22
35
|
project: GauntletConfig;
|
package/src/core/change-detector.ts
CHANGED
|
@@ -36,9 +36,9 @@ export class ChangeDetector {
|
|
|
36
36
|
}
|
|
37
37
|
|
|
38
38
|
private async getCIChangedFiles(): Promise<string[]> {
|
|
39
|
-
// In GitHub Actions,
|
|
40
|
-
//
|
|
41
|
-
const baseRef =
|
|
39
|
+
// In GitHub Actions, GITHUB_SHA is the commit being built
|
|
40
|
+
// Base branch priority is already resolved by caller
|
|
41
|
+
const baseRef = this.baseBranch;
|
|
42
42
|
const headRef = process.env.GITHUB_SHA || "HEAD";
|
|
43
43
|
|
|
44
44
|
// We might need to fetch first in some shallow clones, but assuming strictly for now
|
package/src/core/entry-point.ts
CHANGED
|
@@ -1,3 +1,4 @@
|
|
|
1
|
+
import fs from "node:fs/promises";
|
|
1
2
|
import path from "node:path";
|
|
2
3
|
import type { EntryPointConfig } from "../config/types.js";
|
|
3
4
|
|
|
@@ -55,6 +56,31 @@ export class EntryPointExpander {
|
|
|
55
56
|
return results;
|
|
56
57
|
}
|
|
57
58
|
|
|
59
|
+
async expandAll(
|
|
60
|
+
entryPoints: EntryPointConfig[],
|
|
61
|
+
): Promise<ExpandedEntryPoint[]> {
|
|
62
|
+
const results: ExpandedEntryPoint[] = [];
|
|
63
|
+
|
|
64
|
+
for (const ep of entryPoints) {
|
|
65
|
+
if (ep.path === ".") {
|
|
66
|
+
results.push({ path: ".", config: ep });
|
|
67
|
+
continue;
|
|
68
|
+
}
|
|
69
|
+
|
|
70
|
+
if (ep.path.endsWith("*")) {
|
|
71
|
+
const parentDir = ep.path.slice(0, -2);
|
|
72
|
+
const subDirs = await this.listSubDirectories(parentDir);
|
|
73
|
+
for (const subDir of subDirs) {
|
|
74
|
+
results.push({ path: subDir, config: ep });
|
|
75
|
+
}
|
|
76
|
+
} else {
|
|
77
|
+
results.push({ path: ep.path, config: ep });
|
|
78
|
+
}
|
|
79
|
+
}
|
|
80
|
+
|
|
81
|
+
return results;
|
|
82
|
+
}
|
|
83
|
+
|
|
58
84
|
private async expandWildcard(
|
|
59
85
|
parentDir: string,
|
|
60
86
|
changedFiles: string[],
|
|
@@ -80,6 +106,17 @@ export class EntryPointExpander {
|
|
|
80
106
|
return Array.from(affectedSubDirs);
|
|
81
107
|
}
|
|
82
108
|
|
|
109
|
+
private async listSubDirectories(parentDir: string): Promise<string[]> {
|
|
110
|
+
try {
|
|
111
|
+
const dirents = await fs.readdir(parentDir, { withFileTypes: true });
|
|
112
|
+
return dirents
|
|
113
|
+
.filter((d) => d.isDirectory())
|
|
114
|
+
.map((d) => path.join(parentDir, d.name));
|
|
115
|
+
} catch {
|
|
116
|
+
return [];
|
|
117
|
+
}
|
|
118
|
+
}
|
|
119
|
+
|
|
83
120
|
private hasChangesInDir(dirPath: string, changedFiles: string[]): boolean {
|
|
84
121
|
// Check if any changed file starts with the dirPath
|
|
85
122
|
// Need to ensure exact match or subdirectory (e.g. "app" should not match "apple")
|
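The `expandAll` method added in this file expands every configured entry point, independent of which files changed: wildcard paths fan out to one result per subdirectory, and all other paths pass through unchanged. A minimal sketch of that behaviour follows, with the filesystem lookup replaced by an injected function so it runs without touching disk. `expandAllSketch`, `DirLister`, and the sample paths are hypothetical names used only for illustration, and the sketch matches on a trailing `/*`, whereas the diff checks for a trailing `*` and then drops two characters.

```typescript
// Illustrative types; the real EntryPointConfig carries more fields.
interface EntryPointSketch {
  path: string;
}
interface ExpandedSketch {
  path: string;
  config: EntryPointSketch;
}

// Hypothetical stand-in for a directory listing: parent dir -> subdirectory paths.
type DirLister = (parentDir: string) => string[];

function expandAllSketch(
  entryPoints: EntryPointSketch[],
  listSubDirs: DirLister,
): ExpandedSketch[] {
  const results: ExpandedSketch[] = [];
  for (const ep of entryPoints) {
    if (ep.path.endsWith("/*")) {
      // Strip the trailing "/*" to get the directory whose children are listed.
      const parentDir = ep.path.slice(0, -2);
      for (const subDir of listSubDirs(parentDir)) {
        results.push({ path: subDir, config: ep });
      }
    } else {
      // "." and plain paths are kept as-is.
      results.push({ path: ep.path, config: ep });
    }
  }
  return results;
}

// Assumed sample layout: "packages" has two subdirectories, "missing" has none.
const fakeTree: Record<string, string[]> = {
  packages: ["packages/api", "packages/web"],
};
const expanded = expandAllSketch(
  [{ path: "." }, { path: "packages/*" }, { path: "missing/*" }],
  (dir) => fakeTree[dir] ?? [],
);
console.log(expanded.map((e) => e.path));
```

A wildcard whose parent directory has no subdirectories contributes nothing, which mirrors the empty array that `listSubDirectories` returns when the directory read fails.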
package/src/core/runner.ts
CHANGED
|
@@ -32,6 +32,7 @@ export class Runner {
|
|
|
32
32
|
private reporter: ConsoleReporter,
|
|
33
33
|
private previousFailuresMap?: Map<string, Map<string, PreviousViolation[]>>,
|
|
34
34
|
private changeOptions?: { commit?: string; uncommitted?: boolean },
|
|
35
|
+
private baseBranchOverride?: string,
|
|
35
36
|
) {}
|
|
36
37
|
|
|
37
38
|
async run(jobs: Job[]): Promise<boolean> {
|
|
@@ -89,12 +90,14 @@ export class Runner {
|
|
|
89
90
|
const safeJobId = sanitizeJobId(job.id);
|
|
90
91
|
const previousFailures = this.previousFailuresMap?.get(safeJobId);
|
|
91
92
|
const loggerFactory = this.logger.createLoggerFactory(job.id);
|
|
93
|
+
const effectiveBaseBranch =
|
|
94
|
+
this.baseBranchOverride || this.config.project.base_branch;
|
|
92
95
|
result = await this.reviewExecutor.execute(
|
|
93
96
|
job.id,
|
|
94
97
|
job.gateConfig as ReviewGateConfig & ReviewPromptFrontmatter,
|
|
95
98
|
job.entryPoint,
|
|
96
99
|
loggerFactory,
|
|
97
|
-
|
|
100
|
+
effectiveBaseBranch,
|
|
98
101
|
previousFailures,
|
|
99
102
|
this.changeOptions,
|
|
100
103
|
this.config.project.cli.check_usage_limit,
|
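The runner now prefers a caller-supplied base branch over the one in the project config. The fallback can be sketched as a small pure function; `resolveBaseBranch` is a hypothetical name, not something the package exports.

```typescript
// Hypothetical helper mirroring `this.baseBranchOverride || this.config.project.base_branch`.
function resolveBaseBranch(
  override: string | undefined,
  configured: string,
): string {
  // `||` treats an empty string like undefined, so "" also falls back to the config value.
  return override || configured;
}

console.log(resolveBaseBranch(undefined, "main"));
console.log(resolveBaseBranch("develop", "main"));
console.log(resolveBaseBranch("", "main"));
```

Using `||` rather than `??` means an empty override is ignored, which is usually the desired behaviour for a value that originates from a command-line option or an environment variable.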
package/src/gates/review.test.ts
CHANGED
|
@@ -7,64 +7,112 @@ import type {
|
|
|
7
7
|
ReviewPromptFrontmatter,
|
|
8
8
|
} from "../config/types.js";
|
|
9
9
|
import { Logger } from "../output/logger.js";
|
|
10
|
-
import { ReviewGateExecutor } from "./review.js";
|
|
10
|
+
import type { ReviewGateExecutor } from "./review.js";
|
|
11
11
|
|
|
12
12
|
const TEST_DIR = path.join(process.cwd(), `test-review-logs-${Date.now()}`);
|
|
13
|
-
const LOG_DIR = path.join(TEST_DIR, "logs");
|
|
14
13
|
|
|
15
14
|
describe("ReviewGateExecutor Logging", () => {
|
|
16
15
|
let logger: Logger;
|
|
17
16
|
let executor: ReviewGateExecutor;
|
|
17
|
+
let originalCI: string | undefined;
|
|
18
|
+
let originalGithubActions: string | undefined;
|
|
19
|
+
let originalCwd: string;
|
|
18
20
|
|
|
19
21
|
beforeEach(async () => {
|
|
20
22
|
await fs.mkdir(TEST_DIR, { recursive: true });
|
|
21
|
-
await fs.mkdir(LOG_DIR, { recursive: true });
|
|
22
|
-
logger = new Logger(LOG_DIR);
|
|
23
|
-
executor = new ReviewGateExecutor();
|
|
24
23
|
|
|
25
|
-
//
|
|
24
|
+
// Save and disable CI mode for this test to avoid complex git ref issues
|
|
25
|
+
originalCI = process.env.CI;
|
|
26
|
+
originalGithubActions = process.env.GITHUB_ACTIONS;
|
|
27
|
+
originalCwd = process.cwd();
|
|
28
|
+
delete process.env.CI;
|
|
29
|
+
delete process.env.GITHUB_ACTIONS;
|
|
30
|
+
|
|
31
|
+
// Change to test directory with its own git repo to avoid issues with the main repo
|
|
32
|
+
process.chdir(TEST_DIR);
|
|
33
|
+
// Initialize a minimal git repo for the test
|
|
34
|
+
const { exec } = await import("node:child_process");
|
|
35
|
+
const { promisify } = await import("node:util");
|
|
36
|
+
const execAsync = promisify(exec);
|
|
37
|
+
await execAsync("git init");
|
|
38
|
+
await execAsync('git config user.email "test@test.com"');
|
|
39
|
+
await execAsync('git config user.name "Test"');
|
|
40
|
+
// Create an initial commit so we have a history
|
|
41
|
+
await fs.writeFile("test.txt", "initial");
|
|
42
|
+
await execAsync("git add test.txt");
|
|
43
|
+
await execAsync('git commit -m "initial"');
|
|
44
|
+
// Create a "main" branch
|
|
45
|
+
await execAsync("git branch -M main");
|
|
46
|
+
// Create src directory for our test
|
|
47
|
+
await fs.mkdir("src", { recursive: true });
|
|
48
|
+
await fs.writeFile("src/test.ts", "test content");
|
|
49
|
+
await execAsync("git add src/test.ts");
|
|
50
|
+
await execAsync('git commit -m "add src"');
|
|
51
|
+
|
|
52
|
+
// Make uncommitted changes so the diff isn't empty
|
|
53
|
+
await fs.writeFile("src/test.ts", "modified test content");
|
|
54
|
+
|
|
55
|
+
// Now create the log directory and logger in the test directory
|
|
56
|
+
await fs.mkdir("logs", { recursive: true });
|
|
57
|
+
logger = new Logger(path.join(process.cwd(), "logs"));
|
|
58
|
+
|
|
59
|
+
// Create a factory function for mock adapters that returns the correct name
|
|
60
|
+
const createMockAdapter = (name: string): CLIAdapter =>
|
|
61
|
+
({
|
|
62
|
+
name,
|
|
63
|
+
isAvailable: async () => true,
|
|
64
|
+
checkHealth: async () => ({ status: "healthy" }),
|
|
65
|
+
// execute returns the raw string output from the LLM, which is then parsed by the executor.
|
|
66
|
+
// The real adapter returns a string. In this test, we return a JSON string to simulate
|
|
67
|
+
// the LLM returning structured data. This IS intentional and matches the expected contract
|
|
68
|
+
// where execute() -> Promise<string>.
|
|
69
|
+
execute: async () => {
|
|
70
|
+
await new Promise((r) => setTimeout(r, 1)); // Simulate async work
|
|
71
|
+
return JSON.stringify({ status: "pass", message: "OK" });
|
|
72
|
+
},
|
|
73
|
+
getProjectCommandDir: () => null,
|
|
74
|
+
getUserCommandDir: () => null,
|
|
75
|
+
getCommandExtension: () => "md",
|
|
76
|
+
canUseSymlink: () => false,
|
|
77
|
+
transformCommand: (c: string) => c,
|
|
78
|
+
}) as unknown as CLIAdapter;
|
|
79
|
+
|
|
80
|
+
// Mock getAdapter and other exports that may be imported by other modules
|
|
26
81
|
mock.module("../cli-adapters/index.js", () => ({
|
|
27
|
-
getAdapter: (name: string) =>
|
|
28
|
-
|
|
29
|
-
|
|
30
|
-
|
|
31
|
-
|
|
32
|
-
|
|
33
|
-
|
|
34
|
-
|
|
35
|
-
|
|
36
|
-
|
|
37
|
-
|
|
38
|
-
|
|
39
|
-
|
|
40
|
-
|
|
41
|
-
getUserCommandDir: () => null,
|
|
42
|
-
getCommandExtension: () => "md",
|
|
43
|
-
canUseSymlink: () => false,
|
|
44
|
-
transformCommand: (c: string) => c,
|
|
45
|
-
}) as unknown as CLIAdapter,
|
|
82
|
+
getAdapter: (name: string) => createMockAdapter(name),
|
|
83
|
+
getAllAdapters: () => [
|
|
84
|
+
createMockAdapter("codex"),
|
|
85
|
+
createMockAdapter("claude"),
|
|
86
|
+
],
|
|
87
|
+
getProjectCommandAdapters: () => [
|
|
88
|
+
createMockAdapter("codex"),
|
|
89
|
+
createMockAdapter("claude"),
|
|
90
|
+
],
|
|
91
|
+
getUserCommandAdapters: () => [
|
|
92
|
+
createMockAdapter("codex"),
|
|
93
|
+
createMockAdapter("claude"),
|
|
94
|
+
],
|
|
95
|
+
getValidCLITools: () => ["codex", "claude", "gemini"],
|
|
46
96
|
}));
|
|
47
97
|
|
|
48
|
-
|
|
49
|
-
|
|
50
|
-
promisify: (fn: (...args: unknown[]) => unknown) => {
|
|
51
|
-
// Only mock exec, let others pass (though in this test env we likely only use exec)
|
|
52
|
-
if (fn.name === "exec") {
|
|
53
|
-
return async (cmd: string) => {
|
|
54
|
-
if (/^git diff/.test(cmd)) return "diff content";
|
|
55
|
-
if (/^git ls-files/.test(cmd)) return "file.ts";
|
|
56
|
-
return { stdout: "", stderr: "" };
|
|
57
|
-
};
|
|
58
|
-
}
|
|
59
|
-
// Fallback for other functions if needed
|
|
60
|
-
return async () => {};
|
|
61
|
-
},
|
|
62
|
-
}));
|
|
98
|
+
const { ReviewGateExecutor } = await import("./review.js");
|
|
99
|
+
executor = new ReviewGateExecutor();
|
|
63
100
|
});
|
|
64
101
|
|
|
65
102
|
afterEach(async () => {
|
|
103
|
+
// Restore working directory first
|
|
104
|
+
process.chdir(originalCwd);
|
|
105
|
+
|
|
66
106
|
await fs.rm(TEST_DIR, { recursive: true, force: true });
|
|
67
107
|
mock.restore();
|
|
108
|
+
|
|
109
|
+
// Restore CI env vars
|
|
110
|
+
if (originalCI !== undefined) {
|
|
111
|
+
process.env.CI = originalCI;
|
|
112
|
+
}
|
|
113
|
+
if (originalGithubActions !== undefined) {
|
|
114
|
+
process.env.GITHUB_ACTIONS = originalGithubActions;
|
|
115
|
+
}
|
|
68
116
|
});
|
|
69
117
|
|
|
70
118
|
it("should only create adapter-specific logs and no generic log", async () => {
|
|
@@ -89,39 +137,94 @@ describe("ReviewGateExecutor Logging", () => {
|
|
|
89
137
|
"main",
|
|
90
138
|
);
|
|
91
139
|
|
|
92
|
-
|
|
93
|
-
|
|
94
|
-
|
|
95
|
-
|
|
96
|
-
|
|
97
|
-
|
|
98
|
-
|
|
140
|
+
// Enhanced error messages for better debugging
|
|
141
|
+
if (result.status !== "pass") {
|
|
142
|
+
throw new Error(
|
|
143
|
+
`Expected result.status to be "pass" but got "${result.status}". Message: ${result.message || "none"}. Duration: ${result.duration}ms`,
|
|
144
|
+
);
|
|
145
|
+
}
|
|
146
|
+
|
|
147
|
+
if (!result.logPaths) {
|
|
148
|
+
throw new Error(
|
|
149
|
+
`Expected result.logPaths to be defined but got ${JSON.stringify(result.logPaths)}`,
|
|
150
|
+
);
|
|
151
|
+
}
|
|
152
|
+
|
|
153
|
+
if (result.logPaths.length !== 2) {
|
|
154
|
+
throw new Error(
|
|
155
|
+
`Expected result.logPaths to have length 2 but got ${result.logPaths.length}. Paths: ${JSON.stringify(result.logPaths)}`,
|
|
156
|
+
);
|
|
157
|
+
}
|
|
158
|
+
|
|
159
|
+
if (!result.logPaths[0]?.includes("review_src_code-quality_codex.log")) {
|
|
160
|
+
throw new Error(
|
|
161
|
+
`Expected result.logPaths[0] to contain "review_src_code-quality_codex.log" but got "${result.logPaths[0]}"`,
|
|
162
|
+
);
|
|
163
|
+
}
|
|
164
|
+
|
|
165
|
+
if (!result.logPaths[1]?.includes("review_src_code-quality_claude.log")) {
|
|
166
|
+
throw new Error(
|
|
167
|
+
`Expected result.logPaths[1] to contain "review_src_code-quality_claude.log" but got "${result.logPaths[1]}"`,
|
|
168
|
+
);
|
|
169
|
+
}
|
|
170
|
+
|
|
171
|
+
const files = await fs.readdir("logs");
|
|
172
|
+
const filesList = files.join(", ");
|
|
173
|
+
|
|
174
|
+
if (!files.includes("review_src_code-quality_codex.log")) {
|
|
175
|
+
throw new Error(
|
|
176
|
+
`Expected log directory to contain "review_src_code-quality_codex.log" but only found: [${filesList}]`,
|
|
177
|
+
);
|
|
178
|
+
}
|
|
179
|
+
|
|
180
|
+
if (!files.includes("review_src_code-quality_claude.log")) {
|
|
181
|
+
throw new Error(
|
|
182
|
+
`Expected log directory to contain "review_src_code-quality_claude.log" but only found: [${filesList}]`,
|
|
183
|
+
);
|
|
184
|
+
}
|
|
99
185
|
|
|
100
|
-
|
|
101
|
-
|
|
102
|
-
|
|
103
|
-
|
|
186
|
+
if (files.includes("review_src_code-quality.log")) {
|
|
187
|
+
throw new Error(
|
|
188
|
+
`Expected log directory NOT to contain generic log "review_src_code-quality.log" but it was found. All files: [${filesList}]`,
|
|
189
|
+
);
|
|
190
|
+
}
|
|
104
191
|
|
|
105
192
|
// Verify multiplexed content
|
|
106
193
|
const codexLog = await fs.readFile(
|
|
107
|
-
|
|
194
|
+
"logs/review_src_code-quality_codex.log",
|
|
108
195
|
"utf-8",
|
|
109
196
|
);
|
|
110
|
-
|
|
111
|
-
|
|
197
|
+
if (!codexLog.includes("Starting review: code-quality")) {
|
|
198
|
+
throw new Error(
|
|
199
|
+
`Expected codex log to contain "Starting review: code-quality" but got: ${codexLog.substring(0, 200)}...`,
|
|
200
|
+
);
|
|
201
|
+
}
|
|
202
|
+
if (!codexLog.includes("Review result (codex): pass")) {
|
|
203
|
+
throw new Error(
|
|
204
|
+
`Expected codex log to contain "Review result (codex): pass" but got: ${codexLog.substring(0, 200)}...`,
|
|
205
|
+
);
|
|
206
|
+
}
|
|
112
207
|
|
|
113
208
|
const claudeLog = await fs.readFile(
|
|
114
|
-
|
|
209
|
+
"logs/review_src_code-quality_claude.log",
|
|
115
210
|
"utf-8",
|
|
116
211
|
);
|
|
117
|
-
|
|
118
|
-
|
|
212
|
+
if (!claudeLog.includes("Starting review: code-quality")) {
|
|
213
|
+
throw new Error(
|
|
214
|
+
`Expected claude log to contain "Starting review: code-quality" but got: ${claudeLog.substring(0, 200)}...`,
|
|
215
|
+
);
|
|
216
|
+
}
|
|
217
|
+
if (!claudeLog.includes("Review result (claude): pass")) {
|
|
218
|
+
throw new Error(
|
|
219
|
+
`Expected claude log to contain "Review result (claude): pass" but got: ${claudeLog.substring(0, 200)}...`,
|
|
220
|
+
);
|
|
221
|
+
}
|
|
119
222
|
});
|
|
120
223
|
|
|
121
224
|
it("should be handled correctly by ConsoleReporter", async () => {
|
|
122
225
|
const jobId = "review:src:code-quality";
|
|
123
|
-
const codexPath =
|
|
124
|
-
const claudePath =
|
|
226
|
+
const codexPath = "logs/review_src_code-quality_codex.log";
|
|
227
|
+
const claudePath = "logs/review_src_code-quality_claude.log";
|
|
125
228
|
|
|
126
229
|
await fs.writeFile(
|
|
127
230
|
codexPath,
|
package/src/gates/review.ts
CHANGED
|
@@ -370,6 +370,13 @@ export class ReviewGateExecutor {
|
|
|
370
370
|
}
|
|
371
371
|
|
|
372
372
|
// Create per-adapter logger
|
|
373
|
+
// Defensive check: ensure adapter name is valid
|
|
374
|
+
if (!adapter.name || typeof adapter.name !== "string") {
|
|
375
|
+
await mainLogger(
|
|
376
|
+
`Error: Invalid adapter name: ${JSON.stringify(adapter.name)}\n`,
|
|
377
|
+
);
|
|
378
|
+
return null;
|
|
379
|
+
}
|
|
373
380
|
const adapterLogger = await getAdapterLogger(adapter.name);
|
|
374
381
|
|
|
375
382
|
try {
|
|
@@ -493,7 +500,8 @@ export class ReviewGateExecutor {
|
|
|
493
500
|
entryPointPath: string,
|
|
494
501
|
baseBranch: string,
|
|
495
502
|
): Promise<string> {
|
|
496
|
-
|
|
503
|
+
// Base branch priority is already resolved by caller
|
|
504
|
+
const baseRef = baseBranch;
|
|
497
505
|
const headRef = process.env.GITHUB_SHA || "HEAD";
|
|
498
506
|
const pathArg = this.pathArg(entryPointPath);
|
|
499
507
|
|
package/src/index.ts
CHANGED
|
@@ -3,6 +3,7 @@ import { Command } from "commander";
|
|
|
3
3
|
import packageJson from "../package.json" with { type: "json" };
|
|
4
4
|
import {
|
|
5
5
|
registerCheckCommand,
|
|
6
|
+
registerCICommand,
|
|
6
7
|
registerDetectCommand,
|
|
7
8
|
registerHealthCommand,
|
|
8
9
|
registerHelpCommand,
|
|
@@ -24,6 +25,7 @@ program
|
|
|
24
25
|
registerRunCommand(program);
|
|
25
26
|
registerRerunCommand(program);
|
|
26
27
|
registerCheckCommand(program);
|
|
28
|
+
registerCICommand(program);
|
|
27
29
|
registerReviewCommand(program);
|
|
28
30
|
registerDetectCommand(program);
|
|
29
31
|
registerListCommand(program);
|
package/src/output/logger.ts
CHANGED
|
@@ -28,10 +28,12 @@ export class Logger {
|
|
|
28
28
|
}
|
|
29
29
|
|
|
30
30
|
private async initFile(logPath: string): Promise<void> {
|
|
31
|
-
if (
|
|
32
|
-
|
|
33
|
-
this.initializedFiles.add(logPath);
|
|
31
|
+
if (this.initializedFiles.has(logPath)) {
|
|
32
|
+
return;
|
|
34
33
|
}
|
|
34
|
+
// Add to set BEFORE writing to make this more atomic
|
|
35
|
+
this.initializedFiles.add(logPath);
|
|
36
|
+
await fs.writeFile(logPath, "");
|
|
35
37
|
}
|
|
36
38
|
|
|
37
39
|
async createJobLogger(
|
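The revised `initFile` records the path in the set before awaiting the write, so a second call for the same path that arrives while the first write is still pending returns early instead of truncating the file again. A minimal sketch of that ordering, assuming an injected write function in place of `fs.writeFile`; `InitOnceSketch` and the file names are hypothetical.

```typescript
class InitOnceSketch {
  private initializedFiles = new Set<string>();

  // The writer is injected so the sketch does not touch the filesystem.
  constructor(private write: (logPath: string) => Promise<void>) {}

  async initFile(logPath: string): Promise<void> {
    if (this.initializedFiles.has(logPath)) {
      return;
    }
    // Mark the path before the first await so overlapping calls see it.
    this.initializedFiles.add(logPath);
    await this.write(logPath);
  }
}

const truncated: string[] = [];
const initializer = new InitOnceSketch(async (logPath) => {
  truncated.push(logPath);
});

// Fire overlapping calls without awaiting in between.
void initializer.initFile("a.log");
void initializer.initFile("a.log");
void initializer.initFile("b.log");
console.log(truncated);
```

If the `add` call came after the `await`, both overlapping calls for `a.log` would pass the `has` check and the file would be truncated twice.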
|
package/src/templates/workflow.yml
ADDED
|
@@ -0,0 +1,77 @@
|
|
|
1
|
+
name: Gauntlet CI
|
|
2
|
+
|
|
3
|
+
on:
|
|
4
|
+
push:
|
|
5
|
+
branches: [main]
|
|
6
|
+
pull_request:
|
|
7
|
+
branches: [main]
|
|
8
|
+
|
|
9
|
+
jobs:
|
|
10
|
+
discover:
|
|
11
|
+
name: Discover Jobs
|
|
12
|
+
runs-on: ubuntu-latest
|
|
13
|
+
outputs:
|
|
14
|
+
matrix: ${{ steps.discover.outputs.matrix }}
|
|
15
|
+
runtimes: ${{ steps.discover.outputs.runtimes }}
|
|
16
|
+
steps:
|
|
17
|
+
- uses: actions/checkout@v4
|
|
18
|
+
|
|
19
|
+
- name: Install agent-gauntlet
|
|
20
|
+
run: |
|
|
21
|
+
curl -fsSL https://bun.sh/install | bash
|
|
22
|
+
~/.bun/bin/bun add -g agent-gauntlet
|
|
23
|
+
|
|
24
|
+
- name: Discover gauntlet jobs
|
|
25
|
+
id: discover
|
|
26
|
+
run: |
|
|
27
|
+
output=$(~/.bun/bin/agent-gauntlet ci list-jobs)
|
|
28
|
+
echo "matrix=$(echo "$output" | jq -c '.matrix')" >> $GITHUB_OUTPUT
|
|
29
|
+
echo "runtimes=$(echo "$output" | jq -c '.runtimes')" >> $GITHUB_OUTPUT
|
|
30
|
+
|
|
31
|
+
checks:
|
|
32
|
+
name: ${{ matrix.job.name }} (${{ matrix.job.entry_point }})
|
|
33
|
+
runs-on: ubuntu-latest
|
|
34
|
+
needs: discover
|
|
35
|
+
if: ${{ needs.discover.outputs.matrix != '[]' }}
|
|
36
|
+
strategy:
|
|
37
|
+
fail-fast: false
|
|
38
|
+
matrix:
|
|
39
|
+
job: ${{ fromJson(needs.discover.outputs.matrix) }}
|
|
40
|
+
|
|
41
|
+
# Services will be injected here by agent-gauntlet
|
|
42
|
+
|
|
43
|
+
steps:
|
|
44
|
+
- uses: actions/checkout@v4
|
|
45
|
+
|
|
46
|
+
- name: Set up Ruby
|
|
47
|
+
if: contains(matrix.job.runtimes, 'ruby')
|
|
48
|
+
uses: ruby/setup-ruby@v1
|
|
49
|
+
with:
|
|
50
|
+
ruby-version: ${{ fromJson(needs.discover.outputs.runtimes).ruby.version }}
|
|
51
|
+
bundler-cache: ${{ fromJson(needs.discover.outputs.runtimes).ruby.bundler_cache }}
|
|
52
|
+
working-directory: ${{ matrix.job.working_directory }}
|
|
53
|
+
|
|
54
|
+
- name: Set up Node
|
|
55
|
+
if: contains(matrix.job.runtimes, 'node')
|
|
56
|
+
uses: actions/setup-node@v4
|
|
57
|
+
with:
|
|
58
|
+
node-version: ${{ fromJson(needs.discover.outputs.runtimes).node.version }}
|
|
59
|
+
|
|
60
|
+
- name: Set up Bun
|
|
61
|
+
if: contains(matrix.job.runtimes, 'bun')
|
|
62
|
+
uses: oven-sh/setup-bun@v1
|
|
63
|
+
with:
|
|
64
|
+
bun-version: ${{ fromJson(needs.discover.outputs.runtimes).bun.version }}
|
|
65
|
+
|
|
66
|
+
- name: Run global setup
|
|
67
|
+
if: ${{ matrix.job.global_setup != '' }}
|
|
68
|
+
run: ${{ matrix.job.global_setup }}
|
|
69
|
+
|
|
70
|
+
- name: Run check setup
|
|
71
|
+
if: ${{ matrix.job.setup != '' }}
|
|
72
|
+
working-directory: ${{ matrix.job.working_directory }}
|
|
73
|
+
run: ${{ matrix.job.setup }}
|
|
74
|
+
|
|
75
|
+
- name: Run check
|
|
76
|
+
working-directory: ${{ matrix.job.working_directory }}
|
|
77
|
+
run: ${{ matrix.job.command }}
|
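The template's `discover` job expects `agent-gauntlet ci list-jobs` to print JSON with a `matrix` array and a `runtimes` object, which it compacts with `jq -c` and writes to `$GITHUB_OUTPUT`. The sketch below models that hand-off. The field names are inferred from the expressions in the template, the field types (for example `runtimes` as a string array) are assumptions, and `toGithubOutputLines` is a hypothetical helper rather than part of the package.

```typescript
// Shape inferred from the template's `matrix.job.*` expressions; the actual
// `ci list-jobs` output may contain more or differently typed fields.
interface MatrixJobSketch {
  name: string;
  entry_point: string;
  working_directory: string;
  command: string;
  runtimes: string[];
  global_setup: string;
  setup: string;
}

interface ListJobsOutputSketch {
  matrix: MatrixJobSketch[];
  runtimes: Record<string, Record<string, unknown>>;
}

function toGithubOutputLines(output: ListJobsOutputSketch): string[] {
  // JSON.stringify without an indent argument emits compact JSON, comparable to `jq -c`.
  return [
    `matrix=${JSON.stringify(output.matrix)}`,
    `runtimes=${JSON.stringify(output.runtimes)}`,
  ];
}

// Assumed sample values, for illustration only.
const sampleOutput: ListJobsOutputSketch = {
  matrix: [
    {
      name: "lint",
      entry_point: "packages/api",
      working_directory: "packages/api",
      command: "bun run lint",
      runtimes: ["bun"],
      global_setup: "",
      setup: "bun install",
    },
  ],
  runtimes: { bun: { version: "latest" } },
};

console.log(toGithubOutputLines(sampleOutput).join("\n"));
console.log(toGithubOutputLines({ matrix: [], runtimes: {} }).join("\n"));
```

With no jobs, the compact form of the matrix is exactly `[]`, which is the value the `checks` job compares against in its `if:` condition before building the matrix.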