agent-gauntlet 1.2.0 → 1.2.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "agent-gauntlet",
3
- "version": "1.2.0",
3
+ "version": "1.2.2",
4
4
  "description": "A CLI tool for testing AI coding agents",
5
5
  "license": "MIT",
6
6
  "author": "Paul Caplan",
@@ -16,11 +16,15 @@ Commit with optional gauntlet validation. Runs `agent-gauntlet detect` first, va
16
16
  Run `agent-gauntlet detect` using `Bash`:
17
17
 
18
18
  ```bash
19
- agent-gauntlet detect 2>&1
19
+ agent-gauntlet detect 2>&1; echo "DETECT_EXIT:$?"
20
20
  ```
21
21
 
22
- - If no changed files are reported → **skip to Step 4** (commit directly, no validation needed)
23
- - If changed files are reported → continue to Step 2
22
+ Check the exit code from the `DETECT_EXIT:` line:
23
+
24
+ - **Exit 0** → gates would run, continue to Step 2
25
+ - **Exit 2** → no gates would run (no changes or no applicable gates), **skip to Step 4** (commit directly)
26
+ - **Exit 1** → error, report the error to the user and stop
27
+ - **Any other exit code** → treat as error, report output to the user, and stop
24
28
 
25
29
  ## Step 2 - Determine Validation Intent
26
30
 
@@ -13,26 +13,22 @@ Fix issues you reasonably agree with or believe the human wants to be fixed. Ski
13
13
 
14
14
  ## Procedure
15
15
 
16
- ### Step 1 - Clean Logs
17
-
18
- Run `agent-gauntlet clean` to archive any previous log files.
19
-
20
- ### Step 2 - Run Gauntlet
16
+ ### Step 1 - Run Gauntlet
21
17
 
22
18
  If the caller requests a specific review to be enabled, append `--enable-review <name>` to the run command for each requested review.
23
19
 
24
20
  Run `agent-gauntlet run` using `Bash` with `timeout: 300000`. **ALWAYS wait for and read the full command output** before proceeding — the command typically takes 1-2 minutes. **Verify you can see a `Status:` line in the output before continuing.**
25
21
 
26
- ### Step 3 - Check Status
22
+ ### Step 2 - Check Status
27
23
 
28
24
  **NEVER assume success** — you must see an explicit `Status:` line before continuing. Check it and route accordingly:
29
- - `Status: Passed` → Go to Step 9.
30
- - `Status: Passed with warnings` → Go to Step 9.
31
- - `Status: Failed` → Continue to Step 4. **You MUST continue — do not stop here.**
32
- - `Status: Retry limit exceeded` → Run `agent-gauntlet clean` to archive logs. Go to Step 9.
25
+ - `Status: Passed` → Go to Step 8.
26
+ - `Status: Passed with warnings` → Go to Step 8.
27
+ - `Status: Failed` → Continue to Step 3. **You MUST continue — do not stop here.**
28
+ - `Status: Retry limit exceeded` → Go to Step 8.
33
29
  - No status line visible → **Known issue:** Bun can drop all stdout/stderr when LLM review subprocesses run. Read the console log file to get the status: find the latest `console.*.log` in the gauntlet log directory (e.g., `gauntlet_logs/console.1.log`) and look for the `Status:` line there. If no console log is found there, also check `gauntlet_logs/previous/` for logs from the most recent archived run. If no console log exists in either location, the command may have timed out or failed to run — re-run with a longer timeout or investigate the error. Do NOT proceed as if it passed.
34
30
 
35
- ### Step 4 - Extract Failures
31
+ ### Step 3 - Extract Failures
36
32
 
37
33
  Required when status is Failed:
38
34
  - Infer the log directory from the file paths in the console output (e.g., if output references `gauntlet_logs/check_._lint.1.log`, the log directory is `gauntlet_logs/`)
@@ -42,31 +38,31 @@ Required when status is Failed:
42
38
  b. **Subagent delegation**: If your environment supports delegating work to a subagent but not the Task tool, delegate the extract-prompt instructions with the log directory to a subagent for processing.
43
39
  c. **Inline fallback**: If no subagent capability is available, follow the extract-prompt instructions yourself to read the log files and produce the compact failure summary.
44
40
 
45
- ### Step 5 - Report Failures
41
+ ### Step 4 - Report Failures
46
42
 
47
- Print the compact failure summary returned from Step 4.
43
+ Print the compact failure summary returned from Step 3.
48
44
 
49
- ### Step 6 - Fix
45
+ ### Step 5 - Fix
50
46
 
51
47
  Apply the review guidance above to each failure and fix accordingly:
52
48
  - CHECK failures with Fix Skill: invoke the named skill
53
49
  - CHECK failures with Fix Instructions: follow the instructions
54
50
  - REVIEW violations: fix or skip per the review guidance above
55
51
 
56
- ### Step 7 - Update Review Decisions
52
+ ### Step 6 - Update Review Decisions
57
53
 
58
54
  For REVIEW violations you addressed:
59
55
  - Read `update-prompt.md` from this skill's directory
60
- - **Update review decisions** using the first available strategy (same as Step 4):
56
+ - **Update review decisions** using the first available strategy (same as Step 3):
61
57
  a. **Task tool** (Claude Code): `Task` with `subagent_type="general-purpose"`, `model="haiku"`, `prompt=` update-prompt content + log directory + decisions list. **Task calls MUST be synchronous** — NEVER use `run_in_background: true`.
62
58
  b. **Subagent delegation**: Delegate the update-prompt instructions with the log directory and decisions to a subagent.
63
59
  c. **Inline fallback**: Follow the update-prompt instructions yourself to update the review JSON files.
64
60
 
65
- ### Step 8 - Re-run Verification
61
+ ### Step 7 - Re-run Verification
66
62
 
67
- **NEVER skip this step** — if the run failed, you MUST fix and re-run. Run the same command from Step 2 (including any `--enable-review` flags) again with `Bash` and `timeout: 300000`. Do NOT run `agent-gauntlet clean` between retries. The tool detects existing logs and automatically switches to verification mode. **Go back to Step 3** to check the status line and repeat.
63
+ **NEVER skip this step** — if the run failed, you MUST fix and re-run. Run the same command from Step 1 (including any `--enable-review` flags) again with `Bash` and `timeout: 300000`. The tool detects existing logs and automatically switches to verification mode. **Go back to Step 2** to check the status line and repeat.
68
64
 
69
- ### Step 9 - Summarize Session
65
+ ### Step 8 - Summarize Session
70
66
 
71
67
  Provide a summary of the session:
72
68
  - Final Status: (Passed / Passed with warnings / Retry limit exceeded)