npm - agent-gauntlet - Versions diffs - 1.2.0 → 1.2.2 - Mend

agent-gauntlet 1.2.0 → 1.2.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (5) hide show

package/dist/index.js +195 -122
package/dist/index.js.map +8 -8
package/package.json +1 -1
package/skills/gauntlet-commit/SKILL.md +7 -3
package/skills/gauntlet-run/SKILL.md +15 -19

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "agent-gauntlet",
-  "version": "1.2.0",
+  "version": "1.2.2",
   "description": "A CLI tool for testing AI coding agents",
   "license": "MIT",
   "author": "Paul Caplan",

package/skills/gauntlet-commit/SKILL.md CHANGED Viewed

@@ -16,11 +16,15 @@ Commit with optional gauntlet validation. Runs `agent-gauntlet detect` first, va
 Run `agent-gauntlet detect` using `Bash`:
 ```bash
-agent-gauntlet detect 2>&1
+agent-gauntlet detect 2>&1; echo "DETECT_EXIT:$?"
 ```
-- If no changed files are reported → **skip to Step 4** (commit directly, no validation needed)
-- If changed files are reported → continue to Step 2
+Check the exit code from the `DETECT_EXIT:` line:
+- **Exit 0** → gates would run, continue to Step 2
+- **Exit 2** → no gates would run (no changes or no applicable gates), **skip to Step 4** (commit directly)
+- **Exit 1** → error, report the error to the user and stop
+- **Any other exit code** → treat as error, report output to the user, and stop
 ## Step 2 - Determine Validation Intent

package/skills/gauntlet-run/SKILL.md CHANGED Viewed

@@ -13,26 +13,22 @@ Fix issues you reasonably agree with or believe the human wants to be fixed. Ski
 ## Procedure
-### Step 1 - Clean Logs
-Run `agent-gauntlet clean` to archive any previous log files.
-### Step 2 - Run Gauntlet
+### Step 1 - Run Gauntlet
 If the caller requests a specific review to be enabled, append `--enable-review <name>` to the run command for each requested review.
 Run `agent-gauntlet run` using `Bash` with `timeout: 300000`. **ALWAYS wait for and read the full command output** before proceeding — the command typically takes 1-2 minutes. **Verify you can see a `Status:` line in the output before continuing.**
-### Step 3 - Check Status
+### Step 2 - Check Status
 **NEVER assume success** — you must see an explicit `Status:` line before continuing. Check it and route accordingly:
-- `Status: Passed` → Go to Step 9.
-- `Status: Passed with warnings` → Go to Step 9.
-- `Status: Failed` → Continue to Step 4. **You MUST continue — do not stop here.**
-- `Status: Retry limit exceeded` → Run `agent-gauntlet clean` to archive logs. Go to Step 9.
+- `Status: Passed` → Go to Step 8.
+- `Status: Passed with warnings` → Go to Step 8.
+- `Status: Failed` → Continue to Step 3. **You MUST continue — do not stop here.**
+- `Status: Retry limit exceeded` → Go to Step 8.
 - No status line visible → **Known issue:** Bun can drop all stdout/stderr when LLM review subprocesses run. Read the console log file to get the status: find the latest `console.*.log` in the gauntlet log directory (e.g., `gauntlet_logs/console.1.log`) and look for the `Status:` line there. If no console log is found there, also check `gauntlet_logs/previous/` for logs from the most recent archived run. If no console log exists in either location, the command may have timed out or failed to run — re-run with a longer timeout or investigate the error. Do NOT proceed as if it passed.
-### Step 4 - Extract Failures
+### Step 3 - Extract Failures
 Required when status is Failed:
 - Infer the log directory from the file paths in the console output (e.g., if output references `gauntlet_logs/check_._lint.1.log`, the log directory is `gauntlet_logs/`)
@@ -42,31 +38,31 @@ Required when status is Failed:
   b. **Subagent delegation**: If your environment supports delegating work to a subagent but not the Task tool, delegate the extract-prompt instructions with the log directory to a subagent for processing.
   c. **Inline fallback**: If no subagent capability is available, follow the extract-prompt instructions yourself to read the log files and produce the compact failure summary.
-### Step 5 - Report Failures
+### Step 4 - Report Failures
-Print the compact failure summary returned from Step 4.
+Print the compact failure summary returned from Step 3.
-### Step 6 - Fix
+### Step 5 - Fix
 Apply the review guidance above to each failure and fix accordingly:
 - CHECK failures with Fix Skill: invoke the named skill
 - CHECK failures with Fix Instructions: follow the instructions
 - REVIEW violations: fix or skip per the review guidance above
-### Step 7 - Update Review Decisions
+### Step 6 - Update Review Decisions
 For REVIEW violations you addressed:
 - Read `update-prompt.md` from this skill's directory
-- **Update review decisions** using the first available strategy (same as Step 4):
+- **Update review decisions** using the first available strategy (same as Step 3):
   a. **Task tool** (Claude Code): `Task` with `subagent_type="general-purpose"`, `model="haiku"`, `prompt=` update-prompt content + log directory + decisions list. **Task calls MUST be synchronous** — NEVER use `run_in_background: true`.
   b. **Subagent delegation**: Delegate the update-prompt instructions with the log directory and decisions to a subagent.
   c. **Inline fallback**: Follow the update-prompt instructions yourself to update the review JSON files.
-### Step 8 - Re-run Verification
+### Step 7 - Re-run Verification
-**NEVER skip this step** — if the run failed, you MUST fix and re-run. Run the same command from Step 2 (including any `--enable-review` flags) again with `Bash` and `timeout: 300000`. Do NOT run `agent-gauntlet clean` between retries. The tool detects existing logs and automatically switches to verification mode. **Go back to Step 3** to check the status line and repeat.
+**NEVER skip this step** — if the run failed, you MUST fix and re-run. Run the same command from Step 1 (including any `--enable-review` flags) again with `Bash` and `timeout: 300000`. The tool detects existing logs and automatically switches to verification mode. **Go back to Step 2** to check the status line and repeat.
-### Step 9 - Summarize Session
+### Step 8 - Summarize Session
 Provide a summary of the session:
 - Final Status: (Passed / Passed with warnings / Retry limit exceeded)