npm - skilltest - Versions diffs - 0.1.1 → 0.3.0 - Mend

skilltest 0.1.1 → 0.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (5)

package/CLAUDE.md CHANGED

@@ -7,6 +7,7 @@
 - `lint`: static/offline quality checks
 - `trigger`: model-based triggerability testing
 - `eval`: end-to-end execution + grader-based scoring
+- `check`: lint + trigger + eval quality gates in one run
 The CLI is published as `skilltest` and built for `npx skilltest` usage.
@@ -18,6 +19,7 @@ The CLI is published as `skilltest` and built for `npx skilltest` usage.
 - `src/core/linter/`: lint check modules and orchestrator
 - `src/core/trigger-tester.ts`: query generation + trigger simulation + metrics
 - `src/core/eval-runner.ts`: prompt generation/loading + skill execution + grading loop
+- `src/core/check-runner.ts`: orchestrates lint + trigger + eval with threshold gates
 - `src/core/grader.ts`: structured grader prompt + JSON parse
 - `src/providers/`: LLM provider abstraction (`sendMessage`) and provider implementations
 - `src/reporters/`: terminal rendering and JSON output helper
@@ -68,6 +70,9 @@ ANTHROPIC_API_KEY=your-key node dist/index.js trigger test-fixtures/sample-skill
   - `sendMessage(systemPrompt, userMessage, { model }) => Promise<string>`
 - Lint is fully offline and first-class.
 - Trigger/eval rely on the same provider abstraction.
+- `check` wraps lint + trigger + eval and enforces minimum thresholds:
+  - trigger F1
+  - eval assertion pass rate
 - JSON mode is strict:
   - no spinners
   - no colored output
@@ -79,7 +84,9 @@ ANTHROPIC_API_KEY=your-key node dist/index.js trigger test-fixtures/sample-skill
 ## Gotchas
 - `trigger --num-queries` must be even for balanced positive/negative cases.
-- OpenAI provider is intentionally a stub in v1 and throws `"OpenAI provider coming soon."`.
+- `check` also requires even `--num-queries`.
+- `check` stops after lint failures unless `--continue-on-lint-fail` is set.
+- OpenAI provider is implemented via dynamic import so Anthropic-only installs do not crash if optional deps are skipped.
 - Frontmatter is validated with both `gray-matter` and `js-yaml`; malformed YAML should fail fast.
 - Keep file references relative to skill root; out-of-root refs are lint failures.
 - If you modify reporter formatting, ensure JSON mode remains machine-safe.
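The dynamic-import gotcha above can be illustrated with a small loader. This is a sketch, not the package's code. `lazyOptional` and `installHint` are hypothetical names. The sketch also assumes that holding the specifier in a variable is meant to stop a bundler from resolving it at build time. That assumption explains why the bundled `OpenAIProvider` writes `const moduleName = "openai"` before calling `import(moduleName)`.

```javascript
// Sketch of lazy-loading an optional dependency (hypothetical helper, not the package API).
// The specifier is passed in as a variable, mirroring the bundled OpenAIProvider.
function lazyOptional(moduleName, installHint) {
  let cached = null; // module namespace, cached after the first successful import
  return async function load() {
    if (cached) {
      return cached;
    }
    try {
      cached = await import(moduleName);
    } catch {
      // A missing module (or one that throws while loading) becomes one clear error.
      throw new Error(`Optional dependency '${moduleName}' is not installed. ${installHint}`);
    }
    return cached;
  };
}
```

The import only happens the first time `load()` is awaited. An Anthropic-only install therefore never touches the optional module unless something actually asks for it. This roughly matches what `ensureClient` does when `--provider openai` is selected.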
@@ -94,10 +101,10 @@ ANTHROPIC_API_KEY=your-key node dist/index.js trigger test-fixtures/sample-skill
 - Compatibility hints: `src/core/linter/compat.ts`
 - Trigger fake skill pool + scoring: `src/core/trigger-tester.ts`
 - Eval grading schema: `src/core/grader.ts`
+- Combined quality gate orchestration: `src/core/check-runner.ts`
 ## Future Work (Not Implemented Yet)
-- Real OpenAI provider implementation
 - Config file support (`.skilltestrc`)
 - Parallel execution
 - HTML reporting
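Lint is offline, and trigger/eval go through the same `sendMessage` abstraction. A scripted stand-in provider is therefore one way to exercise the pipeline without API keys. `ScriptedProvider` is hypothetical. Whether its canned replies would satisfy the trigger and grader parsers depends on prompt and response formats that this diff does not show.

```javascript
// Hypothetical offline provider matching the documented interface:
//   sendMessage(systemPrompt, userMessage, { model }) => Promise<string>
// It replays scripted replies in order and records each call for later assertions.
class ScriptedProvider {
  constructor(replies) {
    this.name = "scripted"; // runCheck copies provider.name into its result
    this.replies = [...replies];
    this.calls = [];
  }

  async sendMessage(systemPrompt, userMessage, options) {
    // Recorded synchronously, before the returned Promise settles.
    this.calls.push({ systemPrompt, userMessage, model: options.model });
    if (this.replies.length === 0) {
      throw new Error("ScriptedProvider ran out of scripted replies.");
    }
    return this.replies.shift();
  }
}
```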

package/README.md CHANGED

@@ -23,7 +23,7 @@ Agent Skills are quick to write but hard to validate before deployment:
 - You cannot easily measure trigger precision/recall.
 - You do not know whether outputs are good until users exercise the skill.
-`skilltest` closes this gap with one CLI and three modes.
+`skilltest` closes this gap with one CLI and four modes.
 ## Install
@@ -61,12 +61,18 @@ End-to-end eval:
 skilltest eval ./path/to/skill --provider anthropic --model claude-sonnet-4-5-20250929
 ```
+Run full quality gate:
+```bash
+skilltest check ./path/to/skill --provider anthropic --min-f1 0.8 --min-assert-pass-rate 0.9
+```
 Example lint summary:
 ```text
 skilltest lint
 target: ./test-fixtures/sample-skill
-summary: 25/25 checks passed, 0 warnings, 0 failures
+summary: 29/29 checks passed, 0 warnings, 0 failures
 ```
 ## Commands
@@ -153,6 +159,32 @@ Flags:
 - `--api-key <key>` explicit key override
 - `--verbose` show full model responses
+### `skilltest check <path-to-skill>`
+Runs `lint + trigger + eval` in one command and applies quality thresholds.
+Default behavior:
+1. Run lint.
+2. Stop before model calls if lint has failures.
+3. Run trigger and eval only when lint passes.
+4. Fail quality gate when either threshold is below target.
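Steps 1 to 3 of the default behavior can be sketched as a small synchronous control flow. `runCheckFlow` and the `stages` object are hypothetical stand-ins for the real runners (`runLinter`, `runTriggerTest`, `runEval`), which are asynchronous and take more options. The real `runCheck` also has a strict-parse step that this sketch omits.

```javascript
// Hypothetical, synchronous sketch of the default `check` order listed above.
// `stages` stands in for the real lint/trigger/eval runners; it is not the package API.
function runCheckFlow(stages, { continueOnLintFail = false } = {}) {
  const ran = ["lint"];
  const lint = stages.lint(); // step 1: offline lint
  const lintPassed = lint.failures === 0;
  if (!lintPassed && !continueOnLintFail) {
    // Step 2: stop before any model call is made.
    return { ran, lintPassed, triggerF1: null, evalPassRate: null, skippedReason: "lint has failures" };
  }
  // Step 3: model-backed stages run only after lint passes (or when forced).
  ran.push("trigger");
  const triggerF1 = stages.trigger();
  ran.push("eval");
  const evalPassRate = stages.eval();
  return { ran, lintPassed, triggerF1, evalPassRate, skippedReason: null };
}
```

Step 4 then compares the two metrics with `--min-f1` and `--min-assert-pass-rate`. A `null` metric means the stage was skipped. The reporter shows a skipped stage as `SKIP` rather than `FAIL`, but the overall gate still fails.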
+Flags:
+- `--provider <anthropic|openai>` default: `anthropic`
+- `--model <model>` default: `claude-sonnet-4-5-20250929` (auto-switches to `gpt-4.1-mini` for `--provider openai` when unchanged)
+- `--grader-model <model>` default: same as resolved `--model`
+- `--api-key <key>` explicit key override
+- `--queries <path>` custom trigger queries JSON
+- `--num-queries <n>` default: `20` (must be even)
+- `--prompts <path>` custom eval prompts JSON
+- `--min-f1 <n>` default: `0.8`
+- `--min-assert-pass-rate <n>` default: `0.9`
+- `--save-results <path>` save combined check result JSON
+- `--continue-on-lint-fail` continue trigger/eval even if lint fails
+- `--verbose` include detailed trigger/eval sections
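The `--num-queries` rule combines two checks in the bundled `check` command. The zod schema requires an integer of at least 2, and an explicit test then rejects odd values. The sketch below assumes an even count is split in half between should-trigger and should-not-trigger cases, as the error message suggests. `splitQueryBudget` is a hypothetical name.

```javascript
// Hypothetical helper mirroring the `--num-queries` rules enforced by `check`:
// an integer of at least 2 (zod schema) that is also even (explicit check).
function splitQueryBudget(numQueries) {
  if (!Number.isInteger(numQueries) || numQueries < 2) {
    throw new Error("--num-queries must be an integer >= 2.");
  }
  if (numQueries % 2 !== 0) {
    throw new Error("--num-queries must be an even number so the suite can split should/should-not trigger cases.");
  }
  // Half of the generated queries should trigger the skill, half should not.
  return { shouldTrigger: numQueries / 2, shouldNotTrigger: numQueries / 2 };
}
```

The CLI parses the flag with `Number.parseInt(value, 10)`, so `--num-queries 20.5` is truncated to 20 and accepted, while `--num-queries 21` is rejected with exit code 2.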
 ## Global Flags
 - `--help` show help
@@ -195,8 +227,8 @@ Eval prompts (`--prompts`):
 Exit codes:
-- `0`: success with no lint failures
-- `1`: lint failures present
+- `0`: success
+- `1`: quality gate failed (`lint`, `check` thresholds, or command-specific failure conditions)
 - `2`: runtime/config/API/parse error
 JSON mode examples:
@@ -205,6 +237,7 @@ JSON mode examples:
 skilltest lint ./skill --json
 skilltest trigger ./skill --json
 skilltest eval ./skill --json
+skilltest check ./skill --json
 ```
 ## API Keys
@@ -230,7 +263,18 @@ skilltest trigger ./skill --api-key your-key
 Current provider status:
 - `anthropic`: implemented
-- `openai`: interface wired, command currently returns "OpenAI provider coming soon."
+- `openai`: implemented
+OpenAI quick example:
+```bash
+skilltest trigger ./path/to/skill --provider openai --model gpt-4.1-mini
+skilltest eval ./path/to/skill --provider openai --model gpt-4.1-mini
+```
+Note:
+- If you pass `--provider openai` and keep the Anthropic default model value, `skilltest` automatically switches to `gpt-4.1-mini`.
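The auto-switch rule in the note above can be written as one helper that also applies the `--grader-model` fallback. In the bundle, the rule appears three times as `resolveModel`, `resolveModel2`, and `resolveModel3`, apparently one copy per command source file. `resolveModels` is a hypothetical consolidation, not an export of the package, and `"custom-model"` is only a placeholder string.

```javascript
// Sketch of the model-resolution rule described in the note above, combined with
// the `--grader-model` fallback used by `eval` and `check` (hypothetical helper).
const DEFAULT_ANTHROPIC_MODEL = "claude-sonnet-4-5-20250929";
const DEFAULT_OPENAI_MODEL = "gpt-4.1-mini";

function resolveModels(provider, model, graderModel) {
  // Only the Anthropic default value is swapped; any other model string is kept as-is.
  const resolved = provider === "openai" && model === DEFAULT_ANTHROPIC_MODEL ? DEFAULT_OPENAI_MODEL : model;
  return { model: resolved, graderModel: graderModel ?? resolved };
}
```

The rule compares values, not where the value came from. Typing `--provider openai --model claude-sonnet-4-5-20250929` explicitly is therefore also rewritten to `gpt-4.1-mini`.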
 ## CICD Integration
@@ -283,6 +327,7 @@ jobs:
       - run: npm run build
       - run: npx skilltest trigger path/to/skill --num-queries 20 --json
       - run: npx skilltest eval path/to/skill --prompts path/to/prompts.json --json
+      - run: npx skilltest check path/to/skill --min-f1 0.8 --min-assert-pass-rate 0.9 --json
 ```
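The CLI already exits with 1 when the gate fails. A workflow that wants readable annotations can also read the object that `check --save-results <path>` writes. The sketch assumes that file holds the object built by `runCheck` (fields `gates` and `thresholds`). Whether `--json` stdout is byte-for-byte the same object depends on `writeResult`, which this diff does not show. `summarizeCheck` is a hypothetical name.

```javascript
// Sketch: turn a `check` result object into CI log lines and an exit code.
// Field names follow the object built by runCheck in dist/index.js; loading it
// (e.g. JSON.parse on the --save-results file) is left to the caller.
function summarizeCheck(result) {
  // Same tri-state convention as the terminal reporter: null means the stage was skipped.
  const label = (passed) => (passed === null ? "SKIP" : passed ? "PASS" : "FAIL");
  const { gates, thresholds } = result;
  const lines = [
    `lint: ${label(gates.lintPassed)}`,
    `trigger: ${label(gates.triggerPassed)} f1=${gates.triggerF1 ?? "n/a"} (min ${thresholds.minF1})`,
    `eval: ${label(gates.evalPassed)} pass-rate=${gates.evalAssertPassRate ?? "n/a"} (min ${thresholds.minAssertPassRate})`,
  ];
  return { exitCode: gates.overallPassed ? 0 : 1, lines };
}
```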
 ## Local Development
@@ -300,6 +345,7 @@ Smoke tests:
 node dist/index.js lint test-fixtures/sample-skill/
 node dist/index.js trigger test-fixtures/sample-skill/ --num-queries 2
 node dist/index.js eval test-fixtures/sample-skill/ --prompts test-fixtures/eval-prompts.json
+node dist/index.js check test-fixtures/sample-skill/ --num-queries 2 --prompts test-fixtures/eval-prompts.json
 ```
 ## Release Checklist

package/dist/index.js CHANGED

@@ -1076,6 +1076,88 @@ function renderEvalReport(result, enableColor, verbose) {
   }
   return lines.join("\n");
 }
+function gateStatusLabel(value, c) {
+  if (value === null) {
+    return c.yellow("SKIP");
+  }
+  return value ? c.green("PASS") : c.red("FAIL");
+}
+function renderCheckReport(result, enableColor, verbose) {
+  const c = getChalkInstance(enableColor);
+  const lines = [];
+  const lintGate = gateStatusLabel(result.gates.lintPassed, c);
+  const triggerGate = gateStatusLabel(result.gates.triggerPassed, c);
+  const evalGate = gateStatusLabel(result.gates.evalPassed, c);
+  const overallGate = result.gates.overallPassed ? c.green("PASS") : c.red("FAIL");
+  lines.push("skilltest check");
+  lines.push(`target: ${result.target}`);
+  lines.push(`provider/model: ${result.provider}/${result.model}`);
+  lines.push(`grader model: ${result.graderModel}`);
+  lines.push(
+    `thresholds: min-f1=${result.thresholds.minF1.toFixed(2)} min-assert-pass-rate=${result.thresholds.minAssertPassRate.toFixed(2)}`
+  );
+  lines.push("");
+  lines.push("Lint");
+  lines.push(
+    `- ${lintGate} ${result.lint.summary.passed}/${result.lint.summary.total} checks passed (${result.lint.summary.warnings} warnings, ${result.lint.summary.failures} failures)`
+  );
+  const lintIssues = verbose ? result.lint.issues : result.lint.issues.filter((issue) => issue.status !== "pass");
+  for (const issue of lintIssues) {
+    lines.push(renderIssueLine(issue, c));
+  }
+  lines.push("");
+  lines.push("Trigger");
+  if (result.trigger) {
+    lines.push(
+      `- ${triggerGate} f1=${formatPercent(result.trigger.metrics.f1)} (precision=${formatPercent(result.trigger.metrics.precision)} recall=${formatPercent(result.trigger.metrics.recall)})`
+    );
+    lines.push(
+      `  TP ${result.trigger.metrics.truePositives} TN ${result.trigger.metrics.trueNegatives} FP ${result.trigger.metrics.falsePositives} FN ${result.trigger.metrics.falseNegatives}`
+    );
+    const triggerCases = verbose ? result.trigger.cases : result.trigger.cases.filter((testCase) => !testCase.matched);
+    for (const testCase of triggerCases) {
+      const status = testCase.matched ? c.green("PASS") : c.red("FAIL");
+      lines.push(`  - ${status} ${testCase.query}`);
+      lines.push(`    expected=${testCase.expected} actual=${testCase.actual}`);
+    }
+  } else {
+    lines.push(`- ${triggerGate} ${result.triggerSkippedReason ?? "Skipped."}`);
+  }
+  lines.push("");
+  lines.push("Eval");
+  if (result.eval) {
+    const passRate = result.gates.evalAssertPassRate ?? 0;
+    lines.push(
+      `- ${evalGate} assertion pass rate=${formatPercent(passRate)} (${result.eval.summary.passedAssertions}/${result.eval.summary.totalAssertions})`
+    );
+    for (const promptResult of result.eval.results) {
+      const failedAssertions = promptResult.assertions.filter((assertion) => !assertion.passed);
+      if (!verbose && failedAssertions.length === 0) {
+        continue;
+      }
+      lines.push(`  - prompt: ${promptResult.prompt}`);
+      lines.push(`    response summary: ${promptResult.responseSummary.replace(/\s+/g, " ").trim()}`);
+      const assertionsToRender = verbose ? promptResult.assertions : failedAssertions;
+      for (const assertion of assertionsToRender) {
+        const assertionStatus = assertion.passed ? c.green("PASS") : c.red("FAIL");
+        lines.push(`    ${assertionStatus} ${assertion.assertion}`);
+        lines.push(`      evidence: ${assertion.evidence}`);
+      }
+      if (verbose) {
+        lines.push(`    full response: ${promptResult.response}`);
+      }
+    }
+  } else {
+    lines.push(`- ${evalGate} ${result.evalSkippedReason ?? "Skipped."}`);
+  }
+  lines.push("");
+  lines.push("Quality Gate");
+  lines.push(`- lint gate: ${lintGate}`);
+  lines.push(`- trigger gate: ${triggerGate}`);
+  lines.push(`- eval gate: ${evalGate}`);
+  lines.push(`- overall: ${overallGate}`);
+  return lines.join("\n");
+}
 // src/reporters/json.ts
 function renderJson(value) {
@@ -1414,15 +1496,104 @@ var AnthropicProvider = class {
 };
 // src/providers/openai.ts
+function wait2(ms) {
+  return new Promise((resolve) => {
+    setTimeout(resolve, ms);
+  });
+}
+function isRetriableError(error) {
+  if (!error || typeof error !== "object") {
+    return false;
+  }
+  const maybeStatus = error.status;
+  if (maybeStatus === 429 || typeof maybeStatus === "number" && maybeStatus >= 500) {
+    return true;
+  }
+  const maybeCode = error.code;
+  if (typeof maybeCode === "string" && /timeout|econnreset|enotfound|eai_again/i.test(maybeCode)) {
+    return true;
+  }
+  const maybeMessage = error.message;
+  if (typeof maybeMessage === "string" && /(rate limit|timeout|temporarily unavailable|connection)/i.test(maybeMessage)) {
+    return true;
+  }
+  return false;
+}
+function extractTextContent(content) {
+  if (!content) {
+    return "";
+  }
+  if (typeof content === "string") {
+    return content.trim();
+  }
+  const text = content.map((item) => item.type === "text" || !item.type ? item.text ?? "" : "").join("\n").trim();
+  return text;
+}
 var OpenAIProvider = class {
   name = "openai";
-  _apiKey;
+  apiKey;
+  client;
   constructor(apiKey) {
-    this._apiKey = apiKey;
+    this.apiKey = apiKey;
+    this.client = null;
+  }
+  async ensureClient() {
+    if (this.client) {
+      return this.client;
+    }
+    let openAiModule;
+    try {
+      const moduleName = "openai";
+      openAiModule = await import(moduleName);
+    } catch {
+      throw new Error(
+        "OpenAI SDK is not installed. Install optional dependency 'openai' or run 'npm install' with optional dependencies enabled."
+      );
+    }
+    const OpenAIConstructor = openAiModule.default;
+    if (!OpenAIConstructor) {
+      throw new Error("OpenAI SDK loaded but no default client export was found.");
+    }
+    this.client = new OpenAIConstructor({ apiKey: this.apiKey });
+    return this.client;
   }
-  async sendMessage(_systemPrompt, _userMessage, _options) {
-    void this._apiKey;
-    throw new Error("OpenAI provider coming soon.");
+  async sendMessage(systemPrompt, userMessage, options) {
+    const client = await this.ensureClient();
+    let lastError;
+    for (let attempt = 0; attempt < 3; attempt += 1) {
+      try {
+        const response = await client.chat.completions.create({
+          model: options.model,
+          max_tokens: 2048,
+          messages: [
+            {
+              role: "system",
+              content: systemPrompt
+            },
+            {
+              role: "user",
+              content: userMessage
+            }
+          ]
+        });
+        const text = (response.choices ?? []).map((choice) => extractTextContent(choice.message?.content)).join("\n").trim();
+        if (text.length === 0) {
+          throw new Error("Model returned an empty response.");
+        }
+        return text;
+      } catch (error) {
+        lastError = error;
+        if (!isRetriableError(error) || attempt === 2) {
+          break;
+        }
+        const delay = Math.min(4e3, 500 * 2 ** attempt) + Math.floor(Math.random() * 250);
+        await wait2(delay);
+      }
+    }
+    if (lastError instanceof Error) {
+      throw new Error(`OpenAI API call failed: ${lastError.message}`);
+    }
+    throw new Error("OpenAI API call failed with an unknown error.");
   }
 };
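The new `sendMessage` retry loop uses capped exponential backoff with jitter. The delay is `min(4000, 500 * 2^attempt) + floor(random * 250)` ms, over at most 3 attempts. The waits are about 500 to 749 ms after the first failure and 1000 to 1249 ms after the second, with no wait after the final attempt. Total backoff is therefore at most about 2 s, and the 4 s cap would only apply if the attempt count were raised. The helper names below are hypothetical, and unlike the provider, the sketch rethrows the last error without wrapping its message.

```javascript
// Sketch of the retry policy used by the OpenAI provider, factored into helpers.
// Names are hypothetical; constants match the diff (3 attempts, 500 ms base, 4 s cap, <250 ms jitter).
function backoffDelay(attempt, random = Math.random) {
  return Math.min(4000, 500 * 2 ** attempt) + Math.floor(random() * 250);
}

async function retryWithBackoff(fn, { attempts = 3, isRetriable, delayFor = backoffDelay, sleep }) {
  const pause = sleep ?? ((ms) => new Promise((resolve) => setTimeout(resolve, ms)));
  let lastError;
  for (let attempt = 0; attempt < attempts; attempt += 1) {
    try {
      return await fn(attempt);
    } catch (error) {
      lastError = error;
      // Give up on non-retriable errors and after the final attempt, as sendMessage does.
      if (!isRetriable(error) || attempt === attempts - 1) {
        break;
      }
      await pause(delayFor(attempt));
    }
  }
  throw lastError;
}
```

Injecting `sleep` and `delayFor` keeps the policy testable without real timers. In the provider itself, `isRetriable` would be `isRetriableError`. An empty model response also throws inside the loop, but it is not classified as retriable, so it fails immediately.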
@@ -1445,8 +1616,16 @@ var triggerOptionsSchema = z3.object({
   verbose: z3.boolean().optional(),
   apiKey: z3.string().optional()
 });
+var DEFAULT_ANTHROPIC_MODEL = "claude-sonnet-4-5-20250929";
+var DEFAULT_OPENAI_MODEL = "gpt-4.1-mini";
+function resolveModel(provider, model) {
+  if (provider === "openai" && model === DEFAULT_ANTHROPIC_MODEL) {
+    return DEFAULT_OPENAI_MODEL;
+  }
+  return model;
+}
 function registerTriggerCommand(program) {
-  program.command("trigger").description("Evaluate whether a skill description triggers correctly.").argument("<path-to-skill>", "Path to SKILL.md or skill directory").option("--model <model>", "Model to use", "claude-sonnet-4-5-20250929").option("--provider <provider>", "LLM provider: anthropic|openai", "anthropic").option("--queries <path>", "Path to custom test queries JSON").option("--num-queries <n>", "Number of auto-generated queries", (value) => Number.parseInt(value, 10), 20).option("--save-queries <path>", "Save generated queries to a JSON file").option("--api-key <key>", "API key override").option("--verbose", "Show full model decisions").action(async (targetPath, commandOptions, command) => {
+  program.command("trigger").description("Evaluate whether a skill description triggers correctly.").argument("<path-to-skill>", "Path to SKILL.md or skill directory").option("--model <model>", "Model to use", DEFAULT_ANTHROPIC_MODEL).option("--provider <provider>", "LLM provider: anthropic|openai", "anthropic").option("--queries <path>", "Path to custom test queries JSON").option("--num-queries <n>", "Number of auto-generated queries", (value) => Number.parseInt(value, 10), 20).option("--save-queries <path>", "Save generated queries to a JSON file").option("--api-key <key>", "API key override").option("--verbose", "Show full model decisions").action(async (targetPath, commandOptions, command) => {
     const globalOptions = getGlobalCliOptions(command);
     const parsedOptions = triggerOptionsSchema.safeParse(commandOptions);
     if (!parsedOptions.success) {
@@ -1483,8 +1662,9 @@ function registerTriggerCommand(program) {
       if (spinner) {
         spinner.text = "Running trigger simulations...";
       }
+      const model = resolveModel(options.provider, options.model);
       const result = await runTriggerTest(skill, {
-        model: options.model,
+        model,
         provider,
         queries,
         numQueries: options.numQueries,
@@ -1669,8 +1849,16 @@ var evalOptionsSchema = z6.object({
   verbose: z6.boolean().optional(),
   apiKey: z6.string().optional()
 });
+var DEFAULT_ANTHROPIC_MODEL2 = "claude-sonnet-4-5-20250929";
+var DEFAULT_OPENAI_MODEL2 = "gpt-4.1-mini";
+function resolveModel2(provider, model) {
+  if (provider === "openai" && model === DEFAULT_ANTHROPIC_MODEL2) {
+    return DEFAULT_OPENAI_MODEL2;
+  }
+  return model;
+}
 function registerEvalCommand(program) {
-  program.command("eval").description("Run end-to-end skill execution and quality evaluation.").argument("<path-to-skill>", "Path to SKILL.md or skill directory").option("--prompts <path>", "Path to eval prompts JSON").option("--model <model>", "Model to execute prompts", "claude-sonnet-4-5-20250929").option("--grader-model <model>", "Model used for grading (defaults to --model)").option("--provider <provider>", "LLM provider: anthropic|openai", "anthropic").option("--save-results <path>", "Save full evaluation results to JSON").option("--api-key <key>", "API key override").option("--verbose", "Show full model responses").action(async (targetPath, commandOptions, command) => {
+  program.command("eval").description("Run end-to-end skill execution and quality evaluation.").argument("<path-to-skill>", "Path to SKILL.md or skill directory").option("--prompts <path>", "Path to eval prompts JSON").option("--model <model>", "Model to execute prompts", DEFAULT_ANTHROPIC_MODEL2).option("--grader-model <model>", "Model used for grading (defaults to --model)").option("--provider <provider>", "LLM provider: anthropic|openai", "anthropic").option("--save-results <path>", "Save full evaluation results to JSON").option("--api-key <key>", "API key override").option("--verbose", "Show full model responses").action(async (targetPath, commandOptions, command) => {
     const globalOptions = getGlobalCliOptions(command);
     const parsedOptions = evalOptionsSchema.safeParse(commandOptions);
     if (!parsedOptions.success) {
@@ -1704,10 +1892,12 @@ function registerEvalCommand(program) {
       if (spinner) {
         spinner.text = "Running eval prompts and grading responses...";
       }
+      const model = resolveModel2(options.provider, options.model);
+      const graderModel = options.graderModel ?? model;
       const result = await runEval(skill, {
         provider,
-        model: options.model,
-        graderModel: options.graderModel ?? options.model,
+        model,
+        graderModel,
         prompts
       });
       if (options.saveResults) {
@@ -1727,6 +1917,213 @@ function registerEvalCommand(program) {
   });
 }
+// src/commands/check.ts
+import ora3 from "ora";
+import { z as z7 } from "zod";
+// src/core/check-runner.ts
+function calculateEvalAssertPassRate(result) {
+  if (result.summary.totalAssertions === 0) {
+    return 0;
+  }
+  return result.summary.passedAssertions / result.summary.totalAssertions;
+}
+async function runCheck(inputPath, options) {
+  options.onStage?.("lint");
+  const lint = await runLinter(inputPath);
+  const lintPassed = lint.summary.failures === 0;
+  let trigger = null;
+  let evalResult = null;
+  let triggerSkippedReason;
+  let evalSkippedReason;
+  if (!lintPassed && !options.continueOnLintFail) {
+    triggerSkippedReason = "Skipped because lint has failures (use --continue-on-lint-fail to override).";
+    evalSkippedReason = "Skipped because lint has failures (use --continue-on-lint-fail to override).";
+  } else {
+    options.onStage?.("parse");
+    let parsedSkill = null;
+    try {
+      parsedSkill = await parseSkillStrict(inputPath);
+    } catch (error) {
+      const message = error instanceof Error ? error.message : String(error);
+      triggerSkippedReason = `Skipped: skill could not be parsed strictly (${message}).`;
+      evalSkippedReason = `Skipped: skill could not be parsed strictly (${message}).`;
+    }
+    if (parsedSkill) {
+      options.onStage?.("trigger");
+      trigger = await runTriggerTest(parsedSkill, {
+        provider: options.provider,
+        model: options.model,
+        queries: options.queries,
+        numQueries: options.numQueries,
+        verbose: options.verbose
+      });
+      options.onStage?.("eval");
+      evalResult = await runEval(parsedSkill, {
+        provider: options.provider,
+        model: options.model,
+        graderModel: options.graderModel,
+        prompts: options.prompts
+      });
+    }
+  }
+  const triggerF1 = trigger ? trigger.metrics.f1 : null;
+  const evalAssertPassRate = evalResult ? calculateEvalAssertPassRate(evalResult) : null;
+  const triggerPassed = triggerF1 === null ? null : triggerF1 >= options.minF1;
+  const evalPassed = evalAssertPassRate === null ? null : evalAssertPassRate >= options.minAssertPassRate;
+  const overallPassed = lintPassed && triggerPassed === true && evalPassed === true;
+  return {
+    target: inputPath,
+    provider: options.provider.name,
+    model: options.model,
+    graderModel: options.graderModel,
+    thresholds: {
+      minF1: options.minF1,
+      minAssertPassRate: options.minAssertPassRate
+    },
+    continueOnLintFail: options.continueOnLintFail,
+    lint,
+    trigger,
+    eval: evalResult,
+    triggerSkippedReason,
+    evalSkippedReason,
+    gates: {
+      lintPassed,
+      triggerPassed,
+      evalPassed,
+      triggerF1,
+      evalAssertPassRate,
+      overallPassed
+    }
+  };
+}
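The gate arithmetic at the end of `runCheck` reduces to a few comparisons. Take the defaults (`--min-f1 0.8`, `--min-assert-pass-rate 0.9`) and a run with F1 0.85 and 17 of 20 assertions passing. The trigger gate passes (0.85 >= 0.8), the eval gate fails (17/20 = 0.85 < 0.9), and the overall gate fails, so the command exits with 1. The sketch below restates that logic with a hypothetical `evaluateGates` function over plain numbers rather than the real lint/trigger/eval result objects.

```javascript
// Hypothetical restatement of runCheck's gate logic over plain numbers.
// triggerF1 / totalAssertions are absent when the corresponding stage was skipped.
function evaluateGates(stats, thresholds) {
  const lintPassed = stats.lintFailures === 0;
  const triggerF1 = stats.triggerF1 ?? null;
  let evalAssertPassRate = null;
  if (stats.totalAssertions != null) {
    // Mirrors calculateEvalAssertPassRate: zero assertions count as a 0 pass rate.
    evalAssertPassRate = stats.totalAssertions === 0 ? 0 : stats.passedAssertions / stats.totalAssertions;
  }
  const triggerPassed = triggerF1 === null ? null : triggerF1 >= thresholds.minF1;
  const evalPassed = evalAssertPassRate === null ? null : evalAssertPassRate >= thresholds.minAssertPassRate;
  // A skipped (null) gate never counts as a pass for the overall verdict.
  const overallPassed = lintPassed && triggerPassed === true && evalPassed === true;
  return { lintPassed, triggerPassed, evalPassed, triggerF1, evalAssertPassRate, overallPassed };
}
```

A prompts file whose prompts carry no assertions yields a pass rate of 0. That result is reported as `FAIL`, not `SKIP`.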
+// src/commands/check.ts
+var checkOptionsSchema = z7.object({
+  provider: z7.enum(["anthropic", "openai"]),
+  model: z7.string(),
+  graderModel: z7.string().optional(),
+  apiKey: z7.string().optional(),
+  queries: z7.string().optional(),
+  numQueries: z7.number().int().min(2),
+  prompts: z7.string().optional(),
+  minF1: z7.number().min(0).max(1),
+  minAssertPassRate: z7.number().min(0).max(1),
+  saveResults: z7.string().optional(),
+  continueOnLintFail: z7.boolean().optional(),
+  verbose: z7.boolean().optional()
+});
+var DEFAULT_ANTHROPIC_MODEL3 = "claude-sonnet-4-5-20250929";
+var DEFAULT_OPENAI_MODEL3 = "gpt-4.1-mini";
+function resolveModel3(provider, model) {
+  if (provider === "openai" && model === DEFAULT_ANTHROPIC_MODEL3) {
+    return DEFAULT_OPENAI_MODEL3;
+  }
+  return model;
+}
+function registerCheckCommand(program) {
+  program.command("check").description("Run lint + trigger + eval with threshold-based quality gates.").argument("<path-to-skill>", "Path to SKILL.md or skill directory").option("--provider <provider>", "LLM provider: anthropic|openai", "anthropic").option("--model <model>", "Model for trigger/eval runs", DEFAULT_ANTHROPIC_MODEL3).option("--grader-model <model>", "Model used for grading (defaults to --model)").option("--api-key <key>", "API key override").option("--queries <path>", "Path to custom trigger queries JSON").option("--num-queries <n>", "Number of auto-generated trigger queries", (value) => Number.parseInt(value, 10), 20).option("--prompts <path>", "Path to eval prompts JSON").option("--min-f1 <n>", "Minimum required trigger F1 score (0-1)", (value) => Number.parseFloat(value), 0.8).option(
+    "--min-assert-pass-rate <n>",
+    "Minimum required eval assertion pass rate (0-1)",
+    (value) => Number.parseFloat(value),
+    0.9
+  ).option("--save-results <path>", "Save combined check results to JSON").option("--continue-on-lint-fail", "Continue trigger/eval stages even when lint has failures").option("--verbose", "Show detailed trigger/eval output sections").action(async (targetPath, commandOptions, command) => {
+    const globalOptions = getGlobalCliOptions(command);
+    const parsedOptions = checkOptionsSchema.safeParse(commandOptions);
+    if (!parsedOptions.success) {
+      writeError(new Error(parsedOptions.error.issues[0]?.message ?? "Invalid check options."), globalOptions.json);
+      process.exitCode = 2;
+      return;
+    }
+    const options = parsedOptions.data;
+    if (options.numQueries % 2 !== 0) {
+      writeError(
+        new Error("--num-queries must be an even number so the suite can split should/should-not trigger cases."),
+        globalOptions.json
+      );
+      process.exitCode = 2;
+      return;
+    }
+    const spinner = globalOptions.json || !process.stdout.isTTY ? null : ora3("Preparing check run...").start();
+    try {
+      if (spinner) {
+        spinner.text = "Initializing model provider...";
+      }
+      const provider = createProvider(options.provider, options.apiKey);
+      let queries = void 0;
+      if (options.queries) {
+        if (spinner) {
+          spinner.text = "Loading custom trigger queries...";
+        }
+        const loadedQueries = await readJsonFile(options.queries);
+        const parsedQueries = triggerQueryArraySchema.safeParse(loadedQueries);
+        if (!parsedQueries.success) {
+          throw new Error(
+            `Invalid --queries JSON: ${parsedQueries.error.issues[0]?.message ?? "unknown format issue"}`
+          );
+        }
+        queries = parsedQueries.data;
+      }
+      let prompts = void 0;
+      if (options.prompts) {
+        if (spinner) {
+          spinner.text = "Loading eval prompts...";
+        }
+        const loadedPrompts = await readJsonFile(options.prompts);
+        const parsedPrompts = evalPromptArraySchema.safeParse(loadedPrompts);
+        if (!parsedPrompts.success) {
+          throw new Error(
+            `Invalid --prompts JSON: ${parsedPrompts.error.issues[0]?.message ?? "unknown format issue"}`
+          );
+        }
+        prompts = parsedPrompts.data;
+      }
+      const model = resolveModel3(options.provider, options.model);
+      const graderModel = options.graderModel ?? model;
+      const result = await runCheck(targetPath, {
+        provider,
+        model,
+        graderModel,
+        queries,
+        numQueries: options.numQueries,
+        prompts,
+        minF1: options.minF1,
+        minAssertPassRate: options.minAssertPassRate,
+        continueOnLintFail: Boolean(options.continueOnLintFail),
+        verbose: Boolean(options.verbose),
+        onStage: (stage) => {
+          if (!spinner) {
+            return;
+          }
+          if (stage === "lint") {
+            spinner.text = "Running lint checks...";
+          } else if (stage === "parse") {
+            spinner.text = "Parsing skill for model evaluations...";
+          } else if (stage === "trigger") {
+            spinner.text = "Running trigger test suite...";
+          } else if (stage === "eval") {
+            spinner.text = "Running end-to-end eval suite...";
+          }
+        }
+      });
+      if (options.saveResults) {
+        await writeJsonFile(options.saveResults, result);
+      }
+      spinner?.stop();
+      if (globalOptions.json) {
+        writeResult(result, true);
+      } else {
+        writeResult(renderCheckReport(result, globalOptions.color, Boolean(options.verbose)), false);
+      }
+      process.exitCode = result.gates.overallPassed ? 0 : 1;
+    } catch (error) {
+      spinner?.stop();
+      writeError(error, globalOptions.json);
+      process.exitCode = 2;
+    }
+  });
+}
 // src/index.ts
 function resolveVersion() {
   try {
@@ -1745,6 +2142,7 @@ async function run(argv) {
   registerLintCommand(program);
   registerTriggerCommand(program);
   registerEvalCommand(program);
+  registerCheckCommand(program);
   await program.parseAsync(argv);
 }
 run(process.argv).catch((error) => {