npm - @pauly4010/evalai-sdk - Versions diffs - 1.5.7 → 1.6.0 - Mend

@pauly4010/evalai-sdk 1.5.7 → 1.6.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (41)

package/CHANGELOG.md +46 -1
package/README.md +12 -3
package/dist/assertions.d.ts +11 -11
package/dist/assertions.js +1 -1
package/dist/batch.d.ts +3 -3
package/dist/batch.js +1 -1
package/dist/cache.d.ts +3 -3
package/dist/cache.js +1 -1
package/dist/cli/baseline.d.ts +10 -0
package/dist/cli/baseline.js +172 -0
package/dist/cli/formatters/github.js +1 -1
package/dist/cli/formatters/human.js +1 -1
package/dist/cli/formatters/pr-comment.js +1 -1
package/dist/cli/index.js +20 -4
package/dist/cli/regression-gate.d.ts +11 -0
package/dist/cli/regression-gate.js +150 -0
package/dist/client.d.ts +3 -3
package/dist/client.js +3 -2
package/dist/client.request.test.d.ts +1 -0
package/dist/client.request.test.js +157 -0
package/dist/context.d.ts +4 -4
package/dist/context.js +1 -1
package/dist/errors.d.ts +5 -5
package/dist/errors.js +21 -24
package/dist/export.d.ts +1 -1
package/dist/export.js +4 -2
package/dist/index.d.ts +1 -0
package/dist/index.js +7 -1
package/dist/integrations/openai-eval.js +1 -1
package/dist/logger.d.ts +10 -10
package/dist/pagination.d.ts +2 -2
package/dist/regression.d.ts +100 -0
package/dist/regression.js +44 -0
package/dist/snapshot.d.ts +3 -3
package/dist/streaming.d.ts +4 -4
package/dist/testing.d.ts +1 -1
package/dist/types.d.ts +33 -33
package/dist/version.d.ts +1 -1
package/dist/version.js +1 -1
package/dist/workflows.d.ts +29 -18
package/package.json +7 -3

package/CHANGELOG.md CHANGED

@@ -5,6 +5,51 @@ All notable changes to the @pauly4010/evalai-sdk package will be documented in t
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## [1.6.0] - 2026-02-24
+### ✨ Added
+#### CLI — Regression Gate & Baseline Management
+- **`evalai baseline init`** — Create a starter `evals/baseline.json` with sample values and provenance metadata
+- **`evalai baseline update`** — Run confidence tests, golden eval, and latency benchmark, then update baseline with real scores
+- **`evalai gate`** — Run the local regression gate with proper exit code taxonomy (0=pass, 1=regression, 2=infra_error, 3=confidence_failed, 4=confidence_missing)
+- **`evalai gate --format json`** — Output `evals/regression-report.json` as machine-readable JSON to stdout
+- **`evalai gate --format github`** — Output GitHub Step Summary markdown with delta table
+#### SDK Exports — Regression Gate Constants & Types
+- **`GATE_EXIT`** — Exit code constants (`PASS`, `REGRESSION`, `INFRA_ERROR`, `CONFIDENCE_FAILED`, `CONFIDENCE_MISSING`)
+- **`GATE_CATEGORY`** — Report category constants (`pass`, `regression`, `infra_error`)
+- **`REPORT_SCHEMA_VERSION`** — Current schema version for `regression-report.json`
+- **`ARTIFACTS`** — Well-known artifact paths (`BASELINE`, `REGRESSION_REPORT`, `CONFIDENCE_SUMMARY`, `LATENCY_BENCHMARK`)
+- **Types**: `RegressionReport`, `RegressionDelta`, `Baseline`, `BaselineTolerance`, `GateExitCode`, `GateCategory`
+- **Subpath export**: `@pauly4010/evalai-sdk/regression` for tree-shakeable imports
+### 🔧 Changed
+- CLI help text updated to include `baseline` and `gate` commands
+- SDK becomes the public contract for regression gate — scripts are implementation detail
+---
+## [1.5.8] - 2026-02-22
+### 🐛 Fixed
+- **secureRoute TypeScript overload compatibility** — Fixed implementation signature to use `ctx: any` for proper overload compatibility
+- **Test infrastructure fixes** — Replaced invalid `expect.unknown()` with `expect.any()` across test files
+- **NextRequest constructor** — Fixed test mocks using incorrect `(NextRequest as any)()` syntax
+- **304 response handling** — Fixed exports API returning invalid 304 response with body
+- **Error catalog tests** — Updated test expectations to match actual EvalAIError behavior
+- **Redis cache timeout** — Added explicit timeout to prevent test hangs
+### 🔧 Changed
+- **Biome formatting** — Applied consistent line endings across 199 files
+---
 ## [1.5.7] - 2026-02-20
 ### 📚 Documentation
@@ -32,7 +77,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - **`--warnDrop <n>`** — Introduce a WARN band when score drops > `warnDrop` but < `maxDrop`
 - **Gate verdicts:** PASS, WARN, FAIL
 - **Profiles:** `strict` (warnDrop: 0), `balanced` (warnDrop: 1), `fast` (warnDrop: 2)
-- **`--fail-on-flake`** — Fail the gate if any case is flagged as flaky (partial pass rate across determinism runs)
+- **`--fail-on-flake`** — Fail the gate if unknown case is flagged as flaky (partial pass rate across determinism runs)
 #### Determinism & flake intelligence
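
The 1.6.0 entry above lists five exit codes for `evalai gate`. As a rough sketch of how a CI wrapper might branch on them, the block below uses a local mirror of those numbers so it runs without the package installed. The names `GATE_EXIT_SKETCH` and `describeGateExit` are hypothetical, and the `retryable` flag is an illustrative policy, not something the changelog specifies. In real code the constants would presumably come from the `@pauly4010/evalai-sdk/regression` subpath export.

```typescript
// Local mirror of the exit codes listed in the 1.6.0 changelog entry.
// Assumption: the SDK's own GATE_EXIT constant carries the same numbers.
const GATE_EXIT_SKETCH = {
  PASS: 0,
  REGRESSION: 1,
  INFRA_ERROR: 2,
  CONFIDENCE_FAILED: 3,
  CONFIDENCE_MISSING: 4,
} as const;

// Hypothetical helper: turn a raw exit code into a label plus a retry hint.
function describeGateExit(code: number): { label: string; retryable: boolean } {
  switch (code) {
    case GATE_EXIT_SKETCH.PASS:
      return { label: "pass", retryable: false };
    case GATE_EXIT_SKETCH.REGRESSION:
      return { label: "regression", retryable: false };
    case GATE_EXIT_SKETCH.INFRA_ERROR:
      // Illustrative policy: only infrastructure failures are worth retrying.
      return { label: "infra_error", retryable: true };
    case GATE_EXIT_SKETCH.CONFIDENCE_FAILED:
      return { label: "confidence_failed", retryable: false };
    case GATE_EXIT_SKETCH.CONFIDENCE_MISSING:
      return { label: "confidence_missing", retryable: false };
    default:
      // Codes outside the documented taxonomy are reported, not guessed at.
      return { label: "unknown", retryable: false };
  }
}
```

Keeping the default branch separate from `infra_error` means an unexpected code from a future release is surfaced rather than silently retried.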

package/README.md CHANGED

@@ -99,7 +99,7 @@ Key flags
 --maxDrop → hard regression fail
---fail-on-flake → fail if any test is unstable
+--fail-on-flake → fail if unknown test is unstable
 This lets teams tune signal vs noise in CI.
@@ -190,7 +190,7 @@ Option	Description
 --allowWeakEvidence	Permit weak evidence
 --policy <name>	HIPAA, SOC2, GDPR, PCI_DSS, FINRA_4511
 --baseline <mode>	published, previous, production
---fail-on-flake	Fail if any case is flaky
+--fail-on-flake	Fail if unknown case is flaky
 --baseUrl <url>	Override API base URL
 Exit codes
@@ -257,7 +257,16 @@ await openai.chat.completions.create({
 🧭 Changelog
-v1.5.7 (Latest)
+v1.5.8 (Latest)
+Fixed secureRoute TypeScript overload compatibility
+Fixed test infrastructure (expect.any, NextRequest constructor)
+Fixed 304 response handling in exports API
+Improved error catalog test coverage
+v1.5.7
 Documentation updates for CJS compatibility
 Version alignment across README and changelog

package/dist/assertions.d.ts CHANGED

@@ -17,26 +17,26 @@
 export interface AssertionResult {
     name: string;
     passed: boolean;
-    expected: any;
-    actual: any;
+    expected: unknown;
+    actual: unknown;
     message?: string;
 }
 export declare class AssertionError extends Error {
-    expected: any;
-    actual: any;
-    constructor(message: string, expected: any, actual: any);
+    expected: unknown;
+    actual: unknown;
+    constructor(message: string, expected: unknown, actual: unknown);
 }
 /**
  * Fluent assertion builder
  */
 export declare class Expectation {
     private value;
-    constructor(value: any);
+    constructor(value: unknown);
     /**
      * Assert value equals expected
      * @example expect(output).toEqual("Hello")
      */
-    toEqual(expected: any, message?: string): AssertionResult;
+    toEqual(expected: unknown, message?: string): AssertionResult;
     /**
      * Assert value contains substring
      * @example expect(output).toContain("help")
@@ -71,7 +71,7 @@ export declare class Expectation {
      * Assert JSON matches schema
      * @example expect(output).toMatchJSON({ status: 'success' })
      */
-    toMatchJSON(schema: Record<string, any>, message?: string): AssertionResult;
+    toMatchJSON(schema: Record<string, unknown>, message?: string): AssertionResult;
     /**
      * Assert value has expected sentiment
      * @example expect(output).toHaveSentiment('positive')
@@ -148,7 +148,7 @@ export declare class Expectation {
  * expect(output).toHaveLength({ min: 10, max: 100 });
  * ```
  */
-export declare function expect(value: any): Expectation;
+export declare function expect(value: unknown): Expectation;
 /**
  * Run multiple assertions and collect results
  *
@@ -178,12 +178,12 @@ export declare function withinRange(value: number, min: number, max: number): bo
 export declare function isValidEmail(email: string): boolean;
 export declare function isValidURL(url: string): boolean;
 export declare function hasNoHallucinations(text: string, groundTruth: string[]): boolean;
-export declare function matchesSchema(value: any, schema: Record<string, any>): boolean;
+export declare function matchesSchema(value: unknown, schema: Record<string, unknown>): boolean;
 export declare function hasReadabilityScore(text: string, minScore: number): boolean;
 export declare function containsLanguage(text: string, language: string): boolean;
 export declare function hasFactualAccuracy(text: string, facts: string[]): boolean;
 export declare function respondedWithinTime(startTime: number, maxMs: number): boolean;
 export declare function hasNoToxicity(text: string): boolean;
 export declare function followsInstructions(text: string, instructions: string[]): boolean;
-export declare function containsAllRequiredFields(obj: any, requiredFields: string[]): boolean;
+export declare function containsAllRequiredFields(obj: unknown, requiredFields: string[]): boolean;
 export declare function hasValidCodeSyntax(code: string, language: string): boolean;
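
The declarations above replace `any` with `unknown` throughout, which means callers can no longer use `expected` and `actual` without narrowing them first. The sketch below shows one way a consumer might adapt. The interface is a local copy of the 1.6.0 shape, and `renderValue` and `summarize` are hypothetical helpers, not part of the SDK.

```typescript
// Local copy of the 1.6.0 declaration shape from assertions.d.ts.
interface AssertionResult {
  name: string;
  passed: boolean;
  expected: unknown;
  actual: unknown;
  message?: string;
}

// Hypothetical helper: render a value of unknown type for a log line.
function renderValue(value: unknown): string {
  if (typeof value === "string") return JSON.stringify(value);
  if (typeof value === "number" || typeof value === "boolean") return String(value);
  if (value === null || value === undefined) return String(value);
  try {
    // JSON.stringify can return undefined at runtime (e.g. for functions).
    return JSON.stringify(value) ?? String(value);
  } catch {
    return "[unserializable]"; // e.g. circular structures or BigInt values
  }
}

// Hypothetical helper: one-line summary of an assertion result.
function summarize(result: AssertionResult): string {
  const status = result.passed ? "PASS" : "FAIL";
  return `${status} ${result.name}: expected ${renderValue(result.expected)}, got ${renderValue(result.actual)}`;
}
```

Code that previously read `result.expected.someField` directly would stop compiling against 1.6.0 and would need a guard of this kind.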

package/dist/assertions.js CHANGED

@@ -612,7 +612,7 @@ function followsInstructions(text, instructions) {
     });
 }
 function containsAllRequiredFields(obj, requiredFields) {
-    return requiredFields.every((field) => field in obj);
+    return requiredFields.every((field) => obj && typeof obj === "object" && field in obj);
 }
 function hasValidCodeSyntax(code, language) {
     // This is a simplified implementation
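
The guard added to `containsAllRequiredFields` matters because the `in` operator throws a `TypeError` when its right-hand side is `null`, `undefined`, or a primitive. A minimal re-statement of the guarded check, written with an explicit null test and under the hypothetical name `containsAllRequiredFieldsSketch`, could look like this:

```typescript
// Illustrative re-statement of the guarded check; not the package source.
function containsAllRequiredFieldsSketch(obj: unknown, requiredFields: string[]): boolean {
  return requiredFields.every(
    // Only objects can be probed with `in`; everything else fails the check.
    (field) => typeof obj === "object" && obj !== null && field in obj,
  );
}
```

With this shape, a `null` input and a non-empty field list yields `false` instead of an exception. An empty field list still yields `true` for any input, because `every` is vacuously true on an empty array.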

package/dist/batch.d.ts CHANGED

@@ -6,13 +6,13 @@ export interface BatchRequest {
     id: string;
     method: string;
     endpoint: string;
-    body?: any;
+    body?: unknown;
     headers?: Record<string, string>;
 }
 export interface BatchResponse {
     id: string;
     status: number;
-    data?: any;
+    data?: unknown;
     error?: string;
 }
 /**
@@ -32,7 +32,7 @@ export declare class RequestBatcher {
     /**
      * Add request to batch queue
      */
-    enqueue(method: string, endpoint: string, body?: any, headers?: Record<string, string>): Promise<any>;
+    enqueue(method: string, endpoint: string, body?: unknown, headers?: Record<string, string>): Promise<unknown>;
     /**
      * Schedule batch processing after delay
      */

package/dist/batch.js CHANGED

@@ -85,7 +85,7 @@ class RequestBatcher {
                     }
                 }
             }
-            // Handle any requests that didn't get a response
+            // Handle unknown requests that didn't get a response
             for (const item of batch) {
                 if (!responses.find((r) => r.id === item.id)) {
                     item.reject(new Error("No response received for request"));

package/dist/cache.d.ts CHANGED

@@ -17,15 +17,15 @@ export declare class RequestCache {
     /**
      * Get cached response if valid
      */
-    get<T>(method: string, url: string, params?: any): T | null;
+    get<T>(method: string, url: string, params?: unknown): T | null;
     /**
      * Store response in cache
      */
-    set<T>(method: string, url: string, data: T, ttl: number, params?: any): void;
+    set<T>(method: string, url: string, data: T, ttl: number, params?: unknown): void;
     /**
      * Invalidate specific cache entry
      */
-    invalidate(method: string, url: string, params?: any): void;
+    invalidate(method: string, url: string, params?: unknown): void;
     /**
      * Invalidate all cache entries matching a pattern
      */

package/dist/cache.js CHANGED

@@ -69,7 +69,7 @@ class RequestCache {
      * Invalidate all cache entries matching a pattern
      */
     invalidatePattern(pattern) {
-        for (const key of this.cache.keys()) {
+        for (const key of Array.from(this.cache.keys())) {
             if (key.includes(pattern)) {
                 this.cache.delete(key);
             }
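
The change above wraps `this.cache.keys()` in `Array.from`, so the loop walks a snapshot of the keys rather than the live iterator. The diff does not say why. One plausible reason (an assumption) is compatibility with compile targets that cannot iterate a `Map` iterator directly in `for...of`. The sketch below reproduces the pattern on a stand-in class, `PatternCacheSketch`, which is hypothetical and much smaller than the real `RequestCache`.

```typescript
// Minimal stand-in for the cache's key store; the real class holds more state.
class PatternCacheSketch {
  private cache = new Map<string, string>();

  set(key: string, value: string): void {
    this.cache.set(key, value);
  }

  keys(): string[] {
    return Array.from(this.cache.keys());
  }

  // Snapshot the keys first, then delete matches from the live map.
  invalidatePattern(pattern: string): void {
    for (const key of Array.from(this.cache.keys())) {
      if (key.includes(pattern)) {
        this.cache.delete(key);
      }
    }
  }
}
```

Matching is a plain substring test, so a pattern such as `/users` also removes entries for `/users/1`.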

package/dist/cli/baseline.d.ts ADDED

@@ -0,0 +1,10 @@
+/**
+ * evalai baseline — Baseline management commands
+ *
+ * Subcommands:
+ *   evalai baseline init    — Create a starter evals/baseline.json
+ *   evalai baseline update  — Run tests + update baseline with real scores
+ */
+export declare function runBaselineInit(cwd: string): number;
+export declare function runBaselineUpdate(cwd: string): number;
+export declare function runBaseline(argv: string[]): number;

package/dist/cli/baseline.js ADDED

@@ -0,0 +1,172 @@
+"use strict";
+/**
+ * evalai baseline — Baseline management commands
+ *
+ * Subcommands:
+ *   evalai baseline init    — Create a starter evals/baseline.json
+ *   evalai baseline update  — Run tests + update baseline with real scores
+ */
+var __createBinding = (this && this.__createBinding) || (Object.create ? (function(o, m, k, k2) {
+    if (k2 === undefined) k2 = k;
+    var desc = Object.getOwnPropertyDescriptor(m, k);
+    if (!desc || ("get" in desc ? !m.__esModule : desc.writable || desc.configurable)) {
+      desc = { enumerable: true, get: function() { return m[k]; } };
+    }
+    Object.defineProperty(o, k2, desc);
+}) : (function(o, m, k, k2) {
+    if (k2 === undefined) k2 = k;
+    o[k2] = m[k];
+}));
+var __setModuleDefault = (this && this.__setModuleDefault) || (Object.create ? (function(o, v) {
+    Object.defineProperty(o, "default", { enumerable: true, value: v });
+}) : function(o, v) {
+    o["default"] = v;
+});
+var __importStar = (this && this.__importStar) || (function () {
+    var ownKeys = function(o) {
+        ownKeys = Object.getOwnPropertyNames || function (o) {
+            var ar = [];
+            for (var k in o) if (Object.prototype.hasOwnProperty.call(o, k)) ar[ar.length] = k;
+            return ar;
+        };
+        return ownKeys(o);
+    };
+    return function (mod) {
+        if (mod && mod.__esModule) return mod;
+        var result = {};
+        if (mod != null) for (var k = ownKeys(mod), i = 0; i < k.length; i++) if (k[i] !== "default") __createBinding(result, mod, k[i]);
+        __setModuleDefault(result, mod);
+        return result;
+    };
+})();
+Object.defineProperty(exports, "__esModule", { value: true });
+exports.runBaselineInit = runBaselineInit;
+exports.runBaselineUpdate = runBaselineUpdate;
+exports.runBaseline = runBaseline;
+const node_child_process_1 = require("node:child_process");
+const fs = __importStar(require("node:fs"));
+const path = __importStar(require("node:path"));
+const BASELINE_REL = "evals/baseline.json";
+/** Detect the package manager used in the project */
+function detectPackageManager(cwd) {
+    if (fs.existsSync(path.join(cwd, "pnpm-lock.yaml")))
+        return "pnpm";
+    if (fs.existsSync(path.join(cwd, "yarn.lock")))
+        return "yarn";
+    return "npm";
+}
+/** Run an npm script via the detected package manager */
+function runScript(cwd, scriptName) {
+    const pm = detectPackageManager(cwd);
+    const isWin = process.platform === "win32";
+    const result = (0, node_child_process_1.spawnSync)(pm, ["run", scriptName], {
+        cwd,
+        stdio: "inherit",
+        shell: isWin,
+    });
+    return result.status ?? 1;
+}
+function runBaselineInit(cwd) {
+    const baselinePath = path.join(cwd, BASELINE_REL);
+    if (fs.existsSync(baselinePath)) {
+        console.log(`⚠ ${BASELINE_REL} already exists. Delete it first or use 'evalai baseline update'.`);
+        return 1;
+    }
+    // Ensure evals/ directory exists
+    const evalsDir = path.join(cwd, "evals");
+    if (!fs.existsSync(evalsDir)) {
+        fs.mkdirSync(evalsDir, { recursive: true });
+    }
+    const user = process.env.USER || process.env.USERNAME || "unknown";
+    const now = new Date().toISOString();
+    const baseline = {
+        schemaVersion: 1,
+        description: "Regression gate baseline — created by evalai baseline init",
+        generatedAt: now,
+        generatedBy: user,
+        commitSha: "0000000",
+        updatedAt: now,
+        updatedBy: user,
+        tolerance: {
+            scoreDrop: 5,
+            passRateDrop: 5,
+            maxLatencyIncreaseMs: 200,
+            maxCostIncreaseUsd: 0.05,
+        },
+        goldenEval: {
+            score: 100,
+            passRate: 100,
+            totalCases: 3,
+            passedCases: 3,
+        },
+        qualityScore: {
+            overall: 90,
+            grade: "A",
+            accuracy: 85,
+            safety: 100,
+            latency: 90,
+            cost: 90,
+            consistency: 90,
+        },
+        confidenceTests: {
+            unitPassed: true,
+            unitTotal: 0,
+            dbPassed: true,
+            dbTotal: 0,
+        },
+        productMetrics: {},
+    };
+    fs.writeFileSync(baselinePath, `${JSON.stringify(baseline, null, 2)}\n`);
+    console.log(`✅ Created ${BASELINE_REL} with sample values\n`);
+    console.log("Next steps:");
+    console.log(`  1. Commit ${BASELINE_REL} to your repo`);
+    console.log("  2. Run 'evalai baseline update' to populate with real scores");
+    console.log("  3. Run 'evalai gate' to verify the regression gate\n");
+    return 0;
+}
+// ── baseline update ──
+function runBaselineUpdate(cwd) {
+    // Check if eval:baseline-update script exists in package.json
+    const pkgPath = path.join(cwd, "package.json");
+    if (!fs.existsSync(pkgPath)) {
+        console.error("❌ No package.json found. Run this from your project root.");
+        return 1;
+    }
+    let pkg;
+    try {
+        pkg = JSON.parse(fs.readFileSync(pkgPath, "utf-8"));
+    }
+    catch {
+        console.error("❌ Failed to parse package.json");
+        return 1;
+    }
+    if (!pkg.scripts?.["eval:baseline-update"]) {
+        console.error("❌ Missing 'eval:baseline-update' script in package.json.");
+        console.error("   Add it:  \"eval:baseline-update\": \"npx tsx scripts/regression-gate.ts --update-baseline\"");
+        return 1;
+    }
+    console.log("📊 Running baseline update...\n");
+    return runScript(cwd, "eval:baseline-update");
+}
+// ── baseline router ──
+function runBaseline(argv) {
+    const sub = argv[0];
+    const cwd = process.cwd();
+    if (sub === "init") {
+        return runBaselineInit(cwd);
+    }
+    if (sub === "update") {
+        return runBaselineUpdate(cwd);
+    }
+    console.log(`evalai baseline — Manage regression gate baselines
+Usage:
+  evalai baseline init     Create starter ${BASELINE_REL}
+  evalai baseline update   Run tests and update baseline with real scores
+Examples:
+  evalai baseline init
+  evalai baseline update
+`);
+    return sub === "--help" || sub === "-h" ? 0 : 1;
+}
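
The `detectPackageManager` helper in this file checks lockfiles in a fixed order: `pnpm-lock.yaml` first, then `yarn.lock`, then a fallback to `npm`. To make that precedence easy to exercise, the sketch below restates it with an injected `exists` callback in place of `fs.existsSync`. The callback and the name `pickPackageManager` are hypothetical.

```typescript
type PackageManager = "pnpm" | "yarn" | "npm";

// `exists` is a hypothetical injection point standing in for fs.existsSync,
// so the lookup order can be tested without touching the filesystem.
function pickPackageManager(exists: (file: string) => boolean): PackageManager {
  if (exists("pnpm-lock.yaml")) return "pnpm";
  if (exists("yarn.lock")) return "yarn";
  return "npm"; // fallback when neither lockfile is present
}

// Usage: a project that happens to contain both lockfiles resolves to pnpm.
const lockfiles = new Set(["yarn.lock", "pnpm-lock.yaml"]);
const chosen = pickPackageManager((file) => lockfiles.has(file));
```

Because `npm` is only the fallback, a project using any other tool would also resolve to `npm` under this logic.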

package/dist/cli/formatters/github.js CHANGED

@@ -81,7 +81,7 @@ function appendStepSummary(report) {
             const exp = (0, snippet_1.truncateSnippet)(fc.expectedOutput ?? fc.expectedSnippet, 80);
             const out = (0, snippet_1.truncateSnippet)(fc.output ?? fc.outputSnippet, 80);
             const reason = out ? `got "${out}"` : "no output";
-            lines.push(`- **${(0, snippet_1.truncateSnippet)(label, 60)}** — expected: ${exp || "(any)"}, ${reason}`);
+            lines.push(`- **${(0, snippet_1.truncateSnippet)(label, 60)}** — expected: ${exp || "(unknown)"}, ${reason}`);
         }
         if (failedCases.length > 10) {
             lines.push(`- _+ ${failedCases.length - 10} more_`);

package/dist/cli/formatters/human.js CHANGED

@@ -30,7 +30,7 @@ function formatHuman(report) {
             const exp = (0, snippet_1.truncateSnippet)(fc.expectedOutput ?? fc.expectedSnippet, 50);
             const out = (0, snippet_1.truncateSnippet)(fc.output ?? fc.outputSnippet, 50);
             const reason = out ? `got "${out}"` : "no output";
-            lines.push(`  - "${(0, snippet_1.truncateSnippet)(label, 50)}" → expected: ${exp || "(any)"}, ${reason}`);
+            lines.push(`  - "${(0, snippet_1.truncateSnippet)(label, 50)}" → expected: ${exp || "(unknown)"}, ${reason}`);
         }
         if (failedCases.length > toShow.length) {
             lines.push(`  + ${failedCases.length - toShow.length} more`);

package/dist/cli/formatters/pr-comment.js CHANGED

@@ -49,7 +49,7 @@ function buildPrComment(report) {
         lines.push(`_${escapeMarkdown(report.reasonMessage)}_`);
     }
     lines.push("");
-    // Policy (if any)
+    // Policy (if unknown)
     if (report.policy) {
         lines.push(`**Policy:** ${report.policy}`);
         lines.push("");

package/dist/cli/index.js CHANGED

@@ -8,9 +8,11 @@
  *   evalai check  — CI/CD evaluation gate (see evalai check --help)
  */
 Object.defineProperty(exports, "__esModule", { value: true });
+const baseline_1 = require("./baseline");
 const check_1 = require("./check");
 const doctor_1 = require("./doctor");
 const init_1 = require("./init");
+const regression_gate_1 = require("./regression-gate");
 const share_1 = require("./share");
 const argv = process.argv.slice(2);
 const subcommand = argv[0];
@@ -19,6 +21,14 @@ if (subcommand === "init") {
     const ok = (0, init_1.runInit)(cwd);
     process.exit(ok ? 0 : 1);
 }
+else if (subcommand === "baseline") {
+    const code = (0, baseline_1.runBaseline)(argv.slice(1));
+    process.exit(code);
+}
+else if (subcommand === "gate") {
+    const code = (0, regression_gate_1.runGate)(argv.slice(1));
+    process.exit(code);
+}
 else if (subcommand === "doctor") {
     (0, doctor_1.runDoctor)(argv.slice(1))
         .then((code) => process.exit(code))
@@ -57,10 +67,16 @@ else {
     console.log(`EvalAI CLI
 Usage:
-  evalai init              Create evalai.config.json
-  evalai doctor [options]  Verify CI/CD setup (same endpoint as check)
-  evalai check [options]   CI/CD evaluation gate
-  evalai share [options]   Create share link for a run
+  evalai init                Create evalai.config.json
+  evalai baseline init       Create starter evals/baseline.json
+  evalai baseline update     Run tests and update baseline with real scores
+  evalai gate [options]      Run regression gate (local test-based)
+  evalai doctor [options]    Verify CI/CD setup (same endpoint as check)
+  evalai check [options]     CI/CD evaluation gate (API-based)
+  evalai share [options]     Create share link for a run
+Options for gate:
+  --format <fmt>      Output format: human (default), json, github
 Options for check:
   --evaluationId <id>  Evaluation to gate on (or from config)

package/dist/cli/regression-gate.d.ts ADDED

@@ -0,0 +1,11 @@
+/**
+ * evalai gate — Run the regression gate
+ *
+ * Delegates to the project's eval:regression-gate npm script.
+ * Supports --format json to output the regression-report.json contents.
+ */
+export interface GateArgs {
+    format: "human" | "json" | "github";
+}
+export declare function parseGateArgs(argv: string[]): GateArgs;
+export declare function runGate(argv: string[]): number;

package/dist/cli/regression-gate.js ADDED

@@ -0,0 +1,150 @@
+"use strict";
+/**
+ * evalai gate — Run the regression gate
+ *
+ * Delegates to the project's eval:regression-gate npm script.
+ * Supports --format json to output the regression-report.json contents.
+ */
+var __createBinding = (this && this.__createBinding) || (Object.create ? (function(o, m, k, k2) {
+    if (k2 === undefined) k2 = k;
+    var desc = Object.getOwnPropertyDescriptor(m, k);
+    if (!desc || ("get" in desc ? !m.__esModule : desc.writable || desc.configurable)) {
+      desc = { enumerable: true, get: function() { return m[k]; } };
+    }
+    Object.defineProperty(o, k2, desc);
+}) : (function(o, m, k, k2) {
+    if (k2 === undefined) k2 = k;
+    o[k2] = m[k];
+}));
+var __setModuleDefault = (this && this.__setModuleDefault) || (Object.create ? (function(o, v) {
+    Object.defineProperty(o, "default", { enumerable: true, value: v });
+}) : function(o, v) {
+    o["default"] = v;
+});
+var __importStar = (this && this.__importStar) || (function () {
+    var ownKeys = function(o) {
+        ownKeys = Object.getOwnPropertyNames || function (o) {
+            var ar = [];
+            for (var k in o) if (Object.prototype.hasOwnProperty.call(o, k)) ar[ar.length] = k;
+            return ar;
+        };
+        return ownKeys(o);
+    };
+    return function (mod) {
+        if (mod && mod.__esModule) return mod;
+        var result = {};
+        if (mod != null) for (var k = ownKeys(mod), i = 0; i < k.length; i++) if (k[i] !== "default") __createBinding(result, mod, k[i]);
+        __setModuleDefault(result, mod);
+        return result;
+    };
+})();
+Object.defineProperty(exports, "__esModule", { value: true });
+exports.parseGateArgs = parseGateArgs;
+exports.runGate = runGate;
+const node_child_process_1 = require("node:child_process");
+const fs = __importStar(require("node:fs"));
+const path = __importStar(require("node:path"));
+const REPORT_REL = "evals/regression-report.json";
+/** Detect the package manager used in the project */
+function detectPackageManager(cwd) {
+    if (fs.existsSync(path.join(cwd, "pnpm-lock.yaml")))
+        return "pnpm";
+    if (fs.existsSync(path.join(cwd, "yarn.lock")))
+        return "yarn";
+    return "npm";
+}
+function parseGateArgs(argv) {
+    const args = { format: "human" };
+    for (let i = 0; i < argv.length; i++) {
+        if (argv[i] === "--format" && argv[i + 1]) {
+            const fmt = argv[i + 1];
+            if (fmt === "json" || fmt === "github" || fmt === "human") {
+                args.format = fmt;
+            }
+            i++;
+        }
+    }
+    return args;
+}
+function runGate(argv) {
+    const cwd = process.cwd();
+    const args = parseGateArgs(argv);
+    // Check if eval:regression-gate script exists
+    const pkgPath = path.join(cwd, "package.json");
+    if (!fs.existsSync(pkgPath)) {
+        console.error("❌ No package.json found. Run this from your project root.");
+        return 1;
+    }
+    let pkg;
+    try {
+        pkg = JSON.parse(fs.readFileSync(pkgPath, "utf-8"));
+    }
+    catch {
+        console.error("❌ Failed to parse package.json");
+        return 1;
+    }
+    if (!pkg.scripts?.["eval:regression-gate"]) {
+        console.error("❌ Missing 'eval:regression-gate' script in package.json.");
+        console.error('   Add it:  "eval:regression-gate": "npx tsx scripts/regression-gate.ts"');
+        return 1;
+    }
+    const pm = detectPackageManager(cwd);
+    const isWin = process.platform === "win32";
+    // For json format, suppress human output and print report JSON
+    const stdio = args.format === "json" ? "pipe" : "inherit";
+    const result = (0, node_child_process_1.spawnSync)(pm, ["run", "eval:regression-gate"], {
+        cwd,
+        stdio: stdio,
+        shell: isWin,
+    });
+    const exitCode = result.status ?? 1;
+    if (args.format === "json") {
+        // Output the regression report as JSON
+        const reportPath = path.join(cwd, REPORT_REL);
+        if (fs.existsSync(reportPath)) {
+            const report = fs.readFileSync(reportPath, "utf-8");
+            process.stdout.write(report);
+        }
+        else {
+            console.error(JSON.stringify({ error: "regression-report.json not found", exitCode }));
+        }
+    }
+    else if (args.format === "github") {
+        // Output GitHub Step Summary markdown
+        const reportPath = path.join(cwd, REPORT_REL);
+        if (fs.existsSync(reportPath)) {
+            try {
+                const report = JSON.parse(fs.readFileSync(reportPath, "utf-8"));
+                const icon = report.passed ? "✅" : "❌";
+                const lines = [
+                    `## ${icon} Regression Gate: ${report.category}`,
+                    "",
+                    "| Metric | Baseline | Current | Delta | Status |",
+                    "|--------|----------|---------|-------|--------|",
+                ];
+                for (const d of report.deltas ?? []) {
+                    const statusIcon = d.status === "pass" ? "✅" : "❌";
+                    lines.push(`| ${d.metric} | ${d.baseline} | ${d.current} | ${d.delta} | ${statusIcon} |`);
+                }
+                if (report.failures?.length > 0) {
+                    lines.push("", "### Failures", "");
+                    for (const f of report.failures) {
+                        lines.push(`- ${f}`);
+                    }
+                }
+                lines.push("", `Schema version: ${report.schemaVersion ?? "unknown"}`);
+                const md = lines.join("\n");
+                // Write to $GITHUB_STEP_SUMMARY if available
+                const summaryPath = process.env.GITHUB_STEP_SUMMARY;
+                if (summaryPath) {
+                    fs.appendFileSync(summaryPath, `${md}\n`);
+                }
+                console.log(md);
+            }
+            catch {
+                // Fall through — human output already printed
+            }
+        }
+    }
+    return exitCode;
+}
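
The `--format github` branch above builds a Markdown summary from the parsed report. As a sketch of that rendering step on its own, the block below extracts it into a pure function. The field names follow what the formatter reads from the report. The value types in `DeltaRow` and `ReportSketch` are assumptions, since the diff only shows the fields being interpolated into strings, and `renderStepSummary` is a hypothetical name.

```typescript
// Field names mirror the formatter's reads; the value types are assumptions.
interface DeltaRow {
  metric: string;
  baseline: number;
  current: number;
  delta: number;
  status: string;
}

interface ReportSketch {
  passed: boolean;
  category: string;
  deltas?: DeltaRow[];
  failures?: string[];
  schemaVersion?: number;
}

// Build the summary text without touching the filesystem or stdout.
function renderStepSummary(report: ReportSketch): string {
  const icon = report.passed ? "✅" : "❌";
  const lines = [
    `## ${icon} Regression Gate: ${report.category}`,
    "",
    "| Metric | Baseline | Current | Delta | Status |",
    "|--------|----------|---------|-------|--------|",
  ];
  for (const d of report.deltas ?? []) {
    // Any status other than "pass" is rendered as a failure mark.
    const statusIcon = d.status === "pass" ? "✅" : "❌";
    lines.push(`| ${d.metric} | ${d.baseline} | ${d.current} | ${d.delta} | ${statusIcon} |`);
  }
  if (report.failures && report.failures.length > 0) {
    lines.push("", "### Failures", "");
    for (const f of report.failures) lines.push(`- ${f}`);
  }
  lines.push("", `Schema version: ${report.schemaVersion ?? "unknown"}`);
  return lines.join("\n");
}
```

Separating the rendering from the file and console writes makes the table layout testable. The shipped code does both in one place and appends the result to `$GITHUB_STEP_SUMMARY` when that variable is set.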