@pauly4010/evalai-sdk 1.6.0 → 1.8.0
This diff shows the published contents of two publicly available versions of this package, as they appear in their registry. It is provided for informational purposes only.
- package/CHANGELOG.md +114 -0
- package/README.md +198 -236
- package/dist/cli/baseline.js +1 -1
- package/dist/cli/check.js +15 -0
- package/dist/cli/doctor.d.ts +80 -3
- package/dist/cli/doctor.js +576 -43
- package/dist/cli/explain.d.ts +58 -0
- package/dist/cli/explain.js +429 -0
- package/dist/cli/formatters/github.js +5 -0
- package/dist/cli/formatters/types.d.ts +3 -0
- package/dist/cli/formatters/types.js +3 -0
- package/dist/cli/index.js +47 -4
- package/dist/cli/init.d.ts +11 -2
- package/dist/cli/init.js +239 -16
- package/dist/cli/print-config.d.ts +29 -0
- package/dist/cli/print-config.js +251 -0
- package/dist/cli/regression-gate.d.ts +6 -2
- package/dist/cli/regression-gate.js +246 -61
- package/dist/cli/report/build-check-report.d.ts +1 -1
- package/dist/cli/report/build-check-report.js +2 -0
- package/dist/cli/upgrade.d.ts +15 -0
- package/dist/cli/upgrade.js +491 -0
- package/dist/index.d.ts +1 -1
- package/dist/index.js +7 -7
- package/dist/version.d.ts +1 -1
- package/dist/version.js +1 -1
- package/package.json +1 -1
package/CHANGELOG.md
CHANGED

@@ -5,6 +5,120 @@ All notable changes to the @pauly4010/evalai-sdk package will be documented in t
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [1.8.0] - 2026-02-26
+
+### ✨ Added
+
+#### CLI — `evalai doctor` Rewrite (Comprehensive Checklist)
+
+- **9 itemized checks** with pass/fail/warn/skip status and exact remediation commands:
+  1. Project detection (package.json + lockfile + package manager)
+  2. Config file validity (evalai.config.json)
+  3. Baseline file (evals/baseline.json — schema, staleness)
+  4. Authentication (API key presence, redacted display)
+  5. Evaluation target (evaluationId configured)
+  6. API connectivity (reachable, latency)
+  7. Evaluation access (permissions, baseline presence)
+  8. CI wiring (.github/workflows/evalai-gate.yml)
+  9. Provider env vars (OpenAI/Anthropic/Azure — optional)
+- **Exit codes**: `0` ready, `2` not ready, `3` infrastructure error
+- **`--report`** flag outputs full JSON diagnostic bundle (versions, hashes, latency, all checks)
+- **`--format json`** for machine-readable output
+
+#### CLI — `evalai explain` (New Command)
+
+- **Offline report explainer** — reads `.evalai/last-report.json` or `evals/regression-report.json` with zero flags
+- **Top 3 failing test cases** with input/expected/actual
+- **What changed** — baseline vs current with directional indicators
+- **Root cause classification**: prompt drift, retrieval drift, formatting drift, tool-use drift, safety/cost/latency regression, coverage drop, baseline stale
+- **Prioritized suggested fixes** with actionable commands
+- Works with both `evalai check` reports (CheckReport) and `evalai gate` reports (BuiltinReport)
+- **`--format json`** for CI pipeline consumption
+
+#### Guided Failure Flow
+
+- **`evalai check` now writes `.evalai/last-report.json`** automatically after every run
+- **Failure hint**: prints `Next: evalai explain` on gate failure
+- **GitHub step summary**: adds tip about `evalai explain` and report artifact location on failure
+
+#### CI Template Improvements
+
+- **Doctor preflight step** added to generated workflow (`continue-on-error: true`)
+- **Report artifact upload** now includes both `evals/regression-report.json` and `.evalai/last-report.json`
+
+#### `evalai init` Output Updated
+
+- First recommendation: `npx evalai doctor` (verify setup)
+- Full command reference: doctor, gate, check, explain, baseline update
+
+#### CLI — `evalai print-config` (New Command)
+
+- **Resolved config viewer** — prints every config field with its current value
+- **Source-of-truth annotations**: `[file]`, `[env]`, `[default]`, `[profile]`, `[arg]` for each field
+- **Secrets redacted** — API keys shown as `sk_t...abcd`
+- **Environment summary** — shows all relevant env vars (EVALAI_API_KEY, OPENAI_API_KEY, CI, etc.)
+- **`--format json`** for machine-readable output
+- Accepts `--evaluationId`, `--baseUrl`, etc. to show how CLI args would merge
+
+#### Minimal Green Example
+
+- **`examples/minimal-green/`** — passes on first run, no account needed
+- Zero dependencies, 3 `node:test` tests
+- Clone → init → doctor → gate → ✅
+
+### 🔧 Changed
+
+- `evalai doctor` exit codes changed: was `0`/`1`, now `0`/`2`/`3`
+- SDK README: added Debugging & Diagnostics section with guided flow diagram
+- SDK README: added Doctor Exit Codes table
+- Doctor test count: 4 → 29 tests; added 9 explain tests (38 total new tests)
+
+---
+
+## [1.7.0] - 2026-02-25
+
+### ✨ Added
+
+#### CLI — `evalai init` Full Project Scaffolder
+
+- **`evalai init`** — Zero-to-gate in under 5 minutes:
+  - Detects Node repo + package manager (npm/yarn/pnpm)
+  - Runs existing tests to capture real pass/fail + test count
+  - Creates `evals/baseline.json` with provenance metadata
+  - Installs `.github/workflows/evalai-gate.yml` (package-manager aware)
+  - Creates `evalai.config.json`
+  - Prints copy-paste next steps — just commit and push
+  - Idempotent: skips files that already exist
+
+#### CLI — `evalai upgrade --full` (Tier 1 → Tier 2)
+
+- **`evalai upgrade --full`** — Upgrade from built-in gate to full gate:
+  - Creates `scripts/regression-gate.ts` (full gate with `--update-baseline`)
+  - Adds `eval:regression-gate` + `eval:baseline-update` npm scripts
+  - Creates `.github/workflows/baseline-governance.yml` (label + diff enforcement)
+  - Upgrades `evalai-gate.yml` to project mode
+  - Adds `CODEOWNERS` entry for `evals/baseline.json`
+
+#### Gate Output — Machine-Readable Improvements
+
+- **`detectRunner()`** — Identifies test runner from `package.json` scripts: vitest, jest, mocha, node:test, ava, tap, or unknown
+- **BuiltinReport** now always emits: `durationMs`, `command`, `runner`, `baseline` metadata
+- Report schema updated with optional `durationMs`, `command`, `runner` properties
+
+#### Init Scaffolder Integration Tests
+
+- 4 fixtures: npm+jest, pnpm+vitest, yarn+jest, pnpm monorepo
+- 25 tests: files created, YAML valid, pm-aware workflow, idempotent runs
+- All fixtures use `node:test` (zero external deps)
+
+### 🔧 Changed
+
+- CLI help text updated to include `upgrade` command
+- Gate report includes runner detection and timing metadata
+- SDK test count: 147 → 172 tests (12 → 15 contract tests)
+
+---
+
 ## [1.6.0] - 2026-02-24
 
 ### ✨ Added
package/README.md
CHANGED

@@ -1,323 +1,285 @@
 # @pauly4010/evalai-sdk
 
-[](https://www.npmjs.com/package/@pauly4010/evalai-sdk)
+[](https://www.npmjs.com/package/@pauly4010/evalai-sdk)
 [](https://www.npmjs.com/package/@pauly4010/evalai-sdk)
+[](https://www.typescriptlang.org/)
+[](#)
+[](#)
+[](https://opensource.org/licenses/MIT)
 
 **Stop LLM regressions in CI in minutes.**
 
-
-No lock-in — remove by deleting `evalai.config.json`.
+Zero to gate in under 5 minutes. No infra. No lock-in. Remove anytime.
 
 ---
 
-
-
-Install, run, get a score.
-No EvalAI account. No API key. No dashboard required.
+## Quick Start (2 minutes)
 
 ```bash
-
-
-
-
-
-cases: [
-  { input: "Hello", expectedOutput: "greeting" },
-  { input: "2 + 2 = ?", expectedOutput: "4" },
-],
-});
-Set:
-
-OPENAI_API_KEY=...
-✅ Vitest integration (recommended)
-import {
-  openAIChatEval,
-  extendExpectWithToPassGate,
-} from "@pauly4010/evalai-sdk";
-import { expect } from "vitest";
-
-extendExpectWithToPassGate(expect);
-
-it("passes gate", async () => {
-  const result = await openAIChatEval({
-    name: "chat-regression",
-    cases: [
-      { input: "Hello", expectedOutput: "greeting" },
-      { input: "2 + 2 = ?", expectedOutput: "4" },
-    ],
-  });
-
-  expect(result).toPassGate();
-});
-Example output
-PASS 2/2 (score: 100)
-
-Tip: Want dashboards and history?
-Set EVALAI_API_KEY and connect this to the platform.
-Failures show:
-
-FAIL 9/10 (score: 90)
-with failed cases and CI guidance.
-
-⚡ 2) Optional: Add a CI gate (2 minutes)
-When you're ready to gate PRs on quality and regressions:
-
-npx -y @pauly4010/evalai-sdk@^1 init
-Create an evaluation in the dashboard and paste its ID into:
-
-{
-  "evaluationId": "42"
-}
-Add to your CI:
+npx @pauly4010/evalai-sdk init
+git add evals/ .github/workflows/evalai-gate.yml evalai.config.json
+git commit -m "chore: add EvalAI regression gate"
+git push
+```
 
-
-env:
-  EVALAI_API_KEY: ${{ secrets.EVALAI_API_KEY }}
-run: npx -y @pauly4010/evalai-sdk@^1 check --format github --onFail import --warnDrop 1
-You’ll get:
+That's it. Open a PR and CI blocks regressions automatically.
 
-GitHub
+`evalai init` detects your project, creates a baseline from your current tests, and installs a GitHub Actions workflow. No manual config needed.
 
-
+---
 
-
+## What `evalai init` does
 
-
-
+1. **Detects** your Node repo and package manager (npm/yarn/pnpm)
+2. **Runs your tests** to capture a real baseline (pass/fail + test count)
+3. **Creates `evals/baseline.json`** with provenance metadata
+4. **Installs `.github/workflows/evalai-gate.yml`** (package-manager aware)
+5. **Creates `evalai.config.json`**
+6. **Prints next steps** — just commit and push
 
-
+---
 
-
+## CLI Commands
 
-
+### Regression Gate (local, no account needed)
 
-
+| Command | Description |
+|---------|-------------|
+| `npx evalai init` | Full project scaffolder — creates everything you need |
+| `npx evalai gate` | Run regression gate locally |
+| `npx evalai gate --format json` | Machine-readable JSON output |
+| `npx evalai gate --format github` | GitHub Step Summary with delta table |
+| `npx evalai baseline init` | Create starter `evals/baseline.json` |
+| `npx evalai baseline update` | Re-run tests and update baseline with real scores |
+| `npx evalai upgrade --full` | Upgrade from Tier 1 (built-in) to Tier 2 (full gate) |
 
-
+### API Gate (requires account)
 
-
+| Command | Description |
+|---------|-------------|
+| `npx evalai check` | Gate on quality score from dashboard |
+| `npx evalai share` | Create share link for a run |
 
-
+### Debugging & Diagnostics
 
-
+| Command | Description |
+|---------|-------------|
+| `npx evalai doctor` | Comprehensive preflight checklist — verifies config, baseline, auth, API, CI wiring |
+| `npx evalai explain` | Offline report explainer — top failures, root cause classification, suggested fixes |
+| `npx evalai print-config` | Show resolved config with source-of-truth annotations (file/env/default/arg) |
 
-
+**Guided failure flow:**
 
-
-
+```
+evalai check → fails → "Next: evalai explain"
+        ↓
+evalai explain → root causes + fixes
+```
 
-
-Your local openAIChatEval runs continue to work exactly the same.
+**GitHub Actions step summary** — gate result at a glance:
 
-
+(screenshot)
 
-
-npm install @pauly4010/evalai-sdk openai
-# or
-yarn add @pauly4010/evalai-sdk openai
-# or
-pnpm add @pauly4010/evalai-sdk openai
-🖥️ Environment Support
-This SDK works in both Node.js and browsers, with some Node-only features.
+**`evalai explain` terminal output** — root causes + fix commands:
 
-
-Traces API
+(screenshot)
 
-
+`check` automatically writes `.evalai/last-report.json` so `explain` works with zero flags.
 
-
+`doctor` uses exit codes: **0** = ready, **2** = not ready, **3** = infra error. Use `--report` for a JSON diagnostic bundle.
 
-
+### Gate Exit Codes
 
-
+| Code | Meaning |
+|------|---------|
+| 0 | Pass — no regression |
+| 1 | Regression detected |
+| 2 | Infra error (baseline missing, tests crashed) |
 
-
+### Check Exit Codes (API mode)
 
-
+| Code | Meaning |
+|------|---------|
+| 0 | Pass |
+| 1 | Score below threshold |
+| 2 | Regression failure |
+| 3 | Policy violation |
+| 4 | API error |
+| 5 | Bad arguments |
+| 6 | Low test count |
+| 7 | Weak evidence |
+| 8 | Warn (soft regression) |
 
-
+### Doctor Exit Codes
 
-
+| Code | Meaning |
+|------|---------|
+| 0 | Ready — all checks passed |
+| 2 | Not ready — one or more checks failed |
+| 3 | Infrastructure error |
 
-
+---
 
-
-These require Node.js:
+## How the Gate Works
 
-
+**Built-in mode** (any Node project, no config needed):
+- Runs `<pm> test`, captures exit code + test count
+- Compares against `evals/baseline.json`
+- Writes `evals/regression-report.json`
+- Fails CI if tests regress
 
-
+**Project mode** (advanced, for full regression gate):
+- If `eval:regression-gate` script exists in `package.json`, delegates to it
+- Supports golden eval scores, confidence tests, p95 latency, cost tracking
+- Full delta table with tolerances
 
-
+---
 
-
+## Run a Regression Test Locally (no account)
 
-
-
+```bash
+npm install @pauly4010/evalai-sdk openai
+```
 
-
+```typescript
+import { openAIChatEval } from "@pauly4010/evalai-sdk";
 
-
-
+await openAIChatEval({
+  name: "chat-regression",
+  cases: [
+    { input: "Hello", expectedOutput: "greeting" },
+    { input: "2 + 2 = ?", expectedOutput: "4" },
+  ],
+});
+```
 
-
-const client = AIEvalClient.init();
+Output: `PASS 2/2 (score: 100)`. No account needed. Just a score.
 
-
-const client2 = new AIEvalClient({
-  apiKey: "your-api-key",
-  organizationId: 123,
-  debug: true,
-});
-🧪 evalai CLI (v1.5.7)
-The CLI gates deployments on quality, regression, and policy.
-
-Quick start
-npx -y @pauly4010/evalai-sdk@^1 check \
-  --evaluationId 42 \
-  --apiKey $EVALAI_API_KEY
-evalai check
-Option Description
---evaluationId <id> Required. Evaluation to gate on
---apiKey <key> API key (or EVALAI_API_KEY)
---format <fmt> human, json, or github
---onFail import Import failing run to dashboard
---explain Show score breakdown
---minScore <n> Fail if score < n
---warnDrop <n> Warn if regression exceeds n
---maxDrop <n> Fail if regression exceeds n
---minN <n> Fail if test count < n
---allowWeakEvidence Permit weak evidence
---policy <name> HIPAA, SOC2, GDPR, PCI_DSS, FINRA_4511
---baseline <mode> published, previous, production
---fail-on-flake Fail if unknown case is flaky
---baseUrl <url> Override API base URL
-
-Exit codes
-Code Meaning
-0 PASS
-8 WARN
-1 Score below threshold
-2 Regression failure
-3 Policy violation
-4 API error
-5 Bad arguments
-6 Low test count
-7 Weak evidence
-evalai doctor
-Verify CI setup before running the gate:
-
-npx -y @pauly4010/evalai-sdk@^1 doctor \
-  --evaluationId 42 \
-  --apiKey $EVALAI_API_KEY
-If doctor passes, check will work.
-
-🧯 Error Handling
-import { EvalAIError, RateLimitError } from "@pauly4010/evalai-sdk";
-
-try {
-  await client.traces.create({ name: "User Query" });
-} catch (err) {
-  if (err instanceof RateLimitError) {
-    console.log("Retry after:", err.retryAfter);
-  } else if (err instanceof EvalAIError) {
-    console.log(err.code, err.message, err.requestId);
-  }
-}
-
-
-🔍 Traces
-const trace = await client.traces.create({
-  name: "User Query",
-  traceId: "trace-123",
-  metadata: { userId: "456" },
-});
+### Vitest Integration
 
+```typescript
+import { openAIChatEval, extendExpectWithToPassGate } from "@pauly4010/evalai-sdk";
+import { expect } from "vitest";
 
-
-import { EvaluationTemplates } from "@pauly4010/evalai-sdk";
+extendExpectWithToPassGate(expect);
 
-
-
-
-
+it("passes gate", async () => {
+  const result = await openAIChatEval({
+    name: "chat-regression",
+    cases: [
+      { input: "Hello", expectedOutput: "greeting" },
+      { input: "2 + 2 = ?", expectedOutput: "4" },
+    ],
+  });
+  expect(result).toPassGate();
 });
+```
 
+---
 
-
-import { traceOpenAI } from "@pauly4010/evalai-sdk/integrations/openai";
-import OpenAI from "openai";
-
-const openai = traceOpenAI(new OpenAI(), client);
+## SDK Exports
 
-
-  model: "gpt-4",
-  messages: [{ role: "user", content: "Hello" }],
-});
+### Regression Gate Constants
 
+```typescript
+import {
+  GATE_EXIT, // { PASS: 0, REGRESSION: 1, INFRA_ERROR: 2, ... }
+  GATE_CATEGORY, // { PASS: "pass", REGRESSION: "regression", INFRA_ERROR: "infra_error" }
+  REPORT_SCHEMA_VERSION,
+  ARTIFACTS, // { BASELINE, REGRESSION_REPORT, CONFIDENCE_SUMMARY, LATENCY_BENCHMARK }
+} from "@pauly4010/evalai-sdk";
 
-
-
-
+// Or tree-shakeable:
+import { GATE_EXIT } from "@pauly4010/evalai-sdk/regression";
+```
 
-
+### Types
+
+```typescript
+import type {
+  RegressionReport,
+  RegressionDelta,
+  Baseline,
+  BaselineTolerance,
+  GateExitCode,
+  GateCategory,
+} from "@pauly4010/evalai-sdk/regression";
+```
 
-
+### Platform Client
 
-
+```typescript
+import { AIEvalClient } from "@pauly4010/evalai-sdk";
 
-
-
+const client = AIEvalClient.init(); // from EVALAI_API_KEY env
+// or
+const client = new AIEvalClient({ apiKey: "...", organizationId: 123 });
+```
 
-
+### Framework Integrations
 
-
+```typescript
+import { traceOpenAI } from "@pauly4010/evalai-sdk/integrations/openai";
+import { traceAnthropic } from "@pauly4010/evalai-sdk/integrations/anthropic";
+```
 
-
-PASS/WARN/FAIL gate semantics
+---
 
-
+## Installation
 
-
+```bash
+npm install @pauly4010/evalai-sdk
+# or
+yarn add @pauly4010/evalai-sdk
+# or
+pnpm add @pauly4010/evalai-sdk
+```
 
-
+Add `openai` as a peer dependency if using `openAIChatEval`:
 
-
+```bash
+npm install openai
+```
 
-
+## Environment Support
 
-
+| Feature | Node.js | Browser |
+|---------|---------|---------|
+| Platform APIs (Traces, Evaluations, LLM Judge) | ✅ | ✅ |
+| Assertions, Test Suites, Error Handling | ✅ | ✅ |
+| CJS/ESM | ✅ | ✅ |
+| CLI, Snapshots, File Export | ✅ | — |
+| Context Propagation | ✅ Full | ⚠️ Basic |
 
-
+## No Lock-in
 
-
-
+```bash
+rm evalai.config.json
+```
 
-
+Your local `openAIChatEval` runs continue to work. No account cancellation. No data export required.
 
-
+## Changelog
 
-
+See [CHANGELOG.md](CHANGELOG.md) for the full release history.
 
-evalai doctor
+**v1.8.0** — `evalai doctor` rewrite (9-check checklist), `evalai explain` command, guided failure flow, CI template with doctor preflight
 
-
+**v1.7.0** — `evalai init` scaffolder, `evalai upgrade --full`, `detectRunner()`, machine-readable gate output, init test matrix
 
+**v1.6.0** — `evalai gate`, `evalai baseline`, regression gate constants & types
 
-
+**v1.5.8** — secureRoute fix, test infra fixes, 304 handling fix
 
-
+**v1.5.5** — PASS/WARN/FAIL semantics, flake intelligence, golden regression suite
 
-
+**v1.5.0** — GitHub annotations, `--onFail import`, `evalai doctor`
 
+## License
 
-📄 License
 MIT
 
-
-Documentation:
-https://v0-ai-evaluation-platform-nu.vercel.app/documentation
+## Support
 
-
-https://github.com/pauly7610/ai-evaluation-platform/issues
-```
+- **Docs:** https://v0-ai-evaluation-platform-nu.vercel.app/documentation
+- **Issues:** https://github.com/pauly7610/ai-evaluation-platform/issues
package/dist/cli/baseline.js
CHANGED

@@ -142,7 +142,7 @@ function runBaselineUpdate(cwd) {
 }
 if (!pkg.scripts?.["eval:baseline-update"]) {
     console.error("❌ Missing 'eval:baseline-update' script in package.json.");
-    console.error(
+    console.error('  Add it: "eval:baseline-update": "npx tsx scripts/regression-gate.ts --update-baseline"');
     return 1;
 }
 console.log("📊 Running baseline update...\n");