@pauly4010/evalai-sdk 1.6.0 → 1.8.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md CHANGED
@@ -5,6 +5,120 @@ All notable changes to the @pauly4010/evalai-sdk package will be documented in t
  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
  and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+ ## [1.8.0] - 2026-02-26
+
+ ### ✨ Added
+
+ #### CLI — `evalai doctor` Rewrite (Comprehensive Checklist)
+
+ - **9 itemized checks** with pass/fail/warn/skip status and exact remediation commands:
+   1. Project detection (package.json + lockfile + package manager)
+   2. Config file validity (evalai.config.json)
+   3. Baseline file (evals/baseline.json — schema, staleness)
+   4. Authentication (API key presence, redacted display)
+   5. Evaluation target (evaluationId configured)
+   6. API connectivity (reachable, latency)
+   7. Evaluation access (permissions, baseline presence)
+   8. CI wiring (.github/workflows/evalai-gate.yml)
+   9. Provider env vars (OpenAI/Anthropic/Azure — optional)
+ - **Exit codes**: `0` ready, `2` not ready, `3` infrastructure error
+ - **`--report`** flag outputs a full JSON diagnostic bundle (versions, hashes, latency, all checks)
+ - **`--format json`** for machine-readable output
+
+ #### CLI — `evalai explain` (New Command)
+
+ - **Offline report explainer** — reads `.evalai/last-report.json` or `evals/regression-report.json` with zero flags
+ - **Top 3 failing test cases** with input/expected/actual
+ - **What changed** — baseline vs current with directional indicators
+ - **Root cause classification**: prompt drift, retrieval drift, formatting drift, tool-use drift, safety/cost/latency regression, coverage drop, baseline stale
+ - **Prioritized suggested fixes** with actionable commands
+ - Works with both `evalai check` reports (CheckReport) and `evalai gate` reports (BuiltinReport)
+ - **`--format json`** for CI pipeline consumption
+
+ #### Guided Failure Flow
+
+ - **`evalai check` now writes `.evalai/last-report.json`** automatically after every run
+ - **Failure hint**: prints `Next: evalai explain` on gate failure
+ - **GitHub step summary**: adds a tip about `evalai explain` and the report artifact location on failure
+
+ #### CI Template Improvements
+
+ - **Doctor preflight step** added to the generated workflow (`continue-on-error: true`)
+ - **Report artifact upload** now includes both `evals/regression-report.json` and `.evalai/last-report.json`
+
+ #### `evalai init` Output Updated
+
+ - First recommendation: `npx evalai doctor` (verify setup)
+ - Full command reference: doctor, gate, check, explain, baseline update
+
+ #### CLI — `evalai print-config` (New Command)
+
+ - **Resolved config viewer** — prints every config field with its current value
+ - **Source-of-truth annotations**: `[file]`, `[env]`, `[default]`, `[profile]`, `[arg]` for each field
+ - **Secrets redacted** — API keys shown as `sk_t...abcd`
+ - **Environment summary** — shows all relevant env vars (EVALAI_API_KEY, OPENAI_API_KEY, CI, etc.)
+ - **`--format json`** for machine-readable output
+ - Accepts `--evaluationId`, `--baseUrl`, etc. to show how CLI args would merge
+
+ #### Minimal Green Example
+
+ - **`examples/minimal-green/`** — passes on first run, no account needed
+ - Zero dependencies, 3 `node:test` tests
+ - Clone → init → doctor → gate → ✅
+
+ ### 🔧 Changed
+
+ - `evalai doctor` exit codes changed: was `0`/`1`, now `0`/`2`/`3`
+ - SDK README: added a Debugging & Diagnostics section with a guided flow diagram
+ - SDK README: added a Doctor Exit Codes table
+ - Doctor test count: 4 → 29 tests; added 9 explain tests (38 total new tests)
+
+ ---
+
+ ## [1.7.0] - 2026-02-25
+
+ ### ✨ Added
+
+ #### CLI — `evalai init` Full Project Scaffolder
+
+ - **`evalai init`** — zero to gate in under 5 minutes:
+   - Detects Node repo + package manager (npm/yarn/pnpm)
+   - Runs existing tests to capture real pass/fail + test count
+   - Creates `evals/baseline.json` with provenance metadata
+   - Installs `.github/workflows/evalai-gate.yml` (package-manager aware)
+   - Creates `evalai.config.json`
+   - Prints copy-paste next steps — just commit and push
+   - Idempotent: skips files that already exist
+
+ #### CLI — `evalai upgrade --full` (Tier 1 → Tier 2)
+
+ - **`evalai upgrade --full`** — upgrade from the built-in gate to the full gate:
+   - Creates `scripts/regression-gate.ts` (full gate with `--update-baseline`)
+   - Adds `eval:regression-gate` + `eval:baseline-update` npm scripts
+   - Creates `.github/workflows/baseline-governance.yml` (label + diff enforcement)
+   - Upgrades `evalai-gate.yml` to project mode
+   - Adds a `CODEOWNERS` entry for `evals/baseline.json`
+
+ #### Gate Output — Machine-Readable Improvements
+
+ - **`detectRunner()`** — identifies the test runner from `package.json` scripts: vitest, jest, mocha, node:test, ava, tap, or unknown
+ - **BuiltinReport** now always emits `durationMs`, `command`, `runner`, and `baseline` metadata
+ - Report schema updated with optional `durationMs`, `command`, `runner` properties
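The runner detection described above can be sketched as a scan over `package.json` scripts. This is illustrative only: `detectRunnerSketch`, its matching order, and its input shape are assumptions, not the SDK's actual `detectRunner()` implementation.

```typescript
type Runner = "vitest" | "jest" | "mocha" | "node:test" | "ava" | "tap" | "unknown";

// Hypothetical heuristic: look for a known runner name in the `test` script.
function detectRunnerSketch(scripts: Record<string, string>): Runner {
  const test = scripts.test ?? "";
  const known: Runner[] = ["vitest", "jest", "mocha", "ava", "tap"];
  for (const r of known) {
    if (test.includes(r)) return r;
  }
  // `node --test` is the built-in runner's CLI flag
  if (test.includes("node --test") || test.includes("node:test")) return "node:test";
  return "unknown";
}
```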
+
+ #### Init Scaffolder Integration Tests
+
+ - 4 fixtures: npm+jest, pnpm+vitest, yarn+jest, pnpm monorepo
+ - 25 tests: files created, YAML valid, pm-aware workflow, idempotent runs
+ - All fixtures use `node:test` (zero external deps)
+
+ ### 🔧 Changed
+
+ - CLI help text updated to include the `upgrade` command
+ - Gate report includes runner detection and timing metadata
+ - SDK test count: 147 → 172 tests (12 → 15 contract tests)
+
+ ---
+
  ## [1.6.0] - 2026-02-24

  ### ✨ Added
package/README.md CHANGED
@@ -1,323 +1,285 @@
  # @pauly4010/evalai-sdk

- [![npm version](https://img.shields.io/npm/v/@pauly4010/evalai-sdk.svg)](https://www.npmjs.com/package/@pauly4010/evalai-sdk)
+ [![npm version](https://img.shields.io/npm/v/@pauly4010/evalai-sdk.svg)](https://www.npmjs.com/package/@pauly4010/evalai-sdk)
  [![npm downloads](https://img.shields.io/npm/dm/@pauly4010/evalai-sdk.svg)](https://www.npmjs.com/package/@pauly4010/evalai-sdk)
+ [![TypeScript](https://img.shields.io/badge/TypeScript-strict-blue.svg)](https://www.typescriptlang.org/)
+ [![SDK Tests](https://img.shields.io/badge/tests-172%20passed-brightgreen.svg)](#)
+ [![Contract Version](https://img.shields.io/badge/report%20schema-v1-blue.svg)](#)
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

  **Stop LLM regressions in CI in minutes.**

- Evaluate locally in 60 seconds. Add an optional CI gate in 2 minutes.
- No lock-in — remove by deleting `evalai.config.json`.
+ Zero to gate in under 5 minutes. No infra. No lock-in. Remove anytime.

  ---

- # 🚀 1) 60 seconds: Run locally (no account)
-
- Install, run, get a score.
- No EvalAI account. No API key. No dashboard required.
+ ## Quick Start (2 minutes)

  ```bash
- npm install @pauly4010/evalai-sdk openai
- import { openAIChatEval } from "@pauly4010/evalai-sdk";
-
- await openAIChatEval({
-   name: "chat-regression",
-   cases: [
-     { input: "Hello", expectedOutput: "greeting" },
-     { input: "2 + 2 = ?", expectedOutput: "4" },
-   ],
- });
- Set:
-
- OPENAI_API_KEY=...
- ✅ Vitest integration (recommended)
- import {
-   openAIChatEval,
-   extendExpectWithToPassGate,
- } from "@pauly4010/evalai-sdk";
- import { expect } from "vitest";
-
- extendExpectWithToPassGate(expect);
-
- it("passes gate", async () => {
-   const result = await openAIChatEval({
-     name: "chat-regression",
-     cases: [
-       { input: "Hello", expectedOutput: "greeting" },
-       { input: "2 + 2 = ?", expectedOutput: "4" },
-     ],
-   });
-
-   expect(result).toPassGate();
- });
- Example output
- PASS 2/2 (score: 100)
-
- Tip: Want dashboards and history?
- Set EVALAI_API_KEY and connect this to the platform.
- Failures show:
-
- FAIL 9/10 (score: 90)
- with failed cases and CI guidance.
-
- ⚡ 2) Optional: Add a CI gate (2 minutes)
- When you're ready to gate PRs on quality and regressions:
-
- npx -y @pauly4010/evalai-sdk@^1 init
- Create an evaluation in the dashboard and paste its ID into:
-
- {
-   "evaluationId": "42"
- }
- Add to your CI:
+ npx @pauly4010/evalai-sdk init
+ git add evals/ .github/workflows/evalai-gate.yml evalai.config.json
+ git commit -m "chore: add EvalAI regression gate"
+ git push
+ ```

- - name: EvalAI gate
-   env:
-     EVALAI_API_KEY: ${{ secrets.EVALAI_API_KEY }}
-   run: npx -y @pauly4010/evalai-sdk@^1 check --format github --onFail import --warnDrop 1
- You’ll get:
+ That's it. Open a PR and CI blocks regressions automatically.

- GitHub annotations
+ `evalai init` detects your project, creates a baseline from your current tests, and installs a GitHub Actions workflow. No manual config needed.

- Step summary
+ ---

- Optional dashboard link
+ ## What `evalai init` does

- PASS / WARN / FAIL (v1.5.7)
- EvalAI introduces a WARN band so teams can see meaningful regressions without always blocking merges.
+ 1. **Detects** your Node repo and package manager (npm/yarn/pnpm)
+ 2. **Runs your tests** to capture a real baseline (pass/fail + test count)
+ 3. **Creates `evals/baseline.json`** with provenance metadata
+ 4. **Installs `.github/workflows/evalai-gate.yml`** (package-manager aware)
+ 5. **Creates `evalai.config.json`**
+ 6. **Prints next steps** — just commit and push
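The baseline file created in step 3 is not reproduced here; its real schema lives in the SDK. As a minimal sketch, assuming hypothetical field names, it might look like the shape below, together with the kind of staleness check `evalai doctor` performs against it:

```typescript
// Illustrative only: field names below are assumptions, not the published contract.
interface BaselineSketch {
  passed: number;     // tests passing when the baseline was captured
  total: number;      // total test count at capture time
  capturedAt: string; // ISO timestamp (provenance metadata)
}

// A staleness check of the kind `doctor` might run (assumed logic and threshold).
function isStale(baseline: BaselineSketch, now: Date, maxAgeDays: number): boolean {
  const ageMs = now.getTime() - new Date(baseline.capturedAt).getTime();
  return ageMs > maxAgeDays * 24 * 60 * 60 * 1000;
}
```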

- Behavior
+ ---

- PASS within thresholds
+ ## CLI Commands

- WARN regression > warnDrop but < maxDrop
+ ### Regression Gate (local, no account needed)

- FAIL regression > maxDrop
+ | Command | Description |
+ |---------|-------------|
+ | `npx evalai init` | Full project scaffolder — creates everything you need |
+ | `npx evalai gate` | Run regression gate locally |
+ | `npx evalai gate --format json` | Machine-readable JSON output |
+ | `npx evalai gate --format github` | GitHub Step Summary with delta table |
+ | `npx evalai baseline init` | Create starter `evals/baseline.json` |
+ | `npx evalai baseline update` | Re-run tests and update baseline with real scores |
+ | `npx evalai upgrade --full` | Upgrade from Tier 1 (built-in) to Tier 2 (full gate) |

- Key flags
+ ### API Gate (requires account)

- --warnDrop soft regression warning
+ | Command | Description |
+ |---------|-------------|
+ | `npx evalai check` | Gate on quality score from dashboard |
+ | `npx evalai share` | Create share link for a run |

- --maxDrop hard regression fail
+ ### Debugging & Diagnostics

- --fail-on-flake fail if unknown test is unstable
+ | Command | Description |
+ |---------|-------------|
+ | `npx evalai doctor` | Comprehensive preflight checklist — verifies config, baseline, auth, API, CI wiring |
+ | `npx evalai explain` | Offline report explainer — top failures, root cause classification, suggested fixes |
+ | `npx evalai print-config` | Show resolved config with source-of-truth annotations (file/env/default/arg) |
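`print-config` redacts secrets, displaying an API key as `sk_t...abcd`. A sketch of that redaction rule, assuming a first-four/last-four display (the helper name and short-key fallback are illustrative, not the CLI's internals):

```typescript
// Keep only the first and last 4 characters of a secret (assumed rule,
// matching the documented `sk_t...abcd` display format).
function redactKey(key: string): string {
  if (key.length <= 8) return "****"; // too short to safely show any part
  return `${key.slice(0, 4)}...${key.slice(-4)}`;
}
```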

- This lets teams tune signal vs noise in CI.
+ **Guided failure flow:**

- 🔒 3) No lock-in
- To stop using EvalAI:
+ ```
+ evalai check → fails → "Next: evalai explain"
+
+ evalai explain → root causes + fixes
+ ```

- rm evalai.config.json
- Your local openAIChatEval runs continue to work exactly the same.
+ **GitHub Actions step summary** — gate result at a glance:

- No account cancellation. No data export required.
+ ![GitHub Actions step summary showing gate pass/fail with delta table](../../docs/images/evalai-gate-step-summary.svg)

- 📦 Installation
- npm install @pauly4010/evalai-sdk openai
- # or
- yarn add @pauly4010/evalai-sdk openai
- # or
- pnpm add @pauly4010/evalai-sdk openai
- 🖥️ Environment Support
- This SDK works in both Node.js and browsers, with some Node-only features.
+ **`evalai explain` terminal output** — root causes + fix commands:

- Works Everywhere (Node.js + Browser)
- Traces API
+ ![Terminal output of evalai explain showing top failures and suggested fixes](../../docs/images/evalai-explain-terminal.svg)

- Evaluations API
+ `check` automatically writes `.evalai/last-report.json` so `explain` works with zero flags.
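Since `check` writes `.evalai/last-report.json` and `gate` writes `evals/regression-report.json`, the zero-flag behavior of `explain` can be pictured as a first-match lookup over those two paths. This is a sketch under an assumed search order; the CLI's actual resolution logic is not documented here:

```typescript
// The two report locations named in the docs, checked in an assumed order.
const REPORT_PATHS = [".evalai/last-report.json", "evals/regression-report.json"];

// `exists` is injected so the sketch stays self-contained (no filesystem access).
function resolveReportPath(exists: (p: string) => boolean): string | undefined {
  return REPORT_PATHS.find((p) => exists(p));
}
```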

- LLM Judge API
+ `doctor` uses exit codes: **0** = ready, **2** = not ready, **3** = infra error. Use `--report` for a JSON diagnostic bundle.

- Annotations API
+ ### Gate Exit Codes

- Developer API (API Keys, Webhooks, Usage)
+ | Code | Meaning |
+ |------|---------|
+ | 0 | Pass — no regression |
+ | 1 | Regression detected |
+ | 2 | Infra error (baseline missing, tests crashed) |
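In a CI script, the table above maps naturally to a small helper. The numeric codes come from the table; `gateOutcome` itself is illustrative, not an SDK export:

```typescript
// Map a gate exit code to its documented category.
function gateOutcome(code: number): "pass" | "regression" | "infra_error" | "unknown" {
  switch (code) {
    case 0: return "pass";        // no regression
    case 1: return "regression";  // regression detected
    case 2: return "infra_error"; // baseline missing, tests crashed
    default: return "unknown";
  }
}
```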

- Organizations API
+ ### Check Exit Codes (API mode)

- Assertions Library
+ | Code | Meaning |
+ |------|---------|
+ | 0 | Pass |
+ | 1 | Score below threshold |
+ | 2 | Regression failure |
+ | 3 | Policy violation |
+ | 4 | API error |
+ | 5 | Bad arguments |
+ | 6 | Low test count |
+ | 7 | Weak evidence |
+ | 8 | Warn (soft regression) |

- Test Suites
+ ### Doctor Exit Codes

- Error Handling
+ | Code | Meaning |
+ |------|---------|
+ | 0 | Ready — all checks passed |
+ | 2 | Not ready — one or more checks failed |
+ | 3 | Infrastructure error |

- CJS/ESM Compatibility
+ ---

- 🟡 Node.js Only
- These require Node.js:
+ ## How the Gate Works

- Snapshot Testing
+ **Built-in mode** (any Node project, no config needed):
+ - Runs `<pm> test`, captures exit code + test count
+ - Compares against `evals/baseline.json`
+ - Writes `evals/regression-report.json`
+ - Fails CI if tests regress
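The built-in comparison sketched from the bullets above: fail on a non-zero test exit, or on a test count below the baseline's. This is assumed logic for illustration; the real gate's thresholds and report contents are defined by the SDK:

```typescript
interface RunSummary {
  exitCode: number;  // exit code of `<pm> test`
  testCount: number; // number of tests the run reported
}

// Hypothetical regression rule: failing tests, or lost coverage vs baseline.
function regressed(baseline: RunSummary, current: RunSummary): boolean {
  return current.exitCode !== 0 || current.testCount < baseline.testCount;
}
```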

- Local Storage Mode
+ **Project mode** (advanced, for full regression gate):
+ - If an `eval:regression-gate` script exists in `package.json`, delegates to it
+ - Supports golden eval scores, confidence tests, p95 latency, cost tracking
+ - Full delta table with tolerances

- CLI Tool
+ ---

- Export to File
+ ## Run a Regression Test Locally (no account)

- 🔄 Context Propagation
- Node.js: full async context via AsyncLocalStorage
+ ```bash
+ npm install @pauly4010/evalai-sdk openai
+ ```

- Browser: basic support (not safe across all async boundaries)
+ ```typescript
+ import { openAIChatEval } from "@pauly4010/evalai-sdk";

- 🧠 AIEvalClient (Platform API)
- import { AIEvalClient } from "@pauly4010/evalai-sdk";
+ await openAIChatEval({
+   name: "chat-regression",
+   cases: [
+     { input: "Hello", expectedOutput: "greeting" },
+     { input: "2 + 2 = ?", expectedOutput: "4" },
+   ],
+ });
+ ```

- // From env
- const client = AIEvalClient.init();
+ Output: `PASS 2/2 (score: 100)`. No account needed. Just a score.

- // Explicit
- const client2 = new AIEvalClient({
-   apiKey: "your-api-key",
-   organizationId: 123,
-   debug: true,
- });
- 🧪 evalai CLI (v1.5.7)
- The CLI gates deployments on quality, regression, and policy.
-
- Quick start
- npx -y @pauly4010/evalai-sdk@^1 check \
-   --evaluationId 42 \
-   --apiKey $EVALAI_API_KEY
- evalai check
- Option Description
- --evaluationId <id>  Required. Evaluation to gate on
- --apiKey <key>       API key (or EVALAI_API_KEY)
- --format <fmt>       human, json, or github
- --onFail import      Import failing run to dashboard
- --explain            Show score breakdown
- --minScore <n>       Fail if score < n
- --warnDrop <n>       Warn if regression exceeds n
- --maxDrop <n>        Fail if regression exceeds n
- --minN <n>           Fail if test count < n
- --allowWeakEvidence  Permit weak evidence
- --policy <name>      HIPAA, SOC2, GDPR, PCI_DSS, FINRA_4511
- --baseline <mode>    published, previous, production
- --fail-on-flake      Fail if unknown case is flaky
- --baseUrl <url>      Override API base URL
-
- Exit codes
- Code Meaning
- 0    PASS
- 8    WARN
- 1    Score below threshold
- 2    Regression failure
- 3    Policy violation
- 4    API error
- 5    Bad arguments
- 6    Low test count
- 7    Weak evidence
- evalai doctor
- Verify CI setup before running the gate:
-
- npx -y @pauly4010/evalai-sdk@^1 doctor \
-   --evaluationId 42 \
-   --apiKey $EVALAI_API_KEY
- If doctor passes, check will work.
-
- 🧯 Error Handling
- import { EvalAIError, RateLimitError } from "@pauly4010/evalai-sdk";
-
- try {
-   await client.traces.create({ name: "User Query" });
- } catch (err) {
-   if (err instanceof RateLimitError) {
-     console.log("Retry after:", err.retryAfter);
-   } else if (err instanceof EvalAIError) {
-     console.log(err.code, err.message, err.requestId);
-   }
- }
-
- 🔍 Traces
- const trace = await client.traces.create({
-   name: "User Query",
-   traceId: "trace-123",
-   metadata: { userId: "456" },
- });
+ ### Vitest Integration

+ ```typescript
+ import { openAIChatEval, extendExpectWithToPassGate } from "@pauly4010/evalai-sdk";
+ import { expect } from "vitest";

- 📝 Evaluations
- import { EvaluationTemplates } from "@pauly4010/evalai-sdk";
+ extendExpectWithToPassGate(expect);

- const evaluation = await client.evaluations.create({
-   name: "Chatbot Responses",
-   type: EvaluationTemplates.OUTPUT_QUALITY,
-   createdBy: userId,
+ it("passes gate", async () => {
+   const result = await openAIChatEval({
+     name: "chat-regression",
+     cases: [
+       { input: "Hello", expectedOutput: "greeting" },
+       { input: "2 + 2 = ?", expectedOutput: "4" },
+     ],
+   });
+   expect(result).toPassGate();
  });
+ ```

  ---

- 🔌 Framework Integrations
- import { traceOpenAI } from "@pauly4010/evalai-sdk/integrations/openai";
- import OpenAI from "openai";
-
- const openai = traceOpenAI(new OpenAI(), client);
+ ## SDK Exports

- await openai.chat.completions.create({
-   model: "gpt-4",
-   messages: [{ role: "user", content: "Hello" }],
- });
+ ### Regression Gate Constants

+ ```typescript
+ import {
+   GATE_EXIT,             // { PASS: 0, REGRESSION: 1, INFRA_ERROR: 2, ... }
+   GATE_CATEGORY,         // { PASS: "pass", REGRESSION: "regression", INFRA_ERROR: "infra_error" }
+   REPORT_SCHEMA_VERSION,
+   ARTIFACTS,             // { BASELINE, REGRESSION_REPORT, CONFIDENCE_SUMMARY, LATENCY_BENCHMARK }
+ } from "@pauly4010/evalai-sdk";

- 🧭 Changelog
- v1.5.8 (Latest)
- Fixed secureRoute TypeScript overload compatibility
+ // Or tree-shakeable:
+ import { GATE_EXIT } from "@pauly4010/evalai-sdk/regression";
+ ```

- Fixed test infrastructure (expect.any, NextRequest constructor)
+ ### Types
+
+ ```typescript
+ import type {
+   RegressionReport,
+   RegressionDelta,
+   Baseline,
+   BaselineTolerance,
+   GateExitCode,
+   GateCategory,
+ } from "@pauly4010/evalai-sdk/regression";
+ ```

- Fixed 304 response handling in exports API
+ ### Platform Client

- Improved error catalog test coverage
+ ```typescript
+ import { AIEvalClient } from "@pauly4010/evalai-sdk";

- v1.5.7
- Documentation updates for CJS compatibility
+ const client = AIEvalClient.init(); // from EVALAI_API_KEY env
+ // or
+ const client2 = new AIEvalClient({ apiKey: "...", organizationId: 123 });
+ ```

- Version alignment across README and changelog
+ ### Framework Integrations

- Environment support section updated
+ ```typescript
+ import { traceOpenAI } from "@pauly4010/evalai-sdk/integrations/openai";
+ import { traceAnthropic } from "@pauly4010/evalai-sdk/integrations/anthropic";
+ ```

- v1.5.6
- PASS/WARN/FAIL gate semantics
+ ---

- --warnDrop soft regression band
+ ## Installation

- Flake intelligence + per-case pass rates
+ ```bash
+ npm install @pauly4010/evalai-sdk
+ # or
+ yarn add @pauly4010/evalai-sdk
+ # or
+ pnpm add @pauly4010/evalai-sdk
+ ```

- --fail-on-flake enforcement
+ Install the `openai` peer dependency if you use `openAIChatEval`:

- Golden regression suite
+ ```bash
+ npm install openai
+ ```

- Nightly determinism + performance audits
+ ## Environment Support

- Audit trail, observability, retention, and migration safety docs
+ | Feature | Node.js | Browser |
+ |---------|---------|---------|
+ | Platform APIs (Traces, Evaluations, LLM Judge) | ✅ | ✅ |
+ | Assertions, Test Suites, Error Handling | ✅ | ✅ |
+ | CJS/ESM | ✅ | ✅ |
+ | CLI, Snapshots, File Export | ✅ | — |
+ | Context Propagation | ✅ Full | ⚠️ Basic |
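The browser column above works because the SDK guards its environment reads rather than assuming `process` exists. A guarded read looks roughly like this (a sketch of the pattern, not the SDK's internals):

```typescript
// Read an env var without assuming a Node runtime: in browsers `process`
// is undefined and this simply returns undefined.
function readEnv(name: string): string | undefined {
  const proc = (globalThis as any).process; // may not exist outside Node
  return proc?.env?.[name];
}
```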

- CJS compatibility for all subpath exports
+ ## No Lock-in

- v1.5.0
- GitHub annotations formatter
+ ```bash
+ rm evalai.config.json
+ ```

- JSON formatter
+ Your local `openAIChatEval` runs continue to work. No account cancellation. No data export required.

- --onFail import
+ ## Changelog

- --explain
+ See [CHANGELOG.md](CHANGELOG.md) for the full release history.

- evalai doctor
+ **v1.8.0** — `evalai doctor` rewrite (9-check checklist), `evalai explain` command, guided failure flow, CI template with doctor preflight

- CI pinned invocation guidance
+ **v1.7.0** — `evalai init` scaffolder, `evalai upgrade --full`, `detectRunner()`, machine-readable gate output, init test matrix

+ **v1.6.0** — `evalai gate`, `evalai baseline`, regression gate constants & types

- Environment Variable Safety
+ **v1.5.8** — secureRoute fix, test infra fixes, 304 handling fix

- The SDK never assumes `process.env` exists. All environment reads are guarded, so the client can initialize safely in browser, edge, and server runtimes.
+ **v1.5.6** — PASS/WARN/FAIL semantics, flake intelligence, golden regression suite

- If environment variables are unavailable, the SDK falls back to explicit config.
+ **v1.5.0** — GitHub annotations, `--onFail import`, `evalai doctor`

+ ## License

- 📄 License
  MIT

- 🤝 Support
- Documentation:
- https://v0-ai-evaluation-platform-nu.vercel.app/documentation
+ ## Support

- Issues:
- https://github.com/pauly7610/ai-evaluation-platform/issues
- ```
+ - **Docs:** https://v0-ai-evaluation-platform-nu.vercel.app/documentation
+ - **Issues:** https://github.com/pauly7610/ai-evaluation-platform/issues
@@ -142,7 +142,7 @@ function runBaselineUpdate(cwd) {
  }
  if (!pkg.scripts?.["eval:baseline-update"]) {
    console.error("❌ Missing 'eval:baseline-update' script in package.json.");
-   console.error(" Add it: \"eval:baseline-update\": \"npx tsx scripts/regression-gate.ts --update-baseline\"");
+   console.error(' Add it: "eval:baseline-update": "npx tsx scripts/regression-gate.ts --update-baseline"');
    return 1;
  }
  console.log("📊 Running baseline update...\n");