@evalgate/sdk 2.2.2 → 2.2.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md CHANGED
@@ -5,6 +5,33 @@ All notable changes to the @evalgate/sdk package will be documented in this file
  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
  and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+ ## [2.2.3] - 2026-03-03
+
+ ### Fixed
+
+ - **`RequestCache.set` missing default TTL** — entries stored without an explicit TTL were already stale on the next read. `ttl` is now optional and defaults to `CacheTTL.MEDIUM`, so callers that omit it get a live cache entry instead of a guaranteed miss.
+ - **`EvalGateError` subclass prototype chain** — `ValidationError.name` was silently overwritten by the base class constructor, so the error surfaced as `"EvalGateError"` in stack traces and failed subclass `instanceof` checks. All four subclasses (`ValidationError`, `RateLimitError`, `AuthenticationError`, `NetworkError`) now set `this.name` after `super()` and then call `Object.setPrototypeOf(this, Subclass.prototype)`.
+ - **`RateLimitError.retryAfter` not a direct property** — the value was stored only in `details.retryAfter`, so `err.retryAfter` was `undefined`. It is now also assigned directly on the instance when provided.
+ - **`autoPaginate` returned `AsyncGenerator` instead of `Promise<T[]>`** — `await autoPaginate(fetcher)` resolved to an unconsumed generator rather than to the items. It now fetches every page and resolves to a flat `T[]`. The previous streaming behaviour is available through the new `autoPaginateGenerator` export; code that iterated `autoPaginate(...)` with `for await` should switch to it.
+ - **`createEvalRuntime` string-only overload** — a `{ name, projectRoot }` config object was ignored and the runtime fell back to `process.cwd()`. The function now accepts `string | { name?: string; projectRoot?: string }` and reads `projectRoot` from the object form.
+ - **`defaultLocalExecutor` was an instance, not a factory** — importing `defaultLocalExecutor` returned a pre-constructed executor rather than a callable factory. It is now re-exported as `createLocalExecutor`, so each import site can call it to get a fresh instance.
+ - **`SnapshotManager.save` crash on `undefined`/`null` output** — passing `undefined` or `null` to `snapshot(name, output)` threw `TypeError: Cannot convert undefined to string`. The two values are now serialized to the strings `"undefined"` and `"null"`, consistent with the existing null-safe coercion for objects.
+ - **`compareSnapshots` compared a raw string instead of a stored snapshot** — `compareSnapshots` was exported as an alias of `compareWithSnapshot`, which treats its second argument as literal output rather than as a snapshot name, so the resulting diffs were meaningless. `compareSnapshots(nameA, nameB, dir?)` is now a separate function that loads both snapshots from disk before diffing.
+ - **`AIEvalClient` default `baseUrl`** — outside the browser, a client created without `baseUrl` and without `EVALGATE_BASE_URL` or `EVALAI_BASE_URL` in the environment fell back to `http://localhost:3000`, causing silent failures in production. The fallback is now `https://api.evalgate.com`.
+ - **`importData` unguarded `client.traces` / `client.evaluations` access** — calling `importData(client, data)` with a partial or undefined client could throw `TypeError: Cannot read properties of undefined`. The trace and evaluation import steps now check `client?.traces` and `client?.evaluations` and are skipped when the resource is missing.
+ - **`toContainCode` required a fenced code block** — raw function definitions, `const`/`let`/`var` assignments, class declarations, arrow functions, `import`/`export` statements, and `return` statements now satisfy the assertion without triple-backtick fencing. The signature is now `toContainCode(language?, message?)`: an optional language tag restricts fenced-block matching to blocks tagged with that language, and callers that previously passed a custom message as the only argument must now pass it second.
+ - **`hasReadabilityScore` ignored the `{ min }` object form** — passing `{ min: 40 }` instead of a plain number compared the score against `NaN`, so every call returned `false`. The function now unwraps `{ min?, max? }` objects (`min` defaults to `0`) and applies both bounds.
+
+ ### Added
+
+ - **`autoPaginateGenerator`** — new export that streams pagination results as an `AsyncGenerator<T>`, yielding individual items one at a time. Use it to process items as pages arrive instead of waiting for every page to load.
+ - **`compareSnapshots(nameA, nameB, dir?)`** — loads both named snapshots from disk and returns a `SnapshotComparison`. It replaces the earlier `compareSnapshots` alias of `compareWithSnapshot`; `compareWithSnapshot` itself is still exported.
+ - **141 new regression tests** across 9 test files covering all fixes above: `RequestCache` TTL defaults, error class prototype chains, `autoPaginate` flat-array return, `createEvalRuntime` config-object overload, `defaultLocalExecutor` callable factory, `SnapshotManager` null/undefined handling, `compareSnapshots` disk-load path, `AIEvalClient` default `baseUrl`, `importData` guards, `toContainCode` raw-code detection, and `hasReadabilityScore` object form.
+ - **`upgrade --full` post-upgrade warning** — the CLI now prints a reminder to run `npx evalgate baseline update` after a full upgrade, so the next CI run does not report a false regression.
+ - **Optional chaining on OpenAI / Anthropic integration `traces.create`** — the tracing wrappers now call `evalClient.traces?.create(...)`, so they no longer crash when the client has no `traces` resource (for example, a minimal config or tests without a full API key).
+
+ ---
+
  ## [2.2.2] - 2026-03-03
 
  ### Fixed
package/README.md CHANGED
@@ -450,6 +450,8 @@ Your local `openAIChatEval` runs continue to work. No account cancellation. No d
 
  See [CHANGELOG.md](CHANGELOG.md) for the full release history.
 
+ **v2.2.3** — Bug-fix release. `RequestCache` default TTL, `EvalGateError` subclass prototype chain and `retryAfter` direct property, `autoPaginate` now returns `Promise<T[]>` (new `autoPaginateGenerator` for streaming), `createEvalRuntime` config-object overload, `defaultLocalExecutor` callable factory, `SnapshotManager.save` null/undefined safety, `compareSnapshots` loads both sides from disk, `AIEvalClient` default baseUrl → `https://api.evalgate.com`, `importData` optional-chaining guards, `toContainCode` raw-code detection, `hasReadabilityScore` `{min,max}` object form. 141 new regression tests.
+
  **v2.2.2** — 8 stub assertions replaced with real implementations (`hasSentiment` expanded lexicon, `hasNoToxicity` ~80-term blocklist, `hasValidCodeSyntax` real bracket balance, `containsLanguage` 12 languages + BCP-47, `hasFactualAccuracy`/`hasNoHallucinations` case-insensitive, `hasReadabilityScore` per-word syllable fix, `matchesSchema` JSON Schema support). Added LLM-backed `*Async` variants + `configureAssertions`. Fixed `importData` crash, `compareWithSnapshot` object coercion, `WorkflowTracer` defensive guard. 115 new tests.
 
  **v2.2.1** — `snapshot(name, output)` accepts objects; auto-serialized via `JSON.stringify`
@@ -126,10 +126,11 @@ export declare class Expectation {
  */
  toBeBetween(min: number, max: number, message?: string): AssertionResult;
  /**
- * Assert value contains code block
+ * Assert value contains code block or raw code
  * @example expect(output).toContainCode()
+ * @example expect(output).toContainCode('typescript')
  */
- toContainCode(message?: string): AssertionResult;
+ toContainCode(language?: string, message?: string): AssertionResult;
  /**
  * Assert value is professional tone (no profanity)
  * @example expect(output).toBeProfessional()
@@ -209,9 +210,12 @@ export declare function isValidURL(url: string): boolean;
  * facts but cannot detect paraphrased fabrications. Use
  * {@link hasNoHallucinationsAsync} for semantic accuracy.
  */
- export declare function hasNoHallucinations(text: string, groundTruth: string[]): boolean;
+ export declare function hasNoHallucinations(text: string, groundTruth?: string[]): boolean;
  export declare function matchesSchema(value: unknown, schema: Record<string, unknown>): boolean;
- export declare function hasReadabilityScore(text: string, minScore: number): boolean;
+ export declare function hasReadabilityScore(text: string, minScore: number | {
+ min?: number;
+ max?: number;
+ }): boolean;
  /**
  * Keyword-frequency language detector supporting 12 languages.
  * **Fast and approximate** — detects the most common languages reliably
@@ -234,7 +238,7 @@ export declare function respondedWithinTime(startTime: number, maxMs: number): b
  * with an LLM for context-aware moderation.
  */
  export declare function hasNoToxicity(text: string): boolean;
- export declare function followsInstructions(text: string, instructions: string[]): boolean;
+ export declare function followsInstructions(text: string, instructions: string | string[]): boolean;
  export declare function containsAllRequiredFields(obj: unknown, requiredFields: string[]): boolean;
  export interface AssertionLLMConfig {
  provider: "openai" | "anthropic";
@@ -234,9 +234,10 @@ class Expectation {
  let parsedJson = null;
  try {
  parsedJson = JSON.parse(String(this.value));
- const requiredKeys = Object.keys(schema);
- const actualKeys = Object.keys(parsedJson);
- passed = requiredKeys.every((key) => actualKeys.includes(key));
+ const entries = Object.entries(schema);
+ passed = entries.every(([key, expectedValue]) => parsedJson !== null &&
+ key in parsedJson &&
+ JSON.stringify(parsedJson[key]) === JSON.stringify(expectedValue));
  }
  catch (_e) {
  passed = false;
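The hunk above tightens the JSON check in `Expectation`: 2.2.2 only verified that every key in `schema` was present, while 2.2.3 also requires each value to match, compared through `JSON.stringify`. The following is a minimal standalone sketch of the new rule. `matchesJsonSubset` is a hypothetical name, not an SDK export, and it returns `false` whenever the text is not a JSON object.

```typescript
// Hypothetical standalone version of the 2.2.3 rule: every key in `schema`
// must exist in the parsed JSON object with a JSON-identical value.
function matchesJsonSubset(text: string, schema: Record<string, unknown>): boolean {
  let parsed: unknown;
  try {
    parsed = JSON.parse(text);
  } catch {
    return false; // not valid JSON at all
  }
  if (parsed === null || typeof parsed !== "object") return false;
  const obj = parsed as Record<string, unknown>;
  return Object.entries(schema).every(
    ([key, expected]) =>
      key in obj && JSON.stringify(obj[key]) === JSON.stringify(expected),
  );
}

console.log(matchesJsonSubset('{"status":"ok","count":2}', { status: "ok" })); // true
console.log(matchesJsonSubset('{"status":"error"}', { status: "ok" })); // false (2.2.2's key-only check passed this)
console.log(matchesJsonSubset('{"a":{"x":1,"y":2}}', { a: { y: 2, x: 1 } })); // false: key order differs
```

Because `JSON.stringify` serializes keys in insertion order, nested objects with the same content but a different key order compare as unequal, as the last call shows.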
@@ -436,19 +437,30 @@ class Expectation {
  };
  }
  /**
- * Assert value contains code block
+ * Assert value contains code block or raw code
  * @example expect(output).toContainCode()
+ * @example expect(output).toContainCode('typescript')
  */
- toContainCode(message) {
+ toContainCode(language, message) {
  const text = String(this.value);
- const hasCodeBlock = /```[\s\S]*?```/.test(text) || /<code>[\s\S]*?<\/code>/.test(text);
+ const hasMarkdownBlock = language
+ ? new RegExp(`\`\`\`${language}[\\s\\S]*?\`\`\``).test(text)
+ : /```[\s\S]*?```/.test(text);
+ const hasHtmlBlock = /<code>[\s\S]*?<\/code>/.test(text);
+ const hasRawCode = /\bfunction\s+\w+\s*\(/.test(text) ||
+ /\b(?:const|let|var)\s+\w+\s*=/.test(text) ||
+ /\bclass\s+\w+/.test(text) ||
+ /=>\s*[{(]/.test(text) ||
+ /\bimport\s+.*\bfrom\b/.test(text) ||
+ /\bexport\s+(?:default\s+)?(?:function|class|const)/.test(text) ||
+ /\breturn\s+.+;/.test(text);
+ const hasCodeBlock = hasMarkdownBlock || hasHtmlBlock || hasRawCode;
  return {
  name: "toContainCode",
  passed: hasCodeBlock,
- expected: "code block",
+ expected: language ? `code block (${language})` : "code block",
  actual: text,
- message: message ||
- (hasCodeBlock ? "Contains code block" : "No code block found"),
+ message: message || (hasCodeBlock ? "Contains code" : "No code found"),
  };
  }
  /**
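The new `toContainCode` accepts raw source code as well as fenced or `<code>` blocks. The sketch below reproduces the detection logic from the hunk as a standalone function so the heuristics can be tried in isolation. `containsCode`, `escapeRegExp`, and `RAW_CODE_PATTERNS` are hypothetical helpers, and the sketch makes one deliberate change: it escapes `language` before building the `RegExp`. As the hunk reads, `language` is interpolated unescaped, so a value such as `c++` would make the `RegExp` constructor throw a `SyntaxError`.

```typescript
// Hypothetical standalone mirror of the 2.2.3 toContainCode heuristics.
const FENCE = "`".repeat(3); // avoids a literal triple backtick in this snippet

function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Same raw-code patterns as the diff, kept as a list.
const RAW_CODE_PATTERNS: RegExp[] = [
  /\bfunction\s+\w+\s*\(/, // function declarations
  /\b(?:const|let|var)\s+\w+\s*=/, // variable assignments
  /\bclass\s+\w+/, // class declarations
  /=>\s*[{(]/, // arrow functions with a block or parenthesized body
  /\bimport\s+.*\bfrom\b/, // ES module imports
  /\bexport\s+(?:default\s+)?(?:function|class|const)/, // exports
  /\breturn\s+.+;/, // return statements ending in a semicolon
];

function containsCode(text: string, language?: string): boolean {
  const fencePattern = language
    ? new RegExp(`${FENCE}${escapeRegExp(language)}[\\s\\S]*?${FENCE}`)
    : new RegExp(`${FENCE}[\\s\\S]*?${FENCE}`);
  const hasMarkdownBlock = fencePattern.test(text);
  const hasHtmlBlock = /<code>[\s\S]*?<\/code>/.test(text);
  const hasRawCode = RAW_CODE_PATTERNS.some((re) => re.test(text));
  return hasMarkdownBlock || hasHtmlBlock || hasRawCode;
}
```

The raw-code patterns are purely lexical, so ordinary prose can trigger them. For example, `containsCode("a class of problems")` returns `true` because `class of` matches the class-declaration pattern.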
@@ -719,7 +731,7 @@ function isValidURL(url) {
  * facts but cannot detect paraphrased fabrications. Use
  * {@link hasNoHallucinationsAsync} for semantic accuracy.
  */
- function hasNoHallucinations(text, groundTruth) {
+ function hasNoHallucinations(text, groundTruth = []) {
  const lower = text.toLowerCase();
  return groundTruth.every((truth) => lower.includes(truth.toLowerCase()));
  }
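Giving `groundTruth` a default of `[]` has a side effect worth knowing: `Array.prototype.every` returns `true` for an empty array, so calling `hasNoHallucinations(text)` without ground truth always passes. A copy of the changed function under the hypothetical name `hasNoHallucinationsSketch` makes this visible.

```typescript
// Copy of the 2.2.3 logic under a hypothetical name: every ground-truth
// string must appear (case-insensitively) somewhere in the text.
function hasNoHallucinationsSketch(text: string, groundTruth: string[] = []): boolean {
  const lower = text.toLowerCase();
  return groundTruth.every((truth) => lower.includes(truth.toLowerCase()));
}

console.log(hasNoHallucinationsSketch("The moon is made of cheese.")); // true: vacuously passes
console.log(hasNoHallucinationsSketch("Paris is the capital of France.", ["paris"])); // true
console.log(hasNoHallucinationsSketch("Paris is the capital of France.", ["Berlin"])); // false
```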
@@ -739,12 +751,14 @@ function matchesSchema(value, schema) {
  return Object.keys(schema).every((key) => key in obj);
  }
  function hasReadabilityScore(text, minScore) {
+ const threshold = typeof minScore === "number" ? minScore : (minScore.min ?? 0);
+ const maxThreshold = typeof minScore === "object" ? minScore.max : undefined;
  const wordList = text.trim().split(/\s+/).filter(Boolean);
  const words = wordList.length || 1;
  const sentences = text.split(/[.!?]+/).filter((s) => s.trim().length > 0).length || 1;
  const totalSyllables = wordList.reduce((sum, w) => sum + syllables(w), 0);
  const score = 206.835 - 1.015 * (words / sentences) - 84.6 * (totalSyllables / words);
- return score >= minScore;
+ return (score >= threshold && (maxThreshold === undefined || score <= maxThreshold));
  }
  function syllables(word) {
  // Simple syllable counter
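The bug fixed in `hasReadabilityScore` comes from JavaScript's relational comparison. Comparing a number with a plain object converts the object to `NaN`, and any comparison with `NaN` is `false`. The following sketch isolates the threshold handling from the Flesch score calculation. `withinReadabilityBounds` and `legacyCheck` are hypothetical names, and the score is passed in directly rather than computed from text.

```typescript
type ReadabilityBound = number | { min?: number; max?: number };

// Mirrors the 2.2.3 logic: a plain number is a lower bound; an object
// supplies an optional min (default 0) and an optional max.
function withinReadabilityBounds(score: number, bound: ReadabilityBound): boolean {
  const min = typeof bound === "number" ? bound : (bound.min ?? 0);
  const max = typeof bound === "object" ? bound.max : undefined;
  return score >= min && (max === undefined || score <= max);
}

// The 2.2.2 comparison: an object threshold is coerced to NaN.
function legacyCheck(score: number, bound: unknown): boolean {
  return score >= (bound as number);
}

console.log(legacyCheck(70, { min: 40 })); // false: 70 >= NaN
console.log(withinReadabilityBounds(70, { min: 40 })); // true
console.log(withinReadabilityBounds(70, { min: 40, max: 60 })); // false: above max
```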
@@ -1154,7 +1168,10 @@ function hasNoToxicity(text) {
  return !toxicTerms.some((term) => lower.includes(term));
  }
  function followsInstructions(text, instructions) {
- return instructions.every((instruction) => {
+ const instructionList = Array.isArray(instructions)
+ ? instructions
+ : [instructions];
+ return instructionList.every((instruction) => {
  if (instruction.startsWith("!")) {
  return !text.includes(instruction.slice(1));
  }
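Before 2.2.3, passing a single string to `followsInstructions` threw a `TypeError` because strings have no `every` method. The function now wraps a lone string in an array. The sketch below shows that normalization together with the `!` prefix convention visible in the hunk. The hunk is cut off before the non-negated branch, so the sketch assumes that a plain instruction is checked as a required substring. Treat that branch as a guess and `followsInstructionsSketch` as a hypothetical name.

```typescript
function followsInstructionsSketch(text: string, instructions: string | string[]): boolean {
  // 2.2.3 normalization: accept a single instruction or a list.
  const instructionList = Array.isArray(instructions) ? instructions : [instructions];
  return instructionList.every((instruction) => {
    if (instruction.startsWith("!")) {
      // "!foo" means the text must NOT contain "foo" (shown in the diff).
      return !text.includes(instruction.slice(1));
    }
    // Assumption: a plain instruction must appear verbatim in the text.
    return text.includes(instruction);
  });
}
```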
package/dist/cache.d.ts CHANGED
@@ -21,7 +21,7 @@ export declare class RequestCache {
  /**
  * Store response in cache
  */
- set<T>(method: string, url: string, data: T, ttl: number, params?: unknown): void;
+ set<T>(method: string, url: string, data: T, ttl?: number, params?: unknown): void;
  /**
  * Invalidate specific cache entry
  */
package/dist/cache.js CHANGED
@@ -43,7 +43,7 @@ class RequestCache {
  /**
  * Store response in cache
  */
- set(method, url, data, ttl, params) {
+ set(method, url, data, ttl = exports.CacheTTL.MEDIUM, params) {
  // Enforce cache size limit (LRU-style)
  if (this.cache.size >= this.maxSize) {
  const firstKey = this.cache.keys().next().value;
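The changelog says entries stored without a TTL were stale on the very next read. Assuming the cache records an absolute expiry of `now + ttl` (the rest of `RequestCache` is not in this diff), the failure is arithmetic: `Date.now() + undefined` is `NaN`, and `now < NaN` is always `false`. The sketch below is a hypothetical `TtlCache`, not the SDK class. `MEDIUM_TTL_MS` is a placeholder because the actual value of `CacheTTL.MEDIUM` is not shown here.

```typescript
// Placeholder: the real CacheTTL.MEDIUM value is not part of this diff.
const MEDIUM_TTL_MS = 5 * 60 * 1000;

interface Entry<T> {
  data: T;
  expiresAt: number; // absolute timestamp in ms
}

class TtlCache<T> {
  private readonly entries = new Map<string, Entry<T>>();
  private readonly now: () => number;

  constructor(now: () => number = () => Date.now()) {
    this.now = now; // injectable clock for testing
  }

  // Omitting ttl now falls back to a default instead of producing NaN.
  set(key: string, data: T, ttl: number = MEDIUM_TTL_MS): void {
    this.entries.set(key, { data, expiresAt: this.now() + ttl });
  }

  get(key: string): T | undefined {
    const entry = this.entries.get(key);
    // With a NaN expiresAt this comparison is false, i.e. always a miss.
    if (entry === undefined || !(this.now() < entry.expiresAt)) return undefined;
    return entry.data;
  }
}
```

An explicit `ttl` still overrides the default, so existing callers that pass one are unaffected.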
@@ -480,7 +480,12 @@ After upgrading:
  console.log(" - package.json eval:regression-gate + eval:baseline-update");
  console.log(" - .github/workflows/ Gate + governance workflows");
  console.log(" - .github/CODEOWNERS Baseline requires approval\n");
+ console.log(" ⚠️ IMPORTANT — Reset your baseline before pushing:");
+ console.log(" The gate compares against your existing Tier 1 baseline.");
+ console.log(" If your test script changed, run this first to avoid an immediate regression:");
+ console.log(" npx evalgate baseline update (or: pnpm eval:baseline-update)\n");
  console.log(" Next:");
+ console.log(" npx evalgate baseline update");
  console.log(" git add -A");
  console.log(" git commit -m 'chore: upgrade EvalGate gate to Tier 2'");
  console.log(" git push\n");
package/dist/client.js CHANGED
@@ -72,7 +72,7 @@ class AIEvalClient {
  this.baseUrl =
  config.baseUrl ||
  getEnvVar("EVALGATE_BASE_URL", "EVALAI_BASE_URL") ||
- (isBrowser ? "" : "http://localhost:3000");
+ (isBrowser ? "" : "https://api.evalgate.com");
  this.timeout = config.timeout || 30000;
  // Tier 4.17: Debug mode with request logging
  const logLevel = config.logLevel || (config.debug ? "debug" : "info");
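The resolution order in this hunk is: explicit `config.baseUrl`, then the environment variables, then a platform default. A hypothetical `resolveBaseUrl` helper makes the order testable. It assumes `getEnvVar` returns the first of `EVALGATE_BASE_URL` and `EVALAI_BASE_URL` that is set, which this diff does not show. It also takes the environment as a plain object instead of reading `process.env`.

```typescript
const DEFAULT_BASE_URL = "https://api.evalgate.com"; // 2.2.3 default (was http://localhost:3000)

function resolveBaseUrl(
  configBaseUrl: string | undefined,
  env: Record<string, string | undefined>,
  isBrowser: boolean,
): string {
  return (
    configBaseUrl ||
    env["EVALGATE_BASE_URL"] || // assumed to take precedence over the legacy name
    env["EVALAI_BASE_URL"] ||
    (isBrowser ? "" : DEFAULT_BASE_URL)
  );
}
```

In practice, a local development setup that relied on the old `http://localhost:3000` fallback now has to set `baseUrl` or one of the environment variables explicitly.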
package/dist/errors.js CHANGED
@@ -271,6 +271,10 @@ class RateLimitError extends EvalGateError {
  constructor(message, retryAfter) {
  super(message, "RATE_LIMIT_EXCEEDED", 429, { retryAfter });
  this.name = "RateLimitError";
+ if (retryAfter !== undefined) {
+ this.retryAfter = retryAfter;
+ }
+ Object.setPrototypeOf(this, RateLimitError.prototype);
  }
  }
  exports.RateLimitError = RateLimitError;
@@ -278,6 +282,7 @@ class AuthenticationError extends EvalGateError {
  constructor(message = "Authentication failed") {
  super(message, "AUTHENTICATION_ERROR", 401);
  this.name = "AuthenticationError";
+ Object.setPrototypeOf(this, AuthenticationError.prototype);
  }
  }
  exports.AuthenticationError = AuthenticationError;
@@ -285,6 +290,7 @@ class ValidationError extends EvalGateError {
  constructor(message = "Validation failed", details) {
  super(message, "VALIDATION_ERROR", 400, details);
  this.name = "ValidationError";
+ Object.setPrototypeOf(this, ValidationError.prototype);
  }
  }
  exports.ValidationError = ValidationError;
@@ -293,6 +299,7 @@ class NetworkError extends EvalGateError {
  super(message, "NETWORK_ERROR", 0);
  this.name = "NetworkError";
  this.retryable = true;
+ Object.setPrototypeOf(this, NetworkError.prototype);
  }
  }
  exports.NetworkError = NetworkError;
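Each subclass now calls `Object.setPrototypeOf(this, Subclass.prototype)` after `super()`. This is the standard workaround for subclassing `Error` when TypeScript compiles to an ES5 target: the built-in `Error` constructor returns a fresh object whose prototype is `Error.prototype`, which breaks `instanceof` for the subclass. With native ES2015 classes the call is redundant but harmless. The sketch below mirrors the `RateLimitError` constructor from the hunk. `BaseError` and `RateLimitErrorSketch` are hypothetical stand-ins. The base constructor's `(message, code, statusCode, details)` parameter order is inferred from the `super(...)` calls, and the property names `code`, `statusCode`, and `details` are assumptions.

```typescript
class BaseError extends Error {
  code: string;
  statusCode: number;
  details?: Record<string, unknown>;

  constructor(message: string, code: string, statusCode: number, details?: Record<string, unknown>) {
    super(message);
    this.name = "BaseError";
    this.code = code;
    this.statusCode = statusCode;
    this.details = details;
    Object.setPrototypeOf(this, BaseError.prototype);
  }
}

class RateLimitErrorSketch extends BaseError {
  retryAfter?: number;

  constructor(message: string, retryAfter?: number) {
    super(message, "RATE_LIMIT_EXCEEDED", 429, { retryAfter });
    this.name = "RateLimitErrorSketch";
    // 2.2.3: expose retryAfter directly, not only via details.retryAfter.
    if (retryAfter !== undefined) {
      this.retryAfter = retryAfter;
    }
    // Restore the subclass prototype (matters for ES5 compilation targets).
    Object.setPrototypeOf(this, RateLimitErrorSketch.prototype);
  }
}
```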
package/dist/export.js CHANGED
@@ -155,7 +155,7 @@ async function importData(client, data, options = {}) {
  return result;
  }
  // Import traces
- if (data.traces) {
+ if (data.traces && client?.traces) {
  const traceResults = { imported: 0, skipped: 0, failed: 0 };
  for (const trace of data.traces) {
  try {
@@ -191,7 +191,7 @@ async function importData(client, data, options = {}) {
  result.summary.total += data.traces.length;
  }
  // Import evaluations
- if (data.evaluations) {
+ if (data.evaluations && client?.evaluations) {
  const evalResults = { imported: 0, skipped: 0, failed: 0 };
  for (const evaluation of data.evaluations) {
  try {
package/dist/index.d.ts CHANGED
@@ -20,8 +20,8 @@ export { createEvalRuntime, disposeActiveRuntime, getActiveRuntime, setActiveRun
  export type { CloudExecutor, DefineEvalFunction, EvalContext, EvalExecutor, EvalExecutorInterface, EvalOptions, EvalResult, EvalRuntime, EvalSpec, ExecutorCapabilities, LocalExecutor, SpecConfig, SpecOptions, WorkerExecutor, } from "./runtime/types";
  export { EvalRuntimeError, RuntimeError, SpecExecutionError, SpecRegistrationError, } from "./runtime/types";
  export { createTestSuite, type TestCaseResult, TestSuite, TestSuiteCase, TestSuiteCaseResult, TestSuiteConfig, TestSuiteResult, } from "./testing";
- import { compareWithSnapshot, snapshot } from "./snapshot";
- export { snapshot, compareWithSnapshot, snapshot as saveSnapshot, compareWithSnapshot as compareSnapshots, };
+ import { compareSnapshots, compareWithSnapshot, snapshot } from "./snapshot";
+ export { snapshot, compareWithSnapshot, compareSnapshots, snapshot as saveSnapshot, };
  import type { ExportFormat } from "./export";
  import { exportData, importData } from "./export";
  export { exportData, importData };
@@ -34,7 +34,7 @@ export { traceOpenAI } from "./integrations/openai";
  export { type OpenAIChatEvalCase, type OpenAIChatEvalOptions, type OpenAIChatEvalResult, openAIChatEval, } from "./integrations/openai-eval";
  export { Logger } from "./logger";
  export { extendExpectWithToPassGate } from "./matchers";
- export { autoPaginate, createPaginatedIterator, decodeCursor, encodeCursor, PaginatedIterator, type PaginatedResponse, type PaginationParams, } from "./pagination";
+ export { autoPaginate, autoPaginateGenerator, createPaginatedIterator, decodeCursor, encodeCursor, PaginatedIterator, type PaginatedResponse, type PaginationParams, } from "./pagination";
  export { ARTIFACTS, type Baseline, type BaselineTolerance, GATE_CATEGORY, GATE_EXIT, type GateCategory, type GateExitCode, REPORT_SCHEMA_VERSION, type RegressionDelta, type RegressionReport, } from "./regression";
  export { batchProcess, batchRead, RateLimiter, streamEvaluation, } from "./streaming";
  export type { Annotation, AnnotationItem, AnnotationTask, APIKey, APIKeyUsage, APIKeyWithSecret, BatchOptions, ClientConfig as AIEvalConfig, CreateAnnotationItemParams, CreateAnnotationParams, CreateAnnotationTaskParams, CreateAPIKeyParams, CreateLLMJudgeConfigParams, CreateWebhookParams, Evaluation as EvaluationData, EvaluationRun, EvaluationRunDetail, ExportOptions, GenericMetadata as AnnotationData, GetLLMJudgeAlignmentParams, GetUsageParams, ImportOptions, ListAnnotationItemsParams, ListAnnotationsParams, ListAnnotationTasksParams, ListAPIKeysParams, ListLLMJudgeConfigsParams, ListLLMJudgeResultsParams, ListWebhookDeliveriesParams, ListWebhooksParams, LLMJudgeAlignment, LLMJudgeConfig, LLMJudgeEvaluateResult, LLMJudgeResult as LLMJudgeData, Organization, RetryConfig, SnapshotData, Span as SpanData, StreamOptions, TestCase, TestResult, Trace as TraceData, TraceDetail, TracedResponse, UpdateAPIKeyParams, UpdateWebhookParams, UsageStats, UsageSummary, Webhook, WebhookDelivery, } from "./types";
package/dist/index.js CHANGED
@@ -9,7 +9,7 @@
  */
  Object.defineProperty(exports, "__esModule", { value: true });
  exports.defaultLocalExecutor = exports.createLocalExecutor = exports.evalai = exports.defineSuite = exports.defineEval = exports.createResult = exports.createEvalContext = exports.validateContext = exports.mergeContexts = exports.cloneContext = exports.ContextManager = exports.withContext = exports.getContext = exports.createContext = exports.withinRange = exports.similarTo = exports.respondedWithinTime = exports.notContainsPII = exports.matchesSchema = exports.matchesPattern = exports.isValidURL = exports.isValidEmail = exports.hasValidCodeSyntaxAsync = exports.hasValidCodeSyntax = exports.hasSentimentAsync = exports.hasSentiment = exports.hasReadabilityScore = exports.hasPII = exports.hasNoToxicityAsync = exports.hasNoToxicity = exports.hasNoHallucinationsAsync = exports.hasNoHallucinations = exports.hasLength = exports.hasFactualAccuracyAsync = exports.hasFactualAccuracy = exports.getAssertionConfig = exports.followsInstructions = exports.expect = exports.containsLanguageAsync = exports.containsLanguage = exports.containsKeywords = exports.containsJSON = exports.containsAllRequiredFields = exports.configureAssertions = exports.NetworkError = exports.ValidationError = exports.AuthenticationError = exports.RateLimitError = exports.EvalGateError = exports.AIEvalClient = void 0;
- exports.WorkflowTracer = exports.traceWorkflowStep = exports.traceLangChainAgent = exports.traceCrewAI = exports.traceAutoGen = exports.createWorkflowTracer = exports.EvaluationTemplates = exports.streamEvaluation = exports.RateLimiter = exports.batchRead = exports.batchProcess = exports.REPORT_SCHEMA_VERSION = exports.GATE_EXIT = exports.GATE_CATEGORY = exports.ARTIFACTS = exports.PaginatedIterator = exports.encodeCursor = exports.decodeCursor = exports.createPaginatedIterator = exports.autoPaginate = exports.extendExpectWithToPassGate = exports.Logger = exports.openAIChatEval = exports.traceOpenAI = exports.traceAnthropic = exports.runCheck = exports.parseArgs = exports.EXIT = exports.RequestCache = exports.CacheTTL = exports.RequestBatcher = exports.importData = exports.exportData = exports.compareSnapshots = exports.saveSnapshot = exports.compareWithSnapshot = exports.snapshot = exports.TestSuite = exports.createTestSuite = exports.SpecRegistrationError = exports.SpecExecutionError = exports.RuntimeError = exports.EvalRuntimeError = exports.setActiveRuntime = exports.getActiveRuntime = exports.disposeActiveRuntime = exports.createEvalRuntime = void 0;
+ exports.WorkflowTracer = exports.traceWorkflowStep = exports.traceLangChainAgent = exports.traceCrewAI = exports.traceAutoGen = exports.createWorkflowTracer = exports.EvaluationTemplates = exports.streamEvaluation = exports.RateLimiter = exports.batchRead = exports.batchProcess = exports.REPORT_SCHEMA_VERSION = exports.GATE_EXIT = exports.GATE_CATEGORY = exports.ARTIFACTS = exports.PaginatedIterator = exports.encodeCursor = exports.decodeCursor = exports.createPaginatedIterator = exports.autoPaginateGenerator = exports.autoPaginate = exports.extendExpectWithToPassGate = exports.Logger = exports.openAIChatEval = exports.traceOpenAI = exports.traceAnthropic = exports.runCheck = exports.parseArgs = exports.EXIT = exports.RequestCache = exports.CacheTTL = exports.RequestBatcher = exports.importData = exports.exportData = exports.saveSnapshot = exports.compareSnapshots = exports.compareWithSnapshot = exports.snapshot = exports.TestSuite = exports.createTestSuite = exports.SpecRegistrationError = exports.SpecExecutionError = exports.RuntimeError = exports.EvalRuntimeError = exports.setActiveRuntime = exports.getActiveRuntime = exports.disposeActiveRuntime = exports.createEvalRuntime = void 0;
  // Main SDK exports
  var client_1 = require("./client");
  Object.defineProperty(exports, "AIEvalClient", { enumerable: true, get: function () { return client_1.AIEvalClient; } });
@@ -91,8 +91,8 @@ Object.defineProperty(exports, "createTestSuite", { enumerable: true, get: funct
  Object.defineProperty(exports, "TestSuite", { enumerable: true, get: function () { return testing_1.TestSuite; } });
  // Snapshot testing (Tier 2.8)
  const snapshot_1 = require("./snapshot");
+ Object.defineProperty(exports, "compareSnapshots", { enumerable: true, get: function () { return snapshot_1.compareSnapshots; } });
  Object.defineProperty(exports, "compareWithSnapshot", { enumerable: true, get: function () { return snapshot_1.compareWithSnapshot; } });
- Object.defineProperty(exports, "compareSnapshots", { enumerable: true, get: function () { return snapshot_1.compareWithSnapshot; } });
  Object.defineProperty(exports, "snapshot", { enumerable: true, get: function () { return snapshot_1.snapshot; } });
  Object.defineProperty(exports, "saveSnapshot", { enumerable: true, get: function () { return snapshot_1.snapshot; } });
  // Export/Import utilities (Tier 4.18)
@@ -130,6 +130,7 @@ var matchers_1 = require("./matchers");
  Object.defineProperty(exports, "extendExpectWithToPassGate", { enumerable: true, get: function () { return matchers_1.extendExpectWithToPassGate; } });
  var pagination_1 = require("./pagination");
  Object.defineProperty(exports, "autoPaginate", { enumerable: true, get: function () { return pagination_1.autoPaginate; } });
+ Object.defineProperty(exports, "autoPaginateGenerator", { enumerable: true, get: function () { return pagination_1.autoPaginateGenerator; } });
  Object.defineProperty(exports, "createPaginatedIterator", { enumerable: true, get: function () { return pagination_1.createPaginatedIterator; } });
  Object.defineProperty(exports, "decodeCursor", { enumerable: true, get: function () { return pagination_1.decodeCursor; } });
  Object.defineProperty(exports, "encodeCursor", { enumerable: true, get: function () { return pagination_1.encodeCursor; } });
@@ -67,7 +67,7 @@ function traceAnthropic(anthropic, evalClient, options = {}) {
  }
  : {}),
  });
- await evalClient.traces.create({
+ await evalClient.traces?.create({
  name: `Anthropic: ${params.model}`,
  traceId,
  organizationId: organizationId || evalClient.getOrganizationId(),
@@ -89,7 +89,7 @@ function traceAnthropic(anthropic, evalClient, options = {}) {
  error: error instanceof Error ? error.message : String(error),
  });
  await evalClient.traces
- .create({
+ ?.create({
  name: `Anthropic: ${params.model}`,
  traceId,
  organizationId: organizationId || evalClient.getOrganizationId(),
@@ -97,7 +97,7 @@ function traceAnthropic(anthropic, evalClient, options = {}) {
  durationMs,
  metadata: errorMetadata,
  })
- .catch(() => {
+ ?.catch(() => {
  // Ignore errors in trace creation to avoid masking the original error
  });
  throw error;
@@ -127,7 +127,7 @@ async function traceAnthropicCall(evalClient, name, fn, options = {}) {
  const startTime = Date.now();
  const traceId = `anthropic-${Date.now()}-${Math.random().toString(36).substr(2, 9)}`;
  try {
- await evalClient.traces.create({
+ await evalClient.traces?.create({
  name,
  traceId,
  organizationId: options.organizationId || evalClient.getOrganizationId(),
@@ -136,7 +136,7 @@ async function traceAnthropicCall(evalClient, name, fn, options = {}) {
  });
  const result = await fn();
  const durationMs = Date.now() - startTime;
- await evalClient.traces.create({
+ await evalClient.traces?.create({
  name,
  traceId,
  organizationId: options.organizationId || evalClient.getOrganizationId(),
@@ -148,7 +148,7 @@ async function traceAnthropicCall(evalClient, name, fn, options = {}) {
  }
  catch (error) {
  const durationMs = Date.now() - startTime;
- await evalClient.traces.create({
+ await evalClient.traces?.create({
  name,
  traceId,
  organizationId: options.organizationId || evalClient.getOrganizationId(),
@@ -65,7 +65,7 @@ function traceOpenAI(openai, evalClient, options = {}) {
  }
  : {}),
  });
- await evalClient.traces.create({
+ await evalClient.traces?.create({
  name: `OpenAI: ${params.model}`,
  traceId,
  organizationId: organizationId || evalClient.getOrganizationId(),
@@ -87,7 +87,7 @@ function traceOpenAI(openai, evalClient, options = {}) {
  error: error instanceof Error ? error.message : String(error),
  });
  await evalClient.traces
- .create({
+ ?.create({
  name: `OpenAI: ${params.model}`,
  traceId,
  organizationId: organizationId || evalClient.getOrganizationId(),
@@ -95,7 +95,7 @@ function traceOpenAI(openai, evalClient, options = {}) {
  durationMs,
  metadata: errorMetadata,
  })
- .catch(() => {
+ ?.catch(() => {
  // Ignore errors in trace creation to avoid masking the original error
  });
  throw error;
@@ -124,7 +124,7 @@ async function traceOpenAICall(evalClient, name, fn, options = {}) {
  const startTime = Date.now();
  const traceId = `openai-${Date.now()}-${Math.random().toString(36).substr(2, 9)}`;
  try {
- await evalClient.traces.create({
+ await evalClient.traces?.create({
  name,
  traceId,
  organizationId: options.organizationId || evalClient.getOrganizationId(),
@@ -133,7 +133,7 @@ async function traceOpenAICall(evalClient, name, fn, options = {}) {
  });
  const result = await fn();
  const durationMs = Date.now() - startTime;
- await evalClient.traces.create({
+ await evalClient.traces?.create({
  name,
  traceId,
  organizationId: options.organizationId || evalClient.getOrganizationId(),
@@ -145,7 +145,7 @@ async function traceOpenAICall(evalClient, name, fn, options = {}) {
  }
  catch (error) {
  const durationMs = Date.now() - startTime;
- await evalClient.traces.create({
+ await evalClient.traces?.create({
  name,
  traceId,
  organizationId: options.organizationId || evalClient.getOrganizationId(),
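The integrations now call `evalClient.traces?.create(...)`, and the error path chains `?.catch(...)`. Optional chaining short-circuits the whole member chain. When `traces` is missing, neither `create` nor `catch` is evaluated and the `await` receives `undefined`. The second `?.` only matters if `create` itself returned `undefined`. The sketch below uses hypothetical `EvalClientLike` and `recordTrace` names and a simplified payload.

```typescript
interface TracesResource {
  create(payload: { name: string }): Promise<unknown>;
}

interface EvalClientLike {
  traces?: TracesResource; // may be absent, e.g. in a minimal test client
}

async function recordTrace(client: EvalClientLike, name: string): Promise<void> {
  await client.traces
    ?.create({ name })
    ?.catch(() => {
      // Swallow trace failures so they never mask the caller's own error.
    });
}
```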
@@ -50,9 +50,20 @@ export declare function createPaginatedIterator<T>(fetchFn: (offset: number, lim
  hasMore: boolean;
  }>, limit?: number): PaginatedIterator<T>;
  /**
- * Auto-paginate helper that fetches all pages automatically
+ * Auto-paginate helper that fetches all pages and returns a flat array.
+ * @example
+ * ```typescript
+ * const allItems = await autoPaginate(
+ * (offset, limit) => client.traces.list({ offset, limit }),
+ * );
+ * ```
  */
- export declare function autoPaginate<T>(fetchFn: (offset: number, limit: number) => Promise<T[]>, limit?: number): AsyncGenerator<T, void, unknown>;
+ export declare function autoPaginate<T>(fetchFn: (offset: number, limit: number) => Promise<T[]>, limit?: number): Promise<T[]>;
+ /**
+ * Streaming auto-paginate generator — yields individual items one at a time.
+ * Use this when you want to process items as they arrive rather than waiting for all pages.
+ */
+ export declare function autoPaginateGenerator<T>(fetchFn: (offset: number, limit: number) => Promise<T[]>, limit?: number): AsyncGenerator<T, void, unknown>;
  /**
  * Encode cursor for pagination (base64)
  */
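Why the old signature misled callers: `await` only unwraps thenables, and an `AsyncGenerator` object has no `then` method, so under 2.2.2 `await autoPaginate(fetcher)` handed back the generator itself without fetching a single page. The sketch below contrasts awaiting a generator with consuming it via `for await`. The in-memory `fetcher` is hypothetical, and the generator's loop body is assumed to mirror the visible `autoPaginate` loop, since the hunk does not show the generator in full.

```typescript
type Fetcher<T> = (offset: number, limit: number) => Promise<T[]>;

// Hypothetical in-memory data source standing in for an API list call.
const data = Array.from({ length: 5 }, (_, i) => `item-${i}`);
const fetcher: Fetcher<string> = async (offset, limit) => data.slice(offset, offset + limit);

// Generator shape (2.2.2 autoPaginate, now autoPaginateGenerator), sketched.
async function* pagesAsGenerator<T>(fetchFn: Fetcher<T>, limit = 2): AsyncGenerator<T, void, unknown> {
  let offset = 0;
  let hasMore = true;
  while (hasMore) {
    const items = await fetchFn(offset, limit);
    if (items.length === 0) break;
    for (const item of items) yield item;
    hasMore = items.length === limit;
    offset += limit;
  }
}

async function demo(): Promise<void> {
  // Awaiting the generator object does not run it: you get the generator back.
  const notAnArray = await pagesAsGenerator(fetcher);
  console.log(Array.isArray(notAnArray)); // false

  // Consuming it requires for-await, one item at a time.
  const collected: string[] = [];
  for await (const item of pagesAsGenerator(fetcher)) collected.push(item);
  console.log(collected.length); // 5
}

demo();
```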
@@ -6,6 +6,7 @@ Object.defineProperty(exports, "__esModule", { value: true });
  exports.PaginatedIterator = void 0;
  exports.createPaginatedIterator = createPaginatedIterator;
  exports.autoPaginate = autoPaginate;
+ exports.autoPaginateGenerator = autoPaginateGenerator;
  exports.encodeCursor = encodeCursor;
  exports.decodeCursor = decodeCursor;
  exports.createPaginationMeta = createPaginationMeta;
@@ -56,9 +57,34 @@ function createPaginatedIterator(fetchFn, limit = 50) {
  return new PaginatedIterator(fetchFn, limit);
  }
  /**
- * Auto-paginate helper that fetches all pages automatically
+ * Auto-paginate helper that fetches all pages and returns a flat array.
+ * @example
+ * ```typescript
+ * const allItems = await autoPaginate(
+ * (offset, limit) => client.traces.list({ offset, limit }),
+ * );
+ * ```
  */
- async function* autoPaginate(fetchFn, limit = 50) {
+ async function autoPaginate(fetchFn, limit = 50) {
+ const result = [];
+ let offset = 0;
+ let hasMore = true;
+ while (hasMore) {
+ const items = await fetchFn(offset, limit);
+ if (items.length === 0) {
+ break;
+ }
+ result.push(...items);
+ hasMore = items.length === limit;
+ offset += limit;
+ }
+ return result;
+ }
+ /**
+ * Streaming auto-paginate generator — yields individual items one at a time.
+ * Use this when you want to process items as they arrive rather than waiting for all pages.
+ */
+ async function* autoPaginateGenerator(fetchFn, limit = 50) {
  let offset = 0;
  let hasMore = true;
  while (hasMore) {
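The new `autoPaginate` loop stops on either an empty page or a short page (`items.length < limit`). One consequence: when the total item count is an exact multiple of `limit`, the last full page leaves `hasMore` true, so one extra request returns `[]` before the loop exits. The sketch below reproduces the loop from the diff with types added and a hypothetical counting fetcher that records the offset of every request.

```typescript
type Fetcher<T> = (offset: number, limit: number) => Promise<T[]>;

// Same control flow as the patched autoPaginate, with TypeScript types added.
async function autoPaginate<T>(fetchFn: Fetcher<T>, limit = 50): Promise<T[]> {
  const result: T[] = [];
  let offset = 0;
  let hasMore = true;
  while (hasMore) {
    const items = await fetchFn(offset, limit);
    if (items.length === 0) break; // empty page: nothing left
    result.push(...items);
    hasMore = items.length === limit; // short page: this was the last one
    offset += limit;
  }
  return result;
}

// Hypothetical fetcher over the numbers 0..total-1 that logs each request offset.
function countingFetcher(total: number) {
  const calls: number[] = [];
  const fetchFn: Fetcher<number> = async (offset, limit) => {
    calls.push(offset);
    const end = Math.min(offset + limit, total);
    return Array.from({ length: Math.max(0, end - offset) }, (_, i) => offset + i);
  };
  return { fetchFn, calls };
}

async function demo(): Promise<void> {
  const uneven = countingFetcher(5);
  // 5 items; requests at offsets 0, 2, 4 (the last page is short)
  console.log((await autoPaginate(uneven.fetchFn, 2)).length, uneven.calls);

  const exact = countingFetcher(4);
  // 4 items; requests at offsets 0, 2, 4 (the last request returns [])
  console.log((await autoPaginate(exact.fetchFn, 2)).length, exact.calls);
}

demo();
```

The extra empty request is the price of not knowing the total up front; a caller with a server-reported total could stop earlier, but that information is not part of this `fetchFn` signature.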
@@ -208,12 +208,7 @@ function generateDefineEvalCode(suite, options = {}) {
  });
  const helperFunctions = generateHelperFunctionsForSuite(specs, options);
  const evaluationFunction = generateEvaluationFunction();
- return [
- ...imports,
- ...helperFunctions,
- ...evaluationFunction,
- ...specCode,
- ].join("\n");
+ return [...imports, helperFunctions, evaluationFunction, ...specCode].join("\n");
  }
  /**
  * Generate helper functions for a specific spec
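The one-line `return` change in `generateDefineEvalCode` matters if, as the new code implies, `generateHelperFunctionsForSuite` and `generateEvaluationFunction` return strings rather than arrays; their bodies are not shown here, so treat that as an assumption. Spreading a string into an array literal splits it into single characters, so the old version would have emitted each character on its own line. A self-contained illustration with made-up placeholder values:

```typescript
// Placeholder generator outputs; the assumption is that the helpers return strings.
const imports: string[] = ['import "./setup";'];
const helperFunctions = "function helper() {}";
const evaluationFunction = "function evaluate() {}";
const specCode: string[] = ["run();"];

// Old shape: `...helperFunctions` spreads the string into individual characters.
const before = [...imports, ...helperFunctions, ...evaluationFunction, ...specCode].join("\n");

// New shape: each string is inserted as a single array element.
const after = [...imports, helperFunctions, evaluationFunction, ...specCode].join("\n");

console.log(before.split("\n").length); // 44 (1 + 20 chars + 22 chars + 1)
console.log(after.split("\n").length); // 4
```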
@@ -10,7 +10,8 @@ import type { LocalExecutor } from "./types";
  */
  export declare function createLocalExecutor(): LocalExecutor;
  /**
- * Default local executor instance
+ * Default local executor factory
+ * Call as defaultLocalExecutor() to get a new executor instance.
  * For convenience in simple use cases
  */
- export declare const defaultLocalExecutor: LocalExecutor;
+ export declare const defaultLocalExecutor: typeof createLocalExecutor;
@@ -146,7 +146,8 @@ function createLocalExecutor() {
  return new LocalExecutorImpl();
  }
  /**
- * Default local executor instance
+ * Default local executor factory
+ * Call as defaultLocalExecutor() to get a new executor instance.
  * For convenience in simple use cases
  */
- exports.defaultLocalExecutor = createLocalExecutor();
+ exports.defaultLocalExecutor = createLocalExecutor;
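`defaultLocalExecutor` changes from a module-level instance to the factory itself, so code that used it as an object must now call it. One plausible motivation, not stated in the diff, is to avoid a single executor shared by every importer. The sketch uses a hypothetical `CountingExecutor` in place of the SDK's `LocalExecutorImpl` to show how the two export styles differ when the executor holds state.

```typescript
// Hypothetical stand-in for LocalExecutorImpl: it carries per-instance state.
class CountingExecutor {
  private runs = 0;
  run(): number {
    this.runs += 1;
    return this.runs;
  }
}

function createLocalExecutor(): CountingExecutor {
  return new CountingExecutor();
}

// 2.2.2 style: one instance created at import time and shared by every caller.
const sharedExecutor = createLocalExecutor();
// 2.2.3 style: export the factory; each call yields a fresh instance.
const defaultLocalExecutor: typeof createLocalExecutor = createLocalExecutor;

sharedExecutor.run();
console.log(sharedExecutor.run()); // 2: state carried over from the first caller

const a = defaultLocalExecutor();
const b = defaultLocalExecutor();
a.run();
console.log(b.run()); // 1: b is independent of a
```

Because the declared type changed from `LocalExecutor` to `typeof createLocalExecutor`, TypeScript callers that still treat `defaultLocalExecutor` as an executor object should get compile errors at those sites; the fix is to call it as `defaultLocalExecutor()`.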
@@ -61,7 +61,10 @@ export interface SerializedSpec {
  * Create a new scoped runtime with lifecycle management
  * Returns a handle for proper resource management
  */
- export declare function createEvalRuntime(projectRoot?: string): RuntimeHandle;
+ export declare function createEvalRuntime(projectRootOrConfig?: string | {
+ name?: string;
+ projectRoot?: string;
+ }): RuntimeHandle;
  /**
  * Helper function for safe runtime execution with automatic cleanup
  * Ensures runtime is disposed even if an exception is thrown
@@ -315,7 +315,10 @@ class EvalRuntimeImpl {
  * Create a new scoped runtime with lifecycle management
  * Returns a handle for proper resource management
  */
- function createEvalRuntime(projectRoot = process.cwd()) {
+ function createEvalRuntime(projectRootOrConfig = process.cwd()) {
+ const projectRoot = typeof projectRootOrConfig === "string"
+ ? projectRootOrConfig
+ : (projectRootOrConfig.projectRoot ?? process.cwd());
  const runtime = new EvalRuntimeImpl(projectRoot);
  // Create bound defineEval function
  const boundDefineEval = ((nameOrConfig, executor, options) => {
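The patched `createEvalRuntime` accepts either a path string or a `{ name?, projectRoot? }` object and reduces both to a single `projectRoot`. Below is a standalone sketch of that normalization. `resolveProjectRoot` is a hypothetical helper, and the working directory is passed in explicitly (the SDK uses `process.cwd()`) so the behaviour is easy to check. In the lines shown, `name` is accepted but plays no part in deriving the root.

```typescript
type RuntimeArg = string | { name?: string; projectRoot?: string };

// Same branching as the patched createEvalRuntime; `cwd` stands in for process.cwd().
// The default parameter mirrors the SDK's: it applies only when `arg` is undefined.
function resolveProjectRoot(cwd: string, arg: RuntimeArg = cwd): string {
  return typeof arg === "string" ? arg : (arg.projectRoot ?? cwd);
}

const cwd = "/work/app";
console.log(resolveProjectRoot(cwd)); // /work/app
console.log(resolveProjectRoot(cwd, "/srv/evals")); // /srv/evals
console.log(resolveProjectRoot(cwd, { projectRoot: "/srv/evals" })); // /srv/evals
console.log(resolveProjectRoot(cwd, { name: "nightly" })); // /work/app
```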
@@ -166,6 +166,18 @@ export declare function loadSnapshot(name: string, dir?: string): Promise<Snapsh
  * ```
  */
  export declare function compareWithSnapshot(name: string, currentOutput: unknown, dir?: string): Promise<SnapshotComparison>;
+ /**
+ * Compare two saved snapshots by name (convenience function)
+ *
+ * @example
+ * ```typescript
+ * const comparison = await compareSnapshots('baseline', 'current');
+ * if (!comparison.matches) {
+ * console.log('Snapshots differ!', comparison.differences);
+ * }
+ * ```
+ */
+ export declare function compareSnapshots(nameA: string, nameB: string, dir?: string): Promise<SnapshotComparison>;
  /**
  * Delete a snapshot (convenience function)
  */
package/dist/snapshot.js CHANGED
@@ -55,6 +55,7 @@ exports.SnapshotManager = void 0;
  exports.snapshot = snapshot;
  exports.loadSnapshot = loadSnapshot;
  exports.compareWithSnapshot = compareWithSnapshot;
+ exports.compareSnapshots = compareSnapshots;
  exports.deleteSnapshot = deleteSnapshot;
  exports.listSnapshots = listSnapshots;
  // Environment check
@@ -130,7 +131,13 @@ class SnapshotManager {
  if (!options?.overwrite && fs.existsSync(filePath)) {
  throw new Error(`Snapshot '${name}' already exists. Use overwrite: true to update.`);
  }
- const serialized = typeof output === "string" ? output : JSON.stringify(output);
+ const serialized = output === undefined
+ ? "undefined"
+ : output === null
+ ? "null"
+ : typeof output === "string"
+ ? output
+ : JSON.stringify(output);
  const snapshotData = {
  output: serialized,
  metadata: {
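The new serialization branch exists because `JSON.stringify(undefined)` returns `undefined`, not a string, so the old one-liner put no usable `output` into `snapshotData` for an `undefined` value. If that object is later written with `JSON.stringify` (the write step is not shown in this hunk), the `output` key disappears entirely. `null` was already serialized as `"null"` before the change. The two versions side by side, as standalone functions:

```typescript
// Old one-liner from 2.2.2.
function serializeOld(output: unknown): string | undefined {
  return typeof output === "string" ? output : JSON.stringify(output);
}

// Patched 2.2.3 logic.
function serializeNew(output: unknown): string {
  return output === undefined
    ? "undefined"
    : output === null
      ? "null"
      : typeof output === "string"
        ? output
        : JSON.stringify(output);
}

console.log(typeof serializeOld(undefined)); // undefined (no string produced)
console.log(typeof serializeNew(undefined)); // string ("undefined")
console.log(serializeOld(null), serializeNew(null)); // null null
console.log(JSON.stringify({ output: serializeOld(undefined) })); // {}
```

Two edge cases remain. The value `undefined` and the string `"undefined"` now serialize identically. Values `JSON.stringify` cannot represent, such as functions, still produce `undefined` under both versions.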
@@ -310,6 +317,22 @@ async function compareWithSnapshot(name, currentOutput, dir) {
  const manager = getSnapshotManager(dir);
  return manager.compare(name, currentOutput);
  }
+ /**
+ * Compare two saved snapshots by name (convenience function)
+ *
+ * @example
+ * ```typescript
+ * const comparison = await compareSnapshots('baseline', 'current');
+ * if (!comparison.matches) {
+ * console.log('Snapshots differ!', comparison.differences);
+ * }
+ * ```
+ */
+ async function compareSnapshots(nameA, nameB, dir) {
+ const manager = getSnapshotManager(dir);
+ const snapshotB = await manager.load(nameB);
+ return manager.compare(nameA, snapshotB.output);
+ }
  /**
  * Delete a snapshot (convenience function)
  */
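`compareSnapshots(nameA, nameB)` loads snapshot B and hands its stored `output` to `compare(nameA, ...)` as if it were fresh output. B's `output` is already a serialized string, and this release's serializer passes strings through unchanged. So, assuming `compare` serializes its input the same way (its body is not shown here), the check reduces to A's stored string against B's stored string. A minimal synchronous in-memory model under that assumption follows. `MemorySnapshots` is a hypothetical stand-in for `SnapshotManager`, and plain string equality replaces its real diffing.

```typescript
// Hypothetical result shape; the JSDoc example reads `matches` and `differences`.
interface Comparison {
  matches: boolean;
  differences: string[];
}

// Serializer following the 2.2.3 rules for null, undefined, and strings.
function serialize(output: unknown): string {
  if (output === undefined) return "undefined";
  if (output === null) return "null";
  return typeof output === "string" ? output : JSON.stringify(output);
}

// In-memory stand-in for SnapshotManager: save, load, and compare only.
class MemorySnapshots {
  private store = new Map<string, { output: string }>();

  save(name: string, output: unknown): void {
    this.store.set(name, { output: serialize(output) });
  }

  load(name: string): { output: string } {
    const snap = this.store.get(name);
    if (!snap) throw new Error(`Snapshot '${name}' not found`);
    return snap;
  }

  compare(name: string, currentOutput: unknown): Comparison {
    const expected = this.load(name).output;
    const actual = serialize(currentOutput); // a stored string passes through as-is
    return expected === actual
      ? { matches: true, differences: [] }
      : { matches: false, differences: [`expected ${expected}, got ${actual}`] };
  }
}

// Same shape as the new compareSnapshots: load B, compare A against B's stored output.
function compareSnapshots(manager: MemorySnapshots, nameA: string, nameB: string): Comparison {
  const snapshotB = manager.load(nameB);
  return manager.compare(nameA, snapshotB.output);
}

const snaps = new MemorySnapshots();
snaps.save("baseline", { score: 0.9 });
snaps.save("current", { score: 0.9 });
snaps.save("regressed", { score: 0.7 });
console.log(compareSnapshots(snaps, "baseline", "current").matches); // true
console.log(compareSnapshots(snaps, "baseline", "regressed").matches); // false
```

In this model the argument order matters only for reporting: A is treated as the expected value and B as the actual one.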
package/dist/version.d.ts CHANGED
@@ -3,5 +3,5 @@
  * X-EvalGate-SDK-Version: SDK package version
  * X-EvalGate-Spec-Version: OpenAPI spec version (docs/openapi.json info.version)
  */
- export declare const SDK_VERSION = "2.2.2";
- export declare const SPEC_VERSION = "2.2.2";
+ export declare const SDK_VERSION = "2.2.3";
+ export declare const SPEC_VERSION = "2.2.3";
package/dist/version.js CHANGED
@@ -6,5 +6,5 @@ exports.SPEC_VERSION = exports.SDK_VERSION = void 0;
  * X-EvalGate-SDK-Version: SDK package version
  * X-EvalGate-Spec-Version: OpenAPI spec version (docs/openapi.json info.version)
  */
- exports.SDK_VERSION = "2.2.2";
- exports.SPEC_VERSION = "2.2.2";
+ exports.SDK_VERSION = "2.2.3";
+ exports.SPEC_VERSION = "2.2.3";
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@evalgate/sdk",
- "version": "2.2.2",
+ "version": "2.2.3",
  "publishConfig": {
  "access": "public",
  "registry": "https://registry.npmjs.org/"