@evalgate/sdk 2.2.2 → 2.2.4
- package/CHANGELOG.md +32 -0
- package/README.md +40 -1
- package/dist/assertions.d.ts +194 -10
- package/dist/assertions.js +525 -73
- package/dist/batch.js +4 -4
- package/dist/cache.d.ts +5 -1
- package/dist/cache.js +5 -1
- package/dist/cli/baseline.d.ts +14 -0
- package/dist/cli/baseline.js +43 -3
- package/dist/cli/check.d.ts +5 -2
- package/dist/cli/check.js +20 -12
- package/dist/cli/compare.d.ts +80 -0
- package/dist/cli/compare.js +266 -0
- package/dist/cli/index.js +244 -101
- package/dist/cli/regression-gate.js +23 -0
- package/dist/cli/run.js +22 -0
- package/dist/cli/start.d.ts +26 -0
- package/dist/cli/start.js +130 -0
- package/dist/cli/templates.d.ts +24 -0
- package/dist/cli/templates.js +314 -0
- package/dist/cli/traces.d.ts +109 -0
- package/dist/cli/traces.js +152 -0
- package/dist/cli/upgrade.js +5 -0
- package/dist/cli/validate.d.ts +37 -0
- package/dist/cli/validate.js +252 -0
- package/dist/cli/watch.d.ts +19 -0
- package/dist/cli/watch.js +175 -0
- package/dist/client.js +6 -13
- package/dist/constants.d.ts +2 -0
- package/dist/constants.js +5 -0
- package/dist/errors.js +7 -0
- package/dist/export.js +2 -2
- package/dist/index.d.ts +10 -9
- package/dist/index.js +24 -7
- package/dist/integrations/anthropic.js +6 -6
- package/dist/integrations/openai.js +84 -61
- package/dist/logger.d.ts +3 -1
- package/dist/logger.js +2 -1
- package/dist/otel.d.ts +130 -0
- package/dist/otel.js +309 -0
- package/dist/pagination.d.ts +13 -2
- package/dist/pagination.js +28 -2
- package/dist/runtime/adapters/testsuite-to-dsl.js +1 -6
- package/dist/runtime/eval.d.ts +14 -4
- package/dist/runtime/eval.js +127 -2
- package/dist/runtime/executor.d.ts +3 -2
- package/dist/runtime/executor.js +3 -2
- package/dist/runtime/registry.d.ts +8 -3
- package/dist/runtime/registry.js +15 -4
- package/dist/runtime/run-report.d.ts +1 -1
- package/dist/runtime/run-report.js +7 -4
- package/dist/runtime/types.d.ts +38 -0
- package/dist/snapshot.d.ts +12 -0
- package/dist/snapshot.js +24 -1
- package/dist/testing.d.ts +8 -0
- package/dist/testing.js +45 -10
- package/dist/version.d.ts +2 -2
- package/dist/version.js +2 -2
- package/dist/workflows.d.ts +2 -0
- package/dist/workflows.js +184 -102
- package/package.json +8 -1
package/CHANGELOG.md
CHANGED
@@ -5,6 +5,38 @@ All notable changes to the @evalgate/sdk package will be documented in this file
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [2.2.3] - 2026-03-03
+
+### Breaking
+
+- **`PaginatedIterator` API changed from cursor-based to offset-based** — the constructor signature changed from `(cursor) => { items, nextCursor, hasMore }` to `(offset, limit) => { data, hasMore }`. If you were using `PaginatedIterator` directly with a cursor-based fetcher, update your callback to accept `(offset: number, limit: number)` and return `{ data: T[], hasMore: boolean }`. The `autoPaginate` and `autoPaginateGenerator` helpers also use the new offset-based signature. Cursor encoding/decoding utilities (`encodeCursor`, `decodeCursor`) remain available for server-side cursor generation.
+- **`RequestCache` removed from public exports** — `RequestCache` was an internal HTTP cache with a method-specific API (`set(method, url, data, ttl, params)`) that did not match general-purpose cache expectations. It is no longer exported from the package entry point. If you were importing it directly, use your own cache implementation or rely on the SDK's built-in automatic caching. `CacheTTL` constants remain exported for advanced configuration.
+
+### Fixed
+
+- **`RequestCache.set` missing default TTL** — entries stored without an explicit TTL were immediately stale on next read. Default is now `CacheTTL.MEDIUM`; callers that omit `ttl` get a live cache entry instead of a cache miss every time.
+- **`EvalGateError` subclass prototype chain** — `ValidationError.name` was silently overwritten by the base class constructor, surfacing as `"EvalGateError"` in stack traces and `instanceof` checks. All four subclasses (`ValidationError`, `RateLimitError`, `AuthenticationError`, `NetworkError`) now call `Object.setPrototypeOf(this, Subclass.prototype)` and set `this.name` after `super()`.
+- **`RateLimitError.retryAfter` not a direct property** — the value was only stored inside `details.retryAfter` and not accessible as `err.retryAfter`. It is now assigned directly on the instance when provided.
+- **`autoPaginate` returned `AsyncGenerator` instead of `Promise<T[]>`** — calling `await autoPaginate(fetcher)` was resolving to an unexhausted generator. It now collects all pages and returns a flat `Promise<T[]>`. The original streaming behaviour is available via the new `autoPaginateGenerator` export.
+- **`createEvalRuntime` string-only overload** — passing `{ name, projectRoot }` config objects was ignored (treated as `process.cwd()`). The function now accepts `string | { name?: string; projectRoot?: string }` and extracts `projectRoot` correctly.
+- **`defaultLocalExecutor` was an instance, not a factory** — importing `defaultLocalExecutor` returned a pre-constructed executor rather than a callable factory. It is now re-exported as `createLocalExecutor` so each import site can call it to get a fresh instance.
+- **`SnapshotManager.save` crash on `undefined`/`null` output** — passing `undefined` or `null` to `snapshot(name, output)` threw `TypeError: Cannot convert undefined to string`. Both values are now serialized to the strings `"undefined"` and `"null"` respectively, matching the existing `null`-safe coercion already present for objects.
+- **`compareSnapshots` loaded raw string instead of disk snapshot** — the old `compareWithSnapshot` alias passed its second argument as literal content rather than a snapshot name, producing meaningless diffs. The new `compareSnapshots(nameA, nameB, dir?)` loads both snapshots from disk before diffing.
+- **`AIEvalClient` default `baseUrl`** — the no-arg constructor defaulted to `http://localhost:3000`, causing silent failures in production environments. Default is now `https://api.evalgate.com`.
+- **`importData` unguarded `client.traces` / `client.evaluations` access** — calling `importData(data)` with a partial or undefined client could throw `TypeError: Cannot read properties of undefined`. Both property accesses now use optional chaining (`client?.traces`, `client?.evaluations`).
+- **`toContainCode` required a fenced code block** — raw function definitions, `const` assignments, class declarations, arrow functions, `import`/`export` statements, and `return` expressions now satisfy the assertion without needing triple-backtick fencing.
+- **`hasReadabilityScore` ignored `{min}` object form** — passing `{ min: 40 }` instead of a plain number was coerced to `NaN` threshold, making every call return `true`. The function now unwraps `{ min?, max? }` objects and applies both bounds.
+
+### Added
+
+- **`autoPaginateGenerator`** — new export for streaming pagination as an `AsyncGenerator<T[]>` (one chunk per page). Use when you want to process pages incrementally rather than wait for all pages to load.
+- **`compareSnapshots(nameA, nameB, dir?)`** — loads both named snapshots from disk and returns a `SnapshotComparison`. Replaces the incorrectly aliased `compareWithSnapshot`.
+- **141 new regression tests** across 9 test files covering all fixes above: `RequestCache` TTL defaults, error class prototype chains, `autoPaginate` flat-array return, `createEvalRuntime` config-object overload, `defaultLocalExecutor` callable factory, `SnapshotManager` null/undefined handling, `compareSnapshots` disk-load path, `AIEvalClient` default `baseUrl`, `importData` guards, `toContainCode` raw-code detection, and `hasReadabilityScore` object form.
+- **`upgrade --full` post-upgrade warning** — CLI now prints a reminder to run `npx evalgate baseline update` after a full upgrade to avoid a false regression on the next CI run.
+- **Optional chaining on OpenAI / Anthropic integration `traces.create`** — `evalClient.traces?.create(...)` prevents crashes when the `traces` resource is unavailable on the client (e.g. minimal config or testing without a full API key).
+
+---
+
 ## [2.2.2] - 2026-03-03
 
 ### Fixed
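The breaking pagination change in the 2.2.3 changelog entry can be pictured with a small self-contained sketch. It does not import the SDK: `Page`, `OffsetFetcher`, `pagesOf`, `collectAll`, and the in-memory `ALL_ITEMS` source are hypothetical names used only to illustrate an `(offset, limit) => { data, hasMore }` fetcher, a streaming form that yields one chunk per page, and a collecting form that resolves to a flat array.

```typescript
// Page shape as described in the changelog entry: { data, hasMore }.
interface Page<T> {
  data: T[];
  hasMore: boolean;
}

// Offset-based fetcher signature (hypothetical alias, not an SDK type).
type OffsetFetcher<T> = (offset: number, limit: number) => Promise<Page<T>>;

// Stand-in data source; a real fetcher would call an API instead.
const ALL_ITEMS: number[] = Array.from({ length: 25 }, (_, i) => i + 1);

const fetchPage: OffsetFetcher<number> = async (offset, limit) => {
  const data = ALL_ITEMS.slice(offset, offset + limit);
  return { data, hasMore: offset + data.length < ALL_ITEMS.length };
};

// Streaming form: yields one chunk per page until the source is exhausted.
async function* pagesOf<T>(fetcher: OffsetFetcher<T>, limit = 10): AsyncGenerator<T[]> {
  let offset = 0;
  while (true) {
    const page = await fetcher(offset, limit);
    if (page.data.length > 0) yield page.data;
    if (!page.hasMore || page.data.length === 0) return;
    offset += page.data.length;
  }
}

// Collecting form: drains every page into one flat array.
async function collectAll<T>(fetcher: OffsetFetcher<T>, limit = 10): Promise<T[]> {
  const all: T[] = [];
  for await (const chunk of pagesOf(fetcher, limit)) all.push(...chunk);
  return all;
}
```

A cursor-based callback would need to be rewritten so that the position is derived from the numeric `offset` argument rather than from a token returned by the previous page.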
package/README.md
CHANGED
@@ -157,6 +157,42 @@ Every failure prints a clear next step:
 | `npx evalgate diff --base last --head last` | Compare last two runs |
 | `npx evalgate diff --format github` | GitHub Step Summary with regressions |
 
+### Compare — Side-by-Side Result Diff
+
+**Important:** `evalgate compare` compares **result files**, not models.
+You run each model/config separately (via `evalgate run --write-results`),
+then compare the saved JSON artifacts. Nothing is re-executed.
+
+```bash
+# The primary interface — two result files:
+evalgate compare --base .evalgate/runs/gpt4o-run.json --head .evalgate/runs/claude-run.json
+
+# Optional labels for the output table (cosmetic, not identifiers):
+evalgate compare --base gpt4o.json --head claude.json --labels "GPT-4o" "Claude 3.5"
+
+# N-way compare (3+ files):
+evalgate compare --runs run-a.json run-b.json run-c.json
+
+# Machine-readable:
+evalgate compare --base a.json --head b.json --format json
+```
+
+| Command | Description |
+|---------|-------------|
+| `evalgate compare --base <file> --head <file>` | Compare two run result JSON files |
+| `evalgate compare --runs <f1> <f2> [f3...]` | N-way comparison across multiple runs |
+| `--labels <l1> <l2>` | Optional human-readable labels for output |
+| `--sort-by <key>` | Sort specs by: `name` (default), `score`, `duration` |
+| `--format json` | Machine-readable JSON output |
+
+**Workflow:**
+```
+evalgate run --write-results # saves .evalgate/runs/run-<id>.json
+# change model/config/prompt
+evalgate run --write-results # saves another run file
+evalgate compare --base <first>.json --head <second>.json
+```
+
 ### Legacy Regression Gate (local, no account needed)
 
 | Command | Description |
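The idea behind the README's compare section, diffing two saved result artifacts without re-running anything, can be sketched as below. The README does not show the result-file schema, so `RunResult`, `SpecResult`, and `compareRuns` are hypothetical stand-ins and the `{ specs: [{ name, score }] }` shape is an assumption made purely for illustration.

```typescript
// Hypothetical result shape; the real run-file schema is not shown in the README.
interface SpecResult {
  name: string;
  score: number;
}
interface RunResult {
  specs: SpecResult[];
}

interface SpecDelta {
  name: string;
  base: number | null;
  head: number | null;
  delta: number | null; // null when the spec is missing on either side
}

// Pure comparison over already-loaded results: nothing is executed again.
function compareRuns(base: RunResult, head: RunResult): SpecDelta[] {
  const baseScores = new Map<string, number>();
  base.specs.forEach((s) => baseScores.set(s.name, s.score));
  const headScores = new Map<string, number>();
  head.specs.forEach((s) => headScores.set(s.name, s.score));

  const names = new Set<string>();
  baseScores.forEach((_, name) => names.add(name));
  headScores.forEach((_, name) => names.add(name));

  // Sorted by name, mirroring the documented default sort key.
  return Array.from(names).sort().map((name) => {
    const b = baseScores.get(name) ?? null;
    const h = headScores.get(name) ?? null;
    return { name, base: b, head: h, delta: b !== null && h !== null ? h - b : null };
  });
}
```

In this sketch a spec present in only one run is reported with a `null` delta rather than being dropped, which keeps added and removed specs visible in the output.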
@@ -389,7 +425,8 @@ console.log(hasNoToxicity("Have a great day!")); // true
 console.log(hasValidCodeSyntax("function f() {}", "js")); // true
 
 // Async — LLM-backed, context-aware
-
+const { matches, confidence } = await hasSentimentAsync("subtle irony...", "negative");
+console.log(matches, confidence); // true, 0.85
 console.log(await hasNoToxicityAsync("sarcastic attack text")); // false
 ```
 
@@ -450,6 +487,8 @@ Your local `openAIChatEval` runs continue to work. No account cancellation. No d
 
 See [CHANGELOG.md](CHANGELOG.md) for the full release history.
 
+**v2.2.3** — Bug-fix release. `RequestCache` default TTL, `EvalGateError` subclass prototype chain and `retryAfter` direct property, `autoPaginate` now returns `Promise<T[]>` (new `autoPaginateGenerator` for streaming), `createEvalRuntime` config-object overload, `defaultLocalExecutor` callable factory, `SnapshotManager.save` null/undefined safety, `compareSnapshots` loads both sides from disk, `AIEvalClient` default baseUrl → `https://api.evalgate.com`, `importData` optional-chaining guards, `toContainCode` raw-code detection, `hasReadabilityScore` `{min,max}` object form. 141 new regression tests.
+
 **v2.2.2** — 8 stub assertions replaced with real implementations (`hasSentiment` expanded lexicon, `hasNoToxicity` ~80-term blocklist, `hasValidCodeSyntax` real bracket balance, `containsLanguage` 12 languages + BCP-47, `hasFactualAccuracy`/`hasNoHallucinations` case-insensitive, `hasReadabilityScore` per-word syllable fix, `matchesSchema` JSON Schema support). Added LLM-backed `*Async` variants + `configureAssertions`. Fixed `importData` crash, `compareWithSnapshot` object coercion, `WorkflowTracer` defensive guard. 115 new tests.
 
 **v2.2.1** — `snapshot(name, output)` accepts objects; auto-serialized via `JSON.stringify`
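The v2.2.3 note mentions the error-subclass prototype chain and the direct `retryAfter` property. A minimal sketch of that general pattern follows, using hypothetical class names (`BaseError`, `RateLimited`) rather than the SDK's own classes: each constructor restores the prototype and sets `name` after `super()`, so that `instanceof` and `err.name` report the subclass even when compiled to older targets.

```typescript
// Hypothetical base class standing in for a library error type.
class BaseError extends Error {
  constructor(message: string) {
    super(message);
    // Restore the prototype chain, which can be lost when targeting ES5.
    Object.setPrototypeOf(this, BaseError.prototype);
    this.name = "BaseError";
  }
}

// Hypothetical subclass: sets its own prototype and name after super(),
// and exposes retryAfter directly on the instance when provided.
class RateLimited extends BaseError {
  retryAfter?: number;

  constructor(message: string, retryAfter?: number) {
    super(message);
    Object.setPrototypeOf(this, RateLimited.prototype);
    this.name = "RateLimited";
    if (retryAfter !== undefined) this.retryAfter = retryAfter;
  }
}

const err = new RateLimited("slow down", 30);
console.log(err instanceof RateLimited, err.name, err.retryAfter);
```

Setting `name` after `super()` is the key ordering detail: anything the base constructor assigns runs first, so the subclass must write last to win.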
package/dist/assertions.d.ts
CHANGED
@@ -126,13 +126,22 @@ export declare class Expectation {
 */
 toBeBetween(min: number, max: number, message?: string): AssertionResult;
 /**
-* Assert value contains code block
+* Assert value contains code block or raw code
 * @example expect(output).toContainCode()
+* @example expect(output).toContainCode('typescript')
 */
-toContainCode(message?: string): AssertionResult;
+toContainCode(language?: string, message?: string): AssertionResult;
 /**
-*
-*
+* Blocklist check for 7 common profane words. Does NOT analyze tone,
+* formality, or professional communication quality. For actual tone
+* analysis, use an LLM-backed assertion.
+* @see hasSentimentAsync for LLM-based tone checking
+* @example expect(output).toHaveNoProfanity()
+*/
+toHaveNoProfanity(message?: string): AssertionResult;
+/**
+* @deprecated Use {@link toHaveNoProfanity} instead. This method only
+* checks for 7 profane words — it does not analyze professional tone.
 */
 toBeProfessional(message?: string): AssertionResult;
 /**
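The updated `toContainCode` doc comment says the assertion accepts a code block or raw code. One way such a check could work is sketched below; the patterns and the `looksLikeCode` helper are assumptions for illustration, not the package's actual detection rules, and simple regexes like these will produce false positives on ordinary prose (for example "class action").

```typescript
// Matches a fenced block: three backticks, any content, three backticks.
const FENCE = /`{3}[\s\S]*?`{3}/;

// Illustrative raw-code heuristics (hypothetical, not the SDK's rules).
const RAW_CODE_PATTERNS: RegExp[] = [
  /\bfunction\s+\w+\s*\(/,         // function declarations
  /\b(?:const|let|var)\s+\w+\s*=/, // variable assignments
  /\bclass\s+\w+/,                 // class declarations
  /=>/,                            // arrow functions
  /^\s*(?:import|export)\s/m,      // import/export statements
  /\breturn\s+[^;]+;/,             // return expressions
];

function looksLikeCode(text: string): boolean {
  return FENCE.test(text) || RAW_CODE_PATTERNS.some((p) => p.test(text));
}

console.log(looksLikeCode("function add(a, b) { return a + b; }"));
console.log(looksLikeCode("The weather is nice today."));
```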
@@ -199,7 +208,63 @@ export declare function hasPII(text: string): boolean;
 * {@link hasSentimentAsync} with an LLM provider for context-aware accuracy.
 */
 export declare function hasSentiment(text: string, expected: "positive" | "negative" | "neutral"): boolean;
+/**
+* Lexicon-based sentiment check with confidence score.
+* Returns the detected sentiment, a confidence score (0–1), and whether
+* it matches the expected sentiment.
+*
+* Confidence is derived from the magnitude of the word-count difference
+* relative to the total sentiment-bearing words found.
+*
+* @example
+* ```ts
+* const { sentiment, confidence, matches } = hasSentimentWithScore(
+* "This product is absolutely amazing and wonderful!",
+* "positive",
+* );
+* // sentiment: "positive", confidence: ~0.9, matches: true
+* ```
+*/
+export declare function hasSentimentWithScore(text: string, expected: "positive" | "negative" | "neutral"): {
+sentiment: "positive" | "negative" | "neutral";
+confidence: number;
+matches: boolean;
+};
 export declare function similarTo(text1: string, text2: string, threshold?: number): boolean;
+/**
+* Measure consistency across multiple outputs for the same input.
+* **Fast and approximate** — uses word-overlap (Jaccard) across all pairs.
+* Returns a score from 0 (completely inconsistent) to 1 (identical).
+*
+* @param outputs - Array of LLM outputs to compare (minimum 2)
+* @param threshold - Optional minimum consistency score to return true (default 0.7)
+* @returns `{ score, consistent }` where `consistent` is `score >= threshold`
+*
+* @example
+* ```ts
+* const { score, consistent } = hasConsistency([
+* "The capital of France is Paris.",
+* "Paris is the capital of France.",
+* "France's capital city is Paris.",
+* ]);
+* // score ≈ 0.6-0.8, consistent = true at default threshold
+* ```
+*/
+export declare function hasConsistency(outputs: string[], threshold?: number): {
+score: number;
+consistent: boolean;
+};
+/**
+* LLM-backed consistency check. **Slow and accurate** — asks the LLM to
+* judge whether multiple outputs convey the same meaning, catching
+* paraphrased contradictions that word-overlap misses.
+*
+* @returns A score from 0 to 1 where 1 = perfectly consistent.
+*/
+export declare function hasConsistencyAsync(outputs: string[], config?: AssertionLLMConfig): Promise<{
+score: number;
+consistent: boolean;
+}>;
 export declare function withinRange(value: number, min: number, max: number): boolean;
 export declare function isValidEmail(email: string): boolean;
 export declare function isValidURL(url: string): boolean;
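The `hasConsistency` doc comment describes word-overlap (Jaccard) averaged across all pairs of outputs. A minimal sketch of that technique follows; the tokenizer, the handling of fewer than two outputs, and the helper names (`tokenize`, `jaccard`, `consistencySketch`) are assumptions, so scores from this sketch need not match the package's.

```typescript
// Lowercased word set; the tokenization rule here is an assumption.
function tokenize(text: string): Set<string> {
  return new Set<string>(text.toLowerCase().match(/[a-z0-9']+/g) ?? []);
}

// Jaccard index: |A ∩ B| / |A ∪ B|.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 1;
  let shared = 0;
  a.forEach((word) => {
    if (b.has(word)) shared++;
  });
  return shared / (a.size + b.size - shared);
}

// Mean pairwise Jaccard over every unordered pair of outputs.
function consistencySketch(
  outputs: string[],
  threshold = 0.7,
): { score: number; consistent: boolean } {
  if (outputs.length < 2) throw new RangeError("need at least two outputs");
  const sets = outputs.map(tokenize);
  let total = 0;
  let pairs = 0;
  sets.forEach((a, i) => {
    sets.slice(i + 1).forEach((b) => {
      total += jaccard(a, b);
      pairs++;
    });
  });
  const score = total / pairs;
  return { score, consistent: score >= threshold };
}
```

Because the measure only counts shared words, two outputs that reorder the same words score as identical, while a paraphrase with different vocabulary scores low; that limitation is what the LLM-backed variant is documented to address.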
@@ -209,9 +274,12 @@ export declare function isValidURL(url: string): boolean;
 * facts but cannot detect paraphrased fabrications. Use
 * {@link hasNoHallucinationsAsync} for semantic accuracy.
 */
-export declare function hasNoHallucinations(text: string, groundTruth
+export declare function hasNoHallucinations(text: string, groundTruth?: string[]): boolean;
 export declare function matchesSchema(value: unknown, schema: Record<string, unknown>): boolean;
-export declare function hasReadabilityScore(text: string, minScore: number
+export declare function hasReadabilityScore(text: string, minScore: number | {
+min?: number;
+max?: number;
+}): boolean;
 /**
 * Keyword-frequency language detector supporting 12 languages.
 * **Fast and approximate** — detects the most common languages reliably
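The widened `hasReadabilityScore` signature accepts either a plain number or a `{ min?, max? }` object. The sketch below shows one way to normalize such a union before comparing; `normalizeBounds` and `withinBounds` are hypothetical helpers, and treating both bounds as inclusive is an assumption.

```typescript
type Bounds = { min?: number; max?: number };

// Turn `number | { min?, max? }` into explicit bounds so no object is ever
// coerced to NaN in a numeric comparison.
function normalizeBounds(threshold: number | Bounds): { min: number; max: number } {
  if (typeof threshold === "number") {
    return { min: threshold, max: Infinity };
  }
  return { min: threshold.min ?? -Infinity, max: threshold.max ?? Infinity };
}

// Inclusive check of a precomputed score against the normalized bounds.
function withinBounds(score: number, threshold: number | Bounds): boolean {
  const { min, max } = normalizeBounds(threshold);
  return score >= min && score <= max;
}

console.log(withinBounds(55, 40), withinBounds(30, { min: 40 }));
```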
@@ -225,6 +293,23 @@ export declare function containsLanguage(text: string, language: string): boolea
 * paraphrasing. Use {@link hasFactualAccuracyAsync} for semantic accuracy.
 */
 export declare function hasFactualAccuracy(text: string, facts: string[]): boolean;
+/**
+* Check if a measured duration is within the allowed limit.
+* @param durationMs - The actual elapsed time in milliseconds
+* @param maxMs - Maximum allowed duration in milliseconds
+*/
+export declare function respondedWithinDuration(durationMs: number, maxMs: number): boolean;
+/**
+* Check if elapsed time since a start timestamp is within the allowed limit.
+* @param startTime - Timestamp from Date.now() captured before the operation
+* @param maxMs - Maximum allowed duration in milliseconds
+*/
+export declare function respondedWithinTimeSince(startTime: number, maxMs: number): boolean;
+/**
+* @deprecated Use {@link respondedWithinDuration} (takes measured duration)
+* or {@link respondedWithinTimeSince} (takes start timestamp) instead.
+* This function takes a start timestamp, not a duration — the name is misleading.
+*/
 export declare function respondedWithinTime(startTime: number, maxMs: number): boolean;
 /**
 * Blocklist-based toxicity check (~80 terms across 9 categories).
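The two new timing helpers differ only in what the first argument means: an already measured duration versus a start timestamp. The sketch below makes that distinction concrete with hypothetical local functions (`withinDuration`, `withinTimeSince`). The inclusive `<=` comparison and the injectable `now` parameter are assumptions added for illustration and are not part of the declared signatures.

```typescript
// First argument is an elapsed time in milliseconds.
function withinDuration(durationMs: number, maxMs: number): boolean {
  return durationMs <= maxMs;
}

// First argument is a timestamp captured before the operation.
// `now` is a hypothetical extra parameter so the check can be tested
// deterministically; it defaults to the current time.
function withinTimeSince(startTime: number, maxMs: number, now: number = Date.now()): boolean {
  return withinDuration(now - startTime, maxMs);
}

// Typical usage: capture the start, run the operation, then check.
const start = Date.now();
const answer = [1, 2, 3].reduce((sum, n) => sum + n, 0); // stand-in for real work
console.log(answer, withinTimeSince(start, 1000));
```

Passing a raw duration to the timestamp form (or the reverse) silently produces a wrong answer, which is the confusion the deprecation note on `respondedWithinTime` describes.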
@@ -234,23 +319,72 @@ export declare function respondedWithinTime(startTime: number, maxMs: number): b
 * with an LLM for context-aware moderation.
 */
 export declare function hasNoToxicity(text: string): boolean;
-export declare function followsInstructions(text: string, instructions: string[]): boolean;
+export declare function followsInstructions(text: string, instructions: string | string[]): boolean;
 export declare function containsAllRequiredFields(obj: unknown, requiredFields: string[]): boolean;
 export interface AssertionLLMConfig {
 provider: "openai" | "anthropic";
 apiKey: string;
 model?: string;
+/** Embedding model for toSemanticallyContain (default: text-embedding-3-small). OpenAI only. */
+embeddingModel?: string;
 baseUrl?: string;
+/** Maximum time in ms to wait for an LLM response. Default: 30 000 (30s). */
+timeoutMs?: number;
 }
 export declare function configureAssertions(config: AssertionLLMConfig): void;
 export declare function getAssertionConfig(): AssertionLLMConfig | null;
+/**
+* Result object from {@link hasSentimentAsync}.
+*
+* Implements `Symbol.toPrimitive` so that legacy callers using
+* `if (await hasSentimentAsync(...))` get the correct `matches` boolean
+* instead of an always-truthy object. A one-time deprecation warning is
+* emitted when boolean coercion is detected.
+*
+* **Migration:** Destructure the result instead of using it as a boolean.
+* ```ts
+* // ❌ Deprecated (works but warns):
+* if (await hasSentimentAsync(text, "positive")) { ... }
+*
+* // ✅ New pattern:
+* const { matches } = await hasSentimentAsync(text, "positive");
+* if (matches) { ... }
+* ```
+*/
+export interface SentimentAsyncResult {
+sentiment: "positive" | "negative" | "neutral";
+confidence: number;
+matches: boolean;
+[Symbol.toPrimitive]: (hint: string) => boolean | number | string;
+}
+/** @internal Reset the one-time deprecation flag. For testing only. */
+export declare function resetSentimentDeprecationWarning(): void;
 /**
 * LLM-backed sentiment check. **Slow and accurate** — uses an LLM to
-* classify sentiment with full context awareness.
-* {@link configureAssertions} or an inline `config` argument.
+* classify sentiment with full context awareness and return a confidence score.
+* Requires {@link configureAssertions} or an inline `config` argument.
 * Falls back gracefully with a clear error if no API key is configured.
+*
+* Returns `{ sentiment, confidence, matches }` — the async layer now provides
+* the same rich return shape as {@link hasSentimentWithScore}, but powered by
+* an LLM instead of keyword counting. The `confidence` field is the LLM's
+* self-reported confidence (0–1), not a lexical heuristic.
+*
+* The returned object implements `Symbol.toPrimitive` so that legacy code
+* using `if (await hasSentimentAsync(...))` still works correctly (coerces
+* to `matches`), but a deprecation warning is emitted. Migrate to
+* destructuring: `const { matches } = await hasSentimentAsync(...)`.
+*
+* @example
+* ```ts
+* const { sentiment, confidence, matches } = await hasSentimentAsync(
+* "This product is revolutionary but overpriced",
+* "negative",
+* );
+* // sentiment: "negative", confidence: 0.7, matches: true
+* ```
 */
-export declare function hasSentimentAsync(text: string, expected: "positive" | "negative" | "neutral", config?: AssertionLLMConfig): Promise<
+export declare function hasSentimentAsync(text: string, expected: "positive" | "negative" | "neutral", config?: AssertionLLMConfig): Promise<SentimentAsyncResult>;
 /**
 * LLM-backed toxicity check. **Slow and accurate** — context-aware, handles
 * sarcasm, implicit threats, and culturally specific harmful content that
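The `SentimentAsyncResult` declaration includes a `Symbol.toPrimitive` member. The sketch below, built on a hypothetical `makeResult` helper rather than the SDK, shows when that hook actually runs in standard JavaScript: numeric and string coercions call it, whereas a plain truthiness test on an object never does, because objects are always truthy. That is a reason to prefer the destructuring pattern shown in the migration note regardless of how the package implements its compatibility path.

```typescript
// Hypothetical result shape mirroring the declared interface's key members.
interface MatchResult {
  matches: boolean;
  confidence: number;
  [Symbol.toPrimitive]: (hint: string) => boolean | number | string;
}

function makeResult(matches: boolean, confidence: number): MatchResult {
  return {
    matches,
    confidence,
    [Symbol.toPrimitive](hint: string) {
      if (hint === "string") return `matches=${matches}`;
      return matches; // "number" and "default" hints coerce via the boolean
    },
  };
}

const miss = makeResult(false, 0.9);

const viaNumber = Number(miss);      // hook runs with hint "number": false becomes 0
const viaTemplate = `${miss}`;       // hook runs with hint "string"
const viaTruthiness = Boolean(miss); // hook does not run: any object is truthy
const { matches } = miss;            // destructuring reads the real flag

console.log(viaNumber, viaTemplate, viaTruthiness, matches);
```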
@@ -265,4 +399,54 @@ export declare function hasFactualAccuracyAsync(text: string, facts: string[], c
 * claims even when they are paraphrased or contradict facts indirectly.
 */
 export declare function hasNoHallucinationsAsync(text: string, groundTruth: string[], config?: AssertionLLMConfig): Promise<boolean>;
+/**
+* Embedding-based semantic containment check. Uses OpenAI embeddings and
+* cosine similarity to determine whether the text semantically contains
+* the given concept — no LLM prompt, no "does this text contain X" trick.
+*
+* This is **real semantic containment**: embed both strings, compute cosine
+* similarity, and compare against a threshold. "The city of lights" will
+* have high similarity to "Paris" because their embeddings are close in
+* vector space.
+*
+* Requires `provider: "openai"` in the config. For Anthropic or other
+* providers without an embedding API, use {@link toSemanticallyContainLLM}.
+*
+* @param text - The text to check
+* @param phrase - The semantic concept to look for
+* @param config - LLM config (must be OpenAI with embedding support)
+* @param threshold - Cosine similarity threshold (default: 0.4). Lower values
+* are more permissive. Typical ranges: 0.3–0.5 for concept containment,
+* 0.6–0.8 for paraphrase detection, 0.9+ for near-duplicates.
+* @returns `{ contains, similarity }` — whether the threshold was met and the raw score
+*
+* @example
+* ```ts
+* const { contains, similarity } = await toSemanticallyContain(
+* "The city of lights is beautiful in spring",
+* "Paris",
+* { provider: "openai", apiKey: process.env.OPENAI_API_KEY },
+* );
+* // contains: true, similarity: ~0.52
+* ```
+*/
+export declare function toSemanticallyContain(text: string, phrase: string, config?: AssertionLLMConfig, threshold?: number): Promise<{
+contains: boolean;
+similarity: number;
+}>;
+/**
+* LLM-prompt-based semantic containment check. Uses an LLM prompt to ask
+* whether the text conveys a concept. This is a **fallback** for providers
+* that don't offer an embedding API (e.g., Anthropic).
+*
+* Note: This is functionally similar to `followsInstructions` — the LLM is
+* being asked to judge containment, not compute vector similarity. For
+* real embedding-based semantic containment, use {@link toSemanticallyContain}.
+*
+* @param text - The text to check
+* @param phrase - The semantic concept to look for
+* @param config - Optional LLM config
+* @returns true if the LLM judges the text contains the concept
+*/
+export declare function toSemanticallyContainLLM(text: string, phrase: string, config?: AssertionLLMConfig): Promise<boolean>;
 export declare function hasValidCodeSyntax(code: string, language: string): boolean;
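The `toSemanticallyContain` doc comment describes embedding both strings, computing cosine similarity, and comparing against a threshold. The comparison step can be sketched without any network call by feeding in precomputed vectors. The toy vectors and the helper names (`cosineSimilarity`, `containsByThreshold`) below are hypothetical, and the inclusive `>=` comparison is an assumption; real embeddings would come from an embedding API and have far more dimensions.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new RangeError("vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  if (normA === 0 || normB === 0) return 0; // zero vector: treat as unrelated
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Threshold step, using the documented default of 0.4.
function containsByThreshold(
  textVec: number[],
  phraseVec: number[],
  threshold = 0.4,
): { contains: boolean; similarity: number } {
  const similarity = cosineSimilarity(textVec, phraseVec);
  return { contains: similarity >= threshold, similarity };
}

// Toy 2-dimensional vectors standing in for real embeddings.
console.log(containsByThreshold([1, 0], [1, 1]));
console.log(containsByThreshold([1, 0], [0, 1]));
```

Cosine similarity depends only on the angle between the vectors, not their lengths, which is why a short phrase and a long sentence can still be compared on the same scale.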