@pauly4010/evalai-sdk 1.3.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +289 -0
- package/LICENSE +21 -0
- package/README.md +565 -0
- package/dist/assertions.d.ts +189 -0
- package/dist/assertions.js +596 -0
- package/dist/batch.d.ts +68 -0
- package/dist/batch.js +178 -0
- package/dist/cache.d.ts +65 -0
- package/dist/cache.js +135 -0
- package/dist/cli/index.d.ts +6 -0
- package/dist/cli/index.js +181 -0
- package/dist/client.d.ts +358 -0
- package/dist/client.js +802 -0
- package/dist/context.d.ts +134 -0
- package/dist/context.js +215 -0
- package/dist/errors.d.ts +80 -0
- package/dist/errors.js +285 -0
- package/dist/export.d.ts +195 -0
- package/dist/export.js +334 -0
- package/dist/index.d.ts +35 -0
- package/dist/index.js +111 -0
- package/dist/integrations/anthropic.d.ts +72 -0
- package/dist/integrations/anthropic.js +159 -0
- package/dist/integrations/openai.d.ts +69 -0
- package/dist/integrations/openai.js +156 -0
- package/dist/local.d.ts +39 -0
- package/dist/local.js +146 -0
- package/dist/logger.d.ts +128 -0
- package/dist/logger.js +227 -0
- package/dist/pagination.d.ts +74 -0
- package/dist/pagination.js +135 -0
- package/dist/snapshot.d.ts +176 -0
- package/dist/snapshot.js +322 -0
- package/dist/streaming.d.ts +173 -0
- package/dist/streaming.js +268 -0
- package/dist/testing.d.ts +204 -0
- package/dist/testing.js +252 -0
- package/dist/types.d.ts +715 -0
- package/dist/types.js +54 -0
- package/dist/workflows.d.ts +378 -0
- package/dist/workflows.js +628 -0
- package/package.json +102 -0
package/README.md
ADDED
@@ -0,0 +1,565 @@

# @pauly4010/evalai-sdk

Official TypeScript/JavaScript SDK for the AI Evaluation Platform. Build confidence in your AI systems with comprehensive evaluation tools.

## Installation

```bash
npm install @pauly4010/evalai-sdk
# or
yarn add @pauly4010/evalai-sdk
# or
pnpm add @pauly4010/evalai-sdk
```

## Environment Support

This SDK works in both **Node.js** and **browsers**, with some features having specific requirements:

### ✅ Works Everywhere (Node.js + Browser)

- Traces API
- Evaluations API
- LLM Judge API
- Annotations API
- Developer API (API Keys, Webhooks, Usage)
- Organizations API
- Assertions Library
- Test Suites
- Error Handling

### 🟡 Node.js Only Features

The following features require Node.js and **will not work in browsers**:

- **Snapshot Testing** - Uses filesystem for storage
- **Local Storage Mode** - Uses filesystem for offline development
- **CLI Tool** - Command-line interface
- **Export to File** - Direct file system writes

### 🔄 Context Propagation

- **Node.js**: Full async context propagation using `AsyncLocalStorage`
- **Browser**: Basic context support (not safe across all async boundaries)

Use appropriate features based on your environment. The SDK will throw helpful errors if you try to use Node.js-only features in a browser.

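If you ship one bundle to both environments, you can gate Node.js-only features on a runtime check instead of catching the SDK's error. The helper below is illustrative application code, not an SDK export; it relies only on the fact that `process.versions.node` is defined under Node.js:

```typescript
// Detect the runtime: `process.versions.node` only exists under Node.js,
// so this check safely distinguishes Node.js from browsers.
const isNode =
  typeof (globalThis as any).process !== "undefined" &&
  (globalThis as any).process.versions?.node != null;

// Gate a Node.js-only feature (e.g. snapshot testing, which writes to the
// filesystem) behind the runtime check instead of letting the SDK throw.
function snapshotsAvailable(): boolean {
  if (!isNode) {
    console.warn("Snapshot testing requires Node.js; skipping.");
    return false;
  }
  return true;
}
```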
## Quick Start

```typescript
import { AIEvalClient } from "@pauly4010/evalai-sdk";

// Initialize with environment variables
const client = AIEvalClient.init();

// Or with explicit config
const client = new AIEvalClient({
  apiKey: "your-api-key",
  organizationId: 123,
  debug: true,
});
```

## Features

### 🎯 Evaluation Templates (v1.1.0)

The SDK now includes comprehensive evaluation template types for different testing scenarios:

```typescript
import { EvaluationTemplates } from "@pauly4010/evalai-sdk";

// Create evaluations with predefined templates
await client.evaluations.create({
  name: "Prompt Optimization Test",
  type: EvaluationTemplates.PROMPT_OPTIMIZATION,
  createdBy: userId,
});

// Available templates:
// Core Testing
EvaluationTemplates.UNIT_TESTING;
EvaluationTemplates.OUTPUT_QUALITY;

// Advanced Evaluation
EvaluationTemplates.PROMPT_OPTIMIZATION;
EvaluationTemplates.CHAIN_OF_THOUGHT;
EvaluationTemplates.LONG_CONTEXT_TESTING;
EvaluationTemplates.MODEL_STEERING;
EvaluationTemplates.REGRESSION_TESTING;
EvaluationTemplates.CONFIDENCE_CALIBRATION;

// Safety & Compliance
EvaluationTemplates.SAFETY_COMPLIANCE;

// Domain-Specific
EvaluationTemplates.RAG_EVALUATION;
EvaluationTemplates.CODE_GENERATION;
EvaluationTemplates.SUMMARIZATION;
```

### 📊 Organization Resource Limits (v1.1.0)

Track your organization's resource usage and limits:

```typescript
// Get current usage and limits
const limits = await client.getOrganizationLimits();

console.log("Traces:", {
  usage: limits.traces_per_organization?.usage,
  balance: limits.traces_per_organization?.balance,
  total: limits.traces_per_organization?.included_usage,
});

console.log("Evaluations:", {
  usage: limits.evals_per_organization?.usage,
  balance: limits.evals_per_organization?.balance,
  total: limits.evals_per_organization?.included_usage,
});

console.log("Annotations:", {
  usage: limits.annotations_per_organization?.usage,
  balance: limits.annotations_per_organization?.balance,
  total: limits.annotations_per_organization?.included_usage,
});
```

### 🔍 Traces

```typescript
// Create a trace
const trace = await client.traces.create({
  name: "User Query",
  traceId: "trace-123",
  metadata: { userId: "456" },
});

// List traces
const traces = await client.traces.list({
  limit: 10,
  status: "success",
});

// Create spans
const span = await client.traces.createSpan(trace.id, {
  name: "LLM Call",
  spanId: "span-456",
  startTime: new Date().toISOString(),
  metadata: { model: "gpt-4" },
});
```

### 📝 Evaluations

```typescript
import { EvaluationTemplates } from "@pauly4010/evalai-sdk";

// Create evaluation
const evaluation = await client.evaluations.create({
  name: "Chatbot Responses",
  type: EvaluationTemplates.OUTPUT_QUALITY,
  description: "Test chatbot response quality",
  createdBy: userId,
});

// Add test cases
await client.evaluations.createTestCase(evaluation.id, {
  input: "What is the capital of France?",
  expectedOutput: "Paris",
});

// Run evaluation
const run = await client.evaluations.createRun(evaluation.id, {
  status: "running",
});
```

### ⚖️ LLM Judge

```typescript
// Evaluate with LLM judge
const result = await client.llmJudge.evaluate({
  configId: 1,
  input: "Translate: Hello world",
  output: "Bonjour le monde",
  metadata: { language: "French" },
});

console.log("Score:", result.result.score);
console.log("Reasoning:", result.result.reasoning);
```

## Configuration

### Environment Variables

```bash
# Required
EVALAI_API_KEY=your-api-key

# Optional
EVALAI_ORGANIZATION_ID=123
EVALAI_BASE_URL=https://api.example.com
```

### Client Options

```typescript
const client = new AIEvalClient({
  apiKey: "your-api-key",
  organizationId: 123,
  baseUrl: "https://api.example.com",
  timeout: 30000,
  debug: true,
  logLevel: "debug",
  retry: {
    maxAttempts: 3,
    backoff: "exponential",
    retryableErrors: ["RATE_LIMIT_EXCEEDED", "TIMEOUT"],
  },
});
```

## Error Handling

```typescript
import { EvalAIError, RateLimitError } from "@pauly4010/evalai-sdk";

try {
  await client.traces.create({...});
} catch (error) {
  if (error instanceof RateLimitError) {
    console.log("Rate limited, retry after:", error.retryAfter);
  } else if (error instanceof EvalAIError) {
    console.log("Error:", error.code, error.message);
  }
}
```

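Beyond logging, `retryAfter` can drive a simple backoff. The sketch below is application code, not an SDK API; the `RateLimitError` class is a local stand-in mirroring the SDK's error shape so the example is self-contained (real code would import it from `@pauly4010/evalai-sdk`), and it assumes `retryAfter` is expressed in seconds:

```typescript
// Local stand-in for the SDK's RateLimitError, so this sketch runs on its
// own; in real code, import the class from "@pauly4010/evalai-sdk" instead.
class RateLimitError extends Error {
  constructor(public retryAfter?: number) {
    super("rate limited");
  }
}

// Run `fn`; if it fails with a rate-limit error, wait the server-suggested
// delay (assumed to be in seconds) and retry exactly once.
async function withRateLimitRetry<T>(fn: () => Promise<T>): Promise<T> {
  try {
    return await fn();
  } catch (error) {
    if (error instanceof RateLimitError) {
      const seconds = error.retryAfter ?? 1;
      await new Promise((resolve) => setTimeout(resolve, seconds * 1000));
      return fn();
    }
    throw error;
  }
}
```

Wrap any SDK call, e.g. `await withRateLimitRetry(() => client.traces.list({ limit: 10 }))`.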
## Advanced Features

### Context Propagation

```typescript
import { withContext } from "@pauly4010/evalai-sdk";

withContext({ userId: "123", sessionId: "abc" }, async () => {
  // Context automatically included in all traces
  await client.traces.create({
    name: "Query",
    traceId: "trace-1",
  });
});
```

### Test Suites

```typescript
import { createTestSuite } from "@pauly4010/evalai-sdk";

const suite = createTestSuite({
  name: "Chatbot Tests",
  tests: [
    {
      name: "Greeting",
      input: "Hello",
      expectedOutput: "Hi there!",
    },
  ],
});

await suite.run(client);
```

### Framework Integrations

```typescript
import { traceOpenAI } from "@pauly4010/evalai-sdk/integrations/openai";
import OpenAI from "openai";

const openai = traceOpenAI(new OpenAI(), client);

// All OpenAI calls are automatically traced
const response = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [{ role: "user", content: "Hello" }],
});
```

## TypeScript Support

The SDK is fully typed with TypeScript generics for type-safe metadata:

```typescript
interface CustomMetadata {
  userId: string;
  sessionId: string;
  model: string;
}

const trace = await client.traces.create<CustomMetadata>({
  name: "Query",
  traceId: "trace-1",
  metadata: {
    userId: "123",
    sessionId: "abc",
    model: "gpt-4",
  },
});

// TypeScript knows the exact metadata type
console.log(trace.metadata.userId);
```

## 📋 Annotations API (v1.2.0)

Human-in-the-loop evaluation for quality assurance:

```typescript
// Create an annotation
const annotation = await client.annotations.create({
  evaluationRunId: 123,
  testCaseId: 456,
  rating: 5,
  feedback: "Excellent response!",
  labels: { category: "helpful", sentiment: "positive" },
});

// List annotations
const annotations = await client.annotations.list({
  evaluationRunId: 123,
});

// Annotation Tasks
const task = await client.annotations.tasks.create({
  name: "Q4 Quality Review",
  type: "classification",
  organizationId: 1,
  instructions: "Rate responses from 1-5",
});

const tasks = await client.annotations.tasks.list({
  organizationId: 1,
  status: "pending",
});

const taskDetail = await client.annotations.tasks.get(taskId);

// Annotation Items
const item = await client.annotations.tasks.items.create(taskId, {
  content: "Response to evaluate",
  annotation: { rating: 4, category: "good" },
});

const items = await client.annotations.tasks.items.list(taskId);
```

## 🔑 Developer API (v1.2.0)

Manage API keys, webhooks, and monitor usage:

### API Keys

```typescript
// Create an API key
const { apiKey, id, keyPrefix } = await client.developer.apiKeys.create({
  name: "Production Key",
  organizationId: 1,
  scopes: ["traces:read", "traces:write", "evaluations:read"],
  expiresAt: "2025-12-31T23:59:59Z",
});

// IMPORTANT: Save the apiKey securely - it's only shown once!

// List API keys
const keys = await client.developer.apiKeys.list({
  organizationId: 1,
});

// Update an API key
await client.developer.apiKeys.update(keyId, {
  name: "Updated Name",
  scopes: ["traces:read"],
});

// Revoke an API key
await client.developer.apiKeys.revoke(keyId);

// Get usage statistics for a key
const usage = await client.developer.apiKeys.getUsage(keyId);
console.log("Total requests:", usage.totalRequests);
console.log("By endpoint:", usage.usageByEndpoint);
```

### Webhooks

```typescript
// Create a webhook
const webhook = await client.developer.webhooks.create({
  organizationId: 1,
  url: "https://your-app.com/webhooks/evalai",
  events: ["trace.created", "evaluation.completed", "annotation.created"],
});

// List webhooks
const webhooks = await client.developer.webhooks.list({
  organizationId: 1,
  status: "active",
});

// Get a specific webhook
const webhookDetail = await client.developer.webhooks.get(webhookId);

// Update a webhook
await client.developer.webhooks.update(webhookId, {
  url: "https://new-url.com/webhooks",
  events: ["trace.created"],
  status: "inactive",
});

// Delete a webhook
await client.developer.webhooks.delete(webhookId);

// Get webhook deliveries (for debugging)
const deliveries = await client.developer.webhooks.getDeliveries(webhookId, {
  limit: 50,
  success: false, // Only failed deliveries
});
```

### Usage Analytics

```typescript
// Get detailed usage statistics
const stats = await client.developer.getUsage({
  organizationId: 1,
  startDate: "2025-01-01",
  endDate: "2025-01-31",
});

console.log("Traces:", stats.traces.total);
console.log("Evaluations by type:", stats.evaluations.byType);
console.log("API calls by endpoint:", stats.apiCalls.byEndpoint);

// Get usage summary
const summary = await client.developer.getUsageSummary(organizationId);
console.log("Current period:", summary.currentPeriod);
console.log("Limits:", summary.limits);
```

## ⚖️ LLM Judge Extended (v1.2.0)

Enhanced LLM judge configuration and analysis:

```typescript
// Create a judge configuration
const config = await client.llmJudge.createConfig({
  name: "GPT-4 Accuracy Judge",
  description: "Evaluates factual accuracy",
  model: "gpt-4",
  rubric: "Score 1-10 based on factual accuracy...",
  temperature: 0.3,
  maxTokens: 500,
  organizationId: 1,
  createdBy: userId,
});

// List configurations
const configs = await client.llmJudge.listConfigs({
  organizationId: 1,
});

// List results
const results = await client.llmJudge.listResults({
  configId: config.id,
  evaluationId: 123,
});

// Get alignment analysis
const alignment = await client.llmJudge.getAlignment({
  configId: config.id,
  startDate: "2025-01-01",
  endDate: "2025-01-31",
});

console.log("Average score:", alignment.averageScore);
console.log("Accuracy:", alignment.alignmentMetrics.accuracy);
console.log("Agreement with human:", alignment.comparisonWithHuman?.agreement);
```

## 🏢 Organizations API (v1.2.0)

Manage organization details:

```typescript
// Get current organization
const org = await client.organizations.getCurrent();
console.log("Organization:", org.name);
console.log("Plan:", org.plan);
console.log("Status:", org.status);
```

## Changelog

### v1.2.1 (Latest - Bug Fixes)

- 🐛 **Critical Fixes**
  - Fixed CLI import paths for proper npm package distribution
  - Fixed duplicate trace creation in OpenAI/Anthropic integrations
  - Fixed Commander.js command structure
  - Added browser/Node.js environment detection and helpful errors
  - Fixed context system to work in both Node.js and browsers
  - Added security checks to snapshot path sanitization
  - Removed misleading empty exports (StreamingClient, BatchClient)
- 📦 **Dependencies**
  - Updated Commander to v14
  - Added peer dependencies for OpenAI and Anthropic SDKs (optional)
  - Added Node.js engine requirement (>=16.0.0)
- 📚 **Documentation**
  - Clarified Node.js-only vs universal features
  - Added environment support section
  - Updated examples with security best practices

### v1.2.0

- 🎉 **100% API Coverage** - All backend endpoints now supported!
- 📋 **Annotations API** - Complete human-in-the-loop evaluation
  - Create and list annotations
  - Manage annotation tasks
  - Handle annotation items
- 🔑 **Developer API** - Full API key and webhook management
  - CRUD operations for API keys
  - Webhook management with delivery tracking
  - Usage analytics and monitoring
- ⚖️ **LLM Judge Extended** - Enhanced judge capabilities
  - Configuration management
  - Results querying
  - Alignment analysis
- 🏢 **Organizations API** - Organization details access
- 📊 **Enhanced Types** - 40+ new TypeScript interfaces
- 📚 **Comprehensive Documentation** - Examples for all new features

### v1.1.0

- ✨ Added comprehensive evaluation template types
- ✨ Added organization resource limits tracking
- ✨ Added `getOrganizationLimits()` method
- 📚 Enhanced documentation with new features

### v1.0.0

- 🎉 Initial release
- ✅ Traces, Evaluations, LLM Judge APIs
- ✅ Framework integrations (OpenAI, Anthropic)
- ✅ Test suite builder
- ✅ Context propagation
- ✅ Error handling & retries

## License

MIT

## Support

- Documentation: https://docs.evalai.com
- Issues: https://github.com/evalai/sdk/issues
- Discord: https://discord.gg/evalai
|