@pauly4010/evalai-sdk 1.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md ADDED
@@ -0,0 +1,565 @@
# @pauly4010/evalai-sdk

Official TypeScript/JavaScript SDK for the AI Evaluation Platform. Build confidence in your AI systems with comprehensive evaluation tools.

## Installation

```bash
npm install @pauly4010/evalai-sdk
# or
yarn add @pauly4010/evalai-sdk
# or
pnpm add @pauly4010/evalai-sdk
```

## Environment Support

This SDK works in both **Node.js** and **browsers**, though some features have environment-specific requirements:

### ✅ Works Everywhere (Node.js + Browser)

- Traces API
- Evaluations API
- LLM Judge API
- Annotations API
- Developer API (API Keys, Webhooks, Usage)
- Organizations API
- Assertions Library
- Test Suites
- Error Handling

### 🟡 Node.js Only Features

The following features require Node.js and **will not work in browsers**:

- **Snapshot Testing** - Uses filesystem for storage
- **Local Storage Mode** - Uses filesystem for offline development
- **CLI Tool** - Command-line interface
- **Export to File** - Direct file system writes

### 🔄 Context Propagation

- **Node.js**: Full async context propagation using `AsyncLocalStorage`
- **Browser**: Basic context support (not safe across all async boundaries)

Choose features appropriate to your environment. The SDK throws a descriptive error if you try to use a Node.js-only feature in a browser.

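The runtime check behind those errors can be reproduced in isolation. A minimal sketch of this kind of guard (the `isNode` detection and `assertNodeOnly` helper are illustrative, not the SDK's actual internals):

```typescript
// Detect the runtime before touching Node-only APIs such as the filesystem.
// Using globalThis keeps the check safe even where `process` is undeclared.
const isNode =
  typeof (globalThis as any).process?.versions?.node === "string";

// Hypothetical guard: throw a descriptive error when a Node-only feature
// (e.g. snapshot testing) is invoked from a browser bundle.
function assertNodeOnly(feature: string): void {
  if (!isNode) {
    throw new Error(
      `${feature} requires Node.js: it relies on filesystem access that browsers do not provide.`
    );
  }
}

assertNodeOnly("Snapshot testing"); // passes under Node.js
console.log("running under Node:", isNode);
```

In a browser bundle the same call would throw with a message naming the feature, which is easier to debug than a raw `fs is not defined` error.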
## Quick Start

```typescript
import { AIEvalClient } from "@pauly4010/evalai-sdk";

// Initialize with environment variables
const client = AIEvalClient.init();

// Or construct a client with explicit config
const explicitClient = new AIEvalClient({
  apiKey: "your-api-key",
  organizationId: 123,
  debug: true,
});
```

## Features

### 🎯 Evaluation Templates (v1.1.0)

The SDK now includes comprehensive evaluation template types for different testing scenarios:

```typescript
import { EvaluationTemplates } from "@pauly4010/evalai-sdk";

// Create evaluations with predefined templates
await client.evaluations.create({
  name: "Prompt Optimization Test",
  type: EvaluationTemplates.PROMPT_OPTIMIZATION,
  createdBy: userId,
});

// Available templates:
// Core Testing
EvaluationTemplates.UNIT_TESTING;
EvaluationTemplates.OUTPUT_QUALITY;

// Advanced Evaluation
EvaluationTemplates.PROMPT_OPTIMIZATION;
EvaluationTemplates.CHAIN_OF_THOUGHT;
EvaluationTemplates.LONG_CONTEXT_TESTING;
EvaluationTemplates.MODEL_STEERING;
EvaluationTemplates.REGRESSION_TESTING;
EvaluationTemplates.CONFIDENCE_CALIBRATION;

// Safety & Compliance
EvaluationTemplates.SAFETY_COMPLIANCE;

// Domain-Specific
EvaluationTemplates.RAG_EVALUATION;
EvaluationTemplates.CODE_GENERATION;
EvaluationTemplates.SUMMARIZATION;
```

### 📊 Organization Resource Limits (v1.1.0)

Track your organization's resource usage and limits:

```typescript
// Get current usage and limits
const limits = await client.getOrganizationLimits();

console.log("Traces:", {
  usage: limits.traces_per_organization?.usage,
  balance: limits.traces_per_organization?.balance,
  total: limits.traces_per_organization?.included_usage,
});

console.log("Evaluations:", {
  usage: limits.evals_per_organization?.usage,
  balance: limits.evals_per_organization?.balance,
  total: limits.evals_per_organization?.included_usage,
});

console.log("Annotations:", {
  usage: limits.annotations_per_organization?.usage,
  balance: limits.annotations_per_organization?.balance,
  total: limits.annotations_per_organization?.included_usage,
});
```

### 🔍 Traces

```typescript
// Create a trace
const trace = await client.traces.create({
  name: "User Query",
  traceId: "trace-123",
  metadata: { userId: "456" },
});

// List traces
const traces = await client.traces.list({
  limit: 10,
  status: "success",
});

// Create spans
const span = await client.traces.createSpan(trace.id, {
  name: "LLM Call",
  spanId: "span-456",
  startTime: new Date().toISOString(),
  metadata: { model: "gpt-4" },
});
```

### 📝 Evaluations

```typescript
// Create evaluation
const evaluation = await client.evaluations.create({
  name: "Chatbot Responses",
  type: EvaluationTemplates.OUTPUT_QUALITY,
  description: "Test chatbot response quality",
  createdBy: userId,
});

// Add test cases
await client.evaluations.createTestCase(evaluation.id, {
  input: "What is the capital of France?",
  expectedOutput: "Paris",
});

// Run evaluation
const run = await client.evaluations.createRun(evaluation.id, {
  status: "running",
});
```

### ⚖️ LLM Judge

```typescript
// Evaluate with LLM judge
const result = await client.llmJudge.evaluate({
  configId: 1,
  input: "Translate: Hello world",
  output: "Bonjour le monde",
  metadata: { language: "French" },
});

console.log("Score:", result.result.score);
console.log("Reasoning:", result.result.reasoning);
```

## Configuration

### Environment Variables

```bash
# Required
EVALAI_API_KEY=your-api-key

# Optional
EVALAI_ORGANIZATION_ID=123
EVALAI_BASE_URL=https://api.example.com
```

### Client Options

```typescript
const client = new AIEvalClient({
  apiKey: "your-api-key",
  organizationId: 123,
  baseUrl: "https://api.example.com",
  timeout: 30000,
  debug: true,
  logLevel: "debug",
  retry: {
    maxAttempts: 3,
    backoff: "exponential",
    retryableErrors: ["RATE_LIMIT_EXCEEDED", "TIMEOUT"],
  },
});
```
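For `backoff: "exponential"`, the delay between attempts typically doubles each time up to a cap. A self-contained sketch of such a schedule (the `backoffDelayMs` helper and its default base/cap values are illustrative assumptions, not the SDK's actual defaults):

```typescript
// Hypothetical delay schedule for retry.backoff: "exponential".
// attempt is 1-based; baseDelayMs and maxDelayMs are assumed defaults.
function backoffDelayMs(
  attempt: number,
  baseDelayMs = 500,
  maxDelayMs = 30000
): number {
  const delay = baseDelayMs * Math.pow(2, attempt - 1);
  return Math.min(delay, maxDelayMs);
}

// With maxAttempts: 3, the waits before each retry would be:
const delays = [1, 2, 3].map((a) => backoffDelayMs(a));
console.log(delays); // [ 500, 1000, 2000 ]
```

Capping the delay keeps long retry chains from waiting unboundedly; production implementations usually also add random jitter to avoid thundering-herd retries.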

## Error Handling

```typescript
import { EvalAIError, RateLimitError } from "@pauly4010/evalai-sdk";

try {
  await client.traces.create({...});
} catch (error) {
  if (error instanceof RateLimitError) {
    console.log("Rate limited, retry after:", error.retryAfter);
  } else if (error instanceof EvalAIError) {
    console.log("Error:", error.code, error.message);
  }
}
```
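The `instanceof` checks above rely on the error classes forming a hierarchy, with the more specific class tested first. A standalone sketch of how such a hierarchy behaves (these class shapes are assumptions mirroring the README, not the SDK's actual definitions):

```typescript
// Assumed shapes mirroring the README's error types.
class EvalAIError extends Error {
  constructor(public code: string, message: string) {
    super(message);
    this.name = "EvalAIError";
  }
}

class RateLimitError extends EvalAIError {
  constructor(public retryAfter: number) {
    super("RATE_LIMIT_EXCEEDED", "Rate limit exceeded");
    this.name = "RateLimitError";
  }
}

// Check subclasses before the base class, or the base branch wins.
function describeError(error: unknown): string {
  if (error instanceof RateLimitError) {
    return `rate limited, retry after ${error.retryAfter}s`;
  }
  if (error instanceof EvalAIError) {
    return `${error.code}: ${error.message}`;
  }
  return "unknown error";
}

console.log(describeError(new RateLimitError(30))); // rate limited, retry after 30s
console.log(describeError(new EvalAIError("TIMEOUT", "Request timed out"))); // TIMEOUT: Request timed out
```

Because `RateLimitError extends EvalAIError`, a rate-limit error would also match the base-class branch, which is why the ordering of the checks matters.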

## Advanced Features

### Context Propagation

```typescript
import { withContext } from "@pauly4010/evalai-sdk";

withContext({ userId: "123", sessionId: "abc" }, async () => {
  // Context automatically included in all traces
  await client.traces.create({
    name: "Query",
    traceId: "trace-1",
  });
});
```
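Under Node.js, this `withContext` behaviour can be sketched with the built-in `AsyncLocalStorage` mentioned earlier. This is an illustration of the mechanism, not the SDK's source:

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

type Ctx = Record<string, string>;
const storage = new AsyncLocalStorage<Ctx>();

// Run fn with ctx visible to everything in its async call tree.
function withContext<T>(ctx: Ctx, fn: () => Promise<T>): Promise<T> {
  return storage.run(ctx, fn);
}

function currentContext(): Ctx | undefined {
  return storage.getStore();
}

async function main() {
  await withContext({ userId: "123", sessionId: "abc" }, async () => {
    await new Promise((r) => setTimeout(r, 10)); // cross an async boundary
    console.log(currentContext()); // { userId: '123', sessionId: 'abc' }
  });
  console.log(currentContext()); // undefined outside the scope
}

main();
```

The context survives the `await` because `AsyncLocalStorage` follows the async call tree, which is exactly what the browser fallback cannot guarantee.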

### Test Suites

```typescript
import { createTestSuite } from "@pauly4010/evalai-sdk";

const suite = createTestSuite({
  name: "Chatbot Tests",
  tests: [
    {
      name: "Greeting",
      input: "Hello",
      expectedOutput: "Hi there!",
    },
  ],
});

await suite.run(client);
```

### Framework Integrations

```typescript
import { traceOpenAI } from "@pauly4010/evalai-sdk/integrations/openai";
import OpenAI from "openai";

const openai = traceOpenAI(new OpenAI(), client);

// All OpenAI calls are automatically traced
const response = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [{ role: "user", content: "Hello" }],
});
```

## TypeScript Support

The SDK is fully typed with TypeScript generics for type-safe metadata:

```typescript
interface CustomMetadata {
  userId: string;
  sessionId: string;
  model: string;
}

const trace = await client.traces.create<CustomMetadata>({
  name: "Query",
  traceId: "trace-1",
  metadata: {
    userId: "123",
    sessionId: "abc",
    model: "gpt-4",
  },
});

// TypeScript knows the exact metadata type
console.log(trace.metadata.userId);
```
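The generic pattern above can be reproduced in isolation to see how the metadata type flows through. A toy stand-in for `client.traces.create<M>` (not the SDK's implementation):

```typescript
interface Trace<M> {
  name: string;
  traceId: string;
  metadata: M;
}

// Toy stand-in: the metadata type parameter flows from input to result,
// so the caller gets back a Trace whose metadata shape is fully known.
function createTrace<M>(input: Trace<M>): Trace<M> {
  return { ...input };
}

interface CustomMetadata {
  userId: string;
  sessionId: string;
  model: string;
}

const typedTrace = createTrace<CustomMetadata>({
  name: "Query",
  traceId: "trace-1",
  metadata: { userId: "123", sessionId: "abc", model: "gpt-4" },
});

// The compiler knows metadata's exact shape; a typo like
// typedTrace.metadata.userID would be a compile-time error.
console.log(typedTrace.metadata.userId); // 123
```

Without the generic parameter, `metadata` would typically fall back to something like `Record<string, unknown>` and lose this compile-time checking.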

## 📋 Annotations API (v1.2.0)

Human-in-the-loop evaluation for quality assurance:

```typescript
// Create an annotation
const annotation = await client.annotations.create({
  evaluationRunId: 123,
  testCaseId: 456,
  rating: 5,
  feedback: "Excellent response!",
  labels: { category: "helpful", sentiment: "positive" },
});

// List annotations
const annotations = await client.annotations.list({
  evaluationRunId: 123,
});

// Annotation Tasks
const task = await client.annotations.tasks.create({
  name: "Q4 Quality Review",
  type: "classification",
  organizationId: 1,
  instructions: "Rate responses from 1-5",
});

const tasks = await client.annotations.tasks.list({
  organizationId: 1,
  status: "pending",
});

const taskDetail = await client.annotations.tasks.get(taskId);

// Annotation Items
const item = await client.annotations.tasks.items.create(taskId, {
  content: "Response to evaluate",
  annotation: { rating: 4, category: "good" },
});

const items = await client.annotations.tasks.items.list(taskId);
```

## 🔑 Developer API (v1.2.0)

Manage API keys and webhooks, and monitor usage:

### API Keys

```typescript
// Create an API key
const { apiKey, id, keyPrefix } = await client.developer.apiKeys.create({
  name: "Production Key",
  organizationId: 1,
  scopes: ["traces:read", "traces:write", "evaluations:read"],
  expiresAt: "2025-12-31T23:59:59Z",
});

// IMPORTANT: Save the apiKey securely - it's only shown once!

// List API keys
const keys = await client.developer.apiKeys.list({
  organizationId: 1,
});

// Update an API key
await client.developer.apiKeys.update(keyId, {
  name: "Updated Name",
  scopes: ["traces:read"],
});

// Revoke an API key
await client.developer.apiKeys.revoke(keyId);

// Get usage statistics for a key
const usage = await client.developer.apiKeys.getUsage(keyId);
console.log("Total requests:", usage.totalRequests);
console.log("By endpoint:", usage.usageByEndpoint);
```

### Webhooks

```typescript
// Create a webhook
const webhook = await client.developer.webhooks.create({
  organizationId: 1,
  url: "https://your-app.com/webhooks/evalai",
  events: ["trace.created", "evaluation.completed", "annotation.created"],
});

// List webhooks
const webhooks = await client.developer.webhooks.list({
  organizationId: 1,
  status: "active",
});

// Get a specific webhook
const webhookDetail = await client.developer.webhooks.get(webhookId);

// Update a webhook
await client.developer.webhooks.update(webhookId, {
  url: "https://new-url.com/webhooks",
  events: ["trace.created"],
  status: "inactive",
});

// Delete a webhook
await client.developer.webhooks.delete(webhookId);

// Get webhook deliveries (for debugging)
const deliveries = await client.developer.webhooks.getDeliveries(webhookId, {
  limit: 50,
  success: false, // Only failed deliveries
});
```

### Usage Analytics

```typescript
// Get detailed usage statistics
const stats = await client.developer.getUsage({
  organizationId: 1,
  startDate: "2025-01-01",
  endDate: "2025-01-31",
});

console.log("Traces:", stats.traces.total);
console.log("Evaluations by type:", stats.evaluations.byType);
console.log("API calls by endpoint:", stats.apiCalls.byEndpoint);

// Get usage summary
const summary = await client.developer.getUsageSummary(organizationId);
console.log("Current period:", summary.currentPeriod);
console.log("Limits:", summary.limits);
```

## ⚖️ LLM Judge Extended (v1.2.0)

Enhanced LLM judge configuration and analysis:

```typescript
// Create a judge configuration
const config = await client.llmJudge.createConfig({
  name: "GPT-4 Accuracy Judge",
  description: "Evaluates factual accuracy",
  model: "gpt-4",
  rubric: "Score 1-10 based on factual accuracy...",
  temperature: 0.3,
  maxTokens: 500,
  organizationId: 1,
  createdBy: userId,
});

// List configurations
const configs = await client.llmJudge.listConfigs({
  organizationId: 1,
});

// List results
const results = await client.llmJudge.listResults({
  configId: config.id,
  evaluationId: 123,
});

// Get alignment analysis
const alignment = await client.llmJudge.getAlignment({
  configId: config.id,
  startDate: "2025-01-01",
  endDate: "2025-01-31",
});

console.log("Average score:", alignment.averageScore);
console.log("Accuracy:", alignment.alignmentMetrics.accuracy);
console.log("Agreement with human:", alignment.comparisonWithHuman?.agreement);
```

## 🏢 Organizations API (v1.2.0)

Manage organization details:

```typescript
// Get current organization
const org = await client.organizations.getCurrent();
console.log("Organization:", org.name);
console.log("Plan:", org.plan);
console.log("Status:", org.status);
```

## Changelog

### v1.2.1 (Bug Fixes)

- 🐛 **Critical Fixes**
  - Fixed CLI import paths for proper npm package distribution
  - Fixed duplicate trace creation in OpenAI/Anthropic integrations
  - Fixed Commander.js command structure
  - Added browser/Node.js environment detection and helpful errors
  - Fixed context system to work in both Node.js and browsers
  - Added security checks to snapshot path sanitization
  - Removed misleading empty exports (StreamingClient, BatchClient)
- 📦 **Dependencies**
  - Updated Commander to v14
  - Added peer dependencies for OpenAI and Anthropic SDKs (optional)
  - Added Node.js engine requirement (>=16.0.0)
- 📚 **Documentation**
  - Clarified Node.js-only vs universal features
  - Added environment support section
  - Updated examples with security best practices

### v1.2.0

- 🎉 **100% API Coverage** - All backend endpoints now supported!
- 📋 **Annotations API** - Complete human-in-the-loop evaluation
  - Create and list annotations
  - Manage annotation tasks
  - Handle annotation items
- 🔑 **Developer API** - Full API key and webhook management
  - CRUD operations for API keys
  - Webhook management with delivery tracking
  - Usage analytics and monitoring
- ⚖️ **LLM Judge Extended** - Enhanced judge capabilities
  - Configuration management
  - Results querying
  - Alignment analysis
- 🏢 **Organizations API** - Organization details access
- 📊 **Enhanced Types** - 40+ new TypeScript interfaces
- 📚 **Comprehensive Documentation** - Examples for all new features

### v1.1.0

- ✨ Added comprehensive evaluation template types
- ✨ Added organization resource limits tracking
- ✨ Added `getOrganizationLimits()` method
- 📚 Enhanced documentation with new features

### v1.0.0

- 🎉 Initial release
- ✅ Traces, Evaluations, LLM Judge APIs
- ✅ Framework integrations (OpenAI, Anthropic)
- ✅ Test suite builder
- ✅ Context propagation
- ✅ Error handling & retries

## License

MIT

## Support

- Documentation: https://docs.evalai.com
- Issues: https://github.com/evalai/sdk/issues
- Discord: https://discord.gg/evalai