@threnn/acap-sdk 0.2.0

package/README.md ADDED
@@ -0,0 +1,517 @@
1
+ # @threnn/acap-sdk
2
+
3
+ Agent Capability Attestation Protocol — SDK for attesting, certifying, and monitoring AI agent capabilities.
4
+
5
+ ## What is ACAP?
6
+
7
+ ACAP is an open protocol for objectively measuring and certifying what AI agents can do. It provides a standardized pipeline: **Benchmark → Attest → Certify → Monitor** — so organizations can make data-driven decisions about which agents to trust with which tasks.
8
+
9
+ Certificates are cryptographically signed, include dimension-level scoring with confidence intervals, and can be published to a searchable registry for cross-organization comparison.
10
+
11
+ ## Quick Start
12
+
13
+ Attest an agent in under 10 minutes:
14
+
15
+ ```typescript
16
+ import { ACAPClient } from '@threnn/acap-sdk';
17
+
18
+ const client = new ACAPClient({
19
+ baseUrl: 'https://app.threnn.ai',
20
+ apiKey: process.env.ACAP_API_KEY!,
21
+ });
22
+
23
+ // 1. Pick a benchmark suite
24
+ const suites = await client.listSuites({ domain: 'general' });
25
+ const suite = suites[0];
26
+
27
+ // 2. Start attestation
28
+ const run = await client.startAttestation('my-agent-001', suite.id);
29
+ console.log('Run started:', run.id);
30
+
31
+ // 3. Poll until complete
32
+ let status = run;
33
+ while (status.status === 'pending' || status.status === 'running') {
34
+ await new Promise(r => setTimeout(r, 2000));
35
+ status = await client.getAttestationStatus(run.id);
36
+ }
37
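+ if (status.status !== 'completed') throw new Error(`Attestation run ended with status: ${status.status}`);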
38
+ // 4. Get the certificate
39
+ const cert = await client.getCertificate(status.certificate_id);
40
+ console.log(`Score: ${(cert.overall_score * 100).toFixed(1)}%`);
41
+ console.log(`Tier: ${cert.capability_tier}`);
42
+ ```
43
+
44
+ ## Installation
45
+
46
+ ```bash
47
+ npm install @threnn/acap-sdk
48
+ ```
49
+
50
+ ## Configuration
51
+
52
+ ```typescript
53
+ import { ACAPClient } from '@threnn/acap-sdk';
54
+
55
+ const client = new ACAPClient({
56
+ baseUrl: 'https://app.threnn.ai', // Required — ACAP API base URL
57
+ apiKey: 'your-api-key', // Required — from Settings page
58
+ fetch: customFetch, // Optional — custom fetch for Node <18
59
+ });
60
+ ```
61
+
62
+ | Parameter | Type | Required | Description |
63
+ |-----------|------|----------|-------------|
64
+ | `baseUrl` | `string` | Yes | Base URL of the ACAP API |
65
+ | `apiKey` | `string` | Yes | API key for Bearer token authentication |
66
+ | `fetch` | `typeof fetch` | No | Custom fetch implementation (defaults to `globalThis.fetch`) |
67
+
68
+ ## Attestation
69
+
70
+ ### Start an attestation run
71
+
72
+ ```typescript
73
+ const run = await client.startAttestation('my-agent-id', 'suite-uuid');
74
+ // Returns: AttestationRun { id, status, task_progress, ... }
75
+
76
+ // If another run is already in progress for this agent,
77
+ // the response includes a `warning` field
78
+ if (run.warning) {
79
+ console.warn(run.warning);
80
+ }
81
+ ```
82
+
83
+ ### Poll for status
84
+
85
+ ```typescript
86
+ const status = await client.getAttestationStatus(run.id);
87
+ console.log(`${status.task_progress.completedTasks}/${status.task_progress.totalTasks} tasks`);
88
+ // status.status: 'pending' | 'running' | 'completed' | 'failed' | 'cancelled'
89
+ ```
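
A polling loop with no upper bound can hang if a run never leaves `running`. The sketch below wraps the loop in a helper with a deadline. `waitForRun`, `intervalMs`, and `timeoutMs` are hypothetical names, not SDK exports; the helper only assumes the status object has a `status` string, as shown above.

```typescript
// Minimal structural type: only the field the loop inspects.
interface RunLike {
  status: string;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls `getStatus` until the run leaves 'pending'/'running' or the deadline passes.
// Hypothetical helper, not part of @threnn/acap-sdk.
async function waitForRun<T extends RunLike>(
  getStatus: () => Promise<T>,
  intervalMs = 2000,
  timeoutMs = 10 * 60 * 1000,
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const run = await getStatus();
    if (run.status !== 'pending' && run.status !== 'running') {
      return run; // 'completed', 'failed', or 'cancelled'
    }
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Run still ${run.status} after ${timeoutMs} ms`);
    }
    await sleep(intervalMs);
  }
}
```

With the client from the Quick Start, this could be called as `await waitForRun(() => client.getAttestationStatus(run.id))`. The caller still has to check whether the returned status is `'completed'` before fetching a certificate.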
90
+
91
+ ### Cancel a run
92
+
93
+ ```typescript
94
+ await client.cancelAttestation(run.id);
95
+ ```
96
+
97
+ ## Certificates
98
+
99
+ ### Fetch a certificate
100
+
101
+ ```typescript
102
+ const cert = await client.getCertificate('CERT-ID');
103
+ console.log(cert.overall_score); // 0.0–1.0
104
+ console.log(cert.capability_tier); // 'basic' | 'intermediate' | 'advanced' | 'expert'
105
+ console.log(cert.dimension_scores); // Array<{ dimension, score, confidence_interval }>
106
+ console.log(cert.limitations); // Array<{ dimension, severity, description }>
107
+ ```
108
+
109
+ ### List certificates
110
+
111
+ ```typescript
112
+ const { data, total, page, pageSize } = await client.listCertificates({
113
+ agentId: 'my-agent', // Optional filter
114
+ domain: 'general', // Optional filter
115
+ status: 'active', // Optional: 'active' | 'revoked' | 'expired'
116
+ page: 1,
117
+ pageSize: 20,
118
+ });
119
+ ```
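
To walk every page of a paginated result, a small loop over `page` is usually enough. The sketch below is not an SDK feature: `collectAllPages` and `maxPages` are hypothetical names, and it assumes pages are 1-indexed (as in the example above) and that `total` counts all matching items across pages.

```typescript
// Shape of a paginated response, taken from the destructuring shown above.
interface Page<T> {
  data: T[];
  total: number;
  page: number;
  pageSize: number;
}

// Fetches pages 1, 2, 3, ... until `total` items are collected,
// the server returns an empty page, or `maxPages` is reached.
async function collectAllPages<T>(
  fetchPage: (page: number) => Promise<Page<T>>,
  maxPages = 100,
): Promise<T[]> {
  const items: T[] = [];
  for (let page = 1; page <= maxPages; page++) {
    const result = await fetchPage(page);
    items.push(...result.data);
    if (result.data.length === 0 || items.length >= result.total) break;
  }
  return items;
}
```

A call might look like `collectAllPages((page) => client.listCertificates({ agentId: 'my-agent', page, pageSize: 50 }))`. The `maxPages` cap is a guard against looping forever if `total` is ever inconsistent with the data returned.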
120
+
121
+ ### Get certificates for a specific agent
122
+
123
+ ```typescript
124
+ const { data } = await client.getCertificatesForAgent('my-agent', {
125
+ domain: 'code_generation',
126
+ status: 'active',
127
+ pageSize: 10,
128
+ });
129
+ ```
130
+
131
+ ### Verify certificate integrity
132
+
133
+ ```typescript
134
+ const result = await client.verifyCertificate('CERT-ID');
135
+ console.log('Valid:', result.valid);
136
+
137
+ for (const check of result.checks) {
138
+ console.log(` ${check.name}: ${check.passed ? 'PASS' : 'FAIL'} — ${check.message}`);
139
+ }
140
+ // Checks: content_hash, signature, not_expired (7-day grace), status, score_range, issued_date
141
+
142
+ for (const warning of result.warnings) {
143
+ console.warn(` Warning: ${warning}`);
144
+ }
145
+ ```
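
For logs or CI gates it can help to reduce a verification result to a single line. The sketch below relies only on the fields used in the snippet above (`valid`, `checks[].name`, `checks[].passed`, `checks[].message`, `warnings`); `summarizeVerification` and the `*Like` interfaces are hypothetical and not exported by the SDK.

```typescript
interface CheckLike {
  name: string;
  passed: boolean;
  message: string;
}

interface VerificationLike {
  valid: boolean;
  checks: CheckLike[];
  warnings: string[];
}

// Builds a one-line summary: either a pass count or the list of failing checks.
function summarizeVerification(result: VerificationLike): string {
  const failed = result.checks.filter((c) => !c.passed);
  if (result.valid && failed.length === 0) {
    return `valid (${result.checks.length} checks passed, ${result.warnings.length} warnings)`;
  }
  const reasons = failed.map((c) => `${c.name}: ${c.message}`).join('; ');
  return `invalid (${reasons || 'no failing check reported'})`;
}
```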
146
+
147
+ ### Revoke a certificate
148
+
149
+ ```typescript
150
+ await client.revokeCertificate('CERT-ID', 'Model updated, re-attestation in progress');
151
+ ```
152
+
153
+ ### Get nutrition label
154
+
155
+ ```typescript
156
+ const label = await client.getNutritionLabel('CERT-ID');
157
+ console.log('Tier:', label.tier);
158
+ console.log('Strengths:', label.strengths);
159
+ console.log('Test conditions:', label.testConditions);
160
+ ```
161
+
162
+ ## Benchmark Suites
163
+
164
+ ### List suites
165
+
166
+ ```typescript
167
+ // All available suites (public + your own)
168
+ const suites = await client.listSuites();
169
+
170
+ // Filter by domain
171
+ const codeSuites = await client.listSuites({ domain: 'code_generation' });
172
+
173
+ // Only public/system suites
174
+ const publicSuites = await client.listSuites({ isPublic: true });
175
+ ```
176
+
177
+ ### Create a custom suite
178
+
179
+ ```typescript
180
+ const suite = await client.createSuite({
181
+ name: 'My Code Quality Suite',
182
+ domain: 'code_generation',
183
+ version: '1.0',
184
+ description: 'Tests code correctness, security, and style',
185
+ tasks: [
186
+ {
187
+ name: 'Reverse a string',
188
+ category: 'accuracy',
189
+ difficulty: 2,
190
+ dimension: 'accuracy',
191
+ weight: 1.0,
192
+ evaluationMethod: 'regex_match',
193
+ timeout_ms: 30000,
194
+ description: 'Write a function to reverse a string',
195
+ },
196
+ {
197
+ name: 'Find SQL injection vulnerability',
198
+ category: 'safety',
199
+ difficulty: 3,
200
+ dimension: 'safety',
201
+ weight: 1.5,
202
+ evaluationMethod: 'llm_judge',
203
+ timeout_ms: 60000,
204
+ description: 'Identify and fix the SQL injection in the given code',
205
+ },
206
+ ],
207
+ dimensions: ['accuracy', 'safety'],
208
+ difficulty: 'intermediate',
209
+ estimatedDuration_ms: 90000,
210
+ isPublic: false,
211
+ });
212
+ ```
213
+
214
+ ### Update a suite
215
+
216
+ ```typescript
217
+ const updated = await client.updateSuite(suite.id, {
218
+ name: 'Updated Suite Name',
219
+ tasks: [...suite.tasks, newTask], // newTask: your own task object, shaped like the entries passed to createSuite
220
+ });
221
+ ```
222
+
223
+ ### Get a suite by ID
224
+
225
+ ```typescript
226
+ const suite = await client.getSuite('BS-XXXXXXXXXX');
227
+ ```
228
+
229
+ ## Registry
230
+
231
+ ### Publish to registry
232
+
233
+ Requires an active certificate with at least 3 task results.
234
+
235
+ ```typescript
236
+ const listing = await client.publishToRegistry(
237
+ 'CERT-ID',
238
+ 'My Agent Name',
239
+ 'A specialized agent for customer support workflows',
240
+ );
241
+ console.log('Listed:', listing.id);
242
+ ```
243
+
244
+ ### Search the registry
245
+
246
+ ```typescript
247
+ const results = await client.searchRegistry({
248
+ domain: 'customer_service',
249
+ minScore: 0.7,
250
+ tier: 'advanced',
251
+ query: 'support',
252
+ sortBy: 'score', // 'score' | 'recent' | 'name'
253
+ page: 1,
254
+ pageSize: 20,
255
+ });
256
+
257
+ for (const entry of results.data) {
258
+ console.log(`${entry.agent_name}: ${(entry.overall_score * 100).toFixed(1)}% [${entry.capability_tier}]`);
259
+ }
260
+ ```
261
+
262
+ ### Get domain leaderboard
263
+
264
+ ```typescript
265
+ const { entries } = await client.getLeaderboard('general', 10);
266
+ for (const entry of entries) {
267
+ console.log(`#${entry.rank} ${entry.agent_name}: ${(entry.overall_score * 100).toFixed(1)}%`);
268
+ }
269
+ ```
270
+
271
+ ## Agent Comparison
272
+
273
+ Compare up to 10 certificates side-by-side:
274
+
275
+ ```typescript
276
+ const comparison = await client.compareAgents(['CERT-A', 'CERT-B', 'CERT-C']);
277
+
278
+ // Overall ranking
279
+ for (const rank of comparison.overallRanking) {
280
+ console.log(`${rank.agentId}: ${(rank.overallScore * 100).toFixed(1)}% [${rank.tier}]`);
281
+ }
282
+
283
+ // Dimension-by-dimension comparison (sorted by largest gap)
284
+ for (const dim of comparison.dimensionComparisons) {
285
+ const scores = dim.scores.map(s => `${s.agentId}: ${(s.score * 100).toFixed(0)}%`).join(', ');
286
+ console.log(`${dim.dimension}: ${scores} (spread: ${(dim.maxDelta * 100).toFixed(0)}%)`);
287
+ }
288
+ ```
289
+
290
+ ## Capability Drift Detection
291
+
292
+ Drift detection runs automatically on the server when new attestations are submitted. For local analysis, use `CapabilityDriftDetector` from `@threnn/acap-core`:
293
+
294
+ ```typescript
295
+ import { CapabilityDriftDetector } from '@threnn/acap-core';
296
+
297
+ const detector = new CapabilityDriftDetector({
298
+ degradationThreshold: 0.05, // 5% drop triggers warning
299
+ criticalThreshold: 0.15, // 15% drop triggers critical
300
+ });
301
+
302
+ // Compare two certificates for the same agent
303
+ const report = detector.compareCertificates(previousCert, currentCert);
304
+ console.log('Drift type:', report.driftType);
305
+ // 'stable' | 'improvement' | 'degradation' | 'tier_change' | 'dimension_shift'
306
+ console.log('Severity:', report.severity);
307
+ console.log('Score delta:', (report.overallScoreDelta * 100).toFixed(1) + '%');
308
+ console.log('Recommendation:', report.recommendation);
309
+ ```
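
To make the two thresholds concrete, here is a rough sketch of how a score drop could be bucketed. This is an illustration only, not the detector's actual logic: `classifyDrop` and `DropSeverity` are hypothetical, and the sketch assumes the thresholds are absolute drops on the 0.0–1.0 score scale rather than relative percentages.

```typescript
type DropSeverity = 'none' | 'warning' | 'critical';

// Compares two overall scores (0.0-1.0) against the two thresholds.
// Assumption: thresholds are absolute score drops, not relative changes.
function classifyDrop(
  previousScore: number,
  currentScore: number,
  degradationThreshold = 0.05,
  criticalThreshold = 0.15,
): DropSeverity {
  const drop = previousScore - currentScore; // positive means the agent got worse
  if (drop >= criticalThreshold) return 'critical';
  if (drop >= degradationThreshold) return 'warning';
  return 'none'; // small drops and improvements both land here
}
```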
310
+
311
+ ## Agent Adapters
312
+
313
+ Agent adapters live in `@threnn/acap-core` and are used with the `BenchmarkRunner` for local execution. Five built-in adapters are available:
314
+
315
+ ### HTTPAdapter — Any agent with an HTTP API
316
+
317
+ ```typescript
318
+ import { HTTPAdapter } from '@threnn/acap-core';
319
+
320
+ const adapter = new HTTPAdapter({
321
+ endpoint: 'https://my-agent.example.com/run',
322
+ headers: { Authorization: 'Bearer my-key' },
323
+ metadata: {
324
+ modelProvider: 'custom',
325
+ modelVersion: '1.0',
326
+ systemPromptHash: '',
327
+ toolsAvailable: [],
328
+ temperature: 0,
329
+ },
330
+ });
331
+ ```
332
+
333
+ ### AnthropicAdapter — Claude
334
+
335
+ ```typescript
336
+ import { AnthropicAdapter } from '@threnn/acap-core';
337
+
338
+ const adapter = new AnthropicAdapter({
339
+ apiKey: process.env.ANTHROPIC_API_KEY!,
340
+ model: 'claude-sonnet-4-20250514',
341
+ maxTokens: 4096,
342
+ temperature: 0,
343
+ systemPrompt: 'You are a helpful assistant.',
344
+ });
345
+ ```
346
+
347
+ ### OpenAIAdapter — GPT
348
+
349
+ ```typescript
350
+ import { OpenAIAdapter } from '@threnn/acap-core';
351
+
352
+ const adapter = new OpenAIAdapter({
353
+ apiKey: process.env.OPENAI_API_KEY!,
354
+ model: 'gpt-4o',
355
+ temperature: 0,
356
+ systemPrompt: 'You are a helpful assistant.',
357
+ });
358
+ ```
359
+
360
+ ### MCPAdapter — MCP Server
361
+
362
+ ```typescript
363
+ import { MCPAdapter } from '@threnn/acap-core';
364
+
365
+ const adapter = new MCPAdapter({
366
+ endpoint: 'https://my-mcp-server.example.com/mcp',
367
+ toolName: 'run_task',
368
+ headers: { Authorization: 'Bearer token' },
369
+ metadata: {
370
+ modelProvider: 'custom',
371
+ modelVersion: '1.0',
372
+ systemPromptHash: '',
373
+ toolsAvailable: ['run_task'],
374
+ temperature: 0,
375
+ },
376
+ });
377
+ ```
378
+
379
+ ### ManualAdapter — Human-in-the-loop
380
+
381
+ ```typescript
382
+ import { ManualAdapter } from '@threnn/acap-core';
383
+
384
+ const adapter = new ManualAdapter({
385
+ promptHandler: async (input) => {
386
+ // Present the task to a human and collect the response (promptUser is a placeholder for your own UI code)
387
+ const response = await promptUser(input.content);
388
+ return { output: response, latency_ms: 0 };
389
+ },
390
+ metadata: {
391
+ modelProvider: 'human',
392
+ modelVersion: '1.0',
393
+ systemPromptHash: '',
394
+ toolsAvailable: [],
395
+ temperature: 0,
396
+ },
397
+ });
398
+ ```
399
+
400
+ ## Capability Tiers
401
+
402
+ | Tier | Score Range | Description |
403
+ |------|-----------|-------------|
404
+ | **Expert** | 85–100% | Exceptional capability across all dimensions |
405
+ | **Advanced** | 65–84% | Excels at complex tasks with minor limitations |
406
+ | **Intermediate** | 40–64% | Handles standard tasks reliably with some weaknesses |
407
+ | **Basic** | 0–39% | Foundational capabilities with significant limitations |
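
The table lists whole-percent ranges, while `overall_score` is a fraction between 0.0 and 1.0. The sketch below maps a score to a tier locally. It assumes each band's lower bound is inclusive, so a score such as 0.845 falls into the lower tier; the server's exact boundary handling is not stated here, so treat the certificate's own `capability_tier` as authoritative. `tierForScore` is a hypothetical helper.

```typescript
type Tier = 'basic' | 'intermediate' | 'advanced' | 'expert';

// Maps a 0.0-1.0 score to a tier, assuming inclusive lower bounds of 85%, 65%, and 40%.
function tierForScore(score: number): Tier {
  if (!Number.isFinite(score) || score < 0 || score > 1) {
    throw new RangeError(`score must be within 0.0-1.0, got ${score}`);
  }
  if (score >= 0.85) return 'expert';
  if (score >= 0.65) return 'advanced';
  if (score >= 0.4) return 'intermediate';
  return 'basic';
}
```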
408
+
409
+ ## Standard Dimensions
410
+
411
+ | Dimension | Description |
412
+ |-----------|-------------|
413
+ | `accuracy` | Correctness of outputs relative to expected results |
414
+ | `reasoning` | Ability to perform multi-step logical reasoning |
415
+ | `instruction_following` | Adherence to specified constraints and formats |
416
+ | `tool_use` | Correct and efficient use of available tools and APIs |
417
+ | `safety` | Avoidance of harmful, biased, or policy-violating outputs |
418
+ | `consistency` | Reliability and reproducibility of outputs across runs |
419
+ | `latency` | Response time relative to task complexity |
420
+
421
+ ## Error Handling
422
+
423
+ All SDK methods throw `ACAPError` on API errors:
424
+
425
+ ```typescript
426
+ import { ACAPClient, ACAPError } from '@threnn/acap-sdk';
427
+
428
+ try {
429
+ const cert = await client.getCertificate('nonexistent-id');
430
+ } catch (err) {
431
+ if (err instanceof ACAPError) {
432
+ console.error(`${err.status} [${err.code}]: ${err.message}`);
433
+ // e.g. "404 [NOT_FOUND]: Certificate not found"
434
+ }
435
+ }
436
+ ```
437
+
438
+ ### Error Codes
439
+
440
+ | Code | HTTP Status | Description |
441
+ |------|-------------|-------------|
442
+ | `UNAUTHORIZED` | 401 | Missing or invalid API key |
443
+ | `FORBIDDEN` | 403 | Not authorized to access this resource |
444
+ | `NOT_FOUND` | 404 | Resource does not exist |
445
+ | `CONFLICT` | 409 | Resource already exists or invalid state transition |
446
+ | `VALIDATION_ERROR` | 400 | Invalid request body or parameters |
447
+ | `RATE_LIMITED` | 429 | Too many requests (see rate limits below) |
448
+ | `INTERNAL_ERROR` | 500 | Server error |
449
+
450
+ ### Rate Limits
451
+
452
+ | Endpoint | Limit |
453
+ |----------|-------|
454
+ | POST /attest | 10 requests/minute |
455
+ | POST /registry/publish | 20 requests/minute |
456
+ | POST /suites | 30 requests/minute |
457
+ | PATCH /drift | 30 requests/minute |
458
+
459
+ Rate-limited responses include `Retry-After`, `X-RateLimit-Limit`, `X-RateLimit-Remaining`, and `X-RateLimit-Reset` headers.
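
One way to handle `RATE_LIMITED` errors is to retry with exponential backoff. The sketch below is an assumption-laden example rather than SDK behavior: `withRateLimitRetry`, `maxAttempts`, and `baseDelayMs` are hypothetical names, and because this README does not say whether `ACAPError` exposes response headers, the helper does not read `Retry-After` and uses a fixed backoff schedule instead. It only checks the `code` property documented in the table above.

```typescript
// True when the thrown value carries the documented RATE_LIMITED code.
function isRateLimited(err: unknown): boolean {
  return (
    typeof err === 'object' &&
    err !== null &&
    (err as { code?: unknown }).code === 'RATE_LIMITED'
  );
}

// Retries `call` on RATE_LIMITED errors, waiting baseDelayMs, then 2x, 4x, ...
// Any other error, or the last failed attempt, is rethrown unchanged.
async function withRateLimitRetry<T>(
  call: () => Promise<T>,
  maxAttempts = 4,
  baseDelayMs = 1000,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      if (!isRateLimited(err) || attempt >= maxAttempts) throw err;
      const delayMs = baseDelayMs * Math.pow(2, attempt - 1);
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

For example, `await withRateLimitRetry(() => client.startAttestation('my-agent-id', 'suite-uuid'))` would retry up to three times before giving up.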
460
+
461
+ ## TypeScript Types
462
+
463
+ Key types are re-exported from `@threnn/acap-core`:
464
+
465
+ ```typescript
466
+ import type {
467
+ CapabilityCertificate,
468
+ BenchmarkSuite,
469
+ BenchmarkTask,
470
+ AttestationRun,
471
+ RunProgress,
472
+ DimensionScore,
473
+ TaskResult,
474
+ AgentAdapter,
475
+ AgentResponse,
476
+ NutritionLabel,
477
+ RegistryListing,
478
+ CapabilityDriftEvent,
479
+ VerificationResult,
480
+ ComparisonResult,
481
+ } from '@threnn/acap-sdk';
482
+ ```
483
+
484
+ SDK-specific types:
485
+
486
+ ```typescript
487
+ import type {
488
+ PaginatedResult,
489
+ ListCertificatesOptions,
490
+ SearchRegistryOptions,
491
+ ListSuitesOptions,
492
+ LeaderboardEntry,
493
+ } from '@threnn/acap-sdk';
494
+ ```
495
+
496
+ ## Demo Scripts
497
+
498
+ Runnable demo scripts are included in the `demo/` directory:
499
+
500
+ | Script | Description |
501
+ |--------|-------------|
502
+ | `demo/attest-agent.ts` | Basic attestation of a mock agent |
503
+ | `demo/custom-benchmark.ts` | Build and run a custom benchmark suite |
504
+ | `demo/capability-comparison.ts` | Compare two agents side-by-side |
505
+ | `demo/drift-detection.ts` | Detect capability drift between attestations |
506
+ | `demo/registry-publish.ts` | Publish and search the registry |
507
+
508
+ Set `ACAP_API_KEY` and optionally `ACAP_BASE_URL` before running:
509
+
510
+ ```bash
511
+ export ACAP_API_KEY=your-key
512
+ npx ts-node demo/attest-agent.ts
513
+ ```
514
+
515
+ ## License
516
+
517
+ MIT