@threnn/acap-sdk 0.2.0

Files changed (9)

package/README.md ADDED

@@ -0,0 +1,517 @@
+# @threnn/acap-sdk
+Agent Capability Attestation Protocol — SDK for attesting, certifying, and monitoring AI agent capabilities.
+## What is ACAP?
+ACAP is an open protocol for objectively measuring and certifying what AI agents can do. It provides a standardized pipeline: **Benchmark → Attest → Certify → Monitor** — so organizations can make data-driven decisions about which agents to trust with which tasks.
+Certificates are cryptographically signed, include dimension-level scoring with confidence intervals, and can be published to a searchable registry for cross-organization comparison.
+## Quick Start
+Attest an agent in under 10 minutes:
+```typescript
+import { ACAPClient } from '@threnn/acap-sdk';
+const client = new ACAPClient({
+  baseUrl: 'https://app.threnn.ai',
+  apiKey: process.env.ACAP_API_KEY!,
+});
+// 1. Pick a benchmark suite
+const suites = await client.listSuites({ domain: 'general' });
+const suite = suites[0];
+// 2. Start attestation
+const run = await client.startAttestation('my-agent-001', suite.id);
+console.log('Run started:', run.id);
+// 3. Poll until complete
+let status = run;
+while (status.status === 'pending' || status.status === 'running') {
+  await new Promise(r => setTimeout(r, 2000));
+  status = await client.getAttestationStatus(run.id);
+}
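+// 4. Get the certificate (the loop above also exits on 'failed' or 'cancelled')
+if (status.status !== 'completed') {
+  throw new Error(`Attestation ended with status: ${status.status}`);
+}
+const cert = await client.getCertificate(status.certificate_id);
+console.log(`Score: ${(cert.overall_score * 100).toFixed(1)}%`);
+console.log(`Tier: ${cert.capability_tier}`);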
+```
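The polling loop in the Quick Start has no upper bound on how long it waits. A minimal sketch of a bounded variant follows. `pollUntilDone`, its options, and the `getStatus` callback are hypothetical names introduced for illustration and are not part of the SDK; only the `'pending'` and `'running'` status values come from the README.

```typescript
// Hypothetical helper, not part of @threnn/acap-sdk.
// Repeatedly calls getStatus() until the status is no longer
// 'pending' or 'running', or until the time budget runs out.
type RunLike = { status: string };

async function pollUntilDone<T extends RunLike>(
  getStatus: () => Promise<T>,
  opts: { intervalMs?: number; timeoutMs?: number } = {},
): Promise<T> {
  const intervalMs = opts.intervalMs ?? 2000;
  const timeoutMs = opts.timeoutMs ?? 10 * 60 * 1000;
  const deadline = Date.now() + timeoutMs;

  for (;;) {
    const current = await getStatus();
    if (current.status !== 'pending' && current.status !== 'running') {
      return current; // terminal state: completed, failed, or cancelled
    }
    // Give up if the next poll would land after the deadline.
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Run still ${current.status} after ${timeoutMs} ms`);
    }
    await new Promise<void>(resolve => setTimeout(resolve, intervalMs));
  }
}
```

With the client from the Quick Start, this could be called as `await pollUntilDone(() => client.getAttestationStatus(run.id))`. The caller still has to check the returned status before reading a certificate ID.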
+## Installation
+```bash
+npm install @threnn/acap-sdk
+```
+## Configuration
+```typescript
+import { ACAPClient } from '@threnn/acap-sdk';
+const client = new ACAPClient({
+  baseUrl: 'https://app.threnn.ai',  // Required — ACAP API base URL
+  apiKey: 'your-api-key',            // Required — from Settings page
+  fetch: customFetch,                // Optional — custom fetch for Node <18
+});
+```
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `baseUrl` | `string` | Yes | Base URL of the ACAP API |
+| `apiKey` | `string` | Yes | API key for Bearer token authentication |
+| `fetch` | `typeof fetch` | No | Custom fetch implementation (defaults to `globalThis.fetch`) |
+## Attestation
+### Start an attestation run
+```typescript
+const run = await client.startAttestation('my-agent-id', 'suite-uuid');
+// Returns: AttestationRun { id, status, task_progress, ... }
+// If another run is already in progress for this agent,
+// the response includes a `warning` field
+if (run.warning) {
+  console.warn(run.warning);
+}
+```
+### Poll for status
+```typescript
+const status = await client.getAttestationStatus(run.id);
+console.log(`${status.task_progress.completedTasks}/${status.task_progress.totalTasks} tasks`);
+// status.status: 'pending' | 'running' | 'completed' | 'failed' | 'cancelled'
+```
+### Cancel a run
+```typescript
+await client.cancelAttestation(run.id);
+```
+## Certificates
+### Fetch a certificate
+```typescript
+const cert = await client.getCertificate('CERT-ID');
+console.log(cert.overall_score);      // 0.0–1.0
+console.log(cert.capability_tier);     // 'basic' | 'intermediate' | 'advanced' | 'expert'
+console.log(cert.dimension_scores);    // Array<{ dimension, score, confidence_interval }>
+console.log(cert.limitations);         // Array<{ dimension, severity, description }>
+```
+### List certificates
+```typescript
+const { data, total, page, pageSize } = await client.listCertificates({
+  agentId: 'my-agent',     // Optional filter
+  domain: 'general',       // Optional filter
+  status: 'active',        // Optional: 'active' | 'revoked' | 'expired'
+  page: 1,
+  pageSize: 20,
+});
+```
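Because `listCertificates` returns one page at a time, collecting every match means walking the pages. The sketch below shows one way to do that. `collectAllPages` and the `Page<T>` interface are hypothetical; `Page<T>` only mirrors the four fields destructured in the example above, and it assumes that `total` counts items across all pages and that pages are numbered from 1.

```typescript
// Hypothetical helper, not part of @threnn/acap-sdk.
// Page<T> mirrors the fields destructured from listCertificates().
interface Page<T> {
  data: T[];
  total: number;    // assumed: total item count across all pages
  page: number;     // assumed: 1-based page index
  pageSize: number;
}

async function collectAllPages<T>(
  fetchPage: (page: number) => Promise<Page<T>>,
): Promise<T[]> {
  const items: T[] = [];
  for (let page = 1; ; page++) {
    const result = await fetchPage(page);
    items.push(...result.data);
    // Stop once everything is collected, or on an empty page so that
    // an inaccurate `total` cannot cause an endless loop.
    if (items.length >= result.total || result.data.length === 0) {
      return items;
    }
  }
}
```

A call might look like `collectAllPages(page => client.listCertificates({ status: 'active', page, pageSize: 50 }))`.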
+### Get certificates for a specific agent
+```typescript
+const { data } = await client.getCertificatesForAgent('my-agent', {
+  domain: 'code_generation',
+  status: 'active',
+  pageSize: 10,
+});
+```
+### Verify certificate integrity
+```typescript
+const result = await client.verifyCertificate('CERT-ID');
+console.log('Valid:', result.valid);
+for (const check of result.checks) {
+  console.log(`  ${check.name}: ${check.passed ? 'PASS' : 'FAIL'} — ${check.message}`);
+}
+// Checks: content_hash, signature, not_expired (7-day grace), status, score_range, issued_date
+for (const warning of result.warnings) {
+  console.warn(`  Warning: ${warning}`);
+}
+```
+### Revoke a certificate
+```typescript
+await client.revokeCertificate('CERT-ID', 'Model updated, re-attestation in progress');
+```
+### Get nutrition label
+```typescript
+const label = await client.getNutritionLabel('CERT-ID');
+console.log('Tier:', label.tier);
+console.log('Strengths:', label.strengths);
+console.log('Test conditions:', label.testConditions);
+```
+## Benchmark Suites
+### List suites
+```typescript
+// All available suites (public + your own)
+const suites = await client.listSuites();
+// Filter by domain
+const codeSuites = await client.listSuites({ domain: 'code_generation' });
+// Only public/system suites
+const publicSuites = await client.listSuites({ isPublic: true });
+```
+### Create a custom suite
+```typescript
+const suite = await client.createSuite({
+  name: 'My Code Quality Suite',
+  domain: 'code_generation',
+  version: '1.0',
+  description: 'Tests code correctness, security, and style',
+  tasks: [
+    {
+      name: 'Reverse a string',
+      category: 'accuracy',
+      difficulty: 2,
+      dimension: 'accuracy',
+      weight: 1.0,
+      evaluationMethod: 'regex_match',
+      timeout_ms: 30000,
+      description: 'Write a function to reverse a string',
+    },
+    {
+      name: 'Find SQL injection vulnerability',
+      category: 'safety',
+      difficulty: 3,
+      dimension: 'safety',
+      weight: 1.5,
+      evaluationMethod: 'llm_judge',
+      timeout_ms: 60000,
+      description: 'Identify and fix the SQL injection in the given code',
+    },
+  ],
+  dimensions: ['accuracy', 'safety'],
+  difficulty: 'intermediate',
+  estimatedDuration_ms: 90000,
+  isPublic: false,
+});
+```
+### Update a suite
+```typescript
+const updated = await client.updateSuite(suite.id, {
+  name: 'Updated Suite Name',
+  tasks: [...suite.tasks, newTask],
+});
+```
+### Get a suite by ID
+```typescript
+const suite = await client.getSuite('BS-XXXXXXXXXX');
+```
+## Registry
+### Publish to registry
+Requires an active certificate with at least 3 task results.
+```typescript
+const listing = await client.publishToRegistry(
+  'CERT-ID',
+  'My Agent Name',
+  'A specialized agent for customer support workflows',
+);
+console.log('Listed:', listing.id);
+```
+### Search the registry
+```typescript
+const results = await client.searchRegistry({
+  domain: 'customer_service',
+  minScore: 0.7,
+  tier: 'advanced',
+  query: 'support',
+  sortBy: 'score',     // 'score' | 'recent' | 'name'
+  page: 1,
+  pageSize: 20,
+});
+for (const entry of results.data) {
+  console.log(`${entry.agent_name}: ${(entry.overall_score * 100).toFixed(1)}% [${entry.capability_tier}]`);
+}
+```
+### Get domain leaderboard
+```typescript
+const { entries } = await client.getLeaderboard('general', 10);
+for (const entry of entries) {
+  console.log(`#${entry.rank} ${entry.agent_name}: ${(entry.overall_score * 100).toFixed(1)}%`);
+}
+```
+## Agent Comparison
+Compare up to 10 certificates side-by-side:
+```typescript
+const comparison = await client.compareAgents(['CERT-A', 'CERT-B', 'CERT-C']);
+// Overall ranking
+for (const rank of comparison.overallRanking) {
+  console.log(`${rank.agentId}: ${(rank.overallScore * 100).toFixed(1)}% [${rank.tier}]`);
+}
+// Dimension-by-dimension comparison (sorted by largest gap)
+for (const dim of comparison.dimensionComparisons) {
+  const scores = dim.scores.map(s => `${s.agentId}: ${(s.score * 100).toFixed(0)}%`).join(', ');
+  console.log(`${dim.dimension}: ${scores} (spread: ${(dim.maxDelta * 100).toFixed(0)}%)`);
+}
+```
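The README does not define how `maxDelta` is computed. As an illustration of what "sorted by largest gap" could mean, the sketch below assumes the spread is the highest score minus the lowest score within a dimension. `rankBySpread` and both interfaces are hypothetical and only echo the field names used in the example above.

```typescript
// Hypothetical local computation; the server's actual definition of
// maxDelta is not documented. Assumed: highest minus lowest score.
interface AgentScore { agentId: string; score: number }
interface DimensionSpread {
  dimension: string;
  scores: AgentScore[];
  maxDelta: number;
}

function rankBySpread(
  byDimension: Record<string, AgentScore[]>,
): DimensionSpread[] {
  return Object.entries(byDimension)
    .filter(([, scores]) => scores.length > 0) // skip empty dimensions
    .map(([dimension, scores]) => {
      const values = scores.map(s => s.score);
      const maxDelta = Math.max(...values) - Math.min(...values);
      return { dimension, scores, maxDelta };
    })
    .sort((a, b) => b.maxDelta - a.maxDelta); // largest gap first
}
```

Ordering by spread puts the dimensions where the compared agents differ most at the top, which is usually where a selection decision is made.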
+## Capability Drift Detection
+Drift detection runs automatically on the server when new attestations are submitted. For local analysis, use `CapabilityDriftDetector` from `@threnn/acap-core`:
+```typescript
+import { CapabilityDriftDetector } from '@threnn/acap-core';
+const detector = new CapabilityDriftDetector({
+  degradationThreshold: 0.05,  // 5% drop triggers warning
+  criticalThreshold: 0.15,     // 15% drop triggers critical
+});
+// Compare two certificates for the same agent
+const report = detector.compareCertificates(previousCert, currentCert);
+console.log('Drift type:', report.driftType);
+// 'stable' | 'improvement' | 'degradation' | 'tier_change' | 'dimension_shift'
+console.log('Severity:', report.severity);
+console.log('Score delta:', (report.overallScoreDelta * 100).toFixed(1) + '%');
+console.log('Recommendation:', report.recommendation);
+```
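To make the two thresholds concrete, here is a small sketch of how an overall score drop could be bucketed. It is not the logic of `CapabilityDriftDetector`, which is not shown in the README. `classifyScoreDrop` is a hypothetical function, and it assumes the thresholds are absolute drops on the 0 to 1 score scale rather than relative percentages.

```typescript
// Hypothetical re-implementation for illustration only; the real
// CapabilityDriftDetector may behave differently.
// Assumed: thresholds are absolute drops on the 0 to 1 score scale.
type DropSeverity = 'none' | 'warning' | 'critical';

function classifyScoreDrop(
  previousScore: number,
  currentScore: number,
  degradationThreshold = 0.05,
  criticalThreshold = 0.15,
): DropSeverity {
  const drop = previousScore - currentScore; // positive when the score fell
  if (drop >= criticalThreshold) return 'critical';
  if (drop >= degradationThreshold) return 'warning';
  return 'none'; // stable or improved
}
```

Under these assumptions, a fall from 0.80 to 0.74 is a warning and a fall from 0.90 to 0.70 is critical, while any improvement is reported as `'none'`.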
+## Agent Adapters
+Agent adapters live in `@threnn/acap-core` and are used with the `BenchmarkRunner` for local execution. Five built-in adapters are available:
+### HTTPAdapter — Any agent with an HTTP API
+```typescript
+import { HTTPAdapter } from '@threnn/acap-core';
+const adapter = new HTTPAdapter({
+  endpoint: 'https://my-agent.example.com/run',
+  headers: { Authorization: 'Bearer my-key' },
+  metadata: {
+    modelProvider: 'custom',
+    modelVersion: '1.0',
+    systemPromptHash: '',
+    toolsAvailable: [],
+    temperature: 0,
+  },
+});
+```
+### AnthropicAdapter — Claude
+```typescript
+import { AnthropicAdapter } from '@threnn/acap-core';
+const adapter = new AnthropicAdapter({
+  apiKey: process.env.ANTHROPIC_API_KEY!,
+  model: 'claude-sonnet-4-20250514',
+  maxTokens: 4096,
+  temperature: 0,
+  systemPrompt: 'You are a helpful assistant.',
+});
+```
+### OpenAIAdapter — GPT
+```typescript
+import { OpenAIAdapter } from '@threnn/acap-core';
+const adapter = new OpenAIAdapter({
+  apiKey: process.env.OPENAI_API_KEY!,
+  model: 'gpt-4o',
+  temperature: 0,
+  systemPrompt: 'You are a helpful assistant.',
+});
+```
+### MCPAdapter — MCP Server
+```typescript
+import { MCPAdapter } from '@threnn/acap-core';
+const adapter = new MCPAdapter({
+  endpoint: 'https://my-mcp-server.example.com/mcp',
+  toolName: 'run_task',
+  headers: { Authorization: 'Bearer token' },
+  metadata: {
+    modelProvider: 'custom',
+    modelVersion: '1.0',
+    systemPromptHash: '',
+    toolsAvailable: ['run_task'],
+    temperature: 0,
+  },
+});
+```
+### ManualAdapter — Human-in-the-loop
+```typescript
+import { ManualAdapter } from '@threnn/acap-core';
+const adapter = new ManualAdapter({
+  promptHandler: async (input) => {
+    // Present task to human, collect response
+    const response = await promptUser(input.content);
+    return { output: response, latency_ms: 0 };
+  },
+  metadata: {
+    modelProvider: 'human',
+    modelVersion: '1.0',
+    systemPromptHash: '',
+    toolsAvailable: [],
+    temperature: 0,
+  },
+});
+```
+## Capability Tiers
+| Tier | Score Range | Description |
+|------|-----------|-------------|
+| **Expert** | 85–100% | Exceptional capability across all dimensions |
+| **Advanced** | 65–84% | Excels at complex tasks with minor limitations |
+| **Intermediate** | 40–64% | Handles standard tasks reliably with some weaknesses |
+| **Basic** | 0–39% | Foundational capabilities with significant limitations |
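The table can be read as a lookup from `overall_score` to tier. The sketch below encodes it on the 0 to 1 scale that `overall_score` uses. `tierForScore` is a hypothetical function, not an SDK export. The table does not say how values between rows (for example 84.5%) are treated, so this version assumes each tier begins at its lower bound.

```typescript
type CapabilityTier = 'basic' | 'intermediate' | 'advanced' | 'expert';

// Hypothetical helper. Lower bounds come from the tier table; the
// handling of values between rows is an assumption.
function tierForScore(overallScore: number): CapabilityTier {
  if (Number.isNaN(overallScore) || overallScore < 0 || overallScore > 1) {
    throw new RangeError(`score out of range: ${overallScore}`);
  }
  if (overallScore >= 0.85) return 'expert';
  if (overallScore >= 0.65) return 'advanced';
  if (overallScore >= 0.4) return 'intermediate';
  return 'basic';
}
```

The tier on a certificate is assigned by the server, so a function like this is only useful for local estimates, for instance when previewing benchmark results before attestation.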
+## Standard Dimensions
+| Dimension | Description |
+|-----------|-------------|
+| `accuracy` | Correctness of outputs relative to expected results |
+| `reasoning` | Ability to perform multi-step logical reasoning |
+| `instruction_following` | Adherence to specified constraints and formats |
+| `tool_use` | Correct and efficient use of available tools and APIs |
+| `safety` | Avoidance of harmful, biased, or policy-violating outputs |
+| `consistency` | Reliability and reproducibility of outputs across runs |
+| `latency` | Response time relative to task complexity |
+## Error Handling
+All SDK methods throw `ACAPError` on API errors:
+```typescript
+import { ACAPClient, ACAPError } from '@threnn/acap-sdk';
+try {
+  const cert = await client.getCertificate('nonexistent-id');
+} catch (err) {
+  if (err instanceof ACAPError) {
+    console.error(`${err.status} [${err.code}]: ${err.message}`);
+    // e.g. "404 [NOT_FOUND]: Certificate not found"
+  }
+}
+```
+### Error Codes
+| Code | HTTP Status | Description |
+|------|-------------|-------------|
+| `UNAUTHORIZED` | 401 | Missing or invalid API key |
+| `FORBIDDEN` | 403 | Not authorized to access this resource |
+| `NOT_FOUND` | 404 | Resource does not exist |
+| `CONFLICT` | 409 | Resource already exists or invalid state transition |
+| `VALIDATION_ERROR` | 400 | Invalid request body or parameters |
+| `RATE_LIMITED` | 429 | Too many requests (see rate limits below) |
+| `INTERNAL_ERROR` | 500 | Server error |
+### Rate Limits
+| Endpoint | Limit |
+|----------|-------|
+| POST /attest | 10 requests/minute |
+| POST /registry/publish | 20 requests/minute |
+| POST /suites | 30 requests/minute |
+| PATCH /drift | 30 requests/minute |
+Rate-limited responses include `Retry-After`, `X-RateLimit-Limit`, `X-RateLimit-Remaining`, and `X-RateLimit-Reset` headers.
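A client that hits a rate limit can use the `Retry-After` value to decide how long to wait. The sketch below converts that header into milliseconds. `retryAfterToMs` is a hypothetical helper; the README does not say how `ACAPError` exposes response headers, so obtaining the raw header string is left to the caller. In HTTP generally, `Retry-After` carries either a number of seconds or an HTTP date, and the code handles both.

```typescript
// Hypothetical helper, not part of @threnn/acap-sdk.
// Converts a Retry-After header value into a delay in milliseconds.
function retryAfterToMs(
  headerValue: string | null,
  nowMs: number = Date.now(),
  fallbackMs = 1000,
): number {
  if (headerValue === null || headerValue.trim() === '') return fallbackMs;
  const trimmed = headerValue.trim();

  // Form 1: delay in whole seconds, e.g. "30".
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;

  // Form 2: an HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
  const dateMs = Date.parse(trimmed);
  if (Number.isNaN(dateMs)) return fallbackMs; // unparseable value
  return Math.max(0, dateMs - nowMs); // never return a negative delay
}
```

Passing `nowMs` explicitly keeps the function deterministic in tests. In production code the default of `Date.now()` is normally enough.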
+## TypeScript Types
+Key types are re-exported from `@threnn/acap-core`:
+```typescript
+import type {
+  CapabilityCertificate,
+  BenchmarkSuite,
+  BenchmarkTask,
+  AttestationRun,
+  RunProgress,
+  DimensionScore,
+  TaskResult,
+  AgentAdapter,
+  AgentResponse,
+  NutritionLabel,
+  RegistryListing,
+  CapabilityDriftEvent,
+  VerificationResult,
+  ComparisonResult,
+} from '@threnn/acap-sdk';
+```
+SDK-specific types:
+```typescript
+import type {
+  PaginatedResult,
+  ListCertificatesOptions,
+  SearchRegistryOptions,
+  ListSuitesOptions,
+  LeaderboardEntry,
+} from '@threnn/acap-sdk';
+```
+## Demo Scripts
+Runnable demo scripts are included in the `demo/` directory:
+| Script | Description |
+|--------|-------------|
+| `demo/attest-agent.ts` | Basic attestation of a mock agent |
+| `demo/custom-benchmark.ts` | Build and run a custom benchmark suite |
+| `demo/capability-comparison.ts` | Compare two agents side-by-side |
+| `demo/drift-detection.ts` | Detect capability drift between attestations |
+| `demo/registry-publish.ts` | Publish and search the registry |
+Set `ACAP_API_KEY` and optionally `ACAP_BASE_URL` before running:
+```bash
+export ACAP_API_KEY=your-key
+npx ts-node demo/attest-agent.ts
+```
+## License
+MIT