@amplitude/ai 0.1.0 → 0.1.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/AGENTS.md +1 -1
- package/README.md +436 -6
- package/dist/mcp/server.d.ts.map +1 -1
- package/dist/mcp/server.js +1 -1
- package/dist/mcp/server.js.map +1 -1
- package/dist/patching.d.ts.map +1 -1
- package/dist/providers/anthropic.d.ts.map +1 -1
- package/dist/providers/anthropic.js +4 -18
- package/dist/providers/anthropic.js.map +1 -1
- package/dist/providers/azure-openai.d.ts +3 -3
- package/dist/providers/azure-openai.d.ts.map +1 -1
- package/dist/providers/azure-openai.js +2 -2
- package/dist/providers/azure-openai.js.map +1 -1
- package/dist/providers/base.d.ts +11 -2
- package/dist/providers/base.d.ts.map +1 -1
- package/dist/providers/base.js +32 -14
- package/dist/providers/base.js.map +1 -1
- package/dist/providers/bedrock.d.ts.map +1 -1
- package/dist/providers/bedrock.js +13 -28
- package/dist/providers/bedrock.js.map +1 -1
- package/dist/providers/gemini.d.ts.map +1 -1
- package/dist/providers/gemini.js +8 -27
- package/dist/providers/gemini.js.map +1 -1
- package/dist/providers/mistral.d.ts.map +1 -1
- package/dist/providers/mistral.js +20 -24
- package/dist/providers/mistral.js.map +1 -1
- package/dist/providers/openai.d.ts.map +1 -1
- package/dist/providers/openai.js +23 -41
- package/dist/providers/openai.js.map +1 -1
- package/dist/types.d.ts +3 -0
- package/dist/types.d.ts.map +1 -1
- package/dist/types.js.map +1 -1
- package/dist/utils/streaming.d.ts +2 -0
- package/dist/utils/streaming.d.ts.map +1 -1
- package/dist/utils/streaming.js +10 -0
- package/dist/utils/streaming.js.map +1 -1
- package/llms-full.txt +1 -1
- package/llms.txt +1 -1
- package/mcp.schema.json +1 -1
- package/package.json +1 -1
package/AGENTS.md
CHANGED
package/README.md
CHANGED
@@ -39,17 +39,26 @@ One call auto-detects and patches every installed provider (OpenAI, Anthropic, A
 
 - [Installation](#installation)
 - [Quick Start](#quick-start)
+- [Current Limitations](#current-limitations)
+- [Is this for me?](#is-this-for-me)
+- [Why this SDK?](#why-this-sdk)
+- [What you can build](#what-you-can-build)
 - [Choose Your Integration Tier](#choose-your-integration-tier)
 - [Support matrix](#support-matrix)
 - [Parity and runtime limitations](#parity-and-runtime-limitations)
 - [Core Concepts](#core-concepts)
+- [User Identity](#user-identity)
+- [Session](#session)
 - [Configuration](#configuration)
+- [Context Dict Conventions](#context-dict-conventions)
 - [Privacy & Content Control](#privacy--content-control)
 - [Cache-Aware Cost Tracking](#cache-aware-cost-tracking)
+- [Semantic Cache Tracking](#semantic-cache-tracking)
 - [Model Tier Classification](#model-tier-classification)
 - [Provider Wrappers](#provider-wrappers)
 - [Streaming Tracking](#streaming-tracking)
 - [Attachment Tracking](#attachment-tracking)
+- [Implicit Feedback](#implicit-feedback)
 - [tool() and observe() HOFs](#tool-and-observe-hofs)
 - [Scoring Patterns](#scoring-patterns)
 - [Enrichments](#enrichments)
@@ -57,6 +66,8 @@ One call auto-detects and patches every installed provider (OpenAI, Anthropic, A
 - [Patching (Zero-Code Instrumentation)](#patching-zero-code-instrumentation)
 - [Auto-Instrumentation CLI](#auto-instrumentation-cli)
 - [Integrations](#integrations)
+- [Data Flow](#data-flow)
+- [Which Integration Should I Use?](#which-integration-should-i-use)
 - [Integration Patterns](#integration-patterns)
 - [Serverless Environments](#serverless-environments)
 - [Error Handling and Reliability](#error-handling-and-reliability)
@@ -152,6 +163,49 @@ The zero-code / CLI setup gives you cost, latency, token counts, and error track
 
 Adding `userId` is one option per call. Adding session context is `session.run()`. See [Session](#session) and [Choose Your Integration Tier](#choose-your-integration-tier).
 
+### Current Limitations
+
+| Area | Status |
+| ---- | ------ |
+| Runtime | Node.js only (no browser). Python SDK available separately ([amplitude-ai on PyPI](https://pypi.org/project/amplitude-ai/)). |
+| Zero-code patching | OpenAI, Anthropic, Azure OpenAI, Gemini, Mistral, Bedrock (Converse/ConverseStream only). |
+| CrewAI | Python-only; the Node.js export throws `ProviderError` by design. Use LangChain or OpenTelemetry integrations instead. |
+| OTEL scope filtering | Not yet supported (Python SDK has `allowed_scopes`/`blocked_scopes`). |
+| Streaming cost tracking | Automatic for OpenAI and Anthropic. Manual token counts required for other providers' streamed responses. |
+
+### Is this for me?
+
+**Yes, if** you're building an AI-powered feature (chatbot, copilot, agent, RAG pipeline) and you want to measure how it impacts real user behavior. AI events land in the same Amplitude project as your product events, so you can build funnels from "user asks a question" to "user converts," create cohorts of users with low AI quality scores, and measure retention without stitching data across tools.
+
+**Already using an LLM observability tool?** Keep it. The [OTEL bridge](#opentelemetry) adds Amplitude as a second destination in one line. Your existing traces stay, and you get product analytics on top.
+
+### Why this SDK?
+
+Most AI observability tools give you traces. This SDK gives you **per-turn events that live in your product analytics** so you can:
+
+- Build funnels from "user opens chat" through "AI responds" to "user converts"
+- Create cohorts of users with low AI quality scores and measure their 7-day retention
+- Answer "is this AI feature helping or hurting?" without moving data between tools
+
+The structural difference is the event model. Trace-centric tools typically produce spans per LLM call. This SDK produces **one event per conversation turn** with 40+ properties: model, tokens, cost, latency, reasoning, implicit feedback signals (regeneration, copy, abandonment), cache breakdowns, agent hierarchy, and experiment context. Each event is independently queryable in Amplitude's charts, cohorts, funnels, and retention analysis.
+
+**Every AI event carries your product `user_id`.** No separate identity system, no data joining required. Build a funnel from "user opens chat" to "AI responds" to "user upgrades" directly in Amplitude.
+
+**Server-side enrichment does the evals for you.** When content is available (`contentMode: 'full'`), Amplitude's enrichment pipeline runs automatically on every session after it closes. You get topic classifications, quality rubrics, behavioral flags, and session outcomes without writing or maintaining any eval code.
+
+**Three content-control tiers.** `full` sends content and Amplitude runs enrichments for you. `metadata_only` sends zero content (you still get cost, latency, tokens, session grouping). `customer_enriched` sends zero content but lets you provide your own structured labels via `trackSessionEnrichment()`.
+
+**Cache-aware cost tracking.** Pass `cacheReadTokens` and `cacheCreationTokens` for accurate blended costs. Without this breakdown, naive cost calculation can overestimate by 2-5x for cache-heavy workloads.
+
+### What you can build
+
+Once AI events are in Amplitude alongside your product events:
+
+- **Cohorts.** "Users who had 3+ task failures in the last 30 days." "Users with low task completion scores." Target them with Guides, measure churn impact.
+- **Funnels.** "AI session about charts -> Chart Created." "Sign Up -> First AI Session -> Conversion." Measure whether AI drives feature adoption and onboarding.
+- **Retention.** Do users with successful AI sessions retain better than those with failures? Segment retention curves by `[Agent] Overall Outcome` or task completion score.
+- **Agent analytics.** Compare quality, cost, and failure rate across agents in one chart. Identify which agent in a multi-agent chain introduced a failure.
+
 ## Choose Your Integration Tier
 
 | Tier | Code Changes | What You Get |
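As an editorial aside on the "overestimate by 2-5x" claim in the added README text above: the arithmetic is easy to sketch. This is a hedged illustration with **hypothetical per-million-token prices** (not Amplitude's or any provider's actual rates); the only assumption that matters is that cache-read input tokens are billed at a steep discount to fresh input tokens, so pricing all input at the base rate overstates cost on cache-heavy calls.

```typescript
// Hypothetical prices per 1M tokens -- illustrative only, not real rates.
const INPUT_PER_M = 2.5; // base input rate
const CACHED_INPUT_PER_M = 0.25; // assumed: cached input billed at 10% of base
const OUTPUT_PER_M = 10.0;

function blendedCost(
  inputTokens: number,
  cacheReadTokens: number,
  outputTokens: number,
): number {
  const freshInput = inputTokens - cacheReadTokens;
  return (
    (freshInput * INPUT_PER_M +
      cacheReadTokens * CACHED_INPUT_PER_M +
      outputTokens * OUTPUT_PER_M) /
    1_000_000
  );
}

// A cache-heavy call: 100k input tokens, 90k served from cache, 1k output.
const naive = blendedCost(100_000, 0, 1_000); // treats all input as fresh: $0.26
const actual = blendedCost(100_000, 90_000, 1_000); // blended: $0.0575
console.log(naive / actual); // ~4.5 -- the naive estimate overstates cost ~4.5x here
```

This is why the SDK asks for `cacheReadTokens`/`cacheCreationTokens` separately rather than a single input-token count.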
@@ -328,6 +382,42 @@ const tenant = ai.tenant('org-456', { env: 'production' });
 const agent = tenant.agent('support-bot', { userId: 'user-123' });
 ```
 
+### User Identity
+
+User identity flows through the **session**, **per-call**, or **middleware** -- not at agent creation or patch time. This keeps the agent reusable across users.
+
+**Via sessions** (recommended): pass `userId` when opening a session:
+
+```typescript
+const agent = ai.agent('support-bot', { env: 'production' });
+const session = agent.session({ userId: 'user-42' });
+
+await session.run(async (s) => {
+  s.trackUserMessage('Hello');
+  // userId inherited from session context
+});
+```
+
+**Per-call**: pass `userId` on each tracking call (useful with the zero-code tier):
+
+```typescript
+agent.trackUserMessage('Hello', {
+  userId: 'user-42',
+  sessionId: 'sess-1',
+});
+```
+
+**Via middleware**: `createAmplitudeAIMiddleware` extracts user identity from the request (see [Middleware](#middleware)):
+
+```typescript
+app.use(
+  createAmplitudeAIMiddleware({
+    amplitudeAI: ai,
+    userIdResolver: (req) => req.headers['x-user-id'] ?? null,
+  }),
+);
+```
+
 ### Session
 
 Async context manager using `AsyncLocalStorage`. Use `session.run()` to execute a callback within session context; session end is tracked automatically on exit:
@@ -345,12 +435,37 @@ Start a new trace within an ongoing session to group related operations:
 ```typescript
 await session.run(async (s) => {
   const traceId = s.newTrace();
-  // All subsequent tracking calls inherit this traceId
   s.trackUserMessage('Follow-up question');
   s.trackAiMessage(response.content, 'gpt-4o', 'openai', latencyMs);
 });
 ```
 
+For sessions where gaps between messages may exceed 30 minutes (e.g., coding assistants, support agents waiting on customer replies), pass `idleTimeoutMinutes` so Amplitude knows the session is still active:
+
+```typescript
+const session = agent.session({
+  userId: 'user-123',
+  idleTimeoutMinutes: 240, // expect up to 4-hour gaps
+});
+```
+
+Without this, sessions with long idle periods may be closed and evaluated prematurely. The default is 30 minutes.
+
+**Link to Session Replay**: If your frontend uses Amplitude's [Session Replay](https://www.docs.developers.amplitude.com/session-replay/), pass the browser's `deviceId` and `browserSessionId` to link AI sessions to browser recordings:
+
+```typescript
+const session = agent.session({
+  userId: 'user-123',
+  deviceId: req.headers['x-amp-device-id'],
+  browserSessionId: req.headers['x-amp-session-id'],
+});
+
+await session.run(async (s) => {
+  s.trackUserMessage('What is retention?');
+  // All events now carry [Amplitude] Session Replay ID = deviceId/browserSessionId
+});
+```
+
 ### tool()
 
 Higher-order function wrapping functions to auto-track as `[Agent] Tool Call` events:
@@ -424,6 +539,58 @@ const ai = new AmplitudeAI({ apiKey: 'YOUR_API_KEY', config });
 | `onEventCallback` | Callback invoked after every tracked event `(event, statusCode, message) => void` |
 | `propagateContext` | Enable cross-service context propagation |
 
+## Context Dict Conventions
+
+The `context` parameter on `ai.agent()` accepts an arbitrary `Record<string, unknown>` that is JSON-serialized and attached to every event as `[Agent] Context`. This is the recommended way to add segmentation dimensions without requiring new global properties.
+
+**Recommended keys:**
+
+| Key | Example Values | Use Case |
+| --- | --- | --- |
+| `agent_type` | `"planner"`, `"executor"`, `"retriever"`, `"router"` | Filter/group analytics by agent role in multi-agent systems. |
+| `experiment_variant` | `"control"`, `"treatment-v2"`, `"prompt-rewrite-a"` | Segment AI sessions by A/B test variant. Compare quality scores, abandonment rates, or cost across experiment arms. |
+| `feature_flag` | `"new-rag-pipeline"`, `"reasoning-model-enabled"` | Track which feature flags were active during the session. |
+| `surface` | `"chat"`, `"search"`, `"copilot"`, `"email-draft"` | Identify which UI surface or product area triggered the AI interaction. |
+| `prompt_revision` | `"v7"`, `"abc123"`, `"2026-02-15"` | Track which prompt version was used. Detect prompt regression when combined with `agentVersion`. |
+| `deployment_region` | `"us-east-1"`, `"eu-west-1"` | Segment by deployment region for latency analysis or compliance tracking. |
+| `canary_group` | `"canary"`, `"stable"` | Identify canary vs. stable deployments for progressive rollout monitoring. |
+
+**Example:**
+
+```typescript
+const agent = ai.agent('support-bot', {
+  userId: 'u1',
+  agentVersion: '4.2.0',
+  context: {
+    agent_type: 'executor',
+    experiment_variant: 'reasoning-enabled',
+    surface: 'chat',
+    feature_flag: 'new-rag-pipeline',
+    prompt_revision: 'v7',
+  },
+});
+
+// All events from this agent (and its sessions, child agents, and provider
+// wrappers) will include [Agent] Context with these keys.
+```
+
+**Context merging in child agents:**
+
+```typescript
+const parent = ai.agent('orchestrator', {
+  context: { experiment_variant: 'treatment', surface: 'chat' },
+});
+const child = parent.child('researcher', {
+  context: { agent_type: 'retriever' },
+});
+// child context = { experiment_variant: 'treatment', surface: 'chat', agent_type: 'retriever' }
+// Child keys override parent keys; parent keys absent from the child are preserved.
+```
+
+**Querying in Amplitude:** The `[Agent] Context` property is a JSON string. Use Amplitude's JSON property parsing to extract individual keys for charts, cohorts, and funnels. For example, group by `[Agent] Context.agent_type` to see metrics by agent role.
+
+> **Note on `experiment_variant` and server-generated events:** Context keys appear on all SDK-emitted events (`[Agent] User Message`, `[Agent] AI Response`, etc.). Server-generated events (`[Agent] Session Evaluation`, `[Agent] Score` with `source="ai"`) do not yet inherit context keys. To segment server-generated quality scores by experiment arm, use Amplitude Derived Properties to extract from `[Agent] Context` on SDK events.
+
 ## Privacy & Content Control
 
 Three content modes control what data is sent to Amplitude:
@@ -548,6 +715,18 @@ s.trackAiMessage(response.content, 'gpt-4o', 'openai', latencyMs, {
 });
 ```
 
+## Semantic Cache Tracking
+
+Track full-response semantic cache hits (distinct from token-level prompt caching above):
+
+```typescript
+s.trackAiMessage(cachedResponse.content, 'gpt-4o', 'openai', latencyMs, {
+  wasCached: true, // served from Redis/semantic cache
+});
+```
+
+Maps to `[Agent] Was Cached`. Enables "cache hit rate" charts and cost optimization analysis. Only emitted when `true`; omitted (not `false`) when the response was not cached.
+
 ## Model Tier Classification
 
 Models are automatically classified into tiers for cost/performance analysis:
@@ -759,6 +938,42 @@ s.trackAiMessage(response.content, 'gpt-4o', 'openai', latencyMs, {
 });
 ```
 
+## Implicit Feedback
+
+Track behavioral signals that indicate whether a response met the user's need, without requiring explicit ratings:
+
+```typescript
+// User asks a question
+s.trackUserMessage('How do I create a funnel?');
+
+// AI responds — user copies the answer (positive signal)
+s.trackAiMessage('To create a funnel, go to...', 'gpt-4o', 'openai', latencyMs, {
+  wasCopied: true,
+});
+
+// User regenerates (negative signal — first response wasn't good enough)
+s.trackUserMessage('How do I create a funnel?', {
+  isRegeneration: true,
+});
+
+// User edits their question (refining intent)
+s.trackUserMessage('How do I create a conversion funnel for signups?', {
+  isEdit: true,
+  editedMessageId: originalMsgId, // links the edit to the original
+});
+```
+
+Track abandonment at session end — a low `abandonmentTurn` (e.g., 1) strongly signals first-response dissatisfaction:
+
+```typescript
+agent.trackSessionEnd({
+  sessionId: 'sess-1',
+  abandonmentTurn: 1, // user left after first AI response
+});
+```
+
+These signals map to `[Agent] Was Copied`, `[Agent] Is Regeneration`, `[Agent] Is Edit`, `[Agent] Edited Message ID`, and `[Agent] Abandonment Turn`. Use them in Amplitude to build quality dashboards without requiring user surveys.
+
 ## tool() and observe() HOFs
 
 ### tool()
@@ -908,6 +1123,78 @@ agent.trackSessionEnrichment(enrichments, {
 });
 ```
 
+### End-to-End Example: `customer_enriched` Mode
+
+This mode is for teams that run their own evaluation pipeline (or can't send message content to Amplitude) but still want rich session-level analytics. Here's a complete workflow:
+
+```typescript
+import {
+  AIConfig,
+  AmplitudeAI,
+  ContentMode,
+  MessageLabel,
+  RubricScore,
+  SessionEnrichments,
+  TopicClassification,
+} from '@amplitude/ai';
+
+// 1. Configure: no content sent to Amplitude
+const ai = new AmplitudeAI({
+  apiKey: process.env.AMPLITUDE_AI_API_KEY!,
+  config: new AIConfig({
+    contentMode: ContentMode.CUSTOMER_ENRICHED,
+  }),
+});
+
+const agent = ai.agent('support-bot', { agentVersion: '2.1.0' });
+
+// 2. Run the conversation — content is NOT sent (metadata only)
+const session = agent.session({ userId: 'user-42' });
+const { sessionId, messageIds } = await session.run(async (s) => {
+  const msgIds: string[] = [];
+  msgIds.push(s.trackUserMessage('Why was I charged twice?'));
+  msgIds.push(
+    s.trackAiMessage(
+      aiResponse.content,
+      'gpt-4o',
+      'openai',
+      latencyMs,
+    ),
+  );
+  return { sessionId: s.sessionId, messageIds: msgIds };
+});
+
+// 3. Run your eval pipeline on the raw messages (e.g., your own LLM judge)
+const evalResults = await myEvalPipeline(conversationHistory);
+
+// 4. Ship enrichments back to Amplitude
+const enrichments = new SessionEnrichments({
+  qualityScore: evalResults.quality,
+  sentimentScore: evalResults.sentiment,
+  overallOutcome: evalResults.outcome,
+  topicClassifications: {
+    'billing': new TopicClassification({
+      topic: 'billing-dispute',
+      confidence: 0.92,
+    }),
+  },
+  rubricScores: [
+    new RubricScore({ name: 'accuracy', score: 4, maxScore: 5 }),
+    new RubricScore({ name: 'helpfulness', score: 5, maxScore: 5 }),
+  ],
+  messageLabels: {
+    [messageIds[0]]: [
+      new MessageLabel({ key: 'intent', value: 'billing-dispute', confidence: 0.94 }),
+    ],
+  },
+  customMetadata: { eval_model: 'gpt-4o-judge-v2' },
+});
+
+agent.trackSessionEnrichment(enrichments, { sessionId });
+```
+
+This produces the same Amplitude event properties as Amplitude's built-in server-side enrichment (topics, rubrics, outcomes, message labels), but sourced from your pipeline. Use it when compliance requires zero-content transmission, or when you need custom evaluation logic beyond what the built-in enrichment provides.
+
 ### Available Enrichment Fields
 
 - **Quality & Sentiment**: `qualityScore`, `sentimentScore`
@@ -924,7 +1211,11 @@ agent.trackSessionEnrichment(enrichments, {
 
 ### Message Labels
 
-Attach classification labels to individual messages within a session
+Attach classification labels to individual messages within a session. Labels are flexible key-value pairs for filtering and segmentation in Amplitude.
+
+**Common use cases:** routing tags (`flow`, `surface`), classifier output (`intent`, `sentiment`, `toxicity`), business context (`tier`, `plan`).
+
+**Inline labels** (at tracking time):
 
 ```typescript
 import { MessageLabel } from '@amplitude/ai';
@@ -945,6 +1236,29 @@ s.trackUserMessage('I want to cancel my subscription', {
 });
 ```
 
+**Retrospective labels** (after the session, from a background pipeline):
+
+When classifier results arrive after the session ends, attach them via `SessionEnrichments.messageLabels`, keyed by the `messageId` returned from tracking calls:
+
+```typescript
+import { MessageLabel, SessionEnrichments } from '@amplitude/ai';
+
+const enrichments = new SessionEnrichments({
+  messageLabels: {
+    [userMsgId]: [
+      new MessageLabel({ key: 'intent', value: 'cancellation', confidence: 0.94 }),
+    ],
+    [aiMsgId]: [
+      new MessageLabel({ key: 'quality', value: 'good', confidence: 0.91 }),
+    ],
+  },
+});
+
+agent.trackSessionEnrichment(enrichments, { sessionId: 'sess-abc123' });
+```
+
+Labels are emitted as `[Agent] Message Labels` on the event. In Amplitude, filter or group by label key/value to build charts like "messages by intent" or "sessions where flow=onboarding".
+
 ## Debug and Dry-Run Modes
 
 ### Debug Mode
@@ -1139,18 +1453,67 @@ const handler = new AmplitudeCallbackHandler({
 
 ### OpenTelemetry
 
+Two exporters add Amplitude as a destination alongside your existing trace backend (Datadog, Honeycomb, Jaeger, etc.):
+
 ```typescript
-import {
+import {
+  AmplitudeAgentExporter,
+  AmplitudeGenAIExporter,
+} from '@amplitude/ai';
+import { NodeTracerProvider } from '@opentelemetry/sdk-trace-node';
+import {
+  BatchSpanProcessor,
+  SimpleSpanProcessor,
+} from '@opentelemetry/sdk-trace-base';
 
-
+const provider = new NodeTracerProvider();
+
+// GenAI exporter — converts gen_ai.* spans into Amplitude AI events
+provider.addSpanProcessor(
+  new BatchSpanProcessor(
+    new AmplitudeGenAIExporter({
+      apiKey: process.env.AMPLITUDE_AI_API_KEY!,
+    }),
+  ),
+);
+
+// Agent exporter — converts agent.* spans into Amplitude session events
+provider.addSpanProcessor(
+  new SimpleSpanProcessor(
+    new AmplitudeAgentExporter({
+      apiKey: process.env.AMPLITUDE_AI_API_KEY!,
+    }),
+  ),
+);
+
 provider.register();
 ```
-
+Only spans with `gen_ai.provider.name` or `gen_ai.system` attributes are processed; all other spans are silently ignored. This means it's safe to add the exporter to a pipeline that produces mixed (GenAI + HTTP + DB) spans.
+
+**Attribute mapping reference:**
 
-
+| OTEL Span Attribute | Amplitude Event Property | Notes |
+| --- | --- | --- |
+| `gen_ai.response.model` / `gen_ai.request.model` | `[Agent] Model` | Response model preferred |
+| `gen_ai.system` / `gen_ai.provider.name` | `[Agent] Provider` | |
+| `gen_ai.usage.input_tokens` | `[Agent] Input Tokens` | |
+| `gen_ai.usage.output_tokens` | `[Agent] Output Tokens` | |
+| `gen_ai.usage.total_tokens` | `[Agent] Total Tokens` | Derived if not present |
+| `gen_ai.usage.cache_read.input_tokens` | `[Agent] Cache Read Tokens` | |
+| `gen_ai.usage.cache_creation.input_tokens` | `[Agent] Cache Creation Tokens` | |
+| `gen_ai.request.temperature` | `[Agent] Temperature` | |
+| `gen_ai.request.top_p` | `[Agent] Top P` | |
+| `gen_ai.request.max_output_tokens` | `[Agent] Max Output Tokens` | |
+| `gen_ai.response.finish_reasons` | `[Agent] Finish Reason` | |
+| `gen_ai.input.messages` | `[Agent] LLM Message` | Only if content mode allows |
+| Span duration | `[Agent] Latency Ms` | |
+| Span status ERROR | `[Agent] Is Error`, `[Agent] Error Message` | |
 
 **Not available via OTEL (use native wrappers):** reasoning content/tokens, TTFB, streaming detection, implicit feedback, file attachments, event graph linking (parent_message_id).
 
+**When to use OTEL vs. native wrappers:** If you already have `@opentelemetry/instrumentation-openai` or similar producing GenAI spans, the OTEL bridge gives you Amplitude analytics with zero code changes. For richer tracking (implicit feedback, streaming metrics, attachments), use the native `wrapOpenAI()`/`wrapAnthropic()` wrappers alongside OTEL.
+
 ### LlamaIndex
 
 ```typescript
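Editorial note on the README's OTEL scope rule ("only spans with `gen_ai.provider.name` or `gen_ai.system` are processed"): the filtering behavior is easy to picture as a predicate over span attributes. This is a hedged sketch mirroring the *documented* behavior, not the exporter's actual implementation, and `isGenAISpan` is a name invented here for illustration.

```typescript
// Sketch of the documented filtering rule: a span is converted to an
// Amplitude AI event only if it carries a GenAI provider attribute.
type SpanAttrs = Record<string, unknown>;

function isGenAISpan(attrs: SpanAttrs): boolean {
  return 'gen_ai.provider.name' in attrs || 'gen_ai.system' in attrs;
}

// An LLM-call span produced by GenAI instrumentation: processed.
isGenAISpan({ 'gen_ai.system': 'openai', 'gen_ai.usage.input_tokens': 1200 }); // true

// An ordinary HTTP span in the same mixed pipeline: silently ignored.
isGenAISpan({ 'http.method': 'GET', 'http.route': '/users' }); // false
```

This is what makes the exporter safe to register on a tracer provider that also emits HTTP and DB spans: non-GenAI spans simply fail the predicate.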
@@ -1180,6 +1543,73 @@ import { AmplitudeCrewAIHooks } from '@amplitude/ai';
 
 In Node.js, `AmplitudeCrewAIHooks` throws a `ProviderError` by design. Use LangChain or OpenTelemetry integrations instead.
 
+## Data Flow
+
+How events flow from your application to Amplitude charts:
+
+```
+Your Application
+  ├── wrapOpenAI() / wrapAnthropic()    ─── auto-emits ──┐
+  ├── session.trackUserMessage()        ─── manual ──────┤
+  ├── session.trackAiMessage()          ─── manual ──────┤
+  ├── agent.trackToolCall()             ─── manual ──────┤
+  ├── agent.trackSessionEnrichment()    ─── manual ──────┤
+  └── OTEL exporter (AmplitudeGenAI...) ─── bridge ──────┤
+                                                         │
+                             AmplitudeAI client ◄────────┘
+                                     │
+                                     ├── validate (if enabled)
+                                     ├── apply middleware chain
+                                     ├── batch events
+                                     │
+                                     ▼
+                             Amplitude HTTP API
+                                     │
+                       ┌─────────────┴──────────────┐
+                       │                            │
+               Amplitude Charts              LLM Enrichment
+             (immediate querying)            Pipeline (async)
+                                                    │
+                                                    ▼
+                                     [Agent] Session Evaluation
+                                     [Agent] Score events
+                                     (topic, rubric, outcome)
+```
+
+**Key points:**
+- All paths converge at the `AmplitudeAI` client, which batches and sends events.
+- Events are available for charting within seconds of ingestion.
+- The LLM Enrichment Pipeline runs asynchronously after session close (only when `contentMode: 'full'`). It produces server-side events like `[Agent] Session Evaluation` and `[Agent] Score`.
+- With `contentMode: 'customer_enriched'`, the enrichment pipeline is skipped — you provide your own enrichments via `trackSessionEnrichment()`.
+
+## Which Integration Should I Use?
+
+Start here and pick the first tier that satisfies your analytics needs:
+
+```
+Do you need per-user analytics
+(funnels, cohorts, retention)?
+            │
+  ┌─── No ──┴── Yes ──┐
+  │                   │
+Tier 0: Zero-Code   Do you need session
+(CLI auto-patch)    grouping & enrichment?
+Cost, latency,              │
+tokens, errors.    ┌─ No ──┴── Yes ──┐
+                   │                 │
+           Tier 1: Events      Do you control
+           Per-call tracking   the LLM call site?
+           + userId                   │
+                           ┌── Yes ──┴── No ──┐
+                           │                  │
+                  Tier 2: Sessions    Tier 3: OTEL Bridge
+                  session.run()       Add exporter to
+                  Full enrichment     existing OTEL pipeline
+                  Implicit feedback   Limited to OTEL attrs
+```
+
+**Rule of thumb:** If you own the LLM call site, start with **Tier 2** (sessions). If you don't (e.g., a third-party framework exports OTEL spans), use **Tier 3** (OTEL bridge). If you just want aggregate cost monitoring without user analytics, **Tier 0** (zero-code) is ready in 60 seconds.
+
 ## Integration Patterns
 
 ### Pattern A: Single-Request API Endpoint
package/dist/mcp/server.d.ts.map
CHANGED
@@ -1 +1 @@
-{"version":3,"file":"server.d.ts","names":[],"sources":["../../src/mcp/server.ts"],"sourcesContent":[],"mappings":";;;KAqCK,WAAA;cAaC,iFAGS;AAlDqD,cA2H9D,YAzFU,EAAA,GAAA,GAyFS,SAzFT;AAAA,cAqaV,YAxZA,EAAA,GAAA,GAwZyB,OAhW9B,CAAA,IAAA,
+{"version":3,"file":"server.d.ts","names":[],"sources":["../../src/mcp/server.ts"],"sourcesContent":[],"mappings":";;;KAqCK,WAAA;cAaC,iFAGS;AAlDqD,cA2H9D,YAzFU,EAAA,GAAA,GAyFS,SAzFT;AAAA,cAqaV,YAxZA,EAAA,GAAA,GAwZyB,OAhW9B,CAAA,IAAA,CAAA"}
package/dist/mcp/server.js
CHANGED
@@ -190,7 +190,7 @@ const createServer = () => {
           text: JSON.stringify({
             query,
             total: selected.length,
-            results: selected.map(({ priority, ...rest }) => rest)
+            results: selected.map(({ priority: _priority, ...rest }) => rest)
           }, null, 2)
         }] };
 });