npm - @tollgateai/sdk - Versions diffs - 0.4.0 → 0.6.0 - Mend

@tollgateai/sdk 0.4.0 → 0.6.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (9)

package/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2026 Tollgate
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

package/README.md CHANGED

@@ -2,13 +2,13 @@
 > Real-time gross-margin observability for AI agents. Track every LLM call's cost, attribute it to a customer, and see whether you're making money — before the invoice goes out.
-**v0.4.0** &middot; [npm](https://www.npmjs.com/package/@tollgateai/sdk) &middot; [Dashboard](https://tollgateai.vercel.app)
+**v0.6.0** &middot; [npm](https://www.npmjs.com/package/@tollgateai/sdk) &middot; [Dashboard](https://tollgateai.vercel.app)
 ---
 ## Why Tollgate
-You sell an AI-powered product. Each customer interaction triggers LLM calls that cost you real money — input tokens, output tokens, reasoning tokens, cached tokens, tool calls. Tollgate captures that cost automatically from provider responses, joins it with the revenue your pricing model defines, and shows you per-customer, per-agent, per-run gross margin in real time.
+You sell an AI-powered product. Each customer interaction triggers LLM calls that cost you real money — input tokens, output tokens, reasoning tokens, audio tokens, cached tokens, web searches, tool calls. Tollgate captures that cost automatically from provider responses, joins it with the revenue your pricing model defines, and shows you per-customer, per-agent, per-run gross margin in real time.
 ## Installation
@@ -34,11 +34,11 @@ const anthropic = wrapAnthropic(new Anthropic(), tollgate, {
   runId: 'ticket_8842',
 });
-// Every call is tracked automatically — tokens, cost, tool calls.
+// Every call is tracked automatically — tokens, cost, latency, tool calls.
 const msg = await anthropic.messages.create({
   model: 'claude-sonnet-4-6',
   max_tokens: 1024,
-  messages: [{ role: 'user', content: 'Resolve this billing dispute…' }],
+  messages: [{ role: 'user', content: 'Resolve this billing dispute...' }],
 });
 // Close the run and book revenue.
@@ -52,12 +52,13 @@ await tollgate.resolve({
 ## Provider Support
-| Provider | Wrapper | Streaming | Tool-Call Tracking |
+| Provider | Wrapper | Streaming | What Gets Extracted |
 |---|---|---|---|
-| Anthropic | `wrapAnthropic` | Automatic | Counts `tool_use` content blocks |
-| OpenAI | `wrapOpenAI` | Needs `stream_options: { include_usage: true }` | Counts `tool_calls` on choices |
-| OpenAI-compatible (Groq, OpenRouter, Together, Nebius, vLLM, …) | `wrapOpenAI` with `provider: 'openai_compatible'` | Same as OpenAI | Same as OpenAI |
-| AWS Bedrock | `wrapBedrock` | Automatic | Counts `toolUse` content blocks |
+| **Anthropic** | `wrapAnthropic` | Automatic | Tokens, thinking/reasoning, cache (read + write by TTL), web search requests, tool calls, latency |
+| **OpenAI** | `wrapOpenAI` | `stream_options: { include_usage: true }` | Tokens, reasoning, cached, audio in/out, text in/out, prediction tokens, service tier, tool calls, latency |
+| **Google Gemini** | `wrapGemini` | Automatic | Tokens, thinking, cached, audio/image/video per-modality, web search (grounding), tool calls, latency |
+| **OpenAI-compatible** | `wrapOpenAI` + `provider: 'openai_compatible'` | Same as OpenAI | Same as OpenAI |
+| **AWS Bedrock** | `wrapBedrock` | Automatic | Tokens, cache (read + write), tool calls, latency |
 ## Configuration
@@ -81,7 +82,7 @@ const tollgate = createTollgateClient({
 ## Auto-Instrumentation
-Wrap your provider client once. Every `create` call reports usage in the background — non-blocking, fire-and-forget. Failures go to `onError` (default: `console.warn`) and never break your LLM call.
+Wrap your provider client once. Every `create` / `generateContent` call reports usage in the background — non-blocking, fire-and-forget. Failures go to `onError` (default: `console.warn`) and never break your LLM call.
 ### Anthropic
@@ -98,7 +99,7 @@ const anthropic = wrapAnthropic(new Anthropic(), tollgate, {
 await anthropic.messages.create({
   model: 'claude-sonnet-4-6',
   max_tokens: 512,
-  messages: [{ role: 'user', content: 'Summarize this ticket…' }],
+  messages: [{ role: 'user', content: 'Summarize this ticket...' }],
 });
 ```
@@ -117,6 +118,23 @@ await openai.chat.completions.create({
 });
 ```
+### Google Gemini
+```ts
+import { GoogleGenerativeAI } from '@google/generative-ai';
+import { createTollgateClient, wrapGemini } from '@tollgateai/sdk';
+const tollgate = createTollgateClient();
+const genai = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
+const model = wrapGemini(
+  genai.getGenerativeModel({ model: 'gemini-2.0-flash' }),
+  tollgate,
+  { customerId: 'cust_acme' },
+);
+const result = await model.generateContent('Explain quantum computing');
+```
 ### OpenAI-Compatible Gateways
 Point the OpenAI SDK at any compatible endpoint and set `provider: 'openai_compatible'`:
@@ -159,9 +177,9 @@ await bedrock.send(new ConverseCommand({
 ### Streaming
-Streaming is captured automatically — iterate the stream as usual and usage is reported when the stream ends.
+Streaming is captured automatically. Iterate the stream as usual — usage and latency are reported when the stream ends.
-**OpenAI / compatible** requires `stream_options: { include_usage: true }` for the final usage chunk. **Anthropic** and **Bedrock** need no extra flags.
+**OpenAI / compatible** requires `stream_options: { include_usage: true }`. **Anthropic**, **Gemini**, and **Bedrock** need no extra flags.
 ```ts
 const stream = await openai.chat.completions.create({
@@ -171,38 +189,45 @@ const stream = await openai.chat.completions.create({
   messages: [{ role: 'user', content: 'Hello' }],
 });
 for await (const chunk of stream) { /* render to UI */ }
-// Usage reported automatically when stream ends.
+// Usage + latency reported automatically when stream ends.
 ```
 ---
 ## What Gets Tracked
-Every auto-instrumented call captures the following from the provider response:
+Every auto-instrumented call captures these fields from the provider response:
-| Field | Source | Description |
+| Field | Providers | Description |
 |---|---|---|
-| `tokensIn` | `usage.input_tokens` / `prompt_tokens` | Input tokens consumed |
-| `tokensOut` | `usage.output_tokens` / `completion_tokens` | Output tokens generated |
-| `reasoningTokens` | `completion_tokens_details.reasoning_tokens` | Reasoning/chain-of-thought tokens (OpenAI) |
-| `cachedTokens` | `cache_read_input_tokens` / `cached_tokens` | Prompt cache read tokens |
-| `cacheWrite5mTokens` | `cache_creation_input_tokens` | 5-min TTL cache write tokens |
-| `cacheWrite1hTokens` | `cache_creation.ephemeral_1h_input_tokens` | 1-hour TTL cache write tokens |
-| `toolCalls` | Content block / choice inspection | Number of tool calls in the response |
-| `externalCostCents` | User-provided | Cost of external tools/services (image gen, sandbox, search) |
-| `provider` | Wrapper default or override | `anthropic`, `openai`, `openai_compatible`, `bedrock` |
-| `model` | Response object | Model identifier as reported by the provider |
-Cost is computed **server-side** from token counts and a rate card that auto-syncs daily from the public LiteLLM registry. Unknown models are priced at $0 and flagged in logs.
+| `tokensIn` | All | Input tokens consumed |
+| `tokensOut` | All | Output tokens generated |
+| `reasoningTokens` | OpenAI, Anthropic, Gemini | Reasoning/thinking tokens (billed at reasoning rate) |
+| `cachedTokens` | All | Prompt cache read tokens (reduced rate) |
+| `cacheWrite5mTokens` | Anthropic, Bedrock | 5-min TTL cache creation tokens |
+| `cacheWrite1hTokens` | Anthropic | 1-hour TTL cache creation tokens |
+| `audioTokensIn` | OpenAI | Audio input tokens (GPT-4o audio / Realtime) |
+| `audioTokensOut` | OpenAI, Gemini | Audio output tokens |
+| `imageTokensIn` | Gemini | Image/vision input tokens |
+| `imageTokensOut` | Gemini | Image generation output tokens |
+| `videoTokensIn` | Gemini | Video input tokens |
+| `textTokensIn` | OpenAI, Gemini | Text-only input tokens (modality split) |
+| `textTokensOut` | OpenAI, Gemini | Text-only output tokens |
+| `webSearchRequests` | Anthropic, Gemini | Web search requests (server tools / grounding) |
+| `acceptedPredictionTokens` | OpenAI | Predicted Outputs: accepted tokens |
+| `rejectedPredictionTokens` | OpenAI | Predicted Outputs: rejected tokens (waste) |
+| `serviceTier` | OpenAI | Service tier used (`default`, `flex`, `priority`) |
+| `latencyMs` | All | SDK-measured request duration in milliseconds |
+| `toolCalls` | All | Number of tool calls in the response |
+| `model` | All | Model identifier as reported by the provider |
+Cost is computed **server-side** from token counts and a rate card that auto-syncs daily from the LiteLLM registry (1,500+ models). Rate cards include per-token pricing for text, audio, image, video, cache, reasoning, and web search. Unknown models are priced at $0 and flagged in logs.
 ---
 ## Outcome-Based Pricing
-Under per-resolution pricing, only a **resolved** run earns revenue. An escalated or failed run earns $0 but its provider cost still counts. The pattern:
-1. **Wrap** to meter cost on every LLM call (automatic).
-2. **Resolve** once at the end to book the outcome.
+Under per-resolution pricing, only a **resolved** run earns revenue. An escalated or failed run earns $0 but its provider cost still counts.
 ```ts
 const runId = 'ticket_8842';
@@ -211,7 +236,7 @@ const anthropic = wrapAnthropic(new Anthropic(), tollgate, {
   runId,
 });
-// … multiple LLM calls within this run …
+// ... multiple LLM calls within this run ...
 await tollgate.resolve({
   runId,
@@ -227,12 +252,9 @@ For simple per-call billing, pass `revenueUnitCents` in the wrap options and ski
 ## External Tool Costs
-AI agents often call external services — image generation, code sandboxes, search APIs, vector databases — that cost real money outside of LLM token pricing. Report these costs alongside the LLM call so Tollgate includes them in the margin calculation.
+Report costs from external services (image generation, code sandboxes, search APIs) alongside LLM calls:
 ```ts
-// Agent generates an image during the run
-const image = await dalle.generate({ prompt: '...' }); // costs $0.04
 await tollgate.track({
   customerId: 'cust_acme',
   runId: 'ticket_8842',
@@ -240,35 +262,21 @@ await tollgate.track({
   model: 'gpt-4o',
   tokensIn: 500,
   tokensOut: 200,
-  toolCalls: 1,                // LLM tool_use count (auto-extracted by wrappers)
   externalCostCents: 4.0,     // $0.04 for the DALL-E call
   idempotencyKey: 'ticket_8842#step_2',
 });
 ```
-Common examples:
-| Service | Typical Cost | How to report |
-|---|---|---|
-| DALL-E image generation | ~$0.04/image | `externalCostCents: 4` |
-| E2B code sandbox | ~$0.01/run | `externalCostCents: 1` |
-| Tavily search API | ~$0.01/search | `externalCostCents: 1` |
-| Pinecone vector query | ~$0.001/query | `externalCostCents: 0.1` |
-External costs flow into the **Tools** bucket in the Customer Drawer's cost-split chart — so you can see exactly what share of each customer's cost comes from external services vs. LLM tokens.
 ---
 ## Customer & Plan Setup
-Create customers and assign plans **before** sending usage so plan-priced revenue is recognized from the first event. Idempotent — safe to run on every boot.
+Create customers and assign plans before sending usage so plan-priced revenue is recognized from the first event. Idempotent.
 ```ts
 await tollgate.upsertCustomer({
   customerId: 'cust_acme',
   name: 'Acme Corp',
-  company: 'Acme Corp',
-  seats: 5,
   plan: {
     name: 'Pro Plan',
     pricingModel: 'usage_based',   // per_unit | per_resolution | usage_based | per_seat | flat | hybrid
@@ -279,57 +287,26 @@ await tollgate.upsertCustomer({
 ---
-## Manual Tracking
-For full control, unusual providers, or non-LLM cost events:
-```ts
-await tollgate.track({
-  customerId: 'cust_acme',
-  runId: 'run_12345',
-  provider: 'anthropic',
-  model: 'claude-sonnet-4-6',
-  tokensIn: 1200,
-  tokensOut: 450,
-  reasoningTokens: 0,
-  cachedTokens: 0,
-  toolCalls: 2,
-  revenueUnitCents: 50,
-  idempotencyKey: 'run_12345#step_1',
-});
-```
-### Already have an exact cost?
-Pass `providerCostCents` (a number or a function of the response) and the server uses it verbatim, skipping the rate card entirely:
-```ts
-const anthropic = wrapAnthropic(new Anthropic(), tollgate, {
-  customerId: 'cust_acme',
-  providerCostCents: 3.5,   // or: (response) => computeMyOwnCost(response)
-});
-```
----
 ## API Reference
 ### Exports
 ```ts
 // Client
-createTollgateClient(options?)   // → TollgateClient
+createTollgateClient(options?)   // -> TollgateClient
 TollgateError                    // Custom error with status & body
 // Auto-instrumentation wrappers
-wrapAnthropic(client, tollgate, options)   // → instrumented Anthropic client
-wrapOpenAI(client, tollgate, options)      // → instrumented OpenAI / compatible client
-wrapBedrock(client, tollgate, options)     // → instrumented Bedrock Runtime client
+wrapAnthropic(client, tollgate, options)       // -> instrumented Anthropic client
+wrapOpenAI(client, tollgate, options)          // -> instrumented OpenAI / compatible client
+wrapBedrock(client, tollgate, options)         // -> instrumented Bedrock Runtime client
+wrapGemini(model, tollgate, options)           // -> instrumented Gemini model
 // Low-level event builders (for manual track payloads)
-anthropicEventFrom(msg, options)           // → TrackEventInput | null
-openAIEventFrom(completion, options)       // → TrackEventInput | null
-bedrockEventFrom(usage, model, options)    // → TrackEventInput | null
+anthropicEventFrom(msg, options)               // -> TrackEventInput | null
+openAIEventFrom(completion, options)           // -> TrackEventInput | null
+bedrockEventFrom(usage, model, options)        // -> TrackEventInput | null
+geminiEventFrom(response, options)             // -> TrackEventInput | null
 ```
 ### TollgateClient
@@ -344,32 +321,54 @@ bedrockEventFrom(usage, model, options)    // → TrackEventInput | null
 | Field | Type | Required | Description |
 |---|---|---|---|
-| `customerId` | `string` | Yes | Your end customer's stable identifier. |
-| `agentId` | `string` | No | Agent or workflow identifier. |
-| `runId` | `string \| () => string` | No | Logical run ID. Defaults to the provider response ID. |
-| `provider` | `Provider` | No | Override the reported provider (e.g. `'openai_compatible'`). |
-| `revenueUnitCents` | `number \| (response) => number` | No | Revenue per call in cents. |
-| `providerCostCents` | `number \| (response) => number` | No | Exact cost override — skips rate card. |
-| `onError` | `(err) => void` | No | Error handler for background tracking (default: `console.warn`). |
+| `customerId` | `string` | Yes | Your end customer's stable identifier |
+| `agentId` | `string` | No | Agent or workflow identifier |
+| `runId` | `string \| () => string` | No | Logical run ID (defaults to provider response ID) |
+| `provider` | `Provider` | No | Override the reported provider |
+| `revenueUnitCents` | `number \| (response) => number` | No | Revenue per call in cents |
+| `providerCostCents` | `number \| (response) => number` | No | Exact cost override (skips rate card) |
+| `onError` | `(err) => void` | No | Error handler for background tracking |
 ---
 ## How It Works
-1. **Proxy wrappers** intercept `messages.create` / `chat.completions.create` / `send` without modifying the request or response.
-2. After the provider responds, the wrapper extracts token counts, tool call counts, and metadata from the response's `usage` object and content blocks.
-3. A `POST /api/track` is fired **in the background** — non-blocking, with automatic retries on transient failures.
-4. The server computes cost from tokens via rate cards, joins it with your plan-configured revenue, and updates real-time margin rollups.
-5. Events are **idempotent** on `idempotencyKey` (auto-set to the provider response ID), so retries and stream replays never double-count.
+1. **Proxy wrappers** intercept provider calls without modifying the request or response.
+2. After the provider responds, the wrapper extracts token counts (by modality), tool calls, service tier, and latency from the response.
+3. A `POST /api/track` fires **in the background** with automatic retries on transient failures.
+4. The server computes cost from tokens via rate cards (text, audio, image, video, cache, reasoning, web search), joins it with plan-configured revenue, and updates real-time margin rollups.
+5. Events are **idempotent** on `idempotencyKey` (auto-set to the provider response ID).
 ## Privacy & Security
 - **No prompt content is ever sent.** Only token counts, model name, and metadata.
-- Events are deduplicated server-side — safe to retry.
+- Events are deduplicated server-side.
 - Background tracking never throws into your application code.
 ---
+## What's New in v0.6.0
+- **Fix: Anthropic thinking token extraction** — `output_tokens_details.thinking_tokens` is now extracted and costed at the reasoning rate instead of the output rate. Previously, thinking tokens from extended thinking (Sonnet 4.x, Opus 4.x) were invisible to cost computation.
+- **Fix: OpenAI double-counting** — `completion_tokens` includes reasoning and audio sub-totals; these are now subtracted from `tokensOut` so each token is costed at exactly one rate. Previously, reasoning tokens were billed at both the output rate and the reasoning rate.
+- **Fix: OpenAI input double-counting** — `prompt_tokens` includes cached and audio sub-totals; these are now subtracted from `tokensIn`. Previously, cached tokens were billed at both the full input rate and the cached rate.
+- **Fix: Multimodal-only events** — audio, image, video, and web search events now trigger rate-card lookup even when text token counts are zero.
+- `reasoningTokens` is now extracted from **all three** providers: OpenAI, Anthropic, and Gemini.
+### v0.5.0
+- Google Gemini / Vertex AI support (`wrapGemini`) with full multimodal extraction
+- Audio token tracking (OpenAI GPT-4o audio / Realtime API)
+- Image & video token tracking (Gemini per-modality breakdowns)
+- Web search request tracking (Anthropic `server_tool_use`, Gemini grounding)
+- Latency measurement on all wrappers (SDK-measured `latencyMs`)
+- OpenAI Predicted Outputs (`acceptedPredictionTokens` / `rejectedPredictionTokens`)
+- Service tier tracking (OpenAI `flex` / `priority`, Anthropic `priority`)
+- Text modality split for accurate cost attribution in mixed-modal requests
+- Expanded rate card sync: audio, image, video, and web search rates from LiteLLM
+---
 ## License
 Licensed for use with Tollgate.
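
Note on the two OpenAI double-counting fixes listed under "What's New in v0.6.0": the sketch below illustrates the bucket split in TypeScript, assuming only the usage fields that `openAIEventFrom` reads in the `package/dist/index.cjs` hunk. The names `OpenAIUsage`, `UsageBuckets`, and `splitOpenAIUsage` are hypothetical and are not SDK exports. Like the diffed code, it does not clamp negative results.

```typescript
// Hypothetical types: only the fields read by the diffed openAIEventFrom.
interface OpenAIUsage {
  prompt_tokens?: number;
  completion_tokens?: number;
  prompt_tokens_details?: { cached_tokens?: number; audio_tokens?: number };
  completion_tokens_details?: { reasoning_tokens?: number; audio_tokens?: number };
}

interface UsageBuckets {
  tokensIn: number;
  tokensOut: number;
  reasoningTokens: number;
  cachedTokens: number;
  audioTokensIn: number;
  audioTokensOut: number;
}

// Split the provider totals into disjoint buckets so each token can be
// priced at exactly one rate.
function splitOpenAIUsage(usage: OpenAIUsage): UsageBuckets {
  const cachedIn = usage.prompt_tokens_details?.cached_tokens ?? 0;
  const audioIn = usage.prompt_tokens_details?.audio_tokens ?? 0;
  const reasoningOut = usage.completion_tokens_details?.reasoning_tokens ?? 0;
  const audioOut = usage.completion_tokens_details?.audio_tokens ?? 0;
  return {
    // prompt_tokens includes the cached and audio sub-totals, so remove them.
    tokensIn: (usage.prompt_tokens ?? 0) - cachedIn - audioIn,
    // completion_tokens includes the reasoning and audio sub-totals.
    tokensOut: (usage.completion_tokens ?? 0) - reasoningOut - audioOut,
    reasoningTokens: reasoningOut,
    cachedTokens: cachedIn,
    audioTokensIn: audioIn,
    audioTokensOut: audioOut,
  };
}

// Illustrative numbers only, not taken from a real response.
const buckets = splitOpenAIUsage({
  prompt_tokens: 1000,
  completion_tokens: 500,
  prompt_tokens_details: { cached_tokens: 600, audio_tokens: 100 },
  completion_tokens_details: { reasoning_tokens: 300, audio_tokens: 50 },
});
console.log(buckets);
```

With these inputs the input-side buckets sum back to `prompt_tokens` and the output-side buckets sum back to `completion_tokens`, which is the property the fix relies on.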

package/dist/index.cjs CHANGED

@@ -162,6 +162,7 @@ function anthropicEventFrom(msg, opts) {
   const oneh = usage.cache_creation?.ephemeral_1h_input_tokens;
   const hasSplit = fivem !== void 0 || oneh !== void 0;
   const toolCalls = Array.isArray(msg.content) ? msg.content.filter((b) => b.type === "tool_use").length : 0;
+  const thinkingTokens = usage.output_tokens_details?.thinking_tokens ?? 0;
   const event = {
     customerId: opts.customerId,
     agentId: opts.agentId,
@@ -169,10 +170,12 @@ function anthropicEventFrom(msg, opts) {
     provider: opts.provider ?? "anthropic",
     model: msg.model ?? "unknown",
     tokensIn: usage.input_tokens ?? 0,
-    tokensOut: usage.output_tokens ?? 0,
+    tokensOut: (usage.output_tokens ?? 0) - thinkingTokens,
+    reasoningTokens: thinkingTokens,
     cachedTokens: usage.cache_read_input_tokens ?? 0,
     cacheWrite5mTokens: hasSplit ? fivem ?? 0 : usage.cache_creation_input_tokens ?? 0,
     cacheWrite1hTokens: hasSplit ? oneh ?? 0 : 0,
+    webSearchRequests: usage.server_tool_use?.web_search_requests ?? 0,
     toolCalls,
     revenueUnitCents: resolveRevenue(opts, msg),
     idempotencyKey: msg.id ?? `${runId}#${randomId()}`
@@ -183,6 +186,7 @@ function wrapAnthropic(client, tollgate, opts) {
   const messages = client.messages;
   const original = messages.create.bind(messages);
   const create = async (...args) => {
+    const t0 = Date.now();
     const result = await original(...args);
     if (isAsyncIterable(result)) {
       const msg = {};
@@ -195,7 +199,11 @@ function wrapAnthropic(client, tollgate, opts) {
             msg.model = ev.message.model;
             msg.usage = { ...ev.message.usage };
           } else if (ev.type === "message_delta" && ev.usage) {
-            msg.usage = { ...msg.usage ?? {}, output_tokens: ev.usage.output_tokens };
+            msg.usage = {
+              ...msg.usage ?? {},
+              output_tokens: ev.usage.output_tokens,
+              output_tokens_details: ev.usage.output_tokens_details
+            };
           } else if (ev.type === "content_block_start" && ev.content_block?.type === "tool_use") {
             toolUseBlocks.push(ev.content_block);
           }
@@ -203,12 +211,18 @@ function wrapAnthropic(client, tollgate, opts) {
         () => {
           msg.content = toolUseBlocks;
           const event2 = anthropicEventFrom(msg, opts);
-          if (event2) fireAndForget(tollgate.track(event2), opts.onError);
+          if (event2) {
+            event2.latencyMs = Date.now() - t0;
+            fireAndForget(tollgate.track(event2), opts.onError);
+          }
         }
       );
     }
     const event = anthropicEventFrom(result, opts);
-    if (event) fireAndForget(tollgate.track(event), opts.onError);
+    if (event) {
+      event.latencyMs = Date.now() - t0;
+      fireAndForget(tollgate.track(event), opts.onError);
+    }
     return result;
   };
   return new Proxy(client, {
@@ -227,16 +241,29 @@ function openAIEventFrom(completion, opts) {
   if (!usage) return null;
   const runId = resolveRunId(opts, completion.id);
   const toolCalls = completion.choices?.[0]?.message?.tool_calls?.length ?? 0;
+  const ptd = usage.prompt_tokens_details;
+  const ctd = usage.completion_tokens_details;
+  const cachedIn = ptd?.cached_tokens ?? 0;
+  const audioIn = ptd?.audio_tokens ?? 0;
+  const reasoningOut = ctd?.reasoning_tokens ?? 0;
+  const audioOut = ctd?.audio_tokens ?? 0;
   const event = {
     customerId: opts.customerId,
     agentId: opts.agentId,
     runId,
     provider: opts.provider ?? "openai",
     model: completion.model ?? "unknown",
-    tokensIn: usage.prompt_tokens ?? 0,
-    tokensOut: usage.completion_tokens ?? 0,
-    reasoningTokens: usage.completion_tokens_details?.reasoning_tokens ?? 0,
-    cachedTokens: usage.prompt_tokens_details?.cached_tokens ?? 0,
+    tokensIn: (usage.prompt_tokens ?? 0) - cachedIn - audioIn,
+    tokensOut: (usage.completion_tokens ?? 0) - reasoningOut - audioOut,
+    reasoningTokens: reasoningOut,
+    cachedTokens: cachedIn,
+    audioTokensIn: audioIn,
+    audioTokensOut: audioOut,
+    textTokensIn: ptd?.text_tokens ?? 0,
+    textTokensOut: ctd?.text_tokens ?? 0,
+    acceptedPredictionTokens: ctd?.accepted_prediction_tokens ?? 0,
+    rejectedPredictionTokens: ctd?.rejected_prediction_tokens ?? 0,
+    serviceTier: completion.service_tier,
     toolCalls,
     revenueUnitCents: resolveRevenue(opts, completion),
     idempotencyKey: completion.id ?? `${runId}#${randomId()}`
@@ -247,11 +274,13 @@ function wrapOpenAI(client, tollgate, opts) {
   const completions = client.chat.completions;
   const original = completions.create.bind(completions);
   const create = async (...args) => {
+    const t0 = Date.now();
     const result = await original(...args);
     if (isAsyncIterable(result)) {
       let id;
       let model;
       let usage;
+      let serviceTier;
       const toolCallIndices = /* @__PURE__ */ new Set();
       return instrumentStream(
         result,
@@ -259,6 +288,7 @@ function wrapOpenAI(client, tollgate, opts) {
           if (chunk.id) id = chunk.id;
           if (chunk.model) model = chunk.model;
           if (chunk.usage) usage = chunk.usage;
+          if (chunk.service_tier) serviceTier = chunk.service_tier;
           for (const c of chunk.choices ?? []) {
             for (const tc of c.delta?.tool_calls ?? []) {
               if (tc.index !== void 0) toolCallIndices.add(tc.index);
@@ -267,17 +297,23 @@ function wrapOpenAI(client, tollgate, opts) {
         },
         () => {
           if (!usage) return;
-          const synth = { id, model, usage };
+          const synth = { id, model, usage, service_tier: serviceTier };
           if (toolCallIndices.size > 0) {
             synth.choices = [{ message: { tool_calls: new Array(toolCallIndices.size) } }];
           }
           const event2 = openAIEventFrom(synth, opts);
-          if (event2) fireAndForget(tollgate.track(event2), opts.onError);
+          if (event2) {
+            event2.latencyMs = Date.now() - t0;
+            fireAndForget(tollgate.track(event2), opts.onError);
+          }
         }
       );
     }
     const event = openAIEventFrom(result, opts);
-    if (event) fireAndForget(tollgate.track(event), opts.onError);
+    if (event) {
+      event.latencyMs = Date.now() - t0;
+      fireAndForget(tollgate.track(event), opts.onError);
+    }
     return result;
   };
   return new Proxy(client, {
@@ -316,6 +352,7 @@ function bedrockEventFrom(usage, model, opts, response = void 0, toolCalls = 0)
 function wrapBedrock(client, tollgate, opts) {
   const originalSend = client.send.bind(client);
   const send = async (command, ...rest) => {
+    const t0 = Date.now();
     const result = await originalSend(command, ...rest);
     const model = command?.input?.modelId ?? "unknown";
     if (result?.stream && isAsyncIterable(result.stream)) {
@@ -329,7 +366,10 @@ function wrapBedrock(client, tollgate, opts) {
         },
         () => {
           const event = bedrockEventFrom(usage, model, opts, result, streamToolCalls);
-          if (event) fireAndForget(tollgate.track(event), opts.onError);
+          if (event) {
+            event.latencyMs = Date.now() - t0;
+            fireAndForget(tollgate.track(event), opts.onError);
+          }
         }
       );
       return result;
@@ -337,7 +377,10 @@ function wrapBedrock(client, tollgate, opts) {
     if (result?.usage) {
       const tc = result.output?.message?.content?.filter((b) => b.toolUse != null).length ?? 0;
       const event = bedrockEventFrom(result.usage, model, opts, result, tc);
-      if (event) fireAndForget(tollgate.track(event), opts.onError);
+      if (event) {
+        event.latencyMs = Date.now() - t0;
+        fireAndForget(tollgate.track(event), opts.onError);
+      }
     }
     return result;
   };
@@ -348,14 +391,110 @@ function wrapBedrock(client, tollgate, opts) {
     }
   });
 }
+function modalityTokens(details, modality) {
+  if (!details) return 0;
+  return details.filter((d) => d.modality === modality).reduce((sum, d) => sum + (d.tokenCount ?? 0), 0);
+}
+function geminiEventFrom(response, opts) {
+  const usage = response?.usageMetadata;
+  if (!usage) return null;
+  const runId = resolveRunId(opts, void 0);
+  const candidates = response.candidates ?? [];
+  const toolCalls = candidates.reduce((sum, c) => {
+    const parts = c.content?.parts ?? [];
+    return sum + parts.filter((p) => p.functionCall != null).length;
+  }, 0);
+  const webSearchRequests = candidates.reduce((sum, c) => {
+    return sum + (c.groundingMetadata?.webSearchQueries?.length ?? 0);
+  }, 0);
+  const promptDetails = usage.promptTokensDetails;
+  const candidateDetails = usage.candidatesTokensDetails;
+  const event = {
+    customerId: opts.customerId,
+    agentId: opts.agentId,
+    runId,
+    provider: opts.provider ?? "google",
+    model: "unknown",
+    tokensIn: usage.promptTokenCount ?? 0,
+    tokensOut: usage.candidatesTokenCount ?? 0,
+    reasoningTokens: usage.thoughtsTokenCount ?? 0,
+    cachedTokens: usage.cachedContentTokenCount ?? 0,
+    audioTokensIn: modalityTokens(promptDetails, "AUDIO"),
+    audioTokensOut: modalityTokens(candidateDetails, "AUDIO"),
+    imageTokensIn: modalityTokens(promptDetails, "IMAGE"),
+    imageTokensOut: modalityTokens(candidateDetails, "IMAGE"),
+    videoTokensIn: modalityTokens(promptDetails, "VIDEO"),
+    textTokensIn: modalityTokens(promptDetails, "TEXT"),
+    textTokensOut: modalityTokens(candidateDetails, "TEXT"),
+    webSearchRequests,
+    toolCalls,
+    revenueUnitCents: resolveRevenue(opts, response),
+    idempotencyKey: `${runId}#${randomId()}`
+  };
+  return withCost(event, opts, response);
+}
+function wrapGemini(model, tollgate, opts) {
+  const original = model.generateContent.bind(model);
+  const modelName = model.model ?? "unknown";
+  const generateContent = async (...args) => {
+    const t0 = Date.now();
+    const result = await original(...args);
+    if (isAsyncIterable(result)) {
+      const accumulated = {};
+      let toolCallCount = 0;
+      let searchCount = 0;
+      return instrumentStream(
+        result,
+        (chunk) => {
+          if (chunk.usageMetadata) {
+            Object.assign(accumulated, chunk.usageMetadata);
+          }
+          for (const c of chunk.candidates ?? []) {
+            for (const p of c.content?.parts ?? []) {
+              if (p.functionCall != null) toolCallCount++;
+            }
+            searchCount += c.groundingMetadata?.webSearchQueries?.length ?? 0;
+          }
+        },
+        () => {
+          const synth = {
+            usageMetadata: accumulated,
+            candidates: searchCount > 0 || toolCallCount > 0 ? [{ content: { parts: new Array(toolCallCount).fill({ functionCall: {} }) }, groundingMetadata: { webSearchQueries: new Array(searchCount) } }] : []
+          };
+          const event2 = geminiEventFrom(synth, opts);
+          if (event2) {
+            event2.model = modelName;
+            event2.latencyMs = Date.now() - t0;
+            fireAndForget(tollgate.track(event2), opts.onError);
+          }
+        }
+      );
+    }
+    const event = geminiEventFrom(result, opts);
+    if (event) {
+      event.model = modelName;
+      event.latencyMs = Date.now() - t0;
+      fireAndForget(tollgate.track(event), opts.onError);
+    }
+    return result;
+  };
+  return new Proxy(model, {
+    get(target, prop, recv) {
+      if (prop === "generateContent") return generateContent;
+      return Reflect.get(target, prop, recv);
+    }
+  });
+}
 exports.TollgateError = TollgateError;
 exports.anthropicEventFrom = anthropicEventFrom;
 exports.bedrockEventFrom = bedrockEventFrom;
 exports.createTollgateClient = createTollgateClient;
+exports.geminiEventFrom = geminiEventFrom;
 exports.openAIEventFrom = openAIEventFrom;
 exports.wrapAnthropic = wrapAnthropic;
 exports.wrapBedrock = wrapBedrock;
+exports.wrapGemini = wrapGemini;
 exports.wrapOpenAI = wrapOpenAI;
 //# sourceMappingURL=index.cjs.map
 //# sourceMappingURL=index.cjs.map
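
Note on the Gemini per-modality extraction added in this file: the following is a typed TypeScript sketch of the `modalityTokens` helper from the hunk above, run against made-up data. The `ModalityTokenCount` interface and the sample `promptDetails` array are assumptions for illustration. The field names `modality` and `tokenCount` are the ones the diffed code reads.

```typescript
// Hypothetical type: one entry of a per-modality token breakdown.
interface ModalityTokenCount {
  modality?: string;
  tokenCount?: number;
}

// Sum the token counts of every entry matching the requested modality.
// A missing details array counts as zero, as in the diffed helper.
function modalityTokens(
  details: ModalityTokenCount[] | undefined,
  modality: string,
): number {
  if (!details) return 0;
  return details
    .filter((d) => d.modality === modality)
    .reduce((sum, d) => sum + (d.tokenCount ?? 0), 0);
}

// Illustrative breakdown only, not taken from a real response.
const promptDetails: ModalityTokenCount[] = [
  { modality: "TEXT", tokenCount: 120 },
  { modality: "IMAGE", tokenCount: 258 },
  { modality: "TEXT", tokenCount: 30 },
];

const textIn = modalityTokens(promptDetails, "TEXT");
const imageIn = modalityTokens(promptDetails, "IMAGE");
const audioIn = modalityTokens(promptDetails, "AUDIO");
console.log({ textIn, imageIn, audioIn });
```

Unlike the OpenAI path, the diffed `geminiEventFrom` reports `promptTokenCount` as `tokensIn` without subtracting these per-modality counts, so the split fields sit alongside the total rather than partitioning it.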