npm - @tollgateai/sdk - Versions diffs - 0.2.0 → 0.3.0 - Mend

@tollgateai/sdk 0.2.0 → 0.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (8)

package/README.md CHANGED

@@ -1,72 +1,106 @@
 # @tollgateai/sdk
-Track **real** LLM model usage and compute live gross margin with
-[Tollgate](https://tollgateai.vercel.app). The SDK reads the actual `usage`
-object off each provider response — you never hand-count tokens.
+> Real-time gross-margin observability for AI agents. Track every LLM call's cost, attribute it to a customer, and see whether you're making money — before the invoice goes out.
-Published on npm: [@tollgateai/sdk](https://www.npmjs.com/package/@tollgateai/sdk) (v0.2.0).
+**v0.3.0** &middot; [npm](https://www.npmjs.com/package/@tollgateai/sdk) &middot; [Dashboard](https://tollgateai.vercel.app)
-Works with **OpenAI**, **Anthropic**, **AWS Bedrock**, and **every OpenAI-compatible
-gateway** (Vercel AI Gateway, OpenRouter, Groq, Together, Nebius, local vLLM, …) —
-streaming and non-streaming. Cost is computed server-side from the token counts the
-wrappers capture, so no provider has to return a dollar figure.
+---
+## Why Tollgate
+You sell an AI-powered product. Each customer interaction triggers LLM calls that cost you real money — input tokens, output tokens, reasoning tokens, cached tokens, tool calls. Tollgate captures that cost automatically from provider responses, joins it with the revenue your pricing model defines, and shows you per-customer, per-agent, per-run gross margin in real time.
+## Installation
 ```bash
 npm install @tollgateai/sdk
-# or: pnpm add @tollgateai/sdk / yarn add @tollgateai/sdk
 ```
-Create an API key in **Tollgate → Integrations**, then set:
 ```bash
-TOLLGATE_API_KEY=tg_live_xxx
-# optional, defaults to the hosted app:
-TOLLGATE_BASE_URL=https://tollgateai.vercel.app
+pnpm add @tollgateai/sdk   # or yarn add @tollgateai/sdk
 ```
-## Auto-instrumentation (recommended)
+Requires Node.js 18+. Zero runtime dependencies.
-Wrap your provider client once; every call reports real usage in the background.
-### Anthropic
+## Quick Start
 ```ts
 import Anthropic from '@anthropic-ai/sdk';
 import { createTollgateClient, wrapAnthropic } from '@tollgateai/sdk';
-const tollgate = createTollgateClient(); // reads TOLLGATE_API_KEY
-// Pin a runId so every call in this run is grouped and reports cost only.
-const runId = 'ticket_8842';
+const tollgate = createTollgateClient();          // reads TOLLGATE_API_KEY from env
 const anthropic = wrapAnthropic(new Anthropic(), tollgate, {
-  customerId: 'cust_A',     // your end customer
-  runId,
+  customerId: 'cust_acme',
+  runId: 'ticket_8842',
 });
-// Use the client normally — usage is tracked automatically.
-await anthropic.messages.create({
+// Every call is tracked automatically — tokens, cost, tool calls.
+const msg = await anthropic.messages.create({
   model: 'claude-sonnet-4-6',
-  max_tokens: 512,
-  messages: [{ role: 'user', content: 'Resolve this ticket…' }],
+  max_tokens: 1024,
+  messages: [{ role: 'user', content: 'Resolve this billing dispute…' }],
 });
-// Book revenue once, when the run finishes — "no outcome, no charge".
+// Close the run and book revenue.
 await tollgate.resolve({
-  runId,
-  customerId: 'cust_A',
-  outcome: 'resolved',      // 'resolved' | 'escalated' | 'failed'
-  revenueUnitCents: 50,     // charge for this resolved unit ($0.50)
+  runId: 'ticket_8842',
+  customerId: 'cust_acme',
+  outcome: 'resolved',
+  revenueUnitCents: 50,       // $0.50 per resolved ticket
+});
+```
+## Provider Support
+| Provider | Wrapper | Streaming | Tool-Call Tracking |
+|---|---|---|---|
+| Anthropic | `wrapAnthropic` | Automatic | Counts `tool_use` content blocks |
+| OpenAI | `wrapOpenAI` | Needs `stream_options: { include_usage: true }` | Counts `tool_calls` on choices |
+| OpenAI-compatible (Groq, OpenRouter, Together, Nebius, vLLM, …) | `wrapOpenAI` with `provider: 'openai_compatible'` | Same as OpenAI | Same as OpenAI |
+| AWS Bedrock | `wrapBedrock` | Automatic | Counts `toolUse` content blocks |
+## Configuration
+| Environment Variable | Required | Default |
+|---|---|---|
+| `TOLLGATE_API_KEY` | Yes | — |
+| `TOLLGATE_BASE_URL` | No | `https://tollgateai.vercel.app` |
+Or pass them directly:
+```ts
+const tollgate = createTollgateClient({
+  apiKey: 'tg_live_xxx',
+  baseUrl: 'https://tollgateai.vercel.app',
+  timeoutMs: 10_000,   // per-request timeout (default 10s)
+  maxRetries: 2,        // retries on 5xx/429/network (default 2)
 });
 ```
-### Outcome-based pricing
+---
+## Auto-Instrumentation
+Wrap your provider client once. Every `create` call reports usage in the background — non-blocking, fire-and-forget. Failures go to `onError` (default: `console.warn`) and never break your LLM call.
+### Anthropic
+```ts
+import Anthropic from '@anthropic-ai/sdk';
+import { createTollgateClient, wrapAnthropic } from '@tollgateai/sdk';
-Under per-resolution / outcome pricing, only a **resolved** run earns revenue —
-an `escalated`/`failed` run earns $0 but its provider cost still counts against
-you. Wrap your client to meter cost on every call, then call `resolve()` once at
-the end of the run to book the outcome (and, if resolved, its revenue). For
-simple per-call billing you can instead pass `revenueUnitCents` in the wrap
-options and skip `resolve()`.
+const tollgate = createTollgateClient();
+const anthropic = wrapAnthropic(new Anthropic(), tollgate, {
+  customerId: 'cust_acme',
+  runId: 'ticket_8842',
+});
+await anthropic.messages.create({
+  model: 'claude-sonnet-4-6',
+  max_tokens: 512,
+  messages: [{ role: 'user', content: 'Summarize this ticket…' }],
+});
+```
 ### OpenAI
@@ -75,7 +109,7 @@ import OpenAI from 'openai';
 import { createTollgateClient, wrapOpenAI } from '@tollgateai/sdk';
 const tollgate = createTollgateClient();
-const openai = wrapOpenAI(new OpenAI(), tollgate, { customerId: 'cust_A' });
+const openai = wrapOpenAI(new OpenAI(), tollgate, { customerId: 'cust_acme' });
 await openai.chat.completions.create({
   model: 'gpt-4o',
@@ -83,66 +117,140 @@ await openai.chat.completions.create({
 });
 ```
-`revenueUnitCents` may also be a function of the response, e.g.
-`revenueUnitCents: (res) => res.someField ? 50 : 0`.
-### OpenAI-compatible gateways
+### OpenAI-Compatible Gateways
-Point the OpenAI SDK at any compatible endpoint and set `provider:
-'openai_compatible'` so the server prices it from the gateway-echoed model name:
+Point the OpenAI SDK at any compatible endpoint and set `provider: 'openai_compatible'`:
 ```ts
-const openai = new OpenAI({ apiKey: process.env.GROQ_API_KEY, baseURL: 'https://api.groq.com/openai/v1' });
-const client = wrapOpenAI(openai, tollgate, {
-  customerId: 'cust_A',
-  provider: 'openai_compatible',     // Groq / OpenRouter / Together / Nebius / vLLM …
+import OpenAI from 'openai';
+import { createTollgateClient, wrapOpenAI } from '@tollgateai/sdk';
+const tollgate = createTollgateClient();
+const groq = wrapOpenAI(
+  new OpenAI({ apiKey: process.env.GROQ_API_KEY, baseURL: 'https://api.groq.com/openai/v1' }),
+  tollgate,
+  { customerId: 'cust_acme', provider: 'openai_compatible' },
+);
+await groq.chat.completions.create({
+  model: 'llama-3.3-70b-versatile',
+  messages: [{ role: 'user', content: 'Hello' }],
 });
-await client.chat.completions.create({ model: 'llama-3.3-70b-versatile', messages: [...] });
+```
+### AWS Bedrock
+```ts
+import { BedrockRuntimeClient, ConverseCommand } from '@aws-sdk/client-bedrock-runtime';
+import { createTollgateClient, wrapBedrock } from '@tollgateai/sdk';
+const tollgate = createTollgateClient();
+const bedrock = wrapBedrock(
+  new BedrockRuntimeClient({ region: 'us-east-1' }),
+  tollgate,
+  { customerId: 'cust_acme' },
+);
+await bedrock.send(new ConverseCommand({
+  modelId: 'anthropic.claude-3-5-sonnet-20241022-v2:0',
+  messages: [{ role: 'user', content: [{ text: 'Hello' }] }],
+}));
 ```
 ### Streaming
-Streaming is captured automatically. For **OpenAI / compatible**, pass
-`stream_options: { include_usage: true }` (required for a final usage chunk); for
-**Anthropic** no flag is needed. Just iterate the stream as usual:
+Streaming is captured automatically — iterate the stream as usual and usage is reported when the stream ends.
+**OpenAI / compatible** requires `stream_options: { include_usage: true }` for the final usage chunk. **Anthropic** and **Bedrock** need no extra flags.
 ```ts
-const stream = await client.chat.completions.create({
-  model: 'gpt-4o', stream: true, stream_options: { include_usage: true },
+const stream = await openai.chat.completions.create({
+  model: 'gpt-4o',
+  stream: true,
+  stream_options: { include_usage: true },
   messages: [{ role: 'user', content: 'Hello' }],
 });
-for await (const chunk of stream) { /* … */ }   // usage is reported when the stream ends
+for await (const chunk of stream) { /* render to UI */ }
+// Usage reported automatically when stream ends.
 ```
-### AWS Bedrock
+---
+## What Gets Tracked
+Every auto-instrumented call captures the following from the provider response:
+| Field | Source | Description |
+|---|---|---|
+| `tokensIn` | `usage.input_tokens` / `prompt_tokens` | Input tokens consumed |
+| `tokensOut` | `usage.output_tokens` / `completion_tokens` | Output tokens generated |
+| `reasoningTokens` | `completion_tokens_details.reasoning_tokens` | Reasoning/chain-of-thought tokens (OpenAI) |
+| `cachedTokens` | `cache_read_input_tokens` / `cached_tokens` | Prompt cache read tokens |
+| `cacheWrite5mTokens` | `cache_creation_input_tokens` | 5-min TTL cache write tokens |
+| `cacheWrite1hTokens` | `cache_creation.ephemeral_1h_input_tokens` | 1-hour TTL cache write tokens |
+| `toolCalls` | Content block / choice inspection | Number of tool calls in the response |
+| `provider` | Wrapper default or override | `anthropic`, `openai`, `openai_compatible`, `bedrock` |
+| `model` | Response object | Model identifier as reported by the provider |
+Cost is computed **server-side** from token counts and a rate card that auto-syncs daily from the public LiteLLM registry. Unknown models are priced at $0 and flagged in logs.
-Wrap a `BedrockRuntimeClient` so `ConverseCommand` / `ConverseStreamCommand`
-auto-report usage (the model id is read from the command):
+---
+## Outcome-Based Pricing
+Under per-resolution pricing, only a **resolved** run earns revenue. An escalated or failed run earns $0 but its provider cost still counts. The pattern:
+1. **Wrap** to meter cost on every LLM call (automatic).
+2. **Resolve** once at the end to book the outcome.
 ```ts
-import { BedrockRuntimeClient, ConverseCommand } from '@aws-sdk/client-bedrock-runtime';
-import { wrapBedrock } from '@tollgateai/sdk';
+const runId = 'ticket_8842';
+const anthropic = wrapAnthropic(new Anthropic(), tollgate, {
+  customerId: 'cust_acme',
+  runId,
+});
-const bedrock = wrapBedrock(new BedrockRuntimeClient({ region: 'us-east-1' }), tollgate, { customerId: 'cust_A' });
-await bedrock.send(new ConverseCommand({ modelId: 'anthropic.claude-3-5-sonnet-20241022-v2:0', messages: [...] }));
+// … multiple LLM calls within this run …
+await tollgate.resolve({
+  runId,
+  customerId: 'cust_acme',
+  outcome: 'resolved',        // 'resolved' | 'escalated' | 'failed'
+  revenueUnitCents: 50,
+});
 ```
-### Already have an exact cost?
+For simple per-call billing, pass `revenueUnitCents` in the wrap options and skip `resolve()`.
-Pass `providerCostCents` (a number or a function of the response) and the server
-uses it verbatim, skipping the rate card entirely.
+---
-## Manual tracking
+## Customer & Plan Setup
-For full control or unusual providers:
+Create customers and assign plans **before** sending usage so plan-priced revenue is recognized from the first event. Idempotent — safe to run on every boot.
 ```ts
-import { createTollgateClient } from '@tollgateai/sdk';
+await tollgate.upsertCustomer({
+  customerId: 'cust_acme',
+  name: 'Acme Corp',
+  company: 'Acme Corp',
+  seats: 5,
+  plan: {
+    name: 'Pro Plan',
+    pricingModel: 'usage_based',   // per_unit | per_resolution | usage_based | per_seat | flat | hybrid
+    unitRevenueCents: 10,
+  },
+});
+```
-const tollgate = createTollgateClient({ apiKey: process.env.TOLLGATE_API_KEY });
+---
+## Manual Tracking
+For full control, unusual providers, or non-LLM cost events:
+```ts
 await tollgate.track({
-  customerId: 'cust_A',
+  customerId: 'cust_acme',
   runId: 'run_12345',
   provider: 'anthropic',
   model: 'claude-sonnet-4-6',
@@ -150,35 +258,83 @@ await tollgate.track({
   tokensOut: 450,
   reasoningTokens: 0,
   cachedTokens: 0,
+  toolCalls: 2,
   revenueUnitCents: 50,
-  idempotencyKey: 'run_12345#step_1', // exactly-once: safe to retry
+  idempotencyKey: 'run_12345#step_1',
 });
 ```
-## Notes
-- **Idempotent.** Events are deduplicated on `idempotencyKey` (auto-set to the
-  provider response id by the wrappers), so retries never double-count.
-- **No prompt content is ever sent** — only token counts and metadata.
-- **Streaming is auto-tracked** (OpenAI needs `stream_options.include_usage`).
-- **Cost from tokens.** The server prices every event from token counts × a rate
-  card that auto-syncs daily from the public LiteLLM registry — unknown models are
-  priced at $0 and flagged in logs. See [docs/PRICING.md](../../docs/PRICING.md).
-- **Non-blocking.** Auto-instrumented tracking runs in the background; failures
-  are passed to `onError` (default `console.warn`) and never break your call.
-## API
-- `createTollgateClient(options?)` → `{ track(event), resolve(input) }`
-- `resolve({ runId, customerId, outcome, revenueUnitCents? })` → close a run with
-  its outcome; books revenue once, only when `outcome` is `'resolved'`
-- `wrapAnthropic(client, tollgate, options)` → instrumented Anthropic client
-- `wrapOpenAI(client, tollgate, options)` → instrumented OpenAI / compatible client
-- `wrapBedrock(client, tollgate, options)` → instrumented Bedrock Runtime client
-- `anthropicEventFrom` / `openAIEventFrom` / `bedrockEventFrom` → build a track
-  payload manually from a provider response
-`options` accepts `customerId`, `agentId`, `runId`, `revenueUnitCents`,
-`provider` (override; e.g. `'openai_compatible'`), `providerCostCents`, and `onError`.
-Licensed for use with Tollgate. Not open source.
+### Already have an exact cost?
+Pass `providerCostCents` (a number or a function of the response) and the server uses it verbatim, skipping the rate card entirely:
+```ts
+const anthropic = wrapAnthropic(new Anthropic(), tollgate, {
+  customerId: 'cust_acme',
+  providerCostCents: 3.5,   // or: (response) => computeMyOwnCost(response)
+});
+```
+---
+## API Reference
+### Exports
+```ts
+// Client
+createTollgateClient(options?)   // → TollgateClient
+TollgateError                    // Custom error with status & body
+// Auto-instrumentation wrappers
+wrapAnthropic(client, tollgate, options)   // → instrumented Anthropic client
+wrapOpenAI(client, tollgate, options)      // → instrumented OpenAI / compatible client
+wrapBedrock(client, tollgate, options)     // → instrumented Bedrock Runtime client
+// Low-level event builders (for manual track payloads)
+anthropicEventFrom(msg, options)           // → TrackEventInput | null
+openAIEventFrom(completion, options)       // → TrackEventInput | null
+bedrockEventFrom(usage, model, options)    // → TrackEventInput | null
+```
+### TollgateClient
+| Method | Description |
+|---|---|
+| `track(event)` | Report a single usage event. Idempotent on `idempotencyKey`. |
+| `resolve(input)` | Close a run with an outcome. Books revenue only when `outcome` is `'resolved'`. |
+| `upsertCustomer(input)` | Create or update a customer and optionally assign a plan. |
+### InstrumentOptions
+| Field | Type | Required | Description |
+|---|---|---|---|
+| `customerId` | `string` | Yes | Your end customer's stable identifier. |
+| `agentId` | `string` | No | Agent or workflow identifier. |
+| `runId` | `string \| () => string` | No | Logical run ID. Defaults to the provider response ID. |
+| `provider` | `Provider` | No | Override the reported provider (e.g. `'openai_compatible'`). |
+| `revenueUnitCents` | `number \| (response) => number` | No | Revenue per call in cents. |
+| `providerCostCents` | `number \| (response) => number` | No | Exact cost override — skips rate card. |
+| `onError` | `(err) => void` | No | Error handler for background tracking (default: `console.warn`). |
+---
+## How It Works
+1. **Proxy wrappers** intercept `messages.create` / `chat.completions.create` / `send` without modifying the request or response.
+2. After the provider responds, the wrapper extracts token counts, tool call counts, and metadata from the response's `usage` object and content blocks.
+3. A `POST /api/track` is fired **in the background** — non-blocking, with automatic retries on transient failures.
+4. The server computes cost from tokens via rate cards, joins it with your plan-configured revenue, and updates real-time margin rollups.
+5. Events are **idempotent** on `idempotencyKey` (auto-set to the provider response ID), so retries and stream replays never double-count.
+## Privacy & Security
+- **No prompt content is ever sent.** Only token counts, model name, and metadata.
+- Events are deduplicated server-side — safe to retry.
+- Background tracking never throws into your application code.
+---
+## License
+Licensed for use with Tollgate.
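
The "How It Works" steps in the new README (intercept the call, return the provider response untouched, report usage in the background) can be illustrated with a small sketch. This is not the SDK's source: `wrapCreate`, `Tracker`, and `toEvent` are hypothetical names chosen for illustration, and the only behavior taken from the README is that tracking failures go to `onError` (default `console.warn`) and never break the LLM call.

```typescript
// Hypothetical sketch of a fire-and-forget wrapper; not the SDK's implementation.
type Tracker = { track(event: unknown): Promise<unknown> };

function fireAndForget(
  p: Promise<unknown>,
  onError: (err: unknown) => void = console.warn,
): void {
  // Attach a rejection handler so a tracking failure never reaches the caller.
  p.catch(onError);
}

function wrapCreate<A extends unknown[], R>(
  create: (...args: A) => Promise<R>,
  tracker: Tracker,
  toEvent: (response: R) => unknown, // return null to skip tracking
  onError?: (err: unknown) => void,
): (...args: A) => Promise<R> {
  return async (...args: A): Promise<R> => {
    const response = await create(...args); // provider errors propagate unchanged
    const event = toEvent(response);
    if (event) fireAndForget(tracker.track(event), onError);
    return response; // returned as-is, without awaiting the tracking request
  };
}
```

Because the tracking promise is never awaited, the wrapped call resolves as soon as the provider responds; the cost of that choice is that a process exiting immediately afterwards may drop the last event.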

package/dist/index.cjs CHANGED

@@ -23,7 +23,7 @@ function createTollgateClient(opts = {}) {
   if (typeof doFetch !== "function") {
     throw new TollgateError("No fetch implementation available \u2014 pass `fetch` in options.");
   }
-  async function track(event) {
+  async function postJson(path, body) {
     if (!apiKey) {
       throw new TollgateError("Missing API key \u2014 set opts.apiKey or TOLLGATE_API_KEY.");
     }
@@ -32,23 +32,23 @@ function createTollgateClient(opts = {}) {
       const controller = new AbortController();
       const timer = setTimeout(() => controller.abort(), timeoutMs);
       try {
-        const res = await doFetch(`${baseUrl}/api/track`, {
+        const res = await doFetch(`${baseUrl}${path}`, {
           method: "POST",
           headers: {
             "Content-Type": "application/json",
             Authorization: `Bearer ${apiKey}`
           },
-          body: JSON.stringify(event),
+          body: JSON.stringify(body),
           signal: controller.signal
         });
         if (res.ok) {
           return await res.json();
         }
         if (res.status >= 500 || res.status === 429) {
-          lastErr = new TollgateError(`Tollgate track failed (${res.status})`, res.status);
+          lastErr = new TollgateError(`Tollgate request failed (${res.status})`, res.status);
         } else {
-          const body = await res.json().catch(() => ({}));
-          throw new TollgateError(`Tollgate track failed (${res.status})`, res.status, body);
+          const errBody = await res.json().catch(() => ({}));
+          throw new TollgateError(`Tollgate request failed (${res.status})`, res.status, errBody);
         }
       } catch (err) {
         if (err instanceof TollgateError && err.status && err.status < 500 && err.status !== 429) {
@@ -62,7 +62,13 @@ function createTollgateClient(opts = {}) {
         await sleep(2 ** attempt * 200);
       }
     }
-    throw lastErr instanceof Error ? lastErr : new TollgateError("Tollgate track failed after retries");
+    throw lastErr instanceof Error ? lastErr : new TollgateError("Tollgate request failed after retries");
+  }
+  function track(event) {
+    return postJson("/api/track", event);
+  }
+  function upsertCustomer(input) {
+    return postJson("/api/sdk/customer", input);
   }
   function resolve(input) {
     return track({
@@ -80,7 +86,7 @@ function createTollgateClient(opts = {}) {
       ts: input.ts
     });
   }
-  return { track, resolve };
+  return { track, resolve, upsertCustomer };
 }
 // src/instrument.ts
@@ -155,6 +161,7 @@ function anthropicEventFrom(msg, opts) {
   const fivem = usage.cache_creation?.ephemeral_5m_input_tokens;
   const oneh = usage.cache_creation?.ephemeral_1h_input_tokens;
   const hasSplit = fivem !== void 0 || oneh !== void 0;
+  const toolCalls = Array.isArray(msg.content) ? msg.content.filter((b) => b.type === "tool_use").length : 0;
   const event = {
     customerId: opts.customerId,
     agentId: opts.agentId,
@@ -166,6 +173,7 @@ function anthropicEventFrom(msg, opts) {
     cachedTokens: usage.cache_read_input_tokens ?? 0,
     cacheWrite5mTokens: hasSplit ? fivem ?? 0 : usage.cache_creation_input_tokens ?? 0,
     cacheWrite1hTokens: hasSplit ? oneh ?? 0 : 0,
+    toolCalls,
     revenueUnitCents: resolveRevenue(opts, msg),
     idempotencyKey: msg.id ?? `${runId}#${randomId()}`
   };
@@ -178,6 +186,7 @@ function wrapAnthropic(client, tollgate, opts) {
     const result = await original(...args);
     if (isAsyncIterable(result)) {
       const msg = {};
+      const toolUseBlocks = [];
       return instrumentStream(
         result,
         (ev) => {
@@ -187,9 +196,12 @@ function wrapAnthropic(client, tollgate, opts) {
             msg.usage = { ...ev.message.usage };
           } else if (ev.type === "message_delta" && ev.usage) {
             msg.usage = { ...msg.usage ?? {}, output_tokens: ev.usage.output_tokens };
+          } else if (ev.type === "content_block_start" && ev.content_block?.type === "tool_use") {
+            toolUseBlocks.push(ev.content_block);
           }
         },
         () => {
+          msg.content = toolUseBlocks;
           const event2 = anthropicEventFrom(msg, opts);
           if (event2) fireAndForget(tollgate.track(event2), opts.onError);
         }
@@ -214,6 +226,7 @@ function openAIEventFrom(completion, opts) {
   const usage = completion?.usage;
   if (!usage) return null;
   const runId = resolveRunId(opts, completion.id);
+  const toolCalls = completion.choices?.[0]?.message?.tool_calls?.length ?? 0;
   const event = {
     customerId: opts.customerId,
     agentId: opts.agentId,
@@ -224,6 +237,7 @@ function openAIEventFrom(completion, opts) {
     tokensOut: usage.completion_tokens ?? 0,
     reasoningTokens: usage.completion_tokens_details?.reasoning_tokens ?? 0,
     cachedTokens: usage.prompt_tokens_details?.cached_tokens ?? 0,
+    toolCalls,
     revenueUnitCents: resolveRevenue(opts, completion),
     idempotencyKey: completion.id ?? `${runId}#${randomId()}`
   };
@@ -238,16 +252,26 @@ function wrapOpenAI(client, tollgate, opts) {
       let id;
       let model;
       let usage;
+      const toolCallIndices = /* @__PURE__ */ new Set();
       return instrumentStream(
         result,
         (chunk) => {
           if (chunk.id) id = chunk.id;
           if (chunk.model) model = chunk.model;
           if (chunk.usage) usage = chunk.usage;
+          for (const c of chunk.choices ?? []) {
+            for (const tc of c.delta?.tool_calls ?? []) {
+              if (tc.index !== void 0) toolCallIndices.add(tc.index);
+            }
+          }
         },
         () => {
           if (!usage) return;
-          const event2 = openAIEventFrom({ id, model, usage }, opts);
+          const synth = { id, model, usage };
+          if (toolCallIndices.size > 0) {
+            synth.choices = [{ message: { tool_calls: new Array(toolCallIndices.size) } }];
+          }
+          const event2 = openAIEventFrom(synth, opts);
           if (event2) fireAndForget(tollgate.track(event2), opts.onError);
         }
       );
@@ -270,7 +294,7 @@ function wrapOpenAI(client, tollgate, opts) {
     }
   });
 }
-function bedrockEventFrom(usage, model, opts, response = void 0) {
+function bedrockEventFrom(usage, model, opts, response = void 0, toolCalls = 0) {
   if (!usage) return null;
   const runId = resolveRunId(opts, void 0);
   const event = {
@@ -283,6 +307,7 @@ function bedrockEventFrom(usage, model, opts, response = void 0) {
     tokensOut: usage.outputTokens ?? 0,
     cachedTokens: usage.cacheReadInputTokens ?? 0,
     cacheWrite5mTokens: usage.cacheWriteInputTokens ?? 0,
+    toolCalls,
     revenueUnitCents: resolveRevenue(opts, response),
     idempotencyKey: `${runId}#${randomId()}`
   };
@@ -295,20 +320,23 @@ function wrapBedrock(client, tollgate, opts) {
     const model = command?.input?.modelId ?? "unknown";
     if (result?.stream && isAsyncIterable(result.stream)) {
       let usage;
+      let streamToolCalls = 0;
       result.stream = instrumentStream(
         result.stream,
         (ev) => {
           if (ev.metadata?.usage) usage = ev.metadata.usage;
+          if (ev.contentBlockStart?.start?.toolUse) streamToolCalls++;
         },
         () => {
-          const event = bedrockEventFrom(usage, model, opts, result);
+          const event = bedrockEventFrom(usage, model, opts, result, streamToolCalls);
           if (event) fireAndForget(tollgate.track(event), opts.onError);
         }
       );
       return result;
     }
     if (result?.usage) {
-      const event = bedrockEventFrom(result.usage, model, opts, result);
+      const tc = result.output?.message?.content?.filter((b) => b.toolUse != null).length ?? 0;
+      const event = bedrockEventFrom(result.usage, model, opts, result, tc);
       if (event) fireAndForget(tollgate.track(event), opts.onError);
     }
     return result;
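
The streaming change to `wrapOpenAI` above counts tool calls by collecting the distinct `index` values seen on `delta.tool_calls`, since one tool call arrives as several fragments that share an index. A standalone sketch of that counting step follows; the `StreamChunk` and `ToolCallDelta` types and the `countStreamedToolCalls` helper are illustrative assumptions, trimmed to the fields the diff reads, and are not exports of the package.

```typescript
// Minimal, assumed shapes covering only the fields the counting logic reads.
type ToolCallDelta = { index?: number };
type StreamChunk = { choices?: { delta?: { tool_calls?: ToolCallDelta[] } }[] };

// Hypothetical helper: count tool calls in a stream by distinct delta index.
function countStreamedToolCalls(chunks: StreamChunk[]): number {
  const seen = new Set<number>();
  for (const chunk of chunks) {
    for (const choice of chunk.choices ?? []) {
      for (const tc of choice.delta?.tool_calls ?? []) {
        // Fragments of the same tool call repeat the index, so the Set dedupes them.
        if (tc.index !== undefined) seen.add(tc.index);
      }
    }
  }
  return seen.size;
}

const chunks: StreamChunk[] = [
  { choices: [{ delta: { tool_calls: [{ index: 0 }] } }] }, // call 0 starts
  { choices: [{ delta: { tool_calls: [{ index: 0 }] } }] }, // call 0 argument fragment
  { choices: [{ delta: { tool_calls: [{ index: 1 }] } }] }, // call 1 starts
  { choices: [] },                                          // trailing usage-only chunk
];
console.log(countStreamedToolCalls(chunks)); // 2
```

As in the diff, indices are pooled across all choices, so a request that returns more than one choice could undercount if different choices reuse the same index.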