ai-sdk-rate-limiter 0.5.0 → 0.7.0
- package/README.md +403 -189
- package/dist/cli.js +1029 -0
- package/dist/index.cjs +208 -87
- package/dist/index.cjs.map +1 -1
- package/dist/index.d.cts +2 -2
- package/dist/index.d.ts +2 -2
- package/dist/index.js +208 -87
- package/dist/index.js.map +1 -1
- package/dist/otel.cjs +75 -0
- package/dist/otel.cjs.map +1 -0
- package/dist/otel.d.cts +63 -0
- package/dist/otel.d.ts +63 -0
- package/dist/otel.js +72 -0
- package/dist/otel.js.map +1 -0
- package/dist/redis.d.cts +1 -1
- package/dist/redis.d.ts +1 -1
- package/dist/testing.cjs +2007 -0
- package/dist/testing.cjs.map +1 -0
- package/dist/testing.d.cts +59 -0
- package/dist/testing.d.ts +59 -0
- package/dist/testing.js +2005 -0
- package/dist/testing.js.map +1 -0
- package/dist/{types-CgePLtmQ.d.cts → types-D7qskXNw.d.cts} +54 -1
- package/dist/{types-CgePLtmQ.d.ts → types-D7qskXNw.d.ts} +54 -1
- package/package.json +29 -2
package/README.md
CHANGED
|
@@ -56,9 +56,152 @@ The wrapped model is a drop-in replacement. Every Vercel AI SDK feature — stre
|
|
|
56
56
|
|
|
57
57
|
**Raw SDK support** — Works with the native OpenAI, Anthropic, Groq, Mistral, and Cohere SDKs directly via a transparent JavaScript Proxy. No Vercel AI SDK required.
|
|
58
58
|
|
|
59
|
+
**OpenTelemetry** — Drop-in OTel plugin that emits GenAI-spec spans for every request. Works with any OTel-compatible tracer.
|
|
60
|
+
|
|
61
|
+
**CLI audit** — `npx ai-sdk-rate-limiter audit` probes your API keys to detect your actual tier limits and generates a ready-to-paste config override.
|
|
62
|
+
|
|
63
|
+
---
|
|
64
|
+
|
|
65
|
+
## Contents
|
|
66
|
+
|
|
67
|
+
- [Vercel AI SDK usage](#vercel-ai-sdk-usage)
|
|
68
|
+
- [Raw SDK proxy](#raw-sdk-proxy)
|
|
69
|
+
- [Configuration reference](#configuration-reference)
|
|
70
|
+
- [Per-request options](#per-request-options)
|
|
71
|
+
- [Cost tracking](#cost-tracking)
|
|
72
|
+
- [Budget fallback routing](#budget-fallback-routing)
|
|
73
|
+
- [Multi-instance Redis store](#multi-instance-redis-store)
|
|
74
|
+
- [Events](#events)
|
|
75
|
+
- [Backpressure](#backpressure)
|
|
76
|
+
- [Error handling](#error-handling)
|
|
77
|
+
- [OpenTelemetry](#opentelemetry)
|
|
78
|
+
- [CLI audit](#cli-audit)
|
|
79
|
+
- [Model registry](#model-registry)
|
|
80
|
+
- [Advanced usage](#advanced-usage)
|
|
81
|
+
- [How it works](#how-it-works)
|
|
82
|
+
- [Comparison](#comparison)
|
|
83
|
+
- [TypeScript](#typescript)
|
|
84
|
+
- [Requirements](#requirements)
|
|
85
|
+
|
|
59
86
|
---
|
|
60
87
|
|
|
61
|
-
##
|
|
88
|
+
## Vercel AI SDK usage
|
|
89
|
+
|
|
90
|
+
### Basic wrap
|
|
91
|
+
|
|
92
|
+
```typescript
|
|
93
|
+
import { createRateLimiter } from 'ai-sdk-rate-limiter'
|
|
94
|
+
import { openai } from '@ai-sdk/openai'
|
|
95
|
+
import { generateText, streamText } from 'ai'
|
|
96
|
+
|
|
97
|
+
const limiter = createRateLimiter()
|
|
98
|
+
const model = limiter.wrap(openai('gpt-4o'))
|
|
99
|
+
|
|
100
|
+
// generateText
|
|
101
|
+
const { text } = await generateText({ model, prompt: 'Summarize this...' })
|
|
102
|
+
|
|
103
|
+
// streamText — streaming is first-class, rate limit slot consumed at request start
|
|
104
|
+
const result = streamText({ model, messages })
|
|
105
|
+
for await (const chunk of result.textStream) {
|
|
106
|
+
process.stdout.write(chunk)
|
|
107
|
+
}
|
|
108
|
+
```
|
|
109
|
+
|
|
110
|
+
### Using the raw middleware
|
|
111
|
+
|
|
112
|
+
If you use `wrapLanguageModel` directly or need to compose middleware:
|
|
113
|
+
|
|
114
|
+
```typescript
|
|
115
|
+
import { wrapLanguageModel } from 'ai'
|
|
116
|
+
|
|
117
|
+
// Single middleware
|
|
118
|
+
const model = wrapLanguageModel({
|
|
119
|
+
model: openai('gpt-4o'),
|
|
120
|
+
middleware: limiter.middleware,
|
|
121
|
+
})
|
|
122
|
+
|
|
123
|
+
// Composed with other middleware
|
|
124
|
+
const model = wrapLanguageModel({
|
|
125
|
+
model: openai('gpt-4o'),
|
|
126
|
+
middleware: [loggingMiddleware, limiter.middleware, cachingMiddleware],
|
|
127
|
+
})
|
|
128
|
+
```
|
|
129
|
+
|
|
130
|
+
---
|
|
131
|
+
|
|
132
|
+
## Raw SDK proxy
|
|
133
|
+
|
|
134
|
+
If you're using the OpenAI, Anthropic, Groq, Mistral, or Cohere SDK directly — without the Vercel AI SDK — use `limiter.rawProxy()` to add rate limiting as a transparent drop-in:
|
|
135
|
+
|
|
136
|
+
```typescript
|
|
137
|
+
import { createRateLimiter } from 'ai-sdk-rate-limiter'
|
|
138
|
+
import OpenAI from 'openai'
|
|
139
|
+
import Anthropic from '@anthropic-ai/sdk'
|
|
140
|
+
|
|
141
|
+
const limiter = createRateLimiter({
|
|
142
|
+
cost: { budget: { daily: 50 }, onExceeded: 'throw' },
|
|
143
|
+
on: { rateLimited: ({ model }) => console.warn(`${model} rate limited`) },
|
|
144
|
+
})
|
|
145
|
+
|
|
146
|
+
// Every API call goes through the same rate limiter and cost tracker
|
|
147
|
+
const openai = limiter.rawProxy(new OpenAI())
|
|
148
|
+
const anthropic = limiter.rawProxy(new Anthropic())
|
|
149
|
+
|
|
150
|
+
// Use exactly as before — no other changes needed
|
|
151
|
+
const completion = await openai.chat.completions.create({
|
|
152
|
+
model: 'gpt-4o',
|
|
153
|
+
messages: [{ role: 'user', content: 'Hello!' }],
|
|
154
|
+
})
|
|
155
|
+
|
|
156
|
+
const message = await anthropic.messages.create({
|
|
157
|
+
model: 'claude-opus-4-6',
|
|
158
|
+
max_tokens: 1024,
|
|
159
|
+
messages: [{ role: 'user', content: 'Hello!' }],
|
|
160
|
+
})
|
|
161
|
+
|
|
162
|
+
// Cost from both clients tracked together
|
|
163
|
+
const report = limiter.getCostReport()
|
|
164
|
+
```
|
|
165
|
+
|
|
166
|
+
**Streaming** — the proxy wraps the returned `AsyncIterable` to capture the final usage chunk automatically:
|
|
167
|
+
|
|
168
|
+
```typescript
|
|
169
|
+
const stream = await openai.chat.completions.create({
|
|
170
|
+
model: 'gpt-4o',
|
|
171
|
+
messages: [{ role: 'user', content: 'Stream this' }],
|
|
172
|
+
stream: true,
|
|
173
|
+
stream_options: { include_usage: true }, // OpenAI: include usage in final chunk
|
|
174
|
+
})
|
|
175
|
+
|
|
176
|
+
for await (const chunk of stream) {
|
|
177
|
+
process.stdout.write(chunk.choices[0]?.delta?.content ?? '')
|
|
178
|
+
}
|
|
179
|
+
// After the loop, tokens are recorded in limiter.getCostReport()
|
|
180
|
+
```
|
|
181
|
+
|
|
182
|
+
**Standalone — no shared limiter needed:**
|
|
183
|
+
|
|
184
|
+
```typescript
|
|
185
|
+
import { rateLimited } from 'ai-sdk-rate-limiter'
|
|
186
|
+
|
|
187
|
+
const openai = rateLimited(new OpenAI(), {
|
|
188
|
+
config: { cost: { budget: { daily: 20 } } },
|
|
189
|
+
})
|
|
190
|
+
```
|
|
191
|
+
|
|
192
|
+
**Override auto-detected provider** — useful for OpenAI-compatible endpoints:
|
|
193
|
+
|
|
194
|
+
```typescript
|
|
195
|
+
const client = limiter.rawProxy(new OpenAI({ baseURL: 'https://api.groq.com/openai/v1' }), {
|
|
196
|
+
provider: 'groq', // use Groq's limits and pricing instead of OpenAI's
|
|
197
|
+
})
|
|
198
|
+
```
|
|
199
|
+
|
|
200
|
+
Provider is auto-detected from the client's constructor name (`OpenAI` → `openai`, `Anthropic` → `anthropic`, `Groq` → `groq`, etc.).
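A minimal sketch of constructor-name detection, written for illustration and not taken from the package source. `detectProvider` and `KNOWN_PROVIDERS` are hypothetical names, the mapping covers only the three names the README spells out, and the stub classes stand in for the real SDK clients:

```typescript
// Illustrative only: a hypothetical helper, not the package's implementation.
type Provider = 'openai' | 'anthropic' | 'groq' | 'mistral' | 'cohere'

// Lowercased constructor name -> provider key (assumed mapping).
const KNOWN_PROVIDERS: Record<string, Provider> = {
  openai: 'openai',
  anthropic: 'anthropic',
  groq: 'groq',
}

function detectProvider(client: object, override?: Provider): Provider | undefined {
  // An explicit override always wins, e.g. an OpenAI client pointed at Groq.
  if (override !== undefined) return override
  const name = client.constructor?.name ?? ''
  return KNOWN_PROVIDERS[name.toLowerCase()]
}

// Stub classes standing in for the real SDK clients.
class OpenAI {}
class Anthropic {}

const fromName = detectProvider(new OpenAI())
const fromAnthropic = detectProvider(new Anthropic())
const overridden = detectProvider(new OpenAI(), 'groq')
const unknown = detectProvider({})
console.log(fromName, fromAnthropic, overridden, unknown)
```

Name-based detection can fail when a bundler minifies class names, which is one more reason to pass `provider` explicitly for anything other than a stock client.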
|
|
201
|
+
|
|
202
|
+
---
|
|
203
|
+
|
|
204
|
+
## Configuration reference
|
|
62
205
|
|
|
63
206
|
Everything has a sensible default. Override only what you need.
|
|
64
207
|
|
|
@@ -67,17 +210,15 @@ const limiter = createRateLimiter({
|
|
|
67
210
|
// Override or extend built-in model limits for your API tier
|
|
68
211
|
limits: {
|
|
69
212
|
'gpt-4o': { rpm: 500, itpm: 30_000 },
|
|
70
|
-
'claude-opus-4-6': { rpm: 50, itpm:
|
|
71
|
-
// Wildcard: apply to all models from a provider
|
|
72
|
-
'openai/*': { rpm: 1000 },
|
|
213
|
+
'claude-opus-4-6': { rpm: 50, itpm: 20_000 },
|
|
73
214
|
},
|
|
74
215
|
|
|
75
216
|
// Cost budgets and behavior when exceeded
|
|
76
217
|
cost: {
|
|
77
218
|
budget: {
|
|
78
|
-
hourly: 5, // USD
|
|
79
|
-
daily: 50,
|
|
80
|
-
monthly: 500,
|
|
219
|
+
hourly: 5, // USD — hard cap per hour
|
|
220
|
+
daily: 50, // USD — hard cap per day
|
|
221
|
+
monthly: 500, // USD — hard cap per month
|
|
81
222
|
},
|
|
82
223
|
onExceeded: 'throw', // 'throw' | 'queue' | 'fallback'
|
|
83
224
|
},
|
|
@@ -100,14 +241,14 @@ const limiter = createRateLimiter({
|
|
|
100
241
|
retryOn: [429, 500, 502, 503, 504],
|
|
101
242
|
},
|
|
102
243
|
|
|
103
|
-
// Observability
|
|
244
|
+
// Observability — see Events section for all available events
|
|
104
245
|
on: {
|
|
105
246
|
rateLimited: ({ model, source, resetAt }) =>
|
|
106
247
|
console.warn(`${model} rate limited (${source}), resets ${new Date(resetAt).toISOString()}`),
|
|
107
248
|
retrying: ({ model, attempt, delayMs }) =>
|
|
108
249
|
console.log(`${model} retry ${attempt} in ${delayMs}ms`),
|
|
109
250
|
budgetHit: ({ model, currentCostUsd, limitUsd, period }) =>
|
|
110
|
-
alerts.send(`${model} hit $${limitUsd} ${period} budget
|
|
251
|
+
alerts.send(`${model} hit $${limitUsd} ${period} budget`),
|
|
111
252
|
completed: ({ model, inputTokens, outputTokens, costUsd, latencyMs }) =>
|
|
112
253
|
metrics.record({ model, inputTokens, outputTokens, costUsd, latencyMs }),
|
|
113
254
|
},
|
|
@@ -123,14 +264,14 @@ Pass options to individual requests via `providerOptions.rateLimiter`:
|
|
|
123
264
|
```typescript
|
|
124
265
|
import { generateText } from 'ai'
|
|
125
266
|
|
|
126
|
-
// High-priority
|
|
267
|
+
// High-priority — skips ahead of normal traffic in the queue
|
|
127
268
|
await generateText({
|
|
128
269
|
model,
|
|
129
270
|
prompt: 'Urgent user request...',
|
|
130
271
|
providerOptions: {
|
|
131
272
|
rateLimiter: {
|
|
132
|
-
priority: 'high',
|
|
133
|
-
timeout: 10_000,
|
|
273
|
+
priority: 'high', // 'high' | 'normal' | 'low'
|
|
274
|
+
timeout: 10_000, // override the default queue timeout for this request
|
|
134
275
|
},
|
|
135
276
|
},
|
|
136
277
|
})
|
|
@@ -145,7 +286,7 @@ await generateText({
|
|
|
145
286
|
})
|
|
146
287
|
```
|
|
147
288
|
|
|
148
|
-
This is the
|
|
289
|
+
This is the recommended way to colocate user requests and background jobs on the same model without background jobs starving users.
|
|
149
290
|
|
|
150
291
|
---
|
|
151
292
|
|
|
@@ -157,7 +298,7 @@ const report = limiter.getCostReport()
|
|
|
157
298
|
|
|
158
299
|
console.log(report)
|
|
159
300
|
// {
|
|
160
|
-
// hour: { requests: 42, inputTokens: 84_000, outputTokens: 21_000,
|
|
301
|
+
// hour: { requests: 42, inputTokens: 84_000, outputTokens: 21_000, costUsd: 0.29 },
|
|
161
302
|
// day: { requests: 318, inputTokens: 620_000, outputTokens: 155_000, costUsd: 2.11 },
|
|
162
303
|
// month: { requests: 4821, inputTokens: 9_100_000, outputTokens: 2_200_000, costUsd: 34.80 },
|
|
163
304
|
// byModel: {
|
|
@@ -173,17 +314,17 @@ Costs are based on **actual token counts** from API responses — not estimates.
|
|
|
173
314
|
|
|
174
315
|
## Budget fallback routing
|
|
175
316
|
|
|
176
|
-
When a budget limit is hit, you can transparently reroute to a cheaper model instead of throwing an error
|
|
317
|
+
When a budget limit is hit, you can transparently reroute to a cheaper model instead of throwing an error:
|
|
177
318
|
|
|
178
319
|
```typescript
|
|
179
320
|
const limiter = createRateLimiter({
|
|
180
321
|
cost: {
|
|
181
322
|
budget: { daily: 10 },
|
|
182
|
-
onExceeded: 'fallback',
|
|
323
|
+
onExceeded: 'fallback', // reroute to fallback instead of throwing
|
|
183
324
|
},
|
|
184
325
|
on: {
|
|
185
|
-
budgetHit: ({ model,
|
|
186
|
-
console.warn(`${model}
|
|
326
|
+
budgetHit: ({ model, usingFallback }) =>
|
|
327
|
+
console.warn(`${model} budget hit — ${usingFallback ? 'using fallback' : 'throwing'}`),
|
|
187
328
|
},
|
|
188
329
|
})
|
|
189
330
|
|
|
@@ -192,17 +333,11 @@ const model = limiter.wrap(
|
|
|
192
333
|
{ fallback: openai('gpt-4o-mini') }, // used when budget is exceeded
|
|
193
334
|
)
|
|
194
335
|
|
|
195
|
-
// Under budget
|
|
196
|
-
// Over $10/day
|
|
336
|
+
// Under budget → uses gpt-4o normally
|
|
337
|
+
// Over $10/day → silently switches to gpt-4o-mini, no code changes needed
|
|
197
338
|
const result = await generateText({ model, prompt })
|
|
198
339
|
```
|
|
199
340
|
|
|
200
|
-
**How it works:**
|
|
201
|
-
1. The budget is checked before every request against total rolling spend
|
|
202
|
-
2. When exceeded, `BudgetExceededError` is caught inside `wrap()` before it reaches your code
|
|
203
|
-
3. The request is re-executed against the fallback model, bypassing the budget pre-check
|
|
204
|
-
4. Fallback usage is tracked under the fallback model's ID in `getCostReport()`
|
|
205
|
-
|
|
206
341
|
**Behavior matrix:**
|
|
207
342
|
|
|
208
343
|
| `onExceeded` | `fallback` configured | Outcome |
|
|
@@ -212,11 +347,13 @@ const result = await generateText({ model, prompt })
|
|
|
212
347
|
| `'fallback'` | no | Throws `BudgetExceededError` |
|
|
213
348
|
| `'queue'` | any | Queues until period resets |
|
|
214
349
|
|
|
350
|
+
Fallback usage is tracked under the fallback model's ID in `getCostReport()`.
|
|
351
|
+
|
|
215
352
|
---
|
|
216
353
|
|
|
217
354
|
## Multi-instance Redis store
|
|
218
355
|
|
|
219
|
-
By default, rate limit state is in-memory (per-process).
|
|
356
|
+
By default, rate limit state is in-memory (per-process). For multi-instance deployments — multiple pods, serverless replicas, workers — each instance has its own counters. Install the Redis store to share state:
|
|
220
357
|
|
|
221
358
|
```
|
|
222
359
|
npm install ioredis
|
|
@@ -229,162 +366,91 @@ import Redis from 'ioredis'
|
|
|
229
366
|
|
|
230
367
|
const limiter = createRateLimiter({
|
|
231
368
|
store: new RedisStore(new Redis(process.env.REDIS_URL)),
|
|
232
|
-
// ... rest of your config
|
|
369
|
+
// ... rest of your config unchanged
|
|
233
370
|
})
|
|
234
371
|
```
|
|
235
372
|
|
|
236
373
|
That's the entire change. All APIs — `wrap()`, `rawProxy()`, events, cost reports — work identically. The Redis store enforces rate limits collectively so no two instances can jointly exceed the API limits.
|
|
237
374
|
|
|
238
|
-
**How it works:**
|
|
239
|
-
|
|
240
|
-
Each request atomically runs a Lua script that:
|
|
241
|
-
1. Removes entries older than 60 seconds from a sorted set (`ZREMRANGEBYSCORE`)
|
|
242
|
-
2. Counts remaining requests and sums input tokens
|
|
243
|
-
3. Checks against RPM and ITPM limits
|
|
244
|
-
4. If allowed: reserves the slot (`ZADD`) and returns immediately
|
|
245
|
-
5. If blocked: returns the timestamp when the next slot opens
|
|
246
|
-
|
|
247
|
-
The local queue (priority ordering, drain timer, timeout handling) stays in-memory per instance — only the window counters are shared.
|
|
248
|
-
|
|
249
375
|
**Options:**
|
|
250
376
|
|
|
251
377
|
```typescript
|
|
252
378
|
new RedisStore(redis, {
|
|
253
|
-
keyPrefix: 'rl:myapp:',
|
|
254
|
-
windowMs: 60_000,
|
|
379
|
+
keyPrefix: 'rl:myapp:', // namespace if multiple apps share one Redis instance
|
|
380
|
+
windowMs: 60_000, // window size in ms; match your provider's limit window
|
|
255
381
|
})
|
|
256
382
|
```
|
|
257
383
|
|
|
258
|
-
**
|
|
384
|
+
**How it works internally:**
|
|
385
|
+
|
|
386
|
+
Each request runs a Lua script atomically that: removes stale entries from a sorted set, counts requests and tokens in the current window, checks against RPM and ITPM limits, and either reserves the slot or returns when the next slot opens. The local queue (priority ordering, drain timer, timeout handling) stays in-memory per instance — only the window counters are shared via Redis.
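The steps described above can be sketched in memory as follows. This is an illustration of the sliding-window idea only, not the package's Lua script: `SlidingWindow`, `tryReserve`, and the `Decision` shape are hypothetical names, and the retry time is simplified to the moment the oldest entry leaves the window.

```typescript
// Illustrative in-memory version of the described steps; names are assumptions.
interface Limits { rpm: number; itpm: number }
interface Entry { at: number; tokens: number }
type Decision = { allowed: true } | { allowed: false; retryAt: number }

class SlidingWindow {
  private entries: Entry[] = []
  private readonly limits: Limits
  private readonly windowMs: number

  constructor(limits: Limits, windowMs: number = 60_000) {
    this.limits = limits
    this.windowMs = windowMs
  }

  tryReserve(now: number, tokens: number): Decision {
    // 1. Remove entries that have fallen out of the window.
    this.entries = this.entries.filter((e) => e.at > now - this.windowMs)
    // 2. Count requests and sum input tokens still inside the window.
    const requests = this.entries.length
    const usedTokens = this.entries.reduce((sum, e) => sum + e.tokens, 0)
    // 3. Check both limits; 4. reserve the slot if there is room.
    if (requests < this.limits.rpm && usedTokens + tokens <= this.limits.itpm) {
      this.entries.push({ at: now, tokens })
      return { allowed: true }
    }
    // 5. Blocked: report when the oldest entry expires (a simplification).
    const oldest = this.entries[0]
    return { allowed: false, retryAt: (oldest ? oldest.at : now) + this.windowMs }
  }
}

const win = new SlidingWindow({ rpm: 2, itpm: 100 }, 1_000)
const first = win.tryReserve(0, 10)
const second = win.tryReserve(100, 10)
const third = win.tryReserve(200, 10) // over rpm until the entry at t=0 expires
const fourth = win.tryReserve(1_001, 10)
console.log(first, second, third, fourth)
```

In the Redis-backed store the same check-and-reserve sequence has to run as a single atomic script, because separate round-trips from two instances could otherwise both pass the check before either reserves a slot.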
|
|
387
|
+
|
|
388
|
+
**Compatible clients** — any client with `eval()`, `get()`, and `set()` works: `ioredis`, `node-redis`, Upstash Redis.
|
|
259
389
|
|
|
260
|
-
|
|
390
|
+
Use the default `InMemoryStore` for single-instance deployments — it's more accurate (true sliding window, no network round-trips) and zero-config. Only switch to `RedisStore` when you actually need cross-instance coordination.
|
|
261
391
|
|
|
262
392
|
---
|
|
263
393
|
|
|
264
|
-
##
|
|
394
|
+
## Events
|
|
265
395
|
|
|
266
|
-
|
|
396
|
+
All events are typed. Register handlers at creation time or dynamically:
|
|
267
397
|
|
|
268
398
|
```typescript
|
|
269
|
-
|
|
270
|
-
import OpenAI from 'openai'
|
|
271
|
-
import Anthropic from '@anthropic-ai/sdk'
|
|
272
|
-
|
|
399
|
+
// At creation time
|
|
273
400
|
const limiter = createRateLimiter({
|
|
274
|
-
|
|
275
|
-
on: { rateLimited: ({ model }) => console.warn(`${model} rate limited`) },
|
|
276
|
-
})
|
|
277
|
-
|
|
278
|
-
// Every API call goes through the same rate limiter and cost tracker
|
|
279
|
-
const openai = limiter.rawProxy(new OpenAI())
|
|
280
|
-
const anthropic = limiter.rawProxy(new Anthropic())
|
|
281
|
-
|
|
282
|
-
// Use exactly as before — no other changes needed
|
|
283
|
-
const completion = await openai.chat.completions.create({
|
|
284
|
-
model: 'gpt-4o',
|
|
285
|
-
messages: [{ role: 'user', content: 'Hello!' }],
|
|
286
|
-
})
|
|
287
|
-
|
|
288
|
-
const message = await anthropic.messages.create({
|
|
289
|
-
model: 'claude-opus-4-6',
|
|
290
|
-
max_tokens: 1024,
|
|
291
|
-
messages: [{ role: 'user', content: 'Hello!' }],
|
|
401
|
+
on: { rateLimited: handler },
|
|
292
402
|
})
|
|
293
403
|
|
|
294
|
-
//
|
|
295
|
-
|
|
296
|
-
|
|
297
|
-
|
|
298
|
-
**Streaming works too** — the proxy wraps the returned `AsyncIterable` to capture the final usage chunk automatically:
|
|
299
|
-
|
|
300
|
-
```typescript
|
|
301
|
-
const stream = await openai.chat.completions.create({
|
|
302
|
-
model: 'gpt-4o',
|
|
303
|
-
messages: [{ role: 'user', content: 'Stream this' }],
|
|
304
|
-
stream: true,
|
|
305
|
-
stream_options: { include_usage: true }, // tells OpenAI to include usage in final chunk
|
|
404
|
+
// Dynamically
|
|
405
|
+
limiter.on('queued', ({ model, queueDepth, estimatedWaitMs }) => {
|
|
406
|
+
console.log(`${model} queued (depth: ${queueDepth}, ~${estimatedWaitMs}ms wait)`)
|
|
306
407
|
})
|
|
307
408
|
|
|
308
|
-
|
|
309
|
-
process.stdout.write(chunk.choices[0]?.delta?.content ?? '')
|
|
310
|
-
}
|
|
311
|
-
// After the loop, usage is recorded in limiter.getCostReport()
|
|
312
|
-
```
|
|
313
|
-
|
|
314
|
-
**Zero-config standalone version** — if you don't need to share the limiter with other models:
|
|
315
|
-
|
|
316
|
-
```typescript
|
|
317
|
-
import { rateLimited } from 'ai-sdk-rate-limiter'
|
|
318
|
-
|
|
319
|
-
const openai = rateLimited(new OpenAI(), {
|
|
320
|
-
config: { cost: { budget: { daily: 20 } } },
|
|
321
|
-
})
|
|
409
|
+
limiter.off('queued', handler)
|
|
322
410
|
```
|
|
323
411
|
|
|
324
|
-
|
|
412
|
+
| Event | When | Key fields |
|
|
413
|
+
|---|---|---|
|
|
414
|
+
| `queued` | Request enters the queue | `model`, `provider`, `priority`, `queueDepth`, `estimatedWaitMs` |
|
|
415
|
+
| `dequeued` | Request leaves the queue | `model`, `provider`, `waitedMs`, `priority` |
|
|
416
|
+
| `retrying` | A failed request is about to retry | `model`, `provider`, `attempt`, `maxAttempts`, `delayMs`, `error` |
|
|
417
|
+
| `rateLimited` | Limit hit (local or remote 429) | `model`, `provider`, `source`, `limitType`, `resetAt` |
|
|
418
|
+
| `budgetHit` | Cost budget exceeded | `model`, `provider`, `currentCostUsd`, `limitUsd`, `period`, `usingFallback` |
|
|
419
|
+
| `dropped` | Request rejected (queue full or timeout) | `model`, `provider`, `reason` |
|
|
420
|
+
| `completed` | Request finished successfully | `model`, `provider`, `inputTokens`, `outputTokens`, `costUsd`, `latencyMs`, `streaming` |
|
|
325
421
|
|
|
326
|
-
|
|
327
|
-
const client = limiter.rawProxy(new OpenAI({ baseURL: 'https://api.groq.com/openai/v1' }), {
|
|
328
|
-
provider: 'groq', // use Groq's limits and pricing instead of OpenAI's
|
|
329
|
-
})
|
|
330
|
-
```
|
|
422
|
+
The `source` on `rateLimited` distinguishes between requests we blocked locally (`'local'`) vs. requests the API rejected with a 429 (`'remote'`). Local blocks are expected and free. Frequent remote blocks mean your configured limits are too high for your tier — run `npx ai-sdk-rate-limiter audit` to get accurate numbers.
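One way to act on that distinction is to ignore local blocks and alert only when remote 429s cluster. The sketch below is an assumption-laden illustration: `createRemoteBlockMonitor` is a hypothetical helper, the event type lists only the fields used here, and the second `now` parameter exists purely to make the example deterministic.

```typescript
// Illustrative only: a hypothetical handler factory, not part of the package.
type RateLimitedEvent = { model: string; source: 'local' | 'remote'; resetAt: number }

function createRemoteBlockMonitor(
  threshold: number,
  windowMs: number,
  onAlert: (model: string, count: number) => void,
) {
  const hits = new Map<string, number[]>()
  return (event: RateLimitedEvent, now: number = Date.now()): void => {
    // Local blocks are expected behavior; only remote 429s are counted.
    if (event.source !== 'remote') return
    const recent = (hits.get(event.model) ?? []).filter((t) => t > now - windowMs)
    recent.push(now)
    hits.set(event.model, recent)
    if (recent.length >= threshold) onAlert(event.model, recent.length)
  }
}

const alerts: string[] = []
const monitor = createRemoteBlockMonitor(2, 1_000, (model, count) => {
  alerts.push(`${model}:${count}`)
})

monitor({ model: 'gpt-4o', source: 'local', resetAt: 0 }, 0)    // ignored
monitor({ model: 'gpt-4o', source: 'remote', resetAt: 0 }, 0)   // 1 remote hit
monitor({ model: 'gpt-4o', source: 'remote', resetAt: 0 }, 500) // 2 hits -> alert
monitor({ model: 'gpt-4o', source: 'remote', resetAt: 0 }, 2_000) // old hits expired
console.log(alerts)
```

The returned function takes the event as its first argument, so it could be registered as a `rateLimited` handler as-is.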
|
|
331
423
|
|
|
332
424
|
---
|
|
333
425
|
|
|
334
|
-
## Backpressure
|
|
426
|
+
## Backpressure
|
|
335
427
|
|
|
336
|
-
Check estimated wait time before committing to a request. Useful for showing loading states or shedding load gracefully
|
|
428
|
+
Check estimated wait time before committing to a request. Useful for showing loading states or shedding load gracefully:
|
|
337
429
|
|
|
338
430
|
```typescript
|
|
339
|
-
const waitMs = limiter.estimatedWait('gpt-4o')
|
|
431
|
+
const waitMs = await limiter.estimatedWait('gpt-4o')
|
|
340
432
|
|
|
341
433
|
if (waitMs > 5_000) {
|
|
342
|
-
return res.status(503).json({
|
|
434
|
+
return res.status(503).json({
|
|
435
|
+
error: 'Model busy, try again shortly',
|
|
436
|
+
retryAfterMs: waitMs,
|
|
437
|
+
})
|
|
343
438
|
}
|
|
344
439
|
|
|
345
440
|
const result = await generateText({ model, prompt })
|
|
346
441
|
```
|
|
347
442
|
|
|
348
|
-
|
|
349
|
-
|
|
350
|
-
## Events
|
|
351
|
-
|
|
352
|
-
All events are typed. Register handlers at creation time or dynamically:
|
|
353
|
-
|
|
354
|
-
```typescript
|
|
355
|
-
// At creation time
|
|
356
|
-
const limiter = createRateLimiter({
|
|
357
|
-
on: { rateLimited: handler },
|
|
358
|
-
})
|
|
359
|
-
|
|
360
|
-
// Dynamically
|
|
361
|
-
limiter.on('queued', ({ model, queueDepth, estimatedWaitMs }) => {
|
|
362
|
-
console.log(`${model} queued (depth: ${queueDepth}, ~${estimatedWaitMs}ms wait)`)
|
|
363
|
-
})
|
|
364
|
-
|
|
365
|
-
limiter.off('queued', handler)
|
|
366
|
-
```
|
|
367
|
-
|
|
368
|
-
| Event | When | Key fields |
|
|
369
|
-
|---|---|---|
|
|
370
|
-
| `queued` | Request enters the queue | `model`, `priority`, `queueDepth`, `estimatedWaitMs` |
|
|
371
|
-
| `dequeued` | Request leaves the queue | `model`, `waitedMs`, `priority` |
|
|
372
|
-
| `retrying` | A failed request is about to retry | `model`, `attempt`, `maxAttempts`, `delayMs`, `error` |
|
|
373
|
-
| `rateLimited` | Limit hit (local or remote 429) | `model`, `source`, `limitType`, `resetAt` |
|
|
374
|
-
| `budgetHit` | Cost budget exceeded | `model`, `currentCostUsd`, `limitUsd`, `period`, `usingFallback` |
|
|
375
|
-
| `dropped` | Request rejected (queue full or timeout) | `model`, `reason` |
|
|
376
|
-
| `completed` | Request finished successfully | `model`, `inputTokens`, `outputTokens`, `costUsd`, `latencyMs` |
|
|
377
|
-
|
|
378
|
-
The `source` on `rateLimited` distinguishes between requests we blocked locally (`'local'`) vs. requests the API rejected with a 429 (`'remote'`). Local blocks are free; remote blocks mean your limits are misconfigured.
|
|
443
|
+
Returns `0` if the model would proceed immediately.
|
|
379
444
|
|
|
380
445
|
---
|
|
381
446
|
|
|
382
447
|
## Error handling
|
|
383
448
|
|
|
384
|
-
Every error is typed
|
|
449
|
+
Every error is typed and carries structured context:
|
|
385
450
|
|
|
386
451
|
```typescript
|
|
387
452
|
import {
|
|
453
|
+
RateLimitExceededError,
|
|
388
454
|
QueueTimeoutError,
|
|
389
455
|
QueueFullError,
|
|
390
456
|
BudgetExceededError,
|
|
@@ -396,77 +462,163 @@ try {
|
|
|
396
462
|
const result = await generateText({ model, prompt })
|
|
397
463
|
} catch (error) {
|
|
398
464
|
if (error instanceof QueueTimeoutError) {
|
|
399
|
-
//
|
|
465
|
+
// Request waited in queue longer than queue.timeout
|
|
400
466
|
console.error(`Timed out after ${error.waitedMs}ms (queue depth: ${error.queueDepth})`)
|
|
401
467
|
} else if (error instanceof BudgetExceededError) {
|
|
402
|
-
//
|
|
468
|
+
// Cost budget hit and onExceeded is 'throw' or no fallback configured
|
|
403
469
|
console.error(`Budget exceeded: $${error.currentCostUsd} of $${error.limitUsd} ${error.period}`)
|
|
404
470
|
} else if (error instanceof RetryExhaustedError) {
|
|
405
|
-
//
|
|
471
|
+
// All retry attempts failed
|
|
406
472
|
console.error(`All ${error.attempts} retries exhausted`, error.cause)
|
|
407
473
|
} else if (error instanceof QueueFullError) {
|
|
408
|
-
//
|
|
409
|
-
console.error(`Queue full at ${error.maxSize} requests`)
|
|
474
|
+
// Queue at capacity and onFull is 'throw'
|
|
475
|
+
console.error(`Queue full at ${error.maxSize} requests for ${error.model}`)
|
|
476
|
+
} else if (error instanceof RateLimitExceededError) {
|
|
477
|
+
// Rate limit hit and the request could not be queued
|
|
478
|
+
console.error(`${error.model} ${error.limitType} limit of ${error.limit} exceeded`)
|
|
410
479
|
}
|
|
411
480
|
}
|
|
412
481
|
```
|
|
413
482
|
|
|
414
|
-
All errors extend `RateLimiterError`, so a single `instanceof RateLimiterError` check separates rate-
|
|
483
|
+
All errors extend `RateLimiterError`, so a single `instanceof RateLimiterError` check separates rate-limiter failures from AI API errors.
|
|
484
|
+
|
|
485
|
+
**Error fields:**
|
|
486
|
+
|
|
487
|
+
| Error | Fields |
|
|
488
|
+
|---|---|
|
|
489
|
+
| `QueueTimeoutError` | `model`, `waitedMs`, `queueDepth` |
|
|
490
|
+
| `BudgetExceededError` | `model`, `currentCostUsd`, `limitUsd`, `period` |
|
|
491
|
+
| `RetryExhaustedError` | `model`, `attempts`, `cause` |
|
|
492
|
+
| `QueueFullError` | `model`, `maxSize` |
|
|
493
|
+
| `RateLimitExceededError` | `model`, `limitType`, `limit`, `resetAt` |
|
|
415
494
|
|
|
416
495
|
---
|
|
417
496
|
|
|
418
|
-
##
|
|
497
|
+
## OpenTelemetry
|
|
419
498
|
|
|
420
|
-
|
|
499
|
+
The `ai-sdk-rate-limiter/otel` entry point provides a plugin that emits OpenTelemetry spans for every AI request. No hard dependency on `@opentelemetry/api` — the plugin accepts any OTel-compatible tracer via structural typing.
|
|
421
500
|
|
|
422
501
|
```typescript
|
|
423
|
-
import {
|
|
502
|
+
import { trace } from '@opentelemetry/api'
|
|
503
|
+
import { createRateLimiter } from 'ai-sdk-rate-limiter'
|
|
504
|
+
import { createOtelPlugin } from 'ai-sdk-rate-limiter/otel'
|
|
424
505
|
|
|
425
|
-
const
|
|
426
|
-
|
|
427
|
-
middleware: limiter.middleware,
|
|
506
|
+
const limiter = createRateLimiter({
|
|
507
|
+
on: createOtelPlugin(trace.getTracer('my-service')),
|
|
428
508
|
})
|
|
429
509
|
```
|
|
430
510
|
|
|
431
|
-
|
|
511
|
+
**Spans emitted:**
|
|
512
|
+
|
|
513
|
+
| Span name | When | Status |
|
|
514
|
+
|---|---|---|
|
|
515
|
+
| `gen_ai.request` | Every completed request | OK |
|
|
516
|
+
| `gen_ai.request` | Every dropped request | ERROR |
|
|
517
|
+
| `ai_rate_limiter.retry` | Each retry attempt | OK |
|
|
518
|
+
| `ai_rate_limiter.budget_hit` | Budget exceeded | ERROR |
|
|
519
|
+
|
|
520
|
+
**Attributes on `gen_ai.request` (completed):**
|
|
521
|
+
|
|
522
|
+
| Attribute | Value |
|
|
523
|
+
|---|---|
|
|
524
|
+
| `gen_ai.system` | Provider name (e.g. `openai`, `anthropic`) |
|
|
525
|
+
| `gen_ai.request.model` | Model ID |
|
|
526
|
+
| `gen_ai.usage.input_tokens` | Actual input tokens from API response |
|
|
527
|
+
| `gen_ai.usage.output_tokens` | Actual output tokens from API response |
|
|
528
|
+
| `ai_rate_limiter.cost_usd` | Cost in USD for this request |
|
|
529
|
+
| `ai_rate_limiter.latency_ms` | Total latency including queue wait |
|
|
530
|
+
| `ai_rate_limiter.streaming` | Whether the request used streaming |
|
|
531
|
+
|
|
532
|
+
Attribute names follow the [OTel GenAI semantic conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/). The `gen_ai.request` span duration is reconstructed from `latencyMs` so it reflects the full wall-clock time of the request, including any queue wait.
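Reconstructing the span duration can be sketched as below, assuming the span is created once the `completed` event fires. `TracerLike`, `SpanLike`, and `emitCompletedSpan` are hypothetical names for illustration; the package's own `OtelTracer` and `OtelSpan` types may differ. The `startTime` option and the `end(endTime)` argument mirror the shape of the OpenTelemetry JS tracer API.

```typescript
// Minimal structural types for illustration; not the package's own interfaces.
type AttrValue = string | number | boolean
interface SpanLike {
  setStatus(status: { code: number }): void
  end(endTime?: number): void
}
interface TracerLike {
  startSpan(
    name: string,
    options?: { startTime?: number; attributes?: Record<string, AttrValue> },
  ): SpanLike
}
interface CompletedEvent {
  provider: string
  model: string
  inputTokens: number
  outputTokens: number
  latencyMs: number
}

function emitCompletedSpan(tracer: TracerLike, event: CompletedEvent, endTime: number = Date.now()): void {
  // Back-date the start so the span covers the full request, queue wait included.
  const span = tracer.startSpan('gen_ai.request', {
    startTime: endTime - event.latencyMs,
    attributes: {
      'gen_ai.system': event.provider,
      'gen_ai.request.model': event.model,
      'gen_ai.usage.input_tokens': event.inputTokens,
      'gen_ai.usage.output_tokens': event.outputTokens,
    },
  })
  span.setStatus({ code: 1 }) // 1 is SpanStatusCode.OK in @opentelemetry/api
  span.end(endTime)
}

// A recording tracer standing in for a real OTel tracer.
type Recorded = { name: string; startTime: number | undefined; endTime: number | undefined }
const recorded: Recorded[] = []
const fakeTracer: TracerLike = {
  startSpan(name, options) {
    const entry: Recorded = { name, startTime: options?.startTime, endTime: undefined }
    recorded.push(entry)
    return { setStatus() {}, end(endTime) { entry.endTime = endTime } }
  },
}

emitCompletedSpan(
  fakeTracer,
  { provider: 'openai', model: 'gpt-4o', inputTokens: 12, outputTokens: 34, latencyMs: 250 },
  10_000,
)
console.log(recorded)
```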
|
|
533
|
+
|
|
534
|
+
**Custom tracer interface** — if you don't want to install `@opentelemetry/api`, implement the `OtelTracer` interface directly:
|
|
432
535
|
|
|
433
536
|
```typescript
|
|
434
|
-
|
|
435
|
-
|
|
436
|
-
|
|
537
|
+
import { createOtelPlugin, type OtelTracer } from 'ai-sdk-rate-limiter/otel'
|
|
538
|
+
|
|
539
|
+
const tracer: OtelTracer = {
|
|
540
|
+
startSpan(name, options) {
|
|
541
|
+
// return any object that implements OtelSpan
|
|
542
|
+
},
|
|
543
|
+
}
|
|
544
|
+
|
|
545
|
+
const limiter = createRateLimiter({
|
|
546
|
+
on: createOtelPlugin(tracer),
|
|
437
547
|
})
|
|
438
548
|
```
|
|
439
549
|
|
|
440
550
|
---
|
|
441
551
|
|
|
442
|
-
##
|
|
552
|
+
## CLI audit
|
|
443
553
|
|
|
444
|
-
|
|
554
|
+
Probe your API keys to discover your actual rate limit tier and generate a ready-to-paste config override:
|
|
445
555
|
|
|
446
|
-
```
|
|
447
|
-
|
|
448
|
-
|
|
449
|
-
cost: { budget: { daily: 0.10 }, onExceeded: 'throw' },
|
|
450
|
-
queue: { timeout: 5_000 },
|
|
451
|
-
})
|
|
556
|
+
```
|
|
557
|
+
npx ai-sdk-rate-limiter audit
|
|
558
|
+
```
|
|
452
559
|
|
|
453
|
-
|
|
454
|
-
|
|
455
|
-
|
|
456
|
-
|
|
457
|
-
|
|
560
|
+
```
|
|
561
|
+
────────────────────────────────────────────────────────────────────────────────
|
|
562
|
+
ai-sdk-rate-limiter audit
|
|
563
|
+
────────────────────────────────────────────────────────────────────────────────
|
|
564
|
+
|
|
565
|
+
OPENAI (OPENAI_API_KEY)
|
|
566
|
+
Model RPM TPM Registry
|
|
567
|
+
──────────────────────────────────────────────────────────────────────────────
|
|
568
|
+
gpt-4o 10000 2,000,000 (registry: 500 RPM / 30,000 TPM)
|
|
569
|
+
gpt-4o-mini 10000 10,000,000 ≠ (registry: 500 RPM / 200,000 TPM)
|
|
570
|
+
|
|
571
|
+
────────────────────────────────────────────────────────────────────────────────
|
|
572
|
+
1 model(s) differ from registry defaults.
|
|
573
|
+
Paste the config below into createRateLimiter():
|
|
574
|
+
|
|
575
|
+
const limiter = createRateLimiter({
|
|
576
|
+
limits: {
|
|
577
|
+
'gpt-4o-mini': { rpm: 10000, itpm: 10_000_000 },
|
|
578
|
+
},
|
|
579
|
+
})
|
|
458
580
|
|
|
459
|
-
|
|
460
|
-
|
|
461
|
-
|
|
462
|
-
|
|
581
|
+
────────────────────────────────────────────────────────────────────────────────
|
|
582
|
+
```
|
|
583
|
+
|
|
584
|
+
**How it works** — Makes a minimal (5-token) request per model and reads the `x-ratelimit-limit-*` headers that every provider returns on each response. These headers reflect your account's actual tier, not the published defaults.
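A rough sketch of the header-reading step, for illustration only and not taken from `dist/cli.js`. `readLimits` and `HeaderSource` are hypothetical names; the two header names are the ones OpenAI documents, and other providers may use different names, so treat the lookup keys as an assumption.

```typescript
// Illustrative only: a hypothetical parser, not the CLI's actual code.
interface HeaderSource {
  // Satisfied by the fetch API's Headers object.
  get(name: string): string | null
}

interface DetectedLimits {
  rpm: number | undefined
  tpm: number | undefined
}

function readLimits(headers: HeaderSource): DetectedLimits {
  const toNumber = (name: string): number | undefined => {
    const raw = headers.get(name)
    if (raw === null) return undefined
    const parsed = Number.parseInt(raw, 10)
    return Number.isNaN(parsed) ? undefined : parsed
  }
  return {
    rpm: toNumber('x-ratelimit-limit-requests'), // OpenAI-style header name
    tpm: toNumber('x-ratelimit-limit-tokens'),   // OpenAI-style header name
  }
}

// A stand-in for response.headers from a probe request.
const sample = new Map<string, string>([
  ['x-ratelimit-limit-requests', '10000'],
  ['x-ratelimit-limit-tokens', '2000000'],
])
const source: HeaderSource = { get: (name) => sample.get(name) ?? null }

const detected = readLimits(source)
const missing = readLimits({ get: () => null })
console.log(detected, missing)
```

Returning `undefined` for absent or unparseable headers lets the caller fall back to registry defaults instead of writing a bogus override.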
|
|
585
|
+
|
|
586
|
+
**Options:**
|
|
587
|
+
|
|
588
|
+
```
|
|
589
|
+
npx ai-sdk-rate-limiter audit [options]
|
|
590
|
+
|
|
591
|
+
--provider, -p <name> Audit a single provider: openai, anthropic, groq, mistral, cohere
|
|
592
|
+
--json Machine-readable JSON output
|
|
593
|
+
--help, -h Show help
|
|
594
|
+
--version, -v Print version
|
|
595
|
+
|
|
596
|
+
Environment variables required:
|
|
597
|
+
OPENAI_API_KEY
|
|
598
|
+
ANTHROPIC_API_KEY
|
|
599
|
+
GROQ_API_KEY
|
|
600
|
+
MISTRAL_API_KEY
|
|
601
|
+
COHERE_API_KEY
|
|
602
|
+
```
|
|
603
|
+
|
|
604
|
+
**Examples:**
|
|
605
|
+
|
|
606
|
+
```bash
|
|
607
|
+
# Audit all configured providers
|
|
608
|
+
npx ai-sdk-rate-limiter audit
|
|
609
|
+
|
|
610
|
+
# Audit only OpenAI
|
|
611
|
+
npx ai-sdk-rate-limiter audit --provider openai
|
|
612
|
+
|
|
613
|
+
# Machine-readable output for CI / scripts
|
|
614
|
+
npx ai-sdk-rate-limiter audit --json | jq '.providers[0].models'
|
|
463
615
|
```
|
|
464
616
|
|
|
465
617
|
---
|
|
466
618
|
|
|
467
|
-
##
|
|
619
|
+
## Model registry
|
|
468
620
|
|
|
469
|
-
Limits and pricing are built-in for every major model across 6 providers. Defaults are conservative (free/Tier 1) — override with your actual plan limits
|
|
621
|
+
Limits and pricing are built-in for every major model across 6 providers. Defaults are conservative (free/Tier 1) — override with your actual plan limits via the `limits` config option or by running `audit`.
|
|
470
622
|
|
|
471
623
|
**OpenAI**
|
|
472
624
|
|
|
@@ -540,30 +692,87 @@ import {
|
|
|
540
692
|
console.log(GROQ_MODELS['llama-3.3-70b-versatile'])
|
|
541
693
|
// { rpm: 30, itpm: 6000, rpd: 1000, inputPricePerMillion: 0.59, ... }
|
|
542
694
|
|
|
543
|
-
console.log(isKnownModel('llama-3.3-70b-versatile', 'groq'))
|
|
544
|
-
//
|
|
695
|
+
console.log(isKnownModel('llama-3.3-70b-versatile', 'groq')) // true
|
|
696
|
+
console.log(isKnownModel('my-fine-tune', 'openai')) // false → fallback limits
|
|
697
|
+
|
|
698
|
+
// Resolve the effective limits for a model (registry defaults merged with user overrides)
|
|
699
|
+
const limits = resolveModelLimits('gpt-4o', 'openai', { 'gpt-4o': { rpm: 1000 } })
|
|
700
|
+
```
|
|
701
|
+
|
|
702
|
+
---
|
|
703
|
+
|
|
704
|
+
## Advanced usage
|
|
545
705
|
|
|
546
|
-
|
|
547
|
-
|
|
706
|
+
### Multiple limiters per tier
|
|
707
|
+
|
|
708
|
+
```typescript
|
|
709
|
+
const freeLimiter = createRateLimiter({
|
|
710
|
+
limits: { 'gpt-4o-mini': { rpm: 5 } },
|
|
711
|
+
cost: { budget: { daily: 0.10 }, onExceeded: 'throw' },
|
|
712
|
+
queue: { timeout: 5_000 },
|
|
713
|
+
})
|
|
714
|
+
|
|
715
|
+
const paidLimiter = createRateLimiter({
|
|
716
|
+
limits: { 'gpt-4o': { rpm: 100 } },
|
|
717
|
+
cost: { budget: { daily: 20 } },
|
|
718
|
+
queue: { timeout: 30_000 },
|
|
719
|
+
})
|
|
720
|
+
|
|
721
|
+
// Route per request based on user plan
|
|
722
|
+
const model = req.user.plan === 'paid'
|
|
723
|
+
? paidLimiter.wrap(openai('gpt-4o'))
|
|
724
|
+
: freeLimiter.wrap(openai('gpt-4o-mini'))
|
|
725
|
+
```
|
|
726
|
+
|
|
727
|
+
### Combine OTel tracing with event logging
|
|
728
|
+
|
|
729
|
+
```typescript
|
|
730
|
+
import { trace } from '@opentelemetry/api'
import { createOtelPlugin } from 'ai-sdk-rate-limiter/otel'
|
|
731
|
+
|
|
732
|
+
const limiter = createRateLimiter({
|
|
733
|
+
on: {
|
|
734
|
+
// OTel spans for every request
|
|
735
|
+
...createOtelPlugin(trace.getTracer('my-service')),
|
|
736
|
+
// Plus any additional handlers
|
|
737
|
+
budgetHit: ({ model, limitUsd, period }) =>
|
|
738
|
+
alerts.send(`Budget alert: ${model} hit $${limitUsd} ${period} cap`),
|
|
739
|
+
},
|
|
740
|
+
})
|
|
741
|
+
```
|
|
742
|
+
|
|
743
|
+
### Custom rate limit store
|
|
744
|
+
|
|
745
|
+
Implement `RateLimitStore` to use any backend (DynamoDB, Postgres, etc.):
|
|
746
|
+
|
|
747
|
+
```typescript
|
|
748
|
+
import type { RateLimitStore } from 'ai-sdk-rate-limiter'
|
|
749
|
+
|
|
750
|
+
class MyStore implements RateLimitStore {
|
|
751
|
+
async checkAndReserve(key, tokens, limits) { /* ... */ }
|
|
752
|
+
async applyBackoff(key, untilMs) { /* ... */ }
|
|
753
|
+
async getBackoff(key) { /* ... */ }
|
|
754
|
+
}
|
|
755
|
+
|
|
756
|
+
const limiter = createRateLimiter({ store: new MyStore() })
|
|
548
757
|
```
|
|
549
758
|
|
|
550
759
|
---
|
|
551
760
|
|
|
552
761
|
## How it works
|
|
553
762
|
|
|
554
|
-
**Rate limiting
|
|
763
|
+
**Rate limiting** — Sliding window counter per model. Each model tracks a list of `{timestamp, tokens}` entries for the past 60 seconds. On every request, stale entries are evicted and the window is checked against RPM and ITPM limits simultaneously.
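A minimal sketch of the sliding-window check described above, assuming a 60-second window and a caller-supplied clock. `SlidingWindow` and `tryReserve` are hypothetical names; the package's internal engine may be structured differently.

```typescript
interface WindowEntry {
  timestamp: number // ms since epoch
  tokens: number
}

// Illustrative only; not the library's internal class.
class SlidingWindow {
  private entries: WindowEntry[] = []
  private readonly rpm: number
  private readonly itpm: number
  private readonly windowMs: number

  constructor(rpm: number, itpm: number, windowMs: number = 60_000) {
    this.rpm = rpm
    this.itpm = itpm
    this.windowMs = windowMs
  }

  // Returns true and records the request if both limits allow it.
  tryReserve(tokens: number, now: number): boolean {
    // Evict entries that have aged out of the window.
    const cutoff = now - this.windowMs
    this.entries = this.entries.filter((e) => e.timestamp > cutoff)

    // Check the request count and the token count against their limits.
    const usedTokens = this.entries.reduce((sum, e) => sum + e.tokens, 0)
    if (this.entries.length + 1 > this.rpm) return false
    if (usedTokens + tokens > this.itpm) return false

    this.entries.push({ timestamp: now, tokens })
    return true
  }
}
```

A request that fails either check would be queued rather than rejected; the sketch only covers the admission decision.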
|
|
555
764
|
|
|
556
|
-
**Queue** — A
|
|
765
|
+
**Queue** — A sorted priority queue per model, ordered by `priority` then enqueue time (FIFO within same priority). A drain timer fires when the oldest window entry expires, processing as many waiters as possible before rescheduling.
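The ordering rule can be sketched as a sorted insert. This assumes a numeric priority where a lower number is served first, which is an illustrative choice; the package's `Priority` type may be defined differently. `Waiter` and `WaiterQueue` are hypothetical names, and the drain timer is omitted.

```typescript
interface Waiter {
  id: string
  priority: number // assumption: lower value is served first
  enqueuedAt: number // ms timestamp, used as the FIFO tie-breaker
}

// Order by priority first, then by enqueue time.
function compareWaiters(a: Waiter, b: Waiter): number {
  return a.priority - b.priority || a.enqueuedAt - b.enqueuedAt
}

// Illustrative only; not the library's internal queue.
class WaiterQueue {
  private items: Waiter[] = []

  enqueue(waiter: Waiter): void {
    // Insert before the first item that should be served after this one.
    // Ties keep the existing item first, preserving FIFO order.
    const index = this.items.findIndex((existing) => compareWaiters(waiter, existing) < 0)
    if (index === -1) {
      this.items.push(waiter)
    } else {
      this.items.splice(index, 0, waiter)
    }
  }

  dequeue(): Waiter | undefined {
    return this.items.shift()
  }

  get size(): number {
    return this.items.length
  }
}
```

Keeping the array sorted on insert makes each dequeue a simple `shift`, which suits a drain loop that pops waiters until the window is full again.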
|
|
557
766
|
|
|
558
|
-
**Retry-After propagation** — When a remote 429 arrives with a `Retry-After` header, the backoff is applied to the engine, not just the failing request. All requests queued behind it pause until the backoff clears. This prevents the common failure
|
|
767
|
+
**Retry-After propagation** — When a remote 429 arrives with a `Retry-After` header, the backoff is applied to the entire model key in the engine, not just the failing request. All requests queued behind it pause until the backoff clears. This prevents the common thundering-herd failure where you retry one request while 10 others immediately follow and all get 429s.
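A minimal sketch of shared backoff state, assuming `Retry-After` arrives as a number of seconds (the HTTP-date form is not handled here). `BackoffRegistry` is a hypothetical name; it loosely mirrors the `applyBackoff` / `getBackoff` pair on `RateLimitStore` but is not the package's implementation.

```typescript
// Illustrative only: one backoff deadline per model key, shared by all requests.
class BackoffRegistry {
  private until = new Map<string, number>()

  // Record a backoff for the whole key; never shorten an existing one.
  apply(key: string, retryAfterSeconds: number, now: number): void {
    const candidate = now + retryAfterSeconds * 1000
    const current = this.until.get(key) ?? 0
    this.until.set(key, Math.max(current, candidate))
  }

  // How long any request for this key should still wait before firing.
  remainingMs(key: string, now: number): number {
    return Math.max(0, (this.until.get(key) ?? 0) - now)
  }
}
```

Because every queued request for the key consults the same deadline before firing, a single 429 pauses the whole queue instead of only the request that received it.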
|
|
559
768
|
|
|
560
|
-
**Token estimation** — Before a request fires, tokens are estimated from the prompt text (~4 chars/token) and reserved in the window. After the response, actual usage from the API replaces the estimate. For streaming, actual counts come from the `finish` chunk.
|
|
769
|
+
**Token estimation** — Before a request fires, tokens are estimated from the prompt text (~4 chars/token) and reserved in the window. After the response, actual usage from the API replaces the estimate. For streaming, actual counts come from the `finish` chunk (Vercel AI SDK) or the final usage chunk (raw proxy).
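The estimate-then-reconcile flow can be sketched as follows. The 4 chars/token ratio comes from the description above; rounding up and the `TokenLedger` class are assumptions made for illustration, not the package's actual code.

```typescript
// Rough estimate at ~4 characters per token; rounding up is an assumption.
function estimateTokens(prompt: string): number {
  return Math.ceil(prompt.length / 4)
}

// Illustrative only: tracks tokens counted against the current window.
class TokenLedger {
  private total = 0

  // Reserve the estimate before the request is sent.
  reserve(estimate: number): void {
    this.total += estimate
  }

  // Once real usage is known, swap the estimate for the actual count.
  reconcile(estimate: number, actual: number): void {
    this.total += actual - estimate
  }

  get used(): number {
    return this.total
  }
}
```

Reserving up front keeps concurrent requests from collectively overshooting the token limit while their real usage is still unknown.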
|
|
561
770
|
|
|
562
|
-
**Zero dependencies** — The middleware interface is implemented structurally
|
|
771
|
+
**Zero dependencies** — The Vercel AI SDK middleware interface is implemented structurally — `@ai-sdk/provider` types are used for type checking only and not required at runtime. No `ioredis`, no `bottleneck`, no tokenizer libraries in the core.
|
|
563
772
|
|
|
564
773
|
---
|
|
565
774
|
|
|
566
|
-
##
|
|
775
|
+
## Comparison
|
|
567
776
|
|
|
568
777
|
| | ai-sdk-rate-limiter | bottleneck | p-limit | SDK built-in retry | LangChain |
|
|
569
778
|
|---|:---:|:---:|:---:|:---:|:---:|
|
|
@@ -575,37 +784,42 @@ console.log(isKnownModel('my-fine-tune', 'openai'))
|
|
|
575
784
|
| Cost tracking + budgets | yes | no | no | no | no |
|
|
576
785
|
| Retry-After header | yes | no | no | partial | partial |
|
|
577
786
|
| Backoff propagation | yes | no | no | no | no |
|
|
787
|
+
| OpenTelemetry | yes | no | no | no | partial |
|
|
788
|
+
| CLI audit | yes | no | no | no | no |
|
|
578
789
|
| Zero runtime deps | yes | no | yes | — | no |
|
|
579
790
|
| Provider-agnostic | yes | yes | yes | no | no |
|
|
580
791
|
|
|
581
|
-
**bottleneck**
|
|
792
|
+
**bottleneck** — Excellent general-purpose rate limiting, but knows nothing about AI APIs. No model limits, no token counting, no cost tracking. You'd need to configure it per-model manually and rebuild the cost system yourself.
|
|
582
793
|
|
|
583
|
-
**p-limit**
|
|
794
|
+
**p-limit** — Controls concurrency, not rate. Limits to N concurrent requests, not N requests per minute. A different problem.
|
|
584
795
|
|
|
585
|
-
**SDK built-in retry**
|
|
796
|
+
**SDK built-in retry** — Retries on 429 with backoff. That's the floor, not the ceiling. No queuing, no priority, no cost tracking, no backoff propagation to other in-flight requests.
|
|
586
797
|
|
|
587
798
|
---
|
|
588
799
|
|
|
589
800
|
## TypeScript
|
|
590
801
|
|
|
591
|
-
Fully typed. All configuration options, events, errors, and report shapes have precise TypeScript definitions.
|
|
802
|
+
Fully typed. All configuration options, events, errors, and report shapes have precise TypeScript definitions exported from the main entry point.
|
|
592
803
|
|
|
593
804
|
```typescript
|
|
594
805
|
import type {
|
|
595
806
|
RateLimiterConfig,
|
|
596
807
|
CostReport,
|
|
808
|
+
EventMap,
|
|
597
809
|
QueuedEvent,
|
|
598
810
|
Priority,
|
|
811
|
+
ModelLimits,
|
|
599
812
|
} from 'ai-sdk-rate-limiter'
|
|
600
|
-
|
|
601
|
-
function handleQueuedRequest(event: QueuedEvent) {
|
|
602
|
-
// event.model, event.priority, event.queueDepth, event.estimatedWaitMs
|
|
603
|
-
// all typed, all autocompleted
|
|
604
|
-
}
|
|
605
813
|
```
|
|
606
814
|
|
|
607
815
|
---
|
|
608
816
|
|
|
817
|
+
## Examples
|
|
818
|
+
|
|
819
|
+
A full Next.js 15 App Router example is included at [`examples/nextjs/`](./examples/nextjs/). It demonstrates streaming chat with rate limiting, live cost display, and proper error handling for budget and rate limit errors.
|
|
820
|
+
|
|
821
|
+
---
|
|
822
|
+
|
|
609
823
|
## Requirements
|
|
610
824
|
|
|
611
825
|
- Node.js 18+ / Bun / Deno
|