ai-sdk-rate-limiter 0.3.0 → 0.5.0
- package/README.md +121 -1
- package/dist/index.cjs +338 -142
- package/dist/index.cjs.map +1 -1
- package/dist/index.d.cts +56 -298
- package/dist/index.d.ts +56 -298
- package/dist/index.js +338 -143
- package/dist/index.js.map +1 -1
- package/dist/redis.cjs +209 -0
- package/dist/redis.cjs.map +1 -0
- package/dist/redis.d.cts +54 -0
- package/dist/redis.d.ts +54 -0
- package/dist/redis.js +207 -0
- package/dist/redis.js.map +1 -0
- package/dist/types-CgePLtmQ.d.cts +385 -0
- package/dist/types-CgePLtmQ.d.ts +385 -0
- package/package.json +16 -2
package/README.md CHANGED
@@ -52,7 +52,9 @@ The wrapped model is a drop-in replacement. Every Vercel AI SDK feature — stre
 
 **Cost tracking** — Records actual token usage from every response. Reports hourly, daily, and monthly spend per model. Optionally enforces budget caps.
 
-**Built-in model registry** — Knows the RPM, ITPM, and per-token pricing for every major OpenAI, Anthropic, and
+**Built-in model registry** — Knows the RPM, ITPM, and per-token pricing for every major OpenAI, Anthropic, Google, Groq, Mistral, and Cohere model out of the box. Nothing to configure to get started.
+
+**Raw SDK support** — Works with the native OpenAI, Anthropic, Groq, Mistral, and Cohere SDKs directly via a transparent JavaScript Proxy. No Vercel AI SDK required.
 
 ---
 
@@ -212,6 +214,123 @@ const result = await generateText({ model, prompt })
 
 ---
 
+## Multi-instance Redis store
+
+By default, rate limit state is in-memory (per-process). In multi-instance deployments — serverless functions, multiple pods, workers — each instance has its own counters. Install the Redis store to share state across all instances:
+
+```
+npm install ioredis
+```
+
+```typescript
+import { createRateLimiter } from 'ai-sdk-rate-limiter'
+import { RedisStore } from 'ai-sdk-rate-limiter/redis'
+import Redis from 'ioredis'
+
+const limiter = createRateLimiter({
+  store: new RedisStore(new Redis(process.env.REDIS_URL)),
+  // ... rest of your config
+})
+```
+
+That's the entire change. All APIs — `wrap()`, `rawProxy()`, events, cost reports — work identically. The Redis store enforces rate limits collectively so no two instances can jointly exceed the API limits.
+
+**How it works:**
+
+Each request atomically runs a Lua script that:
+1. Removes entries older than 60 seconds from a sorted set (`ZREMRANGEBYSCORE`)
+2. Counts remaining requests and sums input tokens
+3. Checks against RPM and ITPM limits
+4. If allowed: reserves the slot (`ZADD`) and returns immediately
+5. If blocked: returns the timestamp when the next slot opens
+
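The five steps listed in the README can be sketched in plain TypeScript. This is a minimal illustration under stated assumptions, not the package's Lua script: an in-memory array stands in for the Redis sorted set, every name (`tryReserve`, `Entry`, `Limits`, `Decision`) is hypothetical, and step 5 is simplified to "when the oldest entry expires", which may be too early if the token limit is the one being hit.

```typescript
// Illustrative only: an array stands in for the Redis sorted set; all names are hypothetical.
interface Entry { at: number; tokens: number }
interface Limits { rpm: number; itpm: number; windowMs: number }
type Decision = { allowed: true } | { allowed: false; retryAt: number }

function tryReserve(entries: Entry[], limits: Limits, now: number, tokens: number): Decision {
  // Step 1: drop entries that have left the window (ZREMRANGEBYSCORE in Redis).
  // Entries are appended in time order, so expired ones sit at the front.
  const cutoff = now - limits.windowMs
  let expired = 0
  for (const entry of entries) {
    if (entry.at > cutoff) break
    expired++
  }
  entries.splice(0, expired)

  // Step 2: count the requests and sum the input tokens still inside the window.
  const requests = entries.length
  const usedTokens = entries.reduce((sum, entry) => sum + entry.tokens, 0)

  // Step 3: the new request must fit under both the RPM and the ITPM limit.
  const fits = requests < limits.rpm && usedTokens + tokens <= limits.itpm

  if (fits) {
    // Step 4: reserve the slot (ZADD in Redis) and return immediately.
    entries.push({ at: now, tokens })
    return { allowed: true }
  }

  // Step 5 (simplified): the next slot opens when the oldest entry leaves the window.
  const oldest = entries[0]
  return { allowed: false, retryAt: (oldest ? oldest.at : now) + limits.windowMs }
}

const demoEntries: Entry[] = []
const demoLimits: Limits = { rpm: 2, itpm: 1_000, windowMs: 60_000 }
const first = tryReserve(demoEntries, demoLimits, 0, 100)       // allowed
const second = tryReserve(demoEntries, demoLimits, 1_000, 100)  // allowed
const third = tryReserve(demoEntries, demoLimits, 2_000, 100)   // blocked, retryAt 60000
const fourth = tryReserve(demoEntries, demoLimits, 60_000, 100) // allowed again
console.log(first, second, third, fourth)
```

In Redis the same sequence has to run inside a single script so that two instances cannot both pass the check before either one reserves; the single-threaded sketch gets that property for free.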
+The local queue (priority ordering, drain timer, timeout handling) stays in-memory per instance — only the window counters are shared.
+
+**Options:**
+
+```typescript
+new RedisStore(redis, {
+  keyPrefix: 'rl:myapp:', // namespace if multiple apps share Redis
+  windowMs: 60_000, // window size; match your provider's limit window
+})
+```
+
+**Compatible clients** — any Redis client with `eval()`, `get()`, and `set()` works: `ioredis`, `node-redis`, Upstash Redis.
+
+**Single-instance deployments:** the default `InMemoryStore` is more accurate (true sliding window, no network round-trips) and zero-config. Only switch to `RedisStore` when you actually need cross-instance coordination.
+
+---
+
+## Raw SDK proxy
+
+If you're using the OpenAI, Anthropic, Groq, Mistral, or Cohere SDK directly — without the Vercel AI SDK — use `limiter.rawProxy()` to add rate limiting as a transparent drop-in:
+
+```typescript
+import { createRateLimiter } from 'ai-sdk-rate-limiter'
+import OpenAI from 'openai'
+import Anthropic from '@anthropic-ai/sdk'
+
+const limiter = createRateLimiter({
+  cost: { budget: { daily: 50 }, onExceeded: 'throw' },
+  on: { rateLimited: ({ model }) => console.warn(`${model} rate limited`) },
+})
+
+// Every API call goes through the same rate limiter and cost tracker
+const openai = limiter.rawProxy(new OpenAI())
+const anthropic = limiter.rawProxy(new Anthropic())
+
+// Use exactly as before — no other changes needed
+const completion = await openai.chat.completions.create({
+  model: 'gpt-4o',
+  messages: [{ role: 'user', content: 'Hello!' }],
+})
+
+const message = await anthropic.messages.create({
+  model: 'claude-opus-4-6',
+  max_tokens: 1024,
+  messages: [{ role: 'user', content: 'Hello!' }],
+})
+
+// Cost from both clients tracked together
+const report = limiter.getCostReport()
+```
+
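The README describes `rawProxy()` as a transparent JavaScript `Proxy`. How the package decides which calls to gate is not shown in this diff, so the following is only a generic sketch of the technique: a recursive `Proxy` that passes property reads through and runs a hook before any method call, including nested ones such as `chat.completions.create`. The names `interceptCalls`, `beforeCall`, and `fakeClient` are hypothetical.

```typescript
// Generic sketch of a "transparent" proxy; not the package's implementation.
function interceptCalls<T extends object>(
  target: T,
  beforeCall: (path: string) => void,
  path = '',
): T {
  return new Proxy(target, {
    get(obj, prop, receiver) {
      const value = Reflect.get(obj, prop, receiver)
      // Leave symbol-keyed lookups (iterators, inspection hooks) untouched.
      if (typeof prop === 'symbol') return value
      const here = path ? `${path}.${prop}` : prop
      if (typeof value === 'function') {
        // Run the hook, then call the real method with its original `this`.
        return (...args: unknown[]) => {
          beforeCall(here)
          return value.apply(obj, args)
        }
      }
      if (value !== null && typeof value === 'object') {
        // Wrap nested namespaces lazily so deep method calls are intercepted too.
        return interceptCalls(value, beforeCall, here)
      }
      return value
    },
  })
}

// A stand-in client with the same nesting shape as `openai.chat.completions.create`.
const fakeClient = {
  chat: { completions: { create: (req: { model: string }) => `ok:${req.model}` } },
}
const seenPaths: string[] = []
const wrappedClient = interceptCalls(fakeClient, (p) => seenPaths.push(p))
const reply = wrappedClient.chat.completions.create({ model: 'gpt-4o' })
console.log(reply, seenPaths)
```

Because the wrapper keeps the target's type (`T`), callers see the same method signatures as on the unwrapped client, which is what makes a drop-in replacement possible.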
+**Streaming works too** — the proxy wraps the returned `AsyncIterable` to capture the final usage chunk automatically:
+
+```typescript
+const stream = await openai.chat.completions.create({
+  model: 'gpt-4o',
+  messages: [{ role: 'user', content: 'Stream this' }],
+  stream: true,
+  stream_options: { include_usage: true }, // tells OpenAI to include usage in final chunk
+})
+
+for await (const chunk of stream) {
+  process.stdout.write(chunk.choices[0]?.delta?.content ?? '')
+}
+// After the loop, usage is recorded in limiter.getCostReport()
+```
+
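Wrapping an `AsyncIterable` so that usage is captured as a side effect can be done with an async generator. The sketch below is an assumption about the general approach, not the package's code: `captureUsage`, `Chunk`, `Usage`, and `fakeStream` are hypothetical names, and the chunk shape is reduced to a `text` field plus an optional `usage` object.

```typescript
// Hypothetical chunk shape: real SDK chunks carry more fields.
interface Usage { prompt_tokens: number; completion_tokens: number }
interface Chunk { text: string; usage?: Usage }

// Re-yields every chunk unchanged and reports the last usage object it saw.
async function* captureUsage<T extends { usage?: Usage | null }>(
  source: AsyncIterable<T>,
  onUsage: (usage: Usage) => void,
): AsyncGenerator<T> {
  let last: Usage | undefined
  for await (const chunk of source) {
    if (chunk.usage) last = chunk.usage
    yield chunk
  }
  // Reached only when the consumer reads the stream to the end.
  if (last) onUsage(last)
}

// A stand-in stream whose final chunk carries the usage totals.
async function* fakeStream(): AsyncGenerator<Chunk> {
  yield { text: 'Hel' }
  yield { text: 'lo' }
  yield { text: '', usage: { prompt_tokens: 3, completion_tokens: 2 } }
}

const recorded: Usage[] = []
async function runDemo(): Promise<string> {
  let text = ''
  for await (const chunk of captureUsage(fakeStream(), (usage) => recorded.push(usage))) {
    text += chunk.text
  }
  return text
}
```

Calling `runDemo()` resolves to the concatenated text and leaves one entry in `recorded`. In this sketch a consumer that breaks out of the loop early never triggers `onUsage`, which is one reason a real implementation would need to decide how to account for abandoned streams.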
+**Zero-config standalone version** — if you don't need to share the limiter with other models:
+
+```typescript
+import { rateLimited } from 'ai-sdk-rate-limiter'
+
+const openai = rateLimited(new OpenAI(), {
+  config: { cost: { budget: { daily: 20 } } },
+})
+```
+
+**Provider is auto-detected** from the client's constructor name (`OpenAI`, `Anthropic`, `Groq`, etc.). Override it explicitly if needed:
+
+```typescript
+const client = limiter.rawProxy(new OpenAI({ baseURL: 'https://api.groq.com/openai/v1' }), {
+  provider: 'groq', // use Groq's limits and pricing instead of OpenAI's
+})
+```
+
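Detection by constructor name, with an explicit override taking precedence, might look like the sketch below. It is an assumption about the general idea only: `detectProvider` and the lookup table are hypothetical, the table lists just the three constructor names the README mentions, and the empty classes are stand-ins for the real SDK clients.

```typescript
type Provider = 'openai' | 'anthropic' | 'groq'

// Only the constructor names named in the README; a real table would cover more providers.
const PROVIDER_BY_CONSTRUCTOR = new Map<string, Provider>([
  ['OpenAI', 'openai'],
  ['Anthropic', 'anthropic'],
  ['Groq', 'groq'],
])

function detectProvider(client: object, override?: Provider): Provider | undefined {
  // An explicit override always wins, e.g. an OpenAI client pointed at another provider's endpoint.
  if (override) return override
  // Otherwise fall back to the class name; undefined means "unknown provider".
  return PROVIDER_BY_CONSTRUCTOR.get(client.constructor.name)
}

// Stand-in classes so the example runs without the real SDKs installed.
class OpenAI {}
class SomethingElse {}

const detected = detectProvider(new OpenAI())
const overridden = detectProvider(new OpenAI(), 'groq')
const unknown = detectProvider(new SomethingElse())
console.log(detected, overridden, unknown)
```

Constructor names can be renamed by minifiers that mangle class names, so any scheme of this kind benefits from the explicit `provider` option as a fallback.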
+---
+
 ## Backpressure — know before you send
 
 Check estimated wait time before committing to a request. Useful for showing loading states or shedding load gracefully.
@@ -449,6 +568,7 @@ console.log(isKnownModel('my-fine-tune', 'openai'))
 | | ai-sdk-rate-limiter | bottleneck | p-limit | SDK built-in retry | LangChain |
 |---|:---:|:---:|:---:|:---:|:---:|
 | Vercel AI SDK `.wrap()` | yes | no | no | — | no |
+| Raw SDK proxy | yes | no | no | — | no |
 | Model-aware limits | yes | no | no | no | partial |
 | ITPM / token tracking | yes | no | no | no | no |
 | Priority queue | yes | yes | no | no | no |