ai-sdk-rate-limiter 0.3.0 → 0.5.0

package/README.md CHANGED
@@ -52,7 +52,9 @@ The wrapped model is a drop-in replacement. Every Vercel AI SDK feature — stre
 
  **Cost tracking** — Records actual token usage from every response. Reports hourly, daily, and monthly spend per model. Optionally enforces budget caps.
 
- **Built-in model registry** — Knows the RPM, ITPM, and per-token pricing for every major OpenAI, Anthropic, and Google model out of the box. Nothing to configure to get started.
+ **Built-in model registry** — Knows the RPM, ITPM, and per-token pricing for every major OpenAI, Anthropic, Google, Groq, Mistral, and Cohere model out of the box. Nothing to configure to get started.
+
+ **Raw SDK support** — Works with the native OpenAI, Anthropic, Groq, Mistral, and Cohere SDKs directly via a transparent JavaScript Proxy. No Vercel AI SDK required.
 
  ---
 
@@ -212,6 +214,123 @@ const result = await generateText({ model, prompt })
 
  ---
 
+ ## Multi-instance Redis store
+
+ By default, rate limit state is in-memory (per-process). In multi-instance deployments — serverless functions, multiple pods, workers — each instance has its own counters. Install the Redis store to share state across all instances:
+
+ ```
+ npm install ioredis
+ ```
+
+ ```typescript
+ import { createRateLimiter } from 'ai-sdk-rate-limiter'
+ import { RedisStore } from 'ai-sdk-rate-limiter/redis'
+ import Redis from 'ioredis'
+
+ const limiter = createRateLimiter({
+   store: new RedisStore(new Redis(process.env.REDIS_URL)),
+   // ... rest of your config
+ })
+ ```
+
+ That's the entire change. All APIs — `wrap()`, `rawProxy()`, events, cost reports — work identically. The Redis store enforces rate limits collectively so no two instances can jointly exceed the API limits.
+
+ **How it works:**
+
+ Each request atomically runs a Lua script that:
+ 1. Removes entries older than 60 seconds from a sorted set (`ZREMRANGEBYSCORE`)
+ 2. Counts remaining requests and sums input tokens
+ 3. Checks against RPM and ITPM limits
+ 4. If allowed: reserves the slot (`ZADD`) and returns immediately
+ 5. If blocked: returns the timestamp when the next slot opens
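
Editorial sketch (not part of the diff): the five steps above can be illustrated with a plain in-memory array standing in for the Redis sorted set. This is not the package's Lua script. The `SlidingWindow` class, the `tryAcquire` method, and the `Decision` type are hypothetical names chosen for illustration, and the retry time is a simplification that only looks at when the oldest entry expires.

```typescript
// Hypothetical in-memory stand-in for the Redis sorted set.
interface Entry {
  timestamp: number
  inputTokens: number
}

type Decision = { allowed: true } | { allowed: false; retryAt: number }

class SlidingWindow {
  private entries: Entry[] = []
  private readonly rpm: number
  private readonly itpm: number
  private readonly windowMs: number

  constructor(rpm: number, itpm: number, windowMs = 60_000) {
    this.rpm = rpm
    this.itpm = itpm
    this.windowMs = windowMs
  }

  // `now` is passed in so the behaviour is deterministic and easy to test.
  tryAcquire(now: number, inputTokens: number): Decision {
    // Step 1: drop entries that have aged out of the window.
    this.entries = this.entries.filter((e) => e.timestamp > now - this.windowMs)

    // Step 2: count remaining requests and sum their input tokens.
    const requests = this.entries.length
    const tokens = this.entries.reduce((sum, e) => sum + e.inputTokens, 0)

    // Step 3: check both limits.
    const fits = requests + 1 <= this.rpm && tokens + inputTokens <= this.itpm

    if (fits) {
      // Step 4: reserve the slot.
      this.entries.push({ timestamp: now, inputTokens })
      return { allowed: true }
    }

    // Step 5 (simplified): the window next changes when the oldest entry expires.
    const oldest = this.entries[0]
    const retryAt = oldest ? oldest.timestamp + this.windowMs : now + this.windowMs
    return { allowed: false, retryAt }
  }
}
```

In Redis the same sequence has to run as one atomic script, since otherwise two instances could both pass the check in step 3 before either reserves a slot in step 4.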
+
+ The local queue (priority ordering, drain timer, timeout handling) stays in-memory per instance — only the window counters are shared.
+
+ **Options:**
+
+ ```typescript
+ new RedisStore(redis, {
+   keyPrefix: 'rl:myapp:', // namespace if multiple apps share Redis
+   windowMs: 60_000, // window size; match your provider's limit window
+ })
+ ```
+
+ **Compatible clients** — any Redis client with `eval()`, `get()`, and `set()` works: `ioredis`, `node-redis`, Upstash Redis.
+
+ **Single-instance deployments:** the default `InMemoryStore` is more accurate (true sliding window, no network round-trips) and zero-config. Only switch to `RedisStore` when you actually need cross-instance coordination.
+
+ ---
+
+ ## Raw SDK proxy
+
+ If you're using the OpenAI, Anthropic, Groq, Mistral, or Cohere SDK directly — without the Vercel AI SDK — use `limiter.rawProxy()` to add rate limiting as a transparent drop-in:
+
+ ```typescript
+ import { createRateLimiter } from 'ai-sdk-rate-limiter'
+ import OpenAI from 'openai'
+ import Anthropic from '@anthropic-ai/sdk'
+
+ const limiter = createRateLimiter({
+   cost: { budget: { daily: 50 }, onExceeded: 'throw' },
+   on: { rateLimited: ({ model }) => console.warn(`${model} rate limited`) },
+ })
+
+ // Every API call goes through the same rate limiter and cost tracker
+ const openai = limiter.rawProxy(new OpenAI())
+ const anthropic = limiter.rawProxy(new Anthropic())
+
+ // Use exactly as before — no other changes needed
+ const completion = await openai.chat.completions.create({
+   model: 'gpt-4o',
+   messages: [{ role: 'user', content: 'Hello!' }],
+ })
+
+ const message = await anthropic.messages.create({
+   model: 'claude-opus-4-6',
+   max_tokens: 1024,
+   messages: [{ role: 'user', content: 'Hello!' }],
+ })
+
+ // Cost from both clients tracked together
+ const report = limiter.getCostReport()
+ ```
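
Editorial sketch (not part of the diff): the README says the raw SDK support is built on a JavaScript `Proxy`. One way such a proxy could intercept nested calls like `chat.completions.create` is shown below. The `interceptCalls` helper and the `fakeClient` object are hypothetical and are not the package's implementation; a real limiter would wait for a slot in the `before` hook rather than just record the call path.

```typescript
type AnyFn = (...args: unknown[]) => unknown

// Hypothetical helper: wraps every method reachable from `target`, at any depth.
function interceptCalls<T extends object>(
  target: T,
  before: (path: string) => void,
  path: string[] = [],
): T {
  return new Proxy(target, {
    get(obj, prop) {
      // Read from the real object so getters see the original `this`.
      const value = Reflect.get(obj, prop)
      if (typeof prop === 'symbol') return value

      const nextPath = [...path, prop]
      if (typeof value === 'function') {
        // Run the hook, then call the original method on the original object.
        return (...args: unknown[]) => {
          before(nextPath.join('.'))
          return (value as AnyFn).apply(obj, args)
        }
      }
      if (value !== null && typeof value === 'object') {
        // Recurse so that nested namespaces are proxied as well.
        return interceptCalls(value as object, before, nextPath)
      }
      return value
    },
  })
}

// Stand-in for an SDK client, shaped like `openai.chat.completions.create`.
const fakeClient = {
  chat: {
    completions: {
      create: (req: { model: string }) => `reply from ${req.model}`,
    },
  },
}

const seenPaths: string[] = []
const wrappedClient = interceptCalls(fakeClient, (p) => seenPaths.push(p))
console.log(wrappedClient.chat.completions.create({ model: 'gpt-4o' }), seenPaths)
```

Because the proxy returns the same type `T` as the client it wraps, call sites keep their original type signatures, which is what makes this kind of wrapper a drop-in.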
+
+ **Streaming works too** — the proxy wraps the returned `AsyncIterable` to capture the final usage chunk automatically:
+
+ ```typescript
+ const stream = await openai.chat.completions.create({
+   model: 'gpt-4o',
+   messages: [{ role: 'user', content: 'Stream this' }],
+   stream: true,
+   stream_options: { include_usage: true }, // tells OpenAI to include usage in final chunk
+ })
+
+ for await (const chunk of stream) {
+   process.stdout.write(chunk.choices[0]?.delta?.content ?? '')
+ }
+ // After the loop, usage is recorded in limiter.getCostReport()
+ ```
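
Editorial sketch (not part of the diff): wrapping an `AsyncIterable` so that the usage chunk is captured while every chunk still reaches the caller can be done with an async generator. The `captureUsage` function and the trimmed-down `Chunk` and `Usage` types below are hypothetical; they only mirror the fields of OpenAI's streaming chunks that matter here, and the package's own wrapper may work differently.

```typescript
interface Usage {
  prompt_tokens: number
  completion_tokens: number
}

interface Chunk {
  choices: { delta?: { content?: string } }[]
  usage?: Usage | null
}

// Hypothetical wrapper: passes chunks through unchanged and reports the last
// usage object seen once the source stream is exhausted.
async function* captureUsage(
  source: AsyncIterable<Chunk>,
  onUsage: (usage: Usage) => void,
): AsyncGenerator<Chunk> {
  let last: Usage | undefined
  for await (const chunk of source) {
    if (chunk.usage) last = chunk.usage
    yield chunk
  }
  // Reached only when the consumer reads the stream to the end.
  if (last) onUsage(last)
}
```

In this sketch, a consumer that breaks out of the loop early never triggers `onUsage`, so usage is only recorded for streams that are read to completion.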
+
+ **Zero-config standalone version** — if you don't need to share the limiter with other models:
+
+ ```typescript
+ import { rateLimited } from 'ai-sdk-rate-limiter'
+
+ const openai = rateLimited(new OpenAI(), {
+   config: { cost: { budget: { daily: 20 } } },
+ })
+ ```
+
+ **Provider is auto-detected** from the client's constructor name (`OpenAI`, `Anthropic`, `Groq`, etc.). Override it explicitly if needed:
+
+ ```typescript
+ const client = limiter.rawProxy(new OpenAI({ baseURL: 'https://api.groq.com/openai/v1' }), {
+   provider: 'groq', // use Groq's limits and pricing instead of OpenAI's
+ })
+ ```
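
Editorial sketch (not part of the diff): detection by constructor name with an explicit override could look roughly like this. The `detectProvider` function and the `KNOWN_CONSTRUCTORS` table are hypothetical, and the table lists only the three class names the README mentions; the package's actual mapping is not shown in this diff.

```typescript
type Provider = 'openai' | 'anthropic' | 'groq' | 'mistral' | 'cohere'

// Hypothetical lookup table, limited to the constructor names the README lists.
const KNOWN_CONSTRUCTORS: Record<string, Provider> = {
  OpenAI: 'openai',
  Anthropic: 'anthropic',
  Groq: 'groq',
}

function detectProvider(client: object, override?: Provider): Provider | undefined {
  // An explicit override always wins, e.g. an OpenAI client pointed at Groq's base URL.
  if (override) return override
  // Otherwise fall back to the class name of the client instance.
  return KNOWN_CONSTRUCTORS[client.constructor.name]
}
```

The override matters for OpenAI-compatible endpoints, where the constructor name says `OpenAI` but the limits and pricing that apply belong to another provider.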
+
+ ---
+
  ## Backpressure — know before you send
 
  Check estimated wait time before committing to a request. Useful for showing loading states or shedding load gracefully.
@@ -449,6 +568,7 @@ console.log(isKnownModel('my-fine-tune', 'openai'))
  | | ai-sdk-rate-limiter | bottleneck | p-limit | SDK built-in retry | LangChain |
  |---|:---:|:---:|:---:|:---:|:---:|
  | Vercel AI SDK `.wrap()` | yes | no | no | — | no |
+ | Raw SDK proxy | yes | no | no | — | no |
  | Model-aware limits | yes | no | no | no | partial |
  | ITPM / token tracking | yes | no | no | no | no |
  | Priority queue | yes | yes | no | no | no |