tokenfirewall 1.0.2 → 2.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,25 +1,33 @@
  # TokenFirewall
 
- > Enterprise-grade LLM cost enforcement middleware for Node.js with automatic budget protection, multi-provider support, and intelligent cost tracking.
+ Enterprise-grade LLM cost enforcement middleware for Node.js with automatic budget protection, intelligent model routing, and comprehensive multi-provider support.
 
- [![npm version](https://img.shields.io/npm/v/tokenfirewall.svg)](https://www.npmjs.com/package/tokenfirewall)
- [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
- [![TypeScript](https://img.shields.io/badge/TypeScript-Ready-blue.svg)](https://www.typescriptlang.org/)
+ [![npm version](https://img.shields.io/npm/v/tokenfirewall.svg?style=flat-square)](https://www.npmjs.com/package/tokenfirewall)
+ [![npm downloads](https://img.shields.io/npm/dm/tokenfirewall.svg?style=flat-square)](https://www.npmjs.com/package/tokenfirewall)
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg?style=flat-square)](https://opensource.org/licenses/MIT)
+ [![TypeScript](https://img.shields.io/badge/TypeScript-Ready-blue.svg?style=flat-square)](https://www.typescriptlang.org/)
 
  ## Overview
 
- TokenFirewall is a production-ready middleware that automatically tracks and enforces budget limits for Large Language Model (LLM) API calls. It provides transparent cost monitoring, prevents budget overruns, and supports multiple providers through a unified interface.
+ TokenFirewall is a production-ready middleware that automatically tracks and enforces budget limits for Large Language Model (LLM) API calls. It provides transparent cost monitoring and intelligent model routing with automatic failover, prevents budget overruns, and supports multiple providers through a unified interface.
 
  ### Key Features
 
- - **Automatic Budget Enforcement** - Block or warn when spending limits are exceeded
- - **Real-time Cost Tracking** - Automatic calculation based on actual token usage
- - **Multi-Provider Support** - Works with OpenAI, Anthropic, Gemini, Grok, Kimi, and custom providers
- - **Model Discovery** - List available models with context limits and pricing
- - **Budget Persistence** - Save and restore budget state across restarts
- - **Zero Configuration** - Works out-of-the-box with sensible defaults
- - **Production Ready** - Comprehensive error handling and validation
- - **TypeScript Native** - Full type definitions included
+ - **Never Exceed Your Budget** - Automatically blocks API calls when spending limits are reached, preventing surprise bills
+ - **Zero Code Changes Required** - Drop-in middleware that works with any LLM API without modifying your existing code
+ - **Automatic Failover** - Intelligent router switches to backup models when the primary fails, keeping your app running
+ - **Real-time Cost Tracking** - See exactly how much each API call costs based on actual token usage
+ - **Multi-Provider Support** - Works with OpenAI, Anthropic, Gemini, Grok, Kimi, and any custom LLM provider
+ - **Custom Model Support** - Register your own models with custom pricing and context limits at runtime
+ - **Production Ready** - Battle-tested with comprehensive error handling and edge case coverage
+ - **TypeScript Native** - Full type safety with included definitions
+
+ ### What's New in v2.0.0
+
+ - **Intelligent Router** - Automatic failover to backup models when API calls fail
+ - **40+ Latest Models** - GPT-5, Claude 4.5, Gemini 3, with accurate 2026 pricing
+ - **Dynamic Registration** - Add custom models and pricing at runtime
+ - **Production Hardened** - Comprehensive validation, error handling, and edge case coverage
 
  ---
 
@@ -29,18 +37,13 @@ TokenFirewall is a production-ready middleware that automatically tracks and enf
  - [Quick Start](#quick-start)
  - [Core Concepts](#core-concepts)
  - [API Reference](#api-reference)
- - [Budget Management](#budget-management)
- - [Interception](#interception)
- - [Model Discovery](#model-discovery)
- - [Custom Providers](#custom-providers)
- - [Budget Persistence](#budget-persistence)
+ - [Intelligent Model Router](#intelligent-model-router)
+ - [Dynamic Model Registration](#dynamic-model-registration)
  - [Supported Providers](#supported-providers)
- - [Use Cases](#use-cases)
  - [Examples](#examples)
  - [TypeScript Support](#typescript-support)
  - [Error Handling](#error-handling)
  - [Best Practices](#best-practices)
- - [Contributing](#contributing)
  - [License](#license)
 
  ---
@@ -71,7 +74,7 @@ createBudgetGuard({
  // Step 2: Patch global fetch
  patchGlobalFetch();
 
- // Step 3: Use any LLM API normally - tokenfirewall handles the rest
+ // Step 3: Use any LLM API normally
  const response = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
@@ -93,7 +96,7 @@ const response = await fetch("https://api.openai.com/v1/chat/completions", {
 
  ### Budget Guard
 
- The Budget Guard is the core component that tracks spending and enforces limits. It operates in two modes:
+ The Budget Guard tracks spending and enforces limits in two modes:
 
  - **Block Mode** (`mode: "block"`): Throws an error when budget is exceeded, preventing the API call
  - **Warn Mode** (`mode: "warn"`): Logs a warning but allows the API call to proceed
@@ -126,17 +129,12 @@ Creates and configures a budget guard instance.
 
  **Parameters:**
 
- | Parameter | Type | Required | Description |
- |-----------|------|----------|-------------|
- | `options` | `BudgetGuardOptions` | Yes | Budget configuration object |
- | `options.monthlyLimit` | `number` | Yes | Maximum spending limit in USD |
- | `options.mode` | `"block" \| "warn"` | No | Enforcement mode (default: `"block"`) |
-
- **Returns:** `BudgetManager` - The budget manager instance
-
- **Throws:**
- - `Error` if `monthlyLimit` is not a positive number
- - `Error` if `mode` is not "block" or "warn"
+ ```typescript
+ interface BudgetGuardOptions {
+   monthlyLimit: number;    // Maximum spending limit in USD
+   mode?: "block" | "warn"; // Enforcement mode (default: "block")
+ }
+ ```
 
  **Example:**
 
@@ -144,32 +142,25 @@ Creates and configures a budget guard instance.
  const { createBudgetGuard } = require("tokenfirewall");
 
  // Block mode - strict enforcement
- const guard = createBudgetGuard({
+ createBudgetGuard({
    monthlyLimit: 100,
    mode: "block"
  });
 
  // Warn mode - soft limits
- const guard = createBudgetGuard({
+ createBudgetGuard({
    monthlyLimit: 500,
    mode: "warn"
  });
  ```
 
- **Notes:**
- - Calling `createBudgetGuard()` multiple times will replace the existing guard
- - A warning is logged when overwriting an existing guard
- - The guard is global and applies to all subsequent API calls
-
  ---
 
  #### `getBudgetStatus()`
 
  Retrieves the current budget status and usage statistics.
 
- **Parameters:** None
-
- **Returns:** `BudgetStatus | null`
+ **Returns:**
 
  ```typescript
  interface BudgetStatus {
@@ -186,152 +177,61 @@ interface BudgetStatus {
  const { getBudgetStatus } = require("tokenfirewall");
 
  const status = getBudgetStatus();
-
  if (status) {
    console.log(`Spent: $${status.totalSpent.toFixed(2)}`);
    console.log(`Remaining: $${status.remaining.toFixed(2)}`);
    console.log(`Usage: ${status.percentageUsed.toFixed(1)}%`);
-
-   // Alert if over 80%
-   if (status.percentageUsed > 80) {
-     console.warn("⚠️ Budget usage is high!");
-   }
  }
  ```
 
- **Returns `null` if:**
- - No budget guard has been created
- - Budget guard was not initialized
-
  ---
 
  #### `resetBudget()`
 
- Resets the budget tracking to zero, clearing all accumulated costs.
-
- **Parameters:** None
-
- **Returns:** `void`
-
- **Example:**
+ Resets the budget tracking to zero.
 
  ```javascript
- const { resetBudget, getBudgetStatus } = require("tokenfirewall");
+ const { resetBudget } = require("tokenfirewall");
 
  // Reset at the start of each month
- function monthlyReset() {
-   resetBudget();
-   console.log("Budget reset for new month");
-
-   const status = getBudgetStatus();
-   console.log(`New budget: $${status.limit}`);
- }
-
- // Schedule monthly reset
- const cron = require("node-cron");
- cron.schedule("0 0 1 * *", monthlyReset); // First day of month
+ resetBudget();
  ```
 
- **Use Cases:**
- - Monthly budget resets
- - Testing and development
- - Per-session budgets
- - Tenant-specific resets
-
  ---
 
- ### Interception
-
- #### `patchGlobalFetch()`
-
- Patches the global `fetch` function to intercept and track LLM API calls.
+ #### `exportBudgetState()` / `importBudgetState(state)`
 
- **Parameters:** None
-
- **Returns:** `void`
-
- **Example:**
+ Save and restore budget state for persistence.
 
  ```javascript
- const { patchGlobalFetch } = require("tokenfirewall");
+ const { exportBudgetState, importBudgetState } = require("tokenfirewall");
+ const fs = require("fs");
 
- // Patch once at application startup
- patchGlobalFetch();
+ // Export state
+ const state = exportBudgetState();
+ fs.writeFileSync("budget.json", JSON.stringify(state));
 
- // All subsequent fetch calls are intercepted
- await fetch("https://api.openai.com/v1/chat/completions", { /* ... */ });
- await fetch("https://api.anthropic.com/v1/messages", { /* ... */ });
+ // Import state
+ const savedState = JSON.parse(fs.readFileSync("budget.json"));
+ importBudgetState(savedState);
  ```
 
- **Behavior:**
- - Intercepts all `fetch` calls globally
- - Only processes LLM API responses (non-LLM calls are ignored)
- - Automatically detects provider from response format
- - Calculates costs and tracks against budget
- - Logs usage information to console
- - Can be called multiple times safely (idempotent)
-
- **Important Notes:**
- - Must be called AFTER `createBudgetGuard()`
- - Works with official SDKs that use `fetch` internally
- - Does not affect non-LLM HTTP requests
- - Minimal performance overhead
-
  ---
 
- #### `unpatchGlobalFetch()`
-
- Restores the original `fetch` function, disabling interception.
-
- **Parameters:** None
+ ### Interception
 
- **Returns:** `void`
+ #### `patchGlobalFetch()`
 
- **Example:**
+ Patches the global `fetch` function to intercept and track LLM API calls.
 
  ```javascript
- const { patchGlobalFetch, unpatchGlobalFetch } = require("tokenfirewall");
+ const { patchGlobalFetch } = require("tokenfirewall");
 
- // Enable tracking
  patchGlobalFetch();
 
- // ... make some API calls ...
-
- // Disable tracking
- unpatchGlobalFetch();
-
- // Subsequent calls are not tracked
- ```
-
- **Use Cases:**
- - Temporarily disable tracking
- - Testing specific scenarios
- - Cleanup in test suites
-
- ---
-
- #### `patchProvider(providerName)`
-
- Patches a specific provider SDK (currently placeholder - most providers work via fetch interception).
-
- **Parameters:**
-
- | Parameter | Type | Required | Description |
- |-----------|------|----------|-------------|
- | `providerName` | `string` | Yes | Provider name ("openai", "anthropic", etc.) |
-
- **Returns:** `void`
-
- **Example:**
-
- ```javascript
- const { patchProvider } = require("tokenfirewall");
-
- patchProvider("openai");
+ // All subsequent fetch calls are intercepted
  ```
 
- **Note:** Most providers work automatically with `patchGlobalFetch()`. This function is reserved for future provider-specific integrations.
-
  ---
 
  ### Model Discovery
@@ -342,21 +242,12 @@ Lists available models from a provider with context limits and budget informatio
 
  **Parameters:**
 
- | Parameter | Type | Required | Description |
- |-----------|------|----------|-------------|
- | `options` | `ListModelsOptions` | Yes | Discovery options |
- | `options.provider` | `string` | Yes | Provider name ("openai", "gemini", "grok", "kimi") |
- | `options.apiKey` | `string` | Yes | Provider API key |
- | `options.baseURL` | `string` | No | Custom API endpoint URL |
- | `options.includeBudgetUsage` | `boolean` | No | Include current budget usage % (default: false) |
-
- **Returns:** `Promise<ModelInfo[]>`
-
  ```typescript
- interface ModelInfo {
-   model: string;                  // Model identifier
-   contextLimit?: number;          // Context window size in tokens
-   budgetUsagePercentage?: number; // Current budget usage (if requested)
+ interface ListModelsOptions {
+   provider: string;             // Provider name
+   apiKey: string;               // Provider API key
+   baseURL?: string;             // Custom API endpoint
+   includeBudgetUsage?: boolean; // Include budget usage %
  }
  ```
 
@@ -365,7 +256,6 @@ interface ModelInfo {
  ```javascript
  const { listModels } = require("tokenfirewall");
 
- // Discover OpenAI models
  const models = await listModels({
    provider: "openai",
    apiKey: process.env.OPENAI_API_KEY,
@@ -373,305 +263,156 @@ const models = await listModels({
  });
 
  models.forEach(model => {
-   console.log(`Model: ${model.model}`);
-   if (model.contextLimit) {
-     console.log(`  Context: ${model.contextLimit.toLocaleString()} tokens`);
-   }
-   if (model.budgetUsagePercentage !== undefined) {
-     console.log(`  Budget Used: ${model.budgetUsagePercentage.toFixed(2)}%`);
-   }
+   console.log(`${model.model}: ${model.contextLimit} tokens`);
  });
-
- // Find models with large context windows
- const largeContext = models.filter(m => m.contextLimit && m.contextLimit > 100000);
  ```
 
- **Supported Providers:**
- - `"openai"` - Fetches from OpenAI API
- - `"gemini"` - Fetches from Google Gemini API
- - `"grok"` - Fetches from X.AI API
- - `"kimi"` - Fetches from Moonshot AI API
- - `"anthropic"` - Returns static list (no API endpoint available)
-
- **Error Handling:**
- - Returns empty array if API call fails
- - Logs warning on errors
- - Has 10-second timeout to prevent hanging
-
  ---
 
- #### `listAvailableModels(options)`
-
- Lower-level model discovery function with manual budget manager injection.
-
- **Parameters:**
-
- | Parameter | Type | Required | Description |
- |-----------|------|----------|-------------|
- | `options` | `ListModelsOptions` | Yes | Discovery options (same as `listModels`) |
- | `options.budgetManager` | `BudgetManager` | No | Manual budget manager instance |
-
- **Returns:** `Promise<ModelInfo[]>`
-
- **Example:**
-
- ```javascript
- const { listAvailableModels, createBudgetGuard } = require("tokenfirewall");
-
- const manager = createBudgetGuard({ monthlyLimit: 100, mode: "warn" });
+ ## Intelligent Model Router
 
- const models = await listAvailableModels({
-   provider: "openai",
-   apiKey: process.env.OPENAI_API_KEY,
-   budgetManager: manager,
-   includeBudgetUsage: true
- });
- ```
+ The Model Router provides automatic retry and model switching on failures.
 
- **Note:** Use `listModels()` instead - it automatically passes the global budget manager.
+ ### `createModelRouter(options)`
 
- ---
-
- ### Custom Providers
-
- #### `registerAdapter(adapter)`
-
- Registers a custom provider adapter for tracking non-standard LLM APIs.
+ Creates and configures an intelligent model router.
 
  **Parameters:**
 
- | Parameter | Type | Required | Description |
- |-----------|------|----------|-------------|
- | `adapter` | `ProviderAdapter` | Yes | Custom adapter implementation |
-
  ```typescript
- interface ProviderAdapter {
-   name: string;                                                         // Unique provider name
-   detect: (response: unknown) => boolean;                               // Detect if response is from this provider
-   normalize: (response: unknown, request?: unknown) => NormalizedUsage; // Extract token usage
- }
-
- interface NormalizedUsage {
-   provider: string;     // Provider name
-   model: string;        // Model identifier
-   inputTokens: number;  // Input/prompt tokens
-   outputTokens: number; // Output/completion tokens
-   totalTokens: number;  // Total tokens
+ interface ModelRouterOptions {
+   strategy: "fallback" | "context" | "cost"; // Routing strategy
+   fallbackMap?: Record<string, string[]>;    // Fallback model map
+   maxRetries?: number;                       // Max retry attempts (default: 1)
  }
  ```
 
  **Example:**
 
  ```javascript
- const { registerAdapter } = require("tokenfirewall");
-
- // Register Ollama (self-hosted) adapter
- registerAdapter({
-   name: "ollama",
-
-   detect: (response) => {
-     return response &&
-       typeof response === "object" &&
-       response.model &&
-       response.prompt_eval_count !== undefined;
+ const { createModelRouter, patchGlobalFetch } = require("tokenfirewall");
+
+ // Fallback strategy - use predefined fallback models
+ createModelRouter({
+   strategy: "fallback",
+   fallbackMap: {
+     "gpt-4o": ["gpt-4o-mini", "gpt-3.5-turbo"],
+     "claude-3-5-sonnet-20241022": ["claude-3-5-haiku-20241022"]
    },
-
-   normalize: (response) => {
-     return {
-       provider: "ollama",
-       model: response.model,
-       inputTokens: response.prompt_eval_count || 0,
-       outputTokens: response.eval_count || 0,
-       totalTokens: (response.prompt_eval_count || 0) + (response.eval_count || 0)
-     };
-   }
+   maxRetries: 2
  });
 
- // Now Ollama calls are tracked
- const response = await fetch("http://localhost:11434/api/generate", {
-   method: "POST",
-   body: JSON.stringify({ model: "llama3.2", prompt: "Hello" })
- });
+ patchGlobalFetch();
+
+ // API calls will automatically retry with fallback models on failure
  ```
 
- **Validation:**
- - Adapter name must be a non-empty string
- - `detect()` must return boolean
- - `normalize()` must return valid `NormalizedUsage` object
- - Adapters are checked in registration order (first match wins)
+ ### Routing Strategies
 
- ---
+ **1. Fallback Strategy** - Uses predefined fallback map
+ - Tries models in order from fallbackMap
+ - Best for: Known model preferences, production resilience
 
- #### `registerPricing(provider, model, pricing)`
+ **2. Context Strategy** - Upgrades to larger context window
+ - Only triggers on context overflow errors
+ - Selects model with larger context from same provider
+ - Best for: Handling variable input sizes
 
- Registers custom pricing for a provider and model.
+ **3. Cost Strategy** - Switches to cheaper model
+ - Selects cheaper model from same provider
+ - Best for: Cost optimization, rate limit handling
 
- **Parameters:**
+ ### Error Detection
 
- | Parameter | Type | Required | Description |
- |-----------|------|----------|-------------|
- | `provider` | `string` | Yes | Provider name |
- | `model` | `string` | Yes | Model identifier |
- | `pricing` | `ModelPricing` | Yes | Pricing configuration |
+ The router automatically detects and classifies failures:
+ - `rate_limit` - HTTP 429 or rate limit errors
+ - `context_overflow` - Context length exceeded errors
+ - `model_unavailable` - HTTP 404 or model not found
+ - `access_denied` - HTTP 403 or unauthorized
+ - `unknown` - Other errors
 
- ```typescript
- interface ModelPricing {
-   input: number;  // Cost per 1M input tokens (USD)
-   output: number; // Cost per 1M output tokens (USD)
- }
- ```
+ ### `disableModelRouter()`
 
- **Example:**
+ Disables the model router.
 
  ```javascript
- const { registerPricing } = require("tokenfirewall");
-
- // Register pricing for custom model
- registerPricing("ollama", "llama3.2", {
-   input: 0.0, // Free (self-hosted)
-   output: 0.0
- });
+ const { disableModelRouter } = require("tokenfirewall");
 
- // Register pricing for new OpenAI model
- registerPricing("openai", "gpt-5", {
-   input: 5.0,  // $5 per 1M input tokens
-   output: 15.0 // $15 per 1M output tokens
- });
-
- // Override existing pricing
- registerPricing("openai", "gpt-4o", {
-   input: 2.0, // Custom pricing
-   output: 8.0
- });
+ disableModelRouter();
  ```
 
- **Validation:**
- - Provider and model must be non-empty strings
- - Input and output prices must be non-negative numbers
- - Prices cannot be NaN or Infinity
+ ---
 
- **Default Pricing:**
- TokenFirewall includes default pricing for:
- - OpenAI (GPT-4o, GPT-4o-mini, GPT-4-turbo, GPT-3.5-turbo)
- - Anthropic (Claude 3.5 Sonnet, Claude 3.5 Haiku, Claude 3 Opus)
- - Gemini (Gemini 2.0 Flash, Gemini 1.5 Pro, Gemini 1.5 Flash)
- - Grok (Grok-beta, Grok-2, Llama models)
- - Kimi (Moonshot v1 models)
+ ## Dynamic Model Registration
 
- ---
+ Register models with pricing and context limits at runtime.
 
- #### `registerContextLimit(provider, model, contextLimit)`
+ ### `registerModels(provider, models)`
 
- Registers custom context window limit for a model.
+ Bulk register models for a provider.
 
  **Parameters:**
 
- | Parameter | Type | Required | Description |
- |-----------|------|----------|-------------|
- | `provider` | `string` | Yes | Provider name |
- | `model` | `string` | Yes | Model identifier |
- | `contextLimit` | `number` | Yes | Context window size in tokens |
+ ```typescript
+ interface ModelConfig {
+   name: string;          // Model identifier
+   contextLimit?: number; // Context window size in tokens
+   pricing?: {            // Pricing per 1M tokens (USD)
+     input: number;
+     output: number;
+   };
+ }
+ ```
 
  **Example:**
 
  ```javascript
- const { registerContextLimit } = require("tokenfirewall");
-
- // Register context limit for custom model
- registerContextLimit("ollama", "llama3.2", 8192);
+ const { registerModels, createModelRouter } = require("tokenfirewall");
+
+ // Register custom models
+ registerModels("my-provider", [
+   {
+     name: "my-large-model",
+     contextLimit: 200000,
+     pricing: { input: 5.0, output: 15.0 }
+   },
+   {
+     name: "my-small-model",
+     contextLimit: 50000,
+     pricing: { input: 1.0, output: 3.0 }
+   }
+ ]);
 
- // Register for new model
- registerContextLimit("openai", "gpt-5", 256000);
+ // Router will use dynamically registered models
+ createModelRouter({
+   strategy: "cost",
+   maxRetries: 2
+ });
  ```
 
- **Validation:**
- - Provider and model must be non-empty strings
- - Context limit must be a positive number
- - Cannot be NaN or Infinity
-
- ---
-
- ### Budget Persistence
+ ### `registerPricing(provider, model, pricing)`
 
- #### `exportBudgetState()`
-
- Exports the current budget state for persistence.
-
- **Parameters:** None
-
- **Returns:** `{ totalSpent: number; limit: number; mode: string } | null`
-
- **Example:**
+ Register custom pricing for a specific model.
 
  ```javascript
- const { exportBudgetState } = require("tokenfirewall");
- const fs = require("fs");
-
- // Export state
- const state = exportBudgetState();
+ const { registerPricing } = require("tokenfirewall");
 
- if (state) {
-   // Save to file
-   fs.writeFileSync("budget-state.json", JSON.stringify(state, null, 2));
-
-   // Or save to database
-   await db.budgets.update({ id: "main" }, state);
-
-   // Or save to Redis
-   await redis.set("budget:state", JSON.stringify(state));
- }
+ registerPricing("openai", "gpt-5", {
+   input: 5.0,  // $5 per 1M input tokens
+   output: 15.0 // $15 per 1M output tokens
+ });
  ```
 
- **Returns `null` if:**
- - No budget guard has been created
-
- ---
-
- #### `importBudgetState(state)`
-
- Imports and restores a previously saved budget state.
-
- **Parameters:**
-
- | Parameter | Type | Required | Description |
- |-----------|------|----------|-------------|
- | `state` | `{ totalSpent: number }` | Yes | Saved budget state |
-
- **Returns:** `void`
+ ### `registerContextLimit(provider, model, contextLimit)`
 
- **Throws:**
- - `Error` if no budget guard exists
- - `Error` if `totalSpent` is not a valid number
- - `Error` if `totalSpent` is negative
-
- **Example:**
+ Register custom context window limit.
 
  ```javascript
- const { importBudgetState, createBudgetGuard } = require("tokenfirewall");
- const fs = require("fs");
-
- // Create budget guard first
- createBudgetGuard({ monthlyLimit: 100, mode: "block" });
-
- // Load from file
- if (fs.existsSync("budget-state.json")) {
-   const state = JSON.parse(fs.readFileSync("budget-state.json", "utf8"));
-   importBudgetState(state);
-   console.log("Budget state restored");
- }
+ const { registerContextLimit } = require("tokenfirewall");
 
- // Or load from database
- const state = await db.budgets.findOne({ id: "main" });
- if (state) {
-   importBudgetState(state);
- }
+ registerContextLimit("openai", "gpt-5", 256000);
  ```
 
- **Validation:**
- - Validates `totalSpent` is a valid number
- - Rejects negative values
- - Warns if imported value is suspiciously large (>10x limit)
-
  ---
 
  ## Supported Providers
@@ -680,41 +421,46 @@ TokenFirewall includes built-in support for:
 
  | Provider | Models | Pricing | Discovery |
  |----------|--------|---------|-----------|
- | **OpenAI** | GPT-4o, GPT-4o-mini, GPT-4-turbo, GPT-3.5-turbo | Included | API |
- | **Anthropic** | Claude 3.5 Sonnet, Claude 3.5 Haiku, Claude 3 Opus | Included | Static |
- | **Google Gemini** | Gemini 2.0 Flash, Gemini 1.5 Pro, Gemini 1.5 Flash | Included | API |
- | **Grok (X.AI)** | Grok-beta, Grok-2, Llama 3.x models | Included | API |
- | **Kimi (Moonshot)** | Moonshot v1 (8k, 32k, 128k) | Included | API |
- | **Custom** | Any LLM API | ⚙️ Register | ⚙️ Custom |
-
- ---
-
- ## Use Cases
-
- ### 1. Production Applications
- - Prevent unexpected cost spikes
- - Enforce spending limits per tenant/user
- - Track costs across multiple providers
-
- ### 2. Development & Testing
- - Limit test suite costs
- - Prevent accidental expensive calls
- - Safe experimentation with new models
-
- ### 3. Multi-Tenant SaaS
- - Per-customer budget limits
- - Tiered pricing enforcement
- - Usage-based billing
-
- ### 4. AI Agent Systems
- - Prevent runaway agent loops
- - Budget-aware task planning
- - Cost-optimized model selection
-
- ### 5. Internal Tools
- - Department-level budgets
- - Employee usage tracking
- - Cost allocation and reporting
+ | **OpenAI** | GPT-5, GPT-5-mini, GPT-4.1, GPT-4o, o1, gpt-image-1 | Included | API |
+ | **Anthropic** | Claude 4.5 (Opus, Sonnet, Haiku), Claude 4, Claude 3.5 | Included | Static |
+ | **Google Gemini** | Gemini 3, Gemini 3.1, Gemini 2.5, Nano Banana | Included | API |
+ | **Grok (X.AI)** | Grok 3, Grok 2, Grok Vision | Included | API |
+ | **Kimi (Moonshot)** | Moonshot v1 (8k, 32k, 128k) | Included | API |
+ | **Meta** | Llama 3.3, Llama 3.1 | Included | Static |
+ | **Mistral** | Mistral Large, Mixtral | Included | Static |
+ | **Cohere** | Command R+, Command R | Included | Static |
+ | **Custom** | Any LLM API | Register | Custom |
+
+ ### Pricing (Per 1M Tokens)
+
+ **OpenAI:**
+ - GPT-5: $1.25 / $10.00
+ - GPT-5-mini: $0.25 / $2.00
+ - GPT-4.1: $2.00 / $8.00
+ - GPT-4o: $2.50 / $10.00
+ - o1: $15.00 / $60.00
+
+ **Anthropic:**
+ - Claude Opus 4.5: $5.00 / $25.00
+ - Claude Sonnet 4.5: $3.00 / $15.00
+ - Claude Haiku 4.5: $1.00 / $5.00
+
+ **Gemini:**
+ - Gemini 3 Pro: $2.00 / $12.00
+ - Gemini 3 Flash: $0.50 / $3.00
+ - Gemini 2.5 Pro: $1.25 / $10.00
+ - Gemini 2.5 Flash: $0.30 / $2.50
+ - Gemini 2.5 Flash Lite: $0.10 / $0.40
+
+ *Pricing verified as of February 27, 2026. Standard tier, ≤200K input tokens.*
+
+ ### Context Limits
+
+ - GPT-5: 256K tokens
+ - GPT-4.1: 200K tokens
+ - Claude 4.5: 200K tokens
+ - Gemini 3 Pro: 2M tokens
+ - o1: 200K tokens
 
  ---
 
@@ -727,6 +473,8 @@ See the [`examples/`](./examples) directory for complete, runnable examples:
  3. **[Budget Persistence](./examples/3-budget-persistence.js)** - Save and restore state
  4. **[Custom Provider](./examples/4-custom-provider.js)** - Add your own LLM provider
  5. **[Model Discovery](./examples/5-model-discovery.js)** - Find and compare models
+ 6. **[Intelligent Routing](./examples/6-intelligent-routing.js)** - Automatic retry and fallback
+ 7. **[Dynamic Models](./examples/7-dynamic-models.js)** - Register models at runtime
 
  ---
 
@@ -739,11 +487,13 @@ import {
    createBudgetGuard,
    patchGlobalFetch,
    getBudgetStatus,
+   createModelRouter,
+   registerModels,
    BudgetGuardOptions,
    BudgetStatus,
    ModelInfo,
-   ProviderAdapter,
-   ModelPricing
+   ModelRouterOptions,
+   ModelConfig
  } from "tokenfirewall";
 
  // Full type safety
@@ -769,14 +519,12 @@ try {
769
519
  const response = await fetch(/* ... */);
770
520
  } catch (error) {
771
521
  if (error.message.includes("TokenFirewall: Budget exceeded")) {
772
- // Budget limit reached
773
522
  console.error("Monthly budget exhausted");
774
- // Notify user, upgrade prompt, etc.
775
- } else if (error.message.includes("TokenFirewall: Cost must be")) {
776
- // Invalid cost calculation (should not happen in normal use)
777
- console.error("Internal error:", error.message);
523
+ // Handle budget limit
524
+ } else if (error.message.includes("TokenFirewall Router: Max routing retries exceeded")) {
525
+ console.error("All fallback models failed");
526
+ // Handle routing failure
778
527
  } else {
779
- // Other errors (network, API, etc.)
780
528
  console.error("API error:", error.message);
781
529
  }
782
530
  }
@@ -788,15 +536,15 @@ try {
788
536
  |---------------|-------|----------|
789
537
  | `Budget exceeded! Would spend $X of $Y limit` | Budget limit reached | Increase limit or wait for reset |
790
538
  | `monthlyLimit must be a valid number` | Invalid budget configuration | Provide positive number |
791
- | `Cost must be a valid number` | Internal error | Report as bug |
539
+ | `Max routing retries exceeded` | All fallback models failed | Check API status or fallback map |
792
540
  | `No pricing found for model "X"` | Unknown model | Register custom pricing |
793
- | `Cannot import budget state - no budget guard exists` | Import before create | Call `createBudgetGuard()` first |
794
541
 
795
542
  ---
796
543
 
797
544
  ## Best Practices
798
545
 
799
546
  ### 1. Initialize Early
547
+
800
548
  ```javascript
801
549
  // At application startup
802
550
  createBudgetGuard({ monthlyLimit: 100, mode: "block" });
@@ -804,12 +552,14 @@ patchGlobalFetch();
804
552
  ```
805
553
 
806
554
  ### 2. Use Warn Mode in Development
555
+
807
556
  ```javascript
808
557
  const mode = process.env.NODE_ENV === "production" ? "block" : "warn";
809
558
  createBudgetGuard({ monthlyLimit: 100, mode });
810
559
  ```
811
560
 
812
561
  ### 3. Persist Budget State
562
+
813
563
  ```javascript
814
564
  // Save on exit
815
565
  process.on("beforeExit", () => {
@@ -819,29 +569,39 @@ process.on("beforeExit", () => {
819
569
  ```
820
570
 
821
571
  ### 4. Monitor Usage
572
+
822
573
  ```javascript
823
574
  // Alert at 80% usage
824
575
  const status = getBudgetStatus();
825
576
  if (status && status.percentageUsed > 80) {
826
- await sendAlert("Budget usage high!");
577
+ await sendAlert("Budget usage high");
827
578
  }
828
579
  ```
829
580
 
830
- ### 5. Reset Monthly
581
+ ### 5. Use Router for Resilience
582
+
831
583
  ```javascript
832
- // Automated monthly reset
833
- const cron = require("node-cron");
834
- cron.schedule("0 0 1 * *", () => {
835
- resetBudget();
836
- console.log("Budget reset for new month");
584
+ // Automatic fallback on failures
585
+ createModelRouter({
586
+ strategy: "fallback",
587
+ fallbackMap: {
588
+ "gpt-4o": ["gpt-4o-mini", "gpt-3.5-turbo"]
589
+ },
590
+ maxRetries: 2
837
591
  });
838
592
  ```
839
593
 
840
- ---
841
-
842
- ## Contributing
594
+ ### 6. Register Models Dynamically
843
595
 
844
- Contributions are welcome! Please see [CONTRIBUTING.md](./CONTRIBUTING.md) for guidelines.
596
+ ```javascript
597
+ // Discover and register models from API
598
+ const models = await discoverModels(apiKey);
599
+ registerModels("provider", models.map(m => ({
600
+ name: m.id,
601
+ contextLimit: m.context_window,
602
+ pricing: { input: m.input_price, output: m.output_price }
603
+ })));
604
+ ```
845
605
 
846
606
  ---
847
607
 
@@ -856,20 +616,8 @@ MIT © [Ruthwik](https://github.com/Ruthwik000)
856
616
  - **GitHub:** https://github.com/Ruthwik000/tokenfirewall
857
617
  - **npm:** https://www.npmjs.com/package/tokenfirewall
858
618
  - **Issues:** https://github.com/Ruthwik000/tokenfirewall/issues
859
- - **Documentation:** [API.md](./API.md)
860
619
  - **Changelog:** [CHANGELOG.md](./CHANGELOG.md)
861
620
 
862
621
  ---
863
622
 
864
- ## Support
865
-
866
- If you find TokenFirewall useful, please:
867
- - ⭐ Star the repository
868
- - 🐛 Report bugs and issues
869
- - 💡 Suggest new features
870
- - 📖 Improve documentation
871
- - 🔀 Submit pull requests
872
-
873
- ---
874
-
875
- **Built with ❤️ for the AI developer community**
623
+ Built with ❤️ for the AI developer community.