@salimassili/ai-costguard 1.2.0 → 2.0.1

Files changed (76)
  1. package/CHANGELOG.md +62 -0
  2. package/LICENSE +21 -0
  3. package/README.md +415 -177
  4. package/benchmarks/run.mjs +229 -0
  5. package/benchmarks/token-accuracy.mjs +86 -0
  6. package/dist/cli.d.ts +50 -0
  7. package/dist/cli.d.ts.map +1 -0
  8. package/dist/cli.js +178 -0
  9. package/dist/cli.js.map +1 -0
  10. package/dist/core/CostGuard.d.ts +4 -5
  11. package/dist/core/CostGuard.d.ts.map +1 -1
  12. package/dist/core/CostGuard.js +2 -3
  13. package/dist/core/CostGuard.js.map +1 -1
  14. package/dist/core/GuardCore.d.ts +93 -13
  15. package/dist/core/GuardCore.d.ts.map +1 -1
  16. package/dist/core/GuardCore.js +372 -158
  17. package/dist/core/GuardCore.js.map +1 -1
  18. package/dist/core/GuardFree.d.ts +42 -18
  19. package/dist/core/GuardFree.d.ts.map +1 -1
  20. package/dist/core/GuardFree.js +95 -140
  21. package/dist/core/GuardFree.js.map +1 -1
  22. package/dist/core/GuardPro.d.ts +76 -8
  23. package/dist/core/GuardPro.d.ts.map +1 -1
  24. package/dist/core/GuardPro.js +213 -130
  25. package/dist/core/GuardPro.js.map +1 -1
  26. package/dist/core/event-log.d.ts +37 -0
  27. package/dist/core/event-log.d.ts.map +1 -0
  28. package/dist/core/event-log.js +49 -0
  29. package/dist/core/event-log.js.map +1 -0
  30. package/dist/core/events.d.ts +20 -0
  31. package/dist/core/events.d.ts.map +1 -0
  32. package/dist/core/events.js +46 -0
  33. package/dist/core/events.js.map +1 -0
  34. package/dist/core/similarity.d.ts +13 -0
  35. package/dist/core/similarity.d.ts.map +1 -0
  36. package/dist/core/similarity.js +51 -0
  37. package/dist/core/similarity.js.map +1 -0
  38. package/dist/core/tokenizer.d.ts +18 -0
  39. package/dist/core/tokenizer.d.ts.map +1 -0
  40. package/dist/core/tokenizer.js +137 -0
  41. package/dist/core/tokenizer.js.map +1 -0
  42. package/dist/core/types.d.ts +151 -5
  43. package/dist/core/types.d.ts.map +1 -1
  44. package/dist/core/types.js +0 -3
  45. package/dist/core/types.js.map +1 -1
  46. package/dist/core/webhooks.d.ts +15 -0
  47. package/dist/core/webhooks.d.ts.map +1 -0
  48. package/dist/core/webhooks.js +58 -0
  49. package/dist/core/webhooks.js.map +1 -0
  50. package/dist/dashboard.d.ts +73 -0
  51. package/dist/dashboard.d.ts.map +1 -0
  52. package/dist/dashboard.js +201 -0
  53. package/dist/dashboard.js.map +1 -0
  54. package/dist/index.d.ts +4 -5
  55. package/dist/index.d.ts.map +1 -1
  56. package/dist/index.js +2 -3
  57. package/dist/index.js.map +1 -1
  58. package/dist/pricing/index.d.ts +26 -2
  59. package/dist/pricing/index.d.ts.map +1 -1
  60. package/dist/pricing/index.js +100 -13
  61. package/dist/pricing/index.js.map +1 -1
  62. package/dist/pro.d.ts +3 -0
  63. package/dist/pro.d.ts.map +1 -0
  64. package/dist/pro.js +2 -0
  65. package/dist/pro.js.map +1 -0
  66. package/docs/BENCHMARKS.md +70 -0
  67. package/docs/DASHBOARD.md +61 -0
  68. package/docs/INTEGRATIONS.md +153 -0
  69. package/examples/integrations/anthropic-workflow-budget.mjs +36 -0
  70. package/examples/integrations/ci-budget-check.mjs +32 -0
  71. package/examples/integrations/crewai-budget-gate.mjs +31 -0
  72. package/examples/integrations/langchain-retry-storm.mjs +32 -0
  73. package/examples/integrations/mastra-agent.mjs +41 -0
  74. package/examples/integrations/openai-agent-loop.mjs +44 -0
  75. package/examples/integrations/vercel-ai-chatbot.mjs +29 -0
  76. package/package.json +76 -46
package/README.md CHANGED
@@ -1,202 +1,440 @@
1
- # AI CostGuard
1
+ # AI CostGuard
2
+ [![npm version](https://img.shields.io/npm/v/@salimassili/ai-costguard)](https://www.npmjs.com/package/@salimassili/ai-costguard)
3
+
4
+ AI CostGuard is a local-first runtime safety layer for AI agents that prevents runaway costs, loops, retries, and budget explosions before API calls execute. It wraps OpenAI-compatible clients and function-style SDK calls, estimates request cost locally, blocks budget overruns, detects repeated prompts, emits structured events, and exposes CLI checks plus a local dashboard.
5
+
6
+ It is local-first. It does not include a SaaS control plane, cloud dashboard, proxy gateway, telemetry service, billing reconciliation service, or hard security boundary.
2
7
 
3
- AI CostGuard is a small TypeScript library that wraps OpenAI-like clients and blocks requests before they run when local safety checks predict unsafe AI API spend.
8
+ ## What AI CostGuard Does
4
9
 
5
- It is ESM-only, targets Node.js 18+, and is built with `tsc`.
10
+ - Checks selected AI SDK calls before they execute.
11
+ - Estimates request cost from model pricing, prompt text, and reserved output tokens.
12
+ - Blocks unknown models unless explicit pricing is supplied.
13
+ - Blocks budget overruns, repeated prompt loops, retry storms, and max-step overruns.
14
+ - Emits structured errors and local events your app can handle.
6
15
 
7
- ## What Works Today
16
+ ## What AI CostGuard Does Not Do
8
17
 
9
- - `guard()` wraps a client with budget, loop, and retry protection.
10
- - `GuardError` is thrown when a request is blocked.
11
- - Budget blocking estimates request cost before the API call.
12
- - Loop detection blocks repeated prompts within the current process.
13
- - Retry detection blocks repeated failure/retry prompts within the current process.
14
- - `middleware()` adds the same local checks to web request flows.
15
- - `getPricing()` returns known built-in model pricing.
16
- - `registerPricing()` and `listPricing()` let you manage runtime pricing entries.
18
+ - It does not call providers for real-time pricing.
19
+ - It does not reconcile provider invoices or replace provider billing alerts.
20
+ - It does not provide auth, API-key security, or a hard security boundary.
21
+ - It does not run a hosted dashboard, SaaS backend, or cloud telemetry service.
22
+ - It does not guarantee exact tokenizer parity with OpenAI, Anthropic, or other providers.
17
23
 
18
24
  ## Install
19
-
20
- ```bash
21
- npm install @salimassili/ai-costguard
22
- ```
23
-
24
- ## Basic Usage
25
-
26
- ```ts
27
- import OpenAI from 'openai';
28
- import { guard, GuardError } from '@salimassili/ai-costguard';
29
-
30
- const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
31
- const ai = guard(client, { budget: 1 });
32
-
33
- try {
34
- const response = await ai.chat.completions.create({
35
- model: 'gpt-4o-mini',
36
- messages: [{ role: 'user', content: 'Write a short project summary.' }],
37
- max_tokens: 200
38
- });
39
-
40
- console.log(response);
41
- } catch (error) {
42
- if (error instanceof GuardError) {
43
- console.error('AI request blocked:', error.message, error.context);
44
- } else {
45
- throw error;
46
- }
25
+
26
+ ```bash
27
+ npm install @salimassili/ai-costguard
28
+ ```
29
+
30
+ ## Quick Start
31
+
32
+ ```ts
33
+ import OpenAI from 'openai';
34
+ import { guard, GuardError } from '@salimassili/ai-costguard';
35
+
36
+ const openai = guard(new OpenAI({ apiKey: process.env.OPENAI_API_KEY }), {
37
+ budget: 5,
38
+ maxSteps: 50,
39
+ scope: { projectId: 'my-app' },
40
+ });
41
+
42
+ try {
43
+ const response = await openai.chat.completions.create({
44
+ model: 'gpt-4o-mini',
45
+ messages: [{ role: 'user', content: 'Write a short summary.' }],
46
+ max_tokens: 200,
47
+ });
48
+
49
+ console.log(response.choices[0]?.message?.content);
50
+ } catch (error) {
51
+ if (error instanceof GuardError) {
52
+ console.error(error.code, error.message, error.context);
53
+ } else {
54
+ throw error;
55
+ }
47
56
  }
48
57
  ```
49
58
 
50
- ## How `guard()` Works
51
-
52
- `guard(client, config)` returns a `Proxy` around your client. When code calls a method such as `client.chat.completions.create(...)`, CostGuard:
59
+ ## Before / After
53
60
 
54
- 1. Reads the request model, messages, and `max_tokens`.
55
- 2. Estimates input tokens from message length and combines them with the requested output limit.
56
- 3. Looks up pricing for the model.
57
- 4. Estimates the request cost.
58
- 5. Blocks the call with `GuardError` if the local budget would be exceeded.
59
- 6. Blocks repeated prompts that look like loops.
60
- 7. Blocks repeated prompts that look like retry storms.
61
- 8. Lets the original client method run when checks pass.
62
-
63
- The free guard state is process-local. Separate Node.js processes do not share budget state.
64
-
65
- ## Budget Blocking
61
+ Without AI CostGuard:
66
62
 
67
63
  ```ts
68
- import { guard } from '@salimassili/ai-costguard';
69
-
70
- const ai = guard(openai, { budget: 0.25 });
71
-
72
- await ai.chat.completions.create({
73
- model: 'gpt-4o-mini',
74
- messages: [{ role: 'user', content: 'Hello' }],
75
- max_tokens: 100
76
- });
64
+ await openai.chat.completions.create(request);
77
65
  ```
78
66
 
79
- When the estimated cumulative spend in the current process would exceed `budget`, CostGuard throws `GuardError` before calling the underlying AI client.
80
-
81
- ## Loop And Retry Detection
82
-
83
- CostGuard keeps a short in-memory history of recent prompts for the wrapped client:
84
-
85
- - Loop detection blocks repeated prompt hashes.
86
- - Retry detection blocks repeated prompts containing retry/failure language such as `retry`, `again`, `repeat`, `error`, `fail`, or `timeout`.
87
-
88
- These checks are intentionally local and lightweight.
89
-
90
- ## Middleware
67
+ With AI CostGuard:
91
68
 
92
69
  ```ts
93
- import express from 'express';
94
- import { middleware, GuardError } from '@salimassili/ai-costguard';
95
-
96
- const app = express();
97
-
98
- app.use(middleware({ budget: 2 }));
99
-
100
- app.post('/chat', async (req, res, next) => {
101
- try {
102
- req.localSafety.check({
103
- model: 'gpt-4o-mini',
104
- tokens: 1000,
105
- estimatedCost: 0.001,
106
- timestamp: Date.now(),
107
- prompt: req.body?.prompt ?? ''
108
- });
109
-
110
- res.json({ ok: true });
111
- } catch (error) {
112
- if (error instanceof GuardError) {
113
- res.status(402).json({ error: error.message, context: error.context });
114
- return;
115
- }
116
-
117
- next(error);
118
- }
70
+ const openai = guard(new OpenAI({ apiKey: process.env.OPENAI_API_KEY }), {
71
+ budget: 5,
72
+ maxSteps: 50,
73
+ scope: { projectId: 'agent-api', sessionId: runId },
119
74
  });
120
- ```
121
-
122
- ## Pricing
123
-
124
- ```ts
125
- import { getPricing, listPricing, registerPricing } from '@salimassili/ai-costguard';
126
-
127
- console.log(getPricing('gpt-4o-mini'));
128
-
129
- registerPricing([
130
- {
131
- model: 'custom-model',
132
- inputPer1kTokens: 0.001,
133
- outputPer1kTokens: 0.002,
134
- lastUpdated: '2026-05-21',
135
- source: 'internal'
136
- }
137
- ]);
138
75
 
139
- console.log(listPricing());
76
+ await openai.chat.completions.create(request);
140
77
  ```
141
78
 
142
- `getPricing(model)` returns an exact match when available, then falls back to simple fuzzy matching. Unknown models return `undefined`.
143
-
144
- ## Pro Features (Coming Soon)
145
-
146
- > These features are under active development and not yet available:
147
- > - Distributed Redis-backed budget state
148
- > - Real Slack/Discord webhook alerts
149
- > - Multi-instance coordination
150
- > - Production license validation
151
-
152
- ## API
153
-
154
- ### `guard(client, config)`
155
-
156
- Wraps an OpenAI-like client.
157
-
158
- ```ts
159
- guard(client, { budget: 10 });
160
- ```
161
-
162
- ### `GuardError`
163
-
164
- Thrown when CostGuard blocks a request.
79
+ ## What It Guards
80
+
81
+ By default AI CostGuard evaluates these SDK method paths:
82
+
83
+ - `chat.completions.create`
84
+ - `completions.create`
85
+ - `responses.create`
86
+ - `messages.create`
87
+
88
+ Other client methods are passed through without cost checks. To protect a custom client method:
89
+
90
+ ```ts
91
+ const client = guard(customClient, {
92
+ budget: 2,
93
+ guardedMethods: ['agent.run'],
94
+ pricingOverrides: [
95
+ {
96
+ model: 'internal-model',
97
+ inputPer1kTokens: 0.001,
98
+ outputPer1kTokens: 0.002,
99
+ lastUpdated: '2026-06-07',
100
+ source: 'internal pricing sheet',
101
+ },
102
+ ],
103
+ });
104
+ ```
105
+
106
+ For function-style SDKs such as Vercel AI SDK adapters, LangChain wrappers, or agent runners:
107
+
108
+ ```ts
109
+ import { guardFunction } from '@salimassili/ai-costguard';
110
+
111
+ const guardedGenerateText = guardFunction(generateTextAdapter, {
112
+ budget: 1,
113
+ scope: { projectId: 'chatbot' },
114
+ });
115
+
116
+ await guardedGenerateText({
117
+ model: 'gpt-4o-mini',
118
+ prompt: 'Answer the user in one paragraph.',
119
+ max_tokens: 200,
120
+ });
121
+ ```
122
+
123
+ ## Decisions And Errors
124
+
125
+ Blocked requests throw `GuardError` before the provider method is called.
126
+
127
+ ```ts
128
+ try {
129
+ await openai.chat.completions.create(request);
130
+ } catch (error) {
131
+ if (error instanceof GuardError) {
132
+ console.log(error.code);
133
+ console.log(error.metadata);
134
+ }
135
+ }
136
+ ```
137
+
138
+ Current runtime block codes:
139
+
140
+ - `UNKNOWN_MODEL`
141
+ - `BUDGET_EXCEEDED`
142
+ - `MAX_STEPS_EXCEEDED`
143
+ - `LOOP_DETECTED`
144
+ - `RETRY_STORM_DETECTED`
145
+
146
+ ## Configuration
147
+
148
+ ```ts
149
+ guard(client, {
150
+ budget: 10,
151
+ maxSteps: 100,
152
+ behaviorAnalysis: true,
153
+ maxHistory: 32,
154
+ historyTtlMs: 5 * 60 * 1000,
155
+ loopSimilarityThreshold: 0.85,
156
+ loopMinRepeats: 2,
157
+ retryThreshold: 2,
158
+ scope: {
159
+ projectId: 'production-api',
160
+ userId: 'optional-user',
161
+ sessionId: 'optional-agent-run',
162
+ },
163
+ guardedMethods: ['chat.completions.create', 'responses.create'],
164
+ pricingOverrides: [],
165
+ webhooks: {
166
+ slack: process.env.SLACK_WEBHOOK,
167
+ discord: process.env.DISCORD_WEBHOOK,
168
+ retries: 2,
169
+ timeoutMs: 1500,
170
+ },
171
+ eventLogPath: '.ai-costguard/events.jsonl',
172
+ eventLogPrompt: 'none',
173
+ });
174
+ ```
175
+
176
+ `scope` isolates budgets and behavior history. If no scope is supplied, the guard uses one process-local default scope.
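
Reviewer note (not part of the package diff): the scope behavior described above can be pictured as keyed, process-local state. The following is a minimal sketch under stated assumptions. `scopeKey` and `ScopedSpend` are hypothetical names invented for illustration, and the package's real keying scheme is not shown in this README.

```typescript
// Hypothetical illustration only: not the package's implementation.
interface Scope {
  projectId?: string;
  userId?: string;
  sessionId?: string;
}

/** Build a stable key for a scope; a missing scope maps to one shared default key. */
function scopeKey(scope?: Scope): string {
  const s = scope ?? {};
  return JSON.stringify([s.projectId ?? null, s.userId ?? null, s.sessionId ?? null]);
}

/** Process-local spend totals, isolated per scope key. */
class ScopedSpend {
  private spent: { [key: string]: number } = {};

  /** Add an estimated cost to a scope and return that scope's new total. */
  add(scope: Scope | undefined, cost: number): number {
    const key = scopeKey(scope);
    this.spent[key] = (this.spent[key] ?? 0) + cost;
    return this.spent[key];
  }

  /** Read a scope's running total; scopes that were never charged report 0. */
  get(scope?: Scope): number {
    return this.spent[scopeKey(scope)] ?? 0;
  }
}
```

Under this sketch, spend recorded for `{ projectId: 'a' }` never counts against `{ projectId: 'b' }`, and calls without a scope share a single default bucket.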
177
+
178
+ ## Loop Detection Tuning
179
+
180
+ Default loop detection uses character trigram cosine similarity with `loopSimilarityThreshold: 0.85` and `loopMinRepeats: 2`.
181
+
182
+ - Higher threshold, such as `0.95`: fewer false positives, but near-duplicate loops can slip through.
183
+ - Lower threshold, such as `0.75`: catches looser repeats, but unrelated prompts can be blocked.
184
+ - Higher `loopMinRepeats`: waits for more repeated prompts before blocking.
185
+ - Lower `loopMinRepeats`: blocks faster, but is more aggressive.
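
Reviewer note (not part of the package diff): to make the two tuning knobs concrete, here is a minimal sketch of character trigram cosine similarity. The function names are hypothetical, the normalization step (lowercasing, whitespace collapsing) is an assumption, and so is the reading of `loopMinRepeats` as "number of similar earlier prompts". The package's own code in `dist/core/similarity.js` may differ on all three.

```typescript
// Hypothetical illustration only: not the package's implementation.

/** Count overlapping character trigrams in a lowercased, whitespace-collapsed string. */
function trigramCounts(text: string): { [gram: string]: number } {
  const normalized = text.toLowerCase().replace(/\s+/g, " ").trim();
  const counts: { [gram: string]: number } = {};
  for (let i = 0; i + 3 <= normalized.length; i++) {
    const gram = normalized.slice(i, i + 3);
    counts[gram] = (counts[gram] ?? 0) + 1;
  }
  return counts;
}

/** Cosine similarity of two trigram count vectors; 0 when either string has no trigrams. */
function trigramCosine(a: string, b: string): number {
  const ca = trigramCounts(a);
  const cb = trigramCounts(b);
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (const gram of Object.keys(ca)) {
    normA += ca[gram] * ca[gram];
    dot += ca[gram] * (cb[gram] ?? 0);
  }
  for (const gram of Object.keys(cb)) {
    normB += cb[gram] * cb[gram];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

/** Assumed rule: flag a loop when at least `minRepeats` earlier prompts meet the threshold. */
function looksLikeLoop(
  history: string[],
  prompt: string,
  threshold = 0.85,
  minRepeats = 2,
): boolean {
  const similar = history.filter((p) => trigramCosine(p, prompt) >= threshold).length;
  return similar >= minRepeats;
}
```

Because the vectors are built from raw characters, prompts that share a long template score high even when their variable parts differ, which is one source of the false positives the README warns about.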
165
186
 
166
187
  ```ts
167
- try {
168
- await ai.chat.completions.create(params);
169
- } catch (error) {
170
- if (error instanceof GuardError) {
171
- console.log(error.context);
172
- }
173
- }
188
+ const openai = guard(client, {
189
+ budget: 5,
190
+ loopSimilarityThreshold: 0.9,
191
+ loopMinRepeats: 3,
192
+ scope: { sessionId: 'agent-run-123' },
193
+ });
174
194
  ```
175
195
 
176
- ### `middleware(config)`
177
-
178
- Creates request middleware with local budget, loop, and retry checks.
179
-
180
- ### `getPricing(model, overrides?)`
181
-
182
- Returns pricing for a model from overrides, runtime registrations, or built-in entries.
183
-
184
- ### `registerPricing(entries)`
185
-
186
- Registers or replaces runtime pricing entries by model name.
187
-
188
- ### `listPricing()`
189
-
190
- Returns built-in and runtime pricing entries, deduplicated by model name.
191
-
192
- ## Limitations
193
-
194
- - Free guard state is stored in memory only.
195
- - Budget checks are estimates, not billing records.
196
- - Token estimation is approximate.
197
- - Pricing entries are static until the package or runtime registry is updated.
198
- - The library does not include dashboards, analytics, governance workflows, or hosted services.
199
-
200
- ## License
196
+ Loop detection is heuristic. Expect false positives and false negatives, especially for short prompts, templated prompts, and prompts that share a lot of boilerplate.
197
+
198
+ ## Accounting Semantics
199
+
200
+ AI CostGuard is a pre-call estimator, not a billing ledger.
201
+
202
+ - `attemptedCost`: estimated cost of every guarded attempt, including blocked attempts.
203
+ - `totalCost`: estimated cost of allowed calls.
204
+ - `blockedCost`: estimated cost stopped before provider execution.
205
+ - `actualCost`: provider-reported usage cost when the response includes recognizable `usage` fields.
206
+
207
+ Budget decisions use estimated allowed cost. Actual usage is recorded for observability but does not rewrite earlier decisions.
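
Reviewer note (not part of the package diff): the counters above can be sketched as a small pre-call ledger. This is a sketch under assumptions: `estimateCost`, `Ledger`, and `recordAttempt` are hypothetical names, the per-1k formula follows the `inputPer1kTokens` / `outputPer1kTokens` fields used in the README's pricing entries, and the `<=` boundary is a guess rather than confirmed package behavior.

```typescript
// Hypothetical illustration only: not the package's implementation.
interface Pricing {
  inputPer1kTokens: number;
  outputPer1kTokens: number;
}

/** Estimated cost from input tokens plus the reserved (maximum) output tokens. */
function estimateCost(pricing: Pricing, inputTokens: number, reservedOutputTokens: number): number {
  return (
    (inputTokens / 1000) * pricing.inputPer1kTokens +
    (reservedOutputTokens / 1000) * pricing.outputPer1kTokens
  );
}

interface Ledger {
  attemptedCost: number; // every guarded attempt, allowed or blocked
  totalCost: number;     // allowed attempts only
  blockedCost: number;   // attempts stopped before the provider call
}

/** Record one attempt and return whether it fits within the budget. */
function recordAttempt(ledger: Ledger, budget: number, cost: number): boolean {
  ledger.attemptedCost += cost;
  const allowed = ledger.totalCost + cost <= budget;
  if (allowed) {
    ledger.totalCost += cost;
  } else {
    ledger.blockedCost += cost;
  }
  return allowed;
}
```

In this sketch `attemptedCost` always equals `totalCost + blockedCost`. `actualCost` is left out because it depends on provider `usage` fields that only exist after a call returns.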
208
+
209
+ ## Pricing
201
210
 
202
- MIT
211
+ Known model pricing comes from built-in registry entries, runtime registrations, or per-guard overrides. Unknown models are blocked by default.
212
+
213
+ Pricing last updated: `2026-06-07`. Provider pricing changes; AI CostGuard does not fetch real-time pricing. Override pricing manually when provider pages or your contract pricing differ from the built-ins.
214
+
215
+ ```ts
216
+ import { registerPricing } from '@salimassili/ai-costguard';
217
+
218
+ registerPricing([
219
+ {
220
+ model: 'my-company-model',
221
+ inputPer1kTokens: 0.001,
222
+ outputPer1kTokens: 0.002,
223
+ lastUpdated: '2026-06-07',
224
+ source: 'internal',
225
+ },
226
+ ]);
227
+ ```
228
+
229
+ If you intentionally want fallback pricing for unknown models:
230
+
231
+ ```ts
232
+ guard(client, {
233
+ budget: 5,
234
+ unknownModelPolicy: 'fallback',
235
+ unknownModelPricing: {
236
+ model: 'fallback',
237
+ inputPer1kTokens: 0.001,
238
+ outputPer1kTokens: 0.002,
239
+ lastUpdated: '2026-06-07',
240
+ source: 'application fallback',
241
+ },
242
+ });
243
+ ```
244
+
245
+ Pricing changes frequently. Verify provider pricing before production use and override entries when needed.
246
+
247
+ ## Events
248
+
249
+ ```ts
250
+ const unsubscribe = openai.on('block', (event) => {
251
+ console.log(event.code, event.reason, event.context.estimatedCost);
252
+ });
253
+
254
+ unsubscribe();
255
+ ```
256
+
257
+ Supported events are `cost`, `allow`, and `block`. Handler errors are swallowed so observability code cannot change guard decisions.
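
Reviewer note (not part of the package diff): "handler errors are swallowed" is a common emitter pattern. The sketch below shows one way it could work; `SafeEmitter` is a hypothetical class written for illustration, not the package's API. Only the event names and the unsubscribe-function shape are taken from the README.

```typescript
// Hypothetical illustration only: not the package's implementation.
type GuardEventName = "cost" | "allow" | "block";
type GuardEventHandler = (event: unknown) => void;

class SafeEmitter {
  private handlers: { [name: string]: GuardEventHandler[] } = {};

  /** Register a handler and return a function that removes it again. */
  on(name: GuardEventName, handler: GuardEventHandler): () => void {
    const list = this.handlers[name] ?? (this.handlers[name] = []);
    list.push(handler);
    return () => {
      const index = list.indexOf(handler);
      if (index >= 0) list.splice(index, 1);
    };
  }

  /** Call every handler; a throwing handler never reaches the caller or stops later handlers. */
  emit(name: GuardEventName, event: unknown): void {
    const snapshot = (this.handlers[name] ?? []).slice();
    for (const handler of snapshot) {
      try {
        handler(event);
      } catch {
        // Swallowed on purpose: observability code must not alter enforcement.
      }
    }
  }
}
```

The trade-off is that a broken handler fails silently, so handlers that matter should do their own error logging.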
258
+
259
+ ## Local Dashboard
260
+
261
+ Opt into a local JSONL event log:
262
+
263
+ ```ts
264
+ const openai = guard(client, {
265
+ budget: 5,
266
+ eventLogPath: '.ai-costguard/events.jsonl',
267
+ });
268
+ ```
269
+
270
+ Start the local-only dashboard:
271
+
272
+ ```bash
273
+ ai-costguard dashboard --events .ai-costguard/events.jsonl --budget 5
274
+ ```
275
+
276
+ For one-off package execution:
277
+
278
+ ```bash
279
+ npx @salimassili/ai-costguard dashboard --events .ai-costguard/events.jsonl --budget 5
280
+ ```
281
+
282
+ If the package is installed locally, `npx ai-costguard dashboard` also works. The dashboard binds to `127.0.0.1` by default and reads only local event files.
283
+
284
+ For CI or terminal output:
285
+
286
+ ```bash
287
+ ai-costguard dashboard --events .ai-costguard/events.jsonl --budget 5 --once --json
288
+ ```
289
+
290
+ See `docs/DASHBOARD.md`.
291
+
292
+ ## Integrations
293
+
294
+ Runnable mocked examples are included for:
295
+
296
+ - OpenAI SDK agent loop protection
297
+ - Anthropic SDK workflow budget guard
298
+ - Vercel AI SDK chatbot budget cap
299
+ - LangChain retry-storm prevention
300
+ - Mastra-style agent runner protection
301
+ - CrewAI launch/budget gate
302
+ - CI budget checks
303
+
304
+ See `docs/INTEGRATIONS.md` and `examples/integrations`.
305
+
306
+ ## Express Middleware
307
+
308
+ The middleware attaches a manual checker. It does not automatically parse or inspect every route.
309
+
310
+ ```ts
311
+ import { middleware, GuardError } from '@salimassili/ai-costguard';
312
+
313
+ app.use(middleware({ budget: 2 }));
314
+
315
+ app.post('/chat', async (req, res, next) => {
316
+ try {
317
+ req.localSafety.check({
318
+ model: 'gpt-4o-mini',
319
+ tokens: 500,
320
+ inputTokens: 100,
321
+ outputTokens: 400,
322
+ estimatedCost: 0.0003,
323
+ timestamp: Date.now(),
324
+ prompt: String(req.body?.prompt ?? ''),
325
+ });
326
+
327
+ res.json({ ok: true });
328
+ } catch (error) {
329
+ if (error instanceof GuardError) {
330
+ res.status(403).json({ code: error.code, reason: error.message });
331
+ return;
332
+ }
333
+ next(error);
334
+ }
335
+ });
336
+ ```
337
+
338
+ ## Optional Redis / Pro Helper
339
+
340
+ Redis-backed shared spend tracking is isolated behind a subpath import:
341
+
342
+ ```ts
343
+ import { GuardPro } from '@salimassili/ai-costguard/pro';
344
+
345
+ const pro = new GuardPro({
346
+ redisUrl: process.env.REDIS_URL ?? '',
347
+ budget: 25,
348
+ windowSeconds: 86400,
349
+ });
350
+
351
+ await pro.checkAndCharge('production', 0.0042);
352
+ await pro.shutdown();
353
+ ```
354
+
355
+ `ioredis` is an optional dependency and is not loaded by the root import.
356
+
357
+ AI CostGuard does not include license-key checks or local commercial-license enforcement.
358
+
359
+ ## CLI
360
+
361
+ ```bash
362
+ aifw check --budget 1 --model gpt-4o-mini --input-tokens 500 --tokens 1000 --max-steps 5
363
+ ```
364
+
365
+ The package also installs an `ai-costguard` bin alias:
366
+
367
+ ```bash
368
+ ai-costguard check --budget 1 --model gpt-4o-mini --tokens 1000 --max-steps 5
369
+ ai-costguard dashboard --events .ai-costguard/events.jsonl --budget 5
370
+ ```
371
+
372
+ For custom models:
373
+
374
+ ```bash
375
+ aifw check --budget 1 --model internal-model --tokens 1000 --input-price-per-1k 0.001 --output-price-per-1k 0.002
376
+ ```
377
+
378
+ Exit codes:
379
+
380
+ - `0`: projected cost is within budget
381
+ - `1`: projected cost exceeds budget
382
+ - `2`: usage/config error
383
+
384
+ ## Benchmarks
385
+
386
+ Run local benchmarks:
387
+
388
+ ```bash
389
+ npm run build
390
+ npm run benchmark
391
+ ```
392
+
393
+ The script reports runtime overhead, approximate heap delta, false-positive scenarios, loop detection behavior, and cost-estimation boundaries. Results are local measurements, not universal guarantees. See `docs/BENCHMARKS.md`.
394
+
395
+ Latest local benchmark in this repo on Node `v24.14.1` / Windows measured `0.020691 ms` added per mocked guarded call over `5000` iterations. Re-run on your target runtime before using this number in performance-sensitive claims.
396
+
397
+ Token accuracy benchmark, fixed corpus, `gpt-tokenizer cl100k_base` fixture counts: average error `259.08%`, median error `263.98%`, max error `323.53%`, `8` samples. The current estimator is conservative and can substantially overestimate short prompts. Use this package as a pre-call guardrail, not an exact tokenizer.
398
+
399
+ ## Why Not 50 Lines Of Code?
400
+
401
+ A simple homemade budget check can stop one request after one counter crosses one number. AI CostGuard packages the parts that usually become messy once agents enter production:
402
+
403
+ - Provider pricing registry with runtime overrides and unknown-model blocking.
404
+ - Structured `GuardError` codes and metadata for API responses.
405
+ - Scoped budget and behavior state per project, user, or session.
406
+ - TTL-bounded prompt history.
407
+ - Loop and retry-storm detection.
408
+ - Estimated, attempted, blocked, and actual usage accounting.
409
+ - Method filtering so non-AI SDK calls are not charged.
410
+ - Event hooks, best-effort webhooks, JSONL event logs, and local dashboard visibility.
411
+ - CI budget checks and runnable integration examples.
412
+
413
+ ## Development
414
+
415
+ ```bash
416
+ npm ci
417
+ npm run build
418
+ npm run typecheck
419
+ npm test
420
+ npm run smoke
421
+ npm run benchmark
422
+ npm audit --omit=dev
423
+ npm pack --dry-run
424
+ ```
425
+
426
+ ## Limitations
427
+
428
+ - Token counting is approximate and dependency-free.
429
+ - Token estimation is intentionally conservative and can overestimate materially; see the token accuracy benchmark.
430
+ - Pricing entries can become stale; override them for production.
431
+ - The free guard is process-local.
432
+ - Loop detection uses character trigram similarity, not embeddings.
433
+ - Retry detection is heuristic.
434
+ - Webhooks are best-effort and never affect enforcement.
435
+ - The dashboard reads local JSONL logs only; it is not a hosted analytics product.
436
+ - Provider usage reconciliation only works when responses expose recognizable `usage` fields.
437
+
438
+ ## License
439
+
440
+ MIT