budget-agent 0.4.7 → 0.4.9

Files changed (2)
  1. package/README.md +449 -51
  2. package/package.json +29 -19
package/README.md CHANGED
@@ -1,8 +1,8 @@
1
1
  # budget-agent
2
2
 
3
- Stop runaway LLM agents from burning your API credits. Set hard limits on cost, tokens, steps, and wall time. The SDK blocks each call before and after it hits your provider -- so you never overspend.
3
+ Stop runaway AI agents from burning through your API credits. Track cost, tokens, runtime, and steps. Enforce hard budget limits for OpenAI, Anthropic, LangGraph, LangChain, OpenRouter, CrewAI, Mastra, AutoGen, and any LLM workflow.
4
4
 
5
- Works with **OpenAI**, **Anthropic**, **OpenRouter**, **Ollama**, **Together AI**, **Fireworks**, and any OpenAI-compatible API.
5
+ budget-agent helps developers **track AI agent costs**, **enforce token limits**, **set spending caps**, **monitor LLM usage**, and **prevent runaway OpenAI, Anthropic, and OpenRouter agents** from exceeding budget. Works with every provider. Zero vendor lock-in.
6
6
 
7
7
  ## Install
8
8
 
@@ -10,7 +10,7 @@ Works with **OpenAI**, **Anthropic**, **OpenRouter**, **Ollama**, **Together AI*
10
10
  npm install budget-agent
11
11
  ```
12
12
 
13
- ## Usage
13
+ ## Quick start
14
14
 
15
15
  ```ts
16
16
  import { AgentBudget, BudgetError } from 'budget-agent';
@@ -18,10 +18,10 @@ import { AgentBudget, BudgetError } from 'budget-agent';
18
18
  const agent = new AgentBudget({
19
19
  apiKey: process.env.OPENROUTER_API_KEY,
20
20
  limits: {
21
- maxCostUSD: 0.10,
22
- maxSteps: 15,
21
+ maxCostUSD: 0.10,
22
+ maxSteps: 15,
23
23
  maxTotalTokens: 50_000,
24
- maxWallTimeMs: 30_000,
24
+ maxWallTimeMs: 30_000,
25
25
  },
26
26
  });
27
27
 
@@ -33,25 +33,54 @@ const response = await agent.step({
33
33
  console.log(agent.getUsage());
34
34
  ```
35
35
 
36
- ## How it works
36
+ ---
37
37
 
38
- Every `step()` call runs two budget checks:
38
+ ## Prevent runaway AI agents
39
39
 
40
- 1. **Before the API call** -- estimates cost and blocks if you'd go over budget.
41
- 2. **After the API call** -- records actual tokens/cost and blocks if a limit was hit. The step rolls back so you can retry cleanly.
40
+ Agent loops multiply LLM costs across every step. Without guardrails, a single loop can burn through your entire API budget in seconds. budget-agent blocks each call before and after it hits your provider -- so you never overspend.
42
41
 
43
42
  ```ts
43
+ const agent = new AgentBudget({
44
+ apiKey: key,
45
+ limits: { maxCostUSD: 0.05, maxSteps: 10 },
46
+ });
47
+
44
48
  try {
45
- await agent.step({ model, messages });
49
+ while (true) {
50
+ const res = await agent.step({ model, messages });
51
+ messages.push(res.choices[0].message);
52
+ messages.push({ role: 'user', content: 'Continue.' });
53
+ }
46
54
  } catch (err) {
47
55
  if (err instanceof BudgetError) {
48
- console.log(err.exceeded.reason); // 'cost' | 'steps' | 'totalTokens' | 'wallTime'
49
- console.log(err.exceeded.usage); // full usage snapshot at cutoff
56
+ console.log('Agent stopped:', err.exceeded.reason);
50
57
  }
51
58
  }
52
59
  ```
53
60
 
54
- ## Limits
61
+ ---
62
+
63
+ ## Track LLM costs in production
64
+
65
+ Get real-time visibility into every API call. See cost per step, total spend, token breakdown, and wall time.
66
+
67
+ ```ts
68
+ const usage = agent.getUsage();
69
+ // {
70
+ // steps: 12,
71
+ // totalCostUSD: 0.0847,
72
+ // totalInputTokens: 24300,
73
+ // totalOutputTokens: 8200,
74
+ // elapsedMs: 45200,
75
+ // stepHistory: [...]
76
+ // }
77
+
78
+ agent.summary(); // formatted table in console
79
+ ```
80
+
81
+ ---
82
+
83
+ ## Set hard budget caps
55
84
 
56
85
  Every limit is optional. Set only what you need.
57
86
 
@@ -66,9 +95,56 @@ limits: {
66
95
  }
67
96
  ```
68
97
 
69
- ## Custom executor (any provider)
98
+ ---
99
+
100
+ ## Runtime limits for AI agents
101
+
102
+ Kill agents that run too long. Set wall time limits to prevent infinite loops from consuming compute and money.
103
+
104
+ ```ts
105
+ const agent = new AgentBudget({
106
+ apiKey: key,
107
+ limits: {
108
+ maxWallTimeMs: 30_000, // 30 second hard stop
109
+ maxCostUSD: 1.00,
110
+ },
111
+ });
112
+ ```
113
+
114
+ ---
115
+
116
+ ## Token usage tracking
117
+
118
+ Track input tokens, output tokens, and total tokens across every step. Know exactly where your budget goes.
119
+
120
+ ```ts
121
+ agent.on('step:end', (e) => {
122
+ console.log(`Step ${e.stepIndex}: ${e.inputTokens} in / ${e.outputTokens} out / $${e.costUSD}`);
123
+ });
124
+ ```
125
+
126
+ ---
127
+
128
+ ## Agent guardrails
129
+
130
+ Pre-flight checks estimate output cost before the API call. Post-step checks record actual spend. If a limit is hit, the step rolls back and you can retry cleanly.
131
+
132
+ ```ts
133
+ try {
134
+ await agent.step({ model, messages });
135
+ } catch (err) {
136
+ if (err instanceof BudgetError) {
137
+ err.exceeded.reason; // 'cost' | 'steps' | 'totalTokens' | 'wallTime'
138
+ err.exceeded.usage; // full snapshot at cutoff
139
+ }
140
+ }
141
+ ```
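
The pre-flight and post-step checks described in this section can be pictured with a minimal sketch. Everything here (`createGuard`, `BudgetExceeded`, the cost-only bookkeeping) is a hypothetical illustration and not budget-agent's actual implementation; the call is synchronous only to keep the example short.

```ts
// Hypothetical sketch of a two-phase budget check; not budget-agent's source.
interface Limits { maxCostUSD: number; maxSteps: number; }
interface Usage { steps: number; costUSD: number; }

class BudgetExceeded extends Error {
  reason: 'cost' | 'steps';
  usage: Usage;
  constructor(reason: 'cost' | 'steps', usage: Usage) {
    super(`Budget exceeded: ${reason}`);
    // Keeps instanceof working when compiled to older targets.
    Object.setPrototypeOf(this, BudgetExceeded.prototype);
    this.reason = reason;
    this.usage = usage;
  }
}

function createGuard(limits: Limits) {
  let usage: Usage = { steps: 0, costUSD: 0 };

  // `call` stands in for the provider request and returns the actual cost.
  function step(estimatedCostUSD: number, call: () => number): number {
    // Pre-flight: refuse before any money is spent.
    if (usage.steps + 1 > limits.maxSteps) {
      throw new BudgetExceeded('steps', { ...usage });
    }
    if (usage.costUSD + estimatedCostUSD > limits.maxCostUSD) {
      throw new BudgetExceeded('cost', { ...usage });
    }

    const before = { ...usage };
    const actualCostUSD = call();
    usage = { steps: before.steps + 1, costUSD: before.costUSD + actualCostUSD };

    // Post-step: record actual spend, then roll back if a limit was crossed.
    if (usage.costUSD > limits.maxCostUSD) {
      const snapshot = { ...usage };
      usage = before;
      throw new BudgetExceeded('cost', snapshot);
    }
    return actualCostUSD;
  }

  return { step, getUsage: (): Usage => ({ ...usage }) };
}
```

In this sketch the error carries the usage snapshot at the moment of the cutoff, while the guard's own counters return to their pre-step values so a retry starts from a clean state.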
142
+
143
+ ---
70
144
 
71
- Use any LLM provider with a custom executor:
145
+ ## OpenAI cost tracking
146
+
147
+ Use budget-agent with the OpenAI SDK to track GPT-5.5 costs in real time.
72
148
 
73
149
  ```ts
74
150
  import { AgentBudget } from 'budget-agent';
@@ -78,7 +154,7 @@ const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
78
154
 
79
155
  const agent = new AgentBudget({
80
156
  apiKey: process.env.OPENAI_API_KEY,
81
- limits: { maxCostUSD: 0.10 },
157
+ limits: { maxCostUSD: 0.50 },
82
158
  executor: async (request) => {
83
159
  const completion = await openai.chat.completions.create({
84
160
  model: request.model,
@@ -100,23 +176,352 @@ const agent = new AgentBudget({
100
176
  });
101
177
  ```
102
178
 
103
- Works with OpenAI, Anthropic, Ollama, Together AI, Fireworks, LocalAI, or any OpenAI-compatible endpoint.
179
+ ---
180
+
181
+ ## Anthropic budget limits
182
+
183
+ Set spending caps on Claude Opus, Sonnet, and Haiku. Track token usage and enforce cost limits for Anthropic models.
184
+
185
+ ```ts
186
+ const agent = new AgentBudget({
187
+ apiKey: process.env.ANTHROPIC_API_KEY,
188
+ limits: { maxCostUSD: 0.25, maxSteps: 20 },
189
+ executor: async (request) => {
190
+ const response = await fetch('https://api.anthropic.com/v1/messages', {
191
+ method: 'POST',
192
+ headers: {
193
+ 'x-api-key': process.env.ANTHROPIC_API_KEY!,
194
+ 'anthropic-version': '2023-06-01',
195
+ 'content-type': 'application/json',
196
+ },
197
+ body: JSON.stringify({
198
+ model: request.model,
199
+ messages: request.messages,
200
+ max_tokens: 1024,
201
+ }),
202
+ });
203
+ const data = await response.json();
204
+ return {
205
+ model: data.model,
206
+ usage: {
207
+ prompt_tokens: data.usage?.input_tokens ?? 0,
208
+ completion_tokens: data.usage?.output_tokens ?? 0,
209
+ total_tokens: (data.usage?.input_tokens ?? 0) + (data.usage?.output_tokens ?? 0),
210
+ },
211
+ choices: [{
212
+ message: { role: 'assistant', content: data.content?.[0]?.text ?? '' },
213
+ finish_reason: 'stop',
214
+ }],
215
+ };
216
+ },
217
+ });
218
+ ```
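
The executor added above reads only `content[0].text` and always reports `finish_reason: 'stop'`. A slightly more defensive mapping is sketched below. The Anthropic Messages response fields used here (`content` blocks with a `type`, `stop_reason`, `usage.input_tokens`, `usage.output_tokens`) are part of that API; the function name `mapAnthropicMessage` and the `ChatCompletionLike` type are hypothetical, and the target shape simply mirrors what the README example returns.

```ts
// Hypothetical helper; the output shape mirrors the executor in the README diff.
interface AnthropicMessage {
  model: string;
  content: Array<{ type: string; text?: string }>;
  stop_reason: string | null;
  usage: { input_tokens: number; output_tokens: number };
}

interface ChatCompletionLike {
  model: string;
  usage: { prompt_tokens: number; completion_tokens: number; total_tokens: number };
  choices: Array<{
    message: { role: 'assistant'; content: string };
    finish_reason: string;
  }>;
}

function mapAnthropicMessage(data: AnthropicMessage): ChatCompletionLike {
  // Concatenate every text block instead of reading only the first one.
  const text = data.content
    .filter((block) => block.type === 'text')
    .map((block) => block.text ?? '')
    .join('');

  // Translate Anthropic stop reasons to OpenAI-style finish reasons.
  const finishReason =
    data.stop_reason === 'max_tokens' ? 'length'
    : data.stop_reason === 'tool_use' ? 'tool_calls'
    : 'stop';

  const promptTokens = data.usage.input_tokens;
  const completionTokens = data.usage.output_tokens;

  return {
    model: data.model,
    usage: {
      prompt_tokens: promptTokens,
      completion_tokens: completionTokens,
      total_tokens: promptTokens + completionTokens,
    },
    choices: [{ message: { role: 'assistant', content: text }, finish_reason: finishReason }],
  };
}
```

An executor using this helper would also want to check `response.ok` before parsing, since an error body carries no `usage` and would otherwise be counted as a zero-cost step.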
219
+
220
+ ---
221
+
222
+ ## LangGraph budget control
223
+
224
+ Add budget limits to LangGraph agent graphs. Prevent infinite loops and control cost per execution.
225
+
226
+ ```ts
227
+ import { AgentBudget, BudgetError } from 'budget-agent';
228
+
229
+ const agent = new AgentBudget({
230
+ apiKey: process.env.OPENROUTER_API_KEY,
231
+ limits: { maxCostUSD: 0.20, maxSteps: 50 },
232
+ });
233
+
234
+ // Use inside a LangGraph node
235
+ async function agentNode(state) {
236
+ const response = await agent.step({
237
+ model: 'anthropic/claude-sonnet-4-5',
238
+ messages: state.messages,
239
+ });
240
+ return { messages: [...state.messages, response.choices[0].message] };
241
+ }
242
+ ```
243
+
244
+ ---
245
+
246
+ ## LangChain cost monitoring
247
+
248
+ Track costs for LangChain chains and agents. Set token limits and spending caps.
249
+
250
+ ```ts
251
+ import { AgentBudget } from 'budget-agent';
252
+
253
+ const agent = new AgentBudget({
254
+ apiKey: process.env.OPENROUTER_API_KEY,
255
+ limits: { maxCostUSD: 0.15, maxTotalTokens: 100_000 },
256
+ });
257
+
258
+ // Wrap any LangChain call
259
+ const response = await agent.step({
260
+ model: 'openai/gpt-5.5',
261
+ messages: [{ role: 'user', content: prompt }],
262
+ });
263
+ ```
264
+
265
+ ---
266
+
267
+ ## OpenRouter spend caps
268
+
269
+ budget-agent fetches live pricing from OpenRouter. No hardcoded price tables. If OpenRouter adds a model, it works automatically.
270
+
271
+ ```ts
272
+ const agent = new AgentBudget({
273
+ apiKey: process.env.OPENROUTER_API_KEY,
274
+ limits: { maxCostUSD: 0.10 },
275
+ });
276
+
277
+ // Pricing is fetched and cached automatically
278
+ const response = await agent.step({
279
+ model: 'anthropic/claude-sonnet-4-5',
280
+ messages: [{ role: 'user', content: 'Hello' }],
281
+ });
282
+ ```
283
+
284
+ ---
285
+
286
+ ## Ollama agent limits
287
+
288
+ Set budget limits for local Ollama models. Track token usage even for self-hosted inference.
289
+
290
+ ```ts
291
+ const agent = new AgentBudget({
292
+ apiKey: 'ollama',
293
+ limits: { maxSteps: 100, maxWallTimeMs: 60_000 },
294
+ baseUrl: 'http://localhost:11434/v1',
295
+ executor: async (request) => {
296
+ const res = await fetch('http://localhost:11434/api/chat', {
297
+ method: 'POST',
298
+ body: JSON.stringify({ model: request.model, messages: request.messages }),
299
+ });
300
+ const data = await res.json();
301
+ return {
302
+ model: data.model,
303
+ usage: data.usage ?? { prompt_tokens: 0, completion_tokens: 0, total_tokens: 0 },
304
+ choices: data.messages?.map((m) => ({
305
+ message: { role: m.role, content: m.content },
306
+ finish_reason: 'stop',
307
+ })) ?? [],
308
+ };
309
+ },
310
+ });
311
+ ```
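
One caveat on the Ollama executor added above: Ollama's native `/api/chat` endpoint streams by default, and with `stream: false` it returns a single object whose reply is in `message` (singular) and whose token counts are in `prompt_eval_count` and `eval_count`. The example reads `data.usage` and `data.messages`, which that endpoint does not return, so every step would be recorded with zero tokens and no choices. A sketch of a mapping that matches the native response follows; `buildOllamaBody` and `mapOllamaChat` are hypothetical names, and the output shape mirrors the README example.

```ts
// Hypothetical helpers for Ollama's native /api/chat endpoint.
interface ChatMessage { role: string; content: string; }

interface OllamaChatResponse {
  model: string;
  message?: ChatMessage;
  done?: boolean;
  prompt_eval_count?: number; // input tokens
  eval_count?: number;        // output tokens
}

// Request body with streaming disabled so the reply is one JSON object.
function buildOllamaBody(model: string, messages: ChatMessage[]): string {
  return JSON.stringify({ model, messages, stream: false });
}

function mapOllamaChat(data: OllamaChatResponse) {
  const promptTokens = data.prompt_eval_count ?? 0;
  const completionTokens = data.eval_count ?? 0;
  return {
    model: data.model,
    usage: {
      prompt_tokens: promptTokens,
      completion_tokens: completionTokens,
      total_tokens: promptTokens + completionTokens,
    },
    // A missing message yields no choices rather than an empty reply.
    choices: data.message
      ? [{
          message: { role: data.message.role, content: data.message.content },
          finish_reason: 'stop',
        }]
      : [],
  };
}
```

Alternatively, pointing the executor at the OpenAI-compatible `/v1/chat/completions` route (the `baseUrl` in the example already ends in `/v1`) would return the `usage` and `choices` fields directly.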
312
+
313
+ ---
314
+
315
+ ## CrewAI budget enforcement
316
+
317
+ Add cost limits to CrewAI agent crews. Prevent multi-agent systems from running up bills.
318
+
319
+ ```ts
320
+ import { AgentBudget } from 'budget-agent';
321
+
322
+ const agent = new AgentBudget({
323
+ apiKey: process.env.OPENROUTER_API_KEY,
324
+ limits: { maxCostUSD: 1.00, maxSteps: 100 },
325
+ });
326
+
327
+ // Use in CrewAI task execution
328
+ const response = await agent.step({
329
+ model: 'anthropic/claude-sonnet-4-5',
330
+ messages: [{ role: 'user', content: taskDescription }],
331
+ });
332
+ ```
333
+
334
+ ---
335
+
336
+ ## Mastra agent limits
337
+
338
+ Set budget limits for Mastra agents. Track cost and tokens across agent workflows.
339
+
340
+ ```ts
341
+ import { AgentBudget } from 'budget-agent';
342
+
343
+ const agent = new AgentBudget({
344
+ apiKey: process.env.OPENROUTER_API_KEY,
345
+ limits: { maxCostUSD: 0.50, maxSteps: 30 },
346
+ });
347
+ ```
348
+
349
+ ---
350
+
351
+ ## AutoGen cost control
352
+
353
+ Add budget limits to AutoGen multi-agent conversations. Prevent agent loops from exceeding budget.
354
+
355
+ ```ts
356
+ import { AgentBudget } from 'budget-agent';
357
+
358
+ const agent = new AgentBudget({
359
+ apiKey: process.env.OPENROUTER_API_KEY,
360
+ limits: { maxCostUSD: 0.25, maxSteps: 20 },
361
+ });
362
+ ```
363
+
364
+ ---
365
+
366
+ ## LLM observability
367
+
368
+ Subscribe to lifecycle events for full visibility into agent behavior.
369
+
370
+ ```ts
371
+ agent.on('step:start', (e) => console.log('Step', e.stepIndex, 'started'));
372
+ agent.on('step:token', (e) => process.stdout.write(e.token));
373
+ agent.on('step:end', (e) => console.log(`Step cost: $${e.costUSD}`));
374
+ agent.on('budget:exceeded', (e) => console.log('Limit hit:', e.exceeded.reason));
375
+ agent.on('budget:warning', (e) => console.log(`Warning: ${e.pctConsumed * 100}% consumed`));
376
+ agent.on('model:downgraded', (e) => console.log(`Downgraded: ${e.from} → ${e.to}`));
377
+ ```
378
+
379
+ ---
380
+
381
+ ## Adaptive model routing
382
+
383
+ Downgrade to cheaper models as budget depletes. Automatic fallback chains.
384
+
385
+ ```ts
386
+ const agent = new AgentBudget({
387
+ apiKey: key,
388
+ limits: { maxCostUSD: 5.00 },
389
+ adaptiveRouting: {
390
+ fallbackChain: [
391
+ 'anthropic/claude-opus-4.8-fast',
392
+ 'openai/gpt-5.5',
393
+ 'openrouter/free',
394
+ ],
395
+ thresholds: [0.4, 0.75],
396
+ },
397
+ });
398
+ ```
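
The diff does not say how `thresholds` maps onto `fallbackChain`. One plausible reading, assumed here and not confirmed by the README, is that each threshold is a fraction of budget consumed at which the agent moves one position down the chain. The function name `pickModel` is hypothetical.

```ts
// Hypothetical sketch: choose a model from a fallback chain by budget consumed.
// Assumes thresholds are sorted ascending and expressed as fractions (0 to 1).
function pickModel(
  fallbackChain: string[],
  thresholds: number[],
  pctConsumed: number,
): string {
  // Each threshold that has been reached moves one step down the chain.
  let index = 0;
  for (const threshold of thresholds) {
    if (pctConsumed >= threshold) index++;
  }
  // Clamp so extra thresholds cannot index past the cheapest model.
  const model = fallbackChain[Math.min(index, fallbackChain.length - 1)];
  if (model === undefined) {
    throw new Error('fallbackChain must contain at least one model');
  }
  return model;
}
```

Under this reading, the configuration in the diff would use the first model until 40% of the budget is spent, the second until 75%, and the last one after that.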
399
+
400
+ ---
104
401
 
105
- ## Features
402
+ ## Circuit breaker
403
+
404
+ Detect repetition or stagnation and halt the agent before it burns through credits.
405
+
406
+ ```ts
407
+ const agent = new AgentBudget({
408
+ apiKey: key,
409
+ limits: { maxCostUSD: 1.00 },
410
+ circuitBreaker: {
411
+ repetitionWindow: 3,
412
+ repetitionThreshold: 0.85,
413
+ stagnationWindow: 4,
414
+ stagnationMinLength: 50,
415
+ },
416
+ });
417
+ ```
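
The diff does not define how repetition is measured. As an illustration only, the sketch below treats `repetitionWindow` as the number of most recent outputs to compare and `repetitionThreshold` as a minimum similarity between consecutive outputs, using Jaccard similarity over word sets. The metric and the names `jaccard` and `isRepeating` are assumptions, not budget-agent's documented behavior.

```ts
// Hypothetical repetition detector; the similarity metric is an assumption.
function jaccard(a: string, b: string): number {
  const setA = new Set(a.toLowerCase().split(/\s+/).filter(Boolean));
  const setB = new Set(b.toLowerCase().split(/\s+/).filter(Boolean));
  if (setA.size === 0 && setB.size === 0) return 1; // two empty texts are identical
  let shared = 0;
  setA.forEach((word) => {
    if (setB.has(word)) shared++;
  });
  // |A ∩ B| / |A ∪ B|
  return shared / (setA.size + setB.size - shared);
}

// True when each of the last `window` outputs closely resembles the one before it.
function isRepeating(outputs: string[], window: number, threshold: number): boolean {
  if (window < 2 || outputs.length < window) return false;
  const recent = outputs.slice(-window);
  let previous: string | undefined;
  for (const text of recent) {
    if (previous !== undefined && jaccard(previous, text) < threshold) {
      return false;
    }
    previous = text;
  }
  return true;
}
```

A caller would append each step's output to a history array and halt the loop once `isRepeating(history, 3, 0.85)` returns true.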
106
418
 
107
- - **Cost limits** -- hard stop at a USD ceiling across all steps
108
- - **Token limits** -- cap input, output, or total tokens
109
- - **Step limits** -- max number of LLM calls
110
- - **Wall time limits** -- kill agents that run too long
111
- - **Pre-flight checks** -- estimate cost before spending money
112
- - **Rollback on exceed** -- step rolls back so retry stays clean
113
- - **Adaptive routing** -- auto-downgrade to cheaper models as budget depletes
114
- - **Circuit breaker** -- detect repetition or stagnation, halt the agent
115
- - **Auto-compress** -- truncate message history when tokens exceed threshold
116
- - **Checkpoints** -- save and resume agent state across restarts
117
- - **Streaming** -- set `stream: true`, listen for `step:token` events
118
- - **Rate-limit retry** -- automatic 429 retry with exponential backoff
119
- - **OpenTelemetry** -- optional tracing spans
419
+ ---
420
+
421
+ ## Auto-compress messages
422
+
423
+ Truncate message history with an LLM summary when token count exceeds a threshold.
424
+
425
+ ```ts
426
+ const agent = new AgentBudget({
427
+ apiKey: key,
428
+ limits: { maxTotalTokens: 100_000 },
429
+ autoCompress: {
430
+ thresholdTokens: 80_000,
431
+ keepLastN: 4,
432
+ },
433
+ });
434
+ ```
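
To make the two options concrete, here is a minimal sketch of threshold-triggered compression that keeps the last `keepLastN` messages and replaces the rest with a summary. The character-based token estimate, the `summarize` callback, and the function names are hypothetical; the diff does not show how budget-agent counts tokens or produces its summary.

```ts
// Hypothetical sketch of history compression; not budget-agent's source.
interface Message {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// Rough heuristic (about 4 characters per token); a real tokenizer would differ.
function estimateTokens(messages: Message[]): number {
  const chars = messages.reduce((sum, m) => sum + m.content.length, 0);
  return Math.ceil(chars / 4);
}

function compressHistory(
  messages: Message[],
  thresholdTokens: number,
  keepLastN: number,
  summarize: (older: Message[]) => string, // in practice this would be an LLM call
): Message[] {
  // Nothing to do while under the threshold or when there is nothing to fold away.
  if (estimateTokens(messages) <= thresholdTokens || messages.length <= keepLastN) {
    return messages;
  }
  const cut = messages.length - keepLastN;
  const older = messages.slice(0, cut);
  const recent = messages.slice(cut);
  const summary: Message = {
    role: 'system',
    content: `Summary of earlier conversation: ${summarize(older)}`,
  };
  return [summary, ...recent];
}
```

Note that the summary call itself consumes tokens, so an implementation along these lines would need to count that call against the same budget.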
435
+
436
+ ---
437
+
438
+ ## Checkpoints
439
+
440
+ Save and resume agent state across restarts.
441
+
442
+ ```ts
443
+ const agent = new AgentBudget({
444
+ apiKey: key,
445
+ limits: { maxCostUSD: 0.50 },
446
+ checkpoint: { enabled: true, path: './agent-state.json' },
447
+ });
448
+
449
+ // Resume later
450
+ const resumed = await AgentBudget.resume(options);
451
+ ```
452
+
453
+ ---
454
+
455
+ ## Warning thresholds
456
+
457
+ Get notified before hitting limits.
458
+
459
+ ```ts
460
+ const agent = new AgentBudget({
461
+ limits: { maxCostUSD: 0.10 },
462
+ warningThreshold: 0.5,
463
+ });
464
+
465
+ agent.on('budget:warning', (e) => {
466
+ console.log(`${e.pctConsumed * 100}% of ${e.reason} budget consumed`);
467
+ });
468
+ ```
469
+
470
+ ---
471
+
472
+ ## budget-agent vs LangSmith
473
+
474
+ LangSmith is an observability platform. budget-agent is a runtime enforcement layer. LangSmith shows you what happened. budget-agent stops it from happening.
475
+
476
+ | | budget-agent | LangSmith |
477
+ |---|---|---|
478
+ | Runtime enforcement | Yes | No |
479
+ | Pre-flight cost estimation | Yes | No |
480
+ | Budget limits | Hard stops | Soft alerts |
481
+ | Pricing | Free, self-hosted | Paid SaaS |
482
+ | Provider lock-in | None | LangChain ecosystem |
483
+
484
+ ---
485
+
486
+ ## budget-agent vs Helicone
487
+
488
+ Helicone is a proxy for LLM cost tracking. budget-agent is an SDK that enforces limits at runtime. Helicone tracks after the fact. budget-agent blocks before spend happens.
489
+
490
+ | | budget-agent | Helicone |
491
+ |---|---|---|
492
+ | Runtime enforcement | Yes | No |
493
+ | Pre-flight checks | Yes | No |
494
+ | Self-hosted | Yes | Cloud only |
495
+ | Free tier | Yes | Limited |
496
+
497
+ ---
498
+
499
+ ## budget-agent vs Langfuse
500
+
501
+ Langfuse is an LLM observability tool. budget-agent is a budget enforcement SDK. Langfuse gives you dashboards. budget-agent gives you hard limits.
502
+
503
+ | | budget-agent | Langfuse |
504
+ |---|---|---|
505
+ | Runtime enforcement | Yes | No |
506
+ | Pre-flight cost estimation | Yes | No |
507
+ | Budget limits | Hard stops | Observability only |
508
+ | Self-hosted | Yes | Yes |
509
+ | Free | Yes | Yes (self-hosted) |
510
+
511
+ ---
512
+
513
+ ## budget-agent vs OpenAI Usage Dashboard
514
+
515
+ OpenAI's dashboard shows usage after the fact. budget-agent prevents overspend in real time.
516
+
517
+ | | budget-agent | OpenAI Dashboard |
518
+ |---|---|---|
519
+ | Real-time enforcement | Yes | No |
520
+ | Pre-flight checks | Yes | No |
521
+ | Multi-provider | Yes | OpenAI only |
522
+ | Agent loop protection | Yes | No |
523
+
524
+ ---
120
525
 
121
526
  ## API
122
527
 
@@ -143,17 +548,9 @@ Works with OpenAI, Anthropic, Ollama, Together AI, Fireworks, LocalAI, or any Op
143
548
 
144
549
  One LLM call. Checks limits before and after. Throws `BudgetError` on exceed.
145
550
 
146
- ```ts
147
- const response = await agent.step({
148
- model: 'anthropic/claude-sonnet-4-5',
149
- messages: [{ role: 'user', content: 'Hi' }],
150
- stream: true,
151
- });
152
- ```
153
-
154
551
  ### `agent.getUsage()`
155
552
 
156
- Returns current usage: `steps`, `totalInputTokens`, `totalOutputTokens`, `totalCostUSD`, `elapsedMs`, `stepHistory`.
553
+ Returns: `steps`, `totalInputTokens`, `totalOutputTokens`, `totalCostUSD`, `elapsedMs`, `stepHistory`.
157
554
 
158
555
  ### `agent.reset()`
159
556
 
@@ -161,18 +558,19 @@ Reset all counters.
161
558
 
162
559
  ### `agent.refreshPricing()`
163
560
 
164
- Force re-fetch model prices from OpenRouter.
561
+ Force re-fetch model prices.
165
562
 
166
- ## Events
563
+ ### `agent.summary()`
167
564
 
168
- ```ts
169
- agent.on('step:start', (e) => {});
170
- agent.on('step:token', (e) => {});
171
- agent.on('step:end', (e) => {});
172
- agent.on('budget:exceeded', (e) => {});
173
- agent.on('budget:warning', (e) => {});
174
- agent.on('model:downgraded', (e) => {});
175
- ```
565
+ Formatted usage table in console.
566
+
567
+ ### `agent.loadCheckpoint()` / `agent.clearCheckpoint()`
568
+
569
+ Load or clear persisted state.
570
+
571
+ ### `AgentBudget.resume(options, checkpointPath?)`
572
+
573
+ Create a new agent pre-loaded with checkpoint state.
176
574
 
177
575
  ## License
178
576
 
package/package.json CHANGED
@@ -1,7 +1,7 @@
1
1
  {
2
2
  "name": "budget-agent",
3
- "version": "0.4.7",
4
- "description": "Control LLM agent costs with real-time token, cost, and step tracking. Set budget limits, enforce spend caps, and prevent runaway agents. Works with OpenAI, Anthropic, OpenRouter, Ollama, and any provider.",
3
+ "version": "0.4.9",
4
+ "description": "Track AI agent costs, tokens, runtime and spending. Prevent runaway OpenAI, Anthropic, LangGraph and OpenRouter agents from exceeding budget.",
5
5
  "type": "module",
6
6
  "main": "./dist/index.js",
7
7
  "types": "./dist/index.d.ts",
@@ -29,31 +29,41 @@
29
29
  "test:legacy": "tsx test-integration.ts"
30
30
  },
31
31
  "keywords": [
32
- "llm",
32
+ "ai",
33
33
  "agent",
34
- "budget",
35
- "cost-control",
36
- "token-limit",
37
- "openrouter",
34
+ "llm",
38
35
  "openai",
39
36
  "anthropic",
40
- "llm-cost",
37
+ "langgraph",
38
+ "langchain",
39
+ "openrouter",
40
+ "ollama",
41
+ "crewai",
42
+ "mastra",
43
+ "autogen",
44
+ "cost-tracking",
45
+ "budget",
46
+ "token-tracking",
41
47
  "agent-budget",
48
+ "ai-cost",
49
+ "ai-observability",
50
+ "agent-monitoring",
51
+ "guardrails",
52
+ "runtime-limits",
53
+ "token-limits",
42
54
  "spending-limit",
43
- "rate-limit",
55
+ "cost-control",
56
+ "llm-cost",
57
+ "agent-guardrails",
58
+ "runaway-agent",
59
+ "budget-enforcement",
44
60
  "circuit-breaker",
45
61
  "checkpoint",
46
- "token-tracker",
47
- "cost-tracker",
48
- "llm-agent",
49
- "ai-agent",
50
- "prompt-cost",
51
- "usage-tracking",
52
- "budget-enforcement",
53
- "ollama",
54
- "gpt-4",
62
+ "gpt-5.5",
55
63
  "claude",
56
- "llm-proxy"
64
+ "langsmith",
65
+ "langfuse",
66
+ "helicone"
57
67
  ],
58
68
  "license": "MIT",
59
69
  "repository": {