npm - budget-agent - Versions diffs - 0.4.4 → 0.4.6 - Mend

budget-agent 0.4.4 → 0.4.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (3)

package/README.md CHANGED

@@ -1,15 +1,17 @@
-# agent-budget
+# budget-agent
-Budget-aware enforcement layer for LLM agents. Track token, cost, and step usage in real time. Enforce limits before and after every LLM call. Works with any provider.
+Control LLM agent costs with real-time token, cost, and step tracking. Set budget limits, enforce spend caps, and prevent runaway agents from burning through your API credits.
-```
+**Works with OpenAI, Anthropic, OpenRouter, Ollama, Together AI, Fireworks, and any OpenAI-compatible endpoint.**
+## Install
+```bash
 npm install budget-agent
 ```
 ## Quick start
-You bring your own API key and model. The SDK calls your provider.
 ```ts
 import { AgentBudget } from 'budget-agent';
@@ -27,48 +29,38 @@ console.log(agent.getUsage());
 // { steps: 1, totalCostUSD: 0.000015, totalInputTokens: 12, ... }
 ```
-## How it works
+## Why use this
-You provide the **model**, the **messages**, and your **API key**. The SDK:
+LLM API calls cost money. Agent loops multiply that cost across every step. Without guardrails, a single runaway agent can burn through your credits in seconds.
-1. Checks budget before the call (pre-flight)
-2. Makes the API request to your provider
-3. Tracks tokens, cost, and duration
-4. Checks budget after the call (post-step)
-5. Emits events for streaming, warnings, and overages
+This SDK sits between your agent and the LLM provider. It tracks every call, checks your budget before each one, and stops the agent when it hits a limit. No provider is bundled. No model is defaulted. You bring everything.
-No provider is bundled. No model is defaulted. You bring everything.
+## Budget limits
-## Limits
-Budget guardrails that stop your agent before it spends too much:
+Set limits on cost, tokens, steps, and wall time. Every limit is optional.
 ```ts
-limits: {
-  maxCostUSD:     0.05,   // total USD before the agent aborts
-  maxSteps:       10,     // total LLM calls before abort
-  maxInputTokens: 50000,  // total input tokens sent to models
-  maxOutputTokens: 10000, // total output tokens received
-  maxTotalTokens:  60000, // input + output combined
-  maxWallTimeMs:   60000, // 60 seconds wall clock
-}
+const agent = new AgentBudget({
+  apiKey: key,
+  limits: {
+    maxCostUSD:      0.05,   // total USD before abort
+    maxSteps:        10,     // total LLM calls before abort
+    maxInputTokens:  50000,  // total input tokens
+    maxOutputTokens: 10000,  // total output tokens
+    maxTotalTokens:  60000,  // input + output combined
+    maxWallTimeMs:   60000,  // 60 seconds wall clock
+  },
+});
 ```
-Every limit is optional. Omit what you don't want to enforce.
 ### How enforcement works
 Each `step()` runs two checks:
-1. **Pre-flight** — before the API call. Estimates output cost (default 512 tokens) and catches over-budget calls before burning money.
-2. **Post-step** — after recording the real token/cost. If exceeded, the step is **rolled back** from the tracker so you can retry without a stale balance.
+1. **Pre-flight** -- before the API call. Estimates output cost and catches over-budget calls before spending money.
+2. **Post-step** -- after recording real token/cost data. If a limit is exceeded, the step is rolled back from the tracker so you can retry without a stale balance.
 ```ts
-const agent = new AgentBudget({
-  apiKey: key,
-  limits: { maxCostUSD: 0.01, maxSteps: 3 },
-});
 try {
   await agent.step({ model, messages });
 } catch (err) {
@@ -85,23 +77,15 @@ const agent = new AgentBudget({
   apiKey: key,
   limits: { maxCostUSD: 0.01 },
   onExceeded: (usage) => {
-    // Log, alert, switch models — never throws
     console.log(`Over budget: $${usage.totalCostUSD}`);
+    // Log, alert, switch models -- never throws
   },
 });
 ```
-### Tune pre-flight estimation
-```ts
-limits: {
-  maxCostUSD: 0.05,
-  preflightCheck: false,              // skip pre-flight entirely
-  preflightOutputTokenEstimate: 2048, // safety buffer (default 512)
-}
-```
+### Warning thresholds
-### Warning thresholds (non-blocking)
+Get notified before hitting limits:
 ```ts
 const agent = new AgentBudget({
@@ -114,7 +98,9 @@ agent.on('budget:warning', (e) => {
 });
 ```
-### Combine with adaptive routing
+## Adaptive model routing
+Downgrade to cheaper models as budget depletes:
 ```ts
 const agent = new AgentBudget({
@@ -122,20 +108,18 @@ const agent = new AgentBudget({
   limits: { maxCostUSD: 5.00 },
   adaptiveRouting: {
     fallbackChain: [
-      'anthropic/claude-opus-4.8-fast', // $15/M tokens — best model
-      'openai/gpt-4o',                  // $5/M tokens
-      'openrouter/free',                // $0 — emergency
+      'anthropic/claude-opus-4.8-fast', // best model
+      'openai/gpt-4o',                  // mid-tier
+      'openrouter/free',                // emergency fallback
     ],
-    thresholds: [0.4, 0.75], // downgrade at 40% and 75% of budget consumed
+    thresholds: [0.4, 0.75], // downgrade at 40% and 75% budget consumed
   },
 });
 ```
-The router downgrades the model tier as the budget depletes. Each `step()` checks the current consumption against the thresholds and selects the appropriate model from the chain before the API call.
 ## Bring your own executor
-Use any LLM provider — OpenAI, Anthropic, Ollama, local models, or the OpenRouter Agent SDK:
+Use any LLM provider with a custom executor:
 ```ts
 import { AgentBudget } from 'budget-agent';
@@ -165,66 +149,22 @@ const agent = new AgentBudget({
     };
   },
 });
-const response = await agent.step({
-  model: 'anthropic/claude-opus-4.8-fast',
-  messages: [{ role: 'user', content: 'Hello' }],
-});
-```
-Or use raw fetch to any API:
-```ts
-const agent = new AgentBudget({
-  apiKey: 'none',
-  limits: { maxCostUSD: 0.05 },
-  executor: async (request) => {
-    const res = await fetch('http://localhost:11434/api/chat', {
-      method: 'POST',
-      body: JSON.stringify({ model: request.model, messages: request.messages }),
-    });
-    const data = await res.json();
-    return {
-      model: data.model,
-      usage: data.usage ?? { prompt_tokens: 0, completion_tokens: 0, total_tokens: 0 },
-      choices: data.messages?.map((m: any) => ({
-        message: { role: m.role, content: m.content },
-        finish_reason: 'stop',
-      })) ?? [],
-    };
-  },
-});
 ```
-## Built-in OpenRouter support
-By default, the SDK calls OpenRouter's API. Configure the endpoint and headers:
-```ts
-const agent = new AgentBudget({
-  apiKey: process.env.OPENROUTER_API_KEY,
-  baseUrl: 'https://openrouter.ai/api/v1',        // default — change for any OpenAI-compatible API
-  siteUrl: 'https://mysite.com',                   // OpenRouter attribution
-  appTitle: 'My App',                              // OpenRouter attribution
-  defaultHeaders: { 'X-Custom': 'value' },         // extra headers for every request
-  limits: { maxCostUSD: 0.10 },
-});
-```
-Works with any OpenAI-compatible endpoint: OpenRouter, OpenAI, Together AI, Fireworks, LocalAI, Ollama (with compat layer), etc.
+Works with OpenAI, Anthropic, Ollama, Together AI, Fireworks, LocalAI, or any OpenAI-compatible API.
 ## Features
-- **Budget enforcement** — set limits on cost, tokens, steps, wall time. Checked pre-flight and post-step.
-- **Auto-compress** — truncate message history with an LLM summary when token count exceeds a threshold.
-- **Circuit breaker** — detect repetition or stagnation and halt the agent.
-- **Adaptive routing** — downgrade to cheaper models as budget depletes.
-- **Checkpoints** — save and resume agent state across restarts.
-- **Events** — subscribe to lifecycle events (`step:start`, `step:end`, `step:token`, `budget:exceeded`, etc.).
-- **Pricing cache** — model pricing fetched from OpenRouter with configurable TTL (or use `setModelPricing()` for any model).
-- **Rate-limit retry** — automatic 429 retry with exponential backoff (3 attempts).
-- **Streaming** — set `stream: true` and listen for `step:token` events.
-- **OpenTelemetry** — optional spans via `telemetry: { enabled: true }` (requires `@opentelemetry/api`).
+- **Budget enforcement** -- cost, tokens, steps, wall time limits checked before and after every LLM call
+- **Adaptive routing** -- automatic model downgrade as budget depletes
+- **Circuit breaker** -- detect repetition or stagnation and halt the agent
+- **Auto-compress** -- truncate message history with LLM summary when tokens exceed threshold
+- **Checkpoints** -- save and resume agent state across restarts
+- **Streaming** -- set `stream: true` and listen for `step:token` events
+- **Events** -- subscribe to `step:start`, `step:end`, `step:token`, `budget:exceeded`, and more
+- **Pricing cache** -- model pricing fetched from OpenRouter with configurable TTL
+- **Rate-limit retry** -- automatic 429 retry with exponential backoff
+- **OpenTelemetry** -- optional tracing spans via `telemetry: { enabled: true }`
 ## API
@@ -232,21 +172,20 @@ Works with any OpenAI-compatible endpoint: OpenRouter, OpenAI, Together AI, Fire
 | Option | Type | Default | Description |
 |--------|------|---------|-------------|
-| `apiKey` | `string` | — | Your provider API key |
-| `limits.*` | `object` | — | Budget limits (cost, tokens, steps, wall time) |
-| `executor` | `AgentExecutor` | — | Custom API executor (replaces built-in fetch) |
-| `baseUrl` | `string` | `https://openrouter.ai/api/v1` | API base URL for built-in fetch |
-| `defaultHeaders` | `object` | — | Extra HTTP headers for built-in fetch |
-| `autoCompress` | `object` | — | Auto-compress messages at token threshold |
-| `circuitBreaker` | `object` | — | Detect repetition/stagnation loops |
-| `adaptiveRouting` | `object` | — | Downgrade model tiers as budget depletes |
-| `checkpoint` | `object` | — | Persist and resume agent state |
+| `apiKey` | `string` | -- | Your provider API key |
+| `limits` | `object` | -- | Budget limits (cost, tokens, steps, wall time) |
+| `executor` | `function` | -- | Custom API executor (replaces built-in fetch) |
+| `baseUrl` | `string` | `https://openrouter.ai/api/v1` | API base URL |
+| `defaultHeaders` | `object` | -- | Extra HTTP headers |
+| `autoCompress` | `object` | -- | Auto-compress messages at token threshold |
+| `circuitBreaker` | `object` | -- | Detect repetition/stagnation loops |
+| `adaptiveRouting` | `object` | -- | Downgrade model tiers as budget depletes |
+| `checkpoint` | `object` | -- | Persist and resume agent state |
 | `onExceeded` | `'abort' \| function` | `'abort'` | Strategy when budget exceeded |
-| `onEvent` | `function` | — | Global event listener |
-| `pricingCacheTTLMs` | `number` | `300_000` | Pricing cache TTL |
-| `siteUrl` | `string` | — | OpenRouter HTTP-Referer |
-| `appTitle` | `string` | — | OpenRouter X-OpenRouter-Title |
-| `telemetry` | `object` | — | Enable OpenTelemetry spans |
+| `onEvent` | `function` | -- | Global event listener |
+| `warningThreshold` | `number` | `0.75` | Fraction of limit that triggers warning |
+| `pricingCacheTTLMs` | `number` | `300000` | Pricing cache TTL in ms |
+| `telemetry` | `object` | -- | Enable OpenTelemetry spans |
 ### `agent.step(request)`
@@ -254,17 +193,15 @@ Make one LLM call. Checks limits before and after. Throws `BudgetError` if excee
 ```ts
 const response = await agent.step({
-  model: 'anthropic/claude-opus-4.8-fast',            // any model slug
+  model: 'anthropic/claude-opus-4.8-fast',
   messages: [{ role: 'user', content: 'Hi' }],
-  stream: true,                        // optional — emit step:token events
+  stream: true, // optional -- emit step:token events
 });
 ```
-**Budget enforcement with rollback.** When a step exceeds budget, the step is recorded for circuit-breaker analysis, then rolled back before throwing. The tracker stays clean for retry. The actual spend is available in the `BudgetError`.
 ### `agent.getUsage()`
-Returns a snapshot of current usage:
+Returns current usage snapshot:
 ```ts
 {
@@ -279,7 +216,7 @@ Returns a snapshot of current usage:
 ### `agent.summary()`
-Prints a formatted table to console and returns the same usage snapshot.
+Prints a formatted table to console and returns the usage snapshot.
 ### `agent.reset()`
@@ -300,22 +237,19 @@ Static factory. Creates a new agent pre-loaded with checkpoint state.
 ## Events
 ```ts
-agent.on('step:start', (event) => console.log('Step', event.stepIndex, 'started'));
-agent.on('step:token', (event) => process.stdout.write(event.token));
-agent.on('step:end', (event) => console.log('Step cost:', event.costUSD));
-agent.on('budget:exceeded', (event) => console.log('Limit hit:', event.exceeded.reason));
-agent.on('compress:triggered', (event) => console.log('Compressed:', event.messagesBefore, '→', event.messagesAfter));
-agent.on('model:downgraded', (event) => console.log('Downgraded to', event.to));
+agent.on('step:start', (e) => console.log('Step', e.stepIndex, 'started'));
+agent.on('step:token', (e) => process.stdout.write(e.token));
+agent.on('step:end', (e) => console.log('Step cost:', e.costUSD));
+agent.on('budget:exceeded', (e) => console.log('Limit hit:', e.exceeded.reason));
+agent.on('model:downgraded', (e) => console.log('Downgraded to', e.to));
 ```
 ## Testing
-```
+```bash
 npm test
 ```
-Runs 10 real-API tests against OpenRouter with simulated pricing.
 ## License
 MIT
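
The "How enforcement works" hunk above describes a pre-flight check before the API call and a post-step check that rolls the step back when a limit is exceeded. The flow can be sketched roughly as follows. Every name here (`MiniTracker`, `BudgetExceeded`, `spentUSD`) is hypothetical and is not taken from the package source, and the provider call is modelled as a synchronous callback only to keep the sketch short; the real `step()` is async.

```ts
// Hypothetical sketch of pre-flight + post-step enforcement with rollback.
// None of these names come from budget-agent itself.
interface Limits {
  maxCostUSD?: number;
  maxSteps?: number;
}

interface StepCost {
  costUSD: number;
}

class BudgetExceeded extends Error {
  spentUSD: number; // what the rejected step actually cost (0 if never sent)
  constructor(message: string, spentUSD: number) {
    super(message);
    this.name = 'BudgetExceeded';
    this.spentUSD = spentUSD;
  }
}

class MiniTracker {
  private limits: Limits;
  private steps = 0;
  private totalCostUSD = 0;

  constructor(limits: Limits) {
    this.limits = limits;
  }

  step(estimatedCostUSD: number, call: () => StepCost): StepCost {
    // Pre-flight: reject before any money is spent if the estimate would not fit.
    if (this.exceeds(this.steps + 1, this.totalCostUSD + estimatedCostUSD)) {
      throw new BudgetExceeded('pre-flight: estimated step exceeds budget', 0);
    }

    const before = { steps: this.steps, totalCostUSD: this.totalCostUSD };
    const result = call();

    // Post-step: record the real cost, then re-check the limits.
    this.steps += 1;
    this.totalCostUSD += result.costUSD;
    if (this.exceeds(this.steps, this.totalCostUSD)) {
      // Roll back to the snapshot so a retry does not see a stale balance.
      this.steps = before.steps;
      this.totalCostUSD = before.totalCostUSD;
      throw new BudgetExceeded('post-step: actual cost exceeds budget', result.costUSD);
    }
    return result;
  }

  getUsage(): { steps: number; totalCostUSD: number } {
    return { steps: this.steps, totalCostUSD: this.totalCostUSD };
  }

  private exceeds(steps: number, costUSD: number): boolean {
    const { maxSteps, maxCostUSD } = this.limits;
    return (
      (maxSteps !== undefined && steps > maxSteps) ||
      (maxCostUSD !== undefined && costUSD > maxCostUSD)
    );
  }
}
```

Restoring a snapshot, rather than subtracting the step's cost again, avoids floating-point drift in the running total after a rollback.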

package/dist/index.js CHANGED

@@ -519,9 +519,18 @@ export class AgentBudget {
         const response = request.stream === true
             ? await this._readStream(res, request.model, stepIndex, Date.now(), pricing)
             : (await res.json());
-        // OpenRouter may return HTTP 200 with an error inside choices[0].
-        // This happens when the provider rejects the request (insufficient
-        // credits, guardrail, provider outage, etc.).
+        // OpenRouter may return HTTP 200 with an error body.
+        // Check top-level error first (rate limit, auth, etc.).
+        const bodyAny = response;
+        if (bodyAny.error) {
+            const code = bodyAny.error.code ?? 500;
+            const msg = bodyAny.error.message ?? 'Unknown error';
+            if (code === 429) {
+                throw new RateLimitError(429, 0, msg);
+            }
+            throw new UpstreamError(code, msg);
+        }
+        // Also check choices[0].error (provider-level rejection).
         const choiceError = response.choices?.[0]?.error;
         if (choiceError) {
             throw new UpstreamError(choiceError.code, choiceError.message, choiceError.metadata);
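
The hunk above adds a top-level `error` check ahead of the existing `choices[0].error` check. Below is a standalone sketch of that ordering, pulled out into one function. The `RateLimitError` and `UpstreamError` classes are stand-ins whose constructor argument order is copied from the calls in the diff; their field names, the `ResponseBody` type, and the helper name `assertNoUpstreamError` are assumptions made for illustration.

```ts
// Stand-in error classes: argument order mirrors the diff, field names are assumed.
class UpstreamError extends Error {
  code: number | string | undefined;
  metadata: unknown;
  constructor(code: number | string | undefined, message?: string, metadata?: unknown) {
    super(message);
    this.name = 'UpstreamError';
    this.code = code;
    this.metadata = metadata;
  }
}

class RateLimitError extends Error {
  status: number;
  retryAfter: number;
  constructor(status: number, retryAfter: number, message: string) {
    super(message);
    this.name = 'RateLimitError';
    this.status = status;
    this.retryAfter = retryAfter;
  }
}

// Assumed minimal shape of a parsed response body.
interface ErrorBody {
  code?: number | string;
  message?: string;
  metadata?: unknown;
}

interface ResponseBody {
  error?: ErrorBody;
  choices?: Array<{ error?: ErrorBody }>;
}

// Hypothetical helper: throws if an HTTP 200 body still carries an error.
function assertNoUpstreamError(body: ResponseBody): void {
  // 1. Top-level error (rate limit, auth, etc.), defaulting code and message as in the diff.
  if (body.error) {
    const code = body.error.code ?? 500;
    const msg = body.error.message ?? 'Unknown error';
    if (code === 429) {
      throw new RateLimitError(429, 0, msg);
    }
    throw new UpstreamError(code, msg);
  }
  // 2. Provider-level rejection reported inside the first choice.
  const choiceError = body.choices?.[0]?.error;
  if (choiceError) {
    throw new UpstreamError(choiceError.code, choiceError.message, choiceError.metadata);
  }
}
```

As in the diff, the 429 comparison is strict, so in this sketch a body whose `code` is the string `'429'` falls through to `UpstreamError` rather than `RateLimitError`.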

package/package.json CHANGED

@@ -1,7 +1,7 @@
 {
   "name": "budget-agent",
-  "version": "0.4.4",
-  "description": "Provider-agnostic budget enforcement SDK for LLM agents. Track token/cost/step usage, enforce limits, auto-compress, circuit-breaker, checkpoints, adaptive routing, and more.",
+  "version": "0.4.6",
+  "description": "Control LLM agent costs with real-time token, cost, and step tracking. Set budget limits, enforce spend caps, and prevent runaway agents. Works with OpenAI, Anthropic, OpenRouter, Ollama, and any provider.",
   "type": "module",
   "main": "./dist/index.js",
   "types": "./dist/index.d.ts",
@@ -18,19 +18,51 @@
   "engines": {
     "node": ">=18"
   },
+  "scripts": {
+    "build": "tsc",
+    "prepublishOnly": "npm run build",
+    "typecheck": "tsc --noEmit",
+    "test": "tsx test/run.ts",
+    "test:unit": "tsx test/run.ts --unit",
+    "test:integration": "tsx test/run.ts --integration",
+    "test:gauntlet": "tsx test-gauntlet.ts",
+    "test:legacy": "tsx test-integration.ts"
+  },
   "keywords": [
+    "llm",
     "agent",
     "budget",
-    "token",
-    "cost",
-    "llm",
+    "cost-control",
+    "token-limit",
+    "openrouter",
+    "openai",
+    "anthropic",
+    "llm-cost",
+    "agent-budget",
+    "spending-limit",
+    "rate-limit",
     "circuit-breaker",
     "checkpoint",
-    "rate-limit"
+    "token-tracker",
+    "cost-tracker",
+    "llm-agent",
+    "ai-agent",
+    "prompt-cost",
+    "usage-tracking",
+    "budget-enforcement",
+    "ollama",
+    "gpt-4",
+    "claude",
+    "llm-proxy"
   ],
   "license": "MIT",
   "repository": {
     "type": "git",
     "url": "https://github.com/duggal1/agent-budget.git"
+  },
+  "devDependencies": {
+    "dotenv": "^17.4.2",
+    "tsx": "^4.19.0",
+    "typescript": "^5.4.0"
   }
 }