llm_cost_tracker 0.5.0 → 0.5.1

data/README.md CHANGED
@@ -1,111 +1,103 @@
  # LLM Cost Tracker

- **Self-hosted LLM cost tracking for Ruby and Rails.** Instruments common Ruby SDKs, intercepts Faraday LLM responses, prices events locally, and can store them in your database. No proxy, no SaaS.
+ A Rails-native ledger for what your LLM calls actually cost.

  [![Gem Version](https://img.shields.io/gem/v/llm_cost_tracker.svg)](https://rubygems.org/gems/llm_cost_tracker)
  [![CI](https://github.com/sergey-homenko/llm_cost_tracker/actions/workflows/ruby.yml/badge.svg)](https://github.com/sergey-homenko/llm_cost_tracker/actions)
  [![codecov](https://codecov.io/gh/sergey-homenko/llm_cost_tracker/branch/main/graph/badge.svg)](https://codecov.io/gh/sergey-homenko/llm_cost_tracker)

- Requires Ruby 3.3+, ActiveSupport 7.1+, and Faraday 2.0+.
- ActiveRecord storage requires ActiveRecord 7.1+. The mounted dashboard requires Rails 7.1+.
+ If you have OpenAI, Anthropic, or Gemini in production and someone keeps asking "where did that bill come from?", this gem records every call into your own database, prices it locally, and gives you a dashboard you can mount in five minutes. No proxy, no SaaS account, no extra service to deploy.

- ## Why
+ It is not Langfuse, Helicone, or LiteLLM. It does not capture prompts, score completions, or replay traces. It does one thing: it tells you which provider, which model, which feature, and which user burned how much money. That's the entire pitch.

- Every Rails app with LLM integrations eventually runs into the same question: where did that invoice come from? Full observability platforms like Langfuse and Helicone solve a broader set of problems; sometimes you just need a small Rails-native ledger in your own database.
+ Requires Ruby 3.3+, ActiveSupport 7.1+, Faraday 2.0+. ActiveRecord storage and the dashboard need Rails 7.1+.

- ## What You Get
-
- - A local ActiveRecord ledger of provider, model, usage breakdown, cost, latency, tags, streaming usage, and provider response IDs
- - Optional official OpenAI and Anthropic SDK integrations, plus Faraday middleware for custom clients
- - Explicit `track` / `track_stream` helpers as a fallback for unsupported clients
- - Server-rendered Rails dashboard with overview, models, calls, tags, CSV export, and data-quality pages
- - Local pricing snapshots, price sync tasks, and budget guardrails
- - Prompt and response bodies are never persisted
-
- ## Dashboard
-
- LLM Cost Tracker ships with a server-rendered Rails Engine dashboard for spend review, attribution, and data quality checks.
-
- ![LLM Cost Tracker dashboard](docs/dashboard-overview.png)
-
- The overview page includes spend trend, budget status, provider breakdown, top models, and filterable slices. The engine also includes Models, Calls, Tags, and Data Quality pages. Plain ERB, no JavaScript bundle.
+ ![Dashboard overview](docs/dashboard-overview.png)

  ## Quickstart

+ Add to your Gemfile alongside whatever LLM client you already use:
+
  ```ruby
  gem "llm_cost_tracker"
- gem "openai"
+ gem "openai" # or "anthropic", or your existing client
  ```

+ Install, migrate, verify:
+
  ```bash
  bin/rails generate llm_cost_tracker:install --dashboard --prices
  bin/rails db:migrate
  bin/rails llm_cost_tracker:doctor
  ```

- Skip `--dashboard` if you only want the ledger. Skip `--prices` if you do not want a local pricing file yet.
+ Drop this into `config/initializers/llm_cost_tracker.rb`:

  ```ruby
  LlmCostTracker.configure do |config|
    config.storage_backend = :active_record
-   config.default_tags = -> { { environment: Rails.env } }
+   config.default_tags = -> { { environment: Rails.env } }
    config.instrument :openai
  end
+ ```
+
+ Now every OpenAI call is recorded. Wrap calls in `with_tags` to attribute spend to a user, feature, or anything else you care about:

+ ```ruby
  LlmCostTracker.with_tags(user_id: Current.user&.id, feature: "chat") do
    client = OpenAI::Client.new(api_key: ENV["OPENAI_API_KEY"])
    client.responses.create(model: "gpt-4o", input: "Hello")
  end
  ```

- After that, LLM Cost Tracker starts recording calls into `llm_api_calls` and the dashboard becomes available at `/llm-costs`.
- Protect the mounted engine with your application's authentication before exposing it outside development.
+ Visit `/llm-costs` for the dashboard. **Mount it behind your app's auth before deploying** — the gem doesn't ship with one, on purpose.
+
+ ## What you get

- ## Tradeoffs
+ - Local ActiveRecord ledger of every call: provider, model, token breakdown, cost, latency, tags, response IDs
+ - Auto-capture for the official `openai` and `anthropic` Ruby SDKs, plus Faraday middleware for `ruby-openai`, the Gemini REST API, and any client you can inject middleware into
+ - Server-rendered dashboard (plain ERB, zero JavaScript) with overview, models, calls, tags, CSV export, and a data-quality page
+ - Local pricing snapshots refreshed daily from the official provider pricing pages, applied with `bin/rails llm_cost_tracker:prices:refresh`
+ - Monthly / daily / per-call budget guardrails with notify, raise, or block-requests behavior
+ - Tag-based attribution that survives concurrency — Puma threads and Sidekiq fibers don't bleed into each other

- - Self-hosted ledger first: no proxy, no SaaS, no separate service to operate
- - Best-effort pricing for spend review and attribution, not invoice-grade billing
- - No prompt or response body storage
- - No built-in auth on the mounted dashboard
- - Use `:active_record` when you want shared dashboards and budget checks across Puma workers and Sidekiq processes
+ ## What it deliberately doesn't do

- ## Technical Docs
+ - **Doesn't run as a proxy.** Calls go directly from your app to the provider.
+ - **Doesn't store prompts or completions.** Token counts, model, cost, tags, response IDs only. Nothing else.
+ - **Doesn't promise invoice-grade accuracy.** It uses official provider pricing pages, but enterprise rates, batch discounts on unsupported endpoints, and modality tiers are not always modeled. `provider_response_id` is stored as a join key for whoever does that reconciliation.
+ - **Doesn't ship with auth on the dashboard.** It's a Rails Engine; mount it behind whatever your app already uses (Devise, basic auth, Cloudflare Access, your own session middleware).
+ - **Doesn't centralize multi-service visibility.** One Rails monolith — perfect fit. Six services in four languages — wrong tool; look at a proxy or API-layer gateway.

- - [Architecture](docs/architecture.md)
+ ## Capturing calls

- ## Usage
+ Three paths, in order of preference. Use the first one that fits your stack.

- ### Official SDK integrations
+ ### 1. Official SDK integrations

- `config.instrument` patches **official** provider SDKs only — currently the official `openai` and `anthropic` gems. SDK integrations are optional and do not add provider SDKs as gem dependencies. Install the provider SDK you already use, then enable its integration.
+ Drop-in for the official `openai` and `anthropic` gems. `config.instrument` patches the SDK's resource methods so you don't change a single call site:

  ```ruby
  LlmCostTracker.configure do |config|
-   config.instrument :openai
-   config.instrument :anthropic
+   config.instrument :openai # or :anthropic, or :all
  end
- ```
-
- The OpenAI integration records non-streaming calls through the official `openai` gem's `responses.create` and `chat.completions.create`. The Anthropic integration records non-streaming calls through the official `anthropic` gem's `messages.create`. Both integrations extract usage, model, latency, provider response ID, cache tokens, and hidden/reasoning tokens when the SDK response exposes them.

- ```ruby
- LlmCostTracker.with_tags(feature: "support_chat", user_id: Current.user&.id) do
-   anthropic = Anthropic::Client.new(api_key: ENV["ANTHROPIC_API_KEY"])
-   anthropic.messages.create(
-     model: "claude-sonnet-4-5-20250929",
+ LlmCostTracker.with_tags(feature: "support_chat") do
+   Anthropic::Client.new.messages.create(
+     model: "claude-sonnet-4-6",
      max_tokens: 1024,
      messages: [{ role: "user", content: "Hello" }]
    )
  end
  ```

- Community clients such as `ruby-openai` are not patched by `instrument`. `ruby-openai` exposes a Faraday block on its constructor and is covered by the middleware below.
+ Captures usage, model, latency, response ID, cache tokens, and reasoning tokens whenever the SDK exposes them. Provider SDKs are not added as gem dependencies; you install whichever you actually use.

- Google's official Gemini SDKs do not include Ruby. Use the Faraday middleware against Gemini's REST API, or keep custom clients behind the fallback helpers until a stable SDK integration exists.
+ This patches **only** the official Ruby SDKs. `ruby-openai` (alexrudall) and any custom client go through the Faraday middleware below.

- ### Faraday middleware
+ ### 2. Faraday middleware

- `tags:` can be a hash or callable. Callables are evaluated on each request and may accept the Faraday request env.
+ For `ruby-openai`, the Gemini REST API, custom Faraday clients, or anything OpenAI-compatible (OpenRouter, DeepSeek, LiteLLM proxies):

  ```ruby
  conn = Faraday.new(url: "https://api.openai.com") do |f|
@@ -114,66 +106,17 @@ conn = Faraday.new(url: "https://api.openai.com") do |f|
    f.response :json
    f.adapter Faraday.default_adapter
  end
-
- conn.post("/v1/responses", { model: "gpt-5-mini", input: "Hello!" })
- ```
-
- Place `llm_cost_tracker` inside the Faraday stack where it can see the final response body.
-
- The same middleware covers `ruby-openai` through its constructor block.
-
- ### Streaming
-
- Streaming is captured automatically for OpenAI, Anthropic, and Gemini when the request goes through the Faraday middleware. The middleware tees the `on_data` callback, keeps the stream flowing to your code, and records provider-reported usage once the response completes.
-
- ```ruby
- # OpenAI: include usage in the final chunk
- client.chat(parameters: {
-   model: "gpt-4o",
-   messages: [...],
-   stream: proc { |chunk| ... },
-   stream_options: { include_usage: true }
- })
  ```

- Anthropic emits usage in `message_start` + `message_delta` events. Gemini's `:streamGenerateContent` endpoint includes `usageMetadata`; the latest usage block is used.
+ Tags can be a hash or a callable evaluated per request. Place the middleware where it sees the final response body: in practice, before the JSON parser.

- Streamed calls are stored with `stream: true` and `usage_source: "stream_final"`. If the provider never sends final usage, the call is still recorded with `usage_source: "unknown"` so those calls surface on the Data Quality page.
+ Streaming works through the same path: the middleware tees the `on_data` callback so your code keeps receiving chunks normally, and the final usage gets recorded once the stream finishes. OpenAI streams need `stream_options: { include_usage: true }` for the final usage event.
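
The teeing idea can be sketched in plain Ruby. This is an illustration of the mechanism only, not the gem's internals; the chunk shapes and names here are invented:

```ruby
# Illustrative sketch of "teeing" a streaming callback: the consumer's block
# still receives every chunk, while a recorder watches for a final usage
# payload. Hypothetical shapes -- not the middleware's actual hook points.
def tee_on_data(user_callback, recorder)
  lambda do |chunk|
    recorder[:usage] = chunk[:usage] if chunk[:usage] # e.g. the final chunk
    user_callback.call(chunk)                         # stream flows on untouched
  end
end

seen = []
recorder = {}
on_data = tee_on_data(->(chunk) { seen << chunk }, recorder)

on_data.call({ text: "Hel" })
on_data.call({ text: "lo" })
on_data.call({ usage: { input_tokens: 5, output_tokens: 2 } })

seen.length      # 3 -- the consumer saw every chunk
recorder[:usage] # { input_tokens: 5, output_tokens: 2 }
```

The point of the wrapper is that the consumer's callback is called unconditionally, so recording never interrupts the stream.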

- When the provider emits a stable response object ID, LLM Cost Tracker stores it as `provider_response_id`. OpenAI and Anthropic are covered end-to-end; Gemini is best effort and may vary by endpoint or API version.
+ Per-client setup snippets for `ruby-openai`, Azure OpenAI, LiteLLM proxy, and Gemini live in [`docs/cookbook.md`](docs/cookbook.md).

- Model identifiers are extracted from the provider response, request body, stream events, or URL path depending on the provider. If no source carries a model, the event is stored under `model: "unknown"` and shows up as unknown pricing instead of being guessed.
+ ### 3. Manual `track` / `track_stream`

- For non-Faraday clients without an SDK integration, prefer adding a supported adapter. Use the explicit helper only as a fallback while wiring a client that does not expose a stable hook yet:
-
- ```ruby
- LlmCostTracker.track_stream(provider: "openai", model: "gpt-4o") do |stream|
-   my_client.stream(...) { |event| stream.event(event.to_h) }
- end
-
- # Or skip provider event parsing entirely if you already know the totals:
- LlmCostTracker.track_stream(provider: "openai", model: "gpt-4o") do |stream|
-   # ... your streaming loop ...
-   stream.usage(input_tokens: 120, output_tokens: 45)
- end
- ```
-
- If your custom streaming client exposes the provider's response object ID after the stream starts, set it explicitly:
-
- ```ruby
- LlmCostTracker.track_stream(provider: "anthropic", model: "claude-sonnet-4-6") do |stream|
-   stream.provider_response_id = response.id
-   stream.usage(input_tokens: 120, output_tokens: 45)
- end
- ```
-
- Run `bin/rails g llm_cost_tracker:add_streaming` once on existing installs to add the `stream` and `usage_source` columns. Run `bin/rails g llm_cost_tracker:add_provider_response_id` to persist provider-issued response IDs. Run `bin/rails g llm_cost_tracker:add_usage_breakdown` to add cache-read, cache-write, hidden-output, and pricing-mode columns.
-
- More client-specific snippets live in [`docs/cookbook.md`](docs/cookbook.md).
-
- ### Fallback tracking
-
- Automatic capture should be the default integration path. `track` exists for custom clients, internal gateways, migrations, and SDKs that do not expose a stable middleware or instrumentation hook yet.
+ When you have a client that doesn't expose Faraday and isn't an official SDK (internal gateways, homegrown wrappers, batch jobs replaying historical usage):

  ```ruby
  LlmCostTracker.track(
@@ -181,22 +124,16 @@ LlmCostTracker.track(
    model: "claude-sonnet-4-6",
    input_tokens: 1500,
    output_tokens: 320,
-   provider_response_id: "msg_01XFDUDYJgAACzvnptvVoYEL",
-   cache_read_input_tokens: 1200,
    feature: "summarizer",
    user_id: current_user.id
  )
  ```

- `input_tokens` is regular non-cache input. Put cache hits in
- `cache_read_input_tokens` and cache writes in `cache_write_input_tokens`; total
- tokens are calculated from the canonical billing breakdown.
-
- For manual tracking, pass the real upstream model when you know it. If a gateway only exposes a deployment or router name, use that stable identifier and add a matching `prices_file` / `pricing_overrides` entry.
+ For streaming the same way, `track_stream` accepts a block, parses provider events automatically, and records once the stream finishes. Full reference in [`docs/streaming.md`](docs/streaming.md).
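
The local pricing arithmetic behind an event like the one above is just token counts times per-1M-token rates. A worked sketch (the rates and the helper name are illustrative, not the bundled registry's):

```ruby
# Worked example of local pricing: usage multiplied by per-1M-token rates.
# The 3.00 / 15.00 rates are illustrative placeholders, not real prices.
def price_event(input_tokens:, output_tokens:, input_rate:, output_rate:)
  (input_tokens / 1_000_000.0) * input_rate +
    (output_tokens / 1_000_000.0) * output_rate
end

price_event(input_tokens: 1500, output_tokens: 320,
            input_rate: 3.00, output_rate: 15.00)
# 1500/1M * $3.00 + 320/1M * $15.00 = $0.0045 + $0.0048 = $0.0093
```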

- ### Tags
+ ## Tags: who burned this money

- Tags are application context, not provider metadata. LLM Cost Tracker detects provider/model from the response when a parser is available; tags tell you who or what caused the call.
+ Tags answer the only question that matters in attribution: which feature, which user, which job, which tenant. They're free-form strings, indexed (JSONB on Postgres, fallback elsewhere), and queryable from both Ruby and the dashboard.

  ```ruby
  LlmCostTracker.with_tags(user_id: current_user.id, feature: "support_chat", trace_id: request.uuid) do
@@ -204,68 +141,15 @@ LlmCostTracker.with_tags(user_id: current_user.id, feature: "support_chat", trac
  end
  ```

- `default_tags` can be a hash or callable. Scoped tags from `with_tags` apply only inside the block and are isolated per thread/fiber. Explicit tags passed to `track`, `track_stream`, or middleware metadata win over scoped/default tags.
-
- ## Configuration
-
- ```ruby
- LlmCostTracker.configure do |config|
-   config.storage_backend = :active_record
-   config.default_tags = -> { { environment: Rails.env } }
-   config.instrument :openai
-   config.instrument :anthropic
-   config.prices_file = Rails.root.join("config/llm_cost_tracker_prices.yml")
-   config.monthly_budget = 500.00
-   config.daily_budget = 50.00
-   config.per_call_budget = 2.00
-   config.budget_exceeded_behavior = :notify
-   config.on_budget_exceeded = ->(data) {
-     SlackNotifier.notify("#alerts", "LLM #{data[:budget_type]} budget $#{data[:total].round(2)} / $#{data[:budget]}")
-   }
- end
- ```
+ `with_tags` is thread- and fiber-isolated, so concurrent requests in Puma or jobs in Sidekiq don't bleed into each other. A `default_tags` callable on configuration runs on every event for things you always want — `environment`, `region`, deployment SHA. Explicit tags passed to `track` win over scoped tags, scoped tags win over defaults.
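
That precedence chain can be sketched as a toy model in plain Ruby. Names here are hypothetical; this is not the gem's implementation:

```ruby
# Toy model of the documented precedence: explicit > scoped (with_tags) >
# defaults, with a thread-local scope so concurrent requests stay isolated.
# Hypothetical names -- not the gem's implementation.
module TagScope
  DEFAULTS = { environment: "production" }.freeze

  def self.scoped
    Thread.current[:llm_tags] ||= {}
  end

  def self.with_tags(tags)
    previous = scoped
    Thread.current[:llm_tags] = previous.merge(tags)
    yield
  ensure
    Thread.current[:llm_tags] = previous
  end

  def self.effective(explicit = {})
    DEFAULTS.merge(scoped).merge(explicit) # later merges win
  end
end

TagScope.with_tags(feature: "chat") do
  TagScope.effective(user_id: 42)
  # { environment: "production", feature: "chat", user_id: 42 }
end

Thread.new { TagScope.effective }.value
# { environment: "production" } -- another thread never sees "chat"
```

Storing the scope in `Thread.current` is what makes the isolation work: each Puma thread gets its own tag stack for free.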

- Storage backends: `:log` (default), `:active_record`, `:custom`. Error behaviors: `:ignore`, `:warn`, `:raise`; budget behavior also supports `:block_requests`.
+ What you put in tags is **your** input; they're queryable strings. Don't put prompts, completions, emails, or secrets there. Use IDs.

- Configuration reference:
+ ## Pricing

- | Option | Default | Purpose |
- |---|---:|---|
- | `enabled` | `true` | Turns tracking on/off. |
- | `storage_backend` | `:log` | `:log`, `:active_record`, or `:custom`. |
- | `custom_storage` | `nil` | Callable storage hook for `:custom`. |
- | `default_tags` | `{}` | Hash or callable merged into every event. |
- | `prices_file` | `nil` | Local JSON/YAML price table. |
- | `pricing_overrides` | `{}` | Ruby-side model price overrides. |
- | `instrument` | none | Enables optional SDK integrations such as `:openai`, `:anthropic`, or `:all`. |
- | `monthly_budget` | `nil` | Monthly spend guardrail. |
- | `daily_budget` | `nil` | Daily spend guardrail. |
- | `per_call_budget` | `nil` | Single-event spend guardrail. |
- | `budget_exceeded_behavior` | `:notify` | `:notify`, `:raise`, or `:block_requests`. |
- | `on_budget_exceeded` | `nil` | Callback for budget events. |
- | `storage_error_behavior` | `:warn` | `:ignore`, `:warn`, or `:raise`. |
- | `unknown_pricing_behavior` | `:warn` | `:ignore`, `:warn`, or `:raise`. |
- | `log_level` | `:info` | Log level used by `:log` storage. |
- | `openai_compatible_providers` | OpenRouter + DeepSeek | Host-to-provider map for compatible APIs. |
- | `report_tag_breakdowns` | `[]` | Tag keys included in text reports. |
+ Built-in prices live in `lib/llm_cost_tracker/prices.json` and are refreshed daily from official provider pricing pages by an automated CI workflow that opens a PR on every change. Most apps run on bundled prices and never think about this.

- LLM Cost Tracker estimates cost from recorded usage and a versioned price registry. Providers usually return token usage, not a stable per-request price, so request costs are calculated locally and stored with the call. Historical rows do not change when prices update.
-
- Pricing is best effort. OpenRouter-style IDs like `openai/gpt-4o-mini` are normalized to built-in names when possible. Use `prices_file` / `pricing_overrides` for fine-tunes, gateway-specific IDs, enterprise discounts, alternate pricing modes, or models the gem does not know.
- Provider-specific entries like `openai/gpt-4o-mini` win over model-only entries like `gpt-4o-mini`.
- Pass `pricing_mode: :batch` to use optional mode-specific keys such as `batch_input` / `batch_output`; missing mode-specific keys fall back to standard `input` / `output` rates. The same pattern works for custom modes, for example `contract_input`.
-
- `storage_error_behavior = :warn` (default) lets LLM responses continue if storage fails; `:raise` exposes `StorageError#original_error`.
-
- With `unknown_pricing_behavior = :ignore` or `:warn`, unknown pricing still records token counts, but `cost` is `nil` and budget guardrails skip that event. With `:raise`, the event raises before storage. Find unpriced models:
-
- ```ruby
- LlmCostTracker::LlmApiCall.unknown_pricing.group(:model).count
- ```
-
- ### Keeping prices current
-
- Built-in prices live in `lib/llm_cost_tracker/prices.json`. The gem never fetches pricing on boot. For production, generate a local snapshot from the bundled registry, keep it under source control, and point the gem at it:
+ When you want to control updates yourself (negotiated rates, gateway-specific model IDs, or pinned reviews), generate a local snapshot:

  ```bash
  bin/rails generate llm_cost_tracker:prices
@@ -275,356 +159,117 @@ bin/rails generate llm_cost_tracker:prices
  config.prices_file = Rails.root.join("config/llm_cost_tracker_prices.yml")
  ```

- The generated file has the same shape as the bundled registry:
-
- ```yaml
- metadata:
-   updated_at: "2026-04-25"
-   currency: USD
-   unit: 1M tokens
- models:
-   my-gateway/gpt-4o-mini:
-     input: 0.20
-     cache_read_input: 0.10
-     output: 0.80
-     batch_input: 0.10
-     batch_output: 0.40
- ```
-
- Pricing precedence is `pricing_overrides`, then `prices_file`, then bundled prices. Use `prices_file` for the app's source-controlled snapshot and `pricing_overrides` only for a handful of Ruby-side emergency overrides.
-
- To refresh prices on demand:
-
- ```bash
- bin/rails llm_cost_tracker:prices:sync
- ```
-
- `llm_cost_tracker:prices:sync` refreshes a pricing file from two structured sources: LiteLLM first, OpenRouter second. LiteLLM is the primary source; OpenRouter fills gaps and helps surface discrepancies.
-
- `llm_cost_tracker:prices:sync` / `llm_cost_tracker:prices:check` perform HTTP GET requests to:
-
- - LiteLLM pricing JSON: `https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json`
- - OpenRouter Models API: `https://openrouter.ai/api/v1/models`
-
- The task writes to `ENV["OUTPUT"]`, then `config.prices_file`, in that order. It aborts if neither is present. The gem's bundled `prices.json` is only updated when you explicitly pass it through `OUTPUT=` while developing the gem. `_source: "manual"` entries are never touched. Models that are still in your file but missing from both upstream sources are left alone and reported as orphaned. For intentional custom entries, mark them as manual so they stop showing up in orphaned warnings.
-
- Use `OUTPUT=config/llm_cost_tracker_prices.yml` to choose a target file explicitly. Use `PREVIEW=1` to see the diff without writing. Use `STRICT=1` to fail instead of applying a partial refresh when a source fails or the validator rejects a price. Use `bin/rails llm_cost_tracker:prices:check` in CI to print the current diff and exit non-zero when the snapshot has drifted or refresh fails.
-
- Large price changes are flagged during sync. If a specific entry is expected to move by more than 3x, add `_validator_override: ["skip_relative_change"]` to that entry in your local price file.
-
- If sync reports `certificate verify failed`, fix the host Ruby/OpenSSL trust store rather than disabling TLS verification. Common fixes are installing `ca-certificates` in Docker/Linux images, configuring the corporate proxy CA, setting `SSL_CERT_FILE` to the system CA bundle, or rebuilding rbenv/asdf Ruby after an OpenSSL upgrade.
-
- For unattended updates, run the check daily and sync through review:
+ Refresh on demand from the maintained snapshot:

  ```bash
- bin/rails llm_cost_tracker:prices:check
- STRICT=1 bin/rails llm_cost_tracker:prices:sync
- ```
-
- `bin/rails llm_cost_tracker:doctor` warns when the configured price file has no `metadata.updated_at` or when it is older than 30 days.
-
- ## Budget enforcement
-
- ```ruby
- config.storage_backend = :active_record
- config.monthly_budget = 100.00
- config.daily_budget = 10.00
- config.per_call_budget = 1.00
- config.budget_exceeded_behavior = :block_requests
+ bin/rails llm_cost_tracker:prices:refresh
  ```

- - `:notify` — fire `on_budget_exceeded` after an event pushes the monthly, daily, or per-call budget over the limit.
- - `:raise` — record the event, then raise `BudgetExceededError`.
- - `:block_requests` — block preflight when the stored monthly or daily total is already over budget; still raises post-response on the event that crosses the line. Needs `:active_record` storage for preflight.
+ Precedence is `pricing_overrides`, then `prices_file`, then bundled prices. Provider-qualified keys like `openai/gpt-4o-mini` win over model-only keys. Full pricing reference: [`docs/pricing.md`](docs/pricing.md).
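
A sketch of that lookup order in plain Ruby, under the assumption that each table is consulted fully before the next (hypothetical shape, not the gem's internals):

```ruby
# Sketch of the documented lookup order: pricing_overrides, then prices_file,
# then bundled prices; within each table a provider-qualified key beats a
# model-only key. Hypothetical shape -- not the gem's internals.
def lookup_price(model, provider:, overrides: {}, prices_file: {}, bundled: {})
  qualified = "#{provider}/#{model}"
  [overrides, prices_file, bundled].each do |table|
    return table[qualified] if table.key?(qualified)
    return table[model]     if table.key?(model)
  end
  nil # unknown pricing: tokens are still recorded, cost stays nil
end

bundled   = { "gpt-4o-mini" => { input: 0.15, output: 0.60 } }
overrides = { "openai/gpt-4o-mini" => { input: 0.10, output: 0.40 } }

lookup_price("gpt-4o-mini", provider: "openai",
             overrides: overrides, bundled: bundled)
# the override's provider-qualified entry wins: { input: 0.10, output: 0.40 }
```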

- `monthly_budget` and `daily_budget` are cumulative ledger limits. `per_call_budget` is a ceiling for a single priced event and runs after the response cost is known.
+ ## Budgets

- ActiveRecord installs keep `llm_cost_tracker_period_totals` in sync with atomic upserts. Budget preflight reads period rollups when they are available instead of scanning `llm_api_calls`.
+ Budgets are guardrails, not transactional caps:

  ```ruby
- rescue LlmCostTracker::BudgetExceededError => e
-   # e.budget_type, e.total, e.budget, e.monthly_total, e.daily_total, e.call_cost, e.last_event
+ config.monthly_budget = 500.00
+ config.daily_budget = 50.00
+ config.per_call_budget = 2.00
+ config.budget_exceeded_behavior = :block_requests # or :notify, :raise
+ config.on_budget_exceeded = ->(data) { SlackNotifier.notify("#alerts", "...") }
  ```

- `:block_requests` is a **guardrail, not a hard cap**. The preflight and the spend-recording write are separate statements, so under Puma / Sidekiq concurrency multiple workers can all pass the preflight and then collectively overshoot the budget. The setting reliably *stops new requests after the overshoot is visible* — it does not prevent the overshoot itself. For strict quotas use a provider- or gateway-level limit, or a database-backed counter outside this gem.
+ `:block_requests` reads ledger totals before a call goes out and stops it if you're already over. Under concurrency multiple workers can pass preflight at the same time and collectively overshoot; this catches the next call after the overshoot becomes visible, not the overshoot itself. For a strict cap, use a provider-side limit or a transactional counter outside the gem.
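
A toy illustration of why the preflight is a guardrail rather than a cap (the class and its timing are invented for illustration, not the gem's code):

```ruby
# Toy illustration of preflight semantics: the check reads the *recorded*
# total, so calls that are in flight but not yet recorded don't count.
# Not the gem's code.
class BudgetGuard
  def initialize(daily_budget:)
    @daily_budget = daily_budget
    @recorded = 0.0
  end

  def preflight!
    raise "over budget" if @recorded >= @daily_budget
  end

  def record(cost)
    @recorded += cost
  end
end

guard = BudgetGuard.new(daily_budget: 1.00)
3.times { guard.preflight! }   # three concurrent calls all pass: ledger is empty
3.times { guard.record(0.50) } # ...then land, and the ledger reads 1.50
begin
  guard.preflight!             # only the *next* call is blocked
rescue RuntimeError => e
  e.message                    # "over budget"
end
```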

- Preflight is wired into the Faraday middleware and SDK integrations automatically. When you record events via `LlmCostTracker.track` / `track_stream` and also want the same preflight, opt in:
+ Full behavior, error class, and preflight details: [`docs/budgets.md`](docs/budgets.md).

- ```ruby
- LlmCostTracker.track(
-   provider: "openai",
-   model: "gpt-4o",
-   input_tokens: 120,
-   output_tokens: 45,
-   enforce_budget: true
- )
-
- LlmCostTracker.track_stream(provider: "openai", model: "gpt-4o", enforce_budget: true) do |stream|
-   # raises BudgetExceededError before the block runs when over budget
- end
+ ## Querying

- LlmCostTracker.enforce_budget! # standalone preflight
- ```
-
- ## Doctor
-
- Run the setup check after install, deploy, or upgrades:
-
- ```bash
- bin/rails llm_cost_tracker:doctor
- ```
-
- It checks storage mode, ActiveRecord availability, table/column coverage, period rollups, pricing file loading, and whether calls are being recorded. Setup errors exit non-zero; warnings point at optional production hardening.
-
- ## Querying costs
-
- These helpers and rake tasks require ActiveRecord storage.
-
- ```bash
- bin/rails llm_cost_tracker:report
- DAYS=7 bin/rails llm_cost_tracker:report
- ```
+ When you want to slice spend from a console, scheduled job, or your own admin page:

  ```ruby
- LlmCostTracker::LlmApiCall.today.total_cost
  LlmCostTracker::LlmApiCall.this_month.cost_by_model
- LlmCostTracker::LlmApiCall.this_month.cost_by_provider
-
- # Group / sum by any tag
- LlmCostTracker::LlmApiCall.this_month.group_by_tag("feature").sum(:total_cost)
- LlmCostTracker::LlmApiCall.this_month.cost_by_tag("feature") # with "(untagged)" bucket
-
- # Period grouping (SQL-side)
- LlmCostTracker::LlmApiCall.this_month.group_by_period(:day).sum(:total_cost)
- LlmCostTracker::LlmApiCall.group_by_period(:month).sum(:total_cost)
+ LlmCostTracker::LlmApiCall.this_month.cost_by_tag("feature")
  LlmCostTracker::LlmApiCall.daily_costs(days: 7)
-
- # Latency
- LlmCostTracker::LlmApiCall.with_latency.average_latency_ms
- LlmCostTracker::LlmApiCall.this_month.latency_by_model
-
- # Tag filters
- LlmCostTracker::LlmApiCall.by_tag("feature", "chat").this_month.total_cost
  LlmCostTracker::LlmApiCall.by_tags(user_id: 42, feature: "chat").this_month.total_cost
-
- # Range
- LlmCostTracker::LlmApiCall.between(1.week.ago, Time.current).cost_by_model
  ```

- ## Retention
-
- Retention is not enforced automatically. With ActiveRecord storage, use the rake task below if you need to delete older records in batches.
+ A text report is also one rake task away:

  ```bash
- DAYS=90 bin/rails llm_cost_tracker:prune # delete calls older than N days in batches
- ```
-
- ## Tag storage
-
- New installs use `jsonb` + GIN on PostgreSQL:
-
- ```ruby
- t.jsonb :tags, null: false, default: {}
- add_index :llm_api_calls, :tags, using: :gin
- ```
-
- On other adapters tags fall back to JSON in a text column. `by_tag` uses JSONB containment on PG, text matching elsewhere.
-
- ## Upgrading existing installs
-
- Run the generators that match columns missing from older versions:
-
- ```bash
- bin/rails generate llm_cost_tracker:add_period_totals # shared budget rollups
- bin/rails generate llm_cost_tracker:add_streaming # stream + usage_source
- bin/rails generate llm_cost_tracker:add_provider_response_id
- bin/rails generate llm_cost_tracker:add_usage_breakdown
- bin/rails generate llm_cost_tracker:upgrade_tags_to_jsonb # PG: text → jsonb + GIN
- bin/rails generate llm_cost_tracker:upgrade_cost_precision # widen cost columns
- bin/rails generate llm_cost_tracker:add_latency_ms
- bin/rails db:migrate
+ DAYS=7 bin/rails llm_cost_tracker:report
  ```

- On PostgreSQL, the generated `upgrade_tags_to_jsonb` migration rewrites `llm_api_calls`. Run it during a maintenance window on large tables, or replace it with a two-phase backfill for zero-downtime deploys.
+ Full scope and helper reference: [`docs/querying.md`](docs/querying.md).
449
204
 
450
- ## Mounting the dashboard
451
-
452
- Optional Rails Engine. Plain ERB, no JavaScript framework, no asset pipeline required. Requires Rails 7.1+; the core middleware works without Rails. The dashboard reads `llm_api_calls`, so use `storage_backend = :active_record` for apps that mount it.
205
+ ## Dashboard
453
206
 
454
- `bin/rails generate llm_cost_tracker:install --dashboard` adds the require and route for you. Manual setup:
207
+ Mount the engine wherever you want; it's plain ERB, no JavaScript bundle, no asset pipeline gymnastics:
455
208
 
456
209
  ```ruby
457
- # config/application.rb (or an initializer)
458
- require "llm_cost_tracker/engine"
459
-
460
210
  # config/routes.rb
461
211
  mount LlmCostTracker::Engine => "/llm-costs"
462
212
  ```
463
213
 
464
- Routes (GET-only; CSV export included):
465
-
466
- - `/llm-costs` — overview: spend with delta vs previous period, budget projection, spend anomaly banner, daily trend vs previous slice, provider rollup, top models
467
- - `/llm-costs/models` — by provider + model; sortable by spend, volume, avg cost, latency
468
- - `/llm-costs/calls` — filterable + paginated; sort modes for recency, spend, input tokens, output tokens, latency, and unknown pricing; CSV export
469
- - `/llm-costs/calls/:id` — details with token mix and cost mix breakdowns
470
- - `/llm-costs/tags` — tag keys present in the dataset (PG/SQLite native; MySQL 8.0+ via JSON_TABLE)
471
- - `/llm-costs/tags/:key` — breakdown by values of a given tag key
472
- - `/llm-costs/data_quality` — unknown pricing, untagged calls, missing latency, incomplete stream usage, and missing provider response IDs
473
-
474
- No built-in auth is included. Tags carry whatever your app puts in them, so protect the mount point with your application's authentication.
475
-
476
- ### Basic auth
477
-
478
- ```ruby
479
- authenticated = ->(req) {
480
- ActionController::HttpAuthentication::Basic.authenticate(req) do |name, password|
481
- ActiveSupport::SecurityUtils.secure_compare(name, ENV.fetch("LLM_DASHBOARD_USER")) &
482
- ActiveSupport::SecurityUtils.secure_compare(password, ENV.fetch("LLM_DASHBOARD_PASSWORD"))
483
- end
484
- }
485
- constraints(authenticated) { mount LlmCostTracker::Engine => "/llm-costs" }
486
- ```
487
-
488
- ### Devise
489
-
490
- ```ruby
491
- authenticate :user, ->(user) { user.admin? } do
492
- mount LlmCostTracker::Engine => "/llm-costs"
493
- end
494
- ```
495
-
496
- ## ActiveSupport::Notifications
497
-
498
- ```ruby
499
- ActiveSupport::Notifications.subscribe("llm_request.llm_cost_tracker") do |*, payload|
500
- # payload =>
501
- # {
502
- # provider: "openai", model: "gpt-4o",
503
- # input_tokens: 150, cache_read_input_tokens: 0, cache_write_input_tokens: 0,
504
- # hidden_output_tokens: 0, output_tokens: 42, total_tokens: 192, latency_ms: 248,
505
- # cost: {
506
- # input_cost: 0.000375, cache_read_input_cost: 0.0,
507
- # cache_write_input_cost: 0.0, output_cost: 0.00042,
508
- # total_cost: 0.000795, currency: "USD"
509
- # },
510
- # pricing_mode: "batch",
511
- # stream: false, usage_source: "response", provider_response_id: "chatcmpl_123",
512
- # tags: { feature: "chat", user_id: 42 },
513
- # tracked_at: 2026-04-16 14:30:00 UTC
514
- # }
515
- end
516
- ```
214
+ Pages: overview (spend trend, budget status, anomaly banner), models, calls (filterable, paginated, CSV export), tags, data quality. The dashboard reads `llm_api_calls`, so use `:active_record` storage if you want to mount it.
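That storage requirement can be sketched as a one-line initializer. This is a sketch only: `LlmCostTracker.configure` and `storage_backend` appear elsewhere in this README, while the initializer path is an assumption.

```ruby
# config/initializers/llm_cost_tracker.rb
# The dashboard queries the llm_api_calls table directly, so the
# ledger must be written through ActiveRecord, not :log or :custom.
LlmCostTracker.configure do |config|
  config.storage_backend = :active_record
end
```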
517
215
 
518
- ## Custom storage backend
519
-
520
- ```ruby
521
- config.storage_backend = :custom
522
- config.custom_storage = ->(event) {
523
- InfluxDB.write("llm_costs",
524
- values: { cost: event.cost&.total_cost, tokens: event.total_tokens, latency_ms: event.latency_ms },
525
- tags: { provider: event.provider, model: event.model }
526
- )
527
- }
528
- ```
529
-
530
- ## OpenAI-compatible providers
531
-
532
- ```ruby
533
- config.openai_compatible_providers["gateway.example.com"] = "internal_gateway"
534
- ```
535
-
536
- Configured hosts are parsed using the OpenAI-compatible usage shape (`prompt_tokens` / `completion_tokens` / `total_tokens`, `input_tokens` / `output_tokens`, and optional cached-input details). This covers OpenRouter, DeepSeek, and private gateways exposing Chat Completions / Responses / Completions / Embeddings.
537
-
538
- ## Custom parser
539
-
540
- For providers with a non-OpenAI usage shape:
541
-
542
- ```ruby
543
- class AcmeParser < LlmCostTracker::Parsers::Base
544
- HOSTS = %w[api.acme-llm.example].freeze
545
- TRACKED_PATHS = %w[/v1/generate].freeze
546
-
547
- def provider_names
548
- %w[acme]
549
- end
550
-
551
- def match?(url)
552
- match_uri?(url, hosts: HOSTS, exact_paths: TRACKED_PATHS)
553
- end
554
-
555
- def parse(_request_url, _request_body, response_status, response_body)
556
- return nil unless response_status == 200
557
-
558
- payload = safe_json_parse(response_body)
559
- usage = payload.dig("usage")
560
- return nil unless usage
561
-
562
- LlmCostTracker::ParsedUsage.build(
563
- provider: "acme",
564
- model: payload["model"],
565
- input_tokens: usage["input"] || 0,
566
- output_tokens: usage["output"] || 0
567
- )
568
- end
569
- end
570
-
571
- LlmCostTracker::Parsers::Registry.register(AcmeParser)
572
- ```
216
+ Auth is your job. Examples for basic auth and Devise: [`docs/dashboard.md`](docs/dashboard.md).
573
217
 
574
218
  ## Supported providers
575
219
 
576
- | Provider | Auto-detected | Models with pricing |
220
+ | Provider | Auto-detected | Coverage |
577
221
  |---|:---:|---|
578
- | OpenAI | Yes | GPT-5.5/5.4/5.2/5.1/5, GPT-5.5/5.4/5.2/5 pro, GPT-5.4 mini/nano, GPT-5 mini/nano, GPT-4.1, GPT-4o, o1/o3/o4-mini |
579
- | OpenRouter | Yes | OpenAI-compatible usage; provider-prefixed OpenAI model IDs normalized when possible |
580
- | DeepSeek | Yes | OpenAI-compatible usage; add `pricing_overrides` for DeepSeek models |
581
- | OpenAI-compatible hosts | Config | Configure `openai_compatible_providers` |
222
+ | OpenAI | Yes | GPT-5.5/5.4/5.2/5.1/5 + pro/mini/nano variants, GPT-4.1, GPT-4o, o1/o3/o4-mini |
582
223
  | Anthropic | Yes | Claude Opus 4.7/4.6/4.5/4.1/4, Sonnet 4.6/4.5/4, Haiku 4.5 |
583
224
  | Google Gemini | Yes | Gemini 2.5 Pro/Flash/Flash-Lite, 2.0 Flash/Flash-Lite |
584
- | Any other | Config | Custom parser |
225
+ | OpenRouter | Yes | OpenAI-compatible usage; provider-prefixed model IDs are normalized |
226
+ | DeepSeek | Yes | OpenAI-compatible usage; add `pricing_overrides` for DeepSeek-specific rates |
227
+ | Other OpenAI-compatible hosts | Configurable | Register the host via `config.openai_compatible_providers` |
228
+ | Anything else | Configurable | Custom parser — see [`docs/extending.md`](docs/extending.md) |
585
229
 
586
- Endpoints: OpenAI Chat Completions / Responses / Completions / Embeddings; OpenAI-compatible equivalents; Anthropic Messages; Gemini `generateContent` and `streamGenerateContent`. Official SDK integrations currently cover non-streaming OpenAI Responses / Chat Completions and Anthropic Messages. Streaming capture is supported for Faraday endpoints that emit stream events with final usage.
230
+ Endpoints covered end-to-end: OpenAI Chat Completions / Responses / Completions / Embeddings, Anthropic Messages, Gemini `generateContent` and `streamGenerateContent`, plus their OpenAI-compatible equivalents. Streaming is captured for Faraday paths whenever the provider emits final-usage events.
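On the OpenAI streaming path the final-usage event must be requested explicitly. A hedged sketch of the relevant request-body fragment: `stream_options: { include_usage: true }` is OpenAI's parameter, the surrounding values are purely illustrative.

```ruby
# Without include_usage, OpenAI emits no terminal usage chunk and the
# call is recorded with usage_source: "unknown" on the data-quality page.
body = {
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello" }],
  stream: true,
  stream_options: { include_usage: true } # emit the final-usage event
}
```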
587
231
 
588
- ## Safety
232
+ ## Privacy
589
233
 
590
- **By design, `llm_cost_tracker` never persists prompt or response content.** The only data stored per call is the metadata needed for a cost ledger (provider, model, token counts, cost, latency, tags, provider response ID, and timestamp). Tags carry whatever your application passes in; treat them as user-controlled input and avoid putting request bodies, completions, or secrets into them.
234
+ By design, **no prompt or response content is ever stored.** Per call, the ledger holds: provider, model, token counts, cost, latency, tags, response ID, timestamp. That's it. No request bodies, no headers, no completions. Warning logs strip query strings before logging URLs.
591
235
 
592
- - No external HTTP calls at request-tracking time.
593
- - No prompt or response bodies stored.
594
- - Faraday responses not modified.
595
- - Request headers are never stored. Warning logs strip query strings from URLs before logging.
596
- - Storage failures non-fatal by default (`storage_error_behavior = :warn`).
597
- - Budget and unknown-pricing errors are raised only when you opt in.
236
+ Tags carry whatever your app passes — they are application-controlled input; treat them accordingly. Use `user_id`, not the user's email; use a feature key, not the input prompt.
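A sketch of that rule using the callable `default_tags` setting mentioned in this README; `Current.user` is an assumption standing in for however your app exposes the signed-in user.

```ruby
# Good: stable, low-cardinality identifiers that are safe to store.
config.default_tags = -> { { feature: "chat", user_id: Current.user&.id } }

# Bad: sensitive or unbounded values would leak into the ledger.
# config.default_tags = -> { { email: Current.user.email } }
```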
598
237
 
599
- ## Thread safety (Puma, Sidekiq)
238
+ ## Documentation
600
239
 
601
- The gem is designed for multi-threaded hosts — Puma with `max_threads > 1` and Sidekiq with `concurrency > 1` are both supported. A few rules:
240
+ Deeper guides live in `docs/`. Reference pages are being filled out as content
241
+ moves out of this README; the inline sections above remain canonical where a page
242
+ is still brief.
602
243
 
603
- - **Configure once at boot.** `LlmCostTracker.configure` freezes mutable shared configuration when the block returns, and replacing shared fields through `LlmCostTracker.configuration` raises `FrozenError`. If `default_tags` is callable, keep it fast and thread-safe.
604
- - **Use `:active_record` storage for the built-in shared ledger.** Puma workers and Sidekiq processes do not share memory; `:log` is process-local, and `:custom` is only as shared as the sink you write to. `:active_record` writes to a single table and is the right choice for the bundled dashboard and budget checks across processes.
605
- - **Size your connection pool.** Each tracked call on the middleware path uses the host app's ActiveRecord connection for ledger writes, period rollups, and optional budget checks. Make sure the AR pool covers `puma max_threads + sidekiq concurrency` plus your app's own usage.
606
- - **Don't share a `StreamCollector` across threads you don't own.** The collector itself is thread-safe — `event`, `usage`, and `finish!` synchronize internally and `finish!` is idempotent — but the documented pattern is one collector per stream.
607
- - **`finish!` is a barrier.** Once a stream is finished, later `event`, `usage`, or `model=` calls raise `FrozenError` instead of mutating a closed collector.
608
- - **`ActiveSupport::Notifications` subscribers run synchronously** in the caller's thread. Keep them fast or hand off to a background job; otherwise they add latency to every tracked call.
609
- - **`storage_error_behavior = :raise` inside Sidekiq** will retry the job, which can duplicate an expensive LLM call. Prefer `:warn` plus a Notifications subscriber, or `:ignore`, for worker contexts.
244
+ - [Configuration reference](docs/configuration.md)
245
+ - [Pricing & price refresh](docs/pricing.md)
246
+ - [Budgets & guardrails](docs/budgets.md)
247
+ - [Querying & reports](docs/querying.md)
248
+ - [Dashboard mounting](docs/dashboard.md)
249
+ - [Streaming capture](docs/streaming.md)
250
+ - [Extending](docs/extending.md)
251
+ - [Production operations](docs/operations.md)
252
+ - [Upgrading](docs/upgrading.md)
253
+ - [Cookbook — per-client recipes](docs/cookbook.md)
254
+ - [Architecture & design rules](docs/architecture.md)
610
255
 
611
256
  ## Known limitations
612
257
 
613
- - `:block_requests` is a best-effort guardrail, not a hard cap. Concurrent workers can pass preflight simultaneously and collectively overshoot the budget. Use an external quota system if you need a transactional cap.
614
- - Official SDK integrations currently cover non-streaming calls. Use Faraday middleware or `track_stream` for SDK streaming until stable stream wrappers are added.
615
- - Streaming capture relies on the provider emitting a final-usage event (OpenAI needs `stream_options: { include_usage: true }`); missing events are recorded with `usage_source: "unknown"` so they surface on the Data Quality page.
616
- - `provider_response_id` is stored only when the provider exposes a stable response object ID. Missing IDs stay `nil` and surface on the Data Quality page.
617
- - Cache write TTL variants (1h vs 5min writes) are not modeled separately.
258
+ - `:block_requests` is best-effort under concurrency, not a transactional cap.
259
+ - Official SDK integrations cover non-streaming calls; streaming via the SDKs falls back to Faraday middleware or `track_stream`.
260
+ - Streaming usage capture relies on the provider emitting a final-usage event. Missing events are stored with `usage_source: "unknown"` so they appear on the data-quality page rather than vanishing.
261
+ - `provider_response_id` is stored only when the provider exposes a stable ID. Gemini is best-effort and varies by endpoint.
262
+ - Cache write TTL variants on Anthropic (1h vs 5min writes) are not modeled separately yet.
618
263
 
619
264
  ## Development
620
265
 
621
- Architecture rules for future changes live in [`docs/architecture.md`](docs/architecture.md).
622
-
623
266
  ```bash
624
267
  bundle install
625
- bin/check
268
+ bin/check # rubocop + rspec
626
269
  ```
627
270
 
271
+ Architecture rules and conventions for contributions live in [`AGENTS.md`](AGENTS.md) and [`docs/architecture.md`](docs/architecture.md).
272
+
628
273
  ## License
629
274
 
630
- MIT. See [LICENSE.txt](LICENSE.txt).
275
+ MIT; see [LICENSE.txt](LICENSE.txt).