llm_cost_tracker 0.4.1 → 0.5.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (45)
  1. checksums.yaml +4 -4
  2. data/CHANGELOG.md +30 -0
  3. data/README.md +132 -405
  4. data/lib/llm_cost_tracker/configuration/instrumentation.rb +37 -0
  5. data/lib/llm_cost_tracker/configuration.rb +10 -5
  6. data/lib/llm_cost_tracker/doctor.rb +166 -0
  7. data/lib/llm_cost_tracker/generators/llm_cost_tracker/install_generator.rb +33 -0
  8. data/lib/llm_cost_tracker/generators/llm_cost_tracker/prices_generator.rb +12 -6
  9. data/lib/llm_cost_tracker/generators/llm_cost_tracker/templates/initializer.rb.erb +53 -21
  10. data/lib/llm_cost_tracker/integrations/anthropic.rb +75 -0
  11. data/lib/llm_cost_tracker/integrations/base.rb +72 -0
  12. data/lib/llm_cost_tracker/integrations/object_reader.rb +56 -0
  13. data/lib/llm_cost_tracker/integrations/openai.rb +95 -0
  14. data/lib/llm_cost_tracker/integrations/registry.rb +41 -0
  15. data/lib/llm_cost_tracker/middleware/faraday.rb +6 -5
  16. data/lib/llm_cost_tracker/parsed_usage.rb +8 -1
  17. data/lib/llm_cost_tracker/parsers/base.rb +1 -1
  18. data/lib/llm_cost_tracker/parsers/openai_usage.rb +1 -1
  19. data/lib/llm_cost_tracker/price_freshness.rb +38 -0
  20. data/lib/llm_cost_tracker/price_registry.rb +14 -0
  21. data/lib/llm_cost_tracker/price_sync/fetcher.rb +5 -2
  22. data/lib/llm_cost_tracker/price_sync/registry_diff.rb +51 -0
  23. data/lib/llm_cost_tracker/price_sync/registry_writer.rb +5 -1
  24. data/lib/llm_cost_tracker/price_sync.rb +111 -109
  25. data/lib/llm_cost_tracker/prices.json +391 -42
  26. data/lib/llm_cost_tracker/pricing.rb +35 -16
  27. data/lib/llm_cost_tracker/request_url.rb +20 -0
  28. data/lib/llm_cost_tracker/storage/dispatcher.rb +68 -0
  29. data/lib/llm_cost_tracker/stream_collector.rb +3 -3
  30. data/lib/llm_cost_tracker/tag_context.rb +52 -0
  31. data/lib/llm_cost_tracker/tracker.rb +7 -60
  32. data/lib/llm_cost_tracker/version.rb +1 -1
  33. data/lib/llm_cost_tracker.rb +14 -4
  34. data/lib/tasks/llm_cost_tracker.rake +33 -69
  35. metadata +28 -12
  36. data/lib/llm_cost_tracker/generators/llm_cost_tracker/templates/llm_cost_tracker_prices.yml.erb +0 -51
  37. data/lib/llm_cost_tracker/price_sync/merger.rb +0 -72
  38. data/lib/llm_cost_tracker/price_sync/model_catalog.rb +0 -77
  39. data/lib/llm_cost_tracker/price_sync/raw_price.rb +0 -33
  40. data/lib/llm_cost_tracker/price_sync/refresh_plan_builder.rb +0 -162
  41. data/lib/llm_cost_tracker/price_sync/source.rb +0 -29
  42. data/lib/llm_cost_tracker/price_sync/source_result.rb +0 -7
  43. data/lib/llm_cost_tracker/price_sync/sources/litellm.rb +0 -90
  44. data/lib/llm_cost_tracker/price_sync/sources/open_router.rb +0 -93
  45. data/lib/llm_cost_tracker/price_sync/validator.rb +0 -66
data/README.md CHANGED
@@ -1,107 +1,103 @@
  # LLM Cost Tracker
 
- **Self-hosted LLM cost tracking for Ruby and Rails.** Intercepts Faraday LLM responses or records usage explicitly, prices events locally, and stores them in your database. No proxy, no SaaS.
+ A Rails-native ledger for what your LLM calls actually cost.
 
  [![Gem Version](https://img.shields.io/gem/v/llm_cost_tracker.svg)](https://rubygems.org/gems/llm_cost_tracker)
  [![CI](https://github.com/sergey-homenko/llm_cost_tracker/actions/workflows/ruby.yml/badge.svg)](https://github.com/sergey-homenko/llm_cost_tracker/actions)
  [![codecov](https://codecov.io/gh/sergey-homenko/llm_cost_tracker/branch/main/graph/badge.svg)](https://codecov.io/gh/sergey-homenko/llm_cost_tracker)
 
- Requires Ruby 3.3+, Rails/ActiveRecord 7.1+, and Faraday 2.0+.
- Core tracking works without Rails; the mounted dashboard requires Rails 7.1+.
+ If you have OpenAI, Anthropic, or Gemini in production and someone keeps asking "where did that bill come from?", this gem records every call into your own database, prices it locally, and gives you a dashboard you can mount in five minutes. No proxy, no SaaS account, no extra service to deploy.
 
- ## Why
+ It is not Langfuse, Helicone, or LiteLLM. It does not capture prompts, score completions, or replay traces. It does one thing: tell you which provider, which model, which feature, and which user burned how much money. That's the entire pitch.
 
- Every Rails app with LLM integrations eventually runs into the same question: where did that invoice come from? Full observability platforms like Langfuse and Helicone solve a broader set of problems; sometimes you just need a small Rails-native ledger in your own database.
+ Requires Ruby 3.3+, ActiveSupport 7.1+, and Faraday 2.0+. ActiveRecord storage and the dashboard need Rails 7.1+.
 
- ## What You Get
-
- - A local ActiveRecord ledger of provider, model, usage breakdown, cost, latency, tags, streaming usage, and provider response IDs
- - Faraday middleware plus explicit `track` / `track_stream` helpers for non-Faraday clients
- - Server-rendered Rails dashboard with overview, calls, tags, CSV export, and data-quality pages
- - Local pricing snapshots, price sync tasks, and budget guardrails
- - Prompt and response bodies are never persisted
-
- ## Dashboard
-
- LLM Cost Tracker ships with an optional server-rendered Rails Engine dashboard for spend review, attribution, and data quality checks.
-
- ![LLM Cost Tracker dashboard](docs/dashboard-overview.png)
-
- The overview page includes spend trend, budget status, provider breakdown, top models, and filterable slices. The engine also includes Calls, Tags, and Data Quality pages. Plain ERB, no JavaScript bundle.
+ ![Dashboard overview](docs/dashboard-overview.png)
 
  ## Quickstart
 
+ Add to your Gemfile alongside whatever LLM client you already use:
+
  ```ruby
  gem "llm_cost_tracker"
+ gem "openai" # or "anthropic", or your existing client
  ```
 
+ Install, migrate, verify:
+
  ```bash
- bin/rails generate llm_cost_tracker:install
+ bin/rails generate llm_cost_tracker:install --dashboard --prices
  bin/rails db:migrate
+ bin/rails llm_cost_tracker:doctor
  ```
 
+ Drop this into `config/initializers/llm_cost_tracker.rb`:
+
  ```ruby
  LlmCostTracker.configure do |config|
    config.storage_backend = :active_record
-   config.default_tags = { app: "my_app", environment: Rails.env }
- end
-
- OpenAI.configure do |config|
-   config.access_token = ENV["OPENAI_API_KEY"]
-   config.faraday do |f|
-     f.use :llm_cost_tracker, tags: -> { { user_id: Current.user&.id, feature: "chat" } }
-   end
+   config.default_tags = -> { { environment: Rails.env } }
+   config.instrument :openai
  end
  ```
 
+ Now every OpenAI call is recorded. Wrap calls in `with_tags` to attribute spend to a user, feature, or anything else you care about:
+
  ```ruby
- mount LlmCostTracker::Engine => "/llm-costs"
+ LlmCostTracker.with_tags(user_id: Current.user&.id, feature: "chat") do
+   client = OpenAI::Client.new(api_key: ENV["OPENAI_API_KEY"])
+   client.responses.create(model: "gpt-4o", input: "Hello")
+ end
  ```
 
- After that, LLM Cost Tracker starts recording calls into `llm_api_calls` and the dashboard becomes available at `/llm-costs`.
- Protect the mounted engine with your application's authentication before exposing it outside development.
+ Visit `/llm-costs` for the dashboard. **Mount it behind your app's auth before deploying** — the gem doesn't ship with one, on purpose.
- ## Tradeoffs
+ ## What you get
 
- - Self-hosted ledger first: no proxy, no SaaS, no separate service to operate
- - Best-effort pricing for spend review and attribution, not invoice-grade billing
- - No prompt or response body storage
- - No built-in auth on the mounted dashboard
- - Use `:active_record` when you want shared dashboards and budget checks across Puma workers and Sidekiq processes
+ - Local ActiveRecord ledger of every call: provider, model, token breakdown, cost, latency, tags, response IDs
+ - Auto-capture for the official `openai` and `anthropic` Ruby SDKs, plus Faraday middleware for `ruby-openai`, the Gemini REST API, and any client you can inject middleware into
+ - Server-rendered dashboard (plain ERB, zero JavaScript) with overview, models, calls, tags, CSV export, and a data-quality page
+ - Local pricing snapshots refreshed daily from the official provider pricing pages, applied with `bin/rails llm_cost_tracker:prices:refresh`
+ - Monthly / daily / per-call budget guardrails with notify, raise, or block-requests behaviour
+ - Tag-based attribution that survives concurrency — Puma threads and Sidekiq fibers don't bleed into each other
 
- ## Installation
+ ## What it deliberately doesn't do
 
- ```ruby
- gem "llm_cost_tracker"
- ```
+ - **Doesn't run as a proxy.** Calls go directly from your app to the provider.
+ - **Doesn't store prompts or completions.** Token counts, model, cost, tags, and response IDs; nothing else.
+ - **Doesn't promise invoice-grade accuracy.** It uses official provider pricing pages, but enterprise rates, batch discounts on unsupported endpoints, and modality tiers are not always modeled. `provider_response_id` is stored as a join key for whoever does that reconciliation.
+ - **Doesn't ship with auth on the dashboard.** It's a Rails Engine; mount it behind whatever your app already uses (Devise, basic auth, Cloudflare Access, your own session middleware).
+ - **Doesn't centralize multi-service visibility.** One Rails monolith — perfect fit. Six services in four languages — wrong tool; look at a proxy or API-layer gateway.
 
- For ActiveRecord storage:
+ ## Capturing calls
 
- ```bash
- bin/rails generate llm_cost_tracker:install
- bin/rails db:migrate
- ```
+ Three paths, in order of preference. Use the first one that fits your stack.
 
- ## Usage
+ ### 1. Official SDK integrations
 
- ### Patch an existing client's Faraday connection
+ Drop-in for the official `openai` and `anthropic` gems. `config.instrument` patches the SDK's resource methods so you don't change a single call site:
 
  ```ruby
- # config/initializers/openai.rb
- OpenAI.configure do |config|
-   config.access_token = ENV["OPENAI_API_KEY"]
-
-   config.faraday do |f|
-     f.use :llm_cost_tracker, tags: -> {
-       { user_id: Current.user&.id, workflow: Current.workflow, env: Rails.env }
-     }
-   end
+ LlmCostTracker.configure do |config|
+   config.instrument :openai # or :anthropic, or :all
+ end
+
+ LlmCostTracker.with_tags(feature: "support_chat") do
+   Anthropic::Client.new.messages.create(
+     model: "claude-sonnet-4-6",
+     max_tokens: 1024,
+     messages: [{ role: "user", content: "Hello" }]
+   )
  end
  ```
 
- `tags:` can be a callable and is evaluated on each request.
+ Captures usage, model, latency, response ID, cache tokens, and reasoning tokens whenever the SDK exposes them. Provider SDKs are not added as gem dependencies — you install whichever you actually use.
 
- ### Raw Faraday
+ This patches **only** the official Ruby SDKs. `ruby-openai` (alexrudall) and any custom client go through the Faraday middleware below.
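The "patch the resource methods" idea can be pictured with `Module#prepend`. This is a minimal sketch of the technique, not the gem's real internals; `FakeMessages` and `CostTracking` are hypothetical names:

```ruby
# Sketch of prepend-style instrumentation (illustrative names only).
class FakeMessages
  def create(model:)
    { "model" => model, "usage" => { "input_tokens" => 12, "output_tokens" => 5 } }
  end
end

module CostTracking
  # Wraps the resource method: call sites stay unchanged, and usage is
  # read off the response after `super` returns.
  def create(**kwargs)
    response = super
    (Thread.current[:llm_events] ||= []) << {
      model: response["model"],
      usage: response["usage"]
    }
    response
  end
end

FakeMessages.prepend(CostTracking)
FakeMessages.new.create(model: "claude-sonnet-4-6")
Thread.current[:llm_events].first[:model] # => "claude-sonnet-4-6"
```

Because `prepend` puts the wrapper ahead of the class in the ancestor chain, the SDK's own method still runs via `super` and the caller never sees a difference.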
+
+ ### 2. Faraday middleware
+
+ For `ruby-openai`, the Gemini REST API, custom Faraday clients, or anything OpenAI-compatible (OpenRouter, DeepSeek, LiteLLM proxies):
 
  ```ruby
  conn = Faraday.new(url: "https://api.openai.com") do |f|
@@ -110,60 +106,17 @@ conn = Faraday.new(url: "https://api.openai.com") do |f|
    f.response :json
    f.adapter Faraday.default_adapter
  end
-
- conn.post("/v1/responses", { model: "gpt-5-mini", input: "Hello!" })
- ```
-
- Place `llm_cost_tracker` inside the Faraday stack where it can see the final response body.
-
- ### Streaming
-
- Streaming is captured automatically for OpenAI, Anthropic, and Gemini when the request goes through the Faraday middleware. The middleware tees the `on_data` callback, keeps the stream flowing to your code, and records the final usage block once the response completes.
-
- ```ruby
- # OpenAI: include usage in the final chunk
- client.chat(parameters: {
-   model: "gpt-4o",
-   messages: [...],
-   stream: proc { |chunk| ... },
-   stream_options: { include_usage: true }
- })
- ```
-
- Anthropic emits usage in `message_start` + `message_delta` events. Gemini's `:streamGenerateContent` endpoint includes `usageMetadata`; usage from the final chunk is used.
-
- Streamed calls are stored with `stream: true` and `usage_source: "stream_final"`. If the provider never sends final usage, the call is still recorded with `usage_source: "unknown"` so those calls surface on the Data Quality page.
-
- When the provider emits a stable response object ID, LLM Cost Tracker stores it as `provider_response_id`. OpenAI and Anthropic are covered end-to-end; Gemini is best effort and may vary by endpoint or API version.
-
- For non-Faraday clients (raw `Net::HTTP`, custom SSE code, Azure OpenAI), use the explicit helper:
-
- ```ruby
- LlmCostTracker.track_stream(provider: "openai", model: "gpt-4o") do |stream|
-   my_client.stream(...) { |chunk| stream.event(chunk) }
- end
-
- # Or skip the chunk parsing entirely if you already know the totals:
- LlmCostTracker.track_stream(provider: "openai", model: "gpt-4o") do |stream|
-   # ... your streaming loop ...
-   stream.usage(input_tokens: 120, output_tokens: 45)
- end
  ```
 
- If your custom streaming client exposes the provider's response object ID after the stream starts, set it explicitly:
+ Tags can be a hash or a callable evaluated per request. Place the middleware where it sees the final response body — in practice, before the JSON parser.
 
- ```ruby
- LlmCostTracker.track_stream(provider: "anthropic", model: "claude-sonnet-4-6") do |stream|
-   stream.provider_response_id = response.id
-   stream.usage(input_tokens: 120, output_tokens: 45)
- end
- ```
+ Streaming works through the same path: the middleware tees the `on_data` callback so your code keeps receiving chunks normally, and the final usage gets recorded once the stream finishes. OpenAI streams need `stream_options: { include_usage: true }` for the final usage event.
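The tee can be sketched in plain Ruby: wrap the app's `on_data` callback so a collector sees every chunk while the app still receives them all. Names below are illustrative, not the middleware's actual code:

```ruby
# Sketch of "teeing" a streaming on_data callback (illustrative names).
received  = [] # what the app's own callback sees
collected = [] # what the tracker keeps, to parse usage from the final chunk

app_on_data = ->(chunk, _bytes_read) { received << chunk }

teed_on_data = lambda do |chunk, bytes_read|
  collected << chunk                  # tracker's copy
  app_on_data.call(chunk, bytes_read) # stream keeps flowing to the app
end

%w[delta-1 delta-2 final-usage].each_with_index { |c, i| teed_on_data.call(c, i) }
received == collected # => true; the last chunk is where usage shows up
```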
 
- Run `bin/rails g llm_cost_tracker:add_streaming` once on existing installs to add the `stream` and `usage_source` columns. Run `bin/rails g llm_cost_tracker:add_provider_response_id` to persist provider-issued response IDs. Run `bin/rails g llm_cost_tracker:add_usage_breakdown` to add cache-read, cache-write, hidden-output, and pricing-mode columns.
+ Per-client setup snippets for `ruby-openai`, Azure OpenAI, LiteLLM proxy, and Gemini live in [`docs/cookbook.md`](docs/cookbook.md).
 
- More client-specific snippets live in [`docs/cookbook.md`](docs/cookbook.md).
+ ### 3. Manual `track` / `track_stream`
 
- ### Manual tracking
+ When you have a client that doesn't expose Faraday and isn't an official SDK — internal gateways, homegrown wrappers, batch jobs replaying historical usage:
 
  ```ruby
  LlmCostTracker.track(
@@ -171,378 +124,152 @@ LlmCostTracker.track(
    model: "claude-sonnet-4-6",
    input_tokens: 1500,
    output_tokens: 320,
-   provider_response_id: "msg_01XFDUDYJgAACzvnptvVoYEL",
-   cache_read_input_tokens: 1200,
    feature: "summarizer",
    user_id: current_user.id
  )
  ```
 
- `input_tokens` is regular non-cache input. Put cache hits in
- `cache_read_input_tokens` and cache writes in `cache_write_input_tokens`; total
- tokens are calculated from the canonical billing breakdown.
+ For streaming the same way, `track_stream` accepts a block, parses provider events automatically, and records once the stream finishes. Full reference in [`docs/streaming.md`](docs/streaming.md).
 
- ## Configuration
+ ## Tags: who burned this money
+
+ Tags answer the only question that matters in attribution: which feature, which user, which job, which tenant. They're free-form strings, indexed (JSONB on Postgres, fallback elsewhere), and queryable from both Ruby and the dashboard.
 
  ```ruby
- # config/initializers/llm_cost_tracker.rb
- LlmCostTracker.configure do |config|
-   config.storage_backend = :active_record # :log (default), :active_record, :custom
-   config.default_tags = { app: "my_app", environment: Rails.env }
-
-   config.monthly_budget = 500.00
-   config.daily_budget = 50.00
-   config.per_call_budget = 2.00
-   config.budget_exceeded_behavior = :notify # :notify, :raise, :block_requests
-   config.storage_error_behavior = :warn # :ignore, :warn, :raise
-   config.unknown_pricing_behavior = :warn # :ignore, :warn, :raise
-
-   config.on_budget_exceeded = ->(data) {
-     SlackNotifier.notify("#alerts", "🚨 LLM #{data[:budget_type]} budget $#{data[:total].round(2)} / $#{data[:budget]}")
-   }
-
-   config.prices_file = Rails.root.join("config/llm_cost_tracker_prices.yml")
-   config.pricing_overrides = {
-     "ft:gpt-4o-mini:my-org" => { input: 0.30, cache_read_input: 0.15, output: 1.20 }
-   }
-
-   # Built-in: openrouter.ai, api.deepseek.com
-   config.openai_compatible_providers["llm.my-company.com"] = "internal_gateway"
+ LlmCostTracker.with_tags(user_id: current_user.id, feature: "support_chat", trace_id: request.uuid) do
+   client.chat(parameters: { model: "gpt-4o", messages: [...] })
  end
  ```
 
- Pricing is best effort. OpenRouter-style IDs like `openai/gpt-4o-mini` are normalized to built-in names when possible. Use `prices_file` / `pricing_overrides` for fine-tunes, gateway-specific IDs, enterprise discounts, alternate pricing modes, or models the gem does not know.
- Provider-specific entries like `openai/gpt-4o-mini` win over model-only entries like `gpt-4o-mini`.
- Pass `pricing_mode: :batch` to use optional mode-specific keys such as `batch_input` / `batch_output`; missing mode-specific keys fall back to standard `input` / `output` rates. The same pattern works for custom modes, for example `contract_input`.
+ `with_tags` is thread- and fiber-isolated, so concurrent requests in Puma or jobs in Sidekiq don't bleed into each other. A `default_tags` callable on configuration runs on every event for things you always want — `environment`, `region`, deployment SHA. Explicit tags passed to `track` win over scoped tags, and scoped tags win over defaults.
 
- `storage_error_behavior = :warn` (default) lets LLM responses continue if storage fails; `:raise` exposes `StorageError#original_error`.
+ What you put in tags is **your** input, and they're queryable strings. Don't put prompts, completions, emails, or secrets there. Use IDs.
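The precedence rule reduces to three hash merges, with later merges winning. A sketch of the documented ordering, not the gem's implementation:

```ruby
# Tag precedence sketch: defaults < with_tags scope < explicit per-call tags.
default_tags = { environment: "production", region: "us-east-1" }
scoped_tags  = { feature: "support_chat", environment: "staging" }
explicit     = { feature: "summarizer" }

# Hash#merge lets the right-hand side win, so merge lowest precedence first.
tags = default_tags.merge(scoped_tags).merge(explicit)
# => { environment: "staging", region: "us-east-1", feature: "summarizer" }
```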
 
- Unknown pricing still records token counts, but `cost` is `nil` and budget guardrails skip that event. Find unpriced models:
+ ## Pricing
 
- ```ruby
- LlmCostTracker::LlmApiCall.unknown_pricing.group(:model).count
- ```
+ Built-in prices live in `lib/llm_cost_tracker/prices.json` and are refreshed daily from official provider pricing pages by an automated CI workflow that opens a PR on every change. Most apps run on bundled prices and never think about this.
 
- ### Keeping prices current
-
- Built-in prices live in `lib/llm_cost_tracker/prices.json`. The gem never fetches pricing on boot. For production, keep a local snapshot under `config/` and point the gem at it:
+ When you want to control updates yourself — for negotiated rates, gateway-specific model IDs, or pinned reviews — generate a local snapshot:
 
  ```bash
  bin/rails generate llm_cost_tracker:prices
  ```
 
- ```json
- {
-   "metadata": { "updated_at": "2026-04-18", "currency": "USD", "unit": "1M tokens" },
-   "models": {
-     "my-gateway/gpt-4o-mini": { "input": 0.20, "cache_read_input": 0.10, "output": 0.80, "batch_input": 0.10, "batch_output": 0.40 }
-   }
- }
+ ```ruby
+ config.prices_file = Rails.root.join("config/llm_cost_tracker_prices.yml")
  ```
 
- `pricing_overrides` has the highest precedence. Use it for a handful of Ruby-side overrides; use `prices_file` when you want a local pricing table under source control.
-
- To refresh prices on demand:
+ Refresh on demand from the maintained snapshot:
 
  ```bash
- bin/rails llm_cost_tracker:prices:sync
- ```
-
- `llm_cost_tracker:prices:sync` refreshes the current registry from two structured sources: LiteLLM first, OpenRouter second. LiteLLM is the primary source; OpenRouter fills gaps and helps surface discrepancies.
-
- `llm_cost_tracker:prices:sync` / `llm_cost_tracker:prices:check` perform HTTP GET requests to:
-
- - LiteLLM pricing JSON: `https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json`
- - OpenRouter Models API: `https://openrouter.ai/api/v1/models`
-
- If `config.prices_file` is configured, the task syncs that file automatically; otherwise it works from the built-in snapshot. `_source: "manual"` entries are never touched. Models that are still in your file but missing from both upstream sources are left alone and reported as orphaned. For intentional custom entries, mark them as manual so they stop showing up in orphaned warnings.
-
- Use `PREVIEW=1` to see the diff without writing. Use `STRICT=1` to fail instead of applying a partial refresh when a source fails or the validator rejects a price. Use `bin/rails llm_cost_tracker:prices:check` in CI to print the current diff and exit non-zero when the snapshot has drifted or refresh fails.
-
- Large price changes are flagged during sync. If a specific entry is expected to move by more than 3x, add `_validator_override: ["skip_relative_change"]` to that entry in your local price file.
-
- ## Budget enforcement
-
- ```ruby
- config.storage_backend = :active_record
- config.monthly_budget = 100.00
- config.daily_budget = 10.00
- config.per_call_budget = 1.00
- config.budget_exceeded_behavior = :block_requests
+ bin/rails llm_cost_tracker:prices:refresh
  ```
 
- - `:notify` — fire `on_budget_exceeded` after an event pushes the monthly, daily, or per-call budget over the limit.
- - `:raise` — record the event, then raise `BudgetExceededError`.
- - `:block_requests` — block preflight when the stored monthly or daily total is already over budget; still raises post-response on the event that crosses the line. Needs `:active_record` storage for preflight.
+ Precedence is `pricing_overrides` > `prices_file` > bundled prices. Provider-qualified keys like `openai/gpt-4o-mini` win over model-only keys. Full pricing reference: [`docs/pricing.md`](docs/pricing.md).
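The lookup order and the per-1M-token arithmetic fit in a short sketch. All rates and table contents below are made up for illustration; real rates come from `prices.json` or your `prices_file`:

```ruby
# Sketch of price lookup precedence and cost math (illustrative rates,
# USD per 1M tokens, matching the prices.json unit).
OVERRIDES = {} # pricing_overrides: highest precedence
FROM_FILE = { "openai/gpt-4o-mini" => { input: 0.50, output: 2.00 } }
BUNDLED   = { "gpt-4o-mini" => { input: 0.60, output: 2.40 } }

def rates_for(provider, model)
  qualified = "#{provider}/#{model}"
  [OVERRIDES, FROM_FILE, BUNDLED].each do |table|
    # within each table, a provider-qualified key beats a model-only key
    hit = table[qualified] || table[model]
    return hit if hit
  end
  nil
end

rates = rates_for("openai", "gpt-4o-mini") # file entry wins over bundled
cost  = (1_500 * rates[:input] + 320 * rates[:output]) / 1_000_000.0
cost # => 0.00139
```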
 
- `monthly_budget` and `daily_budget` are cumulative ledger limits. `per_call_budget` is a ceiling for a single priced event and runs after the response cost is known.
+ ## Budgets
 
- ActiveRecord installs keep `llm_cost_tracker_period_totals` in sync with atomic upserts. Budget preflight reads period rollups instead of scanning `llm_api_calls`.
+ Budgets are guardrails, not transactional caps:
 
  ```ruby
- rescue LlmCostTracker::BudgetExceededError => e
-   # e.budget_type, e.total, e.budget, e.monthly_total, e.daily_total, e.call_cost, e.last_event
+ config.monthly_budget = 500.00
+ config.daily_budget = 50.00
+ config.per_call_budget = 2.00
+ config.budget_exceeded_behavior = :block_requests # or :notify, :raise
+ config.on_budget_exceeded = ->(data) { SlackNotifier.notify("#alerts", "...") }
  ```
 
- `:block_requests` is a **guardrail, not a hard cap**. The preflight and the spend-recording write are separate statements, so under Puma / Sidekiq concurrency multiple workers can all pass the preflight and then collectively overshoot the budget. The setting reliably *stops new requests after the overshoot is visible* — it does not prevent the overshoot itself. For strict quotas use a provider- or gateway-level limit, or a database-backed counter outside this gem.
-
- Preflight is wired into the Faraday middleware automatically. When you record events via `LlmCostTracker.track` / `track_stream` and also want the same preflight, opt in:
+ `:block_requests` reads ledger totals before a call goes out and stops it if you're already over. Under concurrency, multiple workers can pass preflight at the same time and collectively overshoot; the setting catches the next call after the overshoot becomes visible, not the overshoot itself. For a strict cap, use a provider-side limit or a transactional counter outside the gem.
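The overshoot window is easy to see in a sequential simulation: both workers read the ledger before either one records its spend. A sketch of the race, not the gem's storage code:

```ruby
# Sketch: why preflight-then-record can overshoot a budget.
budget = 10.0
ledger = 9.0 # spend already recorded

preflight_a = ledger < budget # worker A reads: allowed
preflight_b = ledger < budget # worker B reads before A records: also allowed

ledger += 2.0 if preflight_a # A records its $2 call
ledger += 2.0 if preflight_b # B records its $2 call

ledger          # => 13.0, collectively over budget
ledger < budget # => false; the *next* preflight blocks
```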
 
- ```ruby
- LlmCostTracker.track(
-   provider: "openai",
-   model: "gpt-4o",
-   input_tokens: 120,
-   output_tokens: 45,
-   enforce_budget: true
- )
-
- LlmCostTracker.track_stream(provider: "openai", model: "gpt-4o", enforce_budget: true) do |stream|
-   # raises BudgetExceededError before the block runs when over budget
- end
-
- LlmCostTracker.enforce_budget! # standalone preflight
- ```
+ Full behavior, error class, and preflight details: [`docs/budgets.md`](docs/budgets.md).
 
- ## Querying costs
+ ## Querying
 
- ```bash
- bin/rails llm_cost_tracker:report
- DAYS=7 bin/rails llm_cost_tracker:report
- ```
+ When you want to slice spend from a console, scheduled job, or your own admin page:
 
  ```ruby
- LlmCostTracker::LlmApiCall.today.total_cost
  LlmCostTracker::LlmApiCall.this_month.cost_by_model
- LlmCostTracker::LlmApiCall.this_month.cost_by_provider
-
- # Group / sum by any tag
- LlmCostTracker::LlmApiCall.this_month.group_by_tag("feature").sum(:total_cost)
- LlmCostTracker::LlmApiCall.this_month.cost_by_tag("feature") # with "(untagged)" bucket
-
- # Period grouping (SQL-side)
- LlmCostTracker::LlmApiCall.this_month.group_by_period(:day).sum(:total_cost)
- LlmCostTracker::LlmApiCall.group_by_period(:month).sum(:total_cost)
+ LlmCostTracker::LlmApiCall.this_month.cost_by_tag("feature")
  LlmCostTracker::LlmApiCall.daily_costs(days: 7)
-
- # Latency
- LlmCostTracker::LlmApiCall.with_latency.average_latency_ms
- LlmCostTracker::LlmApiCall.this_month.latency_by_model
-
- # Tag filters
- LlmCostTracker::LlmApiCall.by_tag("feature", "chat").this_month.total_cost
  LlmCostTracker::LlmApiCall.by_tags(user_id: 42, feature: "chat").this_month.total_cost
-
- # Range
- LlmCostTracker::LlmApiCall.between(1.week.ago, Time.current).cost_by_model
- ```
-
- ## Retention
-
- Retention is not enforced automatically. Use the rake task below if you need to delete older records in batches.
-
- ```bash
- DAYS=90 bin/rails llm_cost_tracker:prune # delete calls older than N days in batches
- ```
-
- ## Tag storage
-
- New installs use `jsonb` + GIN on PostgreSQL:
-
- ```ruby
- t.jsonb :tags, null: false, default: {}
- add_index :llm_api_calls, :tags, using: :gin
  ```
 
- On other adapters tags fall back to JSON in a text column. `by_tag` uses JSONB containment on PG, text matching elsewhere.
-
- Upgrade an existing install:
+ A text report is also one rake task away:
 
  ```bash
- bin/rails generate llm_cost_tracker:add_period_totals # shared budget rollups
- bin/rails generate llm_cost_tracker:upgrade_tags_to_jsonb # PG: text → jsonb + GIN
- bin/rails generate llm_cost_tracker:upgrade_cost_precision # widen cost columns
- bin/rails generate llm_cost_tracker:add_latency_ms
- bin/rails db:migrate
+ DAYS=7 bin/rails llm_cost_tracker:report
  ```
 
- On PostgreSQL, the generated `upgrade_tags_to_jsonb` migration rewrites `llm_api_calls`. Run it during a maintenance window on large tables, or replace it with a two-phase backfill for zero-downtime deploys.
+ Full scope and helper reference: [`docs/querying.md`](docs/querying.md).
 
- ## Mounting the dashboard
205
+ ## Dashboard
372
206
 
373
- Optional Rails Engine. Plain ERB, no JavaScript framework, no asset pipeline required. Requires Rails 7.1+; the core middleware works without Rails.
207
+ Mount the engine wherever you want — it's plain ERB, no JavaScript bundle, no asset pipeline gymnastics:
374
208
 
375
209
  ```ruby
376
- # config/application.rb (or an initializer)
377
- require "llm_cost_tracker/engine"
378
-
379
210
  # config/routes.rb
380
211
  mount LlmCostTracker::Engine => "/llm-costs"
381
212
  ```
382
213
 
383
- Routes (GET-only; CSV export included):
384
-
385
- - `/llm-costs` — overview: spend with delta vs previous period, budget projection, spend anomaly banner, daily trend vs previous slice, provider rollup, top models
386
- - `/llm-costs/models` — by provider + model; sortable by spend, volume, avg cost, latency
387
- - `/llm-costs/calls` — filterable + paginated; outlier sort modes (expensive, largest input/output, slowest, unknown pricing); CSV export
388
- - `/llm-costs/calls/:id` — details with token mix and cost mix breakdowns
389
- - `/llm-costs/tags` — tag keys present in the dataset (PG/SQLite native; MySQL 8.0+ via JSON_TABLE)
390
- - `/llm-costs/tags/:key` — breakdown by values of a given tag key
391
- - `/llm-costs/data_quality` — unknown pricing share, untagged calls, missing latency
392
-
393
- No built-in auth is included. Tags carry whatever your app puts in them, so protect the mount point with your application's authentication.
394
-
395
- ### Basic auth
396
-
397
- ```ruby
398
- authenticated = ->(req) {
399
- ActionController::HttpAuthentication::Basic.authenticate(req) do |name, password|
400
- ActiveSupport::SecurityUtils.secure_compare(name, ENV.fetch("LLM_DASHBOARD_USER")) &
401
- ActiveSupport::SecurityUtils.secure_compare(password, ENV.fetch("LLM_DASHBOARD_PASSWORD"))
402
- end
403
- }
404
- constraints(authenticated) { mount LlmCostTracker::Engine => "/llm-costs" }
405
- ```
406
-
- ### Devise
-
- ```ruby
- authenticate :user, ->(user) { user.admin? } do
-   mount LlmCostTracker::Engine => "/llm-costs"
- end
- ```
-
- ## ActiveSupport::Notifications
-
- ```ruby
- ActiveSupport::Notifications.subscribe("llm_request.llm_cost_tracker") do |*, payload|
-   # payload =>
-   # {
-   #   provider: "openai", model: "gpt-4o",
-   #   input_tokens: 150, cache_read_input_tokens: 0, cache_write_input_tokens: 0,
-   #   hidden_output_tokens: 0, output_tokens: 42, total_tokens: 192, latency_ms: 248,
-   #   cost: {
-   #     input_cost: 0.000375, cache_read_input_cost: 0.0,
-   #     cache_write_input_cost: 0.0, output_cost: 0.00042,
-   #     total_cost: 0.000795, currency: "USD"
-   #   },
-   #   pricing_mode: "batch",
-   #   tags: { feature: "chat", user_id: 42 },
-   #   tracked_at: 2026-04-16 14:30:00 UTC
-   # }
- end
- ```
-
- ## Custom storage backend
-
- ```ruby
- config.storage_backend = :custom
- config.custom_storage = ->(event) {
-   InfluxDB.write("llm_costs",
-     values: { cost: event.cost&.total_cost, tokens: event.total_tokens, latency_ms: event.latency_ms },
-     tags: { provider: event.provider, model: event.model }
-   )
- }
- ```
-
- ## OpenAI-compatible providers
-
- ```ruby
- config.openai_compatible_providers["gateway.example.com"] = "internal_gateway"
- ```
-
- Configured hosts are parsed using the OpenAI-compatible usage shape (`prompt_tokens` / `completion_tokens` / `total_tokens`, `input_tokens` / `output_tokens`, and optional cached-input details). This covers OpenRouter, DeepSeek, and private gateways exposing Chat Completions / Responses / Completions / Embeddings.
+ Pages: overview (spend trend, budget status, anomaly banner), models, calls (filterable, paginated, CSV export), tags, data quality. Reads `llm_api_calls`, so use `:active_record` storage if you want to mount it.
 
- ## Custom parser
-
- For providers with a non-OpenAI usage shape:
-
- ```ruby
- class AcmeParser < LlmCostTracker::Parsers::Base
-   HOSTS = %w[api.acme-llm.example].freeze
-   TRACKED_PATHS = %w[/v1/generate].freeze
-
-   def provider_names
-     %w[acme]
-   end
-
-   def match?(url)
-     match_uri?(url, hosts: HOSTS, exact_paths: TRACKED_PATHS)
-   end
-
-   def parse(_request_url, _request_body, response_status, response_body)
-     return nil unless response_status == 200
-
-     payload = safe_json_parse(response_body)
-     usage = payload.dig("usage")
-     return nil unless usage
-
-     LlmCostTracker::ParsedUsage.build(
-       provider: "acme",
-       model: payload["model"],
-       input_tokens: usage["input"] || 0,
-       output_tokens: usage["output"] || 0
-     )
-   end
- end
-
- LlmCostTracker::Parsers::Registry.register(AcmeParser)
- ```
+ Auth is your job. Examples for basic auth and Devise: [`docs/dashboard.md`](docs/dashboard.md).
 
  ## Supported providers
 
- | Provider | Auto-detected | Models with pricing |
+ | Provider | Auto-detected | Coverage |
  |---|:---:|---|
- | OpenAI | | GPT-5.2/5.1/5, GPT-5 mini/nano, GPT-4.1, GPT-4o, o1/o3/o4-mini |
- | OpenRouter | | OpenAI-compatible usage; provider-prefixed OpenAI model IDs normalized when possible |
- | DeepSeek | | OpenAI-compatible usage; add `pricing_overrides` for DeepSeek models |
- | OpenAI-compatible hosts | 🔧 | Configure `openai_compatible_providers` |
- | Anthropic | | Claude Opus 4.6/4.1/4, Sonnet 4.6/4.5/4, Haiku 4.5, Claude 3.x |
- | Google Gemini | | Gemini 2.5 Pro/Flash/Flash-Lite, 2.0 Flash/Flash-Lite, 1.5 Pro/Flash |
- | Any other | 🔧 | Custom parser |
+ | OpenAI | Yes | GPT-5.5/5.4/5.2/5.1/5 + pro/mini/nano variants, GPT-4.1, GPT-4o, o1/o3/o4-mini |
+ | Anthropic | Yes | Claude Opus 4.7/4.6/4.5/4.1/4, Sonnet 4.6/4.5/4, Haiku 4.5 |
+ | Google Gemini | Yes | Gemini 2.5 Pro/Flash/Flash-Lite, 2.0 Flash/Flash-Lite |
+ | OpenRouter | Yes | OpenAI-compatible usage; provider-prefixed model IDs are normalized |
+ | DeepSeek | Yes | OpenAI-compatible usage; add `pricing_overrides` for DeepSeek-specific rates |
+ | Other OpenAI-compatible hosts | Configurable | Register the host via `config.openai_compatible_providers` |
+ | Anything else | Configurable | Custom parser — see [`docs/extending.md`](docs/extending.md) |
 
- Endpoints: OpenAI Chat Completions / Responses / Completions / Embeddings; OpenAI-compatible equivalents; Anthropic Messages; Gemini `generateContent` and `streamGenerateContent`. All endpoints support streaming capture.
+ Endpoints covered end-to-end: OpenAI Chat Completions / Responses / Completions / Embeddings, Anthropic Messages, Gemini `generateContent` and `streamGenerateContent`, plus their OpenAI-compatible equivalents. Streaming is captured for Faraday paths whenever the provider emits final-usage events.
 
- ## Safety
+ ## Privacy
 
- **By design, `llm_cost_tracker` never persists prompt or response content.** The only data stored per call is the metadata needed for a cost ledger (provider, model, token counts, cost, latency, tags, provider response ID, HTTP status, and a timestamp). Tags carry whatever your application passes in; treat them as user-controlled input and avoid putting request bodies, completions, or secrets into them.
+ By design, **no prompt or response content is ever stored.** Per call, the ledger holds: provider, model, token counts, cost, latency, tags, response ID, timestamp. That's it. No request bodies, no headers, no completions. Warning logs strip query strings before logging URLs.
 
- - No external HTTP calls at request-tracking time.
- - No prompt or response bodies stored.
- - Faraday responses not modified.
- - Authorization headers and API keys are never stored or logged.
- - Storage failures non-fatal by default (`storage_error_behavior = :warn`).
- - Budget and unknown-pricing errors are raised only when you opt in.
+ Tags carry whatever your app passes — they are application-controlled input; treat them accordingly. Use `user_id`, not the user's email; use a feature key, not the input prompt.
 
- ## Thread safety (Puma, Sidekiq)
+ ## Documentation
 
- The gem is designed for multi-threaded hosts — Puma with `max_threads > 1` and Sidekiq with `concurrency > 1` are both supported. A few rules:
+ Deeper guides live in `docs/`. Reference pages are being filled out as content
+ moves out of this README; the inline sections above remain canonical where a page
+ is still brief.
 
- - **Configure once at boot.** `LlmCostTracker.configure` deep-freezes `default_tags`, `pricing_overrides`, `report_tag_breakdowns`, and `openai_compatible_providers` when the block returns. Mutating or replacing shared fields through `LlmCostTracker.configuration` raises `FrozenError`.
- - **Use `:active_record` storage for shared ledgers.** Puma workers and Sidekiq processes do not share memory; `:log` and `:custom` backends see per-process state only. `:active_record` writes to a single table and is the right choice for dashboards and budget checks across processes.
- - **Size your connection pool.** Each tracked call on the middleware path issues up to three SQL queries (preflight `SUM`, `INSERT`, post-check `SUM`). Make sure the AR pool covers `puma max_threads + sidekiq concurrency` plus your app's own usage.
- - **Don't share a `StreamCollector` across threads you don't own.** The collector itself is thread-safe — `event`, `usage`, and `finish!` synchronize internally and `finish!` is idempotent — but the documented pattern is one collector per stream.
- - **`finish!` is a barrier.** Once a stream is finished, later `event`, `usage`, or `model=` calls raise `FrozenError` instead of mutating a closed collector.
- - **`ActiveSupport::Notifications` subscribers run synchronously** in the caller's thread. Keep them fast or hand off to a background job; otherwise they add latency to every tracked call.
- - **`storage_error_behavior = :raise` inside Sidekiq** will retry the job, which can duplicate an expensive LLM call. Prefer `:warn` plus a Notifications subscriber, or `:ignore`, for worker contexts.
+ - [Configuration reference](docs/configuration.md)
+ - [Pricing & price refresh](docs/pricing.md)
+ - [Budgets & guardrails](docs/budgets.md)
+ - [Querying & reports](docs/querying.md)
+ - [Dashboard mounting](docs/dashboard.md)
+ - [Streaming capture](docs/streaming.md)
+ - [Extending](docs/extending.md)
+ - [Production operations](docs/operations.md)
+ - [Upgrading](docs/upgrading.md)
+ - [Cookbook — per-client recipes](docs/cookbook.md)
+ - [Architecture & design rules](docs/architecture.md)
 
  ## Known limitations
 
- - `:block_requests` is a best-effort guardrail, not a hard cap. Concurrent workers can pass preflight simultaneously and collectively overshoot the budget. Use an external quota system if you need a transactional cap.
- - Streaming capture relies on the provider emitting a final-usage event (OpenAI needs `stream_options: { include_usage: true }`); missing events are recorded with `usage_source: "unknown"` so they surface on the Data Quality page.
- - `provider_response_id` is stored only when the provider exposes a stable response object ID. Missing IDs stay `nil` and surface on the Data Quality page.
- - Cache write TTL variants (1h vs 5min writes) not modeled separately.
+ - `:block_requests` is best-effort under concurrency, not a transactional cap.
+ - Official SDK integrations cover non-streaming calls; streaming via the SDKs falls back to Faraday middleware or `track_stream`.
+ - Streaming usage capture relies on the provider emitting a final-usage event. Missing events are stored with `usage_source: "unknown"` so they appear on the data-quality page rather than vanishing.
+ - `provider_response_id` is stored only when the provider exposes a stable ID. Gemini is best-effort and varies by endpoint.
+ - Cache write TTL variants on Anthropic (1h vs 5min writes) are not modeled separately yet.
 
  ## Development
 
- Architecture rules for future changes live in [`docs/architecture.md`](docs/architecture.md). Integration recipes live in [`docs/cookbook.md`](docs/cookbook.md).
-
  ```bash
  bundle install
- bundle exec rspec
- bundle exec rubocop
+ bin/check # rubocop + rspec
  ```
 
+ Architecture rules and conventions for contributions live in [`AGENTS.md`](AGENTS.md) and [`docs/architecture.md`](docs/architecture.md).
+
  ## License
 
- MIT. See [LICENSE.txt](LICENSE.txt).
+ MIT; see [LICENSE.txt](LICENSE.txt).