llm_cost_tracker 0.1.4 → 0.2.0.alpha2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (69)
  1. checksums.yaml +4 -4
  2. data/CHANGELOG.md +58 -91
  3. data/PLAN_0.2.md +488 -0
  4. data/README.md +140 -320
  5. data/app/controllers/llm_cost_tracker/application_controller.rb +42 -0
  6. data/app/controllers/llm_cost_tracker/calls_controller.rb +77 -0
  7. data/app/controllers/llm_cost_tracker/dashboard_controller.rb +54 -0
  8. data/app/controllers/llm_cost_tracker/data_quality_controller.rb +10 -0
  9. data/app/controllers/llm_cost_tracker/models_controller.rb +12 -0
  10. data/app/controllers/llm_cost_tracker/tags_controller.rb +21 -0
  11. data/app/helpers/llm_cost_tracker/application_helper.rb +113 -0
  12. data/app/services/llm_cost_tracker/dashboard/data_quality.rb +38 -0
  13. data/app/services/llm_cost_tracker/dashboard/filter.rb +109 -0
  14. data/app/services/llm_cost_tracker/dashboard/overview_stats.rb +87 -0
  15. data/app/services/llm_cost_tracker/dashboard/provider_breakdown.rb +44 -0
  16. data/app/services/llm_cost_tracker/dashboard/tag_breakdown.rb +58 -0
  17. data/app/services/llm_cost_tracker/dashboard/tag_key_explorer.rb +125 -0
  18. data/app/services/llm_cost_tracker/dashboard/time_series.rb +44 -0
  19. data/app/services/llm_cost_tracker/dashboard/top_models.rb +89 -0
  20. data/app/services/llm_cost_tracker/pagination.rb +59 -0
  21. data/app/views/layouts/llm_cost_tracker/application.html.erb +342 -0
  22. data/app/views/llm_cost_tracker/calls/index.html.erb +127 -0
  23. data/app/views/llm_cost_tracker/calls/show.html.erb +67 -0
  24. data/app/views/llm_cost_tracker/dashboard/index.html.erb +145 -0
  25. data/app/views/llm_cost_tracker/data_quality/index.html.erb +110 -0
  26. data/app/views/llm_cost_tracker/errors/database.html.erb +8 -0
  27. data/app/views/llm_cost_tracker/errors/invalid_filter.html.erb +4 -0
  28. data/app/views/llm_cost_tracker/errors/not_found.html.erb +5 -0
  29. data/app/views/llm_cost_tracker/models/index.html.erb +95 -0
  30. data/app/views/llm_cost_tracker/shared/_bar.html.erb +5 -0
  31. data/app/views/llm_cost_tracker/shared/setup_required.html.erb +6 -0
  32. data/app/views/llm_cost_tracker/tags/index.html.erb +34 -0
  33. data/app/views/llm_cost_tracker/tags/show.html.erb +69 -0
  34. data/config/routes.rb +10 -0
  35. data/lib/llm_cost_tracker/budget.rb +16 -38
  36. data/lib/llm_cost_tracker/configuration.rb +3 -1
  37. data/lib/llm_cost_tracker/cost.rb +1 -3
  38. data/lib/llm_cost_tracker/engine.rb +13 -0
  39. data/lib/llm_cost_tracker/engine_compatibility.rb +15 -0
  40. data/lib/llm_cost_tracker/errors.rb +2 -0
  41. data/lib/llm_cost_tracker/event.rb +1 -3
  42. data/lib/llm_cost_tracker/event_metadata.rb +9 -18
  43. data/lib/llm_cost_tracker/llm_api_call.rb +4 -17
  44. data/lib/llm_cost_tracker/middleware/faraday.rb +4 -4
  45. data/lib/llm_cost_tracker/parsed_usage.rb +5 -9
  46. data/lib/llm_cost_tracker/parsers/anthropic.rb +4 -5
  47. data/lib/llm_cost_tracker/parsers/base.rb +3 -8
  48. data/lib/llm_cost_tracker/parsers/gemini.rb +3 -3
  49. data/lib/llm_cost_tracker/parsers/openai_usage.rb +3 -3
  50. data/lib/llm_cost_tracker/parsers/registry.rb +5 -12
  51. data/lib/llm_cost_tracker/period_grouping.rb +68 -0
  52. data/lib/llm_cost_tracker/price_registry.rb +22 -30
  53. data/lib/llm_cost_tracker/pricing.rb +10 -19
  54. data/lib/llm_cost_tracker/report.rb +4 -4
  55. data/lib/llm_cost_tracker/report_data.rb +21 -24
  56. data/lib/llm_cost_tracker/report_formatter.rb +4 -2
  57. data/lib/llm_cost_tracker/storage/active_record_store.rb +1 -3
  58. data/lib/llm_cost_tracker/tag_key.rb +16 -0
  59. data/lib/llm_cost_tracker/tracker.rb +35 -1
  60. data/lib/llm_cost_tracker/version.rb +1 -1
  61. data/lib/llm_cost_tracker.rb +3 -6
  62. data/llm_cost_tracker.gemspec +13 -9
  63. metadata +91 -20
  64. data/.rubocop.yml +0 -44
  65. data/lib/llm_cost_tracker/storage/active_record_backend.rb +0 -19
  66. data/lib/llm_cost_tracker/storage/backends.rb +0 -26
  67. data/lib/llm_cost_tracker/storage/custom_backend.rb +0 -16
  68. data/lib/llm_cost_tracker/storage/log_backend.rb +0 -28
  69. data/lib/llm_cost_tracker/value_object.rb +0 -45
data/README.md CHANGED
@@ -1,8 +1,6 @@
  # LlmCostTracker

- **See where your Rails app spends money on LLM APIs.**
-
- Track cost by user, tenant, feature, provider, and model, all in your own database. No proxy. No SaaS required.
+ **Self-hosted LLM cost tracking for Ruby and Rails.** Intercepts Faraday LLM responses, prices them locally, stores events in your database. No proxy, no SaaS.

  [![Gem Version](https://img.shields.io/gem/v/llm_cost_tracker.svg)](https://rubygems.org/gems/llm_cost_tracker)
  [![CI](https://github.com/sergey-homenko/llm_cost_tracker/actions/workflows/ruby.yml/badge.svg)](https://github.com/sergey-homenko/llm_cost_tracker/actions)
@@ -20,53 +18,38 @@ By model:
  claude-sonnet-4-6 $31.200000
  gemini-2.5-flash $14.120000

- By tag (feature):
- chat $73.500000
- summarizer $29.220000
- translate $24.700000
+ By tag key "env":
+ production $119.300000
+ staging $8.120000
  ```

- ## Why?
-
- Every Rails app integrating LLMs faces the same problem: **you don't know how much AI is costing you** until the invoice arrives. Full observability platforms like Langfuse and Helicone are powerful, but sometimes you just need a small Rails-native cost ledger that lives in your app database.
+ ## Why

- `llm_cost_tracker` takes a different approach:
+ Every Rails app with LLM integrations eventually runs into the same question: where did that invoice come from? Full observability platforms like Langfuse and Helicone cover a lot more than cost, and sometimes you just want a small Rails-native ledger that lives in your own database.

- - 🔌 **Faraday-native** intercepts LLM HTTP responses without changing the response
- - 🏠 **Self-hosted** — your data stays in your database
- - 🧩 **Client-light** — works with raw Faraday and LLM gems that expose their Faraday connection
- - 🏷️ **Attribution-first** — tag spend by feature, tenant, user, job, or environment
- - 🌐 **OpenAI-compatible** — auto-detect OpenRouter and DeepSeek, with custom compatible hosts configurable
- - 🛑 **Budget guardrails** — notify, raise, or block requests when monthly spend is exhausted
- - 📊 **Quick reports** — print a terminal cost report with one rake task
+ `llm_cost_tracker` is scoped to that. It plugs into Faraday, parses provider usage out of the response, looks up pricing locally, and writes an event. You end up with a ledger you can query with plain ActiveRecord, slice by any tag dimension, and optionally surface on a built-in dashboard. No proxy, no SaaS, no separate service to run.

- This gem is intentionally not a tracing platform, prompt CMS, eval system, or gateway. It focuses on the boring but valuable question: "What did this app spend on LLM APIs, and where did that spend come from?"
+ It's not a tracing platform, prompt CMS, eval system, or gateway and doesn't want to be. The goal is answering _"what did this app spend on LLM APIs, and where did that spend come from?"_ well enough that you stop worrying about it.

  ## Installation

- Add to your Gemfile:
-
  ```ruby
  gem "llm_cost_tracker"
  ```

- For ActiveRecord storage (recommended for production):
+ For ActiveRecord storage:

  ```bash
  bin/rails generate llm_cost_tracker:install
  bin/rails db:migrate
  ```

- ## Try It In 30 Seconds
-
- Try cost calculation without a database or migration:
+ ## Quick try (no database)

  ```ruby
  require "llm_cost_tracker"

- LlmCostTracker.configure do |config|
- config.storage_backend = :log
- end
+ LlmCostTracker.configure { |c| c.storage_backend = :log }

  LlmCostTracker.track(
  provider: :openai,
@@ -75,25 +58,12 @@ LlmCostTracker.track(
  output_tokens: 200,
  feature: "demo"
  )
+ # => [LlmCostTracker] openai/gpt-4o tokens=1000+200 cost=$0.004500 tags={:feature=>"demo"}
  ```

- Output:
-
- ```text
- [LlmCostTracker] openai/gpt-4o tokens=1000+200 cost=$0.004500 tags={:feature=>"demo"}
- ```
-
- ## Quick Start
-
- Use the path that matches your app:
-
- - Using `ruby-openai`, `ruby_llm`, or another client that exposes Faraday? Patch that client's Faraday connection.
- - Using raw Faraday? Add the middleware directly.
- - Using a client without Faraday access? Use manual tracking.
-
- ### Option 1: Patch An Existing Client
+ ## Usage

- Some LLM gems expose their Faraday connection. For example, with `ruby-openai`:
+ ### Patch an existing client's Faraday connection

  ```ruby
  # config/initializers/openai.rb
@@ -102,34 +72,27 @@ OpenAI.configure do |config|

  config.faraday do |f|
  f.use :llm_cost_tracker, tags: -> {
- {
- user_id: Current.user&.id,
- feature: Current.llm_feature || "chat"
- }
+ { user_id: Current.user&.id, workflow: Current.workflow, env: Rails.env }
  }
  end
  end
  ```

- For Rails apps, `tags:` can be a callable so request-local values are evaluated per request:
+ `tags:` can be a callable so `Current` attributes are evaluated per request:

  ```ruby
- # app/models/current.rb
  class Current < ActiveSupport::CurrentAttributes
- attribute :user, :tenant, :llm_feature
+ attribute :user, :tenant, :workflow
  end

- # app/controllers/application_controller.rb
+ # application_controller.rb
  before_action do
  Current.user = current_user
- Current.tenant = current_tenant if respond_to?(:current_tenant, true)
- Current.llm_feature = "chat"
+ Current.workflow = "chat"
  end
  ```

- ### Option 2: Faraday Middleware
-
- If your LLM client uses Faraday, add the middleware to that connection:
+ ### Raw Faraday

  ```ruby
  conn = Faraday.new(url: "https://api.openai.com") do |f|
@@ -139,18 +102,12 @@ conn = Faraday.new(url: "https://api.openai.com") do |f|
  f.adapter Faraday.default_adapter
  end

- # Every supported LLM request through this connection is tracked
- response = conn.post("/v1/responses", {
- model: "gpt-5-mini",
- input: "Hello!"
- })
+ conn.post("/v1/responses", { model: "gpt-5-mini", input: "Hello!" })
  ```

- If a client does not expose its HTTP connection, use manual tracking or register a custom parser around the HTTP layer you control.
-
- ### Option 3: Manual tracking
+ Place `llm_cost_tracker` inside the Faraday stack where it can see the final response body. For streaming APIs, tracking requires the final body to expose provider usage; otherwise the gem warns and skips — use manual tracking there.

- For non-Faraday clients, track manually:
+ ### Manual tracking

  ```ruby
  LlmCostTracker.track(
@@ -169,337 +126,215 @@ LlmCostTracker.track(
169
126
  ```ruby
170
127
  # config/initializers/llm_cost_tracker.rb
171
128
  LlmCostTracker.configure do |config|
172
- # Storage: :log (default), :active_record, or :custom
173
- config.storage_backend = :active_record
174
-
175
- # Default tags on every event
129
+ config.storage_backend = :active_record # :log (default), :active_record, :custom
176
130
  config.default_tags = { app: "my_app", environment: Rails.env }
177
131
 
178
- # Monthly budget in USD
179
132
  config.monthly_budget = 500.00
180
- config.budget_exceeded_behavior = :notify # :notify, :raise, or :block_requests
181
- config.storage_error_behavior = :warn # :ignore, :warn, or :raise
182
- config.unknown_pricing_behavior = :warn # :ignore, :warn, or :raise
133
+ config.budget_exceeded_behavior = :notify # :notify, :raise, :block_requests
134
+ config.storage_error_behavior = :warn # :ignore, :warn, :raise
135
+ config.unknown_pricing_behavior = :warn # :ignore, :warn, :raise
183
136
 
184
- # Alert callback
185
137
  config.on_budget_exceeded = ->(data) {
186
- SlackNotifier.notify(
187
- "#alerts",
188
- "🚨 LLM budget exceeded! $#{data[:monthly_total].round(2)} / $#{data[:budget]}"
189
- )
138
+ SlackNotifier.notify("#alerts", "🚨 LLM budget $#{data[:monthly_total].round(2)} / $#{data[:budget]}")
190
139
  }
191
140
 
192
- # Override pricing for custom/fine-tuned models (per 1M tokens)
193
141
  config.prices_file = Rails.root.join("config/llm_cost_tracker_prices.yml")
194
142
  config.pricing_overrides = {
195
143
  "ft:gpt-4o-mini:my-org" => { input: 0.30, cached_input: 0.15, output: 1.20 }
196
144
  }
197
145
 
198
- # OpenAI-compatible APIs. OpenRouter and DeepSeek are included by default.
146
+ # Built-in: openrouter.ai, api.deepseek.com
199
147
  config.openai_compatible_providers["llm.my-company.com"] = "internal_gateway"
200
148
  end
201
149
  ```
202
150
 
203
- Pricing is best-effort and based on public provider pricing for standard token usage. Providers change pricing frequently, and some features have extra charges or tiered pricing. OpenRouter-style model IDs such as `openai/gpt-4o-mini` are normalized to built-in model names when possible. Use `prices_file` or `pricing_overrides` for fine-tunes, gateway-specific model IDs, enterprise discounts, batch pricing, long-context premiums, and any model this gem does not know yet.
204
-
205
- Storage errors are non-fatal by default:
206
-
207
- ```ruby
208
- config.storage_error_behavior = :warn # default
209
- config.storage_error_behavior = :raise # fail fast with StorageError
210
- config.storage_error_behavior = :ignore # skip storage failures silently
211
- ```
212
-
213
- With the default `:warn` behavior, tracking emits a warning and lets the LLM response continue if ActiveRecord or custom storage fails. `LlmCostTracker::StorageError` exposes `original_error` when `:raise` is enabled.
151
+ Pricing is best-effort. OpenRouter-style IDs like `openai/gpt-4o-mini` are normalized to built-in names when possible. Use `prices_file` / `pricing_overrides` for fine-tunes, gateway-specific IDs, enterprise discounts, batch pricing, or models the gem doesn't know.
214
152
 
215
- Unknown model pricing is visible by default:
216
-
217
- ```ruby
218
- config.unknown_pricing_behavior = :warn # default
219
- config.unknown_pricing_behavior = :raise # fail fast with UnknownPricingError
220
- config.unknown_pricing_behavior = :ignore # keep tracking tokens silently
221
- ```
153
+ `storage_error_behavior = :warn` (default) lets LLM responses continue if storage fails; `:raise` exposes `StorageError#original_error`.
222
154
 
223
- When pricing is unknown, the event can still be recorded with token counts, but `cost` is `nil` and budget guardrails are skipped for that event. Use `prices_file` or `pricing_overrides` to ensure all production models are priced. Check this ActiveRecord query for a list of unpriced models in your data:
155
+ Unknown pricing still records token counts, but `cost` is `nil` and budget guardrails skip that event. Find unpriced models:
224
156
 
225
157
  ```ruby
226
158
  LlmCostTracker::LlmApiCall.unknown_pricing.group(:model).count
227
159
  ```
228
160
 
229
- ### Keeping Prices Current
161
+ ### Keeping prices current
230
162
 
231
- Built-in prices live in `lib/llm_cost_tracker/prices.json`, with `updated_at`, `unit`, `currency`, and source URLs in the file metadata. The gem does not fetch pricing on boot; that keeps it self-hosted and avoids hidden external dependencies.
232
-
233
- For production apps, keep a local JSON or YAML price file and point the gem at it:
163
+ Built-in prices are in `lib/llm_cost_tracker/prices.json`. The gem never fetches pricing on boot. For production, generate a local overrides file and point the gem at it:
234
164
 
235
165
  ```bash
236
166
  bin/rails generate llm_cost_tracker:prices
237
167
  ```
238
168
 
239
- ```ruby
240
- config.prices_file = Rails.root.join("config/llm_cost_tracker_prices.yml")
241
- ```
242
-
243
- Example JSON:
244
-
245
169
  ```json
246
170
  {
247
- "metadata": {
248
- "updated_at": "2026-04-18",
249
- "currency": "USD",
250
- "unit": "1M tokens"
251
- },
171
+ "metadata": { "updated_at": "2026-04-18", "currency": "USD", "unit": "1M tokens" },
252
172
  "models": {
253
- "my-gateway/gpt-4o-mini": {
254
- "input": 0.20,
255
- "cached_input": 0.10,
256
- "output": 0.80
257
- }
173
+ "my-gateway/gpt-4o-mini": { "input": 0.20, "cached_input": 0.10, "output": 0.80 }
258
174
  }
259
175
  }
260
176
  ```
261
177
 
262
- `pricing_overrides` still has the highest precedence, so you can use it for small Ruby-only overrides and keep broader provider tables in the file. A practical release rhythm is to refresh built-in `prices.json` quarterly and use `prices_file` for urgent provider changes between gem releases.
178
+ `pricing_overrides` has the highest precedence; use it for small Ruby-only tweaks, `prices_file` for broader tables.
263
179
 
264
- ## Budget Enforcement
180
+ ## Budget enforcement
265
181
 
266
182
  ```ruby
267
- LlmCostTracker.configure do |config|
268
- config.storage_backend = :active_record
269
- config.monthly_budget = 100.00
270
- config.budget_exceeded_behavior = :block_requests
271
- end
183
+ config.storage_backend = :active_record
184
+ config.monthly_budget = 100.00
185
+ config.budget_exceeded_behavior = :block_requests
272
186
  ```
273
187
 
274
- Budget behavior options:
275
-
276
- - `:notify` — default. Calls `on_budget_exceeded` after a tracked event pushes the month over budget.
277
- - `:raise` — records the event, then raises `LlmCostTracker::BudgetExceededError` when the month is over budget.
278
- - `:block_requests` — blocks Faraday LLM requests before the HTTP call when the ActiveRecord monthly total has already reached the budget. If a request pushes the month over budget, it also raises after recording the event.
279
-
280
- `BudgetExceededError` exposes `monthly_total`, `budget`, and `last_event`:
188
+ - `:notify` — fire `on_budget_exceeded` after an event pushes the month over budget.
189
+ - `:raise` — record the event, then raise `BudgetExceededError`.
190
+ - `:block_requests` — block preflight when the stored monthly total is already over budget; still raises post-response on the event that crosses the line. Needs `:active_record` storage.
281
191
 
282
192
  ```ruby
283
- begin
284
- client.chat(...)
285
193
  rescue LlmCostTracker::BudgetExceededError => e
286
- Rails.logger.warn("LLM budget exhausted: #{e.monthly_total} / #{e.budget}")
287
- end
194
+ # e.monthly_total, e.budget, e.last_event
288
195
  ```
289
196
 
290
- Pre-request blocking needs `storage_backend = :active_record` because the middleware must query your stored monthly total before sending the request. With `:log` or `:custom` storage, `:raise` and the post-response part of `:block_requests` still work for the event being tracked.
291
-
292
- `:block_requests` is a best-effort guardrail, not a transactional hard quota. In highly concurrent deployments, multiple workers can pass the preflight check at the same time before any of them records its final cost. The request that first pushes the month over budget is stored before the post-response `BudgetExceededError` is raised; later Faraday requests are blocked during preflight once the stored monthly total is exhausted. Use provider-side limits or a gateway-level quota if you need strict cross-process caps.
293
-
294
- ## Querying Costs (ActiveRecord)
197
+ `:block_requests` is best-effort under concurrency, not a transactional cap. Use provider/gateway-level limits for strict quotas.
295
198
 
296
- Print a quick terminal report:
199
+ ## Querying costs
297
200
 
298
201
  ```bash
299
202
  bin/rails llm_cost_tracker:report
300
-
301
- # Optional: change the window
302
203
  DAYS=7 bin/rails llm_cost_tracker:report
303
204
  ```
304
205
 
305
- Example:
306
-
307
- ```text
308
- LLM Cost Report (last 30 days)
309
-
310
- Total cost: $127.420000
311
- Requests: 4,218
312
- Avg latency: 812ms
313
- Unknown pricing: 0
314
-
315
- By provider:
316
- openai $96.220000
317
- anthropic $31.200000
318
- ```
319
-
320
- Or query the ledger directly:
321
-
322
206
  ```ruby
323
- # Today's total spend
324
207
  LlmCostTracker::LlmApiCall.today.total_cost
325
- # => 12.45
326
-
327
- # Cost breakdown by model this month
328
208
  LlmCostTracker::LlmApiCall.this_month.cost_by_model
329
- # => { "gpt-4o" => 8.20, "claude-sonnet-4-6" => 4.25 }
330
-
331
- # Cost by provider
332
209
  LlmCostTracker::LlmApiCall.this_month.cost_by_provider
333
- # => { "openai" => 8.20, "anthropic" => 4.25 }
334
-
335
- # SQL-side cost breakdown by any tag key
336
- calls = LlmCostTracker::LlmApiCall.this_month
337
- calls.group_by_tag("feature").sum(:total_cost)
338
- # => { "chat" => 7.10, "summarizer" => 1.10 }
339
210
 
340
- # Convenience wrapper with "(untagged)" labels and float values
341
- calls.cost_by_tag("feature")
342
- # => { "chat" => 7.10, "summarizer" => 1.10 }
211
+ # Group / sum by any tag
212
+ LlmCostTracker::LlmApiCall.this_month.group_by_tag("feature").sum(:total_cost)
213
+ LlmCostTracker::LlmApiCall.this_month.cost_by_tag("feature") # with "(untagged)" bucket
343
214
 
344
- # Daily cost trend
215
+ # Period grouping (SQL-side)
216
+ LlmCostTracker::LlmApiCall.this_month.group_by_period(:day).sum(:total_cost)
217
+ LlmCostTracker::LlmApiCall.group_by_period(:month).sum(:total_cost)
345
218
  LlmCostTracker::LlmApiCall.daily_costs(days: 7)
346
- # => { "2026-04-10" => 1.5, "2026-04-11" => 2.3, ... }
347
219
 
348
- # Latency overview
220
+ # Latency
349
221
  LlmCostTracker::LlmApiCall.with_latency.average_latency_ms
350
222
  LlmCostTracker::LlmApiCall.this_month.latency_by_model
351
223
 
352
- # Filter by one tag
224
+ # Tag filters
353
225
  LlmCostTracker::LlmApiCall.by_tag("feature", "chat").this_month.total_cost
354
-
355
- # Filter by another tag
356
- LlmCostTracker::LlmApiCall.by_tag("user_id", "42").today.total_cost
357
-
358
- # Filter by multiple tags
359
226
  LlmCostTracker::LlmApiCall.by_tags(user_id: 42, feature: "chat").this_month.total_cost
360
227
 
361
- # Find models without pricing
362
- LlmCostTracker::LlmApiCall.unknown_pricing.group(:model).count
363
- LlmCostTracker::LlmApiCall.with_cost.this_month.total_cost
364
-
365
- # Custom date range
228
+ # Range
366
229
  LlmCostTracker::LlmApiCall.between(1.week.ago, Time.current).cost_by_model
367
230
  ```
368
231
 
369
- ### Tag Storage
232
+ ### Tag storage
370
233
 
371
- The install generator uses `jsonb` tags with a GIN index on PostgreSQL:
234
+ New installs use `jsonb` + GIN on PostgreSQL:
372
235
 
373
236
  ```ruby
374
237
  t.jsonb :tags, null: false, default: {}
375
238
  add_index :llm_api_calls, :tags, using: :gin
376
239
  ```
377
240
 
378
- On SQLite, MySQL, and other adapters, tags fall back to JSON stored in a text column. The `by_tag` scope automatically uses PostgreSQL JSONB containment when the column supports it, and the text fallback otherwise. This works, but tag queries are less efficient than PostgreSQL JSONB containment.
241
+ On other adapters tags fall back to JSON in a text column. `by_tag` uses JSONB containment on PG, text matching elsewhere.
379
242
 
380
- If you installed `llm_cost_tracker` before JSONB tags were available and your app uses PostgreSQL, generate an upgrade migration:
243
+ Upgrade an existing install:
381
244
 
382
245
  ```bash
383
- bin/rails generate llm_cost_tracker:upgrade_tags_to_jsonb
246
+ bin/rails generate llm_cost_tracker:upgrade_tags_to_jsonb # PG: text → jsonb + GIN
247
+ bin/rails generate llm_cost_tracker:upgrade_cost_precision # widen cost columns
248
+ bin/rails generate llm_cost_tracker:add_latency_ms
384
249
  bin/rails db:migrate
385
250
  ```
386
251
 
387
- This converts the existing `tags` text column to `jsonb`, keeps existing tag data, and adds the GIN index.
252
+ ## Dashboard (optional)
388
253
 
389
- If you installed an earlier version with `precision: 12, scale: 8` cost columns, widen them for larger production ledgers:
254
+ Opt-in Rails Engine. Plain ERB, inline CSS, no JS. Requires Rails 7.1+; the core middleware works without Rails.
390
255
 
391
- ```bash
392
- bin/rails generate llm_cost_tracker:upgrade_cost_precision
393
- bin/rails db:migrate
256
+ ```ruby
257
+ # config/application.rb (or an initializer)
258
+ require "llm_cost_tracker/engine"
259
+
260
+ # config/routes.rb
261
+ mount LlmCostTracker::Engine => "/llm-costs"
394
262
  ```
395
263
 
396
- If you installed before `latency_ms` was available, add the latency column:
264
+ Routes (GET-only; CSV export included):
397
265
 
398
- ```bash
399
- bin/rails generate llm_cost_tracker:add_latency_ms
400
- bin/rails db:migrate
266
+ - `/llm-costs` — overview: spend (with delta vs previous period), calls, avg cost/call, avg latency, unknown pricing, budget, daily trend, provider rollup, top models
267
+ - `/llm-costs/models` by provider + model; sortable by spend, volume, avg cost, latency
268
+ - `/llm-costs/calls` — filterable + paginated; outlier sort modes (expensive, largest input/output, slowest, unknown pricing); CSV export
269
+ - `/llm-costs/calls/:id` — details
270
+ - `/llm-costs/tags` — tag keys present in the dataset (PG/SQLite native, MySQL via in-Ruby fallback)
271
+ - `/llm-costs/tags/:key` — breakdown by values of a given tag key
272
+ - `/llm-costs/data_quality` — unknown pricing share, untagged calls, missing latency
273
+
274
+ > ⚠️ **No built-in auth.** Tags carry whatever your app puts in them. Protect the mount point with your app's auth.
275
+
276
+ ### Basic auth
277
+
278
+ ```ruby
279
+ authenticated = ->(req) {
280
+ ActionController::HttpAuthentication::Basic.authenticate(req) do |name, password|
281
+ ActiveSupport::SecurityUtils.secure_compare(name, ENV.fetch("LLM_DASHBOARD_USER")) &
282
+ ActiveSupport::SecurityUtils.secure_compare(password, ENV.fetch("LLM_DASHBOARD_PASSWORD"))
283
+ end
284
+ }
285
+ constraints(authenticated) { mount LlmCostTracker::Engine => "/llm-costs" }
401
286
  ```
402
287
 
403
- ## ActiveSupport::Notifications
288
+ ### Devise
289
+
290
+ ```ruby
291
+ authenticate :user, ->(user) { user.admin? } do
292
+ mount LlmCostTracker::Engine => "/llm-costs"
293
+ end
294
+ ```
404
295
 
405
- Every tracked call emits an `llm_request.llm_cost_tracker` event:
296
+ ## ActiveSupport::Notifications
406
297
 
407
298
  ```ruby
408
299
  ActiveSupport::Notifications.subscribe("llm_request.llm_cost_tracker") do |*, payload|
409
300
  # payload =>
410
301
  # {
411
- # provider: "openai",
412
- # model: "gpt-4o",
413
- # input_tokens: 150,
414
- # output_tokens: 42,
415
- # total_tokens: 192,
416
- # latency_ms: 248,
302
+ # provider: "openai", model: "gpt-4o",
303
+ # input_tokens: 150, output_tokens: 42, total_tokens: 192, latency_ms: 248,
417
304
  # cost: {
418
- # input_cost: 0.000375,
419
- # cached_input_cost: 0.0,
420
- # cache_read_input_cost: 0.0,
421
- # cache_creation_input_cost: 0.0,
422
- # output_cost: 0.00042,
423
- # total_cost: 0.000795,
424
- # currency: "USD"
305
+ # input_cost: 0.000375, cached_input_cost: 0.0,
306
+ # cache_read_input_cost: 0.0, cache_creation_input_cost: 0.0,
307
+ # output_cost: 0.00042, total_cost: 0.000795, currency: "USD"
425
308
  # },
426
309
  # tags: { feature: "chat", user_id: 42 },
427
310
  # tracked_at: 2026-04-16 14:30:00 UTC
428
311
  # }
429
-
430
- StatsD.increment("llm.requests", tags: ["provider:#{payload[:provider]}"])
431
- StatsD.histogram("llm.cost", payload[:cost][:total_cost])
432
312
  end
433
313
  ```
434
314
 
435
- ## Custom Storage Backend
315
+ ## Custom storage backend
436
316
 
437
317
  ```ruby
438
- LlmCostTracker.configure do |config|
439
- config.storage_backend = :custom
440
- config.custom_storage = ->(event) {
441
- InfluxDB.write("llm_costs", {
442
- values: {
443
- cost: event[:cost]&.fetch(:total_cost, nil),
444
- tokens: event[:total_tokens],
445
- latency_ms: event[:latency_ms]
446
- },
447
- tags: { provider: event[:provider], model: event[:model] }
448
- })
449
- }
450
- end
318
+ config.storage_backend = :custom
319
+ config.custom_storage = ->(event) {
320
+ InfluxDB.write("llm_costs",
321
+ values: { cost: event.cost&.total_cost, tokens: event.total_tokens, latency_ms: event.latency_ms },
322
+ tags: { provider: event.provider, model: event.model }
323
+ )
324
+ }
451
325
  ```
452
326
 
453
- ## OpenAI-Compatible Providers
327
+ ## OpenAI-compatible providers
454
328
 
455
329
  ```ruby
456
- LlmCostTracker.configure do |config|
457
- # Built in:
458
- # "openrouter.ai" => "openrouter"
459
- # "api.deepseek.com" => "deepseek"
460
- config.openai_compatible_providers["gateway.example.com"] = "internal_gateway"
461
- end
330
+ config.openai_compatible_providers["gateway.example.com"] = "internal_gateway"
462
331
  ```
463
332
 
464
- Any configured host is parsed with the OpenAI-compatible usage shape:
465
-
466
- - `prompt_tokens` / `completion_tokens` / `total_tokens`
467
- - `input_tokens` / `output_tokens` / `total_tokens`
468
- - optional cached input details when the response includes them
469
-
470
- This covers OpenRouter, DeepSeek, and private gateways that expose OpenAI-style Chat Completions, Responses, Completions, or Embeddings endpoints.
471
-
472
- ## Safety Guarantees
333
+ Configured hosts are parsed with the OpenAI-compatible usage shape (`prompt_tokens` / `completion_tokens` / `total_tokens`, `input_tokens` / `output_tokens`, and optional cached-input details). Covers OpenRouter, DeepSeek, and private gateways exposing Chat Completions / Responses / Completions / Embeddings.
473
334
 
474
- - `llm_cost_tracker` does not make external HTTP calls.
475
- - It does not store prompt or response bodies.
476
- - Faraday responses are not modified.
477
- - Storage failures are non-fatal by default via `storage_error_behavior = :warn`.
478
- - Budget and unknown-pricing errors are raised only when you opt into `:raise` or `:block_requests`.
479
- - Pricing is local and best-effort; use `prices_file` or `pricing_overrides` for production-specific rates.
480
- - Streaming/SSE calls are skipped with a warning when the final usage payload is not readable by Faraday.
335
+ ## Custom parser
481
336
 
- ## Production Checklist
-
- - Use `storage_backend = :active_record` in production.
- - Set `monthly_budget` and choose `budget_exceeded_behavior`.
- - Treat `:block_requests` as best-effort in concurrent systems, not a strict quota.
- - Keep `unknown_pricing_behavior = :warn` or `:raise` until pricing overrides are complete.
- - Add `pricing_overrides` for custom, fine-tuned, gateway-specific, or newly released models.
- - Tag calls with useful business context such as `tenant_id`, `user_id`, and `feature`.
- - Check `LlmCostTracker::LlmApiCall.unknown_pricing.group(:model).count` after deploys.
- - Track `latency_ms` and watch `latency_by_model` for slow or degraded providers.
-
- ## Known Limitations
-
- - `:block_requests` is best-effort under concurrency. For hard caps, use an external quota system, provider-side limits, or a gateway-level budget.
- - Streaming/SSE calls are tracked only when Faraday exposes a final response body with usage data. Otherwise the gem warns and skips automatic tracking.
- - Anthropic cache creation TTL variants are not modeled separately yet; 1-hour cache writes may be underestimated compared with the default 5-minute cache write rate.
- - OpenAI reasoning tokens are included in output-token totals when providers report them that way, but separate reasoning-token attribution is not stored yet.
-
- ## Adding a Custom Provider Parser
-
- Use this for providers that are not OpenAI-compatible and return a different usage shape.
+ For providers with a non-OpenAI usage shape:

```ruby
class AcmeParser < LlmCostTracker::Parsers::Base
@@ -510,73 +345,58 @@ class AcmeParser < LlmCostTracker::Parsers::Base
def parse(request_url, request_body, response_status, response_body)
return nil unless response_status == 200

response = safe_json_parse(response_body)
- usage = response["usage"]
+ usage = response&.dig("usage")
return nil unless usage

- {
+ LlmCostTracker::ParsedUsage.build(
provider: "acme",
model: response["model"],
input_tokens: usage["input"] || 0,
output_tokens: usage["output"] || 0
- }
+ )
end
end

- # Register it
LlmCostTracker::Parsers::Registry.register(AcmeParser.new)
```
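
Outside the gem, the parse logic above can be exercised standalone to show the intended behavior. In this sketch, `ParsedUsage` and `safe_json_parse` are stand-ins whose shapes are assumed from the example above, not taken from the gem's source:

```ruby
require "json"

# Stand-in for the gem's ParsedUsage value object (shape assumed for illustration).
ParsedUsage = Struct.new(:provider, :model, :input_tokens, :output_tokens, keyword_init: true)

# Stand-in for Parsers::Base#safe_json_parse: returns nil on unparseable bodies.
def safe_json_parse(body)
  JSON.parse(body)
rescue JSON::ParserError, TypeError
  nil
end

def parse(response_status, response_body)
  return nil unless response_status == 200

  response = safe_json_parse(response_body)
  usage = response&.dig("usage")
  return nil unless usage

  ParsedUsage.new(
    provider: "acme",
    model: response["model"],
    input_tokens: usage["input"] || 0,
    output_tokens: usage["output"] || 0
  )
end

parsed = parse(200, '{"model":"acme-large","usage":{"input":120,"output":45}}')
# A non-200 status or a body without "usage" yields nil instead of raising.
```

Returning `nil` tells the registry the response is not trackable, matching the early returns in the example above.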

- ## Supported Providers
+ ## Supported providers

| Provider | Auto-detected | Models with pricing |
- |----------|:---:|---|
+ |---|:---:|---|
| OpenAI | ✅ | GPT-5.2/5.1/5, GPT-5 mini/nano, GPT-4.1, GPT-4o, o1/o3/o4-mini |
- | OpenRouter | ✅ | Uses OpenAI-compatible usage; provider-prefixed OpenAI model IDs are normalized when possible |
- | DeepSeek | ✅ | Uses OpenAI-compatible usage; add `pricing_overrides` for DeepSeek model pricing |
+ | OpenRouter | ✅ | OpenAI-compatible usage; provider-prefixed OpenAI model IDs normalized when possible |
+ | DeepSeek | ✅ | OpenAI-compatible usage; add `pricing_overrides` for DeepSeek models |
| OpenAI-compatible hosts | 🔧 | Configure `openai_compatible_providers` |
| Anthropic | ✅ | Claude Opus 4.6/4.1/4, Sonnet 4.6/4.5/4, Haiku 4.5, Claude 3.x |
| Google Gemini | ✅ | Gemini 2.5 Pro/Flash/Flash-Lite, 2.0 Flash/Flash-Lite, 1.5 Pro/Flash |
- | Any other | 🔧 | Via custom parser (see above) |
-
- Supported endpoint families:
+ | Any other | 🔧 | Custom parser |

- - OpenAI: Chat Completions, Responses, Completions, Embeddings
- - OpenAI-compatible: Chat Completions, Responses, Completions, Embeddings
- - Anthropic: Messages
- - Google Gemini: `generateContent` responses with `usageMetadata`
+ Endpoints: OpenAI Chat Completions / Responses / Completions / Embeddings; OpenAI-compatible equivalents; Anthropic Messages; Gemini `generateContent` with `usageMetadata`.
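
For models without bundled prices (DeepSeek in the table above, or fine-tuned and gateway-specific models), the README points at `pricing_overrides`. The sketch below is hypothetical: the key/value shape, the per-million-token units, and the rates themselves are all assumptions for illustration, not real prices — consult the gem's configuration docs:

```ruby
# Hypothetical shape and illustrative rates only.
LlmCostTracker.configure do |config|
  config.pricing_overrides = {
    "deepseek-chat" => { input: 0.50, output: 2.00 }  # assumed $/1M tokens
  }
end
```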
 
- ## How It Works
+ ## Safety

- ```
- Your App Faraday [LlmCostTracker Middleware] → LLM API
-
- Parses response body
- Extracts token usage
- Calculates cost
-
- ActiveSupport::Notifications
- ActiveRecord / Log / Custom
- ```
+ - No external HTTP calls.
+ - No prompt or response bodies stored.
+ - Faraday responses not modified.
+ - Storage failures non-fatal by default (`storage_error_behavior = :warn`).
+ - Budget / unknown-pricing errors are raised only when you opt in.
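
The opt-in failure modes map onto option names that appear elsewhere in this README (`storage_error_behavior`, `monthly_budget`, `budget_exceeded_behavior`, `unknown_pricing_behavior`). A sketch, assuming the usual `configure` block idiom and with an illustrative budget value:

```ruby
LlmCostTracker.configure do |config|
  config.storage_error_behavior   = :warn   # default: storage failures are non-fatal
  config.monthly_budget           = 500.0   # illustrative value
  config.budget_exceeded_behavior = :raise  # or :block_requests (best-effort)
  config.unknown_pricing_behavior = :warn   # or :raise
end
```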
 
- The middleware intercepts **outgoing** HTTP responses (not incoming Rails requests), parses the provider usage object, looks up pricing, and records the event. It never modifies requests or responses. Put `llm_cost_tracker` inside the Faraday stack where it can see the final response body; if another middleware consumes or transforms streaming bodies, use manual tracking.
+ ## Known limitations

- For streaming APIs, tracking depends on the final response body including provider usage data. If the client consumes server-sent events without exposing the final usage payload to Faraday, the gem logs a warning and skips tracking; use manual tracking for those calls.
+ - `:block_requests` is best-effort under concurrency; use an external quota system for hard caps.
+ - Streaming/SSE tracked only when Faraday exposes a final body with usage.
+ - Anthropic cache TTL variants (1h vs 5min writes) not modeled separately.
+ - OpenAI reasoning tokens included in output totals; separate reasoning-token attribution not stored.
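
Since pricing is local and best-effort, calls against models without a known price can accumulate silently. The removed Production Checklist section suggested this console check after deploys (it assumes the `:active_record` storage backend):

```ruby
# Models tracked without a known price -- candidates for pricing_overrides.
LlmCostTracker::LlmApiCall.unknown_pricing.group(:model).count
```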
 
  ## Development
 
  ```bash
- git clone https://github.com/sergey-homenko/llm_cost_tracker.git
- cd llm_cost_tracker
bundle install
bundle exec rspec
bundle exec rubocop
```
 
- ## Contributing
-
- Bug reports and pull requests are welcome on [GitHub](https://github.com/sergey-homenko/llm_cost_tracker).
-
  ## License

- The gem is available as open source under the terms of the [MIT License](LICENSE.txt).
+ MIT. See [LICENSE.txt](LICENSE.txt).