llm_cost_tracker 0.1.4 → 0.2.0.alpha2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +58 -91
- data/PLAN_0.2.md +488 -0
- data/README.md +140 -320
- data/app/controllers/llm_cost_tracker/application_controller.rb +42 -0
- data/app/controllers/llm_cost_tracker/calls_controller.rb +77 -0
- data/app/controllers/llm_cost_tracker/dashboard_controller.rb +54 -0
- data/app/controllers/llm_cost_tracker/data_quality_controller.rb +10 -0
- data/app/controllers/llm_cost_tracker/models_controller.rb +12 -0
- data/app/controllers/llm_cost_tracker/tags_controller.rb +21 -0
- data/app/helpers/llm_cost_tracker/application_helper.rb +113 -0
- data/app/services/llm_cost_tracker/dashboard/data_quality.rb +38 -0
- data/app/services/llm_cost_tracker/dashboard/filter.rb +109 -0
- data/app/services/llm_cost_tracker/dashboard/overview_stats.rb +87 -0
- data/app/services/llm_cost_tracker/dashboard/provider_breakdown.rb +44 -0
- data/app/services/llm_cost_tracker/dashboard/tag_breakdown.rb +58 -0
- data/app/services/llm_cost_tracker/dashboard/tag_key_explorer.rb +125 -0
- data/app/services/llm_cost_tracker/dashboard/time_series.rb +44 -0
- data/app/services/llm_cost_tracker/dashboard/top_models.rb +89 -0
- data/app/services/llm_cost_tracker/pagination.rb +59 -0
- data/app/views/layouts/llm_cost_tracker/application.html.erb +342 -0
- data/app/views/llm_cost_tracker/calls/index.html.erb +127 -0
- data/app/views/llm_cost_tracker/calls/show.html.erb +67 -0
- data/app/views/llm_cost_tracker/dashboard/index.html.erb +145 -0
- data/app/views/llm_cost_tracker/data_quality/index.html.erb +110 -0
- data/app/views/llm_cost_tracker/errors/database.html.erb +8 -0
- data/app/views/llm_cost_tracker/errors/invalid_filter.html.erb +4 -0
- data/app/views/llm_cost_tracker/errors/not_found.html.erb +5 -0
- data/app/views/llm_cost_tracker/models/index.html.erb +95 -0
- data/app/views/llm_cost_tracker/shared/_bar.html.erb +5 -0
- data/app/views/llm_cost_tracker/shared/setup_required.html.erb +6 -0
- data/app/views/llm_cost_tracker/tags/index.html.erb +34 -0
- data/app/views/llm_cost_tracker/tags/show.html.erb +69 -0
- data/config/routes.rb +10 -0
- data/lib/llm_cost_tracker/budget.rb +16 -38
- data/lib/llm_cost_tracker/configuration.rb +3 -1
- data/lib/llm_cost_tracker/cost.rb +1 -3
- data/lib/llm_cost_tracker/engine.rb +13 -0
- data/lib/llm_cost_tracker/engine_compatibility.rb +15 -0
- data/lib/llm_cost_tracker/errors.rb +2 -0
- data/lib/llm_cost_tracker/event.rb +1 -3
- data/lib/llm_cost_tracker/event_metadata.rb +9 -18
- data/lib/llm_cost_tracker/llm_api_call.rb +4 -17
- data/lib/llm_cost_tracker/middleware/faraday.rb +4 -4
- data/lib/llm_cost_tracker/parsed_usage.rb +5 -9
- data/lib/llm_cost_tracker/parsers/anthropic.rb +4 -5
- data/lib/llm_cost_tracker/parsers/base.rb +3 -8
- data/lib/llm_cost_tracker/parsers/gemini.rb +3 -3
- data/lib/llm_cost_tracker/parsers/openai_usage.rb +3 -3
- data/lib/llm_cost_tracker/parsers/registry.rb +5 -12
- data/lib/llm_cost_tracker/period_grouping.rb +68 -0
- data/lib/llm_cost_tracker/price_registry.rb +22 -30
- data/lib/llm_cost_tracker/pricing.rb +10 -19
- data/lib/llm_cost_tracker/report.rb +4 -4
- data/lib/llm_cost_tracker/report_data.rb +21 -24
- data/lib/llm_cost_tracker/report_formatter.rb +4 -2
- data/lib/llm_cost_tracker/storage/active_record_store.rb +1 -3
- data/lib/llm_cost_tracker/tag_key.rb +16 -0
- data/lib/llm_cost_tracker/tracker.rb +35 -1
- data/lib/llm_cost_tracker/version.rb +1 -1
- data/lib/llm_cost_tracker.rb +3 -6
- data/llm_cost_tracker.gemspec +13 -9
- metadata +91 -20
- data/.rubocop.yml +0 -44
- data/lib/llm_cost_tracker/storage/active_record_backend.rb +0 -19
- data/lib/llm_cost_tracker/storage/backends.rb +0 -26
- data/lib/llm_cost_tracker/storage/custom_backend.rb +0 -16
- data/lib/llm_cost_tracker/storage/log_backend.rb +0 -28
- data/lib/llm_cost_tracker/value_object.rb +0 -45
data/README.md
CHANGED
@@ -1,8 +1,6 @@
 # LlmCostTracker
 
-**
-
-Track cost by user, tenant, feature, provider, and model, all in your own database. No proxy. No SaaS required.
+**Self-hosted LLM cost tracking for Ruby and Rails.** Intercepts Faraday LLM responses, prices them locally, stores events in your database. No proxy, no SaaS.
 
 [](https://rubygems.org/gems/llm_cost_tracker)
 [](https://github.com/sergey-homenko/llm_cost_tracker/actions)
@@ -20,53 +18,38 @@ By model:
 claude-sonnet-4-6        $31.200000
 gemini-2.5-flash         $14.120000
 
-By tag
-
-
-translate                $24.700000
+By tag key "env":
+production               $119.300000
+staging                  $8.120000
 ```
 
-## Why
-
-Every Rails app integrating LLMs faces the same problem: **you don't know how much AI is costing you** until the invoice arrives. Full observability platforms like Langfuse and Helicone are powerful, but sometimes you just need a small Rails-native cost ledger that lives in your app database.
+## Why
 
-
+Every Rails app with LLM integrations eventually runs into the same question: where did that invoice come from? Full observability platforms like Langfuse and Helicone cover a lot more than cost, and sometimes you just want a small Rails-native ledger that lives in your own database.
 
-
-- 🏠 **Self-hosted** — your data stays in your database
-- 🧩 **Client-light** — works with raw Faraday and LLM gems that expose their Faraday connection
-- 🏷️ **Attribution-first** — tag spend by feature, tenant, user, job, or environment
-- 🌐 **OpenAI-compatible** — auto-detect OpenRouter and DeepSeek, with custom compatible hosts configurable
-- 🛑 **Budget guardrails** — notify, raise, or block requests when monthly spend is exhausted
-- 📊 **Quick reports** — print a terminal cost report with one rake task
+`llm_cost_tracker` is scoped to that. It plugs into Faraday, parses provider usage out of the response, looks up pricing locally, and writes an event. You end up with a ledger you can query with plain ActiveRecord, slice by any tag dimension, and optionally surface on a built-in dashboard. No proxy, no SaaS, no separate service to run.
 
-
+It's not a tracing platform, prompt CMS, eval system, or gateway — and doesn't want to be. The goal is answering _"what did this app spend on LLM APIs, and where did that spend come from?"_ well enough that you stop worrying about it.
 
 ## Installation
 
-Add to your Gemfile:
-
 ```ruby
 gem "llm_cost_tracker"
 ```
 
-For ActiveRecord storage
+For ActiveRecord storage:
 
 ```bash
 bin/rails generate llm_cost_tracker:install
 bin/rails db:migrate
 ```
 
-##
-
-Try cost calculation without a database or migration:
+## Quick try (no database)
 
 ```ruby
 require "llm_cost_tracker"
 
-LlmCostTracker.configure
-  config.storage_backend = :log
-end
+LlmCostTracker.configure { |c| c.storage_backend = :log }
 
 LlmCostTracker.track(
   provider: :openai,
@@ -75,25 +58,12 @@ LlmCostTracker.track(
   output_tokens: 200,
   feature: "demo"
 )
+# => [LlmCostTracker] openai/gpt-4o tokens=1000+200 cost=$0.004500 tags={:feature=>"demo"}
 ```
 
-
-
-```text
-[LlmCostTracker] openai/gpt-4o tokens=1000+200 cost=$0.004500 tags={:feature=>"demo"}
-```
-
-## Quick Start
-
-Use the path that matches your app:
-
-- Using `ruby-openai`, `ruby_llm`, or another client that exposes Faraday? Patch that client's Faraday connection.
-- Using raw Faraday? Add the middleware directly.
-- Using a client without Faraday access? Use manual tracking.
-
-### Option 1: Patch An Existing Client
+## Usage
 
-
+### Patch an existing client's Faraday connection
 
 ```ruby
 # config/initializers/openai.rb
@@ -102,34 +72,27 @@ OpenAI.configure do |config|
 
   config.faraday do |f|
     f.use :llm_cost_tracker, tags: -> {
-      {
-        user_id: Current.user&.id,
-        feature: Current.llm_feature || "chat"
-      }
+      { user_id: Current.user&.id, workflow: Current.workflow, env: Rails.env }
     }
   end
 end
 ```
 
-
+`tags:` can be a callable so `Current` attributes are evaluated per request:
 
 ```ruby
-# app/models/current.rb
 class Current < ActiveSupport::CurrentAttributes
-  attribute :user, :tenant, :
+  attribute :user, :tenant, :workflow
 end
 
-#
+# application_controller.rb
 before_action do
   Current.user = current_user
-  Current.
-  Current.llm_feature = "chat"
+  Current.workflow = "chat"
 end
 ```
 
-###
-
-If your LLM client uses Faraday, add the middleware to that connection:
+### Raw Faraday
 
 ```ruby
 conn = Faraday.new(url: "https://api.openai.com") do |f|
@@ -139,18 +102,12 @@ conn = Faraday.new(url: "https://api.openai.com") do |f|
   f.adapter Faraday.default_adapter
 end
 
-
-response = conn.post("/v1/responses", {
-  model: "gpt-5-mini",
-  input: "Hello!"
-})
+conn.post("/v1/responses", { model: "gpt-5-mini", input: "Hello!" })
 ```
 
-
-
-### Option 3: Manual tracking
+Place `llm_cost_tracker` inside the Faraday stack where it can see the final response body. For streaming APIs, tracking requires the final body to expose provider usage; otherwise the gem warns and skips — use manual tracking there.
 
-
+### Manual tracking
 
 ```ruby
 LlmCostTracker.track(
@@ -169,337 +126,215 @@ LlmCostTracker.track(
 ```ruby
 # config/initializers/llm_cost_tracker.rb
 LlmCostTracker.configure do |config|
-
-  config.storage_backend = :active_record
-
-  # Default tags on every event
+  config.storage_backend = :active_record   # :log (default), :active_record, :custom
   config.default_tags = { app: "my_app", environment: Rails.env }
 
-  # Monthly budget in USD
   config.monthly_budget = 500.00
-  config.budget_exceeded_behavior = :notify
-  config.storage_error_behavior
-  config.unknown_pricing_behavior = :warn
+  config.budget_exceeded_behavior = :notify  # :notify, :raise, :block_requests
+  config.storage_error_behavior = :warn      # :ignore, :warn, :raise
+  config.unknown_pricing_behavior = :warn    # :ignore, :warn, :raise
 
-  # Alert callback
   config.on_budget_exceeded = ->(data) {
-    SlackNotifier.notify(
-      "#alerts",
-      "🚨 LLM budget exceeded! $#{data[:monthly_total].round(2)} / $#{data[:budget]}"
-    )
+    SlackNotifier.notify("#alerts", "🚨 LLM budget $#{data[:monthly_total].round(2)} / $#{data[:budget]}")
   }
 
-  # Override pricing for custom/fine-tuned models (per 1M tokens)
   config.prices_file = Rails.root.join("config/llm_cost_tracker_prices.yml")
   config.pricing_overrides = {
     "ft:gpt-4o-mini:my-org" => { input: 0.30, cached_input: 0.15, output: 1.20 }
   }
 
-  #
+  # Built-in: openrouter.ai, api.deepseek.com
   config.openai_compatible_providers["llm.my-company.com"] = "internal_gateway"
 end
 ```
 
-Pricing is best-effort
-
-Storage errors are non-fatal by default:
-
-```ruby
-config.storage_error_behavior = :warn   # default
-config.storage_error_behavior = :raise  # fail fast with StorageError
-config.storage_error_behavior = :ignore # skip storage failures silently
-```
-
-With the default `:warn` behavior, tracking emits a warning and lets the LLM response continue if ActiveRecord or custom storage fails. `LlmCostTracker::StorageError` exposes `original_error` when `:raise` is enabled.
+Pricing is best-effort. OpenRouter-style IDs like `openai/gpt-4o-mini` are normalized to built-in names when possible. Use `prices_file` / `pricing_overrides` for fine-tunes, gateway-specific IDs, enterprise discounts, batch pricing, or models the gem doesn't know.
 
-
-
-```ruby
-config.unknown_pricing_behavior = :warn   # default
-config.unknown_pricing_behavior = :raise  # fail fast with UnknownPricingError
-config.unknown_pricing_behavior = :ignore # keep tracking tokens silently
-```
+`storage_error_behavior = :warn` (default) lets LLM responses continue if storage fails; `:raise` exposes `StorageError#original_error`.
 
-
+Unknown pricing still records token counts, but `cost` is `nil` and budget guardrails skip that event. Find unpriced models:
 
 ```ruby
 LlmCostTracker::LlmApiCall.unknown_pricing.group(:model).count
 ```
 
-### Keeping
+### Keeping prices current
 
-Built-in prices
-
-For production apps, keep a local JSON or YAML price file and point the gem at it:
+Built-in prices are in `lib/llm_cost_tracker/prices.json`. The gem never fetches pricing on boot. For production, generate a local overrides file and point the gem at it:
 
 ```bash
 bin/rails generate llm_cost_tracker:prices
 ```
 
-```ruby
-config.prices_file = Rails.root.join("config/llm_cost_tracker_prices.yml")
-```
-
-Example JSON:
-
 ```json
 {
-  "metadata": {
-    "updated_at": "2026-04-18",
-    "currency": "USD",
-    "unit": "1M tokens"
-  },
+  "metadata": { "updated_at": "2026-04-18", "currency": "USD", "unit": "1M tokens" },
   "models": {
-    "my-gateway/gpt-4o-mini": {
-      "input": 0.20,
-      "cached_input": 0.10,
-      "output": 0.80
-    }
+    "my-gateway/gpt-4o-mini": { "input": 0.20, "cached_input": 0.10, "output": 0.80 }
   }
 }
 ```
 
-`pricing_overrides`
+`pricing_overrides` has the highest precedence; use it for small Ruby-only tweaks, `prices_file` for broader tables.
 
-## Budget
+## Budget enforcement
 
 ```ruby
-
-
-
-config.budget_exceeded_behavior = :block_requests
-end
+config.storage_backend = :active_record
+config.monthly_budget = 100.00
+config.budget_exceeded_behavior = :block_requests
 ```
 
-
-
-- `:
-- `:raise` — records the event, then raises `LlmCostTracker::BudgetExceededError` when the month is over budget.
-- `:block_requests` — blocks Faraday LLM requests before the HTTP call when the ActiveRecord monthly total has already reached the budget. If a request pushes the month over budget, it also raises after recording the event.
-
-`BudgetExceededError` exposes `monthly_total`, `budget`, and `last_event`:
+- `:notify` — fire `on_budget_exceeded` after an event pushes the month over budget.
+- `:raise` — record the event, then raise `BudgetExceededError`.
+- `:block_requests` — block preflight when the stored monthly total is already over budget; still raises post-response on the event that crosses the line. Needs `:active_record` storage.
 
 ```ruby
-begin
-  client.chat(...)
 rescue LlmCostTracker::BudgetExceededError => e
-
-end
+  # e.monthly_total, e.budget, e.last_event
 ```
 
-
-
-`:block_requests` is a best-effort guardrail, not a transactional hard quota. In highly concurrent deployments, multiple workers can pass the preflight check at the same time before any of them records its final cost. The request that first pushes the month over budget is stored before the post-response `BudgetExceededError` is raised; later Faraday requests are blocked during preflight once the stored monthly total is exhausted. Use provider-side limits or a gateway-level quota if you need strict cross-process caps.
-
-## Querying Costs (ActiveRecord)
+`:block_requests` is best-effort under concurrency, not a transactional cap. Use provider/gateway-level limits for strict quotas.
 
-
+## Querying costs
 
 ```bash
 bin/rails llm_cost_tracker:report
-
-# Optional: change the window
 DAYS=7 bin/rails llm_cost_tracker:report
 ```
 
-Example:
-
-```text
-LLM Cost Report (last 30 days)
-
-Total cost: $127.420000
-Requests: 4,218
-Avg latency: 812ms
-Unknown pricing: 0
-
-By provider:
-openai     $96.220000
-anthropic  $31.200000
-```
-
-Or query the ledger directly:
-
 ```ruby
-# Today's total spend
 LlmCostTracker::LlmApiCall.today.total_cost
-# => 12.45
-
-# Cost breakdown by model this month
 LlmCostTracker::LlmApiCall.this_month.cost_by_model
-# => { "gpt-4o" => 8.20, "claude-sonnet-4-6" => 4.25 }
-
-# Cost by provider
 LlmCostTracker::LlmApiCall.this_month.cost_by_provider
-# => { "openai" => 8.20, "anthropic" => 4.25 }
-
-# SQL-side cost breakdown by any tag key
-calls = LlmCostTracker::LlmApiCall.this_month
-calls.group_by_tag("feature").sum(:total_cost)
-# => { "chat" => 7.10, "summarizer" => 1.10 }
 
-#
-
-
+# Group / sum by any tag
+LlmCostTracker::LlmApiCall.this_month.group_by_tag("feature").sum(:total_cost)
+LlmCostTracker::LlmApiCall.this_month.cost_by_tag("feature")  # with "(untagged)" bucket
 
-#
+# Period grouping (SQL-side)
+LlmCostTracker::LlmApiCall.this_month.group_by_period(:day).sum(:total_cost)
+LlmCostTracker::LlmApiCall.group_by_period(:month).sum(:total_cost)
 LlmCostTracker::LlmApiCall.daily_costs(days: 7)
-# => { "2026-04-10" => 1.5, "2026-04-11" => 2.3, ... }
 
-# Latency
+# Latency
 LlmCostTracker::LlmApiCall.with_latency.average_latency_ms
 LlmCostTracker::LlmApiCall.this_month.latency_by_model
 
-#
+# Tag filters
 LlmCostTracker::LlmApiCall.by_tag("feature", "chat").this_month.total_cost
-
-# Filter by another tag
-LlmCostTracker::LlmApiCall.by_tag("user_id", "42").today.total_cost
-
-# Filter by multiple tags
 LlmCostTracker::LlmApiCall.by_tags(user_id: 42, feature: "chat").this_month.total_cost
 
-#
-LlmCostTracker::LlmApiCall.unknown_pricing.group(:model).count
-LlmCostTracker::LlmApiCall.with_cost.this_month.total_cost
-
-# Custom date range
+# Range
 LlmCostTracker::LlmApiCall.between(1.week.ago, Time.current).cost_by_model
 ```
 
-### Tag
+### Tag storage
 
-
+New installs use `jsonb` + GIN on PostgreSQL:
 
 ```ruby
 t.jsonb :tags, null: false, default: {}
 add_index :llm_api_calls, :tags, using: :gin
 ```
 
-On
+On other adapters tags fall back to JSON in a text column. `by_tag` uses JSONB containment on PG, text matching elsewhere.
 
-
+Upgrade an existing install:
 
 ```bash
-bin/rails generate llm_cost_tracker:upgrade_tags_to_jsonb
+bin/rails generate llm_cost_tracker:upgrade_tags_to_jsonb  # PG: text → jsonb + GIN
+bin/rails generate llm_cost_tracker:upgrade_cost_precision # widen cost columns
+bin/rails generate llm_cost_tracker:add_latency_ms
 bin/rails db:migrate
 ```
 
-
+## Dashboard (optional)
 
-
+Opt-in Rails Engine. Plain ERB, inline CSS, no JS. Requires Rails 7.1+; the core middleware works without Rails.
 
-```
-
-
+```ruby
+# config/application.rb (or an initializer)
+require "llm_cost_tracker/engine"
+
+# config/routes.rb
+mount LlmCostTracker::Engine => "/llm-costs"
 ```
 
-
+Routes (GET-only; CSV export included):
 
-
-
-
+- `/llm-costs` — overview: spend (with delta vs previous period), calls, avg cost/call, avg latency, unknown pricing, budget, daily trend, provider rollup, top models
+- `/llm-costs/models` — by provider + model; sortable by spend, volume, avg cost, latency
+- `/llm-costs/calls` — filterable + paginated; outlier sort modes (expensive, largest input/output, slowest, unknown pricing); CSV export
+- `/llm-costs/calls/:id` — details
+- `/llm-costs/tags` — tag keys present in the dataset (PG/SQLite native, MySQL via in-Ruby fallback)
+- `/llm-costs/tags/:key` — breakdown by values of a given tag key
+- `/llm-costs/data_quality` — unknown pricing share, untagged calls, missing latency
+
+> ⚠️ **No built-in auth.** Tags carry whatever your app puts in them. Protect the mount point with your app's auth.
+
+### Basic auth
+
+```ruby
+authenticated = ->(req) {
+  ActionController::HttpAuthentication::Basic.authenticate(req) do |name, password|
+    ActiveSupport::SecurityUtils.secure_compare(name, ENV.fetch("LLM_DASHBOARD_USER")) &
+      ActiveSupport::SecurityUtils.secure_compare(password, ENV.fetch("LLM_DASHBOARD_PASSWORD"))
+  end
+}
+constraints(authenticated) { mount LlmCostTracker::Engine => "/llm-costs" }
 ```
 
-
+### Devise
+
+```ruby
+authenticate :user, ->(user) { user.admin? } do
+  mount LlmCostTracker::Engine => "/llm-costs"
+end
+```
 
-
+## ActiveSupport::Notifications
 
 ```ruby
 ActiveSupport::Notifications.subscribe("llm_request.llm_cost_tracker") do |*, payload|
   # payload =>
   # {
-  #   provider: "openai",
-  #
-  #   input_tokens: 150,
-  #   output_tokens: 42,
-  #   total_tokens: 192,
-  #   latency_ms: 248,
+  #   provider: "openai", model: "gpt-4o",
+  #   input_tokens: 150, output_tokens: 42, total_tokens: 192, latency_ms: 248,
   #   cost: {
-  #     input_cost: 0.000375,
-  #
-  #
-  #     cache_creation_input_cost: 0.0,
-  #     output_cost: 0.00042,
-  #     total_cost: 0.000795,
-  #     currency: "USD"
+  #     input_cost: 0.000375, cached_input_cost: 0.0,
+  #     cache_read_input_cost: 0.0, cache_creation_input_cost: 0.0,
+  #     output_cost: 0.00042, total_cost: 0.000795, currency: "USD"
   #   },
   #   tags: { feature: "chat", user_id: 42 },
   #   tracked_at: 2026-04-16 14:30:00 UTC
   # }
-
-  StatsD.increment("llm.requests", tags: ["provider:#{payload[:provider]}"])
-  StatsD.histogram("llm.cost", payload[:cost][:total_cost])
 end
 ```
 
-## Custom
+## Custom storage backend
 
 ```ruby
-
-
-
-
-
-
-
-    latency_ms: event[:latency_ms]
-  },
-  tags: { provider: event[:provider], model: event[:model] }
-})
-}
-end
+config.storage_backend = :custom
+config.custom_storage = ->(event) {
+  InfluxDB.write("llm_costs",
+    values: { cost: event.cost&.total_cost, tokens: event.total_tokens, latency_ms: event.latency_ms },
+    tags: { provider: event.provider, model: event.model }
+  )
+}
 ```
 
-## OpenAI-
+## OpenAI-compatible providers
 
 ```ruby
-
-# Built in:
-#   "openrouter.ai" => "openrouter"
-#   "api.deepseek.com" => "deepseek"
-config.openai_compatible_providers["gateway.example.com"] = "internal_gateway"
-end
+config.openai_compatible_providers["gateway.example.com"] = "internal_gateway"
 ```
 
-
-
-- `prompt_tokens` / `completion_tokens` / `total_tokens`
-- `input_tokens` / `output_tokens` / `total_tokens`
-- optional cached input details when the response includes them
-
-This covers OpenRouter, DeepSeek, and private gateways that expose OpenAI-style Chat Completions, Responses, Completions, or Embeddings endpoints.
-
-## Safety Guarantees
+Configured hosts are parsed with the OpenAI-compatible usage shape (`prompt_tokens` / `completion_tokens` / `total_tokens`, `input_tokens` / `output_tokens`, and optional cached-input details). Covers OpenRouter, DeepSeek, and private gateways exposing Chat Completions / Responses / Completions / Embeddings.
 
-
-- It does not store prompt or response bodies.
-- Faraday responses are not modified.
-- Storage failures are non-fatal by default via `storage_error_behavior = :warn`.
-- Budget and unknown-pricing errors are raised only when you opt into `:raise` or `:block_requests`.
-- Pricing is local and best-effort; use `prices_file` or `pricing_overrides` for production-specific rates.
-- Streaming/SSE calls are skipped with a warning when the final usage payload is not readable by Faraday.
+## Custom parser
 
-
-
-- Use `storage_backend = :active_record` in production.
-- Set `monthly_budget` and choose `budget_exceeded_behavior`.
-- Treat `:block_requests` as best-effort in concurrent systems, not a strict quota.
-- Keep `unknown_pricing_behavior = :warn` or `:raise` until pricing overrides are complete.
-- Add `pricing_overrides` for custom, fine-tuned, gateway-specific, or newly released models.
-- Tag calls with useful business context such as `tenant_id`, `user_id`, and `feature`.
-- Check `LlmCostTracker::LlmApiCall.unknown_pricing.group(:model).count` after deploys.
-- Track `latency_ms` and watch `latency_by_model` for slow or degraded providers.
-
-## Known Limitations
-
-- `:block_requests` is best-effort under concurrency. For hard caps, use an external quota system, provider-side limits, or a gateway-level budget.
-- Streaming/SSE calls are tracked only when Faraday exposes a final response body with usage data. Otherwise the gem warns and skips automatic tracking.
-- Anthropic cache creation TTL variants are not modeled separately yet; 1-hour cache writes may be underestimated compared with the default 5-minute cache write rate.
-- OpenAI reasoning tokens are included in output-token totals when providers report them that way, but separate reasoning-token attribution is not stored yet.
-
-## Adding a Custom Provider Parser
-
-Use this for providers that are not OpenAI-compatible and return a different usage shape.
+For providers with a non-OpenAI usage shape:
 
 ```ruby
 class AcmeParser < LlmCostTracker::Parsers::Base
@@ -510,73 +345,58 @@ class AcmeParser < LlmCostTracker::Parsers::Base
   def parse(request_url, request_body, response_status, response_body)
     return nil unless response_status == 200
 
-
-    usage = response["usage"]
+    usage = safe_json_parse(response_body)&.dig("usage")
     return nil unless usage
 
-
+    LlmCostTracker::ParsedUsage.build(
       provider: "acme",
-      model:
+      model: safe_json_parse(response_body)["model"],
       input_tokens: usage["input"] || 0,
       output_tokens: usage["output"] || 0
-
+    )
   end
 end
 
-# Register it
 LlmCostTracker::Parsers::Registry.register(AcmeParser.new)
 ```
 
-## Supported
+## Supported providers
 
 | Provider | Auto-detected | Models with pricing |
-
+|---|:---:|---|
 | OpenAI | ✅ | GPT-5.2/5.1/5, GPT-5 mini/nano, GPT-4.1, GPT-4o, o1/o3/o4-mini |
-| OpenRouter | ✅ |
-| DeepSeek | ✅ |
+| OpenRouter | ✅ | OpenAI-compatible usage; provider-prefixed OpenAI model IDs normalized when possible |
+| DeepSeek | ✅ | OpenAI-compatible usage; add `pricing_overrides` for DeepSeek models |
 | OpenAI-compatible hosts | 🔧 | Configure `openai_compatible_providers` |
 | Anthropic | ✅ | Claude Opus 4.6/4.1/4, Sonnet 4.6/4.5/4, Haiku 4.5, Claude 3.x |
 | Google Gemini | ✅ | Gemini 2.5 Pro/Flash/Flash-Lite, 2.0 Flash/Flash-Lite, 1.5 Pro/Flash |
-| Any other | 🔧 |
-
-Supported endpoint families:
+| Any other | 🔧 | Custom parser |
 
-
-- OpenAI-compatible: Chat Completions, Responses, Completions, Embeddings
-- Anthropic: Messages
-- Google Gemini: `generateContent` responses with `usageMetadata`
+Endpoints: OpenAI Chat Completions / Responses / Completions / Embeddings; OpenAI-compatible equivalents; Anthropic Messages; Gemini `generateContent` with `usageMetadata`.
 
-##
+## Safety
 
-
-
-
-
-
-Calculates cost
-↓
-ActiveSupport::Notifications
-ActiveRecord / Log / Custom
-```
+- No external HTTP calls.
+- No prompt or response bodies stored.
+- Faraday responses not modified.
+- Storage failures non-fatal by default (`storage_error_behavior = :warn`).
+- Budget / unknown-pricing errors are raised only when you opt in.
 
-
+## Known limitations
 
-
+- `:block_requests` is best-effort under concurrency; use an external quota system for hard caps.
+- Streaming/SSE tracked only when Faraday exposes a final body with usage.
+- Anthropic cache TTL variants (1h vs 5min writes) not modeled separately.
+- OpenAI reasoning tokens included in output totals; separate reasoning-token attribution not stored.
 
 ## Development
 
 ```bash
-git clone https://github.com/sergey-homenko/llm_cost_tracker.git
-cd llm_cost_tracker
 bundle install
 bundle exec rspec
 bundle exec rubocop
 ```
 
-## Contributing
-
-Bug reports and pull requests are welcome on [GitHub](https://github.com/sergey-homenko/llm_cost_tracker).
-
 ## License
 
-
+MIT. See [LICENSE.txt](LICENSE.txt).