llm_cost_tracker 0.4.0 → 0.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (47)
  1. checksums.yaml +4 -4
  2. data/CHANGELOG.md +35 -0
  3. data/README.md +195 -109
  4. data/app/services/llm_cost_tracker/dashboard/data_quality.rb +46 -55
  5. data/app/services/llm_cost_tracker/dashboard/data_quality_aggregate.rb +81 -0
  6. data/lib/llm_cost_tracker/budget.rb +34 -37
  7. data/lib/llm_cost_tracker/configuration/instrumentation.rb +37 -0
  8. data/lib/llm_cost_tracker/configuration.rb +10 -5
  9. data/lib/llm_cost_tracker/doctor.rb +166 -0
  10. data/lib/llm_cost_tracker/generators/llm_cost_tracker/install_generator.rb +33 -0
  11. data/lib/llm_cost_tracker/generators/llm_cost_tracker/prices_generator.rb +12 -6
  12. data/lib/llm_cost_tracker/generators/llm_cost_tracker/templates/add_period_totals_to_llm_cost_tracker.rb.erb +38 -8
  13. data/lib/llm_cost_tracker/generators/llm_cost_tracker/templates/create_llm_api_calls.rb.erb +1 -2
  14. data/lib/llm_cost_tracker/generators/llm_cost_tracker/templates/initializer.rb.erb +53 -21
  15. data/lib/llm_cost_tracker/integrations/anthropic.rb +75 -0
  16. data/lib/llm_cost_tracker/integrations/base.rb +72 -0
  17. data/lib/llm_cost_tracker/integrations/object_reader.rb +56 -0
  18. data/lib/llm_cost_tracker/integrations/openai.rb +95 -0
  19. data/lib/llm_cost_tracker/integrations/registry.rb +41 -0
  20. data/lib/llm_cost_tracker/middleware/faraday.rb +4 -3
  21. data/lib/llm_cost_tracker/parsed_usage.rb +8 -1
  22. data/lib/llm_cost_tracker/parsers/anthropic.rb +17 -49
  23. data/lib/llm_cost_tracker/parsers/base.rb +80 -0
  24. data/lib/llm_cost_tracker/parsers/gemini.rb +12 -35
  25. data/lib/llm_cost_tracker/parsers/openai.rb +1 -6
  26. data/lib/llm_cost_tracker/parsers/openai_compatible.rb +6 -15
  27. data/lib/llm_cost_tracker/parsers/openai_usage.rb +8 -30
  28. data/lib/llm_cost_tracker/parsers/registry.rb +17 -2
  29. data/lib/llm_cost_tracker/price_freshness.rb +38 -0
  30. data/lib/llm_cost_tracker/price_registry.rb +14 -0
  31. data/lib/llm_cost_tracker/price_sync/fetcher.rb +2 -1
  32. data/lib/llm_cost_tracker/price_sync/refresh_plan_builder.rb +4 -2
  33. data/lib/llm_cost_tracker/price_sync.rb +10 -0
  34. data/lib/llm_cost_tracker/prices.json +394 -41
  35. data/lib/llm_cost_tracker/pricing.rb +8 -1
  36. data/lib/llm_cost_tracker/request_url.rb +20 -0
  37. data/lib/llm_cost_tracker/storage/active_record_rollups.rb +47 -27
  38. data/lib/llm_cost_tracker/storage/active_record_store.rb +4 -0
  39. data/lib/llm_cost_tracker/stream_collector.rb +3 -3
  40. data/lib/llm_cost_tracker/tag_context.rb +52 -0
  41. data/lib/llm_cost_tracker/tags_column.rb +62 -24
  42. data/lib/llm_cost_tracker/tracker.rb +5 -2
  43. data/lib/llm_cost_tracker/version.rb +1 -1
  44. data/lib/llm_cost_tracker.rb +14 -4
  45. data/lib/tasks/llm_cost_tracker.rake +21 -3
  46. metadata +13 -3
  47. data/lib/llm_cost_tracker/generators/llm_cost_tracker/templates/llm_cost_tracker_prices.yml.erb +0 -51
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: ccb9a8365f4a06026a4352385efa1318ac59ce403cb848e0c9aff992fc80f64c
- data.tar.gz: f21503cd322e923dc5bde0139cc61bc1547cef01eac59fe7a3861e1ab33e9860
+ metadata.gz: 6ee180a9d6ead4b84965b3ff96f87b31c6ce8982a8e13383f936d3031e8f6f5f
+ data.tar.gz: fda6d61c9f86b4e2a4dbdc7a7852f6f4f22bcf43f76b6cfbdd4f438c325e8d8c
  SHA512:
- metadata.gz: 304ab6de6404f070b21b1dd72ce9eae2b44fb2fc7845eae8831a04971ed2b8ec2b6f740bc082fb36cfa42d90f0be59ab5800d43d72b68f04918a113b6d7d8cbd
- data.tar.gz: afa2e92a99062bb1e0b4a00ab1d0762ca688f1890e0d76a29801881e2319e68db217c036d8f8c5d99558b580d5d4c039f8b3334283631763ee093fd12d369329
+ metadata.gz: 8e341c007ff3380459a07890a45bc5e05010c12ffc52a1f805492eb6c643e9637529b02e4d6ae12a7f35c1e25ea819336544bd007f7cfc5efa9c7999559f5d83
+ data.tar.gz: 44c912532194be0f239c6950f1f91317329bb5b8c3afbf33e430b4f9006377a8729ad0ed7f3c2c98528983d218fabef136774088018203426749532eb01627ef
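The SHA256 digests in `checksums.yaml` above cover the two members of the released `.gem` archive, which is a plain tar file containing `metadata.gz` and `data.tar.gz`. A sketch of recomputing them locally (the gem filename is an example; fetch it first with `gem fetch llm_cost_tracker --version 0.5.0`):

```ruby
require "digest"
require "rubygems/package"

# Recompute the SHA256 digests listed in checksums.yaml from a downloaded
# package. A .gem file is a plain tar archive whose members include
# metadata.gz and data.tar.gz.
def gem_member_digests(gem_path)
  digests = {}
  File.open(gem_path, "rb") do |io|
    Gem::Package::TarReader.new(io) do |tar|
      tar.each do |entry|
        next unless %w[metadata.gz data.tar.gz].include?(entry.full_name)
        digests[entry.full_name] = Digest::SHA256.hexdigest(entry.read)
      end
    end
  end
  digests
end

# Print digests when a fetched gem is present alongside this script.
puts gem_member_digests("llm_cost_tracker-0.5.0.gem") if File.exist?("llm_cost_tracker-0.5.0.gem")
```

Comparing the printed values against the `+` lines above confirms the downloaded package matches the registry release.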
data/CHANGELOG.md CHANGED
@@ -4,6 +4,41 @@ Format: [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). Versioning: [S
 
  ## [Unreleased]
 
+ ## [0.5.0] - 2026-04-25
+
+ ### Added
+
+ - Optional SDK integrations: `config.instrument :openai`, `:anthropic`, or `:all` patches the official `openai` and `anthropic` gems' resource methods to record usage automatically. Provider SDKs are not added as hard dependencies.
+ - `LlmCostTracker.with_tags` plus `TagContext` for thread- and fiber-isolated request-scoped tags that flow through middleware, SDK integrations, and `track` / `track_stream`.
+ - `LlmCostTracker::Doctor` and the `llm_cost_tracker:doctor` rake task for diagnosing storage, schema, optional columns, period totals, integrations, prices, and recent calls.
+ - `LlmCostTracker::PriceFreshness` helper plus a price-freshness doctor check that warns when bundled or local prices are stale.
+ - Technical documentation under `docs/technical/` covering architecture, data flow, extension points, module map, and operational notes.
+
+ ### Changed
+
+ - Pricing fuzzy matching now only accepts dated snapshot suffixes instead of guessing new model families.
+ - Built-in prices include GPT-5.5 and GPT-5.4 variants and drop retired Claude and Gemini entries.
+ - Missing model identifiers now normalize to `unknown` instead of leaking nil into tracked events.
+ - `llm_cost_tracker:prices` now generates a full local price snapshot instead of an empty override file.
+ - Price sync workflow surfaces clearer error context for fetcher failures and skips refresh-plan entries with malformed pricing.
+ - README, cookbook, and technical docs clarify that `config.instrument` patches official SDKs only; `ruby-openai` (alexrudall) routes through the Faraday middleware via its constructor block, and `ruby_llm` is not auto-captured today because the gem does not expose a Faraday middleware hook.
+
+ ## [0.4.1] - 2026-04-24
+
+ ### Changed
+
+ - Batched ActiveRecord period rollup writes and budget total reads.
+ - Memoized schema capability checks and refreshed them on `reset_column_information`.
+ - Install migration adds `[:model, :tracked_at]` composite index and drops redundant single-column `:provider` / `:model` indexes.
+ - Data Quality now reads counters and usage sums through one aggregate query.
+ - Parser URL matching, stream-event extraction, and custom parser registration now share a smaller base/registry extension surface.
+ - Added cookbook recipes for `ruby-openai`, `anthropic-sdk-ruby`, `gemini-ai`, `langchainrb`, Azure OpenAI, and LiteLLM proxy setups.
+
+ ### Fixed
+
+ - `llm_cost_tracker:add_period_totals` now imports legacy monthly rollups and backfills before adding the unique index.
+ - Budget docs now describe `:notify` across monthly, daily, and per-call budgets.
+
  ## [0.4.0] - 2026-04-24
 
  ### Changed
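The 0.5.0 changelog entry above describes `config.instrument` as patching official SDK resource methods to record usage. A minimal sketch of that general pattern with `Module#prepend` (the `FakeChatResource` client and `RECORDED` ledger are illustrative stand-ins, not the gem's internals):

```ruby
# Illustrative stand-in for an SDK resource class; the real integration
# targets methods such as the official openai gem's chat.completions.create.
class FakeChatResource
  def create(model:, **params)
    { "model" => model, "usage" => { "input_tokens" => 3, "output_tokens" => 5 } }
  end
end

RECORDED = [] # stand-in for the gem's storage backend

# Prepended module: call through with super, then record model, usage,
# and latency from the response the SDK already returns.
module UsageRecorder
  def create(**kwargs)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    response = super
    RECORDED << {
      model: response["model"],
      usage: response["usage"],
      latency: Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    }
    response
  end
end

FakeChatResource.prepend(UsageRecorder)
FakeChatResource.new.create(model: "gpt-4o")
```

Because `prepend` puts the recorder ahead of the resource class in the method lookup chain, the SDK's own method body stays untouched and still runs via `super`, which is why no hard dependency on the provider gem is needed.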
data/README.md CHANGED
@@ -1,13 +1,13 @@
  # LLM Cost Tracker
 
- **Self-hosted LLM cost tracking for Ruby and Rails.** Intercepts Faraday LLM responses or records usage explicitly, prices events locally, and stores them in your database. No proxy, no SaaS.
+ **Self-hosted LLM cost tracking for Ruby and Rails.** Instruments common Ruby SDKs, intercepts Faraday LLM responses, prices events locally, and can store them in your database. No proxy, no SaaS.
 
  [![Gem Version](https://img.shields.io/gem/v/llm_cost_tracker.svg)](https://rubygems.org/gems/llm_cost_tracker)
  [![CI](https://github.com/sergey-homenko/llm_cost_tracker/actions/workflows/ruby.yml/badge.svg)](https://github.com/sergey-homenko/llm_cost_tracker/actions)
  [![codecov](https://codecov.io/gh/sergey-homenko/llm_cost_tracker/branch/main/graph/badge.svg)](https://codecov.io/gh/sergey-homenko/llm_cost_tracker)
 
- Requires Ruby 3.3+, Rails/ActiveRecord 7.1+, and Faraday 2.0+.
- Core tracking works without Rails; the mounted dashboard requires Rails 7.1+.
+ Requires Ruby 3.3+, ActiveSupport 7.1+, and Faraday 2.0+.
+ ActiveRecord storage requires ActiveRecord 7.1+. The mounted dashboard requires Rails 7.1+.
 
  ## Why
 
@@ -16,48 +16,48 @@ Every Rails app with LLM integrations eventually runs into the same question: wh
  ## What You Get
 
  - A local ActiveRecord ledger of provider, model, usage breakdown, cost, latency, tags, streaming usage, and provider response IDs
- - Faraday middleware plus explicit `track` / `track_stream` helpers for non-Faraday clients
- - Server-rendered Rails dashboard with overview, calls, tags, CSV export, and data-quality pages
+ - Optional official OpenAI and Anthropic SDK integrations, plus Faraday middleware for custom clients
+ - Explicit `track` / `track_stream` helpers as a fallback for unsupported clients
+ - Server-rendered Rails dashboard with overview, models, calls, tags, CSV export, and data-quality pages
  - Local pricing snapshots, price sync tasks, and budget guardrails
  - Prompt and response bodies are never persisted
 
  ## Dashboard
 
- LLM Cost Tracker ships with an optional server-rendered Rails Engine dashboard for spend review, attribution, and data quality checks.
+ LLM Cost Tracker ships with a server-rendered Rails Engine dashboard for spend review, attribution, and data quality checks.
 
  ![LLM Cost Tracker dashboard](docs/dashboard-overview.png)
 
- The overview page includes spend trend, budget status, provider breakdown, top models, and filterable slices. The engine also includes Calls, Tags, and Data Quality pages. Plain ERB, no JavaScript bundle.
+ The overview page includes spend trend, budget status, provider breakdown, top models, and filterable slices. The engine also includes Models, Calls, Tags, and Data Quality pages. Plain ERB, no JavaScript bundle.
 
  ## Quickstart
 
  ```ruby
  gem "llm_cost_tracker"
+ gem "openai"
  ```
 
  ```bash
- bin/rails generate llm_cost_tracker:install
+ bin/rails generate llm_cost_tracker:install --dashboard --prices
  bin/rails db:migrate
+ bin/rails llm_cost_tracker:doctor
  ```
 
+ Skip `--dashboard` if you only want the ledger. Skip `--prices` if you do not want a local pricing file yet.
+
  ```ruby
  LlmCostTracker.configure do |config|
    config.storage_backend = :active_record
-   config.default_tags = { app: "my_app", environment: Rails.env }
+   config.default_tags = -> { { environment: Rails.env } }
+   config.instrument :openai
  end
 
- OpenAI.configure do |config|
-   config.access_token = ENV["OPENAI_API_KEY"]
-   config.faraday do |f|
-     f.use :llm_cost_tracker, tags: -> { { user_id: Current.user&.id, feature: "chat" } }
-   end
+ LlmCostTracker.with_tags(user_id: Current.user&.id, feature: "chat") do
+   client = OpenAI::Client.new(api_key: ENV["OPENAI_API_KEY"])
+   client.responses.create(model: "gpt-4o", input: "Hello")
  end
  ```
 
- ```ruby
- mount LlmCostTracker::Engine => "/llm-costs"
- ```
-
  After that, LLM Cost Tracker starts recording calls into `llm_api_calls` and the dashboard becomes available at `/llm-costs`.
  Protect the mounted engine with your application's authentication before exposing it outside development.
 
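The `with_tags` call in the quickstart hunk above scopes tags to a block. A sketch of how such block scoping is commonly built, using fiber-local state (`Thread.current[]` storage is fiber-local in Ruby); this illustrates the idea only and is not the gem's `TagContext` implementation:

```ruby
module TagScope
  KEY = :llm_cost_tracker_demo_tags

  # Thread.current[] storage is fiber-local, so concurrent fibers and
  # threads each see their own tag scope.
  def self.current
    Thread.current[KEY] || {}
  end

  def self.with_tags(tags)
    previous = Thread.current[KEY]
    Thread.current[KEY] = current.merge(tags)
    yield
  ensure
    Thread.current[KEY] = previous
  end
end

TagScope.with_tags(feature: "chat") do
  TagScope.with_tags(user_id: 42) do
    p TagScope.current # nested scopes merge
  end
end
p TagScope.current # previous scope restored after the block
```

The `ensure` restore is what makes nested scopes safe: each block puts back exactly the state it found, even if the block raises.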
@@ -69,39 +69,43 @@ Protect the mounted engine with your application's authentication before exposin
  - No built-in auth on the mounted dashboard
  - Use `:active_record` when you want shared dashboards and budget checks across Puma workers and Sidekiq processes
 
- ## Installation
+ ## Technical Docs
 
- ```ruby
- gem "llm_cost_tracker"
- ```
+ - [Architecture](docs/architecture.md)
 
- For ActiveRecord storage:
+ ## Usage
 
- ```bash
- bin/rails generate llm_cost_tracker:install
- bin/rails db:migrate
- ```
+ ### Official SDK integrations
 
- ## Usage
+ `config.instrument` patches **official** provider SDKs only — currently the official `openai` and `anthropic` gems. SDK integrations are optional and do not add provider SDKs as gem dependencies. Install the provider SDK you already use, then enable its integration.
 
- ### Patch an existing client's Faraday connection
+ ```ruby
+ LlmCostTracker.configure do |config|
+   config.instrument :openai
+   config.instrument :anthropic
+ end
+ ```
+
+ The OpenAI integration records non-streaming calls through the official `openai` gem's `responses.create` and `chat.completions.create`. The Anthropic integration records non-streaming calls through the official `anthropic` gem's `messages.create`. Both integrations extract usage, model, latency, provider response ID, cache tokens, and hidden/reasoning tokens when the SDK response exposes them.
 
  ```ruby
- # config/initializers/openai.rb
- OpenAI.configure do |config|
-   config.access_token = ENV["OPENAI_API_KEY"]
-
-   config.faraday do |f|
-     f.use :llm_cost_tracker, tags: -> {
-       { user_id: Current.user&.id, workflow: Current.workflow, env: Rails.env }
-     }
-   end
+ LlmCostTracker.with_tags(feature: "support_chat", user_id: Current.user&.id) do
+   anthropic = Anthropic::Client.new(api_key: ENV["ANTHROPIC_API_KEY"])
+   anthropic.messages.create(
+     model: "claude-sonnet-4-5-20250929",
+     max_tokens: 1024,
+     messages: [{ role: "user", content: "Hello" }]
+   )
  end
  ```
 
- `tags:` can be a callable and is evaluated on each request.
+ Community clients such as `ruby-openai` are not patched by `instrument`. `ruby-openai` exposes a Faraday block on its constructor and is covered by the middleware below.
+
+ Google's official Gemini SDKs do not include Ruby. Use the Faraday middleware against Gemini's REST API, or keep custom clients behind the fallback helpers until a stable SDK integration exists.
+
+ ### Faraday middleware
 
- ### Raw Faraday
+ `tags:` can be a hash or callable. Callables are evaluated on each request and may accept the Faraday request env.
 
  ```ruby
  conn = Faraday.new(url: "https://api.openai.com") do |f|
@@ -116,9 +120,11 @@ conn.post("/v1/responses", { model: "gpt-5-mini", input: "Hello!" })
 
  Place `llm_cost_tracker` inside the Faraday stack where it can see the final response body.
 
+ The same middleware covers `ruby-openai` through its constructor block.
+
  ### Streaming
 
- Streaming is captured automatically for OpenAI, Anthropic, and Gemini when the request goes through the Faraday middleware. The middleware tees the `on_data` callback, keeps the stream flowing to your code, and records the final usage block once the response completes.
+ Streaming is captured automatically for OpenAI, Anthropic, and Gemini when the request goes through the Faraday middleware. The middleware tees the `on_data` callback, keeps the stream flowing to your code, and records provider-reported usage once the response completes.
 
  ```ruby
  # OpenAI: include usage in the final chunk
@@ -130,20 +136,22 @@ client.chat(parameters: {
130
136
  })
131
137
  ```
132
138
 
133
- Anthropic emits usage in `message_start` + `message_delta` events. Gemini's `:streamGenerateContent` endpoint includes `usageMetadata`; usage from the final chunk is used.
139
+ Anthropic emits usage in `message_start` + `message_delta` events. Gemini's `:streamGenerateContent` endpoint includes `usageMetadata`; the latest usage block is used.
134
140
 
135
141
  Streamed calls are stored with `stream: true` and `usage_source: "stream_final"`. If the provider never sends final usage, the call is still recorded with `usage_source: "unknown"` so those calls surface on the Data Quality page.
136
142
 
137
143
  When the provider emits a stable response object ID, LLM Cost Tracker stores it as `provider_response_id`. OpenAI and Anthropic are covered end-to-end; Gemini is best effort and may vary by endpoint or API version.
138
144
 
139
- For non-Faraday clients (raw `Net::HTTP`, custom SSE code, Azure OpenAI), use the explicit helper:
145
+ Model identifiers are extracted from the provider response, request body, stream events, or URL path depending on the provider. If no source carries a model, the event is stored under `model: "unknown"` and shows up as unknown pricing instead of being guessed.
146
+
147
+ For non-Faraday clients without an SDK integration, prefer adding a supported adapter. Use the explicit helper only as a fallback while wiring a client that does not expose a stable hook yet:
140
148
 
141
149
  ```ruby
142
150
  LlmCostTracker.track_stream(provider: "openai", model: "gpt-4o") do |stream|
143
- my_client.stream(...) { |chunk| stream.event(chunk) }
151
+ my_client.stream(...) { |event| stream.event(event.to_h) }
144
152
  end
145
153
 
146
- # Or skip the chunk parsing entirely if you already know the totals:
154
+ # Or skip provider event parsing entirely if you already know the totals:
147
155
  LlmCostTracker.track_stream(provider: "openai", model: "gpt-4o") do |stream|
148
156
  # ... your streaming loop ...
149
157
  stream.usage(input_tokens: 120, output_tokens: 45)
@@ -161,7 +169,11 @@ end
 
  Run `bin/rails g llm_cost_tracker:add_streaming` once on existing installs to add the `stream` and `usage_source` columns. Run `bin/rails g llm_cost_tracker:add_provider_response_id` to persist provider-issued response IDs. Run `bin/rails g llm_cost_tracker:add_usage_breakdown` to add cache-read, cache-write, hidden-output, and pricing-mode columns.
 
- ### Manual tracking
+ More client-specific snippets live in [`docs/cookbook.md`](docs/cookbook.md).
+
+ ### Fallback tracking
+
+ Automatic capture should be the default integration path. `track` exists for custom clients, internal gateways, migrations, and SDKs that do not expose a stable middleware or instrumentation hook yet.
 
  ```ruby
  LlmCostTracker.track(
@@ -180,42 +192,72 @@ LlmCostTracker.track(
  `cache_read_input_tokens` and cache writes in `cache_write_input_tokens`; total
  tokens are calculated from the canonical billing breakdown.
 
+ For manual tracking, pass the real upstream model when you know it. If a gateway only exposes a deployment or router name, use that stable identifier and add a matching `prices_file` / `pricing_overrides` entry.
+
+ ### Tags
+
+ Tags are application context, not provider metadata. LLM Cost Tracker detects provider/model from the response when a parser is available; tags tell you who or what caused the call.
+
+ ```ruby
+ LlmCostTracker.with_tags(user_id: current_user.id, feature: "support_chat", trace_id: request.uuid) do
+   client.chat(parameters: { model: "gpt-4o", messages: [...] })
+ end
+ ```
+
+ `default_tags` can be a hash or callable. Scoped tags from `with_tags` apply only inside the block and are isolated per thread/fiber. Explicit tags passed to `track`, `track_stream`, or middleware metadata win over scoped/default tags.
+
  ## Configuration
 
  ```ruby
- # config/initializers/llm_cost_tracker.rb
  LlmCostTracker.configure do |config|
-   config.storage_backend = :active_record # :log (default), :active_record, :custom
-   config.default_tags = { app: "my_app", environment: Rails.env }
-
+   config.storage_backend = :active_record
+   config.default_tags = -> { { environment: Rails.env } }
+   config.instrument :openai
+   config.instrument :anthropic
+   config.prices_file = Rails.root.join("config/llm_cost_tracker_prices.yml")
    config.monthly_budget = 500.00
    config.daily_budget = 50.00
    config.per_call_budget = 2.00
-   config.budget_exceeded_behavior = :notify # :notify, :raise, :block_requests
-   config.storage_error_behavior = :warn # :ignore, :warn, :raise
-   config.unknown_pricing_behavior = :warn # :ignore, :warn, :raise
-
+   config.budget_exceeded_behavior = :notify
    config.on_budget_exceeded = ->(data) {
-     SlackNotifier.notify("#alerts", "🚨 LLM #{data[:budget_type]} budget $#{data[:total].round(2)} / $#{data[:budget]}")
+     SlackNotifier.notify("#alerts", "LLM #{data[:budget_type]} budget $#{data[:total].round(2)} / $#{data[:budget]}")
    }
-
-   config.prices_file = Rails.root.join("config/llm_cost_tracker_prices.yml")
-   config.pricing_overrides = {
-     "ft:gpt-4o-mini:my-org" => { input: 0.30, cache_read_input: 0.15, output: 1.20 }
-   }
-
-   # Built-in: openrouter.ai, api.deepseek.com
-   config.openai_compatible_providers["llm.my-company.com"] = "internal_gateway"
  end
  ```
 
+ Storage backends: `:log` (default), `:active_record`, `:custom`. Error behaviors: `:ignore`, `:warn`, `:raise`; budget behavior also supports `:block_requests`.
+
+ Configuration reference:
+
+ | Option | Default | Purpose |
+ |---|---:|---|
+ | `enabled` | `true` | Turns tracking on/off. |
+ | `storage_backend` | `:log` | `:log`, `:active_record`, or `:custom`. |
+ | `custom_storage` | `nil` | Callable storage hook for `:custom`. |
+ | `default_tags` | `{}` | Hash or callable merged into every event. |
+ | `prices_file` | `nil` | Local JSON/YAML price table. |
+ | `pricing_overrides` | `{}` | Ruby-side model price overrides. |
+ | `instrument` | none | Enables optional SDK integrations such as `:openai`, `:anthropic`, or `:all`. |
+ | `monthly_budget` | `nil` | Monthly spend guardrail. |
+ | `daily_budget` | `nil` | Daily spend guardrail. |
+ | `per_call_budget` | `nil` | Single-event spend guardrail. |
+ | `budget_exceeded_behavior` | `:notify` | `:notify`, `:raise`, or `:block_requests`. |
+ | `on_budget_exceeded` | `nil` | Callback for budget events. |
+ | `storage_error_behavior` | `:warn` | `:ignore`, `:warn`, or `:raise`. |
+ | `unknown_pricing_behavior` | `:warn` | `:ignore`, `:warn`, or `:raise`. |
+ | `log_level` | `:info` | Log level used by `:log` storage. |
+ | `openai_compatible_providers` | OpenRouter + DeepSeek | Host-to-provider map for compatible APIs. |
+ | `report_tag_breakdowns` | `[]` | Tag keys included in text reports. |
+
+ LLM Cost Tracker estimates cost from recorded usage and a versioned price registry. Providers usually return token usage, not a stable per-request price, so request costs are calculated locally and stored with the call. Historical rows do not change when prices update.
+
  Pricing is best effort. OpenRouter-style IDs like `openai/gpt-4o-mini` are normalized to built-in names when possible. Use `prices_file` / `pricing_overrides` for fine-tunes, gateway-specific IDs, enterprise discounts, alternate pricing modes, or models the gem does not know.
  Provider-specific entries like `openai/gpt-4o-mini` win over model-only entries like `gpt-4o-mini`.
  Pass `pricing_mode: :batch` to use optional mode-specific keys such as `batch_input` / `batch_output`; missing mode-specific keys fall back to standard `input` / `output` rates. The same pattern works for custom modes, for example `contract_input`.
 
  `storage_error_behavior = :warn` (default) lets LLM responses continue if storage fails; `:raise` exposes `StorageError#original_error`.
 
- Unknown pricing still records token counts, but `cost` is `nil` and budget guardrails skip that event. Find unpriced models:
+ With `unknown_pricing_behavior = :ignore` or `:warn`, unknown pricing still records token counts, but `cost` is `nil` and budget guardrails skip that event. With `:raise`, the event raises before storage. Find unpriced models:
 
  ```ruby
  LlmCostTracker::LlmApiCall.unknown_pricing.group(:model).count
@@ -223,22 +265,33 @@ LlmCostTracker::LlmApiCall.unknown_pricing.group(:model).count
 
  ### Keeping prices current
 
- Built-in prices live in `lib/llm_cost_tracker/prices.json`. The gem never fetches pricing on boot. For production, keep a local snapshot under `config/` and point the gem at it:
+ Built-in prices live in `lib/llm_cost_tracker/prices.json`. The gem never fetches pricing on boot. For production, generate a local snapshot from the bundled registry, keep it under source control, and point the gem at it:
 
  ```bash
  bin/rails generate llm_cost_tracker:prices
  ```
 
- ```json
- {
-   "metadata": { "updated_at": "2026-04-18", "currency": "USD", "unit": "1M tokens" },
-   "models": {
-     "my-gateway/gpt-4o-mini": { "input": 0.20, "cache_read_input": 0.10, "output": 0.80, "batch_input": 0.10, "batch_output": 0.40 }
-   }
- }
+ ```ruby
+ config.prices_file = Rails.root.join("config/llm_cost_tracker_prices.yml")
+ ```
+
+ The generated file has the same shape as the bundled registry:
+
+ ```yaml
+ metadata:
+   updated_at: "2026-04-25"
+   currency: USD
+   unit: 1M tokens
+ models:
+   my-gateway/gpt-4o-mini:
+     input: 0.20
+     cache_read_input: 0.10
+     output: 0.80
+     batch_input: 0.10
+     batch_output: 0.40
  ```
 
- `pricing_overrides` has the highest precedence. Use it for a handful of Ruby-side overrides; use `prices_file` when you want a local pricing table under source control.
+ Pricing precedence is `pricing_overrides`, then `prices_file`, then bundled prices. Use `prices_file` for the app's source-controlled snapshot and `pricing_overrides` only for a handful of Ruby-side emergency overrides.
 
  To refresh prices on demand:
 
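The precedence rule in the hunk above (`pricing_overrides`, then `prices_file`, then bundled prices) amounts to a first-match lookup across ordered tables. A sketch with hypothetical rates, not the gem's real internals or prices:

```ruby
# Hypothetical price tables in precedence order, USD per 1M tokens.
OVERRIDES   = { "ft:gpt-4o-mini:my-org"   => { input: 0.30, output: 1.20 } }
PRICES_FILE = { "my-gateway/gpt-4o-mini"  => { input: 0.20, output: 0.80 } }
BUNDLED     = { "gpt-4o-mini"             => { input: 0.15, output: 0.60 } }

# First table that knows the model wins; nil means unknown pricing
# (the README: usage is still recorded, cost stays nil).
def rates_for(model)
  [OVERRIDES, PRICES_FILE, BUNDLED].each do |table|
    rates = table[model]
    return rates if rates
  end
  nil
end

def estimate_cost(model, input_tokens:, output_tokens:)
  rates = rates_for(model)
  return nil unless rates
  (input_tokens * rates[:input] + output_tokens * rates[:output]) / 1_000_000.0
end

p estimate_cost("gpt-4o-mini", input_tokens: 1_000, output_tokens: 500)
```

Because lookup stops at the first table containing the model, an override entry always shadows the same key in the local file or the bundled snapshot.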
@@ -246,19 +299,30 @@ To refresh prices on demand:
  bin/rails llm_cost_tracker:prices:sync
  ```
 
- `llm_cost_tracker:prices:sync` refreshes the current registry from two structured sources: LiteLLM first, OpenRouter second. LiteLLM is the primary source; OpenRouter fills gaps and helps surface discrepancies.
+ `llm_cost_tracker:prices:sync` refreshes a pricing file from two structured sources: LiteLLM first, OpenRouter second. LiteLLM is the primary source; OpenRouter fills gaps and helps surface discrepancies.
 
  `llm_cost_tracker:prices:sync` / `llm_cost_tracker:prices:check` perform HTTP GET requests to:
 
  - LiteLLM pricing JSON: `https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json`
  - OpenRouter Models API: `https://openrouter.ai/api/v1/models`
 
- If `config.prices_file` is configured, the task syncs that file automatically; otherwise it works from the built-in snapshot. `_source: "manual"` entries are never touched. Models that are still in your file but missing from both upstream sources are left alone and reported as orphaned. For intentional custom entries, mark them as manual so they stop showing up in orphaned warnings.
+ The task writes to `ENV["OUTPUT"]`, then `config.prices_file`, in that order. It aborts if neither is present. The gem's bundled `prices.json` is only updated when you explicitly pass it through `OUTPUT=` while developing the gem. `_source: "manual"` entries are never touched. Models that are still in your file but missing from both upstream sources are left alone and reported as orphaned. For intentional custom entries, mark them as manual so they stop showing up in orphaned warnings.
 
- Use `PREVIEW=1` to see the diff without writing. Use `STRICT=1` to fail instead of applying a partial refresh when a source fails or the validator rejects a price. Use `bin/rails llm_cost_tracker:prices:check` in CI to print the current diff and exit non-zero when the snapshot has drifted or refresh fails.
+ Use `OUTPUT=config/llm_cost_tracker_prices.yml` to choose a target file explicitly. Use `PREVIEW=1` to see the diff without writing. Use `STRICT=1` to fail instead of applying a partial refresh when a source fails or the validator rejects a price. Use `bin/rails llm_cost_tracker:prices:check` in CI to print the current diff and exit non-zero when the snapshot has drifted or refresh fails.
 
  Large price changes are flagged during sync. If a specific entry is expected to move by more than 3x, add `_validator_override: ["skip_relative_change"]` to that entry in your local price file.
 
+ If sync reports `certificate verify failed`, fix the host Ruby/OpenSSL trust store rather than disabling TLS verification. Common fixes are installing `ca-certificates` in Docker/Linux images, configuring the corporate proxy CA, setting `SSL_CERT_FILE` to the system CA bundle, or rebuilding rbenv/asdf Ruby after an OpenSSL upgrade.
+
+ For unattended updates, run the check daily and sync through review:
+
+ ```bash
+ bin/rails llm_cost_tracker:prices:check
+ STRICT=1 bin/rails llm_cost_tracker:prices:sync
+ ```
+
+ `bin/rails llm_cost_tracker:doctor` warns when the configured price file has no `metadata.updated_at` or when it is older than 30 days.
+
  ## Budget enforcement
 
  ```ruby
@@ -269,13 +333,13 @@ config.per_call_budget = 1.00
  config.budget_exceeded_behavior = :block_requests
  ```
 
- - `:notify` — fire `on_budget_exceeded` after an event pushes the month over budget.
+ - `:notify` — fire `on_budget_exceeded` after an event pushes the monthly, daily, or per-call budget over the limit.
  - `:raise` — record the event, then raise `BudgetExceededError`.
  - `:block_requests` — block preflight when the stored monthly or daily total is already over budget; still raises post-response on the event that crosses the line. Needs `:active_record` storage for preflight.
 
  `monthly_budget` and `daily_budget` are cumulative ledger limits. `per_call_budget` is a ceiling for a single priced event and runs after the response cost is known.
 
- ActiveRecord installs keep `llm_cost_tracker_period_totals` in sync with atomic upserts. Budget preflight reads period rollups instead of scanning `llm_api_calls`.
+ ActiveRecord installs keep `llm_cost_tracker_period_totals` in sync with atomic upserts. Budget preflight reads period rollups when they are available instead of scanning `llm_api_calls`.
 
  ```ruby
  rescue LlmCostTracker::BudgetExceededError => e
@@ -284,7 +348,7 @@ rescue LlmCostTracker::BudgetExceededError => e
 
  `:block_requests` is a **guardrail, not a hard cap**. The preflight and the spend-recording write are separate statements, so under Puma / Sidekiq concurrency multiple workers can all pass the preflight and then collectively overshoot the budget. The setting reliably *stops new requests after the overshoot is visible* — it does not prevent the overshoot itself. For strict quotas use a provider- or gateway-level limit, or a database-backed counter outside this gem.
 
- Preflight is wired into the Faraday middleware automatically. When you record events via `LlmCostTracker.track` / `track_stream` and also want the same preflight, opt in:
+ Preflight is wired into the Faraday middleware and SDK integrations automatically. When you record events via `LlmCostTracker.track` / `track_stream` and also want the same preflight, opt in:
 
  ```ruby
  LlmCostTracker.track(
@@ -302,8 +366,20 @@ end
302
366
  LlmCostTracker.enforce_budget! # standalone preflight
303
367
  ```
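As a sketch of how the three limits and the failure behavior might sit together in an initializer — the `budget_behavior` key name and the `on_budget_exceeded` handler's argument are assumptions, so check the initializer the install generator writes for the exact attribute names:

```ruby
# config/initializers/llm_cost_tracker.rb — illustrative sketch only.
# `budget_behavior` and the handler argument are assumed names; the three
# budgets and `storage_backend` are the settings documented above.
LlmCostTracker.configure do |config|
  config.storage_backend = :active_record # required for budget preflight
  config.monthly_budget  = 500.00         # cumulative ledger limit
  config.daily_budget    = 25.00          # cumulative ledger limit
  config.per_call_budget = 0.50           # ceiling for one priced event
  config.budget_behavior = :notify        # or :raise / :block_requests
  config.on_budget_exceeded = ->(event) { Rails.logger.warn(event.inspect) }
end
```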
 
+ ## Doctor
+
+ Run the setup check after install, deploy, or upgrades:
+
+ ```bash
+ bin/rails llm_cost_tracker:doctor
+ ```
+
+ It checks storage mode, ActiveRecord availability, table/column coverage, period rollups, pricing file loading, and whether calls are being recorded. Setup errors make the task exit non-zero; warnings point at optional production hardening.
+
  ## Querying costs
 
+ These helpers and rake tasks require ActiveRecord storage.
+
  ```bash
  bin/rails llm_cost_tracker:report
  DAYS=7 bin/rails llm_cost_tracker:report
@@ -337,7 +413,7 @@ LlmCostTracker::LlmApiCall.between(1.week.ago, Time.current).cost_by_model
 
  ## Retention
 
- Retention is not enforced automatically. Use the rake task below if you need to delete older records in batches.
+ Retention is not enforced automatically. With ActiveRecord storage, use the rake task below if you need to delete older records in batches.
 
  ```bash
  DAYS=90 bin/rails llm_cost_tracker:prune # delete calls older than N days in batches
@@ -354,10 +430,15 @@ add_index :llm_api_calls, :tags, using: :gin
 
  On other adapters, tags fall back to JSON in a text column. `by_tag` uses JSONB containment on PG, text matching elsewhere.
 
- Upgrade an existing install:
+ ## Upgrading existing installs
+
+ Run whichever generators add the columns your older install is missing:
 
  ```bash
  bin/rails generate llm_cost_tracker:add_period_totals        # shared budget rollups
+ bin/rails generate llm_cost_tracker:add_streaming            # stream + usage_source
+ bin/rails generate llm_cost_tracker:add_provider_response_id
+ bin/rails generate llm_cost_tracker:add_usage_breakdown
  bin/rails generate llm_cost_tracker:upgrade_tags_to_jsonb    # PG: text → jsonb + GIN
  bin/rails generate llm_cost_tracker:upgrade_cost_precision   # widen cost columns
  bin/rails generate llm_cost_tracker:add_latency_ms
@@ -368,7 +449,9 @@ On PostgreSQL, the generated `upgrade_tags_to_jsonb` migration rewrites `llm_api
 
  ## Mounting the dashboard
 
- Optional Rails Engine. Plain ERB, no JavaScript framework, no asset pipeline required. Requires Rails 7.1+; the core middleware works without Rails.
+ Optional Rails Engine. Plain ERB, no JavaScript framework, no asset pipeline required. Requires Rails 7.1+; the core middleware works without Rails. The dashboard reads `llm_api_calls`, so use `storage_backend = :active_record` in apps that mount it.
+
+ `bin/rails generate llm_cost_tracker:install --dashboard` adds the require and route for you. Manual setup:
 
  ```ruby
  # config/application.rb (or an initializer)
@@ -382,11 +465,11 @@ Routes (GET-only; CSV export included):
 
  - `/llm-costs` — overview: spend with delta vs previous period, budget projection, spend anomaly banner, daily trend vs previous slice, provider rollup, top models
  - `/llm-costs/models` — by provider + model; sortable by spend, volume, avg cost, latency
- - `/llm-costs/calls` — filterable + paginated; outlier sort modes (expensive, largest input/output, slowest, unknown pricing); CSV export
+ - `/llm-costs/calls` — filterable + paginated; sort modes for recency, spend, input tokens, output tokens, latency, and unknown pricing; CSV export
  - `/llm-costs/calls/:id` — details with token mix and cost mix breakdowns
  - `/llm-costs/tags` — tag keys present in the dataset (PG/SQLite native; MySQL 8.0+ via JSON_TABLE)
  - `/llm-costs/tags/:key` — breakdown by values of a given tag key
- - `/llm-costs/data_quality` — unknown pricing share, untagged calls, missing latency
+ - `/llm-costs/data_quality` — unknown pricing, untagged calls, missing latency, incomplete stream usage, and missing provider response IDs
 
  No built-in auth is included. Tags carry whatever your app puts in them, so protect the mount point with your application's authentication.
 
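Since no auth ships with the engine, one way to protect the mount point is a routes-level constraint. A sketch assuming Devise and a guessed engine constant (`LlmCostTracker::Dashboard::Engine` — use whatever constant your install generator wrote):

```ruby
# config/routes.rb — sketch; the engine constant and the Devise
# `authenticate` helper are assumptions about your setup.
Rails.application.routes.draw do
  authenticate :user, ->(user) { user.admin? } do
    mount LlmCostTracker::Dashboard::Engine, at: "/llm-costs"
  end
end
```

Any equivalent gate works — HTTP basic auth in a wrapping controller, or a Rack middleware in front of the mount.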
@@ -425,6 +508,7 @@ ActiveSupport::Notifications.subscribe("llm_request.llm_cost_tracker") do |*, pa
  # total_cost: 0.000795, currency: "USD"
  # },
  # pricing_mode: "batch",
+ # stream: false, usage_source: "response", provider_response_id: "chatcmpl_123",
  # tags: { feature: "chat", user_id: 42 },
  # tracked_at: 2026-04-16 14:30:00 UTC
  # }
@@ -456,21 +540,23 @@ Configured hosts are parsed using the OpenAI-compatible usage shape (`prompt_tok
 
  For providers with a non-OpenAI usage shape:
 
  ```ruby
- require "uri"
-
  class AcmeParser < LlmCostTracker::Parsers::Base
+   HOSTS = %w[api.acme-llm.example].freeze
+   TRACKED_PATHS = %w[/v1/generate].freeze
+
+   def provider_names
+     %w[acme]
+   end
+
    def match?(url)
-     uri = URI.parse(url.to_s)
-     uri.host == "api.acme-llm.example" && uri.path == "/v1/generate"
-   rescue URI::InvalidURIError
-     false
+     match_uri?(url, hosts: HOSTS, exact_paths: TRACKED_PATHS)
    end
 
-   def parse(request_url, request_body, response_status, response_body)
+   def parse(_request_url, _request_body, response_status, response_body)
      return nil unless response_status == 200
 
      payload = safe_json_parse(response_body)
-     usage = payload&.dig("usage")
+     usage = payload.dig("usage")
      return nil unless usage
 
      LlmCostTracker::ParsedUsage.build(
@@ -482,31 +568,31 @@ class AcmeParser < LlmCostTracker::Parsers::Base
    end
  end
 
- LlmCostTracker::Parsers::Registry.register(AcmeParser.new)
+ LlmCostTracker::Parsers::Registry.register(AcmeParser)
  ```
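The `parse` contract is easy to exercise outside the gem. A plain-Ruby sketch of the same extraction flow — here `safe_json_parse` is stood in by `JSON.parse` with a rescue, since the real helper lives on `Parsers::Base`:

```ruby
require "json"

# Stand-in for the base class's safe_json_parse: nil on malformed input.
def safe_json_parse(body)
  JSON.parse(body)
rescue JSON::ParserError, TypeError
  nil
end

# Mirrors the parse contract above: only a 200 response carrying a
# usage object yields token counts; everything else returns nil.
def extract_usage(status, body)
  return nil unless status == 200

  payload = safe_json_parse(body)
  usage = payload&.dig("usage")
  return nil unless usage

  { input_tokens: usage["input_tokens"], output_tokens: usage["output_tokens"] }
end

extract_usage(200, '{"usage":{"input_tokens":12,"output_tokens":34}}')
# => { input_tokens: 12, output_tokens: 34 }
extract_usage(500, '{"usage":{"input_tokens":12}}')
# => nil
```

Returning `nil` for anything the parser does not recognize is what keeps an unmatched or failed response from producing a bogus ledger row.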
 
  ## Supported providers
 
  | Provider | Auto-detected | Models with pricing |
  |---|:---:|---|
- | OpenAI | | GPT-5.2/5.1/5, GPT-5 mini/nano, GPT-4.1, GPT-4o, o1/o3/o4-mini |
- | OpenRouter | | OpenAI-compatible usage; provider-prefixed OpenAI model IDs normalized when possible |
- | DeepSeek | | OpenAI-compatible usage; add `pricing_overrides` for DeepSeek models |
- | OpenAI-compatible hosts | 🔧 | Configure `openai_compatible_providers` |
- | Anthropic | | Claude Opus 4.6/4.1/4, Sonnet 4.6/4.5/4, Haiku 4.5, Claude 3.x |
- | Google Gemini | | Gemini 2.5 Pro/Flash/Flash-Lite, 2.0 Flash/Flash-Lite, 1.5 Pro/Flash |
- | Any other | 🔧 | Custom parser |
+ | OpenAI | Yes | GPT-5.5/5.4/5.2/5.1/5, GPT-5.5/5.4/5.2/5 pro, GPT-5.4 mini/nano, GPT-5 mini/nano, GPT-4.1, GPT-4o, o1/o3/o4-mini |
+ | OpenRouter | Yes | OpenAI-compatible usage; provider-prefixed OpenAI model IDs normalized when possible |
+ | DeepSeek | Yes | OpenAI-compatible usage; add `pricing_overrides` for DeepSeek models |
+ | OpenAI-compatible hosts | Config | Configure `openai_compatible_providers` |
+ | Anthropic | Yes | Claude Opus 4.7/4.6/4.5/4.1/4, Sonnet 4.6/4.5/4, Haiku 4.5 |
+ | Google Gemini | Yes | Gemini 2.5 Pro/Flash/Flash-Lite, 2.0 Flash/Flash-Lite |
+ | Any other | Config | Custom parser |
 
- Endpoints: OpenAI Chat Completions / Responses / Completions / Embeddings; OpenAI-compatible equivalents; Anthropic Messages; Gemini `generateContent` and `streamGenerateContent`. All endpoints support streaming capture.
+ Endpoints: OpenAI Chat Completions / Responses / Completions / Embeddings; OpenAI-compatible equivalents; Anthropic Messages; Gemini `generateContent` and `streamGenerateContent`. Official SDK integrations currently cover non-streaming OpenAI Responses / Chat Completions and Anthropic Messages. Streaming capture is supported for Faraday endpoints that emit stream events with final usage.
 
  ## Safety
 
- **By design, `llm_cost_tracker` never persists prompt or response content.** The only data stored per call is the metadata needed for a cost ledger (provider, model, token counts, cost, latency, tags, provider response ID, HTTP status, and a timestamp). Tags carry whatever your application passes in — treat them as user-controlled input and avoid putting request bodies, completions, or secrets into them.
+ **By design, `llm_cost_tracker` never persists prompt or response content.** The only data stored per call is the metadata needed for a cost ledger (provider, model, token counts, cost, latency, tags, provider response ID, and timestamp). Tags carry whatever your application passes in — treat them as user-controlled input and avoid putting request bodies, completions, or secrets into them.
 
  - No external HTTP calls at request-tracking time.
  - No prompt or response bodies stored.
  - Faraday responses are not modified.
- - Authorization headers and API keys are never stored or logged.
+ - Request headers are never stored. Warning logs strip query strings from URLs before logging.
  - Storage failures are non-fatal by default (`storage_error_behavior = :warn`).
  - Budget and unknown-pricing errors are raised only when you opt in.
 
@@ -514,9 +600,9 @@ Endpoints: OpenAI Chat Completions / Responses / Completions / Embeddings; OpenA
 
  The gem is designed for multi-threaded hosts — Puma with `max_threads > 1` and Sidekiq with `concurrency > 1` are both supported. A few rules:
 
- - **Configure once at boot.** `LlmCostTracker.configure` deep-freezes `default_tags`, `pricing_overrides`, `report_tag_breakdowns`, and `openai_compatible_providers` when the block returns. Mutating or replacing shared fields through `LlmCostTracker.configuration` raises `FrozenError`.
- - **Use `:active_record` storage for shared ledgers.** Puma workers and Sidekiq processes do not share memory; `:log` and `:custom` backends see per-process state only. `:active_record` writes to a single table and is the right choice for dashboards and budget checks across processes.
- - **Size your connection pool.** Each tracked call on the middleware path issues up to three SQL queries (preflight `SUM`, `INSERT`, post-check `SUM`). Make sure the AR pool covers `puma max_threads + sidekiq concurrency` plus your app's own usage.
+ - **Configure once at boot.** `LlmCostTracker.configure` freezes mutable shared configuration when the block returns, and replacing shared fields through `LlmCostTracker.configuration` raises `FrozenError`. If `default_tags` is callable, keep it fast and thread-safe.
+ - **Use `:active_record` storage for the built-in shared ledger.** Puma workers and Sidekiq processes do not share memory; `:log` is process-local, and `:custom` is only as shared as the sink you write to. `:active_record` writes to a single table and is the right choice for the bundled dashboard and budget checks across processes.
+ - **Size your connection pool.** Each tracked call on the middleware path uses the host app's ActiveRecord connection for ledger writes, period rollups, and optional budget checks. Make sure the AR pool covers `puma max_threads + sidekiq concurrency` plus your app's own usage.
  - **Don't share a `StreamCollector` across threads you don't own.** The collector itself is thread-safe — `event`, `usage`, and `finish!` synchronize internally and `finish!` is idempotent — but the documented pattern is one collector per stream.
  - **`finish!` is a barrier.** Once a stream is finished, later `event`, `usage`, or `model=` calls raise `FrozenError` instead of mutating a closed collector.
  - **`ActiveSupport::Notifications` subscribers run synchronously** in the caller's thread. Keep them fast or hand off to a background job; otherwise they add latency to every tracked call.
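The pool rule in that list is simple arithmetic; a runnable sketch (the two-connection headroom is an illustrative assumption, not a gem default):

```ruby
# Rule of thumb from the list above: every Puma thread and Sidekiq worker
# may hold an ActiveRecord connection while recording a tracked call, so
# the pool must cover both, plus headroom for the app's own queries.
def minimum_ar_pool(puma_max_threads:, sidekiq_concurrency:, headroom: 2)
  puma_max_threads + sidekiq_concurrency + headroom
end

minimum_ar_pool(puma_max_threads: 5, sidekiq_concurrency: 10)
# => 17
```

Set `pool:` in `config/database.yml` to at least this number for the affected processes.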
@@ -525,9 +611,10 @@ The gem is designed for multi-threaded hosts — Puma with `max_threads > 1` and
 
  ## Known limitations
 
  - `:block_requests` is a best-effort guardrail, not a hard cap. Concurrent workers can pass preflight simultaneously and collectively overshoot the budget. Use an external quota system if you need a transactional cap.
+ - Official SDK integrations currently cover non-streaming calls. Use Faraday middleware or `track_stream` for SDK streaming until stable stream wrappers are added.
  - Streaming capture relies on the provider emitting a final-usage event (OpenAI needs `stream_options: { include_usage: true }`); missing events are recorded with `usage_source: "unknown"` so they surface on the Data Quality page.
  - `provider_response_id` is stored only when the provider exposes a stable response object ID. Missing IDs stay `nil` and surface on the Data Quality page.
- - Cache write TTL variants (1h vs 5min writes) not modeled separately.
+ - Cache write TTL variants (1h vs 5min writes) are not modeled separately.
 
  ## Development
 
@@ -535,8 +622,7 @@ Architecture rules for future changes live in [`docs/architecture.md`](docs/arch
 
  ```bash
  bundle install
- bundle exec rspec
- bundle exec rubocop
+ bin/check
  ```
 
  ## License