llm_cost_tracker 0.4.1 → 0.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (34)
  1. checksums.yaml +4 -4
  2. data/CHANGELOG.md +19 -0
  3. data/README.md +182 -100
  4. data/lib/llm_cost_tracker/configuration/instrumentation.rb +37 -0
  5. data/lib/llm_cost_tracker/configuration.rb +10 -5
  6. data/lib/llm_cost_tracker/doctor.rb +166 -0
  7. data/lib/llm_cost_tracker/generators/llm_cost_tracker/install_generator.rb +33 -0
  8. data/lib/llm_cost_tracker/generators/llm_cost_tracker/prices_generator.rb +12 -6
  9. data/lib/llm_cost_tracker/generators/llm_cost_tracker/templates/initializer.rb.erb +53 -21
  10. data/lib/llm_cost_tracker/integrations/anthropic.rb +75 -0
  11. data/lib/llm_cost_tracker/integrations/base.rb +72 -0
  12. data/lib/llm_cost_tracker/integrations/object_reader.rb +56 -0
  13. data/lib/llm_cost_tracker/integrations/openai.rb +95 -0
  14. data/lib/llm_cost_tracker/integrations/registry.rb +41 -0
  15. data/lib/llm_cost_tracker/middleware/faraday.rb +4 -3
  16. data/lib/llm_cost_tracker/parsed_usage.rb +8 -1
  17. data/lib/llm_cost_tracker/parsers/base.rb +1 -1
  18. data/lib/llm_cost_tracker/parsers/openai_usage.rb +1 -1
  19. data/lib/llm_cost_tracker/price_freshness.rb +38 -0
  20. data/lib/llm_cost_tracker/price_registry.rb +14 -0
  21. data/lib/llm_cost_tracker/price_sync/fetcher.rb +2 -1
  22. data/lib/llm_cost_tracker/price_sync/refresh_plan_builder.rb +4 -2
  23. data/lib/llm_cost_tracker/price_sync.rb +10 -0
  24. data/lib/llm_cost_tracker/prices.json +394 -41
  25. data/lib/llm_cost_tracker/pricing.rb +8 -1
  26. data/lib/llm_cost_tracker/request_url.rb +20 -0
  27. data/lib/llm_cost_tracker/stream_collector.rb +3 -3
  28. data/lib/llm_cost_tracker/tag_context.rb +52 -0
  29. data/lib/llm_cost_tracker/tracker.rb +5 -2
  30. data/lib/llm_cost_tracker/version.rb +1 -1
  31. data/lib/llm_cost_tracker.rb +14 -4
  32. data/lib/tasks/llm_cost_tracker.rake +21 -3
  33. metadata +12 -3
  34. data/lib/llm_cost_tracker/generators/llm_cost_tracker/templates/llm_cost_tracker_prices.yml.erb +0 -51
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: d2cdd5f30c6fbd8c0168549b0853e9d8bc54586e60921733ce11a89a1d86078c
-   data.tar.gz: c91384579df6acdeb04d24b62f8bf916040f98156fd2bc882c94afc534f7dba5
+   metadata.gz: 6ee180a9d6ead4b84965b3ff96f87b31c6ce8982a8e13383f936d3031e8f6f5f
+   data.tar.gz: fda6d61c9f86b4e2a4dbdc7a7852f6f4f22bcf43f76b6cfbdd4f438c325e8d8c
  SHA512:
-   metadata.gz: 88d61d6714101ee9e8162814f5527bde487eced83663d86a1f938b77bcee1e4fcb4db2c4dde763720a828368a779a8677da57ecf10b2fadec78a959e6fdce6a7
-   data.tar.gz: d2d2bb097058507c06c1ea330a0c7d5a63d2e92b824fd8e4e7640a052dff100038eadb2e097edb4500457c382368080414c5261a2cc0a4f69436ae2234fd420d
+   metadata.gz: 8e341c007ff3380459a07890a45bc5e05010c12ffc52a1f805492eb6c643e9637529b02e4d6ae12a7f35c1e25ea819336544bd007f7cfc5efa9c7999559f5d83
+   data.tar.gz: 44c912532194be0f239c6950f1f91317329bb5b8c3afbf33e430b4f9006377a8729ad0ed7f3c2c98528983d218fabef136774088018203426749532eb01627ef
data/CHANGELOG.md CHANGED
@@ -4,6 +4,25 @@ Format: [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). Versioning: [S

  ## [Unreleased]

+ ## [0.5.0] - 2026-04-25
+
+ ### Added
+
+ - Optional SDK integrations: `config.instrument :openai`, `:anthropic`, or `:all` patches the official `openai` and `anthropic` gems' resource methods to record usage automatically. Provider SDKs are not added as hard dependencies.
+ - `LlmCostTracker.with_tags` plus `TagContext` for thread- and fiber-isolated request-scoped tags that flow through middleware, SDK integrations, and `track` / `track_stream`.
+ - `LlmCostTracker::Doctor` and the `llm_cost_tracker:doctor` rake task for diagnosing storage, schema, optional columns, period totals, integrations, prices, and recent calls.
+ - `LlmCostTracker::PriceFreshness` helper plus a price-freshness doctor check that warns when bundled or local prices are stale.
+ - Technical documentation under `docs/technical/` covering architecture, data flow, extension points, module map, and operational notes.
+
+ ### Changed
+
+ - Pricing fuzzy matching now only accepts dated snapshot suffixes instead of guessing new model families.
+ - Built-in prices include GPT-5.5 and GPT-5.4 variants and drop retired Claude and Gemini entries.
+ - Missing model identifiers now normalize to `unknown` instead of leaking nil into tracked events.
+ - `llm_cost_tracker:prices` now generates a full local price snapshot instead of an empty override file.
+ - Price sync workflow surfaces clearer error context for fetcher failures and skips refresh-plan entries with malformed pricing.
+ - README, cookbook, and technical docs clarify that `config.instrument` patches official SDKs only; `ruby-openai` (alexrudall) routes through the Faraday middleware via its constructor block, and `ruby_llm` is not auto-captured today because the gem does not expose a Faraday middleware hook.
+
  ## [0.4.1] - 2026-04-24

  ### Changed
data/README.md CHANGED
@@ -1,13 +1,13 @@
  # LLM Cost Tracker

- **Self-hosted LLM cost tracking for Ruby and Rails.** Intercepts Faraday LLM responses or records usage explicitly, prices events locally, and stores them in your database. No proxy, no SaaS.
+ **Self-hosted LLM cost tracking for Ruby and Rails.** Instruments common Ruby SDKs, intercepts Faraday LLM responses, prices events locally, and can store them in your database. No proxy, no SaaS.

  [![Gem Version](https://img.shields.io/gem/v/llm_cost_tracker.svg)](https://rubygems.org/gems/llm_cost_tracker)
  [![CI](https://github.com/sergey-homenko/llm_cost_tracker/actions/workflows/ruby.yml/badge.svg)](https://github.com/sergey-homenko/llm_cost_tracker/actions)
  [![codecov](https://codecov.io/gh/sergey-homenko/llm_cost_tracker/branch/main/graph/badge.svg)](https://codecov.io/gh/sergey-homenko/llm_cost_tracker)

- Requires Ruby 3.3+, Rails/ActiveRecord 7.1+, and Faraday 2.0+.
- Core tracking works without Rails; the mounted dashboard requires Rails 7.1+.
+ Requires Ruby 3.3+, ActiveSupport 7.1+, and Faraday 2.0+.
+ ActiveRecord storage requires ActiveRecord 7.1+. The mounted dashboard requires Rails 7.1+.

  ## Why

@@ -16,48 +16,48 @@ Every Rails app with LLM integrations eventually runs into the same question: wh
  ## What You Get

  - A local ActiveRecord ledger of provider, model, usage breakdown, cost, latency, tags, streaming usage, and provider response IDs
- - Faraday middleware plus explicit `track` / `track_stream` helpers for non-Faraday clients
- - Server-rendered Rails dashboard with overview, calls, tags, CSV export, and data-quality pages
+ - Optional official OpenAI and Anthropic SDK integrations, plus Faraday middleware for custom clients
+ - Explicit `track` / `track_stream` helpers as a fallback for unsupported clients
+ - Server-rendered Rails dashboard with overview, models, calls, tags, CSV export, and data-quality pages
  - Local pricing snapshots, price sync tasks, and budget guardrails
  - Prompt and response bodies are never persisted

  ## Dashboard

- LLM Cost Tracker ships with an optional server-rendered Rails Engine dashboard for spend review, attribution, and data quality checks.
+ LLM Cost Tracker ships with a server-rendered Rails Engine dashboard for spend review, attribution, and data quality checks.

  ![LLM Cost Tracker dashboard](docs/dashboard-overview.png)

- The overview page includes spend trend, budget status, provider breakdown, top models, and filterable slices. The engine also includes Calls, Tags, and Data Quality pages. Plain ERB, no JavaScript bundle.
+ The overview page includes spend trend, budget status, provider breakdown, top models, and filterable slices. The engine also includes Models, Calls, Tags, and Data Quality pages. Plain ERB, no JavaScript bundle.

  ## Quickstart

  ```ruby
  gem "llm_cost_tracker"
+ gem "openai"
  ```

  ```bash
- bin/rails generate llm_cost_tracker:install
+ bin/rails generate llm_cost_tracker:install --dashboard --prices
  bin/rails db:migrate
+ bin/rails llm_cost_tracker:doctor
  ```

+ Skip `--dashboard` if you only want the ledger. Skip `--prices` if you do not want a local pricing file yet.
+
  ```ruby
  LlmCostTracker.configure do |config|
    config.storage_backend = :active_record
-   config.default_tags = { app: "my_app", environment: Rails.env }
+   config.default_tags = -> { { environment: Rails.env } }
+   config.instrument :openai
  end

- OpenAI.configure do |config|
-   config.access_token = ENV["OPENAI_API_KEY"]
-   config.faraday do |f|
-     f.use :llm_cost_tracker, tags: -> { { user_id: Current.user&.id, feature: "chat" } }
-   end
+ LlmCostTracker.with_tags(user_id: Current.user&.id, feature: "chat") do
+   client = OpenAI::Client.new(api_key: ENV["OPENAI_API_KEY"])
+   client.responses.create(model: "gpt-4o", input: "Hello")
  end
  ```

- ```ruby
- mount LlmCostTracker::Engine => "/llm-costs"
- ```
-
  After that, LLM Cost Tracker starts recording calls into `llm_api_calls` and the dashboard becomes available at `/llm-costs`.
  Protect the mounted engine with your application's authentication before exposing it outside development.

@@ -69,39 +69,43 @@ Protect the mounted engine with your application's authentication before exposin
  - No built-in auth on the mounted dashboard
  - Use `:active_record` when you want shared dashboards and budget checks across Puma workers and Sidekiq processes

- ## Installation
+ ## Technical Docs

- ```ruby
- gem "llm_cost_tracker"
- ```
+ - [Architecture](docs/architecture.md)

- For ActiveRecord storage:
+ ## Usage

- ```bash
- bin/rails generate llm_cost_tracker:install
- bin/rails db:migrate
- ```
+ ### Official SDK integrations

- ## Usage
+ `config.instrument` patches **official** provider SDKs only — currently the official `openai` and `anthropic` gems. SDK integrations are optional and do not add provider SDKs as gem dependencies. Install the provider SDK you already use, then enable its integration.
+
+ ```ruby
+ LlmCostTracker.configure do |config|
+   config.instrument :openai
+   config.instrument :anthropic
+ end
+ ```

- ### Patch an existing client's Faraday connection
+ The OpenAI integration records non-streaming calls through the official `openai` gem's `responses.create` and `chat.completions.create`. The Anthropic integration records non-streaming calls through the official `anthropic` gem's `messages.create`. Both integrations extract usage, model, latency, provider response ID, cache tokens, and hidden/reasoning tokens when the SDK response exposes them.

  ```ruby
- # config/initializers/openai.rb
- OpenAI.configure do |config|
-   config.access_token = ENV["OPENAI_API_KEY"]
-
-   config.faraday do |f|
-     f.use :llm_cost_tracker, tags: -> {
-       { user_id: Current.user&.id, workflow: Current.workflow, env: Rails.env }
-     }
-   end
+ LlmCostTracker.with_tags(feature: "support_chat", user_id: Current.user&.id) do
+   anthropic = Anthropic::Client.new(api_key: ENV["ANTHROPIC_API_KEY"])
+   anthropic.messages.create(
+     model: "claude-sonnet-4-5-20250929",
+     max_tokens: 1024,
+     messages: [{ role: "user", content: "Hello" }]
+   )
  end
  ```

- `tags:` can be a callable and is evaluated on each request.
+ Community clients such as `ruby-openai` are not patched by `instrument`. `ruby-openai` exposes a Faraday block on its constructor and is covered by the middleware below.
+
+ Google's official Gemini SDKs do not include Ruby. Use the Faraday middleware against Gemini's REST API, or keep custom clients behind the fallback helpers until a stable SDK integration exists.

- ### Raw Faraday
+ ### Faraday middleware
+
+ `tags:` can be a hash or callable. Callables are evaluated on each request and may accept the Faraday request env.

  ```ruby
  conn = Faraday.new(url: "https://api.openai.com") do |f|
@@ -116,9 +120,11 @@ conn.post("/v1/responses", { model: "gpt-5-mini", input: "Hello!" })

  Place `llm_cost_tracker` inside the Faraday stack where it can see the final response body.

+ The same middleware covers `ruby-openai` through its constructor block.
+
  ### Streaming

- Streaming is captured automatically for OpenAI, Anthropic, and Gemini when the request goes through the Faraday middleware. The middleware tees the `on_data` callback, keeps the stream flowing to your code, and records the final usage block once the response completes.
+ Streaming is captured automatically for OpenAI, Anthropic, and Gemini when the request goes through the Faraday middleware. The middleware tees the `on_data` callback, keeps the stream flowing to your code, and records provider-reported usage once the response completes.

  ```ruby
  # OpenAI: include usage in the final chunk
@@ -130,20 +136,22 @@ client.chat(parameters: {
  })
  ```

- Anthropic emits usage in `message_start` + `message_delta` events. Gemini's `:streamGenerateContent` endpoint includes `usageMetadata`; usage from the final chunk is used.
+ Anthropic emits usage in `message_start` + `message_delta` events. Gemini's `:streamGenerateContent` endpoint includes `usageMetadata`; the latest usage block is used.

  Streamed calls are stored with `stream: true` and `usage_source: "stream_final"`. If the provider never sends final usage, the call is still recorded with `usage_source: "unknown"` so those calls surface on the Data Quality page.

  When the provider emits a stable response object ID, LLM Cost Tracker stores it as `provider_response_id`. OpenAI and Anthropic are covered end-to-end; Gemini is best effort and may vary by endpoint or API version.

- For non-Faraday clients (raw `Net::HTTP`, custom SSE code, Azure OpenAI), use the explicit helper:
+ Model identifiers are extracted from the provider response, request body, stream events, or URL path depending on the provider. If no source carries a model, the event is stored under `model: "unknown"` and shows up as unknown pricing instead of being guessed.
+
+ For non-Faraday clients without an SDK integration, prefer adding a supported adapter. Use the explicit helper only as a fallback while wiring a client that does not expose a stable hook yet:

  ```ruby
  LlmCostTracker.track_stream(provider: "openai", model: "gpt-4o") do |stream|
-   my_client.stream(...) { |chunk| stream.event(chunk) }
+   my_client.stream(...) { |event| stream.event(event.to_h) }
  end

- # Or skip the chunk parsing entirely if you already know the totals:
+ # Or skip provider event parsing entirely if you already know the totals:
  LlmCostTracker.track_stream(provider: "openai", model: "gpt-4o") do |stream|
    # ... your streaming loop ...
    stream.usage(input_tokens: 120, output_tokens: 45)
@@ -163,7 +171,9 @@ Run `bin/rails g llm_cost_tracker:add_streaming` once on existing installs to ad

  More client-specific snippets live in [`docs/cookbook.md`](docs/cookbook.md).

- ### Manual tracking
+ ### Fallback tracking
+
+ Automatic capture should be the default integration path. `track` exists for custom clients, internal gateways, migrations, and SDKs that do not expose a stable middleware or instrumentation hook yet.

  ```ruby
  LlmCostTracker.track(
@@ -182,42 +192,72 @@ LlmCostTracker.track(
  `cache_read_input_tokens` and cache writes in `cache_write_input_tokens`; total
  tokens are calculated from the canonical billing breakdown.

+ For manual tracking, pass the real upstream model when you know it. If a gateway only exposes a deployment or router name, use that stable identifier and add a matching `prices_file` / `pricing_overrides` entry.
+
+ ### Tags
+
+ Tags are application context, not provider metadata. LLM Cost Tracker detects provider/model from the response when a parser is available; tags tell you who or what caused the call.
+
+ ```ruby
+ LlmCostTracker.with_tags(user_id: current_user.id, feature: "support_chat", trace_id: request.uuid) do
+   client.chat(parameters: { model: "gpt-4o", messages: [...] })
+ end
+ ```
+
+ `default_tags` can be a hash or callable. Scoped tags from `with_tags` apply only inside the block and are isolated per thread/fiber. Explicit tags passed to `track`, `track_stream`, or middleware metadata win over scoped/default tags.
+
  ## Configuration

  ```ruby
- # config/initializers/llm_cost_tracker.rb
  LlmCostTracker.configure do |config|
-   config.storage_backend = :active_record # :log (default), :active_record, :custom
-   config.default_tags = { app: "my_app", environment: Rails.env }
-
+   config.storage_backend = :active_record
+   config.default_tags = -> { { environment: Rails.env } }
+   config.instrument :openai
+   config.instrument :anthropic
+   config.prices_file = Rails.root.join("config/llm_cost_tracker_prices.yml")
    config.monthly_budget = 500.00
    config.daily_budget = 50.00
    config.per_call_budget = 2.00
-   config.budget_exceeded_behavior = :notify # :notify, :raise, :block_requests
-   config.storage_error_behavior = :warn # :ignore, :warn, :raise
-   config.unknown_pricing_behavior = :warn # :ignore, :warn, :raise
-
+   config.budget_exceeded_behavior = :notify
    config.on_budget_exceeded = ->(data) {
-     SlackNotifier.notify("#alerts", "🚨 LLM #{data[:budget_type]} budget $#{data[:total].round(2)} / $#{data[:budget]}")
-   }
-
-   config.prices_file = Rails.root.join("config/llm_cost_tracker_prices.yml")
-   config.pricing_overrides = {
-     "ft:gpt-4o-mini:my-org" => { input: 0.30, cache_read_input: 0.15, output: 1.20 }
+     SlackNotifier.notify("#alerts", "LLM #{data[:budget_type]} budget $#{data[:total].round(2)} / $#{data[:budget]}")
    }
-
-   # Built-in: openrouter.ai, api.deepseek.com
-   config.openai_compatible_providers["llm.my-company.com"] = "internal_gateway"
  end
  ```

+ Storage backends: `:log` (default), `:active_record`, `:custom`. Error behaviors: `:ignore`, `:warn`, `:raise`; budget behavior also supports `:block_requests`.
+
+ Configuration reference:
+
+ | Option | Default | Purpose |
+ |---|---:|---|
+ | `enabled` | `true` | Turns tracking on/off. |
+ | `storage_backend` | `:log` | `:log`, `:active_record`, or `:custom`. |
+ | `custom_storage` | `nil` | Callable storage hook for `:custom`. |
+ | `default_tags` | `{}` | Hash or callable merged into every event. |
+ | `prices_file` | `nil` | Local JSON/YAML price table. |
+ | `pricing_overrides` | `{}` | Ruby-side model price overrides. |
+ | `instrument` | none | Enables optional SDK integrations such as `:openai`, `:anthropic`, or `:all`. |
+ | `monthly_budget` | `nil` | Monthly spend guardrail. |
+ | `daily_budget` | `nil` | Daily spend guardrail. |
+ | `per_call_budget` | `nil` | Single-event spend guardrail. |
+ | `budget_exceeded_behavior` | `:notify` | `:notify`, `:raise`, or `:block_requests`. |
+ | `on_budget_exceeded` | `nil` | Callback for budget events. |
+ | `storage_error_behavior` | `:warn` | `:ignore`, `:warn`, or `:raise`. |
+ | `unknown_pricing_behavior` | `:warn` | `:ignore`, `:warn`, or `:raise`. |
+ | `log_level` | `:info` | Log level used by `:log` storage. |
+ | `openai_compatible_providers` | OpenRouter + DeepSeek | Host-to-provider map for compatible APIs. |
+ | `report_tag_breakdowns` | `[]` | Tag keys included in text reports. |
+
+ LLM Cost Tracker estimates cost from recorded usage and a versioned price registry. Providers usually return token usage, not a stable per-request price, so request costs are calculated locally and stored with the call. Historical rows do not change when prices update.
+
  Pricing is best effort. OpenRouter-style IDs like `openai/gpt-4o-mini` are normalized to built-in names when possible. Use `prices_file` / `pricing_overrides` for fine-tunes, gateway-specific IDs, enterprise discounts, alternate pricing modes, or models the gem does not know.
  Provider-specific entries like `openai/gpt-4o-mini` win over model-only entries like `gpt-4o-mini`.
  Pass `pricing_mode: :batch` to use optional mode-specific keys such as `batch_input` / `batch_output`; missing mode-specific keys fall back to standard `input` / `output` rates. The same pattern works for custom modes, for example `contract_input`.

  `storage_error_behavior = :warn` (default) lets LLM responses continue if storage fails; `:raise` exposes `StorageError#original_error`.

- Unknown pricing still records token counts, but `cost` is `nil` and budget guardrails skip that event. Find unpriced models:
+ With `unknown_pricing_behavior = :ignore` or `:warn`, unknown pricing still records token counts, but `cost` is `nil` and budget guardrails skip that event. With `:raise`, the event raises before storage. Find unpriced models:

  ```ruby
@@ -225,22 +265,33 @@ LlmCostTracker::LlmApiCall.unknown_pricing.group(:model).count

  ### Keeping prices current

- Built-in prices live in `lib/llm_cost_tracker/prices.json`. The gem never fetches pricing on boot. For production, keep a local snapshot under `config/` and point the gem at it:
+ Built-in prices live in `lib/llm_cost_tracker/prices.json`. The gem never fetches pricing on boot. For production, generate a local snapshot from the bundled registry, keep it under source control, and point the gem at it:

  ```bash
  bin/rails generate llm_cost_tracker:prices
  ```

- ```json
- {
-   "metadata": { "updated_at": "2026-04-18", "currency": "USD", "unit": "1M tokens" },
-   "models": {
-     "my-gateway/gpt-4o-mini": { "input": 0.20, "cache_read_input": 0.10, "output": 0.80, "batch_input": 0.10, "batch_output": 0.40 }
-   }
- }
+ ```ruby
+ config.prices_file = Rails.root.join("config/llm_cost_tracker_prices.yml")
+ ```
+
+ The generated file has the same shape as the bundled registry:
+
+ ```yaml
+ metadata:
+   updated_at: "2026-04-25"
+   currency: USD
+   unit: 1M tokens
+ models:
+   my-gateway/gpt-4o-mini:
+     input: 0.20
+     cache_read_input: 0.10
+     output: 0.80
+     batch_input: 0.10
+     batch_output: 0.40
  ```

- `pricing_overrides` has the highest precedence. Use it for a handful of Ruby-side overrides; use `prices_file` when you want a local pricing table under source control.
+ Pricing precedence is `pricing_overrides`, then `prices_file`, then bundled prices. Use `prices_file` for the app's source-controlled snapshot and `pricing_overrides` only for a handful of Ruby-side emergency overrides.

  To refresh prices on demand:

@@ -248,19 +299,30 @@ To refresh prices on demand:
  bin/rails llm_cost_tracker:prices:sync
  ```

- `llm_cost_tracker:prices:sync` refreshes the current registry from two structured sources: LiteLLM first, OpenRouter second. LiteLLM is the primary source; OpenRouter fills gaps and helps surface discrepancies.
+ `llm_cost_tracker:prices:sync` refreshes a pricing file from two structured sources: LiteLLM first, OpenRouter second. LiteLLM is the primary source; OpenRouter fills gaps and helps surface discrepancies.

  `llm_cost_tracker:prices:sync` / `llm_cost_tracker:prices:check` perform HTTP GET requests to:

  - LiteLLM pricing JSON: `https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json`
  - OpenRouter Models API: `https://openrouter.ai/api/v1/models`

- If `config.prices_file` is configured, the task syncs that file automatically; otherwise it works from the built-in snapshot. `_source: "manual"` entries are never touched. Models that are still in your file but missing from both upstream sources are left alone and reported as orphaned. For intentional custom entries, mark them as manual so they stop showing up in orphaned warnings.
+ The task writes to `ENV["OUTPUT"]`, then `config.prices_file`, in that order. It aborts if neither is present. The gem's bundled `prices.json` is only updated when you explicitly pass it through `OUTPUT=` while developing the gem. `_source: "manual"` entries are never touched. Models that are still in your file but missing from both upstream sources are left alone and reported as orphaned. For intentional custom entries, mark them as manual so they stop showing up in orphaned warnings.

- Use `PREVIEW=1` to see the diff without writing. Use `STRICT=1` to fail instead of applying a partial refresh when a source fails or the validator rejects a price. Use `bin/rails llm_cost_tracker:prices:check` in CI to print the current diff and exit non-zero when the snapshot has drifted or refresh fails.
+ Use `OUTPUT=config/llm_cost_tracker_prices.yml` to choose a target file explicitly. Use `PREVIEW=1` to see the diff without writing. Use `STRICT=1` to fail instead of applying a partial refresh when a source fails or the validator rejects a price. Use `bin/rails llm_cost_tracker:prices:check` in CI to print the current diff and exit non-zero when the snapshot has drifted or refresh fails.

  Large price changes are flagged during sync. If a specific entry is expected to move by more than 3x, add `_validator_override: ["skip_relative_change"]` to that entry in your local price file.

+ If sync reports `certificate verify failed`, fix the host Ruby/OpenSSL trust store rather than disabling TLS verification. Common fixes are installing `ca-certificates` in Docker/Linux images, configuring the corporate proxy CA, setting `SSL_CERT_FILE` to the system CA bundle, or rebuilding rbenv/asdf Ruby after an OpenSSL upgrade.
+
+ For unattended updates, run the check daily and sync through review:
+
+ ```bash
+ bin/rails llm_cost_tracker:prices:check
+ STRICT=1 bin/rails llm_cost_tracker:prices:sync
+ ```
+
+ `bin/rails llm_cost_tracker:doctor` warns when the configured price file has no `metadata.updated_at` or when it is older than 30 days.
+
  ## Budget enforcement

  ```ruby
@@ -277,7 +339,7 @@ config.budget_exceeded_behavior = :block_requests

  `monthly_budget` and `daily_budget` are cumulative ledger limits. `per_call_budget` is a ceiling for a single priced event and runs after the response cost is known.

- ActiveRecord installs keep `llm_cost_tracker_period_totals` in sync with atomic upserts. Budget preflight reads period rollups instead of scanning `llm_api_calls`.
+ ActiveRecord installs keep `llm_cost_tracker_period_totals` in sync with atomic upserts. Budget preflight reads period rollups when they are available instead of scanning `llm_api_calls`.

  ```ruby
  rescue LlmCostTracker::BudgetExceededError => e
@@ -286,7 +348,7 @@ rescue LlmCostTracker::BudgetExceededError => e

  `:block_requests` is a **guardrail, not a hard cap**. The preflight and the spend-recording write are separate statements, so under Puma / Sidekiq concurrency multiple workers can all pass the preflight and then collectively overshoot the budget. The setting reliably *stops new requests after the overshoot is visible* — it does not prevent the overshoot itself. For strict quotas use a provider- or gateway-level limit, or a database-backed counter outside this gem.

- Preflight is wired into the Faraday middleware automatically. When you record events via `LlmCostTracker.track` / `track_stream` and also want the same preflight, opt in:
+ Preflight is wired into the Faraday middleware and SDK integrations automatically. When you record events via `LlmCostTracker.track` / `track_stream` and also want the same preflight, opt in:

  ```ruby
  LlmCostTracker.track(
@@ -304,8 +366,20 @@ end
  LlmCostTracker.enforce_budget! # standalone preflight
  ```

+ ## Doctor
+
+ Run the setup check after install, deploy, or upgrades:
+
+ ```bash
+ bin/rails llm_cost_tracker:doctor
+ ```
+
+ It checks storage mode, ActiveRecord availability, table/column coverage, period rollups, pricing file loading, and whether calls are being recorded. Setup errors exit non-zero; warnings point at optional production hardening.
+
  ## Querying costs

+ These helpers and rake tasks require ActiveRecord storage.
+
  ```bash
  bin/rails llm_cost_tracker:report
  DAYS=7 bin/rails llm_cost_tracker:report
@@ -339,7 +413,7 @@ LlmCostTracker::LlmApiCall.between(1.week.ago, Time.current).cost_by_model

  ## Retention

- Retention is not enforced automatically. Use the rake task below if you need to delete older records in batches.
+ Retention is not enforced automatically. With ActiveRecord storage, use the rake task below if you need to delete older records in batches.

  ```bash
  DAYS=90 bin/rails llm_cost_tracker:prune # delete calls older than N days in batches
@@ -356,10 +430,15 @@ add_index :llm_api_calls, :tags, using: :gin

  On other adapters tags fall back to JSON in a text column. `by_tag` uses JSONB containment on PG, text matching elsewhere.

- Upgrade an existing install:
+ ## Upgrading existing installs
+
+ Run the generators that match columns missing from older versions:

  ```bash
  bin/rails generate llm_cost_tracker:add_period_totals # shared budget rollups
+ bin/rails generate llm_cost_tracker:add_streaming # stream + usage_source
+ bin/rails generate llm_cost_tracker:add_provider_response_id
+ bin/rails generate llm_cost_tracker:add_usage_breakdown
  bin/rails generate llm_cost_tracker:upgrade_tags_to_jsonb # PG: text → jsonb + GIN
  bin/rails generate llm_cost_tracker:upgrade_cost_precision # widen cost columns
  bin/rails generate llm_cost_tracker:add_latency_ms
@@ -370,7 +449,9 @@ On PostgreSQL, the generated `upgrade_tags_to_jsonb` migration rewrites `llm_api

  ## Mounting the dashboard

- Optional Rails Engine. Plain ERB, no JavaScript framework, no asset pipeline required. Requires Rails 7.1+; the core middleware works without Rails.
+ Optional Rails Engine. Plain ERB, no JavaScript framework, no asset pipeline required. Requires Rails 7.1+; the core middleware works without Rails. The dashboard reads `llm_api_calls`, so use `storage_backend = :active_record` for apps that mount it.
+
+ `bin/rails generate llm_cost_tracker:install --dashboard` adds the require and route for you. Manual setup:

  ```ruby
  # config/application.rb (or an initializer)
@@ -384,11 +465,11 @@ Routes (GET-only; CSV export included):
 
 - `/llm-costs` — overview: spend with delta vs previous period, budget projection, spend anomaly banner, daily trend vs previous slice, provider rollup, top models
 - `/llm-costs/models` — by provider + model; sortable by spend, volume, avg cost, latency
- - `/llm-costs/calls` — filterable + paginated; outlier sort modes (expensive, largest input/output, slowest, unknown pricing); CSV export
+ - `/llm-costs/calls` — filterable + paginated; sort modes for recency, spend, input tokens, output tokens, latency, and unknown pricing; CSV export
 - `/llm-costs/calls/:id` — details with token mix and cost mix breakdowns
 - `/llm-costs/tags` — tag keys present in the dataset (PG/SQLite native; MySQL 8.0+ via JSON_TABLE)
 - `/llm-costs/tags/:key` — breakdown by values of a given tag key
- - `/llm-costs/data_quality` — unknown pricing share, untagged calls, missing latency
+ - `/llm-costs/data_quality` — unknown pricing, untagged calls, missing latency, incomplete stream usage, and missing provider response IDs
 
 No built-in auth is included. Tags carry whatever your app puts in them, so protect the mount point with your application's authentication.
 
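The "no built-in auth" note is worth acting on before deploying. As a sketch — assuming Devise, whose routes-level `authenticate` helper is one common option; any guard around the mount works the same way:

```ruby
# config/routes.rb — hypothetical guard; `:admin_user` is an assumed scope
authenticate :admin_user do
  mount LlmCostTracker::Engine, at: "/llm-costs"
end
```

A `constraints` block checking a session or HTTP basic auth would serve equally well; the point is that the engine itself performs no authentication.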
@@ -427,6 +508,7 @@ ActiveSupport::Notifications.subscribe("llm_request.llm_cost_tracker") do |*, pa
 #   total_cost: 0.000795, currency: "USD"
 # },
 # pricing_mode: "batch",
+ # stream: false, usage_source: "response", provider_response_id: "chatcmpl_123",
 # tags: { feature: "chat", user_id: 42 },
 # tracked_at: 2026-04-16 14:30:00 UTC
 # }
@@ -493,24 +575,24 @@ LlmCostTracker::Parsers::Registry.register(AcmeParser)
 
 | Provider | Auto-detected | Models with pricing |
 |---|:---:|---|
- | OpenAI | ✅ | GPT-5.2/5.1/5, GPT-5 mini/nano, GPT-4.1, GPT-4o, o1/o3/o4-mini |
- | OpenRouter | ✅ | OpenAI-compatible usage; provider-prefixed OpenAI model IDs normalized when possible |
- | DeepSeek | ✅ | OpenAI-compatible usage; add `pricing_overrides` for DeepSeek models |
- | OpenAI-compatible hosts | 🔧 | Configure `openai_compatible_providers` |
- | Anthropic | ✅ | Claude Opus 4.6/4.1/4, Sonnet 4.6/4.5/4, Haiku 4.5, Claude 3.x |
- | Google Gemini | ✅ | Gemini 2.5 Pro/Flash/Flash-Lite, 2.0 Flash/Flash-Lite, 1.5 Pro/Flash |
- | Any other | 🔧 | Custom parser |
+ | OpenAI | Yes | GPT-5.5/5.4/5.2/5.1/5, GPT-5.5/5.4/5.2/5 pro, GPT-5.4 mini/nano, GPT-5 mini/nano, GPT-4.1, GPT-4o, o1/o3/o4-mini |
+ | OpenRouter | Yes | OpenAI-compatible usage; provider-prefixed OpenAI model IDs normalized when possible |
+ | DeepSeek | Yes | OpenAI-compatible usage; add `pricing_overrides` for DeepSeek models |
+ | OpenAI-compatible hosts | Config | Configure `openai_compatible_providers` |
+ | Anthropic | Yes | Claude Opus 4.7/4.6/4.5/4.1/4, Sonnet 4.6/4.5/4, Haiku 4.5 |
+ | Google Gemini | Yes | Gemini 2.5 Pro/Flash/Flash-Lite, 2.0 Flash/Flash-Lite |
+ | Any other | Config | Custom parser |
 
- Endpoints: OpenAI Chat Completions / Responses / Completions / Embeddings; OpenAI-compatible equivalents; Anthropic Messages; Gemini `generateContent` and `streamGenerateContent`. All endpoints support streaming capture.
+ Endpoints: OpenAI Chat Completions / Responses / Completions / Embeddings; OpenAI-compatible equivalents; Anthropic Messages; Gemini `generateContent` and `streamGenerateContent`. Official SDK integrations currently cover non-streaming OpenAI Responses / Chat Completions and Anthropic Messages. Streaming capture is supported for Faraday endpoints that emit stream events with final usage.
 
 ## Safety
 
- **By design, `llm_cost_tracker` never persists prompt or response content.** The only data stored per call is the metadata needed for a cost ledger (provider, model, token counts, cost, latency, tags, provider response ID, HTTP status, and a timestamp). Tags carry whatever your application passes in — treat them as user-controlled input and avoid putting request bodies, completions, or secrets into them.
+ **By design, `llm_cost_tracker` never persists prompt or response content.** The only data stored per call is the metadata needed for a cost ledger (provider, model, token counts, cost, latency, tags, provider response ID, and timestamp). Tags carry whatever your application passes in — treat them as user-controlled input and avoid putting request bodies, completions, or secrets into them.
 
 - No external HTTP calls at request-tracking time.
 - No prompt or response bodies stored.
 - Faraday responses not modified.
- - Authorization headers and API keys are never stored or logged.
+ - Request headers are never stored. Warning logs strip query strings from URLs before logging.
 - Storage failures non-fatal by default (`storage_error_behavior = :warn`).
 - Budget and unknown-pricing errors are raised only when you opt in.
 
@@ -518,9 +600,9 @@ Endpoints: OpenAI Chat Completions / Responses / Completions / Embeddings; OpenA
 
 The gem is designed for multi-threaded hosts — Puma with `max_threads > 1` and Sidekiq with `concurrency > 1` are both supported. A few rules:
 
- - **Configure once at boot.** `LlmCostTracker.configure` deep-freezes `default_tags`, `pricing_overrides`, `report_tag_breakdowns`, and `openai_compatible_providers` when the block returns. Mutating or replacing shared fields through `LlmCostTracker.configuration` raises `FrozenError`.
- - **Use `:active_record` storage for shared ledgers.** Puma workers and Sidekiq processes do not share memory; `:log` and `:custom` backends see per-process state only. `:active_record` writes to a single table and is the right choice for dashboards and budget checks across processes.
- - **Size your connection pool.** Each tracked call on the middleware path issues up to three SQL queries (preflight `SUM`, `INSERT`, post-check `SUM`). Make sure the AR pool covers `puma max_threads + sidekiq concurrency` plus your app's own usage.
+ - **Configure once at boot.** `LlmCostTracker.configure` freezes mutable shared configuration when the block returns, and replacing shared fields through `LlmCostTracker.configuration` raises `FrozenError`. If `default_tags` is callable, keep it fast and thread-safe.
+ - **Use `:active_record` storage for the built-in shared ledger.** Puma workers and Sidekiq processes do not share memory; `:log` is process-local, and `:custom` is only as shared as the sink you write to. `:active_record` writes to a single table and is the right choice for the bundled dashboard and budget checks across processes.
+ - **Size your connection pool.** Each tracked call on the middleware path uses the host app's ActiveRecord connection for ledger writes, period rollups, and optional budget checks. Make sure the AR pool covers `puma max_threads + sidekiq concurrency` plus your app's own usage.
 - **Don't share a `StreamCollector` across threads you don't own.** The collector itself is thread-safe — `event`, `usage`, and `finish!` synchronize internally and `finish!` is idempotent — but the documented pattern is one collector per stream.
 - **`finish!` is a barrier.** Once a stream is finished, later `event`, `usage`, or `model=` calls raise `FrozenError` instead of mutating a closed collector.
 - **`ActiveSupport::Notifications` subscribers run synchronously** in the caller's thread. Keep them fast or hand off to a background job; otherwise they add latency to every tracked call.
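The collector rules above can be sketched with a minimal stand-in. This is not the gem's `StreamCollector` — just a pure-Ruby illustration of the synchronize-internally / idempotent-`finish!` contract the docs describe:

```ruby
# Sketch of the one-collector-per-stream pattern (hypothetical class,
# not the gem's implementation).
class SketchCollector
  def initialize
    @mutex = Mutex.new
    @events = []
    @finished = false
  end

  # Accepts stream chunks until the barrier closes.
  def event(chunk)
    @mutex.synchronize do
      raise FrozenError, "collector already finished" if @finished
      @events << chunk
    end
  end

  # Idempotent barrier: the first call closes the collector,
  # repeated calls are harmless no-ops.
  def finish!
    @mutex.synchronize { @finished = true }
    @events.size
  end
end

c = SketchCollector.new
c.event("delta")
c.finish!
c.finish!                # idempotent — no error, state unchanged
begin
  c.event("late chunk")  # raises after the barrier
rescue FrozenError
  puts "rejected late event"
end
```

The real collector adds `usage` and `model=` to the same barrier, but the shape is the same: one collector per stream, `finish!` once the provider's final-usage event arrives.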
@@ -529,18 +611,18 @@ The gem is designed for multi-threaded hosts — Puma with `max_threads > 1` and
 ## Known limitations
 
 - `:block_requests` is a best-effort guardrail, not a hard cap. Concurrent workers can pass preflight simultaneously and collectively overshoot the budget. Use an external quota system if you need a transactional cap.
+ - Official SDK integrations currently cover non-streaming calls. Use Faraday middleware or `track_stream` for SDK streaming until stable stream wrappers are added.
 - Streaming capture relies on the provider emitting a final-usage event (OpenAI needs `stream_options: { include_usage: true }`); missing events are recorded with `usage_source: "unknown"` so they surface on the Data Quality page.
 - `provider_response_id` is stored only when the provider exposes a stable response object ID. Missing IDs stay `nil` and surface on the Data Quality page.
- - Cache write TTL variants (1h vs 5min writes) not modeled separately.
+ - Cache write TTL variants (1h vs 5min writes) are not modeled separately.
 
 ## Development
 
- Architecture rules for future changes live in [`docs/architecture.md`](docs/architecture.md). Integration recipes live in [`docs/cookbook.md`](docs/cookbook.md).
+ Architecture rules for future changes live in [`docs/architecture.md`](docs/architecture.md).
 
 ```bash
 bundle install
- bundle exec rspec
- bundle exec rubocop
+ bin/check
 ```
 
 ## License
@@ -0,0 +1,37 @@
+ # frozen_string_literal: true
+
+ module LlmCostTracker
+   module ConfigurationInstrumentation
+     def instrument(*names)
+       ensure_shared_configuration_mutable!
+       @instrumented_integrations = (@instrumented_integrations + normalize_instrumentation_names(names)).uniq
+     end
+
+     def instrumented?(name)
+       @instrumented_integrations.include?(name.to_sym)
+     end
+
+     private
+
+     def normalize_instrumentation_names(names)
+       names.flatten.flat_map do |name|
+         key = name.to_sym
+         next available_instrumentation_names if key == :all
+
+         validate_instrumentation_name!(key)
+         key
+       end
+     end
+
+     def validate_instrumentation_name!(name)
+       return if available_instrumentation_names.include?(name)
+
+       raise Error, "Unknown integration: #{name.inspect}. " \
+                    "Use one of: #{available_instrumentation_names.join(', ')}"
+     end
+
+     def available_instrumentation_names
+       Integrations::Registry::INTEGRATIONS.keys
+     end
+   end
+ end
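A standalone sketch of how the `:all` expansion and validation in `normalize_instrumentation_names` behave, with a stubbed `AVAILABLE` list standing in for `Integrations::Registry::INTEGRATIONS.keys` and `ArgumentError` standing in for the gem's `Error` (the dedup here folds the `instrument` method's `.uniq` into the helper for brevity):

```ruby
# Stub registry — the real list comes from Integrations::Registry::INTEGRATIONS.
AVAILABLE = %i[openai anthropic].freeze

def normalize(names)
  names.flatten.flat_map do |name|
    key = name.to_sym
    next AVAILABLE if key == :all  # :all expands to every known integration

    unless AVAILABLE.include?(key)
      raise ArgumentError, "Unknown integration: #{key.inspect}"
    end

    key
  end.uniq
end

p normalize([:openai, "openai"]) # strings and symbols dedupe => [:openai]
p normalize([:all])              # => [:openai, :anthropic]
```

Passing an unknown name (e.g. `normalize([:bogus])`) raises immediately, mirroring the fail-fast validation in the module above.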