llm_cost_tracker 0.4.0 → 0.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (47)
  1. checksums.yaml +4 -4
  2. data/CHANGELOG.md +35 -0
  3. data/README.md +195 -109
  4. data/app/services/llm_cost_tracker/dashboard/data_quality.rb +46 -55
  5. data/app/services/llm_cost_tracker/dashboard/data_quality_aggregate.rb +81 -0
  6. data/lib/llm_cost_tracker/budget.rb +34 -37
  7. data/lib/llm_cost_tracker/configuration/instrumentation.rb +37 -0
  8. data/lib/llm_cost_tracker/configuration.rb +10 -5
  9. data/lib/llm_cost_tracker/doctor.rb +166 -0
  10. data/lib/llm_cost_tracker/generators/llm_cost_tracker/install_generator.rb +33 -0
  11. data/lib/llm_cost_tracker/generators/llm_cost_tracker/prices_generator.rb +12 -6
  12. data/lib/llm_cost_tracker/generators/llm_cost_tracker/templates/add_period_totals_to_llm_cost_tracker.rb.erb +38 -8
  13. data/lib/llm_cost_tracker/generators/llm_cost_tracker/templates/create_llm_api_calls.rb.erb +1 -2
  14. data/lib/llm_cost_tracker/generators/llm_cost_tracker/templates/initializer.rb.erb +53 -21
  15. data/lib/llm_cost_tracker/integrations/anthropic.rb +75 -0
  16. data/lib/llm_cost_tracker/integrations/base.rb +72 -0
  17. data/lib/llm_cost_tracker/integrations/object_reader.rb +56 -0
  18. data/lib/llm_cost_tracker/integrations/openai.rb +95 -0
  19. data/lib/llm_cost_tracker/integrations/registry.rb +41 -0
  20. data/lib/llm_cost_tracker/middleware/faraday.rb +4 -3
  21. data/lib/llm_cost_tracker/parsed_usage.rb +8 -1
  22. data/lib/llm_cost_tracker/parsers/anthropic.rb +17 -49
  23. data/lib/llm_cost_tracker/parsers/base.rb +80 -0
  24. data/lib/llm_cost_tracker/parsers/gemini.rb +12 -35
  25. data/lib/llm_cost_tracker/parsers/openai.rb +1 -6
  26. data/lib/llm_cost_tracker/parsers/openai_compatible.rb +6 -15
  27. data/lib/llm_cost_tracker/parsers/openai_usage.rb +8 -30
  28. data/lib/llm_cost_tracker/parsers/registry.rb +17 -2
  29. data/lib/llm_cost_tracker/price_freshness.rb +38 -0
  30. data/lib/llm_cost_tracker/price_registry.rb +14 -0
  31. data/lib/llm_cost_tracker/price_sync/fetcher.rb +2 -1
  32. data/lib/llm_cost_tracker/price_sync/refresh_plan_builder.rb +4 -2
  33. data/lib/llm_cost_tracker/price_sync.rb +10 -0
  34. data/lib/llm_cost_tracker/prices.json +394 -41
  35. data/lib/llm_cost_tracker/pricing.rb +8 -1
  36. data/lib/llm_cost_tracker/request_url.rb +20 -0
  37. data/lib/llm_cost_tracker/storage/active_record_rollups.rb +47 -27
  38. data/lib/llm_cost_tracker/storage/active_record_store.rb +4 -0
  39. data/lib/llm_cost_tracker/stream_collector.rb +3 -3
  40. data/lib/llm_cost_tracker/tag_context.rb +52 -0
  41. data/lib/llm_cost_tracker/tags_column.rb +62 -24
  42. data/lib/llm_cost_tracker/tracker.rb +5 -2
  43. data/lib/llm_cost_tracker/version.rb +1 -1
  44. data/lib/llm_cost_tracker.rb +14 -4
  45. data/lib/tasks/llm_cost_tracker.rake +21 -3
  46. metadata +13 -3
  47. data/lib/llm_cost_tracker/generators/llm_cost_tracker/templates/llm_cost_tracker_prices.yml.erb +0 -51
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: ccb9a8365f4a06026a4352385efa1318ac59ce403cb848e0c9aff992fc80f64c
- data.tar.gz: f21503cd322e923dc5bde0139cc61bc1547cef01eac59fe7a3861e1ab33e9860
+ metadata.gz: 6ee180a9d6ead4b84965b3ff96f87b31c6ce8982a8e13383f936d3031e8f6f5f
+ data.tar.gz: fda6d61c9f86b4e2a4dbdc7a7852f6f4f22bcf43f76b6cfbdd4f438c325e8d8c
  SHA512:
- metadata.gz: 304ab6de6404f070b21b1dd72ce9eae2b44fb2fc7845eae8831a04971ed2b8ec2b6f740bc082fb36cfa42d90f0be59ab5800d43d72b68f04918a113b6d7d8cbd
- data.tar.gz: afa2e92a99062bb1e0b4a00ab1d0762ca688f1890e0d76a29801881e2319e68db217c036d8f8c5d99558b580d5d4c039f8b3334283631763ee093fd12d369329
+ metadata.gz: 8e341c007ff3380459a07890a45bc5e05010c12ffc52a1f805492eb6c643e9637529b02e4d6ae12a7f35c1e25ea819336544bd007f7cfc5efa9c7999559f5d83
+ data.tar.gz: 44c912532194be0f239c6950f1f91317329bb5b8c3afbf33e430b4f9006377a8729ad0ed7f3c2c98528983d218fabef136774088018203426749532eb01627ef
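The SHA256 digests in `checksums.yaml` above cover the two members of the released `.gem` archive, which is a plain tar file containing `metadata.gz` and `data.tar.gz`. A sketch of recomputing them locally (the gem filename is an example; fetch it first with `gem fetch llm_cost_tracker --version 0.5.0`):

```ruby
require "digest"
require "rubygems/package"

# Recompute the SHA256 digests listed in checksums.yaml from a downloaded
# package. A .gem file is a plain tar archive whose members include
# metadata.gz and data.tar.gz.
def gem_member_digests(gem_path)
  digests = {}
  File.open(gem_path, "rb") do |io|
    Gem::Package::TarReader.new(io) do |tar|
      tar.each do |entry|
        next unless %w[metadata.gz data.tar.gz].include?(entry.full_name)
        digests[entry.full_name] = Digest::SHA256.hexdigest(entry.read)
      end
    end
  end
  digests
end

# Print digests when a fetched gem is present alongside this script.
puts gem_member_digests("llm_cost_tracker-0.5.0.gem") if File.exist?("llm_cost_tracker-0.5.0.gem")
```

Comparing the printed values against the `+` lines above confirms the downloaded package matches the registry release.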
data/CHANGELOG.md CHANGED
@@ -4,6 +4,41 @@ Format: [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). Versioning: [S
 
  ## [Unreleased]
 
+ ## [0.5.0] - 2026-04-25
+
+ ### Added
+
+ - Optional SDK integrations: `config.instrument :openai`, `:anthropic`, or `:all` patches the official `openai` and `anthropic` gems' resource methods to record usage automatically. Provider SDKs are not added as hard dependencies.
+ - `LlmCostTracker.with_tags` plus `TagContext` for thread- and fiber-isolated request-scoped tags that flow through middleware, SDK integrations, and `track` / `track_stream`.
+ - `LlmCostTracker::Doctor` and the `llm_cost_tracker:doctor` rake task for diagnosing storage, schema, optional columns, period totals, integrations, prices, and recent calls.
+ - `LlmCostTracker::PriceFreshness` helper plus a price-freshness doctor check that warns when bundled or local prices are stale.
+ - Technical documentation under `docs/technical/` covering architecture, data flow, extension points, module map, and operational notes.
+
+ ### Changed
+
+ - Pricing fuzzy matching now only accepts dated snapshot suffixes instead of guessing new model families.
+ - Built-in prices include GPT-5.5 and GPT-5.4 variants and drop retired Claude and Gemini entries.
+ - Missing model identifiers now normalize to `unknown` instead of leaking nil into tracked events.
+ - `llm_cost_tracker:prices` now generates a full local price snapshot instead of an empty override file.
+ - Price sync workflow surfaces clearer error context for fetcher failures and skips refresh-plan entries with malformed pricing.
+ - README, cookbook, and technical docs clarify that `config.instrument` patches official SDKs only; `ruby-openai` (alexrudall) routes through the Faraday middleware via its constructor block, and `ruby_llm` is not auto-captured today because the gem does not expose a Faraday middleware hook.
+
+ ## [0.4.1] - 2026-04-24
+
+ ### Changed
+
+ - Batched ActiveRecord period rollup writes and budget total reads.
+ - Memoized schema capability checks and refreshed them on `reset_column_information`.
+ - Install migration adds `[:model, :tracked_at]` composite index and drops redundant single-column `:provider` / `:model` indexes.
+ - Data Quality now reads counters and usage sums through one aggregate query.
+ - Parser URL matching, stream-event extraction, and custom parser registration now share a smaller base/registry extension surface.
+ - Added cookbook recipes for `ruby-openai`, `anthropic-sdk-ruby`, `gemini-ai`, `langchainrb`, Azure OpenAI, and LiteLLM proxy setups.
+
+ ### Fixed
+
+ - `llm_cost_tracker:add_period_totals` now imports legacy monthly rollups and backfills before adding the unique index.
+ - Budget docs now describe `:notify` across monthly, daily, and per-call budgets.
+
  ## [0.4.0] - 2026-04-24
 
  ### Changed
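The 0.5.0 changelog entry above describes `config.instrument` as patching official SDK resource methods to record usage. A minimal sketch of that general pattern with `Module#prepend` (the `FakeChatResource` client and `RECORDED` ledger are illustrative stand-ins, not the gem's internals):

```ruby
# Illustrative stand-in for an SDK resource class; the real integration
# targets methods such as the official openai gem's chat.completions.create.
class FakeChatResource
  def create(model:, **params)
    { "model" => model, "usage" => { "input_tokens" => 3, "output_tokens" => 5 } }
  end
end

RECORDED = [] # stand-in for the gem's storage backend

# Prepended module: call through with super, then record model, usage,
# and latency from the response the SDK already returns.
module UsageRecorder
  def create(**kwargs)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    response = super
    RECORDED << {
      model: response["model"],
      usage: response["usage"],
      latency: Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    }
    response
  end
end

FakeChatResource.prepend(UsageRecorder)
FakeChatResource.new.create(model: "gpt-4o")
```

Because `prepend` puts the recorder ahead of the resource class in the method lookup chain, the SDK's own method body stays untouched and still runs via `super`, which is why no hard dependency on the provider gem is needed.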
data/README.md CHANGED
@@ -1,13 +1,13 @@
  # LLM Cost Tracker
 
- **Self-hosted LLM cost tracking for Ruby and Rails.** Intercepts Faraday LLM responses or records usage explicitly, prices events locally, and stores them in your database. No proxy, no SaaS.
+ **Self-hosted LLM cost tracking for Ruby and Rails.** Instruments common Ruby SDKs, intercepts Faraday LLM responses, prices events locally, and can store them in your database. No proxy, no SaaS.
 
  [![Gem Version](https://img.shields.io/gem/v/llm_cost_tracker.svg)](https://rubygems.org/gems/llm_cost_tracker)
  [![CI](https://github.com/sergey-homenko/llm_cost_tracker/actions/workflows/ruby.yml/badge.svg)](https://github.com/sergey-homenko/llm_cost_tracker/actions)
  [![codecov](https://codecov.io/gh/sergey-homenko/llm_cost_tracker/branch/main/graph/badge.svg)](https://codecov.io/gh/sergey-homenko/llm_cost_tracker)
 
- Requires Ruby 3.3+, Rails/ActiveRecord 7.1+, and Faraday 2.0+.
- Core tracking works without Rails; the mounted dashboard requires Rails 7.1+.
+ Requires Ruby 3.3+, ActiveSupport 7.1+, and Faraday 2.0+.
+ ActiveRecord storage requires ActiveRecord 7.1+. The mounted dashboard requires Rails 7.1+.
 
  ## Why
 
@@ -16,48 +16,48 @@ Every Rails app with LLM integrations eventually runs into the same question: wh
  ## What You Get
 
  - A local ActiveRecord ledger of provider, model, usage breakdown, cost, latency, tags, streaming usage, and provider response IDs
- - Faraday middleware plus explicit `track` / `track_stream` helpers for non-Faraday clients
- - Server-rendered Rails dashboard with overview, calls, tags, CSV export, and data-quality pages
+ - Optional official OpenAI and Anthropic SDK integrations, plus Faraday middleware for custom clients
+ - Explicit `track` / `track_stream` helpers as a fallback for unsupported clients
+ - Server-rendered Rails dashboard with overview, models, calls, tags, CSV export, and data-quality pages
  - Local pricing snapshots, price sync tasks, and budget guardrails
  - Prompt and response bodies are never persisted
 
  ## Dashboard
 
- LLM Cost Tracker ships with an optional server-rendered Rails Engine dashboard for spend review, attribution, and data quality checks.
+ LLM Cost Tracker ships with a server-rendered Rails Engine dashboard for spend review, attribution, and data quality checks.
 
  ![LLM Cost Tracker dashboard](docs/dashboard-overview.png)
 
- The overview page includes spend trend, budget status, provider breakdown, top models, and filterable slices. The engine also includes Calls, Tags, and Data Quality pages. Plain ERB, no JavaScript bundle.
+ The overview page includes spend trend, budget status, provider breakdown, top models, and filterable slices. The engine also includes Models, Calls, Tags, and Data Quality pages. Plain ERB, no JavaScript bundle.
 
  ## Quickstart
 
  ```ruby
  gem "llm_cost_tracker"
+ gem "openai"
  ```
 
  ```bash
- bin/rails generate llm_cost_tracker:install
+ bin/rails generate llm_cost_tracker:install --dashboard --prices
  bin/rails db:migrate
+ bin/rails llm_cost_tracker:doctor
  ```
 
+ Skip `--dashboard` if you only want the ledger. Skip `--prices` if you do not want a local pricing file yet.
+
  ```ruby
  LlmCostTracker.configure do |config|
    config.storage_backend = :active_record
-   config.default_tags = { app: "my_app", environment: Rails.env }
+   config.default_tags = -> { { environment: Rails.env } }
+   config.instrument :openai
  end
 
- OpenAI.configure do |config|
-   config.access_token = ENV["OPENAI_API_KEY"]
-   config.faraday do |f|
-     f.use :llm_cost_tracker, tags: -> { { user_id: Current.user&.id, feature: "chat" } }
-   end
+ LlmCostTracker.with_tags(user_id: Current.user&.id, feature: "chat") do
+   client = OpenAI::Client.new(api_key: ENV["OPENAI_API_KEY"])
+   client.responses.create(model: "gpt-4o", input: "Hello")
  end
  ```
 
- ```ruby
- mount LlmCostTracker::Engine => "/llm-costs"
- ```
-
  After that, LLM Cost Tracker starts recording calls into `llm_api_calls` and the dashboard becomes available at `/llm-costs`.
  Protect the mounted engine with your application's authentication before exposing it outside development.
 
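The `with_tags` call in the quickstart hunk above scopes tags to a block. A sketch of how such block scoping is commonly built, using fiber-local state (`Thread.current[]` storage is fiber-local in Ruby); this illustrates the idea only and is not the gem's `TagContext` implementation:

```ruby
module TagScope
  KEY = :llm_cost_tracker_demo_tags

  # Thread.current[] storage is fiber-local, so concurrent fibers and
  # threads each see their own tag scope.
  def self.current
    Thread.current[KEY] || {}
  end

  def self.with_tags(tags)
    previous = Thread.current[KEY]
    Thread.current[KEY] = current.merge(tags)
    yield
  ensure
    Thread.current[KEY] = previous
  end
end

TagScope.with_tags(feature: "chat") do
  TagScope.with_tags(user_id: 42) do
    p TagScope.current # nested scopes merge
  end
end
p TagScope.current # previous scope restored after the block
```

The `ensure` restore is what makes nested scopes safe: each block puts back exactly the state it found, even if the block raises.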
@@ -69,39 +69,43 @@ Protect the mounted engine with your application's authentication before exposin
  - No built-in auth on the mounted dashboard
  - Use `:active_record` when you want shared dashboards and budget checks across Puma workers and Sidekiq processes
 
- ## Installation
+ ## Technical Docs
 
- ```ruby
- gem "llm_cost_tracker"
- ```
+ - [Architecture](docs/architecture.md)
 
- For ActiveRecord storage:
+ ## Usage
 
- ```bash
- bin/rails generate llm_cost_tracker:install
- bin/rails db:migrate
- ```
+ ### Official SDK integrations
 
- ## Usage
+ `config.instrument` patches **official** provider SDKs only — currently the official `openai` and `anthropic` gems. SDK integrations are optional and do not add provider SDKs as gem dependencies. Install the provider SDK you already use, then enable its integration.
 
- ### Patch an existing client's Faraday connection
+ ```ruby
+ LlmCostTracker.configure do |config|
+   config.instrument :openai
+   config.instrument :anthropic
+ end
+ ```
+
+ The OpenAI integration records non-streaming calls through the official `openai` gem's `responses.create` and `chat.completions.create`. The Anthropic integration records non-streaming calls through the official `anthropic` gem's `messages.create`. Both integrations extract usage, model, latency, provider response ID, cache tokens, and hidden/reasoning tokens when the SDK response exposes them.
 
  ```ruby
- # config/initializers/openai.rb
- OpenAI.configure do |config|
-   config.access_token = ENV["OPENAI_API_KEY"]
-
-   config.faraday do |f|
-     f.use :llm_cost_tracker, tags: -> {
-       { user_id: Current.user&.id, workflow: Current.workflow, env: Rails.env }
-     }
-   end
+ LlmCostTracker.with_tags(feature: "support_chat", user_id: Current.user&.id) do
+   anthropic = Anthropic::Client.new(api_key: ENV["ANTHROPIC_API_KEY"])
+   anthropic.messages.create(
+     model: "claude-sonnet-4-5-20250929",
+     max_tokens: 1024,
+     messages: [{ role: "user", content: "Hello" }]
+   )
  end
  ```
 
- `tags:` can be a callable and is evaluated on each request.
+ Community clients such as `ruby-openai` are not patched by `instrument`. `ruby-openai` exposes a Faraday block on its constructor and is covered by the middleware below.
+
+ Google's official Gemini SDKs do not include Ruby. Use the Faraday middleware against Gemini's REST API, or keep custom clients behind the fallback helpers until a stable SDK integration exists.
+
+ ### Faraday middleware
 
- ### Raw Faraday
+ `tags:` can be a hash or callable. Callables are evaluated on each request and may accept the Faraday request env.
 
  ```ruby
  conn = Faraday.new(url: "https://api.openai.com") do |f|
@@ -116,9 +120,11 @@ conn.post("/v1/responses", { model: "gpt-5-mini", input: "Hello!" })
 
  Place `llm_cost_tracker` inside the Faraday stack where it can see the final response body.
 
+ The same middleware covers `ruby-openai` through its constructor block.
+
  ### Streaming
 
- Streaming is captured automatically for OpenAI, Anthropic, and Gemini when the request goes through the Faraday middleware. The middleware tees the `on_data` callback, keeps the stream flowing to your code, and records the final usage block once the response completes.
+ Streaming is captured automatically for OpenAI, Anthropic, and Gemini when the request goes through the Faraday middleware. The middleware tees the `on_data` callback, keeps the stream flowing to your code, and records provider-reported usage once the response completes.
 
  ```ruby
  # OpenAI: include usage in the final chunk
@@ -130,20 +136,22 @@ client.chat(parameters: {
130
136
  })
131
137
  ```
132
138
 
133
- Anthropic emits usage in `message_start` + `message_delta` events. Gemini's `:streamGenerateContent` endpoint includes `usageMetadata`; usage from the final chunk is used.
139
+ Anthropic emits usage in `message_start` + `message_delta` events. Gemini's `:streamGenerateContent` endpoint includes `usageMetadata`; the latest usage block is used.
134
140
 
135
141
  Streamed calls are stored with `stream: true` and `usage_source: "stream_final"`. If the provider never sends final usage, the call is still recorded with `usage_source: "unknown"` so those calls surface on the Data Quality page.
136
142
 
137
143
  When the provider emits a stable response object ID, LLM Cost Tracker stores it as `provider_response_id`. OpenAI and Anthropic are covered end-to-end; Gemini is best effort and may vary by endpoint or API version.
138
144
 
139
- For non-Faraday clients (raw `Net::HTTP`, custom SSE code, Azure OpenAI), use the explicit helper:
145
+ Model identifiers are extracted from the provider response, request body, stream events, or URL path depending on the provider. If no source carries a model, the event is stored under `model: "unknown"` and shows up as unknown pricing instead of being guessed.
146
+
147
+ For non-Faraday clients without an SDK integration, prefer adding a supported adapter. Use the explicit helper only as a fallback while wiring a client that does not expose a stable hook yet:
140
148
 
141
149
  ```ruby
142
150
  LlmCostTracker.track_stream(provider: "openai", model: "gpt-4o") do |stream|
143
- my_client.stream(...) { |chunk| stream.event(chunk) }
151
+ my_client.stream(...) { |event| stream.event(event.to_h) }
144
152
  end
145
153
 
146
- # Or skip the chunk parsing entirely if you already know the totals:
154
+ # Or skip provider event parsing entirely if you already know the totals:
147
155
  LlmCostTracker.track_stream(provider: "openai", model: "gpt-4o") do |stream|
148
156
  # ... your streaming loop ...
149
157
  stream.usage(input_tokens: 120, output_tokens: 45)
@@ -161,7 +169,11 @@ end
 
  Run `bin/rails g llm_cost_tracker:add_streaming` once on existing installs to add the `stream` and `usage_source` columns. Run `bin/rails g llm_cost_tracker:add_provider_response_id` to persist provider-issued response IDs. Run `bin/rails g llm_cost_tracker:add_usage_breakdown` to add cache-read, cache-write, hidden-output, and pricing-mode columns.
 
- ### Manual tracking
+ More client-specific snippets live in [`docs/cookbook.md`](docs/cookbook.md).
+
+ ### Fallback tracking
+
+ Automatic capture should be the default integration path. `track` exists for custom clients, internal gateways, migrations, and SDKs that do not expose a stable middleware or instrumentation hook yet.
 
  ```ruby
  LlmCostTracker.track(
@@ -180,42 +192,72 @@ LlmCostTracker.track(
  `cache_read_input_tokens` and cache writes in `cache_write_input_tokens`; total
  tokens are calculated from the canonical billing breakdown.
 
+ For manual tracking, pass the real upstream model when you know it. If a gateway only exposes a deployment or router name, use that stable identifier and add a matching `prices_file` / `pricing_overrides` entry.
+
+ ### Tags
+
+ Tags are application context, not provider metadata. LLM Cost Tracker detects provider/model from the response when a parser is available; tags tell you who or what caused the call.
+
+ ```ruby
+ LlmCostTracker.with_tags(user_id: current_user.id, feature: "support_chat", trace_id: request.uuid) do
+   client.chat(parameters: { model: "gpt-4o", messages: [...] })
+ end
+ ```
+
+ `default_tags` can be a hash or callable. Scoped tags from `with_tags` apply only inside the block and are isolated per thread/fiber. Explicit tags passed to `track`, `track_stream`, or middleware metadata win over scoped/default tags.
+
  ## Configuration
 
  ```ruby
- # config/initializers/llm_cost_tracker.rb
  LlmCostTracker.configure do |config|
-   config.storage_backend = :active_record # :log (default), :active_record, :custom
-   config.default_tags = { app: "my_app", environment: Rails.env }
-
+   config.storage_backend = :active_record
+   config.default_tags = -> { { environment: Rails.env } }
+   config.instrument :openai
+   config.instrument :anthropic
+   config.prices_file = Rails.root.join("config/llm_cost_tracker_prices.yml")
    config.monthly_budget = 500.00
    config.daily_budget = 50.00
    config.per_call_budget = 2.00
-   config.budget_exceeded_behavior = :notify # :notify, :raise, :block_requests
-   config.storage_error_behavior = :warn # :ignore, :warn, :raise
-   config.unknown_pricing_behavior = :warn # :ignore, :warn, :raise
-
+   config.budget_exceeded_behavior = :notify
    config.on_budget_exceeded = ->(data) {
-     SlackNotifier.notify("#alerts", "🚨 LLM #{data[:budget_type]} budget $#{data[:total].round(2)} / $#{data[:budget]}")
+     SlackNotifier.notify("#alerts", "LLM #{data[:budget_type]} budget $#{data[:total].round(2)} / $#{data[:budget]}")
    }
-
-   config.prices_file = Rails.root.join("config/llm_cost_tracker_prices.yml")
-   config.pricing_overrides = {
-     "ft:gpt-4o-mini:my-org" => { input: 0.30, cache_read_input: 0.15, output: 1.20 }
-   }
-
-   # Built-in: openrouter.ai, api.deepseek.com
-   config.openai_compatible_providers["llm.my-company.com"] = "internal_gateway"
  end
  ```
 
+ Storage backends: `:log` (default), `:active_record`, `:custom`. Error behaviors: `:ignore`, `:warn`, `:raise`; budget behavior also supports `:block_requests`.
+
+ Configuration reference:
+
+ | Option | Default | Purpose |
+ |---|---:|---|
+ | `enabled` | `true` | Turns tracking on/off. |
+ | `storage_backend` | `:log` | `:log`, `:active_record`, or `:custom`. |
+ | `custom_storage` | `nil` | Callable storage hook for `:custom`. |
+ | `default_tags` | `{}` | Hash or callable merged into every event. |
+ | `prices_file` | `nil` | Local JSON/YAML price table. |
+ | `pricing_overrides` | `{}` | Ruby-side model price overrides. |
+ | `instrument` | none | Enables optional SDK integrations such as `:openai`, `:anthropic`, or `:all`. |
+ | `monthly_budget` | `nil` | Monthly spend guardrail. |
+ | `daily_budget` | `nil` | Daily spend guardrail. |
+ | `per_call_budget` | `nil` | Single-event spend guardrail. |
+ | `budget_exceeded_behavior` | `:notify` | `:notify`, `:raise`, or `:block_requests`. |
+ | `on_budget_exceeded` | `nil` | Callback for budget events. |
+ | `storage_error_behavior` | `:warn` | `:ignore`, `:warn`, or `:raise`. |
+ | `unknown_pricing_behavior` | `:warn` | `:ignore`, `:warn`, or `:raise`. |
+ | `log_level` | `:info` | Log level used by `:log` storage. |
+ | `openai_compatible_providers` | OpenRouter + DeepSeek | Host-to-provider map for compatible APIs. |
+ | `report_tag_breakdowns` | `[]` | Tag keys included in text reports. |
+
+ LLM Cost Tracker estimates cost from recorded usage and a versioned price registry. Providers usually return token usage, not a stable per-request price, so request costs are calculated locally and stored with the call. Historical rows do not change when prices update.
+
  Pricing is best effort. OpenRouter-style IDs like `openai/gpt-4o-mini` are normalized to built-in names when possible. Use `prices_file` / `pricing_overrides` for fine-tunes, gateway-specific IDs, enterprise discounts, alternate pricing modes, or models the gem does not know.
  Provider-specific entries like `openai/gpt-4o-mini` win over model-only entries like `gpt-4o-mini`.
  Pass `pricing_mode: :batch` to use optional mode-specific keys such as `batch_input` / `batch_output`; missing mode-specific keys fall back to standard `input` / `output` rates. The same pattern works for custom modes, for example `contract_input`.
 
  `storage_error_behavior = :warn` (default) lets LLM responses continue if storage fails; `:raise` exposes `StorageError#original_error`.
 
- Unknown pricing still records token counts, but `cost` is `nil` and budget guardrails skip that event. Find unpriced models:
+ With `unknown_pricing_behavior = :ignore` or `:warn`, unknown pricing still records token counts, but `cost` is `nil` and budget guardrails skip that event. With `:raise`, the event raises before storage. Find unpriced models:
 
  ```ruby
  LlmCostTracker::LlmApiCall.unknown_pricing.group(:model).count
@@ -223,22 +265,33 @@ LlmCostTracker::LlmApiCall.unknown_pricing.group(:model).count
 
  ### Keeping prices current
 
- Built-in prices live in `lib/llm_cost_tracker/prices.json`. The gem never fetches pricing on boot. For production, keep a local snapshot under `config/` and point the gem at it:
+ Built-in prices live in `lib/llm_cost_tracker/prices.json`. The gem never fetches pricing on boot. For production, generate a local snapshot from the bundled registry, keep it under source control, and point the gem at it:
 
  ```bash
  bin/rails generate llm_cost_tracker:prices
  ```
 
- ```json
- {
-   "metadata": { "updated_at": "2026-04-18", "currency": "USD", "unit": "1M tokens" },
-   "models": {
-     "my-gateway/gpt-4o-mini": { "input": 0.20, "cache_read_input": 0.10, "output": 0.80, "batch_input": 0.10, "batch_output": 0.40 }
-   }
- }
+ ```ruby
+ config.prices_file = Rails.root.join("config/llm_cost_tracker_prices.yml")
+ ```
+
+ The generated file has the same shape as the bundled registry:
+
+ ```yaml
+ metadata:
+   updated_at: "2026-04-25"
+   currency: USD
+   unit: 1M tokens
+ models:
+   my-gateway/gpt-4o-mini:
+     input: 0.20
+     cache_read_input: 0.10
+     output: 0.80
+     batch_input: 0.10
+     batch_output: 0.40
  ```
 
- `pricing_overrides` has the highest precedence. Use it for a handful of Ruby-side overrides; use `prices_file` when you want a local pricing table under source control.
+ Pricing precedence is `pricing_overrides`, then `prices_file`, then bundled prices. Use `prices_file` for the app's source-controlled snapshot and `pricing_overrides` only for a handful of Ruby-side emergency overrides.
 
  To refresh prices on demand:
 
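The precedence rule in the hunk above (`pricing_overrides`, then `prices_file`, then bundled prices) amounts to a first-match lookup across ordered tables. A sketch with hypothetical rates, not the gem's real internals or prices:

```ruby
# Hypothetical price tables in precedence order, USD per 1M tokens.
OVERRIDES   = { "ft:gpt-4o-mini:my-org"   => { input: 0.30, output: 1.20 } }
PRICES_FILE = { "my-gateway/gpt-4o-mini"  => { input: 0.20, output: 0.80 } }
BUNDLED     = { "gpt-4o-mini"             => { input: 0.15, output: 0.60 } }

# First table that knows the model wins; nil means unknown pricing
# (the README: usage is still recorded, cost stays nil).
def rates_for(model)
  [OVERRIDES, PRICES_FILE, BUNDLED].each do |table|
    rates = table[model]
    return rates if rates
  end
  nil
end

def estimate_cost(model, input_tokens:, output_tokens:)
  rates = rates_for(model)
  return nil unless rates
  (input_tokens * rates[:input] + output_tokens * rates[:output]) / 1_000_000.0
end

p estimate_cost("gpt-4o-mini", input_tokens: 1_000, output_tokens: 500)
```

Because lookup stops at the first table containing the model, an override entry always shadows the same key in the local file or the bundled snapshot.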
@@ -246,19 +299,30 @@ To refresh prices on demand:
  bin/rails llm_cost_tracker:prices:sync
  ```
 
- `llm_cost_tracker:prices:sync` refreshes the current registry from two structured sources: LiteLLM first, OpenRouter second. LiteLLM is the primary source; OpenRouter fills gaps and helps surface discrepancies.
+ `llm_cost_tracker:prices:sync` refreshes a pricing file from two structured sources: LiteLLM first, OpenRouter second. LiteLLM is the primary source; OpenRouter fills gaps and helps surface discrepancies.
 
  `llm_cost_tracker:prices:sync` / `llm_cost_tracker:prices:check` perform HTTP GET requests to:
 
  - LiteLLM pricing JSON: `https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json`
  - OpenRouter Models API: `https://openrouter.ai/api/v1/models`
 
- If `config.prices_file` is configured, the task syncs that file automatically; otherwise it works from the built-in snapshot. `_source: "manual"` entries are never touched. Models that are still in your file but missing from both upstream sources are left alone and reported as orphaned. For intentional custom entries, mark them as manual so they stop showing up in orphaned warnings.
+ The task writes to `ENV["OUTPUT"]`, then `config.prices_file`, in that order. It aborts if neither is present. The gem's bundled `prices.json` is only updated when you explicitly pass it through `OUTPUT=` while developing the gem. `_source: "manual"` entries are never touched. Models that are still in your file but missing from both upstream sources are left alone and reported as orphaned. For intentional custom entries, mark them as manual so they stop showing up in orphaned warnings.
 
- Use `PREVIEW=1` to see the diff without writing. Use `STRICT=1` to fail instead of applying a partial refresh when a source fails or the validator rejects a price. Use `bin/rails llm_cost_tracker:prices:check` in CI to print the current diff and exit non-zero when the snapshot has drifted or refresh fails.
+ Use `OUTPUT=config/llm_cost_tracker_prices.yml` to choose a target file explicitly. Use `PREVIEW=1` to see the diff without writing. Use `STRICT=1` to fail instead of applying a partial refresh when a source fails or the validator rejects a price. Use `bin/rails llm_cost_tracker:prices:check` in CI to print the current diff and exit non-zero when the snapshot has drifted or refresh fails.
 
  Large price changes are flagged during sync. If a specific entry is expected to move by more than 3x, add `_validator_override: ["skip_relative_change"]` to that entry in your local price file.
 
+ If sync reports `certificate verify failed`, fix the host Ruby/OpenSSL trust store rather than disabling TLS verification. Common fixes are installing `ca-certificates` in Docker/Linux images, configuring the corporate proxy CA, setting `SSL_CERT_FILE` to the system CA bundle, or rebuilding rbenv/asdf Ruby after an OpenSSL upgrade.
+
+ For unattended updates, run the check daily and sync through review:
+
+ ```bash
+ bin/rails llm_cost_tracker:prices:check
+ STRICT=1 bin/rails llm_cost_tracker:prices:sync
+ ```
+
+ `bin/rails llm_cost_tracker:doctor` warns when the configured price file has no `metadata.updated_at` or when it is older than 30 days.
+
  ## Budget enforcement
 
  ```ruby
@@ -269,13 +333,13 @@ config.per_call_budget = 1.00
  config.budget_exceeded_behavior = :block_requests
  ```
 
- - `:notify` — fire `on_budget_exceeded` after an event pushes the month over budget.
+ - `:notify` — fire `on_budget_exceeded` after an event pushes the monthly, daily, or per-call budget over the limit.
  - `:raise` — record the event, then raise `BudgetExceededError`.
  - `:block_requests` — block preflight when the stored monthly or daily total is already over budget; still raises post-response on the event that crosses the line. Needs `:active_record` storage for preflight.
 
  `monthly_budget` and `daily_budget` are cumulative ledger limits. `per_call_budget` is a ceiling for a single priced event and runs after the response cost is known.
 
- ActiveRecord installs keep `llm_cost_tracker_period_totals` in sync with atomic upserts. Budget preflight reads period rollups instead of scanning `llm_api_calls`.
+ ActiveRecord installs keep `llm_cost_tracker_period_totals` in sync with atomic upserts. Budget preflight reads period rollups when they are available instead of scanning `llm_api_calls`.
 
  ```ruby
  rescue LlmCostTracker::BudgetExceededError => e
@@ -284,7 +348,7 @@ rescue LlmCostTracker::BudgetExceededError => e
 
  `:block_requests` is a **guardrail, not a hard cap**. The preflight and the spend-recording write are separate statements, so under Puma / Sidekiq concurrency multiple workers can all pass the preflight and then collectively overshoot the budget. The setting reliably *stops new requests after the overshoot is visible* — it does not prevent the overshoot itself. For strict quotas use a provider- or gateway-level limit, or a database-backed counter outside this gem.
 
- Preflight is wired into the Faraday middleware automatically. When you record events via `LlmCostTracker.track` / `track_stream` and also want the same preflight, opt in:
+ Preflight is wired into the Faraday middleware and SDK integrations automatically. When you record events via `LlmCostTracker.track` / `track_stream` and also want the same preflight, opt in:
 
  ```ruby
  LlmCostTracker.track(
@@ -302,8 +366,20 @@ end
302
366
  LlmCostTracker.enforce_budget! # standalone preflight
303
367
  ```
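As a sketch of how the three limits and the failure behavior might sit together in an initializer — the `budget_behavior` key name and the `on_budget_exceeded` handler's argument are assumptions, so check the initializer the install generator writes for the exact attribute names:

```ruby
# config/initializers/llm_cost_tracker.rb — illustrative sketch only.
# `budget_behavior` and the handler argument are assumed names; the three
# budgets and `storage_backend` are the settings documented above.
LlmCostTracker.configure do |config|
  config.storage_backend = :active_record # required for budget preflight
  config.monthly_budget  = 500.00         # cumulative ledger limit
  config.daily_budget    = 25.00          # cumulative ledger limit
  config.per_call_budget = 0.50           # ceiling for one priced event
  config.budget_behavior = :notify        # or :raise / :block_requests
  config.on_budget_exceeded = ->(event) { Rails.logger.warn(event.inspect) }
end
```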
 
+ ## Doctor
+
+ Run the setup check after install, deploy, or upgrades:
+
+ ```bash
+ bin/rails llm_cost_tracker:doctor
+ ```
+
+ It checks storage mode, ActiveRecord availability, table/column coverage, period rollups, pricing file loading, and whether calls are being recorded. Setup errors make the task exit non-zero; warnings point at optional production hardening.
+
  ## Querying costs
 
+ These helpers and rake tasks require ActiveRecord storage.
+
  ```bash
  bin/rails llm_cost_tracker:report
  DAYS=7 bin/rails llm_cost_tracker:report
@@ -337,7 +413,7 @@ LlmCostTracker::LlmApiCall.between(1.week.ago, Time.current).cost_by_model
 
  ## Retention
 
- Retention is not enforced automatically. Use the rake task below if you need to delete older records in batches.
+ Retention is not enforced automatically. With ActiveRecord storage, use the rake task below if you need to delete older records in batches.
 
  ```bash
  DAYS=90 bin/rails llm_cost_tracker:prune # delete calls older than N days in batches
@@ -354,10 +430,15 @@ add_index :llm_api_calls, :tags, using: :gin
 
  On other adapters, tags fall back to JSON in a text column. `by_tag` uses JSONB containment on PG, text matching elsewhere.
 
- Upgrade an existing install:
+ ## Upgrading existing installs
+
+ Run whichever generators add the columns your older install is missing:
 
  ```bash
  bin/rails generate llm_cost_tracker:add_period_totals        # shared budget rollups
+ bin/rails generate llm_cost_tracker:add_streaming            # stream + usage_source
+ bin/rails generate llm_cost_tracker:add_provider_response_id
+ bin/rails generate llm_cost_tracker:add_usage_breakdown
  bin/rails generate llm_cost_tracker:upgrade_tags_to_jsonb    # PG: text → jsonb + GIN
  bin/rails generate llm_cost_tracker:upgrade_cost_precision   # widen cost columns
  bin/rails generate llm_cost_tracker:add_latency_ms
@@ -368,7 +449,9 @@ On PostgreSQL, the generated `upgrade_tags_to_jsonb` migration rewrites `llm_api
 
  ## Mounting the dashboard
 
- Optional Rails Engine. Plain ERB, no JavaScript framework, no asset pipeline required. Requires Rails 7.1+; the core middleware works without Rails.
+ Optional Rails Engine. Plain ERB, no JavaScript framework, no asset pipeline required. Requires Rails 7.1+; the core middleware works without Rails. The dashboard reads `llm_api_calls`, so use `storage_backend = :active_record` in apps that mount it.
+
+ `bin/rails generate llm_cost_tracker:install --dashboard` adds the require and route for you. Manual setup:
 
  ```ruby
  # config/application.rb (or an initializer)
@@ -382,11 +465,11 @@ Routes (GET-only; CSV export included):
 
  - `/llm-costs` — overview: spend with delta vs previous period, budget projection, spend anomaly banner, daily trend vs previous slice, provider rollup, top models
  - `/llm-costs/models` — by provider + model; sortable by spend, volume, avg cost, latency
- - `/llm-costs/calls` — filterable + paginated; outlier sort modes (expensive, largest input/output, slowest, unknown pricing); CSV export
+ - `/llm-costs/calls` — filterable + paginated; sort modes for recency, spend, input tokens, output tokens, latency, and unknown pricing; CSV export
  - `/llm-costs/calls/:id` — details with token mix and cost mix breakdowns
  - `/llm-costs/tags` — tag keys present in the dataset (PG/SQLite native; MySQL 8.0+ via JSON_TABLE)
  - `/llm-costs/tags/:key` — breakdown by values of a given tag key
- - `/llm-costs/data_quality` — unknown pricing share, untagged calls, missing latency
+ - `/llm-costs/data_quality` — unknown pricing, untagged calls, missing latency, incomplete stream usage, and missing provider response IDs
 
  No built-in auth is included. Tags carry whatever your app puts in them, so protect the mount point with your application's authentication.
 
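Since no auth ships with the engine, one way to protect the mount point is a routes-level constraint. A sketch assuming Devise and a guessed engine constant (`LlmCostTracker::Dashboard::Engine` — use whatever constant your install generator wrote):

```ruby
# config/routes.rb — sketch; the engine constant and the Devise
# `authenticate` helper are assumptions about your setup.
Rails.application.routes.draw do
  authenticate :user, ->(user) { user.admin? } do
    mount LlmCostTracker::Dashboard::Engine, at: "/llm-costs"
  end
end
```

Any equivalent gate works — HTTP basic auth in a wrapping controller, or a Rack middleware in front of the mount.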
@@ -425,6 +508,7 @@ ActiveSupport::Notifications.subscribe("llm_request.llm_cost_tracker") do |*, pa
  # total_cost: 0.000795, currency: "USD"
  # },
  # pricing_mode: "batch",
+ # stream: false, usage_source: "response", provider_response_id: "chatcmpl_123",
  # tags: { feature: "chat", user_id: 42 },
  # tracked_at: 2026-04-16 14:30:00 UTC
  # }
@@ -456,21 +540,23 @@ Configured hosts are parsed using the OpenAI-compatible usage shape (`prompt_tok
 
  For providers with a non-OpenAI usage shape:
 
  ```ruby
- require "uri"
-
  class AcmeParser < LlmCostTracker::Parsers::Base
+   HOSTS = %w[api.acme-llm.example].freeze
+   TRACKED_PATHS = %w[/v1/generate].freeze
+
+   def provider_names
+     %w[acme]
+   end
+
    def match?(url)
-     uri = URI.parse(url.to_s)
-     uri.host == "api.acme-llm.example" && uri.path == "/v1/generate"
-   rescue URI::InvalidURIError
-     false
+     match_uri?(url, hosts: HOSTS, exact_paths: TRACKED_PATHS)
    end
 
-   def parse(request_url, request_body, response_status, response_body)
+   def parse(_request_url, _request_body, response_status, response_body)
      return nil unless response_status == 200
 
      payload = safe_json_parse(response_body)
-     usage = payload&.dig("usage")
+     usage = payload.dig("usage")
      return nil unless usage
 
      LlmCostTracker::ParsedUsage.build(
@@ -482,31 +568,31 @@ class AcmeParser < LlmCostTracker::Parsers::Base
    end
  end
 
- LlmCostTracker::Parsers::Registry.register(AcmeParser.new)
+ LlmCostTracker::Parsers::Registry.register(AcmeParser)
  ```
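The `parse` contract is easy to exercise outside the gem. A plain-Ruby sketch of the same extraction flow — here `safe_json_parse` is stood in by `JSON.parse` with a rescue, since the real helper lives on `Parsers::Base`:

```ruby
require "json"

# Stand-in for the base class's safe_json_parse: nil on malformed input.
def safe_json_parse(body)
  JSON.parse(body)
rescue JSON::ParserError, TypeError
  nil
end

# Mirrors the parse contract above: only a 200 response carrying a
# usage object yields token counts; everything else returns nil.
def extract_usage(status, body)
  return nil unless status == 200

  payload = safe_json_parse(body)
  usage = payload&.dig("usage")
  return nil unless usage

  { input_tokens: usage["input_tokens"], output_tokens: usage["output_tokens"] }
end

extract_usage(200, '{"usage":{"input_tokens":12,"output_tokens":34}}')
# => { input_tokens: 12, output_tokens: 34 }
extract_usage(500, '{"usage":{"input_tokens":12}}')
# => nil
```

Returning `nil` for anything the parser does not recognize is what keeps an unmatched or failed response from producing a bogus ledger row.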
 
  ## Supported providers
 
  | Provider | Auto-detected | Models with pricing |
  |---|:---:|---|
- | OpenAI | | GPT-5.2/5.1/5, GPT-5 mini/nano, GPT-4.1, GPT-4o, o1/o3/o4-mini |
- | OpenRouter | | OpenAI-compatible usage; provider-prefixed OpenAI model IDs normalized when possible |
- | DeepSeek | | OpenAI-compatible usage; add `pricing_overrides` for DeepSeek models |
- | OpenAI-compatible hosts | 🔧 | Configure `openai_compatible_providers` |
- | Anthropic | | Claude Opus 4.6/4.1/4, Sonnet 4.6/4.5/4, Haiku 4.5, Claude 3.x |
- | Google Gemini | | Gemini 2.5 Pro/Flash/Flash-Lite, 2.0 Flash/Flash-Lite, 1.5 Pro/Flash |
- | Any other | 🔧 | Custom parser |
+ | OpenAI | Yes | GPT-5.5/5.4/5.2/5.1/5, GPT-5.5/5.4/5.2/5 pro, GPT-5.4 mini/nano, GPT-5 mini/nano, GPT-4.1, GPT-4o, o1/o3/o4-mini |
+ | OpenRouter | Yes | OpenAI-compatible usage; provider-prefixed OpenAI model IDs normalized when possible |
+ | DeepSeek | Yes | OpenAI-compatible usage; add `pricing_overrides` for DeepSeek models |
+ | OpenAI-compatible hosts | Config | Configure `openai_compatible_providers` |
+ | Anthropic | Yes | Claude Opus 4.7/4.6/4.5/4.1/4, Sonnet 4.6/4.5/4, Haiku 4.5 |
+ | Google Gemini | Yes | Gemini 2.5 Pro/Flash/Flash-Lite, 2.0 Flash/Flash-Lite |
+ | Any other | Config | Custom parser |
 
- Endpoints: OpenAI Chat Completions / Responses / Completions / Embeddings; OpenAI-compatible equivalents; Anthropic Messages; Gemini `generateContent` and `streamGenerateContent`. All endpoints support streaming capture.
+ Endpoints: OpenAI Chat Completions / Responses / Completions / Embeddings; OpenAI-compatible equivalents; Anthropic Messages; Gemini `generateContent` and `streamGenerateContent`. Official SDK integrations currently cover non-streaming OpenAI Responses / Chat Completions and Anthropic Messages. Streaming capture is supported for Faraday endpoints that emit stream events with final usage.
 
  ## Safety
 
- **By design, `llm_cost_tracker` never persists prompt or response content.** The only data stored per call is the metadata needed for a cost ledger (provider, model, token counts, cost, latency, tags, provider response ID, HTTP status, and a timestamp). Tags carry whatever your application passes in — treat them as user-controlled input and avoid putting request bodies, completions, or secrets into them.
+ **By design, `llm_cost_tracker` never persists prompt or response content.** The only data stored per call is the metadata needed for a cost ledger (provider, model, token counts, cost, latency, tags, provider response ID, and timestamp). Tags carry whatever your application passes in — treat them as user-controlled input and avoid putting request bodies, completions, or secrets into them.
 
  - No external HTTP calls at request-tracking time.
  - No prompt or response bodies stored.
  - Faraday responses are not modified.
- - Authorization headers and API keys are never stored or logged.
+ - Request headers are never stored. Warning logs strip query strings from URLs before logging.
  - Storage failures are non-fatal by default (`storage_error_behavior = :warn`).
  - Budget and unknown-pricing errors are raised only when you opt in.
 
@@ -514,9 +600,9 @@ Endpoints: OpenAI Chat Completions / Responses / Completions / Embeddings; OpenA
 
  The gem is designed for multi-threaded hosts — Puma with `max_threads > 1` and Sidekiq with `concurrency > 1` are both supported. A few rules:
 
- - **Configure once at boot.** `LlmCostTracker.configure` deep-freezes `default_tags`, `pricing_overrides`, `report_tag_breakdowns`, and `openai_compatible_providers` when the block returns. Mutating or replacing shared fields through `LlmCostTracker.configuration` raises `FrozenError`.
- - **Use `:active_record` storage for shared ledgers.** Puma workers and Sidekiq processes do not share memory; `:log` and `:custom` backends see per-process state only. `:active_record` writes to a single table and is the right choice for dashboards and budget checks across processes.
- - **Size your connection pool.** Each tracked call on the middleware path issues up to three SQL queries (preflight `SUM`, `INSERT`, post-check `SUM`). Make sure the AR pool covers `puma max_threads + sidekiq concurrency` plus your app's own usage.
+ - **Configure once at boot.** `LlmCostTracker.configure` freezes mutable shared configuration when the block returns, and replacing shared fields through `LlmCostTracker.configuration` raises `FrozenError`. If `default_tags` is callable, keep it fast and thread-safe.
+ - **Use `:active_record` storage for the built-in shared ledger.** Puma workers and Sidekiq processes do not share memory; `:log` is process-local, and `:custom` is only as shared as the sink you write to. `:active_record` writes to a single table and is the right choice for the bundled dashboard and budget checks across processes.
+ - **Size your connection pool.** Each tracked call on the middleware path uses the host app's ActiveRecord connection for ledger writes, period rollups, and optional budget checks. Make sure the AR pool covers `puma max_threads + sidekiq concurrency` plus your app's own usage.
  - **Don't share a `StreamCollector` across threads you don't own.** The collector itself is thread-safe — `event`, `usage`, and `finish!` synchronize internally and `finish!` is idempotent — but the documented pattern is one collector per stream.
  - **`finish!` is a barrier.** Once a stream is finished, later `event`, `usage`, or `model=` calls raise `FrozenError` instead of mutating a closed collector.
  - **`ActiveSupport::Notifications` subscribers run synchronously** in the caller's thread. Keep them fast or hand off to a background job; otherwise they add latency to every tracked call.
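The pool rule in that list is simple arithmetic; a runnable sketch (the two-connection headroom is an illustrative assumption, not a gem default):

```ruby
# Rule of thumb from the list above: every Puma thread and Sidekiq worker
# may hold an ActiveRecord connection while recording a tracked call, so
# the pool must cover both, plus headroom for the app's own queries.
def minimum_ar_pool(puma_max_threads:, sidekiq_concurrency:, headroom: 2)
  puma_max_threads + sidekiq_concurrency + headroom
end

minimum_ar_pool(puma_max_threads: 5, sidekiq_concurrency: 10)
# => 17
```

Set `pool:` in `config/database.yml` to at least this number for the affected processes.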
@@ -525,9 +611,10 @@ The gem is designed for multi-threaded hosts — Puma with `max_threads > 1` and
 
  ## Known limitations
 
  - `:block_requests` is a best-effort guardrail, not a hard cap. Concurrent workers can pass preflight simultaneously and collectively overshoot the budget. Use an external quota system if you need a transactional cap.
+ - Official SDK integrations currently cover non-streaming calls. Use Faraday middleware or `track_stream` for SDK streaming until stable stream wrappers are added.
  - Streaming capture relies on the provider emitting a final-usage event (OpenAI needs `stream_options: { include_usage: true }`); missing events are recorded with `usage_source: "unknown"` so they surface on the Data Quality page.
  - `provider_response_id` is stored only when the provider exposes a stable response object ID. Missing IDs stay `nil` and surface on the Data Quality page.
- - Cache write TTL variants (1h vs 5min writes) not modeled separately.
+ - Cache write TTL variants (1h vs 5min writes) are not modeled separately.
 
  ## Development
 
@@ -535,8 +622,7 @@ Architecture rules for future changes live in [`docs/architecture.md`](docs/arch
 
  ```bash
  bundle install
- bundle exec rspec
- bundle exec rubocop
+ bin/check
  ```
 
  ## License