llm_cost_tracker 0.10.0 → 0.12.0 (RubyGems version diff, Mend)

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (209)

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: e84eab476358c65154ce99e1c1b95d91406094a1221e3724253b3a2efb471ed5
-  data.tar.gz: 8793c30b7cdbd161cc9ea4b242969f8c3ba61e164aa5779c84e33e4d5ac38db5
+  metadata.gz: a3bb624cf9437e2ab972021128ab552b48b16c9b8d209429fb264062837e8547
+  data.tar.gz: 8785221213ed888a592b312e5a734193637653930ef9652ece73f650cb920eb5
 SHA512:
-  metadata.gz: b377e608cf26ae425ac6eda48213878b14d108aa559c56035279e83f0414370db21fc05f8115d27094a92beaa380959b50a21c46730f7cc4ee8b63f8db7f2ad8
-  data.tar.gz: b53b1587f4080f74140cc73854edd1d1916b66a6e8e4b520a33b4897a0d5b428798c9cd6f248e189628d5fd6f71460727306523de0ce7268ced2c0a183b9b5bd
+  metadata.gz: c223c14dbfe3e2ebf61930175ae7607c2a4a05f502962963312c4ec929965242fccab115eda9d1426d6e331d7fb23ad811f73c9ba8a795cb3262c3d49a60eb45
+  data.tar.gz: 6b8e3ef019f41907909bb9f07eb58085dd6355a442699d52440de564c97fb6fb65979ee01271e4741bd9411ce7ef6a3b69102accd764fdf419d42db1bdb2f6e8
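
The digests in `checksums.yaml` can be checked against a downloaded gem. What follows is a minimal sketch, assuming you have already unpacked the `.gem` archive and read its `metadata.gz` and `data.tar.gz` members into memory as raw bytes; `checksum_mismatches` is a hypothetical helper name, not part of RubyGems.

```ruby
require "digest"
require "yaml"

# Compare the digests recorded in a checksums.yaml document against the
# actual bytes of the archive members. `members` maps a member name such as
# "data.tar.gz" to its raw bytes. Returns a list of mismatch descriptions;
# an empty list means every recorded digest matched.
def checksum_mismatches(checksums_yaml, members)
  recorded = YAML.safe_load(checksums_yaml)
  algorithms = { "SHA256" => Digest::SHA256, "SHA512" => Digest::SHA512 }

  algorithms.flat_map do |name, digest_class|
    (recorded[name] || {}).filter_map do |member, expected|
      bytes = members[member]
      next "#{name} #{member}: member missing" if bytes.nil?

      actual = digest_class.hexdigest(bytes)
      "#{name} #{member}: expected #{expected}, got #{actual}" unless actual == expected
    end
  end
end
```

An empty result for the 0.12.0 archive would mean its members hash to the `+` values shown in the hunk above.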

data/CHANGELOG.md CHANGED

@@ -2,6 +2,88 @@
 Format: [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). Versioning: [SemVer](https://semver.org/spec/v2.0.0.html).
+## [Unreleased]
+## [0.12.0] - 2026-06-04
+### Added
+- `bin/rails llm_cost_tracker:rebuild_rollups` rebuilds the `llm_cost_tracker_call_rollups` cache from the calls ledger — populate it after turning on `config.cache_rollups` for an app with existing calls, or resync it if rollup totals ever drift from the calls.
+### Removed
+- BREAKING: the experimental `Reconciliation` subsystem (provider invoice import + diff, the `/reconciliation` dashboard page, `bin/rails llm_cost_tracker:reconcile:*` rake tasks, `config.reconciliation_enabled`, `config.reconciliation_importers`, the `llm_cost_tracker:reconciliation` generator, and the `llm_cost_tracker_provider_invoices` / `_provider_invoice_imports` tables) is gone. It was never finished and never billing-accurate. `calls.provider_response_id` (captured on every call) already covers invoice cross-reference; if invoice-vs-ledger reconciliation ships again it lives in a separate gem. Existing installs can drop the two tables — see [docs/upgrading.md](docs/upgrading.md#v011--v012-unreleased).
+- `config.instrument :gemnii` (or any other typo / unknown integration name) no longer raises at config time — it now logs `Logging.warn("Unknown integration: :gemnii. Known: ...")` once when integrations install, and `bin/rails llm_cost_tracker:doctor` shows the unknown name as a `:warn` row so the typo is visible without crashing boot.
+- Pre-call budget enforcement for Azure-hosted OpenAI calls now keys on `"azure_openai"` (matching the recorded `Call.provider`), so `pricing_overrides` for Azure rates actually gate the call. Previously it always keyed on `"openai"` regardless of the SDK client's `base_url`.
+- BREAKING: removed the `batch:` keyword argument from `LlmCostTracker.track`, `LlmCostTracker.track_stream`, and `stream.usage` (inside `track_stream` blocks). Signal a batch-tier call via `pricing_mode: :batch` (or any pricing_mode containing the `batch` token like `:batch_flex`) — that's the single source of truth now. Previously `batch:` and `pricing_mode:` could disagree, especially after request-side pricing_mode merge inside `Tracker.record` overwrote the parser's mode but left the stored `batch` flag stale, so `calls.batch` could read `true` while `calls.pricing_mode` read `flex` (or vice versa) for the same row.
+- The `bin/rails llm_cost_tracker:prices:explain` rake task (and `LlmCostTracker::Pricing.explain`) is removed — the dashboard's Data Quality page surfaces unknown-pricing models and their effective rates instead.
+### Changed
+- The RubyLLM SDK integration now requires `ruby_llm >= 1.15.0` (was `>= 1.14.1`).
+- Engine no longer adds `tag` / `tag_value` to Rails `filter_parameters` — the Symbol filter was substring-matching unrelated host-app params (`tags`, `meta_tag`, etc.) into `[FILTERED]`. `Tags::Sanitizer` continues redacting secret-shaped tag values at storage.
+- BREAKING: the serialized event `cost` (the `llm_request.llm_cost_tracker` notification payload and the async-ingestion inbox payload) is now `{ components: {...}, total:, currency: }` (was flat with a top-level `total_cost:`). Notification subscribers should read `cost[:total]`; `ingestion: :async` rolling deploys should drain the inbox first — see [docs/upgrading.md](docs/upgrading.md#v011--v012-unreleased).
+- BREAKING: `pricing_mode` in the `llm_request.llm_cost_tracker` notification payload is now a String (e.g. `"batch"`, `"fast_data_residency"`), not a Symbol — subscribers matching it against a Symbol must compare to the String.
+- BREAKING: `LlmCostTracker.track(tokens:)` now takes the same `_tokens`-suffixed keys as `stream.usage` and the stored columns — `input_tokens`, `output_tokens`, `cache_read_input_tokens`, `audio_input_tokens`, etc. (was the short `input`, `output`, `cache_read_input`, …). Update manual `track` calls. Pricing-file / `pricing_overrides` field names are unchanged — they stay `input`, `output`, … (per-component rates, a separate vocabulary).
+### Fixed
+- RubyLLM streaming chats to Anthropic and Gemini (`chat.ask { |chunk| }`) are now recorded. Previously the streamed response's raw body was the SSE text rather than the parsed hash the integration read, so an internal lookup raised and the call was silently dropped from the ledger. Blocking RubyLLM calls were unaffected.
+- A malformed or very long `pricing_mode` (or a provider `service_tier` / `speed` with many underscore-separated tokens) no longer hangs cost calculation — the call lands `cost_status: unknown` instead of pinning a CPU.
+- Gemini preview models dated with a four-digit year (e.g. `gemini-2.5-flash-preview-09-2025`) now fall back to the stable model's price instead of landing `cost_status: unknown`.
+- A typo'd price-key prefix in `pricing_overrides` or a custom `prices_file` (e.g. `bath_input` for `batch_input`, or any unknown `<mode>_<component>`) now logs an `Unknown price keys` warning and is ignored, instead of being silently accepted so the override quietly never applied at the intended mode/tier.
+- Anthropic responses with `service_tier: "priority"` now keep `:priority` as their pricing_mode instead of being silently billed at standard rates — committed-tier customers get `cost_status: unknown` (signaling to add `priority_input`/`priority_output` to `pricing_overrides`) instead of an over-counted USD figure that ignores their commitment discount.
+- OpenAI's `scale` enterprise tier and `priority` tier are now recognized as pricing modes (no more `Logging.warn` about unknown tokens); calls land as `cost_status: unknown` when negotiated rates are absent so you can add them via `pricing_overrides`.
+- Gemini responses echoing `usageMetadata.serviceTier: "unspecified"` (the default) now resolve to standard pricing instead of warning about an unknown token and landing as `cost_status: unknown`.
+- Anthropic SDK batch results (`client.messages.batches.results_streaming(id).each`) land in the ledger with `pricing_mode: :batch` and the per-result `provider_response_id`, with a same-process best-effort dedup against already-ledgered `provider_response_id`s so re-iterating the stream doesn't duplicate rows (concurrent retrieves from multiple processes can still race; async-mode rows in the inbox aren't checked until they drain).
+- OpenAI SDK batch processing auto-captures: `client.batches.retrieve(id)` on a completed batch downloads the output JSONL and emits one ledger event per response with `pricing_mode: :batch` and the per-response `provider_response_id`, with the same best-effort dedup as Anthropic batches.
+- OpenRouter pricing is now scraped via `openrouter.ai/api/v1/models`, so RubyLLM-routed OpenRouter calls (e.g. `openrouter/openai/gpt-4o`) get a real `total_cost` from the next `prices_file` refresh instead of landing as `cost_status: unknown`. The scrape also captures `image` / `audio` per-token rates so OpenRouter calls with multimodal inputs bill against the correct bucket instead of folding image/audio tokens into the text-input rate.
+- Misspelled `pricing_mode:` values now log a `Logging.warn` listing the unrecognized token (e.g. `:bach` for `:batch`) so the resulting `cost_status: unknown` call surfaces a typo instead of silently absorbing it; the warn fires once per unique token.
+- Whisper-style transcriptions whose response carries `usage.type = "duration"` now emit a `transcription_minute` line item (quantity = `ceil(seconds / 60)`) across both the OpenAI Ruby SDK patch and the Faraday / RubyLLM HTTP path; the call previously recorded with zero tokens and no line item, so audio-minute usage was invisible.
+- OpenAI Responses-API `image_generation_call` and `computer_call` output items now emit line items so per-call hosted-tool usage shows up on the dashboard alongside the existing `web_search_call` / `file_search_call` / `code_interpreter_call` coverage.
+- `LlmCostTracker.track(..., enforce_budget: true)` now actually raises `BudgetExceededError` pre-call when the estimated cost (token cost plus priced service line items) overshoots the budget, even when `budget_exceeded_behavior: :notify` is configured — previously the kwarg silently no-op'd unless policy was already `:block_requests`.
+- `Call#pricing_snapshot.rates` now includes per-charge rates for non-token service line items (web search, MCP calls, TTS character billing, etc.) — previously only token rates were captured, so audit/replay of service-charge pricing had no record of the rate that was actually applied.
+- Tags with invalid keys (e.g. containing whitespace or characters outside `[\w.-]`) are now skipped at write with a `Logging.warn` instead of being silently written and then raising `InvalidFilterError` on dashboard read.
+- A raising `default_tags` proc is now captured by `Logging.warn` and falls back to empty default tags, so a broken user callback doesn't take down every tracked call.
+- `LlmCostTracker::Ingestion::Worker.shutdown!(drain: true)` always attempts the final inbox flush even if waking the worker thread raises, so pending inbox rows aren't left when the host process exits.
+- Gemini preview-dated models (e.g. `gemini-2.5-flash-preview-04-17`) now resolve to the stable entry's pricing — previously the `preview-MM-DD` suffix didn't match the dated-snapshot regex so the call landed as `cost_status: unknown`.
+- Gemini parser now reads `usageMetadata.serviceTier` from the response body in addition to the `x-gemini-service-tier` header, so tier-aware pricing applies when only the body carries the tier signal.
+- Line-item and pricing-snapshot `currency` is now stored uppercase regardless of `prices_file` casing, so a `prices_file` with `currency: "eur"` shows up as `EUR` everywhere and service-line items don't get partitioned out of header totals on a case mismatch with cost-data currency.
+- Async-ingestion inbox rows reaching `MAX_ATTEMPTS_BEFORE_QUARANTINE` now log a `Logging.warn` (with row ids) at the moment they quarantine, so production sees the event in `Rails.logger` instead of needing to run `bin/rails llm_cost_tracker:doctor` to discover it.
+- Dashboard "Setup required" page now flags missing `llm_cost_tracker_ingestion_inbox_entries` and `llm_cost_tracker_ingestion_leases` tables when `ingestion: :async` is configured — previously the drift only surfaced as a worker boot crash.
+- Gemini image-generation models (`gemini-2.5-flash-image`, `gemini-3-pro-image-preview`, `gemini-3.1-flash-image-preview`) and stable preview text models (`gemini-3.1-pro-preview`, `gemini-2.5-flash-lite-preview-09-2025`, etc.) are no longer dropped by the price scraper — they flow into the pricing snapshot on the next refresh cycle.
+- Gemini parser splits `IMAGE`-modality tokens from `promptTokensDetails` / `candidatesTokensDetails` (mirroring the existing AUDIO handling), so image-output usage from Gemini calls routes to `image_output` rates instead of falling into the text-output bucket.
+- RubyLLM SDK integration over-subtracted cache-read tokens from recorded `input_tokens` on chat completions, so the figure landed in the ledger short by the cache-read amount; the gem now passes RubyLLM's net `input_tokens` through unchanged.
+- RubyLLM SDK integration captures `service_tier` from response bodies across Anthropic, OpenAI, and Gemini — previously the field was read from the wrong JSON path so batch and flex modes silently priced against standard rates.
+- RubyLLM SDK integration records the provider's response id in `provider_response_id` (previously always nil), so each ledger row carries the upstream id you can cross-reference against provider invoices and logs.
+- RubyLLM Anthropic chat completions split 1-hour and 5-minute cache writes into separate token buckets so 1h writes bill at the 2x extended rate instead of being lumped into the 5m bucket at 1.25x.
+- Async-inbox `total_cost` now round-trips through the JSON payload without losing precision; previously the payload coerced `BigDecimal` to `Float` and dropped digits past ~15 significant figures, so high-volume aggregate billing under `ingestion: :async` came out systematically short. BREAKING for subscribers to the `llm_request.llm_cost_tracker` `ActiveSupport::Notifications` event: `payload[:cost]` numeric values are now decimal strings (was `Float`) — wrap with `BigDecimal(value)` before arithmetic.
+## [0.11.0] - 2026-05-21
+### Added
+- A "Pricing" page under the dashboard sidebar's new "Reference" group lists every model's rates from `pricing_overrides`, your `prices_file`, and the bundled fallback as separate tabs; the active source (first non-empty in priority order) is highlighted, with last-updated date and currency next to the row count.
+- The Overview page now has Provider and Model filter pills next to Date, so cost slices can be scoped without leaving Overview.
+- The Calls page now has a Stream filter pill for narrowing to streaming or non-streaming calls.
+### Changed
+- Models, Calls, and Tags-breakdown tables sort by clicking a column header — ▲/▼ shows direction and clicking again reverses it. URLs use `?sort=<column>&dir=asc|desc`. The previous "Recent / Most expensive / Largest input / Slowest" sort buttons on Calls are gone; use `?sort=cost&dir=desc`, `?sort=latency&dir=desc`, etc. instead.
+- `?sort=unknown_pricing` on `/calls` is replaced by the `?cost_status=incomplete` filter — the Data Quality "Incomplete pricing by model" panel's "Calls" button uses the new URL.
+- The dashboard sidebar stays sticky while you scroll, with pages grouped into "Insights" (Overview / Models / Calls / Tags / Data Quality) and "Reference" (Pricing / Reconciliation if enabled).
+- Dashboard pages use native `<details>` filter popovers instead of inline form rows. Only one popover stays open at a time, and Esc closes the open one.
+- The Calls show page tucks the pricing snapshot and metadata JSON behind expandable `<details>` blocks; the redundant per-component "Tokens" and "Cost" lists are dropped — the Token mix / Cost mix bars already carry the breakdown.
+- The Data Quality page groups stat cards under "Volume" and "Issues" headers, hides zero-count issue cards, and replaces the whole "Issues" block with a single "No data-quality issues in this slice" message when nothing is wrong.
+- The daily spend chart text no longer stretches horizontally on wide screens.
+- `bin/rails llm_cost_tracker:doctor` groups checks under Setup / Schema / Data integrity / Operations headers, renders each row with a `[✓]` / `[!]` / `[x]` status icon (green / yellow / red on a TTY), and aligns the columns so the message stays readable.
+- The dashboard "Setup required" screen clears after `bin/rails db:migrate` without a Rails server restart, and the schema-drift details render as a monospaced block instead of indented bullets.
+### Fixed
+- The `upgrade_call_rollups_provider`, `upgrade_provider_invoice_imports_provider`, and `upgrade_provider_invoices_metadata_index` migrations no-op when their target table doesn't exist (installs that never opted into `cache_rollups` or reconciliation) instead of crashing.
+- `llm_cost_tracker:*` rake tasks ran their body twice on each invocation, so `doctor` and `report` printed every line twice and `prices:refresh` re-scraped on each run.
 ## [0.10.0] - 2026-05-17
 ### Added
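
The 0.12.0 entries above change what notification subscribers receive: `cost` becomes `{ components: {...}, total:, currency: }`, its numeric values arrive as decimal strings, and `pricing_mode` is a String. A subscriber-side reader might look like the sketch below. `read_cost` is a hypothetical helper, not part of the gem, and treating `pricing_mode` as underscore-separated tokens is an assumption drawn from the changelog wording.

```ruby
require "bigdecimal"

# Hypothetical subscriber-side helper (not part of llm_cost_tracker).
# Takes a payload shaped like the 0.12.0 changelog describes and returns
# the total as a BigDecimal, the currency, and whether the pricing mode
# contains the `batch` token.
def read_cost(payload)
  cost = payload.fetch(:cost)
  {
    # Totals are decimal strings in 0.12.0; BigDecimal keeps every digit.
    total: BigDecimal(cost.fetch(:total).to_s),
    currency: cost.fetch(:currency),
    # pricing_mode is a String such as "batch" or "batch_flex" (assumed
    # to be underscore-separated tokens).
    batch: payload[:pricing_mode].to_s.split("_").include?("batch")
  }
end

# Illustrative payload; the component names and amounts are made up.
sample = {
  pricing_mode: "batch_flex",
  cost: { components: { input: "0.00225", output: "0.0024" }, total: "0.00465", currency: "USD" }
}
summary = read_cost(sample)
```

Summing `summary[:total]` values with BigDecimal arithmetic avoids the Float rounding that the async-inbox fix in this release was addressing.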

data/README.md CHANGED

@@ -15,7 +15,10 @@ attribution only.
 Requires Ruby 3.4+, Rails 7.1+, PostgreSQL or MySQL.
-![Dashboard overview](docs/dashboard-overview.png)
+<picture>
+  <source media="(prefers-color-scheme: dark)" srcset="docs/dashboard-overview-dark.png">
+  <img alt="LLM Cost Tracker dashboard" src="docs/dashboard-overview-light.png">
+</picture>
 ## Quickstart
@@ -29,16 +32,17 @@ gem "openai"
 bin/rails llm_cost_tracker:setup
 ```
-Runs the install generator, drops a price snapshot, migrates the database, and verifies via `llm_cost_tracker:doctor`.
+Runs the install generator, drops a price snapshot, migrates the database, and verifies via `llm_cost_tracker:doctor`. The generated `config/initializers/llm_cost_tracker.rb` looks like:
 ```ruby
-# config/initializers/llm_cost_tracker.rb
 LlmCostTracker.configure do |config|
   config.default_tags = -> { { environment: Rails.env } }
   config.instrument :openai
 end
 ```
+Edit it in place to add tags, switch on async ingestion, etc.
 Tag your calls to attribute spend:
 ```ruby
@@ -82,7 +86,9 @@ The engine ships without authentication on purpose.
 | Anything else | `LlmCostTracker.track` |
 Streams capture when the provider emits final usage. OpenAI Faraday streams
-need `stream_options: { include_usage: true }`.
+get `stream_options: { include_usage: true }` auto-injected so the final
+usage chunk lands in the ledger (opt out via
+`config.auto_enable_stream_usage = false`).
 ## What it isn't
@@ -99,7 +105,7 @@ For batch jobs, internal gateways, or anything without an SDK/Faraday hook:
 LlmCostTracker.track(
   provider: :anthropic,
   model: "claude-sonnet-4-6",
-  tokens: { input: 1500, output: 320 },
+  tokens: { input_tokens: 1500, output_tokens: 320 },
   tags: { feature: "summarizer", user_id: current_user.id }
 )
 ```
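
Existing manual `track` calls that still pass the short token keys need the same rename as the README hunk above. One way to do that in bulk is sketched below. `suffix_token_keys` is a hypothetical helper, not part of the gem, and it assumes every legacy key maps to its new name by appending `_tokens`, which matches the examples the changelog lists but is not confirmed for every key.

```ruby
# Hypothetical migration helper (not part of llm_cost_tracker): rename
# pre-0.12.0 short token keys (input, output, cache_read_input, ...) to the
# `_tokens`-suffixed form. Keys that already carry the suffix pass through.
def suffix_token_keys(tokens)
  tokens.to_h do |key, value|
    name = key.to_s
    name = "#{name}_tokens" unless name.end_with?("_tokens")
    [name.to_sym, value]
  end
end

# Yields a hash equal to { input_tokens: 1500, output_tokens: 320 }.
migrated = suffix_token_keys({ input: 1500, output: 320 })
```

Pricing-file and `pricing_overrides` field names keep the short form according to the changelog, so a helper like this should only be applied to `track(tokens:)` arguments.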