RubyGems - llm_cost_tracker - Versions diffs - 0.7.3 → 0.9.0 - Mend

llm_cost_tracker 0.7.3 → 0.9.0

Files changed (195)

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 6950dae400eac9294a57a0ba2fd2bce7977658837962eafffc3836fa4ab9bd2b
-  data.tar.gz: c398f5271d3d0fa53cb27e1206e418e0242fb9dff73ed2c405903b92dfaf8a48
+  metadata.gz: f17f618b28473afa871c9961a443a34152f4de81f7026d676d62b7e2bd1396d8
+  data.tar.gz: d024b23f0ca6cd117afa5d10faa0a0b96374391a4741ea330b924b5091f665f7
 SHA512:
-  metadata.gz: c52638e31e7eb0f46308312339bd40cfce87227a8c7ec77c94b3af08ffc931c3cffb9566f2ce15ec70a87700084e5e9bb6d05fe670028b57a12066af4a9ebaf6
-  data.tar.gz: 12da45f4cd8c485bd6fde5f9376bdfa2c8e618abd41e7be11c022878ec005348ca5d98578216170f1b7105dcc9e2c0c4b037cb5394e2085e2526e04ee8d5a885
+  metadata.gz: 0abf684c595b7bc84dfda26ffc62eaabc0c6d91d0b93f1065bf6e824c7867326b7978875d845d3df8be25bfa04ff9091150e0a4cac7f84d835ceaf2f1e2996bb
+  data.tar.gz: 5b9405bf332b2e9e1eae05f0e7d107d4bb76ea71a6602846a16198540f3e0f315f48dbb316bbda24812d4d66c56ba837a42bfe81aeea502238f09e1f0202c6b4
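
The digests above can be checked locally. A minimal sketch, assuming the `.gem` archive has already been unpacked so that `metadata.gz` and `data.tar.gz` are available as files; `sha256_matches?` is a hypothetical helper written for illustration, not part of RubyGems or this gem:

```ruby
require "digest"

# Hypothetical helper: true when the SHA256 hex digest of `bytes`
# equals `expected_hex` (compared case-insensitively).
def sha256_matches?(bytes, expected_hex)
  Digest::SHA256.hexdigest(bytes) == expected_hex.downcase
end

# Usage sketch (the path is an assumption about where the archive was unpacked):
#   expected = "d024b23f0ca6cd117afa5d10faa0a0b96374391a4741ea330b924b5091f665f7"
#   sha256_matches?(File.binread("unpacked/data.tar.gz"), expected)
```

Reading the file with `File.binread` avoids any encoding translation, so the digest is computed over the exact bytes on disk.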

data/.ruby-version ADDED

@@ -0,0 +1 @@
+3.4.5

data/CHANGELOG.md CHANGED

@@ -4,6 +4,179 @@ Format: [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). Versioning: [S
 ## [Unreleased]
+## [0.9.0] - 2026-05-12
+0.9 leans the default install: only `calls`, `call_line_items`, and `call_tags`
+are mandatory. Durable ingestion, rollup-cached budget reads, and provider
+invoice reconciliation are opt-in behind config flags and dedicated generators.
+Plus expanded SDK capture (OpenAI embeddings/audio/images/moderation, RubyLLM
+paint/moderate), correct handling of Anthropic data residency and Priority
+Tier, and a security-hardened dashboard. Existing installs need a migration —
+see [Upgrading](docs/upgrading.md).
+### Added
+- **Experimental:** opt-in provider invoice reconciliation. Set `config.reconciliation_enabled = true` and run `bin/rails generate llm_cost_tracker:reconciliation`. Public surface: `LlmCostTracker::Reconciliation.import / .diff`, `config.register_reconciliation_importer(:source) { … }`, rake tasks `llm_cost_tracker:reconcile:import` and `:reconcile:diff`. Doctor warns when drift exceeds 5% or imports go stale past 14 days. See [Configuration](docs/configuration.md#reconciliation-experimental-opt-in).
+- Dashboard Data Quality page now shows a "Streaming health by provider" breakdown (streams, with-usage, unknown, unknown share) so a misconfigured OpenAI-compatible host shipping streams without `stream_options.include_usage` is visible at a glance.
+- Dashboard tag detail page drills into a single value via `?tag_value=…` with total cost, call count, average per call, and a daily spend timeseries.
+- Bundled rates for OpenAI embeddings (`text-embedding-3-small` / `-3-large` / `-ada-002`, including 50% batch discount) and token-priced transcription (`gpt-4o-transcribe`, `gpt-4o-mini-transcribe`). Token-priced transcription splits audio and text inputs at their separate rates. DALL-E and Whisper still record as zero-token visibility events until their per-image / per-minute pricing components land.
+- OpenAI `gpt-image-1` / `gpt-image-1-mini` / `gpt-image-1.5` / `gpt-image-2` priced per image-token at their published standard rates, with `batch_*` shadow rates for the 50% batch tier. (Earlier preview snapshots stored only the batch rates, which silently halved image-generation costs.) The SDK integration extracts `usage.input_tokens_details.image_tokens` for image-as-input flows (edits / variations) and treats `usage.output_tokens` as image output. Requires the new `bin/rails generate llm_cost_tracker:upgrade_image_tokens` migration on v0.8 → v0.9 upgrades.
+- OpenAI `tts-1` / `tts-1-hd` priced per character (request `input.length`). `gpt-4o-mini-tts` is left as a zero-cost visibility event because its tokens are not exposed to the client.
+- OpenAI SDK integration now also patches `Embeddings#create`, `Images#generate` / `#edit` / `#create_variation`, `Audio::Transcriptions#create`, `Audio::Speech#create`, `Moderations#create`, and `Chat::Completions#stream`. Calls without provider-reported usage record as zero-token visibility events.
+- RubyLLM SDK integration also records `Provider#paint` and `Provider#moderate`.
+- `bin/rails generate llm_cost_tracker:upgrade_call_rollups_provider` writes the v0.8 → v0.9 migration that adds the `provider` column and swaps the unique index. Re-runs are no-ops.
+- [EU AI Act record-keeping guide](docs/eu_ai_act.md) — maps the ledger fields and `llm_cost_tracker:prune` retention to Article 26(6) deployer obligations (≥ 6-month retention, traceability, attribution tags). Not legal advice.
+### Fixed
+- Subscriber failures during `Tracker.record` no longer lose the event — the ledger write happens first; subscriber errors are caught and logged.
+- Header `total_cost` no longer mixes currencies. Mismatched service-line costs keep their per-line currency and are excluded from the header total (with a warning).
+- Budget reads aggregate across all rollup currencies instead of being silently scoped to USD only.
+- `bin/rails llm_cost_tracker:setup` no longer fails with `Missing Thor class for invoke llm_cost_tracker:prices`, is idempotent on re-runs, and surfaces a friendly error when the database is unreachable.
+- Stream events are no longer lost when finalization raises. The collector retries on the next `finish!`. Abandoned streams (wrapped but never iterated) emit a usage event instead of disappearing.
+- Faraday streaming overflow keeps the buffer accumulated up to the limit (matching the SDK collector) instead of dropping all events.
+- Edits to `config.prices_file` are picked up without a gem reload — the lookup cache invalidates on file mtime changes.
+- Models flagged with `_source: "manual"` in the local prices file are preserved through `prices:refresh` when the remote registry does not claim the same key.
+- Anthropic Priority Tier no longer falls to `cost_status: unknown`. It's a throughput commitment, not a per-token surcharge — both the SDK integration and the Faraday parser treat `service_tier: "priority"` as standard pricing.
+- Anthropic `data_residency` mode triggers on `inference_geo: "us"` only — the documented +1.1x uplift tier. Earlier preview ranges that listed `"eu"` were incorrect; EU data residency runs through Bedrock Frankfurt or Vertex Belgium with separate pricing, not the Anthropic API.
+- Anthropic `web_fetch_request` is recorded with a `$0` rate (Anthropic bills web fetch via standard tokens, not per fetch). The scraper picks up the "no additional charges" wording so `prices:refresh` keeps it accurate.
+- OpenAI `web_search_call` is now priced model-aware. The legacy `web_search_preview` tool routes to `web_search_preview_request_reasoning` ($10/1k for gpt-5/o-series) or `web_search_preview_request_non_reasoning` ($25/1k for everything else), matching OpenAI's three published web-search billing paths. `gpt-5-chat-latest` and dotted variants (`gpt-5.1-chat-latest`, `gpt-5.2-chat-latest`, …) are classified as non-reasoning despite the `gpt-5` prefix.
+- Anthropic Cost API reconciliation now ingests rows against live admin payloads (preview builds expected obsolete field names and produced zero rows). `service_tier: "batch"` and `inference_geo: "us"` are the only dimensions that promote a row's `pricing_mode`.
+- Reconciliation diff windows are anchored in UTC; non-UTC servers no longer skew the window.
+- Reconciliation provider totals sum only invoices fully contained in the diff window. Partially overlapping invoices no longer count at their full `billed_amount`.
+- Reconciliation diff window upper bound is now exclusive of midnight on the day after `period_end`. Calls tracked at exactly `00:00:00.000` of the next month no longer get counted in both periods.
+- `Reconciliation.import` / `Reconciliation.diff` accept (and require, for unmapped sources) an explicit `provider:`. Built-in mappings cover `openai`, `openai_usage`, `anthropic`, `anthropic_usage`, `gemini`. CSV and other custom sources must pass `provider:` (or be derivable from a prior import's metadata) — the previous silent fall-through summed local calls across every provider.
+- Reconciliation import errors no longer echo the exception verbatim into the dashboard flash. The full trace goes to logs; the alert shows the exception class only.
+- `Reconciliation::Importer` works on MySQL/Trilogy installs (adapter-aware `upsert_all`).
+- Reconciliation `external_id` is namespaced by `source/provider` for sources that carry multiple providers (e.g. `csv/openai:row-1` vs `csv/anthropic:row-1`). The same CSV row id imported under two providers no longer collides on the unique index. Native sources keep their `openai:` / `anthropic:` / `gemini:` prefix.
+- Reconciliation dashboard groups latest-period and drill-down by source, provider, and currency. A second provider importing under the same source no longer hides its drift in the first provider's row.
+- OpenAI Cost API reconciliation tags the organization id under `provider_workspace_id` so org-level scope filters work.
+- Reconciliation diff drill-down is capped at the top 100 unmatched rows by amount with totals counted separately, so the dashboard stays responsive on large monthly reconciliations. Pass `DRILLDOWN_LIMIT=all` to `rake llm_cost_tracker:reconcile:diff` to see every row.
+- Period totals fall back to live aggregation from `llm_cost_tracker_calls` when `cache_rollups = true` but the rollups table has no row for the period. Budget reads and dashboard totals no longer read zero during a rollup rebuild window after the v0.9 upgrade migration.
+- OpenAI hosted-tool service line items (`web_search_call`, `file_search_call`, `code_interpreter_call`, `mcp_call`) are recorded when the SDK returns the type as a Symbol. Previously these line items were silently dropped on SDK-shaped responses.
+- Image generation streams (`gpt-image-1.5`, `gpt-image-2`) and audio streams no longer overflow on a single base64 chunk; the final usage event is captured and tokens get priced.
+- Interrupted Anthropic and Gemini streams record the right provider name instead of `provider: "unknown"`.
+- Tag sanitizer redacts secrets before truncating, so the leading bytes of a secret can't survive a small `max_tag_value_bytesize`. Nested `[REDACTED]` markers stay whole regardless of the byte budget.
+- `Pricing::Registry` rejects non-finite price values (`Infinity` / `NaN`) alongside negatives.
+- Reconciliation `ProviderInvoiceImport.started_at` is the wall-clock import time. Backfills with a historical `imported_at` no longer invert `resume_cursor_for` ordering.
+- Reconciliation install migration is re-runnable on installs that already carry the v0.8 placeholder tables.
+- Pre-release v0.9 deployers who imported reconciliation rows before these fixes need `LlmCostTracker::ProviderInvoice.delete_all` and a re-import — the `external_id` prefix and the OpenAI organization-id field both changed shape.
+- Budget reads survive the v0.9 upgrade migration's rollup truncation — a partial rollup row no longer hides historical pre-migration spend in the same period.
+- Streaming requests that hit `unknown_pricing_behavior = :raise` after the response is received raise without recording a synthetic zero-token event.
+- Reconciliation doctor checks each `source / provider / currency` combination separately; a stale Anthropic CSV import no longer hides behind a fresh OpenAI one on the same source.
+- Reconciliation imports normalise `currency` to upper case so `usd` and `USD` no longer split the diff.
+- Reconciliation dashboard and CLI render `n/a` for invoice rows imported with no `billed_amount` instead of `$0.00`.
+- Reconciliation diff drill-down shows the actual unmatched rows even when most invoices match — small-amount unmatched rows are no longer hidden by a wall of matched big-amount rows.
+- OpenAI SDK Responses calls bill image and text tokens separately for `gpt-image-*` models, matching the Faraday parser.
+- OpenAI SDK integration captures the request when the caller passes a typed request object (anything that responds to `to_h`) instead of dropping it.
+- Custom prices files with `Infinity` / `NaN` service-charge rates fail to load with a clear error instead of silently corrupting cost math.
+- High-cardinality tag filters (`Call.by_tag(:tenant_id, …)`) now hit a composite index instead of scanning. Existing installs run `bin/rails generate llm_cost_tracker:upgrade_call_tags_key_value_index && bin/rails db:migrate`.
+- Reconciliation diff over a large invoice set uses an index scan on the new `(source, currency, period_start)` composite.
+- Doctor warns when provider invoice rows are stored with non-uppercase currency and points at the one-line backfill SQL, instead of the dashboard silently zeroing out diffs against legacy lowercase data.
+- A request-level `pricing_mode` no longer overrides what the provider reports back on a streamed response. Provider-reported standard wins over a request that asked for priority.
+- The new generators (`call_rollups`, `durable_ingestion`, `reconciliation`, `upgrade_call_rollups_provider`) are reachable through `bin/rails generate llm_cost_tracker:<name>`.
+- Faraday streaming captures no longer silently degrade to `usage_source: :unknown`.
+- Dashboard filters apply the default 30-day range when `from`/`to` params are missing.
+- `provider_api_key_id` and `provider_workspace_id` are masked on the call detail page and CSV export. Host apps that added a `metadata` column written as a JSON string now flow through the same masking instead of rendering the raw column.
+- Faraday parser tracks OpenAI `/v1/images/*` and `/v1/audio/transcriptions`/`/v1/audio/translations` so raw-Faraday image generations and transcriptions land in the ledger. `/v1/audio/speech` and `/v1/moderations` are also matched so `Tracker.enforce_budget!` gates them; they do not record a row because OpenAI does not return token usage for those endpoints.
+- OpenAI SDK `Audio::Translations#create` is now patched alongside `Audio::Transcriptions#create`.
+- OpenAI SDK `Images#generate` / `#edit` / `#create_variation` no longer double-counts cached input tokens.
+- OpenAI SDK `Responses.create` and Faraday parser both route output to `image_output_tokens` for `gpt-image-*` models even when the response omits `output_tokens_details.image_tokens`.
+- OpenAI SDK `Images#generate` / `#edit` / `#create_variation` no longer drops the text-output remainder when `output_tokens_details` reports only `image_tokens`. The remainder lands as `output_tokens`, matching the Faraday parser.
+- RubyLLM `Provider#paint` for `gpt-image-*` models records image output tokens under `image_output_tokens` so image rates apply.
+- RubyLLM integration treats Anthropic `service_tier: "priority"` as standard pricing (Priority Tier is committed throughput, not a surcharge). Previously these calls fell to `cost_status: unknown` because the literal `"priority"` was passed through as `pricing_mode`.
+- Reconciliation diff falls back to live `llm_cost_tracker_call_line_items` aggregation when the rollup fast path finds no row for the period. Without the fallback, past-month diffs after the v0.9 `upgrade_call_rollups_provider` migration (which truncates rollups) would report `local_total = $0` until events repopulate the new schema.
+- Provider-invoice reconciliation falls back to `match_basis: "model"` (was `period_only`) when an invoice carries only a model identifier.
+- `prices:refresh` bootstraps a missing local pricing file instead of failing with `Errno::ENOENT`.
+- Doctor's durable-inbox verification no longer leaves a synthetic inbox row behind when `Tracker.track` raises `BudgetExceededError`.
+- Install-generator snippet in [Upgrading](docs/upgrading.md) for the reconciliation table now matches the shipped index (`(source, currency, period_start)`).
+- Doctor catches schema drift on required columns, required indexes, and the foreign key on `call_line_items` before the first row is inserted.
+- Service-charge rows render `n/a` instead of `$0.00` when `cost_status` is `unknown`, so unpriced charges don't masquerade as zero-cost.
+- Enabling `:ruby_llm` together with `:openai` / `:anthropic` logs a warning at install — RubyLLM routes through HTTP, so calls would otherwise be double-counted. Pick one path per provider.
+### Changed
+- BREAKING: `bin/rails generate llm_cost_tracker:install --dashboard` no longer writes the `mount LlmCostTracker::Engine` line into `config/routes.rb`. The CLI prints the snippet wrapped in your auth instead — leaving the dashboard auto-mounted would expose spend, tags, and provider IDs to anyone who can reach the host. Add the route under your authentication block.
+- BREAKING: `config.durable_ingestion` defaults to `false`. Tracking writes go directly to the ledger from the request thread; the durable inbox + worker + leases tables are no longer created by the install generator. Existing installs keep their tables — set `config.durable_ingestion = true` to keep the inbox path. Fresh installs that need durability run `bin/rails generate llm_cost_tracker:durable_ingestion` and flip the flag.
+- BREAKING: `config.cache_rollups` defaults to `false`. Budget reads aggregate live from `llm_cost_tracker_calls`; the rollup table is no longer created by the install generator. Existing installs keep their table — set `config.cache_rollups = true` to keep the rollup fast path. Fresh installs run `bin/rails generate llm_cost_tracker:call_rollups` and flip the flag.
+- BREAKING: `llm_cost_tracker_call_rollups` gains a `provider` column; unique index moves from `(period, period_start, currency)` to `(period, period_start, currency, provider)`. See [Upgrading](docs/upgrading.md).
+- BREAKING: `llm_cost_tracker_calls` gains `image_input_tokens` and `image_output_tokens` columns (default 0) so OpenAI `gpt-image-*` models can bill image-token rates separately from text. Run `bin/rails generate llm_cost_tracker:upgrade_image_tokens && bin/rails db:migrate`. CSV exports include the new columns between `audio_input_tokens` / `output_tokens` and `audio_output_tokens` / `total_tokens` respectively — downstream consumers indexing by header name keep working; positional consumers shift by two.
+- BREAKING: `LlmCostTracker::Call.by_tag(key, value)` encodes Hash and Array values with `JSON.generate` to match how `Ledger::Store` writes them. The previous `value.to_s` path produced `"{:k=>v}"`-shaped strings that never matched stored JSON, so filtering by nested attribution silently returned zero rows.
+- Faraday middleware auto-injects `stream_options: { include_usage: true }` on OpenAI and OpenAI-compatible chat-completions streaming requests when the caller hasn't set it. Disable with `config.auto_enable_stream_usage = false`.
+- Header `total_cost` and per-line-item rates can no longer disagree on the context-tier boundary or the resolved pricing mode.
+- OpenAI-compatible chat-completions streams without a final usage chunk log a warning instead of recording silently as `usage_source: "unknown"`.
+- `Tags::Sanitizer` redacts tag values matching known secret patterns (OpenAI/Anthropic, GitHub, AWS, JWT, Slack, Stripe, Google API key, `Bearer …`) regardless of tag key, recurses into nested Hash/Array leaves, and on tag-count overflow keeps the most recently added tags. `Tags::Context` sanitises at block entry so raw values never reach notification subscribers, the Faraday request env, or in-flight stream collectors.
+- Engine dashboard adds CSRF protection on the reconciliation import endpoint, sets `Cache-Control: no-store` on CSV exports, registers `tag` / `tag_value` in `config.filter_parameters`, and emits `X-Frame-Options: DENY` / `Referrer-Policy: same-origin` / a baseline `Content-Security-Policy` on every dashboard response. CSV export is capped at 10,000 rows and respects the requested sort.
+- Dashboard schema drift check runs once per process instead of on every request, cutting per-request DB metadata load. Code reloads in development still trigger a re-check.
+- Dashboard dynamic widths (progress bars, budget fills, stack segments) render via a per-request CSP-nonced `<style>` block instead of inline `style="…"` attributes. Strict `style-src 'self' 'nonce-…'` no longer collapses the visualisations.
+## [0.8.0] - 2026-05-07
+0.8 is a storage rebuild. Tokens and tool/runtime charges share one shape
+(`Billing::LineItem`) and live in a dedicated line items table. Per-component
+cost columns and the standalone service charges table are gone. Several tables
+were also renamed during the cycle. See [Upgrading](docs/upgrading.md) for the
+migration path — there is no rolling-deploy upgrade.
+### Added
+- `llm_cost_tracker_call_line_items` — one row per priced component (text/audio/cached tokens, web search, code execution, grounding, container sessions, file search). Tokens and tool charges share one shape and one `cost_status` semantics.
+- `llm_cost_tracker_call_tags` — normalized attribution. Tag filters and aggregations now JOIN through this table on PostgreSQL and MySQL alike.
+- `llm_cost_tracker_provider_invoices` — placeholder table reserved for v0.9 invoice reconciliation.
+- `Billing::LineItem` value object covering both token and service charges. `LineItem.from_token_usage` and explicit `component_key:` builders price token and tool/runtime quantities through the same path.
+- `Pricing.price_line_items` — single pricing pass for token + tool/runtime line items, used by `Tracker.build_event`.
+- Doctor schema checks for `llm_cost_tracker_call_line_items`, `llm_cost_tracker_call_tags`, and `llm_cost_tracker_provider_invoices`.
+- Doctor sample-based drift checks: header `total_cost` vs `SUM(line_items.cost)` and stored line item cost vs `pricing_snapshot.rates` (RFC §Doctor).
+- `currency` column on `llm_cost_tracker_call_rollups` (default `USD`) with a `(period, period_start, currency)` unique index. v0.8 stays single-currency; the schema is in place so v0.9 multi-currency rollups don't need another migration.
+- `Billing::Components::REGISTRY` now loads from `lib/llm_cost_tracker/billing/components.yml`. Adding a billable component is one YAML row plus a price entry — no more 11-line `Component.new(...)` literals.
+- Anthropic web search and code execution usage emitted as line items with `component_key: :web_search_request` / `:code_execution_request`. SDK integration emits the same line items from native SDK responses, not just Faraday-wrapped ones.
+- OpenAI hosted web search, file search, and Code Interpreter container sessions emitted as line items via both Faraday and SDK integration paths.
+- Gemini grounding queries emitted as line items.
+- `provider_project_id`, `provider_api_key_id`, `provider_workspace_id`, `batch` capture dimensions on `LlmCostTracker.track` and the `Event` payload, persisted as columns on `llm_cost_tracker_calls`.
+- `Pricing::EffectivePrices` permutes compound pricing modes (e.g. `priority_batch_data_residency`) when matching rates, so combined modes resolve correctly.
+- `Pricing::Sync` registry-diff compares `service_charges` rates in addition to model rates.
+- Dashboard polish pass: shared `_filters.html.erb` and `_sort.html.erb` partials, sticky table headers, button hover/active states, spacing/shadow scales, and a full `prefers-color-scheme` dark palette.
+- Bundled audio and tool rates refreshed from current provider pricing.
+### Changed
+- BREAKING: Renamed `llm_api_calls` → `llm_cost_tracker_calls`, `llm_cost_tracker_period_totals` → `llm_cost_tracker_call_rollups`, `llm_cost_tracker_inbox_events` → `llm_cost_tracker_ingestion_inbox_entries`, `llm_cost_tracker_ingestor_leases` → `llm_cost_tracker_ingestion_leases`. Corresponding model `LlmCostTracker::PeriodTotal` renamed to `LlmCostTracker::CallRollup`; ingestion models live under `LlmCostTracker::Ingestion::InboxEntry` and `LlmCostTracker::Ingestion::Lease`.
+- BREAKING: Per-component cost columns removed from `llm_cost_tracker_calls` (`input_cost`, `output_cost`, `cache_read_input_cost`, `cache_write_input_cost`, `cache_write_extended_input_cost`, `cache_write_1h_input_cost`, `audio_input_cost`, `audio_output_cost`). The header keeps `total_cost` only; per-component costs live in line items.
+- BREAKING: `llm_cost_tracker_calls.tags` JSONB column removed in favor of `llm_cost_tracker_call_tags`. `Call#parsed_tags`, `Call.by_tags`, `Call.cost_by_tag`, `Call.group_by_tag`, and the dashboard tag explorer now read the normalized table.
+- BREAKING: `llm_cost_tracker_service_charges` table removed. Tool/runtime rows are stored in `llm_cost_tracker_call_line_items` with `unit != 'token'`.
+- BREAKING: `Billing::ServiceCharge` value object and `LlmCostTracker::ServiceCharge` AR model removed. Use `Billing::LineItem` and `LlmCostTracker::CallLineItem`.
+- BREAKING: `Event#service_charges` removed. Filter `event.line_items` by `unit != :token` instead.
+- BREAKING: `Call#service_charges` association removed. Use `call.line_items.where.not(unit: "token")`.
+- BREAKING: `LlmCostTracker.track(service_charges:)` keyword renamed to `service_line_items:`. Hash keys: `component:` → `component_key:`, `source_key:` → `provider_field:`, `pricing_basis: PROVIDER_USAGE_BASIS` → `pricing_basis: :provider_usage`.
+- BREAKING: `Billing::CostStatus.call(service_charges:)` keyword renamed to `service_line_items:`.
+- BREAKING: `Pricing.cost_with_service_charges` public API removed; replaced internally by `Pricing.price_line_items`.
+- BREAKING: Top-level delegators `LlmCostTracker.flush!`, `LlmCostTracker.shutdown!`, `LlmCostTracker.enforce_budget!` removed. Use `LlmCostTracker::Ingestion::Worker.flush!` / `.shutdown!` directly; budget enforcement is internal.
+- BREAKING: `LlmCostTracker.track` requires explicit `tokens:` and accepts `tags:` as a hash; the previous keyword shape is no longer supported.
+- BREAKING: Notification payload (`llm_request.llm_cost_tracker`) no longer carries `service_charges`. Subscribers read `line_items`.
+- BREAKING: Inbox payload v0/v1 compatibility dropped; only v2 is accepted. Drain any pre-v2 entries on the prior gem version before bumping.
+- BREAKING: Ruby 3.4+ required.
+- BREAKING: Legacy upgrade generators removed (`add_billing`, `add_ingestion`, `add_call_rollups`, `add_capture_dimensions`, `add_latency_ms`, `add_provider_response_id`, `add_streaming`, `add_token_usage`, `upgrade_cost_precision`, `upgrade_schema_foundation`, `upgrade_tags_to_jsonb`). Doctor no longer suggests them.
+- `llm_cost_tracker_call_tags.value` widened to TEXT (was VARCHAR), and the `[:key, :value]` composite index dropped in favor of `:key` only — value-equality filters scan the per-key bucket.
+- `Configuration#pricing_overrides` validates shape at assignment time rather than at first read.
+- Pricing computes a partial `total_cost` (with `cost_status: :partial`) when only some token components have rates; previously `total_cost` was nil whenever any component lacked a rate.
+- `TokenUsage.build` clamps negative token counts to zero so anomalous provider payloads don't poison rollups.
+- Stream collector buffer overflow keeps already-accumulated events instead of dropping them.
+- Budget guardrail preflight time is excluded from SDK call latency measurements.
+- Dashboard data-quality breakdown computes per-component cost from line items via JOIN; usage_rows accepts `component_costs:` hash.
+- CSV export pulls tag JSON from `tag_records` instead of the dropped JSONB column.
+- The fingerprinted dashboard stylesheet is served with `Cache-Control: no-store` in development so edits show up without a hard reload; production keeps the immutable cache.
+### Fixed
+- Railtie no longer requires removed legacy upgrade generators at boot, so installs on a clean app don't crash during eager-load.
+- `Tracker` only flags unknown pricing when token quantities are positive — service-only events with zero tokens no longer raise `Pricing::Unknown`.
+- `Billing::CostStatus.cost_status_for` coerces symbol/string status values consistently when building line items.
+- Gemini `thoughtsTokenCount` is billed at the output token rate (already present in 0.7.3, kept for clarity given the rebuild).
+### Removed
+- Dead `Billing::LineItem.from_service_charge` constructor and the unused `Call.with_json_tags` scope.
 ## [0.7.3] - 2026-05-01
 ### Fixed
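
Of the 0.9.0 breaking changes in this hunk, the `Call.by_tag` encoding change is the one most likely to alter query results without raising. A rough illustration of why the encoding matters, assuming tag values are persisted as JSON text as the changelog entry describes; `encode_tag_value` is a hypothetical helper for illustration only, not the gem's implementation:

```ruby
require "json"

# Hypothetical helper mirroring the behaviour the changelog describes:
# Hash and Array values are encoded as JSON, everything else via to_s.
def encode_tag_value(value)
  case value
  when Hash, Array then JSON.generate(value)
  else value.to_s
  end
end

nested = { "team" => "search" }

stored  = JSON.generate(nested)     # what a JSON-writing store would persist
old_key = nested.to_s               # Ruby inspect-style string, not JSON
new_key = encode_tag_value(nested)  # same text as the stored form

puts new_key == stored
puts old_key == stored
```

The inspect-style string differs from JSON in its key/value separator, so an equality filter built from `to_s` cannot match a JSON-encoded column value, which is consistent with the "silently returned zero rows" behaviour noted in the entry.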

data/README.md CHANGED

@@ -1,50 +1,45 @@
 # LLM Cost Tracker
-A Rails-native ledger for estimating LLM API spend.
+Self-hosted LLM cost tracking for Rails.
 [![Gem Version](https://img.shields.io/gem/v/llm_cost_tracker.svg)](https://rubygems.org/gems/llm_cost_tracker)
 [![CI](https://github.com/sergey-homenko/llm_cost_tracker/actions/workflows/ruby.yml/badge.svg)](https://github.com/sergey-homenko/llm_cost_tracker/actions)
 [![codecov](https://codecov.io/gh/sergey-homenko/llm_cost_tracker/branch/main/graph/badge.svg)](https://codecov.io/gh/sergey-homenko/llm_cost_tracker)
-If someone keeps asking "where did that LLM bill come from?", this gem records provider-reported usage into your own database, prices it locally, and gives you a dashboard you can mount in five minutes. No proxy, no SaaS account, no extra service to deploy.
+Every call your app makes to OpenAI, Anthropic, Gemini, RubyLLM, or any
+OpenAI-compatible API gets logged: tokens, cost, latency, tags. Calls go
+app → provider direct. No proxy.
-It is not Langfuse, Helicone, or LiteLLM. It does not capture prompts, score completions, or replay traces. It does one thing: tells you which provider, which model, which feature, and which user burned how much money. That's the entire pitch.
+Not Langfuse, Helicone, or LiteLLM. No prompts, no traces, no replay. Spend
+attribution only.
-Requires Ruby 3.3+, Rails 7.1+, PostgreSQL or MySQL, and Faraday 2.0+.
+Requires Ruby 3.4+, Rails 7.1+, PostgreSQL or MySQL.
 ![Dashboard overview](docs/dashboard-overview.png)
-## Accuracy model
-LLM Cost Tracker estimates spend from provider-reported usage and configured prices. It is useful for explaining spend by provider, model, and tags, but it is not invoice-grade billing. For reconciliation, each call keeps `provider_response_id`, `usage_source`, token breakdowns, and `pricing_mode`.
 ## Quickstart
-Add to your Gemfile alongside whatever LLM client you already use:
 ```ruby
+# Gemfile
 gem "llm_cost_tracker"
-gem "openai"  # or "anthropic", "ruby_llm", or your existing client
+gem "openai"
 ```
-Install, migrate, verify:
 ```bash
-bin/rails generate llm_cost_tracker:install --dashboard --prices
-bin/rails db:migrate
-bin/rails llm_cost_tracker:doctor
+bin/rails llm_cost_tracker:setup
 ```
-Drop this into `config/initializers/llm_cost_tracker.rb`:
+Runs the install generator, drops a price snapshot, migrates the database, and verifies via `llm_cost_tracker:doctor`.
 ```ruby
+# config/initializers/llm_cost_tracker.rb
 LlmCostTracker.configure do |config|
   config.default_tags = -> { { environment: Rails.env } }
   config.instrument :openai
 end
 ```
-Now every OpenAI call is recorded. Wrap calls in `with_tags` to attribute spend to a user, feature, or anything else you care about:
+Tag your calls — that's how you find out who burned the money:
 ```ruby
 LlmCostTracker.with_tags(user_id: Current.user&.id, feature: "chat") do
@@ -53,240 +48,85 @@ LlmCostTracker.with_tags(user_id: Current.user&.id, feature: "chat") do
 end
 ```
-Visit `/llm-costs` for the dashboard. **Mount it behind your app's auth before deploying** — the gem doesn't ship with one, on purpose.
-## What you get
-- Local ActiveRecord ledger of every call: provider, model, token breakdown, cost, latency, tags, response IDs
-- Auto-capture for RubyLLM and the official `openai` and `anthropic` Ruby SDKs, plus Faraday middleware for `ruby-openai`, the Gemini REST API, and any client you can inject middleware into
-- Server-rendered dashboard (plain ERB, zero JavaScript) with overview, models, calls, tags, CSV export, and a data-quality page
-- Local pricing snapshots refreshed daily from the official provider pricing pages, applied with `bin/rails llm_cost_tracker:prices:refresh`
-- Monthly / daily / per-call budget guardrails with notify, raise, or block-requests behaviour
-- Tag-based attribution that survives concurrency — Puma threads and Sidekiq fibers don't bleed into each other
-## What it deliberately doesn't do
-- **Doesn't run as a proxy.** Calls go directly from your app to the provider.
-- **Doesn't store prompts or completions.** Token counts, model, cost, tags, response IDs only. Nothing else.
-- **Doesn't promise invoice-grade accuracy.** It uses official provider pricing pages, but enterprise rates, batch discounts on unsupported endpoints, and modality tiers are not always modeled. `provider_response_id` is stored as a join key for whoever does that reconciliation.
-- **Doesn't ship with auth on the dashboard.** It's a Rails Engine; mount it behind whatever your app already uses (Devise, basic auth, Cloudflare Access, your own session middleware).
-- **Doesn't centralize multi-service visibility.** One Rails monolith — perfect fit. Six services in four languages — wrong tool, look at a proxy or API-layer gateway.
-## Capturing calls
-Three paths, in order of preference. Use the first one that fits your stack.
-### 1. SDK integrations
-Drop-in for RubyLLM and the official `openai` and `anthropic` gems. `config.instrument` patches tested SDK methods so you don't change a single call site:
+Mount the dashboard in `config/routes.rb`, behind your auth:
 ```ruby
-LlmCostTracker.configure do |config|
-  config.instrument :openai # or :anthropic / :ruby_llm
-end
-LlmCostTracker.with_tags(feature: "support_chat") do
-  Anthropic::Client.new.messages.create(
-    model: "claude-sonnet-4-6",
-    max_tokens: 1024,
-    messages: [{ role: "user", content: "Hello" }]
-  )
+authenticate :admin do
+  mount LlmCostTracker::Engine => "/llm-costs"
 end
 ```
-Captures usage, model, latency, response ID, pricing mode, cache tokens, Anthropic cache-write TTLs, and reasoning tokens whenever the SDK exposes them. Provider SDKs are not added as gem dependencies — you install whichever you actually use.
-Enabled integrations are checked at boot: the client gem must be loaded, meet the minimum supported version, and expose the expected classes and methods. If the contract check fails, boot raises instead of silently missing spend.
+The engine ships without authentication on purpose.
-This patches **only** RubyLLM and the official Ruby SDKs. `ruby-openai` (alexrudall) and any custom client go through Faraday middleware below.
+## What lands in the ledger
-### 2. Faraday middleware
+- **Calls.** Provider, model, total tokens, total cost, latency, status.
+- **Line items.** Per-component breakdown — text/audio/cached tokens, tool
+  charges (web search, code execution, grounding, container sessions).
+- **Tags.** Whatever attribution you pass — user, feature, tenant, env.
+- **Provider IDs.** Response, project, API key, workspace — for downstream
+  audits.
+- **Pricing snapshot.** So historical numbers don't drift when prices change.
-For `ruby-openai`, the Gemini REST API, custom Faraday clients, or anything OpenAI-compatible (OpenRouter, DeepSeek, Groq, LiteLLM proxies):
+## Capture surfaces
-```ruby
-conn = Faraday.new(url: "https://api.openai.com") do |f|
-  f.use :llm_cost_tracker, tags: -> { { feature: "chat", user_id: Current.user&.id } }
-  f.request :json
-  f.response :json
-  f.adapter Faraday.default_adapter
-end
-```
+| Surface | Path |
+| --- | --- |
+| OpenAI | Official SDK or Faraday |
+| Anthropic | Official SDK or Faraday |
+| Google Gemini | Faraday |
+| RubyLLM | Provider layer |
+| `ruby-openai` | Faraday |
+| OpenRouter, DeepSeek, Groq, LiteLLM-style gateways | OpenAI-compatible Faraday |
+| Anything else | `LlmCostTracker.track` |
-Tags can be a hash or a callable evaluated per request. Place the middleware where it sees the final response body — in practice, before the JSON parser.
+Streams capture when the provider emits final usage. OpenAI Faraday streams
+need `stream_options: { include_usage: true }`.
-Streaming works through the same path: the middleware tees the `on_data` callback so your code keeps receiving chunks normally, and the final usage gets recorded once the stream finishes. OpenAI streams need `stream_options: { include_usage: true }` for the final usage event.
+## What it isn't
-Per-client setup snippets for `ruby-openai`, Azure OpenAI, LiteLLM proxy, and Gemini live in [`docs/cookbook.md`](docs/cookbook.md).
+- No proxy. Direct calls only.
+- No prompts. Token counts and metadata only.
+- No traces, evals, or prompt management. Different product, different gem.
+- Not multi-service. Built for a Rails monolith.
-### 3. Manual `track` / `track_stream`
+## Manual tracking
-When you have a client that doesn't expose Faraday and isn't an official SDK — internal gateways, homegrown wrappers, batch jobs replaying historical usage:
+For batch jobs, internal gateways, or anything without an SDK/Faraday hook:
 ```ruby
 LlmCostTracker.track(
   provider: :anthropic,
   model: "claude-sonnet-4-6",
-  input_tokens: 1500,
-  output_tokens: 320,
-  feature: "summarizer",
-  user_id: current_user.id
+  tokens: { input: 1500, output: 320 },
+  tags: { feature: "summarizer", user_id: current_user.id }
 )
 ```
-For streaming the same way, `track_stream` accepts a block, parses provider events automatically, and records once the stream finishes. Full reference in [`docs/streaming.md`](docs/streaming.md).
-## Tags: who burned this money
-Tags answer the only question that matters in attribution: which feature, which user, which job, which tenant. They're free-form strings, stored as JSONB on PostgreSQL or JSON on MySQL, and queryable from both Ruby and the dashboard.
-```ruby
-LlmCostTracker.with_tags(user_id: current_user.id, feature: "support_chat") do
-  client.chat(parameters: { model: "gpt-4o", messages: [...] })
-end
-```
-`with_tags` is thread- and fiber-isolated, so concurrent requests in Puma or jobs in Sidekiq don't bleed into each other. A `default_tags` callable on configuration runs on every event for things you always want — `environment`, `region`, deployment SHA. Explicit tags passed to `track` win over scoped tags, scoped tags win over defaults.
-Streaming capture snapshots tags when the stream starts, so attribution survives delayed or cross-thread stream consumption.
-What you put in tags is **your** input — they're queryable strings. Don't put prompts, completions, emails, or secrets there. Use IDs.
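
The precedence described in the removed 0.7.3 text (explicit tags win over scoped tags, scoped tags win over defaults) can be pictured as a chain of hash merges. This is a minimal sketch, not the gem's implementation: `resolve_tags` and its keyword arguments are hypothetical names used only for illustration.

```ruby
# Hypothetical helper, not part of llm_cost_tracker.
# Later merges overwrite earlier keys, so the order of the chain
# encodes the precedence: defaults < scoped < explicit.
def resolve_tags(defaults:, scoped:, explicit:)
  defaults.merge(scoped).merge(explicit)
end

tags = resolve_tags(
  defaults: { environment: "production", feature: "unknown" }, # e.g. from default_tags
  scoped:   { feature: "support_chat", user_id: 42 },          # e.g. from with_tags
  explicit: { user_id: 7 }                                     # e.g. passed to track
)

puts tags.inspect
```

Here `feature` comes from the scoped hash and `user_id` from the explicit one, while `environment` falls through from the defaults.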
-## Pricing
-Built-in prices live in `lib/llm_cost_tracker/prices.json` and are refreshed daily from official provider pricing pages by an automated CI workflow that opens a PR on every change. Most apps run on bundled prices and never think about this.
-When you want to control updates yourself — for negotiated rates, gateway-specific model IDs, or pinned reviews — generate a local snapshot:
-```bash
-bin/rails generate llm_cost_tracker:prices
-```
-```ruby
-config.prices_file = Rails.root.join("config/llm_cost_tracker_prices.yml")
-```
-Refresh on demand from the maintained snapshot:
-```bash
-bin/rails llm_cost_tracker:prices:refresh
-```
-Explain why a model is priced or unknown:
+## Docs
-```bash
-PROVIDER=openai MODEL=gpt-4o bin/rails llm_cost_tracker:prices:explain
-```
-Precedence is `pricing_overrides` → `prices_file` → bundled. Provider-qualified keys like `openai/gpt-4o-mini` win over model-only keys.
-`pricing_mode` selects mode-prefixed rates such as `batch_input` or `priority_output`. Built-in capture fills it from provider tier fields when available; explicit `track` calls can pass it directly for batch jobs or gateway-specific modes. Full pricing reference: [`docs/pricing.md`](docs/pricing.md).
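
One way to read the precedence rule in the removed text is as a two-level lookup: walk the sources in order, and within each source try the provider-qualified key before the model-only key. The sketch below assumes that reading. The original wording does not say whether a qualified key in a lower-priority source beats a model-only key in a higher one, so that ordering is an assumption here. `lookup_price`, the source hash, and every price in it are made up for illustration.

```ruby
# Hypothetical lookup, not the gem's code. Assumes source precedence is
# checked first, then key specificity within each source.
def lookup_price(provider, model, sources)
  keys = ["#{provider}/#{model}", model] # qualified key first, then model-only
  sources.each do |name, table|
    keys.each do |key|
      return [name, table[key]] if table.key?(key)
    end
  end
  nil # unknown model: no source has a price for it
end

# Illustrative numbers only; these are not real provider prices.
sources = {
  pricing_overrides: { "gpt-4o" => { input: 2.0 } },
  prices_file: {
    "openai/gpt-4o"      => { input: 2.4 },
    "openai/gpt-4o-mini" => { input: 0.15 },
    "gpt-4o-mini"        => { input: 0.2 }
  },
  bundled: { "gpt-4o" => { input: 2.5 } }
}

override_hit  = lookup_price("openai", "gpt-4o", sources)
qualified_hit = lookup_price("openai", "gpt-4o-mini", sources)
missing       = lookup_price("openai", "not-a-model", sources)
```

Under this reading, `gpt-4o` resolves from `pricing_overrides` even though only a model-only key exists there, and `gpt-4o-mini` resolves from the qualified entry in `prices_file`.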
-## Budgets
-Budgets are guardrails, not transactional caps:
-```ruby
-config.monthly_budget           = 500.00
-config.daily_budget             = 50.00
-config.per_call_budget          = 2.00
-config.budget_exceeded_behavior = :block_requests # or :notify, :raise
-config.on_budget_exceeded       = ->(data) { SlackNotifier.notify("#alerts", "...") }
-```
-`:block_requests` reads ledger totals before a call goes out and stops it if you're already over. Under concurrency multiple workers can pass preflight at the same time and collectively overshoot — this catches the next call after the overshoot becomes visible, not the overshoot itself. For a strict cap, use a provider-side limit or a transactional counter outside the gem.
-Full behavior, error class, and preflight details: [`docs/budgets.md`](docs/budgets.md).
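
The overshoot described for `:block_requests` follows from the check and the write being separate steps. A deterministic sketch of that race, with the interleaving written out by hand rather than using real threads, might look like this. All numbers and variable names are illustrative assumptions, not values or identifiers from the gem.

```ruby
# Illustration only: three workers run their preflight check before any
# of them has recorded its call, which is the interleaving that overshoots.
budget       = 10.0
ledger_total = 9.0
call_cost    = 2.0

# Step 1: every worker reads the same stale total and passes preflight.
preflight_results = Array.new(3) { ledger_total < budget }

# Step 2: each worker that passed makes its call and records the cost.
preflight_results.each do |allowed|
  ledger_total += call_cost if allowed
end

# Step 3: only now is the overshoot visible, so the *next* call is blocked.
next_call_allowed = ledger_total < budget

puts "total=#{ledger_total} next_call_allowed=#{next_call_allowed}"
```

A strict cap needs the check and the increment to happen atomically, which is why the text points to a provider-side limit or a transactional counter.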
-## Querying
-When you want to slice spend from a console, scheduled job, or your own admin page:
-```ruby
-LlmCostTracker::Ledger::Call.this_month.cost_by_model
-LlmCostTracker::Ledger::Call.this_month.cost_by_tag("feature")
-LlmCostTracker::Ledger::Call.daily_costs(days: 7)
-LlmCostTracker::Ledger::Call.by_tags(user_id: 42, feature: "chat").this_month.total_cost
-```
-A text report is also one rake task away:
-```bash
-DAYS=7 bin/rails llm_cost_tracker:report
-```
-Full scope and helper reference: [`docs/querying.md`](docs/querying.md).
-## Dashboard
-Mount the engine wherever you want — it's plain ERB, no JavaScript bundle, no asset pipeline gymnastics:
-```ruby
-# config/routes.rb
-mount LlmCostTracker::Engine => "/llm-costs"
-```
-Pages: overview (spend trend, budget status, anomaly banner), models, calls (filterable, paginated, CSV export), tags, data quality. Reads the ActiveRecord ledger in `llm_api_calls`.
-Auth is your job. Examples for basic auth and Devise: [`docs/dashboard.md`](docs/dashboard.md).
-## Supported providers
-| Provider | Auto-detected | Coverage |
-|---|:---:|---|
-| OpenAI | Yes | GPT-5.5/5.4/5.2/5.1/5 + pro/mini/nano variants, GPT-4.1, GPT-4o, o1/o3/o4-mini |
-| Anthropic | Yes | Claude Opus 4.7/4.6/4.5/4.1/4, Sonnet 4.6/4.5/4, Haiku 4.5 |
-| Google Gemini | Yes | Gemini 2.5 Pro/Flash/Flash-Lite, 2.0 Flash/Flash-Lite |
-| OpenRouter | Yes | OpenAI-compatible usage; provider-prefixed model IDs are normalized |
-| DeepSeek | Yes | OpenAI-compatible usage; add `pricing_overrides` for DeepSeek-specific rates |
-| Groq | Yes | OpenAI-compatible usage with bundled prices for production text models |
-| Other OpenAI-compatible hosts | Configurable | Register the host via `config.openai_compatible_providers` |
-| Anything else | Manual | Use `LlmCostTracker.track` / `track_stream` |
-RubyLLM chat, embedding, and transcription calls are captured through RubyLLM's provider layer when `config.instrument :ruby_llm` is enabled.
-Endpoints covered end-to-end: OpenAI Chat Completions / Responses / Completions / Embeddings, Anthropic Messages, Gemini `generateContent` and `streamGenerateContent`, plus their OpenAI-compatible equivalents. Streaming is captured for Faraday paths and official OpenAI / Anthropic SDK stream helpers whenever the provider emits final-usage events.
-## Privacy
-By design, **no prompt or response content is ever stored.** Per call, the ledger holds: provider, model, token counts, cost, latency, tags, response ID, timestamp. That's it. No request bodies, no headers, no completions. Warning logs strip query strings before logging URLs.
-Tags carry whatever your app passes — they are application-controlled input, treat them accordingly. Use `user_id`, not the user's email; use a feature key, not the input prompt.
-## Documentation
-Deeper guides live in `docs/`. Reference pages are being filled out as content
-moves out of this README; the inline sections above remain canonical where a page
-is still brief.
-- [Configuration reference](docs/configuration.md)
-- [Pricing & price refresh](docs/pricing.md)
-- [Budgets & guardrails](docs/budgets.md)
-- [Querying & reports](docs/querying.md)
-- [Dashboard mounting](docs/dashboard.md)
-- [Streaming capture](docs/streaming.md)
+- [Configuration](docs/configuration.md)
+- [Pricing](docs/pricing.md)
+- [Budgets](docs/budgets.md)
+- [Data model](docs/data-model.md)
+- [Querying](docs/querying.md)
+- [Dashboard](docs/dashboard.md)
+- [Streaming](docs/streaming.md)
+- [Cookbook](docs/cookbook.md)
 - [Extending](docs/extending.md)
-- [Production operations](docs/operations.md)
+- [Operations](docs/operations.md)
+- [Architecture](docs/architecture.md)
+- [EU AI Act record-keeping](docs/eu_ai_act.md)
 - [Upgrading](docs/upgrading.md)
-- [Cookbook — per-client recipes](docs/cookbook.md)
-- [Architecture & design rules](docs/architecture.md)
-## Known limitations
-- `:block_requests` is best-effort under concurrency, not a transactional cap.
-- Streaming usage capture relies on the provider emitting a final-usage event. Missing events are stored with `usage_source: "unknown"` so they appear on the data-quality page rather than vanishing.
-- Non-token line items such as Gemini explicit-cache storage duration, provider tool calls, and modality-specific surcharges are not folded into token cost.
-- `provider_response_id` is stored only when the provider exposes a stable ID. Gemini is best-effort and varies by endpoint.
+- [Changelog](CHANGELOG.md)
 ## Development
 ```bash
 bundle install
-bin/check # rubocop + rspec + coverage gate
+bin/check
 ```
-Architecture rules and conventions for contributions live in [`docs/architecture.md`](docs/architecture.md).
 ## License
 MIT — see [LICENSE.txt](LICENSE.txt).