llm_cost_tracker 0.5.2 → 0.6.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +46 -0
- data/README.md +8 -3
- data/app/controllers/llm_cost_tracker/calls_controller.rb +35 -21
- data/app/services/llm_cost_tracker/dashboard/overview_stats.rb +3 -1
- data/app/services/llm_cost_tracker/dashboard/tag_key_explorer.rb +4 -5
- data/docs/architecture.md +28 -0
- data/docs/budgets.md +45 -0
- data/docs/configuration.md +65 -0
- data/docs/cookbook.md +185 -0
- data/docs/dashboard-overview.png +0 -0
- data/docs/dashboard.md +38 -0
- data/docs/extending.md +32 -0
- data/docs/operations.md +44 -0
- data/docs/pricing.md +94 -0
- data/docs/querying.md +36 -0
- data/docs/streaming.md +70 -0
- data/docs/technical/README.md +10 -0
- data/docs/technical/data-flow.md +70 -0
- data/docs/technical/extension-points.md +111 -0
- data/docs/technical/module-map.md +197 -0
- data/docs/technical/operational-notes.md +97 -0
- data/docs/upgrading.md +47 -0
- data/lib/llm_cost_tracker/active_record_adapter.rb +49 -0
- data/lib/llm_cost_tracker/capture_verifier.rb +71 -0
- data/lib/llm_cost_tracker/configuration/instrumentation.rb +1 -1
- data/lib/llm_cost_tracker/configuration/storage_backend.rb +26 -0
- data/lib/llm_cost_tracker/configuration.rb +2 -1
- data/lib/llm_cost_tracker/doctor/capture_check.rb +39 -0
- data/lib/llm_cost_tracker/doctor/ingestion_check.rb +117 -0
- data/lib/llm_cost_tracker/doctor.rb +8 -1
- data/lib/llm_cost_tracker/event.rb +1 -0
- data/lib/llm_cost_tracker/generators/llm_cost_tracker/add_ingestion_generator.rb +29 -0
- data/lib/llm_cost_tracker/generators/llm_cost_tracker/templates/add_ingestion_to_llm_cost_tracker.rb.erb +33 -0
- data/lib/llm_cost_tracker/generators/llm_cost_tracker/templates/add_period_totals_to_llm_cost_tracker.rb.erb +14 -6
- data/lib/llm_cost_tracker/generators/llm_cost_tracker/templates/add_streaming_to_llm_api_calls.rb.erb +0 -4
- data/lib/llm_cost_tracker/generators/llm_cost_tracker/templates/create_llm_api_calls.rb.erb +30 -3
- data/lib/llm_cost_tracker/generators/llm_cost_tracker/templates/initializer.rb.erb +1 -1
- data/lib/llm_cost_tracker/generators/llm_cost_tracker/templates/upgrade_llm_api_call_tags_to_jsonb.rb.erb +3 -1
- data/lib/llm_cost_tracker/inbox_event.rb +9 -0
- data/lib/llm_cost_tracker/ingestor_lease.rb +9 -0
- data/lib/llm_cost_tracker/integrations/anthropic.rb +41 -2
- data/lib/llm_cost_tracker/integrations/openai.rb +66 -2
- data/lib/llm_cost_tracker/integrations/registry.rb +33 -3
- data/lib/llm_cost_tracker/integrations/stream_tracker.rb +166 -0
- data/lib/llm_cost_tracker/llm_api_call.rb +2 -78
- data/lib/llm_cost_tracker/llm_api_call_metrics.rb +63 -0
- data/lib/llm_cost_tracker/parsers/openai_usage.rb +1 -1
- data/lib/llm_cost_tracker/period_grouping.rb +4 -3
- data/lib/llm_cost_tracker/pricing/effective_prices.rb +75 -0
- data/lib/llm_cost_tracker/pricing/explainer.rb +77 -0
- data/lib/llm_cost_tracker/pricing/lookup.rb +143 -0
- data/lib/llm_cost_tracker/pricing.rb +25 -108
- data/lib/llm_cost_tracker/railtie.rb +1 -0
- data/lib/llm_cost_tracker/retention.rb +3 -9
- data/lib/llm_cost_tracker/storage/active_record_backend.rb +166 -0
- data/lib/llm_cost_tracker/storage/active_record_connection_cleanup.rb +13 -0
- data/lib/llm_cost_tracker/storage/active_record_inbox.rb +165 -0
- data/lib/llm_cost_tracker/storage/active_record_inbox_batch.rb +92 -0
- data/lib/llm_cost_tracker/storage/active_record_ingestor.rb +174 -0
- data/lib/llm_cost_tracker/storage/active_record_ingestor_lease.rb +38 -0
- data/lib/llm_cost_tracker/storage/active_record_period_totals.rb +84 -0
- data/lib/llm_cost_tracker/storage/active_record_periods.rb +31 -0
- data/lib/llm_cost_tracker/storage/active_record_rollup_batch.rb +41 -0
- data/lib/llm_cost_tracker/storage/active_record_rollup_upsert_sql.rb +42 -0
- data/lib/llm_cost_tracker/storage/active_record_rollups.rb +59 -55
- data/lib/llm_cost_tracker/storage/active_record_store.rb +68 -9
- data/lib/llm_cost_tracker/storage/custom_backend.rb +32 -0
- data/lib/llm_cost_tracker/storage/dispatcher.rb +11 -34
- data/lib/llm_cost_tracker/storage/log_backend.rb +38 -0
- data/lib/llm_cost_tracker/storage/registry.rb +63 -0
- data/lib/llm_cost_tracker/stream_collector.rb +18 -7
- data/lib/llm_cost_tracker/tag_sql.rb +34 -0
- data/lib/llm_cost_tracker/tags_column.rb +7 -1
- data/lib/llm_cost_tracker/tracker.rb +3 -0
- data/lib/llm_cost_tracker/version.rb +1 -1
- data/lib/llm_cost_tracker.rb +39 -1
- data/lib/tasks/llm_cost_tracker.rake +49 -0
- metadata +47 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: fa3f705baf280c2c2239b2dab3522fe7db9b60e26060b00fc08dcc039117da83
+  data.tar.gz: ea34bdad7cb0d7c9fb3233b40b609cea1361ded833ad130c4a7a7ce559b34758
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 03c55866b522b36b0728f73fd0ae0075e0a42faa6f80c429865f120d2c597e0b0996665b6a82528c566a0f8554d7636dfd03d9ab600441dafd5a4f0233d3f56b
+  data.tar.gz: af01c4554912d80276bf54ed97aa379c2eaed791fa357ca135aa008fdcbdd41e365c236ac74c9ceb1982c0fbbf2538bc569df1690acb4b5758895dd48c4497b5
data/CHANGELOG.md
CHANGED
@@ -4,6 +4,52 @@ Format: [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). Versioning: [S

 ## [Unreleased]

+## [0.6.0] - 2026-04-29
+
+### Added
+
+- Durable ActiveRecord ingestion through `llm_cost_tracker_inbox_events` and `llm_cost_tracker_ingestor_leases`.
+- `llm_cost_tracker:add_ingestion` generator for upgrading existing ActiveRecord installs.
+- `LlmCostTracker.flush!` and `LlmCostTracker.shutdown!` for draining or stopping durable ingestion.
+- Doctor diagnostics for missing durable ingestion schema, stale pending inbox rows, and quarantined inbox rows.
+- PostgreSQL and MySQL smoke checks for ActiveRecord durable ingestion.
+
+### Changed
+
+- Fresh ActiveRecord installs now include durable ingestion tables, event IDs, and production indexes.
+- ActiveRecord budget totals now read stored period rollups plus pending inbox totals while durable ingestion is enabled.
+- ActiveRecord writes now use a durable inbox before batching ledger inserts and period rollup updates when ingestion tables are present.
+- Pricing lookup now caches normalized runtime price tables and model matches by configuration generation.
+- Stream capture now estimates buffered event size without serializing every captured event.
+- CSV export now selects only exported columns instead of loading full ActiveRecord objects.
+
+### Fixed
+
+- ActiveRecord rollups no longer double-count retried events when duplicate event IDs race across workers.
+- Invalid inbox rows are retried and quarantined without blocking healthy rows behind them.
+- Idle ingestors no longer acquire the leader lease while the inbox is empty.
+- ActiveRecord inbox writes now fail honestly when a separate connection is unavailable inside a caller transaction.
+- Ingestor shutdown/reset no longer lets an old sleeping thread resume as a second local ingestor.
+- `flush!` now returns `false` instead of raising when its timeout expires during ingestion.
+- ActiveRecord adapter family detection now works through known adapter class ancestry with an adapter-name fallback.
+- CSV export now emits `{}` for invalid stored tag payloads.
+
+## [0.5.3] - 2026-04-28
+
+### Added
+
+- Official OpenAI SDK streaming capture for Responses streams, Responses raw streams, Responses retrieve streams, and Chat Completions raw streams.
+- Official Anthropic SDK streaming capture for Messages streams and raw streams.
+- Capture verification via `llm_cost_tracker:verify_capture` and expanded doctor capture diagnostics.
+- Pricing explanation via `LlmCostTracker::Pricing.explain` and `llm_cost_tracker:prices:explain`.
+- Extensible storage and SDK integration registries via `Storage.register` and `Integrations.register`.
+
+### Fixed
+
+- OpenAI Responses stream parsing now reads final usage from completed response events.
+- Incomplete price entries now return unknown pricing instead of raising `TypeError`.
+- Retention pruning now keeps ActiveRecord period rollups in sync when deleting rows inside active budget windows.
+
 ## [0.5.2] - 2026-04-27

 ### Added
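Illustrative sketch (not part of the package diff): the 0.6.0 "Fixed" entry about rollups no longer double-counting retried events describes an idempotency guarantee keyed on event IDs. The snippet below shows the general idea in memory only. `IdempotentRollup`, `apply`, and the period key are hypothetical names chosen for illustration, not the gem's API, and a database-backed store would presumably rely on a unique index rather than a `Set`.

```ruby
require "set"

# Hypothetical illustration: a rollup that ignores event IDs it has already counted.
class IdempotentRollup
  attr_reader :totals

  def initialize
    @seen_event_ids = Set.new
    @totals = Hash.new(0.0)
    @lock = Mutex.new
  end

  # Returns true when the event was counted, false when it was a duplicate.
  def apply(event_id:, period:, cost:)
    @lock.synchronize do
      # Set#add? returns nil when the ID is already present.
      return false unless @seen_event_ids.add?(event_id)

      @totals[period] += cost
      true
    end
  end
end

rollup = IdempotentRollup.new
rollup.apply(event_id: "evt-1", period: "2026-04", cost: 0.25)
rollup.apply(event_id: "evt-1", period: "2026-04", cost: 0.25) # retried delivery, ignored
rollup.apply(event_id: "evt-2", period: "2026-04", cost: 0.50)
puts rollup.totals["2026-04"]
```

The duplicate check and the total update happen under one lock, which is the in-process analogue of doing both inside a single transaction.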
data/README.md
CHANGED
@@ -167,6 +167,12 @@ Refresh on demand from the maintained snapshot:
 bin/rails llm_cost_tracker:prices:refresh
 ```

+Explain why a model is priced or unknown:
+
+```bash
+PROVIDER=openai MODEL=gpt-4o bin/rails llm_cost_tracker:prices:explain
+```
+
 Precedence is `pricing_overrides` → `prices_file` → bundled. Provider-qualified keys like `openai/gpt-4o-mini` win over model-only keys. Full pricing reference: [`docs/pricing.md`](docs/pricing.md).

 ## Budgets
@@ -231,7 +237,7 @@ Auth is your job. Examples for basic auth and Devise: [`docs/dashboard.md`](docs

 RubyLLM chat, embedding, and transcription calls are captured through RubyLLM's provider layer when `config.instrument :ruby_llm` is enabled.

-Endpoints covered end-to-end: OpenAI Chat Completions / Responses / Completions / Embeddings, Anthropic Messages, Gemini `generateContent` and `streamGenerateContent`, plus their OpenAI-compatible equivalents. Streaming is captured for Faraday paths whenever the provider emits final-usage events.
+Endpoints covered end-to-end: OpenAI Chat Completions / Responses / Completions / Embeddings, Anthropic Messages, Gemini `generateContent` and `streamGenerateContent`, plus their OpenAI-compatible equivalents. Streaming is captured for Faraday paths and official OpenAI / Anthropic SDK stream helpers whenever the provider emits final-usage events.

 ## Privacy

@@ -260,7 +266,6 @@ is still brief.
 ## Known limitations

 - `:block_requests` is best-effort under concurrency, not a transactional cap.
-- Official OpenAI and Anthropic SDK integrations cover non-streaming calls; streaming via those SDKs falls back to Faraday middleware or `track_stream`.
 - Streaming usage capture relies on the provider emitting a final-usage event. Missing events are stored with `usage_source: "unknown"` so they appear on the data-quality page rather than vanishing.
 - `provider_response_id` is stored only when the provider exposes a stable ID. Gemini is best-effort and varies by endpoint.
 - Cache write TTL variants on Anthropic (1h vs 5min writes) are not modeled separately yet.
@@ -269,7 +274,7 @@ is still brief.

 ```bash
 bundle install
-bin/check # rubocop + rspec
+bin/check # rubocop + rspec + coverage gate
 ```

 Architecture rules and conventions for contributions live in [`AGENTS.md`](AGENTS.md) and [`docs/architecture.md`](docs/architecture.md).
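Illustrative sketch (not part of the package diff): the README hunk above states that pricing precedence is `pricing_overrides`, then `prices_file`, then bundled prices, and that provider-qualified keys beat model-only keys. One way to read that rule is sketched below. The tables, the rates, the `lookup_price` helper, and the choice to check the provider-qualified key first within each source are all assumptions made for illustration; the gem's actual lookup may order these checks differently.

```ruby
# Hypothetical price tables; the numbers are placeholders, not real rates.
PRICING_OVERRIDES = { "openai/gpt-4o-mini" => { input: 0.10 } }.freeze
PRICES_FILE = { "gpt-4o-mini" => { input: 0.20 }, "gpt-4o" => { input: 2.00 } }.freeze
BUNDLED = { "gpt-4o" => { input: 3.00 } }.freeze

# Hash insertion order doubles as precedence order.
SOURCES = {
  pricing_overrides: PRICING_OVERRIDES,
  prices_file: PRICES_FILE,
  bundled: BUNDLED
}.freeze

# Returns [source_name, price_entry], or nil when the model is unknown.
def lookup_price(provider, model, sources = SOURCES)
  sources.each do |name, table|
    # Assumption: within one source, the provider-qualified key is tried first.
    entry = table["#{provider}/#{model}"] || table[model]
    return [name, entry] if entry
  end
  nil
end

p lookup_price("openai", "gpt-4o-mini")
p lookup_price("openai", "gpt-4o")
p lookup_price("openai", "not-a-model")
```

Returning the source name alongside the entry mirrors what an "explain" command needs: not just the rate, but where it came from.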
data/app/controllers/llm_cost_tracker/calls_controller.rb
CHANGED
@@ -1,6 +1,7 @@
 # frozen_string_literal: true

 require "csv"
+require "json"

 module LlmCostTracker
   class CallsController < ApplicationController
@@ -54,32 +55,45 @@ module LlmCostTracker
     end

     def render_csv(relation)
-
+      fields = csv_fields
       CSV.generate do |csv|
-
-
-
-
-        csv << headers
-
-        relation.each do |call|
-          row = [
-            call.tracked_at&.utc&.iso8601,
-            csv_safe(call.provider),
-            csv_safe(call.model),
-            call.input_tokens,
-            call.output_tokens,
-            call.total_tokens,
-            call.total_cost
-          ]
-          row << call.latency_ms if latency
-          row << csv_safe(call.provider_response_id) if LlmApiCall.provider_response_id_column?
-          row << csv_safe(call.parsed_tags.to_json)
-          csv << row
+        csv << fields.map(&:to_s)
+
+        relation.pluck(*fields).each do |values|
+          csv << fields.zip(values).map { |field, value| csv_value(field, value) }
         end
       end
     end

+    def csv_fields
+      fields = %i[tracked_at provider model input_tokens output_tokens total_tokens total_cost]
+      fields << :latency_ms if LlmApiCall.latency_column?
+      fields << :provider_response_id if LlmApiCall.provider_response_id_column?
+      fields << :tags
+      fields
+    end
+
+    def csv_value(field, value)
+      case field
+      when :tracked_at
+        value&.utc&.iso8601
+      when :provider, :model, :provider_response_id
+        csv_safe(value)
+      when :tags
+        csv_safe(csv_tags(value))
+      else
+        value
+      end
+    end
+
+    def csv_tags(value)
+      return value.transform_keys(&:to_s).to_json if value.is_a?(Hash)
+
+      JSON.parse(value || "{}").to_json
+    rescue JSON::ParserError
+      "{}"
+    end
+
     def csv_safe(value)
       return value if value.nil?

data/app/services/llm_cost_tracker/dashboard/overview_stats.rb
CHANGED
@@ -1,5 +1,7 @@
 # frozen_string_literal: true

+require "llm_cost_tracker/storage/active_record_store"
+
 module LlmCostTracker
   module Dashboard
     OverviewStatsData = Data.define(
@@ -77,7 +79,7 @@ module LlmCostTracker
         now = Time.now.utc
         month_start = now.beginning_of_month
         month_end = now.end_of_month
-        spent = LlmCostTracker::
+        spent = LlmCostTracker::Storage::ActiveRecordStore.monthly_total(time: now)
         elapsed_seconds = now - month_start
         total_seconds = month_end - month_start
         projected_spent = if spent.zero? || !elapsed_seconds.positive?
data/app/services/llm_cost_tracker/dashboard/tag_key_explorer.rb
CHANGED
@@ -42,11 +42,10 @@ module LlmCostTracker
       end

       def build_sql
-
-
-
-
-        end
+        return postgresql_sql if ActiveRecordAdapter.postgresql?(connection)
+        return mysql_sql if ActiveRecordAdapter.mysql?(connection)
+
+        sqlite_sql
       end

       def mysql_sql
data/docs/architecture.md
ADDED
@@ -0,0 +1,28 @@
+# Architecture
+
+LLM Cost Tracker is a provider-agnostic billing ledger. Core code should model durable billing concepts, not the naming quirks of one provider or one model family.
+
+Core vocabulary belongs in provider-neutral terms:
+
+- `input_tokens`
+- `cache_read_input_tokens`
+- `cache_write_input_tokens`
+- `output_tokens`
+- `hidden_output_tokens`
+- `pricing_mode`
+- `provider_response_id`
+
+Provider-specific names belong only at ingestion boundaries: parsers and stream adapters. Those adapters translate raw fields into the canonical ledger vocabulary before data reaches `Tracker`, `Pricing`, storage, dashboard services, or reports.
+
+Pricing logic should prefer generic mechanisms over provider branches. Use provider/model price entries only for lookup and rate selection. Use `pricing_mode` plus mode-prefixed price keys for alternate billing modes instead of adding model-specific conditionals.
+
+Tags remain the extension point for app-specific attribution such as tenant, user, feature, trace, job, workflow, or agent session. Do not promote those dimensions into first-class columns unless the ledger itself needs them for provider-agnostic billing behavior.
+
+Hot-path guardrails must not aggregate over the growing call ledger. ActiveRecord period budgets should read maintained rows in `llm_cost_tracker_period_totals`; dashboard analytics may run grouped queries because they are user-initiated reporting paths. Do not add dashboard-only aggregate tables until bounded indexed reads from `llm_api_calls` are no longer enough for the supported date range.
+
+## Technical Docs
+
+- [Module map](technical/module-map.md)
+- [Data flow](technical/data-flow.md)
+- [Extension points](technical/extension-points.md)
+- [Operational notes](technical/operational-notes.md)
data/docs/budgets.md
ADDED
@@ -0,0 +1,45 @@
+# Budgets and Guardrails
+
+Budgets are safety rails for a Rails app using LLMs in production. They are not
+invoice reconciliation and they are not a transactional quota system.
+
+The full behavior reference is moving here from the README: monthly, daily, and
+per-call budgets; notification payloads; preflight behavior; and failure modes.
+
+## Canonical Sources
+
+Until this page is expanded, use:
+
+- [Budgets](../README.md#budgets)
+- [Known limitations](../README.md#known-limitations)
+- [Operations](operations.md)
+
+## Behaviors
+
+- `:notify`: call `on_budget_exceeded` after a priced event crosses a limit.
+- `:raise`: record the event, then raise `BudgetExceededError`.
+- `:block_requests`: preflight future calls when stored period totals are
+  already over budget.
+
+```ruby
+config.monthly_budget = 500.00
+config.daily_budget = 50.00
+config.per_call_budget = 2.00
+config.budget_exceeded_behavior = :block_requests
+```
+
+`:block_requests` needs ActiveRecord storage for shared period totals. Under
+concurrency it stops the next request after overspend is visible; it does not
+make provider spend transactional.
+
+## Error Payload
+
+`BudgetExceededError` exposes:
+
+- `budget_type`
+- `total`
+- `budget`
+- `monthly_total`
+- `daily_total`
+- `call_cost`
+- `last_event`
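Illustrative sketch (not part of the package diff): the three behaviors listed in `docs/budgets.md` differ mainly in when they react. The stand-in below records every event first, then notifies or raises, while `:block_requests` only acts in a separate preflight step. `BudgetGuard`, `preflight!`, `record`, the trimmed-down `BudgetExceededError`, and the strict `>` comparison are assumptions for illustration, not the gem's implementation.

```ruby
# Hypothetical stand-in; the gem's real error exposes more fields.
class BudgetExceededError < StandardError
  attr_reader :budget_type, :total, :budget

  def initialize(budget_type:, total:, budget:)
    @budget_type = budget_type
    @total = total
    @budget = budget
    super("#{budget_type} budget exceeded: #{total} > #{budget}")
  end
end

# Hypothetical guard tracking a single period total in memory.
class BudgetGuard
  attr_reader :total

  def initialize(budget:, behavior:, on_exceeded: nil)
    @budget = budget
    @behavior = behavior
    @on_exceeded = on_exceeded
    @total = 0.0
  end

  # :block_requests checks the stored total before the next call is made.
  def preflight!
    return unless @behavior == :block_requests && @total > @budget

    raise error
  end

  # The event is always recorded before any reaction.
  def record(cost)
    @total += cost
    return unless @total > @budget

    case @behavior
    when :notify then @on_exceeded&.call(error)
    when :raise then raise error
    end
  end

  private

  def error
    BudgetExceededError.new(budget_type: :monthly, total: @total, budget: @budget)
  end
end

guard = BudgetGuard.new(budget: 1.00, behavior: :block_requests)
guard.preflight!   # passes: nothing spent yet
guard.record(1.50) # the overspend is recorded, not prevented
begin
  guard.preflight!
rescue BudgetExceededError => e
  puts e.message
end
```

The usage lines show why the docs call this a guardrail rather than a quota: the call that crosses the limit still goes through, and only the following request is stopped.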
data/docs/configuration.md
ADDED
@@ -0,0 +1,65 @@
+# Configuration
+
+Configuration is the contract between the host app and the ledger: where events
+go, which integrations are enabled, how attribution is attached, and how the app
+reacts when storage, pricing, or budgets need attention.
+
+The full option reference is moving here from the README. Until that migration is
+complete, the README anchors below remain canonical.
+
+## Canonical Sources
+
+Until this page is expanded, use:
+
+- [Quickstart](../README.md#quickstart)
+- [Capturing calls](../README.md#capturing-calls)
+- [Tags](../README.md#tags-who-burned-this-money)
+- [Pricing](../README.md#pricing)
+- [Budgets](../README.md#budgets)
+
+## Scope
+
+This page is scoped to:
+
+- `storage_backend`: `:log`, `:active_record`, and `:custom`; ActiveRecord capture uses a durable inbox when the ingestion migration is present
+- `default_tags`: static tags and per-request callable tags
+- `instrument`: RubyLLM and official SDK integrations
+- `prices_file` and `pricing_overrides`
+- `monthly_budget`, `daily_budget`, and `per_call_budget`
+- `budget_exceeded_behavior`
+- `storage_error_behavior`
+- `unknown_pricing_behavior`
+- `openai_compatible_providers`
+- `report_tag_breakdowns`
+
+## Minimal Production Config
+
+```ruby
+LlmCostTracker.configure do |config|
+  config.storage_backend = :active_record
+  config.default_tags = -> { { environment: Rails.env } }
+  config.prices_file = Rails.root.join("config/llm_cost_tracker_prices.yml")
+  config.instrument :openai
+end
+```
+
+Keep configuration at boot. Mutable shared settings are frozen after
+`configure` returns so request-time code cannot silently change global tracking
+behavior.
+
+Enabled SDK integrations are fail-fast. The client gem must be loaded, meet the
+minimum supported version, and expose the expected classes and methods before
+LLM Cost Tracker installs its wrapper.
+
+## Capture Verification
+
+After boot, run:
+
+```bash
+bin/rails llm_cost_tracker:verify_capture
+```
+
+For ActiveRecord storage, the task records a synthetic manual event inside a
+rollback and checks that notifications and persistence both work. For log and
+custom storage, it reports the configured capture path without invoking external
+sinks.
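Illustrative sketch (not part of the package diff): `docs/configuration.md` says shared settings are frozen once `configure` returns. The general freeze-after-configure pattern can be sketched as below. `SketchConfiguration` and `SketchTracker` are hypothetical stand-ins, and which objects the gem actually freezes is not shown in this diff excerpt.

```ruby
# Hypothetical configuration object; attribute names mirror the docs, the class does not.
class SketchConfiguration
  attr_accessor :storage_backend, :default_tags

  def initialize
    @storage_backend = :log
    @default_tags = {}
  end
end

module SketchTracker
  def self.configure
    config = SketchConfiguration.new
    yield config
    # Freeze the nested value and the object itself once the block returns.
    config.default_tags.freeze
    @configuration = config.freeze
  end

  def self.configuration
    @configuration
  end
end

SketchTracker.configure do |config|
  config.storage_backend = :active_record
  config.default_tags = { environment: "production" }
end

begin
  # Request-time mutation is rejected instead of silently changing global behavior.
  SketchTracker.configuration.storage_backend = :log
rescue FrozenError => e
  puts "rejected: #{e.class}"
end
```

Freezing the nested hash as well as the outer object matters here: freezing only the configuration object would still let callers mutate `default_tags` in place.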
data/docs/cookbook.md
ADDED
@@ -0,0 +1,185 @@
+# Cookbook
+
+Short integration recipes for common Ruby clients. Prefer SDK integrations or middleware. Use `track` and `track_stream` only as fallback helpers for unsupported clients.
+
+| Client | Best path | Why |
+|---|---|---|
+| RubyLLM | `config.instrument :ruby_llm` | The integration wraps RubyLLM's provider layer without adding a third-party instrumentation gem. |
+| Official `openai` gem | `config.instrument :openai` | The integration wraps SDK resource methods without changing call sites. |
+| Official `anthropic` gem | `config.instrument :anthropic` | The integration records returned message usage without changing call sites. |
+| `ruby-openai` | Faraday middleware | The client is built on Faraday and accepts middleware via the constructor block. |
+| OpenAI-compatible proxy | Faraday middleware | Use `ruby-openai` or a direct Faraday client against the proxy host. |
+| Custom Faraday client | Faraday middleware | The middleware can parse known provider responses automatically. |
+| Other clients | Adapter first, fallback helpers second | Add a stable integration instead of scattering per-call ledger code. |
+
+## RubyLLM
+
+Enable the integration once, then keep normal RubyLLM calls unchanged.
+
+```ruby
+LlmCostTracker.configure do |config|
+  config.instrument :ruby_llm
+end
+
+LlmCostTracker.with_tags(feature: "support_chat") do
+  RubyLLM.chat.ask("Hello")
+  RubyLLM.embed("text to embed")
+end
+```
+
+The RubyLLM integration supports `ruby_llm >= 1.14.1` and checks RubyLLM's provider contract at boot. Chat, embedding, and transcription calls are captured. Image generation, moderation, and tool execution are not recorded as separate ledger rows.
+
+## Official OpenAI SDK
+
+Enable the integration once, then keep normal `openai` gem calls unchanged.
+
+```ruby
+LlmCostTracker.configure do |config|
+  config.instrument :openai
+end
+
+client = OpenAI::Client.new(api_key: ENV["OPENAI_API_KEY"])
+
+client.responses.create(model: "gpt-4o", input: "Hello")
+client.chat.completions.create(
+  model: "gpt-4o",
+  messages: [{ role: "user", content: "Hello" }]
+)
+
+client.responses.stream(model: "gpt-4o", input: "Hello").each do |event|
+  puts event.type
+end
+
+client.responses.stream_raw(model: "gpt-4o", input: "Hello").each do |event|
+  puts event.type
+end
+
+client.chat.completions.stream_raw(
+  model: "gpt-4o",
+  messages: [{ role: "user", content: "Hello" }],
+  stream_options: { include_usage: true }
+).each do |event|
+  puts event
+end
+```
+
+The OpenAI SDK integration supports `openai >= 0.59.0`. Streaming calls are recorded after the returned stream is consumed. Chat Completions streams need `stream_options: { include_usage: true }` for final usage.
+
+## Official Anthropic SDK
+
+Enable the integration once, then keep normal `anthropic` gem calls unchanged.
+
+```ruby
+LlmCostTracker.configure do |config|
+  config.instrument :anthropic
+end
+
+client = Anthropic::Client.new(api_key: ENV["ANTHROPIC_API_KEY"])
+
+client.messages.create(
+  max_tokens: 1024,
+  model: "claude-sonnet-4-5-20250929",
+  messages: [{ role: "user", content: "Hello" }]
+)
+
+client.messages.stream(
+  max_tokens: 1024,
+  model: "claude-sonnet-4-5-20250929",
+  messages: [{ role: "user", content: "Hello" }]
+).each do |event|
+  puts event.type
+end
+
+client.messages.stream_raw(
+  max_tokens: 1024,
+  model: "claude-sonnet-4-5-20250929",
+  messages: [{ role: "user", content: "Hello" }]
+).each do |event|
+  puts event.type
+end
+```
+
+The Anthropic SDK integration supports `anthropic >= 1.36.0`. Streaming calls are recorded after the returned stream is consumed.
+
+## ruby-openai
+
+`ruby-openai` is a community client that occupies the same `OpenAI::Client` constant as the official gem; only one of the two can be loaded. `config.instrument :openai` is for the official gem. For `ruby-openai`, attach the Faraday middleware via the constructor block:
+
+```ruby
+client = OpenAI::Client.new(access_token: ENV["OPENAI_API_KEY"]) do |f|
+  f.use :llm_cost_tracker, tags: { feature: "chat" }
+end
+
+client.chat(
+  parameters: {
+    model: "gpt-4o",
+    messages: [{ role: "user", content: "Hello" }],
+    stream: proc { |chunk, _bytesize| puts chunk.dig("choices", 0, "delta", "content") },
+    stream_options: { include_usage: true }
+  }
+)
+```
+
+Use the constructor block on every client you build, or wrap client creation in your own factory.
+
+## Azure OpenAI
+
+Azure's v1 API works with OpenAI-compatible HTTP shapes, but pricing and deployment names are yours. Register the Azure host, use the Faraday middleware path, and keep Azure-specific prices in `prices_file` or `pricing_overrides`.
+
+```ruby
+LlmCostTracker.configure do |config|
+  config.openai_compatible_providers["my-resource.openai.azure.com"] = "azure_openai"
+end
+
+conn = Faraday.new(url: "https://my-resource.openai.azure.com") do |f|
+  f.use :llm_cost_tracker, tags: { feature: "chat" }
+  f.request :json
+  f.response :json
+  f.adapter Faraday.default_adapter
+end
+
+conn.post(
+  "/openai/v1/responses",
+  { model: "gpt-4o-prod", input: "Hello" },
+  { "api-key" => ENV.fetch("AZURE_OPENAI_API_KEY") }
+)
+```
+
+## Gemini API
+
+Google's official Gemini SDKs do not include Ruby. Use a Faraday client against the REST API so the Gemini parser can capture usage automatically.
+
+```ruby
+conn = Faraday.new(url: "https://generativelanguage.googleapis.com") do |f|
+  f.use :llm_cost_tracker, tags: { feature: "chat" }
+  f.request :json
+  f.response :json
+  f.adapter Faraday.default_adapter
+end
+
+conn.post(
+  "/v1beta/models/gemini-2.5-flash:generateContent?key=#{ENV.fetch("GOOGLE_API_KEY")}",
+  { contents: [{ role: "user", parts: [{ text: "Hello" }] }] }
+)
+```
+
+## LiteLLM proxy
+
+LiteLLM Proxy speaks an OpenAI-compatible HTTP shape, so register the proxy host once and keep using the normal middleware path.
+
+```ruby
+LlmCostTracker.configure do |config|
+  config.openai_compatible_providers["proxy.internal.example"] = "litellm"
+end
+
+client = OpenAI::Client.new(
+  access_token: ENV["LITELLM_API_KEY"],
+  uri_base: "https://proxy.internal.example"
+) do |f|
+  f.use :llm_cost_tracker, tags: { gateway: "litellm" }
+end
+
+client.chat(parameters: { model: "openai/gpt-5-mini", messages: [{ role: "user", content: "Hello" }] })
+```
+
+If your proxy exposes custom model IDs or discounts, add them in `prices_file` or `pricing_overrides`.
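Illustrative sketch (not part of the package diff): the cookbook notes that SDK streaming calls are recorded after the returned stream is consumed. A wrapper with that property can be sketched as an `Enumerable` that forwards events and reports usage only when iteration finishes. `TrackedStream`, the event hash shape, and the `"stream_final"` label are hypothetical; only the `"unknown"` usage source is taken from the README's known limitations.

```ruby
# Hypothetical wrapper: reports usage only once the caller finishes iterating.
class TrackedStream
  include Enumerable

  def initialize(stream, &on_complete)
    @stream = stream
    @on_complete = on_complete
  end

  def each
    return enum_for(:each) unless block_given?

    usage = nil
    @stream.each do |event|
      # Remember the latest usage payload; it typically arrives near the end.
      usage = event[:usage] if event[:usage]
      yield event
    end
    # A stream without a usage event is still reported, flagged as unknown.
    @on_complete.call({ usage: usage, usage_source: usage ? "stream_final" : "unknown" })
  end
end

recorded = []
events = [
  { type: "delta", text: "Hel" },
  { type: "delta", text: "lo" },
  { type: "completed", usage: { input_tokens: 3, output_tokens: 2 } }
]

stream = TrackedStream.new(events) { |result| recorded << result }
stream.each { |event| puts event[:type] }
p recorded
```

Nothing is recorded when the wrapper is built, which matches the caveat in the cookbook: a stream that is never consumed never produces a ledger entry in this sketch.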
data/docs/dashboard-overview.png
Binary file
data/docs/dashboard.md
ADDED
@@ -0,0 +1,38 @@
+# Dashboard
+
+The dashboard is a Rails Engine for humans reviewing spend, attribution, and data
+quality. It is server-rendered ERB, has no JavaScript bundle, and reads from the
+host app's `llm_api_calls` table.
+
+The detailed dashboard guide is moving here from the README: mounting, route
+constraints, authentication examples, page map, and operational notes.
+
+## Canonical Sources
+
+Until this page is expanded, use:
+
+- [Dashboard](../README.md#dashboard)
+- [Privacy](../README.md#privacy)
+- [Operations](operations.md)
+
+## Mounting
+
+```ruby
+mount LlmCostTracker::Engine => "/llm-costs"
+```
+
+Use `storage_backend = :active_record` for apps that mount the dashboard.
+
+## Pages
+
+- Overview: spend trend, budget status, anomaly banner, provider rollup, top models
+- Models: spend and usage by provider and model
+- Calls: filterable call ledger with CSV export
+- Tags: tag keys and tag value breakdowns
+- Data Quality: unknown pricing, untagged calls, missing latency, incomplete streams
+
+## Authentication
+
+The gem does not ship dashboard auth. Mount the engine behind the host app's
+existing authentication layer: Devise, basic auth, Cloudflare Access, or your own
+constraints.
data/docs/extending.md
ADDED
@@ -0,0 +1,32 @@
+# Extending LLM Cost Tracker
+
+Extensions belong at clear boundaries: parsers for response shapes, integrations
+for SDK hooks, pricing files for rates, and custom storage for apps that own
+persistence themselves.
+
+The practical extension guide is moving here from the README. The lower-level
+contracts already live in the technical extension reference.
+
+## Canonical Sources
+
+Until this page is expanded, use:
+
+- [Capturing calls](../README.md#capturing-calls)
+- [Pricing](pricing.md)
+- [Technical extension points](technical/extension-points.md)
+
+## Extension Points
+
+- Custom parser: translate a provider response into `ParsedUsage`.
+- OpenAI-compatible host: register the host-to-provider mapping.
+- Custom storage: receive the canonical `Event` and write it elsewhere.
+- Notifications subscriber: observe `llm_request.llm_cost_tracker`.
+- Local price file: model gateway IDs, contract rates, or unsupported models.
+
+## Parser Boundary
+
+A parser matches request URLs, translates provider response shapes into
+`ParsedUsage`, and returns `nil` when the response is outside its contract.
+
+Do provider-specific translation at this boundary. Keep `Tracker`, storage,
+dashboard, and pricing in canonical ledger terms.
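Illustrative sketch (not part of the package diff): the "Parser Boundary" section describes three duties: match request URLs, translate the provider's response shape into `ParsedUsage`, and return `nil` for anything outside the contract. The stand-in below shows that shape for an imaginary provider. The `ParsedUsage` struct, `ExampleParser`, its method names, the host, and the raw field names are all hypothetical; the gem's real parser interface is documented in its technical extension reference, which this excerpt does not include.

```ruby
require "uri"

# Hypothetical value object; the gem's real ParsedUsage may carry more fields.
ParsedUsage = Struct.new(:provider, :model, :input_tokens, :output_tokens, keyword_init: true)

# Hypothetical parser for an imaginary provider at api.example-llm.test.
class ExampleParser
  HOST = "api.example-llm.test"

  def match?(url)
    URI.parse(url).host == HOST
  rescue URI::InvalidURIError
    false
  end

  # Translates provider-specific field names into canonical ledger terms.
  # Returns nil when the body does not look like a response this parser owns.
  def parse(url, body)
    return nil unless match?(url)

    usage = body["usage"]
    return nil unless usage.is_a?(Hash) && body["model"]

    ParsedUsage.new(
      provider: "example",
      model: body["model"],
      input_tokens: usage.fetch("prompt_tokens", 0),
      output_tokens: usage.fetch("completion_tokens", 0)
    )
  end
end

parser = ExampleParser.new
body = { "model" => "example-small", "usage" => { "prompt_tokens" => 12, "completion_tokens" => 5 } }

p parser.parse("https://api.example-llm.test/v1/chat", body)
p parser.parse("https://other.example.test/v1/chat", body)
```

Everything downstream of `parse` sees only `input_tokens` and `output_tokens`, which is the point of the boundary: provider field names such as `prompt_tokens` stop here.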
data/docs/operations.md
ADDED
@@ -0,0 +1,44 @@
+# Operations
+
+Production use is mostly about choosing the right storage backend, keeping the
+database healthy, and understanding where the gem is intentionally best effort.
+
+The operational guide is moving here from the README: retention, tag storage,
+thread safety, connection pools, and deployment notes.
+
+## Canonical Sources
+
+Until this page is expanded, use:
+
+- [Privacy](../README.md#privacy)
+- [Known limitations](../README.md#known-limitations)
+- [Technical operational notes](technical/operational-notes.md)
+
+## Production Defaults
+
+- Use `storage_backend = :active_record` for the shared ledger, dashboard, and
+  cross-process budget guardrails.
+- Size the ActiveRecord connection pool for your app plus ledger writes.
+- Keep `storage_error_behavior = :warn` unless losing the LLM response is better
+  than losing the ledger event.
+- Treat `:block_requests` as a guardrail, not a hard quota.
+- Keep `default_tags` callables fast and thread-safe.
+
+## Retention
+
+Retention is explicit. Use the prune task when the ledger should not grow
+forever:
+
+```bash
+DAYS=90 bin/rails llm_cost_tracker:prune
+```
+
+When ActiveRecord period rollups are installed, pruning decrements the
+affected daily and monthly buckets in the same batch transaction as the ledger
+delete.
+
+## Data Shape
+
+Tags are JSONB with a GIN index on PostgreSQL and JSON text elsewhere. The
+dashboard and query helpers work across supported adapters, but PostgreSQL is the
+strongest path for large tag-heavy ledgers.
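Illustrative sketch (not part of the package diff): the "Retention" section says pruning decrements the affected daily and monthly buckets together with the ledger delete. The in-memory sketch below shows that bookkeeping. `LedgerCall`, `prune!`, and the hash-based rollups are hypothetical stand-ins; a database-backed store would presumably wrap the delete and the decrements in one transaction per batch.

```ruby
require "date"

# Hypothetical ledger row: the day a call was tracked and its cost.
LedgerCall = Struct.new(:tracked_on, :cost)

# Deletes calls older than the cutoff and keeps both rollups in sync.
# Returns the number of pruned rows.
def prune!(calls, daily_totals, monthly_totals, cutoff:)
  expired, kept = calls.partition { |call| call.tracked_on < cutoff }
  expired.each do |call|
    # Decrement the same buckets the call was originally added to.
    daily_totals[call.tracked_on] -= call.cost
    monthly_totals[call.tracked_on.strftime("%Y-%m")] -= call.cost
  end
  calls.replace(kept)
  expired.size
end

calls = [
  LedgerCall.new(Date.new(2026, 1, 10), 1.0),
  LedgerCall.new(Date.new(2026, 4, 20), 2.0)
]
daily = Hash.new(0.0)
monthly = Hash.new(0.0)
calls.each do |call|
  daily[call.tracked_on] += call.cost
  monthly[call.tracked_on.strftime("%Y-%m")] += call.cost
end

# Mirror DAYS=90 relative to a fixed "today" so the example is deterministic.
removed = prune!(calls, daily, monthly, cutoff: Date.new(2026, 4, 29) - 90)
puts "pruned #{removed} row(s)"
p monthly
```

Without the decrement step, a budget check that reads the rollups would keep counting spend for rows that no longer exist in the ledger.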