llm_cost_tracker 0.5.0 → 0.5.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +38 -0
- data/README.md +116 -467
- data/app/controllers/llm_cost_tracker/calls_controller.rb +2 -1
- data/app/controllers/llm_cost_tracker/dashboard_controller.rb +3 -15
- data/app/controllers/llm_cost_tracker/tags_controller.rb +7 -6
- data/app/helpers/llm_cost_tracker/application_helper.rb +21 -6
- data/app/helpers/llm_cost_tracker/dashboard_filter_options_helper.rb +3 -1
- data/app/services/llm_cost_tracker/dashboard/date_range.rb +42 -0
- data/app/services/llm_cost_tracker/dashboard/filter.rb +6 -8
- data/app/services/llm_cost_tracker/dashboard/spend_anomaly.rb +6 -5
- data/app/services/llm_cost_tracker/dashboard/tag_breakdown.rb +74 -18
- data/app/services/llm_cost_tracker/dashboard/tag_key_explorer.rb +15 -4
- data/app/views/llm_cost_tracker/shared/_tag_chips.html.erb +1 -1
- data/app/views/llm_cost_tracker/tags/show.html.erb +4 -0
- data/lib/llm_cost_tracker/configuration.rb +22 -16
- data/lib/llm_cost_tracker/doctor.rb +1 -1
- data/lib/llm_cost_tracker/generators/llm_cost_tracker/install_generator.rb +1 -0
- data/lib/llm_cost_tracker/generators/llm_cost_tracker/templates/initializer.rb.erb +8 -2
- data/lib/llm_cost_tracker/integrations/anthropic.rb +12 -3
- data/lib/llm_cost_tracker/integrations/base.rb +77 -6
- data/lib/llm_cost_tracker/integrations/object_reader.rb +1 -1
- data/lib/llm_cost_tracker/integrations/openai.rb +14 -5
- data/lib/llm_cost_tracker/integrations/registry.rb +3 -1
- data/lib/llm_cost_tracker/integrations/ruby_llm.rb +171 -0
- data/lib/llm_cost_tracker/llm_api_call.rb +10 -9
- data/lib/llm_cost_tracker/middleware/faraday.rb +10 -6
- data/lib/llm_cost_tracker/parsers/gemini.rb +8 -1
- data/lib/llm_cost_tracker/parsers/openai_usage.rb +11 -2
- data/lib/llm_cost_tracker/price_freshness.rb +3 -3
- data/lib/llm_cost_tracker/price_registry.rb +3 -0
- data/lib/llm_cost_tracker/price_sync/fetcher.rb +43 -12
- data/lib/llm_cost_tracker/price_sync/registry_diff.rb +51 -0
- data/lib/llm_cost_tracker/price_sync/registry_loader.rb +6 -0
- data/lib/llm_cost_tracker/price_sync/registry_writer.rb +5 -1
- data/lib/llm_cost_tracker/price_sync.rb +103 -111
- data/lib/llm_cost_tracker/prices.json +225 -229
- data/lib/llm_cost_tracker/pricing.rb +27 -15
- data/lib/llm_cost_tracker/report.rb +8 -1
- data/lib/llm_cost_tracker/report_data.rb +25 -9
- data/lib/llm_cost_tracker/retention.rb +30 -7
- data/lib/llm_cost_tracker/storage/dispatcher.rb +68 -0
- data/lib/llm_cost_tracker/stream_capture.rb +7 -0
- data/lib/llm_cost_tracker/stream_collector.rb +25 -1
- data/lib/llm_cost_tracker/tag_sanitizer.rb +81 -0
- data/lib/llm_cost_tracker/tracker.rb +7 -59
- data/lib/llm_cost_tracker/version.rb +1 -1
- data/lib/llm_cost_tracker.rb +1 -0
- data/lib/tasks/llm_cost_tracker.rake +24 -78
- metadata +26 -15
- data/lib/llm_cost_tracker/price_sync/merger.rb +0 -72
- data/lib/llm_cost_tracker/price_sync/model_catalog.rb +0 -77
- data/lib/llm_cost_tracker/price_sync/raw_price.rb +0 -33
- data/lib/llm_cost_tracker/price_sync/refresh_plan_builder.rb +0 -164
- data/lib/llm_cost_tracker/price_sync/source.rb +0 -29
- data/lib/llm_cost_tracker/price_sync/source_result.rb +0 -7
- data/lib/llm_cost_tracker/price_sync/sources/litellm.rb +0 -90
- data/lib/llm_cost_tracker/price_sync/sources/open_router.rb +0 -93
- data/lib/llm_cost_tracker/price_sync/validator.rb +0 -66
data/README.md CHANGED

@@ -1,111 +1,105 @@
 # LLM Cost Tracker
 
-
+A Rails-native ledger for what your LLM calls actually cost.
 
 [](https://rubygems.org/gems/llm_cost_tracker)
 [](https://github.com/sergey-homenko/llm_cost_tracker/actions)
 [](https://codecov.io/gh/sergey-homenko/llm_cost_tracker)
 
-
-ActiveRecord storage requires ActiveRecord 7.1+. The mounted dashboard requires Rails 7.1+.
+If you have OpenAI, Anthropic, or Gemini in production and someone keeps asking "where did that bill come from?", this gem records every call into your own database, prices it locally, and gives you a dashboard you can mount in five minutes. No proxy, no SaaS account, no extra service to deploy.
 
-
+It is not Langfuse, Helicone, or LiteLLM. It does not capture prompts, score completions, or replay traces. It does one thing: tells you which provider, which model, which feature, and which user burned how much money. That's the entire pitch.
 
-
+Requires Ruby 3.3+, ActiveSupport 7.1+, Faraday 2.0+. ActiveRecord storage and the dashboard need Rails 7.1+.
 
-
-
-- A local ActiveRecord ledger of provider, model, usage breakdown, cost, latency, tags, streaming usage, and provider response IDs
-- Optional official OpenAI and Anthropic SDK integrations, plus Faraday middleware for custom clients
-- Explicit `track` / `track_stream` helpers as a fallback for unsupported clients
-- Server-rendered Rails dashboard with overview, models, calls, tags, CSV export, and data-quality pages
-- Local pricing snapshots, price sync tasks, and budget guardrails
-- Prompt and response bodies are never persisted
-
-## Dashboard
-
-LLM Cost Tracker ships with a server-rendered Rails Engine dashboard for spend review, attribution, and data quality checks.
-
-
-
-The overview page includes spend trend, budget status, provider breakdown, top models, and filterable slices. The engine also includes Models, Calls, Tags, and Data Quality pages. Plain ERB, no JavaScript bundle.
+
 
 ## Quickstart
 
+Add to your Gemfile alongside whatever LLM client you already use:
+
 ```ruby
 gem "llm_cost_tracker"
-gem "openai"
+gem "openai" # or "anthropic", "ruby_llm", or your existing client
 ```
 
+Install, migrate, verify:
+
 ```bash
 bin/rails generate llm_cost_tracker:install --dashboard --prices
 bin/rails db:migrate
 bin/rails llm_cost_tracker:doctor
 ```
 
-
+Drop this into `config/initializers/llm_cost_tracker.rb`:
 
 ```ruby
 LlmCostTracker.configure do |config|
   config.storage_backend = :active_record
-  config.default_tags
+  config.default_tags = -> { { environment: Rails.env } }
   config.instrument :openai
 end
+```
+
+Now every OpenAI call is recorded. Wrap calls in `with_tags` to attribute spend to a user, feature, or anything else you care about:
 
+```ruby
 LlmCostTracker.with_tags(user_id: Current.user&.id, feature: "chat") do
   client = OpenAI::Client.new(api_key: ENV["OPENAI_API_KEY"])
   client.responses.create(model: "gpt-4o", input: "Hello")
 end
 ```
 
-
-
+Visit `/llm-costs` for the dashboard. **Mount it behind your app's auth before deploying** — the gem doesn't ship with one, on purpose.
+
+## What you get
 
-
+- Local ActiveRecord ledger of every call: provider, model, token breakdown, cost, latency, tags, response IDs
+- Auto-capture for RubyLLM and the official `openai` and `anthropic` Ruby SDKs, plus Faraday middleware for `ruby-openai`, the Gemini REST API, and any client you can inject middleware into
+- Server-rendered dashboard (plain ERB, zero JavaScript) with overview, models, calls, tags, CSV export, and a data-quality page
+- Local pricing snapshots refreshed daily from the official provider pricing pages, applied with `bin/rails llm_cost_tracker:prices:refresh`
+- Monthly / daily / per-call budget guardrails with notify, raise, or block-requests behaviour
+- Tag-based attribution that survives concurrency — Puma threads and Sidekiq fibers don't bleed into each other
 
-
-- Best-effort pricing for spend review and attribution, not invoice-grade billing
-- No prompt or response body storage
-- No built-in auth on the mounted dashboard
-- Use `:active_record` when you want shared dashboards and budget checks across Puma workers and Sidekiq processes
+## What it deliberately doesn't do
 
-
+- **Doesn't run as a proxy.** Calls go directly from your app to the provider.
+- **Doesn't store prompts or completions.** Token counts, model, cost, tags, response IDs only. Nothing else.
+- **Doesn't promise invoice-grade accuracy.** It uses official provider pricing pages, but enterprise rates, batch discounts on unsupported endpoints, and modality tiers are not always modeled. `provider_response_id` is stored as a join key for whoever does that reconciliation.
+- **Doesn't ship with auth on the dashboard.** It's a Rails Engine; mount it behind whatever your app already uses (Devise, basic auth, Cloudflare Access, your own session middleware).
+- **Doesn't centralize multi-service visibility.** One Rails monolith — perfect fit. Six services in four languages — wrong tool, look at a proxy or API-layer gateway.
 
-
+## Capturing calls
 
-
+Three paths, in order of preference. Use the first one that fits your stack.
 
-###
+### 1. SDK integrations
 
-
+Drop-in for RubyLLM and the official `openai` and `anthropic` gems. `config.instrument` patches tested SDK methods so you don't change a single call site:
 
 ```ruby
 LlmCostTracker.configure do |config|
-  config.instrument :openai
-  config.instrument :anthropic
+  config.instrument :openai # or :anthropic / :ruby_llm
 end
-```
-
-The OpenAI integration records non-streaming calls through the official `openai` gem's `responses.create` and `chat.completions.create`. The Anthropic integration records non-streaming calls through the official `anthropic` gem's `messages.create`. Both integrations extract usage, model, latency, provider response ID, cache tokens, and hidden/reasoning tokens when the SDK response exposes them.
 
-
-
-
-anthropic.messages.create(
-  model: "claude-sonnet-4-5-20250929",
+LlmCostTracker.with_tags(feature: "support_chat") do
+  Anthropic::Client.new.messages.create(
+    model: "claude-sonnet-4-6",
     max_tokens: 1024,
     messages: [{ role: "user", content: "Hello" }]
   )
 end
 ```
 
-
+Captures usage, model, latency, response ID, cache tokens, and reasoning tokens whenever the SDK exposes them. Provider SDKs are not added as gem dependencies — you install whichever you actually use.
+
+Enabled integrations are checked at boot: the client gem must be loaded, meet the minimum supported version, and expose the expected classes and methods. If the contract check fails, boot raises instead of silently missing spend.
 
-
+This patches **only** RubyLLM and the official Ruby SDKs. `ruby-openai` (alexrudall) and any custom client go through Faraday middleware below.
 
-### Faraday middleware
+### 2. Faraday middleware
 
-`
+For `ruby-openai`, the Gemini REST API, custom Faraday clients, or anything OpenAI-compatible (OpenRouter, DeepSeek, LiteLLM proxies):
 
 ```ruby
 conn = Faraday.new(url: "https://api.openai.com") do |f|
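As an aside on the tag scoping the new README text above describes (`with_tags` is thread- and fiber-isolated, and explicit tags beat scoped tags, which beat `default_tags`), here is a minimal pure-Ruby sketch of how such a scope can work. `TagScope`, `resolve`, and the `environment: "test"` default are illustrative names only, not the gem's internals:

```ruby
# Illustrative sketch only — TagScope is NOT llm_cost_tracker's internal API.
module TagScope
  KEY = :llm_cost_tracker_tags # per-thread/fiber storage key (hypothetical)

  class << self
    # Hash or callable, evaluated on every event.
    attr_accessor :default_tags

    # Scoped tags live in Thread.current (fiber-local storage), so
    # concurrent Puma threads and Sidekiq fibers never observe each
    # other's scopes.
    def with_tags(tags)
      previous = Thread.current[KEY]
      Thread.current[KEY] = (previous || {}).merge(tags)
      yield
    ensure
      Thread.current[KEY] = previous
    end

    # Precedence: explicit call-site tags > scoped tags > defaults.
    def resolve(explicit = {})
      base = default_tags.respond_to?(:call) ? default_tags.call : (default_tags || {})
      base.merge(Thread.current[KEY] || {}).merge(explicit)
    end
  end

  self.default_tags = -> { { environment: "test" } }
end

# Inside the scope, resolve merges defaults, scoped tags, and explicit tags.
TagScope.with_tags(feature: "chat") do
  TagScope.resolve(user_id: 42) # keys: :environment, :feature, :user_id
end
```

Restoring the previous value in `ensure` is what makes nesting and exception safety fall out for free; the scope unwinds even if the block raises.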
@@ -114,66 +108,17 @@ conn = Faraday.new(url: "https://api.openai.com") do |f|
   f.response :json
   f.adapter Faraday.default_adapter
 end
-
-conn.post("/v1/responses", { model: "gpt-5-mini", input: "Hello!" })
-```
-
-Place `llm_cost_tracker` inside the Faraday stack where it can see the final response body.
-
-The same middleware covers `ruby-openai` through its constructor block.
-
-### Streaming
-
-Streaming is captured automatically for OpenAI, Anthropic, and Gemini when the request goes through the Faraday middleware. The middleware tees the `on_data` callback, keeps the stream flowing to your code, and records provider-reported usage once the response completes.
-
-```ruby
-# OpenAI: include usage in the final chunk
-client.chat(parameters: {
-  model: "gpt-4o",
-  messages: [...],
-  stream: proc { |chunk| ... },
-  stream_options: { include_usage: true }
-})
-```
-
-Anthropic emits usage in `message_start` + `message_delta` events. Gemini's `:streamGenerateContent` endpoint includes `usageMetadata`; the latest usage block is used.
-
-Streamed calls are stored with `stream: true` and `usage_source: "stream_final"`. If the provider never sends final usage, the call is still recorded with `usage_source: "unknown"` so those calls surface on the Data Quality page.
-
-When the provider emits a stable response object ID, LLM Cost Tracker stores it as `provider_response_id`. OpenAI and Anthropic are covered end-to-end; Gemini is best effort and may vary by endpoint or API version.
-
-Model identifiers are extracted from the provider response, request body, stream events, or URL path depending on the provider. If no source carries a model, the event is stored under `model: "unknown"` and shows up as unknown pricing instead of being guessed.
-
-For non-Faraday clients without an SDK integration, prefer adding a supported adapter. Use the explicit helper only as a fallback while wiring a client that does not expose a stable hook yet:
-
-```ruby
-LlmCostTracker.track_stream(provider: "openai", model: "gpt-4o") do |stream|
-  my_client.stream(...) { |event| stream.event(event.to_h) }
-end
-
-# Or skip provider event parsing entirely if you already know the totals:
-LlmCostTracker.track_stream(provider: "openai", model: "gpt-4o") do |stream|
-  # ... your streaming loop ...
-  stream.usage(input_tokens: 120, output_tokens: 45)
-end
 ```
 
-
-
-```ruby
-LlmCostTracker.track_stream(provider: "anthropic", model: "claude-sonnet-4-6") do |stream|
-  stream.provider_response_id = response.id
-  stream.usage(input_tokens: 120, output_tokens: 45)
-end
-```
+Tags can be a hash or a callable evaluated per request. Place the middleware where it sees the final response body — in practice, before the JSON parser.
 
-
+Streaming works through the same path: the middleware tees the `on_data` callback so your code keeps receiving chunks normally, and the final usage gets recorded once the stream finishes. OpenAI streams need `stream_options: { include_usage: true }` for the final usage event.
 
-
+Per-client setup snippets for `ruby-openai`, Azure OpenAI, LiteLLM proxy, and Gemini live in [`docs/cookbook.md`](docs/cookbook.md).
 
-###
+### 3. Manual `track` / `track_stream`
 
-
+When you have a client that doesn't expose Faraday and isn't an official SDK — internal gateways, homegrown wrappers, batch jobs replaying historical usage:
 
 ```ruby
 LlmCostTracker.track(
@@ -181,22 +126,16 @@ LlmCostTracker.track(
   model: "claude-sonnet-4-6",
   input_tokens: 1500,
   output_tokens: 320,
-  provider_response_id: "msg_01XFDUDYJgAACzvnptvVoYEL",
-  cache_read_input_tokens: 1200,
   feature: "summarizer",
   user_id: current_user.id
 )
 ```
 
-`
-`cache_read_input_tokens` and cache writes in `cache_write_input_tokens`; total
-tokens are calculated from the canonical billing breakdown.
-
-For manual tracking, pass the real upstream model when you know it. If a gateway only exposes a deployment or router name, use that stable identifier and add a matching `prices_file` / `pricing_overrides` entry.
+For streaming the same way, `track_stream` accepts a block, parses provider events automatically, and records once the stream finishes. Full reference in [`docs/streaming.md`](docs/streaming.md).
 
-
+## Tags: who burned this money
 
-Tags
+Tags answer the only question that matters in attribution: which feature, which user, which job, which tenant. They're free-form strings, indexed (JSONB on Postgres, fallback elsewhere), and queryable from both Ruby and the dashboard.
 
 ```ruby
 LlmCostTracker.with_tags(user_id: current_user.id, feature: "support_chat", trace_id: request.uuid) do
@@ -204,68 +143,15 @@ LlmCostTracker.with_tags(user_id: current_user.id, feature: "support_chat", trace_id: request.uuid) do
 end
 ```
 
-`
+`with_tags` is thread- and fiber-isolated, so concurrent requests in Puma or jobs in Sidekiq don't bleed into each other. A `default_tags` callable on configuration runs on every event for things you always want — `environment`, `region`, deployment SHA. Explicit tags passed to `track` win over scoped tags, scoped tags win over defaults.
 
-
-
-```ruby
-LlmCostTracker.configure do |config|
-  config.storage_backend = :active_record
-  config.default_tags = -> { { environment: Rails.env } }
-  config.instrument :openai
-  config.instrument :anthropic
-  config.prices_file = Rails.root.join("config/llm_cost_tracker_prices.yml")
-  config.monthly_budget = 500.00
-  config.daily_budget = 50.00
-  config.per_call_budget = 2.00
-  config.budget_exceeded_behavior = :notify
-  config.on_budget_exceeded = ->(data) {
-    SlackNotifier.notify("#alerts", "LLM #{data[:budget_type]} budget $#{data[:total].round(2)} / $#{data[:budget]}")
-  }
-end
-```
+What you put in tags is **your** input — they're queryable strings. Don't put prompts, completions, emails, or secrets there. Use IDs.
 
-
-
-Configuration reference:
-
-| Option | Default | Purpose |
-|---|---:|---|
-| `enabled` | `true` | Turns tracking on/off. |
-| `storage_backend` | `:log` | `:log`, `:active_record`, or `:custom`. |
-| `custom_storage` | `nil` | Callable storage hook for `:custom`. |
-| `default_tags` | `{}` | Hash or callable merged into every event. |
-| `prices_file` | `nil` | Local JSON/YAML price table. |
-| `pricing_overrides` | `{}` | Ruby-side model price overrides. |
-| `instrument` | none | Enables optional SDK integrations such as `:openai`, `:anthropic`, or `:all`. |
-| `monthly_budget` | `nil` | Monthly spend guardrail. |
-| `daily_budget` | `nil` | Daily spend guardrail. |
-| `per_call_budget` | `nil` | Single-event spend guardrail. |
-| `budget_exceeded_behavior` | `:notify` | `:notify`, `:raise`, or `:block_requests`. |
-| `on_budget_exceeded` | `nil` | Callback for budget events. |
-| `storage_error_behavior` | `:warn` | `:ignore`, `:warn`, or `:raise`. |
-| `unknown_pricing_behavior` | `:warn` | `:ignore`, `:warn`, or `:raise`. |
-| `log_level` | `:info` | Log level used by `:log` storage. |
-| `openai_compatible_providers` | OpenRouter + DeepSeek | Host-to-provider map for compatible APIs. |
-| `report_tag_breakdowns` | `[]` | Tag keys included in text reports. |
-
-LLM Cost Tracker estimates cost from recorded usage and a versioned price registry. Providers usually return token usage, not a stable per-request price, so request costs are calculated locally and stored with the call. Historical rows do not change when prices update.
-
-Pricing is best effort. OpenRouter-style IDs like `openai/gpt-4o-mini` are normalized to built-in names when possible. Use `prices_file` / `pricing_overrides` for fine-tunes, gateway-specific IDs, enterprise discounts, alternate pricing modes, or models the gem does not know.
-Provider-specific entries like `openai/gpt-4o-mini` win over model-only entries like `gpt-4o-mini`.
-Pass `pricing_mode: :batch` to use optional mode-specific keys such as `batch_input` / `batch_output`; missing mode-specific keys fall back to standard `input` / `output` rates. The same pattern works for custom modes, for example `contract_input`.
-
-`storage_error_behavior = :warn` (default) lets LLM responses continue if storage fails; `:raise` exposes `StorageError#original_error`.
-
-With `unknown_pricing_behavior = :ignore` or `:warn`, unknown pricing still records token counts, but `cost` is `nil` and budget guardrails skip that event. With `:raise`, the event raises before storage. Find unpriced models:
-
-```ruby
-LlmCostTracker::LlmApiCall.unknown_pricing.group(:model).count
-```
+## Pricing
 
-
+Built-in prices live in `lib/llm_cost_tracker/prices.json` and are refreshed daily from official provider pricing pages by an automated CI workflow that opens a PR on every change. Most apps run on bundled prices and never think about this.
 
-
+When you want to control updates yourself — for negotiated rates, gateway-specific model IDs, or pinned reviews — generate a local snapshot:
 
 ```bash
 bin/rails generate llm_cost_tracker:prices
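The pricing precedence this diff describes — `pricing_overrides`, then `prices_file`, then bundled prices, with provider-qualified keys like `openai/gpt-4o-mini` beating model-only keys — amounts to a short lookup chain. A hedged pure-Ruby sketch follows; the tables, the rates, and the `price_for` helper are hypothetical illustrations, not the gem's API:

```ruby
# Illustrative tables and numbers — not llm_cost_tracker's real data or API.
# Rates here are read as USD per 1M tokens, matching the price file format.
PRICING_OVERRIDES = { "openai/gpt-4o-mini" => { input: 0.10, output: 0.40 } }
PRICES_FILE       = {} # local snapshot, empty in this sketch
BUNDLED_PRICES    = { "gpt-4o-mini" => { input: 0.15, output: 0.60 } }

def price_for(provider, model)
  # Higher-priority tables are consulted first; within each table the
  # provider-qualified key beats the bare model key.
  [PRICING_OVERRIDES, PRICES_FILE, BUNDLED_PRICES].each do |table|
    entry = table["#{provider}/#{model}"] || table[model]
    return entry if entry
  end
  nil # unknown pricing: tokens would still be recorded, cost stays nil
end

price_for("openai", "gpt-4o-mini") # served by the override table
price_for("azure", "gpt-4o-mini")  # falls through to the bundled entry
```

Returning `nil` rather than guessing mirrors the README's stance that unpriced models surface as unknown pricing instead of fabricated costs.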
@@ -275,356 +161,119 @@ bin/rails generate llm_cost_tracker:prices
|
|
|
275
161
|
config.prices_file = Rails.root.join("config/llm_cost_tracker_prices.yml")
|
|
276
162
|
```
|
|
277
163
|
|
|
278
|
-
|
|
279
|
-
|
|
280
|
-
```yaml
|
|
281
|
-
metadata:
|
|
282
|
-
updated_at: "2026-04-25"
|
|
283
|
-
currency: USD
|
|
284
|
-
unit: 1M tokens
|
|
285
|
-
models:
|
|
286
|
-
my-gateway/gpt-4o-mini:
|
|
287
|
-
input: 0.20
|
|
288
|
-
cache_read_input: 0.10
|
|
289
|
-
output: 0.80
|
|
290
|
-
batch_input: 0.10
|
|
291
|
-
batch_output: 0.40
|
|
292
|
-
```
|
|
293
|
-
|
|
294
|
-
Pricing precedence is `pricing_overrides`, then `prices_file`, then bundled prices. Use `prices_file` for the app's source-controlled snapshot and `pricing_overrides` only for a handful of Ruby-side emergency overrides.
|
|
295
|
-
|
|
296
|
-
To refresh prices on demand:
|
|
164
|
+
Refresh on demand from the maintained snapshot:
|
|
297
165
|
|
|
298
166
|
```bash
|
|
299
|
-
bin/rails llm_cost_tracker:prices:
|
|
300
|
-
```
|
|
301
|
-
|
|
302
|
-
`llm_cost_tracker:prices:sync` refreshes a pricing file from two structured sources: LiteLLM first, OpenRouter second. LiteLLM is the primary source; OpenRouter fills gaps and helps surface discrepancies.
|
|
303
|
-
|
|
304
|
-
`llm_cost_tracker:prices:sync` / `llm_cost_tracker:prices:check` perform HTTP GET requests to:
|
|
305
|
-
|
|
306
|
-
- LiteLLM pricing JSON: `https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json`
|
|
307
|
-
- OpenRouter Models API: `https://openrouter.ai/api/v1/models`
|
|
308
|
-
|
|
309
|
-
The task writes to `ENV["OUTPUT"]`, then `config.prices_file`, in that order. It aborts if neither is present. The gem's bundled `prices.json` is only updated when you explicitly pass it through `OUTPUT=` while developing the gem. `_source: "manual"` entries are never touched. Models that are still in your file but missing from both upstream sources are left alone and reported as orphaned. For intentional custom entries, mark them as manual so they stop showing up in orphaned warnings.
|
|
310
|
-
|
|
311
|
-
Use `OUTPUT=config/llm_cost_tracker_prices.yml` to choose a target file explicitly. Use `PREVIEW=1` to see the diff without writing. Use `STRICT=1` to fail instead of applying a partial refresh when a source fails or the validator rejects a price. Use `bin/rails llm_cost_tracker:prices:check` in CI to print the current diff and exit non-zero when the snapshot has drifted or refresh fails.
|
|
312
|
-
|
|
313
|
-
Large price changes are flagged during sync. If a specific entry is expected to move by more than 3x, add `_validator_override: ["skip_relative_change"]` to that entry in your local price file.
|
|
314
|
-
|
|
315
|
-
If sync reports `certificate verify failed`, fix the host Ruby/OpenSSL trust store rather than disabling TLS verification. Common fixes are installing `ca-certificates` in Docker/Linux images, configuring the corporate proxy CA, setting `SSL_CERT_FILE` to the system CA bundle, or rebuilding rbenv/asdf Ruby after an OpenSSL upgrade.
|
|
316
|
-
|
|
317
|
-
For unattended updates, run the check daily and sync through review:
|
|
318
|
-
|
|
319
|
-
```bash
|
|
320
|
-
bin/rails llm_cost_tracker:prices:check
|
|
321
|
-
STRICT=1 bin/rails llm_cost_tracker:prices:sync
|
|
322
|
-
```
|
|
323
|
-
|
|
324
|
-
`bin/rails llm_cost_tracker:doctor` warns when the configured price file has no `metadata.updated_at` or when it is older than 30 days.
|
|
325
|
-
|
|
326
|
-
## Budget enforcement
|
|
327
|
-
|
|
328
|
-
```ruby
|
|
329
|
-
config.storage_backend = :active_record
|
|
330
|
-
config.monthly_budget = 100.00
|
|
331
|
-
config.daily_budget = 10.00
|
|
332
|
-
config.per_call_budget = 1.00
|
|
333
|
-
config.budget_exceeded_behavior = :block_requests
|
|
167
|
+
bin/rails llm_cost_tracker:prices:refresh
|
|
334
168
|
```
|
|
335
169
|
|
|
336
|
-
|
|
337
|
-
- `:raise` — record the event, then raise `BudgetExceededError`.
|
|
338
|
-
- `:block_requests` — block preflight when the stored monthly or daily total is already over budget; still raises post-response on the event that crosses the line. Needs `:active_record` storage for preflight.
|
|
170
|
+
Precedence is `pricing_overrides` → `prices_file` → bundled. Provider-qualified keys like `openai/gpt-4o-mini` win over model-only keys. Full pricing reference: [`docs/pricing.md`](docs/pricing.md).
|
|
339
171
|
|
|
340
|
-
|
|
172
|
+
## Budgets
|
|
341
173
|
|
|
342
|
-
|
|
174
|
+
Budgets are guardrails, not transactional caps:
|
|
343
175
|
|
|
344
176
|
```ruby
|
|
345
|
-
|
|
346
|
-
|
|
177
|
+
config.monthly_budget = 500.00
|
|
178
|
+
config.daily_budget = 50.00
|
|
179
|
+
config.per_call_budget = 2.00
|
|
180
|
+
config.budget_exceeded_behavior = :block_requests # or :notify, :raise
|
|
181
|
+
config.on_budget_exceeded = ->(data) { SlackNotifier.notify("#alerts", "...") }
|
|
347
182
|
```
|
|
348
183
|
|
|
349
|
-
`:block_requests`
|
|
184
|
+
`:block_requests` reads ledger totals before a call goes out and stops it if you're already over. Under concurrency multiple workers can pass preflight at the same time and collectively overshoot — this catches the next call after the overshoot becomes visible, not the overshoot itself. For a strict cap, use a provider-side limit or a transactional counter outside the gem.
|
|
350
185
|
|
|
351
|
-
|
|
186
|
+
Full behavior, error class, and preflight details: [`docs/budgets.md`](docs/budgets.md).
|
|
352
187
|
|
|
353
|
-
|
|
354
|
-
LlmCostTracker.track(
|
|
355
|
-
provider: "openai",
|
|
356
|
-
model: "gpt-4o",
|
|
357
|
-
input_tokens: 120,
|
|
358
|
-
output_tokens: 45,
|
|
359
|
-
enforce_budget: true
|
|
360
|
-
)
|
|
361
|
-
|
|
362
|
-
LlmCostTracker.track_stream(provider: "openai", model: "gpt-4o", enforce_budget: true) do |stream|
|
|
363
|
-
# raises BudgetExceededError before the block runs when over budget
|
|
364
|
-
end
|
|
188
|
+
## Querying
|
|
365
189
|
|
|
366
|
-
|
|
367
|
-
```
|
|
368
|
-
|
|
369
|
-
## Doctor
|
|
370
|
-
|
|
371
|
-
Run the setup check after install, deploy, or upgrades:
|
|
372
|
-
|
|
373
|
-
```bash
|
|
374
|
-
bin/rails llm_cost_tracker:doctor
|
|
375
|
-
```
|
|
376
|
-
|
|
377
|
-
It checks storage mode, ActiveRecord availability, table/column coverage, period rollups, pricing file loading, and whether calls are being recorded. Setup errors exit non-zero; warnings point at optional production hardening.
|
|
378
|
-
|
|
379
|
-
## Querying costs
|
|
380
|
-
|
|
381
|
-
These helpers and rake tasks require ActiveRecord storage.
|
|
382
|
-
|
|
383
|
-
```bash
|
|
384
|
-
bin/rails llm_cost_tracker:report
|
|
385
|
-
DAYS=7 bin/rails llm_cost_tracker:report
|
|
386
|
-
```
|
|
190
|
+
When you want to slice spend from a console, scheduled job, or your own admin page:
|
|
387
191
|
|
|
388
192
|
```ruby
|
|
389
|
-
LlmCostTracker::LlmApiCall.today.total_cost
|
|
390
193
|
LlmCostTracker::LlmApiCall.this_month.cost_by_model
|
|
391
|
-
LlmCostTracker::LlmApiCall.this_month.
|
|
392
|
-
|
|
393
|
-
# Group / sum by any tag
|
|
394
|
-
LlmCostTracker::LlmApiCall.this_month.group_by_tag("feature").sum(:total_cost)
|
|
395
|
-
LlmCostTracker::LlmApiCall.this_month.cost_by_tag("feature") # with "(untagged)" bucket
|
|
396
|
-
|
|
397
|
-
# Period grouping (SQL-side)
|
|
398
|
-
LlmCostTracker::LlmApiCall.this_month.group_by_period(:day).sum(:total_cost)
|
|
399
|
-
LlmCostTracker::LlmApiCall.group_by_period(:month).sum(:total_cost)
|
|
194
|
+
LlmCostTracker::LlmApiCall.this_month.cost_by_tag("feature")
|
|
400
195
|
LlmCostTracker::LlmApiCall.daily_costs(days: 7)
|
|
401
|
-
|
|
402
|
-
# Latency
|
|
403
|
-
LlmCostTracker::LlmApiCall.with_latency.average_latency_ms
|
|
404
|
-
LlmCostTracker::LlmApiCall.this_month.latency_by_model
|
|
405
|
-
|
|
406
|
-
# Tag filters
|
|
407
|
-
LlmCostTracker::LlmApiCall.by_tag("feature", "chat").this_month.total_cost
|
|
408
196
|
LlmCostTracker::LlmApiCall.by_tags(user_id: 42, feature: "chat").this_month.total_cost
|
|
409
|
-
|
|
410
|
-
# Range
|
|
411
|
-
LlmCostTracker::LlmApiCall.between(1.week.ago, Time.current).cost_by_model
|
|
412
197
|
```
|
|
413
198
|
|
|
-
-
-Retention is not enforced automatically. With ActiveRecord storage, use the rake task below if you need to delete older records in batches.
+A text report is also one rake task away:

 ```bash
-DAYS=
-```
-
-## Tag storage
-
-New installs use `jsonb` + GIN on PostgreSQL:
-
-```ruby
-t.jsonb :tags, null: false, default: {}
-add_index :llm_api_calls, :tags, using: :gin
-```
-
-On other adapters tags fall back to JSON in a text column. `by_tag` uses JSONB containment on PG, text matching elsewhere.
-
-## Upgrading existing installs
-
-Run the generators that match columns missing from older versions:
-
-```bash
-bin/rails generate llm_cost_tracker:add_period_totals # shared budget rollups
-bin/rails generate llm_cost_tracker:add_streaming # stream + usage_source
-bin/rails generate llm_cost_tracker:add_provider_response_id
-bin/rails generate llm_cost_tracker:add_usage_breakdown
-bin/rails generate llm_cost_tracker:upgrade_tags_to_jsonb # PG: text → jsonb + GIN
-bin/rails generate llm_cost_tracker:upgrade_cost_precision # widen cost columns
-bin/rails generate llm_cost_tracker:add_latency_ms
-bin/rails db:migrate
+DAYS=7 bin/rails llm_cost_tracker:report
 ```

-
+Full scope and helper reference: [`docs/querying.md`](docs/querying.md).

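The removed tag-storage note above says `by_tag` uses JSONB containment on PostgreSQL and text matching elsewhere. A plain-Ruby sketch of the two strategies, with hypothetical helper names (the gem performs both checks in SQL):

```ruby
require "json"

# PostgreSQL-style containment: the stored tags hash must contain the pair.
def jsonb_contains?(tags, key, value)
  tags[key] == value
end

# Text-column fallback: the serialized JSON must contain the pair's fragment.
# Cruder: it can only match the exact serialized form of the key/value pair.
def text_match?(tags_json, key, value)
  tags_json.include?("\"#{key}\":\"#{value}\"")
end

tags = { "feature" => "chat", "user_id" => "42" }
jsonb_contains?(tags, "feature", "chat")            # => true
text_match?(JSON.generate(tags), "feature", "chat") # => true
```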
-##
-
-Optional Rails Engine. Plain ERB, no JavaScript framework, no asset pipeline required. Requires Rails 7.1+; the core middleware works without Rails. The dashboard reads `llm_api_calls`, so use `storage_backend = :active_record` for apps that mount it.
+## Dashboard

-
+Mount the engine wherever you want — it's plain ERB, no JavaScript bundle, no asset pipeline gymnastics:

 ```ruby
-# config/application.rb (or an initializer)
-require "llm_cost_tracker/engine"
-
 # config/routes.rb
 mount LlmCostTracker::Engine => "/llm-costs"
 ```

-
-
-- `/llm-costs` — overview: spend with delta vs previous period, budget projection, spend anomaly banner, daily trend vs previous slice, provider rollup, top models
-- `/llm-costs/models` — by provider + model; sortable by spend, volume, avg cost, latency
-- `/llm-costs/calls` — filterable + paginated; sort modes for recency, spend, input tokens, output tokens, latency, and unknown pricing; CSV export
-- `/llm-costs/calls/:id` — details with token mix and cost mix breakdowns
-- `/llm-costs/tags` — tag keys present in the dataset (PG/SQLite native; MySQL 8.0+ via JSON_TABLE)
-- `/llm-costs/tags/:key` — breakdown by values of a given tag key
-- `/llm-costs/data_quality` — unknown pricing, untagged calls, missing latency, incomplete stream usage, and missing provider response IDs
-
-No built-in auth is included. Tags carry whatever your app puts in them, so protect the mount point with your application's authentication.
-
-### Basic auth
-
-```ruby
-authenticated = ->(req) {
-  ActionController::HttpAuthentication::Basic.authenticate(req) do |name, password|
-    ActiveSupport::SecurityUtils.secure_compare(name, ENV.fetch("LLM_DASHBOARD_USER")) &
-      ActiveSupport::SecurityUtils.secure_compare(password, ENV.fetch("LLM_DASHBOARD_PASSWORD"))
-  end
-}
-constraints(authenticated) { mount LlmCostTracker::Engine => "/llm-costs" }
-```
-
-### Devise
-
-```ruby
-authenticate :user, ->(user) { user.admin? } do
-  mount LlmCostTracker::Engine => "/llm-costs"
-end
-```
-
-## ActiveSupport::Notifications
-
-```ruby
-ActiveSupport::Notifications.subscribe("llm_request.llm_cost_tracker") do |*, payload|
-  # payload =>
-  # {
-  #   provider: "openai", model: "gpt-4o",
-  #   input_tokens: 150, cache_read_input_tokens: 0, cache_write_input_tokens: 0,
-  #   hidden_output_tokens: 0, output_tokens: 42, total_tokens: 192, latency_ms: 248,
-  #   cost: {
-  #     input_cost: 0.000375, cache_read_input_cost: 0.0,
-  #     cache_write_input_cost: 0.0, output_cost: 0.00042,
-  #     total_cost: 0.000795, currency: "USD"
-  #   },
-  #   pricing_mode: "batch",
-  #   stream: false, usage_source: "response", provider_response_id: "chatcmpl_123",
-  #   tags: { feature: "chat", user_id: 42 },
-  #   tracked_at: 2026-04-16 14:30:00 UTC
-  # }
-end
-```
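The example payload in the removed notifications section above is internally consistent: its cost fields follow from its token counts under hypothetical per-million rates of $2.50 input and $10.00 output (illustration only, real rates come from the gem's pricing data):

```ruby
# Reproducing the example payload's cost fields from its token counts.
# The rates here are hypothetical stand-ins, not the gem's pricing table.
INPUT_RATE_PER_M  = 2.50
OUTPUT_RATE_PER_M = 10.00

input_tokens  = 150
output_tokens = 42

input_cost  = input_tokens * INPUT_RATE_PER_M / 1_000_000.0   # => 0.000375
output_cost = output_tokens * OUTPUT_RATE_PER_M / 1_000_000.0 # => 0.00042
total_cost  = input_cost + output_cost

total_cost.round(6) # => 0.000795, matching the payload's total_cost
```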
-
-## Custom storage backend
-
-```ruby
-config.storage_backend = :custom
-config.custom_storage = ->(event) {
-  InfluxDB.write("llm_costs",
-    values: { cost: event.cost&.total_cost, tokens: event.total_tokens, latency_ms: event.latency_ms },
-    tags: { provider: event.provider, model: event.model }
-  )
-}
-```
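The removed `custom_storage` example above targets InfluxDB, but the same hook can feed any sink. A self-contained sketch with an in-memory aggregator standing in for the time-series client; the `Event` struct is a stand-in mirroring the fields the lambda reads, not the gem's event class:

```ruby
# In-memory stand-in for a custom storage sink: same lambda shape,
# different backend. Event is a hypothetical stand-in struct.
Event = Struct.new(:provider, :model, :total_tokens, :latency_ms, :total_cost,
                   keyword_init: true)

totals = Hash.new { |h, k| h[k] = { cost: 0.0, tokens: 0 } }

custom_storage = lambda do |event|
  bucket = totals["#{event.provider}/#{event.model}"]
  bucket[:cost]   += event.total_cost
  bucket[:tokens] += event.total_tokens
end

custom_storage.call(Event.new(provider: "openai", model: "gpt-4o",
                              total_tokens: 192, latency_ms: 248, total_cost: 0.000795))
custom_storage.call(Event.new(provider: "openai", model: "gpt-4o",
                              total_tokens: 100, latency_ms: 120, total_cost: 0.0005))

totals["openai/gpt-4o"][:tokens] # => 292
```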
-
-## OpenAI-compatible providers
-
-```ruby
-config.openai_compatible_providers["gateway.example.com"] = "internal_gateway"
-```
-
-Configured hosts are parsed using the OpenAI-compatible usage shape (`prompt_tokens` / `completion_tokens` / `total_tokens`, `input_tokens` / `output_tokens`, and optional cached-input details). This covers OpenRouter, DeepSeek, and private gateways exposing Chat Completions / Responses / Completions / Embeddings.
+Pages: overview (spend trend, budget status, anomaly banner), models, calls (filterable, paginated, CSV export), tags, data quality. Reads `llm_api_calls`, so use `:active_record` storage if you want to mount it.

-
-
-For providers with a non-OpenAI usage shape:
-
-```ruby
-class AcmeParser < LlmCostTracker::Parsers::Base
-  HOSTS = %w[api.acme-llm.example].freeze
-  TRACKED_PATHS = %w[/v1/generate].freeze
-
-  def provider_names
-    %w[acme]
-  end
-
-  def match?(url)
-    match_uri?(url, hosts: HOSTS, exact_paths: TRACKED_PATHS)
-  end
-
-  def parse(_request_url, _request_body, response_status, response_body)
-    return nil unless response_status == 200
-
-    payload = safe_json_parse(response_body)
-    usage = payload.dig("usage")
-    return nil unless usage
-
-    LlmCostTracker::ParsedUsage.build(
-      provider: "acme",
-      model: payload["model"],
-      input_tokens: usage["input"] || 0,
-      output_tokens: usage["output"] || 0
-    )
-  end
-end
-
-LlmCostTracker::Parsers::Registry.register(AcmeParser)
-```
+Auth is your job. Examples for basic auth and Devise: [`docs/dashboard.md`](docs/dashboard.md).

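The parse step of the removed `AcmeParser` above can be sketched without the gem's base-class helpers. `ParsedUsage` here is a stand-in `Struct` and `parse_acme` a hypothetical free function; the real parser inherits `safe_json_parse` and `ParsedUsage.build` from the gem:

```ruby
require "json"

# Dependency-free sketch of pulling token counts out of a non-OpenAI
# usage shape. Stand-in names; the gem's parser classes differ.
ParsedUsage = Struct.new(:provider, :model, :input_tokens, :output_tokens,
                         keyword_init: true)

def parse_acme(response_status, response_body)
  return nil unless response_status == 200

  payload = JSON.parse(response_body) rescue nil
  usage = payload && payload["usage"]
  return nil unless usage

  ParsedUsage.new(
    provider: "acme",
    model: payload["model"],
    input_tokens: usage["input"] || 0,
    output_tokens: usage["output"] || 0
  )
end

body = '{"model":"acme-1","usage":{"input":12,"output":34}}'
parse_acme(200, body) # a ParsedUsage with input_tokens=12, output_tokens=34
```

Non-200 responses and unparseable bodies return `nil`, so the middleware can skip them instead of recording garbage.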
 ## Supported providers

-| Provider | Auto-detected |
+| Provider | Auto-detected | Coverage |
 |---|:---:|---|
-| OpenAI | Yes | GPT-5.5/5.4/5.2/5.1/5
-| OpenRouter | Yes | OpenAI-compatible usage; provider-prefixed OpenAI model IDs normalized when possible |
-| DeepSeek | Yes | OpenAI-compatible usage; add `pricing_overrides` for DeepSeek models |
-| OpenAI-compatible hosts | Config | Configure `openai_compatible_providers` |
+| OpenAI | Yes | GPT-5.5/5.4/5.2/5.1/5 + pro/mini/nano variants, GPT-4.1, GPT-4o, o1/o3/o4-mini |
 | Anthropic | Yes | Claude Opus 4.7/4.6/4.5/4.1/4, Sonnet 4.6/4.5/4, Haiku 4.5 |
 | Google Gemini | Yes | Gemini 2.5 Pro/Flash/Flash-Lite, 2.0 Flash/Flash-Lite |
-
+| OpenRouter | Yes | OpenAI-compatible usage; provider-prefixed model IDs are normalized |
+| DeepSeek | Yes | OpenAI-compatible usage; add `pricing_overrides` for DeepSeek-specific rates |
+| Other OpenAI-compatible hosts | Configurable | Register the host via `config.openai_compatible_providers` |
+| Anything else | Configurable | Custom parser — see [`docs/extending.md`](docs/extending.md) |
+
+RubyLLM chat, embedding, and transcription calls are captured through RubyLLM's provider layer when `config.instrument :ruby_llm` is enabled.

-Endpoints: OpenAI Chat Completions / Responses / Completions / Embeddings
+Endpoints covered end-to-end: OpenAI Chat Completions / Responses / Completions / Embeddings, Anthropic Messages, Gemini `generateContent` and `streamGenerateContent`, plus their OpenAI-compatible equivalents. Streaming is captured for Faraday paths whenever the provider emits final-usage events.

-##
+## Privacy

-
+By design, **no prompt or response content is ever stored.** Per call, the ledger holds: provider, model, token counts, cost, latency, tags, response ID, timestamp. That's it. No request bodies, no headers, no completions. Warning logs strip query strings before logging URLs.

-
-- No prompt or response bodies stored.
-- Faraday responses not modified.
-- Request headers are never stored. Warning logs strip query strings from URLs before logging.
-- Storage failures non-fatal by default (`storage_error_behavior = :warn`).
-- Budget and unknown-pricing errors are raised only when you opt in.
+Tags carry whatever your app passes — they are application-controlled input, treat them accordingly. Use `user_id`, not the user's email; use a feature key, not the input prompt.

-##
+## Documentation

-
+Deeper guides live in `docs/`. Reference pages are being filled out as content
+moves out of this README; the inline sections above remain canonical where a page
+is still brief.

--
--
--
--
--
--
--
+- [Configuration reference](docs/configuration.md)
+- [Pricing & price refresh](docs/pricing.md)
+- [Budgets & guardrails](docs/budgets.md)
+- [Querying & reports](docs/querying.md)
+- [Dashboard mounting](docs/dashboard.md)
+- [Streaming capture](docs/streaming.md)
+- [Extending](docs/extending.md)
+- [Production operations](docs/operations.md)
+- [Upgrading](docs/upgrading.md)
+- [Cookbook — per-client recipes](docs/cookbook.md)
+- [Architecture & design rules](docs/architecture.md)

 ## Known limitations

-- `:block_requests` is
-- Official SDK integrations
-- Streaming capture relies on the provider emitting a final-usage event
-- `provider_response_id` is stored only when the provider exposes a stable
-- Cache write TTL variants (1h vs 5min writes) are not modeled separately.
+- `:block_requests` is best-effort under concurrency, not a transactional cap.
+- Official OpenAI and Anthropic SDK integrations cover non-streaming calls; streaming via those SDKs falls back to Faraday middleware or `track_stream`.
+- Streaming usage capture relies on the provider emitting a final-usage event. Missing events are stored with `usage_source: "unknown"` so they appear on the data-quality page rather than vanishing.
+- `provider_response_id` is stored only when the provider exposes a stable ID. Gemini is best-effort and varies by endpoint.
+- Cache write TTL variants on Anthropic (1h vs 5min writes) are not modeled separately yet.

 ## Development

-Architecture rules for future changes live in [`docs/architecture.md`](docs/architecture.md).
-
 ```bash
 bundle install
-bin/check
+bin/check # rubocop + rspec
 ```

+Architecture rules and conventions for contributions live in [`AGENTS.md`](AGENTS.md) and [`docs/architecture.md`](docs/architecture.md).
+
 ## License

-MIT
+MIT — see [LICENSE.txt](LICENSE.txt).
|