llm_cost_tracker 0.5.1 → 0.5.3
- checksums.yaml +4 -4
- data/CHANGELOG.md +43 -0
- data/README.md +18 -9
- data/app/controllers/llm_cost_tracker/calls_controller.rb +2 -1
- data/app/controllers/llm_cost_tracker/dashboard_controller.rb +3 -15
- data/app/controllers/llm_cost_tracker/tags_controller.rb +7 -6
- data/app/helpers/llm_cost_tracker/application_helper.rb +21 -6
- data/app/helpers/llm_cost_tracker/dashboard_filter_options_helper.rb +3 -1
- data/app/services/llm_cost_tracker/dashboard/date_range.rb +42 -0
- data/app/services/llm_cost_tracker/dashboard/filter.rb +6 -8
- data/app/services/llm_cost_tracker/dashboard/spend_anomaly.rb +6 -5
- data/app/services/llm_cost_tracker/dashboard/tag_breakdown.rb +74 -18
- data/app/services/llm_cost_tracker/dashboard/tag_key_explorer.rb +15 -4
- data/app/views/llm_cost_tracker/shared/_tag_chips.html.erb +1 -1
- data/app/views/llm_cost_tracker/tags/show.html.erb +4 -0
- data/docs/architecture.md +28 -0
- data/docs/budgets.md +45 -0
- data/docs/configuration.md +65 -0
- data/docs/cookbook.md +185 -0
- data/docs/dashboard-overview.png +0 -0
- data/docs/dashboard.md +38 -0
- data/docs/extending.md +32 -0
- data/docs/operations.md +44 -0
- data/docs/pricing.md +94 -0
- data/docs/querying.md +36 -0
- data/docs/streaming.md +70 -0
- data/docs/technical/README.md +10 -0
- data/docs/technical/data-flow.md +67 -0
- data/docs/technical/extension-points.md +111 -0
- data/docs/technical/module-map.md +197 -0
- data/docs/technical/operational-notes.md +77 -0
- data/docs/upgrading.md +46 -0
- data/lib/llm_cost_tracker/capture_verifier.rb +71 -0
- data/lib/llm_cost_tracker/configuration/instrumentation.rb +1 -1
- data/lib/llm_cost_tracker/configuration/storage_backend.rb +26 -0
- data/lib/llm_cost_tracker/configuration.rb +24 -17
- data/lib/llm_cost_tracker/doctor/capture_check.rb +39 -0
- data/lib/llm_cost_tracker/doctor.rb +6 -1
- data/lib/llm_cost_tracker/generators/llm_cost_tracker/install_generator.rb +1 -0
- data/lib/llm_cost_tracker/generators/llm_cost_tracker/templates/initializer.rb.erb +7 -1
- data/lib/llm_cost_tracker/integrations/anthropic.rb +51 -3
- data/lib/llm_cost_tracker/integrations/base.rb +77 -6
- data/lib/llm_cost_tracker/integrations/object_reader.rb +1 -1
- data/lib/llm_cost_tracker/integrations/openai.rb +78 -5
- data/lib/llm_cost_tracker/integrations/registry.rb +36 -4
- data/lib/llm_cost_tracker/integrations/ruby_llm.rb +171 -0
- data/lib/llm_cost_tracker/integrations/stream_tracker.rb +166 -0
- data/lib/llm_cost_tracker/llm_api_call.rb +2 -77
- data/lib/llm_cost_tracker/llm_api_call_metrics.rb +63 -0
- data/lib/llm_cost_tracker/middleware/faraday.rb +8 -4
- data/lib/llm_cost_tracker/parsers/gemini.rb +8 -1
- data/lib/llm_cost_tracker/parsers/openai_usage.rb +12 -3
- data/lib/llm_cost_tracker/price_registry.rb +3 -0
- data/lib/llm_cost_tracker/price_sync/fetcher.rb +41 -12
- data/lib/llm_cost_tracker/price_sync/registry_loader.rb +6 -0
- data/lib/llm_cost_tracker/pricing/effective_prices.rb +75 -0
- data/lib/llm_cost_tracker/pricing/explainer.rb +77 -0
- data/lib/llm_cost_tracker/pricing/lookup.rb +110 -0
- data/lib/llm_cost_tracker/pricing.rb +25 -108
- data/lib/llm_cost_tracker/report.rb +8 -1
- data/lib/llm_cost_tracker/report_data.rb +25 -9
- data/lib/llm_cost_tracker/retention.rb +33 -16
- data/lib/llm_cost_tracker/storage/active_record_backend.rb +115 -0
- data/lib/llm_cost_tracker/storage/active_record_rollups.rb +42 -0
- data/lib/llm_cost_tracker/storage/active_record_store.rb +26 -0
- data/lib/llm_cost_tracker/storage/custom_backend.rb +32 -0
- data/lib/llm_cost_tracker/storage/dispatcher.rb +11 -34
- data/lib/llm_cost_tracker/storage/log_backend.rb +38 -0
- data/lib/llm_cost_tracker/storage/registry.rb +63 -0
- data/lib/llm_cost_tracker/stream_capture.rb +7 -0
- data/lib/llm_cost_tracker/stream_collector.rb +25 -1
- data/lib/llm_cost_tracker/tag_sanitizer.rb +81 -0
- data/lib/llm_cost_tracker/tag_sql.rb +34 -0
- data/lib/llm_cost_tracker/tracker.rb +6 -2
- data/lib/llm_cost_tracker/version.rb +1 -1
- data/lib/llm_cost_tracker.rb +4 -0
- data/lib/tasks/llm_cost_tracker.rake +49 -0
- metadata +40 -6
data/docs/configuration.md
ADDED
@@ -0,0 +1,65 @@

# Configuration

Configuration is the contract between the host app and the ledger: where events go, which integrations are enabled, how attribution is attached, and how the app reacts when storage, pricing, or budgets need attention.

The full option reference is moving here from the README. Until that migration is complete, the README anchors below remain canonical.

## Canonical Sources

Until this page is expanded, use:

- [Quickstart](../README.md#quickstart)
- [Capturing calls](../README.md#capturing-calls)
- [Tags](../README.md#tags-who-burned-this-money)
- [Pricing](../README.md#pricing)
- [Budgets](../README.md#budgets)

## Scope

This page is scoped to:

- `storage_backend`: `:log`, `:active_record`, and `:custom`
- `default_tags`: static tags and per-request callable tags
- `instrument`: RubyLLM and official SDK integrations
- `prices_file` and `pricing_overrides`
- `monthly_budget`, `daily_budget`, and `per_call_budget`
- `budget_exceeded_behavior`
- `storage_error_behavior`
- `unknown_pricing_behavior`
- `openai_compatible_providers`
- `report_tag_breakdowns`
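The budget ceilings and failure-behavior options in the scope list can sit in one `configure` block. A sketch — the option names are the ones listed above; the dollar amounts and `:warn` symbols are illustrative assumptions, not documented defaults:

```ruby
LlmCostTracker.configure do |config|
  # Budget ceilings (illustrative amounts).
  config.monthly_budget  = 500.0
  config.daily_budget    = 25.0
  config.per_call_budget = 1.0

  # Failure reactions (:warn is an assumed value; check the generated
  # initializer for the accepted options).
  config.budget_exceeded_behavior = :warn
  config.storage_error_behavior   = :warn
  config.unknown_pricing_behavior = :warn
end
```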
## Minimal Production Config

```ruby
LlmCostTracker.configure do |config|
  config.storage_backend = :active_record
  config.default_tags = -> { { environment: Rails.env } }
  config.prices_file = Rails.root.join("config/llm_cost_tracker_prices.yml")
  config.instrument :openai
end
```

Keep configuration at boot. Mutable shared settings are frozen after `configure` returns, so request-time code cannot silently change global tracking behavior.

Enabled SDK integrations are fail-fast. The client gem must be loaded, meet the minimum supported version, and expose the expected classes and methods before LLM Cost Tracker installs its wrapper.

## Capture Verification

After boot, run:

```bash
bin/rails llm_cost_tracker:verify_capture
```

For ActiveRecord storage, the task records a synthetic manual event inside a rollback and checks that notifications and persistence both work. For log and custom storage, it reports the configured capture path without invoking external sinks.
data/docs/cookbook.md
ADDED
@@ -0,0 +1,185 @@

# Cookbook

Short integration recipes for common Ruby clients. Prefer SDK integrations or middleware. Use `track` and `track_stream` only as fallback helpers for unsupported clients.

| Client | Best path | Why |
|---|---|---|
| RubyLLM | `config.instrument :ruby_llm` | The integration wraps RubyLLM's provider layer without adding a third-party instrumentation gem. |
| Official `openai` gem | `config.instrument :openai` | The integration wraps SDK resource methods without changing call sites. |
| Official `anthropic` gem | `config.instrument :anthropic` | The integration records returned message usage without changing call sites. |
| `ruby-openai` | Faraday middleware | The client is built on Faraday and accepts middleware via the constructor block. |
| OpenAI-compatible proxy | Faraday middleware | Use `ruby-openai` or a direct Faraday client against the proxy host. |
| Custom Faraday client | Faraday middleware | The middleware can parse known provider responses automatically. |
| Other clients | Adapter first, fallback helpers second | Add a stable integration instead of scattering per-call ledger code. |
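For the last row, the fallback helper records totals the app already has in hand. A sketch using the `LlmCostTracker.track` keywords shown elsewhere in these docs; `my_client` and its response accessors are hypothetical stand-ins for an unsupported client:

```ruby
result = my_client.complete("Hello") # hypothetical unsupported client

LlmCostTracker.track(
  provider: "acme",
  model: "acme-large",
  input_tokens: result.input_tokens,   # hypothetical accessor
  output_tokens: result.output_tokens, # hypothetical accessor
  feature: "chat"
)
```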
## RubyLLM

Enable the integration once, then keep normal RubyLLM calls unchanged.

```ruby
LlmCostTracker.configure do |config|
  config.instrument :ruby_llm
end

LlmCostTracker.with_tags(feature: "support_chat") do
  RubyLLM.chat.ask("Hello")
  RubyLLM.embed("text to embed")
end
```

The RubyLLM integration supports `ruby_llm >= 1.14.1` and checks RubyLLM's provider contract at boot. Chat, embedding, and transcription calls are captured. Image generation, moderation, and tool execution are not recorded as separate ledger rows.

## Official OpenAI SDK

Enable the integration once, then keep normal `openai` gem calls unchanged.

```ruby
LlmCostTracker.configure do |config|
  config.instrument :openai
end

client = OpenAI::Client.new(api_key: ENV["OPENAI_API_KEY"])

client.responses.create(model: "gpt-4o", input: "Hello")
client.chat.completions.create(
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello" }]
)

client.responses.stream(model: "gpt-4o", input: "Hello").each do |event|
  puts event.type
end

client.responses.stream_raw(model: "gpt-4o", input: "Hello").each do |event|
  puts event.type
end

client.chat.completions.stream_raw(
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello" }],
  stream_options: { include_usage: true }
).each do |event|
  puts event
end
```

The OpenAI SDK integration supports `openai >= 0.59.0`. Streaming calls are recorded after the returned stream is consumed. Chat Completions streams need `stream_options: { include_usage: true }` for final usage.

## Official Anthropic SDK

Enable the integration once, then keep normal `anthropic` gem calls unchanged.

```ruby
LlmCostTracker.configure do |config|
  config.instrument :anthropic
end

client = Anthropic::Client.new(api_key: ENV["ANTHROPIC_API_KEY"])

client.messages.create(
  max_tokens: 1024,
  model: "claude-sonnet-4-5-20250929",
  messages: [{ role: "user", content: "Hello" }]
)

client.messages.stream(
  max_tokens: 1024,
  model: "claude-sonnet-4-5-20250929",
  messages: [{ role: "user", content: "Hello" }]
).each do |event|
  puts event.type
end

client.messages.stream_raw(
  max_tokens: 1024,
  model: "claude-sonnet-4-5-20250929",
  messages: [{ role: "user", content: "Hello" }]
).each do |event|
  puts event.type
end
```

The Anthropic SDK integration supports `anthropic >= 1.36.0`. Streaming calls are recorded after the returned stream is consumed.

## ruby-openai

`ruby-openai` is a community client that occupies the same `OpenAI::Client` constant as the official gem; only one of the two can be loaded. `config.instrument :openai` is for the official gem. For `ruby-openai`, attach the Faraday middleware via the constructor block:

```ruby
client = OpenAI::Client.new(access_token: ENV["OPENAI_API_KEY"]) do |f|
  f.use :llm_cost_tracker, tags: { feature: "chat" }
end

client.chat(
  parameters: {
    model: "gpt-4o",
    messages: [{ role: "user", content: "Hello" }],
    stream: proc { |chunk, _bytesize| puts chunk.dig("choices", 0, "delta", "content") },
    stream_options: { include_usage: true }
  }
)
```

Use the constructor block on every client you build, or wrap client creation in your own factory.

## Azure OpenAI

Azure's v1 API works with OpenAI-compatible HTTP shapes, but pricing and deployment names are yours. Register the Azure host, use the Faraday middleware path, and keep Azure-specific prices in `prices_file` or `pricing_overrides`.

```ruby
LlmCostTracker.configure do |config|
  config.openai_compatible_providers["my-resource.openai.azure.com"] = "azure_openai"
end

conn = Faraday.new(url: "https://my-resource.openai.azure.com") do |f|
  f.use :llm_cost_tracker, tags: { feature: "chat" }
  f.request :json
  f.response :json
  f.adapter Faraday.default_adapter
end

conn.post(
  "/openai/v1/responses",
  { model: "gpt-4o-prod", input: "Hello" },
  { "api-key" => ENV.fetch("AZURE_OPENAI_API_KEY") }
)
```

## Gemini API

Google's official Gemini SDKs do not include Ruby. Use a Faraday client against the REST API so the Gemini parser can capture usage automatically.

```ruby
conn = Faraday.new(url: "https://generativelanguage.googleapis.com") do |f|
  f.use :llm_cost_tracker, tags: { feature: "chat" }
  f.request :json
  f.response :json
  f.adapter Faraday.default_adapter
end

conn.post(
  "/v1beta/models/gemini-2.5-flash:generateContent?key=#{ENV.fetch("GOOGLE_API_KEY")}",
  { contents: [{ role: "user", parts: [{ text: "Hello" }] }] }
)
```

## LiteLLM proxy

LiteLLM Proxy speaks an OpenAI-compatible HTTP shape, so register the proxy host once and keep using the normal middleware path.

```ruby
LlmCostTracker.configure do |config|
  config.openai_compatible_providers["proxy.internal.example"] = "litellm"
end

client = OpenAI::Client.new(
  access_token: ENV["LITELLM_API_KEY"],
  uri_base: "https://proxy.internal.example"
) do |f|
  f.use :llm_cost_tracker, tags: { gateway: "litellm" }
end

client.chat(parameters: { model: "openai/gpt-5-mini", messages: [{ role: "user", content: "Hello" }] })
```

If your proxy exposes custom model IDs or discounts, add them in `prices_file` or `pricing_overrides`.
data/docs/dashboard-overview.png
Binary file
data/docs/dashboard.md
ADDED
@@ -0,0 +1,38 @@

# Dashboard

The dashboard is a Rails Engine for humans reviewing spend, attribution, and data quality. It is server-rendered ERB, has no JavaScript bundle, and reads from the host app's `llm_api_calls` table.

The detailed dashboard guide is moving here from the README: mounting, route constraints, authentication examples, page map, and operational notes.

## Canonical Sources

Until this page is expanded, use:

- [Dashboard](../README.md#dashboard)
- [Privacy](../README.md#privacy)
- [Operations](operations.md)

## Mounting

```ruby
mount LlmCostTracker::Engine => "/llm-costs"
```

Use `storage_backend = :active_record` for apps that mount the dashboard.

## Pages

- Overview: spend trend, budget status, anomaly banner, provider rollup, top models
- Models: spend and usage by provider and model
- Calls: filterable call ledger with CSV export
- Tags: tag keys and tag value breakdowns
- Data Quality: unknown pricing, untagged calls, missing latency, incomplete streams

## Authentication

The gem does not ship dashboard auth. Mount the engine behind the host app's existing authentication layer: Devise, basic auth, Cloudflare Access, or your own constraints.
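With Devise, for example, the mount can sit inside an `authenticate` routes block. A sketch — the `:admin` scope is an assumption about the host app's Devise setup:

```ruby
# config/routes.rb
authenticate :admin do
  mount LlmCostTracker::Engine => "/llm-costs"
end
```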
data/docs/extending.md
ADDED
@@ -0,0 +1,32 @@

# Extending LLM Cost Tracker

Extensions belong at clear boundaries: parsers for response shapes, integrations for SDK hooks, pricing files for rates, and custom storage for apps that own persistence themselves.

The practical extension guide is moving here from the README. The lower-level contracts already live in the technical extension reference.

## Canonical Sources

Until this page is expanded, use:

- [Capturing calls](../README.md#capturing-calls)
- [Pricing](pricing.md)
- [Technical extension points](technical/extension-points.md)

## Extension Points

- Custom parser: translate a provider response into `ParsedUsage`.
- OpenAI-compatible host: register the host-to-provider mapping.
- Custom storage: receive the canonical `Event` and write it elsewhere.
- Notifications subscriber: observe `llm_request.llm_cost_tracker`.
- Local price file: model gateway IDs, contract rates, or unsupported models.
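The notifications hook is plain `ActiveSupport::Notifications`, so a subscriber needs no gem-specific API. A sketch — the payload keys shown are assumptions; inspect a real event before relying on them:

```ruby
ActiveSupport::Notifications.subscribe("llm_request.llm_cost_tracker") do |_name, _start, _finish, _id, payload|
  # Payload keys (:provider, :model, :cost) are assumed, not documented.
  Rails.logger.info("llm call #{payload[:provider]}/#{payload[:model]} cost=#{payload[:cost]}")
end
```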
## Parser Boundary

A parser matches request URLs, translates provider response shapes into `ParsedUsage`, and returns `nil` when the response is outside its contract.

Do provider-specific translation at this boundary. Keep `Tracker`, storage, dashboard, and pricing in canonical ledger terms.
data/docs/operations.md
ADDED
@@ -0,0 +1,44 @@

# Operations

Production use is mostly about choosing the right storage backend, keeping the database healthy, and understanding where the gem is intentionally best-effort.

The operational guide is moving here from the README: retention, tag storage, thread safety, connection pools, and deployment notes.

## Canonical Sources

Until this page is expanded, use:

- [Privacy](../README.md#privacy)
- [Known limitations](../README.md#known-limitations)
- [Technical operational notes](technical/operational-notes.md)

## Production Defaults

- Use `storage_backend = :active_record` for the shared ledger, dashboard, and cross-process budget guardrails.
- Size the ActiveRecord connection pool for your app plus ledger writes.
- Keep `storage_error_behavior = :warn` unless losing the LLM response is better than losing the ledger event.
- Treat `:block_requests` as a guardrail, not a hard quota.
- Keep `default_tags` callables fast and thread-safe.

## Retention

Retention is explicit. Use the prune task when the ledger should not grow forever:

```bash
DAYS=90 bin/rails llm_cost_tracker:prune
```

When ActiveRecord period rollups are installed, pruning decrements the affected daily and monthly buckets in the same batch transaction as the ledger delete.

## Data Shape

Tags are JSONB with a GIN index on PostgreSQL and JSON text elsewhere. The dashboard and query helpers work across supported adapters, but PostgreSQL is the strongest path for large tag-heavy ledgers.
data/docs/pricing.md
ADDED
@@ -0,0 +1,94 @@

# Pricing and Price Refresh

LLM Cost Tracker prices calls locally from recorded usage and a versioned price registry. Providers usually return token counts, not a stable per-request price, so the gem stores the calculated cost with each ledger row.

The full pricing reference is moving here from the README: registry shape, refresh tasks, precedence, provider-qualified keys, and mode-specific rates.

## Canonical Sources

Until this page is expanded, use:

- [Pricing](../README.md#pricing)
- [Supported providers](../README.md#supported-providers)
- [Known limitations](../README.md#known-limitations)

## Registry Rules

- Built-in prices live in `lib/llm_cost_tracker/prices.json`.
- Local snapshots live wherever `config.prices_file` points.
- Precedence is `pricing_overrides`, then `prices_file`, then bundled prices.
- Provider-qualified keys like `openai/gpt-4o-mini` win over model-only keys.
- Historical rows keep the cost calculated when the call was recorded.
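`pricing_overrides` sits at the top of that precedence chain, which makes it the quickest place to patch one model's rates. A sketch — the nested hash shape, the rate values, and their units are assumptions; confirm how a key actually resolves with `prices:explain`:

```ruby
LlmCostTracker.configure do |config|
  # A provider-qualified key wins over a model-only key (per the rules above).
  # Hash shape and rate units are assumptions, not the documented schema.
  config.pricing_overrides = {
    "openai/gpt-4o-mini" => { "input" => 0.15, "output" => 0.60 }
  }
end
```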
## Refresh Commands

```bash
bin/rails generate llm_cost_tracker:prices
bin/rails llm_cost_tracker:prices:refresh
bin/rails llm_cost_tracker:prices:check
PROVIDER=openai MODEL=gpt-4o bin/rails llm_cost_tracker:prices:explain
```

The refresh task reads the maintained LLM Cost Tracker snapshot and writes to `ENV["OUTPUT"]`, then `config.prices_file`, then `config/llm_cost_tracker_prices.yml`.

## Price Fields

Base fields:

- `input`
- `output`
- `cache_read_input`
- `cache_write_input`

Mode-prefixed fields use the same base terms:

- `batch_input`
- `batch_output`
- `priority_input`
- `batch_cache_read_input`

## Pricing Modes

Pass `pricing_mode: :batch` when usage came from a provider batch job or another discounted mode:

```ruby
LlmCostTracker.track(
  provider: "openai",
  model: "gpt-4o",
  input_tokens: 1_000_000,
  output_tokens: 250_000,
  pricing_mode: :batch,
  feature: "offline_eval"
)
```

The calculator uses `batch_input`, `batch_output`, and other matching mode-prefixed fields when present, then falls back to the base field for missing mode-specific rates.

## Price Explain

Use `prices:explain` when Data Quality shows unknown pricing or a local override does not behave as expected:

```bash
PROVIDER=openai MODEL=gpt-4o PRICING_MODE=batch bin/rails llm_cost_tracker:prices:explain
```

Optional token env vars let the command check the exact buckets that a call used:

```bash
PROVIDER=custom MODEL=gateway-model INPUT_TOKENS=1000 OUTPUT_TOKENS=200 CACHE_READ_INPUT_TOKENS=50 bin/rails llm_cost_tracker:prices:explain
```

The command reports the matched source, matched key, match strategy, effective rates, and any missing rate needed to price the event.

Provider-specific pricing pages belong in scrapers and snapshots. Runtime pricing should stay in canonical billing terms.
data/docs/querying.md
ADDED
@@ -0,0 +1,36 @@

# Querying and Reports

Once calls are in `llm_api_calls`, the host app owns the data. Query it from a console, a scheduled job, your admin UI, or the mounted dashboard.

The full querying reference is moving here from the README: ActiveRecord scopes, reporting helpers, tag breakdowns, and SQL-side grouping patterns.

## Canonical Sources

Until this page is expanded, use:

- [Querying](../README.md#querying)
- [Dashboard](dashboard.md)
- [Operations](operations.md)

## Common Queries

```ruby
LlmCostTracker::LlmApiCall.today.total_cost
LlmCostTracker::LlmApiCall.this_month.cost_by_model
LlmCostTracker::LlmApiCall.this_month.cost_by_provider
LlmCostTracker::LlmApiCall.this_month.cost_by_tag("feature")
LlmCostTracker::LlmApiCall.by_tags(user_id: 42, feature: "chat").this_month.total_cost
LlmCostTracker::LlmApiCall.daily_costs(days: 7)
```
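The scopes above compose, so a periodic summary needs no raw SQL. A sketch — the job class and log lines are illustrative; the scopes are the documented ones:

```ruby
class LlmSpendSummaryJob < ApplicationJob
  def perform
    calls = LlmCostTracker::LlmApiCall.this_month

    Rails.logger.info("LLM spend MTD: #{calls.total_cost}")
    Rails.logger.info("By provider: #{calls.cost_by_provider.inspect}")
    Rails.logger.info("By feature: #{calls.cost_by_tag("feature").inspect}")
  end
end
```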
## Report Task

```bash
bin/rails llm_cost_tracker:report
DAYS=7 bin/rails llm_cost_tracker:report
```

This page is scoped to cost scopes, tag grouping, period grouping, latency helpers, unknown-pricing queries, and report tag breakdowns.
data/docs/streaming.md
ADDED
@@ -0,0 +1,70 @@

# Streaming Capture

Streaming calls should appear in the ledger instead of disappearing into a live callback. LLM Cost Tracker records them when the provider emits final usage or when the app supplies explicit totals.

The full streaming reference is moving here from the README: Faraday streaming, `track_stream`, provider response IDs, final usage events, and data-quality states.

## Canonical Sources

Until this page is expanded, use:

- [Capturing calls](../README.md#capturing-calls)
- [Known limitations](../README.md#known-limitations)
- [Cookbook](cookbook.md)

## Faraday Path

The middleware tees Faraday's `on_data` callback, keeps chunks flowing to the caller, and records usage when the response completes.

OpenAI streams need final usage:

```ruby
stream_options: { include_usage: true }
```

Anthropic and Gemini are parsed from their provider stream event shapes when usage is present.

## SDK Path

Official OpenAI and Anthropic SDK streams are captured when `config.instrument` is enabled for the provider. The returned stream object is preserved, and usage is recorded after the stream is consumed.

```ruby
config.instrument :openai
config.instrument :anthropic
```

Captured SDK helpers:

- OpenAI `responses.stream`, `responses.stream_raw`, `responses.retrieve_streaming`, and `chat.completions.stream_raw`.
- Anthropic `messages.stream` and `messages.stream_raw`.

OpenAI Chat Completions streams need final usage:

```ruby
stream_options: { include_usage: true }
```

## Manual Path

```ruby
LlmCostTracker.track_stream(provider: "openai", model: "gpt-4o") do |stream|
  my_client.stream(...) { |event| stream.event(event.to_h) }
end
```

If the client already knows totals, skip provider event parsing:

```ruby
stream.usage(input_tokens: 120, output_tokens: 45)
```

Missing final usage is stored with `usage_source: "unknown"` so the Data Quality page can surface it.
data/docs/technical/README.md
ADDED
@@ -0,0 +1,10 @@

# Technical Documentation

These files describe the internal module boundaries for LLM Cost Tracker.

- [Module map](module-map.md)
- [Data flow](data-flow.md)
- [Extension points](extension-points.md)
- [Operational notes](operational-notes.md)

The main rule is simple: provider-specific API shapes stop at ingestion boundaries. The ledger, storage, budgets, dashboard, and reports work with canonical billing concepts.
data/docs/technical/data-flow.md
ADDED
@@ -0,0 +1,67 @@

# Data Flow

This is the normal path from an application LLM call to stored ledger data.

## Faraday Requests

1. The host app sends an HTTP request through Faraday.
2. `LlmCostTracker::Middleware::Faraday` checks whether a parser matches the request URL.
3. For non-streaming responses, the middleware passes request and response data to the parser.
4. For streaming responses, the middleware tees `on_data`, collects stream events, and parses final usage when the stream completes.
5. The parser returns `ParsedUsage` with canonical fields.
6. `Tracker.record` prices and persists the event.

## SDK Integrations

1. The host app enables an integration with `config.instrument`.
2. `LlmCostTracker::Integrations` checks the SDK version, target classes, and target methods once at install time.
3. `LlmCostTracker::Integrations` prepends a narrow wrapper to supported SDK resource methods.
4. The host app keeps calling the provider SDK normally.
5. The wrapper measures latency, extracts usage from the SDK response object, and sends canonical fields to `Tracker.record`.
6. If an explicitly enabled SDK is not loaded or does not satisfy the install contract, boot raises before the app silently misses usage.

## Explicit Tracking

1. The host app calls `LlmCostTracker.track` with known usage totals, or `LlmCostTracker.track_stream` with stream events.
2. `track` sends manual totals directly to `Tracker.record`.
3. `track_stream` uses `StreamCollector`, then parser lookup by provider when events need parsing.
4. `Tracker.record` prices and persists the event.

## Canonical Event Build

`Tracker.record` performs the central normalization step:

1. Blank model identifiers become `unknown`.
2. Input, output, cache-read, cache-write, hidden-output, and pricing-mode values are extracted from metadata.
3. `Pricing.cost_for` calculates a `Cost` object or returns `nil` for unknown pricing.
4. Tags are merged from `with_tags`, `default_tags`, middleware tags, and explicit metadata.
5. An `Event` is created and emitted through `ActiveSupport::Notifications`.
6. The configured storage backend receives the event.
7. Budget checks run unless storage explicitly returns `false`.

## ActiveRecord Storage

1. `Storage::ActiveRecordStore.save` converts tags for JSON or text storage.
2. Optional fields are written only when their columns exist.
3. The call row and period rollup updates happen in one transaction.
4. `ActiveRecordRollups.increment!` updates daily and monthly totals atomically.
5. Budget reads use period totals when available.

## Dashboard Reads

1. Controllers build a filtered `LlmApiCall` scope.
2. Dashboard services run targeted aggregate queries.
3. Helpers render filters, charts, pagination, CSV links, and numeric formatting.
4. Views render plain ERB with the engine CSS asset.

Dashboard reads do not mutate ledger state. They can be heavier than request-time code, but they still need explicit grouping and indexes.

## Pricing Refresh

1. `llm_cost_tracker:prices:refresh` chooses `ENV["OUTPUT"]`, then `config.prices_file`, then `config/llm_cost_tracker_prices.yml`.
2. `PriceSync::Fetcher` fetches the maintained LLM Cost Tracker price snapshot.
3. `PriceSync` validates schema compatibility, gem-version compatibility, and model price shape.
4. `RegistryWriter` writes a local JSON or YAML registry.
5. Runtime pricing reloads the local file when its mtime changes.

The gem never fetches pricing from the network during normal request tracking.