llm_gateway 0.5.0 → 0.6.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +26 -0
- data/README.md +95 -42
- data/docs/migration_guide_0.6.0.md +386 -0
- data/lib/llm_gateway/adapters/adapter.rb +7 -10
- data/lib/llm_gateway/adapters/anthropic/stream_mapper.rb +33 -6
- data/lib/llm_gateway/adapters/normalized_stream_accumulator.rb +87 -26
- data/lib/llm_gateway/adapters/openai/chat_completions/stream_mapper.rb +40 -16
- data/lib/llm_gateway/adapters/openai/responses/stream_mapper.rb +42 -21
- data/lib/llm_gateway/adapters/stream_mapper.rb +9 -2
- data/lib/llm_gateway/adapters/structs.rb +102 -52
- data/lib/llm_gateway/base_client.rb +2 -4
- data/lib/llm_gateway/clients/anthropic.rb +5 -4
- data/lib/llm_gateway/clients/groq.rb +8 -6
- data/lib/llm_gateway/clients/openai.rb +20 -18
- data/lib/llm_gateway/prompt.rb +35 -17
- data/lib/llm_gateway/version.rb +1 -1
- data/lib/llm_gateway.rb +3 -21
- metadata +3 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: '086d7bdff1cb0b6b3febb78d025d7ccfe4b53c6fd40fcb5cddebd335d786e437'
+  data.tar.gz: 1b2ea3af95f44d27c0c1636da321d24dc036fad8a242263f608948c79ac11f88
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: '0147478704832819ee6d8fbe4e0e6203f4e598d72fd3b23138b550de9da64fb90cd8354713a5553244acf17b9c6fe0a89a0b5cab624f03ec7382e12f11aebb21'
+  data.tar.gz: 22e1ff9571717ebe8f39a31cd36d37815c6053def32ac1e125a103ccb516a98b37aeb225edae899c6a9ecf121df5344bbbe6f6166a2e1d89ffa05db881c70e14
data/CHANGELOG.md
CHANGED
@@ -1,5 +1,31 @@
 # Changelog
 
+## [v0.6.0](https://github.com/Hyper-Unearthing/llm_gateway/tree/v0.6.0) (2026-05-27)
+
+[Full Changelog](https://github.com/Hyper-Unearthing/llm_gateway/compare/v0.5.0...v0.6.0)
+
+**Closed issues:**
+
+- issues with token normalization [\#75](https://github.com/Hyper-Unearthing/llm_gateway/issues/75)
+- Add normalized token usage fields for streamed responses [\#72](https://github.com/Hyper-Unearthing/llm_gateway/issues/72)
+- Add timestamp metadata to messages [\#70](https://github.com/Hyper-Unearthing/llm_gateway/issues/70)
+- Build final AssistantMessage in stream pipeline and include it on message\_end [\#69](https://github.com/Hyper-Unearthing/llm_gateway/issues/69)
+- Expose finalized content on stream \_end events [\#68](https://github.com/Hyper-Unearthing/llm_gateway/issues/68)
+- Add accumulated AssistantMessage partials to stream events [\#66](https://github.com/Hyper-Unearthing/llm_gateway/issues/66)
+- 1.0 [\#37](https://github.com/Hyper-Unearthing/llm_gateway/issues/37)
+
+**Merged pull requests:**
+
+- Improve token normalization [\#78](https://github.com/Hyper-Unearthing/llm_gateway/pull/78) ([billybonks](https://github.com/billybonks))
+- fix\(tests\): the hand off tests were totally fake, now they work [\#77](https://github.com/Hyper-Unearthing/llm_gateway/pull/77) ([billybonks](https://github.com/billybonks))
+- Improve message event metadata and helpers [\#74](https://github.com/Hyper-Unearthing/llm_gateway/pull/74) ([billybonks](https://github.com/billybonks))
+- fix: update migration guide [\#73](https://github.com/Hyper-Unearthing/llm_gateway/pull/73) ([billybonks](https://github.com/billybonks))
+- feat: add partial message as part of streaming events [\#67](https://github.com/Hyper-Unearthing/llm_gateway/pull/67) ([billybonks](https://github.com/billybonks))
+- docs: add migration guide for upcomming version [\#65](https://github.com/Hyper-Unearthing/llm_gateway/pull/65) ([billybonks](https://github.com/billybonks))
+- Decouple model selection from provider auth configuration [\#64](https://github.com/Hyper-Unearthing/llm_gateway/pull/64) ([billybonks](https://github.com/billybonks))
+- burn: support for legacy provider keys [\#63](https://github.com/Hyper-Unearthing/llm_gateway/pull/63) ([billybonks](https://github.com/billybonks))
+- docs: add docs about options for stream method [\#62](https://github.com/Hyper-Unearthing/llm_gateway/pull/62) ([billybonks](https://github.com/billybonks))
+
 ## [v0.5.0](https://github.com/Hyper-Unearthing/llm_gateway/tree/v0.5.0) (2026-05-20)
 
 [Full Changelog](https://github.com/Hyper-Unearthing/llm_gateway/compare/v0.4.0...v0.5.0)
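Note on closed issues #68 and #69 above: the README changes in this diff state that `:text_end` events expose `event.text` and that `:message_end` carries the final message as `event.message`. A minimal consumer-side sketch follows. It assumes stand-in events built from a hypothetical `SketchEvent` struct (not an llm_gateway class) and assumes `:text_delta` events carry a string `delta`; the gem's real event objects may differ.

```ruby
# Stand-in event type for illustration only; llm_gateway's real event
# classes are not reproduced here.
SketchEvent = Struct.new(:type, :delta, :text, :message, keyword_init: true)

# Collects streamed text deltas and records what the end events report.
def sketch_consume(events)
  streamed = +""          # mutable buffer for incremental text
  finalized_text = nil    # value reported by :text_end
  final_message = nil     # value reported by :message_end

  events.each do |event|
    case event.type
    when :text_delta  then streamed << event.delta
    when :text_end    then finalized_text = event.text
    when :message_end then final_message = event.message
    end
  end

  { streamed: streamed, finalized_text: finalized_text, final_message: final_message }
end

events = [
  SketchEvent.new(type: :text_start),
  SketchEvent.new(type: :text_delta, delta: "Hel"),
  SketchEvent.new(type: :text_delta, delta: "lo"),
  SketchEvent.new(type: :text_end, text: "Hello"),
  SketchEvent.new(type: :message_end, message: { role: "assistant", stop_reason: "stop" })
]

result = sketch_consume(events)
puts result[:streamed] # prints Hello
```

With the end-event helpers, a caller that only needs finalized blocks could skip the delta buffer and read `event.text` on `:text_end` directly.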
data/README.md
CHANGED
@@ -7,6 +7,9 @@ Provide a unified translation interface for LLM Provider API's, While allowing d
 - [Principles:](#principles)
 - [Installation](#installation)
 - [Supported Providers](#supported-providers)
+- [Stream Options](#stream-options)
+- [Managed cross-provider options](#managed-cross-provider-options)
+- [Provider-specific options](#provider-specific-options)
 - [Quick Start: Streaming (all events)](#quick-start-streaming-all-events)
 - [Stream API without handling events (final result only)](#stream-api-without-handling-events-final-result-only)
 - [Migration guides](#migration-guides)
@@ -56,7 +59,53 @@ gem "llm_gateway"
 | OpenAI Codex | `openai_codex` | OAuth | Responses |
 | Groq | `groq_completions` | API key | Chat Completions |
 
-
+Provider configuration only contains auth/client settings (for example `api_key` or `access_token`). Pass the model per request with `model:` when calling `chat` or `stream`.
+
+## Stream Options
+
+Pass options to `stream` as keyword arguments alongside `tools:` and `system:`:
+
+```ruby
+result = adapter.stream(
+  transcript,
+  system: "You are concise.",
+  reasoning: "high",
+  cache_key: "conversation-123",
+  cache_retention: "short",
+  max_completion_tokens: 2_000
+)
+```
+
+Options are split into two groups:
+
+1. **Managed cross-provider options**: normalized by `llm_gateway` and mapped to each provider API when supported.
+2. **Provider-specific options**: passed through only when that provider/API pair explicitly allows them.
+
+Unknown provider-specific options raise `ArgumentError` with the valid option list for that provider/API pair.
+
+### Managed cross-provider options
+
+| Option | Accepted values | What it means | Provider mapping notes |
+|--------|-----------------|---------------|------------------------|
+| `reasoning` | `"none"`, `"low"`, `"medium"`, `"high"`, `"xhigh"` | Request provider reasoning/thinking effort. | Anthropic maps to `thinking` token budgets. OpenAI Responses maps to `reasoning`. OpenAI Chat Completions maps to `reasoning_effort`. Groq maps to `reasoning_effort` and `reasoning_format: "parsed"`; Groq accepts `"default"`, `"low"`, `"medium"`, `"high"` and does not accept `"xhigh"`. |
+| `cache_key` | String | Stable prompt/session cache key. | OpenAI Chat Completions and OpenAI Responses map this to `prompt_cache_key`. |
+| `cache_retention` | `"short"`, `"long"`, `"none"` | Requested cache retention policy for `cache_key`. | OpenAI maps `"short"` to `"in_memory"`, `"long"` to `"24h"`, and `"none"` removes prompt-cache fields. If `cache_key` is set without retention, OpenAI defaults to `"short"`. |
+| `max_completion_tokens` | Integer | Maximum generated tokens using gateway naming. | Anthropic maps to `max_tokens`; OpenAI Responses maps to `max_output_tokens`; OpenAI/Groq Chat Completions use `max_completion_tokens`. OpenAI Codex currently removes token limit parameters before sending. |
+| `response_format` | String or Hash, provider-dependent | Requested final response shape, e.g. text or JSON. | OpenAI Chat Completions and Groq pass this as `response_format`; OpenAI Responses maps it under `text.format`; Anthropic maps JSON-ish formats to `output_config`. |
+
+### Provider-specific options
+
+Provider-specific options are maintained as explicit allowlists in the option mapper source. Use the mapper link to see the current allowed Ruby option keys and the provider documentation link for upstream meanings and values.
+
+| Provider key | Provider/API pair | Option mapper source | Provider API documentation |
+|--------------|-------------------|----------------------|----------------------------|
+| `anthropic_messages` | Anthropic Messages Create | [`lib/llm_gateway/adapters/anthropic_option_mapper.rb`](lib/llm_gateway/adapters/anthropic_option_mapper.rb) | [Anthropic Messages API](https://platform.claude.com/docs/en/api/messages/create.md) |
+| `openai_completions` | OpenAI Chat Completions Create | [`lib/llm_gateway/adapters/openai/chat_completions/option_mapper.rb`](lib/llm_gateway/adapters/openai/chat_completions/option_mapper.rb) | [OpenAI Chat Completions API](https://developers.openai.com/api/reference/resources/chat/subresources/completions/methods/create/index.md) |
+| `openai_responses` | OpenAI Responses Create | [`lib/llm_gateway/adapters/openai/responses/option_mapper.rb`](lib/llm_gateway/adapters/openai/responses/option_mapper.rb) | [OpenAI Responses API](https://developers.openai.com/api/reference/resources/responses/methods/create/index.md) |
+| `openai_codex` | OpenAI Codex Responses-compatible endpoint | [`lib/llm_gateway/adapters/openai_codex/option_mapper.rb`](lib/llm_gateway/adapters/openai_codex/option_mapper.rb) | [OpenAI Responses API](https://developers.openai.com/api/reference/resources/responses/methods/create/index.md) |
+| `groq_completions` | Groq Chat Completions Create | [`lib/llm_gateway/adapters/groq/option_mapper.rb`](lib/llm_gateway/adapters/groq/option_mapper.rb) | [Groq Chat API](https://console.groq.com/docs/api-reference.md#chat-create) |
+
+Common provider-native options you may pass directly when allowed include OpenAI `prompt_cache_key` / `prompt_cache_retention` and Groq `reasoning_effort` / `reasoning_format`. Prefer the managed options above when you want portable behavior across providers.
 
 ## Quick Start: Streaming (all events)
 
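The managed-option table added in this hunk can be read as a small mapping. The sketch below is illustrative only and is not the gem's option mapper: `sketch_openai_cache_params`, `sketch_token_limit_params`, and both constants are hypothetical names, the `prompt_cache_retention` target key is assumed from the provider-native options the README mentions, and raising on an unrecognized retention value is a choice made here rather than documented behavior.

```ruby
# Hypothetical lookup tables derived from the README's mapping notes.
OPENAI_RETENTION = { "short" => "in_memory", "long" => "24h" }.freeze

TOKEN_LIMIT_PARAM = {
  "anthropic_messages" => :max_tokens,
  "openai_responses"   => :max_output_tokens,
  "openai_completions" => :max_completion_tokens,
  "groq_completions"   => :max_completion_tokens,
  "openai_codex"       => nil # README: token limit parameters are removed
}.freeze

# Maps cache_key/cache_retention to OpenAI-style prompt-cache fields.
# Assumption: with no cache_key there is nothing to send.
def sketch_openai_cache_params(cache_key: nil, cache_retention: nil)
  return {} if cache_key.nil? || cache_retention == "none"

  retention = cache_retention || "short" # documented default when a key is set
  unless OPENAI_RETENTION.key?(retention)
    raise ArgumentError, "unknown cache_retention: #{retention.inspect}"
  end

  { prompt_cache_key: cache_key, prompt_cache_retention: OPENAI_RETENTION.fetch(retention) }
end

# Renames the gateway's max_completion_tokens to the provider's parameter.
def sketch_token_limit_params(provider, max_completion_tokens)
  return {} if max_completion_tokens.nil?

  param = TOKEN_LIMIT_PARAM.fetch(provider)
  param ? { param => max_completion_tokens } : {}
end

p sketch_openai_cache_params(cache_key: "conversation-123")
p sketch_token_limit_params("anthropic_messages", 2_000)
```

Keeping the mapping in frozen tables makes the per-provider differences easy to audit against the README table; the gem's actual mappers live in the `option_mapper.rb` files linked above.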
@@ -67,10 +116,8 @@ require "json"
 # Build a provider adapter directly (not via prebuilt config)
 adapter = LlmGateway.build_provider(
   provider: "openai_responses", # or anthropic_messages, groq_completions, ...
-  api_key: ENV.fetch("OPENAI_API_KEY"),
-  model_key: "gpt-5.4"
+  api_key: ENV.fetch("OPENAI_API_KEY")
 )
-
 tools = [
   {
     name: "get_time",
@@ -90,15 +137,15 @@ transcript = [
 
 streamed_tool_args = Hash.new { |h, k| h[k] = +"" }
 
-response = adapter.stream(transcript, tools: tools, reasoning: "high") do |event|
+response = adapter.stream(transcript, tools: tools, model: "gpt-5.4", reasoning: "high") do |event|
   case event.type
   # AssistantStreamMessageEvent
   when :message_start
     puts "\n[message_start] #{event.delta.inspect}"
   when :message_delta
-    puts "\n[message_delta] #{event.delta.inspect} usage
+    puts "\n[message_delta] #{event.delta.inspect} usage=#{event.usage.inspect}"
   when :message_end
-    puts "\n[message_end]"
+    puts "\n[message_end] final_id=#{event.message.id} stop_reason=#{event.message.stop_reason}"
 
   # Text events
   when :text_start
@@ -141,6 +188,7 @@ puts "id: #{response.id}"
 puts "model: #{response.model}"
 puts "provider/api: #{response.provider}/#{response.api}"
 puts "role: #{response.role}"
+puts "timestamp: #{response.timestamp}" # Unix milliseconds
 puts "stop_reason: #{response.stop_reason}"
 puts "error_message: #{response.error_message.inspect}" if response.error_message
 puts "usage: #{response.usage.inspect}"
@@ -159,12 +207,23 @@ end
 ```
 
 Stream callback event families:
-- `AssistantStreamMessageEvent`: `:message_start`, `:message_delta
+- `AssistantStreamMessageEvent`: `:message_start`, `:message_delta`
+- `AssistantStreamMessageEndEvent`: `:message_end` with the final `event.message`
 - `AssistantStreamEvent` (and subclasses):
   - Text: `:text_start`, `:text_delta`, `:text_end`
   - Tool call: `:tool_start`, `:tool_delta`, `:tool_end`
   - Reasoning: `:reasoning_start`, `:reasoning_delta`, `:reasoning_end`
 
+Non-final stream events expose `event.partial`, a `PartialAssistantMessage` snapshot accumulated so far. The final `:message_end` event exposes the complete `AssistantMessage` as `event.message` instead.
+
+End events include helpers for the finalized current content block:
+- `event.content` for `:text_end`, `:reasoning_end`, and `:tool_end`
+- `event.text` for `:text_end`
+- `event.reasoning` for `:reasoning_end`
+- `event.tool_call` / `event.tool` for `:tool_end`
+
+Usage counters are normalized as `:input`, `:cache_write`, `:cache_read`, `:output`, and `:total`. `:total` is the sum of all input-side buckets plus output. `usage[:raw]` contains the original provider usage/token payload.
+
 ### Stream API without handling events (final result only)
 
 If you only care about the final `AssistantMessage`, call `stream` without a block:
@@ -173,14 +232,14 @@ If you only care about the final `AssistantMessage`, call `stream` without a blo
 require "llm_gateway"
 
 adapter = LlmGateway.build_provider(
-  provider: "
-  api_key: ENV.fetch("OPENAI_API_KEY"),
-  model_key: "gpt-5.4"
+  provider: "openai_responses",
+  api_key: ENV.fetch("OPENAI_API_KEY")
 )
 
-result = adapter.stream("Write one short sentence about Ruby.")
+result = adapter.stream("Write one short sentence about Ruby.", model: "gpt-5.4")
 
 puts result.role # "assistant"
+puts result.timestamp # Unix milliseconds
 puts result.stop_reason # "stop" (usually)
 puts result.usage.inspect
 
@@ -194,7 +253,8 @@ puts text
 
 ## Migration guides
 
-- [
+- [0.6.0 migration guide](docs/migration_guide_0.6.0.md) — move `model_key` to per-request `model:`, update provider keys, update `Prompt` usage, and migrate stream event/usage changes.
+- [Migrating from `chat` to `stream`](docs/migration-guide.md) — use `stream` without a block when you only need the final response.
 
 ## Tools
 
@@ -228,11 +288,9 @@ require "llm_gateway"
 require "json"
 
 adapter = LlmGateway.build_provider(
-  provider: "
-  api_key: ENV.fetch("OPENAI_API_KEY"),
-  model_key: "gpt-5.4"
+  provider: "openai_responses",
+  api_key: ENV.fetch("OPENAI_API_KEY")
 )
-
 weather_tool = {
   name: "get_weather",
   description: "Get current weather for a location",
@@ -261,7 +319,7 @@ transcript = [
 ]
 
 # 1) First model pass (stream API, no event block)
-response = adapter.stream(transcript, tools: [weather_tool])
+response = adapter.stream(transcript, tools: [weather_tool], model: "gpt-5.4")
 transcript << response.to_h
 
 # 2) Execute tool calls returned by the model
@@ -284,7 +342,7 @@ end
 
 # 3) Continue the conversation after tool execution
 if response.content.any? { |b| b.type == "tool_use" }
-  final_response = adapter.stream(transcript, tools: [weather_tool])
+  final_response = adapter.stream(transcript, tools: [weather_tool], model: "gpt-5.4")
 
   final_text = final_response.content
     .select { |b| b.type == "text" }
@@ -309,11 +367,9 @@ require "llm_gateway"
 require "base64"
 
 adapter = LlmGateway.build_provider(
-  provider: "
-  api_key: ENV.fetch("OPENAI_API_KEY"),
-  model_key: "gpt-5.4"
+  provider: "openai_responses",
+  api_key: ENV.fetch("OPENAI_API_KEY")
 )
-
 image_b64 = Base64.strict_encode64(File.binread("./chart.png"))
 
 message = [
@@ -326,7 +382,7 @@ message = [
   }
 ]
 
-result = adapter.stream(message) # stream API, no event block
+result = adapter.stream(message, model: "gpt-5.4") # stream API, no event block
 
 text = result.content
   .select { |b| b.type == "text" }
@@ -346,18 +402,18 @@ You can request higher-effort reasoning by passing `reasoning:` to `stream`.
 require "llm_gateway"
 
 adapter = LlmGateway.build_provider(
-  provider: "
-  api_key: ENV.fetch("OPENAI_API_KEY"),
-  model_key: "gpt-5.4"
+  provider: "openai_responses",
+  api_key: ENV.fetch("OPENAI_API_KEY")
 )
 
 result = adapter.stream(
   "Think step by step and then compute 482 * 17.",
+  model: "gpt-5.4",
   reasoning: "high"
 )
 
 puts "stop_reason: #{result.stop_reason}"
-puts "usage: #{result.usage.inspect}" #
+puts "usage: #{result.usage.inspect}" # normalized keys: :input, :cache_write, :cache_read, :output, :total, :raw
 
 result.content.each do |block|
   case block.type
@@ -377,7 +433,7 @@ If you want incremental thinking/reasoning tokens as they arrive, pass a block t
 ```ruby
 reasoning_text = +""
 
-result = adapter.stream("Solve 99 * 99 with brief reasoning.", reasoning: "high") do |event|
+result = adapter.stream("Solve 99 * 99 with brief reasoning.", model: "gpt-5.4", reasoning: "high") do |event|
   case event.type
   when :reasoning_start
     print "\n[thinking start]\n"
@@ -405,7 +461,7 @@ puts "Final stop_reason: #{result.stop_reason}"
   - fields: `reasoning` and optional `signature`
 - Usage accounting:
   - normalized in `result.usage` when provided by the upstream API
-  -
+  - keys are `:input`, `:cache_write`, `:cache_read`, `:output`, `:total`, and `:raw`
 
 In practice this means you can:
 - listen to `:reasoning_*` stream event variants, and
@@ -439,7 +495,7 @@ What happens under the hood on `stream`/`chat`:
 
 5. **Map response back to canonical output**
    - Stream chunks are mapped into normalized stream events.
-   - Final output is accumulated into a normalized `AssistantMessage` (`id`, `model`, `usage`, `stop_reason`, `content`, etc.).
+   - Final output is accumulated into a normalized `AssistantMessage` (`id`, `model`, `timestamp` as Unix milliseconds, `usage`, `stop_reason`, `content`, etc.).
 
 Why this matters:
 - A transcript produced by one provider can be reused with another provider without manually rewriting message structure.
@@ -455,18 +511,16 @@ require "llm_gateway"
 require "json"
 
 adapter = LlmGateway.build_provider(
-  provider: "
-  api_key: ENV.fetch("OPENAI_API_KEY"),
-  model_key: "gpt-5.4"
+  provider: "openai_responses",
+  api_key: ENV.fetch("OPENAI_API_KEY")
 )
-
 # Build context (transcript)
 transcript = [
   { role: "user", content: "Plan a 3-day trip to Tokyo." }
 ]
 
 # Run one turn and persist assistant output
-first = adapter.stream(transcript)
+first = adapter.stream(transcript, model: "gpt-5.4")
 transcript << first.to_h
 
 # Serialize (store in DB/file/cache)
@@ -477,7 +531,7 @@ restored_transcript = JSON.parse(json_context)
 
 # Continue conversation from restored context
 restored_transcript << { role: "user", content: "Now make it budget-friendly." }
-second = adapter.stream(restored_transcript)
+second = adapter.stream(restored_transcript, model: "gpt-5.4")
 
 puts second.content.select { |b| b.type == "text" }.map(&:text).join
 ```
@@ -491,7 +545,7 @@ Tip: if you serialize to JSON, keys become strings on parse; `llm_gateway` accep
 
 ## OAuth
 
-Use OAuth-capable providers (for example `openai_codex` and `
+Use OAuth-capable providers (for example `openai_codex` and `anthropic_messages`) by supplying an `access_token` when building the adapter.
 
 ### Get initial tokens (Codex / OpenAI OAuth)
 
@@ -599,11 +653,10 @@ Build the provider with the current access token:
 ```ruby
 adapter = LlmGateway.build_provider(
   provider: "openai_codex",
-  access_token: current_access_token,
-  model_key: "gpt-5.4"
+  access_token: current_access_token
 )
 
-result = adapter.stream("Hello from OAuth auth")
+result = adapter.stream("Hello from OAuth auth", model: "gpt-5.4")
 puts result.content.select { |b| b.type == "text" }.map(&:text).join
 ```
 
@@ -641,6 +694,6 @@ bundle exec ruby -Itest test/integration/live/stream_test.rb
 
 Cassette names are derived from the test file and test name, with VCR sanitizing path segments such as `stream_test.rb` to `stream_test_rb`.
 
-For OAuth-backed providers (`
+For OAuth-backed providers (`anthropic_messages`, `openai_codex`), the live test helper only loads real OAuth credentials while the cassette is being recorded. Once the cassette exists, replay uses placeholder tokens/account IDs so the test suite can run without local OAuth state. API-key providers still require the relevant API key when recording. Sensitive authorization headers and selected response headers are redacted before cassettes are written.
 
 Some tests pass `redact_request_body: true` to `with_vcr_adapter`; those cassettes match on method and URI only and replace large request bodies with `"<huge prompt body redacted>"`.
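The usage note added to the README in this diff (`:total` is the sum of all input-side buckets plus output) can be sketched as follows. This assumes the input-side buckets are exactly `:input`, `:cache_write`, and `:cache_read` and that they do not overlap; `sketch_normalized_usage` is a hypothetical helper for illustration, not an llm_gateway method.

```ruby
# Hypothetical helper: builds a usage hash with the README's normalized keys
# and derives :total from the other counters.
def sketch_normalized_usage(input:, output:, cache_write: 0, cache_read: 0, raw: {})
  {
    input: input,
    cache_write: cache_write,
    cache_read: cache_read,
    output: output,
    # Assumption: input-side buckets are disjoint, so a plain sum is correct.
    total: input + cache_write + cache_read + output,
    raw: raw # original provider payload, kept untouched
  }
end

usage = sketch_normalized_usage(input: 120, cache_read: 800, output: 45)
puts usage[:total] # prints 965
```

If a provider reports cached tokens as a subset of its input count, a real normalizer would need to subtract them before summing; that case is not covered by this sketch.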