llm_gateway 0.5.0 → 0.7.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +38 -0
- data/README.md +350 -43
- data/docs/migration_guide_0.6.0.md +386 -0
- data/docs/migration_guide_0.7.0.md +193 -0
- data/lib/llm_gateway/adapters/adapter.rb +8 -11
- data/lib/llm_gateway/adapters/anthropic/input_mapper.rb +24 -0
- data/lib/llm_gateway/adapters/anthropic/stream_mapper.rb +61 -11
- data/lib/llm_gateway/adapters/anthropic_option_mapper.rb +1 -1
- data/lib/llm_gateway/adapters/groq/option_mapper.rb +1 -1
- data/lib/llm_gateway/adapters/input_message_sanitizer.rb +98 -7
- data/lib/llm_gateway/adapters/normalized_stream_accumulator.rb +132 -39
- data/lib/llm_gateway/adapters/openai/chat_completions/option_mapper.rb +1 -1
- data/lib/llm_gateway/adapters/openai/chat_completions/stream_mapper.rb +40 -16
- data/lib/llm_gateway/adapters/openai/responses/input_mapper.rb +47 -31
- data/lib/llm_gateway/adapters/openai/responses/option_mapper.rb +1 -1
- data/lib/llm_gateway/adapters/openai/responses/stream_mapper.rb +173 -24
- data/lib/llm_gateway/adapters/stream_mapper.rb +9 -2
- data/lib/llm_gateway/adapters/structs.rb +140 -55
- data/lib/llm_gateway/agents/event.rb +105 -0
- data/lib/llm_gateway/agents/file_session_manager.rb +100 -0
- data/lib/llm_gateway/agents/harness.rb +176 -0
- data/lib/llm_gateway/agents/in_memory_session_manager.rb +222 -0
- data/lib/llm_gateway/agents/tools/bash_tool.rb +132 -0
- data/lib/llm_gateway/agents/tools/edit_tool.rb +215 -0
- data/lib/llm_gateway/agents/tools/read_tool.rb +143 -0
- data/lib/llm_gateway/agents/tools/tool_utils.rb +164 -0
- data/lib/llm_gateway/agents/tools/write_tool.rb +34 -0
- data/lib/llm_gateway/base_client.rb +5 -7
- data/lib/llm_gateway/clients/anthropic.rb +10 -9
- data/lib/llm_gateway/clients/claude_code/oauth_flow.rb +2 -2
- data/lib/llm_gateway/clients/groq.rb +8 -6
- data/lib/llm_gateway/clients/openai.rb +22 -20
- data/lib/llm_gateway/clients/openai_codex/oauth_flow.rb +4 -4
- data/lib/llm_gateway/prompt.rb +107 -52
- data/lib/llm_gateway/utils.rb +116 -13
- data/lib/llm_gateway/version.rb +1 -1
- data/lib/llm_gateway.rb +7 -21
- metadata +13 -2
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: 173ab613e57543956e39d70f4a38fc865bc6b6bac4e8dfe319be9c2928810f77
|
|
4
|
+
data.tar.gz: 46c761a838aee6c3cebad151467555cba8ab70480e952ab741874c2d8acc13e8
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: 0f21f7288e4d8d374ea77d96ee3110b08a260a2e06ef6fd6372357b88abb5e936d2cbeae934720a0a5e17ad431bdce7027a3cd68e4fc60460c6cb5d0f02acc0a
|
|
7
|
+
data.tar.gz: ecf15206364c5ef7d632c0c421294deb1929e508b4287828acbee91e4a4182fb0efd5fb12cca58d6909499b2b33197070bd4c19e232bed8f25fd12f86e2dd604
|
data/CHANGELOG.md
CHANGED
|
@@ -1,5 +1,43 @@
|
|
|
1
1
|
# Changelog
|
|
2
2
|
|
|
3
|
+
## [v0.7.0](https://github.com/Hyper-Unearthing/llm_gateway/tree/v0.7.0) (2026-06-03)
|
|
4
|
+
|
|
5
|
+
[Full Changelog](https://github.com/Hyper-Unearthing/llm_gateway/compare/v0.6.0...v0.7.0)
|
|
6
|
+
|
|
7
|
+
**Merged pull requests:**
|
|
8
|
+
|
|
9
|
+
- feat: add agent harness [\#88](https://github.com/Hyper-Unearthing/llm_gateway/pull/88) ([billybonks](https://github.com/billybonks))
|
|
10
|
+
- refactor: prompt to use modern patterns [\#87](https://github.com/Hyper-Unearthing/llm_gateway/pull/87) ([billybonks](https://github.com/billybonks))
|
|
11
|
+
- feat: change our utils to follow actie support style [\#85](https://github.com/Hyper-Unearthing/llm_gateway/pull/85) ([billybonks](https://github.com/billybonks))
|
|
12
|
+
- feat: add reasoning level as soemthing configurable in prompt [\#83](https://github.com/Hyper-Unearthing/llm_gateway/pull/83) ([billybonks](https://github.com/billybonks))
|
|
13
|
+
- feat: add support for code execution tool [\#79](https://github.com/Hyper-Unearthing/llm_gateway/pull/79) ([billybonks](https://github.com/billybonks))
|
|
14
|
+
|
|
15
|
+
## [v0.6.0](https://github.com/Hyper-Unearthing/llm_gateway/tree/v0.6.0) (2026-05-27)
|
|
16
|
+
|
|
17
|
+
[Full Changelog](https://github.com/Hyper-Unearthing/llm_gateway/compare/v0.5.0...v0.6.0)
|
|
18
|
+
|
|
19
|
+
**Closed issues:**
|
|
20
|
+
|
|
21
|
+
- issues with token normalization [\#75](https://github.com/Hyper-Unearthing/llm_gateway/issues/75)
|
|
22
|
+
- Add normalized token usage fields for streamed responses [\#72](https://github.com/Hyper-Unearthing/llm_gateway/issues/72)
|
|
23
|
+
- Add timestamp metadata to messages [\#70](https://github.com/Hyper-Unearthing/llm_gateway/issues/70)
|
|
24
|
+
- Build final AssistantMessage in stream pipeline and include it on message\_end [\#69](https://github.com/Hyper-Unearthing/llm_gateway/issues/69)
|
|
25
|
+
- Expose finalized content on stream \_end events [\#68](https://github.com/Hyper-Unearthing/llm_gateway/issues/68)
|
|
26
|
+
- Add accumulated AssistantMessage partials to stream events [\#66](https://github.com/Hyper-Unearthing/llm_gateway/issues/66)
|
|
27
|
+
- 1.0 [\#37](https://github.com/Hyper-Unearthing/llm_gateway/issues/37)
|
|
28
|
+
|
|
29
|
+
**Merged pull requests:**
|
|
30
|
+
|
|
31
|
+
- Improve token normalization [\#78](https://github.com/Hyper-Unearthing/llm_gateway/pull/78) ([billybonks](https://github.com/billybonks))
|
|
32
|
+
- fix\(tests\): the hand off tests were totally fake, now they work [\#77](https://github.com/Hyper-Unearthing/llm_gateway/pull/77) ([billybonks](https://github.com/billybonks))
|
|
33
|
+
- Improve message event metadata and helpers [\#74](https://github.com/Hyper-Unearthing/llm_gateway/pull/74) ([billybonks](https://github.com/billybonks))
|
|
34
|
+
- fix: update migration guide [\#73](https://github.com/Hyper-Unearthing/llm_gateway/pull/73) ([billybonks](https://github.com/billybonks))
|
|
35
|
+
- feat: add partial message as part of streaming events [\#67](https://github.com/Hyper-Unearthing/llm_gateway/pull/67) ([billybonks](https://github.com/billybonks))
|
|
36
|
+
- docs: add migration guide for upcomming version [\#65](https://github.com/Hyper-Unearthing/llm_gateway/pull/65) ([billybonks](https://github.com/billybonks))
|
|
37
|
+
- Decouple model selection from provider auth configuration [\#64](https://github.com/Hyper-Unearthing/llm_gateway/pull/64) ([billybonks](https://github.com/billybonks))
|
|
38
|
+
- burn: support for legacy provider keys [\#63](https://github.com/Hyper-Unearthing/llm_gateway/pull/63) ([billybonks](https://github.com/billybonks))
|
|
39
|
+
- docs: add docs about options for stream method [\#62](https://github.com/Hyper-Unearthing/llm_gateway/pull/62) ([billybonks](https://github.com/billybonks))
|
|
40
|
+
|
|
3
41
|
## [v0.5.0](https://github.com/Hyper-Unearthing/llm_gateway/tree/v0.5.0) (2026-05-20)
|
|
4
42
|
|
|
5
43
|
[Full Changelog](https://github.com/Hyper-Unearthing/llm_gateway/compare/v0.4.0...v0.5.0)
|
data/README.md
CHANGED
|
@@ -7,12 +7,23 @@ Provide a unified translation interface for LLM Provider API's, While allowing d
|
|
|
7
7
|
- [Principles:](#principles)
|
|
8
8
|
- [Installation](#installation)
|
|
9
9
|
- [Supported Providers](#supported-providers)
|
|
10
|
+
- [Stream Options](#stream-options)
|
|
11
|
+
- [Managed cross-provider options](#managed-cross-provider-options)
|
|
12
|
+
- [Provider-specific options](#provider-specific-options)
|
|
10
13
|
- [Quick Start: Streaming (all events)](#quick-start-streaming-all-events)
|
|
11
14
|
- [Stream API without handling events (final result only)](#stream-api-without-handling-events-final-result-only)
|
|
15
|
+
- [Prompt classes](#prompt-classes)
|
|
12
16
|
- [Migration guides](#migration-guides)
|
|
13
17
|
- [Tools](#tools)
|
|
14
18
|
- [Defining Tools](#defining-tools)
|
|
15
19
|
- [Handling Tool Calls](#handling-tool-calls)
|
|
20
|
+
- [Server Tool Use](#server-tool-use)
|
|
21
|
+
- [Agents](#agents)
|
|
22
|
+
- [Agent events](#agent-events)
|
|
23
|
+
- [Session managers and persistence](#session-managers-and-persistence)
|
|
24
|
+
- [Queues, steering, and follow-ups](#queues-steering-and-follow-ups)
|
|
25
|
+
- [Compaction](#compaction)
|
|
26
|
+
- [Built-in agent tools](#built-in-agent-tools)
|
|
16
27
|
- [Image Input](#image-input)
|
|
17
28
|
- [Thinking / Reasoning](#thinking--reasoning)
|
|
18
29
|
- [Streaming Thinking Content](#streaming-thinking-content)
|
|
@@ -56,7 +67,53 @@ gem "llm_gateway"
|
|
|
56
67
|
| OpenAI Codex | `openai_codex` | OAuth | Responses |
|
|
57
68
|
| Groq | `groq_completions` | API key | Chat Completions |
|
|
58
69
|
|
|
59
|
-
|
|
70
|
+
Provider configuration only contains auth/client settings (for example `api_key` or `access_token`). Pass the model per request with `model:` when calling `chat` or `stream`.
|
|
71
|
+
|
|
72
|
+
## Stream Options
|
|
73
|
+
|
|
74
|
+
Pass options to `stream` as keyword arguments alongside `tools:` and `system:`:
|
|
75
|
+
|
|
76
|
+
```ruby
|
|
77
|
+
result = adapter.stream(
|
|
78
|
+
transcript,
|
|
79
|
+
system: "You are concise.",
|
|
80
|
+
reasoning: "high",
|
|
81
|
+
cache_key: "conversation-123",
|
|
82
|
+
cache_retention: "short",
|
|
83
|
+
max_completion_tokens: 2_000
|
|
84
|
+
)
|
|
85
|
+
```
|
|
86
|
+
|
|
87
|
+
Options are split into two groups:
|
|
88
|
+
|
|
89
|
+
1. **Managed cross-provider options**: normalized by `llm_gateway` and mapped to each provider API when supported.
|
|
90
|
+
2. **Provider-specific options**: passed through only when that provider/API pair explicitly allows them.
|
|
91
|
+
|
|
92
|
+
Unknown provider-specific options raise `ArgumentError` with the valid option list for that provider/API pair.
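The allowlist behaviour described above can be pictured with a small sketch. Everything here is hypothetical: the `example_provider` key, the `ALLOWED_PROVIDER_OPTIONS` table, and the `validate_provider_options!` helper are illustration only and are not taken from the gem's option mappers.

```ruby
# Hypothetical allowlist; the real lists live in each option mapper source file.
ALLOWED_PROVIDER_OPTIONS = {
  "example_provider" => %i[temperature top_p]
}.freeze

# Illustrative helper, not part of llm_gateway.
# Returns the options unchanged when every key is allowed, otherwise raises
# ArgumentError and names the valid keys for that provider.
def validate_provider_options!(provider_key, options)
  allowed = ALLOWED_PROVIDER_OPTIONS.fetch(provider_key)
  unknown = options.keys - allowed
  return options if unknown.empty?

  raise ArgumentError,
        "Unknown options #{unknown.inspect} for #{provider_key}. " \
        "Valid options: #{allowed.inspect}"
end
```

Rejecting unknown keys up front, rather than silently dropping them, makes a misspelled option visible at the call site.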
|
|
93
|
+
|
|
94
|
+
### Managed cross-provider options
|
|
95
|
+
|
|
96
|
+
| Option | Accepted values | What it means | Provider mapping notes |
|
|
97
|
+
|--------|-----------------|---------------|------------------------|
|
|
98
|
+
| `reasoning` | `"none"`, `"low"`, `"medium"`, `"high"`, `"xhigh"` | Request provider reasoning/thinking effort. | Anthropic maps to `thinking` token budgets. OpenAI Responses maps to `reasoning`. OpenAI Chat Completions maps to `reasoning_effort`. Groq maps to `reasoning_effort` and `reasoning_format: "parsed"`; Groq accepts `"default"`, `"low"`, `"medium"`, `"high"` and does not accept `"xhigh"`. |
|
|
99
|
+
| `cache_key` | String | Stable prompt/session cache key. | OpenAI Chat Completions and OpenAI Responses map this to `prompt_cache_key`. |
|
|
100
|
+
| `cache_retention` | `"short"`, `"long"`, `"none"` | Requested cache retention policy for `cache_key`. | OpenAI maps `"short"` to `"in_memory"`, `"long"` to `"24h"`, and `"none"` removes prompt-cache fields. If `cache_key` is set without retention, OpenAI defaults to `"short"`. |
|
|
101
|
+
| `max_completion_tokens` | Integer | Maximum generated tokens using gateway naming. | Anthropic maps to `max_tokens`; OpenAI Responses maps to `max_output_tokens`; OpenAI/Groq Chat Completions use `max_completion_tokens`. OpenAI Codex currently removes token limit parameters before sending. |
|
|
102
|
+
| `response_format` | String or Hash, provider-dependent | Requested final response shape, e.g. text or JSON. | OpenAI Chat Completions and Groq pass this as `response_format`; OpenAI Responses maps it under `text.format`; Anthropic maps JSON-ish formats to `output_config`. |
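As a rough illustration of the `cache_key` / `cache_retention` rows for OpenAI, the mapping could look like the sketch below. The helper name `openai_cache_fields` is hypothetical, and the choice to emit no cache fields when `cache_key` is missing is an assumption; the table does not say what happens in that case.

```ruby
# Retention mapping as described for OpenAI in the table.
OPENAI_RETENTION = { "short" => "in_memory", "long" => "24h" }.freeze

# Hypothetical helper, not the gem's option mapper.
# Assumption: without a cache_key, no prompt-cache fields are sent.
def openai_cache_fields(cache_key: nil, cache_retention: nil)
  return {} if cache_key.nil? || cache_retention == "none"

  # A cache_key without an explicit retention falls back to "short".
  retention = OPENAI_RETENTION.fetch(cache_retention || "short")
  { prompt_cache_key: cache_key, prompt_cache_retention: retention }
end
```

Calling `openai_cache_fields(cache_key: "conversation-123")` in this sketch yields the key plus the `"in_memory"` retention, matching the documented default.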
|
|
103
|
+
|
|
104
|
+
### Provider-specific options
|
|
105
|
+
|
|
106
|
+
Provider-specific options are maintained as explicit allowlists in the option mapper source. Use the mapper link to see the current allowed Ruby option keys and the provider documentation link for upstream meanings and values.
|
|
107
|
+
|
|
108
|
+
| Provider key | Provider/API pair | Option mapper source | Provider API documentation |
|
|
109
|
+
|--------------|-------------------|----------------------|----------------------------|
|
|
110
|
+
| `anthropic_messages` | Anthropic Messages Create | [`lib/llm_gateway/adapters/anthropic_option_mapper.rb`](lib/llm_gateway/adapters/anthropic_option_mapper.rb) | [Anthropic Messages API](https://platform.claude.com/docs/en/api/messages/create.md) |
|
|
111
|
+
| `openai_completions` | OpenAI Chat Completions Create | [`lib/llm_gateway/adapters/openai/chat_completions/option_mapper.rb`](lib/llm_gateway/adapters/openai/chat_completions/option_mapper.rb) | [OpenAI Chat Completions API](https://developers.openai.com/api/reference/resources/chat/subresources/completions/methods/create/index.md) |
|
|
112
|
+
| `openai_responses` | OpenAI Responses Create | [`lib/llm_gateway/adapters/openai/responses/option_mapper.rb`](lib/llm_gateway/adapters/openai/responses/option_mapper.rb) | [OpenAI Responses API](https://developers.openai.com/api/reference/resources/responses/methods/create/index.md) |
|
|
113
|
+
| `openai_codex` | OpenAI Codex Responses-compatible endpoint | [`lib/llm_gateway/adapters/openai_codex/option_mapper.rb`](lib/llm_gateway/adapters/openai_codex/option_mapper.rb) | [OpenAI Responses API](https://developers.openai.com/api/reference/resources/responses/methods/create/index.md) |
|
|
114
|
+
| `groq_completions` | Groq Chat Completions Create | [`lib/llm_gateway/adapters/groq/option_mapper.rb`](lib/llm_gateway/adapters/groq/option_mapper.rb) | [Groq Chat API](https://console.groq.com/docs/api-reference.md#chat-create) |
|
|
115
|
+
|
|
116
|
+
Common provider-native options you may pass directly when allowed include OpenAI `prompt_cache_key` / `prompt_cache_retention` and Groq `reasoning_effort` / `reasoning_format`. Prefer the managed options above when you want portable behavior across providers.
|
|
60
117
|
|
|
61
118
|
## Quick Start: Streaming (all events)
|
|
62
119
|
|
|
@@ -67,10 +124,8 @@ require "json"
|
|
|
67
124
|
# Build a provider adapter directly (not via prebuilt config)
|
|
68
125
|
adapter = LlmGateway.build_provider(
|
|
69
126
|
provider: "openai_responses", # or anthropic_messages, groq_completions, ...
|
|
70
|
-
api_key: ENV.fetch("OPENAI_API_KEY")
|
|
71
|
-
model_key: "gpt-5.4"
|
|
127
|
+
api_key: ENV.fetch("OPENAI_API_KEY")
|
|
72
128
|
)
|
|
73
|
-
|
|
74
129
|
tools = [
|
|
75
130
|
{
|
|
76
131
|
name: "get_time",
|
|
@@ -90,15 +145,15 @@ transcript = [
|
|
|
90
145
|
|
|
91
146
|
streamed_tool_args = Hash.new { |h, k| h[k] = +"" }
|
|
92
147
|
|
|
93
|
-
response = adapter.stream(transcript, tools: tools, reasoning: "high") do |event|
|
|
148
|
+
response = adapter.stream(transcript, tools: tools, model: "gpt-5.4", reasoning: "high") do |event|
|
|
94
149
|
case event.type
|
|
95
150
|
# AssistantStreamMessageEvent
|
|
96
151
|
when :message_start
|
|
97
152
|
puts "\n[message_start] #{event.delta.inspect}"
|
|
98
153
|
when :message_delta
|
|
99
|
-
puts "\n[message_delta] #{event.delta.inspect} usage
|
|
154
|
+
puts "\n[message_delta] #{event.delta.inspect} usage=#{event.usage.inspect}"
|
|
100
155
|
when :message_end
|
|
101
|
-
puts "\n[message_end]"
|
|
156
|
+
puts "\n[message_end] final_id=#{event.message.id} stop_reason=#{event.message.stop_reason}"
|
|
102
157
|
|
|
103
158
|
# Text events
|
|
104
159
|
when :text_start
|
|
@@ -111,7 +166,7 @@ response = adapter.stream(transcript, tools: tools, reasoning: "high") do |event
|
|
|
111
166
|
|
|
112
167
|
# Tool-call events
|
|
113
168
|
when :tool_start
|
|
114
|
-
puts "\n[tool_start] id=#{event.id} name=#{event.name} index=#{event.content_index}"
|
|
169
|
+
puts "\n[tool_start] id=#{event.id} name=#{event.name} type=#{event.tool_type} index=#{event.content_index}"
|
|
115
170
|
when :tool_delta
|
|
116
171
|
streamed_tool_args[event.content_index] << event.delta
|
|
117
172
|
print event.delta
|
|
@@ -141,6 +196,7 @@ puts "id: #{response.id}"
|
|
|
141
196
|
puts "model: #{response.model}"
|
|
142
197
|
puts "provider/api: #{response.provider}/#{response.api}"
|
|
143
198
|
puts "role: #{response.role}"
|
|
199
|
+
puts "timestamp: #{response.timestamp}" # Unix milliseconds
|
|
144
200
|
puts "stop_reason: #{response.stop_reason}"
|
|
145
201
|
puts "error_message: #{response.error_message.inspect}" if response.error_message
|
|
146
202
|
puts "usage: #{response.usage.inspect}"
|
|
@@ -159,12 +215,24 @@ end
|
|
|
159
215
|
```
|
|
160
216
|
|
|
161
217
|
Stream callback event families:
|
|
162
|
-
- `AssistantStreamMessageEvent`: `:message_start`, `:message_delta
|
|
218
|
+
- `AssistantStreamMessageEvent`: `:message_start`, `:message_delta`
|
|
219
|
+
- `AssistantStreamMessageEndEvent`: `:message_end` with the final `event.message`
|
|
163
220
|
- `AssistantStreamEvent` (and subclasses):
|
|
164
221
|
- Text: `:text_start`, `:text_delta`, `:text_end`
|
|
165
222
|
- Tool call: `:tool_start`, `:tool_delta`, `:tool_end`
|
|
223
|
+
- Tool result: `:tool_result_start`, `:tool_result_delta`, `:tool_result_end` (emitted by some provider-hosted/server tools)
|
|
166
224
|
- Reasoning: `:reasoning_start`, `:reasoning_delta`, `:reasoning_end`
|
|
167
225
|
|
|
226
|
+
Non-final stream events expose `event.partial`, a `PartialAssistantMessage` snapshot accumulated so far. The final `:message_end` event exposes the complete `AssistantMessage` as `event.message` instead.
|
|
227
|
+
|
|
228
|
+
End events include helpers for the finalized current content block:
|
|
229
|
+
- `event.content` for `:text_end`, `:reasoning_end`, and `:tool_end`
|
|
230
|
+
- `event.text` for `:text_end`
|
|
231
|
+
- `event.reasoning` for `:reasoning_end`
|
|
232
|
+
- `event.tool_call` / `event.tool` for `:tool_end`
|
|
233
|
+
|
|
234
|
+
Usage counters are normalized as `:input`, `:cache_write`, `:cache_read`, `:output`, and `:total`. `:total` is the sum of all input-side buckets plus output. `usage[:raw]` contains the original provider usage/token payload.
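A quick worked example of the `:total` rule, assuming the input-side buckets (`:input`, `:cache_write`, `:cache_read`) are disjoint. The `usage_with_total` helper is hypothetical and only restates the arithmetic; it is not how the gem builds its usage hash.

```ruby
# Hypothetical helper showing how :total relates to the other counters.
def usage_with_total(input:, output:, cache_write: 0, cache_read: 0)
  {
    input: input,
    cache_write: cache_write,
    cache_read: cache_read,
    output: output,
    # Sum of all input-side buckets plus output.
    total: input + cache_write + cache_read + output
  }
end

usage = usage_with_total(input: 120, cache_read: 800, output: 45)
puts usage[:total] # 120 + 0 + 800 + 45 = 965
```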
|
|
235
|
+
|
|
168
236
|
### Stream API without handling events (final result only)
|
|
169
237
|
|
|
170
238
|
If you only care about the final `AssistantMessage`, call `stream` without a block:
|
|
@@ -173,14 +241,14 @@ If you only care about the final `AssistantMessage`, call `stream` without a blo
|
|
|
173
241
|
require "llm_gateway"
|
|
174
242
|
|
|
175
243
|
adapter = LlmGateway.build_provider(
|
|
176
|
-
provider: "
|
|
177
|
-
api_key: ENV.fetch("OPENAI_API_KEY")
|
|
178
|
-
model_key: "gpt-5.4"
|
|
244
|
+
provider: "openai_responses",
|
|
245
|
+
api_key: ENV.fetch("OPENAI_API_KEY")
|
|
179
246
|
)
|
|
180
247
|
|
|
181
|
-
result = adapter.stream("Write one short sentence about Ruby.")
|
|
248
|
+
result = adapter.stream("Write one short sentence about Ruby.", model: "gpt-5.4")
|
|
182
249
|
|
|
183
250
|
puts result.role # "assistant"
|
|
251
|
+
puts result.timestamp # Unix milliseconds
|
|
184
252
|
puts result.stop_reason # "stop" (usually)
|
|
185
253
|
puts result.usage.inspect
|
|
186
254
|
|
|
@@ -192,9 +260,65 @@ text = result.content
|
|
|
192
260
|
puts text
|
|
193
261
|
```
|
|
194
262
|
|
|
263
|
+
## Prompt classes
|
|
264
|
+
|
|
265
|
+
`LlmGateway::Prompt` wraps a reusable prompt, provider/model defaults, callbacks, optional tools, and prompt-cache options around the `stream` API.
|
|
266
|
+
|
|
267
|
+
```ruby
|
|
268
|
+
class AddTool < LlmGateway::Tool
|
|
269
|
+
name "add"
|
|
270
|
+
description "Adds two numbers"
|
|
271
|
+
input_schema(type: "object")
|
|
272
|
+
cache true # optional: mark the tool definition as cacheable where supported
|
|
273
|
+
|
|
274
|
+
def execute(input)
|
|
275
|
+
input[:left] + input[:right]
|
|
276
|
+
end
|
|
277
|
+
end
|
|
278
|
+
|
|
279
|
+
class MathPrompt < LlmGateway::Prompt
|
|
280
|
+
self.provider = LlmGateway.build_provider(
|
|
281
|
+
provider: "openai_responses",
|
|
282
|
+
api_key: ENV.fetch("OPENAI_API_KEY")
|
|
283
|
+
)
|
|
284
|
+
self.model = "gpt-5.4"
|
|
285
|
+
|
|
286
|
+
TOOLS = [AddTool].freeze
|
|
287
|
+
|
|
288
|
+
def prompt
|
|
289
|
+
"What is 2 + 3? Use the add tool."
|
|
290
|
+
end
|
|
291
|
+
|
|
292
|
+
def system_prompt
|
|
293
|
+
"You are a careful math assistant."
|
|
294
|
+
end
|
|
295
|
+
end
|
|
296
|
+
|
|
297
|
+
response = MathPrompt.new(
|
|
298
|
+
cache_key: "math-prompt-v1",
|
|
299
|
+
cache_retention: "short"
|
|
300
|
+
).run
|
|
301
|
+
|
|
302
|
+
puts response.role # "assistant"
|
|
303
|
+
puts response.content.select { |block| block.type == "text" }.map(&:text).join
|
|
304
|
+
```
|
|
305
|
+
|
|
306
|
+
How `Prompt` works now:
|
|
307
|
+
|
|
308
|
+
- `prompt` is evaluated once per `run`.
|
|
309
|
+
- `run(provider:, model:, reasoning:, **options)` calls `stream` and returns the final normalized `AssistantMessage` after any tool calls complete.
|
|
310
|
+
- `stream(input = prompt, provider:, model:, reasoning:, **options, &block)` forwards to the provider and returns the normalized `AssistantMessage`.
|
|
311
|
+
- Tools are declared as tool classes in a `TOOLS` constant. `run` automatically executes returned `tool_use` blocks, appends `tool_result` messages, and loops until no tool calls remain.
|
|
312
|
+
- `system_prompt`, `tools`, `model`, `reasoning`, `cache_key`, and `cache_retention` are forwarded as stream options.
|
|
313
|
+
- `cache_retention` can also enable provider cache control for prompt-owned system/tool blocks where supported, and `Tool.cache true` marks a tool definition with `cache_control`.
|
|
314
|
+
- `before_execute` callbacks receive the resolved input. `after_execute` callbacks receive the final `AssistantMessage`.
|
|
315
|
+
- The old `extract_response` and `parse_response` hooks are no longer called; inspect, parse, or transform the returned `AssistantMessage` after `run`.
|
|
316
|
+
|
|
195
317
|
## Migration guides
|
|
196
318
|
|
|
197
|
-
- [
|
|
319
|
+
- [0.7.0 migration guide](docs/migration_guide_0.7.0.md) — update `Prompt` subclasses for normalized `AssistantMessage` return values, automatic tool loops, `TOOLS`, and removed response hooks.
|
|
320
|
+
- [0.6.0 migration guide](docs/migration_guide_0.6.0.md) — move `model_key` to per-request `model:`, update provider keys, update `Prompt` usage, and migrate stream event/usage changes.
|
|
321
|
+
- [Migrating from `chat` to `stream`](docs/migration-guide.md) — use `stream` without a block when you only need the final response.
|
|
198
322
|
|
|
199
323
|
## Tools
|
|
200
324
|
|
|
@@ -228,11 +352,9 @@ require "llm_gateway"
|
|
|
228
352
|
require "json"
|
|
229
353
|
|
|
230
354
|
adapter = LlmGateway.build_provider(
|
|
231
|
-
provider: "
|
|
232
|
-
api_key: ENV.fetch("OPENAI_API_KEY")
|
|
233
|
-
model_key: "gpt-5.4"
|
|
355
|
+
provider: "openai_responses",
|
|
356
|
+
api_key: ENV.fetch("OPENAI_API_KEY")
|
|
234
357
|
)
|
|
235
|
-
|
|
236
358
|
weather_tool = {
|
|
237
359
|
name: "get_weather",
|
|
238
360
|
description: "Get current weather for a location",
|
|
@@ -261,7 +383,7 @@ transcript = [
|
|
|
261
383
|
]
|
|
262
384
|
|
|
263
385
|
# 1) First model pass (stream API, no event block)
|
|
264
|
-
response = adapter.stream(transcript, tools: [weather_tool])
|
|
386
|
+
response = adapter.stream(transcript, tools: [weather_tool], model: "gpt-5.4")
|
|
265
387
|
transcript << response.to_h
|
|
266
388
|
|
|
267
389
|
# 2) Execute tool calls returned by the model
|
|
@@ -284,7 +406,7 @@ end
|
|
|
284
406
|
|
|
285
407
|
# 3) Continue the conversation after tool execution
|
|
286
408
|
if response.content.any? { |b| b.type == "tool_use" }
|
|
287
|
-
final_response = adapter.stream(transcript, tools: [weather_tool])
|
|
409
|
+
final_response = adapter.stream(transcript, tools: [weather_tool], model: "gpt-5.4")
|
|
288
410
|
|
|
289
411
|
final_text = final_response.content
|
|
290
412
|
.select { |b| b.type == "text" }
|
|
@@ -300,6 +422,196 @@ Notes:
|
|
|
300
422
|
- Tool results are sent back in the transcript as `{ type: "tool_result", tool_use_id:, content: }` blocks.
|
|
301
423
|
- For multimodal-capable models, `tool_result` content can include image blocks when supported by the provider/model.
|
|
302
424
|
|
|
425
|
+
### Server Tool Use
|
|
426
|
+
|
|
427
|
+
Some providers offer provider-hosted tools, such as OpenAI Responses code interpreter or Anthropic code execution. Pass these tools in the provider's native shape; `llm_gateway` preserves them and normalizes server tool activity in streams and final messages.
|
|
428
|
+
|
|
429
|
+
```ruby
|
|
430
|
+
openai_code_interpreter = {
|
|
431
|
+
type: "code_interpreter",
|
|
432
|
+
container: { type: "auto", memory_limit: "1g" }
|
|
433
|
+
}
|
|
434
|
+
|
|
435
|
+
anthropic_code_execution = {
|
|
436
|
+
type: "code_execution_20250825",
|
|
437
|
+
name: "code_execution"
|
|
438
|
+
}
|
|
439
|
+
|
|
440
|
+
tools = provider == "openai_responses" ? [openai_code_interpreter] : [anthropic_code_execution]
|
|
441
|
+
response = adapter.stream("Create a chart from this CSV and save it as PNG.", tools: tools) do |event|
|
|
442
|
+
case event.type
|
|
443
|
+
when :tool_start
|
|
444
|
+
puts "server tool: #{event.name}" if event.tool_type == "server_tool_use"
|
|
445
|
+
when :tool_delta
|
|
446
|
+
print event.delta # streamed code/input JSON when the provider exposes it
|
|
447
|
+
when :tool_result_start, :tool_result_delta
|
|
448
|
+
print event.delta # provider-hosted result metadata/content when available
|
|
449
|
+
end
|
|
450
|
+
end
|
|
451
|
+
|
|
452
|
+
response.content.each do |block|
|
|
453
|
+
case block.type
|
|
454
|
+
when "server_tool_use"
|
|
455
|
+
puts "server tool #{block.name} input=#{block.input.inspect} id=#{block.id}"
|
|
456
|
+
when "server_tool_result"
|
|
457
|
+
puts "server tool result for #{block.tool_use_id}: #{block.content.inspect}"
|
|
458
|
+
end
|
|
459
|
+
end
|
|
460
|
+
```
|
|
461
|
+
|
|
462
|
+
Cross-provider server tool handoffs are best-effort:
|
|
463
|
+
|
|
464
|
+
- Same provider/API replay keeps `server_tool_use` / `server_tool_result` blocks when possible.
|
|
465
|
+
- Cross-provider replay converts server tool uses into normal `tool_use` blocks and server tool results into `tool_result` blocks.
|
|
466
|
+
- `llm_gateway` does not translate server tool names between providers. Supply the target provider's server tool definition on the follow-up request.
|
|
467
|
+
- Some providers require the same server tool to be selected in `tools:` when replaying prior server tool activity.
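The cross-provider downgrade in the list above can be sketched on plain hashes. This is an assumption-laden illustration: `downgrade_server_tool_block` is a hypothetical name, and the real input mappers may change more than the `type` field.

```ruby
# Hypothetical conversion of one content block for cross-provider replay.
# Only the block type is rewritten; ids, names, input and content are kept.
def downgrade_server_tool_block(block)
  case block[:type]
  when "server_tool_use"
    block.merge(type: "tool_use")
  when "server_tool_result"
    block.merge(type: "tool_result")
  else
    block
  end
end

blocks = [
  { type: "server_tool_use", id: "srv_1", name: "code_execution", input: { code: "1 + 1" } },
  { type: "server_tool_result", tool_use_id: "srv_1", content: "2" },
  { type: "text", text: "Done." }
]
replayable = blocks.map { |block| downgrade_server_tool_block(block) }
```

Note that the tool name is passed through untouched, which is consistent with the statement that server tool names are not translated between providers.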
|
|
468
|
+
|
|
469
|
+
## Agents
|
|
470
|
+
|
|
471
|
+
`LlmGateway::Agents::Harness` wraps the streaming API in a stateful conversation loop. It stores session history, executes `LlmGateway::Tool` classes automatically when the model emits tool calls, appends `tool_result` messages, repeats model turns until there are no more tool calls, supports queued user messages while a turn is running, and compacts older session context when needed.
|
|
472
|
+
|
|
473
|
+
```ruby
|
|
474
|
+
require "llm_gateway"
|
|
475
|
+
require "json"
|
|
476
|
+
|
|
477
|
+
class WeatherTool < LlmGateway::Tool
|
|
478
|
+
name "get_weather"
|
|
479
|
+
description "Get current weather for a location"
|
|
480
|
+
input_schema(
|
|
481
|
+
type: "object",
|
|
482
|
+
properties: {
|
|
483
|
+
location: { type: "string" }
|
|
484
|
+
},
|
|
485
|
+
required: ["location"]
|
|
486
|
+
)
|
|
487
|
+
|
|
488
|
+
def execute(input)
|
|
489
|
+
location = input[:location] || input["location"]
|
|
490
|
+
|
|
491
|
+
JSON.generate(
|
|
492
|
+
location: location,
|
|
493
|
+
temperature: 14,
|
|
494
|
+
condition: "Cloudy"
|
|
495
|
+
)
|
|
496
|
+
end
|
|
497
|
+
end
|
|
498
|
+
|
|
499
|
+
class WeatherHarness < LlmGateway::Agents::Harness
|
|
500
|
+
TOOLS = [WeatherTool]
|
|
501
|
+
|
|
502
|
+
def system_prompt
|
|
503
|
+
"You are a concise weather assistant. Use tools when useful."
|
|
504
|
+
end
|
|
505
|
+
end
|
|
506
|
+
|
|
507
|
+
adapter = LlmGateway.build_provider(
|
|
508
|
+
provider: "openai_responses",
|
|
509
|
+
api_key: ENV.fetch("OPENAI_API_KEY")
|
|
510
|
+
)
|
|
511
|
+
|
|
512
|
+
session = LlmGateway::Agents::InMemorySessionManager.new("weather-session")
|
|
513
|
+
harness = WeatherHarness.new(
|
|
514
|
+
session,
|
|
515
|
+
provider: adapter,
|
|
516
|
+
model: "gpt-5.4",
|
|
517
|
+
reasoning: "high"
|
|
518
|
+
)
|
|
519
|
+
|
|
520
|
+
harness.prompt_message(
|
|
521
|
+
role: "user",
|
|
522
|
+
content: [ { type: "text", text: "What is the weather in London?" } ]
|
|
523
|
+
) do |event|
|
|
524
|
+
case event.type
|
|
525
|
+
when :agent_start
|
|
526
|
+
puts "Agent started"
|
|
527
|
+
when :turn_start
|
|
528
|
+
puts "Turn started"
|
|
529
|
+
when :message_update
|
|
530
|
+
# Streaming provider events are wrapped on message update events.
|
|
531
|
+
stream_event = event.stream_event
|
|
532
|
+
print stream_event.delta if stream_event.respond_to?(:delta)
|
|
533
|
+
when :tool_execution_start
|
|
534
|
+
puts "\nExecuting #{event.parameters[:name]}"
|
|
535
|
+
when :tool_execution_end
|
|
536
|
+
puts "\nTool result: #{event.result.content}"
|
|
537
|
+
when :agent_end
|
|
538
|
+
puts "\nAgent finished"
|
|
539
|
+
end
|
|
540
|
+
end
|
|
541
|
+
|
|
542
|
+
puts harness.transcript.inspect
|
|
543
|
+
```
|
|
544
|
+
|
|
545
|
+
Harness behavior:
|
|
546
|
+
|
|
547
|
+
- `prompt_message(message)` accepts an LLM-shaped message hash, records it in the session, streams the provider response, records the final assistant message, executes any returned tool calls from the harness class's `TOOLS` constant, records a user `tool_result` message, and continues until no tool calls remain.
|
|
548
|
+
- Harnesses pass `tools`, `system_prompt`, `model`, `reasoning`, `cache_key`, and `cache_retention` through the inherited `Prompt#stream` defaults.
|
|
549
|
+
- Pass `model:` and optional `reasoning:` to `new`, or set them later with `harness.model = "..."` / `harness.reasoning = "..."`. Model and reasoning changes are recorded as session events.
|
|
550
|
+
- `harness.transcript` (also aliased as `prompt`) returns the current model input: the latest compaction summary, if any, followed by active messages.
|
|
551
|
+
- `harness.run` / `harness.continue` continues from the current session state without adding a new user message.
|
|
552
|
+
|
|
553
|
+
### Agent events
|
|
554
|
+
|
|
555
|
+
When a block is passed to `prompt_message`, `run`, or `continue`, the harness emits typed events:
|
|
556
|
+
|
|
557
|
+
- `:agent_start`
|
|
558
|
+
- `:turn_start`
|
|
559
|
+
- `:message_start`
|
|
560
|
+
- `:message_update` with `event.stream_event` containing the normalized streaming event from the provider
|
|
561
|
+
- `:message_end` with `event.message`
|
|
562
|
+
- `:tool_execution_start` with `event.parameters` (`id`, `type`, `name`, `input`)
|
|
563
|
+
- `:tool_execution_end` with `event.parameters` and `event.result`
|
|
564
|
+
- `:turn_end` with `event.message` and `event.tool_results`
|
|
565
|
+
- `:agent_end`
|
|
566
|
+
|
|
567
|
+
### Session managers and persistence
|
|
568
|
+
|
|
569
|
+
- `LlmGateway::Agents::InMemorySessionManager.new(session_id = nil)` keeps session events in memory for the lifetime of the process.
|
|
570
|
+
- `LlmGateway::Agents::FileSessionManager.new(file_name = nil, session_id: nil, session_start: nil, session_dir: nil)` persists session events as JSONL. If `file_name` is omitted, files are created under `LLM_GATEWAY_SESSION_DIR` or `~/.llm_gateway/sessions`.
|
|
571
|
+
- File sessions load existing JSONL sessions and append new events to the same file.
|
|
572
|
+
- Session event types include `session`, `message`, `model_change`, `reasoning_change`, and `compaction`. Queued messages are kept in memory and are persisted only when drained into the active conversation.
|
|
573
|
+
|
|
574
|
+
### Queues, steering, and follow-ups
|
|
575
|
+
|
|
576
|
+
Calls made while a harness is already processing are queued instead of recursively starting another run.
|
|
577
|
+
|
|
578
|
+
- `prompt_message(message)` queues to the harness's default queue while busy. The default is `:next_turn`.
|
|
579
|
+
- `steer_message(message)`, `follow_up_message(message)`, and `next_turn_message(message)` enqueue to their matching queue while busy. When idle, they behave like `prompt_message`.
|
|
580
|
+
- `:steer` messages are drained before the next model request in the current run.
|
|
581
|
+
- `:follow_up` messages run after the current turn finishes and before `:next_turn` messages.
|
|
582
|
+
- `:next_turn` messages run after the current agent run completes.
|
|
583
|
+
- Queued messages drain as `:all` by default. Set `harness.queue_drain_mode = :one_at_a_time` to drain one FIFO message at a time.
|
|
584
|
+
- Set `harness.default_queue_mode = :steer`, `:follow_up`, or `:next_turn` to change where busy `prompt_message` calls are queued.
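
The ordering and drain rules above can be modelled with three FIFO queues. This is an illustrative model only, assuming nothing about the harness internals: `QueueModel` is a hypothetical class, and it borrows only the queue names and the `:all` / `:one_at_a_time` drain modes from the text.

```ruby
# Illustrative model only: three FIFO queues plus a drain mode.
class QueueModel
  MODES = %i[steer follow_up next_turn].freeze

  attr_accessor :queue_drain_mode

  def initialize
    @queues = MODES.map { |mode| [mode, []] }.to_h
    @queue_drain_mode = :all
  end

  def enqueue(mode, message)
    @queues.fetch(mode) << message
  end

  # Returns the messages released from one queue under the current drain mode.
  def drain(mode)
    queue = @queues.fetch(mode)
    queue_drain_mode == :one_at_a_time ? queue.shift(1) : queue.shift(queue.size)
  end
end

model = QueueModel.new
model.enqueue(:next_turn, "n1")
model.enqueue(:follow_up, "f1")
model.enqueue(:steer, "s1")
model.enqueue(:steer, "s2")

# Order described above: steer first, then follow-ups, then next-turn work.
all_order = QueueModel::MODES.flat_map { |mode| model.drain(mode) }

model.queue_drain_mode = :one_at_a_time
model.enqueue(:steer, "s3")
model.enqueue(:steer, "s4")
first_batch = model.drain(:steer)
second_batch = model.drain(:steer)

puts all_order.inspect
```

In `:one_at_a_time` mode each drain releases a single message, so a burst of queued messages is spread across several model requests instead of arriving together.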
|
|
585
|
+
|
|
586
|
+
### Compaction
|
|
587
|
+
|
|
588
|
+
Before starting a new user message and before draining queued follow-up/next-turn work, the harness checks whether compaction is needed. It compacts when either:
|
|
589
|
+
|
|
590
|
+
- the latest recorded message usage exceeds `LlmGateway::Agents::Harness::COMPACTION_TOKEN_THRESHOLD`, or
|
|
591
|
+
- the latest assistant message is older than `LlmGateway::Agents::Harness::COMPACTION_IDLE_THRESHOLD_SECONDS`.
|
|
592
|
+
|
|
593
|
+
Compaction calls `adapter.stream(active_messages, system: "Summarize the conversation so far for future context.", tools: [])`, stores the returned assistant message as a `compaction` event, and builds future model input as the compaction summary plus messages recorded after that compaction.
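
The two compaction triggers and the resulting model input can be sketched as plain functions. Everything here is an assumption made for illustration: the threshold values are invented (the real ones are the `Harness` constants named above), `compaction_needed?` and `model_input` are hypothetical helpers, and treating "usage" as a single total token count is a guess.

```ruby
# Hypothetical values; the real thresholds are the Harness constants named above.
TOKEN_THRESHOLD = 100_000
IDLE_THRESHOLD_SECONDS = 3600

# True when either documented condition holds.
def compaction_needed?(latest_usage_total:, latest_assistant_at:, now: Time.now)
  return true if latest_usage_total && latest_usage_total > TOKEN_THRESHOLD
  return true if latest_assistant_at && (now - latest_assistant_at) > IDLE_THRESHOLD_SECONDS

  false
end

# Future model input: the summary (if any) followed by messages recorded after it.
def model_input(summary, messages_after_compaction)
  (summary ? [summary] : []) + messages_after_compaction
end

now = Time.at(1_700_000_000)
busy = compaction_needed?(latest_usage_total: 150_000, latest_assistant_at: now - 10, now: now)
stale = compaction_needed?(latest_usage_total: 500, latest_assistant_at: now - 7200, now: now)
fresh = compaction_needed?(latest_usage_total: 500, latest_assistant_at: now - 10, now: now)

summary = { role: "assistant", content: "Summary of earlier turns." }
input = model_input(summary, [{ role: "user", content: "Next question" }])

puts [busy, stale, fresh].inspect
```

Passing `now:` explicitly keeps the idle check deterministic, which makes this kind of predicate easy to test.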
|
|
594
|
+
|
|
595
|
+
### Built-in agent tools
|
|
596
|
+
|
|
597
|
+
The agent harness can use any `LlmGateway::Tool` subclass in its `TOOLS` constant. The library also provides optional coding-oriented tools. Require the ones you want and include them in your harness:
|
|
598
|
+
|
|
599
|
+
```ruby
|
|
600
|
+
require "llm_gateway/agents/tools/read_tool"
|
|
601
|
+
require "llm_gateway/agents/tools/bash_tool"
|
|
602
|
+
require "llm_gateway/agents/tools/edit_tool"
|
|
603
|
+
require "llm_gateway/agents/tools/write_tool"
|
|
604
|
+
|
|
605
|
+
class CodingHarness < LlmGateway::Agents::Harness
|
|
606
|
+
TOOLS = [ReadTool, BashTool, EditTool, WriteTool]
|
|
607
|
+
end
|
|
608
|
+
```
|
|
609
|
+
|
|
610
|
+
- `ReadTool` (`read`) reads text files and supported images (`jpg`, `png`, `gif`, `webp`). Text output is truncated to 2,000 lines or 50KB from the start; use `offset`/`limit` to continue through large files.
|
|
611
|
+
- `BashTool` (`bash`) runs a command in the current working directory, combines stdout/stderr, supports an optional timeout, truncates long output to the last 2,000 lines or 50KB, and, when output is truncated, saves the full output to a temp file.
|
|
612
|
+
- `EditTool` (`edit`) edits one file with one or more exact `edits[].oldText` → `edits[].newText` replacements. Each `oldText` must be unique in the original file and edits must not overlap.
|
|
613
|
+
- `WriteTool` (`write`) creates parent directories as needed and writes or overwrites a file.
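
The `EditTool` constraints (each `oldText` unique in the original, no overlapping edits) can be checked before any text is changed. The sketch below is a standalone illustration of those two rules, not the tool's actual code; `apply_edits` is a hypothetical helper and the error messages are invented.

```ruby
# Illustrative only: applies exact-match replacements under the documented rules.
def apply_edits(original, edits)
  spans = edits.map do |edit|
    old_text = edit.fetch(:oldText)
    first = original.index(old_text)
    raise ArgumentError, "oldText not found: #{old_text.inspect}" if first.nil?
    # A second match starting after the first one means the text is not unique.
    raise ArgumentError, "oldText not unique: #{old_text.inspect}" if original.index(old_text, first + 1)

    { start: first, stop: first + old_text.length, new_text: edit.fetch(:newText) }
  end

  sorted = spans.sort_by { |span| span[:start] }
  sorted.each_cons(2) do |a, b|
    raise ArgumentError, "edits overlap" if b[:start] < a[:stop]
  end

  # Apply from the end of the file so earlier offsets stay valid.
  sorted.reverse.reduce(original.dup) do |text, span|
    text[span[:start]...span[:stop]] = span[:new_text]
    text
  end
end

source = "alpha = 1\nbeta = 2\n"
result = apply_edits(source, [
  { oldText: "alpha = 1", newText: "alpha = 10" },
  { oldText: "beta = 2", newText: "beta = 20" }
])

puts result
```

All positions are computed against the original text, which is why every edit is validated first and replacements are applied back to front.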
|
|
614
|
+
|
|
303
615
|
## Image Input
|
|
304
616
|
|
|
305
617
|
Send images by including an `image` content block in a user message.
|
|
@@ -309,11 +621,9 @@ require "llm_gateway"
|
|
|
309
621
|
require "base64"
|
|
310
622
|
|
|
311
623
|
adapter = LlmGateway.build_provider(
|
|
312
|
-
provider: "
|
|
313
|
-
api_key: ENV.fetch("OPENAI_API_KEY")
|
|
314
|
-
model_key: "gpt-5.4"
|
|
624
|
+
provider: "openai_responses",
|
|
625
|
+
api_key: ENV.fetch("OPENAI_API_KEY")
|
|
315
626
|
)
|
|
316
|
-
|
|
317
627
|
image_b64 = Base64.strict_encode64(File.binread("./chart.png"))
|
|
318
628
|
|
|
319
629
|
message = [
|
|
@@ -326,7 +636,7 @@ message = [
|
|
|
326
636
|
}
|
|
327
637
|
]
|
|
328
638
|
|
|
329
|
-
result = adapter.stream(message) # stream API, no event block
|
|
639
|
+
result = adapter.stream(message, model: "gpt-5.4") # stream API, no event block
|
|
330
640
|
|
|
331
641
|
text = result.content
|
|
332
642
|
.select { |b| b.type == "text" }
|
|
@@ -346,18 +656,18 @@ You can request higher-effort reasoning by passing `reasoning:` to `stream`.
|
|
|
346
656
|
require "llm_gateway"
|
|
347
657
|
|
|
348
658
|
adapter = LlmGateway.build_provider(
|
|
349
|
-
provider: "
|
|
350
|
-
api_key: ENV.fetch("OPENAI_API_KEY")
|
|
351
|
-
model_key: "gpt-5.4"
|
|
659
|
+
provider: "openai_responses",
|
|
660
|
+
api_key: ENV.fetch("OPENAI_API_KEY")
|
|
352
661
|
)
|
|
353
662
|
|
|
354
663
|
result = adapter.stream(
|
|
355
664
|
"Think step by step and then compute 482 * 17.",
|
|
665
|
+
model: "gpt-5.4",
|
|
356
666
|
reasoning: "high"
|
|
357
667
|
)
|
|
358
668
|
|
|
359
669
|
puts "stop_reason: #{result.stop_reason}"
|
|
360
|
-
puts "usage: #{result.usage.inspect}" #
|
|
670
|
+
puts "usage: #{result.usage.inspect}" # normalized keys: :input, :cache_write, :cache_read, :output, :total, :raw
|
|
361
671
|
|
|
362
672
|
result.content.each do |block|
|
|
363
673
|
case block.type
|
|
@@ -377,7 +687,7 @@ If you want incremental thinking/reasoning tokens as they arrive, pass a block t
|
|
|
377
687
|
```ruby
|
|
378
688
|
reasoning_text = +""
|
|
379
689
|
|
|
380
|
-
result = adapter.stream("Solve 99 * 99 with brief reasoning.", reasoning: "high") do |event|
|
|
690
|
+
result = adapter.stream("Solve 99 * 99 with brief reasoning.", model: "gpt-5.4", reasoning: "high") do |event|
|
|
381
691
|
case event.type
|
|
382
692
|
when :reasoning_start
|
|
383
693
|
print "\n[thinking start]\n"
|
|
@@ -405,7 +715,7 @@ puts "Final stop_reason: #{result.stop_reason}"
|
|
|
405
715
|
- fields: `reasoning` and optional `signature`
|
|
406
716
|
- Usage accounting:
|
|
407
717
|
- normalized in `result.usage` when provided by the upstream API
|
|
408
|
-
-
|
|
718
|
+
- keys are `:input`, `:cache_write`, `:cache_read`, `:output`, `:total`, and `:raw`
|
|
409
719
|
|
|
410
720
|
In practice this means you can:
|
|
411
721
|
- listen to `:reasoning_*` stream event variants, and
|
|
@@ -439,7 +749,7 @@ What happens under the hood on `stream`/`chat`:
|
|
|
439
749
|
|
|
440
750
|
5. **Map response back to canonical output**
|
|
441
751
|
- Stream chunks are mapped into normalized stream events.
|
|
442
|
-
- Final output is accumulated into a normalized `AssistantMessage` (`id`, `model`, `usage`, `stop_reason`, `content`, etc.).
|
|
752
|
+
- Final output is accumulated into a normalized `AssistantMessage` (`id`, `model`, `timestamp` as Unix milliseconds, `usage`, `stop_reason`, `content`, etc.).
|
|
443
753
|
|
|
444
754
|
Why this matters:
|
|
445
755
|
- A transcript produced by one provider can be reused with another provider without manually rewriting message structure.
|
|
@@ -455,18 +765,16 @@ require "llm_gateway"
|
|
|
455
765
|
require "json"
|
|
456
766
|
|
|
457
767
|
adapter = LlmGateway.build_provider(
|
|
458
|
-
provider: "
|
|
459
|
-
api_key: ENV.fetch("OPENAI_API_KEY")
|
|
460
|
-
model_key: "gpt-5.4"
|
|
768
|
+
provider: "openai_responses",
|
|
769
|
+
api_key: ENV.fetch("OPENAI_API_KEY")
|
|
461
770
|
)
|
|
462
|
-
|
|
463
771
|
# Build context (transcript)
|
|
464
772
|
transcript = [
|
|
465
773
|
{ role: "user", content: "Plan a 3-day trip to Tokyo." }
|
|
466
774
|
]
|
|
467
775
|
|
|
468
776
|
# Run one turn and persist assistant output
|
|
469
|
-
first = adapter.stream(transcript)
|
|
777
|
+
first = adapter.stream(transcript, model: "gpt-5.4")
|
|
470
778
|
transcript << first.to_h
|
|
471
779
|
|
|
472
780
|
# Serialize (store in DB/file/cache)
|
|
@@ -477,7 +785,7 @@ restored_transcript = JSON.parse(json_context)
|
|
|
477
785
|
|
|
478
786
|
# Continue conversation from restored context
|
|
479
787
|
restored_transcript << { role: "user", content: "Now make it budget-friendly." }
|
|
480
|
-
second = adapter.stream(restored_transcript)
|
|
788
|
+
second = adapter.stream(restored_transcript, model: "gpt-5.4")
|
|
481
789
|
|
|
482
790
|
puts second.content.select { |b| b.type == "text" }.map(&:text).join
|
|
483
791
|
```
|
|
@@ -491,7 +799,7 @@ Tip: if you serialize to JSON, keys become strings on parse; `llm_gateway` accep
|
|
|
491
799
|
|
|
492
800
|
## OAuth
|
|
493
801
|
|
|
494
|
-
Use OAuth-capable providers (for example `openai_codex` and `
|
|
802
|
+
Use OAuth-capable providers (for example `openai_codex` and `anthropic_messages`) by supplying an `access_token` when building the adapter.
|
|
495
803
|
|
|
496
804
|
### Get initial tokens (Codex / OpenAI OAuth)
|
|
497
805
|
|
|
@@ -599,11 +907,10 @@ Build the provider with the current access token:
|
|
|
599
907
|
```ruby
|
|
600
908
|
adapter = LlmGateway.build_provider(
|
|
601
909
|
provider: "openai_codex",
|
|
602
|
-
access_token: current_access_token
|
|
603
|
-
model_key: "gpt-5.4"
|
|
910
|
+
access_token: current_access_token
|
|
604
911
|
)
|
|
605
912
|
|
|
606
|
-
result = adapter.stream("Hello from OAuth auth")
|
|
913
|
+
result = adapter.stream("Hello from OAuth auth", model: "gpt-5.4")
|
|
607
914
|
puts result.content.select { |b| b.type == "text" }.map(&:text).join
|
|
608
915
|
```
|
|
609
916
|
|
|
@@ -641,6 +948,6 @@ bundle exec ruby -Itest test/integration/live/stream_test.rb
|
|
|
641
948
|
|
|
642
949
|
Cassette names are derived from the test file and test name, with VCR sanitizing path segments such as `stream_test.rb` to `stream_test_rb`.
|
|
643
950
|
|
|
644
|
-
For OAuth-backed providers (`
|
|
951
|
+
For OAuth-backed providers (`anthropic_messages`, `openai_codex`), the live test helper only loads real OAuth credentials while the cassette is being recorded. Once the cassette exists, replay uses placeholder tokens/account IDs so the test suite can run without local OAuth state. API-key providers still require the relevant API key when recording. Sensitive authorization headers and selected response headers are redacted before cassettes are written.
|
|
645
952
|
|
|
646
953
|
Some tests pass `redact_request_body: true` to `with_vcr_adapter`; those cassettes match on method and URI only and replace large request bodies with `"<huge prompt body redacted>"`.
|