llm_gateway 0.3.0 → 0.4.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +26 -0
- data/README.md +544 -186
- data/Rakefile +1 -2
- data/docs/migration-guide.md +135 -0
- data/lib/llm_gateway/adapters/adapter.rb +173 -0
- data/lib/llm_gateway/adapters/anthropic/acts_like_messages.rb +23 -0
- data/lib/llm_gateway/adapters/{claude → anthropic}/bidirectional_message_mapper.rb +31 -3
- data/lib/llm_gateway/adapters/{claude → anthropic}/input_mapper.rb +4 -3
- data/lib/llm_gateway/adapters/anthropic/messages_adapter.rb +19 -0
- data/lib/llm_gateway/adapters/{claude → anthropic}/output_mapper.rb +1 -1
- data/lib/llm_gateway/adapters/anthropic/stream_mapper.rb +110 -0
- data/lib/llm_gateway/adapters/anthropic_option_mapper.rb +53 -0
- data/lib/llm_gateway/adapters/groq/chat_completions_adapter.rb +47 -0
- data/lib/llm_gateway/adapters/groq/option_mapper.rb +27 -0
- data/lib/llm_gateway/adapters/input_message_sanitizer.rb +93 -0
- data/lib/llm_gateway/adapters/openai/acts_like_chat_completions.rb +22 -0
- data/lib/llm_gateway/adapters/openai/acts_like_responses.rb +31 -0
- data/lib/llm_gateway/adapters/{open_ai → openai}/chat_completions/bidirectional_message_mapper.rb +9 -2
- data/lib/llm_gateway/adapters/{open_ai → openai}/chat_completions/input_mapper.rb +1 -6
- data/lib/llm_gateway/adapters/openai/chat_completions/input_message_sanitizer.rb +65 -0
- data/lib/llm_gateway/adapters/openai/chat_completions/option_mapper.rb +39 -0
- data/lib/llm_gateway/adapters/{open_ai → openai}/chat_completions/output_mapper.rb +1 -1
- data/lib/llm_gateway/adapters/openai/chat_completions/stream_mapper.rb +242 -0
- data/lib/llm_gateway/adapters/openai/chat_completions_adapter.rb +20 -0
- data/lib/llm_gateway/adapters/{open_ai → openai}/file_output_mapper.rb +1 -1
- data/lib/llm_gateway/adapters/openai/prompt_cache_option_mapper.rb +39 -0
- data/lib/llm_gateway/adapters/{open_ai → openai}/responses/bidirectional_message_mapper.rb +52 -4
- data/lib/llm_gateway/adapters/openai/responses/input_mapper.rb +106 -0
- data/lib/llm_gateway/adapters/openai/responses/option_mapper.rb +41 -0
- data/lib/llm_gateway/adapters/{open_ai → openai}/responses/output_mapper.rb +1 -1
- data/lib/llm_gateway/adapters/openai/responses/stream_mapper.rb +340 -0
- data/lib/llm_gateway/adapters/openai/responses_adapter.rb +20 -0
- data/lib/llm_gateway/adapters/openai_codex/input_mapper.rb +206 -0
- data/lib/llm_gateway/adapters/openai_codex/option_mapper.rb +28 -0
- data/lib/llm_gateway/adapters/openai_codex/responses_adapter.rb +38 -0
- data/lib/llm_gateway/adapters/option_mapper.rb +13 -0
- data/lib/llm_gateway/adapters/stream_accumulator.rb +91 -0
- data/lib/llm_gateway/adapters/structs.rb +145 -0
- data/lib/llm_gateway/base_client.rb +62 -1
- data/lib/llm_gateway/client.rb +45 -129
- data/lib/llm_gateway/clients/anthropic.rb +167 -0
- data/lib/llm_gateway/clients/claude_code/oauth_flow.rb +162 -0
- data/lib/llm_gateway/clients/claude_code/token_manager.rb +112 -0
- data/lib/llm_gateway/clients/groq.rb +54 -0
- data/lib/llm_gateway/clients/openai.rb +208 -0
- data/lib/llm_gateway/clients/openai_codex/oauth_flow.rb +258 -0
- data/lib/llm_gateway/clients/openai_codex/token_manager.rb +71 -0
- data/lib/llm_gateway/errors.rb +21 -0
- data/lib/llm_gateway/prompt.rb +12 -1
- data/lib/llm_gateway/provider_registry.rb +37 -0
- data/lib/llm_gateway/version.rb +1 -1
- data/lib/llm_gateway.rb +165 -14
- data/scripts/create_anthropic_credentials.rb +106 -0
- data/scripts/create_openai_codex_credentials.rb +116 -0
- data/scripts/generate_handoff_live_fixture.rb +169 -0
- data/scripts/generate_handoff_media_fixture.rb +167 -0
- metadata +64 -28
- data/lib/llm_gateway/adapters/claude/client.rb +0 -60
- data/lib/llm_gateway/adapters/groq/bidirectional_message_mapper.rb +0 -18
- data/lib/llm_gateway/adapters/groq/client.rb +0 -58
- data/lib/llm_gateway/adapters/groq/input_mapper.rb +0 -18
- data/lib/llm_gateway/adapters/groq/output_mapper.rb +0 -10
- data/lib/llm_gateway/adapters/open_ai/client.rb +0 -80
- data/lib/llm_gateway/adapters/open_ai/responses/input_mapper.rb +0 -62
- data/sample/claude_code_clone/agent.rb +0 -65
- data/sample/claude_code_clone/claude_code_clone.rb +0 -40
- data/sample/claude_code_clone/prompt.rb +0 -79
- data/sample/claude_code_clone/run.rb +0 -47
- data/sample/claude_code_clone/tools/bash_tool.rb +0 -54
- data/sample/claude_code_clone/tools/edit_tool.rb +0 -61
- data/sample/claude_code_clone/tools/grep_tool.rb +0 -113
- data/sample/claude_code_clone/tools/read_tool.rb +0 -61
- data/sample/claude_code_clone/tools/todowrite_tool.rb +0 -98

data/README.md (CHANGED, @@ -1,272 +1,630 @@)

# llm_gateway

Provides a unified translation interface for LLM provider APIs while giving developers as much control as possible. That goal adds complexity, because we don't want developers to be blocked from using anything the provider supports. As the library matures, it will support more response types.

## Table of Contents

- [Principles](#principles)
- [Installation](#installation)
- [Supported Providers](#supported-providers)
- [Quick Start: Streaming (all events)](#quick-start-streaming-all-events)
  - [Stream API without handling events (final result only)](#stream-api-without-handling-events-final-result-only)
- [Migration guides](#migration-guides)
- [Tools](#tools)
  - [Defining Tools](#defining-tools)
  - [Handling Tool Calls](#handling-tool-calls)
- [Image Input](#image-input)
- [Thinking / Reasoning](#thinking--reasoning)
  - [Streaming Thinking Content](#streaming-thinking-content)
  - [How reasoning values are mapped](#how-reasoning-values-are-mapped)
- [Cross-Provider Handoffs](#cross-provider-handoffs)
- [Context Serialization](#context-serialization)
- [OAuth](#oauth)
  - [Get initial tokens (Codex / OpenAI OAuth)](#get-initial-tokens-codex--openai-oauth)
  - [Get initial tokens (Anthropic OAuth)](#get-initial-tokens-anthropic-oauth)
  - [Get a refresh token](#get-a-refresh-token)
  - [Exchange refresh token for access token](#exchange-refresh-token-for-access-token)
  - [Pass access token in provider requests](#pass-access-token-in-provider-requests)
  - [Token refresh responsibility](#token-refresh-responsibility)
    - [Library’s role (llm_gateway)](#librarys-role-llm_gateway)
    - [User/app’s role](#userapps-role)

## Principles

1. Transcription integrity is most important
2. Input messages must have bidirectional integrity
3. Allow developers as much control as possible

## Installation

```bash
gem install llm_gateway
```

Or add it to your `Gemfile`:

```ruby
gem "llm_gateway"
```

## Supported Providers

| Provider     | Provider Key         | Auth    | API Surface      |
|--------------|----------------------|---------|------------------|
| Anthropic    | `anthropic_messages` | API key | Messages         |
| OpenAI       | `openai_completions` | API key | Chat Completions |
| OpenAI       | `openai_responses`   | API key | Responses        |
| OpenAI Codex | `openai_codex`       | OAuth   | Responses        |
| Groq         | `groq_completions`   | API key | Chat Completions |

Legacy keys (`*_apikey_*`, `*_oauth_*`), such as `openai_apikey_responses` and `anthropic_oauth_messages`, are still supported for backward compatibility.

## Quick Start: Streaming (all events)

```ruby
require "llm_gateway"
require "json"

# Build a provider adapter directly (not via prebuilt config)
adapter = LlmGateway.build_provider(
  provider: "openai_responses", # or anthropic_messages, groq_completions, ...
  api_key: ENV.fetch("OPENAI_API_KEY"),
  model_key: "gpt-5.4"
)

tools = [
  {
    name: "get_time",
    description: "Get the current time",
    input_schema: {
      type: "object",
      properties: {
        timezone: { type: "string", description: "Optional timezone, e.g. America/New_York" }
      }
    }
  }
]

transcript = [
  { role: "user", content: "What time is it? Think briefly, then call get_time." }
]

# Tool-call arguments arrive as JSON fragments; buffer them per content block
streamed_tool_args = Hash.new { |h, k| h[k] = +"" }

response = adapter.stream(transcript, tools: tools, reasoning: "high") do |event|
  case event.type
  # AssistantStreamMessageEvent
  when :message_start
    puts "\n[message_start] #{event.delta.inspect}"
  when :message_delta
    puts "\n[message_delta] #{event.delta.inspect} usage+=#{event.usage_increment.inspect}"
  when :message_end
    puts "\n[message_end]"

  # Text events
  when :text_start
    puts "\n[text_start] index=#{event.content_index}"
    print event.delta unless event.delta.empty?
  when :text_delta
    print event.delta
  when :text_end
    puts "\n[text_end] index=#{event.content_index}"

  # Tool-call events
  when :tool_start
    puts "\n[tool_start] id=#{event.id} name=#{event.name} index=#{event.content_index}"
  when :tool_delta
    streamed_tool_args[event.content_index] << event.delta
    print event.delta
  when :tool_end
    puts "\n[tool_end] index=#{event.content_index}"
    begin
      puts "tool args: #{JSON.parse(streamed_tool_args[event.content_index])}"
    rescue JSON::ParserError
      puts "tool args (partial/raw): #{streamed_tool_args[event.content_index]}"
    end

  # Reasoning events
  when :reasoning_start
    puts "\n[reasoning_start] sig=#{event.respond_to?(:signature) ? event.signature : ""}"
    print event.delta
  when :reasoning_delta
    print event.delta
  when :reasoning_end
    puts "\n[reasoning_end]"
  end
end

# Final AssistantMessage (assembled from the stream)
puts "\n\n=== Final assistant message ==="
puts "id: #{response.id}"
puts "model: #{response.model}"
puts "provider/api: #{response.provider}/#{response.api}"
puts "role: #{response.role}"
puts "stop_reason: #{response.stop_reason}"
puts "error_message: #{response.error_message.inspect}" if response.error_message
puts "usage: #{response.usage.inspect}"

response.content.each do |block|
  case block.type
  when "text"
    puts "text: #{block.text}"
  when "reasoning"
    puts "reasoning: #{block.reasoning}"
    puts "signature: #{block.signature}" if block.respond_to?(:signature) && block.signature
  when "tool_use"
    puts "tool_use: #{block.name}(#{block.input.inspect}) id=#{block.id}"
  end
end
```

Stream callback event families:
- `AssistantStreamMessageEvent`: `:message_start`, `:message_delta`, `:message_end`
- `AssistantStreamEvent` (and subclasses):
  - Text: `:text_start`, `:text_delta`, `:text_end`
  - Tool call: `:tool_start`, `:tool_delta`, `:tool_end`
  - Reasoning: `:reasoning_start`, `:reasoning_delta`, `:reasoning_end`
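
To make the event sequence concrete, here is a stdlib-only sketch that replays a hand-written list of events and rebuilds each content block by `content_index`, buffering tool-call JSON fragments until they can be parsed. The `Event` struct and the sample events are hypothetical stand-ins that model only the `type`, `content_index`, and `delta` readers used in the Quick Start. The gem appears to ship its own accumulator (`adapters/stream_accumulator.rb` in the file list above), so treat this as an illustration of the bookkeeping, not its implementation.

```ruby
require "json"

# Hypothetical stand-in for a stream event: only the readers used in the
# Quick Start (type, content_index, delta) are modeled.
Event = Struct.new(:type, :content_index, :delta, keyword_init: true)

# Rebuilds each content block (text, tool call, reasoning) from its events,
# keyed by content_index.
def accumulate(events)
  blocks = Hash.new { |h, k| h[k] = { kind: nil, buffer: +"" } }

  events.each do |event|
    case event.type
    when :text_start, :tool_start, :reasoning_start
      block = blocks[event.content_index]
      block[:kind] = event.type.to_s.delete_suffix("_start").to_sym
      block[:buffer] << event.delta.to_s
    when :text_delta, :tool_delta, :reasoning_delta
      blocks[event.content_index][:buffer] << event.delta.to_s
    end
  end

  blocks.sort_by { |index, _| index }.map do |index, block|
    if block[:kind] == :tool
      # Tool arguments stream as JSON fragments and only parse once complete.
      input = block[:buffer].empty? ? {} : JSON.parse(block[:buffer])
      { index: index, kind: :tool, input: input }
    else
      { index: index, kind: block[:kind], text: block[:buffer] }
    end
  end
end

events = [
  Event.new(type: :text_start, content_index: 0, delta: ""),
  Event.new(type: :text_delta, content_index: 0, delta: "Checking "),
  Event.new(type: :text_delta, content_index: 0, delta: "the time."),
  Event.new(type: :text_end, content_index: 0),
  Event.new(type: :tool_start, content_index: 1),
  Event.new(type: :tool_delta, content_index: 1, delta: "{\"timezone\":"),
  Event.new(type: :tool_delta, content_index: 1, delta: "\"UTC\"}"),
  Event.new(type: :tool_end, content_index: 1)
]

accumulate(events).each { |block| p block }
```

In a live callback you would apply the same per-index buffering as each `event` arrives instead of collecting the events first.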

### Stream API without handling events (final result only)

If you only care about the final `AssistantMessage`, call `stream` without a block:

```ruby
require "llm_gateway"

adapter = LlmGateway.build_provider(
  provider: "openai_responses",
  api_key: ENV.fetch("OPENAI_API_KEY"),
  model_key: "gpt-5.4"
)

result = adapter.stream("Write one short sentence about Ruby.")

puts result.role        # "assistant"
puts result.stop_reason # "stop" (usually)
puts result.usage.inspect

text = result.content
  .select { |block| block.type == "text" }
  .map(&:text)
  .join

puts text
```

## Migration guides

- [Migrating from `chat` to `stream`](docs/migration-guide.md): use `stream` without a block when you only need the final response.

## Tools

### Defining Tools

```ruby
weather_tool = {
  name: "get_weather",
  description: "Get current weather for a location",
  input_schema: {
    type: "object",
    properties: {
      location: { type: "string", description: "City name or coordinates" },
      units: {
        type: "string",
        enum: ["celsius", "fahrenheit"],
        default: "celsius"
      }
    },
    required: ["location"]
  }
}
```
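
The `input_schema` is plain JSON Schema, so it can also guard your own code before a tool runs. Below is a minimal stdlib-only sketch that checks the `required` keys and `enum` values of a hash like `weather_tool`. `validate_tool_input` is a hypothetical helper, not part of `llm_gateway`, and it covers only this small subset of JSON Schema; use a full validator for anything more.

```ruby
# Hypothetical helper: checks a tool's input against the `required` and
# `enum` parts of its input_schema. Input keys may be symbols or strings.
def validate_tool_input(tool, input)
  schema = tool[:input_schema]
  args = input.transform_keys(&:to_s)
  errors = []

  Array(schema[:required]).each do |key|
    errors << "missing required field: #{key}" unless args.key?(key.to_s)
  end

  schema.fetch(:properties, {}).each do |name, spec|
    value = args[name.to_s]
    next if value.nil? || spec[:enum].nil?

    unless spec[:enum].include?(value)
      errors << "#{name} must be one of #{spec[:enum].join(", ")}"
    end
  end

  errors
end

weather_tool = {
  name: "get_weather",
  input_schema: {
    type: "object",
    properties: {
      location: { type: "string" },
      units: { type: "string", enum: ["celsius", "fahrenheit"], default: "celsius" }
    },
    required: ["location"]
  }
}

p validate_tool_input(weather_tool, { "location" => "London" }) # => []
p validate_tool_input(weather_tool, { units: "kelvin" })
```

Returning a list of messages (instead of raising) makes it easy to send the problems back to the model as a `tool_result` so it can correct its call.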

### Handling Tool Calls

Call `stream` without a block, inspect the returned `tool_use` blocks, execute each tool, append a `tool_result` block for each call, then call `stream` again:

```ruby
require "llm_gateway"
require "json"

adapter = LlmGateway.build_provider(
  provider: "openai_responses",
  api_key: ENV.fetch("OPENAI_API_KEY"),
  model_key: "gpt-5.4"
)

weather_tool = {
  name: "get_weather",
  description: "Get current weather for a location",
  input_schema: {
    type: "object",
    properties: {
      location: { type: "string" },
      units: { type: "string", enum: ["celsius", "fahrenheit"], default: "celsius" }
    },
    required: ["location"]
  }
}

def execute_weather_api(args)
  # Replace with a real API call; accepts symbol or string keys
  {
    location: args[:location] || args["location"],
    units: args[:units] || args["units"] || "celsius",
    temperature: 14,
    condition: "Cloudy"
  }
end

transcript = [
  { role: "user", content: "What is the weather in London?" }
]

# 1) First model pass (stream API, no event block)
response = adapter.stream(transcript, tools: [weather_tool])
transcript << response.to_h

# 2) Execute tool calls returned by the model
response.content.each do |block|
  next unless block.type == "tool_use"

  tool_result = execute_weather_api(block.input)

  transcript << {
    role: "developer",
    content: [
      {
        type: "tool_result",
        tool_use_id: block.id,
        content: JSON.generate(tool_result)
      }
    ]
  }
end

# 3) Continue the conversation after tool execution
if response.content.any? { |b| b.type == "tool_use" }
  final_response = adapter.stream(transcript, tools: [weather_tool])

  final_text = final_response.content
    .select { |b| b.type == "text" }
    .map(&:text)
    .join

  puts final_text
end
```

Notes:
- Tool calls are returned as `ToolCall` blocks with `type: "tool_use"`, `id`, `name`, and `input`.
- Tool results are sent back in the transcript as `{ type: "tool_result", tool_use_id:, content: }` blocks.
- For multimodal-capable models, `tool_result` content can include image blocks when supported by the provider/model.
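
The example above runs a single tool round. Models may chain several tool calls, so a common pattern is to loop until a response contains no `tool_use` blocks, with a cap on rounds. This sketch is self-contained: `FakeAdapter`, `Block`, and `Message` are hypothetical stand-ins that mimic only the `stream(transcript, tools:)`, `content`, `type`, `id`, `name`, `input`, and `to_h` surface used above, so the loop runs without network access. With a real adapter from `LlmGateway.build_provider`, the same loop should apply, but that is an assumption to verify against your provider.

```ruby
require "json"

# Hypothetical stand-ins mimicking the response surface used above.
Block = Struct.new(:type, :id, :name, :input, :text, keyword_init: true)
Message = Struct.new(:content, keyword_init: true) do
  def to_h
    { role: "assistant", content: content.map { |b| b.to_h.compact } }
  end
end

# Fake adapter: requests a tool on the first call, answers in text once a tool result exists.
class FakeAdapter
  def stream(transcript, tools: [])
    if transcript.any? { |m| m[:role] == "developer" }
      Message.new(content: [Block.new(type: "text", text: "It is 14C and cloudy in London.")])
    else
      Message.new(content: [Block.new(type: "tool_use", id: "call_1", name: "get_weather",
                                      input: { "location" => "London" })])
    end
  end
end

# Streams, runs requested tools, appends results, and repeats until no tool is requested.
def run_tool_loop(adapter, transcript, tools:, handlers:, max_rounds: 5)
  max_rounds.times do
    response = adapter.stream(transcript, tools: tools)
    transcript << response.to_h

    calls = response.content.select { |b| b.type == "tool_use" }
    return response if calls.empty? # no more tools requested: done

    calls.each do |call|
      result = handlers.fetch(call.name).call(call.input)
      transcript << {
        role: "developer",
        content: [{ type: "tool_result", tool_use_id: call.id, content: JSON.generate(result) }]
      }
    end
  end
  raise "tool loop did not finish within #{max_rounds} rounds"
end

handlers = { "get_weather" => ->(args) { { location: args["location"], temperature: 14 } } }
transcript = [{ role: "user", content: "What is the weather in London?" }]

final = run_tool_loop(FakeAdapter.new, transcript, tools: [], handlers: handlers)
puts final.content.map(&:text).join
```

The `max_rounds` cap keeps a model that keeps requesting tools from looping forever; pick a limit that fits your tools.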

## Image Input

Send images by including an `image` content block (base64-encoded `data` plus a `media_type`) in a user message.

```ruby
require "llm_gateway"
require "base64"

adapter = LlmGateway.build_provider(
  provider: "openai_responses",
  api_key: ENV.fetch("OPENAI_API_KEY"),
  model_key: "gpt-5.4"
)

image_b64 = Base64.strict_encode64(File.binread("./chart.png"))

messages = [
  {
    role: "user",
    content: [
      { type: "text", text: "What do you see in this image?" },
      { type: "image", data: image_b64, media_type: "image/png" }
    ]
  }
]

result = adapter.stream(messages) # stream API, no event block

text = result.content
  .select { |b| b.type == "text" }
  .map(&:text)
  .join

puts text
```
Tip: use a model/provider combination that supports vision input.
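
If you load images in several formats, a small helper can pick the `media_type` from the file extension and build the same `image` block shape used above. `image_block` and its extension table are assumptions for illustration; check which media types your provider and model accept. The sketch writes placeholder bytes to a temp file only to show the call.

```ruby
require "base64"
require "tempfile"

# Hypothetical extension table; adjust to what your provider supports.
MEDIA_TYPES = {
  ".png" => "image/png",
  ".jpg" => "image/jpeg",
  ".jpeg" => "image/jpeg",
  ".gif" => "image/gif",
  ".webp" => "image/webp"
}.freeze

# Hypothetical helper: builds an image content block in the shape used above.
def image_block(path)
  ext = File.extname(path).downcase
  media_type = MEDIA_TYPES.fetch(ext) { raise ArgumentError, "unsupported image type: #{ext}" }
  { type: "image", data: Base64.strict_encode64(File.binread(path)), media_type: media_type }
end

Tempfile.create(["chart", ".PNG"]) do |file|
  file.binmode
  file.write("PNG placeholder bytes")
  file.flush

  block = image_block(file.path)
  p block[:media_type] # => "image/png"
  p Base64.strict_decode64(block[:data]) == File.binread(file.path) # => true
end
```

The extension is checked before the file is read, so an unsupported type fails fast without touching the disk.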

## Thinking / Reasoning

You can request higher-effort reasoning by passing `reasoning:` to `stream`.

```ruby
require "llm_gateway"

adapter = LlmGateway.build_provider(
  provider: "openai_responses",
  api_key: ENV.fetch("OPENAI_API_KEY"),
  model_key: "gpt-5.4"
)

result = adapter.stream(
  "Think step by step and then compute 482 * 17.",
  reasoning: "high"
)

puts "stop_reason: #{result.stop_reason}"
puts "usage: #{result.usage.inspect}" # may include reasoning_tokens depending on provider

result.content.each do |block|
  case block.type
  when "reasoning"
    puts "[reasoning] #{block.reasoning}"
    puts "[signature] #{block.signature}" if block.respond_to?(:signature) && block.signature
  when "text"
    puts "[text] #{block.text}"
  end
end
```

### Streaming Thinking Content

If you want incremental thinking/reasoning tokens as they arrive, pass a block to `stream` and handle the reasoning events:

```ruby
# Reuses the `adapter` built in the previous example
reasoning_text = +""

result = adapter.stream("Solve 99 * 99 with brief reasoning.", reasoning: "high") do |event|
  case event.type
  when :reasoning_start
    print "\n[thinking start]\n"
    reasoning_text << event.delta
    print event.delta
  when :reasoning_delta
    reasoning_text << event.delta
    print event.delta
  when :reasoning_end
    print "\n[thinking end]\n"
  end
end

puts "\nCollected reasoning chars: #{reasoning_text.length}"
puts "Final stop_reason: #{result.stop_reason}"
```

### How reasoning values are mapped

`llm_gateway` normalizes provider-specific reasoning/thinking output into shared structures:

- Stream events:
  - `:reasoning_start`, `:reasoning_delta`, `:reasoning_end`
- Final content block:
  - `ReasoningContent` with `type: "reasoning"`
  - fields: `reasoning` and optional `signature`
- Usage accounting:
  - normalized in `result.usage` when provided by the upstream API
  - may include `:reasoning_tokens` plus standard token counters

In practice this means you can:
- listen for the `:reasoning_*` stream events, and
- read the final reasoning text, when the provider returns it, from `result.content` blocks where `block.type == "reasoning"`.

Notes:
- Some providers/models expose explicit reasoning content; others may only reflect reasoning effort in usage fields.
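
Because some providers return reasoning text and others only report token counts, code that consumes reasoning usually needs a fallback. The sketch below uses hypothetical `Struct` stand-ins for `ReasoningContent` and text blocks, and treats `usage` as a plain hash that may or may not contain `:reasoning_tokens`. The key name comes from the list above; the overall shape of `usage` is an assumption.

```ruby
# Hypothetical stand-ins for final content blocks.
Block = Struct.new(:type, :text, :reasoning, :signature, keyword_init: true)

# Summarizes reasoning from a final message: prefers content, falls back to usage.
def reasoning_summary(content, usage)
  texts = content.select { |b| b.type == "reasoning" }.map(&:reasoning).compact
  tokens = usage[:reasoning_tokens] || usage["reasoning_tokens"]

  if texts.any?
    { source: :content, text: texts.join("\n"), tokens: tokens }
  elsif tokens
    { source: :usage, text: nil, tokens: tokens }
  else
    { source: :none, text: nil, tokens: nil }
  end
end

with_text = [
  Block.new(type: "reasoning", reasoning: "482 * 17 = 8194", signature: "sig"),
  Block.new(type: "text", text: "8194")
]
usage_only = [Block.new(type: "text", text: "8194")]

p reasoning_summary(with_text, { reasoning_tokens: 12 })
p reasoning_summary(usage_only, { "reasoning_tokens" => 30 })
p reasoning_summary(usage_only, {})
```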

## Cross-Provider Handoffs

Internally, `llm_gateway` handles handoffs by normalizing message history into a provider-agnostic shape, then remapping that shape to the target provider API on each request.

What happens under the hood on `stream`/`chat`:

1. **Normalize input**
   - String input is converted to a user message.
   - `system` is normalized into system message objects.
   - Prior assistant turns (including `response.to_h`) are treated as structured transcript entries.

2. **Map into canonical gateway format**
   - Provider-specific differences (content block names, tool-call shapes, reasoning/thinking variants) are unified into shared structs.

3. **Sanitize for target provider/model**
   - Before sending, messages are sanitized for the destination provider/API/model.
   - Unsupported or provider-specific fields are adjusted/translated where possible.

4. **Map to outbound provider payload**
   - The adapter input mapper converts canonical messages/tools/options into the exact wire format expected by the selected provider endpoint.

5. **Map response back to canonical output**
   - Stream chunks are mapped into normalized stream events.
   - Final output is accumulated into a normalized `AssistantMessage` (`id`, `model`, `usage`, `stop_reason`, `content`, etc.).

Why this matters:
- A transcript produced by one provider can be reused with another provider without manually rewriting message structure.
- Tool calls/reasoning/text are exposed through a consistent API even when upstream event formats differ.
- Your app can keep one conversation state format while switching providers for cost, latency, capability, or reliability reasons.
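
Steps 1 and 4 can be illustrated with a deliberately simplified sketch: normalize a string or message list into one canonical array, then render it for two different wire formats. The shapes follow the public APIs in broad strokes (Anthropic Messages takes `system` as a top-level field, while OpenAI Chat Completions carries it as a `system` message inside `messages`), but `normalize`, `to_anthropic_messages`, and `to_chat_completions` are hypothetical helpers that ignore tools, images, and reasoning. They are not `llm_gateway`'s mappers.

```ruby
# Hypothetical, text-only versions of steps 1 and 4.

# Step 1: normalize input into canonical { role:, content: [blocks] } hashes,
# accepting a plain string or messages with symbol or string keys.
def normalize(input)
  messages = input.is_a?(String) ? [{ role: "user", content: input }] : input
  messages.map do |m|
    m = m.transform_keys(&:to_sym)
    content = m[:content]
    content = [{ type: "text", text: content }] if content.is_a?(String)
    { role: m[:role].to_s, content: content.map { |b| b.transform_keys(&:to_sym) } }
  end
end

def plain_text(blocks)
  blocks.select { |b| b[:type] == "text" }.map { |b| b[:text] }.join
end

# Step 4a: Anthropic Messages style, with system as a top-level field.
def to_anthropic_messages(canonical, system: nil)
  payload = { messages: canonical.map { |m| { role: m[:role], content: m[:content] } } }
  payload[:system] = system if system
  payload
end

# Step 4b: Chat Completions style, with system as the first message.
def to_chat_completions(canonical, system: nil)
  messages = canonical.map { |m| { role: m[:role], content: plain_text(m[:content]) } }
  messages.unshift({ role: "system", content: system }) if system
  { messages: messages }
end

canonical = normalize("What is the capital of France?")
p to_anthropic_messages(canonical, system: "Be brief.")
p to_chat_completions(canonical, system: "Be brief.")
```

Keeping one canonical array and rendering per target is what lets the same history be replayed against either API.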

## Context Serialization

`llm_gateway` contexts are plain Ruby hashes/arrays, so they can be serialized to JSON and restored later.

```ruby
require "llm_gateway"
require "json"

adapter = LlmGateway.build_provider(
  provider: "openai_responses",
  api_key: ENV.fetch("OPENAI_API_KEY"),
  model_key: "gpt-5.4"
)

# Build context (transcript)
transcript = [
  { role: "user", content: "Plan a 3-day trip to Tokyo." }
]

# Run one turn and persist assistant output
first = adapter.stream(transcript)
transcript << first.to_h

# Serialize (store in DB/file/cache)
json_context = JSON.generate(transcript)

# ...later / elsewhere...
restored_transcript = JSON.parse(json_context)

# Continue conversation from restored context
restored_transcript << { role: "user", content: "Now make it budget-friendly." }
second = adapter.stream(restored_transcript)

puts second.content.select { |b| b.type == "text" }.map(&:text).join
```

What to persist:
- full transcript array (including assistant messages from `response.to_h`)
- any tool result messages you appended
- optional app metadata (user id, conversation id, timestamps) alongside the transcript

Tip: if you serialize to JSON, keys become strings on parse; `llm_gateway` accepts standard hash input and normalizes internally.
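
That tip covers the gem's side. If your own code also reads the restored transcript (for example `message[:role]`), Ruby's `JSON.parse` accepts `symbolize_names: true` to bring symbol keys back. Here is a stdlib-only round trip, with a hand-written transcript standing in for real `response.to_h` output:

```ruby
require "json"

# Hand-written stand-in for a transcript that includes an assistant turn.
transcript = [
  { role: "user", content: "Plan a 3-day trip to Tokyo." },
  { role: "assistant", content: [{ type: "text", text: "Day 1: Asakusa and Ueno." }] }
]

json_context = JSON.generate(transcript)

string_keyed = JSON.parse(json_context)
symbol_keyed = JSON.parse(json_context, symbolize_names: true)

p string_keyed.first["role"] # => "user"
p string_keyed.first[:role]  # => nil (keys are strings after a plain parse)
p symbol_keyed == transcript # => true
```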

## OAuth

Use OAuth-capable providers (for example `openai_codex` and `anthropic_oauth_messages`) by supplying an `access_token` when building the adapter.

### Get initial tokens (Codex / OpenAI OAuth)

```ruby
require "llm_gateway"

flow = LlmGateway::Clients::OpenAI::OAuthFlow.new

# 1) Start flow (generate auth URL + PKCE verifier + state)
start = flow.start
puts "Open in browser: #{start[:authorization_url]}"

# 2) After user auth, paste redirect URL (or raw code)
# Example: http://localhost:1455/auth/callback?code=...&state=...
print "Paste callback URL or code: "
input = STDIN.gets&.strip

# 3) Exchange for initial tokens
tokens = flow.exchange_code(input, start[:code_verifier], expected_state: start[:state])

puts tokens
# => {
#   access_token: "...",
#   refresh_token: "...",
#   expires_at: <Time>,
#   account_id: "..."
# }
```
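
`flow.start` is described as generating a PKCE verifier and state. For readers new to PKCE (RFC 7636), this stdlib-only sketch shows what the S256 method computes: a random URL-safe verifier, and a challenge equal to the unpadded base64url SHA-256 of that verifier. It illustrates the standard, not the gem's internals, and `pkce_pair` is a hypothetical helper name.

```ruby
require "securerandom"
require "digest"
require "base64"

# S256 challenge: BASE64URL(SHA256(verifier)) without "=" padding (RFC 7636).
def pkce_challenge(verifier)
  Base64.urlsafe_encode64(Digest::SHA256.digest(verifier), padding: false)
end

# Hypothetical helper returning a fresh verifier/challenge pair plus a state value.
def pkce_pair
  verifier = Base64.urlsafe_encode64(SecureRandom.random_bytes(32), padding: false)
  { code_verifier: verifier, code_challenge: pkce_challenge(verifier), state: SecureRandom.hex(16) }
end

pair = pkce_pair
puts pair[:code_verifier].length  # 43: 32 random bytes, base64url without padding
puts pair[:code_challenge].length # 43: a 32-byte SHA-256 digest, same encoding

# Test vector from RFC 7636, Appendix B
puts pkce_challenge("dBjftJeZ4CVP-mB92K27uhbUJU1p1r_wW1gFWFOEjXk")
# => E9Melhoa2OwvFrEMTJguCHaoeK1t8URWbuGJSstw-cM
```

Only the challenge goes into the authorization URL; the verifier stays local until the code exchange, which is why `exchange_code` receives `start[:code_verifier]`.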

### Get initial tokens (Anthropic OAuth)

```ruby
require "llm_gateway"

flow = LlmGateway::Clients::ClaudeCode::OAuthFlow.new

# 1) Start flow (auth URL + PKCE verifier + state)
start = flow.start
puts "Open in browser: #{start[:authorization_url]}"

# 2) After user auth, paste callback URL (or code)
# Example callback contains ?code=...&state=...
print "Paste callback URL or code: "
input = STDIN.gets&.strip

# 3) Exchange for initial tokens
tokens = flow.exchange_code(input, start[:code_verifier], state: start[:state])

puts tokens
# => {
#   access_token: "...",
#   refresh_token: "...",
#   expires_at: <Time>
# }
```
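
Both flows accept either the full callback URL or the bare code. If you collect that input yourself (for example from a web callback instead of `STDIN`), a sketch like the following extracts `code` and `state` with Ruby's `URI` module and rejects a mismatched `state`, which is the CSRF check `state` exists for. `parse_callback` is a hypothetical helper; `exchange_code` appears to do its own parsing, which may differ.

```ruby
require "uri"

# Hypothetical helper: accepts a callback URL or a bare authorization code.
def parse_callback(input, expected_state: nil)
  input = input.to_s.strip
  return { code: input, state: nil } unless input.include?("code=")

  params = URI.decode_www_form(URI.parse(input).query.to_s).to_h
  code = params["code"] or raise ArgumentError, "no code in callback URL"

  if expected_state && params["state"] != expected_state
    raise ArgumentError, "state mismatch: possible CSRF or stale login attempt"
  end

  { code: code, state: params["state"] }
end

p parse_callback("http://localhost:1455/auth/callback?code=abc123&state=xyz", expected_state: "xyz")
p parse_callback("abc123")
```

A bare code carries no `state`, so the mismatch check can only run when the full URL is provided.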

### Get a refresh token

The initial token exchange above returns a `refresh_token` (`tokens[:refresh_token]`). Store it securely; the token managers use it to obtain new access tokens.

### Exchange refresh token for access token

Use the built-in token managers. The `on_token_refresh` callback runs after each successful refresh and receives the new access token, refresh token, and expiry; persist these values.

OpenAI Codex OAuth:

```ruby
require "llm_gateway"

# stored_refresh_token, stored_access_token and stored_expires_at are
# placeholders for values loaded from your own storage.
manager = LlmGateway::Clients::OpenAI::TokenManager.new(
  refresh_token: stored_refresh_token,
  access_token: stored_access_token, # optional
  expires_at: stored_expires_at      # optional
)

manager.on_token_refresh = lambda do |new_access_token, new_refresh_token, new_expires_at|
  # Persist updated credentials in your DB/secrets store
end

current_access_token = manager.access_token
```

Anthropic OAuth:

```ruby
require "llm_gateway"

manager = LlmGateway::Clients::ClaudeCode::TokenManager.new(
  refresh_token: stored_refresh_token,
  access_token: stored_access_token, # optional
  expires_at: stored_expires_at,     # optional
  client_id: ENV.fetch("ANTHROPIC_CLIENT_ID"),
  client_secret: ENV["ANTHROPIC_CLIENT_SECRET"] # optional depending on app setup
)

manager.on_token_refresh = lambda do |new_access_token, new_refresh_token, new_expires_at|
  # Persist updated credentials
end

current_access_token = manager.access_token
```
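
To see how expiry detection and the refresh callback fit together, here is a stdlib-only sketch of that pattern. `SketchTokenManager` is hypothetical and is not the gem's `TokenManager`: it takes an injected `refresher` lambda in place of a real HTTP call, refreshes slightly before `expires_at` (the 60-second skew is an arbitrary choice), and invokes `on_token_refresh` with the same three values as the examples above.

```ruby
# Hypothetical token manager sketch; not llm_gateway's TokenManager.
class SketchTokenManager
  attr_accessor :on_token_refresh
  attr_reader :refresh_token, :expires_at

  def initialize(refresh_token:, refresher:, access_token: nil, expires_at: nil, skew: 60, clock: -> { Time.now })
    @refresh_token = refresh_token
    @access_token = access_token
    @expires_at = expires_at
    @refresher = refresher # callable: refresh_token -> { access_token:, refresh_token:, expires_in: }
    @skew = skew
    @clock = clock
  end

  # Returns a valid access token, refreshing first if it is missing or about to expire.
  def access_token
    refresh! if expired?
    @access_token
  end

  def expired?
    @access_token.nil? || @expires_at.nil? || @clock.call >= @expires_at - @skew
  end

  def refresh!
    result = @refresher.call(@refresh_token)
    @access_token = result.fetch(:access_token)
    @refresh_token = result.fetch(:refresh_token, @refresh_token) # some servers rotate it
    @expires_at = @clock.call + result.fetch(:expires_in)
    on_token_refresh&.call(@access_token, @refresh_token, @expires_at)
    @access_token
  end
end

calls = 0
refresher = lambda do |rt|
  calls += 1
  { access_token: "access-#{calls}", refresh_token: "#{rt}-rotated", expires_in: 3600 }
end

manager = SketchTokenManager.new(refresh_token: "refresh-0", refresher: refresher)
manager.on_token_refresh = ->(access, refresh, expires_at) { puts "persist #{access} / #{refresh} until #{expires_at}" }

manager.access_token # no access token yet, so this refreshes once
manager.access_token # still valid, so no second refresh
puts calls           # 1
```

Injecting the clock keeps expiry logic testable without waiting for real time to pass.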

### Pass access token in provider requests

Build the provider with the current access token:

```ruby
adapter = LlmGateway.build_provider(
  provider: "openai_codex",
  access_token: current_access_token,
  model_key: "gpt-5.4"
)

result = adapter.stream("Hello from OAuth auth")
puts result.content.select { |b| b.type == "text" }.map(&:text).join
```

If your app refreshes tokens in the background, rebuild the adapter (or recreate client state) with the newest `access_token` before subsequent calls.
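
One way to follow that advice is to build adapters lazily and rebuild only when the token changes. In this sketch `AdapterCache` is hypothetical; the `builder` lambda stands in for a call like `LlmGateway.build_provider(provider: "openai_codex", access_token: token, model_key: ...)`, and the `token_source` lambda stands in for something like `manager.access_token`. Plain hashes replace real adapters so the sketch runs offline.

```ruby
# Hypothetical helper: rebuilds an adapter only when the access token changes.
class AdapterCache
  def initialize(token_source:, builder:)
    @token_source = token_source # e.g. -> { manager.access_token }
    @builder = builder           # e.g. ->(token) { LlmGateway.build_provider(...) }
    @token = nil
    @adapter = nil
  end

  def adapter
    token = @token_source.call
    if token != @token
      @adapter = @builder.call(token)
      @token = token
    end
    @adapter
  end
end

current_token = "token-1"
builds = 0
cache = AdapterCache.new(
  token_source: -> { current_token },
  builder: ->(token) { builds += 1; { access_token: token } } # stand-in adapter
)

cache.adapter
cache.adapter          # same token: reuses the cached adapter
current_token = "token-2"
p cache.adapter        # token changed: rebuilt with the new token
puts builds            # 2
```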

### Token refresh responsibility

#### Library’s role (llm_gateway)

- Provides token manager helpers.
- Detects expiry from `expires_at`.
- Refreshes the access token when asked (`ensure_valid_token` / refresh methods).
- Returns updated token values and triggers the `on_token_refresh` callback after a successful refresh.
- Uses whatever access token you pass into provider requests.

#### User/app’s role

- Persist tokens securely (DB/secrets store).
- Store `access_token`, `refresh_token`, and `expires_at`, and pass them into the token manager.
- Implement `on_token_refresh` to save updated credentials.
- Decide the refresh/retry policy at the app level (e.g., retry a failed request after a refresh when appropriate).
- Rebuild client/provider state with latest access token for future calls.
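
For the retry policy, a common app-level choice is to retry a request once after forcing a refresh when the provider rejects the token. The sketch is stdlib-only: `AuthExpiredError` is a hypothetical error class (the gem ships an `errors.rb`, but its class names are not shown here), and the lambdas stand in for your token manager's refresh and an adapter call.

```ruby
# Hypothetical error standing in for whatever your provider client raises on a 401.
class AuthExpiredError < StandardError; end

# Runs the request; on an auth error, refreshes once and retries once.
def with_token_retry(refresh:)
  attempts = 0
  begin
    attempts += 1
    yield
  rescue AuthExpiredError
    raise if attempts > 1 # already retried: surface the error
    refresh.call
    retry
  end
end

token = "stale"
refreshes = 0
refresh = -> { refreshes += 1; token = "fresh" }

result = with_token_retry(refresh: refresh) do
  raise AuthExpiredError, "401" if token == "stale"
  "ok with #{token}"
end

puts result    # ok with fresh
puts refreshes # 1
```

Limiting the policy to a single retry avoids hammering the token endpoint when credentials are truly revoked.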

In short: the library performs the refresh mechanics; your app owns token persistence and operational policy.