llm_gateway 0.2.0 → 0.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (74)
  1. checksums.yaml +4 -4
  2. data/CHANGELOG.md +42 -0
  3. data/README.md +565 -129
  4. data/Rakefile +8 -3
  5. data/docs/migration-guide.md +135 -0
  6. data/lib/llm_gateway/adapters/adapter.rb +173 -0
  7. data/lib/llm_gateway/adapters/anthropic/acts_like_messages.rb +23 -0
  8. data/lib/llm_gateway/adapters/anthropic/bidirectional_message_mapper.rb +111 -0
  9. data/lib/llm_gateway/adapters/{claude → anthropic}/input_mapper.rb +12 -10
  10. data/lib/llm_gateway/adapters/anthropic/messages_adapter.rb +19 -0
  11. data/lib/llm_gateway/adapters/anthropic/output_mapper.rb +50 -0
  12. data/lib/llm_gateway/adapters/anthropic/stream_mapper.rb +110 -0
  13. data/lib/llm_gateway/adapters/anthropic_option_mapper.rb +53 -0
  14. data/lib/llm_gateway/adapters/groq/chat_completions_adapter.rb +47 -0
  15. data/lib/llm_gateway/adapters/groq/option_mapper.rb +27 -0
  16. data/lib/llm_gateway/adapters/input_message_sanitizer.rb +93 -0
  17. data/lib/llm_gateway/adapters/openai/acts_like_chat_completions.rb +22 -0
  18. data/lib/llm_gateway/adapters/openai/acts_like_responses.rb +31 -0
  19. data/lib/llm_gateway/adapters/openai/chat_completions/bidirectional_message_mapper.rb +110 -0
  20. data/lib/llm_gateway/adapters/openai/chat_completions/input_mapper.rb +105 -0
  21. data/lib/llm_gateway/adapters/openai/chat_completions/input_message_sanitizer.rb +65 -0
  22. data/lib/llm_gateway/adapters/openai/chat_completions/option_mapper.rb +39 -0
  23. data/lib/llm_gateway/adapters/openai/chat_completions/output_mapper.rb +40 -0
  24. data/lib/llm_gateway/adapters/openai/chat_completions/stream_mapper.rb +242 -0
  25. data/lib/llm_gateway/adapters/openai/chat_completions_adapter.rb +20 -0
  26. data/lib/llm_gateway/adapters/openai/file_output_mapper.rb +25 -0
  27. data/lib/llm_gateway/adapters/openai/prompt_cache_option_mapper.rb +39 -0
  28. data/lib/llm_gateway/adapters/openai/responses/bidirectional_message_mapper.rb +120 -0
  29. data/lib/llm_gateway/adapters/openai/responses/input_mapper.rb +106 -0
  30. data/lib/llm_gateway/adapters/openai/responses/option_mapper.rb +41 -0
  31. data/lib/llm_gateway/adapters/openai/responses/output_mapper.rb +47 -0
  32. data/lib/llm_gateway/adapters/openai/responses/stream_mapper.rb +340 -0
  33. data/lib/llm_gateway/adapters/openai/responses_adapter.rb +20 -0
  34. data/lib/llm_gateway/adapters/openai_codex/input_mapper.rb +206 -0
  35. data/lib/llm_gateway/adapters/openai_codex/option_mapper.rb +28 -0
  36. data/lib/llm_gateway/adapters/openai_codex/responses_adapter.rb +38 -0
  37. data/lib/llm_gateway/adapters/{open_ai/output_mapper.rb → option_mapper.rb} +5 -2
  38. data/lib/llm_gateway/adapters/stream_accumulator.rb +91 -0
  39. data/lib/llm_gateway/adapters/structs.rb +145 -0
  40. data/lib/llm_gateway/base_client.rb +97 -1
  41. data/lib/llm_gateway/client.rb +66 -54
  42. data/lib/llm_gateway/clients/anthropic.rb +167 -0
  43. data/lib/llm_gateway/clients/claude_code/oauth_flow.rb +162 -0
  44. data/lib/llm_gateway/clients/claude_code/token_manager.rb +112 -0
  45. data/lib/llm_gateway/clients/groq.rb +54 -0
  46. data/lib/llm_gateway/clients/openai.rb +208 -0
  47. data/lib/llm_gateway/clients/openai_codex/oauth_flow.rb +258 -0
  48. data/lib/llm_gateway/clients/openai_codex/token_manager.rb +71 -0
  49. data/lib/llm_gateway/errors.rb +23 -0
  50. data/lib/llm_gateway/prompt.rb +12 -1
  51. data/lib/llm_gateway/provider_registry.rb +37 -0
  52. data/lib/llm_gateway/version.rb +1 -1
  53. data/lib/llm_gateway.rb +169 -10
  54. data/scripts/create_anthropic_credentials.rb +106 -0
  55. data/scripts/create_openai_codex_credentials.rb +116 -0
  56. data/scripts/generate_handoff_live_fixture.rb +169 -0
  57. data/scripts/generate_handoff_media_fixture.rb +167 -0
  58. metadata +64 -21
  59. data/lib/llm_gateway/adapters/claude/client.rb +0 -56
  60. data/lib/llm_gateway/adapters/claude/output_mapper.rb +0 -30
  61. data/lib/llm_gateway/adapters/groq/client.rb +0 -58
  62. data/lib/llm_gateway/adapters/groq/input_mapper.rb +0 -105
  63. data/lib/llm_gateway/adapters/groq/output_mapper.rb +0 -62
  64. data/lib/llm_gateway/adapters/open_ai/client.rb +0 -59
  65. data/lib/llm_gateway/adapters/open_ai/input_mapper.rb +0 -63
  66. data/sample/claude_code_clone/agent.rb +0 -65
  67. data/sample/claude_code_clone/claude_code_clone.rb +0 -40
  68. data/sample/claude_code_clone/prompt.rb +0 -79
  69. data/sample/claude_code_clone/run.rb +0 -47
  70. data/sample/claude_code_clone/tools/bash_tool.rb +0 -54
  71. data/sample/claude_code_clone/tools/edit_tool.rb +0 -61
  72. data/sample/claude_code_clone/tools/grep_tool.rb +0 -113
  73. data/sample/claude_code_clone/tools/read_tool.rb +0 -61
  74. data/sample/claude_code_clone/tools/todowrite_tool.rb +0 -98
data/README.md CHANGED
@@ -1,194 +1,630 @@
1
- # LlmGateway
1
+ # llm_gateway
2
+
3
+ Provides a unified translation interface for LLM provider APIs while giving developers as much control as possible. This makes the library more complicated, because we don't want developers to be blocked from using something that the provider supports. As time progresses, the library will mature and support more responses.
4
+
5
+ ## Table of Contents
6
+
7
+ - [Principles:](#principles)
8
+ - [Installation](#installation)
9
+ - [Supported Providers](#supported-providers)
10
+ - [Quick Start: Streaming (all events)](#quick-start-streaming-all-events)
11
+ - [Stream API without handling events (final result only)](#stream-api-without-handling-events-final-result-only)
12
+ - [Migration guides](#migration-guides)
13
+ - [Tools](#tools)
14
+ - [Defining Tools](#defining-tools)
15
+ - [Handling Tool Calls](#handling-tool-calls)
16
+ - [Image Input](#image-input)
17
+ - [Thinking / Reasoning](#thinking--reasoning)
18
+ - [Streaming Thinking Content](#streaming-thinking-content)
19
+ - [How reasoning values are mapped](#how-reasoning-values-are-mapped)
20
+ - [Cross-Provider Handoffs](#cross-provider-handoffs)
21
+ - [Context Serialization](#context-serialization)
22
+ - [OAuth](#oauth)
23
+ - [Get initial tokens (Codex / OpenAI OAuth)](#get-initial-tokens-codex--openai-oauth)
24
+ - [Get initial tokens (Anthropic OAuth)](#get-initial-tokens-anthropic-oauth)
25
+ - [Get a refresh token](#get-a-refresh-token)
26
+ - [Exchange refresh token for access token](#exchange-refresh-token-for-access-token)
27
+ - [Pass access token in provider requests](#pass-access-token-in-provider-requests)
28
+ - [Token refresh responsibility](#token-refresh-responsibility)
29
+ - [Library’s role (llm_gateway)](#librarys-role-llm_gateway)
30
+ - [User/app’s role](#userapps-role)
31
+
32
+ ## Principles:
33
+ 1. Transcript integrity is the most important
34
+ 2. Input messages must have bidirectional integrity
35
+ 3. Allow developers as much control as possible
2
36
 
3
- Provide nuts and bolts for LLM APIs. The goal is to provide a unified interface for multiple LLM provider API's; And Enable developers to have as much control as they want.
37
+ ## Installation
38
+
39
+ ```bash
40
+ gem install llm_gateway
41
+ ```
4
42
 
5
- You can use the clients directly, Or you can use the gateway to have interop between clients.
43
+ Or add it to your `Gemfile`:
44
+
45
+ ```ruby
46
+ gem "llm_gateway"
47
+ ```
6
48
 
7
49
  ## Supported Providers
8
- Anthropic, OpenAi, Groq
9
50
 
51
+ | Provider | Provider Key | Auth | API Surface |
52
+ |-----------|------------------------------|-------|------------------------|
53
+ | Anthropic | `anthropic_messages` | API key | Messages |
54
+ | OpenAI | `openai_completions` | API key | Chat Completions |
55
+ | OpenAI | `openai_responses` | API key | Responses |
56
+ | OpenAI Codex | `openai_codex` | OAuth | Responses |
57
+ | Groq | `groq_completions` | API key | Chat Completions |
10
58
 
11
- ## Installation
59
+ Legacy keys (`*_apikey_*`, `*_oauth_*`) are still supported for backward compatibility.
12
60
 
13
- Add the gem to your application's Gemfile:
61
+ ## Quick Start: Streaming (all events)
14
62
 
15
- ```bash
16
- bundle add llm_gateway
17
- ```
63
+ ```ruby
64
+ require "llm_gateway"
65
+ require "json"
66
+
67
+ # Build a provider adapter directly (not via prebuilt config)
68
+ adapter = LlmGateway.build_provider(
69
+ provider: "openai_responses", # or anthropic_messages, groq_completions, ...
70
+ api_key: ENV.fetch("OPENAI_API_KEY"),
71
+ model_key: "gpt-5.4"
72
+ )
18
73
 
19
- Or install it yourself:
74
+ tools = [
75
+ {
76
+ name: "get_time",
77
+ description: "Get the current time",
78
+ input_schema: {
79
+ type: "object",
80
+ properties: {
81
+ timezone: { type: "string", description: "Optional timezone, e.g. America/New_York" }
82
+ }
83
+ }
84
+ }
85
+ ]
86
+
87
+ transcript = [
88
+ { role: "user", content: "What time is it? Think briefly, then call get_time." }
89
+ ]
90
+
91
+ streamed_tool_args = Hash.new { |h, k| h[k] = +"" }
92
+
93
+ response = adapter.stream(transcript, tools: tools, reasoning: "high") do |event|
94
+ case event.type
95
+ # AssistantStreamMessageEvent
96
+ when :message_start
97
+ puts "\n[message_start] #{event.delta.inspect}"
98
+ when :message_delta
99
+ puts "\n[message_delta] #{event.delta.inspect} usage+=#{event.usage_increment.inspect}"
100
+ when :message_end
101
+ puts "\n[message_end]"
102
+
103
+ # Text events
104
+ when :text_start
105
+ puts "\n[text_start] index=#{event.content_index}"
106
+ print event.delta unless event.delta.empty?
107
+ when :text_delta
108
+ print event.delta
109
+ when :text_end
110
+ puts "\n[text_end] index=#{event.content_index}"
111
+
112
+ # Tool-call events
113
+ when :tool_start
114
+ puts "\n[tool_start] id=#{event.id} name=#{event.name} index=#{event.content_index}"
115
+ when :tool_delta
116
+ streamed_tool_args[event.content_index] << event.delta
117
+ print event.delta
118
+ when :tool_end
119
+ puts "\n[tool_end] index=#{event.content_index}"
120
+ begin
121
+ puts "tool args: #{JSON.parse(streamed_tool_args[event.content_index])}"
122
+ rescue JSON::ParserError
123
+ puts "tool args (partial/raw): #{streamed_tool_args[event.content_index]}"
124
+ end
125
+
126
+ # Reasoning events
127
+ when :reasoning_start
128
+ puts "\n[reasoning_start] sig=#{event.respond_to?(:signature) ? event.signature : ""}"
129
+ print event.delta
130
+ when :reasoning_delta
131
+ print event.delta
132
+ when :reasoning_end
133
+ puts "\n[reasoning_end]"
134
+
135
+ end
136
+ end
20
137
 
21
- ```bash
22
- gem install llm_gateway
138
+ # Final AssistantMessage (assembled from the stream)
139
+ puts "\n\n=== Final assistant message ==="
140
+ puts "id: #{response.id}"
141
+ puts "model: #{response.model}"
142
+ puts "provider/api: #{response.provider}/#{response.api}"
143
+ puts "role: #{response.role}"
144
+ puts "stop_reason: #{response.stop_reason}"
145
+ puts "error_message: #{response.error_message.inspect}" if response.error_message
146
+ puts "usage: #{response.usage.inspect}"
147
+
148
+ response.content.each do |block|
149
+ case block.type
150
+ when "text"
151
+ puts "text: #{block.text}"
152
+ when "reasoning"
153
+ puts "reasoning: #{block.reasoning}"
154
+ puts "signature: #{block.signature}" if block.respond_to?(:signature) && block.signature
155
+ when "tool_use"
156
+ puts "tool_use: #{block.name}(#{block.input.inspect}) id=#{block.id}"
157
+ end
158
+ end
23
159
  ```
24
160
 
25
- ## Usage
161
+ Stream callback event families:
162
+ - `AssistantStreamMessageEvent`: `:message_start`, `:message_delta`, `:message_end`
163
+ - `AssistantStreamEvent` (and subclasses):
164
+ - Text: `:text_start`, `:text_delta`, `:text_end`
165
+ - Tool call: `:tool_start`, `:tool_delta`, `:tool_end`
166
+ - Reasoning: `:reasoning_start`, `:reasoning_delta`, `:reasoning_end`
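The tool-call event family can be exercised without a network call. The following is a minimal sketch, assuming a hypothetical `FakeEvent` struct that stands in for the library's stream event objects (only `type`, `content_index`, and `delta` are modeled). The interleaved ordering is contrived to show why arguments are buffered per `content_index` and parsed only at `:tool_end`; real providers may not interleave tool calls this way.

```ruby
require "json"

# Hypothetical stand-in for the library's stream events (not part of llm_gateway).
FakeEvent = Struct.new(:type, :content_index, :delta, keyword_init: true)

# Simulated event sequence: two tool calls whose JSON arguments arrive in fragments.
events = [
  FakeEvent.new(type: :tool_start, content_index: 0, delta: ""),
  FakeEvent.new(type: :tool_delta, content_index: 0, delta: '{"timezone":'),
  FakeEvent.new(type: :tool_start, content_index: 1, delta: ""),
  FakeEvent.new(type: :tool_delta, content_index: 1, delta: "{}"),
  FakeEvent.new(type: :tool_delta, content_index: 0, delta: '"UTC"}'),
  FakeEvent.new(type: :tool_end, content_index: 0, delta: ""),
  FakeEvent.new(type: :tool_end, content_index: 1, delta: "")
]

buffers = Hash.new { |h, k| h[k] = +"" } # one mutable buffer per content block
parsed_args = {}

events.each do |event|
  case event.type
  when :tool_delta
    buffers[event.content_index] << event.delta
  when :tool_end
    # The fragments are only expected to form valid JSON once the block has ended.
    parsed_args[event.content_index] = JSON.parse(buffers[event.content_index])
  end
end

p parsed_args
```

Keying the buffers by `content_index` rather than using a single string keeps fragments from different tool calls apart.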
167
+
168
+ ### Stream API without handling events (final result only)
26
169
 
27
- ### Basic Chat
170
+ If you only care about the final `AssistantMessage`, call `stream` without a block:
28
171
 
29
172
  ```ruby
30
- require 'llm_gateway'
173
+ require "llm_gateway"
31
174
 
32
- # Simple text completion
33
- LlmGateway::Client.chat(
34
- 'claude-sonnet-4-20250514',
35
- 'What is the capital of France?'
175
+ adapter = LlmGateway.build_provider(
176
+ provider: "openai_apikey_responses",
177
+ api_key: ENV.fetch("OPENAI_API_KEY"),
178
+ model_key: "gpt-5.4"
36
179
  )
37
180
 
38
- # With system message
39
- LlmGateway::Client.chat(
40
- 'gpt-4',
41
- 'What is the capital of France?',
42
- system: 'You are a helpful geography teacher.'
43
- )
181
+ result = adapter.stream("Write one short sentence about Ruby.")
44
182
 
45
- # With inline file
46
- LlmGateway::Client.chat(
47
- "claude-sonnet-4-20250514",
48
- [
49
- {
50
- role: "user", content: [
51
- { type: "text", text: "return the content of the document exactly" },
52
- { type: "file", data: "abc\n", media_type: "text/plain", name: "small.txt" }
53
- ]
54
- },
55
- ]
56
- )
183
+ puts result.role # "assistant"
184
+ puts result.stop_reason # "stop" (usually)
185
+ puts result.usage.inspect
57
186
 
58
- # Transcript
59
- LlmGateway::Client.chat('llama-3.3-70b-versatile',[
60
- { role: "user", content: "Tell Me a joke" },
61
- { role: "assistant", content: "what kind of content"},
62
- { role: "user", content: "About Sparkling water" },
63
- ]
64
- )
187
+ text = result.content
188
+ .select { |block| block.type == "text" }
189
+ .map(&:text)
190
+ .join
191
+
192
+ puts text
193
+ ```
194
+
195
+ ## Migration guides
65
196
 
197
+ - [Migrating from `chat` to `stream`](docs/chat-to-stream-migration.md) — use `stream` without a block when you only need the final response.
66
198
 
67
- # Tool usage
68
- LlmGateway::Client.chat('gpt-5',[
69
- { role: "user", content: "What's the weather in Singapore? reply in 10 words and no special characters" },
70
- { role: "assistant",
71
- content: [
72
- { id: "call_gpXfy9l9QNmShNEbNI1FyuUZ", type: "tool_use", name: "get_weather", input: { location: "Singapore" } }
73
- ]
199
+ ## Tools
200
+
201
+ ### Defining Tools
202
+
203
+ ```ruby
204
+ weather_tool = {
205
+ name: "get_weather",
206
+ description: "Get current weather for a location",
207
+ input_schema: {
208
+ type: "object",
209
+ properties: {
210
+ location: { type: "string", description: "City name or coordinates" },
211
+ units: {
212
+ type: "string",
213
+ enum: ["celsius", "fahrenheit"],
214
+ default: "celsius"
215
+ }
74
216
  },
75
- { role: "developer",
76
- content: [
77
- { content: "-15 celcius", type: "tool_result", tool_use_id: "call_gpXfy9l9QNmShNEbNI1FyuUZ" }
78
- ]
79
- }
80
- ],
81
- tools: [ { name: "get_weather", description: "Get current weather for a location", input_schema: { type: "object", properties: { location: { type: "string", description: "City name" } }, required: [ "location" ] } } ]
82
- )
217
+ required: ["location"]
218
+ }
219
+ }
83
220
  ```
84
221
 
85
- ### Supported Roles
222
+ ### Handling Tool Calls
86
223
 
87
- - user
88
- - developer
89
- - assistant
224
+ Use `stream` without a block, inspect returned `tool_use` blocks, execute tools, append `tool_result`, then continue:
90
225
 
91
- #### Examples
92
226
  ```ruby
93
- # tool call
94
- { role: "developer",
95
- content: [
96
- { content: "-15 celcius", type: "tool_result", tool_use_id: "call_gpXfy9l9QNmShNEbNI1FyuUZ" }
97
- ]
227
+ require "llm_gateway"
228
+ require "json"
229
+
230
+ adapter = LlmGateway.build_provider(
231
+ provider: "openai_apikey_responses",
232
+ api_key: ENV.fetch("OPENAI_API_KEY"),
233
+ model_key: "gpt-5.4"
234
+ )
235
+
236
+ weather_tool = {
237
+ name: "get_weather",
238
+ description: "Get current weather for a location",
239
+ input_schema: {
240
+ type: "object",
241
+ properties: {
242
+ location: { type: "string" },
243
+ units: { type: "string", enum: ["celsius", "fahrenheit"], default: "celsius" }
244
+ },
245
+ required: ["location"]
246
+ }
98
247
  }
99
- # plain message
100
- { role: "user", content: "What's the weather in Singapore? reply in 10 words and no special characters" }
101
248
 
102
- # plain response
103
- { role: "assistant", content: "what kind of content"},
249
+ def execute_weather_api(args)
250
+ # Replace with real API call
251
+ {
252
+ location: args[:location] || args["location"],
253
+ units: args[:units] || args["units"] || "celsius",
254
+ temperature: 14,
255
+ condition: "Cloudy"
256
+ }
257
+ end
258
+
259
+ transcript = [
260
+ { role: "user", content: "What is the weather in London?" }
261
+ ]
262
+
263
+ # 1) First model pass (stream API, no event block)
264
+ response = adapter.stream(transcript, tools: [weather_tool])
265
+ transcript << response.to_h
104
266
 
105
- # tool call response
106
- { role: "assistant",
267
+ # 2) Execute tool calls returned by the model
268
+ response.content.each do |block|
269
+ next unless block.type == "tool_use"
270
+
271
+ tool_result = execute_weather_api(block.input)
272
+
273
+ transcript << {
274
+ role: "developer",
107
275
  content: [
108
- { id: "call_gpXfy9l9QNmShNEbNI1FyuUZ", type: "tool_use", name: "get_weather", input: { location: "Singapore" } }
276
+ {
277
+ type: "tool_result",
278
+ tool_use_id: block.id,
279
+ content: JSON.generate(tool_result)
280
+ }
109
281
  ]
110
- },
282
+ }
283
+ end
284
+
285
+ # 3) Continue the conversation after tool execution
286
+ if response.content.any? { |b| b.type == "tool_use" }
287
+ final_response = adapter.stream(transcript, tools: [weather_tool])
288
+
289
+ final_text = final_response.content
290
+ .select { |b| b.type == "text" }
291
+ .map(&:text)
292
+ .join
293
+
294
+ puts final_text
295
+ end
111
296
  ```
112
297
 
113
- developer is an open ai role, but i thought it was usefull for tracing if message sent from server or user so i added
114
- it to the list of roles, when it is not supported it will be mapped to user instead.
298
+ Notes:
299
+ - Tool calls are returned as `ToolCall` blocks with `type: "tool_use"`, `id`, `name`, and `input`.
300
+ - Tool results are sent back in the transcript as `{ type: "tool_result", tool_use_id:, content: }` blocks.
301
+ - For multimodal-capable models, `tool_result` content can include image blocks when supported by the provider/model.
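When more than one tool is registered, a small dispatch table keeps the tool loop generic. This is a minimal sketch, assuming a hypothetical `FakeToolCall` struct in place of the library's `ToolCall` blocks and made-up handler lambdas. Only the `tool_result` message shape is taken from the notes above.

```ruby
require "json"

# Hypothetical stand-in for a returned tool_use block (not the library's class).
FakeToolCall = Struct.new(:type, :id, :name, :input, keyword_init: true)

# Illustrative handlers keyed by tool name; replace with real implementations.
HANDLERS = {
  "get_weather" => ->(input) { { location: input["location"], temperature: 14 } },
  "get_time" => ->(_input) { { time: "12:00" } }
}.freeze

# Turn each tool_use block into a transcript entry carrying a tool_result block.
def tool_result_messages(blocks, handlers)
  blocks.select { |b| b.type == "tool_use" }.map do |call|
    handler = handlers[call.name]
    # Report unknown tools back to the model instead of raising.
    result = handler ? handler.call(call.input) : { error: "unknown tool: #{call.name}" }
    {
      role: "developer",
      content: [
        { type: "tool_result", tool_use_id: call.id, content: JSON.generate(result) }
      ]
    }
  end
end

blocks = [
  FakeToolCall.new(type: "tool_use", id: "call_1", name: "get_weather", input: { "location" => "London" }),
  FakeToolCall.new(type: "tool_use", id: "call_2", name: "missing_tool", input: {})
]

messages = tool_result_messages(blocks, HANDLERS)
puts JSON.pretty_generate(messages)
```

Returning an error payload for an unknown tool name is one possible policy; an application may prefer to raise instead.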
115
302
 
116
- you can assume developer and user to be interchangeable
303
+ ## Image Input
117
304
 
305
+ Send images by including an `image` content block in a user message.
118
306
 
307
+ ```ruby
308
+ require "llm_gateway"
309
+ require "base64"
119
310
 
311
+ adapter = LlmGateway.build_provider(
312
+ provider: "openai_apikey_responses",
313
+ api_key: ENV.fetch("OPENAI_API_KEY"),
314
+ model_key: "gpt-5.4"
315
+ )
120
316
 
121
- ### Sample Application
317
+ image_b64 = Base64.strict_encode64(File.binread("./chart.png"))
122
318
 
123
- See the [file search bot example](sample/claude_code_clone/) for a complete working application that demonstrates:
124
- - Creating reusable Prompt and Tool classes
125
- - Handling conversation transcripts with tool execution
126
- - Building an interactive terminal interface
319
+ message = [
320
+ {
321
+ role: "user",
322
+ content: [
323
+ { type: "text", text: "What do you see in this image?" },
324
+ { type: "image", data: image_b64, media_type: "image/png" }
325
+ ]
326
+ }
327
+ ]
127
328
 
128
- To run the sample:
329
+ result = adapter.stream(message) # stream API, no event block
129
330
 
130
- ```bash
131
- cd sample/claude_code_clone
132
- ruby run.rb
331
+ text = result.content
332
+ .select { |b| b.type == "text" }
333
+ .map(&:text)
334
+ .join
335
+
336
+ puts text
133
337
  ```
134
338
 
135
- The bot will prompt for your model and API key, then allow you to ask natural language questions about finding files and searching directories.
339
+ Tip: use a model/provider combination that supports vision input.
136
340
 
137
- ### Response Format
341
+ ## Thinking / Reasoning
138
342
 
139
- All providers return responses in a consistent format:
343
+ You can request higher-effort reasoning by passing `reasoning:` to `stream`.
140
344
 
141
345
  ```ruby
142
- {
143
- choices: [
144
- {
145
- content: [
146
- { type: 'text', text: 'The capital of France is Paris.' }
147
- ],
148
- finish_reason: 'end_turn',
149
- role: 'assistant'
150
- }
151
- ],
152
- usage: {
153
- input_tokens: 15,
154
- output_tokens: 8,
155
- total_tokens: 23
156
- },
157
- model: 'claude-sonnet-4-20250514',
158
- id: 'msg_abc123'
159
- }
346
+ require "llm_gateway"
347
+
348
+ adapter = LlmGateway.build_provider(
349
+ provider: "openai_apikey_responses",
350
+ api_key: ENV.fetch("OPENAI_API_KEY"),
351
+ model_key: "gpt-5.4"
352
+ )
353
+
354
+ result = adapter.stream(
355
+ "Think step by step and then compute 482 * 17.",
356
+ reasoning: "high"
357
+ )
358
+
359
+ puts "stop_reason: #{result.stop_reason}"
360
+ puts "usage: #{result.usage.inspect}" # may include reasoning_tokens depending on provider
361
+
362
+ result.content.each do |block|
363
+ case block.type
364
+ when "reasoning"
365
+ puts "[reasoning] #{block.reasoning}"
366
+ puts "[signature] #{block.signature}" if block.respond_to?(:signature) && block.signature
367
+ when "text"
368
+ puts "[text] #{block.text}"
369
+ end
370
+ end
160
371
  ```
161
372
 
162
- ### Error Handling
373
+ ### Streaming Thinking Content
163
374
 
164
- LlmGateway provides consistent error handling across all providers:
375
+ If you want incremental thinking/reasoning tokens as they arrive, pass a block to `stream` and handle reasoning events:
165
376
 
166
377
  ```ruby
167
- begin
168
- result = LlmGateway::Client.chat('invalid-model', 'Hello')
169
- rescue LlmGateway::Errors::UnsupportedModel => e
170
- puts "Unsupported model: #{e.message}"
171
- rescue LlmGateway::Errors::AuthenticationError => e
172
- puts "Authentication failed: #{e.message}"
173
- rescue LlmGateway::Errors::RateLimitError => e
174
- puts "Rate limit exceeded: #{e.message}"
378
+ reasoning_text = +""
379
+
380
+ result = adapter.stream("Solve 99 * 99 with brief reasoning.", reasoning: "high") do |event|
381
+ case event.type
382
+ when :reasoning_start
383
+ print "\n[thinking start]\n"
384
+ reasoning_text << event.delta
385
+ when :reasoning_delta
386
+ reasoning_text << event.delta
387
+ print event.delta
388
+ when :reasoning_end
389
+ print "\n[thinking end]\n"
390
+ end
175
391
  end
392
+
393
+ puts "\nCollected reasoning chars: #{reasoning_text.length}"
394
+ puts "Final stop_reason: #{result.stop_reason}"
176
395
  ```
177
396
 
178
- ## Development
397
+ ### How reasoning values are mapped
398
+
399
+ `llm_gateway` normalizes provider-specific reasoning/thinking output into shared structures:
400
+
401
+ - Stream events:
402
+ - `:reasoning_start/:reasoning_delta/:reasoning_end`
403
+ - Final content block:
404
+ - `ReasoningContent` with `type: "reasoning"`
405
+ - fields: `reasoning` and optional `signature`
406
+ - Usage accounting:
407
+ - normalized in `result.usage` when provided by the upstream API
408
+ - may include `:reasoning_tokens` plus standard token counters
409
+
410
+ In practice this means you can:
411
+ - listen to `:reasoning_*` stream event variants, and
412
+ - always read final reasoning text from `result.content` blocks where `block.type == "reasoning"`.
413
+
414
+ Notes:
415
+ - Reasoning output appears as `ReasoningContent` blocks with `type: "reasoning"`.
416
+ - Some providers/models expose explicit reasoning content; others may only reflect reasoning effort in usage fields.
417
+ - In streamed callbacks, reasoning events are emitted as `:reasoning_*` variants.
418
+
419
+ ## Cross-Provider Handoffs
420
+
421
+ Internally, `llm_gateway` handles handoffs by normalizing message history into a provider-agnostic shape, then remapping that shape to the target provider API on each request.
422
+
423
+ What happens under the hood on `stream`/`chat`:
179
424
 
180
- After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake test` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
425
+ 1. **Normalize input**
426
+ - String input is converted to a user message.
427
+ - `system` is normalized into system message objects.
428
+ - Prior assistant turns (including `response.to_h`) are treated as structured transcript entries.
429
+
430
+ 2. **Map into canonical gateway format**
431
+ - Provider-specific differences (content block names, tool-call shapes, reasoning/thinking variants) are unified into shared structs.
432
+
433
+ 3. **Sanitize for target provider/model**
434
+ - Before sending, messages are sanitized for the destination provider/API/model.
435
+ - Unsupported or provider-specific fields are adjusted/translated where possible.
436
+
437
+ 4. **Map to outbound provider payload**
438
+ - The adapter input mapper converts canonical messages/tools/options into the exact wire format expected by the selected provider endpoint.
439
+
440
+ 5. **Map response back to canonical output**
441
+ - Stream chunks are mapped into normalized stream events.
442
+ - Final output is accumulated into a normalized `AssistantMessage` (`id`, `model`, `usage`, `stop_reason`, `content`, etc.).
443
+
444
+ Why this matters:
445
+ - A transcript produced by one provider can be reused with another provider without manually rewriting message structure.
446
+ - Tool calls/reasoning/text are exposed through a consistent API even when upstream event formats differ.
447
+ - Your app can keep one conversation state format while switching providers for cost, latency, capability, or reliability reasons.
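Step 1 (normalize input) can be pictured with a small stand-alone sketch. `normalize_input` is a hypothetical helper written for illustration, and the returned hash shape is an assumption; it is not the library's implementation, which may differ in structure and naming.

```ruby
# Hypothetical helper illustrating step 1; llm_gateway's real normalizer may differ.
def normalize_input(input, system: nil)
  messages =
    case input
    when String then [{ role: "user", content: input }]           # bare string becomes a user message
    when Array then input.map { |m| m.transform_keys(&:to_sym) }  # accept String or Symbol keys
    else raise ArgumentError, "expected String or Array, got #{input.class}"
    end

  # Assumed shape for system messages; shown only to make the idea concrete.
  system_messages = system ? [{ role: "system", content: system }] : []
  { system: system_messages, messages: messages }
end

a = normalize_input("Hello", system: "Be brief.")
b = normalize_input([{ "role" => "user", "content" => "Hello" }])
p a
p b
```

Both calls produce the same `messages` array, which is the property that lets later mapping steps ignore how the caller originally supplied the input.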
448
+
449
+ ## Context Serialization
450
+
451
+ `llm_gateway` contexts are plain Ruby hashes/arrays, so they can be serialized to JSON and restored later.
452
+
453
+ ```ruby
454
+ require "llm_gateway"
455
+ require "json"
456
+
457
+ adapter = LlmGateway.build_provider(
458
+ provider: "openai_apikey_responses",
459
+ api_key: ENV.fetch("OPENAI_API_KEY"),
460
+ model_key: "gpt-5.4"
461
+ )
462
+
463
+ # Build context (transcript)
464
+ transcript = [
465
+ { role: "user", content: "Plan a 3-day trip to Tokyo." }
466
+ ]
467
+
468
+ # Run one turn and persist assistant output
469
+ first = adapter.stream(transcript)
470
+ transcript << first.to_h
471
+
472
+ # Serialize (store in DB/file/cache)
473
+ json_context = JSON.generate(transcript)
474
+
475
+ # ...later / elsewhere...
476
+ restored_transcript = JSON.parse(json_context)
477
+
478
+ # Continue conversation from restored context
479
+ restored_transcript << { role: "user", content: "Now make it budget-friendly." }
480
+ second = adapter.stream(restored_transcript)
481
+
482
+ puts second.content.select { |b| b.type == "text" }.map(&:text).join
483
+ ```
484
+
485
+ What to persist:
486
+ - full transcript array (including assistant messages from `response.to_h`)
487
+ - any tool result messages you appended
488
+ - optional app metadata (user id, conversation id, timestamps) alongside the transcript
489
+
490
+ Tip: if you serialize to JSON, keys become strings on parse; `llm_gateway` accepts standard hash input and normalizes internally.
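The key change on parse can be seen with Ruby's standard `json` library alone. This sketch uses a made-up one-message transcript and no `llm_gateway` calls; `symbolize_names: true` is a standard `JSON.parse` option for restoring symbol keys if your own code depends on them.

```ruby
require "json"

transcript = [{ role: "user", content: [{ type: "text", text: "Hi" }] }]
json = JSON.generate(transcript)

string_keyed = JSON.parse(json)                        # default: String keys
symbol_keyed = JSON.parse(json, symbolize_names: true) # opt in to Symbol keys

p string_keyed.first.keys
p symbol_keyed.first.keys
p symbol_keyed == transcript
```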
491
+
492
+ ## OAuth
493
+
494
+ Use OAuth-capable providers (for example `openai_codex` and `anthropic_oauth_messages`) by supplying an `access_token` when building the adapter.
495
+
496
+ ### Get initial tokens (Codex / OpenAI OAuth)
497
+
498
+ ```ruby
499
+ require "llm_gateway"
500
+
501
+ flow = LlmGateway::Clients::OpenAI::OAuthFlow.new
502
+
503
+ # 1) Start flow (generate auth URL + PKCE verifier + state)
504
+ start = flow.start
505
+ puts "Open in browser: #{start[:authorization_url]}"
506
+
507
+ # 2) After user auth, paste redirect URL (or raw code)
508
+ # Example: http://localhost:1455/auth/callback?code=...&state=...
509
+ print "Paste callback URL or code: "
510
+ input = STDIN.gets&.strip
511
+
512
+ # 3) Exchange for initial tokens
513
+ tokens = flow.exchange_code(input, start[:code_verifier], expected_state: start[:state])
514
+
515
+ puts tokens
516
+ # => {
517
+ # access_token: "...",
518
+ # refresh_token: "...",
519
+ # expires_at: <Time>,
520
+ # account_id: "..."
521
+ # }
522
+ ```
523
+
524
+ ### Get initial tokens (Anthropic OAuth)
525
+
526
+ ```ruby
527
+ require "llm_gateway"
528
+
529
+ flow = LlmGateway::Clients::ClaudeCode::OAuthFlow.new
530
+
531
+ # 1) Start flow (auth URL + PKCE verifier + state)
532
+ start = flow.start
533
+ puts "Open in browser: #{start[:authorization_url]}"
534
+
535
+ # 2) After user auth, paste callback URL (or code)
536
+ # Example callback contains ?code=...&state=...
537
+ print "Paste callback URL or code: "
538
+ input = STDIN.gets&.strip
539
+
540
+ # 3) Exchange for initial tokens
541
+ tokens = flow.exchange_code(input, start[:code_verifier], state: start[:state])
542
+
543
+ puts tokens
544
+ # => {
545
+ # access_token: "...",
546
+ # refresh_token: "...",
547
+ # expires_at: <Time>
548
+ # }
549
+ ```
550
+
551
+ ### Get a refresh token
552
+
553
+ ### Exchange refresh token for access token
554
+
555
+ Use the built-in token managers in this repo. The `on_token_refresh` callback is called whenever the tokens are refreshed; use it to persist the updated credentials.
556
+
557
+ OpenAI Codex OAuth:
558
+
559
+ ```ruby
560
+ require "llm_gateway"
561
+
562
+ manager = LlmGateway::Clients::OpenAI::TokenManager.new(
563
+ refresh_token: stored_refresh_token,
564
+ access_token: stored_access_token, # optional
565
+ expires_at: stored_expires_at # optional
566
+ )
567
+
568
+ manager.on_token_refresh = lambda do |new_access_token, new_refresh_token, new_expires_at|
569
+ # Persist updated credentials in your DB/secrets store
570
+ end
571
+
572
+ current_access_token = manager.access_token
573
+ ```
574
+
575
+ Anthropic OAuth:
576
+
577
+ ```ruby
578
+ require "llm_gateway"
579
+
580
+ manager = LlmGateway::Clients::ClaudeCode::TokenManager.new(
581
+ refresh_token: stored_refresh_token,
582
+ access_token: stored_access_token, # optional
583
+ expires_at: stored_expires_at, # optional
584
+ client_id: ENV.fetch("ANTHROPIC_CLIENT_ID"),
585
+ client_secret: ENV["ANTHROPIC_CLIENT_SECRET"] # optional depending on app setup
586
+ )
587
+
588
+ manager.on_token_refresh = lambda do |new_access_token, new_refresh_token, new_expires_at|
589
+ # Persist updated credentials
590
+ end
591
+
592
+ current_access_token = manager.access_token
593
+ ```
594
+
595
+ ### Pass access token in provider requests
596
+
597
+ Build the provider with the current access token:
598
+
599
+ ```ruby
600
+ adapter = LlmGateway.build_provider(
601
+ provider: "openai_codex",
602
+ access_token: current_access_token,
603
+ model_key: "gpt-5.4"
604
+ )
605
+
606
+ result = adapter.stream("Hello from OAuth auth")
607
+ puts result.content.select { |b| b.type == "text" }.map(&:text).join
608
+ ```
181
609
 
182
- To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and the created tag, and push the `.gem` file to [rubygems.org](https://rubygems.org).
610
+ If your app refreshes tokens in the background, rebuild the adapter (or recreate client state) with the newest `access_token` before subsequent calls.
183
611
 
184
- ## Contributing
612
+ ### Token refresh responsibility
185
613
 
186
- Bug reports and pull requests are welcome on GitHub at https://github.com/Hyper-Unearthing/llm_gateway. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [code of conduct](https://github.com/Hyper-Unearthing/llm_gateway/blob/master/CODE_OF_CONDUCT.md).
614
+ #### Library’s role (llm_gateway)
187
615
 
188
- ## License
616
+ - Provides token manager helpers.
617
+ - Detects expiry from `expires_at`.
618
+ - Refreshes access token when asked (`ensure_valid_token` / refresh methods).
619
+ - Returns updated token values and triggers the `on_token_refresh` callback after a successful refresh.
620
+ - Uses whatever access token you pass into provider requests.
189
621
 
190
- The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
622
+ #### User/app’s role
191
623
 
192
- ## Code of Conduct
624
+ - Persist tokens securely (DB/secrets store).
625
+ - Store and pass `access_token`, `refresh_token`, and `expires_at` into the token manager.
626
+ - Implement `on_token_refresh` to save updated credentials.
627
+ - Decide refresh/retry policy at app level (e.g., retry failed request after refresh when appropriate).
628
+ - Rebuild client/provider state with latest access token for future calls.
193
629
 
194
- Everyone interacting in the LlmGateway project's codebases, issue trackers, chat rooms and mailing lists is expected to follow the [code of conduct](https://github.com/Hyper-Unearthing/llm_gateway/blob/master/CODE_OF_CONDUCT.md).
630
+ In short: the library executes the refresh mechanics; your app owns token persistence and operational policy.
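The app-side half of that split might look like the following. This is a minimal sketch under stated assumptions: `token_expired?`, `MemoryTokenStore`, and the 60-second skew are all hypothetical names and values chosen for illustration, and nothing here calls `llm_gateway`. The callback takes the same three arguments as the `on_token_refresh` lambdas shown earlier.

```ruby
# Hypothetical app-side helpers; names and the 60-second skew are illustrative.
REFRESH_SKEW_SECONDS = 60

# Treat a token as expired slightly early so in-flight requests do not race expiry.
def token_expired?(expires_at, now: Time.now, skew: REFRESH_SKEW_SECONDS)
  expires_at.nil? || now >= expires_at - skew
end

# In-memory stand-in for a DB or secrets store.
class MemoryTokenStore
  attr_reader :saved

  def initialize
    @saved = nil
  end

  def save(access_token, refresh_token, expires_at)
    @saved = { access_token: access_token, refresh_token: refresh_token, expires_at: expires_at }
  end
end

store = MemoryTokenStore.new
# Callback with the same three-argument shape as on_token_refresh.
persist = ->(access, refresh, expires_at) { store.save(access, refresh, expires_at) }

now = Time.at(1_700_000_000)
persist.call("new-access", "new-refresh", now + 3600)

puts token_expired?(store.saved[:expires_at], now: now)        # well before expiry
puts token_expired?(store.saved[:expires_at], now: now + 3590) # inside the skew window
```

A real store would write to encrypted storage, and the skew should be tuned to how long your requests typically run.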