llm_gateway 0.3.0 → 0.4.0

Files changed (74)
  1. checksums.yaml +4 -4
  2. data/CHANGELOG.md +26 -0
  3. data/README.md +544 -186
  4. data/Rakefile +1 -2
  5. data/docs/migration-guide.md +135 -0
  6. data/lib/llm_gateway/adapters/adapter.rb +173 -0
  7. data/lib/llm_gateway/adapters/anthropic/acts_like_messages.rb +23 -0
  8. data/lib/llm_gateway/adapters/{claude → anthropic}/bidirectional_message_mapper.rb +31 -3
  9. data/lib/llm_gateway/adapters/{claude → anthropic}/input_mapper.rb +4 -3
  10. data/lib/llm_gateway/adapters/anthropic/messages_adapter.rb +19 -0
  11. data/lib/llm_gateway/adapters/{claude → anthropic}/output_mapper.rb +1 -1
  12. data/lib/llm_gateway/adapters/anthropic/stream_mapper.rb +110 -0
  13. data/lib/llm_gateway/adapters/anthropic_option_mapper.rb +53 -0
  14. data/lib/llm_gateway/adapters/groq/chat_completions_adapter.rb +47 -0
  15. data/lib/llm_gateway/adapters/groq/option_mapper.rb +27 -0
  16. data/lib/llm_gateway/adapters/input_message_sanitizer.rb +93 -0
  17. data/lib/llm_gateway/adapters/openai/acts_like_chat_completions.rb +22 -0
  18. data/lib/llm_gateway/adapters/openai/acts_like_responses.rb +31 -0
  19. data/lib/llm_gateway/adapters/{open_ai → openai}/chat_completions/bidirectional_message_mapper.rb +9 -2
  20. data/lib/llm_gateway/adapters/{open_ai → openai}/chat_completions/input_mapper.rb +1 -6
  21. data/lib/llm_gateway/adapters/openai/chat_completions/input_message_sanitizer.rb +65 -0
  22. data/lib/llm_gateway/adapters/openai/chat_completions/option_mapper.rb +39 -0
  23. data/lib/llm_gateway/adapters/{open_ai → openai}/chat_completions/output_mapper.rb +1 -1
  24. data/lib/llm_gateway/adapters/openai/chat_completions/stream_mapper.rb +242 -0
  25. data/lib/llm_gateway/adapters/openai/chat_completions_adapter.rb +20 -0
  26. data/lib/llm_gateway/adapters/{open_ai → openai}/file_output_mapper.rb +1 -1
  27. data/lib/llm_gateway/adapters/openai/prompt_cache_option_mapper.rb +39 -0
  28. data/lib/llm_gateway/adapters/{open_ai → openai}/responses/bidirectional_message_mapper.rb +52 -4
  29. data/lib/llm_gateway/adapters/openai/responses/input_mapper.rb +106 -0
  30. data/lib/llm_gateway/adapters/openai/responses/option_mapper.rb +41 -0
  31. data/lib/llm_gateway/adapters/{open_ai → openai}/responses/output_mapper.rb +1 -1
  32. data/lib/llm_gateway/adapters/openai/responses/stream_mapper.rb +340 -0
  33. data/lib/llm_gateway/adapters/openai/responses_adapter.rb +20 -0
  34. data/lib/llm_gateway/adapters/openai_codex/input_mapper.rb +206 -0
  35. data/lib/llm_gateway/adapters/openai_codex/option_mapper.rb +28 -0
  36. data/lib/llm_gateway/adapters/openai_codex/responses_adapter.rb +38 -0
  37. data/lib/llm_gateway/adapters/option_mapper.rb +13 -0
  38. data/lib/llm_gateway/adapters/stream_accumulator.rb +91 -0
  39. data/lib/llm_gateway/adapters/structs.rb +145 -0
  40. data/lib/llm_gateway/base_client.rb +62 -1
  41. data/lib/llm_gateway/client.rb +45 -129
  42. data/lib/llm_gateway/clients/anthropic.rb +167 -0
  43. data/lib/llm_gateway/clients/claude_code/oauth_flow.rb +162 -0
  44. data/lib/llm_gateway/clients/claude_code/token_manager.rb +112 -0
  45. data/lib/llm_gateway/clients/groq.rb +54 -0
  46. data/lib/llm_gateway/clients/openai.rb +208 -0
  47. data/lib/llm_gateway/clients/openai_codex/oauth_flow.rb +258 -0
  48. data/lib/llm_gateway/clients/openai_codex/token_manager.rb +71 -0
  49. data/lib/llm_gateway/errors.rb +21 -0
  50. data/lib/llm_gateway/prompt.rb +12 -1
  51. data/lib/llm_gateway/provider_registry.rb +37 -0
  52. data/lib/llm_gateway/version.rb +1 -1
  53. data/lib/llm_gateway.rb +165 -14
  54. data/scripts/create_anthropic_credentials.rb +106 -0
  55. data/scripts/create_openai_codex_credentials.rb +116 -0
  56. data/scripts/generate_handoff_live_fixture.rb +169 -0
  57. data/scripts/generate_handoff_media_fixture.rb +167 -0
  58. metadata +64 -28
  59. data/lib/llm_gateway/adapters/claude/client.rb +0 -60
  60. data/lib/llm_gateway/adapters/groq/bidirectional_message_mapper.rb +0 -18
  61. data/lib/llm_gateway/adapters/groq/client.rb +0 -58
  62. data/lib/llm_gateway/adapters/groq/input_mapper.rb +0 -18
  63. data/lib/llm_gateway/adapters/groq/output_mapper.rb +0 -10
  64. data/lib/llm_gateway/adapters/open_ai/client.rb +0 -80
  65. data/lib/llm_gateway/adapters/open_ai/responses/input_mapper.rb +0 -62
  66. data/sample/claude_code_clone/agent.rb +0 -65
  67. data/sample/claude_code_clone/claude_code_clone.rb +0 -40
  68. data/sample/claude_code_clone/prompt.rb +0 -79
  69. data/sample/claude_code_clone/run.rb +0 -47
  70. data/sample/claude_code_clone/tools/bash_tool.rb +0 -54
  71. data/sample/claude_code_clone/tools/edit_tool.rb +0 -61
  72. data/sample/claude_code_clone/tools/grep_tool.rb +0 -113
  73. data/sample/claude_code_clone/tools/read_tool.rb +0 -61
  74. data/sample/claude_code_clone/tools/todowrite_tool.rb +0 -98
data/README.md CHANGED
@@ -1,272 +1,630 @@
1
- # LlmGateway
1
+ # llm_gateway
2
2
 
3
3
  Provide a unified translation interface for LLM Provider API's, While allowing developers to have as much control as possible, This does make it more complicated because we dont want developers to be blocked at using something that the provider supports. As time progress the library will mature and support more responses
4
4
 
5
+ ## Table of Contents
6
+
7
+ - [Principles:](#principles)
8
+ - [Installation](#installation)
9
+ - [Supported Providers](#supported-providers)
10
+ - [Quick Start: Streaming (all events)](#quick-start-streaming-all-events)
11
+ - [Stream API without handling events (final result only)](#stream-api-without-handling-events-final-result-only)
12
+ - [Migration guides](#migration-guides)
13
+ - [Tools](#tools)
14
+ - [Defining Tools](#defining-tools)
15
+ - [Handling Tool Calls](#handling-tool-calls)
16
+ - [Image Input](#image-input)
17
+ - [Thinking / Reasoning](#thinking--reasoning)
18
+ - [Streaming Thinking Content](#streaming-thinking-content)
19
+ - [How reasoning values are mapped](#how-reasoning-values-are-mapped)
20
+ - [Cross-Provider Handoffs](#cross-provider-handoffs)
21
+ - [Context Serialization](#context-serialization)
22
+ - [OAuth](#oauth)
23
+ - [Get initial tokens (Codex / OpenAI OAuth)](#get-initial-tokens-codex--openai-oauth)
24
+ - [Get initial tokens (Anthropic OAuth)](#get-initial-tokens-anthropic-oauth)
25
+ - [Get a refresh token](#get-a-refresh-token)
26
+ - [Exchange refresh token for access token](#exchange-refresh-token-for-access-token)
27
+ - [Pass access token in provider requests](#pass-access-token-in-provider-requests)
28
+ - [Token refresh responsibility](#token-refresh-responsibility)
29
+ - [Library’s role (llm_gateway)](#librarys-role-llm_gateway)
30
+ - [User/app’s role](#userapps-role)
5
31
 
6
32
  ## Principles:
7
33
  1. Transcription integrity is most important
8
34
  2. Input messages must have bidirectional integrity
9
35
  3. Allow developers as much control as possible
10
36
 
11
- ## Assumptions
12
- things that do not support unidirectional format, probably cant be sent between providers
37
+ ## Installation
38
+
39
+ ```bash
40
+ gem install llm_gateway
41
+ ```
13
42
 
14
- ## Mechanics
15
- Messages either support unidirectional or bidirectional format. (unidirectional means we can format it as an output but should not be added as an input).
43
+ Or add it to your `Gemfile`:
16
44
 
17
- The result from the llm is in the format that can be sent to the provider, but if you want to consolidate complex messages like code_execution, you must run a mapper we provide manually, but dont send that format back to the provider.
45
+ ```ruby
46
+ gem "llm_gateway"
47
+ ```
18
48
 
19
- ### bidirectional Support
20
- Messages
21
- - Text
22
- - Tool Use
23
- - Tool Response
49
+ ## Supported Providers
24
50
 
25
- Tools
26
- - Server Tools
27
- - Tools
51
+ | Provider | Provider Key | Auth | API Surface |
52
+ |-----------|------------------------------|-------|------------------------|
53
+ | Anthropic | `anthropic_messages` | API key | Messages |
54
+ | OpenAI | `openai_completions` | API key | Chat Completions |
55
+ | OpenAI | `openai_responses` | API key | Responses |
56
+ | OpenAI Codex | `openai_codex` | OAuth | Responses |
57
+ | Groq | `groq_completions` | API key | Chat Completions |
28
58
 
29
- ### Unidirectional Support
30
- - Server Tool Use Reponse
59
+ Legacy keys (`*_apikey_*`, `*_oauth_*`) are still supported for backward compatibility.
31
60
 
32
- ### Example flow
61
+ ## Quick Start: Streaming (all events)
33
62
 
63
+ ```ruby
64
+ require "llm_gateway"
65
+ require "json"
66
+
67
+ # Build a provider adapter directly (not via prebuilt config)
68
+ adapter = LlmGateway.build_provider(
69
+ provider: "openai_responses", # or anthropic_messages, groq_completions, ...
70
+ api_key: ENV.fetch("OPENAI_API_KEY"),
71
+ model_key: "gpt-5.4"
72
+ )
34
73
 
35
- ```mermaid
36
- sequenceDiagram
37
- actor developer
38
- participant llm_gateway
39
- participant llm_provider
74
+ tools = [
75
+ {
76
+ name: "get_time",
77
+ description: "Get the current time",
78
+ input_schema: {
79
+ type: "object",
80
+ properties: {
81
+ timezone: { type: "string", description: "Optional timezone, e.g. America/New_York" }
82
+ }
83
+ }
84
+ }
85
+ ]
86
+
87
+ transcript = [
88
+ { role: "user", content: "What time is it? Think briefly, then call get_time." }
89
+ ]
90
+
91
+ streamed_tool_args = Hash.new { |h, k| h[k] = +"" }
92
+
93
+ response = adapter.stream(transcript, tools: tools, reasoning: "high") do |event|
94
+ case event.type
95
+ # AssistantStreamMessageEvent
96
+ when :message_start
97
+ puts "\n[message_start] #{event.delta.inspect}"
98
+ when :message_delta
99
+ puts "\n[message_delta] #{event.delta.inspect} usage+=#{event.usage_increment.inspect}"
100
+ when :message_end
101
+ puts "\n[message_end]"
102
+
103
+ # Text events
104
+ when :text_start
105
+ puts "\n[text_start] index=#{event.content_index}"
106
+ print event.delta unless event.delta.empty?
107
+ when :text_delta
108
+ print event.delta
109
+ when :text_end
110
+ puts "\n[text_end] index=#{event.content_index}"
111
+
112
+ # Tool-call events
113
+ when :tool_start
114
+ puts "\n[tool_start] id=#{event.id} name=#{event.name} index=#{event.content_index}"
115
+ when :tool_delta
116
+ streamed_tool_args[event.content_index] << event.delta
117
+ print event.delta
118
+ when :tool_end
119
+ puts "\n[tool_end] index=#{event.content_index}"
120
+ begin
121
+ puts "tool args: #{JSON.parse(streamed_tool_args[event.content_index])}"
122
+ rescue JSON::ParserError
123
+ puts "tool args (partial/raw): #{streamed_tool_args[event.content_index]}"
124
+ end
125
+
126
+ # Reasoning events
127
+ when :reasoning_start
128
+ puts "\n[reasoning_start] sig=#{event.respond_to?(:signature) ? event.signature : ""}"
129
+ print event.delta
130
+ when :reasoning_delta
131
+ print event.delta
132
+ when :reasoning_end
133
+ puts "\n[reasoning_end]"
134
+
135
+ end
136
+ end
40
137
 
41
- developer ->> llm_gateway: Send Text Message
42
- llm_gateway ->> llm_gateway: transform to provider format
43
- llm_gateway ->> llm_provider: Transformed Text Message
44
- llm_provider ->> llm_gateway: Response <br />(transcript in provider format)
45
- llm_gateway ->> developer: Response <br />(transcript in combination <br />of gatway and provider formats)
46
- Note over llm_gateway,developer: llm_gateway will transform <br /> messages that support bi-direction
47
- developer ->> developer: save the transcript
48
- loop ProcessMessage
49
- developer ->> llm_gateway: format message
50
- llm_gateway ->> developer: return transformed message
51
- Note over llm_gateway,developer: if the message: <br /> supports bidirection format returns as is <br /> otherwise will transform <br />into consolidated format
52
- developer ->> developer: append earlier saved transcript
53
- Note over developer, developer: for example tool use
54
- end
55
- developer -> llm_gateway: Transcript
56
- llm_gateway ->> llm_gateway: transform to provider format
57
- Note over llm_gateway,llm_gateway: non bidirectional messages are sent as is
58
- llm_gateway ->> llm_provider: etc etc etc
138
+ # Final AssistantMessage (assembled from the stream)
139
+ puts "\n\n=== Final assistant message ==="
140
+ puts "id: #{response.id}"
141
+ puts "model: #{response.model}"
142
+ puts "provider/api: #{response.provider}/#{response.api}"
143
+ puts "role: #{response.role}"
144
+ puts "stop_reason: #{response.stop_reason}"
145
+ puts "error_message: #{response.error_message.inspect}" if response.error_message
146
+ puts "usage: #{response.usage.inspect}"
147
+
148
+ response.content.each do |block|
149
+ case block.type
150
+ when "text"
151
+ puts "text: #{block.text}"
152
+ when "reasoning"
153
+ puts "reasoning: #{block.reasoning}"
154
+ puts "signature: #{block.signature}" if block.respond_to?(:signature) && block.signature
155
+ when "tool_use"
156
+ puts "tool_use: #{block.name}(#{block.input.inspect}) id=#{block.id}"
157
+ end
158
+ end
159
+ ```
59
160
 
161
+ Stream callback event families:
162
+ - `AssistantStreamMessageEvent`: `:message_start`, `:message_delta`, `:message_end`
163
+ - `AssistantStreamEvent` (and subclasses):
164
+ - Text: `:text_start`, `:text_delta`, `:text_end`
165
+ - Tool call: `:tool_start`, `:tool_delta`, `:tool_end`
166
+ - Reasoning: `:reasoning_start`, `:reasoning_delta`, `:reasoning_end`
60
167
 
168
+ ### Stream API without handling events (final result only)
61
169
 
62
- ```
170
+ If you only care about the final `AssistantMessage`, call `stream` without a block:
63
171
 
64
- ## Supported Providers
65
- Anthropic, OpenAi, Groq
172
+ ```ruby
173
+ require "llm_gateway"
66
174
 
175
+ adapter = LlmGateway.build_provider(
176
+ provider: "openai_apikey_responses",
177
+ api_key: ENV.fetch("OPENAI_API_KEY"),
178
+ model_key: "gpt-5.4"
179
+ )
67
180
 
68
- ## Installation
181
+ result = adapter.stream("Write one short sentence about Ruby.")
69
182
 
70
- Add the gem to your application's Gemfile:
183
+ puts result.role # "assistant"
184
+ puts result.stop_reason # "stop" (usually)
185
+ puts result.usage.inspect
71
186
 
72
- ```bash
73
- bundle add llm_gateway
187
+ text = result.content
188
+ .select { |block| block.type == "text" }
189
+ .map(&:text)
190
+ .join
191
+
192
+ puts text
74
193
  ```
75
194
 
76
- Or install it yourself:
195
+ ## Migration guides
77
196
 
78
- ```bash
79
- gem install llm_gateway
80
- ```
197
+ - [Migrating from `chat` to `stream`](docs/chat-to-stream-migration.md) — use `stream` without a block when you only need the final response.
81
198
 
82
- ## Usage
199
+ ## Tools
83
200
 
84
- ### Basic Chat
201
+ ### Defining Tools
85
202
 
86
203
  ```ruby
87
- require 'llm_gateway'
204
+ weather_tool = {
205
+ name: "get_weather",
206
+ description: "Get current weather for a location",
207
+ input_schema: {
208
+ type: "object",
209
+ properties: {
210
+ location: { type: "string", description: "City name or coordinates" },
211
+ units: {
212
+ type: "string",
213
+ enum: ["celsius", "fahrenheit"],
214
+ default: "celsius"
215
+ }
216
+ },
217
+ required: ["location"]
218
+ }
219
+ }
220
+ ```
88
221
 
89
- # Simple text completion
90
- LlmGateway::Client.chat(
91
- 'claude-sonnet-4-20250514',
92
- 'What is the capital of France?'
93
- )
222
+ ### Handling Tool Calls
94
223
 
95
- # With system message
96
- LlmGateway::Client.chat(
97
- 'gpt-4',
98
- 'What is the capital of France?',
99
- system: 'You are a helpful geography teacher.'
224
+ Use `stream` without a block, inspect returned `tool_use` blocks, execute tools, append `tool_result`, then continue:
225
+
226
+ ```ruby
227
+ require "llm_gateway"
228
+ require "json"
229
+
230
+ adapter = LlmGateway.build_provider(
231
+ provider: "openai_apikey_responses",
232
+ api_key: ENV.fetch("OPENAI_API_KEY"),
233
+ model_key: "gpt-5.4"
100
234
  )
101
235
 
102
- # With inline file
103
- LlmGateway::Client.chat(
104
- "claude-sonnet-4-20250514",
105
- [
106
- {
107
- role: "user", content: [
108
- { type: "text", text: "return the content of the document exactly" },
109
- { type: "file", data: "abc\n", media_type: "text/plain", name: "small.txt" }
110
- ]
236
+ weather_tool = {
237
+ name: "get_weather",
238
+ description: "Get current weather for a location",
239
+ input_schema: {
240
+ type: "object",
241
+ properties: {
242
+ location: { type: "string" },
243
+ units: { type: "string", enum: ["celsius", "fahrenheit"], default: "celsius" }
111
244
  },
112
- ]
113
- )
245
+ required: ["location"]
246
+ }
247
+ }
114
248
 
115
- # Transcript
116
- LlmGateway::Client.chat('llama-3.3-70b-versatile',[
117
- { role: "user", content: "Tell Me a joke" },
118
- { role: "assistant", content: "what kind of content"},
119
- { role: "user", content: "About Sparkling water" },
120
- ]
121
- )
249
+ def execute_weather_api(args)
250
+ # Replace with real API call
251
+ {
252
+ location: args[:location] || args["location"],
253
+ units: args[:units] || args["units"] || "celsius",
254
+ temperature: 14,
255
+ condition: "Cloudy"
256
+ }
257
+ end
122
258
 
259
+ transcript = [
260
+ { role: "user", content: "What is the weather in London?" }
261
+ ]
123
262
 
124
- # Tool usage
125
- LlmGateway::Client.chat('gpt-5',[
126
- { role: "user", content: "What's the weather in Singapore? reply in 10 words and no special characters" },
127
- { role: "assistant",
128
- content: [
129
- { id: "call_gpXfy9l9QNmShNEbNI1FyuUZ", type: "tool_use", name: "get_weather", input: { location: "Singapore" } }
130
- ]
131
- },
132
- { role: "developer",
133
- content: [
134
- { content: "-15 celcius", type: "tool_result", tool_use_id: "call_gpXfy9l9QNmShNEbNI1FyuUZ" }
135
- ]
136
- }
137
- ],
138
- tools: [ { name: "get_weather", description: "Get current weather for a location", input_schema: { type: "object", properties: { location: { type: "string", description: "City name" } }, required: [ "location" ] } } ]
139
- )
263
+ # 1) First model pass (stream API, no event block)
264
+ response = adapter.stream(transcript, tools: [weather_tool])
265
+ transcript << response.to_h
266
+
267
+ # 2) Execute tool calls returned by the model
268
+ response.content.each do |block|
269
+ next unless block.type == "tool_use"
270
+
271
+ tool_result = execute_weather_api(block.input)
272
+
273
+ transcript << {
274
+ role: "developer",
275
+ content: [
276
+ {
277
+ type: "tool_result",
278
+ tool_use_id: block.id,
279
+ content: JSON.generate(tool_result)
280
+ }
281
+ ]
282
+ }
283
+ end
284
+
285
+ # 3) Continue the conversation after tool execution
286
+ if response.content.any? { |b| b.type == "tool_use" }
287
+ final_response = adapter.stream(transcript, tools: [weather_tool])
288
+
289
+ final_text = final_response.content
290
+ .select { |b| b.type == "text" }
291
+ .map(&:text)
292
+ .join
293
+
294
+ puts final_text
295
+ end
140
296
  ```
141
297
 
142
- ### Supported Roles
298
+ Notes:
299
+ - Tool calls are returned as `ToolCall` blocks with `type: "tool_use"`, `id`, `name`, and `input`.
300
+ - Tool results are sent back in the transcript as `{ type: "tool_result", tool_use_id:, content: }` blocks.
301
+ - For multimodal-capable models, `tool_result` content can include image blocks when supported by the provider/model.
143
302
 
144
- - user
145
- - developer
146
- - assistant
303
+ ## Image Input
304
+
305
+ Send images by including an `image` content block in a user message.
147
306
 
148
- #### Examples
149
307
  ```ruby
150
- # tool call
151
- { role: "developer",
152
- content: [
153
- { content: "-15 celcius", type: "tool_result", tool_use_id: "call_gpXfy9l9QNmShNEbNI1FyuUZ" }
154
- ]
155
- }
156
- # plain message
157
- { role: "user", content: "What's the weather in Singapore? reply in 10 words and no special characters" }
308
+ require "llm_gateway"
309
+ require "base64"
310
+
311
+ adapter = LlmGateway.build_provider(
312
+ provider: "openai_apikey_responses",
313
+ api_key: ENV.fetch("OPENAI_API_KEY"),
314
+ model_key: "gpt-5.4"
315
+ )
158
316
 
159
- # plain response
160
- { role: "assistant", content: "what kind of content"},
317
+ image_b64 = Base64.strict_encode64(File.binread("./chart.png"))
161
318
 
162
- # tool call response
163
- { role: "assistant",
319
+ message = [
320
+ {
321
+ role: "user",
164
322
  content: [
165
- { id: "call_gpXfy9l9QNmShNEbNI1FyuUZ", type: "tool_use", name: "get_weather", input: { location: "Singapore" } }
323
+ { type: "text", text: "What do you see in this image?" },
324
+ { type: "image", data: image_b64, media_type: "image/png" }
166
325
  ]
167
- },
326
+ }
327
+ ]
328
+
329
+ result = adapter.stream(message) # stream API, no event block
330
+
331
+ text = result.content
332
+ .select { |b| b.type == "text" }
333
+ .map(&:text)
334
+ .join
335
+
336
+ puts text
168
337
  ```
169
338
 
170
- developer is an open ai role, but i thought it was usefull for tracing if message sent from server or user so i added
171
- it to the list of roles, when it is not supported it will be mapped to user instead.
339
+ Tip: use a model/provider combination that supports vision input.
172
340
 
173
- you can assume developer and user to be interchangeable
341
+ ## Thinking / Reasoning
174
342
 
343
+ You can request higher-effort reasoning by passing `reasoning:` to `stream`.
175
344
 
176
- ### Files
345
+ ```ruby
346
+ require "llm_gateway"
347
+
348
+ adapter = LlmGateway.build_provider(
349
+ provider: "openai_apikey_responses",
350
+ api_key: ENV.fetch("OPENAI_API_KEY"),
351
+ model_key: "gpt-5.4"
352
+ )
353
+
354
+ result = adapter.stream(
355
+ "Think step by step and then compute 482 * 17.",
356
+ reasoning: "high"
357
+ )
358
+
359
+ puts "stop_reason: #{result.stop_reason}"
360
+ puts "usage: #{result.usage.inspect}" # may include reasoning_tokens depending on provider
361
+
362
+ result.content.each do |block|
363
+ case block.type
364
+ when "reasoning"
365
+ puts "[reasoning] #{block.reasoning}"
366
+ puts "[signature] #{block.signature}" if block.respond_to?(:signature) && block.signature
367
+ when "text"
368
+ puts "[text] #{block.text}"
369
+ end
370
+ end
371
+ ```
177
372
 
178
- Many providers offer the ability to upload files which can be referenced in conversations, or for other purposes like batching. Downloading files is also used for when llm generates something or batches complete.
373
+ ### Streaming Thinking Content
179
374
 
180
- ## Examples
375
+ If you want incremental thinking/reasoning tokens as they arrive, pass a block to `stream` and handle reasoning events:
181
376
 
182
377
  ```ruby
183
- # Upload File
184
- result = LlmGateway::Client.upload_file("openai", filename: "test.txt", content: "Hello, world!", mime_type: "text/plain")
185
- result = LlmGateway::Client.download_file("openai", file_id: "file-Kb6X7f8YDffu7FG1NcaPVu")
186
- # Response Format
187
- {
188
- id: "file-Kb6X7f8YDffu7FG1NcaPVu",
189
- size_bytes: 13, # follows anthropic naming cause clearer
190
- created_at: "2025-08-08T06:03:16.000000Z", # follow anthropic style cause easier to read as human
191
- filename: "test.txt",
192
- mime_type: nil,
193
- downloadable: true, # anthropic returns this for other providers it is infered
194
- expires_at: nil,
195
- purpose: "user_data" # for anthropic this is always user_data
196
- }
378
+ reasoning_text = +""
379
+
380
+ result = adapter.stream("Solve 99 * 99 with brief reasoning.", reasoning: "high") do |event|
381
+ case event.type
382
+ when :reasoning_start
383
+ print "\n[thinking start]\n"
384
+ reasoning_text << event.delta
385
+ when :reasoning_delta
386
+ reasoning_text << event.delta
387
+ print event.delta
388
+ when :reasoning_end
389
+ print "\n[thinking end]\n"
390
+ end
391
+ end
392
+
393
+ puts "\nCollected reasoning chars: #{reasoning_text.length}"
394
+ puts "Final stop_reason: #{result.stop_reason}"
197
395
  ```
198
396
 
199
- ### Sample Application
397
+ ### How reasoning values are mapped
200
398
 
201
- See the [file search bot example](sample/claude_code_clone/) for a complete working application that demonstrates:
202
- - Creating reusable Prompt and Tool classes
203
- - Handling conversation transcripts with tool execution
204
- - Building an interactive terminal interface
399
+ `llm_gateway` normalizes provider-specific reasoning/thinking output into shared structures:
205
400
 
206
- To run the sample:
401
+ - Stream events:
402
+ - `:reasoning_start/:reasoning_delta/:reasoning_end`
403
+ - Final content block:
404
+ - `ReasoningContent` with `type: "reasoning"`
405
+ - fields: `reasoning` and optional `signature`
406
+ - Usage accounting:
407
+ - normalized in `result.usage` when provided by the upstream API
408
+ - may include `:reasoning_tokens` plus standard token counters
207
409
 
208
- ```bash
209
- cd sample/claude_code_clone
210
- ruby run.rb
410
+ In practice this means you can:
411
+ - listen to `:reasoning_*` stream event variants, and
412
+ - always read final reasoning text from `result.content` blocks where `block.type == "reasoning"`.
413
+
414
+ Notes:
415
+ - Reasoning output appears as `ReasoningContent` blocks with `type: "reasoning"`.
416
+ - Some providers/models expose explicit reasoning content; others may only reflect reasoning effort in usage fields.
417
+ - In streamed callbacks, reasoning events are emitted as `:reasoning_*` variants.
418
+
419
+ ## Cross-Provider Handoffs
420
+
421
+ Internally, `llm_gateway` handles handoffs by normalizing message history into a provider-agnostic shape, then remapping that shape to the target provider API on each request.
422
+
423
+ What happens under the hood on `stream`/`chat`:
424
+
425
+ 1. **Normalize input**
426
+ - String input is converted to a user message.
427
+ - `system` is normalized into system message objects.
428
+ - Prior assistant turns (including `response.to_h`) are treated as structured transcript entries.
429
+
430
+ 2. **Map into canonical gateway format**
431
+ - Provider-specific differences (content block names, tool-call shapes, reasoning/thinking variants) are unified into shared structs.
432
+
433
+ 3. **Sanitize for target provider/model**
434
+ - Before sending, messages are sanitized for the destination provider/API/model.
435
+ - Unsupported or provider-specific fields are adjusted/translated where possible.
436
+
437
+ 4. **Map to outbound provider payload**
438
+ - The adapter input mapper converts canonical messages/tools/options into the exact wire format expected by the selected provider endpoint.
439
+
440
+ 5. **Map response back to canonical output**
441
+ - Stream chunks are mapped into normalized stream events.
442
+ - Final output is accumulated into a normalized `AssistantMessage` (`id`, `model`, `usage`, `stop_reason`, `content`, etc.).
443
+
444
+ Why this matters:
445
+ - A transcript produced by one provider can be reused with another provider without manually rewriting message structure.
446
+ - Tool calls/reasoning/text are exposed through a consistent API even when upstream event formats differ.
447
+ - Your app can keep one conversation state format while switching providers for cost, latency, capability, or reliability reasons.
448
+
449
+ ## Context Serialization
450
+
451
+ `llm_gateway` contexts are plain Ruby hashes/arrays, so they can be serialized to JSON and restored later.
452
+
453
+ ```ruby
454
+ require "llm_gateway"
455
+ require "json"
456
+
457
+ adapter = LlmGateway.build_provider(
458
+ provider: "openai_apikey_responses",
459
+ api_key: ENV.fetch("OPENAI_API_KEY"),
460
+ model_key: "gpt-5.4"
461
+ )
462
+
463
+ # Build context (transcript)
464
+ transcript = [
465
+ { role: "user", content: "Plan a 3-day trip to Tokyo." }
466
+ ]
467
+
468
+ # Run one turn and persist assistant output
469
+ first = adapter.stream(transcript)
470
+ transcript << first.to_h
471
+
472
+ # Serialize (store in DB/file/cache)
473
+ json_context = JSON.generate(transcript)
474
+
475
+ # ...later / elsewhere...
476
+ restored_transcript = JSON.parse(json_context)
477
+
478
+ # Continue conversation from restored context
479
+ restored_transcript << { role: "user", content: "Now make it budget-friendly." }
480
+ second = adapter.stream(restored_transcript)
481
+
482
+ puts second.content.select { |b| b.type == "text" }.map(&:text).join
211
483
  ```
212
484
 
213
- The bot will prompt for your model and API key, then allow you to ask natural language questions about finding files and searching directories.
485
+ What to persist:
486
+ - full transcript array (including assistant messages from `response.to_h`)
487
+ - any tool result messages you appended
488
+ - optional app metadata (user id, conversation id, timestamps) alongside the transcript
489
+
490
+ Tip: if you serialize to JSON, keys become strings on parse; `llm_gateway` accepts standard hash input and normalizes internally.
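The key-type change mentioned in the tip can be seen with Ruby's standard `json` library alone. This is a minimal sketch that does not call `llm_gateway`; the assistant entry in the transcript is an illustrative shape, not output captured from the gem. `symbolize_names: true` is a standard `JSON.parse` option for restoring symbol keys if your own code depends on them.

```ruby
require "json"

# Illustrative transcript; the assistant entry is a made-up example shape.
transcript = [
  { role: "user", content: "Plan a 3-day trip to Tokyo." },
  { role: "assistant", content: [{ type: "text", text: "Day 1: Asakusa." }] }
]

json_context = JSON.generate(transcript)

# Default parse: every hash key comes back as a String.
string_keyed = JSON.parse(json_context)

# symbolize_names: true restores Symbol keys at every nesting level.
symbol_keyed = JSON.parse(json_context, symbolize_names: true)

puts string_keyed.first.keys.inspect
puts symbol_keyed.first.keys.inspect
puts symbol_keyed == transcript
```

Values are unaffected by the round trip; only the key types differ, so code that reads `message[:role]` on a restored transcript needs either the symbolized parse or string-key access.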
491
+
492
+ ## OAuth
214
493
 
215
- ### Response Format
494
+ Use OAuth-capable providers (for example `openai_codex` and `anthropic_oauth_messages`) by supplying an `access_token` when building the adapter.
216
495
 
217
- All providers return responses in a consistent format:
496
+ ### Get initial tokens (Codex / OpenAI OAuth)
218
497
 
219
498
  ```ruby
220
- {
221
- choices: [
222
- {
223
- content: [
224
- { type: 'text', text: 'The capital of France is Paris.' }
225
- ],
226
- finish_reason: 'end_turn',
227
- role: 'assistant'
228
- }
229
- ],
230
- usage: {
231
- input_tokens: 15,
232
- output_tokens: 8,
233
- total_tokens: 23
234
- },
235
- model: 'claude-sonnet-4-20250514',
236
- id: 'msg_abc123'
237
- }
499
+ require "llm_gateway"
500
+
501
+ flow = LlmGateway::Clients::OpenAI::OAuthFlow.new
502
+
503
+ # 1) Start flow (generate auth URL + PKCE verifier + state)
504
+ start = flow.start
505
+ puts "Open in browser: #{start[:authorization_url]}"
506
+
507
+ # 2) After user auth, paste redirect URL (or raw code)
508
+ # Example: http://localhost:1455/auth/callback?code=...&state=...
509
+ print "Paste callback URL or code: "
510
+ input = STDIN.gets&.strip
511
+
512
+ # 3) Exchange for initial tokens
513
+ tokens = flow.exchange_code(input, start[:code_verifier], expected_state: start[:state])
514
+
515
+ puts tokens
516
+ # => {
517
+ # access_token: "...",
518
+ # refresh_token: "...",
519
+ # expires_at: <Time>,
520
+ # account_id: "..."
521
+ # }
522
+ ```
523
+
524
+ ### Get initial tokens (Anthropic OAuth)
525
+
526
+ ```ruby
527
+ require "llm_gateway"
528
+
529
+ flow = LlmGateway::Clients::ClaudeCode::OAuthFlow.new
530
+
531
+ # 1) Start flow (auth URL + PKCE verifier + state)
532
+ start = flow.start
533
+ puts "Open in browser: #{start[:authorization_url]}"
534
+
535
+ # 2) After user auth, paste callback URL (or code)
536
+ # Example callback contains ?code=...&state=...
537
+ print "Paste callback URL or code: "
538
+ input = STDIN.gets&.strip
539
+
540
+ # 3) Exchange for initial tokens
541
+ tokens = flow.exchange_code(input, start[:code_verifier], state: start[:state])
542
+
543
+ puts tokens
544
+ # => {
545
+ # access_token: "...",
546
+ # refresh_token: "...",
547
+ # expires_at: <Time>
548
+ # }
238
549
  ```
239
550
 
240
- ### Error Handling
551
+ ### Get a refresh token
241
552
 
242
- LlmGateway provides consistent error handling across all providers:
553
+ ### Exchange refresh token for access token
554
+
555
+ Use the built-in token managers in this repo. `on_token_refresh` block will be called when the refresh token is updated and should be persisted.
556
+
557
+ OpenAI Codex OAuth:
243
558
 
244
559
  ```ruby
245
- begin
246
- result = LlmGateway::Client.chat('invalid-model', 'Hello')
247
- rescue LlmGateway::Errors::UnsupportedModel => e
248
- puts "Unsupported model: #{e.message}"
249
- rescue LlmGateway::Errors::AuthenticationError => e
250
- puts "Authentication failed: #{e.message}"
251
- rescue LlmGateway::Errors::RateLimitError => e
252
- puts "Rate limit exceeded: #{e.message}"
560
+ require "llm_gateway"
561
+
562
+ manager = LlmGateway::Clients::OpenAI::TokenManager.new(
563
+ refresh_token: stored_refresh_token,
564
+ access_token: stored_access_token, # optional
565
+ expires_at: stored_expires_at # optional
566
+ )
567
+
568
+ manager.on_token_refresh = lambda do |new_access_token, new_refresh_token, new_expires_at|
569
+ # Persist updated credentials in your DB/secrets store
253
570
  end
571
+
572
+ current_access_token = manager.access_token
254
573
  ```
255
574
 
256
- ## Development
575
+ Anthropic OAuth:
576
+
577
+ ```ruby
578
+ require "llm_gateway"
579
+
580
+ manager = LlmGateway::Clients::ClaudeCode::TokenManager.new(
581
+ refresh_token: stored_refresh_token,
582
+ access_token: stored_access_token, # optional
583
+ expires_at: stored_expires_at, # optional
584
+ client_id: ENV.fetch("ANTHROPIC_CLIENT_ID"),
585
+ client_secret: ENV["ANTHROPIC_CLIENT_SECRET"] # optional depending on app setup
586
+ )
587
+
588
+ manager.on_token_refresh = lambda do |new_access_token, new_refresh_token, new_expires_at|
589
+ # Persist updated credentials
590
+ end
257
591
 
258
- After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake test` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
592
+ current_access_token = manager.access_token
593
+ ```
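The `on_token_refresh` bodies above are left as comments. The following is a minimal sketch of what such a persistence callback could look like, assuming a local JSON file as a stand-in for a real DB or secrets store. `FileTokenStore` and its methods are hypothetical names for this example; only the three-argument callback shape is taken from the snippets above.

```ruby
require "json"
require "time"

# Hypothetical store (not part of llm_gateway). A JSON file stands in for a
# real database or secrets manager.
class FileTokenStore
  def initialize(path)
    @path = path
  end

  # Write to a temp file with owner-only permissions, then rename it into
  # place so readers never see a half-written file.
  def save(access_token, refresh_token, expires_at)
    payload = {
      access_token: access_token,
      refresh_token: refresh_token,
      expires_at: expires_at&.iso8601
    }
    tmp = "#{@path}.tmp"
    File.open(tmp, File::WRONLY | File::CREAT | File::TRUNC, 0o600) do |file|
      file.write(JSON.generate(payload))
    end
    File.rename(tmp, @path)
  end

  # Returns a hash with symbol keys, or nil when nothing has been stored yet.
  def load
    return nil unless File.exist?(@path)

    data = JSON.parse(File.read(@path), symbolize_names: true)
    data[:expires_at] = Time.iso8601(data[:expires_at]) if data[:expires_at]
    data
  end

  # A lambda with the same three arguments as the callbacks shown above.
  def to_callback
    lambda do |access_token, refresh_token, expires_at|
      save(access_token, refresh_token, expires_at)
    end
  end
end
```

Under these assumptions, wiring it up would be `manager.on_token_refresh = store.to_callback`, and `store.load` would supply the stored values when building the manager on the next start.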
594
+
595
+ ### Pass access token in provider requests
596
+
597
+ Build the provider with the current access token:
598
+
599
+ ```ruby
600
+ adapter = LlmGateway.build_provider(
601
+ provider: "openai_codex",
602
+ access_token: current_access_token,
603
+ model_key: "gpt-5.4"
604
+ )
605
+
606
+ result = adapter.stream("Hello from OAuth auth")
607
+ puts result.content.select { |b| b.type == "text" }.map(&:text).join
608
+ ```
259
609
 
260
- To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and the created tag, and push the `.gem` file to [rubygems.org](https://rubygems.org).
610
+ If your app refreshes tokens in the background, rebuild the adapter (or recreate client state) with the newest `access_token` before subsequent calls.
261
611
 
262
- ## Contributing
612
+ ### Token refresh responsibility
263
613
 
264
- Bug reports and pull requests are welcome on GitHub at https://github.com/Hyper-Unearthing/llm_gateway. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [code of conduct](https://github.com/Hyper-Unearthing/llm_gateway/blob/master/CODE_OF_CONDUCT.md).
614
+ #### Library’s role (llm_gateway)
265
615
 
266
- ## License
616
+ - Provides token manager helpers.
617
+ - Detects expiry from `expires_at`.
618
+ - Refreshes the access token when asked (`ensure_valid_token` / refresh methods).
619
+ - Returns updated token values and triggers the `on_token_refresh` callback after a successful refresh.
620
+ - Uses whatever access token you pass into provider requests.
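To make "detects expiry from `expires_at`" concrete, here is a minimal sketch of an expiry check with a small safety margin. `token_expired?` is a hypothetical helper, and both the 60-second margin and the choice to treat a missing `expires_at` as expired are assumptions of this example, not documented behavior of the gem's token managers.

```ruby
# Hypothetical helper (not part of llm_gateway): decides whether an access
# token should be refreshed before use.
def token_expired?(expires_at, now: Time.now, skew_seconds: 60)
  # Assumption: an unknown expiry is treated as expired, forcing a refresh.
  return true if expires_at.nil?

  # Refresh slightly early so the token does not expire mid-request.
  now + skew_seconds >= expires_at
end

puts token_expired?(Time.now + 3600)
```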
267
621
 
268
- The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
622
+ #### User/app’s role
269
623
 
270
- ## Code of Conduct
624
+ - Persist tokens securely (DB/secrets store).
625
+ - Store and pass `access_token`, `refresh_token`, and `expires_at` into the token manager.
626
+ - Implement `on_token_refresh` to save updated credentials.
627
+ - Decide refresh/retry policy at app level (e.g., retry failed request after refresh when appropriate).
628
+ - Rebuild client/provider state with latest access token for future calls.
271
629
 
272
- Everyone interacting in the LlmGateway project's codebases, issue trackers, chat rooms and mailing lists is expected to follow the [code of conduct](https://github.com/Hyper-Unearthing/llm_gateway/blob/master/CODE_OF_CONDUCT.md).
630
+ In short: the library executes the refresh mechanics; your app owns token persistence and operational policy.
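The app-level retry policy mentioned above (retry a failed request after a refresh) could be sketched as follows. `AuthExpiredError` and `with_refresh_retry` are hypothetical names for this example; in a real app the rescued class would be whatever authentication error your provider calls raise, and `refresh` would call into the token manager and rebuild the adapter.

```ruby
# Hypothetical error class standing in for the provider's auth failure.
class AuthExpiredError < StandardError; end

# Hypothetical helper (not part of llm_gateway): runs the block, and on an
# auth failure calls `refresh` and retries, up to `max_attempts` in total.
def with_refresh_retry(refresh:, max_attempts: 2)
  attempts = 0
  begin
    attempts += 1
    yield
  rescue AuthExpiredError
    # Give up once the attempt budget is spent; the error propagates.
    raise if attempts >= max_attempts

    refresh.call
    retry
  end
end

calls = 0
result = with_refresh_retry(refresh: -> { puts "refreshing token" }) do
  calls += 1
  raise AuthExpiredError, "token expired" if calls == 1

  "ok"
end
puts result
```

Capping the attempts keeps a revoked or invalid refresh token from turning into an endless retry loop.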