RubyGems - llm_gateway - Versions diffs - 0.3.0 → 0.4.0 - Mend

llm_gateway 0.3.0 → 0.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (74)

checksums.yaml +4 -4
data/CHANGELOG.md +26 -0
data/README.md +544 -186
data/Rakefile +1 -2
data/docs/migration-guide.md +135 -0
data/lib/llm_gateway/adapters/adapter.rb +173 -0
data/lib/llm_gateway/adapters/anthropic/acts_like_messages.rb +23 -0
data/lib/llm_gateway/adapters/{claude → anthropic}/bidirectional_message_mapper.rb +31 -3
data/lib/llm_gateway/adapters/{claude → anthropic}/input_mapper.rb +4 -3
data/lib/llm_gateway/adapters/anthropic/messages_adapter.rb +19 -0
data/lib/llm_gateway/adapters/{claude → anthropic}/output_mapper.rb +1 -1
data/lib/llm_gateway/adapters/anthropic/stream_mapper.rb +110 -0
data/lib/llm_gateway/adapters/anthropic_option_mapper.rb +53 -0
data/lib/llm_gateway/adapters/groq/chat_completions_adapter.rb +47 -0
data/lib/llm_gateway/adapters/groq/option_mapper.rb +27 -0
data/lib/llm_gateway/adapters/input_message_sanitizer.rb +93 -0
data/lib/llm_gateway/adapters/openai/acts_like_chat_completions.rb +22 -0
data/lib/llm_gateway/adapters/openai/acts_like_responses.rb +31 -0
data/lib/llm_gateway/adapters/{open_ai → openai}/chat_completions/bidirectional_message_mapper.rb +9 -2
data/lib/llm_gateway/adapters/{open_ai → openai}/chat_completions/input_mapper.rb +1 -6
data/lib/llm_gateway/adapters/openai/chat_completions/input_message_sanitizer.rb +65 -0
data/lib/llm_gateway/adapters/openai/chat_completions/option_mapper.rb +39 -0
data/lib/llm_gateway/adapters/{open_ai → openai}/chat_completions/output_mapper.rb +1 -1
data/lib/llm_gateway/adapters/openai/chat_completions/stream_mapper.rb +242 -0
data/lib/llm_gateway/adapters/openai/chat_completions_adapter.rb +20 -0
data/lib/llm_gateway/adapters/{open_ai → openai}/file_output_mapper.rb +1 -1
data/lib/llm_gateway/adapters/openai/prompt_cache_option_mapper.rb +39 -0
data/lib/llm_gateway/adapters/{open_ai → openai}/responses/bidirectional_message_mapper.rb +52 -4
data/lib/llm_gateway/adapters/openai/responses/input_mapper.rb +106 -0
data/lib/llm_gateway/adapters/openai/responses/option_mapper.rb +41 -0
data/lib/llm_gateway/adapters/{open_ai → openai}/responses/output_mapper.rb +1 -1
data/lib/llm_gateway/adapters/openai/responses/stream_mapper.rb +340 -0
data/lib/llm_gateway/adapters/openai/responses_adapter.rb +20 -0
data/lib/llm_gateway/adapters/openai_codex/input_mapper.rb +206 -0
data/lib/llm_gateway/adapters/openai_codex/option_mapper.rb +28 -0
data/lib/llm_gateway/adapters/openai_codex/responses_adapter.rb +38 -0
data/lib/llm_gateway/adapters/option_mapper.rb +13 -0
data/lib/llm_gateway/adapters/stream_accumulator.rb +91 -0
data/lib/llm_gateway/adapters/structs.rb +145 -0
data/lib/llm_gateway/base_client.rb +62 -1
data/lib/llm_gateway/client.rb +45 -129
data/lib/llm_gateway/clients/anthropic.rb +167 -0
data/lib/llm_gateway/clients/claude_code/oauth_flow.rb +162 -0
data/lib/llm_gateway/clients/claude_code/token_manager.rb +112 -0
data/lib/llm_gateway/clients/groq.rb +54 -0
data/lib/llm_gateway/clients/openai.rb +208 -0
data/lib/llm_gateway/clients/openai_codex/oauth_flow.rb +258 -0
data/lib/llm_gateway/clients/openai_codex/token_manager.rb +71 -0
data/lib/llm_gateway/errors.rb +21 -0
data/lib/llm_gateway/prompt.rb +12 -1
data/lib/llm_gateway/provider_registry.rb +37 -0
data/lib/llm_gateway/version.rb +1 -1
data/lib/llm_gateway.rb +165 -14
data/scripts/create_anthropic_credentials.rb +106 -0
data/scripts/create_openai_codex_credentials.rb +116 -0
data/scripts/generate_handoff_live_fixture.rb +169 -0
data/scripts/generate_handoff_media_fixture.rb +167 -0
metadata +64 -28
data/lib/llm_gateway/adapters/claude/client.rb +0 -60
data/lib/llm_gateway/adapters/groq/bidirectional_message_mapper.rb +0 -18
data/lib/llm_gateway/adapters/groq/client.rb +0 -58
data/lib/llm_gateway/adapters/groq/input_mapper.rb +0 -18
data/lib/llm_gateway/adapters/groq/output_mapper.rb +0 -10
data/lib/llm_gateway/adapters/open_ai/client.rb +0 -80
data/lib/llm_gateway/adapters/open_ai/responses/input_mapper.rb +0 -62
data/sample/claude_code_clone/agent.rb +0 -65
data/sample/claude_code_clone/claude_code_clone.rb +0 -40
data/sample/claude_code_clone/prompt.rb +0 -79
data/sample/claude_code_clone/run.rb +0 -47
data/sample/claude_code_clone/tools/bash_tool.rb +0 -54
data/sample/claude_code_clone/tools/edit_tool.rb +0 -61
data/sample/claude_code_clone/tools/grep_tool.rb +0 -113
data/sample/claude_code_clone/tools/read_tool.rb +0 -61
data/sample/claude_code_clone/tools/todowrite_tool.rb +0 -98

data/README.md CHANGED

@@ -1,272 +1,630 @@
-# LlmGateway
+# llm_gateway
 Provides a unified translation interface for LLM provider APIs while giving developers as much control as possible. This makes the library more complicated, because developers should not be blocked from using something the provider supports. As the library matures, it will support more responses.
+## Table of Contents
+- [Principles:](#principles)
+- [Installation](#installation)
+- [Supported Providers](#supported-providers)
+- [Quick Start: Streaming (all events)](#quick-start-streaming-all-events)
+  - [Stream API without handling events (final result only)](#stream-api-without-handling-events-final-result-only)
+- [Migration guides](#migration-guides)
+- [Tools](#tools)
+  - [Defining Tools](#defining-tools)
+  - [Handling Tool Calls](#handling-tool-calls)
+- [Image Input](#image-input)
+- [Thinking / Reasoning](#thinking--reasoning)
+  - [Streaming Thinking Content](#streaming-thinking-content)
+  - [How reasoning values are mapped](#how-reasoning-values-are-mapped)
+- [Cross-Provider Handoffs](#cross-provider-handoffs)
+- [Context Serialization](#context-serialization)
+- [OAuth](#oauth)
+  - [Get initial tokens (Codex / OpenAI OAuth)](#get-initial-tokens-codex--openai-oauth)
+  - [Get initial tokens (Anthropic OAuth)](#get-initial-tokens-anthropic-oauth)
+  - [Get a refresh token](#get-a-refresh-token)
+  - [Exchange refresh token for access token](#exchange-refresh-token-for-access-token)
+  - [Pass access token in provider requests](#pass-access-token-in-provider-requests)
+  - [Token refresh responsibility](#token-refresh-responsibility)
+    - [Library’s role (llm_gateway)](#librarys-role-llm_gateway)
+    - [User/app’s role](#userapps-role)
 ## Principles:
 1. Transcription integrity is most important
 2. Input messages must have bidirectional integrity
 3. Allow developers as much control as possible
-## Assumptions
-things that do not support unidirectional format, probably cant be sent between providers
+## Installation
+```bash
+gem install llm_gateway
+```
-## Mechanics
-Messages either support unidirectional or bidirectional format. (unidirectional means we can format it as an output but should not be added as an input).
+Or add it to your `Gemfile`:
-The result from the llm is in the format that can be sent to the provider, but if you want to consolidate complex messages like code_execution, you must run a mapper we provide manually, but dont send that format back to the provider.
+```ruby
+gem "llm_gateway"
+```
-### bidirectional Support
-Messages
-- Text
-- Tool Use
-- Tool Response
+## Supported Providers
-Tools
-- Server Tools
-- Tools
+| Provider     | Provider Key         | Auth    | API Surface      |
+|--------------|----------------------|---------|------------------|
+| Anthropic    | `anthropic_messages` | API key | Messages         |
+| OpenAI       | `openai_completions` | API key | Chat Completions |
+| OpenAI       | `openai_responses`   | API key | Responses        |
+| OpenAI Codex | `openai_codex`       | OAuth   | Responses        |
+| Groq         | `groq_completions`   | API key | Chat Completions |
-### Unidirectional Support
-- Server Tool Use Reponse
+Legacy keys (`*_apikey_*`, `*_oauth_*`) are still supported for backward compatibility.
-### Example flow
+## Quick Start: Streaming (all events)
+```ruby
+require "llm_gateway"
+require "json"
+# Build a provider adapter directly (not via prebuilt config)
+adapter = LlmGateway.build_provider(
+  provider: "openai_responses", # or anthropic_messages, groq_completions, ...
+  api_key: ENV.fetch("OPENAI_API_KEY"),
+  model_key: "gpt-5.4"
+)
-```mermaid
-sequenceDiagram
-        actor developer
-        participant llm_gateway
-        participant llm_provider
+tools = [
+  {
+    name: "get_time",
+    description: "Get the current time",
+    input_schema: {
+      type: "object",
+      properties: {
+        timezone: { type: "string", description: "Optional timezone, e.g. America/New_York" }
+      }
+    }
+  }
+]
+transcript = [
+  { role: "user", content: "What time is it? Think briefly, then call get_time." }
+]
+streamed_tool_args = Hash.new { |h, k| h[k] = +"" }
+response = adapter.stream(transcript, tools: tools, reasoning: "high") do |event|
+  case event.type
+  # AssistantStreamMessageEvent
+  when :message_start
+    puts "\n[message_start] #{event.delta.inspect}"
+  when :message_delta
+    puts "\n[message_delta] #{event.delta.inspect} usage+=#{event.usage_increment.inspect}"
+  when :message_end
+    puts "\n[message_end]"
+  # Text events
+  when :text_start
+    puts "\n[text_start] index=#{event.content_index}"
+    print event.delta unless event.delta.empty?
+  when :text_delta
+    print event.delta
+  when :text_end
+    puts "\n[text_end] index=#{event.content_index}"
+  # Tool-call events
+  when :tool_start
+    puts "\n[tool_start] id=#{event.id} name=#{event.name} index=#{event.content_index}"
+  when :tool_delta
+    streamed_tool_args[event.content_index] << event.delta
+    print event.delta
+  when :tool_end
+    puts "\n[tool_end] index=#{event.content_index}"
+    begin
+      puts "tool args: #{JSON.parse(streamed_tool_args[event.content_index])}"
+    rescue JSON::ParserError
+      puts "tool args (partial/raw): #{streamed_tool_args[event.content_index]}"
+    end
+  # Reasoning events
+  when :reasoning_start
+    puts "\n[reasoning_start] sig=#{event.respond_to?(:signature) ? event.signature : ""}"
+    print event.delta
+  when :reasoning_delta
+    print event.delta
+  when :reasoning_end
+    puts "\n[reasoning_end]"
+  end
+end
-        developer ->> llm_gateway: Send Text Message
-        llm_gateway ->> llm_gateway: transform to provider format
-        llm_gateway ->> llm_provider: Transformed Text Message
-        llm_provider ->> llm_gateway: Response <br />(transcript in provider format)
-        llm_gateway ->> developer:  Response <br />(transcript in combination <br />of gatway and provider formats)
-        Note over llm_gateway,developer: llm_gateway will transform <br /> messages that support bi-direction
-        developer ->> developer: save the transcript
-        loop ProcessMessage
-            developer ->> llm_gateway: format message
-            llm_gateway ->> developer: return transformed message
-            Note over llm_gateway,developer: if the message: <br /> supports bidirection format returns as is <br /> otherwise will transform <br />into consolidated format
-            developer ->> developer: append earlier saved transcript
-            Note over developer, developer: for example tool use
-        end
-        developer -> llm_gateway: Transcript
-        llm_gateway ->> llm_gateway: transform to provider format
-        Note over llm_gateway,llm_gateway: non bidirectional messages are sent as is
-        llm_gateway ->> llm_provider: etc etc etc
+# Final AssistantMessage (assembled from the stream)
+puts "\n\n=== Final assistant message ==="
+puts "id: #{response.id}"
+puts "model: #{response.model}"
+puts "provider/api: #{response.provider}/#{response.api}"
+puts "role: #{response.role}"
+puts "stop_reason: #{response.stop_reason}"
+puts "error_message: #{response.error_message.inspect}" if response.error_message
+puts "usage: #{response.usage.inspect}"
+response.content.each do |block|
+  case block.type
+  when "text"
+    puts "text: #{block.text}"
+  when "reasoning"
+    puts "reasoning: #{block.reasoning}"
+    puts "signature: #{block.signature}" if block.respond_to?(:signature) && block.signature
+  when "tool_use"
+    puts "tool_use: #{block.name}(#{block.input.inspect}) id=#{block.id}"
+  end
+end
+```
+Stream callback event families:
+- `AssistantStreamMessageEvent`: `:message_start`, `:message_delta`, `:message_end`
+- `AssistantStreamEvent` (and subclasses):
+  - Text: `:text_start`, `:text_delta`, `:text_end`
+  - Tool call: `:tool_start`, `:tool_delta`, `:tool_end`
+  - Reasoning: `:reasoning_start`, `:reasoning_delta`, `:reasoning_end`
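The event families above lend themselves to a small collector object instead of one long `case` in the block. The following is a minimal sketch, assuming events expose `type`, `content_index`, and `delta` as in the Quick Start example; `StreamEvent` and `DeltaCollector` are hypothetical stand-ins, not classes from the gem.

```ruby
# Hypothetical stand-in for the gem's stream event objects; only the fields
# used in the README example (type, content_index, delta) are modeled.
StreamEvent = Struct.new(:type, :content_index, :delta, keyword_init: true)

# Collects text and tool-argument deltas, keyed by content index.
class DeltaCollector
  attr_reader :text, :tool_args

  def initialize
    @text = Hash.new { |hash, key| hash[key] = +"" }
    @tool_args = Hash.new { |hash, key| hash[key] = +"" }
  end

  # Appends the event's delta to the matching buffer; other events are ignored.
  def call(event)
    case event.type
    when :text_start, :text_delta
      @text[event.content_index] << event.delta.to_s
    when :tool_delta
      @tool_args[event.content_index] << event.delta.to_s
    end
  end
end

collector = DeltaCollector.new
[
  StreamEvent.new(type: :text_start, content_index: 0, delta: ""),
  StreamEvent.new(type: :text_delta, content_index: 0, delta: "It is "),
  StreamEvent.new(type: :text_delta, content_index: 0, delta: "noon."),
  StreamEvent.new(type: :tool_start, content_index: 1, delta: nil),
  StreamEvent.new(type: :tool_delta, content_index: 1, delta: '{"timezone":'),
  StreamEvent.new(type: :tool_delta, content_index: 1, delta: '"UTC"}')
].each { |event| collector.call(event) }

puts collector.text[0]
puts collector.tool_args[1]
```

Because the collector responds to `call` with a single event, it could presumably be invoked from inside the `stream` block in place of the inline `case`.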
+### Stream API without handling events (final result only)
-```
+If you only care about the final `AssistantMessage`, call `stream` without a block:
-## Supported Providers
-Anthropic, OpenAi, Groq
+```ruby
+require "llm_gateway"
+adapter = LlmGateway.build_provider(
+  provider: "openai_apikey_responses",
+  api_key: ENV.fetch("OPENAI_API_KEY"),
+  model_key: "gpt-5.4"
+)
-## Installation
+result = adapter.stream("Write one short sentence about Ruby.")
-Add the gem to your application's Gemfile:
+puts result.role         # "assistant"
+puts result.stop_reason  # "stop" (usually)
+puts result.usage.inspect
-```bash
-bundle add llm_gateway
+text = result.content
+  .select { |block| block.type == "text" }
+  .map(&:text)
+  .join
+puts text
 ```
-Or install it yourself:
+## Migration guides
-```bash
-gem install llm_gateway
-```
+- [Migrating from `chat` to `stream`](docs/chat-to-stream-migration.md) — use `stream` without a block when you only need the final response.
-## Usage
+## Tools
-### Basic Chat
+### Defining Tools
 ```ruby
-require 'llm_gateway'
+weather_tool = {
+  name: "get_weather",
+  description: "Get current weather for a location",
+  input_schema: {
+    type: "object",
+    properties: {
+      location: { type: "string", description: "City name or coordinates" },
+      units: {
+        type: "string",
+        enum: ["celsius", "fahrenheit"],
+        default: "celsius"
+      }
+    },
+    required: ["location"]
+  }
+}
+```
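Since the model supplies the tool arguments, it can help to check them against the tool's `input_schema` before executing anything. The sketch below is illustrative only: `tool_input_errors` is a hypothetical helper (not part of `llm_gateway`) and it covers just the `required` list and `enum` constraints, not full JSON Schema.

```ruby
# Checks a tool-call input hash against the `required` list and any `enum`
# constraints of a tool's input_schema. Returns an array of error strings.
def tool_input_errors(tool, input)
  schema = tool.fetch(:input_schema)
  input = input.transform_keys(&:to_sym) # accept string or symbol keys
  errors = []

  Array(schema[:required]).each do |key|
    errors << "missing required field: #{key}" unless input.key?(key.to_sym)
  end

  schema.fetch(:properties, {}).each do |name, spec|
    next unless spec[:enum] && input.key?(name)

    allowed = spec[:enum]
    errors << "#{name} must be one of #{allowed.join(', ')}" unless allowed.include?(input[name])
  end

  errors
end

weather_tool = {
  name: "get_weather",
  description: "Get current weather for a location",
  input_schema: {
    type: "object",
    properties: {
      location: { type: "string", description: "City name or coordinates" },
      units: { type: "string", enum: ["celsius", "fahrenheit"], default: "celsius" }
    },
    required: ["location"]
  }
}

valid_errors = tool_input_errors(weather_tool, { "location" => "London" })
invalid_errors = tool_input_errors(weather_tool, { units: "kelvin" })

p valid_errors
p invalid_errors
```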
-# Simple text completion
-LlmGateway::Client.chat(
-  'claude-sonnet-4-20250514',
-  'What is the capital of France?'
-)
+### Handling Tool Calls
-# With system message
-LlmGateway::Client.chat(
-  'gpt-4',
-  'What is the capital of France?',
-  system: 'You are a helpful geography teacher.'
+Use `stream` without a block, inspect returned `tool_use` blocks, execute tools, append `tool_result`, then continue:
+```ruby
+require "llm_gateway"
+require "json"
+adapter = LlmGateway.build_provider(
+  provider: "openai_apikey_responses",
+  api_key: ENV.fetch("OPENAI_API_KEY"),
+  model_key: "gpt-5.4"
 )
-# With inline file
-LlmGateway::Client.chat(
-  "claude-sonnet-4-20250514",
-  [
-    {
-      role: "user", content: [
-        { type: "text", text: "return the content of the document exactly" },
-        { type: "file", data: "abc\n", media_type: "text/plain", name: "small.txt"  }
-      ]
+weather_tool = {
+  name: "get_weather",
+  description: "Get current weather for a location",
+  input_schema: {
+    type: "object",
+    properties: {
+      location: { type: "string" },
+      units: { type: "string", enum: ["celsius", "fahrenheit"], default: "celsius" }
     },
-  ]
-)
+    required: ["location"]
+  }
+}
-# Transcript
-LlmGateway::Client.chat('llama-3.3-70b-versatile',[
-    { role: "user", content: "Tell Me a joke" },
-    { role: "assistant", content: "what kind of content"},
-    { role: "user", content: "About Sparkling water" },
-  ]
-)
+def execute_weather_api(args)
+  # Replace with real API call
+  {
+    location: args[:location] || args["location"],
+    units: args[:units] || args["units"] || "celsius",
+    temperature: 14,
+    condition: "Cloudy"
+  }
+end
+transcript = [
+  { role: "user", content: "What is the weather in London?" }
+]
-# Tool usage
-LlmGateway::Client.chat('gpt-5',[
-    { role: "user", content: "What's the weather in Singapore? reply in 10 words and no special characters" },
-    { role: "assistant",
-        content: [
-          { id: "call_gpXfy9l9QNmShNEbNI1FyuUZ", type: "tool_use", name: "get_weather", input: { location: "Singapore" } }
-        ]
-    },
-    { role: "developer",
-      content: [
-        { content: "-15 celcius", type: "tool_result", tool_use_id: "call_gpXfy9l9QNmShNEbNI1FyuUZ" }
-      ]
-    }
-  ],
-  tools: [ { name: "get_weather", description: "Get current weather for a location", input_schema: { type: "object", properties: { location: { type: "string", description: "City name" } }, required: [ "location" ] } } ]
-)
+# 1) First model pass (stream API, no event block)
+response = adapter.stream(transcript, tools: [weather_tool])
+transcript << response.to_h
+# 2) Execute tool calls returned by the model
+response.content.each do |block|
+  next unless block.type == "tool_use"
+  tool_result = execute_weather_api(block.input)
+  transcript << {
+    role: "developer",
+    content: [
+      {
+        type: "tool_result",
+        tool_use_id: block.id,
+        content: JSON.generate(tool_result)
+      }
+    ]
+  }
+end
+# 3) Continue the conversation after tool execution
+if response.content.any? { |b| b.type == "tool_use" }
+  final_response = adapter.stream(transcript, tools: [weather_tool])
+  final_text = final_response.content
+    .select { |b| b.type == "text" }
+    .map(&:text)
+    .join
+  puts final_text
+end
 ```
-### Supported Roles
+Notes:
+- Tool calls are returned as `ToolCall` blocks with `type: "tool_use"`, `id`, `name`, and `input`.
+- Tool results are sent back in the transcript as `{ type: "tool_result", tool_use_id:, content: }` blocks.
+- For multimodal-capable models, `tool_result` content can include image blocks when supported by the provider/model.
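When more than one tool is registered, the loop in step 2 can be generalized with a name-to-handler table. This is a sketch under stated assumptions: `ToolUseBlock` is a hypothetical stand-in for the gem's tool-call blocks, and `HANDLERS` and `tool_result_messages` are illustrative names that do not exist in `llm_gateway`.

```ruby
require "json"

# Hypothetical stand-in for tool-call blocks (type, id, name, input).
ToolUseBlock = Struct.new(:type, :id, :name, :input, keyword_init: true)

# Maps tool names to callables; the handler here is a stub, not a real API call.
HANDLERS = {
  "get_weather" => ->(input) { { location: input["location"], temperature: 14 } }
}.freeze

# Builds one tool_result message per tool_use block, in the transcript shape
# shown in the README example. Unknown tools yield an error payload.
def tool_result_messages(blocks, handlers)
  blocks.select { |block| block.type == "tool_use" }.map do |block|
    handler = handlers[block.name]
    payload = handler ? handler.call(block.input) : { error: "unknown tool: #{block.name}" }
    {
      role: "developer",
      content: [
        { type: "tool_result", tool_use_id: block.id, content: JSON.generate(payload) }
      ]
    }
  end
end

blocks = [
  ToolUseBlock.new(type: "tool_use", id: "call_1", name: "get_weather", input: { "location" => "London" }),
  ToolUseBlock.new(type: "tool_use", id: "call_2", name: "unknown_tool", input: {})
]

messages = tool_result_messages(blocks, HANDLERS)
puts JSON.pretty_generate(messages)
```

Returning an error payload for an unknown tool, rather than raising, keeps every `tool_use` block paired with a `tool_result`; whether that is the right policy depends on the application.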
-- user
-- developer
-- assistant
+## Image Input
+Send images by including an `image` content block in a user message.
-#### Examples
 ```ruby
-# tool call
-{ role: "developer",
-  content: [
-    { content: "-15 celcius", type: "tool_result", tool_use_id: "call_gpXfy9l9QNmShNEbNI1FyuUZ" }
-  ]
-}
-# plain message
-{ role: "user", content: "What's the weather in Singapore? reply in 10 words and no special characters" }
+require "llm_gateway"
+require "base64"
+adapter = LlmGateway.build_provider(
+  provider: "openai_apikey_responses",
+  api_key: ENV.fetch("OPENAI_API_KEY"),
+  model_key: "gpt-5.4"
+)
-# plain response
-{ role: "assistant", content: "what kind of content"},
+image_b64 = Base64.strict_encode64(File.binread("./chart.png"))
-# tool call response
-{ role: "assistant",
+message = [
+  {
+    role: "user",
     content: [
-      { id: "call_gpXfy9l9QNmShNEbNI1FyuUZ", type: "tool_use", name: "get_weather", input: { location: "Singapore" } }
+      { type: "text", text: "What do you see in this image?" },
+      { type: "image", data: image_b64, media_type: "image/png" }
     ]
-},
+  }
+]
+result = adapter.stream(message) # stream API, no event block
+text = result.content
+  .select { |b| b.type == "text" }
+  .map(&:text)
+  .join
+puts text
 ```
-developer is an open ai role, but i thought it was usefull for tracing if message sent from server or user so i added
-it to the list of roles, when it is not supported it will be mapped to user instead.
+Tip: use a model/provider combination that supports vision input.
-you can assume developer and user to be interchangeable
+## Thinking / Reasoning
+You can request higher-effort reasoning by passing `reasoning:` to `stream`.
-### Files
+```ruby
+require "llm_gateway"
+adapter = LlmGateway.build_provider(
+  provider: "openai_apikey_responses",
+  api_key: ENV.fetch("OPENAI_API_KEY"),
+  model_key: "gpt-5.4"
+)
+result = adapter.stream(
+  "Think step by step and then compute 482 * 17.",
+  reasoning: "high"
+)
+puts "stop_reason: #{result.stop_reason}"
+puts "usage: #{result.usage.inspect}" # may include reasoning_tokens depending on provider
+result.content.each do |block|
+  case block.type
+  when "reasoning"
+    puts "[reasoning] #{block.reasoning}"
+    puts "[signature] #{block.signature}" if block.respond_to?(:signature) && block.signature
+  when "text"
+    puts "[text] #{block.text}"
+  end
+end
+```
-Many providers offer the ability to upload files which can be referenced in conversations, or for other purposes like batching. Downloading files is also used for when llm generates something or batches complete.
+### Streaming Thinking Content
-## Examples
+If you want incremental thinking/reasoning tokens as they arrive, pass a block to `stream` and handle reasoning events:
 ```ruby
-# Upload File
-result = LlmGateway::Client.upload_file("openai", filename: "test.txt", content: "Hello, world!", mime_type: "text/plain")
-result = LlmGateway::Client.download_file("openai", file_id: "file-Kb6X7f8YDffu7FG1NcaPVu")
-# Response Format
-{
-  id: "file-Kb6X7f8YDffu7FG1NcaPVu",
-  size_bytes: 13,  # follows anthropic naming cause clearer
-  created_at: "2025-08-08T06:03:16.000000Z", # follow anthropic style cause easier to read as human
-  filename: "test.txt",
-  mime_type: nil,
-  downloadable: true, # anthropic returns this for other providers it is infered
-  expires_at: nil,
-  purpose: "user_data" # for anthropic this is always user_data
-}
+reasoning_text = +""
+result = adapter.stream("Solve 99 * 99 with brief reasoning.", reasoning: "high") do |event|
+  case event.type
+  when :reasoning_start
+    print "\n[thinking start]\n"
+    reasoning_text << event.delta
+  when :reasoning_delta
+    reasoning_text << event.delta
+    print event.delta
+  when :reasoning_end
+    print "\n[thinking end]\n"
+  end
+end
+puts "\nCollected reasoning chars: #{reasoning_text.length}"
+puts "Final stop_reason: #{result.stop_reason}"
 ```
-### Sample Application
+### How reasoning values are mapped
-See the [file search bot example](sample/claude_code_clone/) for a complete working application that demonstrates:
-- Creating reusable Prompt and Tool classes
-- Handling conversation transcripts with tool execution
-- Building an interactive terminal interface
+`llm_gateway` normalizes provider-specific reasoning/thinking output into shared structures:
-To run the sample:
+- Stream events:
+  - `:reasoning_start/:reasoning_delta/:reasoning_end`
+- Final content block:
+  - `ReasoningContent` with `type: "reasoning"`
+  - fields: `reasoning` and optional `signature`
+- Usage accounting:
+  - normalized in `result.usage` when provided by the upstream API
+  - may include `:reasoning_tokens` plus standard token counters
-```bash
-cd sample/claude_code_clone
-ruby run.rb
+In practice this means you can:
+- listen to `:reasoning_*` stream event variants, and
+- always read final reasoning text from `result.content` blocks where `block.type == "reasoning"`.
+Notes:
+- Reasoning output appears as `ReasoningContent` blocks with `type: "reasoning"`.
+- Some providers/models expose explicit reasoning content; others may only reflect reasoning effort in usage fields.
+- In streamed callbacks, reasoning events are emitted as `:reasoning_*` variants.
+## Cross-Provider Handoffs
+Internally, `llm_gateway` handles handoffs by normalizing message history into a provider-agnostic shape, then remapping that shape to the target provider API on each request.
+What happens under the hood on `stream`/`chat`:
+1. **Normalize input**
+   - String input is converted to a user message.
+   - `system` is normalized into system message objects.
+   - Prior assistant turns (including `response.to_h`) are treated as structured transcript entries.
+2. **Map into canonical gateway format**
+   - Provider-specific differences (content block names, tool-call shapes, reasoning/thinking variants) are unified into shared structs.
+3. **Sanitize for target provider/model**
+   - Before sending, messages are sanitized for the destination provider/API/model.
+   - Unsupported or provider-specific fields are adjusted/translated where possible.
+4. **Map to outbound provider payload**
+   - The adapter input mapper converts canonical messages/tools/options into the exact wire format expected by the selected provider endpoint.
+5. **Map response back to canonical output**
+   - Stream chunks are mapped into normalized stream events.
+   - Final output is accumulated into a normalized `AssistantMessage` (`id`, `model`, `usage`, `stop_reason`, `content`, etc.).
+Why this matters:
+- A transcript produced by one provider can be reused with another provider without manually rewriting message structure.
+- Tool calls/reasoning/text are exposed through a consistent API even when upstream event formats differ.
+- Your app can keep one conversation state format while switching providers for cost, latency, capability, or reliability reasons.
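To make step 1 ("Normalize input") concrete, here is a rough sketch of what such a normalization could look like. It is not the gem's implementation: `normalize_input` and the canonical shape it returns (symbol keys, content as an array of typed blocks) are assumptions made purely for illustration.

```ruby
# Hypothetical normalization: accepts a String or an array of message hashes
# and returns messages with symbol keys and content as an array of blocks.
def normalize_input(input)
  messages = input.is_a?(String) ? [{ role: "user", content: input }] : input

  messages.map do |message|
    message = message.transform_keys(&:to_sym)
    content = message[:content]
    # A bare string becomes a single text block.
    content = [{ type: "text", text: content }] if content.is_a?(String)

    {
      role: message[:role].to_s,
      content: content.map { |block| block.transform_keys(&:to_sym) }
    }
  end
end

from_string = normalize_input("What is the weather in London?")
from_transcript = normalize_input(
  [
    { "role" => "user", "content" => "Hi" },
    { role: "assistant", content: [{ "type" => "text", "text" => "Hello!" }] }
  ]
)

p from_string
p from_transcript
```

With a single internal shape like this, each provider mapper only needs to translate to and from one format, which is the property the handoff steps above rely on.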
+## Context Serialization
+`llm_gateway` contexts are plain Ruby hashes/arrays, so they can be serialized to JSON and restored later.
+```ruby
+require "llm_gateway"
+require "json"
+adapter = LlmGateway.build_provider(
+  provider: "openai_apikey_responses",
+  api_key: ENV.fetch("OPENAI_API_KEY"),
+  model_key: "gpt-5.4"
+)
+# Build context (transcript)
+transcript = [
+  { role: "user", content: "Plan a 3-day trip to Tokyo." }
+]
+# Run one turn and persist assistant output
+first = adapter.stream(transcript)
+transcript << first.to_h
+# Serialize (store in DB/file/cache)
+json_context = JSON.generate(transcript)
+# ...later / elsewhere...
+restored_transcript = JSON.parse(json_context)
+# Continue conversation from restored context
+restored_transcript << { role: "user", content: "Now make it budget-friendly." }
+second = adapter.stream(restored_transcript)
+puts second.content.select { |b| b.type == "text" }.map(&:text).join
 ```
-The bot will prompt for your model and API key, then allow you to ask natural language questions about finding files and searching directories.
+What to persist:
+- full transcript array (including assistant messages from `response.to_h`)
+- any tool result messages you appended
+- optional app metadata (user id, conversation id, timestamps) alongside the transcript
+Tip: if you serialize to JSON, keys become strings on parse; `llm_gateway` accepts standard hash input and normalizes internally.
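The key-type change mentioned in the tip can be seen with the standard library alone. If application code outside the gem expects symbol keys, `JSON.parse` accepts `symbolize_names: true`; the transcript below is a hand-written sample, not gem output.

```ruby
require "json"

transcript = [
  { role: "user", content: [{ type: "text", text: "Plan a 3-day trip to Tokyo." }] }
]

json_context = JSON.generate(transcript)

# Default parse: every hash key comes back as a String.
string_keyed = JSON.parse(json_context)

# symbolize_names: true restores Symbol keys at every nesting level.
symbol_keyed = JSON.parse(json_context, symbolize_names: true)

p string_keyed.first.keys
p symbol_keyed.first.keys
p symbol_keyed == transcript
```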
+## OAuth
-### Response Format
+Use OAuth-capable providers (for example `openai_codex` and `anthropic_oauth_messages`) by supplying an `access_token` when building the adapter.
-All providers return responses in a consistent format:
+### Get initial tokens (Codex / OpenAI OAuth)
 ```ruby
-{
-  choices: [
-    {
-      content: [
-        { type: 'text', text: 'The capital of France is Paris.' }
-      ],
-      finish_reason: 'end_turn',
-      role: 'assistant'
-    }
-  ],
-  usage: {
-    input_tokens: 15,
-    output_tokens: 8,
-    total_tokens: 23
-  },
-  model: 'claude-sonnet-4-20250514',
-  id: 'msg_abc123'
-}
+require "llm_gateway"
+flow = LlmGateway::Clients::OpenAI::OAuthFlow.new
+# 1) Start flow (generate auth URL + PKCE verifier + state)
+start = flow.start
+puts "Open in browser: #{start[:authorization_url]}"
+# 2) After user auth, paste redirect URL (or raw code)
+# Example: http://localhost:1455/auth/callback?code=...&state=...
+print "Paste callback URL or code: "
+input = STDIN.gets&.strip
+# 3) Exchange for initial tokens
+tokens = flow.exchange_code(input, start[:code_verifier], expected_state: start[:state])
+puts tokens
+# => {
+#   access_token: "...",
+#   refresh_token: "...",
+#   expires_at: <Time>,
+#   account_id: "..."
+# }
+```
+### Get initial tokens (Anthropic OAuth)
+```ruby
+require "llm_gateway"
+flow = LlmGateway::Clients::ClaudeCode::OAuthFlow.new
+# 1) Start flow (auth URL + PKCE verifier + state)
+start = flow.start
+puts "Open in browser: #{start[:authorization_url]}"
+# 2) After user auth, paste callback URL (or code)
+# Example callback contains ?code=...&state=...
+print "Paste callback URL or code: "
+input = STDIN.gets&.strip
+# 3) Exchange for initial tokens
+tokens = flow.exchange_code(input, start[:code_verifier], state: start[:state])
+puts tokens
+# => {
+#   access_token: "...",
+#   refresh_token: "...",
+#   expires_at: <Time>
+# }
 ```
-### Error Handling
+### Get a refresh token
-LlmGateway provides consistent error handling across all providers:
+### Exchange refresh token for access token
+Use the built-in token managers in this repo. The `on_token_refresh` callback is called after a successful refresh; use it to persist the updated credentials.
+OpenAI Codex OAuth:
 ```ruby
-begin
-  result = LlmGateway::Client.chat('invalid-model', 'Hello')
-rescue LlmGateway::Errors::UnsupportedModel => e
-  puts "Unsupported model: #{e.message}"
-rescue LlmGateway::Errors::AuthenticationError => e
-  puts "Authentication failed: #{e.message}"
-rescue LlmGateway::Errors::RateLimitError => e
-  puts "Rate limit exceeded: #{e.message}"
+require "llm_gateway"
+manager = LlmGateway::Clients::OpenAI::TokenManager.new(
+  refresh_token: stored_refresh_token,
+  access_token: stored_access_token,   # optional
+  expires_at: stored_expires_at         # optional
+)
+manager.on_token_refresh = lambda do |new_access_token, new_refresh_token, new_expires_at|
+  # Persist updated credentials in your DB/secrets store
 end
+current_access_token = manager.access_token
 ```
-## Development
+Anthropic OAuth:
+```ruby
+require "llm_gateway"
+manager = LlmGateway::Clients::ClaudeCode::TokenManager.new(
+  refresh_token: stored_refresh_token,
+  access_token: stored_access_token,    # optional
+  expires_at: stored_expires_at,        # optional
+  client_id: ENV.fetch("ANTHROPIC_CLIENT_ID"),
+  client_secret: ENV["ANTHROPIC_CLIENT_SECRET"] # optional depending on app setup
+)
+manager.on_token_refresh = lambda do |new_access_token, new_refresh_token, new_expires_at|
+  # Persist updated credentials
+end
-After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake test` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
+current_access_token = manager.access_token
+```
+### Pass access token in provider requests
+Build the provider with the current access token:
+```ruby
+adapter = LlmGateway.build_provider(
+  provider: "openai_codex",
+  access_token: current_access_token,
+  model_key: "gpt-5.4"
+)
+result = adapter.stream("Hello from OAuth auth")
+puts result.content.select { |b| b.type == "text" }.map(&:text).join
+```
-To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and the created tag, and push the `.gem` file to [rubygems.org](https://rubygems.org).
+If your app refreshes tokens in the background, rebuild the adapter (or recreate client state) with the newest `access_token` before subsequent calls.
-## Contributing
+### Token refresh responsibility
-Bug reports and pull requests are welcome on GitHub at https://github.com/Hyper-Unearthing/llm_gateway. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [code of conduct](https://github.com/Hyper-Unearthing/llm_gateway/blob/master/CODE_OF_CONDUCT.md).
+#### Library’s role (llm_gateway)
-## License
+- Provides token manager helpers.
+- Detects expiry from `expires_at`.
+- Refreshes access token when asked (`ensure_valid_token` / refresh methods).
+- Returns updated token values and triggers the `on_token_refresh` callback after successful refresh.
+- Uses whatever access token you pass into provider requests.
-The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
+#### User/app’s role
-## Code of Conduct
+- Persist tokens securely (DB/secrets store).
+- Store and pass `access_token`, `refresh_token`, `expires_at` into the token manager.
+- Implement `on_token_refresh` to save updated credentials.
+- Decide refresh/retry policy at app level (e.g., retry failed request after refresh when appropriate).
+- Rebuild client/provider state with latest access token for future calls.
-Everyone interacting in the LlmGateway project's codebases, issue trackers, chat rooms and mailing lists is expected to follow the [code of conduct](https://github.com/Hyper-Unearthing/llm_gateway/blob/master/CODE_OF_CONDUCT.md).
+In short: the library executes the refresh mechanics; your app owns token persistence and operational policy.
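On the app side, this split might look like the sketch below. `CredentialStore` is a hypothetical in-memory class standing in for a database or secrets store, and the 60-second skew is an arbitrary assumption; only the three-argument callback shape comes from the `on_token_refresh` examples above.

```ruby
# Hypothetical app-side credential store; not part of llm_gateway.
class CredentialStore
  attr_reader :access_token, :refresh_token, :expires_at

  def initialize(access_token:, refresh_token:, expires_at:)
    save(access_token, refresh_token, expires_at)
  end

  # In a real app this would write to a DB or secrets store.
  def save(access_token, refresh_token, expires_at)
    @access_token = access_token
    @refresh_token = refresh_token
    @expires_at = expires_at
  end

  # Treats the token as expired slightly early to allow for clock skew.
  def expired?(now: Time.now, skew_seconds: 60)
    expires_at.nil? || now >= expires_at - skew_seconds
  end
end

store = CredentialStore.new(
  access_token: "old-access",
  refresh_token: "old-refresh",
  expires_at: Time.now - 10
)

# Same three-argument shape as the on_token_refresh lambdas shown earlier.
on_token_refresh = lambda do |new_access_token, new_refresh_token, new_expires_at|
  store.save(new_access_token, new_refresh_token, new_expires_at)
end

# Simulate the library reporting refreshed credentials.
on_token_refresh.call("new-access", "new-refresh", Time.now + 3600) if store.expired?

puts store.access_token
puts store.expired?
```

After the callback runs, the app would rebuild the adapter with `store.access_token`, as described under "Pass access token in provider requests".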