RubyGems - llm_gateway - Versions diffs - 0.5.0 → 0.6.0 - Mend

llm_gateway 0.5.0 → 0.6.0

Files changed (19)

checksums.yaml +4 -4
data/CHANGELOG.md +26 -0
data/README.md +95 -42
data/docs/migration_guide_0.6.0.md +386 -0
data/lib/llm_gateway/adapters/adapter.rb +7 -10
data/lib/llm_gateway/adapters/anthropic/stream_mapper.rb +33 -6
data/lib/llm_gateway/adapters/normalized_stream_accumulator.rb +87 -26
data/lib/llm_gateway/adapters/openai/chat_completions/stream_mapper.rb +40 -16
data/lib/llm_gateway/adapters/openai/responses/stream_mapper.rb +42 -21
data/lib/llm_gateway/adapters/stream_mapper.rb +9 -2
data/lib/llm_gateway/adapters/structs.rb +102 -52
data/lib/llm_gateway/base_client.rb +2 -4
data/lib/llm_gateway/clients/anthropic.rb +5 -4
data/lib/llm_gateway/clients/groq.rb +8 -6
data/lib/llm_gateway/clients/openai.rb +20 -18
data/lib/llm_gateway/prompt.rb +35 -17
data/lib/llm_gateway/version.rb +1 -1
data/lib/llm_gateway.rb +3 -21
metadata +3 -2

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: ce9b9e4f2137a73474b1ed5f0876d8b1bf6185666ab8c756c3a3f67e99e9d86e
-  data.tar.gz: 5735832e4bd57946ffc0a251c5a3a0861af0fc12a989456e1f877675e08846ba
+  metadata.gz: '086d7bdff1cb0b6b3febb78d025d7ccfe4b53c6fd40fcb5cddebd335d786e437'
+  data.tar.gz: 1b2ea3af95f44d27c0c1636da321d24dc036fad8a242263f608948c79ac11f88
 SHA512:
-  metadata.gz: afd52f4ead29acf7a612a06456e203e295534d2cb2275a7ea99be5840da39a821f4727402687bd9c3696bc0081c12f09861aa1b7ad135f986054625c68341422
-  data.tar.gz: 9c033b13f91e9315aadedca98cb61e32c01584a4e6cbe4f05b3782eb84287d24e50dbbc5e1bc127d14bc362d2aac262a589b810614a01d432d02f72acb3013a7
+  metadata.gz: '0147478704832819ee6d8fbe4e0e6203f4e598d72fd3b23138b550de9da64fb90cd8354713a5553244acf17b9c6fe0a89a0b5cab624f03ec7382e12f11aebb21'
+  data.tar.gz: 22e1ff9571717ebe8f39a31cd36d37815c6053def32ac1e125a103ccb516a98b37aeb225edae899c6a9ecf121df5344bbbe6f6166a2e1d89ffa05db881c70e14
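
The digests in `checksums.yaml` cover the `metadata.gz` and `data.tar.gz` members of the `.gem` archive. A minimal sketch of how such entries could be checked against file contents follows; `verify_checksums` is a hypothetical helper written for illustration, not part of RubyGems or `llm_gateway`, and it assumes the member bytes have already been read into memory.

```ruby
require "digest"
require "digest/sha2"

# Hypothetical lookup from the algorithm names used in checksums.yaml
# to Ruby's standard digest classes.
DIGEST_CLASSES = {
  "SHA256" => Digest::SHA256,
  "SHA512" => Digest::SHA512
}.freeze

# files:    { "metadata.gz" => bytes, "data.tar.gz" => bytes }
# expected: { "SHA256" => { "metadata.gz" => hex, ... }, "SHA512" => { ... } }
# Returns one [algorithm, file name, matched?] triple per expected entry.
def verify_checksums(files, expected)
  expected.flat_map do |algorithm, entries|
    digest_class = DIGEST_CLASSES.fetch(algorithm)
    entries.map do |name, hex|
      actual = digest_class.hexdigest(files.fetch(name))
      [algorithm, name, actual == hex.to_s]
    end
  end
end
```

Every triple should report `true` for an intact package; a `false` entry means the bytes do not match the recorded digest.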

data/CHANGELOG.md CHANGED

@@ -1,5 +1,31 @@
 # Changelog
+## [v0.6.0](https://github.com/Hyper-Unearthing/llm_gateway/tree/v0.6.0) (2026-05-27)
+[Full Changelog](https://github.com/Hyper-Unearthing/llm_gateway/compare/v0.5.0...v0.6.0)
+**Closed issues:**
+- issues with token normalization [\#75](https://github.com/Hyper-Unearthing/llm_gateway/issues/75)
+- Add normalized token usage fields for streamed responses [\#72](https://github.com/Hyper-Unearthing/llm_gateway/issues/72)
+- Add timestamp metadata to messages [\#70](https://github.com/Hyper-Unearthing/llm_gateway/issues/70)
+- Build final AssistantMessage in stream pipeline and include it on message\_end [\#69](https://github.com/Hyper-Unearthing/llm_gateway/issues/69)
+- Expose finalized content on stream \_end events [\#68](https://github.com/Hyper-Unearthing/llm_gateway/issues/68)
+- Add accumulated AssistantMessage partials to stream events [\#66](https://github.com/Hyper-Unearthing/llm_gateway/issues/66)
+- 1.0 [\#37](https://github.com/Hyper-Unearthing/llm_gateway/issues/37)
+**Merged pull requests:**
+- Improve token normalization [\#78](https://github.com/Hyper-Unearthing/llm_gateway/pull/78) ([billybonks](https://github.com/billybonks))
+- fix\(tests\): the hand off tests were totally fake, now they work [\#77](https://github.com/Hyper-Unearthing/llm_gateway/pull/77) ([billybonks](https://github.com/billybonks))
+- Improve message event metadata and helpers [\#74](https://github.com/Hyper-Unearthing/llm_gateway/pull/74) ([billybonks](https://github.com/billybonks))
+- fix: update migration guide [\#73](https://github.com/Hyper-Unearthing/llm_gateway/pull/73) ([billybonks](https://github.com/billybonks))
+- feat: add partial message as part of streaming events [\#67](https://github.com/Hyper-Unearthing/llm_gateway/pull/67) ([billybonks](https://github.com/billybonks))
+- docs: add migration guide for upcomming version [\#65](https://github.com/Hyper-Unearthing/llm_gateway/pull/65) ([billybonks](https://github.com/billybonks))
+- Decouple model selection from provider auth configuration [\#64](https://github.com/Hyper-Unearthing/llm_gateway/pull/64) ([billybonks](https://github.com/billybonks))
+- burn: support for legacy provider keys [\#63](https://github.com/Hyper-Unearthing/llm_gateway/pull/63) ([billybonks](https://github.com/billybonks))
+- docs: add docs about options for stream method [\#62](https://github.com/Hyper-Unearthing/llm_gateway/pull/62) ([billybonks](https://github.com/billybonks))
 ## [v0.5.0](https://github.com/Hyper-Unearthing/llm_gateway/tree/v0.5.0) (2026-05-20)
 [Full Changelog](https://github.com/Hyper-Unearthing/llm_gateway/compare/v0.4.0...v0.5.0)

data/README.md CHANGED

@@ -7,6 +7,9 @@ Provide a unified translation interface for LLM Provider API's, While allowing d
 - [Principles:](#principles)
 - [Installation](#installation)
 - [Supported Providers](#supported-providers)
+- [Stream Options](#stream-options)
+  - [Managed cross-provider options](#managed-cross-provider-options)
+  - [Provider-specific options](#provider-specific-options)
 - [Quick Start: Streaming (all events)](#quick-start-streaming-all-events)
   - [Stream API without handling events (final result only)](#stream-api-without-handling-events-final-result-only)
 - [Migration guides](#migration-guides)
@@ -56,7 +59,53 @@ gem "llm_gateway"
 | OpenAI Codex | `openai_codex`            | OAuth   | Responses            |
 | Groq      | `groq_completions`           | API key | Chat Completions     |
-Legacy keys (`*_apikey_*`, `*_oauth_*`) are still supported for backward compatibility.
+Provider configuration only contains auth/client settings (for example `api_key` or `access_token`). Pass the model per request with `model:` when calling `chat` or `stream`.
+## Stream Options
+Pass options to `stream` as keyword arguments alongside `tools:` and `system:`:
+```ruby
+result = adapter.stream(
+  transcript,
+  system: "You are concise.",
+  reasoning: "high",
+  cache_key: "conversation-123",
+  cache_retention: "short",
+  max_completion_tokens: 2_000
+)
+```
+Options are split into two groups:
+1. **Managed cross-provider options**: normalized by `llm_gateway` and mapped to each provider API when supported.
+2. **Provider-specific options**: passed through only when that provider/API pair explicitly allows them.
+Unknown provider-specific options raise `ArgumentError` with the valid option list for that provider/API pair.
+### Managed cross-provider options
+| Option | Accepted values | What it means | Provider mapping notes |
+|--------|-----------------|---------------|------------------------|
+| `reasoning` | `"none"`, `"low"`, `"medium"`, `"high"`, `"xhigh"` | Request provider reasoning/thinking effort. | Anthropic maps to `thinking` token budgets. OpenAI Responses maps to `reasoning`. OpenAI Chat Completions maps to `reasoning_effort`. Groq maps to `reasoning_effort` and `reasoning_format: "parsed"`; Groq accepts `"default"`, `"low"`, `"medium"`, `"high"` and does not accept `"xhigh"`. |
+| `cache_key` | String | Stable prompt/session cache key. | OpenAI Chat Completions and OpenAI Responses map this to `prompt_cache_key`. |
+| `cache_retention` | `"short"`, `"long"`, `"none"` | Requested cache retention policy for `cache_key`. | OpenAI maps `"short"` to `"in_memory"`, `"long"` to `"24h"`, and `"none"` removes prompt-cache fields. If `cache_key` is set without retention, OpenAI defaults to `"short"`. |
+| `max_completion_tokens` | Integer | Maximum generated tokens using gateway naming. | Anthropic maps to `max_tokens`; OpenAI Responses maps to `max_output_tokens`; OpenAI/Groq Chat Completions use `max_completion_tokens`. OpenAI Codex currently removes token limit parameters before sending. |
+| `response_format` | String or Hash, provider-dependent | Requested final response shape, e.g. text or JSON. | OpenAI Chat Completions and Groq pass this as `response_format`; OpenAI Responses maps it under `text.format`; Anthropic maps JSON-ish formats to `output_config`. |
+### Provider-specific options
+Provider-specific options are maintained as explicit allowlists in the option mapper source. Use the mapper link to see the current allowed Ruby option keys and the provider documentation link for upstream meanings and values.
+| Provider key | Provider/API pair | Option mapper source | Provider API documentation |
+|--------------|-------------------|----------------------|----------------------------|
+| `anthropic_messages` | Anthropic Messages Create | [`lib/llm_gateway/adapters/anthropic_option_mapper.rb`](lib/llm_gateway/adapters/anthropic_option_mapper.rb) | [Anthropic Messages API](https://platform.claude.com/docs/en/api/messages/create.md) |
+| `openai_completions` | OpenAI Chat Completions Create | [`lib/llm_gateway/adapters/openai/chat_completions/option_mapper.rb`](lib/llm_gateway/adapters/openai/chat_completions/option_mapper.rb) | [OpenAI Chat Completions API](https://developers.openai.com/api/reference/resources/chat/subresources/completions/methods/create/index.md) |
+| `openai_responses` | OpenAI Responses Create | [`lib/llm_gateway/adapters/openai/responses/option_mapper.rb`](lib/llm_gateway/adapters/openai/responses/option_mapper.rb) | [OpenAI Responses API](https://developers.openai.com/api/reference/resources/responses/methods/create/index.md) |
+| `openai_codex` | OpenAI Codex Responses-compatible endpoint | [`lib/llm_gateway/adapters/openai_codex/option_mapper.rb`](lib/llm_gateway/adapters/openai_codex/option_mapper.rb) | [OpenAI Responses API](https://developers.openai.com/api/reference/resources/responses/methods/create/index.md) |
+| `groq_completions` | Groq Chat Completions Create | [`lib/llm_gateway/adapters/groq/option_mapper.rb`](lib/llm_gateway/adapters/groq/option_mapper.rb) | [Groq Chat API](https://console.groq.com/docs/api-reference.md#chat-create) |
+Common provider-native options you may pass directly when allowed include OpenAI `prompt_cache_key` / `prompt_cache_retention` and Groq `reasoning_effort` / `reasoning_format`. Prefer the managed options above when you want portable behavior across providers.
 ## Quick Start: Streaming (all events)
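
The managed-options table in the hunk above states that OpenAI maps `cache_key` to `prompt_cache_key`, maps `cache_retention` values `"short"` and `"long"` to `"in_memory"` and `"24h"`, drops prompt-cache fields for `"none"`, and defaults to `"short"` when only `cache_key` is given. The sketch below restates those rules as a standalone function. `map_cache_options` and `OPENAI_RETENTION` are illustrative names, not the gem's option mapper, and returning an empty hash when no `cache_key` is supplied is an assumption.

```ruby
# Retention values as listed in the README table (gateway name => OpenAI name).
OPENAI_RETENTION = { "short" => "in_memory", "long" => "24h" }.freeze

# Illustrative only: translate gateway cache options into OpenAI-style fields.
def map_cache_options(cache_key: nil, cache_retention: nil)
  # Assumption: without a cache key there is nothing to send.
  # "none" removes the prompt-cache fields entirely.
  return {} if cache_key.nil? || cache_retention == "none"

  retention = cache_retention || "short" # documented default
  mapped = OPENAI_RETENTION.fetch(retention) do
    raise ArgumentError, "unknown cache_retention: #{retention.inspect}"
  end

  { prompt_cache_key: cache_key, prompt_cache_retention: mapped }
end
```

Keeping the mapping in one frozen hash makes the accepted values easy to audit against the table.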
@@ -67,10 +116,8 @@ require "json"
 # Build a provider adapter directly (not via prebuilt config)
 adapter = LlmGateway.build_provider(
   provider: "openai_responses", # or anthropic_messages, groq_completions, ...
-  api_key: ENV.fetch("OPENAI_API_KEY"),
-  model_key: "gpt-5.4"
+  api_key: ENV.fetch("OPENAI_API_KEY")
 )
 tools = [
   {
     name: "get_time",
@@ -90,15 +137,15 @@ transcript = [
 streamed_tool_args = Hash.new { |h, k| h[k] = +"" }
-response = adapter.stream(transcript, tools: tools, reasoning: "high") do |event|
+response = adapter.stream(transcript, tools: tools, model: "gpt-5.4", reasoning: "high") do |event|
   case event.type
   # AssistantStreamMessageEvent
   when :message_start
     puts "\n[message_start] #{event.delta.inspect}"
   when :message_delta
-    puts "\n[message_delta] #{event.delta.inspect} usage+=#{event.usage_increment.inspect}"
+    puts "\n[message_delta] #{event.delta.inspect} usage=#{event.usage.inspect}"
   when :message_end
-    puts "\n[message_end]"
+    puts "\n[message_end] final_id=#{event.message.id} stop_reason=#{event.message.stop_reason}"
   # Text events
   when :text_start
@@ -141,6 +188,7 @@ puts "id: #{response.id}"
 puts "model: #{response.model}"
 puts "provider/api: #{response.provider}/#{response.api}"
 puts "role: #{response.role}"
+puts "timestamp: #{response.timestamp}" # Unix milliseconds
 puts "stop_reason: #{response.stop_reason}"
 puts "error_message: #{response.error_message.inspect}" if response.error_message
 puts "usage: #{response.usage.inspect}"
@@ -159,12 +207,23 @@ end
 ```
 Stream callback event families:
-- `AssistantStreamMessageEvent`: `:message_start`, `:message_delta`, `:message_end`
+- `AssistantStreamMessageEvent`: `:message_start`, `:message_delta`
+- `AssistantStreamMessageEndEvent`: `:message_end` with the final `event.message`
 - `AssistantStreamEvent` (and subclasses):
   - Text: `:text_start`, `:text_delta`, `:text_end`
   - Tool call: `:tool_start`, `:tool_delta`, `:tool_end`
   - Reasoning: `:reasoning_start`, `:reasoning_delta`, `:reasoning_end`
+Non-final stream events expose `event.partial`, a `PartialAssistantMessage` snapshot accumulated so far. The final `:message_end` event exposes the complete `AssistantMessage` as `event.message` instead.
+End events include helpers for the finalized current content block:
+- `event.content` for `:text_end`, `:reasoning_end`, and `:tool_end`
+- `event.text` for `:text_end`
+- `event.reasoning` for `:reasoning_end`
+- `event.tool_call` / `event.tool` for `:tool_end`
+Usage counters are normalized as `:input`, `:cache_write`, `:cache_read`, `:output`, and `:total`. `:total` is the sum of all input-side buckets plus output. `usage[:raw]` contains the original provider usage/token payload.
 ### Stream API without handling events (final result only)
 If you only care about the final `AssistantMessage`, call `stream` without a block:
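
The added README text in this hunk defines `:total` as the sum of all input-side buckets plus output. A small sketch of that arithmetic follows, assuming "input-side buckets" means `:input`, `:cache_write`, and `:cache_read`; `usage_total` is a hypothetical helper, not a method the gem exposes.

```ruby
# Assumed input-side keys, taken from the normalized usage keys the README lists.
INPUT_SIDE_KEYS = %i[input cache_write cache_read].freeze

# Recompute :total from a normalized usage hash; missing counters count as 0.
def usage_total(usage)
  input_side = INPUT_SIDE_KEYS.sum { |key| usage.fetch(key, 0) }
  input_side + usage.fetch(:output, 0)
end

usage = { input: 100, cache_write: 20, cache_read: 30, output: 50 }
puts usage_total(usage) # 100 + 20 + 30 + 50 = 200
```

A check like this can be useful in tests that compare `usage[:total]` against the individual counters after upgrading.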
@@ -173,14 +232,14 @@ If you only care about the final `AssistantMessage`, call `stream` without a blo
 require "llm_gateway"
 adapter = LlmGateway.build_provider(
-  provider: "openai_apikey_responses",
-  api_key: ENV.fetch("OPENAI_API_KEY"),
-  model_key: "gpt-5.4"
+  provider: "openai_responses",
+  api_key: ENV.fetch("OPENAI_API_KEY")
 )
-result = adapter.stream("Write one short sentence about Ruby.")
+result = adapter.stream("Write one short sentence about Ruby.", model: "gpt-5.4")
 puts result.role         # "assistant"
+puts result.timestamp    # Unix milliseconds
 puts result.stop_reason  # "stop" (usually)
 puts result.usage.inspect
@@ -194,7 +253,8 @@ puts text
 ## Migration guides
-- [Migrating from `chat` to `stream`](docs/chat-to-stream-migration.md) — use `stream` without a block when you only need the final response.
+- [0.6.0 migration guide](docs/migration_guide_0.6.0.md) — move `model_key` to per-request `model:`, update provider keys, update `Prompt` usage, and migrate stream event/usage changes.
+- [Migrating from `chat` to `stream`](docs/migration-guide.md) — use `stream` without a block when you only need the final response.
 ## Tools
@@ -228,11 +288,9 @@ require "llm_gateway"
 require "json"
 adapter = LlmGateway.build_provider(
-  provider: "openai_apikey_responses",
-  api_key: ENV.fetch("OPENAI_API_KEY"),
-  model_key: "gpt-5.4"
+  provider: "openai_responses",
+  api_key: ENV.fetch("OPENAI_API_KEY")
 )
 weather_tool = {
   name: "get_weather",
   description: "Get current weather for a location",
@@ -261,7 +319,7 @@ transcript = [
 ]
 # 1) First model pass (stream API, no event block)
-response = adapter.stream(transcript, tools: [weather_tool])
+response = adapter.stream(transcript, tools: [weather_tool], model: "gpt-5.4")
 transcript << response.to_h
 # 2) Execute tool calls returned by the model
@@ -284,7 +342,7 @@ end
 # 3) Continue the conversation after tool execution
 if response.content.any? { |b| b.type == "tool_use" }
-  final_response = adapter.stream(transcript, tools: [weather_tool])
+  final_response = adapter.stream(transcript, tools: [weather_tool], model: "gpt-5.4")
   final_text = final_response.content
     .select { |b| b.type == "text" }
@@ -309,11 +367,9 @@ require "llm_gateway"
 require "base64"
 adapter = LlmGateway.build_provider(
-  provider: "openai_apikey_responses",
-  api_key: ENV.fetch("OPENAI_API_KEY"),
-  model_key: "gpt-5.4"
+  provider: "openai_responses",
+  api_key: ENV.fetch("OPENAI_API_KEY")
 )
 image_b64 = Base64.strict_encode64(File.binread("./chart.png"))
 message = [
@@ -326,7 +382,7 @@ message = [
   }
 ]
-result = adapter.stream(message) # stream API, no event block
+result = adapter.stream(message, model: "gpt-5.4") # stream API, no event block
 text = result.content
   .select { |b| b.type == "text" }
@@ -346,18 +402,18 @@ You can request higher-effort reasoning by passing `reasoning:` to `stream`.
 require "llm_gateway"
 adapter = LlmGateway.build_provider(
-  provider: "openai_apikey_responses",
-  api_key: ENV.fetch("OPENAI_API_KEY"),
-  model_key: "gpt-5.4"
+  provider: "openai_responses",
+  api_key: ENV.fetch("OPENAI_API_KEY")
 )
 result = adapter.stream(
   "Think step by step and then compute 482 * 17.",
+  model: "gpt-5.4",
   reasoning: "high"
 )
 puts "stop_reason: #{result.stop_reason}"
-puts "usage: #{result.usage.inspect}" # may include reasoning_tokens depending on provider
+puts "usage: #{result.usage.inspect}" # normalized keys: :input, :cache_write, :cache_read, :output, :total, :raw
 result.content.each do |block|
   case block.type
@@ -377,7 +433,7 @@ If you want incremental thinking/reasoning tokens as they arrive, pass a block t
 ```ruby
 reasoning_text = +""
-result = adapter.stream("Solve 99 * 99 with brief reasoning.", reasoning: "high") do |event|
+result = adapter.stream("Solve 99 * 99 with brief reasoning.", model: "gpt-5.4", reasoning: "high") do |event|
   case event.type
   when :reasoning_start
     print "\n[thinking start]\n"
@@ -405,7 +461,7 @@ puts "Final stop_reason: #{result.stop_reason}"
   - fields: `reasoning` and optional `signature`
 - Usage accounting:
   - normalized in `result.usage` when provided by the upstream API
-  - may include `:reasoning_tokens` plus standard token counters
+  - keys are `:input`, `:cache_write`, `:cache_read`, `:output`, `:total`, and `:raw`
 In practice this means you can:
 - listen to `:reasoning_*` stream event variants, and
@@ -439,7 +495,7 @@ What happens under the hood on `stream`/`chat`:
 5. **Map response back to canonical output**
    - Stream chunks are mapped into normalized stream events.
-   - Final output is accumulated into a normalized `AssistantMessage` (`id`, `model`, `usage`, `stop_reason`, `content`, etc.).
+   - Final output is accumulated into a normalized `AssistantMessage` (`id`, `model`, `timestamp` as Unix milliseconds, `usage`, `stop_reason`, `content`, etc.).
 Why this matters:
 - A transcript produced by one provider can be reused with another provider without manually rewriting message structure.
@@ -455,18 +511,16 @@ require "llm_gateway"
 require "json"
 adapter = LlmGateway.build_provider(
-  provider: "openai_apikey_responses",
-  api_key: ENV.fetch("OPENAI_API_KEY"),
-  model_key: "gpt-5.4"
+  provider: "openai_responses",
+  api_key: ENV.fetch("OPENAI_API_KEY")
 )
 # Build context (transcript)
 transcript = [
   { role: "user", content: "Plan a 3-day trip to Tokyo." }
 ]
 # Run one turn and persist assistant output
-first = adapter.stream(transcript)
+first = adapter.stream(transcript, model: "gpt-5.4")
 transcript << first.to_h
 # Serialize (store in DB/file/cache)
@@ -477,7 +531,7 @@ restored_transcript = JSON.parse(json_context)
 # Continue conversation from restored context
 restored_transcript << { role: "user", content: "Now make it budget-friendly." }
-second = adapter.stream(restored_transcript)
+second = adapter.stream(restored_transcript, model: "gpt-5.4")
 puts second.content.select { |b| b.type == "text" }.map(&:text).join
 ```
@@ -491,7 +545,7 @@ Tip: if you serialize to JSON, keys become strings on parse; `llm_gateway` accep
 ## OAuth
-Use OAuth-capable providers (for example `openai_codex` and `anthropic_oauth_messages`) by supplying an `access_token` when building the adapter.
+Use OAuth-capable providers (for example `openai_codex` and `anthropic_messages`) by supplying an `access_token` when building the adapter.
 ### Get initial tokens (Codex / OpenAI OAuth)
@@ -599,11 +653,10 @@ Build the provider with the current access token:
 ```ruby
 adapter = LlmGateway.build_provider(
   provider: "openai_codex",
-  access_token: current_access_token,
-  model_key: "gpt-5.4"
+  access_token: current_access_token
 )
-result = adapter.stream("Hello from OAuth auth")
+result = adapter.stream("Hello from OAuth auth", model: "gpt-5.4")
 puts result.content.select { |b| b.type == "text" }.map(&:text).join
 ```
@@ -641,6 +694,6 @@ bundle exec ruby -Itest test/integration/live/stream_test.rb
 Cassette names are derived from the test file and test name, with VCR sanitizing path segments such as `stream_test.rb` to `stream_test_rb`.
-For OAuth-backed providers (`anthropic_oauth_messages`, `openai_oauth_codex`), the live test helper only loads real OAuth credentials while the cassette is being recorded. Once the cassette exists, replay uses placeholder tokens/account IDs so the test suite can run without local OAuth state. API-key providers still require the relevant API key when recording. Sensitive authorization headers and selected response headers are redacted before cassettes are written.
+For OAuth-backed providers (`anthropic_messages`, `openai_codex`), the live test helper only loads real OAuth credentials while the cassette is being recorded. Once the cassette exists, replay uses placeholder tokens/account IDs so the test suite can run without local OAuth state. API-key providers still require the relevant API key when recording. Sensitive authorization headers and selected response headers are redacted before cassettes are written.
 Some tests pass `redact_request_body: true` to `with_vcr_adapter`; those cassettes match on method and URI only and replace large request bodies with `"<huge prompt body redacted>"`.
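
Taken together, the README hunks above replace three legacy provider keys and move `model_key:` out of `build_provider` into a per-request `model:` argument. The sketch below shows one way an existing configuration hash could be split accordingly. `migrate_provider_config` and `LEGACY_PROVIDER_KEYS` are hypothetical names, and the rename table contains only the pairs visible in this diff, so other legacy keys may exist.

```ruby
# Renames visible in the README diff (old key => new key).
LEGACY_PROVIDER_KEYS = {
  "openai_apikey_responses" => "openai_responses",
  "anthropic_oauth_messages" => "anthropic_messages",
  "openai_oauth_codex" => "openai_codex"
}.freeze

# Hypothetical helper: returns [provider_config, model] without mutating the input.
def migrate_provider_config(config)
  migrated = config.dup
  model = migrated.delete(:model_key) # the model is now passed per request
  provider = migrated[:provider]
  migrated[:provider] = LEGACY_PROVIDER_KEYS.fetch(provider, provider)
  [migrated, model]
end
```

The returned pair lines up with the 0.6.0 call shape shown in the README: the first element goes to `LlmGateway.build_provider(**provider_config)` and the second is passed as `model:` on each `stream` call.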