open_router_enhanced 1.0.0 → 1.2.0

@@ -104,12 +104,15 @@ end
  client.on(:on_healing) do |healing_data|
    if healing_data[:healed]
      puts "Successfully healed JSON response"
+     puts "Attempts: #{healing_data[:attempts]}"
    else
      puts "JSON healing failed: #{healing_data[:error]}"
    end
  end
  ```
 
+ **Note**: For detailed information about when auto-healing triggers, how it works, and configuration options, see the [Structured Outputs documentation](structured_outputs.md#json-auto-healing).
+
  ### 4. Streaming Observability
  Enhanced streaming support with detailed event callbacks:
 
data/docs/plugins.md ADDED
@@ -0,0 +1,183 @@
+ # OpenRouter Plugins
+
+ OpenRouter provides plugins that extend model capabilities. The gem supports all OpenRouter plugins and automatically enables response healing for structured outputs.
+
+ ## Available Plugins
+
+ | Plugin | ID | Description |
+ |--------|-----|-------------|
+ | Response Healing | `response-healing` | Fixes malformed JSON responses |
+ | Web Search | `web-search` | Augments responses with real-time web search |
+ | PDF Inputs | `pdf-inputs` | Parses and extracts content from PDF files |
+
+ ## Basic Usage
+
+ ```ruby
+ # Specify plugins in your request
+ response = client.complete(
+   messages,
+   model: "openai/gpt-4o-mini",
+   plugins: [{ id: "web-search" }]
+ )
+
+ # Multiple plugins
+ response = client.complete(
+   messages,
+   model: "openai/gpt-4o-mini",
+   plugins: [
+     { id: "web-search" },
+     { id: "pdf-inputs" }
+   ]
+ )
+ ```
+
+ ## Response Healing Plugin
+
+ The response-healing plugin fixes common JSON formatting issues server-side:
+
+ - Missing brackets, commas, and quotes
+ - Trailing commas
+ - Markdown-wrapped JSON
+ - Text mixed with JSON
+ - Unquoted object keys
+
+ ### Automatic Activation
+
+ The gem **automatically adds** the response-healing plugin when:
+ 1. Using structured outputs (`response_format` is set)
+ 2. Not streaming
+ 3. `auto_native_healing` is enabled (default: true)
+
+ ```ruby
+ # Response-healing is automatically added here
+ response = client.complete(
+   messages,
+   model: "openai/gpt-4o-mini",
+   response_format: schema
+ )
+ ```
+
+ ### Disable Automatic Healing
+
+ ```ruby
+ # Via configuration
+ OpenRouter.configure do |config|
+   config.auto_native_healing = false
+ end
+
+ # Via environment variable
+ # OPENROUTER_AUTO_NATIVE_HEALING=false
+ ```
+
+ ### Manual Control
+
+ ```ruby
+ # Explicitly add response-healing
+ response = client.complete(
+   messages,
+   model: "openai/gpt-4o-mini",
+   plugins: [{ id: "response-healing" }],
+   response_format: { type: "json_object" }
+ )
+
+ # Disable for a specific request (when auto is enabled)
+ response = client.complete(
+   messages,
+   model: "openai/gpt-4o-mini",
+   plugins: [{ id: "response-healing", enabled: false }],
+   response_format: schema
+ )
+ ```
+
+ ### Limitations
+
+ - **Non-streaming only**: Does not work with `stream: proc`
+ - **Syntax only**: Fixes JSON syntax, not schema conformance
+ - **Truncation issues**: May fail if response was cut off by `max_tokens`
+
+ For schema validation failures, use the gem's [client-side auto-healing](structured_outputs.md#json-auto-healing-client-side).
+
+ ## Web Search Plugin
+
+ Augments model responses with real-time web search results.
+
+ ```ruby
+ response = client.complete(
+   [{ role: "user", content: "What are the latest AI developments?" }],
+   model: "openai/gpt-4o-mini",
+   plugins: [{ id: "web-search" }]
+ )
+ ```
+
+ **Shortcut**: Append `:online` to the model ID:
+ ```ruby
+ response = client.complete(
+   messages,
+   model: "openai/gpt-4o-mini:online" # Enables web-search
+ )
+ ```
+
+ ## PDF Inputs Plugin
+
+ Enables models to process PDF file content.
+
+ ```ruby
+ response = client.complete(
+   [{ role: "user", content: "Summarize this PDF: [pdf content]" }],
+   model: "openai/gpt-4o-mini",
+   plugins: [{ id: "pdf-inputs" }]
+ )
+ ```
+
+ ## Plugin Configuration Options
+
+ Plugins can accept additional configuration:
+
+ ```ruby
+ # Enable/disable a plugin explicitly
+ plugins: [{ id: "response-healing", enabled: true }]
+
+ # Disable a default plugin for one request
+ plugins: [{ id: "response-healing", enabled: false }]
+ ```
+
+ ## Prediction Parameter (Latency Optimization)
+
+ The `prediction` parameter reduces latency by providing the model with an expected output:
+
+ ```ruby
+ response = client.complete(
+   [{ role: "user", content: "What is the capital of France?" }],
+   model: "openai/gpt-4o",
+   prediction: { type: "content", content: "The capital of France is Paris." }
+ )
+ ```
+
+ **When to use**:
+ - Code completion with predictable boilerplate
+ - Template filling where most content is known
+ - Minor corrections/refinements to existing text
+
+ **How it works**: Instead of generating from scratch, the model confirms/refines your prediction, which is faster when accurate.
+
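+ A minimal sketch of the refinement case (the file path, prompt, and draft contents are illustrative, not from the gem's docs):
+
+ ```ruby
+ draft = File.read("docs/announcement.md") # hypothetical file
+
+ response = client.complete(
+   [{ role: "user", content: "Fix any typos in this text:\n#{draft}" }],
+   model: "openai/gpt-4o",
+   # Most of the output should match the draft, so pass it as the prediction
+   prediction: { type: "content", content: draft }
+ )
+ ```
+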
+ ## Best Practices
+
+ 1. **Use native healing for structured outputs**: It's free and adds <1ms latency
+ 2. **Don't combine response-healing with streaming**: It won't work
+ 3. **Check model compatibility**: Not all models support all plugins
+ 4. **Monitor costs and latency**: Web search can add both cost and response latency
+
+ ## Comparison: Native vs Client-Side Healing
+
+ | Aspect | Native (Plugin) | Client-Side (Gem) |
+ |--------|-----------------|-------------------|
+ | Location | Server-side | Client-side |
+ | Cost | Free | API call per attempt |
+ | Latency | <1ms | Full LLM call |
+ | Fixes syntax | Yes | Yes |
+ | Fixes schema | No | Yes |
+ | Streaming | No | Yes |
+ | Auto-enabled | For structured outputs | When `auto_heal_responses = true` |
+
+ **Recommendation**: Use both! Native healing catches 80%+ of issues for free. Client-side healing handles the rest and validates against your schema.
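+
+ A minimal configuration sketch enabling both layers (this pairs the `auto_native_healing` option shown above with the `auto_heal_responses` option referenced in the table; treat it as a sketch, not required setup):
+
+ ```ruby
+ OpenRouter.configure do |config|
+   config.auto_native_healing = true # server-side plugin: free syntax fixes
+   config.auto_heal_responses = true # client-side: retry and validate against your schema
+ end
+ ```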
@@ -0,0 +1,298 @@
+ # Responses API (Beta)
+
+ The Responses API is an OpenAI-compatible stateless endpoint that provides access to multiple AI models with advanced reasoning capabilities.
+
+ > **Beta**: This API may have breaking changes. Use with caution in production.
+
+ ## Basic Usage
+
+ ```ruby
+ client = OpenRouter::Client.new
+
+ response = client.responses(
+   "What is the capital of France?",
+   model: "openai/gpt-4o-mini"
+ )
+
+ puts response.content # => "Paris"
+ ```
+
+ ## With Reasoning
+
+ The Responses API supports reasoning with configurable effort levels:
+
+ ```ruby
+ response = client.responses(
+   "What is 15% of 80? Show your work.",
+   model: "openai/o4-mini",
+   reasoning: { effort: "high" },
+   max_output_tokens: 500
+ )
+
+ # Access reasoning steps
+ if response.has_reasoning?
+   puts "Reasoning steps:"
+   response.reasoning_summary.each { |step| puts " - #{step}" }
+ end
+
+ puts "Answer: #{response.content}"
+ puts "Reasoning tokens used: #{response.reasoning_tokens}"
+ ```
+
+ ### Effort Levels
+
+ | Level | Description |
+ |-------|-------------|
+ | `minimal` | Basic reasoning with minimal computational effort |
+ | `low` | Light reasoning for simple problems |
+ | `medium` | Balanced reasoning for moderate complexity |
+ | `high` | Deep reasoning for complex problems |
+
+ ## Parameters
+
+ | Parameter | Type | Description |
+ |-----------|------|-------------|
+ | `input` | String or Array | The input text or structured message array (required) |
+ | `model` | String | Model identifier, e.g. `"openai/o4-mini"` (required) |
+ | `reasoning` | Hash | Reasoning config with `effort` key |
+ | `tools` | Array | Tool definitions for function calling |
+ | `tool_choice` | String/Hash | `"auto"`, `"none"`, `"required"`, or specific tool |
+ | `max_output_tokens` | Integer | Maximum tokens to generate |
+ | `temperature` | Float | Sampling temperature (0-2) |
+ | `top_p` | Float | Nucleus sampling parameter (0-1) |
+ | `extras` | Hash | Additional API parameters |
+
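+ A sketch combining several of these parameters in one call (the `extras` payload is an illustrative assumption; pass whatever additional API fields you need):
+
+ ```ruby
+ response = client.responses(
+   "Summarize the history of Ruby in two sentences.",
+   model: "openai/gpt-4o-mini",
+   temperature: 0.7,
+   top_p: 0.9,
+   max_output_tokens: 200,
+   extras: { user: "docs-example" } # hypothetical passthrough field
+ )
+ ```
+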
+ ## Structured Input
+
+ You can also use structured message arrays:
+
+ ```ruby
+ response = client.responses(
+   [
+     {
+       "type" => "message",
+       "role" => "user",
+       "content" => [
+         { "type" => "input_text", "text" => "Hello, world!" }
+       ]
+     }
+   ],
+   model: "openai/gpt-4o-mini"
+ )
+ ```
+
+ ## Response Object
+
+ The `ResponsesResponse` class provides convenient accessors:
+
+ ```ruby
+ response.id                 # Response ID
+ response.status             # "completed", "failed", etc.
+ response.model              # Model used
+ response.content            # Assistant's text response
+ response.output             # Raw output array
+
+ # Reasoning
+ response.has_reasoning?     # Boolean
+ response.reasoning_summary  # Array of reasoning steps
+
+ # Tool calls
+ response.has_tool_calls?    # Boolean
+ response.tool_calls         # Array of ResponsesToolCall objects
+ response.tool_calls_raw     # Array of raw hash data
+
+ # Token usage
+ response.input_tokens       # Input token count
+ response.output_tokens      # Output token count
+ response.total_tokens       # Total token count
+ response.reasoning_tokens   # Tokens used for reasoning
+ ```
+
+ ## Tool/Function Calling
+
+ The Responses API supports function calling with a simplified format. Tool calls are wrapped in `ResponsesToolCall` objects for easy execution.
+
+ ### Defining Tools
+
+ You can use the same tool format as Chat Completions - the gem automatically converts it:
+
+ ```ruby
+ tools = [
+   {
+     type: "function",
+     function: {
+       name: "get_weather",
+       description: "Get current weather for a location",
+       parameters: {
+         type: "object",
+         properties: {
+           location: { type: "string", description: "City name" },
+           units: { type: "string", enum: ["celsius", "fahrenheit"] }
+         },
+         required: ["location"]
+       }
+     }
+   }
+ ]
+
+ response = client.responses(
+   "What's the weather in San Francisco?",
+   model: "openai/gpt-4o-mini",
+   tools: tools
+ )
+ ```
+
+ You can also use the `Tool` DSL:
+
+ ```ruby
+ weather_tool = OpenRouter::Tool.define do
+   name "get_weather"
+   description "Get current weather for a location"
+   parameters do
+     string :location, required: true, description: "City name"
+     string :units, enum: %w[celsius fahrenheit]
+   end
+ end
+
+ response = client.responses(
+   "What's the weather in Tokyo?",
+   model: "openai/gpt-4o-mini",
+   tools: [weather_tool]
+ )
+ ```
+
+ ### Tool Choice
+
+ Control when the model uses tools with `tool_choice`:
+
+ ```ruby
+ # Let model decide (default)
+ response = client.responses(input, model: model, tools: tools, tool_choice: "auto")
+
+ # Force tool use
+ response = client.responses(input, model: model, tools: tools, tool_choice: "required")
+
+ # Prevent tool use
+ response = client.responses(input, model: model, tools: tools, tool_choice: "none")
+ ```
+
+ ### Executing Tool Calls
180
+
181
+ ```ruby
182
+ if response.has_tool_calls?
183
+ # Execute each tool call with a block
184
+ results = response.execute_tool_calls do |name, arguments|
185
+ case name
186
+ when "get_weather"
187
+ fetch_weather(arguments["location"], arguments["units"])
188
+ when "search_web"
189
+ search(arguments["query"])
190
+ else
191
+ { error: "Unknown function: #{name}" }
192
+ end
193
+ end
194
+
195
+ # Results are ResponsesToolResult objects
196
+ results.each do |result|
197
+ if result.success?
198
+ puts "#{result.tool_call.name}: #{result.result}"
199
+ else
200
+ puts "Error: #{result.error}"
201
+ end
202
+ end
203
+ end
204
+ ```
205
+
206
+ ### Multi-turn Tool Conversations
207
+
208
+ Use `build_follow_up_input` to continue conversations after tool execution:
209
+
210
+ ```ruby
211
+ # First call - model requests tool use
212
+ original_input = "What's the weather in NYC and Paris?"
213
+ response = client.responses(original_input, model: "openai/gpt-4o-mini", tools: tools)
214
+
215
+ # Execute the tool calls
216
+ results = response.execute_tool_calls do |name, args|
217
+ fetch_weather(args["location"])
218
+ end
219
+
220
+ # Build follow-up input with tool results
221
+ next_input = response.build_follow_up_input(
222
+ original_input: original_input,
223
+ tool_results: results
224
+ )
225
+
226
+ # Continue the conversation - model will use the tool results
227
+ final_response = client.responses(next_input, model: "openai/gpt-4o-mini")
228
+ puts final_response.content
229
+ # => "In NYC it's 72°F and sunny. In Paris it's 18°C and cloudy."
230
+ ```
+
+ ### Adding Follow-up Messages
+
+ You can include a follow-up question when building the input:
+
+ ```ruby
+ next_input = response.build_follow_up_input(
+   original_input: original_input,
+   tool_results: results,
+   follow_up_message: "Which city has better weather for a picnic?"
+ )
+ ```
+
+ ### Tool Call Objects
+
+ `ResponsesToolCall` provides:
+
+ ```ruby
+ tool_call.id               # Tool call ID
+ tool_call.call_id          # Call ID for result matching
+ tool_call.name             # Function name
+ tool_call.function_name    # Alias for name
+ tool_call.arguments        # Parsed arguments hash
+ tool_call.arguments_string # Raw JSON string
+ tool_call.to_input_item    # Convert to input format
+ ```
+
+ `ResponsesToolResult` provides:
+
+ ```ruby
+ result.tool_call     # Reference to the tool call
+ result.result        # Execution result (if successful)
+ result.error         # Error message (if failed)
+ result.success?      # Boolean
+ result.failure?      # Boolean
+ result.to_input_item # Convert to function_call_output format
+ ```
+
+ ## Comparison with Chat Completions
+
+ | Aspect | `complete()` | `responses()` |
+ |--------|--------------|---------------|
+ | Endpoint | `/chat/completions` | `/responses` |
+ | Input | `messages` array | `input` string or array |
+ | Output | `choices[].message` | `output[]` typed items |
+ | Reasoning | Not supported | `reasoning` parameter |
+ | Tool calling | Supported | Supported |
+ | Token limit | `max_tokens` | `max_output_tokens` |
+ | Streaming | Supported | Not yet supported |
+
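+ A side-by-side sketch of the two call styles (the chat-side accessor follows the `choices[].message` shape in the table above; adjust to however your response object exposes it):
+
+ ```ruby
+ question = "Name one prime number."
+
+ # Chat Completions: messages array, max_tokens
+ chat = client.complete(
+   [{ role: "user", content: question }],
+   model: "openai/gpt-4o-mini",
+   max_tokens: 20
+ )
+ puts chat.dig("choices", 0, "message", "content")
+
+ # Responses API: plain string input, max_output_tokens
+ resp = client.responses(question, model: "openai/gpt-4o-mini", max_output_tokens: 20)
+ puts resp.content
+ ```
+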
+ ## When to Use
+
+ Use the Responses API when you need:
+ - Built-in reasoning with effort control
+ - OpenAI Responses API compatibility
+ - Simpler input format (string instead of messages)
+
+ Use Chat Completions when you need:
+ - Streaming responses
+ - Full callback system integration
+ - Usage tracking integration
+ - Response healing features
+
+ ## Future Enhancements
+
+ The following features are planned but not yet implemented:
+ - Streaming support
+ - Callbacks integration
data/docs/streaming.md CHANGED
@@ -214,7 +214,7 @@ end
 
  ## Structured Outputs with Streaming
 
- Streaming works seamlessly with structured outputs:
+ Streaming works seamlessly with structured outputs. The response is streamed in real-time, then validated and parsed after accumulation completes.
 
  ```ruby
  # Define schema
@@ -225,18 +225,33 @@ user_schema = OpenRouter::Schema.define("user") do
  end
 
  # Stream with structured output
+ # IMPORTANT: accumulate_response must be true for structured outputs
  response = streaming_client.stream_complete(
    [{ role: "user", content: "Create a user: John Doe, 30, john@example.com" }],
    model: "openai/gpt-4o",
    response_format: user_schema,
-   accumulate_response: true
+   accumulate_response: true # Required for structured_output access
  )
 
- # Access structured output after streaming
+ # Access structured output after streaming completes
  user_data = response.structured_output
  puts "User: #{user_data['name']}, Age: #{user_data['age']}"
  ```
 
+ ### How Structured Outputs Work with Streaming
+
+ 1. **During Streaming**: Content chunks are streamed and displayed in real-time
+ 2. **After Accumulation**: The complete response is validated against your schema
+ 3. **Auto-Healing**: If enabled and needed, healing occurs after streaming completes
+ 4. **Validation**: Schema validation happens on the accumulated response
+
+ **Important Notes:**
+ - You must set `accumulate_response: true` to use `response.structured_output`
+ - Auto-healing (if configured) happens after streaming completes, not during streaming
+ - The `on_finish` callback receives the final, validated response (see the sketch below)
+
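+ A minimal sketch of that `on_finish` behavior (registration mirrors the `client.on(:on_healing)` callbacks shown elsewhere in these docs; the exact event name and payload are assumptions):
+
+ ```ruby
+ streaming_client.on(:on_finish) do |final_response|
+   # Runs after accumulation, validation, and any auto-healing
+   puts final_response.structured_output.inspect
+ end
+ ```
+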
+ For detailed information on auto-healing, native vs forced outputs, and troubleshooting, see the [Structured Outputs documentation](structured_outputs.md).
+
  ## Configuration Options
 
  The streaming client accepts all the same configuration options as the regular client: