open_router_enhanced 1.0.0 → 1.2.0

@@ -104,12 +104,15 @@ end
  client.on(:on_healing) do |healing_data|
    if healing_data[:healed]
      puts "Successfully healed JSON response"
+     puts "Attempts: #{healing_data[:attempts]}"
    else
      puts "JSON healing failed: #{healing_data[:error]}"
    end
  end
  ```
 
+ **Note**: For detailed information about when auto-healing triggers, how it works, and configuration options, see the [Structured Outputs documentation](structured_outputs.md#json-auto-healing).
+
  ### 4. Streaming Observability
  Enhanced streaming support with detailed event callbacks:
 
data/docs/plugins.md ADDED
@@ -0,0 +1,183 @@
+ # OpenRouter Plugins
+
+ OpenRouter provides plugins that extend model capabilities. The gem supports all OpenRouter plugins and automatically enables response healing for structured outputs.
+
+ ## Available Plugins
+
+ | Plugin | ID | Description |
+ |--------|-----|-------------|
+ | Response Healing | `response-healing` | Fixes malformed JSON responses |
+ | Web Search | `web-search` | Augments responses with real-time web search |
+ | PDF Inputs | `pdf-inputs` | Parses and extracts content from PDF files |
+
+ ## Basic Usage
+
+ ```ruby
+ # Specify plugins in your request
+ response = client.complete(
+   messages,
+   model: "openai/gpt-4o-mini",
+   plugins: [{ id: "web-search" }]
+ )
+
+ # Multiple plugins
+ response = client.complete(
+   messages,
+   model: "openai/gpt-4o-mini",
+   plugins: [
+     { id: "web-search" },
+     { id: "pdf-inputs" }
+   ]
+ )
+ ```
+
+ ## Response Healing Plugin
+
+ The response-healing plugin fixes common JSON formatting issues server-side:
+
+ - Missing brackets, commas, and quotes
+ - Trailing commas
+ - Markdown-wrapped JSON
+ - Text mixed with JSON
+ - Unquoted object keys
+
+ ### Automatic Activation
+
+ The gem **automatically adds** the response-healing plugin when:
+ 1. Using structured outputs (`response_format` is set)
+ 2. Not streaming
+ 3. `auto_native_healing` is enabled (default: true)
+
+ ```ruby
+ # Response-healing is automatically added here
+ response = client.complete(
+   messages,
+   model: "openai/gpt-4o-mini",
+   response_format: schema
+ )
+ ```
+
+ ### Disable Automatic Healing
+
+ ```ruby
+ # Via configuration
+ OpenRouter.configure do |config|
+   config.auto_native_healing = false
+ end
+
+ # Via environment variable
+ # OPENROUTER_AUTO_NATIVE_HEALING=false
+ ```
+
+ ### Manual Control
+
+ ```ruby
+ # Explicitly add response-healing
+ response = client.complete(
+   messages,
+   model: "openai/gpt-4o-mini",
+   plugins: [{ id: "response-healing" }],
+   response_format: { type: "json_object" }
+ )
+
+ # Disable for a specific request (when auto is enabled)
+ response = client.complete(
+   messages,
+   model: "openai/gpt-4o-mini",
+   plugins: [{ id: "response-healing", enabled: false }],
+   response_format: schema
+ )
+ ```
+
+ ### Limitations
+
+ - **Non-streaming only**: Does not work with `stream: proc`
+ - **Syntax only**: Fixes JSON syntax, not schema conformance
+ - **Truncation issues**: May fail if response was cut off by `max_tokens`
+
+ For schema validation failures, use the gem's [client-side auto-healing](structured_outputs.md#json-auto-healing-client-side).
+
+ ## Web Search Plugin
+
+ Augments model responses with real-time web search results.
+
+ ```ruby
+ response = client.complete(
+   [{ role: "user", content: "What are the latest AI developments?" }],
+   model: "openai/gpt-4o-mini",
+   plugins: [{ id: "web-search" }]
+ )
+ ```
+
+ **Shortcut**: Append `:online` to the model ID:
+ ```ruby
+ response = client.complete(
+   messages,
+   model: "openai/gpt-4o-mini:online" # Enables web-search
+ )
+ ```
+
+ ## PDF Inputs Plugin
+
+ Enables models to process PDF file content.
+
+ ```ruby
+ response = client.complete(
+   [{ role: "user", content: "Summarize this PDF: [pdf content]" }],
+   model: "openai/gpt-4o-mini",
+   plugins: [{ id: "pdf-inputs" }]
+ )
+ ```
+
+ ## Plugin Configuration Options
+
+ Plugins can accept additional configuration:
+
+ ```ruby
+ # Enable/disable a plugin explicitly
+ plugins: [{ id: "response-healing", enabled: true }]
+
+ # Disable a default plugin for one request
+ plugins: [{ id: "response-healing", enabled: false }]
+ ```
+
+ ## Prediction Parameter (Latency Optimization)
+
+ The `prediction` parameter reduces latency by providing the model with an expected output:
+
+ ```ruby
+ response = client.complete(
+   [{ role: "user", content: "What is the capital of France?" }],
+   model: "openai/gpt-4o",
+   prediction: { type: "content", content: "The capital of France is Paris." }
+ )
+ ```
+
+ **When to use**:
+ - Code completion with predictable boilerplate
+ - Template filling where most content is known
+ - Minor corrections/refinements to existing text
+
+ **How it works**: Instead of generating from scratch, the model confirms/refines your prediction, which is faster when accurate.
+
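+ A minimal sketch of the refinement case (the file path, prompt, and draft contents are illustrative, not from the gem's docs):
+
+ ```ruby
+ draft = File.read("docs/announcement.md") # hypothetical file
+
+ response = client.complete(
+   [{ role: "user", content: "Fix any typos in this text:\n#{draft}" }],
+   model: "openai/gpt-4o",
+   # Most of the output should match the draft, so pass it as the prediction
+   prediction: { type: "content", content: draft }
+ )
+ ```
+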
+ ## Best Practices
+
+ 1. **Use native healing for structured outputs**: It's free and adds <1ms latency
+ 2. **Don't combine response-healing with streaming**: It won't work
+ 3. **Check model compatibility**: Not all models support all plugins
+ 4. **Monitor costs and latency**: Web search can add both cost and response latency
+
+ ## Comparison: Native vs Client-Side Healing
+
+ | Aspect | Native (Plugin) | Client-Side (Gem) |
+ |--------|-----------------|-------------------|
+ | Location | Server-side | Client-side |
+ | Cost | Free | API call per attempt |
+ | Latency | <1ms | Full LLM call |
+ | Fixes syntax | Yes | Yes |
+ | Fixes schema | No | Yes |
+ | Streaming | No | Yes |
+ | Auto-enabled | For structured outputs | When `auto_heal_responses = true` |
+
+ **Recommendation**: Use both! Native healing catches 80%+ of issues for free. Client-side healing handles the rest and validates against your schema.
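+
+ A minimal configuration sketch enabling both layers (this pairs the `auto_native_healing` option shown above with the `auto_heal_responses` option referenced in the table; treat it as a sketch, not required setup):
+
+ ```ruby
+ OpenRouter.configure do |config|
+   config.auto_native_healing = true # server-side plugin: free syntax fixes
+   config.auto_heal_responses = true # client-side: retry and validate against your schema
+ end
+ ```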
@@ -0,0 +1,298 @@
+ # Responses API (Beta)
+
+ The Responses API is an OpenAI-compatible stateless endpoint that provides access to multiple AI models with advanced reasoning capabilities.
+
+ > **Beta**: This API may have breaking changes. Use with caution in production.
+
+ ## Basic Usage
+
+ ```ruby
+ client = OpenRouter::Client.new
+
+ response = client.responses(
+   "What is the capital of France?",
+   model: "openai/gpt-4o-mini"
+ )
+
+ puts response.content # => "Paris"
+ ```
+
+ ## With Reasoning
+
+ The Responses API supports reasoning with configurable effort levels:
+
+ ```ruby
+ response = client.responses(
+   "What is 15% of 80? Show your work.",
+   model: "openai/o4-mini",
+   reasoning: { effort: "high" },
+   max_output_tokens: 500
+ )
+
+ # Access reasoning steps
+ if response.has_reasoning?
+   puts "Reasoning steps:"
+   response.reasoning_summary.each { |step| puts " - #{step}" }
+ end
+
+ puts "Answer: #{response.content}"
+ puts "Reasoning tokens used: #{response.reasoning_tokens}"
+ ```
+
+ ### Effort Levels
+
+ | Level | Description |
+ |-------|-------------|
+ | `minimal` | Basic reasoning with minimal computational effort |
+ | `low` | Light reasoning for simple problems |
+ | `medium` | Balanced reasoning for moderate complexity |
+ | `high` | Deep reasoning for complex problems |
+
+ ## Parameters
+
+ | Parameter | Type | Description |
+ |-----------|------|-------------|
+ | `input` | String or Array | The input text or structured message array (required) |
+ | `model` | String | Model identifier, e.g. `"openai/o4-mini"` (required) |
+ | `reasoning` | Hash | Reasoning config with `effort` key |
+ | `tools` | Array | Tool definitions for function calling |
+ | `tool_choice` | String/Hash | `"auto"`, `"none"`, `"required"`, or specific tool |
+ | `max_output_tokens` | Integer | Maximum tokens to generate |
+ | `temperature` | Float | Sampling temperature (0-2) |
+ | `top_p` | Float | Nucleus sampling parameter (0-1) |
+ | `extras` | Hash | Additional API parameters |
+
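+ A sketch combining several of these parameters in one call (the `extras` payload is an illustrative assumption; pass whatever additional API fields you need):
+
+ ```ruby
+ response = client.responses(
+   "Summarize the history of Ruby in two sentences.",
+   model: "openai/gpt-4o-mini",
+   temperature: 0.7,
+   top_p: 0.9,
+   max_output_tokens: 200,
+   extras: { user: "docs-example" } # hypothetical passthrough field
+ )
+ ```
+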
+ ## Structured Input
+
+ You can also use structured message arrays:
+
+ ```ruby
+ response = client.responses(
+   [
+     {
+       "type" => "message",
+       "role" => "user",
+       "content" => [
+         { "type" => "input_text", "text" => "Hello, world!" }
+       ]
+     }
+   ],
+   model: "openai/gpt-4o-mini"
+ )
+ ```
+
+ ## Response Object
+
+ The `ResponsesResponse` class provides convenient accessors:
+
+ ```ruby
+ response.id                 # Response ID
+ response.status             # "completed", "failed", etc.
+ response.model              # Model used
+ response.content            # Assistant's text response
+ response.output             # Raw output array
+
+ # Reasoning
+ response.has_reasoning?     # Boolean
+ response.reasoning_summary  # Array of reasoning steps
+
+ # Tool calls
+ response.has_tool_calls?    # Boolean
+ response.tool_calls         # Array of ResponsesToolCall objects
+ response.tool_calls_raw     # Array of raw hash data
+
+ # Token usage
+ response.input_tokens       # Input token count
+ response.output_tokens      # Output token count
+ response.total_tokens       # Total token count
+ response.reasoning_tokens   # Tokens used for reasoning
+ ```
+
+ ## Tool/Function Calling
+
+ The Responses API supports function calling with a simplified format. Tool calls are wrapped in `ResponsesToolCall` objects for easy execution.
+
+ ### Defining Tools
+
+ You can use the same tool format as Chat Completions - the gem automatically converts it:
+
+ ```ruby
+ tools = [
+   {
+     type: "function",
+     function: {
+       name: "get_weather",
+       description: "Get current weather for a location",
+       parameters: {
+         type: "object",
+         properties: {
+           location: { type: "string", description: "City name" },
+           units: { type: "string", enum: ["celsius", "fahrenheit"] }
+         },
+         required: ["location"]
+       }
+     }
+   }
+ ]
+
+ response = client.responses(
+   "What's the weather in San Francisco?",
+   model: "openai/gpt-4o-mini",
+   tools: tools
+ )
+ ```
+
+ You can also use the `Tool` DSL:
+
+ ```ruby
+ weather_tool = OpenRouter::Tool.define do
+   name "get_weather"
+   description "Get current weather for a location"
+   parameters do
+     string :location, required: true, description: "City name"
+     string :units, enum: %w[celsius fahrenheit]
+   end
+ end
+
+ response = client.responses(
+   "What's the weather in Tokyo?",
+   model: "openai/gpt-4o-mini",
+   tools: [weather_tool]
+ )
+ ```
+
+ ### Tool Choice
+
+ Control when the model uses tools with `tool_choice`:
+
+ ```ruby
+ # Let model decide (default)
+ response = client.responses(input, model: model, tools: tools, tool_choice: "auto")
+
+ # Force tool use
+ response = client.responses(input, model: model, tools: tools, tool_choice: "required")
+
+ # Prevent tool use
+ response = client.responses(input, model: model, tools: tools, tool_choice: "none")
+ ```
+
+ ### Executing Tool Calls
180
+
181
+ ```ruby
182
+ if response.has_tool_calls?
183
+ # Execute each tool call with a block
184
+ results = response.execute_tool_calls do |name, arguments|
185
+ case name
186
+ when "get_weather"
187
+ fetch_weather(arguments["location"], arguments["units"])
188
+ when "search_web"
189
+ search(arguments["query"])
190
+ else
191
+ { error: "Unknown function: #{name}" }
192
+ end
193
+ end
194
+
195
+ # Results are ResponsesToolResult objects
196
+ results.each do |result|
197
+ if result.success?
198
+ puts "#{result.tool_call.name}: #{result.result}"
199
+ else
200
+ puts "Error: #{result.error}"
201
+ end
202
+ end
203
+ end
204
+ ```
205
+
206
+ ### Multi-turn Tool Conversations
207
+
208
+ Use `build_follow_up_input` to continue conversations after tool execution:
209
+
210
+ ```ruby
211
+ # First call - model requests tool use
212
+ original_input = "What's the weather in NYC and Paris?"
213
+ response = client.responses(original_input, model: "openai/gpt-4o-mini", tools: tools)
214
+
215
+ # Execute the tool calls
216
+ results = response.execute_tool_calls do |name, args|
217
+ fetch_weather(args["location"])
218
+ end
219
+
220
+ # Build follow-up input with tool results
221
+ next_input = response.build_follow_up_input(
222
+ original_input: original_input,
223
+ tool_results: results
224
+ )
225
+
226
+ # Continue the conversation - model will use the tool results
227
+ final_response = client.responses(next_input, model: "openai/gpt-4o-mini")
228
+ puts final_response.content
229
+ # => "In NYC it's 72°F and sunny. In Paris it's 18°C and cloudy."
230
+ ```
+
+ ### Adding Follow-up Messages
+
+ You can include a follow-up question when building the input:
+
+ ```ruby
+ next_input = response.build_follow_up_input(
+   original_input: original_input,
+   tool_results: results,
+   follow_up_message: "Which city has better weather for a picnic?"
+ )
+ ```
+
+ ### Tool Call Objects
+
+ `ResponsesToolCall` provides:
+
+ ```ruby
+ tool_call.id               # Tool call ID
+ tool_call.call_id          # Call ID for result matching
+ tool_call.name             # Function name
+ tool_call.function_name    # Alias for name
+ tool_call.arguments        # Parsed arguments hash
+ tool_call.arguments_string # Raw JSON string
+ tool_call.to_input_item    # Convert to input format
+ ```
+
+ `ResponsesToolResult` provides:
+
+ ```ruby
+ result.tool_call     # Reference to the tool call
+ result.result        # Execution result (if successful)
+ result.error         # Error message (if failed)
+ result.success?      # Boolean
+ result.failure?      # Boolean
+ result.to_input_item # Convert to function_call_output format
+ ```
+
+ ## Comparison with Chat Completions
+
+ | Aspect | `complete()` | `responses()` |
+ |--------|--------------|---------------|
+ | Endpoint | `/chat/completions` | `/responses` |
+ | Input | `messages` array | `input` string or array |
+ | Output | `choices[].message` | `output[]` typed items |
+ | Reasoning | Not supported | `reasoning` parameter |
+ | Tool calling | Supported | Supported |
+ | Token limit | `max_tokens` | `max_output_tokens` |
+ | Streaming | Supported | Not yet supported |
+
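+ A side-by-side sketch of the two call styles (the chat-side accessor follows the `choices[].message` shape in the table above; adjust to however your response object exposes it):
+
+ ```ruby
+ question = "Name one prime number."
+
+ # Chat Completions: messages array, max_tokens
+ chat = client.complete(
+   [{ role: "user", content: question }],
+   model: "openai/gpt-4o-mini",
+   max_tokens: 20
+ )
+ puts chat.dig("choices", 0, "message", "content")
+
+ # Responses API: plain string input, max_output_tokens
+ resp = client.responses(question, model: "openai/gpt-4o-mini", max_output_tokens: 20)
+ puts resp.content
+ ```
+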
+ ## When to Use
+
+ Use the Responses API when you need:
+ - Built-in reasoning with effort control
+ - OpenAI Responses API compatibility
+ - Simpler input format (string instead of messages)
+
+ Use Chat Completions when you need:
+ - Streaming responses
+ - Full callback system integration
+ - Usage tracking integration
+ - Response healing features
+
+ ## Future Enhancements
+
+ The following features are planned but not yet implemented:
+ - Streaming support
+ - Callbacks integration
data/docs/streaming.md CHANGED
@@ -214,7 +214,7 @@ end
 
  ## Structured Outputs with Streaming
 
- Streaming works seamlessly with structured outputs:
+ Streaming works seamlessly with structured outputs. The response is streamed in real-time, then validated and parsed after accumulation completes.
 
  ```ruby
  # Define schema
@@ -225,18 +225,33 @@ user_schema = OpenRouter::Schema.define("user") do
  end
 
  # Stream with structured output
+ # IMPORTANT: accumulate_response must be true for structured outputs
  response = streaming_client.stream_complete(
    [{ role: "user", content: "Create a user: John Doe, 30, john@example.com" }],
    model: "openai/gpt-4o",
    response_format: user_schema,
-   accumulate_response: true
+   accumulate_response: true # Required for structured_output access
  )
 
- # Access structured output after streaming
+ # Access structured output after streaming completes
  user_data = response.structured_output
  puts "User: #{user_data['name']}, Age: #{user_data['age']}"
  ```
 
+ ### How Structured Outputs Work with Streaming
+
+ 1. **During Streaming**: Content chunks are streamed and displayed in real-time
+ 2. **After Accumulation**: The complete response is validated against your schema
+ 3. **Auto-Healing**: If enabled and needed, healing occurs after streaming completes
+ 4. **Validation**: Schema validation happens on the accumulated response
+
+ **Important Notes:**
+ - You must set `accumulate_response: true` to use `response.structured_output`
+ - Auto-healing (if configured) happens after streaming completes, not during streaming
+ - The `on_finish` callback receives the final, validated response (see the sketch below)
+
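+ A minimal sketch of that `on_finish` behavior (registration mirrors the `client.on(:on_healing)` callbacks shown elsewhere in these docs; the exact event name and payload are assumptions):
+
+ ```ruby
+ streaming_client.on(:on_finish) do |final_response|
+   # Runs after accumulation, validation, and any auto-healing
+   puts final_response.structured_output.inspect
+ end
+ ```
+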
+ For detailed information on auto-healing, native vs forced outputs, and troubleshooting, see the [Structured Outputs documentation](structured_outputs.md).
+
  ## Configuration Options
 
  The streaming client accepts all the same configuration options as the regular client: