open_router_enhanced 2.0.1 → 2.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: d443d948a07c5b55d6366e135354b2faa07a8edc38cb2791237a6a4a92bd229a
- data.tar.gz: 37a93b36720b58bf1ee1c4809aa4d54e4d29e06ba27a63c9285a78fa7074eb66
+ metadata.gz: b6c9c14171242103eaeab8219521180242f9e0ee0968c739f8db2148a17423a5
+ data.tar.gz: '0906f33ab027e8cbf17ff60ab120ec65de39ec689c6e9a48025fb8df679f7d55'
  SHA512:
- metadata.gz: 78fb6b74df5a7cb901ecac23fe503c667035e4794d6e7d9b97d4ad703e1f0a40347bd3274b86ba13c35501c8afc0b3c29ec5cb7b632eb93013af5a5328403285
- data.tar.gz: 51d704a3035cf8211ac0d9533931d0b1a9bc9ebe49d21f192dd49b7dbb423eadcbf5adcc7b99d3212002947f2646337122e99660a589fa54be22ab3a356555eb
+ metadata.gz: 4cd127d4d6889e281e88e3f044f2444bd32e46f7ac4797d5c786b9a3fc5a8f792baf445c8dea481ab5018fae72646a221da05266bcb4134266735546e35a428d
+ data.tar.gz: a5b80c88d5f2228f1891409d2edb335a1a2c0294b94ffc69fb2730d0a854f7010c5eece8ac026eb6ba99a60df583634f42c77e18aada4b0399886b6a744a3488
data/CHANGELOG.md CHANGED
@@ -1,5 +1,23 @@
  ## [Unreleased]

+ ## [2.2.0] - 2026-06-28
+
+ ### Added
+
+ - **`Routing` mixin** (`OpenRouter::Routing`) included in `Client`, providing two new meta-routing methods:
+ - `pareto_complete(messages, min_coding_score: nil, **opts)` — routes to the cheapest model meeting a configurable quality bar via OpenRouter's Pareto Code Router (`openrouter/pareto-code`). `min_coding_score` is validated to `0.0–1.0`.
+ - `fuse(messages, analysis_models: nil, judge: nil, preset: nil, max_tool_calls: nil, **opts)` — fans a prompt out to a panel of models and synthesises one answer via OpenRouter's Fusion router (`openrouter/fusion`). `analysis_models` (1–8) and `max_tool_calls` (1–16) are validated.
+ - **`SubagentTool`** (`OpenRouter::SubagentTool`) — wraps OpenRouter's `openrouter:subagent` server tool so an orchestrator model can delegate self-contained subtasks to a cheaper worker model mid-generation. Constructor: `model:` (required worker model) plus optional `instructions:`, `max_completion_tokens:`, `temperature:`, and `reasoning:`. Pass it via the normal `tools:` array to `complete`.
+ - **`Response#selected_model`** — alias for `#model`; returns the concrete model OpenRouter resolved for routing responses (e.g. Pareto, Auto, Fusion).
+
+ ### Changed
+
+ - Capability warning / strict-mode guards now exempt all `openrouter/`-prefixed meta-models (previously only `openrouter/auto` was exempt); this prevents spurious warnings or `CapabilityError` when using `pareto_complete` or `fuse` with tools or structured outputs.
+
+ ### Notes
+
+ - These three OpenRouter platform features are still evolving server-side. The gem builds and validates the requests; routing/synthesis/delegation behaviour is performed by OpenRouter. Fusion fans out to every panel model plus a judge, so it costs roughly 4–5× a single completion. `pareto_complete` may resolve to a reasoning model that consumes a small `max_tokens` budget entirely on reasoning (returning `nil` content with `finish_reason: "length"`) — budget `max_tokens` accordingly.
+
  ## [2.0.0] - 2025-12-28

  ### Overview
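Reviewer note: the 2.2.0 changelog entry states that `min_coding_score` is validated to 0.0–1.0, `analysis_models` to 1–8 entries, and `max_tool_calls` to 1–16. The validation code itself is not part of this diff, so the following is only a minimal sketch of what such client-side range checks might look like. The module name `RoutingValidationSketch`, its method names, and the choice of `ArgumentError` are assumptions made for illustration, not the gem's API.

```ruby
# Hypothetical sketch of the range checks described in the changelog.
# None of these names come from the gem; only the ranges do.
module RoutingValidationSketch
  SCORE_RANGE      = (0.0..1.0) # min_coding_score
  PANEL_SIZE_RANGE = (1..8)     # number of analysis_models
  TOOL_CALL_RANGE  = (1..16)    # max_tool_calls

  module_function

  # nil means "not supplied", so it is accepted without checks.
  def validate_min_coding_score!(score)
    return if score.nil?
    return if score.is_a?(Numeric) && SCORE_RANGE.cover?(score)

    raise ArgumentError, "min_coding_score must be within 0.0..1.0, got #{score.inspect}"
  end

  def validate_analysis_models!(models)
    return if models.nil?
    return if models.is_a?(Array) && PANEL_SIZE_RANGE.cover?(models.size)

    raise ArgumentError, "analysis_models must be an Array of 1 to 8 model ids"
  end

  def validate_max_tool_calls!(count)
    return if count.nil?
    return if count.is_a?(Integer) && TOOL_CALL_RANGE.cover?(count)

    raise ArgumentError, "max_tool_calls must be an Integer within 1..16, got #{count.inspect}"
  end
end
```

Failing fast on the client side, as sketched here, avoids paying for a request that the router would reject anyway; whether the gem raises `ArgumentError` or a library-specific error class cannot be determined from this diff.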
data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
  PATH
  remote: .
  specs:
- open_router_enhanced (2.0.0)
+ open_router_enhanced (2.2.0)
  activesupport (>= 6.0, < 9.0)
  dotenv (>= 2.0, < 4.0)
  faraday (>= 1.0, < 3.0)
data/README.md CHANGED
@@ -45,6 +45,7 @@ The [OpenRouter API](https://openrouter.ai/docs) is a single unified interface f
  - [Tool Calling](#tool-calling)
  - [Structured Outputs](#structured-outputs)
  - [Smart Model Selection](#smart-model-selection)
+ - [Routing (Pareto & Fusion)](#routing-pareto--fusion)
  - [Prompt Templates](#prompt-templates)
  - [Streaming](#streaming)
  - [Usage Tracking](#usage-tracking)
@@ -383,6 +384,95 @@ models = OpenRouter::ModelSelector.new

  **[Complete Model Selection Documentation](docs/model_selection.md)**

+ ### Routing (Pareto & Fusion)
+
+ OpenRouter offers two meta-routing modes that automatically pick or synthesize answers across models.
+
+ #### Pareto Code Router
+
+ Routes each request to the cheapest model that meets a configurable quality bar — useful when you want cost-optimised code completions without picking a specific model.
+
+ ```ruby
+ # Cheapest model meeting default quality threshold
+ response = client.pareto_complete([
+ { role: "user", content: "Write a binary search in Ruby" }
+ ])
+
+ # Require a higher quality bar (0.0–1.0, higher = better)
+ response = client.pareto_complete(
+ [{ role: "user", content: "Implement a red-black tree" }],
+ min_coding_score: 0.8,
+ max_tokens: 1000
+ )
+
+ # Which model actually answered?
+ puts response.selected_model # => "anthropic/claude-3.5-haiku"
+ puts response.content
+ ```
+
+ #### Fusion Router
+
+ Fans a prompt out to a panel of models in parallel, then synthesises one answer with a judge model. Costs roughly 4–5× a single completion but can outperform any individual model.
+
+ ```ruby
+ # Default panel (OpenRouter chooses)
+ response = client.fuse([
+ { role: "user", content: "What is the best approach to distributed consensus?" }
+ ])
+
+ # Custom panel + explicit judge
+ response = client.fuse(
+ [{ role: "user", content: "Review this architecture" }],
+ analysis_models: [
+ "anthropic/claude-3.5-sonnet",
+ "openai/gpt-4o",
+ "google/gemini-2.0-flash-001"
+ ],
+ judge: "anthropic/claude-opus-4-5",
+ max_tokens: 2000
+ )
+
+ # Curated preset panels
+ response = client.fuse(messages, preset: "general-budget")
+
+ # selected_model reports the synthesis/judge model that produced the answer,
+ # e.g. "anthropic/claude-opus-4-5" — not the "openrouter/fusion" router alias.
+ puts response.selected_model
+ puts response.content
+ ```
+
+ > **Note:** Fusion fans out to every panel model plus a judge, so it costs roughly 4–5× a single completion. `min_coding_score` for Pareto is validated to `0.0–1.0`; `analysis_models` (1–8) and `max_tool_calls` (1–16) for Fusion are validated client-side.
+
+ #### `SubagentTool`
+
+ Wraps OpenRouter's built-in `openrouter:subagent` server tool so an LLM can spawn its own sub-completions during a tool-calling loop.
+
+ ```ruby
+ subagent = OpenRouter::SubagentTool.new(
+ model: "anthropic/claude-3.5-haiku", # required: the cheaper worker model
+ instructions: "Complete the task exactly as described. Be concise.", # optional
+ max_completion_tokens: 512 # optional (also: temperature:, reasoning:)
+ )
+
+ response = client.complete(
+ [{ role: "user", content: "Summarize the attached changelog into release notes." }],
+ model: "openai/gpt-4o",
+ tools: [subagent],
+ tool_choice: "auto"
+ )
+ ```
+
+ > The orchestrator decides whether to delegate. The gem's job is to build and send a valid `openrouter:subagent` tool; OpenRouter runs the worker server-side and feeds its result back into the orchestrator's generation.
+
+ #### `Response#selected_model`
+
+ All routing methods (`complete`, `pareto_complete`, `fuse`) return a `Response` object. Use `#selected_model` (alias for `#model`) to see which model OpenRouter ultimately used:
+
+ ```ruby
+ response = client.pareto_complete(messages)
+ puts response.selected_model # e.g. "mistralai/codestral-2501"
+ ```
+
  ### Prompt Templates

  Create reusable, parameterized prompts with variable interpolation.
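Reviewer note: the changelog warns that `pareto_complete` may resolve to a reasoning model that spends a small `max_tokens` budget entirely on reasoning, returning `nil` content with `finish_reason: "length"`. The README section above does not show how a caller might react to that, so here is a minimal sketch of one option: retrying with a larger budget. `StubResponse`, `exhausted_on_reasoning?`, and `with_growing_budget` are hypothetical names, and the assumption that the gem's `Response` exposes a `finish_reason` reader is not confirmed by this diff.

```ruby
# Stand-in for the gem's Response so the sketch runs without network access.
# Assumption: the real Response exposes #content and a #finish_reason reader.
StubResponse = Struct.new(:content, :finish_reason, :selected_model, keyword_init: true)

# True when the model produced no visible answer because the token budget ran out.
def exhausted_on_reasoning?(response)
  response.content.nil? && response.finish_reason == "length"
end

# Calls the block with each budget in turn and returns the first response
# that is not exhausted; falls back to the last response if all of them are.
def with_growing_budget(budgets)
  last = nil
  budgets.each do |max_tokens|
    last = yield(max_tokens)
    return last unless exhausted_on_reasoning?(last)
  end
  last
end

# Fake router call for demonstration: budgets under 2000 tokens are "used up".
fake_call = lambda do |max_tokens|
  if max_tokens < 2000
    StubResponse.new(content: nil, finish_reason: "length", selected_model: "example/reasoner")
  else
    StubResponse.new(content: "def search; end", finish_reason: "stop", selected_model: "example/reasoner")
  end
end

result = with_growing_budget([1000, 4000]) { |max_tokens| fake_call.call(max_tokens) }
puts result.content
```

With the real client, the block would presumably wrap `client.pareto_complete(messages, max_tokens: max_tokens)`, using the keyword shown in the README example. Each retry is billed separately, so a short list of budgets is advisable.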
data/Rakefile CHANGED
@@ -30,6 +30,16 @@ task ci: %i[spec_all rubocop]

  # Model exploration tasks
  namespace :models do
+ desc "Fetch fresh model data from OpenRouter API and update local cache"
+ task :update do
+ require_relative "lib/open_router"
+
+ print "Fetching models from OpenRouter API..."
+ OpenRouter::ModelRegistry.refresh!
+ count = OpenRouter::ModelRegistry.all_models.size
+ puts " done. #{count} models cached."
+ end
+
  desc "Display summary of available models"
  task :summary do
  require_relative "lib/open_router"
@@ -59,18 +69,18 @@ namespace :models do
  end

  # Cost analysis
- input_costs = models.values.map { |spec| spec[:cost_per_1k_tokens][:input] }.compact.sort
- output_costs = models.values.map { |spec| spec[:cost_per_1k_tokens][:output] }.compact.sort
+ input_costs = models.values.map { |spec| spec[:cost_per_token][:input] }.compact.sort
+ output_costs = models.values.map { |spec| spec[:cost_per_token][:output] }.compact.sort

- puts "\n💰 Cost Analysis (per 1k tokens):"
+ puts "\n💰 Cost Analysis (per million tokens):"
  puts " Input tokens:"
- puts " Min: $#{format("%.6f", input_costs.min)}"
- puts " Max: $#{format("%.6f", input_costs.max)}"
- puts " Median: $#{format("%.6f", input_costs[input_costs.size / 2])}"
+ puts " Min: $#{format("%.4f", input_costs.min * 1_000_000)}"
+ puts " Max: $#{format("%.4f", input_costs.max * 1_000_000)}"
+ puts " Median: $#{format("%.4f", input_costs[input_costs.size / 2] * 1_000_000)}"
  puts " Output tokens:"
- puts " Min: $#{format("%.6f", output_costs.min)}"
- puts " Max: $#{format("%.6f", output_costs.max)}"
- puts " Median: $#{format("%.6f", output_costs[output_costs.size / 2])}"
+ puts " Min: $#{format("%.4f", output_costs.min * 1_000_000)}"
+ puts " Max: $#{format("%.4f", output_costs.max * 1_000_000)}"
+ puts " Median: $#{format("%.4f", output_costs[output_costs.size / 2] * 1_000_000)}"

  # Context length analysis
  context_lengths = models.values.map { |spec| spec[:context_length] }.compact.sort
@@ -269,8 +279,8 @@ namespace :models do
  def self.display_model_info(model_id, specs, index)
  puts "#{(index + 1).to_s.rjust(3)}. #{model_id}"
  puts " Name: #{specs[:name]}" if specs[:name]
- puts " Cost: $#{format("%.6f", specs[:cost_per_1k_tokens][:input])}/1k input, " \
- "$#{format("%.6f", specs[:cost_per_1k_tokens][:output])}/1k output"
+ cpm = OpenRouter::ModelRegistry.cost_per_million(model_id)
+ puts " Cost: $#{format("%.4f", cpm[:input])}/M input, $#{format("%.4f", cpm[:output])}/M output"
  puts " Context: #{format_number_with_commas(specs[:context_length])} tokens"
  puts " Capabilities: #{specs[:capabilities].join(", ")}"
  puts " Tier: #{specs[:performance_tier]}"
@@ -318,17 +328,17 @@ namespace :models do
  def self.sort_by_strategy(candidates, strategy)
  case strategy
  when :cost
- candidates.sort_by { |_, specs| specs[:cost_per_1k_tokens][:input] }
+ candidates.sort_by { |_, specs| specs[:cost_per_token][:input] }
  when :performance
  candidates.sort_by do |_, specs|
- [specs[:performance_tier] == :premium ? 0 : 1, specs[:cost_per_1k_tokens][:input]]
+ [specs[:performance_tier] == :premium ? 0 : 1, specs[:cost_per_token][:input]]
  end
  when :latest
  candidates.sort_by { |_, specs| -(specs[:created_at] || 0).to_i }
  when :context
  candidates.sort_by { |_, specs| -(specs[:context_length] || 0).to_i }
  else
- candidates.sort_by { |_, specs| specs[:cost_per_1k_tokens][:input] }
+ candidates.sort_by { |_, specs| specs[:cost_per_token][:input] }
  end
  end
  end
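Reviewer note: the Rakefile changes above switch the cost fields from `cost_per_1k_tokens` to `cost_per_token` and multiply by `1_000_000` for display. They also take the "median" as `sorted[size / 2]`, which for an even-length list is the upper of the two middle values rather than their average, and they would raise on an empty list because `nil * 1_000_000` is not defined. The sketch below illustrates that arithmetic on made-up prices. `per_million` and `upper_median` are hypothetical helper names, and the sample costs are invented, not taken from the model registry.

```ruby
# Convert a per-token price (in dollars) to a per-million-token price.
def per_million(cost_per_token)
  cost_per_token * 1_000_000
end

# Same selection rule as the Rakefile: the element at index size / 2 of a
# sorted list. Returns nil for an empty list instead of raising later.
def upper_median(sorted_values)
  return nil if sorted_values.empty?

  sorted_values[sorted_values.size / 2]
end

# Invented per-token input prices, sorted ascending as in the Rakefile.
sample_costs = [0.000003, 0.00000025, 0.000015, 0.000001].sort

median_cost = upper_median(sample_costs)
puts "Min:    $#{format('%.4f', per_million(sample_costs.min))}"
puts "Max:    $#{format('%.4f', per_million(sample_costs.max))}"
puts "Median: $#{format('%.4f', per_million(median_cost))}" unless median_cost.nil?
```

Formatting with `%.4f` after scaling absorbs the small floating-point error that multiplying very small prices by one million can introduce.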