RubyGems - legion-llm - Versions diffs - 0.11.2 → 0.12.2 - Mend

legion-llm 0.11.2 → 0.12.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (17)

checksums.yaml +4 -4
data/CHANGELOG.md +12 -0
data/lib/legion/llm/api/namespaces/openai/responses.rb +21 -2
data/lib/legion/llm/api/native/models.rb +18 -4
data/lib/legion/llm/api/native/tiers.rb +1 -1
data/lib/legion/llm/api/openai/responses.rb +2 -0
data/lib/legion/llm/api/translators/openai_response.rb +4 -1
data/lib/legion/llm/call/lex_llm_adapter.rb +4 -1
data/lib/legion/llm/discovery/rule_generator.rb +1 -1
data/lib/legion/llm/discovery.rb +1 -1
data/lib/legion/llm/inference/executor.rb +54 -1
data/lib/legion/llm/inference.rb +1 -1
data/lib/legion/llm/router/resolution.rb +0 -4
data/lib/legion/llm/router.rb +7 -21
data/lib/legion/llm/settings.rb +8 -17
data/lib/legion/llm/version.rb +1 -1
metadata +1 -1

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 555bff51c05efea04f283dc4a0c005703fef2344d57b9692fdd70d6ab95d9646
-  data.tar.gz: 454c9cd0be750aec0d597c9e343a35fc8939c22a67af8d048d3887214c225cba
+  metadata.gz: b5c5db81b61e68e29fd4a0ebdb612b8032145bac6ef1e2e70a6578b4dff3c9e4
+  data.tar.gz: 6ea03a18b9e7e448607d168af2c8c54bbccee3bb942670cd6adc5d1e673ca115
 SHA512:
-  metadata.gz: c153ef24c678502b0bd9249c6ff8d39070e07b8e06eb950686b552bfdf8bcc5d03b060c333dff2c06418cfc4a1046c3a779274756fb784e821166b3605ee430d
-  data.tar.gz: b1ab367c71a7098292fecbcf2c67758de0734ecd7e70cb108d8ee08abc9fd3917e01adefe2a7bc2d0472a0d731e86b0bf73a9a2a7f566619502322ff0eb9d4ec
+  metadata.gz: 13ec651f801eede51e8389244609bd4d74a38ecd385815aeca5bb22ef9cb9a6df35ef12ab0bb99d92e1fa96f08a730d72d03302d5a268631db38c96fdf5ba938
+  data.tar.gz: e9798e55b838b440ec1608ddb25424124bf36885a7fef46c3c445fc662361a7c5a82951aa38aedc63934242d363fb67ebdad6ff13060eee272a234b6ef450da7

data/CHANGELOG.md CHANGED

@@ -1,5 +1,17 @@
 # Legion LLM Changelog
+## [0.12.2] - 2026-06-02
+### Fixed
+- **Codex CLI `/v1/responses` routing through non-native providers** — `RESPONSES_PROVIDER_FAMILIES` now contains only `:openai` (api.openai.com). All other providers (vLLM, Ollama, MLX, Bedrock, Gemini, Azure, etc.) use `/v1/chat/completions` and must explicitly declare `:responses` in their instance capabilities to opt in. Previously `:vllm` was included, causing `BadRequestError: Invalid request` when Codex routed through a vLLM-backed proxy (call/lex_llm_adapter.rb)
+- **`developer` role crash on Responses API input** — The OpenAI Responses API sends `developer` as a higher-trust system role. Both Responses API handlers now map `developer` → `system` before building the message array, preventing `InvalidRoleError` from `lex-llm::Message::ROLES` validation (api/namespaces/openai/responses.rb, api/openai/responses.rb)
+- **Non-streaming Responses API path always used `call_responses`** — Sync path now calls `call_executor_sync` which routes through `call` for non-native providers and `call_responses` only when `provider_supports_responses?` returns true (api/namespaces/openai/responses.rb)
+### Added
+- **`Executor#provider_supports_responses?`** — Public method that checks whether the resolved provider's adapter natively supports the Responses API. Used by the API layer to gate `call_responses` vs `call_stream`/`call` dispatch. Returns false safely when provider resolution hasn't run yet (inference/executor.rb)
+- **`Responses.call_executor_sync`** — New method for non-streaming dispatch: routes through `call_responses` when native, otherwise `call` (api/namespaces/openai/responses.rb)
+- **`Responses.native_responses_supported?`** — Predicate shared by streaming and sync dispatch paths (api/namespaces/openai/responses.rb)
 ## [0.11.2] - 2026-06-02
 ### Removed

data/lib/legion/llm/api/namespaces/openai/responses.rb CHANGED

@@ -63,7 +63,7 @@ module Legion
                     out << "event: error\ndata: #{Legion::JSON.dump({ type: 'server_error', message: e.message })}\n\n"
                   end
                 else
-                  pipeline_response = executor.call_responses(body: body, stream: false)
+                  pipeline_response = Responses.call_executor_sync(executor, upstream_body: body)
                   response_body     = Responses.format_response(pipeline_response, request_id: request_id, model: model)
                   log.info("[llm][api][namespaces][openai][responses] action=complete request_id=#{request_id}")
                   content_type :json
@@ -165,6 +165,10 @@ module Legion
                   role = item[:role]&.to_s
                   next unless role
+                  # OpenAI Responses API uses "developer" as a higher-trust system role.
+                  # All downstream providers only understand the standard four roles.
+                  role = 'system' if role == 'developer'
                   content = item[:content]
                   content = content.to_s if content && !content.is_a?(Array)
                   messages << { role: role, content: content }.compact
@@ -295,13 +299,28 @@ module Legion
             end
             def self.call_executor(executor, upstream_body: nil, &)
-              if upstream_body && executor.respond_to?(:call_responses)
+              if native_responses_supported?(executor, upstream_body)
                 executor.call_responses(body: upstream_body, stream: true, &)
               else
                 executor.call_stream(&)
               end
             end
+            def self.call_executor_sync(executor, upstream_body: nil)
+              if native_responses_supported?(executor, upstream_body)
+                executor.call_responses(body: upstream_body, stream: false)
+              else
+                executor.call
+              end
+            end
+            def self.native_responses_supported?(executor, upstream_body)
+              upstream_body &&
+                executor.respond_to?(:call_responses) &&
+                executor.respond_to?(:provider_supports_responses?) &&
+                executor.provider_supports_responses?
+            end
             def self.build_output_tool_calls(pipeline_response)
               tools_data = pipeline_response.respond_to?(:tools) ? pipeline_response.tools : nil
               return [] unless tools_data.is_a?(Array) && !tools_data.empty?
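The streaming and sync paths now share one predicate, so a request reaches `call_responses` only when the resolved provider reports native Responses API support; otherwise it falls back to `call` (or `call_stream` when streaming). A minimal sketch of that gating, assuming a hypothetical `FakeExecutor` in place of the real executor and a hypothetical `ResponsesSketch` module in place of `Responses`:

```ruby
# Hypothetical stand-in for the executor: models only the methods the
# dispatch helpers probe with respond_to?.
class FakeExecutor
  def initialize(native)
    @native = native
  end

  def provider_supports_responses?
    @native
  end

  def call_responses(body:, stream:)
    [:call_responses, body, stream]
  end

  def call
    [:call]
  end
end

module ResponsesSketch
  # Native dispatch requires an upstream body, a call_responses method,
  # and a provider that reports native Responses API support.
  def self.native_responses_supported?(executor, upstream_body)
    !!(upstream_body &&
       executor.respond_to?(:call_responses) &&
       executor.respond_to?(:provider_supports_responses?) &&
       executor.provider_supports_responses?)
  end

  # Non-streaming path: Responses API when native, plain call otherwise.
  def self.call_executor_sync(executor, upstream_body: nil)
    if native_responses_supported?(executor, upstream_body)
      executor.call_responses(body: upstream_body, stream: false)
    else
      executor.call
    end
  end
end
```

One small difference: the sketch coerces the predicate to a strict boolean with `!!`, whereas the shipped method returns whatever the `&&` chain yields (for example `nil` when no upstream body is given).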

data/lib/legion/llm/api/native/models.rb CHANGED

@@ -13,6 +13,7 @@ module Legion
           AUTO_ROUTING_MODEL_DISPLAY = 'LegionIO'
           AUTO_ROUTING_OFFERING_ID = 'legionio:auto:inference:legionio'
           AUTO_ROUTING_CAPABILITIES = %w[auto_routing chat completion json_schema tools].freeze
+          AUTO_ROUTING_MODEL_ALIASES = %w[auto].freeze
           def self.registered(app)
             log.debug('[llm][api][models] registering model inventory routes')
@@ -108,9 +109,10 @@ module Legion
               enabled:        offerings.any? { |offering| offering[:enabled] != false }
             }
             if auto_routing_model?(model)
-              summary[:display_name] = AUTO_ROUTING_MODEL_DISPLAY
+              first_display = offerings.filter_map { |o| o[:display_name] }.first
+              summary[:display_name] = first_display || AUTO_ROUTING_MODEL_DISPLAY
               summary[:auto_route] = true
-              summary[:default] = true
+              summary[:default] = model.to_s == AUTO_ROUTING_MODEL_ID
             end
             summary
           end
@@ -129,7 +131,18 @@ module Legion
             return offerings unless auto_routing_offering_matches?(filters)
             return offerings if offerings.any? { |offering| auto_routing_model?(offering[:model]) }
-            [auto_routing_offering, *offerings]
+            [auto_routing_offering, auto_routing_alias_offering, *offerings]
+          end
+          def self.auto_routing_alias_offering
+            base = auto_routing_offering
+            base.merge(
+              id:                    'legionio:auto:inference:auto',
+              offering_id:           'legionio:auto:inference:auto',
+              model:                 'auto',
+              display_name:          'LegionIO (auto)',
+              canonical_model_alias: 'auto'
+            )
           end
           def self.auto_routing_offering
@@ -182,7 +195,8 @@ module Legion
           end
           def self.auto_routing_model?(model)
-            model.to_s.strip.downcase == AUTO_ROUTING_MODEL_ID
+            m = model.to_s.strip.downcase
+            m == AUTO_ROUTING_MODEL_ID || AUTO_ROUTING_MODEL_ALIASES.include?(m)
           end
         end
       end
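With `AUTO_ROUTING_MODEL_ALIASES`, both the canonical auto-routing model and `auto` are recognized, but only the canonical ID is flagged as `default`. A sketch of that check follows. The value of `AUTO_ROUTING_MODEL_ID` is not shown in this diff, so `'legionio'` below is an assumption (inferred from the offering ID `legionio:auto:inference:legionio`), and `default_model?` is a hypothetical helper:

```ruby
AUTO_ROUTING_MODEL_ID = 'legionio' # assumption: the real value is not shown in this diff
AUTO_ROUTING_MODEL_ALIASES = %w[auto].freeze

# Matches the canonical auto-routing model or any alias, ignoring case and surrounding whitespace.
def auto_routing_model?(model)
  m = model.to_s.strip.downcase
  m == AUTO_ROUTING_MODEL_ID || AUTO_ROUTING_MODEL_ALIASES.include?(m)
end

# Hypothetical helper mirroring `summary[:default] = model.to_s == AUTO_ROUTING_MODEL_ID`:
# the raw string is compared, so aliases are never marked as the default.
def default_model?(model)
  auto_routing_model?(model) && model.to_s == AUTO_ROUTING_MODEL_ID
end
```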

data/lib/legion/llm/api/native/tiers.rb CHANGED

@@ -165,7 +165,7 @@ module Legion
             routing_config = Legion::Settings[:llm][:routing] || {}
             top_level = Legion::Settings[:llm][:tier_order] || nil
             Array(top_level || routing_config[:tier_order] || routing_config[:tier_priority] ||
-                  %w[local direct fleet openai_compat cloud frontier])
+                  %w[local direct fleet cloud frontier])
           end
           def self.privacy_mode?
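The tier-order fallback no longer includes `openai_compat`. Its precedence is the top-level `tier_order`, then `routing.tier_order`, then `routing.tier_priority`, then the built-in default. A sketch, assuming a plain hash stands in for `Legion::Settings[:llm]`:

```ruby
DEFAULT_TIER_ORDER = %w[local direct fleet cloud frontier].freeze

# `llm` is a hypothetical plain-hash stand-in for Legion::Settings[:llm].
def tier_order(llm)
  routing = llm[:routing] || {}
  # Array() wraps a single configured tier name into a one-element list.
  Array(llm[:tier_order] || routing[:tier_order] || routing[:tier_priority] || DEFAULT_TIER_ORDER)
end
```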

data/lib/legion/llm/api/openai/responses.rb CHANGED

@@ -130,6 +130,8 @@ module Legion
                 role = item[:role]&.to_s
                 next unless role
+                role = 'system' if role == 'developer'
                 content = item[:content]
                 content = content.to_s if content && !content.is_a?(Array)
                 messages << { role: role, content: content }.compact
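Both Responses API handlers now rewrite `developer` to `system` before building the message array. A standalone sketch of that loop; `normalize_input_items` is a hypothetical name, since the gem performs this inline:

```ruby
# Hypothetical helper: converts Responses API input items into chat messages.
def normalize_input_items(items)
  items.filter_map do |item|
    role = item[:role]&.to_s
    next unless role # items without a role are skipped

    # "developer" is a Responses API role; downstream providers expect "system".
    role = 'system' if role == 'developer'
    content = item[:content]
    content = content.to_s if content && !content.is_a?(Array)
    { role: role, content: content }.compact
  end
end
```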

data/lib/legion/llm/api/translators/openai_response.rb CHANGED

@@ -159,7 +159,10 @@ module Legion
               owned_by: owned_by
             }
             if limits.is_a?(Hash)
-              obj[:context_window] = limits[:context_window] if limits[:context_window]
+              if limits[:context_window]
+                obj[:context_window] = limits[:context_window]
+                obj[:context_size] = limits[:context_window]
+              end
               obj[:max_output_tokens] = limits[:max_output_tokens] if limits[:max_output_tokens]
             end
             obj
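Model objects now carry the context length under both `context_window` and `context_size`, presumably so clients that read either key see the same value. A sketch of the limits handling; `model_object` and the `id`/`object` fields are assumptions, since only `owned_by` and the limits keys appear in this hunk:

```ruby
# Hypothetical builder; only owned_by and the limits handling come from the diff.
def model_object(id, owned_by:, limits: nil)
  obj = { id: id, object: 'model', owned_by: owned_by }
  if limits.is_a?(Hash)
    if limits[:context_window]
      # Same value exposed under two keys.
      obj[:context_window] = limits[:context_window]
      obj[:context_size] = limits[:context_window]
    end
    obj[:max_output_tokens] = limits[:max_output_tokens] if limits[:max_output_tokens]
  end
  obj
end
```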

data/lib/legion/llm/call/lex_llm_adapter.rb CHANGED

@@ -11,7 +11,10 @@ module Legion
         include Legion::Logging::Helper
         METADATA_KEYS = %i[tier capabilities enabled].freeze
-        RESPONSES_PROVIDER_FAMILIES = %i[openai vllm].freeze
+        # Only providers that natively expose /v1/responses (OpenAI API proper).
+        # All other providers (vLLM, Ollama, MLX, Anthropic, Bedrock, Gemini, Vertex, Azure Foundry)
+        # use /v1/chat/completions and must declare :responses in their instance capabilities explicitly.
+        RESPONSES_PROVIDER_FAMILIES = %i[openai].freeze
         def initialize(provider_name, provider_class, instance_config: {})
           @provider_name = provider_name.to_sym
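Per the changelog, only the `:openai` family is treated as Responses-native by default; any other provider must list `:responses` in its instance capabilities to opt in. The adapter's `supports?` implementation is not part of this diff, so the following is a hypothetical sketch of that rule rather than the gem's code:

```ruby
RESPONSES_PROVIDER_FAMILIES = %i[openai].freeze

# Hypothetical predicate: native by family, or opted in via instance capabilities.
def supports_responses?(family, capabilities)
  RESPONSES_PROVIDER_FAMILIES.include?(family.to_sym) ||
    Array(capabilities).map(&:to_sym).include?(:responses)
end
```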

data/lib/legion/llm/discovery/rule_generator.rb CHANGED

@@ -26,7 +26,7 @@ module Legion
           anthropic: :frontier
         }.freeze
-        DEFAULT_TIER_PRIORITY = %i[local direct fleet openai_compat cloud frontier].freeze
+        DEFAULT_TIER_PRIORITY = %i[local direct fleet cloud frontier].freeze
         CAPABILITY_ALIASES = {
           function_calling: :tools,
           functions:        :tools,

data/lib/legion/llm/discovery.rb CHANGED

@@ -17,7 +17,7 @@ module Legion
       @discovered_models_cache = nil
       @discovered_models_at = nil
-      EMBEDDING_TIER_ORDER = %w[local direct fleet openai_compat cloud frontier].freeze
+      EMBEDDING_TIER_ORDER = %w[local direct fleet cloud frontier].freeze
       class << self
         attr_reader :embedding_provider, :embedding_model, :embedding_instance, :embedding_fallback_chain

data/lib/legion/llm/inference/executor.rb CHANGED

@@ -146,6 +146,22 @@ module Legion
           clear_log_context
         end
+        # Returns true when the resolved provider's adapter natively supports the Responses API.
+        # Called by the API layer before choosing call_responses vs call_stream.
+        # Pre-provider steps must have already run (provider is resolved) for this to be accurate;
+        # returns false safely if resolution hasn't happened yet.
+        def provider_supports_responses?
+          provider = @resolved_provider
+          return false unless provider && use_native_dispatch?(provider)
+          ext = Call::Registry.for(provider, instance: @resolved_instance || :default)
+          ext.respond_to?(:supports?) ? ext.supports?(:responses) : false
+        rescue StandardError => e
+          handle_exception(e, level: :warn, handled: true, operation: 'llm.executor.provider_supports_responses',
+                              provider: @resolved_provider)
+          false
+        end
         private
         def set_log_context
@@ -368,7 +384,7 @@ module Legion
         def step_routing
           log.debug "[llm][executor] action=step_routing.enter requested_provider=#{@request.routing[:provider]} requested_model=#{@request.routing[:model]}"
           @timestamps[:routing_start] = Time.now
-          state = resolve_routing_state(apply_proactive_tier_assignment(routing_request_state))
+          state = resolve_routing_state(apply_proactive_tier_assignment(resolve_model_to_local_provider(routing_request_state)))
           auto_route = state[:auto_route] == true
           @resolved_provider = state[:provider] ||
@@ -537,6 +553,43 @@ module Legion
           state
         end
+        # If the caller named a model but gave no explicit provider/tier/instance,
+        # search discovered providers for that model with a healthy circuit.
+        # On a hit: pin provider + instance so normal routing runs against the local copy.
+        # On a miss: clear the model name and set auto_route so the pipeline picks the best
+        # available provider rather than blindly forwarding a frontier model name.
+        def resolve_model_to_local_provider(state)
+          return state if state[:provider_explicit] || state[:tier_explicit] || state[:instance_explicit]
+          return state if state[:provider] || state[:tier] || state[:instance]
+          return state unless state[:model] && defined?(Discovery) && defined?(Router)
+          model = state[:model].to_s
+          all_discovered = Array(Discovery.cached_discovered_models)
+          return state if all_discovered.empty?
+          candidates = all_discovered.select do |m|
+            dn = m[:model].to_s
+            dn == model || dn.start_with?("#{model}:")
+          end
+          return state if candidates.empty?
+          healthy = candidates.find do |m|
+            Router.health_tracker.circuit_state(m[:provider], instance: m[:instance]) != :open
+          end
+          if healthy
+            log.info "[llm][executor] action=model_discovery_pin model=#{model} provider=#{healthy[:provider]} instance=#{healthy[:instance]}"
+            state[:provider] = healthy[:provider]
+            state[:instance] = healthy[:instance]
+          else
+            log.info "[llm][executor] action=model_discovery_miss model=#{model} falling_back=auto_route"
+            state[:model] = nil
+            state[:auto_route] = true
+          end
+          state
+        end
         def resolve_routing_state(state)
           return state unless defined?(Router)
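`resolve_model_to_local_provider` runs before tier assignment and pins a bare model name to a discovered provider whose circuit is not open. A sketch with the dependencies injected, assuming a `discovered` array replaces `Discovery.cached_discovered_models` and a hypothetical `circuit_open` lambda replaces `Router.health_tracker.circuit_state(...) == :open`. Unlike the shipped method, it returns a new hash instead of mutating `state`:

```ruby
# discovered: array of { model:, provider:, instance: } hashes (assumed shape).
# circuit_open: ->(provider, instance) { true/false } (hypothetical stand-in).
def resolve_model_to_local_provider(state, discovered, circuit_open)
  return state if state[:provider_explicit] || state[:tier_explicit] || state[:instance_explicit]
  return state if state[:provider] || state[:tier] || state[:instance]
  return state unless state[:model]
  return state if discovered.empty?

  model = state[:model].to_s
  # Exact match, or a tagged variant such as "llama3:8b" for "llama3".
  candidates = discovered.select do |m|
    name = m[:model].to_s
    name == model || name.start_with?("#{model}:")
  end
  return state if candidates.empty?

  healthy = candidates.find { |m| !circuit_open.call(m[:provider], m[:instance]) }
  if healthy
    state.merge(provider: healthy[:provider], instance: healthy[:instance])
  else
    state.merge(model: nil, auto_route: true)
  end
end
```

As written in the diff, the `auto_route` fallback fires only when matching candidates exist but every one has an open circuit; a model name with no discovered match passes through unchanged, which is narrower than the method's leading comment suggests.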

data/lib/legion/llm/inference.rb CHANGED

@@ -864,7 +864,7 @@ module Legion
       alias effective_tier_is_cloud? effective_tier_is_external?
       def external_tier?(tier)
-        %i[cloud frontier openai_compat].include?(tier)
+        %i[cloud frontier].include?(tier)
       end
     end
   end

data/lib/legion/llm/router/resolution.rb CHANGED

@@ -40,10 +40,6 @@ module Legion
           @tier == :frontier
         end
-        def openai_compat?
-          @tier == :openai_compat
-        end
         def external?
           !%i[local direct fleet].include?(@tier)
         end

data/lib/legion/llm/router.rb CHANGED

@@ -17,8 +17,8 @@ module Legion
       PROVIDER_TIER = { bedrock: :cloud, anthropic: :frontier, openai: :frontier,
                         gemini: :cloud, azure: :cloud, ollama: :local, vllm: :fleet }.freeze
       PROVIDER_ORDER = %i[ollama vllm bedrock azure gemini anthropic openai].freeze
-      TIER_EXTERNAL = Set[:cloud, :frontier, :openai_compat].freeze
-      TIER_RANK = { local: 0, direct: 1, fleet: 2, openai_compat: 3, cloud: 4, frontier: 5 }.freeze
+      TIER_EXTERNAL = Set[:cloud, :frontier].freeze
+      TIER_RANK = { local: 0, direct: 1, fleet: 2, cloud: 3, frontier: 4 }.freeze
       CAPABILITY_ALIASES = {
         function_calling: :tools,
         functions:        :tools,
@@ -142,12 +142,11 @@ module Legion
         end
         # Check whether a tier can be used right now.
-        # :local          — always available
-        # :direct         — always available (remote self-hosted instances)
-        # :fleet          — available when Legion::Transport is loaded
-        # :openai_compat  — available when OpenAI-compatible provider instances are registered
-        # :cloud          — available unless privacy mode
-        # :frontier       — available unless privacy mode
+        # :local    — always available
+        # :direct   — always available (remote self-hosted instances)
+        # :fleet    — available when Legion::Transport is loaded
+        # :cloud    — available unless privacy mode
+        # :frontier — available unless privacy mode
         def tier_available?(tier)
           sym = tier.to_sym
           if external_tier?(sym) && privacy_mode?
@@ -159,11 +158,6 @@ module Legion
             log.debug "[llm][router] action=tier_available tier=fleet available=#{available}"
             return available
           end
-          if sym == :openai_compat
-            available = openai_compat_available?
-            log.debug "[llm][router] action=tier_available tier=openai_compat available=#{available}"
-            return available
-          end
           true
         end
@@ -403,10 +397,6 @@ module Legion
           TIER_EXTERNAL.include?(tier)
         end
-        def openai_compat_available?
-          !registry_entry_for_tier(:openai_compat).nil?
-        end
         def pick_best(candidates)
           return nil if candidates.empty?
@@ -454,8 +444,6 @@ module Legion
           case sym
           when :local, :direct, :fleet
             :ollama
-          when :openai_compat
-            :openai
           when :cloud
             default = Legion::Settings[:llm][:default_provider]
             default ? default.to_sym : :bedrock
@@ -477,8 +465,6 @@ module Legion
           case sym
           when :local, :direct, :fleet
             default_settings_model_for_tier(sym) || 'llama3'
-          when :openai_compat
-            'gpt-4o'
           when :cloud
             default_settings_model_for_tier(sym) || 'us.anthropic.claude-sonnet-4-6'
           when :frontier
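After this release the router knows five tiers, with `cloud` and `frontier` as the only external ones. A sketch of the availability rules stated in the `tier_available?` comment; the `privacy_mode:` and `transport_loaded:` keywords are hypothetical stand-ins for the router's own checks, and since the body of the privacy-mode branch is elided in this diff, returning `false` there is inferred from the comment:

```ruby
require 'set'

TIER_EXTERNAL = Set[:cloud, :frontier].freeze
TIER_RANK = { local: 0, direct: 1, fleet: 2, cloud: 3, frontier: 4 }.freeze

# Hypothetical: privacy_mode and transport_loaded replace the router's own checks.
def tier_available?(tier, privacy_mode:, transport_loaded:)
  sym = tier.to_sym
  return false if TIER_EXTERNAL.include?(sym) && privacy_mode # external tiers blocked
  return transport_loaded if sym == :fleet                    # fleet needs Legion::Transport
  true                                                        # local and direct always available
end
```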

data/lib/legion/llm/settings.rb CHANGED

@@ -76,17 +76,9 @@ module Legion
         end
         routing = settings.is_a?(Hash) ? (settings[:routing] || settings['routing'] || {}) : {}
-        if routing.is_a?(Hash)
-          if routing.key?(:use_fleet) || routing.key?('use_fleet')
-            raise ArgumentError,
-                  'routing.use_fleet has been removed; configure fleet.dispatch.enabled instead'
-          end
-          tiers = routing[:tiers] || routing['tiers'] || {}
-          openai_compat = tiers.is_a?(Hash) ? (tiers[:openai_compat] || tiers['openai_compat'] || {}) : {}
-          if openai_compat.is_a?(Hash) && (openai_compat.key?(:gateways) || openai_compat.key?('gateways'))
-            raise ArgumentError, 'routing.tiers.openai_compat.gateways has been removed; configure lex-llm-openai provider instances instead'
-          end
+        if routing.is_a?(Hash) && (routing.key?(:use_fleet) || routing.key?('use_fleet'))
+          raise ArgumentError,
+                'routing.use_fleet has been removed; configure fleet.dispatch.enabled instead'
         end
         settings
@@ -215,19 +207,18 @@ module Legion
       def self.routing_defaults
         {
           enabled:        true,
-          tier_priority:  %w[local direct fleet openai_compat cloud frontier],
+          tier_priority:  %w[local direct fleet cloud frontier],
           default_intent: { privacy: 'normal', capability: 'moderate', cost: 'normal' },
           tiers:          {
-            local:         { provider: 'ollama' },
-            fleet:         {
+            local:    { provider: 'ollama' },
+            fleet:    {
               queue:           'llm.fleet',
               routing_style:   :shared_lane,
               timeout_seconds: 30,
               timeouts:        { embed: 10, chat: 30, generate: 30, default: 30 }
             },
-            openai_compat: {},
-            cloud:         { providers: %w[bedrock azure gemini] },
-            frontier:      { providers: %w[anthropic openai] }
+            cloud:    { providers: %w[bedrock azure gemini] },
+            frontier: { providers: %w[anthropic openai] }
           },
           health:         {
             window_seconds:               300,
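The settings validation now rejects only the removed `routing.use_fleet` key; a leftover `routing.tiers.openai_compat.gateways` entry is no longer rejected by this check. A standalone sketch of the remaining check; `validate_routing!` is a hypothetical name:

```ruby
# Hypothetical standalone version of the routing check in settings.rb.
def validate_routing!(settings)
  routing = settings.is_a?(Hash) ? (settings[:routing] || settings['routing'] || {}) : {}
  if routing.is_a?(Hash) && (routing.key?(:use_fleet) || routing.key?('use_fleet'))
    raise ArgumentError,
          'routing.use_fleet has been removed; configure fleet.dispatch.enabled instead'
  end
  settings
end
```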

data/lib/legion/llm/version.rb CHANGED

@@ -2,6 +2,6 @@
 module Legion
   module LLM
-    VERSION = '0.11.2'
+    VERSION = '0.12.2'
   end
 end

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: legion-llm
 version: !ruby/object:Gem::Version
-  version: 0.11.2
+  version: 0.12.2
 platform: ruby
 authors:
 - Esity