RubyGems - legion-llm - Versions diffs - 0.9.23 → 0.9.28 - Mend

legion-llm 0.9.23 → 0.9.28

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (21) hide show

checksums.yaml +4 -4
data/CHANGELOG.md +55 -0
data/CLAUDE.md +20 -0
data/lib/legion/llm/api/native/inference.rb +1 -1
data/lib/legion/llm/api/native/models.rb +75 -3
data/lib/legion/llm/api/native/tiers.rb +2 -1
data/lib/legion/llm/call/embeddings.rb +11 -2
data/lib/legion/llm/call/lex_llm_adapter.rb +41 -4
data/lib/legion/llm/call/providers.rb +4 -2
data/lib/legion/llm/context/curator.rb +31 -11
data/lib/legion/llm/discovery/rule_generator.rb +23 -3
data/lib/legion/llm/discovery.rb +35 -6
data/lib/legion/llm/inference/executor.rb +187 -38
data/lib/legion/llm/inference/prompt.rb +30 -19
data/lib/legion/llm/inference/request.rb +45 -2
data/lib/legion/llm/inference/steps/rag_context.rb +1 -0
data/lib/legion/llm/router/escalation/chain.rb +1 -3
data/lib/legion/llm/router.rb +60 -37
data/lib/legion/llm/settings.rb +1 -1
data/lib/legion/llm/version.rb +1 -1
metadata +1 -1

checksums.yaml CHANGED Viewed

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 8c8c98a439d2e96bba437e5e8b4bf8c47c01277a4079bd459c7257e2990278c6
-  data.tar.gz: f9344c761ebf18b4c5ab271ac8cb5858ce46f791588ace438726002d2907c70e
+  metadata.gz: 178958a3403cbac0fad20d83f2726914d420137db2a1c340c33c4c7305457fcd
+  data.tar.gz: df951b9e05e0a0bfaff3701b6a3c5bd8452edea2298fe91e6a98165ce96961d1
 SHA512:
-  metadata.gz: 9574535d0eeca84d522858dd323e8d028994b46b3d3f78a37a8094a4f1a692fbdd68bf24e8a061160b5238a2c3e4f73141e29bf70c6423f18e4b4441937f5417
-  data.tar.gz: 69aa8eccf10beb687b637b7442d9eb8a7bae0d42405fc9cfd47f8c8d5c036b7724df6cb55c10d8264f0701f46f8282f0a44d8622d607a6129a09ab4c39ad2e99
+  metadata.gz: 7ff1622a50cdafc4d09577e8dc5f9e90632f7f93e528f41c1b630ce5daeeeceab6c91cbf5d8ec92183e0b6f27c17a62d09934b6bb0a413da9577cea5a47c942f
+  data.tar.gz: 49651eb56bbe046223674a626ce4339363ba10550793c1d95ca46d4bf2c4b7f2eb430397b7f0d91368791cecb683ae2919f8627edb45d88d5c847f4d0cb4ee12

data/CHANGELOG.md CHANGED Viewed

@@ -1,5 +1,60 @@
 # Legion LLM Changelog
+## [0.9.28] - 2026-05-15
+### Added
+- API: `/api/llm/models` now surfaces a static `LegionIO` model (`id: legionio`) as the default auto-routing placeholder.
+### Changed
+- Routing: `model: "legionio"` clears explicit provider/model/instance/tier routing and sends the request through the router chain using the configured default intent.
+- Routing: default tier priority now includes `direct` between `local` and `fleet`, and discovery-generated rule scores honor `routing.tier_priority`.
+### Fixed
+- Prompt dispatch: provider-inferable model-only calls such as `gpt-5.4` infer the provider instead of pairing the model with `llm.default_provider`.
+- Executor: provider-tier lookup failures are logged and return nil instead of silently defaulting to `:cloud`.
+- LexLLMAdapter: optional content-block accessor fallbacks now capture and debug-log probe errors instead of bare-rescuing them.
+- Auto routing: unresolved `legionio` requests now raise a clear provider error instead of falling back to configured defaults.
+- Routing: model-only requests stay on provider inference while explicit provider/instance/tier requests still get registry defaults without requiring rule routing.
+## [0.9.27] - 2026-05-15
+### Fixed
+- Router/Executor: provider-scoped instance resolution no longer applies a global `llm.default_instance` to models inferred for another provider; invalid explicit instances now fall back to that provider's registered default instance instead of dispatching to an unregistered `provider/instance` pair.
+## [0.9.26] - 2026-05-15
+### Fixed
+- Discovery: `detect_embedding_from_registry` no longer sets `@can_embed = true` when no model is resolvable — adds `first_embedding_model_for(provider, instance)` as a third fallback scanning the discovered model catalog; returns `false` (allowing legacy probe to run) when all three sources yield nothing (#121)
+- RagContext: `positive_integer` no longer raises `TypeError` when `value` is nil or an empty string — adds empty-string guard before `Kernel#Integer()` call so GAIA advisory `context_window: nil` does not abort the inference pipeline (#122)
+- LexLLMAdapter: `text_part_content` now handles Anthropic-style `[{type:"text", text:"…"}]` content block arrays — flattens them to plain text instead of calling `.to_s` on the array, preventing Ruby array literals from leaking into provider prompts (#123)
+- Embeddings/Discovery: `embedding_config_value` and `embedding_settings` now accept the deprecated plural `"embeddings"` key alongside the canonical singular `"embedding"` key, emitting a deprecation warning; fixes silent misconfiguration when users follow doc examples that used the plural spelling (#124)
+## [0.9.25] - 2026-05-14
+### Added
+- Router: `TIER_RANK` constant — ordered quality ranking of tiers (local → direct → fleet → openai_compat → cloud → frontier)
+- Router: `explicit_resolution` promoted to public — callable directly from executor without `send`
+- Router: `chain_from_defaults` appends all registered fallback providers after the primary so the chain has real alternatives to escalate to (previously single-entry when a default provider was configured)
+- Executor: `run_escalation_resolution` extracted from escalation loop — encapsulates per-attempt dispatch, error rescue, and `tried[]` tracking
+- Executor: `skip_same_tier!` — on `ContextOverflow`, immediately skips all remaining same-tier candidates and routes to a higher-tier provider with a larger context window
+- Executor: lateral vs. escalation move classification in per-attempt log line (`move=lateral` for same-tier, `move=escalation` for higher-tier)
+### Fixed
+- Router: `explicit_resolution` handles nil `provider` and nil `tier` without raising `NoMethodError`
+- Executor: `build_fallback_resolutions` sorts lateral alternatives (same-tier) before escalation candidates (higher-tier) — tries other instances at the same tier before promoting to a more expensive one
+- Executor: deduplication in escalation loop is fully safe — `tried` entry is recorded on all rescue paths and on quality failure
+- EscalationChain: `padded_resolutions` no longer pads the list by repeating the last resolution — only real distinct options are tried
+## [0.9.24] - 2026-05-14
+### Fixed
+- API: `instance` from POST body was silently dropped — never forwarded into routing hash
+- Executor: Gaia advisory tier assignment no longer overrides explicit `provider`+`instance` from caller
+- Executor: `instance` now passed through `routing_resolution_for` to `Router.resolve`/`resolve_chain`
+- Executor: `build_default_escalation_chain` now passes resolved provider/instance/model — previously ignored them and built a full auto chain, routing to vllm/fleet instead of the requested provider
+- Router: `resolve`/`resolve_chain` accept `instance:` param; short-circuit to `explicit_resolution` when `provider` or `instance` is set (not just `tier`)
+- Router: `explicit_resolution` honors caller-supplied instance instead of always pulling from registry; infers tier from `PROVIDER_TIER` when not explicitly given
 ## [0.9.23] - 2026-05-13
 ### Added

data/CLAUDE.md CHANGED Viewed

@@ -745,6 +745,26 @@ These rules are enforced across all legion-llm code. Violations will be caught i
 - **Advanced signals**: Budget tracking, GPU utilization monitoring, per-tenant spend limits
 - **Fleet auto-scaling**: Dynamic worker pool sizing based on queue depth and latency
+## Provider Registration & Model Resolution
+- `discover_instances` in each lex-llm-* must include `default_model` in returned config — it flows to registry metadata via `instance_metadata` in `call/providers.rb`
+- Router resolves models via: `registry_entry_for_provider(provider)` → `registry_default_model(entry)` → `metadata[:default_model]`
+- `enabled: false` on an instance config prevents registration — checked in `register_provider_instance`
+- `PROVIDER_DEFAULT_MODEL` does NOT belong in legion-llm — each provider owns its default in its own extension
+- Inventory calls `native_provider_offerings` (full metadata) and excludes `discovery_offerings` for providers with native adapters
+## Metering Spool
+- Events spool to `~/.legionio/data/spool/metering/events.jsonl` when AMQP transport is unavailable
+- Thread-safe (SPOOL_MUTEX), capped at `settings[:metering][:spool][:max_events]` (default 10K)
+- `flush_spool` publishes spooled events when transport reconnects; `lex-llm-ledger` actor triggers it
+## Health Tracker
+- `deny_model(provider:, model:, instance:)` — permanently excludes a model from routing (in-memory, until restart)
+- Config errors (ValidationException, AccessDenied, marketplace) trigger deny instead of circuit breaker
+- Discovery connection failures report `:error` to health tracker — circuit opens after threshold
 ---
 **Maintained By**: Matthew Iverson (@Esity)

data/lib/legion/llm/api/native/inference.rb CHANGED Viewed

@@ -108,7 +108,7 @@ module Legion
                 id:              request_id,
                 messages:        messages,
                 system:          body[:system],
-                routing:         { provider: provider, model: model },
+                routing:         { provider: provider, model: model, instance: body[:instance] }.compact,
                 tools:           tool_declarations,
                 caller:          effective_caller,
                 conversation_id: conversation_id,

data/lib/legion/llm/api/native/models.rb CHANGED Viewed

@@ -9,6 +9,11 @@ module Legion
         module Models
           extend Legion::Logging::Helper
+          AUTO_ROUTING_MODEL_ID = 'legionio'
+          AUTO_ROUTING_MODEL_DISPLAY = 'LegionIO'
+          AUTO_ROUTING_OFFERING_ID = 'legionio:auto:inference:legionio'
+          AUTO_ROUTING_CAPABILITIES = %w[auto_routing chat completion json_schema tools].freeze
           def self.registered(app)
             log.debug('[llm][api][models] registering model inventory routes')
@@ -18,6 +23,7 @@ module Legion
               filters = Legion::LLM::API::Native::Models.request_filters(params)
               offerings = Legion::LLM::Inventory.offerings(filters)
+              offerings = Legion::LLM::API::Native::Models.with_auto_routing_offering(offerings, filters)
               json_response({
                               models:    Legion::LLM::API::Native::Models.model_summaries(offerings),
@@ -34,7 +40,9 @@ module Legion
               log.debug("[llm][api][models] action=get_model id=#{model_id}")
               require_llm!
-              offerings = Legion::LLM::Inventory.offerings(model: model_id)
+              filters = { model: model_id }
+              offerings = Legion::LLM::Inventory.offerings(filters)
+              offerings = Legion::LLM::API::Native::Models.with_auto_routing_offering(offerings, filters)
               halt json_error('model_not_found', "Model '#{model_id}' not found", status_code: 404) unless offerings.any?
               json_response({
@@ -84,11 +92,11 @@ module Legion
             summaries = offerings.group_by { |offering| offering[:model] }.map do |model, rows|
               summarize_model(model, rows)
             end
-            summaries.sort_by { |model| model[:id] }
+            summaries.sort_by { |model| [auto_routing_model?(model[:id]) ? 0 : 1, model[:id]] }
           end
           def self.summarize_model(model, offerings)
-            {
+            summary = {
               id:             model.to_s,
               types:          offerings.map { |offering| offering[:type].to_s }.uniq.sort,
               providers:      offerings.map { |offering| offering[:provider_family] }.uniq.sort,
@@ -99,6 +107,12 @@ module Legion
               max_context:    offerings.filter_map { |offering| offering.dig(:limits, :context_window) }.max,
               enabled:        offerings.any? { |offering| offering[:enabled] != false }
             }
+            if auto_routing_model?(model)
+              summary[:display_name] = AUTO_ROUTING_MODEL_DISPLAY
+              summary[:auto_route] = true
+              summary[:default] = true
+            end
+            summary
           end
           def self.summary(offerings)
@@ -110,6 +124,64 @@ module Legion
                                         .transform_values(&:size)
             }
           end
+          def self.with_auto_routing_offering(offerings, filters = {})
+            return offerings unless auto_routing_offering_matches?(filters)
+            return offerings if offerings.any? { |offering| auto_routing_model?(offering[:model]) }
+            [auto_routing_offering, *offerings]
+          end
+          def self.auto_routing_offering
+            {
+              id:                    AUTO_ROUTING_OFFERING_ID,
+              offering_id:           AUTO_ROUTING_OFFERING_ID,
+              model:                 AUTO_ROUTING_MODEL_ID,
+              display_name:          AUTO_ROUTING_MODEL_DISPLAY,
+              model_family:          'legionio',
+              canonical_model_alias: AUTO_ROUTING_MODEL_ID,
+              type:                  :inference,
+              provider_family:       'legionio',
+              provider_instance:     'auto',
+              instance_id:           'auto',
+              tier:                  :auto,
+              transport:             :internal,
+              enabled:               true,
+              capabilities:          AUTO_ROUTING_CAPABILITIES,
+              limits:                {},
+              health:                { circuit_state: 'available' },
+              metadata:              { auto_route: true, placeholder: true, display_name: AUTO_ROUTING_MODEL_DISPLAY },
+              routing_metadata:      { strategy: 'auto' },
+              source:                'static'
+            }
+          end
+          def self.auto_routing_offering_matches?(filters)
+            normalized = request_filters(filters)
+            type = normalized[:type]
+            return false if type && !type.to_s.empty? && type.to_s != 'inference' && type.to_s != 'chat'
+            provider = normalized[:provider]
+            return false if provider && !provider.to_s.empty? && !%w[legionio auto].include?(provider.to_s.downcase)
+            instance = normalized[:instance_id]
+            return false if instance && !instance.to_s.empty? && !%w[auto legionio].include?(instance.to_s.downcase)
+            model = normalized[:model] || normalized[:offering_id]
+            return false if model && !model.to_s.empty? && !auto_routing_model?(model) && model.to_s != AUTO_ROUTING_OFFERING_ID
+            family = normalized[:model_family]
+            return false if family && !family.to_s.empty? && family.to_s.downcase != 'legionio'
+            capability = normalized[:capability]
+            return false if capability && !AUTO_ROUTING_CAPABILITIES.include?(capability.to_s)
+            true
+          end
+          def self.auto_routing_model?(model)
+            model.to_s.strip.downcase == AUTO_ROUTING_MODEL_ID
+          end
         end
       end
     end

data/lib/legion/llm/api/native/tiers.rb CHANGED Viewed

@@ -232,7 +232,8 @@ module Legion
             return 'unknown' unless tracker
             tracker.circuit_state(provider_name.to_sym, instance: instance_name.to_sym).to_s
-          rescue StandardError
+          rescue StandardError => e
+            log.debug "[llm][tiers] action=offering_instance_health provider=#{provider_name} instance=#{instance_name} error=#{e.class} — #{e.message}"
             'unknown'
           end
         end

data/lib/legion/llm/call/embeddings.rb CHANGED Viewed

@@ -122,12 +122,21 @@ module Legion
           def resolve_provider
             LLM.embedding_provider ||
-              Legion::LLM::Settings.value(:embedding, :provider)&.to_sym
+              embedding_config_value(:provider)&.to_sym
           end
           def resolve_model
             LLM.embedding_model ||
-              Legion::LLM::Settings.value(:embedding, :default_model)
+              embedding_config_value(:default_model)
+          end
+          def embedding_config_value(key)
+            v = Legion::LLM::Settings.value(:embedding, key)
+            return v unless v.nil?
+            plural = Legion::LLM::Settings.value(:embeddings, key)
+            log.warn "[llm][embeddings] settings key \"embeddings.#{key}\" (plural) is deprecated — rename to \"embedding.#{key}\"" unless plural.nil?
+            plural
           end
           def coerce_text(value)

data/lib/legion/llm/call/lex_llm_adapter.rb CHANGED Viewed

@@ -239,12 +239,49 @@ module Legion
         end
         def text_part_content(part)
-          return unless part.respond_to?(:transform_keys)
+          return part if part.is_a?(String)
-          normalized = part.transform_keys { |key| key.respond_to?(:to_sym) ? key.to_sym : key }
-          return unless normalized[:type].to_s == 'text'
+          if part.respond_to?(:transform_keys)
+            normalized = part.transform_keys { |key| key.respond_to?(:to_sym) ? key.to_sym : key }
+            return unless normalized[:type].to_s == 'text'
-          normalized[:text].to_s
+            return normalized[:text].to_s
+          end
+          # Data structs expose named readers (type/text) without necessarily implementing [].
+          # Try named accessor path first; fall through to [] / fetch for plain hashes/structs.
+          if part.respond_to?(:type) || part.respond_to?(:text)
+            type = (part.respond_to?(:type) ? part.type.to_s : '')
+            text = part.respond_to?(:text) ? part.text : nil
+            return text.to_s if type == 'text' || (type.empty? && !text.nil?)
+            return nil
+          end
+          return unless part.respond_to?(:[]) || part.respond_to?(:fetch)
+          type = (defined_method_access(part, :type) || '').to_s
+          text = defined_method_access(part, :text)
+          text.to_s if type == 'text' || (type.empty? && !text.nil?)
+        end
+        def defined_method_access(obj, key)
+          # Prefer named accessor (covers Data structs like Types::ContentBlock).
+          key_sym = key.respond_to?(:to_sym) ? key.to_sym : key
+          return obj.public_send(key_sym) if obj.respond_to?(key_sym)
+          str_key = key.to_s
+          obj[key]
+        rescue TypeError, NoMethodError, KeyError => e
+          log.debug "[llm][adapter] action=defined_method_access key=#{key} class=#{obj.class} " \
+                    "fallback=string_key error=#{e.class}: #{e.message}"
+          begin
+            obj[str_key]
+          rescue TypeError, NoMethodError, KeyError => fallback_error
+            log.debug "[llm][adapter] action=defined_method_access key=#{key} class=#{obj.class} " \
+                      "fallback=none error=#{fallback_error.class}: #{fallback_error.message}"
+            nil
+          end
         end
         def normalize_message_tool_calls(tool_calls)

data/lib/legion/llm/call/providers.rb CHANGED Viewed

@@ -52,7 +52,8 @@ module Legion
           Legion::Extensions::Llm.constants(false).filter_map do |const_name|
             mod = Legion::Extensions::Llm.const_get(const_name, false)
             provider_module?(mod) ? mod : nil
-          rescue NameError
+          rescue NameError => e
+            log.debug "[llm][providers] action=discover_provider_modules const=#{const_name} error=#{e.class} — #{e.message}"
             nil
           end
         end
@@ -120,7 +121,8 @@ module Legion
           return nil unless provider_module&.const_defined?(:PROVIDER_FAMILY, false)
           provider_module::PROVIDER_FAMILY
-        rescue StandardError
+        rescue StandardError => e
+          log.debug "[llm][providers] action=safe_provider_family error=#{e.class} — #{e.message}"
           nil
         end

data/lib/legion/llm/context/curator.rb CHANGED Viewed

@@ -9,9 +9,15 @@ module Legion
       class Curator
         include Legion::Logging::Helper
-        CURATED_KEY    = :__curated__
-        THINKING_OPEN  = '<thinking>'
-        THINKING_CLOSE = '</thinking>'
+        CURATED_KEY = :__curated__
+        # All known provider thinking tag variants.
+        # Anthropic: <thinking>…</thinking>
+        # DeepSeek / Qwen / Ollama / vLLM inline: <think>…</think>
+        THINKING_TAG_PAIRS = [
+          ['<thinking>', '</thinking>'],
+          ['<think>',    '</think>']
+        ].freeze
         def initialize(conversation_id:)
           @conversation_id = conversation_id
@@ -76,6 +82,8 @@ module Legion
           return msg if content.length <= max_chars
           summary = heuristic_tool_summary(content, tool_name_from(msg))
+          log.debug "[llm][curator] action=distill_tool_result conversation_id=#{@conversation_id} " \
+                    "original_chars=#{content.length} summary_chars=#{summary.length}"
           msg.merge(content: summary, curated: true, original_content: content)
         end
@@ -89,6 +97,8 @@ module Legion
           return msg if stripped == content || stripped.empty?
+          log.debug "[llm][curator] action=strip_thinking conversation_id=#{@conversation_id} " \
+                    "original_chars=#{content.length} stripped_chars=#{stripped.length}"
           msg.merge(content: stripped, curated: true, original_content: content)
         end
@@ -192,18 +202,27 @@ module Legion
         end
         def strip_thinking_tags(text)
-          result = +''
+          result = text
+          THINKING_TAG_PAIRS.each do |open_tag, close_tag|
+            result = strip_tag_pair(result, open_tag, close_tag)
+          end
+          result
+        end
+        def strip_tag_pair(text, open_tag, close_tag)
+          out = +''
           pos = 0
           while pos < text.length
-            open_idx = text.index(THINKING_OPEN, pos)
+            open_idx = text.index(open_tag, pos)
             break unless open_idx
-            result << text[pos...open_idx]
-            close_idx = text.index(THINKING_CLOSE, open_idx + THINKING_OPEN.length)
-            pos = close_idx ? close_idx + THINKING_CLOSE.length : text.length
+            out << text[pos...open_idx]
+            close_idx = text.index(close_tag, open_idx + open_tag.length)
+            pos = close_idx ? close_idx + close_tag.length : text.length
           end
-          result << text[pos..] if pos < text.length
-          result
+          out << text[pos..] if pos < text.length
+          # Strip any unclosed open tag left at the end (provider died mid-stream).
+          out.sub(/#{Regexp.escape(open_tag)}.*\z/m, '').strip
         end
         def curate_message(msg, assistant_response)
@@ -427,7 +446,8 @@ module Legion
         def curated_payload(entry)
           parsed = Legion::JSON.parse(entry[:content].to_s)
           parsed.is_a?(Hash) ? parsed : {}
-        rescue Legion::JSON::ParseError
+        rescue Legion::JSON::ParseError => e
+          log.debug "[llm][curator] action=curated_payload conversation_id=#{@conversation_id} error=#{e.class} — #{e.message}"
           {}
         end

data/lib/legion/llm/discovery/rule_generator.rb CHANGED Viewed

@@ -26,7 +26,7 @@ module Legion
           anthropic: :frontier
         }.freeze
-        TIER_WEIGHT = { local: 100, fleet: 80, cloud: 60, frontier: 40 }.freeze
+        DEFAULT_TIER_PRIORITY = %i[local direct fleet openai_compat cloud frontier].freeze
         module_function
@@ -50,7 +50,7 @@ module Legion
                              extract_field(model_data, 'tier')&.to_sym ||
                              tier
                 capability = embedding_model?(model_data) ? :embed : :chat
-                priority = (TIER_WEIGHT[model_tier] || 80) - order
+                priority = tier_weight(model_tier) - order
                 rules << build_rule(provider, instance_id, model_data, capability, model_tier, priority)
                 rules << build_rule(provider, instance_id, model_data, :stream, model_tier, priority) if capability == :chat
                 order += 1
@@ -91,7 +91,7 @@ module Legion
             next unless default_model
             model_data = { name: default_model }
-            priority = TIER_WEIGHT[tier] || 40
+            priority = tier_weight(tier)
             rules << build_rule(provider_name, :default, model_data, :chat, tier, priority)
             rules << build_rule(provider_name, :default, model_data, :stream, tier, priority)
           end
@@ -136,6 +136,26 @@ module Legion
           model_data[field] || model_data[field.to_s]
         end
+        def tier_weight(tier)
+          tier_sym = tier.respond_to?(:to_sym) ? tier.to_sym : tier
+          index = tier_priority.index(tier_sym)
+          return 0 unless index
+          (tier_priority.length - index) * 100
+        end
+        def tier_priority
+          configured = Legion::LLM::Settings.value(:routing, :tier_priority, default: DEFAULT_TIER_PRIORITY)
+          normalized = Array(configured).filter_map do |tier|
+            tier.to_sym if tier.respond_to?(:to_sym)
+          end
+          normalized = DEFAULT_TIER_PRIORITY if normalized.empty?
+          (normalized + DEFAULT_TIER_PRIORITY).uniq
+        rescue StandardError => e
+          handle_exception(e, level: :warn, handled: true, operation: 'rule_generator.tier_priority')
+          DEFAULT_TIER_PRIORITY
+        end
         def extension_providers
           ext = Legion::Settings[:extensions]
           return ext[:llm] if ext.is_a?(Hash) && ext[:llm].is_a?(Hash)

data/lib/legion/llm/discovery.rb CHANGED Viewed

@@ -254,11 +254,22 @@ module Legion
           end
           return false unless best
-          @embedding_provider = best[:provider]
-          @embedding_model = best.dig(:metadata, :default_model) ||
-                             Settings.value(:embedding, :default_model)
-          @embedding_instance = best[:instance]
-          @can_embed = true
+          provider  = best[:provider]
+          instance  = best[:instance]
+          resolved  = best.dig(:metadata, :default_model) ||
+                      embedding_settings[:default_model] ||
+                      first_embedding_model_for(provider, instance)
+          unless resolved.to_s.length.positive?
+            log.debug '[llm][discovery] action=detect_embedding_from_registry no_model_resolved ' \
+                      "provider=#{provider} instance=#{instance} — falling through to legacy probe"
+            return false
+          end
+          @embedding_provider = provider
+          @embedding_model    = resolved
+          @embedding_instance = instance
+          @can_embed          = true
           @embedding_fallback_chain = build_registry_embedding_fallback(embedding_instances)
           log.info "[llm][discovery] embedding available provider=#{@embedding_provider} " \
@@ -280,6 +291,14 @@ module Legion
           end
         end
+        def first_embedding_model_for(provider, instance)
+          embedding_caps = %w[embedding embeddings embed].freeze
+          cached_discovered_models.find do |m|
+            m[:provider].to_s == provider.to_s && m[:instance].to_s == instance.to_s &&
+              Array(m[:capabilities]).any? { |c| embedding_caps.include?(c.to_s) }
+          end&.dig(:model)
+        end
         def find_embedding_provider(embedding_settings)
           fallback = Legion::LLM::Settings.config_value(embedding_settings, :provider_fallback, %w[ollama bedrock openai])
           provider_models = Legion::LLM::Settings.config_value(embedding_settings, :provider_models, {})
@@ -396,7 +415,17 @@ module Legion
         end
         def embedding_settings
-          Legion::LLM::Settings.config_value(llm_settings, :embedding, {})
+          settings = llm_settings
+          result = Legion::LLM::Settings.config_value(settings, :embedding)
+          return result if result.is_a?(Hash) && !result.empty?
+          plural = Legion::LLM::Settings.config_value(settings, :embeddings)
+          if plural.is_a?(Hash) && !plural.empty?
+            log.warn '[llm][discovery] settings key "embeddings" (plural) is deprecated — rename to "embedding" (singular)'
+            return plural
+          end
+          result || {}
         end
         def providers_settings

data/lib/legion/llm/inference/executor.rb CHANGED Viewed

@@ -159,9 +159,11 @@ module Legion
           return meta[:tier].to_sym if meta.is_a?(Hash) && meta[:tier]
           return Router.provider_tier(provider) if defined?(Router) && Router.respond_to?(:provider_tier)
-          Router::PROVIDER_TIER.fetch(provider.to_sym, :cloud) if defined?(Router::PROVIDER_TIER)
-        rescue StandardError
-          :cloud
+          Router::PROVIDER_TIER.fetch(provider.to_sym, nil) if defined?(Router::PROVIDER_TIER)
+        rescue StandardError => e
+          handle_exception(e, level: :warn, handled: true, operation: 'llm.pipeline.inferred_provider_tier',
+                              provider: provider)
+          nil
         end
         def execute_steps
@@ -326,12 +328,17 @@ module Legion
           log.debug "[llm][executor] action=step_routing.enter requested_provider=#{@request.routing[:provider]} requested_model=#{@request.routing[:model]}"
           @timestamps[:routing_start] = Time.now
           state = resolve_routing_state(apply_proactive_tier_assignment(routing_request_state))
+          auto_route = state[:auto_route] == true
           @resolved_provider = state[:provider] ||
                                (state[:model] && Router.infer_provider_for_model(state[:model])) ||
-                               llm_setting(:default_provider)
-          @resolved_instance = state[:instance] || llm_setting(:default_instance)
-          @resolved_model = state[:model] || llm_setting(:default_model)
+                               (llm_setting(:default_provider) unless auto_route)
+          @resolved_instance = resolve_provider_instance(state[:instance], @resolved_provider)
+          @resolved_model = state[:model] || (llm_setting(:default_model) unless auto_route)
+          if auto_route && (@resolved_provider.nil? || @resolved_model.nil?)
+            raise ProviderError, 'Auto routing could not resolve an available LLM provider/model'
+          end
           @resolved_tier = state[:tier]&.to_sym || inferred_provider_tier(@resolved_provider)
           @resolved_offering_id = state[:offering_id]
           @resolved_offering_metadata = state[:offering_metadata]
@@ -347,16 +354,43 @@ module Legion
           )
         end
+        def resolve_provider_instance(requested_instance, provider)
+          return provider_scoped_instance(requested_instance, provider, preserve_unknown: true) if requested_instance
+          provider_scoped_instance(llm_setting(:default_instance), provider, preserve_unknown: false)
+        end
+        def provider_scoped_instance(instance, provider, preserve_unknown:)
+          return nil if instance.nil? || instance.to_s.empty? || provider.nil? || provider.to_s.empty?
+          provider_sym = provider.to_sym
+          instance_sym = instance.to_sym
+          return instance_sym if Call::Registry.registered?(provider_sym, instance: instance_sym)
+          return nil if Call::Registry.registered?(provider_sym)
+          preserve_unknown ? instance_sym : nil
+        rescue StandardError => e
+          handle_exception(e, level: :warn, handled: true, operation: 'llm.pipeline.provider_scoped_instance')
+          preserve_unknown ? instance : nil
+        end
         def routing_request_state
+          routing_explicit = @request.extra[:routing_explicit]
+          instance = @request.routing[:instance] || @request.routing[:instance_id] || @request.routing[:provider_instance]
+          tier = @request.extra[:tier]
           {
             provider:          @request.routing[:provider],
-            instance:          @request.routing[:instance] || @request.routing[:instance_id] || @request.routing[:provider_instance],
+            instance:          instance,
             model:             @request.routing[:model],
             offering_id:       @request.routing[:offering_id] || @request.routing[:id],
             offering_metadata: normalize_offering_metadata(@request.routing[:offering_metadata] ||
                                                            @request.routing[:offering]),
             intent:            @request.extra[:intent],
-            tier:              @request.extra[:tier]
+            tier:              tier,
+            auto_route:        @request.extra[:auto_route],
+            provider_explicit: routing_field_explicit?(routing_explicit, :provider, @request.routing[:provider]),
+            instance_explicit: routing_field_explicit?(routing_explicit, :instance, instance),
+            tier_explicit:     routing_field_explicit?(routing_explicit, :tier, tier)
           }
         end
@@ -365,17 +399,25 @@ module Legion
           # caller-supplied tier/intent. Advisory assignments only fill blanks.
           if @proactive_tier_assignment&.dig(:forced)
             state[:tier] = @proactive_tier_assignment[:tier]
+            state[:tier_explicit] = true
             state[:intent] = merge_routing_intent(state[:intent], @proactive_tier_assignment[:intent])
             log.info "[llm][routing] action=forced_tier source=#{@proactive_tier_assignment[:source]} tier=#{state[:tier]}"
-          elsif @proactive_tier_assignment && !state[:tier] && !state[:intent]
+          elsif @proactive_tier_assignment && !state[:tier] && !state[:intent] && !state[:instance] &&
+                !state[:provider] && !state[:model]
             state[:tier] = @proactive_tier_assignment[:tier]
+            state[:tier_explicit] = true
             state[:intent] = @proactive_tier_assignment[:intent]
           end
           state
         end
         def resolve_routing_state(state)
-          return state unless (state[:intent] || state[:tier]) && defined?(Router) && Router.routing_enabled?
+          return state unless defined?(Router)
+          explicit_route = state[:provider_explicit] || state[:instance_explicit] || state[:tier_explicit]
+          auto_route = state[:auto_route] == true
+          intent_route = state[:intent] && Router.routing_enabled?
+          return state unless explicit_route || auto_route || intent_route
           resolution = routing_resolution_for(state)
           return state unless resolution
@@ -384,17 +426,20 @@ module Legion
         end
         def routing_resolution_for(state)
-          if pipeline_escalation_enabled?
+          if state[:auto_route] == true || (state[:intent] && pipeline_escalation_enabled?)
             @escalation_chain = Router.resolve_chain(
-              intent:          state[:intent],
-              tier:            state[:tier],
-              model:           state[:model],
-              provider:        state[:provider],
-              max_escalations: pipeline_escalation_max_attempts
+              intent:                 state[:intent],
+              tier:                   state[:tier],
+              model:                  state[:model],
+              provider:               state[:provider],
+              instance:               state[:instance],
+              max_escalations:        pipeline_escalation_max_attempts,
+              allow_default_fallback: state[:auto_route] != true
             )
             @escalation_chain.primary
           else
-            Router.resolve(intent: state[:intent], tier: state[:tier], model: state[:model], provider: state[:provider])
+            Router.resolve(intent: state[:intent], tier: state[:tier], model: state[:model],
+                           provider: state[:provider], instance: state[:instance])
           end
         end
@@ -422,6 +467,13 @@ module Legion
           state
         end
+        def routing_field_explicit?(flags, key, value)
+          return false if value.nil? || value.to_s.empty?
+          return true unless flags.is_a?(Hash)
+          flags.fetch(key, flags.fetch(key.to_s, true)) == true
+        end
         def step_request_normalization
           @exchange_id = Tracing.exchange_id
         end
@@ -476,33 +528,21 @@ module Legion
         end
         def run_provider_call_with_escalation
-          chain = @escalation_chain || build_default_escalation_chain
+          @escalation_chain ||= build_default_escalation_chain
+          chain = @escalation_chain
           threshold = pipeline_escalation_quality_threshold
           quality_check = @request.extra[:quality_check]
           succeeded = false
+          tried = []
           log.debug "[llm][executor] action=escalation.enter chain_size=#{chain.size} threshold=#{threshold}"
+          primary_tier = @escalation_chain.primary&.tier
           chain.each do |resolution|
-            start_time = Time.now
-            @resolved_provider = resolution.provider
-            @resolved_instance = resolution.instance
-            @resolved_model = resolution.model
-            @resolved_tier = resolution.tier
-            @resolved_offering_id = resolution.offering_id
-            @resolved_offering_metadata = resolution.offering_metadata
-            succeeded = attempt_escalation(resolution, threshold, quality_check, start_time)
+            next if tried.any? { |t| t[:provider] == resolution.provider && t[:instance] == resolution.instance && t[:model] == resolution.model }
+            succeeded = run_escalation_resolution(resolution, threshold, quality_check, tried, primary_tier)
             break if succeeded
-          rescue Legion::LLM::AuthError, Legion::LLM::PrivacyModeError => e
-            record_escalation_failure(e, resolution, start_time,
-                                      outcome: :auth_error, operation: 'llm.pipeline.escalation_attempt.auth',
-                                      handled: true)
-          rescue Legion::LLM::RateLimitError => e
-            record_escalation_failure(e, resolution, start_time,
-                                      outcome: :rate_limited, operation: 'llm.pipeline.escalation_attempt.rate_limit',
-                                      handled: true)
-          rescue StandardError => e
-            record_escalation_failure(e, resolution, start_time, outcome:   :error,
-                                                                 operation: 'llm.pipeline.escalation_attempt')
           end
           return if succeeded
@@ -513,6 +553,58 @@ module Legion
           raise EscalationExhausted, "All #{@escalation_history.size} escalation attempts failed"
         end
+        def run_escalation_resolution(resolution, threshold, quality_check, tried, primary_tier)
+          move_type = if tried.empty?
+                        :primary
+                      elsif resolution.tier == primary_tier
+                        :lateral
+                      else
+                        :escalation
+                      end
+          log.info "[llm][escalation] action=attempt move=#{move_type} provider=#{resolution.provider} model=#{resolution.model} tier=#{resolution.tier}"
+          start_time = Time.now
+          @resolved_provider = resolution.provider
+          @resolved_instance = resolution.instance
+          @resolved_model = resolution.model
+          @resolved_tier = resolution.tier
+          @resolved_offering_id = resolution.offering_id
+          @resolved_offering_metadata = resolution.offering_metadata
+          succeeded = attempt_escalation(resolution, threshold, quality_check, start_time)
+          tried << { provider: resolution.provider, instance: resolution.instance, model: resolution.model } unless succeeded
+          succeeded
+        rescue Legion::LLM::AuthError, Legion::LLM::PrivacyModeError => e
+          tried << { provider: resolution.provider, instance: resolution.instance, model: resolution.model }
+          record_escalation_failure(e, resolution, start_time,
+                                    outcome:   :auth_error,
+                                    operation: 'llm.pipeline.escalation_attempt.auth',
+                                    handled:   true)
+          false
+        rescue Legion::LLM::ContextOverflow => e
+          tried << { provider: resolution.provider, instance: resolution.instance, model: resolution.model }
+          record_escalation_failure(e, resolution, start_time,
+                                    outcome:   :context_overflow,
+                                    operation: 'llm.pipeline.escalation_attempt.context_overflow',
+                                    handled:   true)
+          log.warn "[llm][escalation] context_overflow provider=#{resolution.provider} " \
+                   "model=#{resolution.model} — skipping same-tier, seeking larger context window"
+          skip_same_tier!(resolution, tried)
+          false
+        rescue Legion::LLM::RateLimitError => e
+          tried << { provider: resolution.provider, instance: resolution.instance, model: resolution.model }
+          record_escalation_failure(e, resolution, start_time,
+                                    outcome:   :rate_limited,
+                                    operation: 'llm.pipeline.escalation_attempt.rate_limit',
+                                    handled:   true)
+          false
+        rescue StandardError => e
+          tried << { provider: resolution.provider, instance: resolution.instance, model: resolution.model }
+          record_escalation_failure(e, resolution, start_time,
+                                    outcome:   :error,
+                                    operation: 'llm.pipeline.escalation_attempt')
+          false
+        end
         def attempt_escalation(resolution, threshold, quality_check, start_time)
           @current_escalation_context = {
             attempt:      @escalation_history.size + 1,
@@ -586,7 +678,64 @@ module Legion
         end
         def build_default_escalation_chain
-          Router.resolve_chain(max_escalations: pipeline_escalation_max_attempts)
+          primary = Router.explicit_resolution(@resolved_tier, @resolved_provider, @resolved_model, @resolved_instance)
+          fallbacks = build_fallback_resolutions(
+            exclude_provider: @resolved_provider,
+            exclude_instance: @resolved_instance,
+            primary_tier:     @resolved_tier
+          )
+          resolutions = ([primary] + fallbacks).compact.uniq { |r| [r.provider, r.instance, r.model] }
+          Router::EscalationChain.new(resolutions: resolutions, max_attempts: pipeline_escalation_max_attempts)
+        end
+        def build_fallback_resolutions(exclude_provider: nil, exclude_instance: nil, primary_tier: nil)
+          tier_rank = Router::TIER_RANK
+          primary_rank = primary_tier ? (tier_rank[primary_tier.to_sym] || 99) : 99
+          candidates = Call::Registry.all_instances.filter_map do |entry|
+            next if entry[:provider] == exclude_provider&.to_sym && entry[:instance] == (exclude_instance&.to_sym || :default)
+            next if entry[:provider] == exclude_provider&.to_sym && exclude_instance.nil?
+            model = Router.send(:registry_default_model, entry)
+            next unless model
+            tier = Router::PROVIDER_TIER.fetch(entry[:provider], :frontier)
+            Router::Resolution.new(
+              tier:     tier,
+              provider: entry[:provider],
+              instance: entry[:instance] == :default ? nil : entry[:instance],
+              model:    model,
+              rule:     'escalation_fallback'
+            )
+          end
+          # Lateral alternatives (same tier) come first; escalations (higher tier) follow;
+          # lower-ranked tiers are appended last.
+          candidates.sort_by do |r|
+            r_rank = tier_rank[r.tier] || 99
+            rank_diff = r_rank - primary_rank
+            bucket = if rank_diff.zero?
+                       0
+                     elsif rank_diff.positive?
+                       1
+                     else
+                       2
+                     end
+            [bucket, r_rank]
+          end
+        end
+        def skip_same_tier!(failed_resolution, tried)
+          chain = @escalation_chain
+          return unless chain.respond_to?(:each)
+          chain.each do |r|
+            next if r.tier != failed_resolution.tier
+            next if tried.any? { |t| t[:provider] == r.provider && t[:instance] == r.instance && t[:model] == r.model }
+            log.debug "[llm][escalation] action=skip_same_tier provider=#{r.provider} model=#{r.model} tier=#{r.tier} reason=context_overflow"
+            tried << { provider: r.provider, instance: r.instance, model: r.model }
+          end
         end
         def escalation_attempt_hash(resolution, outcome:, failures:, duration_ms:)
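
Note on the fallback ordering added in `build_fallback_resolutions`: the sort places same-tier (lateral) candidates first, higher-ranked tiers next, and lower-ranked tiers last. The sketch below reproduces that ordering in isolation. `TIER_RANK` is copied from the router.rb hunk in this diff; `Candidate` and `order_fallbacks` are hypothetical stand-ins for `Router::Resolution` and the private method, so treat this as an approximation rather than the gem's code.

```ruby
# Standalone sketch of the fallback ordering. Candidate and order_fallbacks
# are hypothetical names; only TIER_RANK is taken from the diff.
TIER_RANK = { local: 0, direct: 1, fleet: 2, openai_compat: 3, cloud: 4, frontier: 5 }.freeze
Candidate = Struct.new(:provider, :tier)

def order_fallbacks(candidates, primary_tier)
  # Unknown or missing tiers rank last (99), as in the diff.
  primary_rank = primary_tier ? TIER_RANK.fetch(primary_tier.to_sym, 99) : 99
  candidates.sort_by do |candidate|
    rank = TIER_RANK.fetch(candidate.tier, 99)
    diff = rank - primary_rank
    bucket = if diff.zero?
               0 # lateral: same tier as the primary
             elsif diff.positive?
               1 # escalation: higher-ranked tier
             else
               2 # lower-ranked tier, tried last
             end
    [bucket, rank]
  end
end

candidates = [
  Candidate.new(:ollama, :local),
  Candidate.new(:anthropic, :frontier),
  Candidate.new(:vllm, :fleet),
  Candidate.new(:bedrock, :cloud)
]
p order_fallbacks(candidates, :fleet).map(&:provider)
# prints [:vllm, :bedrock, :anthropic, :ollama]
```

With no primary tier, every candidate falls into the last bucket and the result is simply sorted by ascending tier rank.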

data/lib/legion/llm/inference/prompt.rb CHANGED

@@ -39,8 +39,17 @@ module Legion
                      cache: nil,
                      quality_check: nil,
                      **)
+          routing_explicit = { provider: !provider.nil?, model: !model.nil?, tier: !tier.nil? }
           resolved_provider = provider
           resolved_model = model
+          auto_route = Inference::Request.auto_routing_model?(resolved_model)
+          if auto_route
+            resolved_provider = nil
+            intent ||= Inference::Request.default_auto_routing_intent
+          elsif resolved_provider.nil? && resolved_model && defined?(Router)
+            resolved_provider = Router.infer_provider_for_model(resolved_model)
+          end
           if resolved_provider.nil? && resolved_model.nil? && defined?(Router) && Router.routing_enabled? && (intent || tier)
             resolution = Router.resolve(intent: intent, tier: tier, exclude: exclude)
@@ -48,26 +57,27 @@ module Legion
             resolved_model = resolution&.model
           end
-          resolved_provider ||= llm_setting(:default_provider)
-          resolved_model ||= llm_setting(:default_model)
+          resolved_provider ||= llm_setting(:default_provider) unless auto_route
+          resolved_model ||= llm_setting(:default_model) unless auto_route
           request(message,
-                  provider:        resolved_provider,
-                  model:           resolved_model,
-                  intent:          intent,
-                  tier:            tier,
-                  schema:          schema,
-                  tools:           tools,
-                  escalate:        escalate,
-                  max_escalations: max_escalations,
-                  thinking:        thinking,
-                  temperature:     temperature,
-                  max_tokens:      max_tokens,
-                  tracing:         tracing,
-                  agent:           agent,
-                  caller:          caller,
-                  cache:           cache,
-                  quality_check:   quality_check,
+                  provider:         resolved_provider,
+                  model:            resolved_model,
+                  intent:           intent,
+                  tier:             tier,
+                  schema:           schema,
+                  tools:            tools,
+                  escalate:         escalate,
+                  max_escalations:  max_escalations,
+                  thinking:         thinking,
+                  temperature:      temperature,
+                  max_tokens:       max_tokens,
+                  tracing:          tracing,
+                  agent:            agent,
+                  caller:           caller,
+                  cache:            cache,
+                  quality_check:    quality_check,
+                  routing_explicit: routing_explicit,
                   **)
         end
@@ -90,7 +100,8 @@ module Legion
                     cache: nil,
                     quality_check: nil,
                     **)
-          if provider.nil? || model.nil?
+          auto_route = Inference::Request.auto_routing_model?(model)
+          if !auto_route && (provider.nil? || model.nil?)
             raise LLMError, "Prompt.request: provider and model must be set (got provider=#{provider.inspect}, model=#{model.inspect}). " \
                             'Configure Legion::Settings[:llm][:default_provider] and [:default_model], or pass them explicitly.'
           end
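
Note on `routing_explicit`: the flags hash built in `Prompt` above is read back by `routing_field_explicit?` in the executor.rb hunk. The sketch below lifts that helper into a free function to show how the flags are interpreted. The body follows the diff; the example inputs are illustrative assumptions.

```ruby
# Free-function copy of the helper from the executor.rb hunk, for illustration.
def routing_field_explicit?(flags, key, value)
  # A blank value can never count as an explicit choice.
  return false if value.nil? || value.to_s.empty?
  # Without a flags hash, any present value is treated as explicit.
  return true unless flags.is_a?(Hash)

  # Accept symbol or string keys; a missing flag defaults to explicit.
  flags.fetch(key, flags.fetch(key.to_s, true)) == true
end

# The provider was filled in from defaults, so the caller flagged it as not explicit.
p routing_field_explicit?({ provider: false }, :provider, :openai) # prints false
# No flags supplied at all: a present value is assumed explicit.
p routing_field_explicit?(nil, :tier, :cloud)                      # prints true
```

Defaulting a missing flag to `true` appears intended to keep callers that never send `routing_explicit` on the old behaviour, where any supplied provider, instance, or tier was honoured.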

data/lib/legion/llm/inference/request.rb CHANGED

@@ -3,6 +3,9 @@
 module Legion
   module LLM
     module Inference
+      AUTO_ROUTING_MODEL_KEY = 'legionio'
+      AUTO_ROUTING_MODEL_ALIASES = [AUTO_ROUTING_MODEL_KEY].freeze
       Request = ::Data.define(
         :id, :conversation_id, :idempotency_key, :schema_version,
         :system, :messages, :tools, :tool_choice,
@@ -14,6 +17,11 @@ module Legion
         :billing, :test, :modality, :hooks
       ) do
         def self.build(**kwargs)
+          routing, extra = normalize_auto_routing(
+            kwargs.fetch(:routing, { provider: nil, model: nil }),
+            kwargs.fetch(:extra, {})
+          )
           new(
             id:               kwargs[:id] || "req_#{SecureRandom.hex(12)}",
             conversation_id:  kwargs[:conversation_id],
@@ -23,7 +31,7 @@ module Legion
             messages:         kwargs.fetch(:messages, []),
             tools:            kwargs.key?(:tools) ? kwargs[:tools] : nil,
             tool_choice:      kwargs.fetch(:tool_choice, { mode: :auto }),
-            routing:          kwargs.fetch(:routing, { provider: nil, model: nil }),
+            routing:          routing,
             tokens:           kwargs.fetch(:tokens, { max: 4096 }),
             stop:             kwargs.fetch(:stop, { sequences: [] }),
             generation:       kwargs.fetch(:generation, {}),
@@ -35,7 +43,7 @@ module Legion
             cache:            kwargs.fetch(:cache, { strategy: :default, cacheable: true }),
             priority:         kwargs.fetch(:priority, :normal),
             ttl:              kwargs[:ttl],
-            extra:            kwargs.fetch(:extra, {}),
+            extra:            extra,
             metadata:         kwargs.fetch(:metadata, {}),
             enrichments:      kwargs.fetch(:enrichments, {}),
             predictions:      kwargs.fetch(:predictions, {}),
@@ -110,6 +118,41 @@ module Legion
           build_args[:id] = request_id if request_id
           build(**build_args)
         end
+        def self.auto_routing_model?(model)
+          Legion::LLM::Inference::AUTO_ROUTING_MODEL_ALIASES.include?(model.to_s.strip.downcase)
+        end
+        def self.default_auto_routing_intent
+          routing = Legion::LLM::Settings.value(:routing, default: {})
+          intent = Legion::LLM::Settings.config_value(routing, :default_intent, {})
+          intent = intent.is_a?(Hash) ? normalize_hash(intent) : {}
+          intent.merge(capability: :chat)
+        rescue StandardError
+          { capability: :chat }
+        end
+        def self.normalize_auto_routing(routing, extra)
+          normalized_routing = normalize_hash(routing)
+          normalized_extra = normalize_hash(extra)
+          return [normalized_routing, normalized_extra] unless auto_routing_model?(normalized_routing[:model])
+          normalized_routing = { provider: nil, model: nil }
+          normalized_extra = normalized_extra.dup
+          normalized_extra.delete(:tier)
+          normalized_extra[:intent] ||= default_auto_routing_intent
+          normalized_extra[:auto_route] = true
+          normalized_extra[:requested_model_alias] = Legion::LLM::Inference::AUTO_ROUTING_MODEL_KEY
+          [normalized_routing, normalized_extra]
+        end
+        def self.normalize_hash(value)
+          return {} unless value.is_a?(Hash)
+          value.each_with_object({}) do |(key, hash_value), normalized|
+            normalized[key.respond_to?(:to_sym) ? key.to_sym : key] = hash_value
+          end
+        end
       end
     end
   end
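
Note on the auto-routing alias: requesting the model `legionio` clears the provider, model, and tier, and marks the request for auto routing. The sketch below mirrors `auto_routing_model?` and `normalize_auto_routing` as free functions. The `default_intent:` keyword is a hypothetical substitute for the settings lookup in `default_auto_routing_intent`, and `transform_keys(&:to_sym)` assumes string or symbol keys, whereas the gem's `normalize_hash` also tolerates other key types.

```ruby
# Standalone sketch of the alias handling; not the gem's implementation.
AUTO_ROUTING_MODEL_KEY = 'legionio'
AUTO_ROUTING_MODEL_ALIASES = [AUTO_ROUTING_MODEL_KEY].freeze

def auto_routing_model?(model)
  # Case-insensitive and whitespace-tolerant; nil becomes "" and does not match.
  AUTO_ROUTING_MODEL_ALIASES.include?(model.to_s.strip.downcase)
end

# default_intent: is a hypothetical parameter replacing the settings lookup.
def normalize_auto_routing(routing, extra, default_intent: { capability: :chat })
  routing = routing.transform_keys(&:to_sym)
  extra = extra.transform_keys(&:to_sym)
  return [routing, extra] unless auto_routing_model?(routing[:model])

  extra = extra.reject { |key, _| key == :tier } # a caller-supplied tier is dropped
  extra[:intent] ||= default_intent
  extra[:auto_route] = true
  extra[:requested_model_alias] = AUTO_ROUTING_MODEL_KEY
  [{ provider: nil, model: nil }, extra]
end

routing, extra = normalize_auto_routing({ 'model' => ' LegionIO ', 'provider' => 'openai' }, { tier: :cloud })
p routing # prints {provider: nil, model: nil} on Ruby 3.4+, {:provider=>nil, :model=>nil} on older versions
p extra[:auto_route] # prints true
```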

data/lib/legion/llm/inference/steps/rag_context.rb CHANGED

@@ -319,6 +319,7 @@ module Legion
           def positive_integer(value)
             return nil if value.nil?
+            return nil if value.respond_to?(:empty?) && value.empty?
             integer = Integer(value)
             integer.positive? ? integer : nil
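
Note on the empty-value guard: `Integer('')` raises `ArgumentError`, so the added line lets blank settings return `nil` before the conversion is attempted. The hunk does not show the rest of the method, so the `rescue` clause in the sketch below is an assumption added only to make the example self-contained.

```ruby
# Sketch of positive_integer with the new guard. The rescue clause is an
# assumption; the diff does not show how the gem handles unparsable input.
def positive_integer(value)
  return nil if value.nil?
  return nil if value.respond_to?(:empty?) && value.empty? # new in this release

  integer = Integer(value)
  integer.positive? ? integer : nil
rescue ArgumentError, TypeError
  nil
end

p positive_integer('')  # prints nil
p positive_integer('5') # prints 5
p positive_integer(0)   # prints nil
```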

data/lib/legion/llm/router/escalation/chain.rb CHANGED

@@ -39,10 +39,8 @@ module Legion
         def padded_resolutions
           return [] if @resolutions.empty?
-          return @resolutions.first(@max_attempts) if @resolutions.size >= @max_attempts
-          last = @resolutions.last
-          (@resolutions + Array.new(@max_attempts - @resolutions.size) { last }).first(@max_attempts)
+          @resolutions.first(@max_attempts)
         end
       end
     end
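
Note on the `padded_resolutions` change: 0.9.23 repeated the last resolution until the chain reached `max_attempts`, while 0.9.28 only truncates. The sketch below contrasts the two behaviours on plain arrays. `padded_before` and `padded_after` are hypothetical names for the old and new method bodies.

```ruby
# Old behaviour (0.9.23): pad a short chain by repeating its last entry.
def padded_before(resolutions, max_attempts)
  return [] if resolutions.empty?
  return resolutions.first(max_attempts) if resolutions.size >= max_attempts

  resolutions + Array.new(max_attempts - resolutions.size) { resolutions.last }
end

# New behaviour (0.9.28): cap the chain at max_attempts, never pad.
def padded_after(resolutions, max_attempts)
  resolutions.first(max_attempts)
end

p padded_before(%i[primary fallback], 4) # prints [:primary, :fallback, :fallback, :fallback]
p padded_after(%i[primary fallback], 4)  # prints [:primary, :fallback]
```

This fits the executor change in the same release, which skips any provider/instance/model combination already recorded in `tried`, so a padded duplicate would no longer be attempted anyway.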

data/lib/legion/llm/router.rb CHANGED

@@ -18,6 +18,7 @@ module Legion
                         gemini: :cloud, azure: :cloud, ollama: :local, vllm: :fleet }.freeze
       PROVIDER_ORDER = %i[ollama vllm bedrock azure gemini anthropic openai].freeze
       TIER_EXTERNAL = Set[:cloud, :frontier, :openai_compat].freeze
+      TIER_RANK = { local: 0, direct: 1, fleet: 2, openai_compat: 3, cloud: 4, frontier: 5 }.freeze
       OLLAMA_MODEL_PATTERN = %r{[:/]}
@@ -60,9 +61,9 @@ module Legion
         # @param model    [String, nil] explicit model override
         # @param provider [Symbol, nil] explicit provider override
         # @return [Resolution, nil]
-        def resolve(intent: nil, tier: nil, model: nil, provider: nil, exclude: {})
-          log.debug "[llm][router] action=resolve.enter intent=#{intent} tier=#{tier} model=#{model} provider=#{provider}"
-          return explicit_resolution(tier, provider, model) if tier
+        def resolve(intent: nil, tier: nil, model: nil, provider: nil, instance: nil, exclude: {})
+          log.debug "[llm][router] action=resolve.enter intent=#{intent} tier=#{tier} model=#{model} provider=#{provider} instance=#{instance}"
+          return explicit_resolution(tier, provider, model, instance) if tier || provider || instance
           return nil unless routing_enabled? && intent
@@ -81,13 +82,14 @@ module Legion
           resolution || arbitrage_fallback(intent)
         end
-        def resolve_chain(intent: nil, tier: nil, model: nil, provider: nil, max_escalations: nil, exclude: {})
+        def resolve_chain(intent: nil, tier: nil, model: nil, provider: nil, instance: nil, max_escalations: nil,
+                          exclude: {}, allow_default_fallback: true)
           log.debug "[llm][router] action=resolve_chain.enter intent=#{intent} tier=#{tier} max_escalations=#{max_escalations}"
           max = max_escalations || escalation_max_attempts
-          return EscalationChain.new(resolutions: [explicit_resolution(tier, provider, model)], max_attempts: max) if tier
-          return chain_from_defaults(model, provider, max) unless routing_enabled? && intent
+          return EscalationChain.new(resolutions: [explicit_resolution(tier, provider, model, instance)], max_attempts: max) if tier || provider || instance
+          return chain_from_defaults(model, provider, max, allow_default_fallback: allow_default_fallback) unless routing_enabled? && intent
-          chain_from_intent(intent, max, exclude: exclude)
+          chain_from_intent(intent, max, exclude: exclude, allow_default_fallback: allow_default_fallback)
         end
         def health_tracker
@@ -145,6 +147,34 @@ module Legion
           true
         end
+        def explicit_resolution(tier, provider, model, instance = nil)
+          registry_entry = if provider
+                             registry_entry_for_provider(provider.to_sym, instance: instance&.to_sym)
+                           elsif tier
+                             registry_entry_for_tier(tier)
+                           end
+          resolved_provider = if provider
+                                provider.to_sym
+                              else
+                                registry_entry&.[](:provider) ||
+                                  (tier && default_provider_for_tier(tier)) ||
+                                  default_settings_provider&.to_sym ||
+                                  :anthropic
+                              end
+          resolved_model    = model || registry_default_model(registry_entry) || (tier && default_model_for_tier(tier))
+          resolved_instance = registry_entry&.[](:instance) || instance
+          resolved_tier     = tier || PROVIDER_TIER.fetch(resolved_provider, :frontier)
+          Resolution.new(
+            tier:     resolved_tier,
+            provider: resolved_provider,
+            model:    resolved_model,
+            instance: resolved_instance,
+            rule:     'explicit',
+            metadata: registry_resolution_metadata(registry_entry)
+          )
+        end
         private
         def arbitrage_fallback(intent)
@@ -162,25 +192,6 @@ module Legion
           Resolution.new(tier: tier, provider: provider, model: model, rule: 'arbitrage_fallback')
         end
-        def explicit_resolution(tier, provider, model)
-          registry_entry = if provider
-                             registry_entry_for_provider(provider.to_sym)
-                           else
-                             registry_entry_for_tier(tier)
-                           end
-          resolved_provider = provider ? provider.to_sym : (registry_entry&.[](:provider) || default_provider_for_tier(tier))
-          resolved_model = model || registry_default_model(registry_entry) || default_model_for_tier(tier)
-          Resolution.new(
-            tier:     tier,
-            provider: resolved_provider,
-            model:    resolved_model,
-            instance: registry_entry&.[](:instance),
-            rule:     'explicit',
-            metadata: registry_resolution_metadata(registry_entry)
-          )
-        end
         def merge_defaults(intent)
           defaults = (routing_settings[:default_intent] || {})
                      .transform_keys(&:to_sym)
@@ -423,19 +434,22 @@ module Legion
           end
         end
-        def chain_from_defaults(model, provider, max)
-          if provider || model || default_settings_provider || default_settings_model
+        def chain_from_defaults(model, provider, max, allow_default_fallback: true)
+          if provider || model || (allow_default_fallback && (default_settings_provider || default_settings_model))
             p = (provider || default_settings_provider)&.to_sym
             resolved_model = model || registry_default_model(registry_entry_for_provider(p)) ||
                              default_settings_model || 'claude-sonnet-4-6'
-            res = Resolution.new(tier:     PROVIDER_TIER.fetch(p, :frontier),
-                                 provider: p || :anthropic,
-                                 model:    resolved_model)
-            return EscalationChain.new(resolutions: [res], max_attempts: max)
+            primary = Resolution.new(tier:     PROVIDER_TIER.fetch(p || :anthropic, :frontier),
+                                     provider: p || :anthropic,
+                                     model:    resolved_model)
+            # Append remaining registered providers as fallbacks (sorted by tier rank)
+            fallbacks = enabled_provider_chain.reject { |r| r.provider == primary.provider }
+            resolutions = ([primary] + fallbacks).uniq { |r| [r.provider, r.instance, r.model] }
+            return EscalationChain.new(resolutions: resolutions, max_attempts: max)
           end
           resolutions = enabled_provider_chain
-          if resolutions.empty?
+          if resolutions.empty? && allow_default_fallback
             p = default_settings_provider&.to_sym || :anthropic
             resolutions = [Resolution.new(tier:     PROVIDER_TIER.fetch(p, :frontier),
                                           provider: p,
@@ -475,7 +489,7 @@ module Legion
           end
         end
-        def chain_from_intent(intent, max, exclude: {})
+        def chain_from_intent(intent, max, exclude: {}, allow_default_fallback: true)
           merged     = intent ? merge_defaults(intent) : {}
           rules      = load_rules
           candidates = select_candidates(rules, merged, exclude: exclude)
@@ -484,7 +498,7 @@ module Legion
           resolutions = build_fallback_chain(sorted.first, sorted, resolutions) if sorted.first&.fallback
           resolutions = resolutions.uniq { |r| [r.provider, r.model] }
           resolutions = enabled_provider_chain if resolutions.empty?
-          if resolutions.empty?
+          if resolutions.empty? && allow_default_fallback
             p = default_settings_provider&.to_sym || :anthropic
             resolutions = [Resolution.new(tier:     PROVIDER_TIER.fetch(p, :frontier),
                                           provider: p,
@@ -573,14 +587,23 @@ module Legion
         end
         # Find the first registered instance for a specific provider.
-        def registry_entry_for_provider(provider)
+        # When +instance+ is given, prefers the entry whose :instance matches;
+        # falls back to the first provider entry if no exact match is found.
+        def registry_entry_for_provider(provider, instance: nil)
           instances = begin
             Call::Registry.all_instances
           rescue StandardError => e
             handle_exception(e, level: :warn, handled: true, operation: 'router.registry_entry_for_provider')
             []
           end
-          instances.find { |entry| entry[:provider] == provider }
+          provider_entries = instances.select { |entry| entry[:provider] == provider }
+          return nil if provider_entries.empty?
+          if instance
+            provider_entries.find { |entry| entry[:instance] == instance } || provider_entries.first
+          else
+            provider_entries.first
+          end
         end
         # Find a default model from registry for a given tier.
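
Note on instance-aware lookup: `registry_entry_for_provider` now prefers the entry whose `:instance` matches and otherwise falls back to the provider's first entry. The sketch below applies the same selection to a plain array of hashes. `entry_for_provider` and the sample registry contents are hypothetical; the real method reads `Call::Registry.all_instances`.

```ruby
# Standalone sketch of the selection rule; instances is passed in explicitly
# instead of being read from the registry.
def entry_for_provider(instances, provider, instance: nil)
  provider_entries = instances.select { |entry| entry[:provider] == provider }
  return nil if provider_entries.empty?

  # Prefer an exact instance match, else the first entry for the provider.
  exact = instance && provider_entries.find { |entry| entry[:instance] == instance }
  exact || provider_entries.first
end

# Hypothetical registry contents.
registry = [
  { provider: :ollama, instance: :default },
  { provider: :ollama, instance: :gpu_box },
  { provider: :openai, instance: :default }
]
p entry_for_provider(registry, :ollama, instance: :gpu_box)[:instance] # prints :gpu_box
p entry_for_provider(registry, :ollama, instance: :missing)[:instance] # prints :default
```

Because of the fallback, a request naming an unknown instance can resolve to a different instance of the same provider rather than failing.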

data/lib/legion/llm/settings.rb CHANGED

@@ -305,7 +305,7 @@ module Legion
       def self.routing_defaults
         {
           enabled:        true,
-          tier_priority:  %w[local fleet openai_compat cloud frontier],
+          tier_priority:  %w[local direct fleet openai_compat cloud frontier],
           default_intent: { privacy: 'normal', capability: 'moderate', cost: 'normal' },
           tiers:          {
             local:         { provider: 'ollama' },

data/lib/legion/llm/version.rb CHANGED

@@ -2,6 +2,6 @@
 module Legion
   module LLM
-    VERSION = '0.9.23'
+    VERSION = '0.9.28'
   end
 end

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: legion-llm
 version: !ruby/object:Gem::Version
-  version: 0.9.23
+  version: 0.9.28
 platform: ruby
 authors:
 - Esity