legion-llm 0.9.23 → 0.9.28

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 8c8c98a439d2e96bba437e5e8b4bf8c47c01277a4079bd459c7257e2990278c6
4
- data.tar.gz: f9344c761ebf18b4c5ab271ac8cb5858ce46f791588ace438726002d2907c70e
3
+ metadata.gz: 178958a3403cbac0fad20d83f2726914d420137db2a1c340c33c4c7305457fcd
4
+ data.tar.gz: df951b9e05e0a0bfaff3701b6a3c5bd8452edea2298fe91e6a98165ce96961d1
5
5
  SHA512:
6
- metadata.gz: 9574535d0eeca84d522858dd323e8d028994b46b3d3f78a37a8094a4f1a692fbdd68bf24e8a061160b5238a2c3e4f73141e29bf70c6423f18e4b4441937f5417
7
- data.tar.gz: 69aa8eccf10beb687b637b7442d9eb8a7bae0d42405fc9cfd47f8c8d5c036b7724df6cb55c10d8264f0701f46f8282f0a44d8622d607a6129a09ab4c39ad2e99
6
+ metadata.gz: 7ff1622a50cdafc4d09577e8dc5f9e90632f7f93e528f41c1b630ce5daeeeceab6c91cbf5d8ec92183e0b6f27c17a62d09934b6bb0a413da9577cea5a47c942f
7
+ data.tar.gz: 49651eb56bbe046223674a626ce4339363ba10550793c1d95ca46d4bf2c4b7f2eb430397b7f0d91368791cecb683ae2919f8627edb45d88d5c847f4d0cb4ee12
data/CHANGELOG.md CHANGED
@@ -1,5 +1,60 @@
1
1
  # Legion LLM Changelog
2
2
 
3
+ ## [0.9.28] - 2026-05-15
4
+
5
+ ### Added
6
+ - API: `/api/llm/models` now surfaces a static `LegionIO` model (`id: legionio`) as the default auto-routing placeholder.
7
+
8
+ ### Changed
9
+ - Routing: `model: "legionio"` clears explicit provider/model/instance/tier routing and sends the request through the router chain using the configured default intent.
10
+ - Routing: default tier priority now includes `direct` between `local` and `fleet`, and discovery-generated rule scores honor `routing.tier_priority`.
11
+
12
+ ### Fixed
13
+ - Prompt dispatch: provider-inferable model-only calls such as `gpt-5.4` infer the provider instead of pairing the model with `llm.default_provider`.
14
+ - Executor: provider-tier lookup failures are logged and return nil instead of silently defaulting to `:cloud`.
15
+ - LexLLMAdapter: optional content-block accessor fallbacks now capture and debug-log probe errors instead of bare-rescuing them.
16
+ - Auto routing: unresolved `legionio` requests now raise a clear provider error instead of falling back to configured defaults.
17
+ - Routing: model-only requests stay on provider inference while explicit provider/instance/tier requests still get registry defaults without requiring rule routing.
18
+
19
+ ## [0.9.27] - 2026-05-15
20
+
21
+ ### Fixed
22
+ - Router/Executor: provider-scoped instance resolution no longer applies a global `llm.default_instance` to models inferred for another provider; invalid explicit instances now fall back to that provider's registered default instance instead of dispatching to an unregistered `provider/instance` pair.
23
+
24
+ ## [0.9.26] - 2026-05-15
25
+
26
+ ### Fixed
27
+ - Discovery: `detect_embedding_from_registry` no longer sets `@can_embed = true` when no model is resolvable — adds `first_embedding_model_for(provider, instance)` as a third fallback scanning the discovered model catalog; returns `false` (allowing legacy probe to run) when all three sources yield nothing (#121)
28
+ - RagContext: `positive_integer` no longer raises `TypeError` when `value` is nil or an empty string — adds empty-string guard before `Kernel#Integer()` call so GAIA advisory `context_window: nil` does not abort the inference pipeline (#122)
29
+ - LexLLMAdapter: `text_part_content` now handles Anthropic-style `[{type:"text", text:"…"}]` content block arrays — flattens them to plain text instead of calling `.to_s` on the array, preventing Ruby array literals from leaking into provider prompts (#123)
30
+ - Embeddings/Discovery: `embedding_config_value` and `embedding_settings` now accept the deprecated plural `"embeddings"` key alongside the canonical singular `"embedding"` key, emitting a deprecation warning; fixes silent misconfiguration when users follow doc examples that used the plural spelling (#124)
31
+
32
+ ## [0.9.25] - 2026-05-14
33
+
34
+ ### Added
35
+ - Router: `TIER_RANK` constant — ordered quality ranking of tiers (local → direct → fleet → openai_compat → cloud → frontier)
36
+ - Router: `explicit_resolution` promoted to public — callable directly from executor without `send`
37
+ - Router: `chain_from_defaults` appends all registered fallback providers after the primary so the chain has real alternatives to escalate to (previously single-entry when a default provider was configured)
38
+ - Executor: `run_escalation_resolution` extracted from escalation loop — encapsulates per-attempt dispatch, error rescue, and `tried[]` tracking
39
+ - Executor: `skip_same_tier!` — on `ContextOverflow`, immediately skips all remaining same-tier candidates and routes to a higher-tier provider with a larger context window
40
+ - Executor: lateral vs. escalation move classification in per-attempt log line (`move=lateral` for same-tier, `move=escalation` for higher-tier)
41
+
42
+ ### Fixed
43
+ - Router: `explicit_resolution` handles nil `provider` and nil `tier` without raising `NoMethodError`
44
+ - Executor: `build_fallback_resolutions` sorts lateral alternatives (same-tier) before escalation candidates (higher-tier) — tries other instances at the same tier before promoting to a more expensive one
45
+ - Executor: deduplication in escalation loop is fully safe — `tried` entry is recorded on all rescue paths and on quality failure
46
+ - EscalationChain: `padded_resolutions` no longer pads the list by repeating the last resolution — only real distinct options are tried
47
+
48
+ ## [0.9.24] - 2026-05-14
49
+
50
+ ### Fixed
51
+ - API: `instance` from POST body was silently dropped — never forwarded into routing hash
52
+ - Executor: Gaia advisory tier assignment no longer overrides explicit `provider`+`instance` from caller
53
+ - Executor: `instance` now passed through `routing_resolution_for` to `Router.resolve`/`resolve_chain`
54
+ - Executor: `build_default_escalation_chain` now passes resolved provider/instance/model — previously ignored them and built a full auto chain, routing to vllm/fleet instead of the requested provider
55
+ - Router: `resolve`/`resolve_chain` accept `instance:` param; short-circuit to `explicit_resolution` when `provider` or `instance` is set (not just `tier`)
56
+ - Router: `explicit_resolution` honors caller-supplied instance instead of always pulling from registry; infers tier from `PROVIDER_TIER` when not explicitly given
57
+
3
58
  ## [0.9.23] - 2026-05-13
4
59
 
5
60
  ### Added
data/CLAUDE.md CHANGED
@@ -745,6 +745,26 @@ These rules are enforced across all legion-llm code. Violations will be caught i
745
745
  - **Advanced signals**: Budget tracking, GPU utilization monitoring, per-tenant spend limits
746
746
  - **Fleet auto-scaling**: Dynamic worker pool sizing based on queue depth and latency
747
747
 
748
+ ## Provider Registration & Model Resolution
749
+
750
+ - `discover_instances` in each lex-llm-* must include `default_model` in returned config — it flows to registry metadata via `instance_metadata` in `call/providers.rb`
751
+ - Router resolves models via: `registry_entry_for_provider(provider)` → `registry_default_model(entry)` → `metadata[:default_model]`
752
+ - `enabled: false` on an instance config prevents registration — checked in `register_provider_instance`
753
+ - `PROVIDER_DEFAULT_MODEL` does NOT belong in legion-llm — each provider owns its default in its own extension
754
+ - Inventory calls `native_provider_offerings` (full metadata) and excludes `discovery_offerings` for providers with native adapters
755
+
756
+ ## Metering Spool
757
+
758
+ - Events spool to `~/.legionio/data/spool/metering/events.jsonl` when AMQP transport is unavailable
759
+ - Thread-safe (SPOOL_MUTEX), capped at `settings[:metering][:spool][:max_events]` (default 10K)
760
+ - `flush_spool` publishes spooled events when transport reconnects; `lex-llm-ledger` actor triggers it
761
+
762
+ ## Health Tracker
763
+
764
+ - `deny_model(provider:, model:, instance:)` — permanently excludes a model from routing (in-memory, until restart)
765
+ - Config errors (ValidationException, AccessDenied, marketplace) trigger deny instead of circuit breaker
766
+ - Discovery connection failures report `:error` to health tracker — circuit opens after threshold
767
+
748
768
  ---
749
769
 
750
770
  **Maintained By**: Matthew Iverson (@Esity)
@@ -108,7 +108,7 @@ module Legion
108
108
  id: request_id,
109
109
  messages: messages,
110
110
  system: body[:system],
111
- routing: { provider: provider, model: model },
111
+ routing: { provider: provider, model: model, instance: body[:instance] }.compact,
112
112
  tools: tool_declarations,
113
113
  caller: effective_caller,
114
114
  conversation_id: conversation_id,
@@ -9,6 +9,11 @@ module Legion
9
9
  module Models
10
10
  extend Legion::Logging::Helper
11
11
 
12
+ AUTO_ROUTING_MODEL_ID = 'legionio'
13
+ AUTO_ROUTING_MODEL_DISPLAY = 'LegionIO'
14
+ AUTO_ROUTING_OFFERING_ID = 'legionio:auto:inference:legionio'
15
+ AUTO_ROUTING_CAPABILITIES = %w[auto_routing chat completion json_schema tools].freeze
16
+
12
17
  def self.registered(app)
13
18
  log.debug('[llm][api][models] registering model inventory routes')
14
19
 
@@ -18,6 +23,7 @@ module Legion
18
23
 
19
24
  filters = Legion::LLM::API::Native::Models.request_filters(params)
20
25
  offerings = Legion::LLM::Inventory.offerings(filters)
26
+ offerings = Legion::LLM::API::Native::Models.with_auto_routing_offering(offerings, filters)
21
27
 
22
28
  json_response({
23
29
  models: Legion::LLM::API::Native::Models.model_summaries(offerings),
@@ -34,7 +40,9 @@ module Legion
34
40
  log.debug("[llm][api][models] action=get_model id=#{model_id}")
35
41
  require_llm!
36
42
 
37
- offerings = Legion::LLM::Inventory.offerings(model: model_id)
43
+ filters = { model: model_id }
44
+ offerings = Legion::LLM::Inventory.offerings(filters)
45
+ offerings = Legion::LLM::API::Native::Models.with_auto_routing_offering(offerings, filters)
38
46
  halt json_error('model_not_found', "Model '#{model_id}' not found", status_code: 404) unless offerings.any?
39
47
 
40
48
  json_response({
@@ -84,11 +92,11 @@ module Legion
84
92
  summaries = offerings.group_by { |offering| offering[:model] }.map do |model, rows|
85
93
  summarize_model(model, rows)
86
94
  end
87
- summaries.sort_by { |model| model[:id] }
95
+ summaries.sort_by { |model| [auto_routing_model?(model[:id]) ? 0 : 1, model[:id]] }
88
96
  end
89
97
 
90
98
  def self.summarize_model(model, offerings)
91
- {
99
+ summary = {
92
100
  id: model.to_s,
93
101
  types: offerings.map { |offering| offering[:type].to_s }.uniq.sort,
94
102
  providers: offerings.map { |offering| offering[:provider_family] }.uniq.sort,
@@ -99,6 +107,12 @@ module Legion
99
107
  max_context: offerings.filter_map { |offering| offering.dig(:limits, :context_window) }.max,
100
108
  enabled: offerings.any? { |offering| offering[:enabled] != false }
101
109
  }
110
+ if auto_routing_model?(model)
111
+ summary[:display_name] = AUTO_ROUTING_MODEL_DISPLAY
112
+ summary[:auto_route] = true
113
+ summary[:default] = true
114
+ end
115
+ summary
102
116
  end
103
117
 
104
118
  def self.summary(offerings)
@@ -110,6 +124,64 @@ module Legion
110
124
  .transform_values(&:size)
111
125
  }
112
126
  end
127
+
128
+ def self.with_auto_routing_offering(offerings, filters = {})
129
+ return offerings unless auto_routing_offering_matches?(filters)
130
+ return offerings if offerings.any? { |offering| auto_routing_model?(offering[:model]) }
131
+
132
+ [auto_routing_offering, *offerings]
133
+ end
134
+
135
+ def self.auto_routing_offering
136
+ {
137
+ id: AUTO_ROUTING_OFFERING_ID,
138
+ offering_id: AUTO_ROUTING_OFFERING_ID,
139
+ model: AUTO_ROUTING_MODEL_ID,
140
+ display_name: AUTO_ROUTING_MODEL_DISPLAY,
141
+ model_family: 'legionio',
142
+ canonical_model_alias: AUTO_ROUTING_MODEL_ID,
143
+ type: :inference,
144
+ provider_family: 'legionio',
145
+ provider_instance: 'auto',
146
+ instance_id: 'auto',
147
+ tier: :auto,
148
+ transport: :internal,
149
+ enabled: true,
150
+ capabilities: AUTO_ROUTING_CAPABILITIES,
151
+ limits: {},
152
+ health: { circuit_state: 'available' },
153
+ metadata: { auto_route: true, placeholder: true, display_name: AUTO_ROUTING_MODEL_DISPLAY },
154
+ routing_metadata: { strategy: 'auto' },
155
+ source: 'static'
156
+ }
157
+ end
158
+
159
+ def self.auto_routing_offering_matches?(filters)
160
+ normalized = request_filters(filters)
161
+ type = normalized[:type]
162
+ return false if type && !type.to_s.empty? && type.to_s != 'inference' && type.to_s != 'chat'
163
+
164
+ provider = normalized[:provider]
165
+ return false if provider && !provider.to_s.empty? && !%w[legionio auto].include?(provider.to_s.downcase)
166
+
167
+ instance = normalized[:instance_id]
168
+ return false if instance && !instance.to_s.empty? && !%w[auto legionio].include?(instance.to_s.downcase)
169
+
170
+ model = normalized[:model] || normalized[:offering_id]
171
+ return false if model && !model.to_s.empty? && !auto_routing_model?(model) && model.to_s != AUTO_ROUTING_OFFERING_ID
172
+
173
+ family = normalized[:model_family]
174
+ return false if family && !family.to_s.empty? && family.to_s.downcase != 'legionio'
175
+
176
+ capability = normalized[:capability]
177
+ return false if capability && !AUTO_ROUTING_CAPABILITIES.include?(capability.to_s)
178
+
179
+ true
180
+ end
181
+
182
+ def self.auto_routing_model?(model)
183
+ model.to_s.strip.downcase == AUTO_ROUTING_MODEL_ID
184
+ end
113
185
  end
114
186
  end
115
187
  end
@@ -232,7 +232,8 @@ module Legion
232
232
  return 'unknown' unless tracker
233
233
 
234
234
  tracker.circuit_state(provider_name.to_sym, instance: instance_name.to_sym).to_s
235
- rescue StandardError
235
+ rescue StandardError => e
236
+ log.debug "[llm][tiers] action=offering_instance_health provider=#{provider_name} instance=#{instance_name} error=#{e.class} — #{e.message}"
236
237
  'unknown'
237
238
  end
238
239
  end
@@ -122,12 +122,21 @@ module Legion
122
122
 
123
123
  def resolve_provider
124
124
  LLM.embedding_provider ||
125
- Legion::LLM::Settings.value(:embedding, :provider)&.to_sym
125
+ embedding_config_value(:provider)&.to_sym
126
126
  end
127
127
 
128
128
  def resolve_model
129
129
  LLM.embedding_model ||
130
- Legion::LLM::Settings.value(:embedding, :default_model)
130
+ embedding_config_value(:default_model)
131
+ end
132
+
133
+ def embedding_config_value(key)
134
+ v = Legion::LLM::Settings.value(:embedding, key)
135
+ return v unless v.nil?
136
+
137
+ plural = Legion::LLM::Settings.value(:embeddings, key)
138
+ log.warn "[llm][embeddings] settings key \"embeddings.#{key}\" (plural) is deprecated — rename to \"embedding.#{key}\"" unless plural.nil?
139
+ plural
131
140
  end
132
141
 
133
142
  def coerce_text(value)
@@ -239,12 +239,49 @@ module Legion
239
239
  end
240
240
 
241
241
  def text_part_content(part)
242
- return unless part.respond_to?(:transform_keys)
242
+ return part if part.is_a?(String)
243
243
 
244
- normalized = part.transform_keys { |key| key.respond_to?(:to_sym) ? key.to_sym : key }
245
- return unless normalized[:type].to_s == 'text'
244
+ if part.respond_to?(:transform_keys)
245
+ normalized = part.transform_keys { |key| key.respond_to?(:to_sym) ? key.to_sym : key }
246
+ return unless normalized[:type].to_s == 'text'
246
247
 
247
- normalized[:text].to_s
248
+ return normalized[:text].to_s
249
+ end
250
+
251
+ # Data structs expose named readers (type/text) without necessarily implementing [].
252
+ # Try named accessor path first; fall through to [] / fetch for plain hashes/structs.
253
+ if part.respond_to?(:type) || part.respond_to?(:text)
254
+ type = (part.respond_to?(:type) ? part.type.to_s : '')
255
+ text = part.respond_to?(:text) ? part.text : nil
256
+ return text.to_s if type == 'text' || (type.empty? && !text.nil?)
257
+
258
+ return nil
259
+ end
260
+
261
+ return unless part.respond_to?(:[]) || part.respond_to?(:fetch)
262
+
263
+ type = (defined_method_access(part, :type) || '').to_s
264
+ text = defined_method_access(part, :text)
265
+ text.to_s if type == 'text' || (type.empty? && !text.nil?)
266
+ end
267
+
268
+ def defined_method_access(obj, key)
269
+ # Prefer named accessor (covers Data structs like Types::ContentBlock).
270
+ key_sym = key.respond_to?(:to_sym) ? key.to_sym : key
271
+ return obj.public_send(key_sym) if obj.respond_to?(key_sym)
272
+
273
+ str_key = key.to_s
274
+ obj[key]
275
+ rescue TypeError, NoMethodError, KeyError => e
276
+ log.debug "[llm][adapter] action=defined_method_access key=#{key} class=#{obj.class} " \
277
+ "fallback=string_key error=#{e.class}: #{e.message}"
278
+ begin
279
+ obj[str_key]
280
+ rescue TypeError, NoMethodError, KeyError => fallback_error
281
+ log.debug "[llm][adapter] action=defined_method_access key=#{key} class=#{obj.class} " \
282
+ "fallback=none error=#{fallback_error.class}: #{fallback_error.message}"
283
+ nil
284
+ end
248
285
  end
249
286
 
250
287
  def normalize_message_tool_calls(tool_calls)
@@ -52,7 +52,8 @@ module Legion
52
52
  Legion::Extensions::Llm.constants(false).filter_map do |const_name|
53
53
  mod = Legion::Extensions::Llm.const_get(const_name, false)
54
54
  provider_module?(mod) ? mod : nil
55
- rescue NameError
55
+ rescue NameError => e
56
+ log.debug "[llm][providers] action=discover_provider_modules const=#{const_name} error=#{e.class} — #{e.message}"
56
57
  nil
57
58
  end
58
59
  end
@@ -120,7 +121,8 @@ module Legion
120
121
  return nil unless provider_module&.const_defined?(:PROVIDER_FAMILY, false)
121
122
 
122
123
  provider_module::PROVIDER_FAMILY
123
- rescue StandardError
124
+ rescue StandardError => e
125
+ log.debug "[llm][providers] action=safe_provider_family error=#{e.class} — #{e.message}"
124
126
  nil
125
127
  end
126
128
 
@@ -9,9 +9,15 @@ module Legion
9
9
  class Curator
10
10
  include Legion::Logging::Helper
11
11
 
12
- CURATED_KEY = :__curated__
13
- THINKING_OPEN = '<thinking>'
14
- THINKING_CLOSE = '</thinking>'
12
+ CURATED_KEY = :__curated__
13
+
14
+ # All known provider thinking tag variants.
15
+ # Anthropic: <thinking>…</thinking>
16
+ # DeepSeek / Qwen / Ollama / vLLM inline: <think>…</think>
17
+ THINKING_TAG_PAIRS = [
18
+ ['<thinking>', '</thinking>'],
19
+ ['<think>', '</think>']
20
+ ].freeze
15
21
 
16
22
  def initialize(conversation_id:)
17
23
  @conversation_id = conversation_id
@@ -76,6 +82,8 @@ module Legion
76
82
  return msg if content.length <= max_chars
77
83
 
78
84
  summary = heuristic_tool_summary(content, tool_name_from(msg))
85
+ log.debug "[llm][curator] action=distill_tool_result conversation_id=#{@conversation_id} " \
86
+ "original_chars=#{content.length} summary_chars=#{summary.length}"
79
87
  msg.merge(content: summary, curated: true, original_content: content)
80
88
  end
81
89
 
@@ -89,6 +97,8 @@ module Legion
89
97
 
90
98
  return msg if stripped == content || stripped.empty?
91
99
 
100
+ log.debug "[llm][curator] action=strip_thinking conversation_id=#{@conversation_id} " \
101
+ "original_chars=#{content.length} stripped_chars=#{stripped.length}"
92
102
  msg.merge(content: stripped, curated: true, original_content: content)
93
103
  end
94
104
 
@@ -192,18 +202,27 @@ module Legion
192
202
  end
193
203
 
194
204
  def strip_thinking_tags(text)
195
- result = +''
205
+ result = text
206
+ THINKING_TAG_PAIRS.each do |open_tag, close_tag|
207
+ result = strip_tag_pair(result, open_tag, close_tag)
208
+ end
209
+ result
210
+ end
211
+
212
+ def strip_tag_pair(text, open_tag, close_tag)
213
+ out = +''
196
214
  pos = 0
197
215
  while pos < text.length
198
- open_idx = text.index(THINKING_OPEN, pos)
216
+ open_idx = text.index(open_tag, pos)
199
217
  break unless open_idx
200
218
 
201
- result << text[pos...open_idx]
202
- close_idx = text.index(THINKING_CLOSE, open_idx + THINKING_OPEN.length)
203
- pos = close_idx ? close_idx + THINKING_CLOSE.length : text.length
219
+ out << text[pos...open_idx]
220
+ close_idx = text.index(close_tag, open_idx + open_tag.length)
221
+ pos = close_idx ? close_idx + close_tag.length : text.length
204
222
  end
205
- result << text[pos..] if pos < text.length
206
- result
223
+ out << text[pos..] if pos < text.length
224
+ # Strip any unclosed open tag left at the end (provider died mid-stream).
225
+ out.sub(/#{Regexp.escape(open_tag)}.*\z/m, '').strip
207
226
  end
208
227
 
209
228
  def curate_message(msg, assistant_response)
@@ -427,7 +446,8 @@ module Legion
427
446
  def curated_payload(entry)
428
447
  parsed = Legion::JSON.parse(entry[:content].to_s)
429
448
  parsed.is_a?(Hash) ? parsed : {}
430
- rescue Legion::JSON::ParseError
449
+ rescue Legion::JSON::ParseError => e
450
+ log.debug "[llm][curator] action=curated_payload conversation_id=#{@conversation_id} error=#{e.class} — #{e.message}"
431
451
  {}
432
452
  end
433
453
 
@@ -26,7 +26,7 @@ module Legion
26
26
  anthropic: :frontier
27
27
  }.freeze
28
28
 
29
- TIER_WEIGHT = { local: 100, fleet: 80, cloud: 60, frontier: 40 }.freeze
29
+ DEFAULT_TIER_PRIORITY = %i[local direct fleet openai_compat cloud frontier].freeze
30
30
 
31
31
  module_function
32
32
 
@@ -50,7 +50,7 @@ module Legion
50
50
  extract_field(model_data, 'tier')&.to_sym ||
51
51
  tier
52
52
  capability = embedding_model?(model_data) ? :embed : :chat
53
- priority = (TIER_WEIGHT[model_tier] || 80) - order
53
+ priority = tier_weight(model_tier) - order
54
54
  rules << build_rule(provider, instance_id, model_data, capability, model_tier, priority)
55
55
  rules << build_rule(provider, instance_id, model_data, :stream, model_tier, priority) if capability == :chat
56
56
  order += 1
@@ -91,7 +91,7 @@ module Legion
91
91
  next unless default_model
92
92
 
93
93
  model_data = { name: default_model }
94
- priority = TIER_WEIGHT[tier] || 40
94
+ priority = tier_weight(tier)
95
95
  rules << build_rule(provider_name, :default, model_data, :chat, tier, priority)
96
96
  rules << build_rule(provider_name, :default, model_data, :stream, tier, priority)
97
97
  end
@@ -136,6 +136,26 @@ module Legion
136
136
  model_data[field] || model_data[field.to_s]
137
137
  end
138
138
 
139
+ def tier_weight(tier)
140
+ tier_sym = tier.respond_to?(:to_sym) ? tier.to_sym : tier
141
+ index = tier_priority.index(tier_sym)
142
+ return 0 unless index
143
+
144
+ (tier_priority.length - index) * 100
145
+ end
146
+
147
+ def tier_priority
148
+ configured = Legion::LLM::Settings.value(:routing, :tier_priority, default: DEFAULT_TIER_PRIORITY)
149
+ normalized = Array(configured).filter_map do |tier|
150
+ tier.to_sym if tier.respond_to?(:to_sym)
151
+ end
152
+ normalized = DEFAULT_TIER_PRIORITY if normalized.empty?
153
+ (normalized + DEFAULT_TIER_PRIORITY).uniq
154
+ rescue StandardError => e
155
+ handle_exception(e, level: :warn, handled: true, operation: 'rule_generator.tier_priority')
156
+ DEFAULT_TIER_PRIORITY
157
+ end
158
+
139
159
  def extension_providers
140
160
  ext = Legion::Settings[:extensions]
141
161
  return ext[:llm] if ext.is_a?(Hash) && ext[:llm].is_a?(Hash)
@@ -254,11 +254,22 @@ module Legion
254
254
  end
255
255
  return false unless best
256
256
 
257
- @embedding_provider = best[:provider]
258
- @embedding_model = best.dig(:metadata, :default_model) ||
259
- Settings.value(:embedding, :default_model)
260
- @embedding_instance = best[:instance]
261
- @can_embed = true
257
+ provider = best[:provider]
258
+ instance = best[:instance]
259
+ resolved = best.dig(:metadata, :default_model) ||
260
+ embedding_settings[:default_model] ||
261
+ first_embedding_model_for(provider, instance)
262
+
263
+ unless resolved.to_s.length.positive?
264
+ log.debug '[llm][discovery] action=detect_embedding_from_registry no_model_resolved ' \
265
+ "provider=#{provider} instance=#{instance} — falling through to legacy probe"
266
+ return false
267
+ end
268
+
269
+ @embedding_provider = provider
270
+ @embedding_model = resolved
271
+ @embedding_instance = instance
272
+ @can_embed = true
262
273
  @embedding_fallback_chain = build_registry_embedding_fallback(embedding_instances)
263
274
 
264
275
  log.info "[llm][discovery] embedding available provider=#{@embedding_provider} " \
@@ -280,6 +291,14 @@ module Legion
280
291
  end
281
292
  end
282
293
 
294
+ def first_embedding_model_for(provider, instance)
295
+ embedding_caps = %w[embedding embeddings embed].freeze
296
+ cached_discovered_models.find do |m|
297
+ m[:provider].to_s == provider.to_s && m[:instance].to_s == instance.to_s &&
298
+ Array(m[:capabilities]).any? { |c| embedding_caps.include?(c.to_s) }
299
+ end&.dig(:model)
300
+ end
301
+
283
302
  def find_embedding_provider(embedding_settings)
284
303
  fallback = Legion::LLM::Settings.config_value(embedding_settings, :provider_fallback, %w[ollama bedrock openai])
285
304
  provider_models = Legion::LLM::Settings.config_value(embedding_settings, :provider_models, {})
@@ -396,7 +415,17 @@ module Legion
396
415
  end
397
416
 
398
417
  def embedding_settings
399
- Legion::LLM::Settings.config_value(llm_settings, :embedding, {})
418
+ settings = llm_settings
419
+ result = Legion::LLM::Settings.config_value(settings, :embedding)
420
+ return result if result.is_a?(Hash) && !result.empty?
421
+
422
+ plural = Legion::LLM::Settings.config_value(settings, :embeddings)
423
+ if plural.is_a?(Hash) && !plural.empty?
424
+ log.warn '[llm][discovery] settings key "embeddings" (plural) is deprecated — rename to "embedding" (singular)'
425
+ return plural
426
+ end
427
+
428
+ result || {}
400
429
  end
401
430
 
402
431
  def providers_settings
@@ -159,9 +159,11 @@ module Legion
159
159
  return meta[:tier].to_sym if meta.is_a?(Hash) && meta[:tier]
160
160
  return Router.provider_tier(provider) if defined?(Router) && Router.respond_to?(:provider_tier)
161
161
 
162
- Router::PROVIDER_TIER.fetch(provider.to_sym, :cloud) if defined?(Router::PROVIDER_TIER)
163
- rescue StandardError
164
- :cloud
162
+ Router::PROVIDER_TIER.fetch(provider.to_sym, nil) if defined?(Router::PROVIDER_TIER)
163
+ rescue StandardError => e
164
+ handle_exception(e, level: :warn, handled: true, operation: 'llm.pipeline.inferred_provider_tier',
165
+ provider: provider)
166
+ nil
165
167
  end
166
168
 
167
169
  def execute_steps
@@ -326,12 +328,17 @@ module Legion
326
328
  log.debug "[llm][executor] action=step_routing.enter requested_provider=#{@request.routing[:provider]} requested_model=#{@request.routing[:model]}"
327
329
  @timestamps[:routing_start] = Time.now
328
330
  state = resolve_routing_state(apply_proactive_tier_assignment(routing_request_state))
331
+ auto_route = state[:auto_route] == true
329
332
 
330
333
  @resolved_provider = state[:provider] ||
331
334
  (state[:model] && Router.infer_provider_for_model(state[:model])) ||
332
- llm_setting(:default_provider)
333
- @resolved_instance = state[:instance] || llm_setting(:default_instance)
334
- @resolved_model = state[:model] || llm_setting(:default_model)
335
+ (llm_setting(:default_provider) unless auto_route)
336
+ @resolved_instance = resolve_provider_instance(state[:instance], @resolved_provider)
337
+ @resolved_model = state[:model] || (llm_setting(:default_model) unless auto_route)
338
+ if auto_route && (@resolved_provider.nil? || @resolved_model.nil?)
339
+ raise ProviderError, 'Auto routing could not resolve an available LLM provider/model'
340
+ end
341
+
335
342
  @resolved_tier = state[:tier]&.to_sym || inferred_provider_tier(@resolved_provider)
336
343
  @resolved_offering_id = state[:offering_id]
337
344
  @resolved_offering_metadata = state[:offering_metadata]
@@ -347,16 +354,43 @@ module Legion
347
354
  )
348
355
  end
349
356
 
357
+ def resolve_provider_instance(requested_instance, provider)
358
+ return provider_scoped_instance(requested_instance, provider, preserve_unknown: true) if requested_instance
359
+
360
+ provider_scoped_instance(llm_setting(:default_instance), provider, preserve_unknown: false)
361
+ end
362
+
363
+ def provider_scoped_instance(instance, provider, preserve_unknown:)
364
+ return nil if instance.nil? || instance.to_s.empty? || provider.nil? || provider.to_s.empty?
365
+
366
+ provider_sym = provider.to_sym
367
+ instance_sym = instance.to_sym
368
+ return instance_sym if Call::Registry.registered?(provider_sym, instance: instance_sym)
369
+ return nil if Call::Registry.registered?(provider_sym)
370
+
371
+ preserve_unknown ? instance_sym : nil
372
+ rescue StandardError => e
373
+ handle_exception(e, level: :warn, handled: true, operation: 'llm.pipeline.provider_scoped_instance')
374
+ preserve_unknown ? instance : nil
375
+ end
376
+
350
377
  def routing_request_state
378
+ routing_explicit = @request.extra[:routing_explicit]
379
+ instance = @request.routing[:instance] || @request.routing[:instance_id] || @request.routing[:provider_instance]
380
+ tier = @request.extra[:tier]
351
381
  {
352
382
  provider: @request.routing[:provider],
353
- instance: @request.routing[:instance] || @request.routing[:instance_id] || @request.routing[:provider_instance],
383
+ instance: instance,
354
384
  model: @request.routing[:model],
355
385
  offering_id: @request.routing[:offering_id] || @request.routing[:id],
356
386
  offering_metadata: normalize_offering_metadata(@request.routing[:offering_metadata] ||
357
387
  @request.routing[:offering]),
358
388
  intent: @request.extra[:intent],
359
- tier: @request.extra[:tier]
389
+ tier: tier,
390
+ auto_route: @request.extra[:auto_route],
391
+ provider_explicit: routing_field_explicit?(routing_explicit, :provider, @request.routing[:provider]),
392
+ instance_explicit: routing_field_explicit?(routing_explicit, :instance, instance),
393
+ tier_explicit: routing_field_explicit?(routing_explicit, :tier, tier)
360
394
  }
361
395
  end
362
396
 
@@ -365,17 +399,25 @@ module Legion
365
399
  # caller-supplied tier/intent. Advisory assignments only fill blanks.
366
400
  if @proactive_tier_assignment&.dig(:forced)
367
401
  state[:tier] = @proactive_tier_assignment[:tier]
402
+ state[:tier_explicit] = true
368
403
  state[:intent] = merge_routing_intent(state[:intent], @proactive_tier_assignment[:intent])
369
404
  log.info "[llm][routing] action=forced_tier source=#{@proactive_tier_assignment[:source]} tier=#{state[:tier]}"
370
- elsif @proactive_tier_assignment && !state[:tier] && !state[:intent]
405
+ elsif @proactive_tier_assignment && !state[:tier] && !state[:intent] && !state[:instance] &&
406
+ !state[:provider] && !state[:model]
371
407
  state[:tier] = @proactive_tier_assignment[:tier]
408
+ state[:tier_explicit] = true
372
409
  state[:intent] = @proactive_tier_assignment[:intent]
373
410
  end
374
411
  state
375
412
  end
376
413
 
377
414
  def resolve_routing_state(state)
378
- return state unless (state[:intent] || state[:tier]) && defined?(Router) && Router.routing_enabled?
415
+ return state unless defined?(Router)
416
+
417
+ explicit_route = state[:provider_explicit] || state[:instance_explicit] || state[:tier_explicit]
418
+ auto_route = state[:auto_route] == true
419
+ intent_route = state[:intent] && Router.routing_enabled?
420
+ return state unless explicit_route || auto_route || intent_route
379
421
 
380
422
  resolution = routing_resolution_for(state)
381
423
  return state unless resolution
@@ -384,17 +426,20 @@ module Legion
384
426
  end
385
427
 
386
428
  def routing_resolution_for(state)
387
- if pipeline_escalation_enabled?
429
+ if state[:auto_route] == true || (state[:intent] && pipeline_escalation_enabled?)
388
430
  @escalation_chain = Router.resolve_chain(
389
- intent: state[:intent],
390
- tier: state[:tier],
391
- model: state[:model],
392
- provider: state[:provider],
393
- max_escalations: pipeline_escalation_max_attempts
431
+ intent: state[:intent],
432
+ tier: state[:tier],
433
+ model: state[:model],
434
+ provider: state[:provider],
435
+ instance: state[:instance],
436
+ max_escalations: pipeline_escalation_max_attempts,
437
+ allow_default_fallback: state[:auto_route] != true
394
438
  )
395
439
  @escalation_chain.primary
396
440
  else
397
- Router.resolve(intent: state[:intent], tier: state[:tier], model: state[:model], provider: state[:provider])
441
+ Router.resolve(intent: state[:intent], tier: state[:tier], model: state[:model],
442
+ provider: state[:provider], instance: state[:instance])
398
443
  end
399
444
  end
400
445
 
@@ -422,6 +467,13 @@ module Legion
422
467
  state
423
468
  end
424
469
 
470
+ def routing_field_explicit?(flags, key, value)
471
+ return false if value.nil? || value.to_s.empty?
472
+ return true unless flags.is_a?(Hash)
473
+
474
+ flags.fetch(key, flags.fetch(key.to_s, true)) == true
475
+ end
476
+
425
477
  def step_request_normalization
426
478
  @exchange_id = Tracing.exchange_id
427
479
  end
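To illustrate how `routing_field_explicit?` treats its inputs, here is the method body from the hunk above lifted into a top-level function. The sample flag hashes are illustrative assumptions, not values taken from the gem.

```ruby
# Same logic as the added private method, extracted so it can run on its own.
def routing_field_explicit?(flags, key, value)
  # A blank value can never count as an explicit choice.
  return false if value.nil? || value.to_s.empty?
  # Without a flags hash, any present value is treated as explicit.
  return true unless flags.is_a?(Hash)

  # Accept symbol or string keys; a missing key defaults to explicit.
  flags.fetch(key, flags.fetch(key.to_s, true)) == true
end

routing_field_explicit?({ tier: false }, :tier, 'cloud')   # flag says the tier was filled in, not chosen
routing_field_explicit?({ 'tier' => false }, :tier, 'cloud') # string keys are honored as well
routing_field_explicit?(nil, :tier, 'cloud')               # no flags hash at all
```

The default of `true` for a missing key means callers that never pass `routing_explicit` keep the earlier behavior, where any supplied value counts as explicit.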
@@ -476,33 +528,21 @@ module Legion
476
528
  end
477
529
 
478
530
  def run_provider_call_with_escalation
479
- chain = @escalation_chain || build_default_escalation_chain
531
+ @escalation_chain ||= build_default_escalation_chain
532
+ chain = @escalation_chain
480
533
  threshold = pipeline_escalation_quality_threshold
481
534
  quality_check = @request.extra[:quality_check]
482
535
  succeeded = false
536
+ tried = []
483
537
  log.debug "[llm][executor] action=escalation.enter chain_size=#{chain.size} threshold=#{threshold}"
484
538
 
539
+ primary_tier = @escalation_chain.primary&.tier
540
+
485
541
  chain.each do |resolution|
486
- start_time = Time.now
487
- @resolved_provider = resolution.provider
488
- @resolved_instance = resolution.instance
489
- @resolved_model = resolution.model
490
- @resolved_tier = resolution.tier
491
- @resolved_offering_id = resolution.offering_id
492
- @resolved_offering_metadata = resolution.offering_metadata
493
- succeeded = attempt_escalation(resolution, threshold, quality_check, start_time)
542
+ next if tried.any? { |t| t[:provider] == resolution.provider && t[:instance] == resolution.instance && t[:model] == resolution.model }
543
+
544
+ succeeded = run_escalation_resolution(resolution, threshold, quality_check, tried, primary_tier)
494
545
  break if succeeded
495
- rescue Legion::LLM::AuthError, Legion::LLM::PrivacyModeError => e
496
- record_escalation_failure(e, resolution, start_time,
497
- outcome: :auth_error, operation: 'llm.pipeline.escalation_attempt.auth',
498
- handled: true)
499
- rescue Legion::LLM::RateLimitError => e
500
- record_escalation_failure(e, resolution, start_time,
501
- outcome: :rate_limited, operation: 'llm.pipeline.escalation_attempt.rate_limit',
502
- handled: true)
503
- rescue StandardError => e
504
- record_escalation_failure(e, resolution, start_time, outcome: :error,
505
- operation: 'llm.pipeline.escalation_attempt')
506
546
  end
507
547
  return if succeeded
508
548
 
@@ -513,6 +553,58 @@ module Legion
513
553
  raise EscalationExhausted, "All #{@escalation_history.size} escalation attempts failed"
514
554
  end
515
555
 
556
+ def run_escalation_resolution(resolution, threshold, quality_check, tried, primary_tier)
557
+ move_type = if tried.empty?
558
+ :primary
559
+ elsif resolution.tier == primary_tier
560
+ :lateral
561
+ else
562
+ :escalation
563
+ end
564
+ log.info "[llm][escalation] action=attempt move=#{move_type} provider=#{resolution.provider} model=#{resolution.model} tier=#{resolution.tier}"
565
+
566
+ start_time = Time.now
567
+ @resolved_provider = resolution.provider
568
+ @resolved_instance = resolution.instance
569
+ @resolved_model = resolution.model
570
+ @resolved_tier = resolution.tier
571
+ @resolved_offering_id = resolution.offering_id
572
+ @resolved_offering_metadata = resolution.offering_metadata
573
+ succeeded = attempt_escalation(resolution, threshold, quality_check, start_time)
574
+ tried << { provider: resolution.provider, instance: resolution.instance, model: resolution.model } unless succeeded
575
+ succeeded
576
+ rescue Legion::LLM::AuthError, Legion::LLM::PrivacyModeError => e
577
+ tried << { provider: resolution.provider, instance: resolution.instance, model: resolution.model }
578
+ record_escalation_failure(e, resolution, start_time,
579
+ outcome: :auth_error,
580
+ operation: 'llm.pipeline.escalation_attempt.auth',
581
+ handled: true)
582
+ false
583
+ rescue Legion::LLM::ContextOverflow => e
584
+ tried << { provider: resolution.provider, instance: resolution.instance, model: resolution.model }
585
+ record_escalation_failure(e, resolution, start_time,
586
+ outcome: :context_overflow,
587
+ operation: 'llm.pipeline.escalation_attempt.context_overflow',
588
+ handled: true)
589
+ log.warn "[llm][escalation] context_overflow provider=#{resolution.provider} " \
590
+ "model=#{resolution.model} — skipping same-tier, seeking larger context window"
591
+ skip_same_tier!(resolution, tried)
592
+ false
593
+ rescue Legion::LLM::RateLimitError => e
594
+ tried << { provider: resolution.provider, instance: resolution.instance, model: resolution.model }
595
+ record_escalation_failure(e, resolution, start_time,
596
+ outcome: :rate_limited,
597
+ operation: 'llm.pipeline.escalation_attempt.rate_limit',
598
+ handled: true)
599
+ false
600
+ rescue StandardError => e
601
+ tried << { provider: resolution.provider, instance: resolution.instance, model: resolution.model }
602
+ record_escalation_failure(e, resolution, start_time,
603
+ outcome: :error,
604
+ operation: 'llm.pipeline.escalation_attempt')
605
+ false
606
+ end
607
+
516
608
  def attempt_escalation(resolution, threshold, quality_check, start_time)
517
609
  @current_escalation_context = {
518
610
  attempt: @escalation_history.size + 1,
@@ -586,7 +678,64 @@ module Legion
586
678
  end
587
679
 
588
680
  def build_default_escalation_chain
589
- Router.resolve_chain(max_escalations: pipeline_escalation_max_attempts)
681
+ primary = Router.explicit_resolution(@resolved_tier, @resolved_provider, @resolved_model, @resolved_instance)
682
+ fallbacks = build_fallback_resolutions(
683
+ exclude_provider: @resolved_provider,
684
+ exclude_instance: @resolved_instance,
685
+ primary_tier: @resolved_tier
686
+ )
687
+ resolutions = ([primary] + fallbacks).compact.uniq { |r| [r.provider, r.instance, r.model] }
688
+ Router::EscalationChain.new(resolutions: resolutions, max_attempts: pipeline_escalation_max_attempts)
689
+ end
690
+
691
+ def build_fallback_resolutions(exclude_provider: nil, exclude_instance: nil, primary_tier: nil)
692
+ tier_rank = Router::TIER_RANK
693
+ primary_rank = primary_tier ? (tier_rank[primary_tier.to_sym] || 99) : 99
694
+
695
+ candidates = Call::Registry.all_instances.filter_map do |entry|
696
+ next if entry[:provider] == exclude_provider&.to_sym && entry[:instance] == (exclude_instance&.to_sym || :default)
697
+ next if entry[:provider] == exclude_provider&.to_sym && exclude_instance.nil?
698
+
699
+ model = Router.send(:registry_default_model, entry)
700
+ next unless model
701
+
702
+ tier = Router::PROVIDER_TIER.fetch(entry[:provider], :frontier)
703
+ Router::Resolution.new(
704
+ tier: tier,
705
+ provider: entry[:provider],
706
+ instance: entry[:instance] == :default ? nil : entry[:instance],
707
+ model: model,
708
+ rule: 'escalation_fallback'
709
+ )
710
+ end
711
+
712
+ # Lateral alternatives (same tier) come first; escalations (higher tier) follow;
713
+ # lower-ranked tiers are appended last.
714
+ candidates.sort_by do |r|
715
+ r_rank = tier_rank[r.tier] || 99
716
+ rank_diff = r_rank - primary_rank
717
+ bucket = if rank_diff.zero?
718
+ 0
719
+ elsif rank_diff.positive?
720
+ 1
721
+ else
722
+ 2
723
+ end
724
+ [bucket, r_rank]
725
+ end
726
+ end
727
+
728
+ def skip_same_tier!(failed_resolution, tried)
729
+ chain = @escalation_chain
730
+ return unless chain.respond_to?(:each)
731
+
732
+ chain.each do |r|
733
+ next if r.tier != failed_resolution.tier
734
+ next if tried.any? { |t| t[:provider] == r.provider && t[:instance] == r.instance && t[:model] == r.model }
735
+
736
+ log.debug "[llm][escalation] action=skip_same_tier provider=#{r.provider} model=#{r.model} tier=#{r.tier} reason=context_overflow"
737
+ tried << { provider: r.provider, instance: r.instance, model: r.model }
738
+ end
590
739
  end
591
740
 
592
741
  def escalation_attempt_hash(resolution, outcome:, failures:, duration_ms:)
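The ordering applied in `build_fallback_resolutions` (lateral first, then higher tiers, then lower tiers) can be sketched in isolation. The `TIER_RANK` values below are copied from the router change later in this diff; `Candidate`, `order_fallbacks`, and the `:example_frontier` provider are hypothetical names used only for illustration.

```ruby
# Rank table as added to the router in this release.
TIER_RANK = { local: 0, direct: 1, fleet: 2, openai_compat: 3, cloud: 4, frontier: 5 }.freeze

# Hypothetical stand-in for Router::Resolution, carrying only what the sort needs.
Candidate = Struct.new(:provider, :tier)

def order_fallbacks(candidates, primary_tier)
  primary_rank = primary_tier ? (TIER_RANK[primary_tier.to_sym] || 99) : 99
  candidates.sort_by do |c|
    rank = TIER_RANK[c.tier] || 99
    diff = rank - primary_rank
    # Bucket 0: same tier (lateral); 1: higher tier (escalation); 2: lower tier.
    bucket = if diff.zero? then 0 elsif diff.positive? then 1 else 2 end
    [bucket, rank]
  end
end

candidates = [
  Candidate.new(:example_frontier, :frontier),
  Candidate.new(:ollama, :local),
  Candidate.new(:azure, :cloud),
  Candidate.new(:vllm, :fleet)
]
order_fallbacks(candidates, :fleet).map(&:provider)
# => [:vllm, :azure, :example_frontier, :ollama]
```

With a `:fleet` primary, the same-tier candidate sorts first, the cloud and frontier candidates follow in rank order, and the lower-ranked local candidate lands last.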
@@ -39,8 +39,17 @@ module Legion
39
39
  cache: nil,
40
40
  quality_check: nil,
41
41
  **)
42
+ routing_explicit = { provider: !provider.nil?, model: !model.nil?, tier: !tier.nil? }
42
43
  resolved_provider = provider
43
44
  resolved_model = model
45
+ auto_route = Inference::Request.auto_routing_model?(resolved_model)
46
+
47
+ if auto_route
48
+ resolved_provider = nil
49
+ intent ||= Inference::Request.default_auto_routing_intent
50
+ elsif resolved_provider.nil? && resolved_model && defined?(Router)
51
+ resolved_provider = Router.infer_provider_for_model(resolved_model)
52
+ end
44
53
 
45
54
  if resolved_provider.nil? && resolved_model.nil? && defined?(Router) && Router.routing_enabled? && (intent || tier)
46
55
  resolution = Router.resolve(intent: intent, tier: tier, exclude: exclude)
@@ -48,26 +57,27 @@ module Legion
48
57
  resolved_model = resolution&.model
49
58
  end
50
59
 
51
- resolved_provider ||= llm_setting(:default_provider)
52
- resolved_model ||= llm_setting(:default_model)
60
+ resolved_provider ||= llm_setting(:default_provider) unless auto_route
61
+ resolved_model ||= llm_setting(:default_model) unless auto_route
53
62
 
54
63
  request(message,
55
- provider: resolved_provider,
56
- model: resolved_model,
57
- intent: intent,
58
- tier: tier,
59
- schema: schema,
60
- tools: tools,
61
- escalate: escalate,
62
- max_escalations: max_escalations,
63
- thinking: thinking,
64
- temperature: temperature,
65
- max_tokens: max_tokens,
66
- tracing: tracing,
67
- agent: agent,
68
- caller: caller,
69
- cache: cache,
70
- quality_check: quality_check,
64
+ provider: resolved_provider,
65
+ model: resolved_model,
66
+ intent: intent,
67
+ tier: tier,
68
+ schema: schema,
69
+ tools: tools,
70
+ escalate: escalate,
71
+ max_escalations: max_escalations,
72
+ thinking: thinking,
73
+ temperature: temperature,
74
+ max_tokens: max_tokens,
75
+ tracing: tracing,
76
+ agent: agent,
77
+ caller: caller,
78
+ cache: cache,
79
+ quality_check: quality_check,
80
+ routing_explicit: routing_explicit,
71
81
  **)
72
82
  end
73
83
 
@@ -90,7 +100,8 @@ module Legion
90
100
  cache: nil,
91
101
  quality_check: nil,
92
102
  **)
93
- if provider.nil? || model.nil?
103
+ auto_route = Inference::Request.auto_routing_model?(model)
104
+ if !auto_route && (provider.nil? || model.nil?)
94
105
  raise LLMError, "Prompt.request: provider and model must be set (got provider=#{provider.inspect}, model=#{model.inspect}). " \
95
106
  'Configure Legion::Settings[:llm][:default_provider] and [:default_model], or pass them explicitly.'
96
107
  end
@@ -3,6 +3,9 @@
3
3
  module Legion
4
4
  module LLM
5
5
  module Inference
6
+ AUTO_ROUTING_MODEL_KEY = 'legionio'
7
+ AUTO_ROUTING_MODEL_ALIASES = [AUTO_ROUTING_MODEL_KEY].freeze
8
+
6
9
  Request = ::Data.define(
7
10
  :id, :conversation_id, :idempotency_key, :schema_version,
8
11
  :system, :messages, :tools, :tool_choice,
@@ -14,6 +17,11 @@ module Legion
14
17
  :billing, :test, :modality, :hooks
15
18
  ) do
16
19
  def self.build(**kwargs)
20
+ routing, extra = normalize_auto_routing(
21
+ kwargs.fetch(:routing, { provider: nil, model: nil }),
22
+ kwargs.fetch(:extra, {})
23
+ )
24
+
17
25
  new(
18
26
  id: kwargs[:id] || "req_#{SecureRandom.hex(12)}",
19
27
  conversation_id: kwargs[:conversation_id],
@@ -23,7 +31,7 @@ module Legion
23
31
  messages: kwargs.fetch(:messages, []),
24
32
  tools: kwargs.key?(:tools) ? kwargs[:tools] : nil,
25
33
  tool_choice: kwargs.fetch(:tool_choice, { mode: :auto }),
26
- routing: kwargs.fetch(:routing, { provider: nil, model: nil }),
34
+ routing: routing,
27
35
  tokens: kwargs.fetch(:tokens, { max: 4096 }),
28
36
  stop: kwargs.fetch(:stop, { sequences: [] }),
29
37
  generation: kwargs.fetch(:generation, {}),
@@ -35,7 +43,7 @@ module Legion
35
43
  cache: kwargs.fetch(:cache, { strategy: :default, cacheable: true }),
36
44
  priority: kwargs.fetch(:priority, :normal),
37
45
  ttl: kwargs[:ttl],
38
- extra: kwargs.fetch(:extra, {}),
46
+ extra: extra,
39
47
  metadata: kwargs.fetch(:metadata, {}),
40
48
  enrichments: kwargs.fetch(:enrichments, {}),
41
49
  predictions: kwargs.fetch(:predictions, {}),
@@ -110,6 +118,41 @@ module Legion
110
118
  build_args[:id] = request_id if request_id
111
119
  build(**build_args)
112
120
  end
121
+
122
+ def self.auto_routing_model?(model)
123
+ Legion::LLM::Inference::AUTO_ROUTING_MODEL_ALIASES.include?(model.to_s.strip.downcase)
124
+ end
125
+
126
+ def self.default_auto_routing_intent
127
+ routing = Legion::LLM::Settings.value(:routing, default: {})
128
+ intent = Legion::LLM::Settings.config_value(routing, :default_intent, {})
129
+ intent = intent.is_a?(Hash) ? normalize_hash(intent) : {}
130
+ intent.merge(capability: :chat)
131
+ rescue StandardError
132
+ { capability: :chat }
133
+ end
134
+
135
+ def self.normalize_auto_routing(routing, extra)
136
+ normalized_routing = normalize_hash(routing)
137
+ normalized_extra = normalize_hash(extra)
138
+ return [normalized_routing, normalized_extra] unless auto_routing_model?(normalized_routing[:model])
139
+
140
+ normalized_routing = { provider: nil, model: nil }
141
+ normalized_extra = normalized_extra.dup
142
+ normalized_extra.delete(:tier)
143
+ normalized_extra[:intent] ||= default_auto_routing_intent
144
+ normalized_extra[:auto_route] = true
145
+ normalized_extra[:requested_model_alias] = Legion::LLM::Inference::AUTO_ROUTING_MODEL_KEY
146
+ [normalized_routing, normalized_extra]
147
+ end
148
+
149
+ def self.normalize_hash(value)
150
+ return {} unless value.is_a?(Hash)
151
+
152
+ value.each_with_object({}) do |(key, hash_value), normalized|
153
+ normalized[key.respond_to?(:to_sym) ? key.to_sym : key] = hash_value
154
+ end
155
+ end
113
156
  end
114
157
  end
115
158
  end
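The auto-routing normalization added to `Request.build` can be exercised outside the gem with a small sketch. It assumes the settings lookup is unavailable, so `default_auto_routing_intent` is replaced by a stub returning the same `{ capability: :chat }` value the original falls back to on error; the `AutoRoutingSketch` module name is hypothetical.

```ruby
module AutoRoutingSketch
  AUTO_ROUTING_MODEL_KEY = 'legionio'
  AUTO_ROUTING_MODEL_ALIASES = [AUTO_ROUTING_MODEL_KEY].freeze

  module_function

  def auto_routing_model?(model)
    AUTO_ROUTING_MODEL_ALIASES.include?(model.to_s.strip.downcase)
  end

  # Symbolize keys where possible; non-hash input becomes an empty hash.
  def normalize_hash(value)
    return {} unless value.is_a?(Hash)

    value.each_with_object({}) do |(key, hash_value), normalized|
      normalized[key.respond_to?(:to_sym) ? key.to_sym : key] = hash_value
    end
  end

  # Stub (assumption): the real method merges routing.default_intent from settings.
  def default_auto_routing_intent
    { capability: :chat }
  end

  def normalize_auto_routing(routing, extra)
    normalized_routing = normalize_hash(routing)
    normalized_extra = normalize_hash(extra)
    return [normalized_routing, normalized_extra] unless auto_routing_model?(normalized_routing[:model])

    # The placeholder model clears explicit routing and marks the request for the router chain.
    normalized_extra = normalized_extra.dup
    normalized_extra.delete(:tier)
    normalized_extra[:intent] ||= default_auto_routing_intent
    normalized_extra[:auto_route] = true
    normalized_extra[:requested_model_alias] = AUTO_ROUTING_MODEL_KEY
    [{ provider: nil, model: nil }, normalized_extra]
  end
end

routing, extra = AutoRoutingSketch.normalize_auto_routing(
  { 'provider' => 'openai', 'model' => ' LegionIO ' },
  { tier: :cloud }
)
```

In this sketch the provider, model, and tier supplied by the caller are all dropped once the model matches the `legionio` alias, which is case-insensitive and ignores surrounding whitespace.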
@@ -319,6 +319,7 @@ module Legion
319
319
 
320
320
  def positive_integer(value)
321
321
  return nil if value.nil?
322
+ return nil if value.respond_to?(:empty?) && value.empty?
322
323
 
323
324
  integer = Integer(value)
324
325
  integer.positive? ? integer : nil
@@ -39,10 +39,8 @@ module Legion
39
39
 
40
40
  def padded_resolutions
41
41
  return [] if @resolutions.empty?
42
- return @resolutions.first(@max_attempts) if @resolutions.size >= @max_attempts
43
42
 
44
- last = @resolutions.last
45
- (@resolutions + Array.new(@max_attempts - @resolutions.size) { last }).first(@max_attempts)
43
+ @resolutions.first(@max_attempts)
46
44
  end
47
45
  end
48
46
  end
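The effect of the `padded_resolutions` change is easiest to see side by side. This sketch restates the removed and added bodies as two free functions; `padded_old` and `padded_new` are hypothetical names, and symbols stand in for resolution objects.

```ruby
# Behavior before the change: a short chain was padded by repeating its last entry.
def padded_old(resolutions, max_attempts)
  return [] if resolutions.empty?
  return resolutions.first(max_attempts) if resolutions.size >= max_attempts

  last = resolutions.last
  (resolutions + Array.new(max_attempts - resolutions.size) { last }).first(max_attempts)
end

# Behavior after the change: the chain is only truncated, never padded.
def padded_new(resolutions, max_attempts)
  return [] if resolutions.empty?

  resolutions.first(max_attempts)
end

padded_old(%i[a b], 4) # => [:a, :b, :b, :b]
padded_new(%i[a b], 4) # => [:a, :b]
```

Dropping the padding fits the new `tried` bookkeeping in the executor, which skips any provider, instance, and model combination that has already failed.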
@@ -18,6 +18,7 @@ module Legion
18
18
  gemini: :cloud, azure: :cloud, ollama: :local, vllm: :fleet }.freeze
19
19
  PROVIDER_ORDER = %i[ollama vllm bedrock azure gemini anthropic openai].freeze
20
20
  TIER_EXTERNAL = Set[:cloud, :frontier, :openai_compat].freeze
21
+ TIER_RANK = { local: 0, direct: 1, fleet: 2, openai_compat: 3, cloud: 4, frontier: 5 }.freeze
21
22
 
22
23
  OLLAMA_MODEL_PATTERN = %r{[:/]}
23
24
 
@@ -60,9 +61,9 @@ module Legion
60
61
  # @param model [String, nil] explicit model override
61
62
  # @param provider [Symbol, nil] explicit provider override
62
63
  # @return [Resolution, nil]
63
- def resolve(intent: nil, tier: nil, model: nil, provider: nil, exclude: {})
64
- log.debug "[llm][router] action=resolve.enter intent=#{intent} tier=#{tier} model=#{model} provider=#{provider}"
65
- return explicit_resolution(tier, provider, model) if tier
64
+ def resolve(intent: nil, tier: nil, model: nil, provider: nil, instance: nil, exclude: {})
65
+ log.debug "[llm][router] action=resolve.enter intent=#{intent} tier=#{tier} model=#{model} provider=#{provider} instance=#{instance}"
66
+ return explicit_resolution(tier, provider, model, instance) if tier || provider || instance
66
67
 
67
68
  return nil unless routing_enabled? && intent
68
69
 
@@ -81,13 +82,14 @@ module Legion
81
82
  resolution || arbitrage_fallback(intent)
82
83
  end
83
84
 
84
- def resolve_chain(intent: nil, tier: nil, model: nil, provider: nil, max_escalations: nil, exclude: {})
85
+ def resolve_chain(intent: nil, tier: nil, model: nil, provider: nil, instance: nil, max_escalations: nil,
86
+ exclude: {}, allow_default_fallback: true)
85
87
  log.debug "[llm][router] action=resolve_chain.enter intent=#{intent} tier=#{tier} max_escalations=#{max_escalations}"
86
88
  max = max_escalations || escalation_max_attempts
87
- return EscalationChain.new(resolutions: [explicit_resolution(tier, provider, model)], max_attempts: max) if tier
88
- return chain_from_defaults(model, provider, max) unless routing_enabled? && intent
89
+ return EscalationChain.new(resolutions: [explicit_resolution(tier, provider, model, instance)], max_attempts: max) if tier || provider || instance
90
+ return chain_from_defaults(model, provider, max, allow_default_fallback: allow_default_fallback) unless routing_enabled? && intent
89
91
 
90
- chain_from_intent(intent, max, exclude: exclude)
92
+ chain_from_intent(intent, max, exclude: exclude, allow_default_fallback: allow_default_fallback)
91
93
  end
92
94
 
93
95
  def health_tracker
@@ -145,6 +147,34 @@ module Legion
145
147
  true
146
148
  end
147
149
 
150
+ def explicit_resolution(tier, provider, model, instance = nil)
151
+ registry_entry = if provider
152
+ registry_entry_for_provider(provider.to_sym, instance: instance&.to_sym)
153
+ elsif tier
154
+ registry_entry_for_tier(tier)
155
+ end
156
+ resolved_provider = if provider
157
+ provider.to_sym
158
+ else
159
+ registry_entry&.[](:provider) ||
160
+ (tier && default_provider_for_tier(tier)) ||
161
+ default_settings_provider&.to_sym ||
162
+ :anthropic
163
+ end
164
+ resolved_model = model || registry_default_model(registry_entry) || (tier && default_model_for_tier(tier))
165
+ resolved_instance = registry_entry&.[](:instance) || instance
166
+ resolved_tier = tier || PROVIDER_TIER.fetch(resolved_provider, :frontier)
167
+
168
+ Resolution.new(
169
+ tier: resolved_tier,
170
+ provider: resolved_provider,
171
+ model: resolved_model,
172
+ instance: resolved_instance,
173
+ rule: 'explicit',
174
+ metadata: registry_resolution_metadata(registry_entry)
175
+ )
176
+ end
177
+
148
178
  private
149
179
 
150
180
  def arbitrage_fallback(intent)
@@ -162,25 +192,6 @@ module Legion
162
192
  Resolution.new(tier: tier, provider: provider, model: model, rule: 'arbitrage_fallback')
163
193
  end
164
194
 
165
- def explicit_resolution(tier, provider, model)
166
- registry_entry = if provider
167
- registry_entry_for_provider(provider.to_sym)
168
- else
169
- registry_entry_for_tier(tier)
170
- end
171
- resolved_provider = provider ? provider.to_sym : (registry_entry&.[](:provider) || default_provider_for_tier(tier))
172
- resolved_model = model || registry_default_model(registry_entry) || default_model_for_tier(tier)
173
-
174
- Resolution.new(
175
- tier: tier,
176
- provider: resolved_provider,
177
- model: resolved_model,
178
- instance: registry_entry&.[](:instance),
179
- rule: 'explicit',
180
- metadata: registry_resolution_metadata(registry_entry)
181
- )
182
- end
183
-
184
195
  def merge_defaults(intent)
185
196
  defaults = (routing_settings[:default_intent] || {})
186
197
  .transform_keys(&:to_sym)
@@ -423,19 +434,22 @@ module Legion
423
434
  end
424
435
  end
425
436
 
426
- def chain_from_defaults(model, provider, max)
427
- if provider || model || default_settings_provider || default_settings_model
437
+ def chain_from_defaults(model, provider, max, allow_default_fallback: true)
438
+ if provider || model || (allow_default_fallback && (default_settings_provider || default_settings_model))
428
439
  p = (provider || default_settings_provider)&.to_sym
429
440
  resolved_model = model || registry_default_model(registry_entry_for_provider(p)) ||
430
441
  default_settings_model || 'claude-sonnet-4-6'
431
- res = Resolution.new(tier: PROVIDER_TIER.fetch(p, :frontier),
432
- provider: p || :anthropic,
433
- model: resolved_model)
434
- return EscalationChain.new(resolutions: [res], max_attempts: max)
442
+ primary = Resolution.new(tier: PROVIDER_TIER.fetch(p || :anthropic, :frontier),
443
+ provider: p || :anthropic,
444
+ model: resolved_model)
445
+ # Append remaining registered providers as fallbacks (sorted by tier rank)
446
+ fallbacks = enabled_provider_chain.reject { |r| r.provider == primary.provider }
447
+ resolutions = ([primary] + fallbacks).uniq { |r| [r.provider, r.instance, r.model] }
448
+ return EscalationChain.new(resolutions: resolutions, max_attempts: max)
435
449
  end
436
450
 
437
451
  resolutions = enabled_provider_chain
438
- if resolutions.empty?
452
+ if resolutions.empty? && allow_default_fallback
439
453
  p = default_settings_provider&.to_sym || :anthropic
440
454
  resolutions = [Resolution.new(tier: PROVIDER_TIER.fetch(p, :frontier),
441
455
  provider: p,
@@ -475,7 +489,7 @@ module Legion
475
489
  end
476
490
  end
477
491
 
478
- def chain_from_intent(intent, max, exclude: {})
492
+ def chain_from_intent(intent, max, exclude: {}, allow_default_fallback: true)
479
493
  merged = intent ? merge_defaults(intent) : {}
480
494
  rules = load_rules
481
495
  candidates = select_candidates(rules, merged, exclude: exclude)
@@ -484,7 +498,7 @@ module Legion
484
498
  resolutions = build_fallback_chain(sorted.first, sorted, resolutions) if sorted.first&.fallback
485
499
  resolutions = resolutions.uniq { |r| [r.provider, r.model] }
486
500
  resolutions = enabled_provider_chain if resolutions.empty?
487
- if resolutions.empty?
501
+ if resolutions.empty? && allow_default_fallback
488
502
  p = default_settings_provider&.to_sym || :anthropic
489
503
  resolutions = [Resolution.new(tier: PROVIDER_TIER.fetch(p, :frontier),
490
504
  provider: p,
@@ -573,14 +587,23 @@ module Legion
573
587
  end
574
588
 
575
589
  # Find the first registered instance for a specific provider.
576
- def registry_entry_for_provider(provider)
590
+ # When +instance+ is given, prefers the entry whose :instance matches;
591
+ # falls back to the first provider entry if no exact match is found.
592
+ def registry_entry_for_provider(provider, instance: nil)
577
593
  instances = begin
578
594
  Call::Registry.all_instances
579
595
  rescue StandardError => e
580
596
  handle_exception(e, level: :warn, handled: true, operation: 'router.registry_entry_for_provider')
581
597
  []
582
598
  end
583
- instances.find { |entry| entry[:provider] == provider }
599
+ provider_entries = instances.select { |entry| entry[:provider] == provider }
600
+ return nil if provider_entries.empty?
601
+
602
+ if instance
603
+ provider_entries.find { |entry| entry[:instance] == instance } || provider_entries.first
604
+ else
605
+ provider_entries.first
606
+ end
584
607
  end
585
608
 
586
609
  # Find a default model from registry for a given tier.
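The instance-aware lookup in `registry_entry_for_provider` can be sketched against a plain array. This assumes registry entries are hashes with `:provider` and `:instance` keys, as the hunk above suggests; `entry_for_provider` and the sample entries are hypothetical, and the registry call with its rescue is left out.

```ruby
# Hypothetical free-function version taking the instance list as an argument.
def entry_for_provider(instances, provider, instance: nil)
  provider_entries = instances.select { |entry| entry[:provider] == provider }
  return nil if provider_entries.empty?

  if instance
    # Prefer the exact instance; otherwise fall back to the provider's first entry.
    provider_entries.find { |entry| entry[:instance] == instance } || provider_entries.first
  else
    provider_entries.first
  end
end

instances = [
  { provider: :ollama, instance: :default },
  { provider: :ollama, instance: :gpu_box },
  { provider: :vllm,   instance: :default }
]
entry_for_provider(instances, :ollama, instance: :gpu_box) # => { provider: :ollama, instance: :gpu_box }
entry_for_provider(instances, :ollama, instance: :missing) # => { provider: :ollama, instance: :default }
```

An unknown instance therefore degrades to the provider's first registered entry rather than returning `nil`.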
@@ -305,7 +305,7 @@ module Legion
305
305
  def self.routing_defaults
306
306
  {
307
307
  enabled: true,
308
- tier_priority: %w[local fleet openai_compat cloud frontier],
308
+ tier_priority: %w[local direct fleet openai_compat cloud frontier],
309
309
  default_intent: { privacy: 'normal', capability: 'moderate', cost: 'normal' },
310
310
  tiers: {
311
311
  local: { provider: 'ollama' },
@@ -2,6 +2,6 @@
2
2
 
3
3
  module Legion
4
4
  module LLM
5
- VERSION = '0.9.23'
5
+ VERSION = '0.9.28'
6
6
  end
7
7
  end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: legion-llm
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.9.23
4
+ version: 0.9.28
5
5
  platform: ruby
6
6
  authors:
7
7
  - Esity