legion-llm 0.9.23 → 0.9.29

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 8c8c98a439d2e96bba437e5e8b4bf8c47c01277a4079bd459c7257e2990278c6
4
- data.tar.gz: f9344c761ebf18b4c5ab271ac8cb5858ce46f791588ace438726002d2907c70e
3
+ metadata.gz: 914f95fd880bbb73c043d16612bbf22fc22569455b721fe9052b5ee4c55e83b3
4
+ data.tar.gz: 0a2f12babcae95cab6c9f2e0ee8b5207c26d998aafcee464d457af7ebea4dc16
5
5
  SHA512:
6
- metadata.gz: 9574535d0eeca84d522858dd323e8d028994b46b3d3f78a37a8094a4f1a692fbdd68bf24e8a061160b5238a2c3e4f73141e29bf70c6423f18e4b4441937f5417
7
- data.tar.gz: 69aa8eccf10beb687b637b7442d9eb8a7bae0d42405fc9cfd47f8c8d5c036b7724df6cb55c10d8264f0701f46f8282f0a44d8622d607a6129a09ab4c39ad2e99
6
+ metadata.gz: 52a9853e3c2337d19a0e4517d0472143e9aa809dd94363655db76a94ed924041b95b20a1038442053fc8b30f4c035782335a6edc0ad9634cb17e7f6c59c81f4b
7
+ data.tar.gz: bd2f8b50e52e0653cedcb44c0c18105c212151b64baab5dd62248a456e917bdae8e8347d83d7c8c4dd535da354f00795d938643e94e95ec03657c03d2bfa33a2
data/CHANGELOG.md CHANGED
@@ -1,5 +1,74 @@
1
1
  # Legion LLM Changelog
2
2
 
3
+ ## [0.9.29] - 2026-05-16
4
+
5
+ ### Added
6
+ - Routing: generated discovery rules now expose `:tools` capability rules for tool-capable models and normalize provider aliases such as `function_calling`/`functions`.
7
+
8
+ ### Changed
9
+ - Routing: automatic routing honors top-level `llm.tier_order` before routing-specific tier priority settings.
10
+ - Routing: stream requests with injected native tools now require both `streaming` and `tools` model capabilities before selecting a target.
11
+
12
+ ### Fixed
13
+ - Router: required model capabilities filter out non-tool-capable candidates instead of selecting a local model that later rejects tool payloads.
14
+ - Executor: streaming provider calls now use the escalation chain, so provider errors like "does not support tools" can move to the next routed model.
15
+ - Executor: synthetic routing requirements no longer make model-only or explicit-provider requests bypass provider inference or registry defaults.
16
+
17
+ ## [0.9.28] - 2026-05-15
18
+
19
+ ### Added
20
+ - API: `/api/llm/models` now surfaces a static `LegionIO` model (`id: legionio`) as the default auto-routing placeholder.
21
+
22
+ ### Changed
23
+ - Routing: `model: "legionio"` clears explicit provider/model/instance/tier routing and sends the request through the router chain using the configured default intent.
24
+ - Routing: default tier priority now includes `direct` between `local` and `fleet`, and discovery-generated rule scores honor `routing.tier_priority`.
25
+
26
+ ### Fixed
27
+ - Prompt dispatch: provider-inferable model-only calls such as `gpt-5.4` infer the provider instead of pairing the model with `llm.default_provider`.
28
+ - Executor: provider-tier lookup failures are logged and return nil instead of silently defaulting to `:cloud`.
29
+ - LexLLMAdapter: optional content-block accessor fallbacks now capture and debug-log probe errors instead of bare-rescuing them.
30
+ - Auto routing: unresolved `legionio` requests now raise a clear provider error instead of falling back to configured defaults.
31
+ - Routing: model-only requests stay on provider inference while explicit provider/instance/tier requests still get registry defaults without requiring rule routing.
32
+
33
+ ## [0.9.27] - 2026-05-15
34
+
35
+ ### Fixed
36
+ - Router/Executor: provider-scoped instance resolution no longer applies a global `llm.default_instance` to models inferred for another provider; invalid explicit instances now fall back to that provider's registered default instance instead of dispatching to an unregistered `provider/instance` pair.
37
+
38
+ ## [0.9.26] - 2026-05-15
39
+
40
+ ### Fixed
41
+ - Discovery: `detect_embedding_from_registry` no longer sets `@can_embed = true` when no model is resolvable — adds `first_embedding_model_for(provider, instance)` as a third fallback scanning the discovered model catalog; returns `false` (allowing legacy probe to run) when all three sources yield nothing (#121)
42
+ - RagContext: `positive_integer` no longer raises `TypeError` when `value` is nil or an empty string — adds empty-string guard before `Kernel#Integer()` call so GAIA advisory `context_window: nil` does not abort the inference pipeline (#122)
43
+ - LexLLMAdapter: `text_part_content` now handles Anthropic-style `[{type:"text", text:"…"}]` content block arrays — flattens them to plain text instead of calling `.to_s` on the array, preventing Ruby array literals from leaking into provider prompts (#123)
44
+ - Embeddings/Discovery: `embedding_config_value` and `embedding_settings` now accept the deprecated plural `"embeddings"` key alongside the canonical singular `"embedding"` key, emitting a deprecation warning; fixes silent misconfiguration when users follow doc examples that used the plural spelling (#124)
45
+
46
+ ## [0.9.25] - 2026-05-14
47
+
48
+ ### Added
49
+ - Router: `TIER_RANK` constant — ordered quality ranking of tiers (local → direct → fleet → openai_compat → cloud → frontier)
50
+ - Router: `explicit_resolution` promoted to public — callable directly from executor without `send`
51
+ - Router: `chain_from_defaults` appends all registered fallback providers after the primary so the chain has real alternatives to escalate to (previously single-entry when a default provider was configured)
52
+ - Executor: `run_escalation_resolution` extracted from escalation loop — encapsulates per-attempt dispatch, error rescue, and `tried[]` tracking
53
+ - Executor: `skip_same_tier!` — on `ContextOverflow`, immediately skips all remaining same-tier candidates and routes to a higher-tier provider with a larger context window
54
+ - Executor: lateral vs. escalation move classification in per-attempt log line (`move=lateral` for same-tier, `move=escalation` for higher-tier)
55
+
56
+ ### Fixed
57
+ - Router: `explicit_resolution` handles nil `provider` and nil `tier` without raising `NoMethodError`
58
+ - Executor: `build_fallback_resolutions` sorts lateral alternatives (same-tier) before escalation candidates (higher-tier) — tries other instances at the same tier before promoting to a more expensive one
59
+ - Executor: deduplication in escalation loop is fully safe — `tried` entry is recorded on all rescue paths and on quality failure
60
+ - EscalationChain: `padded_resolutions` no longer pads the list by repeating the last resolution — only real distinct options are tried
61
+
62
+ ## [0.9.24] - 2026-05-14
63
+
64
+ ### Fixed
65
+ - API: `instance` from POST body was silently dropped — never forwarded into routing hash
66
+ - Executor: Gaia advisory tier assignment no longer overrides explicit `provider`+`instance` from caller
67
+ - Executor: `instance` now passed through `routing_resolution_for` to `Router.resolve`/`resolve_chain`
68
+ - Executor: `build_default_escalation_chain` now passes resolved provider/instance/model — previously ignored them and built a full auto chain, routing to vllm/fleet instead of the requested provider
69
+ - Router: `resolve`/`resolve_chain` accept `instance:` param; short-circuit to `explicit_resolution` when `provider` or `instance` is set (not just `tier`)
70
+ - Router: `explicit_resolution` honors caller-supplied instance instead of always pulling from registry; infers tier from `PROVIDER_TIER` when not explicitly given
71
+
3
72
  ## [0.9.23] - 2026-05-13
4
73
 
5
74
  ### Added
data/CLAUDE.md CHANGED
@@ -745,6 +745,26 @@ These rules are enforced across all legion-llm code. Violations will be caught i
745
745
  - **Advanced signals**: Budget tracking, GPU utilization monitoring, per-tenant spend limits
746
746
  - **Fleet auto-scaling**: Dynamic worker pool sizing based on queue depth and latency
747
747
 
748
+ ## Provider Registration & Model Resolution
749
+
750
+ - `discover_instances` in each lex-llm-* must include `default_model` in returned config — it flows to registry metadata via `instance_metadata` in `call/providers.rb`
751
+ - Router resolves models via: `registry_entry_for_provider(provider)` → `registry_default_model(entry)` → `metadata[:default_model]`
752
+ - `enabled: false` on an instance config prevents registration — checked in `register_provider_instance`
753
+ - `PROVIDER_DEFAULT_MODEL` does NOT belong in legion-llm — each provider owns its default in its own extension
754
+ - Inventory calls `native_provider_offerings` (full metadata) and excludes `discovery_offerings` for providers with native adapters
755
+
756
+ ## Metering Spool
757
+
758
+ - Events spool to `~/.legionio/data/spool/metering/events.jsonl` when AMQP transport is unavailable
759
+ - Thread-safe (SPOOL_MUTEX), capped at `settings[:metering][:spool][:max_events]` (default 10K)
760
+ - `flush_spool` publishes spooled events when transport reconnects; `lex-llm-ledger` actor triggers it
761
+
762
+ ## Health Tracker
763
+
764
+ - `deny_model(provider:, model:, instance:)` — permanently excludes a model from routing (in-memory, until restart)
765
+ - Config errors (ValidationException, AccessDenied, marketplace) trigger deny instead of circuit breaker
766
+ - Discovery connection failures report `:error` to health tracker — circuit opens after threshold
767
+
748
768
  ---
749
769
 
750
770
  **Maintained By**: Matthew Iverson (@Esity)
@@ -108,7 +108,7 @@ module Legion
108
108
  id: request_id,
109
109
  messages: messages,
110
110
  system: body[:system],
111
- routing: { provider: provider, model: model },
111
+ routing: { provider: provider, model: model, instance: body[:instance] }.compact,
112
112
  tools: tool_declarations,
113
113
  caller: effective_caller,
114
114
  conversation_id: conversation_id,
@@ -9,6 +9,11 @@ module Legion
9
9
  module Models
10
10
  extend Legion::Logging::Helper
11
11
 
12
+ AUTO_ROUTING_MODEL_ID = 'legionio'
13
+ AUTO_ROUTING_MODEL_DISPLAY = 'LegionIO'
14
+ AUTO_ROUTING_OFFERING_ID = 'legionio:auto:inference:legionio'
15
+ AUTO_ROUTING_CAPABILITIES = %w[auto_routing chat completion json_schema tools].freeze
16
+
12
17
  def self.registered(app)
13
18
  log.debug('[llm][api][models] registering model inventory routes')
14
19
 
@@ -18,6 +23,7 @@ module Legion
18
23
 
19
24
  filters = Legion::LLM::API::Native::Models.request_filters(params)
20
25
  offerings = Legion::LLM::Inventory.offerings(filters)
26
+ offerings = Legion::LLM::API::Native::Models.with_auto_routing_offering(offerings, filters)
21
27
 
22
28
  json_response({
23
29
  models: Legion::LLM::API::Native::Models.model_summaries(offerings),
@@ -34,7 +40,9 @@ module Legion
34
40
  log.debug("[llm][api][models] action=get_model id=#{model_id}")
35
41
  require_llm!
36
42
 
37
- offerings = Legion::LLM::Inventory.offerings(model: model_id)
43
+ filters = { model: model_id }
44
+ offerings = Legion::LLM::Inventory.offerings(filters)
45
+ offerings = Legion::LLM::API::Native::Models.with_auto_routing_offering(offerings, filters)
38
46
  halt json_error('model_not_found', "Model '#{model_id}' not found", status_code: 404) unless offerings.any?
39
47
 
40
48
  json_response({
@@ -84,11 +92,11 @@ module Legion
84
92
  summaries = offerings.group_by { |offering| offering[:model] }.map do |model, rows|
85
93
  summarize_model(model, rows)
86
94
  end
87
- summaries.sort_by { |model| model[:id] }
95
+ summaries.sort_by { |model| [auto_routing_model?(model[:id]) ? 0 : 1, model[:id]] }
88
96
  end
89
97
 
90
98
  def self.summarize_model(model, offerings)
91
- {
99
+ summary = {
92
100
  id: model.to_s,
93
101
  types: offerings.map { |offering| offering[:type].to_s }.uniq.sort,
94
102
  providers: offerings.map { |offering| offering[:provider_family] }.uniq.sort,
@@ -99,6 +107,12 @@ module Legion
99
107
  max_context: offerings.filter_map { |offering| offering.dig(:limits, :context_window) }.max,
100
108
  enabled: offerings.any? { |offering| offering[:enabled] != false }
101
109
  }
110
+ if auto_routing_model?(model)
111
+ summary[:display_name] = AUTO_ROUTING_MODEL_DISPLAY
112
+ summary[:auto_route] = true
113
+ summary[:default] = true
114
+ end
115
+ summary
102
116
  end
103
117
 
104
118
  def self.summary(offerings)
@@ -110,6 +124,64 @@ module Legion
110
124
  .transform_values(&:size)
111
125
  }
112
126
  end
127
+
128
+ def self.with_auto_routing_offering(offerings, filters = {})
129
+ return offerings unless auto_routing_offering_matches?(filters)
130
+ return offerings if offerings.any? { |offering| auto_routing_model?(offering[:model]) }
131
+
132
+ [auto_routing_offering, *offerings]
133
+ end
134
+
135
+ def self.auto_routing_offering
136
+ {
137
+ id: AUTO_ROUTING_OFFERING_ID,
138
+ offering_id: AUTO_ROUTING_OFFERING_ID,
139
+ model: AUTO_ROUTING_MODEL_ID,
140
+ display_name: AUTO_ROUTING_MODEL_DISPLAY,
141
+ model_family: 'legionio',
142
+ canonical_model_alias: AUTO_ROUTING_MODEL_ID,
143
+ type: :inference,
144
+ provider_family: 'legionio',
145
+ provider_instance: 'auto',
146
+ instance_id: 'auto',
147
+ tier: :auto,
148
+ transport: :internal,
149
+ enabled: true,
150
+ capabilities: AUTO_ROUTING_CAPABILITIES,
151
+ limits: {},
152
+ health: { circuit_state: 'available' },
153
+ metadata: { auto_route: true, placeholder: true, display_name: AUTO_ROUTING_MODEL_DISPLAY },
154
+ routing_metadata: { strategy: 'auto' },
155
+ source: 'static'
156
+ }
157
+ end
158
+
159
+ def self.auto_routing_offering_matches?(filters)
160
+ normalized = request_filters(filters)
161
+ type = normalized[:type]
162
+ return false if type && !type.to_s.empty? && type.to_s != 'inference' && type.to_s != 'chat'
163
+
164
+ provider = normalized[:provider]
165
+ return false if provider && !provider.to_s.empty? && !%w[legionio auto].include?(provider.to_s.downcase)
166
+
167
+ instance = normalized[:instance_id]
168
+ return false if instance && !instance.to_s.empty? && !%w[auto legionio].include?(instance.to_s.downcase)
169
+
170
+ model = normalized[:model] || normalized[:offering_id]
171
+ return false if model && !model.to_s.empty? && !auto_routing_model?(model) && model.to_s != AUTO_ROUTING_OFFERING_ID
172
+
173
+ family = normalized[:model_family]
174
+ return false if family && !family.to_s.empty? && family.to_s.downcase != 'legionio'
175
+
176
+ capability = normalized[:capability]
177
+ return false if capability && !AUTO_ROUTING_CAPABILITIES.include?(capability.to_s)
178
+
179
+ true
180
+ end
181
+
182
+ def self.auto_routing_model?(model)
183
+ model.to_s.strip.downcase == AUTO_ROUTING_MODEL_ID
184
+ end
113
185
  end
114
186
  end
115
187
  end
@@ -160,8 +160,12 @@ module Legion
160
160
  end
161
161
 
162
162
  def self.tier_priority
163
+ return Legion::LLM::Router.tier_priority if defined?(Legion::LLM::Router)
164
+
163
165
  routing_config = Legion::LLM::Settings.value(:routing) || {}
164
- Array(routing_config[:tier_priority] || %w[local fleet openai_compat cloud frontier])
166
+ top_level = Legion::LLM::Settings.value(:tier_order, default: nil)
167
+ Array(top_level || routing_config[:tier_order] || routing_config[:tier_priority] ||
168
+ %w[local direct fleet openai_compat cloud frontier])
165
169
  end
166
170
 
167
171
  def self.privacy_mode?
@@ -232,7 +236,8 @@ module Legion
232
236
  return 'unknown' unless tracker
233
237
 
234
238
  tracker.circuit_state(provider_name.to_sym, instance: instance_name.to_sym).to_s
235
- rescue StandardError
239
+ rescue StandardError => e
240
+ log.debug "[llm][tiers] action=offering_instance_health provider=#{provider_name} instance=#{instance_name} error=#{e.class} — #{e.message}"
236
241
  'unknown'
237
242
  end
238
243
  end
@@ -122,12 +122,21 @@ module Legion
122
122
 
123
123
  def resolve_provider
124
124
  LLM.embedding_provider ||
125
- Legion::LLM::Settings.value(:embedding, :provider)&.to_sym
125
+ embedding_config_value(:provider)&.to_sym
126
126
  end
127
127
 
128
128
  def resolve_model
129
129
  LLM.embedding_model ||
130
- Legion::LLM::Settings.value(:embedding, :default_model)
130
+ embedding_config_value(:default_model)
131
+ end
132
+
133
+ def embedding_config_value(key)
134
+ v = Legion::LLM::Settings.value(:embedding, key)
135
+ return v unless v.nil?
136
+
137
+ plural = Legion::LLM::Settings.value(:embeddings, key)
138
+ log.warn "[llm][embeddings] settings key \"embeddings.#{key}\" (plural) is deprecated — rename to \"embedding.#{key}\"" unless plural.nil?
139
+ plural
131
140
  end
132
141
 
133
142
  def coerce_text(value)
@@ -239,12 +239,49 @@ module Legion
239
239
  end
240
240
 
241
241
  def text_part_content(part)
242
- return unless part.respond_to?(:transform_keys)
242
+ return part if part.is_a?(String)
243
243
 
244
- normalized = part.transform_keys { |key| key.respond_to?(:to_sym) ? key.to_sym : key }
245
- return unless normalized[:type].to_s == 'text'
244
+ if part.respond_to?(:transform_keys)
245
+ normalized = part.transform_keys { |key| key.respond_to?(:to_sym) ? key.to_sym : key }
246
+ return unless normalized[:type].to_s == 'text'
246
247
 
247
- normalized[:text].to_s
248
+ return normalized[:text].to_s
249
+ end
250
+
251
+ # Data structs expose named readers (type/text) without necessarily implementing [].
252
+ # Try named accessor path first; fall through to [] / fetch for plain hashes/structs.
253
+ if part.respond_to?(:type) || part.respond_to?(:text)
254
+ type = (part.respond_to?(:type) ? part.type.to_s : '')
255
+ text = part.respond_to?(:text) ? part.text : nil
256
+ return text.to_s if type == 'text' || (type.empty? && !text.nil?)
257
+
258
+ return nil
259
+ end
260
+
261
+ return unless part.respond_to?(:[]) || part.respond_to?(:fetch)
262
+
263
+ type = (defined_method_access(part, :type) || '').to_s
264
+ text = defined_method_access(part, :text)
265
+ text.to_s if type == 'text' || (type.empty? && !text.nil?)
266
+ end
267
+
268
+ def defined_method_access(obj, key)
269
+ # Prefer named accessor (covers Data structs like Types::ContentBlock).
270
+ key_sym = key.respond_to?(:to_sym) ? key.to_sym : key
271
+ return obj.public_send(key_sym) if obj.respond_to?(key_sym)
272
+
273
+ str_key = key.to_s
274
+ obj[key]
275
+ rescue TypeError, NoMethodError, KeyError => e
276
+ log.debug "[llm][adapter] action=defined_method_access key=#{key} class=#{obj.class} " \
277
+ "fallback=string_key error=#{e.class}: #{e.message}"
278
+ begin
279
+ obj[str_key]
280
+ rescue TypeError, NoMethodError, KeyError => fallback_error
281
+ log.debug "[llm][adapter] action=defined_method_access key=#{key} class=#{obj.class} " \
282
+ "fallback=none error=#{fallback_error.class}: #{fallback_error.message}"
283
+ nil
284
+ end
248
285
  end
249
286
 
250
287
  def normalize_message_tool_calls(tool_calls)
@@ -52,7 +52,8 @@ module Legion
52
52
  Legion::Extensions::Llm.constants(false).filter_map do |const_name|
53
53
  mod = Legion::Extensions::Llm.const_get(const_name, false)
54
54
  provider_module?(mod) ? mod : nil
55
- rescue NameError
55
+ rescue NameError => e
56
+ log.debug "[llm][providers] action=discover_provider_modules const=#{const_name} error=#{e.class} — #{e.message}"
56
57
  nil
57
58
  end
58
59
  end
@@ -120,7 +121,8 @@ module Legion
120
121
  return nil unless provider_module&.const_defined?(:PROVIDER_FAMILY, false)
121
122
 
122
123
  provider_module::PROVIDER_FAMILY
123
- rescue StandardError
124
+ rescue StandardError => e
125
+ log.debug "[llm][providers] action=safe_provider_family error=#{e.class} — #{e.message}"
124
126
  nil
125
127
  end
126
128
 
@@ -9,9 +9,15 @@ module Legion
9
9
  class Curator
10
10
  include Legion::Logging::Helper
11
11
 
12
- CURATED_KEY = :__curated__
13
- THINKING_OPEN = '<thinking>'
14
- THINKING_CLOSE = '</thinking>'
12
+ CURATED_KEY = :__curated__
13
+
14
+ # All known provider thinking tag variants.
15
+ # Anthropic: <thinking>…</thinking>
16
+ # DeepSeek / Qwen / Ollama / vLLM inline: <think>…</think>
17
+ THINKING_TAG_PAIRS = [
18
+ ['<thinking>', '</thinking>'],
19
+ ['<think>', '</think>']
20
+ ].freeze
15
21
 
16
22
  def initialize(conversation_id:)
17
23
  @conversation_id = conversation_id
@@ -76,6 +82,8 @@ module Legion
76
82
  return msg if content.length <= max_chars
77
83
 
78
84
  summary = heuristic_tool_summary(content, tool_name_from(msg))
85
+ log.debug "[llm][curator] action=distill_tool_result conversation_id=#{@conversation_id} " \
86
+ "original_chars=#{content.length} summary_chars=#{summary.length}"
79
87
  msg.merge(content: summary, curated: true, original_content: content)
80
88
  end
81
89
 
@@ -89,6 +97,8 @@ module Legion
89
97
 
90
98
  return msg if stripped == content || stripped.empty?
91
99
 
100
+ log.debug "[llm][curator] action=strip_thinking conversation_id=#{@conversation_id} " \
101
+ "original_chars=#{content.length} stripped_chars=#{stripped.length}"
92
102
  msg.merge(content: stripped, curated: true, original_content: content)
93
103
  end
94
104
 
@@ -192,18 +202,27 @@ module Legion
192
202
  end
193
203
 
194
204
  def strip_thinking_tags(text)
195
- result = +''
205
+ result = text
206
+ THINKING_TAG_PAIRS.each do |open_tag, close_tag|
207
+ result = strip_tag_pair(result, open_tag, close_tag)
208
+ end
209
+ result
210
+ end
211
+
212
+ def strip_tag_pair(text, open_tag, close_tag)
213
+ out = +''
196
214
  pos = 0
197
215
  while pos < text.length
198
- open_idx = text.index(THINKING_OPEN, pos)
216
+ open_idx = text.index(open_tag, pos)
199
217
  break unless open_idx
200
218
 
201
- result << text[pos...open_idx]
202
- close_idx = text.index(THINKING_CLOSE, open_idx + THINKING_OPEN.length)
203
- pos = close_idx ? close_idx + THINKING_CLOSE.length : text.length
219
+ out << text[pos...open_idx]
220
+ close_idx = text.index(close_tag, open_idx + open_tag.length)
221
+ pos = close_idx ? close_idx + close_tag.length : text.length
204
222
  end
205
- result << text[pos..] if pos < text.length
206
- result
223
+ out << text[pos..] if pos < text.length
224
+ # Strip any unclosed open tag left at the end (provider died mid-stream).
225
+ out.sub(/#{Regexp.escape(open_tag)}.*\z/m, '').strip
207
226
  end
208
227
 
209
228
  def curate_message(msg, assistant_response)
@@ -427,7 +446,8 @@ module Legion
427
446
  def curated_payload(entry)
428
447
  parsed = Legion::JSON.parse(entry[:content].to_s)
429
448
  parsed.is_a?(Hash) ? parsed : {}
430
- rescue Legion::JSON::ParseError
449
+ rescue Legion::JSON::ParseError => e
450
+ log.debug "[llm][curator] action=curated_payload conversation_id=#{@conversation_id} error=#{e.class} — #{e.message}"
431
451
  {}
432
452
  end
433
453
 
@@ -26,7 +26,15 @@ module Legion
26
26
  anthropic: :frontier
27
27
  }.freeze
28
28
 
29
- TIER_WEIGHT = { local: 100, fleet: 80, cloud: 60, frontier: 40 }.freeze
29
+ DEFAULT_TIER_PRIORITY = %i[local direct fleet openai_compat cloud frontier].freeze
30
+ CAPABILITY_ALIASES = {
31
+ function_calling: :tools,
32
+ functions: :tools,
33
+ tool: :tools,
34
+ tool_use: :tools,
35
+ stream: :streaming,
36
+ stream_chat: :streaming
37
+ }.freeze
30
38
 
31
39
  module_function
32
40
 
@@ -50,9 +58,12 @@ module Legion
50
58
  extract_field(model_data, 'tier')&.to_sym ||
51
59
  tier
52
60
  capability = embedding_model?(model_data) ? :embed : :chat
53
- priority = (TIER_WEIGHT[model_tier] || 80) - order
61
+ priority = tier_weight(model_tier) - order
54
62
  rules << build_rule(provider, instance_id, model_data, capability, model_tier, priority)
55
- rules << build_rule(provider, instance_id, model_data, :stream, model_tier, priority) if capability == :chat
63
+ if capability == :chat
64
+ rules << build_rule(provider, instance_id, model_data, :stream, model_tier, priority) if supports_streaming?(model_data)
65
+ rules << build_rule(provider, instance_id, model_data, :tools, model_tier, priority) if supports_tools?(model_data)
66
+ end
56
67
  order += 1
57
68
  end
58
69
  end
@@ -91,7 +102,7 @@ module Legion
91
102
  next unless default_model
92
103
 
93
104
  model_data = { name: default_model }
94
- priority = TIER_WEIGHT[tier] || 40
105
+ priority = tier_weight(tier)
95
106
  rules << build_rule(provider_name, :default, model_data, :chat, tier, priority)
96
107
  rules << build_rule(provider_name, :default, model_data, :stream, tier, priority)
97
108
  end
@@ -125,17 +136,71 @@ module Legion
125
136
  return nil unless model_data.is_a?(Hash)
126
137
 
127
138
  caps = model_data[:capabilities] || model_data['capabilities']
128
- return caps if caps.is_a?(Array) && caps.any?
139
+ normalized = normalize_capabilities(caps)
140
+ return normalized if normalized.any?
129
141
 
130
142
  nil
131
143
  end
132
144
 
145
+ def supports_streaming?(model_data)
146
+ capabilities = extract_capabilities(model_data)
147
+ return true if capabilities.nil?
148
+
149
+ capabilities.include?(:streaming)
150
+ end
151
+
152
+ def supports_tools?(model_data)
153
+ capabilities = extract_capabilities(model_data)
154
+ return false if capabilities.nil?
155
+
156
+ capabilities.include?(:tools)
157
+ end
158
+
159
+ def normalize_capabilities(capabilities)
160
+ Array(capabilities).compact.each_with_object([]) do |capability, normalized|
161
+ next unless capability.respond_to?(:to_s)
162
+
163
+ capability_sym = capability.to_s.downcase.strip.to_sym
164
+ next if capability_sym.to_s.empty?
165
+
166
+ normalized << capability_sym
167
+ alias_sym = CAPABILITY_ALIASES[capability_sym]
168
+ normalized << alias_sym if alias_sym
169
+ end.uniq
170
+ end
171
+
133
172
  def extract_field(model_data, field)
134
173
  return nil unless model_data.is_a?(Hash)
135
174
 
136
175
  model_data[field] || model_data[field.to_s]
137
176
  end
138
177
 
178
+ def tier_weight(tier)
179
+ tier_sym = tier.respond_to?(:to_sym) ? tier.to_sym : tier
180
+ index = tier_priority.index(tier_sym)
181
+ return 0 unless index
182
+
183
+ (tier_priority.length - index) * 100
184
+ end
185
+
186
+ def tier_priority
187
+ configured = Legion::LLM::Settings.value(:tier_order, default: nil)
188
+ configured = Legion::LLM::Settings.value(:routing, :tier_order, default: nil) if blank_array?(configured)
189
+ configured = Legion::LLM::Settings.value(:routing, :tier_priority, default: DEFAULT_TIER_PRIORITY) if blank_array?(configured)
190
+ normalized = Array(configured).filter_map do |tier|
191
+ tier.to_sym if tier.respond_to?(:to_sym)
192
+ end
193
+ normalized = DEFAULT_TIER_PRIORITY if normalized.empty?
194
+ (normalized + DEFAULT_TIER_PRIORITY).uniq
195
+ rescue StandardError => e
196
+ handle_exception(e, level: :warn, handled: true, operation: 'rule_generator.tier_priority')
197
+ DEFAULT_TIER_PRIORITY
198
+ end
199
+
200
+ def blank_array?(value)
201
+ Array(value).empty?
202
+ end
203
+
139
204
  def extension_providers
140
205
  ext = Legion::Settings[:extensions]
141
206
  return ext[:llm] if ext.is_a?(Hash) && ext[:llm].is_a?(Hash)
@@ -254,11 +254,22 @@ module Legion
254
254
  end
255
255
  return false unless best
256
256
 
257
- @embedding_provider = best[:provider]
258
- @embedding_model = best.dig(:metadata, :default_model) ||
259
- Settings.value(:embedding, :default_model)
260
- @embedding_instance = best[:instance]
261
- @can_embed = true
257
+ provider = best[:provider]
258
+ instance = best[:instance]
259
+ resolved = best.dig(:metadata, :default_model) ||
260
+ embedding_settings[:default_model] ||
261
+ first_embedding_model_for(provider, instance)
262
+
263
+ unless resolved.to_s.length.positive?
264
+ log.debug '[llm][discovery] action=detect_embedding_from_registry no_model_resolved ' \
265
+ "provider=#{provider} instance=#{instance} — falling through to legacy probe"
266
+ return false
267
+ end
268
+
269
+ @embedding_provider = provider
270
+ @embedding_model = resolved
271
+ @embedding_instance = instance
272
+ @can_embed = true
262
273
  @embedding_fallback_chain = build_registry_embedding_fallback(embedding_instances)
263
274
 
264
275
  log.info "[llm][discovery] embedding available provider=#{@embedding_provider} " \
@@ -280,6 +291,14 @@ module Legion
280
291
  end
281
292
  end
282
293
 
294
+ def first_embedding_model_for(provider, instance)
295
+ embedding_caps = %w[embedding embeddings embed].freeze
296
+ cached_discovered_models.find do |m|
297
+ m[:provider].to_s == provider.to_s && m[:instance].to_s == instance.to_s &&
298
+ Array(m[:capabilities]).any? { |c| embedding_caps.include?(c.to_s) }
299
+ end&.dig(:model)
300
+ end
301
+
283
302
  def find_embedding_provider(embedding_settings)
284
303
  fallback = Legion::LLM::Settings.config_value(embedding_settings, :provider_fallback, %w[ollama bedrock openai])
285
304
  provider_models = Legion::LLM::Settings.config_value(embedding_settings, :provider_models, {})
@@ -396,7 +415,17 @@ module Legion
396
415
  end
397
416
 
398
417
  def embedding_settings
399
- Legion::LLM::Settings.config_value(llm_settings, :embedding, {})
418
+ settings = llm_settings
419
+ result = Legion::LLM::Settings.config_value(settings, :embedding)
420
+ return result if result.is_a?(Hash) && !result.empty?
421
+
422
+ plural = Legion::LLM::Settings.config_value(settings, :embeddings)
423
+ if plural.is_a?(Hash) && !plural.empty?
424
+ log.warn '[llm][discovery] settings key "embeddings" (plural) is deprecated — rename to "embedding" (singular)'
425
+ return plural
426
+ end
427
+
428
+ result || {}
400
429
  end
401
430
 
402
431
  def providers_settings