legion-llm 0.9.23 → 0.9.28
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +55 -0
- data/CLAUDE.md +20 -0
- data/lib/legion/llm/api/native/inference.rb +1 -1
- data/lib/legion/llm/api/native/models.rb +75 -3
- data/lib/legion/llm/api/native/tiers.rb +2 -1
- data/lib/legion/llm/call/embeddings.rb +11 -2
- data/lib/legion/llm/call/lex_llm_adapter.rb +41 -4
- data/lib/legion/llm/call/providers.rb +4 -2
- data/lib/legion/llm/context/curator.rb +31 -11
- data/lib/legion/llm/discovery/rule_generator.rb +23 -3
- data/lib/legion/llm/discovery.rb +35 -6
- data/lib/legion/llm/inference/executor.rb +187 -38
- data/lib/legion/llm/inference/prompt.rb +30 -19
- data/lib/legion/llm/inference/request.rb +45 -2
- data/lib/legion/llm/inference/steps/rag_context.rb +1 -0
- data/lib/legion/llm/router/escalation/chain.rb +1 -3
- data/lib/legion/llm/router.rb +60 -37
- data/lib/legion/llm/settings.rb +1 -1
- data/lib/legion/llm/version.rb +1 -1
- metadata +1 -1
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: 178958a3403cbac0fad20d83f2726914d420137db2a1c340c33c4c7305457fcd
|
|
4
|
+
data.tar.gz: df951b9e05e0a0bfaff3701b6a3c5bd8452edea2298fe91e6a98165ce96961d1
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: 7ff1622a50cdafc4d09577e8dc5f9e90632f7f93e528f41c1b630ce5daeeeceab6c91cbf5d8ec92183e0b6f27c17a62d09934b6bb0a413da9577cea5a47c942f
|
|
7
|
+
data.tar.gz: 49651eb56bbe046223674a626ce4339363ba10550793c1d95ca46d4bf2c4b7f2eb430397b7f0d91368791cecb683ae2919f8627edb45d88d5c847f4d0cb4ee12
|
data/CHANGELOG.md
CHANGED
|
@@ -1,5 +1,60 @@
|
|
|
1
1
|
# Legion LLM Changelog
|
|
2
2
|
|
|
3
|
+
## [0.9.28] - 2026-05-15
|
|
4
|
+
|
|
5
|
+
### Added
|
|
6
|
+
- API: `/api/llm/models` now surfaces a static `LegionIO` model (`id: legionio`) as the default auto-routing placeholder.
|
|
7
|
+
|
|
8
|
+
### Changed
|
|
9
|
+
- Routing: `model: "legionio"` clears explicit provider/model/instance/tier routing and sends the request through the router chain using the configured default intent.
|
|
10
|
+
- Routing: default tier priority now includes `direct` between `local` and `fleet`, and discovery-generated rule scores honor `routing.tier_priority`.
|
|
11
|
+
|
|
12
|
+
### Fixed
|
|
13
|
+
- Prompt dispatch: provider-inferable model-only calls such as `gpt-5.4` infer the provider instead of pairing the model with `llm.default_provider`.
|
|
14
|
+
- Executor: provider-tier lookup failures are logged and return nil instead of silently defaulting to `:cloud`.
|
|
15
|
+
- LexLLMAdapter: optional content-block accessor fallbacks now capture and debug-log probe errors instead of bare-rescuing them.
|
|
16
|
+
- Auto routing: unresolved `legionio` requests now raise a clear provider error instead of falling back to configured defaults.
|
|
17
|
+
- Routing: model-only requests stay on provider inference while explicit provider/instance/tier requests still get registry defaults without requiring rule routing.
|
|
18
|
+
|
|
19
|
+
## [0.9.27] - 2026-05-15
|
|
20
|
+
|
|
21
|
+
### Fixed
|
|
22
|
+
- Router/Executor: provider-scoped instance resolution no longer applies a global `llm.default_instance` to models inferred for another provider; invalid explicit instances now fall back to that provider's registered default instance instead of dispatching to an unregistered `provider/instance` pair.
|
|
23
|
+
|
|
24
|
+
## [0.9.26] - 2026-05-15
|
|
25
|
+
|
|
26
|
+
### Fixed
|
|
27
|
+
- Discovery: `detect_embedding_from_registry` no longer sets `@can_embed = true` when no model is resolvable — adds `first_embedding_model_for(provider, instance)` as a third fallback scanning the discovered model catalog; returns `false` (allowing legacy probe to run) when all three sources yield nothing (#121)
|
|
28
|
+
- RagContext: `positive_integer` no longer raises `TypeError` when `value` is nil or an empty string — adds empty-string guard before `Kernel#Integer()` call so GAIA advisory `context_window: nil` does not abort the inference pipeline (#122)
|
|
29
|
+
- LexLLMAdapter: `text_part_content` now handles Anthropic-style `[{type:"text", text:"…"}]` content block arrays — flattens them to plain text instead of calling `.to_s` on the array, preventing Ruby array literals from leaking into provider prompts (#123)
|
|
30
|
+
- Embeddings/Discovery: `embedding_config_value` and `embedding_settings` now accept the deprecated plural `"embeddings"` key alongside the canonical singular `"embedding"` key, emitting a deprecation warning; fixes silent misconfiguration when users follow doc examples that used the plural spelling (#124)
|
|
31
|
+
|
|
32
|
+
## [0.9.25] - 2026-05-14
|
|
33
|
+
|
|
34
|
+
### Added
|
|
35
|
+
- Router: `TIER_RANK` constant — ordered quality ranking of tiers (local → direct → fleet → openai_compat → cloud → frontier)
|
|
36
|
+
- Router: `explicit_resolution` promoted to public — callable directly from executor without `send`
|
|
37
|
+
- Router: `chain_from_defaults` appends all registered fallback providers after the primary so the chain has real alternatives to escalate to (previously single-entry when a default provider was configured)
|
|
38
|
+
- Executor: `run_escalation_resolution` extracted from escalation loop — encapsulates per-attempt dispatch, error rescue, and `tried[]` tracking
|
|
39
|
+
- Executor: `skip_same_tier!` — on `ContextOverflow`, immediately skips all remaining same-tier candidates and routes to a higher-tier provider with a larger context window
|
|
40
|
+
- Executor: lateral vs. escalation move classification in per-attempt log line (`move=lateral` for same-tier, `move=escalation` for higher-tier)
|
|
41
|
+
|
|
42
|
+
### Fixed
|
|
43
|
+
- Router: `explicit_resolution` handles nil `provider` and nil `tier` without raising `NoMethodError`
|
|
44
|
+
- Executor: `build_fallback_resolutions` sorts lateral alternatives (same-tier) before escalation candidates (higher-tier) — tries other instances at the same tier before promoting to a more expensive one
|
|
45
|
+
- Executor: deduplication in escalation loop is fully safe — `tried` entry is recorded on all rescue paths and on quality failure
|
|
46
|
+
- EscalationChain: `padded_resolutions` no longer pads the list by repeating the last resolution — only real distinct options are tried
|
|
47
|
+
|
|
48
|
+
## [0.9.24] - 2026-05-14
|
|
49
|
+
|
|
50
|
+
### Fixed
|
|
51
|
+
- API: `instance` from POST body was silently dropped — never forwarded into routing hash
|
|
52
|
+
- Executor: Gaia advisory tier assignment no longer overrides explicit `provider`+`instance` from caller
|
|
53
|
+
- Executor: `instance` now passed through `routing_resolution_for` to `Router.resolve`/`resolve_chain`
|
|
54
|
+
- Executor: `build_default_escalation_chain` now passes resolved provider/instance/model — previously ignored them and built a full auto chain, routing to vllm/fleet instead of the requested provider
|
|
55
|
+
- Router: `resolve`/`resolve_chain` accept `instance:` param; short-circuit to `explicit_resolution` when `provider` or `instance` is set (not just `tier`)
|
|
56
|
+
- Router: `explicit_resolution` honors caller-supplied instance instead of always pulling from registry; infers tier from `PROVIDER_TIER` when not explicitly given
|
|
57
|
+
|
|
3
58
|
## [0.9.23] - 2026-05-13
|
|
4
59
|
|
|
5
60
|
### Added
|
data/CLAUDE.md
CHANGED
|
@@ -745,6 +745,26 @@ These rules are enforced across all legion-llm code. Violations will be caught i
|
|
|
745
745
|
- **Advanced signals**: Budget tracking, GPU utilization monitoring, per-tenant spend limits
|
|
746
746
|
- **Fleet auto-scaling**: Dynamic worker pool sizing based on queue depth and latency
|
|
747
747
|
|
|
748
|
+
## Provider Registration & Model Resolution
|
|
749
|
+
|
|
750
|
+
- `discover_instances` in each lex-llm-* must include `default_model` in returned config — it flows to registry metadata via `instance_metadata` in `call/providers.rb`
|
|
751
|
+
- Router resolves models via: `registry_entry_for_provider(provider)` → `registry_default_model(entry)` → `metadata[:default_model]`
|
|
752
|
+
- `enabled: false` on an instance config prevents registration — checked in `register_provider_instance`
|
|
753
|
+
- `PROVIDER_DEFAULT_MODEL` does NOT belong in legion-llm — each provider owns its default in its own extension
|
|
754
|
+
- Inventory calls `native_provider_offerings` (full metadata) and excludes `discovery_offerings` for providers with native adapters
|
|
755
|
+
|
|
756
|
+
## Metering Spool
|
|
757
|
+
|
|
758
|
+
- Events spool to `~/.legionio/data/spool/metering/events.jsonl` when AMQP transport is unavailable
|
|
759
|
+
- Thread-safe (SPOOL_MUTEX), capped at `settings[:metering][:spool][:max_events]` (default 10K)
|
|
760
|
+
- `flush_spool` publishes spooled events when transport reconnects; `lex-llm-ledger` actor triggers it
|
|
761
|
+
|
|
762
|
+
## Health Tracker
|
|
763
|
+
|
|
764
|
+
- `deny_model(provider:, model:, instance:)` — permanently excludes a model from routing (in-memory, until restart)
|
|
765
|
+
- Config errors (ValidationException, AccessDenied, marketplace) trigger deny instead of circuit breaker
|
|
766
|
+
- Discovery connection failures report `:error` to health tracker — circuit opens after threshold
|
|
767
|
+
|
|
748
768
|
---
|
|
749
769
|
|
|
750
770
|
**Maintained By**: Matthew Iverson (@Esity)
|
|
@@ -108,7 +108,7 @@ module Legion
|
|
|
108
108
|
id: request_id,
|
|
109
109
|
messages: messages,
|
|
110
110
|
system: body[:system],
|
|
111
|
-
routing: { provider: provider, model: model },
|
|
111
|
+
routing: { provider: provider, model: model, instance: body[:instance] }.compact,
|
|
112
112
|
tools: tool_declarations,
|
|
113
113
|
caller: effective_caller,
|
|
114
114
|
conversation_id: conversation_id,
|
|
@@ -9,6 +9,11 @@ module Legion
|
|
|
9
9
|
module Models
|
|
10
10
|
extend Legion::Logging::Helper
|
|
11
11
|
|
|
12
|
+
AUTO_ROUTING_MODEL_ID = 'legionio'
|
|
13
|
+
AUTO_ROUTING_MODEL_DISPLAY = 'LegionIO'
|
|
14
|
+
AUTO_ROUTING_OFFERING_ID = 'legionio:auto:inference:legionio'
|
|
15
|
+
AUTO_ROUTING_CAPABILITIES = %w[auto_routing chat completion json_schema tools].freeze
|
|
16
|
+
|
|
12
17
|
def self.registered(app)
|
|
13
18
|
log.debug('[llm][api][models] registering model inventory routes')
|
|
14
19
|
|
|
@@ -18,6 +23,7 @@ module Legion
|
|
|
18
23
|
|
|
19
24
|
filters = Legion::LLM::API::Native::Models.request_filters(params)
|
|
20
25
|
offerings = Legion::LLM::Inventory.offerings(filters)
|
|
26
|
+
offerings = Legion::LLM::API::Native::Models.with_auto_routing_offering(offerings, filters)
|
|
21
27
|
|
|
22
28
|
json_response({
|
|
23
29
|
models: Legion::LLM::API::Native::Models.model_summaries(offerings),
|
|
@@ -34,7 +40,9 @@ module Legion
|
|
|
34
40
|
log.debug("[llm][api][models] action=get_model id=#{model_id}")
|
|
35
41
|
require_llm!
|
|
36
42
|
|
|
37
|
-
|
|
43
|
+
filters = { model: model_id }
|
|
44
|
+
offerings = Legion::LLM::Inventory.offerings(filters)
|
|
45
|
+
offerings = Legion::LLM::API::Native::Models.with_auto_routing_offering(offerings, filters)
|
|
38
46
|
halt json_error('model_not_found', "Model '#{model_id}' not found", status_code: 404) unless offerings.any?
|
|
39
47
|
|
|
40
48
|
json_response({
|
|
@@ -84,11 +92,11 @@ module Legion
|
|
|
84
92
|
summaries = offerings.group_by { |offering| offering[:model] }.map do |model, rows|
|
|
85
93
|
summarize_model(model, rows)
|
|
86
94
|
end
|
|
87
|
-
summaries.sort_by { |model| model[:id] }
|
|
95
|
+
summaries.sort_by { |model| [auto_routing_model?(model[:id]) ? 0 : 1, model[:id]] }
|
|
88
96
|
end
|
|
89
97
|
|
|
90
98
|
def self.summarize_model(model, offerings)
|
|
91
|
-
{
|
|
99
|
+
summary = {
|
|
92
100
|
id: model.to_s,
|
|
93
101
|
types: offerings.map { |offering| offering[:type].to_s }.uniq.sort,
|
|
94
102
|
providers: offerings.map { |offering| offering[:provider_family] }.uniq.sort,
|
|
@@ -99,6 +107,12 @@ module Legion
|
|
|
99
107
|
max_context: offerings.filter_map { |offering| offering.dig(:limits, :context_window) }.max,
|
|
100
108
|
enabled: offerings.any? { |offering| offering[:enabled] != false }
|
|
101
109
|
}
|
|
110
|
+
if auto_routing_model?(model)
|
|
111
|
+
summary[:display_name] = AUTO_ROUTING_MODEL_DISPLAY
|
|
112
|
+
summary[:auto_route] = true
|
|
113
|
+
summary[:default] = true
|
|
114
|
+
end
|
|
115
|
+
summary
|
|
102
116
|
end
|
|
103
117
|
|
|
104
118
|
def self.summary(offerings)
|
|
@@ -110,6 +124,64 @@ module Legion
|
|
|
110
124
|
.transform_values(&:size)
|
|
111
125
|
}
|
|
112
126
|
end
|
|
127
|
+
|
|
128
|
+
def self.with_auto_routing_offering(offerings, filters = {})
|
|
129
|
+
return offerings unless auto_routing_offering_matches?(filters)
|
|
130
|
+
return offerings if offerings.any? { |offering| auto_routing_model?(offering[:model]) }
|
|
131
|
+
|
|
132
|
+
[auto_routing_offering, *offerings]
|
|
133
|
+
end
|
|
134
|
+
|
|
135
|
+
def self.auto_routing_offering
|
|
136
|
+
{
|
|
137
|
+
id: AUTO_ROUTING_OFFERING_ID,
|
|
138
|
+
offering_id: AUTO_ROUTING_OFFERING_ID,
|
|
139
|
+
model: AUTO_ROUTING_MODEL_ID,
|
|
140
|
+
display_name: AUTO_ROUTING_MODEL_DISPLAY,
|
|
141
|
+
model_family: 'legionio',
|
|
142
|
+
canonical_model_alias: AUTO_ROUTING_MODEL_ID,
|
|
143
|
+
type: :inference,
|
|
144
|
+
provider_family: 'legionio',
|
|
145
|
+
provider_instance: 'auto',
|
|
146
|
+
instance_id: 'auto',
|
|
147
|
+
tier: :auto,
|
|
148
|
+
transport: :internal,
|
|
149
|
+
enabled: true,
|
|
150
|
+
capabilities: AUTO_ROUTING_CAPABILITIES,
|
|
151
|
+
limits: {},
|
|
152
|
+
health: { circuit_state: 'available' },
|
|
153
|
+
metadata: { auto_route: true, placeholder: true, display_name: AUTO_ROUTING_MODEL_DISPLAY },
|
|
154
|
+
routing_metadata: { strategy: 'auto' },
|
|
155
|
+
source: 'static'
|
|
156
|
+
}
|
|
157
|
+
end
|
|
158
|
+
|
|
159
|
+
def self.auto_routing_offering_matches?(filters)
|
|
160
|
+
normalized = request_filters(filters)
|
|
161
|
+
type = normalized[:type]
|
|
162
|
+
return false if type && !type.to_s.empty? && type.to_s != 'inference' && type.to_s != 'chat'
|
|
163
|
+
|
|
164
|
+
provider = normalized[:provider]
|
|
165
|
+
return false if provider && !provider.to_s.empty? && !%w[legionio auto].include?(provider.to_s.downcase)
|
|
166
|
+
|
|
167
|
+
instance = normalized[:instance_id]
|
|
168
|
+
return false if instance && !instance.to_s.empty? && !%w[auto legionio].include?(instance.to_s.downcase)
|
|
169
|
+
|
|
170
|
+
model = normalized[:model] || normalized[:offering_id]
|
|
171
|
+
return false if model && !model.to_s.empty? && !auto_routing_model?(model) && model.to_s != AUTO_ROUTING_OFFERING_ID
|
|
172
|
+
|
|
173
|
+
family = normalized[:model_family]
|
|
174
|
+
return false if family && !family.to_s.empty? && family.to_s.downcase != 'legionio'
|
|
175
|
+
|
|
176
|
+
capability = normalized[:capability]
|
|
177
|
+
return false if capability && !AUTO_ROUTING_CAPABILITIES.include?(capability.to_s)
|
|
178
|
+
|
|
179
|
+
true
|
|
180
|
+
end
|
|
181
|
+
|
|
182
|
+
def self.auto_routing_model?(model)
|
|
183
|
+
model.to_s.strip.downcase == AUTO_ROUTING_MODEL_ID
|
|
184
|
+
end
|
|
113
185
|
end
|
|
114
186
|
end
|
|
115
187
|
end
|
|
@@ -232,7 +232,8 @@ module Legion
|
|
|
232
232
|
return 'unknown' unless tracker
|
|
233
233
|
|
|
234
234
|
tracker.circuit_state(provider_name.to_sym, instance: instance_name.to_sym).to_s
|
|
235
|
-
rescue StandardError
|
|
235
|
+
rescue StandardError => e
|
|
236
|
+
log.debug "[llm][tiers] action=offering_instance_health provider=#{provider_name} instance=#{instance_name} error=#{e.class} — #{e.message}"
|
|
236
237
|
'unknown'
|
|
237
238
|
end
|
|
238
239
|
end
|
|
@@ -122,12 +122,21 @@ module Legion
|
|
|
122
122
|
|
|
123
123
|
def resolve_provider
|
|
124
124
|
LLM.embedding_provider ||
|
|
125
|
-
|
|
125
|
+
embedding_config_value(:provider)&.to_sym
|
|
126
126
|
end
|
|
127
127
|
|
|
128
128
|
def resolve_model
|
|
129
129
|
LLM.embedding_model ||
|
|
130
|
-
|
|
130
|
+
embedding_config_value(:default_model)
|
|
131
|
+
end
|
|
132
|
+
|
|
133
|
+
def embedding_config_value(key)
|
|
134
|
+
v = Legion::LLM::Settings.value(:embedding, key)
|
|
135
|
+
return v unless v.nil?
|
|
136
|
+
|
|
137
|
+
plural = Legion::LLM::Settings.value(:embeddings, key)
|
|
138
|
+
log.warn "[llm][embeddings] settings key \"embeddings.#{key}\" (plural) is deprecated — rename to \"embedding.#{key}\"" unless plural.nil?
|
|
139
|
+
plural
|
|
131
140
|
end
|
|
132
141
|
|
|
133
142
|
def coerce_text(value)
|
|
@@ -239,12 +239,49 @@ module Legion
|
|
|
239
239
|
end
|
|
240
240
|
|
|
241
241
|
def text_part_content(part)
|
|
242
|
-
return
|
|
242
|
+
return part if part.is_a?(String)
|
|
243
243
|
|
|
244
|
-
|
|
245
|
-
|
|
244
|
+
if part.respond_to?(:transform_keys)
|
|
245
|
+
normalized = part.transform_keys { |key| key.respond_to?(:to_sym) ? key.to_sym : key }
|
|
246
|
+
return unless normalized[:type].to_s == 'text'
|
|
246
247
|
|
|
247
|
-
|
|
248
|
+
return normalized[:text].to_s
|
|
249
|
+
end
|
|
250
|
+
|
|
251
|
+
# Data structs expose named readers (type/text) without necessarily implementing [].
|
|
252
|
+
# Try named accessor path first; fall through to [] / fetch for plain hashes/structs.
|
|
253
|
+
if part.respond_to?(:type) || part.respond_to?(:text)
|
|
254
|
+
type = (part.respond_to?(:type) ? part.type.to_s : '')
|
|
255
|
+
text = part.respond_to?(:text) ? part.text : nil
|
|
256
|
+
return text.to_s if type == 'text' || (type.empty? && !text.nil?)
|
|
257
|
+
|
|
258
|
+
return nil
|
|
259
|
+
end
|
|
260
|
+
|
|
261
|
+
return unless part.respond_to?(:[]) || part.respond_to?(:fetch)
|
|
262
|
+
|
|
263
|
+
type = (defined_method_access(part, :type) || '').to_s
|
|
264
|
+
text = defined_method_access(part, :text)
|
|
265
|
+
text.to_s if type == 'text' || (type.empty? && !text.nil?)
|
|
266
|
+
end
|
|
267
|
+
|
|
268
|
+
def defined_method_access(obj, key)
|
|
269
|
+
# Prefer named accessor (covers Data structs like Types::ContentBlock).
|
|
270
|
+
key_sym = key.respond_to?(:to_sym) ? key.to_sym : key
|
|
271
|
+
return obj.public_send(key_sym) if obj.respond_to?(key_sym)
|
|
272
|
+
|
|
273
|
+
str_key = key.to_s
|
|
274
|
+
obj[key]
|
|
275
|
+
rescue TypeError, NoMethodError, KeyError => e
|
|
276
|
+
log.debug "[llm][adapter] action=defined_method_access key=#{key} class=#{obj.class} " \
|
|
277
|
+
"fallback=string_key error=#{e.class}: #{e.message}"
|
|
278
|
+
begin
|
|
279
|
+
obj[str_key]
|
|
280
|
+
rescue TypeError, NoMethodError, KeyError => fallback_error
|
|
281
|
+
log.debug "[llm][adapter] action=defined_method_access key=#{key} class=#{obj.class} " \
|
|
282
|
+
"fallback=none error=#{fallback_error.class}: #{fallback_error.message}"
|
|
283
|
+
nil
|
|
284
|
+
end
|
|
248
285
|
end
|
|
249
286
|
|
|
250
287
|
def normalize_message_tool_calls(tool_calls)
|
|
@@ -52,7 +52,8 @@ module Legion
|
|
|
52
52
|
Legion::Extensions::Llm.constants(false).filter_map do |const_name|
|
|
53
53
|
mod = Legion::Extensions::Llm.const_get(const_name, false)
|
|
54
54
|
provider_module?(mod) ? mod : nil
|
|
55
|
-
rescue NameError
|
|
55
|
+
rescue NameError => e
|
|
56
|
+
log.debug "[llm][providers] action=discover_provider_modules const=#{const_name} error=#{e.class} — #{e.message}"
|
|
56
57
|
nil
|
|
57
58
|
end
|
|
58
59
|
end
|
|
@@ -120,7 +121,8 @@ module Legion
|
|
|
120
121
|
return nil unless provider_module&.const_defined?(:PROVIDER_FAMILY, false)
|
|
121
122
|
|
|
122
123
|
provider_module::PROVIDER_FAMILY
|
|
123
|
-
rescue StandardError
|
|
124
|
+
rescue StandardError => e
|
|
125
|
+
log.debug "[llm][providers] action=safe_provider_family error=#{e.class} — #{e.message}"
|
|
124
126
|
nil
|
|
125
127
|
end
|
|
126
128
|
|
|
@@ -9,9 +9,15 @@ module Legion
|
|
|
9
9
|
class Curator
|
|
10
10
|
include Legion::Logging::Helper
|
|
11
11
|
|
|
12
|
-
CURATED_KEY
|
|
13
|
-
|
|
14
|
-
|
|
12
|
+
CURATED_KEY = :__curated__
|
|
13
|
+
|
|
14
|
+
# All known provider thinking tag variants.
|
|
15
|
+
# Anthropic: <thinking>…</thinking>
|
|
16
|
+
# DeepSeek / Qwen / Ollama / vLLM inline: <think>…</think>
|
|
17
|
+
THINKING_TAG_PAIRS = [
|
|
18
|
+
['<thinking>', '</thinking>'],
|
|
19
|
+
['<think>', '</think>']
|
|
20
|
+
].freeze
|
|
15
21
|
|
|
16
22
|
def initialize(conversation_id:)
|
|
17
23
|
@conversation_id = conversation_id
|
|
@@ -76,6 +82,8 @@ module Legion
|
|
|
76
82
|
return msg if content.length <= max_chars
|
|
77
83
|
|
|
78
84
|
summary = heuristic_tool_summary(content, tool_name_from(msg))
|
|
85
|
+
log.debug "[llm][curator] action=distill_tool_result conversation_id=#{@conversation_id} " \
|
|
86
|
+
"original_chars=#{content.length} summary_chars=#{summary.length}"
|
|
79
87
|
msg.merge(content: summary, curated: true, original_content: content)
|
|
80
88
|
end
|
|
81
89
|
|
|
@@ -89,6 +97,8 @@ module Legion
|
|
|
89
97
|
|
|
90
98
|
return msg if stripped == content || stripped.empty?
|
|
91
99
|
|
|
100
|
+
log.debug "[llm][curator] action=strip_thinking conversation_id=#{@conversation_id} " \
|
|
101
|
+
"original_chars=#{content.length} stripped_chars=#{stripped.length}"
|
|
92
102
|
msg.merge(content: stripped, curated: true, original_content: content)
|
|
93
103
|
end
|
|
94
104
|
|
|
@@ -192,18 +202,27 @@ module Legion
|
|
|
192
202
|
end
|
|
193
203
|
|
|
194
204
|
def strip_thinking_tags(text)
|
|
195
|
-
result =
|
|
205
|
+
result = text
|
|
206
|
+
THINKING_TAG_PAIRS.each do |open_tag, close_tag|
|
|
207
|
+
result = strip_tag_pair(result, open_tag, close_tag)
|
|
208
|
+
end
|
|
209
|
+
result
|
|
210
|
+
end
|
|
211
|
+
|
|
212
|
+
def strip_tag_pair(text, open_tag, close_tag)
|
|
213
|
+
out = +''
|
|
196
214
|
pos = 0
|
|
197
215
|
while pos < text.length
|
|
198
|
-
open_idx = text.index(
|
|
216
|
+
open_idx = text.index(open_tag, pos)
|
|
199
217
|
break unless open_idx
|
|
200
218
|
|
|
201
|
-
|
|
202
|
-
close_idx = text.index(
|
|
203
|
-
pos = close_idx ? close_idx +
|
|
219
|
+
out << text[pos...open_idx]
|
|
220
|
+
close_idx = text.index(close_tag, open_idx + open_tag.length)
|
|
221
|
+
pos = close_idx ? close_idx + close_tag.length : text.length
|
|
204
222
|
end
|
|
205
|
-
|
|
206
|
-
|
|
223
|
+
out << text[pos..] if pos < text.length
|
|
224
|
+
# Strip any unclosed open tag left at the end (provider died mid-stream).
|
|
225
|
+
out.sub(/#{Regexp.escape(open_tag)}.*\z/m, '').strip
|
|
207
226
|
end
|
|
208
227
|
|
|
209
228
|
def curate_message(msg, assistant_response)
|
|
@@ -427,7 +446,8 @@ module Legion
|
|
|
427
446
|
def curated_payload(entry)
|
|
428
447
|
parsed = Legion::JSON.parse(entry[:content].to_s)
|
|
429
448
|
parsed.is_a?(Hash) ? parsed : {}
|
|
430
|
-
rescue Legion::JSON::ParseError
|
|
449
|
+
rescue Legion::JSON::ParseError => e
|
|
450
|
+
log.debug "[llm][curator] action=curated_payload conversation_id=#{@conversation_id} error=#{e.class} — #{e.message}"
|
|
431
451
|
{}
|
|
432
452
|
end
|
|
433
453
|
|
|
@@ -26,7 +26,7 @@ module Legion
|
|
|
26
26
|
anthropic: :frontier
|
|
27
27
|
}.freeze
|
|
28
28
|
|
|
29
|
-
|
|
29
|
+
DEFAULT_TIER_PRIORITY = %i[local direct fleet openai_compat cloud frontier].freeze
|
|
30
30
|
|
|
31
31
|
module_function
|
|
32
32
|
|
|
@@ -50,7 +50,7 @@ module Legion
|
|
|
50
50
|
extract_field(model_data, 'tier')&.to_sym ||
|
|
51
51
|
tier
|
|
52
52
|
capability = embedding_model?(model_data) ? :embed : :chat
|
|
53
|
-
priority = (
|
|
53
|
+
priority = tier_weight(model_tier) - order
|
|
54
54
|
rules << build_rule(provider, instance_id, model_data, capability, model_tier, priority)
|
|
55
55
|
rules << build_rule(provider, instance_id, model_data, :stream, model_tier, priority) if capability == :chat
|
|
56
56
|
order += 1
|
|
@@ -91,7 +91,7 @@ module Legion
|
|
|
91
91
|
next unless default_model
|
|
92
92
|
|
|
93
93
|
model_data = { name: default_model }
|
|
94
|
-
priority =
|
|
94
|
+
priority = tier_weight(tier)
|
|
95
95
|
rules << build_rule(provider_name, :default, model_data, :chat, tier, priority)
|
|
96
96
|
rules << build_rule(provider_name, :default, model_data, :stream, tier, priority)
|
|
97
97
|
end
|
|
@@ -136,6 +136,26 @@ module Legion
|
|
|
136
136
|
model_data[field] || model_data[field.to_s]
|
|
137
137
|
end
|
|
138
138
|
|
|
139
|
+
def tier_weight(tier)
|
|
140
|
+
tier_sym = tier.respond_to?(:to_sym) ? tier.to_sym : tier
|
|
141
|
+
index = tier_priority.index(tier_sym)
|
|
142
|
+
return 0 unless index
|
|
143
|
+
|
|
144
|
+
(tier_priority.length - index) * 100
|
|
145
|
+
end
|
|
146
|
+
|
|
147
|
+
def tier_priority
|
|
148
|
+
configured = Legion::LLM::Settings.value(:routing, :tier_priority, default: DEFAULT_TIER_PRIORITY)
|
|
149
|
+
normalized = Array(configured).filter_map do |tier|
|
|
150
|
+
tier.to_sym if tier.respond_to?(:to_sym)
|
|
151
|
+
end
|
|
152
|
+
normalized = DEFAULT_TIER_PRIORITY if normalized.empty?
|
|
153
|
+
(normalized + DEFAULT_TIER_PRIORITY).uniq
|
|
154
|
+
rescue StandardError => e
|
|
155
|
+
handle_exception(e, level: :warn, handled: true, operation: 'rule_generator.tier_priority')
|
|
156
|
+
DEFAULT_TIER_PRIORITY
|
|
157
|
+
end
|
|
158
|
+
|
|
139
159
|
def extension_providers
|
|
140
160
|
ext = Legion::Settings[:extensions]
|
|
141
161
|
return ext[:llm] if ext.is_a?(Hash) && ext[:llm].is_a?(Hash)
|
data/lib/legion/llm/discovery.rb
CHANGED
|
@@ -254,11 +254,22 @@ module Legion
|
|
|
254
254
|
end
|
|
255
255
|
return false unless best
|
|
256
256
|
|
|
257
|
-
|
|
258
|
-
|
|
259
|
-
|
|
260
|
-
|
|
261
|
-
|
|
257
|
+
provider = best[:provider]
|
|
258
|
+
instance = best[:instance]
|
|
259
|
+
resolved = best.dig(:metadata, :default_model) ||
|
|
260
|
+
embedding_settings[:default_model] ||
|
|
261
|
+
first_embedding_model_for(provider, instance)
|
|
262
|
+
|
|
263
|
+
unless resolved.to_s.length.positive?
|
|
264
|
+
log.debug '[llm][discovery] action=detect_embedding_from_registry no_model_resolved ' \
|
|
265
|
+
"provider=#{provider} instance=#{instance} — falling through to legacy probe"
|
|
266
|
+
return false
|
|
267
|
+
end
|
|
268
|
+
|
|
269
|
+
@embedding_provider = provider
|
|
270
|
+
@embedding_model = resolved
|
|
271
|
+
@embedding_instance = instance
|
|
272
|
+
@can_embed = true
|
|
262
273
|
@embedding_fallback_chain = build_registry_embedding_fallback(embedding_instances)
|
|
263
274
|
|
|
264
275
|
log.info "[llm][discovery] embedding available provider=#{@embedding_provider} " \
|
|
@@ -280,6 +291,14 @@ module Legion
|
|
|
280
291
|
end
|
|
281
292
|
end
|
|
282
293
|
|
|
294
|
+
def first_embedding_model_for(provider, instance)
|
|
295
|
+
embedding_caps = %w[embedding embeddings embed].freeze
|
|
296
|
+
cached_discovered_models.find do |m|
|
|
297
|
+
m[:provider].to_s == provider.to_s && m[:instance].to_s == instance.to_s &&
|
|
298
|
+
Array(m[:capabilities]).any? { |c| embedding_caps.include?(c.to_s) }
|
|
299
|
+
end&.dig(:model)
|
|
300
|
+
end
|
|
301
|
+
|
|
283
302
|
def find_embedding_provider(embedding_settings)
|
|
284
303
|
fallback = Legion::LLM::Settings.config_value(embedding_settings, :provider_fallback, %w[ollama bedrock openai])
|
|
285
304
|
provider_models = Legion::LLM::Settings.config_value(embedding_settings, :provider_models, {})
|
|
@@ -396,7 +415,17 @@ module Legion
|
|
|
396
415
|
end
|
|
397
416
|
|
|
398
417
|
def embedding_settings
|
|
399
|
-
|
|
418
|
+
settings = llm_settings
|
|
419
|
+
result = Legion::LLM::Settings.config_value(settings, :embedding)
|
|
420
|
+
return result if result.is_a?(Hash) && !result.empty?
|
|
421
|
+
|
|
422
|
+
plural = Legion::LLM::Settings.config_value(settings, :embeddings)
|
|
423
|
+
if plural.is_a?(Hash) && !plural.empty?
|
|
424
|
+
log.warn '[llm][discovery] settings key "embeddings" (plural) is deprecated — rename to "embedding" (singular)'
|
|
425
|
+
return plural
|
|
426
|
+
end
|
|
427
|
+
|
|
428
|
+
result || {}
|
|
400
429
|
end
|
|
401
430
|
|
|
402
431
|
def providers_settings
|
|
@@ -159,9 +159,11 @@ module Legion
|
|
|
159
159
|
return meta[:tier].to_sym if meta.is_a?(Hash) && meta[:tier]
|
|
160
160
|
return Router.provider_tier(provider) if defined?(Router) && Router.respond_to?(:provider_tier)
|
|
161
161
|
|
|
162
|
-
Router::PROVIDER_TIER.fetch(provider.to_sym,
|
|
163
|
-
rescue StandardError
|
|
164
|
-
:
|
|
162
|
+
Router::PROVIDER_TIER.fetch(provider.to_sym, nil) if defined?(Router::PROVIDER_TIER)
|
|
163
|
+
rescue StandardError => e
|
|
164
|
+
handle_exception(e, level: :warn, handled: true, operation: 'llm.pipeline.inferred_provider_tier',
|
|
165
|
+
provider: provider)
|
|
166
|
+
nil
|
|
165
167
|
end
|
|
166
168
|
|
|
167
169
|
def execute_steps
|
|
@@ -326,12 +328,17 @@ module Legion
|
|
|
326
328
|
log.debug "[llm][executor] action=step_routing.enter requested_provider=#{@request.routing[:provider]} requested_model=#{@request.routing[:model]}"
|
|
327
329
|
@timestamps[:routing_start] = Time.now
|
|
328
330
|
state = resolve_routing_state(apply_proactive_tier_assignment(routing_request_state))
|
|
331
|
+
auto_route = state[:auto_route] == true
|
|
329
332
|
|
|
330
333
|
@resolved_provider = state[:provider] ||
|
|
331
334
|
(state[:model] && Router.infer_provider_for_model(state[:model])) ||
|
|
332
|
-
llm_setting(:default_provider)
|
|
333
|
-
@resolved_instance = state[:instance]
|
|
334
|
-
@resolved_model = state[:model] || llm_setting(:default_model)
|
|
335
|
+
(llm_setting(:default_provider) unless auto_route)
|
|
336
|
+
@resolved_instance = resolve_provider_instance(state[:instance], @resolved_provider)
|
|
337
|
+
@resolved_model = state[:model] || (llm_setting(:default_model) unless auto_route)
|
|
338
|
+
if auto_route && (@resolved_provider.nil? || @resolved_model.nil?)
|
|
339
|
+
raise ProviderError, 'Auto routing could not resolve an available LLM provider/model'
|
|
340
|
+
end
|
|
341
|
+
|
|
335
342
|
@resolved_tier = state[:tier]&.to_sym || inferred_provider_tier(@resolved_provider)
|
|
336
343
|
@resolved_offering_id = state[:offering_id]
|
|
337
344
|
@resolved_offering_metadata = state[:offering_metadata]
|
|
@@ -347,16 +354,43 @@ module Legion
|
|
|
347
354
|
)
|
|
348
355
|
end
|
|
349
356
|
|
|
357
|
+
def resolve_provider_instance(requested_instance, provider)
|
|
358
|
+
return provider_scoped_instance(requested_instance, provider, preserve_unknown: true) if requested_instance
|
|
359
|
+
|
|
360
|
+
provider_scoped_instance(llm_setting(:default_instance), provider, preserve_unknown: false)
|
|
361
|
+
end
|
|
362
|
+
|
|
363
|
+
def provider_scoped_instance(instance, provider, preserve_unknown:)
|
|
364
|
+
return nil if instance.nil? || instance.to_s.empty? || provider.nil? || provider.to_s.empty?
|
|
365
|
+
|
|
366
|
+
provider_sym = provider.to_sym
|
|
367
|
+
instance_sym = instance.to_sym
|
|
368
|
+
return instance_sym if Call::Registry.registered?(provider_sym, instance: instance_sym)
|
|
369
|
+
return nil if Call::Registry.registered?(provider_sym)
|
|
370
|
+
|
|
371
|
+
preserve_unknown ? instance_sym : nil
|
|
372
|
+
rescue StandardError => e
|
|
373
|
+
handle_exception(e, level: :warn, handled: true, operation: 'llm.pipeline.provider_scoped_instance')
|
|
374
|
+
preserve_unknown ? instance : nil
|
|
375
|
+
end
|
|
376
|
+
|
|
350
377
|
def routing_request_state
|
|
378
|
+
routing_explicit = @request.extra[:routing_explicit]
|
|
379
|
+
instance = @request.routing[:instance] || @request.routing[:instance_id] || @request.routing[:provider_instance]
|
|
380
|
+
tier = @request.extra[:tier]
|
|
351
381
|
{
|
|
352
382
|
provider: @request.routing[:provider],
|
|
353
|
-
instance:
|
|
383
|
+
instance: instance,
|
|
354
384
|
model: @request.routing[:model],
|
|
355
385
|
offering_id: @request.routing[:offering_id] || @request.routing[:id],
|
|
356
386
|
offering_metadata: normalize_offering_metadata(@request.routing[:offering_metadata] ||
|
|
357
387
|
@request.routing[:offering]),
|
|
358
388
|
intent: @request.extra[:intent],
|
|
359
|
-
tier:
|
|
389
|
+
tier: tier,
|
|
390
|
+
auto_route: @request.extra[:auto_route],
|
|
391
|
+
provider_explicit: routing_field_explicit?(routing_explicit, :provider, @request.routing[:provider]),
|
|
392
|
+
instance_explicit: routing_field_explicit?(routing_explicit, :instance, instance),
|
|
393
|
+
tier_explicit: routing_field_explicit?(routing_explicit, :tier, tier)
|
|
360
394
|
}
|
|
361
395
|
end
|
|
362
396
|
|
|
@@ -365,17 +399,25 @@ module Legion
|
|
|
365
399
|
# caller-supplied tier/intent. Advisory assignments only fill blanks.
|
|
366
400
|
if @proactive_tier_assignment&.dig(:forced)
|
|
367
401
|
state[:tier] = @proactive_tier_assignment[:tier]
|
|
402
|
+
state[:tier_explicit] = true
|
|
368
403
|
state[:intent] = merge_routing_intent(state[:intent], @proactive_tier_assignment[:intent])
|
|
369
404
|
log.info "[llm][routing] action=forced_tier source=#{@proactive_tier_assignment[:source]} tier=#{state[:tier]}"
|
|
370
|
-
elsif @proactive_tier_assignment && !state[:tier] && !state[:intent]
|
|
405
|
+
elsif @proactive_tier_assignment && !state[:tier] && !state[:intent] && !state[:instance] &&
|
|
406
|
+
!state[:provider] && !state[:model]
|
|
371
407
|
state[:tier] = @proactive_tier_assignment[:tier]
|
|
408
|
+
state[:tier_explicit] = true
|
|
372
409
|
state[:intent] = @proactive_tier_assignment[:intent]
|
|
373
410
|
end
|
|
374
411
|
state
|
|
375
412
|
end
|
|
376
413
|
|
|
377
414
|
def resolve_routing_state(state)
|
|
378
|
-
return state unless
|
|
415
|
+
return state unless defined?(Router)
|
|
416
|
+
|
|
417
|
+
explicit_route = state[:provider_explicit] || state[:instance_explicit] || state[:tier_explicit]
|
|
418
|
+
auto_route = state[:auto_route] == true
|
|
419
|
+
intent_route = state[:intent] && Router.routing_enabled?
|
|
420
|
+
return state unless explicit_route || auto_route || intent_route
|
|
379
421
|
|
|
380
422
|
resolution = routing_resolution_for(state)
|
|
381
423
|
return state unless resolution
|
|
@@ -384,17 +426,20 @@ module Legion
|
|
|
384
426
|
end
|
|
385
427
|
|
|
386
428
|
def routing_resolution_for(state)
|
|
387
|
-
if pipeline_escalation_enabled?
|
|
429
|
+
if state[:auto_route] == true || (state[:intent] && pipeline_escalation_enabled?)
|
|
388
430
|
@escalation_chain = Router.resolve_chain(
|
|
389
|
-
intent:
|
|
390
|
-
tier:
|
|
391
|
-
model:
|
|
392
|
-
provider:
|
|
393
|
-
|
|
431
|
+
intent: state[:intent],
|
|
432
|
+
tier: state[:tier],
|
|
433
|
+
model: state[:model],
|
|
434
|
+
provider: state[:provider],
|
|
435
|
+
instance: state[:instance],
|
|
436
|
+
max_escalations: pipeline_escalation_max_attempts,
|
|
437
|
+
allow_default_fallback: state[:auto_route] != true
|
|
394
438
|
)
|
|
395
439
|
@escalation_chain.primary
|
|
396
440
|
else
|
|
397
|
-
Router.resolve(intent: state[:intent], tier: state[:tier], model: state[:model],
|
|
441
|
+
Router.resolve(intent: state[:intent], tier: state[:tier], model: state[:model],
|
|
442
|
+
provider: state[:provider], instance: state[:instance])
|
|
398
443
|
end
|
|
399
444
|
end
|
|
400
445
|
|
|
@@ -422,6 +467,13 @@ module Legion
|
|
|
422
467
|
state
|
|
423
468
|
end
|
|
424
469
|
|
|
470
|
+
def routing_field_explicit?(flags, key, value)
|
|
471
|
+
return false if value.nil? || value.to_s.empty?
|
|
472
|
+
return true unless flags.is_a?(Hash)
|
|
473
|
+
|
|
474
|
+
flags.fetch(key, flags.fetch(key.to_s, true)) == true
|
|
475
|
+
end
|
|
476
|
+
|
|
425
477
|
def step_request_normalization
|
|
426
478
|
@exchange_id = Tracing.exchange_id
|
|
427
479
|
end
|
|
@@ -476,33 +528,21 @@ module Legion
|
|
|
476
528
|
end
|
|
477
529
|
|
|
478
530
|
def run_provider_call_with_escalation
|
|
479
|
-
|
|
531
|
+
@escalation_chain ||= build_default_escalation_chain
|
|
532
|
+
chain = @escalation_chain
|
|
480
533
|
threshold = pipeline_escalation_quality_threshold
|
|
481
534
|
quality_check = @request.extra[:quality_check]
|
|
482
535
|
succeeded = false
|
|
536
|
+
tried = []
|
|
483
537
|
log.debug "[llm][executor] action=escalation.enter chain_size=#{chain.size} threshold=#{threshold}"
|
|
484
538
|
|
|
539
|
+
primary_tier = @escalation_chain.primary&.tier
|
|
540
|
+
|
|
485
541
|
chain.each do |resolution|
|
|
486
|
-
|
|
487
|
-
|
|
488
|
-
|
|
489
|
-
@resolved_model = resolution.model
|
|
490
|
-
@resolved_tier = resolution.tier
|
|
491
|
-
@resolved_offering_id = resolution.offering_id
|
|
492
|
-
@resolved_offering_metadata = resolution.offering_metadata
|
|
493
|
-
succeeded = attempt_escalation(resolution, threshold, quality_check, start_time)
|
|
542
|
+
next if tried.any? { |t| t[:provider] == resolution.provider && t[:instance] == resolution.instance && t[:model] == resolution.model }
|
|
543
|
+
|
|
544
|
+
succeeded = run_escalation_resolution(resolution, threshold, quality_check, tried, primary_tier)
|
|
494
545
|
break if succeeded
|
|
495
|
-
rescue Legion::LLM::AuthError, Legion::LLM::PrivacyModeError => e
|
|
496
|
-
record_escalation_failure(e, resolution, start_time,
|
|
497
|
-
outcome: :auth_error, operation: 'llm.pipeline.escalation_attempt.auth',
|
|
498
|
-
handled: true)
|
|
499
|
-
rescue Legion::LLM::RateLimitError => e
|
|
500
|
-
record_escalation_failure(e, resolution, start_time,
|
|
501
|
-
outcome: :rate_limited, operation: 'llm.pipeline.escalation_attempt.rate_limit',
|
|
502
|
-
handled: true)
|
|
503
|
-
rescue StandardError => e
|
|
504
|
-
record_escalation_failure(e, resolution, start_time, outcome: :error,
|
|
505
|
-
operation: 'llm.pipeline.escalation_attempt')
|
|
506
546
|
end
|
|
507
547
|
return if succeeded
|
|
508
548
|
|
|
@@ -513,6 +553,58 @@ module Legion
|
|
|
513
553
|
raise EscalationExhausted, "All #{@escalation_history.size} escalation attempts failed"
|
|
514
554
|
end
|
|
515
555
|
|
|
556
|
+
def run_escalation_resolution(resolution, threshold, quality_check, tried, primary_tier)
|
|
557
|
+
move_type = if tried.empty?
|
|
558
|
+
:primary
|
|
559
|
+
elsif resolution.tier == primary_tier
|
|
560
|
+
:lateral
|
|
561
|
+
else
|
|
562
|
+
:escalation
|
|
563
|
+
end
|
|
564
|
+
log.info "[llm][escalation] action=attempt move=#{move_type} provider=#{resolution.provider} model=#{resolution.model} tier=#{resolution.tier}"
|
|
565
|
+
|
|
566
|
+
start_time = Time.now
|
|
567
|
+
@resolved_provider = resolution.provider
|
|
568
|
+
@resolved_instance = resolution.instance
|
|
569
|
+
@resolved_model = resolution.model
|
|
570
|
+
@resolved_tier = resolution.tier
|
|
571
|
+
@resolved_offering_id = resolution.offering_id
|
|
572
|
+
@resolved_offering_metadata = resolution.offering_metadata
|
|
573
|
+
succeeded = attempt_escalation(resolution, threshold, quality_check, start_time)
|
|
574
|
+
tried << { provider: resolution.provider, instance: resolution.instance, model: resolution.model } unless succeeded
|
|
575
|
+
succeeded
|
|
576
|
+
rescue Legion::LLM::AuthError, Legion::LLM::PrivacyModeError => e
|
|
577
|
+
tried << { provider: resolution.provider, instance: resolution.instance, model: resolution.model }
|
|
578
|
+
record_escalation_failure(e, resolution, start_time,
|
|
579
|
+
outcome: :auth_error,
|
|
580
|
+
operation: 'llm.pipeline.escalation_attempt.auth',
|
|
581
|
+
handled: true)
|
|
582
|
+
false
|
|
583
|
+
rescue Legion::LLM::ContextOverflow => e
|
|
584
|
+
tried << { provider: resolution.provider, instance: resolution.instance, model: resolution.model }
|
|
585
|
+
record_escalation_failure(e, resolution, start_time,
|
|
586
|
+
outcome: :context_overflow,
|
|
587
|
+
operation: 'llm.pipeline.escalation_attempt.context_overflow',
|
|
588
|
+
handled: true)
|
|
589
|
+
log.warn "[llm][escalation] context_overflow provider=#{resolution.provider} " \
|
|
590
|
+
"model=#{resolution.model} — skipping same-tier, seeking larger context window"
|
|
591
|
+
skip_same_tier!(resolution, tried)
|
|
592
|
+
false
|
|
593
|
+
rescue Legion::LLM::RateLimitError => e
|
|
594
|
+
tried << { provider: resolution.provider, instance: resolution.instance, model: resolution.model }
|
|
595
|
+
record_escalation_failure(e, resolution, start_time,
|
|
596
|
+
outcome: :rate_limited,
|
|
597
|
+
operation: 'llm.pipeline.escalation_attempt.rate_limit',
|
|
598
|
+
handled: true)
|
|
599
|
+
false
|
|
600
|
+
rescue StandardError => e
|
|
601
|
+
tried << { provider: resolution.provider, instance: resolution.instance, model: resolution.model }
|
|
602
|
+
record_escalation_failure(e, resolution, start_time,
|
|
603
|
+
outcome: :error,
|
|
604
|
+
operation: 'llm.pipeline.escalation_attempt')
|
|
605
|
+
false
|
|
606
|
+
end
|
|
607
|
+
|
|
516
608
|
def attempt_escalation(resolution, threshold, quality_check, start_time)
|
|
517
609
|
@current_escalation_context = {
|
|
518
610
|
attempt: @escalation_history.size + 1,
|
|
@@ -586,7 +678,64 @@ module Legion
|
|
|
586
678
|
end
|
|
587
679
|
|
|
588
680
|
def build_default_escalation_chain
|
|
589
|
-
Router.
|
|
681
|
+
primary = Router.explicit_resolution(@resolved_tier, @resolved_provider, @resolved_model, @resolved_instance)
|
|
682
|
+
fallbacks = build_fallback_resolutions(
|
|
683
|
+
exclude_provider: @resolved_provider,
|
|
684
|
+
exclude_instance: @resolved_instance,
|
|
685
|
+
primary_tier: @resolved_tier
|
|
686
|
+
)
|
|
687
|
+
resolutions = ([primary] + fallbacks).compact.uniq { |r| [r.provider, r.instance, r.model] }
|
|
688
|
+
Router::EscalationChain.new(resolutions: resolutions, max_attempts: pipeline_escalation_max_attempts)
|
|
689
|
+
end
|
|
690
|
+
|
|
691
|
+
def build_fallback_resolutions(exclude_provider: nil, exclude_instance: nil, primary_tier: nil)
|
|
692
|
+
tier_rank = Router::TIER_RANK
|
|
693
|
+
primary_rank = primary_tier ? (tier_rank[primary_tier.to_sym] || 99) : 99
|
|
694
|
+
|
|
695
|
+
candidates = Call::Registry.all_instances.filter_map do |entry|
|
|
696
|
+
next if entry[:provider] == exclude_provider&.to_sym && entry[:instance] == (exclude_instance&.to_sym || :default)
|
|
697
|
+
next if entry[:provider] == exclude_provider&.to_sym && exclude_instance.nil?
|
|
698
|
+
|
|
699
|
+
model = Router.send(:registry_default_model, entry)
|
|
700
|
+
next unless model
|
|
701
|
+
|
|
702
|
+
tier = Router::PROVIDER_TIER.fetch(entry[:provider], :frontier)
|
|
703
|
+
Router::Resolution.new(
|
|
704
|
+
tier: tier,
|
|
705
|
+
provider: entry[:provider],
|
|
706
|
+
instance: entry[:instance] == :default ? nil : entry[:instance],
|
|
707
|
+
model: model,
|
|
708
|
+
rule: 'escalation_fallback'
|
|
709
|
+
)
|
|
710
|
+
end
|
|
711
|
+
|
|
712
|
+
# Lateral alternatives (same tier) come first; escalations (higher tier) follow;
|
|
713
|
+
# lower-ranked tiers are appended last.
|
|
714
|
+
candidates.sort_by do |r|
|
|
715
|
+
r_rank = tier_rank[r.tier] || 99
|
|
716
|
+
rank_diff = r_rank - primary_rank
|
|
717
|
+
bucket = if rank_diff.zero?
|
|
718
|
+
0
|
|
719
|
+
elsif rank_diff.positive?
|
|
720
|
+
1
|
|
721
|
+
else
|
|
722
|
+
2
|
|
723
|
+
end
|
|
724
|
+
[bucket, r_rank]
|
|
725
|
+
end
|
|
726
|
+
end
|
|
727
|
+
|
|
728
|
+
def skip_same_tier!(failed_resolution, tried)
|
|
729
|
+
chain = @escalation_chain
|
|
730
|
+
return unless chain.respond_to?(:each)
|
|
731
|
+
|
|
732
|
+
chain.each do |r|
|
|
733
|
+
next if r.tier != failed_resolution.tier
|
|
734
|
+
next if tried.any? { |t| t[:provider] == r.provider && t[:instance] == r.instance && t[:model] == r.model }
|
|
735
|
+
|
|
736
|
+
log.debug "[llm][escalation] action=skip_same_tier provider=#{r.provider} model=#{r.model} tier=#{r.tier} reason=context_overflow"
|
|
737
|
+
tried << { provider: r.provider, instance: r.instance, model: r.model }
|
|
738
|
+
end
|
|
590
739
|
end
|
|
591
740
|
|
|
592
741
|
def escalation_attempt_hash(resolution, outcome:, failures:, duration_ms:)
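The ordering applied by `build_fallback_resolutions` above can be sketched in isolation. This is a minimal illustration, not the gem's code: `Candidate` and `order_fallbacks` are hypothetical names, the provider-to-tier pairs are chosen for the example, and only the tier ranks are copied from `Router::TIER_RANK` as it appears in this diff.

```ruby
# Tier ranks copied from Router::TIER_RANK as shown in this diff.
TIER_RANK = { local: 0, direct: 1, fleet: 2, openai_compat: 3, cloud: 4, frontier: 5 }.freeze

# Hypothetical stand-in for Router::Resolution; only the fields used here.
Candidate = Struct.new(:provider, :tier)

# Orders fallback candidates relative to the primary tier:
# bucket 0 = same tier (lateral), 1 = higher rank (escalation), 2 = lower rank.
# Unknown tiers get rank 99, as in the diff.
def order_fallbacks(candidates, primary_tier)
  primary_rank = primary_tier ? TIER_RANK.fetch(primary_tier.to_sym, 99) : 99
  candidates.sort_by do |candidate|
    rank = TIER_RANK.fetch(candidate.tier, 99)
    diff = rank - primary_rank
    bucket = if diff.zero?
               0
             elsif diff.positive?
               1
             else
               2
             end
    [bucket, rank]
  end
end

# Illustrative provider/tier pairs (assumptions for this example).
candidates = [
  Candidate.new(:anthropic, :frontier),
  Candidate.new(:ollama, :local),
  Candidate.new(:vllm, :fleet),
  Candidate.new(:bedrock, :cloud)
]

ordered = order_fallbacks(candidates, :fleet).map(&:provider)
# ordered => [:vllm, :bedrock, :anthropic, :ollama]
```

With a `:fleet` primary, the same-tier candidate is tried first, then higher-ranked tiers in ascending order, and the lower-ranked `:local` candidate comes last.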
|
data/lib/legion/llm/inference/prompt.rb
CHANGED
|
@@ -39,8 +39,17 @@ module Legion
|
|
|
39
39
|
cache: nil,
|
|
40
40
|
quality_check: nil,
|
|
41
41
|
**)
|
|
42
|
+
routing_explicit = { provider: !provider.nil?, model: !model.nil?, tier: !tier.nil? }
|
|
42
43
|
resolved_provider = provider
|
|
43
44
|
resolved_model = model
|
|
45
|
+
auto_route = Inference::Request.auto_routing_model?(resolved_model)
|
|
46
|
+
|
|
47
|
+
if auto_route
|
|
48
|
+
resolved_provider = nil
|
|
49
|
+
intent ||= Inference::Request.default_auto_routing_intent
|
|
50
|
+
elsif resolved_provider.nil? && resolved_model && defined?(Router)
|
|
51
|
+
resolved_provider = Router.infer_provider_for_model(resolved_model)
|
|
52
|
+
end
|
|
44
53
|
|
|
45
54
|
if resolved_provider.nil? && resolved_model.nil? && defined?(Router) && Router.routing_enabled? && (intent || tier)
|
|
46
55
|
resolution = Router.resolve(intent: intent, tier: tier, exclude: exclude)
|
|
@@ -48,26 +57,27 @@ module Legion
|
|
|
48
57
|
resolved_model = resolution&.model
|
|
49
58
|
end
|
|
50
59
|
|
|
51
|
-
resolved_provider ||= llm_setting(:default_provider)
|
|
52
|
-
resolved_model ||= llm_setting(:default_model)
|
|
60
|
+
resolved_provider ||= llm_setting(:default_provider) unless auto_route
|
|
61
|
+
resolved_model ||= llm_setting(:default_model) unless auto_route
|
|
53
62
|
|
|
54
63
|
request(message,
|
|
55
|
-
provider:
|
|
56
|
-
model:
|
|
57
|
-
intent:
|
|
58
|
-
tier:
|
|
59
|
-
schema:
|
|
60
|
-
tools:
|
|
61
|
-
escalate:
|
|
62
|
-
max_escalations:
|
|
63
|
-
thinking:
|
|
64
|
-
temperature:
|
|
65
|
-
max_tokens:
|
|
66
|
-
tracing:
|
|
67
|
-
agent:
|
|
68
|
-
caller:
|
|
69
|
-
cache:
|
|
70
|
-
quality_check:
|
|
64
|
+
provider: resolved_provider,
|
|
65
|
+
model: resolved_model,
|
|
66
|
+
intent: intent,
|
|
67
|
+
tier: tier,
|
|
68
|
+
schema: schema,
|
|
69
|
+
tools: tools,
|
|
70
|
+
escalate: escalate,
|
|
71
|
+
max_escalations: max_escalations,
|
|
72
|
+
thinking: thinking,
|
|
73
|
+
temperature: temperature,
|
|
74
|
+
max_tokens: max_tokens,
|
|
75
|
+
tracing: tracing,
|
|
76
|
+
agent: agent,
|
|
77
|
+
caller: caller,
|
|
78
|
+
cache: cache,
|
|
79
|
+
quality_check: quality_check,
|
|
80
|
+
routing_explicit: routing_explicit,
|
|
71
81
|
**)
|
|
72
82
|
end
|
|
73
83
|
|
|
@@ -90,7 +100,8 @@ module Legion
|
|
|
90
100
|
cache: nil,
|
|
91
101
|
quality_check: nil,
|
|
92
102
|
**)
|
|
93
|
-
|
|
103
|
+
auto_route = Inference::Request.auto_routing_model?(model)
|
|
104
|
+
if !auto_route && (provider.nil? || model.nil?)
|
|
94
105
|
raise LLMError, "Prompt.request: provider and model must be set (got provider=#{provider.inspect}, model=#{model.inspect}). " \
|
|
95
106
|
'Configure Legion::Settings[:llm][:default_provider] and [:default_model], or pass them explicitly.'
|
|
96
107
|
end
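The `routing_explicit` flags built in the prompt hunk above are consumed by `routing_field_explicit?` in the executor hunk. The following sketch restates that check as a standalone method to show how a caller-supplied value differs from an inferred one. The model name and the inferred provider are hypothetical values chosen for illustration.

```ruby
# Standalone restatement of the routing_field_explicit? logic from the executor hunk.
def routing_field_explicit?(flags, key, value)
  return false if value.nil? || value.to_s.empty? # blank values are never explicit
  return true unless flags.is_a?(Hash)            # no flags hash: treat any value as explicit

  # Accept symbol or string keys; a missing key defaults to explicit.
  flags.fetch(key, flags.fetch(key.to_s, true)) == true
end

# Flags shaped like the routing_explicit hash: each entry records
# whether the caller actually passed that argument.
provider = nil
model = 'some-model' # hypothetical model name
tier = :cloud
flags = { provider: !provider.nil?, model: !model.nil?, tier: !tier.nil? }

inferred_provider = :bedrock # hypothetical: filled in by inference, not by the caller

results = {
  provider: routing_field_explicit?(flags, :provider, inferred_provider),
  tier: routing_field_explicit?(flags, :tier, tier),
  no_flags: routing_field_explicit?(nil, :provider, inferred_provider),
  blank: routing_field_explicit?(flags, :tier, '')
}
# results => { provider: false, tier: true, no_flags: true, blank: false }
```

The effect, under these assumptions, is that a provider inferred from the model name does not count as an explicit route, while requests that carry no flags at all keep the older behavior of treating any present value as explicit.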
|
data/lib/legion/llm/inference/request.rb
CHANGED
|
@@ -3,6 +3,9 @@
|
|
|
3
3
|
module Legion
|
|
4
4
|
module LLM
|
|
5
5
|
module Inference
|
|
6
|
+
AUTO_ROUTING_MODEL_KEY = 'legionio'
|
|
7
|
+
AUTO_ROUTING_MODEL_ALIASES = [AUTO_ROUTING_MODEL_KEY].freeze
|
|
8
|
+
|
|
6
9
|
Request = ::Data.define(
|
|
7
10
|
:id, :conversation_id, :idempotency_key, :schema_version,
|
|
8
11
|
:system, :messages, :tools, :tool_choice,
|
|
@@ -14,6 +17,11 @@ module Legion
|
|
|
14
17
|
:billing, :test, :modality, :hooks
|
|
15
18
|
) do
|
|
16
19
|
def self.build(**kwargs)
|
|
20
|
+
routing, extra = normalize_auto_routing(
|
|
21
|
+
kwargs.fetch(:routing, { provider: nil, model: nil }),
|
|
22
|
+
kwargs.fetch(:extra, {})
|
|
23
|
+
)
|
|
24
|
+
|
|
17
25
|
new(
|
|
18
26
|
id: kwargs[:id] || "req_#{SecureRandom.hex(12)}",
|
|
19
27
|
conversation_id: kwargs[:conversation_id],
|
|
@@ -23,7 +31,7 @@ module Legion
|
|
|
23
31
|
messages: kwargs.fetch(:messages, []),
|
|
24
32
|
tools: kwargs.key?(:tools) ? kwargs[:tools] : nil,
|
|
25
33
|
tool_choice: kwargs.fetch(:tool_choice, { mode: :auto }),
|
|
26
|
-
routing:
|
|
34
|
+
routing: routing,
|
|
27
35
|
tokens: kwargs.fetch(:tokens, { max: 4096 }),
|
|
28
36
|
stop: kwargs.fetch(:stop, { sequences: [] }),
|
|
29
37
|
generation: kwargs.fetch(:generation, {}),
|
|
@@ -35,7 +43,7 @@ module Legion
|
|
|
35
43
|
cache: kwargs.fetch(:cache, { strategy: :default, cacheable: true }),
|
|
36
44
|
priority: kwargs.fetch(:priority, :normal),
|
|
37
45
|
ttl: kwargs[:ttl],
|
|
38
|
-
extra:
|
|
46
|
+
extra: extra,
|
|
39
47
|
metadata: kwargs.fetch(:metadata, {}),
|
|
40
48
|
enrichments: kwargs.fetch(:enrichments, {}),
|
|
41
49
|
predictions: kwargs.fetch(:predictions, {}),
|
|
@@ -110,6 +118,41 @@ module Legion
|
|
|
110
118
|
build_args[:id] = request_id if request_id
|
|
111
119
|
build(**build_args)
|
|
112
120
|
end
|
|
121
|
+
|
|
122
|
+
def self.auto_routing_model?(model)
|
|
123
|
+
Legion::LLM::Inference::AUTO_ROUTING_MODEL_ALIASES.include?(model.to_s.strip.downcase)
|
|
124
|
+
end
|
|
125
|
+
|
|
126
|
+
def self.default_auto_routing_intent
|
|
127
|
+
routing = Legion::LLM::Settings.value(:routing, default: {})
|
|
128
|
+
intent = Legion::LLM::Settings.config_value(routing, :default_intent, {})
|
|
129
|
+
intent = intent.is_a?(Hash) ? normalize_hash(intent) : {}
|
|
130
|
+
intent.merge(capability: :chat)
|
|
131
|
+
rescue StandardError
|
|
132
|
+
{ capability: :chat }
|
|
133
|
+
end
|
|
134
|
+
|
|
135
|
+
def self.normalize_auto_routing(routing, extra)
|
|
136
|
+
normalized_routing = normalize_hash(routing)
|
|
137
|
+
normalized_extra = normalize_hash(extra)
|
|
138
|
+
return [normalized_routing, normalized_extra] unless auto_routing_model?(normalized_routing[:model])
|
|
139
|
+
|
|
140
|
+
normalized_routing = { provider: nil, model: nil }
|
|
141
|
+
normalized_extra = normalized_extra.dup
|
|
142
|
+
normalized_extra.delete(:tier)
|
|
143
|
+
normalized_extra[:intent] ||= default_auto_routing_intent
|
|
144
|
+
normalized_extra[:auto_route] = true
|
|
145
|
+
normalized_extra[:requested_model_alias] = Legion::LLM::Inference::AUTO_ROUTING_MODEL_KEY
|
|
146
|
+
[normalized_routing, normalized_extra]
|
|
147
|
+
end
|
|
148
|
+
|
|
149
|
+
def self.normalize_hash(value)
|
|
150
|
+
return {} unless value.is_a?(Hash)
|
|
151
|
+
|
|
152
|
+
value.each_with_object({}) do |(key, hash_value), normalized|
|
|
153
|
+
normalized[key.respond_to?(:to_sym) ? key.to_sym : key] = hash_value
|
|
154
|
+
end
|
|
155
|
+
end
|
|
113
156
|
end
|
|
114
157
|
end
|
|
115
158
|
end
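The auto-routing normalization added to `Request.build` above can be sketched as plain methods. This is a simplified illustration: `symbolize` is a hypothetical helper standing in for `normalize_hash`, and the default intent is hard-coded to `{ capability: :chat }`, whereas the diff merges it with the `default_intent` read from settings.

```ruby
AUTO_ROUTING_MODEL_KEY = 'legionio'
AUTO_ROUTING_MODEL_ALIASES = [AUTO_ROUTING_MODEL_KEY].freeze

# True when the requested model is the auto-routing alias (case and whitespace insensitive).
def auto_routing_model?(model)
  AUTO_ROUTING_MODEL_ALIASES.include?(model.to_s.strip.downcase)
end

# Hypothetical helper: symbolizes top-level keys, like normalize_hash in the diff.
def symbolize(hash)
  return {} unless hash.is_a?(Hash)

  hash.each_with_object({}) { |(key, value), out| out[key.respond_to?(:to_sym) ? key.to_sym : key] = value }
end

# Simplified: the fallback intent is hard-coded instead of being read from settings.
def normalize_auto_routing(routing, extra)
  routing = symbolize(routing)
  extra = symbolize(extra)
  return [routing, extra] unless auto_routing_model?(routing[:model])

  extra = extra.dup
  extra.delete(:tier)                       # a caller-supplied tier is dropped
  extra[:intent] ||= { capability: :chat }  # keep a caller intent if one was given
  extra[:auto_route] = true
  extra[:requested_model_alias] = AUTO_ROUTING_MODEL_KEY
  [{ provider: nil, model: nil }, extra]    # provider and model are cleared for the router
end

routing, extra = normalize_auto_routing({ 'provider' => 'openai', 'model' => ' LegionIO ' }, { tier: :cloud })
# routing => { provider: nil, model: nil }
# extra   => { intent: { capability: :chat }, auto_route: true, requested_model_alias: "legionio" }

passthrough, = normalize_auto_routing({ provider: :ollama, model: 'llama3' }, {})
# passthrough => { provider: :ollama, model: "llama3" }
```

Requests for any other model pass through with only their keys symbolized, so the alias is the single switch that hands provider and model selection to the router.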
|
data/lib/legion/llm/router/escalation/chain.rb
CHANGED
|
@@ -39,10 +39,8 @@ module Legion
|
|
|
39
39
|
|
|
40
40
|
def padded_resolutions
|
|
41
41
|
return [] if @resolutions.empty?
|
|
42
|
-
return @resolutions.first(@max_attempts) if @resolutions.size >= @max_attempts
|
|
43
42
|
|
|
44
|
-
|
|
45
|
-
(@resolutions + Array.new(@max_attempts - @resolutions.size) { last }).first(@max_attempts)
|
|
43
|
+
@resolutions.first(@max_attempts)
|
|
46
44
|
end
|
|
47
45
|
end
|
|
48
46
|
end
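The change to `padded_resolutions` above is small but alters what a short chain looks like. The sketch below contrasts the removed and added behavior using plain arrays. It assumes that `last` in the removed line returns the final resolution of the chain, and the provider symbols are hypothetical stand-ins for `Resolution` objects.

```ruby
# Old behaviour (removed lines): a short chain is padded by repeating its last entry.
def padded_old(resolutions, max_attempts)
  return [] if resolutions.empty?
  return resolutions.first(max_attempts) if resolutions.size >= max_attempts

  padding = Array.new(max_attempts - resolutions.size) { resolutions.last }
  (resolutions + padding).first(max_attempts)
end

# New behaviour (added line): the chain is only truncated, never padded.
def padded_new(resolutions, max_attempts)
  return [] if resolutions.empty?

  resolutions.first(max_attempts)
end

chain = %i[ollama bedrock] # hypothetical stand-ins for Resolution objects

old_result = padded_old(chain, 4)
# old_result => [:ollama, :bedrock, :bedrock, :bedrock]

new_result = padded_new(chain, 4)
# new_result => [:ollama, :bedrock]
```

Dropping the padding fits the `tried` list introduced in the executor hunk, which skips any resolution whose provider, instance, and model were already attempted.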
|
data/lib/legion/llm/router.rb
CHANGED
|
@@ -18,6 +18,7 @@ module Legion
|
|
|
18
18
|
gemini: :cloud, azure: :cloud, ollama: :local, vllm: :fleet }.freeze
|
|
19
19
|
PROVIDER_ORDER = %i[ollama vllm bedrock azure gemini anthropic openai].freeze
|
|
20
20
|
TIER_EXTERNAL = Set[:cloud, :frontier, :openai_compat].freeze
|
|
21
|
+
TIER_RANK = { local: 0, direct: 1, fleet: 2, openai_compat: 3, cloud: 4, frontier: 5 }.freeze
|
|
21
22
|
|
|
22
23
|
OLLAMA_MODEL_PATTERN = %r{[:/]}
|
|
23
24
|
|
|
@@ -60,9 +61,9 @@ module Legion
|
|
|
60
61
|
# @param model [String, nil] explicit model override
|
|
61
62
|
# @param provider [Symbol, nil] explicit provider override
|
|
62
63
|
# @return [Resolution, nil]
|
|
63
|
-
def resolve(intent: nil, tier: nil, model: nil, provider: nil, exclude: {})
|
|
64
|
-
log.debug "[llm][router] action=resolve.enter intent=#{intent} tier=#{tier} model=#{model} provider=#{provider}"
|
|
65
|
-
return explicit_resolution(tier, provider, model) if tier
|
|
64
|
+
def resolve(intent: nil, tier: nil, model: nil, provider: nil, instance: nil, exclude: {})
|
|
65
|
+
log.debug "[llm][router] action=resolve.enter intent=#{intent} tier=#{tier} model=#{model} provider=#{provider} instance=#{instance}"
|
|
66
|
+
return explicit_resolution(tier, provider, model, instance) if tier || provider || instance
|
|
66
67
|
|
|
67
68
|
return nil unless routing_enabled? && intent
|
|
68
69
|
|
|
@@ -81,13 +82,14 @@ module Legion
|
|
|
81
82
|
resolution || arbitrage_fallback(intent)
|
|
82
83
|
end
|
|
83
84
|
|
|
84
|
-
def resolve_chain(intent: nil, tier: nil, model: nil, provider: nil,
|
|
85
|
+
def resolve_chain(intent: nil, tier: nil, model: nil, provider: nil, instance: nil, max_escalations: nil,
|
|
86
|
+
exclude: {}, allow_default_fallback: true)
|
|
85
87
|
log.debug "[llm][router] action=resolve_chain.enter intent=#{intent} tier=#{tier} max_escalations=#{max_escalations}"
|
|
86
88
|
max = max_escalations || escalation_max_attempts
|
|
87
|
-
return EscalationChain.new(resolutions: [explicit_resolution(tier, provider, model)], max_attempts: max) if tier
|
|
88
|
-
return chain_from_defaults(model, provider, max) unless routing_enabled? && intent
|
|
89
|
+
return EscalationChain.new(resolutions: [explicit_resolution(tier, provider, model, instance)], max_attempts: max) if tier || provider || instance
|
|
90
|
+
return chain_from_defaults(model, provider, max, allow_default_fallback: allow_default_fallback) unless routing_enabled? && intent
|
|
89
91
|
|
|
90
|
-
chain_from_intent(intent, max, exclude: exclude)
|
|
92
|
+
chain_from_intent(intent, max, exclude: exclude, allow_default_fallback: allow_default_fallback)
|
|
91
93
|
end
|
|
92
94
|
|
|
93
95
|
def health_tracker
|
|
@@ -145,6 +147,34 @@ module Legion
|
|
|
145
147
|
true
|
|
146
148
|
end
|
|
147
149
|
|
|
150
|
+
def explicit_resolution(tier, provider, model, instance = nil)
|
|
151
|
+
registry_entry = if provider
|
|
152
|
+
registry_entry_for_provider(provider.to_sym, instance: instance&.to_sym)
|
|
153
|
+
elsif tier
|
|
154
|
+
registry_entry_for_tier(tier)
|
|
155
|
+
end
|
|
156
|
+
resolved_provider = if provider
|
|
157
|
+
provider.to_sym
|
|
158
|
+
else
|
|
159
|
+
registry_entry&.[](:provider) ||
|
|
160
|
+
(tier && default_provider_for_tier(tier)) ||
|
|
161
|
+
default_settings_provider&.to_sym ||
|
|
162
|
+
:anthropic
|
|
163
|
+
end
|
|
164
|
+
resolved_model = model || registry_default_model(registry_entry) || (tier && default_model_for_tier(tier))
|
|
165
|
+
resolved_instance = registry_entry&.[](:instance) || instance
|
|
166
|
+
resolved_tier = tier || PROVIDER_TIER.fetch(resolved_provider, :frontier)
|
|
167
|
+
|
|
168
|
+
Resolution.new(
|
|
169
|
+
tier: resolved_tier,
|
|
170
|
+
provider: resolved_provider,
|
|
171
|
+
model: resolved_model,
|
|
172
|
+
instance: resolved_instance,
|
|
173
|
+
rule: 'explicit',
|
|
174
|
+
metadata: registry_resolution_metadata(registry_entry)
|
|
175
|
+
)
|
|
176
|
+
end
|
|
177
|
+
|
|
148
178
|
private
|
|
149
179
|
|
|
150
180
|
def arbitrage_fallback(intent)
|
|
@@ -162,25 +192,6 @@ module Legion
|
|
|
162
192
|
Resolution.new(tier: tier, provider: provider, model: model, rule: 'arbitrage_fallback')
|
|
163
193
|
end
|
|
164
194
|
|
|
165
|
-
def explicit_resolution(tier, provider, model)
|
|
166
|
-
registry_entry = if provider
|
|
167
|
-
registry_entry_for_provider(provider.to_sym)
|
|
168
|
-
else
|
|
169
|
-
registry_entry_for_tier(tier)
|
|
170
|
-
end
|
|
171
|
-
resolved_provider = provider ? provider.to_sym : (registry_entry&.[](:provider) || default_provider_for_tier(tier))
|
|
172
|
-
resolved_model = model || registry_default_model(registry_entry) || default_model_for_tier(tier)
|
|
173
|
-
|
|
174
|
-
Resolution.new(
|
|
175
|
-
tier: tier,
|
|
176
|
-
provider: resolved_provider,
|
|
177
|
-
model: resolved_model,
|
|
178
|
-
instance: registry_entry&.[](:instance),
|
|
179
|
-
rule: 'explicit',
|
|
180
|
-
metadata: registry_resolution_metadata(registry_entry)
|
|
181
|
-
)
|
|
182
|
-
end
|
|
183
|
-
|
|
184
195
|
def merge_defaults(intent)
|
|
185
196
|
defaults = (routing_settings[:default_intent] || {})
|
|
186
197
|
.transform_keys(&:to_sym)
|
|
@@ -423,19 +434,22 @@ module Legion
|
|
|
423
434
|
end
|
|
424
435
|
end
|
|
425
436
|
|
|
426
|
-
def chain_from_defaults(model, provider, max)
|
|
427
|
-
if provider || model || default_settings_provider || default_settings_model
|
|
437
|
+
def chain_from_defaults(model, provider, max, allow_default_fallback: true)
|
|
438
|
+
if provider || model || (allow_default_fallback && (default_settings_provider || default_settings_model))
|
|
428
439
|
p = (provider || default_settings_provider)&.to_sym
|
|
429
440
|
resolved_model = model || registry_default_model(registry_entry_for_provider(p)) ||
|
|
430
441
|
default_settings_model || 'claude-sonnet-4-6'
|
|
431
|
-
|
|
432
|
-
|
|
433
|
-
|
|
434
|
-
|
|
442
|
+
primary = Resolution.new(tier: PROVIDER_TIER.fetch(p || :anthropic, :frontier),
|
|
443
|
+
provider: p || :anthropic,
|
|
444
|
+
model: resolved_model)
|
|
445
|
+
# Append remaining registered providers as fallbacks (sorted by tier rank)
|
|
446
|
+
fallbacks = enabled_provider_chain.reject { |r| r.provider == primary.provider }
|
|
447
|
+
resolutions = ([primary] + fallbacks).uniq { |r| [r.provider, r.instance, r.model] }
|
|
448
|
+
return EscalationChain.new(resolutions: resolutions, max_attempts: max)
|
|
435
449
|
end
|
|
436
450
|
|
|
437
451
|
resolutions = enabled_provider_chain
|
|
438
|
-
if resolutions.empty?
|
|
452
|
+
if resolutions.empty? && allow_default_fallback
|
|
439
453
|
p = default_settings_provider&.to_sym || :anthropic
|
|
440
454
|
resolutions = [Resolution.new(tier: PROVIDER_TIER.fetch(p, :frontier),
|
|
441
455
|
provider: p,
|
|
@@ -475,7 +489,7 @@ module Legion
|
|
|
475
489
|
end
|
|
476
490
|
end
|
|
477
491
|
|
|
478
|
-
def chain_from_intent(intent, max, exclude: {})
|
|
492
|
+
def chain_from_intent(intent, max, exclude: {}, allow_default_fallback: true)
|
|
479
493
|
merged = intent ? merge_defaults(intent) : {}
|
|
480
494
|
rules = load_rules
|
|
481
495
|
candidates = select_candidates(rules, merged, exclude: exclude)
|
|
@@ -484,7 +498,7 @@ module Legion
|
|
|
484
498
|
resolutions = build_fallback_chain(sorted.first, sorted, resolutions) if sorted.first&.fallback
|
|
485
499
|
resolutions = resolutions.uniq { |r| [r.provider, r.model] }
|
|
486
500
|
resolutions = enabled_provider_chain if resolutions.empty?
|
|
487
|
-
if resolutions.empty?
|
|
501
|
+
if resolutions.empty? && allow_default_fallback
|
|
488
502
|
p = default_settings_provider&.to_sym || :anthropic
|
|
489
503
|
resolutions = [Resolution.new(tier: PROVIDER_TIER.fetch(p, :frontier),
|
|
490
504
|
provider: p,
|
|
@@ -573,14 +587,23 @@ module Legion
|
|
|
573
587
|
end
|
|
574
588
|
|
|
575
589
|
# Find the first registered instance for a specific provider.
|
|
576
|
-
|
|
590
|
+
# When +instance+ is given, prefers the entry whose :instance matches;
|
|
591
|
+
# falls back to the first provider entry if no exact match is found.
|
|
592
|
+
def registry_entry_for_provider(provider, instance: nil)
|
|
577
593
|
instances = begin
|
|
578
594
|
Call::Registry.all_instances
|
|
579
595
|
rescue StandardError => e
|
|
580
596
|
handle_exception(e, level: :warn, handled: true, operation: 'router.registry_entry_for_provider')
|
|
581
597
|
[]
|
|
582
598
|
end
|
|
583
|
-
instances.
|
|
599
|
+
provider_entries = instances.select { |entry| entry[:provider] == provider }
|
|
600
|
+
return nil if provider_entries.empty?
|
|
601
|
+
|
|
602
|
+
if instance
|
|
603
|
+
provider_entries.find { |entry| entry[:instance] == instance } || provider_entries.first
|
|
604
|
+
else
|
|
605
|
+
provider_entries.first
|
|
606
|
+
end
|
|
584
607
|
end
|
|
585
608
|
|
|
586
609
|
# Find a default model from registry for a given tier.
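The instance-aware lookup added to `registry_entry_for_provider` above can be illustrated without the registry. In this sketch the entries are passed in as a plain array instead of being read from `Call::Registry.all_instances`, and `entry_for_provider` and the instance names are hypothetical.

```ruby
# Prefers the entry whose :instance matches; otherwise falls back to the
# first entry for the provider, mirroring the lookup shown in the diff.
def entry_for_provider(entries, provider, instance: nil)
  provider_entries = entries.select { |entry| entry[:provider] == provider }
  return nil if provider_entries.empty?

  if instance
    provider_entries.find { |entry| entry[:instance] == instance } || provider_entries.first
  else
    provider_entries.first
  end
end

# Hypothetical registry contents.
entries = [
  { provider: :ollama, instance: :default },
  { provider: :ollama, instance: :gpu_box },
  { provider: :vllm, instance: :default }
]

exact = entry_for_provider(entries, :ollama, instance: :gpu_box)
# exact => { provider: :ollama, instance: :gpu_box }

fallback = entry_for_provider(entries, :ollama, instance: :missing)
# fallback => { provider: :ollama, instance: :default }

unknown = entry_for_provider(entries, :bedrock)
# unknown => nil
```

Note that an unmatched instance silently resolves to the provider's first entry, so callers that need a strict match would have to compare the returned `:instance` themselves.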
|
data/lib/legion/llm/settings.rb
CHANGED
|
@@ -305,7 +305,7 @@ module Legion
|
|
|
305
305
|
def self.routing_defaults
|
|
306
306
|
{
|
|
307
307
|
enabled: true,
|
|
308
|
-
tier_priority: %w[local fleet openai_compat cloud frontier],
|
|
308
|
+
tier_priority: %w[local direct fleet openai_compat cloud frontier],
|
|
309
309
|
default_intent: { privacy: 'normal', capability: 'moderate', cost: 'normal' },
|
|
310
310
|
tiers: {
|
|
311
311
|
local: { provider: 'ollama' },
|
data/lib/legion/llm/version.rb
CHANGED