legion-llm 0.8.27 → 0.8.29

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 2cccf9351fd9f4db59b1548197bf7b78c5947e85183535e86ede5c3359d71b89
-  data.tar.gz: 14e3a1b5b6648bea618941f84473e63aeccee7edc9520366d09dde8d27b00a7b
+  metadata.gz: 4ec77012ba08ec5ed5cb8fd544fca1a28ee8993b5136852f647be4f7e8725309
+  data.tar.gz: ed78d65c4966669e008c853c89983a0baa1f1fd8f28a3510bc8461544f9ead5c
 SHA512:
-  metadata.gz: 31ec279fcb498e5cc3308bcefcb6adc94915c36867967fd08aa3f0422d4c583f83bc0ace1db9a47ecd45371a0ce0c82542fea7015c1474f5b7386409d789e5e0
-  data.tar.gz: '083bd8e581399a574b313a31784eacca4424fadb131f82cd17a5a7e840420da14ec3e5f589efa88b7be6be8a76916439c88bbf936a6d13c9a9435bd8fd04245c'
+  metadata.gz: 2a33fd3d2b5dcd7c36e11ef5d1715d03f71256c1826967cf987e1665443bed0123d9473e178f034e9aa6a6f62ab72b8a1608b6b187ee3554882d8aacc98ded04
+  data.tar.gz: 3c701ef336fbb0695819860bf3f68c108887d040ccd960dddb944ffc3fcfabc2ab52dc3d0194384530756bf6b8891d0f6d091bbae7b4cdd9db62024f7bd9874e
data/CHANGELOG.md CHANGED
@@ -1,5 +1,26 @@
 # Legion LLM Changelog
 
+## [0.8.29] - 2026-04-27
+
+### Added
+- Bedrock embedding support via `call/bedrock_embeddings.rb` — a `RubyLLM::Providers::Bedrock` monkey-patch (same pattern as `bedrock_auth.rb`) that implements `render_embedding_payload`, `embedding_url`, `parse_embedding_response`, and overrides `embed` for signed transport. Covers Amazon Titan v1, Titan v2 (selectable 256/512/1024 dimensions), and Cohere Embed v3 (English + multilingual).
+- Short-circuit guard: when ruby_llm eventually ships native `render_embedding_payload`, the patch becomes inert rather than double-loading the method.
+- Trap-and-continue batch semantics for Titan (which is single-text-per-call): `embed_titan_batch` iterates client-side, preserves partial successes on mid-batch failures, logs the failure count via `RubyLLM.logger.warn`, and only raises when 100% of inputs fail.
+- Input-size guards: Titan rejects >8k tokens with a billable 400 — we now raise a descriptive `RubyLLM::Error` at ≥45,000 bytes before the wire call. Cohere enforces the 96-texts / 8 KB-per-text documented limits.
+- Full spec coverage in `spec/legion/llm/bedrock_embeddings_spec.rb` (probe contract, per-model payload shapes, dimension validation, batch limits, error paths).
+
+### Fixed
+- `Legion::LLM::Discovery.find_embedding_provider` can now actually resolve Bedrock when it is the configured fallback. Previously, the discovery probe (`klass.instance_method(:render_embedding_payload)`) raised `NameError` for Bedrock and the fallback chain skipped past it with `[llm][discovery] no embedding provider available` — even when Bedrock was the only reachable embedding provider.
+
+## [0.8.28] - 2026-04-24
+
+### Fixed
+- Model/provider mismatch when clients send a model name (e.g., `qwen3.5:latest`) without an explicit provider. The fallback paths blindly paired it with `default_provider` (typically `bedrock`), causing `RubyLLM::ModelNotFoundError`. Now infers the correct provider from model naming patterns before falling back to the global default.
+- `arbitrage_fallback` hardcoded `:cloud` tier and `:bedrock` provider when inference failed. Now uses `PROVIDER_TIER` to resolve the correct tier for the inferred provider.
+
+### Added
+- `Router.infer_provider_for_model(model)` — public method that maps model naming patterns to providers. Recognizes Ollama-style models (`:` or `/` in name), Bedrock (`us.*`), OpenAI (`gpt-*`, `o1-*`/`o3-*`/`o4-*`), Anthropic (`claude-*`), and Gemini (`gemini-*`).
+
 ## [0.8.27] - 2026-04-24
 
 ### Fixed
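The 0.8.29 discovery fix above hinges on a detail of Ruby reflection: `Module#instance_method` raises `NameError` when the named method is not defined, which is exactly how the probe excludes providers. A minimal standalone illustration (class and helper names are hypothetical, not taken from the gem):

```ruby
# Sketch of the discovery probe: instance_method raises NameError for a
# missing method, so a provider class without render_embedding_payload is
# skipped by the embedding fallback chain.
class FakeProviderWithEmbeddings
  def render_embedding_payload(_text, model:, dimensions:); end
end

class FakeProviderWithout; end

def embedding_capable?(klass)
  klass.instance_method(:render_embedding_payload)
  true
rescue NameError
  false
end
```

Before the patch, `RubyLLM::Providers::Bedrock` behaved like `FakeProviderWithout` here; defining the method (or the patched equivalent) flips the probe result.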
data/lib/legion/llm/call/bedrock_embeddings.rb ADDED
@@ -0,0 +1,270 @@
+# frozen_string_literal: true
+
+# Monkey-patch RubyLLM's Bedrock provider to support embeddings via
+# Amazon Titan (amazon.titan-embed-text-v1 and v2) and Cohere Embed
+# (cohere.embed-english-v3 / cohere.embed-multilingual-v3).
+#
+# Without this patch, `RubyLLM::Providers::Bedrock` exposes no
+# `render_embedding_payload` method, so the discovery probe
+# (`klass.instance_method(:render_embedding_payload)`) raises NameError
+# and Bedrock is silently excluded from the embedding fallback chain.
+#
+# Companion piece to `call/bedrock_auth.rb` — both use the same
+# bearer-or-SigV4 `signed_post` path and live here (not in lex-bedrock)
+# because lex-bedrock wraps `aws-sdk-bedrockruntime`, not RubyLLM.
+#
+# ─── Upstream tracking ────────────────────────────────────────────
+# This is a deprecation-scheduled shim. The methods below are the
+# kind of thing that eventually belongs in the underlying ruby_llm
+# library's Bedrock provider. Remove this file once upstream ships
+# equivalent support. The short-circuit below renders the patch
+# inert when `render_embedding_payload` is defined natively, so an
+# accidental double-load after an upstream bump is safe.
+# ───────────────────────────────────────────────────────────────────
+#
+# Titan v2 request shape:
+#   POST /model/amazon.titan-embed-text-v2:0/invoke
+#   { "inputText": "...", "dimensions": 1024, "normalize": true }
+#   => { "embedding": [...], "inputTextTokenCount": N }
+#
+# Cohere Embed request shape:
+#   POST /model/cohere.embed-english-v3/invoke
+#   { "texts": ["..."], "input_type": "search_document" }
+#   => { "embeddings": [[...]], ... }
+
+require 'ruby_llm'
+require_relative 'bedrock_auth'
+
+if RubyLLM::Providers::Bedrock.method_defined?(:render_embedding_payload)
+  # Native support landed upstream — patch is inert.
+  Legion::Logging.logger.info('[llm][bedrock_embeddings] native ruby_llm embedding support detected — skipping patch')
+else
+
+module RubyLLM
+  module Providers
+    class Bedrock
+      # Embeddings methods for AWS Bedrock via InvokeModel.
+      #
+      # Public methods are instance methods (not `module_function`) so the
+      # `include Embeddings` at the end of the class body properly overrides
+      # `Provider#embed` via Ruby's method-resolution order.
+      module Embeddings
+        TITAN_V2_PREFIX = 'amazon.titan-embed-text-v2'
+        TITAN_V1_PREFIX = 'amazon.titan-embed-text-v1'
+        COHERE_PREFIX = 'cohere.embed'
+
+        TITAN_ALLOWED_DIMENSIONS = [256, 512, 1024].freeze
+        TITAN_MAX_INPUT_BYTES = 45_000 # ~8k tokens; Titan rejects larger with 400 (and still bills)
+        COHERE_MAX_INPUT_BYTES = 8_192 # Cohere Embed v3 per-text byte budget
+        COHERE_MAX_TEXTS = 96 # Cohere Embed v3 batch limit
+        # Bedrock model IDs use only alphanumerics, `.`, `-`, and `:` (e.g.
+        # `amazon.titan-embed-text-v2:0`, `cohere.embed-english-v3`,
+        # `us.anthropic.claude-sonnet-4-6-v1`). Slashes and `..` are rejected
+        # to block path-injection into the `/model/<id>/invoke` URL.
+        MODEL_ID_PATTERN = /\A[a-zA-Z0-9.\-:]+\z/
+
+        # @param model [String, Symbol] Bedrock model id
+        # @return [String] InvokeModel URL path
+        # @raise [RubyLLM::Error] if model id contains unsafe characters
+        def embedding_url(model:)
+          raise RubyLLM::Error.new(nil, "Invalid Bedrock model id: #{model.inspect}") \
+            unless model.to_s.match?(MODEL_ID_PATTERN)
+
+          "/model/#{model}/invoke"
+        end
+
+        # @param text [String, Array<String>]
+        # @param model [String] Bedrock embedding model id
+        # @param dimensions [Integer, nil] Titan v2 only; one of {256, 512, 1024}
+        # @return [Hash] JSON-serializable request payload
+        # @raise [RubyLLM::Error] on unsupported model, oversize input, or invalid dimensions
+        def render_embedding_payload(text, model:, dimensions:)
+          model_str = model.to_s
+
+          if model_str.start_with?(TITAN_V2_PREFIX)
+            titan_v2_payload(text, dimensions: dimensions)
+          elsif model_str.start_with?(TITAN_V1_PREFIX)
+            titan_v1_payload(text)
+          elsif model_str.start_with?(COHERE_PREFIX)
+            cohere_payload(text)
+          else
+            raise RubyLLM::Error.new(
+              nil,
+              "Bedrock model '#{model}' is not supported for embeddings. " \
+              'Supported prefixes: amazon.titan-embed-text-v1, ' \
+              'amazon.titan-embed-text-v2, cohere.embed-*.'
+            )
+          end
+        end
+
+        # @param response [Faraday::Response]
+        # @param model [String]
+        # @param text [String, Array<String>] original input (used for shape decisions)
+        # @return [RubyLLM::Embedding]
+        # @raise [RubyLLM::Error] if the response carried no vector
+        def parse_embedding_response(response, model:, text:)
+          body = response.body
+          body = try_parse_json(body) if body.is_a?(String)
+
+          vectors =
+            if model.to_s.start_with?(COHERE_PREFIX)
+              Array(body['embeddings'])
+            else
+              # Titan single-text response: the single vector lives in :embedding.
+              # Batch callers are handled in `embed` via iteration.
+              [body['embedding']].compact
+            end
+
+          raise RubyLLM::Error.new(response, "Empty embedding response for model #{model}") if vectors.empty?
+
+          vectors = vectors.first if vectors.length == 1 && !text.is_a?(Array)
+          input_tokens = body['inputTextTokenCount'] ||
+                         body.dig('meta', 'billed_units', 'input_tokens') ||
+                         0
+
+          RubyLLM::Embedding.new(vectors: vectors, model: model, input_tokens: input_tokens)
+        end
+
+        # Override the base `embed` method so signing headers are applied.
+        #
+        # The parent `Provider#embed` calls `@connection.post(url, payload)` directly,
+        # which would skip both bearer-token and SigV4 auth for Bedrock. We go through
+        # `invoke_embedding`, which mirrors `signed_post` but parses responses with
+        # `parse_embedding_response` (not `parse_completion_response`).
+        #
+        # Titan accepts a single text per invocation. When an Array is passed to a
+        # Titan model, we iterate via `embed_titan_batch`, which traps per-element
+        # failures so one 429 mid-batch does not lose preceding successes.
+        #
+        # @param text [String, Array<String>]
+        # @param model [String]
+        # @param dimensions [Integer, nil]
+        # @return [RubyLLM::Embedding]
+        def embed(text, model:, dimensions:)
+          return embed_titan_batch(text, model: model, dimensions: dimensions) \
+            if text.is_a?(Array) && !model.to_s.start_with?(COHERE_PREFIX)
+
+          payload = render_embedding_payload(text, model: model, dimensions: dimensions)
+          url = embedding_url(model: model)
+          response = invoke_embedding(url, payload)
+          parse_embedding_response(response, model: model, text: text)
+        end
+
+        private
+
+        def titan_v2_payload(text, dimensions:)
+          raise RubyLLM::Error.new(nil, 'Titan v2 embeddings accept a single string per invocation.') \
+            if text.is_a?(Array)
+
+          enforce_input_size!(text, TITAN_MAX_INPUT_BYTES, 'Titan v2')
+
+          payload = { inputText: text.to_s, normalize: true }
+          dim = dimensions&.to_i
+          if dim
+            unless TITAN_ALLOWED_DIMENSIONS.include?(dim)
+              raise RubyLLM::Error.new(
+                nil,
+                "Titan v2 dimensions must be one of #{TITAN_ALLOWED_DIMENSIONS.inspect}, got #{dim}"
+              )
+            end
+            payload[:dimensions] = dim
+          end
+          payload
+        end
+
+        def titan_v1_payload(text)
+          raise RubyLLM::Error.new(nil, 'Titan v1 embeddings accept a single string per invocation.') \
+            if text.is_a?(Array)
+
+          enforce_input_size!(text, TITAN_MAX_INPUT_BYTES, 'Titan v1')
+          { inputText: text.to_s }
+        end
+
+        def cohere_payload(text)
+          texts = Array(text).map(&:to_s)
+          raise RubyLLM::Error.new(nil, "Cohere Embed batch size #{texts.size} exceeds max #{COHERE_MAX_TEXTS}") \
+            if texts.size > COHERE_MAX_TEXTS
+
+          texts.each { |t| enforce_input_size!(t, COHERE_MAX_INPUT_BYTES, 'Cohere Embed') }
+
+          { texts: texts, input_type: 'search_document' }
+        end
+
+        def enforce_input_size!(text, max_bytes, model_name)
+          bytes = text.to_s.bytesize
+          return if bytes <= max_bytes
+
+          raise RubyLLM::Error.new(
+            nil,
+            "#{model_name} input too large: #{bytes} bytes exceeds max #{max_bytes}. " \
+            'Caller must chunk before embedding.'
+          )
+        end
+
+        # Mirror of `signed_post` for embeddings: pre-serializes the body so the
+        # SigV4 signature matches the bytes Faraday actually sends. `@connection.post`
+        # is `RubyLLM::Connection#post(url, payload)` which requires both args, so we
+        # pass `payload` to satisfy the arity but override `req.body = body` in the
+        # block — the block runs after middleware, so the pre-serialized bytes win
+        # over whatever JSON middleware would have produced.
+        def invoke_embedding(url, payload)
+          body = Legion::JSON.dump(payload)
+          headers = sign_headers('POST', url, body)
+
+          @connection.post(url, payload) do |req|
+            req.headers.merge!(headers)
+            req.body = body
+          end
+        end
+
+        # Per-item trap-and-continue for Titan batch. Returns a combined Embedding
+        # whose `vectors` is an Array of [Float] per input index, with `nil` entries
+        # for failed slots. Token count aggregates successful calls.
+        #
+        # Raises only when every element failed — otherwise logs failures via
+        # `RubyLLM.logger` and returns partial results so callers keep the paid-for
+        # vectors. Idiomatic for this file because we are inside the RubyLLM
+        # namespace; Legion-side batch orchestration lives in
+        # `Legion::LLM::Call::Embeddings.generate_batch`.
+        def embed_titan_batch(texts, model:, dimensions:)
+          vectors = []
+          token_total = 0
+          failures = []
+
+          texts.each_with_index do |text, idx|
+            single = embed(text.to_s, model: model, dimensions: dimensions)
+            vectors << Array(single.vectors).first
+            token_total += single.input_tokens.to_i
+          rescue StandardError => e
+            vectors << nil
+            failures << { index: idx, error: e.class.name, message: e.message }
+          end
+
+          unless failures.empty?
+            RubyLLM.logger.warn(
+              '[bedrock_embeddings] Titan batch partial failure: ' \
+              "#{failures.size}/#{texts.size} model=#{model}"
+            )
+            failures.each do |f|
+              RubyLLM.logger.debug(
+                "[bedrock_embeddings] batch item index=#{f[:index]} error=#{f[:error]} message=#{f[:message]}"
+              )
+            end
+          end
+
+          if failures.size == texts.size
+            raise RubyLLM::Error.new(
+              nil,
+              "All #{texts.size} Titan batch items failed. First error: #{failures.first[:message]}"
+            )
+          end
+
+          RubyLLM::Embedding.new(vectors: vectors, model: model, input_tokens: token_total)
+        end
+      end
+
+      include Embeddings
+    end
+  end
+end
+end
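The trap-and-continue contract of `embed_titan_batch` above can be reduced to a standalone sketch. The method name and return shape are simplified for illustration (the gem returns a `RubyLLM::Embedding`, not a Hash), but the semantics match: failed items leave `nil` slots that preserve index alignment, and the batch raises only when every item fails.

```ruby
# Standalone sketch of trap-and-continue batch embedding: one failure mid-batch
# does not discard the vectors already computed (and paid for).
def embed_batch(texts, &embed_one)
  vectors  = []
  failures = []

  texts.each_with_index do |text, idx|
    vectors << embed_one.call(text)
  rescue StandardError => e
    vectors << nil                                      # keep index alignment
    failures << { index: idx, message: e.message }
  end

  raise "All #{texts.size} items failed" if failures.size == texts.size

  { vectors: vectors, failures: failures }
end
```

A caller that hits one rate-limit error out of three texts still gets two vectors back, plus a failure record it can retry selectively.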
@@ -328,7 +328,9 @@ module Legion
       end
     end
 
-    @resolved_provider = provider || Legion::LLM.settings[:default_provider]
+    @resolved_provider = provider ||
+                         (model && Router.infer_provider_for_model(model)) ||
+                         Legion::LLM.settings[:default_provider]
     @resolved_model = model || Legion::LLM.settings[:default_model]
 
     log.info "[llm][inference] resolved provider=#{@resolved_provider} model=#{@resolved_model}"
@@ -846,6 +848,8 @@ module Legion
     duration_ms = started_at ? ((finished_at - started_at) * 1000).round : nil
 
     result_str = (raw.is_a?(String) ? raw : raw.to_s)
+    result_str = result_str.encode('UTF-8', invalid: :replace, undef: :replace, replace: '�') unless result_str.valid_encoding?
+    result_str = result_str.delete("\x00")
     is_error = raw.is_a?(Hash) && (raw[:error] || raw['error']) ? true : false
 
     @pending_tool_history_mutex.synchronize do
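The two sanitization lines added in this hunk amount to a small helper. A standalone version (the name `sanitize_result_string` is hypothetical; the gem applies the steps inline): coerce to String, scrub invalid UTF-8 byte sequences to U+FFFD via `String#encode` with `invalid:/undef: :replace`, then strip NUL bytes.

```ruby
# Sketch of the result-string sanitization: invalid UTF-8 is replaced with
# the replacement character, and embedded NULs are removed.
def sanitize_result_string(raw)
  s = raw.is_a?(String) ? raw : raw.to_s
  s = s.encode('UTF-8', invalid: :replace, undef: :replace, replace: '�') unless s.valid_encoding?
  s.delete("\x00")
end
```

The `valid_encoding?` guard matters: `encode` to the same encoding is a no-op for already-valid strings, so the scrub only runs when the bytes are actually broken.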
@@ -496,7 +496,8 @@ module Legion
     end
 
     model ||= Legion::LLM.settings[:default_model]
-    provider ||= Legion::LLM.settings[:default_provider]
+    provider ||= (model && Router.infer_provider_for_model(model)) ||
+                 Legion::LLM.settings[:default_provider]
 
     opts = {}
     opts[:model] = model if model
@@ -18,7 +18,22 @@ module Legion
                       gemini: :cloud, azure: :cloud, ollama: :local, vllm: :local }.freeze
     PROVIDER_ORDER = %i[ollama vllm bedrock azure gemini anthropic openai].freeze
 
+    OLLAMA_MODEL_PATTERN = %r{[:/]}
+
     class << self
+      def infer_provider_for_model(model)
+        return nil if model.nil? || model.to_s.empty?
+
+        model_s = model.to_s
+        return :bedrock if model_s.start_with?('us.')
+        return :openai if model_s.match?(/\Agpt-|\Ao[134]-/)
+        return :anthropic if model_s.start_with?('claude-')
+        return :gemini if model_s.start_with?('gemini-')
+        return :ollama if model_s.match?(OLLAMA_MODEL_PATTERN)
+
+        nil
+      end
+
       # Resolve an LLM routing intent to a tier/provider/model decision.
       #
       # @param intent [Hash, nil] routing intent (capability, privacy, etc.)
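The pattern rules above can be exercised with a standalone re-implementation (illustration only, not the gem's `Router` class). Note that check order matters: `us.*` is tested before the Ollama `[:/]` pattern, so a Bedrock cross-region id containing dots is not misrouted, and `qwen3.5:latest` falls through to the tag check.

```ruby
# Standalone copy of the infer_provider_for_model naming rules, for illustration.
def infer_provider(model)
  return nil if model.nil? || model.to_s.empty?

  m = model.to_s
  return :bedrock   if m.start_with?('us.')          # Bedrock cross-region ids
  return :openai    if m.match?(/\Agpt-|\Ao[134]-/)  # gpt-*, o1-/o3-/o4-*
  return :anthropic if m.start_with?('claude-')
  return :gemini    if m.start_with?('gemini-')
  return :ollama    if m.match?(%r{[:/]})            # Ollama tag or registry path
  nil
end
```

Unrecognized names return `nil`, which is what lets the callers fall through to `Legion::LLM.settings[:default_provider]`.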
@@ -95,18 +110,12 @@ module Legion
       model = Arbitrage.cheapest_for(capability: capability)
       return nil unless model
 
-      provider = Arbitrage.cost_table[model] ? infer_provider(model) : nil
-      log.debug("Router: arbitrage fallback selected model=#{model}")
-      Resolution.new(tier: :cloud, provider: provider || :bedrock, model: model, rule: 'arbitrage_fallback')
-    end
-
-    def infer_provider(model)
-      return :ollama if model.include?('llama')
-      return :bedrock if model.start_with?('us.')
-      return :openai if model.start_with?('gpt')
-      return :google if model.start_with?('gemini')
+      provider = infer_provider_for_model(model)
+      return nil unless provider
 
-      :anthropic if model.start_with?('claude')
+      tier = PROVIDER_TIER.fetch(provider, :cloud)
+      log.debug("Router: arbitrage fallback selected model=#{model} provider=#{provider} tier=#{tier}")
+      Resolution.new(tier: tier, provider: provider, model: model, rule: 'arbitrage_fallback')
     end
 
     def explicit_resolution(tier, provider, model)
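The tier fix above replaces a hardcoded `:cloud`/`:bedrock` pair with a table lookup. A sketch of that lookup (the hash contents here are an assumption; only the tail of `PROVIDER_TIER` is visible in this diff, and `fetch` with a default makes the missing entries moot):

```ruby
# Sketch of tier resolution via PROVIDER_TIER.fetch: local providers resolve
# to :local, everything else (including unknown providers) defaults to :cloud.
PROVIDER_TIER = { bedrock: :cloud, gemini: :cloud, azure: :cloud,
                  ollama: :local, vllm: :local }.freeze

def tier_for(provider)
  PROVIDER_TIER.fetch(provider, :cloud)
end
```

With this in place, an inferred `:ollama` provider no longer gets mislabeled as a `:cloud` resolution by the arbitrage fallback.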
@@ -2,6 +2,6 @@
 
 module Legion
   module LLM
-    VERSION = '0.8.27'
+    VERSION = '0.8.29'
  end
 end
data/lib/legion/llm.rb CHANGED
@@ -15,6 +15,7 @@ require_relative 'llm/call/embeddings'
 require_relative 'llm/call/structured_output'
 require_relative 'llm/call/daemon_client'
 require_relative 'llm/call/bedrock_auth'
+require_relative 'llm/call/bedrock_embeddings'
 require_relative 'llm/call/claude_config_loader'
 require_relative 'llm/call/codex_config_loader'
 require_relative 'llm/router'
metadata CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: legion-llm
 version: !ruby/object:Gem::Version
-  version: 0.8.27
+  version: 0.8.29
 platform: ruby
 authors:
 - Esity
@@ -246,6 +246,7 @@ files:
 - lib/legion/llm/cache/response.rb
 - lib/legion/llm/call.rb
 - lib/legion/llm/call/bedrock_auth.rb
+- lib/legion/llm/call/bedrock_embeddings.rb
 - lib/legion/llm/call/claude_config_loader.rb
 - lib/legion/llm/call/codex_config_loader.rb
 - lib/legion/llm/call/daemon_client.rb