legion-llm 0.8.27 → 0.8.29

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 2cccf9351fd9f4db59b1548197bf7b78c5947e85183535e86ede5c3359d71b89
-  data.tar.gz: 14e3a1b5b6648bea618941f84473e63aeccee7edc9520366d09dde8d27b00a7b
+  metadata.gz: 4ec77012ba08ec5ed5cb8fd544fca1a28ee8993b5136852f647be4f7e8725309
+  data.tar.gz: ed78d65c4966669e008c853c89983a0baa1f1fd8f28a3510bc8461544f9ead5c
 SHA512:
-  metadata.gz: 31ec279fcb498e5cc3308bcefcb6adc94915c36867967fd08aa3f0422d4c583f83bc0ace1db9a47ecd45371a0ce0c82542fea7015c1474f5b7386409d789e5e0
-  data.tar.gz: '083bd8e581399a574b313a31784eacca4424fadb131f82cd17a5a7e840420da14ec3e5f589efa88b7be6be8a76916439c88bbf936a6d13c9a9435bd8fd04245c'
+  metadata.gz: 2a33fd3d2b5dcd7c36e11ef5d1715d03f71256c1826967cf987e1665443bed0123d9473e178f034e9aa6a6f62ab72b8a1608b6b187ee3554882d8aacc98ded04
+  data.tar.gz: 3c701ef336fbb0695819860bf3f68c108887d040ccd960dddb944ffc3fcfabc2ab52dc3d0194384530756bf6b8891d0f6d091bbae7b4cdd9db62024f7bd9874e
data/CHANGELOG.md CHANGED
@@ -1,5 +1,26 @@
 # Legion LLM Changelog
 
+## [0.8.29] - 2026-04-27
+
+### Added
+- Bedrock embedding support via `call/bedrock_embeddings.rb` — a `RubyLLM::Providers::Bedrock` monkey-patch (same pattern as `bedrock_auth.rb`) that implements `render_embedding_payload`, `embedding_url`, `parse_embedding_response`, and overrides `embed` for signed transport. Covers Amazon Titan v1, Titan v2 (selectable 256/512/1024 dimensions), and Cohere Embed v3 (English + multilingual).
+- Short-circuit guard: when ruby_llm eventually ships native `render_embedding_payload`, the patch becomes inert rather than double-loading the method.
+- Trap-and-continue batch semantics for Titan (which is single-text-per-call): `embed_titan_batch` iterates client-side, preserves partial successes on mid-batch failures, logs the failure count via `RubyLLM.logger.warn`, and only raises when 100% of inputs fail.
+- Input-size guards: Titan rejects >8k tokens with a billable 400 — we now raise a descriptive `RubyLLM::Error` at ≥45,000 bytes before the wire call. Cohere enforces the 96-texts / 8 KB-per-text documented limits.
+- Full spec coverage in `spec/legion/llm/bedrock_embeddings_spec.rb` (probe contract, per-model payload shapes, dimension validation, batch limits, error paths).
+
+### Fixed
+- `Legion::LLM::Discovery.find_embedding_provider` can now actually resolve Bedrock when it is the configured fallback. Previously, the discovery probe (`klass.instance_method(:render_embedding_payload)`) raised `NameError` for Bedrock and the fallback chain skipped past it with `[llm][discovery] no embedding provider available` — even when Bedrock was the only reachable embedding provider.
+
+## [0.8.28] - 2026-04-24
+
+### Fixed
+- Model/provider mismatch when clients send a model name (e.g., `qwen3.5:latest`) without an explicit provider. The fallback paths blindly paired it with `default_provider` (typically `bedrock`), causing `RubyLLM::ModelNotFoundError`. Now infers the correct provider from model naming patterns before falling back to the global default.
+- `arbitrage_fallback` hardcoded `:cloud` tier and `:bedrock` provider when inference failed. Now uses `PROVIDER_TIER` to resolve the correct tier for the inferred provider.
+
+### Added
+- `Router.infer_provider_for_model(model)` — public method that maps model naming patterns to providers. Recognizes Ollama-style models (`:` or `/` in name), Bedrock (`us.*`), OpenAI (`gpt-*`, `o1-*`/`o3-*`/`o4-*`), Anthropic (`claude-*`), and Gemini (`gemini-*`).
+
 ## [0.8.27] - 2026-04-24
 
 ### Fixed
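The 0.8.29 discovery fix above hinges on a detail of Ruby reflection: `Module#instance_method` raises `NameError` when the named method is not defined, which is exactly how the probe excludes providers. A minimal standalone illustration (class and helper names are hypothetical, not taken from the gem):

```ruby
# Sketch of the discovery probe: instance_method raises NameError for a
# missing method, so a provider class without render_embedding_payload is
# skipped by the embedding fallback chain.
class FakeProviderWithEmbeddings
  def render_embedding_payload(_text, model:, dimensions:); end
end

class FakeProviderWithout; end

def embedding_capable?(klass)
  klass.instance_method(:render_embedding_payload)
  true
rescue NameError
  false
end
```

Before the patch, `RubyLLM::Providers::Bedrock` behaved like `FakeProviderWithout` here; defining the method (or the patched equivalent) flips the probe result.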
data/lib/legion/llm/call/bedrock_embeddings.rb ADDED
@@ -0,0 +1,270 @@
+# frozen_string_literal: true
+
+# Monkey-patch RubyLLM's Bedrock provider to support embeddings via
+# Amazon Titan (amazon.titan-embed-text-v1 and v2) and Cohere Embed
+# (cohere.embed-english-v3 / cohere.embed-multilingual-v3).
+#
+# Without this patch, `RubyLLM::Providers::Bedrock` exposes no
+# `render_embedding_payload` method, so the discovery probe
+# (`klass.instance_method(:render_embedding_payload)`) raises NameError
+# and Bedrock is silently excluded from the embedding fallback chain.
+#
+# Companion piece to `call/bedrock_auth.rb` — both use the same
+# bearer-or-SigV4 `signed_post` path and live here (not in lex-bedrock)
+# because lex-bedrock wraps `aws-sdk-bedrockruntime`, not RubyLLM.
+#
+# ─── Upstream tracking ────────────────────────────────────────────
+# This is a deprecation-scheduled shim. The methods below are the
+# kind of thing that eventually belongs in the underlying ruby_llm
+# library's Bedrock provider. Remove this file once upstream ships
+# equivalent support. The short-circuit below renders the patch
+# inert when `render_embedding_payload` is defined natively, so an
+# accidental double-load after an upstream bump is safe.
+# ───────────────────────────────────────────────────────────────────
+#
+# Titan v2 request shape:
+#   POST /model/amazon.titan-embed-text-v2:0/invoke
+#   { "inputText": "...", "dimensions": 1024, "normalize": true }
+#   => { "embedding": [...], "inputTextTokenCount": N }
+#
+# Cohere Embed request shape:
+#   POST /model/cohere.embed-english-v3/invoke
+#   { "texts": ["..."], "input_type": "search_document" }
+#   => { "embeddings": [[...]], ... }
+
+require 'ruby_llm'
+require_relative 'bedrock_auth'
+
+if RubyLLM::Providers::Bedrock.method_defined?(:render_embedding_payload)
+  # Native support landed upstream — patch is inert.
+  Legion::Logging.logger.info('[llm][bedrock_embeddings] native ruby_llm embedding support detected — skipping patch')
+else
+
+module RubyLLM
+  module Providers
+    class Bedrock
+      # Embeddings methods for AWS Bedrock via InvokeModel.
+      #
+      # Public methods are instance methods (not `module_function`) so the
+      # `include Embeddings` at the end of the class body properly overrides
+      # `Provider#embed` via Ruby's method-resolution order.
+      module Embeddings
+        TITAN_V2_PREFIX = 'amazon.titan-embed-text-v2'
+        TITAN_V1_PREFIX = 'amazon.titan-embed-text-v1'
+        COHERE_PREFIX = 'cohere.embed'
+
+        TITAN_ALLOWED_DIMENSIONS = [256, 512, 1024].freeze
+        TITAN_MAX_INPUT_BYTES = 45_000 # ~8k tokens; Titan rejects larger with 400 (and still bills)
+        COHERE_MAX_INPUT_BYTES = 8_192 # Cohere Embed v3 per-text byte budget
+        COHERE_MAX_TEXTS = 96 # Cohere Embed v3 batch limit
+        # Bedrock model IDs use only alphanumerics, `.`, `-`, and `:` (e.g.
+        # `amazon.titan-embed-text-v2:0`, `cohere.embed-english-v3`,
+        # `us.anthropic.claude-sonnet-4-6-v1`). Slashes and `..` are rejected
+        # to block path-injection into the `/model/<id>/invoke` URL.
+        MODEL_ID_PATTERN = /\A[a-zA-Z0-9.\-:]+\z/
+
+        # @param model [String, Symbol] Bedrock model id
+        # @return [String] InvokeModel URL path
+        # @raise [RubyLLM::Error] if model id contains unsafe characters
+        def embedding_url(model:)
+          raise RubyLLM::Error.new(nil, "Invalid Bedrock model id: #{model.inspect}") \
+            unless model.to_s.match?(MODEL_ID_PATTERN)
+
+          "/model/#{model}/invoke"
+        end
+
+        # @param text [String, Array<String>]
+        # @param model [String] Bedrock embedding model id
+        # @param dimensions [Integer, nil] Titan v2 only; one of {256, 512, 1024}
+        # @return [Hash] JSON-serializable request payload
+        # @raise [RubyLLM::Error] on unsupported model, oversize input, or invalid dimensions
+        def render_embedding_payload(text, model:, dimensions:)
+          model_str = model.to_s
+
+          if model_str.start_with?(TITAN_V2_PREFIX)
+            titan_v2_payload(text, dimensions: dimensions)
+          elsif model_str.start_with?(TITAN_V1_PREFIX)
+            titan_v1_payload(text)
+          elsif model_str.start_with?(COHERE_PREFIX)
+            cohere_payload(text)
+          else
+            raise RubyLLM::Error.new(
+              nil,
+              "Bedrock model '#{model}' is not supported for embeddings. " \
+              'Supported prefixes: amazon.titan-embed-text-v1, ' \
+              'amazon.titan-embed-text-v2, cohere.embed-*.'
+            )
+          end
+        end
+
+        # @param response [Faraday::Response]
+        # @param model [String]
+        # @param text [String, Array<String>] original input (used for shape decisions)
+        # @return [RubyLLM::Embedding]
+        # @raise [RubyLLM::Error] if the response carried no vector
+        def parse_embedding_response(response, model:, text:)
+          body = response.body
+          body = try_parse_json(body) if body.is_a?(String)
+
+          vectors =
+            if model.to_s.start_with?(COHERE_PREFIX)
+              Array(body['embeddings'])
+            else
+              # Titan single-text response: the single vector lives in :embedding.
+              # Batch callers are handled in `embed` via iteration.
+              [body['embedding']].compact
+            end
+
+          raise RubyLLM::Error.new(response, "Empty embedding response for model #{model}") if vectors.empty?
+
+          vectors = vectors.first if vectors.length == 1 && !text.is_a?(Array)
+          input_tokens = body['inputTextTokenCount'] ||
+                         body.dig('meta', 'billed_units', 'input_tokens') ||
+                         0
+
+          RubyLLM::Embedding.new(vectors: vectors, model: model, input_tokens: input_tokens)
+        end
+
+        # Override the base `embed` method so signing headers are applied.
+        #
+        # The parent `Provider#embed` calls `@connection.post(url, payload)` directly,
+        # which would skip both bearer-token and SigV4 auth for Bedrock. We go through
+        # `invoke_embedding`, which mirrors `signed_post` but parses responses with
+        # `parse_embedding_response` (not `parse_completion_response`).
+        #
+        # Titan accepts a single text per invocation. When an Array is passed to a
+        # Titan model, we iterate via `embed_titan_batch`, which traps per-element
+        # failures so one 429 mid-batch does not lose preceding successes.
+        #
+        # @param text [String, Array<String>]
+        # @param model [String]
+        # @param dimensions [Integer, nil]
+        # @return [RubyLLM::Embedding]
+        def embed(text, model:, dimensions:)
+          return embed_titan_batch(text, model: model, dimensions: dimensions) \
+            if text.is_a?(Array) && !model.to_s.start_with?(COHERE_PREFIX)
+
+          payload = render_embedding_payload(text, model: model, dimensions: dimensions)
+          url = embedding_url(model: model)
+          response = invoke_embedding(url, payload)
+          parse_embedding_response(response, model: model, text: text)
+        end
+
+        private
+
+        def titan_v2_payload(text, dimensions:)
+          raise RubyLLM::Error.new(nil, 'Titan v2 embeddings accept a single string per invocation.') \
+            if text.is_a?(Array)
+
+          enforce_input_size!(text, TITAN_MAX_INPUT_BYTES, 'Titan v2')
+
+          payload = { inputText: text.to_s, normalize: true }
+          dim = dimensions&.to_i
+          if dim
+            unless TITAN_ALLOWED_DIMENSIONS.include?(dim)
+              raise RubyLLM::Error.new(
+                nil,
+                "Titan v2 dimensions must be one of #{TITAN_ALLOWED_DIMENSIONS.inspect}, got #{dim}"
+              )
+            end
+            payload[:dimensions] = dim
+          end
+          payload
+        end
+
+        def titan_v1_payload(text)
+          raise RubyLLM::Error.new(nil, 'Titan v1 embeddings accept a single string per invocation.') \
+            if text.is_a?(Array)
+
+          enforce_input_size!(text, TITAN_MAX_INPUT_BYTES, 'Titan v1')
+          { inputText: text.to_s }
+        end
+
+        def cohere_payload(text)
+          texts = Array(text).map(&:to_s)
+          raise RubyLLM::Error.new(nil, "Cohere Embed batch size #{texts.size} exceeds max #{COHERE_MAX_TEXTS}") \
+            if texts.size > COHERE_MAX_TEXTS
+
+          texts.each { |t| enforce_input_size!(t, COHERE_MAX_INPUT_BYTES, 'Cohere Embed') }
+
+          { texts: texts, input_type: 'search_document' }
+        end
+
+        def enforce_input_size!(text, max_bytes, model_name)
+          bytes = text.to_s.bytesize
+          return if bytes <= max_bytes
+
+          raise RubyLLM::Error.new(
+            nil,
+            "#{model_name} input too large: #{bytes} bytes exceeds max #{max_bytes}. " \
+            'Caller must chunk before embedding.'
+          )
+        end
+
+        # Mirror of `signed_post` for embeddings: pre-serializes the body so the
+        # SigV4 signature matches the bytes Faraday actually sends. `@connection.post`
+        # is `RubyLLM::Connection#post(url, payload)` which requires both args, so we
+        # pass `payload` to satisfy the arity but override `req.body = body` in the
+        # block — the block runs after middleware, so the pre-serialized bytes win
+        # over whatever JSON middleware would have produced.
+        def invoke_embedding(url, payload)
+          body = Legion::JSON.dump(payload)
+          headers = sign_headers('POST', url, body)
+
+          @connection.post(url, payload) do |req|
+            req.headers.merge!(headers)
+            req.body = body
+          end
+        end
+
+        # Per-item trap-and-continue for Titan batch. Returns a combined Embedding
+        # whose `vectors` is an Array of [Float] per input index, with `nil` entries
+        # for failed slots. Token count aggregates successful calls.
+        #
+        # Raises only when every element failed — otherwise logs failures via
+        # `RubyLLM.logger` and returns partial results so callers keep the paid-for
+        # vectors. Idiomatic for this file because we are inside the RubyLLM
+        # namespace; Legion-side batch orchestration lives in
+        # `Legion::LLM::Call::Embeddings.generate_batch`.
+        def embed_titan_batch(texts, model:, dimensions:)
+          vectors = []
+          token_total = 0
+          failures = []
+
+          texts.each_with_index do |text, idx|
+            single = embed(text.to_s, model: model, dimensions: dimensions)
+            vectors << Array(single.vectors).first
+            token_total += single.input_tokens.to_i
+          rescue StandardError => e
+            vectors << nil
+            failures << { index: idx, error: e.class.name, message: e.message }
+          end
+
+          unless failures.empty?
+            RubyLLM.logger.warn(
+              '[bedrock_embeddings] Titan batch partial failure: ' \
+              "#{failures.size}/#{texts.size} model=#{model}"
+            )
+            failures.each do |f|
+              RubyLLM.logger.debug(
+                "[bedrock_embeddings] batch item index=#{f[:index]} error=#{f[:error]} message=#{f[:message]}"
+              )
+            end
+          end
+
+          if failures.size == texts.size
+            raise RubyLLM::Error.new(
+              nil,
+              "All #{texts.size} Titan batch items failed. First error: #{failures.first[:message]}"
+            )
+          end
+
+          RubyLLM::Embedding.new(vectors: vectors, model: model, input_tokens: token_total)
+        end
+      end
+
+      include Embeddings
+    end
+  end
+end
+end
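The trap-and-continue contract of `embed_titan_batch` above can be reduced to a standalone sketch. The method name and return shape are simplified for illustration (the gem returns a `RubyLLM::Embedding`, not a Hash), but the semantics match: failed items leave `nil` slots that preserve index alignment, and the batch raises only when every item fails.

```ruby
# Standalone sketch of trap-and-continue batch embedding: one failure mid-batch
# does not discard the vectors already computed (and paid for).
def embed_batch(texts, &embed_one)
  vectors  = []
  failures = []

  texts.each_with_index do |text, idx|
    vectors << embed_one.call(text)
  rescue StandardError => e
    vectors << nil                                      # keep index alignment
    failures << { index: idx, message: e.message }
  end

  raise "All #{texts.size} items failed" if failures.size == texts.size

  { vectors: vectors, failures: failures }
end
```

A caller that hits one rate-limit error out of three texts still gets two vectors back, plus a failure record it can retry selectively.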
@@ -328,7 +328,9 @@ module Legion
       end
     end
 
-    @resolved_provider = provider || Legion::LLM.settings[:default_provider]
+    @resolved_provider = provider ||
+                         (model && Router.infer_provider_for_model(model)) ||
+                         Legion::LLM.settings[:default_provider]
     @resolved_model = model || Legion::LLM.settings[:default_model]
 
     log.info "[llm][inference] resolved provider=#{@resolved_provider} model=#{@resolved_model}"
@@ -846,6 +848,8 @@ module Legion
     duration_ms = started_at ? ((finished_at - started_at) * 1000).round : nil
 
     result_str = (raw.is_a?(String) ? raw : raw.to_s)
+    result_str = result_str.encode('UTF-8', invalid: :replace, undef: :replace, replace: '�') unless result_str.valid_encoding?
+    result_str = result_str.delete("\x00")
     is_error = raw.is_a?(Hash) && (raw[:error] || raw['error']) ? true : false
 
     @pending_tool_history_mutex.synchronize do
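The two sanitization lines added in this hunk amount to a small helper. A standalone version (the name `sanitize_result_string` is hypothetical; the gem applies the steps inline): coerce to String, scrub invalid UTF-8 byte sequences to U+FFFD via `String#encode` with `invalid:/undef: :replace`, then strip NUL bytes.

```ruby
# Sketch of the result-string sanitization: invalid UTF-8 is replaced with
# the replacement character, and embedded NULs are removed.
def sanitize_result_string(raw)
  s = raw.is_a?(String) ? raw : raw.to_s
  s = s.encode('UTF-8', invalid: :replace, undef: :replace, replace: '�') unless s.valid_encoding?
  s.delete("\x00")
end
```

The `valid_encoding?` guard matters: `encode` to the same encoding is a no-op for already-valid strings, so the scrub only runs when the bytes are actually broken.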
@@ -496,7 +496,8 @@ module Legion
     end
 
     model ||= Legion::LLM.settings[:default_model]
-    provider ||= Legion::LLM.settings[:default_provider]
+    provider ||= (model && Router.infer_provider_for_model(model)) ||
+                 Legion::LLM.settings[:default_provider]
 
     opts = {}
     opts[:model] = model if model
@@ -18,7 +18,22 @@ module Legion
                       gemini: :cloud, azure: :cloud, ollama: :local, vllm: :local }.freeze
     PROVIDER_ORDER = %i[ollama vllm bedrock azure gemini anthropic openai].freeze
 
+    OLLAMA_MODEL_PATTERN = %r{[:/]}
+
     class << self
+      def infer_provider_for_model(model)
+        return nil if model.nil? || model.to_s.empty?
+
+        model_s = model.to_s
+        return :bedrock if model_s.start_with?('us.')
+        return :openai if model_s.match?(/\Agpt-|\Ao[134]-/)
+        return :anthropic if model_s.start_with?('claude-')
+        return :gemini if model_s.start_with?('gemini-')
+        return :ollama if model_s.match?(OLLAMA_MODEL_PATTERN)
+
+        nil
+      end
+
       # Resolve an LLM routing intent to a tier/provider/model decision.
       #
       # @param intent [Hash, nil] routing intent (capability, privacy, etc.)
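The pattern rules above can be exercised with a standalone re-implementation (illustration only, not the gem's `Router` class). Note that check order matters: `us.*` is tested before the Ollama `[:/]` pattern, so a Bedrock cross-region id containing dots is not misrouted, and `qwen3.5:latest` falls through to the tag check.

```ruby
# Standalone copy of the infer_provider_for_model naming rules, for illustration.
def infer_provider(model)
  return nil if model.nil? || model.to_s.empty?

  m = model.to_s
  return :bedrock   if m.start_with?('us.')          # Bedrock cross-region ids
  return :openai    if m.match?(/\Agpt-|\Ao[134]-/)  # gpt-*, o1-/o3-/o4-*
  return :anthropic if m.start_with?('claude-')
  return :gemini    if m.start_with?('gemini-')
  return :ollama    if m.match?(%r{[:/]})            # Ollama tag or registry path
  nil
end
```

Unrecognized names return `nil`, which is what lets the callers fall through to `Legion::LLM.settings[:default_provider]`.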
@@ -95,18 +110,12 @@ module Legion
       model = Arbitrage.cheapest_for(capability: capability)
       return nil unless model
 
-      provider = Arbitrage.cost_table[model] ? infer_provider(model) : nil
-      log.debug("Router: arbitrage fallback selected model=#{model}")
-      Resolution.new(tier: :cloud, provider: provider || :bedrock, model: model, rule: 'arbitrage_fallback')
-    end
-
-    def infer_provider(model)
-      return :ollama if model.include?('llama')
-      return :bedrock if model.start_with?('us.')
-      return :openai if model.start_with?('gpt')
-      return :google if model.start_with?('gemini')
+      provider = infer_provider_for_model(model)
+      return nil unless provider
 
-      :anthropic if model.start_with?('claude')
+      tier = PROVIDER_TIER.fetch(provider, :cloud)
+      log.debug("Router: arbitrage fallback selected model=#{model} provider=#{provider} tier=#{tier}")
+      Resolution.new(tier: tier, provider: provider, model: model, rule: 'arbitrage_fallback')
     end
 
     def explicit_resolution(tier, provider, model)
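The tier fix above replaces a hardcoded `:cloud`/`:bedrock` pair with a table lookup. A sketch of that lookup (the hash contents here are an assumption; only the tail of `PROVIDER_TIER` is visible in this diff, and `fetch` with a default makes the missing entries moot):

```ruby
# Sketch of tier resolution via PROVIDER_TIER.fetch: local providers resolve
# to :local, everything else (including unknown providers) defaults to :cloud.
PROVIDER_TIER = { bedrock: :cloud, gemini: :cloud, azure: :cloud,
                  ollama: :local, vllm: :local }.freeze

def tier_for(provider)
  PROVIDER_TIER.fetch(provider, :cloud)
end
```

With this in place, an inferred `:ollama` provider no longer gets mislabeled as a `:cloud` resolution by the arbitrage fallback.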
@@ -2,6 +2,6 @@
 
 module Legion
   module LLM
-    VERSION = '0.8.27'
+    VERSION = '0.8.29'
  end
 end
data/lib/legion/llm.rb CHANGED
@@ -15,6 +15,7 @@ require_relative 'llm/call/embeddings'
 require_relative 'llm/call/structured_output'
 require_relative 'llm/call/daemon_client'
 require_relative 'llm/call/bedrock_auth'
+require_relative 'llm/call/bedrock_embeddings'
 require_relative 'llm/call/claude_config_loader'
 require_relative 'llm/call/codex_config_loader'
 require_relative 'llm/router'
metadata CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: legion-llm
 version: !ruby/object:Gem::Version
-  version: 0.8.27
+  version: 0.8.29
 platform: ruby
 authors:
 - Esity
@@ -246,6 +246,7 @@ files:
 - lib/legion/llm/cache/response.rb
 - lib/legion/llm/call.rb
 - lib/legion/llm/call/bedrock_auth.rb
+- lib/legion/llm/call/bedrock_embeddings.rb
 - lib/legion/llm/call/claude_config_loader.rb
 - lib/legion/llm/call/codex_config_loader.rb
 - lib/legion/llm/call/daemon_client.rb