legion-llm 0.8.28 → 0.8.30
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +24 -0
- data/README.md +4 -1
- data/legion-llm.gemspec +1 -0
- data/lib/legion/llm/api/native/helpers.rb +81 -9
- data/lib/legion/llm/api/native/inference.rb +5 -2
- data/lib/legion/llm/call/bedrock_embeddings.rb +270 -0
- data/lib/legion/llm/call/providers.rb +47 -28
- data/lib/legion/llm/call/structured_output.rb +15 -3
- data/lib/legion/llm/inference/executor.rb +37 -32
- data/lib/legion/llm/router.rb +1 -1
- data/lib/legion/llm/settings.rb +3 -2
- data/lib/legion/llm/tools/adapter.rb +15 -0
- data/lib/legion/llm/version.rb +1 -1
- data/lib/legion/llm.rb +1 -0
- metadata +16 -24
- data/docs/2026-03-23-pipeline-gap-analysis.md +0 -203
- data/docs/example_settings.json +0 -16
- data/docs/examples/anthropic_request.json +0 -108
- data/docs/examples/anthropic_response.json +0 -90
- data/docs/examples/azure_ai_request.json +0 -103
- data/docs/examples/azure_ai_response.json +0 -91
- data/docs/examples/bedrock_request.json +0 -127
- data/docs/examples/bedrock_response.json +0 -93
- data/docs/examples/gemini_request.json +0 -127
- data/docs/examples/gemini_response.json +0 -109
- data/docs/examples/openai_request.json +0 -100
- data/docs/examples/openai_response.json +0 -77
- data/docs/examples/xai_request.json +0 -93
- data/docs/examples/xai_response.json +0 -48
- data/docs/gas-apollo-idea.md +0 -528
- data/docs/generation-augmented-storage.md +0 -135
- data/docs/llm-schema-spec.md +0 -2816
- data/docs/plans/2026-03-15-ollama-discovery-design.md +0 -164
- data/docs/plans/2026-03-15-ollama-discovery-implementation.md +0 -1147
- data/docs/routing-reenvisioned.md +0 -861
- data/docs/superpowers/plans/2026-04-15-sticky-runners-tool-history.md +0 -1866
- data/docs/superpowers/specs/2026-04-15-sticky-runners-tool-history-design.md +0 -713
- data/legion-llm-0.3.20.gem +0 -0
checksums.yaml
CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: d560457b6321f55371b3dd14d546c8c23d11485b2b3ba5dec218cea028d50399
+  data.tar.gz: 8ee76eba6bf57f592f9d372fec7f8c5372d97c220869f988abe9e1521e766b7b
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 0b81ee44a4f57a8ec0e9eb8aef043bdc543217baade9ff1b2772a46056ebc7a486d4e9acfded4fb05cedf341583cb8aaaa9091bd243c4f26eac9ff494ada3d01
+  data.tar.gz: 2ebf5a44aa5588a635c75ef6dc0b68c6554905d343b125cd944c1f5032399c4c714cad8953030e53ba3bb92efe0233371d5c2ebad12eebc5b4a1e2887a0e7c35
data/CHANGELOG.md
CHANGED

@@ -1,5 +1,29 @@
 # Legion LLM Changelog
 
+## [0.8.30] - 2026-04-27
+
+### Fixed
+- Structured output parsing now strips markdown code fences before JSON parse, including retry responses from models that keep returning fenced JSON.
+- LLM tool adapter dispatch now symbolizes JSON/string-keyed tool arguments before invoking Ruby keyword-argument tool classes.
+- Default routing chains now honor explicit `default_provider` / `default_model` before auto-enabled local providers, preventing Ollama defaults from overriding a configured Bedrock default.
+- Provider credential setup now resolves `env://` placeholders consistently for Bedrock SigV4, Anthropic, OpenAI, Gemini, Azure, and vLLM, and unresolved placeholder arrays no longer auto-enable hosted providers.
+- Native `/api/llm/inference` responses now flatten structured provider content blocks into plain text for both streaming SSE deltas and non-streaming JSON responses, preventing Anthropic/Bedrock-style block arrays from being stored and replayed as nested JSON-looking assistant replies.
+- Native `/api/llm/inference` streaming now emits `thinking-delta` SSE events for provider reasoning chunks without appending those chunks to final assistant content.
+- Native `file_read` client tools now extract text from PDFs via `pdf-reader` and return a clear unsupported-binary message for non-text binary files.
+- Local providers now cap automatically injected registry tools with `llm.tool_trigger.local_tool_limit`, prioritizing trigger-matched tools before always-loaded tools for Ollama/vLLM requests.
+
+## [0.8.29] - 2026-04-27
+
+### Added
+- Bedrock embedding support via `call/bedrock_embeddings.rb` — a `RubyLLM::Providers::Bedrock` monkey-patch (same pattern as `bedrock_auth.rb`) that implements `render_embedding_payload`, `embedding_url`, `parse_embedding_response`, and overrides `embed` for signed transport. Covers Amazon Titan v1, Titan v2 (selectable 256/512/1024 dimensions), and Cohere Embed v3 (English + multilingual).
+- Short-circuit guard: when ruby_llm eventually ships native `render_embedding_payload`, the patch becomes inert rather than double-loading the method.
+- Trap-and-continue batch semantics for Titan (which is single-text-per-call): `embed_titan_batch` iterates client-side, preserves partial successes on mid-batch failures, logs the failure count via `RubyLLM.logger.warn`, and only raises when 100% of inputs fail.
+- Input-size guards: Titan rejects >8k tokens with a billable 400 — we now raise a descriptive `RubyLLM::Error` at ≥45,000 bytes before the wire call. Cohere enforces the 96-texts / 8 KB-per-text documented limits.
+- Full spec coverage in `spec/legion/llm/bedrock_embeddings_spec.rb` (probe contract, per-model payload shapes, dimension validation, batch limits, error paths).
+
+### Fixed
+- `Legion::LLM::Discovery.find_embedding_provider` can now actually resolve Bedrock when it is the configured fallback. Previously, the discovery probe (`klass.instance_method(:render_embedding_payload)`) raised `NameError` for Bedrock and the fallback chain skipped past it with `[llm][discovery] no embedding provider available` — even when Bedrock was the only reachable embedding provider.
+
 ## [0.8.28] - 2026-04-24
 
 ### Fixed
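The fence-stripping fix in 0.8.30 amounts to unwrapping a markdown code block before `JSON.parse`. A minimal standalone sketch of that behavior (`parse_fenced_json` is a hypothetical helper name, not the gem's actual method):

```ruby
require 'json'

# Unwrap a ```json ... ``` (or bare ```) fence that some models keep
# wrapping around structured output, then parse the remaining JSON.
def parse_fenced_json(raw)
  text = raw.to_s.strip
  if text.start_with?('```')
    text = text.sub(/\A```[a-zA-Z0-9_-]*\s*/, '').sub(/```\s*\z/, '')
  end
  JSON.parse(text)
end

parse_fenced_json("```json\n{\"ok\": true}\n```")  # returns the parsed Hash
```

Unfenced responses pass through the same path untouched, which is why the retry case mentioned in the changelog works as well.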
data/README.md
CHANGED

@@ -2,7 +2,7 @@
 
 LLM integration for the [LegionIO](https://github.com/LegionIO/LegionIO) framework. Wraps [ruby_llm](https://github.com/crmne/ruby_llm) to provide chat, embeddings, tool use, and agent capabilities to any Legion extension. Exposes OpenAI- and Anthropic-compatible API endpoints so external tools can point at the Legion daemon and just work.
 
-**Version**: 0.8.
+**Version**: 0.8.30
 
 ## Installation
 
@@ -60,6 +60,7 @@ Requests flow through the full Inference pipeline — routing, metering, audit,
 Both formats supported with correct SSE shapes:
 - **OpenAI**: `data: {"choices":[{"delta":{"content":"..."}}]}` chunks, terminated by `data: [DONE]`
 - **Anthropic**: Typed events — `message_start`, `content_block_start`, `content_block_delta`, `content_block_stop`, `message_delta`, `message_stop`
+- **Native**: `/api/llm/inference` streams `text-delta`, `thinking-delta`, tool lifecycle events, and a final `done` event. Structured provider content blocks are flattened to plain text in both streaming and non-streaming native responses so `content` remains a string for daemon clients.
 
 ### API Authentication
 
@@ -851,6 +852,8 @@ No code changes are needed in consumers immediately. The aliases will be maintai
 | Azure AI | `azure` | `vault://`, `env://`, or direct | Azure OpenAI endpoint; `api_base` + `api_key` or `auth_token` |
 | Ollama | `ollama` | Local, no credentials needed | Local inference |
 
+`env://NAME` credential placeholders resolve at provider configuration time, including array fallbacks such as `["env://OPENAI_API_KEY", "env://CODEX_API_KEY"]`. Unresolved placeholders do not auto-enable hosted providers.
+
 ## Integration with LegionIO
 
 legion-llm follows the standard core gem lifecycle:
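The `env://NAME` semantics documented in the README paragraph above can be sketched as a small resolver (the helper name is hypothetical, not the gem's API): plain strings pass through, placeholders resolve against `ENV`, and arrays fall back to the first entry that resolves.

```ruby
# Resolve a credential value per the documented env:// rules. Unresolved
# placeholders yield nil, which callers treat as "do not enable provider".
def resolve_credential(value)
  case value
  when Array
    value.lazy.map { |entry| resolve_credential(entry) }.find { |resolved| resolved }
  when %r{\Aenv://(.+)\z}
    resolved = ENV[Regexp.last_match(1)]
    resolved unless resolved.to_s.empty?
  else
    value
  end
end

ENV['CODEX_API_KEY'] = 'sk-example'
resolve_credential(['env://MISSING_KEY', 'env://CODEX_API_KEY'])  # falls back to the second entry
```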
data/legion-llm.gemspec
CHANGED

@@ -35,6 +35,7 @@ Gem::Specification.new do |spec|
   spec.add_dependency 'lex-gemini'
   spec.add_dependency 'lex-knowledge'
   spec.add_dependency 'lex-openai'
+  spec.add_dependency 'pdf-reader'
   spec.add_dependency 'ruby_llm', '~> 1.13'
   spec.add_dependency 'tzinfo', '>= 2.0'
 end
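The `pdf-reader` dependency added above backs the new `file_read` PDF path in helpers.rb; files that are neither text nor PDF are rejected by a pure-stdlib sniff. A standalone copy of that heuristic:

```ruby
# NUL bytes or invalid UTF-8 in the first 4 KiB mark content as binary
# (mirrors the binary_content? helper added in this release).
def binary_content?(content)
  return true if content.include?("\x00")

  sample = content.byteslice(0, 4096).to_s
  sample.force_encoding('UTF-8')
  !sample.valid_encoding?
end

binary_content?('plain text')        # => false
binary_content?("\xFF\xFE\x00".b)    # => true
```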
data/lib/legion/llm/api/native/helpers.rb
CHANGED

@@ -10,12 +10,14 @@ module Legion
     module API
       module Native
         module ClientToolMethods
+          include Legion::Logging::Helper
+
           private
 
           def log_tool(level, ref, status, **details)
             parts = ["[tool][#{ref}] #{status}"]
             details.each { |k, v| parts << "#{k}=#{v}" }
-
+            log.public_send(level, parts.join(' '))
           end
 
           def summarize_tool_arg_keys(kwargs)
@@ -37,7 +39,7 @@
             end
           end
 
-          def dispatch_client_tool(ref, **kwargs) # rubocop:disable Metrics/AbcSize,Metrics/CyclomaticComplexity,Metrics/PerceivedComplexity
+          def dispatch_client_tool(ref, **kwargs) # rubocop:disable Metrics/AbcSize,Metrics/CyclomaticComplexity,Metrics/MethodLength,Metrics/PerceivedComplexity
             case ref
             when 'sh'
               cmd = kwargs[:command] || kwargs[:cmd] || kwargs.values.first.to_s
@@ -45,7 +47,7 @@
               "exit=#{status.exitstatus}\n#{output}"
             when 'file_read'
               path = kwargs[:path] || kwargs[:file_path] || kwargs.values.first.to_s
-
+              read_client_file(path)
             when 'file_write'
               path = kwargs[:path] || kwargs[:file_path]
               content = kwargs[:content] || kwargs[:contents]
@@ -82,6 +84,7 @@
                 max_length ? content[0, max_length] : content
               rescue LoadError => e
                 missing = e.respond_to?(:path) && e.path ? e.path : 'legion/cli/chat/web_fetch'
+                handle_exception(e, level: :warn, handled: true, operation: 'llm.api.client_tool.web_fetch', missing: missing)
                 "web_fetch is unavailable: missing optional dependency #{missing}"
               end
             when 'web_search'
@@ -93,6 +96,7 @@
                 results[:results].map { |r| "### #{r[:title]}\n#{r[:url]}\n#{r[:snippet]}" }.join("\n\n")
               rescue LoadError => e
                 missing = e.respond_to?(:path) && e.path ? e.path : 'legion/cli/chat/web_search'
+                handle_exception(e, level: :warn, handled: true, operation: 'llm.api.client_tool.web_search', missing: missing)
                 "web_search is unavailable: missing optional dependency #{missing}"
               end
             else
@@ -100,6 +104,51 @@
             end
           end
 
+          def read_client_file(path)
+            return "File not found: #{path}" unless ::File.exist?(path)
+
+            return read_pdf_text(path) if pdf_file?(path)
+
+            content = ::File.binread(path)
+            return 'Binary file detected, cannot read as text.' if binary_content?(content)
+
+            content.force_encoding('UTF-8')
+            content
+          rescue StandardError => e
+            handle_exception(e, level: :warn, handled: true, operation: 'llm.api.client_tool.file_read', path: path)
+            "file_read error: #{e.message}"
+          end
+
+          def pdf_file?(path)
+            ::File.extname(path).casecmp('.pdf').zero? || ::File.binread(path, 5) == '%PDF-'
+          rescue StandardError => e
+            handle_exception(e, level: :warn, handled: true, operation: 'llm.api.client_tool.pdf_sniff', path: path)
+            false
+          end
+
+          def read_pdf_text(path)
+            require 'pdf-reader' unless defined?(::PDF::Reader)
+
+            reader = ::PDF::Reader.new(path)
+            text = reader.pages.map(&:text).join("\n\n").strip
+            text.empty? ? 'PDF contained no extractable text.' : text
+          rescue LoadError => e
+            missing = e.respond_to?(:path) && e.path ? e.path : 'pdf-reader'
+            handle_exception(e, level: :warn, handled: true, operation: 'llm.api.client_tool.pdf_extract', missing: missing)
+            'PDF text extraction unavailable: missing pdf-reader gem.'
+          rescue StandardError => e
+            handle_exception(e, level: :warn, handled: true, operation: 'llm.api.client_tool.pdf_extract', path: path)
+            "PDF text extraction failed: #{e.message}"
+          end
+
+          def binary_content?(content)
+            return true if content.include?("\x00")
+
+            sample = content.byteslice(0, 4096).to_s
+            sample.force_encoding('UTF-8')
+            !sample.valid_encoding?
+          end
+
           def notify_tool_event(type, ref, **data)
             handler = Thread.current[:legion_tool_event_handler]
             return unless handler
@@ -257,13 +306,14 @@
           rescue StandardError => e
             ms = begin
               ((::Process.clock_gettime(::Process::CLOCK_MONOTONIC) - t0) * 1000).round(1)
-            rescue StandardError
+            rescue StandardError => e
+              handle_exception(e, level: :warn, handled: true,
+                               operation: 'llm.api.client_tool.duration_measurement', tool_ref: tool_ref)
               nil
             end
             log_tool(:error, tool_ref, 'failed', duration_ms: ms, error: e.message)
             notify_tool_event(:tool_error, tool_ref, error: e.message)
-
-                             component_type: :api)
+            handle_exception(e, level: :error, handled: true, operation: "llm.api.client_tool.#{tool_ref}")
             "Tool error: #{e.message}"
           end
         end
@@ -287,6 +337,25 @@
         end
       end
 
+      define_method(:extract_text_content) do |content|
+        case content
+        when nil
+          ''
+        when String
+          content
+        when Array
+          content.filter_map { |entry| extract_text_content(entry) }.join
+        when Hash
+          type = content[:type] || content['type']
+          return '' unless type.nil? || type.to_s == 'text'
+
+          text = content.key?(:text) || content.key?('text') ? (content[:text] || content['text']) : (content[:content] || content['content'])
+          extract_text_content(text)
+        else
+          content.to_s
+        end
+      end
+
       define_method(:emit_sse_event) do |stream, event_name, payload|
         level = event_name == 'text-delta' ? :debug : :info
         log.send(level, "[sse][emit] event=#{event_name} keys=#{payload.is_a?(Hash) ? payload.keys.join(',') : 'n/a'}")
@@ -333,7 +402,8 @@
 
         kerb = begin
           Legion::Settings.dig(:kerberos, :username)
-        rescue StandardError
+        rescue StandardError => e
+          handle_exception(e, level: :warn, handled: true, operation: 'llm.api.identity.kerberos_username')
           nil
         end
         return "user:#{kerb}" if kerb.is_a?(String) && !kerb.empty?
@@ -354,14 +424,16 @@
       define_method(:resolve_requested_by) do |rack_env, identity_string|
        hostname = begin
          Legion::Settings[:client][:hostname]
-       rescue StandardError
+       rescue StandardError => e
+         handle_exception(e, level: :warn, handled: true, operation: 'llm.api.identity.client_hostname')
         Socket.gethostname
        end
        username = identity_string.delete_prefix('user:')
 
        kerb = begin
          Legion::Settings.dig(:kerberos, :username)
-       rescue StandardError
+       rescue StandardError => e
+         handle_exception(e, level: :warn, handled: true, operation: 'llm.api.identity.requested_by_kerberos')
         nil
        end
        if kerb.is_a?(String) && !kerb.empty?
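The `extract_text_content` helper added above is what flattens provider content blocks into a plain string. A self-contained version (written as a plain method rather than `define_method`) behaves like this:

```ruby
# Flattens provider content blocks (Anthropic/Bedrock-style arrays of
# {type:, text:} hashes) into one plain string; non-text blocks are dropped.
def extract_text_content(content)
  case content
  when nil then ''
  when String then content
  when Array then content.filter_map { |entry| extract_text_content(entry) }.join
  when Hash
    type = content[:type] || content['type']
    return '' unless type.nil? || type.to_s == 'text'

    text = content.key?(:text) || content.key?('text') ? (content[:text] || content['text']) : (content[:content] || content['content'])
    extract_text_content(text)
  else content.to_s
  end
end

blocks = [{ 'type' => 'text', 'text' => 'Hello ' }, { 'type' => 'tool_use' }, { type: :text, text: 'world' }]
extract_text_content(blocks)  # => "Hello world"
```

String and symbol keys are both accepted because provider payloads arrive in either form depending on the serialization path.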
data/lib/legion/llm/api/native/inference.rb
CHANGED

@@ -150,7 +150,10 @@
       }
 
       pipeline_response = executor.call_stream do |chunk|
-
+        thinking = extract_text_content(chunk.thinking) if chunk.respond_to?(:thinking)
+        emit_sse_event(out, 'thinking-delta', { delta: thinking }) unless thinking.to_s.empty?
+
+        text = extract_text_content(chunk.respond_to?(:content) ? chunk.content : chunk)
         next if text.empty?
 
         full_text << text
@@ -195,7 +198,7 @@
       exec_ms = ((::Process.clock_gettime(::Process::CLOCK_MONOTONIC) - exec_t0) * 1000).round
       log.debug("[llm][api][inference] action=executor_call duration_ms=#{exec_ms} request_id=#{request_id}")
       raw_msg = pipeline_response.message
-      content = raw_msg.is_a?(Hash) ? (raw_msg[:content] || raw_msg['content']) : raw_msg
+      content = extract_text_content(raw_msg.is_a?(Hash) ? (raw_msg[:content] || raw_msg['content']) : raw_msg)
       routing = pipeline_response.routing || {}
       tokens = pipeline_response.tokens || {}
       tool_calls = extract_tool_calls(pipeline_response)
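The streaming change above can be read as: reasoning chunks get their own `thinking-delta` event and only text chunks accumulate into the final reply. A stand-in sketch with local `Chunk` and event structures (not the gem's classes):

```ruby
Chunk = Struct.new(:content, :thinking)

# Route a provider chunk: thinking goes out as its own SSE-style event and
# is never appended to the assistant text; empty text chunks are skipped.
def handle_chunk(chunk, full_text, events)
  thinking = chunk.thinking.to_s
  events << ['thinking-delta', thinking] unless thinking.empty?

  text = chunk.content.to_s
  return if text.empty?

  events << ['text-delta', text]
  full_text << text
end

events = []
full_text = +''
handle_chunk(Chunk.new(nil, 'planning the answer'), full_text, events)
handle_chunk(Chunk.new('Final answer.', nil), full_text, events)
full_text  # => "Final answer."
```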
data/lib/legion/llm/call/bedrock_embeddings.rb
ADDED

@@ -0,0 +1,270 @@
+# frozen_string_literal: true
+
+# Monkey-patch RubyLLM's Bedrock provider to support embeddings via
+# Amazon Titan (amazon.titan-embed-text-v1 and v2) and Cohere Embed
+# (cohere.embed-english-v3 / cohere.embed-multilingual-v3).
+#
+# Without this patch, `RubyLLM::Providers::Bedrock` exposes no
+# `render_embedding_payload` method, so the discovery probe
+# (`klass.instance_method(:render_embedding_payload)`) raises NameError
+# and Bedrock is silently excluded from the embedding fallback chain.
+#
+# Companion piece to `call/bedrock_auth.rb` — both use the same
+# bearer-or-SigV4 `signed_post` path and live here (not in lex-bedrock)
+# because lex-bedrock wraps `aws-sdk-bedrockruntime`, not RubyLLM.
+#
+# ─── Upstream tracking ────────────────────────────────────────────
+# This is a deprecation-scheduled shim. The methods below are the
+# kind of thing that eventually belongs in the underlying ruby_llm
+# library's Bedrock provider. Remove this file once upstream ships
+# equivalent support. The short-circuit below renders the patch
+# inert when `render_embedding_payload` is defined natively, so an
+# accidental double-load after an upstream bump is safe.
+# ───────────────────────────────────────────────────────────────────
+#
+# Titan v2 request shape:
+#   POST /model/amazon.titan-embed-text-v2:0/invoke
+#   { "inputText": "...", "dimensions": 1024, "normalize": true }
+#   => { "embedding": [...], "inputTextTokenCount": N }
+#
+# Cohere Embed request shape:
+#   POST /model/cohere.embed-english-v3/invoke
+#   { "texts": ["..."], "input_type": "search_document" }
+#   => { "embeddings": [[...]], ... }
+
+require 'ruby_llm'
+require_relative 'bedrock_auth'
+
+if RubyLLM::Providers::Bedrock.method_defined?(:render_embedding_payload)
+  # Native support landed upstream — patch is inert.
+  Legion::Logging.logger.info('[llm][bedrock_embeddings] native ruby_llm embedding support detected — skipping patch')
+else
+  module RubyLLM
+    module Providers
+      class Bedrock
+        # Embeddings methods for AWS Bedrock via InvokeModel.
+        #
+        # Public methods are instance methods (not `module_function`) so the
+        # `include Embeddings` at the end of the class body properly overrides
+        # `Provider#embed` via Ruby's method-resolution order.
+        module Embeddings
+          TITAN_V2_PREFIX = 'amazon.titan-embed-text-v2'
+          TITAN_V1_PREFIX = 'amazon.titan-embed-text-v1'
+          COHERE_PREFIX = 'cohere.embed'
+
+          TITAN_ALLOWED_DIMENSIONS = [256, 512, 1024].freeze
+          TITAN_MAX_INPUT_BYTES = 45_000 # ~8k tokens; Titan rejects larger with 400 (and still bills)
+          COHERE_MAX_INPUT_BYTES = 8_192 # Cohere Embed v3 per-text byte budget
+          COHERE_MAX_TEXTS = 96 # Cohere Embed v3 batch limit
+          # Bedrock model IDs use only alphanumerics, `.`, `-`, and `:` (e.g.
+          # `amazon.titan-embed-text-v2:0`, `cohere.embed-english-v3`,
+          # `us.anthropic.claude-sonnet-4-6-v1`). Slashes and `..` are rejected
+          # to block path-injection into the `/model/<id>/invoke` URL.
+          MODEL_ID_PATTERN = /\A[a-zA-Z0-9.\-:]+\z/
+
+          # @param model [String, Symbol] Bedrock model id
+          # @return [String] InvokeModel URL path
+          # @raise [RubyLLM::Error] if model id contains unsafe characters
+          def embedding_url(model:)
+            raise RubyLLM::Error.new(nil, "Invalid Bedrock model id: #{model.inspect}") \
+              unless model.to_s.match?(MODEL_ID_PATTERN)
+
+            "/model/#{model}/invoke"
+          end
+
+          # @param text [String, Array<String>]
+          # @param model [String] Bedrock embedding model id
+          # @param dimensions [Integer, nil] Titan v2 only; one of {256, 512, 1024}
+          # @return [Hash] JSON-serializable request payload
+          # @raise [RubyLLM::Error] on unsupported model, oversize input, or invalid dimensions
+          def render_embedding_payload(text, model:, dimensions:)
+            model_str = model.to_s
+
+            if model_str.start_with?(TITAN_V2_PREFIX)
+              titan_v2_payload(text, dimensions: dimensions)
+            elsif model_str.start_with?(TITAN_V1_PREFIX)
+              titan_v1_payload(text)
+            elsif model_str.start_with?(COHERE_PREFIX)
+              cohere_payload(text)
+            else
+              raise RubyLLM::Error.new(
+                nil,
+                "Bedrock model '#{model}' is not supported for embeddings. " \
+                'Supported prefixes: amazon.titan-embed-text-v1, ' \
+                'amazon.titan-embed-text-v2, cohere.embed-*.'
+              )
+            end
+          end
+
+          # @param response [Faraday::Response]
+          # @param model [String]
+          # @param text [String, Array<String>] original input (used for shape decisions)
+          # @return [RubyLLM::Embedding]
+          # @raise [RubyLLM::Error] if the response carried no vector
+          def parse_embedding_response(response, model:, text:)
+            body = response.body
+            body = try_parse_json(body) if body.is_a?(String)
+
+            vectors =
+              if model.to_s.start_with?(COHERE_PREFIX)
+                Array(body['embeddings'])
+              else
+                # Titan single-text response: the single vector lives in :embedding.
+                # Batch callers are handled in `embed` via iteration.
+                [body['embedding']].compact
+              end
+
+            raise RubyLLM::Error.new(response, "Empty embedding response for model #{model}") if vectors.empty?
+
+            vectors = vectors.first if vectors.length == 1 && !text.is_a?(Array)
+            input_tokens = body['inputTextTokenCount'] ||
+                           body.dig('meta', 'billed_units', 'input_tokens') ||
+                           0
+
+            RubyLLM::Embedding.new(vectors: vectors, model: model, input_tokens: input_tokens)
+          end
+
+          # Override the base `embed` method so signing headers are applied.
+          #
+          # The parent `Provider#embed` calls `@connection.post(url, payload)` directly,
+          # which would skip both bearer-token and SigV4 auth for Bedrock. We go through
+          # `invoke_embedding`, which mirrors `signed_post` but parses responses with
+          # `parse_embedding_response` (not `parse_completion_response`).
+          #
+          # Titan accepts a single text per invocation. When an Array is passed to a
+          # Titan model, we iterate via `embed_titan_batch`, which traps per-element
+          # failures so one 429 mid-batch does not lose preceding successes.
+          #
+          # @param text [String, Array<String>]
+          # @param model [String]
+          # @param dimensions [Integer, nil]
+          # @return [RubyLLM::Embedding]
+          def embed(text, model:, dimensions:)
+            return embed_titan_batch(text, model: model, dimensions: dimensions) \
+              if text.is_a?(Array) && !model.to_s.start_with?(COHERE_PREFIX)
+
+            payload = render_embedding_payload(text, model: model, dimensions: dimensions)
+            url = embedding_url(model: model)
+            response = invoke_embedding(url, payload)
+            parse_embedding_response(response, model: model, text: text)
+          end
+
+          private
+
+          def titan_v2_payload(text, dimensions:)
+            raise RubyLLM::Error.new(nil, 'Titan v2 embeddings accept a single string per invocation.') \
+              if text.is_a?(Array)
+
+            enforce_input_size!(text, TITAN_MAX_INPUT_BYTES, 'Titan v2')
+
+            payload = { inputText: text.to_s, normalize: true }
+            dim = dimensions&.to_i
+            if dim
+              unless TITAN_ALLOWED_DIMENSIONS.include?(dim)
+                raise RubyLLM::Error.new(
+                  nil,
+                  "Titan v2 dimensions must be one of #{TITAN_ALLOWED_DIMENSIONS.inspect}, got #{dim}"
+                )
+              end
+              payload[:dimensions] = dim
+            end
+            payload
+          end
+
+          def titan_v1_payload(text)
+            raise RubyLLM::Error.new(nil, 'Titan v1 embeddings accept a single string per invocation.') \
+              if text.is_a?(Array)
+
+            enforce_input_size!(text, TITAN_MAX_INPUT_BYTES, 'Titan v1')
+            { inputText: text.to_s }
+          end
+
+          def cohere_payload(text)
+            texts = Array(text).map(&:to_s)
+            raise RubyLLM::Error.new(nil, "Cohere Embed batch size #{texts.size} exceeds max #{COHERE_MAX_TEXTS}") \
+              if texts.size > COHERE_MAX_TEXTS
+
+            texts.each { |t| enforce_input_size!(t, COHERE_MAX_INPUT_BYTES, 'Cohere Embed') }
+
+            { texts: texts, input_type: 'search_document' }
+          end
+
+          def enforce_input_size!(text, max_bytes, model_name)
+            bytes = text.to_s.bytesize
+            return if bytes <= max_bytes
+
+            raise RubyLLM::Error.new(
+              nil,
+              "#{model_name} input too large: #{bytes} bytes exceeds max #{max_bytes}. " \
+              'Caller must chunk before embedding.'
+            )
+          end
+
+          # Mirror of `signed_post` for embeddings: pre-serializes the body so the
+          # SigV4 signature matches the bytes Faraday actually sends. `@connection.post`
+          # is `RubyLLM::Connection#post(url, payload)` which requires both args, so we
+          # pass `payload` to satisfy the arity but override `req.body = body` in the
+          # block — the block runs after middleware, so the pre-serialized bytes win
+          # over whatever JSON middleware would have produced.
+          def invoke_embedding(url, payload)
+            body = Legion::JSON.dump(payload)
+            headers = sign_headers('POST', url, body)
+
+            @connection.post(url, payload) do |req|
+              req.headers.merge!(headers)
+              req.body = body
+            end
+          end
+
+          # Per-item trap-and-continue for Titan batch. Returns a combined Embedding
+          # whose `vectors` is an Array of [Float] per input index, with `nil` entries
+          # for failed slots. Token count aggregates successful calls.
+          #
+          # Raises only when every element failed — otherwise logs failures via
+          # `RubyLLM.logger` and returns partial results so callers keep the paid-for
+          # vectors. Idiomatic for this file because we are inside the RubyLLM
+          # namespace; Legion-side batch orchestration lives in
+          # `Legion::LLM::Call::Embeddings.generate_batch`.
+          def embed_titan_batch(texts, model:, dimensions:)
+            vectors = []
+            token_total = 0
+            failures = []
+
+            texts.each_with_index do |text, idx|
+              single = embed(text.to_s, model: model, dimensions: dimensions)
+              vectors << Array(single.vectors).first
+              token_total += single.input_tokens.to_i
+            rescue StandardError => e
+              vectors << nil
+              failures << { index: idx, error: e.class.name, message: e.message }
+            end
+
+            unless failures.empty?
+              RubyLLM.logger.warn(
+                '[bedrock_embeddings] Titan batch partial failure: ' \
+                "#{failures.size}/#{texts.size} model=#{model}"
+              )
+              failures.each do |f|
+                RubyLLM.logger.debug(
+                  "[bedrock_embeddings] batch item index=#{f[:index]} error=#{f[:error]} message=#{f[:message]}"
+                )
+              end
+            end
+
+            if failures.size == texts.size
+              raise RubyLLM::Error.new(
+                nil,
+                "All #{texts.size} Titan batch items failed. First error: #{failures.first[:message]}"
+              )
+            end
+
+            RubyLLM::Embedding.new(vectors: vectors, model: model, input_tokens: token_total)
+          end
+        end
+
+        include Embeddings
+      end
+    end
+  end
+end
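The Titan v2 guards in the patch above reduce to a small pure function. This standalone sketch uses `ArgumentError` in place of `RubyLLM::Error` so it runs without the gem; constants are copied from the patch:

```ruby
TITAN_ALLOWED_DIMENSIONS = [256, 512, 1024].freeze
TITAN_MAX_INPUT_BYTES = 45_000 # ~8k tokens; Titan 400s (and bills) above this

# Build the Titan v2 InvokeModel body, rejecting oversize input and
# unsupported dimension choices before any network call is made.
def titan_v2_payload(text, dimensions: nil)
  raise ArgumentError, 'Titan v2 accepts a single string per invocation' if text.is_a?(Array)
  raise ArgumentError, "input too large: #{text.bytesize} bytes" if text.bytesize > TITAN_MAX_INPUT_BYTES

  payload = { inputText: text, normalize: true }
  if dimensions
    raise ArgumentError, "dimensions must be one of #{TITAN_ALLOWED_DIMENSIONS.inspect}" \
      unless TITAN_ALLOWED_DIMENSIONS.include?(dimensions)

    payload[:dimensions] = dimensions
  end
  payload
end

titan_v2_payload('embed me', dimensions: 512)  # returns the InvokeModel request Hash
```

Failing before the wire call is the point: Titan bills even for requests it rejects with a 400, so the byte-size guard is a cost control as much as a validation.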