legion-llm 0.8.28 → 0.8.30

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (39)
  1. checksums.yaml +4 -4
  2. data/CHANGELOG.md +24 -0
  3. data/README.md +4 -1
  4. data/legion-llm.gemspec +1 -0
  5. data/lib/legion/llm/api/native/helpers.rb +81 -9
  6. data/lib/legion/llm/api/native/inference.rb +5 -2
  7. data/lib/legion/llm/call/bedrock_embeddings.rb +270 -0
  8. data/lib/legion/llm/call/providers.rb +47 -28
  9. data/lib/legion/llm/call/structured_output.rb +15 -3
  10. data/lib/legion/llm/inference/executor.rb +37 -32
  11. data/lib/legion/llm/router.rb +1 -1
  12. data/lib/legion/llm/settings.rb +3 -2
  13. data/lib/legion/llm/tools/adapter.rb +15 -0
  14. data/lib/legion/llm/version.rb +1 -1
  15. data/lib/legion/llm.rb +1 -0
  16. metadata +16 -24
  17. data/docs/2026-03-23-pipeline-gap-analysis.md +0 -203
  18. data/docs/example_settings.json +0 -16
  19. data/docs/examples/anthropic_request.json +0 -108
  20. data/docs/examples/anthropic_response.json +0 -90
  21. data/docs/examples/azure_ai_request.json +0 -103
  22. data/docs/examples/azure_ai_response.json +0 -91
  23. data/docs/examples/bedrock_request.json +0 -127
  24. data/docs/examples/bedrock_response.json +0 -93
  25. data/docs/examples/gemini_request.json +0 -127
  26. data/docs/examples/gemini_response.json +0 -109
  27. data/docs/examples/openai_request.json +0 -100
  28. data/docs/examples/openai_response.json +0 -77
  29. data/docs/examples/xai_request.json +0 -93
  30. data/docs/examples/xai_response.json +0 -48
  31. data/docs/gas-apollo-idea.md +0 -528
  32. data/docs/generation-augmented-storage.md +0 -135
  33. data/docs/llm-schema-spec.md +0 -2816
  34. data/docs/plans/2026-03-15-ollama-discovery-design.md +0 -164
  35. data/docs/plans/2026-03-15-ollama-discovery-implementation.md +0 -1147
  36. data/docs/routing-reenvisioned.md +0 -861
  37. data/docs/superpowers/plans/2026-04-15-sticky-runners-tool-history.md +0 -1866
  38. data/docs/superpowers/specs/2026-04-15-sticky-runners-tool-history-design.md +0 -713
  39. data/legion-llm-0.3.20.gem +0 -0
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 523afac32d76644a92db4f6af5228c9ff9856521ccffb7cff0a0e8194570a432
- data.tar.gz: b58073ec104d42eb18436fd708a33bea881931386dfab1e578f0570d891a6f55
+ metadata.gz: d560457b6321f55371b3dd14d546c8c23d11485b2b3ba5dec218cea028d50399
+ data.tar.gz: 8ee76eba6bf57f592f9d372fec7f8c5372d97c220869f988abe9e1521e766b7b
  SHA512:
- metadata.gz: 205d3a1ef6f1c9e8712bc61e2d88382b88a91560343ab6be7e5c863f2b839ea3d384f5a5642240f7a62fed73ed96aff7b653855c15c1cb5517a43d342306a54b
- data.tar.gz: 8847c3be8580a5c1c62bd61b83c72ef53db974b19a8567fd102827b748e9113bfc3dc0af7aa14e23d25b524453b2465dd8053e55f64ee2798f9ae9b387cac264
+ metadata.gz: 0b81ee44a4f57a8ec0e9eb8aef043bdc543217baade9ff1b2772a46056ebc7a486d4e9acfded4fb05cedf341583cb8aaaa9091bd243c4f26eac9ff494ada3d01
+ data.tar.gz: 2ebf5a44aa5588a635c75ef6dc0b68c6554905d343b125cd944c1f5032399c4c714cad8953030e53ba3bb92efe0233371d5c2ebad12eebc5b4a1e2887a0e7c35
data/CHANGELOG.md CHANGED
@@ -1,5 +1,29 @@
  # Legion LLM Changelog
 
+ ## [0.8.30] - 2026-04-27
+
+ ### Fixed
+ - Structured output parsing now strips markdown code fences before JSON parsing, including retry responses from models that keep returning fenced JSON.
+ - The LLM tool adapter now symbolizes JSON/string-keyed tool arguments before dispatching to Ruby keyword-argument tool classes.
+ - Default routing chains now honor an explicit `default_provider` / `default_model` before auto-enabled local providers, preventing Ollama defaults from overriding a configured Bedrock default.
+ - Provider credential setup now resolves `env://` placeholders consistently for Bedrock SigV4, Anthropic, OpenAI, Gemini, Azure, and vLLM, and arrays of unresolved placeholders no longer auto-enable hosted providers.
+ - Native `/api/llm/inference` responses now flatten structured provider content blocks into plain text for both streaming SSE deltas and non-streaming JSON responses, preventing Anthropic/Bedrock-style block arrays from being stored and replayed as nested JSON-looking assistant replies.
+ - Native `/api/llm/inference` streaming now emits `thinking-delta` SSE events for provider reasoning chunks without appending those chunks to the final assistant content.
+ - The native `file_read` client tool now extracts text from PDFs via `pdf-reader` and returns a clear unsupported-binary message for non-text binary files.
+ - Local providers now cap automatically injected registry tools via `llm.tool_trigger.local_tool_limit`, prioritizing trigger-matched tools over always-loaded tools for Ollama/vLLM requests.
+
+ ## [0.8.29] - 2026-04-27
+
+ ### Added
+ - Bedrock embedding support via `call/bedrock_embeddings.rb`: a `RubyLLM::Providers::Bedrock` monkey-patch (same pattern as `bedrock_auth.rb`) that implements `render_embedding_payload`, `embedding_url`, and `parse_embedding_response`, and overrides `embed` for signed transport. Covers Amazon Titan v1, Titan v2 (selectable 256/512/1024 dimensions), and Cohere Embed v3 (English and multilingual).
+ - Short-circuit guard: when ruby_llm eventually ships a native `render_embedding_payload`, the patch becomes inert rather than double-defining the method.
+ - Trap-and-continue batch semantics for Titan (which accepts a single text per call): `embed_titan_batch` iterates client-side, preserves partial successes on mid-batch failures, logs the failure count via `RubyLLM.logger.warn`, and raises only when 100% of the inputs fail.
+ - Input-size guards: Titan rejects inputs over ~8k tokens with a billable 400, so a descriptive `RubyLLM::Error` is now raised at ≥45,000 bytes before the wire call. Cohere enforces its documented limits of 96 texts per batch and 8 KB per text.
+ - Full spec coverage in `spec/legion/llm/bedrock_embeddings_spec.rb` (probe contract, per-model payload shapes, dimension validation, batch limits, error paths).
+
+ ### Fixed
+ - `Legion::LLM::Discovery.find_embedding_provider` can now actually resolve Bedrock when it is the configured fallback. Previously, the discovery probe (`klass.instance_method(:render_embedding_payload)`) raised `NameError` for Bedrock, and the fallback chain skipped past it with `[llm][discovery] no embedding provider available` even when Bedrock was the only reachable embedding provider.
+
  ## [0.8.28] - 2026-04-24
 
  ### Fixed
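The fence-stripping fix above can be illustrated with a minimal standalone sketch. The helper name and regex are assumptions for illustration; the real implementation lives in `call/structured_output.rb` and may differ.

```ruby
require 'json'

# Hypothetical sketch: strip a leading ```lang fence and a trailing ```
# fence so that fenced model output parses as plain JSON.
def strip_code_fences(raw)
  text = raw.to_s.strip
  text = text.sub(/\A```[\w-]*\s*\n?/, '').sub(/\n?```\s*\z/, '')
  text.strip
end

fenced = "```json\n{\"answer\": 42}\n```"
JSON.parse(strip_code_fences(fenced)) # parses cleanly once the fences are gone
```

Unfenced input passes through unchanged, so the same path handles both well-behaved and fence-happy models.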
data/README.md CHANGED
@@ -2,7 +2,7 @@
 
  LLM integration for the [LegionIO](https://github.com/LegionIO/LegionIO) framework. Wraps [ruby_llm](https://github.com/crmne/ruby_llm) to provide chat, embeddings, tool use, and agent capabilities to any Legion extension. Exposes OpenAI- and Anthropic-compatible API endpoints so external tools can point at the Legion daemon and just work.
 
- **Version**: 0.8.0
+ **Version**: 0.8.30
 
  ## Installation
 
@@ -60,6 +60,7 @@ Requests flow through the full Inference pipeline — routing, metering, audit,
  Both formats supported with correct SSE shapes:
  - **OpenAI**: `data: {"choices":[{"delta":{"content":"..."}}]}` chunks, terminated by `data: [DONE]`
  - **Anthropic**: Typed events — `message_start`, `content_block_start`, `content_block_delta`, `content_block_stop`, `message_delta`, `message_stop`
+ - **Native**: `/api/llm/inference` streams `text-delta`, `thinking-delta`, tool lifecycle events, and a final `done` event. Structured provider content blocks are flattened to plain text in both streaming and non-streaming native responses so `content` remains a string for daemon clients.
 
  ### API Authentication
 
@@ -851,6 +852,8 @@ No code changes are needed in consumers immediately. The aliases will be maintai
  | Azure AI | `azure` | `vault://`, `env://`, or direct | Azure OpenAI endpoint; `api_base` + `api_key` or `auth_token` |
  | Ollama | `ollama` | Local, no credentials needed | Local inference |
 
+ `env://NAME` credential placeholders resolve at provider configuration time, including array fallbacks such as `["env://OPENAI_API_KEY", "env://CODEX_API_KEY"]`. Unresolved placeholders do not auto-enable hosted providers.
+
  ## Integration with LegionIO
 
  legion-llm follows the standard core gem lifecycle:
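The `env://` array-fallback behavior documented in the README addition can be sketched standalone. The function name and exact semantics here are assumptions; the real resolver in legion-llm's provider setup may differ.

```ruby
# Hypothetical sketch: resolve an env:// placeholder, or walk an array of
# placeholders and return the first one that resolves. An array where
# nothing resolves yields nil, so the provider is not auto-enabled.
def resolve_credential(value)
  case value
  when Array
    value.lazy.map { |v| resolve_credential(v) }.find { |v| v }
  when String
    value.start_with?('env://') ? ENV[value.delete_prefix('env://')] : value
  end
end

ENV['CODEX_API_KEY'] = 'sk-test'
resolve_credential(['env://OPENAI_API_KEY', 'env://CODEX_API_KEY'])
# returns "sk-test" when only CODEX_API_KEY is set in the environment
```

Direct (non-placeholder) strings pass through untouched, matching the "or direct" column in the credentials table.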
data/legion-llm.gemspec CHANGED
@@ -35,6 +35,7 @@ Gem::Specification.new do |spec|
  spec.add_dependency 'lex-gemini'
  spec.add_dependency 'lex-knowledge'
  spec.add_dependency 'lex-openai'
+ spec.add_dependency 'pdf-reader'
  spec.add_dependency 'ruby_llm', '~> 1.13'
  spec.add_dependency 'tzinfo', '>= 2.0'
  end
data/lib/legion/llm/api/native/helpers.rb CHANGED
@@ -10,12 +10,14 @@ module Legion
  module API
  module Native
  module ClientToolMethods
+ include Legion::Logging::Helper
+
  private
 
  def log_tool(level, ref, status, **details)
  parts = ["[tool][#{ref}] #{status}"]
  details.each { |k, v| parts << "#{k}=#{v}" }
- Legion::Logging.send(level, parts.join(' '))
+ log.public_send(level, parts.join(' '))
  end
 
  def summarize_tool_arg_keys(kwargs)
@@ -37,7 +39,7 @@ module Legion
  end
  end
 
- def dispatch_client_tool(ref, **kwargs) # rubocop:disable Metrics/AbcSize,Metrics/CyclomaticComplexity,Metrics/PerceivedComplexity
+ def dispatch_client_tool(ref, **kwargs) # rubocop:disable Metrics/AbcSize,Metrics/CyclomaticComplexity,Metrics/MethodLength,Metrics/PerceivedComplexity
  case ref
  when 'sh'
  cmd = kwargs[:command] || kwargs[:cmd] || kwargs.values.first.to_s
@@ -45,7 +47,7 @@ module Legion
  "exit=#{status.exitstatus}\n#{output}"
  when 'file_read'
  path = kwargs[:path] || kwargs[:file_path] || kwargs.values.first.to_s
- ::File.exist?(path) ? ::File.read(path, encoding: 'utf-8') : "File not found: #{path}"
+ read_client_file(path)
  when 'file_write'
  path = kwargs[:path] || kwargs[:file_path]
  content = kwargs[:content] || kwargs[:contents]
@@ -82,6 +84,7 @@ module Legion
  max_length ? content[0, max_length] : content
  rescue LoadError => e
  missing = e.respond_to?(:path) && e.path ? e.path : 'legion/cli/chat/web_fetch'
+ handle_exception(e, level: :warn, handled: true, operation: 'llm.api.client_tool.web_fetch', missing: missing)
  "web_fetch is unavailable: missing optional dependency #{missing}"
  end
  when 'web_search'
@@ -93,6 +96,7 @@ module Legion
  results[:results].map { |r| "### #{r[:title]}\n#{r[:url]}\n#{r[:snippet]}" }.join("\n\n")
  rescue LoadError => e
  missing = e.respond_to?(:path) && e.path ? e.path : 'legion/cli/chat/web_search'
+ handle_exception(e, level: :warn, handled: true, operation: 'llm.api.client_tool.web_search', missing: missing)
  "web_search is unavailable: missing optional dependency #{missing}"
  end
  else
@@ -100,6 +104,51 @@ module Legion
  end
  end
 
+ def read_client_file(path)
+ return "File not found: #{path}" unless ::File.exist?(path)
+
+ return read_pdf_text(path) if pdf_file?(path)
+
+ content = ::File.binread(path)
+ return 'Binary file detected, cannot read as text.' if binary_content?(content)
+
+ content.force_encoding('UTF-8')
+ content
+ rescue StandardError => e
+ handle_exception(e, level: :warn, handled: true, operation: 'llm.api.client_tool.file_read', path: path)
+ "file_read error: #{e.message}"
+ end
+
+ def pdf_file?(path)
+ ::File.extname(path).casecmp('.pdf').zero? || ::File.binread(path, 5) == '%PDF-'
+ rescue StandardError => e
+ handle_exception(e, level: :warn, handled: true, operation: 'llm.api.client_tool.pdf_sniff', path: path)
+ false
+ end
+
+ def read_pdf_text(path)
+ require 'pdf-reader' unless defined?(::PDF::Reader)
+
+ reader = ::PDF::Reader.new(path)
+ text = reader.pages.map(&:text).join("\n\n").strip
+ text.empty? ? 'PDF contained no extractable text.' : text
+ rescue LoadError => e
+ missing = e.respond_to?(:path) && e.path ? e.path : 'pdf-reader'
+ handle_exception(e, level: :warn, handled: true, operation: 'llm.api.client_tool.pdf_extract', missing: missing)
+ 'PDF text extraction unavailable: missing pdf-reader gem.'
+ rescue StandardError => e
+ handle_exception(e, level: :warn, handled: true, operation: 'llm.api.client_tool.pdf_extract', path: path)
+ "PDF text extraction failed: #{e.message}"
+ end
+
+ def binary_content?(content)
+ return true if content.include?("\x00")
+
+ sample = content.byteslice(0, 4096).to_s
+ sample.force_encoding('UTF-8')
+ !sample.valid_encoding?
+ end
+
  def notify_tool_event(type, ref, **data)
  handler = Thread.current[:legion_tool_event_handler]
  return unless handler
@@ -257,13 +306,14 @@ module Legion
  rescue StandardError => e
  ms = begin
  ((::Process.clock_gettime(::Process::CLOCK_MONOTONIC) - t0) * 1000).round(1)
- rescue StandardError
+ rescue StandardError => e
+ handle_exception(e, level: :warn, handled: true,
+ operation: 'llm.api.client_tool.duration_measurement', tool_ref: tool_ref)
  nil
  end
  log_tool(:error, tool_ref, 'failed', duration_ms: ms, error: e.message)
  notify_tool_event(:tool_error, tool_ref, error: e.message)
- Legion::Logging.log_exception(e, payload_summary: "client tool #{tool_ref} failed",
- component_type: :api)
+ handle_exception(e, level: :error, handled: true, operation: "llm.api.client_tool.#{tool_ref}")
  "Tool error: #{e.message}"
  end
  end
@@ -287,6 +337,25 @@ module Legion
  end
  end
 
+ define_method(:extract_text_content) do |content|
+ case content
+ when nil
+ ''
+ when String
+ content
+ when Array
+ content.filter_map { |entry| extract_text_content(entry) }.join
+ when Hash
+ type = content[:type] || content['type']
+ return '' unless type.nil? || type.to_s == 'text'
+
+ text = content.key?(:text) || content.key?('text') ? (content[:text] || content['text']) : (content[:content] || content['content'])
+ extract_text_content(text)
+ else
+ content.to_s
+ end
+ end
+
  define_method(:emit_sse_event) do |stream, event_name, payload|
  level = event_name == 'text-delta' ? :debug : :info
  log.send(level, "[sse][emit] event=#{event_name} keys=#{payload.is_a?(Hash) ? payload.keys.join(',') : 'n/a'}")
@@ -333,7 +402,8 @@ module Legion
 
  kerb = begin
  Legion::Settings.dig(:kerberos, :username)
- rescue StandardError
+ rescue StandardError => e
+ handle_exception(e, level: :warn, handled: true, operation: 'llm.api.identity.kerberos_username')
  nil
  end
  return "user:#{kerb}" if kerb.is_a?(String) && !kerb.empty?
@@ -354,14 +424,16 @@ module Legion
  define_method(:resolve_requested_by) do |rack_env, identity_string|
  hostname = begin
  Legion::Settings[:client][:hostname]
- rescue StandardError
+ rescue StandardError => e
+ handle_exception(e, level: :warn, handled: true, operation: 'llm.api.identity.client_hostname')
  Socket.gethostname
  end
  username = identity_string.delete_prefix('user:')
 
  kerb = begin
  Legion::Settings.dig(:kerberos, :username)
- rescue StandardError
+ rescue StandardError => e
+ handle_exception(e, level: :warn, handled: true, operation: 'llm.api.identity.requested_by_kerberos')
  nil
  end
  if kerb.is_a?(String) && !kerb.empty?
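The `binary_content?` sniff added in the hunk above can be exercised on its own: a NUL byte anywhere, or invalid UTF-8 in the first 4 KiB, marks the content as binary. This is a direct restatement of the method from the diff, shown standalone for clarity.

```ruby
# Standalone copy of binary_content? as added to helpers.rb: NUL byte or
# invalid UTF-8 in the leading 4 KiB sample means "treat as binary".
def binary_content?(content)
  return true if content.include?("\x00")

  sample = content.byteslice(0, 4096).to_s
  sample.force_encoding('UTF-8')
  !sample.valid_encoding?
end

binary_content?("plain text\n")     # => false
binary_content?("PK\x03\x04\x00".b) # => true (NUL byte, e.g. a zip header)
```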
data/lib/legion/llm/api/native/inference.rb CHANGED
@@ -150,7 +150,10 @@
  }
 
  pipeline_response = executor.call_stream do |chunk|
- text = chunk.respond_to?(:content) ? chunk.content.to_s : chunk.to_s
+ thinking = extract_text_content(chunk.thinking) if chunk.respond_to?(:thinking)
+ emit_sse_event(out, 'thinking-delta', { delta: thinking }) unless thinking.to_s.empty?
+
+ text = extract_text_content(chunk.respond_to?(:content) ? chunk.content : chunk)
  next if text.empty?
 
  full_text << text
@@ -195,7 +198,7 @@
  exec_ms = ((::Process.clock_gettime(::Process::CLOCK_MONOTONIC) - exec_t0) * 1000).round
  log.debug("[llm][api][inference] action=executor_call duration_ms=#{exec_ms} request_id=#{request_id}")
  raw_msg = pipeline_response.message
- content = raw_msg.is_a?(Hash) ? (raw_msg[:content] || raw_msg['content']) : raw_msg.to_s
+ content = extract_text_content(raw_msg.is_a?(Hash) ? (raw_msg[:content] || raw_msg['content']) : raw_msg)
  routing = pipeline_response.routing || {}
  tokens = pipeline_response.tokens || {}
  tool_calls = extract_tool_calls(pipeline_response)
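The `extract_text_content` helper that both call sites above now route through flattens provider block arrays into a plain string. A slightly simplified standalone version (the original also distinguishes `key?(:text)` from a `nil` value) shows the behavior on an Anthropic/Bedrock-style content array:

```ruby
# Simplified standalone sketch of extract_text_content: keep text blocks,
# drop non-text blocks (tool_use etc.), recurse into arrays and hashes.
def extract_text_content(content)
  case content
  when nil then ''
  when String then content
  when Array then content.filter_map { |entry| extract_text_content(entry) }.join
  when Hash
    type = content[:type] || content['type']
    return '' unless type.nil? || type.to_s == 'text'

    text = content[:text] || content['text'] || content[:content] || content['content']
    extract_text_content(text)
  else content.to_s
  end
end

blocks = [{ type: 'text', text: 'Hello ' },
          { type: 'tool_use', name: 'sh' },
          { type: 'text', text: 'world' }]
extract_text_content(blocks) # => "Hello world"
```

This is why the stored assistant reply stays a string rather than a nested JSON-looking block array.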
data/lib/legion/llm/call/bedrock_embeddings.rb ADDED
@@ -0,0 +1,270 @@
+ # frozen_string_literal: true
+
+ # Monkey-patch RubyLLM's Bedrock provider to support embeddings via
+ # Amazon Titan (amazon.titan-embed-text-v1 and v2) and Cohere Embed
+ # (cohere.embed-english-v3 / cohere.embed-multilingual-v3).
+ #
+ # Without this patch, `RubyLLM::Providers::Bedrock` exposes no
+ # `render_embedding_payload` method, so the discovery probe
+ # (`klass.instance_method(:render_embedding_payload)`) raises NameError
+ # and Bedrock is silently excluded from the embedding fallback chain.
+ #
+ # Companion piece to `call/bedrock_auth.rb` — both use the same
+ # bearer-or-SigV4 `signed_post` path and live here (not in lex-bedrock)
+ # because lex-bedrock wraps `aws-sdk-bedrockruntime`, not RubyLLM.
+ #
+ # ─── Upstream tracking ────────────────────────────────────────────
+ # This is a deprecation-scheduled shim. The methods below are the
+ # kind of thing that eventually belongs in the underlying ruby_llm
+ # library's Bedrock provider. Remove this file once upstream ships
+ # equivalent support. The short-circuit below renders the patch
+ # inert when `render_embedding_payload` is defined natively, so an
+ # accidental double-load after an upstream bump is safe.
+ # ───────────────────────────────────────────────────────────────────
+ #
+ # Titan v2 request shape:
+ # POST /model/amazon.titan-embed-text-v2:0/invoke
+ # { "inputText": "...", "dimensions": 1024, "normalize": true }
+ # => { "embedding": [...], "inputTextTokenCount": N }
+ #
+ # Cohere Embed request shape:
+ # POST /model/cohere.embed-english-v3/invoke
+ # { "texts": ["..."], "input_type": "search_document" }
+ # => { "embeddings": [[...]], ... }
+
+ require 'ruby_llm'
+ require_relative 'bedrock_auth'
+
+ if RubyLLM::Providers::Bedrock.method_defined?(:render_embedding_payload)
+ # Native support landed upstream — patch is inert.
+ Legion::Logging.logger.info('[llm][bedrock_embeddings] native ruby_llm embedding support detected — skipping patch')
+ else
+
+ module RubyLLM
+ module Providers
+ class Bedrock
+ # Embeddings methods for AWS Bedrock via InvokeModel.
+ #
+ # Public methods are instance methods (not `module_function`) so the
+ # `include Embeddings` at the end of the class body properly overrides
+ # `Provider#embed` via Ruby's method-resolution order.
+ module Embeddings
+ TITAN_V2_PREFIX = 'amazon.titan-embed-text-v2'
+ TITAN_V1_PREFIX = 'amazon.titan-embed-text-v1'
+ COHERE_PREFIX = 'cohere.embed'
+
+ TITAN_ALLOWED_DIMENSIONS = [256, 512, 1024].freeze
+ TITAN_MAX_INPUT_BYTES = 45_000 # ~8k tokens; Titan rejects larger with 400 (and still bills)
+ COHERE_MAX_INPUT_BYTES = 8_192 # Cohere Embed v3 per-text byte budget
+ COHERE_MAX_TEXTS = 96 # Cohere Embed v3 batch limit
+ # Bedrock model IDs use only alphanumerics, `.`, `-`, and `:` (e.g.
+ # `amazon.titan-embed-text-v2:0`, `cohere.embed-english-v3`,
+ # `us.anthropic.claude-sonnet-4-6-v1`). Slashes and `..` are rejected
+ # to block path-injection into the `/model/<id>/invoke` URL.
+ MODEL_ID_PATTERN = /\A[a-zA-Z0-9.\-:]+\z/
+
+ # @param model [String, Symbol] Bedrock model id
+ # @return [String] InvokeModel URL path
+ # @raise [RubyLLM::Error] if model id contains unsafe characters
+ def embedding_url(model:)
+ raise RubyLLM::Error.new(nil, "Invalid Bedrock model id: #{model.inspect}") \
+ unless model.to_s.match?(MODEL_ID_PATTERN)
+
+ "/model/#{model}/invoke"
+ end
+
+ # @param text [String, Array<String>]
+ # @param model [String] Bedrock embedding model id
+ # @param dimensions [Integer, nil] Titan v2 only; one of {256, 512, 1024}
+ # @return [Hash] JSON-serializable request payload
+ # @raise [RubyLLM::Error] on unsupported model, oversize input, or invalid dimensions
+ def render_embedding_payload(text, model:, dimensions:)
+ model_str = model.to_s
+
+ if model_str.start_with?(TITAN_V2_PREFIX)
+ titan_v2_payload(text, dimensions: dimensions)
+ elsif model_str.start_with?(TITAN_V1_PREFIX)
+ titan_v1_payload(text)
+ elsif model_str.start_with?(COHERE_PREFIX)
+ cohere_payload(text)
+ else
+ raise RubyLLM::Error.new(
+ nil,
+ "Bedrock model '#{model}' is not supported for embeddings. " \
+ 'Supported prefixes: amazon.titan-embed-text-v1, ' \
+ 'amazon.titan-embed-text-v2, cohere.embed-*.'
+ )
+ end
+ end
+
+ # @param response [Faraday::Response]
+ # @param model [String]
+ # @param text [String, Array<String>] original input (used for shape decisions)
+ # @return [RubyLLM::Embedding]
+ # @raise [RubyLLM::Error] if the response carried no vector
+ def parse_embedding_response(response, model:, text:)
+ body = response.body
+ body = try_parse_json(body) if body.is_a?(String)
+
+ vectors =
+ if model.to_s.start_with?(COHERE_PREFIX)
+ Array(body['embeddings'])
+ else
+ # Titan single-text response: the single vector lives in :embedding.
+ # Batch callers are handled in `embed` via iteration.
+ [body['embedding']].compact
+ end
+
+ raise RubyLLM::Error.new(response, "Empty embedding response for model #{model}") if vectors.empty?
+
+ vectors = vectors.first if vectors.length == 1 && !text.is_a?(Array)
+ input_tokens = body['inputTextTokenCount'] ||
+ body.dig('meta', 'billed_units', 'input_tokens') ||
+ 0
+
+ RubyLLM::Embedding.new(vectors: vectors, model: model, input_tokens: input_tokens)
+ end
+
+ # Override the base `embed` method so signing headers are applied.
+ #
+ # The parent `Provider#embed` calls `@connection.post(url, payload)` directly,
+ # which would skip both bearer-token and SigV4 auth for Bedrock. We go through
+ # `invoke_embedding`, which mirrors `signed_post` but parses responses with
+ # `parse_embedding_response` (not `parse_completion_response`).
+ #
+ # Titan accepts a single text per invocation. When an Array is passed to a
+ # Titan model, we iterate via `embed_titan_batch`, which traps per-element
+ # failures so one 429 mid-batch does not lose preceding successes.
+ #
+ # @param text [String, Array<String>]
+ # @param model [String]
+ # @param dimensions [Integer, nil]
+ # @return [RubyLLM::Embedding]
+ def embed(text, model:, dimensions:)
+ return embed_titan_batch(text, model: model, dimensions: dimensions) \
+ if text.is_a?(Array) && !model.to_s.start_with?(COHERE_PREFIX)
+
+ payload = render_embedding_payload(text, model: model, dimensions: dimensions)
+ url = embedding_url(model: model)
+ response = invoke_embedding(url, payload)
+ parse_embedding_response(response, model: model, text: text)
+ end
+
+ private
+
+ def titan_v2_payload(text, dimensions:)
+ raise RubyLLM::Error.new(nil, 'Titan v2 embeddings accept a single string per invocation.') \
+ if text.is_a?(Array)
+
+ enforce_input_size!(text, TITAN_MAX_INPUT_BYTES, 'Titan v2')
+
+ payload = { inputText: text.to_s, normalize: true }
+ dim = dimensions&.to_i
+ if dim
+ unless TITAN_ALLOWED_DIMENSIONS.include?(dim)
+ raise RubyLLM::Error.new(
+ nil,
+ "Titan v2 dimensions must be one of #{TITAN_ALLOWED_DIMENSIONS.inspect}, got #{dim}"
+ )
+ end
+ payload[:dimensions] = dim
+ end
+ payload
+ end
+
+ def titan_v1_payload(text)
+ raise RubyLLM::Error.new(nil, 'Titan v1 embeddings accept a single string per invocation.') \
+ if text.is_a?(Array)
+
+ enforce_input_size!(text, TITAN_MAX_INPUT_BYTES, 'Titan v1')
+ { inputText: text.to_s }
+ end
+
+ def cohere_payload(text)
+ texts = Array(text).map(&:to_s)
+ raise RubyLLM::Error.new(nil, "Cohere Embed batch size #{texts.size} exceeds max #{COHERE_MAX_TEXTS}") \
+ if texts.size > COHERE_MAX_TEXTS
+
+ texts.each { |t| enforce_input_size!(t, COHERE_MAX_INPUT_BYTES, 'Cohere Embed') }
+
+ { texts: texts, input_type: 'search_document' }
+ end
+
+ def enforce_input_size!(text, max_bytes, model_name)
+ bytes = text.to_s.bytesize
+ return if bytes <= max_bytes
+
+ raise RubyLLM::Error.new(
+ nil,
+ "#{model_name} input too large: #{bytes} bytes exceeds max #{max_bytes}. " \
+ 'Caller must chunk before embedding.'
+ )
+ end
+
+ # Mirror of `signed_post` for embeddings: pre-serializes the body so the
+ # SigV4 signature matches the bytes Faraday actually sends. `@connection.post`
+ # is `RubyLLM::Connection#post(url, payload)` which requires both args, so we
+ # pass `payload` to satisfy the arity but override `req.body = body` in the
+ # block — the block runs after middleware, so the pre-serialized bytes win
+ # over whatever JSON middleware would have produced.
+ def invoke_embedding(url, payload)
+ body = Legion::JSON.dump(payload)
+ headers = sign_headers('POST', url, body)
+
+ @connection.post(url, payload) do |req|
+ req.headers.merge!(headers)
+ req.body = body
+ end
+ end
+
+ # Per-item trap-and-continue for Titan batch. Returns a combined Embedding
+ # whose `vectors` is an Array of [Float] per input index, with `nil` entries
+ # for failed slots. Token count aggregates successful calls.
+ #
+ # Raises only when every element failed — otherwise logs failures via
+ # `RubyLLM.logger` and returns partial results so callers keep the paid-for
+ # vectors. Idiomatic for this file because we are inside the RubyLLM
+ # namespace; Legion-side batch orchestration lives in
+ # `Legion::LLM::Call::Embeddings.generate_batch`.
+ def embed_titan_batch(texts, model:, dimensions:)
+ vectors = []
+ token_total = 0
+ failures = []
+
+ texts.each_with_index do |text, idx|
+ single = embed(text.to_s, model: model, dimensions: dimensions)
+ vectors << Array(single.vectors).first
+ token_total += single.input_tokens.to_i
+ rescue StandardError => e
+ vectors << nil
+ failures << { index: idx, error: e.class.name, message: e.message }
+ end
+
+ unless failures.empty?
+ RubyLLM.logger.warn(
+ '[bedrock_embeddings] Titan batch partial failure: ' \
+ "#{failures.size}/#{texts.size} model=#{model}"
+ )
+ failures.each do |f|
+ RubyLLM.logger.debug(
+ "[bedrock_embeddings] batch item index=#{f[:index]} error=#{f[:error]} message=#{f[:message]}"
+ )
+ end
+ end
+
+ if failures.size == texts.size
+ raise RubyLLM::Error.new(
+ nil,
+ "All #{texts.size} Titan batch items failed. First error: #{failures.first[:message]}"
+ )
+ end
+
+ RubyLLM::Embedding.new(vectors: vectors, model: model, input_tokens: token_total)
+ end
+ end
+
+ include Embeddings
+ end
+ end
+ end
+ end
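The `MODEL_ID_PATTERN` guard in the new file can be exercised standalone. This sketch swaps `RubyLLM::Error` for `ArgumentError` so it has no gem dependency; the method name `embedding_url_for` is hypothetical, standing in for the patch's `embedding_url(model:)`.

```ruby
# Same character class as the patch: alphanumerics, '.', '-', ':'.
# Anything else (notably '/') fails the match, so a crafted model id
# cannot inject extra path segments into /model/<id>/invoke.
MODEL_ID_PATTERN = /\A[a-zA-Z0-9.\-:]+\z/

def embedding_url_for(model)
  raise ArgumentError, "Invalid Bedrock model id: #{model.inspect}" \
    unless model.to_s.match?(MODEL_ID_PATTERN)

  "/model/#{model}/invoke"
end

embedding_url_for('amazon.titan-embed-text-v2:0')
# => "/model/amazon.titan-embed-text-v2:0/invoke"

begin
  embedding_url_for('../model/evil')
rescue ArgumentError => e
  e.message # the '/' fails the pattern, blocking path injection into the URL
end
```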