legion-llm 0.8.28 → 0.8.30
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +24 -0
- data/README.md +4 -1
- data/legion-llm.gemspec +1 -0
- data/lib/legion/llm/api/native/helpers.rb +81 -9
- data/lib/legion/llm/api/native/inference.rb +5 -2
- data/lib/legion/llm/call/bedrock_embeddings.rb +270 -0
- data/lib/legion/llm/call/providers.rb +47 -28
- data/lib/legion/llm/call/structured_output.rb +15 -3
- data/lib/legion/llm/inference/executor.rb +37 -32
- data/lib/legion/llm/router.rb +1 -1
- data/lib/legion/llm/settings.rb +3 -2
- data/lib/legion/llm/tools/adapter.rb +15 -0
- data/lib/legion/llm/version.rb +1 -1
- data/lib/legion/llm.rb +1 -0
- metadata +16 -24
- data/docs/2026-03-23-pipeline-gap-analysis.md +0 -203
- data/docs/example_settings.json +0 -16
- data/docs/examples/anthropic_request.json +0 -108
- data/docs/examples/anthropic_response.json +0 -90
- data/docs/examples/azure_ai_request.json +0 -103
- data/docs/examples/azure_ai_response.json +0 -91
- data/docs/examples/bedrock_request.json +0 -127
- data/docs/examples/bedrock_response.json +0 -93
- data/docs/examples/gemini_request.json +0 -127
- data/docs/examples/gemini_response.json +0 -109
- data/docs/examples/openai_request.json +0 -100
- data/docs/examples/openai_response.json +0 -77
- data/docs/examples/xai_request.json +0 -93
- data/docs/examples/xai_response.json +0 -48
- data/docs/gas-apollo-idea.md +0 -528
- data/docs/generation-augmented-storage.md +0 -135
- data/docs/llm-schema-spec.md +0 -2816
- data/docs/plans/2026-03-15-ollama-discovery-design.md +0 -164
- data/docs/plans/2026-03-15-ollama-discovery-implementation.md +0 -1147
- data/docs/routing-reenvisioned.md +0 -861
- data/docs/superpowers/plans/2026-04-15-sticky-runners-tool-history.md +0 -1866
- data/docs/superpowers/specs/2026-04-15-sticky-runners-tool-history-design.md +0 -713
- data/legion-llm-0.3.20.gem +0 -0
checksums.yaml
CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: d560457b6321f55371b3dd14d546c8c23d11485b2b3ba5dec218cea028d50399
+  data.tar.gz: 8ee76eba6bf57f592f9d372fec7f8c5372d97c220869f988abe9e1521e766b7b
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 0b81ee44a4f57a8ec0e9eb8aef043bdc543217baade9ff1b2772a46056ebc7a486d4e9acfded4fb05cedf341583cb8aaaa9091bd243c4f26eac9ff494ada3d01
+  data.tar.gz: 2ebf5a44aa5588a635c75ef6dc0b68c6554905d343b125cd944c1f5032399c4c714cad8953030e53ba3bb92efe0233371d5c2ebad12eebc5b4a1e2887a0e7c35
data/CHANGELOG.md
CHANGED

@@ -1,5 +1,29 @@
 # Legion LLM Changelog
 
+## [0.8.30] - 2026-04-27
+
+### Fixed
+- Structured output parsing now strips markdown code fences before JSON parse, including retry responses from models that keep returning fenced JSON.
+- LLM tool adapter dispatch now symbolizes JSON/string-keyed tool arguments before invoking Ruby keyword-argument tool classes.
+- Default routing chains now honor explicit `default_provider` / `default_model` before auto-enabled local providers, preventing Ollama defaults from overriding a configured Bedrock default.
+- Provider credential setup now resolves `env://` placeholders consistently for Bedrock SigV4, Anthropic, OpenAI, Gemini, Azure, and vLLM, and unresolved placeholder arrays no longer auto-enable hosted providers.
+- Native `/api/llm/inference` responses now flatten structured provider content blocks into plain text for both streaming SSE deltas and non-streaming JSON responses, preventing Anthropic/Bedrock-style block arrays from being stored and replayed as nested JSON-looking assistant replies.
+- Native `/api/llm/inference` streaming now emits `thinking-delta` SSE events for provider reasoning chunks without appending those chunks to final assistant content.
+- Native `file_read` client tools now extract text from PDFs via `pdf-reader` and return a clear unsupported-binary message for non-text binary files.
+- Local providers now cap automatically injected registry tools with `llm.tool_trigger.local_tool_limit`, prioritizing trigger-matched tools before always-loaded tools for Ollama/vLLM requests.
+
+## [0.8.29] - 2026-04-27
+
+### Added
+- Bedrock embedding support via `call/bedrock_embeddings.rb` — a `RubyLLM::Providers::Bedrock` monkey-patch (same pattern as `bedrock_auth.rb`) that implements `render_embedding_payload`, `embedding_url`, `parse_embedding_response`, and overrides `embed` for signed transport. Covers Amazon Titan v1, Titan v2 (selectable 256/512/1024 dimensions), and Cohere Embed v3 (English + multilingual).
+- Short-circuit guard: when ruby_llm eventually ships native `render_embedding_payload`, the patch becomes inert rather than double-loading the method.
+- Trap-and-continue batch semantics for Titan (which is single-text-per-call): `embed_titan_batch` iterates client-side, preserves partial successes on mid-batch failures, logs the failure count via `RubyLLM.logger.warn`, and only raises when 100% of inputs fail.
+- Input-size guards: Titan rejects >8k tokens with a billable 400 — we now raise a descriptive `RubyLLM::Error` at ≥45,000 bytes before the wire call. Cohere enforces the 96-texts / 8 KB-per-text documented limits.
+- Full spec coverage in `spec/legion/llm/bedrock_embeddings_spec.rb` (probe contract, per-model payload shapes, dimension validation, batch limits, error paths).
+
+### Fixed
+- `Legion::LLM::Discovery.find_embedding_provider` can now actually resolve Bedrock when it is the configured fallback. Previously, the discovery probe (`klass.instance_method(:render_embedding_payload)`) raised `NameError` for Bedrock and the fallback chain skipped past it with `[llm][discovery] no embedding provider available` — even when Bedrock was the only reachable embedding provider.
+
 ## [0.8.28] - 2026-04-24
 
 ### Fixed
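The fence-stripping fix in 0.8.30 amounts to unwrapping a markdown code block before `JSON.parse`. A minimal standalone sketch of that behavior (`parse_fenced_json` is a hypothetical helper name, not the gem's actual method):

```ruby
require 'json'

# Unwrap a ```json ... ``` (or bare ```) fence that some models keep
# wrapping around structured output, then parse the remaining JSON.
def parse_fenced_json(raw)
  text = raw.to_s.strip
  if text.start_with?('```')
    text = text.sub(/\A```[a-zA-Z0-9_-]*\s*/, '').sub(/```\s*\z/, '')
  end
  JSON.parse(text)
end

parse_fenced_json("```json\n{\"ok\": true}\n```")  # returns the parsed Hash
```

Unfenced responses pass through the same path untouched, which is why the retry case mentioned in the changelog works as well.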
data/README.md
CHANGED

@@ -2,7 +2,7 @@
 
 LLM integration for the [LegionIO](https://github.com/LegionIO/LegionIO) framework. Wraps [ruby_llm](https://github.com/crmne/ruby_llm) to provide chat, embeddings, tool use, and agent capabilities to any Legion extension. Exposes OpenAI- and Anthropic-compatible API endpoints so external tools can point at the Legion daemon and just work.
 
-**Version**: 0.8.
+**Version**: 0.8.30
 
 ## Installation
 
@@ -60,6 +60,7 @@ Requests flow through the full Inference pipeline — routing, metering, audit,
 Both formats supported with correct SSE shapes:
 - **OpenAI**: `data: {"choices":[{"delta":{"content":"..."}}]}` chunks, terminated by `data: [DONE]`
 - **Anthropic**: Typed events — `message_start`, `content_block_start`, `content_block_delta`, `content_block_stop`, `message_delta`, `message_stop`
+- **Native**: `/api/llm/inference` streams `text-delta`, `thinking-delta`, tool lifecycle events, and a final `done` event. Structured provider content blocks are flattened to plain text in both streaming and non-streaming native responses so `content` remains a string for daemon clients.
 
 ### API Authentication
 
@@ -851,6 +852,8 @@ No code changes are needed in consumers immediately. The aliases will be maintai
 | Azure AI | `azure` | `vault://`, `env://`, or direct | Azure OpenAI endpoint; `api_base` + `api_key` or `auth_token` |
 | Ollama | `ollama` | Local, no credentials needed | Local inference |
 
+`env://NAME` credential placeholders resolve at provider configuration time, including array fallbacks such as `["env://OPENAI_API_KEY", "env://CODEX_API_KEY"]`. Unresolved placeholders do not auto-enable hosted providers.
+
 ## Integration with LegionIO
 
 legion-llm follows the standard core gem lifecycle:
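The `env://NAME` semantics documented in the README paragraph above can be sketched as a small resolver (the helper name is hypothetical, not the gem's API): plain strings pass through, placeholders resolve against `ENV`, and arrays fall back to the first entry that resolves.

```ruby
# Resolve a credential value per the documented env:// rules. Unresolved
# placeholders yield nil, which callers treat as "do not enable provider".
def resolve_credential(value)
  case value
  when Array
    value.lazy.map { |entry| resolve_credential(entry) }.find { |resolved| resolved }
  when %r{\Aenv://(.+)\z}
    resolved = ENV[Regexp.last_match(1)]
    resolved unless resolved.to_s.empty?
  else
    value
  end
end

ENV['CODEX_API_KEY'] = 'sk-example'
resolve_credential(['env://MISSING_KEY', 'env://CODEX_API_KEY'])  # falls back to the second entry
```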
data/legion-llm.gemspec
CHANGED

@@ -35,6 +35,7 @@ Gem::Specification.new do |spec|
   spec.add_dependency 'lex-gemini'
   spec.add_dependency 'lex-knowledge'
   spec.add_dependency 'lex-openai'
+  spec.add_dependency 'pdf-reader'
   spec.add_dependency 'ruby_llm', '~> 1.13'
   spec.add_dependency 'tzinfo', '>= 2.0'
 end
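The `pdf-reader` dependency added above backs the new `file_read` PDF path in helpers.rb; files that are neither text nor PDF are rejected by a pure-stdlib sniff. A standalone copy of that heuristic:

```ruby
# NUL bytes or invalid UTF-8 in the first 4 KiB mark content as binary
# (mirrors the binary_content? helper added in this release).
def binary_content?(content)
  return true if content.include?("\x00")

  sample = content.byteslice(0, 4096).to_s
  sample.force_encoding('UTF-8')
  !sample.valid_encoding?
end

binary_content?('plain text')        # => false
binary_content?("\xFF\xFE\x00".b)    # => true
```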
data/lib/legion/llm/api/native/helpers.rb
CHANGED

@@ -10,12 +10,14 @@ module Legion
     module API
       module Native
         module ClientToolMethods
+          include Legion::Logging::Helper
+
           private
 
           def log_tool(level, ref, status, **details)
             parts = ["[tool][#{ref}] #{status}"]
             details.each { |k, v| parts << "#{k}=#{v}" }
-
+            log.public_send(level, parts.join(' '))
           end
 
           def summarize_tool_arg_keys(kwargs)
@@ -37,7 +39,7 @@
             end
           end
 
-          def dispatch_client_tool(ref, **kwargs) # rubocop:disable Metrics/AbcSize,Metrics/CyclomaticComplexity,Metrics/PerceivedComplexity
+          def dispatch_client_tool(ref, **kwargs) # rubocop:disable Metrics/AbcSize,Metrics/CyclomaticComplexity,Metrics/MethodLength,Metrics/PerceivedComplexity
             case ref
             when 'sh'
               cmd = kwargs[:command] || kwargs[:cmd] || kwargs.values.first.to_s
@@ -45,7 +47,7 @@
               "exit=#{status.exitstatus}\n#{output}"
             when 'file_read'
               path = kwargs[:path] || kwargs[:file_path] || kwargs.values.first.to_s
-
+              read_client_file(path)
             when 'file_write'
               path = kwargs[:path] || kwargs[:file_path]
               content = kwargs[:content] || kwargs[:contents]
@@ -82,6 +84,7 @@
                 max_length ? content[0, max_length] : content
               rescue LoadError => e
                 missing = e.respond_to?(:path) && e.path ? e.path : 'legion/cli/chat/web_fetch'
+                handle_exception(e, level: :warn, handled: true, operation: 'llm.api.client_tool.web_fetch', missing: missing)
                 "web_fetch is unavailable: missing optional dependency #{missing}"
               end
             when 'web_search'
@@ -93,6 +96,7 @@
                 results[:results].map { |r| "### #{r[:title]}\n#{r[:url]}\n#{r[:snippet]}" }.join("\n\n")
               rescue LoadError => e
                 missing = e.respond_to?(:path) && e.path ? e.path : 'legion/cli/chat/web_search'
+                handle_exception(e, level: :warn, handled: true, operation: 'llm.api.client_tool.web_search', missing: missing)
                 "web_search is unavailable: missing optional dependency #{missing}"
               end
             else
@@ -100,6 +104,51 @@
             end
           end
 
+          def read_client_file(path)
+            return "File not found: #{path}" unless ::File.exist?(path)
+
+            return read_pdf_text(path) if pdf_file?(path)
+
+            content = ::File.binread(path)
+            return 'Binary file detected, cannot read as text.' if binary_content?(content)
+
+            content.force_encoding('UTF-8')
+            content
+          rescue StandardError => e
+            handle_exception(e, level: :warn, handled: true, operation: 'llm.api.client_tool.file_read', path: path)
+            "file_read error: #{e.message}"
+          end
+
+          def pdf_file?(path)
+            ::File.extname(path).casecmp('.pdf').zero? || ::File.binread(path, 5) == '%PDF-'
+          rescue StandardError => e
+            handle_exception(e, level: :warn, handled: true, operation: 'llm.api.client_tool.pdf_sniff', path: path)
+            false
+          end
+
+          def read_pdf_text(path)
+            require 'pdf-reader' unless defined?(::PDF::Reader)
+
+            reader = ::PDF::Reader.new(path)
+            text = reader.pages.map(&:text).join("\n\n").strip
+            text.empty? ? 'PDF contained no extractable text.' : text
+          rescue LoadError => e
+            missing = e.respond_to?(:path) && e.path ? e.path : 'pdf-reader'
+            handle_exception(e, level: :warn, handled: true, operation: 'llm.api.client_tool.pdf_extract', missing: missing)
+            'PDF text extraction unavailable: missing pdf-reader gem.'
+          rescue StandardError => e
+            handle_exception(e, level: :warn, handled: true, operation: 'llm.api.client_tool.pdf_extract', path: path)
+            "PDF text extraction failed: #{e.message}"
+          end
+
+          def binary_content?(content)
+            return true if content.include?("\x00")
+
+            sample = content.byteslice(0, 4096).to_s
+            sample.force_encoding('UTF-8')
+            !sample.valid_encoding?
+          end
+
           def notify_tool_event(type, ref, **data)
             handler = Thread.current[:legion_tool_event_handler]
             return unless handler
@@ -257,13 +306,14 @@
           rescue StandardError => e
             ms = begin
               ((::Process.clock_gettime(::Process::CLOCK_MONOTONIC) - t0) * 1000).round(1)
-            rescue StandardError
+            rescue StandardError => e
+              handle_exception(e, level: :warn, handled: true,
+                               operation: 'llm.api.client_tool.duration_measurement', tool_ref: tool_ref)
               nil
             end
             log_tool(:error, tool_ref, 'failed', duration_ms: ms, error: e.message)
             notify_tool_event(:tool_error, tool_ref, error: e.message)
-
-                             component_type: :api)
+            handle_exception(e, level: :error, handled: true, operation: "llm.api.client_tool.#{tool_ref}")
             "Tool error: #{e.message}"
           end
         end
@@ -287,6 +337,25 @@
         end
       end
 
+      define_method(:extract_text_content) do |content|
+        case content
+        when nil
+          ''
+        when String
+          content
+        when Array
+          content.filter_map { |entry| extract_text_content(entry) }.join
+        when Hash
+          type = content[:type] || content['type']
+          return '' unless type.nil? || type.to_s == 'text'
+
+          text = content.key?(:text) || content.key?('text') ? (content[:text] || content['text']) : (content[:content] || content['content'])
+          extract_text_content(text)
+        else
+          content.to_s
+        end
+      end
+
       define_method(:emit_sse_event) do |stream, event_name, payload|
         level = event_name == 'text-delta' ? :debug : :info
         log.send(level, "[sse][emit] event=#{event_name} keys=#{payload.is_a?(Hash) ? payload.keys.join(',') : 'n/a'}")
@@ -333,7 +402,8 @@
 
         kerb = begin
           Legion::Settings.dig(:kerberos, :username)
-        rescue StandardError
+        rescue StandardError => e
+          handle_exception(e, level: :warn, handled: true, operation: 'llm.api.identity.kerberos_username')
           nil
         end
         return "user:#{kerb}" if kerb.is_a?(String) && !kerb.empty?
@@ -354,14 +424,16 @@
       define_method(:resolve_requested_by) do |rack_env, identity_string|
        hostname = begin
          Legion::Settings[:client][:hostname]
-       rescue StandardError
+       rescue StandardError => e
+         handle_exception(e, level: :warn, handled: true, operation: 'llm.api.identity.client_hostname')
         Socket.gethostname
        end
        username = identity_string.delete_prefix('user:')
 
        kerb = begin
          Legion::Settings.dig(:kerberos, :username)
-       rescue StandardError
+       rescue StandardError => e
+         handle_exception(e, level: :warn, handled: true, operation: 'llm.api.identity.requested_by_kerberos')
         nil
        end
        if kerb.is_a?(String) && !kerb.empty?
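The `extract_text_content` helper added above is what flattens provider content blocks into a plain string. A self-contained version (written as a plain method rather than `define_method`) behaves like this:

```ruby
# Flattens provider content blocks (Anthropic/Bedrock-style arrays of
# {type:, text:} hashes) into one plain string; non-text blocks are dropped.
def extract_text_content(content)
  case content
  when nil then ''
  when String then content
  when Array then content.filter_map { |entry| extract_text_content(entry) }.join
  when Hash
    type = content[:type] || content['type']
    return '' unless type.nil? || type.to_s == 'text'

    text = content.key?(:text) || content.key?('text') ? (content[:text] || content['text']) : (content[:content] || content['content'])
    extract_text_content(text)
  else content.to_s
  end
end

blocks = [{ 'type' => 'text', 'text' => 'Hello ' }, { 'type' => 'tool_use' }, { type: :text, text: 'world' }]
extract_text_content(blocks)  # => "Hello world"
```

String and symbol keys are both accepted because provider payloads arrive in either form depending on the serialization path.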
data/lib/legion/llm/api/native/inference.rb
CHANGED

@@ -150,7 +150,10 @@
       }
 
       pipeline_response = executor.call_stream do |chunk|
-
+        thinking = extract_text_content(chunk.thinking) if chunk.respond_to?(:thinking)
+        emit_sse_event(out, 'thinking-delta', { delta: thinking }) unless thinking.to_s.empty?
+
+        text = extract_text_content(chunk.respond_to?(:content) ? chunk.content : chunk)
         next if text.empty?
 
         full_text << text
@@ -195,7 +198,7 @@
       exec_ms = ((::Process.clock_gettime(::Process::CLOCK_MONOTONIC) - exec_t0) * 1000).round
       log.debug("[llm][api][inference] action=executor_call duration_ms=#{exec_ms} request_id=#{request_id}")
       raw_msg = pipeline_response.message
-      content = raw_msg.is_a?(Hash) ? (raw_msg[:content] || raw_msg['content']) : raw_msg
+      content = extract_text_content(raw_msg.is_a?(Hash) ? (raw_msg[:content] || raw_msg['content']) : raw_msg)
       routing = pipeline_response.routing || {}
       tokens = pipeline_response.tokens || {}
       tool_calls = extract_tool_calls(pipeline_response)
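The streaming change above can be read as: reasoning chunks get their own `thinking-delta` event and only text chunks accumulate into the final reply. A stand-in sketch with local `Chunk` and event structures (not the gem's classes):

```ruby
Chunk = Struct.new(:content, :thinking)

# Route a provider chunk: thinking goes out as its own SSE-style event and
# is never appended to the assistant text; empty text chunks are skipped.
def handle_chunk(chunk, full_text, events)
  thinking = chunk.thinking.to_s
  events << ['thinking-delta', thinking] unless thinking.empty?

  text = chunk.content.to_s
  return if text.empty?

  events << ['text-delta', text]
  full_text << text
end

events = []
full_text = +''
handle_chunk(Chunk.new(nil, 'planning the answer'), full_text, events)
handle_chunk(Chunk.new('Final answer.', nil), full_text, events)
full_text  # => "Final answer."
```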
data/lib/legion/llm/call/bedrock_embeddings.rb
ADDED

@@ -0,0 +1,270 @@
+# frozen_string_literal: true
+
+# Monkey-patch RubyLLM's Bedrock provider to support embeddings via
+# Amazon Titan (amazon.titan-embed-text-v1 and v2) and Cohere Embed
+# (cohere.embed-english-v3 / cohere.embed-multilingual-v3).
+#
+# Without this patch, `RubyLLM::Providers::Bedrock` exposes no
+# `render_embedding_payload` method, so the discovery probe
+# (`klass.instance_method(:render_embedding_payload)`) raises NameError
+# and Bedrock is silently excluded from the embedding fallback chain.
+#
+# Companion piece to `call/bedrock_auth.rb` — both use the same
+# bearer-or-SigV4 `signed_post` path and live here (not in lex-bedrock)
+# because lex-bedrock wraps `aws-sdk-bedrockruntime`, not RubyLLM.
+#
+# ─── Upstream tracking ────────────────────────────────────────────
+# This is a deprecation-scheduled shim. The methods below are the
+# kind of thing that eventually belongs in the underlying ruby_llm
+# library's Bedrock provider. Remove this file once upstream ships
+# equivalent support. The short-circuit below renders the patch
+# inert when `render_embedding_payload` is defined natively, so an
+# accidental double-load after an upstream bump is safe.
+# ───────────────────────────────────────────────────────────────────
+#
+# Titan v2 request shape:
+#   POST /model/amazon.titan-embed-text-v2:0/invoke
+#   { "inputText": "...", "dimensions": 1024, "normalize": true }
+#   => { "embedding": [...], "inputTextTokenCount": N }
+#
+# Cohere Embed request shape:
+#   POST /model/cohere.embed-english-v3/invoke
+#   { "texts": ["..."], "input_type": "search_document" }
+#   => { "embeddings": [[...]], ... }
+
+require 'ruby_llm'
+require_relative 'bedrock_auth'
+
+if RubyLLM::Providers::Bedrock.method_defined?(:render_embedding_payload)
+  # Native support landed upstream — patch is inert.
+  Legion::Logging.logger.info('[llm][bedrock_embeddings] native ruby_llm embedding support detected — skipping patch')
+else
+  module RubyLLM
+    module Providers
+      class Bedrock
+        # Embeddings methods for AWS Bedrock via InvokeModel.
+        #
+        # Public methods are instance methods (not `module_function`) so the
+        # `include Embeddings` at the end of the class body properly overrides
+        # `Provider#embed` via Ruby's method-resolution order.
+        module Embeddings
+          TITAN_V2_PREFIX = 'amazon.titan-embed-text-v2'
+          TITAN_V1_PREFIX = 'amazon.titan-embed-text-v1'
+          COHERE_PREFIX = 'cohere.embed'
+
+          TITAN_ALLOWED_DIMENSIONS = [256, 512, 1024].freeze
+          TITAN_MAX_INPUT_BYTES = 45_000 # ~8k tokens; Titan rejects larger with 400 (and still bills)
+          COHERE_MAX_INPUT_BYTES = 8_192 # Cohere Embed v3 per-text byte budget
+          COHERE_MAX_TEXTS = 96 # Cohere Embed v3 batch limit
+          # Bedrock model IDs use only alphanumerics, `.`, `-`, and `:` (e.g.
+          # `amazon.titan-embed-text-v2:0`, `cohere.embed-english-v3`,
+          # `us.anthropic.claude-sonnet-4-6-v1`). Slashes and `..` are rejected
+          # to block path-injection into the `/model/<id>/invoke` URL.
+          MODEL_ID_PATTERN = /\A[a-zA-Z0-9.\-:]+\z/
+
+          # @param model [String, Symbol] Bedrock model id
+          # @return [String] InvokeModel URL path
+          # @raise [RubyLLM::Error] if model id contains unsafe characters
+          def embedding_url(model:)
+            raise RubyLLM::Error.new(nil, "Invalid Bedrock model id: #{model.inspect}") \
+              unless model.to_s.match?(MODEL_ID_PATTERN)
+
+            "/model/#{model}/invoke"
+          end
+
+          # @param text [String, Array<String>]
+          # @param model [String] Bedrock embedding model id
+          # @param dimensions [Integer, nil] Titan v2 only; one of {256, 512, 1024}
+          # @return [Hash] JSON-serializable request payload
+          # @raise [RubyLLM::Error] on unsupported model, oversize input, or invalid dimensions
+          def render_embedding_payload(text, model:, dimensions:)
+            model_str = model.to_s
+
+            if model_str.start_with?(TITAN_V2_PREFIX)
+              titan_v2_payload(text, dimensions: dimensions)
+            elsif model_str.start_with?(TITAN_V1_PREFIX)
+              titan_v1_payload(text)
+            elsif model_str.start_with?(COHERE_PREFIX)
+              cohere_payload(text)
+            else
+              raise RubyLLM::Error.new(
+                nil,
+                "Bedrock model '#{model}' is not supported for embeddings. " \
+                'Supported prefixes: amazon.titan-embed-text-v1, ' \
+                'amazon.titan-embed-text-v2, cohere.embed-*.'
+              )
+            end
+          end
+
+          # @param response [Faraday::Response]
+          # @param model [String]
+          # @param text [String, Array<String>] original input (used for shape decisions)
+          # @return [RubyLLM::Embedding]
+          # @raise [RubyLLM::Error] if the response carried no vector
+          def parse_embedding_response(response, model:, text:)
+            body = response.body
+            body = try_parse_json(body) if body.is_a?(String)
+
+            vectors =
+              if model.to_s.start_with?(COHERE_PREFIX)
+                Array(body['embeddings'])
+              else
+                # Titan single-text response: the single vector lives in :embedding.
+                # Batch callers are handled in `embed` via iteration.
+                [body['embedding']].compact
+              end
+
+            raise RubyLLM::Error.new(response, "Empty embedding response for model #{model}") if vectors.empty?
+
+            vectors = vectors.first if vectors.length == 1 && !text.is_a?(Array)
+            input_tokens = body['inputTextTokenCount'] ||
+                           body.dig('meta', 'billed_units', 'input_tokens') ||
+                           0
+
+            RubyLLM::Embedding.new(vectors: vectors, model: model, input_tokens: input_tokens)
+          end
+
+          # Override the base `embed` method so signing headers are applied.
+          #
+          # The parent `Provider#embed` calls `@connection.post(url, payload)` directly,
+          # which would skip both bearer-token and SigV4 auth for Bedrock. We go through
+          # `invoke_embedding`, which mirrors `signed_post` but parses responses with
+          # `parse_embedding_response` (not `parse_completion_response`).
+          #
+          # Titan accepts a single text per invocation. When an Array is passed to a
+          # Titan model, we iterate via `embed_titan_batch`, which traps per-element
+          # failures so one 429 mid-batch does not lose preceding successes.
+          #
+          # @param text [String, Array<String>]
+          # @param model [String]
+          # @param dimensions [Integer, nil]
+          # @return [RubyLLM::Embedding]
+          def embed(text, model:, dimensions:)
+            return embed_titan_batch(text, model: model, dimensions: dimensions) \
+              if text.is_a?(Array) && !model.to_s.start_with?(COHERE_PREFIX)
+
+            payload = render_embedding_payload(text, model: model, dimensions: dimensions)
+            url = embedding_url(model: model)
+            response = invoke_embedding(url, payload)
+            parse_embedding_response(response, model: model, text: text)
+          end
+
+          private
+
+          def titan_v2_payload(text, dimensions:)
+            raise RubyLLM::Error.new(nil, 'Titan v2 embeddings accept a single string per invocation.') \
+              if text.is_a?(Array)
+
+            enforce_input_size!(text, TITAN_MAX_INPUT_BYTES, 'Titan v2')
+
+            payload = { inputText: text.to_s, normalize: true }
+            dim = dimensions&.to_i
+            if dim
+              unless TITAN_ALLOWED_DIMENSIONS.include?(dim)
+                raise RubyLLM::Error.new(
+                  nil,
+                  "Titan v2 dimensions must be one of #{TITAN_ALLOWED_DIMENSIONS.inspect}, got #{dim}"
+                )
+              end
+              payload[:dimensions] = dim
+            end
+            payload
+          end
+
+          def titan_v1_payload(text)
+            raise RubyLLM::Error.new(nil, 'Titan v1 embeddings accept a single string per invocation.') \
+              if text.is_a?(Array)
+
+            enforce_input_size!(text, TITAN_MAX_INPUT_BYTES, 'Titan v1')
+            { inputText: text.to_s }
+          end
+
+          def cohere_payload(text)
+            texts = Array(text).map(&:to_s)
+            raise RubyLLM::Error.new(nil, "Cohere Embed batch size #{texts.size} exceeds max #{COHERE_MAX_TEXTS}") \
+              if texts.size > COHERE_MAX_TEXTS
+
+            texts.each { |t| enforce_input_size!(t, COHERE_MAX_INPUT_BYTES, 'Cohere Embed') }
+
+            { texts: texts, input_type: 'search_document' }
+          end
+
+          def enforce_input_size!(text, max_bytes, model_name)
+            bytes = text.to_s.bytesize
+            return if bytes <= max_bytes
+
+            raise RubyLLM::Error.new(
+              nil,
+              "#{model_name} input too large: #{bytes} bytes exceeds max #{max_bytes}. " \
+              'Caller must chunk before embedding.'
+            )
+          end
+
+          # Mirror of `signed_post` for embeddings: pre-serializes the body so the
+          # SigV4 signature matches the bytes Faraday actually sends. `@connection.post`
+          # is `RubyLLM::Connection#post(url, payload)` which requires both args, so we
+          # pass `payload` to satisfy the arity but override `req.body = body` in the
+          # block — the block runs after middleware, so the pre-serialized bytes win
+          # over whatever JSON middleware would have produced.
+          def invoke_embedding(url, payload)
+            body = Legion::JSON.dump(payload)
+            headers = sign_headers('POST', url, body)
+
+            @connection.post(url, payload) do |req|
+              req.headers.merge!(headers)
+              req.body = body
+            end
+          end
+
+          # Per-item trap-and-continue for Titan batch. Returns a combined Embedding
+          # whose `vectors` is an Array of [Float] per input index, with `nil` entries
+          # for failed slots. Token count aggregates successful calls.
+          #
+          # Raises only when every element failed — otherwise logs failures via
+          # `RubyLLM.logger` and returns partial results so callers keep the paid-for
+          # vectors. Idiomatic for this file because we are inside the RubyLLM
+          # namespace; Legion-side batch orchestration lives in
+          # `Legion::LLM::Call::Embeddings.generate_batch`.
+          def embed_titan_batch(texts, model:, dimensions:)
+            vectors = []
+            token_total = 0
+            failures = []
+
+            texts.each_with_index do |text, idx|
+              single = embed(text.to_s, model: model, dimensions: dimensions)
+              vectors << Array(single.vectors).first
+              token_total += single.input_tokens.to_i
+            rescue StandardError => e
+              vectors << nil
+              failures << { index: idx, error: e.class.name, message: e.message }
+            end
+
+            unless failures.empty?
+              RubyLLM.logger.warn(
+                '[bedrock_embeddings] Titan batch partial failure: ' \
+                "#{failures.size}/#{texts.size} model=#{model}"
+              )
+              failures.each do |f|
+                RubyLLM.logger.debug(
+                  "[bedrock_embeddings] batch item index=#{f[:index]} error=#{f[:error]} message=#{f[:message]}"
+                )
+              end
+            end
+
+            if failures.size == texts.size
+              raise RubyLLM::Error.new(
+                nil,
+                "All #{texts.size} Titan batch items failed. First error: #{failures.first[:message]}"
+              )
+            end
+
+            RubyLLM::Embedding.new(vectors: vectors, model: model, input_tokens: token_total)
+          end
+        end
+
+        include Embeddings
+      end
+    end
+  end
+end
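The Titan v2 guards in the patch above reduce to a small pure function. This standalone sketch uses `ArgumentError` in place of `RubyLLM::Error` so it runs without the gem; constants are copied from the patch:

```ruby
TITAN_ALLOWED_DIMENSIONS = [256, 512, 1024].freeze
TITAN_MAX_INPUT_BYTES = 45_000 # ~8k tokens; Titan 400s (and bills) above this

# Build the Titan v2 InvokeModel body, rejecting oversize input and
# unsupported dimension choices before any network call is made.
def titan_v2_payload(text, dimensions: nil)
  raise ArgumentError, 'Titan v2 accepts a single string per invocation' if text.is_a?(Array)
  raise ArgumentError, "input too large: #{text.bytesize} bytes" if text.bytesize > TITAN_MAX_INPUT_BYTES

  payload = { inputText: text, normalize: true }
  if dimensions
    raise ArgumentError, "dimensions must be one of #{TITAN_ALLOWED_DIMENSIONS.inspect}" \
      unless TITAN_ALLOWED_DIMENSIONS.include?(dimensions)

    payload[:dimensions] = dimensions
  end
  payload
end

titan_v2_payload('embed me', dimensions: 512)  # returns the InvokeModel request Hash
```

Failing before the wire call is the point: Titan bills even for requests it rejects with a 400, so the byte-size guard is a cost control as much as a validation.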