lex-llm-bedrock 0.3.12 → 0.3.18
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +36 -0
- data/README.md +103 -15
- data/lib/legion/extensions/llm/bedrock/provider.rb +696 -24
- data/lib/legion/extensions/llm/bedrock/version.rb +1 -1
- metadata +1 -1
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: e0a7bdd5a1097bfe7caf898a8b47649c5b29d7ae2d73c980e47bec743f343b2c
|
|
4
|
+
data.tar.gz: bab966bb6aa10487d43f1f4d01ae531b701ef74c49884ad820c127ee3d7efc91
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: 34d5f3629994cda2216de0826249516659e39e3e908152a66d62e46d4a9ce6b01f983e094c1c30f9c233227e810231d67dadfd8bed1fa86b70f448ea59b7c1b4
|
|
7
|
+
data.tar.gz: 6cc2164e232cada49623ec316ea2a767227196ed7526ac1956c7c0ebf9110e52d1aaa69823c51bb06f33f4779772e4dce1037ed0e9aaf17601db4c9e2e0ec089
|
data/CHANGELOG.md
CHANGED
|
@@ -1,5 +1,41 @@
|
|
|
1
1
|
# Changelog
|
|
2
2
|
|
|
3
|
+
## 0.3.18 - 2026-06-05
|
|
4
|
+
|
|
5
|
+
### Fixed
|
|
6
|
+
- **Spec and RuboCop compliance** — Verified all 54 specs pass cleanly. RuboCop auto-correct applied; 0 offenses remaining.
|
|
7
|
+
|
|
8
|
+
## 0.3.17 - 2026-06-05
|
|
9
|
+
|
|
10
|
+
### Fixed
|
|
11
|
+
- **Unused method arguments** — Prefixed unused keyword parameters (`params`, `model`, `streaming`) in `invoke_model_chat`, `invoke_model_stream`, and `build_invoke_model_body` with underscore prefix to satisfy RuboCop `Lint/UnusedMethodArgument` (provider.rb)
|
|
12
|
+
- **Keyword parameter ordering** — Moved optional keyword parameters to the end of `build_invoke_model_body` signature per `Style/KeywordParametersOrder` (provider.rb)
|
|
13
|
+
|
|
14
|
+
## 0.3.16 - 2026-06-04
|
|
15
|
+
|
|
16
|
+
### Fixed
|
|
17
|
+
- **Thinking config silently ignored by Converse API for Claude Sonnet 4+** — Bedrock Converse API does not support extended thinking for Claude Sonnet 4 and newer. When thinking is enabled for an Anthropic model, the provider now routes through `invoke_model` with the native Anthropic Messages API payload (the same format Phase 1 direct tests use), which correctly generates and returns thinking blocks (provider.rb)
|
|
18
|
+
- **Thinking extraction failed on AWS SDK structs** — `extract_thinking_from_content` assumed content blocks were Hashes. Bedrock Converse returns `Aws::BedrockRuntime::Types` structs that don't respond to `[]` the same way. Now uses `value()` helper for safe struct access on reasoning content blocks (provider.rb)
|
|
19
|
+
- **Streaming reasoning/thinking blocks not detected** — `wire_block_start` only checked `:thinking` blocks but Bedrock Converse uses `:reasoning` blocks for thinking content. Added `:reasoning` check. `wire_block_delta` now extracts from `delta.reasoning.text` and `delta.thinking.text` in addition to `delta.text` (provider.rb)
|
|
20
|
+
|
|
21
|
+
### Added
|
|
22
|
+
- **Debug logging for Bedrock converse calls** — Logs thinking config sent, elapsed time, usage, additional_fields keys, and content block types on response. Logs stream completion with accumulated length, tool use block count, and stop reason (provider.rb)
|
|
23
|
+
|
|
24
|
+
## 0.3.15 - 2026-06-04
|
|
25
|
+
|
|
26
|
+
### Fixed
|
|
27
|
+
- **Thinking config ignored in chat/stream/complete** — The `chat`, `stream`, and `complete` methods accepted `thinking:` kwarg but never passed it to Bedrock's converse API. Now passes thinking through `additional_model_request_fields[:thinking]` with AWS-format `{ type: "enabled", budget_tokens: N }`, accepting both `:budget_tokens` and `:budget` keys for compatibility with Anthropic API format (provider.rb)
|
|
28
|
+
|
|
29
|
+
## 0.3.14 - 2026-06-04
|
|
30
|
+
|
|
31
|
+
### Fixed
|
|
32
|
+
- **`NameError` on unpopulated AWS SDK struct fields** — `Aws::Structure` objects declare all members in their schema (including `cache_creation_input_tokens`), so `key?` returns `true`, but accessing a missing member raises `NameError` instead of returning `nil`. Added `safe_struct_access` helper that wraps `object[key]` in `rescue NameError → nil`, so unpopulated struct fields gracefully return `nil` instead of crashing the request (provider.rb)
|
|
33
|
+
|
|
34
|
+
## 0.3.13 - 2026-06-02
|
|
35
|
+
|
|
36
|
+
### Fixed
|
|
37
|
+
- **Tool call iteration crash on Bedrock escalation** — `assistant_tool_use_blocks` iterated `message.tool_calls` (a `Hash`) with `each`, which yields `[key, value]` pairs rather than `ToolCall` objects. Calling `.id` on the Array raised `NoMethodError` on every Bedrock call with tool-call history, tripping the circuit breaker and exhausting the escalation chain. Fixed by using `each_value` (provider.rb)
|
|
38
|
+
|
|
3
39
|
## 0.3.12 - 2026-06-02
|
|
4
40
|
|
|
5
41
|
### Fixed
|
data/README.md
CHANGED
|
@@ -8,18 +8,44 @@ This gem adds a hosted Bedrock provider surface for Legion LLM routing. It uses
|
|
|
8
8
|
|
|
9
9
|
```
|
|
10
10
|
Legion::Extensions::Llm::Bedrock
|
|
11
|
-
├── Provider
|
|
12
|
-
│ ├── Capabilities
|
|
13
|
-
│ ├── chat / stream
|
|
14
|
-
│ ├── embed
|
|
15
|
-
│ ├── count_tokens
|
|
16
|
-
│ ├── discover_offerings
|
|
17
|
-
│ ├── health / readiness
|
|
18
|
-
│
|
|
19
|
-
├──
|
|
20
|
-
└──
|
|
11
|
+
├── Provider # Bedrock implementation of the lex-llm Provider contract
|
|
12
|
+
│ ├── Capabilities # Capability predicates inferred from model IDs
|
|
13
|
+
│ ├── chat / stream # Converse / ConverseStream API calls
|
|
14
|
+
│ ├── embed # Titan InvokeModel embedding
|
|
15
|
+
│ ├── count_tokens # CountTokens API call
|
|
16
|
+
│ ├── discover_offerings # Static catalog + live ListFoundationModels
|
|
17
|
+
│ ├── health / readiness # Provider health checks with live AWS verification
|
|
18
|
+
│ ├── list_models # Live model enumeration
|
|
19
|
+
│ ├── invoke_model_chat # Native Anthropic payload for thinking-enabled models
|
|
20
|
+
│ └── invoke_model_stream # Native Anthropic streaming for thinking-enabled models
|
|
21
|
+
├── Actor::FleetWorker # Provider-owned fleet subscription gate
|
|
22
|
+
├── Actor::DiscoveryRefresh # Periodic model catalog refresh (conditional on actor runtime)
|
|
23
|
+
└── Runners::FleetWorker # Delegates fleet requests to lex-llm ProviderResponder
|
|
21
24
|
```
|
|
22
25
|
|
|
26
|
+
### Provider Dispatch
|
|
27
|
+
|
|
28
|
+
The `Provider` class decides at call time which API path to use:
|
|
29
|
+
|
|
30
|
+
| Condition | Path | Why |
|
|
31
|
+
|-----------|------|-----|
|
|
32
|
+
| Anthropic model + `thinking` or `tools` | `invoke_model` (native Anthropic payload) | Bedrock Converse silently drops thinking config and tool_use blocks for Claude Sonnet 4+ |
|
|
33
|
+
| All other cases | `Converse` / `ConverseStream` | Standard Bedrock managed inference API |
|
|
34
|
+
|
|
35
|
+
### Instance Discovery
|
|
36
|
+
|
|
37
|
+
`Legion::Extensions::Llm::Bedrock.discover_instances` scans five credential sources in priority order, deduplicates by fingerprint, and returns a hash of `{ instance_name => config_hash }` pairs:
|
|
38
|
+
|
|
39
|
+
| Source | Key | How it works |
|
|
40
|
+
|--------|-----|--------------|
|
|
41
|
+
| ENV bearer | `:env_bearer` | Reads `AWS_BEARER_TOKEN_BEDROCK` from environment |
|
|
42
|
+
| Claude config bearer | `:claude` | Reads `AWS_BEARER_TOKEN_BEDROCK` from Claude env/config, falls back to pattern match on any key containing `AWS`, `BEARER`, `TOKEN`, `BEDROCK` |
|
|
43
|
+
| ENV SigV4 | `:env_sigv4` | Reads `AWS_ACCESS_KEY_ID` + `AWS_SECRET_ACCESS_KEY` from environment |
|
|
44
|
+
| Extension settings | `:settings` + named instances | Reads from `extensions.llm.bedrock` settings, normalizes generic keys to `bedrock_*` prefix |
|
|
45
|
+
| Identity Broker | `:broker` | Reads `Legion::Identity::Broker.credentials_for(:aws)` when the module is defined |
|
|
46
|
+
|
|
47
|
+
Instances with unresolved credential references (`vault://` or `env://` URIs) are filtered out.
|
|
48
|
+
|
|
23
49
|
## Dependencies
|
|
24
50
|
|
|
25
51
|
| Gem | Required | Purpose |
|
|
@@ -36,9 +62,10 @@ Legion::Extensions::Llm::Bedrock
|
|
|
36
62
|
|
|
37
63
|
| Path | Purpose |
|
|
38
64
|
|------|---------|
|
|
39
|
-
| `lib/legion/extensions/llm/bedrock.rb` | Entry point: namespace, default settings, discovery, and shared provider registration metadata |
|
|
40
|
-
| `lib/legion/extensions/llm/bedrock/provider.rb` | Full Bedrock provider implementation |
|
|
65
|
+
| `lib/legion/extensions/llm/bedrock.rb` | Entry point: namespace, default settings, instance discovery, credential sources, and shared provider registration metadata |
|
|
66
|
+
| `lib/legion/extensions/llm/bedrock/provider.rb` | Full Bedrock provider implementation (1500+ lines) — Converse, invoke_model, streaming, tool calls, thinking, embeddings, health, and discovery |
|
|
41
67
|
| `lib/legion/extensions/llm/bedrock/actors/fleet_worker.rb` | Starts the provider-owned fleet subscriber when an instance opts in |
|
|
68
|
+
| `lib/legion/extensions/llm/bedrock/actors/discovery_refresh.rb` | Periodic model catalog refresh actor (loaded only when `Legion::Extensions::Actors::Every` is available) |
|
|
42
69
|
| `lib/legion/extensions/llm/bedrock/runners/fleet_worker.rb` | Hands provider fleet requests to `Legion::Extensions::Llm::Fleet::ProviderResponder` |
|
|
43
70
|
| `lib/legion/extensions/llm/bedrock/version.rb` | `VERSION` constant |
|
|
44
71
|
|
|
@@ -69,7 +96,7 @@ If explicit keys are not configured, the AWS SDK default credential provider cha
|
|
|
69
96
|
Legion::Extensions::Llm::Bedrock.default_settings
|
|
70
97
|
```
|
|
71
98
|
|
|
72
|
-
Configuration options: `bedrock_region`, `bedrock_endpoint`, `bedrock_access_key_id`, `bedrock_secret_access_key`, `bedrock_session_token`, `bedrock_profile`, `bedrock_stub_responses`.
|
|
99
|
+
Configuration options: `bedrock_region`, `bedrock_endpoint`, `bedrock_access_key_id`, `bedrock_secret_access_key`, `bedrock_session_token`, `bedrock_profile`, `bedrock_stub_responses`, `bearer_token`.
|
|
73
100
|
|
|
74
101
|
## Fleet Responder
|
|
75
102
|
|
|
@@ -121,7 +148,33 @@ Every offering uses:
|
|
|
121
148
|
|
|
122
149
|
Known aliases are intentionally small and conservative. For example, `claude-3-haiku` resolves to `anthropic.claude-3-haiku-20240307-v1:0`, while the preserved Bedrock model ID remains the routing model.
|
|
123
150
|
|
|
124
|
-
Static models: `claude-3-haiku`, `titan-text-express`, `titan-embed-text-v2`, `llama-3.2-11b-instruct`, `mistral-large-3`.
|
|
151
|
+
Static models: `claude-3-haiku`, `anthropic.claude-sonnet-4`, `titan-text-express`, `titan-embed-text-v2`, `llama-3.2-11b-instruct`, `mistral-large-3`.
|
|
152
|
+
|
|
153
|
+
## Inference Profiles
|
|
154
|
+
|
|
155
|
+
Bare model IDs (e.g. `anthropic.claude-sonnet-4`) are automatically prefixed with the region-based inference profile prefix (`us.`, `eu.`, `ap.`) based on the configured region. Region mapping is defined in `REGION_PREFIX`:
|
|
156
|
+
|
|
157
|
+
| Region | Prefix |
|
|
158
|
+
|--------|--------|
|
|
159
|
+
| `us-east-1`, `us-east-2`, `us-west-1`, `us-west-2` | `us` |
|
|
160
|
+
| `eu-central-1`, `eu-west-*` | `eu` |
|
|
161
|
+
| `ap-south-1`, `ap-southeast-*`, `ap-northeast-1` | `ap` |
|
|
162
|
+
|
|
163
|
+
Models already prefixed (`us.`, `eu.`, `ap.`, `arn:`) are passed through unchanged.
|
|
164
|
+
|
|
165
|
+
## Context Windows
|
|
166
|
+
|
|
167
|
+
Static context window data is available for known models without making live API calls. Looked up by prefix match in `Provider::CONTEXT_WINDOWS`.
|
|
168
|
+
|
|
169
|
+
| Model prefix | Context |
|
|
170
|
+
|-------------|---------|
|
|
171
|
+
| `anthropic.claude-*` (all) | 200,000 |
|
|
172
|
+
| `meta.llama3*` | 128,000 |
|
|
173
|
+
| `mistral.mistral-*` | 128,000 |
|
|
174
|
+
| `amazon.nova-pro`, `nova-lite` | 300,000 |
|
|
175
|
+
| `amazon.nova-micro` | 128,000 |
|
|
176
|
+
| `amazon.titan-text-premier` | 32,000 |
|
|
177
|
+
| `amazon.titan-text-express` | 8,192 |
|
|
125
178
|
|
|
126
179
|
## API Contract
|
|
127
180
|
|
|
@@ -132,17 +185,41 @@ The implementation is intentionally limited to Bedrock operations documented by
|
|
|
132
185
|
- `ConverseStream` for streaming chat responses
|
|
133
186
|
- `CountTokens` for token estimates
|
|
134
187
|
- `InvokeModel` only for the Titan text embedding request shape implemented here
|
|
188
|
+
- `InvokeModel` (non-streaming) for Anthropic models with thinking/tool use enabled
|
|
189
|
+
- `InvokeModelWithResponseStream` for Anthropic models with thinking/tool use enabled
|
|
135
190
|
|
|
136
191
|
Provider-specific request bodies are not guessed. Non-Titan embedding models raise until their documented body shape is added explicitly.
|
|
137
192
|
|
|
193
|
+
## Tool Calls
|
|
194
|
+
|
|
195
|
+
Tool calls follow the Bedrock Converse `tool_config` shape. When tool call history is present in the message array, assistant messages emit proper `{ tool_use: { tool_use_id, name, input } }` content blocks. Tool results use `{ tool_result: { tool_use_id, content } }` blocks.
|
|
196
|
+
|
|
197
|
+
For Anthropic models with tools, the `invoke_model` path is used with native Anthropic tool formatting (`input_schema` wrapped in the tool definition).
|
|
198
|
+
|
|
199
|
+
## Thinking (Extended Reasoning)
|
|
200
|
+
|
|
201
|
+
When `thinking:` is passed to `chat`, `stream`, or `complete` for an Anthropic model:
|
|
202
|
+
|
|
203
|
+
1. The provider detects the Anthropic model prefix and routes through `invoke_model` with the native Anthropic Messages API payload.
|
|
204
|
+
2. Thinking config is serialized as `{ type: 'enabled', budget_tokens: N }`, accepting both `:budget_tokens` and `:budget` keys.
|
|
205
|
+
3. Provider-specific keys (e.g. `:effort` from OpenAI) are stripped before sending.
|
|
206
|
+
4. Responses parse thinking content from `content_blocks[type: 'thinking']` for `invoke_model`, and from `delta.reasoning.text` for `ConverseStream`.
|
|
207
|
+
|
|
208
|
+
## Security
|
|
209
|
+
|
|
210
|
+
- Static AWS credentials emit a deprecation warning. Set `security.block_static_aws_credentials: true` in settings to reject them entirely.
|
|
211
|
+
- Bearer token authentication is supported via `Aws::StaticTokenProvider`, eliminating IMDS timeout on startup.
|
|
212
|
+
|
|
138
213
|
## Observability
|
|
139
214
|
|
|
140
215
|
The Bedrock namespace and provider implementation include `Legion::Logging::Helper` for structured logging:
|
|
141
216
|
|
|
142
217
|
- **Info-level**: provider connections, API calls (chat, stream, embed), model listing, health checks
|
|
143
|
-
- **Debug-level**: offline health checks, readiness probes,
|
|
218
|
+
- **Debug-level**: offline health checks, readiness probes, token counting, thinking config, request/response metadata
|
|
144
219
|
- **Rescue blocks**: handled provider failures call `handle_exception(e, level:, handled:, operation:)` with dot-separated operation names such as `bedrock.provider.health`
|
|
145
220
|
|
|
221
|
+
Set `BEDROCK_DEBUG_OUTPUT=/path/to/dir` to dump raw Bedrock responses and streaming events to JSON files for debugging.
|
|
222
|
+
|
|
146
223
|
## Development
|
|
147
224
|
|
|
148
225
|
```bash
|
|
@@ -152,12 +229,23 @@ bundle exec rubocop -A # auto-fix
|
|
|
152
229
|
bundle exec rubocop # lint check (0 offenses expected)
|
|
153
230
|
```
|
|
154
231
|
|
|
232
|
+
### Test Structure
|
|
233
|
+
|
|
234
|
+
| Spec file | Coverage |
|
|
235
|
+
|-----------|----------|
|
|
236
|
+
| `bedrock_spec.rb` | Provider surface: offerings, chat, stream, tools, embed, count_tokens, health, readiness, model listing, caching |
|
|
237
|
+
| `discover_instances_spec.rb` | Credential discovery from ENV, Claude config, settings, Identity Broker, and deduplication |
|
|
238
|
+
| `provider_contract_spec.rb` | Verifies all canonical methods use keyword-only arguments (no positional params) |
|
|
239
|
+
| `actors/fleet_worker_spec.rb` | Fleet worker actor: runner class, function, use_runner?, enabled? |
|
|
240
|
+
| `runners/fleet_worker_spec.rb` | Fleet worker runner: delegation to shared ProviderResponder |
|
|
241
|
+
|
|
155
242
|
## AWS References
|
|
156
243
|
|
|
157
244
|
- [Converse](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_Converse.html)
|
|
158
245
|
- [ConverseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ConverseStream.html)
|
|
159
246
|
- [CountTokens](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_CountTokens.html)
|
|
160
247
|
- [ListFoundationModels](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_ListFoundationModels.html)
|
|
248
|
+
- [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html)
|
|
161
249
|
- [Foundation model information](https://docs.aws.amazon.com/bedrock/latest/userguide/foundation-models-reference.html)
|
|
162
250
|
|
|
163
251
|
## License
|
|
@@ -1,5 +1,6 @@
|
|
|
1
1
|
# frozen_string_literal: true
|
|
2
2
|
|
|
3
|
+
require 'base64'
|
|
3
4
|
require 'aws-sdk-bedrock'
|
|
4
5
|
require 'aws-sdk-bedrockruntime'
|
|
5
6
|
require 'legion/json'
|
|
@@ -16,6 +17,7 @@ module Legion
|
|
|
16
17
|
|
|
17
18
|
STATIC_MODELS = [
|
|
18
19
|
{ model: 'anthropic.claude-3-haiku-20240307-v1:0', alias: 'claude-3-haiku' },
|
|
20
|
+
{ model: 'anthropic.claude-sonnet-4-20250514-v1:0', alias: 'anthropic.claude-sonnet-4' },
|
|
19
21
|
{ model: 'amazon.titan-text-express-v1', alias: 'titan-text-express' },
|
|
20
22
|
{ model: 'amazon.titan-embed-text-v2:0', alias: 'titan-embed-text-v2', usage_type: :embedding },
|
|
21
23
|
{ model: 'meta.llama3-2-11b-instruct-v1:0', alias: 'llama-3.2-11b-instruct' },
|
|
@@ -210,32 +212,124 @@ module Legion
|
|
|
210
212
|
tools: {},
|
|
211
213
|
tool_prefs: nil,
|
|
212
214
|
params: {},
|
|
215
|
+
thinking: nil,
|
|
213
216
|
**_provider_options
|
|
214
217
|
)
|
|
215
218
|
log.info { "bedrock.provider.chat: model=#{model_id(model)} messages=#{messages.size}" }
|
|
219
|
+
|
|
220
|
+
# Bedrock Converse API silently drops thinking config and tool_use blocks
|
|
221
|
+
# for Claude Sonnet 4+. Use invoke_model with native Anthropic payload.
|
|
222
|
+
if anthropic_model?(model_id(model)) && (thinking || (tools && !tools.empty?))
|
|
223
|
+
return invoke_model_chat(messages:, model:, temperature:, max_tokens:, tools:, tool_prefs:,
|
|
224
|
+
thinking:, params:)
|
|
225
|
+
end
|
|
226
|
+
|
|
216
227
|
request = Utils.deep_merge(
|
|
217
|
-
converse_request(messages, model:, temperature:, max_tokens:, tools:, tool_prefs:),
|
|
228
|
+
converse_request(messages, model:, temperature:, max_tokens:, tools:, tool_prefs:, thinking:),
|
|
218
229
|
params
|
|
219
230
|
)
|
|
220
231
|
log.debug do
|
|
221
232
|
"bedrock.provider.chat: request prepared model=#{model_id(model)} tools=#{tools.size} " \
|
|
222
233
|
"tool_choice=#{tool_choice_label(tool_prefs)} param_keys=#{params.keys.map(&:to_s).sort.join(',')}"
|
|
223
234
|
end
|
|
224
|
-
|
|
235
|
+
|
|
236
|
+
# Log the thinking config being sent
|
|
237
|
+
thinking_config = request.dig(:additional_model_request_fields, :thinking)
|
|
238
|
+
log.debug { "bedrock.provider.chat: thinking_config=#{thinking_config.inspect}" } if thinking_config
|
|
239
|
+
|
|
240
|
+
start_time = Time.now
|
|
241
|
+
response = begin
|
|
242
|
+
runtime_client.converse(**request)
|
|
243
|
+
rescue StandardError => e
|
|
244
|
+
elapsed = ((Time.now - start_time) * 1000).round
|
|
245
|
+
log.error do
|
|
246
|
+
"bedrock.provider.chat: converse failed model=#{model_id(model)} " \
|
|
247
|
+
"error=#{e.class}: #{e.message} elapsed_ms=#{elapsed}"
|
|
248
|
+
end
|
|
249
|
+
raise
|
|
250
|
+
end
|
|
251
|
+
elapsed = ((Time.now - start_time) * 1000).round
|
|
252
|
+
|
|
253
|
+
# Dump raw Bedrock response for debugging
|
|
254
|
+
raw_debug = response.respond_to?(:to_h) ? response.to_h : response.inspect[0, 2000]
|
|
255
|
+
dump_path = ENV.fetch('BEDROCK_DEBUG_OUTPUT', nil)
|
|
256
|
+
if dump_path
|
|
257
|
+
begin
|
|
258
|
+
dump_file = File.join(dump_path, "bedrock_chat_#{Time.now.strftime('%Y%m%d_%H%M%S')}.json")
|
|
259
|
+
File.write(dump_file, Legion::JSON.pretty_generate(raw_debug))
|
|
260
|
+
log.debug { "bedrock.provider.chat: raw response dumped to #{dump_file}" }
|
|
261
|
+
rescue StandardError => e
|
|
262
|
+
log.warn { "bedrock.provider.chat: failed to dump raw response: #{e.message}" }
|
|
263
|
+
end
|
|
264
|
+
end
|
|
265
|
+
|
|
266
|
+
# Log response metadata
|
|
267
|
+
usage = value(response, :usage) || {}
|
|
268
|
+
additional_fields = value(response, :additional_model_response_fields)
|
|
269
|
+
output = value(response, :output)
|
|
270
|
+
content_blocks = output ? value(output, :message) : nil
|
|
271
|
+
# AWS SDK content blocks are structs, not hashes — use safe inspection
|
|
272
|
+
block_types = if content_blocks
|
|
273
|
+
Array(value(content_blocks, :content)).map do |b|
|
|
274
|
+
if b.respond_to?(:reasoning)
|
|
275
|
+
'reasoning'
|
|
276
|
+
elsif b.respond_to?(:text)
|
|
277
|
+
'text'
|
|
278
|
+
elsif b.respond_to?(:tool_use)
|
|
279
|
+
'tool_use'
|
|
280
|
+
else
|
|
281
|
+
b.class.name
|
|
282
|
+
end
|
|
283
|
+
end.inspect
|
|
284
|
+
else
|
|
285
|
+
'none'
|
|
286
|
+
end
|
|
287
|
+
af_keys = if additional_fields.respond_to?(:to_h)
|
|
288
|
+
additional_fields.to_h.keys.map(&:to_s).sort
|
|
289
|
+
else
|
|
290
|
+
additional_fields.respond_to?(:keys) ? additional_fields.keys.map(&:to_s).sort : []
|
|
291
|
+
end
|
|
292
|
+
|
|
293
|
+
log.debug do
|
|
294
|
+
"bedrock.provider.chat: response received model=#{model_id(model)} elapsed_ms=#{elapsed} " \
|
|
295
|
+
"usage=#{usage.inspect} additional_fields_keys=#{af_keys.inspect} " \
|
|
296
|
+
"content_block_types=#{block_types}"
|
|
297
|
+
end
|
|
298
|
+
|
|
299
|
+
parse_converse_response(response, model_id(model))
|
|
225
300
|
end
|
|
226
301
|
|
|
227
302
|
def stream(messages:, model:, temperature: nil, max_tokens: nil, tools: {}, tool_prefs: nil, params: {},
|
|
228
|
-
**_provider_options, &)
|
|
229
|
-
log.info
|
|
303
|
+
thinking: nil, **_provider_options, &)
|
|
304
|
+
log.info do
|
|
305
|
+
"bedrock.provider.stream: model=#{model_id(model)} messages=#{messages.size} tools=#{tools.size}"
|
|
306
|
+
end
|
|
307
|
+
|
|
308
|
+
# Bedrock Converse API silently drops thinking config and tool_use blocks
|
|
309
|
+
# for Claude Sonnet 4+. Use invoke_model with native Anthropic payload.
|
|
310
|
+
if anthropic_model?(model_id(model)) && (thinking || (tools && !tools.empty?))
|
|
311
|
+
return invoke_model_stream(messages:, model:, temperature:, max_tokens:, tools:, tool_prefs:,
|
|
312
|
+
thinking:, params:, &)
|
|
313
|
+
end
|
|
314
|
+
|
|
230
315
|
request = Utils.deep_merge(
|
|
231
|
-
converse_request(messages, model:, temperature:, max_tokens:, tools:, tool_prefs:),
|
|
316
|
+
converse_request(messages, model:, temperature:, max_tokens:, tools:, tool_prefs:, thinking:),
|
|
232
317
|
params
|
|
233
318
|
)
|
|
234
319
|
log.debug do
|
|
235
320
|
"bedrock.provider.stream: request prepared model=#{model_id(model)} tools=#{tools.size} " \
|
|
236
321
|
"tool_choice=#{tool_choice_label(tool_prefs)} param_keys=#{params.keys.map(&:to_s).sort.join(',')}"
|
|
237
322
|
end
|
|
238
|
-
|
|
323
|
+
|
|
324
|
+
# Log the thinking config being sent
|
|
325
|
+
thinking_config = request.dig(:additional_model_request_fields, :thinking)
|
|
326
|
+
log.debug { "bedrock.provider.stream: thinking_config=#{thinking_config.inspect}" } if thinking_config
|
|
327
|
+
|
|
328
|
+
start_time = Time.now
|
|
329
|
+
result = stream_converse(request, model_id(model), &)
|
|
330
|
+
elapsed = ((Time.now - start_time) * 1000).round
|
|
331
|
+
log.debug { "bedrock.provider.stream: completed model=#{model_id(model)} elapsed_ms=#{elapsed}" }
|
|
332
|
+
result
|
|
239
333
|
end
|
|
240
334
|
|
|
241
335
|
def count_tokens(
|
|
@@ -284,18 +378,434 @@ module Legion
|
|
|
284
378
|
tool_prefs: nil, &)
|
|
285
379
|
payload = params.dup
|
|
286
380
|
payload[:additional_model_request_fields] ||= {}
|
|
287
|
-
payload[:additional_model_request_fields][:thinking] = thinking if thinking
|
|
288
381
|
payload[:additional_model_request_fields][:response_format] = schema if schema
|
|
289
382
|
|
|
290
383
|
if block_given?
|
|
291
|
-
stream(messages:, model:, temperature:, tools:, tool_prefs:, params: payload, &)
|
|
384
|
+
stream(messages:, model:, temperature:, tools:, tool_prefs:, params: payload, thinking:, &)
|
|
292
385
|
else
|
|
293
|
-
chat(messages:, model:, temperature:, tools:, tool_prefs:, params: payload)
|
|
386
|
+
chat(messages:, model:, temperature:, tools:, tool_prefs:, params: payload, thinking:)
|
|
294
387
|
end
|
|
295
388
|
end
|
|
296
389
|
|
|
297
390
|
private
|
|
298
391
|
|
|
392
|
+
# Returns true if the model is an Anthropic model on Bedrock
|
|
393
|
+
def anthropic_model?(model_id)
|
|
394
|
+
return false unless model_id
|
|
395
|
+
|
|
396
|
+
mid = model_id.to_s
|
|
397
|
+
mid.start_with?('anthropic.', 'us.anthropic.', 'eu.anthropic.', 'ap.anthropic.')
|
|
398
|
+
end
|
|
399
|
+
|
|
400
|
+
# --- invoke_model path for thinking-enabled Anthropic models ---
|
|
401
|
+
# Bedrock Converse API silently drops thinking config for Claude Sonnet 4+.
|
|
402
|
+
# invoke_model uses the native Anthropic Messages API payload format which supports thinking.
|
|
403
|
+
|
|
404
|
+
def invoke_model_chat(messages:, model:, temperature:, max_tokens:, tools:, tool_prefs:,
|
|
405
|
+
thinking:, _params: nil, **_rest)
|
|
406
|
+
mid = model_id(model)
|
|
407
|
+
body = build_invoke_model_body(
|
|
408
|
+
messages: messages, model: mid, temperature: temperature, max_tokens: max_tokens,
|
|
409
|
+
tools: tools, tool_prefs: tool_prefs, thinking: thinking
|
|
410
|
+
)
|
|
411
|
+
|
|
412
|
+
log.debug { "bedrock.provider.invoke_model_chat: model=#{mid} thinking=#{thinking.inspect}" }
|
|
413
|
+
|
|
414
|
+
response = runtime_client.invoke_model(
|
|
415
|
+
model_id: self.class.inference_profile_id(mid, region: region),
|
|
416
|
+
content_type: 'application/json',
|
|
417
|
+
accept: 'application/json',
|
|
418
|
+
body: Legion::JSON.generate(body)
|
|
419
|
+
)
|
|
420
|
+
|
|
421
|
+
# Read body once — it's a stream that can only be consumed once
|
|
422
|
+
body_raw = value(response, :body)
|
|
423
|
+
body_raw = body_raw.read if body_raw.respond_to?(:read)
|
|
424
|
+
body_raw = body_raw.string if body_raw.respond_to?(:string)
|
|
425
|
+
body_str = body_raw.to_s
|
|
426
|
+
|
|
427
|
+
# Dump raw invoke_model response for debugging
|
|
428
|
+
dump_path = ENV.fetch('BEDROCK_DEBUG_OUTPUT', nil)
|
|
429
|
+
if dump_path
|
|
430
|
+
begin
|
|
431
|
+
dump_file = File.join(dump_path, "bedrock_invoke_chat_#{Time.now.strftime('%Y%m%d_%H%M%S')}.json")
|
|
432
|
+
File.write(dump_file, body_str)
|
|
433
|
+
log.debug { "bedrock.provider.invoke_model_chat: raw response dumped to #{dump_file}" }
|
|
434
|
+
rescue StandardError => e
|
|
435
|
+
log.warn { "bedrock.provider.invoke_model_chat: failed to dump raw response: #{e.message}" }
|
|
436
|
+
end
|
|
437
|
+
end
|
|
438
|
+
|
|
439
|
+
# Wrap body string back into response so parse_invoke_model_response can use it
|
|
440
|
+
parsed_body = Legion::JSON.parse(body_str, symbolize_names: false)
|
|
441
|
+
parse_invoke_model_response_hash(parsed_body, mid)
|
|
442
|
+
end
|
|
443
|
+
|
|
444
|
+
def invoke_model_stream(messages:, model:, temperature:, max_tokens:, tools:, tool_prefs:,
|
|
445
|
+
thinking:, _params: nil, **_rest, &)
|
|
446
|
+
mid = model_id(model)
|
|
447
|
+
body = build_invoke_model_body(
|
|
448
|
+
messages: messages, model: mid, temperature: temperature, max_tokens: max_tokens,
|
|
449
|
+
tools: tools, tool_prefs: tool_prefs, thinking: thinking, streaming: true
|
|
450
|
+
)
|
|
451
|
+
|
|
452
|
+
log.debug { "bedrock.provider.invoke_model_stream: model=#{mid} thinking=#{thinking.inspect}" }
|
|
453
|
+
|
|
454
|
+
state = {
|
|
455
|
+
accumulated: +'',
|
|
456
|
+
thinking: +'',
|
|
457
|
+
final_usage: nil,
|
|
458
|
+
stop_reason: nil,
|
|
459
|
+
tool_use_blocks: [],
|
|
460
|
+
current_tool_use: nil,
|
|
461
|
+
in_thinking: false,
|
|
462
|
+
raw_events: []
|
|
463
|
+
}
|
|
464
|
+
|
|
465
|
+
dump_path = ENV.fetch('BEDROCK_DEBUG_OUTPUT', nil)
|
|
466
|
+
|
|
467
|
+
# rubocop:disable Metrics/BlockLength
|
|
468
|
+
runtime_client.invoke_model_with_response_stream(
|
|
469
|
+
model_id: self.class.inference_profile_id(mid, region: region),
|
|
470
|
+
content_type: 'application/json',
|
|
471
|
+
accept: 'application/json',
|
|
472
|
+
body: Legion::JSON.generate(body)
|
|
473
|
+
) do |stream|
|
|
474
|
+
# ResponseStream is an event emitter (Aws::BedrockRuntime::EventStreams::ResponseStream).
|
|
475
|
+
# Wire on_chunk_event to receive actual data events.
|
|
476
|
+
# Each chunk contains base64-encoded JSON lines with Anthropic events.
|
|
477
|
+
log.debug { "bedrock.provider.invoke_model_stream: stream class=#{stream.class}" }
|
|
478
|
+
|
|
479
|
+
stream.on_chunk_event do |event|
|
|
480
|
+
raw = event.respond_to?(:bytes) ? event.bytes : nil
|
|
481
|
+
raw = raw.read if raw.respond_to?(:read)
|
|
482
|
+
next unless raw&.length&.positive?
|
|
483
|
+
|
|
484
|
+
# Bedrock invoke_model_with_response_stream payloads are gzip-compressed.
|
|
485
|
+
# Detect gzip magic bytes (0x1f8b) and decompress.
|
|
486
|
+
require 'zlib'
|
|
487
|
+
raw = Zlib::GzipReader.wrap(StringIO.new(raw), &:read) if raw.byteslice(0, 2) == "\x1f\x8b"
|
|
488
|
+
|
|
489
|
+
# Now raw is UTF-8 JSON lines (newline-delimited Anthropic events)
|
|
490
|
+
text = raw.force_encoding('UTF-8')
|
|
491
|
+
text.lines.each do |line|
|
|
492
|
+
line = line.strip
|
|
493
|
+
next if line.empty?
|
|
494
|
+
|
|
495
|
+
raw_event = Legion::JSON.parse(line, symbolize_names: false)
|
|
496
|
+
next unless raw_event.is_a?(Hash)
|
|
497
|
+
|
|
498
|
+
event_type = raw_event['type'] || 'unknown'
|
|
499
|
+
state[:raw_events] << { event: event_type, data: raw_event } if dump_path
|
|
500
|
+
handle_invoke_model_stream_json(raw_event, state, mid) { |chunk| yield chunk if block_given? }
|
|
501
|
+
end
|
|
502
|
+
rescue StandardError => e
|
|
503
|
+
log.warn { "bedrock.provider.invoke_model_stream: chunk decode error=#{sanitize_log(e.message)}" }
|
|
504
|
+
end
|
|
505
|
+
|
|
506
|
+
stream.on_error_event do |event|
|
|
507
|
+
log.warn do
|
|
508
|
+
"bedrock.provider.invoke_model_stream: error event ivars=#{event.instance_variables.inspect}"
|
|
509
|
+
end
|
|
510
|
+
end
|
|
511
|
+
|
|
512
|
+
stream.on_internal_server_exception_event do |event|
|
|
513
|
+
log.warn do
|
|
514
|
+
'bedrock.provider.invoke_model_stream: internal_server_exception ' \
|
|
515
|
+
"ivars=#{event.instance_variables.inspect}"
|
|
516
|
+
end
|
|
517
|
+
end
|
|
518
|
+
|
|
519
|
+
stream.on_model_stream_error_exception_event do |event|
|
|
520
|
+
log.warn do
|
|
521
|
+
"bedrock.provider.invoke_model_stream: model_stream_error ivars=#{event.instance_variables.inspect}"
|
|
522
|
+
end
|
|
523
|
+
end
|
|
524
|
+
end
|
|
525
|
+
# rubocop:enable Metrics/BlockLength
|
|
526
|
+
|
|
527
|
+
# Dump raw streaming events for debugging
|
|
528
|
+
if dump_path && state[:raw_events].any?
|
|
529
|
+
begin
|
|
530
|
+
dump_file = File.join(dump_path, "bedrock_invoke_stream_#{Time.now.strftime('%Y%m%d_%H%M%S')}.json")
|
|
531
|
+
File.write(dump_file, Legion::JSON.pretty_generate(state[:raw_events]))
|
|
532
|
+
log.debug do
|
|
533
|
+
"bedrock.provider.invoke_model_stream: #{state[:raw_events].size} raw events dumped to #{dump_file}"
|
|
534
|
+
end
|
|
535
|
+
rescue StandardError => e
|
|
536
|
+
log.warn { "bedrock.provider.invoke_model_stream: failed to dump raw events: #{e.message}" }
|
|
537
|
+
end
|
|
538
|
+
end
|
|
539
|
+
|
|
540
|
+
usage = state[:final_usage] || {}
|
|
541
|
+
msg_attrs = {
|
|
542
|
+
role: :assistant,
|
|
543
|
+
content: state[:accumulated],
|
|
544
|
+
model_id: mid,
|
|
545
|
+
tool_calls: build_stream_tool_calls(state[:tool_use_blocks]),
|
|
546
|
+
input_tokens: usage.fetch(:input_tokens, 0) || usage.fetch('input_tokens', 0),
|
|
547
|
+
output_tokens: usage.fetch(:output_tokens, 0) || usage.fetch('output_tokens', 0),
|
|
548
|
+
cached_tokens: usage.fetch(:cache_read_input_tokens, nil) || usage.fetch('cache_read_input_tokens', nil),
|
|
549
|
+
cache_creation_tokens: usage.fetch(:cache_creation_input_tokens,
|
|
550
|
+
nil) || usage.fetch('cache_creation_input_tokens', nil),
|
|
551
|
+
stop_reason: state[:stop_reason]
|
|
552
|
+
}
|
|
553
|
+
msg_attrs[:thinking] = state[:thinking] unless state[:thinking].empty?
|
|
554
|
+
|
|
555
|
+
Legion::Extensions::Llm::Message.new(**msg_attrs)
|
|
556
|
+
end
|
|
557
|
+
|
|
558
|
+
def build_invoke_model_body(messages:, temperature:, max_tokens:, tools:, tool_prefs:, thinking:,
|
|
559
|
+
_model: nil, _streaming: false)
|
|
560
|
+
body = {
|
|
561
|
+
max_tokens: max_tokens || 4096,
|
|
562
|
+
messages: format_invoke_model_messages(messages),
|
|
563
|
+
anthropic_version: 'bedrock-2023-05-31'
|
|
564
|
+
}
|
|
565
|
+
body[:temperature] = temperature if temperature
|
|
566
|
+
if tools && !tools.empty?
|
|
567
|
+
tool_format = format_invoke_model_tools(tools, tool_prefs)
|
|
568
|
+
body[:tools] = tool_format[:tools]
|
|
569
|
+
body[:tool_choice] = tool_format[:tool_choice] if tool_format[:tool_choice]
|
|
570
|
+
end
|
|
571
|
+
body[:thinking] = invoke_model_thinking(thinking) if thinking
|
|
572
|
+
# NOTE: Don't include body[:stream] = true in the JSON body for invoke_model_with_response_stream.
|
|
573
|
+
# The endpoint itself implies streaming; Bedrock rejects the extra field.
|
|
574
|
+
body
|
|
575
|
+
end
|
|
576
|
+
|
|
577
|
+
# Strip provider-specific keys (e.g. effort from OpenAI) that Bedrock/Anthropic APIs don't accept.
|
|
578
|
+
def invoke_model_thinking(thinking)
|
|
579
|
+
return thinking unless thinking.is_a?(Hash)
|
|
580
|
+
|
|
581
|
+
thinking.except(:effort, 'effort')
|
|
582
|
+
end
|
|
583
|
+
|
|
584
|
+
def format_invoke_model_messages(messages)
|
|
585
|
+
messages.filter_map do |msg|
|
|
586
|
+
role = msg.respond_to?(:role) ? msg.role.to_s : (msg[:role] || msg['role']).to_s
|
|
587
|
+
next if role == 'system'
|
|
588
|
+
|
|
589
|
+
content = case role
|
|
590
|
+
when 'tool'
|
|
591
|
+
format_invoke_model_tool_result(msg)
|
|
592
|
+
when 'assistant'
|
|
593
|
+
format_invoke_model_assistant(msg)
|
|
594
|
+
else
|
|
595
|
+
format_invoke_model_content(msg)
|
|
596
|
+
end
|
|
597
|
+
|
|
598
|
+
next if content.nil? || (content.is_a?(Array) && content.empty?)
|
|
599
|
+
|
|
600
|
+
{ role: role, content: content }
|
|
601
|
+
end
|
|
602
|
+
end
|
|
603
|
+
|
|
604
|
+
def format_invoke_model_content(msg)
|
|
605
|
+
content = msg.respond_to?(:content) ? msg.content : (msg[:content] || msg['content'])
|
|
606
|
+
return [] if content.nil?
|
|
607
|
+
|
|
608
|
+
if content.is_a?(String)
|
|
609
|
+
[{ type: 'text', text: content }]
|
|
610
|
+
elsif content.is_a?(Array)
|
|
611
|
+
content.filter_map do |block|
|
|
612
|
+
type = (block[:type] || block['type']).to_s
|
|
613
|
+
next { type: 'text', text: block[:text] || block['text'] } if type == 'text'
|
|
614
|
+
|
|
615
|
+
block
|
|
616
|
+
end
|
|
617
|
+
else
|
|
618
|
+
[{ type: 'text', text: content.to_s }]
|
|
619
|
+
end
|
|
620
|
+
end
|
|
621
|
+
|
|
622
|
+
def format_invoke_model_tool_result(msg)
|
|
623
|
+
tool_call_id = if msg.respond_to?(:tool_call_id)
|
|
624
|
+
msg.tool_call_id
|
|
625
|
+
else
|
|
626
|
+
msg[:tool_call_id] || msg['tool_call_id']
|
|
627
|
+
end
|
|
628
|
+
content = if msg.respond_to?(:tool_results)
|
|
629
|
+
msg.tool_results.to_s
|
|
630
|
+
else
|
|
631
|
+
(msg[:content] || msg['content']).to_s
|
|
632
|
+
end
|
|
633
|
+
[{ type: 'tool_result', tool_use_id: tool_call_id, content: [{ type: 'text', text: content }] }]
|
|
634
|
+
end
|
|
635
|
+
|
|
636
|
+
def format_invoke_model_assistant(msg)
|
|
637
|
+
blocks = []
|
|
638
|
+
|
|
639
|
+
text = msg.respond_to?(:content) ? msg.content : (msg[:content] || msg['content'])
|
|
640
|
+
text_str = text.to_s
|
|
641
|
+
blocks << { type: 'text', text: text_str } unless text_str.strip.empty?
|
|
642
|
+
|
|
643
|
+
tool_calls = msg.respond_to?(:tool_calls) ? msg.tool_calls : (msg[:tool_calls] || msg['tool_calls'] || {})
|
|
644
|
+
call_array = tool_calls.is_a?(Hash) ? tool_calls.values : Array(tool_calls)
|
|
645
|
+
|
|
646
|
+
call_array.each do |call|
|
|
647
|
+
call_id = call.respond_to?(:id) ? call.id : (call[:id] || call['id'])
|
|
648
|
+
call_name = call.respond_to?(:name) ? call.name : (call[:name] || call['name'])
|
|
649
|
+
call_args = if call.respond_to?(:arguments)
|
|
650
|
+
call.arguments
|
|
651
|
+
else
|
|
652
|
+
call[:arguments] || call['arguments'] || {}
|
|
653
|
+
end
|
|
654
|
+
|
|
655
|
+
blocks << {
|
|
656
|
+
type: 'tool_use',
|
|
657
|
+
id: call_id,
|
|
658
|
+
name: call_name,
|
|
659
|
+
input: call_args
|
|
660
|
+
}
|
|
661
|
+
end
|
|
662
|
+
|
|
663
|
+
blocks
|
|
664
|
+
end
|
|
665
|
+
|
|
666
|
+
def format_invoke_model_tools(tools, tool_prefs)
|
|
667
|
+
tool_list = tools.values.map do |tool|
|
|
668
|
+
{
|
|
669
|
+
name: tool[:name] || tool['name'],
|
|
670
|
+
description: tool[:description] || tool['description'] || '',
|
|
671
|
+
input_schema: tool[:params_schema] || tool['params_schema'] ||
|
|
672
|
+
{ type: 'object', properties: {} }
|
|
673
|
+
}
|
|
674
|
+
end
|
|
675
|
+
|
|
676
|
+
result = { tools: tool_list }
|
|
677
|
+
|
|
678
|
+
if tool_prefs
|
|
679
|
+
choice = tool_prefs[:choice] || tool_prefs['choice']
|
|
680
|
+
result[:tool_choice] = if [:required, 'required'].include?(choice)
|
|
681
|
+
{ type: 'any' }
|
|
682
|
+
elsif choice.to_s != 'auto' && !choice.to_s.empty?
|
|
683
|
+
{ type: 'tool', name: choice.to_s }
|
|
684
|
+
else
|
|
685
|
+
{ type: 'auto' }
|
|
686
|
+
end
|
|
687
|
+
end
|
|
688
|
+
|
|
689
|
+
result
|
|
690
|
+
end
|
|
691
|
+
|
|
692
|
+
def parse_invoke_model_response(response, model_id)
|
|
693
|
+
body_raw = value(response, :body)
|
|
694
|
+
body_raw = body_raw.read if body_raw.respond_to?(:read)
|
|
695
|
+
body_raw = body_raw.string if body_raw.respond_to?(:string)
|
|
696
|
+
body = Legion::JSON.parse(body_raw, symbolize_names: false)
|
|
697
|
+
build_invoke_model_message(body, model_id)
|
|
698
|
+
end
|
|
699
|
+
|
|
700
|
+
def parse_invoke_model_response_hash(body, model_id)
|
|
701
|
+
# body is already a parsed Hash from Legion::JSON.parse
|
|
702
|
+
build_invoke_model_message(body, model_id)
|
|
703
|
+
end
|
|
704
|
+
|
|
705
|
+
def build_invoke_model_message(body, model_id)
|
|
706
|
+
content_blocks = body['content'] || []
|
|
707
|
+
|
|
708
|
+
text_parts = content_blocks.filter_map { |b| b['text'] if b['type'] == 'text' }.join
|
|
709
|
+
thinking_text = content_blocks.filter_map { |b| b['thinking'] if b['type'] == 'thinking' }.join
|
|
710
|
+
tool_calls_raw = content_blocks.select { |b| b['type'] == 'tool_use' }
|
|
711
|
+
|
|
712
|
+
tc = {}
|
|
713
|
+
tool_calls_raw.each do |tc_block|
|
|
714
|
+
tc[tc_block['id']] = Legion::Extensions::Llm::ToolCall.new(
|
|
715
|
+
id: tc_block['id'], name: tc_block['name'], arguments: tc_block['input'] || {}
|
|
716
|
+
)
|
|
717
|
+
end
|
|
718
|
+
|
|
719
|
+
usage = body['usage'] || {}
|
|
720
|
+
|
|
721
|
+
msg_attrs = {
|
|
722
|
+
role: :assistant,
|
|
723
|
+
content: text_parts,
|
|
724
|
+
model_id: model_id,
|
|
725
|
+
tool_calls: tc.empty? ? nil : tc,
|
|
726
|
+
input_tokens: usage['input_tokens'] || 0,
|
|
727
|
+
output_tokens: usage['output_tokens'] || 0,
|
|
728
|
+
cached_tokens: usage['cache_read_input_tokens'],
|
|
729
|
+
cache_creation_tokens: usage['cache_creation_input_tokens']
|
|
730
|
+
}
|
|
731
|
+
msg_attrs[:thinking] = thinking_text unless thinking_text.empty?
|
|
732
|
+
|
|
733
|
+
Legion::Extensions::Llm::Message.new(**msg_attrs)
|
|
734
|
+
end
|
|
735
|
+
|
|
736
|
+
def handle_invoke_model_stream_json(event_json, state, model_id)
|
|
737
|
+
# event_json is a Hash like { "type": "message_start", "message": { ... } }
|
|
738
|
+
case event_json['type']
|
|
739
|
+
when 'message_start'
|
|
740
|
+
msg = event_json['message'] || {}
|
|
741
|
+
state[:final_usage] = msg['usage'] || {}
|
|
742
|
+
when 'content_block_start'
|
|
743
|
+
block = event_json['content_block'] || {}
|
|
744
|
+
block_type = block['type'].to_s
|
|
745
|
+
state[:in_thinking] = (block_type == 'thinking')
|
|
746
|
+
if block_type == 'tool_use'
|
|
747
|
+
state[:current_tool_use] = {
|
|
748
|
+
tool_use_id: block['id'],
|
|
749
|
+
name: block['name'],
|
|
750
|
+
input_json: +''
|
|
751
|
+
}
|
|
752
|
+
elsif block_type != 'thinking'
|
|
753
|
+
state[:in_thinking] = false
|
|
754
|
+
end
|
|
755
|
+
when 'content_block_delta'
|
|
756
|
+
delta = event_json['delta'] || {}
|
|
757
|
+
delta_type = delta['type'].to_s
|
|
758
|
+
case delta_type
|
|
759
|
+
when 'thinking_delta'
|
|
760
|
+
text = delta['thinking'] || ''
|
|
761
|
+
state[:thinking] << text
|
|
762
|
+
if block_given? && !text.empty?
|
|
763
|
+
yield Legion::Extensions::Llm::Chunk.new(
|
|
764
|
+
role: :assistant,
|
|
765
|
+
content: '',
|
|
766
|
+
thinking: { content: text, enabled: true },
|
|
767
|
+
model_id: model_id
|
|
768
|
+
)
|
|
769
|
+
end
|
|
770
|
+
when 'text_delta'
|
|
771
|
+
text = delta['text'] || ''
|
|
772
|
+
state[:accumulated] << text
|
|
773
|
+
if block_given?
|
|
774
|
+
yield Legion::Extensions::Llm::Chunk.new(role: :assistant, content: text,
|
|
775
|
+
model_id: model_id)
|
|
776
|
+
end
|
|
777
|
+
when 'input_json_delta'
|
|
778
|
+
partial = delta['partial_json'] || ''
|
|
779
|
+
state[:current_tool_use][:input_json] << partial
|
|
780
|
+
if block_given? && !partial.empty?
|
|
781
|
+
yield Legion::Extensions::Llm::Chunk.new(
|
|
782
|
+
role: :assistant,
|
|
783
|
+
content: '',
|
|
784
|
+
tool_calls: {
|
|
785
|
+
state[:current_tool_use][:tool_use_id].to_sym =>
|
|
786
|
+
Legion::Extensions::Llm::ToolCall.new(
|
|
787
|
+
id: state[:current_tool_use][:tool_use_id],
|
|
788
|
+
name: state[:current_tool_use][:name],
|
|
789
|
+
arguments: partial
|
|
790
|
+
)
|
|
791
|
+
},
|
|
792
|
+
model_id: model_id
|
|
793
|
+
)
|
|
794
|
+
end
|
|
795
|
+
end
|
|
796
|
+
when 'content_block_stop'
|
|
797
|
+
if state[:current_tool_use]
|
|
798
|
+
state[:tool_use_blocks] << state[:current_tool_use]
|
|
799
|
+
state[:current_tool_use] = nil
|
|
800
|
+
end
|
|
801
|
+
when 'message_delta'
|
|
802
|
+
delta = event_json['delta'] || {}
|
|
803
|
+
state[:stop_reason] = delta['stop_reason']
|
|
804
|
+
end
|
|
805
|
+
rescue StandardError => e
|
|
806
|
+
log.warn { "bedrock.provider.invoke_model_stream_json: error=#{e.message}" }
|
|
807
|
+
end
|
|
808
|
+
|
|
299
809
|
def static_offerings(**filters)
|
|
300
810
|
STATIC_MODELS.filter_map do |entry|
|
|
301
811
|
provider_filter = normalize_provider(filters[:by_provider])
|
|
@@ -363,17 +873,35 @@ module Legion
|
|
|
363
873
|
ctx ? { context_window: ctx } : nil
|
|
364
874
|
end
|
|
365
875
|
|
|
366
|
-
def converse_request(messages, model:, temperature:, max_tokens:, tools:, tool_prefs:, guardrail_config: nil
|
|
876
|
+
def converse_request(messages, model:, temperature:, max_tokens:, tools:, tool_prefs:, guardrail_config: nil,
|
|
877
|
+
thinking: nil)
|
|
367
878
|
{
|
|
368
879
|
model_id: self.class.inference_profile_id(model_id(model), region: region),
|
|
369
880
|
messages: format_messages(messages.reject { |message| message.role == :system }),
|
|
370
881
|
system: format_system(messages),
|
|
371
882
|
inference_config: { temperature: temperature, max_tokens: max_tokens || model_max_tokens(model) }.compact,
|
|
372
883
|
tool_config: format_tool_config(tools, tool_prefs),
|
|
373
|
-
guardrail_config: guardrail_config
|
|
884
|
+
guardrail_config: guardrail_config,
|
|
885
|
+
additional_model_request_fields: bedrock_additional_fields(thinking)
|
|
374
886
|
}.compact
|
|
375
887
|
end
|
|
376
888
|
|
|
889
|
+
def bedrock_additional_fields(thinking)
|
|
890
|
+
fields = {}
|
|
891
|
+
if thinking
|
|
892
|
+
fields[:thinking] = {
|
|
893
|
+
type: 'enabled',
|
|
894
|
+
budget_tokens: if thinking.is_a?(Hash)
|
|
895
|
+
thinking[:budget_tokens] || thinking['budget_tokens'] ||
|
|
896
|
+
thinking[:budget] || thinking['budget'] || 1024
|
|
897
|
+
else
|
|
898
|
+
1024
|
|
899
|
+
end
|
|
900
|
+
}
|
|
901
|
+
end
|
|
902
|
+
fields.empty? ? nil : fields
|
|
903
|
+
end
|
|
904
|
+
|
|
377
905
|
def format_messages(messages)
|
|
378
906
|
total = messages.size
|
|
379
907
|
messages.filter_map.with_index do |message, idx|
|
|
@@ -389,9 +917,10 @@ module Legion
|
|
|
389
917
|
return [] unless message.tool_result?
|
|
390
918
|
|
|
391
919
|
[{
|
|
392
|
-
|
|
393
|
-
|
|
394
|
-
|
|
920
|
+
tool_result: {
|
|
921
|
+
tool_use_id: message.tool_call_id,
|
|
922
|
+
content: [{ text: message.tool_results.to_s }]
|
|
923
|
+
}
|
|
395
924
|
}]
|
|
396
925
|
end
|
|
397
926
|
|
|
@@ -439,7 +968,7 @@ module Legion
|
|
|
439
968
|
text = content_text(message.content)
|
|
440
969
|
blocks << { text: text } if text && !text.strip.empty?
|
|
441
970
|
|
|
442
|
-
message.tool_calls.
|
|
971
|
+
message.tool_calls.each_value do |call|
|
|
443
972
|
blocks << {
|
|
444
973
|
tool_use: {
|
|
445
974
|
tool_use_id: call.id,
|
|
@@ -562,27 +1091,133 @@ module Legion
|
|
|
562
1091
|
def parse_converse_response(response, fallback_model)
|
|
563
1092
|
output = value(response, :output)
|
|
564
1093
|
message = value(output, :message)
|
|
1094
|
+
content_blocks = value(message, :content)
|
|
565
1095
|
usage = value(response, :usage) || {}
|
|
1096
|
+
additional_fields = value(response, :additional_model_response_fields)
|
|
566
1097
|
|
|
567
|
-
|
|
1098
|
+
msg_attrs = {
|
|
568
1099
|
role: :assistant,
|
|
569
|
-
content: text_from(
|
|
1100
|
+
content: text_from(content_blocks),
|
|
570
1101
|
model_id: fallback_model,
|
|
571
|
-
tool_calls: parse_tool_calls(
|
|
1102
|
+
tool_calls: parse_tool_calls(content_blocks),
|
|
572
1103
|
input_tokens: value(usage, :input_tokens),
|
|
573
1104
|
output_tokens: value(usage, :output_tokens),
|
|
574
1105
|
cached_tokens: cache_read_tokens(usage),
|
|
575
1106
|
cache_creation_tokens: cache_write_tokens(usage),
|
|
576
1107
|
raw: normalize_response(response)
|
|
577
|
-
|
|
1108
|
+
}
|
|
1109
|
+
|
|
1110
|
+
# Bedrock Converse returns thinking in two possible locations:
|
|
1111
|
+
# 1. Content blocks: { reasoning: { text: "..." } }
|
|
1112
|
+
# 2. Additional model response fields: { thinking: { reasoningContent: { chunk: { text } } } }
|
|
1113
|
+
thinking_text = extract_thinking_from_content(content_blocks) ||
|
|
1114
|
+
(additional_fields ? extract_thinking_from_fields(additional_fields) : nil)
|
|
1115
|
+
msg_attrs[:thinking] = thinking_text if thinking_text
|
|
1116
|
+
|
|
1117
|
+
Legion::Extensions::Llm::Message.new(**msg_attrs)
|
|
1118
|
+
end
|
|
1119
|
+
|
|
1120
|
+
def extract_thinking_from_content(content_blocks)
|
|
1121
|
+
return nil unless content_blocks
|
|
1122
|
+
|
|
1123
|
+
Array(content_blocks).each do |block|
|
|
1124
|
+
reasoning = value(block, :reasoning)
|
|
1125
|
+
# reasoning can be a Hash or an AWS SDK struct (Aws::BedrockRuntime::Types::ReasoningContent)
|
|
1126
|
+
next if reasoning.nil?
|
|
1127
|
+
|
|
1128
|
+
text = if reasoning.is_a?(Hash)
|
|
1129
|
+
reasoning[:text] || reasoning['text']
|
|
1130
|
+
else
|
|
1131
|
+
# AWS SDK struct — use value() to safely extract the :text field
|
|
1132
|
+
value(reasoning, :text)
|
|
1133
|
+
end
|
|
1134
|
+
return text.to_s unless text.to_s.empty?
|
|
1135
|
+
end
|
|
1136
|
+
nil
|
|
1137
|
+
end
|
|
1138
|
+
|
|
1139
|
+
def extract_thinking_from_fields(additional_fields)
|
|
1140
|
+
thinking = additional_fields[:thinking] || additional_fields['thinking']
|
|
1141
|
+
return nil unless thinking.is_a?(Hash)
|
|
1142
|
+
|
|
1143
|
+
# Bedrock Converse API returns thinking in multiple shapes depending on model:
|
|
1144
|
+
# - Claude direct: { text: "..." }
|
|
1145
|
+
# - Claude via Converse: { reasoningContent: { chunk: { text: "..." } } }
|
|
1146
|
+
# - Some models: { reasoning_text: "..." } or { reasoning: "..." }
|
|
1147
|
+
content = thinking[:text] || thinking['text'] ||
|
|
1148
|
+
thinking[:reasoning_text] || thinking['reasoningText'] ||
|
|
1149
|
+
thinking[:reasoning] || thinking['reasoning'] ||
|
|
1150
|
+
reasoning_content_text(thinking)
|
|
1151
|
+
content.to_s unless content.to_s.empty?
|
|
1152
|
+
end
|
|
1153
|
+
|
|
1154
|
+
def reasoning_content_text(thinking)
|
|
1155
|
+
rc = thinking[:reasoningContent] || thinking['reasoningContent']
|
|
1156
|
+
return nil unless rc.is_a?(Hash)
|
|
1157
|
+
|
|
1158
|
+
# Handle the nested chunk structure from Bedrock Converse
|
|
1159
|
+
chunk = rc[:chunk] || rc['chunk']
|
|
1160
|
+
if chunk.is_a?(Hash)
|
|
1161
|
+
chunk[:text] || chunk['text']
|
|
1162
|
+
else
|
|
1163
|
+
rc[:text] || rc['text']
|
|
1164
|
+
end
|
|
578
1165
|
end
|
|
579
1166
|
|
|
580
1167
|
def stream_converse(request, fallback_model)
|
|
581
1168
|
state = { accumulated: +'', thinking: +'', final_usage: nil, stop_reason: nil,
|
|
582
|
-
tool_use_blocks: [], current_tool_use: nil, in_thinking: false
|
|
1169
|
+
tool_use_blocks: [], current_tool_use: nil, in_thinking: false,
|
|
1170
|
+
raw_events: [] }
|
|
1171
|
+
|
|
1172
|
+
log.debug do
|
|
1173
|
+
"bedrock.provider.stream_converse: starting model=#{fallback_model} tools=#{state[:tool_use_blocks].size}"
|
|
1174
|
+
end
|
|
1175
|
+
|
|
1176
|
+
dump_path = ENV.fetch('BEDROCK_DEBUG_OUTPUT', nil)
|
|
583
1177
|
|
|
584
1178
|
runtime_client.converse_stream(**request) do |stream|
|
|
585
1179
|
wire_stream_handlers(stream, state, fallback_model) { |chunk| yield chunk if block_given? }
|
|
1180
|
+
|
|
1181
|
+
# Capture all raw events for debugging
|
|
1182
|
+
if dump_path
|
|
1183
|
+
stream.on_content_block_start_event do |evt|
|
|
1184
|
+
state[:raw_events] << { event: 'content_block_start', data: safe_event_data(evt) }
|
|
1185
|
+
end
|
|
1186
|
+
stream.on_content_block_delta_event do |evt|
|
|
1187
|
+
state[:raw_events] << { event: 'content_block_delta', data: safe_event_data(evt) }
|
|
1188
|
+
end
|
|
1189
|
+
stream.on_content_block_stop_event do |evt|
|
|
1190
|
+
state[:raw_events] << { event: 'content_block_stop', data: safe_event_data(evt) }
|
|
1191
|
+
end
|
|
1192
|
+
stream.on_message_start_event do |evt|
|
|
1193
|
+
state[:raw_events] << { event: 'message_start', data: safe_event_data(evt) }
|
|
1194
|
+
end
|
|
1195
|
+
stream.on_message_stop_event do |evt|
|
|
1196
|
+
state[:raw_events] << { event: 'message_stop', data: safe_event_data(evt) }
|
|
1197
|
+
end
|
|
1198
|
+
stream.on_metadata_event do |evt|
|
|
1199
|
+
state[:raw_events] << { event: 'metadata', data: safe_event_data(evt) }
|
|
1200
|
+
end
|
|
1201
|
+
end
|
|
1202
|
+
end
|
|
1203
|
+
|
|
1204
|
+
# Dump raw streaming events for debugging
|
|
1205
|
+
if dump_path && state[:raw_events].any?
|
|
1206
|
+
begin
|
|
1207
|
+
dump_file = File.join(dump_path, "bedrock_stream_#{Time.now.strftime('%Y%m%d_%H%M%S')}.json")
|
|
1208
|
+
File.write(dump_file, Legion::JSON.pretty_generate(state[:raw_events]))
|
|
1209
|
+
log.debug do
|
|
1210
|
+
"bedrock.provider.stream_converse: #{state[:raw_events].size} raw events dumped to #{dump_file}"
|
|
1211
|
+
end
|
|
1212
|
+
rescue StandardError => e
|
|
1213
|
+
log.warn { "bedrock.provider.stream_converse: failed to dump raw events: #{e.message}" }
|
|
1214
|
+
end
|
|
1215
|
+
end
|
|
1216
|
+
|
|
1217
|
+
log.debug do
|
|
1218
|
+
"bedrock.provider.stream_converse: completed model=#{fallback_model} " \
|
|
1219
|
+
"accumulated_length=#{state[:accumulated].length} thinking_length=#{state[:thinking].length} " \
|
|
1220
|
+
"tool_use_blocks=#{state[:tool_use_blocks].size} stop_reason=#{state[:stop_reason]}"
|
|
586
1221
|
end
|
|
587
1222
|
|
|
588
1223
|
msg_attrs = {
|
|
@@ -614,7 +1249,9 @@ module Legion
|
|
|
614
1249
|
stream.on_content_block_start_event do |event|
|
|
615
1250
|
start = value(event, :start)
|
|
616
1251
|
|
|
617
|
-
|
|
1252
|
+
# Bedrock Converse uses 'reasoning' blocks for thinking content,
|
|
1253
|
+
# and 'thinking' blocks for legacy/direct invoke_model responses
|
|
1254
|
+
if value(start, :thinking) || value(start, :reasoning)
|
|
618
1255
|
state[:in_thinking] = true
|
|
619
1256
|
next
|
|
620
1257
|
end
|
|
@@ -634,7 +1271,11 @@ module Legion
|
|
|
634
1271
|
def wire_block_delta(stream, state, fallback_model)
|
|
635
1272
|
stream.on_content_block_delta_event do |event|
|
|
636
1273
|
delta = value(event, :delta)
|
|
637
|
-
text
|
|
1274
|
+
# Bedrock streaming: text blocks use delta.text,
|
|
1275
|
+
# reasoning/thinking blocks use delta.reasoning.text or delta.thinking.text
|
|
1276
|
+
text = value(delta, :text) ||
|
|
1277
|
+
(value(delta, :reasoning) ? value(reasoning_delta, :text) : nil) ||
|
|
1278
|
+
(value(delta, :thinking) ? value(thinking_delta, :text) : nil)
|
|
638
1279
|
if text
|
|
639
1280
|
if state[:in_thinking]
|
|
640
1281
|
state[:thinking] << text
|
|
@@ -857,6 +1498,12 @@ module Legion
|
|
|
857
1498
|
body.is_a?(String) ? Legion::JSON.parse(body, symbolize_names: false) : body.to_h
|
|
858
1499
|
end
|
|
859
1500
|
|
|
1501
|
+
# Safely extract event data for debugging — AWS SDK structs
|
|
1502
|
+
# may or may not respond to #to_h
|
|
1503
|
+
def safe_event_data(evt)
|
|
1504
|
+
evt.respond_to?(:to_h) ? evt.to_h : evt.inspect[0, 500]
|
|
1505
|
+
end
|
|
1506
|
+
|
|
860
1507
|
def normalize_response(response)
|
|
861
1508
|
response.respond_to?(:to_h) ? response.to_h : {}
|
|
862
1509
|
end
|
|
@@ -865,8 +1512,13 @@ module Legion
|
|
|
865
1512
|
return nil if object.nil?
|
|
866
1513
|
|
|
867
1514
|
string_key = key.to_s
|
|
868
|
-
|
|
869
|
-
|
|
1515
|
+
|
|
1516
|
+
val = safe_struct_access(object, key)
|
|
1517
|
+
return val unless val.nil?
|
|
1518
|
+
|
|
1519
|
+
val = safe_struct_access(object, string_key)
|
|
1520
|
+
return val unless val.nil?
|
|
1521
|
+
|
|
870
1522
|
return object.public_send(key) if object.respond_to?(key)
|
|
871
1523
|
|
|
872
1524
|
if object.respond_to?(:to_h)
|
|
@@ -877,6 +1529,26 @@ module Legion
|
|
|
877
1529
|
|
|
878
1530
|
nil
|
|
879
1531
|
end
|
|
1532
|
+
|
|
1533
|
+
# Sanitize potentially binary/non-UTF-8 strings for safe logging
|
|
1534
|
+
def sanitize_log(str)
|
|
1535
|
+
return str unless str.is_a?(String)
|
|
1536
|
+
|
|
1537
|
+
str.force_encoding('UTF-8').scrub('?')
|
|
1538
|
+
rescue StandardError
|
|
1539
|
+
str.inspect
|
|
1540
|
+
end
|
|
1541
|
+
|
|
1542
|
+
def safe_struct_access(object, key)
|
|
1543
|
+
return nil unless object.respond_to?(:key?) && object.key?(key)
|
|
1544
|
+
|
|
1545
|
+
object[key]
|
|
1546
|
+
rescue NameError
|
|
1547
|
+
# AWS SDK structs (Aws::Structure) define members in their schema
|
|
1548
|
+
# but may not populate them in every response. A missing value
|
|
1549
|
+
# raises NameError instead of returning nil.
|
|
1550
|
+
nil
|
|
1551
|
+
end
|
|
880
1552
|
end
|
|
881
1553
|
end
|
|
882
1554
|
end
|