lex-llm-bedrock 0.3.12 → 0.3.18

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 939ac50e6240dcde55f7bc9b24fb0a7f1f6fd527ec46cea27d8255f8a401fcd6
4
- data.tar.gz: 899da50daa4595d92b0bb23a14bd4c873c03d41bc72235e8337b17f61fe69d99
3
+ metadata.gz: e0a7bdd5a1097bfe7caf898a8b47649c5b29d7ae2d73c980e47bec743f343b2c
4
+ data.tar.gz: bab966bb6aa10487d43f1f4d01ae531b701ef74c49884ad820c127ee3d7efc91
5
5
  SHA512:
6
- metadata.gz: b2e824ee11517dbbfaf7710bd25a8329ee9007ccb35226776a57131d4ba859fc90fc2b8f0dc3e31ff349b2cc15e3a82778e491fca0f75b2b76d6a4783a8e7e67
7
- data.tar.gz: c06248c9b3c047db80193c7a3c9666b0b251de5bb6a62199f135902f91295cfcfc069e2994cdc93168b66a1c11a7ad755875bc1c9ba8fd60ae0850a7277dae30
6
+ metadata.gz: 34d5f3629994cda2216de0826249516659e39e3e908152a66d62e46d4a9ce6b01f983e094c1c30f9c233227e810231d67dadfd8bed1fa86b70f448ea59b7c1b4
7
+ data.tar.gz: 6cc2164e232cada49623ec316ea2a767227196ed7526ac1956c7c0ebf9110e52d1aaa69823c51bb06f33f4779772e4dce1037ed0e9aaf17601db4c9e2e0ec089
data/CHANGELOG.md CHANGED
@@ -1,5 +1,41 @@
1
1
  # Changelog
2
2
 
3
+ ## 0.3.18 - 2026-06-05
4
+
5
+ ### Fixed
6
+ - **Spec and RuboCop compliance** — Verified all 54 specs pass cleanly. RuboCop auto-correct applied; 0 offenses remaining.
7
+
8
+ ## 0.3.17 - 2026-06-05
9
+
10
+ ### Fixed
11
+ - **Unused method arguments** — Prefixed unused keyword parameters (`params`, `model`, `streaming`) in `invoke_model_chat`, `invoke_model_stream`, and `build_invoke_model_body` with underscore prefix to satisfy RuboCop `Lint/UnusedMethodArgument` (provider.rb)
12
+ - **Keyword parameter ordering** — Moved optional keyword parameters to the end of `build_invoke_model_body` signature per `Style/KeywordParametersOrder` (provider.rb)
13
+
14
+ ## 0.3.16 - 2026-06-04
15
+
16
+ ### Fixed
17
+ - **Thinking config silently ignored by Converse API for Claude Sonnet 4+** — Bedrock Converse API does not support extended thinking for Claude Sonnet 4 and newer. When thinking is enabled for an Anthropic model, the provider now routes through `invoke_model` with the native Anthropic Messages API payload (the same format Phase 1 direct tests use), which correctly generates and returns thinking blocks (provider.rb)
18
+ - **Thinking extraction failed on AWS SDK structs** — `extract_thinking_from_content` assumed content blocks were Hashes. Bedrock Converse returns `Aws::BedrockRuntime::Types` structs that don't respond to `[]` the same way. Now uses `value()` helper for safe struct access on reasoning content blocks (provider.rb)
19
+ - **Streaming reasoning/thinking blocks not detected** — `wire_block_start` only checked `:thinking` blocks but Bedrock Converse uses `:reasoning` blocks for thinking content. Added `:reasoning` check. `wire_block_delta` now extracts from `delta.reasoning.text` and `delta.thinking.text` in addition to `delta.text` (provider.rb)
20
+
21
+ ### Added
22
+ - **Debug logging for Bedrock converse calls** — Logs thinking config sent, elapsed time, usage, additional_fields keys, and content block types on response. Logs stream completion with accumulated length, tool use block count, and stop reason (provider.rb)
23
+
24
+ ## 0.3.15 - 2026-06-04
25
+
26
+ ### Fixed
27
+ - **Thinking config ignored in chat/stream/complete** — The `chat`, `stream`, and `complete` methods accepted `thinking:` kwarg but never passed it to Bedrock's converse API. Now passes thinking through `additional_model_request_fields[:thinking]` with AWS-format `{ type: "enabled", budget_tokens: N }`, accepting both `:budget_tokens` and `:budget` keys for compatibility with Anthropic API format (provider.rb)
28
+
29
+ ## 0.3.14 - 2026-06-04
30
+
31
+ ### Fixed
32
+ - **`NameError` on unpopulated AWS SDK struct fields** — `Aws::Structure` objects declare all members in their schema (including `cache_creation_input_tokens`), so `key?` returns `true`, but accessing a missing member raises `NameError` instead of returning `nil`. Added `safe_struct_access` helper that wraps `object[key]` in `rescue NameError → nil`, so unpopulated struct fields gracefully return `nil` instead of crashing the request (provider.rb)
33
+
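The 0.3.14 failure mode can be illustrated with a plain Ruby `Struct`, whose `[]` also raises `NameError` for a name it cannot resolve. This is a minimal sketch: the `Usage` struct is a stand-in for the AWS SDK type, and the helper body is an assumption based on the changelog description rather than the gem's actual code.

```ruby
# Sketch only: a plain Struct stands in for an AWS SDK response struct.
Usage = Struct.new(:input_tokens, :output_tokens)

# Returns object[key], or nil when the lookup raises NameError.
def safe_struct_access(object, key)
  object[key]
rescue NameError
  nil
end

usage = Usage.new(12, 34)
safe_struct_access(usage, :input_tokens)                # => 12
safe_struct_access(usage, :cache_creation_input_tokens) # => nil (NameError rescued)
```

Rescuing only `NameError` keeps unrelated failures visible instead of silently turning them into `nil`.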
34
+ ## 0.3.13 - 2026-06-02
35
+
36
+ ### Fixed
37
+ - **Tool call iteration crash on Bedrock escalation** — `assistant_tool_use_blocks` iterated `message.tool_calls` (a `Hash`) with `each`, which yields `[key, value]` pairs rather than `ToolCall` objects. Calling `.id` on the Array raised `NoMethodError` on every Bedrock call with tool-call history, tripping the circuit breaker and exhausting the escalation chain. Fixed by using `each_value` (provider.rb)
38
+
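The difference between `each` and `each_value` on a `Hash` described in the 0.3.13 entry can be reproduced in isolation. In this sketch the `ToolCall` struct is a hypothetical stand-in for the real class.

```ruby
# Hypothetical stand-in for the real ToolCall class.
ToolCall = Struct.new(:id, :name)

tool_calls = { 'call_1' => ToolCall.new('call_1', 'lookup') }

# Hash#each yields [key, value] pairs, so a single block parameter receives an Array.
first_entry = nil
tool_calls.each { |entry| first_entry = entry }
first_entry.class            # => Array
first_entry.respond_to?(:id) # => false, so calling .id raises NoMethodError

# Hash#each_value yields only the values, which are the ToolCall objects.
ids = []
tool_calls.each_value { |call| ids << call.id }
ids # => ["call_1"]
```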
3
39
  ## 0.3.12 - 2026-06-02
4
40
 
5
41
  ### Fixed
data/README.md CHANGED
@@ -8,18 +8,44 @@ This gem adds a hosted Bedrock provider surface for Legion LLM routing. It uses
8
8
 
9
9
  ```
10
10
  Legion::Extensions::Llm::Bedrock
11
- ├── Provider # Bedrock implementation of the lex-llm Provider contract
12
- │ ├── Capabilities # Capability predicates inferred from model IDs
13
- │ ├── chat / stream # Converse / ConverseStream API calls
14
- │ ├── embed # Titan InvokeModel embedding
15
- │ ├── count_tokens # CountTokens API call
16
- │ ├── discover_offerings # Static catalog + live ListFoundationModels
17
- │ ├── health / readiness # Provider health checks with live AWS verification
18
- └── list_models # Live model enumeration
19
- ├── Actor::FleetWorker # Provider-owned fleet subscription gate
20
- └── Runners::FleetWorker # Delegates fleet requests to lex-llm ProviderResponder
11
+ ├── Provider # Bedrock implementation of the lex-llm Provider contract
12
+ │ ├── Capabilities # Capability predicates inferred from model IDs
13
+ │ ├── chat / stream # Converse / ConverseStream API calls
14
+ │ ├── embed # Titan InvokeModel embedding
15
+ │ ├── count_tokens # CountTokens API call
16
+ │ ├── discover_offerings # Static catalog + live ListFoundationModels
17
+ │ ├── health / readiness # Provider health checks with live AWS verification
18
+ ├── list_models # Live model enumeration
19
+ ├── invoke_model_chat # Native Anthropic payload for thinking-enabled models
20
+ └── invoke_model_stream # Native Anthropic streaming for thinking-enabled models
21
+ ├── Actor::FleetWorker # Provider-owned fleet subscription gate
22
+ ├── Actor::DiscoveryRefresh # Periodic model catalog refresh (conditional on actor runtime)
23
+ └── Runners::FleetWorker # Delegates fleet requests to lex-llm ProviderResponder
21
24
  ```
22
25
 
26
+ ### Provider Dispatch
27
+
28
+ The `Provider` class decides at call time which API path to use:
29
+
30
+ | Condition | Path | Why |
31
+ |-----------|------|-----|
32
+ | Anthropic model + `thinking` or `tools` | `invoke_model` (native Anthropic payload) | Bedrock Converse silently drops thinking config and tool_use blocks for Claude Sonnet 4+ |
33
+ | All other cases | `Converse` / `ConverseStream` | Standard Bedrock managed inference API |
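The dispatch rule in the table can be sketched as a small predicate. The prefix list mirrors the `anthropic_model?` check in `provider.rb`; the `dispatch_path` name and the returned symbols are illustrative assumptions, not part of the gem.

```ruby
ANTHROPIC_PREFIXES = %w[anthropic. us.anthropic. eu.anthropic. ap.anthropic.].freeze

# True when the model ID (bare or region-prefixed) names an Anthropic model.
def anthropic_model?(model_id)
  return false unless model_id

  model_id.to_s.start_with?(*ANTHROPIC_PREFIXES)
end

# Hypothetical helper: returns :invoke_model or :converse for a given request.
def dispatch_path(model_id:, thinking: nil, tools: {})
  wants_native = thinking || (tools && !tools.empty?)
  anthropic_model?(model_id) && wants_native ? :invoke_model : :converse
end

dispatch_path(model_id: 'anthropic.claude-sonnet-4-20250514-v1:0',
              thinking: { budget_tokens: 1024 }) # => :invoke_model
dispatch_path(model_id: 'amazon.titan-text-express-v1',
              thinking: { budget_tokens: 1024 }) # => :converse
```

An Anthropic model with neither `thinking` nor tools still goes through Converse under this rule.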
34
+
35
+ ### Instance Discovery
36
+
37
+ `Legion::Extensions::Llm::Bedrock.discover_instances` scans five credential sources in priority order, deduplicates by fingerprint, and returns a hash of `{ instance_name => config_hash }` pairs:
38
+
39
+ | Source | Key | How it works |
40
+ |--------|-----|--------------|
41
+ | ENV bearer | `:env_bearer` | Reads `AWS_BEARER_TOKEN_BEDROCK` from environment |
42
+ | Claude config bearer | `:claude` | Reads `AWS_BEARER_TOKEN_BEDROCK` from Claude env/config, falls back to pattern match on any key containing `AWS`, `BEARER`, `TOKEN`, `BEDROCK` |
43
+ | ENV SigV4 | `:env_sigv4` | Reads `AWS_ACCESS_KEY_ID` + `AWS_SECRET_ACCESS_KEY` from environment |
44
+ | Extension settings | `:settings` + named instances | Reads from `extensions.llm.bedrock` settings, normalizes generic keys to `bedrock_*` prefix |
45
+ | Identity Broker | `:broker` | Reads `Legion::Identity::Broker.credentials_for(:aws)` when the module is defined |
46
+
47
+ Instances with unresolved credential references (`vault://` or `env://` URIs) are filtered out.
48
+
23
49
  ## Dependencies
24
50
 
25
51
  | Gem | Required | Purpose |
@@ -36,9 +62,10 @@ Legion::Extensions::Llm::Bedrock
36
62
 
37
63
  | Path | Purpose |
38
64
  |------|---------|
39
- | `lib/legion/extensions/llm/bedrock.rb` | Entry point: namespace, default settings, discovery, and shared provider registration metadata |
40
- | `lib/legion/extensions/llm/bedrock/provider.rb` | Full Bedrock provider implementation |
65
+ | `lib/legion/extensions/llm/bedrock.rb` | Entry point: namespace, default settings, instance discovery, credential sources, and shared provider registration metadata |
66
+ | `lib/legion/extensions/llm/bedrock/provider.rb` | Full Bedrock provider implementation (1500+ lines) — Converse, invoke_model, streaming, tool calls, thinking, embeddings, health, and discovery |
41
67
  | `lib/legion/extensions/llm/bedrock/actors/fleet_worker.rb` | Starts the provider-owned fleet subscriber when an instance opts in |
68
+ | `lib/legion/extensions/llm/bedrock/actors/discovery_refresh.rb` | Periodic model catalog refresh actor (loaded only when `Legion::Extensions::Actors::Every` is available) |
42
69
  | `lib/legion/extensions/llm/bedrock/runners/fleet_worker.rb` | Hands provider fleet requests to `Legion::Extensions::Llm::Fleet::ProviderResponder` |
43
70
  | `lib/legion/extensions/llm/bedrock/version.rb` | `VERSION` constant |
44
71
 
@@ -69,7 +96,7 @@ If explicit keys are not configured, the AWS SDK default credential provider cha
69
96
  Legion::Extensions::Llm::Bedrock.default_settings
70
97
  ```
71
98
 
72
- Configuration options: `bedrock_region`, `bedrock_endpoint`, `bedrock_access_key_id`, `bedrock_secret_access_key`, `bedrock_session_token`, `bedrock_profile`, `bedrock_stub_responses`.
99
+ Configuration options: `bedrock_region`, `bedrock_endpoint`, `bedrock_access_key_id`, `bedrock_secret_access_key`, `bedrock_session_token`, `bedrock_profile`, `bedrock_stub_responses`, `bearer_token`.
73
100
 
74
101
  ## Fleet Responder
75
102
 
@@ -121,7 +148,33 @@ Every offering uses:
121
148
 
122
149
  Known aliases are intentionally small and conservative. For example, `claude-3-haiku` resolves to `anthropic.claude-3-haiku-20240307-v1:0`, while the preserved Bedrock model ID remains the routing model.
123
150
 
124
- Static models: `claude-3-haiku`, `titan-text-express`, `titan-embed-text-v2`, `llama-3.2-11b-instruct`, `mistral-large-3`.
151
+ Static models: `claude-3-haiku`, `anthropic.claude-sonnet-4`, `titan-text-express`, `titan-embed-text-v2`, `llama-3.2-11b-instruct`, `mistral-large-3`.
152
+
153
+ ## Inference Profiles
154
+
155
+ Bare model IDs (e.g. `anthropic.claude-sonnet-4`) are automatically prefixed with the region-based inference profile prefix (`us.`, `eu.`, `ap.`) based on the configured region. Region mapping is defined in `REGION_PREFIX`:
156
+
157
+ | Region | Prefix |
158
+ |--------|--------|
159
+ | `us-east-1`, `us-east-2`, `us-west-1`, `us-west-2` | `us` |
160
+ | `eu-central-1`, `eu-west-*` | `eu` |
161
+ | `ap-south-1`, `ap-southeast-*`, `ap-northeast-1` | `ap` |
162
+
163
+ Models already prefixed (`us.`, `eu.`, `ap.`, `arn:`) are passed through unchanged.
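A minimal sketch of the prefixing rule follows. The real `REGION_PREFIX` constant is not shown in this diff, so the regex-keyed hash below is an assumed shape derived from the table, and leaving unmapped regions unprefixed is likewise an assumption.

```ruby
# Assumed shape: the gem's actual REGION_PREFIX is not shown in the source.
REGION_PREFIX = {
  /\Aus-(east|west)-[12]\z/ => 'us',
  /\Aeu-(central-1|west-\d+)\z/ => 'eu',
  /\Aap-(south-1|southeast-\d+|northeast-1)\z/ => 'ap'
}.freeze

# IDs that already carry a profile prefix or are full ARNs pass through unchanged.
PASSTHROUGH_PREFIXES = %w[us. eu. ap. arn:].freeze

def inference_profile_id(model_id, region:)
  id = model_id.to_s
  return id if id.start_with?(*PASSTHROUGH_PREFIXES)

  _, prefix = REGION_PREFIX.find { |pattern, _| pattern.match?(region.to_s) }
  prefix ? "#{prefix}.#{id}" : id
end

inference_profile_id('anthropic.claude-sonnet-4-20250514-v1:0', region: 'us-east-1')
# => "us.anthropic.claude-sonnet-4-20250514-v1:0"
```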
164
+
165
+ ## Context Windows
166
+
167
+ Static context window data is available for known models without making live API calls. Values are looked up by prefix match in `Provider::CONTEXT_WINDOWS`.
168
+
169
+ | Model prefix | Context |
170
+ |-------------|---------|
171
+ | `anthropic.claude-*` (all) | 200,000 |
172
+ | `meta.llama3*` | 128,000 |
173
+ | `mistral.mistral-*` | 128,000 |
174
+ | `amazon.nova-pro`, `amazon.nova-lite` | 300,000 |
175
+ | `amazon.nova-micro` | 128,000 |
176
+ | `amazon.titan-text-premier` | 32,000 |
177
+ | `amazon.titan-text-express` | 8,192 |
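A prefix-match lookup over the values in the table above might look like the sketch below. The hash layout, the `context_window_for` name, and the `amazon.` prefix on the `nova-lite` key are assumptions; only the numbers come from the README.

```ruby
# Values copied from the table; the layout and key spelling are assumptions.
CONTEXT_WINDOWS = {
  'anthropic.claude-' => 200_000,
  'meta.llama3' => 128_000,
  'mistral.mistral-' => 128_000,
  'amazon.nova-pro' => 300_000,
  'amazon.nova-lite' => 300_000,
  'amazon.nova-micro' => 128_000,
  'amazon.titan-text-premier' => 32_000,
  'amazon.titan-text-express' => 8_192
}.freeze

# Hypothetical helper: returns the first matching window, or nil for unknown models.
def context_window_for(model_id)
  _, size = CONTEXT_WINDOWS.find { |prefix, _| model_id.to_s.start_with?(prefix) }
  size
end

context_window_for('anthropic.claude-3-haiku-20240307-v1:0') # => 200000
context_window_for('amazon.titan-text-express-v1')           # => 8192
```

In this sketch a region-prefixed ID such as `us.anthropic.…` would not match; whether the gem strips that prefix first is not visible in this diff.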
125
178
 
126
179
  ## API Contract
127
180
 
@@ -132,17 +185,41 @@ The implementation is intentionally limited to Bedrock operations documented by
132
185
  - `ConverseStream` for streaming chat responses
133
186
  - `CountTokens` for token estimates
134
187
  - `InvokeModel` only for the Titan text embedding request shape implemented here
188
+ - `InvokeModel` (non-streaming) for Anthropic models with thinking/tool use enabled
189
+ - `InvokeModelWithResponseStream` for Anthropic models with thinking/tool use enabled
135
190
 
136
191
  Provider-specific request bodies are not guessed. Non-Titan embedding models raise until their documented body shape is added explicitly.
137
192
 
193
+ ## Tool Calls
194
+
195
+ Tool calls follow the Bedrock Converse `tool_config` shape. When tool call history is present in the message array, assistant messages emit proper `{ tool_use: { tool_use_id, name, input } }` content blocks. Tool results use `{ tool_result: { tool_use_id, content } }` blocks.
196
+
197
+ For Anthropic models with tools, the `invoke_model` path is used with native Anthropic tool formatting (`input_schema` wrapped in the tool definition).
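The two tool shapes differ mainly in where the JSON schema sits. The sketch below assumes a hypothetical tool definition hash with `:name`, `:description`, and `:schema` keys; the helper names are illustrative and not taken from the gem.

```ruby
# Hypothetical tool definition; these field names are illustrative only.
tool = {
  name: 'get_weather',
  description: 'Look up the current weather for a city',
  schema: { type: 'object', properties: { city: { type: 'string' } }, required: ['city'] }
}

# Converse shape: one tool_spec per tool, schema nested under input_schema.json.
def converse_tool(tool)
  { tool_spec: { name: tool[:name], description: tool[:description],
                 input_schema: { json: tool[:schema] } } }
end

# Native Anthropic Messages shape: input_schema sits directly on the tool.
def anthropic_tool(tool)
  { name: tool[:name], description: tool[:description], input_schema: tool[:schema] }
end

converse_request_fragment = { tool_config: { tools: [converse_tool(tool)] } }
invoke_body_fragment = { tools: [anthropic_tool(tool)] }
```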
198
+
199
+ ## Thinking (Extended Reasoning)
200
+
201
+ When `thinking:` is passed to `chat`, `stream`, or `complete` for an Anthropic model:
202
+
203
+ 1. The provider detects the Anthropic model prefix and routes through `invoke_model` with the native Anthropic Messages API payload.
204
+ 2. Thinking config is serialized as `{ type: 'enabled', budget_tokens: N }`, accepting both `:budget_tokens` and `:budget` keys.
205
+ 3. Provider-specific keys (e.g. `:effort` from OpenAI) are stripped before sending.
206
+ 4. Thinking content is parsed from content blocks of type `'thinking'` on the `invoke_model` path, and from `delta.reasoning.text` on the `ConverseStream` path.
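Steps 2 and 3 can be sketched as a single normalization function. The `normalize_thinking` name is hypothetical, and returning `nil` when no budget is given is an assumption; only the output shape and the accepted keys come from the README.

```ruby
# Hypothetical helper: builds the AWS-format thinking hash from either key style.
def normalize_thinking(thinking)
  return nil unless thinking.is_a?(Hash)

  budget = thinking[:budget_tokens] || thinking[:budget]
  return nil unless budget

  # Building a fresh hash drops provider-specific keys such as :effort.
  { type: 'enabled', budget_tokens: budget }
end

normalize_thinking({ budget: 2048 })
# => { type: "enabled", budget_tokens: 2048 }
normalize_thinking({ budget_tokens: 1024, effort: 'high' })
# => { type: "enabled", budget_tokens: 1024 }
```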
207
+
208
+ ## Security
209
+
210
+ - Static AWS credentials emit a deprecation warning. Set `security.block_static_aws_credentials: true` in settings to reject them entirely.
211
+ - Bearer token authentication is supported via `Aws::StaticTokenProvider`, which avoids the IMDS timeout on startup.
212
+
138
213
  ## Observability
139
214
 
140
215
  The Bedrock namespace and provider implementation include `Legion::Logging::Helper` for structured logging:
141
216
 
142
217
  - **Info-level**: provider connections, API calls (chat, stream, embed), model listing, health checks
143
- - **Debug-level**: offline health checks, readiness probes, and token counting
218
+ - **Debug-level**: offline health checks, readiness probes, token counting, thinking config, request/response metadata
144
219
  - **Rescue blocks**: handled provider failures call `handle_exception(e, level:, handled:, operation:)` with dot-separated operation names such as `bedrock.provider.health`
145
220
 
221
+ Set `BEDROCK_DEBUG_OUTPUT=/path/to/dir` to dump raw Bedrock responses and streaming events to JSON files for debugging.
222
+
146
223
  ## Development
147
224
 
148
225
  ```bash
@@ -152,12 +229,23 @@ bundle exec rubocop -A # auto-fix
152
229
  bundle exec rubocop # lint check (0 offenses expected)
153
230
  ```
154
231
 
232
+ ### Test Structure
233
+
234
+ | Spec file | Coverage |
235
+ |-----------|----------|
236
+ | `bedrock_spec.rb` | Provider surface: offerings, chat, stream, tools, embed, count_tokens, health, readiness, model listing, caching |
237
+ | `discover_instances_spec.rb` | Credential discovery from ENV, Claude config, settings, Identity Broker, and deduplication |
238
+ | `provider_contract_spec.rb` | Verifies all canonical methods use keyword-only arguments (no positional params) |
239
+ | `actors/fleet_worker_spec.rb` | Fleet worker actor: runner class, function, use_runner?, enabled? |
240
+ | `runners/fleet_worker_spec.rb` | Fleet worker runner: delegation to shared ProviderResponder |
241
+
155
242
  ## AWS References
156
243
 
157
244
  - [Converse](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_Converse.html)
158
245
  - [ConverseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ConverseStream.html)
159
246
  - [CountTokens](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_CountTokens.html)
160
247
  - [ListFoundationModels](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_ListFoundationModels.html)
248
+ - [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html)
161
249
  - [Foundation model information](https://docs.aws.amazon.com/bedrock/latest/userguide/foundation-models-reference.html)
162
250
 
163
251
  ## License
@@ -1,5 +1,6 @@
1
1
  # frozen_string_literal: true
2
2
 
3
+ require 'base64'
3
4
  require 'aws-sdk-bedrock'
4
5
  require 'aws-sdk-bedrockruntime'
5
6
  require 'legion/json'
@@ -16,6 +17,7 @@ module Legion
16
17
 
17
18
  STATIC_MODELS = [
18
19
  { model: 'anthropic.claude-3-haiku-20240307-v1:0', alias: 'claude-3-haiku' },
20
+ { model: 'anthropic.claude-sonnet-4-20250514-v1:0', alias: 'anthropic.claude-sonnet-4' },
19
21
  { model: 'amazon.titan-text-express-v1', alias: 'titan-text-express' },
20
22
  { model: 'amazon.titan-embed-text-v2:0', alias: 'titan-embed-text-v2', usage_type: :embedding },
21
23
  { model: 'meta.llama3-2-11b-instruct-v1:0', alias: 'llama-3.2-11b-instruct' },
@@ -210,32 +212,124 @@ module Legion
210
212
  tools: {},
211
213
  tool_prefs: nil,
212
214
  params: {},
215
+ thinking: nil,
213
216
  **_provider_options
214
217
  )
215
218
  log.info { "bedrock.provider.chat: model=#{model_id(model)} messages=#{messages.size}" }
219
+
220
+ # Bedrock Converse API silently drops thinking config and tool_use blocks
221
+ # for Claude Sonnet 4+. Use invoke_model with native Anthropic payload.
222
+ if anthropic_model?(model_id(model)) && (thinking || (tools && !tools.empty?))
223
+ return invoke_model_chat(messages:, model:, temperature:, max_tokens:, tools:, tool_prefs:,
224
+ thinking:, params:)
225
+ end
226
+
216
227
  request = Utils.deep_merge(
217
- converse_request(messages, model:, temperature:, max_tokens:, tools:, tool_prefs:),
228
+ converse_request(messages, model:, temperature:, max_tokens:, tools:, tool_prefs:, thinking:),
218
229
  params
219
230
  )
220
231
  log.debug do
221
232
  "bedrock.provider.chat: request prepared model=#{model_id(model)} tools=#{tools.size} " \
222
233
  "tool_choice=#{tool_choice_label(tool_prefs)} param_keys=#{params.keys.map(&:to_s).sort.join(',')}"
223
234
  end
224
- parse_converse_response(runtime_client.converse(**request), model_id(model))
235
+
236
+ # Log the thinking config being sent
237
+ thinking_config = request.dig(:additional_model_request_fields, :thinking)
238
+ log.debug { "bedrock.provider.chat: thinking_config=#{thinking_config.inspect}" } if thinking_config
239
+
240
+ start_time = Time.now
241
+ response = begin
242
+ runtime_client.converse(**request)
243
+ rescue StandardError => e
244
+ elapsed = ((Time.now - start_time) * 1000).round
245
+ log.error do
246
+ "bedrock.provider.chat: converse failed model=#{model_id(model)} " \
247
+ "error=#{e.class}: #{e.message} elapsed_ms=#{elapsed}"
248
+ end
249
+ raise
250
+ end
251
+ elapsed = ((Time.now - start_time) * 1000).round
252
+
253
+ # Dump raw Bedrock response for debugging
254
+ raw_debug = response.respond_to?(:to_h) ? response.to_h : response.inspect[0, 2000]
255
+ dump_path = ENV.fetch('BEDROCK_DEBUG_OUTPUT', nil)
256
+ if dump_path
257
+ begin
258
+ dump_file = File.join(dump_path, "bedrock_chat_#{Time.now.strftime('%Y%m%d_%H%M%S')}.json")
259
+ File.write(dump_file, Legion::JSON.pretty_generate(raw_debug))
260
+ log.debug { "bedrock.provider.chat: raw response dumped to #{dump_file}" }
261
+ rescue StandardError => e
262
+ log.warn { "bedrock.provider.chat: failed to dump raw response: #{e.message}" }
263
+ end
264
+ end
265
+
266
+ # Log response metadata
267
+ usage = value(response, :usage) || {}
268
+ additional_fields = value(response, :additional_model_response_fields)
269
+ output = value(response, :output)
270
+ content_blocks = output ? value(output, :message) : nil
271
+ # AWS SDK content blocks are structs, not hashes — use safe inspection
272
+ block_types = if content_blocks
273
+ Array(value(content_blocks, :content)).map do |b|
274
+ if b.respond_to?(:reasoning)
275
+ 'reasoning'
276
+ elsif b.respond_to?(:text)
277
+ 'text'
278
+ elsif b.respond_to?(:tool_use)
279
+ 'tool_use'
280
+ else
281
+ b.class.name
282
+ end
283
+ end.inspect
284
+ else
285
+ 'none'
286
+ end
287
+ af_keys = if additional_fields.respond_to?(:to_h)
288
+ additional_fields.to_h.keys.map(&:to_s).sort
289
+ else
290
+ additional_fields.respond_to?(:keys) ? additional_fields.keys.map(&:to_s).sort : []
291
+ end
292
+
293
+ log.debug do
294
+ "bedrock.provider.chat: response received model=#{model_id(model)} elapsed_ms=#{elapsed} " \
295
+ "usage=#{usage.inspect} additional_fields_keys=#{af_keys.inspect} " \
296
+ "content_block_types=#{block_types}"
297
+ end
298
+
299
+ parse_converse_response(response, model_id(model))
225
300
  end
226
301
 
227
302
  def stream(messages:, model:, temperature: nil, max_tokens: nil, tools: {}, tool_prefs: nil, params: {},
228
- **_provider_options, &)
229
- log.info { "bedrock.provider.stream: model=#{model_id(model)} messages=#{messages.size}" }
303
+ thinking: nil, **_provider_options, &)
304
+ log.info do
305
+ "bedrock.provider.stream: model=#{model_id(model)} messages=#{messages.size} tools=#{tools.size}"
306
+ end
307
+
308
+ # Bedrock Converse API silently drops thinking config and tool_use blocks
309
+ # for Claude Sonnet 4+. Use invoke_model with native Anthropic payload.
310
+ if anthropic_model?(model_id(model)) && (thinking || (tools && !tools.empty?))
311
+ return invoke_model_stream(messages:, model:, temperature:, max_tokens:, tools:, tool_prefs:,
312
+ thinking:, params:, &)
313
+ end
314
+
230
315
  request = Utils.deep_merge(
231
- converse_request(messages, model:, temperature:, max_tokens:, tools:, tool_prefs:),
316
+ converse_request(messages, model:, temperature:, max_tokens:, tools:, tool_prefs:, thinking:),
232
317
  params
233
318
  )
234
319
  log.debug do
235
320
  "bedrock.provider.stream: request prepared model=#{model_id(model)} tools=#{tools.size} " \
236
321
  "tool_choice=#{tool_choice_label(tool_prefs)} param_keys=#{params.keys.map(&:to_s).sort.join(',')}"
237
322
  end
238
- stream_converse(request, model_id(model), &)
323
+
324
+ # Log the thinking config being sent
325
+ thinking_config = request.dig(:additional_model_request_fields, :thinking)
326
+ log.debug { "bedrock.provider.stream: thinking_config=#{thinking_config.inspect}" } if thinking_config
327
+
328
+ start_time = Time.now
329
+ result = stream_converse(request, model_id(model), &)
330
+ elapsed = ((Time.now - start_time) * 1000).round
331
+ log.debug { "bedrock.provider.stream: completed model=#{model_id(model)} elapsed_ms=#{elapsed}" }
332
+ result
239
333
  end
240
334
 
241
335
  def count_tokens(
@@ -284,18 +378,434 @@ module Legion
284
378
  tool_prefs: nil, &)
285
379
  payload = params.dup
286
380
  payload[:additional_model_request_fields] ||= {}
287
- payload[:additional_model_request_fields][:thinking] = thinking if thinking
288
381
  payload[:additional_model_request_fields][:response_format] = schema if schema
289
382
 
290
383
  if block_given?
291
- stream(messages:, model:, temperature:, tools:, tool_prefs:, params: payload, &)
384
+ stream(messages:, model:, temperature:, tools:, tool_prefs:, params: payload, thinking:, &)
292
385
  else
293
- chat(messages:, model:, temperature:, tools:, tool_prefs:, params: payload)
386
+ chat(messages:, model:, temperature:, tools:, tool_prefs:, params: payload, thinking:)
294
387
  end
295
388
  end
296
389
 
297
390
  private
298
391
 
392
+ # Returns true if the model is an Anthropic model on Bedrock
393
+ def anthropic_model?(model_id)
394
+ return false unless model_id
395
+
396
+ mid = model_id.to_s
397
+ mid.start_with?('anthropic.', 'us.anthropic.', 'eu.anthropic.', 'ap.anthropic.')
398
+ end
399
+
400
+ # --- invoke_model path for thinking-enabled Anthropic models ---
401
+ # Bedrock Converse API silently drops thinking config for Claude Sonnet 4+.
402
+ # invoke_model uses the native Anthropic Messages API payload format which supports thinking.
403
+
404
+ def invoke_model_chat(messages:, model:, temperature:, max_tokens:, tools:, tool_prefs:,
405
+ thinking:, _params: nil, **_rest)
406
+ mid = model_id(model)
407
+ body = build_invoke_model_body(
408
+ messages: messages, model: mid, temperature: temperature, max_tokens: max_tokens,
409
+ tools: tools, tool_prefs: tool_prefs, thinking: thinking
410
+ )
411
+
412
+ log.debug { "bedrock.provider.invoke_model_chat: model=#{mid} thinking=#{thinking.inspect}" }
413
+
414
+ response = runtime_client.invoke_model(
415
+ model_id: self.class.inference_profile_id(mid, region: region),
416
+ content_type: 'application/json',
417
+ accept: 'application/json',
418
+ body: Legion::JSON.generate(body)
419
+ )
420
+
421
+ # Read body once — it's a stream that can only be consumed once
422
+ body_raw = value(response, :body)
423
+ body_raw = body_raw.read if body_raw.respond_to?(:read)
424
+ body_raw = body_raw.string if body_raw.respond_to?(:string)
425
+ body_str = body_raw.to_s
426
+
427
+ # Dump raw invoke_model response for debugging
428
+ dump_path = ENV.fetch('BEDROCK_DEBUG_OUTPUT', nil)
429
+ if dump_path
430
+ begin
431
+ dump_file = File.join(dump_path, "bedrock_invoke_chat_#{Time.now.strftime('%Y%m%d_%H%M%S')}.json")
432
+ File.write(dump_file, body_str)
433
+ log.debug { "bedrock.provider.invoke_model_chat: raw response dumped to #{dump_file}" }
434
+ rescue StandardError => e
435
+ log.warn { "bedrock.provider.invoke_model_chat: failed to dump raw response: #{e.message}" }
436
+ end
437
+ end
438
+
439
+ # Wrap body string back into response so parse_invoke_model_response can use it
440
+ parsed_body = Legion::JSON.parse(body_str, symbolize_names: false)
441
+ parse_invoke_model_response_hash(parsed_body, mid)
442
+ end
443
+
444
+ def invoke_model_stream(messages:, model:, temperature:, max_tokens:, tools:, tool_prefs:,
445
+ thinking:, _params: nil, **_rest, &)
446
+ mid = model_id(model)
447
+ body = build_invoke_model_body(
448
+ messages: messages, model: mid, temperature: temperature, max_tokens: max_tokens,
449
+ tools: tools, tool_prefs: tool_prefs, thinking: thinking, streaming: true
450
+ )
451
+
452
+ log.debug { "bedrock.provider.invoke_model_stream: model=#{mid} thinking=#{thinking.inspect}" }
453
+
454
+ state = {
455
+ accumulated: +'',
456
+ thinking: +'',
457
+ final_usage: nil,
458
+ stop_reason: nil,
459
+ tool_use_blocks: [],
460
+ current_tool_use: nil,
461
+ in_thinking: false,
462
+ raw_events: []
463
+ }
464
+
465
+ dump_path = ENV.fetch('BEDROCK_DEBUG_OUTPUT', nil)
466
+
467
+ # rubocop:disable Metrics/BlockLength
468
+ runtime_client.invoke_model_with_response_stream(
469
+ model_id: self.class.inference_profile_id(mid, region: region),
470
+ content_type: 'application/json',
471
+ accept: 'application/json',
472
+ body: Legion::JSON.generate(body)
473
+ ) do |stream|
474
+ # ResponseStream is an event emitter (Aws::BedrockRuntime::EventStreams::ResponseStream).
475
+ # Wire on_chunk_event to receive actual data events.
476
+ # Each chunk contains base64-encoded JSON lines with Anthropic events.
477
+ log.debug { "bedrock.provider.invoke_model_stream: stream class=#{stream.class}" }
478
+
479
+ stream.on_chunk_event do |event|
480
+ raw = event.respond_to?(:bytes) ? event.bytes : nil
481
+ raw = raw.read if raw.respond_to?(:read)
482
+ next unless raw&.length&.positive?
483
+
484
+ # Bedrock invoke_model_with_response_stream payloads are gzip-compressed.
485
+ # Detect gzip magic bytes (0x1f8b) and decompress.
486
+ require 'zlib'
487
+ raw = Zlib::GzipReader.wrap(StringIO.new(raw), &:read) if raw.byteslice(0, 2).bytes == [0x1f, 0x8b]
488
+
489
+ # Now raw is UTF-8 JSON lines (newline-delimited Anthropic events)
490
+ text = raw.force_encoding('UTF-8')
491
+ text.lines.each do |line|
492
+ line = line.strip
493
+ next if line.empty?
494
+
495
+ raw_event = Legion::JSON.parse(line, symbolize_names: false)
496
+ next unless raw_event.is_a?(Hash)
497
+
498
+ event_type = raw_event['type'] || 'unknown'
499
+ state[:raw_events] << { event: event_type, data: raw_event } if dump_path
500
+ handle_invoke_model_stream_json(raw_event, state, mid) { |chunk| yield chunk if block_given? }
501
+ end
502
+ rescue StandardError => e
503
+ log.warn { "bedrock.provider.invoke_model_stream: chunk decode error=#{sanitize_log(e.message)}" }
504
+ end
505
+
506
+ stream.on_error_event do |event|
507
+ log.warn do
508
+ "bedrock.provider.invoke_model_stream: error event ivars=#{event.instance_variables.inspect}"
509
+ end
510
+ end
511
+
512
+ stream.on_internal_server_exception_event do |event|
513
+ log.warn do
514
+ 'bedrock.provider.invoke_model_stream: internal_server_exception ' \
515
+ "ivars=#{event.instance_variables.inspect}"
516
+ end
517
+ end
518
+
519
+ stream.on_model_stream_error_exception_event do |event|
520
+ log.warn do
521
+ "bedrock.provider.invoke_model_stream: model_stream_error ivars=#{event.instance_variables.inspect}"
522
+ end
523
+ end
524
+ end
525
+ # rubocop:enable Metrics/BlockLength
526
+
527
+ # Dump raw streaming events for debugging
528
+ if dump_path && state[:raw_events].any?
529
+ begin
530
+ dump_file = File.join(dump_path, "bedrock_invoke_stream_#{Time.now.strftime('%Y%m%d_%H%M%S')}.json")
531
+ File.write(dump_file, Legion::JSON.pretty_generate(state[:raw_events]))
532
+ log.debug do
533
+ "bedrock.provider.invoke_model_stream: #{state[:raw_events].size} raw events dumped to #{dump_file}"
534
+ end
535
+ rescue StandardError => e
536
+ log.warn { "bedrock.provider.invoke_model_stream: failed to dump raw events: #{e.message}" }
537
+ end
538
+ end
539
+
540
+ usage = state[:final_usage] || {}
541
+ msg_attrs = {
542
+ role: :assistant,
543
+ content: state[:accumulated],
544
+ model_id: mid,
545
+ tool_calls: build_stream_tool_calls(state[:tool_use_blocks]),
546
+ input_tokens: usage[:input_tokens] || usage['input_tokens'] || 0,
547
+ output_tokens: usage[:output_tokens] || usage['output_tokens'] || 0,
548
+ cached_tokens: usage.fetch(:cache_read_input_tokens, nil) || usage.fetch('cache_read_input_tokens', nil),
549
+ cache_creation_tokens: usage.fetch(:cache_creation_input_tokens,
550
+ nil) || usage.fetch('cache_creation_input_tokens', nil),
551
+ stop_reason: state[:stop_reason]
552
+ }
553
+ msg_attrs[:thinking] = state[:thinking] unless state[:thinking].empty?
554
+
555
+ Legion::Extensions::Llm::Message.new(**msg_attrs)
556
+ end
557
+
558
+ def build_invoke_model_body(messages:, temperature:, max_tokens:, tools:, tool_prefs:, thinking:,
559
+ _model: nil, _streaming: false)
560
+ body = {
561
+ max_tokens: max_tokens || 4096,
562
+ messages: format_invoke_model_messages(messages),
563
+ anthropic_version: 'bedrock-2023-05-31'
564
+ }
565
+ body[:temperature] = temperature if temperature
566
+ if tools && !tools.empty?
567
+ tool_format = format_invoke_model_tools(tools, tool_prefs)
568
+ body[:tools] = tool_format[:tools]
569
+ body[:tool_choice] = tool_format[:tool_choice] if tool_format[:tool_choice]
570
+ end
571
+ body[:thinking] = invoke_model_thinking(thinking) if thinking
572
+ # NOTE: Don't include body[:stream] = true in the JSON body for invoke_model_with_response_stream.
573
+ # The endpoint itself implies streaming; Bedrock rejects the extra field.
574
+ body
575
+ end
576
+
577
+ # Strip provider-specific keys (e.g. effort from OpenAI) that Bedrock/Anthropic APIs don't accept.
578
+ def invoke_model_thinking(thinking)
579
+ return thinking unless thinking.is_a?(Hash)
580
+
581
+ thinking.except(:effort, 'effort')
582
+ end
583
+
584
+ def format_invoke_model_messages(messages)
585
+ messages.filter_map do |msg|
586
+ role = msg.respond_to?(:role) ? msg.role.to_s : (msg[:role] || msg['role']).to_s
587
+ next if role == 'system'
588
+
589
+ content = case role
590
+ when 'tool'
591
+ format_invoke_model_tool_result(msg)
592
+ when 'assistant'
593
+ format_invoke_model_assistant(msg)
594
+ else
595
+ format_invoke_model_content(msg)
596
+ end
597
+
598
+ next if content.nil? || (content.is_a?(Array) && content.empty?)
599
+
600
+ { role: role, content: content }
601
+ end
602
+ end
603
+
604
+ def format_invoke_model_content(msg)
605
+ content = msg.respond_to?(:content) ? msg.content : (msg[:content] || msg['content'])
606
+ return [] if content.nil?
607
+
608
+ if content.is_a?(String)
609
+ [{ type: 'text', text: content }]
610
+ elsif content.is_a?(Array)
611
+ content.filter_map do |block|
612
+ type = (block[:type] || block['type']).to_s
613
+ next { type: 'text', text: block[:text] || block['text'] } if type == 'text'
614
+
615
+ block
616
+ end
617
+ else
618
+ [{ type: 'text', text: content.to_s }]
619
+ end
620
+ end
621
+
622
+ def format_invoke_model_tool_result(msg)
623
+ tool_call_id = if msg.respond_to?(:tool_call_id)
624
+ msg.tool_call_id
625
+ else
626
+ msg[:tool_call_id] || msg['tool_call_id']
627
+ end
628
+ content = if msg.respond_to?(:tool_results)
629
+ msg.tool_results.to_s
630
+ else
631
+ (msg[:content] || msg['content']).to_s
632
+ end
633
+ [{ type: 'tool_result', tool_use_id: tool_call_id, content: [{ type: 'text', text: content }] }]
634
+ end
635
+
636
+ def format_invoke_model_assistant(msg)
637
+ blocks = []
638
+
639
+ text = msg.respond_to?(:content) ? msg.content : (msg[:content] || msg['content'])
640
+ text_str = text.to_s
641
+ blocks << { type: 'text', text: text_str } unless text_str.strip.empty?
642
+
643
+ tool_calls = msg.respond_to?(:tool_calls) ? msg.tool_calls : (msg[:tool_calls] || msg['tool_calls'] || {})
644
+ call_array = tool_calls.is_a?(Hash) ? tool_calls.values : Array(tool_calls)
645
+
646
+ call_array.each do |call|
647
+ call_id = call.respond_to?(:id) ? call.id : (call[:id] || call['id'])
648
+ call_name = call.respond_to?(:name) ? call.name : (call[:name] || call['name'])
649
+ call_args = if call.respond_to?(:arguments)
650
+ call.arguments
651
+ else
652
+ call[:arguments] || call['arguments'] || {}
653
+ end
654
+
655
+ blocks << {
656
+ type: 'tool_use',
657
+ id: call_id,
658
+ name: call_name,
659
+ input: call_args
660
+ }
661
+ end
662
+
663
+ blocks
664
+ end
665
+
666
+ def format_invoke_model_tools(tools, tool_prefs)
667
+ tool_list = tools.values.map do |tool|
668
+ {
669
+ name: tool[:name] || tool['name'],
670
+ description: tool[:description] || tool['description'] || '',
671
+ input_schema: tool[:params_schema] || tool['params_schema'] ||
672
+ { type: 'object', properties: {} }
673
+ }
674
+ end
675
+
676
+ result = { tools: tool_list }
677
+
678
+ if tool_prefs
679
+ choice = tool_prefs[:choice] || tool_prefs['choice']
680
+ result[:tool_choice] = if [:required, 'required'].include?(choice)
681
+ { type: 'any' }
682
+ elsif choice.to_s != 'auto' && !choice.to_s.empty?
683
+ { type: 'tool', name: choice.to_s }
684
+ else
685
+ { type: 'auto' }
686
+ end
687
+ end
688
+
689
+ result
690
+ end
691
+
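The `tool_choice` branch above maps a caller preference onto one of three shapes. A minimal sketch of that mapping in isolation; `tool_choice_for` and the tool name `get_weather` are hypothetical and used only for illustration.

```ruby
# Hypothetical extraction of the tool_choice mapping, for illustration only.
def tool_choice_for(choice)
  if [:required, 'required'].include?(choice)
    { type: 'any' }                      # the model must call some tool
  elsif choice.to_s != 'auto' && !choice.to_s.empty?
    { type: 'tool', name: choice.to_s }  # the model must call this named tool
  else
    { type: 'auto' }                     # the model decides (also covers nil)
  end
end

required = tool_choice_for(:required)
named    = tool_choice_for('get_weather')
default  = tool_choice_for(nil)
```

Any value other than `required`, `auto`, or blank is treated as a tool name, so a typo in the preference would be sent as a forced tool call rather than rejected locally.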
692
+ def parse_invoke_model_response(response, model_id)
693
+ body_raw = value(response, :body)
694
+ body_raw = body_raw.read if body_raw.respond_to?(:read)
695
+ body_raw = body_raw.string if body_raw.respond_to?(:string)
696
+ body = Legion::JSON.parse(body_raw, symbolize_names: false)
697
+ build_invoke_model_message(body, model_id)
698
+ end
699
+
700
+ def parse_invoke_model_response_hash(body, model_id)
701
+ # body is already a parsed Hash from Legion::JSON.parse
702
+ build_invoke_model_message(body, model_id)
703
+ end
704
+
705
+ def build_invoke_model_message(body, model_id)
706
+ content_blocks = body['content'] || []
707
+
708
+ text_parts = content_blocks.filter_map { |b| b['text'] if b['type'] == 'text' }.join
709
+ thinking_text = content_blocks.filter_map { |b| b['thinking'] if b['type'] == 'thinking' }.join
710
+ tool_calls_raw = content_blocks.select { |b| b['type'] == 'tool_use' }
711
+
712
+ tc = {}
713
+ tool_calls_raw.each do |tc_block|
714
+ tc[tc_block['id']] = Legion::Extensions::Llm::ToolCall.new(
715
+ id: tc_block['id'], name: tc_block['name'], arguments: tc_block['input'] || {}
716
+ )
717
+ end
718
+
719
+ usage = body['usage'] || {}
720
+
721
+ msg_attrs = {
722
+ role: :assistant,
723
+ content: text_parts,
724
+ model_id: model_id,
725
+ tool_calls: tc.empty? ? nil : tc,
726
+ input_tokens: usage['input_tokens'] || 0,
727
+ output_tokens: usage['output_tokens'] || 0,
728
+ cached_tokens: usage['cache_read_input_tokens'],
729
+ cache_creation_tokens: usage['cache_creation_input_tokens']
730
+ }
731
+ msg_attrs[:thinking] = thinking_text unless thinking_text.empty?
732
+
733
+ Legion::Extensions::Llm::Message.new(**msg_attrs)
734
+ end
735
+
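A sketch of how the content blocks are split by type, using plain Hashes in place of `Legion::Extensions::Llm::ToolCall` and `Message`. The sample body is made up for illustration and `split_content_blocks` is a hypothetical name.

```ruby
# Hypothetical helper: partition Anthropic-style content blocks by type.
def split_content_blocks(body)
  blocks = body['content'] || []

  {
    text: blocks.filter_map { |b| b['text'] if b['type'] == 'text' }.join,
    thinking: blocks.filter_map { |b| b['thinking'] if b['type'] == 'thinking' }.join,
    # Tool calls are keyed by their id, mirroring the Hash built above.
    tool_calls: blocks.select { |b| b['type'] == 'tool_use' }.to_h do |b|
      [b['id'], { name: b['name'], arguments: b['input'] || {} }]
    end
  }
end

# Made-up response body, string-keyed as produced by JSON parsing.
sample = {
  'content' => [
    { 'type' => 'thinking', 'thinking' => 'Need the forecast.' },
    { 'type' => 'text', 'text' => 'Checking the weather.' },
    { 'type' => 'tool_use', 'id' => 'toolu_01', 'name' => 'get_weather', 'input' => { 'city' => 'Oslo' } }
  ]
}

parts = split_content_blocks(sample)
```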
736
+ def handle_invoke_model_stream_json(event_json, state, model_id)
737
+ # event_json is a Hash like { "type": "message_start", "message": { ... } }
738
+ case event_json['type']
739
+ when 'message_start'
740
+ msg = event_json['message'] || {}
741
+ state[:final_usage] = msg['usage'] || {}
742
+ when 'content_block_start'
743
+ block = event_json['content_block'] || {}
744
+ block_type = block['type'].to_s
745
+ state[:in_thinking] = (block_type == 'thinking')
746
+ if block_type == 'tool_use'
747
+ state[:current_tool_use] = {
748
+ tool_use_id: block['id'],
749
+ name: block['name'],
750
+ input_json: +''
751
+ }
752
+ elsif block_type != 'thinking'
753
+ state[:in_thinking] = false
754
+ end
755
+ when 'content_block_delta'
756
+ delta = event_json['delta'] || {}
757
+ delta_type = delta['type'].to_s
758
+ case delta_type
759
+ when 'thinking_delta'
760
+ text = delta['thinking'] || ''
761
+ state[:thinking] << text
762
+ if block_given? && !text.empty?
763
+ yield Legion::Extensions::Llm::Chunk.new(
764
+ role: :assistant,
765
+ content: '',
766
+ thinking: { content: text, enabled: true },
767
+ model_id: model_id
768
+ )
769
+ end
770
+ when 'text_delta'
771
+ text = delta['text'] || ''
772
+ state[:accumulated] << text
773
+ if block_given?
774
+ yield Legion::Extensions::Llm::Chunk.new(role: :assistant, content: text,
775
+ model_id: model_id)
776
+ end
777
+ when 'input_json_delta'
778
+ partial = delta['partial_json'] || ''
779
+ state[:current_tool_use][:input_json] << partial
780
+ if block_given? && !partial.empty?
781
+ yield Legion::Extensions::Llm::Chunk.new(
782
+ role: :assistant,
783
+ content: '',
784
+ tool_calls: {
785
+ state[:current_tool_use][:tool_use_id].to_sym =>
786
+ Legion::Extensions::Llm::ToolCall.new(
787
+ id: state[:current_tool_use][:tool_use_id],
788
+ name: state[:current_tool_use][:name],
789
+ arguments: partial
790
+ )
791
+ },
792
+ model_id: model_id
793
+ )
794
+ end
795
+ end
796
+ when 'content_block_stop'
797
+ if state[:current_tool_use]
798
+ state[:tool_use_blocks] << state[:current_tool_use]
799
+ state[:current_tool_use] = nil
800
+ end
801
+ when 'message_delta'
802
+ delta = event_json['delta'] || {}
803
+ state[:stop_reason] = delta['stop_reason']
804
+ end
805
+ rescue StandardError => e
806
+ log.warn { "bedrock.provider.invoke_model_stream_json: error=#{e.message}" }
807
+ end
808
+
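The handler above folds a sequence of streaming events into a shared `state` Hash. A simplified sketch of that accumulation, with chunk yielding and logging left out; `fold_event` and the sample events are hypothetical and only mirror the event types handled above.

```ruby
# Hypothetical, simplified fold of stream events into a state Hash.
def fold_event(state, event)
  case event['type']
  when 'content_block_start'
    block = event['content_block'] || {}
    if block['type'] == 'tool_use'
      state[:current_tool_use] = { tool_use_id: block['id'], name: block['name'], input_json: +'' }
    end
  when 'content_block_delta'
    delta = event['delta'] || {}
    case delta['type']
    when 'text_delta' then state[:accumulated] << delta['text'].to_s
    when 'thinking_delta' then state[:thinking] << delta['thinking'].to_s
    when 'input_json_delta'
      # Guard against a delta arriving without a preceding tool_use start.
      tool = state[:current_tool_use]
      tool[:input_json] << delta['partial_json'].to_s if tool
    end
  when 'content_block_stop'
    if state[:current_tool_use]
      state[:tool_use_blocks] << state[:current_tool_use]
      state[:current_tool_use] = nil
    end
  when 'message_delta'
    state[:stop_reason] = (event['delta'] || {})['stop_reason']
  end
  state
end

state = { accumulated: +'', thinking: +'', tool_use_blocks: [], current_tool_use: nil, stop_reason: nil }

# Made-up event sequence: one text block followed by one tool call.
events = [
  { 'type' => 'content_block_delta', 'delta' => { 'type' => 'text_delta', 'text' => 'Hi ' } },
  { 'type' => 'content_block_delta', 'delta' => { 'type' => 'text_delta', 'text' => 'there' } },
  { 'type' => 'content_block_start', 'content_block' => { 'type' => 'tool_use', 'id' => 'toolu_01', 'name' => 'get_weather' } },
  { 'type' => 'content_block_delta', 'delta' => { 'type' => 'input_json_delta', 'partial_json' => '{"city":' } },
  { 'type' => 'content_block_delta', 'delta' => { 'type' => 'input_json_delta', 'partial_json' => '"Oslo"}' } },
  { 'type' => 'content_block_stop' },
  { 'type' => 'message_delta', 'delta' => { 'stop_reason' => 'tool_use' } }
]

events.each { |event| fold_event(state, event) }
```

The tool input stays a raw JSON string until the block closes, because individual `partial_json` fragments are not valid JSON on their own.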
299
809
  def static_offerings(**filters)
300
810
  STATIC_MODELS.filter_map do |entry|
301
811
  provider_filter = normalize_provider(filters[:by_provider])
@@ -363,17 +873,35 @@ module Legion
363
873
  ctx ? { context_window: ctx } : nil
364
874
  end
365
875
 
366
- def converse_request(messages, model:, temperature:, max_tokens:, tools:, tool_prefs:, guardrail_config: nil)
876
+ def converse_request(messages, model:, temperature:, max_tokens:, tools:, tool_prefs:, guardrail_config: nil,
877
+ thinking: nil)
367
878
  {
368
879
  model_id: self.class.inference_profile_id(model_id(model), region: region),
369
880
  messages: format_messages(messages.reject { |message| message.role == :system }),
370
881
  system: format_system(messages),
371
882
  inference_config: { temperature: temperature, max_tokens: max_tokens || model_max_tokens(model) }.compact,
372
883
  tool_config: format_tool_config(tools, tool_prefs),
373
- guardrail_config: guardrail_config
884
+ guardrail_config: guardrail_config,
885
+ additional_model_request_fields: bedrock_additional_fields(thinking)
374
886
  }.compact
375
887
  end
376
888
 
889
+ def bedrock_additional_fields(thinking)
890
+ fields = {}
891
+ if thinking
892
+ fields[:thinking] = {
893
+ type: 'enabled',
894
+ budget_tokens: if thinking.is_a?(Hash)
895
+ thinking[:budget_tokens] || thinking['budget_tokens'] ||
896
+ thinking[:budget] || thinking['budget'] || 1024
897
+ else
898
+ 1024
899
+ end
900
+ }
901
+ end
902
+ fields.empty? ? nil : fields
903
+ end
904
+
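A sketch of the budget resolution used above: the first present key wins, and anything that is not a Hash falls back to the default. `thinking_budget` is a hypothetical name; the default of 1024 is taken from the code above.

```ruby
# Hypothetical extraction of the budget lookup, for illustration only.
def thinking_budget(thinking, default: 1024)
  return default unless thinking.is_a?(Hash)

  thinking[:budget_tokens] || thinking['budget_tokens'] ||
    thinking[:budget] || thinking['budget'] || default
end

explicit = thinking_budget({ budget_tokens: 4096 })
aliased  = thinking_budget({ 'budget' => 2048 })
fallback = thinking_budget(true)
```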
377
905
  def format_messages(messages)
378
906
  total = messages.size
379
907
  messages.filter_map.with_index do |message, idx|
@@ -389,9 +917,10 @@ module Legion
389
917
  return [] unless message.tool_result?
390
918
 
391
919
  [{
392
- type: 'tool_result',
393
- tool_use: { tool_use_id: message.tool_call_id },
394
- content: [{ type: 'text', text: message.tool_results.to_s }]
920
+ tool_result: {
921
+ tool_use_id: message.tool_call_id,
922
+ content: [{ text: message.tool_results.to_s }]
923
+ }
395
924
  }]
396
925
  end
397
926
 
@@ -439,7 +968,7 @@ module Legion
439
968
  text = content_text(message.content)
440
969
  blocks << { text: text } if text && !text.strip.empty?
441
970
 
442
- message.tool_calls.each do |call|
971
+ message.tool_calls.each_value do |call|
443
972
  blocks << {
444
973
  tool_use: {
445
974
  tool_use_id: call.id,
@@ -562,27 +1091,133 @@ module Legion
562
1091
  def parse_converse_response(response, fallback_model)
563
1092
  output = value(response, :output)
564
1093
  message = value(output, :message)
1094
+ content_blocks = value(message, :content)
565
1095
  usage = value(response, :usage) || {}
1096
+ additional_fields = value(response, :additional_model_response_fields)
566
1097
 
567
- Legion::Extensions::Llm::Message.new(
1098
+ msg_attrs = {
568
1099
  role: :assistant,
569
- content: text_from(value(message, :content)),
1100
+ content: text_from(content_blocks),
570
1101
  model_id: fallback_model,
571
- tool_calls: parse_tool_calls(value(message, :content)),
1102
+ tool_calls: parse_tool_calls(content_blocks),
572
1103
  input_tokens: value(usage, :input_tokens),
573
1104
  output_tokens: value(usage, :output_tokens),
574
1105
  cached_tokens: cache_read_tokens(usage),
575
1106
  cache_creation_tokens: cache_write_tokens(usage),
576
1107
  raw: normalize_response(response)
577
- )
1108
+ }
1109
+
1110
+ # Bedrock Converse returns thinking in two possible locations:
1111
+ # 1. Content blocks: { reasoning: { text: "..." } }
1112
+ # 2. Additional model response fields: { thinking: { reasoningContent: { chunk: { text } } } }
1113
+ thinking_text = extract_thinking_from_content(content_blocks) ||
1114
+ (additional_fields ? extract_thinking_from_fields(additional_fields) : nil)
1115
+ msg_attrs[:thinking] = thinking_text if thinking_text
1116
+
1117
+ Legion::Extensions::Llm::Message.new(**msg_attrs)
1118
+ end
1119
+
1120
+ def extract_thinking_from_content(content_blocks)
1121
+ return nil unless content_blocks
1122
+
1123
+ Array(content_blocks).each do |block|
1124
+ reasoning = value(block, :reasoning)
1125
+ # reasoning can be a Hash or an AWS SDK struct (Aws::BedrockRuntime::Types::ReasoningContent)
1126
+ next if reasoning.nil?
1127
+
1128
+ text = if reasoning.is_a?(Hash)
1129
+ reasoning[:text] || reasoning['text']
1130
+ else
1131
+ # AWS SDK struct — use value() to safely extract the :text field
1132
+ value(reasoning, :text)
1133
+ end
1134
+ return text.to_s unless text.to_s.empty?
1135
+ end
1136
+ nil
1137
+ end
1138
+
1139
+ def extract_thinking_from_fields(additional_fields)
1140
+ thinking = additional_fields[:thinking] || additional_fields['thinking']
1141
+ return nil unless thinking.is_a?(Hash)
1142
+
1143
+ # Bedrock Converse API returns thinking in multiple shapes depending on model:
1144
+ # - Claude direct: { text: "..." }
1145
+ # - Claude via Converse: { reasoningContent: { chunk: { text: "..." } } }
1146
+ # - Some models: { reasoning_text: "..." } or { reasoning: "..." }
1147
+ content = thinking[:text] || thinking['text'] ||
1148
+ thinking[:reasoning_text] || thinking['reasoning_text'] || thinking['reasoningText'] ||
1149
+ thinking[:reasoning] || thinking['reasoning'] ||
1150
+ reasoning_content_text(thinking)
1151
+ content.to_s unless content.to_s.empty?
1152
+ end
1153
+
1154
+ def reasoning_content_text(thinking)
1155
+ rc = thinking[:reasoningContent] || thinking['reasoningContent']
1156
+ return nil unless rc.is_a?(Hash)
1157
+
1158
+ # Handle the nested chunk structure from Bedrock Converse
1159
+ chunk = rc[:chunk] || rc['chunk']
1160
+ if chunk.is_a?(Hash)
1161
+ chunk[:text] || chunk['text']
1162
+ else
1163
+ rc[:text] || rc['text']
1164
+ end
578
1165
  end
579
1166
 
580
1167
  def stream_converse(request, fallback_model)
581
1168
  state = { accumulated: +'', thinking: +'', final_usage: nil, stop_reason: nil,
582
- tool_use_blocks: [], current_tool_use: nil, in_thinking: false }
1169
+ tool_use_blocks: [], current_tool_use: nil, in_thinking: false,
1170
+ raw_events: [] }
1171
+
1172
+ log.debug do
1173
+ "bedrock.provider.stream_converse: starting model=#{fallback_model} tools=#{state[:tool_use_blocks].size}"
1174
+ end
1175
+
1176
+ dump_path = ENV.fetch('BEDROCK_DEBUG_OUTPUT', nil)
583
1177
 
584
1178
  runtime_client.converse_stream(**request) do |stream|
585
1179
  wire_stream_handlers(stream, state, fallback_model) { |chunk| yield chunk if block_given? }
1180
+
1181
+ # Capture all raw events for debugging
1182
+ if dump_path
1183
+ stream.on_content_block_start_event do |evt|
1184
+ state[:raw_events] << { event: 'content_block_start', data: safe_event_data(evt) }
1185
+ end
1186
+ stream.on_content_block_delta_event do |evt|
1187
+ state[:raw_events] << { event: 'content_block_delta', data: safe_event_data(evt) }
1188
+ end
1189
+ stream.on_content_block_stop_event do |evt|
1190
+ state[:raw_events] << { event: 'content_block_stop', data: safe_event_data(evt) }
1191
+ end
1192
+ stream.on_message_start_event do |evt|
1193
+ state[:raw_events] << { event: 'message_start', data: safe_event_data(evt) }
1194
+ end
1195
+ stream.on_message_stop_event do |evt|
1196
+ state[:raw_events] << { event: 'message_stop', data: safe_event_data(evt) }
1197
+ end
1198
+ stream.on_metadata_event do |evt|
1199
+ state[:raw_events] << { event: 'metadata', data: safe_event_data(evt) }
1200
+ end
1201
+ end
1202
+ end
1203
+
1204
+ # Dump raw streaming events for debugging
1205
+ if dump_path && state[:raw_events].any?
1206
+ begin
1207
+ dump_file = File.join(dump_path, "bedrock_stream_#{Time.now.strftime('%Y%m%d_%H%M%S')}.json")
1208
+ File.write(dump_file, Legion::JSON.pretty_generate(state[:raw_events]))
1209
+ log.debug do
1210
+ "bedrock.provider.stream_converse: #{state[:raw_events].size} raw events dumped to #{dump_file}"
1211
+ end
1212
+ rescue StandardError => e
1213
+ log.warn { "bedrock.provider.stream_converse: failed to dump raw events: #{e.message}" }
1214
+ end
1215
+ end
1216
+
1217
+ log.debug do
1218
+ "bedrock.provider.stream_converse: completed model=#{fallback_model} " \
1219
+ "accumulated_length=#{state[:accumulated].length} thinking_length=#{state[:thinking].length} " \
1220
+ "tool_use_blocks=#{state[:tool_use_blocks].size} stop_reason=#{state[:stop_reason]}"
586
1221
  end
587
1222
 
588
1223
  msg_attrs = {
@@ -614,7 +1249,9 @@ module Legion
614
1249
  stream.on_content_block_start_event do |event|
615
1250
  start = value(event, :start)
616
1251
 
617
- if value(start, :thinking)
1252
+ # Bedrock Converse uses 'reasoning' blocks for thinking content,
1253
+ # and 'thinking' blocks for legacy/direct invoke_model responses
1254
+ if value(start, :thinking) || value(start, :reasoning)
618
1255
  state[:in_thinking] = true
619
1256
  next
620
1257
  end
@@ -634,7 +1271,11 @@ module Legion
634
1271
  def wire_block_delta(stream, state, fallback_model)
635
1272
  stream.on_content_block_delta_event do |event|
636
1273
  delta = value(event, :delta)
637
- text = value(delta, :text)
1274
+ # Bedrock streaming: text blocks use delta.text,
1275
+ # reasoning/thinking blocks use delta.reasoning.text or delta.thinking.text
1276
+ text = value(delta, :text) ||
1277
+ (value(delta, :reasoning) ? value(value(delta, :reasoning), :text) : nil) ||
1278
+ (value(delta, :thinking) ? value(value(delta, :thinking), :text) : nil)
638
1279
  if text
639
1280
  if state[:in_thinking]
640
1281
  state[:thinking] << text
@@ -857,6 +1498,12 @@ module Legion
857
1498
  body.is_a?(String) ? Legion::JSON.parse(body, symbolize_names: false) : body.to_h
858
1499
  end
859
1500
 
1501
+ # Safely extract event data for debugging — AWS SDK structs
1502
+ # may or may not respond to #to_h
1503
+ def safe_event_data(evt)
1504
+ evt.respond_to?(:to_h) ? evt.to_h : evt.inspect[0, 500]
1505
+ end
1506
+
860
1507
  def normalize_response(response)
861
1508
  response.respond_to?(:to_h) ? response.to_h : {}
862
1509
  end
@@ -865,8 +1512,13 @@ module Legion
865
1512
  return nil if object.nil?
866
1513
 
867
1514
  string_key = key.to_s
868
- return object[key] if object.respond_to?(:key?) && object.key?(key)
869
- return object[string_key] if object.respond_to?(:key?) && object.key?(string_key)
1515
+
1516
+ val = safe_struct_access(object, key)
1517
+ return val unless val.nil?
1518
+
1519
+ val = safe_struct_access(object, string_key)
1520
+ return val unless val.nil?
1521
+
870
1522
  return object.public_send(key) if object.respond_to?(key)
871
1523
 
872
1524
  if object.respond_to?(:to_h)
@@ -877,6 +1529,26 @@ module Legion
877
1529
 
878
1530
  nil
879
1531
  end
1532
+
1533
+ # Sanitize potentially binary/non-UTF-8 strings for safe logging
1534
+ def sanitize_log(str)
1535
+ return str unless str.is_a?(String)
1536
+
1537
+ str.dup.force_encoding('UTF-8').scrub('?')
1538
+ rescue StandardError
1539
+ str.inspect
1540
+ end
1541
+
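A small sketch of scrubbing a non-UTF-8 string for logging. `printable` is a hypothetical name. This variant works on a copy, since `String#force_encoding` modifies its receiver in place and raises `FrozenError` on frozen strings.

```ruby
# Hypothetical helper: make an arbitrary byte string safe to log.
def printable(str)
  return str unless str.is_a?(String)

  # dup first so the caller's string keeps its original encoding.
  str.dup.force_encoding('UTF-8').scrub('?')
end

raw = "ok \xFF".b          # binary string containing an invalid UTF-8 byte
clean = printable(raw)
```

After the call `clean` is `"ok ?"` and `raw` still reports `Encoding::BINARY`.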
1542
+ def safe_struct_access(object, key)
1543
+ return nil unless object.respond_to?(:key?) && object.key?(key)
1544
+
1545
+ object[key]
1546
+ rescue NameError
1547
+ # AWS SDK structs (Aws::Structure) define members in their schema
1548
+ # but may not populate them in every response. A missing value
1549
+ # raises NameError instead of returning nil.
1550
+ nil
1551
+ end
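A sketch of the failure mode the rescue above guards against, using a plain `Struct` as a stand-in for an SDK struct. `FakeEvent` and its `key?` definition are assumptions made for illustration and are not the AWS SDK's implementation. What is certain is that `Struct#[]` raises `NameError` for a name outside the member list.

```ruby
# Stand-in for an SDK struct. Assumption: key? delegates to [], which raises
# NameError when the name is not a declared member.
FakeEvent = Struct.new(:text) do
  def key?(name)
    !self[name].nil?
  end
end

# Standalone copy of the guarded lookup shown above.
def safe_access(object, key)
  return nil unless object.respond_to?(:key?) && object.key?(key)

  object[key]
rescue NameError
  nil
end

event = FakeEvent.new('hello')

present = safe_access(event, :text)
absent  = safe_access(event, :reasoning)   # NameError is swallowed
hashed  = safe_access({ 'text' => 'hi' }, 'text')
```

Because the `rescue` is attached to the whole method body, it covers a `NameError` raised by either `key?` or `[]`.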
880
1552
  end
881
1553
  end
882
1554
  end
@@ -4,7 +4,7 @@ module Legion
4
4
  module Extensions
5
5
  module Llm
6
6
  module Bedrock
7
- VERSION = '0.3.12'
7
+ VERSION = '0.3.18'
8
8
  end
9
9
  end
10
10
  end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: lex-llm-bedrock
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.3.12
4
+ version: 0.3.18
5
5
  platform: ruby
6
6
  authors:
7
7
  - LegionIO