RubyGems - legion-llm - Versions diffs - 0.9.19 → 0.9.23 - Mend

legion-llm 0.9.19 → 0.9.23

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (31)

checksums.yaml +4 -4
data/CHANGELOG.md +58 -0
data/lib/legion/llm/api/native/helpers.rb +20 -0
data/lib/legion/llm/api/native/inference.rb +17 -3
data/lib/legion/llm/api/native/providers.rb +4 -1
data/lib/legion/llm/call/dispatch.rb +8 -1
data/lib/legion/llm/call/embeddings.rb +123 -10
data/lib/legion/llm/call/lex_llm_adapter.rb +99 -24
data/lib/legion/llm/call/providers.rb +7 -1
data/lib/legion/llm/discovery.rb +23 -2
data/lib/legion/llm/inference/conversation.rb +17 -291
data/lib/legion/llm/inference/executor.rb +82 -48
data/lib/legion/llm/inference/native_tool_loop.rb +149 -0
data/lib/legion/llm/inference/steps/gaia_advisory.rb +4 -0
data/lib/legion/llm/inference/steps/rag_context.rb +2 -0
data/lib/legion/llm/inference/steps/sticky_runners.rb +11 -1
data/lib/legion/llm/inference/steps/tool_discovery.rb +2 -1
data/lib/legion/llm/inference/steps/trigger_match.rb +85 -15
data/lib/legion/llm/inventory.rb +16 -5
data/lib/legion/llm/metering.rb +116 -42
data/lib/legion/llm/router/health_tracker.rb +38 -0
data/lib/legion/llm/router.rb +60 -6
data/lib/legion/llm/settings.rb +9 -2
data/lib/legion/llm/tools/confidence.rb +1 -25
data/lib/legion/llm/tools/dispatcher.rb +8 -1
data/lib/legion/llm/tools/interceptors/python_venv.rb +13 -5
data/lib/legion/llm/tools/special.rb +325 -0
data/lib/legion/llm/tools.rb +1 -0
data/lib/legion/llm/version.rb +1 -1
data/lib/legion/llm.rb +1 -0
metadata +3 -1

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 4743dd41922fbca3818f72bb48d353314ed2895ce0981e779ac29315c8ffea3b
-  data.tar.gz: a235df9596b11ddfd94ef5f075a9785f4bce0ae9c849db8b0b5845bde83af4ac
+  metadata.gz: 8c8c98a439d2e96bba437e5e8b4bf8c47c01277a4079bd459c7257e2990278c6
+  data.tar.gz: f9344c761ebf18b4c5ab271ac8cb5858ce46f791588ace438726002d2907c70e
 SHA512:
-  metadata.gz: 1c02e4859ef4bd824e854275fcbb1eadfe243b13477c9af9a9f2f3c484579eefa10bc70d0b1735c85b433b476ca9a8dd69b5fa788cdeafc651dcc370f71cfc40
-  data.tar.gz: 9f3ae0f1adba6bbe56653f0afce38c0eaa0dd4121b02279f5d9053be84682774f07401e346a855320c1bc006929d8ca184c88896098cd52697869c9b8d9f4630
+  metadata.gz: 9574535d0eeca84d522858dd323e8d028994b46b3d3f78a37a8094a4f1a692fbdd68bf24e8a061160b5238a2c3e4f73141e29bf70c6423f18e4b4441937f5417
+  data.tar.gz: 69aa8eccf10beb687b637b7442d9eb8a7bae0d42405fc9cfd47f8c8d5c036b7724df6cb55c10d8264f0701f46f8282f0a44d8622d607a6129a09ab4c39ad2e99
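A gem's checksums.yaml records SHA256 and SHA512 digests of the `metadata.gz` and `data.tar.gz` members of the `.gem` archive, so every release changes all four values above. Below is a minimal sketch of checking one SHA256 digest with Ruby's standard `digest` library. It assumes the member has already been extracted to disk, and `sha256_matches?` is a hypothetical helper, not part of RubyGems.

```ruby
require 'digest'

# Hypothetical helper: compare a file's SHA256 with a hex digest copied
# from checksums.yaml (for example the data.tar.gz entry).
def sha256_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex.downcase
end
```

The same approach works for the SHA512 entries with `Digest::SHA512`.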

data/CHANGELOG.md CHANGED

@@ -1,5 +1,63 @@
 # Legion LLM Changelog
+## [0.9.23] - 2026-05-13
+### Added
+- Router: `registry_entry_for_provider` for explicit provider model resolution
+- Router: model denylist (`deny_model`, `model_denied?`, `excluded_by_denial?`) — config errors auto-deny models
+- Executor: config error detection (`CONFIG_ERROR_PATTERNS`) — prevents circuit breaker trips on auth/validation errors
+- Executor: step timing hash on response (`metrics.timing`, `metrics.latency_legionio_ms`)
+- API: `/api/llm/inference` response includes `provider`, `instance`, `tier`, `metrics`
+- API: `/api/llm/providers` surfaces `source` and `credential_fingerprint`
+- Inventory: provider-scoped queries skip unrelated providers
+- Metering: disk-based JSONL spool when transport unavailable (was dropping events)
+- Discovery: `report_discovery_failure` reports connection failures to health tracker
+- Providers: `enabled: false` instances not registered; `default_model` in metadata
+### Changed
+- Router: tier-aware model fallback — global default no longer bleeds across providers
+- Inventory: single-source offerings (native_provider preferred over discovery to eliminate duplicates)
+- Inventory: dedup normalizes `"default"` instance name
+- Discovery: concise connection error log (no stacktrace for unreachable providers)
+- Settings: removed `claude` from `native_providers` list
+### Fixed
+- Cache spec rewritten to use real `Legion::Cache` instead of fragile stubs
+## [0.9.22] - 2026-05-12
+### Added
+- Pin `legion_list_special_tools` before client and registry tools so models can inspect Legion special tools and the current `Legion::Settings::Extensions` inventory.
+- Surface special Ruby runtime execution with current process/PATH environment metadata, and add Legion-managed Python and pip tools when `legionio setup python` is available.
+### Changed
+- Route Python command interception through the same Legion Python runtime detection used by special tool injection.
+- Replace ad hoc `/api/llm/inference` tool-payload debug prints with structured debug logging.
+### Fixed
+- Chunk Ollama embedding requests according to configured model context limits and aggregate chunk vectors so large Apollo knowledge-capture documents do not exceed provider context windows.
+## [0.9.21] - 2026-05-12
+### Fixed
+- Route metering strictly through `legion-transport`, dropping events when transport is unavailable instead of writing metric events to `Legion::Data::Spool`.
+- Keep override confidence database access read-only by removing `Legion::Data::Local` upserts from `legion-llm`.
+- Stop conversation history and sticky state from writing directly to `Legion::Data` tables.
+## [0.9.20] - 2026-05-12
+### Added
+- Added `llm.gaia.advisory_enabled`, defaulting to `true`, so GAIA pre-request advisory shaping can be disabled without code changes.
+### Fixed
+- Preserve accumulated streamed native tool-call arguments from lex-llm provider responses instead of rebuilding final responses from partial stream chunks.
+- Symbolize extension tool arguments before invoking runner keyword methods so JSON string keys such as `chat_id` satisfy Ruby keyword parameters.
+- Match tool triggers from `Legion::Settings::Extensions` registry entries and keep registry tools injectable alongside client tools with better diagnostics.
+- Skip trigger matching cleanly when `Legion::Settings::Extensions` is not loaded instead of warning through a rescued `NameError`.
+- Accumulate only stream fallback state in the lex-llm adapter instead of retaining every streamed chunk when providers return final messages.
+- Apply explicit vLLM tool-name forcing only on the first native tool-loop round, allowing follow-up automatic tool calls after the requested tool returns.
+- Ignore absent GAIA advisory context-window limits when sizing RAG retrieval instead of routing nil through debug exception handling.
 ## [0.9.19] - 2026-05-11
 ### Added

data/lib/legion/llm/api/native/helpers.rb CHANGED

@@ -498,6 +498,26 @@ module Legion
                 nil
               end
+              define_method(:build_response_metrics) do |pipeline_response|
+                routing = pipeline_response.routing || {}
+                timestamps = pipeline_response.timestamps || {}
+                metrics = {}
+                if (latency = routing[:latency_ms])
+                  metrics[:latency_ms] = latency
+                end
+                step_timings = timestamps[:step_timings]
+                if step_timings.is_a?(Hash) && step_timings.any?
+                  metrics[:timing] = step_timings
+                  total = step_timings[:total].to_i
+                  external = step_timings[:provider_call].to_i + step_timings[:tool_calls].to_i
+                  metrics[:latency_legionio_ms] = total - external if total.positive?
+                end
+                metrics.empty? ? nil : metrics
+              end
             end
             log.debug('[llm][api][helpers] shared helpers registered')
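`build_response_metrics` derives `latency_legionio_ms` by taking the total step time and subtracting the time spent waiting on the provider call and on tool calls. The sketch below reproduces that arithmetic on plain hashes. It assumes the step timings are integer milliseconds keyed by symbol, as in the diff. `response_metrics` is a hypothetical name and is not part of legion-llm.

```ruby
# Hypothetical standalone version of the arithmetic in build_response_metrics.
def response_metrics(routing: {}, step_timings: {})
  metrics = {}
  metrics[:latency_ms] = routing[:latency_ms] if routing[:latency_ms]
  if step_timings.is_a?(Hash) && step_timings.any?
    metrics[:timing] = step_timings
    total = step_timings[:total].to_i # nil.to_i == 0, so missing keys count as zero
    # Time spent outside Legion: provider round-trip plus tool execution.
    external = step_timings[:provider_call].to_i + step_timings[:tool_calls].to_i
    metrics[:latency_legionio_ms] = total - external if total.positive?
  end
  metrics.empty? ? nil : metrics
end

m = response_metrics(step_timings: { total: 1200, provider_call: 900, tool_calls: 150 })
m[:latency_legionio_ms] # => 150
```

When no `:total` timing is recorded, the derived field is omitted, but the raw `:timing` hash is still returned.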

data/lib/legion/llm/api/native/inference.rb CHANGED

@@ -43,6 +43,11 @@ module Legion
               tools = raw_tools || []
               validate_tools!(tools) unless tools.empty?
+              raw_tool_count = raw_tools.is_a?(Array) ? raw_tools.size : 0
+              log.debug(
+                "[llm][api][tools] action=request_tools_received request_id=#{request_id} " \
+                "has_tools=#{body.key?(:tools)} raw_tools_class=#{raw_tools&.class} raw_tools_count=#{raw_tool_count}"
+              )
               caller_identity = identity_canonical_name(env)
               last_user = messages.select { |m| (m[:role] || m['role']).to_s == 'user' }.last
@@ -179,11 +184,15 @@ module Legion
                     request_id:      request_id,
                     content:         full_text,
                     model:           (routing[:model] || routing['model']).to_s,
+                    provider:        (routing[:provider] || routing['provider'])&.to_s,
+                    instance:        (routing[:instance] || routing['instance'])&.to_s,
+                    tier:            (routing[:tier] || routing['tier'])&.to_s,
                     input_tokens:    token_value(tokens, :input),
                     output_tokens:   token_value(tokens, :output),
                     tool_calls:      extract_tool_calls(pipeline_response),
-                    conversation_id: pipeline_response.conversation_id
-                  }
+                    conversation_id: pipeline_response.conversation_id,
+                    metrics:         build_response_metrics(pipeline_response)
+                  }.compact
                   done_payload[:thinking] = pipeline_response.thinking if include_thinking && pipeline_response.thinking
                   emit_sse_event(out, 'done', {
                                    **done_payload
@@ -232,11 +241,16 @@ module Legion
                   tool_calls:      tool_calls,
                   stop_reason:     pipeline_response.stop&.dig(:reason)&.to_s,
                   model:           (routing[:model] || routing['model']).to_s,
+                  provider:        (routing[:provider] || routing['provider'])&.to_s,
+                  instance:        (routing[:instance] || routing['instance'])&.to_s,
+                  tier:            (routing[:tier] || routing['tier'])&.to_s,
                   input_tokens:    token_value(tokens, :input),
                   output_tokens:   token_value(tokens, :output),
-                  conversation_id: pipeline_response.conversation_id
+                  conversation_id: pipeline_response.conversation_id,
+                  metrics:         build_response_metrics(pipeline_response)
                 }
                 payload[:thinking] = pipeline_response.thinking if include_thinking && pipeline_response.thinking
+                payload.compact!
                 json_response(payload, status_code: 200)
               end
             rescue Legion::LLM::AuthError => e
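Both the SSE `done` payload and the JSON payload read routing fields under either symbol or string keys. They then call `.compact`, so a missing `provider`, `instance`, `tier`, or `metrics` value is dropped rather than serialized as `null`. Below is a reduced sketch of that pattern. `routing_fields` is a hypothetical name.

```ruby
# Hypothetical reduction of the payload construction in inference.rb.
def routing_fields(routing)
  {
    # .to_s without safe navigation: model is always present, possibly "".
    model:    (routing[:model] || routing['model']).to_s,
    # &.to_s keeps nil as nil, so .compact can remove absent fields.
    provider: (routing[:provider] || routing['provider'])&.to_s,
    instance: (routing[:instance] || routing['instance'])&.to_s,
    tier:     (routing[:tier] || routing['tier'])&.to_s
  }.compact
end

routing_fields({ 'model' => 'llama3', provider: :ollama })
```

With this input the result contains only `model` and `provider`. With an empty routing hash, only `model: ""` remains.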

data/lib/legion/llm/api/native/providers.rb CHANGED

@@ -87,7 +87,7 @@ module Legion
             provider_key = entry[:provider].to_sym
             instance_key = entry[:instance].to_sym
-            {
+            result = {
               provider:     entry[:provider].to_s,
               instance:     entry[:instance].to_s,
               tier:         entry.dig(:metadata, :tier)&.to_s,
@@ -102,6 +102,9 @@ module Legion
                             end,
               native:       true
             }
+            result[:source] = entry.dig(:metadata, :source) if entry.dig(:metadata, :source)
+            result[:credential_fingerprint] = entry.dig(:metadata, :credential_fingerprint) if entry.dig(:metadata, :credential_fingerprint)
+            result
           end
         end
       end

data/lib/legion/llm/call/dispatch.rb CHANGED

@@ -250,6 +250,14 @@ module Legion
           ext = Registry.for(provider, instance: instance)
           return ext if ext
+          if instance && instance.to_s != 'default'
+            ext = Registry.for(provider, instance: :default)
+            if ext
+              log.warn("[llm][native] instance_fallback provider=#{provider} requested=#{instance} using=default")
+              return ext
+            end
+          end
           instance_suffix = instance ? "/#{instance}" : ''
           log.error("[llm][native] provider_not_registered provider=#{provider}#{instance_suffix}")
           raise Legion::LLM::ProviderError,
@@ -296,7 +304,6 @@ module Legion
           tool_calls = normalize_tool_calls(raw[:tool_calls] || raw['tool_calls'] || raw[:tools] || raw['tools'] || result)
           stop_reason = raw[:stop_reason] || raw['stop_reason'] || (tool_calls.any? ? :tool_use : nil)
           {
             result:      result,
             model:       raw[:model] || raw['model'],
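The new branch in `dispatch.rb` handles a failed lookup for a named instance by retrying against the `default` instance of the same provider. It logs a warning before returning that adapter, and raises only when neither lookup succeeds. The sketch below shows that lookup order. It makes three assumptions: a plain hash stands in for `Registry`, `ArgumentError` stands in for `Legion::LLM::ProviderError`, and a `nil` instance is mapped to `:default` (the real `Registry.for` semantics for `nil` are not shown in the diff). `ADAPTERS` and `resolve_adapter` are hypothetical names.

```ruby
# Hypothetical in-memory registry keyed by [provider, instance].
ADAPTERS = { [:ollama, :default] => 'ollama-default-adapter' }.freeze

def resolve_adapter(provider, instance = nil, warnings: [])
  ext = ADAPTERS[[provider, instance || :default]] # assumption: nil means default
  return ext if ext

  # Fall back to the provider's default instance, recording a warning.
  if instance && instance.to_s != 'default'
    ext = ADAPTERS[[provider, :default]]
    if ext
      warnings << "instance_fallback provider=#{provider} requested=#{instance} using=default"
      return ext
    end
  end
  suffix = instance ? "/#{instance}" : ''
  raise ArgumentError, "provider_not_registered provider=#{provider}#{suffix}"
end
```

One consequence of this design is that a misspelled instance name still succeeds when a default instance exists. The only signal is the warn-level log line.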

data/lib/legion/llm/call/embeddings.rb CHANGED

@@ -24,11 +24,13 @@ module Legion
             return unavailable_result(model, provider) unless provider
             model ||= resolve_model
-            text_length = text.to_s.length
-            text = apply_prefix(coerce_text(text), model: model, task: task)
+            text = coerce_text(text)
+            text_length = text.length
+            prepared_texts = prepare_embedding_texts(text, provider: provider, model: model, task: task)
+            dispatch_text = prepared_texts.one? ? prepared_texts.first : prepared_texts
             log.info("[llm][embed] action=generate provider=#{provider} instance=#{instance || 'default'} " \
-                     "model=#{model} task=#{task} text_chars=#{text_length}")
+                     "model=#{model} task=#{task} text_chars=#{text_length} chunks=#{prepared_texts.size}")
             started_at = ::Process.clock_gettime(::Process::CLOCK_MONOTONIC)
             response = Dispatch.call(
@@ -36,24 +38,29 @@ module Legion
               instance:   instance,
               capability: :embed,
               model:      model,
-              text:       text,
+              text:       dispatch_text,
               dimensions: dimensions
             )
             elapsed = ((::Process.clock_gettime(::Process::CLOCK_MONOTONIC) - started_at) * 1000).round(1)
-            vector = normalize_vector(response[:result])
+            vector = if prepared_texts.size > 1
+                       aggregate_vectors(response[:result], weights: prepared_texts.map(&:length), model: model, provider: provider)
+                     else
+                       normalize_vector(response[:result])
+                     end
             vector = enforce_dimensions(vector) if enforce_dimension?
             tokens = extract_tokens(response)
             log.info("[llm][embed] action=generate.complete provider=#{provider} instance=#{instance || 'default'} " \
-                     "model=#{model} dimensions=#{vector&.size || 0} tokens=#{tokens} duration_ms=#{elapsed}")
+                     "model=#{model} dimensions=#{vector&.size || 0} tokens=#{tokens} chunks=#{prepared_texts.size} duration_ms=#{elapsed}")
             {
               vector:     vector,
               model:      model,
               provider:   provider,
               dimensions: vector&.size || 0,
-              tokens:     tokens
+              tokens:     tokens,
+              chunks:     prepared_texts.size
             }
           rescue StandardError => e
             handle_exception(e, level: :warn, operation: 'llm.embeddings.generate')
@@ -70,7 +77,20 @@ module Legion
             log.info("[llm][embed] action=generate_batch provider=#{provider} instance=#{instance || 'default'} " \
                      "model=#{model} count=#{texts.size} task=#{task}")
-            texts = texts.map { |t| apply_prefix(coerce_text(t), model: model, task: task) }
+            raw_texts = texts.map { |t| coerce_text(t) }
+            prepared_texts = raw_texts.map { |t| prepare_embedding_texts(t, provider: provider, model: model, task: task) }
+            if prepared_texts.any? { |chunks| chunks.size > 1 }
+              return generate_chunked_batch(
+                raw_texts,
+                model:      model,
+                provider:   provider,
+                instance:   instance,
+                dimensions: dimensions,
+                task:       task
+              )
+            end
+            texts = prepared_texts.map(&:first)
             started_at = ::Process.clock_gettime(::Process::CLOCK_MONOTONIC)
             response = Dispatch.call(
@@ -122,11 +142,71 @@ module Legion
           end
           def apply_prefix(text, model:, task:)
-            base = model.to_s.split(':').first
-            prefix = PREFIX_REGISTRY.dig(base, task)
+            prefix = prefix_for(model, task)
             prefix ? "#{prefix}#{text}" : text
           end
+          def prepare_embedding_texts(text, provider:, model:, task:)
+            prefix = prefix_for(model, task).to_s
+            chunks = chunk_text(text, embedding_chunk_chars(provider: provider, model: model, prefix: prefix))
+            chunks.map { |chunk| prefix.empty? ? chunk : "#{prefix}#{chunk}" }
+          end
+          def prefix_for(model, task)
+            registry = Legion::LLM::Settings.value(:embedding, :prefix_registry, default: PREFIX_REGISTRY)
+            model_prefixes = Legion::LLM::Settings.config_value(registry, model_base(model), {})
+            Legion::LLM::Settings.config_value(model_prefixes, task)
+          end
+          def embedding_chunk_chars(provider:, model:, prefix:)
+            return nil unless provider.to_s == 'ollama'
+            embedding = Legion::LLM::Settings.value(:embedding, default: {})
+            context_chars = Legion::LLM::Settings.config_value(embedding, :ollama_context_chars, {})
+            limit = Legion::LLM::Settings.config_value(context_chars, model.to_s) ||
+                    Legion::LLM::Settings.config_value(context_chars, model_base(model)) ||
+                    Legion::LLM::Settings.config_value(embedding, :ollama_default_context_chars)
+            limit = limit.to_i
+            return nil unless limit.positive?
+            [limit - prefix.length, 1].max
+          end
+          def chunk_text(text, max_chars)
+            return [text] unless max_chars.to_i.positive?
+            return [text] if text.length <= max_chars
+            chunks = []
+            remaining = text.dup
+            until remaining.empty?
+              chunk, remaining = next_text_chunk(remaining, max_chars)
+              chunks << chunk unless chunk.empty?
+            end
+            chunks
+          end
+          def next_text_chunk(text, max_chars)
+            return [text, ''] if text.length <= max_chars
+            slice = text[0, max_chars]
+            boundary = chunk_boundary(slice, max_chars)
+            chunk = text[0, boundary].strip
+            remaining = text[boundary..].to_s.strip
+            [chunk.empty? ? text[0, max_chars] : chunk, remaining]
+          end
+          def chunk_boundary(slice, max_chars)
+            candidates = [slice.rindex("\n\n"), slice.rindex("\n"), slice.rindex('. '), slice.rindex(' ')]
+            boundary = candidates.compact.max
+            return max_chars unless boundary && boundary >= (max_chars * 0.5)
+            boundary + 1
+          end
+          def model_base(model)
+            model.to_s.split(':').first
+          end
           def normalize_vector(result)
             return nil if result.nil?
             return result if result.is_a?(Array) && result.first.is_a?(Numeric)
@@ -145,6 +225,39 @@ module Legion
             end
           end
+          def aggregate_vectors(result, weights:, model:, provider:)
+            vectors = normalize_batch(result, model, provider).map { |entry| entry[:vector] }
+            usable = vectors.each_with_index.filter_map do |vector, index|
+              next unless vector.is_a?(Array) && vector.first.is_a?(Numeric)
+              [vector, [weights[index].to_i, 1].max]
+            end
+            return nil if usable.empty?
+            dimensions = usable.first.first.size
+            usable.select! { |vector, _weight| vector.size == dimensions }
+            total_weight = usable.sum { |_vector, weight| weight }.to_f
+            Array.new(dimensions) do |index|
+              usable.sum { |vector, weight| vector[index].to_f * weight } / total_weight
+            end
+          end
+          def generate_chunked_batch(texts, model:, provider:, instance:, dimensions:, task:)
+            log.info("[llm][embed] action=generate_batch.chunked provider=#{provider} instance=#{instance || 'default'} " \
+                     "model=#{model} count=#{texts.size}")
+            texts.each_with_index.map do |text, index|
+              generate(
+                text:       text,
+                model:      model,
+                provider:   provider,
+                instance:   instance,
+                dimensions: dimensions,
+                task:       task
+              ).merge(index: index)
+            end
+          end
           def enforce_dimension?
             Legion::LLM::Settings.value(:embedding, :enforce_dimension) != false
           end
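For Ollama models that have a configured context limit, `prepare_embedding_texts` splits long input into chunks and sends them in one dispatch. `aggregate_vectors` then averages the per-chunk vectors, weighting each by the character length of its prepared chunk (task prefix included). The sketch below covers both steps. The settings lookups and `normalize_batch` are removed. The chunking functions mirror the diff, and `weighted_mean` is a hypothetical name for the aggregation step.

```ruby
def chunk_text(text, max_chars)
  return [text] unless max_chars.to_i.positive?
  return [text] if text.length <= max_chars

  chunks = []
  remaining = text.dup
  until remaining.empty?
    chunk, remaining = next_text_chunk(remaining, max_chars)
    chunks << chunk unless chunk.empty?
  end
  chunks
end

def next_text_chunk(text, max_chars)
  return [text, ''] if text.length <= max_chars

  boundary = chunk_boundary(text[0, max_chars], max_chars)
  chunk = text[0, boundary].strip
  [chunk.empty? ? text[0, max_chars] : chunk, text[boundary..].to_s.strip]
end

# Cut just after the latest paragraph, line, sentence, or space break in the
# window, provided it lies in the window's second half; otherwise cut hard.
def chunk_boundary(slice, max_chars)
  candidates = [slice.rindex("\n\n"), slice.rindex("\n"), slice.rindex('. '), slice.rindex(' ')]
  boundary = candidates.compact.max
  return max_chars unless boundary && boundary >= (max_chars * 0.5)

  boundary + 1
end

# Length-weighted mean of per-chunk vectors. Non-numeric entries and vectors
# whose size differs from the first usable one are dropped.
def weighted_mean(vectors, weights)
  usable = vectors.each_with_index.filter_map do |vector, i|
    next unless vector.is_a?(Array) && vector.first.is_a?(Numeric)

    [vector, [weights[i].to_i, 1].max]
  end
  return nil if usable.empty?

  dims = usable.first.first.size
  usable.select! { |vector, _w| vector.size == dims }
  total = usable.sum { |_v, w| w }.to_f
  Array.new(dims) { |d| usable.sum { |v, w| v[d].to_f * w } / total }
end
```

A weighted mean of unit-length vectors is generally not unit-length itself. This diff does not show whether downstream similarity search renormalizes.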

data/lib/legion/llm/call/lex_llm_adapter.rb CHANGED

@@ -35,8 +35,8 @@ module Legion
         end
         def stream(model:, messages:, **opts, &block)
-          chunks = []
-          provider.stream_chat(
+          accumulator = build_stream_accumulator
+          response = provider.stream_chat(
             messages:    normalize_messages(messages, system: opts[:system]),
             tools:       normalize_tools(opts[:tools]),
             temperature: opts[:temperature],
@@ -47,11 +47,15 @@ module Legion
             tool_prefs:  opts[:tool_prefs],
             model:       model_info(model, offering_metadata: opts[:offering_metadata])
           ) do |chunk|
-            chunks << chunk
+            accumulate_stream_chunk(accumulator, chunk)
             block&.call(chunk)
           end
-          chunk_response(chunks, offering_metadata: opts[:offering_metadata])
+          if response
+            message_response(response, offering_metadata: opts[:offering_metadata])
+          else
+            chunk_response(accumulator, offering_metadata: opts[:offering_metadata])
+          end
         end
         def embed(model:, text:, dimensions: nil, **opts)
@@ -158,8 +162,8 @@ module Legion
             message_hash = normalize_hash(message)
             message_class.new(
               role:         message_hash[:role] || :user,
-              content:      message_hash[:content].to_s,
-              tool_calls:   message_hash[:tool_calls],
+              content:      normalize_message_content(message_hash[:content]),
+              tool_calls:   normalize_message_tool_calls(message_hash[:tool_calls]),
               tool_call_id: message_hash[:tool_call_id]
             )
           end
@@ -222,6 +226,47 @@ module Legion
           { role: :user, content: value }
         end
+        def normalize_message_content(content)
+          return content if content.nil? || content.is_a?(String)
+          return content if content.respond_to?(:attachments)
+          if content.is_a?(Array)
+            text_parts = content.filter_map { |part| text_part_content(part) }
+            return text_parts.join("\n\n") unless text_parts.empty?
+          end
+          text_part_content(content) || content.to_s
+        end
+        def text_part_content(part)
+          return unless part.respond_to?(:transform_keys)
+          normalized = part.transform_keys { |key| key.respond_to?(:to_sym) ? key.to_sym : key }
+          return unless normalized[:type].to_s == 'text'
+          normalized[:text].to_s
+        end
+        def normalize_message_tool_calls(tool_calls)
+          return tool_calls unless tool_calls.is_a?(Array)
+          tool_calls.filter_map do |tool_call|
+            normalized = normalize_hash(tool_call)
+            name = normalized[:name]
+            next if name.to_s.empty?
+            arguments = normalized[:arguments] || {}
+            [
+              name.to_sym,
+              lex_llm_namespace::ToolCall.new(
+                id:        normalized[:id],
+                name:      name.to_s,
+                arguments: arguments
+              )
+            ]
+          end.to_h
+        end
         def message_response(response, offering_metadata: nil)
           {
             result:      response.content,
@@ -234,19 +279,52 @@ module Legion
           }.compact
         end
-        def chunk_response(chunks, offering_metadata: nil)
-          last = chunks.reverse.find { |chunk| chunk.respond_to?(:input_tokens) }
-          tool_calls = chunks.filter_map { |chunk| chunk.tool_calls if chunk.respond_to?(:tool_calls) }.reduce({}) do |memo, calls|
-            memo.merge(calls || {})
-          end
+        def build_stream_accumulator
           {
-            result:      chunks.filter_map(&:content).join,
-            model:       last&.model_id,
+            content:            +'',
+            model:              nil,
+            usage:              {},
+            raw:                nil,
+            tool_calls:         {},
+            thinking_text:      +'',
+            thinking_signature: nil
+          }
+        end
+        def accumulate_stream_chunk(accumulator, chunk)
+          accumulator[:content] << chunk.content.to_s if chunk.respond_to?(:content) && !chunk.content.nil?
+          accumulate_stream_usage(accumulator, chunk)
+          accumulator[:tool_calls].merge!(chunk.tool_calls || {}) if chunk.respond_to?(:tool_calls)
+          accumulate_stream_thinking(accumulator, chunk)
+        end
+        def accumulate_stream_usage(accumulator, chunk)
+          return unless chunk.respond_to?(:input_tokens)
+          accumulator[:model] = chunk.model_id if chunk.respond_to?(:model_id)
+          accumulator[:usage] = usage_hash(chunk)
+          accumulator[:raw] = chunk.raw if chunk.respond_to?(:raw)
+        end
+        def accumulate_stream_thinking(accumulator, chunk)
+          return unless chunk.respond_to?(:thinking)
+          thinking = normalize_thinking_value(chunk.thinking)
+          content = thinking[:content]
+          accumulator[:thinking_text] << content.to_s unless content.nil?
+          accumulator[:thinking_signature] ||= thinking[:signature]
+        end
+        def chunk_response(accumulator, offering_metadata: nil)
+          tool_calls = accumulator[:tool_calls]
+          {
+            result:      accumulator[:content],
+            model:       accumulator[:model],
             tool_calls:  tool_calls.empty? ? nil : tool_calls,
             stop_reason: tool_calls.empty? ? nil : :tool_use,
-            thinking:    stream_thinking_hash(chunks),
-            usage:       last ? usage_hash(last) : {},
-            metadata:    response_metadata(last, offering_metadata: offering_metadata)
+            thinking:    stream_thinking_hash(accumulator),
+            usage:       accumulator[:usage],
+            metadata:    response_metadata(accumulator[:raw], offering_metadata: offering_metadata)
           }.compact
         end
@@ -284,15 +362,11 @@ module Legion
           }
         end
-        def stream_thinking_hash(chunks)
-          thinking_parts = chunks.filter_map do |chunk|
-            normalize_thinking_value(chunk.thinking) if chunk.respond_to?(:thinking)
-          end
-          thinking_text = thinking_parts.filter_map { |part| part[:content] }.join
-          signature = thinking_parts.find { |part| part[:signature] }&.dig(:signature)
+        def stream_thinking_hash(accumulator)
+          thinking_text = accumulator[:thinking_text]
           return nil if thinking_text.empty?
-          { content: thinking_text, signature: signature, enabled: true }.compact
+          { content: thinking_text, signature: accumulator[:thinking_signature], enabled: true }.compact
         end
         def thinking_hash(response)
@@ -325,7 +399,8 @@ module Legion
         def response_metadata(response = nil, offering_metadata: nil)
           metadata = normalize_offering_metadata(offering_metadata)
-          raw = response.respond_to?(:raw) ? response.raw : nil
+          raw = response.is_a?(Hash) ? response : nil
+          raw ||= response.raw if response.respond_to?(:raw)
           metadata[:raw_model] = raw['model'] if raw.is_a?(Hash) && raw['model']
           metadata.empty? ? {} : { offering: metadata }
         end
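The adapter no longer keeps every streamed chunk in an array. It folds each chunk into a small accumulator hash, and uses that hash to build the response only when `stream_chat` returns no final message. The sketch below shows the accumulate-then-finish pattern. It assumes a Struct in place of lex-llm's chunk class and omits usage and thinking handling. `StreamChunk`, `build_accumulator`, `accumulate`, and `finish` are hypothetical names.

```ruby
# Hypothetical chunk type modeling only the fields the accumulator reads.
StreamChunk = Struct.new(:content, :tool_calls, :input_tokens, :model_id, keyword_init: true)

def build_accumulator
  { content: +'', model: nil, tool_calls: {} }
end

def accumulate(acc, chunk)
  acc[:content] << chunk.content.to_s unless chunk.content.nil?
  acc[:tool_calls].merge!(chunk.tool_calls || {})
  # The diff checks respond_to?(:input_tokens); every Struct responds, so this
  # sketch treats a non-nil input_tokens as a usage-bearing chunk instead.
  acc[:model] = chunk.model_id if chunk.input_tokens
  acc
end

def finish(acc)
  tool_calls = acc[:tool_calls]
  {
    result:      acc[:content],
    model:       acc[:model],
    tool_calls:  tool_calls.empty? ? nil : tool_calls,
    stop_reason: tool_calls.empty? ? nil : :tool_use
  }.compact
end
```

Memory now grows only with the accumulated text and tool calls, not with the number of chunks.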

data/lib/legion/llm/call/providers.rb CHANGED

@@ -80,6 +80,8 @@ module Legion
         def register_provider_instance(provider_module, family, aliases, instance_id, config)
           normalized_config = normalize_instance_config(config)
+          return if normalized_config[:enabled] == false
           registry_config = adapter_instance_config(normalized_config, instance_id)
           metadata = instance_metadata(normalized_config)
           adapter = Call::LexLLMAdapter.new(family, provider_module.provider_class, instance_config: registry_config)
@@ -107,7 +109,11 @@ module Legion
         end
         def instance_metadata(config)
-          { tier: config[:tier], capabilities: config[:capabilities] || [] }
+          meta = { tier: config[:tier], capabilities: config[:capabilities] || [] }
+          meta[:default_model] = config[:default_model] if config[:default_model]
+          meta[:source] = config[:source] if config[:source]
+          meta[:credential_fingerprint] = config[:credential_fingerprint] if config[:credential_fingerprint]
+          meta
         end
         def safe_provider_family(provider_module)
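Provider instances are now skipped when their config sets `enabled: false`. The registered metadata carries `default_model`, `source`, and `credential_fingerprint` only when those keys are configured. The sketch below applies that filtering to a plain config hash. `registrable_instances` is hypothetical, and the real code registers a `LexLLMAdapter` per instance instead of returning metadata.

```ruby
def instance_metadata(config)
  meta = { tier: config[:tier], capabilities: config[:capabilities] || [] }
  # Optional keys are copied only when set, so absent ones never appear.
  %i[default_model source credential_fingerprint].each do |key|
    meta[key] = config[key] if config[key]
  end
  meta
end

# Hypothetical: only an explicit `false` disables; nil or a missing key stays enabled.
def registrable_instances(instances)
  instances.reject { |_id, config| config[:enabled] == false }
           .transform_values { |config| instance_metadata(config) }
end
```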

data/lib/legion/llm/discovery.rb CHANGED

@@ -141,8 +141,7 @@ module Legion
                 }
               end
             rescue StandardError => e
-              handle_exception(e, level:     :debug,
-                                  operation: "discovery.offerings.#{entry[:provider]}/#{entry[:instance]}")
+              report_discovery_failure(entry, e)
               []
             end
           end
@@ -165,6 +164,28 @@ module Legion
         private
+        def report_discovery_failure(entry, error)
+          provider = entry[:provider]
+          instance = entry[:instance]
+          connection_error = error.is_a?(Faraday::ConnectionFailed) ||
+                             error.message.match?(/connection refused|connect.*timeout|no route to host/i)
+          if connection_error
+            log.warn("[llm][discovery] provider=#{provider} instance=#{instance} unreachable: #{error.message}")
+          else
+            handle_exception(error, level: :warn, handled: true,
+                                    operation: "discovery.offerings.#{provider}/#{instance}")
+          end
+          return unless defined?(Router) && Router.respond_to?(:health_tracker)
+          Router.health_tracker.report(
+            provider: provider, instance: instance,
+            signal: :error, value: 1,
+            metadata: { reason: error.class.name, source: :discovery }
+          )
+        end
         def normalize_offering(offering)
           data = if offering.is_a?(Hash)
                    offering
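`report_discovery_failure` classifies an error as a connection failure when it is a `Faraday::ConnectionFailed` or when its message matches a short regex. Connection failures get a one-line warning, and anything else goes through `handle_exception`. Either way, it reports an error signal to the router's health tracker. The sketch below covers the message-based half of that logic. It omits the Faraday class check so that it runs without Faraday, and an in-memory class stands in for `Router.health_tracker`. `HealthTracker` and `report_failure` are hypothetical names.

```ruby
CONNECTION_ERROR = /connection refused|connect.*timeout|no route to host/i

def connection_error?(error)
  error.message.match?(CONNECTION_ERROR)
end

# Hypothetical stand-in for Router.health_tracker.
class HealthTracker
  attr_reader :signals

  def initialize
    @signals = []
  end

  def report(**signal)
    @signals << signal
  end
end

# Returns which log path the diff would take; the health signal is sent either way.
def report_failure(tracker, provider:, instance:, error:)
  path = connection_error?(error) ? :warn_unreachable : :handle_exception
  tracker.report(provider: provider, instance: instance, signal: :error, value: 1,
                 metadata: { reason: error.class.name, source: :discovery })
  path
end
```

Errors whose messages do not contain any of the three phrases fall back to the more verbose `handle_exception` path, even when they are network-related.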