RubyGems - legion-llm - Versions diffs - 0.6.23 → 0.6.25 - Mend

legion-llm 0.6.23 → 0.6.25

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (27)

checksums.yaml +4 -4
data/CHANGELOG.md +46 -0
data/CLAUDE.md +56 -7
data/lib/legion/llm/audit/exchange.rb +12 -0
data/lib/legion/llm/audit/prompt_event.rb +56 -0
data/lib/legion/llm/audit/tool_event.rb +46 -0
data/lib/legion/llm/audit.rb +53 -0
data/lib/legion/llm/conversation_store.rb +3 -1
data/lib/legion/llm/fleet/dispatcher.rb +93 -22
data/lib/legion/llm/fleet/error.rb +61 -0
data/lib/legion/llm/fleet/exchange.rb +12 -0
data/lib/legion/llm/fleet/reply_dispatcher.rb +40 -2
data/lib/legion/llm/fleet/request.rb +30 -0
data/lib/legion/llm/fleet/response.rb +49 -0
data/lib/legion/llm/fleet.rb +9 -0
data/lib/legion/llm/metering/event.rb +32 -0
data/lib/legion/llm/metering/exchange.rb +12 -0
data/lib/legion/llm/metering.rb +63 -0
data/lib/legion/llm/patches/ruby_llm_parallel_tools.rb +102 -0
data/lib/legion/llm/pipeline/audit_publisher.rb +12 -17
data/lib/legion/llm/pipeline/executor.rb +56 -4
data/lib/legion/llm/pipeline/steps/metering.rb +3 -36
data/lib/legion/llm/settings.rb +7 -1
data/lib/legion/llm/transport/message.rb +82 -0
data/lib/legion/llm/version.rb +1 -1
data/lib/legion/llm.rb +3 -0
metadata +14 -1

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: bd422bcc5c5b6da0dbd4906df8ac394e5c712e709eb8cb367cc676fbf6e45f97
-  data.tar.gz: 148e5741014313918781e757c87a50e40b2d5e5ef164631b71959f6027c70316
+  metadata.gz: 5e329176f3f041bc5cade11008d7a932325df80d510142da79f362ab5bd27e5a
+  data.tar.gz: 46f1e1a12ebd64bafdbe0e4d4f799df4addf586e5c89dfd7824cef4694582681
 SHA512:
-  metadata.gz: dc80d32daf35e53bfe514a0e318911c97e9e3971374eb711128c68db6a02084cc9fd259f68ccf6e0242fb10ff1cbccf2a1ca9132b37e2aa1e6ee23cd5cbe0b5d
-  data.tar.gz: 5673f3536126bc1d3e17e69ab2892edb1b8bd9524bdc6c38993e85ac0869e48a42bd82f9d8691923527f701940a42969052bbbf9db40976434f1ad00210f3934
+  metadata.gz: 464cf343578d082092d57ed8a8091724a3508925088f795b1c1a1724250ec8876bb18ae32b8957b738df7439a31d1ce436275274997925f7cdc8ebafcdf99080
+  data.tar.gz: 0be8ab0bd23eed8cf0320aed8d422f9d1a63937e888ece646ead1c7615add65c8f769f6da250859c8f8e32013bb7daffd3310a311df9d279339902568e4151fb

data/CHANGELOG.md CHANGED

@@ -1,5 +1,51 @@
 # Legion LLM Changelog
+## [0.6.25] - 2026-04-08
+### Added
+- `Legion::LLM::Transport::Message` — LLM base message class with `message_context` propagation, LLM-specific headers (`x-legion-llm-provider`, `x-legion-llm-model`, `x-legion-llm-request-type`, `x-legion-llm-schema-version`), context header promotion, and `tracing_headers` stub for future OpenTelemetry integration
+- `Legion::LLM::Fleet::Exchange` — declares `llm.request` topic exchange (source of truth for fleet routing)
+- `Legion::LLM::Fleet::Request` — fleet inference request message with priority mapping, TTL-to-expiration conversion, and `req_` prefixed message IDs
+- `Legion::LLM::Fleet::Response` — fleet inference response message with default-exchange publish override, Bunny error rescue, and `resp_` prefixed message IDs
+- `Legion::LLM::Fleet::Error` — fleet error message with `ERROR_CODES` registry (12 codes), `x-legion-fleet-error` header, default-exchange publish override, and `err_` prefixed message IDs
+- `Legion::LLM::Metering::Exchange` — declares `llm.metering` topic exchange
+- `Legion::LLM::Metering::Event` — metering event message with tier header, `metering.<type>` routing keys, and `meter_` prefixed message IDs
+- `Legion::LLM::Metering` module — `emit(event)` and `flush_spool` public API replacing gateway dependency for metering
+- `Legion::LLM::Audit::Exchange` — declares `llm.audit` topic exchange (supersedes `Transport::Exchanges::Audit`)
+- `Legion::LLM::Audit::PromptEvent` — prompt audit message (always encrypted) with classification, caller, retention, and tier headers
+- `Legion::LLM::Audit::ToolEvent` — tool call audit message (always encrypted) with tool metadata headers
+- `Legion::LLM::Audit` module — `emit_prompt(event)` and `emit_tools(event)` public API (no spool — audit data too sensitive for plaintext disk)
+- `Fleet::Dispatcher.build_routing_key` — builds `llm.request.<provider>.<type>.<model>` routing keys with `:` to `.` sanitization
+- `Fleet::Dispatcher` per-type timeout resolution (`embed: 10s`, `chat: 30s`, `generate: 30s`) from settings or `TIMEOUTS` constant
+- `Fleet::Dispatcher` backwards-compatible shim supporting both old `(model:, messages:)` and new `(request:, message_context:)` dispatch signatures
+- `Fleet::ReplyDispatcher.fulfill_return` — handles `basic.return` with `no_fleet_queue` error
+- `Fleet::ReplyDispatcher.fulfill_nack` — handles `basic.nack` with `fleet_backpressure` error
+- `Fleet::ReplyDispatcher` type-aware delivery dispatch — handles `llm.fleet.response`, `llm.fleet.error`, and legacy (no type) formats
+- `routing.tier_priority` setting — default `[local, fleet, direct]` three-tier ordering
+- `routing.tiers.fleet.timeouts` setting — per-request-type timeout configuration
+### Changed
+- `Fleet::Dispatcher#publish_request` now uses `Fleet::Request` message class (falls back to gateway `InferenceRequest` when `Fleet::Request` unavailable)
+- `Pipeline::Steps::Metering#publish_event` now delegates to `Legion::LLM::Metering.emit` instead of `Gateway::Transport::Messages::MeteringEvent`
+- `Pipeline::AuditPublisher#publish` now delegates to `Legion::LLM::Audit.emit_prompt` instead of raw `Transport::Messages::AuditEvent`
+- `routing.tiers.fleet.queue` default changed from `llm.inference` to `llm.request` (fleet exchange rename)
+## [0.6.24] - 2026-04-08
+### Added
+- `Legion::LLM::Patches::RubyLLMParallelTools`: monkey-patch that replaces RubyLLM's serial `handle_tool_calls` loop with concurrent thread execution so all tool calls in a batch run in parallel
+- `ToolResultWrapper` struct exposes `tool_call_id`, `id`, `tool_name`, `result`, and `content` so bridge scripts can match results back to UI slots without falling back to name-based matching
+- `emit_tool_result_event` in `Pipeline::Executor`: fires `tool_event_handler` with `type: :tool_result`, `duration_ms`, `started_at`, and `finished_at` after each tool completes
+- `tool_event_handler` now also fires `type: :model_fallback` events (with `from_model`, `to_model`, `error`, `reason`) on auth-failed provider fallback in both regular and streaming paths
+- `max_tool_rounds` setting (default `200`) in LLM settings; `install_tool_loop_guard` now reads it at call time so callers can override the cap per-session
+- `started_at` timestamp stored in `Thread.current[:legion_current_tool_started_at]` for accurate per-call wall-clock duration even across parallel threads
+### Changed
+- `MAX_RUBY_LLM_TOOL_ROUNDS` constant raised from `25` to `200` (now serves as a fallback default for the configurable `max_tool_rounds` setting)
+### Fixed
+- `ConversationStore#db_append_message` now serializes non-String `content` values (e.g., tool-call arrays) to JSON before writing to the database, preventing Sequel type errors when tool-use messages are persisted
 ## [0.6.23] - 2026-04-07
 ### Fixed
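The 0.6.24 entry for `Legion::LLM::Patches::RubyLLMParallelTools` describes running every tool call in a batch on its own thread and wrapping each result so it can be matched back by id. The patch file itself is not part of this excerpt, so the following is only a sketch of that technique in plain Ruby. `ToolResult`, `run_tool_calls_in_parallel` and the `{ id:, name:, callable: }` tool-call shape are hypothetical stand-ins, not RubyLLM's or Legion's real objects.

```ruby
# Hypothetical result wrapper; the aliases mirror the accessors the
# changelog lists for ToolResultWrapper (id, content).
ToolResult = Struct.new(:tool_call_id, :tool_name, :result, :duration_ms, keyword_init: true) do
  def id = tool_call_id
  def content = result
end

# tool_calls: array of { id:, name:, callable: } hashes (assumed shape),
# where callable is a zero-argument Proc that performs the tool call.
def run_tool_calls_in_parallel(tool_calls)
  threads = tool_calls.map do |call|
    Thread.new do
      started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      value   = call[:callable].call
      elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
      ToolResult.new(tool_call_id: call[:id], tool_name: call[:name],
                     result: value, duration_ms: (elapsed * 1000).round)
    end
  end
  # Thread#value joins each thread (re-raising any exception it hit);
  # mapping over the threads keeps results in the original call order.
  threads.map(&:value)
end
```

Because each thread measures its own start and finish time, per-call durations stay meaningful even when calls overlap, which is the same concern the changelog addresses with the thread-local `started_at` timestamp.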

data/CLAUDE.md CHANGED

@@ -8,7 +8,7 @@
 Core LegionIO gem providing LLM capabilities to all extensions. Wraps ruby_llm to provide a consistent interface for chat, embeddings, tool use, and agents across multiple providers (Bedrock, Anthropic, OpenAI, Gemini, Ollama). Includes a dynamic weighted routing engine that dispatches requests across local, fleet, and cloud tiers based on caller intent, priority rules, time schedules, cost multipliers, and real-time provider health.
 **GitHub**: https://github.com/LegionIO/legion-llm
-**Version**: 0.6.18
+**Version**: 0.6.25
 **License**: Apache-2.0
 ## Architecture
@@ -69,9 +69,20 @@ Legion::LLM (lib/legion/llm.rb)
 │         McpToolAdapter renamed to ToolAdapter; McpToolAdapter kept as a backwards-compatible alias.
 ├── CostEstimator    # Model cost estimation with fuzzy pricing (absorbed from lex-llm-gateway)
 ├── Fleet            # Fleet RPC dispatch (absorbed from lex-llm-gateway)
-│   ├── Dispatcher   # Fleet dispatch with timeout and availability checks
+│   ├── Exchange     # Declares `llm.request` topic exchange (source of truth)
+│   ├── Request      # Fleet inference request message (type: 'llm.fleet.request')
+│   ├── Response     # Fleet inference response message (type: 'llm.fleet.response', default exchange publish)
+│   ├── Error        # Fleet error message (type: 'llm.fleet.error', ERROR_CODES registry)
+│   ├── Dispatcher   # Fleet dispatch with timeout and routing key building
 │   ├── Handler      # Fleet request handler for GPU worker nodes
-│   └── ReplyDispatcher # Correlation-based reply routing for fleet RPC
+│   └── ReplyDispatcher # Correlation-based reply routing with type-aware dispatch, fulfill_return, fulfill_nack
+├── Metering         # Metering event emission (replaces gateway dependency)
+│   ├── Exchange     # Declares `llm.metering` topic exchange
+│   └── Event        # Metering event message (type: 'llm.metering.event')
+├── Audit            # Audit event emission (replaces gateway dependency)
+│   ├── Exchange     # Declares `llm.audit` topic exchange
+│   ├── PromptEvent  # Prompt audit message (type: 'llm.audit.prompt', always encrypted)
+│   └── ToolEvent    # Tool audit message (type: 'llm.audit.tool', always encrypted)
 └── Helpers::LLM     # Extension helper mixin (llm_chat, llm_embed, llm_session, compress:)
 ```
@@ -181,6 +192,20 @@ Legion::LLM.chat(message:, escalate: true, max_escalations: 3, quality_check:) #
 Legion::LLM::EscalationExhausted                                                # raised when all escalation attempts are exhausted
 Legion::LLM::Router.resolve_chain(intent:, tier:, max_escalations:)            # -> EscalationChain
 Legion::LLM::QualityChecker.check(response, quality_threshold: 50, json_expected: false, quality_check: nil) # -> QualityResult
+# Metering
+Legion::LLM::Metering.emit(event_hash)                        # -> :published | :spooled | :dropped
+Legion::LLM::Metering.flush_spool                             # -> Integer (count flushed)
+# Audit
+Legion::LLM::Audit.emit_prompt(event_hash)                    # -> :published | :dropped
+Legion::LLM::Audit.emit_tools(event_hash)                     # -> :published | :dropped
+# Fleet Dispatcher
+Legion::LLM::Fleet::Dispatcher.dispatch(model:, messages:, **) # Old signature (backwards compat)
+Legion::LLM::Fleet::Dispatcher.dispatch(request:, message_context:, routing_key:, **) # New signature
+Legion::LLM::Fleet::Dispatcher.build_routing_key(provider:, request_type:, model:)    # -> String
+Legion::LLM::Fleet::Dispatcher.fleet_available?                # -> Boolean
 ```
 ## Settings
@@ -347,10 +372,22 @@ In-memory signal consumer with pluggable handlers. Adjusts effective priorities
 | `lib/legion/llm/pipeline/steps/rag_guard.rb` | Pipeline::Steps::RagGuard: faithfulness check against retrieved RAG context |
 | `lib/legion/llm/pipeline/enrichment_injector.rb` | Pipeline::EnrichmentInjector: converts RAG/GAIA enrichments into system prompt |
 | `lib/legion/llm/cost_estimator.rb` | CostEstimator: model cost estimation with fuzzy pricing |
-| `lib/legion/llm/fleet.rb` | Fleet module: requires dispatcher, handler, reply_dispatcher |
-| `lib/legion/llm/fleet/dispatcher.rb` | Fleet::Dispatcher: fleet RPC dispatch |
+| `lib/legion/llm/transport/message.rb` | LLM base message class: message_context propagation, LLM headers, envelope key stripping |
+| `lib/legion/llm/fleet.rb` | Fleet module: requires exchange, request, response, error, dispatcher, handler, reply_dispatcher |
+| `lib/legion/llm/fleet/exchange.rb` | Fleet::Exchange: declares `llm.request` topic exchange |
+| `lib/legion/llm/fleet/request.rb` | Fleet::Request: fleet inference request with priority mapping, TTL conversion |
+| `lib/legion/llm/fleet/response.rb` | Fleet::Response: fleet response with default-exchange publish |
+| `lib/legion/llm/fleet/error.rb` | Fleet::Error: fleet error with ERROR_CODES registry, error headers |
+| `lib/legion/llm/fleet/dispatcher.rb` | Fleet::Dispatcher: fleet RPC dispatch with routing key building, per-type timeouts |
 | `lib/legion/llm/fleet/handler.rb` | Fleet::Handler: fleet request handler |
-| `lib/legion/llm/fleet/reply_dispatcher.rb` | Fleet::ReplyDispatcher: correlation-based reply routing |
+| `lib/legion/llm/fleet/reply_dispatcher.rb` | Fleet::ReplyDispatcher: type-aware reply routing, fulfill_return, fulfill_nack |
+| `lib/legion/llm/metering.rb` | Metering module: emit, flush_spool public API |
+| `lib/legion/llm/metering/exchange.rb` | Metering::Exchange: declares `llm.metering` topic exchange |
+| `lib/legion/llm/metering/event.rb` | Metering::Event: metering event message with tier header |
+| `lib/legion/llm/audit.rb` | Audit module: emit_prompt, emit_tools public API |
+| `lib/legion/llm/audit/exchange.rb` | Audit::Exchange: declares `llm.audit` topic exchange |
+| `lib/legion/llm/audit/prompt_event.rb` | Audit::PromptEvent: prompt audit with classification/caller/retention headers |
+| `lib/legion/llm/audit/tool_event.rb` | Audit::ToolEvent: tool audit with tool metadata headers |
 | `lib/legion/llm/helpers/llm.rb` | Extension helper mixin: llm_chat (with compress:, escalate:, max_escalations:, quality_check:), llm_embed, llm_session |
 | `spec/legion/llm_spec.rb` | Tests: settings, lifecycle, providers, auto-config |
 | `spec/legion/llm/integration_spec.rb` | Tests: routing integration with chat() |
@@ -390,8 +427,20 @@ In-memory signal consumer with pluggable handlers. Adjusts effective priorities
 | `spec/legion/llm/pipeline/executor_spec.rb` | Tests: Executor pipeline execution, profile skipping |
 | `spec/legion/llm/pipeline/integration_spec.rb` | Tests: Pipeline integration with chat() dispatch |
 | `spec/legion/llm/pipeline/steps/metering_spec.rb` | Tests: Metering event building |
-| `spec/legion/llm/fleet/dispatcher_spec.rb` | Tests: Fleet dispatch, availability, timeout |
+| `spec/legion/llm/transport/message_spec.rb` | Tests: LLM base message class |
+| `spec/legion/llm/fleet/exchange_spec.rb` | Tests: fleet exchange declaration |
+| `spec/legion/llm/fleet/request_spec.rb` | Tests: Fleet::Request message |
+| `spec/legion/llm/fleet/response_spec.rb` | Tests: Fleet::Response message |
+| `spec/legion/llm/fleet/error_spec.rb` | Tests: Fleet::Error message |
+| `spec/legion/llm/fleet/dispatcher_spec.rb` | Tests: Fleet dispatch, routing keys, per-type timeouts, ReplyDispatcher |
 | `spec/legion/llm/fleet/handler_spec.rb` | Tests: Fleet handler, auth, response building |
+| `spec/legion/llm/metering/exchange_spec.rb` | Tests: metering exchange |
+| `spec/legion/llm/metering/event_spec.rb` | Tests: Metering::Event message |
+| `spec/legion/llm/metering_spec.rb` | Tests: Metering emit/spool API |
+| `spec/legion/llm/audit/exchange_spec.rb` | Tests: audit exchange |
+| `spec/legion/llm/audit/prompt_event_spec.rb` | Tests: Audit::PromptEvent |
+| `spec/legion/llm/audit/tool_event_spec.rb` | Tests: Audit::ToolEvent |
+| `spec/legion/llm/audit_spec.rb` | Tests: Audit emit API |
 | `spec/legion/llm/pipeline/steps/rag_context_spec.rb` | Tests: RAG context strategy selection, Apollo retrieval, graceful degradation |
 | `spec/legion/llm/pipeline/steps/rag_guard_spec.rb` | Tests: RAG faithfulness checking |
 | `spec/legion/llm/pipeline/enrichment_injector_spec.rb` | Tests: enrichment injection into system prompt |
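CLAUDE.md documents `Legion::LLM::Metering.emit` as returning `:published`, `:spooled` or `:dropped`, and `flush_spool` as returning the number of events flushed; `metering.rb` itself is outside this excerpt. A minimal sketch of that publish, spool, drop contract follows. `MeteringEmitter`, its injected `publisher`/`connected` callables, the in-memory spool and `spool_limit` are all assumptions for illustration, not the gem's implementation.

```ruby
require 'json'

# Hypothetical emitter illustrating the documented return values.
class MeteringEmitter
  attr_reader :spool

  def initialize(publisher:, connected:, spool_limit: 1_000)
    @publisher   = publisher   # callable that publishes one event hash
    @connected   = connected   # callable returning true/false
    @spool_limit = spool_limit
    @spool       = []          # in-memory stand-in for a real spool
  end

  def emit(event)
    if @connected.call
      @publisher.call(event)
      return :published
    end
    spool_event(event)
  rescue StandardError
    # A failed publish falls back to the spool instead of losing the event.
    spool_event(event)
  end

  # Replays spooled events once transport is back; returns the count flushed.
  def flush_spool
    return 0 unless @connected.call

    flushed = 0
    until @spool.empty?
      @publisher.call(JSON.parse(@spool.first, symbolize_names: true))
      @spool.shift # only remove the entry after a successful publish
      flushed += 1
    end
    flushed
  end

  private

  def spool_event(event)
    return :dropped if @spool.size >= @spool_limit

    @spool << JSON.generate(event)
    :spooled
  end
end
```

Audit events, by contrast, are documented as `:published | :dropped` only: the changelog states audit data is deliberately never spooled to disk.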

data/lib/legion/llm/audit/exchange.rb ADDED

@@ -0,0 +1,12 @@
+# frozen_string_literal: true
+module Legion
+  module LLM
+    module Audit
+      class Exchange < ::Legion::Transport::Exchange
+        def exchange_name = 'llm.audit'
+        def default_type  = 'topic'
+      end
+    end
+  end
+end

data/lib/legion/llm/audit/prompt_event.rb ADDED

@@ -0,0 +1,56 @@
+# frozen_string_literal: true
+require_relative '../transport/message'
+module Legion
+  module LLM
+    module Audit
+      class PromptEvent < Legion::LLM::Transport::Message
+        def type        = 'llm.audit.prompt'
+        def exchange    = Legion::LLM::Audit::Exchange
+        def routing_key = "audit.prompt.#{@options[:request_type]}"
+        def priority    = 0
+        def encrypt?    = true
+        def expiration  = nil
+        def headers
+          super.merge(classification_headers).merge(caller_headers).merge(retention_headers).merge(tier_header)
+        end
+        private
+        def message_id_prefix = 'audit_prompt'
+        def classification_headers
+          cls = @options[:classification] || {}
+          h = {}
+          h['x-legion-classification'] = cls[:level].to_s if cls[:level]
+          h['x-legion-contains-phi']   = cls[:contains_phi].to_s unless cls[:contains_phi].nil?
+          h['x-legion-jurisdictions']  = Array(cls[:jurisdictions]).join(',') if cls[:jurisdictions]
+          h
+        end
+        def caller_headers
+          caller_info = @options.dig(:caller, :requested_by) || {}
+          h = {}
+          h['x-legion-caller-identity'] = caller_info[:identity].to_s if caller_info[:identity]
+          h['x-legion-caller-type']     = caller_info[:type].to_s     if caller_info[:type]
+          h
+        end
+        def retention_headers
+          cls = @options[:classification] || {}
+          h = {}
+          h['x-legion-retention'] = cls[:retention].to_s if cls[:retention]
+          h
+        end
+        def tier_header
+          h = {}
+          h['x-legion-llm-tier'] = @options[:tier].to_s if @options[:tier]
+          h
+        end
+      end
+    end
+  end
+end
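To see which AMQP headers a given prompt-audit event carries, the private helpers of `PromptEvent` can be restated as one plain function. This is an illustrative re-statement, not code from the gem: `prompt_audit_headers` is a hypothetical name, and the base headers that `super` contributes from `Legion::LLM::Transport::Message` are omitted.

```ruby
# Re-statement of PromptEvent's classification, caller, retention and tier
# headers for a given options hash (base-class headers omitted).
def prompt_audit_headers(opts)
  cls         = opts[:classification] || {}
  caller_info = opts.dig(:caller, :requested_by) || {}
  h = {}
  h['x-legion-classification']  = cls[:level].to_s if cls[:level]
  # contains_phi: false is still sent as "false"; only nil omits the header.
  h['x-legion-contains-phi']    = cls[:contains_phi].to_s unless cls[:contains_phi].nil?
  h['x-legion-jurisdictions']   = Array(cls[:jurisdictions]).join(',') if cls[:jurisdictions]
  h['x-legion-caller-identity'] = caller_info[:identity].to_s if caller_info[:identity]
  h['x-legion-caller-type']     = caller_info[:type].to_s if caller_info[:type]
  h['x-legion-retention']       = cls[:retention].to_s if cls[:retention]
  h['x-legion-llm-tier']        = opts[:tier].to_s if opts[:tier]
  h
end

headers = prompt_audit_headers(
  request_type: :chat, tier: :fleet,
  classification: { level: :confidential, contains_phi: false, jurisdictions: %w[us eu], retention: '7y' },
  caller: { requested_by: { identity: 'user@example.com', type: :human } }
)
```

Note that retention is read from the `classification` hash, not from a separate top-level option.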

data/lib/legion/llm/audit/tool_event.rb ADDED

@@ -0,0 +1,46 @@
+# frozen_string_literal: true
+require_relative '../transport/message'
+module Legion
+  module LLM
+    module Audit
+      class ToolEvent < Legion::LLM::Transport::Message
+        def type        = 'llm.audit.tool'
+        def exchange    = Legion::LLM::Audit::Exchange
+        def routing_key = "audit.tool.#{@options[:tool_name]}"
+        def priority    = 0
+        def encrypt?    = true
+        def expiration  = nil
+        def headers
+          super.merge(tool_headers).merge(classification_headers)
+        end
+        private
+        def message_id_prefix = 'audit_tool'
+        def tool_headers
+          tc = @options[:tool_call] || {}
+          src = tc[:source] || {}
+          h = {}
+          tool_name = tc[:name] || @options[:tool_name]
+          h['x-legion-tool-name']          = tool_name.to_s    if tool_name
+          h['x-legion-tool-source-type']   = src[:type].to_s   if src[:type]
+          h['x-legion-tool-source-server'] = src[:server].to_s if src[:server]
+          h['x-legion-tool-status']        = tc[:status].to_s  if tc[:status]
+          h
+        end
+        def classification_headers
+          cls = @options[:classification] || {}
+          h = {}
+          h['x-legion-classification'] = cls[:level].to_s if cls[:level]
+          h['x-legion-contains-phi']   = cls[:contains_phi].to_s unless cls[:contains_phi].nil?
+          h
+        end
+      end
+    end
+  end
+end
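In `ToolEvent`, the `x-legion-tool-name` header falls back from `tool_call[:name]` to the top-level `tool_name` option, but `routing_key` reads only `@options[:tool_name]`. The sketch below restates both rules to make that asymmetry visible; `tool_audit_route` is a hypothetical helper, and classification headers are left out.

```ruby
# Re-statement of ToolEvent's routing key and tool headers (illustration only).
def tool_audit_route(opts)
  tc   = opts[:tool_call] || {}
  src  = tc[:source] || {}
  name = tc[:name] || opts[:tool_name]
  headers = {}
  headers['x-legion-tool-name']          = name.to_s if name
  headers['x-legion-tool-source-type']   = src[:type].to_s if src[:type]
  headers['x-legion-tool-source-server'] = src[:server].to_s if src[:server]
  headers['x-legion-tool-status']        = tc[:status].to_s if tc[:status]
  # The routing key only sees the top-level :tool_name option.
  { routing_key: "audit.tool.#{opts[:tool_name]}", headers: headers }
end

only_nested = tool_audit_route(tool_call: { name: 'read_file', status: :success,
                                            source: { type: :mcp, server: 'fs' } })
# only_nested[:routing_key] is "audit.tool." even though the header names the tool
```

Callers that want per-tool routing presumably need to pass `tool_name:` at the top level as well as inside `tool_call`.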

data/lib/legion/llm/audit.rb ADDED

@@ -0,0 +1,53 @@
+# frozen_string_literal: true
+require 'legion/logging/helper'
+if defined?(Legion::Transport::Message)
+  require_relative 'audit/exchange'
+  require_relative 'audit/prompt_event'
+  require_relative 'audit/tool_event'
+end
+module Legion
+  module LLM
+    module Audit
+      extend Legion::Logging::Helper
+      module_function
+      def emit_prompt(event)
+        if transport_connected? && defined?(Legion::LLM::Audit::PromptEvent)
+          Legion::LLM::Audit::PromptEvent.new(**event).publish
+          log.info('[llm][audit] published prompt audit')
+          :published
+        else
+          log.warn('[llm][audit] dropped prompt audit: transport unavailable')
+          :dropped
+        end
+      rescue StandardError => e
+        handle_exception(e, level: :warn, operation: 'llm.audit.emit_prompt')
+        :dropped
+      end
+      def emit_tools(event)
+        if transport_connected? && defined?(Legion::LLM::Audit::ToolEvent)
+          Legion::LLM::Audit::ToolEvent.new(**event).publish
+          log.info('[llm][audit] published tool audit')
+          :published
+        else
+          log.warn('[llm][audit] dropped tool audit: transport unavailable')
+          :dropped
+        end
+      rescue StandardError => e
+        handle_exception(e, level: :warn, operation: 'llm.audit.emit_tools')
+        :dropped
+      end
+      def transport_connected?
+        !!(defined?(Legion::Transport) &&
+          Legion::Transport.respond_to?(:connected?) &&
+          Legion::Transport.connected?)
+      end
+    end
+  end
+end
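`Audit.emit_prompt` and `Audit.emit_tools` share one contract: publish when transport is connected, otherwise drop and log, and convert any publish exception into `:dropped` rather than propagating it. The sketch below isolates that contract with injected callables so it can run without a broker; `emit_audit` and its keyword arguments are hypothetical.

```ruby
# Publish-or-drop: audit events are never spooled, so anything that cannot
# be published right now is dropped and reported to the caller.
def emit_audit(event, connected:, publisher:, logger: nil)
  unless connected.call
    logger&.call('dropped audit event: transport unavailable')
    return :dropped
  end
  publisher.call(event)
  :published
rescue StandardError => e
  logger&.call("dropped audit event: #{e.class}")
  :dropped
end
```

Returning a symbol instead of raising keeps the LLM request path unaffected by audit failures, at the cost of the caller having to check the result if it needs a guarantee.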

data/lib/legion/llm/conversation_store.rb CHANGED

@@ -373,11 +373,13 @@ module Legion
         end
         def db_append_message(conversation_id, msg)
+          content = msg[:content]
+          content = content.to_json unless content.is_a?(String) || content.nil?
           row = {
             conversation_id: conversation_id,
             seq:             msg[:seq],
             role:            msg[:role].to_s,
-            content:         msg[:content],
+            content:         content,
             provider:        msg[:provider]&.to_s,
             model:           msg[:model]&.to_s,
             input_tokens:    msg[:input_tokens],
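The `db_append_message` fix can be exercised on its own: Strings and `nil` reach the database unchanged, and anything else (such as an array of tool-call hashes) is JSON-encoded first so the column only ever receives a String. `serialize_content` is a hypothetical name for the two added lines.

```ruby
require 'json'

# Mirrors the two lines added to db_append_message.
def serialize_content(content)
  return content if content.nil? || content.is_a?(String)

  content.to_json
end

tool_use = [{ id: 'call_1', name: 'search' }]
stored   = serialize_content(tool_use)
# stored == '[{"id":"call_1","name":"search"}]'
```

The read side is not part of this hunk; a reader that parses the stored JSON gets String keys back, not the original Symbols.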

data/lib/legion/llm/fleet/dispatcher.rb CHANGED

@@ -7,18 +7,66 @@ module Legion
     module Fleet
       module Dispatcher
         DEFAULT_TIMEOUT = 30
+        TIMEOUTS = {
+          embed:    10,
+          chat:     30,
+          generate: 30,
+          default:  30
+        }.freeze
         extend Legion::Logging::Helper
         module_function
-        def dispatch(model:, messages:, **opts)
-          return error_result('fleet_unavailable') unless fleet_available?
+        # Backwards-compatible shim: supports old (model:, messages:) and new (request:, message_context:) callers
+        def dispatch(model: nil, messages: nil, request: nil, message_context: {}, routing_key: nil, reply_to: nil, **opts)
+          return error_result('fleet_unavailable', message_context: message_context) unless fleet_available?
+          # Old calling convention: build minimal params from model/messages
+          if request.nil? && (model || messages)
+            provider = opts[:provider] || 'ollama'
+            request_type = opts[:request_type] || 'chat'
+            routing_key ||= build_routing_key(provider: provider, request_type: request_type, model: model)
+            reply_to ||= ReplyDispatcher.agent_queue_name
+            correlation_id = publish_request(
+              routing_key: routing_key, reply_to: reply_to,
+              provider: provider, model: model, request_type: request_type,
+              messages: messages, message_context: message_context, **opts
+            )
+            timeout = resolve_timeout(request_type: request_type, override: opts[:timeout])
+            return wait_for_response(correlation_id, timeout: timeout, message_context: message_context)
+          end
+          # New calling convention
+          request_opts =
+            if request.respond_to?(:to_h)
+              request.to_h.transform_keys(&:to_sym)
+            else
+              {}
+            end
+          request_opts = request_opts.merge(opts)
+          provider = request_opts[:provider] || 'ollama'
+          request_type = request_opts[:request_type] || 'chat'
+          model = request_opts[:model]
+          routing_key ||= build_routing_key(provider: provider, request_type: request_type, model: model)
+          reply_to ||= ReplyDispatcher.agent_queue_name
+          correlation_id = publish_request(
+            routing_key: routing_key, reply_to: reply_to,
+            provider: provider, model: model, request_type: request_type,
+            message_context: message_context, **request_opts.except(:provider, :model, :request_type, :timeout)
+          )
+          timeout = resolve_timeout(request_type: request_type, override: request_opts[:timeout] || opts[:timeout])
+          wait_for_response(correlation_id, timeout: timeout, message_context: message_context)
+        end
-          correlation_id = "fleet_#{SecureRandom.hex(12)}"
-          publish_request(model: model, messages: messages, intent: opts[:intent],
-                          correlation_id: correlation_id, **opts.except(:intent, :timeout))
+        def build_routing_key(provider:, request_type:, model:)
+          "llm.request.#{provider}.#{request_type}.#{sanitize_model(model)}"
+        end
-          wait_for_response(correlation_id, timeout: resolve_timeout(opts[:timeout]))
+        def sanitize_model(model)
+          model.to_s.gsub(':', '.')
         end
         def fleet_available?
@@ -48,10 +96,17 @@ module Legion
           routing.fetch(:use_fleet, true)
         end
-        def resolve_timeout(override)
+        def resolve_timeout(request_type: :default, override: nil)
           return override if override
-          return DEFAULT_TIMEOUT unless defined?(Legion::Settings)
+          configured = fleet_timeout_from_settings(request_type)
+          return configured if configured
+          TIMEOUTS[request_type.to_sym] || TIMEOUTS[:default]
+        end
+        def fleet_timeout_from_settings(request_type)
+          return unless defined?(Legion::Settings)
           settings = begin
             Legion::Settings[:llm]
@@ -59,35 +114,51 @@ module Legion
             handle_exception(e, level: :debug, operation: 'llm.fleet.dispatcher.resolve_timeout')
             nil
           end
-          return DEFAULT_TIMEOUT unless settings.is_a?(Hash)
-          settings.dig(:routing, :fleet, :timeout_seconds) || DEFAULT_TIMEOUT
+          return unless settings.is_a?(Hash)
+          routing = settings[:routing]
+          return unless routing.is_a?(Hash)
+          fleet_settings = routing.dig(:tiers, :fleet)
+          fleet_settings = routing[:fleet] unless fleet_settings.is_a?(Hash)
+          return unless fleet_settings.is_a?(Hash)
+          fleet_settings.dig(:timeouts, request_type.to_sym) || fleet_settings[:timeout_seconds]
         end
-        def publish_request(**)
-          return unless defined?(Legion::Extensions::LLM::Gateway::Transport::Messages::InferenceRequest)
+        def publish_request(**opts)
+          correlation_id = "req_#{SecureRandom.uuid}"
+          opts[:fleet_correlation_id] = correlation_id
+          if defined?(Legion::LLM::Fleet::Request)
+            Legion::LLM::Fleet::Request.new(**opts).publish
+          elsif defined?(Legion::Extensions::LLM::Gateway::Transport::Messages::InferenceRequest)
+            Legion::Extensions::LLM::Gateway::Transport::Messages::InferenceRequest.new(
+              reply_to: opts[:reply_to], **opts.except(:reply_to)
+            ).publish
+          end
-          Legion::Extensions::LLM::Gateway::Transport::Messages::InferenceRequest.new(
-            reply_to: ReplyDispatcher.agent_queue_name, **
-          ).publish
+          correlation_id
         end
-        def wait_for_response(correlation_id, timeout:)
+        def wait_for_response(correlation_id, timeout:, message_context: {})
           future = ReplyDispatcher.register(correlation_id)
           result = future.value!(timeout)
-          result || timeout_result(correlation_id, timeout)
+          result || timeout_result(correlation_id, timeout, message_context: message_context)
         rescue Concurrent::CancelledOperationError
-          timeout_result(correlation_id, timeout)
+          timeout_result(correlation_id, timeout, message_context: message_context)
         ensure
           ReplyDispatcher.deregister(correlation_id)
         end
-        def timeout_result(correlation_id, timeout)
-          { success: false, error: 'fleet_timeout', correlation_id: correlation_id, timeout: timeout }
+        def timeout_result(correlation_id, timeout, message_context: {})
+          { success: false, error: 'fleet_timeout', correlation_id: correlation_id,
+            timeout: timeout, message_context: message_context }
         end
-        def error_result(reason)
-          { success: false, error: reason }
+        def error_result(reason, message_context: {})
+          { success: false, error: reason, message_context: message_context }
         end
       end
     end
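The dispatcher's routing-key and timeout rules can be restated as pure functions so they run without RabbitMQ or `Legion::Settings`. In this sketch the `settings:` argument stands in for what `Legion::Settings[:llm]` would return; its shape is assumed from the keys `fleet_timeout_from_settings` reads. The lookup order is: explicit override, then `routing.tiers.fleet.timeouts[type]` (or legacy `routing.fleet`), then `timeout_seconds`, then the `TIMEOUTS` constant.

```ruby
TIMEOUTS = { embed: 10, chat: 30, generate: 30, default: 30 }.freeze

def build_routing_key(provider:, request_type:, model:)
  # ':' in tags such as "qwen3:8b" is rewritten to '.', as in sanitize_model.
  "llm.request.#{provider}.#{request_type}.#{model.to_s.gsub(':', '.')}"
end

# settings: assumed shape of the Hash that Legion::Settings[:llm] returns.
def resolve_timeout(request_type:, override: nil, settings: {})
  return override if override

  routing = settings[:routing].is_a?(Hash) ? settings[:routing] : {}
  fleet   = routing.dig(:tiers, :fleet)
  fleet   = routing[:fleet] unless fleet.is_a?(Hash) # legacy location
  if fleet.is_a?(Hash)
    configured = fleet.dig(:timeouts, request_type.to_sym) || fleet[:timeout_seconds]
    return configured if configured
  end
  TIMEOUTS[request_type.to_sym] || TIMEOUTS[:default]
end

key = build_routing_key(provider: 'ollama', request_type: 'chat', model: 'llama3.2:3b')
# key == "llm.request.ollama.chat.llama3.2.3b"
```

A model name that already contains dots (such as `llama3.2`) spans several words after sanitization, so a topic binding that uses `*` for the model segment will not match it; `#` will.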

data/lib/legion/llm/fleet/error.rb ADDED

@@ -0,0 +1,61 @@
+# frozen_string_literal: true
+require_relative '../transport/message'
+module Legion
+  module LLM
+    module Fleet
+      class Error < Legion::LLM::Transport::Message
+        ERROR_CODES = %w[
+          model_not_loaded ollama_unavailable inference_failed inference_timeout
+          invalid_token token_expired payload_too_large unsupported_type
+          unsupported_streaming no_fleet_queue fleet_backpressure fleet_timeout
+        ].freeze
+        def type        = 'llm.fleet.error'
+        def routing_key = @options[:reply_to]
+        def priority    = 0
+        def expiration  = nil
+        def encrypt?    = false
+        def headers
+          super.merge(error_headers).merge(tracing_headers)
+        end
+        # Same default-exchange override as Fleet::Response.
+        def publish(options = @options)
+          raise unless @valid
+          validate_payload_size
+          channel.default_exchange.publish(
+            encode_message,
+            routing_key:      routing_key,
+            content_type:     options[:content_type] || content_type,
+            content_encoding: options[:content_encoding] || content_encoding,
+            headers:          headers,
+            type:             type,
+            priority:         priority,
+            message_id:       message_id,
+            correlation_id:   correlation_id,
+            app_id:           app_id,
+            timestamp:        timestamp
+          )
+        rescue Bunny::ConnectionClosedError, Bunny::ChannelAlreadyClosed,
+               Bunny::NetworkErrorWrapper, IOError, Timeout::Error => e
+          spool_message(e)
+        end
+        private
+        def message_id_prefix = 'err'
+        def error_headers
+          h = {}
+          code = @options.dig(:error, :code)
+          h['x-legion-fleet-error'] = code.to_s if code
+          h
+        end
+      end
+    end
+  end
+end
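`Fleet::Error` declares a 12-entry `ERROR_CODES` registry, but in the code shown it only copies `@options[:error][:code]` into the `x-legion-fleet-error` header without checking it against that list. A worker-side helper that does validate might look like the sketch below; `fleet_error_envelope` and the returned hash layout are hypothetical, and the validation is an assumption rather than gem behavior.

```ruby
# Copy of the registry from Fleet::Error.
FLEET_ERROR_CODES = %w[
  model_not_loaded ollama_unavailable inference_failed inference_timeout
  invalid_token token_expired payload_too_large unsupported_type
  unsupported_streaming no_fleet_queue fleet_backpressure fleet_timeout
].freeze

# Hypothetical helper: rejects unknown codes before building the message.
def fleet_error_envelope(code:, message:, reply_to:, correlation_id:)
  raise ArgumentError, "unknown fleet error code: #{code}" unless FLEET_ERROR_CODES.include?(code.to_s)

  {
    # On the default exchange the routing key is the destination queue name.
    routing_key: reply_to,
    properties:  { type: 'llm.fleet.error', correlation_id: correlation_id,
                   headers: { 'x-legion-fleet-error' => code.to_s } },
    payload:     { error: { code: code.to_s, message: message } }
  }
end
```

Publishing on the default exchange, as `Fleet::Error#publish` does, delivers straight to the requester's reply queue named in `reply_to` without any binding.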

data/lib/legion/llm/fleet/exchange.rb ADDED

@@ -0,0 +1,12 @@
+# frozen_string_literal: true
+module Legion
+  module LLM
+    module Fleet
+      class Exchange < ::Legion::Transport::Exchange
+        def exchange_name = 'llm.request'
+        def default_type  = 'topic'
+      end
+    end
+  end
+end
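`llm.request` is declared as a topic exchange, so which fleet queues receive a request depends on how their binding keys match the keys produced by `build_routing_key`. The sketch below implements standard AMQP topic matching (words split on `.`, `*` matches exactly one word, `#` matches zero or more) to check example bindings; the binding keys themselves are hypothetical, since queue bindings are not part of this diff.

```ruby
# AMQP topic-exchange matching between a binding key and a routing key.
def topic_match?(binding_key, routing_key)
  match_words(binding_key.split('.'), routing_key.split('.'))
end

def match_words(pattern, words)
  return words.empty? if pattern.empty?

  head, *rest = pattern
  case head
  when '#' # zero or more words
    (0..words.length).any? { |i| match_words(rest, words.drop(i)) }
  when '*' # exactly one word
    !words.empty? && match_words(rest, words.drop(1))
  else     # literal word
    !words.empty? && words.first == head && match_words(rest, words.drop(1))
  end
end

key = 'llm.request.ollama.chat.llama3.2.3b' # model "llama3.2:3b" after sanitizing
```

For such keys, bindings ending in `#` are the safer choice, because the model segment can expand to a variable number of words.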

data/lib/legion/llm/fleet/reply_dispatcher.rb CHANGED

@@ -34,11 +34,36 @@ module Legion
           future = @pending.delete(cid)
           return unless future
-          future.fulfill(payload)
+          # Type-aware dispatch (new protocol) with fallback to legacy (no type)
+          case properties[:type]
+          when 'llm.fleet.error'
+            future.fulfill(normalize_error(payload))
+          else
+            # 'llm.fleet.response' or legacy (no type)
+            future.fulfill(payload)
+          end
         rescue StandardError => e
           handle_exception(e, level: :warn)
         end
+        def fulfill_return(correlation_id)
+          future = @pending.delete(correlation_id)
+          return unless future
+          future.fulfill({ success: false, error: 'no_fleet_queue' })
+        rescue StandardError => e
+          handle_exception(e, level: :warn, operation: 'llm.fleet.reply_dispatcher.fulfill_return')
+        end
+        def fulfill_nack(correlation_id)
+          future = @pending.delete(correlation_id)
+          return unless future
+          future.fulfill({ success: false, error: 'fleet_backpressure' })
+        rescue StandardError => e
+          handle_exception(e, level: :warn, operation: 'llm.fleet.reply_dispatcher.fulfill_nack')
+        end
         def agent_queue_name
           @agent_queue_name ||= "llm.fleet.reply.#{SecureRandom.hex(8)}"
         end
@@ -62,7 +87,10 @@ module Legion
             channel = Legion::Transport.connection.create_channel
             queue = channel.queue(agent_queue_name, auto_delete: true, durable: false)
             @consumer = queue.subscribe(manual_ack: false) do |_delivery, properties, body|
-              props = { correlation_id: properties.correlation_id }
+              props = {
+                correlation_id: properties.correlation_id,
+                type:           properties.type
+              }
               handle_delivery(body, props)
             end
           end
@@ -96,6 +124,16 @@ module Legion
           handle_exception(e, level: :debug)
           {}
         end
+        def normalize_error(payload)
+          error = payload[:error] || {}
+          {
+            success:         false,
+            error:           error.is_a?(Hash) ? error[:code] || error[:message] || 'fleet_error' : error.to_s,
+            message_context: payload[:message_context] || {},
+            raw_error:       error
+          }
+        end
       end
     end
   end
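The reply-side flow (register a correlation id, then fulfill it from a response, an error, a `basic.return` or a `basic.nack`) can be sketched without concurrent-ruby by using one `Thread::Queue` per pending request in place of the future that `ReplyDispatcher` uses. `ReplyRouter` is a hypothetical class; `normalize_error` follows the logic added in the diff.

```ruby
# Correlation-id reply routing with type-aware dispatch (illustrative sketch).
class ReplyRouter
  def initialize
    @pending = {}
    @lock    = Mutex.new
  end

  def register(correlation_id)
    queue = Thread::Queue.new
    @lock.synchronize { @pending[correlation_id] = queue }
    queue
  end

  # properties[:type] selects error normalization, as in handle_delivery.
  def handle_delivery(payload, properties)
    queue = take(properties[:correlation_id])
    return unless queue

    queue << (properties[:type] == 'llm.fleet.error' ? normalize_error(payload) : payload)
  end

  def fulfill_return(correlation_id) = fail_with(correlation_id, 'no_fleet_queue')
  def fulfill_nack(correlation_id)   = fail_with(correlation_id, 'fleet_backpressure')

  private

  # Removing the entry means each request is fulfilled at most once.
  def take(correlation_id)
    @lock.synchronize { @pending.delete(correlation_id) }
  end

  def fail_with(correlation_id, code)
    take(correlation_id)&.push({ success: false, error: code })
  end

  def normalize_error(payload)
    error = payload[:error] || {}
    {
      success:         false,
      error:           error.is_a?(Hash) ? (error[:code] || error[:message] || 'fleet_error') : error.to_s,
      message_context: payload[:message_context] || {},
      raw_error:       error
    }
  end
end
```

Because the first fulfillment deletes the entry, a late reply arriving after a `basic.return`, nack or timeout finds nothing and is ignored, matching the `return unless future` guards in the diff.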