legion-llm 0.3.23 → 0.4.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/.gitignore +1 -0
- data/CHANGELOG.md +92 -0
- data/CLAUDE.md +42 -3
- data/legion-llm-0.3.20.gem +0 -0
- data/lib/legion/llm/batch.rb +14 -0
- data/lib/legion/llm/compressor.rb +89 -0
- data/lib/legion/llm/cost_estimator.rb +51 -0
- data/lib/legion/llm/escalation_tracker.rb +73 -0
- data/lib/legion/llm/fleet/dispatcher.rb +90 -0
- data/lib/legion/llm/fleet/handler.rb +110 -0
- data/lib/legion/llm/fleet/reply_dispatcher.rb +98 -0
- data/lib/legion/llm/fleet.rb +12 -0
- data/lib/legion/llm/hooks/budget_guard.rb +81 -0
- data/lib/legion/llm/hooks/cost_tracking.rb +54 -0
- data/lib/legion/llm/hooks/reflection.rb +238 -0
- data/lib/legion/llm/hooks.rb +3 -0
- data/lib/legion/llm/pipeline/executor.rb +226 -0
- data/lib/legion/llm/pipeline/profile.rb +42 -0
- data/lib/legion/llm/pipeline/request.rb +93 -0
- data/lib/legion/llm/pipeline/response.rb +75 -0
- data/lib/legion/llm/pipeline/steps/metering.rb +86 -0
- data/lib/legion/llm/pipeline/timeline.rb +51 -0
- data/lib/legion/llm/pipeline/tracing.rb +35 -0
- data/lib/legion/llm/pipeline.rb +16 -0
- data/lib/legion/llm/quality_checker.rb +23 -0
- data/lib/legion/llm/scheduling.rb +12 -0
- data/lib/legion/llm/settings.rb +1 -0
- data/lib/legion/llm/shadow_eval.rb +76 -2
- data/lib/legion/llm/version.rb +1 -1
- data/lib/legion/llm.rb +26 -0
- metadata +19 -1
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: 2963ec3f995c8bab5b80af82d724155eb3590473bbf04e6eec7fe22c8625435a
|
|
4
|
+
data.tar.gz: 2fb50f72dfbe6867a388a50317eea6403e10c117e80f2f83fcb259e5601e9419
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: f6009ed5907f5cfc642cac46f0b626b3e8c888549c6de480ff4c39a569fe23901a281cc2ccd249fe605f88862ffc1392a684afc9ab4a1439520edb5f83cf6734
|
|
7
|
+
data.tar.gz: 0d5b8fcd87a585cc3da2c81ff737cabe25bcadf171ee58304f2544b4a601fc519666e6a54ff402923b39b597b41c527019fbc421eefbb6bbdf8afaab0ce75f7e
|
data/.gitignore
CHANGED
data/CHANGELOG.md
CHANGED
|
@@ -1,5 +1,97 @@
|
|
|
1
1
|
# Legion LLM Changelog
|
|
2
2
|
|
|
3
|
+
## [0.4.0] - 2026-03-23
|
|
4
|
+
|
|
5
|
+
### Added
|
|
6
|
+
- `Pipeline::Request`: Data.define struct with `.build` and `.from_chat_args` for unified request representation
|
|
7
|
+
- `Pipeline::Response`: Data.define struct with `.build`, `.from_ruby_llm`, and `#with` for immutable responses
|
|
8
|
+
- `Pipeline::Profile`: Caller-derived profiles (external/gaia/system) with step skip logic
|
|
9
|
+
- `Pipeline::Tracing`: Distributed tracing with trace_id, span_id, and exchange_id generation
|
|
10
|
+
- `Pipeline::Timeline`: Ordered event recording with participant tracking
|
|
11
|
+
- `Pipeline::Executor`: 18-step pipeline skeleton with profile-aware step execution
|
|
12
|
+
- `Pipeline::Steps::Metering`: Metering event builder absorbed from lex-llm-gateway
|
|
13
|
+
- `CostEstimator`: Model cost estimation with fuzzy matching, absorbed from lex-llm-gateway
|
|
14
|
+
- `Fleet::Dispatcher`: Fleet RPC dispatch absorbed from lex-llm-gateway
|
|
15
|
+
- `Fleet::Handler`: Fleet request handler absorbed from lex-llm-gateway
|
|
16
|
+
- `Fleet::ReplyDispatcher`: Correlation-based reply routing for fleet RPC
|
|
17
|
+
- Feature-flagged `pipeline_enabled` setting (default: false) for incremental rollout
|
|
18
|
+
- Pipeline path in `_dispatch_chat` activated by `pipeline_enabled: true`
|
|
19
|
+
|
|
20
|
+
## [0.3.32] - 2026-03-23
|
|
21
|
+
|
|
22
|
+
### Added
|
|
23
|
+
- `Hooks::Reflection`: after_chat hook that extracts knowledge from conversations
|
|
24
|
+
- Detects decisions, patterns, and facts using regex markers
|
|
25
|
+
- Publishes extracted entries to Apollo via AMQP or direct ingest
|
|
26
|
+
- Cooldown-based dedup (5 min) and async extraction to avoid blocking
|
|
27
|
+
- `summary` method for introspection of extraction history
|
|
28
|
+
|
|
29
|
+
## [0.3.31] - 2026-03-23
|
|
30
|
+
|
|
31
|
+
### Added
|
|
32
|
+
- `Compressor.deduplicate_messages`: removes near-duplicate messages from conversation history using Jaccard similarity on word sets
|
|
33
|
+
- Configurable similarity threshold (default 0.85), keeps last occurrence, same-role-only comparison
|
|
34
|
+
- Skips short messages (< 20 chars) to avoid false positives
|
|
35
|
+
|
|
36
|
+
## [0.3.30] - 2026-03-23
|
|
37
|
+
|
|
38
|
+
### Added
|
|
39
|
+
- `Scheduling.status`: returns hash with current scheduling state (peak hours, defer intents, next off-peak)
|
|
40
|
+
- `Batch.status`: returns hash with queue size, priority breakdown, oldest entry, config
|
|
41
|
+
|
|
42
|
+
## [0.3.29] - 2026-03-23
|
|
43
|
+
|
|
44
|
+
### Added
|
|
45
|
+
- `EscalationTracker`: global escalation history with summary analytics
|
|
46
|
+
- Tracks model escalations (from_model, to_model, reason, tier changes)
|
|
47
|
+
- `summary` aggregates by reason, source model, and target model
|
|
48
|
+
- `escalation_rate` reports escalation frequency within configurable time windows
|
|
49
|
+
- Capped at 200 entries with automatic eviction
|
|
50
|
+
|
|
51
|
+
## [0.3.28] - 2026-03-23
|
|
52
|
+
|
|
53
|
+
### Added
|
|
54
|
+
- QualityChecker: truncation detection for responses cut off mid-sentence
|
|
55
|
+
- QualityChecker: refusal detection for model refusal patterns ("I can't", "as an AI")
|
|
56
|
+
- REFUSAL_PATTERNS constant with configurable regex patterns
|
|
57
|
+
- 6 new specs covering truncation and refusal detection
|
|
58
|
+
|
|
59
|
+
## [0.3.27] - 2026-03-23
|
|
60
|
+
|
|
61
|
+
### Added
|
|
62
|
+
- `Compressor.summarize_messages` for LLM-based conversation summarization
|
|
63
|
+
- Uses configurable model (default: gpt-4o-mini) for context window compression
|
|
64
|
+
- Falls back to aggressive stopword compression when LLM unavailable
|
|
65
|
+
- Short conversations returned uncompressed to avoid unnecessary API calls
|
|
66
|
+
|
|
67
|
+
## [0.3.26] - 2026-03-23
|
|
68
|
+
|
|
69
|
+
### Changed
|
|
70
|
+
- Enhanced ShadowEval with result history, cost comparison, and summary analytics
|
|
71
|
+
- `compare` now includes primary_cost, shadow_cost, and cost_savings ratio
|
|
72
|
+
- Added `history`, `clear_history`, and `summary` class methods
|
|
73
|
+
- History capped at 100 entries with automatic eviction
|
|
74
|
+
- Cost estimation uses CostTracker pricing when available
|
|
75
|
+
|
|
76
|
+
## [0.3.25] - 2026-03-23
|
|
77
|
+
|
|
78
|
+
### Added
|
|
79
|
+
- `Hooks::BudgetGuard` before_chat hook: blocks LLM calls when session cost budget is exceeded
|
|
80
|
+
- `BudgetGuard.status` returns enforcing state, spent, remaining, and ratio
|
|
81
|
+
- `BudgetGuard.remaining` returns remaining budget in USD
|
|
82
|
+
- Configurable via `llm.budget.session_usd` in settings (disabled when 0 or unset)
|
|
83
|
+
- Auto-installed during `LLM.start` only when budget is configured
|
|
84
|
+
- 10 specs covering blocking, passthrough, remaining, status, and enforcing checks
|
|
85
|
+
|
|
86
|
+
## [0.3.24] - 2026-03-23
|
|
87
|
+
|
|
88
|
+
### Added
|
|
89
|
+
- Auto cost-tracking hook: records per-request cost via `CostTracker` after every LLM call
|
|
90
|
+
- `Hooks::CostTracking.install` registers an `after_chat` hook during `LLM.start`
|
|
91
|
+
- Extracts usage tokens and model from response, feeds into in-memory `CostTracker.record`
|
|
92
|
+
- Opt-out via `llm.cost_tracking.auto: false` in settings
|
|
93
|
+
- 9 specs covering hook installation, token extraction, model fallback, and edge cases
|
|
94
|
+
|
|
3
95
|
## [0.3.23] - 2026-03-23
|
|
4
96
|
|
|
5
97
|
### Added
|
data/CLAUDE.md
CHANGED
|
@@ -8,7 +8,7 @@
|
|
|
8
8
|
Core LegionIO gem providing LLM capabilities to all extensions. Wraps ruby_llm to provide a consistent interface for chat, embeddings, tool use, and agents across multiple providers (Bedrock, Anthropic, OpenAI, Gemini, Ollama). Includes a dynamic weighted routing engine that dispatches requests across local, fleet, and cloud tiers based on caller intent, priority rules, time schedules, cost multipliers, and real-time provider health.
|
|
9
9
|
|
|
10
10
|
**GitHub**: https://github.com/LegionIO/legion-llm
|
|
11
|
-
**Version**: 0.
|
|
11
|
+
**Version**: 0.4.0
|
|
12
12
|
**License**: Apache-2.0
|
|
13
13
|
|
|
14
14
|
## Architecture
|
|
@@ -51,6 +51,20 @@ Legion::LLM (lib/legion/llm.rb)
|
|
|
51
51
|
│ ├── Rule # Routing rule: intent matching, schedule windows, constraints
|
|
52
52
|
│ ├── HealthTracker # Circuit breaker, latency rolling window, pluggable signal handlers
|
|
53
53
|
│ └── EscalationChain # Ordered fallback resolution chain with max_attempts cap (pads last resolution if chain is short)
|
|
54
|
+
├── Pipeline # 18-step request/response pipeline (feature-flagged)
|
|
55
|
+
│ ├── Request # Data.define struct for unified request representation
|
|
56
|
+
│ ├── Response # Data.define struct for unified response representation
|
|
57
|
+
│ ├── Profile # Caller-derived profiles (external/gaia/system) for step skipping
|
|
58
|
+
│ ├── Tracing # Distributed trace_id, span_id, exchange_id generation
|
|
59
|
+
│ ├── Timeline # Ordered event recording with participant tracking
|
|
60
|
+
│ ├── Executor # 18-step pipeline skeleton with profile-aware execution
|
|
61
|
+
│ └── Steps/
|
|
62
|
+
│ └── Metering # Metering event builder (absorbed from lex-llm-gateway)
|
|
63
|
+
├── CostEstimator # Model cost estimation with fuzzy pricing (absorbed from lex-llm-gateway)
|
|
64
|
+
├── Fleet # Fleet RPC dispatch (absorbed from lex-llm-gateway)
|
|
65
|
+
│ ├── Dispatcher # Fleet dispatch with timeout and availability checks
|
|
66
|
+
│ ├── Handler # Fleet request handler for GPU worker nodes
|
|
67
|
+
│ └── ReplyDispatcher # Correlation-based reply routing for fleet RPC
|
|
54
68
|
└── Helpers::LLM # Extension helper mixin (llm_chat, llm_embed, llm_session, compress:)
|
|
55
69
|
```
|
|
56
70
|
|
|
@@ -179,6 +193,7 @@ Settings read from `Legion::Settings[:llm]`:
|
|
|
179
193
|
|-----|------|---------|-------------|
|
|
180
194
|
| `enabled` | Boolean | `true` | Enable LLM support |
|
|
181
195
|
| `connected` | Boolean | `false` | Set to true after successful start |
|
|
196
|
+
| `pipeline_enabled` | Boolean | `false` | Enable 18-step pipeline for chat() dispatch |
|
|
182
197
|
| `default_model` | String | `nil` | Default model ID (auto-detected if nil) |
|
|
183
198
|
| `default_provider` | Symbol | `nil` | Default provider (auto-detected if nil) |
|
|
184
199
|
| `providers` | Hash | See below | Per-provider configuration |
|
|
@@ -320,6 +335,19 @@ In-memory signal consumer with pluggable handlers. Adjusts effective priorities
|
|
|
320
335
|
| `lib/legion/llm/router/escalation_chain.rb` | EscalationChain value object |
|
|
321
336
|
| `lib/legion/llm/transport/exchanges/escalation.rb` | AMQP exchange for escalation events |
|
|
322
337
|
| `lib/legion/llm/transport/messages/escalation_event.rb` | AMQP message for escalation events |
|
|
338
|
+
| `lib/legion/llm/pipeline.rb` | Pipeline module: requires all pipeline components |
|
|
339
|
+
| `lib/legion/llm/pipeline/request.rb` | Pipeline::Request Data.define struct with .build and .from_chat_args |
|
|
340
|
+
| `lib/legion/llm/pipeline/response.rb` | Pipeline::Response Data.define struct with .build, .from_ruby_llm, #with |
|
|
341
|
+
| `lib/legion/llm/pipeline/profile.rb` | Pipeline::Profile: caller-derived profiles for step skipping |
|
|
342
|
+
| `lib/legion/llm/pipeline/tracing.rb` | Pipeline::Tracing: trace_id, span_id, exchange_id generation |
|
|
343
|
+
| `lib/legion/llm/pipeline/timeline.rb` | Pipeline::Timeline: ordered event recording |
|
|
344
|
+
| `lib/legion/llm/pipeline/executor.rb` | Pipeline::Executor: 18-step skeleton with profile-aware execution |
|
|
345
|
+
| `lib/legion/llm/pipeline/steps/metering.rb` | Pipeline::Steps::Metering: metering event builder |
|
|
346
|
+
| `lib/legion/llm/cost_estimator.rb` | CostEstimator: model cost estimation with fuzzy pricing |
|
|
347
|
+
| `lib/legion/llm/fleet.rb` | Fleet module: requires dispatcher, handler, reply_dispatcher |
|
|
348
|
+
| `lib/legion/llm/fleet/dispatcher.rb` | Fleet::Dispatcher: fleet RPC dispatch |
|
|
349
|
+
| `lib/legion/llm/fleet/handler.rb` | Fleet::Handler: fleet request handler |
|
|
350
|
+
| `lib/legion/llm/fleet/reply_dispatcher.rb` | Fleet::ReplyDispatcher: correlation-based reply routing |
|
|
323
351
|
| `lib/legion/llm/helpers/llm.rb` | Extension helper mixin: llm_chat (with compress:, escalate:, max_escalations:, quality_check:), llm_embed, llm_session |
|
|
324
352
|
| `spec/legion/llm_spec.rb` | Tests: settings, lifecycle, providers, auto-config |
|
|
325
353
|
| `spec/legion/llm/integration_spec.rb` | Tests: routing integration with chat() |
|
|
@@ -346,6 +374,17 @@ In-memory signal consumer with pluggable handlers. Adjusts effective priorities
|
|
|
346
374
|
| `spec/legion/llm/shadow_eval_spec.rb` | ShadowEval tests |
|
|
347
375
|
| `spec/legion/llm/structured_output_spec.rb` | StructuredOutput tests |
|
|
348
376
|
| `spec/legion/llm/gateway_integration_spec.rb` | Tests: gateway delegation and _direct bypass |
|
|
377
|
+
| `spec/legion/llm/cost_estimator_spec.rb` | Tests: cost estimation, fuzzy matching, pricing table |
|
|
378
|
+
| `spec/legion/llm/pipeline/request_spec.rb` | Tests: Request struct builder, legacy adapter |
|
|
379
|
+
| `spec/legion/llm/pipeline/response_spec.rb` | Tests: Response struct builder, RubyLLM adapter, #with |
|
|
380
|
+
| `spec/legion/llm/pipeline/profile_spec.rb` | Tests: Profile derivation and step skipping |
|
|
381
|
+
| `spec/legion/llm/pipeline/tracing_spec.rb` | Tests: Tracing init, exchange_id generation |
|
|
382
|
+
| `spec/legion/llm/pipeline/timeline_spec.rb` | Tests: Timeline event recording, participants |
|
|
383
|
+
| `spec/legion/llm/pipeline/executor_spec.rb` | Tests: Executor pipeline execution, profile skipping |
|
|
384
|
+
| `spec/legion/llm/pipeline/integration_spec.rb` | Tests: Pipeline integration with chat() dispatch |
|
|
385
|
+
| `spec/legion/llm/pipeline/steps/metering_spec.rb` | Tests: Metering event building |
|
|
386
|
+
| `spec/legion/llm/fleet/dispatcher_spec.rb` | Tests: Fleet dispatch, availability, timeout |
|
|
387
|
+
| `spec/legion/llm/fleet/handler_spec.rb` | Tests: Fleet handler, auth, response building |
|
|
349
388
|
| `spec/spec_helper.rb` | Stubbed Legion::Logging and Legion::Settings for testing |
|
|
350
389
|
|
|
351
390
|
## Extension Integration
|
|
@@ -405,8 +444,8 @@ The legacy `vault_path` per-provider setting was removed in v0.3.1.
|
|
|
405
444
|
Tests run without the full LegionIO stack. `spec/spec_helper.rb` stubs `Legion::Logging` and `Legion::Settings` with in-memory implementations. Each test resets settings to defaults via `before(:each)`.
|
|
406
445
|
|
|
407
446
|
```bash
|
|
408
|
-
bundle exec rspec #
|
|
409
|
-
bundle exec rubocop #
|
|
447
|
+
bundle exec rspec # 712 examples, 0 failures
|
|
448
|
+
bundle exec rubocop # 113 files, 0 offenses
|
|
410
449
|
```
|
|
411
450
|
|
|
412
451
|
## Design Documents
|
|
Binary file
|
data/lib/legion/llm/batch.rb
CHANGED
|
@@ -78,6 +78,20 @@ module Legion
|
|
|
78
78
|
queue.size
|
|
79
79
|
end
|
|
80
80
|
|
|
81
|
+
# Returns a summary of current batch queue state.
|
|
82
|
+
def status
|
|
83
|
+
entries = queue.dup
|
|
84
|
+
oldest = entries.min_by { |e| e[:queued_at] }
|
|
85
|
+
{
|
|
86
|
+
enabled: enabled?,
|
|
87
|
+
queue_size: entries.size,
|
|
88
|
+
max_batch_size: settings.fetch(:max_batch_size, 100),
|
|
89
|
+
window_seconds: settings.fetch(:window_seconds, 300),
|
|
90
|
+
oldest_queued: oldest ? oldest[:queued_at].iso8601 : nil,
|
|
91
|
+
by_priority: entries.group_by { |e| e[:priority] }.transform_values(&:size)
|
|
92
|
+
}
|
|
93
|
+
end
|
|
94
|
+
|
|
81
95
|
# Clears the queue (useful for testing).
|
|
82
96
|
def reset!
|
|
83
97
|
@queue = []
|
|
@@ -15,6 +15,17 @@ module Legion
|
|
|
15
15
|
3 => %w[also then still even already yet again please note that]
|
|
16
16
|
}.freeze
|
|
17
17
|
|
|
18
|
+
SUMMARIZE_PROMPT = <<~PROMPT
|
|
19
|
+
Summarize this conversation concisely. Preserve:
|
|
20
|
+
- Key decisions and conclusions
|
|
21
|
+
- Code snippets and file paths
|
|
22
|
+
- Action items and next steps
|
|
23
|
+
- Technical details that would be needed to continue the conversation
|
|
24
|
+
|
|
25
|
+
Omit pleasantries, repetition, and verbose explanations.
|
|
26
|
+
Return only the summary, no preamble.
|
|
27
|
+
PROMPT
|
|
28
|
+
|
|
18
29
|
class << self
|
|
19
30
|
def compress(text, level: LIGHT)
|
|
20
31
|
return text if text.nil? || text.empty? || level <= NONE
|
|
@@ -28,6 +39,55 @@ module Legion
|
|
|
28
39
|
result
|
|
29
40
|
end
|
|
30
41
|
|
|
42
|
+
def summarize_messages(messages, max_tokens: 2000)
|
|
43
|
+
return { summary: '', original_count: 0 } if messages.nil? || messages.empty?
|
|
44
|
+
|
|
45
|
+
text = messages.map { |m| "#{m[:role]}: #{m[:content]}" }.join("\n\n")
|
|
46
|
+
return { summary: text, original_count: messages.size, compressed: false } if text.length < max_tokens * 4
|
|
47
|
+
|
|
48
|
+
summary = llm_summarize(text, max_tokens)
|
|
49
|
+
if summary
|
|
50
|
+
log_debug("summarize_messages: #{messages.size} messages -> #{summary.length} chars")
|
|
51
|
+
{ summary: summary, original_count: messages.size, compressed: true }
|
|
52
|
+
else
|
|
53
|
+
fallback = compress(text, level: AGGRESSIVE)
|
|
54
|
+
{ summary: fallback, original_count: messages.size, compressed: true, method: :stopword }
|
|
55
|
+
end
|
|
56
|
+
end
|
|
57
|
+
|
|
58
|
+
# Removes near-duplicate messages from a conversation history.
|
|
59
|
+
# Uses Jaccard similarity on word sets to detect duplicates.
|
|
60
|
+
# Keeps the last occurrence of similar messages.
|
|
61
|
+
#
|
|
62
|
+
# @param messages [Array<Hash>] messages with :role and :content keys
|
|
63
|
+
# @param threshold [Float] similarity threshold (0.0-1.0) above which messages are considered duplicates
|
|
64
|
+
# @return [Hash] { messages: Array, removed: Integer, original_count: Integer }
|
|
65
|
+
def deduplicate_messages(messages, threshold: 0.85)
|
|
66
|
+
return { messages: [], removed: 0, original_count: 0 } if messages.nil? || messages.empty?
|
|
67
|
+
|
|
68
|
+
kept = []
|
|
69
|
+
removed = 0
|
|
70
|
+
|
|
71
|
+
messages.reverse_each do |msg|
|
|
72
|
+
content = msg[:content].to_s
|
|
73
|
+
next kept.unshift(msg) if content.length < 20
|
|
74
|
+
|
|
75
|
+
duplicate = kept.any? do |existing|
|
|
76
|
+
next false unless existing[:role] == msg[:role]
|
|
77
|
+
|
|
78
|
+
jaccard_similarity(content, existing[:content].to_s) >= threshold
|
|
79
|
+
end
|
|
80
|
+
|
|
81
|
+
if duplicate
|
|
82
|
+
removed += 1
|
|
83
|
+
else
|
|
84
|
+
kept.unshift(msg)
|
|
85
|
+
end
|
|
86
|
+
end
|
|
87
|
+
|
|
88
|
+
{ messages: kept, removed: removed, original_count: messages.size }
|
|
89
|
+
end
|
|
90
|
+
|
|
31
91
|
def stopwords_for_level(level)
|
|
32
92
|
return [] if level <= NONE
|
|
33
93
|
|
|
@@ -71,6 +131,35 @@ module Legion
|
|
|
71
131
|
def collapse_whitespace(text)
|
|
72
132
|
text.gsub(/\n{3,}/, "\n\n")
|
|
73
133
|
end
|
|
134
|
+
|
|
135
|
+
def llm_summarize(text, max_tokens)
|
|
136
|
+
return nil unless defined?(Legion::LLM) && Legion::LLM.respond_to?(:chat_direct)
|
|
137
|
+
|
|
138
|
+
session = Legion::LLM.chat_direct(model: summarize_model)
|
|
139
|
+
response = session.ask("#{SUMMARIZE_PROMPT}\n\n#{text[0, max_tokens * 8]}")
|
|
140
|
+
response.content
|
|
141
|
+
rescue StandardError => e
|
|
142
|
+
log_debug("llm_summarize failed: #{e.message}")
|
|
143
|
+
nil
|
|
144
|
+
end
|
|
145
|
+
|
|
146
|
+
def summarize_model
|
|
147
|
+
(defined?(Legion::Settings) && Legion::Settings.dig(:llm, :compressor, :model)) || 'gpt-4o-mini'
|
|
148
|
+
end
|
|
149
|
+
|
|
150
|
+
def jaccard_similarity(text_a, text_b)
|
|
151
|
+
words_a = text_a.downcase.scan(/\w+/).to_set
|
|
152
|
+
words_b = text_b.downcase.scan(/\w+/).to_set
|
|
153
|
+
return 0.0 if words_a.empty? && words_b.empty?
|
|
154
|
+
|
|
155
|
+
intersection = (words_a & words_b).size.to_f
|
|
156
|
+
union = (words_a | words_b).size.to_f
|
|
157
|
+
union.zero? ? 0.0 : intersection / union
|
|
158
|
+
end
|
|
159
|
+
|
|
160
|
+
def log_debug(msg)
|
|
161
|
+
Legion::Logging.debug("Compressor: #{msg}") if defined?(Legion::Logging)
|
|
162
|
+
end
|
|
74
163
|
end
|
|
75
164
|
end
|
|
76
165
|
end
|
|
@@ -0,0 +1,51 @@
|
|
|
1
|
+
# frozen_string_literal: true

module Legion
  module LLM
    # Estimates request cost in USD from token counts and a static pricing
    # table, with fuzzy matching for dated/variant model ids.
    # Absorbed from lex-llm-gateway.
    module CostEstimator
      # Prices per 1M tokens [input, output] in USD
      # Source: published API pricing as of 2026-03
      PRICING = {
        'claude-opus-4-6' => [15.0, 75.0],
        'claude-sonnet-4-6' => [3.0, 15.0],
        'claude-haiku-4-5' => [0.80, 4.0],
        'claude-3-5-sonnet' => [3.0, 15.0],
        'claude-3-haiku' => [0.25, 1.25],
        'gpt-4o' => [2.50, 10.0],
        'gpt-4o-mini' => [0.15, 0.60],
        'gpt-4-turbo' => [10.0, 30.0],
        'o3' => [10.0, 40.0],
        'o3-mini' => [1.10, 4.40],
        'o4-mini' => [1.10, 4.40],
        'gemini-2.5-pro' => [1.25, 10.0],
        'gemini-2.5-flash' => [0.15, 0.60],
        'gemini-2.0-flash' => [0.10, 0.40]
      }.freeze

      # Fallback [input, output] price when the model is unknown.
      DEFAULT_PRICE = [1.0, 3.0].freeze

      module_function

      # Estimates the USD cost of a request.
      #
      # @param model_id [String, nil] model identifier (fuzzy-matched against PRICING)
      # @param input_tokens [Integer] prompt tokens
      # @param output_tokens [Integer] completion tokens
      # @return [Float] cost in USD, rounded to 6 decimal places
      def estimate(model_id:, input_tokens: 0, output_tokens: 0, **)
        price = resolve_price(model_id)
        input_cost = (input_tokens.to_i / 1_000_000.0) * price[0]
        output_cost = (output_tokens.to_i / 1_000_000.0) * price[1]
        (input_cost + output_cost).round(6)
      end

      # Resolves an [input, output] price pair: exact match first, then fuzzy,
      # then DEFAULT_PRICE.
      def resolve_price(model_id)
        return DEFAULT_PRICE unless model_id

        normalized = model_id.to_s.downcase
        PRICING[normalized] || fuzzy_match(normalized) || DEFAULT_PRICE
      end

      # Substring match against the pricing table. Longest keys are tried first
      # so specific variants win over their prefixes: a dated id like
      # "gpt-4o-mini-2024-07-18" matches "gpt-4o-mini", not "gpt-4o".
      # (Plain hash-insertion order would hit "gpt-4o" first — a ~17x price error.)
      def fuzzy_match(normalized)
        PRICING.keys.sort_by { |key| -key.length }.each do |key|
          return PRICING[key] if normalized.include?(key) || key.include?(normalized)
        end
        nil
      end
    end
  end
end
|
|
@@ -0,0 +1,73 @@
|
|
|
1
|
+
# frozen_string_literal: true

module Legion
  module LLM
    # Global in-memory record of model escalations with summary analytics.
    # History is capped at MAX_HISTORY entries; the oldest are evicted first.
    module EscalationTracker
      MAX_HISTORY = 200

      class << self
        # Records one escalation event and returns the stored entry.
        #
        # @param from_model [#to_s] model being escalated away from
        # @param to_model [#to_s] model being escalated to
        # @param reason [#to_s] why the escalation happened
        # @param tier_from [Object, nil] originating tier, if known
        # @param tier_to [Object, nil] destination tier, if known
        # @return [Hash] the recorded entry
        def record(from_model:, to_model:, reason:, tier_from: nil, tier_to: nil)
          event = {
            from_model: from_model.to_s,
            to_model: to_model.to_s,
            reason: reason.to_s,
            tier_from: tier_from,
            tier_to: tier_to,
            recorded_at: Time.now.utc
          }
          history.push(event)
          # Evict oldest entries beyond the cap.
          history.shift while history.size > MAX_HISTORY
          log_debug("escalation: #{from_model} -> #{to_model} reason=#{reason}")
          event
        end

        # Full escalation history (oldest first), lazily initialized.
        def history
          @history ||= []
        end

        # Drops all recorded history.
        def clear
          @history = []
        end

        # Aggregated view: total count, per-reason and per-model breakdowns,
        # and the 5 most recent entries (newest first).
        def summary
          snapshot = history.dup
          return empty_summary if snapshot.empty?

          {
            total_escalations: snapshot.size,
            by_reason: count_by(snapshot, :reason),
            by_target_model: count_by(snapshot, :to_model),
            by_source_model: count_by(snapshot, :from_model),
            recent: snapshot.last(5).reverse
          }
        end

        # Number of escalations recorded within the trailing window.
        #
        # @param window_seconds [Integer] lookback window
        # @return [Hash] { count:, window_seconds: }
        def escalation_rate(window_seconds: 3600)
          since = Time.now.utc - window_seconds
          in_window = history.count { |event| event[:recorded_at] >= since }
          { count: in_window, window_seconds: window_seconds }
        end

        private

        # Frequency of +key+ values across entries.
        def count_by(entries, key)
          entries.group_by { |event| event[key] }.transform_values(&:size)
        end

        # Shape-compatible summary for the empty-history case.
        def empty_summary
          {
            total_escalations: 0,
            by_reason: {},
            by_target_model: {},
            by_source_model: {},
            recent: []
          }
        end

        # Debug logging, skipped when Legion::Logging is not loaded.
        def log_debug(msg)
          Legion::Logging.debug("[EscalationTracker] #{msg}") if defined?(Legion::Logging)
        end
      end
    end
  end
end
|
|
@@ -0,0 +1,90 @@
|
|
|
1
|
+
# frozen_string_literal: true

module Legion
  module LLM
    module Fleet
      # Dispatches inference requests to fleet GPU workers over AMQP RPC and
      # blocks for a correlated reply. Absorbed from lex-llm-gateway.
      module Dispatcher
        DEFAULT_TIMEOUT = 30

        module_function

        # Publishes a fleet inference request and waits for its reply.
        # Returns an error hash when the fleet is unavailable or times out.
        #
        # @param model [String] target model id
        # @param messages [Array<Hash>] chat messages
        # @return [Hash] fleet response, or { success: false, error: ... }
        def dispatch(model:, messages:, **opts)
          return error_result('fleet_unavailable') unless fleet_available?

          correlation_id = "fleet_#{SecureRandom.hex(12)}"
          publish_request(model: model, messages: messages, intent: opts[:intent],
                          correlation_id: correlation_id, **opts.except(:intent, :timeout))

          wait_for_response(correlation_id, timeout: resolve_timeout(opts[:timeout]))
        end

        # Fleet is usable only when transport is connected AND routing allows it.
        def fleet_available?
          transport_ready? && fleet_enabled?
        end

        # True when Legion::Transport is loaded and reports a live connection.
        def transport_ready?
          return false unless defined?(Legion::Transport)
          return false unless Legion::Transport.respond_to?(:connected?)

          !!Legion::Transport.connected?
        end

        # Honors llm.routing.use_fleet; defaults to enabled when settings are
        # absent, unreadable, or not shaped as expected.
        def fleet_enabled?
          return true unless defined?(Legion::Settings)

          llm_settings = begin
            Legion::Settings[:llm]
          rescue StandardError
            nil
          end
          return true unless llm_settings.is_a?(Hash)

          routing = llm_settings[:routing]
          routing.is_a?(Hash) ? routing.fetch(:use_fleet, true) : true
        end

        # Explicit override wins; otherwise llm.routing.fleet.timeout_seconds,
        # falling back to DEFAULT_TIMEOUT.
        def resolve_timeout(override)
          return override if override
          return DEFAULT_TIMEOUT unless defined?(Legion::Settings)

          llm_settings = begin
            Legion::Settings[:llm]
          rescue StandardError
            nil
          end
          return DEFAULT_TIMEOUT unless llm_settings.is_a?(Hash)

          llm_settings.dig(:routing, :fleet, :timeout_seconds) || DEFAULT_TIMEOUT
        end

        # Publishes the AMQP inference request; no-op unless the gateway's
        # InferenceRequest message class is loaded.
        def publish_request(**)
          return unless defined?(Legion::Extensions::LLM::Gateway::Transport::Messages::InferenceRequest)

          Legion::Extensions::LLM::Gateway::Transport::Messages::InferenceRequest.new(
            reply_to: ReplyDispatcher.agent_queue_name, **
          ).publish
        end

        # Blocks on the reply future. Cancellation or a nil result both map to
        # a timeout hash; the correlation id is always deregistered.
        def wait_for_response(correlation_id, timeout:)
          future = ReplyDispatcher.register(correlation_id)
          reply = future.value!(timeout)
          reply || timeout_result(correlation_id, timeout)
        rescue Concurrent::CancelledOperationError
          timeout_result(correlation_id, timeout)
        ensure
          ReplyDispatcher.deregister(correlation_id)
        end

        # Error hash for a timed-out fleet request.
        def timeout_result(correlation_id, timeout)
          { success: false, error: 'fleet_timeout', correlation_id: correlation_id, timeout: timeout }
        end

        # Generic error hash.
        def error_result(reason)
          { success: false, error: reason }
        end
      end
    end
  end
end
|
|
@@ -0,0 +1,110 @@
|
|
|
1
|
+
# frozen_string_literal: true

module Legion
  module LLM
    module Fleet
      # Handles inbound fleet inference requests on GPU worker nodes:
      # optional JWT auth, local LLM invocation, and AMQP reply publishing.
      module Handler
        module_function

        # Entry point for a fleet request payload. Returns the response hash,
        # publishing it to payload[:reply_to] when a reply queue is given.
        # Auth is only enforced while the fleet is enabled.
        def handle_fleet_request(payload)
          reply_to = payload[:reply_to]

          if Dispatcher.fleet_enabled? && !valid_token?(payload[:signed_token])
            rejection = { success: false, error: 'invalid_token' }
            publish_reply(reply_to, payload[:correlation_id], rejection) if reply_to
            return rejection
          end

          result = build_response(payload[:correlation_id], call_local_llm(payload))
          publish_reply(reply_to, payload[:correlation_id], result) if reply_to
          result
        end

        # Validates the signed JWT. Open (true) when auth is not required or
        # Legion::Crypt is not loaded; validation errors fail closed.
        def valid_token?(token)
          return true unless require_auth?
          return false if token.nil?
          return true unless defined?(Legion::Crypt)

          !Legion::Crypt.validate_jwt(token).nil?
        rescue StandardError
          false
        end

        # Reads llm.routing.fleet.require_auth; defaults to false when settings
        # are absent, unreadable, or not shaped as expected.
        def require_auth?
          return false unless defined?(Legion::Settings)

          llm_settings = begin
            Legion::Settings[:llm]
          rescue StandardError
            nil
          end
          return false unless llm_settings.is_a?(Hash)

          fleet_config = llm_settings.dig(:routing, :fleet)
          fleet_config.is_a?(Hash) ? fleet_config.fetch(:require_auth, false) : false
        end

        # Routes the payload to the local LLM by request_type
        # (structured / embed / default chat).
        def call_local_llm(payload)
          return { error: 'llm_not_available' } unless defined?(Legion::LLM)

          case payload[:request_type]&.to_s
          when 'structured'
            Legion::LLM.structured_direct(messages: payload[:messages], schema: payload[:schema])
          when 'embed'
            input = payload[:text] || payload.dig(:messages, 0, :content)
            Legion::LLM.embed_direct(input, model: payload[:model])
          else
            Legion::LLM.chat_direct(model: payload[:model], message: payload.dig(:messages, 0, :content))
          end
        end

        # Wraps the raw LLM response with correlation id and usage metadata.
        def build_response(correlation_id, response)
          {
            correlation_id: correlation_id,
            response: response,
            input_tokens: extract_token(response, :input_tokens),
            output_tokens: extract_token(response, :output_tokens),
            thinking_tokens: extract_token(response, :thinking_tokens),
            provider: extract_field(response, :provider),
            model_id: extract_field(response, :model)
          }
        end

        # Publishes the JSON-encoded response to the requester's reply queue.
        # Best-effort: any failure is logged (when possible) and swallowed.
        def publish_reply(reply_to, correlation_id, response_hash)
          return unless defined?(Legion::Transport)

          body = if defined?(Legion::JSON)
                   Legion::JSON.dump(response_hash)
                 else
                   require 'json'
                   ::JSON.generate(response_hash)
                 end

          channel = Legion::Transport.connection.create_channel
          channel.default_exchange.publish(
            body,
            routing_key: reply_to,
            correlation_id: correlation_id,
            content_type: 'application/json'
          )
          channel.close
        rescue StandardError => e
          Legion::Logging.warn("Fleet::Handler: publish_reply failed: #{e.message}") if defined?(Legion::Logging)
        end

        # Usage counter from the response; 0 when the reader is absent.
        def extract_token(response, field)
          response.respond_to?(field) ? response.public_send(field).to_i : 0
        end

        # Metadata field from the response; nil when the reader is absent.
        def extract_field(response, field)
          response.respond_to?(field) ? response.public_send(field) : nil
        end
      end
    end
  end
end
|