RubyGems - legion-llm - Versions diffs - 0.4.0 → 0.4.2 - Mend

legion-llm 0.4.0 → 0.4.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (21)

checksums.yaml +4 -4
data/CHANGELOG.md +32 -0
data/CLAUDE.md +25 -18
data/lib/legion/llm/conversation_store.rb +182 -0
data/lib/legion/llm/errors.rb +43 -0
data/lib/legion/llm/pipeline/audit_publisher.rb +60 -0
data/lib/legion/llm/pipeline/enrichment_injector.rb +31 -0
data/lib/legion/llm/pipeline/executor.rb +136 -19
data/lib/legion/llm/pipeline/gaia_caller.rb +58 -0
data/lib/legion/llm/pipeline/steps/gaia_advisory.rb +64 -0
data/lib/legion/llm/pipeline/steps/mcp_discovery.rb +59 -0
data/lib/legion/llm/pipeline/steps/post_response.rb +59 -0
data/lib/legion/llm/pipeline/steps/rag_context.rb +85 -0
data/lib/legion/llm/pipeline/steps/rag_guard.rb +37 -0
data/lib/legion/llm/pipeline/steps/tool_calls.rb +63 -0
data/lib/legion/llm/pipeline/steps.rb +18 -0
data/lib/legion/llm/pipeline/tool_dispatcher.rb +81 -0
data/lib/legion/llm/pipeline.rb +5 -1
data/lib/legion/llm/version.rb +1 -1
data/lib/legion/llm.rb +18 -53
metadata +14 -1

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 2963ec3f995c8bab5b80af82d724155eb3590473bbf04e6eec7fe22c8625435a
-  data.tar.gz: 2fb50f72dfbe6867a388a50317eea6403e10c117e80f2f83fcb259e5601e9419
+  metadata.gz: 7d783c05981cd0272f10826212088d4054d5d001f06c0ab1d813ac8a0d40a3ca
+  data.tar.gz: 5eac65f6ad91f5f7296b05b5c4b1bf89af789176a3775cb8864553bec5c924d6
 SHA512:
-  metadata.gz: f6009ed5907f5cfc642cac46f0b626b3e8c888549c6de480ff4c39a569fe23901a281cc2ccd249fe605f88862ffc1392a684afc9ab4a1439520edb5f83cf6734
-  data.tar.gz: 0d5b8fcd87a585cc3da2c81ff737cabe25bcadf171ee58304f2544b4a601fc519666e6a54ff402923b39b597b41c527019fbc421eefbb6bbdf8afaab0ce75f7e
+  metadata.gz: d3c39286ae48876691530ba2baed4a2d047a7491e6a024baf9fba8101f1a536c19768e654bc78fdb413efa5275778ae00fc6225fbcfd8571894475b5eba775b8
+  data.tar.gz: f4e990dbc665730c22f212fcc643843186dd13c5dbef1c5dfe871b1a79deb79ac570f893441c64e75c498ab058e487c8a1062188b79a8e23e4c51dd10a343006

data/CHANGELOG.md CHANGED

@@ -1,5 +1,37 @@
 # Legion LLM Changelog
+## [0.4.2] - 2026-03-23
+### Added
+- `Pipeline::Steps::RagContext` (step 8): context strategy selector (full/rag_hybrid/rag/none) based on utilization, queries Apollo via `retrieve_relevant`
+- `Pipeline::Steps::RagGuard`: post-response faithfulness check against retrieved RAG context via `Hooks::RagGuard`
+- `Pipeline::EnrichmentInjector`: converts RAG and GAIA enrichments into system prompt text before provider call
+- `Pipeline::GaiaCaller`: privileged helper for GAIA/GAS LLM calls with system profile (skips governance steps)
+- `Pipeline::AuditPublisher`: publishes audit events to `llm.audit` exchange for GAS subscriber consumption
+- RAG/GAS full cycle integration test (4 examples: enrichment, injection, degradation, feedback loop prevention)
+## [0.4.1] - 2026-03-23
+### Added
+- Typed error hierarchy (`AuthError`, `RateLimitError`, `ContextOverflow`, `ProviderError`, `ProviderDown`, `UnsupportedCapability`, `PipelineError`) with `retryable?` predicate
+- `ConversationStore` with in-memory LRU hot layer (256 conversations) and optional DB persistence via Sequel
+- Streaming pipeline support via `Executor#call_stream` — pre/post steps run normally, chunks yielded to caller
+- Pipeline steps `context_load` (Step 3) and `context_store` (Step 15) now functional
+- `Pipeline::Steps::McpDiscovery` (step 9): discovers tools from all healthy MCP servers via `Legion::MCP::Client::Pool`
+- `Pipeline::ToolDispatcher`: routes tool calls to MCP client, LEX extension runner, or RubyLLM builtin
+- `Pipeline::Steps::ToolCalls` (step 14): dispatches non-builtin tool calls from LLM response via `ToolDispatcher`
+- `pipeline/steps.rb` aggregator for all step modules
+### Changed
+- Executor `step_provider_call` classifies Faraday errors into typed hierarchy
+- `chat`, `embed`, and `structured` route directly without gateway delegation
+- `_dispatch_embed` and `_dispatch_structured` removed; dispatch inlined
+### Removed
+- `lex-llm-gateway` auto-loading (`begin/rescue LoadError` block removed)
+- `gateway_loaded?` and `gateway_chat` helper methods
+- `_dispatch_embed` and `_dispatch_structured` indirection methods
 ## [0.4.0] - 2026-03-23
 ### Added

data/CLAUDE.md CHANGED

@@ -8,7 +8,7 @@
 Core LegionIO gem providing LLM capabilities to all extensions. Wraps ruby_llm to provide a consistent interface for chat, embeddings, tool use, and agents across multiple providers (Bedrock, Anthropic, OpenAI, Gemini, Ollama). Includes a dynamic weighted routing engine that dispatches requests across local, fleet, and cloud tiers based on caller intent, priority rules, time schedules, cost multipliers, and real-time provider health.
 **GitHub**: https://github.com/LegionIO/legion-llm
-**Version**: 0.4.0
+**Version**: 0.4.1
 **License**: Apache-2.0
 ## Architecture
@@ -33,6 +33,8 @@ Legion::LLM (lib/legion/llm.rb)
 ├── EscalationExhausted # Raised when all escalation attempts are exhausted
 ├── DaemonDeniedError   # Raised when daemon returns HTTP 403
 ├── DaemonRateLimitedError # Raised when daemon returns HTTP 429
+├── LLMError / AuthError / RateLimitError / ContextOverflow / ProviderError / ProviderDown / UnsupportedCapability / PipelineError # Typed error hierarchy with retryable?
+├── ConversationStore   # In-memory LRU (256 conversations) + optional DB persistence via Sequel
 ├── Settings         # Default config, provider settings, routing defaults, discovery defaults
 ├── Providers        # Provider configuration and Vault credential resolution (includes Azure `configure_azure`)
 ├── DaemonClient     # HTTP routing to LegionIO daemon with 30s health cache
@@ -58,8 +60,9 @@ Legion::LLM (lib/legion/llm.rb)
 │   ├── Tracing      # Distributed trace_id, span_id, exchange_id generation
 │   ├── Timeline     # Ordered event recording with participant tracking
 │   ├── Executor     # 18-step pipeline skeleton with profile-aware execution
-│   └── Steps/
-│       └── Metering # Metering event builder (absorbed from lex-llm-gateway)
+│   ├── Steps/
+│   │   └── Metering # Metering event builder (absorbed from lex-llm-gateway)
+│   └── Executor#call_stream # Streaming variant: pre-provider steps, yield chunks, post-provider steps
 ├── CostEstimator    # Model cost estimation with fuzzy pricing (absorbed from lex-llm-gateway)
 ├── Fleet            # Fleet RPC dispatch (absorbed from lex-llm-gateway)
 │   ├── Dispatcher   # Fleet dispatch with timeout and availability checks
@@ -107,16 +110,7 @@ Three-tier dispatch model. Local-first avoids unnecessary network hops; fleet of
 ### Gateway Integration (lex-llm-gateway)
-When `lex-llm-gateway` is installed, `chat`, `embed`, and `structured` automatically delegate to the gateway for metering and fleet dispatch. The gateway is loaded via `begin/rescue LoadError` — optional, not a hard dependency.
-```
-Caller → Legion::LLM.chat(message:)
-  └─ gateway loaded? → Gateway::Runners::Inference.chat (meters, fleet dispatch)
-       └─ Legion::LLM.chat_direct (routing, escalation, RubyLLM)
-  └─ no gateway? → Legion::LLM.chat_direct (same path, no metering)
-```
-The `_direct` variants (`chat_direct`, `embed_direct`, `structured_direct`) bypass gateway delegation. The gateway's `call_llm` uses these to avoid infinite recursion.
+Gateway delegation removed in v0.4.1. `chat`, `embed`, and `structured` route directly — no `begin/rescue LoadError` block, no `gateway_loaded?` check. The pipeline (when `pipeline_enabled: true`) handles metering and fleet dispatch natively. The `_direct` variants still exist as the canonical non-pipeline path for `chat_direct`, `embed_direct`, `structured_direct`.
 ### Integration with LegionIO
@@ -135,7 +129,7 @@ The `_direct` variants (`chat_direct`, `embed_direct`, `structured_direct`) bypa
 | `tzinfo` (>= 2.0) | IANA timezone conversion for schedule windows |
 | `legion-logging` | Logging |
 | `legion-settings` | Configuration |
-| `lex-llm-gateway` (optional) | Metering over RMQ, fleet RPC dispatch, disk spool — auto-loaded if present |
+| `lex-llm-gateway` (removed) | No longer auto-loaded; pipeline handles metering and fleet dispatch natively |
 ## Key Interfaces
@@ -329,7 +323,9 @@ In-memory signal consumer with pluggable handlers. Adjusts effective priorities
 | `lib/legion/llm/embeddings.rb` | Embeddings module: generate, generate_batch, default_model |
 | `lib/legion/llm/shadow_eval.rb` | Shadow evaluation: enabled?, should_sample?, evaluate, compare |
 | `lib/legion/llm/structured_output.rb` | JSON schema enforcement with native response_format and prompt fallback |
-| `lib/legion/llm/version.rb` | Version constant (0.3.15) |
+| `lib/legion/llm/errors.rb` | Typed error hierarchy: LLMError base + AuthError, RateLimitError, ContextOverflow, ProviderError, ProviderDown, UnsupportedCapability, PipelineError |
+| `lib/legion/llm/conversation_store.rb` | ConversationStore: in-memory LRU (256 slots) + optional Sequel DB persistence + spool fallback |
+| `lib/legion/llm/version.rb` | Version constant (0.4.2) |
 | `lib/legion/llm/quality_checker.rb` | QualityChecker module with QualityResult struct |
 | `lib/legion/llm/escalation_history.rb` | EscalationHistory mixin: `escalation_history`, `escalated?`, `final_resolution`, `escalation_chain` |
 | `lib/legion/llm/router/escalation_chain.rb` | EscalationChain value object |
@@ -343,6 +339,9 @@ In-memory signal consumer with pluggable handlers. Adjusts effective priorities
 | `lib/legion/llm/pipeline/timeline.rb` | Pipeline::Timeline: ordered event recording |
 | `lib/legion/llm/pipeline/executor.rb` | Pipeline::Executor: 18-step skeleton with profile-aware execution |
 | `lib/legion/llm/pipeline/steps/metering.rb` | Pipeline::Steps::Metering: metering event builder |
+| `lib/legion/llm/pipeline/steps/rag_context.rb` | Pipeline::Steps::RagContext: context strategy selection and Apollo retrieval (step 8) |
+| `lib/legion/llm/pipeline/steps/rag_guard.rb` | Pipeline::Steps::RagGuard: faithfulness check against retrieved RAG context |
+| `lib/legion/llm/pipeline/enrichment_injector.rb` | Pipeline::EnrichmentInjector: converts RAG/GAIA enrichments into system prompt |
 | `lib/legion/llm/cost_estimator.rb` | CostEstimator: model cost estimation with fuzzy pricing |
 | `lib/legion/llm/fleet.rb` | Fleet module: requires dispatcher, handler, reply_dispatcher |
 | `lib/legion/llm/fleet/dispatcher.rb` | Fleet::Dispatcher: fleet RPC dispatch |
@@ -373,7 +372,11 @@ In-memory signal consumer with pluggable handlers. Adjusts effective priorities
 | `spec/legion/llm/embeddings_spec.rb` | Embeddings tests |
 | `spec/legion/llm/shadow_eval_spec.rb` | ShadowEval tests |
 | `spec/legion/llm/structured_output_spec.rb` | StructuredOutput tests |
-| `spec/legion/llm/gateway_integration_spec.rb` | Tests: gateway delegation and _direct bypass |
+| `spec/legion/llm/errors_spec.rb` | Tests: typed error hierarchy, retryable? predicate |
+| `spec/legion/llm/conversation_store_spec.rb` | Tests: LRU eviction, append, messages, DB fallback |
+| `spec/legion/llm/pipeline/executor_stream_spec.rb` | Tests: call_stream chunk yielding, pre/post steps |
+| `spec/legion/llm/pipeline/streaming_integration_spec.rb` | Tests: streaming end-to-end with ConversationStore |
+| `spec/legion/llm/gateway_integration_spec.rb` | Tests: gateway teardown — verifies no delegation |
 | `spec/legion/llm/cost_estimator_spec.rb` | Tests: cost estimation, fuzzy matching, pricing table |
 | `spec/legion/llm/pipeline/request_spec.rb` | Tests: Request struct builder, legacy adapter |
 | `spec/legion/llm/pipeline/response_spec.rb` | Tests: Response struct builder, RubyLLM adapter, #with |
@@ -385,6 +388,10 @@ In-memory signal consumer with pluggable handlers. Adjusts effective priorities
 | `spec/legion/llm/pipeline/steps/metering_spec.rb` | Tests: Metering event building |
 | `spec/legion/llm/fleet/dispatcher_spec.rb` | Tests: Fleet dispatch, availability, timeout |
 | `spec/legion/llm/fleet/handler_spec.rb` | Tests: Fleet handler, auth, response building |
+| `spec/legion/llm/pipeline/steps/rag_context_spec.rb` | Tests: RAG context strategy selection, Apollo retrieval, graceful degradation |
+| `spec/legion/llm/pipeline/steps/rag_guard_spec.rb` | Tests: RAG faithfulness checking |
+| `spec/legion/llm/pipeline/enrichment_injector_spec.rb` | Tests: enrichment injection into system prompt |
+| `spec/legion/llm/pipeline/rag_gas_integration_spec.rb` | Tests: RAG/GAS full cycle integration |
 | `spec/spec_helper.rb` | Stubbed Legion::Logging and Legion::Settings for testing |
 ## Extension Integration
@@ -444,8 +451,8 @@ The legacy `vault_path` per-provider setting was removed in v0.3.1.
 Tests run without the full LegionIO stack. `spec/spec_helper.rb` stubs `Legion::Logging` and `Legion::Settings` with in-memory implementations. Each test resets settings to defaults via `before(:each)`.
 ```bash
-bundle exec rspec    # 712 examples, 0 failures
-bundle exec rubocop  # 113 files, 0 offenses
+bundle exec rspec    # 794 examples, 0 failures
+bundle exec rubocop  # 142 files, 0 offenses
 ```
 ## Design Documents

data/lib/legion/llm/conversation_store.rb ADDED

@@ -0,0 +1,182 @@
+# frozen_string_literal: true
+module Legion
+  module LLM
+    module ConversationStore
+      MAX_CONVERSATIONS = 256
+      class << self
+        def append(conversation_id, role:, content:, **metadata)
+          ensure_conversation(conversation_id)
+          seq = next_seq(conversation_id)
+          msg = { seq: seq, role: role, content: content, created_at: Time.now, **metadata }
+          conversations[conversation_id][:messages] << msg
+          touch(conversation_id)
+          persist_message(conversation_id, msg)
+          msg
+        end
+        def messages(conversation_id)
+          if in_memory?(conversation_id)
+            touch(conversation_id)
+            conversations[conversation_id][:messages].sort_by { |m| m[:seq] }
+          else
+            load_from_db(conversation_id)
+          end
+        end
+        def create_conversation(conversation_id, **metadata)
+          conversations[conversation_id] = { messages: [], metadata: metadata, accessed_at: Time.now }
+          evict_if_needed
+          persist_conversation(conversation_id, metadata)
+        end
+        def conversation_exists?(conversation_id)
+          in_memory?(conversation_id) || db_conversation_exists?(conversation_id)
+        end
+        def in_memory?(conversation_id)
+          conversations.key?(conversation_id)
+        end
+        def reset!
+          @conversations = {}
+        end
+        private
+        def conversations
+          @conversations ||= {}
+        end
+        def ensure_conversation(conversation_id)
+          return if in_memory?(conversation_id)
+          create_conversation(conversation_id)
+        end
+        def next_seq(conversation_id)
+          msgs = conversations[conversation_id][:messages]
+          msgs.empty? ? 1 : msgs.last[:seq] + 1
+        end
+        def touch(conversation_id)
+          return unless in_memory?(conversation_id)
+          conversations[conversation_id][:accessed_at] = Time.now
+        end
+        def evict_if_needed
+          return unless conversations.size > self::MAX_CONVERSATIONS
+          oldest_id = conversations.min_by { |_, v| v[:accessed_at] }&.first
+          conversations.delete(oldest_id) if oldest_id
+        end
+        def persist_message(conversation_id, msg)
+          return unless db_available?
+          db_append_message(conversation_id, msg)
+        rescue StandardError => e
+          spool_message(conversation_id, msg)
+          Legion::Logging.warn("ConversationStore persist failed, spooled: #{e.message}") if defined?(Legion::Logging)
+        end
+        def persist_conversation(conversation_id, metadata)
+          return unless db_available?
+          db_create_conversation(conversation_id, metadata)
+        rescue StandardError => e
+          Legion::Logging.warn("ConversationStore conversation persist failed: #{e.message}") if defined?(Legion::Logging)
+        end
+        def load_from_db(conversation_id)
+          return [] unless db_available?
+          db_load_messages(conversation_id)
+        rescue StandardError
+          []
+        end
+        def db_conversation_exists?(conversation_id)
+          return false unless db_available?
+          db_conversation_record?(conversation_id)
+        rescue StandardError
+          false
+        end
+        def db_available?
+          defined?(Legion::Data) &&
+            Legion::Data.respond_to?(:connection) &&
+            Legion::Data.connection.respond_to?(:table_exists?) &&
+            Legion::Data.connection.table_exists?(:conversations)
+        rescue StandardError
+          false
+        end
+        def db_create_conversation(conversation_id, metadata)
+          Legion::Data.connection[:conversations].insert_ignore.insert(
+            id:              conversation_id,
+            caller_identity: metadata[:caller_identity],
+            metadata:        metadata.to_json,
+            created_at:      Time.now,
+            updated_at:      Time.now
+          )
+        end
+        def db_append_message(conversation_id, msg)
+          Legion::Data.connection[:conversation_messages].insert(
+            conversation_id: conversation_id,
+            seq:             msg[:seq],
+            role:            msg[:role].to_s,
+            content:         msg[:content],
+            provider:        msg[:provider]&.to_s,
+            model:           msg[:model]&.to_s,
+            input_tokens:    msg[:input_tokens],
+            output_tokens:   msg[:output_tokens],
+            created_at:      msg[:created_at] || Time.now
+          )
+        end
+        def db_load_messages(conversation_id)
+          Legion::Data.connection[:conversation_messages]
+                      .where(conversation_id: conversation_id)
+                      .order(:seq)
+                      .map { |row| symbolize_message(row) }
+        end
+        def db_conversation_record?(conversation_id)
+          Legion::Data.connection[:conversations].where(id: conversation_id).any?
+        end
+        def symbolize_message(row)
+          {
+            seq:           row[:seq],
+            role:          row[:role]&.to_sym,
+            content:       row[:content],
+            provider:      row[:provider]&.to_sym,
+            model:         row[:model],
+            input_tokens:  row[:input_tokens],
+            output_tokens: row[:output_tokens],
+            created_at:    row[:created_at]
+          }
+        end
+        def spool_message(conversation_id, msg)
+          return unless defined?(Legion::Data::Spool)
+          dir = File.join(spool_root, 'conversations')
+          FileUtils.mkdir_p(dir)
+          filename = "#{Time.now.strftime('%s%9N')}-#{SecureRandom.uuid}.json"
+          payload = { conversation_id: conversation_id, message: msg }
+          File.write(File.join(dir, filename), payload.to_json)
+        end
+        def spool_root
+          @spool_root ||= File.expand_path('~/.legionio/data/spool/llm')
+        end
+      end
+    end
+  end
+end
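
Note on the eviction policy: the hot layer above evicts the conversation with the oldest `accessed_at` once the hash grows past `MAX_CONVERSATIONS`, and both `append` and `messages` refresh that timestamp. A minimal standalone sketch of the same idea follows. `TinyLruStore` and its logical clock are hypothetical stand-ins, not part of the gem, and the sketch omits the DB and spool paths entirely.

```ruby
# Hypothetical stand-in for the in-memory hot layer; NOT the gem's code.
class TinyLruStore
  def initialize(max_size)
    @max_size = max_size
    @entries  = {}
    @clock    = 0 # logical clock instead of Time.now, so ordering is deterministic
  end

  # Adds a message, creating the conversation on first use.
  def append(id, message)
    entry = (@entries[id] ||= { messages: [], accessed_at: 0 })
    entry[:messages] << message
    touch(id)
    evict_if_needed
    message
  end

  # Reading a conversation also counts as an access.
  def messages(id)
    return [] unless @entries.key?(id)

    touch(id)
    @entries[id][:messages].dup
  end

  def in_memory?(id) = @entries.key?(id)

  private

  def touch(id)
    @clock += 1
    @entries[id][:accessed_at] = @clock
  end

  # Drops the single least recently accessed conversation when over the limit.
  def evict_if_needed
    return unless @entries.size > @max_size

    oldest_id, = @entries.min_by { |_, v| v[:accessed_at] }
    @entries.delete(oldest_id)
  end
end

store = TinyLruStore.new(2)
store.append('a', 'hello')
store.append('b', 'hi')
store.messages('a')      # reading 'a' makes 'b' the least recently used
store.append('c', 'hey') # exceeds the limit of 2, so 'b' is evicted

p store.in_memory?('a') # => true
p store.in_memory?('b') # => false
p store.in_memory?('c') # => true
```

In the gem's version an evicted conversation is not necessarily lost: `messages` falls back to `load_from_db` when the id is no longer in memory and a database is available.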

data/lib/legion/llm/errors.rb ADDED

@@ -0,0 +1,43 @@
+# frozen_string_literal: true
+module Legion
+  module LLM
+    class LLMError < StandardError
+      def retryable? = false
+    end
+    class AuthError < LLMError; end
+    class RateLimitError < LLMError
+      attr_reader :retry_after
+      def initialize(msg = nil, retry_after: nil)
+        @retry_after = retry_after
+        super(msg)
+      end
+      def retryable? = true
+    end
+    class ContextOverflow < LLMError
+      def retryable? = true
+    end
+    class ProviderError < LLMError
+      def retryable? = true
+    end
+    class ProviderDown < LLMError; end
+    class UnsupportedCapability < LLMError; end
+    class PipelineError < LLMError
+      attr_reader :step
+      def initialize(msg = nil, step: nil)
+        @step = step
+        super(msg)
+      end
+    end
+  end
+end
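
Note on `retryable?`: per the file above, `RateLimitError`, `ContextOverflow`, and `ProviderError` return true, while `AuthError`, `ProviderDown`, `UnsupportedCapability`, and `PipelineError` inherit the base `false`. The sketch below shows one way a caller might use the predicate. It redefines a reduced copy of the hierarchy under a hypothetical `Sketch` namespace so it runs on its own, and `with_retries` is an illustrative helper that does not exist in the gem.

```ruby
module Sketch
  # Reduced copy of the hierarchy from the diff, for a self-contained example.
  class LLMError < StandardError
    def retryable? = false
  end

  class AuthError < LLMError; end

  class RateLimitError < LLMError
    attr_reader :retry_after

    def initialize(msg = nil, retry_after: nil)
      @retry_after = retry_after
      super(msg)
    end

    def retryable? = true
  end

  # Hypothetical helper: re-runs the block while the error says it is retryable.
  def self.with_retries(max_attempts: 3)
    attempts = 0
    begin
      attempts += 1
      yield attempts
    rescue LLMError => e
      # Give up on non-retryable errors or when the attempt budget is spent.
      raise unless e.retryable? && attempts < max_attempts

      # Honor the server hint when the error carries one.
      sleep(e.retry_after) if e.respond_to?(:retry_after) && e.retry_after
      retry
    end
  end
end

calls = 0
result = Sketch.with_retries do |attempt|
  calls += 1
  raise Sketch::RateLimitError.new('slow down', retry_after: 0) if attempt < 3

  "ok after #{attempt} attempts"
end
puts result # => ok after 3 attempts

auth_calls = 0
begin
  Sketch.with_retries do
    auth_calls += 1
    raise Sketch::AuthError, 'bad key'
  end
rescue Sketch::AuthError
  puts "auth error raised after #{auth_calls} call" # => auth error raised after 1 call
end
```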

data/lib/legion/llm/pipeline/audit_publisher.rb ADDED

@@ -0,0 +1,60 @@
+# frozen_string_literal: true
+module Legion
+  module LLM
+    module Pipeline
+      module AuditPublisher
+        EXCHANGE    = 'llm.audit'
+        ROUTING_KEY = 'llm.audit.complete'
+        module_function
+        def build_event(request:, response:)
+          {
+            request_id:       response.request_id,
+            conversation_id:  response.conversation_id,
+            caller:           response.caller,
+            routing:          response.routing,
+            tokens:           response.tokens,
+            cost:             response.cost,
+            enrichments:      response.enrichments,
+            audit:            response.audit,
+            timeline:         response.timeline,
+            timestamps:       response.timestamps,
+            classification:   response.classification,
+            tracing:          response.tracing,
+            messages:         request.messages,
+            response_content: response.message[:content],
+            tools_used:       response.tools,
+            timestamp:        Time.now
+          }
+        end
+        def publish(request:, response:)
+          event = build_event(request: request, response: response)
+          begin
+            if defined?(Legion::Transport) &&
+               defined?(Legion::Transport::Messages::Dynamic)
+              Legion::Transport::Messages::Dynamic.new(
+                function:    'llm_audit',
+                opts:        event,
+                exchange:    EXCHANGE,
+                routing_key: ROUTING_KEY
+              ).publish
+            elsif defined?(Legion::Logging)
+              Legion::Logging.debug('audit publish skipped: transport unavailable')
+            end
+          rescue StandardError => e
+            Legion::Logging.warn("audit publish failed: #{e.message}") if defined?(Legion::Logging)
+          end
+          event
+        rescue StandardError => e
+          Legion::Logging.warn("audit build_event failed: #{e.message}") if defined?(Legion::Logging)
+          nil
+        end
+      end
+    end
+  end
+end

data/lib/legion/llm/pipeline/enrichment_injector.rb ADDED

@@ -0,0 +1,31 @@
+# frozen_string_literal: true
+module Legion
+  module LLM
+    module Pipeline
+      module EnrichmentInjector
+        module_function
+        def inject(system:, enrichments:)
+          parts = []
+          # GAIA system prompt (highest priority)
+          if (gaia = enrichments.dig('gaia:system_prompt', :content))
+            parts << gaia
+          end
+          # RAG context
+          if (rag = enrichments.dig('rag:context_retrieval', :data, :entries))
+            context_text = rag.map { |e| "[#{e[:content_type]}] #{e[:content]}" }.join("\n")
+            parts << "Relevant context:\n#{context_text}" unless context_text.empty?
+          end
+          return system if parts.empty?
+          parts << system if system
+          parts.join("\n\n")
+        end
+      end
+    end
+  end
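
Note on ordering: `inject` places the GAIA system prompt first, the RAG context second, and the caller's own system prompt last, joined by blank lines. When no enrichment matches, it returns `system` unchanged. The sketch below copies the method body into a hypothetical `InjectorSketch` module so it runs standalone. The sample `enrichments` hash is an assumption inferred from the `dig` calls (string keys at the top level, symbol keys below), not a documented payload.

```ruby
# Standalone copy of the inject logic under a hypothetical module name.
module InjectorSketch
  module_function

  def inject(system:, enrichments:)
    parts = []

    # GAIA system prompt goes first.
    if (gaia = enrichments.dig('gaia:system_prompt', :content))
      parts << gaia
    end

    # RAG entries are rendered one per line as "[content_type] content".
    if (rag = enrichments.dig('rag:context_retrieval', :data, :entries))
      context_text = rag.map { |e| "[#{e[:content_type]}] #{e[:content]}" }.join("\n")
      parts << "Relevant context:\n#{context_text}" unless context_text.empty?
    end

    return system if parts.empty?

    parts << system if system
    parts.join("\n\n")
  end
end

# Assumed enrichment shape, for illustration only.
enrichments = {
  'gaia:system_prompt'    => { content: 'Answer briefly.' },
  'rag:context_retrieval' => {
    data: { entries: [{ content_type: :fact, content: 'The sky is blue.' }] }
  }
}

prompt = InjectorSketch.inject(system: 'You are a helpful assistant.', enrichments: enrichments)
puts prompt
# Answer briefly.
#
# Relevant context:
# [fact] The sky is blue.
#
# You are a helpful assistant.
```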
+end