RubyGems - legion-llm - Versions diffs - 0.5.4 → 0.5.6 - Mend

legion-llm 0.5.4 → 0.5.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (11)

checksums.yaml +4 -4
data/CHANGELOG.md +15 -0
data/CLAUDE.md +6 -6
data/README.md +26 -3
data/lib/legion/llm/pipeline/audit_publisher.rb +4 -8
data/lib/legion/llm/pipeline/steps/rag_context.rb +77 -30
data/lib/legion/llm/settings.rb +15 -1
data/lib/legion/llm/transport/exchanges/audit.rb +15 -0
data/lib/legion/llm/transport/messages/audit_event.rb +19 -0
data/lib/legion/llm/version.rb +1 -1
metadata +3 -1

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 888d6f2f71dee1e592ffdf92bc132355fcfd7900752e4c8df4151edebc32b8bb
-  data.tar.gz: 1596d8e3819ee82e812aa434ffe6fdb39c5b4418b48bd034ed9da54b3f76e1a8
+  metadata.gz: 734e9ae7b47f1cef605a851314bf37e6bbf1a05db5dd61fc9aa0a29562a48ec6
+  data.tar.gz: 9a4e5519dd6ca8e00903c23cbb26e14ee2b2a24241c342af72c1702ecbb5864b
 SHA512:
-  metadata.gz: ee78d1dad4322e967231c1c90dca5082f0e9bfd56e474a0906281cf4507649076b0a75b0eb9cfba9ddb94a7beb54693a2d3dbd3ec1185bc6b15899f2b6f50c66
-  data.tar.gz: e7cab9bfdcb2e602c1d7a1565faf78fca52bbfe18ac8763d82e58949f9f577c14ce3c3e9719035a9739145cab2428e800d2b9df29883da509c4055f7234ae199
+  metadata.gz: 5a678f5944f951a7c55892bc530bffb006f34b478b2958d07ff3b290705b0ed114e224bfce6cdf998a5a70e0e1864da52d71c7d2aedc38eb879dce9e190987d6
+  data.tar.gz: '093d34861b714964b85bd8b3275f1872b3c0461037f621765f5e2b3178f9ddbe8e2a28a919e85c2ef09455c978e9a25383e35900cbe02dcaf3cc6478806a30d0'

data/CHANGELOG.md CHANGED

@@ -1,5 +1,20 @@
 # Legion LLM Changelog
+## [0.5.6] - 2026-03-24
+### Fixed
+- `AuditPublisher` now uses dedicated `Transport::Messages::AuditEvent` message class instead of `Messages::Dynamic` (Dynamic ignores `exchange:`/`routing_key:` kwargs and requires a `function_id` DB lookup — audit events were never reaching RabbitMQ)
+- Added `Transport::Exchanges::Audit` exchange class for the `llm.audit` topic exchange
+- Added `Transport::Messages::AuditEvent` message class with `routing_key 'llm.audit.complete'`
+## [0.5.5] - 2026-03-24
+### Changed
+- RAG context step now fires on almost all queries, not just long conversations
+- RAG only skips trivial queries (greetings, pings) when strategy is auto
+- All RAG thresholds configurable via `Legion::Settings[:llm][:rag]`: `full_limit`, `compact_limit`, `min_confidence`, `utilization_compact_threshold`, `utilization_skip_threshold`, `trivial_max_chars`, `trivial_patterns`
+- Strategy logic inverted: low utilization gets full RAG (room for context), high utilization gets compact, very high skips
 ## [0.5.4] - 2026-03-24
 ### Added
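
The inverted strategy logic described in the 0.5.5 entry can be sketched as a standalone function. This is a minimal illustration, not the gem's code: `pick_rag_strategy` and `RAG_THRESHOLD_DEFAULTS` are hypothetical names, and the thresholds are the defaults that appear in the `settings.rb` hunk of this diff.

```ruby
# Hypothetical standalone helper. In the gem, the equivalent logic is a
# private method of the RagContext pipeline step and reads its thresholds
# from Legion::Settings[:llm][:rag].
RAG_THRESHOLD_DEFAULTS = {
  utilization_compact_threshold: 0.7,
  utilization_skip_threshold:    0.9
}.freeze

# utilization: fraction of the context window already used (0.0..1.0).
# explicit:    a caller-supplied strategy; anything other than nil/:auto wins.
def pick_rag_strategy(utilization, explicit: nil, settings: RAG_THRESHOLD_DEFAULTS)
  return explicit if explicit && explicit != :auto

  if utilization >= settings.fetch(:utilization_skip_threshold, 0.9)
    :none        # very high utilization: skip retrieval entirely
  elsif utilization >= settings.fetch(:utilization_compact_threshold, 0.7)
    :rag_compact # high utilization: fewer entries
  else
    :rag         # low utilization: room for full retrieval
  end
end

pick_rag_strategy(0.10)                 # => :rag
pick_rag_strategy(0.75)                 # => :rag_compact
pick_rag_strategy(0.95)                 # => :none
pick_rag_strategy(0.95, explicit: :rag) # => :rag
```

Both comparisons use `>=`, so a utilization of exactly 0.7 already selects the compact strategy and exactly 0.9 skips retrieval.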

data/CLAUDE.md CHANGED

@@ -8,7 +8,7 @@
 Core LegionIO gem providing LLM capabilities to all extensions. Wraps ruby_llm to provide a consistent interface for chat, embeddings, tool use, and agents across multiple providers (Bedrock, Anthropic, OpenAI, Gemini, Ollama). Includes a dynamic weighted routing engine that dispatches requests across local, fleet, and cloud tiers based on caller intent, priority rules, time schedules, cost multipliers, and real-time provider health.
 **GitHub**: https://github.com/LegionIO/legion-llm
-**Version**: 0.4.1
+**Version**: 0.5.3
 **License**: Apache-2.0
 ## Architecture
@@ -110,7 +110,7 @@ Three-tier dispatch model. Local-first avoids unnecessary network hops; fleet of
 ### Gateway Integration (lex-llm-gateway)
-Gateway delegation removed in v0.4.1. `chat`, `embed`, and `structured` route directly — no `begin/rescue LoadError` block, no `gateway_loaded?` check. The pipeline (when `pipeline_enabled: true`) handles metering and fleet dispatch natively. The `_direct` variants still exist as the canonical non-pipeline path for `chat_direct`, `embed_direct`, `structured_direct`.
+Gateway delegation removed in v0.4.1. `chat`, `embed`, and `structured` route directly — no `begin/rescue LoadError` block, no `gateway_loaded?` check. The pipeline (enabled by default since v0.4.8) handles metering and fleet dispatch natively. The `_direct` variants still exist as the canonical non-pipeline path for `chat_direct`, `embed_direct`, `structured_direct`.
 ### Integration with LegionIO
@@ -187,7 +187,7 @@ Settings read from `Legion::Settings[:llm]`:
 |-----|------|---------|-------------|
 | `enabled` | Boolean | `true` | Enable LLM support |
 | `connected` | Boolean | `false` | Set to true after successful start |
-| `pipeline_enabled` | Boolean | `false` | Enable 18-step pipeline for chat() dispatch |
+| `pipeline_enabled` | Boolean | `true` | Enable 18-step pipeline for chat() dispatch (enabled by default since v0.4.8) |
 | `default_model` | String | `nil` | Default model ID (auto-detected if nil) |
 | `default_provider` | Symbol | `nil` | Default provider (auto-detected if nil) |
 | `providers` | Hash | See below | Per-provider configuration |
@@ -325,7 +325,7 @@ In-memory signal consumer with pluggable handlers. Adjusts effective priorities
 | `lib/legion/llm/structured_output.rb` | JSON schema enforcement with native response_format and prompt fallback |
 | `lib/legion/llm/errors.rb` | Typed error hierarchy: LLMError base + AuthError, RateLimitError, ContextOverflow, ProviderError, ProviderDown, UnsupportedCapability, PipelineError |
 | `lib/legion/llm/conversation_store.rb` | ConversationStore: in-memory LRU (256 slots) + optional Sequel DB persistence + spool fallback |
-| `lib/legion/llm/version.rb` | Version constant (0.4.2) |
+| `lib/legion/llm/version.rb` | Version constant (0.5.3) |
 | `lib/legion/llm/quality_checker.rb` | QualityChecker module with QualityResult struct |
 | `lib/legion/llm/escalation_history.rb` | EscalationHistory mixin: `escalation_history`, `escalated?`, `final_resolution`, `escalation_chain` |
 | `lib/legion/llm/router/escalation_chain.rb` | EscalationChain value object |
@@ -451,8 +451,8 @@ The legacy `vault_path` per-provider setting was removed in v0.3.1.
 Tests run without the full LegionIO stack. `spec/spec_helper.rb` stubs `Legion::Logging` and `Legion::Settings` with in-memory implementations. Each test resets settings to defaults via `before(:each)`.
 ```bash
-bundle exec rspec    # 794 examples, 0 failures
-bundle exec rubocop  # 142 files, 0 offenses
+bundle exec rspec    # 882 examples, 0 failures
+bundle exec rubocop  # 0 offenses
 ```
 ## Design Documents

data/README.md CHANGED

@@ -2,7 +2,7 @@
 LLM integration for the [LegionIO](https://github.com/LegionIO/LegionIO) framework. Wraps [ruby_llm](https://github.com/crmne/ruby_llm) to provide chat, embeddings, tool use, and agent capabilities to any Legion extension.
-**Version**: 0.3.15
+**Version**: 0.5.3
 ## Installation
@@ -280,9 +280,32 @@ session.with_tools(CodeAnalyzer, SecurityScanner)
 response = session.ask("Review this PR: #{diff}")
 ```
+### Unified Pipeline
+All `chat()` calls flow through an 18-step request/response pipeline (enabled by default since v0.4.8). The pipeline handles RBAC, classification, RAG context retrieval, MCP tool discovery, metering, billing, audit, and GAIA advisory in a consistent sequence. Steps are skipped based on the caller profile (`:external`, `:gaia`, `:system`).
+```ruby
+# Pipeline is enabled by default — no configuration needed
+result = Legion::LLM.chat(message: "hello")
+# Disable pipeline for a specific call (not recommended — use caller: profile instead)
+# Set pipeline_enabled: false in settings to disable globally
+```
+The pipeline accepts a `caller:` hash describing the request origin:
+```ruby
+Legion::LLM.chat(
+  message: "hello",
+  caller: { requested_by: { identity: "user@example.com", type: :human, credential: :jwt } }
+)
+```
+System callers (type: `:system`) derive the `:system` profile, which skips governance steps to prevent recursion.
 ### Routing
-legion-llm includes a dynamic weighted routing engine that dispatches requests across local, fleet, and cloud tiers based on caller intent, priority rules, time schedules, cost multipliers, and real-time provider health. Routing is **disabled by default** — opt in via settings.
+legion-llm includes a dynamic weighted routing engine that dispatches requests across local, fleet, and cloud tiers based on caller intent, priority rules, time schedules, cost multipliers, and real-time provider health. Routing is **disabled by default** — opt in by setting `routing.enabled: true` in settings.
 #### Three Tiers
@@ -629,7 +652,7 @@ bundle exec rspec
 Tests use stubbed `Legion::Logging` and `Legion::Settings` modules (no need for the full LegionIO stack):
 ```bash
-bundle exec rspec                              # Run all 304 tests
+bundle exec rspec                              # Run all 882 tests
 bundle exec rubocop                            # Lint (0 offenses)
 bundle exec rspec spec/legion/llm_spec.rb      # Run specific test file
 bundle exec rspec spec/legion/llm/router_spec.rb  # Router tests only

data/lib/legion/llm/pipeline/audit_publisher.rb CHANGED

@@ -34,14 +34,10 @@ module Legion
           event = build_event(request: request, response: response)
           begin
-            if defined?(Legion::Transport) &&
-               defined?(Legion::Transport::Messages::Dynamic)
-              Legion::Transport::Messages::Dynamic.new(
-                function:    'llm_audit',
-                opts:        event,
-                exchange:    EXCHANGE,
-                routing_key: ROUTING_KEY
-              ).publish
+            if defined?(Legion::Transport)
+              require 'legion/llm/transport/exchanges/audit'
+              require 'legion/llm/transport/messages/audit_event'
+              Legion::LLM::Transport::Messages::AuditEvent.new(**event).publish
             elsif defined?(Legion::Logging)
               Legion::Logging.debug('audit publish skipped: transport unavailable')
             end

data/lib/legion/llm/pipeline/steps/rag_context.rb CHANGED

@@ -6,32 +6,64 @@ module Legion
       module Steps
         module RagContext
           def step_rag_context
-            strategy = select_context_strategy(utilization: estimate_utilization)
-            return if %i[none full].include?(strategy)
+            return unless rag_enabled?
+            return unless substantive_query?
+            return unless apollo_available_or_warn?
-            unless apollo_available?
-              @warnings << 'Apollo unavailable for RAG context retrieval'
-              return
-            end
+            strategy = select_context_strategy(utilization: estimate_utilization)
+            return if strategy == :none
             query = extract_query
-            return if query.nil? || query.empty?
             start_time = Time.now
             result = apollo_retrieve(query: query, strategy: strategy)
+            record_rag_enrichment(result, strategy)
+            record_rag_timeline(result, strategy, start_time)
+          rescue StandardError => e
+            @warnings << "RAG context error: #{e.message}"
+          end
-            if result && result[:success] && result[:entries]&.any?
-              @enrichments['rag:context_retrieval'] = {
-                content:   "#{result[:count]} entries retrieved via #{strategy}",
-                data:      {
-                  entries:  result[:entries],
-                  strategy: strategy,
-                  count:    result[:count]
-                },
-                timestamp: Time.now
-              }
-            end
+          private
+          def rag_settings
+            @rag_settings ||= if defined?(Legion::Settings) && !Legion::Settings[:llm].nil?
+                                Legion::Settings[:llm][:rag] || {}
+                              else
+                                {}
+                              end
+          end
+          def rag_enabled?
+            rag_settings.fetch(:enabled, true)
+          end
+          def substantive_query?
+            query = extract_query
+            return false if query.nil? || query.empty?
+            auto_strategy = @request.context_strategy.nil? || @request.context_strategy == :auto
+            return true unless auto_strategy
+            !trivial_query?(query)
+          end
+          def apollo_available_or_warn?
+            return true if apollo_available?
+            @warnings << 'Apollo unavailable for RAG context retrieval'
+            false
+          end
+          def record_rag_enrichment(result, strategy)
+            return unless result && result[:success] && result[:entries]&.any?
+            @enrichments['rag:context_retrieval'] = {
+              content:   "#{result[:count]} entries retrieved via #{strategy}",
+              data:      { entries: result[:entries], strategy: strategy, count: result[:count] },
+              timestamp: Time.now
+            }
+          end
+          def record_rag_timeline(result, strategy, start_time)
             @timeline.record(
               category: :enrichment, key: 'rag:context_retrieval',
               direction: :inbound,
@@ -39,20 +71,21 @@ module Legion
               from: 'apollo', to: 'pipeline',
               duration_ms: ((Time.now - start_time) * 1000).to_i
             )
-          rescue StandardError => e
-            @warnings << "RAG context error: #{e.message}"
           end
-          private
           def select_context_strategy(utilization:)
             explicit = @request.context_strategy
             return explicit if explicit && explicit != :auto
-            case utilization
-            when 0...0.3   then :full
-            when 0.3...0.8 then :rag_hybrid
-            else                :rag
+            skip_threshold    = rag_settings.fetch(:utilization_skip_threshold, 0.9)
+            compact_threshold = rag_settings.fetch(:utilization_compact_threshold, 0.7)
+            if utilization >= skip_threshold
+              :none
+            elsif utilization >= compact_threshold
+              :rag_compact
+            else
+              :rag
             end
           end
@@ -63,15 +96,29 @@ module Legion
             message_tokens.to_f / @request.tokens[:max]
           end
+          def trivial_query?(query)
+            max_chars = rag_settings.fetch(:trivial_max_chars, 20)
+            patterns  = rag_settings.fetch(:trivial_patterns, [])
+            return false if query.length > max_chars
+            normalized = query.strip.downcase.gsub(/[^a-z0-9\s]/, '')
+            patterns.any? { |p| normalized == p }
+          end
           def apollo_available?
             defined?(::Legion::Extensions::Apollo::Runners::Knowledge)
           end
           def apollo_retrieve(query:, strategy:)
-            opts = { query: query, limit: 10, min_confidence: 0.5 }
-            opts[:limit] = 5 if strategy == :rag_hybrid
+            full_limit    = rag_settings.fetch(:full_limit, 10)
+            compact_limit = rag_settings.fetch(:compact_limit, 5)
+            confidence    = rag_settings.fetch(:min_confidence, 0.5)
-            ::Legion::Extensions::Apollo::Runners::Knowledge.retrieve_relevant(**opts)
+            limit = strategy == :rag_compact ? compact_limit : full_limit
+            ::Legion::Extensions::Apollo::Runners::Knowledge.retrieve_relevant(
+              query: query, limit: limit, min_confidence: confidence
+            )
           end
           def extract_query
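
The trivial-query check added in this hunk can be tried in isolation. The sketch below copies the matching logic from the `trivial_query?` method above into a free function with keyword arguments; the keyword signature and the `TRIVIAL_PATTERNS` constant are illustrative assumptions, with the pattern list taken from `rag_defaults` in `settings.rb`.

```ruby
# Pattern list copied from rag_defaults in this diff; the constant name is
# hypothetical.
TRIVIAL_PATTERNS = %w[hello hi hey ping pong test ok okay yes no thanks thank].freeze

# Standalone version of the check. A query is trivial only when it is short
# AND, after normalization, equals one of the patterns exactly.
def trivial_query?(query, max_chars: 20, patterns: TRIVIAL_PATTERNS)
  return false if query.length > max_chars

  # Trim, lowercase, then drop everything except letters, digits, whitespace.
  normalized = query.strip.downcase.gsub(/[^a-z0-9\s]/, '')
  patterns.any? { |pattern| normalized == pattern }
end

trivial_query?('Hello!')                               # => true
trivial_query?('thank you')                            # => false
trivial_query?('What changed in the audit publisher?') # => false
```

Because the comparison is whole-string equality rather than a substring or prefix match, a short phrase such as "thank you" is not treated as trivial and still triggers retrieval under the auto strategy.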

data/lib/legion/llm/settings.rb CHANGED

@@ -19,7 +19,8 @@ module Legion
           prompt_caching:   prompt_caching_defaults,
           arbitrage:        arbitrage_defaults,
           batch:            batch_defaults,
-          scheduling:       scheduling_defaults
+          scheduling:       scheduling_defaults,
+          rag:              rag_defaults
         }
       end
@@ -113,6 +114,19 @@ module Legion
         }
       end
+      def self.rag_defaults
+        {
+          enabled:                       true,
+          full_limit:                    10,
+          compact_limit:                 5,
+          min_confidence:                0.5,
+          utilization_compact_threshold: 0.7,
+          utilization_skip_threshold:    0.9,
+          trivial_max_chars:             20,
+          trivial_patterns:              %w[hello hi hey ping pong test ok okay yes no thanks thank]
+        }
+      end
       def self.providers
         {
           bedrock:   {
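
To show how the new `rag` defaults feed the retrieval call, here is a small sketch that builds the arguments passed to the knowledge runner. It assumes overrides are shallow-merged over the defaults; how `Legion::Settings` actually merges user configuration is not visible in this diff, and `RAG_DEFAULTS` and `retrieval_options` are hypothetical names.

```ruby
# Defaults copied from rag_defaults in this hunk (threshold and
# trivial-query keys omitted for brevity).
RAG_DEFAULTS = {
  enabled:        true,
  full_limit:     10,
  compact_limit:  5,
  min_confidence: 0.5
}.freeze

# Builds the keyword arguments for a retrieval call.
# overrides: user-supplied settings, shallow-merged over the defaults
# (an assumption made for this sketch).
def retrieval_options(query, strategy, overrides = {})
  settings = RAG_DEFAULTS.merge(overrides)
  limit = strategy == :rag_compact ? settings[:compact_limit] : settings[:full_limit]
  { query: query, limit: limit, min_confidence: settings[:min_confidence] }
end

retrieval_options('deploy steps', :rag)
# => { query: "deploy steps", limit: 10, min_confidence: 0.5 }
retrieval_options('deploy steps', :rag_compact, compact_limit: 3)
# => { query: "deploy steps", limit: 3, min_confidence: 0.5 }
```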

data/lib/legion/llm/transport/exchanges/audit.rb ADDED

@@ -0,0 +1,15 @@
+# frozen_string_literal: true
+module Legion
+  module LLM
+    module Transport
+      module Exchanges
+        class Audit < ::Legion::Transport::Exchange
+          def exchange_name
+            'llm.audit'
+          end
+        end
+      end
+    end
+  end
+end

data/lib/legion/llm/transport/messages/audit_event.rb ADDED

@@ -0,0 +1,19 @@
+# frozen_string_literal: true
+module Legion
+  module LLM
+    module Transport
+      module Messages
+        class AuditEvent < ::Legion::Transport::Message
+          def exchange
+            Legion::LLM::Transport::Exchanges::Audit
+          end
+          def routing_key
+            'llm.audit.complete'
+          end
+        end
+      end
+    end
+  end
+end
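
The two added classes fix the destination in the class definition instead of passing `exchange:` and `routing_key:` as constructor arguments, which the changelog says `Messages::Dynamic` ignored. The sketch below illustrates that pattern with stand-in base classes. Every `Sketch*` class is hypothetical: the real `Legion::Transport::Exchange` and `Legion::Transport::Message` are not shown in this diff, and their `publish` talks to RabbitMQ rather than returning a hash.

```ruby
# Hypothetical stand-in for Legion::Transport::Exchange.
class SketchExchange
  def exchange_name
    raise NotImplementedError, 'subclasses name their exchange'
  end
end

# Hypothetical stand-in for Legion::Transport::Message.
class SketchMessage
  def initialize(**payload)
    @payload = payload
  end

  def exchange
    raise NotImplementedError, 'subclasses pick their exchange class'
  end

  def routing_key
    raise NotImplementedError, 'subclasses pick their routing key'
  end

  # Describes the delivery instead of contacting a broker.
  def publish
    { exchange: exchange.new.exchange_name, routing_key: routing_key, payload: @payload }
  end
end

class SketchAuditExchange < SketchExchange
  def exchange_name
    'llm.audit'
  end
end

class SketchAuditEvent < SketchMessage
  def exchange
    SketchAuditExchange
  end

  def routing_key
    'llm.audit.complete'
  end
end

delivery = SketchAuditEvent.new(request_id: 'req-1').publish
# => { exchange: "llm.audit", routing_key: "llm.audit.complete",
#      payload: { request_id: "req-1" } }
```

With this shape the caller only supplies the event payload, as in `AuditEvent.new(**event).publish` in the `audit_publisher.rb` hunk, and cannot misroute the message.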

data/lib/legion/llm/version.rb CHANGED

@@ -2,6 +2,6 @@
 module Legion
   module LLM
-    VERSION = '0.5.4'
+    VERSION = '0.5.6'
   end
 end

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: legion-llm
 version: !ruby/object:Gem::Version
-  version: 0.5.4
+  version: 0.5.6
 platform: ruby
 authors:
 - Esity
@@ -255,7 +255,9 @@ files:
 - lib/legion/llm/shadow_eval.rb
 - lib/legion/llm/structured_output.rb
 - lib/legion/llm/tool_registry.rb
+- lib/legion/llm/transport/exchanges/audit.rb
 - lib/legion/llm/transport/exchanges/escalation.rb
+- lib/legion/llm/transport/messages/audit_event.rb
 - lib/legion/llm/transport/messages/escalation_event.rb
 - lib/legion/llm/version.rb
 homepage: https://github.com/LegionIO/legion-llm