legion-llm 0.3.4 → 0.3.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: bd0b530095616abc383dcd06473a6c435753f021458c76c981da1d4e98583a5f
-  data.tar.gz: 21d8645355c14d591891c3484ca90957e99b0cb376b115eb61dd61f3e0721800
+  metadata.gz: '0914899eb9eee81b947d95d617a16ddf152fb74fa7afb4c1c1cfca74c9c8445d'
+  data.tar.gz: d4146a95967ceffca175c531fd412089f3a13df4b4b60964598e123115d3c19f
 SHA512:
-  metadata.gz: 53f3b6bd09f86625986e6f9d5c53f665e000e71d78dc5db36d599f1b5e5d7267d40ca1a2fe1e9f2b48cc54fb7ab6d272108869a02b327ff9669f978b83280e71
-  data.tar.gz: 381d707e3bdb75a1cf87d82404dc842140f98fa4bb5091e5a837685235327684b800de977efcd24857bb0ab8ab5bc75d42fdc22364aa4b024d6adf8f27cdab65
+  metadata.gz: 8d9fb16e659a4f24d6c01bb3b7caa96d6814980e5b9866fe8ccc293bae57121f8d21acc95efef98832b015875abebfe1ca2cbba63f825a43d64cb9feac82f9b2
+  data.tar.gz: 4e6788a7b28889ed80ec1701e5a45a05bcfe71914610b538fae2f68d3b16ac4942e8edd0abbf2414d3dd124edc109817ceef3390d22108c1c9899a82b6d93c55
data/CHANGELOG.md CHANGED
@@ -1,5 +1,17 @@
 # Legion LLM Changelog
 
+## [0.3.6] - 2026-03-18
+
+### Added
+- Add `lex-claude`, `lex-gemini`, `lex-openai` as runtime dependencies (AI provider extensions)
+
+## [0.3.5] - 2026-03-18
+
+### Added
+- Gateway integration: `chat`, `embed`, `structured` delegate to `lex-llm-gateway` when loaded for automatic metering and fleet dispatch
+- `chat_direct`, `embed_direct`, `structured_direct` methods bypass gateway (used by gateway runners to avoid recursion)
+- Gateway integration spec (8 examples)
+
 ## [0.3.4] - 2026-03-18
 
 ### Added
data/CLAUDE.md CHANGED
@@ -8,6 +8,7 @@
 Core LegionIO gem providing LLM capabilities to all extensions. Wraps ruby_llm to provide a consistent interface for chat, embeddings, tool use, and agents across multiple providers (Bedrock, Anthropic, OpenAI, Gemini, Ollama). Includes a dynamic weighted routing engine that dispatches requests across local, fleet, and cloud tiers based on caller intent, priority rules, time schedules, cost multipliers, and real-time provider health.
 
 **GitHub**: https://github.com/LegionIO/legion-llm
+**Version**: 0.3.5
 **License**: Apache-2.0
 
 ## Architecture
@@ -61,8 +62,7 @@ Three-tier dispatch model. Local-first avoids unnecessary network hops; fleet of
 │ Zero network overhead, no Transport │
 │ │
 │ Tier 2: FLEET → Ollama on Mac Studios / GPU servers │
-│ Via Legion::Transport (AMQP) when local can't
-│ serve the model (Phase 2, not yet built) │
+│ Via lex-llm-gateway RPC over AMQP
 │ │
 │ Tier 3: CLOUD → Bedrock / Anthropic / OpenAI / Gemini │
 │ Existing provider API calls │
@@ -87,6 +87,19 @@ Three-tier dispatch model. Local-first avoids unnecessary network hops; fleet of
 5. Return Resolution for highest-scoring candidate
 ```
 
+### Gateway Integration (lex-llm-gateway)
+
+When `lex-llm-gateway` is installed, `chat`, `embed`, and `structured` automatically delegate to the gateway for metering and fleet dispatch. The gateway is loaded via `begin/rescue LoadError` — optional, not a hard dependency.
+
+```
+Caller → Legion::LLM.chat(message:)
+  └─ gateway loaded? → Gateway::Runners::Inference.chat (meters, fleet dispatch)
+       └─ Legion::LLM.chat_direct (routing, escalation, RubyLLM)
+  └─ no gateway? → Legion::LLM.chat_direct (same path, no metering)
+```
+
+The `_direct` variants (`chat_direct`, `embed_direct`, `structured_direct`) bypass gateway delegation. The gateway's `call_llm` uses these to avoid infinite recursion.
+
 ### Integration with LegionIO
 
 - **Service**: `setup_llm` called between data and supervision in startup sequence
@@ -94,6 +107,7 @@ Three-tier dispatch model. Local-first avoids unnecessary network hops; fleet of
 - **Helpers**: `Legion::Extensions::Helpers::LLM` auto-loaded when gem is present
 - **Readiness**: Registers as `:llm` in `Legion::Readiness`
 - **Shutdown**: `Legion::LLM.shutdown` called during service shutdown
+- **Gateway**: `lex-llm-gateway` auto-loaded if present; provides metering and fleet RPC
 
 ## Dependencies
 
@@ -103,6 +117,7 @@ Three-tier dispatch model. Local-first avoids unnecessary network hops; fleet of
 | `tzinfo` (>= 2.0) | IANA timezone conversion for schedule windows |
 | `legion-logging` | Logging |
 | `legion-settings` | Configuration |
+| `lex-llm-gateway` (optional) | Metering over RMQ, fleet RPC dispatch, disk spool — auto-loaded if present |
 
 ## Key Interfaces
 
@@ -113,11 +128,15 @@ Legion::LLM.shutdown # Cleanup
 Legion::LLM.started? # -> Boolean
 Legion::LLM.settings # -> Hash
 
-# Chat (with optional routing)
-Legion::LLM.chat(model:, provider:) # Direct (no routing)
+# Chat (delegates to gateway when loaded, otherwise direct)
+Legion::LLM.chat(message: 'hello', model:, provider:) # Gateway-metered if available
 Legion::LLM.chat(intent: { privacy: :strict }) # Intent-based routing
 Legion::LLM.chat(tier: :cloud, model: 'claude-sonnet-4-6') # Explicit tier override
-Legion::LLM.embed(text, model:) # Embeddings (no routing)
+Legion::LLM.chat_direct(message:, model:, provider:) # Bypass gateway (no metering)
+Legion::LLM.embed(text, model:) # Embeddings (gateway-metered)
+Legion::LLM.embed_direct(text, model:) # Bypass gateway
+Legion::LLM.structured(messages:, schema:) # Structured (gateway-metered)
+Legion::LLM.structured_direct(messages:, schema:) # Bypass gateway
 Legion::LLM.agent(AgentClass) # Agent instance
 
 # Compressor
@@ -284,7 +303,7 @@ In-memory signal consumer with pluggable handlers. Adjusts effective priorities
 | `lib/legion/llm/embeddings.rb` | Embeddings module: generate, generate_batch, default_model |
 | `lib/legion/llm/shadow_eval.rb` | Shadow evaluation: enabled?, should_sample?, evaluate, compare |
 | `lib/legion/llm/structured_output.rb` | JSON schema enforcement with native response_format and prompt fallback |
-| `lib/legion/llm/version.rb` | Version constant (0.3.3) |
+| `lib/legion/llm/version.rb` | Version constant (0.3.5) |
 | `lib/legion/llm/quality_checker.rb` | QualityChecker module with QualityResult struct |
 | `lib/legion/llm/escalation_history.rb` | EscalationHistory mixin: `escalation_history`, `escalated?`, `final_resolution`, `escalation_chain` |
 | `lib/legion/llm/router/escalation_chain.rb` | EscalationChain value object |
@@ -315,6 +334,7 @@ In-memory signal consumer with pluggable handlers. Adjusts effective priorities
 | `spec/legion/llm/embeddings_spec.rb` | Embeddings tests |
 | `spec/legion/llm/shadow_eval_spec.rb` | ShadowEval tests |
 | `spec/legion/llm/structured_output_spec.rb` | StructuredOutput tests |
+| `spec/legion/llm/gateway_integration_spec.rb` | Tests: gateway delegation and _direct bypass |
 | `spec/spec_helper.rb` | Stubbed Legion::Logging and Legion::Settings for testing |
 
 ## Extension Integration
@@ -374,8 +394,8 @@ The legacy `vault_path` per-provider setting was removed in v0.3.1.
 Tests run without the full LegionIO stack. `spec/spec_helper.rb` stubs `Legion::Logging` and `Legion::Settings` with in-memory implementations. Each test resets settings to defaults via `before(:each)`.
 
 ```bash
-bundle exec rspec # 287 examples, 0 failures
-bundle exec rubocop # 31 files, 0 offenses
+bundle exec rspec # 304 examples, 0 failures
+bundle exec rubocop # 52 files, 0 offenses
 ```
 
 ## Design Documents
@@ -389,8 +409,8 @@ bundle exec rubocop # 31 files, 0 offenses
 
 ## Future (Not Yet Built)
 
-- **Fleet tier (Phase 2)**: `lex-llm-fleet` extension inference workers on Mac Studios / NVIDIA servers, dispatched via Legion::Transport AMQP queues
-- **Advanced signals (Phase 3)**: Budget tracking, lex-metering integration, GPU utilization monitoring
+- **Advanced signals**: Budget tracking, GPU utilization monitoring, per-tenant spend limits
+- **Fleet auto-scaling**: Dynamic worker pool sizing based on queue depth and latency
 
 ---
 
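Outside the diff itself, the delegation contract described in the CLAUDE.md hunks is easy to see in isolation. The sketch below is a toy model of that flow; the `Gateway` stub and its return strings are hypothetical stand-ins for the real lex-llm-gateway runner, not its API.

```ruby
# Toy model of gateway delegation: the public entry point routes through the
# gateway when one is loaded, while the _direct variant always skips it.
# Gateway::Runners::Inference is a stub, not the real lex-llm-gateway API.
module Gateway
  module Runners
    module Inference
      def self.chat(message:, **opts)
        # A real runner meters usage and may dispatch to the fleet, then calls
        # back into chat_direct -- never chat, which would recurse forever.
        "metered(#{LLM.chat_direct(message: message, **opts)})"
      end
    end
  end
end

module LLM
  module_function

  def chat(message:, **opts)
    return Gateway::Runners::Inference.chat(message: message, **opts) if gateway_loaded?

    chat_direct(message: message, **opts)
  end

  # Bypasses gateway delegation; the gateway runner calls this one.
  def chat_direct(message:, **_opts)
    "reply-to(#{message})"
  end

  def gateway_loaded?
    # defined? returns a truthy String when the constant resolves, nil otherwise
    !defined?(Gateway::Runners::Inference).nil?
  end
end

puts LLM.chat(message: 'hello')        # goes through the gateway stub
puts LLM.chat_direct(message: 'hello') # direct path, no metering
```

If the gateway module is never loaded, `gateway_loaded?` is false and both calls take the direct path, which is the no-gateway branch of the diagram in the diff.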
data/README.md CHANGED
@@ -2,6 +2,8 @@
 
 LLM integration for the [LegionIO](https://github.com/LegionIO/LegionIO) framework. Wraps [ruby_llm](https://github.com/crmne/ruby_llm) to provide chat, embeddings, tool use, and agent capabilities to any Legion extension.
 
+**Version**: 0.3.5
+
 ## Installation
 
 ```ruby
@@ -599,7 +601,7 @@ bundle exec rspec
 Tests use stubbed `Legion::Logging` and `Legion::Settings` modules (no need for the full LegionIO stack):
 
 ```bash
-bundle exec rspec # Run all 269 tests
+bundle exec rspec # Run all 304 tests
 bundle exec rubocop # Lint (0 offenses)
 bundle exec rspec spec/legion/llm_spec.rb # Run specific test file
 bundle exec rspec spec/legion/llm/router_spec.rb # Router tests only
data/legion-llm.gemspec CHANGED
@@ -27,6 +27,9 @@ Gem::Specification.new do |spec|
 
   spec.add_dependency 'legion-logging'
   spec.add_dependency 'legion-settings'
+  spec.add_dependency 'lex-claude'
+  spec.add_dependency 'lex-gemini'
+  spec.add_dependency 'lex-openai'
   spec.add_dependency 'ruby_llm', '>= 1.0'
   spec.add_dependency 'tzinfo', '>= 2.0'
 end
data/lib/legion/llm/version.rb CHANGED
@@ -2,6 +2,6 @@
 
 module Legion
   module LLM
-    VERSION = '0.3.4'
+    VERSION = '0.3.6'
   end
 end
data/lib/legion/llm.rb CHANGED
@@ -9,6 +9,12 @@ require 'legion/llm/compressor'
 require 'legion/llm/quality_checker'
 require 'legion/llm/escalation_history'
 
+begin
+  require 'legion/extensions/llm/gateway'
+rescue LoadError
+  nil
+end
+
 module Legion
   module LLM
     class EscalationExhausted < StandardError; end
@@ -50,20 +56,24 @@ module Legion
       end
     end
 
-    # Create a new chat session
-    # @param model [String] model ID (e.g., "us.anthropic.claude-sonnet-4-6-v1")
-    # @param provider [Symbol] provider slug (e.g., :bedrock, :anthropic)
-    # @param intent [Hash, nil] routing intent (capability, privacy, etc.)
-    # @param tier [Symbol, nil] explicit tier override — skips rule matching
-    # @param escalate [Boolean, nil] enable escalation retry loop (nil = auto from settings)
-    # @param max_escalations [Integer, nil] max escalation attempts override
-    # @param quality_check [Proc, nil] custom quality check callable
-    # @param message [String, nil] message to send (required for escalation)
-    # @param kwargs [Hash] additional options passed to RubyLLM.chat
-    # @return [RubyLLM::Chat]
-    # TODO: fleet tier dispatch via Transport (Phase 3)
+    # Create a new chat session — delegates to lex-llm-gateway when available
+    # for automatic metering and fleet dispatch
     def chat(model: nil, provider: nil, intent: nil, tier: nil, escalate: nil,
              max_escalations: nil, quality_check: nil, message: nil, **)
+      if gateway_loaded? && message
+        return gateway_chat(model: model, provider: provider, intent: intent,
+                            tier: tier, message: message, escalate: escalate,
+                            max_escalations: max_escalations, quality_check: quality_check, **)
+      end
+
+      chat_direct(model: model, provider: provider, intent: intent, tier: tier,
+                  escalate: escalate, max_escalations: max_escalations,
+                  quality_check: quality_check, message: message, **)
+    end
+
+    # Direct chat bypassing gateway — used by gateway runners to avoid recursion
+    def chat_direct(model: nil, provider: nil, intent: nil, tier: nil, escalate: nil,
+                    max_escalations: nil, quality_check: nil, message: nil, **)
       escalate = escalation_enabled? if escalate.nil?
 
       if escalate && message
@@ -77,11 +87,15 @@ module Legion
       end
     end
 
-    # Generate embeddings via Embeddings module
-    # @param text [String, Array<String>] text to embed
-    # @param model [String] embedding model ID
-    # @return [Hash] { vector:, model:, dimensions:, tokens: }
+    # Generate embeddings delegates to gateway when available
     def embed(text, **)
+      return Legion::Extensions::LLM::Gateway::Runners::Inference.embed(text: text, **) if gateway_loaded?
+
+      embed_direct(text, **)
+    end
+
+    # Direct embed bypassing gateway
+    def embed_direct(text, **)
       require 'legion/llm/embeddings'
       Embeddings.generate(text: text, **)
     end
@@ -94,11 +108,19 @@ module Legion
       Embeddings.generate_batch(texts: texts, **)
     end
 
-    # Generate structured JSON output from LLM
-    # @param messages [Array<Hash>] conversation messages
-    # @param schema [Hash] JSON schema to enforce
-    # @return [Hash] { data:, raw:, model:, valid: }
+    # Generate structured JSON output delegates to gateway when available
     def structured(messages:, schema:, **)
+      if gateway_loaded?
+        return Legion::Extensions::LLM::Gateway::Runners::Inference.structured(
+          messages: messages, schema: schema, **
+        )
+      end
+
+      structured_direct(messages: messages, schema: schema, **)
+    end
+
+    # Direct structured bypassing gateway
+    def structured_direct(messages:, schema:, **)
       require 'legion/llm/structured_output'
       StructuredOutput.generate(messages: messages, schema: schema, **)
     end
@@ -113,6 +135,14 @@ module Legion
 
     private
 
+    def gateway_loaded?
+      defined?(Legion::Extensions::LLM::Gateway::Runners::Inference)
+    end
+
+    def gateway_chat(**)
+      Legion::Extensions::LLM::Gateway::Runners::Inference.chat(**)
+    end
+
     def chat_single(model:, provider:, intent:, tier:, **kwargs)
       if (intent || tier) && Router.routing_enabled?
         resolution = Router.resolve(intent: intent, tier: tier, model: model, provider: provider)
metadata CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: legion-llm
 version: !ruby/object:Gem::Version
-  version: 0.3.4
+  version: 0.3.6
 platform: ruby
 authors:
 - Esity
@@ -37,6 +37,48 @@ dependencies:
     - - ">="
       - !ruby/object:Gem::Version
         version: '0'
+- !ruby/object:Gem::Dependency
+  name: lex-claude
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: lex-gemini
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: lex-openai
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
 - !ruby/object:Gem::Dependency
   name: ruby_llm
   requirement: !ruby/object:Gem::Requirement