legion-llm 0.3.4 → 0.3.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: bd0b530095616abc383dcd06473a6c435753f021458c76c981da1d4e98583a5f
-  data.tar.gz: 21d8645355c14d591891c3484ca90957e99b0cb376b115eb61dd61f3e0721800
+  metadata.gz: '0914899eb9eee81b947d95d617a16ddf152fb74fa7afb4c1c1cfca74c9c8445d'
+  data.tar.gz: d4146a95967ceffca175c531fd412089f3a13df4b4b60964598e123115d3c19f
 SHA512:
-  metadata.gz: 53f3b6bd09f86625986e6f9d5c53f665e000e71d78dc5db36d599f1b5e5d7267d40ca1a2fe1e9f2b48cc54fb7ab6d272108869a02b327ff9669f978b83280e71
-  data.tar.gz: 381d707e3bdb75a1cf87d82404dc842140f98fa4bb5091e5a837685235327684b800de977efcd24857bb0ab8ab5bc75d42fdc22364aa4b024d6adf8f27cdab65
+  metadata.gz: 8d9fb16e659a4f24d6c01bb3b7caa96d6814980e5b9866fe8ccc293bae57121f8d21acc95efef98832b015875abebfe1ca2cbba63f825a43d64cb9feac82f9b2
+  data.tar.gz: 4e6788a7b28889ed80ec1701e5a45a05bcfe71914610b538fae2f68d3b16ac4942e8edd0abbf2414d3dd124edc109817ceef3390d22108c1c9899a82b6d93c55
data/CHANGELOG.md CHANGED
@@ -1,5 +1,17 @@
 # Legion LLM Changelog
 
+## [0.3.6] - 2026-03-18
+
+### Added
+- Add `lex-claude`, `lex-gemini`, `lex-openai` as runtime dependencies (AI provider extensions)
+
+## [0.3.5] - 2026-03-18
+
+### Added
+- Gateway integration: `chat`, `embed`, `structured` delegate to `lex-llm-gateway` when loaded for automatic metering and fleet dispatch
+- `chat_direct`, `embed_direct`, `structured_direct` methods bypass gateway (used by gateway runners to avoid recursion)
+- Gateway integration spec (8 examples)
+
 ## [0.3.4] - 2026-03-18
 
 ### Added
data/CLAUDE.md CHANGED
@@ -8,6 +8,7 @@
 Core LegionIO gem providing LLM capabilities to all extensions. Wraps ruby_llm to provide a consistent interface for chat, embeddings, tool use, and agents across multiple providers (Bedrock, Anthropic, OpenAI, Gemini, Ollama). Includes a dynamic weighted routing engine that dispatches requests across local, fleet, and cloud tiers based on caller intent, priority rules, time schedules, cost multipliers, and real-time provider health.
 
 **GitHub**: https://github.com/LegionIO/legion-llm
+**Version**: 0.3.5
 **License**: Apache-2.0
 
 ## Architecture
@@ -61,8 +62,7 @@ Three-tier dispatch model. Local-first avoids unnecessary network hops; fleet of
 │ Zero network overhead, no Transport │
 │ │
 │ Tier 2: FLEET → Ollama on Mac Studios / GPU servers │
-│ Via Legion::Transport (AMQP) when local can't
-│ serve the model (Phase 2, not yet built) │
+│ Via lex-llm-gateway RPC over AMQP
 │ │
 │ Tier 3: CLOUD → Bedrock / Anthropic / OpenAI / Gemini │
 │ Existing provider API calls │
@@ -87,6 +87,19 @@ Three-tier dispatch model. Local-first avoids unnecessary network hops; fleet of
 5. Return Resolution for highest-scoring candidate
 ```
 
+### Gateway Integration (lex-llm-gateway)
+
+When `lex-llm-gateway` is installed, `chat`, `embed`, and `structured` automatically delegate to the gateway for metering and fleet dispatch. The gateway is loaded via `begin/rescue LoadError` — optional, not a hard dependency.
+
+```
+Caller → Legion::LLM.chat(message:)
+  └─ gateway loaded? → Gateway::Runners::Inference.chat (meters, fleet dispatch)
+       └─ Legion::LLM.chat_direct (routing, escalation, RubyLLM)
+  └─ no gateway? → Legion::LLM.chat_direct (same path, no metering)
+```
+
+The `_direct` variants (`chat_direct`, `embed_direct`, `structured_direct`) bypass gateway delegation. The gateway's `call_llm` uses these to avoid infinite recursion.
+
 ### Integration with LegionIO
 
 - **Service**: `setup_llm` called between data and supervision in startup sequence
@@ -94,6 +107,7 @@ Three-tier dispatch model. Local-first avoids unnecessary network hops; fleet of
 - **Helpers**: `Legion::Extensions::Helpers::LLM` auto-loaded when gem is present
 - **Readiness**: Registers as `:llm` in `Legion::Readiness`
 - **Shutdown**: `Legion::LLM.shutdown` called during service shutdown
+- **Gateway**: `lex-llm-gateway` auto-loaded if present; provides metering and fleet RPC
 
 ## Dependencies
 
@@ -103,6 +117,7 @@ Three-tier dispatch model. Local-first avoids unnecessary network hops; fleet of
 | `tzinfo` (>= 2.0) | IANA timezone conversion for schedule windows |
 | `legion-logging` | Logging |
 | `legion-settings` | Configuration |
+| `lex-llm-gateway` (optional) | Metering over RMQ, fleet RPC dispatch, disk spool — auto-loaded if present |
 
 ## Key Interfaces
 
@@ -113,11 +128,15 @@ Legion::LLM.shutdown # Cleanup
 Legion::LLM.started? # -> Boolean
 Legion::LLM.settings # -> Hash
 
-# Chat (with optional routing)
-Legion::LLM.chat(model:, provider:) # Direct (no routing)
+# Chat (delegates to gateway when loaded, otherwise direct)
+Legion::LLM.chat(message: 'hello', model:, provider:) # Gateway-metered if available
 Legion::LLM.chat(intent: { privacy: :strict }) # Intent-based routing
 Legion::LLM.chat(tier: :cloud, model: 'claude-sonnet-4-6') # Explicit tier override
-Legion::LLM.embed(text, model:) # Embeddings (no routing)
+Legion::LLM.chat_direct(message:, model:, provider:) # Bypass gateway (no metering)
+Legion::LLM.embed(text, model:) # Embeddings (gateway-metered)
+Legion::LLM.embed_direct(text, model:) # Bypass gateway
+Legion::LLM.structured(messages:, schema:) # Structured (gateway-metered)
+Legion::LLM.structured_direct(messages:, schema:) # Bypass gateway
 Legion::LLM.agent(AgentClass) # Agent instance
 
 # Compressor
@@ -284,7 +303,7 @@ In-memory signal consumer with pluggable handlers. Adjusts effective priorities
 | `lib/legion/llm/embeddings.rb` | Embeddings module: generate, generate_batch, default_model |
 | `lib/legion/llm/shadow_eval.rb` | Shadow evaluation: enabled?, should_sample?, evaluate, compare |
 | `lib/legion/llm/structured_output.rb` | JSON schema enforcement with native response_format and prompt fallback |
-| `lib/legion/llm/version.rb` | Version constant (0.3.3) |
+| `lib/legion/llm/version.rb` | Version constant (0.3.5) |
 | `lib/legion/llm/quality_checker.rb` | QualityChecker module with QualityResult struct |
 | `lib/legion/llm/escalation_history.rb` | EscalationHistory mixin: `escalation_history`, `escalated?`, `final_resolution`, `escalation_chain` |
 | `lib/legion/llm/router/escalation_chain.rb` | EscalationChain value object |
@@ -315,6 +334,7 @@ In-memory signal consumer with pluggable handlers. Adjusts effective priorities
 | `spec/legion/llm/embeddings_spec.rb` | Embeddings tests |
 | `spec/legion/llm/shadow_eval_spec.rb` | ShadowEval tests |
 | `spec/legion/llm/structured_output_spec.rb` | StructuredOutput tests |
+| `spec/legion/llm/gateway_integration_spec.rb` | Tests: gateway delegation and _direct bypass |
 | `spec/spec_helper.rb` | Stubbed Legion::Logging and Legion::Settings for testing |
 
 ## Extension Integration
@@ -374,8 +394,8 @@ The legacy `vault_path` per-provider setting was removed in v0.3.1.
 Tests run without the full LegionIO stack. `spec/spec_helper.rb` stubs `Legion::Logging` and `Legion::Settings` with in-memory implementations. Each test resets settings to defaults via `before(:each)`.
 
 ```bash
-bundle exec rspec # 287 examples, 0 failures
-bundle exec rubocop # 31 files, 0 offenses
+bundle exec rspec # 304 examples, 0 failures
+bundle exec rubocop # 52 files, 0 offenses
 ```
 
 ## Design Documents
@@ -389,8 +409,8 @@ bundle exec rubocop # 31 files, 0 offenses
 
 ## Future (Not Yet Built)
 
-- **Fleet tier (Phase 2)**: `lex-llm-fleet` extension inference workers on Mac Studios / NVIDIA servers, dispatched via Legion::Transport AMQP queues
-- **Advanced signals (Phase 3)**: Budget tracking, lex-metering integration, GPU utilization monitoring
+- **Advanced signals**: Budget tracking, GPU utilization monitoring, per-tenant spend limits
+- **Fleet auto-scaling**: Dynamic worker pool sizing based on queue depth and latency
 
 ---
 
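Outside the diff itself, the delegation contract described in the CLAUDE.md hunks is easy to see in isolation. The sketch below is a toy model of that flow; the `Gateway` stub and its return strings are hypothetical stand-ins for the real lex-llm-gateway runner, not its API.

```ruby
# Toy model of gateway delegation: the public entry point routes through the
# gateway when one is loaded, while the _direct variant always skips it.
# Gateway::Runners::Inference is a stub, not the real lex-llm-gateway API.
module Gateway
  module Runners
    module Inference
      def self.chat(message:, **opts)
        # A real runner meters usage and may dispatch to the fleet, then calls
        # back into chat_direct -- never chat, which would recurse forever.
        "metered(#{LLM.chat_direct(message: message, **opts)})"
      end
    end
  end
end

module LLM
  module_function

  def chat(message:, **opts)
    return Gateway::Runners::Inference.chat(message: message, **opts) if gateway_loaded?

    chat_direct(message: message, **opts)
  end

  # Bypasses gateway delegation; the gateway runner calls this one.
  def chat_direct(message:, **_opts)
    "reply-to(#{message})"
  end

  def gateway_loaded?
    # defined? returns a truthy String when the constant resolves, nil otherwise
    !defined?(Gateway::Runners::Inference).nil?
  end
end

puts LLM.chat(message: 'hello')        # goes through the gateway stub
puts LLM.chat_direct(message: 'hello') # direct path, no metering
```

If the gateway module is never loaded, `gateway_loaded?` is false and both calls take the direct path, which is the no-gateway branch of the diagram in the diff.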
data/README.md CHANGED
@@ -2,6 +2,8 @@
 
 LLM integration for the [LegionIO](https://github.com/LegionIO/LegionIO) framework. Wraps [ruby_llm](https://github.com/crmne/ruby_llm) to provide chat, embeddings, tool use, and agent capabilities to any Legion extension.
 
+**Version**: 0.3.5
+
 ## Installation
 
 ```ruby
@@ -599,7 +601,7 @@ bundle exec rspec
 Tests use stubbed `Legion::Logging` and `Legion::Settings` modules (no need for the full LegionIO stack):
 
 ```bash
-bundle exec rspec # Run all 269 tests
+bundle exec rspec # Run all 304 tests
 bundle exec rubocop # Lint (0 offenses)
 bundle exec rspec spec/legion/llm_spec.rb # Run specific test file
 bundle exec rspec spec/legion/llm/router_spec.rb # Router tests only
data/legion-llm.gemspec CHANGED
@@ -27,6 +27,9 @@ Gem::Specification.new do |spec|
 
   spec.add_dependency 'legion-logging'
   spec.add_dependency 'legion-settings'
+  spec.add_dependency 'lex-claude'
+  spec.add_dependency 'lex-gemini'
+  spec.add_dependency 'lex-openai'
   spec.add_dependency 'ruby_llm', '>= 1.0'
   spec.add_dependency 'tzinfo', '>= 2.0'
 end
data/lib/legion/llm/version.rb CHANGED
@@ -2,6 +2,6 @@
 
 module Legion
   module LLM
-    VERSION = '0.3.4'
+    VERSION = '0.3.6'
   end
 end
data/lib/legion/llm.rb CHANGED
@@ -9,6 +9,12 @@ require 'legion/llm/compressor'
 require 'legion/llm/quality_checker'
 require 'legion/llm/escalation_history'
 
+begin
+  require 'legion/extensions/llm/gateway'
+rescue LoadError
+  nil
+end
+
 module Legion
   module LLM
     class EscalationExhausted < StandardError; end
@@ -50,20 +56,24 @@ module Legion
       end
     end
 
-    # Create a new chat session
-    # @param model [String] model ID (e.g., "us.anthropic.claude-sonnet-4-6-v1")
-    # @param provider [Symbol] provider slug (e.g., :bedrock, :anthropic)
-    # @param intent [Hash, nil] routing intent (capability, privacy, etc.)
-    # @param tier [Symbol, nil] explicit tier override — skips rule matching
-    # @param escalate [Boolean, nil] enable escalation retry loop (nil = auto from settings)
-    # @param max_escalations [Integer, nil] max escalation attempts override
-    # @param quality_check [Proc, nil] custom quality check callable
-    # @param message [String, nil] message to send (required for escalation)
-    # @param kwargs [Hash] additional options passed to RubyLLM.chat
-    # @return [RubyLLM::Chat]
-    # TODO: fleet tier dispatch via Transport (Phase 3)
+    # Create a new chat session — delegates to lex-llm-gateway when available
+    # for automatic metering and fleet dispatch
     def chat(model: nil, provider: nil, intent: nil, tier: nil, escalate: nil,
              max_escalations: nil, quality_check: nil, message: nil, **)
+      if gateway_loaded? && message
+        return gateway_chat(model: model, provider: provider, intent: intent,
+                            tier: tier, message: message, escalate: escalate,
+                            max_escalations: max_escalations, quality_check: quality_check, **)
+      end
+
+      chat_direct(model: model, provider: provider, intent: intent, tier: tier,
+                  escalate: escalate, max_escalations: max_escalations,
+                  quality_check: quality_check, message: message, **)
+    end
+
+    # Direct chat bypassing gateway — used by gateway runners to avoid recursion
+    def chat_direct(model: nil, provider: nil, intent: nil, tier: nil, escalate: nil,
+                    max_escalations: nil, quality_check: nil, message: nil, **)
       escalate = escalation_enabled? if escalate.nil?
 
       if escalate && message
@@ -77,11 +87,15 @@ module Legion
       end
     end
 
-    # Generate embeddings via Embeddings module
-    # @param text [String, Array<String>] text to embed
-    # @param model [String] embedding model ID
-    # @return [Hash] { vector:, model:, dimensions:, tokens: }
+    # Generate embeddings delegates to gateway when available
     def embed(text, **)
+      return Legion::Extensions::LLM::Gateway::Runners::Inference.embed(text: text, **) if gateway_loaded?
+
+      embed_direct(text, **)
+    end
+
+    # Direct embed bypassing gateway
+    def embed_direct(text, **)
       require 'legion/llm/embeddings'
       Embeddings.generate(text: text, **)
     end
@@ -94,11 +108,19 @@ module Legion
       Embeddings.generate_batch(texts: texts, **)
     end
 
-    # Generate structured JSON output from LLM
-    # @param messages [Array<Hash>] conversation messages
-    # @param schema [Hash] JSON schema to enforce
-    # @return [Hash] { data:, raw:, model:, valid: }
+    # Generate structured JSON output delegates to gateway when available
     def structured(messages:, schema:, **)
+      if gateway_loaded?
+        return Legion::Extensions::LLM::Gateway::Runners::Inference.structured(
+          messages: messages, schema: schema, **
+        )
+      end
+
+      structured_direct(messages: messages, schema: schema, **)
+    end
+
+    # Direct structured bypassing gateway
+    def structured_direct(messages:, schema:, **)
       require 'legion/llm/structured_output'
       StructuredOutput.generate(messages: messages, schema: schema, **)
     end
@@ -113,6 +135,14 @@ module Legion
 
     private
 
+    def gateway_loaded?
+      defined?(Legion::Extensions::LLM::Gateway::Runners::Inference)
+    end
+
+    def gateway_chat(**)
+      Legion::Extensions::LLM::Gateway::Runners::Inference.chat(**)
+    end
+
     def chat_single(model:, provider:, intent:, tier:, **kwargs)
       if (intent || tier) && Router.routing_enabled?
         resolution = Router.resolve(intent: intent, tier: tier, model: model, provider: provider)
metadata CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: legion-llm
 version: !ruby/object:Gem::Version
-  version: 0.3.4
+  version: 0.3.6
 platform: ruby
 authors:
 - Esity
@@ -37,6 +37,48 @@ dependencies:
     - - ">="
       - !ruby/object:Gem::Version
         version: '0'
+- !ruby/object:Gem::Dependency
+  name: lex-claude
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: lex-gemini
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: lex-openai
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
 - !ruby/object:Gem::Dependency
   name: ruby_llm
   requirement: !ruby/object:Gem::Requirement