legion-llm 0.3.5 → 0.3.7

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: e3e10dfcd60fe722290bec30017671fb261f3baf151b416c82defb082d4445f4
-   data.tar.gz: 0c0b062649f8ede281fada374681d551d437126a2efa0e604a069610e14b7069
+   metadata.gz: b06b6f10d21c6c4d03c73646fbecc2112e61e47e1dd82059076c61a721efb1aa
+   data.tar.gz: 541a1a1de0a108e95b5e2c204ec579a1b0a5f77e935bad64d7668aef9ae3322d
  SHA512:
-   metadata.gz: eeb2cd074c2eb1c3b63ccb7644adbcf7cac6bab62f8d5cc966e318b2185267ab73fae920726b83e1f72cbf8753ba1245ab84ae914baa092aebdbef08e0548cd3
-   data.tar.gz: ccc52360f869421100f0bbda503570168f2f7eb86c5fda9069e6ee29bdaa4c60553d202a1ad2e2109e98ba451b9abf9c00dc8210b50cf5d0a925192cada2ab9d
+   metadata.gz: 06ab55cec8a23d4be70ea3851fd4a7717686c4e02f7b4ca2f479e2353f79b14cacc770343a81ac54f492a05de3b5228aebb7c2e464203e434eba53e8b4144694
+   data.tar.gz: 93623de5b0baa0bb5390678daac043fc6c08111f965886b03b87f5e769aa6e5f267a713c1c20d93d061341a6ceca8d88fcbd0dfa431d4ce84c2caf5768b19609
data/CHANGELOG.md CHANGED
@@ -1,5 +1,20 @@
  # Legion LLM Changelog

+ ## [0.3.7] - 2026-03-19
+
+ ### Added
+ - `ResponseCache` module for async response delivery via memcached with spool overflow at 8MB
+ - `DaemonClient` module for HTTP routing to LegionIO daemon with health caching (30s TTL)
+ - `Legion::LLM.ask` one-shot method: daemon-first routing with direct RubyLLM fallback
+ - `DaemonDeniedError` and `DaemonRateLimitedError` error classes
+ - Daemon settings: `daemon.url` and `daemon.enabled` in defaults
+ - HTTP status code contract: 200 (cached), 201 (sync), 202 (async poll), 403, 429, 503
+
+ ## [0.3.6] - 2026-03-18
+
+ ### Added
+ - Add `lex-claude`, `lex-gemini`, `lex-openai` as runtime dependencies (AI provider extensions)
+
  ## [0.3.5] - 2026-03-18

  ### Added
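The 0.3.7 entry gives the daemon's HTTP status code contract only as bare numbers. The Ruby sketch below spells out what a caller would plausibly do for each code. The mapping follows how `daemon_ask` in `lib/legion/llm.rb` (later in this diff) treats the corresponding statuses, but `ACTIONS` and `next_action` are hypothetical names and are not part of the gem.

```ruby
# Hypothetical lookup table: daemon HTTP status code -> caller's next step.
# Mirrors the 0.3.7 contract; this is not an API exported by legion-llm.
ACTIONS = {
  200 => :use_body,           # cached result returned inline
  201 => :use_body,           # synchronous result returned inline
  202 => :poll,               # accepted for async work; poll the response cache
  403 => :raise_denied,       # surfaced to the caller as an error
  429 => :raise_rate_limited, # surfaced with a retry-after hint
  503 => :fall_back           # daemon unavailable; answer via the direct path
}.freeze

# Codes outside the contract are treated like 503 and fall back to the direct path.
# Accepts Integer or String codes, since Net::HTTPResponse#code is a String.
def next_action(code)
  ACTIONS.fetch(code.to_i, :fall_back)
end
```

Accepting strings matters because `Net::HTTPResponse#code` returns the status as a String such as `'200'`, which is why `interpret_response` calls `to_i` before branching.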
data/CLAUDE.md CHANGED
@@ -8,6 +8,7 @@
  Core LegionIO gem providing LLM capabilities to all extensions. Wraps ruby_llm to provide a consistent interface for chat, embeddings, tool use, and agents across multiple providers (Bedrock, Anthropic, OpenAI, Gemini, Ollama). Includes a dynamic weighted routing engine that dispatches requests across local, fleet, and cloud tiers based on caller intent, priority rules, time schedules, cost multipliers, and real-time provider health.

  **GitHub**: https://github.com/LegionIO/legion-llm
+ **Version**: 0.3.6
  **License**: Apache-2.0

  ## Architecture
@@ -61,8 +62,7 @@ Three-tier dispatch model. Local-first avoids unnecessary network hops; fleet of
  │ Zero network overhead, no Transport │
  │ │
  │ Tier 2: FLEET → Ollama on Mac Studios / GPU servers │
- │ Via Legion::Transport (AMQP) when local can't
- │ serve the model (Phase 2, not yet built) │
+ │ Via lex-llm-gateway RPC over AMQP
  │ │
  │ Tier 3: CLOUD → Bedrock / Anthropic / OpenAI / Gemini │
  │ Existing provider API calls │
@@ -87,6 +87,19 @@ Three-tier dispatch model. Local-first avoids unnecessary network hops; fleet of
  5. Return Resolution for highest-scoring candidate
  ```

+ ### Gateway Integration (lex-llm-gateway)
+
+ When `lex-llm-gateway` is installed, `chat`, `embed`, and `structured` automatically delegate to the gateway for metering and fleet dispatch. The gateway is loaded via `begin/rescue LoadError` — optional, not a hard dependency.
+
+ ```
+ Caller → Legion::LLM.chat(message:)
+   └─ gateway loaded? → Gateway::Runners::Inference.chat (meters, fleet dispatch)
+        └─ Legion::LLM.chat_direct (routing, escalation, RubyLLM)
+   └─ no gateway? → Legion::LLM.chat_direct (same path, no metering)
+ ```
+
+ The `_direct` variants (`chat_direct`, `embed_direct`, `structured_direct`) bypass gateway delegation. The gateway's `call_llm` uses these to avoid infinite recursion.
+
  ### Integration with LegionIO

  - **Service**: `setup_llm` called between data and supervision in startup sequence
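The delegation rule described in this hunk (use the gateway when present, otherwise the `_direct` path, with the gateway itself calling back into `_direct`) can be sketched in plain Ruby as follows. `DemoLLM`, `MeteringGateway`, and the `demo_llm_gateway` require path are placeholders standing in for `Legion::LLM`, `Gateway::Runners::Inference`, and `lex-llm-gateway`. The real gem detects the gateway with `defined?` rather than the `gateway` accessor used here.

```ruby
# Placeholder for Legion::LLM: chat delegates when a gateway is present.
module DemoLLM
  class << self
    attr_accessor :gateway # nil means "no gateway loaded"

    def chat(**kwargs)
      return gateway.chat(**kwargs) if gateway

      chat_direct(**kwargs)
    end

    # The *_direct variant never consults the gateway.
    def chat_direct(message:, **)
      "reply to #{message}"
    end
  end
end

# Placeholder for the gateway runner: meter the call, then re-enter via chat_direct.
module MeteringGateway
  METERED = []

  def self.chat(**kwargs)
    METERED << kwargs
    DemoLLM.chat_direct(**kwargs) # calling DemoLLM.chat here would recurse forever
  end
end

# Optional dependency: a missing gem raises LoadError, which is rescued.
begin
  require 'demo_llm_gateway' # placeholder path; not expected to exist
  DemoLLM.gateway = MeteringGateway
rescue LoadError
  DemoLLM.gateway = nil
end
```

Because `MeteringGateway.chat` re-enters through `chat_direct`, setting `DemoLLM.gateway = MeteringGateway` meters each call exactly once. If it called `DemoLLM.chat` instead, every request would bounce back into the gateway until the stack overflowed.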
@@ -94,6 +107,7 @@ Three-tier dispatch model. Local-first avoids unnecessary network hops; fleet of
  - **Helpers**: `Legion::Extensions::Helpers::LLM` auto-loaded when gem is present
  - **Readiness**: Registers as `:llm` in `Legion::Readiness`
  - **Shutdown**: `Legion::LLM.shutdown` called during service shutdown
+ - **Gateway**: `lex-llm-gateway` auto-loaded if present; provides metering and fleet RPC

  ## Dependencies

@@ -103,6 +117,7 @@ Three-tier dispatch model. Local-first avoids unnecessary network hops; fleet of
  | `tzinfo` (>= 2.0) | IANA timezone conversion for schedule windows |
  | `legion-logging` | Logging |
  | `legion-settings` | Configuration |
+ | `lex-llm-gateway` (optional) | Metering over RMQ, fleet RPC dispatch, disk spool — auto-loaded if present |

  ## Key Interfaces

@@ -113,11 +128,15 @@ Legion::LLM.shutdown # Cleanup
  Legion::LLM.started? # -> Boolean
  Legion::LLM.settings # -> Hash

- # Chat (with optional routing)
- Legion::LLM.chat(model:, provider:) # Direct (no routing)
+ # Chat (delegates to gateway when loaded, otherwise direct)
+ Legion::LLM.chat(message: 'hello', model:, provider:) # Gateway-metered if available
  Legion::LLM.chat(intent: { privacy: :strict }) # Intent-based routing
  Legion::LLM.chat(tier: :cloud, model: 'claude-sonnet-4-6') # Explicit tier override
- Legion::LLM.embed(text, model:) # Embeddings (no routing)
+ Legion::LLM.chat_direct(message:, model:, provider:) # Bypass gateway (no metering)
+ Legion::LLM.embed(text, model:) # Embeddings (gateway-metered)
+ Legion::LLM.embed_direct(text, model:) # Bypass gateway
+ Legion::LLM.structured(messages:, schema:) # Structured (gateway-metered)
+ Legion::LLM.structured_direct(messages:, schema:) # Bypass gateway
  Legion::LLM.agent(AgentClass) # Agent instance

  # Compressor
@@ -284,7 +303,7 @@ In-memory signal consumer with pluggable handlers. Adjusts effective priorities
  | `lib/legion/llm/embeddings.rb` | Embeddings module: generate, generate_batch, default_model |
  | `lib/legion/llm/shadow_eval.rb` | Shadow evaluation: enabled?, should_sample?, evaluate, compare |
  | `lib/legion/llm/structured_output.rb` | JSON schema enforcement with native response_format and prompt fallback |
- | `lib/legion/llm/version.rb` | Version constant (0.3.3) |
+ | `lib/legion/llm/version.rb` | Version constant (0.3.6) |
  | `lib/legion/llm/quality_checker.rb` | QualityChecker module with QualityResult struct |
  | `lib/legion/llm/escalation_history.rb` | EscalationHistory mixin: `escalation_history`, `escalated?`, `final_resolution`, `escalation_chain` |
  | `lib/legion/llm/router/escalation_chain.rb` | EscalationChain value object |
@@ -315,6 +334,7 @@ In-memory signal consumer with pluggable handlers. Adjusts effective priorities
  | `spec/legion/llm/embeddings_spec.rb` | Embeddings tests |
  | `spec/legion/llm/shadow_eval_spec.rb` | ShadowEval tests |
  | `spec/legion/llm/structured_output_spec.rb` | StructuredOutput tests |
+ | `spec/legion/llm/gateway_integration_spec.rb` | Tests: gateway delegation and _direct bypass |
  | `spec/spec_helper.rb` | Stubbed Legion::Logging and Legion::Settings for testing |

  ## Extension Integration
@@ -374,8 +394,8 @@ The legacy `vault_path` per-provider setting was removed in v0.3.1.
  Tests run without the full LegionIO stack. `spec/spec_helper.rb` stubs `Legion::Logging` and `Legion::Settings` with in-memory implementations. Each test resets settings to defaults via `before(:each)`.

  ```bash
- bundle exec rspec # 287 examples, 0 failures
- bundle exec rubocop # 31 files, 0 offenses
+ bundle exec rspec # 304 examples, 0 failures
+ bundle exec rubocop # 52 files, 0 offenses
  ```

  ## Design Documents
@@ -389,8 +409,8 @@ bundle exec rubocop # 31 files, 0 offenses

  ## Future (Not Yet Built)

- - **Fleet tier (Phase 2)**: `lex-llm-fleet` extension inference workers on Mac Studios / NVIDIA servers, dispatched via Legion::Transport AMQP queues
- - **Advanced signals (Phase 3)**: Budget tracking, lex-metering integration, GPU utilization monitoring
+ - **Advanced signals**: Budget tracking, GPU utilization monitoring, per-tenant spend limits
+ - **Fleet auto-scaling**: Dynamic worker pool sizing based on queue depth and latency

  ---

data/Gemfile CHANGED
@@ -4,8 +4,6 @@ source 'https://rubygems.org'

  gemspec

- gem 'lex-llm-gateway', path: '../extensions-core/lex-llm-gateway' if File.directory?('../extensions-core/lex-llm-gateway')
-
  group :test do
    gem 'rake'
    gem 'rspec'
data/README.md CHANGED
@@ -2,6 +2,8 @@

  LLM integration for the [LegionIO](https://github.com/LegionIO/LegionIO) framework. Wraps [ruby_llm](https://github.com/crmne/ruby_llm) to provide chat, embeddings, tool use, and agent capabilities to any Legion extension.

+ **Version**: 0.3.6
+
  ## Installation

  ```ruby
@@ -599,7 +601,7 @@ bundle exec rspec
  Tests use stubbed `Legion::Logging` and `Legion::Settings` modules (no need for the full LegionIO stack):

  ```bash
- bundle exec rspec # Run all 269 tests
+ bundle exec rspec # Run all 304 tests
  bundle exec rubocop # Lint (0 offenses)
  bundle exec rspec spec/legion/llm_spec.rb # Run specific test file
  bundle exec rspec spec/legion/llm/router_spec.rb # Router tests only
data/legion-llm.gemspec CHANGED
@@ -27,6 +27,9 @@ Gem::Specification.new do |spec|

    spec.add_dependency 'legion-logging'
    spec.add_dependency 'legion-settings'
+   spec.add_dependency 'lex-claude'
+   spec.add_dependency 'lex-gemini'
+   spec.add_dependency 'lex-openai'
    spec.add_dependency 'ruby_llm', '>= 1.0'
    spec.add_dependency 'tzinfo', '>= 2.0'
  end
@@ -0,0 +1,179 @@
+ # frozen_string_literal: true
+
+ require 'net/http'
+ require 'uri'
+ require 'json'
+ require 'securerandom'
+
+ module Legion
+   module LLM
+     module DaemonClient
+       HEALTH_CACHE_TTL = 30
+       DEFAULT_TIMEOUT = 60
+
+       module_function
+
+       # Returns true if the daemon is reachable and healthy.
+       # Returns false immediately if daemon_url is nil.
+       # Caches a positive health check for HEALTH_CACHE_TTL seconds.
+       # An unhealthy result is not cached — rechecks on every call.
+       def available?
+         return false if daemon_url.nil?
+
+         now = ::Process.clock_gettime(::Process::CLOCK_MONOTONIC)
+
+         return true if @healthy == true && @health_checked_at && (now - @health_checked_at) < HEALTH_CACHE_TTL
+
+         result = check_health
+         if result
+           @healthy = true
+           @health_checked_at = ::Process.clock_gettime(::Process::CLOCK_MONOTONIC)
+         end
+         result
+       end
+
+       # POSTs a chat request to the daemon REST API.
+       # Returns a status hash based on the HTTP response code.
+       def chat(message:, request_id: nil, context: {}, tier_preference: :auto, model: nil, provider: nil)
+         request_id ||= SecureRandom.uuid
+
+         body = {
+           message: message,
+           request_id: request_id,
+           context: context,
+           tier_preference: tier_preference
+         }
+         body[:model] = model if model
+         body[:provider] = provider if provider
+
+         response = http_post('/api/llm/chat', body)
+         interpret_response(response)
+       rescue StandardError => e
+         mark_unhealthy
+         { status: :unavailable, error: e.message }
+       end
+
+       # Returns the daemon URL from settings, cached after first read.
+       # Returns nil if settings are unavailable or the key is missing.
+       def daemon_url
+         return @daemon_url if defined?(@daemon_url)
+
+         @daemon_url = fetch_daemon_url
+       end
+
+       # Clears all cached state. Returns self for chaining.
+       def reset!
+         remove_instance_variable(:@daemon_url) if defined?(@daemon_url)
+         @healthy = nil
+         @health_checked_at = nil
+         self
+       end
+
+       # GETs /api/health. Returns true on 200, false otherwise.
+       # Updates @healthy and @health_checked_at.
+       def check_health
+         response = http_get('/api/health')
+         healthy = response.code == '200'
+         @healthy = healthy
+         @health_checked_at = ::Process.clock_gettime(::Process::CLOCK_MONOTONIC)
+         healthy
+       rescue StandardError
+         mark_unhealthy
+         false
+       end
+
+       # Marks the daemon as unhealthy and records the timestamp.
+       def mark_unhealthy
+         @healthy = false
+         @health_checked_at = ::Process.clock_gettime(::Process::CLOCK_MONOTONIC)
+       end
+
+       # Builds and sends a GET request. Returns Net::HTTPResponse.
+       def http_get(path)
+         uri = URI.parse("#{daemon_url}#{path}")
+         http = Net::HTTP.new(uri.host, uri.port)
+         http.open_timeout = 2
+         http.read_timeout = 2
+         request = Net::HTTP::Get.new(uri.request_uri)
+         request['Content-Type'] = 'application/json'
+         http.request(request)
+       end
+
+       # Builds and sends a POST request with a JSON body.
+       # Returns Net::HTTPResponse.
+       def http_post(path, body)
+         uri = URI.parse("#{daemon_url}#{path}")
+         http = Net::HTTP.new(uri.host, uri.port)
+         http.open_timeout = 5
+         http.read_timeout = DEFAULT_TIMEOUT
+         request = Net::HTTP::Post.new(uri.request_uri)
+         request['Content-Type'] = 'application/json'
+         request.body = ::JSON.dump(body)
+         http.request(request)
+       end
+
+       # Maps an HTTP response to a status hash.
+       # Follows the Legion API format: { data: {...} } for success,
+       # { error: {...} } for failure.
+       def interpret_response(response)
+         code = response.code.to_i
+         parsed = safe_parse(response.body)
+
+         case code
+         when 200
+           { status: :immediate, body: parsed.fetch(:data, parsed) }
+         when 201
+           { status: :created, body: parsed.fetch(:data, parsed) }
+         when 202
+           data = parsed.fetch(:data, {})
+           { status: :accepted, request_id: data[:request_id], poll_key: data[:poll_key] }
+         when 403
+           { status: :denied, error: parsed.fetch(:error, parsed) }
+         when 429
+           retry_after = extract_retry_after(response, parsed)
+           { status: :rate_limited, retry_after: retry_after }
+         when 503
+           { status: :unavailable }
+         else
+           { status: :error, code: code, body: parsed }
+         end
+       end
+
+       # ── private helpers ────────────────────────────────────────────────
+
+       def fetch_daemon_url
+         return nil unless defined?(Legion::LLM) && Legion::LLM.respond_to?(:settings)
+
+         settings = Legion::LLM.settings
+         return nil unless settings.is_a?(Hash)
+
+         daemon = settings[:daemon]
+         return nil unless daemon.is_a?(Hash)
+
+         daemon[:url]
+       rescue StandardError
+         nil
+       end
+
+       def safe_parse(body)
+         return {} if body.nil? || body.strip.empty?
+
+         ::JSON.parse(body, symbolize_names: true)
+       rescue ::JSON::ParserError
+         {}
+       end
+
+       def extract_retry_after(response, parsed)
+         from_body = parsed.dig(:error, :retry_after) || parsed[:retry_after]
+         return from_body.to_i if from_body
+
+         header = response['Retry-After']
+         return header.to_i if header
+
+         0
+       end
+
+       private_class_method :fetch_daemon_url, :safe_parse, :extract_retry_after
+     end
+   end
+ end
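`available?` caches only positive health checks. A healthy probe is trusted for `HEALTH_CACHE_TTL` (30) seconds, while an unhealthy result is rechecked on the next call. The sketch below isolates that rule in a small class with an injectable clock, so it can be exercised without a daemon. `HealthCache` and its constructor arguments are hypothetical, and the sketch records the probe time once per call instead of re-reading the clock after the probe as the original does.

```ruby
# Hypothetical stand-in for DaemonClient's health-caching rule.
class HealthCache
  MONOTONIC = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }

  # ttl:   seconds a *healthy* result may be reused
  # clock: callable returning seconds (monotonic by default)
  # probe: block returning true when the daemon answers its health check
  def initialize(ttl:, clock: MONOTONIC, &probe)
    @ttl = ttl
    @clock = clock
    @probe = probe
    @healthy_at = nil
  end

  def available?
    now = @clock.call
    # Reuse a recent positive result without probing again.
    return true if @healthy_at && (now - @healthy_at) < @ttl

    if @probe.call
      @healthy_at = now # cache only the positive outcome
      true
    else
      @healthy_at = nil # never cache failure: the next call probes again
      false
    end
  end
end
```

With this rule, a healthy daemon is probed at most once per TTL window, while an unreachable one is re-probed on every call. In the original, each such probe is bounded by the 2-second open and read timeouts set in `http_get`.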
@@ -0,0 +1,133 @@
+ # frozen_string_literal: true
+
+ require 'fileutils'
+ require 'json'
+
+ module Legion
+   module LLM
+     module ResponseCache
+       DEFAULT_TTL = 300
+       SPOOL_THRESHOLD = 8 * 1024 * 1024 # 8 MB
+       SPOOL_DIR = File.expand_path('~/.legionio/data/spool/llm_responses').freeze
+
+       module_function
+
+       # Sets status to :pending for a new request.
+       def init_request(request_id, ttl: DEFAULT_TTL)
+         cache_set(status_key(request_id), 'pending', ttl)
+       end
+
+       # Writes response, meta, and marks status as :done.
+       def complete(request_id, response:, meta:, ttl: DEFAULT_TTL)
+         write_response(request_id, response, ttl)
+         cache_set(meta_key(request_id), ::JSON.dump(meta), ttl)
+         cache_set(status_key(request_id), 'done', ttl)
+       end
+
+       # Writes error details and marks status as :error.
+       def fail_request(request_id, code:, message:, ttl: DEFAULT_TTL)
+         payload = ::JSON.dump({ code: code, message: message })
+         cache_set(error_key(request_id), payload, ttl)
+         cache_set(status_key(request_id), 'error', ttl)
+       end
+
+       # Returns :pending, :done, :error, or nil.
+       def status(request_id)
+         raw = Legion::Cache.get(status_key(request_id))
+         raw&.to_sym
+       end
+
+       # Returns the response string (handles spool overflow transparently).
+       def response(request_id)
+         raw = Legion::Cache.get(response_key(request_id))
+         return nil if raw.nil?
+         return File.read(raw.delete_prefix('spool:')) if raw.start_with?('spool:')
+
+         raw
+       end
+
+       # Returns meta hash with symbolized keys, or nil.
+       def meta(request_id)
+         raw = Legion::Cache.get(meta_key(request_id))
+         return nil if raw.nil?
+
+         ::JSON.parse(raw, symbolize_names: true)
+       end
+
+       # Returns { code:, message: } hash, or nil.
+       def error(request_id)
+         raw = Legion::Cache.get(error_key(request_id))
+         return nil if raw.nil?
+
+         ::JSON.parse(raw, symbolize_names: true)
+       end
+
+       # Blocking poll. Returns { status: :done, response:, meta: },
+       # { status: :error, error: }, or { status: :timeout }.
+       def poll(request_id, timeout: DEFAULT_TTL, interval: 0.1)
+         deadline = ::Process.clock_gettime(::Process::CLOCK_MONOTONIC) + timeout
+
+         loop do
+           current = status(request_id)
+
+           case current
+           when :done
+             return { status: :done, response: response(request_id), meta: meta(request_id) }
+           when :error
+             return { status: :error, error: error(request_id) }
+           end
+
+           return { status: :timeout } if ::Process.clock_gettime(::Process::CLOCK_MONOTONIC) >= deadline
+
+           sleep interval
+         end
+       end
+
+       # Removes all cache keys for a request (and any spool file).
+       def cleanup(request_id)
+         raw = Legion::Cache.get(response_key(request_id))
+         if raw&.start_with?('spool:')
+           path = raw.delete_prefix('spool:')
+           FileUtils.rm_f(path)
+         end
+
+         Legion::Cache.delete(status_key(request_id))
+         Legion::Cache.delete(response_key(request_id))
+         Legion::Cache.delete(meta_key(request_id))
+         Legion::Cache.delete(error_key(request_id))
+       end
+
+       # ── private helpers ────────────────────────────────────────────────
+       private_class_method def self.status_key(request_id)
+         "llm:#{request_id}:status"
+       end
+
+       private_class_method def self.response_key(request_id)
+         "llm:#{request_id}:response"
+       end
+
+       private_class_method def self.meta_key(request_id)
+         "llm:#{request_id}:meta"
+       end
+
+       private_class_method def self.error_key(request_id)
+         "llm:#{request_id}:error"
+       end
+
+       private_class_method def self.cache_set(key, value, ttl)
+         Legion::Cache.set(key, value, ttl)
+       end
+
+       private_class_method def self.write_response(request_id, response_text, ttl)
+         if response_text.bytesize > SPOOL_THRESHOLD
+           FileUtils.mkdir_p(SPOOL_DIR)
+           path = File.join(SPOOL_DIR, "#{request_id}.txt")
+           File.write(path, response_text)
+           cache_set(response_key(request_id), "spool:#{path}", ttl)
+         else
+           cache_set(response_key(request_id), response_text, ttl)
+         end
+       end
+     end
+   end
+ end
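`write_response` keeps responses up to `SPOOL_THRESHOLD` (8 MB) directly in the cache. Above that size, it writes the text to a file under `SPOOL_DIR` and caches only a `spool:<path>` pointer, which `response` and `cleanup` then follow. The self-contained sketch below shows the same pattern. A plain Hash stands in for `Legion::Cache`, and the threshold is passed in so the behaviour can be tested with small strings. `SpoolingStore` and its methods are hypothetical names.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical sketch of ResponseCache's spool-overflow pattern.
class SpoolingStore
  PREFIX = 'spool:'

  def initialize(dir:, threshold:)
    @dir = dir
    @threshold = threshold
    @kv = {} # stands in for Legion::Cache
  end

  # Raw stored value: either the text itself or a spool pointer.
  def raw(key)
    @kv[key]
  end

  def write(key, text)
    if text.bytesize > @threshold
      FileUtils.mkdir_p(@dir)
      path = File.join(@dir, "#{key}.txt")
      File.write(path, text)
      @kv[key] = "#{PREFIX}#{path}"
    else
      @kv[key] = text
    end
  end

  # Follows a spool pointer transparently.
  def read(key)
    value = @kv[key]
    return nil if value.nil?
    return File.read(value.delete_prefix(PREFIX)) if value.start_with?(PREFIX)

    value
  end

  # Removes the key and any spool file it points to.
  def delete(key)
    value = @kv.delete(key)
    FileUtils.rm_f(value.delete_prefix(PREFIX)) if value&.start_with?(PREFIX)
  end
end
```

Because the pointer is distinguished only by its string prefix, both this sketch and the original would treat a short response whose text happens to begin with `spool:` as a file path when reading it back.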
@@ -13,7 +13,15 @@ module Legion
      providers: providers,
      routing: routing_defaults,
      discovery: discovery_defaults,
-     gateway: gateway_defaults
+     gateway: gateway_defaults,
+     daemon: daemon_defaults
+   }
+ end
+
+ def self.daemon_defaults
+   {
+     url: nil,
+     enabled: false
    }
  end
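The new defaults ship with `url: nil` and `enabled: false`. In the code shown in this diff, `DaemonClient` reads only `daemon.url` (through `fetch_daemon_url`), and nothing reads `daemon.enabled`. A caller that wants both keys to matter could gate on them explicitly, as in the sketch below. `daemon_configured?` is a hypothetical helper and `DAEMON_DEFAULTS` simply restates the shipped defaults.

```ruby
# Restates the shipped defaults from the settings hunk.
DAEMON_DEFAULTS = { url: nil, enabled: false }.freeze

# Hypothetical gate: the daemon counts as configured only when it is
# explicitly enabled AND has a non-blank URL.
def daemon_configured?(settings)
  daemon = DAEMON_DEFAULTS.merge(settings.fetch(:daemon, {}))
  daemon[:enabled] == true && !daemon[:url].to_s.strip.empty?
end
```

For example, with an arbitrary local URL, `daemon_configured?(daemon: { url: 'http://localhost:4567', enabled: true })` is true, while the shipped defaults (or a URL without `enabled: true`) give false.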
 
@@ -2,6 +2,6 @@

  module Legion
    module LLM
-     VERSION = '0.3.5'
+     VERSION = '0.3.7'
    end
  end
data/lib/legion/llm.rb CHANGED
@@ -8,6 +8,8 @@ require 'legion/llm/router'
  require 'legion/llm/compressor'
  require 'legion/llm/quality_checker'
  require 'legion/llm/escalation_history'
+ require_relative 'llm/response_cache'
+ require_relative 'llm/daemon_client'

  begin
    require 'legion/extensions/llm/gateway'
@@ -18,6 +20,8 @@ end
  module Legion
    module LLM
      class EscalationExhausted < StandardError; end
+     class DaemonDeniedError < StandardError; end
+     class DaemonRateLimitedError < StandardError; end

      class << self
        include Legion::LLM::Providers
@@ -71,6 +75,19 @@ module Legion
            quality_check: quality_check, message: message, **)
        end

+       # Send a single message — daemon-first, falls through to direct on unavailability.
+       def ask(message:, model: nil, provider: nil, intent: nil, tier: nil,
+               context: {}, identity: nil, &)
+         if DaemonClient.available?
+           result = daemon_ask(message: message, model: model, provider: provider,
+                               context: context, tier: tier, identity: identity)
+           return result if result
+         end
+
+         ask_direct(message: message, model: model, provider: provider,
+                    intent: intent, tier: tier, &)
+       end
+
        # Direct chat bypassing gateway — used by gateway runners to avoid recursion
        def chat_direct(model: nil, provider: nil, intent: nil, tier: nil, escalate: nil,
                        max_escalations: nil, quality_check: nil, message: nil, **)
@@ -135,6 +152,41 @@ module Legion

        private

+       def daemon_ask(message:, model: nil, provider: nil, context: {}, tier: nil, identity: nil) # rubocop:disable Lint/UnusedMethodArgument
+         result = DaemonClient.chat(
+           message: message, model: model, provider: provider,
+           context: context, tier_preference: tier || :auto
+         )
+
+         case result[:status]
+         when :immediate, :created
+           result[:body]
+         when :accepted
+           ResponseCache.poll(result[:request_id])
+         when :denied
+           raise DaemonDeniedError, result.dig(:error, :message) || 'Access denied'
+         when :rate_limited
+           raise DaemonRateLimitedError, "Rate limited. Retry after #{result[:retry_after]}s"
+         end
+         # Returns nil for :unavailable/:error — caller falls through to direct
+       end
+
+       def ask_direct(message:, model: nil, provider: nil, intent: nil, tier: nil, &block)
+         session = chat_direct(model: model, provider: provider, intent: intent, tier: tier)
+         response = block ? session.ask(message, &block) : session.ask(message)
+
+         {
+           status: :done,
+           response: response.content,
+           meta: {
+             tier: :direct,
+             model: session.model.to_s,
+             tokens_in: response.respond_to?(:input_tokens) ? response.input_tokens : nil,
+             tokens_out: response.respond_to?(:output_tokens) ? response.output_tokens : nil
+           }
+         }
+       end
+
        def gateway_loaded?
          defined?(Legion::Extensions::LLM::Gateway::Runners::Inference)
        end
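`ask` tries the daemon only when `DaemonClient.available?` is true. `daemon_ask` returns nil for `:unavailable` and `:error`, which sends the request down the `ask_direct` path, while `:denied` and `:rate_limited` raise instead of falling back. The sketch below reproduces that control flow with injected callables, so it runs without a daemon, a cache, or RubyLLM. `ask_with_fallback`, `DeniedError`, and `RateLimitedError` are hypothetical stand-ins for `Legion::LLM.ask`, `DaemonDeniedError`, and `DaemonRateLimitedError`.

```ruby
class DeniedError < StandardError; end
class RateLimitedError < StandardError; end

# Hypothetical sketch of the daemon-first flow in Legion::LLM.ask.
#   available: callable -> Boolean          (stands in for DaemonClient.available?)
#   daemon:    callable(message) -> Hash    (stands in for DaemonClient.chat)
#   poller:    callable(request_id) -> obj  (stands in for ResponseCache.poll)
#   direct:    callable(message) -> obj     (stands in for ask_direct)
def ask_with_fallback(message, available:, daemon:, poller:, direct:)
  if available.call
    result = daemon.call(message)
    answer =
      case result[:status]
      when :immediate, :created then result[:body]
      when :accepted then poller.call(result[:request_id])
      when :denied then raise DeniedError, result.dig(:error, :message) || 'Access denied'
      when :rate_limited then raise RateLimitedError, "Rate limited. Retry after #{result[:retry_after]}s"
      end
    # :unavailable, :error, or any unknown status leaves answer nil.
    return answer if answer
  end

  direct.call(message)
end
```

One consequence, visible in both the sketch and the original, is that the return shape depends on the path taken. A 200 or 201 yields the daemon's `data` body, a 202 yields whatever the poll returns (including a truthy `{ status: :timeout }`, which is returned rather than falling back), and the direct path yields a `{ status: :done, response:, meta: }` hash.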
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: legion-llm
  version: !ruby/object:Gem::Version
-   version: 0.3.5
+   version: 0.3.7
  platform: ruby
  authors:
  - Esity
@@ -37,6 +37,48 @@ dependencies:
      - - ">="
        - !ruby/object:Gem::Version
          version: '0'
+ - !ruby/object:Gem::Dependency
+   name: lex-claude
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+ - !ruby/object:Gem::Dependency
+   name: lex-gemini
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+ - !ruby/object:Gem::Dependency
+   name: lex-openai
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
  - !ruby/object:Gem::Dependency
    name: ruby_llm
    requirement: !ruby/object:Gem::Requirement
@@ -91,6 +133,7 @@ files:
  - lib/legion/llm/bedrock_bearer_auth.rb
  - lib/legion/llm/claude_config_loader.rb
  - lib/legion/llm/compressor.rb
+ - lib/legion/llm/daemon_client.rb
  - lib/legion/llm/discovery/ollama.rb
  - lib/legion/llm/discovery/system.rb
  - lib/legion/llm/embeddings.rb
@@ -98,6 +141,7 @@ files:
  - lib/legion/llm/helpers/llm.rb
  - lib/legion/llm/providers.rb
  - lib/legion/llm/quality_checker.rb
+ - lib/legion/llm/response_cache.rb
  - lib/legion/llm/router.rb
  - lib/legion/llm/router/escalation_chain.rb
  - lib/legion/llm/router/gateway_interceptor.rb