legion-llm 0.3.6 → 0.3.7

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: '0914899eb9eee81b947d95d617a16ddf152fb74fa7afb4c1c1cfca74c9c8445d'
-   data.tar.gz: d4146a95967ceffca175c531fd412089f3a13df4b4b60964598e123115d3c19f
+   metadata.gz: b06b6f10d21c6c4d03c73646fbecc2112e61e47e1dd82059076c61a721efb1aa
+   data.tar.gz: 541a1a1de0a108e95b5e2c204ec579a1b0a5f77e935bad64d7668aef9ae3322d
  SHA512:
-   metadata.gz: 8d9fb16e659a4f24d6c01bb3b7caa96d6814980e5b9866fe8ccc293bae57121f8d21acc95efef98832b015875abebfe1ca2cbba63f825a43d64cb9feac82f9b2
-   data.tar.gz: 4e6788a7b28889ed80ec1701e5a45a05bcfe71914610b538fae2f68d3b16ac4942e8edd0abbf2414d3dd124edc109817ceef3390d22108c1c9899a82b6d93c55
+   metadata.gz: 06ab55cec8a23d4be70ea3851fd4a7717686c4e02f7b4ca2f479e2353f79b14cacc770343a81ac54f492a05de3b5228aebb7c2e464203e434eba53e8b4144694
+   data.tar.gz: 93623de5b0baa0bb5390678daac043fc6c08111f965886b03b87f5e769aa6e5f267a713c1c20d93d061341a6ceca8d88fcbd0dfa431d4ce84c2caf5768b19609
data/CHANGELOG.md CHANGED
@@ -1,5 +1,15 @@
  # Legion LLM Changelog
 
+ ## [0.3.7] - 2026-03-19
+
+ ### Added
+ - `ResponseCache` module for async response delivery via memcached with spool overflow at 8MB
+ - `DaemonClient` module for HTTP routing to LegionIO daemon with health caching (30s TTL)
+ - `Legion::LLM.ask` one-shot method: daemon-first routing with direct RubyLLM fallback
+ - `DaemonDeniedError` and `DaemonRateLimitedError` error classes
+ - Daemon settings: `daemon.url` and `daemon.enabled` in defaults
+ - HTTP status code contract: 200 (cached), 201 (sync), 202 (async poll), 403, 429, 503
+
  ## [0.3.6] - 2026-03-18
 
  ### Added
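Reviewer note: the status code contract in the 0.3.7 changelog entry can be read as a lookup table. The following is a minimal sketch, not code from the gem. The names `STATUS_CONTRACT` and `classify` are hypothetical; the symbols are the ones `DaemonClient.interpret_response` returns in this release, and any unlisted code is assumed to map to `:error`, as in its `else` branch.

```ruby
# Hypothetical lookup table mirroring the changelog's status code contract.
# The symbols match those returned by DaemonClient.interpret_response.
STATUS_CONTRACT = {
  200 => :immediate,    # cached response, returned right away
  201 => :created,      # synchronous response
  202 => :accepted,     # async: the caller polls for the result
  403 => :denied,
  429 => :rate_limited,
  503 => :unavailable
}.freeze

# Net::HTTPResponse#code is a String, so normalise with to_i first.
def classify(code)
  STATUS_CONTRACT.fetch(code.to_i, :error)
end

puts classify('202')
puts classify(500)
```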
data/CLAUDE.md CHANGED
@@ -8,7 +8,7 @@
  Core LegionIO gem providing LLM capabilities to all extensions. Wraps ruby_llm to provide a consistent interface for chat, embeddings, tool use, and agents across multiple providers (Bedrock, Anthropic, OpenAI, Gemini, Ollama). Includes a dynamic weighted routing engine that dispatches requests across local, fleet, and cloud tiers based on caller intent, priority rules, time schedules, cost multipliers, and real-time provider health.
 
  **GitHub**: https://github.com/LegionIO/legion-llm
- **Version**: 0.3.5
+ **Version**: 0.3.6
  **License**: Apache-2.0
 
  ## Architecture
@@ -303,7 +303,7 @@ In-memory signal consumer with pluggable handlers. Adjusts effective priorities
  | `lib/legion/llm/embeddings.rb` | Embeddings module: generate, generate_batch, default_model |
  | `lib/legion/llm/shadow_eval.rb` | Shadow evaluation: enabled?, should_sample?, evaluate, compare |
  | `lib/legion/llm/structured_output.rb` | JSON schema enforcement with native response_format and prompt fallback |
- | `lib/legion/llm/version.rb` | Version constant (0.3.5) |
+ | `lib/legion/llm/version.rb` | Version constant (0.3.6) |
  | `lib/legion/llm/quality_checker.rb` | QualityChecker module with QualityResult struct |
  | `lib/legion/llm/escalation_history.rb` | EscalationHistory mixin: `escalation_history`, `escalated?`, `final_resolution`, `escalation_chain` |
  | `lib/legion/llm/router/escalation_chain.rb` | EscalationChain value object |
data/README.md CHANGED
@@ -2,7 +2,7 @@
 
  LLM integration for the [LegionIO](https://github.com/LegionIO/LegionIO) framework. Wraps [ruby_llm](https://github.com/crmne/ruby_llm) to provide chat, embeddings, tool use, and agent capabilities to any Legion extension.
 
- **Version**: 0.3.5
+ **Version**: 0.3.6
 
  ## Installation
 
@@ -0,0 +1,179 @@
+ # frozen_string_literal: true
+
+ require 'net/http'
+ require 'uri'
+ require 'json'
+ require 'securerandom'
+
+ module Legion
+   module LLM
+     module DaemonClient
+       HEALTH_CACHE_TTL = 30
+       DEFAULT_TIMEOUT = 60
+
+       module_function
+
+       # Returns true if the daemon is reachable and healthy.
+       # Returns false immediately if daemon_url is nil.
+       # Caches a positive health check for HEALTH_CACHE_TTL seconds.
+       # An unhealthy result is not cached — rechecks on every call.
+       def available?
+         return false if daemon_url.nil?
+
+         now = ::Process.clock_gettime(::Process::CLOCK_MONOTONIC)
+
+         return true if @healthy == true && @health_checked_at && (now - @health_checked_at) < HEALTH_CACHE_TTL
+
+         result = check_health
+         if result
+           @healthy = true
+           @health_checked_at = ::Process.clock_gettime(::Process::CLOCK_MONOTONIC)
+         end
+         result
+       end
+
+       # POSTs a chat request to the daemon REST API.
+       # Returns a status hash based on the HTTP response code.
+       def chat(message:, request_id: nil, context: {}, tier_preference: :auto, model: nil, provider: nil)
+         request_id ||= SecureRandom.uuid
+
+         body = {
+           message: message,
+           request_id: request_id,
+           context: context,
+           tier_preference: tier_preference
+         }
+         body[:model] = model if model
+         body[:provider] = provider if provider
+
+         response = http_post('/api/llm/chat', body)
+         interpret_response(response)
+       rescue StandardError => e
+         mark_unhealthy
+         { status: :unavailable, error: e.message }
+       end
+
+       # Returns the daemon URL from settings, cached after first read.
+       # Returns nil if settings are unavailable or the key is missing.
+       def daemon_url
+         return @daemon_url if defined?(@daemon_url)
+
+         @daemon_url = fetch_daemon_url
+       end
+
+       # Clears all cached state. Returns self for chaining.
+       def reset!
+         remove_instance_variable(:@daemon_url) if defined?(@daemon_url)
+         @healthy = nil
+         @health_checked_at = nil
+         self
+       end
+
+       # GETs /api/health. Returns true on 200, false otherwise.
+       # Updates @healthy and @health_checked_at.
+       def check_health
+         response = http_get('/api/health')
+         healthy = response.code == '200'
+         @healthy = healthy
+         @health_checked_at = ::Process.clock_gettime(::Process::CLOCK_MONOTONIC)
+         healthy
+       rescue StandardError
+         mark_unhealthy
+         false
+       end
+
+       # Marks the daemon as unhealthy and records the timestamp.
+       def mark_unhealthy
+         @healthy = false
+         @health_checked_at = ::Process.clock_gettime(::Process::CLOCK_MONOTONIC)
+       end
+
+       # Builds and sends a GET request. Returns Net::HTTPResponse.
+       def http_get(path)
+         uri = URI.parse("#{daemon_url}#{path}")
+         http = Net::HTTP.new(uri.host, uri.port)
+         http.open_timeout = 2
+         http.read_timeout = 2
+         request = Net::HTTP::Get.new(uri.request_uri)
+         request['Content-Type'] = 'application/json'
+         http.request(request)
+       end
+
+       # Builds and sends a POST request with a JSON body.
+       # Returns Net::HTTPResponse.
+       def http_post(path, body)
+         uri = URI.parse("#{daemon_url}#{path}")
+         http = Net::HTTP.new(uri.host, uri.port)
+         http.open_timeout = 5
+         http.read_timeout = DEFAULT_TIMEOUT
+         request = Net::HTTP::Post.new(uri.request_uri)
+         request['Content-Type'] = 'application/json'
+         request.body = ::JSON.dump(body)
+         http.request(request)
+       end
+
+       # Maps an HTTP response to a status hash.
+       # Follows the Legion API format: { data: {...} } for success,
+       # { error: {...} } for failure.
+       def interpret_response(response)
+         code = response.code.to_i
+         parsed = safe_parse(response.body)
+
+         case code
+         when 200
+           { status: :immediate, body: parsed.fetch(:data, parsed) }
+         when 201
+           { status: :created, body: parsed.fetch(:data, parsed) }
+         when 202
+           data = parsed.fetch(:data, {})
+           { status: :accepted, request_id: data[:request_id], poll_key: data[:poll_key] }
+         when 403
+           { status: :denied, error: parsed.fetch(:error, parsed) }
+         when 429
+           retry_after = extract_retry_after(response, parsed)
+           { status: :rate_limited, retry_after: retry_after }
+         when 503
+           { status: :unavailable }
+         else
+           { status: :error, code: code, body: parsed }
+         end
+       end
+
+       # ── private helpers ────────────────────────────────────────────────
+
+       def fetch_daemon_url
+         return nil unless defined?(Legion::LLM) && Legion::LLM.respond_to?(:settings)
+
+         settings = Legion::LLM.settings
+         return nil unless settings.is_a?(Hash)
+
+         daemon = settings[:daemon]
+         return nil unless daemon.is_a?(Hash)
+
+         daemon[:url]
+       rescue StandardError
+         nil
+       end
+
+       def safe_parse(body)
+         return {} if body.nil? || body.strip.empty?
+
+         ::JSON.parse(body, symbolize_names: true)
+       rescue ::JSON::ParserError
+         {}
+       end
+
+       def extract_retry_after(response, parsed)
+         from_body = parsed.dig(:error, :retry_after) || parsed[:retry_after]
+         return from_body.to_i if from_body
+
+         header = response['Retry-After']
+         return header.to_i if header
+
+         0
+       end
+
+       private_class_method :fetch_daemon_url, :safe_parse, :extract_retry_after
+     end
+   end
+ end
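Reviewer note: `DaemonClient.available?` trusts a positive health check for `HEALTH_CACHE_TTL` seconds and re-probes after every negative one. The sketch below isolates that positive-only caching so it can be run without a daemon. `HealthCache` and the probe block are hypothetical stand-ins, not part of the gem; the probe replaces the real `GET /api/health` call.

```ruby
# Positive-only health cache, modelled on DaemonClient.available?.
# HealthCache and the probe block are hypothetical; the probe stands in
# for the real HTTP health check.
class HealthCache
  def initialize(ttl:, &probe)
    @ttl = ttl
    @probe = probe
    @healthy = false
    @checked_at = nil
  end

  def available?
    now = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    # Serve from cache only when the last probe succeeded and is still fresh.
    return true if @healthy && @checked_at && (now - @checked_at) < @ttl

    @healthy = @probe.call ? true : false
    @checked_at = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    @healthy
  end
end

calls = 0
# The first probe fails, every later probe succeeds.
cache = HealthCache.new(ttl: 30) do
  calls += 1
  calls > 1
end

first  = cache.available? # probe runs and fails; the failure is not cached
second = cache.available? # probe runs again and succeeds
third  = cache.available? # answered from the cache, no probe
puts [first, second, third, calls].inspect
```

The monotonic clock is used for the same reason as in the diff: wall-clock adjustments cannot shorten or extend the cache window.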
@@ -0,0 +1,133 @@
+ # frozen_string_literal: true
+
+ require 'fileutils'
+ require 'json'
+
+ module Legion
+   module LLM
+     module ResponseCache
+       DEFAULT_TTL = 300
+       SPOOL_THRESHOLD = 8 * 1024 * 1024 # 8 MB
+       SPOOL_DIR = File.expand_path('~/.legionio/data/spool/llm_responses').freeze
+
+       module_function
+
+       # Sets status to :pending for a new request.
+       def init_request(request_id, ttl: DEFAULT_TTL)
+         cache_set(status_key(request_id), 'pending', ttl)
+       end
+
+       # Writes response, meta, and marks status as :done.
+       def complete(request_id, response:, meta:, ttl: DEFAULT_TTL)
+         write_response(request_id, response, ttl)
+         cache_set(meta_key(request_id), ::JSON.dump(meta), ttl)
+         cache_set(status_key(request_id), 'done', ttl)
+       end
+
+       # Writes error details and marks status as :error.
+       def fail_request(request_id, code:, message:, ttl: DEFAULT_TTL)
+         payload = ::JSON.dump({ code: code, message: message })
+         cache_set(error_key(request_id), payload, ttl)
+         cache_set(status_key(request_id), 'error', ttl)
+       end
+
+       # Returns :pending, :done, :error, or nil.
+       def status(request_id)
+         raw = Legion::Cache.get(status_key(request_id))
+         raw&.to_sym
+       end
+
+       # Returns the response string (handles spool overflow transparently).
+       def response(request_id)
+         raw = Legion::Cache.get(response_key(request_id))
+         return nil if raw.nil?
+         return File.read(raw.delete_prefix('spool:')) if raw.start_with?('spool:')
+
+         raw
+       end
+
+       # Returns meta hash with symbolized keys, or nil.
+       def meta(request_id)
+         raw = Legion::Cache.get(meta_key(request_id))
+         return nil if raw.nil?
+
+         ::JSON.parse(raw, symbolize_names: true)
+       end
+
+       # Returns { code:, message: } hash, or nil.
+       def error(request_id)
+         raw = Legion::Cache.get(error_key(request_id))
+         return nil if raw.nil?
+
+         ::JSON.parse(raw, symbolize_names: true)
+       end
+
+       # Blocking poll. Returns { status: :done, response:, meta: },
+       # { status: :error, error: }, or { status: :timeout }.
+       def poll(request_id, timeout: DEFAULT_TTL, interval: 0.1)
+         deadline = ::Process.clock_gettime(::Process::CLOCK_MONOTONIC) + timeout
+
+         loop do
+           current = status(request_id)
+
+           case current
+           when :done
+             return { status: :done, response: response(request_id), meta: meta(request_id) }
+           when :error
+             return { status: :error, error: error(request_id) }
+           end
+
+           return { status: :timeout } if ::Process.clock_gettime(::Process::CLOCK_MONOTONIC) >= deadline
+
+           sleep interval
+         end
+       end
+
+       # Removes all cache keys for a request (and any spool file).
+       def cleanup(request_id)
+         raw = Legion::Cache.get(response_key(request_id))
+         if raw&.start_with?('spool:')
+           path = raw.delete_prefix('spool:')
+           FileUtils.rm_f(path)
+         end
+
+         Legion::Cache.delete(status_key(request_id))
+         Legion::Cache.delete(response_key(request_id))
+         Legion::Cache.delete(meta_key(request_id))
+         Legion::Cache.delete(error_key(request_id))
+       end
+
+       # ── private helpers ────────────────────────────────────────────────
+       private_class_method def self.status_key(request_id)
+         "llm:#{request_id}:status"
+       end
+
+       private_class_method def self.response_key(request_id)
+         "llm:#{request_id}:response"
+       end
+
+       private_class_method def self.meta_key(request_id)
+         "llm:#{request_id}:meta"
+       end
+
+       private_class_method def self.error_key(request_id)
+         "llm:#{request_id}:error"
+       end
+
+       private_class_method def self.cache_set(key, value, ttl)
+         Legion::Cache.set(key, value, ttl)
+       end
+
+       private_class_method def self.write_response(request_id, response_text, ttl)
+         if response_text.bytesize > SPOOL_THRESHOLD
+           FileUtils.mkdir_p(SPOOL_DIR)
+           path = File.join(SPOOL_DIR, "#{request_id}.txt")
+           File.write(path, response_text)
+           cache_set(response_key(request_id), "spool:#{path}", ttl)
+         else
+           cache_set(response_key(request_id), response_text, ttl)
+         end
+       end
+     end
+   end
+ end
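Reviewer note: the `init_request` / `complete` / `poll` handshake in `ResponseCache` can be exercised without memcached. The sketch below is a rough model under stated assumptions: `MemoryStore` is a hypothetical in-memory replacement for `Legion::Cache` that ignores TTLs, and `poll_status` is a simplified copy of the loop in `poll` with the store passed in explicitly. Only the `llm:<request_id>:status` and `llm:<request_id>:response` key names are taken from the diff.

```ruby
# Hypothetical in-memory stand-in for Legion::Cache; TTLs are accepted
# but ignored.
class MemoryStore
  def initialize
    @data = {}
  end

  def get(key)
    @data[key]
  end

  def set(key, value, _ttl = nil)
    @data[key] = value
  end

  def delete(key)
    @data.delete(key)
  end
end

# Same loop shape as ResponseCache.poll, reduced to status and response.
def poll_status(store, request_id, timeout: 1.0, interval: 0.01)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout

  loop do
    case store.get("llm:#{request_id}:status")&.to_sym
    when :done
      return { status: :done, response: store.get("llm:#{request_id}:response") }
    when :error
      return { status: :error }
    end

    return { status: :timeout } if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline

    sleep interval
  end
end

store = MemoryStore.new
store.set('llm:req-1:status', 'pending', 300)

# A writer thread plays the daemon. As in ResponseCache.complete, the
# response is written before the status flips to 'done'.
writer = Thread.new do
  sleep 0.05
  store.set('llm:req-1:response', 'hello', 300)
  store.set('llm:req-1:status', 'done', 300)
end

result = poll_status(store, 'req-1')
writer.join
missing = poll_status(store, 'never-written', timeout: 0.05)
puts result.inspect
puts missing.inspect
```

Writing the status key last matters: a poller that sees `done` can then assume the response key is already populated.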
@@ -13,7 +13,15 @@ module Legion
            providers: providers,
            routing: routing_defaults,
            discovery: discovery_defaults,
-           gateway: gateway_defaults
+           gateway: gateway_defaults,
+           daemon: daemon_defaults
+         }
+       end
+
+       def self.daemon_defaults
+         {
+           url: nil,
+           enabled: false
          }
        end
 
@@ -2,6 +2,6 @@
 
  module Legion
    module LLM
-     VERSION = '0.3.6'
+     VERSION = '0.3.7'
    end
  end
data/lib/legion/llm.rb CHANGED
@@ -8,6 +8,8 @@ require 'legion/llm/router'
  require 'legion/llm/compressor'
  require 'legion/llm/quality_checker'
  require 'legion/llm/escalation_history'
+ require_relative 'llm/response_cache'
+ require_relative 'llm/daemon_client'
 
  begin
    require 'legion/extensions/llm/gateway'
@@ -18,6 +20,8 @@ end
  module Legion
    module LLM
      class EscalationExhausted < StandardError; end
+     class DaemonDeniedError < StandardError; end
+     class DaemonRateLimitedError < StandardError; end
 
      class << self
        include Legion::LLM::Providers
@@ -71,6 +75,19 @@ module Legion
                      quality_check: quality_check, message: message, **)
        end
 
+       # Send a single message — daemon-first, falls through to direct on unavailability.
+       def ask(message:, model: nil, provider: nil, intent: nil, tier: nil,
+               context: {}, identity: nil, &)
+         if DaemonClient.available?
+           result = daemon_ask(message: message, model: model, provider: provider,
+                               context: context, tier: tier, identity: identity)
+           return result if result
+         end
+
+         ask_direct(message: message, model: model, provider: provider,
+                    intent: intent, tier: tier, &)
+       end
+
        # Direct chat bypassing gateway — used by gateway runners to avoid recursion
        def chat_direct(model: nil, provider: nil, intent: nil, tier: nil, escalate: nil,
                        max_escalations: nil, quality_check: nil, message: nil, **)
@@ -135,6 +152,41 @@ module Legion
 
        private
 
+       def daemon_ask(message:, model: nil, provider: nil, context: {}, tier: nil, identity: nil) # rubocop:disable Lint/UnusedMethodArgument
+         result = DaemonClient.chat(
+           message: message, model: model, provider: provider,
+           context: context, tier_preference: tier || :auto
+         )
+
+         case result[:status]
+         when :immediate, :created
+           result[:body]
+         when :accepted
+           ResponseCache.poll(result[:request_id])
+         when :denied
+           raise DaemonDeniedError, result.dig(:error, :message) || 'Access denied'
+         when :rate_limited
+           raise DaemonRateLimitedError, "Rate limited. Retry after #{result[:retry_after]}s"
+         end
+         # Returns nil for :unavailable/:error — caller falls through to direct
+       end
+
+       def ask_direct(message:, model: nil, provider: nil, intent: nil, tier: nil, &block)
+         session = chat_direct(model: model, provider: provider, intent: intent, tier: tier)
+         response = block ? session.ask(message, &block) : session.ask(message)
+
+         {
+           status: :done,
+           response: response.content,
+           meta: {
+             tier: :direct,
+             model: session.model.to_s,
+             tokens_in: response.respond_to?(:input_tokens) ? response.input_tokens : nil,
+             tokens_out: response.respond_to?(:output_tokens) ? response.output_tokens : nil
+           }
+         }
+       end
+
        def gateway_loaded?
          defined?(Legion::Extensions::LLM::Gateway::Runners::Inference)
        end
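Reviewer note: `Legion::LLM.ask` falls back to the direct path when the daemon is unavailable, but it raises `DaemonDeniedError` on 403 and `DaemonRateLimitedError` on 429, so callers presumably need their own handling for those two. The sketch below shows one possible retry wrapper. `ask_with_retry` is a hypothetical helper, and the two error classes are redefined locally only so the example runs without the gem; the block stands in for a real `Legion::LLM.ask(message: ...)` call.

```ruby
# Local stand-ins so the sketch runs without legion-llm installed.
# In real use these classes come from the gem.
module Legion
  module LLM
    class DaemonDeniedError < StandardError; end
    class DaemonRateLimitedError < StandardError; end
  end
end

# Hypothetical helper: retries the block while the daemon reports rate
# limiting. DaemonDeniedError is deliberately not rescued, since a retry
# is unlikely to change an access decision.
def ask_with_retry(max_attempts: 3, wait: 0.01)
  attempts = 0
  begin
    attempts += 1
    yield
  rescue Legion::LLM::DaemonRateLimitedError
    raise if attempts >= max_attempts

    sleep wait
    retry
  end
end

calls = 0
# The block simulates two rate-limited attempts followed by a success in
# the hash shape that ask_direct returns.
result = ask_with_retry do
  calls += 1
  raise Legion::LLM::DaemonRateLimitedError, 'Rate limited. Retry after 1s' if calls < 3

  { status: :done, response: 'ok', meta: { tier: :direct } }
end
puts result.inspect
```

A fixed `wait` keeps the sketch short. A real caller would more likely honour the `retry_after` value, which this release only exposes inside the error message.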
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: legion-llm
  version: !ruby/object:Gem::Version
-   version: 0.3.6
+   version: 0.3.7
  platform: ruby
  authors:
  - Esity
@@ -133,6 +133,7 @@ files:
  - lib/legion/llm/bedrock_bearer_auth.rb
  - lib/legion/llm/claude_config_loader.rb
  - lib/legion/llm/compressor.rb
+ - lib/legion/llm/daemon_client.rb
  - lib/legion/llm/discovery/ollama.rb
  - lib/legion/llm/discovery/system.rb
  - lib/legion/llm/embeddings.rb
@@ -140,6 +141,7 @@ files:
  - lib/legion/llm/helpers/llm.rb
  - lib/legion/llm/providers.rb
  - lib/legion/llm/quality_checker.rb
+ - lib/legion/llm/response_cache.rb
  - lib/legion/llm/router.rb
  - lib/legion/llm/router/escalation_chain.rb
  - lib/legion/llm/router/gateway_interceptor.rb