legion-llm 0.3.5 → 0.3.7
- checksums.yaml +4 -4
- data/CHANGELOG.md +15 -0
- data/CLAUDE.md +30 -10
- data/Gemfile +0 -2
- data/README.md +3 -1
- data/legion-llm.gemspec +3 -0
- data/lib/legion/llm/daemon_client.rb +179 -0
- data/lib/legion/llm/response_cache.rb +133 -0
- data/lib/legion/llm/settings.rb +9 -1
- data/lib/legion/llm/version.rb +1 -1
- data/lib/legion/llm.rb +52 -0
- metadata +45 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: b06b6f10d21c6c4d03c73646fbecc2112e61e47e1dd82059076c61a721efb1aa
+  data.tar.gz: 541a1a1de0a108e95b5e2c204ec579a1b0a5f77e935bad64d7668aef9ae3322d
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 06ab55cec8a23d4be70ea3851fd4a7717686c4e02f7b4ca2f479e2353f79b14cacc770343a81ac54f492a05de3b5228aebb7c2e464203e434eba53e8b4144694
+  data.tar.gz: 93623de5b0baa0bb5390678daac043fc6c08111f965886b03b87f5e769aa6e5f267a713c1c20d93d061341a6ceca8d88fcbd0dfa431d4ce84c2caf5768b19609
data/CHANGELOG.md
CHANGED
@@ -1,5 +1,20 @@
 # Legion LLM Changelog
 
+## [0.3.7] - 2026-03-19
+
+### Added
+- `ResponseCache` module for async response delivery via memcached with spool overflow at 8MB
+- `DaemonClient` module for HTTP routing to LegionIO daemon with health caching (30s TTL)
+- `Legion::LLM.ask` one-shot method: daemon-first routing with direct RubyLLM fallback
+- `DaemonDeniedError` and `DaemonRateLimitedError` error classes
+- Daemon settings: `daemon.url` and `daemon.enabled` in defaults
+- HTTP status code contract: 200 (cached), 201 (sync), 202 (async poll), 403, 429, 503
+
+## [0.3.6] - 2026-03-18
+
+### Added
+- Add `lex-claude`, `lex-gemini`, `lex-openai` as runtime dependencies (AI provider extensions)
+
 ## [0.3.5] - 2026-03-18
 
 ### Added
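The status code contract in the 0.3.7 entry can be read as a lookup table. The sketch below is illustrative only: `STATUS_CONTRACT` and `classify_status` are hypothetical names, and the symbols are borrowed from the ones `DaemonClient.interpret_response` returns later in this diff.

```ruby
# Hypothetical lookup table mirroring the changelog's HTTP status code contract.
STATUS_CONTRACT = {
  200 => :immediate,    # cached response, returned right away
  201 => :created,      # synchronous response
  202 => :accepted,     # async; the caller polls for the result
  403 => :denied,
  429 => :rate_limited,
  503 => :unavailable
}.freeze

# Returns the contract symbol for a status code, or :error for anything else.
# Accepts either an Integer or the String form that Net::HTTPResponse#code uses.
def classify_status(code)
  STATUS_CONTRACT.fetch(code.to_i, :error)
end

puts classify_status('202')
puts classify_status(500)
```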
data/CLAUDE.md
CHANGED
@@ -8,6 +8,7 @@
 Core LegionIO gem providing LLM capabilities to all extensions. Wraps ruby_llm to provide a consistent interface for chat, embeddings, tool use, and agents across multiple providers (Bedrock, Anthropic, OpenAI, Gemini, Ollama). Includes a dynamic weighted routing engine that dispatches requests across local, fleet, and cloud tiers based on caller intent, priority rules, time schedules, cost multipliers, and real-time provider health.
 
 **GitHub**: https://github.com/LegionIO/legion-llm
+**Version**: 0.3.6
 **License**: Apache-2.0
 
 ## Architecture
@@ -61,8 +62,7 @@ Three-tier dispatch model. Local-first avoids unnecessary network hops; fleet of
 │ Zero network overhead, no Transport │
 │ │
 │ Tier 2: FLEET → Ollama on Mac Studios / GPU servers │
-│ Via
-│ serve the model (Phase 2, not yet built) │
+│ Via lex-llm-gateway RPC over AMQP │
 │ │
 │ Tier 3: CLOUD → Bedrock / Anthropic / OpenAI / Gemini │
 │ Existing provider API calls │
@@ -87,6 +87,19 @@ Three-tier dispatch model. Local-first avoids unnecessary network hops; fleet of
 5. Return Resolution for highest-scoring candidate
 ```
 
+### Gateway Integration (lex-llm-gateway)
+
+When `lex-llm-gateway` is installed, `chat`, `embed`, and `structured` automatically delegate to the gateway for metering and fleet dispatch. The gateway is loaded via `begin/rescue LoadError` — optional, not a hard dependency.
+
+```
+Caller → Legion::LLM.chat(message:)
+└─ gateway loaded? → Gateway::Runners::Inference.chat (meters, fleet dispatch)
+└─ Legion::LLM.chat_direct (routing, escalation, RubyLLM)
+└─ no gateway? → Legion::LLM.chat_direct (same path, no metering)
+```
+
+The `_direct` variants (`chat_direct`, `embed_direct`, `structured_direct`) bypass gateway delegation. The gateway's `call_llm` uses these to avoid infinite recursion.
+
 ### Integration with LegionIO
 
 - **Service**: `setup_llm` called between data and supervision in startup sequence
@@ -94,6 +107,7 @@ Three-tier dispatch model. Local-first avoids unnecessary network hops; fleet of
 - **Helpers**: `Legion::Extensions::Helpers::LLM` auto-loaded when gem is present
 - **Readiness**: Registers as `:llm` in `Legion::Readiness`
 - **Shutdown**: `Legion::LLM.shutdown` called during service shutdown
+- **Gateway**: `lex-llm-gateway` auto-loaded if present; provides metering and fleet RPC
 
 ## Dependencies
 
@@ -103,6 +117,7 @@ Three-tier dispatch model. Local-first avoids unnecessary network hops; fleet of
 | `tzinfo` (>= 2.0) | IANA timezone conversion for schedule windows |
 | `legion-logging` | Logging |
 | `legion-settings` | Configuration |
+| `lex-llm-gateway` (optional) | Metering over RMQ, fleet RPC dispatch, disk spool — auto-loaded if present |
 
 ## Key Interfaces
 
@@ -113,11 +128,15 @@ Legion::LLM.shutdown # Cleanup
 Legion::LLM.started? # -> Boolean
 Legion::LLM.settings # -> Hash
 
-# Chat (
-Legion::LLM.chat(model:, provider:)
+# Chat (delegates to gateway when loaded, otherwise direct)
+Legion::LLM.chat(message: 'hello', model:, provider:) # Gateway-metered if available
 Legion::LLM.chat(intent: { privacy: :strict }) # Intent-based routing
 Legion::LLM.chat(tier: :cloud, model: 'claude-sonnet-4-6') # Explicit tier override
-Legion::LLM.
+Legion::LLM.chat_direct(message:, model:, provider:) # Bypass gateway (no metering)
+Legion::LLM.embed(text, model:) # Embeddings (gateway-metered)
+Legion::LLM.embed_direct(text, model:) # Bypass gateway
+Legion::LLM.structured(messages:, schema:) # Structured (gateway-metered)
+Legion::LLM.structured_direct(messages:, schema:) # Bypass gateway
 Legion::LLM.agent(AgentClass) # Agent instance
 
 # Compressor
@@ -284,7 +303,7 @@ In-memory signal consumer with pluggable handlers. Adjusts effective priorities
 | `lib/legion/llm/embeddings.rb` | Embeddings module: generate, generate_batch, default_model |
 | `lib/legion/llm/shadow_eval.rb` | Shadow evaluation: enabled?, should_sample?, evaluate, compare |
 | `lib/legion/llm/structured_output.rb` | JSON schema enforcement with native response_format and prompt fallback |
-| `lib/legion/llm/version.rb` | Version constant (0.3.
+| `lib/legion/llm/version.rb` | Version constant (0.3.6) |
 | `lib/legion/llm/quality_checker.rb` | QualityChecker module with QualityResult struct |
 | `lib/legion/llm/escalation_history.rb` | EscalationHistory mixin: `escalation_history`, `escalated?`, `final_resolution`, `escalation_chain` |
 | `lib/legion/llm/router/escalation_chain.rb` | EscalationChain value object |
@@ -315,6 +334,7 @@ In-memory signal consumer with pluggable handlers. Adjusts effective priorities
 | `spec/legion/llm/embeddings_spec.rb` | Embeddings tests |
 | `spec/legion/llm/shadow_eval_spec.rb` | ShadowEval tests |
 | `spec/legion/llm/structured_output_spec.rb` | StructuredOutput tests |
+| `spec/legion/llm/gateway_integration_spec.rb` | Tests: gateway delegation and _direct bypass |
 | `spec/spec_helper.rb` | Stubbed Legion::Logging and Legion::Settings for testing |
 
 ## Extension Integration
@@ -374,8 +394,8 @@ The legacy `vault_path` per-provider setting was removed in v0.3.1.
 Tests run without the full LegionIO stack. `spec/spec_helper.rb` stubs `Legion::Logging` and `Legion::Settings` with in-memory implementations. Each test resets settings to defaults via `before(:each)`.
 
 ```bash
-bundle exec rspec #
-bundle exec rubocop #
+bundle exec rspec # 304 examples, 0 failures
+bundle exec rubocop # 52 files, 0 offenses
 ```
 
 ## Design Documents
@@ -389,8 +409,8 @@ bundle exec rubocop # 31 files, 0 offenses
 
 ## Future (Not Yet Built)
 
-- **
-- **
+- **Advanced signals**: Budget tracking, GPU utilization monitoring, per-tenant spend limits
+- **Fleet auto-scaling**: Dynamic worker pool sizing based on queue depth and latency
 
 ---
 
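The gateway delegation described in the CLAUDE.md hunk (optional load via `begin/rescue LoadError`, with `_direct` variants that avoid recursion) can be sketched as follows. This is a minimal stand-in, not the gem's code: `DemoLLM`, `DemoGateway`, and the required file name are hypothetical.

```ruby
# Hypothetical stand-in for the gateway. It "meters" the call and then uses
# the *_direct variant, so it never re-enters DemoLLM.chat.
module DemoGateway
  def self.chat(message:)
    DemoLLM.chat_direct(message: message).merge(metered: true)
  end
end

# Hypothetical stand-in for the public entry point.
module DemoLLM
  class << self
    attr_accessor :gateway

    # Delegates to the gateway when one is loaded, otherwise goes direct.
    def chat(message:)
      return gateway.chat(message: message) if gateway

      chat_direct(message: message)
    end

    # The bypass path: no gateway involvement, no metering.
    def chat_direct(message:)
      { reply: "echo: #{message}", metered: false }
    end
  end
end

# Optional dependency: a missing library leaves the gateway unset.
begin
  require 'demo_gateway_library_that_is_not_installed' # hypothetical name
  DemoLLM.gateway = DemoGateway
rescue LoadError
  DemoLLM.gateway = nil
end

p DemoLLM.chat(message: 'hi')
```

If the gateway called `DemoLLM.chat` instead of `chat_direct`, the two methods would call each other without end, which is the recursion the `_direct` variants exist to prevent.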
data/Gemfile
CHANGED
data/README.md
CHANGED
@@ -2,6 +2,8 @@
 
 LLM integration for the [LegionIO](https://github.com/LegionIO/LegionIO) framework. Wraps [ruby_llm](https://github.com/crmne/ruby_llm) to provide chat, embeddings, tool use, and agent capabilities to any Legion extension.
 
+**Version**: 0.3.6
+
 ## Installation
 
 ```ruby
@@ -599,7 +601,7 @@ bundle exec rspec
 Tests use stubbed `Legion::Logging` and `Legion::Settings` modules (no need for the full LegionIO stack):
 
 ```bash
-bundle exec rspec # Run all
+bundle exec rspec # Run all 304 tests
 bundle exec rubocop # Lint (0 offenses)
 bundle exec rspec spec/legion/llm_spec.rb # Run specific test file
 bundle exec rspec spec/legion/llm/router_spec.rb # Router tests only
data/legion-llm.gemspec
CHANGED
@@ -27,6 +27,9 @@ Gem::Specification.new do |spec|
 
   spec.add_dependency 'legion-logging'
   spec.add_dependency 'legion-settings'
+  spec.add_dependency 'lex-claude'
+  spec.add_dependency 'lex-gemini'
+  spec.add_dependency 'lex-openai'
   spec.add_dependency 'ruby_llm', '>= 1.0'
   spec.add_dependency 'tzinfo', '>= 2.0'
 end
data/lib/legion/llm/daemon_client.rb
ADDED
@@ -0,0 +1,179 @@
+# frozen_string_literal: true
+
+require 'net/http'
+require 'uri'
+require 'json'
+require 'securerandom'
+
+module Legion
+  module LLM
+    module DaemonClient
+      HEALTH_CACHE_TTL = 30
+      DEFAULT_TIMEOUT = 60
+
+      module_function
+
+      # Returns true if the daemon is reachable and healthy.
+      # Returns false immediately if daemon_url is nil.
+      # Caches a positive health check for HEALTH_CACHE_TTL seconds.
+      # An unhealthy result is not cached — rechecks on every call.
+      def available?
+        return false if daemon_url.nil?
+
+        now = ::Process.clock_gettime(::Process::CLOCK_MONOTONIC)
+
+        return true if @healthy == true && @health_checked_at && (now - @health_checked_at) < HEALTH_CACHE_TTL
+
+        result = check_health
+        if result
+          @healthy = true
+          @health_checked_at = ::Process.clock_gettime(::Process::CLOCK_MONOTONIC)
+        end
+        result
+      end
+
+      # POSTs a chat request to the daemon REST API.
+      # Returns a status hash based on the HTTP response code.
+      def chat(message:, request_id: nil, context: {}, tier_preference: :auto, model: nil, provider: nil)
+        request_id ||= SecureRandom.uuid
+
+        body = {
+          message: message,
+          request_id: request_id,
+          context: context,
+          tier_preference: tier_preference
+        }
+        body[:model] = model if model
+        body[:provider] = provider if provider
+
+        response = http_post('/api/llm/chat', body)
+        interpret_response(response)
+      rescue StandardError => e
+        mark_unhealthy
+        { status: :unavailable, error: e.message }
+      end
+
+      # Returns the daemon URL from settings, cached after first read.
+      # Returns nil if settings are unavailable or the key is missing.
+      def daemon_url
+        return @daemon_url if defined?(@daemon_url)
+
+        @daemon_url = fetch_daemon_url
+      end
+
+      # Clears all cached state. Returns self for chaining.
+      def reset!
+        remove_instance_variable(:@daemon_url) if defined?(@daemon_url)
+        @healthy = nil
+        @health_checked_at = nil
+        self
+      end
+
+      # GETs /api/health. Returns true on 200, false otherwise.
+      # Updates @healthy and @health_checked_at.
+      def check_health
+        response = http_get('/api/health')
+        healthy = response.code == '200'
+        @healthy = healthy
+        @health_checked_at = ::Process.clock_gettime(::Process::CLOCK_MONOTONIC)
+        healthy
+      rescue StandardError
+        mark_unhealthy
+        false
+      end
+
+      # Marks the daemon as unhealthy and records the timestamp.
+      def mark_unhealthy
+        @healthy = false
+        @health_checked_at = ::Process.clock_gettime(::Process::CLOCK_MONOTONIC)
+      end
+
+      # Builds and sends a GET request. Returns Net::HTTPResponse.
+      def http_get(path)
+        uri = URI.parse("#{daemon_url}#{path}")
+        http = Net::HTTP.new(uri.host, uri.port)
+        http.open_timeout = 2
+        http.read_timeout = 2
+        request = Net::HTTP::Get.new(uri.request_uri)
+        request['Content-Type'] = 'application/json'
+        http.request(request)
+      end
+
+      # Builds and sends a POST request with a JSON body.
+      # Returns Net::HTTPResponse.
+      def http_post(path, body)
+        uri = URI.parse("#{daemon_url}#{path}")
+        http = Net::HTTP.new(uri.host, uri.port)
+        http.open_timeout = 5
+        http.read_timeout = DEFAULT_TIMEOUT
+        request = Net::HTTP::Post.new(uri.request_uri)
+        request['Content-Type'] = 'application/json'
+        request.body = ::JSON.dump(body)
+        http.request(request)
+      end
+
+      # Maps an HTTP response to a status hash.
+      # Follows the Legion API format: { data: {...} } for success,
+      # { error: {...} } for failure.
+      def interpret_response(response)
+        code = response.code.to_i
+        parsed = safe_parse(response.body)
+
+        case code
+        when 200
+          { status: :immediate, body: parsed.fetch(:data, parsed) }
+        when 201
+          { status: :created, body: parsed.fetch(:data, parsed) }
+        when 202
+          data = parsed.fetch(:data, {})
+          { status: :accepted, request_id: data[:request_id], poll_key: data[:poll_key] }
+        when 403
+          { status: :denied, error: parsed.fetch(:error, parsed) }
+        when 429
+          retry_after = extract_retry_after(response, parsed)
+          { status: :rate_limited, retry_after: retry_after }
+        when 503
+          { status: :unavailable }
+        else
+          { status: :error, code: code, body: parsed }
+        end
+      end
+
+      # ── private helpers ────────────────────────────────────────────────
+
+      def fetch_daemon_url
+        return nil unless defined?(Legion::LLM) && Legion::LLM.respond_to?(:settings)
+
+        settings = Legion::LLM.settings
+        return nil unless settings.is_a?(Hash)
+
+        daemon = settings[:daemon]
+        return nil unless daemon.is_a?(Hash)
+
+        daemon[:url]
+      rescue StandardError
+        nil
+      end
+
+      def safe_parse(body)
+        return {} if body.nil? || body.strip.empty?
+
+        ::JSON.parse(body, symbolize_names: true)
+      rescue ::JSON::ParserError
+        {}
+      end
+
+      def extract_retry_after(response, parsed)
+        from_body = parsed.dig(:error, :retry_after) || parsed[:retry_after]
+        return from_body.to_i if from_body
+
+        header = response['Retry-After']
+        return header.to_i if header
+
+        0
+      end
+
+      private_class_method :fetch_daemon_url, :safe_parse, :extract_retry_after
+    end
+  end
+end
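The health caching rule in `DaemonClient.available?` (cache a positive probe for a TTL measured on the monotonic clock, never cache a negative one) can be isolated in a small class. This is a sketch under assumptions: `HealthCache` and its injectable `clock:` are hypothetical, added here so the behaviour can be exercised without a running daemon.

```ruby
# Hypothetical helper isolating the caching rule; the module in the diff
# keeps the equivalent state in module-level instance variables.
class HealthCache
  def initialize(ttl:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }, &probe)
    @ttl = ttl
    @clock = clock       # injectable so tests can control time
    @probe = probe       # block returning true when the service is healthy
    @checked_at = nil    # set only after a successful probe
  end

  # True if a successful probe happened within the TTL; otherwise probes again.
  def available?
    now = @clock.call
    return true if @checked_at && (now - @checked_at) < @ttl

    healthy = @probe.call
    @checked_at = healthy ? now : nil # failures are not cached
    healthy
  end
end

cache = HealthCache.new(ttl: 30) { true }
puts cache.available?
```

The monotonic clock is used rather than `Time.now` so that wall-clock adjustments cannot shorten or extend the cache window.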
data/lib/legion/llm/response_cache.rb
ADDED
@@ -0,0 +1,133 @@
+# frozen_string_literal: true
+
+require 'fileutils'
+require 'json'
+
+module Legion
+  module LLM
+    module ResponseCache
+      DEFAULT_TTL = 300
+      SPOOL_THRESHOLD = 8 * 1024 * 1024 # 8 MB
+      SPOOL_DIR = File.expand_path('~/.legionio/data/spool/llm_responses').freeze
+
+      module_function
+
+      # Sets status to :pending for a new request.
+      def init_request(request_id, ttl: DEFAULT_TTL)
+        cache_set(status_key(request_id), 'pending', ttl)
+      end
+
+      # Writes response, meta, and marks status as :done.
+      def complete(request_id, response:, meta:, ttl: DEFAULT_TTL)
+        write_response(request_id, response, ttl)
+        cache_set(meta_key(request_id), ::JSON.dump(meta), ttl)
+        cache_set(status_key(request_id), 'done', ttl)
+      end
+
+      # Writes error details and marks status as :error.
+      def fail_request(request_id, code:, message:, ttl: DEFAULT_TTL)
+        payload = ::JSON.dump({ code: code, message: message })
+        cache_set(error_key(request_id), payload, ttl)
+        cache_set(status_key(request_id), 'error', ttl)
+      end
+
+      # Returns :pending, :done, :error, or nil.
+      def status(request_id)
+        raw = Legion::Cache.get(status_key(request_id))
+        raw&.to_sym
+      end
+
+      # Returns the response string (handles spool overflow transparently).
+      def response(request_id)
+        raw = Legion::Cache.get(response_key(request_id))
+        return nil if raw.nil?
+        return File.read(raw.delete_prefix('spool:')) if raw.start_with?('spool:')
+
+        raw
+      end
+
+      # Returns meta hash with symbolized keys, or nil.
+      def meta(request_id)
+        raw = Legion::Cache.get(meta_key(request_id))
+        return nil if raw.nil?
+
+        ::JSON.parse(raw, symbolize_names: true)
+      end
+
+      # Returns { code:, message: } hash, or nil.
+      def error(request_id)
+        raw = Legion::Cache.get(error_key(request_id))
+        return nil if raw.nil?
+
+        ::JSON.parse(raw, symbolize_names: true)
+      end
+
+      # Blocking poll. Returns { status: :done, response:, meta: },
+      # { status: :error, error: }, or { status: :timeout }.
+      def poll(request_id, timeout: DEFAULT_TTL, interval: 0.1)
+        deadline = ::Process.clock_gettime(::Process::CLOCK_MONOTONIC) + timeout
+
+        loop do
+          current = status(request_id)
+
+          case current
+          when :done
+            return { status: :done, response: response(request_id), meta: meta(request_id) }
+          when :error
+            return { status: :error, error: error(request_id) }
+          end
+
+          return { status: :timeout } if ::Process.clock_gettime(::Process::CLOCK_MONOTONIC) >= deadline
+
+          sleep interval
+        end
+      end
+
+      # Removes all cache keys for a request (and any spool file).
+      def cleanup(request_id)
+        raw = Legion::Cache.get(response_key(request_id))
+        if raw&.start_with?('spool:')
+          path = raw.delete_prefix('spool:')
+          FileUtils.rm_f(path)
+        end
+
+        Legion::Cache.delete(status_key(request_id))
+        Legion::Cache.delete(response_key(request_id))
+        Legion::Cache.delete(meta_key(request_id))
+        Legion::Cache.delete(error_key(request_id))
+      end
+
+      # ── private helpers ────────────────────────────────────────────────
+      private_class_method def self.status_key(request_id)
+        "llm:#{request_id}:status"
+      end
+
+      private_class_method def self.response_key(request_id)
+        "llm:#{request_id}:response"
+      end
+
+      private_class_method def self.meta_key(request_id)
+        "llm:#{request_id}:meta"
+      end
+
+      private_class_method def self.error_key(request_id)
+        "llm:#{request_id}:error"
+      end
+
+      private_class_method def self.cache_set(key, value, ttl)
+        Legion::Cache.set(key, value, ttl)
+      end
+
+      private_class_method def self.write_response(request_id, response_text, ttl)
+        if response_text.bytesize > SPOOL_THRESHOLD
+          FileUtils.mkdir_p(SPOOL_DIR)
+          path = File.join(SPOOL_DIR, "#{request_id}.txt")
+          File.write(path, response_text)
+          cache_set(response_key(request_id), "spool:#{path}", ttl)
+        else
+          cache_set(response_key(request_id), response_text, ttl)
+        end
+      end
+    end
+  end
+end
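The blocking poll in `ResponseCache.poll` is a deadline loop over a status key. The sketch below reproduces that loop against an in-memory store so it runs without a cache server; `MemoryStore` and `poll_status` are hypothetical names, and `Legion::Cache` itself is not part of this diff, so its real interface is not assumed beyond the `get`/`set` calls shown above.

```ruby
# Hypothetical in-memory stand-in for the get/set calls the module makes.
class MemoryStore
  def initialize
    @data = {}
  end

  def get(key)
    @data[key]
  end

  def set(key, value, _ttl = nil)
    @data[key] = value
  end
end

# Polls the status key until it reads 'done' or 'error', or the deadline passes.
# Returns :done, :error, or :timeout.
def poll_status(store, request_id, timeout: 1.0, interval: 0.01)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout

  loop do
    status = store.get("llm:#{request_id}:status")&.to_sym
    return status if %i[done error].include?(status)
    return :timeout if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline

    sleep interval
  end
end

store = MemoryStore.new
store.set('llm:abc:status', 'pending')
writer = Thread.new do
  sleep 0.05
  store.set('llm:abc:status', 'done') # simulates the worker finishing
end
puts poll_status(store, 'abc')
writer.join
```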
data/lib/legion/llm/settings.rb
CHANGED
@@ -13,7 +13,15 @@ module Legion
           providers: providers,
           routing: routing_defaults,
           discovery: discovery_defaults,
-          gateway: gateway_defaults
+          gateway: gateway_defaults,
+          daemon: daemon_defaults
+        }
+      end
+
+      def self.daemon_defaults
+        {
+          url: nil,
+          enabled: false
         }
       end
 
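A short sketch of how the new `daemon` defaults might be overlaid with user settings. Everything here is an assumption for illustration: `effective_daemon_url` is a hypothetical helper and the URL is a placeholder. Note that `fetch_daemon_url` in this diff reads only `daemon[:url]`; gating on `enabled` is a guess at the intended use of that flag, not behaviour shown in the release.

```ruby
# Defaults as introduced by daemon_defaults in this diff.
DAEMON_DEFAULTS = { url: nil, enabled: false }.freeze

# Hypothetical helper: overlays user values on the defaults and returns the
# URL only when the daemon is enabled (assumed semantics, see note above).
def effective_daemon_url(settings)
  daemon = settings.is_a?(Hash) ? settings[:daemon] : nil
  merged = DAEMON_DEFAULTS.merge(daemon.is_a?(Hash) ? daemon : {})
  merged[:enabled] ? merged[:url] : nil
end

# Placeholder URL, not taken from the package.
p effective_daemon_url({ daemon: { url: 'http://localhost:4567', enabled: true } })
p effective_daemon_url({})
```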
data/lib/legion/llm/version.rb
CHANGED
data/lib/legion/llm.rb
CHANGED
@@ -8,6 +8,8 @@ require 'legion/llm/router'
 require 'legion/llm/compressor'
 require 'legion/llm/quality_checker'
 require 'legion/llm/escalation_history'
+require_relative 'llm/response_cache'
+require_relative 'llm/daemon_client'
 
 begin
   require 'legion/extensions/llm/gateway'
@@ -18,6 +20,8 @@ end
 module Legion
   module LLM
     class EscalationExhausted < StandardError; end
+    class DaemonDeniedError < StandardError; end
+    class DaemonRateLimitedError < StandardError; end
 
     class << self
       include Legion::LLM::Providers
@@ -71,6 +75,19 @@ module Legion
           quality_check: quality_check, message: message, **)
       end
 
+      # Send a single message — daemon-first, falls through to direct on unavailability.
+      def ask(message:, model: nil, provider: nil, intent: nil, tier: nil,
+              context: {}, identity: nil, &)
+        if DaemonClient.available?
+          result = daemon_ask(message: message, model: model, provider: provider,
+                              context: context, tier: tier, identity: identity)
+          return result if result
+        end
+
+        ask_direct(message: message, model: model, provider: provider,
+                   intent: intent, tier: tier, &)
+      end
+
       # Direct chat bypassing gateway — used by gateway runners to avoid recursion
       def chat_direct(model: nil, provider: nil, intent: nil, tier: nil, escalate: nil,
                       max_escalations: nil, quality_check: nil, message: nil, **)
@@ -135,6 +152,41 @@ module Legion
 
       private
 
+      def daemon_ask(message:, model: nil, provider: nil, context: {}, tier: nil, identity: nil) # rubocop:disable Lint/UnusedMethodArgument
+        result = DaemonClient.chat(
+          message: message, model: model, provider: provider,
+          context: context, tier_preference: tier || :auto
+        )
+
+        case result[:status]
+        when :immediate, :created
+          result[:body]
+        when :accepted
+          ResponseCache.poll(result[:request_id])
+        when :denied
+          raise DaemonDeniedError, result.dig(:error, :message) || 'Access denied'
+        when :rate_limited
+          raise DaemonRateLimitedError, "Rate limited. Retry after #{result[:retry_after]}s"
+        end
+        # Returns nil for :unavailable/:error — caller falls through to direct
+      end
+
+      def ask_direct(message:, model: nil, provider: nil, intent: nil, tier: nil, &block)
+        session = chat_direct(model: model, provider: provider, intent: intent, tier: tier)
+        response = block ? session.ask(message, &block) : session.ask(message)
+
+        {
+          status: :done,
+          response: response.content,
+          meta: {
+            tier: :direct,
+            model: session.model.to_s,
+            tokens_in: response.respond_to?(:input_tokens) ? response.input_tokens : nil,
+            tokens_out: response.respond_to?(:output_tokens) ? response.output_tokens : nil
+          }
+        }
+      end
+
       def gateway_loaded?
         defined?(Legion::Extensions::LLM::Gateway::Runners::Inference)
       end
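The daemon-first flow in `ask` hinges on one convention: `daemon_ask` returns a value on success, raises on denial or rate limiting, and returns `nil` to signal a fall-through to the direct path. A reduced sketch of that convention, with hypothetical names (`resolve_daemon_result`, `ask_with_fallback`, and the `Demo*Error` classes) and with the `:accepted` polling branch left out for brevity:

```ruby
# Hypothetical stand-ins for DaemonDeniedError and DaemonRateLimitedError.
class DemoDeniedError < StandardError; end
class DemoRateLimitedError < StandardError; end

# Returns the body on success, raises on 403/429-style results, and returns
# nil for anything else so the caller can fall through to the direct path.
def resolve_daemon_result(result)
  case result[:status]
  when :immediate, :created
    result[:body]
  when :denied
    raise DemoDeniedError, result.dig(:error, :message) || 'Access denied'
  when :rate_limited
    raise DemoRateLimitedError, "Rate limited. Retry after #{result[:retry_after]}s"
  end
end

# The block plays the role of the direct (non-daemon) path.
def ask_with_fallback(daemon_result, &direct)
  resolve_daemon_result(daemon_result) || direct.call
end

puts ask_with_fallback({ status: :immediate, body: 'from daemon' }) { 'from direct path' }
puts ask_with_fallback({ status: :unavailable }) { 'from direct path' }
```

One consequence of this convention, in the sketch as in the diff: denial and rate limiting are surfaced to the caller rather than silently retried on the direct path.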
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: legion-llm
 version: !ruby/object:Gem::Version
-  version: 0.3.
+  version: 0.3.7
 platform: ruby
 authors:
 - Esity
@@ -37,6 +37,48 @@ dependencies:
     - - ">="
       - !ruby/object:Gem::Version
         version: '0'
+- !ruby/object:Gem::Dependency
+  name: lex-claude
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: lex-gemini
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: lex-openai
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
 - !ruby/object:Gem::Dependency
   name: ruby_llm
   requirement: !ruby/object:Gem::Requirement
@@ -91,6 +133,7 @@ files:
 - lib/legion/llm/bedrock_bearer_auth.rb
 - lib/legion/llm/claude_config_loader.rb
 - lib/legion/llm/compressor.rb
+- lib/legion/llm/daemon_client.rb
 - lib/legion/llm/discovery/ollama.rb
 - lib/legion/llm/discovery/system.rb
 - lib/legion/llm/embeddings.rb
@@ -98,6 +141,7 @@ files:
 - lib/legion/llm/helpers/llm.rb
 - lib/legion/llm/providers.rb
 - lib/legion/llm/quality_checker.rb
+- lib/legion/llm/response_cache.rb
 - lib/legion/llm/router.rb
 - lib/legion/llm/router/escalation_chain.rb
 - lib/legion/llm/router/gateway_interceptor.rb