legion-llm 0.3.15 → 0.3.18
- checksums.yaml +4 -4
- data/CHANGELOG.md +36 -0
- data/CLAUDE.md +2 -2
- data/CODEOWNERS +39 -0
- data/README.md +1 -1
- data/lib/legion/llm/arbitrage.rb +3 -1
- data/lib/legion/llm/cache.rb +9 -3
- data/lib/legion/llm/compressor.rb +2 -0
- data/lib/legion/llm/cost_tracker.rb +95 -0
- data/lib/legion/llm/daemon_client.rb +4 -0
- data/lib/legion/llm/discovery/ollama.rb +4 -1
- data/lib/legion/llm/discovery/system.rb +8 -4
- data/lib/legion/llm/off_peak.rb +46 -0
- data/lib/legion/llm/response_cache.rb +3 -0
- data/lib/legion/llm/router/health_tracker.rb +13 -2
- data/lib/legion/llm/router/rule.rb +32 -9
- data/lib/legion/llm/router.rb +18 -2
- data/lib/legion/llm/scheduling.rb +3 -1
- data/lib/legion/llm/shadow_eval.rb +2 -0
- data/lib/legion/llm/structured_output.rb +5 -2
- data/lib/legion/llm/version.rb +1 -1
- data/lib/legion/llm.rb +13 -4
- metadata +3 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 0a19ae18f6bbb96680e6bfbcad1e81bd3e67c6b1be72e0615cf24852a69b2e59
+  data.tar.gz: f293c1bc52cb97652e545efb4877b32592d8aa0d96d190bef4f2624d6b277a5a
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 488d6fec5178b75b4f48b9031cd3e172b9637d1044ca210912ff1bc0a5b91818d2fd379725426e8be58bfc30638f0cd6ef83a2f550e94e728f40b74b1a39a5ad
+  data.tar.gz: b738f793fc22c4dbf3400da0fb9c13d2bb427321fa1f1d8a594482a35239644d2e4d0fc96e9e08c295c95a2e3ff388e9f45b1cddf3e7db0f1bd00a4139c257ed
data/CHANGELOG.md
CHANGED
@@ -1,5 +1,41 @@
 # Legion LLM Changelog
 
+## [0.3.18] - 2026-03-22
+
+### Added
+- Logging across routing, health tracking, caching, and discovery subsystems
+- `Router.resolve`: `.info` on route decision (tier/provider/model/rule), `.debug` on candidate filtering counts, `.debug` when no rules match
+- `Router::HealthTracker`: `.warn` on circuit state transitions (closed->open, half_open->open, open->half_open, any->closed), `.debug` on latency penalty applied
+- `Router::Rule`: `.debug` on intent mismatch, schedule constraint rejections (valid_from, valid_until, hours, days)
+- `Cache`: `.debug` on cache miss and cache write, `.warn` on swallowed get/set errors
+- `ResponseCache`: `.warn` on spool overflow to disk, `.debug` on async poll status, `.warn` on fail_request
+- `DaemonClient`: `.warn` on mark_unhealthy, `.warn` on 403/429 responses, `.info` on health check result
+- `StructuredOutput`: `.warn` on JSON parse failure with attempt count, `.debug` when using prompt-based fallback
+- `Compressor`: `.debug` on compression applied (level, original length, compressed length)
+- `Discovery::Ollama`: `.warn` on HTTP failure, `.debug` on model list refresh with count
+- `Discovery::System`: `.warn` on system command failures (sysctl, vm_stat, /proc/meminfo)
+- `ShadowEval`: `.debug` on evaluation triggered, `.warn` on failure
+- `Scheduling`: `.debug` on defer decision
+- `OffPeak`: `.debug` on peak hour check result
+- `Arbitrage`: `.debug` on model selection result
+
+### Changed
+- `Router::Rule#within_schedule?` refactored to extract `schedule_rejection` helper (reduces cyclomatic complexity)
+
+## [0.3.17] - 2026-03-22
+
+### Added
+- `Legion::LLM::OffPeak` module for off-peak scheduling: `peak_hour?`, `should_defer?(priority:)`, `next_off_peak` — defers non-urgent LLM requests during configurable peak hours (default 14:00-22:00 UTC)
+- `Legion::LLM::CostTracker` module for per-request cost tracking: `record(model:, input_tokens:, output_tokens:)`, `summary(since:)` with by-model breakdown, configurable pricing table via settings, thread-safe accumulator
+
+## [0.3.16] - 2026-03-22
+
+### Fixed
+- `chat_single` now accepts and forwards `message:` kwarg, calling `session.ask(message)` when present instead of returning a bare session object
+- `chat_direct` passes `message:` through to `chat_single` in the non-escalation branch
+- Add `FRAMEWORK_KEYS` constant to strip Runner.run metadata kwargs (`task_id`, `source`, `timestamp`, etc.) before passing to RubyLLM
+- Move `FRAMEWORK_KEYS` out of `private` scope (constants are not affected by `private` in Ruby)
+
 ## [0.3.15] - 2026-03-21
 
 ### Changed
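The 0.3.18 entries above all add log calls guarded by `if defined?(Legion::Logging)`. A minimal sketch of that guard pattern follows, assuming nothing about the real logger: `GuardedLogDemo` and `DemoLogging` are hypothetical stand-ins, not part of legion-llm.

```ruby
# Hypothetical names throughout: GuardedLogDemo and DemoLogging are not part of legion-llm.
module GuardedLogDemo
  # Forwards the message only if the optional DemoLogging constant is loaded.
  def self.log_debug(message)
    return :skipped unless defined?(DemoLogging)

    DemoLogging.debug(message)
    :logged
  end
end

# Logger not loaded yet: the call is a silent no-op.
before = GuardedLogDemo.log_debug('router: no rules matched')

# Stand-in logger that collects messages in memory.
module DemoLogging
  def self.messages
    @messages ||= []
  end

  def self.debug(message)
    messages << message
  end
end

# Logger now defined: the same call is forwarded.
after = GuardedLogDemo.log_debug('router: 3 candidates after filtering')
puts "before=#{before} after=#{after} messages=#{DemoLogging.messages.size}"
```

Because `defined?` is evaluated on every call, a library written this way can be loaded before or without its logging dependency.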
data/CLAUDE.md
CHANGED
@@ -8,7 +8,7 @@
 Core LegionIO gem providing LLM capabilities to all extensions. Wraps ruby_llm to provide a consistent interface for chat, embeddings, tool use, and agents across multiple providers (Bedrock, Anthropic, OpenAI, Gemini, Ollama). Includes a dynamic weighted routing engine that dispatches requests across local, fleet, and cloud tiers based on caller intent, priority rules, time schedules, cost multipliers, and real-time provider health.
 
 **GitHub**: https://github.com/LegionIO/legion-llm
-**Version**: 0.3.
+**Version**: 0.3.15
 **License**: Apache-2.0
 
 ## Architecture
@@ -314,7 +314,7 @@ In-memory signal consumer with pluggable handlers. Adjusts effective priorities
 | `lib/legion/llm/embeddings.rb` | Embeddings module: generate, generate_batch, default_model |
 | `lib/legion/llm/shadow_eval.rb` | Shadow evaluation: enabled?, should_sample?, evaluate, compare |
 | `lib/legion/llm/structured_output.rb` | JSON schema enforcement with native response_format and prompt fallback |
-| `lib/legion/llm/version.rb` | Version constant (0.3.
+| `lib/legion/llm/version.rb` | Version constant (0.3.15) |
 | `lib/legion/llm/quality_checker.rb` | QualityChecker module with QualityResult struct |
 | `lib/legion/llm/escalation_history.rb` | EscalationHistory mixin: `escalation_history`, `escalated?`, `final_resolution`, `escalation_chain` |
 | `lib/legion/llm/router/escalation_chain.rb` | EscalationChain value object |
data/CODEOWNERS
CHANGED
@@ -1 +1,40 @@
+# Default owner — all files
 * @Esity
+
+# Core library code
+# lib/ @Esity @future-ai-team
+
+# Router (dynamic weighted routing, intent, escalation)
+# lib/legion/llm/router/ @Esity @future-ai-team
+# lib/legion/llm/router.rb @Esity @future-ai-team
+
+# Provider configuration
+# lib/legion/llm/providers.rb @Esity @future-ai-team
+
+# Discovery (Ollama, system memory)
+# lib/legion/llm/discovery/ @Esity @future-ai-team
+
+# Embeddings
+# lib/legion/llm/embeddings.rb @Esity @future-ai-team
+
+# Structured output and quality checking
+# lib/legion/llm/structured_output.rb @Esity @future-ai-team
+# lib/legion/llm/quality_checker.rb @Esity @future-ai-team
+
+# Compressor
+# lib/legion/llm/compressor.rb @Esity @future-ai-team
+
+# Transport (escalation events)
+# lib/legion/llm/transport/ @Esity @future-infra-team
+
+# Extension helper mixin
+# lib/legion/llm/helpers/ @Esity @future-core-team
+
+# Specs
+# spec/ @Esity @future-contributors
+
+# Documentation
+# *.md @Esity @future-docs-team
+
+# CI/CD
+# .github/ @Esity
data/README.md
CHANGED
@@ -2,7 +2,7 @@
 
 LLM integration for the [LegionIO](https://github.com/LegionIO/LegionIO) framework. Wraps [ruby_llm](https://github.com/crmne/ruby_llm) to provide chat, embeddings, tool use, and agent capabilities to any Legion extension.
 
-**Version**: 0.3.
+**Version**: 0.3.15
 
 ## Installation
 
data/lib/legion/llm/arbitrage.rb
CHANGED
@@ -56,7 +56,9 @@ module Legion
 
   return nil if scored.empty?
 
-  scored.min_by { |_model, cost| cost }&.first
+  selected = scored.min_by { |_model, cost| cost }&.first
+  Legion::Logging.debug("Arbitrage selected model=#{selected} capability=#{capability}") if defined?(Legion::Logging)
+  selected
 end
 
 # Returns the merged cost table: defaults overridden by any settings-defined entries.
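The selection line in this hunk relies on `min_by` over model/cost pairs plus a safe-navigation `&.first`. A small sketch of that idiom, assuming a made-up cost table (the model names and prices below are hypothetical, not the gem's):

```ruby
# Hypothetical cost table: model name => blended cost per 1M tokens (USD).
scored = {
  'model-large' => 9.0,
  'model-small' => 0.4,
  'model-medium' => 3.0
}

# Hash#min_by yields [key, value] pairs and returns the cheapest pair;
# `.first` then extracts the model name.
cheapest = scored.min_by { |_model, cost| cost }&.first

# On an empty table min_by returns nil, so `&.first` keeps the result nil
# instead of raising NoMethodError.
none = {}.min_by { |_model, cost| cost }&.first

puts cheapest.inspect # => "model-small"
puts none.inspect     # => nil
```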
data/lib/legion/llm/cache.rb
CHANGED
@@ -27,10 +27,14 @@ module Legion
   return nil unless available?
 
   raw = Legion::Cache.get(cache_key)
-
+  if raw.nil?
+    Legion::Logging.debug("LLM cache miss key=#{cache_key}") if defined?(Legion::Logging)
+    return nil
+  end
 
   ::JSON.parse(raw, symbolize_names: true)
-rescue StandardError
+rescue StandardError => e
+  Legion::Logging.warn("LLM cache get error key=#{cache_key}: #{e.message}") if defined?(Legion::Logging)
   nil
 end
 
@@ -39,8 +43,10 @@ module Legion
   return false unless available?
 
   Legion::Cache.set(cache_key, ::JSON.dump(response), ttl)
+  Legion::Logging.debug("LLM cache write key=#{cache_key} ttl=#{ttl}") if defined?(Legion::Logging)
   true
-rescue StandardError
+rescue StandardError => e
+  Legion::Logging.warn("LLM cache set error key=#{cache_key}: #{e.message}") if defined?(Legion::Logging)
   false
 end
 
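The cache hunks above keep the "swallow the error, return nil/false" contract but now record why. A self-contained sketch of that shape, assuming an in-memory store and a plain array as the log sink (`DemoStore` and `DemoCache` are hypothetical names, not the gem's API):

```ruby
require 'json'

# Hypothetical in-memory stand-in for the backing cache.
class DemoStore
  def initialize
    @data = {}
  end

  def get(key)
    @data[key]
  end

  def set(key, value, _ttl)
    @data[key] = value
  end
end

# Hypothetical wrapper: errors are logged and converted to nil/false.
module DemoCache
  module_function

  def get(store, key, log)
    raw = store.get(key)
    if raw.nil?
      log << "debug: miss key=#{key}"
      return nil
    end

    JSON.parse(raw, symbolize_names: true)
  rescue StandardError => e
    log << "warn: get error key=#{key}: #{e.class}"
    nil
  end

  def set(store, key, response, ttl, log)
    store.set(key, JSON.dump(response), ttl)
    log << "debug: write key=#{key} ttl=#{ttl}"
    true
  rescue StandardError => e
    log << "warn: set error key=#{key}: #{e.class}"
    false
  end
end

log = []
store = DemoStore.new
DemoCache.get(store, 'greeting', log)                      # miss
DemoCache.set(store, 'greeting', { text: 'hi' }, 60, log)  # write
puts DemoCache.get(store, 'greeting', log).inspect
puts log
```

Callers still see only `nil` or `false` on failure; the difference is that a corrupted entry or backend outage now leaves a trace.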
data/lib/legion/llm/compressor.rb
CHANGED
@@ -19,10 +19,12 @@ module Legion
 def compress(text, level: LIGHT)
   return text if text.nil? || text.empty? || level <= NONE
 
+  original_length = text.length
   segments = split_segments(text)
   result = segments.map { |seg| seg[:protected] ? seg[:text] : compress_prose(seg[:text], level) }.join
 
   result = collapse_whitespace(result) if level >= AGGRESSIVE
+  Legion::Logging.debug("Compressor applied level=#{level} original=#{original_length} compressed=#{result.length}") if defined?(Legion::Logging)
   result
 end
 
data/lib/legion/llm/cost_tracker.rb
ADDED
@@ -0,0 +1,95 @@
+# frozen_string_literal: true
+
+module Legion
+  module LLM
+    module CostTracker
+      # Default per-1M-token pricing in USD (input / output).
+      # Overridable via Legion::Settings[:llm][:pricing].
+      DEFAULT_PRICING = {
+        'claude-sonnet-4-6' => { input: 3.0, output: 15.0 },
+        'claude-haiku-4-5' => { input: 0.80, output: 4.0 },
+        'claude-opus-4-6' => { input: 15.0, output: 75.0 },
+        'gpt-4o' => { input: 2.50, output: 10.0 },
+        'gpt-4o-mini' => { input: 0.15, output: 0.60 }
+      }.freeze
+
+      class << self
+        # Records a completed LLM request and calculates its cost.
+        #
+        # @param model [String] model identifier
+        # @param input_tokens [Integer] number of input tokens consumed
+        # @param output_tokens [Integer] number of output tokens produced
+        # @param provider [Symbol, nil] provider (informational)
+        # @return [Hash] the recorded entry
+        def record(model:, input_tokens:, output_tokens:, provider: nil)
+          pricing = pricing_for(model)
+          cost = (input_tokens * pricing[:input] / 1_000_000.0) +
+                 (output_tokens * pricing[:output] / 1_000_000.0)
+
+          entry = {
+            model: model,
+            provider: provider,
+            input_tokens: input_tokens,
+            output_tokens: output_tokens,
+            cost_usd: cost.round(6),
+            recorded_at: Time.now
+          }
+
+          records << entry
+          Legion::Logging.debug "[LLM::CostTracker] #{model}: #{input_tokens}+#{output_tokens} tokens = $#{cost.round(6)}"
+          entry
+        end
+
+        # Returns a cost summary, optionally filtered by a start time.
+        #
+        # @param since [Time, nil] include only records on or after this time
+        # @return [Hash] with :total_cost_usd, :total_requests, token totals, and :by_model breakdown
+        def summary(since: nil)
+          subset = since ? records.select { |r| r[:recorded_at] >= since } : records.dup
+
+          {
+            total_cost_usd: subset.sum { |r| r[:cost_usd] }.round(6),
+            total_requests: subset.size,
+            total_input_tokens: subset.sum { |r| r[:input_tokens] },
+            total_output_tokens: subset.sum { |r| r[:output_tokens] },
+            by_model: subset.group_by { |r| r[:model] }.transform_values do |rs|
+              {
+                cost_usd: rs.sum { |r| r[:cost_usd] }.round(6),
+                requests: rs.size
+              }
+            end
+          }
+        end
+
+        # Clears all recorded entries.
+        def clear
+          @records = []
+        end
+
+        # Returns pricing for a model, preferring settings-defined overrides.
+        #
+        # @param model [String] model identifier
+        # @return [Hash] with :input and :output keys (per-1M-token USD)
+        def pricing_for(model)
+          custom = settings_pricing
+          custom[model.to_s] || DEFAULT_PRICING[model.to_s] || { input: 5.0, output: 15.0 }
+        end
+
+        private
+
+        def records
+          @records ||= []
+        end
+
+        def settings_pricing
+          return {} unless defined?(Legion::Settings)
+
+          pricing = Legion::Settings.dig(:'legion-llm', :pricing)
+          pricing.is_a?(Hash) ? pricing : {}
+        rescue StandardError
+          {}
+        end
+      end
+    end
+  end
+end
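As a worked example of the per-1M-token arithmetic in `record`, the sketch below prices 100 identical requests recorded from four threads. It is not the gem's implementation: `DemoCostLedger` is a hypothetical class, and the `Mutex` is one possible way to guard the accumulator under concurrent writers, shown here as an assumption rather than something the file above does.

```ruby
# Hypothetical ledger illustrating the cost formula; not part of legion-llm.
class DemoCostLedger
  # Per-1M-token USD prices; the gpt-4o-mini row matches DEFAULT_PRICING above.
  PRICING = { 'gpt-4o-mini' => { input: 0.15, output: 0.60 } }.freeze
  FALLBACK = { input: 5.0, output: 15.0 }.freeze

  def initialize
    @records = []
    @lock = Mutex.new # assumption: explicit locking around the shared array
  end

  def record(model:, input_tokens:, output_tokens:)
    pricing = PRICING.fetch(model, FALLBACK)
    cost = (input_tokens * pricing[:input] / 1_000_000.0) +
           (output_tokens * pricing[:output] / 1_000_000.0)
    entry = { model: model, cost_usd: cost.round(6) }
    @lock.synchronize { @records << entry }
    entry
  end

  def size
    @lock.synchronize { @records.size }
  end

  def total
    @lock.synchronize { @records.sum { |r| r[:cost_usd] } }.round(6)
  end
end

ledger = DemoCostLedger.new
threads = Array.new(4) do
  Thread.new do
    25.times { ledger.record(model: 'gpt-4o-mini', input_tokens: 1_000, output_tokens: 500) }
  end
end
threads.each(&:join)

# One request: 1_000 * 0.15 / 1e6 + 500 * 0.60 / 1e6 = 0.00015 + 0.0003 = 0.00045 USD.
puts "requests=#{ledger.size} total_usd=#{ledger.total}"
```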
data/lib/legion/llm/daemon_client.rb
CHANGED
@@ -76,6 +76,7 @@ module Legion
   healthy = response.code == '200'
   @healthy = healthy
   @health_checked_at = ::Process.clock_gettime(::Process::CLOCK_MONOTONIC)
+  Legion::Logging.info("Daemon health check result=#{healthy ? 'healthy' : 'unhealthy'} url=#{daemon_url}") if defined?(Legion::Logging)
   healthy
 rescue StandardError
   mark_unhealthy
@@ -84,6 +85,7 @@ module Legion
 
 # Marks the daemon as unhealthy and records the timestamp.
 def mark_unhealthy
+  Legion::Logging.warn("Daemon marked unhealthy url=#{daemon_url}") if defined?(Legion::Logging)
   @healthy = false
   @health_checked_at = ::Process.clock_gettime(::Process::CLOCK_MONOTONIC)
 end
@@ -128,9 +130,11 @@ module Legion
   data = parsed.fetch(:data, {})
   { status: :accepted, request_id: data[:request_id], poll_key: data[:poll_key] }
 when 403
+  Legion::Logging.warn("Daemon returned 403 Denied url=#{daemon_url}") if defined?(Legion::Logging)
   { status: :denied, error: parsed.fetch(:error, parsed) }
 when 429
   retry_after = extract_retry_after(response, parsed)
+  Legion::Logging.warn("Daemon returned 429 RateLimited url=#{daemon_url} retry_after=#{retry_after}") if defined?(Legion::Logging)
   { status: :rate_limited, retry_after: retry_after }
 when 503
   { status: :unavailable }
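The last hunk maps HTTP status codes to result hashes and now warns on 403 and 429. A reduced sketch of that mapping is shown below; only the three branches visible in the hunk are mirrored, and `interpret_daemon_status`, its parameters, and the `:unhandled` fallback are hypothetical.

```ruby
# Hypothetical helper; only the 403/429/503 branches mirror the hunk above.
def interpret_daemon_status(code, parsed: {}, retry_after: nil, log: [])
  case code
  when 403
    log << 'warn: daemon returned 403'
    { status: :denied, error: parsed.fetch(:error, parsed) }
  when 429
    log << "warn: daemon returned 429 retry_after=#{retry_after}"
    { status: :rate_limited, retry_after: retry_after }
  when 503
    { status: :unavailable }
  else
    # Assumption: every other status code is out of scope for this sketch.
    { status: :unhandled, code: code }
  end
end

log = []
puts interpret_daemon_status(403, parsed: { error: 'forbidden' }, log: log).inspect
puts interpret_daemon_status(429, retry_after: 30, log: log).inspect
puts interpret_daemon_status(503, log: log).inspect
puts log
```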
data/lib/legion/llm/discovery/ollama.rb
CHANGED
@@ -30,10 +30,13 @@ module Legion
   if response.success?
     parsed = ::JSON.parse(response.body)
     @models = parsed['models'] || []
+    Legion::Logging.debug("Discovery::Ollama model list refreshed count=#{@models.size}") if defined?(Legion::Logging)
   else
+    Legion::Logging.warn("Discovery::Ollama HTTP failure status=#{response.status}") if defined?(Legion::Logging)
     @models ||= []
   end
-rescue StandardError
+rescue StandardError => e
+  Legion::Logging.warn("Discovery::Ollama HTTP failure: #{e.message}") if defined?(Legion::Logging)
   @models ||= []
 ensure
   @last_refreshed_at = Time.now
data/lib/legion/llm/discovery/system.rb
CHANGED
@@ -94,7 +94,8 @@ module Legion
 def fetch_macos_total
   raw = `sysctl -n hw.memsize`.strip.to_i
   @total_memory_mb = raw / 1024 / 1024
-rescue StandardError
+rescue StandardError => e
+  Legion::Logging.warn("Discovery::System sysctl command failed: #{e.message}") if defined?(Legion::Logging)
   @total_memory_mb = nil
 end
 
@@ -104,7 +105,8 @@ module Legion
   free = vm_output[/Pages free:\s+(\d+)/, 1].to_i
   inactive = vm_output[/Pages inactive:\s+(\d+)/, 1].to_i
   @available_memory_mb = (free + inactive) * page_size / 1024 / 1024
-rescue StandardError
+rescue StandardError => e
+  Legion::Logging.warn("Discovery::System vm_stat command failed: #{e.message}") if defined?(Legion::Logging)
   @available_memory_mb = nil
 end
 
@@ -112,7 +114,8 @@ module Legion
   meminfo = File.read('/proc/meminfo')
   total_kb = meminfo[/MemTotal:\s+(\d+)/, 1].to_i
   @total_memory_mb = total_kb / 1024
-rescue StandardError
+rescue StandardError => e
+  Legion::Logging.warn("Discovery::System /proc/meminfo read failed: #{e.message}") if defined?(Legion::Logging)
   @total_memory_mb = nil
 end
 
@@ -121,7 +124,8 @@ module Legion
   free_kb = meminfo[/MemFree:\s+(\d+)/, 1].to_i
   inactive_kb = meminfo[/Inactive:\s+(\d+)/, 1].to_i
   @available_memory_mb = (free_kb + inactive_kb) / 1024
-rescue StandardError
+rescue StandardError => e
+  Legion::Logging.warn("Discovery::System /proc/meminfo available read failed: #{e.message}") if defined?(Legion::Logging)
   @available_memory_mb = nil
 end
 
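The Linux branches above extract kilobyte counts from `/proc/meminfo` with `String#[]` and a capture group. The sketch below runs the same regular expressions against a fixed sample string so it works on any platform; `SAMPLE_MEMINFO` and `memory_mb` are hypothetical, and the numbers are invented.

```ruby
# Invented sample in /proc/meminfo layout; real files contain many more rows.
SAMPLE_MEMINFO = <<~TEXT
  MemTotal:       16384000 kB
  MemFree:         2048000 kB
  Inactive:        1024000 kB
  Inactive(anon):   512000 kB
TEXT

# Hypothetical helper using the same patterns as the hunks above.
def memory_mb(meminfo)
  total_kb = meminfo[/MemTotal:\s+(\d+)/, 1].to_i
  free_kb = meminfo[/MemFree:\s+(\d+)/, 1].to_i
  # `Inactive:` does not match the `Inactive(anon):` row because of the parenthesis.
  inactive_kb = meminfo[/Inactive:\s+(\d+)/, 1].to_i
  { total_mb: total_kb / 1024, available_mb: (free_kb + inactive_kb) / 1024 }
end

puts memory_mb(SAMPLE_MEMINFO).inspect # => {total_mb: 16000, available_mb: 3000} (Ruby 3.4 inspect format)
puts memory_mb('').inspect             # missing rows yield nil.to_i == 0
```

Note that a missing row silently becomes 0 rather than raising, so the new `rescue` warnings fire only when the read itself fails.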
data/lib/legion/llm/off_peak.rb
ADDED
@@ -0,0 +1,46 @@
+# frozen_string_literal: true
+
+module Legion
+  module LLM
+    module OffPeak
+      # Peak hours in UTC: 14:00-22:00 (9 AM - 5 PM CT)
+      PEAK_HOURS = (14..22)
+
+      class << self
+        # Returns true if the given time falls within peak hours.
+        #
+        # @param time [Time] time to check (defaults to now)
+        # @return [Boolean]
+        def peak_hour?(time = Time.now.utc)
+          result = PEAK_HOURS.cover?(time.hour)
+          Legion::Logging.debug("OffPeak peak_hour check hour=#{time.hour} peak=#{result}") if defined?(Legion::Logging)
+          result
+        end
+
+        # Returns true when a non-urgent request should be deferred to off-peak.
+        #
+        # @param priority [Symbol] :urgent bypasses deferral; :normal and :low defer during peak
+        # @return [Boolean]
+        def should_defer?(priority: :normal)
+          return false if priority.to_sym == :urgent
+
+          peak_hour?
+        end
+
+        # Returns the next off-peak Time (UTC).
+        # If already off-peak, returns the current time.
+        # Off-peak begins at the hour after the peak window ends (23:00 UTC).
+        #
+        # @param time [Time] reference time (defaults to now)
+        # @return [Time]
+        def next_off_peak(time = Time.now.utc)
+          if time.hour < PEAK_HOURS.first || time.hour >= PEAK_HOURS.last
+            time
+          else
+            Time.utc(time.year, time.month, time.day, PEAK_HOURS.last, 0, 0)
+          end
+        end
+      end
+    end
+  end
+end
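To see how the inclusive range `(14..22)` behaves at the edges of the window, the sketch below copies the two predicates into a hypothetical `DemoOffPeak` module (without logging) and evaluates them at fixed UTC times.

```ruby
# Hypothetical copy of the two methods above, minus logging, for boundary checks.
module DemoOffPeak
  PEAK_HOURS = (14..22) # inclusive range, as in the file above

  module_function

  def peak_hour?(time)
    PEAK_HOURS.cover?(time.hour)
  end

  def next_off_peak(time)
    if time.hour < PEAK_HOURS.first || time.hour >= PEAK_HOURS.last
      time
    else
      Time.utc(time.year, time.month, time.day, PEAK_HOURS.last, 0, 0)
    end
  end
end

morning = Time.utc(2026, 3, 22, 13, 59)
afternoon = Time.utc(2026, 3, 22, 15, 30)
late = Time.utc(2026, 3, 22, 22, 30)

puts DemoOffPeak.peak_hour?(morning)      # => false
puts DemoOffPeak.peak_hour?(afternoon)    # => true
puts DemoOffPeak.peak_hour?(late)         # => true (hour 22 is inside 14..22)
puts DemoOffPeak.next_off_peak(afternoon) # => 2026-03-22 22:00:00 UTC
puts DemoOffPeak.next_off_peak(late)      # => 2026-03-22 22:30:00 UTC (returned unchanged)
```

At 22:30 UTC the range check reports peak while `next_off_peak` returns the input unchanged, so callers combining the two should decide which boundary they rely on.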
data/lib/legion/llm/response_cache.rb
CHANGED
@@ -26,6 +26,7 @@ module Legion
 
 # Writes error details and marks status as :error.
 def fail_request(request_id, code:, message:, ttl: DEFAULT_TTL)
+  Legion::Logging.warn("ResponseCache fail_request request_id=#{request_id} code=#{code} message=#{message}") if defined?(Legion::Logging)
   payload = ::JSON.dump({ code: code, message: message })
   cache_set(error_key(request_id), payload, ttl)
   cache_set(status_key(request_id), 'error', ttl)
@@ -69,6 +70,7 @@ module Legion
 
 loop do
   current = status(request_id)
+  Legion::Logging.debug("ResponseCache poll request_id=#{request_id} status=#{current}") if defined?(Legion::Logging)
 
   case current
   when :done
@@ -120,6 +122,7 @@ module Legion
 
 private_class_method def self.write_response(request_id, response_text, ttl)
   if response_text.bytesize > SPOOL_THRESHOLD
+    Legion::Logging.warn("ResponseCache spool overflow request_id=#{request_id} bytes=#{response_text.bytesize}") if defined?(Legion::Logging)
     FileUtils.mkdir_p(SPOOL_DIR)
     path = File.join(SPOOL_DIR, "#{request_id}.txt")
     File.write(path, response_text)
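The `write_response` hunk spools oversized responses to disk once `bytesize` exceeds a threshold. A minimal sketch of that decision follows, under stated assumptions: the threshold value, the `store_response` helper, and the shape of the stored hash are hypothetical, and a temporary directory stands in for `SPOOL_DIR`.

```ruby
require 'fileutils'
require 'tmpdir'

DEMO_SPOOL_THRESHOLD = 32 # bytes; deliberately tiny, hypothetical value

# Hypothetical helper: small payloads stay inline, large ones go to a file.
def store_response(store, spool_dir, request_id, text)
  if text.bytesize > DEMO_SPOOL_THRESHOLD
    FileUtils.mkdir_p(spool_dir)
    path = File.join(spool_dir, "#{request_id}.txt")
    File.write(path, text)
    store[request_id] = { spooled_to: path }
  else
    store[request_id] = { inline: text }
  end
end

store = {}
spooled_bytes = nil
Dir.mktmpdir do |dir|
  store_response(store, dir, 'req-1', 'short reply')
  store_response(store, dir, 'req-2', 'x' * 100)
  # Read the spooled file back while the temporary directory still exists.
  spooled_bytes = File.read(store['req-2'][:spooled_to]).bytesize
end

puts store['req-1'].inspect
puts "spooled bytes=#{spooled_bytes}"
```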
data/lib/legion/llm/router/health_tracker.rb
CHANGED
@@ -49,7 +49,10 @@ module Legion
 
 if circuit[:state] == :open
   elapsed = Time.now - circuit[:opened_at]
-
+  if elapsed >= @cooldown_seconds
+    Legion::Logging.warn("Circuit open->half_open for provider=#{provider} (cooldown elapsed)") if defined?(Legion::Logging)
+    return :half_open
+  end
 end
 
 circuit[:state]
@@ -82,11 +85,13 @@ module Legion
   if circuit_state(provider) == :half_open
     circuit[:state] = :open
     circuit[:opened_at] = Time.now
+    Legion::Logging.warn("Circuit half_open->open for provider=#{provider} (error during probe)") if defined?(Legion::Logging)
   else
     circuit[:failures] += 1.0
     if circuit[:failures] >= @failure_threshold
       circuit[:state] = :open
       circuit[:opened_at] = Time.now
+      Legion::Logging.warn("Circuit closed->open for provider=#{provider} (failures=#{circuit[:failures]})") if defined?(Legion::Logging)
     end
   end
 end
@@ -94,10 +99,12 @@ module Legion
 register_handler(:success) do |payload|
   provider = payload[:provider]
   ensure_circuit(provider)
+  prev_state = circuit_state(provider)
   circuit = @circuits[provider]
   circuit[:failures] = 0
   circuit[:state] = :closed
   circuit[:opened_at] = nil
+  Legion::Logging.warn("Circuit #{prev_state}->closed for provider=#{provider}") if defined?(Legion::Logging) && prev_state != :closed
 end
 
 register_handler(:quality_failure) do |payload|
@@ -108,11 +115,13 @@ module Legion
   if circuit_state(provider) == :half_open
     circuit[:state] = :open
     circuit[:opened_at] = Time.now
+    Legion::Logging.warn("Circuit half_open->open for provider=#{provider} (quality failure during probe)") if defined?(Legion::Logging)
   else
     circuit[:failures] += 0.5
     if circuit[:failures] >= @failure_threshold
       circuit[:state] = :open
       circuit[:opened_at] = Time.now
+      Legion::Logging.warn("Circuit closed->open for provider=#{provider} (quality failures=#{circuit[:failures]})") if defined?(Legion::Logging)
     end
   end
 end
@@ -152,7 +161,9 @@ module Legion
       return 0 if avg <= LATENCY_THRESHOLD_MS
 
       multiplier = (avg / LATENCY_THRESHOLD_MS).floor
-      [LATENCY_PENALTY_STEP * multiplier, OPEN_PENALTY].max
+      penalty = [LATENCY_PENALTY_STEP * multiplier, OPEN_PENALTY].max
+      Legion::Logging.debug("Latency penalty applied to provider=#{provider} avg_ms=#{avg.round} penalty=#{penalty}") if defined?(Legion::Logging)
+      penalty
     end
   end
 end
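The transitions logged above (closed->open, open->half_open, half_open->open, any->closed) form a small circuit-breaker state machine. The sketch below is a simplified, single-provider model of it, not the gem's `HealthTracker`: `DemoCircuit`, its constructor arguments, and the `transitions` list are hypothetical, and time is injected so the cooldown can be exercised without sleeping.

```ruby
# Hypothetical single-provider circuit breaker; a simplified model, not the gem's class.
class DemoCircuit
  attr_reader :transitions

  def initialize(failure_threshold: 3, cooldown_seconds: 60)
    @failure_threshold = failure_threshold
    @cooldown_seconds = cooldown_seconds
    @state = :closed
    @failures = 0.0
    @opened_at = nil
    @transitions = []
  end

  # Effective state: an open circuit reads as half_open once the cooldown has elapsed.
  def state(now = Time.now)
    return :half_open if @state == :open && (now - @opened_at) >= @cooldown_seconds

    @state
  end

  # weight: 1.0 models a hard error, 0.5 a quality failure (as in the hunks above).
  def record_failure(now = Time.now, weight: 1.0)
    if state(now) == :half_open
      open!(:half_open, now)
    else
      @failures += weight
      open!(:closed, now) if @state == :closed && @failures >= @failure_threshold
    end
  end

  def record_success(now = Time.now)
    previous = state(now)
    @failures = 0.0
    @state = :closed
    @opened_at = nil
    @transitions << "#{previous}->closed" unless previous == :closed
  end

  private

  def open!(from, now)
    @state = :open
    @opened_at = now
    @transitions << "#{from}->open"
  end
end

t0 = Time.at(0)
circuit = DemoCircuit.new(failure_threshold: 2, cooldown_seconds: 10)
2.times { circuit.record_failure(t0) } # second failure trips the breaker
puts circuit.state(t0 + 5)             # => open
puts circuit.state(t0 + 10)            # => half_open
circuit.record_failure(t0 + 10)        # failed probe re-opens the circuit
circuit.record_success(t0 + 25)        # successful probe closes it
puts circuit.transitions.inspect
```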
data/lib/legion/llm/router/rule.rb
CHANGED
@@ -39,9 +39,17 @@ module Legion
 
 def matches_intent?(intent)
   @conditions.all? do |key, value|
-
+    unless intent.key?(key)
+      Legion::Logging.debug("Rule '#{@name}' rejected: missing intent key=#{key}") if defined?(Legion::Logging)
+      return false
+    end
 
-    intent[key].to_s == value.to_s
+    unless intent[key].to_s == value.to_s
+      Legion::Logging.debug("Rule '#{@name}' rejected: intent #{key}=#{intent[key]} != #{value}") if defined?(Legion::Logging)
+      return false
+    end
+
+    true
   end
 end
 
@@ -60,17 +68,32 @@ module Legion
 
   sched = @schedule.transform_keys(&:to_s)
   now = localize(now, sched['timezone'])
-
-  return false if sched['valid_from'] && now < Time.parse(sched['valid_from'])
-  return false if sched['valid_until'] && now > Time.parse(sched['valid_until'])
-  return false if sched['hours'] && !within_hours?(sched['hours'], now)
-  return false if sched['days'] && !on_allowed_day?(sched['days'], now)
-
-  true
+  schedule_rejection(sched, now).nil?
 end
 
 private
 
+def schedule_rejection(sched, now)
+  if sched['valid_from'] && now < Time.parse(sched['valid_from'])
+    Legion::Logging.debug("Rule '#{@name}' rejected: before valid_from=#{sched['valid_from']}") if defined?(Legion::Logging)
+    return :valid_from
+  end
+  if sched['valid_until'] && now > Time.parse(sched['valid_until'])
+    Legion::Logging.debug("Rule '#{@name}' rejected: after valid_until=#{sched['valid_until']}") if defined?(Legion::Logging)
+    return :valid_until
+  end
+  if sched['hours'] && !within_hours?(sched['hours'], now)
+    Legion::Logging.debug("Rule '#{@name}' rejected: outside schedule hours=#{sched['hours']}") if defined?(Legion::Logging)
+    return :hours
+  end
+  if sched['days'] && !on_allowed_day?(sched['days'], now)
+    Legion::Logging.debug("Rule '#{@name}' rejected: outside schedule days=#{sched['days']}") if defined?(Legion::Logging)
+    return :days
+  end
+
+  nil
+end
+
 def localize(time, timezone_name)
   return time unless timezone_name
 
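The refactor replaces four boolean guards with a helper that returns the name of the first failed constraint, or `nil` when the schedule allows the rule. A standalone sketch of that "rejection reason or nil" shape is below. It is not the gem's method: the `hours` check assumes a list of allowed integer hours, which may differ from the format `within_hours?` actually accepts, and the `days` check is omitted.

```ruby
require 'time'

# Hypothetical standalone version: returns the first failed constraint, or nil.
def schedule_rejection(sched, now)
  return :valid_from if sched['valid_from'] && now < Time.parse(sched['valid_from'])
  return :valid_until if sched['valid_until'] && now > Time.parse(sched['valid_until'])
  # Assumption: 'hours' is a list of allowed integer hours (the gem's format may differ).
  return :hours if sched['hours'] && !sched['hours'].include?(now.hour)

  nil
end

# The public predicate stays a one-liner on top of the helper.
def within_schedule?(sched, now)
  schedule_rejection(sched, now).nil?
end

now = Time.utc(2026, 3, 22, 12, 0, 0)
puts schedule_rejection({ 'valid_from' => '2026-04-01T00:00:00Z' }, now).inspect  # => :valid_from
puts schedule_rejection({ 'valid_until' => '2026-03-01T00:00:00Z' }, now).inspect # => :valid_until
puts schedule_rejection({ 'hours' => [9, 10] }, now).inspect                      # => :hours
puts within_schedule?({ 'hours' => [12] }, now)                                   # => true
```

Returning a symbol gives the caller (or a log line) the reason for the rejection while keeping the predicate's boolean contract intact.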
data/lib/legion/llm/router.rb
CHANGED
@@ -28,7 +28,17 @@ module Legion
   rules = load_rules
   candidates = select_candidates(rules, merged)
   best = pick_best(candidates)
-  best&.to_resolution
+  resolution = best&.to_resolution
+
+  if resolution
+    if defined?(Legion::Logging)
+      Legion::Logging.info("Routed to tier=#{resolution.tier} provider=#{resolution.provider} model=#{resolution.model} via rule='#{resolution.rule}'")
+    end
+  elsif defined?(Legion::Logging)
+    Legion::Logging.debug('Router: no rules matched, resolution is nil')
+  end
+
+  resolution
 end
 
 def resolve_chain(intent: nil, tier: nil, model: nil, provider: nil, max_escalations: nil)
@@ -100,6 +110,8 @@ module Legion
 end
 
 def select_candidates(rules, intent)
+  Legion::Logging.debug("Router: selecting candidates from #{rules.size} rules") if defined?(Legion::Logging)
+
   # 1. Collect constraints from constraint rules that match the intent
   constraints = rules
                 .select { |r| r.constraint && r.matches_intent?(intent) }
@@ -118,7 +130,11 @@ module Legion
   discovered = unconstrained.reject { |r| excluded_by_discovery?(r) }
 
   # 5. Filter by tier availability
-  discovered.select { |r| tier_available?(r.target[:tier] || r.target['tier']) }
+  final = discovered.select { |r| tier_available?(r.target[:tier] || r.target['tier']) }
+
+  Legion::Logging.debug("Router: #{final.size} candidates after filtering (started with #{rules.size})") if defined?(Legion::Logging)
+
+  final
 end
 
 def excluded_by_constraint?(rule, constraints)
data/lib/legion/llm/scheduling.rb
CHANGED
@@ -24,7 +24,9 @@ module Legion
   return false unless enabled?
   return false if urgency.to_sym == :immediate
 
-  eligible_for_deferral?(intent.to_sym) && peak_hours?
+  result = eligible_for_deferral?(intent.to_sym) && peak_hours?
+  Legion::Logging.debug("Scheduling defer decision intent=#{intent} urgency=#{urgency} defer=#{result}") if defined?(Legion::Logging)
+  result
 end
 
 # Returns true if the current UTC hour falls within the configured peak window.
data/lib/legion/llm/shadow_eval.rb
CHANGED
@@ -17,6 +17,7 @@ module Legion
 
 def evaluate(primary_response:, messages: nil, shadow_model: nil) # rubocop:disable Lint/UnusedMethodArgument
   shadow_model ||= Legion::Settings.dig(:llm, :shadow, :model) || 'gpt-4o-mini'
+  Legion::Logging.debug("ShadowEval triggered primary_model=#{primary_response[:model]} shadow_model=#{shadow_model}") if defined?(Legion::Logging)
 
   shadow_response = Legion::LLM.send(:chat_single,
                                      model: shadow_model, provider: nil,
@@ -27,6 +28,7 @@ module Legion
   Legion::Events.emit('llm.shadow_eval', comparison) if defined?(Legion::Events)
   comparison
 rescue StandardError => e
+  Legion::Logging.warn("ShadowEval failed shadow_model=#{shadow_model}: #{e.message}") if defined?(Legion::Logging)
   { error: e.message, shadow_model: shadow_model }
 end
 
data/lib/legion/llm/structured_output.rb
CHANGED
@@ -26,6 +26,7 @@ module Legion
     json_schema: { name: 'response', schema: schema } },
     **opts.except(:attempt))
 else
+  Legion::Logging.debug("StructuredOutput using prompt-based fallback for model=#{model}") if defined?(Legion::Logging)
   instruction = "You MUST respond with valid JSON matching this schema:\n" \
                 "```json\n#{Legion::JSON.dump(schema)}\n```\n" \
                 'Respond with ONLY the JSON object, no other text.'
@@ -37,8 +38,10 @@ module Legion
 end
 
 def handle_parse_error(error, messages, schema, model, result, **opts)
-
-
+  attempt = opts[:attempt] || 0
+  Legion::Logging.warn("StructuredOutput JSON parse failure attempt=#{attempt} model=#{model}: #{error.message}") if defined?(Legion::Logging)
+  if retry_enabled? && attempt < max_retries
+    retry_with_instruction(messages, schema, model, attempt: attempt + 1, **opts)
   else
     { data: nil, error: "JSON parse failed: #{error.message}", raw: result&.dig(:content), valid: false }
   end
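`handle_parse_error` retries with an incremented attempt counter until a retry limit is reached, then returns an error hash. The sketch below models that control flow with a block standing in for the model call; `parse_structured`, its keyword arguments, and the `attempts` key are hypothetical, and no LLM is involved.

```ruby
require 'json'

# Hypothetical retry loop: the block plays the role of the model call and
# receives the attempt number so a test can script its replies.
def parse_structured(max_retries: 2, attempt: 0, log: [], &producer)
  raw = producer.call(attempt)
  { data: JSON.parse(raw, symbolize_names: true), valid: true, attempts: attempt + 1 }
rescue JSON::ParserError => e
  log << "warn: JSON parse failure attempt=#{attempt}"
  if attempt < max_retries
    parse_structured(max_retries: max_retries, attempt: attempt + 1, log: log, &producer)
  else
    { data: nil, error: "JSON parse failed: #{e.message}", raw: raw, valid: false }
  end
end

# First reply is malformed, second is valid JSON.
replies = ['not json', '{"ok": true}']
log = []
result = parse_structured(log: log) { |attempt| replies[attempt] }
puts result.inspect
puts log
```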
data/lib/legion/llm/version.rb
CHANGED
data/lib/legion/llm.rb
CHANGED
@@ -15,6 +15,8 @@ require_relative 'llm/daemon_client'
 require_relative 'llm/arbitrage'
 require_relative 'llm/batch'
 require_relative 'llm/scheduling'
+require_relative 'llm/off_peak'
+require_relative 'llm/cost_tracker'
 
 begin
   require 'legion/extensions/llm/gateway'
@@ -124,7 +126,7 @@ module Legion
   )
 else
   chat_single(model: model, provider: provider, intent: intent, tier: tier,
-              temperature: temperature, **kwargs)
+              temperature: temperature, message: message, **kwargs)
 end
 
 if cache_key && result.is_a?(Hash)
@@ -185,6 +187,10 @@ module Legion
   agent_class.new(**)
 end
 
+FRAMEWORK_KEYS = %i[request_id source timestamp datetime task_id parent_id master_id
+                    check_subtask generate_task catch_exceptions worker_id principal_id
+                    principal_type].freeze
+
 private
 
 def _dispatch_chat(model:, provider:, intent:, tier:, escalate:, max_escalations:, quality_check:, message:, **kwargs)
@@ -276,7 +282,7 @@ module Legion
   Legion::Extensions::LLM::Gateway::Runners::Inference.chat(**)
 end
 
-def chat_single(model:, provider:, intent:, tier:, **kwargs)
+def chat_single(model:, provider:, intent:, tier:, message: nil, **kwargs)
   if (intent || tier) && Router.routing_enabled?
     resolution = Router.resolve(intent: intent, tier: tier, model: model, provider: provider)
     if resolution
@@ -295,12 +301,15 @@ module Legion
   opts = {}
   opts[:model] = model if model
   opts[:provider] = provider if provider
-  opts.merge!(kwargs)
+  opts.merge!(kwargs.except(*FRAMEWORK_KEYS))
   opts.delete(:temperature) if opts[:temperature].nil?
 
   inject_anthropic_cache_control!(opts, provider)
 
-  RubyLLM.chat(**opts)
+  session = RubyLLM.chat(**opts)
+  return session unless message
+
+  session.ask(message)
 end
 
 def chat_with_escalation(model:, provider:, intent:, tier:, max_escalations:, quality_check:, message:, **kwargs)
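Two Ruby details carry the 0.3.16 fix: `private` changes method visibility only, so a constant declared after it remains reachable, and `Hash#except` (built in since Ruby 3.0) drops the framework keys before options are forwarded. The sketch below shows both with a hypothetical `DemoChat` class and a shortened key list; it does not call RubyLLM.

```ruby
# Hypothetical class; the key list is a shortened stand-in for FRAMEWORK_KEYS above.
class DemoChat
  private

  # `private` affects methods only: this constant is still publicly readable.
  FRAMEWORK_KEYS = %i[task_id source timestamp].freeze

  public

  # Mirrors the option-building steps in the hunk above, minus the provider call.
  def build_opts(model: nil, **kwargs)
    opts = {}
    opts[:model] = model if model
    opts.merge!(kwargs.except(*FRAMEWORK_KEYS)) # Hash#except requires Ruby >= 3.0
    opts.delete(:temperature) if opts[:temperature].nil?
    opts
  end
end

puts DemoChat::FRAMEWORK_KEYS.inspect
opts = DemoChat.new.build_opts(model: 'demo-model', task_id: 7, source: 'runner',
                               temperature: nil, max_tokens: 10)
puts opts.inspect # only :model and :max_tokens remain
```

Hiding a constant would need `private_constant`, which is why moving `FRAMEWORK_KEYS` above `private` changes how the file reads rather than how it behaves.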
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: legion-llm
 version: !ruby/object:Gem::Version
-  version: 0.3.
+  version: 0.3.18
 platform: ruby
 authors:
 - Esity
@@ -137,6 +137,7 @@ files:
 - lib/legion/llm/cache.rb
 - lib/legion/llm/claude_config_loader.rb
 - lib/legion/llm/compressor.rb
+- lib/legion/llm/cost_tracker.rb
 - lib/legion/llm/daemon_client.rb
 - lib/legion/llm/discovery/ollama.rb
 - lib/legion/llm/discovery/system.rb
@@ -146,6 +147,7 @@ files:
 - lib/legion/llm/hooks.rb
 - lib/legion/llm/hooks/rag_guard.rb
 - lib/legion/llm/hooks/response_guard.rb
+- lib/legion/llm/off_peak.rb
 - lib/legion/llm/providers.rb
 - lib/legion/llm/quality_checker.rb
 - lib/legion/llm/response_cache.rb