legion-llm 0.3.1
# Ollama Model Discovery & System Memory Awareness

**Date**: 2026-03-15
**Author**: Matthew Iverson (@Esity)
**Status**: Approved

## Problem

Legion::LLM's router can target Ollama models via routing rules, but it has no awareness of:
1. Which models are actually pulled in the local Ollama instance
2. How much system memory is available to run them

As a result, rules can target models that aren't present (failing silently and falling through to the cloud tier), and nothing prevents the router from selecting a model too large for the available RAM.

## Solution

Add two discovery modules under `Legion::LLM::Discovery` that provide lazy, TTL-cached system introspection. The router uses this data to filter candidates before scoring.
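Both modules rely on the same lazy caching behavior. The sketch below assumes a hypothetical `TtlCache` helper that is not part of legion-llm: the value is loaded on the first read, reused until the TTL (`refresh_seconds` in the settings) elapses, and reloaded on the next read after that. The injectable clock is an illustration choice so the TTL can be exercised without sleeping.

```ruby
# Hypothetical helper illustrating lazy, TTL-based caching; not part of legion-llm.
class TtlCache
  # loader: block that produces a fresh value (e.g. an /api/tags fetch)
  # clock:  injectable so tests can move time forward without sleeping
  def initialize(ttl_seconds:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }, &loader)
    @ttl = ttl_seconds
    @clock = clock
    @loader = loader
    reset!
  end

  # Lazy read: reloads only when nothing is cached or the TTL has expired.
  def value
    refresh! if stale?
    @value
  end

  # True when nothing has been loaded yet or the last load is older than the TTL.
  def stale?
    @loaded_at.nil? || (@clock.call - @loaded_at) >= @ttl
  end

  # Forces a reload regardless of age.
  def refresh!
    @value = @loader.call
    @loaded_at = @clock.call
    @value
  end

  # Drops the cached value so the next read reloads (useful in tests).
  def reset!
    @value = nil
    @loaded_at = nil
  end
end
```

The `stale?`, `refresh!`, and `reset!` names mirror the per-module API listed for `Discovery::Ollama` and `Discovery::System`; the design does not say whether the real modules share a helper like this.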

## Architecture

### Module Structure

```
Legion::LLM::Discovery
├── Ollama   # Queries Ollama /api/tags for pulled models
└── System   # Queries OS for memory stats (macOS + Linux)
```

### Discovery::Ollama

Queries `GET <base_url>/api/tags` via Faraday (a transitive dependency of ruby_llm).

```ruby
Discovery::Ollama
  .models                  # -> Array<Hash> (raw model list)
  .model_names             # -> Array<String> (names for quick lookup)
  .model_available?(name)  # -> Boolean
  .model_size(name)        # -> Integer (bytes) or nil
  .refresh!                # Force re-fetch
  .reset!                  # Clear cache (testing)
  .stale?                  # -> Boolean (TTL expired?)
```

Response format from Ollama `/api/tags`:
```json
{
  "models": [
    {
      "name": "llama3.1:8b",
      "size": 4700000000,
      "digest": "sha256:...",
      "modified_at": "2026-03-15T..."
    }
  ]
}
```

Connections use a 2-second timeout and the `ollama[:base_url]` setting (default `http://localhost:11434`).
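To show how the accessors map onto this response, here is a minimal sketch that interprets an `/api/tags` body with Ruby's standard `json` library. `OllamaTags` is a hypothetical class, not the real `Discovery::Ollama`; the HTTP call (Faraday in the design) is left out so the parsing can run on its own.

```ruby
require 'json'

# Hypothetical sketch: interprets a GET /api/tags body in the shape shown above.
class OllamaTags
  attr_reader :models

  def initialize(body)
    # A missing "models" key is treated as "nothing pulled".
    @models = JSON.parse(body).fetch('models', [])
  end

  def model_names
    @models.map { |m| m['name'] }
  end

  def model_available?(name)
    model_names.include?(name)
  end

  # Size in bytes as reported by Ollama, or nil when the model is not pulled.
  def model_size(name)
    @models.find { |m| m['name'] == name }&.fetch('size', nil)
  end
end

tags = OllamaTags.new('{"models":[{"name":"llama3.1:8b","size":4700000000}]}')
tags.model_available?('llama3.1:8b') # => true
tags.model_size('llama3.1:8b')       # => 4700000000
tags.model_size('mistral:7b')        # => nil
```

This sketch matches names exactly, so a rule naming `llama3.1` would not match a pulled `llama3.1:8b`; the design does not say whether the real module normalizes tags.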

### Discovery::System

Queries OS-level memory information, using a platform-specific source:

- **macOS**: `sysctl -n hw.memsize` (total), `vm_stat` (free + inactive pages, excludes disk cache)
- **Linux**: `/proc/meminfo` (MemTotal for total; MemFree + Inactive for available)
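The two sources can be parsed into megabytes along these lines. `parse_meminfo` and `parse_vm_stat` are hypothetical helpers that take the raw file or command output as a string, so they run without the host OS; they apply the free + inactive rule above and assume `/proc/meminfo` values are in kB and that `vm_stat` states its page size in its header line.

```ruby
# Hypothetical parsers (not legion-llm code) for the two memory sources.
# Both return whole megabytes; "available" is free + inactive, per the design.

# Linux: /proc/meminfo lines look like "MemFree:  2048000 kB".
def parse_meminfo(text)
  kb = text.each_line.each_with_object({}) do |line, acc|
    key, value = line.split(':', 2)
    acc[key.strip] = value.to_i if value # String#to_i skips leading spaces, stops at " kB"
  end
  {
    total_mb: kb.fetch('MemTotal') / 1024,
    # Exact key match, so "Inactive(anon)" and "Inactive(file)" are not double-counted
    available_mb: (kb.fetch('MemFree') + kb.fetch('Inactive')) / 1024
  }
end

# macOS: vm_stat reports page counts and states the page size in its header.
# memsize_bytes is the integer printed by `sysctl -n hw.memsize`.
def parse_vm_stat(text, memsize_bytes)
  page_size = text[/page size of (\d+) bytes/, 1].to_i
  pages = ->(label) { text[/^#{label}:\s+(\d+)\./, 1].to_i }
  free_bytes = (pages.call('Pages free') + pages.call('Pages inactive')) * page_size
  {
    total_mb: memsize_bytes / (1024 * 1024),
    available_mb: free_bytes / (1024 * 1024)
  }
end
```

On a live host the inputs would come from `File.read('/proc/meminfo')`, or from the `vm_stat` output plus the byte count printed by `sysctl -n hw.memsize`.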

```ruby
Discovery::System
  .total_memory_mb      # -> Integer
  .available_memory_mb  # -> Integer (free + inactive, no disk cache) or nil if unknown
  .memory_pressure?     # -> Boolean (available < memory_floor_mb)
  .platform             # -> :macos | :linux | :unknown
  .refresh!             # Force re-read
  .reset!               # Clear cache (testing)
  .stale?               # -> Boolean (TTL expired?)
```
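A sketch of `.platform` and `.memory_pressure?`. Detecting the platform from `RbConfig::CONFIG['host_os']` is an assumption (the design does not say how it is done), and `detect_platform` and the standalone `memory_pressure?` are hypothetical names. Treating unknown memory (`nil`) as "no pressure" is also an assumption, chosen to match the permissive behavior in the error-handling table.

```ruby
require 'rbconfig'

# Hypothetical: maps Ruby's host OS string onto the three documented platforms.
def detect_platform(host_os = RbConfig::CONFIG['host_os'])
  case host_os
  when /darwin/ then :macos
  when /linux/  then :linux
  else :unknown
  end
end

# Hypothetical: nil (memory unknown) reports no pressure, so checks stay permissive.
def memory_pressure?(available_mb, memory_floor_mb: 2048)
  return false if available_mb.nil?

  available_mb < memory_floor_mb
end
```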

### Router Integration

The existing `select_candidates` pipeline gains one new filter step:

```
Current pipeline:
1. Collect constraints from constraint rules
2. Filter by intent match
3. Filter by schedule
4. Reject rules excluded by constraints
5. Filter by tier availability

New pipeline:
1. Collect constraints from constraint rules
2. Filter by intent match
3. Filter by schedule
4. Reject rules excluded by constraints
4.5 Reject Ollama rules where model is not pulled or doesn't fit in memory   <-- NEW
5. Filter by tier availability
```

Step 4.5 logic:
- Only applies to rules whose target tier is `:local` and provider is `:ollama`
- Reject the rule if `Discovery::Ollama.model_available?(model)` returns false
- Reject the rule if the model's size in bytes exceeds (`Discovery::System.available_memory_mb - memory_floor_mb`) MB converted to bytes
- All non-Ollama rules pass through unchanged
- If discovery is disabled (`enabled: false`), skip all checks (permissive)
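Step 4.5 can be sketched as a single filter. Everything here is hypothetical: `Rule` is a stand-in struct, the discovery lookups are passed in as lambdas so the example runs without Ollama, and "1MB" is assumed to mean 1,048,576 bytes.

```ruby
# Hypothetical sketch of step 4.5, not the router's actual code.
Rule = Struct.new(:tier, :provider, :model, keyword_init: true)

BYTES_PER_MB = 1024 * 1024 # assumption: "1MB" in the design means 1 MiB

# available:    ->(model) { Boolean }       stands in for Discovery::Ollama.model_available?
# size_of:      ->(model) { Integer | nil } stands in for Discovery::Ollama.model_size
# available_mb: -> { Integer | nil }        stands in for Discovery::System.available_memory_mb
def filter_ollama_rules(rules, discovery_enabled:, available:, size_of:, available_mb:, memory_floor_mb: 2048)
  return rules unless discovery_enabled # discovery disabled: permissive

  rules.select do |rule|
    # Only local Ollama rules are checked; everything else passes through.
    next true unless rule.tier == :local && rule.provider == :ollama
    next false unless available.call(rule.model) # model not pulled

    free_mb = available_mb.call
    next true if free_mb.nil? # memory unknown: bypass the memory check

    size = size_of.call(rule.model)
    size.nil? || size <= (free_mb - memory_floor_mb) * BYTES_PER_MB
  end
end

sizes = { 'llama3.1:8b' => 4_700_000_000, 'llama3.1:70b' => 40_000_000_000 }
rules = [
  Rule.new(tier: :local, provider: :ollama, model: 'llama3.1:8b'),
  Rule.new(tier: :local, provider: :ollama, model: 'llama3.1:70b'),
  Rule.new(tier: :local, provider: :ollama, model: 'missing:1b'),
  Rule.new(tier: :cloud, provider: :bedrock, model: 'cloud-model')
]
kept = filter_ollama_rules(rules, discovery_enabled: true,
                                  available: ->(m) { sizes.key?(m) },
                                  size_of: ->(m) { sizes[m] },
                                  available_mb: -> { 8000 })
kept.map(&:model) # => ["llama3.1:8b", "cloud-model"]
```

With 8,000 MB available and the default 2,048 MB floor, the budget is 5,952 MB (6,241,124,352 bytes under the MiB assumption), so the 4.7 GB model is kept, the 40 GB model and the unpulled model are rejected, and the cloud rule passes through untouched.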

### Startup Integration

In `Legion::LLM.start`, after `configure_providers` and before `set_defaults`:

```ruby
if settings.dig(:providers, :ollama, :enabled)
  # Populate both discovery caches at boot and log what was found
  Discovery::Ollama.refresh!
  Discovery::System.refresh!
  Legion::Logging.info "Ollama: #{Discovery::Ollama.model_names.size} models available " \
                       "(#{Discovery::Ollama.model_names.join(', ')})"
  Legion::Logging.info "System: #{Discovery::System.total_memory_mb} MB total, " \
                       "#{Discovery::System.available_memory_mb} MB available"
end
```

### Settings

New settings nested under `Legion::Settings[:llm][:discovery]`:

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `enabled` | Boolean | `true` | Master switch for discovery checks |
| `refresh_seconds` | Integer | `60` | TTL for both discovery caches |
| `memory_floor_mb` | Integer | `2048` | Minimum free memory (MB) to reserve for the OS |

These defaults are added to `Legion::LLM::Settings.default` and merged via `routing_defaults`.
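A sketch of the resulting settings shape, assuming operator overrides are deep-merged over the defaults. The `DISCOVERY_DEFAULTS` constant and the `deep_merge` helper are hypothetical; the real `Settings.default` and `routing_defaults` code may merge differently.

```ruby
# Hypothetical sketch; key names and defaults are taken from the table above.
DISCOVERY_DEFAULTS = {
  enabled: true,        # master switch
  refresh_seconds: 60,  # TTL shared by both discovery caches
  memory_floor_mb: 2048 # MB kept free for the OS
}.freeze

# Recursively merges nested hashes; values from `override` win.
def deep_merge(base, override)
  base.merge(override) do |_key, old_val, new_val|
    old_val.is_a?(Hash) && new_val.is_a?(Hash) ? deep_merge(old_val, new_val) : new_val
  end
end

defaults = { llm: { discovery: DISCOVERY_DEFAULTS } }
operator = { llm: { discovery: { memory_floor_mb: 4096 } } }
discovery = deep_merge(defaults, operator).dig(:llm, :discovery)
# memory_floor_mb is now 4096; enabled and refresh_seconds keep their defaults
```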

### Error Handling

| Scenario | Behavior |
|----------|----------|
| Ollama not running | `models` returns `[]`, all Ollama rules skipped, cloud tier takes over |
| `vm_stat` fails | `available_memory_mb` returns `nil`, memory checks bypassed (permissive) |
| `/api/tags` timeout (2s) | Returns stale cache if available, empty array on first call |
| Discovery disabled | All checks bypassed, rules pass through as before |
| Unknown platform | `available_memory_mb` returns `nil`, memory checks bypassed |
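The "stale cache on timeout, empty array on first call" rule can be sketched as a small wrapper. `FallbackCache` is hypothetical, and the block stands in for the Faraday request; any `StandardError` (such as a timeout or a refused connection) is treated as Ollama being unavailable.

```ruby
# Hypothetical sketch of the "stale cache on failure" behavior, not legion-llm code.
class FallbackCache
  def initialize
    @last_good = nil # nothing fetched successfully yet
  end

  # Yields to perform the fetch. On success the result becomes the new cache;
  # on any StandardError the last good result is returned, or [] if none exists.
  def refresh
    @last_good = yield
  rescue StandardError
    @last_good || []
  end
end
```

Because a failed first call yields `[]`, every Ollama rule then fails the `model_available?` check and routing falls through to the cloud tier, as the first row of the table describes.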

### File Layout

```
lib/legion/llm/discovery/ollama.rb
lib/legion/llm/discovery/system.rb
spec/legion/llm/discovery/ollama_spec.rb
spec/legion/llm/discovery/system_spec.rb
```

### Dependencies

No new gem dependencies. Uses:
- `Faraday` (transitive via ruby_llm) for Ollama HTTP
- Shell commands (`sysctl`, `vm_stat`) for macOS memory
- File reads (`/proc/meminfo`) for Linux memory

## Out of Scope

- Auto-pulling models (explicit operator action only)
- GPU utilization monitoring (future HealthTracker signal)
- Fleet tier discovery (Phase 2)
- Disk space checks for model storage