legion-llm 0.3.1

@@ -0,0 +1,164 @@
+ # Ollama Model Discovery & System Memory Awareness
+
+ **Date**: 2026-03-15
+ **Author**: Matthew Iverson (@Esity)
+ **Status**: Approved
+
+ ## Problem
+
+ Legion::LLM's router can target Ollama models via routing rules, but has no awareness of:
+ 1. Which models are actually pulled in the local Ollama instance
+ 2. How much system memory is available to run them
+
+ As a result, rules can target models that aren't present (a silent failure that falls through to the cloud tier), and nothing prevents the router from selecting a model too large for the available RAM.
+
+ ## Solution
+
+ Add two discovery modules under `Legion::LLM::Discovery` that provide lazy TTL-cached system introspection. The router uses this data to filter candidates before scoring.
+
+ ## Architecture
+
+ ### Module Structure
+
+ ```
+ Legion::LLM::Discovery
+ ├── Ollama    # Queries Ollama /api/tags for pulled models
+ └── System    # Queries OS for memory stats (macOS + Linux)
+ ```
+
+ ### Discovery::Ollama
+
+ Queries `GET <base_url>/api/tags` via Faraday (a transitive dependency via ruby_llm).
+
+ ```ruby
+ Discovery::Ollama
+   .models                  # -> Array<Hash> (raw model list)
+   .model_names             # -> Array<String> (names for quick lookup)
+   .model_available?(name)  # -> Boolean
+   .model_size(name)        # -> Integer (bytes) or nil
+   .refresh!                # Force re-fetch
+   .reset!                  # Clear cache (testing)
+   .stale?                  # -> Boolean (TTL expired?)
+ ```
+
+ Response format from Ollama `/api/tags`:
+ ```json
+ {
+   "models": [
+     {
+       "name": "llama3.1:8b",
+       "size": 4700000000,
+       "digest": "sha256:...",
+       "modified_at": "2026-03-15T..."
+     }
+   ]
+ }
+ ```
+
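As a rough illustration of how the `/api/tags` payload above could back `model_available?` and `model_size`, the sketch below indexes the response by model name. The helper name `index_ollama_models` is hypothetical and not part of legion-llm; only the `name` and `size` fields shown in the sample response are assumed.

```ruby
require 'json'

# Hypothetical helper (not the legion-llm API): turns an /api/tags response
# body into a { model name => size in bytes } lookup.
def index_ollama_models(body)
  payload = JSON.parse(body)
  # Array() tolerates a missing or null "models" key.
  Array(payload['models']).each_with_object({}) do |model, index|
    index[model['name']] = model['size']
  end
end

sample = <<~BODY
  {
    "models": [
      { "name": "llama3.1:8b", "size": 4700000000 }
    ]
  }
BODY

index = index_ollama_models(sample)
index.key?('llama3.1:8b')  # availability check for a pulled model
index['llama3.1:8b']       # size lookup, in bytes
index['missing:latest']    # nil for a model that is not pulled
```

A hash keyed by name keeps both lookups constant-time, which matters little for a handful of models but keeps the router-side check trivial.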
+ Connection: 2-second timeout; the base URL is read from `ollama[:base_url]` in settings (default `http://localhost:11434`).
+
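The `stale?`, `refresh!`, and `reset!` methods listed earlier suggest a lazy TTL cache. The following is a minimal sketch of that pattern under stated assumptions: the class name `TtlCache`, the injectable `clock`, and the block-based loader are all illustrative and do not come from the package. In the real module the loader would presumably be the Faraday call to `/api/tags`.

```ruby
# Hypothetical lazy TTL cache (illustrative names, not the legion-llm API).
class TtlCache
  def initialize(ttl_seconds:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }, &loader)
    @ttl_seconds = ttl_seconds
    @clock = clock
    @loader = loader
    reset!
  end

  # Returns the cached value, loading it on first use or after the TTL expires.
  def value
    refresh! if stale?
    @value
  end

  # True when nothing has been loaded yet or the TTL has elapsed.
  def stale?
    @loaded_at.nil? || (@clock.call - @loaded_at) >= @ttl_seconds
  end

  # Forces a reload and restarts the TTL window.
  def refresh!
    @value = @loader.call
    @loaded_at = @clock.call
    @value
  end

  # Clears the cache so the next read reloads (useful in specs).
  def reset!
    @value = nil
    @loaded_at = nil
  end
end

calls = 0
now = 0.0
cache = TtlCache.new(ttl_seconds: 60, clock: -> { now }) do
  calls += 1
  ['llama3.1:8b']
end

cache.value  # first access triggers the loader
cache.value  # served from cache, loader not called again
now += 61
cache.value  # TTL expired, loader runs a second time
```

Injecting the clock is a testing convenience in this sketch; it lets a spec advance time without sleeping.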
+ ### Discovery::System
+
+ Queries OS-level memory information. Platform-aware:
+
+ - **macOS**: `sysctl -n hw.memsize` (total), `vm_stat` (free + inactive pages, excludes disk cache)
+ - **Linux**: `/proc/meminfo` (MemTotal, MemFree + Inactive)
+
+ ```ruby
+ Discovery::System
+   .total_memory_mb      # -> Integer
+   .available_memory_mb  # -> Integer (free + inactive, no disk cache)
+   .memory_pressure?     # -> Boolean (available < memory_floor_mb)
+   .platform             # -> :macos | :linux | :unknown
+   .refresh!
+   .reset!
+   .stale?
+ ```
+
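To make the "free + inactive" calculation concrete, here is a hedged sketch of parsing the two data sources as plain text. The method names are hypothetical, the sample inputs are made up, and the sketch assumes 1 MB means 1024 * 1024 bytes, which the design does not specify.

```ruby
# Hypothetical parsers (illustrative names, not the legion-llm API).

# /proc/meminfo reports values in kB, e.g. "MemFree:  2048000 kB".
def linux_available_mb(meminfo_text)
  kb = meminfo_text.scan(/^(\w+):\s+(\d+)\s+kB/).to_h { |key, value| [key, value.to_i] }
  return nil unless kb.key?('MemFree') && kb.key?('Inactive')

  (kb['MemFree'] + kb['Inactive']) / 1024
end

# vm_stat reports page counts plus a page size in its header line.
def macos_available_mb(vm_stat_text)
  page_size = vm_stat_text[/page size of (\d+) bytes/, 1]
  free = vm_stat_text[/^Pages free:\s+(\d+)/, 1]
  inactive = vm_stat_text[/^Pages inactive:\s+(\d+)/, 1]
  return nil unless page_size && free && inactive

  (free.to_i + inactive.to_i) * page_size.to_i / (1024 * 1024)
end

meminfo = <<~TEXT
  MemTotal:       16384000 kB
  MemFree:         2048000 kB
  Inactive:        1024000 kB
TEXT

vm_stat = <<~TEXT
  Mach Virtual Memory Statistics: (page size of 16384 bytes)
  Pages free:                               10000.
  Pages active:                             50000.
  Pages inactive:                            6384.
TEXT

linux_mb = linux_available_mb(meminfo)
macos_mb = macos_available_mb(vm_stat)
```

Both parsers return `nil` when the expected fields are missing, which lines up with the permissive behavior the design describes for failed memory probes.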
+ ### Router Integration
+
+ The existing `select_candidates` pipeline gains one new filter step:
+
+ ```
+ Current pipeline:
+   1. Collect constraints from constraint rules
+   2. Filter by intent match
+   3. Filter by schedule
+   4. Reject rules excluded by constraints
+   5. Filter by tier availability
+
+ New pipeline:
+   1. Collect constraints from constraint rules
+   2. Filter by intent match
+   3. Filter by schedule
+   4. Reject rules excluded by constraints
+   4.5 Reject Ollama rules where model is not pulled or doesn't fit in memory  <-- NEW
+   5. Filter by tier availability
+ ```
+
+ Step 4.5 logic:
+ - Only applies to rules where the target tier is `:local` and the provider is `:ollama`
+ - Reject the rule if `Discovery::Ollama.model_available?(model)` returns false
+ - Reject the rule if the model size in bytes exceeds (`Discovery::System.available_memory_mb - memory_floor_mb`) * 1MB
+ - All non-Ollama rules pass through unchanged
+ - If discovery is disabled (`enabled: false`), bypass all checks (permissive)
+
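The step 4.5 rules can be sketched as a single predicate. Everything here is an assumption made for illustration: the `Rule` struct, the `ollama_rule_fits?` name, the keyword arguments, the `:cloud` tier and `:example_cloud` provider values, and the 70b model size. The sketch also assumes "1MB" means 1024 * 1024 bytes and treats an unknown model size as permissive, neither of which the design states.

```ruby
MEGABYTE = 1024 * 1024 # assumption: the design's "1MB" is 2**20 bytes

# Hypothetical rule shape; the real router's rule objects may differ.
Rule = Struct.new(:tier, :provider, :model, keyword_init: true)

# Returns true when the rule should stay in the candidate list.
def ollama_rule_fits?(rule, model_sizes:, available_mb:, floor_mb:, enabled: true)
  return true unless enabled                                      # discovery off: permissive
  return true unless rule.tier == :local && rule.provider == :ollama
  return false unless model_sizes.key?(rule.model)                # model not pulled
  return true if available_mb.nil?                                # memory unknown: permissive

  size = model_sizes[rule.model]
  return true if size.nil? # assumption: unknown size is treated as permissive

  size <= (available_mb - floor_mb) * MEGABYTE
end

rules = [
  Rule.new(tier: :local, provider: :ollama, model: 'llama3.1:8b'),
  Rule.new(tier: :local, provider: :ollama, model: 'llama3.1:70b'),
  Rule.new(tier: :local, provider: :ollama, model: 'not-pulled:latest'),
  Rule.new(tier: :cloud, provider: :example_cloud, model: 'example-model')
]

# Sizes in bytes; the 70b figure is illustrative only.
sizes = { 'llama3.1:8b' => 4_700_000_000, 'llama3.1:70b' => 40_000_000_000 }

kept = rules.select do |rule|
  ollama_rule_fits?(rule, model_sizes: sizes, available_mb: 8192, floor_mb: 2048)
end
kept_models = kept.map(&:model)
```

With 8192 MB available and a 2048 MB floor, the budget is 6144 MB, so the 8b model passes, the 70b model and the unpulled model are rejected, and the non-Ollama rule passes through untouched.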
+ ### Startup Integration
+
+ In `Legion::LLM.start`, after `configure_providers` and before `set_defaults`:
+
+ ```ruby
+ if settings.dig(:providers, :ollama, :enabled)
+   Discovery::Ollama.refresh!
+   Discovery::System.refresh!
+   Legion::Logging.info "Ollama: #{Discovery::Ollama.model_names.size} models available " \
+                        "(#{Discovery::Ollama.model_names.join(', ')})"
+   Legion::Logging.info "System: #{Discovery::System.total_memory_mb} MB total, " \
+                        "#{Discovery::System.available_memory_mb} MB available"
+ end
+ ```
+
+ ### Settings
+
+ New settings nested under `Legion::Settings[:llm][:discovery]`:
+
+ | Key | Type | Default | Description |
+ |-----|------|---------|-------------|
+ | `enabled` | Boolean | `true` | Master switch for discovery checks |
+ | `refresh_seconds` | Integer | `60` | TTL for both discovery caches |
+ | `memory_floor_mb` | Integer | `2048` | Minimum free MB to reserve for OS |
+
+ Added to `Legion::LLM::Settings.default` and merged via `routing_defaults`.
+
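For orientation, the defaults in the table could be represented as a plain hash with operator overrides layered on top. This is only a sketch: the constant `DISCOVERY_DEFAULTS` and the helper `discovery_settings` are hypothetical, and the actual merge through `routing_defaults` may work differently.

```ruby
# Hypothetical representation of the documented defaults
# (not the legion-llm source).
DISCOVERY_DEFAULTS = {
  enabled: true,
  refresh_seconds: 60,
  memory_floor_mb: 2048
}.freeze

# Layers operator overrides over the defaults; unspecified keys keep
# their default values.
def discovery_settings(overrides = {})
  DISCOVERY_DEFAULTS.merge(overrides)
end

settings = discovery_settings(memory_floor_mb: 4096)
```

Raising `memory_floor_mb` as shown would make the memory check more conservative, while leaving the TTL and the master switch at their defaults.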
+ ### Error Handling
+
+ | Scenario | Behavior |
+ |----------|----------|
+ | Ollama not running | `models` returns `[]`, all Ollama rules are rejected, cloud tier takes over |
+ | `vm_stat` fails | `available_memory_mb` returns `nil`, memory checks bypassed (permissive) |
+ | `/api/tags` timeout (2s) | Returns stale cache if available, empty array on first call |
+ | Discovery disabled | All checks bypassed, rules pass through as before |
+ | Unknown platform | `available_memory_mb` returns `nil`, memory checks bypassed |
+
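The timeout row ("stale cache if available, empty array on first call") can be illustrated with a small wrapper around the fetch. The class name `StaleOnError` and the block-based fetcher are assumptions made for this sketch; the real module would wrap its Faraday request, and which exceptions it rescues is not specified in the design.

```ruby
require 'timeout'

# Hypothetical wrapper (illustrative, not the legion-llm API): returns fresh
# data when the fetch succeeds, otherwise the last good result, otherwise [].
class StaleOnError
  def initialize(&fetcher)
    @fetcher = fetcher
    @last_good = nil
  end

  def models
    # The assignment only happens when the fetcher returns normally.
    @last_good = @fetcher.call
  rescue StandardError
    @last_good || []
  end
end

# Simulated sequence: timeout, success, timeout.
responses = [
  -> { raise Timeout::Error, 'timed out' },
  -> { ['llama3.1:8b'] },
  -> { raise Timeout::Error, 'timed out' }
]
source = StaleOnError.new { responses.shift.call }

first = source.models   # no cache yet, falls back to an empty array
second = source.models  # fresh result, remembered as last good value
third = source.models   # fetch fails again, stale value is served
```

Serving stale data on failure keeps a brief Ollama hiccup from evicting every local rule, at the cost of possibly routing to a model that was removed in the meantime.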
+ ### File Layout
+
+ ```
+ lib/legion/llm/discovery/ollama.rb
+ lib/legion/llm/discovery/system.rb
+ spec/legion/llm/discovery/ollama_spec.rb
+ spec/legion/llm/discovery/system_spec.rb
+ ```
+
+ ### Dependencies
+
+ No new gem dependencies. Uses:
+ - `Faraday` (transitive via ruby_llm) for Ollama HTTP
+ - Shell commands (`sysctl`, `vm_stat`) for macOS memory
+ - File reads (`/proc/meminfo`) for Linux memory
+
+ ## Out of Scope
+
+ - Auto-pulling models (explicit operator action only)
+ - GPU utilization monitoring (future HealthTracker signal)
+ - Fleet tier discovery (Phase 2)
+ - Disk space checks for model storage