legion-llm 0.3.1

data/README.md ADDED
@@ -0,0 +1,615 @@
1
+ # Legion LLM
2
+
3
+ LLM integration for the [LegionIO](https://github.com/LegionIO/LegionIO) framework. Wraps [ruby_llm](https://github.com/crmne/ruby_llm) to provide chat, embeddings, tool use, and agent capabilities to any Legion extension.
4
+
5
+ ## Installation
6
+
7
+ ```ruby
8
+ gem 'legion-llm'
9
+ ```
10
+
11
+ Add this line to your Gemfile and run `bundle install`.
12
+
13
+ ## Configuration
14
+
15
+ Add to your LegionIO settings directory:
16
+
17
+ ```json
18
+ {
19
+ "llm": {
20
+ "default_model": "us.anthropic.claude-sonnet-4-6-v1",
21
+ "default_provider": "bedrock",
22
+ "providers": {
23
+ "bedrock": {
24
+ "enabled": true,
25
+ "region": "us-east-2",
26
+ "vault_path": "legion/bedrock"
27
+ },
28
+ "anthropic": {
29
+ "enabled": false,
30
+ "vault_path": "legion/anthropic"
31
+ },
32
+ "openai": {
33
+ "enabled": false
34
+ },
35
+ "ollama": {
36
+ "enabled": false,
37
+ "base_url": "http://localhost:11434"
38
+ }
39
+ }
40
+ }
41
+ }
42
+ ```
43
+
44
+ Credentials are resolved from Vault automatically when `vault_path` is set and Legion::Crypt is connected.
45
+
46
+ ### Provider Configuration
47
+
48
+ Each provider supports these common fields:
49
+
50
+ | Field | Type | Description |
51
+ |-------|------|-------------|
52
+ | `enabled` | Boolean | Enable this provider (default: `false`) |
53
+ | `api_key` | String | API key (resolved from Vault if `vault_path` set) |
54
+ | `vault_path` | String | Vault secret path for credential resolution |
55
+
56
+ Provider-specific fields:
57
+
58
+ | Provider | Additional Fields |
59
+ |----------|------------------|
60
+ | **Bedrock** | `secret_key`, `session_token`, `region` (default: `us-east-2`), `bearer_token` (alternative to SigV4 — for AWS Identity Center/SSO) |
61
+ | **Ollama** | `base_url` (default: `http://localhost:11434`) |
62
+
63
+ ### Vault Credential Resolution
64
+
65
+ When `vault_path` is set and `Legion::Crypt::Vault` is connected, credentials are fetched from Vault at startup. The secret keys map to provider fields automatically:
66
+
67
+ | Provider | Vault Key | Maps To |
68
+ |----------|-----------|---------|
69
+ | Bedrock | `access_key` / `aws_access_key_id` | `api_key` |
70
+ | Bedrock | `secret_key` / `aws_secret_access_key` | `secret_key` |
71
+ | Bedrock | `session_token` / `aws_session_token` | `session_token` |
72
+ | Bedrock | `bearer_token` / `aws_bearer_token` | `bearer_token` (Identity Center/SSO) |
73
+ | Anthropic / OpenAI / Gemini | `api_key` / `token` | `api_key` |
74
+
75
+ Direct configuration (setting `api_key` in settings) takes precedence over Vault-resolved values.
76
+
77
+ ### Auto-Detection
78
+
79
+ If no `default_model` or `default_provider` is set, legion-llm auto-detects from the first enabled provider in priority order:
80
+
81
+ | Priority | Provider | Default Model |
82
+ |----------|----------|---------------|
83
+ | 1 | Bedrock | `us.anthropic.claude-sonnet-4-6-v1` |
84
+ | 2 | Anthropic | `claude-sonnet-4-6` |
85
+ | 3 | OpenAI | `gpt-4o` |
86
+ | 4 | Gemini | `gemini-2.0-flash` |
87
+ | 5 | Ollama | `llama3` |
88
+
89
+ ## Core API
90
+
91
+ ### Lifecycle
92
+
93
+ ```ruby
94
+ Legion::LLM.start # Configure providers, resolve Vault credentials, warm discovery caches, set defaults, ping provider
95
+ Legion::LLM.shutdown # Mark disconnected, clean up
96
+ Legion::LLM.started? # -> Boolean
97
+ Legion::LLM.settings # -> Hash (current LLM settings)
98
+ ```
99
+
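+ In a standalone script (a `Legion::Service` normally calls `setup_llm` for you during startup), the lifecycle can be driven by hand; a minimal sketch:
+
+ ```ruby
+ # Minimal sketch: drive the lifecycle manually outside a Legion::Service
+ Legion::LLM.start
+
+ if Legion::LLM.started?
+   response = Legion::LLM.chat.ask("ping")
+   puts response.content
+ end
+
+ Legion::LLM.shutdown
+ ```
+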
100
+ ### Chat
101
+
102
+ `Legion::LLM.chat` returns a `RubyLLM::Chat` instance for multi-turn conversation:
103
+
104
+ ```ruby
105
+ # Use configured defaults
106
+ chat = Legion::LLM.chat
107
+ response = chat.ask("What is the capital of France?")
108
+ puts response.content
109
+
110
+ # Override model/provider per call
111
+ chat = Legion::LLM.chat(model: 'gpt-4o', provider: :openai)
112
+
113
+ # Multi-turn conversation
114
+ chat = Legion::LLM.chat
115
+ chat.ask("Remember: my name is Matt")
116
+ chat.ask("What's my name?") # -> "Matt"
117
+ ```
118
+
119
+ ### Embeddings
120
+
121
+ ```ruby
122
+ embedding = Legion::LLM.embed("some text to embed")
123
+ embedding.vectors # -> Array of floats
124
+
125
+ # Specific model
126
+ embedding = Legion::LLM.embed("text", model: "text-embedding-3-small")
127
+ ```
128
+
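+ The returned vectors are plain Ruby floats, so you can compare texts without extra dependencies. A minimal sketch of cosine similarity between two embeddings (the `cosine` helper below is illustrative, not part of the gem):
+
+ ```ruby
+ # Illustrative helper (not part of legion-llm): cosine similarity of two vectors
+ def cosine(a, b)
+   dot = a.zip(b).sum { |x, y| x * y }
+   dot / (Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x }))
+ end
+
+ v1 = Legion::LLM.embed("database connection pooling").vectors
+ v2 = Legion::LLM.embed("reusing open database connections").vectors
+ cosine(v1, v2) # closer to 1.0 for semantically similar texts
+ ```
+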
129
+ ### Tool Use
130
+
131
+ Define tools as Ruby classes and attach them to a chat session. RubyLLM handles the tool-use loop automatically — when the model calls a tool, ruby_llm executes it and feeds the result back:
132
+
133
+ ```ruby
134
+ class WeatherLookup < RubyLLM::Tool
135
+ description "Look up current weather for a location"
136
+
137
+ param :location, desc: "City name or zip code"
138
+ param :units, desc: "celsius or fahrenheit", required: false
139
+
140
+ def execute(location:, units: "fahrenheit")
141
+ # Your weather API call here
142
+ { temperature: 72, conditions: "sunny", location: location }
143
+ end
144
+ end
145
+
146
+ chat = Legion::LLM.chat
147
+ chat.with_tools(WeatherLookup)
148
+ response = chat.ask("What's the weather in Minneapolis?")
149
+ # Model calls WeatherLookup, gets result, responds with natural language
150
+ ```
151
+
152
+ ### Structured Output
153
+
154
+ Use `RubyLLM::Schema` to get typed, validated responses:
155
+
156
+ ```ruby
157
+ class SentimentResult < RubyLLM::Schema
158
+ string :sentiment, enum: %w[positive negative neutral]
159
+ number :confidence
160
+ string :reasoning
161
+ end
162
+
163
+ chat = Legion::LLM.chat
164
+ result = chat.with_output_schema(SentimentResult).ask("Analyze: 'I love this product!'")
165
+ result.sentiment # -> "positive"
166
+ result.confidence # -> 0.95
167
+ result.reasoning # -> "Strong positive language..."
168
+ ```
169
+
170
+ ### Agents
171
+
172
+ Define reusable agents as `RubyLLM::Agent` subclasses with declarative configuration:
173
+
174
+ ```ruby
175
+ class CodeReviewer < RubyLLM::Agent
176
+ model "us.anthropic.claude-sonnet-4-6-v1", provider: :bedrock
177
+ instructions "You review code for bugs, security issues, and style"
178
+ tools CodeAnalyzer, SecurityScanner
179
+ temperature 0.1
180
+
181
+ schema do
182
+ string :verdict, enum: %w[approve request_changes]
183
+ array :issues do
184
+ string
185
+ end
186
+ end
187
+ end
188
+
189
+ reviewer = Legion::LLM.agent(CodeReviewer)
190
+ result = reviewer.ask(diff_content)
191
+ result.verdict # -> "approve" or "request_changes"
192
+ result.issues # -> ["Line 42: potential SQL injection", ...]
193
+ ```
194
+
195
+ ## Usage in Extensions
196
+
197
+ Any LEX extension can use LLM capabilities. The gem provides helper methods that are auto-loaded when legion-llm is present.
198
+
199
+ ### Basic Extension Usage
200
+
201
+ ```ruby
202
+ module Legion::Extensions::MyLex::Runners
203
+ module Analyzer
204
+ def analyze(text:, **_opts)
205
+ chat = Legion::LLM.chat
206
+ response = chat.ask("Analyze this: #{text}")
207
+ { analysis: response.content }
208
+ end
209
+ end
210
+ end
211
+ ```
212
+
213
+ ### Declaring LLM as Required
214
+
215
+ Extensions that cannot function without LLM should declare the dependency. Legion will skip loading the extension if LLM is not available:
216
+
217
+ ```ruby
218
+ module Legion::Extensions::MyLex
219
+ def self.llm_required?
220
+ true
221
+ end
222
+ end
223
+ ```
224
+
225
+ ### Helper Methods
226
+
227
+ Include the LLM helper for convenience methods in any runner:
228
+
229
+ ```ruby
230
+ # One-shot chat (returns RubyLLM::Response)
231
+ result = llm_chat("Summarize this text", instructions: "Be concise")
232
+
233
+ # Chat with tools
234
+ result = llm_chat("Check the weather", tools: [WeatherLookup])
235
+
236
+ # With prompt compression (reduces input tokens for cost/speed)
237
+ result = llm_chat("Summarize the data", instructions: "Be concise", compress: 2)
238
+
239
+ # Embeddings
240
+ embedding = llm_embed("some text to embed")
241
+
242
+ # Multi-turn session (returns RubyLLM::Chat for continued conversation)
243
+ session = llm_session
244
+ session.with_instructions("You are a code reviewer")
245
+ session.with_tools(CodeAnalyzer, SecurityScanner)
246
+ response = session.ask("Review this PR: #{diff}")
247
+ ```
248
+
249
+ ### Routing
250
+
251
+ legion-llm includes a dynamic weighted routing engine that dispatches requests across local, fleet, and cloud tiers based on caller intent, priority rules, time schedules, cost multipliers, and real-time provider health. Routing is **disabled by default** — opt in via settings.
252
+
253
+ #### Three Tiers
254
+
255
+ ```
256
+ ┌─────────────────────────────────────────────────────────┐
257
+ │ Legion::LLM Router (per-node) │
258
+ │ │
259
+ │ Tier 1: LOCAL → Ollama on this machine (direct HTTP) │
260
+ │ Zero network overhead, no Transport │
261
+ │ │
262
+ │ Tier 2: FLEET → Ollama on Mac Studios / GPU servers │
263
+ │ Via Legion::Transport (AMQP) when local can't │
264
+ │ serve the model (Phase 2, not yet built) │
265
+ │ │
266
+ │ Tier 3: CLOUD → Bedrock / Anthropic / OpenAI / Gemini │
267
+ │ Existing provider API calls │
268
+ └─────────────────────────────────────────────────────────┘
269
+ ```
270
+
271
+ | Tier | Target | Use Case |
272
+ |------|--------|----------|
273
+ | `local` | Ollama on localhost | Privacy-sensitive, offline, or low-latency workloads |
274
+ | `fleet` | Shared hardware via Legion::Transport | Larger models on dedicated GPU servers (Phase 2) |
275
+ | `cloud` | API providers (Bedrock, Anthropic, OpenAI, Gemini) | Frontier models, full-capability inference |
276
+
277
+ #### Intent-Based Dispatch
278
+
279
+ Pass an `intent:` hash to route based on privacy, capability, or cost requirements:
280
+
281
+ ```ruby
282
+ # Route to local tier for strict privacy
283
+ result = llm_chat("Summarize this PII data", intent: { privacy: :strict })
284
+
285
+ # Route to cloud for reasoning tasks
286
+ result = llm_chat("Solve this proof", intent: { capability: :reasoning })
287
+
288
+ # Minimize cost — prefers local/fleet over cloud
289
+ result = llm_chat("Translate this", intent: { cost: :minimize })
290
+
291
+ # Explicit tier override (bypasses rules)
292
+ result = llm_chat("Translate this", tier: :cloud, model: "claude-sonnet-4-6")
293
+ ```
294
+
295
+ The same parameters work on `Legion::LLM.chat` and `llm_session`:
296
+
297
+ ```ruby
298
+ chat = Legion::LLM.chat(intent: { privacy: :strict, capability: :basic })
299
+ session = llm_session(tier: :local)
300
+ ```
301
+
302
+ #### Intent Dimensions
303
+
304
+ | Dimension | Values | Default | Effect |
305
+ |-----------|--------|---------|--------|
306
+ | `privacy` | `:strict`, `:normal` | `:normal` | `:strict` -> never cloud (via constraint rules) |
307
+ | `capability` | `:basic`, `:moderate`, `:reasoning` | `:moderate` | Higher prefers larger/cloud models |
308
+ | `cost` | `:minimize`, `:normal` | `:normal` | `:minimize` prefers local/fleet |
309
+
310
+ #### Routing Resolution
311
+
312
+ ```
313
+ 1. Caller passes intent: { privacy: :strict, capability: :basic }
314
+ 2. Router merges with default_intent (fills missing dimensions)
315
+ 3. Load rules from settings, filter by:
316
+ a. Intent match (all `when` conditions must match)
317
+ b. Schedule window (valid_from/valid_until, hours, days)
318
+ c. Constraints (e.g., never_cloud strips cloud-tier rules)
319
+ d. Discovery (Ollama model pulled? Model fits in available RAM?)
320
+ e. Tier availability (is Ollama running? is Transport loaded?)
321
+ 4. Score remaining candidates:
322
+ effective_priority = rule.priority
323
+ + health_tracker.adjustment(provider)
324
+ + (1.0 - cost_multiplier) * 10
325
+ 5. Return Resolution for highest-scoring candidate
326
+ ```
327
+
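+ As a worked example of step 4, assume an intent that matches both cloud rules from the settings example below and a health adjustment of 0 for each provider:
+
+ ```ruby
+ # Hypothetical numbers taken from the rules in the settings example below
+ reasoning_cloud = 50 + 0 + (1.0 - 1.0) * 10 # => 50.0
+ anthropic_promo = 60 + 0 + (1.0 - 0.5) * 10 # => 65.0
+ # anthropic_promo wins while its schedule window is active
+ ```
+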
328
+ #### Settings
329
+
330
+ Add routing configuration under the `llm` key:
331
+
332
+ ```json
333
+ {
334
+ "llm": {
335
+ "routing": {
336
+ "enabled": true,
337
+ "default_intent": { "privacy": "normal", "capability": "moderate", "cost": "normal" },
338
+ "tiers": {
339
+ "local": { "provider": "ollama" },
340
+ "fleet": { "queue": "llm.inference", "timeout_seconds": 30 },
341
+ "cloud": { "providers": ["bedrock", "anthropic"] }
342
+ },
343
+ "health": {
344
+ "window_seconds": 300,
345
+ "circuit_breaker": { "failure_threshold": 3, "cooldown_seconds": 60 },
346
+ "latency_penalty_threshold_ms": 5000
347
+ },
348
+ "rules": [
349
+ {
350
+ "name": "privacy_local",
351
+ "when": { "privacy": "strict" },
352
+ "then": { "tier": "local", "provider": "ollama", "model": "llama3" },
353
+ "priority": 100,
354
+ "constraint": "never_cloud"
355
+ },
356
+ {
357
+ "name": "reasoning_cloud",
358
+ "when": { "capability": "reasoning" },
359
+ "then": { "tier": "cloud", "provider": "bedrock", "model": "us.anthropic.claude-sonnet-4-6-v1" },
360
+ "priority": 50,
361
+ "cost_multiplier": 1.0
362
+ },
363
+ {
364
+ "name": "anthropic_promo",
365
+ "when": { "cost": "normal" },
366
+ "then": { "tier": "cloud", "provider": "anthropic", "model": "claude-sonnet-4-6" },
367
+ "priority": 60,
368
+ "cost_multiplier": 0.5,
369
+ "schedule": {
370
+ "valid_from": "2026-03-15T00:00:00",
371
+ "valid_until": "2026-03-29T23:59:59",
372
+ "hours": ["00:00-06:00", "18:00-23:59"]
373
+ },
374
+ "note": "Double token promotion — off-peak hours only"
375
+ }
376
+ ]
377
+ }
378
+ }
379
+ }
380
+ ```
381
+
382
+ #### Routing Rules
383
+
384
+ Each rule is a hash with:
385
+
386
+ | Field | Type | Required | Description |
387
+ |-------|------|----------|-------------|
388
+ | `name` | String | Yes | Unique rule identifier |
389
+ | `when` | Hash | Yes | Intent conditions to match (`privacy`, `capability`, `cost`) |
390
+ | `then` | Hash | No | Target: `{ tier:, provider:, model: }` |
391
+ | `priority` | Integer | No (default 0) | Higher wins when multiple rules match |
392
+ | `constraint` | String | No | Hard constraint (e.g., `never_cloud`) |
393
+ | `fallback` | String | No | Fallback tier if primary is unavailable |
394
+ | `cost_multiplier` | Float | No (default 1.0) | Lower = cheaper = routing bonus |
395
+ | `schedule` | Hash | No | Time-based activation window |
396
+ | `note` | String | No | Human-readable note |
397
+
398
+ #### Health Tracking
399
+
400
+ The `HealthTracker` adjusts effective priorities at runtime based on provider health signals:
401
+
402
+ - **Circuit breaker**: After consecutive failures, a provider's circuit opens (penalty: -50) then transitions to half_open (penalty: -25) after a cooldown period
403
+ - **Latency penalty**: Rolling window tracks average latency; providers above threshold receive priority penalties
404
+ - **Pluggable signals**: Any LEX can feed custom signals (e.g., GPU utilization, budget tracking) via `register_handler`
405
+
406
+ ```ruby
407
+ # Report signals (typically called by LEX extensions)
408
+ tracker = Legion::LLM::Router.health_tracker
409
+ tracker.report(provider: :anthropic, signal: :error, value: 1)
410
+ tracker.report(provider: :ollama, signal: :latency, value: 1200)
411
+
412
+ # Check state
413
+ tracker.circuit_state(:anthropic) # -> :closed, :open, or :half_open
414
+ tracker.adjustment(:anthropic) # -> Integer (priority offset)
415
+
416
+ # Add custom signal handler
417
+ tracker.register_handler(:gpu_utilization) { |data| ... }
418
+ ```
419
+
420
+ When routing is disabled (the default), `chat`, `llm_chat`, and `llm_session` behave exactly as before — no behavior change until you opt in.
421
+
422
+ #### Local Model Discovery
423
+
424
+ When the Ollama provider is enabled, legion-llm discovers which models are actually pulled and checks available system memory before routing to local models. This prevents the router from selecting models that aren't installed or that won't fit in RAM.
425
+
426
+ Discovery uses lazy TTL-based caching (default: 60 seconds). At startup, caches are warmed and logged:
427
+
428
+ ```
429
+ Ollama: 3 models available (llama3.1:8b, qwen2.5:32b, nomic-embed-text)
430
+ System: 65536 MB total, 42000 MB available
431
+ ```
432
+
433
+ Configure under `discovery`:
434
+
435
+ ```json
436
+ {
437
+ "llm": {
438
+ "discovery": {
439
+ "enabled": true,
440
+ "refresh_seconds": 60,
441
+ "memory_floor_mb": 2048
442
+ }
443
+ }
444
+ }
445
+ ```
446
+
447
+ | Key | Type | Default | Description |
448
+ |-----|------|---------|-------------|
449
+ | `enabled` | Boolean | `true` | Master switch for discovery checks |
450
+ | `refresh_seconds` | Integer | `60` | TTL for discovery caches |
451
+ | `memory_floor_mb` | Integer | `2048` | Minimum free MB to reserve for OS |
452
+
453
+ When a routing rule targets a local Ollama model that isn't pulled or won't fit in available memory (minus `memory_floor_mb`), the rule is silently skipped and the next-best candidate is used. If discovery itself fails (Ollama not running, unknown OS), the checks fail open and routing proceeds without them.
454
+
455
+ ### Model Escalation
456
+
457
+ When an LLM call fails (API error, timeout, or quality issue), the escalation system automatically retries with more capable models. If all attempts fail, `Legion::LLM::EscalationExhausted` is raised.
458
+
459
+ ```ruby
460
+ # Enable escalation and ask in one call
461
+ response = Legion::LLM.chat(
462
+ message: "Generate a SQL query for user analytics",
463
+ escalate: true,
464
+ max_escalations: 3,
465
+ quality_check: ->(r) { r.content.include?('SELECT') }
466
+ )
467
+
468
+ # Inspect the escalation outcome
469
+ response.escalated? # => true if >1 attempt was made
470
+ response.escalation_history # => [{model:, provider:, tier:, outcome:, failures:, duration_ms:}, ...]
471
+ response.final_resolution # => Resolution that succeeded
472
+ response.escalation_chain # => EscalationChain used for this call
473
+ ```
474
+
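+ If every attempt in the chain fails, `Legion::LLM::EscalationExhausted` is raised. A minimal sketch of rescuing it so the caller can degrade gracefully:
+
+ ```ruby
+ begin
+   response = Legion::LLM.chat(
+     message: "Generate a SQL query for user analytics",
+     escalate: true,
+     max_escalations: 3
+   )
+ rescue Legion::LLM::EscalationExhausted
+   # Every model in the chain failed; fall back instead of crashing the runner
+   response = nil
+ end
+ ```
+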
477
+ Configure globally in settings:
478
+
479
+ ```yaml
480
+ llm:
481
+ routing:
482
+ escalation:
483
+ enabled: true
484
+ max_attempts: 3
485
+ quality_threshold: 50
486
+ ```
487
+
488
+ ### Prompt Compression
489
+
490
+ `Legion::LLM::Compressor` strips low-signal words from prompts before sending to the API, reducing input token count and cost. Compression is deterministic (same input always produces the same output), preserving prompt caching compatibility.
491
+
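+ Because compression is deterministic, the same prompt always compresses to the same bytes; a quick check using the documented `compress` API:
+
+ ```ruby
+ prompt = "Please note that the results are very important"
+ a = Legion::LLM::Compressor.compress(prompt, level: 2)
+ b = Legion::LLM::Compressor.compress(prompt, level: 2)
+ a == b # => true, so provider-side prompt caches still hit
+ ```
+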
492
+ #### Levels
493
+
494
+ | Level | Name | What It Removes |
495
+ |-------|------|-----------------|
496
+ | 0 | None | Nothing |
497
+ | 1 | Light | Articles (a, an, the), filler adverbs (just, very, really, basically, ...) |
498
+ | 2 | Moderate | + sentence connectives (however, moreover, furthermore, ...) |
499
+ | 3 | Aggressive | + low-signal words (also, then, please, note, that, ...) + whitespace normalization |
500
+
501
+ Code blocks (fenced and inline) are never modified. Negation words are never removed.
502
+
503
+ #### Usage
504
+
505
+ ```ruby
506
+ # Direct API
507
+ text = Legion::LLM::Compressor.compress("The very important system prompt", level: 2)
508
+
509
+ # Via llm_chat helper (compresses both message and instructions)
510
+ result = llm_chat("Analyze the data", instructions: "Be very concise", compress: 2)
511
+ ```
512
+
513
+ #### Router Integration
514
+
515
+ Routing rules can specify `compress_level` in their target to auto-compress for cost-sensitive tiers:
516
+
517
+ ```json
518
+ {
519
+ "name": "cloud_compressed",
520
+ "priority": 50,
521
+ "when": { "capability": "chat" },
522
+ "then": { "tier": "cloud", "provider": "bedrock", "model": "claude-sonnet-4-6", "compress_level": 2 }
523
+ }
524
+ ```
525
+
526
+ ### Building an LLM-Powered LEX
527
+
528
+ A complete example of a LEX extension that uses LLM for intelligent processing:
529
+
530
+ ```ruby
531
+ # lib/legion/extensions/smart_alerts/runners/evaluate.rb
532
+ module Legion::Extensions::SmartAlerts::Runners
533
+ module Evaluate
534
+ def evaluate(alert_data:, **_opts)
535
+ session = llm_session(model: 'us.anthropic.claude-sonnet-4-6-v1')
536
+ session.with_instructions(<<~PROMPT)
537
+ You are an alert triage system. Given alert data, determine:
538
+ 1. Severity (critical, warning, info)
539
+ 2. Whether it requires immediate human attention
540
+ 3. Suggested remediation steps
541
+ PROMPT
542
+
543
+ result = session.ask("Evaluate this alert: #{alert_data.to_json}")
544
+
545
+ {
546
+ evaluation: result.content,
547
+ timestamp: Time.now.utc,
548
+ model: 'us.anthropic.claude-sonnet-4-6-v1'
549
+ }
550
+ end
551
+ end
552
+ end
553
+ ```
554
+
555
+ ## Providers
556
+
557
+ | Provider | Config Key | Credential Source | Notes |
558
+ |----------|-----------|-------------------|-------|
559
+ | AWS Bedrock | `bedrock` | Vault (`access_key`, `secret_key`) or direct | Default region: us-east-2 |
560
+ | Anthropic | `anthropic` | Vault (`api_key`) or direct | Direct API access |
561
+ | OpenAI | `openai` | Vault (`api_key`) or direct | GPT models |
562
+ | Google Gemini | `gemini` | Vault (`api_key`) or direct | Gemini models |
563
+ | Ollama | `ollama` | Local, no credentials needed | Local inference |
564
+
565
+ ## Integration with LegionIO
566
+
567
+ legion-llm follows the standard core gem lifecycle:
568
+
569
+ ```
570
+ Legion::Service#initialize
571
+ ...
572
+ setup_data # Legion::Data
573
+ setup_llm # Legion::LLM <-- here
574
+ setup_supervision # Legion::Supervision
575
+ load_extensions # LEX extensions (can use LLM if available)
576
+ ```
577
+
578
+ - **Service**: `setup_llm` called between data and supervision in startup sequence
579
+ - **Extensions**: `llm_required?` method on extension module, checked at load time
580
+ - **Helpers**: `Legion::Extensions::Helpers::LLM` auto-loaded when gem is present
581
+ - **Readiness**: Registers as `:llm` in `Legion::Readiness`
582
+ - **Shutdown**: `Legion::LLM.shutdown` called during service shutdown (reverse order)
583
+
584
+ ## Development
585
+
586
+ ```bash
587
+ git clone https://github.com/LegionIO/legion-llm.git
588
+ cd legion-llm
589
+ bundle install
590
+ bundle exec rspec
591
+ ```
592
+
593
+ ### Running Tests
594
+
595
+ Tests use stubbed `Legion::Logging` and `Legion::Settings` modules (no need for the full LegionIO stack):
596
+
597
+ ```bash
598
+ bundle exec rspec # Run all 269 tests
599
+ bundle exec rubocop # Lint (0 offenses)
600
+ bundle exec rspec spec/legion/llm_spec.rb # Run specific test file
601
+ bundle exec rspec spec/legion/llm/router_spec.rb # Router tests only
602
+ ```
603
+
604
+ ## Dependencies
605
+
606
+ | Gem | Purpose |
607
+ |-----|---------|
608
+ | `ruby_llm` (>= 1.0) | Multi-provider LLM client |
609
+ | `tzinfo` (>= 2.0) | IANA timezone conversion for schedule windows |
610
+ | `legion-logging` | Logging |
611
+ | `legion-settings` | Configuration |
612
+
613
+ ## License
614
+
615
+ Apache-2.0