legion-llm 0.3.1 → 0.3.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 9a39a5fe8483ddcd715dafd4d65dfe1f4457b90e5a39e62cfa2a32b6c68c8e0c
- data.tar.gz: 37ed9c3b024a1cb9cce7eed1e287512c91269ef99fbb7b54342de57ceae9a668
+ metadata.gz: 45ff0d9cdd07ee541c80dbac46e66d37542af95a96a614e31fc4af2c2bdf7833
+ data.tar.gz: 1562706b98e0e6e301a76dec373cecbcf99a3eb3a8e74a0d258928fbbaaf5be7
  SHA512:
- metadata.gz: a19943d8d25665e16ae55dfe6c0e32bad0e834a3eed3c5e028c0c0db672d531ea21e6015e6137b1f7c0b57bb38e2677091cab7d48dc3a3169cf9273fe6e468e7
- data.tar.gz: b9bd3d4586e64b9f1866d7e276ecdb3969e2204c8428b37e08fd262ddaef77846c5d28f192174b2ed5787d1576431aae4ebe29e2087499f06e2d0f9393e293ff
+ metadata.gz: e28b6c1e39599d1ecd16a60ff67bf4c7625b2d3cdcb2406b465751640b13d44d8a16e593644a4ee30f0bdcaff577580a19e7577e48a7bb07dc3fcfaaf28b3de9
+ data.tar.gz: d8fe57b67e87f8035c9c78c9bc8489aea68d75d50e477ca0007cd471cd24254081cf6784d2350d128d721dbbf4e5332340ac8e5d8297e9feffb1efe826b87d11
data/CHANGELOG.md CHANGED
@@ -1,5 +1,14 @@
  # Legion LLM Changelog
 
+ ## [0.3.2] - 2026-03-16
+
+ ### Added
+ - `Legion::LLM::Embeddings` module — structured wrapper around RubyLLM.embed with `generate`, `generate_batch`, `default_model`
+ - `Legion::LLM::ShadowEval` module — parallel evaluation on cheaper model with configurable sample rate for quality comparison
+ - `Legion::LLM::StructuredOutput` module — JSON schema enforcement with native `response_format` for capable models and prompt-based fallback with retry logic
+ - `embed_batch` and `structured` convenience methods on `Legion::LLM`
+ - `Settings.dig` support in spec_helper for nested settings access in tests
+
  ## [0.3.1] - 2026-03-16
 
  ### Removed
data/CLAUDE.md CHANGED
@@ -38,6 +38,9 @@ Legion::LLM (lib/legion/llm.rb)
  │ └── System # Queries OS memory: macOS (vm_stat/sysctl), Linux (/proc/meminfo)
  ├── QualityChecker # Response quality heuristics (empty, too_short, repetition, json_parse, json_expected) + pluggable callable
  ├── EscalationHistory # Mixin for response objects: escalation_history, escalated?, final_resolution, escalation_chain
+ ├── Embeddings # Structured embedding wrapper: generate, generate_batch, default_model
+ ├── ShadowEval # Parallel shadow evaluation on cheaper models with sampling
+ ├── StructuredOutput # JSON schema enforcement with native response_format and prompt fallback
  ├── Router # Dynamic weighted routing engine
  │ ├── Resolution # Value object: tier, provider, model, rule name, metadata, compress_level
  │ ├── Rule # Routing rule: intent matching, schedule windows, constraints
@@ -278,7 +281,10 @@ In-memory signal consumer with pluggable handlers. Adjusts effective priorities
  | `lib/legion/llm/router/health_tracker.rb` | HealthTracker: circuit breaker, latency window, pluggable signal handlers |
  | `lib/legion/llm/discovery/ollama.rb` | Ollama /api/tags discovery with TTL cache |
  | `lib/legion/llm/discovery/system.rb` | OS memory introspection (macOS + Linux) with TTL cache |
- | `lib/legion/llm/version.rb` | Version constant (0.3.0) |
+ | `lib/legion/llm/embeddings.rb` | Embeddings module: generate, generate_batch, default_model |
+ | `lib/legion/llm/shadow_eval.rb` | Shadow evaluation: enabled?, should_sample?, evaluate, compare |
+ | `lib/legion/llm/structured_output.rb` | JSON schema enforcement with native response_format and prompt fallback |
+ | `lib/legion/llm/version.rb` | Version constant (0.3.2) |
  | `lib/legion/llm/quality_checker.rb` | QualityChecker module with QualityResult struct |
  | `lib/legion/llm/escalation_history.rb` | EscalationHistory mixin: `escalation_history`, `escalated?`, `final_resolution`, `escalation_chain` |
  | `lib/legion/llm/router/escalation_chain.rb` | EscalationChain value object |
@@ -306,6 +312,9 @@ In-memory signal consumer with pluggable handlers. Adjusts effective priorities
  | `spec/legion/llm/router/escalation_chain_spec.rb` | EscalationChain tests |
  | `spec/legion/llm/router/resolve_chain_spec.rb` | Router.resolve_chain tests |
  | `spec/legion/llm/transport/escalation_spec.rb` | Transport tests |
+ | `spec/legion/llm/embeddings_spec.rb` | Embeddings tests |
+ | `spec/legion/llm/shadow_eval_spec.rb` | ShadowEval tests |
+ | `spec/legion/llm/structured_output_spec.rb` | StructuredOutput tests |
  | `spec/spec_helper.rb` | Stubbed Legion::Logging and Legion::Settings for testing |
 
  ## Extension Integration
@@ -365,7 +374,7 @@ The legacy `vault_path` per-provider setting was removed in v0.3.1.
  Tests run without the full LegionIO stack. `spec/spec_helper.rb` stubs `Legion::Logging` and `Legion::Settings` with in-memory implementations. Each test resets settings to defaults via `before(:each)`.
 
  ```bash
- bundle exec rspec # 269 examples, 0 failures
+ bundle exec rspec # 287 examples, 0 failures
  bundle exec rubocop # 31 files, 0 offenses
  ```
 
data/README.md CHANGED
@@ -12,7 +12,7 @@ Or add to your Gemfile and `bundle install`.
 
  ## Configuration
 
- Add to your LegionIO settings directory:
+ Add to your LegionIO settings directory (e.g. `~/.legionio/settings/llm.json`):
 
  ```json
  {
@@ -23,14 +23,15 @@ Add to your LegionIO settings directory:
    "bedrock": {
      "enabled": true,
      "region": "us-east-2",
-     "vault_path": "legion/bedrock"
+     "bearer_token": ["vault://secret/data/llm/bedrock#bearer_token", "env://AWS_BEARER_TOKEN"]
    },
    "anthropic": {
      "enabled": false,
-     "vault_path": "legion/anthropic"
+     "api_key": "env://ANTHROPIC_API_KEY"
    },
    "openai": {
-     "enabled": false
+     "enabled": false,
+     "api_key": "env://OPENAI_API_KEY"
    },
    "ollama": {
      "enabled": false,
@@ -41,7 +42,7 @@ Add to your LegionIO settings directory:
  }
  ```
 
- Credentials are resolved from Vault automatically when `vault_path` is set and Legion::Crypt is connected.
+ Credentials are resolved automatically by the universal secret resolver in `legion-settings` (v1.3.0+). Use `vault://` URIs for Vault secrets, `env://` for environment variables, or plain strings for static values. Array values act as fallback chains — the first non-nil result wins.
 
  ### Provider Configuration
 
@@ -50,8 +51,7 @@ Each provider supports these common fields:
  | Field | Type | Description |
  |-------|------|-------------|
  | `enabled` | Boolean | Enable this provider (default: `false`) |
- | `api_key` | String | API key (resolved from Vault if `vault_path` set) |
- | `vault_path` | String | Vault secret path for credential resolution |
+ | `api_key` | String | API key (supports `vault://`, `env://`, or plain string) |
 
  Provider-specific fields:
 
@@ -60,19 +60,23 @@ Provider-specific fields:
  | **Bedrock** | `secret_key`, `session_token`, `region` (default: `us-east-2`), `bearer_token` (alternative to SigV4 — for AWS Identity Center/SSO) |
  | **Ollama** | `base_url` (default: `http://localhost:11434`) |
 
- ### Vault Credential Resolution
+ ### Credential Resolution
 
- When `vault_path` is set and `Legion::Crypt::Vault` is connected, credentials are fetched from Vault at startup. The secret keys map to provider fields automatically:
+ All credential fields support the universal `vault://` and `env://` URI schemes provided by `legion-settings`. Use array values for fallback chains:
 
- | Provider | Vault Key | Maps To |
- |----------|-----------|---------|
- | Bedrock | `access_key` / `aws_access_key_id` | `api_key` |
- | Bedrock | `secret_key` / `aws_secret_access_key` | `secret_key` |
- | Bedrock | `session_token` / `aws_session_token` | `session_token` |
- | Bedrock | `bearer_token` / `aws_bearer_token` | `bearer_token` (Identity Center/SSO) |
- | Anthropic / OpenAI / Gemini | `api_key` / `token` | `api_key` |
+ ```json
+ {
+   "bedrock": {
+     "enabled": true,
+     "api_key": ["vault://secret/data/llm/bedrock#access_key", "env://AWS_ACCESS_KEY_ID"],
+     "secret_key": ["vault://secret/data/llm/bedrock#secret_key", "env://AWS_SECRET_ACCESS_KEY"],
+     "bearer_token": ["vault://secret/data/llm/bedrock#bearer_token", "env://AWS_BEARER_TOKEN"],
+     "region": "us-east-2"
+   }
+ }
+ ```
 
- Direct configuration (setting `api_key` in settings) takes precedence over Vault-resolved values.
+ By the time `Legion::LLM.start` runs, all `vault://` and `env://` references have already been resolved to plain strings by `Legion::Settings.resolve_secrets!` (called in the boot sequence after `Legion::Crypt.start`). The `env://` scheme works even when Vault is not connected.
 
  ### Auto-Detection
 
@@ -91,7 +95,7 @@ If no `default_model` or `default_provider` is set, legion-llm auto-detects from
  ### Lifecycle
 
  ```ruby
- Legion::LLM.start # Configure providers, resolve Vault credentials, warm discovery caches, set defaults, ping provider
+ Legion::LLM.start # Configure providers, warm discovery caches, set defaults, ping provider
  Legion::LLM.shutdown # Mark disconnected, clean up
  Legion::LLM.started? # -> Boolean
  Legion::LLM.settings # -> Hash (current LLM settings)
@@ -556,10 +560,10 @@ end
 
  | Provider | Config Key | Credential Source | Notes |
  |----------|-----------|-------------------|-------|
- | AWS Bedrock | `bedrock` | Vault (`access_key`, `secret_key`) or direct | Default region: us-east-2 |
- | Anthropic | `anthropic` | Vault (`api_key`) or direct | Direct API access |
- | OpenAI | `openai` | Vault (`api_key`) or direct | GPT models |
- | Google Gemini | `gemini` | Vault (`api_key`) or direct | Gemini models |
+ | AWS Bedrock | `bedrock` | `vault://`, `env://`, or direct | Default region: us-east-2, SigV4 or Bearer Token auth |
+ | Anthropic | `anthropic` | `vault://`, `env://`, or direct | Direct API access |
+ | OpenAI | `openai` | `vault://`, `env://`, or direct | GPT models |
+ | Google Gemini | `gemini` | `vault://`, `env://`, or direct | Gemini models |
  | Ollama | `ollama` | Local, no credentials needed | Local inference |
 
  ## Integration with LegionIO
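The fallback-chain semantics described in the README above ("the first non-nil result wins") can be sketched as follows. This is an illustrative reimplementation, not the actual `legion-settings` resolver; the `vault:` and `env:` hashes stand in for a live Vault connection and `ENV`, and `resolve_secret` is a hypothetical helper name.

```ruby
# Hypothetical sketch of fallback-chain secret resolution.
# The real resolver lives in Legion::Settings.resolve_secrets!.
def resolve_secret(value, vault: {}, env: {})
  case value
  when Array
    # Fallback chain: try each entry in order, first non-nil result wins
    value.lazy.map { |v| resolve_secret(v, vault: vault, env: env) }.find { |r| !r.nil? }
  when %r{\Avault://(.+)\z}
    vault[Regexp.last_match(1)]   # look up the vault path (with #fragment key)
  when %r{\Aenv://(.+)\z}
    env[Regexp.last_match(1)]     # look up the environment variable
  else
    value                         # plain strings pass through unchanged
  end
end

token = resolve_secret(
  ['vault://secret/data/llm/bedrock#bearer_token', 'env://AWS_BEARER_TOKEN'],
  vault: {},                                   # Vault not connected
  env: { 'AWS_BEARER_TOKEN' => 'abc123' }      # env:// fallback still resolves
)
```

With Vault empty, the chain falls through to the `env://` entry, matching the README's note that `env://` works even when Vault is not connected.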
data/lib/legion/llm/embeddings.rb ADDED
@@ -0,0 +1,43 @@
+ # frozen_string_literal: true
+
+ module Legion
+   module LLM
+     module Embeddings
+       class << self
+         def generate(text:, model: nil, dimensions: nil)
+           model ||= default_model
+           opts = { model: model }
+           opts[:dimensions] = dimensions if dimensions
+
+           response = RubyLLM.embed(text, **opts)
+           {
+             vector: response.vectors.first,
+             model: model,
+             dimensions: response.vectors.first&.size || 0,
+             tokens: response.input_tokens
+           }
+         rescue StandardError => e
+           Legion::Logging.warn "Embedding failed: #{e.message}" if defined?(Legion::Logging)
+           { vector: nil, model: model, error: e.message }
+         end
+
+         def generate_batch(texts:, model: nil, dimensions: nil)
+           model ||= default_model
+           opts = { model: model }
+           opts[:dimensions] = dimensions if dimensions
+
+           response = RubyLLM.embed(texts, **opts)
+           response.vectors.each_with_index.map do |vec, i|
+             { vector: vec, model: model, dimensions: vec&.size || 0, index: i }
+           end
+         rescue StandardError => e
+           texts.map { |_| { vector: nil, model: model, error: e.message } }
+         end
+
+         def default_model
+           Legion::Settings.dig(:llm, :embeddings, :default_model) || 'text-embedding-3-small'
+         end
+       end
+     end
+   end
+   end
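`Embeddings.generate` swallows errors and returns a Hash either way, so callers branch on `:vector` rather than rescuing. A stubbed sketch of consuming that shape (the result Hash is hand-written here, no real RubyLLM call; the L2-normalization step is an illustrative use, not part of the gem):

```ruby
# Stubbed result in the success shape Embeddings.generate returns;
# on failure it would instead be { vector: nil, model: ..., error: "..." }.
result = { vector: [0.1, 0.2, 0.3], model: 'text-embedding-3-small', dimensions: 3, tokens: 5 }

if result[:vector]
  # e.g. L2-normalize before writing to a vector store
  norm = Math.sqrt(result[:vector].sum { |x| x * x })
  unit = result[:vector].map { |x| x / norm }
else
  warn "embedding failed: #{result[:error]}"
end
```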
data/lib/legion/llm/shadow_eval.rb ADDED
@@ -0,0 +1,49 @@
+ # frozen_string_literal: true
+
+ module Legion
+   module LLM
+     module ShadowEval
+       class << self
+         def enabled?
+           Legion::Settings.dig(:llm, :shadow, :enabled) == true
+         end
+
+         def should_sample?
+           return false unless enabled?
+
+           rate = Legion::Settings.dig(:llm, :shadow, :sample_rate) || 0.1
+           rand < rate
+         end
+
+         def evaluate(primary_response:, messages: nil, shadow_model: nil) # rubocop:disable Lint/UnusedMethodArgument
+           shadow_model ||= Legion::Settings.dig(:llm, :shadow, :model) || 'gpt-4o-mini'
+
+           shadow_response = Legion::LLM.send(:chat_single,
+                                              model: shadow_model, provider: nil,
+                                              intent: nil, tier: nil,
+                                              skip_shadow: true)
+
+           comparison = compare(primary_response, shadow_response, shadow_model)
+           Legion::Events.emit('llm.shadow_eval', comparison) if defined?(Legion::Events)
+           comparison
+         rescue StandardError => e
+           { error: e.message, shadow_model: shadow_model }
+         end
+
+         def compare(primary, shadow, shadow_model)
+           primary_len = primary[:content]&.length || 0
+           shadow_len = shadow[:content]&.length || 0
+
+           {
+             primary_model: primary[:model],
+             shadow_model: shadow_model,
+             primary_tokens: primary[:usage],
+             shadow_tokens: shadow[:usage],
+             length_ratio: primary_len.zero? ? 0.0 : shadow_len.to_f / primary_len,
+             evaluated_at: Time.now.utc
+           }
+         end
+       end
+     end
+   end
+   end
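`should_sample?` gates shadow evaluation with `rand < rate`, so roughly `sample_rate` of requests get a shadow run. A self-contained sketch of that gate (the `rng` parameter is added here for reproducibility; the module itself uses the global `rand`):

```ruby
# Probabilistic sampling gate, as in ShadowEval.should_sample?
def sample?(rate, rng)
  rng.rand < rate
end

rng = Random.new(42)                       # seeded so the experiment repeats exactly
hits = 10_000.times.count { sample?(0.1, rng) }
ratio = hits / 10_000.0                    # clusters tightly around the 0.1 sample rate
```

Over many requests the hit ratio converges on the configured rate, which keeps shadow-model spend proportional to `sample_rate`.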
data/lib/legion/llm/structured_output.rb ADDED
@@ -0,0 +1,74 @@
+ # frozen_string_literal: true
+
+ module Legion
+   module LLM
+     module StructuredOutput
+       SCHEMA_CAPABLE_MODELS = %w[gpt-4o gpt-4o-mini gpt-4-turbo claude-3-5-sonnet claude-4-sonnet claude-4-opus].freeze
+
+       class << self
+         def generate(messages:, schema:, model: nil, **)
+           model ||= Legion::LLM.settings[:default_model]
+           result = call_with_schema(messages, schema, model, **)
+
+           parsed = Legion::JSON.load(result[:content])
+           { data: parsed, raw: result[:content], model: result[:model], valid: true }
+         rescue ::JSON::ParserError => e
+           handle_parse_error(e, messages, schema, model, result, **)
+         end
+
+         private
+
+         def call_with_schema(messages, schema, model, **opts)
+           if supports_response_format?(model)
+             Legion::LLM.send(:chat_single,
+                              model: model, provider: nil, intent: nil, tier: nil,
+                              response_format: { type: 'json_schema',
+                                                 json_schema: { name: 'response', schema: schema } },
+                              **opts.except(:attempt))
+           else
+             instruction = "You MUST respond with valid JSON matching this schema:\n" \
+                           "```json\n#{Legion::JSON.dump(schema)}\n```\n" \
+                           'Respond with ONLY the JSON object, no other text.'
+             augmented = [{ role: 'system', content: instruction }] + Array(messages)
+             Legion::LLM.send(:chat_single,
+                              model: model, provider: nil, intent: nil, tier: nil,
+                              messages: augmented, **opts.except(:attempt))
+           end
+         end
+
+         def handle_parse_error(error, messages, schema, model, result, **opts)
+           if retry_enabled? && (opts[:attempt] || 0) < max_retries
+             retry_with_instruction(messages, schema, model, attempt: (opts[:attempt] || 0) + 1, **opts)
+           else
+             { data: nil, error: "JSON parse failed: #{error.message}", raw: result&.dig(:content), valid: false }
+           end
+         end
+
+         def retry_with_instruction(messages, schema, model, **opts)
+           instruction = "Your previous response was not valid JSON. Respond with ONLY a valid JSON object matching this schema:\n#{Legion::JSON.dump(schema)}"
+           augmented = Array(messages) + [{ role: 'user', content: instruction }]
+           result = Legion::LLM.send(:chat_single,
+                                     model: model, provider: nil, intent: nil, tier: nil,
+                                     messages: augmented, **opts.except(:attempt))
+
+           parsed = Legion::JSON.load(result[:content])
+           { data: parsed, raw: result[:content], model: result[:model], valid: true, retried: true }
+         rescue StandardError => e
+           { data: nil, error: e.message, valid: false }
+         end
+
+         def supports_response_format?(model)
+           SCHEMA_CAPABLE_MODELS.any? { |m| model.to_s.include?(m) }
+         end
+
+         def retry_enabled?
+           Legion::Settings.dig(:llm, :structured_output, :retry_on_parse_failure) != false
+         end
+
+         def max_retries
+           Legion::Settings.dig(:llm, :structured_output, :max_retries) || 2
+         end
+       end
+     end
+   end
+   end
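StructuredOutput decides between native `response_format` and the prompt-based fallback with a substring check against `SCHEMA_CAPABLE_MODELS`, so dated or tagged model IDs still match their family name. A minimal standalone sketch of that check (model list trimmed for the example):

```ruby
# Same substring check as StructuredOutput.supports_response_format?,
# with the capable-model list shortened for illustration.
SCHEMA_CAPABLE = %w[gpt-4o gpt-4-turbo claude-3-5-sonnet].freeze

def supports_response_format?(model)
  SCHEMA_CAPABLE.any? { |family| model.to_s.include?(family) }
end

supports_response_format?('gpt-4o-2024-08-06')  # versioned ID matches the "gpt-4o" family
supports_response_format?('llama3:8b')          # no match: routed to the prompt-based fallback
```

A side effect of substring matching is that any entry containing another (e.g. `gpt-4o` within `gpt-4o-mini`) makes the longer entry redundant, which is why the list can stay short.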
data/lib/legion/llm/version.rb CHANGED
@@ -2,6 +2,6 @@
 
  module Legion
    module LLM
-     VERSION = '0.3.1'
+     VERSION = '0.3.2'
    end
  end
data/lib/legion/llm.rb CHANGED
@@ -74,16 +74,30 @@ module Legion
  end
  end
 
- # Generate embeddings
+ # Generate embeddings via Embeddings module
  # @param text [String, Array<String>] text to embed
  # @param model [String] embedding model ID
- # @return [RubyLLM::Embedding]
- def embed(text, model: nil)
-   if model
-     RubyLLM.embed(text, model: model)
-   else
-     RubyLLM.embed(text)
-   end
+ # @return [Hash] { vector:, model:, dimensions:, tokens: }
+ def embed(text, **)
+   require 'legion/llm/embeddings'
+   Embeddings.generate(text: text, **)
+ end
+
+ # Batch embed multiple texts
+ # @param texts [Array<String>] texts to embed
+ # @return [Array<Hash>]
+ def embed_batch(texts, **)
+   require 'legion/llm/embeddings'
+   Embeddings.generate_batch(texts: texts, **)
+ end
+
+ # Generate structured JSON output from LLM
+ # @param messages [Array<Hash>] conversation messages
+ # @param schema [Hash] JSON schema to enforce
+ # @return [Hash] { data:, raw:, model:, valid: }
+ def structured(messages:, schema:, **)
+   require 'legion/llm/structured_output'
+   StructuredOutput.generate(messages: messages, schema: schema, **)
  end
 
  # Create a configured agent instance
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: legion-llm
  version: !ruby/object:Gem::Version
-   version: 0.3.1
+   version: 0.3.2
  platform: ruby
  authors:
  - Esity
@@ -92,6 +92,7 @@ files:
  - lib/legion/llm/compressor.rb
  - lib/legion/llm/discovery/ollama.rb
  - lib/legion/llm/discovery/system.rb
+ - lib/legion/llm/embeddings.rb
  - lib/legion/llm/escalation_history.rb
  - lib/legion/llm/helpers/llm.rb
  - lib/legion/llm/providers.rb
@@ -102,6 +103,8 @@ files:
  - lib/legion/llm/router/resolution.rb
  - lib/legion/llm/router/rule.rb
  - lib/legion/llm/settings.rb
+ - lib/legion/llm/shadow_eval.rb
+ - lib/legion/llm/structured_output.rb
  - lib/legion/llm/transport/exchanges/escalation.rb
  - lib/legion/llm/transport/messages/escalation_event.rb
  - lib/legion/llm/version.rb