legion-llm 0.3.1 → 0.3.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 9a39a5fe8483ddcd715dafd4d65dfe1f4457b90e5a39e62cfa2a32b6c68c8e0c
- data.tar.gz: 37ed9c3b024a1cb9cce7eed1e287512c91269ef99fbb7b54342de57ceae9a668
+ metadata.gz: 45ff0d9cdd07ee541c80dbac46e66d37542af95a96a614e31fc4af2c2bdf7833
+ data.tar.gz: 1562706b98e0e6e301a76dec373cecbcf99a3eb3a8e74a0d258928fbbaaf5be7
  SHA512:
- metadata.gz: a19943d8d25665e16ae55dfe6c0e32bad0e834a3eed3c5e028c0c0db672d531ea21e6015e6137b1f7c0b57bb38e2677091cab7d48dc3a3169cf9273fe6e468e7
- data.tar.gz: b9bd3d4586e64b9f1866d7e276ecdb3969e2204c8428b37e08fd262ddaef77846c5d28f192174b2ed5787d1576431aae4ebe29e2087499f06e2d0f9393e293ff
+ metadata.gz: e28b6c1e39599d1ecd16a60ff67bf4c7625b2d3cdcb2406b465751640b13d44d8a16e593644a4ee30f0bdcaff577580a19e7577e48a7bb07dc3fcfaaf28b3de9
+ data.tar.gz: d8fe57b67e87f8035c9c78c9bc8489aea68d75d50e477ca0007cd471cd24254081cf6784d2350d128d721dbbf4e5332340ac8e5d8297e9feffb1efe826b87d11
data/CHANGELOG.md CHANGED
@@ -1,5 +1,14 @@
  # Legion LLM Changelog
 
+ ## [0.3.2] - 2026-03-16
+
+ ### Added
+ - `Legion::LLM::Embeddings` module — structured wrapper around RubyLLM.embed with `generate`, `generate_batch`, `default_model`
+ - `Legion::LLM::ShadowEval` module — parallel evaluation on cheaper model with configurable sample rate for quality comparison
+ - `Legion::LLM::StructuredOutput` module — JSON schema enforcement with native `response_format` for capable models and prompt-based fallback with retry logic
+ - `embed_batch` and `structured` convenience methods on `Legion::LLM`
+ - `Settings.dig` support in spec_helper for nested settings access in tests
+
  ## [0.3.1] - 2026-03-16
 
  ### Removed
data/CLAUDE.md CHANGED
@@ -38,6 +38,9 @@ Legion::LLM (lib/legion/llm.rb)
  │ └── System # Queries OS memory: macOS (vm_stat/sysctl), Linux (/proc/meminfo)
  ├── QualityChecker # Response quality heuristics (empty, too_short, repetition, json_parse, json_expected) + pluggable callable
  ├── EscalationHistory # Mixin for response objects: escalation_history, escalated?, final_resolution, escalation_chain
+ ├── Embeddings # Structured embedding wrapper: generate, generate_batch, default_model
+ ├── ShadowEval # Parallel shadow evaluation on cheaper models with sampling
+ ├── StructuredOutput # JSON schema enforcement with native response_format and prompt fallback
  ├── Router # Dynamic weighted routing engine
  │ ├── Resolution # Value object: tier, provider, model, rule name, metadata, compress_level
  │ ├── Rule # Routing rule: intent matching, schedule windows, constraints
@@ -278,7 +281,10 @@ In-memory signal consumer with pluggable handlers. Adjusts effective priorities
  | `lib/legion/llm/router/health_tracker.rb` | HealthTracker: circuit breaker, latency window, pluggable signal handlers |
  | `lib/legion/llm/discovery/ollama.rb` | Ollama /api/tags discovery with TTL cache |
  | `lib/legion/llm/discovery/system.rb` | OS memory introspection (macOS + Linux) with TTL cache |
- | `lib/legion/llm/version.rb` | Version constant (0.3.0) |
+ | `lib/legion/llm/embeddings.rb` | Embeddings module: generate, generate_batch, default_model |
+ | `lib/legion/llm/shadow_eval.rb` | Shadow evaluation: enabled?, should_sample?, evaluate, compare |
+ | `lib/legion/llm/structured_output.rb` | JSON schema enforcement with native response_format and prompt fallback |
+ | `lib/legion/llm/version.rb` | Version constant (0.3.2) |
  | `lib/legion/llm/quality_checker.rb` | QualityChecker module with QualityResult struct |
  | `lib/legion/llm/escalation_history.rb` | EscalationHistory mixin: `escalation_history`, `escalated?`, `final_resolution`, `escalation_chain` |
  | `lib/legion/llm/router/escalation_chain.rb` | EscalationChain value object |
@@ -306,6 +312,9 @@ In-memory signal consumer with pluggable handlers. Adjusts effective priorities
  | `spec/legion/llm/router/escalation_chain_spec.rb` | EscalationChain tests |
  | `spec/legion/llm/router/resolve_chain_spec.rb` | Router.resolve_chain tests |
  | `spec/legion/llm/transport/escalation_spec.rb` | Transport tests |
+ | `spec/legion/llm/embeddings_spec.rb` | Embeddings tests |
+ | `spec/legion/llm/shadow_eval_spec.rb` | ShadowEval tests |
+ | `spec/legion/llm/structured_output_spec.rb` | StructuredOutput tests |
  | `spec/spec_helper.rb` | Stubbed Legion::Logging and Legion::Settings for testing |
 
  ## Extension Integration
@@ -365,7 +374,7 @@ The legacy `vault_path` per-provider setting was removed in v0.3.1.
  Tests run without the full LegionIO stack. `spec/spec_helper.rb` stubs `Legion::Logging` and `Legion::Settings` with in-memory implementations. Each test resets settings to defaults via `before(:each)`.
 
  ```bash
- bundle exec rspec # 269 examples, 0 failures
+ bundle exec rspec # 287 examples, 0 failures
  bundle exec rubocop # 31 files, 0 offenses
  ```
 
data/README.md CHANGED
@@ -12,7 +12,7 @@ Or add to your Gemfile and `bundle install`.
 
  ## Configuration
 
- Add to your LegionIO settings directory:
+ Add to your LegionIO settings directory (e.g. `~/.legionio/settings/llm.json`):
 
  ```json
  {
@@ -23,14 +23,15 @@ Add to your LegionIO settings directory:
    "bedrock": {
      "enabled": true,
      "region": "us-east-2",
-     "vault_path": "legion/bedrock"
+     "bearer_token": ["vault://secret/data/llm/bedrock#bearer_token", "env://AWS_BEARER_TOKEN"]
    },
    "anthropic": {
      "enabled": false,
-     "vault_path": "legion/anthropic"
+     "api_key": "env://ANTHROPIC_API_KEY"
    },
    "openai": {
-     "enabled": false
+     "enabled": false,
+     "api_key": "env://OPENAI_API_KEY"
    },
    "ollama": {
      "enabled": false,
@@ -41,7 +42,7 @@ Add to your LegionIO settings directory:
  }
  ```
 
- Credentials are resolved from Vault automatically when `vault_path` is set and Legion::Crypt is connected.
+ Credentials are resolved automatically by the universal secret resolver in `legion-settings` (v1.3.0+). Use `vault://` URIs for Vault secrets, `env://` for environment variables, or plain strings for static values. Array values act as fallback chains — the first non-nil result wins.
 
  ### Provider Configuration
 
@@ -50,8 +51,7 @@ Each provider supports these common fields:
  | Field | Type | Description |
  |-------|------|-------------|
  | `enabled` | Boolean | Enable this provider (default: `false`) |
- | `api_key` | String | API key (resolved from Vault if `vault_path` set) |
- | `vault_path` | String | Vault secret path for credential resolution |
+ | `api_key` | String | API key (supports `vault://`, `env://`, or plain string) |
 
  Provider-specific fields:
 
@@ -60,19 +60,23 @@ Provider-specific fields:
  | **Bedrock** | `secret_key`, `session_token`, `region` (default: `us-east-2`), `bearer_token` (alternative to SigV4 — for AWS Identity Center/SSO) |
  | **Ollama** | `base_url` (default: `http://localhost:11434`) |
 
- ### Vault Credential Resolution
+ ### Credential Resolution
 
- When `vault_path` is set and `Legion::Crypt::Vault` is connected, credentials are fetched from Vault at startup. The secret keys map to provider fields automatically:
+ All credential fields support the universal `vault://` and `env://` URI schemes provided by `legion-settings`. Use array values for fallback chains:
 
- | Provider | Vault Key | Maps To |
- |----------|-----------|---------|
- | Bedrock | `access_key` / `aws_access_key_id` | `api_key` |
- | Bedrock | `secret_key` / `aws_secret_access_key` | `secret_key` |
- | Bedrock | `session_token` / `aws_session_token` | `session_token` |
- | Bedrock | `bearer_token` / `aws_bearer_token` | `bearer_token` (Identity Center/SSO) |
- | Anthropic / OpenAI / Gemini | `api_key` / `token` | `api_key` |
+ ```json
+ {
+   "bedrock": {
+     "enabled": true,
+     "api_key": ["vault://secret/data/llm/bedrock#access_key", "env://AWS_ACCESS_KEY_ID"],
+     "secret_key": ["vault://secret/data/llm/bedrock#secret_key", "env://AWS_SECRET_ACCESS_KEY"],
+     "bearer_token": ["vault://secret/data/llm/bedrock#bearer_token", "env://AWS_BEARER_TOKEN"],
+     "region": "us-east-2"
+   }
+ }
+ ```
 
- Direct configuration (setting `api_key` in settings) takes precedence over Vault-resolved values.
+ By the time `Legion::LLM.start` runs, all `vault://` and `env://` references have already been resolved to plain strings by `Legion::Settings.resolve_secrets!` (called in the boot sequence after `Legion::Crypt.start`). The `env://` scheme works even when Vault is not connected.
 
  ### Auto-Detection
 
@@ -91,7 +95,7 @@ If no `default_model` or `default_provider` is set, legion-llm auto-detects from
  ### Lifecycle
 
  ```ruby
- Legion::LLM.start # Configure providers, resolve Vault credentials, warm discovery caches, set defaults, ping provider
+ Legion::LLM.start # Configure providers, warm discovery caches, set defaults, ping provider
  Legion::LLM.shutdown # Mark disconnected, clean up
  Legion::LLM.started? # -> Boolean
  Legion::LLM.settings # -> Hash (current LLM settings)
@@ -556,10 +560,10 @@ end
 
  | Provider | Config Key | Credential Source | Notes |
  |----------|-----------|-------------------|-------|
- | AWS Bedrock | `bedrock` | Vault (`access_key`, `secret_key`) or direct | Default region: us-east-2 |
- | Anthropic | `anthropic` | Vault (`api_key`) or direct | Direct API access |
- | OpenAI | `openai` | Vault (`api_key`) or direct | GPT models |
- | Google Gemini | `gemini` | Vault (`api_key`) or direct | Gemini models |
+ | AWS Bedrock | `bedrock` | `vault://`, `env://`, or direct | Default region: us-east-2, SigV4 or Bearer Token auth |
+ | Anthropic | `anthropic` | `vault://`, `env://`, or direct | Direct API access |
+ | OpenAI | `openai` | `vault://`, `env://`, or direct | GPT models |
+ | Google Gemini | `gemini` | `vault://`, `env://`, or direct | Gemini models |
  | Ollama | `ollama` | Local, no credentials needed | Local inference |
 
  ## Integration with LegionIO
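The fallback-chain semantics described in the README above ("the first non-nil result wins") can be sketched as follows. This is an illustrative reimplementation, not the actual `legion-settings` resolver; the `vault:` and `env:` hashes stand in for a live Vault connection and `ENV`, and `resolve_secret` is a hypothetical helper name.

```ruby
# Hypothetical sketch of fallback-chain secret resolution.
# The real resolver lives in Legion::Settings.resolve_secrets!.
def resolve_secret(value, vault: {}, env: {})
  case value
  when Array
    # Fallback chain: try each entry in order, first non-nil result wins
    value.lazy.map { |v| resolve_secret(v, vault: vault, env: env) }.find { |r| !r.nil? }
  when %r{\Avault://(.+)\z}
    vault[Regexp.last_match(1)]   # look up the vault path (with #fragment key)
  when %r{\Aenv://(.+)\z}
    env[Regexp.last_match(1)]     # look up the environment variable
  else
    value                         # plain strings pass through unchanged
  end
end

token = resolve_secret(
  ['vault://secret/data/llm/bedrock#bearer_token', 'env://AWS_BEARER_TOKEN'],
  vault: {},                                   # Vault not connected
  env: { 'AWS_BEARER_TOKEN' => 'abc123' }      # env:// fallback still resolves
)
```

With Vault empty, the chain falls through to the `env://` entry, matching the README's note that `env://` works even when Vault is not connected.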
data/lib/legion/llm/embeddings.rb ADDED
@@ -0,0 +1,43 @@
+ # frozen_string_literal: true
+
+ module Legion
+   module LLM
+     module Embeddings
+       class << self
+         def generate(text:, model: nil, dimensions: nil)
+           model ||= default_model
+           opts = { model: model }
+           opts[:dimensions] = dimensions if dimensions
+
+           response = RubyLLM.embed(text, **opts)
+           {
+             vector: response.vectors.first,
+             model: model,
+             dimensions: response.vectors.first&.size || 0,
+             tokens: response.input_tokens
+           }
+         rescue StandardError => e
+           Legion::Logging.warn "Embedding failed: #{e.message}" if defined?(Legion::Logging)
+           { vector: nil, model: model, error: e.message }
+         end
+
+         def generate_batch(texts:, model: nil, dimensions: nil)
+           model ||= default_model
+           opts = { model: model }
+           opts[:dimensions] = dimensions if dimensions
+
+           response = RubyLLM.embed(texts, **opts)
+           response.vectors.each_with_index.map do |vec, i|
+             { vector: vec, model: model, dimensions: vec&.size || 0, index: i }
+           end
+         rescue StandardError => e
+           texts.map { |_| { vector: nil, model: model, error: e.message } }
+         end
+
+         def default_model
+           Legion::Settings.dig(:llm, :embeddings, :default_model) || 'text-embedding-3-small'
+         end
+       end
+     end
+   end
+   end
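`Embeddings.generate` swallows errors and returns a Hash either way, so callers branch on `:vector` rather than rescuing. A stubbed sketch of consuming that shape (the result Hash is hand-written here, no real RubyLLM call; the L2-normalization step is an illustrative use, not part of the gem):

```ruby
# Stubbed result in the success shape Embeddings.generate returns;
# on failure it would instead be { vector: nil, model: ..., error: "..." }.
result = { vector: [0.1, 0.2, 0.3], model: 'text-embedding-3-small', dimensions: 3, tokens: 5 }

if result[:vector]
  # e.g. L2-normalize before writing to a vector store
  norm = Math.sqrt(result[:vector].sum { |x| x * x })
  unit = result[:vector].map { |x| x / norm }
else
  warn "embedding failed: #{result[:error]}"
end
```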
data/lib/legion/llm/shadow_eval.rb ADDED
@@ -0,0 +1,49 @@
+ # frozen_string_literal: true
+
+ module Legion
+   module LLM
+     module ShadowEval
+       class << self
+         def enabled?
+           Legion::Settings.dig(:llm, :shadow, :enabled) == true
+         end
+
+         def should_sample?
+           return false unless enabled?
+
+           rate = Legion::Settings.dig(:llm, :shadow, :sample_rate) || 0.1
+           rand < rate
+         end
+
+         def evaluate(primary_response:, messages: nil, shadow_model: nil) # rubocop:disable Lint/UnusedMethodArgument
+           shadow_model ||= Legion::Settings.dig(:llm, :shadow, :model) || 'gpt-4o-mini'
+
+           shadow_response = Legion::LLM.send(:chat_single,
+                                              model: shadow_model, provider: nil,
+                                              intent: nil, tier: nil,
+                                              skip_shadow: true)
+
+           comparison = compare(primary_response, shadow_response, shadow_model)
+           Legion::Events.emit('llm.shadow_eval', comparison) if defined?(Legion::Events)
+           comparison
+         rescue StandardError => e
+           { error: e.message, shadow_model: shadow_model }
+         end
+
+         def compare(primary, shadow, shadow_model)
+           primary_len = primary[:content]&.length || 0
+           shadow_len = shadow[:content]&.length || 0
+
+           {
+             primary_model: primary[:model],
+             shadow_model: shadow_model,
+             primary_tokens: primary[:usage],
+             shadow_tokens: shadow[:usage],
+             length_ratio: primary_len.zero? ? 0.0 : shadow_len.to_f / primary_len,
+             evaluated_at: Time.now.utc
+           }
+         end
+       end
+     end
+   end
+   end
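`should_sample?` gates shadow evaluation with `rand < rate`, so roughly `sample_rate` of requests get a shadow run. A self-contained sketch of that gate (the `rng` parameter is added here for reproducibility; the module itself uses the global `rand`):

```ruby
# Probabilistic sampling gate, as in ShadowEval.should_sample?
def sample?(rate, rng)
  rng.rand < rate
end

rng = Random.new(42)                       # seeded so the experiment repeats exactly
hits = 10_000.times.count { sample?(0.1, rng) }
ratio = hits / 10_000.0                    # clusters tightly around the 0.1 sample rate
```

Over many requests the hit ratio converges on the configured rate, which keeps shadow-model spend proportional to `sample_rate`.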
data/lib/legion/llm/structured_output.rb ADDED
@@ -0,0 +1,74 @@
+ # frozen_string_literal: true
+
+ module Legion
+   module LLM
+     module StructuredOutput
+       SCHEMA_CAPABLE_MODELS = %w[gpt-4o gpt-4o-mini gpt-4-turbo claude-3-5-sonnet claude-4-sonnet claude-4-opus].freeze
+
+       class << self
+         def generate(messages:, schema:, model: nil, **)
+           model ||= Legion::LLM.settings[:default_model]
+           result = call_with_schema(messages, schema, model, **)
+
+           parsed = Legion::JSON.load(result[:content])
+           { data: parsed, raw: result[:content], model: result[:model], valid: true }
+         rescue ::JSON::ParserError => e
+           handle_parse_error(e, messages, schema, model, result, **)
+         end
+
+         private
+
+         def call_with_schema(messages, schema, model, **opts)
+           if supports_response_format?(model)
+             Legion::LLM.send(:chat_single,
+                              model: model, provider: nil, intent: nil, tier: nil,
+                              response_format: { type: 'json_schema',
+                                                 json_schema: { name: 'response', schema: schema } },
+                              **opts.except(:attempt))
+           else
+             instruction = "You MUST respond with valid JSON matching this schema:\n" \
+                           "```json\n#{Legion::JSON.dump(schema)}\n```\n" \
+                           'Respond with ONLY the JSON object, no other text.'
+             augmented = [{ role: 'system', content: instruction }] + Array(messages)
+             Legion::LLM.send(:chat_single,
+                              model: model, provider: nil, intent: nil, tier: nil,
+                              messages: augmented, **opts.except(:attempt))
+           end
+         end
+
+         def handle_parse_error(error, messages, schema, model, result, **opts)
+           if retry_enabled? && (opts[:attempt] || 0) < max_retries
+             retry_with_instruction(messages, schema, model, attempt: (opts[:attempt] || 0) + 1, **opts)
+           else
+             { data: nil, error: "JSON parse failed: #{error.message}", raw: result&.dig(:content), valid: false }
+           end
+         end
+
+         def retry_with_instruction(messages, schema, model, **opts)
+           instruction = "Your previous response was not valid JSON. Respond with ONLY a valid JSON object matching this schema:\n#{Legion::JSON.dump(schema)}"
+           augmented = Array(messages) + [{ role: 'user', content: instruction }]
+           result = Legion::LLM.send(:chat_single,
+                                     model: model, provider: nil, intent: nil, tier: nil,
+                                     messages: augmented, **opts.except(:attempt))
+
+           parsed = Legion::JSON.load(result[:content])
+           { data: parsed, raw: result[:content], model: result[:model], valid: true, retried: true }
+         rescue StandardError => e
+           { data: nil, error: e.message, valid: false }
+         end
+
+         def supports_response_format?(model)
+           SCHEMA_CAPABLE_MODELS.any? { |m| model.to_s.include?(m) }
+         end
+
+         def retry_enabled?
+           Legion::Settings.dig(:llm, :structured_output, :retry_on_parse_failure) != false
+         end
+
+         def max_retries
+           Legion::Settings.dig(:llm, :structured_output, :max_retries) || 2
+         end
+       end
+     end
+   end
+   end
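StructuredOutput decides between native `response_format` and the prompt-based fallback with a substring check against `SCHEMA_CAPABLE_MODELS`, so dated or tagged model IDs still match their family name. A minimal standalone sketch of that check (model list trimmed for the example):

```ruby
# Same substring check as StructuredOutput.supports_response_format?,
# with the capable-model list shortened for illustration.
SCHEMA_CAPABLE = %w[gpt-4o gpt-4-turbo claude-3-5-sonnet].freeze

def supports_response_format?(model)
  SCHEMA_CAPABLE.any? { |family| model.to_s.include?(family) }
end

supports_response_format?('gpt-4o-2024-08-06')  # versioned ID matches the "gpt-4o" family
supports_response_format?('llama3:8b')          # no match: routed to the prompt-based fallback
```

A side effect of substring matching is that any entry containing another (e.g. `gpt-4o` within `gpt-4o-mini`) makes the longer entry redundant, which is why the list can stay short.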
data/lib/legion/llm/version.rb CHANGED
@@ -2,6 +2,6 @@
 
  module Legion
    module LLM
-     VERSION = '0.3.1'
+     VERSION = '0.3.2'
    end
  end
data/lib/legion/llm.rb CHANGED
@@ -74,16 +74,30 @@ module Legion
  end
  end
 
- # Generate embeddings
+ # Generate embeddings via Embeddings module
  # @param text [String, Array<String>] text to embed
  # @param model [String] embedding model ID
- # @return [RubyLLM::Embedding]
- def embed(text, model: nil)
-   if model
-     RubyLLM.embed(text, model: model)
-   else
-     RubyLLM.embed(text)
-   end
+ # @return [Hash] { vector:, model:, dimensions:, tokens: }
+ def embed(text, **)
+   require 'legion/llm/embeddings'
+   Embeddings.generate(text: text, **)
+ end
+
+ # Batch embed multiple texts
+ # @param texts [Array<String>] texts to embed
+ # @return [Array<Hash>]
+ def embed_batch(texts, **)
+   require 'legion/llm/embeddings'
+   Embeddings.generate_batch(texts: texts, **)
+ end
+
+ # Generate structured JSON output from LLM
+ # @param messages [Array<Hash>] conversation messages
+ # @param schema [Hash] JSON schema to enforce
+ # @return [Hash] { data:, raw:, model:, valid: }
+ def structured(messages:, schema:, **)
+   require 'legion/llm/structured_output'
+   StructuredOutput.generate(messages: messages, schema: schema, **)
  end
 
  # Create a configured agent instance
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: legion-llm
  version: !ruby/object:Gem::Version
-   version: 0.3.1
+   version: 0.3.2
  platform: ruby
  authors:
  - Esity
@@ -92,6 +92,7 @@ files:
  - lib/legion/llm/compressor.rb
  - lib/legion/llm/discovery/ollama.rb
  - lib/legion/llm/discovery/system.rb
+ - lib/legion/llm/embeddings.rb
  - lib/legion/llm/escalation_history.rb
  - lib/legion/llm/helpers/llm.rb
  - lib/legion/llm/providers.rb
@@ -102,6 +103,8 @@ files:
  - lib/legion/llm/router/resolution.rb
  - lib/legion/llm/router/rule.rb
  - lib/legion/llm/settings.rb
+ - lib/legion/llm/shadow_eval.rb
+ - lib/legion/llm/structured_output.rb
  - lib/legion/llm/transport/exchanges/escalation.rb
  - lib/legion/llm/transport/messages/escalation_event.rb
  - lib/legion/llm/version.rb