lex-llm-vllm 0.2.11 → 0.2.13 (RubyGems version diff, Mend)

Files changed (7)

checksums.yaml +4 -4
data/CHANGELOG.md +8 -0
data/README.md +213 -31
data/lib/legion/extensions/llm/vllm/actors/discovery_refresh.rb +48 -0
data/lib/legion/extensions/llm/vllm/version.rb +1 -1
data/lib/legion/extensions/llm/vllm.rb +1 -1
metadata +2 -1

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: c4369ada04bb372dd59d3a57e2490c2b984c6ec1d7c4277c044b0f4126ba6136
-  data.tar.gz: 922947d49abecdbefbc7818dff56a82a35d1cae0978ebb94f5e9711f02a402c1
+  metadata.gz: 6adc86b9d3286821c0efa59e4c820f3d99ee0acb5327f133a96010383d154505
+  data.tar.gz: 73ecff7ccc309eb0469a79edc3970fdf3d766199a6df31edde4cbaf2016dc970
 SHA512:
-  metadata.gz: e2d9c3ffc63f2ba151573ebd898c0a08e8f73c52fe11ba38d452470e61569a33914d253ad8b1d08b235f90cfbc1f9613dea4bae5728d4b905539241e7034f9ff
-  data.tar.gz: 9cf36664ead33936f9c70ca3e892603ed6a66e1e3ef82560e7016e94207835ae6de4af7e7e9041e70eeb69b3117a48d780ca4a7e71aa3626ff8cc497800e8453
+  metadata.gz: 4b2c498e26f09fa27edfa7abf08bf6fae656313cf6e2ce625772a9ce809ff1fcfae55a8746261b943b6b41111a46784fa97d9d5004f4be69f58761de05c6383d
+  data.tar.gz: 94b867bd099f8e062f23aee550be30d4549bbc3d4f10d40b6e4e9b3dcabca7c8837e246198a6ea9a208035da3dcfaaf41be626709de9c8e642e3a4035f6681b0
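
The digests above cover the `metadata.gz` and `data.tar.gz` members of the `.gem` archive. A minimal sketch of checking a member's bytes against a listed SHA256 value with Ruby's standard `digest` library follows; the helper name `digest_matches?` and the sample bytes are illustrative and not part of the gem.

```ruby
require 'digest'

# Hypothetical helper: compares the SHA256 hex digest of raw bytes with an
# expected value such as one listed in checksums.yaml.
def digest_matches?(bytes, expected_hex)
  Digest::SHA256.hexdigest(bytes) == expected_hex.strip.downcase
end

# Stand-in bytes; in practice these would come from File.binread on the
# data.tar.gz extracted from the .gem archive.
sample   = 'example bytes'
expected = Digest::SHA256.hexdigest(sample)

puts digest_matches?(sample, expected)        # true
puts digest_matches?("#{sample}!", expected)  # false
```

The same shape works for the SHA512 entries by swapping in `Digest::SHA512`.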

data/CHANGELOG.md CHANGED

@@ -1,5 +1,13 @@
 # Changelog
+## 0.2.13 - 2026-06-05
+- Fix missing documentation comment on `DiscoveryRefresh` actor (RuboCop Style/Documentation)
+## 0.2.12 - 2026-05-29
+- Add capabilities `[:completion, :streaming, :vision, :tools]` to `DEFAULT_INSTANCE_TIER` so routing can match vLLM instances by required capability without live discovery
 ## 0.2.11 - 2026-05-21
 - Add `default_transport`/`default_tier` class declarations, remove duplicate instance methods
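
The 0.2.12 entry says routing can match vLLM instances by required capability without live discovery. One way such a check could look is a plain subset test, sketched below. `STATIC_CAPABILITIES` and `supports?` are hypothetical names; the real matching lives in `lex-llm` routing, which this diff does not show.

```ruby
# Capability list copied from the 0.2.12 changelog entry.
STATIC_CAPABILITIES = %i[completion streaming vision tools].freeze

# Hypothetical predicate: true when every required capability is offered.
def supports?(required, offered = STATIC_CAPABILITIES)
  (Array(required).map(&:to_sym) - offered).empty?
end

puts supports?(%i[completion tools])  # true
puts supports?(%i[completion embed])  # false
```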

data/README.md CHANGED

@@ -2,24 +2,138 @@
 LegionIO LLM provider extension for [vLLM](https://docs.vllm.ai/).
-This gem lives under `Legion::Extensions::Llm::Vllm` and depends on `lex-llm >= 0.4.3` for shared provider-neutral routing, response normalization, fleet envelopes, responder-side fleet execution, and schema primitives.
+This gem provides a complete vLLM adapter for the LegionIO LLM routing layer. It speaks the OpenAI-compatible API, discovers models at runtime, publishes availability events, and supports vLLM-specific features like thinking mode and server lifecycle management.
-Load it with `require 'legion/extensions/llm/vllm'`.
+**Namespace:** `Legion::Extensions::Llm::Vllm`
+**Provider slug:** `:vllm`
+**Dependency:** `lex-llm >= 0.4.3`
-## What It Provides
+Load with:
-- `Legion::Extensions::Llm::Provider` registration as `:vllm`
-- Shared `Legion::Extensions::Llm::Provider::OpenAICompatible` request and response handling
-- Chat requests through `POST /v1/chat/completions`
-- Streaming chat with `stream_usage_supported?` for token usage reporting
-- Model discovery through `GET /v1/models`
-- Embeddings through `POST /v1/embeddings`
-- vLLM thinking mode via `chat_template_kwargs` (configurable through `Legion::Settings`)
-- Best-effort `llm.registry` readiness and model availability event publishing when transport is loaded
-- vLLM management helpers: `/health`, `/version`, `/reset_prefix_cache`, `/reset_mm_cache`, `/sleep`, `/wake_up`
-- Normalized OpenAI-compatible capability and modality metadata for discovered models
-- Shared fleet/default settings via `Legion::Extensions::Llm.provider_settings`
-- Structured `Legion::Logging::Helper` handling for provider discovery and fallback paths
+```ruby
+require 'legion/extensions/llm/vllm'
+```
+---
+## Architecture at a Glance
+```
+Legion::Extensions::Llm::Vllm          # Root module (namespace, discovery, defaults)
+  |-- Provider                          # Per-instance provider (chat, models, management)
+  |     |-- OpenAICompatible (mixin)    # Shared request/response handling
+  |     |-- Capabilities (module)       # Capability predicates for offerings
+  |
+  |-- Actor::DiscoveryRefresh           # Periodic actor: refreshes discovered model list
+  |-- Actor::FleetWorker                # Subscription actor: consumes fleet requests
+  |
+  |-- Runners::FleetWorker              # Runner: delegates to Fleet::ProviderResponder
+```
+### File Map
+| File | What |
+|------|------|
+| `lib/legion/extensions/llm/vllm.rb` | Root module, `discover_instances`, `default_settings`, alias normalization |
+| `lib/legion/extensions/llm/vllm/version.rb` | `VERSION` constant |
+| `lib/legion/extensions/llm/vllm/provider.rb` | Provider class, chat/embeddings/model discovery, management endpoints |
+| `lib/legion/extensions/llm/vllm/actors/discovery_refresh.rb` | Periodic actor to refresh model discovery cache |
+| `lib/legion/extensions/llm/vllm/actors/fleet_worker.rb` | Subscription actor for fleet request consumption |
+| `lib/legion/extensions/llm/vllm/runners/fleet_worker.rb` | Runner entrypoint that delegates to `Fleet::ProviderResponder` |
+---
+## Key Classes
+### `Legion::Extensions::Llm::Vllm` (Root Module)
+The top-level module. It handles auto-registration via `Legion::Extensions::Llm::AutoRegistration`, instance discovery, and configuration normalization.
+**Constants:**
+- `PROVIDER_FAMILY` — `:vllm`
+- `DEFAULT_INSTANCE_TIER` — `{ tier: :direct, capabilities: [:completion, :streaming, :vision, :tools] }`
+**Class methods:**
+| Method | Description |
+|--------|-------------|
+| `default_settings` | Returns the full default settings hash (endpoint, fleet, thinking, etc.) |
+| `provider_class` | Returns `Provider` |
+| `registry_publisher` | Memoized `Legion::Extensions::Llm::RegistryPublisher` instance |
+| `discover_instances` | Probes `localhost:8000` health endpoint, merges configured instances from `Legion::Settings` |
+| `normalize_instance_config(config)` | Normalizes config keys (`base_url`/`api_base`/`endpoint` -> `vllm_api_base`), infers tier |
+| `normalize_api_base(url)` | Strips trailing `/v1` from URLs |
+| `infer_tier_from_endpoint(url)` | Returns `:local` for localhost addresses, `:direct` otherwise |
+**Instance discovery sources:**
+1. HTTP health probe against `http://localhost:8000` (0.1s timeout) -> `:local` tier
+2. Configured instances under `Legion::Settings[:extensions][:llm][:vllm][:instances]`
+### `Legion::Extensions::Llm::Vllm::Provider`
+The per-instance provider class. Inherits from `Legion::Extensions::Llm::Provider` and mixes in `OpenAICompatible` for shared HTTP request/response handling.
+**Class methods:**
+| Method | Returns |
+|--------|---------|
+| `slug` | `'vllm'` |
+| `local?` | `false` |
+| `default_transport` | `:http` |
+| `default_tier` | `:direct` |
+| `configuration_options` | `[:vllm_api_base, :vllm_api_key]` |
+| `configuration_requirements` | `[]` (no required fields) |
+| `capabilities` | `Capabilities` module |
+| `registry_publisher` | Delegates to `Vllm.registry_publisher` |
+**Instance methods:**
+| Method | Description |
+|--------|-------------|
+| `api_base` | Normalized API root from config, settings, or `http://localhost:8000` |
+| `headers` | Identity headers + optional Bearer token |
+| `settings` | Returns `Vllm.default_settings` |
+| `health(live:)` | `GET /health` |
+| `readiness(live:)` | Checks readiness, publishes async readiness event when `live: true` |
+| `list_models` | `GET /v1/models`, publishes async model availability events |
+| `discover_offerings(live:, **)` | Builds `ModelOffering` instances from discovered models (uses cache when not live) |
+| `version` | `GET /version` |
+| `fetch_model_detail(model_name)` | Re-fetches `/v1/models` to resolve `context_window` on cache miss |
+| `stream_usage_supported?` | Always `true` for vLLM |
+| `reset_prefix_cache(reset_running_requests:, reset_external:)` | `POST /reset_prefix_cache` |
+| `reset_mm_cache` | `POST /reset_mm_cache` |
+| `sleep(level:)` | `POST /sleep` |
+| `wake_up(tags:)` | `POST /wake_up` |
+**Payload rendering:** Overrides `render_payload` to support vLLM thinking mode via `chat_template_kwargs` and strips `reasoning_effort`.
+### `Provider::Capabilities` (Module)
+Predicate methods for model capability detection. All return `true` for vLLM by default:
+- `chat?(model)`, `streaming?(model)`, `vision?(model)`, `functions?(model)`, `embeddings?(model)`
+- `critical_capabilities_for(model)` — returns array of active capability names
+### `Actor::DiscoveryRefresh`
+Periodic actor (extends `Legion::Extensions::Actors::Every`) that refreshes the vLLM discovered model list.
+- **Default interval:** 1800 seconds (30 minutes)
+- **Configurable via:** `Legion::Settings[:extensions][:llm][:vllm][:discovery_interval]`
+- **Action:** Calls `Legion::LLM::Discovery.refresh_discovered_models!(provider: :vllm)`
+### `Actor::FleetWorker`
+Subscription actor (extends `Legion::Extensions::Actors::Subscription`) that consumes LLM fleet requests routed to vLLM.
+- Only activates when `Fleet::ProviderResponder.enabled_for?` returns true for discovered instances
+- Delegates execution to `Runners::FleetWorker.handle_fleet_request`
+### `Runners::FleetWorker`
+Runner module that dispatches fleet requests to `Legion::Extensions::Llm::Fleet::ProviderResponder` with vLLM-specific context (provider family, class, instance discovery callback).
+---
 ## Defaults
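
The hunk above describes `normalize_instance_config`, `normalize_api_base`, and `infer_tier_from_endpoint` only in prose. The sketch below shows one way that behavior could be implemented; it is not the gem's source. The constants `ENDPOINT_ALIASES` and `LOCAL_HOSTS`, the exact set of hosts treated as local, and the handling of string keys are all assumptions.

```ruby
require 'uri'

# Assumed alias keys (from the README) and an assumed set of loopback hosts.
ENDPOINT_ALIASES = %i[base_url api_base endpoint].freeze
LOCAL_HOSTS = %w[localhost 127.0.0.1 ::1].freeze

# Strip a trailing "/v1" (with or without a final slash) from the URL.
def normalize_api_base(url)
  url.to_s.sub(%r{/v1/?\z}, '')
end

# :local for loopback hosts, :direct for everything else (including
# URLs that cannot be parsed).
def infer_tier_from_endpoint(url)
  host = URI.parse(url.to_s).hostname
  LOCAL_HOSTS.include?(host) ? :local : :direct
rescue URI::InvalidURIError
  :direct
end

# Fold alias keys into :vllm_api_base, normalize it, and infer a tier
# only when none was configured explicitly.
def normalize_instance_config(config)
  normalized = config.transform_keys(&:to_sym)
  ENDPOINT_ALIASES.each do |key|
    value = normalized.delete(key)
    normalized[:vllm_api_base] ||= value if value
  end
  if normalized[:vllm_api_base]
    normalized[:vllm_api_base] = normalize_api_base(normalized[:vllm_api_base])
    normalized[:tier] ||= infer_tier_from_endpoint(normalized[:vllm_api_base])
  end
  normalized
end

p normalize_instance_config(base_url: 'http://localhost:8000/v1')
p normalize_instance_config('endpoint' => 'https://vllm.example.com/v1/')
```

In this sketch the first call yields `vllm_api_base` of `http://localhost:8000` with tier `:local`, and the second yields `https://vllm.example.com` with tier `:direct`.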
@@ -49,8 +163,12 @@ Legion::Extensions::Llm::Vllm.default_settings
 # }
 ```
+---
 ## Configuration
+### Per-instance via Legion::Extensions::Llm.configure
 ```ruby
 Legion::Extensions::Llm.configure do |config|
   config.vllm_api_base = "http://localhost:8000"
@@ -60,9 +178,36 @@ Legion::Extensions::Llm.configure do |config|
 end
 ```
+### Multi-instance via Legion::Settings
+```yaml
+extensions:
+  llm:
+    vllm:
+      discovery_interval: 1800  # seconds between model list refreshes
+      instances:
+        production:
+          vllm_api_base: "https://vllm.example.com"
+          tier: :direct
+        local:
+          vllm_api_base: "http://localhost:8000"
+          tier: :local
+```
+### Endpoint alias normalization
+The following keys are all resolved to `vllm_api_base` during instance config normalization:
+- `base_url`
+- `api_base`
+- `endpoint`
+Trailing `/v1` is stripped automatically.
+---
 ## Fleet Responder
-Provider instances can opt in to consuming Legion LLM fleet requests. The provider-owned fleet actor only starts when at least one configured instance enables `respond_to_requests`, and request execution delegates to `Legion::Extensions::Llm::Fleet::ProviderResponder`.
+Provider instances can opt in to consuming Legion LLM fleet requests. The fleet actor only starts when at least one configured instance enables `respond_to_requests`.
 ```yaml
 extensions:
@@ -79,29 +224,51 @@ extensions:
               - embed
 ```
-### Thinking Mode
+Execution flows: `Actor::FleetWorker` (receives message) -> `Runners::FleetWorker.handle_fleet_request` -> `Fleet::ProviderResponder.call`.
+---
+## Thinking Mode
-Enable vLLM thinking mode globally via settings:
+vLLM supports a "thinking" mode that enables extended reasoning. Enable via:
+**Instance-level:**
+```yaml
+extensions:
+  llm:
+    vllm:
+      instances:
+        default:
+          enable_thinking: true
+```
+**Global:**
 ```ruby
-# In Legion::Settings or settings JSON
+# Legion::Settings or settings JSON
 { llm: { providers: { vllm: { enable_thinking: true } } } }
 ```
-Or pass `thinking: { enabled: true }` per-request. When enabled, the provider adds `chat_template_kwargs: { enable_thinking: true }` to the payload and strips `reasoning_effort`.
+**Per-request:**
+```ruby
+# Pass thinking: { enabled: true } in the chat kwargs
+```
+When enabled, the provider adds `chat_template_kwargs: { enable_thinking: true }` to the chat payload and strips the OpenAI-specific `reasoning_effort` key.
+---
 ## Management Endpoints
-The provider exposes helpers for vLLM server management:
+| Method | Endpoint | Kwargs | Description |
+|--------|----------|--------|-------------|
+| `health(live:)` | `GET /health` | `live:` | Server health check |
+| `version` | `GET /version` | none | Server version info |
+| `reset_prefix_cache` | `POST /reset_prefix_cache` | `reset_running_requests:`, `reset_external:` | Clear prefix cache |
+| `reset_mm_cache` | `POST /reset_mm_cache` | none | Clear multimodal cache |
+| `sleep(level:)` | `POST /sleep` | `level:` (default: 1) | Put worker to sleep |
+| `wake_up(tags:)` | `POST /wake_up` | `tags:` | Wake worker up |
-| Method | Endpoint | Description |
-|--------|----------|-------------|
-| `health` | `GET /health` | Server health check |
-| `version` | `GET /version` | Server version info |
-| `reset_prefix_cache` | `POST /reset_prefix_cache` | Clear prefix cache |
-| `reset_mm_cache` | `POST /reset_mm_cache` | Clear multimodal cache |
-| `sleep(level:)` | `POST /sleep` | Put server to sleep |
-| `wake_up(tags:)` | `POST /wake_up` | Wake server up |
+---
 ## Registry Publishing
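
The Thinking Mode section in the hunk above says the provider adds `chat_template_kwargs: { enable_thinking: true }` to the chat payload and strips `reasoning_effort`. A minimal sketch of that transformation on a plain Hash follows. `apply_thinking` is a hypothetical helper, not the gem's `render_payload`, and leaving the payload untouched when thinking is disabled is an assumption, since the README does not specify that case.

```ruby
# Hypothetical helper mirroring the README's description of the payload change.
def apply_thinking(payload, enabled:)
  return payload unless enabled

  # Preserve any template kwargs already present, then switch thinking on.
  kwargs = payload.fetch(:chat_template_kwargs, {}).merge(enable_thinking: true)
  payload
    .reject { |key, _value| key == :reasoning_effort }
    .merge(chat_template_kwargs: kwargs)
end

# Illustrative request; the model name is made up.
request  = { model: 'example-model', messages: [], reasoning_effort: 'high' }
rendered = apply_thinking(request, enabled: true)

puts rendered.key?(:reasoning_effort)                       # false
puts rendered.dig(:chat_template_kwargs, :enable_thinking)  # true
```

The helper returns a new Hash rather than mutating the caller's payload, which keeps the original request reusable for other providers.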
@@ -110,16 +277,31 @@ When `lex-llm` routing and Legion transport are available, the provider publishe
 - **Readiness events** on `readiness(live: true)` calls
 - **Model availability events** on `list_models` discovery
-Publishing is async (background threads) and never blocks the caller. All failures are handled gracefully via `handle_exception`.
+All publishing is async (background threads) and never blocks the caller. Failures are logged via `handle_exception`.
+---
+## Model Discovery & Offerings
+On `list_models`, vLLM returns `max_model_len` which is mapped to `context_length`. This value is:
+1. Attached to `Model::Info` objects
+2. Cached via `cache_set` with 86400s TTL keyed by `model_detail_cache_key`
+3. Available in routing offerings via `limits: { context_window: ctx }`
+`discover_offerings(live: false)` serves from the cached model list without hitting the network.
+---
 ## Development
 ```bash
 bundle install
-bundle exec rspec --format json --out tmp/rspec_results.json --format progress --out tmp/rspec_progress.txt
+bundle exec rspec
 bundle exec rubocop -A
 ```
+---
 ## License
 MIT
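
The Model Discovery & Offerings section above states that `max_model_len` from `GET /v1/models` is mapped to a context length. The sketch below shows that mapping on a sample response body. The model ids, the lengths, and the helper name `context_lengths` are made up for illustration; the gem's own `Model::Info` and `cache_set` plumbing is not reproduced here.

```ruby
require 'json'

# Sample /v1/models response body; ids and lengths are invented.
body = <<~JSON
  {"object":"list","data":[
    {"id":"example/model-a","object":"model","max_model_len":32768},
    {"id":"example/model-b","object":"model"}
  ]}
JSON

# Hypothetical mapper: model id => context length (nil when not reported).
def context_lengths(json)
  JSON.parse(json).fetch('data', []).to_h do |model|
    [model['id'], model['max_model_len']]
  end
end

lengths = context_lengths(body)
p lengths['example/model-a']  # 32768
p lengths['example/model-b']  # nil
```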

data/lib/legion/extensions/llm/vllm/actors/discovery_refresh.rb ADDED

@@ -0,0 +1,48 @@
+# frozen_string_literal: true
+begin
+  require 'legion/extensions/actors/every'
+rescue LoadError => e
+  warn(e.message) if $VERBOSE
+end
+return unless defined?(Legion::Extensions::Actors::Every)
+module Legion
+  module Extensions
+    module Llm
+      module Vllm
+        module Actor
+          # Periodic actor that refreshes the vLLM discovered model list.
+          class DiscoveryRefresh < Legion::Extensions::Actors::Every
+            include Legion::Logging::Helper
+            REFRESH_INTERVAL = 1800
+            def runner_class    = self.class
+            def runner_function = 'manual'
+            def run_now?        = true
+            def use_runner?     = false
+            def check_subtask?  = false
+            def generate_task?  = false
+            def time
+              return REFRESH_INTERVAL unless defined?(Legion::Settings)
+              Legion::Settings.dig(:extensions, :llm, :vllm, :discovery_interval) || REFRESH_INTERVAL
+            end
+            def manual
+              log.debug('[vllm][discovery_refresh] refreshing model list')
+              return unless defined?(Legion::LLM::Discovery)
+              Legion::LLM::Discovery.refresh_discovered_models!(provider: :vllm)
+            rescue StandardError => e
+              handle_exception(e, level: :warn, handled: true, operation: 'vllm.actor.discovery_refresh')
+            end
+          end
+        end
+      end
+    end
+  end
+end
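
The `time` method in the file above falls back to `REFRESH_INTERVAL` when no `discovery_interval` is configured. A small sketch of that fallback follows, with a plain Hash standing in for `Legion::Settings` (an assumption for illustration; `refresh_interval` is a hypothetical name).

```ruby
REFRESH_INTERVAL = 1800

# Stand-in for the actor's #time method: dig for the configured value and
# fall back to the default when the path is missing or nil.
def refresh_interval(settings)
  settings.dig(:extensions, :llm, :vllm, :discovery_interval) || REFRESH_INTERVAL
end

configured = { extensions: { llm: { vllm: { discovery_interval: 600 } } } }

puts refresh_interval({})          # 1800
puts refresh_interval(configured)  # 600
```

Because `0` is truthy in Ruby, a configured interval of `0` would be returned as-is by this pattern rather than replaced by the default.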

data/lib/legion/extensions/llm/vllm/version.rb CHANGED

@@ -4,7 +4,7 @@ module Legion
   module Extensions
     module Llm
       module Vllm
-        VERSION = '0.2.11'
+        VERSION = '0.2.13'
       end
     end
   end

data/lib/legion/extensions/llm/vllm.rb CHANGED

@@ -15,7 +15,7 @@ module Legion
         extend Legion::Extensions::Llm::AutoRegistration
         PROVIDER_FAMILY = :vllm
-        DEFAULT_INSTANCE_TIER = { tier: :direct }.freeze
+        DEFAULT_INSTANCE_TIER = { tier: :direct, capabilities: %i[completion streaming vision tools] }.freeze
         def self.default_settings
           ::Legion::Extensions::Llm.provider_settings(
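
One point worth noting about the new `DEFAULT_INSTANCE_TIER` value: `Hash#freeze` is shallow, so the nested `capabilities` array stays mutable. Whether any caller mutates it is not visible in this diff. The sketch below demonstrates the Ruby behavior with local stand-in variables and shows a deep-frozen alternative.

```ruby
# Same literal as the constant in the diff, bound to a local for illustration.
tier = { tier: :direct, capabilities: %i[completion streaming vision tools] }.freeze

puts tier.frozen?                 # true
puts tier[:capabilities].frozen?  # false

# Freezing the inner array as well makes the whole structure immutable.
deep = { tier: :direct, capabilities: %i[completion streaming vision tools].freeze }.freeze

puts deep[:capabilities].frozen?  # true
```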

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: lex-llm-vllm
 version: !ruby/object:Gem::Version
-  version: 0.2.11
+  version: 0.2.13
 platform: ruby
 authors:
 - LegionIO
@@ -97,6 +97,7 @@ files:
 - README.md
 - lex-llm-vllm.gemspec
 - lib/legion/extensions/llm/vllm.rb
+- lib/legion/extensions/llm/vllm/actors/discovery_refresh.rb
 - lib/legion/extensions/llm/vllm/actors/fleet_worker.rb
 - lib/legion/extensions/llm/vllm/provider.rb
 - lib/legion/extensions/llm/vllm/runners/fleet_worker.rb