lex-llm-vllm 0.2.0 → 0.2.6

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 261a83f8a4243c795e10e759e5f7f0681cd02d91b91041ff6f11f50d498d1f2a
-   data.tar.gz: c136080255383ca0d3417937d3e4988701075fef4df4c33de3e85fbd9f6ae04b
+   metadata.gz: 14dc7ee5334135f8eece7622a27cb57bad3c9c885cb6203de56006b5d2a1b353
+   data.tar.gz: 59ea84f7c50a9407da2af50c51f77ad26894d45b7f06d9c0223198a73a7e22a8
  SHA512:
-   metadata.gz: a521587328074e46f4403783b85d001f6a9a4cab77e31556ea565432dea12535ed9d0c656b8f8212b20d37f196d8c08f86be25955ff25813ca908e03b5fa8e60
-   data.tar.gz: 3b74fa8c6ecfd4c71fb027a5eb13d3d41b7acf1c42cebcbfeea0e4143ec5ef9ad2693020c284eb3377a8f77fe6cd492c85ed9b9af8f515ad32918f152df224a8
+   metadata.gz: 05c8fa912c908ec88943277dfd2e2b82f84bbd4ff6880d4e1f288f1397baf2cc1f72c2d20e894684be14be9f5f213e381ea5f524bfed3162e116de0271009f93
+   data.tar.gz: 14fffc18eff78d0c8751fbb96c9db6cbdd2773af66c46359b5934fad09d9a8fa37183a4f123f7cb75008e44cf821a21d97756d47633e44a50ab935f3f5743f0d
@@ -8,8 +8,20 @@ jobs:
    ci:
      uses: LegionIO/.github/.github/workflows/ci.yml@main

+   excluded-files:
+     uses: LegionIO/.github/.github/workflows/excluded-files.yml@main
+
+   security:
+     uses: LegionIO/.github/.github/workflows/security-scan.yml@main
+
+   version-changelog:
+     uses: LegionIO/.github/.github/workflows/version-changelog.yml@main
+
+   dependency-review:
+     uses: LegionIO/.github/.github/workflows/dependency-review.yml@main
+
    release:
-     needs: ci
+     needs: [ci, excluded-files, security]
      if: github.event_name == 'push' && github.ref == 'refs/heads/main'
      uses: LegionIO/.github/.github/workflows/release.yml@main
      secrets:
data/CHANGELOG.md CHANGED
@@ -1,5 +1,42 @@
  # Changelog

+ ## 0.2.6 - 2026-05-06
+
+ - Load provider-owned fleet actors through the LegionIO subscription base and the canonical vLLM provider root.
+ - Keep fleet runners anchored on the provider root namespace so provider constants and instance discovery are always loaded.
+ - Normalize configured `endpoint` and `api_base` aliases to `vllm_api_base`.
+ - Preserve configured transport and tier metadata when vLLM builds routing offerings.
+ - Gate release publishing on the shared security workflow.
+
+ ## 0.2.5 - 2026-05-06
+
+ - Mark handled vLLM offering-discovery failures as handled when logging through `Legion::Logging::Helper`.
+ - Refresh README dependency, defaults, and local verification guidance for the `lex-llm >= 0.4.3` fleet responder contract.
+
+ ## 0.2.4 - 2026-05-06
+
+ - Use the shared `lex-llm` fleet provider responder helper for provider-owned fleet workers.
+ - Remove the runtime `legion-llm` dependency and require `lex-llm >= 0.4.3` for responder-side fleet execution.
+
+ ## 0.2.3 - 2026-05-06
+
+ - Remove require-time provider self-registration; `legion-llm` now owns adapter creation and registry writes from loaded provider discovery metadata.
+ - Bump dependency floors to `lex-llm >= 0.4.1` and `legion-llm >= 0.9.1`.
+
+ ## 0.2.2 - 2026-05-06
+
+ - Enforce the shared keyword-only `lex-llm` provider contract and accept `health(live:)`.
+ - Move vLLM defaults back to `Legion::Extensions::Llm.provider_settings` with instance-level fleet responder settings.
+ - Read vLLM thinking defaults from the nested provider instance settings shape.
+ - Serve non-live vLLM offering reads from cached live model discovery instead of probing the configured endpoint.
+ - Add provider-owned fleet responder actor and runner backed by `legion-llm` fleet policy execution.
+ - Bump the transport dependency floor to `legion-transport >= 1.4.14`.
+
+ ## 0.2.1 - 2026-05-03
+
+ - Normalize configured `base_url` instance settings to `vllm_api_base` so LegionIO local settings are honored during provider registration.
+ - Strip a trailing `/v1` from configured vLLM API roots because OpenAI-compatible endpoints append their own `/v1/...` paths.
+
  ## 0.2.0 - 2026-05-01

  - Add auto-discovery via CredentialSources and AutoRegistration from lex-llm 0.3.0
data/Gemfile CHANGED
@@ -4,6 +4,8 @@ source 'https://rubygems.org'

  group :test do
    llm_base_path = ENV.fetch('LEX_LLM_PATH', File.expand_path('../lex-llm', __dir__))
+   transport_path = ENV.fetch('LEGION_TRANSPORT_PATH', File.expand_path('../../legion-transport', __dir__))
+   gem 'legion-transport', path: transport_path if File.directory?(transport_path)
    gem 'lex-llm', path: llm_base_path if File.directory?(llm_base_path)
  end

data/README.md CHANGED
@@ -2,7 +2,7 @@

  LegionIO LLM provider extension for [vLLM](https://docs.vllm.ai/).

- This gem lives under `Legion::Extensions::Llm::Vllm` and depends on `lex-llm` for shared provider-neutral routing, fleet, and schema primitives.
+ This gem lives under `Legion::Extensions::Llm::Vllm` and depends on `lex-llm >= 0.4.3` for shared provider-neutral routing, response normalization, fleet envelopes, responder-side fleet execution, and schema primitives.

  Load it with `require 'legion/extensions/llm/vllm'`.

@@ -19,7 +19,7 @@ Load it with `require 'legion/extensions/llm/vllm'`.
  - vLLM management helpers: `/health`, `/version`, `/reset_prefix_cache`, `/reset_mm_cache`, `/sleep`, `/wake_up`
  - Normalized OpenAI-compatible capability and modality metadata for discovered models
  - Shared fleet/default settings via `Legion::Extensions::Llm.provider_settings`
- - Full `Legion::Logging::Helper` integration with structured `handle_exception` across all classes
+ - Structured `Legion::Logging::Helper` handling for provider discovery and fallback paths

  ## Defaults

@@ -30,10 +30,20 @@ Legion::Extensions::Llm::Vllm.default_settings
  #   instances: {
  #     default: {
  #       endpoint: "http://localhost:8000",
- #       tier: :private,
+ #       tier: :direct,
  #       transport: :http,
- #       usage: { inference: true, embedding: true },
- #       limits: { concurrency: 8 }
+ #       credentials: { api_key: nil },
+ #       enable_thinking: true,
+ #       usage: { inference: true, embedding: true, image: true },
+ #       limits: { concurrency: 1 },
+ #       fleet: {
+ #         enabled: false,
+ #         respond_to_requests: false,
+ #         capabilities: [:chat, :stream_chat, :embed],
+ #         lanes: [],
+ #         concurrency: 1,
+ #         queue_suffix: nil
+ #       }
  #     }
  #   }
  # }
@@ -50,6 +60,25 @@ Legion::Extensions::Llm.configure do |config|
  end
  ```

+ ## Fleet Responder
+
+ Provider instances can opt in to consuming Legion LLM fleet requests. The provider-owned fleet actor only starts when at least one configured instance enables `respond_to_requests`, and request execution delegates to `Legion::Extensions::Llm::Fleet::ProviderResponder`.
+
+ ```yaml
+ extensions:
+   llm:
+     vllm:
+       instances:
+         local:
+           fleet:
+             enabled: true
+             respond_to_requests: true
+             capabilities:
+               - chat
+               - stream_chat
+               - embed
+ ```
+
  ### Thinking Mode

  Enable vLLM thinking mode globally via settings:
@@ -87,8 +116,8 @@ Publishing is async (background threads) and never blocks the caller. All failur

  ```bash
  bundle install
- bundle exec rspec
- bundle exec rubocop
+ bundle exec rspec --format json --out tmp/rspec_results.json --format progress --out tmp/rspec_progress.txt
+ bundle exec rubocop -A
  ```

  ## License
data/lex-llm-vllm.gemspec CHANGED
@@ -26,5 +26,6 @@ Gem::Specification.new do |spec|
    spec.add_dependency 'legion-json', '>= 1.2.1'
    spec.add_dependency 'legion-logging', '>= 1.3.2'
    spec.add_dependency 'legion-settings', '>= 1.3.14'
-   spec.add_dependency 'lex-llm', '>= 0.3.0'
+   spec.add_dependency 'legion-transport', '>= 1.4.14'
+   spec.add_dependency 'lex-llm', '>= 0.4.3'
  end
@@ -0,0 +1,43 @@
+ # frozen_string_literal: true
+
+ begin
+   require 'legion/extensions/actors/subscription'
+ rescue LoadError => e
+   warn(e.message) if $VERBOSE
+ end
+
+ unless defined?(Legion::Extensions::Actors::Subscription)
+   raise LoadError, 'LegionIO actor runtime is required for vLLM fleet worker'
+ end
+
+ require 'legion/extensions/llm/vllm'
+ require 'legion/extensions/llm/fleet/provider_responder'
+
+ module Legion
+   module Extensions
+     module Llm
+       module Vllm
+         module Actor
+           # Subscription actor for vLLM fleet request consumption.
+           class FleetWorker < Legion::Extensions::Actors::Subscription
+             def runner_class
+               'Legion::Extensions::Llm::Vllm::Runners::FleetWorker'
+             end
+
+             def runner_function
+               'handle_fleet_request'
+             end
+
+             def use_runner?
+               false
+             end
+
+             def enabled?
+               Legion::Extensions::Llm::Fleet::ProviderResponder.enabled_for?(Vllm.discover_instances)
+             end
+           end
+         end
+       end
+     end
+   end
+ end
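
The actor's `enabled?` hands the discovered instances to `ProviderResponder.enabled_for?`, whose implementation lives in `lex-llm` and is not part of this diff. As a rough illustration of the behavior the README describes (the actor starts only when at least one instance enables `respond_to_requests`), here is a minimal sketch. The method name `any_instance_responds?` is hypothetical, and whether the real check also requires `fleet.enabled` is an open assumption.

```ruby
# Hypothetical predicate, NOT the lex-llm implementation of enabled_for?.
# It only mirrors the README statement about respond_to_requests.
def any_instance_responds?(instances)
  instances.to_h.any? do |_name, config|
    fleet = config.to_h[:fleet] || {}
    fleet[:respond_to_requests] == true
  end
end

# Shape taken from the default settings in this diff.
defaults_only = { default: { fleet: { enabled: false, respond_to_requests: false } } }
opted_in = defaults_only.merge(local: { fleet: { enabled: true, respond_to_requests: true } })

puts any_instance_responds?(defaults_only)
puts any_instance_responds?(opted_in)
```

With the shipped defaults every instance has `respond_to_requests: false`, so under this reading the fleet actor stays off until an instance opts in.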
@@ -68,8 +68,8 @@ module Legion
            def sleep_url = '/sleep'
            def wake_up_url = '/wake_up'

-           def health
-             log.info { "checking health at #{api_base}#{health_url}" }
+           def health(live: false)
+             log.info { "checking health live=#{live} at #{api_base}#{health_url}" }
              connection.get(health_url).body
            end

@@ -88,6 +88,18 @@ module Legion
              end
            end

+           def discover_offerings(live: false, **)
+             models = if live
+                        @cached_models = list_models
+                      else
+                        Array(@cached_models)
+                      end
+             models.map { |model_info| offering_from_model(model_info) }
+           rescue StandardError => e
+             handle_exception(e, level: :warn, handled: true, operation: 'vllm.discover_offerings')
+             []
+           end
+
            def version
              log.info { "fetching version from #{api_base}#{version_url}" }
              connection.get(version_url).body
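
The `discover_offerings` hunk above only probes the endpoint when `live: true` and otherwise answers from the last live result. A minimal standalone sketch of that branch follows. The class `DiscoverySketch`, its `probe_count` counter, and the plain-hash offerings are hypothetical stand-ins for the provider, its HTTP-backed `list_models`, and `ModelOffering`.

```ruby
# Hypothetical stand-in for the provider; only the live/cached branch
# mirrors the diff. list_models here returns canned data instead of
# calling a vLLM endpoint.
class DiscoverySketch
  attr_reader :probe_count

  def initialize(models)
    @models = models
    @probe_count = 0
  end

  # Stand-in for the real model listing call; counts how often it runs.
  def list_models
    @probe_count += 1
    @models
  end

  def discover_offerings(live: false, **)
    models = if live
               @cached_models = list_models
             else
               Array(@cached_models)
             end
    models.map { |id| { provider_family: :vllm, model: id } }
  rescue StandardError
    []
  end
end

sketch = DiscoverySketch.new(%w[model-a model-b])
cold = sketch.discover_offerings             # nothing cached yet
warm = sketch.discover_offerings(live: true) # probes and fills the cache
cached = sketch.discover_offerings           # served from the cache
```

One consequence worth noting: a non-live read before any live read returns an empty list rather than probing, so callers that need offerings at startup would have to request `live: true` at least once.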
@@ -112,6 +124,28 @@ module Legion

            private

+           def offering_from_model(model_info)
+             Legion::Extensions::Llm::Routing::ModelOffering.new(
+               provider_family: :vllm,
+               instance_id: config.respond_to?(:instance_id) ? config.instance_id : :default,
+               transport: offering_transport,
+               tier: offering_tier,
+               model: model_info.id,
+               usage_type: model_info.embedding? ? :embedding : :inference,
+               capabilities: model_info.capabilities.map(&:to_s),
+               limits: { context_window: model_info.context_length }.compact,
+               metadata: { context_length: model_info.context_length }
+             )
+           end
+
+           def offering_transport
+             config.respond_to?(:transport) ? config.transport : :http
+           end
+
+           def offering_tier
+             config.respond_to?(:tier) ? config.tier : :direct
+           end
+
            def render_payload(messages, tools:, temperature:, model:, stream:, schema:, thinking:, tool_prefs:) # rubocop:disable Metrics/ParameterLists
              payload = super
              payload.delete(:reasoning_effort)
@@ -131,7 +165,12 @@ module Legion
              return false unless defined?(Legion::Settings)

              vllm = Legion::Settings.dig(:llm, :providers, :vllm)
-             vllm.is_a?(Hash) && (vllm[:enable_thinking] == true || vllm['enable_thinking'] == true)
+             return false unless vllm.is_a?(Hash)
+
+             vllm[:enable_thinking] == true ||
+               vllm['enable_thinking'] == true ||
+               vllm.dig(:instances, :default, :enable_thinking) == true ||
+               vllm.dig('instances', 'default', 'enable_thinking') == true
            rescue StandardError => e
              handle_exception(e, level: :debug, handled: true, operation: 'vllm.thinking_setting')
              false
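
The thinking-setting hunk above now accepts four shapes: a top-level `enable_thinking` flag or one nested under `instances.default`, each with symbol or string keys. The following sketch isolates that predicate so the accepted and rejected shapes are easy to see. The method name `thinking_enabled?` is hypothetical (the enclosing method name is not visible in the hunk), and the `Legion::Settings` lookup and exception handling are left out.

```ruby
# Hypothetical standalone version of the predicate from the hunk above;
# it takes the settings hash directly instead of reading Legion::Settings.
def thinking_enabled?(vllm)
  return false unless vllm.is_a?(Hash)

  vllm[:enable_thinking] == true ||
    vllm['enable_thinking'] == true ||
    vllm.dig(:instances, :default, :enable_thinking) == true ||
    vllm.dig('instances', 'default', 'enable_thinking') == true
end

top_level   = { enable_thinking: true }
nested_sym  = { instances: { default: { enable_thinking: true } } }
nested_str  = { 'instances' => { 'default' => { 'enable_thinking' => true } } }
other_inst  = { instances: { local: { enable_thinking: true } } }
string_flag = { enable_thinking: 'true' }

[top_level, nested_sym, nested_str, other_inst, string_flag].each do |settings|
  puts "#{settings.inspect} => #{thinking_enabled?(settings)}"
end
```

Two details follow directly from the code as written: only the `default` instance is consulted, so a flag on another instance such as `local` has no effect here, and the comparison is strict, so the string `'true'` does not count.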
@@ -0,0 +1,30 @@
+ # frozen_string_literal: true
+
+ require 'legion/extensions/llm/fleet/provider_responder'
+ require 'legion/extensions/llm/vllm'
+
+ module Legion
+   module Extensions
+     module Llm
+       module Vllm
+         module Runners
+           # Runner entrypoint for vLLM fleet request execution.
+           module FleetWorker
+             module_function
+
+             def handle_fleet_request(payload, delivery: nil, properties: nil)
+               Legion::Extensions::Llm::Fleet::ProviderResponder.call(
+                 payload: payload,
+                 provider_family: Vllm::PROVIDER_FAMILY,
+                 provider_class: Vllm::Provider,
+                 provider_instances: -> { Vllm.discover_instances },
+                 delivery: delivery,
+                 properties: properties
+               )
+             end
+           end
+         end
+       end
+     end
+   end
+ end
@@ -4,7 +4,7 @@ module Legion
    module Extensions
      module Llm
        module Vllm
-         VERSION = '0.2.0'
+         VERSION = '0.2.6'
        end
      end
    end
@@ -16,17 +16,26 @@ module Legion
          PROVIDER_FAMILY = :vllm

          def self.default_settings
-           {
-             enabled: false,
-             base_url: 'localhost:8000/v1',
-             default_model: nil,
-             enable_thinking: true,
-             model_whitelist: [],
-             model_blacklist: [],
-             model_cache_ttl: 300,
-             tls: { enabled: false, verify: :peer },
-             instances: {}
-           }
+           ::Legion::Extensions::Llm.provider_settings(
+             family: PROVIDER_FAMILY,
+             instance: {
+               endpoint: 'http://localhost:8000',
+               tier: :direct,
+               transport: :http,
+               credentials: { api_key: nil },
+               enable_thinking: true,
+               usage: { inference: true, embedding: true, image: true },
+               limits: { concurrency: 1 },
+               fleet: {
+                 enabled: false,
+                 respond_to_requests: false,
+                 capabilities: %i[chat stream_chat embed],
+                 lanes: [],
+                 concurrency: 1,
+                 queue_suffix: nil
+               }
+             }
+           )
          end

          def self.provider_class
@@ -51,19 +60,26 @@ module Legion
            configured = CredentialSources.setting(:extensions, :llm, :vllm, :instances)
            if configured.is_a?(Hash)
              configured.each do |name, config|
-               instances[name.to_sym] = config.merge(tier: :direct)
+               instances[name.to_sym] = normalize_instance_config(config).merge(tier: :direct)
              end
            end

            instances
          end

-         if Legion::Extensions::Llm::Configuration.respond_to?(:register_provider_options)
-           Legion::Extensions::Llm::Configuration.register_provider_options(Provider.configuration_options)
+         def self.normalize_instance_config(config)
+           normalized = config.to_h.transform_keys(&:to_sym)
+           normalized[:vllm_api_base] ||= normalized.delete(:base_url)
+           normalized[:vllm_api_base] ||= normalized.delete(:api_base)
+           normalized[:vllm_api_base] ||= normalized.delete(:endpoint)
+           normalized[:vllm_api_base] = normalize_api_base(normalized[:vllm_api_base]) if normalized[:vllm_api_base]
+           normalized
+         end
+
+         def self.normalize_api_base(url)
+           url.to_s.sub(%r{/v1/?\z}, '')
          end
        end
      end
    end
  end
-
- Legion::Extensions::Llm::Vllm.register_discovered_instances
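
To make the alias handling in the hunk above concrete, the sketch below copies the two normalization methods into a standalone module and runs them on sample configs. The module name `NormalizeSketch` and the host names are hypothetical; the method bodies are taken from the diff unchanged.

```ruby
# Standalone copy of the two methods added in the hunk above, wrapped in a
# hypothetical module so they can run without the LegionIO runtime.
module NormalizeSketch
  module_function

  # Drops a trailing "/v1" or "/v1/" from the configured API root.
  def normalize_api_base(url)
    url.to_s.sub(%r{/v1/?\z}, '')
  end

  # Maps the base_url, api_base, and endpoint aliases onto :vllm_api_base.
  def normalize_instance_config(config)
    normalized = config.to_h.transform_keys(&:to_sym)
    normalized[:vllm_api_base] ||= normalized.delete(:base_url)
    normalized[:vllm_api_base] ||= normalized.delete(:api_base)
    normalized[:vllm_api_base] ||= normalized.delete(:endpoint)
    normalized[:vllm_api_base] = normalize_api_base(normalized[:vllm_api_base]) if normalized[:vllm_api_base]
    normalized
  end
end

# String keys and a trailing /v1 are both normalized.
single = NormalizeSketch.normalize_instance_config('base_url' => 'http://gpu-host:8000/v1')

# When several aliases are present, the first match in the order
# base_url, api_base, endpoint wins.
mixed = NormalizeSketch.normalize_instance_config(
  endpoint: 'http://gpu-host:8000/v1/',
  api_base: 'http://other-host:9000'
)

puts single.inspect
puts mixed.inspect
```

Because `||=` short-circuits once a value is found, a lower-priority alias is never deleted: in the second call `:endpoint` stays in the returned hash alongside `:vllm_api_base`. A config with none of the aliases comes back with `vllm_api_base: nil` rather than without the key.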
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: lex-llm-vllm
  version: !ruby/object:Gem::Version
-   version: 0.2.0
+   version: 0.2.6
  platform: ruby
  authors:
  - LegionIO
@@ -51,20 +51,34 @@ dependencies:
      - - ">="
        - !ruby/object:Gem::Version
          version: 1.3.14
+ - !ruby/object:Gem::Dependency
+   name: legion-transport
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: 1.4.14
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: 1.4.14
  - !ruby/object:Gem::Dependency
    name: lex-llm
    requirement: !ruby/object:Gem::Requirement
      requirements:
      - - ">="
        - !ruby/object:Gem::Version
-         version: 0.3.0
+         version: 0.4.3
    type: :runtime
    prerelease: false
    version_requirements: !ruby/object:Gem::Requirement
      requirements:
      - - ">="
        - !ruby/object:Gem::Version
-         version: 0.3.0
+         version: 0.4.3
  description: vLLM provider integration for the LegionIO LLM routing framework.
  email:
  - matthewdiverson@gmail.com
@@ -83,7 +97,9 @@ files:
  - README.md
  - lex-llm-vllm.gemspec
  - lib/legion/extensions/llm/vllm.rb
+ - lib/legion/extensions/llm/vllm/actors/fleet_worker.rb
  - lib/legion/extensions/llm/vllm/provider.rb
+ - lib/legion/extensions/llm/vllm/runners/fleet_worker.rb
  - lib/legion/extensions/llm/vllm/version.rb
  homepage: https://github.com/LegionIO/lex-llm-vllm
  licenses: