lex-llm-vllm 0.1.9 → 0.2.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: c7c1de4a067bd42d4675c0485f5e13c4d6fe3a1a17c29a2e23c46d266588dd20
-   data.tar.gz: fe503a3a436ef92bcc88015b1d608180d4f98a226cdac4d825c7433c812ee67c
+   metadata.gz: 14dc7ee5334135f8eece7622a27cb57bad3c9c885cb6203de56006b5d2a1b353
+   data.tar.gz: 59ea84f7c50a9407da2af50c51f77ad26894d45b7f06d9c0223198a73a7e22a8
  SHA512:
-   metadata.gz: 00bdc87460cf051250b56def2c2a910efe5ff058451a3eff26a7ad1254c5ec9441d3ddb592fa80fd600a83e8479d3593cd1771042e0bacadf0613bb33735ba26
-   data.tar.gz: 7d0df28f8edc25b269f64a987e63de90fe89d098e409bbd2dada20ac8f8b981caec3c046a3eeb591c7edb5c025efc80cd480837fc03b81348ab69e160bef9d2b
+   metadata.gz: 05c8fa912c908ec88943277dfd2e2b82f84bbd4ff6880d4e1f288f1397baf2cc1f72c2d20e894684be14be9f5f213e381ea5f524bfed3162e116de0271009f93
+   data.tar.gz: 14fffc18eff78d0c8751fbb96c9db6cbdd2773af66c46359b5934fad09d9a8fa37183a4f123f7cb75008e44cf821a21d97756d47633e44a50ab935f3f5743f0d
@@ -8,8 +8,20 @@ jobs:
    ci:
      uses: LegionIO/.github/.github/workflows/ci.yml@main

+   excluded-files:
+     uses: LegionIO/.github/.github/workflows/excluded-files.yml@main
+
+   security:
+     uses: LegionIO/.github/.github/workflows/security-scan.yml@main
+
+   version-changelog:
+     uses: LegionIO/.github/.github/workflows/version-changelog.yml@main
+
+   dependency-review:
+     uses: LegionIO/.github/.github/workflows/dependency-review.yml@main
+
    release:
-     needs: ci
+     needs: [ci, excluded-files, security]
      if: github.event_name == 'push' && github.ref == 'refs/heads/main'
      uses: LegionIO/.github/.github/workflows/release.yml@main
      secrets:
data/CHANGELOG.md CHANGED
@@ -1,5 +1,49 @@
  # Changelog

+ ## 0.2.6 - 2026-05-06
+
+ - Load provider-owned fleet actors through the LegionIO subscription base and the canonical vLLM provider root.
+ - Keep fleet runners anchored on the provider root namespace so provider constants and instance discovery are always loaded.
+ - Normalize configured `endpoint` and `api_base` aliases to `vllm_api_base`.
+ - Preserve configured transport and tier metadata when vLLM builds routing offerings.
+ - Gate release publishing on the shared security workflow.
+
+ ## 0.2.5 - 2026-05-06
+
+ - Mark handled vLLM offering-discovery failures as handled when logging through `Legion::Logging::Helper`.
+ - Refresh README dependency, defaults, and local verification guidance for the `lex-llm >= 0.4.3` fleet responder contract.
+
+ ## 0.2.4 - 2026-05-06
+
+ - Use the shared `lex-llm` fleet provider responder helper for provider-owned fleet workers.
+ - Remove the runtime `legion-llm` dependency and require `lex-llm >= 0.4.3` for responder-side fleet execution.
+
+ ## 0.2.3 - 2026-05-06
+
+ - Remove require-time provider self-registration; `legion-llm` now owns adapter creation and registry writes from loaded provider discovery metadata.
+ - Bump dependency floors to `lex-llm >= 0.4.1` and `legion-llm >= 0.9.1`.
+
+ ## 0.2.2 - 2026-05-06
+
+ - Enforce the shared keyword-only `lex-llm` provider contract and accept `health(live:)`.
+ - Move vLLM defaults back to `Legion::Extensions::Llm.provider_settings` with instance-level fleet responder settings.
+ - Read vLLM thinking defaults from the nested provider instance settings shape.
+ - Serve non-live vLLM offering reads from cached live model discovery instead of probing the configured endpoint.
+ - Add provider-owned fleet responder actor and runner backed by `legion-llm` fleet policy execution.
+ - Bump the transport dependency floor to `legion-transport >= 1.4.14`.
+
+ ## 0.2.1 - 2026-05-03
+
+ - Normalize configured `base_url` instance settings to `vllm_api_base` so LegionIO local settings are honored during provider registration.
+ - Strip a trailing `/v1` from configured vLLM API roots because OpenAI-compatible endpoints append their own `/v1/...` paths.
+
+ ## 0.2.0 - 2026-05-01
+
+ - Add auto-discovery via CredentialSources and AutoRegistration from lex-llm 0.3.0
+ - Self-register discovered instances into Call::Registry at require-time
+ - Require lex-llm >= 0.3.0
+
+
  ## 0.1.9 - 2026-04-30

  - Adopt base provider contract from lex-llm 0.1.9
data/Gemfile CHANGED
@@ -4,6 +4,8 @@ source 'https://rubygems.org'

  group :test do
    llm_base_path = ENV.fetch('LEX_LLM_PATH', File.expand_path('../lex-llm', __dir__))
+   transport_path = ENV.fetch('LEGION_TRANSPORT_PATH', File.expand_path('../../legion-transport', __dir__))
+   gem 'legion-transport', path: transport_path if File.directory?(transport_path)
    gem 'lex-llm', path: llm_base_path if File.directory?(llm_base_path)
  end

data/README.md CHANGED
@@ -2,7 +2,7 @@

  LegionIO LLM provider extension for [vLLM](https://docs.vllm.ai/).

- This gem lives under `Legion::Extensions::Llm::Vllm` and depends on `lex-llm` for shared provider-neutral routing, fleet, and schema primitives.
+ This gem lives under `Legion::Extensions::Llm::Vllm` and depends on `lex-llm >= 0.4.3` for shared provider-neutral routing, response normalization, fleet envelopes, responder-side fleet execution, and schema primitives.

  Load it with `require 'legion/extensions/llm/vllm'`.

@@ -19,7 +19,7 @@ Load it with `require 'legion/extensions/llm/vllm'`.
  - vLLM management helpers: `/health`, `/version`, `/reset_prefix_cache`, `/reset_mm_cache`, `/sleep`, `/wake_up`
  - Normalized OpenAI-compatible capability and modality metadata for discovered models
  - Shared fleet/default settings via `Legion::Extensions::Llm.provider_settings`
- - Full `Legion::Logging::Helper` integration with structured `handle_exception` across all classes
+ - Structured `Legion::Logging::Helper` handling for provider discovery and fallback paths

  ## Defaults

@@ -30,10 +30,20 @@ Legion::Extensions::Llm::Vllm.default_settings
  #   instances: {
  #     default: {
  #       endpoint: "http://localhost:8000",
- #       tier: :private,
+ #       tier: :direct,
  #       transport: :http,
- #       usage: { inference: true, embedding: true },
- #       limits: { concurrency: 8 }
+ #       credentials: { api_key: nil },
+ #       enable_thinking: true,
+ #       usage: { inference: true, embedding: true, image: true },
+ #       limits: { concurrency: 1 },
+ #       fleet: {
+ #         enabled: false,
+ #         respond_to_requests: false,
+ #         capabilities: [:chat, :stream_chat, :embed],
+ #         lanes: [],
+ #         concurrency: 1,
+ #         queue_suffix: nil
+ #       }
  #     }
  #   }
  # }
@@ -50,6 +60,25 @@ Legion::Extensions::Llm.configure do |config|
  end
  ```

+ ## Fleet Responder
+
+ Provider instances can opt in to consuming Legion LLM fleet requests. The provider-owned fleet actor only starts when at least one configured instance enables `respond_to_requests`, and request execution delegates to `Legion::Extensions::Llm::Fleet::ProviderResponder`.
+
+ ```yaml
+ extensions:
+   llm:
+     vllm:
+       instances:
+         local:
+           fleet:
+             enabled: true
+             respond_to_requests: true
+             capabilities:
+               - chat
+               - stream_chat
+               - embed
+ ```
+
  ### Thinking Mode

  Enable vLLM thinking mode globally via settings:
@@ -87,8 +116,8 @@ Publishing is async (background threads) and never blocks the caller. All failur

  ```bash
  bundle install
- bundle exec rspec
- bundle exec rubocop
+ bundle exec rspec --format json --out tmp/rspec_results.json --format progress --out tmp/rspec_progress.txt
+ bundle exec rubocop -A
  ```

  ## License
data/lex-llm-vllm.gemspec CHANGED
@@ -26,5 +26,6 @@ Gem::Specification.new do |spec|
    spec.add_dependency 'legion-json', '>= 1.2.1'
    spec.add_dependency 'legion-logging', '>= 1.3.2'
    spec.add_dependency 'legion-settings', '>= 1.3.14'
-   spec.add_dependency 'lex-llm', '>= 0.1.9'
+   spec.add_dependency 'legion-transport', '>= 1.4.14'
+   spec.add_dependency 'lex-llm', '>= 0.4.3'
  end
data/lib/legion/extensions/llm/vllm/actors/fleet_worker.rb ADDED
@@ -0,0 +1,43 @@
+ # frozen_string_literal: true
+
+ begin
+   require 'legion/extensions/actors/subscription'
+ rescue LoadError => e
+   warn(e.message) if $VERBOSE
+ end
+
+ unless defined?(Legion::Extensions::Actors::Subscription)
+   raise LoadError, 'LegionIO actor runtime is required for vLLM fleet worker'
+ end
+
+ require 'legion/extensions/llm/vllm'
+ require 'legion/extensions/llm/fleet/provider_responder'
+
+ module Legion
+   module Extensions
+     module Llm
+       module Vllm
+         module Actor
+           # Subscription actor for vLLM fleet request consumption.
+           class FleetWorker < Legion::Extensions::Actors::Subscription
+             def runner_class
+               'Legion::Extensions::Llm::Vllm::Runners::FleetWorker'
+             end
+
+             def runner_function
+               'handle_fleet_request'
+             end
+
+             def use_runner?
+               false
+             end
+
+             def enabled?
+               Legion::Extensions::Llm::Fleet::ProviderResponder.enabled_for?(Vllm.discover_instances)
+             end
+           end
+         end
+       end
+     end
+   end
+ end
data/lib/legion/extensions/llm/vllm/provider.rb CHANGED
@@ -46,8 +46,12 @@ module Legion

            def stream_usage_supported? = true

+           def settings
+             Vllm.default_settings
+           end
+
            def api_base
-             config.vllm_api_base || 'http://localhost:8000'
+             normalize_url(config.vllm_api_base || 'localhost:8000')
            end

            def headers
@@ -64,8 +68,8 @@ module Legion
            def sleep_url = '/sleep'
            def wake_up_url = '/wake_up'

-           def health
-             log.info { "checking health at #{api_base}#{health_url}" }
+           def health(live: false)
+             log.info { "checking health live=#{live} at #{api_base}#{health_url}" }
              connection.get(health_url).body
            end

@@ -84,6 +88,18 @@ module Legion
              end
            end

+           def discover_offerings(live: false, **)
+             models = if live
+                        @cached_models = list_models
+                      else
+                        Array(@cached_models)
+                      end
+             models.map { |model_info| offering_from_model(model_info) }
+           rescue StandardError => e
+             handle_exception(e, level: :warn, handled: true, operation: 'vllm.discover_offerings')
+             []
+           end
+
            def version
              log.info { "fetching version from #{api_base}#{version_url}" }
              connection.get(version_url).body
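Reviewer note on the `discover_offerings` hunk above: a non-live call never touches the endpoint, so it returns an empty list until some earlier call with `live: true` has filled `@cached_models`. The following is a minimal sketch of that pattern only. The `OfferingCacheSketch` class, its `probe_count` counter, and the stubbed `list_models` are hypothetical stand-ins and are not part of the gem.

```ruby
# Hypothetical stand-in for the provider; only the caching pattern mirrors the diff.
class OfferingCacheSketch
  attr_reader :probe_count

  def initialize(models)
    @models = models
    @probe_count = 0
  end

  # Stand-in for the real network call to the vLLM endpoint.
  def list_models
    @probe_count += 1
    @models
  end

  # live: true refreshes the cache; live: false only reads it.
  def discover_offerings(live: false, **)
    models = if live
               @cached_models = list_models
             else
               Array(@cached_models)
             end
    models.map { |name| { model: name } }
  rescue StandardError
    []
  end
end

sketch = OfferingCacheSketch.new(%w[model-a model-b])
cold = sketch.discover_offerings             # cache is empty, no probe happens
live = sketch.discover_offerings(live: true) # probes once and fills the cache
warm = sketch.discover_offerings             # served from the cache
```

In this sketch `cold` is empty, `warm` matches `live`, and the stubbed endpoint is probed exactly once.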
@@ -108,6 +124,28 @@ module Legion

            private

+           def offering_from_model(model_info)
+             Legion::Extensions::Llm::Routing::ModelOffering.new(
+               provider_family: :vllm,
+               instance_id: config.respond_to?(:instance_id) ? config.instance_id : :default,
+               transport: offering_transport,
+               tier: offering_tier,
+               model: model_info.id,
+               usage_type: model_info.embedding? ? :embedding : :inference,
+               capabilities: model_info.capabilities.map(&:to_s),
+               limits: { context_window: model_info.context_length }.compact,
+               metadata: { context_length: model_info.context_length }
+             )
+           end
+
+           def offering_transport
+             config.respond_to?(:transport) ? config.transport : :http
+           end
+
+           def offering_tier
+             config.respond_to?(:tier) ? config.tier : :direct
+           end
+
            def render_payload(messages, tools:, temperature:, model:, stream:, schema:, thinking:, tool_prefs:) # rubocop:disable Metrics/ParameterLists
              payload = super
              payload.delete(:reasoning_effort)
@@ -127,7 +165,12 @@ module Legion
              return false unless defined?(Legion::Settings)

              vllm = Legion::Settings.dig(:llm, :providers, :vllm)
-             vllm.is_a?(Hash) && (vllm[:enable_thinking] == true || vllm['enable_thinking'] == true)
+             return false unless vllm.is_a?(Hash)
+
+             vllm[:enable_thinking] == true ||
+               vllm['enable_thinking'] == true ||
+               vllm.dig(:instances, :default, :enable_thinking) == true ||
+               vllm.dig('instances', 'default', 'enable_thinking') == true
            rescue StandardError => e
              handle_exception(e, level: :debug, handled: true, operation: 'vllm.thinking_setting')
              false
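Reviewer note on the thinking-setting hunk above: the lookup now accepts the flag either at the top level or nested under `instances.default`, with all-symbol or all-string keys, and it only matches the literal `true`. A standalone sketch of the same checks follows. The `thinking_enabled?` helper is a hypothetical name that takes a plain Hash instead of reading `Legion::Settings`, and it omits the rescue path shown in the diff.

```ruby
# Hypothetical helper; the four checks are copied from the diff, the name is not.
def thinking_enabled?(vllm)
  return false unless vllm.is_a?(Hash)

  vllm[:enable_thinking] == true ||
    vllm['enable_thinking'] == true ||
    vllm.dig(:instances, :default, :enable_thinking) == true ||
    vllm.dig('instances', 'default', 'enable_thinking') == true
end

top_level = thinking_enabled?({ enable_thinking: true })
nested    = thinking_enabled?({ 'instances' => { 'default' => { 'enable_thinking' => true } } })
stringy   = thinking_enabled?({ instances: { default: { enable_thinking: 'true' } } })
mixed     = thinking_enabled?({ 'instances' => { default: { enable_thinking: true } } })
missing   = thinking_enabled?(nil)
```

Under these assumptions `top_level` and `nested` are true, while `stringy` (the string `'true'`), `mixed` (string outer key with symbol inner keys), and `missing` are all false.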
data/lib/legion/extensions/llm/vllm/runners/fleet_worker.rb ADDED
@@ -0,0 +1,30 @@
+ # frozen_string_literal: true
+
+ require 'legion/extensions/llm/fleet/provider_responder'
+ require 'legion/extensions/llm/vllm'
+
+ module Legion
+   module Extensions
+     module Llm
+       module Vllm
+         module Runners
+           # Runner entrypoint for vLLM fleet request execution.
+           module FleetWorker
+             module_function
+
+             def handle_fleet_request(payload, delivery: nil, properties: nil)
+               Legion::Extensions::Llm::Fleet::ProviderResponder.call(
+                 payload: payload,
+                 provider_family: Vllm::PROVIDER_FAMILY,
+                 provider_class: Vllm::Provider,
+                 provider_instances: -> { Vllm.discover_instances },
+                 delivery: delivery,
+                 properties: properties
+               )
+             end
+           end
+         end
+       end
+     end
+   end
+ end
data/lib/legion/extensions/llm/vllm/version.rb CHANGED
@@ -4,7 +4,7 @@ module Legion
    module Extensions
      module Llm
        module Vllm
-         VERSION = '0.1.9'
+         VERSION = '0.2.6'
        end
      end
    end
data/lib/legion/extensions/llm/vllm.rb CHANGED
@@ -11,21 +11,31 @@ module Legion
        module Vllm
          extend ::Legion::Extensions::Core if ::Legion::Extensions.const_defined?(:Core, false)
          extend Legion::Logging::Helper
+         extend Legion::Extensions::Llm::AutoRegistration

          PROVIDER_FAMILY = :vllm

          def self.default_settings
-           {
-             enabled: false,
-             base_url: 'localhost:8000/v1',
-             default_model: nil,
-             enable_thinking: true,
-             model_whitelist: [],
-             model_blacklist: [],
-             model_cache_ttl: 300,
-             tls: { enabled: false, verify: :peer },
-             instances: {}
-           }
+           ::Legion::Extensions::Llm.provider_settings(
+             family: PROVIDER_FAMILY,
+             instance: {
+               endpoint: 'http://localhost:8000',
+               tier: :direct,
+               transport: :http,
+               credentials: { api_key: nil },
+               enable_thinking: true,
+               usage: { inference: true, embedding: true, image: true },
+               limits: { concurrency: 1 },
+               fleet: {
+                 enabled: false,
+                 respond_to_requests: false,
+                 capabilities: %i[chat stream_chat embed],
+                 lanes: [],
+                 concurrency: 1,
+                 queue_suffix: nil
+               }
+             }
+           )
          end

          def self.provider_class
31
41
  def self.provider_class
@@ -36,7 +46,39 @@ module Legion
            @registry_publisher ||= Legion::Extensions::Llm::RegistryPublisher.new(provider_family: PROVIDER_FAMILY)
          end

-         Legion::Extensions::Llm::Configuration.register_provider_options(Provider.configuration_options)
+         def self.discover_instances
+           instances = {}
+
+           if CredentialSources.http_ok?('http://localhost:8000', path: '/health', timeout: 0.1)
+             instances[:local] = {
+               vllm_api_base: 'http://localhost:8000',
+               tier: :local,
+               capabilities: [:completion]
+             }
+           end
+
+           configured = CredentialSources.setting(:extensions, :llm, :vllm, :instances)
+           if configured.is_a?(Hash)
+             configured.each do |name, config|
+               instances[name.to_sym] = normalize_instance_config(config).merge(tier: :direct)
+             end
+           end
+
+           instances
+         end
+
+         def self.normalize_instance_config(config)
+           normalized = config.to_h.transform_keys(&:to_sym)
+           normalized[:vllm_api_base] ||= normalized.delete(:base_url)
+           normalized[:vllm_api_base] ||= normalized.delete(:api_base)
+           normalized[:vllm_api_base] ||= normalized.delete(:endpoint)
+           normalized[:vllm_api_base] = normalize_api_base(normalized[:vllm_api_base]) if normalized[:vllm_api_base]
+           normalized
+         end
+
+         def self.normalize_api_base(url)
+           url.to_s.sub(%r{/v1/?\z}, '')
+         end
        end
      end
    end
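Reviewer note on `normalize_instance_config` above: an explicit `vllm_api_base` wins, otherwise the first alias present is used in the order `base_url`, `api_base`, `endpoint`, and a trailing `/v1` is stripped from the result. Because `||=` short-circuits, alias keys that are not consumed stay in the hash. The sketch below re-states the two helpers as free-standing methods so the behavior can be tried outside the gem; the loop over alias keys is a compaction of the three `||=` lines, and the host names are made up for illustration.

```ruby
# Free-standing copies of the two helpers from the diff, for experimentation only.
def normalize_api_base(url)
  url.to_s.sub(%r{/v1/?\z}, '')
end

def normalize_instance_config(config)
  normalized = config.to_h.transform_keys(&:to_sym)
  # Same order as the diff: base_url, then api_base, then endpoint.
  %i[base_url api_base endpoint].each do |alias_key|
    normalized[:vllm_api_base] ||= normalized.delete(alias_key)
  end
  normalized[:vllm_api_base] = normalize_api_base(normalized[:vllm_api_base]) if normalized[:vllm_api_base]
  normalized
end

# Hypothetical hosts; string keys are symbolized and the /v1 suffix is dropped.
from_alias = normalize_instance_config('base_url' => 'http://gpu-box:8000/v1')
# An explicit vllm_api_base takes precedence and the unused alias is left in place.
explicit = normalize_instance_config(vllm_api_base: 'http://a:8000/v1/', endpoint: 'http://b:8000')
```

Here `from_alias` contains only `vllm_api_base: 'http://gpu-box:8000'`, and `explicit` keeps both `vllm_api_base: 'http://a:8000'` and the untouched `endpoint` key.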
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: lex-llm-vllm
  version: !ruby/object:Gem::Version
-   version: 0.1.9
+   version: 0.2.6
  platform: ruby
  authors:
  - LegionIO
@@ -51,20 +51,34 @@ dependencies:
      - - ">="
        - !ruby/object:Gem::Version
          version: 1.3.14
+ - !ruby/object:Gem::Dependency
+   name: legion-transport
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: 1.4.14
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: 1.4.14
  - !ruby/object:Gem::Dependency
    name: lex-llm
    requirement: !ruby/object:Gem::Requirement
      requirements:
      - - ">="
        - !ruby/object:Gem::Version
-         version: 0.1.9
+         version: 0.4.3
    type: :runtime
    prerelease: false
    version_requirements: !ruby/object:Gem::Requirement
      requirements:
      - - ">="
        - !ruby/object:Gem::Version
-         version: 0.1.9
+         version: 0.4.3
  description: vLLM provider integration for the LegionIO LLM routing framework.
  email:
  - matthewdiverson@gmail.com
@@ -83,7 +97,9 @@ files:
  - README.md
  - lex-llm-vllm.gemspec
  - lib/legion/extensions/llm/vllm.rb
+ - lib/legion/extensions/llm/vllm/actors/fleet_worker.rb
  - lib/legion/extensions/llm/vllm/provider.rb
+ - lib/legion/extensions/llm/vllm/runners/fleet_worker.rb
  - lib/legion/extensions/llm/vllm/version.rb
  homepage: https://github.com/LegionIO/lex-llm-vllm
  licenses: