lex-llm-vllm 0.1.7 → 0.1.8

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 3b6bccbfd1d8e01fd38459107474d9ca3853f7d847ff3b5d71a8df3ff7a66c4b
-  data.tar.gz: f2bd935851929d113f078301a08119a425a68c35907094ee66d69d10af3e5f6f
+  metadata.gz: 3dd53d60a8e1aed0d2e1af84c39bf869b31070b927a932d18a69f79990fdd1ec
+  data.tar.gz: 739b79d90f9b6744b3eef3ff355978820692337f909cb1bb863270fd0d8114d9
 SHA512:
-  metadata.gz: 837e7ea4d14a09dd44922cb6193e4650b92aea3c4eea8cd85ed7916d766c84b7f8887961b0fb72ab8a1578d4005742f61ed44435d181235bb4f26042aa6aecf8
-  data.tar.gz: 8c73bfdd7921d1f99d788d4a311be574fc7cb9f61c7ebb6a79bdf7ea4a68622f020ace60858134288eea85186beb3d4c32b97c5ac714515a124b8110f3253679
+  metadata.gz: 3f1c76258f803a948b304fca1e887d4b2d8368057914b761033e7cde5f3f44d926209f1598de2acba905f783443a8cd1015318193dfd594a11febacc9821334a
+  data.tar.gz: af6c18324720d51fb6460b45463955a1944beff9f16edca35b271fb1168dc0a1b3598b4ee3dbce7c1724b0dcef772b83770f330003cea13fee03187761968d23
data/CHANGELOG.md CHANGED
@@ -1,5 +1,14 @@
 # Changelog
 
+## 0.1.8 - 2026-04-30
+
+- Add `Legion::Logging::Helper` to all modules and classes for structured logging
+- Replace all bare rescue blocks with `handle_exception` calls for full observability
+- Add info-level action logging to Provider key actions (health, readiness, list_models, version)
+- Add info-level logging to RegistryPublisher publish methods
+- Remove custom `log_publish_failure` method in favor of standard `handle_exception`
+- Update README to reflect registry publishing, thinking mode, and management endpoints
+
 ## 0.1.7 - 2026-04-30
 
 - Enable stream_usage_supported? for streaming token usage reporting
data/README.md CHANGED
@@ -1,6 +1,6 @@
 # lex-llm-vllm
 
-LegionIO LLM provider extension for vLLM.
+LegionIO LLM provider extension for [vLLM](https://docs.vllm.ai/).
 
 This gem lives under `Legion::Extensions::Llm::Vllm` and depends on `lex-llm` for shared provider-neutral routing, fleet, and schema primitives.
 
@@ -9,14 +9,17 @@ Load it with `require 'legion/extensions/llm/vllm'`.
 ## What It Provides
 
 - `Legion::Extensions::Llm::Provider` registration as `:vllm`
-- shared `Legion::Extensions::Llm::Provider::OpenAICompatible` request and response handling
-- chat requests through `POST /v1/chat/completions`
-- streaming chat support
-- model discovery through `GET /v1/models`
-- embeddings through `POST /v1/embeddings`
-- vLLM management helpers for `/health`, `/version`, `/reset_prefix_cache`, `/reset_mm_cache`, `/sleep`, and `/wake_up`
-- normalized OpenAI-compatible capability and modality metadata for discovered models
-- shared fleet/default settings via `Legion::Extensions::Llm.provider_settings`
+- Shared `Legion::Extensions::Llm::Provider::OpenAICompatible` request and response handling
+- Chat requests through `POST /v1/chat/completions`
+- Streaming chat with `stream_usage_supported?` for token usage reporting
+- Model discovery through `GET /v1/models`
+- Embeddings through `POST /v1/embeddings`
+- vLLM thinking mode via `chat_template_kwargs` (configurable through `Legion::Settings`)
+- Best-effort `llm.registry` readiness and model availability event publishing when transport is loaded
+- vLLM management helpers: `/health`, `/version`, `/reset_prefix_cache`, `/reset_mm_cache`, `/sleep`, `/wake_up`
+- Normalized OpenAI-compatible capability and modality metadata for discovered models
+- Shared fleet/default settings via `Legion::Extensions::Llm.provider_settings`
 
 ## Defaults
 
@@ -47,4 +50,47 @@ Legion::Extensions::Llm.configure do |config|
 end
 ```
 
-vLLM's OpenAI-compatible server supports the chat completions, models, and embeddings APIs when the served model and task support them. Chat requests require a model with a chat template; embedding requests require an embedding-capable served model.
+### Thinking Mode
+
+Enable vLLM thinking mode globally via settings:
+
+```ruby
+# In Legion::Settings or settings JSON
+{ llm: { providers: { vllm: { enable_thinking: true } } } }
+```
+
+Or pass `thinking: { enabled: true }` per-request. When enabled, the provider adds `chat_template_kwargs: { enable_thinking: true }` to the payload and strips `reasoning_effort`.
+
+## Management Endpoints
+
+The provider exposes helpers for vLLM server management:
+
+| Method | Endpoint | Description |
+|--------|----------|-------------|
+| `health` | `GET /health` | Server health check |
+| `version` | `GET /version` | Server version info |
+| `reset_prefix_cache` | `POST /reset_prefix_cache` | Clear prefix cache |
+| `reset_mm_cache` | `POST /reset_mm_cache` | Clear multimodal cache |
+| `sleep(level:)` | `POST /sleep` | Put server to sleep |
+| `wake_up(tags:)` | `POST /wake_up` | Wake server up |
+
+## Registry Publishing
+
+When `lex-llm` routing and Legion transport are available, the provider publishes best-effort availability events to the `llm.registry` exchange:
+
+- **Readiness events** on `readiness(live: true)` calls
+- **Model availability events** on `list_models` discovery
+
+Publishing is async (background threads) and never blocks the caller. All failures are handled gracefully via `handle_exception`.
+
+## Development
+
+```bash
+bundle install
+bundle exec rspec
+bundle exec rubocop
+```
+
+## License
+
+MIT
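The README's Thinking Mode section and the provider hunk further down both hinge on the same nested settings lookup. That check can be sketched outside Legion as a plain function; `thinking_enabled?` here is a hypothetical stand-in that mirrors the symbol-or-string key guard visible in the provider diff, operating on an ordinary hash instead of `Legion::Settings`:

```ruby
# Hypothetical stand-in for the provider's enable_thinking lookup.
# Mirrors the guard from the diff:
#   vllm[:enable_thinking] == true || vllm['enable_thinking'] == true
# so both symbol-keyed and string-keyed settings hashes work.
def thinking_enabled?(settings)
  vllm = settings.dig(:llm, :providers, :vllm)
  vllm.is_a?(Hash) && (vllm[:enable_thinking] == true || vllm['enable_thinking'] == true)
rescue StandardError
  false
end

thinking_enabled?(llm: { providers: { vllm: { enable_thinking: true } } })      # => true
thinking_enabled?(llm: { providers: { vllm: { 'enable_thinking' => true } } })  # => true
thinking_enabled?(llm: { providers: { vllm: {} } })                             # => false
thinking_enabled?({})                                                           # => false
```

Defaulting to `false` on any lookup failure means a malformed settings tree silently disables thinking mode rather than breaking chat requests.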
@@ -10,6 +10,7 @@ module Legion
       # vLLM provider implementation for the Legion::Extensions::Llm base provider contract.
       class Provider < Legion::Extensions::Llm::Provider
         include Legion::Extensions::Llm::Provider::OpenAICompatible
+        include Legion::Logging::Helper
 
         class << self
           attr_writer :registry_publisher
@@ -66,22 +67,27 @@ module Legion
         def wake_up_url = '/wake_up'
 
         def health
+          log.info { "checking health at #{api_base}#{health_url}" }
           connection.get(health_url).body
         end
 
         def readiness(live: false)
+          log.info { "checking readiness live=#{live} at #{api_base}" }
           super.tap do |metadata|
             self.class.registry_publisher.publish_readiness_async(metadata) if live
           end
         end
 
         def list_models
+          log.info { "discovering models from #{api_base}#{models_url}" }
           super.tap do |models|
+            log.info { "discovered #{models.size} model(s) from vLLM" }
            self.class.registry_publisher.publish_models_async(models, readiness: readiness(live: false))
           end
         end
 
         def version
+          log.info { "fetching version from #{api_base}#{version_url}" }
           connection.get(version_url).body
         end
 
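The `log.info { ... }` calls added in the hunk above use the block form so that string interpolation only runs when the INFO level is actually enabled. The `log` receiver comes from `Legion::Logging::Helper` and is not modeled here, but the idiom itself can be demonstrated with Ruby's stdlib `Logger`:

```ruby
require 'logger'
require 'stringio'

out = StringIO.new
logger = Logger.new(out)
logger.level = Logger::WARN

calls = 0
# Block form: when INFO is below the configured level, the block
# (and its interpolation work) is never evaluated at all.
logger.info { calls += 1; 'discovering models from http://localhost:8000/v1/models' }
calls # => 0

logger.level = Logger::INFO
logger.info { calls += 1; 'discovered 3 model(s) from vLLM' }
calls # => 1
```

This is why wrapping the new provider log lines in blocks is essentially free in production when the log level is raised above INFO.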
@@ -124,7 +130,8 @@ module Legion
 
           vllm = Legion::Settings.dig(:llm, :providers, :vllm)
           vllm.is_a?(Hash) && (vllm[:enable_thinking] == true || vllm['enable_thinking'] == true)
-        rescue StandardError
+        rescue StandardError => e
+          handle_exception(e, level: :debug, handled: true, operation: 'vllm.thinking_setting')
           false
         end
 
@@ -6,6 +6,8 @@ module Legion
       module Vllm
         # Builds sanitized lex-llm registry envelopes for vLLM provider state.
         class RegistryEventBuilder
+          include Legion::Logging::Helper
+
           def readiness(readiness)
             registry_event_class.public_send(
               readiness[:ready] ? :available : :unavailable,
@@ -108,7 +110,8 @@ module Legion
            configured_node = (::Legion::Settings.dig(:node, :canonical_name) if defined?(::Legion::Settings))
            value = configured_node.to_s.strip
            value.empty? ? :vllm : value.to_sym
-          rescue StandardError
+          rescue StandardError => e
+            handle_exception(e, level: :debug, handled: true, operation: 'vllm.registry.provider_instance')
            :vllm
           end
 
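The `provider_instance` hunk above resolves the registry event's node identity with a layered fallback. Extracted from its Legion context, the logic can be sketched as a standalone function (a hypothetical stand-in; the real method reads `::Legion::Settings` rather than taking an argument):

```ruby
# Hypothetical stand-in for the builder's provider_instance fallback:
# use the configured canonical node name when present and non-blank,
# otherwise fall back to the :vllm symbol.
def provider_instance(configured_node)
  value = configured_node.to_s.strip
  value.empty? ? :vllm : value.to_sym
rescue StandardError
  :vllm
end

provider_instance('gpu-node-01') # => :"gpu-node-01"
provider_instance('   ')         # => :vllm
provider_instance(nil)           # => :vllm
```

The `to_s` call makes `nil` safe, and `strip` treats whitespace-only configuration the same as missing configuration.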
@@ -6,6 +6,8 @@ module Legion
       module Vllm
         # Best-effort publisher for vLLM provider availability events.
         class RegistryPublisher
+          include Legion::Logging::Helper
+
           APP_ID = 'lex-llm-vllm'
 
           def initialize(builder: RegistryEventBuilder.new)
@@ -13,10 +15,12 @@ module Legion
           end
 
           def publish_readiness_async(readiness)
+            log.info { 'publishing readiness event to llm.registry' }
             schedule { publish_event(@builder.readiness(readiness)) }
           end
 
           def publish_models_async(models, readiness:)
+            log.info { "publishing #{Array(models).size} model event(s) to llm.registry" }
             schedule do
               Array(models).each do |model|
                 publish_event(@builder.model_available(model, readiness:))
@@ -33,10 +37,10 @@ module Legion
               Thread.current.abort_on_exception = false
               yield
             rescue StandardError => e
-              log_publish_failure(e, level: :debug)
+              handle_exception(e, level: :debug, handled: true, operation: 'vllm.registry.schedule_thread')
             end
           rescue StandardError => e
-            log_publish_failure(e, level: :debug)
+            handle_exception(e, level: :debug, handled: true, operation: 'vllm.registry.schedule')
             false
           end
 
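The `schedule` method patched above embodies the README's "async and never blocks the caller" promise: work runs on a background thread, and failures are reported but never propagated. The shape of that pattern can be sketched as follows, with the `handle_exception` reporting replaced by a caller-supplied callback since the Legion helper is not modeled here:

```ruby
# Best-effort scheduler sketch: run the work on a background thread
# and never let a failure escape to the caller. on_error is a
# stand-in for handle_exception(e, level:, handled:, operation:).
def schedule_best_effort(on_error)
  Thread.new do
    # Don't let this thread's failure take down the process.
    Thread.current.abort_on_exception = false
    yield
  rescue StandardError => e
    on_error.call(e) # failure inside the background thread
  end
rescue StandardError => e
  on_error.call(e)   # failure spawning the thread itself
  false
end

errors = []
t = schedule_best_effort(->(e) { errors << e.class }) { raise IOError, 'broker down' }
t.join
errors # => [IOError]
```

Note the two rescue layers, matching the diff: one inside the thread body for publish failures, one around `Thread.new` itself so even thread-creation errors degrade to a quiet `false`.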
@@ -45,7 +49,7 @@ module Legion
 
           message_class.new(event:, app_id: APP_ID).publish(spool: false)
         rescue StandardError => e
-          log_publish_failure(e)
+          handle_exception(e, level: :warn, handled: true, operation: 'vllm.registry.publish_event')
           false
         end
 
@@ -56,7 +60,8 @@ module Legion
           return true unless ::Legion::Transport::Connection.respond_to?(:session_open?)
 
           ::Legion::Transport::Connection.session_open?
-        rescue StandardError
+        rescue StandardError => e
+          handle_exception(e, level: :debug, handled: true, operation: 'vllm.registry.publishing_available?')
           false
         end
 
@@ -70,7 +75,8 @@ module Legion
 
           require 'legion/extensions/llm/vllm/transport/messages/registry_event'
           message_class_defined?
-        rescue LoadError
+        rescue LoadError => e
+          handle_exception(e, level: :debug, handled: true, operation: 'vllm.registry.transport_load')
           false
         end
 
@@ -81,18 +87,6 @@ module Legion
         def message_class
           ::Legion::Extensions::Llm::Vllm::Transport::Messages::RegistryEvent
         end
-
-        def log_publish_failure(error, level: :warn)
-          message = "[lex-llm-vllm] llm.registry publish failed: #{error.class}: #{error.message}"
-          logger = ::Legion::Extensions::Llm.logger if defined?(::Legion::Extensions::Llm)
-          if logger.respond_to?(level)
-            logger.public_send(level, message)
-          elsif logger.respond_to?(:debug)
-            logger.debug(message)
-          end
-        rescue StandardError
-          nil
-        end
       end
     end
   end
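The removed `log_publish_failure` helper formatted a flat string; the `handle_exception` calls replacing it carry structured fields (`level`, `handled`, `operation`) instead. Legion's actual implementation is not shown in this diff, so the following is only a hypothetical sketch of a handler matching the call shape seen above, emitting one JSON line per event:

```ruby
require 'json'

# Hypothetical structured handler matching the call shape from the diff:
#   handle_exception(e, level: :debug, handled: true, operation: '...')
# Legion's real helper may attach more (backtraces, correlation ids, etc.).
def handle_exception(error, level: :warn, handled: false, operation: nil)
  record = {
    level: level,
    handled: handled,
    operation: operation,
    error: error.class.name,
    message: error.message
  }
  $stderr.puts(JSON.generate(record)) # one machine-parseable line per event
  record
end

begin
  raise ArgumentError, 'bad payload'
rescue StandardError => e
  handle_exception(e, level: :warn, handled: true, operation: 'vllm.registry.publish_event')
end
```

Compared with the deleted string-formatting helper, the structured record lets downstream log pipelines filter on `operation` or `handled` without regex parsing.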
@@ -4,7 +4,7 @@ module Legion
   module Extensions
     module Llm
       module Vllm
-        VERSION = '0.1.7'
+        VERSION = '0.1.8'
       end
     end
   end
@@ -12,6 +12,7 @@ module Legion
     # Vllm provider extension namespace.
     module Vllm
       extend ::Legion::Extensions::Core if ::Legion::Extensions.const_defined?(:Core, false)
+      extend Legion::Logging::Helper
 
       PROVIDER_FAMILY = :vllm
 
metadata CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: lex-llm-vllm
 version: !ruby/object:Gem::Version
-  version: 0.1.7
+  version: 0.1.8
 platform: ruby
 authors:
 - LegionIO