lex-llm-vllm 0.1.7 → 0.1.8
- checksums.yaml +4 -4
- data/CHANGELOG.md +9 -0
- data/README.md +56 -10
- data/lib/legion/extensions/llm/vllm/provider.rb +8 -1
- data/lib/legion/extensions/llm/vllm/registry_event_builder.rb +4 -1
- data/lib/legion/extensions/llm/vllm/registry_publisher.rb +11 -17
- data/lib/legion/extensions/llm/vllm/version.rb +1 -1
- data/lib/legion/extensions/llm/vllm.rb +1 -0
- metadata +1 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 3dd53d60a8e1aed0d2e1af84c39bf869b31070b927a932d18a69f79990fdd1ec
+  data.tar.gz: 739b79d90f9b6744b3eef3ff355978820692337f909cb1bb863270fd0d8114d9
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 3f1c76258f803a948b304fca1e887d4b2d8368057914b761033e7cde5f3f44d926209f1598de2acba905f783443a8cd1015318193dfd594a11febacc9821334a
+  data.tar.gz: af6c18324720d51fb6460b45463955a1944beff9f16edca35b271fb1168dc0a1b3598b4ee3dbce7c1724b0dcef772b83770f330003cea13fee03187761968d23
data/CHANGELOG.md
CHANGED
@@ -1,5 +1,14 @@
 # Changelog
 
+## 0.1.8 - 2026-04-30
+
+- Add `Legion::Logging::Helper` to all modules and classes for structured logging
+- Replace all bare rescue blocks with `handle_exception` calls for full observability
+- Add info-level action logging to Provider key actions (health, readiness, list_models, version)
+- Add info-level logging to RegistryPublisher publish methods
+- Remove custom `log_publish_failure` method in favor of standard `handle_exception`
+- Update README to reflect registry publishing, thinking mode, and management endpoints
+
 ## 0.1.7 - 2026-04-30
 
 - Enable stream_usage_supported? for streaming token usage reporting
data/README.md
CHANGED
@@ -1,6 +1,6 @@
 # lex-llm-vllm
 
-LegionIO LLM provider extension for vLLM.
+LegionIO LLM provider extension for [vLLM](https://docs.vllm.ai/).
 
 This gem lives under `Legion::Extensions::Llm::Vllm` and depends on `lex-llm` for shared provider-neutral routing, fleet, and schema primitives.
 
@@ -9,14 +9,17 @@ Load it with `require 'legion/extensions/llm/vllm'`.
 ## What It Provides
 
 - `Legion::Extensions::Llm::Provider` registration as `:vllm`
--
--
--
--
--
-- vLLM
--
--
+- Shared `Legion::Extensions::Llm::Provider::OpenAICompatible` request and response handling
+- Chat requests through `POST /v1/chat/completions`
+- Streaming chat with `stream_usage_supported?` for token usage reporting
+- Model discovery through `GET /v1/models`
+- Embeddings through `POST /v1/embeddings`
+- vLLM thinking mode via `chat_template_kwargs` (configurable through `Legion::Settings`)
+- Best-effort `llm.registry` readiness and model availability event publishing when transport is loaded
+- vLLM management helpers: `/health`, `/version`, `/reset_prefix_cache`, `/reset_mm_cache`, `/sleep`, `/wake_up`
+- Normalized OpenAI-compatible capability and modality metadata for discovered models
+- Shared fleet/default settings via `Legion::Extensions::Llm.provider_settings`
+- Full `Legion::Logging::Helper` integration with structured `handle_exception` across all classes
 
 ## Defaults
 
@@ -47,4 +50,47 @@ Legion::Extensions::Llm.configure do |config|
 end
 ```
 
-
+### Thinking Mode
+
+Enable vLLM thinking mode globally via settings:
+
+```ruby
+# In Legion::Settings or settings JSON
+{ llm: { providers: { vllm: { enable_thinking: true } } } }
+```
+
+Or pass `thinking: { enabled: true }` per-request. When enabled, the provider adds `chat_template_kwargs: { enable_thinking: true }` to the payload and strips `reasoning_effort`.
+
+## Management Endpoints
+
+The provider exposes helpers for vLLM server management:
+
+| Method | Endpoint | Description |
+|--------|----------|-------------|
+| `health` | `GET /health` | Server health check |
+| `version` | `GET /version` | Server version info |
+| `reset_prefix_cache` | `POST /reset_prefix_cache` | Clear prefix cache |
+| `reset_mm_cache` | `POST /reset_mm_cache` | Clear multimodal cache |
+| `sleep(level:)` | `POST /sleep` | Put server to sleep |
+| `wake_up(tags:)` | `POST /wake_up` | Wake server up |
+
+## Registry Publishing
+
+When `lex-llm` routing and Legion transport are available, the provider publishes best-effort availability events to the `llm.registry` exchange:
+
+- **Readiness events** on `readiness(live: true)` calls
+- **Model availability events** on `list_models` discovery
+
+Publishing is async (background threads) and never blocks the caller. All failures are handled gracefully via `handle_exception`.
+
+## Development
+
+```bash
+bundle install
+bundle exec rspec
+bundle exec rubocop
+```
+
+## License
+
+MIT
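The Thinking Mode section added to the README states that, when thinking is enabled, the provider adds `chat_template_kwargs: { enable_thinking: true }` to the payload and strips `reasoning_effort`. The sketch below shows that transformation on a plain Hash. It is not the gem's code: `apply_thinking` and the example request are hypothetical names, and merging with any existing `chat_template_kwargs` is an assumption.

```ruby
# Hypothetical helper that mirrors the behavior documented in the README.
# It is not part of lex-llm-vllm.
def apply_thinking(payload, enabled:)
  return payload unless enabled

  # Assumption: any existing chat_template_kwargs are kept and merged.
  kwargs = (payload[:chat_template_kwargs] || {}).merge(enable_thinking: true)

  payload
    .reject { |key, _| key == :reasoning_effort } # strip reasoning_effort
    .merge(chat_template_kwargs: kwargs)          # add the thinking flag
end

# Example request; the model name is a placeholder.
request = { model: 'example-model', messages: [], reasoning_effort: 'high' }
thinking_request = apply_thinking(request, enabled: true)
```

The helper returns a new Hash and leaves the caller's request untouched, which keeps a per-request override from leaking into later calls.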
data/lib/legion/extensions/llm/vllm/provider.rb
CHANGED
@@ -10,6 +10,7 @@ module Legion
         # vLLM provider implementation for the Legion::Extensions::Llm base provider contract.
         class Provider < Legion::Extensions::Llm::Provider
           include Legion::Extensions::Llm::Provider::OpenAICompatible
+          include Legion::Logging::Helper
 
           class << self
             attr_writer :registry_publisher
@@ -66,22 +67,27 @@ module Legion
           def wake_up_url = '/wake_up'
 
           def health
+            log.info { "checking health at #{api_base}#{health_url}" }
             connection.get(health_url).body
           end
 
           def readiness(live: false)
+            log.info { "checking readiness live=#{live} at #{api_base}" }
             super.tap do |metadata|
               self.class.registry_publisher.publish_readiness_async(metadata) if live
             end
           end
 
           def list_models
+            log.info { "discovering models from #{api_base}#{models_url}" }
             super.tap do |models|
+              log.info { "discovered #{models.size} model(s) from vLLM" }
               self.class.registry_publisher.publish_models_async(models, readiness: readiness(live: false))
             end
           end
 
           def version
+            log.info { "fetching version from #{api_base}#{version_url}" }
             connection.get(version_url).body
           end
 
@@ -124,7 +130,8 @@ module Legion
 
             vllm = Legion::Settings.dig(:llm, :providers, :vllm)
             vllm.is_a?(Hash) && (vllm[:enable_thinking] == true || vllm['enable_thinking'] == true)
-          rescue StandardError
+          rescue StandardError => e
+            handle_exception(e, level: :debug, handled: true, operation: 'vllm.thinking_setting')
             false
           end
 
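The new `log.info { ... }` calls in the provider pass the message as a block. What `log` returns is defined by `Legion::Logging::Helper`, which this diff does not show. Assuming it behaves like Ruby's standard `Logger`, the block form defers building the interpolated string until the info level is actually enabled. A minimal sketch with the stdlib logger follows; `build_message` and the URL are hypothetical.

```ruby
require 'logger'
require 'stringio'

output = StringIO.new
log = Logger.new(output)
log.level = Logger::WARN # info is below the threshold

evaluations = 0
build_message = lambda do
  evaluations += 1
  'checking health at http://localhost:8000/health' # hypothetical URL
end

# Block form: Logger skips the block entirely while info is disabled.
log.info { build_message.call }
skipped = evaluations

# Once info is enabled, the block runs and the message is written.
log.level = Logger::INFO
log.info { build_message.call }
```

With a string argument instead of a block, the interpolation would run on every call regardless of level, which matters for methods such as `health` that may be polled frequently.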
data/lib/legion/extensions/llm/vllm/registry_event_builder.rb
CHANGED
@@ -6,6 +6,8 @@ module Legion
       module Vllm
         # Builds sanitized lex-llm registry envelopes for vLLM provider state.
         class RegistryEventBuilder
+          include Legion::Logging::Helper
+
           def readiness(readiness)
             registry_event_class.public_send(
               readiness[:ready] ? :available : :unavailable,
@@ -108,7 +110,8 @@ module Legion
             configured_node = (::Legion::Settings.dig(:node, :canonical_name) if defined?(::Legion::Settings))
             value = configured_node.to_s.strip
             value.empty? ? :vllm : value.to_sym
-          rescue StandardError
+          rescue StandardError => e
+            handle_exception(e, level: :debug, handled: true, operation: 'vllm.registry.provider_instance')
             :vllm
           end
 
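The `rescue` branches changed in this release share one shape: report the error through `handle_exception`, then return a safe default. The sketch below reproduces that shape around the node-name lookup from the hunk above. `handle_exception` is stubbed because the real method comes from `Legion::Logging::Helper` and is not shown in this diff; `NodeNameResolver` and its `reported` list are hypothetical.

```ruby
# Hypothetical class; only the rescue-report-fallback shape is taken from the diff.
class NodeNameResolver
  attr_reader :reported

  def initialize(settings)
    @settings = settings
    @reported = []
  end

  def provider_instance
    value = @settings.dig(:node, :canonical_name).to_s.strip
    value.empty? ? :vllm : value.to_sym
  rescue StandardError => e
    handle_exception(e, level: :debug, handled: true, operation: 'vllm.registry.provider_instance')
    :vllm
  end

  private

  # Stub: records the call instead of logging it.
  def handle_exception(error, level:, handled:, operation:)
    @reported << { error: error.class, level: level, handled: handled, operation: operation }
  end
end

configured = NodeNameResolver.new({ node: { canonical_name: ' gpu-01 ' } }).provider_instance
blank = NodeNameResolver.new({ node: {} }).provider_instance
broken = NodeNameResolver.new(nil) # nil does not respond to dig, so the lookup raises
fallback = broken.provider_instance
```

Compared with the bare `rescue StandardError` it replaces, this shape keeps the same return value but leaves a record of why the default was used.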
data/lib/legion/extensions/llm/vllm/registry_publisher.rb
CHANGED
@@ -6,6 +6,8 @@ module Legion
       module Vllm
         # Best-effort publisher for vLLM provider availability events.
         class RegistryPublisher
+          include Legion::Logging::Helper
+
           APP_ID = 'lex-llm-vllm'
 
           def initialize(builder: RegistryEventBuilder.new)
@@ -13,10 +15,12 @@ module Legion
           end
 
           def publish_readiness_async(readiness)
+            log.info { 'publishing readiness event to llm.registry' }
             schedule { publish_event(@builder.readiness(readiness)) }
           end
 
           def publish_models_async(models, readiness:)
+            log.info { "publishing #{Array(models).size} model event(s) to llm.registry" }
             schedule do
               Array(models).each do |model|
                 publish_event(@builder.model_available(model, readiness:))
@@ -33,10 +37,10 @@ module Legion
               Thread.current.abort_on_exception = false
               yield
             rescue StandardError => e
-
+              handle_exception(e, level: :debug, handled: true, operation: 'vllm.registry.schedule_thread')
             end
           rescue StandardError => e
-
+            handle_exception(e, level: :debug, handled: true, operation: 'vllm.registry.schedule')
             false
           end
 
@@ -45,7 +49,7 @@ module Legion
 
             message_class.new(event:, app_id: APP_ID).publish(spool: false)
           rescue StandardError => e
-
+            handle_exception(e, level: :warn, handled: true, operation: 'vllm.registry.publish_event')
             false
           end
 
@@ -56,7 +60,8 @@ module Legion
             return true unless ::Legion::Transport::Connection.respond_to?(:session_open?)
 
             ::Legion::Transport::Connection.session_open?
-          rescue StandardError
+          rescue StandardError => e
+            handle_exception(e, level: :debug, handled: true, operation: 'vllm.registry.publishing_available?')
             false
           end
 
@@ -70,7 +75,8 @@ module Legion
 
             require 'legion/extensions/llm/vllm/transport/messages/registry_event'
             message_class_defined?
-          rescue LoadError
+          rescue LoadError => e
+            handle_exception(e, level: :debug, handled: true, operation: 'vllm.registry.transport_load')
             false
           end
 
@@ -81,18 +87,6 @@ module Legion
           def message_class
             ::Legion::Extensions::Llm::Vllm::Transport::Messages::RegistryEvent
           end
-
-          def log_publish_failure(error, level: :warn)
-            message = "[lex-llm-vllm] llm.registry publish failed: #{error.class}: #{error.message}"
-            logger = ::Legion::Extensions::Llm.logger if defined?(::Legion::Extensions::Llm)
-            if logger.respond_to?(level)
-              logger.public_send(level, message)
-            elsif logger.respond_to?(:debug)
-              logger.debug(message)
-            end
-          rescue StandardError
-            nil
-          end
         end
       end
     end
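The `schedule` hunk above shows the publisher's best-effort threading: the work runs with `abort_on_exception` disabled, and errors are reported through `handle_exception` instead of being raised to the caller. The rest of the method body is not in the diff, so the following is a sketch under assumptions: `BestEffortScheduler`, the `Thread.new` call, and the stubbed `handle_exception` are illustrative, not the gem's code.

```ruby
# Hypothetical class illustrating best-effort background execution.
class BestEffortScheduler
  attr_reader :reported

  def initialize
    @reported = Queue.new # thread-safe record of reported errors
  end

  # Runs the block in a background thread.
  # Returns the thread, or false if the thread could not be started.
  def schedule
    Thread.new do
      Thread.current.abort_on_exception = false
      yield
    rescue StandardError => e
      handle_exception(e, level: :debug, handled: true, operation: 'vllm.registry.schedule_thread')
    end
  rescue StandardError => e
    handle_exception(e, level: :debug, handled: true, operation: 'vllm.registry.schedule')
    false
  end

  private

  # Stub: the real method comes from Legion::Logging::Helper.
  def handle_exception(error, level:, handled:, operation:)
    @reported << [operation, error.message]
  end
end

scheduler = BestEffortScheduler.new
results = Queue.new
ok = scheduler.schedule { results << :published }
failed = scheduler.schedule { raise 'broker unavailable' }
[ok, failed].each(&:join) # join only so the example is deterministic
```

The two `rescue` clauses cover different failures: the inner one catches errors raised by the scheduled work, while the outer one catches a failure to start the thread at all. A real caller would not join the threads, since the point is to avoid blocking.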