lex-ollama 0.3.2 → 0.3.3
- checksums.yaml +4 -4
- data/CHANGELOG.md +9 -0
- data/CLAUDE.md +5 -4
- data/README.md +35 -5
- data/lib/legion/extensions/ollama/actors/model_sync.rb +90 -0
- data/lib/legion/extensions/ollama/actors/model_worker.rb +18 -11
- data/lib/legion/extensions/ollama/transport.rb +2 -7
- data/lib/legion/extensions/ollama/version.rb +1 -1
- data/lib/legion/extensions/ollama.rb +1 -1
- metadata +2 -2
- data/lib/legion/extensions/ollama/transport/queues/model_request.rb +0 -58
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: '086cd0d9744893c480b69c7b9fc6a10229af627344d316cafbccbd72d01cdafe'
|
|
4
|
+
data.tar.gz: e88f7893305401d36d8a3d48a7fb057e831fb6edeb245a5691c3d1e2527e1f1f
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: f6154fee6005bb96262961342983f01c5d4e5f2747c2cd11a33afbebc1f5ffa00ea64e4f39ef63758620524a5c6690b1a81856c1658371acc040fdd2295c8731
|
|
7
|
+
data.tar.gz: 0dda701fb81e1f084bc3cc4404d8014d6e2b6ecc68153fcdb42009c10a337f4a078ff11890a4032d95d516a5c1bc5177f4a8e1492db4caea47e9da1641fe6669
|
data/CHANGELOG.md
CHANGED
|
@@ -1,5 +1,14 @@
|
|
|
1
1
|
# Changelog
|
|
2
2
|
|
|
3
|
+
## [0.3.3] - 2026-04-16
|
|
4
|
+
|
|
5
|
+
### Added
|
|
6
|
+
- `Actor::ModelSync` — once actor; runs 5s after extension load; reads `legion.ollama.default_models` and `legion.ollama.s3` from settings; calls `import_from_s3` for any configured model not already present on disk; no-op if either setting is absent
|
|
7
|
+
|
|
8
|
+
### Fixed
|
|
9
|
+
- `Transport::Queues::ModelRequest` deleted — the framework auto-discovers every file in `transport/queues/` and calls `.new` with no arguments at startup, which crashed because `ModelRequest` required `request_type:` and `model:`; the queue definition is now an anonymous class created inline by `Actor::ModelWorker#build_queue_class`
|
|
10
|
+
- `Actor::ModelWorker#queue` now returns a CLASS instead of an instance — `Subscription#initialize` calls `queue.new`, so returning an instance caused a silent `NoMethodError` on `NilClass#new`; the anonymous queue class has `queue_name`, `queue_options`, `dlx_enabled`, and `initialize` (exchange bind) defined inline via `define_method`
|
|
11
|
+
|
|
3
12
|
## [0.3.2] - 2026-04-08
|
|
4
13
|
|
|
5
14
|
### Changed
|
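Note on the first `Fixed` entry above: the crash it describes can be reproduced in isolation. The following is a minimal sketch, assuming a loader that calls `.new` with no arguments on each discovered queue class. `ParametricQueue` and `instantiate_all` are hypothetical stand-ins, not Legion framework code.

```ruby
# Hypothetical stand-in for a queue class that requires keyword arguments,
# like the deleted ModelRequest (request_type:, model:).
class ParametricQueue
  def initialize(request_type:, model:)
    @request_type = request_type
    @model = model
  end
end

# Simulates an auto-discovery loader that instantiates every class it finds
# without passing any arguments.
def instantiate_all(queue_classes)
  queue_classes.map do |klass|
    klass.new
    "#{klass.name}: ok"
  rescue ArgumentError => e
    "#{klass.name}: #{e.class} (#{e.message})"
  end
end

RESULTS = instantiate_all([ParametricQueue])
puts RESULTS
```

Any class placed where such a loader can see it has to be constructible with zero arguments, which is presumably why the parametric queue was moved out of `transport/queues/`.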
data/CLAUDE.md
CHANGED
|
@@ -13,7 +13,7 @@ reporting, and **fleet queue subscription** for receiving routed LLM requests fr
|
|
|
13
13
|
**GitHub**: https://github.com/LegionIO/lex-ollama
|
|
14
14
|
**License**: MIT
|
|
15
15
|
**Version**: 0.3.2
|
|
16
|
-
**Specs**:
|
|
16
|
+
**Specs**: 166 examples (17 spec files)
|
|
17
17
|
|
|
18
18
|
---
|
|
19
19
|
|
|
@@ -181,8 +181,9 @@ message_context:
|
|
|
181
181
|
A subset (`conversation_id`, `message_id`, `request_id`) is promoted to AMQP headers
|
|
182
182
|
(`x-legion-llm-conversation-id`, etc.) for filtering without body parsing.
|
|
183
183
|
|
|
184
|
-
|
|
185
|
-
|
|
184
|
+
The wire protocol spec (AMQP property mapping, platform-wide standard, per-message-type
|
|
185
|
+
specifications) was developed during the fleet design phase and is maintained in the
|
|
186
|
+
legion-llm repository alongside the implementation.
|
|
186
187
|
|
|
187
188
|
---
|
|
188
189
|
|
|
@@ -210,4 +211,4 @@ bundle exec rubocop
|
|
|
210
211
|
---
|
|
211
212
|
|
|
212
213
|
**Maintained By**: Matthew Iverson (@Esity)
|
|
213
|
-
**Last Updated**: 2026-04-
|
|
214
|
+
**Last Updated**: 2026-04-10
|
data/README.md
CHANGED
|
@@ -44,6 +44,36 @@ gem install lex-ollama
|
|
|
44
44
|
### Version
|
|
45
45
|
- `server_version` - Retrieve the Ollama server version (GET /api/version)
|
|
46
46
|
|
|
47
|
+
### Fleet Queue Subscription
|
|
48
|
+
- `handle_request` - Dispatch inbound fleet AMQP messages to the appropriate runner (chat/embed/generate)
|
|
49
|
+
|
|
50
|
+
When `Legion::Extensions::Core` is present, lex-ollama subscribes to model-scoped queues on the
|
|
51
|
+
`llm.request` topic exchange, accepting routed LLM inference work from other Legion fleet members.
|
|
52
|
+
|
|
53
|
+
Each configured `(type, model)` pair gets its own auto-delete queue with routing key
|
|
54
|
+
`llm.request.ollama.<type>.<model>`. Multiple nodes serving the same model compete fairly
|
|
55
|
+
via RabbitMQ round-robin with consumer priority.
|
|
56
|
+
|
|
57
|
+
```yaml
|
|
58
|
+
legion:
|
|
59
|
+
ollama:
|
|
60
|
+
host: "http://localhost:11434"
|
|
61
|
+
fleet:
|
|
62
|
+
consumer_priority: 10 # H100: 10, Mac Studio: 5, MacBook: 1
|
|
63
|
+
subscriptions:
|
|
64
|
+
- type: embed
|
|
65
|
+
model: nomic-embed-text
|
|
66
|
+
- type: chat
|
|
67
|
+
model: "qwen3.5:27b"
|
|
68
|
+
```
|
|
69
|
+
|
|
70
|
+
Fleet messages use the wire protocol defined in `legion-llm`: typed AMQP messages
|
|
71
|
+
(`llm.fleet.request` / `llm.fleet.response` / `llm.fleet.error`) with `message_context`
|
|
72
|
+
propagation for end-to-end tracing.
|
|
73
|
+
|
|
74
|
+
Without `Legion::Extensions::Core`, the gem works as a pure HTTP client library with no
|
|
75
|
+
AMQP dependency.
|
|
76
|
+
|
|
47
77
|
## Standalone Client
|
|
48
78
|
|
|
49
79
|
```ruby
|
|
@@ -85,21 +115,21 @@ Pull models from an internal S3 mirror instead of the public Ollama registry:
|
|
|
85
115
|
client = Legion::Extensions::Ollama::Client.new
|
|
86
116
|
|
|
87
117
|
# List available models in S3
|
|
88
|
-
client.list_s3_models(bucket: 'legion', endpoint: 'https://
|
|
118
|
+
client.list_s3_models(bucket: 'legion', endpoint: 'https://s3.example.internal')
|
|
89
119
|
|
|
90
120
|
# Import directly to filesystem (works without Ollama running)
|
|
91
121
|
client.import_from_s3(model: 'llama3:latest', bucket: 'legion',
|
|
92
|
-
endpoint: 'https://
|
|
122
|
+
endpoint: 'https://s3.example.internal')
|
|
93
123
|
|
|
94
124
|
# Push through Ollama API (requires Ollama running)
|
|
95
125
|
client.sync_from_s3(model: 'llama3:latest', bucket: 'legion',
|
|
96
|
-
endpoint: 'https://
|
|
126
|
+
endpoint: 'https://s3.example.internal')
|
|
97
127
|
|
|
98
128
|
# Provision fleet with default models
|
|
99
129
|
client.import_default_models(
|
|
100
130
|
default_models: %w[llama3:latest nomic-embed-text:latest],
|
|
101
131
|
bucket: 'legion',
|
|
102
|
-
endpoint: 'https://
|
|
132
|
+
endpoint: 'https://s3.example.internal'
|
|
103
133
|
)
|
|
104
134
|
```
|
|
105
135
|
|
|
@@ -121,7 +151,7 @@ result[:usage] # => { input_tokens: 1, output_tokens: 5, total_duration: ..., .
|
|
|
121
151
|
|
|
122
152
|
## Version
|
|
123
153
|
|
|
124
|
-
0.3.
|
|
154
|
+
0.3.2
|
|
125
155
|
|
|
126
156
|
## License
|
|
127
157
|
|
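Note on the Fleet Queue Subscription section added to the README above: each `(type, model)` pair maps to one routing key. The sketch below derives those keys from the example subscriptions, using the same `tr(':', '.')` sanitising step that appears in `ModelWorker`. The helper name `fleet_routing_key` is an assumption made for illustration.

```ruby
# Hypothetical helper: builds the routing key for one (type, model) pair.
# ':' in a model tag is replaced with '.', as in the ModelWorker diff.
def fleet_routing_key(type, model)
  "llm.request.ollama.#{type}.#{model.to_s.tr(':', '.')}"
end

# The subscriptions from the README's YAML example, as plain Ruby data.
SUBSCRIPTIONS = [
  { type: 'embed', model: 'nomic-embed-text' },
  { type: 'chat', model: 'qwen3.5:27b' }
].freeze

ROUTING_KEYS = SUBSCRIPTIONS.map { |sub| fleet_routing_key(sub[:type], sub[:model]) }
puts ROUTING_KEYS
# prints:
#   llm.request.ollama.embed.nomic-embed-text
#   llm.request.ollama.chat.qwen3.5.27b
```

Because topic routing keys are dot-delimited, a model such as `qwen3.5:27b` spans several words in the key. Exact-match bindings are unaffected, but wildcard bindings would need to account for it.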
data/lib/legion/extensions/ollama/actors/model_sync.rb
ADDED
|
@@ -0,0 +1,90 @@
|
|
|
1
|
+
# frozen_string_literal: true
|
|
2
|
+
|
|
3
|
+
module Legion
|
|
4
|
+
module Extensions
|
|
5
|
+
module Ollama
|
|
6
|
+
module Actor
|
|
7
|
+
# Once actor — runs once shortly after extension load.
|
|
8
|
+
# Reads legion.ollama.s3 and legion.ollama.default_models from settings
|
|
9
|
+
# and calls import_from_s3 for any model not already present locally.
|
|
10
|
+
#
|
|
11
|
+
# Settings example:
|
|
12
|
+
# {
|
|
13
|
+
# "legion": {
|
|
14
|
+
# "ollama": {
|
|
15
|
+
# "s3": {
|
|
16
|
+
# "bucket": "legion",
|
|
17
|
+
# "prefix": "ollama/models",
|
|
18
|
+
# "endpoint": "https://s3.example.internal"
|
|
19
|
+
# },
|
|
20
|
+
# "default_models": ["qwen3.5:4b", "nomic-embed-text:latest"]
|
|
21
|
+
# }
|
|
22
|
+
# }
|
|
23
|
+
# }
|
|
24
|
+
class ModelSync < Legion::Extensions::Actors::Once
|
|
25
|
+
include Legion::Logging::Helper
|
|
26
|
+
|
|
27
|
+
# Run 5 seconds after extension load to allow the rest of startup to complete.
|
|
28
|
+
def delay
|
|
29
|
+
5.0
|
|
30
|
+
end
|
|
31
|
+
|
|
32
|
+
def use_runner?
|
|
33
|
+
false
|
|
34
|
+
end
|
|
35
|
+
|
|
36
|
+
def runner_class
|
|
37
|
+
self.class
|
|
38
|
+
end
|
|
39
|
+
|
|
40
|
+
def enabled?
|
|
41
|
+
return false unless defined?(Legion::Settings)
|
|
42
|
+
|
|
43
|
+
models = Legion::Settings.dig(:ollama, :default_models)
|
|
44
|
+
s3_cfg = Legion::Settings.dig(:ollama, :s3)
|
|
45
|
+
models.is_a?(Array) && !models.empty? && s3_cfg.is_a?(Hash) && s3_cfg[:bucket]
|
|
46
|
+
rescue StandardError => e
|
|
47
|
+
handle_exception(e, level: :warn, handled: true)
|
|
48
|
+
false
|
|
49
|
+
end
|
|
50
|
+
|
|
51
|
+
def manual
|
|
52
|
+
models = Legion::Settings.dig(:ollama, :default_models) || []
|
|
53
|
+
s3_cfg = Legion::Settings.dig(:ollama, :s3)
|
|
54
|
+
bucket = s3_cfg[:bucket]
|
|
55
|
+
s3_opts = s3_cfg.except(:bucket)
|
|
56
|
+
|
|
57
|
+
client = Object.new.extend(Legion::Extensions::Ollama::Runners::S3Models)
|
|
58
|
+
models_path = ENV.fetch('OLLAMA_MODELS', File.join(Dir.home, '.ollama', 'models'))
|
|
59
|
+
|
|
60
|
+
models.each do |model|
|
|
61
|
+
if model_present_locally?(model, models_path)
|
|
62
|
+
log.debug "[ModelSync] #{model} already present locally, skipping"
|
|
63
|
+
next
|
|
64
|
+
end
|
|
65
|
+
|
|
66
|
+
log.info "[ModelSync] importing #{model} from S3"
|
|
67
|
+
result = client.import_from_s3(model: model, bucket: bucket, models_path: models_path, **s3_opts)
|
|
68
|
+
if result[:status] == 200
|
|
69
|
+
log.info "[ModelSync] imported #{model} (blobs_downloaded=#{result[:blobs_downloaded]}, blobs_skipped=#{result[:blobs_skipped]})"
|
|
70
|
+
else
|
|
71
|
+
log.warn "[ModelSync] failed to import #{model}: #{result.inspect}"
|
|
72
|
+
end
|
|
73
|
+
rescue StandardError => e
|
|
74
|
+
handle_exception(e, level: :error, handled: true, model: model)
|
|
75
|
+
end
|
|
76
|
+
end
|
|
77
|
+
|
|
78
|
+
private
|
|
79
|
+
|
|
80
|
+
def model_present_locally?(model, models_path)
|
|
81
|
+
name, tag = model.split(':')
|
|
82
|
+
tag ||= 'latest'
|
|
83
|
+
manifest = File.join(models_path, 'manifests', 'registry.ollama.ai', 'library', name, tag)
|
|
84
|
+
File.exist?(manifest)
|
|
85
|
+
end
|
|
86
|
+
end
|
|
87
|
+
end
|
|
88
|
+
end
|
|
89
|
+
end
|
|
90
|
+
end
|
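Note on `model_present_locally?` in the file above: the check looks for a manifest file under the Ollama models directory and defaults the tag to `latest`. The sketch below exercises the same path logic against a temporary directory. `manifest_path` and `demo_presence_check` are hypothetical helpers added for the demonstration.

```ruby
require 'tmpdir'
require 'fileutils'

# Same path layout as the diff: <models_path>/manifests/registry.ollama.ai/library/<name>/<tag>
def manifest_path(model, models_path)
  name, tag = model.split(':')
  tag ||= 'latest' # an untagged model name is treated as :latest
  File.join(models_path, 'manifests', 'registry.ollama.ai', 'library', name, tag)
end

def model_present_locally?(model, models_path)
  File.exist?(manifest_path(model, models_path))
end

# Creates a fake manifest in a temp dir and probes three model names.
def demo_presence_check
  Dir.mktmpdir do |dir|
    path = manifest_path('nomic-embed-text', dir)
    FileUtils.mkdir_p(File.dirname(path))
    File.write(path, '{}')
    {
      untagged: model_present_locally?('nomic-embed-text', dir),
      tagged: model_present_locally?('nomic-embed-text:latest', dir),
      missing: model_present_locally?('qwen3.5:4b', dir)
    }
  end
end

PRESENCE = demo_presence_check
p PRESENCE
```

The check only confirms that a manifest exists. It does not verify that the blobs the manifest references are present, so a partially imported model would still be skipped.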
data/lib/legion/extensions/ollama/actors/model_worker.rb
CHANGED
|
@@ -75,10 +75,12 @@ module Legion
|
|
|
75
75
|
base.merge(arguments: { 'x-priority' => consumer_priority })
|
|
76
76
|
end
|
|
77
77
|
|
|
78
|
-
#
|
|
79
|
-
# routing key for this worker's (type, model) pair.
|
|
78
|
+
# Returns a queue CLASS (not instance) bound to the llm.request exchange
|
|
79
|
+
# with the routing key for this worker's (type, model) pair.
|
|
80
|
+
# The Subscription base class calls queue.new in initialize, so this must
|
|
81
|
+
# return a class, not an instance.
|
|
80
82
|
def queue
|
|
81
|
-
@queue ||=
|
|
83
|
+
@queue ||= build_queue_class
|
|
82
84
|
end
|
|
83
85
|
|
|
84
86
|
# Enrich every inbound message with the worker's own request_type and model
|
|
@@ -94,17 +96,22 @@ module Legion
|
|
|
94
96
|
|
|
95
97
|
private
|
|
96
98
|
|
|
97
|
-
def
|
|
99
|
+
def build_queue_class
|
|
98
100
|
sanitised_model = @model_name.tr(':', '.')
|
|
99
101
|
routing_key = "llm.request.ollama.#{@request_type}.#{sanitised_model}"
|
|
102
|
+
exchange_class = Transport::Exchanges::LlmRequest
|
|
100
103
|
|
|
101
|
-
|
|
102
|
-
|
|
103
|
-
|
|
104
|
-
|
|
105
|
-
|
|
106
|
-
|
|
107
|
-
|
|
104
|
+
Class.new(Legion::Transport::Queue) do
|
|
105
|
+
define_method(:queue_name) { routing_key }
|
|
106
|
+
define_method(:queue_options) do
|
|
107
|
+
{ durable: false, auto_delete: true, arguments: { 'x-max-priority' => 10 } }
|
|
108
|
+
end
|
|
109
|
+
define_method(:dlx_enabled) { false }
|
|
110
|
+
define_method(:initialize) do
|
|
111
|
+
super()
|
|
112
|
+
bind(exchange_class.new, routing_key: routing_key)
|
|
113
|
+
end
|
|
114
|
+
end
|
|
108
115
|
end
|
|
109
116
|
end
|
|
110
117
|
end
|
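Note on `build_queue_class` above: the anonymous class relies on `define_method` blocks closing over the local `routing_key`, which a plain `def` could not do. The sketch below shows the pattern with stand-in classes. `FakeQueueBase`, `FakeSubscription`, and `FakeModelWorker` are hypothetical and only imitate the shape described in the diff comments.

```ruby
# Hypothetical stand-in for Legion::Transport::Queue.
class FakeQueueBase
  def initialize; end
end

# Hypothetical stand-in for the Subscription base class, which per the
# diff comment calls queue.new during initialize.
class FakeSubscription
  attr_reader :queue_instance

  def initialize
    @queue_instance = queue.new
  end
end

class FakeModelWorker < FakeSubscription
  def initialize(request_type, model_name)
    @request_type = request_type
    @model_name = model_name
    super()
  end

  # Returns a CLASS, memoised, so the base class can instantiate it.
  def queue
    @queue ||= build_queue_class
  end

  private

  def build_queue_class
    routing_key = "llm.request.ollama.#{@request_type}.#{@model_name.tr(':', '.')}"
    Class.new(FakeQueueBase) do
      # The block captures routing_key from the enclosing method.
      define_method(:queue_name) { routing_key }
      # Inside define_method, super needs explicit parentheses.
      define_method(:initialize) { super() }
    end
  end
end

WORKER = FakeModelWorker.new('chat', 'qwen3.5:27b')
puts WORKER.queue.class                # prints "Class"
puts WORKER.queue_instance.queue_name  # prints "llm.request.ollama.chat.qwen3.5.27b"
```

Each worker gets its own anonymous class, so two workers for different models do not share a queue name.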
data/lib/legion/extensions/ollama/transport.rb
CHANGED
|
@@ -12,13 +12,8 @@ module Legion
|
|
|
12
12
|
module Transport
|
|
13
13
|
extend Legion::Extensions::Transport if Legion::Extensions.const_defined?(:Transport, false)
|
|
14
14
|
|
|
15
|
-
# All queue-to-exchange bindings are established dynamically by
|
|
16
|
-
# Actor::ModelWorker
|
|
17
|
-
# This file only needs to declare the exchange so topology/infra mode
|
|
18
|
-
# can introspect the full routing graph.
|
|
19
|
-
def self.additional_e_to_q
|
|
20
|
-
[]
|
|
21
|
-
end
|
|
15
|
+
# All queue-to-exchange bindings for fleet queues are established dynamically by
|
|
16
|
+
# Actor::ModelWorker at subscription time via build_queue_class.
|
|
22
17
|
end
|
|
23
18
|
end
|
|
24
19
|
end
|
data/lib/legion/extensions/ollama.rb
CHANGED
|
@@ -18,10 +18,10 @@ require 'legion/extensions/ollama/client'
|
|
|
18
18
|
# so the gem still works as a standalone HTTP client without any AMQP runtime.
|
|
19
19
|
if Legion::Extensions.const_defined?(:Core, false)
|
|
20
20
|
require 'legion/extensions/ollama/transport/exchanges/llm_request'
|
|
21
|
-
require 'legion/extensions/ollama/transport/queues/model_request'
|
|
22
21
|
require 'legion/extensions/ollama/transport/messages/llm_response'
|
|
23
22
|
require 'legion/extensions/ollama/transport'
|
|
24
23
|
require 'legion/extensions/ollama/actors/model_worker'
|
|
24
|
+
require 'legion/extensions/ollama/actors/model_sync'
|
|
25
25
|
end
|
|
26
26
|
|
|
27
27
|
module Legion
|
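Note on the `const_defined?(:Core, false)` guard above: passing `false` as the second argument restricts the lookup to the namespace itself, so a same-named constant reachable through `Object` does not trigger the AMQP requires. The sketch below illustrates the difference. `DemoExtensions` and `fleet_runtime_available?` are hypothetical names.

```ruby
# Hypothetical stand-in for the Legion::Extensions namespace.
module DemoExtensions; end

# Mirrors the guard in the diff: only true when Core is defined
# directly inside the given namespace.
def fleet_runtime_available?(namespace)
  namespace.const_defined?(:Core, false)
end

BEFORE = fleet_runtime_available?(DemoExtensions)
DemoExtensions.const_set(:Core, Module.new)
AFTER = fleet_runtime_available?(DemoExtensions)

# Without the second argument, the lookup on a module also searches Object.
INHERITED = DemoExtensions.const_defined?(:String)
STRICT = DemoExtensions.const_defined?(:String, false)

puts "before=#{BEFORE} after=#{AFTER} inherited=#{INHERITED} strict=#{STRICT}"
# prints "before=false after=true inherited=true strict=false"
```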
metadata
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
|
2
2
|
name: lex-ollama
|
|
3
3
|
version: !ruby/object:Gem::Version
|
|
4
|
-
version: 0.3.
|
|
4
|
+
version: 0.3.3
|
|
5
5
|
platform: ruby
|
|
6
6
|
authors:
|
|
7
7
|
- Esity
|
|
@@ -56,6 +56,7 @@ files:
|
|
|
56
56
|
- README.md
|
|
57
57
|
- lex-ollama.gemspec
|
|
58
58
|
- lib/legion/extensions/ollama.rb
|
|
59
|
+
- lib/legion/extensions/ollama/actors/model_sync.rb
|
|
59
60
|
- lib/legion/extensions/ollama/actors/model_worker.rb
|
|
60
61
|
- lib/legion/extensions/ollama/client.rb
|
|
61
62
|
- lib/legion/extensions/ollama/helpers/client.rb
|
|
@@ -72,7 +73,6 @@ files:
|
|
|
72
73
|
- lib/legion/extensions/ollama/transport.rb
|
|
73
74
|
- lib/legion/extensions/ollama/transport/exchanges/llm_request.rb
|
|
74
75
|
- lib/legion/extensions/ollama/transport/messages/llm_response.rb
|
|
75
|
-
- lib/legion/extensions/ollama/transport/queues/model_request.rb
|
|
76
76
|
- lib/legion/extensions/ollama/version.rb
|
|
77
77
|
homepage: https://github.com/LegionIO/lex-ollama
|
|
78
78
|
licenses:
|
data/lib/legion/extensions/ollama/transport/queues/model_request.rb
DELETED
|
@@ -1,58 +0,0 @@
|
|
|
1
|
-
# frozen_string_literal: true
|
|
2
|
-
|
|
3
|
-
module Legion
|
|
4
|
-
module Extensions
|
|
5
|
-
module Ollama
|
|
6
|
-
module Transport
|
|
7
|
-
module Queues
|
|
8
|
-
# Parametric queue — one instance per (request_type, model) tuple.
|
|
9
|
-
#
|
|
10
|
-
# queue_name mirrors the routing key exactly so bindings are self-documenting
|
|
11
|
-
# in the RabbitMQ management UI, e.g.:
|
|
12
|
-
# llm.request.ollama.embed.nomic-embed-text
|
|
13
|
-
# llm.request.ollama.chat.qwen3.5.27b
|
|
14
|
-
#
|
|
15
|
-
# Queue strategy:
|
|
16
|
-
# - classic (not quorum): quorum queues cannot be auto-delete
|
|
17
|
-
# - auto_delete: true — queue deletes when last consumer disconnects + queue empties,
|
|
18
|
-
# enabling basic.return feedback to publishers via mandatory: true
|
|
19
|
-
# - x-max-priority: 10 — must be a queue argument at declaration time for classic
|
|
20
|
-
# queues; policies handle max-length and overflow externally
|
|
21
|
-
class ModelRequest < Legion::Transport::Queue
|
|
22
|
-
def initialize(request_type:, model:, **)
|
|
23
|
-
@request_type = request_type.to_s
|
|
24
|
-
@model = sanitise_model(model)
|
|
25
|
-
super(**)
|
|
26
|
-
end
|
|
27
|
-
|
|
28
|
-
def queue_name
|
|
29
|
-
"llm.request.ollama.#{@request_type}.#{@model}"
|
|
30
|
-
end
|
|
31
|
-
|
|
32
|
-
def queue_options
|
|
33
|
-
{
|
|
34
|
-
durable: false,
|
|
35
|
-
auto_delete: true,
|
|
36
|
-
arguments: { 'x-max-priority' => 10 }
|
|
37
|
-
}
|
|
38
|
-
end
|
|
39
|
-
|
|
40
|
-
# Disable dead-letter exchange provisioning. The base class
|
|
41
|
-
# default_options always adds x-dead-letter-exchange when
|
|
42
|
-
# dlx_enabled returns true. Fleet queues are ephemeral
|
|
43
|
-
# (auto-delete) and must not provision persistent DLX queues.
|
|
44
|
-
def dlx_enabled
|
|
45
|
-
false
|
|
46
|
-
end
|
|
47
|
-
|
|
48
|
-
private
|
|
49
|
-
|
|
50
|
-
def sanitise_model(name)
|
|
51
|
-
name.to_s.tr(':', '.')
|
|
52
|
-
end
|
|
53
|
-
end
|
|
54
|
-
end
|
|
55
|
-
end
|
|
56
|
-
end
|
|
57
|
-
end
|
|
58
|
-
end
|