lex-ollama 0.3.2 → 0.3.4
- checksums.yaml +4 -4
- data/CHANGELOG.md +14 -0
- data/CLAUDE.md +34 -8
- data/README.md +47 -5
- data/lib/legion/extensions/ollama/actors/model_sync.rb +49 -0
- data/lib/legion/extensions/ollama/actors/model_worker.rb +19 -14
- data/lib/legion/extensions/ollama/runners/s3_models.rb +23 -0
- data/lib/legion/extensions/ollama/transport.rb +2 -7
- data/lib/legion/extensions/ollama/version.rb +1 -1
- data/lib/legion/extensions/ollama.rb +38 -1
- metadata +2 -2
- data/lib/legion/extensions/ollama/transport/queues/model_request.rb +0 -58
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 978c53ff8a178c003a5bb593a934536c20b616500d80ea0624f97014f9a88213
+  data.tar.gz: d700e31e6f38fe2b9c6cac3da627d67ce1ccab9a75d3d6d741cdc04f5cc614bf
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 6f44dcfc98336bcd0d28e6985ed468f7676b156d5135ff642256120db59563e161d46615b9acab0e3cdac6b578144121d60a23efc17c31f6f6c686349519f076
+  data.tar.gz: b191eacce0844eb0be9b6b4b22f12969007b37f335da700f6ea6bd4936b22fd6aa2eec945dbdfbb419f5b1a9f9f1b0c9e15c004d772e18a1a98c059c133e83e8
data/CHANGELOG.md
CHANGED
@@ -1,5 +1,19 @@
 # Changelog
 
+## [0.3.4] - 2026-04-24
+
+### Fixed
+- `Ollama.build_actors` and `Ollama.default_settings` were absent from the installed 0.3.3 gem (gem was packaged before `bafb124` landed) — `Actor::ModelWorker` (requires `request_type:` and `model:` kwargs) was reaching the subscription actor pool with no zero-arg initializer, raising `ArgumentError: missing keywords: :request_type, :model` on every boot when running under the Homebrew legionio install
+
+## [0.3.3] - 2026-04-16
+
+### Added
+- `Actor::ModelSync` — once actor; runs 5s after extension load; reads `legion.ollama.default_models` and `legion.ollama.s3` from settings; calls `import_from_s3` for any configured model not already present on disk; no-op if either setting is absent
+
+### Fixed
+- `Transport::Queues::ModelRequest` deleted — the framework auto-discovers every file in `transport/queues/` and calls `.new` with no arguments at startup, which crashed because `ModelRequest` required `request_type:` and `model:`; the queue definition is now an anonymous class created inline by `Actor::ModelWorker#build_queue_class`
+- `Actor::ModelWorker#queue` now returns a CLASS instead of an instance — `Subscription#initialize` calls `queue.new`, so returning an instance caused a silent `NoMethodError` on `NilClass#new`; the anonymous queue class has `queue_name`, `queue_options`, `dlx_enabled`, and `initialize` (exchange bind) defined inline via `define_method`
+
 ## [0.3.2] - 2026-04-08
 
 ### Changed
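The 0.3.3 fix in the changelog above turns on the difference between returning a queue instance and returning a queue class. The following is a minimal sketch of that pattern, not the extension's code: `FakeQueue`, `FakeSubscription`, and `Worker` are hypothetical stand-ins for the Legion base classes, which are not reproduced here.

```ruby
# Hypothetical stand-in for the queue base class.
class FakeQueue
  def initialize
    @bound = false
  end
end

# Hypothetical stand-in for the subscription base class.
class FakeSubscription
  attr_reader :queue_instance

  def initialize
    # As described in the changelog, the base class calls queue.new,
    # so #queue has to return something that responds to .new.
    @queue_instance = queue.new
  end
end

class Worker < FakeSubscription
  def initialize(request_type:, model:)
    @request_type = request_type
    @model_name = model
    super()
  end

  def queue
    @queue ||= build_queue_class
  end

  private

  def build_queue_class
    # Colons are replaced so the model name is safe in a routing key.
    routing_key = "llm.request.ollama.#{@request_type}.#{@model_name.tr(':', '.')}"

    # The block closes over routing_key, so the generated class needs
    # no constructor arguments.
    Class.new(FakeQueue) do
      define_method(:queue_name) { routing_key }
    end
  end
end

worker = Worker.new(request_type: 'chat', model: 'qwen3.5:27b')
puts worker.queue_instance.queue_name
```

Because the routing key is captured by the closure, the anonymous class can be instantiated with no arguments, which is the property the changelog entry says the auto-discovered `ModelRequest` class lacked.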
data/CLAUDE.md
CHANGED
@@ -12,8 +12,8 @@ reporting, and **fleet queue subscription** for receiving routed LLM requests fr
 
 **GitHub**: https://github.com/LegionIO/lex-ollama
 **License**: MIT
-**Version**: 0.3.
-**Specs**:
+**Version**: 0.3.3
+**Specs**: 154 examples (16 spec files)
 
 ---
 
@@ -28,7 +28,8 @@ Legion::Extensions::Ollama
 │ │ # pull_model, push_model, list_running
 │ ├── Embeddings # embed
 │ ├── Blobs # check_blob, push_blob
-│ ├── S3Models # list_s3_models, import_from_s3, sync_from_s3, import_default_models
+│ ├── S3Models # list_s3_models, import_from_s3, sync_from_s3, import_default_models,
+│ │ # sync_configured_models
 │ ├── Version # server_version
 │ └── Fleet # handle_request (fleet dispatcher — chat/embed/generate)
 ├── Helpers/
@@ -44,7 +45,8 @@ Legion::Extensions::Ollama
 │ └── Messages/
 │ └── LlmResponse # Legion::LLM::Fleet::Response subclass, reply via default exchange
 └── Actor/
-
+├── ModelWorker # subscription actor — one per registered model/type
+└── ModelSync # once actor — fires 5s after boot, pulls default models from S3
 ```
 
 ---
@@ -93,6 +95,15 @@ RabbitMQ policies (applied externally via Terraform) set `max-length` and
 legion:
   ollama:
     host: "http://localhost:11434"
+    s3:
+      bucket: "legion"
+      prefix: "ollama/models"
+      endpoint: "https://s3.example.internal"
+    default_models:
+      - "qwen3.5:4b"
+      - "nomic-embed-text:latest"
+    fleet:
+      consumer_priority: 10 # H100: 10, Mac Studio: 5, MacBook: 1
     subscriptions:
       - type: embed
         model: nomic-embed-text
@@ -104,7 +115,15 @@ legion:
         model: llama3.2
 ```
 
-
+**`s3` + `default_models`**: `Actor::ModelSync` fires 5 seconds after extension load and calls
+`Runners::S3Models#sync_configured_models` to import any listed models not already present
+locally. All download logic lives in the runner; the actor is only the trigger. Uses the
+inherited `Actors::Base#manual` path (not `Legion::Runner`) so errors surface via
+`handle_exception` rather than being silently swallowed by `Concurrent::ScheduledTask`.
+
+**`subscriptions`**: `Ollama.build_actors` replaces the base `ModelWorker` actor entry with one
+dynamically generated subclass per subscription entry (each with a zero-arg `initialize`).
+The extension spawns one `Actor::ModelWorker` per entry at boot.
 
 ### Data Flow
 
@@ -154,6 +173,12 @@ The gem still works as a pure HTTP client library without AMQP, exactly as befor
 - `request_type: 'generate'` → `Client#generate`.
 - anything else (including `'chat'` or unknown) → `Client#chat`.
 - **`Actor::ModelWorker#use_runner?` is `false`** — bypasses `Legion::Runner` / task DB entirely.
+- **`Actor::ModelSync#use_runner?` is `false`** — uses inherited `Actors::Base#manual` which calls
+`runner_class.send(runner_function, **{})` with proper `handle_exception` error handling.
+- **`Ollama.build_actors`** dynamically generates one `ModelWorker` subclass per subscription
+entry, each with a zero-arg `initialize` that passes the frozen `request_type` and `model`.
+- **`Ollama.default_settings`** returns `{ s3: {}, fleet: {} }` so `settings[:s3]` and
+`settings[:fleet]` are always hashes even without user configuration.
 - **Reply publishing** never raises — errors are swallowed so the AMQP ack is not blocked.
 - **Colon sanitisation** — `qwen3.5:27b` becomes `qwen3.5.27b` in queue/routing-key strings.
 
@@ -181,8 +206,9 @@ message_context:
 A subset (`conversation_id`, `message_id`, `request_id`) is promoted to AMQP headers
 (`x-legion-llm-conversation-id`, etc.) for filtering without body parsing.
 
-
-
+The wire protocol spec (AMQP property mapping, platform-wide standard, per-message-type
+specifications) was developed during the fleet design phase and is maintained in the
+legion-llm repository alongside the implementation.
 
 ---
 
@@ -210,4 +236,4 @@ bundle exec rubocop
 ---
 
 **Maintained By**: Matthew Iverson (@Esity)
-**Last Updated**: 2026-04-
+**Last Updated**: 2026-04-17
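The CLAUDE.md notes above say that `Ollama.default_settings` exists so `settings[:s3]` and `settings[:fleet]` are always hashes. A small sketch of why that matters follows. The helper names and the shallow merge are assumptions made for illustration; how the framework actually combines defaults with user configuration is not shown in this diff.

```ruby
# Hypothetical helper mirroring the { s3: {}, fleet: {} } shape described above.
def default_settings
  { s3: {}, fleet: {} }
end

# Assumed merge behaviour: user-supplied keys win, missing keys fall
# back to the empty-hash defaults.
def effective_settings(user = {})
  default_settings.merge(user)
end

# Same lookup the diff shows in ModelWorker#consumer_priority.
def consumer_priority(settings)
  settings.dig(:fleet, :consumer_priority) || 0
end

unconfigured = effective_settings
configured   = effective_settings(fleet: { consumer_priority: 10 })

puts consumer_priority(unconfigured)
puts consumer_priority(configured)

# With the defaults in place, a nested read returns nil instead of
# raising NoMethodError on nil.
puts unconfigured[:s3][:bucket].inspect
```

With no user configuration the priority falls back to 0, and the nested `[:s3][:bucket]` read is safe because the `:s3` key is always present.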
data/README.md
CHANGED
@@ -40,10 +40,52 @@ gem install lex-ollama
 - `import_from_s3` - Download model from S3 directly to Ollama's filesystem (works before Ollama starts)
 - `sync_from_s3` - Download model from S3, push blobs through Ollama's API, write manifest to filesystem
 - `import_default_models` - Import a list of models from S3 (fleet provisioning)
+- `sync_configured_models` - Import all `default_models` from S3 that aren't already present locally
 
 ### Version
 - `server_version` - Retrieve the Ollama server version (GET /api/version)
 
+### Fleet Queue Subscription
+- `handle_request` - Dispatch inbound fleet AMQP messages to the appropriate runner (chat/embed/generate)
+
+When `Legion::Extensions::Core` is present, lex-ollama subscribes to model-scoped queues on the
+`llm.request` topic exchange, accepting routed LLM inference work from other Legion fleet members.
+
+Each configured `(type, model)` pair gets its own auto-delete queue with routing key
+`llm.request.ollama.<type>.<model>`. Multiple nodes serving the same model compete fairly
+via RabbitMQ round-robin with consumer priority.
+
+```yaml
+legion:
+  ollama:
+    host: "http://localhost:11434"
+    s3:
+      bucket: "legion"
+      prefix: "ollama/models"
+      endpoint: "https://s3.example.internal"
+    default_models:
+      - "qwen3.5:4b"
+      - "nomic-embed-text:latest"
+    fleet:
+      consumer_priority: 10 # H100: 10, Mac Studio: 5, MacBook: 1
+    subscriptions:
+      - type: embed
+        model: nomic-embed-text
+      - type: chat
+        model: "qwen3.5:27b"
+```
+
+**Auto-provisioning**: When `s3` and `default_models` are configured, the `ModelSync` actor
+fires 5 seconds after boot and imports any listed models not already present on disk from the
+S3 mirror. No manual pull step needed for fleet nodes.
+
+Fleet messages use the wire protocol defined in `legion-llm`: typed AMQP messages
+(`llm.fleet.request` / `llm.fleet.response` / `llm.fleet.error`) with `message_context`
+propagation for end-to-end tracing.
+
+Without `Legion::Extensions::Core`, the gem works as a pure HTTP client library with no
+AMQP dependency.
+
 ## Standalone Client
 
 ```ruby
@@ -85,21 +127,21 @@ Pull models from an internal S3 mirror instead of the public Ollama registry:
 client = Legion::Extensions::Ollama::Client.new
 
 # List available models in S3
-client.list_s3_models(bucket: 'legion', endpoint: 'https://
+client.list_s3_models(bucket: 'legion', endpoint: 'https://s3.example.internal')
 
 # Import directly to filesystem (works without Ollama running)
 client.import_from_s3(model: 'llama3:latest', bucket: 'legion',
-                      endpoint: 'https://
+                      endpoint: 'https://s3.example.internal')
 
 # Push through Ollama API (requires Ollama running)
 client.sync_from_s3(model: 'llama3:latest', bucket: 'legion',
-                    endpoint: 'https://
+                    endpoint: 'https://s3.example.internal')
 
 # Provision fleet with default models
 client.import_default_models(
   default_models: %w[llama3:latest nomic-embed-text:latest],
   bucket: 'legion',
-  endpoint: 'https://
+  endpoint: 'https://s3.example.internal'
 )
 ```
 
@@ -121,7 +163,7 @@ result[:usage] # => { input_tokens: 1, output_tokens: 5, total_duration: ..., .
 
 ## Version
 
-0.3.
+0.3.3
 
 ## License
 
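The README text above describes one queue per configured `(type, model)` pair with routing key `llm.request.ollama.<type>.<model>`. As a rough illustration of how the `subscriptions` list maps to routing keys, here is a sketch that parses a trimmed copy of the sample config. `routing_key_for` is a hypothetical helper, not part of the gem; the colon replacement follows the sanitisation rule stated in the CLAUDE.md diff.

```ruby
require 'yaml'

# Trimmed copy of the sample configuration from the README diff.
config_yaml = <<~YAML
  legion:
    ollama:
      subscriptions:
        - type: embed
          model: nomic-embed-text
        - type: chat
          model: "qwen3.5:27b"
YAML

# Hypothetical helper: routing key for one (type, model) pair, with
# colons in the model name replaced by dots.
def routing_key_for(type, model)
  "llm.request.ollama.#{type}.#{model.tr(':', '.')}"
end

subscriptions = YAML.safe_load(config_yaml).dig('legion', 'ollama', 'subscriptions')
keys = subscriptions.map { |sub| routing_key_for(sub['type'], sub['model']) }

puts keys
```

Note that `YAML.safe_load` returns string keys here, whereas the extension code in this diff reads symbol keys from its settings object.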
data/lib/legion/extensions/ollama/actors/model_sync.rb
@@ -0,0 +1,49 @@
+# frozen_string_literal: true
+
+module Legion
+  module Extensions
+    module Ollama
+      module Actor
+        # Once actor — fires 5s after extension load and calls
+        # Runners::S3Models#sync_configured_models to pull any configured
+        # default models from S3 that are not already present locally.
+        #
+        # All download logic lives in the runner. This actor is only the trigger.
+        class ModelSync < Legion::Extensions::Actors::Once
+          def delay
+            5.0
+          end
+
+          def runner_class
+            Legion::Extensions::Ollama::Runners::S3Models
+          end
+
+          def runner_function
+            'sync_configured_models'
+          end
+
+          def use_runner?
+            false
+          end
+
+          def check_subtask?
+            false
+          end
+
+          def generate_task?
+            false
+          end
+
+          def enabled?
+            s3_cfg = settings[:s3]
+            models = settings[:default_models]
+            s3_cfg.is_a?(Hash) && !s3_cfg[:bucket].nil? && models.is_a?(Array) && !models.empty?
+          rescue StandardError => e
+            handle_exception(e, level: :warn, handled: true)
+            false
+          end
+        end
+      end
+    end
+  end
+end
data/lib/legion/extensions/ollama/actors/model_worker.rb
CHANGED
@@ -59,9 +59,7 @@ module Legion
           # Standard scale: GPU server = 10, Mac Studio = 5, developer laptop = 1.
           # Defaults to 0 (equal priority) if not configured.
           def consumer_priority
-
-
-            Legion::Settings.dig(:ollama, :fleet, :consumer_priority) || 0
+            settings.dig(:fleet, :consumer_priority) || 0
           end
 
           # Subscribe options include x-priority argument so RabbitMQ can honour
@@ -75,10 +73,12 @@ module Legion
             base.merge(arguments: { 'x-priority' => consumer_priority })
           end
 
-          #
-          # routing key for this worker's (type, model) pair.
+          # Returns a queue CLASS (not instance) bound to the llm.request exchange
+          # with the routing key for this worker's (type, model) pair.
+          # The Subscription base class calls queue.new in initialize, so this must
+          # return a class, not an instance.
           def queue
-            @queue ||=
+            @queue ||= build_queue_class
           end
 
           # Enrich every inbound message with the worker's own request_type and model
@@ -94,17 +94,22 @@ module Legion
 
           private
 
-          def
+          def build_queue_class
             sanitised_model = @model_name.tr(':', '.')
             routing_key = "llm.request.ollama.#{@request_type}.#{sanitised_model}"
+            exchange_class = Transport::Exchanges::LlmRequest
 
-
-
-
-
-
-
-
+            Class.new(Legion::Transport::Queue) do
+              define_method(:queue_name) { routing_key }
+              define_method(:queue_options) do
+                { durable: false, auto_delete: true, arguments: { 'x-max-priority' => 10 } }
+              end
+              define_method(:dlx_enabled) { false }
+              define_method(:initialize) do
+                super()
+                bind(exchange_class.new, routing_key: routing_key)
+              end
+            end
           end
         end
       end
data/lib/legion/extensions/ollama/runners/s3_models.rb
CHANGED
@@ -145,6 +145,29 @@ module Legion
             { result: results, status: 200 }
           end
 
+          def sync_configured_models(**)
+            s3_cfg = settings[:s3]
+            models = settings[:default_models]
+
+            return { result: false, status: 412, error: 'no s3 config' } unless s3_cfg.is_a?(Hash) && s3_cfg[:bucket]
+            return { result: false, status: 412, error: 'no default_models configured' } unless models.is_a?(Array) && !models.empty?
+
+            bucket = s3_cfg[:bucket]
+            s3_opts = s3_cfg.except(:bucket)
+            models_path = ENV.fetch('OLLAMA_MODELS', File.join(Dir.home, '.ollama', 'models'))
+
+            results = models.filter_map do |model|
+              name, tag = model.split(':')
+              tag ||= 'latest'
+              manifest = File.join(models_path, 'manifests', 'registry.ollama.ai', 'library', name, tag)
+              next if File.exist?(manifest)
+
+              import_from_s3(model: model, bucket: bucket, models_path: models_path, **s3_opts)
+            end
+
+            { result: results, status: 200 }
+          end
+
           private
 
           def default_models_path
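The `sync_configured_models` hunk above decides whether a model is already present by checking for its manifest file under the models directory. The sketch below isolates that check so it can be run against a temporary directory. `manifest_path` and `missing_models` are hypothetical helper names introduced for illustration; the path layout is copied from the hunk, and no S3 call is made.

```ruby
require 'tmpdir'
require 'fileutils'

# Same path layout as the hunk above: a model counts as present when
# its manifest file exists under models_path.
def manifest_path(models_path, model)
  name, tag = model.split(':')
  tag ||= 'latest'
  File.join(models_path, 'manifests', 'registry.ollama.ai', 'library', name, tag)
end

# Returns the configured models whose manifest is not on disk yet.
def missing_models(models_path, models)
  models.reject { |model| File.exist?(manifest_path(models_path, model)) }
end

missing = Dir.mktmpdir do |models_path|
  # Simulate an earlier import of nomic-embed-text:latest.
  present = manifest_path(models_path, 'nomic-embed-text')
  FileUtils.mkdir_p(File.dirname(present))
  File.write(present, '{}')

  missing_models(models_path, ['qwen3.5:4b', 'nomic-embed-text:latest'])
end

puts missing.inspect
```

A model name without a tag resolves to the `latest` manifest, so `nomic-embed-text` and `nomic-embed-text:latest` point at the same file.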
data/lib/legion/extensions/ollama/transport.rb
CHANGED
@@ -12,13 +12,8 @@ module Legion
       module Transport
         extend Legion::Extensions::Transport if Legion::Extensions.const_defined?(:Transport, false)
 
-        # All queue-to-exchange bindings are established dynamically by
-        # Actor::ModelWorker
-        # This file only needs to declare the exchange so topology/infra mode
-        # can introspect the full routing graph.
-        def self.additional_e_to_q
-          []
-        end
+        # All queue-to-exchange bindings for fleet queues are established dynamically by
+        # Actor::ModelWorker at subscription time via build_queue_class.
       end
     end
   end
data/lib/legion/extensions/ollama.rb
CHANGED
@@ -18,16 +18,53 @@ require 'legion/extensions/ollama/client'
 # so the gem still works as a standalone HTTP client without any AMQP runtime.
 if Legion::Extensions.const_defined?(:Core, false)
   require 'legion/extensions/ollama/transport/exchanges/llm_request'
-  require 'legion/extensions/ollama/transport/queues/model_request'
   require 'legion/extensions/ollama/transport/messages/llm_response'
   require 'legion/extensions/ollama/transport'
   require 'legion/extensions/ollama/actors/model_worker'
+  require 'legion/extensions/ollama/actors/model_sync'
 end
 
 module Legion
   module Extensions
     module Ollama
       extend Legion::Extensions::Core if Legion::Extensions.const_defined?(:Core, false)
+
+      def self.default_settings
+        {
+          s3: {},
+          fleet: {}
+        }
+      end
+
+      # Called by the framework during autobuild. Runs normal actor discovery,
+      # then replaces the single ModelWorker entry with one concrete subclass
+      # per subscription entry in settings (each has a zero-arg initialize).
+      def self.build_actors
+        super
+        @actors.delete(:model_worker)
+
+        subs = settings[:subscriptions]
+        return unless subs.is_a?(Array)
+
+        subs.each do |sub|
+          request_type = sub[:type]&.to_s
+          model = sub[:model]&.to_s
+          next unless request_type && model
+
+          actor_name = :"model_worker_#{request_type}_#{model.tr(':.', '__')}"
+          worker_class = Class.new(Legion::Extensions::Ollama::Actor::ModelWorker) do
+            define_method(:initialize) { super(request_type: request_type, model: model) }
+          end
+
+          @actors[actor_name] = {
+            extension: 'lex-ollama',
+            extension_name: :ollama,
+            actor_name: actor_name,
+            actor_class: worker_class,
+            type: 'literal'
+          }
+        end
+      end
     end
   end
 end
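The `build_actors` hunk above generates one `ModelWorker` subclass per subscription so that each can be instantiated with no arguments. The following is a minimal, framework-free sketch of that technique. `BaseWorker` and `build_worker_classes` are hypothetical stand-ins; the real method also registers metadata in `@actors`, which is omitted here.

```ruby
# Hypothetical stand-in for Actor::ModelWorker: it requires keyword
# arguments, so it cannot be instantiated with a bare .new.
class BaseWorker
  attr_reader :request_type, :model

  def initialize(request_type:, model:)
    @request_type = request_type
    @model = model
  end
end

# Builds { actor_name => subclass }, one zero-arg subclass per valid entry.
def build_worker_classes(subscriptions)
  subscriptions.each_with_object({}) do |sub, actors|
    request_type = sub[:type]&.to_s
    model = sub[:model]&.to_s
    next unless request_type && model

    actor_name = :"model_worker_#{request_type}_#{model.tr(':.', '__')}"
    actors[actor_name] = Class.new(BaseWorker) do
      # super needs explicit arguments here: implicit argument passing
      # is not supported inside define_method.
      define_method(:initialize) { super(request_type: request_type, model: model) }
    end
  end
end

actors = build_worker_classes([{ type: :chat, model: 'qwen3.5:27b' }, { type: :embed }])
worker = actors[:model_worker_chat_qwen3_5_27b].new

puts actors.keys.inspect
puts "#{worker.request_type} #{worker.model}"
```

The second entry has no `model`, so it is skipped, matching the `next unless request_type && model` guard in the hunk. Both colons and dots in the model name become underscores in the actor name.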
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: lex-ollama
 version: !ruby/object:Gem::Version
-  version: 0.3.
+  version: 0.3.4
 platform: ruby
 authors:
 - Esity
@@ -56,6 +56,7 @@ files:
 - README.md
 - lex-ollama.gemspec
 - lib/legion/extensions/ollama.rb
+- lib/legion/extensions/ollama/actors/model_sync.rb
 - lib/legion/extensions/ollama/actors/model_worker.rb
 - lib/legion/extensions/ollama/client.rb
 - lib/legion/extensions/ollama/helpers/client.rb
@@ -72,7 +73,6 @@ files:
 - lib/legion/extensions/ollama/transport.rb
 - lib/legion/extensions/ollama/transport/exchanges/llm_request.rb
 - lib/legion/extensions/ollama/transport/messages/llm_response.rb
-- lib/legion/extensions/ollama/transport/queues/model_request.rb
 - lib/legion/extensions/ollama/version.rb
 homepage: https://github.com/LegionIO/lex-ollama
 licenses:
data/lib/legion/extensions/ollama/transport/queues/model_request.rb
@@ -1,58 +0,0 @@
-# frozen_string_literal: true
-
-module Legion
-  module Extensions
-    module Ollama
-      module Transport
-        module Queues
-          # Parametric queue — one instance per (request_type, model) tuple.
-          #
-          # queue_name mirrors the routing key exactly so bindings are self-documenting
-          # in the RabbitMQ management UI, e.g.:
-          # llm.request.ollama.embed.nomic-embed-text
-          # llm.request.ollama.chat.qwen3.5.27b
-          #
-          # Queue strategy:
-          # - classic (not quorum): quorum queues cannot be auto-delete
-          # - auto_delete: true — queue deletes when last consumer disconnects + queue empties,
-          # enabling basic.return feedback to publishers via mandatory: true
-          # - x-max-priority: 10 — must be a queue argument at declaration time for classic
-          # queues; policies handle max-length and overflow externally
-          class ModelRequest < Legion::Transport::Queue
-            def initialize(request_type:, model:, **)
-              @request_type = request_type.to_s
-              @model = sanitise_model(model)
-              super(**)
-            end
-
-            def queue_name
-              "llm.request.ollama.#{@request_type}.#{@model}"
-            end
-
-            def queue_options
-              {
-                durable: false,
-                auto_delete: true,
-                arguments: { 'x-max-priority' => 10 }
-              }
-            end
-
-            # Disable dead-letter exchange provisioning. The base class
-            # default_options always adds x-dead-letter-exchange when
-            # dlx_enabled returns true. Fleet queues are ephemeral
-            # (auto-delete) and must not provision persistent DLX queues.
-            def dlx_enabled
-              false
-            end
-
-            private
-
-            def sanitise_model(name)
-              name.to_s.tr(':', '.')
-            end
-          end
-        end
-      end
-    end
-  end
-end