lex-ollama 0.3.0 → 0.3.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 7477574f919b18b85c79afba3a1f65c8540d9eff9ca02b9e0c807b3740fed452
- data.tar.gz: a5c69878c8518caf02c2e238c94243fd49c320f20bfaede00252bfdc87be5cbb
+ metadata.gz: 28df561b00b58c7cb179b9904aed61a5aa7e278140306dadb3b4b2665eaab824
+ data.tar.gz: 446afaab9d80e6a4f62286a1f5ccc1c023bdbb178dba043cb96081412991b2d3
  SHA512:
- metadata.gz: 31566bf77244dd3cfc097531a3af1da186e8d0e7e0ec675be0b7471f8b7654649fa666d4c3d2f6bb34c46d73d29aa72a64dfa07f7beb35ae01d23c8f2bc6c797
- data.tar.gz: f900e723d2db75dbdb266fcf33d01be56d7614b992be9e0b6d29345a85012be0d226ff9a2f42cb2d5a9f932cb1e641e6ecbfc033ccd3c6c98bbe4d2a7207ad13
+ metadata.gz: 2915cfe6e4e959e61ee5b8ce68e7da784b4c6001cfe0c3acdb0a4e0f804da79a1e46a17b7c5297b9dd4f26e58bbae504066f5874d8cd82d6ea223b3dfc561bbb
+ data.tar.gz: cb1337292d4bb7c94603612e03dbdbcbd9a41c2a94e56f4bbfad1132d403f6d14b0316091017745ee2d282ccc426bdcdb65b137f66f07f2a505d231792e424b0
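The checksums above are plain hex digests of the two archives inside the `.gem` package. As an editorial sketch (not part of the gem itself), the same values can be reproduced with Ruby's stdlib `Digest`:

```ruby
# Recompute registry-style checksums for a gem archive's bytes using Ruby's
# stdlib. Pass the raw file contents (e.g. of metadata.gz or data.tar.gz).
require 'digest'

def archive_digests(data)
  {
    sha256: Digest::SHA256.hexdigest(data),
    sha512: Digest::SHA512.hexdigest(data)
  }
end
```

Comparing these against `checksums.yaml` verifies that a downloaded package's archives are untampered.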
data/CHANGELOG.md CHANGED
@@ -1,5 +1,19 @@
  # Changelog
 
+ ## [0.3.1] - 2026-04-08
+
+ ### Added
+ - `Runners::Fleet` — module-function dispatcher for inbound AMQP LLM request messages; routes by `request_type` to `Client#embed`, `Client#generate`, or `Client#chat`
+ - `Transport::Exchanges::LlmRequest` — durable topic exchange `llm.request` for fleet routing
+ - `Transport::Queues::ModelRequest` — parametric durable quorum queue per `(type, model)` pair; sanitises colons in model names to dots
+ - `Transport::Messages::LlmResponse` — reply message published back to `reply_to` queue after inference
+ - `Actor::ModelWorker` — subscription actor; one instance per configured `(type, model)` subscription; enriches inbound messages with `request_type` and `model`, bypasses Legion::Runner task DB (`use_runner? false`)
+ - Fleet queue subscription system: when `Legion::Extensions::Core` is present, subscribes to model-scoped queues on `llm.request` topic exchange using routing key `llm.request.ollama.<type>.<model>`
+ - Standalone mode: all transport/actor requires guarded behind `const_defined?(:Core, false)` so the gem works as a pure HTTP client library without AMQP
+
+ ### Fixed
+ - `Runners::S3Models`: use `::JSON.parse` (stdlib) instead of bare `JSON.parse` which resolves to `Legion::JSON` (symbol keys) inside the `Legion::` namespace — fixes `import_from_s3` and `sync_from_s3` manifest parsing
+
  ## [0.3.0] - 2026-04-01
 
  ### Added
data/CLAUDE.md CHANGED
@@ -1,44 +1,178 @@
  # lex-ollama: Ollama Integration for LegionIO
 
- **Parent**: `/Users/miverso2/rubymine/legion/extensions-ai/CLAUDE.md`
+ **Repository Level 3 Documentation**
+ - **Parent**: `../CLAUDE.md`
+ - **Grandparent**: `../../CLAUDE.md`
 
  ## Purpose
 
- Legion Extension that connects LegionIO to Ollama, a local LLM server. Provides text generation, chat completions, embeddings, model management, and blob operations.
+ Legion Extension that connects LegionIO to Ollama, a local LLM server. Provides text generation,
+ chat completions, embeddings, model management, blob operations, S3 model distribution, version
+ reporting, and **fleet queue subscription** for receiving routed LLM requests from the Legion bus.
 
  **GitHub**: https://github.com/LegionIO/lex-ollama
  **License**: MIT
+ **Version**: 0.3.1
+ **Specs**: 82 examples (12 spec files) — fleet additions add ~35 more
+
+ ---
 
  ## Architecture
 
  ```
  Legion::Extensions::Ollama
  ├── Runners/
- │ ├── Completions # POST /api/generate
- │ ├── Chat # POST /api/chat
- │ ├── Models # CRUD + pull/push/running
- ├── Embeddings # POST /api/embed
- │ ├── Blobs # HEAD/POST /api/blobs/:digest
- └── Version # GET /api/version
+ │   ├── Completions  # generate, generate_stream
+ │   ├── Chat         # chat, chat_stream
+ │   ├── Models       # create_model, list_models, show_model, copy_model, delete_model,
+ │   │                #   pull_model, push_model, list_running
+ │   ├── Embeddings   # embed
+ │   ├── Blobs        # check_blob, push_blob
+ │   ├── S3Models     # list_s3_models, import_from_s3, sync_from_s3, import_default_models
+ │   ├── Version      # server_version
+ │   └── Fleet        # handle_request (fleet dispatcher — chat/embed/generate)
  ├── Helpers/
- └── Client # Faraday connection to Ollama server
- └── Client # Standalone client class
+ │   ├── Client       # Faraday connection to Ollama server (module, factory method)
+ │   ├── Errors       # error handling + with_retry
+ │   └── Usage        # usage normalization (maps Ollama token/duration fields to standard shape)
+ ├── Client           # Standalone client class (includes all runners, holds @config)
+ ├── Transport/       # (loaded only when Legion::Extensions::Core is present)
+ │   ├── Exchanges/
+ │   │   └── LlmRequest    # topic exchange 'llm.request'
+ │   ├── Queues/
+ │   │   └── ModelRequest  # parametric queue — one per (type, model) pair
+ │   └── Messages/
+ │       └── LlmResponse   # reply message published back to reply_to
+ └── Actor/
+     └── ModelWorker       # subscription actor — one per registered model/type
  ```
 
+ ---
+
+ ## Fleet Queue Subscription
+
+ ### Overview
+
+ When `Legion::Extensions::Core` is available, lex-ollama subscribes to model-scoped queues on the
+ `llm.request` topic exchange, accepting routed inference work from other Legion fleet members
+ (lex-llm-gateway, direct publishers, etc.).
+
+ ### Routing Key Schema
+
+ ```
+ llm.request.ollama.<type>.<model>
+ ```
+
+ | Segment  | Values                      | Notes                              |
+ |----------|-----------------------------|------------------------------------|
+ | `ollama` | literal                     | provider identifier                |
+ | `type`   | `chat`, `embed`, `generate` | maps to a specific runner method   |
+ | `model`  | sanitised model name        | `:` replaced with `.` (AMQP rules) |
+
+ **Examples:**
+ ```
+ llm.request.ollama.embed.nomic-embed-text
+ llm.request.ollama.embed.mxbai-embed-large
+ llm.request.ollama.chat.qwen3.5.27b      # was qwen3.5:27b
+ llm.request.ollama.chat.llama3.2
+ llm.request.ollama.generate.llama3.2
+ ```
+
+ ### Queue Strategy
+
+ Each model+type combination gets its own **durable quorum queue** with a routing key that matches
+ its queue name exactly. Multiple nodes carrying the same model compete fairly (no SAC) — any
+ subscriber can serve. The queue name is identical to the routing key for clarity in the management UI.
+
+ ### Configuration
+
+ ```yaml
+ legion:
+   ollama:
+     host: "http://localhost:11434"
+     subscriptions:
+       - type: embed
+         model: nomic-embed-text
+       - type: embed
+         model: mxbai-embed-large
+       - type: chat
+         model: "qwen3.5:27b"
+       - type: chat
+         model: llama3.2
+ ```
+
+ The extension spawns one `Actor::ModelWorker` per subscription entry at boot.
+
+ ### Data Flow
+
+ ```
+ Publisher (lex-llm-gateway / any fleet node)
+   │ routing_key: "llm.request.ollama.embed.nomic-embed-text"
+   ▼
+ Exchange: llm.request [topic, durable]
+   ▼
+ Queue: llm.request.ollama.embed.nomic-embed-text [quorum]
+   ▼
+ Actor::ModelWorker (type=embed, model=nomic-embed-text)
+   ▼
+ Runners::Fleet#handle_request
+   ▼
+ Ollama::Client#embed(model: 'nomic-embed-text', ...)
+   ▼
+ Transport::Messages::LlmResponse → reply_to queue (if present)
+ ```
+
+ ### Standalone Mode (no Legion runtime)
+
+ All transport/actor requires are guarded behind:
+ ```ruby
+ if Legion::Extensions.const_defined?(:Core, false)
+   # transport + actor requires
+ end
+ ```
+ The gem still works as a pure HTTP client library without AMQP, exactly as before.
+
+ ---
+
+ ## Key Design Decisions
+
+ - `generate_stream` and `chat_stream` yield `{ type: :delta, text: }` and `{ type: :done }` events.
+ - `S3Models` runner depends on `lex-s3`. Uses SHA256 digest verification. `import_from_s3` writes
+   directly to the filesystem; `sync_from_s3` pushes blobs through the Ollama API.
+ - `S3Models::OLLAMA_REGISTRY_PREFIX = 'manifests/registry.ollama.ai/library'`.
+ - `Usage` helper normalizes Ollama's token/duration fields to `{ input_tokens:, output_tokens:, ... }`.
+ - All runners return `{ result: body, status: code }`.
+ - **`Runners::Fleet` dispatch rules:**
+   - `request_type: 'embed'` → `Client#embed`, uses `:input` then falls back to `:text`.
+   - `request_type: 'generate'` → `Client#generate`.
+   - anything else (including `'chat'` or unknown) → `Client#chat`.
+ - **`Actor::ModelWorker#use_runner?` is `false`** — bypasses `Legion::Runner` / task DB entirely.
+ - **Reply publishing** never raises — errors are swallowed so the AMQP ack is not blocked.
+ - **Colon sanitisation** — `qwen3.5:27b` becomes `qwen3.5.27b` in queue/routing-key strings.
+
+ ---
+
  ## Dependencies
 
  | Gem | Purpose |
  |-----|---------|
- | faraday | HTTP client for Ollama REST API |
+ | `faraday` >= 2.0 | HTTP client for Ollama REST API |
+ | `lex-s3` >= 0.2 | S3 model distribution operations |
+
+ Fleet transport requires Legion runtime gems (`legion-transport`, `LegionIO`) but those are *not*
+ gemspec dependencies — they are expected to be present in the runtime environment.
+
+ ---
 
  ## Testing
 
  ```bash
  bundle install
- bundle exec rspec
+ bundle exec rspec    # all examples
  bundle exec rubocop
  ```
 
  ---
 
  **Maintained By**: Matthew Iverson (@Esity)
+ **Last Updated**: 2026-04-07
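The `Usage` normalization mentioned above is described only in outline. A minimal sketch of the idea, where `prompt_eval_count`, `eval_count`, and `total_duration` are real Ollama response fields but the exact mapping the gem's helper performs is an assumption:

```ruby
# Sketch (not the gem's actual Usage helper): map Ollama's raw response
# counters into the standard usage shape the runners expose.
def normalize_usage(body)
  {
    input_tokens: body['prompt_eval_count'],
    output_tokens: body['eval_count'],
    total_duration: body['total_duration']
  }
end
```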
data/README.md CHANGED
@@ -119,6 +119,10 @@ result[:usage] # => { input_tokens: 1, output_tokens: 5, total_duration: ..., .
  - [LegionIO](https://github.com/LegionIO/LegionIO) framework
  - [Ollama](https://ollama.com) running locally or on a remote host
 
+ ## Version
+
+ 0.3.1
+
  ## License
 
  MIT
@@ -0,0 +1,427 @@
+ # Fleet Queue Subscription for lex-ollama
+
+ **Date**: 2026-04-07
+ **Status**: Design / RFC
+
+ ---
+
+ ## Problem
+
+ `lex-ollama` currently operates purely as a client library — it wraps the Ollama HTTP API and
+ returns results, but it never *subscribes* to any AMQP queue. That means there is no way for the
+ Legion fleet to route LLM/embed work to an Ollama node over the message bus. Every other
+ producer-side extension (`lex-openai`, `lex-claude`, etc.) publishes to the `extensions` exchange;
+ there is currently no Ollama-backed consumer on the other side.
+
+ ---
+
+ ## Goals
+
+ 1. **Subscribe** — lex-ollama listens on a dedicated queue and processes `llm.request.*` messages
+    sent by other fleet members (lex-llm-gateway, direct callers, etc.).
+ 2. **Model-scoped routing keys** — each local model gets its own binding so traffic can be steered
+    precisely without code-level dispatch logic.
+ 3. **Minimal coupling** — the transport layer is guarded behind `const_defined?` so the gem still
+    works as a standalone library (tests, scripts, irb) without any Legion runtime present.
+ 4. **Consistent patterns** — follow the same `Transport/Queues`, `Transport/Messages`,
+    `Transport/Exchanges`, `Actors` layout used by every other Legion extension.
+
+ ---
+
+ ## Routing Key Schema
+
+ ```
+ llm.request.<provider>.<type>.<model>
+ ```
+
+ | Segment    | Values                                      | Notes                                  |
+ |------------|---------------------------------------------|----------------------------------------|
+ | `provider` | `ollama`                                    | always `ollama` for this extension     |
+ | `type`     | `chat`, `generate`, `embed`                 | maps 1-to-1 to a runner method         |
+ | `model`    | any Ollama model name (`:` → `.` sanitised) | e.g. `nomic-embed-text`, `qwen3.5.27b` |
+
+ ### Examples
+
+ ```
+ llm.request.ollama.embed.nomic-embed-text
+ llm.request.ollama.embed.mxbai-embed-large
+ llm.request.ollama.chat.qwen3.5.27b
+ llm.request.ollama.chat.llama3.2
+ llm.request.ollama.generate.llama3.2
+ ```
+
+ Colons in model names (`qwen3.5:27b`) are converted to dots (`qwen3.5.27b`) because AMQP topic
+ routing keys use `.` as the word separator; keeping model names dot-separated keeps wildcard
+ bindings (`*`, `#`) predictable.
+
+ ---
+
+ ## Queue Strategy: Dynamic Per-Model Queues
+
+ Each subscribed model gets its **own durable queue** bound to the `llm.request` topic exchange.
+
+ ```
+ Exchange: llm.request (topic, durable)
+   ├── llm.request.ollama.embed.nomic-embed-text  → Queue: llm.request.ollama.embed.nomic-embed-text
+   ├── llm.request.ollama.embed.mxbai-embed-large → Queue: llm.request.ollama.embed.mxbai-embed-large
+   ├── llm.request.ollama.chat.qwen3.5.27b        → Queue: llm.request.ollama.chat.qwen3.5.27b
+   └── llm.request.ollama.chat.llama3.2           → Queue: llm.request.ollama.chat.llama3.2
+ ```
+
+ **Why per-model queues instead of a wildcard queue?**
+
+ - Multiple nodes can each carry *different* model subsets. A node with only `nomic-embed-text`
+   should not compete for messages destined for `mxbai-embed-large`.
+ - RabbitMQ quorum queues + SAC (`x-single-active-consumer`) per queue let us cleanly support both
+   load-balancing *and* exclusive-consumer topologies without any application-layer coordination.
+ - Routing key granularity lets lex-llm-gateway (or any sender) address a specific model precisely
+   rather than relying on message-body dispatch.
+
+ ---
+
+ ## New Files
+
+ ```
+ lib/legion/extensions/ollama/
+   transport/
+     exchanges/
+       llm_request.rb     # Topic exchange: 'llm.request'
+     queues/
+       model_request.rb   # Parametric queue class — one instance per (type, model) tuple
+     messages/
+       llm_response.rb    # Response message published back to reply_to
+   actors/
+     model_worker.rb      # Subscription actor — one per registered model
+   runners/
+     fleet.rb             # NEW: fleet request dispatcher (chat/embed/generate dispatch)
+   transport.rb           # Transport module wiring for the extension
+
+ spec/legion/extensions/ollama/
+   transport/
+     exchanges/llm_request_spec.rb
+     queues/model_request_spec.rb
+     messages/llm_response_spec.rb
+   actors/model_worker_spec.rb
+   runners/fleet_spec.rb
+ ```
+
+ ---
+
+ ## Detailed Design
+
+ ### `Transport::Exchanges::LlmRequest`
+
+ ```ruby
+ module Legion::Extensions::Ollama::Transport::Exchanges
+   class LlmRequest < Legion::Transport::Exchange
+     def exchange_name = 'llm.request'
+     def default_type = 'topic'
+   end
+ end
+ ```
+
+ A single `topic` exchange shared by all AI provider extensions. If `lex-openai` or `lex-claude`
+ declare the same exchange name with the same options, the redeclaration is idempotent — RabbitMQ
+ raises `PreconditionFailed` only when the parameters differ.
+
+ ---
+
+ ### `Transport::Queues::ModelRequest`
+
+ A **parametric queue** — one Ruby class, instantiated N times with different `(type, model)` pairs.
+
+ ```ruby
+ module Legion::Extensions::Ollama::Transport::Queues
+   class ModelRequest < Legion::Transport::Queue
+     def initialize(request_type:, model:, **)
+       @request_type = request_type.to_s
+       @model = sanitise_model(model)
+       super(**)
+     end
+
+     def queue_name
+       "llm.request.ollama.#{@request_type}.#{@model}"
+     end
+
+     def queue_options
+       { durable: true, arguments: { 'x-queue-type': 'quorum' } }
+     end
+
+     private
+
+     def sanitise_model(name)
+       name.to_s.tr(':', '.')
+     end
+   end
+ end
+ ```
+
+ The `queue_name` mirrors the routing key exactly, which keeps bindings trivially readable in the
+ RabbitMQ management UI.
+
+ ---
+
+ ### `Transport::Messages::LlmResponse`
+
+ Sent back to `reply_to` (if present) after processing.
+
+ ```ruby
+ module Legion::Extensions::Ollama::Transport::Messages
+   class LlmResponse < Legion::Transport::Message
+     def routing_key = @options[:reply_to]
+     def exchange = Legion::Transport::Exchanges::Agent # reply routed via the Agent exchange
+     def encrypt? = false
+
+     def message
+       {
+         correlation_id: @options[:correlation_id],
+         result: @options[:result],
+         usage: @options[:usage],
+         model: @options[:model],
+         provider: 'ollama',
+         status: @options[:status]
+       }
+     end
+   end
+ end
+ ```
+
+ ---
+
+ ### `Runners::Fleet`
+
+ New runner module. Dispatches inbound AMQP payloads to the appropriate Ollama method and
+ optionally publishes a reply.
+
+ ```ruby
+ module Legion::Extensions::Ollama::Runners::Fleet
+   module_function
+
+   # Primary entry point called by the actor.
+   def handle_request(model:, request_type: 'chat', reply_to: nil,
+                      correlation_id: nil, **payload)
+     result = dispatch(model: model, request_type: request_type, **payload)
+     publish_reply(reply_to, correlation_id, result) if reply_to
+     result
+   end
+
+   def dispatch(model:, request_type:, **payload)
+     client = Legion::Extensions::Ollama::Client.new
+
+     case request_type.to_s
+     when 'embed'
+       client.embed(model: model, input: payload[:input] || payload[:text])
+     when 'generate'
+       client.generate(model: model, prompt: payload[:prompt], **payload.slice(:options, :system))
+     else # 'chat' and anything else
+       client.chat(model: model, messages: payload[:messages],
+                   **payload.slice(:tools, :format, :options))
+     end
+   rescue StandardError => e
+     { result: nil, status: 500, error: e.message }
+   end
+
+   def publish_reply(reply_to, correlation_id, result)
+     return unless defined?(Legion::Transport)
+
+     Transport::Messages::LlmResponse.new(
+       reply_to: reply_to,
+       correlation_id: correlation_id,
+       **result
+     ).publish
+   rescue StandardError
+     nil # never let a broken reply kill the ack
+   end
+ end
+ ```
+
+ ---
+
+ ### `Actors::ModelWorker`
+
+ One actor instance per `(type, model)` pair. Overrides `queue` to return the
+ pre-instantiated `ModelRequest` queue bound to its specific routing key.
+
+ ```ruby
+ module Legion::Extensions::Ollama::Actor
+   class ModelWorker < Legion::Extensions::Actors::Subscription
+     attr_reader :request_type, :model_name
+
+     def initialize(request_type:, model:, **)
+       @request_type = request_type.to_s
+       @model_name = model.to_s
+       super(**)
+     end
+
+     def runner_class = Legion::Extensions::Ollama::Runners::Fleet
+     def runner_function = 'handle_request'
+     def use_runner? = false
+
+     # Override to use a model-scoped queue instead of the default convention-based one.
+     def queue
+       @queue_class ||= Transport::Queues::ModelRequest.new(
+         request_type: @request_type,
+         model: @model_name
+       ).tap do |q|
+         exchange = Transport::Exchanges::LlmRequest.new
+         routing_key = "llm.request.ollama.#{@request_type}.#{@model_name.tr(':', '.')}"
+         q.bind(exchange, routing_key: routing_key)
+       end
+     end
+
+     # Injects request_type + model into every message so Fleet#handle_request
+     # always has them, even if the sender omitted them.
+     def process_message(payload, metadata, delivery_info)
+       msg = super
+       msg[:request_type] ||= @request_type
+       msg[:model] ||= @model_name
+       msg
+     end
+   end
+ end
+ ```
+
+ ---
+
+ ### `transport.rb` (extension-level wiring)
+
+ ```ruby
+ require 'legion/extensions/transport'
+
+ module Legion::Extensions::Ollama::Transport
+   extend Legion::Extensions::Transport
+
+   # No additional e_to_q here — all bindings are created dynamically by
+   # ModelWorker#queue. The exchange declaration is enough for topology mode.
+   def self.additional_e_to_q = []
+ end
+ ```
+
+ ---
+
+ ### Settings / Model Registration
+
+ Models to subscribe for are read from `Legion::Settings` at boot:
+
+ ```yaml
+ # legion.yml (or legion-settings)
+ legion:
+   ollama:
+     host: "http://localhost:11434"
+     subscriptions:
+       - type: embed
+         model: nomic-embed-text
+       - type: embed
+         model: mxbai-embed-large
+       - type: chat
+         model: "qwen3.5:27b"
+       - type: chat
+         model: llama3.2
+       - type: generate
+         model: llama3.2
+ ```
+
+ The extension's `Core` lifecycle hook reads this list and spawns one `ModelWorker` actor per entry.
+
+ ---
+
+ ### `ollama.rb` changes (main extension file)
+
+ Add the new requires (guarded so the gem still loads without Legion core):
+
+ ```ruby
+ require 'legion/extensions/ollama/runners/fleet'
+
+ if Legion::Extensions.const_defined?(:Core, false)
+   require 'legion/extensions/ollama/transport/exchanges/llm_request'
+   require 'legion/extensions/ollama/transport/queues/model_request'
+   require 'legion/extensions/ollama/transport/messages/llm_response'
+   require 'legion/extensions/ollama/transport/transport'
+   require 'legion/extensions/ollama/actors/model_worker'
+ end
+ ```
+
+ ---
+
+ ## Transport Topology Diagram
+
+ ```
+ Publisher (lex-llm-gateway / any Legion node)
+   │ publish routing_key: "llm.request.ollama.embed.nomic-embed-text"
+   ▼
+ Exchange: llm.request [topic, durable]
+   │
+   ├─── binding: llm.request.ollama.embed.nomic-embed-text
+   │      ▼
+   │    Queue: llm.request.ollama.embed.nomic-embed-text [quorum, durable]
+   │      ▼
+   │    ModelWorker(type: embed, model: nomic-embed-text)
+   │      ▼
+   │    Runners::Fleet.handle_request(...)
+   │      ▼
+   │    Ollama::Client#embed(model: 'nomic-embed-text', ...)
+   │      ▼
+   │    LlmResponse.publish → reply_to queue
+   │
+   ├─── binding: llm.request.ollama.embed.mxbai-embed-large
+   │      ▼  [similar chain]
+   │
+   └─── binding: llm.request.ollama.chat.qwen3.5.27b
+          ▼  [similar chain]
+ ```
+
+ ---
+
+ ## What Stays Unchanged
+
+ | Component              | Status    | Reason                                      |
+ |------------------------|-----------|---------------------------------------------|
+ | `Runners::Chat`        | Unchanged | Still used directly + via fleet             |
+ | `Runners::Embeddings`  | Unchanged | Still used directly + via fleet             |
+ | `Runners::Completions` | Unchanged | Still used directly + via fleet             |
+ | `Runners::Models`      | Unchanged | Not a fleet-dispatched concern              |
+ | `Runners::S3Models`    | Unchanged | Separate distribution concern               |
+ | `Runners::Blobs`       | Unchanged | Internal implementation detail              |
+ | `Helpers::Client`      | Unchanged | Faraday factory, no transport coupling      |
+ | `Helpers::Errors`      | Unchanged | Retry logic, no transport coupling          |
+ | `Helpers::Usage`       | Unchanged | Token normalisation, no transport coupling  |
+ | `Client` class         | Unchanged | Standalone HTTP client — no AMQP dependency |
+ | All existing specs     | Unchanged | 82 passing examples must remain green       |
+
+ ---
+
+ ## Open Questions
+
+ 1. **`x-single-active-consumer` per queue?** If multiple ollama nodes carry the same model, do we
+    want them to compete (round-robin, no SAC) or have a single active + hot-standby (SAC=true)?
+    Default proposal: **no SAC** (any subscribed node can serve), matches how lex-conditioner works.
+
+ 2. **Wildcard subscription?** Should there be an opt-in `llm.request.ollama.#` catch-all queue for
+    nodes that want to handle *any* ollama traffic? Useful for dev/single-node setups. Proposal:
+    add as a separate `ModelWorker`-compatible setting (`type: '*', model: '*'`) with a wildcard
+    routing key binding.
+
+ 3. **Streaming over AMQP?** The current design returns the full accumulated response in a single
+    reply message (non-streaming). Streaming responses over AMQP (chunked delta messages) is
+    possible but significantly more complex — deferred to a future phase.
+
+ 4. **`request_type` in routing key vs message body?** Currently the routing key embeds the type
+    (`chat`, `embed`, `generate`). The message body should also carry it so `Fleet#handle_request`
+    can dispatch without needing to parse the delivery routing key. The actor injects it from its
+    own instance vars — this is the agreed approach.
+
+ ---
+
+ ## Implementation Phases
+
+ | Phase | Scope                                                                  | New specs |
+ |-------|------------------------------------------------------------------------|-----------|
+ | 1     | `Transport::Exchanges::LlmRequest` + `Transport::Queues::ModelRequest` | 2 files   |
+ | 2     | `Runners::Fleet` + `Transport::Messages::LlmResponse`                  | 2 files   |
+ | 3     | `Actors::ModelWorker` + `transport.rb` + settings loading              | 2 files   |
+ | 4     | `ollama.rb` integration wiring + CLAUDE.md update                      | —         |
+
+ Each phase is independently reviewable/mergeable.
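The routing-key schema and colon sanitisation described in the design above condense into a few lines of plain Ruby (helper names here are illustrative, not the gem's API):

```ruby
# Build a fleet routing key following llm.request.ollama.<type>.<model>,
# sanitising ':' in model tags to '.' per the project convention.
def sanitise_model(name)
  name.to_s.tr(':', '.')
end

def ollama_routing_key(type, model)
  "llm.request.ollama.#{type}.#{sanitise_model(model)}"
end
```

Because queue names mirror routing keys exactly, the same helper also yields the queue name for a given subscription entry.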
@@ -0,0 +1,79 @@
+ # frozen_string_literal: true
+
+ module Legion
+   module Extensions
+     module Ollama
+       module Actor
+         # Subscription actor that listens on a model-scoped queue and forwards
+         # inbound LLM request messages to Runners::Fleet#handle_request.
+         #
+         # One instance is created per (request_type, model) entry in settings:
+         #
+         #   legion:
+         #     ollama:
+         #       subscriptions:
+         #         - type: embed
+         #           model: nomic-embed-text
+         #         - type: chat
+         #           model: "qwen3.5:27b"
+         #
+         # The queue name and routing key both follow the schema:
+         #   llm.request.ollama.<type>.<model>
+         # where model colons are converted to dots (AMQP topic word separator).
+         class ModelWorker < Legion::Extensions::Actors::Subscription
+           attr_reader :request_type, :model_name
+
+           def initialize(request_type:, model:, **)
+             @request_type = request_type.to_s
+             @model_name = model.to_s
+             super(**)
+           end
+
+           def runner_class
+             Legion::Extensions::Ollama::Runners::Fleet
+           end
+
+           def runner_function
+             'handle_request'
+           end
+
+           # Bypass Legion::Runner — call the runner module directly so we don't
+           # need a task record in the database for every LLM inference hop.
+           def use_runner?
+             false
+           end
+
+           # Override queue to return a model-scoped queue bound with the precise
+           # routing key for this worker's (type, model) pair.
+           def queue
+             @queue ||= build_and_bind_queue
+           end
+
+           # Enrich every inbound message with the worker's own request_type and model
+           # so Runners::Fleet#handle_request always has them, even if the sender omitted them.
+           def process_message(payload, metadata, delivery_info)
+             msg = super
+             msg[:request_type] ||= @request_type
+             msg[:model] ||= @model_name
+             msg
+           end
+
+           private
+
+           def build_and_bind_queue
+             sanitised_model = @model_name.tr(':', '.')
+             routing_key = "llm.request.ollama.#{@request_type}.#{sanitised_model}"
+
+             queue_obj = Transport::Queues::ModelRequest.new(
+               request_type: @request_type,
+               model: @model_name
+             )
+             exchange_obj = Transport::Exchanges::LlmRequest.new
+             queue_obj.bind(exchange_obj, routing_key: routing_key)
+             queue_obj
+           end
+         end
+       end
+     end
+   end
+ end
@@ -0,0 +1,67 @@
+ # frozen_string_literal: true
+
+ module Legion
+   module Extensions
+     module Ollama
+       module Runners
+         # Fleet runner — handles inbound AMQP LLM request messages and dispatches
+         # them to the appropriate Ollama::Client method based on request_type.
+         #
+         # Called by Actor::ModelWorker with use_runner? = false, meaning the actor
+         # calls this module directly rather than going through Legion::Runner.
+         module Fleet
+           module_function
+
+           # Primary entry point called by the subscription actor.
+           #
+           # @param model [String] Ollama model name, e.g. "nomic-embed-text"
+           # @param request_type [String] "chat", "embed", or "generate"
+           # @param reply_to [String, nil] routing key for the reply queue (RPC pattern)
+           # @param correlation_id [String, nil] echoed back in the reply for caller matching
+           # @param payload [Hash] remaining message keys passed through to the Ollama client
+           def handle_request(model:, request_type: 'chat', reply_to: nil,
+                              correlation_id: nil, **payload)
+             result = dispatch(model: model, request_type: request_type, **payload)
+             publish_reply(reply_to, correlation_id, result.merge(model: model)) if reply_to
+             result
+           end
+
+           def dispatch(model:, request_type:, **payload)
+             ollama = Legion::Extensions::Ollama::Client.new
+
+             case request_type.to_s
+             when 'embed'
+               input = payload[:input] || payload[:text]
+               ollama.embed(model: model, input: input,
+                            **payload.slice(:truncate, :options, :keep_alive, :dimensions))
+             when 'generate'
+               ollama.generate(model: model, prompt: payload[:prompt],
+                               **payload.slice(:images, :format, :options, :system, :keep_alive))
+             else
+               # 'chat' and any unrecognised type fall through to chat
+               ollama.chat(model: model, messages: payload[:messages],
+                           **payload.slice(:tools, :format, :options, :keep_alive, :think))
+             end
+           rescue StandardError => e
+             { result: nil, usage: {}, status: 500, error: e.message }
+           end
+
+           def publish_reply(reply_to, correlation_id, result)
+             return unless defined?(Legion::Transport)
+
+             Transport::Messages::LlmResponse.new(
+               reply_to: reply_to,
+               correlation_id: correlation_id,
+               **result
+             ).publish
+           rescue StandardError
+             # Never let a broken reply pipeline kill the consumer ack path.
+             nil
+           end
+
+           private :dispatch, :publish_reply
+         end
+       end
+     end
+   end
+ end
@@ -45,7 +45,7 @@ module Legion
  manifest_key = "#{prefix}/#{OLLAMA_REGISTRY_PREFIX}/#{name}/#{tag}"
  manifest_resp = s3.get_object(bucket: bucket, key: manifest_key)
  manifest_body = manifest_resp[:body]
- manifest_data = JSON.parse(manifest_body)
+ manifest_data = ::JSON.parse(manifest_body)
 
  digests = []
  digests << manifest_data['config'].slice('digest', 'size')
@@ -90,7 +90,7 @@ module Legion
 
  manifest_key = "#{prefix}/#{OLLAMA_REGISTRY_PREFIX}/#{name}/#{tag}"
  manifest_resp = s3.get_object(bucket: bucket, key: manifest_key)
- manifest_data = JSON.parse(manifest_resp[:body])
+ manifest_data = ::JSON.parse(manifest_resp[:body])
 
  digests = []
  digests << manifest_data['config']['digest']
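The `::JSON.parse` fix above exists because Ruby resolves bare constants lexically: inside any `Legion::` module body, `JSON` finds `Legion::JSON` (if defined) before the top-level stdlib module, and only the `::` prefix forces top-level lookup. A self-contained demonstration, where `Legion::JSON` is a stand-in, not the real library:

```ruby
require 'json'

module Legion
  module JSON
    # Stand-in for Legion's own parser (which returns symbol-keyed data).
    def self.parse(_str)
      :legion_json
    end
  end

  module Demo
    def self.bare
      JSON.parse('{"a":1}')   # lexical lookup resolves to Legion::JSON
    end

    def self.scoped
      ::JSON.parse('{"a":1}') # '::' forces the top-level stdlib JSON
    end
  end
end
```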
@@ -0,0 +1,21 @@
+ # frozen_string_literal: true
+
+ module Legion
+   module Extensions
+     module Ollama
+       module Transport
+         module Exchanges
+           class LlmRequest < Legion::Transport::Exchange
+             def exchange_name
+               'llm.request'
+             end
+
+             def default_type
+               'topic'
+             end
+           end
+         end
+       end
+     end
+   end
+ end
@@ -0,0 +1,39 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Legion
4
+ module Extensions
5
+ module Ollama
6
+ module Transport
7
+ module Messages
8
+ # Published back to the caller's reply_to queue after a fleet request is processed.
9
+ # Routed through the Agent exchange with reply_to as the routing key,
10
+ # which is standard for RPC-style reply routing.
11
+ class LlmResponse < Legion::Transport::Message
12
+ def routing_key
13
+ @options[:reply_to]
14
+ end
15
+
16
+ def exchange
17
+ Legion::Transport::Exchanges::Agent
18
+ end
19
+
20
+ def encrypt?
21
+ false
22
+ end
23
+
24
+ def message
25
+ {
26
+ correlation_id: @options[:correlation_id],
27
+ result: @options[:result],
28
+ usage: @options[:usage],
29
+ model: @options[:model],
30
+ provider: 'ollama',
31
+ status: @options[:status] || 200
32
+ }
33
+ end
34
+ end
35
+ end
36
+ end
37
+ end
38
+ end
39
+ end
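A quick plain-Ruby sketch of the reply payload `LlmResponse#message` assembles, showing the `status` default (no Legion dependencies; the helper name is invented for illustration):

```ruby
# Mirrors LlmResponse#message: fields are pulled from the publish options,
# provider is pinned to 'ollama', and status defaults to 200.
def llm_response_message(opts)
  {
    correlation_id: opts[:correlation_id],
    result: opts[:result],
    usage: opts[:usage],
    model: opts[:model],
    provider: 'ollama',
    status: opts[:status] || 200
  }
end

ok  = llm_response_message(correlation_id: 'req-1', result: 'hello', usage: {})
err = llm_response_message(correlation_id: 'req-2', result: nil, status: 500)
puts ok[:status]   # => 200
puts err[:status]  # => 500
```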
@@ -0,0 +1,42 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Legion
4
+ module Extensions
5
+ module Ollama
6
+ module Transport
7
+ module Queues
8
+ # Parametric queue — one instance per (request_type, model) tuple.
9
+ #
10
+ # queue_name mirrors the routing key exactly so bindings are self-documenting
11
+ # in the RabbitMQ management UI, e.g.:
12
+ # llm.request.ollama.embed.nomic-embed-text
13
+ # llm.request.ollama.chat.qwen3.5.27b
14
+ class ModelRequest < Legion::Transport::Queue
15
+ def initialize(request_type:, model:, **)
16
+ @request_type = request_type.to_s
17
+ @model = sanitise_model(model)
18
+ super(**)
19
+ end
20
+
21
+ def queue_name
22
+ "llm.request.ollama.#{@request_type}.#{@model}"
23
+ end
24
+
25
+ def queue_options
26
+ { durable: true, arguments: { 'x-queue-type': 'quorum' } }
27
+ end
28
+
29
+ private
30
+
31
+ # Project convention: use dots as the only word separator in routing keys
32
+ # so queue names stay visually consistent (dots are the AMQP topic separator).
33
+ # e.g. "qwen3.5:27b" → "qwen3.5.27b"
34
+ def sanitise_model(name)
35
+ name.to_s.tr(':', '.')
36
+ end
37
+ end
38
+ end
39
+ end
40
+ end
41
+ end
42
+ end
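The naming convention described in the comments above can be exercised standalone. This sketch duplicates the `queue_name`/`sanitise_model` logic outside of `Legion::Transport` (the helper names are for illustration only):

```ruby
# Duplicates ModelRequest's naming logic for illustration.
def sanitise_model(name)
  name.to_s.tr(':', '.') # "qwen3.5:27b" -> "qwen3.5.27b"
end

def fleet_queue_name(request_type, model)
  "llm.request.ollama.#{request_type}.#{sanitise_model(model)}"
end

puts fleet_queue_name(:embed, 'nomic-embed-text')
# => llm.request.ollama.embed.nomic-embed-text
puts fleet_queue_name(:chat, 'qwen3.5:27b')
# => llm.request.ollama.chat.qwen3.5.27b
```

Because the queue name doubles as the routing key, the colon-to-dot substitution keeps model tags from colliding with any character conventions in AMQP topic routing, where `.` is the segment separator.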
@@ -0,0 +1,25 @@
1
+ # frozen_string_literal: true
2
+
3
+ begin
4
+ require 'legion/extensions/transport'
5
+ rescue LoadError
6
+ nil
7
+ end
8
+
9
+ module Legion
10
+ module Extensions
11
+ module Ollama
12
+ module Transport
13
+ extend Legion::Extensions::Transport if Legion::Extensions.const_defined?(:Transport, false)
14
+
15
+ # All queue-to-exchange bindings are established dynamically by
16
+ # Actor::ModelWorker#build_and_bind_queue at subscription time.
17
+ # This file only needs to declare the exchange so topology/infra mode
18
+ # can introspect the full routing graph.
19
+ def self.additional_e_to_q
20
+ []
21
+ end
22
+ end
23
+ end
24
+ end
25
+ end
@@ -3,7 +3,7 @@
3
3
  module Legion
4
4
  module Extensions
5
5
  module Ollama
6
- VERSION = '0.3.0'
6
+ VERSION = '0.3.1'
7
7
  end
8
8
  end
9
9
  end
@@ -11,12 +11,23 @@ require 'legion/extensions/ollama/runners/embeddings'
11
11
  require 'legion/extensions/ollama/runners/blobs'
12
12
  require 'legion/extensions/ollama/runners/s3_models'
13
13
  require 'legion/extensions/ollama/runners/version'
14
+ require 'legion/extensions/ollama/runners/fleet'
14
15
  require 'legion/extensions/ollama/client'
15
16
 
17
+ # Fleet transport and actor wiring — only loaded when Legion::Extensions::Core is present
18
+ # so the gem still works as a standalone HTTP client without any AMQP runtime.
19
+ if Legion::Extensions.const_defined?(:Core, false)
20
+ require 'legion/extensions/ollama/transport/exchanges/llm_request'
21
+ require 'legion/extensions/ollama/transport/queues/model_request'
22
+ require 'legion/extensions/ollama/transport/messages/llm_response'
23
+ require 'legion/extensions/ollama/transport'
24
+ require 'legion/extensions/ollama/actors/model_worker'
25
+ end
26
+
16
27
  module Legion
17
28
  module Extensions
18
29
  module Ollama
19
- extend Legion::Extensions::Core if Legion::Extensions.const_defined? :Core
30
+ extend Legion::Extensions::Core if Legion::Extensions.const_defined?(:Core, false)
20
31
  end
21
32
  end
22
33
  end
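The switch from `const_defined? :Core` to `const_defined?(:Core, false)` in the hunk above matters because, with the default `inherit: true`, `Module#const_defined?` also searches `Object` — so any unrelated top-level `Core` constant would falsely enable the Legion wiring. A standalone illustration:

```ruby
module Legion
  module Extensions; end
end

# A hypothetical top-level constant, defined only for this illustration.
Core = Module.new

puts Legion::Extensions.const_defined?(:Core)         # => true  (found via Object)
puts Legion::Extensions.const_defined?(:Core, false)  # => false (own namespace only)
```

Passing `false` restricts the lookup to `Legion::Extensions`' own namespace, which is the actual condition the guard intends to test.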
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: lex-ollama
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.3.0
4
+ version: 0.3.1
5
5
  platform: ruby
6
6
  authors:
7
7
  - Esity
@@ -56,8 +56,10 @@ files:
56
56
  - README.md
57
57
  - docs/plans/2026-04-01-s3-model-distribution-design.md
58
58
  - docs/plans/2026-04-01-s3-model-distribution-plan.md
59
+ - docs/plans/2026-04-07-fleet-queue-subscription-design.md
59
60
  - lex-ollama.gemspec
60
61
  - lib/legion/extensions/ollama.rb
62
+ - lib/legion/extensions/ollama/actors/model_worker.rb
61
63
  - lib/legion/extensions/ollama/client.rb
62
64
  - lib/legion/extensions/ollama/helpers/client.rb
63
65
  - lib/legion/extensions/ollama/helpers/errors.rb
@@ -66,9 +68,14 @@ files:
66
68
  - lib/legion/extensions/ollama/runners/chat.rb
67
69
  - lib/legion/extensions/ollama/runners/completions.rb
68
70
  - lib/legion/extensions/ollama/runners/embeddings.rb
71
+ - lib/legion/extensions/ollama/runners/fleet.rb
69
72
  - lib/legion/extensions/ollama/runners/models.rb
70
73
  - lib/legion/extensions/ollama/runners/s3_models.rb
71
74
  - lib/legion/extensions/ollama/runners/version.rb
75
+ - lib/legion/extensions/ollama/transport.rb
76
+ - lib/legion/extensions/ollama/transport/exchanges/llm_request.rb
77
+ - lib/legion/extensions/ollama/transport/messages/llm_response.rb
78
+ - lib/legion/extensions/ollama/transport/queues/model_request.rb
72
79
  - lib/legion/extensions/ollama/version.rb
73
80
  homepage: https://github.com/LegionIO/lex-ollama
74
81
  licenses: