lex-ollama 0.3.1 → 0.3.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 28df561b00b58c7cb179b9904aed61a5aa7e278140306dadb3b4b2665eaab824
- data.tar.gz: 446afaab9d80e6a4f62286a1f5ccc1c023bdbb178dba043cb96081412991b2d3
+ metadata.gz: 8657f3e11e11fcd2ee34e12317bf7698bcfea1e907006c76f9a07326996c7a69
+ data.tar.gz: d1c1bbb05dc6a3a0071b4474a45a074b0d4929cd4b48b7488b5aa10539a9a6ee
  SHA512:
- metadata.gz: 2915cfe6e4e959e61ee5b8ce68e7da784b4c6001cfe0c3acdb0a4e0f804da79a1e46a17b7c5297b9dd4f26e58bbae504066f5874d8cd82d6ea223b3dfc561bbb
- data.tar.gz: cb1337292d4bb7c94603612e03dbdbcbd9a41c2a94e56f4bbfad1132d403f6d14b0316091017745ee2d282ccc426bdcdb65b137f66f07f2a505d231792e424b0
+ metadata.gz: e2a8622a2914cdfbc04b365d1ca7a9e8d35b4daa656931fd23a6e010b25b5a8ed6699246bffdbe0bfe064757ad26cad8f3664fe8c1dd2d1c606220dc932af45f
+ data.tar.gz: ea975e9ac1c89621d41c274b6040bb92760672f9d4a2db223ea6c08371f7ed8e8a2dc810417b9b5b2beb9ce30fccb64544f24c11ae815422e5b99f5a43a48517
data/CHANGELOG.md CHANGED
@@ -1,5 +1,20 @@
  # Changelog

+ ## [0.3.2] - 2026-04-08
+
+ ### Changed
+ - `Transport::Exchanges::LlmRequest` now inherits `Legion::LLM::Fleet::Exchange` instead of declaring exchange properties independently — prevents silent divergence if the canonical exchange definition changes
+ - `Transport::Queues::ModelRequest` switched from durable quorum queue to classic auto-delete with `x-max-priority: 10` — enables `basic.return` feedback when all workers disconnect; added `dlx_enabled: false` to prevent DLX provisioning on ephemeral queues
+ - `Transport::Messages::LlmResponse` now inherits `Legion::LLM::Fleet::Response` instead of `Legion::Transport::Message` — gains wire protocol compliance (`type: 'llm.fleet.response'`, `message_context` propagation, default-exchange publishing, `resp_` prefixed message_id); overrides `app_id` to `'lex-ollama'`
+ - `Runners::Fleet#handle_request` now accepts and propagates `message_context` verbatim from request to response; rejects `stream: true` requests with `unsupported_streaming` error; builds full wire protocol response envelope (routing, tokens, timestamps, audit, cost, stop)
+ - `Runners::Fleet#publish_reply` switched from positional to keyword arguments; uses `fleet_correlation_id` instead of `correlation_id` to avoid collision with Legion task tracking
+ - `Runners::Fleet#dispatch` now resolves Ollama host from `Legion::Settings` instead of using hardcoded default
+ - `Actor::ModelWorker` now sets `prefetch(1)` for fair consumer dispatch; reads `consumer_priority` from `legion.ollama.fleet.consumer_priority` settings; passes `x-priority` in `subscribe_options`; injects `message_context: {}` default in `process_message`
+
+ ### Added
+ - `Runners::Fleet#publish_error` — publishes `Legion::LLM::Fleet::Error` to caller's reply_to queue on validation failures (e.g., unsupported streaming)
+ - `Runners::Fleet#build_response_body` — constructs wire protocol response body with routing, tokens, timestamps, audit, cost, and stop blocks
+
  ## [0.3.1] - 2026-04-08

  ### Added
data/CLAUDE.md CHANGED
@@ -12,7 +12,7 @@ reporting, and **fleet queue subscription** for receiving routed LLM requests fr

  **GitHub**: https://github.com/LegionIO/lex-ollama
  **License**: MIT
- **Version**: 0.3.1
+ **Version**: 0.3.2
  **Specs**: 82 examples (12 spec files) — fleet additions add ~35 more

  ---
@@ -38,11 +38,11 @@ Legion::Extensions::Ollama
  ├── Client # Standalone client class (includes all runners, holds @config)
  ├── Transport/ # (loaded only when Legion::Extensions::Core is present)
  │ ├── Exchanges/
- │ │ └── LlmRequest # topic exchange 'llm.request'
+ │ │ └── LlmRequest # references Legion::LLM::Fleet::Exchange ('llm.request')
  │ ├── Queues/
- │ │ └── ModelRequest # parametric queue — one per (type, model) pair
+ │ │ └── ModelRequest # parametric queue — one per (type, model) pair, auto-delete
  │ └── Messages/
- │ └── LlmResponse # reply message published back to reply_to
+ │ └── LlmResponse # Legion::LLM::Fleet::Response subclass, reply via default exchange
  └── Actor/
  └── ModelWorker # subscription actor — one per registered model/type
  ```
@@ -80,9 +80,12 @@ llm.request.ollama.generate.llama3.2

  ### Queue Strategy

- Each model+type combination gets its own **durable quorum queue** with a routing key that matches
+ Each model+type combination gets its own **auto-delete queue** with a routing key that matches
  its queue name exactly. Multiple nodes carrying the same model compete fairly (no SAC) — any
  subscriber can serve. The queue name is identical to the routing key for clarity in the management UI.
+ RabbitMQ policies (applied externally via Terraform) set `max-length` and
+ `overflow: reject-publish` on `llm.request.*` queues. Queue priority is enabled by declaring
+ `x-max-priority: 10` on the queue itself (and may also be mirrored by policy for consistency).

  ### Configuration

@@ -106,20 +109,24 @@ The extension spawns one `Actor::ModelWorker` per subscription entry at boot.
  ### Data Flow

  ```
- Publisher (lex-llm-gateway / any fleet node)
+ Publisher (legion-llm Fleet::Dispatcher / any fleet node)
  │ routing_key: "llm.request.ollama.embed.nomic-embed-text"
+ │ AMQP type: 'llm.fleet.request'
+ │ Body includes: message_context { conversation_id, message_id, parent_message_id, message_seq, request_id, exchange_id }

  Exchange: llm.request [topic, durable]

- └── Queue: llm.request.ollama.embed.nomic-embed-text [quorum]
+ └── Queue: llm.request.ollama.embed.nomic-embed-text [auto-delete]

  Actor::ModelWorker (type=embed, model=nomic-embed-text)

  Runners::Fleet#handle_request
+ │ copies message_context from request

  Ollama::Client#embed(model: 'nomic-embed-text', ...)

- Transport::Messages::LlmResponse → reply_to queue (if present)
+ Fleet::Response (type: 'llm.fleet.response') → reply_to queue
+ Body includes: message_context (copied), response_message_id
  ```

  ### Standalone Mode (no Legion runtime)
@@ -152,6 +159,33 @@ The gem still works as a pure HTTP client library without AMQP, exactly as befor

  ---

+ ## Wire Protocol & Message Classes
+
+ Fleet messages inherit from `Legion::LLM::Transport::Message` (defined in legion-llm), which
+ extends `Legion::Transport::Message` with `message_context` propagation and LLM-specific headers.
+
+ ```
+ Legion::Transport::Message (platform base)
+ └── Legion::LLM::Transport::Message (LLM base — message_context, llm_headers)
+ ├── Legion::LLM::Fleet::Request (type: 'llm.fleet.request', app_id: 'legion-llm')
+ ├── Legion::LLM::Fleet::Response (type: 'llm.fleet.response', app_id: 'lex-ollama')
+ └── Legion::LLM::Fleet::Error (type: 'llm.fleet.error', app_id: 'lex-ollama')
+ ```
+
+ Every fleet message carries `message_context` in the body for end-to-end tracing:
+ ```
+ message_context:
+ conversation_id, message_id, parent_message_id, message_seq, request_id, exchange_id
+ ```
+
+ A subset (`conversation_id`, `message_id`, `request_id`) is promoted to AMQP headers
+ (`x-legion-llm-conversation-id`, etc.) for filtering without body parsing.
+
+ See: `docs/plans/2026-04-08-fleet-wire-protocol.md` for full AMQP property mapping,
+ platform-wide standard, and per-message-type specifications.
+
+ ---
+
  ## Dependencies

  | Gem | Purpose |
@@ -159,8 +193,9 @@ The gem still works as a pure HTTP client library without AMQP, exactly as befor
  | `faraday` >= 2.0 | HTTP client for Ollama REST API |
  | `lex-s3` >= 0.2 | S3 model distribution operations |

- Fleet transport requires Legion runtime gems (`legion-transport`, `LegionIO`) but those are *not*
- gemspec dependencies — they are expected to be present in the runtime environment.
+ Fleet transport requires Legion runtime gems (`legion-transport`, `legion-llm`, `LegionIO`) but
+ those are *not* gemspec dependencies — they are expected to be present in the runtime environment.
+ `legion-llm` is needed for fleet message classes (`Legion::LLM::Fleet::Request`, etc.).

  ---

@@ -175,4 +210,4 @@ bundle exec rubocop
  ---

  **Maintained By**: Matthew Iverson (@Esity)
- **Last Updated**: 2026-04-07
+ **Last Updated**: 2026-04-08
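
The message-class hierarchy described in CLAUDE.md's wire-protocol section can be sketched in plain Ruby. Only the class names and the overridden `type`/`app_id` values come from the diff above; the stand-in base classes and method bodies here are hypothetical, not the legion-llm implementation:

```ruby
# Illustrative stand-ins for the hierarchy described in CLAUDE.md.
# Names and type/app_id values are from the diff; bodies are hypothetical.
class PlatformMessage                # stands in for Legion::Transport::Message
  def type   = 'message'
  def app_id = 'legion'
end

class LlmMessage < PlatformMessage   # stands in for Legion::LLM::Transport::Message
  # every fleet message carries message_context in its body
  def message_context
    { conversation_id: nil, message_id: nil, parent_message_id: nil,
      message_seq: nil, request_id: nil, exchange_id: nil }
  end
end

class FleetRequest < LlmMessage      # Legion::LLM::Fleet::Request
  def type   = 'llm.fleet.request'
  def app_id = 'legion-llm'
end

class FleetResponse < LlmMessage     # Legion::LLM::Fleet::Response
  def type   = 'llm.fleet.response'
  def app_id = 'lex-ollama'
end

class FleetError < LlmMessage        # Legion::LLM::Fleet::Error
  def type   = 'llm.fleet.error'
  def app_id = 'lex-ollama'
end
```

The point of the shape: `lex-ollama` only ever overrides `app_id` at the leaves, so the wire `type` and `message_context` plumbing stay defined in one place upstream.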
@@ -11,6 +11,8 @@ module Legion
  #
  # legion:
  # ollama:
+ # fleet:
+ # consumer_priority: 10
  # subscriptions:
  # - type: embed
  # model: nomic-embed-text
@@ -43,6 +45,36 @@ module Legion
  false
  end

+ # prefetch(1) is required for consumer priority to work correctly:
+ # without it, a high-priority consumer can hold multiple messages while
+ # lower-priority consumers sit idle. With prefetch=1, each consumer
+ # completes one message before RabbitMQ delivers the next, and priority
+ # determines which idle consumer gets it.
+ def prefetch
+ 1
+ end
+
+ # Consumer priority from settings. Tells RabbitMQ to prefer this consumer
+ # over lower-priority ones on the same queue when multiple consumers are idle.
+ # Standard scale: GPU server = 10, Mac Studio = 5, developer laptop = 1.
+ # Defaults to 0 (equal priority) if not configured.
+ def consumer_priority
+ return 0 unless defined?(Legion::Settings)
+
+ Legion::Settings.dig(:ollama, :fleet, :consumer_priority) || 0
+ end
+
+ # Subscribe options include x-priority argument so RabbitMQ can honour
+ # consumer priority when dispatching to competing consumers.
+ def subscribe_options
+ base = begin
+ super
+ rescue NoMethodError
+ {}
+ end
+ base.merge(arguments: { 'x-priority' => consumer_priority })
+ end
+
  # Override queue to return a model-scoped queue bound with the precise
  # routing key for this worker's (type, model) pair.
  def queue
@@ -50,11 +82,13 @@ module Legion
  end

  # Enrich every inbound message with the worker's own request_type and model
- # so Runners::Fleet#handle_request always has them, even if the sender omitted them.
+ # so Runners::Fleet#handle_request always has them, even if the sender omitted
+ # them. Also defaults message_context to {} if absent.
  def process_message(payload, metadata, delivery_info)
  msg = super
- msg[:request_type] ||= @request_type
- msg[:model] ||= @model_name
+ msg[:request_type] ||= @request_type
+ msg[:model] ||= @model_name
+ msg[:message_context] ||= {}
  msg
  end
 
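
The enrichment and settings-fallback behaviour above can be exercised outside the Legion runtime. In this sketch a plain hash stands in for `Legion::Settings`, and `enrich` mirrors the `||=` defaults that `process_message` applies; both stand-ins are illustrative, not the actor's real API:

```ruby
# Plain-hash stand-in for Legion::Settings (hypothetical values).
settings = { ollama: { fleet: { consumer_priority: 10 } } }

# Equivalent of consumer_priority: dig with a 0 fallback.
priority = settings.dig(:ollama, :fleet, :consumer_priority) || 0

# Equivalent of the process_message enrichment for a sender that
# omitted request_type, model, and message_context.
def enrich(msg, request_type:, model:)
  msg[:request_type]    ||= request_type
  msg[:model]           ||= model
  msg[:message_context] ||= {}   # downstream copies this into the reply
  msg
end

msg = enrich({ input: 'hello' }, request_type: 'embed', model: 'nomic-embed-text')
# msg[:message_context] is now {} rather than nil, so the reply path
# never needs a nil check before copying it.
```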
@@ -7,59 +7,204 @@ module Legion
  # Fleet runner — handles inbound AMQP LLM request messages and dispatches
  # them to the appropriate Ollama::Client method based on request_type.
  #
- # Called by Actor::ModelWorker with use_runner? = false, meaning the actor
- # calls this module directly rather than going through Legion::Runner.
+ # Called by Actor::ModelWorker with use_runner? = false.
  module Fleet
- module_function
-
- # Primary entry point called by the subscription actor.
- #
- # @param model [String] Ollama model name, e.g. "nomic-embed-text"
- # @param request_type [String] "chat", "embed", or "generate"
- # @param reply_to [String, nil] routing key for the reply queue (RPC pattern)
- # @param correlation_id [String, nil] echoed back in the reply for caller matching
- # @param payload [Hash] remaining message keys passed through to the Ollama client
- def handle_request(model:, request_type: 'chat', reply_to: nil,
- correlation_id: nil, **payload)
- result = dispatch(model: model, request_type: request_type, **payload)
- publish_reply(reply_to, correlation_id, result.merge(model: model)) if reply_to
- result
- end
+ class << self
+ # Primary entry point called by the subscription actor.
+ #
+ # @param model [String] Ollama model name, e.g. "nomic-embed-text"
+ # @param request_type [String] "chat", "embed", or "generate"
+ # @param reply_to [String, nil] routing key for the reply queue (RPC pattern)
+ # @param correlation_id [String, nil] fleet correlation ID, echoed back in reply
+ # @param message_context [Hash] tracing context copied verbatim into the reply
+ # @param payload [Hash] remaining message keys passed to the Ollama client
+ def handle_request(model:, request_type: 'chat', reply_to: nil,
+ correlation_id: nil, message_context: {}, **payload)
+ received_at = Time.now.utc
+
+ if payload[:stream]
+ publish_error(
+ reply_to: reply_to,
+ correlation_id: correlation_id,
+ message_context: message_context,
+ model: model,
+ request_type: request_type,
+ error: {
+ code: 'unsupported_streaming',
+ message: 'Streaming over the fleet AMQP bus is not supported in v1',
+ retriable: false,
+ category: 'validation',
+ provider: 'ollama'
+ }
+ )
+ return { result: nil, status: 422, error: 'unsupported_streaming' }
+ end
+
+ result = dispatch(model: model, request_type: request_type, **payload)
+ returned_at = Time.now.utc
+
+ if reply_to
+ publish_reply(
+ reply_to: reply_to,
+ correlation_id: correlation_id,
+ message_context: message_context,
+ model: model,
+ request_type: request_type,
+ result: result,
+ received_at: received_at,
+ returned_at: returned_at
+ )
+ end

- def dispatch(model:, request_type:, **payload)
- ollama = Legion::Extensions::Ollama::Client.new
-
- case request_type.to_s
- when 'embed'
- input = payload[:input] || payload[:text]
- ollama.embed(model: model, input: input,
- **payload.slice(:truncate, :options, :keep_alive, :dimensions))
- when 'generate'
- ollama.generate(model: model, prompt: payload[:prompt],
- **payload.slice(:images, :format, :options, :system, :keep_alive))
- else
- # 'chat' and any unrecognised type falls through to chat
- ollama.chat(model: model, messages: payload[:messages],
- **payload.slice(:tools, :format, :options, :keep_alive, :think))
+ result
  end
- rescue StandardError => e
- { result: nil, usage: {}, status: 500, error: e.message }
- end

- def publish_reply(reply_to, correlation_id, result)
- return unless defined?(Legion::Transport)
-
- Transport::Messages::LlmResponse.new(
- reply_to: reply_to,
- correlation_id: correlation_id,
- **result
- ).publish
- rescue StandardError
- # Never let a broken reply pipeline kill the consumer ack path.
- nil
- end
+ # Dispatch to the correct Ollama client method by request_type.
+ #
+ # @return [Hash] { result: body, status: code } or { result: nil, status: 500, error: msg }
+ def dispatch(model:, request_type:, **payload)
+ host = ollama_host
+ ollama = Legion::Extensions::Ollama::Client.new(host: host)
+
+ case request_type.to_s
+ when 'embed'
+ input = payload[:input] || payload[:text]
+ ollama.embed(model: model, input: input,
+ **payload.slice(:truncate, :options, :keep_alive, :dimensions))
+ when 'generate'
+ ollama.generate(model: model, prompt: payload[:prompt],
+ **payload.slice(:images, :format, :options, :system, :keep_alive))
+ else
+ ollama.chat(model: model, messages: payload[:messages],
+ **payload.slice(:tools, :format, :options, :keep_alive, :think))
+ end
+ rescue StandardError => e
+ { result: nil, usage: {}, status: 500, error: e.message }
+ end
+
+ # Publish a successful fleet response to the caller's reply_to queue.
+ # Errors are swallowed so the AMQP ack path is never blocked by a broken reply.
+ def publish_reply(reply_to:, correlation_id:, message_context:, model:,
+ request_type:, result:, received_at:, returned_at:)
+ return unless defined?(Legion::Transport)
+
+ body = result[:result] || {}
+ usage = result[:usage] || {}
+ status = result[:status] || 200
+ latency_ms = ((returned_at - received_at) * 1000).round
+
+ Transport::Messages::LlmResponse.new(
+ reply_to: reply_to,
+ fleet_correlation_id: correlation_id,
+ message_context: message_context,
+ provider: 'ollama',
+ model: model,
+ request_type: request_type,
+ app_id: 'lex-ollama',
+ **build_response_body(
+ request_type: request_type,
+ body: body,
+ usage: usage,
+ status: status,
+ model: model,
+ latency_ms: latency_ms,
+ received_at: received_at,
+ returned_at: returned_at
+ )
+ ).publish
+ rescue StandardError
+ nil
+ end
+
+ # Publish a fleet error to the caller's reply_to queue.
+ # Errors are swallowed so the AMQP ack path is never blocked.
+ def publish_error(reply_to:, correlation_id:, message_context:, model:,
+ request_type:, error:)
+ return unless reply_to
+ return unless defined?(Legion::Transport)

- private :dispatch, :publish_reply
+ Legion::LLM::Fleet::Error.new(
+ reply_to: reply_to,
+ fleet_correlation_id: correlation_id,
+ message_context: message_context,
+ provider: 'ollama',
+ model: model,
+ request_type: request_type,
+ app_id: 'lex-ollama',
+ error: error,
+ worker_node: node_identity
+ ).publish
+ rescue StandardError
+ nil
+ end
+
+ private
+
+ # Build the JSON body for a successful fleet response.
+ def build_response_body(request_type:, body:, usage:, status:, model:,
+ latency_ms:, received_at:, returned_at:)
+ base = {
+ routing: {
+ provider: 'ollama',
+ model: model,
+ tier: 'fleet',
+ strategy: 'fleet_dispatch',
+ latency_ms: latency_ms
+ },
+ tokens: {
+ input: usage[:input_tokens] || 0,
+ output: usage[:output_tokens] || 0,
+ total: (usage[:input_tokens] || 0) + (usage[:output_tokens] || 0)
+ },
+ stop: { reason: body.is_a?(Hash) ? body['done_reason'] : nil },
+ cost: { estimated_usd: 0.0, provider: 'ollama', model: model },
+ timestamps: {
+ received: received_at.iso8601(3),
+ provider_start: received_at.iso8601(3),
+ provider_end: returned_at.iso8601(3),
+ returned: returned_at.iso8601(3)
+ },
+ audit: {
+ 'fleet:execute' => {
+ outcome: status == 200 ? 'success' : 'error',
+ duration_ms: latency_ms,
+ timestamp: returned_at.iso8601(3)
+ }
+ },
+ stream: false
+ }
+
+ case request_type.to_s
+ when 'embed'
+ base.merge(
+ embeddings: body.is_a?(Hash) ? body['embeddings'] : body
+ )
+ when 'generate'
+ base.merge(
+ message: { role: 'assistant', content: body.is_a?(Hash) ? body['response'] : body }
+ )
+ else
+ content = body.is_a?(Hash) ? body.dig('message', 'content') : body
+ base.merge(
+ message: { role: 'assistant', content: content }
+ )
+ end
+ end
+
+ # Resolve the Ollama host from settings, falling back to the default.
+ def ollama_host
+ return Helpers::Client::DEFAULT_HOST unless defined?(Legion::Settings)
+
+ Legion::Settings.dig(:ollama, :host) || Helpers::Client::DEFAULT_HOST
+ end
+
+ # Resolve the local node identity for worker_node in error messages.
+ def node_identity
+ return 'unknown' unless defined?(Legion::Settings)
+
+ Legion::Settings.dig(:node, :canonical_name) || 'unknown'
+ end
+ end
  end
  end
  end
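
The tokens and latency arithmetic inside `build_response_body` can be checked standalone. This sketch reproduces just that math with made-up usage numbers (the `usage` values and timestamps are illustrative, not real provider output):

```ruby
require 'time'  # for Time#iso8601

# Made-up inputs standing in for a dispatch result.
usage       = { input_tokens: 12, output_tokens: 34 }
received_at = Time.utc(2026, 4, 8, 12, 0, 0)
returned_at = received_at + 1.5   # 1.5 s of provider time

# Same arithmetic as build_response_body above.
latency_ms = ((returned_at - received_at) * 1000).round
tokens = {
  input:  usage[:input_tokens] || 0,
  output: usage[:output_tokens] || 0,
  total:  (usage[:input_tokens] || 0) + (usage[:output_tokens] || 0)
}
timestamps = {
  received: received_at.iso8601(3),
  returned: returned_at.iso8601(3)
}

latency_ms       # => 1500
tokens[:total]   # => 46
```

The `|| 0` fallbacks matter because a failed dispatch returns `usage: {}`, and the totals must still be well-formed integers in the wire body.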
@@ -5,14 +5,10 @@ module Legion
  module Ollama
  module Transport
  module Exchanges
- class LlmRequest < Legion::Transport::Exchange
- def exchange_name
- 'llm.request'
- end
-
- def default_type
- 'topic'
- end
+ # Thin alias that delegates exchange definition to Legion::LLM::Fleet::Exchange.
+ # This class exists solely so Ollama::Transport topology introspection has a
+ # local reference without importing legion-llm internals directly.
+ class LlmRequest < Legion::LLM::Fleet::Exchange
  end
  end
  end
@@ -6,30 +6,19 @@ module Legion
  module Transport
  module Messages
  # Published back to the caller's reply_to queue after a fleet request is processed.
- # Uses the default RabbitMQ exchange (direct, empty string) with reply_to as routing key,
- # which is standard for RPC-style reply routing.
- class LlmResponse < Legion::Transport::Message
- def routing_key
- @options[:reply_to]
- end
-
- def exchange
- Legion::Transport::Exchanges::Agent
- end
-
- def encrypt?
- false
- end
-
- def message
- {
- correlation_id: @options[:correlation_id],
- result: @options[:result],
- usage: @options[:usage],
- model: @options[:model],
- provider: 'ollama',
- status: @options[:status] || 200
- }
+ #
+ # Inherits Legion::LLM::Fleet::Response which:
+ # - sets type: 'llm.fleet.response'
+ # - sets routing_key to @options[:reply_to]
+ # - publishes via AMQP default exchange ('')
+ # - propagates message_context into body and headers
+ # - generates message_id with 'resp_' prefix
+ #
+ # This class only overrides app_id so audit records and the wire protocol
+ # correctly identify lex-ollama as the worker component.
+ class LlmResponse < Legion::LLM::Fleet::Response
+ def app_id
+ 'lex-ollama'
  end
  end
  end
@@ -11,6 +11,13 @@ module Legion
  # in the RabbitMQ management UI, e.g.:
  # llm.request.ollama.embed.nomic-embed-text
  # llm.request.ollama.chat.qwen3.5.27b
+ #
+ # Queue strategy:
+ # - classic (not quorum): quorum queues cannot be auto-delete
+ # - auto_delete: true — queue deletes when last consumer disconnects + queue empties,
+ # enabling basic.return feedback to publishers via mandatory: true
+ # - x-max-priority: 10 — must be a queue argument at declaration time for classic
+ # queues; policies handle max-length and overflow externally
  class ModelRequest < Legion::Transport::Queue
  def initialize(request_type:, model:, **)
  @request_type = request_type.to_s
@@ -23,14 +30,23 @@ module Legion
  end

  def queue_options
- { durable: true, arguments: { 'x-queue-type': 'quorum' } }
+ {
+ durable: false,
+ auto_delete: true,
+ arguments: { 'x-max-priority' => 10 }
+ }
+ end
+
+ # Disable dead-letter exchange provisioning. The base class
+ # default_options always adds x-dead-letter-exchange when
+ # dlx_enabled returns true. Fleet queues are ephemeral
+ # (auto-delete) and must not provision persistent DLX queues.
+ def dlx_enabled
+ false
  end

  private

- # Project convention: use dots as the only word separator in routing keys
- # so queue names stay visually consistent (dots are the AMQP topic separator).
- # e.g. "qwen3.5:27b" → "qwen3.5.27b"
  def sanitise_model(name)
  name.to_s.tr(':', '.')
  end
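
The declaration options and model-name sanitisation above can be exercised without a broker. In this sketch `queue_options` and `sanitise_model` mirror the diff; `queue_name_for` is a hypothetical helper composing the name the way the queue-naming comment describes:

```ruby
# Mirrors queue_options from the diff: classic auto-delete priority queue.
def queue_options
  { durable: false, auto_delete: true, arguments: { 'x-max-priority' => 10 } }
end

# Mirrors sanitise_model: colons become dots so the routing key uses
# only the AMQP topic separator, e.g. "qwen3.5:27b" -> "qwen3.5.27b".
def sanitise_model(name)
  name.to_s.tr(':', '.')
end

# Hypothetical helper composing the queue name / routing key pair.
def queue_name_for(request_type, model)
  "llm.request.ollama.#{request_type}.#{sanitise_model(model)}"
end

queue_name_for('chat', 'qwen3.5:27b')
# => "llm.request.ollama.chat.qwen3.5.27b"
```

Because the name and routing key are identical, the binding is exact-match even on a topic exchange, which is what makes each (type, model) queue visually self-describing in the management UI.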
@@ -3,7 +3,7 @@
  module Legion
  module Extensions
  module Ollama
- VERSION = '0.3.1'
+ VERSION = '0.3.2'
  end
  end
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: lex-ollama
  version: !ruby/object:Gem::Version
- version: 0.3.1
+ version: 0.3.2
  platform: ruby
  authors:
  - Esity
@@ -54,9 +54,6 @@ files:
  - Gemfile
  - LICENSE
  - README.md
- - docs/plans/2026-04-01-s3-model-distribution-design.md
- - docs/plans/2026-04-01-s3-model-distribution-plan.md
- - docs/plans/2026-04-07-fleet-queue-subscription-design.md
  - lex-ollama.gemspec
  - lib/legion/extensions/ollama.rb
  - lib/legion/extensions/ollama/actors/model_worker.rb