lex-ollama 0.3.1 → 0.3.3
- checksums.yaml +4 -4
- data/CHANGELOG.md +24 -0
- data/CLAUDE.md +48 -12
- data/README.md +35 -5
- data/lib/legion/extensions/ollama/actors/model_sync.rb +90 -0
- data/lib/legion/extensions/ollama/actors/model_worker.rb +55 -14
- data/lib/legion/extensions/ollama/runners/fleet.rb +193 -48
- data/lib/legion/extensions/ollama/transport/exchanges/llm_request.rb +4 -8
- data/lib/legion/extensions/ollama/transport/messages/llm_response.rb +13 -24
- data/lib/legion/extensions/ollama/transport.rb +2 -7
- data/lib/legion/extensions/ollama/version.rb +1 -1
- data/lib/legion/extensions/ollama.rb +1 -1
- metadata +2 -5
- data/docs/plans/2026-04-01-s3-model-distribution-design.md +0 -131
- data/docs/plans/2026-04-01-s3-model-distribution-plan.md +0 -655
- data/docs/plans/2026-04-07-fleet-queue-subscription-design.md +0 -427
- data/lib/legion/extensions/ollama/transport/queues/model_request.rb +0 -42
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: '086cd0d9744893c480b69c7b9fc6a10229af627344d316cafbccbd72d01cdafe'
+  data.tar.gz: e88f7893305401d36d8a3d48a7fb057e831fb6edeb245a5691c3d1e2527e1f1f
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: f6154fee6005bb96262961342983f01c5d4e5f2747c2cd11a33afbebc1f5ffa00ea64e4f39ef63758620524a5c6690b1a81856c1658371acc040fdd2295c8731
+  data.tar.gz: 0dda701fb81e1f084bc3cc4404d8014d6e2b6ecc68153fcdb42009c10a337f4a078ff11890a4032d95d516a5c1bc5177f4a8e1492db4caea47e9da1641fe6669
data/CHANGELOG.md
CHANGED
@@ -1,5 +1,29 @@
 # Changelog

+## [0.3.3] - 2026-04-16
+
+### Added
+- `Actor::ModelSync` — once actor; runs 5s after extension load; reads `legion.ollama.default_models` and `legion.ollama.s3` from settings; calls `import_from_s3` for any configured model not already present on disk; no-op if either setting is absent
+
+### Fixed
+- `Transport::Queues::ModelRequest` deleted — the framework auto-discovers every file in `transport/queues/` and calls `.new` with no arguments at startup, which crashed because `ModelRequest` required `request_type:` and `model:`; the queue definition is now an anonymous class created inline by `Actor::ModelWorker#build_queue_class`
+- `Actor::ModelWorker#queue` now returns a CLASS instead of an instance — `Subscription#initialize` calls `queue.new`, so returning an instance caused a silent `NoMethodError` on `NilClass#new`; the anonymous queue class has `queue_name`, `queue_options`, `dlx_enabled`, and `initialize` (exchange bind) defined inline via `define_method`
+
+## [0.3.2] - 2026-04-08
+
+### Changed
+- `Transport::Exchanges::LlmRequest` now inherits `Legion::LLM::Fleet::Exchange` instead of declaring exchange properties independently — prevents silent divergence if the canonical exchange definition changes
+- `Transport::Queues::ModelRequest` switched from durable quorum queue to classic auto-delete with `x-max-priority: 10` — enables `basic.return` feedback when all workers disconnect; added `dlx_enabled: false` to prevent DLX provisioning on ephemeral queues
+- `Transport::Messages::LlmResponse` now inherits `Legion::LLM::Fleet::Response` instead of `Legion::Transport::Message` — gains wire protocol compliance (`type: 'llm.fleet.response'`, `message_context` propagation, default-exchange publishing, `resp_` prefixed message_id); overrides `app_id` to `'lex-ollama'`
+- `Runners::Fleet#handle_request` now accepts and propagates `message_context` verbatim from request to response; rejects `stream: true` requests with `unsupported_streaming` error; builds full wire protocol response envelope (routing, tokens, timestamps, audit, cost, stop)
+- `Runners::Fleet#publish_reply` switched from positional to keyword arguments; uses `fleet_correlation_id` instead of `correlation_id` to avoid collision with Legion task tracking
+- `Runners::Fleet#dispatch` now resolves Ollama host from `Legion::Settings` instead of using hardcoded default
+- `Actor::ModelWorker` now sets `prefetch(1)` for fair consumer dispatch; reads `consumer_priority` from `legion.ollama.fleet.consumer_priority` settings; passes `x-priority` in `subscribe_options`; injects `message_context: {}` default in `process_message`
+
+### Added
+- `Runners::Fleet#publish_error` — publishes `Legion::LLM::Fleet::Error` to caller's reply_to queue on validation failures (e.g., unsupported streaming)
+- `Runners::Fleet#build_response_body` — constructs wire protocol response body with routing, tokens, timestamps, audit, cost, and stop blocks
+
 ## [0.3.1] - 2026-04-08

 ### Added
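The 0.3.3 fix in the changelog above relies on two plain Ruby behaviours: a caller that runs `queue.new` needs a class rather than an instance, and `define_method` blocks close over local variables. The following is a minimal standalone sketch of that pattern. `FakeQueue` and `FakeSubscription` are hypothetical stand-ins, not Legion classes, and the option values are illustrative only.

```ruby
# Hypothetical stand-in for a framework queue base class (not Legion's).
class FakeQueue
  def describe
    "#{queue_name} #{queue_options.inspect}"
  end
end

# Hypothetical stand-in for a subscription that instantiates the queue itself.
class FakeSubscription
  attr_reader :queue_instance

  def initialize(queue_class)
    # The subscriber calls `.new` with no arguments, so it must receive a class.
    @queue_instance = queue_class.new
  end
end

# Build an anonymous subclass whose methods close over a per-worker value.
def build_queue_class(routing_key)
  Class.new(FakeQueue) do
    define_method(:queue_name) { routing_key }
    define_method(:queue_options) { { durable: false, auto_delete: true } }
  end
end

klass = build_queue_class('llm.request.ollama.embed.nomic-embed-text')
sub = FakeSubscription.new(klass)
puts sub.queue_instance.queue_name
```

Because each call to `build_queue_class` returns a fresh class, two workers can hold differently named queues without a file under `transport/queues/` that auto-discovery would try to instantiate.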
data/CLAUDE.md
CHANGED
@@ -12,8 +12,8 @@ reporting, and **fleet queue subscription** for receiving routed LLM requests fr

 **GitHub**: https://github.com/LegionIO/lex-ollama
 **License**: MIT
-**Version**: 0.3.
-**Specs**:
+**Version**: 0.3.2
+**Specs**: 166 examples (17 spec files)

 ---

@@ -38,11 +38,11 @@ Legion::Extensions::Ollama
 ├── Client # Standalone client class (includes all runners, holds @config)
 ├── Transport/ # (loaded only when Legion::Extensions::Core is present)
 │   ├── Exchanges/
-│   │   └── LlmRequest #
+│   │   └── LlmRequest # references Legion::LLM::Fleet::Exchange ('llm.request')
 │   ├── Queues/
-│   │   └── ModelRequest # parametric queue — one per (type, model) pair
+│   │   └── ModelRequest # parametric queue — one per (type, model) pair, auto-delete
 │   └── Messages/
-│       └── LlmResponse #
+│       └── LlmResponse # Legion::LLM::Fleet::Response subclass, reply via default exchange
 └── Actor/
     └── ModelWorker # subscription actor — one per registered model/type
 ```
@@ -80,9 +80,12 @@ llm.request.ollama.generate.llama3.2

 ### Queue Strategy

-Each model+type combination gets its own **
+Each model+type combination gets its own **auto-delete queue** with a routing key that matches
 its queue name exactly. Multiple nodes carrying the same model compete fairly (no SAC) — any
 subscriber can serve. The queue name is identical to the routing key for clarity in the management UI.
+RabbitMQ policies (applied externally via Terraform) set `max-length` and
+`overflow: reject-publish` on `llm.request.*` queues. Queue priority is enabled by declaring
+`x-max-priority: 10` on the queue itself (and may also be mirrored by policy for consistency).

 ### Configuration

@@ -106,20 +109,24 @@ The extension spawns one `Actor::ModelWorker` per subscription entry at boot.
 ### Data Flow

 ```
-Publisher (
+Publisher (legion-llm Fleet::Dispatcher / any fleet node)
 │ routing_key: "llm.request.ollama.embed.nomic-embed-text"
+│ AMQP type: 'llm.fleet.request'
+│ Body includes: message_context { conversation_id, message_id, parent_message_id, message_seq, request_id, exchange_id }
 ▼
 Exchange: llm.request [topic, durable]
 │
-└── Queue: llm.request.ollama.embed.nomic-embed-text [
+└── Queue: llm.request.ollama.embed.nomic-embed-text [auto-delete]
 ▼
 Actor::ModelWorker (type=embed, model=nomic-embed-text)
 ▼
 Runners::Fleet#handle_request
+│ copies message_context from request
 ▼
 Ollama::Client#embed(model: 'nomic-embed-text', ...)
 ▼
-
+Fleet::Response (type: 'llm.fleet.response') → reply_to queue
+Body includes: message_context (copied), response_message_id
 ```

 ### Standalone Mode (no Legion runtime)
@@ -152,6 +159,34 @@ The gem still works as a pure HTTP client library without AMQP, exactly as befor

 ---

+## Wire Protocol & Message Classes
+
+Fleet messages inherit from `Legion::LLM::Transport::Message` (defined in legion-llm), which
+extends `Legion::Transport::Message` with `message_context` propagation and LLM-specific headers.
+
+```
+Legion::Transport::Message (platform base)
+└── Legion::LLM::Transport::Message (LLM base — message_context, llm_headers)
+    ├── Legion::LLM::Fleet::Request (type: 'llm.fleet.request', app_id: 'legion-llm')
+    ├── Legion::LLM::Fleet::Response (type: 'llm.fleet.response', app_id: 'lex-ollama')
+    └── Legion::LLM::Fleet::Error (type: 'llm.fleet.error', app_id: 'lex-ollama')
+```
+
+Every fleet message carries `message_context` in the body for end-to-end tracing:
+```
+message_context:
+  conversation_id, message_id, parent_message_id, message_seq, request_id, exchange_id
+```
+
+A subset (`conversation_id`, `message_id`, `request_id`) is promoted to AMQP headers
+(`x-legion-llm-conversation-id`, etc.) for filtering without body parsing.
+
+The wire protocol spec (AMQP property mapping, platform-wide standard, per-message-type
+specifications) was developed during the fleet design phase and is maintained in the
+legion-llm repository alongside the implementation.
+
+---
+
 ## Dependencies

 | Gem | Purpose |
@@ -159,8 +194,9 @@ The gem still works as a pure HTTP client library without AMQP, exactly as befor
 | `faraday` >= 2.0 | HTTP client for Ollama REST API |
 | `lex-s3` >= 0.2 | S3 model distribution operations |

-Fleet transport requires Legion runtime gems (`legion-transport`, `LegionIO`) but
-gemspec dependencies — they are expected to be present in the runtime environment.
+Fleet transport requires Legion runtime gems (`legion-transport`, `legion-llm`, `LegionIO`) but
+those are *not* gemspec dependencies — they are expected to be present in the runtime environment.
+`legion-llm` is needed for fleet message classes (`Legion::LLM::Fleet::Request`, etc.).

 ---

@@ -175,4 +211,4 @@ bundle exec rubocop
 ---

 **Maintained By**: Matthew Iverson (@Esity)
-**Last Updated**: 2026-04-
+**Last Updated**: 2026-04-10
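The CLAUDE.md changes describe two behaviours that are easier to see in code: `message_context` is copied verbatim from request to response, and a subset of its keys is promoted to AMQP headers. The sketch below illustrates both under stated assumptions. Only `x-legion-llm-conversation-id` is spelled out in the source, so the naming rule for the other two headers is a guess. `promoted_headers` and `build_reply` are hypothetical helpers, and the reply hashes are illustrative shapes, not the legion-llm envelope.

```ruby
# Keys the CLAUDE.md text says are promoted to AMQP headers.
PROMOTED_KEYS = %i[conversation_id message_id request_id].freeze

# Assumed naming rule: "x-legion-llm-" plus the key with underscores turned
# into dashes. Only x-legion-llm-conversation-id appears in the source.
def promoted_headers(message_context)
  PROMOTED_KEYS.each_with_object({}) do |key, headers|
    value = message_context[key]
    next if value.nil?

    headers["x-legion-llm-#{key.to_s.tr('_', '-')}"] = value
  end
end

# Hypothetical reply builder: passes message_context through untouched and
# rejects streaming requests, as the 0.3.2 changelog describes.
def build_reply(request)
  context = request.fetch(:message_context, {})
  if request[:stream]
    return { type: 'llm.fleet.error', error: 'unsupported_streaming', message_context: context }
  end

  { type: 'llm.fleet.response', message_context: context, headers: promoted_headers(context) }
end

request = { message_context: { conversation_id: 'c1', message_id: 'm1', message_seq: 3 } }
p build_reply(request)
p build_reply(request.merge(stream: true))
```

Keys that are absent from `message_context` simply produce no header, which keeps the header set a strict subset of the body.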
data/README.md
CHANGED
@@ -44,6 +44,36 @@ gem install lex-ollama
 ### Version
 - `server_version` - Retrieve the Ollama server version (GET /api/version)

+### Fleet Queue Subscription
+- `handle_request` - Dispatch inbound fleet AMQP messages to the appropriate runner (chat/embed/generate)
+
+When `Legion::Extensions::Core` is present, lex-ollama subscribes to model-scoped queues on the
+`llm.request` topic exchange, accepting routed LLM inference work from other Legion fleet members.
+
+Each configured `(type, model)` pair gets its own auto-delete queue with routing key
+`llm.request.ollama.<type>.<model>`. Multiple nodes serving the same model compete fairly
+via RabbitMQ round-robin with consumer priority.
+
+```yaml
+legion:
+  ollama:
+    host: "http://localhost:11434"
+    fleet:
+      consumer_priority: 10 # H100: 10, Mac Studio: 5, MacBook: 1
+    subscriptions:
+      - type: embed
+        model: nomic-embed-text
+      - type: chat
+        model: "qwen3.5:27b"
+```
+
+Fleet messages use the wire protocol defined in `legion-llm`: typed AMQP messages
+(`llm.fleet.request` / `llm.fleet.response` / `llm.fleet.error`) with `message_context`
+propagation for end-to-end tracing.
+
+Without `Legion::Extensions::Core`, the gem works as a pure HTTP client library with no
+AMQP dependency.
+
 ## Standalone Client

 ```ruby
@@ -85,21 +115,21 @@ Pull models from an internal S3 mirror instead of the public Ollama registry:
 client = Legion::Extensions::Ollama::Client.new

 # List available models in S3
-client.list_s3_models(bucket: 'legion', endpoint: 'https://
+client.list_s3_models(bucket: 'legion', endpoint: 'https://s3.example.internal')

 # Import directly to filesystem (works without Ollama running)
 client.import_from_s3(model: 'llama3:latest', bucket: 'legion',
-                      endpoint: 'https://
+                      endpoint: 'https://s3.example.internal')

 # Push through Ollama API (requires Ollama running)
 client.sync_from_s3(model: 'llama3:latest', bucket: 'legion',
-                    endpoint: 'https://
+                    endpoint: 'https://s3.example.internal')

 # Provision fleet with default models
 client.import_default_models(
   default_models: %w[llama3:latest nomic-embed-text:latest],
   bucket: 'legion',
-  endpoint: 'https://
+  endpoint: 'https://s3.example.internal'
 )
 ```

@@ -121,7 +151,7 @@ result[:usage] # => { input_tokens: 1, output_tokens: 5, total_duration: ..., .

 ## Version

-0.3.
+0.3.2

 ## License

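The README hunk above says each configured `(type, model)` pair maps to a routing key of the form `llm.request.ollama.<type>.<model>`. As a rough illustration, the sketch below derives those keys from a settings document. The YAML nesting is an assumption (it follows the settings comment in `model_worker.rb`, with `subscriptions` directly under `ollama`), `routing_keys` is a hypothetical helper, and the colon-to-dot substitution mirrors `build_queue_class` in the same file.

```ruby
require 'yaml'

# Assumed settings layout, with subscriptions nested directly under ollama.
CONFIG = <<~YAML
  legion:
    ollama:
      fleet:
        consumer_priority: 10
      subscriptions:
        - type: embed
          model: nomic-embed-text
        - type: chat
          model: "qwen3.5:27b"
YAML

# Hypothetical helper: one routing key per configured subscription.
def routing_keys(config_yaml)
  settings = YAML.safe_load(config_yaml)
  subscriptions = settings.dig('legion', 'ollama', 'subscriptions') || []
  subscriptions.map do |sub|
    # Model tags use ':'; the worker replaces it with '.' in the key.
    "llm.request.ollama.#{sub['type']}.#{sub['model'].tr(':', '.')}"
  end
end

puts routing_keys(CONFIG)
```

The queue name and routing key are the same string, so this list is also the set of queues a node with this configuration would declare.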
data/lib/legion/extensions/ollama/actors/model_sync.rb
ADDED
@@ -0,0 +1,90 @@
+# frozen_string_literal: true
+
+module Legion
+  module Extensions
+    module Ollama
+      module Actor
+        # Once actor — runs once shortly after extension load.
+        # Reads legion.ollama.s3 and legion.ollama.default_models from settings
+        # and calls import_from_s3 for any model not already present locally.
+        #
+        # Settings example:
+        #   {
+        #     "legion": {
+        #       "ollama": {
+        #         "s3": {
+        #           "bucket": "legion",
+        #           "prefix": "ollama/models",
+        #           "endpoint": "https://s3.example.internal"
+        #         },
+        #         "default_models": ["qwen3.5:4b", "nomic-embed-text:latest"]
+        #       }
+        #     }
+        #   }
+        class ModelSync < Legion::Extensions::Actors::Once
+          include Legion::Logging::Helper
+
+          # Run 5 seconds after extension load to allow the rest of startup to complete.
+          def delay
+            5.0
+          end
+
+          def use_runner?
+            false
+          end
+
+          def runner_class
+            self.class
+          end
+
+          def enabled?
+            return false unless defined?(Legion::Settings)
+
+            models = Legion::Settings.dig(:ollama, :default_models)
+            s3_cfg = Legion::Settings.dig(:ollama, :s3)
+            models.is_a?(Array) && !models.empty? && s3_cfg.is_a?(Hash) && s3_cfg[:bucket]
+          rescue StandardError => e
+            handle_exception(e, level: :warn, handled: true)
+            false
+          end
+
+          def manual
+            models = Legion::Settings.dig(:ollama, :default_models) || []
+            s3_cfg = Legion::Settings.dig(:ollama, :s3)
+            bucket = s3_cfg[:bucket]
+            s3_opts = s3_cfg.except(:bucket)
+
+            client = Object.new.extend(Legion::Extensions::Ollama::Runners::S3Models)
+            models_path = ENV.fetch('OLLAMA_MODELS', File.join(Dir.home, '.ollama', 'models'))
+
+            models.each do |model|
+              if model_present_locally?(model, models_path)
+                log.debug "[ModelSync] #{model} already present locally, skipping"
+                next
+              end
+
+              log.info "[ModelSync] importing #{model} from S3"
+              result = client.import_from_s3(model: model, bucket: bucket, models_path: models_path, **s3_opts)
+              if result[:status] == 200
+                log.info "[ModelSync] imported #{model} (blobs_downloaded=#{result[:blobs_downloaded]}, blobs_skipped=#{result[:blobs_skipped]})"
+              else
+                log.warn "[ModelSync] failed to import #{model}: #{result.inspect}"
+              end
+            rescue StandardError => e
+              handle_exception(e, level: :error, handled: true, model: model)
+            end
+          end
+
+          private
+
+          def model_present_locally?(model, models_path)
+            name, tag = model.split(':')
+            tag ||= 'latest'
+            manifest = File.join(models_path, 'manifests', 'registry.ollama.ai', 'library', name, tag)
+            File.exist?(manifest)
+          end
+        end
+      end
+    end
+  end
+end
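`ModelSync` decides whether to import a model by checking for its manifest file on disk. The sketch below reproduces that path rule outside the actor so it can be tried against a temporary directory. `manifest_path` is a hypothetical helper split out for illustration, and the `'{}'` file content is a placeholder rather than a real Ollama manifest.

```ruby
require 'fileutils'
require 'tmpdir'

# Same path rule as ModelSync#model_present_locally?: a missing tag
# falls back to 'latest'.
def manifest_path(model, models_path)
  name, tag = model.split(':')
  tag ||= 'latest'
  File.join(models_path, 'manifests', 'registry.ollama.ai', 'library', name, tag)
end

def model_present_locally?(model, models_path)
  File.exist?(manifest_path(model, models_path))
end

Dir.mktmpdir do |dir|
  # Create a placeholder manifest for an untagged model name.
  path = manifest_path('nomic-embed-text', dir)
  FileUtils.mkdir_p(File.dirname(path))
  File.write(path, '{}')

  puts model_present_locally?('nomic-embed-text:latest', dir) # true
  puts model_present_locally?('qwen3.5:4b', dir)              # false
end
```

The check looks only at the manifest, so a model whose manifest exists but whose blobs are missing would still be treated as present under this rule.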
data/lib/legion/extensions/ollama/actors/model_worker.rb
CHANGED
@@ -11,6 +11,8 @@ module Legion
         #
         #   legion:
         #     ollama:
+        #       fleet:
+        #         consumer_priority: 10
         #       subscriptions:
         #         - type: embed
         #           model: nomic-embed-text
@@ -43,34 +45,73 @@ module Legion
             false
           end

-          #
-          #
+          # prefetch(1) is required for consumer priority to work correctly:
+          # without it, a high-priority consumer can hold multiple messages while
+          # lower-priority consumers sit idle. With prefetch=1, each consumer
+          # completes one message before RabbitMQ delivers the next, and priority
+          # determines which idle consumer gets it.
+          def prefetch
+            1
+          end
+
+          # Consumer priority from settings. Tells RabbitMQ to prefer this consumer
+          # over lower-priority ones on the same queue when multiple consumers are idle.
+          # Standard scale: GPU server = 10, Mac Studio = 5, developer laptop = 1.
+          # Defaults to 0 (equal priority) if not configured.
+          def consumer_priority
+            return 0 unless defined?(Legion::Settings)
+
+            Legion::Settings.dig(:ollama, :fleet, :consumer_priority) || 0
+          end
+
+          # Subscribe options include x-priority argument so RabbitMQ can honour
+          # consumer priority when dispatching to competing consumers.
+          def subscribe_options
+            base = begin
+              super
+            rescue NoMethodError
+              {}
+            end
+            base.merge(arguments: { 'x-priority' => consumer_priority })
+          end
+
+          # Returns a queue CLASS (not instance) bound to the llm.request exchange
+          # with the routing key for this worker's (type, model) pair.
+          # The Subscription base class calls queue.new in initialize, so this must
+          # return a class, not an instance.
           def queue
-            @queue ||=
+            @queue ||= build_queue_class
           end

           # Enrich every inbound message with the worker's own request_type and model
-          # so Runners::Fleet#handle_request always has them, even if the sender omitted
+          # so Runners::Fleet#handle_request always has them, even if the sender omitted
+          # them. Also defaults message_context to {} if absent.
           def process_message(payload, metadata, delivery_info)
             msg = super
-            msg[:request_type]
-            msg[:model]
+            msg[:request_type] ||= @request_type
+            msg[:model] ||= @model_name
+            msg[:message_context] ||= {}
             msg
           end

           private

-          def
+          def build_queue_class
             sanitised_model = @model_name.tr(':', '.')
             routing_key = "llm.request.ollama.#{@request_type}.#{sanitised_model}"
+            exchange_class = Transport::Exchanges::LlmRequest

-
-
-
-
-
-
-
+            Class.new(Legion::Transport::Queue) do
+              define_method(:queue_name) { routing_key }
+              define_method(:queue_options) do
+                { durable: false, auto_delete: true, arguments: { 'x-max-priority' => 10 } }
+              end
+              define_method(:dlx_enabled) { false }
+              define_method(:initialize) do
+                super()
+                bind(exchange_class.new, routing_key: routing_key)
+              end
+            end
           end
         end
       end
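The comments in `model_worker.rb` explain why `prefetch(1)` and the `x-priority` consumer argument are used together. The sketch below is a toy model of that interaction, not RabbitMQ's implementation: a consumer counts as eligible while it holds fewer unacknowledged messages than its prefetch limit, and the highest-priority eligible consumer is picked. The plain Hash stands in for `Legion::Settings`, and `Consumer`, `pick_consumer`, and the `manual_ack` option are hypothetical names used only for illustration.

```ruby
# Priority lookup with the same default as the worker: 0 when unset.
def consumer_priority(settings)
  settings.dig(:ollama, :fleet, :consumer_priority) || 0
end

# Add the x-priority consumer argument to whatever options the base supplies.
def subscribe_options(base, settings)
  base.merge(arguments: { 'x-priority' => consumer_priority(settings) })
end

# Toy consumer record: in_flight counts unacknowledged deliveries.
Consumer = Struct.new(:name, :priority, :prefetch, :in_flight)

# Choose the highest-priority consumer that still has prefetch capacity.
def pick_consumer(consumers)
  consumers.select { |c| c.in_flight < c.prefetch }.max_by(&:priority)
end

gpu    = Consumer.new('gpu', 10, 1, 0)
laptop = Consumer.new('laptop', 1, 1, 0)

first = pick_consumer([gpu, laptop])
first.in_flight += 1 # the GPU node is now busy with one message
second = pick_consumer([gpu, laptop])

puts first.name, second.name
p subscribe_options({ manual_ack: true }, { ollama: { fleet: { consumer_priority: 5 } } })
```

With a prefetch of 1 the second message goes to the laptop because the GPU node is still busy; with a larger prefetch the GPU node would remain eligible and take it. Note also that `Hash#merge` is shallow, so an `:arguments` hash already present in the base options would be replaced rather than combined.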