lex-ollama 0.3.1 → 0.3.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 28df561b00b58c7cb179b9904aed61a5aa7e278140306dadb3b4b2665eaab824
4
- data.tar.gz: 446afaab9d80e6a4f62286a1f5ccc1c023bdbb178dba043cb96081412991b2d3
3
+ metadata.gz: '086cd0d9744893c480b69c7b9fc6a10229af627344d316cafbccbd72d01cdafe'
4
+ data.tar.gz: e88f7893305401d36d8a3d48a7fb057e831fb6edeb245a5691c3d1e2527e1f1f
5
5
  SHA512:
6
- metadata.gz: 2915cfe6e4e959e61ee5b8ce68e7da784b4c6001cfe0c3acdb0a4e0f804da79a1e46a17b7c5297b9dd4f26e58bbae504066f5874d8cd82d6ea223b3dfc561bbb
7
- data.tar.gz: cb1337292d4bb7c94603612e03dbdbcbd9a41c2a94e56f4bbfad1132d403f6d14b0316091017745ee2d282ccc426bdcdb65b137f66f07f2a505d231792e424b0
6
+ metadata.gz: f6154fee6005bb96262961342983f01c5d4e5f2747c2cd11a33afbebc1f5ffa00ea64e4f39ef63758620524a5c6690b1a81856c1658371acc040fdd2295c8731
7
+ data.tar.gz: 0dda701fb81e1f084bc3cc4404d8014d6e2b6ecc68153fcdb42009c10a337f4a078ff11890a4032d95d516a5c1bc5177f4a8e1492db4caea47e9da1641fe6669
data/CHANGELOG.md CHANGED
@@ -1,5 +1,29 @@
1
1
  # Changelog
2
2
 
3
+ ## [0.3.3] - 2026-04-16
4
+
5
+ ### Added
6
+ - `Actor::ModelSync` — once actor; runs 5s after extension load; reads `legion.ollama.default_models` and `legion.ollama.s3` from settings; calls `import_from_s3` for any configured model not already present on disk; no-op if either setting is absent
7
+
8
+ ### Fixed
9
+ - `Transport::Queues::ModelRequest` deleted — the framework auto-discovers every file in `transport/queues/` and calls `.new` with no arguments at startup, which crashed because `ModelRequest` required `request_type:` and `model:`; the queue definition is now an anonymous class created inline by `Actor::ModelWorker#build_queue_class`
10
+ - `Actor::ModelWorker#queue` now returns a CLASS instead of an instance — `Subscription#initialize` calls `queue.new`, so returning an instance caused a silent `NoMethodError` on `NilClass#new`; the anonymous queue class has `queue_name`, `queue_options`, `dlx_enabled`, and `initialize` (exchange bind) defined inline via `define_method`
11
+
12
+ ## [0.3.2] - 2026-04-08
13
+
14
+ ### Changed
15
+ - `Transport::Exchanges::LlmRequest` now inherits `Legion::LLM::Fleet::Exchange` instead of declaring exchange properties independently — prevents silent divergence if the canonical exchange definition changes
16
+ - `Transport::Queues::ModelRequest` switched from durable quorum queue to classic auto-delete with `x-max-priority: 10` — enables `basic.return` feedback when all workers disconnect; added `dlx_enabled: false` to prevent DLX provisioning on ephemeral queues
17
+ - `Transport::Messages::LlmResponse` now inherits `Legion::LLM::Fleet::Response` instead of `Legion::Transport::Message` — gains wire protocol compliance (`type: 'llm.fleet.response'`, `message_context` propagation, default-exchange publishing, `resp_` prefixed message_id); overrides `app_id` to `'lex-ollama'`
18
+ - `Runners::Fleet#handle_request` now accepts and propagates `message_context` verbatim from request to response; rejects `stream: true` requests with `unsupported_streaming` error; builds full wire protocol response envelope (routing, tokens, timestamps, audit, cost, stop)
19
+ - `Runners::Fleet#publish_reply` switched from positional to keyword arguments; uses `fleet_correlation_id` instead of `correlation_id` to avoid collision with Legion task tracking
20
+ - `Runners::Fleet#dispatch` now resolves Ollama host from `Legion::Settings` instead of using hardcoded default
21
+ - `Actor::ModelWorker` now sets `prefetch(1)` for fair consumer dispatch; reads `consumer_priority` from `legion.ollama.fleet.consumer_priority` settings; passes `x-priority` in `subscribe_options`; injects `message_context: {}` default in `process_message`
22
+
23
+ ### Added
24
+ - `Runners::Fleet#publish_error` — publishes `Legion::LLM::Fleet::Error` to caller's reply_to queue on validation failures (e.g., unsupported streaming)
25
+ - `Runners::Fleet#build_response_body` — constructs wire protocol response body with routing, tokens, timestamps, audit, cost, and stop blocks
26
+
3
27
  ## [0.3.1] - 2026-04-08
4
28
 
5
29
  ### Added
data/CLAUDE.md CHANGED
@@ -12,8 +12,8 @@ reporting, and **fleet queue subscription** for receiving routed LLM requests fr
12
12
 
13
13
  **GitHub**: https://github.com/LegionIO/lex-ollama
14
14
  **License**: MIT
15
- **Version**: 0.3.1
16
- **Specs**: 82 examples (12 spec files) — fleet additions add ~35 more
15
+ **Version**: 0.3.2
16
+ **Specs**: 166 examples (17 spec files)
17
17
 
18
18
  ---
19
19
 
@@ -38,11 +38,11 @@ Legion::Extensions::Ollama
38
38
  ├── Client # Standalone client class (includes all runners, holds @config)
39
39
  ├── Transport/ # (loaded only when Legion::Extensions::Core is present)
40
40
  │ ├── Exchanges/
41
- │ │ └── LlmRequest # topic exchange 'llm.request'
41
+ │ │ └── LlmRequest # references Legion::LLM::Fleet::Exchange ('llm.request')
42
42
  │ ├── Queues/
43
- │ │ └── ModelRequest # parametric queue — one per (type, model) pair
43
+ │ │ └── ModelRequest # parametric queue — one per (type, model) pair, auto-delete
44
44
  │ └── Messages/
45
- │ └── LlmResponse # reply message published back to reply_to
45
+ │ └── LlmResponse # Legion::LLM::Fleet::Response subclass, reply via default exchange
46
46
  └── Actor/
47
47
  └── ModelWorker # subscription actor — one per registered model/type
48
48
  ```
@@ -80,9 +80,12 @@ llm.request.ollama.generate.llama3.2
80
80
 
81
81
  ### Queue Strategy
82
82
 
83
- Each model+type combination gets its own **durable quorum queue** with a routing key that matches
83
+ Each model+type combination gets its own **auto-delete queue** with a routing key that matches
84
84
  its queue name exactly. Multiple nodes carrying the same model compete fairly (no SAC) — any
85
85
  subscriber can serve. The queue name is identical to the routing key for clarity in the management UI.
86
+ RabbitMQ policies (applied externally via Terraform) set `max-length` and
87
+ `overflow: reject-publish` on `llm.request.*` queues. Queue priority is enabled by declaring
88
+ `x-max-priority: 10` on the queue itself (and may also be mirrored by policy for consistency).
86
89
 
87
90
  ### Configuration
88
91
 
@@ -106,20 +109,24 @@ The extension spawns one `Actor::ModelWorker` per subscription entry at boot.
106
109
  ### Data Flow
107
110
 
108
111
  ```
109
- Publisher (lex-llm-gateway / any fleet node)
112
+ Publisher (legion-llm Fleet::Dispatcher / any fleet node)
110
113
  │ routing_key: "llm.request.ollama.embed.nomic-embed-text"
114
+ │ AMQP type: 'llm.fleet.request'
115
+ │ Body includes: message_context { conversation_id, message_id, parent_message_id, message_seq, request_id, exchange_id }
111
116
 
112
117
  Exchange: llm.request [topic, durable]
113
118
 
114
- └── Queue: llm.request.ollama.embed.nomic-embed-text [quorum]
119
+ └── Queue: llm.request.ollama.embed.nomic-embed-text [auto-delete]
115
120
 
116
121
  Actor::ModelWorker (type=embed, model=nomic-embed-text)
117
122
 
118
123
  Runners::Fleet#handle_request
124
+ │ copies message_context from request
119
125
 
120
126
  Ollama::Client#embed(model: 'nomic-embed-text', ...)
121
127
 
122
- Transport::Messages::LlmResponse → reply_to queue (if present)
128
+ Fleet::Response (type: 'llm.fleet.response') → reply_to queue
129
+ Body includes: message_context (copied), response_message_id
123
130
  ```
124
131
 
125
132
  ### Standalone Mode (no Legion runtime)
@@ -152,6 +159,34 @@ The gem still works as a pure HTTP client library without AMQP, exactly as befor
152
159
 
153
160
  ---
154
161
 
162
+ ## Wire Protocol & Message Classes
163
+
164
+ Fleet messages inherit from `Legion::LLM::Transport::Message` (defined in legion-llm), which
165
+ extends `Legion::Transport::Message` with `message_context` propagation and LLM-specific headers.
166
+
167
+ ```
168
+ Legion::Transport::Message (platform base)
169
+ └── Legion::LLM::Transport::Message (LLM base — message_context, llm_headers)
170
+ ├── Legion::LLM::Fleet::Request (type: 'llm.fleet.request', app_id: 'legion-llm')
171
+ ├── Legion::LLM::Fleet::Response (type: 'llm.fleet.response', app_id: 'lex-ollama')
172
+ └── Legion::LLM::Fleet::Error (type: 'llm.fleet.error', app_id: 'lex-ollama')
173
+ ```
174
+
175
+ Every fleet message carries `message_context` in the body for end-to-end tracing:
176
+ ```
177
+ message_context:
178
+ conversation_id, message_id, parent_message_id, message_seq, request_id, exchange_id
179
+ ```
180
+
181
+ A subset (`conversation_id`, `message_id`, `request_id`) is promoted to AMQP headers
182
+ (`x-legion-llm-conversation-id`, etc.) for filtering without body parsing.
183
+
184
+ The wire protocol spec (AMQP property mapping, platform-wide standard, per-message-type
185
+ specifications) was developed during the fleet design phase and is maintained in the
186
+ legion-llm repository alongside the implementation.
187
+
188
+ ---
189
+
155
190
  ## Dependencies
156
191
 
157
192
  | Gem | Purpose |
@@ -159,8 +194,9 @@ The gem still works as a pure HTTP client library without AMQP, exactly as befor
159
194
  | `faraday` >= 2.0 | HTTP client for Ollama REST API |
160
195
  | `lex-s3` >= 0.2 | S3 model distribution operations |
161
196
 
162
- Fleet transport requires Legion runtime gems (`legion-transport`, `LegionIO`) but those are *not*
163
- gemspec dependencies — they are expected to be present in the runtime environment.
197
+ Fleet transport requires Legion runtime gems (`legion-transport`, `legion-llm`, `LegionIO`) but
198
+ those are *not* gemspec dependencies — they are expected to be present in the runtime environment.
199
+ `legion-llm` is needed for fleet message classes (`Legion::LLM::Fleet::Request`, etc.).
164
200
 
165
201
  ---
166
202
 
@@ -175,4 +211,4 @@ bundle exec rubocop
175
211
  ---
176
212
 
177
213
  **Maintained By**: Matthew Iverson (@Esity)
178
- **Last Updated**: 2026-04-07
214
+ **Last Updated**: 2026-04-10
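
The header promotion described in the "Wire Protocol & Message Classes" section of CLAUDE.md can be sketched roughly as follows. This is an illustration only: `promote_headers` is a hypothetical helper, not part of lex-ollama or legion-llm, and only `x-legion-llm-conversation-id` is named in the source. The other two header names are assumed to follow the same pattern.

```ruby
# Hypothetical helper (not from legion-llm): copies the promoted subset of
# message_context into AMQP-style headers so consumers can filter without
# parsing the body.
PROMOTED_KEYS = %i[conversation_id message_id request_id].freeze

def promote_headers(message_context)
  PROMOTED_KEYS.each_with_object({}) do |key, headers|
    value = message_context[key]
    next if value.nil?

    # Assumed naming scheme: x-legion-llm-<key with underscores as dashes>
    headers["x-legion-llm-#{key.to_s.tr('_', '-')}"] = value.to_s
  end
end

ctx = { conversation_id: 'c-1', message_id: 'm-7', message_seq: 3 }
headers = promote_headers(ctx)
```

In this sketch, keys outside the promoted subset (such as `message_seq`) stay in the body only, and missing keys produce no header.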
data/README.md CHANGED
@@ -44,6 +44,36 @@ gem install lex-ollama
44
44
  ### Version
45
45
  - `server_version` - Retrieve the Ollama server version (GET /api/version)
46
46
 
47
+ ### Fleet Queue Subscription
48
+ - `handle_request` - Dispatch inbound fleet AMQP messages to the appropriate runner (chat/embed/generate)
49
+
50
+ When `Legion::Extensions::Core` is present, lex-ollama subscribes to model-scoped queues on the
51
+ `llm.request` topic exchange, accepting routed LLM inference work from other Legion fleet members.
52
+
53
+ Each configured `(type, model)` pair gets its own auto-delete queue with routing key
54
+ `llm.request.ollama.<type>.<model>`. Multiple nodes serving the same model compete fairly
55
+ via RabbitMQ round-robin with consumer priority.
56
+
57
+ ```yaml
58
+ legion:
59
+ ollama:
60
+ host: "http://localhost:11434"
61
+ fleet:
62
+ consumer_priority: 10 # H100: 10, Mac Studio: 5, MacBook: 1
63
+ subscriptions:
64
+ - type: embed
65
+ model: nomic-embed-text
66
+ - type: chat
67
+ model: "qwen3.5:27b"
68
+ ```
69
+
70
+ Fleet messages use the wire protocol defined in `legion-llm`: typed AMQP messages
71
+ (`llm.fleet.request` / `llm.fleet.response` / `llm.fleet.error`) with `message_context`
72
+ propagation for end-to-end tracing.
73
+
74
+ Without `Legion::Extensions::Core`, the gem works as a pure HTTP client library with no
75
+ AMQP dependency.
76
+
47
77
  ## Standalone Client
48
78
 
49
79
  ```ruby
@@ -85,21 +115,21 @@ Pull models from an internal S3 mirror instead of the public Ollama registry:
85
115
  client = Legion::Extensions::Ollama::Client.new
86
116
 
87
117
  # List available models in S3
88
- client.list_s3_models(bucket: 'legion', endpoint: 'https://mesh.s3api-core.optum.com')
118
+ client.list_s3_models(bucket: 'legion', endpoint: 'https://s3.example.internal')
89
119
 
90
120
  # Import directly to filesystem (works without Ollama running)
91
121
  client.import_from_s3(model: 'llama3:latest', bucket: 'legion',
92
- endpoint: 'https://mesh.s3api-core.optum.com')
122
+ endpoint: 'https://s3.example.internal')
93
123
 
94
124
  # Push through Ollama API (requires Ollama running)
95
125
  client.sync_from_s3(model: 'llama3:latest', bucket: 'legion',
96
- endpoint: 'https://mesh.s3api-core.optum.com')
126
+ endpoint: 'https://s3.example.internal')
97
127
 
98
128
  # Provision fleet with default models
99
129
  client.import_default_models(
100
130
  default_models: %w[llama3:latest nomic-embed-text:latest],
101
131
  bucket: 'legion',
102
- endpoint: 'https://mesh.s3api-core.optum.com'
132
+ endpoint: 'https://s3.example.internal'
103
133
  )
104
134
  ```
105
135
 
@@ -121,7 +151,7 @@ result[:usage] # => { input_tokens: 1, output_tokens: 5, total_duration: ..., .
121
151
 
122
152
  ## Version
123
153
 
124
- 0.3.1
154
+ 0.3.2
125
155
 
126
156
  ## License
127
157
 
@@ -0,0 +1,90 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Legion
4
+ module Extensions
5
+ module Ollama
6
+ module Actor
7
+ # Once actor — runs once shortly after extension load.
8
+ # Reads legion.ollama.s3 and legion.ollama.default_models from settings
9
+ # and calls import_from_s3 for any model not already present locally.
10
+ #
11
+ # Settings example:
12
+ # {
13
+ # "legion": {
14
+ # "ollama": {
15
+ # "s3": {
16
+ # "bucket": "legion",
17
+ # "prefix": "ollama/models",
18
+ # "endpoint": "https://s3.example.internal"
19
+ # },
20
+ # "default_models": ["qwen3.5:4b", "nomic-embed-text:latest"]
21
+ # }
22
+ # }
23
+ # }
24
+ class ModelSync < Legion::Extensions::Actors::Once
25
+ include Legion::Logging::Helper
26
+
27
+ # Run 5 seconds after extension load to allow the rest of startup to complete.
28
+ def delay
29
+ 5.0
30
+ end
31
+
32
+ def use_runner?
33
+ false
34
+ end
35
+
36
+ def runner_class
37
+ self.class
38
+ end
39
+
40
+ def enabled?
41
+ return false unless defined?(Legion::Settings)
42
+
43
+ models = Legion::Settings.dig(:ollama, :default_models)
44
+ s3_cfg = Legion::Settings.dig(:ollama, :s3)
45
+ models.is_a?(Array) && !models.empty? && s3_cfg.is_a?(Hash) && s3_cfg[:bucket]
46
+ rescue StandardError => e
47
+ handle_exception(e, level: :warn, handled: true)
48
+ false
49
+ end
50
+
51
+ def manual
52
+ models = Legion::Settings.dig(:ollama, :default_models) || []
53
+ s3_cfg = Legion::Settings.dig(:ollama, :s3)
54
+ bucket = s3_cfg[:bucket]
55
+ s3_opts = s3_cfg.except(:bucket)
56
+
57
+ client = Object.new.extend(Legion::Extensions::Ollama::Runners::S3Models)
58
+ models_path = ENV.fetch('OLLAMA_MODELS', File.join(Dir.home, '.ollama', 'models'))
59
+
60
+ models.each do |model|
61
+ if model_present_locally?(model, models_path)
62
+ log.debug "[ModelSync] #{model} already present locally, skipping"
63
+ next
64
+ end
65
+
66
+ log.info "[ModelSync] importing #{model} from S3"
67
+ result = client.import_from_s3(model: model, bucket: bucket, models_path: models_path, **s3_opts)
68
+ if result[:status] == 200
69
+ log.info "[ModelSync] imported #{model} (blobs_downloaded=#{result[:blobs_downloaded]}, blobs_skipped=#{result[:blobs_skipped]})"
70
+ else
71
+ log.warn "[ModelSync] failed to import #{model}: #{result.inspect}"
72
+ end
73
+ rescue StandardError => e
74
+ handle_exception(e, level: :error, handled: true, model: model)
75
+ end
76
+ end
77
+
78
+ private
79
+
80
+ def model_present_locally?(model, models_path)
81
+ name, tag = model.split(':')
82
+ tag ||= 'latest'
83
+ manifest = File.join(models_path, 'manifests', 'registry.ollama.ai', 'library', name, tag)
84
+ File.exist?(manifest)
85
+ end
86
+ end
87
+ end
88
+ end
89
+ end
90
+ end
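
The manifest check used by `ModelSync` can be tried in isolation. The sketch below copies `model_present_locally?` from the file above and exercises it against a temporary directory. `demo_manifest_check` is a hypothetical helper added only for this illustration.

```ruby
require 'tmpdir'
require 'fileutils'

# Stand-alone copy of the check from Actor::ModelSync.
def model_present_locally?(model, models_path)
  name, tag = model.split(':')
  tag ||= 'latest'
  manifest = File.join(models_path, 'manifests', 'registry.ollama.ai', 'library', name, tag)
  File.exist?(manifest)
end

# Hypothetical demo: builds a fake models directory with one manifest file
# and reports what the check returns for three model names.
def demo_manifest_check
  Dir.mktmpdir do |dir|
    manifest_dir = File.join(dir, 'manifests', 'registry.ollama.ai', 'library', 'nomic-embed-text')
    FileUtils.mkdir_p(manifest_dir)
    FileUtils.touch(File.join(manifest_dir, 'latest'))

    [
      model_present_locally?('nomic-embed-text', dir),        # tag defaults to 'latest'
      model_present_locally?('nomic-embed-text:latest', dir), # explicit tag
      model_present_locally?('qwen3.5:4b', dir)               # no manifest on disk
    ]
  end
end
```

Because the path hard-codes `registry.ollama.ai/library`, the check only recognises models stored under that namespace, and it tests for the manifest file alone rather than verifying that the referenced blobs exist.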
@@ -11,6 +11,8 @@ module Legion
11
11
  #
12
12
  # legion:
13
13
  # ollama:
14
+ # fleet:
15
+ # consumer_priority: 10
14
16
  # subscriptions:
15
17
  # - type: embed
16
18
  # model: nomic-embed-text
@@ -43,34 +45,73 @@ module Legion
43
45
  false
44
46
  end
45
47
 
46
- # Override queue to return a model-scoped queue bound with the precise
47
- # routing key for this worker's (type, model) pair.
48
+ # prefetch(1) is required for consumer priority to work correctly:
49
+ # without it, a high-priority consumer can hold multiple messages while
50
+ # lower-priority consumers sit idle. With prefetch=1, each consumer
51
+ # completes one message before RabbitMQ delivers the next, and priority
52
+ # determines which idle consumer gets it.
53
+ def prefetch
54
+ 1
55
+ end
56
+
57
+ # Consumer priority from settings. Tells RabbitMQ to prefer this consumer
58
+ # over lower-priority ones on the same queue when multiple consumers are idle.
59
+ # Standard scale: GPU server = 10, Mac Studio = 5, developer laptop = 1.
60
+ # Defaults to 0 (equal priority) if not configured.
61
+ def consumer_priority
62
+ return 0 unless defined?(Legion::Settings)
63
+
64
+ Legion::Settings.dig(:ollama, :fleet, :consumer_priority) || 0
65
+ end
66
+
67
+ # Subscribe options include x-priority argument so RabbitMQ can honour
68
+ # consumer priority when dispatching to competing consumers.
69
+ def subscribe_options
70
+ base = begin
71
+ super
72
+ rescue NoMethodError
73
+ {}
74
+ end
75
+ base.merge(arguments: { 'x-priority' => consumer_priority })
76
+ end
77
+
78
+ # Returns a queue CLASS (not instance) bound to the llm.request exchange
79
+ # with the routing key for this worker's (type, model) pair.
80
+ # The Subscription base class calls queue.new in initialize, so this must
81
+ # return a class, not an instance.
48
82
  def queue
49
- @queue ||= build_and_bind_queue
83
+ @queue ||= build_queue_class
50
84
  end
51
85
 
52
86
  # Enrich every inbound message with the worker's own request_type and model
53
- # so Runners::Fleet#handle_request always has them, even if the sender omitted them.
87
+ # so Runners::Fleet#handle_request always has them, even if the sender omitted
88
+ # them. Also defaults message_context to {} if absent.
54
89
  def process_message(payload, metadata, delivery_info)
55
90
  msg = super
56
- msg[:request_type] ||= @request_type
57
- msg[:model] ||= @model_name
91
+ msg[:request_type] ||= @request_type
92
+ msg[:model] ||= @model_name
93
+ msg[:message_context] ||= {}
58
94
  msg
59
95
  end
60
96
 
61
97
  private
62
98
 
63
- def build_and_bind_queue
99
+ def build_queue_class
64
100
  sanitised_model = @model_name.tr(':', '.')
65
101
  routing_key = "llm.request.ollama.#{@request_type}.#{sanitised_model}"
102
+ exchange_class = Transport::Exchanges::LlmRequest
66
103
 
67
- queue_obj = Transport::Queues::ModelRequest.new(
68
- request_type: @request_type,
69
- model: @model_name
70
- )
71
- exchange_obj = Transport::Exchanges::LlmRequest.new
72
- queue_obj.bind(exchange_obj, routing_key: routing_key)
73
- queue_obj
104
+ Class.new(Legion::Transport::Queue) do
105
+ define_method(:queue_name) { routing_key }
106
+ define_method(:queue_options) do
107
+ { durable: false, auto_delete: true, arguments: { 'x-max-priority' => 10 } }
108
+ end
109
+ define_method(:dlx_enabled) { false }
110
+ define_method(:initialize) do
111
+ super()
112
+ bind(exchange_class.new, routing_key: routing_key)
113
+ end
114
+ end
74
115
  end
75
116
  end
76
117
  end
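
The class-versus-instance fix in `build_queue_class` relies on `Class.new` plus `define_method`, whose blocks close over local variables such as `routing_key`. A minimal sketch of that pattern, assuming a stand-in base class (`FakeQueue` is hypothetical, since the real `Legion::Transport::Queue` is not shown in this diff) and a symbol in place of the exchange object:

```ruby
# Stand-in for Legion::Transport::Queue; records bindings instead of
# talking to RabbitMQ.
class FakeQueue
  attr_reader :bindings

  def initialize
    @bindings = []
  end

  def bind(exchange, routing_key:)
    @bindings << [exchange, routing_key]
  end
end

def build_queue_class(request_type, model_name)
  routing_key = "llm.request.ollama.#{request_type}.#{model_name.tr(':', '.')}"

  # The blocks below capture routing_key, so the returned class needs no
  # constructor arguments and can be instantiated with a bare .new.
  Class.new(FakeQueue) do
    define_method(:queue_name) { routing_key }
    define_method(:initialize) do
      super() # explicit parentheses are required inside define_method
      bind(:llm_request_exchange, routing_key: routing_key)
    end
  end
end

queue_class = build_queue_class('chat', 'qwen3.5:27b')
queue = queue_class.new
```

Returning the class rather than an instance lets the caller decide when to call `.new`, which is what the comment on `queue` says the `Subscription` base class expects.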