lex-ollama 0.3.2 → 0.3.4

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 8657f3e11e11fcd2ee34e12317bf7698bcfea1e907006c76f9a07326996c7a69
-   data.tar.gz: d1c1bbb05dc6a3a0071b4474a45a074b0d4929cd4b48b7488b5aa10539a9a6ee
+   metadata.gz: 978c53ff8a178c003a5bb593a934536c20b616500d80ea0624f97014f9a88213
+   data.tar.gz: d700e31e6f38fe2b9c6cac3da627d67ce1ccab9a75d3d6d741cdc04f5cc614bf
  SHA512:
-   metadata.gz: e2a8622a2914cdfbc04b365d1ca7a9e8d35b4daa656931fd23a6e010b25b5a8ed6699246bffdbe0bfe064757ad26cad8f3664fe8c1dd2d1c606220dc932af45f
-   data.tar.gz: ea975e9ac1c89621d41c274b6040bb92760672f9d4a2db223ea6c08371f7ed8e8a2dc810417b9b5b2beb9ce30fccb64544f24c11ae815422e5b99f5a43a48517
+   metadata.gz: 6f44dcfc98336bcd0d28e6985ed468f7676b156d5135ff642256120db59563e161d46615b9acab0e3cdac6b578144121d60a23efc17c31f6f6c686349519f076
+   data.tar.gz: b191eacce0844eb0be9b6b4b22f12969007b37f335da700f6ea6bd4936b22fd6aa2eec945dbdfbb419f5b1a9f9f1b0c9e15c004d772e18a1a98c059c133e83e8
data/CHANGELOG.md CHANGED
@@ -1,5 +1,19 @@
  # Changelog

+ ## [0.3.4] - 2026-04-24
+
+ ### Fixed
+ - `Ollama.build_actors` and `Ollama.default_settings` were absent from the installed 0.3.3 gem (gem was packaged before `bafb124` landed) — `Actor::ModelWorker` (requires `request_type:` and `model:` kwargs) was reaching the subscription actor pool with no zero-arg initializer, raising `ArgumentError: missing keywords: :request_type, :model` on every boot when running under the Homebrew legionio install
+
+ ## [0.3.3] - 2026-04-16
+
+ ### Added
+ - `Actor::ModelSync` — once actor; runs 5s after extension load; reads `legion.ollama.default_models` and `legion.ollama.s3` from settings; calls `import_from_s3` for any configured model not already present on disk; no-op if either setting is absent
+
+ ### Fixed
+ - `Transport::Queues::ModelRequest` deleted — the framework auto-discovers every file in `transport/queues/` and calls `.new` with no arguments at startup, which crashed because `ModelRequest` required `request_type:` and `model:`; the queue definition is now an anonymous class created inline by `Actor::ModelWorker#build_queue_class`
+ - `Actor::ModelWorker#queue` now returns a CLASS instead of an instance — `Subscription#initialize` calls `queue.new`, so returning an instance caused a silent `NoMethodError` on `NilClass#new`; the anonymous queue class has `queue_name`, `queue_options`, `dlx_enabled`, and `initialize` (exchange bind) defined inline via `define_method`
+
  ## [0.3.2] - 2026-04-08

  ### Changed
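
The 0.3.3 and 0.3.4 changelog entries both trace back to the same failure mode: a class whose initializer requires keywords gets instantiated with no arguments. The sketch below reproduces that in isolation. `ParametricQueue` and `instantiate_all` are hypothetical stand-ins, not the framework's real queue class or discovery loop.

```ruby
# Hypothetical stand-in for a queue class whose initializer needs keywords.
class ParametricQueue
  def initialize(request_type:, model:)
    @request_type = request_type
    @model = model
  end
end

# Hypothetical loader that mimics zero-argument auto-discovery:
# every discovered class is instantiated with a bare `.new`.
def instantiate_all(classes)
  classes.map do |klass|
    klass.new
  rescue ArgumentError => e
    "#{klass}: #{e.message}"
  end
end

puts instantiate_all([ParametricQueue])
```

Any class reachable by such a loader needs a zero-argument initializer, which is why the fix moves the parameters into closures around dynamically generated classes.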
data/CLAUDE.md CHANGED
@@ -12,8 +12,8 @@ reporting, and **fleet queue subscription** for receiving routed LLM requests fr

  **GitHub**: https://github.com/LegionIO/lex-ollama
  **License**: MIT
- **Version**: 0.3.2
- **Specs**: 82 examples (12 spec files) — fleet additions add ~35 more
+ **Version**: 0.3.3
+ **Specs**: 154 examples (16 spec files)

  ---

@@ -28,7 +28,8 @@ Legion::Extensions::Ollama
  │   │                 # pull_model, push_model, list_running
  │   ├── Embeddings    # embed
  │   ├── Blobs         # check_blob, push_blob
- │   ├── S3Models      # list_s3_models, import_from_s3, sync_from_s3, import_default_models
+ │   ├── S3Models      # list_s3_models, import_from_s3, sync_from_s3, import_default_models,
+ │   │                 # sync_configured_models
  │   ├── Version       # server_version
  │   └── Fleet         # handle_request (fleet dispatcher — chat/embed/generate)
  ├── Helpers/
@@ -44,7 +45,8 @@ Legion::Extensions::Ollama
  │   └── Messages/
  │       └── LlmResponse   # Legion::LLM::Fleet::Response subclass, reply via default exchange
  └── Actor/
-     └── ModelWorker       # subscription actor — one per registered model/type
+     ├── ModelWorker       # subscription actor — one per registered model/type
+     └── ModelSync         # once actor — fires 5s after boot, pulls default models from S3
  ```

  ---
@@ -93,6 +95,15 @@ RabbitMQ policies (applied externally via Terraform) set `max-length` and
  legion:
    ollama:
      host: "http://localhost:11434"
+     s3:
+       bucket: "legion"
+       prefix: "ollama/models"
+       endpoint: "https://s3.example.internal"
+     default_models:
+       - "qwen3.5:4b"
+       - "nomic-embed-text:latest"
+     fleet:
+       consumer_priority: 10 # H100: 10, Mac Studio: 5, MacBook: 1
      subscriptions:
        - type: embed
          model: nomic-embed-text
@@ -104,7 +115,15 @@ legion:
          model: llama3.2
  ```

- The extension spawns one `Actor::ModelWorker` per subscription entry at boot.
+ **`s3` + `default_models`**: `Actor::ModelSync` fires 5 seconds after extension load and calls
+ `Runners::S3Models#sync_configured_models` to import any listed models not already present
+ locally. All download logic lives in the runner; the actor is only the trigger. Uses the
+ inherited `Actors::Base#manual` path (not `Legion::Runner`) so errors surface via
+ `handle_exception` rather than being silently swallowed by `Concurrent::ScheduledTask`.
+
+ **`subscriptions`**: `Ollama.build_actors` replaces the base `ModelWorker` actor entry with one
+ dynamically generated subclass per subscription entry (each with a zero-arg `initialize`).
+ The extension spawns one `Actor::ModelWorker` per entry at boot.

  ### Data Flow

@@ -154,6 +173,12 @@ The gem still works as a pure HTTP client library without AMQP, exactly as befor
  - `request_type: 'generate'` → `Client#generate`.
  - anything else (including `'chat'` or unknown) → `Client#chat`.
  - **`Actor::ModelWorker#use_runner?` is `false`** — bypasses `Legion::Runner` / task DB entirely.
+ - **`Actor::ModelSync#use_runner?` is `false`** — uses inherited `Actors::Base#manual` which calls
+   `runner_class.send(runner_function, **{})` with proper `handle_exception` error handling.
+ - **`Ollama.build_actors`** dynamically generates one `ModelWorker` subclass per subscription
+   entry, each with a zero-arg `initialize` that passes the frozen `request_type` and `model`.
+ - **`Ollama.default_settings`** returns `{ s3: {}, fleet: {} }` so `settings[:s3]` and
+   `settings[:fleet]` are always hashes even without user configuration.
  - **Reply publishing** never raises — errors are swallowed so the AMQP ack is not blocked.
  - **Colon sanitisation** — `qwen3.5:27b` becomes `qwen3.5.27b` in queue/routing-key strings.

@@ -181,8 +206,9 @@ message_context:
  A subset (`conversation_id`, `message_id`, `request_id`) is promoted to AMQP headers
  (`x-legion-llm-conversation-id`, etc.) for filtering without body parsing.

- See: `docs/plans/2026-04-08-fleet-wire-protocol.md` for full AMQP property mapping,
- platform-wide standard, and per-message-type specifications.
+ The wire protocol spec (AMQP property mapping, platform-wide standard, per-message-type
+ specifications) was developed during the fleet design phase and is maintained in the
+ legion-llm repository alongside the implementation.

  ---

@@ -210,4 +236,4 @@ bundle exec rubocop
  ---

  **Maintained By**: Matthew Iverson (@Esity)
- **Last Updated**: 2026-04-08
+ **Last Updated**: 2026-04-17
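
The colon sanitisation rule noted in CLAUDE.md can be checked on its own. This is a minimal sketch of how the queue name and routing key are derived; `fleet_routing_key` is a hypothetical helper name, while the format string and the `tr(':', '.')` call mirror the worker code in this diff.

```ruby
# Hypothetical helper: builds the routing key for a (type, model) pair.
# Colons in the model name are replaced with dots, as in the worker code.
def fleet_routing_key(request_type, model)
  "llm.request.ollama.#{request_type}.#{model.to_s.tr(':', '.')}"
end

puts fleet_routing_key('chat', 'qwen3.5:27b')
puts fleet_routing_key(:embed, 'nomic-embed-text')
```

On an AMQP topic exchange the dot is the word separator, so a sanitised name such as `qwen3.5.27b` spans several routing-key words. That is harmless for exact-match bindings like the ones used here, but a single-word wildcard (`*`) pattern would likely not match such a key.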
data/README.md CHANGED
@@ -40,10 +40,52 @@ gem install lex-ollama
  - `import_from_s3` - Download model from S3 directly to Ollama's filesystem (works before Ollama starts)
  - `sync_from_s3` - Download model from S3, push blobs through Ollama's API, write manifest to filesystem
  - `import_default_models` - Import a list of models from S3 (fleet provisioning)
+ - `sync_configured_models` - Import all `default_models` from S3 that aren't already present locally

  ### Version
  - `server_version` - Retrieve the Ollama server version (GET /api/version)

+ ### Fleet Queue Subscription
+ - `handle_request` - Dispatch inbound fleet AMQP messages to the appropriate runner (chat/embed/generate)
+
+ When `Legion::Extensions::Core` is present, lex-ollama subscribes to model-scoped queues on the
+ `llm.request` topic exchange, accepting routed LLM inference work from other Legion fleet members.
+
+ Each configured `(type, model)` pair gets its own auto-delete queue with routing key
+ `llm.request.ollama.<type>.<model>`. Multiple nodes serving the same model compete fairly
+ via RabbitMQ round-robin with consumer priority.
+
+ ```yaml
+ legion:
+   ollama:
+     host: "http://localhost:11434"
+     s3:
+       bucket: "legion"
+       prefix: "ollama/models"
+       endpoint: "https://s3.example.internal"
+     default_models:
+       - "qwen3.5:4b"
+       - "nomic-embed-text:latest"
+     fleet:
+       consumer_priority: 10 # H100: 10, Mac Studio: 5, MacBook: 1
+     subscriptions:
+       - type: embed
+         model: nomic-embed-text
+       - type: chat
+         model: "qwen3.5:27b"
+ ```
+
+ **Auto-provisioning**: When `s3` and `default_models` are configured, the `ModelSync` actor
+ fires 5 seconds after boot and imports any listed models not already present on disk from the
+ S3 mirror. No manual pull step needed for fleet nodes.
+
+ Fleet messages use the wire protocol defined in `legion-llm`: typed AMQP messages
+ (`llm.fleet.request` / `llm.fleet.response` / `llm.fleet.error`) with `message_context`
+ propagation for end-to-end tracing.
+
+ Without `Legion::Extensions::Core`, the gem works as a pure HTTP client library with no
+ AMQP dependency.
+
  ## Standalone Client

  ```ruby
@@ -85,21 +127,21 @@ Pull models from an internal S3 mirror instead of the public Ollama registry:
  client = Legion::Extensions::Ollama::Client.new

  # List available models in S3
- client.list_s3_models(bucket: 'legion', endpoint: 'https://mesh.s3api-core.optum.com')
+ client.list_s3_models(bucket: 'legion', endpoint: 'https://s3.example.internal')

  # Import directly to filesystem (works without Ollama running)
  client.import_from_s3(model: 'llama3:latest', bucket: 'legion',
-                       endpoint: 'https://mesh.s3api-core.optum.com')
+                       endpoint: 'https://s3.example.internal')

  # Push through Ollama API (requires Ollama running)
  client.sync_from_s3(model: 'llama3:latest', bucket: 'legion',
-                     endpoint: 'https://mesh.s3api-core.optum.com')
+                     endpoint: 'https://s3.example.internal')

  # Provision fleet with default models
  client.import_default_models(
    default_models: %w[llama3:latest nomic-embed-text:latest],
    bucket: 'legion',
-   endpoint: 'https://mesh.s3api-core.optum.com'
+   endpoint: 'https://s3.example.internal'
  )
  ```

@@ -121,7 +163,7 @@ result[:usage] # => { input_tokens: 1, output_tokens: 5, total_duration: ..., .

  ## Version

- 0.3.1
+ 0.3.3

  ## License

@@ -0,0 +1,49 @@
+ # frozen_string_literal: true
+
+ module Legion
+   module Extensions
+     module Ollama
+       module Actor
+         # Once actor — fires 5s after extension load and calls
+         # Runners::S3Models#sync_configured_models to pull any configured
+         # default models from S3 that are not already present locally.
+         #
+         # All download logic lives in the runner. This actor is only the trigger.
+         class ModelSync < Legion::Extensions::Actors::Once
+           def delay
+             5.0
+           end
+
+           def runner_class
+             Legion::Extensions::Ollama::Runners::S3Models
+           end
+
+           def runner_function
+             'sync_configured_models'
+           end
+
+           def use_runner?
+             false
+           end
+
+           def check_subtask?
+             false
+           end
+
+           def generate_task?
+             false
+           end
+
+           def enabled?
+             s3_cfg = settings[:s3]
+             models = settings[:default_models]
+             s3_cfg.is_a?(Hash) && !s3_cfg[:bucket].nil? && models.is_a?(Array) && !models.empty?
+           rescue StandardError => e
+             handle_exception(e, level: :warn, handled: true)
+             false
+           end
+         end
+       end
+     end
+   end
+ end
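
The `enabled?` guard above decides whether the sync actor runs at all. As a rough illustration, the same predicate can be exercised against plain hashes; `model_sync_enabled?` is a hypothetical free-standing function, and the sample settings are assumptions modelled on the YAML shown in the README hunk.

```ruby
# Hypothetical free-standing version of the guard: true only when an S3
# bucket is configured and at least one default model is listed.
def model_sync_enabled?(settings)
  s3_cfg = settings[:s3]
  models = settings[:default_models]
  s3_cfg.is_a?(Hash) && !s3_cfg[:bucket].nil? && models.is_a?(Array) && !models.empty?
end

defaults_only = { s3: {}, fleet: {} } # shape returned by Ollama.default_settings
configured    = { s3: { bucket: 'legion' }, default_models: ['qwen3.5:4b'] }

puts model_sync_enabled?(defaults_only)
puts model_sync_enabled?(configured)
```

With only the default settings the empty `s3` hash has no `:bucket`, so the actor stays disabled on nodes that never configure an S3 mirror.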
@@ -59,9 +59,7 @@ module Legion
            # Standard scale: GPU server = 10, Mac Studio = 5, developer laptop = 1.
            # Defaults to 0 (equal priority) if not configured.
            def consumer_priority
-             return 0 unless defined?(Legion::Settings)
-
-             Legion::Settings.dig(:ollama, :fleet, :consumer_priority) || 0
+             settings.dig(:fleet, :consumer_priority) || 0
            end

            # Subscribe options include x-priority argument so RabbitMQ can honour
@@ -75,10 +73,12 @@ module Legion
              base.merge(arguments: { 'x-priority' => consumer_priority })
            end

-           # Override queue to return a model-scoped queue bound with the precise
-           # routing key for this worker's (type, model) pair.
+           # Returns a queue CLASS (not instance) bound to the llm.request exchange
+           # with the routing key for this worker's (type, model) pair.
+           # The Subscription base class calls queue.new in initialize, so this must
+           # return a class, not an instance.
            def queue
-             @queue ||= build_and_bind_queue
+             @queue ||= build_queue_class
            end

            # Enrich every inbound message with the worker's own request_type and model
@@ -94,17 +94,22 @@ module Legion

            private

-           def build_and_bind_queue
+           def build_queue_class
              sanitised_model = @model_name.tr(':', '.')
              routing_key = "llm.request.ollama.#{@request_type}.#{sanitised_model}"
+             exchange_class = Transport::Exchanges::LlmRequest

-             queue_obj = Transport::Queues::ModelRequest.new(
-               request_type: @request_type,
-               model: @model_name
-             )
-             exchange_obj = Transport::Exchanges::LlmRequest.new
-             queue_obj.bind(exchange_obj, routing_key: routing_key)
-             queue_obj
+             Class.new(Legion::Transport::Queue) do
+               define_method(:queue_name) { routing_key }
+               define_method(:queue_options) do
+                 { durable: false, auto_delete: true, arguments: { 'x-max-priority' => 10 } }
+               end
+               define_method(:dlx_enabled) { false }
+               define_method(:initialize) do
+                 super()
+                 bind(exchange_class.new, routing_key: routing_key)
+               end
+             end
            end
          end
        end
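
The switch from returning a queue instance to returning an anonymous queue class relies on `define_method` blocks closing over local variables. Below is a self-contained sketch of that pattern. `FakeQueue` and `FakeExchange` are hypothetical stand-ins for `Legion::Transport::Queue` and the exchange class, whose real behaviour is not shown in this diff.

```ruby
# Hypothetical stand-in for the exchange class.
class FakeExchange; end

# Hypothetical stand-in for the queue base class; it only records bindings.
class FakeQueue
  attr_reader :bindings

  def initialize
    @bindings = []
  end

  def bind(exchange, routing_key:)
    @bindings << [exchange.class, routing_key]
  end
end

# Returns a CLASS whose zero-arg initializer binds itself to the exchange.
# routing_key and exchange_class are captured by the define_method closures.
def build_queue_class(request_type, model_name, exchange_class: FakeExchange)
  routing_key = "llm.request.ollama.#{request_type}.#{model_name.tr(':', '.')}"

  Class.new(FakeQueue) do
    define_method(:queue_name) { routing_key }
    define_method(:initialize) do
      super() # explicit parentheses are required inside define_method
      bind(exchange_class.new, routing_key: routing_key)
    end
  end
end

queue_class = build_queue_class('chat', 'qwen3.5:27b')
queue = queue_class.new # the caller can now use a plain zero-arg .new
puts queue.queue_name
```

Because the parameters live in the closure rather than in the initializer signature, a caller that only knows how to invoke `queue.new` still ends up with a queue scoped to one `(type, model)` pair.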
@@ -145,6 +145,29 @@ module Legion
              { result: results, status: 200 }
            end

+           def sync_configured_models(**)
+             s3_cfg = settings[:s3]
+             models = settings[:default_models]
+
+             return { result: false, status: 412, error: 'no s3 config' } unless s3_cfg.is_a?(Hash) && s3_cfg[:bucket]
+             return { result: false, status: 412, error: 'no default_models configured' } unless models.is_a?(Array) && !models.empty?
+
+             bucket = s3_cfg[:bucket]
+             s3_opts = s3_cfg.except(:bucket)
+             models_path = ENV.fetch('OLLAMA_MODELS', File.join(Dir.home, '.ollama', 'models'))
+
+             results = models.filter_map do |model|
+               name, tag = model.split(':')
+               tag ||= 'latest'
+               manifest = File.join(models_path, 'manifests', 'registry.ollama.ai', 'library', name, tag)
+               next if File.exist?(manifest)
+
+               import_from_s3(model: model, bucket: bucket, models_path: models_path, **s3_opts)
+             end
+
+             { result: results, status: 200 }
+           end
+
            private

            def default_models_path
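
The presence check in `sync_configured_models` comes down to whether a manifest file exists for each `name:tag` pair. A minimal sketch of that check, run against a temporary directory, is shown here. `manifest_path` and `missing_models` are hypothetical helper names, and the temporary directory stands in for the real Ollama models path.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: where the manifest for "name:tag" would live.
# A missing tag falls back to 'latest', as in the runner code.
def manifest_path(models_path, model)
  name, tag = model.split(':')
  tag ||= 'latest'
  File.join(models_path, 'manifests', 'registry.ollama.ai', 'library', name, tag)
end

# Hypothetical helper: models whose manifest is not on disk yet.
def missing_models(models_path, models)
  models.reject { |model| File.exist?(manifest_path(models_path, model)) }
end

Dir.mktmpdir do |dir|
  present = manifest_path(dir, 'nomic-embed-text:latest')
  FileUtils.mkdir_p(File.dirname(present))
  FileUtils.touch(present)

  p missing_models(dir, ['nomic-embed-text', 'qwen3.5:4b'])
end
```

Only the models returned by a check like this would be handed to `import_from_s3`, which is what keeps the sync step idempotent across restarts.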
@@ -12,13 +12,8 @@ module Legion
        module Transport
          extend Legion::Extensions::Transport if Legion::Extensions.const_defined?(:Transport, false)

-         # All queue-to-exchange bindings are established dynamically by
-         # Actor::ModelWorker#build_and_bind_queue at subscription time.
-         # This file only needs to declare the exchange so topology/infra mode
-         # can introspect the full routing graph.
-         def self.additional_e_to_q
-           []
-         end
+         # All queue-to-exchange bindings for fleet queues are established dynamically by
+         # Actor::ModelWorker at subscription time via build_queue_class.
        end
      end
    end
@@ -3,7 +3,7 @@
  module Legion
    module Extensions
      module Ollama
-       VERSION = '0.3.2'
+       VERSION = '0.3.4'
      end
    end
  end
@@ -18,16 +18,53 @@ require 'legion/extensions/ollama/client'
  # so the gem still works as a standalone HTTP client without any AMQP runtime.
  if Legion::Extensions.const_defined?(:Core, false)
    require 'legion/extensions/ollama/transport/exchanges/llm_request'
-   require 'legion/extensions/ollama/transport/queues/model_request'
    require 'legion/extensions/ollama/transport/messages/llm_response'
    require 'legion/extensions/ollama/transport'
    require 'legion/extensions/ollama/actors/model_worker'
+   require 'legion/extensions/ollama/actors/model_sync'
  end

  module Legion
    module Extensions
      module Ollama
        extend Legion::Extensions::Core if Legion::Extensions.const_defined?(:Core, false)
+
+       def self.default_settings
+         {
+           s3: {},
+           fleet: {}
+         }
+       end
+
+       # Called by the framework during autobuild. Runs normal actor discovery,
+       # then replaces the single ModelWorker entry with one concrete subclass
+       # per subscription entry in settings (each has a zero-arg initialize).
+       def self.build_actors
+         super
+         @actors.delete(:model_worker)
+
+         subs = settings[:subscriptions]
+         return unless subs.is_a?(Array)
+
+         subs.each do |sub|
+           request_type = sub[:type]&.to_s
+           model = sub[:model]&.to_s
+           next unless request_type && model
+
+           actor_name = :"model_worker_#{request_type}_#{model.tr(':.', '__')}"
+           worker_class = Class.new(Legion::Extensions::Ollama::Actor::ModelWorker) do
+             define_method(:initialize) { super(request_type: request_type, model: model) }
+           end
+
+           @actors[actor_name] = {
+             extension: 'lex-ollama',
+             extension_name: :ollama,
+             actor_name: actor_name,
+             actor_class: worker_class,
+             type: 'literal'
+           }
+         end
+       end
      end
    end
  end
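
`build_actors` turns each subscription entry into a subclass with a zero-argument initializer. The sketch below shows that generation step on its own. `Worker` is a hypothetical stand-in for `Actor::ModelWorker`, and `build_worker_classes` is a hypothetical helper that returns a plain hash instead of writing to the framework's `@actors` registry.

```ruby
# Hypothetical stand-in for Actor::ModelWorker: it requires both keywords.
class Worker
  attr_reader :request_type, :model

  def initialize(request_type:, model:)
    @request_type = request_type
    @model = model
  end
end

# Hypothetical helper: one subclass per valid subscription entry, keyed by
# an actor name derived the same way as in build_actors.
def build_worker_classes(subscriptions)
  subscriptions.each_with_object({}) do |sub, actors|
    request_type = sub[:type]&.to_s
    model = sub[:model]&.to_s
    next unless request_type && model # skip incomplete entries

    actor_name = :"model_worker_#{request_type}_#{model.tr(':.', '__')}"
    actors[actor_name] = Class.new(Worker) do
      define_method(:initialize) { super(request_type: request_type, model: model) }
    end
  end
end

actors = build_worker_classes([{ type: :chat, model: 'qwen3.5:27b' }, { type: 'embed' }])
actors.each { |name, klass| puts "#{name} -> #{klass.new.model}" }
```

The second entry has no `model` and is skipped, so only one class is produced. Each generated class can be instantiated with a bare `.new`, which is the property the 0.3.4 changelog entry says was missing from the packaged 0.3.3 gem.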
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: lex-ollama
  version: !ruby/object:Gem::Version
-   version: 0.3.2
+   version: 0.3.4
  platform: ruby
  authors:
  - Esity
@@ -56,6 +56,7 @@ files:
  - README.md
  - lex-ollama.gemspec
  - lib/legion/extensions/ollama.rb
+ - lib/legion/extensions/ollama/actors/model_sync.rb
  - lib/legion/extensions/ollama/actors/model_worker.rb
  - lib/legion/extensions/ollama/client.rb
  - lib/legion/extensions/ollama/helpers/client.rb
@@ -72,7 +73,6 @@ files:
  - lib/legion/extensions/ollama/transport.rb
  - lib/legion/extensions/ollama/transport/exchanges/llm_request.rb
  - lib/legion/extensions/ollama/transport/messages/llm_response.rb
- - lib/legion/extensions/ollama/transport/queues/model_request.rb
  - lib/legion/extensions/ollama/version.rb
  homepage: https://github.com/LegionIO/lex-ollama
  licenses:
@@ -1,58 +0,0 @@
- # frozen_string_literal: true
-
- module Legion
-   module Extensions
-     module Ollama
-       module Transport
-         module Queues
-           # Parametric queue — one instance per (request_type, model) tuple.
-           #
-           # queue_name mirrors the routing key exactly so bindings are self-documenting
-           # in the RabbitMQ management UI, e.g.:
-           # llm.request.ollama.embed.nomic-embed-text
-           # llm.request.ollama.chat.qwen3.5.27b
-           #
-           # Queue strategy:
-           # - classic (not quorum): quorum queues cannot be auto-delete
-           # - auto_delete: true — queue deletes when last consumer disconnects + queue empties,
-           # enabling basic.return feedback to publishers via mandatory: true
-           # - x-max-priority: 10 — must be a queue argument at declaration time for classic
-           # queues; policies handle max-length and overflow externally
-           class ModelRequest < Legion::Transport::Queue
-             def initialize(request_type:, model:, **)
-               @request_type = request_type.to_s
-               @model = sanitise_model(model)
-               super(**)
-             end
-
-             def queue_name
-               "llm.request.ollama.#{@request_type}.#{@model}"
-             end
-
-             def queue_options
-               {
-                 durable: false,
-                 auto_delete: true,
-                 arguments: { 'x-max-priority' => 10 }
-               }
-             end
-
-             # Disable dead-letter exchange provisioning. The base class
-             # default_options always adds x-dead-letter-exchange when
-             # dlx_enabled returns true. Fleet queues are ephemeral
-             # (auto-delete) and must not provision persistent DLX queues.
-             def dlx_enabled
-               false
-             end
-
-             private
-
-             def sanitise_model(name)
-               name.to_s.tr(':', '.')
-             end
-           end
-         end
-       end
-     end
-   end
- end