lex-ollama 0.3.0 → 0.3.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 7477574f919b18b85c79afba3a1f65c8540d9eff9ca02b9e0c807b3740fed452
-   data.tar.gz: a5c69878c8518caf02c2e238c94243fd49c320f20bfaede00252bfdc87be5cbb
+   metadata.gz: 8657f3e11e11fcd2ee34e12317bf7698bcfea1e907006c76f9a07326996c7a69
+   data.tar.gz: d1c1bbb05dc6a3a0071b4474a45a074b0d4929cd4b48b7488b5aa10539a9a6ee
  SHA512:
-   metadata.gz: 31566bf77244dd3cfc097531a3af1da186e8d0e7e0ec675be0b7471f8b7654649fa666d4c3d2f6bb34c46d73d29aa72a64dfa07f7beb35ae01d23c8f2bc6c797
-   data.tar.gz: f900e723d2db75dbdb266fcf33d01be56d7614b992be9e0b6d29345a85012be0d226ff9a2f42cb2d5a9f932cb1e641e6ecbfc033ccd3c6c98bbe4d2a7207ad13
+   metadata.gz: e2a8622a2914cdfbc04b365d1ca7a9e8d35b4daa656931fd23a6e010b25b5a8ed6699246bffdbe0bfe064757ad26cad8f3664fe8c1dd2d1c606220dc932af45f
+   data.tar.gz: ea975e9ac1c89621d41c274b6040bb92760672f9d4a2db223ea6c08371f7ed8e8a2dc810417b9b5b2beb9ce30fccb64544f24c11ae815422e5b99f5a43a48517
data/CHANGELOG.md CHANGED
@@ -1,5 +1,34 @@
  # Changelog

+ ## [0.3.2] - 2026-04-08
+
+ ### Changed
+ - `Transport::Exchanges::LlmRequest` now inherits `Legion::LLM::Fleet::Exchange` instead of declaring exchange properties independently — prevents silent divergence if the canonical exchange definition changes
+ - `Transport::Queues::ModelRequest` switched from durable quorum queue to classic auto-delete with `x-max-priority: 10` — enables `basic.return` feedback when all workers disconnect; added `dlx_enabled: false` to prevent DLX provisioning on ephemeral queues
+ - `Transport::Messages::LlmResponse` now inherits `Legion::LLM::Fleet::Response` instead of `Legion::Transport::Message` — gains wire protocol compliance (`type: 'llm.fleet.response'`, `message_context` propagation, default-exchange publishing, `resp_` prefixed message_id); overrides `app_id` to `'lex-ollama'`
+ - `Runners::Fleet#handle_request` now accepts and propagates `message_context` verbatim from request to response; rejects `stream: true` requests with `unsupported_streaming` error; builds full wire protocol response envelope (routing, tokens, timestamps, audit, cost, stop)
+ - `Runners::Fleet#publish_reply` switched from positional to keyword arguments; uses `fleet_correlation_id` instead of `correlation_id` to avoid collision with Legion task tracking
+ - `Runners::Fleet#dispatch` now resolves Ollama host from `Legion::Settings` instead of using hardcoded default
+ - `Actor::ModelWorker` now sets `prefetch(1)` for fair consumer dispatch; reads `consumer_priority` from `legion.ollama.fleet.consumer_priority` settings; passes `x-priority` in `subscribe_options`; injects `message_context: {}` default in `process_message`
+
+ ### Added
+ - `Runners::Fleet#publish_error` — publishes `Legion::LLM::Fleet::Error` to caller's reply_to queue on validation failures (e.g., unsupported streaming)
+ - `Runners::Fleet#build_response_body` — constructs wire protocol response body with routing, tokens, timestamps, audit, cost, and stop blocks
+
+ ## [0.3.1] - 2026-04-08
+
+ ### Added
+ - `Runners::Fleet` — module-function dispatcher for inbound AMQP LLM request messages; routes by `request_type` to `Client#embed`, `Client#generate`, or `Client#chat`
+ - `Transport::Exchanges::LlmRequest` — durable topic exchange `llm.request` for fleet routing
+ - `Transport::Queues::ModelRequest` — parametric durable quorum queue per `(type, model)` pair; sanitises colons in model names to dots
+ - `Transport::Messages::LlmResponse` — reply message published back to `reply_to` queue after inference
+ - `Actor::ModelWorker` — subscription actor; one instance per configured `(type, model)` subscription; enriches inbound messages with `request_type` and `model`, bypasses Legion::Runner task DB (`use_runner? false`)
+ - Fleet queue subscription system: when `Legion::Extensions::Core` is present, subscribes to model-scoped queues on `llm.request` topic exchange using routing key `llm.request.ollama.<type>.<model>`
+ - Standalone mode: all transport/actor requires guarded behind `const_defined?(:Core, false)` so the gem works as a pure HTTP client library without AMQP
+
+ ### Fixed
+ - `Runners::S3Models`: use `::JSON.parse` (stdlib) instead of bare `JSON.parse` which resolves to `Legion::JSON` (symbol keys) inside the `Legion::` namespace — fixes `import_from_s3` and `sync_from_s3` manifest parsing
+
  ## [0.3.0] - 2026-04-01

  ### Added
data/CLAUDE.md CHANGED
@@ -1,44 +1,213 @@
  # lex-ollama: Ollama Integration for LegionIO

- **Parent**: `/Users/miverso2/rubymine/legion/extensions-ai/CLAUDE.md`
+ **Repository Level 3 Documentation**
+ - **Parent**: `../CLAUDE.md`
+ - **Grandparent**: `../../CLAUDE.md`

  ## Purpose

- Legion Extension that connects LegionIO to Ollama, a local LLM server. Provides text generation, chat completions, embeddings, model management, and blob operations.
+ Legion Extension that connects LegionIO to Ollama, a local LLM server. Provides text generation,
+ chat completions, embeddings, model management, blob operations, S3 model distribution, version
+ reporting, and **fleet queue subscription** for receiving routed LLM requests from the Legion bus.

  **GitHub**: https://github.com/LegionIO/lex-ollama
  **License**: MIT
+ **Version**: 0.3.2
+ **Specs**: 82 examples (12 spec files) — fleet additions add ~35 more
+
+ ---

  ## Architecture

  ```
  Legion::Extensions::Ollama
  ├── Runners/
- │   ├── Completions   # POST /api/generate
- │   ├── Chat          # POST /api/chat
- │   ├── Models        # CRUD + pull/push/running
- │   ├── Embeddings    # POST /api/embed
- │   ├── Blobs         # HEAD/POST /api/blobs/:digest
- │   └── Version       # GET /api/version
+ │   ├── Completions   # generate, generate_stream
+ │   ├── Chat          # chat, chat_stream
+ │   ├── Models        # create_model, list_models, show_model, copy_model, delete_model,
+ │   │                 # pull_model, push_model, list_running
+ │   ├── Embeddings    # embed
+ │   ├── Blobs         # check_blob, push_blob
+ │   ├── S3Models      # list_s3_models, import_from_s3, sync_from_s3, import_default_models
+ │   ├── Version       # server_version
+ │   └── Fleet         # handle_request (fleet dispatcher — chat/embed/generate)
  ├── Helpers/
- │   └── Client        # Faraday connection to Ollama server
- └── Client            # Standalone client class
+ │   ├── Client        # Faraday connection to Ollama server (module, factory method)
+ │   ├── Errors        # error handling + with_retry
+ │   └── Usage         # usage normalization (maps Ollama token/duration fields to standard shape)
+ ├── Client            # Standalone client class (includes all runners, holds @config)
+ ├── Transport/        # (loaded only when Legion::Extensions::Core is present)
+ │   ├── Exchanges/
+ │   │   └── LlmRequest    # references Legion::LLM::Fleet::Exchange ('llm.request')
+ │   ├── Queues/
+ │   │   └── ModelRequest  # parametric queue — one per (type, model) pair, auto-delete
+ │   └── Messages/
+ │       └── LlmResponse   # Legion::LLM::Fleet::Response subclass, reply via default exchange
+ └── Actor/
+     └── ModelWorker       # subscription actor — one per registered model/type
+ ```
+
+ ---
+
+ ## Fleet Queue Subscription
+
+ ### Overview
+
+ When `Legion::Extensions::Core` is available, lex-ollama subscribes to model-scoped queues on the
+ `llm.request` topic exchange, accepting routed inference work from other Legion fleet members
+ (lex-llm-gateway, direct publishers, etc.).
+
+ ### Routing Key Schema
+
+ ```
+ llm.request.ollama.<type>.<model>
+ ```
+
+ | Segment | Values | Notes |
+ |------------|----------------------------|------------------------------------|
+ | `ollama` | literal | provider identifier |
+ | `type` | `chat`, `embed`, `generate`| maps to a specific runner method |
+ | `model` | sanitised model name | `:` replaced with `.` (AMQP rules) |
+
+ **Examples:**
+ ```
+ llm.request.ollama.embed.nomic-embed-text
+ llm.request.ollama.embed.mxbai-embed-large
+ llm.request.ollama.chat.qwen3.5.27b # was qwen3.5:27b
+ llm.request.ollama.chat.llama3.2
+ llm.request.ollama.generate.llama3.2
+ ```
+
+ ### Queue Strategy
+
+ Each model+type combination gets its own **auto-delete queue** with a routing key that matches
+ its queue name exactly. Multiple nodes carrying the same model compete fairly (no SAC) — any
+ subscriber can serve. The queue name is identical to the routing key for clarity in the management UI.
+ RabbitMQ policies (applied externally via Terraform) set `max-length` and
+ `overflow: reject-publish` on `llm.request.*` queues. Queue priority is enabled by declaring
+ `x-max-priority: 10` on the queue itself (and may also be mirrored by policy for consistency).
+
+ ### Configuration
+
+ ```yaml
+ legion:
+   ollama:
+     host: "http://localhost:11434"
+     subscriptions:
+       - type: embed
+         model: nomic-embed-text
+       - type: embed
+         model: mxbai-embed-large
+       - type: chat
+         model: "qwen3.5:27b"
+       - type: chat
+         model: llama3.2
+ ```
+
+ The extension spawns one `Actor::ModelWorker` per subscription entry at boot.
+
+ ### Data Flow
+
+ ```
+ Publisher (legion-llm Fleet::Dispatcher / any fleet node)
+ │ routing_key: "llm.request.ollama.embed.nomic-embed-text"
+ │ AMQP type: 'llm.fleet.request'
+ │ Body includes: message_context { conversation_id, message_id, parent_message_id, message_seq, request_id, exchange_id }
+
+ Exchange: llm.request [topic, durable]
+
+ └── Queue: llm.request.ollama.embed.nomic-embed-text [auto-delete]
+
+ Actor::ModelWorker (type=embed, model=nomic-embed-text)
+
+ Runners::Fleet#handle_request
+ │ copies message_context from request
+
+ Ollama::Client#embed(model: 'nomic-embed-text', ...)
+
+ Fleet::Response (type: 'llm.fleet.response') → reply_to queue
+ Body includes: message_context (copied), response_message_id
  ```

+ ### Standalone Mode (no Legion runtime)
+
+ All transport/actor requires are guarded behind:
+ ```ruby
+ if Legion::Extensions.const_defined?(:Core, false)
+   # transport + actor requires
+ end
+ ```
+ The gem still works as a pure HTTP client library without AMQP, exactly as before.
+
+ ---
+
+ ## Key Design Decisions
+
+ - `generate_stream` and `chat_stream` yield `{ type: :delta, text: }` and `{ type: :done }` events.
+ - `S3Models` runner depends on `lex-s3`. Uses SHA256 digest verification. `import_from_s3` writes
+   directly to the filesystem; `sync_from_s3` pushes blobs through the Ollama API.
+ - `S3Models::OLLAMA_REGISTRY_PREFIX = 'manifests/registry.ollama.ai/library'`.
+ - `Usage` helper normalizes Ollama's token/duration fields to `{ input_tokens:, output_tokens:, ... }`.
+ - All runners return `{ result: body, status: code }`.
+ - **`Runners::Fleet` dispatch rules:**
+   - `request_type: 'embed'` → `Client#embed`, uses `:input` then falls back to `:text`.
+   - `request_type: 'generate'` → `Client#generate`.
+   - anything else (including `'chat'` or unknown) → `Client#chat`.
+ - **`Actor::ModelWorker#use_runner?` is `false`** — bypasses `Legion::Runner` / task DB entirely.
+ - **Reply publishing** never raises — errors are swallowed so the AMQP ack is not blocked.
+ - **Colon sanitisation** — `qwen3.5:27b` becomes `qwen3.5.27b` in queue/routing-key strings.
+
+ ---
+
+ ## Wire Protocol & Message Classes
+
+ Fleet messages inherit from `Legion::LLM::Transport::Message` (defined in legion-llm), which
+ extends `Legion::Transport::Message` with `message_context` propagation and LLM-specific headers.
+
+ ```
+ Legion::Transport::Message (platform base)
+   └── Legion::LLM::Transport::Message (LLM base — message_context, llm_headers)
+         ├── Legion::LLM::Fleet::Request (type: 'llm.fleet.request', app_id: 'legion-llm')
+         ├── Legion::LLM::Fleet::Response (type: 'llm.fleet.response', app_id: 'lex-ollama')
+         └── Legion::LLM::Fleet::Error (type: 'llm.fleet.error', app_id: 'lex-ollama')
+ ```
+
+ Every fleet message carries `message_context` in the body for end-to-end tracing:
+ ```
+ message_context:
+   conversation_id, message_id, parent_message_id, message_seq, request_id, exchange_id
+ ```
+
+ A subset (`conversation_id`, `message_id`, `request_id`) is promoted to AMQP headers
+ (`x-legion-llm-conversation-id`, etc.) for filtering without body parsing.
+
+ See: `docs/plans/2026-04-08-fleet-wire-protocol.md` for full AMQP property mapping,
+ platform-wide standard, and per-message-type specifications.
+
+ ---
+
  ## Dependencies

  | Gem | Purpose |
  |-----|---------|
- | faraday | HTTP client for Ollama REST API |
+ | `faraday` >= 2.0 | HTTP client for Ollama REST API |
+ | `lex-s3` >= 0.2 | S3 model distribution operations |
+
+ Fleet transport requires Legion runtime gems (`legion-transport`, `legion-llm`, `LegionIO`) but
+ those are *not* gemspec dependencies — they are expected to be present in the runtime environment.
+ `legion-llm` is needed for fleet message classes (`Legion::LLM::Fleet::Request`, etc.).
+
+ ---

  ## Testing

  ```bash
  bundle install
- bundle exec rspec
+ bundle exec rspec # all examples
  bundle exec rubocop
  ```

  ---

  **Maintained By**: Matthew Iverson (@Esity)
+ **Last Updated**: 2026-04-08
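Note on the standalone guard documented in the CLAUDE.md hunk above: the check passes `false` as the second argument to `const_defined?`, which restricts the lookup to the receiver's own constants. The following is a minimal sketch of that behaviour. `Host` is a hypothetical stand-in for `Legion::Extensions` and is not part of the gem.

```ruby
# Hypothetical stand-in for Legion::Extensions; not part of lex-ollama.
module Host; end

# With the default (inherit = true), a lookup on a module also consults Object,
# so an unrelated top-level constant of the same name would satisfy the check.
inherited = Host.const_defined?(:String)

# With inherit = false, only constants defined directly on Host count.
strict = Host.const_defined?(:String, false)

# Once the constant exists on the module itself, the strict check passes.
Host.const_set(:Core, Module.new)
core_present = Host.const_defined?(:Core, false)

puts "inherited=#{inherited} strict=#{strict} core_present=#{core_present}"
```

Under this reading, the strict form keeps a top-level `Core` constant defined elsewhere from accidentally switching the transport and actor requires on.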
data/README.md CHANGED
@@ -119,6 +119,10 @@ result[:usage] # => { input_tokens: 1, output_tokens: 5, total_duration: ..., .
  - [LegionIO](https://github.com/LegionIO/LegionIO) framework
  - [Ollama](https://ollama.com) running locally or on a remote host

+ ## Version
+
+ 0.3.1
+
  ## License

  MIT
@@ -0,0 +1,113 @@
+ # frozen_string_literal: true
+
+ module Legion
+   module Extensions
+     module Ollama
+       module Actor
+         # Subscription actor that listens on a model-scoped queue and forwards
+         # inbound LLM request messages to Runners::Fleet#handle_request.
+         #
+         # One instance is created per (request_type, model) entry in settings:
+         #
+         #   legion:
+         #     ollama:
+         #       fleet:
+         #         consumer_priority: 10
+         #       subscriptions:
+         #         - type: embed
+         #           model: nomic-embed-text
+         #         - type: chat
+         #           model: "qwen3.5:27b"
+         #
+         # The queue name and routing key both follow the schema:
+         #   llm.request.ollama.<type>.<model>
+         # where model colons are converted to dots (AMQP topic word separator).
+         class ModelWorker < Legion::Extensions::Actors::Subscription
+           attr_reader :request_type, :model_name
+
+           def initialize(request_type:, model:, **)
+             @request_type = request_type.to_s
+             @model_name = model.to_s
+             super(**)
+           end
+
+           def runner_class
+             Legion::Extensions::Ollama::Runners::Fleet
+           end
+
+           def runner_function
+             'handle_request'
+           end
+
+           # Bypass Legion::Runner — call the runner module directly so we don't
+           # need a task record in the database for every LLM inference hop.
+           def use_runner?
+             false
+           end
+
+           # prefetch(1) is required for consumer priority to work correctly:
+           # without it, a high-priority consumer can hold multiple messages while
+           # lower-priority consumers sit idle. With prefetch=1, each consumer
+           # completes one message before RabbitMQ delivers the next, and priority
+           # determines which idle consumer gets it.
+           def prefetch
+             1
+           end
+
+           # Consumer priority from settings. Tells RabbitMQ to prefer this consumer
+           # over lower-priority ones on the same queue when multiple consumers are idle.
+           # Standard scale: GPU server = 10, Mac Studio = 5, developer laptop = 1.
+           # Defaults to 0 (equal priority) if not configured.
+           def consumer_priority
+             return 0 unless defined?(Legion::Settings)
+
+             Legion::Settings.dig(:ollama, :fleet, :consumer_priority) || 0
+           end
+
+           # Subscribe options include x-priority argument so RabbitMQ can honour
+           # consumer priority when dispatching to competing consumers.
+           def subscribe_options
+             base = begin
+               super
+             rescue NoMethodError
+               {}
+             end
+             base.merge(arguments: { 'x-priority' => consumer_priority })
+           end
+
+           # Override queue to return a model-scoped queue bound with the precise
+           # routing key for this worker's (type, model) pair.
+           def queue
+             @queue ||= build_and_bind_queue
+           end
+
+           # Enrich every inbound message with the worker's own request_type and model
+           # so Runners::Fleet#handle_request always has them, even if the sender omitted
+           # them. Also defaults message_context to {} if absent.
+           def process_message(payload, metadata, delivery_info)
+             msg = super
+             msg[:request_type] ||= @request_type
+             msg[:model] ||= @model_name
+             msg[:message_context] ||= {}
+             msg
+           end
+
+           private
+
+           def build_and_bind_queue
+             sanitised_model = @model_name.tr(':', '.')
+             routing_key = "llm.request.ollama.#{@request_type}.#{sanitised_model}"
+
+             queue_obj = Transport::Queues::ModelRequest.new(
+               request_type: @request_type,
+               model: @model_name
+             )
+             exchange_obj = Transport::Exchanges::LlmRequest.new
+             queue_obj.bind(exchange_obj, routing_key: routing_key)
+             queue_obj
+           end
+         end
+       end
+     end
+   end
+ end
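Note on `build_and_bind_queue` above: the routing key is plain string interpolation after replacing colons with dots. A minimal sketch of that derivation follows. `fleet_routing_key` is a hypothetical helper written for illustration and does not exist in the gem.

```ruby
# Hypothetical helper mirroring the string building in build_and_bind_queue.
def fleet_routing_key(request_type, model)
  # AMQP topic words are dot-separated, so the colon in an Ollama tag
  # (name:tag) is rewritten to a dot before it goes into the key.
  sanitised_model = model.to_s.tr(':', '.')
  "llm.request.ollama.#{request_type}.#{sanitised_model}"
end

chat_key  = fleet_routing_key('chat', 'qwen3.5:27b')
embed_key = fleet_routing_key(:embed, 'nomic-embed-text')

puts chat_key
puts embed_key
```

One consequence of this scheme is that the mapping is not reversible: a model named `a:b` and one named `a.b` would produce the same key, so the original model name has to travel in the message itself, which is what `process_message` arranges.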
@@ -0,0 +1,212 @@
+ # frozen_string_literal: true
+
+ module Legion
+   module Extensions
+     module Ollama
+       module Runners
+         # Fleet runner — handles inbound AMQP LLM request messages and dispatches
+         # them to the appropriate Ollama::Client method based on request_type.
+         #
+         # Called by Actor::ModelWorker with use_runner? = false.
+         module Fleet
+           class << self
+             # Primary entry point called by the subscription actor.
+             #
+             # @param model [String] Ollama model name, e.g. "nomic-embed-text"
+             # @param request_type [String] "chat", "embed", or "generate"
+             # @param reply_to [String, nil] routing key for the reply queue (RPC pattern)
+             # @param correlation_id [String, nil] fleet correlation ID, echoed back in reply
+             # @param message_context [Hash] tracing context — copied verbatim into the reply
+             # @param payload [Hash] remaining message keys passed to the Ollama client
+             def handle_request(model:, request_type: 'chat', reply_to: nil,
+                                correlation_id: nil, message_context: {}, **payload)
+               received_at = Time.now.utc
+
+               if payload[:stream]
+                 publish_error(
+                   reply_to: reply_to,
+                   correlation_id: correlation_id,
+                   message_context: message_context,
+                   model: model,
+                   request_type: request_type,
+                   error: {
+                     code: 'unsupported_streaming',
+                     message: 'Streaming over the fleet AMQP bus is not supported in v1',
+                     retriable: false,
+                     category: 'validation',
+                     provider: 'ollama'
+                   }
+                 )
+                 return { result: nil, status: 422, error: 'unsupported_streaming' }
+               end
+
+               result = dispatch(model: model, request_type: request_type, **payload)
+               returned_at = Time.now.utc
+
+               if reply_to
+                 publish_reply(
+                   reply_to: reply_to,
+                   correlation_id: correlation_id,
+                   message_context: message_context,
+                   model: model,
+                   request_type: request_type,
+                   result: result,
+                   received_at: received_at,
+                   returned_at: returned_at
+                 )
+               end
+
+               result
+             end
+
+             # Dispatch to the correct Ollama client method by request_type.
+             #
+             # @return [Hash] { result: body, status: code } or { result: nil, status: 500, error: msg }
+             def dispatch(model:, request_type:, **payload)
+               host = ollama_host
+               ollama = Legion::Extensions::Ollama::Client.new(host: host)
+
+               case request_type.to_s
+               when 'embed'
+                 input = payload[:input] || payload[:text]
+                 ollama.embed(model: model, input: input,
+                              **payload.slice(:truncate, :options, :keep_alive, :dimensions))
+               when 'generate'
+                 ollama.generate(model: model, prompt: payload[:prompt],
+                                 **payload.slice(:images, :format, :options, :system, :keep_alive))
+               else
+                 ollama.chat(model: model, messages: payload[:messages],
+                             **payload.slice(:tools, :format, :options, :keep_alive, :think))
+               end
+             rescue StandardError => e
+               { result: nil, usage: {}, status: 500, error: e.message }
+             end
+
+             # Publish a successful fleet response to the caller's reply_to queue.
+             # Errors are swallowed so the AMQP ack path is never blocked by a broken reply.
+             def publish_reply(reply_to:, correlation_id:, message_context:, model:,
+                               request_type:, result:, received_at:, returned_at:)
+               return unless defined?(Legion::Transport)
+
+               body = result[:result] || {}
+               usage = result[:usage] || {}
+               status = result[:status] || 200
+               latency_ms = ((returned_at - received_at) * 1000).round
+
+               Transport::Messages::LlmResponse.new(
+                 reply_to: reply_to,
+                 fleet_correlation_id: correlation_id,
+                 message_context: message_context,
+                 provider: 'ollama',
+                 model: model,
+                 request_type: request_type,
+                 app_id: 'lex-ollama',
+                 **build_response_body(
+                   request_type: request_type,
+                   body: body,
+                   usage: usage,
+                   status: status,
+                   model: model,
+                   latency_ms: latency_ms,
+                   received_at: received_at,
+                   returned_at: returned_at
+                 )
+               ).publish
+             rescue StandardError
+               nil
+             end
+
+             # Publish a fleet error to the caller's reply_to queue.
+             # Errors are swallowed so the AMQP ack path is never blocked.
+             def publish_error(reply_to:, correlation_id:, message_context:, model:,
+                               request_type:, error:)
+               return unless reply_to
+               return unless defined?(Legion::Transport)
+
+               Legion::LLM::Fleet::Error.new(
+                 reply_to: reply_to,
+                 fleet_correlation_id: correlation_id,
+                 message_context: message_context,
+                 provider: 'ollama',
+                 model: model,
+                 request_type: request_type,
+                 app_id: 'lex-ollama',
+                 error: error,
+                 worker_node: node_identity
+               ).publish
+             rescue StandardError
+               nil
+             end
+
+             private
+
+             # Build the JSON body for a successful fleet response.
+             def build_response_body(request_type:, body:, usage:, status:, model:,
+                                     latency_ms:, received_at:, returned_at:)
+               base = {
+                 routing: {
+                   provider: 'ollama',
+                   model: model,
+                   tier: 'fleet',
+                   strategy: 'fleet_dispatch',
+                   latency_ms: latency_ms
+                 },
+                 tokens: {
+                   input: usage[:input_tokens] || 0,
+                   output: usage[:output_tokens] || 0,
+                   total: (usage[:input_tokens] || 0) + (usage[:output_tokens] || 0)
+                 },
+                 stop: { reason: body.is_a?(Hash) ? body['done_reason'] : nil },
+                 cost: { estimated_usd: 0.0, provider: 'ollama', model: model },
+                 timestamps: {
+                   received: received_at.iso8601(3),
+                   provider_start: received_at.iso8601(3),
+                   provider_end: returned_at.iso8601(3),
+                   returned: returned_at.iso8601(3)
+                 },
+                 audit: {
+                   'fleet:execute' => {
+                     outcome: status == 200 ? 'success' : 'error',
+                     duration_ms: latency_ms,
+                     timestamp: returned_at.iso8601(3)
+                   }
+                 },
+                 stream: false
+               }
+
+               case request_type.to_s
+               when 'embed'
+                 base.merge(
+                   embeddings: body.is_a?(Hash) ? body['embeddings'] : body
+                 )
+               when 'generate'
+                 base.merge(
+                   message: { role: 'assistant', content: body.is_a?(Hash) ? body['response'] : body }
+                 )
+               else
+                 content = body.is_a?(Hash) ? body.dig('message', 'content') : body
+                 base.merge(
+                   message: { role: 'assistant', content: content }
+                 )
+               end
+             end
+
+             # Resolve the Ollama host from settings, falling back to the default.
+             def ollama_host
+               return Helpers::Client::DEFAULT_HOST unless defined?(Legion::Settings)
+
+               Legion::Settings.dig(:ollama, :host) || Helpers::Client::DEFAULT_HOST
+             end
+
+             # Resolve the local node identity for worker_node in error messages.
+             def node_identity
+               return 'unknown' unless defined?(Legion::Settings)
+
+               Legion::Settings.dig(:node, :canonical_name) || 'unknown'
+             end
+           end
+         end
+       end
+     end
+   end
+ end
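Note on `Runners::Fleet#dispatch` above: routing is a three-way `case` in which `chat` is the fallback branch rather than an explicit match. The sketch below reproduces only that branching. `StubClient` and `route` are hypothetical names used for illustration; the stub records which method was called instead of contacting an Ollama server, and the optional keyword pass-through from the real method is omitted.

```ruby
# StubClient is a hypothetical stand-in for Legion::Extensions::Ollama::Client.
class StubClient
  def embed(**kwargs)
    { called: :embed, args: kwargs }
  end

  def generate(**kwargs)
    { called: :generate, args: kwargs }
  end

  def chat(**kwargs)
    { called: :chat, args: kwargs }
  end
end

# Hypothetical reduction of the dispatch branching shown in the diff.
def route(client, model:, request_type:, **payload)
  case request_type.to_s
  when 'embed'
    # :input wins; :text is only consulted when :input is absent.
    client.embed(model: model, input: payload[:input] || payload[:text])
  when 'generate'
    client.generate(model: model, prompt: payload[:prompt])
  else
    # 'chat' and any unrecognised type both land here.
    client.chat(model: model, messages: payload[:messages])
  end
end

client       = StubClient.new
embed_call   = route(client, model: 'nomic-embed-text', request_type: 'embed', text: 'hello')
unknown_call = route(client, model: 'llama3.2', request_type: 'rerank', messages: [])

puts embed_call.inspect
puts unknown_call[:called]
```

Because the fallback is silent, a misspelled `request_type` would appear to be sent to the chat endpoint rather than rejected, which may be worth keeping in mind when debugging unexpected replies.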
@@ -45,7 +45,7 @@ module Legion
  manifest_key = "#{prefix}/#{OLLAMA_REGISTRY_PREFIX}/#{name}/#{tag}"
  manifest_resp = s3.get_object(bucket: bucket, key: manifest_key)
  manifest_body = manifest_resp[:body]
- manifest_data = JSON.parse(manifest_body)
+ manifest_data = ::JSON.parse(manifest_body)

  digests = []
  digests << manifest_data['config'].slice('digest', 'size')
@@ -90,7 +90,7 @@ module Legion

  manifest_key = "#{prefix}/#{OLLAMA_REGISTRY_PREFIX}/#{name}/#{tag}"
  manifest_resp = s3.get_object(bucket: bucket, key: manifest_key)
- manifest_data = JSON.parse(manifest_resp[:body])
+ manifest_data = ::JSON.parse(manifest_resp[:body])

  digests = []
  digests << manifest_data['config']['digest']
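Note on the `::JSON.parse` change above: the changelog attributes the bug to bare `JSON` resolving to `Legion::JSON`, which returns symbol keys. The sketch below reproduces that constant-lookup behaviour in isolation. `Demo`, `Demo::JSON`, and `Demo::Runner` are hypothetical stand-ins, and the symbol-key behaviour of the wrapper is assumed from the changelog's description rather than taken from Legion's source.

```ruby
require 'json'

# Demo is a hypothetical namespace standing in for Legion.
module Demo
  # Mimics a wrapper that returns symbol keys (assumed behaviour).
  module JSON
    def self.parse(string)
      ::JSON.parse(string, symbolize_names: true)
    end
  end

  module Runner
    # Bare JSON is resolved lexically, so Demo::JSON is found before ::JSON.
    def self.parse_bare(string)
      JSON.parse(string)
    end

    # The leading :: pins the lookup to the top-level stdlib constant.
    def self.parse_stdlib(string)
      ::JSON.parse(string)
    end
  end
end

manifest = '{"config":{"digest":"sha256:abc","size":10}}'
bare     = Demo::Runner.parse_bare(manifest)
stdlib   = Demo::Runner.parse_stdlib(manifest)

puts bare['config'].inspect     # string key lookup on a symbol-keyed hash
puts stdlib['config']['digest'] # string key lookup on a string-keyed hash
```

With symbol keys, `manifest_data['config']` comes back as `nil`, so the subsequent `.slice` or `['digest']` call in the runner would raise, which is consistent with the manifest parsing failure the changelog describes.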
@@ -0,0 +1,17 @@
+ # frozen_string_literal: true
+
+ module Legion
+   module Extensions
+     module Ollama
+       module Transport
+         module Exchanges
+           # Thin alias that delegates exchange definition to Legion::LLM::Fleet::Exchange.
+           # This class exists solely so Ollama::Transport topology introspection has a
+           # local reference without importing legion-llm internals directly.
+           class LlmRequest < Legion::LLM::Fleet::Exchange
+           end
+         end
+       end
+     end
+   end
+ end
@@ -0,0 +1,28 @@
+ # frozen_string_literal: true
+
+ module Legion
+   module Extensions
+     module Ollama
+       module Transport
+         module Messages
+           # Published back to the caller's reply_to queue after a fleet request is processed.
+           #
+           # Inherits Legion::LLM::Fleet::Response which:
+           # - sets type: 'llm.fleet.response'
+           # - sets routing_key to @options[:reply_to]
+           # - publishes via AMQP default exchange ('')
+           # - propagates message_context into body and headers
+           # - generates message_id with 'resp_' prefix
+           #
+           # This class only overrides app_id so audit records and the wire protocol
+           # correctly identify lex-ollama as the worker component.
+           class LlmResponse < Legion::LLM::Fleet::Response
+             def app_id
+               'lex-ollama'
+             end
+           end
+         end
+       end
+     end
+   end
+ end
+ end