lex-ollama 0.3.0 → 0.3.2
- checksums.yaml +4 -4
- data/CHANGELOG.md +29 -0
- data/CLAUDE.md +181 -12
- data/README.md +4 -0
- data/lib/legion/extensions/ollama/actors/model_worker.rb +113 -0
- data/lib/legion/extensions/ollama/runners/fleet.rb +212 -0
- data/lib/legion/extensions/ollama/runners/s3_models.rb +2 -2
- data/lib/legion/extensions/ollama/transport/exchanges/llm_request.rb +17 -0
- data/lib/legion/extensions/ollama/transport/messages/llm_response.rb +28 -0
- data/lib/legion/extensions/ollama/transport/queues/model_request.rb +58 -0
- data/lib/legion/extensions/ollama/transport.rb +25 -0
- data/lib/legion/extensions/ollama/version.rb +1 -1
- data/lib/legion/extensions/ollama.rb +12 -1
- metadata +7 -3
- data/docs/plans/2026-04-01-s3-model-distribution-design.md +0 -131
- data/docs/plans/2026-04-01-s3-model-distribution-plan.md +0 -655
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: 8657f3e11e11fcd2ee34e12317bf7698bcfea1e907006c76f9a07326996c7a69
|
|
4
|
+
data.tar.gz: d1c1bbb05dc6a3a0071b4474a45a074b0d4929cd4b48b7488b5aa10539a9a6ee
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: e2a8622a2914cdfbc04b365d1ca7a9e8d35b4daa656931fd23a6e010b25b5a8ed6699246bffdbe0bfe064757ad26cad8f3664fe8c1dd2d1c606220dc932af45f
|
|
7
|
+
data.tar.gz: ea975e9ac1c89621d41c274b6040bb92760672f9d4a2db223ea6c08371f7ed8e8a2dc810417b9b5b2beb9ce30fccb64544f24c11ae815422e5b99f5a43a48517
|
data/CHANGELOG.md
CHANGED
|
@@ -1,5 +1,34 @@
|
|
|
1
1
|
# Changelog
|
|
2
2
|
|
|
3
|
+
## [0.3.2] - 2026-04-08
|
|
4
|
+
|
|
5
|
+
### Changed
|
|
6
|
+
- `Transport::Exchanges::LlmRequest` now inherits `Legion::LLM::Fleet::Exchange` instead of declaring exchange properties independently — prevents silent divergence if the canonical exchange definition changes
|
|
7
|
+
- `Transport::Queues::ModelRequest` switched from durable quorum queue to classic auto-delete with `x-max-priority: 10` — enables `basic.return` feedback when all workers disconnect; added `dlx_enabled: false` to prevent DLX provisioning on ephemeral queues
|
|
8
|
+
- `Transport::Messages::LlmResponse` now inherits `Legion::LLM::Fleet::Response` instead of `Legion::Transport::Message` — gains wire protocol compliance (`type: 'llm.fleet.response'`, `message_context` propagation, default-exchange publishing, `resp_` prefixed message_id); overrides `app_id` to `'lex-ollama'`
|
|
9
|
+
- `Runners::Fleet#handle_request` now accepts and propagates `message_context` verbatim from request to response; rejects `stream: true` requests with `unsupported_streaming` error; builds full wire protocol response envelope (routing, tokens, timestamps, audit, cost, stop)
|
|
10
|
+
- `Runners::Fleet#publish_reply` switched from positional to keyword arguments; uses `fleet_correlation_id` instead of `correlation_id` to avoid collision with Legion task tracking
|
|
11
|
+
- `Runners::Fleet#dispatch` now resolves Ollama host from `Legion::Settings` instead of using hardcoded default
|
|
12
|
+
- `Actor::ModelWorker` now sets `prefetch(1)` for fair consumer dispatch; reads `consumer_priority` from `legion.ollama.fleet.consumer_priority` settings; passes `x-priority` in `subscribe_options`; injects `message_context: {}` default in `process_message`
|
|
13
|
+
|
|
14
|
+
### Added
|
|
15
|
+
- `Runners::Fleet#publish_error` — publishes `Legion::LLM::Fleet::Error` to caller's reply_to queue on validation failures (e.g., unsupported streaming)
|
|
16
|
+
- `Runners::Fleet#build_response_body` — constructs wire protocol response body with routing, tokens, timestamps, audit, cost, and stop blocks
|
|
17
|
+
|
|
18
|
+
## [0.3.1] - 2026-04-08
|
|
19
|
+
|
|
20
|
+
### Added
|
|
21
|
+
- `Runners::Fleet` — module-function dispatcher for inbound AMQP LLM request messages; routes by `request_type` to `Client#embed`, `Client#generate`, or `Client#chat`
|
|
22
|
+
- `Transport::Exchanges::LlmRequest` — durable topic exchange `llm.request` for fleet routing
|
|
23
|
+
- `Transport::Queues::ModelRequest` — parametric durable quorum queue per `(type, model)` pair; sanitises colons in model names to dots
|
|
24
|
+
- `Transport::Messages::LlmResponse` — reply message published back to `reply_to` queue after inference
|
|
25
|
+
- `Actor::ModelWorker` — subscription actor; one instance per configured `(type, model)` subscription; enriches inbound messages with `request_type` and `model`, bypasses Legion::Runner task DB (`use_runner? false`)
|
|
26
|
+
- Fleet queue subscription system: when `Legion::Extensions::Core` is present, subscribes to model-scoped queues on `llm.request` topic exchange using routing key `llm.request.ollama.<type>.<model>`
|
|
27
|
+
- Standalone mode: all transport/actor requires guarded behind `const_defined?(:Core, false)` so the gem works as a pure HTTP client library without AMQP
|
|
28
|
+
|
|
29
|
+
### Fixed
|
|
30
|
+
- `Runners::S3Models`: use `::JSON.parse` (stdlib) instead of bare `JSON.parse` which resolves to `Legion::JSON` (symbol keys) inside the `Legion::` namespace — fixes `import_from_s3` and `sync_from_s3` manifest parsing
|
|
31
|
+
|
|
3
32
|
## [0.3.0] - 2026-04-01
|
|
4
33
|
|
|
5
34
|
### Added
|
data/CLAUDE.md
CHANGED
|
@@ -1,44 +1,213 @@
|
|
|
1
1
|
# lex-ollama: Ollama Integration for LegionIO
|
|
2
2
|
|
|
3
|
-
**
|
|
3
|
+
**Repository Level 3 Documentation**
|
|
4
|
+
- **Parent**: `../CLAUDE.md`
|
|
5
|
+
- **Grandparent**: `../../CLAUDE.md`
|
|
4
6
|
|
|
5
7
|
## Purpose
|
|
6
8
|
|
|
7
|
-
Legion Extension that connects LegionIO to Ollama, a local LLM server. Provides text generation,
|
|
9
|
+
Legion Extension that connects LegionIO to Ollama, a local LLM server. Provides text generation,
|
|
10
|
+
chat completions, embeddings, model management, blob operations, S3 model distribution, version
|
|
11
|
+
reporting, and **fleet queue subscription** for receiving routed LLM requests from the Legion bus.
|
|
8
12
|
|
|
9
13
|
**GitHub**: https://github.com/LegionIO/lex-ollama
|
|
10
14
|
**License**: MIT
|
|
15
|
+
**Version**: 0.3.2
|
|
16
|
+
**Specs**: 82 examples (12 spec files) — fleet additions add ~35 more
|
|
17
|
+
|
|
18
|
+
---
|
|
11
19
|
|
|
12
20
|
## Architecture
|
|
13
21
|
|
|
14
22
|
```
|
|
15
23
|
Legion::Extensions::Ollama
|
|
16
24
|
├── Runners/
|
|
17
|
-
│ ├── Completions
|
|
18
|
-
│ ├── Chat
|
|
19
|
-
│ ├── Models
|
|
20
|
-
│
|
|
21
|
-
│ ├──
|
|
22
|
-
│
|
|
25
|
+
│ ├── Completions # generate, generate_stream
|
|
26
|
+
│ ├── Chat # chat, chat_stream
|
|
27
|
+
│ ├── Models # create_model, list_models, show_model, copy_model, delete_model,
|
|
28
|
+
│ │ # pull_model, push_model, list_running
|
|
29
|
+
│ ├── Embeddings # embed
|
|
30
|
+
│ ├── Blobs # check_blob, push_blob
|
|
31
|
+
│ ├── S3Models # list_s3_models, import_from_s3, sync_from_s3, import_default_models
|
|
32
|
+
│ ├── Version # server_version
|
|
33
|
+
│ └── Fleet # handle_request (fleet dispatcher — chat/embed/generate)
|
|
23
34
|
├── Helpers/
|
|
24
|
-
│
|
|
25
|
-
|
|
35
|
+
│ ├── Client # Faraday connection to Ollama server (module, factory method)
|
|
36
|
+
│ ├── Errors # error handling + with_retry
|
|
37
|
+
│ └── Usage # usage normalization (maps Ollama token/duration fields to standard shape)
|
|
38
|
+
├── Client # Standalone client class (includes all runners, holds @config)
|
|
39
|
+
├── Transport/ # (loaded only when Legion::Extensions::Core is present)
|
|
40
|
+
│ ├── Exchanges/
|
|
41
|
+
│ │ └── LlmRequest # references Legion::LLM::Fleet::Exchange ('llm.request')
|
|
42
|
+
│ ├── Queues/
|
|
43
|
+
│ │ └── ModelRequest # parametric queue — one per (type, model) pair, auto-delete
|
|
44
|
+
│ └── Messages/
|
|
45
|
+
│ └── LlmResponse # Legion::LLM::Fleet::Response subclass, reply via default exchange
|
|
46
|
+
└── Actor/
|
|
47
|
+
└── ModelWorker # subscription actor — one per registered model/type
|
|
48
|
+
```
|
|
49
|
+
|
|
50
|
+
---
|
|
51
|
+
|
|
52
|
+
## Fleet Queue Subscription
|
|
53
|
+
|
|
54
|
+
### Overview
|
|
55
|
+
|
|
56
|
+
When `Legion::Extensions::Core` is available, lex-ollama subscribes to model-scoped queues on the
|
|
57
|
+
`llm.request` topic exchange, accepting routed inference work from other Legion fleet members
|
|
58
|
+
(lex-llm-gateway, direct publishers, etc.).
|
|
59
|
+
|
|
60
|
+
### Routing Key Schema
|
|
61
|
+
|
|
62
|
+
```
|
|
63
|
+
llm.request.ollama.<type>.<model>
|
|
64
|
+
```
|
|
65
|
+
|
|
66
|
+
| Segment | Values | Notes |
|
|
67
|
+
|------------|----------------------------|------------------------------------|
|
|
68
|
+
| `ollama` | literal | provider identifier |
|
|
69
|
+
| `type` | `chat`, `embed`, `generate`| maps to a specific runner method |
|
|
70
|
+
| `model` | sanitised model name | `:` replaced with `.` (AMQP rules) |
|
|
71
|
+
|
|
72
|
+
**Examples:**
|
|
73
|
+
```
|
|
74
|
+
llm.request.ollama.embed.nomic-embed-text
|
|
75
|
+
llm.request.ollama.embed.mxbai-embed-large
|
|
76
|
+
llm.request.ollama.chat.qwen3.5.27b # was qwen3.5:27b
|
|
77
|
+
llm.request.ollama.chat.llama3.2
|
|
78
|
+
llm.request.ollama.generate.llama3.2
|
|
79
|
+
```
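The schema and examples above can be reproduced with a small helper. This is a minimal sketch rather than the gem's API: `fleet_routing_key` is a hypothetical name, and the only behaviour taken from the source is the `ollama` provider segment and the replacement of `:` with `.` in the model name.

```ruby
# Hypothetical helper (not part of lex-ollama): builds a routing key that
# follows the llm.request.ollama.<type>.<model> schema described above.
def fleet_routing_key(type, model)
  # Colons are replaced with dots, as in the documented sanitisation rule.
  sanitised_model = model.to_s.tr(':', '.')
  "llm.request.ollama.#{type}.#{sanitised_model}"
end

puts fleet_routing_key(:chat, 'qwen3.5:27b')
puts fleet_routing_key(:embed, 'nomic-embed-text')
```

Because model names may already contain dots, a sanitised name can span several topic words, which is one reason to bind each queue with the exact key rather than a wildcard pattern.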
|
|
80
|
+
|
|
81
|
+
### Queue Strategy
|
|
82
|
+
|
|
83
|
+
Each model+type combination gets its own **auto-delete queue** with a routing key that matches
|
|
84
|
+
its queue name exactly. Multiple nodes carrying the same model compete fairly (no SAC) — any
|
|
85
|
+
subscriber can serve. The queue name is identical to the routing key for clarity in the management UI.
|
|
86
|
+
RabbitMQ policies (applied externally via Terraform) set `max-length` and
|
|
87
|
+
`overflow: reject-publish` on `llm.request.*` queues. Queue priority is enabled by declaring
|
|
88
|
+
`x-max-priority: 10` on the queue itself (and may also be mirrored by policy for consistency).
|
|
89
|
+
|
|
90
|
+
### Configuration
|
|
91
|
+
|
|
92
|
+
```yaml
|
|
93
|
+
legion:
|
|
94
|
+
ollama:
|
|
95
|
+
host: "http://localhost:11434"
|
|
96
|
+
subscriptions:
|
|
97
|
+
- type: embed
|
|
98
|
+
model: nomic-embed-text
|
|
99
|
+
- type: embed
|
|
100
|
+
model: mxbai-embed-large
|
|
101
|
+
- type: chat
|
|
102
|
+
model: "qwen3.5:27b"
|
|
103
|
+
- type: chat
|
|
104
|
+
model: llama3.2
|
|
105
|
+
```
|
|
106
|
+
|
|
107
|
+
The extension spawns one `Actor::ModelWorker` per subscription entry at boot.
|
|
108
|
+
|
|
109
|
+
### Data Flow
|
|
110
|
+
|
|
111
|
+
```
|
|
112
|
+
Publisher (legion-llm Fleet::Dispatcher / any fleet node)
|
|
113
|
+
│ routing_key: "llm.request.ollama.embed.nomic-embed-text"
|
|
114
|
+
│ AMQP type: 'llm.fleet.request'
|
|
115
|
+
│ Body includes: message_context { conversation_id, message_id, parent_message_id, message_seq, request_id, exchange_id }
|
|
116
|
+
▼
|
|
117
|
+
Exchange: llm.request [topic, durable]
|
|
118
|
+
│
|
|
119
|
+
└── Queue: llm.request.ollama.embed.nomic-embed-text [auto-delete]
|
|
120
|
+
▼
|
|
121
|
+
Actor::ModelWorker (type=embed, model=nomic-embed-text)
|
|
122
|
+
▼
|
|
123
|
+
Runners::Fleet#handle_request
|
|
124
|
+
│ copies message_context from request
|
|
125
|
+
▼
|
|
126
|
+
Ollama::Client#embed(model: 'nomic-embed-text', ...)
|
|
127
|
+
▼
|
|
128
|
+
Fleet::Response (type: 'llm.fleet.response') → reply_to queue
|
|
129
|
+
Body includes: message_context (copied), response_message_id
|
|
26
130
|
```
|
|
27
131
|
|
|
132
|
+
### Standalone Mode (no Legion runtime)
|
|
133
|
+
|
|
134
|
+
All transport/actor requires are guarded behind:
|
|
135
|
+
```ruby
|
|
136
|
+
if Legion::Extensions.const_defined?(:Core, false)
|
|
137
|
+
# transport + actor requires
|
|
138
|
+
end
|
|
139
|
+
```
|
|
140
|
+
The gem still works as a pure HTTP client library without AMQP, exactly as before.
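The second argument to `const_defined?` matters for the guard above. The sketch below uses a stand-in module (`DemoExtensions`, hypothetical) to show the difference: with `inherit` left at its default, the lookup also searches `Object`, whereas `false` restricts it to constants defined directly on the receiver.

```ruby
# Stand-in namespace; the real guard is applied to Legion::Extensions.
module DemoExtensions; end

# inherit = true (default): Object is searched too, so an unrelated
# top-level constant satisfies the check.
puts DemoExtensions.const_defined?(:String)        # => true

# inherit = false: only constants defined directly on DemoExtensions count.
puts DemoExtensions.const_defined?(:String, false) # => false

module DemoExtensions
  module Core; end
end

# Once Core is defined inside the namespace, the strict check passes.
puts DemoExtensions.const_defined?(:Core, false)   # => true
```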
|
|
141
|
+
|
|
142
|
+
---
|
|
143
|
+
|
|
144
|
+
## Key Design Decisions
|
|
145
|
+
|
|
146
|
+
- `generate_stream` and `chat_stream` yield `{ type: :delta, text: }` and `{ type: :done }` events.
|
|
147
|
+
- `S3Models` runner depends on `lex-s3`. Uses SHA256 digest verification. `import_from_s3` writes
|
|
148
|
+
directly to the filesystem; `sync_from_s3` pushes blobs through the Ollama API.
|
|
149
|
+
- `S3Models::OLLAMA_REGISTRY_PREFIX = 'manifests/registry.ollama.ai/library'`.
|
|
150
|
+
- `Usage` helper normalizes Ollama's token/duration fields to `{ input_tokens:, output_tokens:, ... }`.
|
|
151
|
+
- All runners return `{ result: body, status: code }`.
|
|
152
|
+
- **`Runners::Fleet` dispatch rules:**
|
|
153
|
+
- `request_type: 'embed'` → `Client#embed`, uses `:input` then falls back to `:text`.
|
|
154
|
+
- `request_type: 'generate'` → `Client#generate`.
|
|
155
|
+
- anything else (including `'chat'` or unknown) → `Client#chat`.
|
|
156
|
+
- **`Actor::ModelWorker#use_runner?` is `false`** — bypasses `Legion::Runner` / task DB entirely.
|
|
157
|
+
- **Reply publishing** never raises — errors are swallowed so the AMQP ack is not blocked.
|
|
158
|
+
- **Colon sanitisation** — `qwen3.5:27b` becomes `qwen3.5.27b` in queue/routing-key strings.
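The `Runners::Fleet` dispatch rules listed above can be sketched with a stub in place of the real client. This is an illustration under assumptions: `StubClient` and `route` are hypothetical names, and the stub only records which method was chosen instead of calling Ollama.

```ruby
# Stub standing in for the Ollama client (assumption: the real client
# exposes embed, generate and chat taking keyword arguments).
class StubClient
  def embed(model:, input:, **)
    { method: :embed, model: model, input: input }
  end

  def generate(model:, prompt:, **)
    { method: :generate, model: model, prompt: prompt }
  end

  def chat(model:, messages:, **)
    { method: :chat, model: model, messages: messages }
  end
end

# Hypothetical router mirroring the documented rules.
def route(client, model:, request_type:, **payload)
  case request_type.to_s
  when 'embed'
    # :input takes precedence, :text is the fallback.
    client.embed(model: model, input: payload[:input] || payload[:text])
  when 'generate'
    client.generate(model: model, prompt: payload[:prompt])
  else
    # 'chat' and any unknown type both land here.
    client.chat(model: model, messages: payload[:messages])
  end
end

client = StubClient.new
puts route(client, model: 'llama3.2', request_type: 'unknown', messages: [])[:method]
```

Routing unknown types to chat keeps the dispatcher total, at the cost of not rejecting a mistyped `request_type`.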
|
|
159
|
+
|
|
160
|
+
---
|
|
161
|
+
|
|
162
|
+
## Wire Protocol & Message Classes
|
|
163
|
+
|
|
164
|
+
Fleet messages inherit from `Legion::LLM::Transport::Message` (defined in legion-llm), which
|
|
165
|
+
extends `Legion::Transport::Message` with `message_context` propagation and LLM-specific headers.
|
|
166
|
+
|
|
167
|
+
```
|
|
168
|
+
Legion::Transport::Message (platform base)
|
|
169
|
+
└── Legion::LLM::Transport::Message (LLM base — message_context, llm_headers)
|
|
170
|
+
├── Legion::LLM::Fleet::Request (type: 'llm.fleet.request', app_id: 'legion-llm')
|
|
171
|
+
├── Legion::LLM::Fleet::Response (type: 'llm.fleet.response', app_id: 'lex-ollama')
|
|
172
|
+
└── Legion::LLM::Fleet::Error (type: 'llm.fleet.error', app_id: 'lex-ollama')
|
|
173
|
+
```
|
|
174
|
+
|
|
175
|
+
Every fleet message carries `message_context` in the body for end-to-end tracing:
|
|
176
|
+
```
|
|
177
|
+
message_context:
|
|
178
|
+
conversation_id, message_id, parent_message_id, message_seq, request_id, exchange_id
|
|
179
|
+
```
|
|
180
|
+
|
|
181
|
+
A subset (`conversation_id`, `message_id`, `request_id`) is promoted to AMQP headers
|
|
182
|
+
(`x-legion-llm-conversation-id`, etc.) for filtering without body parsing.
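As a rough illustration of the header promotion described above, the sketch below derives header names from the promoted keys. Only `x-legion-llm-conversation-id` is named in the source; the other two header names, the `promoted_headers` helper, and the choice to skip missing values are assumptions made for the example.

```ruby
# Hypothetical helper: copies the promoted subset of message_context into
# AMQP-style headers. The naming pattern is assumed from the single header
# name given in the text (x-legion-llm-conversation-id).
def promoted_headers(message_context)
  %i[conversation_id message_id request_id].each_with_object({}) do |key, headers|
    value = message_context[key]
    next if value.nil? # assumption: absent keys produce no header

    headers["x-legion-llm-#{key.to_s.tr('_', '-')}"] = value
  end
end

context = { conversation_id: 'conv-1', request_id: 'req-9', message_seq: 3 }
promoted_headers(context).each { |name, value| puts "#{name}: #{value}" }
```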
|
|
183
|
+
|
|
184
|
+
See: `docs/plans/2026-04-08-fleet-wire-protocol.md` for full AMQP property mapping,
|
|
185
|
+
platform-wide standard, and per-message-type specifications.
|
|
186
|
+
|
|
187
|
+
---
|
|
188
|
+
|
|
28
189
|
## Dependencies
|
|
29
190
|
|
|
30
191
|
| Gem | Purpose |
|
|
31
192
|
|-----|---------|
|
|
32
|
-
| faraday | HTTP client for Ollama REST API |
|
|
193
|
+
| `faraday` >= 2.0 | HTTP client for Ollama REST API |
|
|
194
|
+
| `lex-s3` >= 0.2 | S3 model distribution operations |
|
|
195
|
+
|
|
196
|
+
Fleet transport requires Legion runtime gems (`legion-transport`, `legion-llm`, `LegionIO`) but
|
|
197
|
+
those are *not* gemspec dependencies — they are expected to be present in the runtime environment.
|
|
198
|
+
`legion-llm` is needed for fleet message classes (`Legion::LLM::Fleet::Request`, etc.).
|
|
199
|
+
|
|
200
|
+
---
|
|
33
201
|
|
|
34
202
|
## Testing
|
|
35
203
|
|
|
36
204
|
```bash
|
|
37
205
|
bundle install
|
|
38
|
-
bundle exec rspec
|
|
206
|
+
bundle exec rspec # all examples
|
|
39
207
|
bundle exec rubocop
|
|
40
208
|
```
|
|
41
209
|
|
|
42
210
|
---
|
|
43
211
|
|
|
44
212
|
**Maintained By**: Matthew Iverson (@Esity)
|
|
213
|
+
**Last Updated**: 2026-04-08
|
data/README.md
CHANGED
|
@@ -119,6 +119,10 @@ result[:usage] # => { input_tokens: 1, output_tokens: 5, total_duration: ..., .
|
|
|
119
119
|
- [LegionIO](https://github.com/LegionIO/LegionIO) framework
|
|
120
120
|
- [Ollama](https://ollama.com) running locally or on a remote host
|
|
121
121
|
|
|
122
|
+
## Version
|
|
123
|
+
|
|
124
|
+
0.3.1
|
|
125
|
+
|
|
122
126
|
## License
|
|
123
127
|
|
|
124
128
|
MIT
|
data/lib/legion/extensions/ollama/actors/model_worker.rb
ADDED
|
@@ -0,0 +1,113 @@
|
|
|
1
|
+
# frozen_string_literal: true
|
|
2
|
+
|
|
3
|
+
module Legion
|
|
4
|
+
module Extensions
|
|
5
|
+
module Ollama
|
|
6
|
+
module Actor
|
|
7
|
+
# Subscription actor that listens on a model-scoped queue and forwards
|
|
8
|
+
# inbound LLM request messages to Runners::Fleet#handle_request.
|
|
9
|
+
#
|
|
10
|
+
# One instance is created per (request_type, model) entry in settings:
|
|
11
|
+
#
|
|
12
|
+
# legion:
|
|
13
|
+
# ollama:
|
|
14
|
+
# fleet:
|
|
15
|
+
# consumer_priority: 10
|
|
16
|
+
# subscriptions:
|
|
17
|
+
# - type: embed
|
|
18
|
+
# model: nomic-embed-text
|
|
19
|
+
# - type: chat
|
|
20
|
+
# model: "qwen3.5:27b"
|
|
21
|
+
#
|
|
22
|
+
# The queue name and routing key both follow the schema:
|
|
23
|
+
# llm.request.ollama.<type>.<model>
|
|
24
|
+
# where model colons are converted to dots (AMQP topic word separator).
|
|
25
|
+
class ModelWorker < Legion::Extensions::Actors::Subscription
|
|
26
|
+
attr_reader :request_type, :model_name
|
|
27
|
+
|
|
28
|
+
def initialize(request_type:, model:, **)
|
|
29
|
+
@request_type = request_type.to_s
|
|
30
|
+
@model_name = model.to_s
|
|
31
|
+
super(**)
|
|
32
|
+
end
|
|
33
|
+
|
|
34
|
+
def runner_class
|
|
35
|
+
Legion::Extensions::Ollama::Runners::Fleet
|
|
36
|
+
end
|
|
37
|
+
|
|
38
|
+
def runner_function
|
|
39
|
+
'handle_request'
|
|
40
|
+
end
|
|
41
|
+
|
|
42
|
+
# Bypass Legion::Runner — call the runner module directly so we don't
|
|
43
|
+
# need a task record in the database for every LLM inference hop.
|
|
44
|
+
def use_runner?
|
|
45
|
+
false
|
|
46
|
+
end
|
|
47
|
+
|
|
48
|
+
# prefetch(1) is required for consumer priority to work correctly:
|
|
49
|
+
# without it, a high-priority consumer can hold multiple messages while
|
|
50
|
+
# lower-priority consumers sit idle. With prefetch=1, each consumer
|
|
51
|
+
# completes one message before RabbitMQ delivers the next, and priority
|
|
52
|
+
# determines which idle consumer gets it.
|
|
53
|
+
def prefetch
|
|
54
|
+
1
|
|
55
|
+
end
|
|
56
|
+
|
|
57
|
+
# Consumer priority from settings. Tells RabbitMQ to prefer this consumer
|
|
58
|
+
# over lower-priority ones on the same queue when multiple consumers are idle.
|
|
59
|
+
# Standard scale: GPU server = 10, Mac Studio = 5, developer laptop = 1.
|
|
60
|
+
# Defaults to 0 (equal priority) if not configured.
|
|
61
|
+
def consumer_priority
|
|
62
|
+
return 0 unless defined?(Legion::Settings)
|
|
63
|
+
|
|
64
|
+
Legion::Settings.dig(:ollama, :fleet, :consumer_priority) || 0
|
|
65
|
+
end
|
|
66
|
+
|
|
67
|
+
# Subscribe options include x-priority argument so RabbitMQ can honour
|
|
68
|
+
# consumer priority when dispatching to competing consumers.
|
|
69
|
+
def subscribe_options
|
|
70
|
+
base = begin
|
|
71
|
+
super
|
|
72
|
+
rescue NoMethodError
|
|
73
|
+
{}
|
|
74
|
+
end
|
|
75
|
+
base.merge(arguments: { 'x-priority' => consumer_priority })
|
|
76
|
+
end
|
|
77
|
+
|
|
78
|
+
# Override queue to return a model-scoped queue bound with the precise
|
|
79
|
+
# routing key for this worker's (type, model) pair.
|
|
80
|
+
def queue
|
|
81
|
+
@queue ||= build_and_bind_queue
|
|
82
|
+
end
|
|
83
|
+
|
|
84
|
+
# Enrich every inbound message with the worker's own request_type and model
|
|
85
|
+
# so Runners::Fleet#handle_request always has them, even if the sender omitted
|
|
86
|
+
# them. Also defaults message_context to {} if absent.
|
|
87
|
+
def process_message(payload, metadata, delivery_info)
|
|
88
|
+
msg = super
|
|
89
|
+
msg[:request_type] ||= @request_type
|
|
90
|
+
msg[:model] ||= @model_name
|
|
91
|
+
msg[:message_context] ||= {}
|
|
92
|
+
msg
|
|
93
|
+
end
|
|
94
|
+
|
|
95
|
+
private
|
|
96
|
+
|
|
97
|
+
def build_and_bind_queue
|
|
98
|
+
sanitised_model = @model_name.tr(':', '.')
|
|
99
|
+
routing_key = "llm.request.ollama.#{@request_type}.#{sanitised_model}"
|
|
100
|
+
|
|
101
|
+
queue_obj = Transport::Queues::ModelRequest.new(
|
|
102
|
+
request_type: @request_type,
|
|
103
|
+
model: @model_name
|
|
104
|
+
)
|
|
105
|
+
exchange_obj = Transport::Exchanges::LlmRequest.new
|
|
106
|
+
queue_obj.bind(exchange_obj, routing_key: routing_key)
|
|
107
|
+
queue_obj
|
|
108
|
+
end
|
|
109
|
+
end
|
|
110
|
+
end
|
|
111
|
+
end
|
|
112
|
+
end
|
|
113
|
+
end
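The enrichment step in `process_message` relies on `||=`, so values supplied by the sender win and the worker's own values only fill gaps. A minimal sketch, with `enrich` as a hypothetical stand-in for that method body:

```ruby
# Hypothetical stand-in for the enrichment done in ModelWorker#process_message.
def enrich(msg, request_type:, model:)
  msg[:request_type]    ||= request_type
  msg[:model]           ||= model
  msg[:message_context] ||= {}
  msg
end

filled = enrich({}, request_type: 'embed', model: 'nomic-embed-text')
puts filled[:model]

kept = enrich({ model: 'mxbai-embed-large' }, request_type: 'embed', model: 'nomic-embed-text')
puts kept[:model]
```

Note that `||=` also replaces an explicit `nil` or `false`, so a sender that sets `model: nil` is treated the same as one that omits the key.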
|
data/lib/legion/extensions/ollama/runners/fleet.rb
ADDED
|
@@ -0,0 +1,212 @@
|
|
|
1
|
+
# frozen_string_literal: true
|
|
2
|
+
|
|
3
|
+
module Legion
|
|
4
|
+
module Extensions
|
|
5
|
+
module Ollama
|
|
6
|
+
module Runners
|
|
7
|
+
# Fleet runner — handles inbound AMQP LLM request messages and dispatches
|
|
8
|
+
# them to the appropriate Ollama::Client method based on request_type.
|
|
9
|
+
#
|
|
10
|
+
# Called by Actor::ModelWorker with use_runner? = false.
|
|
11
|
+
module Fleet
|
|
12
|
+
class << self
|
|
13
|
+
# Primary entry point called by the subscription actor.
|
|
14
|
+
#
|
|
15
|
+
# @param model [String] Ollama model name, e.g. "nomic-embed-text"
|
|
16
|
+
# @param request_type [String] "chat", "embed", or "generate"
|
|
17
|
+
# @param reply_to [String, nil] routing key for the reply queue (RPC pattern)
|
|
18
|
+
# @param correlation_id [String, nil] fleet correlation ID, echoed back in reply
|
|
19
|
+
# @param message_context [Hash] tracing context — copied verbatim into the reply
|
|
20
|
+
# @param payload [Hash] remaining message keys passed to the Ollama client
|
|
21
|
+
def handle_request(model:, request_type: 'chat', reply_to: nil,
|
|
22
|
+
correlation_id: nil, message_context: {}, **payload)
|
|
23
|
+
received_at = Time.now.utc
|
|
24
|
+
|
|
25
|
+
if payload[:stream]
|
|
26
|
+
publish_error(
|
|
27
|
+
reply_to: reply_to,
|
|
28
|
+
correlation_id: correlation_id,
|
|
29
|
+
message_context: message_context,
|
|
30
|
+
model: model,
|
|
31
|
+
request_type: request_type,
|
|
32
|
+
error: {
|
|
33
|
+
code: 'unsupported_streaming',
|
|
34
|
+
message: 'Streaming over the fleet AMQP bus is not supported in v1',
|
|
35
|
+
retriable: false,
|
|
36
|
+
category: 'validation',
|
|
37
|
+
provider: 'ollama'
|
|
38
|
+
}
|
|
39
|
+
)
|
|
40
|
+
return { result: nil, status: 422, error: 'unsupported_streaming' }
|
|
41
|
+
end
|
|
42
|
+
|
|
43
|
+
result = dispatch(model: model, request_type: request_type, **payload)
|
|
44
|
+
returned_at = Time.now.utc
|
|
45
|
+
|
|
46
|
+
if reply_to
|
|
47
|
+
publish_reply(
|
|
48
|
+
reply_to: reply_to,
|
|
49
|
+
correlation_id: correlation_id,
|
|
50
|
+
message_context: message_context,
|
|
51
|
+
model: model,
|
|
52
|
+
request_type: request_type,
|
|
53
|
+
result: result,
|
|
54
|
+
received_at: received_at,
|
|
55
|
+
returned_at: returned_at
|
|
56
|
+
)
|
|
57
|
+
end
|
|
58
|
+
|
|
59
|
+
result
|
|
60
|
+
end
|
|
61
|
+
|
|
62
|
+
# Dispatch to the correct Ollama client method by request_type.
|
|
63
|
+
#
|
|
64
|
+
# @return [Hash] { result: body, status: code } or { result: nil, status: 500, error: msg }
|
|
65
|
+
def dispatch(model:, request_type:, **payload)
|
|
66
|
+
host = ollama_host
|
|
67
|
+
ollama = Legion::Extensions::Ollama::Client.new(host: host)
|
|
68
|
+
|
|
69
|
+
case request_type.to_s
|
|
70
|
+
when 'embed'
|
|
71
|
+
input = payload[:input] || payload[:text]
|
|
72
|
+
ollama.embed(model: model, input: input,
|
|
73
|
+
**payload.slice(:truncate, :options, :keep_alive, :dimensions))
|
|
74
|
+
when 'generate'
|
|
75
|
+
ollama.generate(model: model, prompt: payload[:prompt],
|
|
76
|
+
**payload.slice(:images, :format, :options, :system, :keep_alive))
|
|
77
|
+
else
|
|
78
|
+
ollama.chat(model: model, messages: payload[:messages],
|
|
79
|
+
**payload.slice(:tools, :format, :options, :keep_alive, :think))
|
|
80
|
+
end
|
|
81
|
+
rescue StandardError => e
|
|
82
|
+
{ result: nil, usage: {}, status: 500, error: e.message }
|
|
83
|
+
end
|
|
84
|
+
|
|
85
|
+
# Publish a successful fleet response to the caller's reply_to queue.
|
|
86
|
+
# Errors are swallowed so the AMQP ack path is never blocked by a broken reply.
|
|
87
|
+
def publish_reply(reply_to:, correlation_id:, message_context:, model:,
|
|
88
|
+
request_type:, result:, received_at:, returned_at:)
|
|
89
|
+
return unless defined?(Legion::Transport)
|
|
90
|
+
|
|
91
|
+
body = result[:result] || {}
|
|
92
|
+
usage = result[:usage] || {}
|
|
93
|
+
status = result[:status] || 200
|
|
94
|
+
latency_ms = ((returned_at - received_at) * 1000).round
|
|
95
|
+
|
|
96
|
+
Transport::Messages::LlmResponse.new(
|
|
97
|
+
reply_to: reply_to,
|
|
98
|
+
fleet_correlation_id: correlation_id,
|
|
99
|
+
message_context: message_context,
|
|
100
|
+
provider: 'ollama',
|
|
101
|
+
model: model,
|
|
102
|
+
request_type: request_type,
|
|
103
|
+
app_id: 'lex-ollama',
|
|
104
|
+
**build_response_body(
|
|
105
|
+
request_type: request_type,
|
|
106
|
+
body: body,
|
|
107
|
+
usage: usage,
|
|
108
|
+
status: status,
|
|
109
|
+
model: model,
|
|
110
|
+
latency_ms: latency_ms,
|
|
111
|
+
received_at: received_at,
|
|
112
|
+
returned_at: returned_at
|
|
113
|
+
)
|
|
114
|
+
).publish
|
|
115
|
+
rescue StandardError
|
|
116
|
+
nil
|
|
117
|
+
end
|
|
118
|
+
|
|
119
|
+
# Publish a fleet error to the caller's reply_to queue.
|
|
120
|
+
# Errors are swallowed so the AMQP ack path is never blocked.
|
|
121
|
+
def publish_error(reply_to:, correlation_id:, message_context:, model:,
|
|
122
|
+
request_type:, error:)
|
|
123
|
+
return unless reply_to
|
|
124
|
+
return unless defined?(Legion::Transport)
|
|
125
|
+
|
|
126
|
+
Legion::LLM::Fleet::Error.new(
|
|
127
|
+
reply_to: reply_to,
|
|
128
|
+
fleet_correlation_id: correlation_id,
|
|
129
|
+
message_context: message_context,
|
|
130
|
+
provider: 'ollama',
|
|
131
|
+
model: model,
|
|
132
|
+
request_type: request_type,
|
|
133
|
+
app_id: 'lex-ollama',
|
|
134
|
+
error: error,
|
|
135
|
+
worker_node: node_identity
|
|
136
|
+
).publish
|
|
137
|
+
rescue StandardError
|
|
138
|
+
nil
|
|
139
|
+
end
|
|
140
|
+
|
|
141
|
+
private
|
|
142
|
+
|
|
143
|
+
# Build the JSON body for a successful fleet response.
|
|
144
|
+
def build_response_body(request_type:, body:, usage:, status:, model:,
|
|
145
|
+
latency_ms:, received_at:, returned_at:)
|
|
146
|
+
base = {
|
|
147
|
+
routing: {
|
|
148
|
+
provider: 'ollama',
|
|
149
|
+
model: model,
|
|
150
|
+
tier: 'fleet',
|
|
151
|
+
strategy: 'fleet_dispatch',
|
|
152
|
+
latency_ms: latency_ms
|
|
153
|
+
},
|
|
154
|
+
tokens: {
|
|
155
|
+
input: usage[:input_tokens] || 0,
|
|
156
|
+
output: usage[:output_tokens] || 0,
|
|
157
|
+
total: (usage[:input_tokens] || 0) + (usage[:output_tokens] || 0)
|
|
158
|
+
},
|
|
159
|
+
stop: { reason: body.is_a?(Hash) ? body['done_reason'] : nil },
|
|
160
|
+
cost: { estimated_usd: 0.0, provider: 'ollama', model: model },
|
|
161
|
+
timestamps: {
|
|
162
|
+
received: received_at.iso8601(3),
|
|
163
|
+
provider_start: received_at.iso8601(3),
|
|
164
|
+
provider_end: returned_at.iso8601(3),
|
|
165
|
+
returned: returned_at.iso8601(3)
|
|
166
|
+
},
|
|
167
|
+
audit: {
|
|
168
|
+
'fleet:execute' => {
|
|
169
|
+
outcome: status == 200 ? 'success' : 'error',
|
|
170
|
+
duration_ms: latency_ms,
|
|
171
|
+
timestamp: returned_at.iso8601(3)
|
|
172
|
+
}
|
|
173
|
+
},
|
|
174
|
+
stream: false
|
|
175
|
+
}
|
|
176
|
+
|
|
177
|
+
case request_type.to_s
|
|
178
|
+
when 'embed'
|
|
179
|
+
base.merge(
|
|
180
|
+
embeddings: body.is_a?(Hash) ? body['embeddings'] : body
|
|
181
|
+
)
|
|
182
|
+
when 'generate'
|
|
183
|
+
base.merge(
|
|
184
|
+
message: { role: 'assistant', content: body.is_a?(Hash) ? body['response'] : body }
|
|
185
|
+
)
|
|
186
|
+
else
|
|
187
|
+
content = body.is_a?(Hash) ? body.dig('message', 'content') : body
|
|
188
|
+
base.merge(
|
|
189
|
+
message: { role: 'assistant', content: content }
|
|
190
|
+
)
|
|
191
|
+
end
|
|
192
|
+
end
|
|
193
|
+
|
|
194
|
+
# Resolve the Ollama host from settings, falling back to the default.
|
|
195
|
+
def ollama_host
|
|
196
|
+
return Helpers::Client::DEFAULT_HOST unless defined?(Legion::Settings)
|
|
197
|
+
|
|
198
|
+
Legion::Settings.dig(:ollama, :host) || Helpers::Client::DEFAULT_HOST
|
|
199
|
+
end
|
|
200
|
+
|
|
201
|
+
# Resolve the local node identity for worker_node in error messages.
|
|
202
|
+
def node_identity
|
|
203
|
+
return 'unknown' unless defined?(Legion::Settings)
|
|
204
|
+
|
|
205
|
+
Legion::Settings.dig(:node, :canonical_name) || 'unknown'
|
|
206
|
+
end
|
|
207
|
+
end
|
|
208
|
+
end
|
|
209
|
+
end
|
|
210
|
+
end
|
|
211
|
+
end
|
|
212
|
+
end
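Two small calculations in the runner above are easy to check in isolation: the `tokens` block and `latency_ms`. The sketch below extracts them into hypothetical helpers (`token_block`, `latency_in_ms`) with the same arithmetic; the timestamps are made up for the example.

```ruby
# Hypothetical helper with the same arithmetic as the tokens block:
# missing counts default to 0 and the total is the sum of both.
def token_block(usage)
  input  = usage[:input_tokens] || 0
  output = usage[:output_tokens] || 0
  { input: input, output: output, total: input + output }
end

# Time - Time yields seconds as a Float; scale to milliseconds and round.
def latency_in_ms(received_at, returned_at)
  ((returned_at - received_at) * 1000).round
end

received = Time.utc(2026, 4, 8, 12, 0, 0) # illustrative timestamps
returned = received + 1.5

puts token_block({ input_tokens: 12, output_tokens: 30 })[:total]
puts latency_in_ms(received, returned)
```

In the runner as shown, `provider_start` and `provider_end` reuse `received_at` and `returned_at`, so this latency covers the whole handling window rather than the Ollama call alone.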
|
data/lib/legion/extensions/ollama/runners/s3_models.rb
CHANGED
|
@@ -45,7 +45,7 @@ module Legion
|
|
|
45
45
|
manifest_key = "#{prefix}/#{OLLAMA_REGISTRY_PREFIX}/#{name}/#{tag}"
|
|
46
46
|
manifest_resp = s3.get_object(bucket: bucket, key: manifest_key)
|
|
47
47
|
manifest_body = manifest_resp[:body]
|
|
48
|
-
manifest_data = JSON.parse(manifest_body)
|
|
48
|
+
manifest_data = ::JSON.parse(manifest_body)
|
|
49
49
|
|
|
50
50
|
digests = []
|
|
51
51
|
digests << manifest_data['config'].slice('digest', 'size')
|
|
@@ -90,7 +90,7 @@ module Legion
|
|
|
90
90
|
|
|
91
91
|
manifest_key = "#{prefix}/#{OLLAMA_REGISTRY_PREFIX}/#{name}/#{tag}"
|
|
92
92
|
manifest_resp = s3.get_object(bucket: bucket, key: manifest_key)
|
|
93
|
-
manifest_data = JSON.parse(manifest_resp[:body])
|
|
93
|
+
manifest_data = ::JSON.parse(manifest_resp[:body])
|
|
94
94
|
|
|
95
95
|
digests = []
|
|
96
96
|
digests << manifest_data['config']['digest']
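The `::JSON.parse` change above comes down to Ruby's lexical constant lookup. The sketch below reproduces the effect with a stand-in namespace: `DemoNamespace::JSON` is a hypothetical substitute for `Legion::JSON`, assumed here (per the changelog) to return symbol keys.

```ruby
require 'json'

module DemoNamespace
  # Stand-in for a namespaced JSON module that returns symbol keys.
  module JSON
    def self.parse(text)
      ::JSON.parse(text, symbolize_names: true)
    end
  end

  module Reader
    def self.bare(text)
      JSON.parse(text)   # resolves lexically to DemoNamespace::JSON
    end

    def self.stdlib(text)
      ::JSON.parse(text) # leading :: forces the top-level stdlib JSON
    end
  end
end

manifest = '{"config":{"digest":"sha256:abc"}}'

# With symbol keys, a string-key lookup finds nothing.
p DemoNamespace::Reader.bare(manifest)['config']
# With the stdlib parser, string keys work as the manifest code expects.
puts DemoNamespace::Reader.stdlib(manifest)['config']['digest']
```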
|
data/lib/legion/extensions/ollama/transport/exchanges/llm_request.rb
ADDED
|
@@ -0,0 +1,17 @@
|
|
|
1
|
+
# frozen_string_literal: true
|
|
2
|
+
|
|
3
|
+
module Legion
|
|
4
|
+
module Extensions
|
|
5
|
+
module Ollama
|
|
6
|
+
module Transport
|
|
7
|
+
module Exchanges
|
|
8
|
+
# Thin alias that delegates exchange definition to Legion::LLM::Fleet::Exchange.
|
|
9
|
+
# This class exists solely so Ollama::Transport topology introspection has a
|
|
10
|
+
# local reference without importing legion-llm internals directly.
|
|
11
|
+
class LlmRequest < Legion::LLM::Fleet::Exchange
|
|
12
|
+
end
|
|
13
|
+
end
|
|
14
|
+
end
|
|
15
|
+
end
|
|
16
|
+
end
|
|
17
|
+
end
|
data/lib/legion/extensions/ollama/transport/messages/llm_response.rb
ADDED
|
@@ -0,0 +1,28 @@
|
|
|
1
|
+
# frozen_string_literal: true
|
|
2
|
+
|
|
3
|
+
module Legion
|
|
4
|
+
module Extensions
|
|
5
|
+
module Ollama
|
|
6
|
+
module Transport
|
|
7
|
+
module Messages
|
|
8
|
+
# Published back to the caller's reply_to queue after a fleet request is processed.
|
|
9
|
+
#
|
|
10
|
+
# Inherits Legion::LLM::Fleet::Response which:
|
|
11
|
+
# - sets type: 'llm.fleet.response'
|
|
12
|
+
# - sets routing_key to @options[:reply_to]
|
|
13
|
+
# - publishes via AMQP default exchange ('')
|
|
14
|
+
# - propagates message_context into body and headers
|
|
15
|
+
# - generates message_id with 'resp_' prefix
|
|
16
|
+
#
|
|
17
|
+
# This class only overrides app_id so audit records and the wire protocol
|
|
18
|
+
# correctly identify lex-ollama as the worker component.
|
|
19
|
+
class LlmResponse < Legion::LLM::Fleet::Response
|
|
20
|
+
def app_id
|
|
21
|
+
'lex-ollama'
|
|
22
|
+
end
|
|
23
|
+
end
|
|
24
|
+
end
|
|
25
|
+
end
|
|
26
|
+
end
|
|
27
|
+
end
|
|
28
|
+
end
|