lex-ollama 0.2.0 → 0.3.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 7f82aeecea946b03e08e2dc80a8ec66504276a2bb28aaaca5528d02105328166
- data.tar.gz: 6b7b392634ec069693a0b0b030b1619a0a5ae1d3cbb34c2440124c1c52d15e4a
+ metadata.gz: 28df561b00b58c7cb179b9904aed61a5aa7e278140306dadb3b4b2665eaab824
+ data.tar.gz: 446afaab9d80e6a4f62286a1f5ccc1c023bdbb178dba043cb96081412991b2d3
  SHA512:
- metadata.gz: 25b18ed44dbad71930004a3384dc897e37b245d3cdafa98122e0210a22ac5d5e6be343ab0133e519f8228f026d254e4993ee597f13368590c2fa81971329c6a8
- data.tar.gz: 39b9f4ed1e8a7ccd447b03770a9757c7e2cdbcea03341f07c2dda561db72e3556029152c76256e6f8b85b6d0cb858773e2d7ee9989d5559ffdc9daccfcf6b966
+ metadata.gz: 2915cfe6e4e959e61ee5b8ce68e7da784b4c6001cfe0c3acdb0a4e0f804da79a1e46a17b7c5297b9dd4f26e58bbae504066f5874d8cd82d6ea223b3dfc561bbb
+ data.tar.gz: cb1337292d4bb7c94603612e03dbdbcbd9a41c2a94e56f4bbfad1132d403f6d14b0316091017745ee2d282ccc426bdcdb65b137f66f07f2a505d231792e424b0
data/CHANGELOG.md CHANGED
@@ -1,5 +1,35 @@
  # Changelog

+ ## [0.3.1] - 2026-04-08
+
+ ### Added
+ - `Runners::Fleet` — module-function dispatcher for inbound AMQP LLM request messages; routes by `request_type` to `Client#embed`, `Client#generate`, or `Client#chat`
+ - `Transport::Exchanges::LlmRequest` — durable topic exchange `llm.request` for fleet routing
+ - `Transport::Queues::ModelRequest` — parametric durable quorum queue per `(type, model)` pair; sanitises colons in model names to dots
+ - `Transport::Messages::LlmResponse` — reply message published back to `reply_to` queue after inference
+ - `Actor::ModelWorker` — subscription actor; one instance per configured `(type, model)` subscription; enriches inbound messages with `request_type` and `model`, bypasses Legion::Runner task DB (`use_runner? false`)
+ - Fleet queue subscription system: when `Legion::Extensions::Core` is present, subscribes to model-scoped queues on `llm.request` topic exchange using routing key `llm.request.ollama.<type>.<model>`
+ - Standalone mode: all transport/actor requires guarded behind `const_defined?(:Core, false)` so the gem works as a pure HTTP client library without AMQP
+
+ ### Fixed
+ - `Runners::S3Models`: use `::JSON.parse` (stdlib) instead of bare `JSON.parse` which resolves to `Legion::JSON` (symbol keys) inside the `Legion::` namespace — fixes `import_from_s3` and `sync_from_s3` manifest parsing
+
+ ## [0.3.0] - 2026-04-01
+
+ ### Added
+ - S3 model distribution via new `Runners::S3Models` module
+ - `list_s3_models` to discover models available in an S3 mirror
+ - `import_from_s3` for direct filesystem model import (works without Ollama running)
+ - `sync_from_s3` for Ollama API-based model import (push_blob + manifest write)
+ - `import_default_models` convenience method for fleet provisioning
+ - Runtime dependency on `lex-s3` for S3 operations
+ - Streaming S3 downloads via `response_target` to avoid loading multi-GB blobs into memory
+ - Error propagation in `sync_from_s3` — returns failure with error details when blob push fails
+ - SHA256 digest verification for all downloaded blobs (import and sync paths)
+ - Atomic blob writes via temp file + rename (prevents partial/corrupt blobs on failure)
+ - Cache hits verified by SHA256 digest, not just file size — corrupted local blobs are re-downloaded
+ - `DigestMismatchError` raised when S3 blob content does not match manifest digest
+
  ## [0.2.0] - 2026-03-31

  ### Added
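The 0.3.1 `::JSON.parse` fix comes down to Ruby's lexical constant lookup: inside `module Legion`, a bare `JSON` finds `Legion::JSON` before it ever reaches the top-level stdlib `::JSON`. The sketch below reproduces that with a throwaway `Legion::JSON` stub (hypothetical; it only mimics the symbol-key behaviour the changelog describes) and shows why string-key manifest lookups then come back empty. The manifest shape (`config` plus `layers`, each carrying a `digest`) is an assumption modelled on Ollama's OCI-style manifests, not something this diff shows.

```ruby
require 'json'

module Legion
  # Hypothetical stand-in for Legion::JSON: returns symbol keys.
  module JSON
    def self.parse(text)
      ::JSON.parse(text, symbolize_names: true)
    end
  end

  module ManifestDemo
    # Bare JSON resolves lexically to Legion::JSON inside module Legion.
    def self.parse_bare(text)
      JSON.parse(text)
    end

    # The leading :: forces the stdlib parser, which returns string keys.
    def self.parse_stdlib(text)
      ::JSON.parse(text)
    end

    # Collect every blob digest referenced by a manifest (config + layers).
    # Written against string keys, so a symbol-keyed hash yields nothing.
    def self.blob_digests(manifest)
      config = manifest['config']
      layers = manifest['layers'] || []
      ([config] + layers).compact.map { |entry| entry['digest'] }.compact
    end
  end
end

manifest_text = <<~JSON_TEXT
  {"schemaVersion": 2,
   "config": {"digest": "sha256:aaa"},
   "layers": [{"digest": "sha256:bbb"}, {"digest": "sha256:ccc"}]}
JSON_TEXT

bare = Legion::ManifestDemo.parse_bare(manifest_text)
std  = Legion::ManifestDemo.parse_stdlib(manifest_text)
p bare.keys.first                         # => :schemaVersion
p Legion::ManifestDemo.blob_digests(bare) # => []
p Legion::ManifestDemo.blob_digests(std)  # => ["sha256:aaa", "sha256:bbb", "sha256:ccc"]
```

No exception is raised in the broken case; the digest list is simply empty, which is why this kind of bug tends to surface only as "nothing was downloaded".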
data/CLAUDE.md CHANGED
@@ -1,44 +1,178 @@
  # lex-ollama: Ollama Integration for LegionIO

- **Parent**: `/Users/miverso2/rubymine/legion/extensions-ai/CLAUDE.md`
+ **Repository Level 3 Documentation**
+ - **Parent**: `../CLAUDE.md`
+ - **Grandparent**: `../../CLAUDE.md`

  ## Purpose

- Legion Extension that connects LegionIO to Ollama, a local LLM server. Provides text generation, chat completions, embeddings, model management, and blob operations.
+ Legion Extension that connects LegionIO to Ollama, a local LLM server. Provides text generation,
+ chat completions, embeddings, model management, blob operations, S3 model distribution, version
+ reporting, and **fleet queue subscription** for receiving routed LLM requests from the Legion bus.

  **GitHub**: https://github.com/LegionIO/lex-ollama
  **License**: MIT
+ **Version**: 0.3.1
+ **Specs**: 82 examples (12 spec files) — fleet additions add ~35 more
+
+ ---

  ## Architecture

  ```
  Legion::Extensions::Ollama
  ├── Runners/
- │   ├── Completions      # POST /api/generate
- │   ├── Chat             # POST /api/chat
- │   ├── Models           # CRUD + pull/push/running
- │   ├── Embeddings       # POST /api/embed
- │   ├── Blobs            # HEAD/POST /api/blobs/:digest
- │   └── Version          # GET /api/version
+ │   ├── Completions      # generate, generate_stream
+ │   ├── Chat             # chat, chat_stream
+ │   ├── Models           # create_model, list_models, show_model, copy_model, delete_model,
+ │   │                    # pull_model, push_model, list_running
+ │   ├── Embeddings       # embed
+ │   ├── Blobs            # check_blob, push_blob
+ │   ├── S3Models         # list_s3_models, import_from_s3, sync_from_s3, import_default_models
+ │   ├── Version          # server_version
+ │   └── Fleet            # handle_request (fleet dispatcher — chat/embed/generate)
  ├── Helpers/
- │   └── Client           # Faraday connection to Ollama server
- └── Client               # Standalone client class
+ │   ├── Client           # Faraday connection to Ollama server (module, factory method)
+ │   ├── Errors           # error handling + with_retry
+ │   └── Usage            # usage normalization (maps Ollama token/duration fields to standard shape)
+ ├── Client               # Standalone client class (includes all runners, holds @config)
+ ├── Transport/           # (loaded only when Legion::Extensions::Core is present)
+ │   ├── Exchanges/
+ │   │   └── LlmRequest   # topic exchange 'llm.request'
+ │   ├── Queues/
+ │   │   └── ModelRequest # parametric queue — one per (type, model) pair
+ │   └── Messages/
+ │       └── LlmResponse  # reply message published back to reply_to
+ └── Actor/
+     └── ModelWorker      # subscription actor — one per registered model/type
  ```

+ ---
+
+ ## Fleet Queue Subscription
+
+ ### Overview
+
+ When `Legion::Extensions::Core` is available, lex-ollama subscribes to model-scoped queues on the
+ `llm.request` topic exchange, accepting routed inference work from other Legion fleet members
+ (lex-llm-gateway, direct publishers, etc.).
+
+ ### Routing Key Schema
+
+ ```
+ llm.request.ollama.<type>.<model>
+ ```
+
+ | Segment  | Values                      | Notes                              |
+ |----------|-----------------------------|------------------------------------|
+ | `ollama` | literal                     | provider identifier                |
+ | `type`   | `chat`, `embed`, `generate` | maps to a specific runner method   |
+ | `model`  | sanitised model name        | `:` replaced with `.` (AMQP rules) |
+
+ **Examples:**
+ ```
+ llm.request.ollama.embed.nomic-embed-text
+ llm.request.ollama.embed.mxbai-embed-large
+ llm.request.ollama.chat.qwen3.5.27b    # was qwen3.5:27b
+ llm.request.ollama.chat.llama3.2
+ llm.request.ollama.generate.llama3.2
+ ```
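A minimal sketch of building a key under this schema. `routing_key` is a hypothetical helper, not part of the gem's documented API; it only applies the colon-to-dot rule from the table.

```ruby
# Hypothetical helper: builds llm.request.ollama.<type>.<model>, replacing
# every ':' in the model name with '.' as the schema table describes.
def routing_key(type, model)
  "llm.request.ollama.#{type}.#{model.to_s.tr(':', '.')}"
end

puts routing_key('chat', 'qwen3.5:27b')      # llm.request.ollama.chat.qwen3.5.27b
puts routing_key(:embed, 'nomic-embed-text') # llm.request.ollama.embed.nomic-embed-text
```

The replacement is lossy: `qwen3.5.27b` could have come from `qwen3.5:27b` or from `qwen3:5.27b`, so the original model name cannot be recovered from the key alone. The changelog notes that `Actor::ModelWorker` enriches inbound messages with `model` from its own subscription instead.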
+
+ ### Queue Strategy
+
+ Each model+type combination gets its own **durable quorum queue** with a routing key that matches
+ its queue name exactly. Multiple nodes carrying the same model compete fairly (no SAC) — any
+ subscriber can serve. The queue name is identical to the routing key for clarity in the management UI.
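Why an exact binding gives per-model isolation can be shown with a simplified model of AMQP topic matching, where `*` matches exactly one dot-separated word and `#` matches zero or more. This is a sketch of the matching rule only, not RabbitMQ's implementation, and `topic_match?` is a hypothetical name. It also surfaces a side effect of colon sanitisation: `qwen3.5.27b` occupies three words, so a single `*` in the model position would not match it, while `#` would.

```ruby
# Simplified topic matching: words are split on '.', '*' matches exactly
# one word, '#' matches zero or more words.
def topic_match?(binding_key, routing_key)
  match_words?(binding_key.split('.'), routing_key.split('.'))
end

def match_words?(pattern, words)
  return words.empty? if pattern.empty?

  head, *rest = pattern
  case head
  when '#'
    # '#' may consume any number of words; try every split point.
    (0..words.size).any? { |i| match_words?(rest, words.drop(i)) }
  when '*'
    !words.empty? && match_words?(rest, words.drop(1))
  else
    !words.empty? && words.first == head && match_words?(rest, words.drop(1))
  end
end

key = 'llm.request.ollama.chat.qwen3.5.27b'
p topic_match?(key, key)                                # => true
p topic_match?('llm.request.ollama.chat.*', key)        # => false
p topic_match?('llm.request.ollama.chat.#', key)        # => true
p topic_match?(key, 'llm.request.ollama.chat.llama3.2') # => false
```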
+
+ ### Configuration
+
+ ```yaml
+ legion:
+   ollama:
+     host: "http://localhost:11434"
+     subscriptions:
+       - type: embed
+         model: nomic-embed-text
+       - type: embed
+         model: mxbai-embed-large
+       - type: chat
+         model: "qwen3.5:27b"
+       - type: chat
+         model: llama3.2
+ ```
+
+ The extension spawns one `Actor::ModelWorker` per subscription entry at boot.
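A sketch of turning the `subscriptions` list into one worker spec per entry, assuming settings arrive as the YAML above. `worker_specs` and the hash it returns are hypothetical; the real extension builds `Actor::ModelWorker` instances rather than plain hashes.

```ruby
require 'yaml'

SETTINGS_YAML = <<~YML
  legion:
    ollama:
      host: "http://localhost:11434"
      subscriptions:
        - type: embed
          model: nomic-embed-text
        - type: chat
          model: "qwen3.5:27b"
YML

# Hypothetical: one spec per subscription entry. The queue name equals the
# routing key, with ':' in the model name sanitised to '.'.
def worker_specs(settings)
  subs = settings.dig('legion', 'ollama', 'subscriptions') || []
  subs.map do |sub|
    type  = sub.fetch('type')
    model = sub.fetch('model')
    { type: type,
      model: model,
      queue: "llm.request.ollama.#{type}.#{model.tr(':', '.')}" }
  end
end

specs = worker_specs(YAML.safe_load(SETTINGS_YAML))
specs.each { |s| puts "#{s[:type]} #{s[:model]} -> #{s[:queue]}" }
# embed nomic-embed-text -> llm.request.ollama.embed.nomic-embed-text
# chat qwen3.5:27b -> llm.request.ollama.chat.qwen3.5.27b
```

Keeping the unsanitised `model` alongside the queue name matters because that is the name the Ollama API expects.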
+
+ ### Data Flow
+
+ ```
+ Publisher (lex-llm-gateway / any fleet node)
+ │ routing_key: "llm.request.ollama.embed.nomic-embed-text"
+
+ Exchange: llm.request [topic, durable]
+
+ └── Queue: llm.request.ollama.embed.nomic-embed-text [quorum]
+
+ Actor::ModelWorker (type=embed, model=nomic-embed-text)
+
+ Runners::Fleet#handle_request
+
+ Ollama::Client#embed(model: 'nomic-embed-text', ...)
+
+ Transport::Messages::LlmResponse → reply_to queue (if present)
+ ```
+
+ ### Standalone Mode (no Legion runtime)
+
+ All transport/actor requires are guarded behind:
+ ```ruby
+ if Legion::Extensions.const_defined?(:Core, false)
+   # transport + actor requires
+ end
+ ```
+ The gem still works as a pure HTTP client library without AMQP, exactly as before.
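The `false` argument to `Module#const_defined?` is what makes this guard reliable. With the default `inherit = true`, a module also searches `Object`, so any unrelated top-level `Core` constant would make the guard pass. Passing `false` checks only constants defined directly on `Legion::Extensions`. The modules below are throwaway stand-ins for illustration.

```ruby
module Legion
  module Extensions; end
end

# An unrelated top-level constant that happens to be named Core.
module Core; end

p Legion::Extensions.const_defined?(:Core)        # => true  (found via Object)
p Legion::Extensions.const_defined?(:Core, false) # => false (no Legion::Extensions::Core)

# What the Legion runtime would provide.
module Legion
  module Extensions
    module Core; end
  end
end

p Legion::Extensions.const_defined?(:Core, false) # => true
```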
+
+ ---
+
+ ## Key Design Decisions
+
+ - `generate_stream` and `chat_stream` yield `{ type: :delta, text: }` and `{ type: :done }` events.
+ - `S3Models` runner depends on `lex-s3`. Uses SHA256 digest verification. `import_from_s3` writes
+   directly to the filesystem; `sync_from_s3` pushes blobs through the Ollama API.
+ - `S3Models::OLLAMA_REGISTRY_PREFIX = 'manifests/registry.ollama.ai/library'`.
+ - `Usage` helper normalizes Ollama's token/duration fields to `{ input_tokens:, output_tokens:, ... }`.
+ - All runners return `{ result: body, status: code }`.
+ - **`Runners::Fleet` dispatch rules:**
+   - `request_type: 'embed'` → `Client#embed`, uses `:input` then falls back to `:text`.
+   - `request_type: 'generate'` → `Client#generate`.
+   - anything else (including `'chat'` or unknown) → `Client#chat`.
+ - **`Actor::ModelWorker#use_runner?` is `false`** — bypasses `Legion::Runner` / task DB entirely.
+ - **Reply publishing** never raises — errors are swallowed so the AMQP ack is not blocked.
+ - **Colon sanitisation** — `qwen3.5:27b` becomes `qwen3.5.27b` in queue/routing-key strings.
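The dispatch and reply rules above can be sketched as plain functions over a hash message. `FakeClient`, `dispatch`, `publish_reply`, and the message keys `prompt` and `messages` are assumptions for illustration; the real `Runners::Fleet#handle_request` signature is not shown in this diff. The reply step rescues `StandardError` so a failed publish never propagates to the caller.

```ruby
# Stand-in for Ollama::Client that echoes which method was called.
class FakeClient
  def embed(model:, input:)
    { method: :embed, model: model, input: input }
  end

  def generate(model:, prompt:)
    { method: :generate, model: model, prompt: prompt }
  end

  def chat(model:, messages:)
    { method: :chat, model: model, messages: messages }
  end
end

# Hypothetical dispatcher following the documented rules.
def dispatch(client, msg)
  model = msg[:model]
  case msg[:request_type].to_s
  when 'embed'
    # Prefer :input, fall back to :text.
    client.embed(model: model, input: msg[:input] || msg[:text])
  when 'generate'
    client.generate(model: model, prompt: msg[:prompt])
  else
    # 'chat' and any unknown request_type both go to chat.
    client.chat(model: model, messages: msg[:messages])
  end
end

# Publish a reply only when reply_to is set; never let a failure escape.
def publish_reply(msg, result, publisher)
  return nil unless msg[:reply_to]

  publisher.call(msg[:reply_to], result)
rescue StandardError => e
  warn "reply to #{msg[:reply_to]} failed: #{e.message}"
  nil
end

client  = FakeClient.new
failing = ->(_queue, _body) { raise IOError, 'channel closed' }

p dispatch(client, { request_type: 'embed', model: 'nomic-embed-text', text: 'hi' })[:input] # => "hi"
p dispatch(client, { request_type: 'rerank', model: 'llama3.2', messages: [] })[:method]    # => :chat
p publish_reply({ reply_to: 'llm.reply' }, { ok: true }, failing)                           # => nil
```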
+
+ ---
+
  ## Dependencies

  | Gem | Purpose |
  |-----|---------|
- | faraday | HTTP client for Ollama REST API |
+ | `faraday` >= 2.0 | HTTP client for Ollama REST API |
+ | `lex-s3` >= 0.2 | S3 model distribution operations |
+
+ Fleet transport requires Legion runtime gems (`legion-transport`, `LegionIO`) but those are *not*
+ gemspec dependencies — they are expected to be present in the runtime environment.
+
+ ---

  ## Testing

  ```bash
  bundle install
- bundle exec rspec
+ bundle exec rspec # all examples
  bundle exec rubocop
  ```

  ---

  **Maintained By**: Matthew Iverson (@Esity)
+ **Last Updated**: 2026-04-07
data/Gemfile CHANGED
@@ -8,5 +8,6 @@ group :test do
    gem 'rspec'
    gem 'rspec_junit_formatter'
    gem 'rubocop'
+   gem 'rubocop-legion'
    gem 'simplecov'
  end
data/README.md CHANGED
@@ -35,6 +35,12 @@ gem install lex-ollama
  - `check_blob` - Check if a blob exists on the server (HEAD /api/blobs/:digest)
  - `push_blob` - Upload a binary blob to the server (POST /api/blobs/:digest)

+ ### S3 Model Distribution
+ - `list_s3_models` - List models available in an S3 mirror
+ - `import_from_s3` - Download model from S3 directly to Ollama's filesystem (works before Ollama starts)
+ - `sync_from_s3` - Download model from S3, push blobs through Ollama's API, write manifest to filesystem
+ - `import_default_models` - Import a list of models from S3 (fleet provisioning)
+
  ### Version
  - `server_version` - Retrieve the Ollama server version (GET /api/version)

@@ -71,6 +77,34 @@ client.chat_stream(model: 'llama3.2', messages: [{ role: 'user', content: 'Hello
  end
  ```

+ ## S3 Model Distribution
+
+ Pull models from an internal S3 mirror instead of the public Ollama registry:
+
+ ```ruby
+ client = Legion::Extensions::Ollama::Client.new
+
+ # List available models in S3
+ client.list_s3_models(bucket: 'legion', endpoint: 'https://mesh.s3api-core.optum.com')
+
+ # Import directly to filesystem (works without Ollama running)
+ client.import_from_s3(model: 'llama3:latest', bucket: 'legion',
+                       endpoint: 'https://mesh.s3api-core.optum.com')
+
+ # Push through Ollama API (requires Ollama running)
+ client.sync_from_s3(model: 'llama3:latest', bucket: 'legion',
+                     endpoint: 'https://mesh.s3api-core.optum.com')
+
+ # Provision fleet with default models
+ client.import_default_models(
+   default_models: %w[llama3:latest nomic-embed-text:latest],
+   bucket: 'legion',
+   endpoint: 'https://mesh.s3api-core.optum.com'
+ )
+ ```
+
+ S3 operations use [lex-s3](https://github.com/LegionIO/lex-s3). The S3 bucket should mirror the Ollama models directory structure (`manifests/` and `blobs/` under the configured prefix).
+
  All API calls include automatic retry with exponential backoff on connection failures and timeouts.
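The README does not state the retry parameters. As an illustration of retry with exponential backoff, here is a standalone `with_backoff` helper; its name, attempt count, base delay, and retried exception list are assumptions, not the values used by the gem's `with_retry`. In a Faraday client the natural classes to retry would be `Faraday::ConnectionFailed` and `Faraday::TimeoutError`; the sketch uses stdlib errors so it runs without Faraday.

```ruby
require 'timeout'

# Assumed retryable errors (stdlib stand-ins for Faraday's connection/timeout errors).
RETRYABLE = [Errno::ECONNREFUSED, Timeout::Error].freeze

# Retry the block on RETRYABLE errors, doubling the delay each time:
# base, 2*base, 4*base, ... Re-raises once max_attempts is reached.
def with_backoff(max_attempts: 3, base_delay: 0.5, sleeper: method(:sleep))
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue *RETRYABLE
    raise if attempt >= max_attempts

    sleeper.call(base_delay * (2**(attempt - 1)))
    retry
  end
end

delays = []
result = with_backoff(max_attempts: 4, base_delay: 0.5, sleeper: ->(d) { delays << d }) do |n|
  raise Errno::ECONNREFUSED if n < 3 # fail twice, then succeed

  :ok
end
p result # => :ok
p delays # => [0.5, 1.0]
```

Injecting `sleeper` keeps the delay schedule observable in tests without actually sleeping.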
 
  Generate and chat responses include standardized `usage:` data:
@@ -85,6 +119,10 @@ result[:usage] # => { input_tokens: 1, output_tokens: 5, total_duration: ..., .
  - [LegionIO](https://github.com/LegionIO/LegionIO) framework
  - [Ollama](https://ollama.com) running locally or on a remote host

+ ## Version
+
+ 0.3.1
+
  ## License

  MIT
@@ -0,0 +1,131 @@
+ # S3 Model Distribution for lex-ollama
+
+ ## Problem
+
+ Thousands of engineers pulling models from the public Ollama registry is wasteful and unreliable. Models should be cached in internal S3 and distributed from there. Fleet-wide model updates should be broadcast via RabbitMQ.
+
+ ## Design
+
+ ### New Runner: `Runners::S3Models`
+
+ A new runner module alongside the existing `Models` runner. Three primary methods plus one convenience method.
+
+ #### `import_from_s3` (filesystem write)
+
+ Downloads manifest + blobs from S3, writes directly to `~/.ollama/models/`.
+
+ ```ruby
+ import_from_s3(
+   model:,                  # e.g. "llama3:latest"
+   bucket:,                 # S3 bucket name
+   prefix: "ollama/models", # S3 key prefix
+   models_path: nil,        # local Ollama models dir, defaults to ~/.ollama/models
+   **s3_opts                # passed through to lex-s3 (endpoint:, region:, access_key_id:, etc.)
+ )
+ ```
+
+ Flow:
+ 1. Parse `model` into `name` + `tag` (default tag: `latest`)
+ 2. Download manifest from S3: `{prefix}/manifests/registry.ollama.ai/library/{name}/{tag}`
+ 3. Parse manifest JSON to get the list of blob digests
+ 4. For each blob, check if it already exists locally with matching SHA256 digest (skip if valid)
+ 5. Stream blob from S3 to `.tmp` file, verify SHA256, atomic rename to final path
+ 6. Raise `DigestMismatchError` if any blob fails verification (temp file cleaned up)
+ 7. Write the manifest file
+ 8. Return `{ result: true, model:, blobs_downloaded:, blobs_skipped:, status: 200 }`
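Steps 4 to 6 can be sketched with the standard library alone: hash the stream while writing it to a `.tmp` sibling, compare against the manifest digest, and only then `File.rename` into place (atomic on POSIX when both paths are on the same filesystem). `write_blob_atomically`, `blob_valid?`, and the local `DigestMismatchError` class are hypothetical stand-ins, the `sha256:<hex>` digest format is assumed from Ollama manifests, and the S3 stream is simulated with a `StringIO`.

```ruby
require 'digest'
require 'fileutils'
require 'stringio'
require 'tmpdir'

# Local stand-in for the gem's DigestMismatchError.
class DigestMismatchError < StandardError; end

# Step 4: a cached blob counts only if it exists AND hashes to the expected
# digest, so a same-sized but corrupted file is still re-downloaded.
def blob_valid?(path, digest)
  File.file?(path) && "sha256:#{Digest::SHA256.file(path).hexdigest}" == digest
end

# Steps 5-6: stream into "<path>.tmp" while hashing, verify, then rename.
def write_blob_atomically(io, path, digest, chunk_size: 64 * 1024)
  tmp = "#{path}.tmp"
  sha = Digest::SHA256.new
  File.open(tmp, 'wb') do |f|
    while (chunk = io.read(chunk_size))
      sha.update(chunk)
      f.write(chunk)
    end
  end

  actual = "sha256:#{sha.hexdigest}"
  raise DigestMismatchError, "expected #{digest}, got #{actual}" unless actual == digest

  File.rename(tmp, path)
  path
ensure
  # After a successful rename the temp file is gone; this only cleans up failures.
  FileUtils.rm_f(tmp) if tmp
end

Dir.mktmpdir do |dir|
  data   = 'model weights'
  digest = "sha256:#{Digest::SHA256.hexdigest(data)}"
  path   = File.join(dir, 'blob')

  write_blob_atomically(StringIO.new(data), path, digest)
  p blob_valid?(path, digest) # => true

  begin
    write_blob_atomically(StringIO.new('tampered'), "#{path}2", digest)
  rescue DigestMismatchError
    p File.exist?("#{path}2.tmp") # => false
  end
end
```

Because the final path only ever receives a fully verified file, a crash mid-download leaves at worst a stray `.tmp`, never a truncated blob under the real name.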
36
+
37
+ Best for: provisioning, bootstrapping, when Ollama is not yet running.
38
+
39
+ #### `sync_from_s3` (Ollama API + filesystem manifest)
40
+
41
+ Downloads from S3, pushes blobs through Ollama's API, writes manifest to filesystem.
42
+
43
+ ```ruby
44
+ sync_from_s3(
45
+ model:,
46
+ bucket:,
47
+ prefix: "ollama/models",
48
+ host: nil, # Ollama server host
49
+ models_path: nil, # local models dir for manifest write
50
+ **s3_opts # passed to lex-s3
51
+ )
52
+ ```
53
+
54
+ Flow:
55
+ 1. Parse model, download manifest from S3
56
+ 2. For each blob digest, `check_blob` via Ollama API -- skip if already present
57
+ 3. Stream blob from S3 to tempfile, verify SHA256 digest
58
+ 4. `push_blob` to Ollama API, check return value for success
59
+ 5. If any blob fails: return `{ result: false, errors: [...], status: 500 }`
60
+ 6. Write manifest to `{models_path}/manifests/registry.ollama.ai/library/{name}/{tag}`
61
+ 7. Return `{ result: true, model:, blobs_pushed:, blobs_skipped:, status: 200 }`
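The skip, push, and collect-errors loop in steps 2 to 5 can be sketched with the two Ollama calls injected as lambdas, so the control flow is visible without a server. `sync_blobs` and the lambda signatures are hypothetical; only the `{ result:, status: }` return shapes come from the flow above (the `model:` field is omitted for brevity). `push` is assumed to return truthy on success, mirroring step 4's return-value check.

```ruby
# Hypothetical: check/push stand in for check_blob/push_blob and return true/false.
def sync_blobs(digests, check:, push:)
  pushed  = 0
  skipped = 0
  errors  = []

  digests.each do |digest|
    if check.call(digest)
      skipped += 1
      next
    end

    begin
      if push.call(digest)
        pushed += 1
      else
        errors << { digest: digest, error: 'push_blob returned failure' }
      end
    rescue StandardError => e
      errors << { digest: digest, error: e.message }
    end
  end

  # Step 5: any failure short-circuits before the manifest would be written.
  return { result: false, errors: errors, status: 500 } unless errors.empty?

  { result: true, blobs_pushed: pushed, blobs_skipped: skipped, status: 200 }
end

present = ['sha256:a']
ok = sync_blobs(%w[sha256:a sha256:b],
                check: ->(d) { present.include?(d) },
                push: ->(_d) { true })
p ok[:status]        # => 200
p ok[:blobs_skipped] # => 1
```

Collecting every failure rather than stopping at the first gives the caller a complete error report in one pass.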
+
+ Best for: when Ollama is running and you want blob validation through the API.
+
+ #### `list_s3_models`
+
+ Lists available models in the S3 mirror.
+
+ ```ruby
+ list_s3_models(
+   bucket:,
+   prefix: "ollama/models",
+   **s3_opts
+ )
+ ```
+
+ Lists manifest keys under the prefix and parses them into model name/tag pairs.
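Listing reduces to string work on S3 keys. Assuming keys follow the `{prefix}/manifests/registry.ollama.ai/library/{name}/{tag}` layout given in the `import_from_s3` flow, a parser might look like the sketch below; `parse_model` and `model_from_key` are hypothetical names, and `REGISTRY_PREFIX` reuses the value that CLAUDE.md gives for `S3Models::OLLAMA_REGISTRY_PREFIX`. It also shows the `name` + `tag` split with the default `latest` tag.

```ruby
REGISTRY_PREFIX = 'manifests/registry.ollama.ai/library'

# "llama3" -> ["llama3", "latest"]; "qwen3.5:27b" -> ["qwen3.5", "27b"]
def parse_model(model)
  name, tag = model.split(':', 2)
  [name, (tag.nil? || tag.empty?) ? 'latest' : tag]
end

# Turn an S3 manifest key back into "name:tag", or nil for non-manifest keys.
def model_from_key(key, prefix: 'ollama/models')
  base = "#{prefix}/#{REGISTRY_PREFIX}/"
  return nil unless key.start_with?(base)

  name, tag, *extra = key.delete_prefix(base).split('/')
  return nil if name.nil? || tag.nil? || !extra.empty?

  "#{name}:#{tag}"
end

keys = %w[
  ollama/models/manifests/registry.ollama.ai/library/llama3/latest
  ollama/models/manifests/registry.ollama.ai/library/nomic-embed-text/latest
  ollama/models/blobs/some-blob
]
p keys.filter_map { |k| model_from_key(k) } # => ["llama3:latest", "nomic-embed-text:latest"]
p parse_model('llama3')                      # => ["llama3", "latest"]
```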
+
+ #### `import_default_models`
+
+ Convenience method that reads `default_models` from settings and calls `import_from_s3` for each.
+
+ ### Settings
+
+ ```yaml
+ legion:
+   ollama:
+     s3:
+       bucket: "legion"
+       prefix: "ollama/models"
+       endpoint: "https://mesh.s3api-core.optum.com"
+       region: "us-east-2"
+     default_models:
+       - "llama3:latest"
+       - "nomic-embed-text:latest"
+     models_path: null # defaults to ~/.ollama/models, respects OLLAMA_MODELS env var
+ ```
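The `models_path` comment implies a three-way fallback. A minimal sketch, assuming an explicit setting beats `OLLAMA_MODELS`, which beats the home-directory default (the precedence order is an assumption, and `resolve_models_path` is a hypothetical name):

```ruby
# Resolve the local Ollama models directory:
# explicit setting > OLLAMA_MODELS env var > ~/.ollama/models
def resolve_models_path(configured = nil, env: ENV)
  return File.expand_path(configured) if configured && !configured.empty?

  from_env = env['OLLAMA_MODELS']
  return File.expand_path(from_env) if from_env && !from_env.empty?

  File.join(Dir.home, '.ollama', 'models')
end

p resolve_models_path('/srv/ollama')                                    # => "/srv/ollama"
p resolve_models_path(nil, env: { 'OLLAMA_MODELS' => '/data/models' }) # => "/data/models"
p resolve_models_path(nil, env: {}) # ~/.ollama/models, expanded for the current user
```

Passing `env:` explicitly keeps the function testable without mutating the real process environment.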
+
+ ### Dependency
+
+ `lex-ollama.gemspec` adds a runtime dependency on `lex-s3` (`>= 0.1`). The `S3Models` runner uses `Legion::Extensions::S3::Client` for all S3 operations.
+
+ ### Data Flow
+
+ ```
+ S3 (mesh.s3api-core.optum.com)
+   |
+   | HTTPS (direct, no AMQP)
+   v
+ Node: S3Models runner
+   |
+   |-- import_from_s3 --> filesystem write to ~/.ollama/models/
+   |-- sync_from_s3   --> Ollama HTTP API (push_blob + create_model)
+ ```
+
+ Fleet broadcast: publish a message to the `ollama.s3_models` queue (natural LEX runner behavior). Each node picks it up and runs the download independently from S3.
+
+ ### File Layout
+
+ ```
+ lib/legion/extensions/ollama/
+   runners/
+     models.rb       # existing, unchanged
+     s3_models.rb    # NEW
+   client.rb         # updated to include Runners::S3Models
+
+ spec/legion/extensions/ollama/runners/
+   s3_models_spec.rb # NEW
+ ```
+
+ No changes to existing runner methods or the Helpers::Client module.