lex-ollama 0.1.0 → 0.3.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +31 -0
- data/Gemfile +1 -0
- data/README.md +57 -0
- data/docs/plans/2026-04-01-s3-model-distribution-design.md +131 -0
- data/docs/plans/2026-04-01-s3-model-distribution-plan.md +655 -0
- data/lex-ollama.gemspec +1 -0
- data/lib/legion/extensions/ollama/client.rb +6 -0
- data/lib/legion/extensions/ollama/helpers/client.rb +9 -0
- data/lib/legion/extensions/ollama/helpers/errors.rb +40 -0
- data/lib/legion/extensions/ollama/helpers/usage.rb +35 -0
- data/lib/legion/extensions/ollama/runners/blobs.rb +7 -4
- data/lib/legion/extensions/ollama/runners/chat.rb +37 -2
- data/lib/legion/extensions/ollama/runners/completions.rb +37 -2
- data/lib/legion/extensions/ollama/runners/embeddings.rb +2 -1
- data/lib/legion/extensions/ollama/runners/models.rb +12 -9
- data/lib/legion/extensions/ollama/runners/s3_models.rb +194 -0
- data/lib/legion/extensions/ollama/runners/version.rb +2 -1
- data/lib/legion/extensions/ollama/version.rb +1 -1
- data/lib/legion/extensions/ollama.rb +3 -0
- metadata +20 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 7477574f919b18b85c79afba3a1f65c8540d9eff9ca02b9e0c807b3740fed452
+  data.tar.gz: a5c69878c8518caf02c2e238c94243fd49c320f20bfaede00252bfdc87be5cbb
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 31566bf77244dd3cfc097531a3af1da186e8d0e7e0ec675be0b7471f8b7654649fa666d4c3d2f6bb34c46d73d29aa72a64dfa07f7beb35ae01d23c8f2bc6c797
+  data.tar.gz: f900e723d2db75dbdb266fcf33d01be56d7614b992be9e0b6d29345a85012be0d226ff9a2f42cb2d5a9f932cb1e641e6ecbfc033ccd3c6c98bbe4d2a7207ad13
data/CHANGELOG.md
CHANGED
@@ -1,5 +1,36 @@
 # Changelog
 
+## [0.3.0] - 2026-04-01
+
+### Added
+- S3 model distribution via new `Runners::S3Models` module
+- `list_s3_models` to discover models available in an S3 mirror
+- `import_from_s3` for direct filesystem model import (works without Ollama running)
+- `sync_from_s3` for Ollama API-based model import (push_blob + manifest write)
+- `import_default_models` convenience method for fleet provisioning
+- Runtime dependency on `lex-s3` for S3 operations
+- Streaming S3 downloads via `response_target` to avoid loading multi-GB blobs into memory
+- Error propagation in `sync_from_s3` — returns failure with error details when blob push fails
+- SHA256 digest verification for all downloaded blobs (import and sync paths)
+- Atomic blob writes via temp file + rename (prevents partial/corrupt blobs on failure)
+- Cache hits verified by SHA256 digest, not just file size — corrupted local blobs are re-downloaded
+- `DigestMismatchError` raised when S3 blob content does not match manifest digest
+
+## [0.2.0] - 2026-03-31
+
+### Added
+- `Helpers::Errors` — Faraday exception classification (TimeoutError, ConnectionFailed) with exponential backoff retry (`with_retry`, 3 retries, 0.5s base delay)
+- `Helpers::Usage` — standardized usage hash normalization from Ollama response fields (`prompt_eval_count` -> `input_tokens`, `eval_count` -> `output_tokens`, plus duration fields)
+- `Helpers::Client#streaming_client` — Faraday connection without JSON response middleware for streaming endpoints
+- `Runners::Completions#generate_stream` — streaming generate with per-chunk block callback and full text accumulation
+- `Runners::Chat#chat_stream` — streaming chat with per-chunk block callback and full text accumulation
+
+### Changed
+- All runner methods wrapped in `Helpers::Errors.with_retry` for production reliability
+- `Runners::Completions#generate` now returns a `usage:` key with standardized token/duration counts
+- `Runners::Chat#chat` now returns a `usage:` key with standardized token/duration counts
+- `Client` class now overrides `streaming_client` for host passthrough
+
 ## [0.1.0] - 2026-03-31
 
 ### Added
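The 0.2.0 changelog entry describes `with_retry` as exponential backoff with 3 retries and a 0.5s base delay. A minimal sketch of that pattern follows; it is not the gem's code. The module name `RetrySketch`, the `sleeper:` parameter, the stand-in error classes (used here in place of `Faraday::TimeoutError` and `Faraday::ConnectionFailed` so the sketch runs without Faraday), and the doubling factor are all assumptions.

```ruby
# Hypothetical sketch of exponential-backoff retry; not the gem's Helpers::Errors.
module RetrySketch
  # Stand-ins for Faraday::TimeoutError and Faraday::ConnectionFailed.
  class TimeoutError < StandardError; end
  class ConnectionFailed < StandardError; end

  RETRYABLE = [TimeoutError, ConnectionFailed].freeze

  # Runs the block and retries retryable errors up to max_retries times.
  # The delay is assumed to double on each attempt: base_delay, 2x, 4x, ...
  # `sleeper` is injectable so callers (and tests) can avoid real sleeping.
  def self.with_retry(max_retries: 3, base_delay: 0.5, sleeper: ->(seconds) { sleep(seconds) })
    attempt = 0
    begin
      yield
    rescue *RETRYABLE
      raise if attempt >= max_retries # retries exhausted: re-raise the last error

      sleeper.call(base_delay * (2**attempt))
      attempt += 1
      retry
    end
  end
end
```

With the defaults, a call that keeps failing is attempted four times in total (one initial attempt plus three retries) before the last error is re-raised. Errors outside `RETRYABLE` propagate immediately.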
data/Gemfile
CHANGED
data/README.md
CHANGED
@@ -12,9 +12,11 @@ gem install lex-ollama
 
 ### Completions
 - `generate` - Generate a text completion (POST /api/generate)
+- `generate_stream` - Stream a text completion with per-chunk callbacks
 
 ### Chat
 - `chat` - Generate a chat completion with message history and tool support (POST /api/chat)
+- `chat_stream` - Stream a chat completion with per-chunk callbacks
 
 ### Models
 - `create_model` - Create a model from another model, GGUF, or safetensors (POST /api/create)
@@ -33,6 +35,12 @@ gem install lex-ollama
 - `check_blob` - Check if a blob exists on the server (HEAD /api/blobs/:digest)
 - `push_blob` - Upload a binary blob to the server (POST /api/blobs/:digest)
 
+### S3 Model Distribution
+- `list_s3_models` - List models available in an S3 mirror
+- `import_from_s3` - Download model from S3 directly to Ollama's filesystem (works before Ollama starts)
+- `sync_from_s3` - Download model from S3, push blobs through Ollama's API, write manifest to filesystem
+- `import_default_models` - Import a list of models from S3 (fleet provisioning)
+
 ### Version
 - `server_version` - Retrieve the Ollama server version (GET /api/version)
 
@@ -54,6 +62,55 @@ result = client.embed(model: 'all-minilm', input: 'Some text to embed')
 
 # List models
 result = client.list_models
+
+# Streaming generate
+client.generate_stream(model: 'llama3.2', prompt: 'Tell me a story') do |event|
+  case event[:type]
+  when :delta then print event[:text]
+  when :done then puts "\nDone!"
+  end
+end
+
+# Streaming chat
+client.chat_stream(model: 'llama3.2', messages: [{ role: 'user', content: 'Hello!' }]) do |event|
+  print event[:text] if event[:type] == :delta
+end
+```
+
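The README examples above pass a block that receives `:delta` and `:done` events. Ollama's streaming endpoints send newline-delimited JSON, so one way such events could be produced from raw HTTP chunks is sketched below. The class name `NdjsonEventBuffer` is hypothetical and this is not the gem's implementation; the `response`, `message.content`, and `done` fields are the ones Ollama's generate and chat streams use, and the event shape simply mirrors the README.

```ruby
require 'json'

# Hypothetical helper: buffers raw chunks and yields one event per complete
# NDJSON line. A chunk may end mid-line, so incomplete data stays buffered.
class NdjsonEventBuffer
  attr_reader :text # full text accumulated so far

  def initialize
    @pending = +''
    @text = +''
  end

  # Feed a raw chunk. Yields { type: :delta, text: } for each text fragment
  # and { type: :done } when the server marks the stream as finished.
  def feed(chunk)
    @pending << chunk
    while (newline = @pending.index("\n"))
      line = @pending.slice!(0..newline).strip
      next if line.empty?

      data = JSON.parse(line)
      # /api/generate uses "response"; /api/chat uses message.content.
      fragment = (data['response'] || data.dig('message', 'content')).to_s
      unless fragment.empty?
        @text << fragment
        yield({ type: :delta, text: fragment })
      end
      yield({ type: :done }) if data['done']
    end
  end
end
```

A caller would invoke `feed` from the HTTP client's chunk callback and read `text` once the `:done` event arrives.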
+## S3 Model Distribution
+
+Pull models from an internal S3 mirror instead of the public Ollama registry:
+
+```ruby
+client = Legion::Extensions::Ollama::Client.new
+
+# List available models in S3
+client.list_s3_models(bucket: 'legion', endpoint: 'https://mesh.s3api-core.optum.com')
+
+# Import directly to filesystem (works without Ollama running)
+client.import_from_s3(model: 'llama3:latest', bucket: 'legion',
+                      endpoint: 'https://mesh.s3api-core.optum.com')
+
+# Push through Ollama API (requires Ollama running)
+client.sync_from_s3(model: 'llama3:latest', bucket: 'legion',
+                    endpoint: 'https://mesh.s3api-core.optum.com')
+
+# Provision fleet with default models
+client.import_default_models(
+  default_models: %w[llama3:latest nomic-embed-text:latest],
+  bucket: 'legion',
+  endpoint: 'https://mesh.s3api-core.optum.com'
+)
+```
+
+S3 operations use [lex-s3](https://github.com/LegionIO/lex-s3). The S3 bucket should mirror the Ollama models directory structure (`manifests/` and `blobs/` under the configured prefix).
+
+All API calls include automatic retry with exponential backoff on connection failures and timeouts.
+
+Generate and chat responses include standardized `usage:` data:
+```ruby
+result = client.generate(model: 'llama3.2', prompt: 'Hello')
+result[:usage] # => { input_tokens: 1, output_tokens: 5, total_duration: ..., ... }
 ```
 
 ## Requirements
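The `usage:` hash shown in the README maps Ollama's `prompt_eval_count` and `eval_count` to `input_tokens` and `output_tokens` and carries duration fields along. A small sketch of such a mapping is below. The module name `UsageSketch`, the exact set of duration fields copied, and the choice to default missing counts to 0 are assumptions, not the gem's `Helpers::Usage`.

```ruby
# Hypothetical sketch of usage normalization; not the gem's Helpers::Usage.
module UsageSketch
  # Mapping stated in the changelog.
  TOKEN_FIELDS = {
    'prompt_eval_count' => :input_tokens,
    'eval_count' => :output_tokens
  }.freeze

  # Duration fields Ollama returns (in nanoseconds). Which of these the gem
  # copies is an assumption.
  DURATION_FIELDS = %w[total_duration load_duration prompt_eval_duration eval_duration].freeze

  # Builds a symbol-keyed usage hash from a parsed Ollama response body.
  def self.normalize(body)
    usage = TOKEN_FIELDS.to_h { |source, target| [target, body.fetch(source, 0)] }
    DURATION_FIELDS.each { |field| usage[field.to_sym] = body[field] if body.key?(field) }
    usage
  end
end
```

Duration fields that are absent from the response are left out of the result instead of being filled with a placeholder.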
data/docs/plans/2026-04-01-s3-model-distribution-design.md
@@ -0,0 +1,131 @@
+# S3 Model Distribution for lex-ollama
+
+## Problem
+
+Thousands of engineers pulling models from the public Ollama registry is wasteful and unreliable. Models should be cached in internal S3 and distributed from there. Fleet-wide model updates should be broadcast via RabbitMQ.
+
+## Design
+
+### New Runner: `Runners::S3Models`
+
+A new runner module alongside the existing `Models` runner. Three primary methods plus one convenience method.
+
+#### `import_from_s3` (filesystem write)
+
+Downloads manifest + blobs from S3, writes directly to `~/.ollama/models/`.
+
+```ruby
+import_from_s3(
+  model:,                  # e.g. "llama3:latest"
+  bucket:,                 # S3 bucket name
+  prefix: "ollama/models", # S3 key prefix
+  models_path: nil,        # local Ollama models dir, defaults to ~/.ollama/models
+  **s3_opts                # passed through to lex-s3 (endpoint:, region:, access_key_id:, etc.)
+)
+```
+
+Flow:
+1. Parse `model` into `name` + `tag` (default tag: `latest`)
+2. Download manifest from S3: `{prefix}/manifests/registry.ollama.ai/library/{name}/{tag}`
+3. Parse manifest JSON to get the list of blob digests
+4. For each blob, check if it already exists locally with matching SHA256 digest (skip if valid)
+5. Stream blob from S3 to `.tmp` file, verify SHA256, atomic rename to final path
+6. Raise `DigestMismatchError` if any blob fails verification (temp file cleaned up)
+7. Write the manifest file
+8. Return `{ result: true, model:, blobs_downloaded:, blobs_skipped:, status: 200 }`
+
+Best for: provisioning, bootstrapping, when Ollama is not yet running.
+
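Steps 4 to 6 of the flow above (cache check by digest, stream to a `.tmp` file, verify, atomic rename, cleanup on mismatch) can be sketched as follows. This is an illustration under assumptions and not the runner's code: the helper names are hypothetical, `source` is any enumerable of byte chunks standing in for the S3 stream, and the `sha256:<hex>` to `blobs/sha256-<hex>` mapping follows Ollama's on-disk layout.

```ruby
require 'digest'
require 'fileutils'

# Hypothetical sketch of "stream to .tmp, verify SHA256, atomic rename".
class DigestMismatchError < StandardError; end

# Ollama stores a blob with digest "sha256:<hex>" as blobs/sha256-<hex>.
def blob_path(models_path, digest)
  File.join(models_path, 'blobs', digest.tr(':', '-'))
end

# True when the file exists and its SHA256 matches the manifest digest.
def blob_valid?(path, digest)
  File.file?(path) && "sha256:#{Digest::SHA256.file(path).hexdigest}" == digest
end

# Writes the chunks from `source` to a temp file, verifies the digest, then
# renames into place. Returns :skipped on a verified cache hit, otherwise
# :downloaded. Raises DigestMismatchError on a bad download.
def write_blob(models_path, digest, source)
  final = blob_path(models_path, digest)
  return :skipped if blob_valid?(final, digest)

  FileUtils.mkdir_p(File.dirname(final))
  tmp = "#{final}.tmp"
  begin
    File.open(tmp, 'wb') { |file| source.each { |chunk| file.write(chunk) } }
    raise DigestMismatchError, "digest mismatch for #{digest}" unless blob_valid?(tmp, digest)

    File.rename(tmp, final) # atomic when tmp and final share a filesystem
  ensure
    FileUtils.rm_f(tmp) # no-op after a successful rename
  end
  :downloaded
end
```

Keeping the temp file next to the final path keeps the rename on one filesystem, which is what makes it atomic.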
+#### `sync_from_s3` (Ollama API + filesystem manifest)
+
+Downloads from S3, pushes blobs through Ollama's API, writes manifest to filesystem.
+
+```ruby
+sync_from_s3(
+  model:,
+  bucket:,
+  prefix: "ollama/models",
+  host: nil,        # Ollama server host
+  models_path: nil, # local models dir for manifest write
+  **s3_opts         # passed to lex-s3
+)
+```
+
+Flow:
+1. Parse model, download manifest from S3
+2. For each blob digest, `check_blob` via Ollama API -- skip if already present
+3. Stream blob from S3 to tempfile, verify SHA256 digest
+4. `push_blob` to Ollama API, check return value for success
+5. If any blob fails: return `{ result: false, errors: [...], status: 500 }`
+6. Write manifest to `{models_path}/manifests/registry.ollama.ai/library/{name}/{tag}`
+7. Return `{ result: true, model:, blobs_pushed:, blobs_skipped:, status: 200 }`
+
+Best for: when Ollama is running and you want blob validation through the API.
+
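The result handling in steps 2, 4, 5, and 7 of the sync flow can be sketched like this. The method name `sync_blobs` is hypothetical, and the `exists:` and `push:` callables stand in for the `check_blob` and `push_blob` API calls. Whether the real runner stops at the first failed blob or keeps going is not stated in the design; this sketch collects every failure. The download, digest check, and manifest write are left out.

```ruby
# Hypothetical sketch of the sync result aggregation.
# exists: callable returning true when the server already has the blob.
# push:   callable returning true when the upload succeeded.
def sync_blobs(model, digests, exists:, push:)
  pushed = 0
  skipped = 0
  errors = []

  digests.each do |digest|
    if exists.call(digest)
      skipped += 1
    elsif push.call(digest)
      pushed += 1
    else
      errors << { digest: digest, error: 'push_blob failed' }
    end
  end

  # Any failed blob turns the whole sync into a failure result.
  return { result: false, errors: errors, status: 500 } unless errors.empty?

  { result: true, model: model, blobs_pushed: pushed, blobs_skipped: skipped, status: 200 }
end
```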
+#### `list_s3_models`
+
+Lists available models in the S3 mirror.
+
+```ruby
+list_s3_models(
+  bucket:,
+  prefix: "ollama/models",
+  **s3_opts
+)
+```
+
+Lists manifest keys under the prefix and parses them into model name/tag pairs.
+
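Both directions of the key mapping (model reference to manifest key, and listed keys back to name/tag pairs) can be sketched with a few string helpers. The helper names are hypothetical; the key layout `{prefix}/manifests/registry.ollama.ai/library/{name}/{tag}` and the default tag `latest` come from the design text.

```ruby
# Hypothetical helpers; the key layout is taken from the design document.
MANIFEST_ROOT = 'manifests/registry.ollama.ai/library'

# "llama3" -> ["llama3", "latest"], "llama3:8b" -> ["llama3", "8b"]
def parse_model(model)
  name, tag = model.split(':', 2)
  [name, tag.nil? || tag.empty? ? 'latest' : tag]
end

# Builds the S3 key of the manifest for a model reference.
def manifest_key(prefix, model)
  name, tag = parse_model(model)
  "#{prefix}/#{MANIFEST_ROOT}/#{name}/#{tag}"
end

# Inverse direction: turns listed S3 keys into { name:, tag: } pairs and
# ignores keys that are not manifests (for example blobs).
def models_from_keys(prefix, keys)
  root = "#{prefix}/#{MANIFEST_ROOT}/"
  keys.filter_map do |key|
    next unless key.start_with?(root)

    name, tag = key.delete_prefix(root).split('/', 2)
    { name: name, tag: tag } if name && tag && !tag.empty? && !tag.include?('/')
  end
end
```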
+#### `import_default_models`
+
+Convenience method that reads `default_models` from settings and calls `import_from_s3` for each.
+
+### Settings
+
+```yaml
+legion:
+  ollama:
+    s3:
+      bucket: "legion"
+      prefix: "ollama/models"
+      endpoint: "https://mesh.s3api-core.optum.com"
+      region: "us-east-2"
+    default_models:
+      - "llama3:latest"
+      - "nomic-embed-text:latest"
+    models_path: null # defaults to ~/.ollama/models, respects OLLAMA_MODELS env var
+```
+
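The `models_path` comment in the settings says it defaults to `~/.ollama/models` and respects `OLLAMA_MODELS`. One possible resolution order is sketched below. The method name `resolve_models_path` and the precedence (explicit setting, then environment variable, then default) are assumptions; the design does not spell out the order.

```ruby
# Hypothetical resolver. Assumed precedence: explicit setting, then the
# OLLAMA_MODELS environment variable, then ~/.ollama/models.
# `env` is injectable so the lookup can be exercised without touching ENV.
def resolve_models_path(configured = nil, env: ENV)
  candidate = configured || env['OLLAMA_MODELS']
  candidate = File.join(Dir.home, '.ollama', 'models') if candidate.nil? || candidate.empty?
  File.expand_path(candidate)
end
```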
+### Dependency
+
+`lex-ollama.gemspec` adds a runtime dependency on `lex-s3` (`>= 0.1`). The `S3Models` runner uses `Legion::Extensions::S3::Client` for all S3 operations.
+
+### Data Flow
+
+```
+S3 (mesh.s3api-core.optum.com)
+  |
+  | HTTPS (direct, no AMQP)
+  v
+Node: S3Models runner
+  |
+  |-- import_from_s3 --> filesystem write to ~/.ollama/models/
+  |-- sync_from_s3   --> Ollama HTTP API (push_blob + create_model)
+```
+
+Fleet broadcast: publish a message to the `ollama.s3_models` queue (natural LEX runner behavior). Each node picks it up and runs the download independently from S3.
+
+### File Layout
+
+```
+lib/legion/extensions/ollama/
+  runners/
+    models.rb       # existing, unchanged
+    s3_models.rb    # NEW
+  client.rb         # updated to include Runners::S3Models
+
+spec/legion/extensions/ollama/runners/
+  s3_models_spec.rb # NEW
+```
+
+No changes to existing runner methods or the Helpers::Client module.