lex-ollama 0.1.0 → 0.3.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: c524d3518516ca7731280a6e84a2b354b12b47b516545a23bec97f6eda373a90
- data.tar.gz: fbfcff5614ac931e74219661eb792f295dfb65d125c93ddb190b40eae74c6f81
+ metadata.gz: 7477574f919b18b85c79afba3a1f65c8540d9eff9ca02b9e0c807b3740fed452
+ data.tar.gz: a5c69878c8518caf02c2e238c94243fd49c320f20bfaede00252bfdc87be5cbb
  SHA512:
- metadata.gz: 8975e31624faf65d869fcbe9c512fbf9d35b468bae3b5267726126312ccbe4520f8c5ff339550ce1d2ea44b2561c90d383ae847c67d826e538dd04a2f8a5c2ff
- data.tar.gz: 4a01f66393c2a78bbd9cccb707ed19156338501f1de46e2afa78508b4be7d93354e90a01a7eef0acc14c18b817d39189d5fc50152da7905ce311d9d537bf0bfe
+ metadata.gz: 31566bf77244dd3cfc097531a3af1da186e8d0e7e0ec675be0b7471f8b7654649fa666d4c3d2f6bb34c46d73d29aa72a64dfa07f7beb35ae01d23c8f2bc6c797
+ data.tar.gz: f900e723d2db75dbdb266fcf33d01be56d7614b992be9e0b6d29345a85012be0d226ff9a2f42cb2d5a9f932cb1e641e6ecbfc033ccd3c6c98bbe4d2a7207ad13
data/CHANGELOG.md CHANGED
@@ -1,5 +1,36 @@
  # Changelog
 
+ ## [0.3.0] - 2026-04-01
+
+ ### Added
+ - S3 model distribution via new `Runners::S3Models` module
+ - `list_s3_models` to discover models available in an S3 mirror
+ - `import_from_s3` for direct filesystem model import (works without Ollama running)
+ - `sync_from_s3` for Ollama API-based model import (push_blob + manifest write)
+ - `import_default_models` convenience method for fleet provisioning
+ - Runtime dependency on `lex-s3` for S3 operations
+ - Streaming S3 downloads via `response_target` to avoid loading multi-GB blobs into memory
+ - Error propagation in `sync_from_s3` — returns failure with error details when blob push fails
+ - SHA256 digest verification for all downloaded blobs (import and sync paths)
+ - Atomic blob writes via temp file + rename (prevents partial/corrupt blobs on failure)
+ - Cache hits verified by SHA256 digest, not just file size — corrupted local blobs are re-downloaded
+ - `DigestMismatchError` raised when S3 blob content does not match manifest digest
+
+ ## [0.2.0] - 2026-03-31
+
+ ### Added
+ - `Helpers::Errors` — Faraday exception classification (TimeoutError, ConnectionFailed) with exponential backoff retry (`with_retry`, 3 retries, 0.5s base delay)
+ - `Helpers::Usage` — standardized usage hash normalization from Ollama response fields (`prompt_eval_count` -> `input_tokens`, `eval_count` -> `output_tokens`, plus duration fields)
+ - `Helpers::Client#streaming_client` — Faraday connection without JSON response middleware for streaming endpoints
+ - `Runners::Completions#generate_stream` — streaming generate with per-chunk block callback and full text accumulation
+ - `Runners::Chat#chat_stream` — streaming chat with per-chunk block callback and full text accumulation
+
+ ### Changed
+ - All runner methods wrapped in `Helpers::Errors.with_retry` for production reliability
+ - `Runners::Completions#generate` now returns a `usage:` key with standardized token/duration counts
+ - `Runners::Chat#chat` now returns a `usage:` key with standardized token/duration counts
+ - `Client` class now overrides `streaming_client` for host passthrough
+
  ## [0.1.0] - 2026-03-31
 
  ### Added
data/Gemfile CHANGED
@@ -8,5 +8,6 @@ group :test do
  gem 'rspec'
  gem 'rspec_junit_formatter'
  gem 'rubocop'
+ gem 'rubocop-legion'
  gem 'simplecov'
  end
data/README.md CHANGED
@@ -12,9 +12,11 @@ gem install lex-ollama
 
  ### Completions
  - `generate` - Generate a text completion (POST /api/generate)
+ - `generate_stream` - Stream a text completion with per-chunk callbacks
 
  ### Chat
  - `chat` - Generate a chat completion with message history and tool support (POST /api/chat)
+ - `chat_stream` - Stream a chat completion with per-chunk callbacks
 
  ### Models
  - `create_model` - Create a model from another model, GGUF, or safetensors (POST /api/create)
@@ -33,6 +35,12 @@ gem install lex-ollama
  - `check_blob` - Check if a blob exists on the server (HEAD /api/blobs/:digest)
  - `push_blob` - Upload a binary blob to the server (POST /api/blobs/:digest)
 
+ ### S3 Model Distribution
+ - `list_s3_models` - List models available in an S3 mirror
+ - `import_from_s3` - Download model from S3 directly to Ollama's filesystem (works before Ollama starts)
+ - `sync_from_s3` - Download model from S3, push blobs through Ollama's API, write manifest to filesystem
+ - `import_default_models` - Import a list of models from S3 (fleet provisioning)
+
  ### Version
  - `server_version` - Retrieve the Ollama server version (GET /api/version)
 
@@ -54,6 +62,55 @@ result = client.embed(model: 'all-minilm', input: 'Some text to embed')
 
  # List models
  result = client.list_models
+
+ # Streaming generate
+ client.generate_stream(model: 'llama3.2', prompt: 'Tell me a story') do |event|
+   case event[:type]
+   when :delta then print event[:text]
+   when :done then puts "\nDone!"
+   end
+ end
+
+ # Streaming chat
+ client.chat_stream(model: 'llama3.2', messages: [{ role: 'user', content: 'Hello!' }]) do |event|
+   print event[:text] if event[:type] == :delta
+ end
+ ```
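The README shows the consumer side of streaming only. As a rough illustration of how `:delta` and `:done` events could be produced, the sketch below buffers raw chunks of newline-delimited JSON (the format Ollama's streaming endpoints emit) and yields one event per complete line. `NdjsonEventParser` is a hypothetical name, not a class from the gem, and the event payload beyond `:type` and `:text` is an assumption.

```ruby
require 'json'

# Hypothetical helper (not part of lex-ollama): turns raw NDJSON chunks
# into :delta / :done events and accumulates the full text.
class NdjsonEventParser
  attr_reader :text

  def initialize
    @buffer = +''
    @text = +''
  end

  # A chunk may hold a partial line, exactly one line, or several lines.
  def feed(chunk)
    @buffer << chunk
    while (newline = @buffer.index("\n"))
      line = @buffer.slice!(0..newline).strip
      next if line.empty?

      data = JSON.parse(line)
      # /api/generate carries text in "response", /api/chat in "message.content".
      piece = data['response'] || data.dig('message', 'content') || ''
      unless piece.empty?
        @text << piece
        yield({ type: :delta, text: piece })
      end
      yield({ type: :done, text: @text.dup }) if data['done']
    end
  end
end

# Usage: the second JSON object is split across two chunks on purpose.
chunks = [
  %({"response":"Hel","done":false}\n{"resp),
  %(onse":"lo","done":false}\n),
  %({"response":"","done":true}\n)
]
parser = NdjsonEventParser.new
chunks.each do |chunk|
  parser.feed(chunk) { |event| print event[:text] if event[:type] == :delta }
end
puts
```

Buffering until a newline arrives matters because HTTP chunk boundaries do not have to line up with JSON object boundaries.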
+
+ ## S3 Model Distribution
+
+ Pull models from an internal S3 mirror instead of the public Ollama registry:
+
+ ```ruby
+ client = Legion::Extensions::Ollama::Client.new
+
+ # List available models in S3
+ client.list_s3_models(bucket: 'legion', endpoint: 'https://mesh.s3api-core.optum.com')
+
+ # Import directly to filesystem (works without Ollama running)
+ client.import_from_s3(model: 'llama3:latest', bucket: 'legion',
+                       endpoint: 'https://mesh.s3api-core.optum.com')
+
+ # Push through Ollama API (requires Ollama running)
+ client.sync_from_s3(model: 'llama3:latest', bucket: 'legion',
+                     endpoint: 'https://mesh.s3api-core.optum.com')
+
+ # Provision fleet with default models
+ client.import_default_models(
+   default_models: %w[llama3:latest nomic-embed-text:latest],
+   bucket: 'legion',
+   endpoint: 'https://mesh.s3api-core.optum.com'
+ )
+ ```
+
+ S3 operations use [lex-s3](https://github.com/LegionIO/lex-s3). The S3 bucket should mirror the Ollama models directory structure (`manifests/` and `blobs/` under the configured prefix).
+
+ All API calls include automatic retry with exponential backoff on connection failures and timeouts.
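To make the retry behaviour concrete, here is a minimal sketch of exponential backoff using the figures from the changelog (3 retries, 0.5s base delay). The doubling factor, the method signature, and the injectable `sleeper` are assumptions for illustration. The two error classes are stand-ins for `Faraday::TimeoutError` and `Faraday::ConnectionFailed` so the block runs without Faraday installed.

```ruby
# Stand-ins for the Faraday exceptions named in the changelog (hypothetical classes).
class SketchTimeoutError < StandardError; end
class SketchConnectionFailed < StandardError; end

RETRYABLE_ERRORS = [SketchTimeoutError, SketchConnectionFailed].freeze

# Hypothetical signature; the real Helpers::Errors.with_retry may differ.
# The sleeper is injectable so callers (and tests) can avoid real waiting.
def with_retry(max_retries: 3, base_delay: 0.5, sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    yield
  rescue *RETRYABLE_ERRORS
    raise if attempt >= max_retries # retries exhausted: surface the last error

    sleeper.call(base_delay * (2**attempt)) # 0.5s, 1.0s, 2.0s (assumed doubling)
    attempt += 1
    retry
  end
end

# Usage: fails twice, then succeeds on the third call.
calls = 0
delays = []
result = with_retry(sleeper: ->(seconds) { delays << seconds }) do
  calls += 1
  raise SketchTimeoutError, 'simulated timeout' if calls < 3

  :ok
end
puts "result=#{result} calls=#{calls} delays=#{delays.inspect}"
```

Errors outside the retryable list propagate immediately, which keeps genuine request errors (for example a 4xx response mapped to another exception) from being retried pointlessly.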
+
+ Generate and chat responses include standardized `usage:` data:
+ ```ruby
+ result = client.generate(model: 'llama3.2', prompt: 'Hello')
+ result[:usage] # => { input_tokens: 1, output_tokens: 5, total_duration: ..., ... }
  ```
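A sketch of what the usage normalization might look like: the two token renames come from the changelog, while the exact set of duration keys and the zero default for missing fields are assumptions. `normalize_usage` is a hypothetical name rather than the gem's own method.

```ruby
# Source field in the Ollama response body => key in the normalized usage hash.
# Only the first two renames are stated in the changelog; the duration keys
# are Ollama response fields passed through unchanged (assumed).
USAGE_KEY_MAP = {
  'prompt_eval_count' => :input_tokens,
  'eval_count' => :output_tokens,
  'total_duration' => :total_duration,
  'load_duration' => :load_duration,
  'prompt_eval_duration' => :prompt_eval_duration,
  'eval_duration' => :eval_duration
}.freeze

# Hypothetical helper: missing fields default to 0 (an assumption).
def normalize_usage(body)
  USAGE_KEY_MAP.each_with_object({}) do |(source, target), usage|
    usage[target] = body.fetch(source, 0)
  end
end

usage = normalize_usage('prompt_eval_count' => 1, 'eval_count' => 5, 'total_duration' => 1234)
puts usage.inspect
```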
 
  ## Requirements
@@ -0,0 +1,131 @@
+ # S3 Model Distribution for lex-ollama
+
+ ## Problem
+
+ Thousands of engineers pulling models from the public Ollama registry is wasteful and unreliable. Models should be cached in internal S3 and distributed from there. Fleet-wide model updates should be broadcast via RabbitMQ.
+
+ ## Design
+
+ ### New Runner: `Runners::S3Models`
+
+ A new runner module alongside the existing `Models` runner. Three primary methods plus one convenience method.
+
+ #### `import_from_s3` (filesystem write)
+
+ Downloads manifest + blobs from S3, writes directly to `~/.ollama/models/`.
+
+ ```ruby
+ import_from_s3(
+   model:, # e.g. "llama3:latest"
+   bucket:, # S3 bucket name
+   prefix: "ollama/models", # S3 key prefix
+   models_path: nil, # local Ollama models dir, defaults to ~/.ollama/models
+   **s3_opts # passed through to lex-s3 (endpoint:, region:, access_key_id:, etc.)
+ )
+ ```
+
+ Flow:
+ 1. Parse `model` into `name` + `tag` (default tag: `latest`)
+ 2. Download manifest from S3: `{prefix}/manifests/registry.ollama.ai/library/{name}/{tag}`
+ 3. Parse manifest JSON to get the list of blob digests
+ 4. For each blob, check if it already exists locally with matching SHA256 digest (skip if valid)
+ 5. Stream blob from S3 to `.tmp` file, verify SHA256, atomic rename to final path
+ 6. Raise `DigestMismatchError` if any blob fails verification (temp file cleaned up)
+ 7. Write the manifest file
+ 8. Return `{ result: true, model:, blobs_downloaded:, blobs_skipped:, status: 200 }`
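Steps 1 to 3 of this flow can be sketched with the standard library alone. The helper names are hypothetical, and the manifest shape (a `config` entry plus a `layers` array, each carrying a `digest`) is an assumption based on the OCI-style manifests Ollama writes locally; the S3 key layout is the one given in step 2.

```ruby
require 'json'

# Step 1: split "name:tag", defaulting the tag to "latest".
def parse_model_ref(model)
  name, tag = model.split(':', 2)
  [name, (tag.nil? || tag.empty?) ? 'latest' : tag]
end

# Step 2: build the S3 key of the manifest for a model reference.
def manifest_key(model, prefix: 'ollama/models')
  name, tag = parse_model_ref(model)
  "#{prefix}/manifests/registry.ollama.ai/library/#{name}/#{tag}"
end

# Step 3: collect every blob digest referenced by the manifest
# (assumed shape: { "config": { "digest": ... }, "layers": [{ "digest": ... }] }).
def blob_digests(manifest_json)
  manifest = JSON.parse(manifest_json)
  entries = [manifest['config'], *manifest.fetch('layers', [])].compact
  entries.map { |entry| entry['digest'] }.compact.uniq
end

sample_manifest = JSON.generate(
  'schemaVersion' => 2,
  'config' => { 'digest' => 'sha256:aaa' },
  'layers' => [{ 'digest' => 'sha256:bbb' }, { 'digest' => 'sha256:ccc' }]
)
puts manifest_key('llama3')
puts blob_digests(sample_manifest).inspect
```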
+
+ Best for: provisioning, bootstrapping, when Ollama is not yet running.
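The verify-then-rename part of the `import_from_s3` flow (steps 4 to 6) could look roughly like the sketch below. The S3 download itself is out of scope: the caller streams bytes into the yielded IO. `DigestMismatchError` is named in the changelog, but its definition here, the helper names, and the `sha256-<hex>` blob file naming are assumptions.

```ruby
require 'digest'
require 'fileutils'
require 'tmpdir'

# Named in the changelog; this definition is a stand-in for illustration.
class DigestMismatchError < StandardError; end

# True when the file exists and its SHA256 matches "sha256:<hex>".
def blob_valid?(path, expected_digest)
  File.file?(path) && "sha256:#{Digest::SHA256.file(path).hexdigest}" == expected_digest
end

# Hypothetical helper: writes to "<final>.tmp", verifies, then renames.
# Returns :skipped on a verified cache hit, :downloaded otherwise.
def write_blob_atomically(final_path, expected_digest)
  return :skipped if blob_valid?(final_path, expected_digest)

  FileUtils.mkdir_p(File.dirname(final_path))
  tmp_path = "#{final_path}.tmp"
  begin
    File.open(tmp_path, 'wb') { |io| yield io }
    unless blob_valid?(tmp_path, expected_digest)
      raise DigestMismatchError, "digest mismatch for #{File.basename(final_path)}"
    end

    File.rename(tmp_path, final_path) # atomic when both paths share a filesystem
  ensure
    File.delete(tmp_path) if File.exist?(tmp_path) # no-op after a successful rename
  end
  :downloaded
end

Dir.mktmpdir do |dir|
  payload = 'fake model weights'
  digest = "sha256:#{Digest::SHA256.hexdigest(payload)}"
  path = File.join(dir, 'blobs', digest.tr(':', '-'))
  first = write_blob_atomically(path, digest) { |io| io.write(payload) }
  second = write_blob_atomically(path, digest) { |io| io.write(payload) }
  puts [first, second].inspect
end
```

Because the final path only ever appears through the rename, a crash or a failed verification leaves either the previous valid blob or nothing, never a partial file.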
+
+ #### `sync_from_s3` (Ollama API + filesystem manifest)
+
+ Downloads from S3, pushes blobs through Ollama's API, writes manifest to filesystem.
+
+ ```ruby
+ sync_from_s3(
+   model:,
+   bucket:,
+   prefix: "ollama/models",
+   host: nil, # Ollama server host
+   models_path: nil, # local models dir for manifest write
+   **s3_opts # passed to lex-s3
+ )
+ ```
+
+ Flow:
+ 1. Parse model, download manifest from S3
+ 2. For each blob digest, `check_blob` via Ollama API -- skip if already present
+ 3. Stream blob from S3 to tempfile, verify SHA256 digest
+ 4. `push_blob` to Ollama API, check return value for success
+ 5. If any blob fails: return `{ result: false, errors: [...], status: 500 }`
+ 6. Write manifest to `{models_path}/manifests/registry.ollama.ai/library/{name}/{tag}`
+ 7. Return `{ result: true, model:, blobs_pushed:, blobs_skipped:, status: 200 }`
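The control flow of steps 2 to 5 and 7, in particular how per-blob failures are collected into a single failure result, might be sketched as follows. The three callables are injected stand-ins for `check_blob`, the S3 download, and `push_blob`; their signatures are assumptions. For brevity the blob is held in memory here, whereas the flow above streams to a tempfile, and the manifest write (step 6) is omitted.

```ruby
require 'digest'

# Hypothetical helper illustrating the skip / verify / push / collect-errors loop.
# check_blob.(digest)      -> true if Ollama already has the blob
# fetch_blob.(digest)      -> blob bytes from the mirror
# push_blob.(digest, body) -> true on success
def sync_blobs(digests, check_blob:, fetch_blob:, push_blob:)
  pushed = 0
  skipped = 0
  errors = []

  digests.each do |digest|
    if check_blob.call(digest)
      skipped += 1
      next
    end

    body = fetch_blob.call(digest)
    actual = "sha256:#{Digest::SHA256.hexdigest(body)}"
    if actual != digest
      errors << { digest: digest, error: "digest mismatch (got #{actual})" }
    elsif push_blob.call(digest, body)
      pushed += 1
    else
      errors << { digest: digest, error: 'push_blob failed' }
    end
  end

  return { result: false, errors: errors, status: 500 } unless errors.empty?

  { result: true, blobs_pushed: pushed, blobs_skipped: skipped, status: 200 }
end

# Usage with in-memory fakes: one blob is already present, one gets pushed.
store = %w[aaa bbb].to_h { |body| ["sha256:#{Digest::SHA256.hexdigest(body)}", body] }
present = [store.keys.first]
outcome = sync_blobs(
  store.keys,
  check_blob: ->(digest) { present.include?(digest) },
  fetch_blob: ->(digest) { store.fetch(digest) },
  push_blob: ->(_digest, _body) { true }
)
puts outcome.inspect
```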
+
+ Best for: when Ollama is running and you want blob validation through the API.
+
+ #### `list_s3_models`
+
+ Lists available models in the S3 mirror.
+
+ ```ruby
+ list_s3_models(
+   bucket:,
+   prefix: "ollama/models",
+   **s3_opts
+ )
+ ```
+
+ Lists manifest keys under the prefix and parses them into model name/tag pairs.
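Parsing manifest keys into name/tag pairs can be sketched as below, assuming the key layout `{prefix}/manifests/registry.ollama.ai/library/{name}/{tag}` used elsewhere in this document. `models_from_keys` and the shape of the returned hashes are hypothetical; the listing call against S3 is left out.

```ruby
# Hypothetical helper: keeps only keys that look like library manifests
# and extracts the model name and tag from each.
def models_from_keys(keys, prefix: 'ollama/models')
  pattern = %r{\A#{Regexp.escape(prefix)}/manifests/registry\.ollama\.ai/library/([^/]+)/([^/]+)\z}
  keys.filter_map do |key|
    match = pattern.match(key)
    { name: match[1], tag: match[2] } if match
  end
end

keys = [
  'ollama/models/manifests/registry.ollama.ai/library/llama3/latest',
  'ollama/models/manifests/registry.ollama.ai/library/nomic-embed-text/latest',
  'ollama/models/blobs/sha256-0123abcd' # blobs are ignored
]
puts models_from_keys(keys).inspect
```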
+
+ #### `import_default_models`
+
+ Convenience method that reads `default_models` from settings and calls `import_from_s3` for each.
+
+ ### Settings
+
+ ```yaml
+ legion:
+   ollama:
+     s3:
+       bucket: "legion"
+       prefix: "ollama/models"
+       endpoint: "https://mesh.s3api-core.optum.com"
+       region: "us-east-2"
+     default_models:
+       - "llama3:latest"
+       - "nomic-embed-text:latest"
+     models_path: null # defaults to ~/.ollama/models, respects OLLAMA_MODELS env var
+ ```
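The comment on `models_path` implies a fallback chain. One plausible reading, sketched here, is: explicit setting first, then the `OLLAMA_MODELS` environment variable, then `~/.ollama/models`. That ordering and the helper name `resolve_models_path` are assumptions; the source only says the default respects `OLLAMA_MODELS`.

```ruby
# Hypothetical helper: resolves the local Ollama models directory.
# env is injectable so the lookup can be exercised without touching ENV.
def resolve_models_path(configured, env: ENV)
  return configured if configured && !configured.empty?

  from_env = env['OLLAMA_MODELS']
  return from_env if from_env && !from_env.empty?

  File.join(Dir.home, '.ollama', 'models')
end

puts resolve_models_path(nil, env: { 'OLLAMA_MODELS' => '/data/ollama/models' })
puts resolve_models_path('/srv/models', env: { 'OLLAMA_MODELS' => '/data/ollama/models' })
```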
+
+ ### Dependency
+
+ `lex-ollama.gemspec` adds a runtime dependency on `lex-s3` (`>= 0.1`). The `S3Models` runner uses `Legion::Extensions::S3::Client` for all S3 operations.
+
+ ### Data Flow
+
+ ```
+ S3 (mesh.s3api-core.optum.com)
+   |
+   | HTTPS (direct, no AMQP)
+   v
+ Node: S3Models runner
+   |
+   |-- import_from_s3 --> filesystem write to ~/.ollama/models/
+   |-- sync_from_s3 --> Ollama HTTP API (push_blob + create_model)
+ ```
+
+ Fleet broadcast: publish a message to the `ollama.s3_models` queue (natural LEX runner behavior). Each node picks it up and runs the download independently from S3.
+
+ ### File Layout
+
+ ```
+ lib/legion/extensions/ollama/
+   runners/
+     models.rb # existing, unchanged
+     s3_models.rb # NEW
+   client.rb # updated to include Runners::S3Models
+
+ spec/legion/extensions/ollama/runners/
+   s3_models_spec.rb # NEW
+ ```
+
+ No changes to existing runner methods or the Helpers::Client module.