RubyGems - lex-ollama - Versions diffs - 0.2.0 → 0.3.1 - Mend

lex-ollama 0.2.0 → 0.3.1

This diff shows the changes between the two package versions as published in their public registry. It is provided for informational purposes only.

Files changed (20)

checksums.yaml +4 -4
data/CHANGELOG.md +30 -0
data/CLAUDE.md +146 -12
data/Gemfile +1 -0
data/README.md +38 -0
data/docs/plans/2026-04-01-s3-model-distribution-design.md +131 -0
data/docs/plans/2026-04-01-s3-model-distribution-plan.md +655 -0
data/docs/plans/2026-04-07-fleet-queue-subscription-design.md +427 -0
data/lex-ollama.gemspec +1 -0
data/lib/legion/extensions/ollama/actors/model_worker.rb +79 -0
data/lib/legion/extensions/ollama/client.rb +2 -0
data/lib/legion/extensions/ollama/runners/fleet.rb +67 -0
data/lib/legion/extensions/ollama/runners/s3_models.rb +194 -0
data/lib/legion/extensions/ollama/transport/exchanges/llm_request.rb +21 -0
data/lib/legion/extensions/ollama/transport/messages/llm_response.rb +39 -0
data/lib/legion/extensions/ollama/transport/queues/model_request.rb +42 -0
data/lib/legion/extensions/ollama/transport.rb +25 -0
data/lib/legion/extensions/ollama/version.rb +1 -1
data/lib/legion/extensions/ollama.rb +13 -1
metadata +25 -1

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 7f82aeecea946b03e08e2dc80a8ec66504276a2bb28aaaca5528d02105328166
-  data.tar.gz: 6b7b392634ec069693a0b0b030b1619a0a5ae1d3cbb34c2440124c1c52d15e4a
+  metadata.gz: 28df561b00b58c7cb179b9904aed61a5aa7e278140306dadb3b4b2665eaab824
+  data.tar.gz: 446afaab9d80e6a4f62286a1f5ccc1c023bdbb178dba043cb96081412991b2d3
 SHA512:
-  metadata.gz: 25b18ed44dbad71930004a3384dc897e37b245d3cdafa98122e0210a22ac5d5e6be343ab0133e519f8228f026d254e4993ee597f13368590c2fa81971329c6a8
-  data.tar.gz: 39b9f4ed1e8a7ccd447b03770a9757c7e2cdbcea03341f07c2dda561db72e3556029152c76256e6f8b85b6d0cb858773e2d7ee9989d5559ffdc9daccfcf6b966
+  metadata.gz: 2915cfe6e4e959e61ee5b8ce68e7da784b4c6001cfe0c3acdb0a4e0f804da79a1e46a17b7c5297b9dd4f26e58bbae504066f5874d8cd82d6ea223b3dfc561bbb
+  data.tar.gz: cb1337292d4bb7c94603612e03dbdbcbd9a41c2a94e56f4bbfad1132d403f6d14b0316091017745ee2d282ccc426bdcdb65b137f66f07f2a505d231792e424b0

data/CHANGELOG.md CHANGED

@@ -1,5 +1,35 @@
 # Changelog
+## [0.3.1] - 2026-04-08
+### Added
+- `Runners::Fleet` — module-function dispatcher for inbound AMQP LLM request messages; routes by `request_type` to `Client#embed`, `Client#generate`, or `Client#chat`
+- `Transport::Exchanges::LlmRequest` — durable topic exchange `llm.request` for fleet routing
+- `Transport::Queues::ModelRequest` — parametric durable quorum queue per `(type, model)` pair; sanitises colons in model names to dots
+- `Transport::Messages::LlmResponse` — reply message published back to `reply_to` queue after inference
+- `Actor::ModelWorker` — subscription actor; one instance per configured `(type, model)` subscription; enriches inbound messages with `request_type` and `model`, bypasses Legion::Runner task DB (`use_runner? false`)
+- Fleet queue subscription system: when `Legion::Extensions::Core` is present, subscribes to model-scoped queues on `llm.request` topic exchange using routing key `llm.request.ollama.<type>.<model>`
+- Standalone mode: all transport/actor requires guarded behind `const_defined?(:Core, false)` so the gem works as a pure HTTP client library without AMQP
+### Fixed
+- `Runners::S3Models`: use `::JSON.parse` (stdlib) instead of bare `JSON.parse` which resolves to `Legion::JSON` (symbol keys) inside the `Legion::` namespace — fixes `import_from_s3` and `sync_from_s3` manifest parsing
+## [0.3.0] - 2026-04-01
+### Added
+- S3 model distribution via new `Runners::S3Models` module
+- `list_s3_models` to discover models available in an S3 mirror
+- `import_from_s3` for direct filesystem model import (works without Ollama running)
+- `sync_from_s3` for Ollama API-based model import (push_blob + manifest write)
+- `import_default_models` convenience method for fleet provisioning
+- Runtime dependency on `lex-s3` for S3 operations
+- Streaming S3 downloads via `response_target` to avoid loading multi-GB blobs into memory
+- Error propagation in `sync_from_s3` — returns failure with error details when blob push fails
+- SHA256 digest verification for all downloaded blobs (import and sync paths)
+- Atomic blob writes via temp file + rename (prevents partial/corrupt blobs on failure)
+- Cache hits verified by SHA256 digest, not just file size — corrupted local blobs are re-downloaded
+- `DigestMismatchError` raised when S3 blob content does not match manifest digest
 ## [0.2.0] - 2026-03-31
 ### Added
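
Note on the 0.3.1 `::JSON.parse` fix: the changelog entry relies on Ruby's lexical constant lookup, where a bare `JSON` inside a namespace resolves to the nearest enclosing `JSON` constant before the top-level one. The sketch below reproduces that behaviour with a hypothetical `Demo` namespace standing in for `Legion`; `Demo::JSON` and `Demo::Runner` are assumptions made for illustration and are not part of the gem.

```ruby
require 'json'

module Demo
  # Stand-in for a namespaced JSON wrapper that returns symbol keys
  # (assumed behaviour, mirroring what the changelog says about Legion::JSON).
  module JSON
    def self.parse(str)
      ::JSON.parse(str, symbolize_names: true)
    end
  end

  module Runner
    module_function

    # Bare constant: lexical lookup finds Demo::JSON before the stdlib JSON.
    def bare(str)
      JSON.parse(str)
    end

    # Leading :: forces resolution from the top level, i.e. the stdlib JSON.
    def qualified(str)
      ::JSON.parse(str)
    end
  end
end
```

Code that indexes a parsed manifest with string keys would only work with the `qualified` form, which is consistent with the fix described above.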

data/CLAUDE.md CHANGED

@@ -1,44 +1,178 @@
 # lex-ollama: Ollama Integration for LegionIO
-**Parent**: `/Users/miverso2/rubymine/legion/extensions-ai/CLAUDE.md`
+**Repository Level 3 Documentation**
+- **Parent**: `../CLAUDE.md`
+- **Grandparent**: `../../CLAUDE.md`
 ## Purpose
-Legion Extension that connects LegionIO to Ollama, a local LLM server. Provides text generation, chat completions, embeddings, model management, and blob operations.
+Legion Extension that connects LegionIO to Ollama, a local LLM server. Provides text generation,
+chat completions, embeddings, model management, blob operations, S3 model distribution, version
+reporting, and **fleet queue subscription** for receiving routed LLM requests from the Legion bus.
 **GitHub**: https://github.com/LegionIO/lex-ollama
 **License**: MIT
+**Version**: 0.3.1
+**Specs**: 82 examples (12 spec files) — fleet additions add ~35 more
+---
 ## Architecture
 ```
 Legion::Extensions::Ollama
 ├── Runners/
-│   ├── Completions        # POST /api/generate
-│   ├── Chat               # POST /api/chat
-│   ├── Models             # CRUD + pull/push/running
-│   ├── Embeddings         # POST /api/embed
-│   ├── Blobs              # HEAD/POST /api/blobs/:digest
-│   └── Version            # GET /api/version
+│   ├── Completions    # generate, generate_stream
+│   ├── Chat           # chat, chat_stream
+│   ├── Models         # create_model, list_models, show_model, copy_model, delete_model,
+│   │                  #   pull_model, push_model, list_running
+│   ├── Embeddings     # embed
+│   ├── Blobs          # check_blob, push_blob
+│   ├── S3Models       # list_s3_models, import_from_s3, sync_from_s3, import_default_models
+│   ├── Version        # server_version
+│   └── Fleet          # handle_request (fleet dispatcher — chat/embed/generate)
 ├── Helpers/
-│   └── Client             # Faraday connection to Ollama server
-└── Client                 # Standalone client class
+│   ├── Client         # Faraday connection to Ollama server (module, factory method)
+│   ├── Errors         # error handling + with_retry
+│   └── Usage          # usage normalization (maps Ollama token/duration fields to standard shape)
+├── Client             # Standalone client class (includes all runners, holds @config)
+├── Transport/         # (loaded only when Legion::Extensions::Core is present)
+│   ├── Exchanges/
+│   │   └── LlmRequest   # topic exchange 'llm.request'
+│   ├── Queues/
+│   │   └── ModelRequest # parametric queue — one per (type, model) pair
+│   └── Messages/
+│       └── LlmResponse  # reply message published back to reply_to
+└── Actor/
+    └── ModelWorker    # subscription actor — one per registered model/type
 ```
+---
+## Fleet Queue Subscription
+### Overview
+When `Legion::Extensions::Core` is available, lex-ollama subscribes to model-scoped queues on the
+`llm.request` topic exchange, accepting routed inference work from other Legion fleet members
+(lex-llm-gateway, direct publishers, etc.).
+### Routing Key Schema
+```
+llm.request.ollama.<type>.<model>
+```
+| Segment    | Values                     | Notes                              |
+|------------|----------------------------|------------------------------------|
+| `ollama`   | literal                    | provider identifier                |
+| `type`     | `chat`, `embed`, `generate`| maps to a specific runner method   |
+| `model`    | sanitised model name       | `:` replaced with `.` (AMQP rules) |
+**Examples:**
+```
+llm.request.ollama.embed.nomic-embed-text
+llm.request.ollama.embed.mxbai-embed-large
+llm.request.ollama.chat.qwen3.5.27b          # was qwen3.5:27b
+llm.request.ollama.chat.llama3.2
+llm.request.ollama.generate.llama3.2
+```
+### Queue Strategy
+Each model+type combination gets its own **durable quorum queue** with a routing key that matches
+its queue name exactly. Multiple nodes carrying the same model compete fairly (no SAC) — any
+subscriber can serve. The queue name is identical to the routing key for clarity in the management UI.
+### Configuration
+```yaml
+legion:
+  ollama:
+    host: "http://localhost:11434"
+    subscriptions:
+      - type: embed
+        model: nomic-embed-text
+      - type: embed
+        model: mxbai-embed-large
+      - type: chat
+        model: "qwen3.5:27b"
+      - type: chat
+        model: llama3.2
+```
+The extension spawns one `Actor::ModelWorker` per subscription entry at boot.
+### Data Flow
+```
+Publisher (lex-llm-gateway / any fleet node)
+  │  routing_key: "llm.request.ollama.embed.nomic-embed-text"
+  ▼
+Exchange: llm.request  [topic, durable]
+  │
+  └── Queue: llm.request.ollama.embed.nomic-embed-text  [quorum]
+            ▼
+       Actor::ModelWorker (type=embed, model=nomic-embed-text)
+            ▼
+       Runners::Fleet#handle_request
+            ▼
+       Ollama::Client#embed(model: 'nomic-embed-text', ...)
+            ▼
+       Transport::Messages::LlmResponse → reply_to queue (if present)
+```
+### Standalone Mode (no Legion runtime)
+All transport/actor requires are guarded behind:
+```ruby
+if Legion::Extensions.const_defined?(:Core, false)
+  # transport + actor requires
+end
+```
+The gem still works as a pure HTTP client library without AMQP, exactly as before.
+---
+## Key Design Decisions
+- `generate_stream` and `chat_stream` yield `{ type: :delta, text: }` and `{ type: :done }` events.
+- `S3Models` runner depends on `lex-s3`. Uses SHA256 digest verification. `import_from_s3` writes
+  directly to the filesystem; `sync_from_s3` pushes blobs through the Ollama API.
+- `S3Models::OLLAMA_REGISTRY_PREFIX = 'manifests/registry.ollama.ai/library'`.
+- `Usage` helper normalizes Ollama's token/duration fields to `{ input_tokens:, output_tokens:, ... }`.
+- All runners return `{ result: body, status: code }`.
+- **`Runners::Fleet` dispatch rules:**
+  - `request_type: 'embed'` → `Client#embed`, uses `:input` then falls back to `:text`.
+  - `request_type: 'generate'` → `Client#generate`.
+  - anything else (including `'chat'` or unknown) → `Client#chat`.
+- **`Actor::ModelWorker#use_runner?` is `false`** — bypasses `Legion::Runner` / task DB entirely.
+- **Reply publishing** never raises — errors are swallowed so the AMQP ack is not blocked.
+- **Colon sanitisation** — `qwen3.5:27b` becomes `qwen3.5.27b` in queue/routing-key strings.
+---
 ## Dependencies
 | Gem | Purpose |
 |-----|---------|
-| faraday | HTTP client for Ollama REST API |
+| `faraday` >= 2.0 | HTTP client for Ollama REST API |
+| `lex-s3` >= 0.2 | S3 model distribution operations |
+Fleet transport requires Legion runtime gems (`legion-transport`, `LegionIO`) but those are *not*
+gemspec dependencies — they are expected to be present in the runtime environment.
+---
 ## Testing
 ```bash
 bundle install
-bundle exec rspec
+bundle exec rspec        # all examples
 bundle exec rubocop
 ```
 ---
 **Maintained By**: Matthew Iverson (@Esity)
+**Last Updated**: 2026-04-07
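
Note on the routing and dispatch rules in CLAUDE.md: the routing key schema, the colon sanitisation, and the `Runners::Fleet` dispatch rules can be sketched in a few lines. The method names `routing_key`, `dispatch_target`, and `embed_input` are hypothetical and chosen for illustration; the gem's actual implementation is not shown in this part of the diff and may differ.

```ruby
# Hypothetical helpers; names are illustrative, not the gem's API.

# Builds "llm.request.ollama.<type>.<model>", replacing ':' in the model
# name with '.' as described in the routing key table.
def routing_key(type, model)
  "llm.request.ollama.#{type}.#{model.tr(':', '.')}"
end

# Maps a request_type to the client method named in the dispatch rules:
# 'embed' and 'generate' are explicit, everything else falls through to chat.
def dispatch_target(request_type)
  case request_type.to_s
  when 'embed' then :embed
  when 'generate' then :generate
  else :chat
  end
end

# Embed requests use :input first and fall back to :text.
def embed_input(payload)
  payload[:input] || payload[:text]
end
```

For example, `routing_key('chat', 'qwen3.5:27b')` yields the `llm.request.ollama.chat.qwen3.5.27b` key listed in the examples above.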

data/Gemfile CHANGED

@@ -8,5 +8,6 @@ group :test do
   gem 'rspec'
   gem 'rspec_junit_formatter'
   gem 'rubocop'
+  gem 'rubocop-legion'
   gem 'simplecov'
 end

data/README.md CHANGED

@@ -35,6 +35,12 @@ gem install lex-ollama
 - `check_blob` - Check if a blob exists on the server (HEAD /api/blobs/:digest)
 - `push_blob` - Upload a binary blob to the server (POST /api/blobs/:digest)
+### S3 Model Distribution
+- `list_s3_models` - List models available in an S3 mirror
+- `import_from_s3` - Download model from S3 directly to Ollama's filesystem (works before Ollama starts)
+- `sync_from_s3` - Download model from S3, push blobs through Ollama's API, write manifest to filesystem
+- `import_default_models` - Import a list of models from S3 (fleet provisioning)
 ### Version
 - `server_version` - Retrieve the Ollama server version (GET /api/version)
@@ -71,6 +77,34 @@ client.chat_stream(model: 'llama3.2', messages: [{ role: 'user', content: 'Hello
 end
 ```
+## S3 Model Distribution
+Pull models from an internal S3 mirror instead of the public Ollama registry:
+```ruby
+client = Legion::Extensions::Ollama::Client.new
+# List available models in S3
+client.list_s3_models(bucket: 'legion', endpoint: 'https://mesh.s3api-core.optum.com')
+# Import directly to filesystem (works without Ollama running)
+client.import_from_s3(model: 'llama3:latest', bucket: 'legion',
+                      endpoint: 'https://mesh.s3api-core.optum.com')
+# Push through Ollama API (requires Ollama running)
+client.sync_from_s3(model: 'llama3:latest', bucket: 'legion',
+                    endpoint: 'https://mesh.s3api-core.optum.com')
+# Provision fleet with default models
+client.import_default_models(
+  default_models: %w[llama3:latest nomic-embed-text:latest],
+  bucket: 'legion',
+  endpoint: 'https://mesh.s3api-core.optum.com'
+)
+```
+S3 operations use [lex-s3](https://github.com/LegionIO/lex-s3). The S3 bucket should mirror the Ollama models directory structure (`manifests/` and `blobs/` under the configured prefix).
 All API calls include automatic retry with exponential backoff on connection failures and timeouts.
 Generate and chat responses include standardized `usage:` data:
@@ -85,6 +119,10 @@ result[:usage]  # => { input_tokens: 1, output_tokens: 5, total_duration: ..., .
 - [LegionIO](https://github.com/LegionIO/LegionIO) framework
 - [Ollama](https://ollama.com) running locally or on a remote host
+## Version
+0.3.1
 ## License
 MIT
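
Note on the S3 layout in the README: models are addressed as `name:tag`, and the bucket mirrors the Ollama models directory. Under the manifest path given in this diff (`{prefix}/manifests/registry.ollama.ai/library/{name}/{tag}`, default tag `latest`, default prefix `ollama/models`), the mapping from a model string to an S3 manifest key might look like the sketch below. `parse_model` and `manifest_key` are hypothetical names, not the gem's methods.

```ruby
# Path segment taken from the value documented for
# S3Models::OLLAMA_REGISTRY_PREFIX; the constant name here is local to this sketch.
REGISTRY_PREFIX = 'manifests/registry.ollama.ai/library'

# Splits "name:tag" into [name, tag]; a missing or empty tag becomes 'latest'.
def parse_model(model)
  name, tag = model.split(':', 2)
  [name, (tag.nil? || tag.empty?) ? 'latest' : tag]
end

# Builds the S3 key of the manifest for a model under the given prefix.
def manifest_key(model, prefix: 'ollama/models')
  name, tag = parse_model(model)
  "#{prefix}/#{REGISTRY_PREFIX}/#{name}/#{tag}"
end
```

With these assumptions, `manifest_key('llama3:latest')` resolves to `ollama/models/manifests/registry.ollama.ai/library/llama3/latest`.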

data/docs/plans/2026-04-01-s3-model-distribution-design.md ADDED

@@ -0,0 +1,131 @@
+# S3 Model Distribution for lex-ollama
+## Problem
+Thousands of engineers pulling models from the public Ollama registry is wasteful and unreliable. Models should be cached in internal S3 and distributed from there. Fleet-wide model updates should be broadcast via RabbitMQ.
+## Design
+### New Runner: `Runners::S3Models`
+A new runner module alongside the existing `Models` runner. Three primary methods plus one convenience method.
+#### `import_from_s3` (filesystem write)
+Downloads manifest + blobs from S3, writes directly to `~/.ollama/models/`.
+```ruby
+import_from_s3(
+  model:,                    # e.g. "llama3:latest"
+  bucket:,                   # S3 bucket name
+  prefix: "ollama/models",   # S3 key prefix
+  models_path: nil,          # local Ollama models dir, defaults to ~/.ollama/models
+  **s3_opts                  # passed through to lex-s3 (endpoint:, region:, access_key_id:, etc.)
+)
+```
+Flow:
+1. Parse `model` into `name` + `tag` (default tag: `latest`)
+2. Download manifest from S3: `{prefix}/manifests/registry.ollama.ai/library/{name}/{tag}`
+3. Parse manifest JSON to get the list of blob digests
+4. For each blob, check if it already exists locally with matching SHA256 digest (skip if valid)
+5. Stream blob from S3 to `.tmp` file, verify SHA256, atomic rename to final path
+6. Raise `DigestMismatchError` if any blob fails verification (temp file cleaned up)
+7. Write the manifest file
+8. Return `{ result: true, model:, blobs_downloaded:, blobs_skipped:, status: 200 }`
+Best for: provisioning, bootstrapping, when Ollama is not yet running.
+#### `sync_from_s3` (Ollama API + filesystem manifest)
+Downloads from S3, pushes blobs through Ollama's API, writes manifest to filesystem.
+```ruby
+sync_from_s3(
+  model:,
+  bucket:,
+  prefix: "ollama/models",
+  host: nil,                   # Ollama server host
+  models_path: nil,            # local models dir for manifest write
+  **s3_opts                    # passed to lex-s3
+)
+```
+Flow:
+1. Parse model, download manifest from S3
+2. For each blob digest, `check_blob` via Ollama API -- skip if already present
+3. Stream blob from S3 to tempfile, verify SHA256 digest
+4. `push_blob` to Ollama API, check return value for success
+5. If any blob fails: return `{ result: false, errors: [...], status: 500 }`
+6. Write manifest to `{models_path}/manifests/registry.ollama.ai/library/{name}/{tag}`
+7. Return `{ result: true, model:, blobs_pushed:, blobs_skipped:, status: 200 }`
+Best for: when Ollama is running and you want blob validation through the API.
+#### `list_s3_models`
+Lists available models in the S3 mirror.
+```ruby
+list_s3_models(
+  bucket:,
+  prefix: "ollama/models",
+  **s3_opts
+)
+```
+Lists manifest keys under the prefix and parses them into model name/tag pairs.
+#### `import_default_models`
+Convenience method that reads `default_models` from settings and calls `import_from_s3` for each.
+### Settings
+```yaml
+legion:
+  ollama:
+    s3:
+      bucket: "legion"
+      prefix: "ollama/models"
+      endpoint: "https://mesh.s3api-core.optum.com"
+      region: "us-east-2"
+    default_models:
+      - "llama3:latest"
+      - "nomic-embed-text:latest"
+    models_path: null  # defaults to ~/.ollama/models, respects OLLAMA_MODELS env var
+```
+### Dependency
+`lex-ollama.gemspec` adds a runtime dependency on `lex-s3` (`>= 0.1`). The `S3Models` runner uses `Legion::Extensions::S3::Client` for all S3 operations.
+### Data Flow
+```
+S3 (mesh.s3api-core.optum.com)
+  |
+  | HTTPS (direct, no AMQP)
+  v
+Node: S3Models runner
+  |
+  |-- import_from_s3 --> filesystem write to ~/.ollama/models/
+  |-- sync_from_s3   --> Ollama HTTP API (push_blob + create_model)
+```
+Fleet broadcast: publish a message to the `ollama.s3_models` queue (natural LEX runner behavior). Each node picks it up and runs the download independently from S3.
+### File Layout
+```
+lib/legion/extensions/ollama/
+  runners/
+    models.rb          # existing, unchanged
+    s3_models.rb       # NEW
+  client.rb            # updated to include Runners::S3Models
+spec/legion/extensions/ollama/runners/
+  s3_models_spec.rb    # NEW
+```
+No changes to existing runner methods or the Helpers::Client module.
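
Note on the blob handling in the design document: steps 4 to 6 of the `import_from_s3` flow (skip a blob whose local digest matches, stream to a `.tmp` file, verify SHA256, rename atomically, clean up on mismatch) can be sketched with the Ruby standard library alone. The method names and the local `DigestMismatchError` class below are stand-ins for illustration, and the `io` argument is assumed to be any readable stream; the gem's real code streams from S3 via `lex-s3` and may be structured differently.

```ruby
require 'digest'
require 'fileutils'

# Local stand-in for the error class named in the design document.
class DigestMismatchError < StandardError; end

# True when a local blob exists and its content hashes to the expected digest,
# so a cache hit is decided by content rather than by file size.
def blob_valid?(path, expected_sha256)
  File.file?(path) && Digest::SHA256.file(path).hexdigest == expected_sha256
end

# Streams io to "<final_path>.tmp" in chunks, hashing as it goes, then renames
# the temp file into place only if the digest matches.
def write_blob_atomically(io, final_path, expected_sha256)
  tmp_path = "#{final_path}.tmp"
  FileUtils.mkdir_p(File.dirname(final_path))
  digest = Digest::SHA256.new
  File.open(tmp_path, 'wb') do |out|
    while (chunk = io.read(1024 * 1024))
      digest.update(chunk)
      out.write(chunk)
    end
  end
  unless digest.hexdigest == expected_sha256
    raise DigestMismatchError, "expected #{expected_sha256}, got #{digest.hexdigest}"
  end
  File.rename(tmp_path, final_path)
  final_path
ensure
  # After a successful rename the temp file no longer exists, so this only
  # removes leftovers from a failed download or a digest mismatch.
  File.delete(tmp_path) if tmp_path && File.exist?(tmp_path)
end
```

Because the rename happens only after verification, a reader of the final path sees either the complete, verified blob or nothing at all.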