lex-llm 0.4.15 → 0.4.18
- checksums.yaml +4 -4
- data/.rubocop.yml.new +54 -0
- data/CHANGELOG.md +27 -0
- data/README.md +349 -153
- data/lex-llm.gemspec +1 -0
- data/lib/legion/extensions/llm/configuration.rb +4 -0
- data/lib/legion/extensions/llm/connection.rb +10 -1
- data/lib/legion/extensions/llm/credential_sources.rb +14 -6
- data/lib/legion/extensions/llm/fleet/token_validator.rb +28 -5
- data/lib/legion/extensions/llm/fleet/worker_execution.rb +13 -2
- data/lib/legion/extensions/llm/model/info.rb +7 -1
- data/lib/legion/extensions/llm/models.json +138 -66
- data/lib/legion/extensions/llm/provider/open_ai_compatible.rb +5 -1
- data/lib/legion/extensions/llm/provider.rb +14 -0
- data/lib/legion/extensions/llm/streaming.rb +5 -1
- data/lib/legion/extensions/llm/transport/messages/fleet_error.rb +1 -0
- data/lib/legion/extensions/llm/transport/messages/fleet_request.rb +1 -0
- data/lib/legion/extensions/llm/transport/messages/fleet_response.rb +1 -0
- data/lib/legion/extensions/llm/version.rb +1 -1
- metadata +16 -1
data/README.md
CHANGED
@@ -2,42 +2,35 @@
 
 [](https://github.com/LegionIO/lex-llm/actions/workflows/ci.yml)
 
-
-
-`lex-llm` is a standard Legion extension gem. It does not
-
-
-
-
-
-
-
-
-
-
-
--
-
-
-
-
-
-
--
-
-
-
-
-
-
-
-- `lex-llm-anthropic`
-- `lex-llm-openai`
-- `lex-llm-gemini`
-- `lex-llm-mlx`
-- `lex-llm-bedrock`
-- `lex-llm-vertex`
-- `lex-llm-azure-foundry`
+Base provider framework for all LegionIO LLM provider extensions.
+
+`lex-llm` is a standard Legion extension gem that provides provider-neutral primitives for LLM integration. It does not include concrete provider implementations -- those live in `lex-llm-*` gems (e.g. `lex-llm-ollama`, `lex-llm-openai`, `lex-llm-bedrock`). The routing unit is a **model offering**, not a provider, enabling Legion to reason about any combination of local instances, remote servers, cloud providers, and fleet workers.
+
+---
+
+## Quick Index
+
+| Topic | Section |
+|-------|---------|
+| Install & depend | [Install](#install) |
+| Extension namespace | [Namespace](#namespace) |
+| Core classes & files | [Class Index](#class-index) |
+| Model offerings (routing) | [Model Offerings](#model-offerings) |
+| In-memory offering registry | [Offering Registry](#offering-registry) |
+| Fleet lanes & work routing | [Fleet Lanes](#fleet-lanes) |
+| Fleet protocol v2 | [Fleet Protocol](#fleet-protocol) |
+| Registry events | [Registry Events](#registry-events) |
+| Provider contract | [Provider Extension Contract](#provider-extension-contract) |
+| Streaming & accumulator | [Streaming](#streaming) |
+| Credential discovery | [Credential Sources](#credential-sources) |
+| Auto-registration | [Auto Registration](#auto-registration) |
+| Provider settings | [Provider Settings](#provider-settings) |
+| Schema & tools | [Schema & Tools](#schema--tools) |
+| Response objects | [Response Objects](#response-objects) |
+| Configuration | [Configuration](#configuration) |
+| Running tests | [Development](#development) |
+
+---
 
 ## Install
 
@@ -61,9 +54,7 @@ Load the extension through the Legion namespace:
 require 'legion/extensions/llm'
 ```
 
-Provider gems must use nested Legion extension namespaces so LegionIO autoloading
-
-Example for `lex-llm-ollama`:
+All classes live under `Legion::Extensions::Llm`. Provider gems must use nested Legion extension namespaces so LegionIO autoloading finds them consistently:
 
 ```ruby
 require 'legion/extensions/llm'
@@ -84,6 +75,113 @@ module Legion
 end
 ```
 
+---
+
+## Class Index
+
+### Core
+| Class | File | Purpose |
+|-------|------|---------|
+| `Provider` | `lib/.../provider.rb` | Base class for all provider adapters. Includes `Legion::Cache::Helper` and `Legion::Logging::Helper`. Mixin entry point for credentials, model caching, and model whitelist/blacklist. |
+| `Provider::OpenAICompatible` | `lib/.../provider/open_ai_compatible.rb` | Shared adapter for OpenAI-compatible servers (vLLM, Ollama, MLX, local proxies). Handles request/response translation, streaming, tool calls, embedding, image, transcription, and thinking extraction. |
+| `ProviderContract` | `lib/.../provider_contract.rb` | Defines the canonical provider interface: `chat`, `stream_chat`, `embed`, `image`, `count_tokens`, `health`, `discover_offerings`. Raises `UnsupportedCapability` for unimplemented methods. |
+| `Configuration` | `lib/.../configuration.rb` | Hash-backed provider config wrapper; normalizes instance-level and fleet-level settings. |
+| `ProviderSettings` | `lib/.../provider_settings.rb` | Builds complete provider settings from `family`, `instance`, and nested fleet settings. Includes `infer_tier_from_endpoint(url)` to detect `:local` vs `:direct`. |
+
+### Requests & Data Types
+| Class | File | Purpose |
+|-------|------|---------|
+| `Message` | `lib/.../message.rb` | Structured message (role, content, tool calls, attachments, thinking). |
+| `Content` | `lib/.../content.rb` | Content part (text, image, file, tool result) with MIME type support. |
+| `Tool` | `lib/.../tool.rb` | Tool definition (name, description, parameters, strict mode). |
+| `ToolCall` | `lib/.../tool_call.rb` | Tool call result (id, function name, arguments, result). |
+| `Attachment` | `lib/.../attachment.rb` | File attachment with content, filename, and MIME type. |
+| `Chunk` | `lib/.../chunk.rb` | Streaming chunk wrapper (content delta, reasoning, tool call delta, usage). |
+| `Context` | `lib/.../context.rb` | Conversation context builder; normalizes history and strips thinking. |
+| `Thinking` | `lib/.../thinking.rb` | Thinking/reasoning metadata extracted from provider output. |
+| `MimeType` | `lib/.../mime_type.rb` | MIME type utilities for image and file content. |
+
+### Model & Routing
+| Class | File | Purpose |
+|-------|------|---------|
+| `Model::Info` | `lib/.../model/info.rb` | Immutable `Data.define` struct: `instance`, `provider_family`, `provider_model`, `parameter_count`, `quantization`, `size_bytes`, `modalities_input/output`, `context_window`, `max_output_tokens`, `pricing`, `capabilities`, `created_at`, `knowledge_cutoff`. Factory: `Model::Info.from_hash` for legacy hash compatibility. |
+| `Model::Modalities` | `lib/.../model/modalities.rb` | Canonical modality symbols and helpers. |
+| `Model::Pricing` | `lib/.../model/pricing.rb` | Pricing data struct with `PricingCategory` and `PricingTier`. |
+| `Models` | `lib/.../models.rb` | Shared model listing and metadata normalization. Uses `Call::Registry` with namespace-scanning fallback. |
+| `Routing::ModelOffering` | `lib/.../routing/model_offering.rb` | Concrete offering: one model on one provider instance. Routing/filtering/health/policy unit. See [Model Offerings](#model-offerings). |
+| `Routing::OfferingRegistry` | `lib/.../routing/offering_registry.rb` | In-memory index for offerings. See [Offering Registry](#offering-registry). |
+| `Routing::LaneKey` | `lib/.../routing/lane_key.rb` | Derives fleet lane key strings from offerings. |
+| `Aliases` | `lib/.../aliases.rb` | Canonical model alias normalization from `aliases.json`. |
+| `Routing::RegistryEvent` | `lib/.../routing/registry_event.rb` | Envelope builder for registry availability events. |
+
+### Responses
+| Class | File | Purpose |
+|-------|------|---------|
+| `Responses::ChatResponse` | `lib/.../responses/chat_response.rb` | Normalized chat response: message, usage, thinking, finish_reason. |
+| `Responses::EmbeddingResponse` | `lib/.../responses/embedding_response.rb` | Normalized embedding response: vectors, usage, model. |
+| `Responses::StreamChunk` | `lib/.../responses/stream_chunk.rb` | Normalized stream chunk with delta fields and metadata. |
+| `Responses::ThinkingExtractor` | `lib/.../responses/thinking_extractor.rb` | Extracts thinking/reasoning from provider output (reasoning_content, `</think>` tags, untagged preambles). |
+
+### Streaming
+| Class | File | Purpose |
+|-------|------|---------|
+| `Streaming` | `lib/.../streaming.rb` | Streaming framework: Faraday middleware, chunk parsing, retry on status 500, thinking extraction, error handling. Handles both Net::HTTP and Typhoeus adapters. |
+| `StreamAccumulator` | `lib/.../stream_accumulator.rb` | Accumulates streaming deltas into complete messages; assembles partial tool-call arguments, separates thinking from content, builds tool call arrays. |
+
+### Fleet (Protocol v2)
+| Class | File | Purpose |
+|-------|------|---------|
+| `Fleet::Protocol` | `lib/.../fleet/protocol.rb` | Protocol v2 constants, field names, and versioning. |
+| `Fleet::EnvelopeValidation` | `lib/.../fleet/envelope_validation.rb` | Validates v2 envelopes; rejects legacy fields. |
+| `Fleet::TokenValidator` | `lib/.../fleet/token_validator.rb` | Validates JWT replay tokens with issuer verification and hash-based claims. |
+| `Fleet::TokenError` | `lib/.../fleet/token_error.rb` | Token validation error types. |
+| `Fleet::Settings` | `lib/.../fleet/settings.rb` | Default fleet settings builder (consumer, auth, endpoint). |
+| `Fleet::ProviderResponder` | `lib/.../fleet/provider_responder.rb` | Responder-side execution: receives fleet requests, validates tokens, dispatches to provider, publishes responses. |
+| `Fleet::WorkerExecution` | `lib/.../fleet/worker_execution.rb` | Worker-side execution: binds to lanes, pulls/consumes messages, manages backpressure. |
+| `Fleet::DefaultExchangeReply` | `lib/.../fleet/default_exchange_reply.rb` | Publishes replies via AMQP default exchange with publisher confirms. |
+| `Fleet::PublishSafety` | `lib/.../fleet/publish_safety.rb` | Guards against infinite requeues on publish failure. |
+| `Transport::Messages::FleetRequest` | `lib/.../transport/messages/fleet_request.rb` | Encrypted fleet request envelope (v2). |
+| `Transport::Messages::FleetResponse` | `lib/.../transport/messages/fleet_response.rb` | Encrypted fleet response envelope (v2). |
+| `Transport::Messages::FleetError` | `lib/.../transport/messages/fleet_error.rb` | Encrypted fleet error envelope (v2). |
+| `Transport::Exchanges::Fleet` | `lib/.../transport/exchanges/fleet.rb` | Fleet exchange declarations. |
+| `Transport::Exchanges::LlmRegistry` | `lib/.../transport/exchanges/llm_registry.rb` | Registry exchange for offering availability events. |
+| `Transport::FleetLane` | `lib/.../transport/fleet_lane.rb` | Fleet lane declaration and binding. |
+| `RegistryPublisher` | `lib/.../registry_publisher.rb` | Publishes registry events to `llm.registry` exchange. |
+| `RegistryEventBuilder` | `lib/.../registry_event_builder.rb` | Builds sanitized registry event messages. |
+
+### Credentials & Discovery
+| Class | File | Purpose |
+|-------|------|---------|
+| `CredentialSources` | `lib/.../credential_sources.rb` | Read-only probes: env vars, `~/.claude/settings.json`, `~/.codex/auth.json`, `Legion::Settings`, socket/HTTP probes. SHA-256 credential dedup via `credential_fingerprint`. Includes `source_tag(type, location, key)` for provenance. Probing gated behind `extensions.llm.security.credential_source_probing`. |
+| `AutoRegistration` | `lib/.../auto_registration.rb` | Mixin for provider self-registration into `Call::Registry`. Discovers instances, builds offerings, handles rediscovery. Pure discovery -- no upward registry mutation. |
+
+### Capabilities
+| Class | File | Purpose |
+|-------|------|---------|
+| `Chat` | `lib/.../chat.rb` | Shared chat request builder and parameter normalization. |
+| `Embedding` | `lib/.../embedding.rb` | Embedding request builder. |
+| `Image` | `lib/.../image.rb` | Image generation request builder. |
+| `Moderation` | `lib/.../moderation.rb` | Moderation request builder. |
+| `Tokens` | `lib/.../tokens.rb` | Token counting request builder. |
+| `Transcription` | `lib/.../transcription.rb` | Audio transcription request builder. |
+| `Agent` | `lib/.../agent.rb` | Agent-specific context and parameter helpers. |
+
+### Connection
+| Class | File | Purpose |
+|-------|------|---------|
+| `Connection` | `lib/.../connection.rb` | Faraday connection builder with `:typhoeus` adapter preference, bearer token redaction in logs, middleware stack, and error handling. |
+
+### Misc
+| Class | File | Purpose |
+|-------|------|---------|
+| `Schema` | `lib/.../schema.rb` | Bridge to `ruby_llm-schema` for JSON schema tool definitions. |
+| `Error` | `lib/.../error.rb` | Base error class for lex-llm. |
+| `Errors::UnsupportedCapability` | `lib/.../errors/unsupported_capability.rb` | Raised when a provider lacks a requested capability. |
+| `Utils` | `lib/.../utils.rb` | Shared utility methods. |
+| `VERSION` | `lib/.../version.rb` | Current gem version (`0.4.18`). |
+
+---
+
 ## Model Offerings
 
 A model offering describes one concrete model made available by one provider instance. It is the base unit for routing, filtering, fleet lane creation, health, policy, and cost decisions.
@@ -132,30 +230,28 @@ offering.eligible_for?(
 
 Common offering fields:
 
-- `offering_id`: stable identifier
-- `provider_family`:
+- `offering_id`: stable identifier; generated from provider, instance, usage type, and canonical alias when omitted
+- `provider_family`: `:ollama`, `:vllm`, `:bedrock`, `:anthropic`, `:openai`, etc.
 - `provider_instance`: concrete provider instance, account, node, region, or local runtime
 - `instance_id`: compatibility alias for `provider_instance`
-- `model_family`: provider-neutral family such as `:openai`, `:anthropic`, `:
-- `transport`: `:local`, `:http`, `:rabbitmq`, `:sdk
-- `tier`: `:local`, `:private`, `:fleet`, `:cloud`, `:frontier
-- `model`: provider model name or normalized
-- `canonical_model_alias`: provider-neutral alias
+- `model_family`: provider-neutral family such as `:openai`, `:anthropic`, `:qwen`, `:llama`
+- `transport`: `:local`, `:http`, `:rabbitmq`, `:sdk`
+- `tier`: `:local`, `:private`, `:fleet`, `:cloud`, `:frontier`
+- `model`: provider model name or normalized alias
+- `canonical_model_alias`: provider-neutral alias for routers and fleet lanes
 - `usage_type`: `:inference` or `:embedding`
-- `capabilities`:
-- `limits`: context window, output token limits, rate limits, concurrency
-- `health`: readiness, latency, recent failures
-- `policy_tags`:
-- `routing_metadata`:
-- `metadata`: extension
-
-Provider gems that still pass `instance_id`, or that store `model_family`, `canonical_model_alias`, or `alias` under `metadata`, remain compatible. `ModelOffering` lifts those values into first-class readers for routers.
+- `capabilities`: `:chat`, `:tools`, `:json_schema`, `:vision`, `:thinking`, `:embedding`, `:function_calling`
+- `limits`: context window, output token limits, rate limits, concurrency
+- `health`: readiness, latency, recent failures
+- `policy_tags`: `:internal_only`, `:phi_allowed`, `:hipaa`
+- `routing_metadata`: scheduling metadata for routers
+- `metadata`: extension metadata; sensitive values excluded from fleet fingerprints
 
-`Legion::Extensions::Llm::Aliases.canonical_model_alias(model, provider)`
+`Legion::Extensions::Llm::Aliases.canonical_model_alias(model, provider)` normalizes aliases from `aliases.json`.
 
 ## Offering Registry
 
-`Legion::Extensions::Llm::Routing::OfferingRegistry` is an in-memory index
+`Legion::Extensions::Llm::Routing::OfferingRegistry` is an in-memory index.
 
 ```ruby
 registry = Legion::Extensions::Llm::Routing::OfferingRegistry.new
@@ -171,9 +267,70 @@ registry.filter(
 )
 ```
 
+## Fleet Lanes
+
+Fleet routing uses shared work lanes derived from offerings. A lane describes the work, not the worker:
+
+```ruby
+offering.lane_key
+# => "llm.fleet.inference.qwen3-6-27b-q4-k-m.ctx32768"
+```
+
+Embedding lanes omit context size:
+
+```ruby
+Legion::Extensions::Llm::Routing::ModelOffering.new(
+  provider_family: :ollama,
+  instance_id: :gpu_embed_01,
+  transport: :rabbitmq,
+  model: 'nomic-embed-text',
+  usage_type: :embedding,
+  capabilities: %i[embedding]
+).lane_key
+# => "llm.fleet.embed.nomic-embed-text"
+```
+
+Any eligible worker can bind to the same lane: local MacBooks, GPU servers, vLLM workers, Ollama workers, or cloud-side LegionIO workers near Bedrock/Vertex/Azure.
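The two lane keys quoted in the Fleet Lanes section can be reproduced with a small stand-alone sketch. Everything here is an assumption made for illustration: `LaneKeySketch`, its slug rule, and the input model name `qwen3.6:27b-q4_K_M` are hypothetical, and the gem's own `Routing::LaneKey` may derive keys differently.

```ruby
# Hypothetical sketch; not the gem's Routing::LaneKey implementation.
module LaneKeySketch
  module_function

  # Assumed slug rule: lowercase, then collapse each run of
  # non-alphanumeric characters into one hyphen.
  def slug(model)
    model.to_s.downcase.gsub(/[^a-z0-9]+/, '-').gsub(/\A-+|-+\z/, '')
  end

  # Embedding lanes omit the context size; inference lanes append it.
  def lane_key(model:, usage_type:, context_window: nil)
    return "llm.fleet.embed.#{slug(model)}" if usage_type == :embedding

    key = "llm.fleet.inference.#{slug(model)}"
    context_window ? "#{key}.ctx#{context_window}" : key
  end
end

puts LaneKeySketch.lane_key(model: 'nomic-embed-text', usage_type: :embedding)
puts LaneKeySketch.lane_key(model: 'qwen3.6:27b-q4_K_M', usage_type: :inference, context_window: 32_768)
```

Because the key is built only from the model and usage type, any worker serving that model lands on the same lane, which is the "describes the work, not the worker" property stated in the README text.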
+
+## Fleet Protocol
+
+Fleet communication uses protocol v2 envelopes with strict validation:
+
+- `FleetRequest`: encrypted request envelope with `operation`, `request_id`, `correlation_id`, `idempotency_key`, `message_context`, and signed JWT replay token
+- `FleetResponse`: encrypted response envelope with provider output
+- `FleetError`: encrypted error envelope with typed error metadata
+
+When `fleet.compliance.encrypt_fleet` is true (default), all envelopes are encrypted via `Legion::Crypt`. JWT replay tokens validate the `issuer` claim and use hash-based claim validation (no raw PHI in base64 payloads).
+
+`Fleet::ProviderResponder` handles the responder side: token validation, idempotency, provider dispatch, response publishing. `Fleet::WorkerExecution` handles the worker side: lane binding, message consumption, backpressure.
+
+Default fleet settings via `Legion::Extensions::Llm.default_settings` -- fleet and endpoint modes are disabled by default:
+
+```ruby
+{
+  fleet: {
+    enabled: false,
+    scheduler: :basic_get,
+    consumer_priority: 0,
+    queue_expires_ms: 60_000,
+    message_ttl_ms: 120_000,
+    queue_max_length: 100,
+    delivery_limit: 3,
+    consumer_ack_timeout_ms: 300_000,
+    endpoint: {
+      enabled: false,
+      empty_lane_backoff_ms: 250,
+      idle_backoff_ms: 1_000,
+      max_consecutive_pulls_per_lane: 0,
+      accept_when: []
+    }
+  }
+}
+```
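How instance-level overrides combine with these defaults is not shown in the diff. A recursive merge is one plausible reading, sketched below with a trimmed copy of the defaults; `DEFAULT_FLEET` and `deep_merge` are hypothetical names, not part of the gem.

```ruby
# Trimmed copy of the default fleet settings shown above (illustrative only).
DEFAULT_FLEET = {
  enabled: false,
  scheduler: :basic_get,
  consumer_priority: 0,
  endpoint: { enabled: false, idle_backoff_ms: 1_000 }
}.freeze

# Recursively merge overrides into defaults: nested hashes are merged,
# every other value is replaced by the override.
def deep_merge(defaults, overrides)
  defaults.merge(overrides) do |_key, old, new|
    old.is_a?(Hash) && new.is_a?(Hash) ? deep_merge(old, new) : new
  end
end

fleet = deep_merge(DEFAULT_FLEET, { enabled: true, endpoint: { enabled: true } })
puts fleet.inspect
```

With a recursive merge, enabling `endpoint.enabled` keeps the sibling backoff values from the defaults, whereas a shallow `Hash#merge` would replace the whole `endpoint` hash.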
+
 ## Registry Events
 
-`Legion::Extensions::Llm::Routing::RegistryEvent` builds
+`Legion::Extensions::Llm::Routing::RegistryEvent` builds envelopes for `llm.registry` publishing.
 
 ```ruby
 event = Legion::Extensions::Llm::Routing::RegistryEvent.available(
@@ -186,118 +343,102 @@ event = Legion::Extensions::Llm::Routing::RegistryEvent.available(
 )
 
 event.to_h
-# => {
-# event_id: "...",
-# event_type: :offering_available,
-# occurred_at: "2026-04-28T14:30:15.123456Z",
-# offering: { ... },
-# runtime: { host_id: "macbook-m4-max", process: { pid: 12345 } },
-# capacity: { concurrency: 4, queued: 0 },
-# health: { ready: true, latency_ms: 180 },
-# lane: "llm.fleet.inference.qwen3-6-27b-q4-k-m.ctx32768",
-# metadata: { observed_by: :lex_llm_ollama }
-# }
+# => { event_id: "...", event_type: :offering_available, offering: { ... }, ... }
 ```
 
-Supported
+Supported types: `:offering_available`, `:offering_unavailable`, `:offering_degraded`, `:offering_heartbeat`. Sensitive keys (credentials, tokens, secrets, URLs, prompts) are rejected during sanitization.
 
-
+Publishing is handled by `RegistryPublisher` (parameterized by `provider_family`) through the `llm.registry` exchange.
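The README states that sensitive keys are rejected during sanitization but does not show the rule. The sketch below assumes "rejected" means raising on an exact key-name match anywhere in the payload; `SENSITIVE_KEYS`, `SensitiveKeyError`, and `assert_sanitized!` are hypothetical, and the gem's actual key list and behavior may differ.

```ruby
# Hypothetical key list and error class; the gem's real rules are not shown in this diff.
SENSITIVE_KEYS = %w[
  credential credentials token tokens secret secrets
  url base_url prompt prompts api_key
].freeze

class SensitiveKeyError < StandardError; end

# Walk the payload recursively and raise on the first sensitive key name.
# Exact matching keeps fields such as max_output_tokens usable.
def assert_sanitized!(payload, path = [])
  case payload
  when Hash
    payload.each do |key, value|
      here = path + [key]
      raise SensitiveKeyError, "sensitive key: #{here.join('.')}" if SENSITIVE_KEYS.include?(key.to_s)

      assert_sanitized!(value, here)
    end
  when Array
    payload.each_with_index { |item, index| assert_sanitized!(item, path + [index]) }
  end
  payload
end

event = { event_type: :offering_available, health: { ready: true, latency_ms: 180 } }
assert_sanitized!(event) # returns the payload unchanged
```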
+
+## Credential Sources
 
-
+`CredentialSources` provides read-only credential discovery:
 
 ```ruby
-
-
+Legion::Extensions::Llm::CredentialSources.discover_credentials(
+  family: :openai,
+  setting_key: 'OPENAI_API_KEY'
+)
 ```
 
-
+Probes env vars, `~/.claude/settings.json`, `~/.codex/auth.json`, `Legion::Settings`, and optional socket/HTTP endpoints. Credentials are deduplicated via `credential_fingerprint` (first 8 chars of SHA-256). Probing is gated behind `extensions.llm.security.credential_source_probing`.
+
+Each source gets a provenance tag: `CredentialSources.source_tag(type, location, key)`.
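The fingerprint-based deduplication described above can be illustrated with the standard library alone. This is a sketch under assumptions: the method names mirror the README, but the candidate hash shape and the `source` strings are hypothetical, and the gem may hash a normalized form of the value.

```ruby
require 'digest'

# Stand-in for CredentialSources.credential_fingerprint:
# the first 8 hex characters of the SHA-256 digest.
def credential_fingerprint(value)
  Digest::SHA256.hexdigest(value.to_s)[0, 8]
end

# Keep the first candidate per fingerprint; later duplicates are dropped.
def dedup_credentials(candidates)
  candidates.uniq { |candidate| credential_fingerprint(candidate[:value]) }
end

candidates = [
  { value: 'sk-example-1', source: 'env:OPENAI_API_KEY' },          # hypothetical tag format
  { value: 'sk-example-1', source: 'file:~/.codex/auth.json' },     # same secret, second source
  { value: 'sk-example-2', source: 'settings:extensions.llm.openai' }
]

puts dedup_credentials(candidates).map { |candidate| candidate[:source] }.inspect
```

Comparing short fingerprints rather than raw values lets discovery code log and compare credentials without handling the secret itself.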
+
+## Auto Registration
+
+`AutoRegistration` mixin enables providers to self-discover instances and register offerings into `Call::Registry`:
 
 ```ruby
-Legion::Extensions::Llm::
-
-
-  transport: :rabbitmq,
-  model: 'nomic-embed-text',
-  usage_type: :embedding,
-  capabilities: %i[embedding]
-).lane_key
-# => "llm.fleet.embed.nomic-embed-text"
-```
+class MyProvider < Legion::Extensions::Llm::Provider
+  extend Legion::Extensions::Llm::AutoRegistration
+end
 
-
+MyProvider.rediscover! # Re-probe all instances
+```
 
-
-- GPU servers in a datacenter
-- vLLM workers
-- Ollama workers
-- cloud-side LegionIO workers near Bedrock, Vertex, Azure, or another provider
+Discovers instances from settings, builds model offerings via `discover_offerings`, and registers them. Passes tier and capabilities metadata to the registry.
 
-
+## Streaming
 
-
+`Streaming` provides the streaming framework for OpenAI-compatible SSE responses:
 
-
+- Faraday middleware handles chunk parsing, thinking extraction, and error handling
+- `StreamAccumulator` accumulates deltas into complete messages with tool-call assembly
+- Retries on HTTP 500 with partial body preservation
+- Handles both Net::HTTP and Typhoeus adapters (Typhoeus chunks arrive with nil/0 status during streaming)
+- Provider thinking (`</think>` tags, `reasoning_content`) is stripped from caller-visible content
 
 ```ruby
-
-#
-
-# enabled: false,
-# scheduler: :basic_get,
-# consumer_priority: 0,
-# queue_expires_ms: 60_000,
-# message_ttl_ms: 120_000,
-# queue_max_length: 100,
-# delivery_limit: 3,
-# consumer_ack_timeout_ms: 300_000,
-# endpoint: {
-# enabled: false,
-# empty_lane_backoff_ms: 250,
-# idle_backoff_ms: 1_000,
-# max_consecutive_pulls_per_lane: 0,
-# accept_when: []
-# }
-# }
-# }
+provider.stream_chat(messages:, model:, tools: []) do |chunk|
+  # chunk is a Chunk or StreamChunk with content_delta, reasoning_delta, tool_call_delta
+end
 ```
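The accumulation step the Streaming section attributes to `StreamAccumulator` can be sketched as follows. This is not the gem's class: `AccumulatorSketch` is hypothetical, chunks are modeled as plain hashes using the field names from the comment above, and the shape of `tool_call_delta` (`index`, `id`, `name`, `arguments`) is an assumption borrowed from OpenAI-style streaming.

```ruby
require 'json'

# Hypothetical accumulator; chunk hashes stand in for Chunk/StreamChunk objects.
class AccumulatorSketch
  attr_reader :content, :reasoning

  def initialize
    @content = +''
    @reasoning = +''
    @tool_calls = {} # index => { id:, name:, arguments: }
  end

  # Fold one streamed chunk into the running message state.
  def add(chunk)
    @content << chunk[:content_delta].to_s
    @reasoning << chunk[:reasoning_delta].to_s
    delta = chunk[:tool_call_delta]
    return self unless delta

    call = (@tool_calls[delta[:index]] ||= { id: nil, name: nil, arguments: +'' })
    call[:id] ||= delta[:id]
    call[:name] ||= delta[:name]
    call[:arguments] << delta[:arguments].to_s # argument JSON arrives in fragments
    self
  end

  # Parse each call's argument JSON only once the stream has finished.
  def tool_calls
    @tool_calls.sort_by { |index, _call| index }.map do |_index, call|
      arguments = call[:arguments].empty? ? {} : JSON.parse(call[:arguments])
      { id: call[:id], name: call[:name], arguments: arguments }
    end
  end
end

acc = AccumulatorSketch.new
[
  { reasoning_delta: 'User wants docs. ' },
  { tool_call_delta: { index: 0, id: 'call_1', name: 'search', arguments: '{"que' } },
  { tool_call_delta: { index: 0, arguments: 'ry":"fleet"}' } },
  { content_delta: 'Searching' },
  { content_delta: '...' }
].each { |chunk| acc.add(chunk) }
```

Keeping reasoning in its own buffer is what allows thinking to stay out of caller-visible content, and deferring `JSON.parse` avoids failing on a half-received argument string.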
 
-
+## Schema & Tools
 
--
-- endpoint fleet mode is separately disabled by default
-- queue and message TTLs are bounded
-- pull scheduling is the default for endpoint-style workers
-- provider gems can override defaults through `Legion::Settings`
-
-Provider gems can build a complete provider settings hash without duplicating merge logic:
+`Legion::Extensions::Llm::Schema` bridges `ruby_llm-schema` for JSON schema tool definitions. Tools are defined as:
 
 ```ruby
-Legion::Extensions::Llm.
-
-
-
-
+Legion::Extensions::Llm::Tool.new(
+  name: 'search',
+  description: 'Search the knowledge base',
+  parameters: {
+    type: 'object',
+    properties: {
+      query: { type: 'string', description: 'Search query' }
+    },
+    required: %w[query]
   }
 )
 ```
 
-##
+## Response Objects
+
+All provider responses should normalize through the shared response objects:
 
-
+- `Responses::ChatResponse` -- chat completions with message, usage, thinking, finish_reason
+- `Responses::EmbeddingResponse` -- vectors, usage, model
+- `Responses::StreamChunk` -- streaming deltas
+- `Responses::ThinkingExtractor` -- extracts thinking from multiple formats (reasoning_content, `</think>` tags, untagged preambles)
 
-
+Provider-specific thinking is always separated from caller-visible content.
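A rough sketch of separating thinking from visible content follows. `split_thinking` is a hypothetical helper, not `Responses::ThinkingExtractor`; it assumes the thinking leads the response, and it reads "untagged preamble" as text closed by a bare `</think>` with no opening tag, which may not match the gem's definition.

```ruby
# Hypothetical helper; returns [thinking, visible_content].
def split_thinking(text, reasoning_content: nil)
  # A dedicated reasoning_content field wins over anything embedded in the text.
  return [reasoning_content, text.strip] if reasoning_content

  # Match an optional opening tag, the thinking body, and the closing tag
  # at the start of the response.
  match = text.match(%r{\A\s*(?:<think>)?(.*?)</think>}m)
  return [nil, text.strip] unless match

  [match[1].strip, match.post_match.strip]
end

thinking, visible = split_thinking("<think>Check the docs first.</think>\nThe answer is 42.")
puts thinking
puts visible
```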
 
-
-- provider default settings
-- model discovery or a static model offering registry
-- provider request translation
-- provider response translation
-- health and readiness checks
-- embedding support separately from inference support when the provider exposes both
+---
 
-Provider
+## Provider Extension Contract
 
-
+A provider gem uses `lex-llm` for shared behavior and implements only provider-specific transport, authentication, model discovery, and translation.
+
+At minimum, a provider extension defines:
+
+- `Legion::Extensions::Llm::<Provider>` namespace
+- Provider default settings
+- Model discovery or static model offering registry
+- Provider request/response translation
+- Health and readiness checks
+
+Canonical provider calls (all keyword-based):
 
 ```ruby
 provider.chat(messages:, model:, tools: [], temperature: nil, params: {}, headers: {}, schema: nil, thinking: nil)
@@ -309,27 +450,63 @@ provider.health(live: false)
 provider.discover_offerings(live: false, **filters)
 ```
 
-
+Inherited from `Provider`:
+
+- `#readiness(live: false)` -- configured state, locality, base URL, non-live health metadata
+- `#model_detail(model_name)` -- cache-backed lookup (24h TTL; nil results not cached)
+- `#model_allowed?(model_name)` -- whitelist/blacklist check
+- `#discover_offerings(live: false)` -- cached live discovery when `live: false`, probes endpoints when `true`
+- `#offering_transport` / `#offering_tier` -- instance methods with class-level `default_transport`/`default_tier` overrides
+- `#runtime_provider_setting(key)` -- fallback to `Legion::Settings` for model whitelist/blacklist
 
-
+Inherited from `Provider::OpenAICompatible`:
 
-
+- Full OpenAI-compatible API translation
+- Model list parsing with capability/modality normalization
+- Streaming with thinking extraction
+- Embedding, image, transcription, moderation support
+- `fetch_model_detail` override hook for live API model metadata
 
-##
+## Configuration
 
-
+Provider settings are built with `Legion::Extensions::Llm.provider_settings`:
 
 ```ruby
-Legion::Extensions::Llm
+Legion::Extensions::Llm.provider_settings(
+  family: :ollama,
+  instance: {
+    base_url: 'http://localhost:11434',
+    fleet: { enabled: true, consumer_priority: 10 }
+  }
+)
 ```
 
-
+`ProviderSettings.infer_tier_from_endpoint(url)` returns `:local` for localhost/loopback, `:direct` for all other hosts.
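The localhost/loopback rule can be approximated with Ruby's standard `uri` and `ipaddr` libraries. This is a stand-alone sketch that borrows the method name for readability; treating unparseable or host-less URLs as `:direct` is an assumption, and the gem's implementation may check hosts differently.

```ruby
require 'uri'
require 'ipaddr'

# Sketch only: :local for localhost or a loopback IP, :direct otherwise.
def infer_tier_from_endpoint(url)
  host = URI.parse(url.to_s).hostname.to_s.downcase # hostname strips IPv6 brackets
  return :direct if host.empty?
  return :local if host == 'localhost'

  IPAddr.new(host).loopback? ? :local : :direct
rescue IPAddr::Error, URI::InvalidURIError
  # Regular DNS names are not IP literals, and malformed URLs cannot be local.
  :direct
end

puts infer_tier_from_endpoint('http://localhost:11434')
puts infer_tier_from_endpoint('https://api.openai.com/v1')
```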
 
-
-RubyLLM::Schema
-```
+Key settings paths:
 
-
+- `extensions.llm.fleet` -- fleet participation and behavior
+- `extensions.llm.fleet.endpoint` -- endpoint-style worker configuration
+- `extensions.llm.fleet.compliance.encrypt_fleet` -- encrypt fleet envelopes (default true)
+- `extensions.llm.fleet.auth.verify_issuer` -- validate JWT issuer (default true)
+- `extensions.llm.security.credential_source_probing` -- gate credential probing (default true)
+- `extensions.llm.model_whitelist` / `model_blacklist` -- provider-level model filters
+- `extensions.llm.<family>.instance.<name>.model_whitelist` -- per-instance override
+
+---
+
+## Provider Dependencies
+
+| Extension | Depends on |
+|-----------|-----------|
+| `Provider` | `Legion::Cache::Helper`, `Legion::Logging::Helper`, `Legion::Settings`, `Legion::JSON` |
+| `Streaming` | Faraday (`:typhoeus` or `:net_http`), Typhoeus |
+| `Connection` | Faraday, Faraday::Typhoeus |
+| `CredentialSources` | `Legion::Settings` (for Legion-settings probes) |
+| `Fleet::*` | `Legion::Crypt` (when `encrypt_fleet` is true), `Legion::Transport` (AMQP via bunny) |
+| `Schema` | `ruby_llm-schema` |
+
+Runtime gem dependencies: `legion-json`, `legion-settings`, `legion-logging`, `legion-cache`, `faraday`, `faraday-typhoeus`, `ruby_llm-schema`.
 
 ## Development
 
@@ -339,20 +516,39 @@ Install dependencies:
 bundle install
 ```
 
-Run
+Run the full test suite:
 
 ```bash
-bundle exec
+bundle exec rspec
 ```
 
-Run
+Run lint and auto-correct:
 
 ```bash
-bundle exec
+bundle exec rubocop -A
 ```
 
 `Gemfile.lock` is intentionally not committed for this repo.
 
+### Testing Rules
+
+- Do NOT mock `Legion::Settings`, `Legion::Logging`, `Legion::JSON`, or `Legion::Cache` -- require the real gems
+- `Legion::Cache.setup` activates the Memory adapter in test (no Redis needed)
+- `Faraday::ConnectionFailed` is rescued in `discover_offerings` with a concise log
+- `bundle exec rspec && bundle exec rubocop -A` is the gate before committing
+
+## Key Patterns
+
+- `Provider` includes `Legion::Cache::Helper` -- use `cache_get`/`cache_set` directly
+- `model_detail(model_name)` -- cache-backed lookup (cache_get -> fetch_model_detail -> cache_set if non-nil)
+- `fetch_model_detail` -- override in subclass for live API calls; return `{ context_window: N }` or nil
+- `model_detail_cache_key` includes credential fingerprint for non-local providers
+- `model_whitelist`/`model_blacklist` -- checks instance config first, then provider settings
+- `discover_offerings` filters via `model_allowed?` and rescues `Faraday::ConnectionFailed`
+- Faraday response logger: `errors: false` -- never dump raw stacktraces from HTTP failures
+- `CredentialSources.source_tag(type, location, key)` -- provenance tag for discovered credentials
+- `CredentialSources.credential_fingerprint(value)` -- first 8 chars of SHA-256
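The `model_detail` pattern listed under Key Patterns (read the cache, fetch on a miss, store only non-nil results) can be sketched with an in-memory stand-in for the cache helper. `ProviderSketch` is hypothetical; the `cache_set(key, value, ttl:)` signature, the key layout, and the stubbed `fetch_model_detail` are assumptions, since `Legion::Cache::Helper` is not shown in this diff.

```ruby
require 'digest'

# Hypothetical provider with an in-memory stand-in for Legion::Cache::Helper.
# The ttl: keyword is an assumed signature; it is recorded but not enforced here.
class ProviderSketch
  MODEL_DETAIL_TTL = 24 * 60 * 60 # 24 hours, in seconds

  attr_reader :fetches

  def initialize(api_key: nil, local: true)
    @api_key = api_key
    @local = local
    @store = {}
    @fetches = 0
  end

  # cache_get -> fetch_model_detail -> cache_set, skipping the write for nil.
  def model_detail(model_name)
    key = model_detail_cache_key(model_name)
    cached = cache_get(key)
    return cached unless cached.nil?

    detail = fetch_model_detail(model_name)
    cache_set(key, detail, ttl: MODEL_DETAIL_TTL) unless detail.nil?
    detail
  end

  # Non-local providers add a credential fingerprint so accounts do not share entries.
  def model_detail_cache_key(model_name)
    parts = ['model_detail', model_name]
    parts << Digest::SHA256.hexdigest(@api_key.to_s)[0, 8] unless @local
    parts.join(':')
  end

  private

  # A real subclass would call the provider API; this stub knows one model.
  def fetch_model_detail(model_name)
    @fetches += 1
    model_name == 'known-model' ? { context_window: 32_768 } : nil
  end

  def cache_get(key)
    entry = @store[key]
    entry && entry[:value]
  end

  def cache_set(key, value, ttl:)
    @store[key] = { value: value, ttl: ttl }
  end
end

provider = ProviderSketch.new
2.times { provider.model_detail('known-model') }
puts provider.fetches
```

Skipping the cache write for nil means a model that is temporarily missing from the provider is retried on the next lookup rather than being hidden for the full TTL.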
+
 ## Attribution
 
 `lex-llm` began as a LegionIO fork of RubyLLM. RubyLLM remains credited under the MIT license in `LICENSE`.
data/lex-llm.gemspec
CHANGED
@@ -35,6 +35,7 @@ Gem::Specification.new do |spec|
   spec.add_dependency 'faraday-multipart', '>= 1'
   spec.add_dependency 'faraday-net_http', '>= 1'
   spec.add_dependency 'faraday-retry', '>= 1'
+  spec.add_dependency 'faraday-typhoeus', '>= 0.2'
   spec.add_dependency 'legion-cache', '>= 1.3.0'
   spec.add_dependency 'legion-crypt', '>= 1.5.1'
   spec.add_dependency 'legion-json', '>= 1.2.1'
data/lib/legion/extensions/llm/configuration.rb
CHANGED
@@ -53,6 +53,10 @@ module Legion
 option :log_stream_debug, -> { ENV['LEGION_LLM_STREAM_DEBUG'] == 'true' }
 option :log_regexp_timeout, -> { Regexp.respond_to?(:timeout) ? (Regexp.timeout || 1.0) : nil }
 
+# Prompt caching
+option :llm_cache_enabled, true
+option :cache_control_prefix_tokens, 4
+
 def initialize
   self.class.send(:defaults).each do |key, default|
     value = default.respond_to?(:call) ? instance_exec(&default) : default
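The hunk shows only the `option` declarations and the first lines of `initialize`. A minimal way such an option DSL could fit together is sketched below; `ConfigSketch`, the body of `option`, and the setter call inside `initialize` are assumptions, not the gem's `Configuration` class.

```ruby
# Hypothetical reconstruction of an option DSL with literal and callable defaults.
class ConfigSketch
  class << self
    # Record the default and define a reader/writer pair for the option.
    def option(key, default = nil)
      defaults[key] = default
      attr_accessor key
    end

    private

    def defaults
      @defaults ||= {}
    end
  end

  option :log_stream_debug, -> { ENV['LEGION_LLM_STREAM_DEBUG'] == 'true' }

  # Prompt caching
  option :llm_cache_enabled, true
  option :cache_control_prefix_tokens, 4

  def initialize
    self.class.send(:defaults).each do |key, default|
      # Callable defaults are evaluated per instance; literals are used as-is.
      value = default.respond_to?(:call) ? instance_exec(&default) : default
      public_send("#{key}=", value)
    end
  end
end

config = ConfigSketch.new
puts config.llm_cache_enabled
puts config.cache_control_prefix_tokens
```

Evaluating lambdas in `initialize` rather than at class load means environment-driven defaults such as `log_stream_debug` are read when each configuration object is built.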