lex-llm 0.4.15 → 0.4.18

data/README.md CHANGED
@@ -2,42 +2,35 @@
2
2
 
3
3
  [![CI](https://github.com/LegionIO/lex-llm/actions/workflows/ci.yml/badge.svg)](https://github.com/LegionIO/lex-llm/actions/workflows/ci.yml)
4
4
 
5
- Shared LegionIO framework for LLM provider extensions.
6
-
7
- `lex-llm` is a standard Legion extension gem. It does not expose a standalone RubyLLM-compatible API, Rails integration, generators, rake tasks, or concrete providers. Its runtime contract is `Legion::Extensions::Llm`, which provider gems extend through nested namespaces such as `Legion::Extensions::Llm::Ollama`.
8
-
9
- The routing principle is simple: provider is not the routing unit anymore. A concrete model offering is.
10
-
11
- That lets Legion reason about one local Ollama instance with many models, multiple remote Ollama or vLLM instances, Bedrock accounts in different regions, direct frontier providers, and fleet workers on MacBooks, GPU servers, or cloud-side proxy nodes.
12
-
13
- ## What This Gem Owns
14
-
15
- `lex-llm` provides provider-neutral primitives only. Provider-specific behavior belongs in provider gems.
16
-
17
- This gem owns:
18
-
19
- - `Legion::Extensions::Llm`, the Legion extension namespace used by autoloading and settings
20
- - provider-neutral request, response, message, content, token, and tool objects
21
- - schema bridging through `Legion::Extensions::Llm::Schema`
22
- - model metadata and capability normalization
23
- - routing structures such as `Legion::Extensions::Llm::Routing::ModelOffering`
24
- - fleet lane key generation for shared RabbitMQ work lanes
25
- - shared chat, embedding, moderation, image, transcription, streaming, and OpenAI-compatible adapter helpers
26
- - shared runtime dependencies such as `legion-json`, `legion-settings`, and `legion-logging`
27
-
28
- Concrete provider gems should depend on this gem and implement the provider-specific transport, authentication, model discovery, request translation, response translation, and health checks.
29
-
30
- Expected provider gems include:
31
-
32
- - `lex-llm-ollama`
33
- - `lex-llm-vllm`
34
- - `lex-llm-anthropic`
35
- - `lex-llm-openai`
36
- - `lex-llm-gemini`
37
- - `lex-llm-mlx`
38
- - `lex-llm-bedrock`
39
- - `lex-llm-vertex`
40
- - `lex-llm-azure-foundry`
5
+ Base provider framework for all LegionIO LLM provider extensions.
6
+
7
+ `lex-llm` is a standard Legion extension gem that provides provider-neutral primitives for LLM integration. It does not include concrete provider implementations -- those live in `lex-llm-*` gems (e.g. `lex-llm-ollama`, `lex-llm-openai`, `lex-llm-bedrock`). The routing unit is a **model offering**, not a provider, enabling Legion to reason about any combination of local instances, remote servers, cloud providers, and fleet workers.
8
+
9
+ ---
10
+
11
+ ## Quick Index
12
+
13
+ | Topic | Section |
14
+ |-------|---------|
15
+ | Install & depend | [Install](#install) |
16
+ | Extension namespace | [Namespace](#namespace) |
17
+ | Core classes & files | [Class Index](#class-index) |
18
+ | Model offerings (routing) | [Model Offerings](#model-offerings) |
19
+ | In-memory offering registry | [Offering Registry](#offering-registry) |
20
+ | Fleet lanes & work routing | [Fleet Lanes](#fleet-lanes) |
21
+ | Fleet protocol v2 | [Fleet Protocol](#fleet-protocol) |
22
+ | Registry events | [Registry Events](#registry-events) |
23
+ | Provider contract | [Provider Extension Contract](#provider-extension-contract) |
24
+ | Streaming & accumulator | [Streaming](#streaming) |
25
+ | Credential discovery | [Credential Sources](#credential-sources) |
26
+ | Auto-registration | [Auto Registration](#auto-registration) |
27
+ | Provider settings | [Provider Settings](#provider-settings) |
28
+ | Schema & tools | [Schema & Tools](#schema--tools) |
29
+ | Response objects | [Response Objects](#response-objects) |
30
+ | Configuration | [Configuration](#configuration) |
31
+ | Running tests | [Development](#development) |
32
+
33
+ ---
41
34
 
42
35
  ## Install
43
36
 
@@ -61,9 +54,7 @@ Load the extension through the Legion namespace:
61
54
  require 'legion/extensions/llm'
62
55
  ```
63
56
 
64
- Provider gems must use nested Legion extension namespaces so LegionIO autoloading can find them consistently.
65
-
66
- Example for `lex-llm-ollama`:
57
+ All classes live under `Legion::Extensions::Llm`. Provider gems must use nested Legion extension namespaces so LegionIO autoloading finds them consistently:
67
58
 
68
59
  ```ruby
69
60
  require 'legion/extensions/llm'
@@ -84,6 +75,113 @@ module Legion
84
75
  end
85
76
  ```
86
77
 
78
+ ---
79
+
80
+ ## Class Index
81
+
82
+ ### Core
83
+ | Class | File | Purpose |
84
+ |-------|------|---------|
85
+ | `Provider` | `lib/.../provider.rb` | Base class for all provider adapters. Includes `Legion::Cache::Helper` and `Legion::Logging::Helper`. Mixin entry point for credentials, model caching, and model whitelist/blacklist. |
86
+ | `Provider::OpenAICompatible` | `lib/.../provider/open_ai_compatible.rb` | Shared adapter for OpenAI-compatible servers (vLLM, Ollama, MLX, local proxies). Handles request/response translation, streaming, tool calls, embedding, image, transcription, and thinking extraction. |
87
+ | `ProviderContract` | `lib/.../provider_contract.rb` | Defines the canonical provider interface: `chat`, `stream_chat`, `embed`, `image`, `count_tokens`, `health`, `discover_offerings`. Raises `UnsupportedCapability` for unimplemented methods. |
88
+ | `Configuration` | `lib/.../configuration.rb` | Hash-backed provider config wrapper; normalizes instance-level and fleet-level settings. |
89
+ | `ProviderSettings` | `lib/.../provider_settings.rb` | Builds complete provider settings from `family`, `instance`, and nested fleet settings. Includes `infer_tier_from_endpoint(url)` to detect `:local` vs `:direct`. |
90
+
91
+ ### Requests & Data Types
92
+ | Class | File | Purpose |
93
+ |-------|------|---------|
94
+ | `Message` | `lib/.../message.rb` | Structured message (role, content, tool calls, attachments, thinking). |
95
+ | `Content` | `lib/.../content.rb` | Content part (text, image, file, tool result) with MIME type support. |
96
+ | `Tool` | `lib/.../tool.rb` | Tool definition (name, description, parameters, strict mode). |
97
+ | `ToolCall` | `lib/.../tool_call.rb` | Tool call result (id, function name, arguments, result). |
98
+ | `Attachment` | `lib/.../attachment.rb` | File attachment with content, filename, and MIME type. |
99
+ | `Chunk` | `lib/.../chunk.rb` | Streaming chunk wrapper (content delta, reasoning, tool call delta, usage). |
100
+ | `Context` | `lib/.../context.rb` | Conversation context builder; normalizes history and strips thinking. |
101
+ | `Thinking` | `lib/.../thinking.rb` | Thinking/reasoning metadata extracted from provider output. |
102
+ | `MimeType` | `lib/.../mime_type.rb` | MIME type utilities for image and file content. |
103
+
104
+ ### Model & Routing
105
+ | Class | File | Purpose |
106
+ |-------|------|---------|
107
+ | `Model::Info` | `lib/.../model/info.rb` | Immutable `Data.define` struct: `instance`, `provider_family`, `provider_model`, `parameter_count`, `quantization`, `size_bytes`, `modalities_input/output`, `context_window`, `max_output_tokens`, `pricing`, `capabilities`, `created_at`, `knowledge_cutoff`. Factory: `Model::Info.from_hash` for legacy hash compatibility. |
108
+ | `Model::Modalities` | `lib/.../model/modalities.rb` | Canonical modality symbols and helpers. |
109
+ | `Model::Pricing` | `lib/.../model/pricing.rb` | Pricing data struct with `PricingCategory` and `PricingTier`. |
110
+ | `Models` | `lib/.../models.rb` | Shared model listing and metadata normalization. Uses `Call::Registry` with namespace-scanning fallback. |
111
+ | `Routing::ModelOffering` | `lib/.../routing/model_offering.rb` | Concrete offering: one model on one provider instance. Routing/filtering/health/policy unit. See [Model Offerings](#model-offerings). |
112
+ | `Routing::OfferingRegistry` | `lib/.../routing/offering_registry.rb` | In-memory index for offerings. See [Offering Registry](#offering-registry). |
113
+ | `Routing::LaneKey` | `lib/.../routing/lane_key.rb` | Derives fleet lane key strings from offerings. |
114
+ | `Aliases` | `lib/.../aliases.rb` | Canonical model alias normalization from `aliases.json`. |
115
+ | `Routing::RegistryEvent` | `lib/.../routing/registry_event.rb` | Envelope builder for registry availability events. |
116
+
117
+ ### Responses
118
+ | Class | File | Purpose |
119
+ |-------|------|---------|
120
+ | `Responses::ChatResponse` | `lib/.../responses/chat_response.rb` | Normalized chat response: message, usage, thinking, finish_reason. |
121
+ | `Responses::EmbeddingResponse` | `lib/.../responses/embedding_response.rb` | Normalized embedding response: vectors, usage, model. |
122
+ | `Responses::StreamChunk` | `lib/.../responses/stream_chunk.rb` | Normalized stream chunk with delta fields and metadata. |
123
+ | `Responses::ThinkingExtractor` | `lib/.../responses/thinking_extractor.rb` | Extracts thinking/reasoning from provider output (`reasoning_content`, `<think>...</think>` tags, untagged preambles). |
124
+
125
+ ### Streaming
126
+ | Class | File | Purpose |
127
+ |-------|------|---------|
128
+ | `Streaming` | `lib/.../streaming.rb` | Streaming framework: Faraday middleware, chunk parsing, retry on status 500, thinking extraction, error handling. Handles both Net::HTTP and Typhoeus adapters. |
129
+ | `StreamAccumulator` | `lib/.../stream_accumulator.rb` | Accumulates streaming deltas into complete messages; assembles partial tool-call arguments, separates thinking from content, builds tool call arrays. |
130
+
131
+ ### Fleet (Protocol v2)
132
+ | Class | File | Purpose |
133
+ |-------|------|---------|
134
+ | `Fleet::Protocol` | `lib/.../fleet/protocol.rb` | Protocol v2 constants, field names, and versioning. |
135
+ | `Fleet::EnvelopeValidation` | `lib/.../fleet/envelope_validation.rb` | Validates v2 envelopes; rejects legacy fields. |
136
+ | `Fleet::TokenValidator` | `lib/.../fleet/token_validator.rb` | Validates JWT replay tokens with issuer verification and hash-based claims. |
137
+ | `Fleet::TokenError` | `lib/.../fleet/token_error.rb` | Token validation error types. |
138
+ | `Fleet::Settings` | `lib/.../fleet/settings.rb` | Default fleet settings builder (consumer, auth, endpoint). |
139
+ | `Fleet::ProviderResponder` | `lib/.../fleet/provider_responder.rb` | Responder-side execution: receives fleet requests, validates tokens, dispatches to provider, publishes responses. |
140
+ | `Fleet::WorkerExecution` | `lib/.../fleet/worker_execution.rb` | Worker-side execution: binds to lanes, pulls/consumes messages, manages backpressure. |
141
+ | `Fleet::DefaultExchangeReply` | `lib/.../fleet/default_exchange_reply.rb` | Publishes replies via AMQP default exchange with publisher confirms. |
142
+ | `Fleet::PublishSafety` | `lib/.../fleet/publish_safety.rb` | Guards against infinite requeues on publish failure. |
143
+ | `Transport::Messages::FleetRequest` | `lib/.../transport/messages/fleet_request.rb` | Encrypted fleet request envelope (v2). |
144
+ | `Transport::Messages::FleetResponse` | `lib/.../transport/messages/fleet_response.rb` | Encrypted fleet response envelope (v2). |
145
+ | `Transport::Messages::FleetError` | `lib/.../transport/messages/fleet_error.rb` | Encrypted fleet error envelope (v2). |
146
+ | `Transport::Exchanges::Fleet` | `lib/.../transport/exchanges/fleet.rb` | Fleet exchange declarations. |
147
+ | `Transport::Exchanges::LlmRegistry` | `lib/.../transport/exchanges/llm_registry.rb` | Registry exchange for offering availability events. |
148
+ | `Transport::FleetLane` | `lib/.../transport/fleet_lane.rb` | Fleet lane declaration and binding. |
149
+ | `RegistryPublisher` | `lib/.../registry_publisher.rb` | Publishes registry events to `llm.registry` exchange. |
150
+ | `RegistryEventBuilder` | `lib/.../registry_event_builder.rb` | Builds sanitized registry event messages. |
151
+
152
+ ### Credentials & Discovery
153
+ | Class | File | Purpose |
154
+ |-------|------|---------|
155
+ | `CredentialSources` | `lib/.../credential_sources.rb` | Read-only probes: env vars, `~/.claude/settings.json`, `~/.codex/auth.json`, `Legion::Settings`, socket/HTTP probes. SHA-256 credential dedup via `credential_fingerprint`. Includes `source_tag(type, location, key)` for provenance. Probing gated behind `extensions.llm.security.credential_source_probing`. |
156
+ | `AutoRegistration` | `lib/.../auto_registration.rb` | Mixin for provider self-registration into `Call::Registry`. Discovers instances, builds offerings, handles rediscovery. Pure discovery -- no upward registry mutation. |
157
+
158
+ ### Capabilities
159
+ | Class | File | Purpose |
160
+ |-------|------|---------|
161
+ | `Chat` | `lib/.../chat.rb` | Shared chat request builder and parameter normalization. |
162
+ | `Embedding` | `lib/.../embedding.rb` | Embedding request builder. |
163
+ | `Image` | `lib/.../image.rb` | Image generation request builder. |
164
+ | `Moderation` | `lib/.../moderation.rb` | Moderation request builder. |
165
+ | `Tokens` | `lib/.../tokens.rb` | Token counting request builder. |
166
+ | `Transcription` | `lib/.../transcription.rb` | Audio transcription request builder. |
167
+ | `Agent` | `lib/.../agent.rb` | Agent-specific context and parameter helpers. |
168
+
169
+ ### Connection
170
+ | Class | File | Purpose |
171
+ |-------|------|---------|
172
+ | `Connection` | `lib/.../connection.rb` | Faraday connection builder with `:typhoeus` adapter preference, bearer token redaction in logs, middleware stack, and error handling. |
173
+
174
+ ### Misc
175
+ | Class | File | Purpose |
176
+ |-------|------|---------|
177
+ | `Schema` | `lib/.../schema.rb` | Bridge to `ruby_llm-schema` for JSON schema tool definitions. |
178
+ | `Error` | `lib/.../error.rb` | Base error class for lex-llm. |
179
+ | `Errors::UnsupportedCapability` | `lib/.../errors/unsupported_capability.rb` | Raised when a provider lacks a requested capability. |
180
+ | `Utils` | `lib/.../utils.rb` | Shared utility methods. |
181
+ | `VERSION` | `lib/.../version.rb` | Current gem version (`0.4.18`). |
182
+
183
+ ---
184
+
87
185
  ## Model Offerings
88
186
 
89
187
  A model offering describes one concrete model made available by one provider instance. It is the base unit for routing, filtering, fleet lane creation, health, policy, and cost decisions.
@@ -132,30 +230,28 @@ offering.eligible_for?(
132
230
 
133
231
  Common offering fields:
134
232
 
135
- - `offering_id`: stable identifier for the concrete offering; generated from provider, instance, usage type, and canonical alias when omitted
136
- - `provider_family`: provider implementation family, such as `:ollama`, `:vllm`, `:bedrock`, `:anthropic`, or `:openai`
233
+ - `offering_id`: stable identifier; generated from provider, instance, usage type, and canonical alias when omitted
234
+ - `provider_family`: `:ollama`, `:vllm`, `:bedrock`, `:anthropic`, `:openai`, etc.
137
235
  - `provider_instance`: concrete provider instance, account, node, region, or local runtime
138
236
  - `instance_id`: compatibility alias for `provider_instance`
139
- - `model_family`: provider-neutral family such as `:openai`, `:anthropic`, `:gemini`, `:qwen`, or `:llama`
140
- - `transport`: `:local`, `:http`, `:rabbitmq`, `:sdk`, or another provider-supported transport
141
- - `tier`: `:local`, `:private`, `:fleet`, `:cloud`, `:frontier`, or deployment-specific policy tier
142
- - `model`: provider model name or normalized model alias
143
- - `canonical_model_alias`: provider-neutral alias used by routers and shared fleet lane keys when a provider deployment hides the base model
237
+ - `model_family`: provider-neutral family such as `:openai`, `:anthropic`, `:qwen`, `:llama`
238
+ - `transport`: `:local`, `:http`, `:rabbitmq`, `:sdk`
239
+ - `tier`: `:local`, `:private`, `:fleet`, `:cloud`, `:frontier`
240
+ - `model`: provider model name or normalized alias
241
+ - `canonical_model_alias`: provider-neutral alias for routers and fleet lanes
144
242
  - `usage_type`: `:inference` or `:embedding`
145
- - `capabilities`: normalized feature flags such as `:chat`, `:tools`, `:json_schema`, `:vision`, `:thinking`, or `:embedding`
146
- - `limits`: context window, output token limits, rate limits, concurrency limits, and provider-specific bounds
147
- - `health`: readiness, latency, recent failures, and provider-specific health metadata
148
- - `policy_tags`: routing and compliance tags such as `:internal_only`, `:phi_allowed`, or `:hipaa`
149
- - `routing_metadata`: provider-neutral scheduling metadata for routers; persistence is intentionally out of scope
150
- - `metadata`: extension-specific metadata; sensitive values are excluded from fleet eligibility fingerprints
151
-
152
- Provider gems that still pass `instance_id`, or that store `model_family`, `canonical_model_alias`, or `alias` under `metadata`, remain compatible. `ModelOffering` lifts those values into first-class readers for routers.
243
+ - `capabilities`: `:chat`, `:tools`, `:json_schema`, `:vision`, `:thinking`, `:embedding`, `:function_calling`
244
+ - `limits`: context window, output token limits, rate limits, concurrency
245
+ - `health`: readiness, latency, recent failures
246
+ - `policy_tags`: `:internal_only`, `:phi_allowed`, `:hipaa`
247
+ - `routing_metadata`: scheduling metadata for routers
248
+ - `metadata`: extension metadata; sensitive values excluded from fleet fingerprints
153
249
 
154
- `Legion::Extensions::Llm::Aliases.canonical_model_alias(model, provider)` provides shared alias normalization from `aliases.json`, with an explicit model string fallback.
250
+ `Legion::Extensions::Llm::Aliases.canonical_model_alias(model, provider)` normalizes aliases from `aliases.json`.
155
251
 
156
252
  ## Offering Registry
157
253
 
158
- `Legion::Extensions::Llm::Routing::OfferingRegistry` is an in-memory index for discovered or configured offerings. It does not persist state.
254
+ `Legion::Extensions::Llm::Routing::OfferingRegistry` is an in-memory index.
159
255
 
160
256
  ```ruby
161
257
  registry = Legion::Extensions::Llm::Routing::OfferingRegistry.new
@@ -171,9 +267,70 @@ registry.filter(
171
267
  )
172
268
  ```
173
269
 
270
+ ## Fleet Lanes
271
+
272
+ Fleet routing uses shared work lanes derived from offerings. A lane describes the work, not the worker:
273
+
274
+ ```ruby
275
+ offering.lane_key
276
+ # => "llm.fleet.inference.qwen3-6-27b-q4-k-m.ctx32768"
277
+ ```
278
+
279
+ Embedding lanes omit context size:
280
+
281
+ ```ruby
282
+ Legion::Extensions::Llm::Routing::ModelOffering.new(
283
+ provider_family: :ollama,
284
+ instance_id: :gpu_embed_01,
285
+ transport: :rabbitmq,
286
+ model: 'nomic-embed-text',
287
+ usage_type: :embedding,
288
+ capabilities: %i[embedding]
289
+ ).lane_key
290
+ # => "llm.fleet.embed.nomic-embed-text"
291
+ ```
292
+
293
+ Any eligible worker can bind to the same lane: local MacBooks, GPU servers, vLLM workers, Ollama workers, or cloud-side LegionIO workers near Bedrock/Vertex/Azure.
294
+
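The lane keys shown above suggest a simple naming scheme. The following is a minimal sketch of how such keys could be derived, not the gem's actual `Routing::LaneKey` implementation: the helper name `sketch_lane_key`, the slug rule, and the input model string `qwen3.6:27b-q4_K_M` are all assumptions chosen to reproduce the two example keys.

```ruby
# Hypothetical helper: derives a lane key string from a model name.
# The slug rule (lowercase, non-alphanumeric runs become "-") is an assumption.
def sketch_lane_key(model:, usage_type:, context_window: nil)
  slug = model.downcase.gsub(/[^a-z0-9]+/, '-').gsub(/\A-|-\z/, '')

  if usage_type == :embedding
    # Embedding lanes omit context size.
    "llm.fleet.embed.#{slug}"
  else
    key = "llm.fleet.inference.#{slug}"
    context_window ? "#{key}.ctx#{context_window}" : key
  end
end

puts sketch_lane_key(model: 'nomic-embed-text', usage_type: :embedding)
# prints "llm.fleet.embed.nomic-embed-text"
puts sketch_lane_key(model: 'qwen3.6:27b-q4_K_M', usage_type: :inference, context_window: 32_768)
# prints "llm.fleet.inference.qwen3-6-27b-q4-k-m.ctx32768"
```

Because the key encodes only the work (usage type, model, context size), any worker that can serve that work can bind to the same lane regardless of where it runs.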
295
+ ## Fleet Protocol
296
+
297
+ Fleet communication uses protocol v2 envelopes with strict validation:
298
+
299
+ - `FleetRequest`: encrypted request envelope with `operation`, `request_id`, `correlation_id`, `idempotency_key`, `message_context`, and signed JWT replay token
300
+ - `FleetResponse`: encrypted response envelope with provider output
301
+ - `FleetError`: encrypted error envelope with typed error metadata
302
+
303
+ When `fleet.compliance.encrypt_fleet` is true (default), all envelopes are encrypted via `Legion::Crypt`. JWT replay tokens validate the `issuer` claim and use hash-based claim validation (no raw PHI in base64 payloads).
304
+
305
+ `Fleet::ProviderResponder` handles the responder side: token validation, idempotency, provider dispatch, response publishing. `Fleet::WorkerExecution` handles the worker side: lane binding, message consumption, backpressure.
306
+
307
+ Default fleet settings come from `Legion::Extensions::Llm.default_settings`; fleet and endpoint modes are both disabled by default:
308
+
309
+ ```ruby
310
+ {
311
+ fleet: {
312
+ enabled: false,
313
+ scheduler: :basic_get,
314
+ consumer_priority: 0,
315
+ queue_expires_ms: 60_000,
316
+ message_ttl_ms: 120_000,
317
+ queue_max_length: 100,
318
+ delivery_limit: 3,
319
+ consumer_ack_timeout_ms: 300_000,
320
+ endpoint: {
321
+ enabled: false,
322
+ empty_lane_backoff_ms: 250,
323
+ idle_backoff_ms: 1_000,
324
+ max_consecutive_pulls_per_lane: 0,
325
+ accept_when: []
326
+ }
327
+ }
328
+ }
329
+ ```
330
+
174
331
  ## Registry Events
175
332
 
176
- `Legion::Extensions::Llm::Routing::RegistryEvent` builds dependency-light envelopes for future `llm.registry` publishing. It does not persist registry state or publish messages by itself.
333
+ `Legion::Extensions::Llm::Routing::RegistryEvent` builds envelopes for `llm.registry` publishing.
177
334
 
178
335
  ```ruby
179
336
  event = Legion::Extensions::Llm::Routing::RegistryEvent.available(
@@ -186,118 +343,102 @@ event = Legion::Extensions::Llm::Routing::RegistryEvent.available(
186
343
  )
187
344
 
188
345
  event.to_h
189
- # => {
190
- # event_id: "...",
191
- # event_type: :offering_available,
192
- # occurred_at: "2026-04-28T14:30:15.123456Z",
193
- # offering: { ... },
194
- # runtime: { host_id: "macbook-m4-max", process: { pid: 12345 } },
195
- # capacity: { concurrency: 4, queued: 0 },
196
- # health: { ready: true, latency_ms: 180 },
197
- # lane: "llm.fleet.inference.qwen3-6-27b-q4-k-m.ctx32768",
198
- # metadata: { observed_by: :lex_llm_ollama }
199
- # }
346
+ # => { event_id: "...", event_type: :offering_available, offering: { ... }, ... }
200
347
  ```
201
348
 
202
- Supported event types are `:offering_available`, `:offering_unavailable`, `:offering_degraded`, and `:offering_heartbeat`. Event offerings are derived from `ModelOffering#to_h`, with sensitive offering fields removed. Optional `runtime`, `capacity`, `health`, `lane`, and `metadata` values are intended for non-secret operational context and reject sensitive keys such as credentials, tokens, secrets, URLs, endpoint paths, prompts, and reply queues.
349
+ Supported types: `:offering_available`, `:offering_unavailable`, `:offering_degraded`, `:offering_heartbeat`. Sensitive keys (credentials, tokens, secrets, URLs, prompts) are rejected during sanitization.
203
350
 
204
- ## Fleet Lanes
351
+ Publishing is handled by `RegistryPublisher` (parameterized by `provider_family`) through the `llm.registry` exchange.
352
+
353
+ ## Credential Sources
205
354
 
206
- Fleet routing uses shared work lanes derived from model offerings. A lane describes the work required, not the worker that happens to do it.
355
+ `CredentialSources` provides read-only credential discovery:
207
356
 
208
357
  ```ruby
209
- offering.lane_key
210
- # => "llm.fleet.inference.qwen3-6-27b-q4-k-m.ctx32768"
358
+ Legion::Extensions::Llm::CredentialSources.discover_credentials(
359
+ family: :openai,
360
+ setting_key: 'OPENAI_API_KEY'
361
+ )
211
362
  ```
212
363
 
213
- Embedding lanes omit context size:
364
+ Probes env vars, `~/.claude/settings.json`, `~/.codex/auth.json`, `Legion::Settings`, and optional socket/HTTP endpoints. Credentials are deduplicated via `credential_fingerprint` (first 8 chars of SHA-256). Probing is gated behind `extensions.llm.security.credential_source_probing`.
365
+
366
+ Each source gets a provenance tag: `CredentialSources.source_tag(type, location, key)`.
367
+
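To illustrate the fingerprint-based deduplication described above, here is a minimal sketch using Ruby's standard `Digest` library. The helper names `sketch_fingerprint` and `sketch_dedup` and the candidate hash shape (`:source`, `:value`) are hypothetical; only the "first 8 chars of SHA-256" rule comes from the README text.

```ruby
require 'digest'

# Hypothetical helper: first 8 hex characters of the SHA-256 digest.
def sketch_fingerprint(secret)
  Digest::SHA256.hexdigest(secret)[0, 8]
end

# Hypothetical helper: keeps the first candidate seen for each fingerprint,
# so the same key found in two places is only reported once.
def sketch_dedup(candidates)
  candidates.uniq { |candidate| sketch_fingerprint(candidate[:value]) }
end

found = [
  { source: 'env:OPENAI_API_KEY', value: 'abc' },
  { source: 'settings:openai.api_key', value: 'abc' },
  { source: 'file:auth.json', value: 'xyz' }
]

puts sketch_fingerprint('abc')    # prints "ba7816bf"
puts sketch_dedup(found).length   # prints 2
```

Comparing short fingerprints rather than raw values means discovered secrets never need to be logged or compared in plain text.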
368
+ ## Auto Registration
369
+
370
+ The `AutoRegistration` mixin lets providers self-discover instances and register offerings into `Call::Registry`:
214
371
 
215
372
  ```ruby
216
- Legion::Extensions::Llm::Routing::ModelOffering.new(
217
- provider_family: :ollama,
218
- instance_id: :gpu_embed_01,
219
- transport: :rabbitmq,
220
- model: 'nomic-embed-text',
221
- usage_type: :embedding,
222
- capabilities: %i[embedding]
223
- ).lane_key
224
- # => "llm.fleet.embed.nomic-embed-text"
225
- ```
373
+ class MyProvider < Legion::Extensions::Llm::Provider
374
+ extend Legion::Extensions::Llm::AutoRegistration
375
+ end
226
376
 
227
- The intent is that any eligible worker can bind to the same lane:
377
+ MyProvider.rediscover! # Re-probe all instances
378
+ ```
228
379
 
229
- - local MacBook workers
230
- - GPU servers in a datacenter
231
- - vLLM workers
232
- - Ollama workers
233
- - cloud-side LegionIO workers near Bedrock, Vertex, Azure, or another provider
380
+ Discovers instances from settings, builds model offerings via `discover_offerings`, and registers them. Passes tier and capabilities metadata to the registry.
234
381
 
235
- Busy endpoint workers should not reject/requeue in a hot loop. Endpoint fleet workers can use pull-style scheduling, while server-class workers can use normal consumers with prefetch and consumer priority.
382
+ ## Streaming
236
383
 
237
- ## Default Fleet Settings
384
+ `Streaming` provides the streaming framework for OpenAI-compatible SSE responses:
238
385
 
239
- `Legion::Extensions::Llm.default_settings` provides defaults that provider extensions inherit and override:
386
+ - Faraday middleware handles chunk parsing, thinking extraction, and error handling
387
+ - `StreamAccumulator` accumulates deltas into complete messages with tool-call assembly
388
+ - Retries on HTTP 500 with partial body preservation
389
+ - Handles both Net::HTTP and Typhoeus adapters (Typhoeus chunks arrive with nil/0 status during streaming)
390
+ - Provider thinking (`<think>...</think>` tags, `reasoning_content`) is stripped from caller-visible content
240
391
 
241
392
  ```ruby
242
- Legion::Extensions::Llm.default_settings
243
- # => {
244
- # fleet: {
245
- # enabled: false,
246
- # scheduler: :basic_get,
247
- # consumer_priority: 0,
248
- # queue_expires_ms: 60_000,
249
- # message_ttl_ms: 120_000,
250
- # queue_max_length: 100,
251
- # delivery_limit: 3,
252
- # consumer_ack_timeout_ms: 300_000,
253
- # endpoint: {
254
- # enabled: false,
255
- # empty_lane_backoff_ms: 250,
256
- # idle_backoff_ms: 1_000,
257
- # max_consecutive_pulls_per_lane: 0,
258
- # accept_when: []
259
- # }
260
- # }
261
- # }
393
+ provider.stream_chat(messages:, model:, tools: []) do |chunk|
394
+ # chunk is a Chunk or StreamChunk with content_delta, reasoning_delta, tool_call_delta
395
+ end
262
396
  ```
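
The accumulation step can be pictured with a small sketch. This is not the gem's `StreamAccumulator`: the class name `SketchAccumulator`, the plain-hash chunk shape, and the `index`/`id`/`name`/`arguments` keys inside `tool_call_delta` are assumptions; only the delta field names `content_delta`, `reasoning_delta`, and `tool_call_delta` come from the comment above.

```ruby
require 'json'

# Hypothetical accumulator: joins streamed deltas into a complete message.
class SketchAccumulator
  attr_reader :content, :reasoning

  def initialize
    @content = +''
    @reasoning = +''
    @tool_calls = {} # keyed by the (assumed) tool call index
  end

  def add(chunk)
    # Thinking is kept apart from caller-visible content.
    @content << chunk[:content_delta].to_s
    @reasoning << chunk[:reasoning_delta].to_s

    delta = chunk[:tool_call_delta]
    return self unless delta

    call = (@tool_calls[delta[:index]] ||= { id: nil, name: nil, arguments: +'' })
    call[:id] ||= delta[:id]
    call[:name] ||= delta[:name]
    # Arguments arrive as JSON fragments; concatenate now, parse at the end.
    call[:arguments] << delta[:arguments].to_s
    self
  end

  def tool_calls
    @tool_calls.sort_by { |index, _call| index }.map do |_index, call|
      { id: call[:id], name: call[:name], arguments: JSON.parse(call[:arguments]) }
    end
  end
end

acc = SketchAccumulator.new
[
  { content_delta: 'Hel' },
  { content_delta: 'lo', reasoning_delta: 'plan' },
  { tool_call_delta: { index: 0, id: 'call_1', name: 'search', arguments: '{"query":' } },
  { tool_call_delta: { index: 0, arguments: '"ruby"}' } }
].each { |chunk| acc.add(chunk) }

puts acc.content # prints "Hello"
```

Deferring `JSON.parse` until all fragments have arrived is the key design point: a partial argument string is usually not valid JSON on its own.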
263
397
 
264
- The defaults are conservative:
398
+ ## Schema & Tools
265
399
 
266
- - fleet participation is off unless configured
267
- - endpoint fleet mode is separately disabled by default
268
- - queue and message TTLs are bounded
269
- - pull scheduling is the default for endpoint-style workers
270
- - provider gems can override defaults through `Legion::Settings`
271
-
272
- Provider gems can build a complete provider settings hash without duplicating merge logic:
400
+ `Legion::Extensions::Llm::Schema` bridges `ruby_llm-schema` for JSON schema tool definitions. Tools are defined as:
273
401
 
274
402
  ```ruby
275
- Legion::Extensions::Llm.provider_settings(
276
- family: :ollama,
277
- instance: {
278
- base_url: 'http://localhost:11434',
279
- fleet: { enabled: true, consumer_priority: 10 }
403
+ Legion::Extensions::Llm::Tool.new(
404
+ name: 'search',
405
+ description: 'Search the knowledge base',
406
+ parameters: {
407
+ type: 'object',
408
+ properties: {
409
+ query: { type: 'string', description: 'Search query' }
410
+ },
411
+ required: %w[query]
280
412
  }
281
413
  )
282
414
  ```
283
415
 
284
- ## Provider Extension Contract
416
+ ## Response Objects
417
+
418
+ All provider responses should normalize through the shared response objects:
285
419
 
286
- A provider gem should use `lex-llm` for shared behavior and implement only the provider-specific pieces.
420
+ - `Responses::ChatResponse` -- chat completions with message, usage, thinking, finish_reason
421
+ - `Responses::EmbeddingResponse` -- vectors, usage, model
422
+ - `Responses::StreamChunk` -- streaming deltas
423
+ - `Responses::ThinkingExtractor` -- extracts thinking from multiple formats (`reasoning_content`, `<think>...</think>` tags, untagged preambles)
287
424
 
288
- At minimum, a provider extension should define:
425
+ Provider-specific thinking is always separated from caller-visible content.
289
426
 
290
- - `Legion::Extensions::Llm::<Provider>`
291
- - provider default settings
292
- - model discovery or a static model offering registry
293
- - provider request translation
294
- - provider response translation
295
- - health and readiness checks
296
- - embedding support separately from inference support when the provider exposes both
427
+ ---
297
428
 
298
- Provider extensions should avoid duplicating shared classes, schema logic, fleet lane construction, JSON handling, or common request/response objects.
429
+ ## Provider Extension Contract
299
430
 
300
- Canonical provider calls are keyword-based:
431
+ A provider gem uses `lex-llm` for shared behavior and implements only provider-specific transport, authentication, model discovery, and translation.
432
+
433
+ At minimum, a provider extension defines:
434
+
435
+ - `Legion::Extensions::Llm::<Provider>` namespace
436
+ - Provider default settings
437
+ - Model discovery or static model offering registry
438
+ - Provider request/response translation
439
+ - Health and readiness checks
440
+
441
+ Canonical provider calls (all keyword-based):
301
442
 
302
443
  ```ruby
303
444
  provider.chat(messages:, model:, tools: [], temperature: nil, params: {}, headers: {}, schema: nil, thinking: nil)
@@ -309,27 +450,63 @@ provider.health(live: false)
309
450
  provider.discover_offerings(live: false, **filters)
310
451
  ```
311
452
 
312
- Provider responses should normalize through the shared response objects before they reach callers. Visible assistant text and provider reasoning are separate values: provider-specific thinking fields, OpenAI-compatible `reasoning_content`, and literal `<think>...</think>` text are removed from caller-visible content and preserved as thinking metadata when present.
453
+ Inherited from `Provider`:
454
+
455
+ - `#readiness(live: false)` -- configured state, locality, base URL, non-live health metadata
456
+ - `#model_detail(model_name)` -- cache-backed lookup (24h TTL; nil results not cached)
457
+ - `#model_allowed?(model_name)` -- whitelist/blacklist check
458
+ - `#discover_offerings(live: false)` -- returns cached discovery results when `live: false`; probes endpoints when `live: true`
459
+ - `#offering_transport` / `#offering_tier` -- instance methods with class-level `default_transport`/`default_tier` overrides
460
+ - `#runtime_provider_setting(key)` -- fallback to `Legion::Settings` for model whitelist/blacklist
313
461
 
314
- Fleet envelopes also live here. `FleetRequest`, `FleetResponse`, and `FleetError` are protocol-v2 transport messages with `operation`, `request_id`, `correlation_id`, `idempotency_key`, `message_context`, and signed-token fields. Provider gems should consume and publish these shared envelopes instead of defining local fleet message shapes.
462
+ Inherited from `Provider::OpenAICompatible`:
315
463
 
316
- All providers inherit `#readiness(live: false)`, which returns configured state, provider locality, API base, endpoint helpers, and non-live health metadata without probing remote services. Providers with a cheap health endpoint can pass `live: true` to include that endpoint response. OpenAI-compatible providers also inherit shared model-list parsing that maps discovered models into normalized capabilities and modalities for Legion routing.
464
+ - Full OpenAI-compatible API translation
465
+ - Model list parsing with capability/modality normalization
466
+ - Streaming with thinking extraction
467
+ - Embedding, image, transcription, moderation support
468
+ - `fetch_model_detail` override hook for live API model metadata
317
469
 
318
- ## Schema Status
470
+ ## Configuration
319
471
 
320
- `lex-llm` still depends on `ruby_llm-schema` because the current schema bridge exposes:
472
+ Provider settings are built with `Legion::Extensions::Llm.provider_settings`:
321
473
 
322
474
  ```ruby
323
- Legion::Extensions::Llm::Schema
475
+ Legion::Extensions::Llm.provider_settings(
476
+ family: :ollama,
477
+ instance: {
478
+ base_url: 'http://localhost:11434',
479
+ fleet: { enabled: true, consumer_priority: 10 }
480
+ }
481
+ )
324
482
  ```
325
483
 
326
- as:
484
+ `ProviderSettings.infer_tier_from_endpoint(url)` returns `:local` for localhost/loopback hosts and `:direct` for all other hosts.
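The tier inference described above could be approximated like this. It is a sketch under stated assumptions: `sketch_infer_tier` is a hypothetical helper, not the gem's `infer_tier_from_endpoint`, and treating unparseable URLs as `:direct` is an assumption rather than documented behavior.

```ruby
require 'uri'
require 'ipaddr'

# Hypothetical helper: classify an endpoint URL as :local or :direct.
def sketch_infer_tier(url)
  host = URI.parse(url).hostname # #hostname strips IPv6 brackets
  return :direct if host.nil?
  return :local if host.casecmp?('localhost')

  # Literal IP addresses are checked against the loopback ranges.
  IPAddr.new(host).loopback? ? :local : :direct
rescue IPAddr::Error, URI::InvalidURIError
  # Ordinary DNS names are not IP literals, so they land here (assumed :direct).
  :direct
end

sketch_infer_tier('http://localhost:11434')
sketch_infer_tier('https://api.example.com/v1')
```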
327
485
 
328
- ```ruby
329
- RubyLLM::Schema
330
- ```
486
+ Key settings paths:
331
487
 
332
- That dependency should stay until LegionIO owns or replaces the schema layer directly.
488
+ - `extensions.llm.fleet` -- fleet participation and behavior
489
+ - `extensions.llm.fleet.endpoint` -- endpoint-style worker configuration
490
+ - `extensions.llm.fleet.compliance.encrypt_fleet` -- encrypt fleet envelopes (default true)
491
+ - `extensions.llm.fleet.auth.verify_issuer` -- validate JWT issuer (default true)
492
+ - `extensions.llm.security.credential_source_probing` -- gate credential probing (default true)
493
+ - `extensions.llm.model_whitelist` / `model_blacklist` -- provider-level model filters
494
+ - `extensions.llm.<family>.instance.<name>.model_whitelist` -- per-instance override
495
+
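The whitelist and blacklist paths above are layered, with the per-instance setting taking precedence over the provider-level one. The sketch below illustrates that precedence only. `sketch_model_allowed?` and its keyword arguments are hypothetical, and the matching rules (exact string match, blacklist wins, an absent or empty whitelist allows everything) are assumptions rather than confirmed behavior of `#model_allowed?`.

```ruby
# Hypothetical helper: resolve model filters, instance config first.
def sketch_model_allowed?(model_name, instance_config:, provider_settings:)
  whitelist = instance_config[:model_whitelist] || provider_settings[:model_whitelist]
  blacklist = instance_config[:model_blacklist] || provider_settings[:model_blacklist]

  return false if Array(blacklist).include?(model_name) # assumed: blacklist wins
  return true if whitelist.nil? || whitelist.empty?     # assumed: no whitelist allows all

  whitelist.include?(model_name)
end

provider_level = { model_whitelist: %w[llama3 mistral] }
instance_level = { model_whitelist: %w[qwen2] } # per-instance override

sketch_model_allowed?('qwen2', instance_config: instance_level, provider_settings: provider_level)
sketch_model_allowed?('llama3', instance_config: {}, provider_settings: provider_level)
```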
496
+ ---
497
+
498
+ ## Provider Dependencies
499
+
500
+ | Extension | Depends on |
501
+ |-----------|-----------|
502
+ | `Provider` | `Legion::Cache::Helper`, `Legion::Logging::Helper`, `Legion::Settings`, `Legion::JSON` |
503
+ | `Streaming` | Faraday (`:typhoeus` or `:net_http`), Typhoeus |
504
+ | `Connection` | Faraday, Faraday::Typhoeus |
505
+ | `CredentialSources` | `Legion::Settings` (for Legion-settings probes) |
506
+ | `Fleet::*` | `Legion::Crypt` (when `encrypt_fleet` is true), `Legion::Transport` (AMQP via bunny) |
507
+ | `Schema` | `ruby_llm-schema` |
508
+
509
+ Runtime gem dependencies: `legion-json`, `legion-settings`, `legion-logging`, `legion-cache`, `faraday`, `faraday-typhoeus`, `ruby_llm-schema`.
333
510
 
334
511
  ## Development
335
512
 
@@ -339,20 +516,39 @@ Install dependencies:
339
516
  bundle install
340
517
  ```
341
518
 
342
- Run lint:
519
+ Run the full test suite:
343
520
 
344
521
  ```bash
345
- bundle exec rubocop -A
522
+ bundle exec rspec
346
523
  ```
347
524
 
348
- Run the full test suite:
525
+ Run lint and auto-correct:
349
526
 
350
527
  ```bash
351
- bundle exec rspec --format json --out tmp/rspec_results.json --format progress --out tmp/rspec_progress.txt
528
+ bundle exec rubocop -A
352
529
  ```
353
530
 
354
531
  `Gemfile.lock` is intentionally not committed for this repo.
355
532
 
533
+ ### Testing Rules
534
+
535
+ - Do NOT mock `Legion::Settings`, `Legion::Logging`, `Legion::JSON`, or `Legion::Cache` -- require the real gems
536
+ - `Legion::Cache.setup` activates the Memory adapter in test (no Redis needed)
537
+ - `Faraday::ConnectionFailed` is rescued in `discover_offerings` with a concise log
538
+ - `bundle exec rspec && bundle exec rubocop -A` is the gate before committing
539
+
540
+ ## Key Patterns
541
+
542
+ - `Provider` includes `Legion::Cache::Helper` -- use `cache_get`/`cache_set` directly
543
+ - `model_detail(model_name)` -- cache-backed lookup (cache_get -> fetch_model_detail -> cache_set if non-nil)
544
+ - `fetch_model_detail` -- override in subclass for live API calls; return `{ context_window: N }` or nil
545
+ - `model_detail_cache_key` includes credential fingerprint for non-local providers
546
+ - `model_whitelist`/`model_blacklist` -- checks instance config first, then provider settings
547
+ - `discover_offerings` filters via `model_allowed?` and rescues `Faraday::ConnectionFailed`
548
+ - Faraday response logger: `errors: false` -- never dump raw stack traces from HTTP failures
549
+ - `CredentialSources.source_tag(type, location, key)` -- provenance tag for discovered credentials
550
+ - `CredentialSources.credential_fingerprint(value)` -- first 8 characters of the SHA-256 digest
551
+
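The fingerprint and cache-key patterns above can be sketched as follows. Both helpers are hypothetical illustrations: the hex encoding of the digest, the `model_detail:` key layout, and the keyword arguments are assumptions, and only the "first 8 characters of SHA-256" and "fingerprint included for non-local providers" rules come from the list above.

```ruby
require 'digest'

# Hypothetical helper: short, non-reversible tag for a credential value.
def sketch_credential_fingerprint(value)
  Digest::SHA256.hexdigest(value.to_s)[0, 8] # assumed hex encoding
end

# Hypothetical helper: build a cache key that separates entries per
# credential for non-local providers, so two API keys never share metadata.
def sketch_model_detail_cache_key(provider:, model:, credential: nil, local: false)
  parts = ['model_detail', provider, model]
  parts << sketch_credential_fingerprint(credential) unless local || credential.nil?
  parts.join(':')
end

sketch_model_detail_cache_key(provider: 'ollama', model: 'llama3', local: true)
sketch_model_detail_cache_key(provider: 'openai', model: 'gpt-4o', credential: 'sk-example')
```

Keying on a fingerprint instead of the raw credential keeps secrets out of cache keys and logs while still distinguishing accounts.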
356
552
  ## Attribution
357
553
 
358
554
  `lex-llm` began as a LegionIO fork of RubyLLM. RubyLLM remains credited under the MIT license in `LICENSE`.
data/lex-llm.gemspec CHANGED
@@ -35,6 +35,7 @@ Gem::Specification.new do |spec|
35
35
  spec.add_dependency 'faraday-multipart', '>= 1'
36
36
  spec.add_dependency 'faraday-net_http', '>= 1'
37
37
  spec.add_dependency 'faraday-retry', '>= 1'
38
+ spec.add_dependency 'faraday-typhoeus', '>= 0.2'
38
39
  spec.add_dependency 'legion-cache', '>= 1.3.0'
39
40
  spec.add_dependency 'legion-crypt', '>= 1.5.1'
40
41
  spec.add_dependency 'legion-json', '>= 1.2.1'
@@ -53,6 +53,10 @@ module Legion
53
53
  option :log_stream_debug, -> { ENV['LEGION_LLM_STREAM_DEBUG'] == 'true' }
54
54
  option :log_regexp_timeout, -> { Regexp.respond_to?(:timeout) ? (Regexp.timeout || 1.0) : nil }
55
55
 
56
+ # Prompt caching
57
+ option :llm_cache_enabled, true
58
+ option :cache_control_prefix_tokens, 4
59
+
56
60
  def initialize
57
61
  self.class.send(:defaults).each do |key, default|
58
62
  value = default.respond_to?(:call) ? instance_exec(&default) : default