lex-llm-gateway 0.2.0 → 0.2.1
- checksums.yaml +4 -4
- data/CHANGELOG.md +5 -0
- data/README.md +76 -1
- data/lib/legion/extensions/llm/gateway/version.rb +1 -1
- data/lib/legion/extensions/llm/gateway.rb +1 -0
- metadata +1 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 9781b8397704e33e074250be57228735624bc3831d440c8c7919922739a06c35
+  data.tar.gz: d1f17c0173cdba69b5f6c91332d2eb859f2fd1a6dc0f36acd4b7b456c2efc9ba
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: df28717d0579f52423e513ea1c97a001fed6c33b0f679eefd994dd51e0e36843b09b1b4018aff1d394b23d58f8fc4b6445e712857eefc4e46c61b7393abb36de
+  data.tar.gz: e8c5ccdd18c0f832ad0bdf17164aa24cadad3acd2d26677aa9a736c7ca94d58fa8320f151c429269e353e2e5fcdc64cbf1bc968a4c7336cf163bf3740260fa2f
data/CHANGELOG.md
CHANGED
data/README.md
CHANGED
@@ -1,6 +1,6 @@
 # lex-llm-gateway
 
-LLM inference gateway for LegionIO. Provides centralized metering over RabbitMQ, fleet RPC dispatch to GPU workers, and local disk spooling for offline resilience.
+LLM inference gateway for [LegionIO](https://github.com/LegionIO/LegionIO). Provides centralized metering over RabbitMQ, fleet RPC dispatch to GPU workers, and local disk spooling for offline resilience.
 
 ## Installation
 
@@ -10,6 +10,81 @@ Add to your Gemfile:
 gem 'lex-llm-gateway'
 ```
 
+## Overview
+
+`lex-llm-gateway` wraps all LLM calls with automatic metering and fleet routing. It is designed for clusters with 100k+ edge nodes that cannot have direct database access.
+
+Three node roles:
+
+| Role | What It Does |
+|------|-------------|
+| **Publisher** (all nodes) | Calls `Inference.chat` which auto-meters to RMQ or disk spool |
+| **Fleet Worker** (GPU nodes) | Runs InferenceWorker actor, processes fleet requests |
+| **Metering Writer** (DB nodes) | Runs MeteringWriter actor, writes to `metering_records` |
+
+## Degradation Ladder
+
+```
+Full stack (transport + gateway + LLM + fleet)
+no transport -> spool to disk, flush when reconnected
+no gateway -> Legion::LLM direct (no metering)
+no fleet -> local/cloud only
+no cloud -> local LLM only
+no local -> error
+```
+
+## Runners
+
+- **Metering** - `build_event`, `publish_or_spool`, `flush_spool`
+- **Inference** - `chat`, `embed`, `structured` (all auto-metered)
+- **Fleet** - `dispatch` to GPU workers with timeout and JWT auth
+- **FleetHandler** - `handle_fleet_request` (validates JWT, calls local LLM)
+- **MeteringWriter** - `write_metering_record` (DB insert consumed from RMQ)
+
+## Standalone Client
+
+```ruby
+require 'legion/extensions/llm/gateway/client'
+
+client = Legion::Extensions::LLM::Gateway::Client.new
+result = client.chat(model: 'claude-opus-4-6', messages: [{ role: 'user', content: 'Hello' }])
+result[:success] # => true
+result[:response] # => "Hello! How can I help you?"
+```
+
+## Settings
+
+```json
+{
+  "llm": {
+    "routing": {
+      "use_fleet": true,
+      "fleet": {
+        "timeout_seconds": 30,
+        "require_auth": false
+      }
+    }
+  }
+}
+```
+
+## Requirements
+
+- Ruby >= 3.4
+- [LegionIO](https://github.com/LegionIO/LegionIO) framework
+- `legion-transport` (AMQP metering + inference queues)
+- `legion-crypt` (JWT signing for fleet auth, optional)
+- `legion-data` (MeteringWriter and disk spool, optional)
+- `legion-llm` (inference execution on fleet workers)
+
+## Development
+
+```bash
+bundle install
+bundle exec rspec # 199 examples, 0 failures
+bundle exec rubocop # 0 offenses
+```
+
 ## License
 
 MIT
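
Reviewer note: the new README's "no transport -> spool to disk, flush when reconnected" step can be illustrated with a small stand-alone sketch. This is not the gem's `Metering` runner. The class name `SpoolSketch`, the JSON-lines spool file, and the injected `publisher` callable are all assumptions made for illustration; only the method names `publish_or_spool` and `flush_spool` come from the README, and their real signatures are not shown in this diff.

```ruby
require 'json'
require 'tmpdir'
require 'fileutils'

# Hypothetical illustration of "publish, else spool to disk, flush later".
# None of this is taken from lex-llm-gateway's source.
class SpoolSketch
  def initialize(spool_dir:, publisher:)
    @spool_path = File.join(spool_dir, 'metering.jsonl') # assumed file name
    @publisher = publisher # callable that raises when the transport is down
    FileUtils.mkdir_p(spool_dir)
  end

  # Publish the event; if publishing raises, append it to the spool file.
  def publish_or_spool(event)
    @publisher.call(event)
    :published
  rescue StandardError
    File.open(@spool_path, 'a') { |f| f.puts(JSON.generate(event)) }
    :spooled
  end

  # Replay spooled events in order and stop at the first failure.
  # Returns the number of events sent; unsent events stay on disk.
  # Note: events come back from JSON with string keys.
  def flush_spool
    return 0 unless File.exist?(@spool_path)

    events = File.readlines(@spool_path, chomp: true).map { |line| JSON.parse(line) }
    sent = 0
    events.each do |event|
      @publisher.call(event)
      sent += 1
    rescue StandardError
      break
    end

    remaining = events.drop(sent)
    if remaining.empty?
      File.delete(@spool_path)
    else
      File.write(@spool_path, remaining.map { |e| JSON.generate(e) }.join("\n") + "\n")
    end
    sent
  end
end

# Usage with a fake transport that starts out disconnected.
connected = false
delivered = []
fake_publisher = ->(event) { connected ? delivered << event : raise(IOError, 'transport down') }

demo = SpoolSketch.new(spool_dir: Dir.mktmpdir, publisher: fake_publisher)
demo.publish_or_spool({ 'model' => 'example-model', 'tokens' => 12 }) # => :spooled
connected = true
demo.flush_spool # => 1
```

Injecting the publisher keeps the sketch runnable without RabbitMQ. A real implementation would presumably also need file locking and a size cap on the spool, which the README does not describe.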