@khanglvm/llm-router 1.0.6 → 1.0.8
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +14 -0
- package/README.md +127 -379
- package/package.json +13 -1
- package/src/cli/router-module.js +530 -0
package/CHANGELOG.md
CHANGED
@@ -5,6 +5,20 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [1.0.8] - 2026-02-28
+
+### Changed
+- Added focused npm `keywords` metadata in `package.json` to improve package discoverability.
+
+## [1.0.7] - 2026-02-28
+
+### Added
+- Added `llm-router ai-help` to generate an agent-oriented operating guide with live gateway checks and coding-tool patch instructions.
+- Added tests covering `ai-help` discovery output and first-run setup guidance.
+
+### Changed
+- Rewrote `README.md` into a shorter setup and operations guide focused on providers, aliases, rate limits, and local/hosted usage.
+
 ## [1.0.6] - 2026-02-28
 
 ### Added
package/README.md
CHANGED
@@ -1,440 +1,188 @@
 # llm-router
 
-`llm-router`
+`llm-router` exposes unified API endpoint for multiple AI providers and models.
 
-
-- local route server `llm-router start`
-- Cloudflare Worker route runtime deployment `llm-router deploy`
-- CLI + TUI management `config`, `start`, `deploy`, `worker-key`
-- Seamless model fallback
+## Main feature
 
-
-
-
-npm i -g @khanglvm/llm-router
-```
-
-## Versioning
+1. Single endpoint, unified providers & models
+2. Support grouping models with rate-limit and load balancing strategy
+3. Configuration auto reload in real time, no interruption
 
-
-- Release notes live in [`CHANGELOG.md`](./CHANGELOG.md).
-- npm publishes are configured for the public registry package.
-
-Release checklist:
-- Update `README.md` if user-facing behavior changed.
-- Add a dated entry in `CHANGELOG.md`.
-- Bump the package version before publish.
-- Publish with `npm publish`.
-
-## Quick Start
+## Install
 
 ```bash
-
-llm-router
-
-# 2) Start local route server
-llm-router start
+npm i -g @khanglvm/llm-router@latest
 ```
 
-
-- Unified (Auto transform): `http://127.0.0.1:8787/route` (or `/` and `/v1`)
-- Anthropic: `http://127.0.0.1:8787/anthropic`
-- OpenAI: `http://127.0.0.1:8787/openai`
-
-## Usage Example
-
-```bash
-# Your AI Agent can help! Ask them to manage api router via this tool for you.
-
-# 1) Add provider + models + provider API key. You can ask your AI agent to do it for you, or manually via TUI or command line:
-llm-router config \
-  --operation=upsert-provider \
-  --provider-id=openrouter \
-  --name="OpenRouter" \
-  --base-url=https://openrouter.ai/api/v1 \
-  --api-key=sk-or-v1-... \
-  --models=claude-3-7-sonnet,gpt-4o \
-  --format=openai \
-  --skip-probe=true
-
-# 2) (Optional) Configure model fallback order for direct provider/model requests
-llm-router config \
-  --operation=set-model-fallbacks \
-  --provider-id=openrouter \
-  --model=claude-3-7-sonnet \
-  --fallback-models=openrouter/gpt-4o
-
-# 3) (Optional) Create a model alias with a routing strategy and weighted targets
-llm-router config \
-  --operation=upsert-model-alias \
-  --alias-id=chat.default \
-  --strategy=auto \
-  --targets=openrouter/claude-3-7-sonnet@2,openrouter/gpt-4o@1 \
-  --fallback-targets=openrouter/gpt-4o-mini
-
-# 4) (Optional) Add provider request-cap bucket (models: all)
-llm-router config \
-  --operation=set-provider-rate-limits \
-  --provider-id=openrouter \
-  --bucket-name="Monthly cap" \
-  --bucket-models=all \
-  --bucket-requests=20000 \
-  --bucket-window=month:1
-
-# 5) Set master key (this is your gateway key for client apps)
-llm-router config --operation=set-master-key --master-key=gw_your_gateway_key
-
-# 6) Start gateway with auth required
-llm-router start --require-auth=true
-```
+## Usage
 
-
+Copy/paste this short instruction to your AI agent:
 
-```
-
-"env": {
-  "ANTHROPIC_BASE_URL": "http://127.0.0.1:8787/anthropic",
-  "ANTHROPIC_AUTH_TOKEN": "gw_your_gateway_key"
-}
-}
+```text
+Run `llm-router ai-help` first, then set up and operate llm-router for me using CLI commands.
 ```
 
-##
-
-`llm-router` can fail over from a primary model to configured fallback models with status-aware logic:
-- `429` (rate-limited): immediate fallback (no origin retry), with `Retry-After` respected when present.
-- Temporary failures (`408`, `409`, `5xx`, network errors): origin-only bounded retries with jittered backoff, then fallback.
-- Billing/quota exhaustion (`402`, or provider-specific billing signals): immediate fallback with longer origin cooldown memory.
-- Auth and permission failures (`401` and relevant `403` cases): no retry; fallback to other providers/models when possible.
-- Policy/moderation blocks: no retry; cross-provider fallback is disabled by default (`LLM_ROUTER_ALLOW_POLICY_FALLBACK=false`).
-- Invalid client requests (`400`, `413`, `422`): no retry and no fallback short-circuit.
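The status-aware failover rules in the removed README section above can be sketched as a small classifier. This is an illustrative reduction of the documented behavior, not llm-router's actual routing code; the function name and action labels are hypothetical.

```javascript
// Illustrative sketch of the status-aware failover rules described above.
// NOT llm-router's implementation; action names are invented for clarity.
function classifyUpstreamStatus(status) {
  if (status === 429) return "fallback-immediately";        // rate limited, honor Retry-After
  if (status === 402) return "fallback-with-long-cooldown"; // billing/quota exhaustion
  if (status === 401 || status === 403) return "fallback-no-retry"; // auth/permission failure
  if ([400, 413, 422].includes(status)) return "fail-request";      // invalid client request
  if (status === 408 || status === 409 || status >= 500) {
    return "retry-origin-then-fallback";                    // temporary failure
  }
  return "none";
}
```

For example, a `503` from the origin maps to bounded origin retries before fallback, while a `400` is surfaced to the client without retry or fallback.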
+## Main Workflow
 
-
+1. Add Providers + models into llm-router
+2. Optionally, group models as alias with load balancing and auto fallback support
+3. Start llm-router server, point your coding tool API and model to llm-router
 
-
+## What Each Term Means
 
-
+### Provider
+The service endpoint you call (OpenRouter, Anthropic, etc.).
 
-
-
-- `round-robin`: Rotates evenly across eligible targets.
-- `weighted-rr`: Rotates like round-robin, but favors higher weights.
-- `quota-aware-weighted-rr`: Weighted routing plus remaining-capacity awareness.
-
-Example:
-
-```bash
-llm-router config \
-  --operation=upsert-model-alias \
-  --alias-id=coding \
-  --strategy=auto \
-  --targets=rc/gpt-5.3-codex,zai/glm-5
-```
-
-Concrete model alias example with provider-specific caps:
-
-```bash
-llm-router config \
-  --operation=upsert-model-alias \
-  --alias-id=coding \
-  --strategy=auto \
-  --targets=rc/gpt-5.3-codex,zai/glm-5
-
-llm-router config \
-  --operation=set-provider-rate-limits \
-  --provider-id=rc \
-  --bucket-name="Minute cap" \
-  --bucket-models=gpt-5.3-codex \
-  --bucket-requests=60 \
-  --bucket-window=minute:1
-
-llm-router config \
-  --operation=set-provider-rate-limits \
-  --provider-id=zai \
-  --bucket-name="5-hours cap" \
-  --bucket-models=glm-5 \
-  --bucket-requests=600 \
-  --bucket-window=hour:5
-```
-
-## What Is A Bucket?
-
-A rate-limit bucket is a request cap for a time window.
+### Model
+The actual model ID from that provider.
 
+### Rate-Limit Bucket
+A request cap for a time window.
 Examples:
-- `40
-- `
-
-Multiple buckets can apply to the same model scope at the same time. A candidate is treated as exhausted if any matching bucket is exhausted.
-
-## TUI Bucket Walkthrough
-
-Use the config manager and select:
-- `Manage provider rate-limit buckets`
-- `Create bucket(s)`
-
-The TUI now guides you through:
-- Bucket name (friendly label)
-- Model scope (`all` or selected models with multiselect checkboxes)
-- Request cap
-- Window unit (`minute`, `hour(s)`, `week`, `month`)
-- Window size (hours support `N`, other preset units lock to `1`)
-- Review + optional add-another loop for combined policies
-
-Internal bucket ids are generated automatically from the name when omitted and shown as advanced detail in review.
-
-## Combined-Cap Recipe (`40/min` + `600/6h`)
-
-```bash
-llm-router config \
-  --operation=set-provider-rate-limits \
-  --provider-id=openrouter \
-  --bucket-name="Minute cap" \
-  --bucket-models=all \
-  --bucket-requests=40 \
-  --bucket-window=minute:1
-
-llm-router config \
-  --operation=set-provider-rate-limits \
-  --provider-id=openrouter \
-  --bucket-name="6-hours cap" \
-  --bucket-models=all \
-  --bucket-requests=600 \
-  --bucket-window=hour:6
-```
-
-This keeps both limits active together for the same model scope.
-
-## Rate-Limit Troubleshooting
-
-- Check routing decisions with `LLM_ROUTER_DEBUG_ROUTING=true` and inspect `x-llm-router-skipped-candidates`.
-- `quota-exhausted` means proactive pre-routing skip happened before an upstream call.
-- For provider `429`, cooldown is tracked from `Retry-After` when present, or from `LLM_ROUTER_ORIGIN_RATE_LIMIT_COOLDOWN_MS`.
-- Local mode persists state by default (file backend), while Worker defaults to in-memory state.
-
-## Main Commands
+- `40 requests / minute`
+- `20,000 requests / month`
 
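The bucket semantics described above (a request cap per time window, where multiple buckets can cover the same model scope and a candidate counts as exhausted if any matching bucket is exhausted) can be sketched as a fixed-window counter. This is an illustrative model only, not llm-router's persistence-backed implementation; the helper names are hypothetical.

```javascript
// Illustrative fixed-window rate-limit bucket, NOT llm-router's implementation.
// A bucket caps requests for a model scope within a time window.
function createBucket({ models, requests, windowMs }) {
  let windowStart = 0;
  let used = 0;
  return {
    matches(model) {
      return models.includes("all") || models.includes(model);
    },
    take(model, now) {
      if (!this.matches(model)) return true; // bucket does not apply to this model
      if (now - windowStart >= windowMs) {   // new window: reset the counter
        windowStart = now;
        used = 0;
      }
      if (used >= requests) return false;    // exhausted
      used += 1;
      return true;
    },
  };
}

// A candidate is usable only if EVERY matching bucket still has capacity.
function takeAll(buckets, model, now) {
  return buckets.every((bucket) => bucket.take(model, now));
}

const buckets = [
  createBucket({ models: ["all"], requests: 3, windowMs: 60_000 }),      // 3 requests / minute
  createBucket({ models: ["glm-5"], requests: 1, windowMs: 3_600_000 }), // 1 request / hour
];
```

Here a second `glm-5` call within the hour fails even though the per-minute bucket still has room, matching the "any matching bucket exhausted" rule.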
-
-
-llm-router start
-llm-router stop
-llm-router reload
-llm-router update
-llm-router deploy
-llm-router worker-key
-```
-
-## Non-Interactive Config (Agent/CI Friendly)
-
-```bash
-llm-router config \
-  --operation=upsert-provider \
-  --provider-id=openrouter \
-  --name="OpenRouter" \
-  --base-url=https://openrouter.ai/api/v1 \
-  --api-key=sk-or-v1-... \
-  --models=gpt-4o,claude-3-7-sonnet \
-  --format=openai \
-  --skip-probe=true
-
-llm-router config \
-  --operation=upsert-model-alias \
-  --alias-id=chat.default \
-  --strategy=auto \
-  --targets=openrouter/gpt-4o-mini@3,anthropic/claude-3-5-haiku@2 \
-  --fallback-targets=openrouter/gpt-4o
-
-llm-router config \
-  --operation=set-provider-rate-limits \
-  --provider-id=openrouter \
-  --bucket-name="Monthly cap" \
-  --bucket-models=all \
-  --bucket-requests=20000 \
-  --bucket-window=month:1
-```
-
-Alias target syntax:
-- `--targets` / `--fallback-targets`: `<routeRef>@<weight>` or `<routeRef>:<weight>`
-- route refs: direct `provider/model` or alias id
+### Model Load Balancer
+Decides how traffic is distributed across models in an alias group.
 
-
+Available strategies:
 - `auto` (recommended)
 - `ordered`
 - `round-robin`
 - `weighted-rr`
 - `quota-aware-weighted-rr`
 
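The strategy list above can be illustrated with a minimal `weighted-rr`-style picker. This is a sketch only, not the package's actual balancer; target weights mirror the `<routeRef>@<weight>` alias syntax documented elsewhere in this diff, and a real implementation would interleave picks rather than cluster them.

```javascript
// Illustrative weighted round-robin picker, NOT llm-router's implementation.
// Each target gets `weight` slots; the picker rotates over the slot list,
// so a weight-2 target is returned twice as often as a weight-1 target.
function createWeightedRoundRobin(targets) {
  const slots = [];
  for (const { ref, weight = 1 } of targets) {
    for (let i = 0; i < weight; i += 1) slots.push(ref);
  }
  let cursor = 0;
  return function next() {
    const ref = slots[cursor % slots.length];
    cursor += 1;
    return ref;
  };
}

const next = createWeightedRoundRobin([
  { ref: "openrouter/claude-3-7-sonnet", weight: 2 },
  { ref: "openrouter/gpt-4o", weight: 1 },
]);
// Over one full rotation (3 calls), the weight-2 target appears twice.
const picks = [next(), next(), next()];
```

A `quota-aware-weighted-rr` variant would additionally skip targets whose rate-limit buckets are exhausted before rotating.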
-
-
-- `--bucket-window=1w`
-- `--bucket-window=7day`
-
-Routing summary:
-
-```bash
-llm-router config --operation=list-routing
-```
+### Model Alias (Group models)
+A single model name that auto route/rotate across multiple models.
 
-
+Example:
+- alias: `opus`
+- targets:
+  - `openrouter/claude-opus-4.6`
+  - `anthropic/claude-opus-4.6`
 
-
-llm-router config --operation=migrate-config --target-version=2 --create-backup=true
-```
+Your app can use `opus` model and `llm-router` chooses target models based on your routing settings.
 
-
-- Local config loads with silent forward-migration to latest supported schema.
-- Migration is persisted automatically on read when possible (best-effort, no interactive prompt).
-- Future/newer version numbers do not fail only because of version mismatch; known fields are normalized best-effort.
+## Setup using Terminal User Interface (TUI)
 
-
+Open the TUI:
 
 ```bash
-llm-router
-# or generate a strong key automatically
-llm-router config --operation=set-master-key --generate-master-key=true
+llm-router
 ```
 
-
+Then follow this order.
+
+### 1) Add Provider
+Flow:
+1. `Config manager`
+2. `Add/Edit provider`
+3. Enter provider name, endpoint, API key
+4. Enter model list
+5. Save
+
+### 2) Configure Model Fallback (Optional)
+Flow:
+1. `Config manager`
+2. `Set model silent-fallbacks`
+3. Pick main model
+4. Pick fallback models
+5. Save
+
+### 3) Configure Rate Limits (Optional)
+Flow:
+1. `Config manager`
+2. `Manage provider rate-limit buckets`
+3. `Create bucket(s)`
+4. Set name, model scope, request cap, time window
+5. Save
+
+### 4) Group Models With Alias (Recommended)
+Flow:
+1. `Config manager`
+2. `Add/Edit model alias`
+3. Set alias ID (example: `chat.default`)
+4. Select target models
+5. Save
+
+### 5) Configure Model Load Balancer
+Flow:
+1. `Config manager`
+2. `Add/Edit model alias`
+3. Open the alias you want to balance
+4. Choose strategy (`auto` recommended)
+5. Review alias targets
+6. Save
+
+### 6) Set Gateway Key
+Flow:
+1. `Config manager`
+2. `Set worker master key`
+3. Set or generate key
+4. Save
+
+## Start Local Server
 
 ```bash
-llm-router start
+llm-router start
 ```
 
-
+Local endpoints:
+- Unified: `http://127.0.0.1:8787/route`
+- Anthropic-style: `http://127.0.0.1:8787/anthropic`
+- OpenAI-style: `http://127.0.0.1:8787/openai`
 
-
+## Connect your coding tool
 
-
+After setting master key, point your app/agent to local endpoint and use that key as auth token.
 
-
-llm-router deploy
-```
+Claude Code example (`~/.claude/settings.local.json`):
 
-
+```json
+{
+  "env": {
+    "ANTHROPIC_BASE_URL": "http://127.0.0.1:8787",
+    "ANTHROPIC_AUTH_TOKEN": "gw_your_gateway_key",
+    "ANTHROPIC_DEFAULT_OPUS_MODEL": "provider_name/model_name_1",
+    "ANTHROPIC_DEFAULT_SONNET_MODEL": "provider_name/model_name_2",
+    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "provider_name/model_name_3"
+  }
+}
+```
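The env settings in the Claude Code example above can be consumed by any client that builds its request target from `ANTHROPIC_BASE_URL` and `ANTHROPIC_AUTH_TOKEN`. The helper below is hypothetical, a sketch of that derivation rather than anything Claude Code or llm-router ships; the `/v1/messages` path is the assumption of an Anthropic-style endpoint.

```javascript
// Hypothetical sketch: derive a request target from the env vars in the
// Claude Code example above. Illustrative only; not part of either tool.
function gatewayRequestConfig(env) {
  const baseUrl = (env.ANTHROPIC_BASE_URL || "http://127.0.0.1:8787").replace(/\/$/, "");
  return {
    url: `${baseUrl}/v1/messages`, // assumes an Anthropic-style messages endpoint
    headers: {
      "x-api-key": env.ANTHROPIC_AUTH_TOKEN || "",
      "content-type": "application/json",
    },
  };
}

const cfg = gatewayRequestConfig({
  ANTHROPIC_BASE_URL: "http://127.0.0.1:8787",
  ANTHROPIC_AUTH_TOKEN: "gw_your_gateway_key",
});
```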
 
-
+## Real-Time Update Experience
 
-
-- `
--
+When local server is running:
+- open `llm-router`
+- change provider/model/load-balancer/rate-limit/alias in TUI
+- save
+- the running proxy updates instantly
 
-
+No stop/start cycle needed.
 
-
-- Create a DNS record in Cloudflare for `llm` (usually `CNAME llm -> @`)
-- Set **Proxy status = Proxied** (orange cloud)
-- Use route target `--route-pattern=llm.example.com/* --zone-name=example.com`
-- Claude Code base URL should be `https://llm.example.com/anthropic` (**no `:8787`**; that port is local-only)
+## Cloudflare Worker (Hosted)
 
-
-llm-router deploy --export-only=true --out=.llm-router.worker.json
-wrangler secret put LLM_ROUTER_CONFIG_JSON < .llm-router.worker.json
-wrangler deploy
-```
+Use when you want a hosted endpoint instead of local server.
 
-
+Guided deploy:
 
 ```bash
-llm-router
-# or generate and rotate immediately
-llm-router worker-key --env=production --generate-master-key=true
+llm-router deploy
 ```
 
-
-
-Cloudflare hardening and incident-response checklist: see [`SECURITY.md`](./SECURITY.md).
-
-## Runtime Secrets / Env
-
-Primary:
-- `LLM_ROUTER_CONFIG_JSON`
-- `LLM_ROUTER_MASTER_KEY` (optional override)
-
-Also supported:
-- `ROUTE_CONFIG_JSON`
-- `LLM_ROUTER_JSON`
+You will be guided in TUI to select account and deploy target.
 
-
-- `LLM_ROUTER_ORIGIN_RETRY_ATTEMPTS` (default `3`)
-- `LLM_ROUTER_ORIGIN_RETRY_BASE_DELAY_MS` (default `250`)
-- `LLM_ROUTER_ORIGIN_RETRY_MAX_DELAY_MS` (default `3000`)
-- `LLM_ROUTER_ORIGIN_FALLBACK_COOLDOWN_MS` (default `45000`)
-- `LLM_ROUTER_ORIGIN_RATE_LIMIT_COOLDOWN_MS` (default `30000`)
-- `LLM_ROUTER_ORIGIN_BILLING_COOLDOWN_MS` (default `900000`)
-- `LLM_ROUTER_ORIGIN_AUTH_COOLDOWN_MS` (default `600000`)
-- `LLM_ROUTER_ORIGIN_POLICY_COOLDOWN_MS` (default `120000`)
-- `LLM_ROUTER_ALLOW_POLICY_FALLBACK` (default `false`)
-- `LLM_ROUTER_FALLBACK_CIRCUIT_FAILURES` (default `2`)
-- `LLM_ROUTER_FALLBACK_CIRCUIT_COOLDOWN_MS` (default `30000`)
-- `LLM_ROUTER_MAX_REQUEST_BODY_BYTES` (default `1048576`, min `4096`, max `20971520`)
-- `LLM_ROUTER_UPSTREAM_TIMEOUT_MS` (default `60000`, min `1000`, max `300000`)
+## Config File Location
 
-
-- By default, cross-origin browser reads are denied unless explicitly allow-listed.
-- `LLM_ROUTER_CORS_ALLOWED_ORIGINS` (comma-separated exact origins, e.g. `https://app.example.com`)
-- `LLM_ROUTER_CORS_ALLOW_ALL=true` (allows any origin; not recommended for production)
-
-Optional source IP allowlist (recommended for Worker deployments):
-- `LLM_ROUTER_ALLOWED_IPS` (comma-separated client IPs; denies requests from all other IPs)
-- `LLM_ROUTER_IP_ALLOWLIST` (alias of `LLM_ROUTER_ALLOWED_IPS`)
-
-## Default Config Path
+Local config file:
 
 `~/.llm-router.json`
 
-
-
-```json
-{
-  "version": 2,
-  "masterKey": "local_or_worker_key",
-  "defaultModel": "chat.default",
-  "modelAliases": {
-    "chat.default": {
-      "strategy": "auto",
-      "targets": [
-        { "ref": "openrouter/gpt-4o" },
-        { "ref": "anthropic/claude-3-5-haiku" }
-      ],
-      "fallbackTargets": [
-        { "ref": "openrouter/gpt-4o-mini" }
-      ]
-    }
-  },
-  "providers": [
-    {
-      "id": "openrouter",
-      "name": "OpenRouter",
-      "baseUrl": "https://openrouter.ai/api/v1",
-      "apiKey": "sk-or-v1-...",
-      "formats": ["openai"],
-      "models": [{ "id": "gpt-4o" }],
-      "rateLimits": [
-        {
-          "id": "openrouter-all-month",
-          "name": "Monthly cap",
-          "models": ["all"],
-          "requests": 20000,
-          "window": { "unit": "month", "size": 1 }
-        }
-      ]
-    }
-  ]
-}
-```
-
-Direct vs model alias routing:
-- Direct route: request `model=provider/model` and optional model-level `fallbackModels` applies.
-- Model alias route: request `model=alias.id` (or set as `defaultModel`) and the model alias `targets` + `strategy` drive balancing. `auto` is the recommended default for new model aliases.
-
-State durability caveats:
-- Local Node (`llm-router start`): routing state defaults to file-backed local persistence, so cooldowns/caps survive restarts.
-- Cloudflare Worker: default state is in-memory per isolate for now; long-window counters are best-effort until a durable Worker backend is configured.
+## Security
 
-
+See [`SECURITY.md`](./SECURITY.md).
 
-
-npm run test:provider-smoke
-```
+## Versioning
 
-
+- Semver: [Semantic Versioning](https://semver.org/)
+- Release notes: [`CHANGELOG.md`](./CHANGELOG.md)
package/package.json
CHANGED
@@ -1,7 +1,19 @@
 {
   "name": "@khanglvm/llm-router",
-  "version": "1.0.
+  "version": "1.0.8",
   "description": "Single gateway endpoint for multi-provider LLMs with unified OpenAI+Anthropic format and seamless fallback",
+  "keywords": [
+    "llm-router",
+    "llm-gateway",
+    "ai-proxy",
+    "openai-compatible",
+    "anthropic-compatible",
+    "model-routing",
+    "fallback",
+    "load-balancing",
+    "cloudflare-workers",
+    "agent-infra"
+  ],
   "type": "module",
   "main": "src/index.js",
   "bin": {
package/src/cli/router-module.js
CHANGED
@@ -90,6 +90,7 @@ const MODEL_ROUTING_STRATEGY_OPTIONS = [
 const MODEL_ALIAS_STRATEGIES = MODEL_ROUTING_STRATEGY_OPTIONS.map((option) => option.value);
 const DEFAULT_PROBE_REQUESTS_PER_MINUTE = 30;
 const DEFAULT_PROBE_MAX_RATE_LIMIT_RETRIES = 3;
+const DEFAULT_AI_HELP_GATEWAY_TEST_TIMEOUT_MS = 6000;
 const RATE_LIMIT_WINDOW_UNIT_ALIASES = new Map([
   ["s", "second"],
   ["sec", "second"],
@@ -4659,6 +4660,497 @@ async function runUpdateAction(context) {
   };
 }
 
+function toHomeRelativePath(value) {
+  const input = String(value || "").trim();
+  const home = String(process.env.HOME || "").trim();
+  if (!input || !home) return input;
+  if (!input.startsWith(`${home}/`)) return input;
+  return `~${input.slice(home.length)}`;
+}
+
+function collectEnabledModelRefsFromConfig(config) {
+  const providers = (config?.providers || []).filter((provider) => provider && provider.enabled !== false);
+  const refs = [];
+  for (const provider of providers) {
+    const providerId = String(provider?.id || "").trim();
+    if (!providerId) continue;
+    for (const model of (provider.models || [])) {
+      if (!model || model.enabled === false) continue;
+      const modelId = String(model.id || "").trim();
+      if (!modelId) continue;
+      refs.push(`${providerId}/${modelId}`);
+    }
+  }
+  return dedupeList(refs);
+}
+
+function quoteShellSingle(value) {
+  return `'${String(value || "").replace(/'/g, "'\"'\"'")}'`;
+}
+
+function buildCurlGuideCommand(url, {
+  method = "GET",
+  headers = [],
+  jsonBody
+} = {}) {
+  const parts = ["curl -sS"];
+  if (String(method || "").toUpperCase() !== "GET") {
+    parts.push(`-X ${String(method || "").toUpperCase()}`);
+  }
+  for (const header of headers) {
+    parts.push(`-H ${quoteShellSingle(header)}`);
+  }
+  if (jsonBody !== undefined) {
+    parts.push("-H 'content-type: application/json'");
+    parts.push(`--data ${quoteShellSingle(JSON.stringify(jsonBody))}`);
+  }
+  parts.push(quoteShellSingle(url));
+  return parts.join(" ");
+}
+
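The `quoteShellSingle` helper added in this diff wraps a value in single quotes and escapes embedded single quotes with the standard POSIX `'"'"'` dance (close the quote, emit a double-quoted quote, reopen). A standalone copy makes the effect visible:

```javascript
// Standalone copy of the quoteShellSingle logic from the diff above,
// reproduced here only to demonstrate the escaping.
function quoteShellSingle(value) {
  // A literal `'` cannot appear inside single quotes in POSIX shells,
  // so each one becomes: close quote, '"'"' (quoted quote), reopen quote.
  return `'${String(value || "").replace(/'/g, "'\"'\"'")}'`;
}

const quoted = quoteShellSingle("it's a test");
// quoted === `'it'"'"'s a test'`
```

This keeps the generated `curl` guide commands safe to paste even when headers or JSON bodies contain quotes.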
|
|
4711
|
+
async function runGatewayHttpProbe({
|
|
4712
|
+
url,
|
|
4713
|
+
method = "GET",
|
|
4714
|
+
headers = {},
|
|
4715
|
+
jsonBody,
|
|
4716
|
+
timeoutMs = DEFAULT_AI_HELP_GATEWAY_TEST_TIMEOUT_MS
|
|
4717
|
+
} = {}) {
|
|
4718
|
+
const requestHeaders = { ...(headers || {}) };
|
|
4719
|
+
const requestInit = {
|
|
4720
|
+
method: String(method || "GET").toUpperCase(),
|
|
4721
|
+
headers: requestHeaders
|
|
4722
|
+
};
|
|
4723
|
+
|
|
4724
|
+
if (jsonBody !== undefined) {
|
|
4725
|
+
if (!requestHeaders["content-type"] && !requestHeaders["Content-Type"]) {
|
|
4726
|
+
requestHeaders["content-type"] = "application/json";
|
|
4727
|
+
}
|
|
4728
|
+
requestInit.body = JSON.stringify(jsonBody);
|
|
4729
|
+
}
|
|
4730
|
+
|
|
4731
|
+
if (typeof AbortSignal !== "undefined" && typeof AbortSignal.timeout === "function") {
|
|
4732
|
+
requestInit.signal = AbortSignal.timeout(timeoutMs);
|
|
4733
|
+
}
|
|
4734
|
+
|
|
4735
|
+
try {
|
|
4736
|
+
const response = await fetch(url, requestInit);
|
|
4737
|
+
const rawText = await response.text();
|
|
4738
|
+
return {
|
|
4739
|
+
ok: response.ok,
|
|
4740
|
+
status: response.status,
|
|
4741
|
+
payload: parseJsonSafely(rawText),
|
|
4742
|
+
rawText: String(rawText || "").trim().slice(0, 280)
|
|
4743
|
+
};
|
|
4744
|
+
} catch (error) {
|
|
4745
|
+
return {
|
|
4746
|
+
ok: false,
|
|
4747
|
+
status: 0,
|
|
4748
|
+
payload: null,
|
|
4749
|
+
rawText: "",
|
|
4750
|
+
error: error instanceof Error ? error.message : String(error)
|
|
4751
|
+
};
|
|
4752
|
+
}
|
|
4753
|
+
}
|
|
4754
|
+
|
|
4755
|
+
function summarizeProbeMessage(probe) {
|
|
4756
|
+
if (!probe) return "";
|
|
4757
|
+
if (probe.error) return String(probe.error);
|
|
4758
|
+
const payloadError = probe.payload?.error;
|
|
4759
|
+
if (typeof payloadError === "string") return payloadError.trim();
|
|
4760
|
+
if (payloadError && typeof payloadError === "object") {
|
|
4761
|
+
if (payloadError.message) return String(payloadError.message).trim();
|
|
4762
|
+
if (payloadError.type) return String(payloadError.type).trim();
|
|
4763
|
+
}
|
|
4764
|
+
if (probe.rawText) return String(probe.rawText).trim().slice(0, 140);
|
|
4765
|
+
return "";
|
|
4766
|
+
}
|
|
4767
|
+
|
|
4768
|
+
function formatProbeStatusLabel(probe, {
|
|
4769
|
+
passStatuses = [200],
|
|
4770
|
+
passWhenStatusIsNot = null
|
|
4771
|
+
} = {}) {
|
|
4772
|
+
if (!probe) return "not-run";
|
|
4773
|
+
if (probe.error) return `error (${probe.error})`;
|
|
4774
|
+
const status = Number(probe.status || 0);
|
|
4775
|
+
const isPass = passWhenStatusIsNot !== null
|
|
4776
|
+
? status !== passWhenStatusIsNot
|
|
4777
|
+
: passStatuses.includes(status);
|
|
4778
|
+
const message = summarizeProbeMessage(probe);
|
|
4779
|
+
if (message) return `${isPass ? "pass" : "fail"} (status=${status}; ${message})`;
|
|
4780
|
+
return `${isPass ? "pass" : "fail"} (status=${status})`;
|
|
4781
|
+
}
|
|
4782
|
+
|
|
4783
|
+
async function runAiHelpGatewayLiveTests({
|
|
4784
|
+
runtimeState,
|
|
4785
|
+
authToken = "",
|
|
4786
|
+
probeModel = "",
|
|
4787
|
+
timeoutMs = DEFAULT_AI_HELP_GATEWAY_TEST_TIMEOUT_MS
|
|
4788
|
+
} = {}) {
|
|
4789
|
+
if (!runtimeState) {
|
|
4790
|
+
return {
|
|
4791
|
+
ran: false,
|
|
4792
|
+
reason: "local-server-not-running",
|
|
4793
|
+
baseUrl: "",
|
|
4794
|
+
tests: {}
|
|
4795
|
+
};
|
|
4796
|
+
}
|
|
4797
|
+
|
|
4798
|
+
const baseUrl = `http://${runtimeState.host}:${runtimeState.port}`;
|
|
4799
|
+
const token = String(authToken || "").trim();
|
|
4800
|
+
const headers = token
|
|
4801
|
+
? {
|
|
4802
|
+
Authorization: `Bearer ${token}`,
|
|
4803
|
+
"x-api-key": token
|
|
4804
|
+
}
|
|
4805
|
+
: {};
|
|
4806
|
+
|
|
4807
|
+
const modelId = String(probeModel || "").trim() || "chat.default";
|
|
4808
|
+
const [health, openaiModels, claudeModels, codexResponses] = await Promise.all([
|
|
4809
|
+
runGatewayHttpProbe({
|
|
4810
|
+
url: `${baseUrl}/health`,
|
|
4811
|
+
method: "GET",
|
|
4812
|
+
headers,
|
|
4813
|
+
timeoutMs
|
|
4814
|
+
}),
|
|
4815
|
+
runGatewayHttpProbe({
|
|
4816
|
+
url: `${baseUrl}/openai/v1/models`,
|
|
4817
|
+
method: "GET",
|
|
4818
|
+
headers,
|
|
4819
|
+
timeoutMs
|
|
4820
|
+
}),
|
|
4821
|
+
runGatewayHttpProbe({
|
|
4822
|
+
url: `${baseUrl}/anthropic/v1/models`,
|
|
4823
|
+
method: "GET",
|
|
4824
|
+
headers,
|
|
4825
|
+
timeoutMs
|
|
4826
|
+
}),
|
|
4827
|
+
runGatewayHttpProbe({
|
|
4828
|
+
url: `${baseUrl}/openai/v1/responses`,
|
|
4829
|
+
method: "POST",
|
|
4830
|
+
headers,
|
|
4831
|
+
jsonBody: {
|
|
4832
|
+
model: modelId,
|
|
4833
|
+
input: "ping"
|
|
4834
|
+
},
|
|
4835
|
+
timeoutMs
|
|
4836
|
+
})
|
|
4837
|
+
]);
|
|
4838
|
+
|
|
4839
|
+
return {
|
|
4840
|
+
ran: true,
|
|
4841
|
+
reason: "completed",
|
|
4842
|
+
baseUrl,
|
|
4843
|
+
tests: {
|
|
4844
|
+
health,
|
|
4845
|
+
openaiModels,
|
|
4846
|
+
claudeModels,
|
|
4847
|
+
codexResponses
|
|
4848
|
+
}
|
|
4849
|
+
};
|
|
4850
|
+
}
|
|
4851
|
+
|
|
4852
|
+
async function runAiHelpAction(context) {
  const args = context.args || {};
  const configPath = readArg(args, ["config", "configPath"], getDefaultConfigPath());
  const skipLiveTest = toBoolean(readArg(args, ["skip-live-test", "skipLiveTest"], false), false);
  const liveTestTimeoutMs = toPositiveInteger(
    readArg(args, ["live-test-timeout-ms", "liveTestTimeoutMs"], DEFAULT_AI_HELP_GATEWAY_TEST_TIMEOUT_MS),
    DEFAULT_AI_HELP_GATEWAY_TEST_TIMEOUT_MS,
    { min: 500, max: 60_000 }
  );
  const explicitGatewayAuthToken = String(readArg(args, ["gateway-auth-token", "gatewayAuthToken"], "") || "").trim();
  const config = await readConfigFile(configPath);

  const providers = (config.providers || []).filter((provider) => provider && provider.enabled !== false);
  const providerCount = providers.length;
  const modelCount = providers.reduce((sum, provider) => {
    const count = (provider.models || []).filter((model) => model && model.enabled !== false).length;
    return sum + count;
  }, 0);

  const aliasEntries = Object.entries(config.modelAliases || {});
  const aliasCount = aliasEntries.length;
  const aliasStrategySummary = aliasEntries
    .map(([aliasId, alias]) => `${aliasId}:${alias?.strategy || "ordered"}`)
    .join(", ") || "(none)";
  const rateLimitBucketCount = providers.reduce((sum, provider) => sum + (provider.rateLimits || []).length, 0);
  const defaultModel = String(config.defaultModel || "smart");
  const hasMasterKey = Boolean(String(config.masterKey || "").trim());

  let runtimeState = null;
  try {
    runtimeState = await getActiveRuntimeState();
  } catch {
    runtimeState = null;
  }
  const serverRunning = Boolean(runtimeState);
  const runtimeRequiresAuth = Boolean(runtimeState?.requireAuth);

  let runtimeConfig = null;
  const runtimeConfigPath = String(runtimeState?.configPath || "").trim();
  if (runtimeConfigPath && runtimeConfigPath !== configPath) {
    try {
      runtimeConfig = await readConfigFile(runtimeConfigPath);
    } catch {
      runtimeConfig = null;
    }
  }

  const runtimeMasterKey = String(runtimeConfig?.masterKey || "").trim();
  const gatewayAuthToken = explicitGatewayAuthToken
    || (runtimeConfigPath && runtimeConfigPath !== configPath ? runtimeMasterKey : "")
    || String(config.masterKey || "").trim()
    || runtimeMasterKey;

  const directModelRefs = collectEnabledModelRefsFromConfig(config);
  const aliasIds = aliasEntries.map(([aliasId]) => aliasId);
  const modelDecisionOptions = dedupeList([
    defaultModel && defaultModel !== "smart" ? defaultModel : "",
    ...aliasIds,
    ...directModelRefs
  ]);
  const probeModel = modelDecisionOptions[0] || "chat.default";

  let liveTest = {
    ran: false,
    reason: skipLiveTest ? "skipped-by-flag" : "local-server-not-running",
    baseUrl: serverRunning ? `http://${runtimeState.host}:${runtimeState.port}` : "",
    tests: {}
  };
  if (!skipLiveTest && serverRunning) {
    liveTest = await runAiHelpGatewayLiveTests({
      runtimeState,
      authToken: gatewayAuthToken,
      probeModel,
      timeoutMs: liveTestTimeoutMs
    });
  }

  const healthProbe = liveTest.tests?.health || null;
  const openaiModelsProbe = liveTest.tests?.openaiModels || null;
  const claudeModelsProbe = liveTest.tests?.claudeModels || null;
  const codexResponsesProbe = liveTest.tests?.codexResponses || null;

  const claudePatchGate = !liveTest.ran
    ? "pending-live-test"
    : (claudeModelsProbe?.status === 200 ? "ready" : "blocked");
  const openCodePatchGate = !liveTest.ran
    ? "pending-live-test"
    : (openaiModelsProbe?.status === 200 ? "ready" : "blocked");
  let codexPatchGate = "pending-live-test";
  if (liveTest.ran) {
    if (codexResponsesProbe?.error) {
      codexPatchGate = "blocked";
    } else if (codexResponsesProbe?.status === 404) {
      codexPatchGate = "blocked-responses-endpoint-missing";
    } else if ([401, 403].includes(Number(codexResponsesProbe?.status || 0))) {
      codexPatchGate = "blocked-auth";
    } else {
      codexPatchGate = "ready";
    }
  }

  const suggestions = [];
  if (providerCount === 0) {
    suggestions.push("Add first provider with at least one model. Run: llm-router config --operation=upsert-provider --provider-id=<id> --name=\"<name>\" --base-url=<url> --api-key=<key> --models=<model1,model2>");
  } else {
    const providersWithoutModels = providers
      .filter((provider) => (provider.models || []).filter((model) => model && model.enabled !== false).length === 0)
      .map((provider) => provider.id);
    if (providersWithoutModels.length > 0) {
      suggestions.push(`Add models to provider(s) with empty model list: ${providersWithoutModels.join(", ")}. Run: llm-router config --operation=upsert-provider --provider-id=<id> --models=<model1,model2>`);
    }
  }

  if (modelCount > 0 && aliasCount === 0) {
    suggestions.push("Create a model alias/group for stable app routing. Run: llm-router config --operation=upsert-model-alias --alias-id=chat.default --strategy=auto --targets=<provider/model,...>");
  }

  if (aliasCount > 0) {
    const nonAutoAliases = aliasEntries
      .filter(([, alias]) => String(alias?.strategy || "ordered") !== "auto")
      .map(([aliasId]) => aliasId);
    if (nonAutoAliases.length > 0) {
      suggestions.push(`Review load-balancer strategy for alias(es): ${nonAutoAliases.join(", ")}. Recommended default: auto.`);
    }
  }

  if (providerCount > 0 && rateLimitBucketCount === 0) {
    suggestions.push("Add at least one provider rate-limit bucket for quota safety. Run: llm-router config --operation=set-provider-rate-limits --provider-id=<id> --bucket-name=\"Monthly cap\" --bucket-models=all --bucket-requests=<n> --bucket-window=month:1");
  }

  if (!hasMasterKey) {
    suggestions.push("Set master key for authenticated access. Run: llm-router config --operation=set-master-key --generate-master-key=true");
  }

  if (!serverRunning) {
    suggestions.push(`Start local proxy server. Run: llm-router start${hasMasterKey ? " --require-auth=true" : ""}`);
  } else {
    suggestions.push(`Local proxy is running on http://${runtimeState.host}:${runtimeState.port}. Apply config changes with llm-router config; updates hot-reload automatically.`);
  }

  if (serverRunning && skipLiveTest) {
    suggestions.push("Run live llm-router API test before patching coding-tool config. Re-run: llm-router ai-help --skip-live-test=false");
  }

  if (liveTest.ran && claudePatchGate !== "ready") {
    suggestions.push("Claude/OpenCode patch gate is blocked. Fix llm-router auth/provider/model readiness, then re-run llm-router ai-help.");
  }
  if (liveTest.ran && codexPatchGate === "blocked-responses-endpoint-missing") {
    suggestions.push("Codex CLI requires OpenAI Responses API. Current llm-router endpoint does not expose /openai/v1/responses; do not patch Codex until this gate is resolved.");
  }

  if (suggestions.length === 0) {
    suggestions.push("No blocking setup gaps detected. Review routing summary with: llm-router config --operation=list-routing");
  }

  const runtimeConfigPathForDisplay = runtimeConfigPath ? toHomeRelativePath(runtimeConfigPath) : "";
  const gatewayBaseUrlForGuide = liveTest.baseUrl || (serverRunning ? `http://${runtimeState.host}:${runtimeState.port}` : "http://127.0.0.1:8787");
  const authGuideHeaders = runtimeRequiresAuth ? ["Authorization: Bearer <master_key>"] : [];

  const lines = [
    "# AI-HELP",
    "ENTITY: llm-router",
    "MODE: cli-automation",
    "PROFILE: agent-guide-v2",
    "",
    "## INTRO",
    "Use this output as an AI-agent operating brief for llm-router.",
    "The agent should auto-discover commands, inspect current state, configure llm-router on your behalf, run live API gates, and only then patch coding tool configs.",
    "",
    "## WHAT AGENT CAN DO WITH LLM-ROUTER",
    "- explain llm-router capabilities and current setup readiness",
    "- set provider, model list, model alias/group, and rate-limit buckets via CLI",
    "- validate local llm-router endpoint health/model-list/routes with real API probes",
    "- patch coding tools (Claude Code, Codex CLI, OpenCode) after pre-patch gates pass",
    "",
    "## DISCOVERY COMMANDS",
    "- llm-router -h",
    "- llm-router config -h",
    "- llm-router start -h",
    "- llm-router deploy -h",
    "",
    "## CURRENT STATE",
    `- config_path=${configPath}`,
    `- providers=${providerCount}`,
    `- models=${modelCount}`,
    `- model_aliases=${aliasCount}`,
    `- alias_strategies=${aliasStrategySummary}`,
    `- rate_limit_buckets=${rateLimitBucketCount}`,
    `- default_model=${defaultModel}`,
    `- master_key_configured=${hasMasterKey}`,
    `- local_server_running=${serverRunning}`,
    serverRunning ? `- local_server_endpoint=http://${runtimeState.host}:${runtimeState.port}` : "",
    runtimeState ? `- local_server_require_auth=${runtimeRequiresAuth}` : "",
    runtimeConfigPathForDisplay ? `- local_server_config_path=${runtimeConfigPathForDisplay}` : "",
    "",
    "## MODEL/GROUP DECISION INPUT (REQUIRED BEFORE PATCHING TOOL CONFIG)",
    "- Ask user to choose target_tool: claude-code | codex-cli | opencode",
    "- Ask user to choose target_model_or_group for that tool",
    `- available_alias_groups=${aliasIds.join(", ") || "(none)"}`,
    `- available_direct_models=${directModelRefs.join(", ") || "(none)"}`,
    `- decision_options_preview=${modelDecisionOptions.slice(0, 12).join(", ") || "(none)"}`,
    "- If user chooses an alias/group, keep alias id unchanged so llm-router balancing still works.",
    "",
    "## PRE-PATCH API GATE (MUST PASS BEFORE EDITING TOOL CONFIG)",
    `- live_test_mode=${skipLiveTest ? "skipped-by-flag" : (liveTest.ran ? "executed" : "pending-local-server")}`,
    `- live_test_timeout_ms=${liveTestTimeoutMs}`,
    `- gateway_base_url=${gatewayBaseUrlForGuide}`,
    `- health_probe=${liveTest.ran ? formatProbeStatusLabel(healthProbe, { passStatuses: [200] }) : "not-run"}`,
    `- openai_models_probe=${liveTest.ran ? formatProbeStatusLabel(openaiModelsProbe, { passStatuses: [200] }) : "not-run"}`,
    `- claude_models_probe=${liveTest.ran ? formatProbeStatusLabel(claudeModelsProbe, { passStatuses: [200] }) : "not-run"}`,
    `- codex_responses_probe=${liveTest.ran ? formatProbeStatusLabel(codexResponsesProbe, { passWhenStatusIsNot: 404 }) : "not-run"}`,
    `- patch_gate_claude_code=${claudePatchGate}`,
    `- patch_gate_opencode=${openCodePatchGate}`,
    `- patch_gate_codex_cli=${codexPatchGate}`,
    "- Rule: Do NOT patch any coding-tool config until required gate is ready.",
    "",
    "## LIVE TEST COMMANDS (RUN BEFORE PATCHING TOOL CONFIG)",
    runtimeRequiresAuth ? "- export LLM_ROUTER_MASTER_KEY='<master_key>'" : "- Local auth currently disabled; auth header is optional.",
    `- ${buildCurlGuideCommand(`${gatewayBaseUrlForGuide}/health`, { method: "GET", headers: authGuideHeaders })}`,
    `- ${buildCurlGuideCommand(`${gatewayBaseUrlForGuide}/openai/v1/models`, { method: "GET", headers: authGuideHeaders })}`,
    `- ${buildCurlGuideCommand(`${gatewayBaseUrlForGuide}/anthropic/v1/models`, { method: "GET", headers: authGuideHeaders })}`,
    `- ${buildCurlGuideCommand(`${gatewayBaseUrlForGuide}/openai/v1/responses`, {
      method: "POST",
      headers: authGuideHeaders,
      jsonBody: { model: "<target_model_or_group>", input: "ping" }
    })} # required for Codex CLI compatibility`,
    "",
    "## LLM-ROUTER CONFIG WORKFLOWS (CLI)",
    "1. Upsert provider + models:",
    "   llm-router config --operation=upsert-provider --provider-id=<id> --name=\"<name>\" --endpoints=<url1,url2> --api-key=<key> --models=<model1,model2>",
    "2. Upsert model alias/group:",
    "   llm-router config --operation=upsert-model-alias --alias-id=<alias> --strategy=auto --targets=<provider/model,...>",
    "3. Set provider rate limit bucket:",
    "   llm-router config --operation=set-provider-rate-limits --provider-id=<id> --bucket-name=\"Monthly cap\" --bucket-models=all --bucket-requests=<n> --bucket-window=month:1",
    "4. Review final routing summary:",
    "   llm-router config --operation=list-routing",
    "",
    "## CODING TOOL PATCH PLAYBOOK",
    "### Claude Code",
    "- patch_target_priority=.claude/settings.local.json (project) -> ~/.claude/settings.json (user)",
    "- required_gate=patch_gate_claude_code=ready",
    "- set env keys: ANTHROPIC_BASE_URL, ANTHROPIC_AUTH_TOKEN, ANTHROPIC_MODEL",
    "```json",
    "{",
    "  \"env\": {",
    `    \"ANTHROPIC_BASE_URL\": \"${gatewayBaseUrlForGuide}/anthropic\",`,
    "    \"ANTHROPIC_AUTH_TOKEN\": \"<master_key>\",",
    "    \"ANTHROPIC_MODEL\": \"<target_model_or_group>\"",
    "  }",
    "}",
    "```",
    "",
    "### Codex CLI",
    "- patch_target_priority=.codex/config.toml (project) -> ~/.codex/config.toml (user)",
    "- required_gate=patch_gate_codex_cli=ready",
    "- hard requirement: Codex uses OpenAI Responses API; /openai/v1/responses must be reachable",
    "```toml",
    "model_provider = \"llm_router\"",
    "model = \"<target_model_or_group>\"",
    "",
    "[model_providers.llm_router]",
    "name = \"llm-router\"",
    `base_url = \"${gatewayBaseUrlForGuide}/openai/v1\"`,
    "wire_api = \"responses\"",
    "env_http_headers = { Authorization = \"LLM_ROUTER_AUTH_HEADER\" }",
    "```",
    "- export env before launching Codex: export LLM_ROUTER_AUTH_HEADER='Bearer <master_key>'",
    "",
    "### OpenCode",
    "- patch_target_priority=./opencode.json (project) -> ~/.config/opencode/opencode.json (user)",
    "- required_gate=patch_gate_opencode=ready",
    "```json",
    "{",
    "  \"model\": \"<target_model_or_group>\",",
    "  \"small_model\": \"<target_model_or_group>\",",
    "  \"provider\": {",
    "    \"llm-router\": {",
    "      \"options\": {",
    `        \"baseURL\": \"${gatewayBaseUrlForGuide}/openai\",`,
    "        \"apiKey\": \"<master_key>\"",
    "      }",
    "    }",
    "  }",
    "}",
    "```",
    "",
    "## NEXT SUGGESTIONS",
    ...suggestions.map((item, index) => `${index + 1}. ${item}`),
    "",
    "## UPDATE RULE",
    "When local server is running, llm-router config changes are hot-reloaded in memory (no manual restart required).",
    "Agent policy: always run live API gate checks first, then patch tool configs only after gate status is ready."
  ].filter(Boolean);

  return {
    ok: true,
    mode: context.mode,
    exitCode: EXIT_SUCCESS,
    data: lines.join("\n")
  };
}

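The patch-gate logic inside `runAiHelpAction` above reduces to two small mappings over probe results. A standalone sketch of those mappings, for illustration only (the function names are assumptions; probe objects are assumed to carry `{ status, error }` as in the code above):

```javascript
// Sketch: mirrors the gate derivation in runAiHelpAction.
// Models-list gates (Claude Code / OpenCode) require a 200 from the models probe.
function deriveModelsGate(liveTestRan, probe) {
  if (!liveTestRan) return "pending-live-test";
  return probe?.status === 200 ? "ready" : "blocked";
}

// The Codex gate additionally distinguishes a missing /responses endpoint (404)
// and auth failures (401/403) from generic transport errors.
function deriveCodexGate(liveTestRan, probe) {
  if (!liveTestRan) return "pending-live-test";
  if (probe?.error) return "blocked";
  if (probe?.status === 404) return "blocked-responses-endpoint-missing";
  if ([401, 403].includes(Number(probe?.status || 0))) return "blocked-auth";
  return "ready";
}
```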
async function runDeployAction(context) {
  const args = context.args || {};
  const configPath = readArg(args, ["config", "configPath"], getDefaultConfigPath());
@@ -5352,6 +5844,44 @@ const routerModule = {
    },
    run: runUpdateAction
  },
  {
    actionId: "ai-help",
    description: "Print AI-agent guide with llm-router setup workflows, live API gates, and coding-tool patch playbooks.",
    tui: { steps: ["print-ai-help"] },
    commandline: {
      requiredArgs: [],
      optionalArgs: [
        "config",
        "skip-live-test",
        "live-test-timeout-ms",
        "gateway-auth-token"
      ]
    },
    help: {
      summary: "AI guide for setup + operation: state snapshot, provider/alias/rate-limit workflows, live gateway tests, and patch rules for Claude/Codex/OpenCode.",
      args: [
        { name: "config", required: false, description: "Path to config file used for state-aware suggestions.", example: "--config=~/.llm-router.json" },
        { name: "skip-live-test", required: false, description: "Skip live llm-router API probes in ai-help output.", example: "--skip-live-test=true" },
        { name: "live-test-timeout-ms", required: false, description: `HTTP timeout for ai-help live probes (default ${DEFAULT_AI_HELP_GATEWAY_TEST_TIMEOUT_MS}ms).`, example: "--live-test-timeout-ms=8000" },
        { name: "gateway-auth-token", required: false, description: "Override auth token for live probes when runtime config differs from selected --config.", example: "--gateway-auth-token=gw_..." }
      ],
      examples: [
        "llm-router ai-help",
        "llm-router ai-help --config=~/.llm-router.json",
        "llm-router ai-help --skip-live-test=true",
        "llm-router ai-help --live-test-timeout-ms=8000"
      ],
      useCases: [
        {
          name: "agent setup brief",
          description: "Generate a machine-readable operating guide so AI agents can configure llm-router, run pre-patch API gates, and patch tool configs safely.",
          command: "llm-router ai-help"
        }
      ],
      keybindings: []
    },
    run: runAiHelpAction
  },
  {
    actionId: "config",
    description: "Config manager for providers/models/master-key/startup service.",