@khanglvm/llm-router 1.0.6 → 1.0.9
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +26 -0
- package/README.md +131 -378
- package/package.json +13 -1
- package/src/cli/cloudflare-api.js +267 -0
- package/src/cli/router-module.js +575 -568
- package/src/cli/wrangler-toml.js +324 -0
- package/src/index.js +3 -1
- package/src/node/port-reclaim.js +224 -0
- package/src/node/start-command.js +2 -128
- package/src/runtime/handler/provider-call.js +8 -2
- package/src/runtime/handler/route-debug.js +104 -0
- package/src/runtime/handler/runtime-policy.js +161 -0
- package/src/runtime/handler.js +43 -236
- package/src/shared/timeout-signal.js +23 -0
package/CHANGELOG.md
CHANGED

@@ -5,6 +5,32 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [1.0.9] - 2026-03-03
+
+### Added
+- Added dedicated modules for Cloudflare API preflight checks and Wrangler TOML target handling.
+- Added runtime policy and route-debug helpers so stateful routing can be safely disabled by default on Cloudflare Workers.
+- Added a reusable timeout-signal utility and start-command port-reclaim utilities with test coverage.
+
+### Changed
+- Refactored CLI deploy/runtime handler code into focused modules with cleaner boundaries.
+- Updated provider-call timeout handling to support both `AbortSignal.timeout` and an `AbortController` fallback.
+- Documented Worker safety defaults and switched README release/security links to canonical GitHub URLs.
+
+## [1.0.8] - 2026-02-28
+
+### Changed
+- Added focused npm `keywords` metadata in `package.json` to improve package discoverability.
+
+## [1.0.7] - 2026-02-28
+
+### Added
+- Added `llm-router ai-help` to generate an agent-oriented operating guide with live gateway checks and coding-tool patch instructions.
+- Added tests covering `ai-help` discovery output and first-run setup guidance.
+
+### Changed
+- Rewrote `README.md` into a shorter setup and operations guide focused on providers, aliases, rate limits, and local/hosted usage.
+
 ## [1.0.6] - 2026-02-28
 
 ### Added
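The 1.0.9 note about provider-call timeouts describes a standard pattern; a minimal sketch of it follows (the `timeoutSignal` name is hypothetical and may not match the package's actual `src/shared/timeout-signal.js` export):

```javascript
// Prefer AbortSignal.timeout when the runtime provides it (Node 17.3+,
// modern Workers); otherwise fall back to a timer-driven AbortController.
function timeoutSignal(ms) {
  if (typeof AbortSignal.timeout === 'function') {
    return AbortSignal.timeout(ms);
  }
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  // In Node, avoid keeping the process alive just for this timer.
  timer.unref?.();
  return controller.signal;
}
```

The returned signal can then be passed as `fetch(url, { signal })` for the upstream provider call.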
package/README.md
CHANGED

@@ -1,440 +1,193 @@
 # llm-router
 
-`llm-router`
+`llm-router` exposes a unified API endpoint for multiple AI providers and models.
 
-
-- local route server `llm-router start`
-- Cloudflare Worker route runtime deployment `llm-router deploy`
-- CLI + TUI management `config`, `start`, `deploy`, `worker-key`
-- Seamless model fallback
+## Main features
 
-
-
-
-npm i -g @khanglvm/llm-router
-```
-
-## Versioning
+1. Single endpoint, unified providers & models
+2. Supports grouping models with rate-limit and load-balancing strategies
+3. Configuration auto-reloads in real time, with no interruption
 
-
-- Release notes live in [`CHANGELOG.md`](./CHANGELOG.md).
-- npm publishes are configured for the public registry package.
-
-Release checklist:
-- Update `README.md` if user-facing behavior changed.
-- Add a dated entry in `CHANGELOG.md`.
-- Bump the package version before publish.
-- Publish with `npm publish`.
-
-## Quick Start
+## Install
 
 ```bash
-
-llm-router
-
-# 2) Start local route server
-llm-router start
+npm i -g @khanglvm/llm-router@latest
 ```
 
-
-- Unified (Auto transform): `http://127.0.0.1:8787/route` (or `/` and `/v1`)
-- Anthropic: `http://127.0.0.1:8787/anthropic`
-- OpenAI: `http://127.0.0.1:8787/openai`
-
-## Usage Example
-
-```bash
-# Your AI Agent can help! Ask them to manage api router via this tool for you.
-
-# 1) Add provider + models + provider API key. You can ask your AI agent to do it for you, or manually via TUI or command line:
-llm-router config \
-  --operation=upsert-provider \
-  --provider-id=openrouter \
-  --name="OpenRouter" \
-  --base-url=https://openrouter.ai/api/v1 \
-  --api-key=sk-or-v1-... \
-  --models=claude-3-7-sonnet,gpt-4o \
-  --format=openai \
-  --skip-probe=true
-
-# 2) (Optional) Configure model fallback order for direct provider/model requests
-llm-router config \
-  --operation=set-model-fallbacks \
-  --provider-id=openrouter \
-  --model=claude-3-7-sonnet \
-  --fallback-models=openrouter/gpt-4o
-
-# 3) (Optional) Create a model alias with a routing strategy and weighted targets
-llm-router config \
-  --operation=upsert-model-alias \
-  --alias-id=chat.default \
-  --strategy=auto \
-  --targets=openrouter/claude-3-7-sonnet@2,openrouter/gpt-4o@1 \
-  --fallback-targets=openrouter/gpt-4o-mini
-
-# 4) (Optional) Add provider request-cap bucket (models: all)
-llm-router config \
-  --operation=set-provider-rate-limits \
-  --provider-id=openrouter \
-  --bucket-name="Monthly cap" \
-  --bucket-models=all \
-  --bucket-requests=20000 \
-  --bucket-window=month:1
-
-# 5) Set master key (this is your gateway key for client apps)
-llm-router config --operation=set-master-key --master-key=gw_your_gateway_key
-
-# 6) Start gateway with auth required
-llm-router start --require-auth=true
-```
+## Usage
 
-
+Copy/paste this short instruction to your AI agent:
 
-```
-
-  "env": {
-    "ANTHROPIC_BASE_URL": "http://127.0.0.1:8787/anthropic",
-    "ANTHROPIC_AUTH_TOKEN": "gw_your_gateway_key"
-  }
-}
+```text
+Run `llm-router ai-help` first, then set up and operate llm-router for me using CLI commands.
 ```
 
-##
-
-`llm-router` can fail over from a primary model to configured fallback models with status-aware logic:
-- `429` (rate-limited): immediate fallback (no origin retry), with `Retry-After` respected when present.
-- Temporary failures (`408`, `409`, `5xx`, network errors): origin-only bounded retries with jittered backoff, then fallback.
-- Billing/quota exhaustion (`402`, or provider-specific billing signals): immediate fallback with longer origin cooldown memory.
-- Auth and permission failures (`401` and relevant `403` cases): no retry; fallback to other providers/models when possible.
-- Policy/moderation blocks: no retry; cross-provider fallback is disabled by default (`LLM_ROUTER_ALLOW_POLICY_FALLBACK=false`).
-- Invalid client requests (`400`, `413`, `422`): no retry and no fallback short-circuit.
+## Main Workflow
 
-
+1. Add providers + models into llm-router
+2. Optionally, group models as an alias with load balancing and auto-fallback support
+3. Start the llm-router server, then point your coding tool's API and model to llm-router
 
-
+## What Each Term Means
 
-
+### Provider
+The service endpoint you call (OpenRouter, Anthropic, etc.).
 
-
-
-- `round-robin`: Rotates evenly across eligible targets.
-- `weighted-rr`: Rotates like round-robin, but favors higher weights.
-- `quota-aware-weighted-rr`: Weighted routing plus remaining-capacity awareness.
-
-Example:
-
-```bash
-llm-router config \
-  --operation=upsert-model-alias \
-  --alias-id=coding \
-  --strategy=auto \
-  --targets=rc/gpt-5.3-codex,zai/glm-5
-```
-
-Concrete model alias example with provider-specific caps:
-
-```bash
-llm-router config \
-  --operation=upsert-model-alias \
-  --alias-id=coding \
-  --strategy=auto \
-  --targets=rc/gpt-5.3-codex,zai/glm-5
-
-llm-router config \
-  --operation=set-provider-rate-limits \
-  --provider-id=rc \
-  --bucket-name="Minute cap" \
-  --bucket-models=gpt-5.3-codex \
-  --bucket-requests=60 \
-  --bucket-window=minute:1
-
-llm-router config \
-  --operation=set-provider-rate-limits \
-  --provider-id=zai \
-  --bucket-name="5-hours cap" \
-  --bucket-models=glm-5 \
-  --bucket-requests=600 \
-  --bucket-window=hour:5
-```
-
-## What Is A Bucket?
-
-A rate-limit bucket is a request cap for a time window.
+### Model
+The actual model ID from that provider.
 
+### Rate-Limit Bucket
+A request cap for a time window.
 Examples:
-- `40
-- `
-
-Multiple buckets can apply to the same model scope at the same time. A candidate is treated as exhausted if any matching bucket is exhausted.
+- `40 requests / minute`
+- `20,000 requests / month`
 
-
+### Model Load Balancer
+Decides how traffic is distributed across models in an alias group.
 
-
-- `Manage provider rate-limit buckets`
-- `Create bucket(s)`
-
-The TUI now guides you through:
-- Bucket name (friendly label)
-- Model scope (`all` or selected models with multiselect checkboxes)
-- Request cap
-- Window unit (`minute`, `hour(s)`, `week`, `month`)
-- Window size (hours support `N`, other preset units lock to `1`)
-- Review + optional add-another loop for combined policies
-
-Internal bucket ids are generated automatically from the name when omitted and shown as advanced detail in review.
-
-## Combined-Cap Recipe (`40/min` + `600/6h`)
-
-```bash
-llm-router config \
-  --operation=set-provider-rate-limits \
-  --provider-id=openrouter \
-  --bucket-name="Minute cap" \
-  --bucket-models=all \
-  --bucket-requests=40 \
-  --bucket-window=minute:1
-
-llm-router config \
-  --operation=set-provider-rate-limits \
-  --provider-id=openrouter \
-  --bucket-name="6-hours cap" \
-  --bucket-models=all \
-  --bucket-requests=600 \
-  --bucket-window=hour:6
-```
-
-This keeps both limits active together for the same model scope.
-
-## Rate-Limit Troubleshooting
-
-- Check routing decisions with `LLM_ROUTER_DEBUG_ROUTING=true` and inspect `x-llm-router-skipped-candidates`.
-- `quota-exhausted` means a proactive pre-routing skip happened before an upstream call.
-- For provider `429`, cooldown is tracked from `Retry-After` when present, or from `LLM_ROUTER_ORIGIN_RATE_LIMIT_COOLDOWN_MS`.
-- Local mode persists state by default (file backend), while Worker defaults to in-memory state.
-
-## Main Commands
-
-```bash
-llm-router config
-llm-router start
-llm-router stop
-llm-router reload
-llm-router update
-llm-router deploy
-llm-router worker-key
-```
-
-## Non-Interactive Config (Agent/CI Friendly)
-
-```bash
-llm-router config \
-  --operation=upsert-provider \
-  --provider-id=openrouter \
-  --name="OpenRouter" \
-  --base-url=https://openrouter.ai/api/v1 \
-  --api-key=sk-or-v1-... \
-  --models=gpt-4o,claude-3-7-sonnet \
-  --format=openai \
-  --skip-probe=true
-
-llm-router config \
-  --operation=upsert-model-alias \
-  --alias-id=chat.default \
-  --strategy=auto \
-  --targets=openrouter/gpt-4o-mini@3,anthropic/claude-3-5-haiku@2 \
-  --fallback-targets=openrouter/gpt-4o
-
-llm-router config \
-  --operation=set-provider-rate-limits \
-  --provider-id=openrouter \
-  --bucket-name="Monthly cap" \
-  --bucket-models=all \
-  --bucket-requests=20000 \
-  --bucket-window=month:1
-```
-
-Alias target syntax:
-- `--targets` / `--fallback-targets`: `<routeRef>@<weight>` or `<routeRef>:<weight>`
-- route refs: direct `provider/model` or alias id
-
-Routing strategy values:
+Available strategies:
 - `auto` (recommended)
 - `ordered`
 - `round-robin`
 - `weighted-rr`
 - `quota-aware-weighted-rr`
 
-
-
-- `--bucket-window=1w`
-- `--bucket-window=7day`
-
-Routing summary:
-
-```bash
-llm-router config --operation=list-routing
-```
+### Model Alias (Group Models)
+A single model name that auto-routes/rotates across multiple models.
 
-
+Example:
+- alias: `opus`
+- targets:
+  - `openrouter/claude-opus-4.6`
+  - `anthropic/claude-opus-4.6`
 
-
-llm-router config --operation=migrate-config --target-version=2 --create-backup=true
-```
+Your app can use the `opus` model, and `llm-router` chooses target models based on your routing settings.
 
-
-- Local config loads with silent forward-migration to latest supported schema.
-- Migration is persisted automatically on read when possible (best-effort, no interactive prompt).
-- Future/newer version numbers do not fail only because of version mismatch; known fields are normalized best-effort.
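Weighted alias targets such as `openrouter/claude-opus-4.6@2` pair naturally with the `weighted-rr` strategy; a toy illustration of the idea (invented names and shapes, not the package's internal code):

```javascript
// Expand weighted targets into a rotation ring, then walk it with a cursor.
// A weight of 2 means the target appears twice per rotation.
function makeWeightedRoundRobin(targets) {
  const ring = targets.flatMap(({ ref, weight = 1 }) => Array(weight).fill(ref));
  let cursor = 0;
  return () => ring[cursor++ % ring.length];
}

const next = makeWeightedRoundRobin([
  { ref: 'openrouter/claude-opus-4.6', weight: 2 },
  { ref: 'anthropic/claude-opus-4.6', weight: 1 },
]);
// Each full rotation picks openrouter twice and anthropic once.
```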
+## Setup Using the Terminal User Interface (TUI)
 
-
+Open the TUI:
 
 ```bash
-llm-router
-# or generate a strong key automatically
-llm-router config --operation=set-master-key --generate-master-key=true
+llm-router
 ```
 
-
+Then follow this order.
+
+### 1) Add Provider
+Flow:
+1. `Config manager`
+2. `Add/Edit provider`
+3. Enter provider name, endpoint, API key
+4. Enter model list
+5. Save
+
+### 2) Configure Model Fallback (Optional)
+Flow:
+1. `Config manager`
+2. `Set model silent-fallbacks`
+3. Pick main model
+4. Pick fallback models
+5. Save
+
+### 3) Configure Rate Limits (Optional)
+Flow:
+1. `Config manager`
+2. `Manage provider rate-limit buckets`
+3. `Create bucket(s)`
+4. Set name, model scope, request cap, time window
+5. Save
+
+### 4) Group Models With an Alias (Recommended)
+Flow:
+1. `Config manager`
+2. `Add/Edit model alias`
+3. Set alias ID (example: `chat.default`)
+4. Select target models
+5. Save
+
+### 5) Configure Model Load Balancer
+Flow:
+1. `Config manager`
+2. `Add/Edit model alias`
+3. Open the alias you want to balance
+4. Choose strategy (`auto` recommended)
+5. Review alias targets
+6. Save
+
+### 6) Set Gateway Key
+Flow:
+1. `Config manager`
+2. `Set worker master key`
+3. Set or generate key
+4. Save
+
+## Start Local Server
 
 ```bash
-llm-router start
+llm-router start
 ```
 
-
+Local endpoints:
+- Unified: `http://127.0.0.1:8787/route`
+- Anthropic-style: `http://127.0.0.1:8787/anthropic`
+- OpenAI-style: `http://127.0.0.1:8787/openai`
 
-
+## Connect Your Coding Tool
 
-
+After setting the master key, point your app/agent to the local endpoint and use that key as the auth token.
 
-
-llm-router deploy
-```
+Claude Code example (`~/.claude/settings.local.json`):
 
-
+```json
+{
+  "env": {
+    "ANTHROPIC_BASE_URL": "http://127.0.0.1:8787",
+    "ANTHROPIC_AUTH_TOKEN": "gw_your_gateway_key",
+    "ANTHROPIC_DEFAULT_OPUS_MODEL": "provider_name/model_name_1",
+    "ANTHROPIC_DEFAULT_SONNET_MODEL": "provider_name/model_name_2",
+    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "provider_name/model_name_3"
+  }
+}
+```
 
-
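Besides env-based clients like Claude Code, any HTTP client can talk to the gateway directly. A sketch of the request shape (the exact path under `/openai` is an assumption here — check `llm-router ai-help` for the authoritative endpoints):

```javascript
// Build an OpenAI-style chat request against the local gateway.
// `model` can be a direct `provider/model` ref or an alias id like `opus`.
function buildGatewayRequest({ model, prompt, key }) {
  return {
    url: 'http://127.0.0.1:8787/openai/v1/chat/completions', // assumed path
    options: {
      method: 'POST',
      headers: {
        'content-type': 'application/json',
        authorization: `Bearer ${key}`, // your gateway master key
      },
      body: JSON.stringify({
        model,
        messages: [{ role: 'user', content: prompt }],
      }),
    },
  };
}
```

Usage: `const { url, options } = buildGatewayRequest({ model: 'opus', prompt: 'hi', key: 'gw_your_gateway_key' });` then `fetch(url, options)`.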
+## Real-Time Update Experience
 
-
-- `
--
+When the local server is running:
+- open `llm-router`
+- change provider/model/load-balancer/rate-limit/alias settings in the TUI
+- save
+- the running proxy updates instantly
 
-
+No stop/start cycle needed.
 
-
-- Create a DNS record in Cloudflare for `llm` (usually `CNAME llm -> @`)
-- Set **Proxy status = Proxied** (orange cloud)
-- Use route target `--route-pattern=llm.example.com/* --zone-name=example.com`
-- Claude Code base URL should be `https://llm.example.com/anthropic` (**no `:8787`**; that port is local-only)
+## Cloudflare Worker (Hosted)
 
-
-llm-router deploy --export-only=true --out=.llm-router.worker.json
-wrangler secret put LLM_ROUTER_CONFIG_JSON < .llm-router.worker.json
-wrangler deploy
-```
+Use this when you want a hosted endpoint instead of the local server.
 
-
+Guided deploy:
 
 ```bash
-llm-router
-# or generate and rotate immediately
-llm-router worker-key --env=production --generate-master-key=true
+llm-router deploy
 ```
 
-
-
-Cloudflare hardening and incident-response checklist: see [`SECURITY.md`](./SECURITY.md).
-
-## Runtime Secrets / Env
-
-Primary:
-- `LLM_ROUTER_CONFIG_JSON`
-- `LLM_ROUTER_MASTER_KEY` (optional override)
-
-Also supported:
-- `ROUTE_CONFIG_JSON`
-- `LLM_ROUTER_JSON`
+You will be guided in the TUI to select an account and deploy target.
 
-
-- `
--
--
-- `LLM_ROUTER_ORIGIN_FALLBACK_COOLDOWN_MS` (default `45000`)
-- `LLM_ROUTER_ORIGIN_RATE_LIMIT_COOLDOWN_MS` (default `30000`)
-- `LLM_ROUTER_ORIGIN_BILLING_COOLDOWN_MS` (default `900000`)
-- `LLM_ROUTER_ORIGIN_AUTH_COOLDOWN_MS` (default `600000`)
-- `LLM_ROUTER_ORIGIN_POLICY_COOLDOWN_MS` (default `120000`)
-- `LLM_ROUTER_ALLOW_POLICY_FALLBACK` (default `false`)
-- `LLM_ROUTER_FALLBACK_CIRCUIT_FAILURES` (default `2`)
-- `LLM_ROUTER_FALLBACK_CIRCUIT_COOLDOWN_MS` (default `30000`)
-- `LLM_ROUTER_MAX_REQUEST_BODY_BYTES` (default `1048576`, min `4096`, max `20971520`)
-- `LLM_ROUTER_UPSTREAM_TIMEOUT_MS` (default `60000`, min `1000`, max `300000`)
+Worker safety defaults:
+- `LLM_ROUTER_STATE_BACKEND=file` is ignored on Worker (auto-fallback to in-memory state).
+- Stateful, timing-dependent routing features (cursor balancing, local quota counters, cooldown persistence) are auto-disabled by default to keep the route flow safe across Worker isolates.
+- To opt in to best-effort stateful behavior on Worker, set `LLM_ROUTER_WORKER_ALLOW_BEST_EFFORT_STATEFUL_ROUTING=true`.
 
-
-- By default, cross-origin browser reads are denied unless explicitly allow-listed.
-- `LLM_ROUTER_CORS_ALLOWED_ORIGINS` (comma-separated exact origins, e.g. `https://app.example.com`)
-- `LLM_ROUTER_CORS_ALLOW_ALL=true` (allows any origin; not recommended for production)
+## Config File Location
 
-
-- `LLM_ROUTER_ALLOWED_IPS` (comma-separated client IPs; denies requests from all other IPs)
-- `LLM_ROUTER_IP_ALLOWLIST` (alias of `LLM_ROUTER_ALLOWED_IPS`)
-
-## Default Config Path
+Local config file:
 
 `~/.llm-router.json`
 
-
-
-```json
-{
-  "version": 2,
-  "masterKey": "local_or_worker_key",
-  "defaultModel": "chat.default",
-  "modelAliases": {
-    "chat.default": {
-      "strategy": "auto",
-      "targets": [
-        { "ref": "openrouter/gpt-4o" },
-        { "ref": "anthropic/claude-3-5-haiku" }
-      ],
-      "fallbackTargets": [
-        { "ref": "openrouter/gpt-4o-mini" }
-      ]
-    }
-  },
-  "providers": [
-    {
-      "id": "openrouter",
-      "name": "OpenRouter",
-      "baseUrl": "https://openrouter.ai/api/v1",
-      "apiKey": "sk-or-v1-...",
-      "formats": ["openai"],
-      "models": [{ "id": "gpt-4o" }],
-      "rateLimits": [
-        {
-          "id": "openrouter-all-month",
-          "name": "Monthly cap",
-          "models": ["all"],
-          "requests": 20000,
-          "window": { "unit": "month", "size": 1 }
-        }
-      ]
-    }
-  ]
-}
-```
+## Security
 
-
-- Direct route: request `model=provider/model` and optional model-level `fallbackModels` applies.
-- Model alias route: request `model=alias.id` (or set as `defaultModel`) and the model alias `targets` + `strategy` drive balancing. `auto` is the recommended default for new model aliases.
+See [`SECURITY.md`](https://github.com/khanglvm/llm-router/blob/master/SECURITY.md).
 
-
-- Local Node (`llm-router start`): routing state defaults to file-backed local persistence, so cooldowns/caps survive restarts.
-- Cloudflare Worker: default state is in-memory per isolate for now; long-window counters are best-effort until a durable Worker backend is configured.
-
-## Smoke Test
-
-```bash
-npm run test:provider-smoke
-```
+## Versioning
 
-
+- Semver: [Semantic Versioning](https://semver.org/)
+- Release notes: [`CHANGELOG.md`](https://github.com/khanglvm/llm-router/blob/master/CHANGELOG.md)
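The README's rate-limit rule — multiple buckets can cover the same model scope, and a candidate counts as exhausted when any matching bucket is used up — can be illustrated with a small sketch (hypothetical data shapes, not the package's internals):

```javascript
// A bucket matches a model when its scope is "all" or names the model.
// A candidate is exhausted if ANY matching bucket has hit its request cap.
function isExhausted(buckets, model) {
  return buckets
    .filter((b) => b.models.includes('all') || b.models.includes(model))
    .some((b) => b.used >= b.requests);
}

const buckets = [
  { models: ['all'], requests: 20000, used: 120 }, // monthly cap, plenty left
  { models: ['glm-5'], requests: 600, used: 600 }, // 5-hour cap, fully used
];
```

Here `isExhausted(buckets, 'glm-5')` is true because its dedicated bucket is full, while `isExhausted(buckets, 'gpt-4o')` is false since only the roomy `all` bucket matches.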
package/package.json
CHANGED

@@ -1,7 +1,19 @@
 {
   "name": "@khanglvm/llm-router",
-  "version": "1.0.6",
+  "version": "1.0.9",
   "description": "Single gateway endpoint for multi-provider LLMs with unified OpenAI+Anthropic format and seamless fallback",
+  "keywords": [
+    "llm-router",
+    "llm-gateway",
+    "ai-proxy",
+    "openai-compatible",
+    "anthropic-compatible",
+    "model-routing",
+    "fallback",
+    "load-balancing",
+    "cloudflare-workers",
+    "agent-infra"
+  ],
   "type": "module",
   "main": "src/index.js",
   "bin": {