@khanglvm/llm-router 1.0.5 → 1.0.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md ADDED
@@ -0,0 +1,46 @@
+ # Changelog
+
+ All notable changes to this project will be documented in this file.
+
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
+ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+ ## [1.0.6] - 2026-02-28
+
+ ### Added
+ - Added a formal changelog for tracked, versioned releases.
+ - Added npm package publish metadata to keep public publish defaults explicit.
+
+ ### Changed
+ - Declared an explicit package `files` whitelist so npm publishes are predictable.
+ - Updated release workflow docs in `README.md` to require changelog + version updates before publish.
+
+ ## [1.0.5] - 2026-02-27
+
+ ### Fixed
+ - Hardened the release surface and added `.npmignore` coverage for safer package publishes.
+
+ ## [1.0.4] - 2026-02-26
+
+ ### Changed
+ - Refined README guidance for routing and deployment usage.
+
+ ## [1.0.3] - 2026-02-26
+
+ ### Changed
+ - Simplified project positioning and gateway copy in docs.
+
+ ## [1.0.2] - 2026-02-26
+
+ ### Changed
+ - Documented smart fallback behavior and operational expectations.
+
+ ## [1.0.1] - 2026-02-25
+
+ ### Changed
+ - Improved fallback strategy behavior.
+
+ ## [1.0.0] - 2026-02-25
+
+ ### Added
+ - Initial `llm-router` release with local + Cloudflare Worker gateway flows.
package/README.md CHANGED
@@ -14,6 +14,18 @@ It supports:
  npm i -g @khanglvm/llm-router
  ```

+ ## Versioning
+
+ - Follows [Semantic Versioning](https://semver.org/).
+ - Release notes live in [`CHANGELOG.md`](./CHANGELOG.md).
+ - npm publishes are configured for public registry access.
+
+ Release checklist (a concrete sketch follows this list):
+ - Update `README.md` if user-facing behavior changed.
+ - Add a dated entry in `CHANGELOG.md`.
+ - Bump the package version before publish.
+ - Publish with `npm publish`.
+
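A minimal sketch of that checklist in shell form; `npm version` and `npm publish` are standard npm commands, while the `$EDITOR` steps are just illustrative:

```bash
# Illustrative release pass following the checklist above
$EDITOR README.md      # only if user-facing behavior changed
$EDITOR CHANGELOG.md   # add a dated entry for the new version
npm version patch      # bumps package.json and (in a git repo) commits + tags
npm publish            # publishConfig.access=public applies
```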
  ## Quick Start

  ```bash
@@ -45,17 +57,34 @@ llm-router config \
  --format=openai \
  --skip-probe=true

- # 2) (Optional) Configure model fallback order
+ # 2) (Optional) Configure model fallback order for direct provider/model requests
  llm-router config \
  --operation=set-model-fallbacks \
  --provider-id=openrouter \
  --model=claude-3-7-sonnet \
  --fallback-models=openrouter/gpt-4o

- # 3) Set master key (this is your gateway key for client apps)
+ # 3) (Optional) Create a model alias with a routing strategy and weighted targets
+ llm-router config \
+ --operation=upsert-model-alias \
+ --alias-id=chat.default \
+ --strategy=auto \
+ --targets=openrouter/claude-3-7-sonnet@2,openrouter/gpt-4o@1 \
+ --fallback-targets=openrouter/gpt-4o-mini
+
+ # 4) (Optional) Add a provider request-cap bucket (models: all)
+ llm-router config \
+ --operation=set-provider-rate-limits \
+ --provider-id=openrouter \
+ --bucket-name="Monthly cap" \
+ --bucket-models=all \
+ --bucket-requests=20000 \
+ --bucket-window=month:1
+
+ # 5) Set master key (this is your gateway key for client apps)
  llm-router config --operation=set-master-key --master-key=gw_your_gateway_key

- # 4) Start gateway with auth required
+ # 6) Start gateway with auth required
  llm-router start --require-auth=true
  ```

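Once the gateway is running, a client call looks roughly like the sketch below. The host, port, and `/route` path are assumptions (the path is inferred from the WAF expressions in `SECURITY.md`), so adjust them to your deployment:

```bash
# Hypothetical client request against the local gateway
curl -sS http://localhost:8787/route \
  -H "Authorization: Bearer gw_your_gateway_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "chat.default",
    "messages": [{"role": "user", "content": "ping"}]
  }'
```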
@@ -80,6 +109,109 @@ Claude Code example (`~/.claude/settings.local.json`):
  - Policy/moderation blocks: no retry; cross-provider fallback is disabled by default (`LLM_ROUTER_ALLOW_POLICY_FALLBACK=false`).
  - Invalid client requests (`400`, `413`, `422`): no retry and no fallback; the request short-circuits.

+ ## Model Alias Routing Strategies
+
+ A model alias groups multiple models from different providers under one model name.
+
+ Use `--strategy` when creating or updating a model alias:
+
+ - `auto`: Recommended set-and-forget mode. Automatically routes using quota, cooldown, and health signals to reduce rate-limit failures.
+ - `ordered`: Tries targets in list order.
+ - `round-robin`: Rotates evenly across eligible targets.
+ - `weighted-rr`: Rotates like round-robin, but favors higher weights.
+ - `quota-aware-weighted-rr`: Weighted routing plus remaining-capacity awareness.
+
+ Example:
+
+ ```bash
+ llm-router config \
+ --operation=upsert-model-alias \
+ --alias-id=coding \
+ --strategy=auto \
+ --targets=rc/gpt-5.3-codex,zai/glm-5
+ ```
+
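For contrast, a hedged variant of the same alias that pins an explicit 2:1 split with `weighted-rr` instead of `auto`; the weights here are illustrative, and the `@<weight>` syntax is the one documented under Main Commands below:

```bash
llm-router config \
--operation=upsert-model-alias \
--alias-id=coding \
--strategy=weighted-rr \
--targets=rc/gpt-5.3-codex@2,zai/glm-5@1
```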
+ Concrete model alias example with provider-specific caps:
+
+ ```bash
+ llm-router config \
+ --operation=upsert-model-alias \
+ --alias-id=coding \
+ --strategy=auto \
+ --targets=rc/gpt-5.3-codex,zai/glm-5
+
+ llm-router config \
+ --operation=set-provider-rate-limits \
+ --provider-id=rc \
+ --bucket-name="Minute cap" \
+ --bucket-models=gpt-5.3-codex \
+ --bucket-requests=60 \
+ --bucket-window=minute:1
+
+ llm-router config \
+ --operation=set-provider-rate-limits \
+ --provider-id=zai \
+ --bucket-name="5-hour cap" \
+ --bucket-models=glm-5 \
+ --bucket-requests=600 \
+ --bucket-window=hour:5
+ ```
+
+ ## What Is A Bucket?
+
+ A rate-limit bucket is a request cap for a time window.
+
+ Examples:
+ - `40 req / 1 minute`
+ - `600 req / 6 hours`
+
+ Multiple buckets can apply to the same model scope at the same time. A candidate is treated as exhausted if any matching bucket is exhausted.
+
+ ## TUI Bucket Walkthrough
+
+ Use the config manager and select:
+ - Manage provider rate-limit buckets
+ - Create bucket(s)
+
+ The TUI guides you through:
+ - Bucket name (friendly label)
+ - Model scope (`all` or selected models with multiselect checkboxes)
+ - Request cap
+ - Window unit (`minute`, `hour(s)`, `week`, `month`)
+ - Window size (hours accept any `N`; the other preset units are fixed at `1`)
+ - Review, plus an optional add-another loop for combined policies
+
+ Internal bucket ids are generated automatically from the name when omitted, and are shown as an advanced detail on the review screen.
+
+ ## Combined-Cap Recipe (`40/min` + `600/6h`)
+
+ ```bash
+ llm-router config \
+ --operation=set-provider-rate-limits \
+ --provider-id=openrouter \
+ --bucket-name="Minute cap" \
+ --bucket-models=all \
+ --bucket-requests=40 \
+ --bucket-window=minute:1
+
+ llm-router config \
+ --operation=set-provider-rate-limits \
+ --provider-id=openrouter \
+ --bucket-name="6-hour cap" \
+ --bucket-models=all \
+ --bucket-requests=600 \
+ --bucket-window=hour:6
+ ```
+
+ This keeps both limits active together for the same model scope.
+
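To confirm that both buckets registered against the same scope, the routing summary operation (documented under Main Commands below) prints the configured aliases and rate limits:

```bash
llm-router config --operation=list-routing
```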
+ ## Rate-Limit Troubleshooting
+
+ - Check routing decisions with `LLM_ROUTER_DEBUG_ROUTING=true` and inspect `x-llm-router-skipped-candidates` (see the sketch after this list).
+ - `quota-exhausted` means the candidate was skipped proactively during routing, before any upstream call was made.
+ - For provider `429`, the cooldown is tracked from `Retry-After` when present, or from `LLM_ROUTER_ORIGIN_RATE_LIMIT_COOLDOWN_MS`.
+ - Local mode persists state by default (file backend), while the Worker defaults to in-memory state.
+
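A hedged debugging pass tying these knobs together; the env var and the response header come from the bullets above, while the gateway URL and `/route` path are assumptions:

```bash
# Start locally with routing decisions logged
LLM_ROUTER_DEBUG_ROUTING=true llm-router start --require-auth=true

# From another shell, dump response headers and look for skip reasons
curl -sS -D - -o /dev/null http://localhost:8787/route \
  -H "Authorization: Bearer gw_your_gateway_key" \
  -H "Content-Type: application/json" \
  -d '{"model": "chat.default", "messages": [{"role": "user", "content": "ping"}]}' \
  | grep -i "x-llm-router-skipped-candidates"
```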
  ## Main Commands

  ```bash
@@ -104,8 +236,56 @@ llm-router config \
  --models=gpt-4o,claude-3-7-sonnet \
  --format=openai \
  --skip-probe=true
+
+ llm-router config \
+ --operation=upsert-model-alias \
+ --alias-id=chat.default \
+ --strategy=auto \
+ --targets=openrouter/gpt-4o-mini@3,anthropic/claude-3-5-haiku@2 \
+ --fallback-targets=openrouter/gpt-4o
+
+ llm-router config \
+ --operation=set-provider-rate-limits \
+ --provider-id=openrouter \
+ --bucket-name="Monthly cap" \
+ --bucket-models=all \
+ --bucket-requests=20000 \
+ --bucket-window=month:1
+ ```
+
+ Alias target syntax:
+ - `--targets` / `--fallback-targets`: `<routeRef>@<weight>` or `<routeRef>:<weight>`
+ - route refs: direct `provider/model` or alias id
+
+ Routing strategy values:
+ - `auto` (recommended)
+ - `ordered`
+ - `round-robin`
+ - `weighted-rr`
+ - `quota-aware-weighted-rr`
+
+ Rate-limit bucket window syntax:
+ - `--bucket-window=month:1`
+ - `--bucket-window=1w`
+ - `--bucket-window=7day`
+
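A hedged example recombining the syntax variants above; the alias id `drafts`, the weights, and the weekly cap value are illustrative, but every flag and separator comes from the lists just shown:

```bash
# ':' as the weight separator, and the shorthand '1w' window form
llm-router config \
--operation=upsert-model-alias \
--alias-id=drafts \
--strategy=quota-aware-weighted-rr \
--targets=openrouter/gpt-4o-mini:3,anthropic/claude-3-5-haiku:1

llm-router config \
--operation=set-provider-rate-limits \
--provider-id=openrouter \
--bucket-name="Weekly cap" \
--bucket-models=all \
--bucket-requests=5000 \
--bucket-window=1w
```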
+ Routing summary:
+
+ ```bash
+ llm-router config --operation=list-routing
  ```

+ Explicit schema migration with backup:
+
+ ```bash
+ llm-router config --operation=migrate-config --target-version=2 --create-backup=true
+ ```
+
+ Automatic version handling:
+ - Local config loads with a silent forward-migration to the latest supported schema.
+ - Migration is persisted automatically on read when possible (best-effort, no interactive prompt).
+ - Future/newer version numbers do not fail solely because of a version mismatch; known fields are normalized best-effort.
+
  Set local auth key:

  ```bash
@@ -206,8 +386,21 @@ Minimal shape:

  ```json
  {
+   "version": 2,
    "masterKey": "local_or_worker_key",
-   "defaultModel": "openrouter/gpt-4o",
+   "defaultModel": "chat.default",
+   "modelAliases": {
+     "chat.default": {
+       "strategy": "auto",
+       "targets": [
+         { "ref": "openrouter/gpt-4o" },
+         { "ref": "anthropic/claude-3-5-haiku" }
+       ],
+       "fallbackTargets": [
+         { "ref": "openrouter/gpt-4o-mini" }
+       ]
+     }
+   },
    "providers": [
      {
        "id": "openrouter",
@@ -215,12 +408,29 @@ Minimal shape:
        "baseUrl": "https://openrouter.ai/api/v1",
        "apiKey": "sk-or-v1-...",
        "formats": ["openai"],
-       "models": [{ "id": "gpt-4o" }]
+       "models": [{ "id": "gpt-4o" }],
+       "rateLimits": [
+         {
+           "id": "openrouter-all-month",
+           "name": "Monthly cap",
+           "models": ["all"],
+           "requests": 20000,
+           "window": { "unit": "month", "size": 1 }
+         }
+       ]
      }
    ]
  }
  ```

+ Direct vs model alias routing (contrasted in the sketch below):
+ - Direct route: the request sets `model=provider/model`, and any model-level `fallbackModels` apply.
+ - Model alias route: the request sets `model=alias.id` (or the alias is set as `defaultModel`), and the alias `targets` + `strategy` drive balancing. `auto` is the recommended default for new model aliases.
+
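A sketch of the two request shapes side by side; the endpoint URL and path are assumptions, while the `model` values come from the config above:

```bash
# Direct route: names a concrete provider/model; its fallbackModels (if set) apply
curl -sS http://localhost:8787/route \
  -H "Authorization: Bearer local_or_worker_key" \
  -H "Content-Type: application/json" \
  -d '{"model": "openrouter/gpt-4o", "messages": [{"role": "user", "content": "hi"}]}'

# Model alias route: targets + strategy on chat.default drive the balancing
curl -sS http://localhost:8787/route \
  -H "Authorization: Bearer local_or_worker_key" \
  -H "Content-Type: application/json" \
  -d '{"model": "chat.default", "messages": [{"role": "user", "content": "hi"}]}'
```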
+ State durability caveats:
+ - Local Node (`llm-router start`): routing state defaults to file-backed local persistence, so cooldowns/caps survive restarts.
+ - Cloudflare Worker: default state is in-memory per isolate for now; long-window counters are best-effort until a durable Worker backend is configured.
+
  ## Smoke Test

  ```bash
package/SECURITY.md ADDED
@@ -0,0 +1,142 @@
+ # Security Guide
+
+ This guide focuses on preventing unauthorized access to costly LLM resources, especially in Cloudflare Worker deployments.
+
+ ## Quick Hardened Setup
+
+ 1. Generate and set a strong gateway key locally:
+
+ ```bash
+ llm-router config --operation=set-master-key --generate-master-key=true
+ ```
+
+ 2. Deploy with the worker defaults already set in this repo:
+    - `workers_dev = false`
+    - `preview_urls = false`
+
+ 3. Deploy config + secrets:
+
+ ```bash
+ llm-router deploy --env=production
+ ```
+
+ 4. Restrict who can call the router:
+    - Set `LLM_ROUTER_ALLOWED_IPS` (or `LLM_ROUTER_IP_ALLOWLIST`) to trusted source IPs.
+    - Set `LLM_ROUTER_CORS_ALLOWED_ORIGINS` to explicit browser origins.
+    - Keep `LLM_ROUTER_CORS_ALLOW_ALL` disabled in production.
+
+ 5. Expose only a custom domain route (not `workers.dev`):
+
+ ```toml
+ [env.production]
+ routes = [{ pattern = "api.example.com/*", zone_name = "example.com" }]
+ ```
+
+ ## Quick Master Key Generation
+
+ Use generated keys instead of hand-written keys:
+
+ ```bash
+ # Local config master key
+ llm-router config --operation=set-master-key --generate-master-key=true
+
+ # Rotate the Cloudflare worker key directly
+ llm-router worker-key --env=production --generate-master-key=true
+ ```
+
+ Optional tuning:
+
+ ```bash
+ llm-router worker-key \
+ --env=production \
+ --generate-master-key=true \
+ --master-key-length=64 \
+ --master-key-prefix=gw_
+ ```
+
+ ## Cloudflare Access (Recommended)
+
+ Protect the worker behind Cloudflare Access so clients must present a service token before hitting the router.
+
+ Suggested setup:
+ 1. Zero Trust -> Access -> Applications -> Add application.
+ 2. Type: Self-hosted.
+ 3. Domain: your API hostname (for example `api.example.com`).
+ 4. Policy: allow only a Service Token for machine-to-machine traffic.
+
+ Client calls should include (see the sketch below):
+ - `CF-Access-Client-Id`
+ - `CF-Access-Client-Secret`
+
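A hedged example of such a machine-to-machine call; the two `CF-Access-*` headers are Cloudflare's documented service-token headers, while the hostname, path, and key values are placeholders:

```bash
curl -sS https://api.example.com/route \
  -H "CF-Access-Client-Id: <service-token-client-id>" \
  -H "CF-Access-Client-Secret: <service-token-client-secret>" \
  -H "Authorization: Bearer gw_your_gateway_key" \
  -H "Content-Type: application/json" \
  -d '{"model": "chat.default", "messages": [{"role": "user", "content": "ping"}]}'
```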
+ Reference:
+ - [Cloudflare Access service tokens](https://developers.cloudflare.com/cloudflare-one/identity/service-tokens/)
+
+ ## WAF and Rate Limiting
+
+ Use WAF custom rules and rate limiting to reduce the abuse blast radius.
+
+ Suggested custom rule expressions (adapt host/path to your deployment):
+
+ 1. Block non-allowlisted source IPs to the route endpoint:
+
+ ```txt
+ http.host eq "api.example.com" and starts_with(http.request.uri.path, "/route") and not ip.src in $llm_router_allowed_ips
+ ```
+
+ 2. Block unexpected methods on the route endpoint:
+
+ ```txt
+ http.host eq "api.example.com" and starts_with(http.request.uri.path, "/route") and not http.request.method in {"POST" "OPTIONS"}
+ ```
+
+ Suggested rate limit rule:
+ - Match expression:
+
+ ```txt
+ http.host eq "api.example.com" and starts_with(http.request.uri.path, "/route")
+ ```
+
+ - Threshold example:
+   - 60 requests / 1 minute per source IP (tighten or loosen by workload).
+ - Action: Block or Managed Challenge.
+
+ References:
+ - [Cloudflare WAF custom rules](https://developers.cloudflare.com/waf/custom-rules/)
+ - [Cloudflare WAF rate limiting rules](https://developers.cloudflare.com/waf/rate-limiting-rules/)
+
+ ## Incident Response: Master Key Leak
+
+ 1. Rotate the worker key immediately:
+
+ ```bash
+ llm-router worker-key --env=production --generate-master-key=true
+ ```
+
+ 2. Rotate the local config key (if reused anywhere):
+
+ ```bash
+ llm-router config --operation=set-master-key --generate-master-key=true
+ ```
+
+ 3. Revoke exposed credentials and rotate provider API keys.
+ 4. Review Cloudflare logs/WAF events for the abuse window.
+ 5. Tighten the Access policy, IP allowlist, and rate limits before reopening traffic.
+
+ ## Router Runtime Hardening Knobs
+
+ - `LLM_ROUTER_MAX_REQUEST_BODY_BYTES`
+ - `LLM_ROUTER_UPSTREAM_TIMEOUT_MS`
+ - `LLM_ROUTER_ALLOWED_IPS` / `LLM_ROUTER_IP_ALLOWLIST`
+ - `LLM_ROUTER_CORS_ALLOWED_ORIGINS`
+ - `LLM_ROUTER_CORS_ALLOW_ALL` (keep `false` in production)
+
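A sketch of wiring these knobs into a local start; the variable names come from the list above, but every value (and the comma-separated value formats) is an illustrative assumption, not a documented default:

```bash
# Illustrative hardening values only; tune per workload
export LLM_ROUTER_MAX_REQUEST_BODY_BYTES=1048576   # ~1 MiB body cap
export LLM_ROUTER_UPSTREAM_TIMEOUT_MS=60000        # 60s upstream timeout
export LLM_ROUTER_ALLOWED_IPS="203.0.113.10,203.0.113.11"
export LLM_ROUTER_CORS_ALLOWED_ORIGINS="https://app.example.com"
export LLM_ROUTER_CORS_ALLOW_ALL=false
llm-router start --require-auth=true
```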
+ ## Official References
+
+ - [Workers Secrets](https://developers.cloudflare.com/workers/configuration/secrets/)
+ - [Wrangler configuration](https://developers.cloudflare.com/workers/wrangler/configuration/)
+ - [workers.dev routing controls](https://developers.cloudflare.com/workers/configuration/routing/workers-dev/)
+ - [Preview URLs](https://developers.cloudflare.com/changelog/2024-03-14-preview-urls/)
+ - [Cloudflare Access service tokens](https://developers.cloudflare.com/cloudflare-one/identity/service-tokens/)
+ - [WAF custom rules](https://developers.cloudflare.com/waf/custom-rules/)
+ - [WAF rate limiting](https://developers.cloudflare.com/waf/rate-limiting-rules/)
+ - [API Shield sequence mitigation](https://developers.cloudflare.com/api-shield/security/sequence-mitigation/)
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@khanglvm/llm-router",
-   "version": "1.0.5",
+   "version": "1.0.6",
    "description": "Single gateway endpoint for multi-provider LLMs with unified OpenAI+Anthropic format and seamless fallback",
    "type": "module",
    "main": "src/index.js",
@@ -18,9 +18,21 @@
      "test:provider-smoke": "node ./scripts/provider-smoke-suite.mjs"
    },
    "dependencies": {
-     "@levu/snap": "^0.3.8"
+     "@levu/snap": "^0.3.11"
    },
    "devDependencies": {
      "wrangler": "^4.68.1"
-   }
+   },
+   "publishConfig": {
+     "access": "public"
+   },
+   "files": [
+     "src/**/*.js",
+     "!src/**/*.test.js",
+     "!src/**/*.spec.js",
+     "README.md",
+     "SECURITY.md",
+     "CHANGELOG.md",
+     "wrangler.toml"
+   ]
  }
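One way to sanity-check the new `files` whitelist before publishing is npm's standard dry-run pack, which lists exactly what a publish would ship:

```bash
npm pack --dry-run
```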