@khanglvm/llm-router 1.0.6 → 1.0.8

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md CHANGED
@@ -5,6 +5,20 @@ All notable changes to this project will be documented in this file.
5
5
  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
6
6
  and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7
7
 
8
+ ## [1.0.8] - 2026-02-28
9
+
10
+ ### Changed
11
+ - Added focused npm `keywords` metadata in `package.json` to improve package discoverability.
12
+
13
+ ## [1.0.7] - 2026-02-28
14
+
15
+ ### Added
16
+ - Added `llm-router ai-help` to generate an agent-oriented operating guide with live gateway checks and coding-tool patch instructions.
17
+ - Added tests covering `ai-help` discovery output and first-run setup guidance.
18
+
19
+ ### Changed
20
+ - Rewrote `README.md` into a shorter setup and operations guide focused on providers, aliases, rate limits, and local/hosted usage.
21
+
8
22
  ## [1.0.6] - 2026-02-28
9
23
 
10
24
  ### Added
package/README.md CHANGED
@@ -1,440 +1,188 @@
1
1
  # llm-router
2
2
 
3
- `llm-router` is a gateway api proxy for accessing multiple models across any provider that supports OpenAI or Anthropic formats.
3
+ `llm-router` exposes a unified API endpoint for multiple AI providers and models.
4
4
 
5
- It supports:
6
- - local route server `llm-router start`
7
- - Cloudflare Worker route runtime deployment `llm-router deploy`
8
- - CLI + TUI management `config`, `start`, `deploy`, `worker-key`
9
- - Seamless model fallback
5
+ ## Main Features
10
6
 
11
- ## Install
12
-
13
- ```bash
14
- npm i -g @khanglvm/llm-router
15
- ```
16
-
17
- ## Versioning
7
+ 1. Single endpoint, unified providers & models
8
+ 2. Group models with rate limits and load-balancing strategies
9
+ 3. Configuration auto-reloads in real time with no interruption
18
10
 
19
- - Follows [Semantic Versioning](https://semver.org/).
20
- - Release notes live in [`CHANGELOG.md`](./CHANGELOG.md).
21
- - npm publishes are configured for the public registry package.
22
-
23
- Release checklist:
24
- - Update `README.md` if user-facing behavior changed.
25
- - Add a dated entry in `CHANGELOG.md`.
26
- - Bump the package version before publish.
27
- - Publish with `npm publish`.
28
-
29
- ## Quick Start
11
+ ## Install
30
12
 
31
13
  ```bash
32
- # 1) Open config TUI (default behavior) to manage providers, models, fallbacks, and auth
33
- llm-router
34
-
35
- # 2) Start local route server
36
- llm-router start
14
+ npm i -g @khanglvm/llm-router@latest
37
15
  ```
38
16
 
39
- Local endpoints:
40
- - Unified (Auto transform): `http://127.0.0.1:8787/route` (or `/` and `/v1`)
41
- - Anthropic: `http://127.0.0.1:8787/anthropic`
42
- - OpenAI: `http://127.0.0.1:8787/openai`
43
-
44
- ## Usage Example
45
-
46
- ```bash
47
- # Your AI Agent can help! Ask them to manage api router via this tool for you.
48
-
49
- # 1) Add provider + models + provider API key. You can ask your AI agent to do it for you, or manually via TUI or command line:
50
- llm-router config \
51
- --operation=upsert-provider \
52
- --provider-id=openrouter \
53
- --name="OpenRouter" \
54
- --base-url=https://openrouter.ai/api/v1 \
55
- --api-key=sk-or-v1-... \
56
- --models=claude-3-7-sonnet,gpt-4o \
57
- --format=openai \
58
- --skip-probe=true
59
-
60
- # 2) (Optional) Configure model fallback order for direct provider/model requests
61
- llm-router config \
62
- --operation=set-model-fallbacks \
63
- --provider-id=openrouter \
64
- --model=claude-3-7-sonnet \
65
- --fallback-models=openrouter/gpt-4o
66
-
67
- # 3) (Optional) Create a model alias with a routing strategy and weighted targets
68
- llm-router config \
69
- --operation=upsert-model-alias \
70
- --alias-id=chat.default \
71
- --strategy=auto \
72
- --targets=openrouter/claude-3-7-sonnet@2,openrouter/gpt-4o@1 \
73
- --fallback-targets=openrouter/gpt-4o-mini
74
-
75
- # 4) (Optional) Add provider request-cap bucket (models: all)
76
- llm-router config \
77
- --operation=set-provider-rate-limits \
78
- --provider-id=openrouter \
79
- --bucket-name="Monthly cap" \
80
- --bucket-models=all \
81
- --bucket-requests=20000 \
82
- --bucket-window=month:1
83
-
84
- # 5) Set master key (this is your gateway key for client apps)
85
- llm-router config --operation=set-master-key --master-key=gw_your_gateway_key
86
-
87
- # 6) Start gateway with auth required
88
- llm-router start --require-auth=true
89
- ```
17
+ ## Usage
90
18
 
91
- Claude Code example (`~/.claude/settings.local.json`):
19
+ Copy/paste this short instruction to your AI agent:
92
20
 
93
- ```json
94
- {
95
- "env": {
96
- "ANTHROPIC_BASE_URL": "http://127.0.0.1:8787/anthropic",
97
- "ANTHROPIC_AUTH_TOKEN": "gw_your_gateway_key"
98
- }
99
- }
21
+ ```text
22
+ Run `llm-router ai-help` first, then set up and operate llm-router for me using CLI commands.
100
23
  ```
101
24
 
102
- ## Smart Fallback Behavior
103
-
104
- `llm-router` can fail over from a primary model to configured fallback models with status-aware logic:
105
- - `429` (rate-limited): immediate fallback (no origin retry), with `Retry-After` respected when present.
106
- - Temporary failures (`408`, `409`, `5xx`, network errors): origin-only bounded retries with jittered backoff, then fallback.
107
- - Billing/quota exhaustion (`402`, or provider-specific billing signals): immediate fallback with longer origin cooldown memory.
108
- - Auth and permission failures (`401` and relevant `403` cases): no retry; fallback to other providers/models when possible.
109
- - Policy/moderation blocks: no retry; cross-provider fallback is disabled by default (`LLM_ROUTER_ALLOW_POLICY_FALLBACK=false`).
110
- - Invalid client requests (`400`, `413`, `422`): no retry and no fallback short-circuit.
25
+ ## Main Workflow
111
26
 
112
- ## Model Alias Routing Strategies
27
+ 1. Add providers + models to llm-router
28
+ 2. Optionally, group models into an alias with load balancing and automatic fallback
29
+ 3. Start the llm-router server, then point your coding tool's API endpoint and model at llm-router
113
30
 
114
- A model alias groups multiple models from different providers under one model name.
31
+ ## What Each Term Means
115
32
 
116
- Use `--strategy` when creating or updating a model alias:
33
+ ### Provider
34
+ The service endpoint you call (OpenRouter, Anthropic, etc.).
117
35
 
118
- - `auto`: Recommended set-and-forget mode. Automatically routes using quota, cooldown, and health signals to reduce rate-limit failures.
119
- - `ordered`: Tries targets in list order.
120
- - `round-robin`: Rotates evenly across eligible targets.
121
- - `weighted-rr`: Rotates like round-robin, but favors higher weights.
122
- - `quota-aware-weighted-rr`: Weighted routing plus remaining-capacity awareness.
123
-
124
- Example:
125
-
126
- ```bash
127
- llm-router config \
128
- --operation=upsert-model-alias \
129
- --alias-id=coding \
130
- --strategy=auto \
131
- --targets=rc/gpt-5.3-codex,zai/glm-5
132
- ```
133
-
134
- Concrete model alias example with provider-specific caps:
135
-
136
- ```bash
137
- llm-router config \
138
- --operation=upsert-model-alias \
139
- --alias-id=coding \
140
- --strategy=auto \
141
- --targets=rc/gpt-5.3-codex,zai/glm-5
142
-
143
- llm-router config \
144
- --operation=set-provider-rate-limits \
145
- --provider-id=rc \
146
- --bucket-name="Minute cap" \
147
- --bucket-models=gpt-5.3-codex \
148
- --bucket-requests=60 \
149
- --bucket-window=minute:1
150
-
151
- llm-router config \
152
- --operation=set-provider-rate-limits \
153
- --provider-id=zai \
154
- --bucket-name="5-hours cap" \
155
- --bucket-models=glm-5 \
156
- --bucket-requests=600 \
157
- --bucket-window=hour:5
158
- ```
159
-
160
- ## What Is A Bucket?
161
-
162
- A rate-limit bucket is a request cap for a time window.
36
+ ### Model
37
+ The actual model ID from that provider.
163
38
 
39
+ ### Rate-Limit Bucket
40
+ A request cap for a time window.
164
41
  Examples:
165
- - `40 req / 1 minute`
166
- - `600 req / 6 hours`
167
-
168
- Multiple buckets can apply to the same model scope at the same time. A candidate is treated as exhausted if any matching bucket is exhausted.
169
-
170
- ## TUI Bucket Walkthrough
171
-
172
- Use the config manager and select:
173
- - `Manage provider rate-limit buckets`
174
- - `Create bucket(s)`
175
-
176
- The TUI now guides you through:
177
- - Bucket name (friendly label)
178
- - Model scope (`all` or selected models with multiselect checkboxes)
179
- - Request cap
180
- - Window unit (`minute`, `hour(s)`, `week`, `month`)
181
- - Window size (hours support `N`, other preset units lock to `1`)
182
- - Review + optional add-another loop for combined policies
183
-
184
- Internal bucket ids are generated automatically from the name when omitted and shown as advanced detail in review.
185
-
186
- ## Combined-Cap Recipe (`40/min` + `600/6h`)
187
-
188
- ```bash
189
- llm-router config \
190
- --operation=set-provider-rate-limits \
191
- --provider-id=openrouter \
192
- --bucket-name="Minute cap" \
193
- --bucket-models=all \
194
- --bucket-requests=40 \
195
- --bucket-window=minute:1
196
-
197
- llm-router config \
198
- --operation=set-provider-rate-limits \
199
- --provider-id=openrouter \
200
- --bucket-name="6-hours cap" \
201
- --bucket-models=all \
202
- --bucket-requests=600 \
203
- --bucket-window=hour:6
204
- ```
205
-
206
- This keeps both limits active together for the same model scope.
207
-
208
- ## Rate-Limit Troubleshooting
209
-
210
- - Check routing decisions with `LLM_ROUTER_DEBUG_ROUTING=true` and inspect `x-llm-router-skipped-candidates`.
211
- - `quota-exhausted` means proactive pre-routing skip happened before an upstream call.
212
- - For provider `429`, cooldown is tracked from `Retry-After` when present, or from `LLM_ROUTER_ORIGIN_RATE_LIMIT_COOLDOWN_MS`.
213
- - Local mode persists state by default (file backend), while Worker defaults to in-memory state.
214
-
215
- ## Main Commands
42
+ - `40 requests / minute`
43
+ - `20,000 requests / month`
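
Buckets can also be created without the TUI. A sketch using the CLI's `set-provider-rate-limits` operation (the `openrouter` provider id is a placeholder):

```shell
# Cap all models of one provider at 20,000 requests per calendar month
llm-router config \
  --operation=set-provider-rate-limits \
  --provider-id=openrouter \
  --bucket-name="Monthly cap" \
  --bucket-models=all \
  --bucket-requests=20000 \
  --bucket-window=month:1
```

Multiple buckets can apply to the same model scope at once, e.g. add a second bucket with `--bucket-window=minute:1` to combine a per-minute and a monthly cap.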
216
44
 
217
- ```bash
218
- llm-router config
219
- llm-router start
220
- llm-router stop
221
- llm-router reload
222
- llm-router update
223
- llm-router deploy
224
- llm-router worker-key
225
- ```
226
-
227
- ## Non-Interactive Config (Agent/CI Friendly)
228
-
229
- ```bash
230
- llm-router config \
231
- --operation=upsert-provider \
232
- --provider-id=openrouter \
233
- --name="OpenRouter" \
234
- --base-url=https://openrouter.ai/api/v1 \
235
- --api-key=sk-or-v1-... \
236
- --models=gpt-4o,claude-3-7-sonnet \
237
- --format=openai \
238
- --skip-probe=true
239
-
240
- llm-router config \
241
- --operation=upsert-model-alias \
242
- --alias-id=chat.default \
243
- --strategy=auto \
244
- --targets=openrouter/gpt-4o-mini@3,anthropic/claude-3-5-haiku@2 \
245
- --fallback-targets=openrouter/gpt-4o
246
-
247
- llm-router config \
248
- --operation=set-provider-rate-limits \
249
- --provider-id=openrouter \
250
- --bucket-name="Monthly cap" \
251
- --bucket-models=all \
252
- --bucket-requests=20000 \
253
- --bucket-window=month:1
254
- ```
255
-
256
- Alias target syntax:
257
- - `--targets` / `--fallback-targets`: `<routeRef>@<weight>` or `<routeRef>:<weight>`
258
- - route refs: direct `provider/model` or alias id
45
+ ### Model Load Balancer
46
+ Decides how traffic is distributed across models in an alias group.
259
47
 
260
- Routing strategy values:
48
+ Available strategies:
261
49
  - `auto` (recommended)
262
50
  - `ordered`
263
51
  - `round-robin`
264
52
  - `weighted-rr`
265
53
  - `quota-aware-weighted-rr`
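
A strategy is chosen per alias. A non-interactive sketch (the alias id, target models, and `@<weight>` suffixes are illustrative; weights matter for the weighted strategies):

```shell
llm-router config \
  --operation=upsert-model-alias \
  --alias-id=chat.default \
  --strategy=auto \
  --targets=openrouter/claude-opus-4.6@2,anthropic/claude-opus-4.6@1
```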
266
54
 
267
- Rate-limit bucket window syntax:
268
- - `--bucket-window=month:1`
269
- - `--bucket-window=1w`
270
- - `--bucket-window=7day`
271
-
272
- Routing summary:
273
-
274
- ```bash
275
- llm-router config --operation=list-routing
276
- ```
55
+ ### Model Alias (Group models)
56
+ A single model name that automatically routes and rotates across multiple models.
277
57
 
278
- Explicit schema migration with backup:
58
+ Example:
59
+ - alias: `opus`
60
+ - targets:
61
+ - `openrouter/claude-opus-4.6`
62
+ - `anthropic/claude-opus-4.6`
279
63
 
280
- ```bash
281
- llm-router config --operation=migrate-config --target-version=2 --create-backup=true
282
- ```
64
+ Your app can use the `opus` model, and `llm-router` chooses target models based on your routing settings.
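
With the local server running and a master key set (both covered below), a client calls the alias by name. A sketch against the local OpenAI-style `responses` endpoint; the key value is a placeholder:

```shell
curl -sS \
  -H 'Authorization: Bearer gw_your_gateway_key' \
  -H 'content-type: application/json' \
  --data '{"model": "opus", "input": "ping"}' \
  'http://127.0.0.1:8787/openai/v1/responses'
```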
283
65
 
284
- Automatic version handling:
285
- - Local config loads with silent forward-migration to latest supported schema.
286
- - Migration is persisted automatically on read when possible (best-effort, no interactive prompt).
287
- - Future/newer version numbers do not fail only because of version mismatch; known fields are normalized best-effort.
66
+ ## Setup using Terminal User Interface (TUI)
288
67
 
289
- Set local auth key:
68
+ Open the TUI:
290
69
 
291
70
  ```bash
292
- llm-router config --operation=set-master-key --master-key=your_local_key
293
- # or generate a strong key automatically
294
- llm-router config --operation=set-master-key --generate-master-key=true
71
+ llm-router
295
72
  ```
296
73
 
297
- Start with auth required:
74
+ Then follow the steps below in order.
75
+
76
+ ### 1) Add Provider
77
+ Flow:
78
+ 1. `Config manager`
79
+ 2. `Add/Edit provider`
80
+ 3. Enter provider name, endpoint, API key
81
+ 4. Enter model list
82
+ 5. Save
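
The same step has a non-interactive form, useful for agents and CI. A sketch (provider id, API key, and model list are placeholders):

```shell
llm-router config \
  --operation=upsert-provider \
  --provider-id=openrouter \
  --name="OpenRouter" \
  --base-url=https://openrouter.ai/api/v1 \
  --api-key=sk-or-v1-... \
  --models=claude-opus-4.6 \
  --format=openai
```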
83
+
84
+ ### 2) Configure Model Fallback (Optional)
85
+ Flow:
86
+ 1. `Config manager`
87
+ 2. `Set model silent-fallbacks`
88
+ 3. Pick main model
89
+ 4. Pick fallback models
90
+ 5. Save
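
A non-interactive sketch of the same fallback setup (provider and model names are placeholders):

```shell
llm-router config \
  --operation=set-model-fallbacks \
  --provider-id=openrouter \
  --model=claude-opus-4.6 \
  --fallback-models=anthropic/claude-opus-4.6
```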
91
+
92
+ ### 3) Configure Rate Limits (Optional)
93
+ Flow:
94
+ 1. `Config manager`
95
+ 2. `Manage provider rate-limit buckets`
96
+ 3. `Create bucket(s)`
97
+ 4. Set name, model scope, request cap, time window
98
+ 5. Save
99
+
100
+ ### 4) Group Models With Alias (Recommended)
101
+ Flow:
102
+ 1. `Config manager`
103
+ 2. `Add/Edit model alias`
104
+ 3. Set alias ID (example: `chat.default`)
105
+ 4. Select target models
106
+ 5. Save
107
+
108
+ ### 5) Configure Model Load Balancer
109
+ Flow:
110
+ 1. `Config manager`
111
+ 2. `Add/Edit model alias`
112
+ 3. Open the alias you want to balance
113
+ 4. Choose strategy (`auto` recommended)
114
+ 5. Review alias targets
115
+ 6. Save
116
+
117
+ ### 6) Set Gateway Key
118
+ Flow:
119
+ 1. `Config manager`
120
+ 2. `Set worker master key`
121
+ 3. Set or generate key
122
+ 4. Save
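
The key can also be set from the command line; `--generate-master-key=true` asks the CLI to generate a strong key for you (the explicit key value is a placeholder):

```shell
llm-router config --operation=set-master-key --master-key=gw_your_gateway_key
# or generate one automatically
llm-router config --operation=set-master-key --generate-master-key=true
```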
123
+
124
+ ## Start Local Server
298
125
 
299
126
  ```bash
300
- llm-router start --require-auth=true
127
+ llm-router start
301
128
  ```
302
129
 
303
- ## Cloudflare Worker Deploy
130
+ Local endpoints:
131
+ - Unified: `http://127.0.0.1:8787/route`
132
+ - Anthropic-style: `http://127.0.0.1:8787/anthropic`
133
+ - OpenAI-style: `http://127.0.0.1:8787/openai`
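
To confirm the gateway is reachable, probe the same endpoints the `ai-help` live checks use (`/health` and the model listings); the auth header is only needed once a master key is set, and the key value is a placeholder:

```shell
curl -sS 'http://127.0.0.1:8787/health'
curl -sS -H 'Authorization: Bearer gw_your_gateway_key' \
  'http://127.0.0.1:8787/openai/v1/models'
```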
304
134
 
305
- Worker project name in `wrangler.toml`: `llm-router-route`.
135
+ ## Connect your coding tool
306
136
 
307
- ### Option A: Guided deploy
137
+ After setting the master key, point your app/agent at the local endpoint and use that key as the auth token.
308
138
 
309
- ```bash
310
- llm-router deploy
311
- ```
139
+ Claude Code example (`~/.claude/settings.local.json`):
312
140
 
313
- If `LLM_ROUTER_CONFIG_JSON` exceeds Cloudflare Free-tier secret size (`5 KB`), deploy now warns and requires explicit confirmation (default is `No`). In non-interactive environments, pass `--allow-large-config=true` to proceed intentionally.
141
+ ```json
142
+ {
143
+ "env": {
144
+ "ANTHROPIC_BASE_URL": "http://127.0.0.1:8787",
145
+ "ANTHROPIC_AUTH_TOKEN": "gw_your_gateway_key",
146
+ "ANTHROPIC_DEFAULT_OPUS_MODEL": "provider_name/model_name_1",
147
+ "ANTHROPIC_DEFAULT_SONNET_MODEL": "provider_name/model_name_2",
148
+ "ANTHROPIC_DEFAULT_HAIKU_MODEL": "provider_name/model_name_3"
149
+ }
150
+ }
151
+ ```
314
152
 
315
- `deploy` requires `CLOUDFLARE_API_TOKEN` for Cloudflare API access. Create a **User Profile API token** at <https://dash.cloudflare.com/profile/api-tokens> (do not use Account API Tokens), then choose preset/template `Edit Cloudflare Workers`. If the env var is missing in interactive mode, the CLI will show the guide and prompt for token input securely.
153
+ ## Real-Time Update Experience
316
154
 
317
- For multi-account tokens, set account explicitly in non-interactive runs:
318
- - `CLOUDFLARE_ACCOUNT_ID=<id>` or
319
- - `llm-router deploy --account-id=<id>`
155
+ When the local server is running:
156
+ - open `llm-router`
157
+ - change provider/model/load-balancer/rate-limit/alias in TUI
158
+ - save
159
+ - the running proxy updates instantly
320
160
 
321
- `llm-router deploy` resolves deploy target from CLI/TUI input (workers.dev or custom route), generates a temporary Wrangler config at runtime, deploys with `--config`, then removes that temporary file. Personal route/account details are not persisted back into repo `wrangler.toml`.
161
+ No stop/start cycle needed.
322
162
 
323
- For custom domains, the deploy helper now prints a DNS checklist and connectivity commands. Common setup for `llm.example.com`:
324
- - Create a DNS record in Cloudflare for `llm` (usually `CNAME llm -> @`)
325
- - Set **Proxy status = Proxied** (orange cloud)
326
- - Use route target `--route-pattern=llm.example.com/* --zone-name=example.com`
327
- - Claude Code base URL should be `https://llm.example.com/anthropic` (**no `:8787`**; that port is local-only)
163
+ ## Cloudflare Worker (Hosted)
328
164
 
329
- ```bash
330
- llm-router deploy --export-only=true --out=.llm-router.worker.json
331
- wrangler secret put LLM_ROUTER_CONFIG_JSON < .llm-router.worker.json
332
- wrangler deploy
333
- ```
165
+ Use this when you want a hosted endpoint instead of a local server.
334
166
 
335
- Rotate worker auth key quickly:
167
+ Guided deploy:
336
168
 
337
169
  ```bash
338
- llm-router worker-key --master-key=new_key
339
- # or generate and rotate immediately
340
- llm-router worker-key --env=production --generate-master-key=true
170
+ llm-router deploy
341
171
  ```
342
172
 
343
- If you intentionally need to bypass weak-key checks (not recommended), add `--allow-weak-master-key=true` to `deploy` or `worker-key`.
344
-
345
- Cloudflare hardening and incident-response checklist: see [`SECURITY.md`](./SECURITY.md).
346
-
347
- ## Runtime Secrets / Env
348
-
349
- Primary:
350
- - `LLM_ROUTER_CONFIG_JSON`
351
- - `LLM_ROUTER_MASTER_KEY` (optional override)
352
-
353
- Also supported:
354
- - `ROUTE_CONFIG_JSON`
355
- - `LLM_ROUTER_JSON`
173
+ The TUI guides you through selecting an account and a deploy target.
356
174
 
357
- Optional resilience tuning:
358
- - `LLM_ROUTER_ORIGIN_RETRY_ATTEMPTS` (default `3`)
359
- - `LLM_ROUTER_ORIGIN_RETRY_BASE_DELAY_MS` (default `250`)
360
- - `LLM_ROUTER_ORIGIN_RETRY_MAX_DELAY_MS` (default `3000`)
361
- - `LLM_ROUTER_ORIGIN_FALLBACK_COOLDOWN_MS` (default `45000`)
362
- - `LLM_ROUTER_ORIGIN_RATE_LIMIT_COOLDOWN_MS` (default `30000`)
363
- - `LLM_ROUTER_ORIGIN_BILLING_COOLDOWN_MS` (default `900000`)
364
- - `LLM_ROUTER_ORIGIN_AUTH_COOLDOWN_MS` (default `600000`)
365
- - `LLM_ROUTER_ORIGIN_POLICY_COOLDOWN_MS` (default `120000`)
366
- - `LLM_ROUTER_ALLOW_POLICY_FALLBACK` (default `false`)
367
- - `LLM_ROUTER_FALLBACK_CIRCUIT_FAILURES` (default `2`)
368
- - `LLM_ROUTER_FALLBACK_CIRCUIT_COOLDOWN_MS` (default `30000`)
369
- - `LLM_ROUTER_MAX_REQUEST_BODY_BYTES` (default `1048576`, min `4096`, max `20971520`)
370
- - `LLM_ROUTER_UPSTREAM_TIMEOUT_MS` (default `60000`, min `1000`, max `300000`)
175
+ ## Config File Location
371
176
 
372
- Optional browser access (CORS):
373
- - By default, cross-origin browser reads are denied unless explicitly allow-listed.
374
- - `LLM_ROUTER_CORS_ALLOWED_ORIGINS` (comma-separated exact origins, e.g. `https://app.example.com`)
375
- - `LLM_ROUTER_CORS_ALLOW_ALL=true` (allows any origin; not recommended for production)
376
-
377
- Optional source IP allowlist (recommended for Worker deployments):
378
- - `LLM_ROUTER_ALLOWED_IPS` (comma-separated client IPs; denies requests from all other IPs)
379
- - `LLM_ROUTER_IP_ALLOWLIST` (alias of `LLM_ROUTER_ALLOWED_IPS`)
380
-
381
- ## Default Config Path
177
+ Local config file:
382
178
 
383
179
  `~/.llm-router.json`
384
180
 
385
- Minimal shape:
386
-
387
- ```json
388
- {
389
- "version": 2,
390
- "masterKey": "local_or_worker_key",
391
- "defaultModel": "chat.default",
392
- "modelAliases": {
393
- "chat.default": {
394
- "strategy": "auto",
395
- "targets": [
396
- { "ref": "openrouter/gpt-4o" },
397
- { "ref": "anthropic/claude-3-5-haiku" }
398
- ],
399
- "fallbackTargets": [
400
- { "ref": "openrouter/gpt-4o-mini" }
401
- ]
402
- }
403
- },
404
- "providers": [
405
- {
406
- "id": "openrouter",
407
- "name": "OpenRouter",
408
- "baseUrl": "https://openrouter.ai/api/v1",
409
- "apiKey": "sk-or-v1-...",
410
- "formats": ["openai"],
411
- "models": [{ "id": "gpt-4o" }],
412
- "rateLimits": [
413
- {
414
- "id": "openrouter-all-month",
415
- "name": "Monthly cap",
416
- "models": ["all"],
417
- "requests": 20000,
418
- "window": { "unit": "month", "size": 1 }
419
- }
420
- ]
421
- }
422
- ]
423
- }
424
- ```
425
-
426
- Direct vs model alias routing:
427
- - Direct route: request `model=provider/model` and optional model-level `fallbackModels` applies.
428
- - Model alias route: request `model=alias.id` (or set as `defaultModel`) and the model alias `targets` + `strategy` drive balancing. `auto` is the recommended default for new model aliases.
429
-
430
- State durability caveats:
431
- - Local Node (`llm-router start`): routing state defaults to file-backed local persistence, so cooldowns/caps survive restarts.
432
- - Cloudflare Worker: default state is in-memory per isolate for now; long-window counters are best-effort until a durable Worker backend is configured.
181
+ ## Security
433
182
 
434
- ## Smoke Test
183
+ See [`SECURITY.md`](./SECURITY.md).
435
184
 
436
- ```bash
437
- npm run test:provider-smoke
438
- ```
185
+ ## Versioning
439
186
 
440
- Use `.env.test-suite.example` as template for provider-based smoke tests.
187
+ - Semver: [Semantic Versioning](https://semver.org/)
188
+ - Release notes: [`CHANGELOG.md`](./CHANGELOG.md)
package/package.json CHANGED
@@ -1,7 +1,19 @@
1
1
  {
2
2
  "name": "@khanglvm/llm-router",
3
- "version": "1.0.6",
3
+ "version": "1.0.8",
4
4
  "description": "Single gateway endpoint for multi-provider LLMs with unified OpenAI+Anthropic format and seamless fallback",
5
+ "keywords": [
6
+ "llm-router",
7
+ "llm-gateway",
8
+ "ai-proxy",
9
+ "openai-compatible",
10
+ "anthropic-compatible",
11
+ "model-routing",
12
+ "fallback",
13
+ "load-balancing",
14
+ "cloudflare-workers",
15
+ "agent-infra"
16
+ ],
5
17
  "type": "module",
6
18
  "main": "src/index.js",
7
19
  "bin": {
@@ -90,6 +90,7 @@ const MODEL_ROUTING_STRATEGY_OPTIONS = [
90
90
  const MODEL_ALIAS_STRATEGIES = MODEL_ROUTING_STRATEGY_OPTIONS.map((option) => option.value);
91
91
  const DEFAULT_PROBE_REQUESTS_PER_MINUTE = 30;
92
92
  const DEFAULT_PROBE_MAX_RATE_LIMIT_RETRIES = 3;
93
+ const DEFAULT_AI_HELP_GATEWAY_TEST_TIMEOUT_MS = 6000;
93
94
  const RATE_LIMIT_WINDOW_UNIT_ALIASES = new Map([
94
95
  ["s", "second"],
95
96
  ["sec", "second"],
@@ -4659,6 +4660,497 @@ async function runUpdateAction(context) {
4659
4660
  };
4660
4661
  }
4661
4662
 
4663
+ function toHomeRelativePath(value) {
4664
+ const input = String(value || "").trim();
4665
+ const home = String(process.env.HOME || "").trim();
4666
+ if (!input || !home) return input;
4667
+ if (!input.startsWith(`${home}/`)) return input;
4668
+ return `~${input.slice(home.length)}`;
4669
+ }
4670
+
4671
+ function collectEnabledModelRefsFromConfig(config) {
4672
+ const providers = (config?.providers || []).filter((provider) => provider && provider.enabled !== false);
4673
+ const refs = [];
4674
+ for (const provider of providers) {
4675
+ const providerId = String(provider?.id || "").trim();
4676
+ if (!providerId) continue;
4677
+ for (const model of (provider.models || [])) {
4678
+ if (!model || model.enabled === false) continue;
4679
+ const modelId = String(model.id || "").trim();
4680
+ if (!modelId) continue;
4681
+ refs.push(`${providerId}/${modelId}`);
4682
+ }
4683
+ }
4684
+ return dedupeList(refs);
4685
+ }
4686
+
4687
+ function quoteShellSingle(value) {
4688
+ return `'${String(value || "").replace(/'/g, "'\"'\"'")}'`;
4689
+ }
4690
+
4691
+ function buildCurlGuideCommand(url, {
4692
+ method = "GET",
4693
+ headers = [],
4694
+ jsonBody
4695
+ } = {}) {
4696
+ const parts = ["curl -sS"];
4697
+ if (String(method || "").toUpperCase() !== "GET") {
4698
+ parts.push(`-X ${String(method || "").toUpperCase()}`);
4699
+ }
4700
+ for (const header of headers) {
4701
+ parts.push(`-H ${quoteShellSingle(header)}`);
4702
+ }
4703
+ if (jsonBody !== undefined) {
4704
+ parts.push("-H 'content-type: application/json'");
4705
+ parts.push(`--data ${quoteShellSingle(JSON.stringify(jsonBody))}`);
4706
+ }
4707
+ parts.push(quoteShellSingle(url));
4708
+ return parts.join(" ");
4709
+ }
4710
+
4711
+ async function runGatewayHttpProbe({
4712
+ url,
4713
+ method = "GET",
4714
+ headers = {},
4715
+ jsonBody,
4716
+ timeoutMs = DEFAULT_AI_HELP_GATEWAY_TEST_TIMEOUT_MS
4717
+ } = {}) {
4718
+ const requestHeaders = { ...(headers || {}) };
4719
+ const requestInit = {
4720
+ method: String(method || "GET").toUpperCase(),
4721
+ headers: requestHeaders
4722
+ };
4723
+
4724
+ if (jsonBody !== undefined) {
4725
+ if (!requestHeaders["content-type"] && !requestHeaders["Content-Type"]) {
4726
+ requestHeaders["content-type"] = "application/json";
4727
+ }
4728
+ requestInit.body = JSON.stringify(jsonBody);
4729
+ }
4730
+
4731
+ if (typeof AbortSignal !== "undefined" && typeof AbortSignal.timeout === "function") {
4732
+ requestInit.signal = AbortSignal.timeout(timeoutMs);
4733
+ }
4734
+
4735
+ try {
4736
+ const response = await fetch(url, requestInit);
4737
+ const rawText = await response.text();
4738
+ return {
4739
+ ok: response.ok,
4740
+ status: response.status,
4741
+ payload: parseJsonSafely(rawText),
4742
+ rawText: String(rawText || "").trim().slice(0, 280)
4743
+ };
4744
+ } catch (error) {
4745
+ return {
4746
+ ok: false,
4747
+ status: 0,
4748
+ payload: null,
4749
+ rawText: "",
4750
+ error: error instanceof Error ? error.message : String(error)
4751
+ };
4752
+ }
4753
+ }
4754
+
4755
+ function summarizeProbeMessage(probe) {
4756
+ if (!probe) return "";
4757
+ if (probe.error) return String(probe.error);
4758
+ const payloadError = probe.payload?.error;
4759
+ if (typeof payloadError === "string") return payloadError.trim();
4760
+ if (payloadError && typeof payloadError === "object") {
4761
+ if (payloadError.message) return String(payloadError.message).trim();
4762
+ if (payloadError.type) return String(payloadError.type).trim();
4763
+ }
4764
+ if (probe.rawText) return String(probe.rawText).trim().slice(0, 140);
4765
+ return "";
4766
+ }
4767
+
4768
+ function formatProbeStatusLabel(probe, {
4769
+ passStatuses = [200],
4770
+ passWhenStatusIsNot = null
4771
+ } = {}) {
4772
+ if (!probe) return "not-run";
4773
+ if (probe.error) return `error (${probe.error})`;
4774
+ const status = Number(probe.status || 0);
4775
+ const isPass = passWhenStatusIsNot !== null
4776
+ ? status !== passWhenStatusIsNot
4777
+ : passStatuses.includes(status);
4778
+ const message = summarizeProbeMessage(probe);
4779
+ if (message) return `${isPass ? "pass" : "fail"} (status=${status}; ${message})`;
4780
+ return `${isPass ? "pass" : "fail"} (status=${status})`;
4781
+ }
4782
+
4783
+ async function runAiHelpGatewayLiveTests({
4784
+ runtimeState,
4785
+ authToken = "",
4786
+ probeModel = "",
4787
+ timeoutMs = DEFAULT_AI_HELP_GATEWAY_TEST_TIMEOUT_MS
4788
+ } = {}) {
4789
+ if (!runtimeState) {
4790
+ return {
4791
+ ran: false,
4792
+ reason: "local-server-not-running",
4793
+ baseUrl: "",
4794
+ tests: {}
4795
+ };
4796
+ }
4797
+
4798
+ const baseUrl = `http://${runtimeState.host}:${runtimeState.port}`;
4799
+ const token = String(authToken || "").trim();
4800
+ const headers = token
4801
+ ? {
4802
+ Authorization: `Bearer ${token}`,
4803
+ "x-api-key": token
4804
+ }
4805
+ : {};
4806
+
4807
+ const modelId = String(probeModel || "").trim() || "chat.default";
4808
+ const [health, openaiModels, claudeModels, codexResponses] = await Promise.all([
4809
+ runGatewayHttpProbe({
4810
+ url: `${baseUrl}/health`,
4811
+ method: "GET",
4812
+ headers,
4813
+ timeoutMs
4814
+ }),
4815
+ runGatewayHttpProbe({
4816
+ url: `${baseUrl}/openai/v1/models`,
4817
+ method: "GET",
4818
+ headers,
4819
+ timeoutMs
4820
+ }),
4821
+ runGatewayHttpProbe({
4822
+ url: `${baseUrl}/anthropic/v1/models`,
4823
+ method: "GET",
4824
+ headers,
4825
+ timeoutMs
4826
+ }),
4827
+ runGatewayHttpProbe({
4828
+ url: `${baseUrl}/openai/v1/responses`,
4829
+ method: "POST",
4830
+ headers,
4831
+ jsonBody: {
4832
+ model: modelId,
4833
+ input: "ping"
4834
+ },
4835
+ timeoutMs
4836
+ })
4837
+ ]);
4838
+
4839
+ return {
4840
+ ran: true,
4841
+ reason: "completed",
4842
+ baseUrl,
4843
+ tests: {
4844
+ health,
4845
+ openaiModels,
4846
+ claudeModels,
4847
+ codexResponses
4848
+ }
4849
+ };
4850
+ }
4851
+
4852
+ async function runAiHelpAction(context) {
4853
+ const args = context.args || {};
4854
+ const configPath = readArg(args, ["config", "configPath"], getDefaultConfigPath());
4855
+ const skipLiveTest = toBoolean(readArg(args, ["skip-live-test", "skipLiveTest"], false), false);
4856
+ const liveTestTimeoutMs = toPositiveInteger(
4857
+ readArg(args, ["live-test-timeout-ms", "liveTestTimeoutMs"], DEFAULT_AI_HELP_GATEWAY_TEST_TIMEOUT_MS),
4858
+ DEFAULT_AI_HELP_GATEWAY_TEST_TIMEOUT_MS,
4859
+ { min: 500, max: 60_000 }
4860
+ );
4861
+ const explicitGatewayAuthToken = String(readArg(args, ["gateway-auth-token", "gatewayAuthToken"], "") || "").trim();
+ const config = await readConfigFile(configPath);
+
+ const providers = (config.providers || []).filter((provider) => provider && provider.enabled !== false);
+ const providerCount = providers.length;
+ const modelCount = providers.reduce((sum, provider) => {
+ const count = (provider.models || []).filter((model) => model && model.enabled !== false).length;
+ return sum + count;
+ }, 0);
+
+ const aliasEntries = Object.entries(config.modelAliases || {});
+ const aliasCount = aliasEntries.length;
+ const aliasStrategySummary = aliasEntries
+ .map(([aliasId, alias]) => `${aliasId}:${alias?.strategy || "ordered"}`)
+ .join(", ") || "(none)";
+ const rateLimitBucketCount = providers.reduce((sum, provider) => sum + (provider.rateLimits || []).length, 0);
+ const defaultModel = String(config.defaultModel || "smart");
+ const hasMasterKey = Boolean(String(config.masterKey || "").trim());
+
+ let runtimeState = null;
+ try {
+ runtimeState = await getActiveRuntimeState();
+ } catch {
+ runtimeState = null;
+ }
+ const serverRunning = Boolean(runtimeState);
+ const runtimeRequiresAuth = Boolean(runtimeState?.requireAuth);
+
+ let runtimeConfig = null;
+ const runtimeConfigPath = String(runtimeState?.configPath || "").trim();
+ if (runtimeConfigPath && runtimeConfigPath !== configPath) {
+ try {
+ runtimeConfig = await readConfigFile(runtimeConfigPath);
+ } catch {
+ runtimeConfig = null;
+ }
+ }
+
+ const runtimeMasterKey = String(runtimeConfig?.masterKey || "").trim();
+ const gatewayAuthToken = explicitGatewayAuthToken
+ || (runtimeConfigPath && runtimeConfigPath !== configPath ? runtimeMasterKey : "")
+ || String(config.masterKey || "").trim()
+ || runtimeMasterKey;
+
+ const directModelRefs = collectEnabledModelRefsFromConfig(config);
+ const aliasIds = aliasEntries.map(([aliasId]) => aliasId);
+ const modelDecisionOptions = dedupeList([
+ defaultModel && defaultModel !== "smart" ? defaultModel : "",
+ ...aliasIds,
+ ...directModelRefs
+ ]);
+ const probeModel = modelDecisionOptions[0] || "chat.default";
+
+ let liveTest = {
+ ran: false,
+ reason: skipLiveTest ? "skipped-by-flag" : "local-server-not-running",
+ baseUrl: serverRunning ? `http://${runtimeState.host}:${runtimeState.port}` : "",
+ tests: {}
+ };
+ if (!skipLiveTest && serverRunning) {
+ liveTest = await runAiHelpGatewayLiveTests({
+ runtimeState,
+ authToken: gatewayAuthToken,
+ probeModel,
+ timeoutMs: liveTestTimeoutMs
+ });
+ }
+
+ const healthProbe = liveTest.tests?.health || null;
+ const openaiModelsProbe = liveTest.tests?.openaiModels || null;
+ const claudeModelsProbe = liveTest.tests?.claudeModels || null;
+ const codexResponsesProbe = liveTest.tests?.codexResponses || null;
+
+ const claudePatchGate = !liveTest.ran
+ ? "pending-live-test"
+ : (claudeModelsProbe?.status === 200 ? "ready" : "blocked");
+ const openCodePatchGate = !liveTest.ran
+ ? "pending-live-test"
+ : (openaiModelsProbe?.status === 200 ? "ready" : "blocked");
+ let codexPatchGate = "pending-live-test";
+ if (liveTest.ran) {
+ if (codexResponsesProbe?.error) {
+ codexPatchGate = "blocked";
+ } else if (codexResponsesProbe?.status === 404) {
+ codexPatchGate = "blocked-responses-endpoint-missing";
+ } else if ([401, 403].includes(Number(codexResponsesProbe?.status || 0))) {
+ codexPatchGate = "blocked-auth";
+ } else {
+ codexPatchGate = "ready";
+ }
+ }
+
+ const suggestions = [];
+ if (providerCount === 0) {
+ suggestions.push("Add first provider with at least one model. Run: llm-router config --operation=upsert-provider --provider-id=<id> --name=\"<name>\" --base-url=<url> --api-key=<key> --models=<model1,model2>");
+ } else {
+ const providersWithoutModels = providers
+ .filter((provider) => (provider.models || []).filter((model) => model && model.enabled !== false).length === 0)
+ .map((provider) => provider.id);
+ if (providersWithoutModels.length > 0) {
+ suggestions.push(`Add models to provider(s) with empty model list: ${providersWithoutModels.join(", ")}. Run: llm-router config --operation=upsert-provider --provider-id=<id> --models=<model1,model2>`);
+ }
+ }
+
+ if (modelCount > 0 && aliasCount === 0) {
+ suggestions.push("Create a model alias/group for stable app routing. Run: llm-router config --operation=upsert-model-alias --alias-id=chat.default --strategy=auto --targets=<provider/model,...>");
+ }
+
+ if (aliasCount > 0) {
+ const nonAutoAliases = aliasEntries
+ .filter(([, alias]) => String(alias?.strategy || "ordered") !== "auto")
+ .map(([aliasId]) => aliasId);
+ if (nonAutoAliases.length > 0) {
+ suggestions.push(`Review load-balancer strategy for alias(es): ${nonAutoAliases.join(", ")}. Recommended default: auto.`);
+ }
+ }
+
+ if (providerCount > 0 && rateLimitBucketCount === 0) {
+ suggestions.push("Add at least one provider rate-limit bucket for quota safety. Run: llm-router config --operation=set-provider-rate-limits --provider-id=<id> --bucket-name=\"Monthly cap\" --bucket-models=all --bucket-requests=<n> --bucket-window=month:1");
+ }
+
+ if (!hasMasterKey) {
+ suggestions.push("Set master key for authenticated access. Run: llm-router config --operation=set-master-key --generate-master-key=true");
+ }
+
+ if (!serverRunning) {
+ suggestions.push(`Start local proxy server. Run: llm-router start${hasMasterKey ? " --require-auth=true" : ""}`);
+ } else {
+ suggestions.push(`Local proxy is running on http://${runtimeState.host}:${runtimeState.port}. Apply config changes with llm-router config; updates hot-reload automatically.`);
+ }
+
+ if (serverRunning && skipLiveTest) {
+ suggestions.push("Run live llm-router API test before patching coding-tool config. Re-run: llm-router ai-help --skip-live-test=false");
+ }
+
+ if (liveTest.ran && claudePatchGate !== "ready") {
+ suggestions.push("Claude/OpenCode patch gate is blocked. Fix llm-router auth/provider/model readiness, then re-run llm-router ai-help.");
+ }
+ if (liveTest.ran && codexPatchGate === "blocked-responses-endpoint-missing") {
+ suggestions.push("Codex CLI requires OpenAI Responses API. Current llm-router endpoint does not expose /openai/v1/responses; do not patch Codex until this gate is resolved.");
+ }
+
+ if (suggestions.length === 0) {
+ suggestions.push("No blocking setup gaps detected. Review routing summary with: llm-router config --operation=list-routing");
+ }
+
+ const runtimeConfigPathForDisplay = runtimeConfigPath ? toHomeRelativePath(runtimeConfigPath) : "";
+ const gatewayBaseUrlForGuide = liveTest.baseUrl || (serverRunning ? `http://${runtimeState.host}:${runtimeState.port}` : "http://127.0.0.1:8787");
+ const authGuideHeaders = runtimeRequiresAuth ? ["Authorization: Bearer <master_key>"] : [];
+
+ const lines = [
+ "# AI-HELP",
+ "ENTITY: llm-router",
+ "MODE: cli-automation",
+ "PROFILE: agent-guide-v2",
+ "",
+ "## INTRO",
+ "Use this output as an AI-agent operating brief for llm-router.",
+ "The agent should auto-discover commands, inspect current state, configure llm-router on your behalf, run live API gates, and only then patch coding tool configs.",
+ "",
+ "## WHAT AGENT CAN DO WITH LLM-ROUTER",
+ "- explain llm-router capabilities and current setup readiness",
+ "- set provider, model list, model alias/group, and rate-limit buckets via CLI",
+ "- validate local llm-router endpoint health/model-list/routes with real API probes",
+ "- patch coding tools (Claude Code, Codex CLI, OpenCode) after pre-patch gates pass",
+ "",
+ "## DISCOVERY COMMANDS",
+ "- llm-router -h",
+ "- llm-router config -h",
+ "- llm-router start -h",
+ "- llm-router deploy -h",
+ "",
+ "## CURRENT STATE",
+ `- config_path=${configPath}`,
+ `- providers=${providerCount}`,
+ `- models=${modelCount}`,
+ `- model_aliases=${aliasCount}`,
+ `- alias_strategies=${aliasStrategySummary}`,
+ `- rate_limit_buckets=${rateLimitBucketCount}`,
+ `- default_model=${defaultModel}`,
+ `- master_key_configured=${hasMasterKey}`,
+ `- local_server_running=${serverRunning}`,
+ serverRunning ? `- local_server_endpoint=http://${runtimeState.host}:${runtimeState.port}` : "",
+ runtimeState ? `- local_server_require_auth=${runtimeRequiresAuth}` : "",
+ runtimeConfigPathForDisplay ? `- local_server_config_path=${runtimeConfigPathForDisplay}` : "",
+ "",
+ "## MODEL/GROUP DECISION INPUT (REQUIRED BEFORE PATCHING TOOL CONFIG)",
+ "- Ask user to choose target_tool: claude-code | codex-cli | opencode",
+ "- Ask user to choose target_model_or_group for that tool",
+ `- available_alias_groups=${aliasIds.join(", ") || "(none)"}`,
+ `- available_direct_models=${directModelRefs.join(", ") || "(none)"}`,
+ `- decision_options_preview=${modelDecisionOptions.slice(0, 12).join(", ") || "(none)"}`,
+ "- If user chooses an alias/group, keep alias id unchanged so llm-router balancing still works.",
+ "",
+ "## PRE-PATCH API GATE (MUST PASS BEFORE EDITING TOOL CONFIG)",
+ `- live_test_mode=${skipLiveTest ? "skipped-by-flag" : (liveTest.ran ? "executed" : "pending-local-server")}`,
+ `- live_test_timeout_ms=${liveTestTimeoutMs}`,
+ `- gateway_base_url=${gatewayBaseUrlForGuide}`,
+ `- health_probe=${liveTest.ran ? formatProbeStatusLabel(healthProbe, { passStatuses: [200] }) : "not-run"}`,
+ `- openai_models_probe=${liveTest.ran ? formatProbeStatusLabel(openaiModelsProbe, { passStatuses: [200] }) : "not-run"}`,
+ `- claude_models_probe=${liveTest.ran ? formatProbeStatusLabel(claudeModelsProbe, { passStatuses: [200] }) : "not-run"}`,
+ `- codex_responses_probe=${liveTest.ran ? formatProbeStatusLabel(codexResponsesProbe, { passWhenStatusIsNot: 404 }) : "not-run"}`,
+ `- patch_gate_claude_code=${claudePatchGate}`,
+ `- patch_gate_opencode=${openCodePatchGate}`,
+ `- patch_gate_codex_cli=${codexPatchGate}`,
+ "- Rule: Do NOT patch any coding-tool config until required gate is ready.",
+ "",
+ "## LIVE TEST COMMANDS (RUN BEFORE PATCHING TOOL CONFIG)",
+ runtimeRequiresAuth ? "- export LLM_ROUTER_MASTER_KEY='<master_key>'" : "- Local auth currently disabled; auth header is optional.",
+ `- ${buildCurlGuideCommand(`${gatewayBaseUrlForGuide}/health`, { method: "GET", headers: authGuideHeaders })}`,
+ `- ${buildCurlGuideCommand(`${gatewayBaseUrlForGuide}/openai/v1/models`, { method: "GET", headers: authGuideHeaders })}`,
+ `- ${buildCurlGuideCommand(`${gatewayBaseUrlForGuide}/anthropic/v1/models`, { method: "GET", headers: authGuideHeaders })}`,
+ `- ${buildCurlGuideCommand(`${gatewayBaseUrlForGuide}/openai/v1/responses`, {
+ method: "POST",
+ headers: authGuideHeaders,
+ jsonBody: { model: "<target_model_or_group>", input: "ping" }
+ })} # required for Codex CLI compatibility`,
+ "",
+ "## LLM-ROUTER CONFIG WORKFLOWS (CLI)",
+ "1. Upsert provider + models:",
+ " llm-router config --operation=upsert-provider --provider-id=<id> --name=\"<name>\" --endpoints=<url1,url2> --api-key=<key> --models=<model1,model2>",
+ "2. Upsert model alias/group:",
+ " llm-router config --operation=upsert-model-alias --alias-id=<alias> --strategy=auto --targets=<provider/model,...>",
+ "3. Set provider rate limit bucket:",
+ " llm-router config --operation=set-provider-rate-limits --provider-id=<id> --bucket-name=\"Monthly cap\" --bucket-models=all --bucket-requests=<n> --bucket-window=month:1",
+ "4. Review final routing summary:",
+ " llm-router config --operation=list-routing",
+ "",
+ "## CODING TOOL PATCH PLAYBOOK",
+ "### Claude Code",
+ "- patch_target_priority=.claude/settings.local.json (project) -> ~/.claude/settings.json (user)",
+ "- required_gate=patch_gate_claude_code=ready",
+ "- set env keys: ANTHROPIC_BASE_URL, ANTHROPIC_AUTH_TOKEN, ANTHROPIC_MODEL",
+ "```json",
+ "{",
+ " \"env\": {",
+ ` \"ANTHROPIC_BASE_URL\": \"${gatewayBaseUrlForGuide}/anthropic\",`,
+ " \"ANTHROPIC_AUTH_TOKEN\": \"<master_key>\",",
+ " \"ANTHROPIC_MODEL\": \"<target_model_or_group>\"",
+ " }",
+ "}",
+ "```",
+ "",
+ "### Codex CLI",
+ "- patch_target_priority=.codex/config.toml (project) -> ~/.codex/config.toml (user)",
+ "- required_gate=patch_gate_codex_cli=ready",
+ "- hard requirement: Codex uses OpenAI Responses API; /openai/v1/responses must be reachable",
+ "```toml",
+ "model_provider = \"llm_router\"",
+ "model = \"<target_model_or_group>\"",
+ "",
+ "[model_providers.llm_router]",
+ "name = \"llm-router\"",
+ `base_url = \"${gatewayBaseUrlForGuide}/openai/v1\"`,
+ "wire_api = \"responses\"",
+ "env_http_headers = { Authorization = \"LLM_ROUTER_AUTH_HEADER\" }",
+ "```",
+ "- export env before launching Codex: export LLM_ROUTER_AUTH_HEADER='Bearer <master_key>'",
+ "",
+ "### OpenCode",
+ "- patch_target_priority=./opencode.json (project) -> ~/.config/opencode/opencode.json (user)",
+ "- required_gate=patch_gate_opencode=ready",
+ "```json",
+ "{",
+ " \"model\": \"<target_model_or_group>\",",
+ " \"small_model\": \"<target_model_or_group>\",",
+ " \"provider\": {",
+ " \"llm-router\": {",
+ " \"options\": {",
+ ` \"baseURL\": \"${gatewayBaseUrlForGuide}/openai\",`,
+ " \"apiKey\": \"<master_key>\"",
+ " }",
+ " }",
+ " }",
+ "}",
+ "```",
+ "",
+ "## NEXT SUGGESTIONS",
+ ...suggestions.map((item, index) => `${index + 1}. ${item}`),
+ "",
+ "## UPDATE RULE",
+ "When local server is running, llm-router config changes are hot-reloaded in memory (no manual restart required).",
+ "Agent policy: always run live API gate checks first, then patch tool configs only after gate status is ready."
+ ].filter(Boolean);
+
+ return {
+ ok: true,
+ mode: context.mode,
+ exitCode: EXIT_SUCCESS,
+ data: lines.join("\n")
+ };
+ }
+
  async function runDeployAction(context) {
  const args = context.args || {};
  const configPath = readArg(args, ["config", "configPath"], getDefaultConfigPath());
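The hunk above calls several helpers that are defined elsewhere in this file and not shown in the diff. As a rough orientation, here are hypothetical sketches of `dedupeList` and `formatProbeStatusLabel`, inferred purely from their call sites; the package's actual implementations may differ.

```javascript
// Hypothetical sketches of two helpers the hunk above relies on; both are
// inferred from their call sites only, so the package's real code may differ.

// dedupeList: drop falsy entries and keep the first occurrence of each value,
// preserving order (this matches how modelDecisionOptions is assembled).
function dedupeList(items) {
  const seen = new Set();
  const result = [];
  for (const item of items) {
    if (!item || seen.has(item)) continue;
    seen.add(item);
    result.push(item);
  }
  return result;
}

// formatProbeStatusLabel: collapse a probe result ({ status, error }) into a
// short pass/fail label for the PRE-PATCH API GATE section of the guide.
function formatProbeStatusLabel(probe, { passStatuses = [], passWhenStatusIsNot = null } = {}) {
  if (!probe || probe.error) return "fail(error)";
  const status = Number(probe.status || 0);
  const pass = passWhenStatusIsNot !== null
    ? status !== passWhenStatusIsNot
    : passStatuses.includes(status);
  return `${pass ? "pass" : "fail"}(${status})`;
}
```

Under these assumptions, a probe call like `formatProbeStatusLabel(codexResponsesProbe, { passWhenStatusIsNot: 404 })` treats any non-404 status as a pass, which is consistent with the Codex gate only hard-blocking when the Responses endpoint is missing.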
@@ -5352,6 +5844,44 @@ const routerModule = {
  },
  run: runUpdateAction
  },
+ {
+ actionId: "ai-help",
+ description: "Print AI-agent guide with llm-router setup workflows, live API gates, and coding-tool patch playbooks.",
+ tui: { steps: ["print-ai-help"] },
+ commandline: {
+ requiredArgs: [],
+ optionalArgs: [
+ "config",
+ "skip-live-test",
+ "live-test-timeout-ms",
+ "gateway-auth-token"
+ ]
+ },
+ help: {
+ summary: "AI guide for setup + operation: state snapshot, provider/alias/rate-limit workflows, live gateway tests, and patch rules for Claude/Codex/OpenCode.",
+ args: [
+ { name: "config", required: false, description: "Path to config file used for state-aware suggestions.", example: "--config=~/.llm-router.json" },
+ { name: "skip-live-test", required: false, description: "Skip live llm-router API probes in ai-help output.", example: "--skip-live-test=true" },
+ { name: "live-test-timeout-ms", required: false, description: `HTTP timeout for ai-help live probes (default ${DEFAULT_AI_HELP_GATEWAY_TEST_TIMEOUT_MS}ms).`, example: "--live-test-timeout-ms=8000" },
+ { name: "gateway-auth-token", required: false, description: "Override auth token for live probes when runtime config differs from selected --config.", example: "--gateway-auth-token=gw_..." }
+ ],
+ examples: [
+ "llm-router ai-help",
+ "llm-router ai-help --config=~/.llm-router.json",
+ "llm-router ai-help --skip-live-test=true",
+ "llm-router ai-help --live-test-timeout-ms=8000"
+ ],
+ useCases: [
+ {
+ name: "agent setup brief",
+ description: "Generate a machine-readable operating guide so AI agents can configure llm-router, run pre-patch API gates, and patch tool configs safely.",
+ command: "llm-router ai-help"
+ }
+ ],
+ keybindings: []
+ },
+ run: runAiHelpAction
+ },
  {
  actionId: "config",
  description: "Config manager for providers/models/master-key/startup service.",
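The LIVE TEST COMMANDS section of the guide is rendered through a `buildCurlGuideCommand` helper that the diff does not include. Below is a hypothetical sketch of such a helper, with the argument shape ({ method, headers, jsonBody }) inferred from the call sites in `runAiHelpAction`; the shipped implementation may format the curl lines differently.

```javascript
// Hypothetical sketch of the buildCurlGuideCommand helper referenced by
// runAiHelpAction. Argument shape is an assumption based on call sites;
// the package's real implementation may differ.
function buildCurlGuideCommand(url, { method = "GET", headers = [], jsonBody = null } = {}) {
  const parts = ["curl", "-sS", "-X", method];
  for (const header of headers) {
    parts.push("-H", `'${header}'`); // e.g. 'Authorization: Bearer <master_key>'
  }
  if (jsonBody !== null) {
    parts.push("-H", "'Content-Type: application/json'");
    parts.push("-d", `'${JSON.stringify(jsonBody)}'`);
  }
  parts.push(`'${url}'`);
  return parts.join(" ");
}

// Example: a bare GET probe with no auth header.
console.log(buildCurlGuideCommand("http://127.0.0.1:8787/health", { method: "GET" }));
// curl -sS -X GET 'http://127.0.0.1:8787/health'
```

Single-quoting each header and the JSON body keeps the emitted command copy-pasteable into a POSIX shell, which matters since the guide is meant to be executed verbatim by an agent.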