@khanglvm/llm-router 1.0.5 → 1.0.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md ADDED
@@ -0,0 +1,46 @@
+ # Changelog
+
+ All notable changes to this project will be documented in this file.
+
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
+ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+ ## [1.0.6] - 2026-02-28
+
+ ### Added
+ - Added a formal changelog for tracked, versioned releases.
+ - Added npm package publish metadata to keep public publish defaults explicit.
+
+ ### Changed
+ - Declared an explicit package `files` whitelist so npm publishes are predictable.
+ - Updated release workflow docs in `README.md` to require changelog + version updates before publish.
+
+ ## [1.0.5] - 2026-02-27
+
+ ### Fixed
+ - Hardened the release surface and added `.npmignore` coverage for safer package publishes.
+
+ ## [1.0.4] - 2026-02-26
+
+ ### Changed
+ - Refined README guidance for routing and deployment usage.
+
+ ## [1.0.3] - 2026-02-26
+
+ ### Changed
+ - Simplified project positioning and gateway copy in docs.
+
+ ## [1.0.2] - 2026-02-26
+
+ ### Changed
+ - Documented smart fallback behavior and operational expectations.
+
+ ## [1.0.1] - 2026-02-25
+
+ ### Changed
+ - Improved fallback strategy behavior.
+
+ ## [1.0.0] - 2026-02-25
+
+ ### Added
+ - Initial `llm-router` release with local + Cloudflare Worker gateway flows.
package/README.md CHANGED
@@ -14,6 +14,18 @@ It supports:
  npm i -g @khanglvm/llm-router
  ```

+ ## Versioning
+
+ - Follows [Semantic Versioning](https://semver.org/).
+ - Release notes live in [`CHANGELOG.md`](./CHANGELOG.md).
+ - npm publishes are configured for public registry access.
+
+ Release checklist (a concrete sketch follows this list):
+ - Update `README.md` if user-facing behavior changed.
+ - Add a dated entry in `CHANGELOG.md`.
+ - Bump the package version before publish.
+ - Publish with `npm publish`.
+
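A minimal sketch of that checklist in shell form; `npm version` and `npm publish` are standard npm commands, while the `$EDITOR` steps are just illustrative:

```bash
# Illustrative release pass following the checklist above
$EDITOR README.md      # only if user-facing behavior changed
$EDITOR CHANGELOG.md   # add a dated entry for the new version
npm version patch      # bumps package.json and (in a git repo) commits + tags
npm publish            # publishConfig.access=public applies
```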
  ## Quick Start

  ```bash
@@ -45,17 +57,34 @@ llm-router config \
  --format=openai \
  --skip-probe=true

- # 2) (Optional) Configure model fallback order
+ # 2) (Optional) Configure model fallback order for direct provider/model requests
  llm-router config \
  --operation=set-model-fallbacks \
  --provider-id=openrouter \
  --model=claude-3-7-sonnet \
  --fallback-models=openrouter/gpt-4o

- # 3) Set master key (this is your gateway key for client apps)
+ # 3) (Optional) Create a model alias with a routing strategy and weighted targets
+ llm-router config \
+ --operation=upsert-model-alias \
+ --alias-id=chat.default \
+ --strategy=auto \
+ --targets=openrouter/claude-3-7-sonnet@2,openrouter/gpt-4o@1 \
+ --fallback-targets=openrouter/gpt-4o-mini
+
+ # 4) (Optional) Add a provider request-cap bucket (models: all)
+ llm-router config \
+ --operation=set-provider-rate-limits \
+ --provider-id=openrouter \
+ --bucket-name="Monthly cap" \
+ --bucket-models=all \
+ --bucket-requests=20000 \
+ --bucket-window=month:1
+
+ # 5) Set master key (this is your gateway key for client apps)
  llm-router config --operation=set-master-key --master-key=gw_your_gateway_key

- # 4) Start gateway with auth required
+ # 6) Start gateway with auth required
  llm-router start --require-auth=true
  ```

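Once the gateway is running, a client call looks roughly like the sketch below. The host, port, and `/route` path are assumptions (the path is inferred from the WAF expressions in `SECURITY.md`), so adjust them to your deployment:

```bash
# Hypothetical client request against the local gateway
curl -sS http://localhost:8787/route \
  -H "Authorization: Bearer gw_your_gateway_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "chat.default",
    "messages": [{"role": "user", "content": "ping"}]
  }'
```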
@@ -80,6 +109,109 @@ Claude Code example (`~/.claude/settings.local.json`):
  - Policy/moderation blocks: no retry; cross-provider fallback is disabled by default (`LLM_ROUTER_ALLOW_POLICY_FALLBACK=false`).
  - Invalid client requests (`400`, `413`, `422`): no retry and no fallback; the request short-circuits.

+ ## Model Alias Routing Strategies
+
+ A model alias groups multiple models from different providers under one model name.
+
+ Use `--strategy` when creating or updating a model alias:
+
+ - `auto`: Recommended set-and-forget mode. Automatically routes using quota, cooldown, and health signals to reduce rate-limit failures.
+ - `ordered`: Tries targets in list order.
+ - `round-robin`: Rotates evenly across eligible targets.
+ - `weighted-rr`: Rotates like round-robin, but favors higher weights.
+ - `quota-aware-weighted-rr`: Weighted routing plus remaining-capacity awareness.
+
+ Example:
+
+ ```bash
+ llm-router config \
+ --operation=upsert-model-alias \
+ --alias-id=coding \
+ --strategy=auto \
+ --targets=rc/gpt-5.3-codex,zai/glm-5
+ ```
+
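For contrast, a hedged variant of the same alias that pins an explicit 2:1 split with `weighted-rr` instead of `auto`; the weights here are illustrative, and the `@<weight>` syntax is the one documented under Main Commands below:

```bash
llm-router config \
--operation=upsert-model-alias \
--alias-id=coding \
--strategy=weighted-rr \
--targets=rc/gpt-5.3-codex@2,zai/glm-5@1
```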
+ Concrete model alias example with provider-specific caps:
+
+ ```bash
+ llm-router config \
+ --operation=upsert-model-alias \
+ --alias-id=coding \
+ --strategy=auto \
+ --targets=rc/gpt-5.3-codex,zai/glm-5
+
+ llm-router config \
+ --operation=set-provider-rate-limits \
+ --provider-id=rc \
+ --bucket-name="Minute cap" \
+ --bucket-models=gpt-5.3-codex \
+ --bucket-requests=60 \
+ --bucket-window=minute:1
+
+ llm-router config \
+ --operation=set-provider-rate-limits \
+ --provider-id=zai \
+ --bucket-name="5-hour cap" \
+ --bucket-models=glm-5 \
+ --bucket-requests=600 \
+ --bucket-window=hour:5
+ ```
+
+ ## What Is A Bucket?
+
+ A rate-limit bucket is a request cap for a time window.
+
+ Examples:
+ - `40 req / 1 minute`
+ - `600 req / 6 hours`
+
+ Multiple buckets can apply to the same model scope at the same time. A candidate is treated as exhausted if any matching bucket is exhausted.
+
+ ## TUI Bucket Walkthrough
+
+ Use the config manager and select:
+ - Manage provider rate-limit buckets
+ - Create bucket(s)
+
+ The TUI guides you through:
+ - Bucket name (friendly label)
+ - Model scope (`all` or selected models with multiselect checkboxes)
+ - Request cap
+ - Window unit (`minute`, `hour(s)`, `week`, `month`)
+ - Window size (hours accept any `N`; the other preset units are fixed at `1`)
+ - Review, plus an optional add-another loop for combined policies
+
+ Internal bucket ids are generated automatically from the name when omitted, and are shown as an advanced detail on the review screen.
+
+ ## Combined-Cap Recipe (`40/min` + `600/6h`)
+
+ ```bash
+ llm-router config \
+ --operation=set-provider-rate-limits \
+ --provider-id=openrouter \
+ --bucket-name="Minute cap" \
+ --bucket-models=all \
+ --bucket-requests=40 \
+ --bucket-window=minute:1
+
+ llm-router config \
+ --operation=set-provider-rate-limits \
+ --provider-id=openrouter \
+ --bucket-name="6-hour cap" \
+ --bucket-models=all \
+ --bucket-requests=600 \
+ --bucket-window=hour:6
+ ```
+
+ This keeps both limits active together for the same model scope.
+
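To confirm that both buckets registered against the same scope, the routing summary operation (documented under Main Commands below) prints the configured aliases and rate limits:

```bash
llm-router config --operation=list-routing
```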
+ ## Rate-Limit Troubleshooting
+
+ - Check routing decisions with `LLM_ROUTER_DEBUG_ROUTING=true` and inspect `x-llm-router-skipped-candidates` (see the sketch after this list).
+ - `quota-exhausted` means the candidate was skipped proactively during routing, before any upstream call was made.
+ - For provider `429`, the cooldown is tracked from `Retry-After` when present, or from `LLM_ROUTER_ORIGIN_RATE_LIMIT_COOLDOWN_MS`.
+ - Local mode persists state by default (file backend), while the Worker defaults to in-memory state.
+
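A hedged debugging pass tying these knobs together; the env var and the response header come from the bullets above, while the gateway URL and `/route` path are assumptions:

```bash
# Start locally with routing decisions logged
LLM_ROUTER_DEBUG_ROUTING=true llm-router start --require-auth=true

# From another shell, dump response headers and look for skip reasons
curl -sS -D - -o /dev/null http://localhost:8787/route \
  -H "Authorization: Bearer gw_your_gateway_key" \
  -H "Content-Type: application/json" \
  -d '{"model": "chat.default", "messages": [{"role": "user", "content": "ping"}]}' \
  | grep -i "x-llm-router-skipped-candidates"
```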
  ## Main Commands

  ```bash
@@ -104,8 +236,56 @@ llm-router config \
  --models=gpt-4o,claude-3-7-sonnet \
  --format=openai \
  --skip-probe=true
+
+ llm-router config \
+ --operation=upsert-model-alias \
+ --alias-id=chat.default \
+ --strategy=auto \
+ --targets=openrouter/gpt-4o-mini@3,anthropic/claude-3-5-haiku@2 \
+ --fallback-targets=openrouter/gpt-4o
+
+ llm-router config \
+ --operation=set-provider-rate-limits \
+ --provider-id=openrouter \
+ --bucket-name="Monthly cap" \
+ --bucket-models=all \
+ --bucket-requests=20000 \
+ --bucket-window=month:1
+ ```
+
+ Alias target syntax:
+ - `--targets` / `--fallback-targets`: `<routeRef>@<weight>` or `<routeRef>:<weight>`
+ - route refs: direct `provider/model` or alias id
+
+ Routing strategy values:
+ - `auto` (recommended)
+ - `ordered`
+ - `round-robin`
+ - `weighted-rr`
+ - `quota-aware-weighted-rr`
+
+ Rate-limit bucket window syntax:
+ - `--bucket-window=month:1`
+ - `--bucket-window=1w`
+ - `--bucket-window=7day`
+
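A hedged example recombining the syntax variants above; the alias id `drafts`, the weights, and the weekly cap value are illustrative, but every flag and separator comes from the lists just shown:

```bash
# ':' as the weight separator, and the shorthand '1w' window form
llm-router config \
--operation=upsert-model-alias \
--alias-id=drafts \
--strategy=quota-aware-weighted-rr \
--targets=openrouter/gpt-4o-mini:3,anthropic/claude-3-5-haiku:1

llm-router config \
--operation=set-provider-rate-limits \
--provider-id=openrouter \
--bucket-name="Weekly cap" \
--bucket-models=all \
--bucket-requests=5000 \
--bucket-window=1w
```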
+ Routing summary:
+
+ ```bash
+ llm-router config --operation=list-routing
  ```

+ Explicit schema migration with backup:
+
+ ```bash
+ llm-router config --operation=migrate-config --target-version=2 --create-backup=true
+ ```
+
+ Automatic version handling:
+ - Local config loads with a silent forward-migration to the latest supported schema.
+ - Migration is persisted automatically on read when possible (best-effort, no interactive prompt).
+ - Future/newer version numbers do not fail solely because of a version mismatch; known fields are normalized best-effort.
+
  Set local auth key:

  ```bash
@@ -206,8 +386,21 @@ Minimal shape:

  ```json
  {
+   "version": 2,
    "masterKey": "local_or_worker_key",
-   "defaultModel": "openrouter/gpt-4o",
+   "defaultModel": "chat.default",
+   "modelAliases": {
+     "chat.default": {
+       "strategy": "auto",
+       "targets": [
+         { "ref": "openrouter/gpt-4o" },
+         { "ref": "anthropic/claude-3-5-haiku" }
+       ],
+       "fallbackTargets": [
+         { "ref": "openrouter/gpt-4o-mini" }
+       ]
+     }
+   },
    "providers": [
      {
        "id": "openrouter",
@@ -215,12 +408,29 @@ Minimal shape:
        "baseUrl": "https://openrouter.ai/api/v1",
        "apiKey": "sk-or-v1-...",
        "formats": ["openai"],
-       "models": [{ "id": "gpt-4o" }]
+       "models": [{ "id": "gpt-4o" }],
+       "rateLimits": [
+         {
+           "id": "openrouter-all-month",
+           "name": "Monthly cap",
+           "models": ["all"],
+           "requests": 20000,
+           "window": { "unit": "month", "size": 1 }
+         }
+       ]
      }
    ]
  }
  ```

+ Direct vs model alias routing (contrasted in the sketch below):
+ - Direct route: the request sets `model=provider/model`, and any model-level `fallbackModels` apply.
+ - Model alias route: the request sets `model=alias.id` (or the alias is set as `defaultModel`), and the alias `targets` + `strategy` drive balancing. `auto` is the recommended default for new model aliases.
+
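A sketch of the two request shapes side by side; the endpoint URL and path are assumptions, while the `model` values come from the config above:

```bash
# Direct route: names a concrete provider/model; its fallbackModels (if set) apply
curl -sS http://localhost:8787/route \
  -H "Authorization: Bearer local_or_worker_key" \
  -H "Content-Type: application/json" \
  -d '{"model": "openrouter/gpt-4o", "messages": [{"role": "user", "content": "hi"}]}'

# Model alias route: targets + strategy on chat.default drive the balancing
curl -sS http://localhost:8787/route \
  -H "Authorization: Bearer local_or_worker_key" \
  -H "Content-Type: application/json" \
  -d '{"model": "chat.default", "messages": [{"role": "user", "content": "hi"}]}'
```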
+ State durability caveats:
+ - Local Node (`llm-router start`): routing state defaults to file-backed local persistence, so cooldowns/caps survive restarts.
+ - Cloudflare Worker: default state is in-memory per isolate for now; long-window counters are best-effort until a durable Worker backend is configured.
+
  ## Smoke Test

  ```bash
package/SECURITY.md ADDED
@@ -0,0 +1,142 @@
+ # Security Guide
+
+ This guide focuses on preventing unauthorized access to costly LLM resources, especially in Cloudflare Worker deployments.
+
+ ## Quick Hardened Setup
+
+ 1. Generate and set a strong gateway key locally:
+
+ ```bash
+ llm-router config --operation=set-master-key --generate-master-key=true
+ ```
+
+ 2. Deploy with the worker defaults already set in this repo:
+    - `workers_dev = false`
+    - `preview_urls = false`
+
+ 3. Deploy config + secrets:
+
+ ```bash
+ llm-router deploy --env=production
+ ```
+
+ 4. Restrict who can call the router:
+    - Set `LLM_ROUTER_ALLOWED_IPS` (or `LLM_ROUTER_IP_ALLOWLIST`) to trusted source IPs.
+    - Set `LLM_ROUTER_CORS_ALLOWED_ORIGINS` to explicit browser origins.
+    - Keep `LLM_ROUTER_CORS_ALLOW_ALL` disabled in production.
+
+ 5. Expose only a custom domain route (not `workers.dev`):
+
+ ```toml
+ [env.production]
+ routes = [{ pattern = "api.example.com/*", zone_name = "example.com" }]
+ ```
+
+ ## Quick Master Key Generation
+
+ Use generated keys instead of hand-written keys:
+
+ ```bash
+ # Local config master key
+ llm-router config --operation=set-master-key --generate-master-key=true
+
+ # Rotate the Cloudflare worker key directly
+ llm-router worker-key --env=production --generate-master-key=true
+ ```
+
+ Optional tuning:
+
+ ```bash
+ llm-router worker-key \
+ --env=production \
+ --generate-master-key=true \
+ --master-key-length=64 \
+ --master-key-prefix=gw_
+ ```
+
+ ## Cloudflare Access (Recommended)
+
+ Protect the worker behind Cloudflare Access so clients must present a service token before hitting the router.
+
+ Suggested setup:
+ 1. Zero Trust -> Access -> Applications -> Add application.
+ 2. Type: Self-hosted.
+ 3. Domain: your API hostname (for example `api.example.com`).
+ 4. Policy: allow only a Service Token for machine-to-machine traffic.
+
+ Client calls should include (see the sketch below):
+ - `CF-Access-Client-Id`
+ - `CF-Access-Client-Secret`
+
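A hedged example of such a machine-to-machine call; the two `CF-Access-*` headers are Cloudflare's documented service-token headers, while the hostname, path, and key values are placeholders:

```bash
curl -sS https://api.example.com/route \
  -H "CF-Access-Client-Id: <service-token-client-id>" \
  -H "CF-Access-Client-Secret: <service-token-client-secret>" \
  -H "Authorization: Bearer gw_your_gateway_key" \
  -H "Content-Type: application/json" \
  -d '{"model": "chat.default", "messages": [{"role": "user", "content": "ping"}]}'
```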
+ Reference:
+ - [Cloudflare Access service tokens](https://developers.cloudflare.com/cloudflare-one/identity/service-tokens/)
+
+ ## WAF and Rate Limiting
+
+ Use WAF custom rules and rate limiting to reduce the abuse blast radius.
+
+ Suggested custom rule expressions (adapt host/path to your deployment):
+
+ 1. Block non-allowlisted source IPs to the route endpoint:
+
+ ```txt
+ http.host eq "api.example.com" and starts_with(http.request.uri.path, "/route") and not ip.src in $llm_router_allowed_ips
+ ```
+
+ 2. Block unexpected methods on the route endpoint:
+
+ ```txt
+ http.host eq "api.example.com" and starts_with(http.request.uri.path, "/route") and not http.request.method in {"POST" "OPTIONS"}
+ ```
+
+ Suggested rate limit rule:
+ - Match expression:
+
+ ```txt
+ http.host eq "api.example.com" and starts_with(http.request.uri.path, "/route")
+ ```
+
+ - Threshold example:
+   - 60 requests / 1 minute per source IP (tighten or loosen by workload).
+ - Action: Block or Managed Challenge.
+
+ References:
+ - [Cloudflare WAF custom rules](https://developers.cloudflare.com/waf/custom-rules/)
+ - [Cloudflare WAF rate limiting rules](https://developers.cloudflare.com/waf/rate-limiting-rules/)
+
+ ## Incident Response: Master Key Leak
+
+ 1. Rotate the worker key immediately:
+
+ ```bash
+ llm-router worker-key --env=production --generate-master-key=true
+ ```
+
+ 2. Rotate the local config key (if reused anywhere):
+
+ ```bash
+ llm-router config --operation=set-master-key --generate-master-key=true
+ ```
+
+ 3. Revoke exposed credentials and rotate provider API keys.
+ 4. Review Cloudflare logs/WAF events for the abuse window.
+ 5. Tighten the Access policy, IP allowlist, and rate limits before reopening traffic.
+
+ ## Router Runtime Hardening Knobs
+
+ - `LLM_ROUTER_MAX_REQUEST_BODY_BYTES`
+ - `LLM_ROUTER_UPSTREAM_TIMEOUT_MS`
+ - `LLM_ROUTER_ALLOWED_IPS` / `LLM_ROUTER_IP_ALLOWLIST`
+ - `LLM_ROUTER_CORS_ALLOWED_ORIGINS`
+ - `LLM_ROUTER_CORS_ALLOW_ALL` (keep `false` in production)
+
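A sketch of wiring these knobs into a local start; the variable names come from the list above, but every value (and the comma-separated value formats) is an illustrative assumption, not a documented default:

```bash
# Illustrative hardening values only; tune per workload
export LLM_ROUTER_MAX_REQUEST_BODY_BYTES=1048576   # ~1 MiB body cap
export LLM_ROUTER_UPSTREAM_TIMEOUT_MS=60000        # 60s upstream timeout
export LLM_ROUTER_ALLOWED_IPS="203.0.113.10,203.0.113.11"
export LLM_ROUTER_CORS_ALLOWED_ORIGINS="https://app.example.com"
export LLM_ROUTER_CORS_ALLOW_ALL=false
llm-router start --require-auth=true
```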
+ ## Official References
+
+ - [Workers Secrets](https://developers.cloudflare.com/workers/configuration/secrets/)
+ - [Wrangler configuration](https://developers.cloudflare.com/workers/wrangler/configuration/)
+ - [workers.dev routing controls](https://developers.cloudflare.com/workers/configuration/routing/workers-dev/)
+ - [Preview URLs](https://developers.cloudflare.com/changelog/2024-03-14-preview-urls/)
+ - [Cloudflare Access service tokens](https://developers.cloudflare.com/cloudflare-one/identity/service-tokens/)
+ - [WAF custom rules](https://developers.cloudflare.com/waf/custom-rules/)
+ - [WAF rate limiting](https://developers.cloudflare.com/waf/rate-limiting-rules/)
+ - [API Shield sequence mitigation](https://developers.cloudflare.com/api-shield/security/sequence-mitigation/)
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@khanglvm/llm-router",
-   "version": "1.0.5",
+   "version": "1.0.6",
    "description": "Single gateway endpoint for multi-provider LLMs with unified OpenAI+Anthropic format and seamless fallback",
    "type": "module",
    "main": "src/index.js",
@@ -18,9 +18,21 @@
      "test:provider-smoke": "node ./scripts/provider-smoke-suite.mjs"
    },
    "dependencies": {
-     "@levu/snap": "^0.3.8"
+     "@levu/snap": "^0.3.11"
    },
    "devDependencies": {
      "wrangler": "^4.68.1"
-   }
+   },
+   "publishConfig": {
+     "access": "public"
+   },
+   "files": [
+     "src/**/*.js",
+     "!src/**/*.test.js",
+     "!src/**/*.spec.js",
+     "README.md",
+     "SECURITY.md",
+     "CHANGELOG.md",
+     "wrangler.toml"
+   ]
  }
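One way to sanity-check the new `files` whitelist before publishing is npm's standard dry-run pack, which lists exactly what a publish would ship:

```bash
npm pack --dry-run
```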