claw-llm-router 1.0.0

package/LICENSE ADDED
MIT License

Copyright (c) 2025 Donn Felker

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
package/README.md ADDED
# Claw LLM Router

An [OpenClaw](https://openclaw.ai) plugin that cuts LLM costs by **40–80%** by classifying prompts and routing them to the cheapest capable model. Simple questions go to fast/cheap models (Gemini Flash at ~$0.15/1M tokens); complex tasks go to frontier models. All routing happens locally in <1ms.

## Why

LLM costs add up fast when every prompt hits a frontier model. Most prompts don't need one. "What is the capital of France?" doesn't need Claude Opus — Gemini Flash answers it for 100x less. The router makes this automatic: you interact with a single model (`claw-llm-router/auto`) and the classifier picks the right backend.

## How It Works

```mermaid
flowchart TD
  A[Incoming Request] --> B{Model ID?}
  B -->|simple, medium, complex, reasoning| C[Forced Tier]
  B -->|auto| D[Rule-Based Classifier]
  D --> F[Assigned Tier]
  F --> H[Load Tier Config]
  C --> H
  H --> I[Resolve Provider]
  I --> J{Provider Type}
  J -->|Google, OpenAI, Groq, etc.| K[Direct API Call]
  J -->|Anthropic + API Key| L[Direct Anthropic API]
  J -->|Anthropic + OAuth| M{Router is Primary?}
  M -->|No| N[Gateway Fallback]
  M -->|Yes| O[Gateway + Model Override]
  K --> P[Response]
  L --> P
  N --> P
  O --> P
```
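
The dispatch step in the diagram can be sketched in a few lines of TypeScript. This is an illustrative sketch, not the plugin's actual code: `pickTier` and the `FORCED` map are hypothetical names, and `classify` stands in for the rule-based classifier described in the Classification section.

```typescript
type Tier = "SIMPLE" | "MEDIUM" | "COMPLEX" | "REASONING";

// Forced-tier model IDs skip classification entirely.
const FORCED: Record<string, Tier> = {
  simple: "SIMPLE",
  medium: "MEDIUM",
  complex: "COMPLEX",
  reasoning: "REASONING",
};

// "auto" falls through to the classifier; anything in FORCED wins outright.
function pickTier(
  modelId: string,
  prompt: string,
  classify: (p: string) => Tier,
): Tier {
  return FORCED[modelId] ?? classify(prompt);
}
```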

### The Four Tiers

| Tier | Default Model | When It's Used |
| --- | --- | --- |
| **SIMPLE** | `google/gemini-2.5-flash` | Factual lookups, definitions, translations, greetings, yes/no, simple math |
| **MEDIUM** | `anthropic/claude-haiku-4-5-20251001` | Code snippets, explanations, summaries, moderate analysis |
| **COMPLEX** | `anthropic/claude-sonnet-4-6` | Multi-file code, architecture, long-form analysis, detailed technical work |
| **REASONING** | `anthropic/claude-opus-4-6` | Mathematical proofs, formal logic, multi-step derivations, deep chain-of-thought |

Every tier is configurable. Any OpenAI-compatible provider works, plus Anthropic's native Messages API.

### Classification

The classifier scores prompts across 15 weighted dimensions:

| Dimension | Weight | What It Detects |
| --- | --- | --- |
| Reasoning markers | 0.18 | "prove", "theorem", "derive", "step by step" |
| Code presence | 0.15 | `function`, `class`, `import`, backtick blocks |
| Technical terms | 0.13 | "algorithm", "kubernetes", "distributed" |
| Multi-step patterns | 0.10 | "first...then", "step 1", numbered lists |
| Token count | 0.08 | Short prompts pull toward SIMPLE, long toward COMPLEX |
| Agentic tasks | 0.06 | "read file", "edit", "deploy", "fix", "debug" |
| Imperative verbs | 0.05 | "build", "create", "implement", "design" |
| Creative markers | 0.04 | "story", "poem", "brainstorm", "write a" |
| Question complexity | 0.04 | Multiple question marks |
| Constraint indicators | 0.04 | "at most", "within", "budget", "maximum" |
| Output format | 0.03 | "json", "yaml", "table", "csv" |
| Simple indicators | 0.02 | "what is", "define", "who is", "capital of" |
| Reference complexity | 0.02 | "the code", "above", "the api" |
| Domain specificity | 0.02 | "quantum", "fpga", "genomics", "zero-knowledge" |
| Negation complexity | 0.01 | "don't", "avoid", "except", "exclude" |

Scores map to tiers via fixed boundaries: below 0.0 → SIMPLE, 0.0–0.3 → MEDIUM, 0.3–0.5 → COMPLEX, 0.5 and above → REASONING. The MEDIUM band is intentionally wide (0.30) so ambiguous prompts land confidently within it — no external LLM calls needed.
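
As a rough sketch of how such a signed, weighted scorer can work (only three of the fifteen dimensions are shown; the regexes and the sign mechanism are illustrative assumptions, not the actual rules in `classifier.ts`):

```typescript
type Tier = "SIMPLE" | "MEDIUM" | "COMPLEX" | "REASONING";

interface Dimension {
  weight: number;   // contribution, taken from the table above
  sign: 1 | -1;     // simple indicators pull the score down
  pattern: RegExp;  // what the dimension detects
}

const DIMENSIONS: Dimension[] = [
  // Reasoning markers (0.18)
  { weight: 0.18, sign: 1, pattern: /\b(prove|theorem|derive|step by step)\b/i },
  // Code presence (0.15)
  { weight: 0.15, sign: 1, pattern: /\b(function|class|import)\b|```/ },
  // Simple indicators (0.02) pull toward SIMPLE
  { weight: 0.02, sign: -1, pattern: /\b(what is|define|who is|capital of)\b/i },
];

function classify(prompt: string): Tier {
  let score = 0;
  for (const d of DIMENSIONS) {
    if (d.pattern.test(prompt)) score += d.sign * d.weight;
  }
  // Boundary mapping from the paragraph above.
  if (score < 0.0) return "SIMPLE";
  if (score < 0.3) return "MEDIUM";
  if (score < 0.5) return "COMPLEX";
  return "REASONING";
}

classify("What is the capital of France?"); // "SIMPLE"
```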

### Fallback Chain

If a provider call fails, the router tries the next tier up:

```
SIMPLE → MEDIUM → COMPLEX
MEDIUM → COMPLEX
COMPLEX → REASONING
REASONING → (no fallback)
```
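
The chain above amounts to a small escalation loop. The sketch below is illustrative; `routeWithFallback` and `callProvider` are hypothetical names, not the router's actual API.

```typescript
type Tier = "SIMPLE" | "MEDIUM" | "COMPLEX" | "REASONING";

// Each tier's fallback, per the chain above; REASONING has none.
const NEXT_TIER: Record<Tier, Tier | null> = {
  SIMPLE: "MEDIUM",
  MEDIUM: "COMPLEX",
  COMPLEX: "REASONING",
  REASONING: null,
};

// On failure, retry at the next tier up until the chain is exhausted.
async function routeWithFallback(
  tier: Tier,
  callProvider: (tier: Tier) => Promise<string>,
): Promise<string> {
  let current: Tier | null = tier;
  let lastError: unknown;
  while (current !== null) {
    try {
      return await callProvider(current);
    } catch (err) {
      lastError = err;
      current = NEXT_TIER[current];
    }
  }
  throw lastError;
}
```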

## Architecture

See [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) for provider strategy, resolution logic, and the OAuth model override mechanism.

## Quickstart

### 1. Install the plugin

```bash
openclaw plugins install claw-llm-router
```

Or install from a local directory during development:

```bash
openclaw plugins install -l ./claw-llm-router
```

### 2. Set up API keys

The router needs at least one provider API key. Set keys for the providers you want to use:

| Provider | Environment Variable | Tier Suggestion |
| --- | --- | --- |
| Google | `GEMINI_API_KEY` | SIMPLE |
| Anthropic | `ANTHROPIC_API_KEY` or OAuth via `/auth` | MEDIUM, COMPLEX, REASONING |
| OpenAI | `OPENAI_API_KEY` | MEDIUM, COMPLEX |
| Groq | `GROQ_API_KEY` | SIMPLE |
| xAI | `XAI_API_KEY` | MEDIUM |

You can also add credentials through OpenClaw's `/auth` command.

> **Tip:** At minimum, set `GEMINI_API_KEY` for the SIMPLE tier and authenticate Anthropic via `/auth` for the other tiers. This covers all four tiers.

### 3. Restart and verify

```bash
openclaw gateway restart
```

Then run the doctor to verify everything is working:

```
/router doctor
```

### 4. Set as primary model (recommended)

To route all prompts through the router automatically, set it as the primary model in `~/.openclaw/openclaw.json`:

```json
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "claw-llm-router/auto"
      }
    }
  }
}
```

Restart the gateway after this change.

### Configure Tiers

Use the `/router` command in any OpenClaw chat:

```
/router setup      # Show current config + suggestions
/router set SIMPLE google/gemini-2.5-flash
/router set MEDIUM anthropic/claude-haiku-4-5-20251001
/router set COMPLEX anthropic/claude-sonnet-4-6
/router set REASONING anthropic/claude-opus-4-6
/router doctor     # Diagnose config, auth, and proxy issues
```

### API Key Resolution

The router reads API keys from OpenClaw's existing auth stores (never stores its own). Priority order:

1. Environment variable (e.g., `GEMINI_API_KEY`, `ANTHROPIC_API_KEY`, `XAI_API_KEY`, `MOONSHOT_API_KEY`)
2. `~/.openclaw/agents/main/agent/auth-profiles.json`
3. `~/.openclaw/agents/main/agent/auth.json`
4. `~/.openclaw/openclaw.json` `env.vars` section
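
A sketch of that four-step lookup. The file paths come from the list above, but the JSON shapes (a provider-keyed object with an `apiKey` field) and the `resolveApiKey` helper are assumptions for illustration; the real loader lives in `tier-config.ts`.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Read a JSON file, returning undefined if it is missing or malformed.
function readJson(file: string): any {
  try {
    return JSON.parse(fs.readFileSync(file, "utf8"));
  } catch {
    return undefined;
  }
}

function resolveApiKey(envVar: string, provider: string): string | undefined {
  const home = os.homedir();
  const agentDir = path.join(home, ".openclaw", "agents", "main", "agent");
  return (
    // 1. Environment variable wins.
    process.env[envVar] ??
    // 2. auth-profiles.json (assumed shape: { [provider]: { apiKey } }).
    readJson(path.join(agentDir, "auth-profiles.json"))?.[provider]?.apiKey ??
    // 3. auth.json (same assumed shape).
    readJson(path.join(agentDir, "auth.json"))?.[provider]?.apiKey ??
    // 4. env.vars section of openclaw.json.
    readJson(path.join(home, ".openclaw", "openclaw.json"))?.env?.vars?.[envVar]
  );
}
```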

## Usage

### `/router` Command

The `/router` slash command manages the router from any OpenClaw chat. No gateway restart needed for tier changes — they take effect on the next request.

| Command | What It Does |
| --- | --- |
| `/router` | Show status: uptime, proxy health, current tier assignments |
| `/router help` | List all subcommands with examples |
| `/router setup` | Show current config with suggested models per tier |
| `/router set <TIER> <provider/model>` | Change a tier's model |
| `/router doctor` | Diagnose config, API keys, base URLs, and proxy health |

Common workflows:

```
# Check what's running
/router

# Something broken? Run the doctor
/router doctor

# Swap SIMPLE tier to a different model
/router set SIMPLE groq/llama-3.3-70b-versatile

# See all options and suggested models
/router setup
```

### Model Selection

Switch to the router model in any chat:

```
/model claw-llm-router/auto
```

Or force a specific tier (useful for testing or when you know the complexity):

```
/model claw-llm-router/simple     # Always use the cheapest model
/model claw-llm-router/complex    # Skip classification, go straight to capable
/model claw-llm-router/reasoning  # Force frontier reasoning model
```

### Via curl

The proxy runs on `http://127.0.0.1:8401` and speaks the OpenAI chat completions API:

```bash
# Auto-classify
curl -s http://127.0.0.1:8401/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "What is 2+2?"}],
    "max_tokens": 50
  }'

# Force a tier
curl -s http://127.0.0.1:8401/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "complex",
    "messages": [{"role": "user", "content": "Design a microservice architecture"}],
    "max_tokens": 500
  }'

# Streaming
curl -s http://127.0.0.1:8401/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'
```

### Endpoints

| Endpoint | Method | Description |
| --- | --- | --- |
| `/v1/chat/completions` | POST | Chat completions (OpenAI-compatible) |
| `/v1/models` | GET | List available models |
| `/health` | GET | Health check |

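The chat-completions endpoint can also be called from TypeScript with Node's built-in `fetch` (Node 18+). The `ask` helper below is hypothetical, equivalent to the curl calls above, and assumes the standard OpenAI-compatible response shape.

```typescript
const BASE = "http://127.0.0.1:8401";

// Send one user message and return the model's reply text.
async function ask(model: string, content: string): Promise<string> {
  const res = await fetch(`${BASE}/v1/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model, // "auto", or a forced tier like "complex"
      messages: [{ role: "user", content }],
      max_tokens: 50,
    }),
  });
  if (!res.ok) throw new Error(`proxy returned ${res.status}`);
  // OpenAI-compatible response shape.
  const data = (await res.json()) as {
    choices: { message: { content: string } }[];
  };
  return data.choices[0].message.content;
}
```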

### Virtual Model IDs

| Model ID | Behavior |
| --- | --- |
| `auto` | Classify and route automatically |
| `simple` | Force SIMPLE tier |
| `medium` | Force MEDIUM tier |
| `complex` | Force COMPLEX tier |
| `reasoning` | Force REASONING tier |

## Troubleshooting

Run the built-in doctor to check your setup:

```
/router doctor
```

It verifies:

- Config file exists and all 4 tiers are configured
- Each tier has a valid `provider/model-id` format, resolvable base URL, and available API key
- Proxy is running and healthy on port 8401
- Whether the router is set as the primary model

Any issues are flagged with fix instructions (e.g., which env var to set, how to add a custom provider).

## Project Structure

```
claw-llm-router/
├── index.ts               # Plugin entry point, OpenClaw registration
├── proxy.ts               # HTTP proxy server, request routing
├── classifier.ts          # Rule-based prompt classifier (15 dimensions)
├── tier-config.ts         # Tier-to-model config, API key loading
├── models.ts              # Model definitions, provider constants
├── provider.ts            # OpenClaw provider plugin definition
├── router-config.json     # Tier configuration (auto-generated)
├── router-logger.ts       # RouterLogger class — centralized log formatting
├── openclaw.plugin.json   # Plugin manifest
├── docs/
│   ├── ARCHITECTURE.md    # Provider strategy, OAuth override mechanism
│   ├── CLASSIFIER.md      # Classifier dimensions, weights, extraction
│   └── PROVIDERS.md       # Step-by-step guide for adding providers
├── providers/
│   ├── types.ts               # LLMProvider interface, shared types
│   ├── openai-compatible.ts   # Google, OpenAI, Groq, Mistral, etc.
│   ├── anthropic.ts           # Anthropic Messages API (direct key)
│   ├── gateway.ts             # OpenClaw gateway fallback (OAuth)
│   ├── model-override.ts      # In-process override store for recursion prevention
│   └── index.ts               # Provider registry + resolution
└── tests/
    ├── classifier.test.ts
    ├── proxy.test.ts
    ├── tier-config.test.ts
    └── providers/
        ├── anthropic.test.ts
        ├── gateway.test.ts
        ├── openai-compatible.test.ts
        ├── model-override.test.ts
        └── registry.test.ts
```

## Testing

Tests use Node.js's built-in test runner (`node:test`). No external test dependencies.

```bash
# Run all tests
npx tsx --test tests/providers/*.test.ts tests/classifier.test.ts tests/proxy.test.ts tests/tier-config.test.ts

# Provider tests only
npx tsx --test tests/providers/*.test.ts

# Classifier tests only
npx tsx --test tests/classifier.test.ts
```

## Adding a New Provider

See [docs/PROVIDERS.md](docs/PROVIDERS.md) for a step-by-step guide to implementing a new provider.

## License

MIT