claw-llm-router 1.0.0

@@ -0,0 +1,146 @@
1
+ # Prompt Classifier
2
+
3
+ The classifier determines which tier (SIMPLE, MEDIUM, COMPLEX, REASONING) a user prompt belongs to, so the router can pick the most cost-effective model.
4
+
5
+ ## Architecture
6
+
7
+ ```
8
+ User prompt
9
+
10
+
11
+ ┌──────────────────┐
12
+ │ Rule-based │
13
+ │ (15 dimensions) │
14
+ └──────────────────┘
15
+
16
+
17
+ Tier assigned
18
+ ```
19
+
20
+ The classifier is 100% local — 15-dimension weighted scoring that runs in <1ms with no API calls. Ambiguous prompts (near tier boundaries) default to the rule-based result rather than calling an external LLM, because the MEDIUM tier is cheap enough to be a safe default.
21
+
22
+ ### Why no LLM fallback?
23
+
24
+ An earlier version used a hybrid approach: when rule-based confidence was below 0.70, it called a cheap LLM to verify. This was removed because:
25
+
26
+ 1. **Net cost increase**: The LLM classifier correctly downgraded ~35% of ambiguous prompts to SIMPLE (saving ~$0.004 each), but upgraded ~25% to COMPLEX/REASONING (costing ~$0.03-0.04 each). The upgrades dominated, making the classifier a net cost of $3.50-$74/month depending on traffic.
27
+ 2. **Latency**: Added 100-500ms per LLM call on ~33% of messages.
28
+ 3. **ClawRouter precedent**: ClawRouter uses 100% local classification with no LLM fallback and reports 70-80% cost savings. The savings come from the non-ambiguous prompts (both approaches classify these identically).
29
+ 4. **MEDIUM is a safe default**: Cheap enough to not waste money, capable enough to handle most tasks.
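The net-cost argument in point 1 can be checked with quick arithmetic. The sketch below uses the rates and per-prompt figures quoted in the list; the $0.035 midpoint for upgrades and the function name are assumptions made for illustration, and the price of the verification call itself is left out.

```typescript
// Expected cost change per ambiguous prompt when an LLM verifier is added.
// Rates and dollar figures come from the list above; UPGRADE_COST is an
// assumed midpoint of the quoted $0.03-0.04 range.
const DOWNGRADE_RATE = 0.35; // share correctly moved down to SIMPLE
const DOWNGRADE_SAVING = 0.004; // dollars saved per downgrade
const UPGRADE_RATE = 0.25; // share moved up to COMPLEX/REASONING
const UPGRADE_COST = 0.035; // dollars added per upgrade (assumption)

function netCostPerAmbiguousPrompt(): number {
  const saved = DOWNGRADE_RATE * DOWNGRADE_SAVING;
  const added = UPGRADE_RATE * UPGRADE_COST;
  return added - saved; // positive means the verifier costs money overall
}

console.log(netCostPerAmbiguousPrompt().toFixed(5));
```

Under these assumptions each ambiguous prompt costs roughly $0.007 more with the verifier than without it, which is why the upgrades dominate.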
30
+
31
+ ## Rule-Based Classifier
32
+
33
+ ### 15 Scoring Dimensions
34
+
35
+ Each dimension scores the prompt on a scale (typically -1.0 to 1.0). The weighted sum determines the tier.
36
+
37
+ | # | Dimension | Weight | What it detects |
38
+ | --- | --------------------- | ------ | ------------------------------------------------------------------------------ |
39
+ | 1 | `tokenCount` | 0.08 | Prompt length (<50 tokens = -1.0, >500 = 1.0) |
40
+ | 2 | `codePresence` | 0.14 | Code keywords: `function`, `class`, `import`, ` ``` `, etc. |
41
+ | 3 | `reasoningMarkers` | 0.17 | `prove`, `theorem`, `step by step`, `chain of thought`, etc. |
42
+ | 4 | `technicalTerms` | 0.09 | `algorithm`, `kubernetes`, `distributed`, `architecture`, etc. |
43
+ | 5 | `creativeMarkers` | 0.05 | `story`, `poem`, `brainstorm`, `write a`, etc. |
44
+ | 6 | `simpleIndicators` | 0.11 | `what is`, `define`, `hello`, `capital of` → scores -1.0 (pulls toward SIMPLE) |
45
+ | 7 | `multiStepPatterns` | 0.11 | Regex: `first.*then`, `step \d`, numbered lists |
46
+ | 8 | `questionComplexity` | 0.04 | 4+ question marks in the prompt |
47
+ | 9 | `imperativeVerbs` | 0.03 | `build`, `create`, `implement`, `deploy`, etc. |
48
+ | 10 | `constraintCount` | 0.04 | `at most`, `within`, `maximum`, `budget`, etc. |
49
+ | 11 | `outputFormat` | 0.03 | `json`, `yaml`, `table`, `format as`, etc. |
50
+ | 12 | `referenceComplexity` | 0.02 | `the docs`, `the api`, `attached`, `above`, etc. |
51
+ | 13 | `negationComplexity` | 0.01 | `don't`, `avoid`, `without`, `except`, etc. |
52
+ | 14 | `domainSpecificity` | 0.02 | `quantum`, `fpga`, `genomics`, `zero-knowledge`, etc. |
53
+ | 15 | `agenticTask` | 0.06 | `read file`, `edit`, `deploy`, `fix`, `debug`, `step 1`, etc. |
54
+
55
+ Weights sum to 1.0 and are aligned with [ClawRouter](https://github.com/claw-project/claw-router)'s 14-dimension scheme, scaled to accommodate our 15th dimension (`agenticTask`).
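As a minimal sketch of how the weighted sum could be computed, the block below copies the weights from the table. The helper name `weightedScore` and the choice to treat missing dimensions as 0 are assumptions, not the package's actual code.

```typescript
// Weights copied from the table above; they sum to 1.0.
const WEIGHTS: Record<string, number> = {
  tokenCount: 0.08, codePresence: 0.14, reasoningMarkers: 0.17,
  technicalTerms: 0.09, creativeMarkers: 0.05, simpleIndicators: 0.11,
  multiStepPatterns: 0.11, questionComplexity: 0.04, imperativeVerbs: 0.03,
  constraintCount: 0.04, outputFormat: 0.03, referenceComplexity: 0.02,
  negationComplexity: 0.01, domainSpecificity: 0.02, agenticTask: 0.06,
};

// Hypothetical helper: combine per-dimension scores into one weighted sum.
// Dimensions missing from `scores` are treated as 0 (an assumption).
function weightedScore(scores: Partial<Record<string, number>>): number {
  let total = 0;
  for (const [name, weight] of Object.entries(WEIGHTS)) {
    total += weight * (scores[name] ?? 0);
  }
  return total;
}

// A short prompt that also matches a simple indicator: both dimensions
// score -1.0, so the sum is negative and lands in the SIMPLE range.
const example = weightedScore({ tokenCount: -1.0, simpleIndicators: -1.0 });
console.log(example.toFixed(2));
```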
56
+
57
+ ### Tier Boundaries
58
+
59
+ The weighted sum maps to a tier via fixed boundaries:
60
+
61
+ | Score range | Tier | Band width |
62
+ | ----------- | --------- | ---------- |
63
+ | < 0.00 | SIMPLE | — |
64
+ | 0.00 – 0.30 | MEDIUM | 0.30 |
65
+ | 0.30 – 0.50 | COMPLEX | 0.20 |
66
+ | >= 0.50 | REASONING | — |
67
+
68
+ These boundaries match [ClawRouter](https://github.com/claw-project/claw-router)'s production-proven values. The MEDIUM band is intentionally wide (0.30) so that ambiguous prompts — which tend to cluster around boundaries — land confidently within MEDIUM rather than triggering expensive misrouting. With steepness=12.0, a score at the center of MEDIUM (0.15) has distance 0.15 to the nearest boundary, yielding confidence ~0.86.
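The boundary table can be read as a simple threshold function. This is a sketch only: the name `tierForScore` is hypothetical, and treating each lower bound as inclusive is an assumption (the table makes that explicit only for REASONING).

```typescript
type Tier = "SIMPLE" | "MEDIUM" | "COMPLEX" | "REASONING";

// Map a weighted sum to a tier using the fixed boundaries 0.00, 0.30, 0.50.
// Lower bounds are treated as inclusive (assumption).
function tierForScore(score: number): Tier {
  if (score < 0.0) return "SIMPLE";
  if (score < 0.3) return "MEDIUM";
  if (score < 0.5) return "COMPLEX";
  return "REASONING";
}

console.log(tierForScore(0.15)); // center of the MEDIUM band
```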
69
+
70
+ ### Special Overrides
71
+
72
+ These override the score-based mapping regardless of weighted sum:
73
+
74
+ | Condition | Forced tier | Min confidence |
75
+ | ---------------------------------------------------------------------------------------- | ----------- | -------------- |
76
+ | >100k estimated tokens | COMPLEX | 0.95 |
77
+ | 2+ reasoning keywords | REASONING | 0.85 |
78
+ | 4+ complexity signals (technical + imperative + agentic) AND (multi-step OR long prompt) | COMPLEX | 0.85 |
79
+
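One way to picture the override table is as a check that runs before the score-based mapping. Everything named here is hypothetical: the input shape, the function name, and the evaluation order (table order) are assumptions, and the threshold for a "long prompt" is not stated in this document, so it is passed in as a precomputed flag.

```typescript
type Tier = "SIMPLE" | "MEDIUM" | "COMPLEX" | "REASONING";

interface OverrideInput {
  estimatedTokens: number;
  reasoningKeywordCount: number;
  complexitySignalCount: number; // technical + imperative + agentic matches
  hasMultiStep: boolean;
  isLongPrompt: boolean; // threshold not given in the doc; computed elsewhere
}

interface Override {
  tier: Tier;
  minConfidence: number;
}

// Returns the forced tier, or null when no override applies and the
// weighted sum should decide. Rows are checked in table order (assumption).
function checkOverrides(input: OverrideInput): Override | null {
  if (input.estimatedTokens > 100_000) {
    return { tier: "COMPLEX", minConfidence: 0.95 };
  }
  if (input.reasoningKeywordCount >= 2) {
    return { tier: "REASONING", minConfidence: 0.85 };
  }
  if (input.complexitySignalCount >= 4 && (input.hasMultiStep || input.isLongPrompt)) {
    return { tier: "COMPLEX", minConfidence: 0.85 };
  }
  return null;
}
```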
80
+ ### Confidence Calculation
81
+
82
+ Confidence measures how far the score is from the nearest tier boundary, using a sigmoid function:
83
+
84
+ ```
85
+ confidence = sigmoid(distance_to_nearest_boundary)
86
+ sigmoid(x) = 1 / (1 + exp(-12.0 * x))
87
+ ```
88
+
89
+ Higher confidence means the score is well within a tier's range. Lower confidence means it's near a boundary. Either way, the rule-based tier is used directly.
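The formula above can be sketched in a few lines. The steepness of 12.0 and the boundaries 0.00, 0.30, and 0.50 come from this document; the function names are assumptions.

```typescript
const STEEPNESS = 12.0;
const BOUNDARIES = [0.0, 0.3, 0.5]; // tier boundaries from the table

function sigmoid(x: number): number {
  return 1 / (1 + Math.exp(-STEEPNESS * x));
}

// Confidence grows with the distance from the score to the nearest boundary.
function confidence(score: number): number {
  const distance = Math.min(...BOUNDARIES.map((b) => Math.abs(score - b)));
  return sigmoid(distance);
}

console.log(confidence(0.15).toFixed(2)); // center of MEDIUM
```

A score that sits exactly on a boundary has distance 0 and therefore confidence 0.5, the lowest value this formula can produce.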
90
+
91
+ ### Signals
92
+
93
+ The classifier returns human-readable signal strings that explain why it chose a tier. Examples:
94
+
95
+ - `short (3 tokens)` — prompt is very short
96
+ - `simple (what is)` — matched a simple-indicator keyword
97
+ - `code (function, class)` — matched code keywords
98
+ - `reasoning (step by step, prove)` — matched reasoning markers
99
+
100
+ These signals appear in the router logs for debugging.
101
+
102
+ ## Prompt Extraction
103
+
104
+ Before classification, the proxy extracts the actual user text from potentially wrapped messages. This prevents system prompt keywords from polluting the classification.
105
+
106
+ ### Three extraction cases
107
+
108
+ 1. **Packed context** — OpenClaw group chats/subagents wrap history + current message:
109
+
110
+ ```
111
+ [Chat messages since your last reply - for context]
112
+ user: earlier message
113
+ assistant: earlier reply
114
+ [Current message - respond to this]
115
+ What is 2+2?
116
+ ```
117
+
118
+ The classifier only sees `What is 2+2?`.
119
+
120
+ 2. **Embedded system prompt** — Some OpenClaw paths (webchat) prepend the system prompt to the user message instead of sending it as a separate system-role message. If the system prompt text is found inside the user message, it's stripped before classification.
121
+
122
+ 3. **Long message without system role** — If there's no separate system message and the user message is >500 chars, the system prompt is likely embedded. The classifier takes the text after the last `\n\n` break (if it's <500 chars) as the actual user input.
123
+
124
+ These extraction steps are critical — without them, system prompt keywords like `json`, `function`, or `code` cause misclassification (e.g., "3+1" classified as MEDIUM instead of SIMPLE because the system prompt mentioned JSON formatting).
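The three cases can be sketched as a single function. This is an illustration under assumptions, not the proxy's actual code: the function name and signature are hypothetical, the marker string is taken from the packed-context example, and the 500-character limits follow case 3 as written.

```typescript
const CURRENT_MARKER = "[Current message - respond to this]";

// Hypothetical helper: return the text the classifier should see.
// `systemPrompt` is the separate system-role message, if one was sent.
function extractUserPrompt(userText: string, systemPrompt?: string): string {
  // Case 1: packed context. Keep only what follows the current-message marker.
  const markerAt = userText.lastIndexOf(CURRENT_MARKER);
  if (markerAt !== -1) {
    return userText.slice(markerAt + CURRENT_MARKER.length).trim();
  }
  // Case 2: the system prompt text is embedded in the user message. Strip it.
  if (systemPrompt && userText.includes(systemPrompt)) {
    return userText.replace(systemPrompt, "").trim();
  }
  // Case 3: long message, no system role. Take the short tail after the
  // last blank-line break.
  if (!systemPrompt && userText.length > 500) {
    const breakAt = userText.lastIndexOf("\n\n");
    if (breakAt !== -1) {
      const tail = userText.slice(breakAt + 2).trim();
      if (tail.length > 0 && tail.length < 500) return tail;
    }
  }
  return userText.trim();
}
```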
125
+
126
+ ## Forced Tier Override
127
+
128
+ Users can bypass the classifier entirely by using tier-specific model IDs:
129
+
130
+ | Model ID | Tier |
131
+ | ------------------------------------------ | --------- |
132
+ | `simple` or `claw-llm-router/simple` | SIMPLE |
133
+ | `medium` or `claw-llm-router/medium` | MEDIUM |
134
+ | `complex` or `claw-llm-router/complex` | COMPLEX |
135
+ | `reasoning` or `claw-llm-router/reasoning` | REASONING |
136
+
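A lookup for these model IDs might look like the following sketch. The function name and the exact-match, lowercase-only behavior are assumptions.

```typescript
type Tier = "SIMPLE" | "MEDIUM" | "COMPLEX" | "REASONING";

const PREFIX = "claw-llm-router/";
const FORCED_TIERS = new Map<string, Tier>([
  ["simple", "SIMPLE"],
  ["medium", "MEDIUM"],
  ["complex", "COMPLEX"],
  ["reasoning", "REASONING"],
]);

// Returns the forced tier for a tier-specific model ID, or null when the
// model ID is anything else and the classifier should run as usual.
function forcedTier(modelId: string): Tier | null {
  const bare = modelId.startsWith(PREFIX) ? modelId.slice(PREFIX.length) : modelId;
  return FORCED_TIERS.get(bare) ?? null;
}

console.log(forcedTier("claw-llm-router/complex"));
```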
137
+ ## Fallback Chain
138
+
139
+ If a provider fails, the router retries with the next tier up, following the chain for the starting tier:
140
+
141
+ | Starting tier | Fallback chain |
142
+ | ------------- | ------------------------- |
143
+ | SIMPLE | SIMPLE → MEDIUM → COMPLEX |
144
+ | MEDIUM | MEDIUM → COMPLEX |
145
+ | COMPLEX | COMPLEX → REASONING |
146
+ | REASONING | REASONING (no fallback) |
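The chain table can be expressed as data plus a small retry loop. This is a sketch: `callWithFallback` and its callback signature are hypothetical, and it relies only on the behavior described in this document, namely that a failing provider throws and the next tier is tried.

```typescript
type Tier = "SIMPLE" | "MEDIUM" | "COMPLEX" | "REASONING";

// Chains copied from the table above.
const FALLBACK_CHAIN: Record<Tier, Tier[]> = {
  SIMPLE: ["SIMPLE", "MEDIUM", "COMPLEX"],
  MEDIUM: ["MEDIUM", "COMPLEX"],
  COMPLEX: ["COMPLEX", "REASONING"],
  REASONING: ["REASONING"],
};

// Hypothetical driver: try each tier in order, returning the first success.
// If every tier fails, the last error is rethrown.
async function callWithFallback<T>(
  start: Tier,
  call: (tier: Tier) => Promise<T>,
): Promise<T> {
  let lastError: unknown = new Error("no tiers attempted");
  for (const tier of FALLBACK_CHAIN[start]) {
    try {
      return await call(tier);
    } catch (err) {
      lastError = err; // provider threw: move on to the next tier
    }
  }
  throw lastError;
}
```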
@@ -0,0 +1,228 @@
1
+ # Adding a New Provider
2
+
3
+ This guide explains how to add a new LLM provider to the claw-llm-router.
4
+
5
+ ## Overview
6
+
7
+ The router uses a **Strategy pattern** — each provider implements the `LLMProvider` interface and handles its own request/response format. The provider registry (`providers/index.ts`) picks the right provider based on the model spec.
8
+
9
+ ```mermaid
10
+ flowchart TD
11
+ CALL[callProvider] --> RESOLVE[resolveProvider]
12
+ RESOLVE --> OAUTH{spec.isOAuth?}
13
+ OAUTH -->|Yes| PRIMARY{Router is primary?}
14
+ PRIMARY -->|No| GW[GatewayProvider]
15
+ PRIMARY -->|Yes| GWO[gateway-with-override]
16
+ OAUTH -->|No| CHECK{spec.isAnthropic?}
17
+ CHECK -->|Yes| ANT[AnthropicProvider]
18
+ CHECK -->|No| OAI[OpenAICompatibleProvider]
19
+ ```
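The decision flow in the diagram can be sketched as a pure function. The flag names `isOAuth` and `isAnthropic` appear in the diagram; the function name, the `routerIsPrimary` parameter, and returning provider names as strings are assumptions for illustration.

```typescript
interface SpecFlags {
  isOAuth: boolean;
  isAnthropic: boolean;
}

type ProviderChoice =
  | "GatewayProvider"
  | "gateway-with-override"
  | "AnthropicProvider"
  | "OpenAICompatibleProvider";

// Mirrors the flowchart: OAuth specs go through the gateway (with an
// override when the router is the primary model); everything else is
// split between the Anthropic and OpenAI-compatible providers.
function chooseProvider(spec: SpecFlags, routerIsPrimary: boolean): ProviderChoice {
  if (spec.isOAuth) {
    return routerIsPrimary ? "gateway-with-override" : "GatewayProvider";
  }
  return spec.isAnthropic ? "AnthropicProvider" : "OpenAICompatibleProvider";
}
```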
20
+
21
+ ## The `LLMProvider` Interface
22
+
23
+ Every provider implements this contract from `providers/types.ts`:
24
+
25
+ ```typescript
26
+ interface LLMProvider {
27
+ readonly name: string;
28
+ chatCompletion(
29
+ body: Record<string, unknown>, // Original OpenAI-format request
30
+ spec: { modelId: string; apiKey: string; baseUrl: string },
31
+ stream: boolean,
32
+ res: ServerResponse,
33
+ log: PluginLogger,
34
+ ): Promise<void>;
35
+ }
36
+ ```
37
+
38
+ ### Parameters
39
+
40
+ - `body` — The original request body in OpenAI chat completions format
41
+ - `spec` — Provider details (model ID, API key, base URL) resolved from tier config
42
+ - `stream` — Whether the client requested streaming SSE
43
+ - `res` — Node.js `ServerResponse` to write the response to
44
+ - `log` — Logger with `info()`, `warn()`, `error()` methods
45
+
46
+ ### Contract
47
+
48
+ - **Non-streaming**: Write a complete JSON response to `res` with `Content-Type: application/json`
49
+ - **Streaming**: Write SSE events to `res` with `Content-Type: text/event-stream`, ending with `data: [DONE]\n\n`
50
+ - **Errors**: Throw an `Error` — the proxy's fallback chain will catch it and try the next tier
51
+ - **Response format**: Must be OpenAI chat completions format (the proxy expects it)
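The two response modes in the contract can be sketched as a pair of helpers. To keep the example self-contained, `ResponseLike` is a hypothetical stand-in covering only the `writeHead`, `write`, and `end` methods of Node's `ServerResponse`; the helper names are also assumptions.

```typescript
// Minimal stand-in for the parts of ServerResponse used below (assumption).
interface ResponseLike {
  writeHead(status: number, headers: Record<string, string>): unknown;
  write(chunk: string): unknown;
  end(chunk?: string): unknown;
}

// Non-streaming: one complete JSON body in OpenAI chat completions format.
function writeJson(res: ResponseLike, completion: unknown): void {
  res.writeHead(200, { "Content-Type": "application/json" });
  res.end(JSON.stringify(completion));
}

// Streaming: one SSE event per chunk, terminated by the [DONE] sentinel.
function writeSse(res: ResponseLike, chunks: unknown[]): void {
  res.writeHead(200, { "Content-Type": "text/event-stream" });
  for (const chunk of chunks) {
    res.write(`data: ${JSON.stringify(chunk)}\n\n`);
  }
  res.write("data: [DONE]\n\n");
  res.end();
}
```

Errors are not handled in these helpers on purpose: per the contract, a provider throws and lets the proxy's fallback chain decide what to do next.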
52
+
53
+ ## Step-by-Step Guide
54
+
55
+ ### 1. Create `providers/my-provider.ts`
56
+
57
+ ```typescript
58
+ import type { ServerResponse } from "node:http";
59
+ import type { LLMProvider, PluginLogger } from "./types.js";
60
+
61
+ export class MyProvider implements LLMProvider {
62
+ readonly name = "my-provider";
63
+
64
+ async chatCompletion(
65
+ body: Record<string, unknown>,
66
+ spec: { modelId: string; apiKey: string; baseUrl: string },
67
+ stream: boolean,
68
+ res: ServerResponse,
69
+ log: PluginLogger,
70
+ ): Promise<void> {
71
+ // 1. Convert request if needed (OpenAI format → provider format)
72
+ // 2. Make the API call with fetch()
73
+ // 3. Convert response back to OpenAI format if needed
74
+ // 4. Write to res (JSON for non-streaming, SSE for streaming)
75
+ }
76
+ }
77
+ ```
78
+
79
+ ### 2. Add to `providers/index.ts`
80
+
81
+ ```typescript
82
+ import { MyProvider } from "./my-provider.js";
83
+
84
+ const myProvider = new MyProvider();
85
+
86
+ export function resolveProvider(spec: TierModelSpec): LLMProvider {
87
+ if (spec.provider === "my-provider") {
88
+ return myProvider;
89
+ }
90
+ // ... existing logic
91
+ }
92
+ ```
93
+
94
+ ### 3. Add well-known base URL to `tier-config.ts`
95
+
96
+ In the `WELL_KNOWN_BASE_URLS` map:
97
+
98
+ ```typescript
99
+ const WELL_KNOWN_BASE_URLS: Record<string, string> = {
100
+ // ... existing entries
101
+ "my-provider": "https://api.my-provider.com/v1",
102
+ };
103
+ ```
104
+
105
+ ### 4. Add env var mapping (if non-standard)
106
+
107
+ If the API key env var isn't `MY_PROVIDER_API_KEY`, add an entry to `ENV_VAR_OVERRIDES`:
108
+
109
+ ```typescript
110
+ const ENV_VAR_OVERRIDES: Record<string, string> = {
111
+ google: "GEMINI_API_KEY",
112
+ "my-provider": "MY_CUSTOM_KEY_VAR",
113
+ };
114
+ ```
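The step above implies a default naming rule with per-provider overrides. The sketch below shows one plausible form of that rule; the derivation (uppercase the provider name, replace non-alphanumerics with `_`, append `_API_KEY`) is an assumption inferred from the `my-provider` example, and `apiKeyEnvVar` is a hypothetical name.

```typescript
// Overrides for providers whose env var does not follow the default pattern.
const ENV_VAR_OVERRIDES: Record<string, string> = {
  google: "GEMINI_API_KEY",
};

// Hypothetical helper: resolve the env var name that holds a provider's key.
function apiKeyEnvVar(provider: string): string {
  if (Object.prototype.hasOwnProperty.call(ENV_VAR_OVERRIDES, provider)) {
    return ENV_VAR_OVERRIDES[provider];
  }
  // Assumed default: "my-provider" becomes "MY_PROVIDER_API_KEY".
  return `${provider.toUpperCase().replace(/[^A-Z0-9]/g, "_")}_API_KEY`;
}

console.log(apiKeyEnvVar("my-provider"));
```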
115
+
116
+ ### 5. Test
117
+
118
+ ```bash
119
+ # Configure a tier to use your provider
120
+ /router set SIMPLE my-provider/model-id
121
+
122
+ # Test with curl
123
+ curl -s http://127.0.0.1:8401/v1/chat/completions \
124
+ -H "Content-Type: application/json" \
125
+ -d '{"model":"simple","messages":[{"role":"user","content":"hello"}],"max_tokens":50}'
126
+ ```
127
+
128
+ ### 6. Write tests
129
+
130
+ Create `tests/providers/my-provider.test.ts` — see existing tests for patterns (mock `fetch()`, test request/response conversion).
131
+
132
+ ## Request/Response Format
133
+
134
+ ### Input (OpenAI Chat Completions)
135
+
136
+ ```json
137
+ {
138
+ "model": "my-model",
139
+ "messages": [
140
+ { "role": "system", "content": "You are helpful." },
141
+ { "role": "user", "content": "Hello" }
142
+ ],
143
+ "max_tokens": 100,
144
+ "temperature": 0.7,
145
+ "stream": false
146
+ }
147
+ ```
148
+
149
+ ### Output (Non-Streaming)
150
+
151
+ ```json
152
+ {
153
+ "id": "chatcmpl-...",
154
+ "object": "chat.completion",
155
+ "created": 1234567890,
156
+ "model": "my-model",
157
+ "choices": [
158
+ {
159
+ "index": 0,
160
+ "message": { "role": "assistant", "content": "Hi!" },
161
+ "finish_reason": "stop"
162
+ }
163
+ ],
164
+ "usage": {
165
+ "prompt_tokens": 10,
166
+ "completion_tokens": 5,
167
+ "total_tokens": 15
168
+ }
169
+ }
170
+ ```
171
+
172
+ ### Output (Streaming SSE)
173
+
174
+ ```
175
+ data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1234567890,"model":"my-model","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}
176
+
177
+ data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1234567890,"model":"my-model","choices":[{"index":0,"delta":{"content":"Hi"},"finish_reason":null}]}
178
+
179
+ data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1234567890,"model":"my-model","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
180
+
181
+ data: [DONE]
182
+ ```
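For a provider that receives a complete reply and has to re-emit it as a stream, the three-chunk sequence above can be built from a single string. This is a sketch: `toSseFrames` is a hypothetical helper, and sending the whole text in one content chunk is a simplification.

```typescript
// Build the SSE frames shown above: role chunk, content chunk, stop chunk,
// then the [DONE] sentinel. Each frame ends with a blank line.
function toSseFrames(text: string, id: string, model: string, created: number): string[] {
  const base = { id, object: "chat.completion.chunk", created, model };
  const frame = (delta: Record<string, string>, finishReason: string | null): string =>
    `data: ${JSON.stringify({
      ...base,
      choices: [{ index: 0, delta, finish_reason: finishReason }],
    })}\n\n`;
  return [
    frame({ role: "assistant" }, null),
    frame({ content: text }, null),
    frame({}, "stop"),
    "data: [DONE]\n\n",
  ];
}

for (const f of toSseFrames("Hi", "chatcmpl-example", "my-model", 1234567890)) {
  console.log(f.trimEnd());
}
```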
183
+
184
+ ## Request Body Sanitization
185
+
186
+ The `OpenAICompatibleProvider` strips non-standard fields from the request body before forwarding. Fields like `store` and `metadata` (added by OpenClaw internally) cause 400 errors on providers like Google Gemini. Only standard OpenAI chat completion parameters are forwarded:
187
+
188
+ `messages`, `model`, `stream`, `max_tokens`, `max_completion_tokens`, `temperature`, `top_p`, `n`, `stop`, `presence_penalty`, `frequency_penalty`, `logit_bias`, `logprobs`, `top_logprobs`, `response_format`, `seed`, `tools`, `tool_choice`, `parallel_tool_calls`, `user`, `stream_options`, `service_tier`
189
+
190
+ If your provider accepts additional parameters, either add them to the allowlist in `openai-compatible.ts` or handle them in your custom provider.
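An allowlist filter of this kind can be sketched as follows. The field list is copied from above; the names `ALLOWED_FIELDS` and `sanitizeBody` are assumptions and may not match what `openai-compatible.ts` actually uses.

```typescript
// Standard OpenAI chat completion parameters, as listed above.
const ALLOWED_FIELDS = new Set<string>([
  "messages", "model", "stream", "max_tokens", "max_completion_tokens",
  "temperature", "top_p", "n", "stop", "presence_penalty",
  "frequency_penalty", "logit_bias", "logprobs", "top_logprobs",
  "response_format", "seed", "tools", "tool_choice", "parallel_tool_calls",
  "user", "stream_options", "service_tier",
]);

// Copy only allowlisted fields; everything else (e.g. `store`, `metadata`)
// is dropped before the request is forwarded.
function sanitizeBody(body: Record<string, unknown>): Record<string, unknown> {
  const clean: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(body)) {
    if (ALLOWED_FIELDS.has(key)) clean[key] = value;
  }
  return clean;
}

console.log(Object.keys(sanitizeBody({ model: "m", store: true, metadata: {} })));
```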
191
+
192
+ ## Auth
193
+
194
+ API keys are resolved by `tier-config.ts` in this priority order:
195
+
196
+ 1. Environment variable (e.g., `MY_PROVIDER_API_KEY`)
197
+ 2. `auth-profiles.json` (OpenClaw's canonical credential store)
198
+ 3. `auth.json` (runtime cache)
199
+ 4. `openclaw.json` `env.vars` section
200
+
201
+ The key is passed to your provider via `spec.apiKey`. Your provider should use it in the appropriate header (e.g., `Authorization: Bearer {apiKey}` or `x-api-key: {apiKey}`).
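The priority order and the two header styles can be sketched together. The `KeySources` shape and both function names are hypothetical, and skipping empty strings is an assumption; the real resolution logic lives in `tier-config.ts`.

```typescript
// One optional candidate per source, in the priority order listed above.
interface KeySources {
  envVar?: string; // 1. environment variable
  authProfiles?: string; // 2. auth-profiles.json
  authJson?: string; // 3. auth.json
  openclawEnvVars?: string; // 4. openclaw.json env.vars
}

// Return the first non-empty key, or undefined when no source has one.
function resolveApiKey(sources: KeySources): string | undefined {
  const ordered = [
    sources.envVar,
    sources.authProfiles,
    sources.authJson,
    sources.openclawEnvVars,
  ];
  return ordered.find((value) => value !== undefined && value !== "");
}

// Build the auth header in either of the two styles mentioned above.
function authHeaders(style: "bearer" | "x-api-key", apiKey: string): Record<string, string> {
  return style === "bearer"
    ? { Authorization: `Bearer ${apiKey}` }
    : { "x-api-key": apiKey };
}
```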
202
+
203
+ ## Existing Providers Reference
204
+
205
+ | Provider | File | Auth Header | API Format | Notes |
206
+ | -------------------------- | ---------------------- | --------------------------------------- | ------------------ | --------------------------------------------------------- |
207
+ | `OpenAICompatibleProvider` | `openai-compatible.ts` | `Authorization: Bearer` | OpenAI | Sanitizes non-standard fields |
208
+ | `AnthropicProvider` | `anthropic.ts` | `x-api-key` | Anthropic Messages | Full format conversion (request + response + streaming) |
209
+ | `GatewayProvider` | `gateway.ts` | `Authorization: Bearer` (gateway token) | OpenAI | Fallback for OAuth tokens |
210
+ | `gateway-with-override` | `index.ts` (inline) | Same as Gateway | OpenAI | Sets `before_model_resolve` override to prevent recursion |
211
+
212
+ ### Supported OpenAI-Compatible Providers
213
+
214
+ | Provider | Base URL | Env Var | Example Models |
215
+ | ---------- | --------------------------------------------------------- | -------------------- | ---------------------------------------- |
216
+ | Google | `https://generativelanguage.googleapis.com/v1beta/openai` | `GEMINI_API_KEY` | `gemini-2.5-flash` |
217
+ | OpenAI | `https://api.openai.com/v1` | `OPENAI_API_KEY` | `gpt-4o`, `gpt-4o-mini` |
218
+ | Groq | `https://api.groq.com/openai/v1` | `GROQ_API_KEY` | `llama-3.3-70b-versatile` |
219
+ | Mistral | `https://api.mistral.ai/v1` | `MISTRAL_API_KEY` | `mistral-large-latest` |
220
+ | DeepSeek | `https://api.deepseek.com/v1` | `DEEPSEEK_API_KEY` | `deepseek-chat` |
221
+ | Together | `https://api.together.xyz/v1` | `TOGETHER_API_KEY` | `meta-llama/Llama-3-70b` |
222
+ | Fireworks | `https://api.fireworks.ai/inference/v1` | `FIREWORKS_API_KEY` | `accounts/fireworks/models/llama-v3-70b` |
223
+ | Perplexity | `https://api.perplexity.ai` | `PERPLEXITY_API_KEY` | `sonar-pro` |
224
+ | xAI | `https://api.x.ai/v1` | `XAI_API_KEY` | `grok-3`, `grok-beta` |
225
+ | MiniMax | `https://api.minimax.io/v1` | `MINIMAX_API_KEY` | `MiniMax-M1` |
226
+ | MoonShot | `https://api.moonshot.ai/v1` | `MOONSHOT_API_KEY` | `kimi-k2.5` |
227
+
228
+ **Note:** MiniMax supports both direct API key and OAuth authentication. With OAuth (via OpenClaw auth-profiles), requests route through the gateway, which handles token refresh and API format conversion. The router auto-detects OAuth credentials from the `minimax-portal` auth profile.