@blockrun/clawrouter 0.12.61 → 0.12.63

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (33)
  1. package/docs/anthropic-cost-savings.md +349 -0
  2. package/docs/architecture.md +559 -0
  3. package/docs/assets/blockrun-248-day-cost-overrun-problem.png +0 -0
  4. package/docs/assets/blockrun-clawrouter-7-layer-token-compression-openclaw.png +0 -0
  5. package/docs/assets/blockrun-clawrouter-observation-compression-97-percent-token-savings.png +0 -0
  6. package/docs/assets/blockrun-clawrouter-openclaw-agentic-proxy-architecture.png +0 -0
  7. package/docs/assets/blockrun-clawrouter-openclaw-automatic-tier-routing-model-selection.png +0 -0
  8. package/docs/assets/blockrun-clawrouter-openclaw-error-classification-retry-storm-prevention.png +0 -0
  9. package/docs/assets/blockrun-clawrouter-openclaw-session-memory-journaling-vs-context-compounding.png +0 -0
  10. package/docs/assets/blockrun-clawrouter-vs-openclaw-standalone-comparison-production-safety.png +0 -0
  11. package/docs/assets/blockrun-clawrouter-x402-usdc-micropayment-wallet-budget-control.png +0 -0
  12. package/docs/assets/blockrun-openclaw-inference-layer-blind-spots.png +0 -0
  13. package/docs/blog-benchmark-2026-03.md +184 -0
  14. package/docs/blog-openclaw-cost-overruns.md +197 -0
  15. package/docs/clawrouter-savings.png +0 -0
  16. package/docs/configuration.md +512 -0
  17. package/docs/features.md +257 -0
  18. package/docs/image-generation.md +380 -0
  19. package/docs/plans/2026-02-03-smart-routing-design.md +267 -0
  20. package/docs/plans/2026-02-13-e2e-docker-deployment.md +1260 -0
  21. package/docs/plans/2026-02-28-worker-network.md +947 -0
  22. package/docs/plans/2026-03-18-error-classification.md +574 -0
  23. package/docs/plans/2026-03-19-exclude-models.md +538 -0
  24. package/docs/routing-profiles.md +81 -0
  25. package/docs/subscription-failover.md +320 -0
  26. package/docs/technical-routing-2026-03.md +322 -0
  27. package/docs/troubleshooting.md +159 -0
  28. package/docs/vision.md +49 -0
  29. package/docs/vs-openrouter.md +157 -0
  30. package/docs/worker-network.md +1241 -0
  31. package/package.json +3 -2
  32. package/scripts/reinstall.sh +8 -4
  33. package/scripts/update.sh +8 -4
@@ -0,0 +1,320 @@
+ # Using Subscriptions with ClawRouter Failover
+
+ This guide explains how to use your existing LLM subscriptions (Claude Pro/Max, ChatGPT Plus, etc.) as primary providers, with ClawRouter x402 micropayments as automatic failover.
+
+ ## Why Not Built Into ClawRouter?
+
+ After careful consideration, we decided **not** to integrate subscription support directly into ClawRouter for several important reasons:
+
+ ### 1. Terms of Service Compliance
+
+ - Most subscription ToS (Claude Code, ChatGPT Plus) are designed for personal use
+ - Using them through a proxy/API service may violate provider agreements
+ - We want to keep ClawRouter compliant and low-risk for all users
+
+ ### 2. Security & Privacy
+
+ - Integrating subscriptions would require ClawRouter to access your credentials/sessions
+ - Spawning external processes (like Claude CLI) introduces security concerns
+ - Better to keep authentication at the OpenClaw layer where you control it
+
+ ### 3. Maintenance & Flexibility
+
+ - Each subscription provider has different APIs, CLIs, and authentication methods
+ - OpenClaw already has a robust provider system that handles this
+ - Duplicating this in ClawRouter would increase complexity without added value
+
+ ### 4. Better Architecture
+
+ - OpenClaw's native failover mechanism is more flexible and powerful
+ - Works with **any** provider (not just Claude)
+ - Zero code changes needed in ClawRouter
+ - You maintain full control over your credentials
+
+ ## How It Works
+
+ OpenClaw has a built-in **model fallback chain** that automatically tries alternative providers when the primary fails:
+
+ ```
+ User Request
+      ↓
+ Primary Provider (e.g., Claude subscription via OpenClaw)
+      ↓ (rate limited / quota exceeded / auth failed)
+ OpenClaw detects failure
+      ↓
+ Fallback Chain (try each in order)
+      ↓
+ ClawRouter (blockrun/auto)
+      ↓
+ Smart routing picks cheapest model
+      ↓
+ x402 micropayment to BlockRun API
+      ↓
+ Response returned to user
+ ```
+
+ **Key benefits:**
+
+ - ✅ Automatic failover (no manual intervention)
+ - ✅ Works with any subscription provider OpenClaw supports
+ - ✅ Respects provider ToS (you configure authentication directly)
+ - ✅ ClawRouter stays focused on cost optimization
+
+ ## Setup Guide
+
+ ### Prerequisites
+
+ 1. **OpenClaw Gateway installed** with ClawRouter plugin
+
+    ```bash
+    npm install -g openclaw
+    openclaw plugins install @blockrun/clawrouter
+    ```
+
+ 2. **Subscription configured in OpenClaw**
+    - For Claude: Use `claude setup-token` or an API key
+    - For OpenAI: Set the `OPENAI_API_KEY` environment variable
+    - For others: See [OpenClaw provider docs](https://docs.openclaw.ai)
+
+ 3. **ClawRouter wallet funded** (for failover)
+    ```bash
+    openclaw gateway logs | grep "Wallet:"
+    # Send USDC to the displayed address on Base network
+    ```
+
+ ### Configuration Steps
+
+ #### Step 1: Set Primary Model (Your Subscription)
+
+ ```bash
+ # Option A: Using Claude subscription
+ openclaw models set anthropic/claude-sonnet-4.6
+
+ # Option B: Using ChatGPT Plus (via OpenAI provider)
+ openclaw models set openai/gpt-4o
+
+ # Option C: Using any other provider
+ openclaw models set <provider>/<model>
+ ```
+
+ #### Step 2: Add ClawRouter as Fallback
+
+ ```bash
+ # Add blockrun/auto for smart routing (recommended)
+ openclaw models fallbacks add blockrun/auto
+
+ # Or pin a specific model
+ openclaw models fallbacks add blockrun/google/gemini-2.5-pro
+ ```
+
+ #### Step 3: Verify Configuration
+
+ ```bash
+ openclaw models show
+ ```
+
+ Expected output:
+
+ ```
+ Primary: anthropic/claude-sonnet-4.6
+ Fallbacks:
+   1. blockrun/auto
+ ```
+
+ #### Step 4: Test Failover (Optional)
+
+ To verify failover works:
+
+ 1. **Temporarily exhaust your subscription quota** (or wait for a rate limit)
+ 2. **Make a request** - OpenClaw should automatically fail over to ClawRouter
+ 3. **Check logs:**
+    ```bash
+    openclaw gateway logs | grep -i "fallback\|blockrun"
+    ```
+
+ ### Advanced Configuration
+
+ #### Configure Multiple Fallbacks
+
+ ```bash
+ openclaw models fallbacks add blockrun/google/gemini-2.5-flash   # Fast & cheap
+ openclaw models fallbacks add blockrun/deepseek/deepseek-chat    # Even cheaper
+ openclaw models fallbacks add blockrun/nvidia/gpt-oss-120b       # Free tier
+ ```
+
+ #### Per-Agent Configuration
+
+ Edit `~/.openclaw/openclaw.json`:
+
+ ```json
+ {
+   "agents": {
+     "main": {
+       "model": {
+         "primary": "anthropic/claude-opus-4.6",
+         "fallbacks": ["blockrun/auto"]
+       }
+     },
+     "coding": {
+       "model": {
+         "primary": "anthropic/claude-sonnet-4.6",
+         "fallbacks": ["blockrun/google/gemini-2.5-pro", "blockrun/deepseek/deepseek-chat"]
+       }
+     }
+   }
+ }
+ ```
+
+ #### Tier-Based Configuration (ClawRouter Smart Routing)
+
+ When using `blockrun/auto`, ClawRouter automatically classifies your request and picks the cheapest capable model:
+
+ - **SIMPLE** queries → Gemini 2.5 Flash, DeepSeek Chat (~$0.0001/req)
+ - **MEDIUM** queries → GPT-4o-mini, Gemini Flash (~$0.001/req)
+ - **COMPLEX** queries → Claude Sonnet, Gemini Pro (~$0.01/req)
+ - **REASONING** queries → DeepSeek R1, o3-mini (~$0.05/req)
+
+ Learn more: [ClawRouter Smart Routing](./smart-routing.md)
+
+ ## Monitoring & Troubleshooting
+
+ ### Check If Failover Is Working
+
+ ```bash
+ # Watch real-time logs
+ openclaw gateway logs --follow | grep -i "fallback\|blockrun\|rate.limit\|quota"
+
+ # Check ClawRouter proxy logs
+ openclaw gateway logs | grep "ClawRouter"
+ ```
+
+ **Success indicators:**
+
+ - ✅ "Rate limit reached" or "Quota exceeded" → primary failed
+ - ✅ "Trying fallback: blockrun/auto" → failover triggered
+ - ✅ "ClawRouter: Success with model" → failover succeeded
+
+ ### Common Issues
+
+ #### Issue: Failover never triggers
+
+ **Symptoms:** Always uses primary, never switches to ClawRouter
+
+ **Solutions:**
+
+ 1. Check that fallbacks are configured:
+    ```bash
+    openclaw models show
+    ```
+ 2. Verify the primary is actually failing (check the provider dashboard for quota/rate limits)
+ 3. Check OpenClaw logs for authentication errors
+
+ #### Issue: "Wallet empty" errors during failover
+
+ **Symptoms:** Failover triggers but ClawRouter returns balance errors
+
+ **Solutions:**
+
+ 1. Check the ClawRouter wallet balance:
+    ```bash
+    openclaw gateway logs | grep "Balance:"
+    ```
+ 2. Fund the wallet on Base network (USDC)
+ 3. Verify the wallet key is configured correctly
+
+ #### Issue: Slow failover (high latency)
+
+ **Symptoms:** 5-10 second delay when switching to ClawRouter
+
+ **Cause:** OpenClaw tries multiple auth profiles before failover
+
+ **Solutions:**
+
+ 1. Reduce auth profile retry attempts (see OpenClaw config)
+ 2. Use `blockrun/auto` as primary for faster responses
+ 3. Accept the latency as a tradeoff for cheaper requests
+
+ ## Cost Analysis
+
+ ### Example Scenario
+
+ **Usage pattern:**
+
+ - 100 requests/day
+ - 50% hit the Claude subscription quota (rate limited)
+ - 50% use ClawRouter failover
+
+ **Without failover:**
+
+ - Pay the Anthropic API: $50/month (100% API usage)
+
+ **With failover:**
+
+ - Claude subscription: $20/month (covers 50%)
+ - ClawRouter x402: ~$5/month (~50 requests/day via smart routing)
+ - **Total: $25/month (50% savings)**
+
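+ As a small sketch, the blended-cost arithmetic behind this scenario in TypeScript (the per-request x402 figure is backed out of the ~$5/month estimate above, so treat it as an assumption):
+
+ ```typescript
+ const requestsPerDay = 100;
+ const days = 30;
+ const failoverShare = 0.5;        // half the traffic falls through to x402
+ const subscriptionUsd = 20;       // flat monthly Claude subscription
+ const x402PerRequestUsd = 0.0033; // assumed smart-routing average per request
+
+ const failoverRequests = requestsPerDay * failoverShare * days; // 1,500
+ const x402Usd = failoverRequests * x402PerRequestUsd;           // ≈ $5
+ const totalUsd = subscriptionUsd + x402Usd;                     // ≈ $25/month
+ ```
+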
+ ### When Does This Make Sense?
+
+ ✅ **Good fit:**
+
+ - You already have a subscription for personal use
+ - You occasionally exceed quota/rate limits
+ - You want cost optimization without managing API keys
+
+ ❌ **Not ideal:**
+
+ - You need 100% reliability (subscriptions have rate limits)
+ - You prefer a single provider (no failover complexity)
+ - Your usage is low (< 10 requests/day)
+
+ ## FAQ
+
+ ### Q: Will this violate my subscription ToS?
+
+ **A:** You configure the subscription directly in OpenClaw using your own credentials. ClawRouter only receives requests after your subscription fails. This is similar to using multiple API keys yourself.
+
+ However, each provider has different ToS. Check yours before proceeding:
+
+ - [Claude Code Terms](https://claude.ai/terms)
+ - [ChatGPT Terms](https://openai.com/policies/terms-of-use)
+
+ ### Q: Can I use multiple subscriptions?
+
+ **A:** Yes! Configure multiple providers with fallback chains:
+
+ ```bash
+ openclaw models set anthropic/claude-opus-4.6
+ openclaw models fallbacks add openai/gpt-4o      # ChatGPT Plus
+ openclaw models fallbacks add blockrun/auto      # x402 as final fallback
+ ```
+
+ ### Q: Does this work with Claude Max API Proxy?
+
+ **A:** Yes! Configure the proxy as a custom provider in OpenClaw, then add `blockrun/auto` as fallback.
+
+ See: [Claude Max API Proxy Guide](https://github.com/anthropics/claude-code/blob/main/docs/providers/claude-max-api-proxy.md)
+
+ ### Q: How is this different from PR #15?
+
+ **A:** PR #15 integrated Claude CLI directly into ClawRouter. Our approach:
+
+ - ✅ Works with any provider (not just Claude)
+ - ✅ Respects provider ToS (no proxy/wrapper)
+ - ✅ Uses OpenClaw's native failover (more reliable)
+ - ✅ Zero maintenance burden on ClawRouter
+
+ ## Feedback & Support
+
+ We'd love to hear your experience with subscription failover:
+
+ - **GitHub Discussion:** [Share your setup](https://github.com/BlockRunAI/ClawRouter/discussions)
+ - **Issues:** [Report problems](https://github.com/BlockRunAI/ClawRouter/issues)
+ - **Telegram:** [Join community](https://t.me/blockrunAI)
+
+ ## Related Documentation
+
+ - [OpenClaw Model Failover](https://docs.openclaw.ai/concepts/model-failover)
+ - [OpenClaw Provider Configuration](https://docs.openclaw.ai/gateway/configuration)
+ - [ClawRouter Smart Routing](./smart-routing.md)
+ - [ClawRouter x402 Micropayments](./x402-payments.md)
@@ -0,0 +1,322 @@
+ # Building a Smart LLM Router: How We Benchmarked 46 Models and Built a 14-Dimension Classifier
+
+ *March 20, 2026 | BlockRun Engineering*
+
+ When you route AI requests across 46 models from 8 providers, you can't just pick the cheapest one. You can't just pick the fastest one either. We learned this the hard way.
+
+ This is the technical story of how we benchmarked every model on our platform, discovered that speed and intelligence are poorly correlated, and built a production routing system that classifies requests in under 1ms using 14 weighted dimensions with sigmoid confidence calibration.
+
+ ## The Problem: One Gateway, 46 Models, Infinite Wrong Choices
+
+ BlockRun is an x402 micropayment gateway. Every LLM request flows through our proxy, gets authenticated via on-chain USDC payment, and is forwarded to the appropriate provider. The payment overhead adds 50-100ms to every request.
+
+ Our users set `model: "auto"` and expect us to pick the right model. But "right" means different things for different requests:
+
+ - A "what is Python?" query should route to the cheapest, fastest model
+ - An "implement a B-tree with concurrent insertions" query needs a capable model
+ - A "prove this theorem step by step" query needs reasoning capabilities
+ - An agentic workflow with tool calls needs models that follow instructions precisely
+
+ We needed a system that could classify any request and route it to the optimal model in real time.
+
+ ## Step 1: Benchmarking the Fleet
+
+ Before building the router, we needed ground truth. We benchmarked all 46 models through our production payment pipeline.
+
+ ### Methodology
+
+ ```
+ Setup:    ClawRouter v0.12.47 proxy on localhost
+           → BlockRun x402 gateway (Base EVM chain)
+           → Provider APIs (OpenAI, Anthropic, Google, xAI, DeepSeek, Moonshot, MiniMax, NVIDIA, Z.AI)
+
+ Prompts:  3 Python coding tasks (IPv4 validation, LCS algorithm, LRU cache)
+           2 requests per model per prompt
+ Config:   256 max tokens, non-streaming, temperature 0.7
+ Measured: End-to-end wall clock time (includes x402 payment verification)
+ ```
+
+ This is not a synthetic benchmark. Every measurement includes the full payment-verification round trip that real users experience.
+
+ ### The Latency Landscape
+
+ Results revealed a 7x spread between the fastest and slowest models:
+
+ ```
+ FAST TIER (<1.5s):
+   xai/grok-4-fast            1,143ms   224 tok/s   $0.20/$0.50
+   xai/grok-3-mini            1,202ms   215 tok/s   $0.30/$0.50
+   google/gemini-2.5-flash    1,238ms   208 tok/s   $0.30/$2.50
+   google/gemini-2.5-pro      1,294ms   198 tok/s   $1.25/$10.00
+   google/gemini-3-flash      1,398ms   183 tok/s   $0.50/$3.00
+   deepseek/deepseek-chat     1,431ms   179 tok/s   $0.28/$0.42
+
+ MID TIER (1.5-2.5s):
+   google/gemini-3.1-pro      1,609ms   167 tok/s   $2.00/$12.00
+   moonshot/kimi-k2.5         1,646ms   156 tok/s   $0.60/$3.00
+   anthropic/claude-sonnet    2,110ms   121 tok/s   $3.00/$15.00
+   anthropic/claude-opus      2,139ms   120 tok/s   $5.00/$25.00
+   openai/o3-mini             2,260ms   114 tok/s   $1.10/$4.40
+
+ SLOW TIER (>3s):
+   openai/gpt-5.2-pro         3,546ms    73 tok/s   $21.00/$168.00
+   openai/gpt-4o              5,378ms    48 tok/s   $2.50/$10.00
+   openai/gpt-5.4             6,213ms    41 tok/s   $2.50/$15.00
+   openai/gpt-5.3-codex       7,935ms    32 tok/s   $1.75/$14.00
+ ```
+
+ Two clear patterns:
+
+ 1. **Google and xAI dominate speed.** 11 of the top 13 fastest models are from Google or xAI.
+ 2. **OpenAI flagship models are consistently slow.** Every GPT-5.x model takes 3-8 seconds. Even their cheapest models (GPT-4.1-nano at $0.10/$0.40) are 2x slower than Google's cheapest.
+
+ ## Step 2: Adding the Quality Dimension
+
+ Speed alone tells you nothing about whether a model can actually handle your request. We cross-referenced our latency data with Artificial Analysis Intelligence Index v4.0 scores (a composite of GPQA, MMLU, MATH, HumanEval, and other benchmarks):
+
+ ```
+ MODEL                          LATENCY    IQ   $/M INPUT
+ ─────────────────────────────────────────────────────────
+ google/gemini-3.1-pro          1,609ms    57   $2.00   ← SWEET SPOT
+ openai/gpt-5.4                 6,213ms    57   $2.50
+ openai/gpt-5.3-codex           7,935ms    54   $1.75
+ anthropic/claude-opus-4.6      2,139ms    53   $5.00
+ anthropic/claude-sonnet-4.6    2,110ms    52   $3.00
+ google/gemini-3-pro-prev       1,352ms    48   $2.00
+ moonshot/kimi-k2.5             1,646ms    47   $0.60
+ google/gemini-3-flash-prev     1,398ms    46   $0.50   ← VALUE SWEET SPOT
+ xai/grok-4                     1,348ms    41   $0.20
+ xai/grok-4.1-fast              1,244ms    41   $0.20
+ deepseek/deepseek-chat         1,431ms    32   $0.28
+ xai/grok-4-fast                1,143ms    23   $0.20
+ google/gemini-2.5-flash        1,238ms    20   $0.30
+ ```
+
+ ### The Efficiency Frontier
+
+ Plotting IQ against latency reveals a clear efficiency frontier:
+
+ ```
+  IQ
+  57 |  Gem3.1Pro ·························· GPT-5.4
+     |
+  53 |      · Opus
+  52 |      · Sonnet
+     |
+  48 |  Gem3Pro ·
+  47 |      · Kimi
+  46 |  Gem3Flash ·
+     |
+  41 |  Grok4 ·
+     |
+  32 |  Grok3 ·  · DeepSeek
+     |
+  23 | GrokFast ·
+  20 | GemFlash ·
+     └──────────────────────────────────────────────
+      1.0    1.5    2.0    2.5    3.0        6.0    8.0
+                End-to-End Latency (seconds)
+ ```
+
+ The frontier runs from Gemini 2.5 Flash (IQ 20, 1.2s) up to Gemini 3.1 Pro (IQ 57, 1.6s). Everything below and to the right of this line is dominated — you can get equal or better quality at lower latency from a different model.
+
+ Key insight: **Gemini 3.1 Pro matches GPT-5.4's IQ at 1/4 the latency and lower cost.** Claude Sonnet 4.6 nearly matches Opus 4.6 quality at 60% of the price. These dominated pairings directly informed our routing fallback chains.
+
125
+ ## Step 3: The Failed Experiment (Latency-First Routing)
126
+
127
+ Armed with benchmark data, we initially optimized for speed. The routing config promoted fast models:
128
+
129
+ ```typescript
130
+ // v0.12.47 — latency-optimized (REVERTED)
131
+ COMPLEX: {
132
+ primary: "xai/grok-4-0709", // 1,348ms, IQ 41
133
+ fallback: [
134
+ "xai/grok-4-1-fast-non-reasoning", // 1,244ms, IQ 41
135
+ "google/gemini-2.5-flash", // 1,238ms, IQ 20
136
+ // ... fast models first
137
+ ],
138
+ }
139
+ ```
140
+
141
+ Users complained within 24 hours. The fast models were refusing complex tasks and giving shallow responses. A model with IQ 41 can't reliably handle architecture design or multi-step code generation, no matter how fast it is.
142
+
143
+ **Lesson: optimizing for a single metric in a multi-objective system creates failure modes.** We needed to optimize across speed, quality, and cost simultaneously.
144
+
145
+ ## Step 4: The 14-Dimension Scoring System
146
+
147
+ The router needs to determine what kind of request it's looking at before selecting a model. We built a rule-based classifier that scores requests across 14 weighted dimensions:
148
+
149
+ ### Architecture
150
+
+ ```
+ User Prompt → Lowercase + Tokenize
+        ↓
+ ┌──────────────────────────────────┐
+ │ 14 Dimension Scorers             │
+ │ Each returns score ∈ [-1, 1]     │
+ └──────┬───────────────────────────┘
+        ↓
+ Weighted Sum (configurable weights)
+        ↓
+ Tier Boundaries (SIMPLE < 0.0 < MEDIUM < 0.3 < COMPLEX < 0.5 < REASONING)
+        ↓
+ Sigmoid Confidence Calibration
+        ↓
+ confidence < 0.7 → AMBIGUOUS → default to MEDIUM
+ confidence ≥ 0.7 → Classified tier
+        ↓
+ Tier × Profile → Model Selection
+ ```
+
+ ### The 14 Dimensions
+
+ | Dimension | Weight | What It Detects | Score Range |
+ |-----------|--------|-----------------|-------------|
+ | reasoningMarkers | 0.18 | "prove", "theorem", "step by step" | 0 to 1.0 |
+ | codePresence | 0.15 | "function", "class", "import", "```" | 0 to 1.0 |
+ | multiStepPatterns | 0.12 | "first...then", "step N", numbered lists | 0 or 0.5 |
+ | technicalTerms | 0.10 | "algorithm", "kubernetes", "distributed" | 0 to 1.0 |
+ | tokenCount | 0.08 | Short (<50 tokens) vs long (>500 tokens) | -1.0 to 1.0 |
+ | creativeMarkers | 0.05 | "story", "poem", "brainstorm" | 0 to 0.7 |
+ | questionComplexity | 0.05 | Number of question marks (>3 = complex) | 0 or 0.5 |
+ | agenticTask | 0.04 | "edit", "deploy", "fix", "debug" | 0 to 1.0 |
+ | constraintCount | 0.04 | "at most", "within", "O()" | 0 to 0.7 |
+ | imperativeVerbs | 0.03 | "build", "create", "implement" | 0 to 0.5 |
+ | outputFormat | 0.03 | "json", "yaml", "table", "csv" | 0 to 0.7 |
+ | simpleIndicators | 0.02 | "what is", "hello", "define" | -1.0 to 0 |
+ | referenceComplexity | 0.02 | "the code above", "the API docs" | 0 to 0.5 |
+ | domainSpecificity | 0.02 | "quantum", "FPGA", "genomics" | 0 to 0.8 |
+
+ Weights sum to 1.0. The weighted score maps to a continuous axis where tier boundaries partition the space.
+
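+ To make the mechanics concrete, here is a minimal TypeScript sketch of the weighted-sum-plus-boundaries design. The weights, boundaries, and dimension names come from the table and diagram above; the scorer implementations and function names are illustrative, not ClawRouter's actual source (that lives in `src/router/rules.ts`):
+
+ ```typescript
+ type Tier = "SIMPLE" | "MEDIUM" | "COMPLEX" | "REASONING";
+ type Scorer = (prompt: string) => number; // each returns a score in [-1, 1]
+
+ // Four of the fourteen dimensions, weights from the table above; the real
+ // keyword lists are far longer and cover 9 languages.
+ const DIMENSIONS: Array<[weight: number, scorer: Scorer]> = [
+   [0.18, (p) => (/\b(prove|theorem|step by step)\b/.test(p) ? 1 : 0)], // reasoningMarkers
+   [0.15, (p) => (/\b(function|class|import)\b/.test(p) ? 1 : 0)],      // codePresence
+   [0.12, (p) => (/first.+then|step \d/.test(p) ? 0.5 : 0)],            // multiStepPatterns
+   [0.02, (p) => (/^(what is|hello|define)\b/.test(p) ? -1 : 0)],       // simpleIndicators
+ ];
+
+ function classify(prompt: string): Tier {
+   const text = prompt.toLowerCase();
+   // Weighted sum over all dimensions...
+   const score = DIMENSIONS.reduce((sum, [w, s]) => sum + w * s(text), 0);
+   // ...then the boundaries from the diagram partition the score axis.
+   if (score < 0.0) return "SIMPLE";
+   if (score < 0.3) return "MEDIUM";
+   if (score < 0.5) return "COMPLEX";
+   return "REASONING";
+ }
+
+ classify("what is python?");                 // "SIMPLE" (score -0.02)
+ classify("prove this theorem step by step"); // "MEDIUM" in this tiny sketch
+ ```
+
+ With all fourteen scorers contributing, a prompt like the second example accumulates more weight and can cross into COMPLEX or REASONING.
+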
+ ### Multilingual Support
+
+ Every keyword list includes translations in 9 languages (EN, ZH, JA, RU, DE, ES, PT, KO, AR). A Chinese user asking "证明这个定理" triggers the same reasoning classification as "prove this theorem."
+
+ ### Confidence Calibration
+
+ Raw tier assignments can be ambiguous when a score falls near a boundary. We use sigmoid calibration:
+
+ ```
+ confidence = 1 / (1 + exp(-steepness * distance_from_boundary))
+ ```
+
+ Where `steepness = 12` and `distance_from_boundary` is the score's distance to the nearest tier boundary. This maps to a [0.5, 1.0] confidence range. Below `threshold = 0.7`, the request is classified as ambiguous and defaults to MEDIUM.
+
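+ A direct transcription of that formula (constants as quoted above; helper names are ours):
+
+ ```typescript
+ const BOUNDARIES = [0.0, 0.3, 0.5]; // tier boundaries on the score axis
+ const STEEPNESS = 12;
+ const THRESHOLD = 0.7;
+
+ // 0.5 exactly on a boundary, approaching 1.0 as the score moves away from one.
+ function confidence(score: number): number {
+   const distance = Math.min(...BOUNDARIES.map((b) => Math.abs(score - b)));
+   return 1 / (1 + Math.exp(-STEEPNESS * distance));
+ }
+
+ const isAmbiguous = (score: number) => confidence(score) < THRESHOLD;
+ ```
+
+ Solving `1 / (1 + exp(-12d)) = 0.7` for `d` gives roughly 0.07, so any score within about 0.07 of a boundary falls below the threshold and is routed to MEDIUM.
+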
+ ### Agentic Detection
+
+ A separate scoring pathway detects agentic tasks (multi-step, tool-using, iterative). When `agenticScore >= 0.5`, the router switches to agentic-optimized tier configs that prefer models with strong instruction following (Claude Sonnet for complex tasks, GPT-4o-mini for simple tool calls).
+
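+ The switch itself can be as simple as this sketch (`Tier` as in the classifier sketch above; the config table names are hypothetical):
+
+ ```typescript
+ type TierConfig = { primary: string; fallback: string[] };
+
+ // Hypothetical config tables; their contents would echo the profiles below.
+ declare const DEFAULT_TIERS: Record<Tier, TierConfig>;
+ declare const AGENTIC_TIERS: Record<Tier, TierConfig>;
+
+ function tierConfig(tier: Tier, agenticScore: number): TierConfig {
+   // At or above 0.5, prefer instruction-following models for tool use.
+   return agenticScore >= 0.5 ? AGENTIC_TIERS[tier] : DEFAULT_TIERS[tier];
+ }
+ ```
+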
+ ## Step 5: Tier-to-Model Mapping
+
+ Once a request is classified into a tier, the router selects from 4 routing profiles:
+
+ ### Auto Profile (Default)
+
+ Tuned from our benchmark data + user retention metrics:
+
+ ```
+ SIMPLE  → gemini-2.5-flash          (1,238ms, IQ 20, 60% retention)
+ MEDIUM  → kimi-k2.5                 (1,646ms, IQ 47, strong tool use)
+ COMPLEX → gemini-3.1-pro            (1,609ms, IQ 57, fastest flagship)
+ REASON  → grok-4-1-fast-reasoning   (1,454ms, $0.20/$0.50)
+ ```
+
+ ### Eco Profile
+
+ Ultra cost-optimized. Uses free/near-free models:
+
+ ```
+ SIMPLE  → nvidia/gpt-oss-120b       (FREE)
+ MEDIUM  → gemini-2.5-flash-lite     ($0.10/$0.40, 1M context)
+ COMPLEX → gemini-2.5-flash-lite     ($0.10/$0.40)
+ REASON  → grok-4-1-fast-reasoning   ($0.20/$0.50)
+ ```
+
+ ### Premium Profile
+
+ Best quality regardless of cost:
+
+ ```
+ SIMPLE  → kimi-k2.5                 ($0.60/$3.00)
+ MEDIUM  → gpt-5.3-codex             ($1.75/$14.00, 400K context)
+ COMPLEX → claude-opus-4.6           ($5.00/$25.00)
+ REASON  → claude-sonnet-4.6         ($3.00/$15.00)
+ ```
+
+ ### Fallback Chains
+
+ Each tier config includes an ordered fallback list. When the primary model returns a 402 (payment failed), 429 (rate limited), or 5xx, the proxy walks the fallback chain. Fallback ordering is benchmark-informed:
+
+ ```typescript
+ // COMPLEX tier — quality-first fallback order
+ fallback: [
+   "google/gemini-3-pro-preview",    // IQ 48, 1,352ms
+   "google/gemini-3-flash-preview",  // IQ 46, 1,398ms
+   "xai/grok-4-0709",                // IQ 41, 1,348ms
+   "google/gemini-2.5-pro",          // 1,294ms
+   "anthropic/claude-sonnet-4.6",    // IQ 52, 2,110ms
+   "deepseek/deepseek-chat",         // IQ 32, 1,431ms
+   "google/gemini-2.5-flash",        // IQ 20, 1,238ms
+   "openai/gpt-5.4",                 // IQ 57, 6,213ms — last resort
+ ]
+ ```
+
+ The chain descends by quality first (IQ 48 → 46 → 41), then trades quality for speed. GPT-5.4 is last despite having IQ 57, because its 6.2s latency is a worst-case user experience.
+
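+ Mechanically, the walk looks something like the sketch below; the endpoint URL and function name are placeholders, not ClawRouter's actual internals:
+
+ ```typescript
+ const RETRYABLE = new Set([402, 429]); // payment failed, rate limited
+
+ // Walk the chain until a model succeeds. 5xx and the codes above advance to
+ // the next candidate; any other failure is returned to the caller as-is.
+ async function completeWithFallback(
+   chain: string[], // primary first, then the ordered fallback list
+   body: Record<string, unknown>,
+ ): Promise<Response> {
+   let last: Response | undefined;
+   for (const model of chain) {
+     const res = await fetch("http://localhost:3000/v1/chat/completions", {
+       method: "POST",
+       headers: { "content-type": "application/json" },
+       body: JSON.stringify({ ...body, model }),
+     });
+     if (res.ok) return res;
+     if (!RETRYABLE.has(res.status) && res.status < 500) return res;
+     last = res;
+   }
+   if (last) return last; // chain exhausted: surface the final failure
+   throw new Error("empty model chain");
+ }
+ ```
+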
+ ## Step 6: Context-Aware Filtering
+
+ The fallback chain is filtered at runtime based on request properties:
+
+ 1. **Context window filtering**: Models with insufficient context window for the estimated total tokens are excluded (with a 10% safety buffer)
+ 2. **Tool calling filter**: When the request includes tool definitions, only models that support function calling are kept
+ 3. **Vision filter**: When the request includes images, only vision-capable models are kept
+
+ If filtering eliminates all candidates, the full chain is used as a fallback (better to let the API error than return nothing).
+
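+ A sketch of those three filters; the capability fields on the model record are assumptions about shape, not ClawRouter's actual model table:
+
+ ```typescript
+ interface ModelInfo {
+   id: string;
+   contextWindow: number; // tokens
+   tools: boolean;        // supports function calling
+   vision: boolean;       // accepts image input
+ }
+
+ function filterChain(
+   chain: ModelInfo[],
+   estimatedTokens: number,
+   wantsTools: boolean,
+   wantsVision: boolean,
+ ): ModelInfo[] {
+   const kept = chain.filter(
+     (m) =>
+       m.contextWindow >= estimatedTokens * 1.1 && // 10% safety buffer
+       (!wantsTools || m.tools) &&
+       (!wantsVision || m.vision),
+   );
+   // Never return an empty chain: fall back to the unfiltered list and let
+   // the provider API surface the error.
+   return kept.length > 0 ? kept : chain;
+ }
+ ```
+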
+ ## Cost Calculation and Savings
+
+ Every routing decision includes a cost estimate and savings percentage against a baseline (Claude Opus 4.6 pricing):
+
+ ```typescript
+ savings = max(0, (opusCost - routedCost) / opusCost)
+ ```
+
+ For a typical SIMPLE request (500 input tokens, 256 output tokens):
+
+ - Opus cost: $0.0089 (at $5.00/$25.00 per 1M tokens)
+ - Gemini Flash cost: $0.0008 (at $0.30/$2.50 per 1M tokens)
+ - Savings: 91.0%
+
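+ The same arithmetic, spelled out (prices per 1M tokens as quoted above):
+
+ ```typescript
+ const PRICES = {
+   "anthropic/claude-opus-4.6": { in: 5.0, out: 25.0 },
+   "google/gemini-2.5-flash": { in: 0.3, out: 2.5 },
+ } as const;
+
+ function usd(model: keyof typeof PRICES, inTok: number, outTok: number): number {
+   const p = PRICES[model];
+   return (inTok * p.in + outTok * p.out) / 1_000_000;
+ }
+
+ const opusCost = usd("anthropic/claude-opus-4.6", 500, 256);     // 0.0089
+ const routedCost = usd("google/gemini-2.5-flash", 500, 256);     // 0.00079
+ const savings = Math.max(0, (opusCost - routedCost) / opusCost); // ≈ 0.91
+ ```
+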
+ Across our user base, the median savings rate is 85% compared to routing everything to a premium model.
+
+ ## Performance
+
+ The entire classification pipeline (14 dimensions + tier mapping + model selection) runs in under 1ms. No external API calls. No LLM inference. Pure keyword matching and arithmetic.
+
+ We originally designed a two-stage system where low-confidence rules-based classifications would fall back to an LLM classifier (Gemini 2.5 Flash). In practice, the rules handle 70-80% of requests with high confidence, and the remaining ambiguous cases default to MEDIUM — which is the correct conservative choice.
+
+ ## What We Learned
+
+ 1. **Speed and intelligence are weakly correlated.** The fastest model (Grok 4 Fast, IQ 23) is at the bottom of the quality scale. The smartest model at low latency (Gemini 3.1 Pro, IQ 57, 1.6s) is a Google model, not OpenAI.
+
+ 2. **Optimizing for one metric fails.** Latency-first routing breaks quality. Quality-first routing breaks latency budgets. You need multi-objective optimization.
+
+ 3. **User retention is the real metric.** Our best-performing model for SIMPLE tasks isn't the cheapest or the fastest — it's Gemini 2.5 Flash (60% retention rate), which balances speed, cost, and just-enough quality.
+
+ 4. **Fallback ordering matters more than primary selection.** The primary model handles the happy path. The fallback chain handles reality — rate limits, outages, payment failures. A well-ordered fallback chain is more important than picking the perfect primary.
+
+ 5. **Rule-based classification is underrated.** 14 keyword dimensions with sigmoid confidence calibration handles 70-80% of requests correctly in <1ms. The remaining 20-30% default to a safe middle tier. For a routing system where every millisecond of overhead compounds across millions of requests, avoiding LLM inference in the classification step is worth the reduced accuracy.
+
+ ---
+
+ ## Appendix: Full Benchmark Data
+
+ Raw data (46 models, latency, throughput, IQ scores, pricing): [`benchmark-merged.json`](https://github.com/BlockRunAI/ClawRouter/blob/main/benchmark-merged.json)
+
+ Routing configuration: [`src/router/config.ts`](https://github.com/BlockRunAI/ClawRouter/blob/main/src/router/config.ts)
+
+ Scoring implementation: [`src/router/rules.ts`](https://github.com/BlockRunAI/ClawRouter/blob/main/src/router/rules.ts)
+
+ ---
+
+ *BlockRun is the x402 micropayment gateway for AI. One wallet, 46+ models, pay-per-request with USDC. [blockrun.ai](https://blockrun.ai)*