@blockrun/clawrouter 0.12.62 → 0.12.64
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/docs/anthropic-cost-savings.md +349 -0
- package/docs/architecture.md +559 -0
- package/docs/blog-benchmark-2026-03.md +184 -0
- package/docs/blog-openclaw-cost-overruns.md +197 -0
- package/docs/clawrouter-savings.png +0 -0
- package/docs/configuration.md +512 -0
- package/docs/features.md +257 -0
- package/docs/image-generation.md +380 -0
- package/docs/routing-profiles.md +81 -0
- package/docs/subscription-failover.md +320 -0
- package/docs/technical-routing-2026-03.md +322 -0
- package/docs/troubleshooting.md +159 -0
- package/docs/vision.md +49 -0
- package/docs/vs-openrouter.md +157 -0
- package/docs/worker-network.md +1241 -0
- package/package.json +3 -1
package/docs/technical-routing-2026-03.md
ADDED

@@ -0,0 +1,322 @@
# Building a Smart LLM Router: How We Benchmarked 46 Models and Built a 14-Dimension Classifier

*March 20, 2026 | BlockRun Engineering*

When you route AI requests across 46 models from 9 providers, you can't just pick the cheapest one. You can't just pick the fastest one either. We learned this the hard way.

This is the technical story of how we benchmarked every model on our platform, discovered that speed and intelligence are poorly correlated, and built a production routing system that classifies requests in under 1ms using 14 weighted dimensions with sigmoid confidence calibration.

## The Problem: One Gateway, 46 Models, Infinite Wrong Choices

BlockRun is an x402 micropayment gateway. Every LLM request flows through our proxy, gets authenticated via an on-chain USDC payment, and is forwarded to the appropriate provider. The payment overhead adds 50-100ms to every request.

Our users set `model: "auto"` and expect us to pick the right model. But "right" means different things for different requests:

- A "what is Python?" query should route to the cheapest, fastest model
- An "implement a B-tree with concurrent insertions" query needs a capable model
- A "prove this theorem step by step" query needs reasoning capabilities
- An agentic workflow with tool calls needs models that follow instructions precisely

We needed a system that could classify any request and route it to the optimal model in real time.
## Step 1: Benchmarking the Fleet

Before building the router, we needed ground truth. We benchmarked all 46 models through our production payment pipeline.

### Methodology

```
Setup:    ClawRouter v0.12.47 proxy on localhost
          → BlockRun x402 gateway (Base EVM chain)
          → Provider APIs (OpenAI, Anthropic, Google, xAI, DeepSeek, Moonshot, MiniMax, NVIDIA, Z.AI)

Prompts:  3 Python coding tasks (IPv4 validation, LCS algorithm, LRU cache)
          2 requests per model per prompt
Config:   256 max tokens, non-streaming, temperature 0.7
Measured: End-to-end wall clock time (includes x402 payment verification)
```

This is not a synthetic benchmark. Every measurement includes the full payment-verification round trip that real users experience.
### The Latency Landscape

Results revealed a 7x spread between the fastest and slowest models:

```
FAST TIER (<1.5s):
  xai/grok-4-fast            1,143ms   224 tok/s   $0.20/$0.50
  xai/grok-3-mini            1,202ms   215 tok/s   $0.30/$0.50
  google/gemini-2.5-flash    1,238ms   208 tok/s   $0.30/$2.50
  google/gemini-2.5-pro      1,294ms   198 tok/s   $1.25/$10.00
  google/gemini-3-flash      1,398ms   183 tok/s   $0.50/$3.00
  deepseek/deepseek-chat     1,431ms   179 tok/s   $0.28/$0.42

MID TIER (1.5-2.5s):
  google/gemini-3.1-pro      1,609ms   167 tok/s   $2.00/$12.00
  moonshot/kimi-k2.5         1,646ms   156 tok/s   $0.60/$3.00
  anthropic/claude-sonnet    2,110ms   121 tok/s   $3.00/$15.00
  anthropic/claude-opus      2,139ms   120 tok/s   $5.00/$25.00
  openai/o3-mini             2,260ms   114 tok/s   $1.10/$4.40

SLOW TIER (>3s):
  openai/gpt-5.2-pro         3,546ms    73 tok/s   $21.00/$168.00
  openai/gpt-4o              5,378ms    48 tok/s   $2.50/$10.00
  openai/gpt-5.4             6,213ms    41 tok/s   $2.50/$15.00
  openai/gpt-5.3-codex       7,935ms    32 tok/s   $1.75/$14.00
```

Two clear patterns:

1. **Google and xAI dominate speed.** 11 of the top 13 fastest models are from Google or xAI.
2. **OpenAI flagship models are consistently slow.** Every GPT-5.x model takes 3-8 seconds. Even their cheapest models (GPT-4.1-nano at $0.10/$0.40) are 2x slower than Google's cheapest.
## Step 2: Adding the Quality Dimension

Speed alone tells you nothing about whether a model can actually handle your request. We cross-referenced our latency data with Artificial Analysis Intelligence Index v4.0 scores (a composite of GPQA, MMLU, MATH, HumanEval, and other benchmarks):

```
MODEL                          LATENCY    IQ    $/M INPUT
─────────────────────────────────────────────────────────
google/gemini-3.1-pro          1,609ms    57    $2.00   ← SWEET SPOT
openai/gpt-5.4                 6,213ms    57    $2.50
openai/gpt-5.3-codex           7,935ms    54    $1.75
anthropic/claude-opus-4.6      2,139ms    53    $5.00
anthropic/claude-sonnet-4.6    2,110ms    52    $3.00
google/gemini-3-pro-prev       1,352ms    48    $2.00
moonshot/kimi-k2.5             1,646ms    47    $0.60
google/gemini-3-flash-prev     1,398ms    46    $0.50   ← VALUE SWEET SPOT
xai/grok-4                     1,348ms    41    $0.20
xai/grok-4.1-fast              1,244ms    41    $0.20
deepseek/deepseek-chat         1,431ms    32    $0.28
xai/grok-4-fast                1,143ms    23    $0.20
google/gemini-2.5-flash        1,238ms    20    $0.30
```

### The Efficiency Frontier

Plotting IQ against latency reveals a clear efficiency frontier:

```
IQ
57 | Gem3.1Pro ··························· GPT-5.4
   |
53 |         · Opus
52 |         · Sonnet
   |
48 |   Gem3Pro ·
47 |           · Kimi
46 | Gem3Flash ·
   |
41 |     Grok4 ·
   |
32 |     Grok3 ·        · DeepSeek
   |
23 |  GrokFast ·
20 |  GemFlash ·
   └──────────────────────────────────────────────
     1.0    1.5    2.0    2.5    3.0     6.0    8.0
              End-to-End Latency (seconds)
```

The frontier runs from Gemini 2.5 Flash (IQ 20, 1.2s) up to Gemini 3.1 Pro (IQ 57, 1.6s). Everything below and to the right of this line is dominated — you can get equal or better quality at lower latency from a different model.

Key insight: **Gemini 3.1 Pro matches GPT-5.4's IQ at 1/4 the latency and lower cost.** Claude Sonnet 4.6 nearly matches Opus 4.6 quality at 60% of the price. These dominated pairings directly informed our routing fallback chains.
## Step 3: The Failed Experiment (Latency-First Routing)

Armed with benchmark data, we initially optimized for speed. The routing config promoted fast models:

```typescript
// v0.12.47 — latency-optimized (REVERTED)
COMPLEX: {
  primary: "xai/grok-4-0709",            // 1,348ms, IQ 41
  fallback: [
    "xai/grok-4-1-fast-non-reasoning",   // 1,244ms, IQ 41
    "google/gemini-2.5-flash",           // 1,238ms, IQ 20
    // ... fast models first
  ],
}
```

Users complained within 24 hours. The fast models were refusing complex tasks and giving shallow responses. A model with IQ 41 can't reliably handle architecture design or multi-step code generation, no matter how fast it is.

**Lesson: optimizing for a single metric in a multi-objective system creates failure modes.** We needed to optimize across speed, quality, and cost simultaneously.
## Step 4: The 14-Dimension Scoring System

The router needs to determine what kind of request it's looking at before selecting a model. We built a rule-based classifier that scores requests across 14 weighted dimensions:

### Architecture

```
User Prompt → Lowercase + Tokenize
       ↓
┌──────────────────────────────────┐
│  14 Dimension Scorers            │
│  Each returns score ∈ [-1, 1]    │
└──────┬───────────────────────────┘
       ↓
Weighted Sum (configurable weights)
       ↓
Tier Boundaries (SIMPLE < 0.0 < MEDIUM < 0.3 < COMPLEX < 0.5 < REASONING)
       ↓
Sigmoid Confidence Calibration
       ↓
confidence < 0.7 → AMBIGUOUS → default to MEDIUM
confidence ≥ 0.7 → Classified tier
       ↓
Tier × Profile → Model Selection
```

### The 14 Dimensions

| Dimension | Weight | What It Detects | Score Range |
|-----------|--------|-----------------|-------------|
| reasoningMarkers | 0.18 | "prove", "theorem", "step by step" | 0 to 1.0 |
| codePresence | 0.15 | "function", "class", "import", "```" | 0 to 1.0 |
| multiStepPatterns | 0.12 | "first...then", "step N", numbered lists | 0 or 0.5 |
| technicalTerms | 0.10 | "algorithm", "kubernetes", "distributed" | 0 to 1.0 |
| tokenCount | 0.08 | Short (<50 tokens) vs long (>500 tokens) | -1.0 to 1.0 |
| creativeMarkers | 0.05 | "story", "poem", "brainstorm" | 0 to 0.7 |
| questionComplexity | 0.05 | Number of question marks (>3 = complex) | 0 or 0.5 |
| agenticTask | 0.04 | "edit", "deploy", "fix", "debug" | 0 to 1.0 |
| constraintCount | 0.04 | "at most", "within", "O()" | 0 to 0.7 |
| imperativeVerbs | 0.03 | "build", "create", "implement" | 0 to 0.5 |
| outputFormat | 0.03 | "json", "yaml", "table", "csv" | 0 to 0.7 |
| simpleIndicators | 0.02 | "what is", "hello", "define" | 0 to -1.0 |
| referenceComplexity | 0.02 | "the code above", "the API docs" | 0 to 0.5 |
| domainSpecificity | 0.02 | "quantum", "FPGA", "genomics" | 0 to 0.8 |

Weights sum to 1.0. The weighted score maps to a continuous axis where tier boundaries partition the space.
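To make the mechanics concrete, here is a minimal sketch of the weighted-sum step with three of the fourteen dimensions. The regexes, weights, and helper names are illustrative simplifications, not ClawRouter's actual rules — only the tier boundaries come from the diagram above.

```typescript
// Minimal sketch of the weighted-sum classifier (illustrative only).
type Scorer = (prompt: string) => number; // each returns a score in [-1, 1]

const DIMENSIONS: { weight: number; score: Scorer }[] = [
  {
    // reasoningMarkers (weight 0.18 in the table above)
    weight: 0.18,
    score: (p) => (/\b(prove|theorem|step by step)\b/.test(p) ? 1 : 0),
  },
  {
    // codePresence (weight 0.15)
    weight: 0.15,
    score: (p) => (/\b(function|class|import)\b/.test(p) ? 1 : 0),
  },
  {
    // simpleIndicators (weight 0.02) — the only dimension that pushes down
    weight: 0.02,
    score: (p) => (/\b(what is|hello|define)\b/.test(p) ? -1 : 0),
  },
];

type Tier = "SIMPLE" | "MEDIUM" | "COMPLEX" | "REASONING";

function classify(prompt: string): { score: number; tier: Tier } {
  const p = prompt.toLowerCase();
  const score = DIMENSIONS.reduce((sum, d) => sum + d.weight * d.score(p), 0);
  // Tier boundaries from the architecture diagram:
  // SIMPLE < 0.0 ≤ MEDIUM < 0.3 ≤ COMPLEX < 0.5 ≤ REASONING
  const tier: Tier =
    score < 0.0 ? "SIMPLE" :
    score < 0.3 ? "MEDIUM" :
    score < 0.5 ? "COMPLEX" : "REASONING";
  return { score, tier };
}
```

With only three dimensions the ceiling is low, but the shape is the same: "What is Python?" goes slightly negative (SIMPLE), while a prompt hitting both reasoning and code markers lands at 0.33 (COMPLEX).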
### Multilingual Support

Every keyword list includes translations in 9 languages (EN, ZH, JA, RU, DE, ES, PT, KO, AR). A Chinese user asking "证明这个定理" triggers the same reasoning classification as "prove this theorem."
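A keyword list along those lines might look like this. The translations shown are illustrative (abbreviated to three of the nine languages), and substring matching is used here for brevity; a real implementation would want word boundaries for alphabetic languages so that, say, "approve" doesn't match "prove".

```typescript
// Sketch of a multilingual keyword list (illustrative, not ClawRouter's
// actual lists). CJK text has no spaces, so substring matching is the
// natural primitive there.
const REASONING_KEYWORDS = [
  "prove", "theorem",     // EN
  "证明", "定理",          // ZH
  "докажи", "теорема",    // RU
];

function hasReasoningMarker(prompt: string): boolean {
  const p = prompt.toLowerCase();
  return REASONING_KEYWORDS.some((k) => p.includes(k));
}
```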
### Confidence Calibration

Raw tier assignments can be ambiguous when a score falls near a boundary. We use sigmoid calibration:

```
confidence = 1 / (1 + exp(-steepness * distance_from_boundary))
```

Where `steepness = 12` and `distance_from_boundary` is the score's distance to the nearest tier boundary. This maps to a [0.5, 1.0] confidence range. Below `threshold = 0.7`, the request is classified as ambiguous and defaults to MEDIUM.
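The calibration formula above is a few lines of code. This sketch uses the constants stated in the text (steepness 12, threshold 0.7) and the tier boundaries from the architecture diagram; the function names are my own.

```typescript
// Sigmoid confidence calibration: a score sitting exactly on a tier
// boundary gets confidence 0.5; scores deep inside a tier approach 1.0.
const BOUNDARIES = [0.0, 0.3, 0.5]; // tier cut points
const STEEPNESS = 12;

function confidence(score: number): number {
  // Distance to the nearest tier boundary.
  const distance = Math.min(...BOUNDARIES.map((b) => Math.abs(score - b)));
  return 1 / (1 + Math.exp(-STEEPNESS * distance));
}

function isAmbiguous(score: number, threshold = 0.7): boolean {
  // Below the threshold, the router defaults the request to MEDIUM.
  return confidence(score) < threshold;
}
```

With steepness 12 and threshold 0.7, a score has to sit roughly 0.07 away from every boundary before the router commits to a tier; anything closer defaults to MEDIUM.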
### Agentic Detection

A separate scoring pathway detects agentic tasks (multi-step, tool-using, iterative). When `agenticScore >= 0.5`, the router switches to agentic-optimized tier configs that prefer models with strong instruction following (Claude Sonnet for complex tasks, GPT-4o-mini for simple tool calls).
## Step 5: Tier-to-Model Mapping

Once a request is classified into a tier, the router selects from 4 routing profiles:

### Auto Profile (Default)

Tuned from our benchmark data + user retention metrics:

```
SIMPLE  → gemini-2.5-flash         (1,238ms, IQ 20, 60% retention)
MEDIUM  → kimi-k2.5                (1,646ms, IQ 47, strong tool use)
COMPLEX → gemini-3.1-pro           (1,609ms, IQ 57, fastest flagship)
REASON  → grok-4-1-fast-reasoning  (1,454ms, $0.20/$0.50)
```

### Eco Profile

Ultra cost-optimized. Uses free/near-free models:

```
SIMPLE  → nvidia/gpt-oss-120b      (FREE)
MEDIUM  → gemini-2.5-flash-lite    ($0.10/$0.40, 1M context)
COMPLEX → gemini-2.5-flash-lite    ($0.10/$0.40)
REASON  → grok-4-1-fast-reasoning  ($0.20/$0.50)
```

### Premium Profile

Best quality regardless of cost:

```
SIMPLE  → kimi-k2.5                ($0.60/$3.00)
MEDIUM  → gpt-5.3-codex            ($1.75/$14.00, 400K context)
COMPLEX → claude-opus-4.6          ($5.00/$25.00)
REASON  → claude-sonnet-4.6        ($3.00/$15.00)
```
### Fallback Chains

Each tier config includes an ordered fallback list. When the primary model returns a 402 (payment failed), 429 (rate limited), or a 5xx, the proxy walks the fallback chain. Fallback ordering is benchmark-informed:

```typescript
// COMPLEX tier — quality-first fallback order
fallback: [
  "google/gemini-3-pro-preview",    // IQ 48, 1,352ms
  "google/gemini-3-flash-preview",  // IQ 46, 1,398ms
  "xai/grok-4-0709",                // IQ 41, 1,348ms
  "google/gemini-2.5-pro",          // 1,294ms
  "anthropic/claude-sonnet-4.6",    // IQ 52, 2,110ms
  "deepseek/deepseek-chat",         // IQ 32, 1,431ms
  "google/gemini-2.5-flash",        // IQ 20, 1,238ms
  "openai/gpt-5.4",                 // IQ 57, 6,213ms — last resort
]
```

The chain descends by quality first (IQ 48 → 46 → 41), then trades quality for speed. GPT-5.4 is last despite having IQ 57, because its 6.2s latency is a worst-case user experience.
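The chain walk itself is straightforward. This is an illustrative sketch, not ClawRouter's proxy code: `callModel` is a stand-in for the provider call, and it is synchronous here for brevity where the real proxy would await each request.

```typescript
// Walk a fallback chain: 402 / 429 / 5xx advance to the next model,
// any other non-2xx error surfaces immediately.
type Attempt = { status: number; body?: string };

function routeWithFallback(
  chain: string[],
  callModel: (model: string) => Attempt,
): { model: string; body: string } {
  for (const model of chain) {
    const res = callModel(model);
    if (res.status >= 200 && res.status < 300) {
      return { model, body: res.body ?? "" };
    }
    const retryable =
      res.status === 402 || res.status === 429 || res.status >= 500;
    if (!retryable) {
      throw new Error(`non-retryable ${res.status} from ${model}`);
    }
    // Retryable failure: fall through to the next model in the chain.
  }
  throw new Error("all models in the fallback chain failed");
}
```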
## Step 6: Context-Aware Filtering

The fallback chain is filtered at runtime based on request properties:

1. **Context window filtering**: Models with insufficient context window for the estimated total tokens are excluded (with a 10% safety buffer)
2. **Tool calling filter**: When the request includes tool definitions, only models that support function calling are kept
3. **Vision filter**: When the request includes images, only vision-capable models are kept

If filtering eliminates all candidates, the full chain is used as a fallback (better to let the API error than return nothing).
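The three filters above compose into a single pass over the chain. This sketch assumes a hypothetical model-metadata shape; the field names are not ClawRouter's.

```typescript
// Runtime candidate filtering: context window (with 10% buffer),
// tool support, and vision support.
interface ModelInfo {
  id: string;
  contextWindow: number; // tokens
  supportsTools: boolean;
  supportsVision: boolean;
}

interface RequestProps {
  estimatedTokens: number;
  hasTools: boolean;
  hasImages: boolean;
}

function filterCandidates(chain: ModelInfo[], req: RequestProps): ModelInfo[] {
  const filtered = chain.filter((m) => {
    // 10% safety buffer on the estimated total tokens.
    if (m.contextWindow < req.estimatedTokens * 1.1) return false;
    if (req.hasTools && !m.supportsTools) return false;
    if (req.hasImages && !m.supportsVision) return false;
    return true;
  });
  // If filtering eliminates everything, fall back to the full chain and
  // let the provider API surface the error.
  return filtered.length > 0 ? filtered : chain;
}
```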
## Cost Calculation and Savings

Every routing decision includes a cost estimate and a savings percentage against a baseline (Claude Opus 4.6 pricing):

```typescript
savings = max(0, (opusCost - routedCost) / opusCost)
```

For a typical SIMPLE request (500 input tokens, 256 output tokens):

- Opus cost: $0.0089 (at $5.00/$25.00 per 1M tokens)
- Gemini Flash cost: $0.0008 (at $0.30/$2.50 per 1M tokens)
- Savings: 91.0%

Across our user base, the median savings rate is 85% compared to routing everything to a premium model.
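The arithmetic behind those numbers can be checked directly. This sketch reproduces the worked example above using the pricing figures from the article; the helper names are my own.

```typescript
// Cost and savings arithmetic. Prices are in dollars per 1M tokens.
interface Pricing { input: number; output: number }

const OPUS: Pricing = { input: 5.0, output: 25.0 };
const GEMINI_FLASH: Pricing = { input: 0.3, output: 2.5 };

function cost(p: Pricing, inTok: number, outTok: number): number {
  return (inTok * p.input + outTok * p.output) / 1_000_000;
}

function savings(routed: Pricing, inTok: number, outTok: number): number {
  const opusCost = cost(OPUS, inTok, outTok);
  const routedCost = cost(routed, inTok, outTok);
  return Math.max(0, (opusCost - routedCost) / opusCost);
}

// 500 input + 256 output tokens:
//   cost(OPUS)         = (500*5.00 + 256*25.00) / 1e6 = $0.0089
//   cost(GEMINI_FLASH) = (500*0.30 + 256*2.50)  / 1e6 ≈ $0.0008
//   savings            ≈ 0.91
```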
## Performance

The entire classification pipeline (14 dimensions + tier mapping + model selection) runs in under 1ms. No external API calls. No LLM inference. Pure keyword matching and arithmetic.

We originally designed a two-stage system in which low-confidence rule-based classifications would fall back to an LLM classifier (Gemini 2.5 Flash). In practice, the rules handle 70-80% of requests with high confidence, and the remaining ambiguous cases default to MEDIUM — which is the correct conservative choice.
## What We Learned

1. **Speed and intelligence are weakly correlated.** The fastest model (Grok 4 Fast, IQ 23) is at the bottom of the quality scale. The smartest low-latency model (Gemini 3.1 Pro, IQ 57, 1.6s) is a Google model, not an OpenAI one.

2. **Optimizing for one metric fails.** Latency-first routing breaks quality. Quality-first routing breaks latency budgets. You need multi-objective optimization.

3. **User retention is the real metric.** Our best-performing model for SIMPLE tasks isn't the cheapest or the fastest — it's Gemini 2.5 Flash (60% retention rate), which balances speed, cost, and just-enough quality.

4. **Fallback ordering matters more than primary selection.** The primary model handles the happy path. The fallback chain handles reality — rate limits, outages, payment failures. A well-ordered fallback chain is more important than picking the perfect primary.

5. **Rule-based classification is underrated.** 14 keyword dimensions with sigmoid confidence calibration handle 70-80% of requests correctly in <1ms. The remaining 20-30% default to a safe middle tier. For a routing system where every millisecond of overhead compounds across millions of requests, avoiding LLM inference in the classification step is worth the small loss in accuracy.

---

## Appendix: Full Benchmark Data

Raw data (46 models, latency, throughput, IQ scores, pricing): [`benchmark-merged.json`](https://github.com/BlockRunAI/ClawRouter/blob/main/benchmark-merged.json)

Routing configuration: [`src/router/config.ts`](https://github.com/BlockRunAI/ClawRouter/blob/main/src/router/config.ts)

Scoring implementation: [`src/router/rules.ts`](https://github.com/BlockRunAI/ClawRouter/blob/main/src/router/rules.ts)

---

*BlockRun is the x402 micropayment gateway for AI. One wallet, 46+ models, pay-per-request with USDC. [blockrun.ai](https://blockrun.ai)*
package/docs/troubleshooting.md
ADDED

@@ -0,0 +1,159 @@
# Troubleshooting

Quick solutions for common ClawRouter issues.

> Need help? [Open a Discussion](https://github.com/BlockRunAI/ClawRouter/discussions) or check [existing issues](https://github.com/BlockRunAI/ClawRouter/issues).

## Table of Contents

- [Quick Checklist](#quick-checklist)
- [Common Errors](#common-errors)
- [Security Scanner Warnings](#security-scanner-warnings)
- [Port Conflicts](#port-conflicts)
- [How to Update](#how-to-update)
- [Verify Routing](#verify-routing)

---
## Quick Checklist

```bash
# 1. Check your version (should be 0.12+)
cat ~/.openclaw/extensions/clawrouter/package.json | grep version

# 2. Check the proxy is running
curl http://localhost:8402/health

# 3. Check wallet (both EVM + Solana addresses and balance)
#    (run in an OpenClaw conversation)
/wallet

# 4. Watch routing in action
openclaw logs --follow
# Should see: kimi-k2.5 $0.0012 (saved 99%)

# 5. View cost savings (run in an OpenClaw conversation)
/stats
```

---
## Common Errors

### "Unknown model: blockrun/auto" or "Unknown model: auto"

The plugin isn't loaded or is outdated. **Don't change the model name** — `blockrun/auto` is correct.

**Fix:** Update to v0.3.21+, which handles both `blockrun/auto` and `auto` (OpenClaw strips the provider prefix). See [How to Update](#how-to-update).

### "No API key found for provider blockrun"

The auth profile is missing or wasn't created properly.

**Fix:** See [How to Update](#how-to-update) — the reinstall script automatically injects the auth profile.

### "Config validation failed: plugin not found: clawrouter"

The plugin directory was removed but the config still references it. This blocks all OpenClaw commands until fixed.

**Fix:** See [How to Update](#how-to-update) for complete cleanup steps.

### "No USDC balance" / "Insufficient funds"

Your wallet needs funding. ClawRouter accepts **USDC** (not SOL or ETH) on either chain.

**Fix:**

1. Find your wallet address: run `/wallet` in any OpenClaw conversation
2. Choose your preferred chain and send **USDC** to that address:
   - **Base (EVM):** Send USDC on the Base network to your EVM address (`0x...`)
   - **Solana:** Send USDC on the Solana network to your Solana address (base58)
3. $1–5 is enough for hundreds of requests
4. Restart OpenClaw (or wait up to 60s for the balance cache to refresh)

---
## Security Scanner Warnings

### "WARNING: dangerous code patterns — possible credential harvesting"

This is a **false positive**. ClawRouter legitimately:

1. Reads `BLOCKRUN_WALLET_KEY` from the environment (for authentication)
2. Sends authenticated requests to the BlockRun API (for x402 micropayments)

This pattern triggers OpenClaw's security scanner, but it's the intended behavior — the wallet key is required to sign payment transactions. The code is fully open source and auditable.

### "env-harvesting" Warning

OpenClaw's security scanner may flag ClawRouter with:

```
[env-harvesting] Environment variable access combined with network send
```

**This is a false positive.** The scanner's heuristic (`env variable + network request = suspicious`) flags all payment plugins, but this pattern is inherently required for non-custodial payments.

ClawRouter reads `BLOCKRUN_WALLET_KEY` to sign x402 payment transactions — this is required and intentional:

- The wallet key is used **locally** for cryptographic signing (EIP-712)
- The **signature** is transmitted, not the private key itself
- The key **never leaves the machine** — only cryptographic proofs are sent
- This is standard [x402 payment protocol](https://x402.org) behavior
- The source code is [MIT licensed and fully auditable](https://github.com/BlockRunAI/ClawRouter)

See [`openclaw.security.json`](../openclaw.security.json) for detailed security documentation and [this discussion](https://x.com/bc1beat/status/2020158972561428686) for more context.

---
## Port Conflicts

### Port 8402 already in use

As of v0.4.1, ClawRouter automatically detects and reuses an existing proxy on the configured port instead of failing with `EADDRINUSE`. You should no longer see this error.

If you need to use a different port:

```bash
# Set a custom port via environment variable
export BLOCKRUN_PROXY_PORT=8403
openclaw gateway restart
```

To manually check/kill the process:

```bash
lsof -i :8402
# Kill the process or restart OpenClaw
```

---
## How to Update

```bash
npx @blockrun/clawrouter@latest
openclaw gateway restart
```

This installs the latest version and restarts the gateway. Alternatively:

```bash
curl -fsSL https://raw.githubusercontent.com/BlockRunAI/ClawRouter/main/scripts/reinstall.sh | bash
openclaw gateway restart
```

---
## Verify Routing

```bash
openclaw logs --follow
```

You should see model selection for each request:

```
[plugins] [SIMPLE]    google/gemini-2.5-flash     $0.0012 (saved 99%)
[plugins] [MEDIUM]    deepseek/deepseek-chat      $0.0003 (saved 99%)
[plugins] [REASONING] deepseek/deepseek-reasoner  $0.0005 (saved 99%)
```
package/docs/vision.md
ADDED

@@ -0,0 +1,49 @@
# BlockRun Worker Network — Vision

## The Problem

Every company with a public API — blockchain protocols, AI providers, SaaS businesses — needs to prove to investors, customers, and regulators that their service is reliable. Today, that proof comes from centralized monitoring tools like UptimeRobot or Pingdom. The fundamental flaw: these reports are self-referential. A company can choose which tool monitors it, when to show the data, and how to present it. There is no independent, tamper-proof verification.

Meanwhile, ClawRouter users run AI agents that are idle the vast majority of the time. That idle compute has no economic value today.

## The Vision

Turn ClawRouter's distributed user base into the world's first **decentralized high-availability validation network** — where AI agents earn USDC by doing real verification work, and any company can purchase cryptographically verifiable proof that its service is always on.

The core insight: a health-check result signed by 50 independent nodes across 30 countries is fundamentally different from the same check run by a single company's monitoring vendor. It cannot be fabricated. It cannot be cherry-picked. It is, for the first time, **objective proof of uptime**.

## Why This Matters

We are entering a world where AI agents are the primary consumers of APIs. As agents proliferate, the reliability of the infrastructure they depend on becomes critical. A DeFi protocol that goes down at the wrong moment, an AI API that drops requests under load, a SaaS backend that silently fails — these are existential risks for the services built on top of them.

The companies that can prove they never go down will win. BlockRun provides that proof.

## Long-Term Ambition

Start with uptime monitoring. Expand to any verification task that benefits from decentralized, independent execution:

- **Phase 1**: HTTP health checks, latency measurement, SSL/DNS validation
- **Phase 2**: API contract verification (does the endpoint return what it promises?)
- **Phase 3**: Full agentic tasks — scheduled jobs, data pipelines, anything a ClawRouter agent can run

The worker network becomes the backbone of a new trust layer for the internet — maintained not by a single company but by thousands of independent agents earning for their work.

## The Flywheel

```
More ClawRouter users
  → More worker nodes → Better geographic coverage
  → Better product for verification buyers
  → More revenue → Higher worker earnings
  → More incentive to run ClawRouter
  → More ClawRouter users
```

Each side of the marketplace strengthens the other. ClawRouter users are simultaneously the supply (workers) and a natural source of demand (they build services that need monitoring). There is no cold-start problem.

## Why BlockRun Wins This

1. **Infrastructure already exists**: x402 micropayments, USDC wallets, a distributed user base — all live today
2. **No new trust required**: Workers are already ClawRouter users who've onboarded with a funded wallet
3. **Crypto-native from day one**: USDC settlement on Base, the x402 protocol — the payment layer is the differentiator
4. **Partnership leverage**: Built on Coinbase's x402 protocol, with natural alignment with the Base ecosystem
package/docs/vs-openrouter.md
ADDED

@@ -0,0 +1,157 @@
# ClawRouter vs OpenRouter
|
|
2
|
+
|
|
3
|
+
OpenRouter is a popular LLM routing service. Here's why ClawRouter is built differently — and why it matters for agents.
|
|
4
|
+
|
|
5
|
+
## TL;DR
|
|
6
|
+
|
|
7
|
+
**OpenRouter is built for developers. ClawRouter is built for agents.**
|
|
8
|
+
|
|
9
|
+
| Aspect | OpenRouter | ClawRouter |
|
|
10
|
+
| ------------------ | ------------------------------------- | -------------------------------------- |
|
|
11
|
+
| **Setup** | Human creates account, pastes API key | Agent generates wallet, receives funds |
|
|
12
|
+
| **Authentication** | API key (shared secret) | Wallet signature (cryptographic) |
|
|
13
|
+
| **Payment** | Prepaid balance (custodial) | Per-request USDC (non-custodial) |
|
|
14
|
+
| **Routing** | Server-side, proprietary | Client-side, open source, <1ms |
|
|
15
|
+
| **Rate limits** | Per-key quotas | None (your wallet, your limits) |
|
|
16
|
+
| **Empty balance** | Request fails | Auto-fallback to free tier |
|
|
17
|
+
|
|
18
|
+
---
|
|
19
|
+
|
|
20
|
+
## The Problem with API Keys
|
|
21
|
+
|
|
22
|
+
OpenRouter (and every traditional LLM gateway) uses API keys for authentication. This breaks agent autonomy:
|
|
23
|
+
|
|
24
|
+
### 1. Key Leakage in LLM Context
|
|
25
|
+
|
|
26
|
+
**OpenClaw Issue [#11202](https://github.com/openclaw/openclaw/issues/11202)**: API keys configured in `openclaw.json` are resolved and serialized into every LLM request payload. Every provider sees every other provider's keys.
|
|
27
|
+
|
|
28
|
+
> "OpenRouter sees your NVIDIA key, Anthropic sees your Google key... keys are sent on every turn."
|
|
29
|
+
|
|
30
|
+
**ClawRouter**: No API keys. Authentication happens via cryptographic wallet signatures. There's nothing to leak because there are no shared secrets.
### 2. Rate Limit Hell

**OpenClaw Issue [#8615](https://github.com/openclaw/openclaw/issues/8615)**: Single API key support means heavy users hit rate limits (429 errors) quickly. Users request multi-key load balancing, but that's just patching a broken model.

**ClawRouter**: Non-custodial wallets. You control your own keys. No shared rate limits. Scale by funding more wallets if needed.

### 3. Setup Friction

**OpenClaw Issues [#16257](https://github.com/openclaw/openclaw/issues/16257), [#16226](https://github.com/openclaw/openclaw/issues/16226)**: The latest installer skips model selection and shows "No auth configured for provider anthropic". Users can't even get started without debugging config.

**ClawRouter**: One-line install. 30+ models auto-configured. No API keys to paste.

### 4. Model Path Collision

**OpenClaw Issue [#2373](https://github.com/openclaw/openclaw/issues/2373)**: `openrouter/auto` is broken because OpenClaw prefixes all OpenRouter models with `openrouter/`, so the actual model becomes `openrouter/openrouter/auto`.
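The collision is easy to reproduce in a few lines. `qualifyModel` is a hypothetical stand-in for the prefixing step, not OpenClaw's actual code:

```typescript
// Naive prefixing with no check for an already-qualified model ID,
// the failure mode described in issue #2373.
function qualifyModel(provider: string, modelId: string): string {
  return `${provider}/${modelId}`;
}

qualifyModel("openrouter", "auto");            // "openrouter/auto" (fine)
qualifyModel("openrouter", "openrouter/auto"); // "openrouter/openrouter/auto" (broken)
```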
**ClawRouter**: Clean namespace. `blockrun/auto` just works. No prefix collision.

### 5. False Billing Errors

**OpenClaw Issue [#16237](https://github.com/openclaw/openclaw/issues/16237)**: The regex `/\b402\b/` falsely matches normal content (e.g., "402 calories") as a billing error, replacing valid AI responses with error messages.
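The false positive is easy to demonstrate. `looksLikeBillingError` is a hypothetical stand-in built around the same pattern, not OpenClaw's actual function:

```typescript
// Minimal repro of the bug: a naive billing-error check that scans the
// response body for a bare "402", as described in issue #16237.
const BILLING_ERROR_RE = /\b402\b/;

function looksLikeBillingError(responseText: string): boolean {
  return BILLING_ERROR_RE.test(responseText);
}

// A real payment-required status line matches, as intended...
looksLikeBillingError("HTTP 402 Payment Required");        // true

// ...but so does ordinary prose, clobbering a valid AI response:
looksLikeBillingError("That workout burns about 402 calories."); // true
```

The word boundary `\b` only guards against digits glued to other word characters (like "1402"); it cannot distinguish a status code from a number that happens to appear in content.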
**ClawRouter**: Native x402 protocol support. Precise error handling. No regex hacks.

### 6. Unknown Model Failures

**OpenClaw Issues [#16277](https://github.com/openclaw/openclaw/issues/16277), [#10687](https://github.com/openclaw/openclaw/issues/10687)**: Static model catalog causes "Unknown model" errors when providers add new models or during sub-agent spawns.

**ClawRouter**: 30+ models pre-configured, auto-updated catalog.

---

## Agent-Native: Why It Matters

Traditional LLM gateways require a human in the loop:

```
Traditional Flow (Human-in-the-loop):
Human → creates account → gets API key → pastes into config → agent runs

Agent-Native Flow (Fully autonomous):
Agent → generates wallet → receives USDC → pays per request → runs
```

| Capability | OpenRouter | ClawRouter |
| -------------------- | ----------------------- | -------------------------- |
| **Account creation** | Requires human | Agent generates wallet |
| **Authentication** | Shared secret (API key) | Cryptographic signature |
| **Payment** | Human prepays balance | Agent pays per request |
| **Funds custody** | They hold your money | You hold your keys |
| **Empty balance** | Request fails | Auto-fallback to free tier |

### The x402 Difference

```
Request → 402 Response (price: $0.003)
        → Agent's wallet signs payment
        → Response delivered

No accounts. No API keys. No human intervention.
```
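From the client's side, that handshake can be sketched as below. The types and the `signPayment` callback are assumptions for illustration (a real wallet's signing call goes there), and `doFetch` is injected so the flow can be exercised without a live gateway:

```typescript
// Minimal response/fetch shapes so the sketch is self-contained.
type PaymentQuote = { price: string };
type MinimalResponse = { status: number; json(): Promise<any> };
type FetchLike = (url: string, init: {
  method: string;
  headers?: Record<string, string>;
  body: string;
}) => Promise<MinimalResponse>;

async function fetchWithX402(
  doFetch: FetchLike,
  url: string,
  body: unknown,
  signPayment: (quote: PaymentQuote) => Promise<string>,
): Promise<MinimalResponse> {
  const payload = JSON.stringify(body);

  // First attempt: no payment attached.
  const first = await doFetch(url, { method: "POST", body: payload });
  if (first.status !== 402) return first; // nothing to pay

  // The 402 body carries the price quote; the wallet signs it locally.
  const quote: PaymentQuote = await first.json();
  const proof = await signPayment(quote);

  // Retry with the signed payment attached
  // (X-PAYMENT is the header name assumed here, following the x402 spec).
  return doFetch(url, {
    method: "POST",
    headers: { "X-PAYMENT": proof },
    body: payload,
  });
}
```

Note there is no account lookup anywhere in the loop: the quote arrives in-band, the signature travels in-band, and the second request is the only retry needed.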
**Agents can:**

- Spawn with a fresh wallet
- Receive funds programmatically
- Pay for exactly what they use
- Never trust a third party with their funds

---

## Routing: Cloud vs Local

### OpenRouter

- Routing decisions happen on OpenRouter's servers
- You trust their proprietary algorithm
- No visibility into why a model was chosen
- Adds latency to every request

### ClawRouter

- **100% local routing** — 15-dimension weighted scoring runs on YOUR machine
- **<1ms decisions** — no API calls for routing
- **Open source** — inspect the exact scoring logic in [`src/router.ts`](../src/router.ts)
- **Transparent** — see why each model is chosen
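For a sense of why the decision is sub-millisecond: the scoring is plain arithmetic over a small weight table, no I/O involved. A toy version, with illustrative dimensions and weights rather than the actual table in `src/router.ts`:

```typescript
type Scores = Record<string, number>; // dimension name -> feature value in [0, 1]

interface Candidate {
  model: string;
  features: Scores;
}

// Hypothetical subset of the scoring dimensions; the real router uses
// more dimensions and different weights.
const WEIGHTS: Scores = {
  codeAffinity: 0.3,
  contextFit: 0.25,
  costEfficiency: 0.25,
  latency: 0.2,
};

// Weighted sum of a candidate's per-dimension features.
function score(features: Scores): number {
  let total = 0;
  for (const [dim, weight] of Object.entries(WEIGHTS)) {
    total += weight * (features[dim] ?? 0);
  }
  return total;
}

// Highest score wins (assumes at least one candidate).
function pickModel(candidates: Candidate[]): string {
  let best = candidates[0];
  for (const c of candidates.slice(1)) {
    if (score(c.features) > score(best.features)) best = c;
  }
  return best.model;
}
```

Because everything is a local function call, the choice is also inspectable: log each candidate's score and you can see exactly why a model won.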
---

## Quick Start

Already using OpenRouter? Switch in 60 seconds:

```bash
# 1. Install ClawRouter
curl -fsSL https://blockrun.ai/ClawRouter-update | bash

# 2. Restart gateway
openclaw gateway restart

# 3. Fund wallet (address shown during install)
#    $5 USDC on Base = thousands of requests

# 4. Switch model
/model blockrun/auto
```

Your OpenRouter config stays intact — ClawRouter is additive, not a replacement.

---

## Summary

> **OpenRouter**: Built for developers who paste API keys
>
> **ClawRouter**: Built for agents that manage their own wallets

The future of AI isn't humans configuring API keys. It's agents autonomously acquiring and paying for resources.

---

<div align="center">

**Questions?** [Telegram](https://t.me/blockrunAI) · [X](https://x.com/BlockRunAI) · [GitHub](https://github.com/BlockRunAI/ClawRouter)

</div>