@blockrun/clawrouter 0.12.64 → 0.12.65
- package/README.md +55 -55
- package/dist/cli.js +50 -14
- package/dist/cli.js.map +1 -1
- package/dist/index.js +57 -16
- package/dist/index.js.map +1 -1
- package/docs/anthropic-cost-savings.md +90 -85
- package/docs/architecture.md +12 -12
- package/docs/{blog-openclaw-cost-overruns.md → clawrouter-cuts-llm-api-costs-500x.md} +27 -27
- package/docs/clawrouter-vs-openrouter-llm-routing-comparison.md +280 -0
- package/docs/configuration.md +2 -2
- package/docs/image-generation.md +39 -39
- package/docs/{blog-benchmark-2026-03.md → llm-router-benchmark-46-models-sub-1ms-routing.md} +61 -64
- package/docs/routing-profiles.md +6 -6
- package/docs/{technical-routing-2026-03.md → smart-llm-router-14-dimension-classifier.md} +29 -28
- package/docs/worker-network.md +438 -347
- package/package.json +1 -1
- package/scripts/reinstall.sh +31 -6
- package/scripts/update.sh +6 -1
- package/docs/vs-openrouter.md +0 -157
@@ -1,6 +1,6 @@
 # Building a Smart LLM Router: How We Benchmarked 46 Models and Built a 14-Dimension Classifier
 
-
+_March 20, 2026 | BlockRun Engineering_
 
 When you route AI requests across 46 models from 8 providers, you can't just pick the cheapest one. You can't just pick the fastest one either. We learned this the hard way.
 
@@ -170,22 +170,22 @@ User Prompt → Lowercase + Tokenize
 
 ### The 14 Dimensions
 
-| Dimension
-
-| reasoningMarkers
-| codePresence
-| multiStepPatterns
-| technicalTerms
-| tokenCount
-| creativeMarkers
-| questionComplexity
-| agenticTask
-| constraintCount
-| imperativeVerbs
-| outputFormat
-| simpleIndicators
-| referenceComplexity | 0.02
-| domainSpecificity
+| Dimension           | Weight | What It Detects                          | Score Range |
+| ------------------- | ------ | ---------------------------------------- | ----------- |
+| reasoningMarkers    | 0.18   | "prove", "theorem", "step by step"       | 0 to 1.0    |
+| codePresence        | 0.15   | "function", "class", "import", "```"     | 0 to 1.0    |
+| multiStepPatterns   | 0.12   | "first...then", "step N", numbered lists | 0 or 0.5    |
+| technicalTerms      | 0.10   | "algorithm", "kubernetes", "distributed" | 0 to 1.0    |
+| tokenCount          | 0.08   | Short (<50 tokens) vs long (>500 tokens) | -1.0 to 1.0 |
+| creativeMarkers     | 0.05   | "story", "poem", "brainstorm"            | 0 to 0.7    |
+| questionComplexity  | 0.05   | Number of question marks (>3 = complex)  | 0 or 0.5    |
+| agenticTask         | 0.04   | "edit", "deploy", "fix", "debug"         | 0 to 1.0    |
+| constraintCount     | 0.04   | "at most", "within", "O()"               | 0 to 0.7    |
+| imperativeVerbs     | 0.03   | "build", "create", "implement"           | 0 to 0.5    |
+| outputFormat        | 0.03   | "json", "yaml", "table", "csv"           | 0 to 0.7    |
+| simpleIndicators    | 0.02   | "what is", "hello", "define"             | 0 to -1.0   |
+| referenceComplexity | 0.02   | "the code above", "the API docs"         | 0 to 0.5    |
+| domainSpecificity   | 0.02   | "quantum", "FPGA", "genomics"            | 0 to 0.8    |
 
 Weights sum to 1.0. The weighted score maps to a continuous axis where tier boundaries partition the space.
 
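To make the weighting scheme in the hunk above concrete: each dimension produces a score in its range, and the weighted sum places the prompt on the routing axis. Here is a minimal TypeScript sketch of that combination step. The weights come from the table in this diff; `weightedScore` and the example inputs are hypothetical illustrations, not ClawRouter's actual `src/router/rules.ts` implementation.

```typescript
// Weighted-score sketch. Weights are from the 14-dimension table above;
// the function shape and example scores are assumptions for illustration.
const WEIGHTS: Record<string, number> = {
  reasoningMarkers: 0.18,
  codePresence: 0.15,
  multiStepPatterns: 0.12,
  technicalTerms: 0.10,
  tokenCount: 0.08,
  creativeMarkers: 0.05,
  questionComplexity: 0.05,
  agenticTask: 0.04,
  constraintCount: 0.04,
  imperativeVerbs: 0.03,
  outputFormat: 0.03,
  simpleIndicators: 0.02,
  referenceComplexity: 0.02,
  domainSpecificity: 0.02,
};

// Combine per-dimension scores into one number on the routing axis.
// Dimensions missing from `scores` contribute 0.
function weightedScore(scores: Record<string, number>): number {
  let sum = 0;
  for (const [dim, weight] of Object.entries(WEIGHTS)) {
    sum += weight * (scores[dim] ?? 0);
  }
  return sum;
}

// A prompt that signals heavy reasoning plus some code, nothing else:
const score = weightedScore({ reasoningMarkers: 1.0, codePresence: 0.8 });
console.log(score.toFixed(2)); // "0.30"
```

Tier boundaries then partition this axis, so a prompt scoring 0.30 would land wherever the configured thresholds place that value.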
@@ -251,15 +251,15 @@ Each tier config includes an ordered fallback list. When the primary model retur
 ```typescript
 // COMPLEX tier — quality-first fallback order
 fallback: [
-  "google/gemini-3-pro-preview",
-  "google/gemini-3-flash-preview",
-  "xai/grok-4-0709",
-  "google/gemini-2.5-pro",
-  "anthropic/claude-sonnet-4.6",
-  "deepseek/deepseek-chat",
-  "google/gemini-2.5-flash",
-  "openai/gpt-5.4",
-]
+  "google/gemini-3-pro-preview",   // IQ 48, 1,352ms
+  "google/gemini-3-flash-preview", // IQ 46, 1,398ms
+  "xai/grok-4-0709",               // IQ 41, 1,348ms
+  "google/gemini-2.5-pro",         // 1,294ms
+  "anthropic/claude-sonnet-4.6",   // IQ 52, 2,110ms
+  "deepseek/deepseek-chat",        // IQ 32, 1,431ms
+  "google/gemini-2.5-flash",       // IQ 20, 1,238ms
+  "openai/gpt-5.4",                // IQ 57, 6,213ms — last resort
+];
 ```
 
 The chain descends by quality first (IQ 48 → 46 → 41), then trades quality for speed. GPT-5.4 is last despite having IQ 57, because its 6.2s latency is a worst-case user experience.
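The fallback traversal described in this hunk can be sketched in a few lines. This is a hypothetical, synchronous stand-in (a real router's provider calls would be async), and `completeWithFallback` and the simulated 429 are illustrative names, not ClawRouter's actual code; only the model IDs come from the chain above.

```typescript
// Ordered-fallback sketch: try each model in order until one succeeds.
// First three entries of the COMPLEX-tier chain from the diff above.
const chain = [
  "google/gemini-3-pro-preview",
  "google/gemini-3-flash-preview",
  "xai/grok-4-0709",
];

function completeWithFallback<T>(
  models: string[],
  call: (model: string) => T,
): T {
  let lastError: unknown;
  for (const model of models) {
    try {
      return call(model); // first model that answers wins
    } catch (err) {
      lastError = err; // e.g. a 429 or 5xx: try the next model in order
    }
  }
  throw lastError; // chain exhausted — surface the final failure
}

// Simulate the primary model rate-limiting; routing falls through to #2.
const served = completeWithFallback(chain, (model) => {
  if (model === "google/gemini-3-pro-preview") throw new Error("429");
  return model;
});
console.log(served); // "google/gemini-3-flash-preview"
```

Keeping the chain as plain ordered data is what lets each tier encode its own quality-versus-latency tradeoff without changing the traversal logic.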
@@ -279,10 +279,11 @@ If filtering eliminates all candidates, the full chain is used as a fallback (be
 Every routing decision includes a cost estimate and savings percentage against a baseline (Claude Opus 4.6 pricing):
 
 ```typescript
-savings = max(0, (opusCost - routedCost) / opusCost)
+savings = max(0, (opusCost - routedCost) / opusCost);
 ```
 
 For a typical SIMPLE request (500 input tokens, 256 output tokens):
+
 - Opus cost: $0.0089 (at $5.00/$25.00 per 1M tokens)
 - Gemini Flash cost: $0.0008 (at $0.30/$2.50 per 1M tokens)
 - Savings: 91.0%
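The arithmetic behind that savings figure checks out, and can be reproduced with a short sketch. The prices are the per-1M-token rates quoted in the hunk above; `Pricing` and `cost` are hypothetical helpers, not ClawRouter's implementation.

```typescript
// Cost-savings sketch using the per-1M-token prices quoted in the post.
type Pricing = { input: number; output: number }; // USD per 1M tokens

function cost(p: Pricing, inputTokens: number, outputTokens: number): number {
  return (inputTokens * p.input + outputTokens * p.output) / 1_000_000;
}

const opus: Pricing = { input: 5.0, output: 25.0 };
const geminiFlash: Pricing = { input: 0.3, output: 2.5 };

// The SIMPLE-tier example from the post: 500 input, 256 output tokens.
const opusCost = cost(opus, 500, 256);          // 0.0025 + 0.0064 = 0.0089
const routedCost = cost(geminiFlash, 500, 256); // 0.00015 + 0.00064 = 0.00079
const savings = Math.max(0, (opusCost - routedCost) / opusCost);
console.log((savings * 100).toFixed(1)); // "91.1"
```

The post's 91.0% uses the Gemini Flash cost rounded to $0.0008; with unrounded costs the ratio comes out to 91.1%, so the two figures agree to within rounding.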
@@ -319,4 +320,4 @@ Scoring implementation: [`src/router/rules.ts`](https://github.com/BlockRunAI/Cl
 
 ---
 
-
+_BlockRun is the x402 micropayment gateway for AI. One wallet, 46+ models, pay-per-request with USDC. [blockrun.ai](https://blockrun.ai)_