@blockrun/clawrouter 0.12.63 → 0.12.65

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (34)
  1. package/README.md +55 -55
  2. package/dist/cli.js +50 -14
  3. package/dist/cli.js.map +1 -1
  4. package/dist/index.js +57 -16
  5. package/dist/index.js.map +1 -1
  6. package/docs/anthropic-cost-savings.md +90 -85
  7. package/docs/architecture.md +12 -12
  8. package/docs/{blog-openclaw-cost-overruns.md → clawrouter-cuts-llm-api-costs-500x.md} +27 -27
  9. package/docs/clawrouter-vs-openrouter-llm-routing-comparison.md +280 -0
  10. package/docs/configuration.md +2 -2
  11. package/docs/image-generation.md +39 -39
  12. package/docs/{blog-benchmark-2026-03.md → llm-router-benchmark-46-models-sub-1ms-routing.md} +61 -64
  13. package/docs/routing-profiles.md +6 -6
  14. package/docs/{technical-routing-2026-03.md → smart-llm-router-14-dimension-classifier.md} +29 -28
  15. package/docs/worker-network.md +438 -347
  16. package/package.json +3 -2
  17. package/scripts/reinstall.sh +31 -6
  18. package/scripts/update.sh +6 -1
  19. package/docs/assets/blockrun-248-day-cost-overrun-problem.png +0 -0
  20. package/docs/assets/blockrun-clawrouter-7-layer-token-compression-openclaw.png +0 -0
  21. package/docs/assets/blockrun-clawrouter-observation-compression-97-percent-token-savings.png +0 -0
  22. package/docs/assets/blockrun-clawrouter-openclaw-agentic-proxy-architecture.png +0 -0
  23. package/docs/assets/blockrun-clawrouter-openclaw-automatic-tier-routing-model-selection.png +0 -0
  24. package/docs/assets/blockrun-clawrouter-openclaw-error-classification-retry-storm-prevention.png +0 -0
  25. package/docs/assets/blockrun-clawrouter-openclaw-session-memory-journaling-vs-context-compounding.png +0 -0
  26. package/docs/assets/blockrun-clawrouter-vs-openclaw-standalone-comparison-production-safety.png +0 -0
  27. package/docs/assets/blockrun-clawrouter-x402-usdc-micropayment-wallet-budget-control.png +0 -0
  28. package/docs/assets/blockrun-openclaw-inference-layer-blind-spots.png +0 -0
  29. package/docs/plans/2026-02-03-smart-routing-design.md +0 -267
  30. package/docs/plans/2026-02-13-e2e-docker-deployment.md +0 -1260
  31. package/docs/plans/2026-02-28-worker-network.md +0 -947
  32. package/docs/plans/2026-03-18-error-classification.md +0 -574
  33. package/docs/plans/2026-03-19-exclude-models.md +0 -538
  34. package/docs/vs-openrouter.md +0 -157
@@ -1,6 +1,6 @@
  # Building a Smart LLM Router: How We Benchmarked 46 Models and Built a 14-Dimension Classifier

- *March 20, 2026 | BlockRun Engineering*
+ _March 20, 2026 | BlockRun Engineering_

  When you route AI requests across 46 models from 8 providers, you can't just pick the cheapest one. You can't just pick the fastest one either. We learned this the hard way.

@@ -170,22 +170,22 @@ User Prompt → Lowercase + Tokenize

  ### The 14 Dimensions

- | Dimension | Weight | What It Detects | Score Range |
- |-----------|--------|-----------------|-------------|
- | reasoningMarkers | 0.18 | "prove", "theorem", "step by step" | 0 to 1.0 |
- | codePresence | 0.15 | "function", "class", "import", "```" | 0 to 1.0 |
- | multiStepPatterns | 0.12 | "first...then", "step N", numbered lists | 0 or 0.5 |
- | technicalTerms | 0.10 | "algorithm", "kubernetes", "distributed" | 0 to 1.0 |
- | tokenCount | 0.08 | Short (<50 tokens) vs long (>500 tokens) | -1.0 to 1.0 |
- | creativeMarkers | 0.05 | "story", "poem", "brainstorm" | 0 to 0.7 |
- | questionComplexity | 0.05 | Number of question marks (>3 = complex) | 0 or 0.5 |
- | agenticTask | 0.04 | "edit", "deploy", "fix", "debug" | 0 to 1.0 |
- | constraintCount | 0.04 | "at most", "within", "O()" | 0 to 0.7 |
- | imperativeVerbs | 0.03 | "build", "create", "implement" | 0 to 0.5 |
- | outputFormat | 0.03 | "json", "yaml", "table", "csv" | 0 to 0.7 |
- | simpleIndicators | 0.02 | "what is", "hello", "define" | 0 to -1.0 |
- | referenceComplexity | 0.02 | "the code above", "the API docs" | 0 to 0.5 |
- | domainSpecificity | 0.02 | "quantum", "FPGA", "genomics" | 0 to 0.8 |
+ | Dimension           | Weight | What It Detects                          | Score Range |
+ | ------------------- | ------ | ---------------------------------------- | ----------- |
+ | reasoningMarkers    | 0.18   | "prove", "theorem", "step by step"       | 0 to 1.0    |
+ | codePresence        | 0.15   | "function", "class", "import", "```"     | 0 to 1.0    |
+ | multiStepPatterns   | 0.12   | "first...then", "step N", numbered lists | 0 or 0.5    |
+ | technicalTerms      | 0.10   | "algorithm", "kubernetes", "distributed" | 0 to 1.0    |
+ | tokenCount          | 0.08   | Short (<50 tokens) vs long (>500 tokens) | -1.0 to 1.0 |
+ | creativeMarkers     | 0.05   | "story", "poem", "brainstorm"            | 0 to 0.7    |
+ | questionComplexity  | 0.05   | Number of question marks (>3 = complex)  | 0 or 0.5    |
+ | agenticTask         | 0.04   | "edit", "deploy", "fix", "debug"         | 0 to 1.0    |
+ | constraintCount     | 0.04   | "at most", "within", "O()"               | 0 to 0.7    |
+ | imperativeVerbs     | 0.03   | "build", "create", "implement"           | 0 to 0.5    |
+ | outputFormat        | 0.03   | "json", "yaml", "table", "csv"           | 0 to 0.7    |
+ | simpleIndicators    | 0.02   | "what is", "hello", "define"             | 0 to -1.0   |
+ | referenceComplexity | 0.02   | "the code above", "the API docs"         | 0 to 0.5    |
+ | domainSpecificity   | 0.02   | "quantum", "FPGA", "genomics"            | 0 to 0.8    |

  Weights sum to 1.0. The weighted score maps to a continuous axis where tier boundaries partition the space.

@@ -251,15 +251,15 @@ Each tier config includes an ordered fallback list. When the primary model retur
  ```typescript
  // COMPLEX tier — quality-first fallback order
  fallback: [
- "google/gemini-3-pro-preview", // IQ 48, 1,352ms
- "google/gemini-3-flash-preview", // IQ 46, 1,398ms
- "xai/grok-4-0709", // IQ 41, 1,348ms
- "google/gemini-2.5-pro", // 1,294ms
- "anthropic/claude-sonnet-4.6", // IQ 52, 2,110ms
- "deepseek/deepseek-chat", // IQ 32, 1,431ms
- "google/gemini-2.5-flash", // IQ 20, 1,238ms
- "openai/gpt-5.4", // IQ 57, 6,213ms — last resort
- ]
+ "google/gemini-3-pro-preview", // IQ 48, 1,352ms
+ "google/gemini-3-flash-preview", // IQ 46, 1,398ms
+ "xai/grok-4-0709", // IQ 41, 1,348ms
+ "google/gemini-2.5-pro", // 1,294ms
+ "anthropic/claude-sonnet-4.6", // IQ 52, 2,110ms
+ "deepseek/deepseek-chat", // IQ 32, 1,431ms
+ "google/gemini-2.5-flash", // IQ 20, 1,238ms
+ "openai/gpt-5.4", // IQ 57, 6,213ms — last resort
+ ];
  ```

  The chain descends by quality first (IQ 48 → 46 → 41), then trades quality for speed. GPT-5.4 is last despite having IQ 57, because its 6.2s latency is a worst-case user experience.
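The descent behavior this hunk documents can be sketched as a simple walk down the ordered chain. This is a hypothetical simplification: `routeWithFallback` is not ClawRouter's API, and the real router classifies errors before retrying rather than treating every failure as retryable:

```typescript
// The COMPLEX-tier fallback order from the diff above.
const COMPLEX_FALLBACK = [
  "google/gemini-3-pro-preview",
  "google/gemini-3-flash-preview",
  "xai/grok-4-0709",
  "google/gemini-2.5-pro",
  "anthropic/claude-sonnet-4.6",
  "deepseek/deepseek-chat",
  "google/gemini-2.5-flash",
  "openai/gpt-5.4", // last resort: highest IQ, worst latency
];

type CallFn = (model: string) => Promise<string>;

// Try each model in order; on failure, descend to the next entry.
// If every model fails, surface the last error to the caller.
async function routeWithFallback(chain: string[], call: CallFn): Promise<string> {
  let lastError: unknown = new Error("empty fallback chain");
  for (const model of chain) {
    try {
      return await call(model);
    } catch (err) {
      lastError = err; // assumed retryable; move down the chain
    }
  }
  throw lastError;
}
```

If the first two models return errors, the walk settles on `xai/grok-4-0709` without ever touching the slower tail of the chain.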
@@ -279,10 +279,11 @@ If filtering eliminates all candidates, the full chain is used as a fallback (be
  Every routing decision includes a cost estimate and savings percentage against a baseline (Claude Opus 4.6 pricing):

  ```typescript
- savings = max(0, (opusCost - routedCost) / opusCost)
+ savings = max(0, (opusCost - routedCost) / opusCost);
  ```

  For a typical SIMPLE request (500 input tokens, 256 output tokens):
+
  - Opus cost: $0.0089 (at $5.00/$25.00 per 1M tokens)
  - Gemini Flash cost: $0.0008 (at $0.30/$2.50 per 1M tokens)
  - Savings: 91.0%
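The bullet-list arithmetic can be reproduced directly from the per-1M-token prices quoted in the post; `cost` is a helper introduced here for illustration, not part of ClawRouter:

```typescript
// Cost of a request given token counts and per-1M-token prices (USD).
function cost(inTok: number, outTok: number, inPrice: number, outPrice: number): number {
  return (inTok * inPrice + outTok * outPrice) / 1_000_000;
}

const opusCost = cost(500, 256, 5.0, 25.0);  // 0.0025 + 0.0064 = 0.0089
const routedCost = cost(500, 256, 0.3, 2.5); // 0.00015 + 0.00064 = 0.00079

// The savings formula from the diff above.
const savings = Math.max(0, (opusCost - routedCost) / opusCost);
// ≈ 0.911; the post rounds the Flash cost to $0.0008, giving its 91.0% figure.
```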
@@ -319,4 +320,4 @@ Scoring implementation: [`src/router/rules.ts`](https://github.com/BlockRunAI/Cl

  ---

- *BlockRun is the x402 micropayment gateway for AI. One wallet, 46+ models, pay-per-request with USDC. [blockrun.ai](https://blockrun.ai)*
+ _BlockRun is the x402 micropayment gateway for AI. One wallet, 46+ models, pay-per-request with USDC. [blockrun.ai](https://blockrun.ai)_