@blockrun/clawrouter 0.12.63 → 0.12.65
- package/README.md +55 -55
- package/dist/cli.js +50 -14
- package/dist/cli.js.map +1 -1
- package/dist/index.js +57 -16
- package/dist/index.js.map +1 -1
- package/docs/anthropic-cost-savings.md +90 -85
- package/docs/architecture.md +12 -12
- package/docs/{blog-openclaw-cost-overruns.md → clawrouter-cuts-llm-api-costs-500x.md} +27 -27
- package/docs/clawrouter-vs-openrouter-llm-routing-comparison.md +280 -0
- package/docs/configuration.md +2 -2
- package/docs/image-generation.md +39 -39
- package/docs/{blog-benchmark-2026-03.md → llm-router-benchmark-46-models-sub-1ms-routing.md} +61 -64
- package/docs/routing-profiles.md +6 -6
- package/docs/{technical-routing-2026-03.md → smart-llm-router-14-dimension-classifier.md} +29 -28
- package/docs/worker-network.md +438 -347
- package/package.json +3 -2
- package/scripts/reinstall.sh +31 -6
- package/scripts/update.sh +6 -1
- package/docs/assets/blockrun-248-day-cost-overrun-problem.png +0 -0
- package/docs/assets/blockrun-clawrouter-7-layer-token-compression-openclaw.png +0 -0
- package/docs/assets/blockrun-clawrouter-observation-compression-97-percent-token-savings.png +0 -0
- package/docs/assets/blockrun-clawrouter-openclaw-agentic-proxy-architecture.png +0 -0
- package/docs/assets/blockrun-clawrouter-openclaw-automatic-tier-routing-model-selection.png +0 -0
- package/docs/assets/blockrun-clawrouter-openclaw-error-classification-retry-storm-prevention.png +0 -0
- package/docs/assets/blockrun-clawrouter-openclaw-session-memory-journaling-vs-context-compounding.png +0 -0
- package/docs/assets/blockrun-clawrouter-vs-openclaw-standalone-comparison-production-safety.png +0 -0
- package/docs/assets/blockrun-clawrouter-x402-usdc-micropayment-wallet-budget-control.png +0 -0
- package/docs/assets/blockrun-openclaw-inference-layer-blind-spots.png +0 -0
- package/docs/plans/2026-02-03-smart-routing-design.md +0 -267
- package/docs/plans/2026-02-13-e2e-docker-deployment.md +0 -1260
- package/docs/plans/2026-02-28-worker-network.md +0 -947
- package/docs/plans/2026-03-18-error-classification.md +0 -574
- package/docs/plans/2026-03-19-exclude-models.md +0 -538
- package/docs/vs-openrouter.md +0 -157
package/docs/clawrouter-vs-openrouter-llm-routing-comparison.md
ADDED

@@ -0,0 +1,280 @@

# We Read 100 OpenClaw Issues About OpenRouter. Here's What We Built Instead.

> _OpenRouter is the most popular LLM aggregator. It's also the source of the most frustration in OpenClaw's issue tracker._

---

## The Data

We searched OpenClaw's GitHub issues for "openrouter" and read every result. 100 issues. Open and closed. Filed by users who ran into the same structural problems over and over:

| Category                        | Issue Count | Representative Issues |
| ------------------------------- | ----------- | --------------------- |
| **Broken fallback / failover**  | ~20         | [#22136](https://github.com/openclaw/openclaw/issues/22136), [#45663](https://github.com/openclaw/openclaw/issues/45663), [#50389](https://github.com/openclaw/openclaw/issues/50389), [#49079](https://github.com/openclaw/openclaw/issues/49079) |
| **Model ID mangling**           | ~15         | [#49379](https://github.com/openclaw/openclaw/issues/49379), [#50711](https://github.com/openclaw/openclaw/issues/50711), [#25665](https://github.com/openclaw/openclaw/issues/25665), [#2373](https://github.com/openclaw/openclaw/issues/2373) |
| **Authentication / 401 errors** | ~8          | [#51056](https://github.com/openclaw/openclaw/issues/51056), [#34830](https://github.com/openclaw/openclaw/issues/34830), [#26960](https://github.com/openclaw/openclaw/issues/26960) |
| **Cost / billing opacity**      | ~6          | [#25371](https://github.com/openclaw/openclaw/issues/25371), [#50738](https://github.com/openclaw/openclaw/issues/50738), [#38248](https://github.com/openclaw/openclaw/issues/38248) |
| **Routing opacity**             | ~5          | [#7006](https://github.com/openclaw/openclaw/issues/7006), [#35842](https://github.com/openclaw/openclaw/issues/35842) |
| **Missing feature parity**      | ~10         | [#46255](https://github.com/openclaw/openclaw/issues/46255), [#50485](https://github.com/openclaw/openclaw/issues/50485), [#30850](https://github.com/openclaw/openclaw/issues/30850) |
| **Rate limit / key exhaustion** | ~4          | [#8615](https://github.com/openclaw/openclaw/issues/8615), [#48729](https://github.com/openclaw/openclaw/issues/48729) |
| **Model catalog staleness**     | ~5          | [#10687](https://github.com/openclaw/openclaw/issues/10687), [#30152](https://github.com/openclaw/openclaw/issues/30152) |

These aren't edge cases. They're structural consequences of how OpenRouter works: a middleman that adds latency, mangles model IDs, obscures routing decisions, and introduces its own failure modes on top of the providers it aggregates.

---

## 1. Broken Fallback — The #1 Pain Point

From [#45663](https://github.com/openclaw/openclaw/issues/45663):

> _"Provider returned error from OpenRouter does not trigger model failover."_

From [#50389](https://github.com/openclaw/openclaw/issues/50389):

> _"Rate limit errors surfaced to user instead of auto-failover."_

When OpenRouter returns a 429 or a provider error, OpenClaw's failover logic often doesn't recognize it as retriable. The user sees a raw error. The agent stops. ~20 issues document variations of this: HTTP 529 (Anthropic overloaded) not triggering fallback ([#49079](https://github.com/openclaw/openclaw/issues/49079)), invalid model IDs causing a 400 instead of failover ([#50017](https://github.com/openclaw/openclaw/issues/50017)), timeouts in cron sessions with no recovery ([#49597](https://github.com/openclaw/openclaw/issues/49597)).

### How ClawRouter Solves This

ClawRouter maintains 8-deep fallback chains per routing tier. When a model fails:

1. **200ms retry** — short-burst rate limits often recover in milliseconds
2. **Next model** — if the retry fails, move to the next model in the chain
3. **Per-model isolation** — one provider's failure doesn't poison the others
4. **All-failed summary** — if every model in the chain fails, you get a structured error listing every attempt and failure reason

```
[ClawRouter] Trying model 1/6: google/gemini-2.5-flash
[ClawRouter] Model google/gemini-2.5-flash returned 429, retrying in 200ms...
[ClawRouter] Retry failed, trying model 2/6: deepseek/deepseek-chat
[ClawRouter] Success with model: deepseek/deepseek-chat
```

No silent failures. No raw 429s surfaced to the agent.
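
The retry-then-advance loop above can be sketched in a few lines of TypeScript. This is an illustrative sketch only: the function and type names here are invented for the example, not ClawRouter's actual internals.

```typescript
// Illustrative retry-then-advance fallback loop (not ClawRouter's actual source).
type Attempt = { model: string; error: string };

async function callWithFallback(
  chain: string[],
  call: (model: string) => Promise<string>,
): Promise<string> {
  const attempts: Attempt[] = [];
  for (const model of chain) {
    for (let tryNo = 0; tryNo < 2; tryNo++) { // original call + one retry
      try {
        return await call(model); // success: stop here
      } catch (err) {
        attempts.push({ model, error: String(err) });
        if (tryNo === 0) await new Promise((r) => setTimeout(r, 200)); // 200ms retry window
      }
    }
    // Fall through to the next model; this provider's failure stays isolated.
  }
  // All-failed summary: a structured error listing every attempt.
  throw new Error(
    "All models failed:\n" + attempts.map((a) => `${a.model}: ${a.error}`).join("\n"),
  );
}
```

Each provider gets exactly one quick retry before the chain advances, so a transient 429 recovers in 200ms while a hard outage costs at most two attempts.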

---

## 2. Model ID Mangling — Death by Prefix

From [#25665](https://github.com/openclaw/openclaw/issues/25665):

> _"Model config defaults to `openrouter/openrouter/auto` (double prefix)."_

From [#50711](https://github.com/openclaw/openclaw/issues/50711):

> _"Control UI model picker strips `openrouter/` prefix."_

OpenRouter uses nested model IDs: `openrouter/deepseek/deepseek-v3.2`. OpenClaw's UI, Discord bot, and web gateway all handle these differently. Some add the prefix. Some strip it. Some double it. 15 issues trace back to model ID confusion.

### How ClawRouter Solves This

ClawRouter uses clean aliases. You say `sonnet` and get `anthropic/claude-sonnet-4-6`. You say `flash` and get `google/gemini-2.5-flash`. No nested prefixes. No double-prefix bugs.

```typescript
// resolveModelAlias() handles all normalization
"sonnet"   → "anthropic/claude-sonnet-4-6"
"opus"     → "anthropic/claude-opus-4-6"
"flash"    → "google/gemini-2.5-flash"
"grok"     → "xai/grok-4-0314"
"deepseek" → "deepseek/deepseek-chat"
```

One canonical format. No mangling. No UI inconsistency.

---

## 3. API Key Hell — 401s, Leakage, and Rotation

From [#51056](https://github.com/openclaw/openclaw/issues/51056):

> _"OpenRouter fails with '401 Missing Authentication header' despite valid key."_

From [#8615](https://github.com/openclaw/openclaw/issues/8615):

> _"Feature request: native multi-API-key support with load balancing and fallback."_

API keys are the root cause of an entire category of failures. Keys expire. Keys leak into LLM context (every provider sees every other provider's keys in the serialized request). Keys hit rate limits that can't be load-balanced. 8 issues document auth failures alone.

### How ClawRouter Solves This

ClawRouter has no API keys. Zero.

Payment happens via [x402](https://x402.org/) — a cryptographic micropayment protocol. Your agent generates a wallet on first run (BIP-44 derivation, both EVM and Solana). Each request is signed with the wallet's private key. USDC moves per-request.

```
No keys to leak.
No keys to rotate.
No keys to rate-limit.
No keys to expire.
```

The wallet is the identity. The signature is the authentication. Nothing to configure, nothing to paste into a config file, nothing for the LLM to accidentally serialize.
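
To make "the signature is the authentication" concrete, here is a generic sketch using Node's built-in Ed25519 keys. It is purely illustrative: x402 signs payment payloads with the agent's EVM or Solana wallet key, and the wire format is defined by the x402 spec, not by this snippet.

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto";

// Generic signature-as-authentication sketch (NOT the x402 wire format).
// The keypair stands in for the agent's wallet; no API key exists anywhere.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");

// The client signs the request it is paying for...
const request = Buffer.from(JSON.stringify({ model: "sonnet", amountUsdc: "0.0034" }));
const signature = sign(null, request, privateKey); // Ed25519 takes no digest algorithm

// ...and the server verifies against the wallet's public key, which doubles as identity.
const ok = verify(null, request, publicKey, signature);
```

There is no shared secret: tampering with the payload, or signing with a different wallet, makes verification fail, so there is nothing to rotate and nothing to leak into context.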

---

## 4. Cost and Billing Opacity — Surprise Bills

From [#25371](https://github.com/openclaw/openclaw/issues/25371):

> _"OpenRouter 402 billing error misclassified as 'Context overflow', triggering auto-compaction that drains remaining credits faster."_

From [#7006](https://github.com/openclaw/openclaw/issues/7006):

> _"`openrouter/auto` doesn't expose which model was actually used or its cost."_

When OpenRouter runs out of credits, it returns a 402 that OpenClaw misreads as a context overflow. OpenClaw then auto-compacts the context and retries — on the same empty balance. Each retry charges the compaction cost. Credits drain faster. The agent burns money trying to fix a billing error it doesn't understand.

### How ClawRouter Solves This

**Per-request cost visibility.** Every response includes cost headers:

```
x-clawrouter-cost: 0.0034
x-clawrouter-savings: 82%
x-clawrouter-model: google/gemini-2.5-flash
```

**Per-request USDC payments.** No prepaid balance to drain. Each request shows its price before you pay. When the wallet is empty, requests don't fail — they fall back to the free tier (NVIDIA GPT-OSS-120B).

**Budget guard.** `maxCostPerRun` caps per-session spending. Two modes: `graceful` (downgrade to cheaper models) or `strict` (hard stop). The $248/day heartbeat scenario is structurally impossible.

**Usage logging.** Every request logs to `~/.openclaw/blockrun/logs/usage-YYYY-MM-DD.jsonl` with model, tier, cost, baseline cost, savings, and latency. `/stats` shows the breakdown.
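
Because the log is plain JSONL, the breakdown behind `/stats` is easy to reproduce yourself. A minimal sketch, assuming entries carry the fields listed above (the real log's exact field names may differ):

```typescript
// Aggregate one day's JSONL usage log into total spend, savings, and a per-model map.
// Field names (model, tier, cost, baselineCost) are assumed from the description above.
interface UsageEntry {
  model: string;
  tier: string;
  cost: number;         // USD actually paid for the request
  baselineCost: number; // USD a premium model would have cost
}

function summarize(jsonl: string): { total: number; saved: number; byModel: Map<string, number> } {
  const byModel = new Map<string, number>();
  let total = 0;
  let baseline = 0;
  for (const line of jsonl.split("\n")) {
    if (!line.trim()) continue; // skip blank lines
    const e = JSON.parse(line) as UsageEntry;
    total += e.cost;
    baseline += e.baselineCost;
    byModel.set(e.model, (byModel.get(e.model) ?? 0) + e.cost);
  }
  return { total, saved: baseline - total, byModel };
}
```

Point it at the contents of `usage-YYYY-MM-DD.jsonl` and you get the same total/savings split the headers report per request, aggregated per day.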

---

## 5. Routing Opacity — "Which Model Did I Just Pay For?"

From [#7006](https://github.com/openclaw/openclaw/issues/7006):

> _"No visibility into which model `openrouter/auto` actually uses."_

From [#35842](https://github.com/openclaw/openclaw/issues/35842):

> _"Need explicit Claude Sonnet default instead of auto-routing."_

When you use `openrouter/auto`, you don't know what model served your request. You can't debug quality regressions. You can't understand cost spikes. You're paying for a black box.

### How ClawRouter Solves This

ClawRouter's routing is 100% local, open-source, and transparent.

**14-dimension weighted classifier** runs locally in <1ms. It scores every request across token count, code presence, reasoning markers, technical terms, multi-step patterns, question complexity, tool signals, and more.

**Debug headers on every response:**

```
x-clawrouter-profile: auto
x-clawrouter-tier: MEDIUM
x-clawrouter-model: moonshot/kimi-k2.5
x-clawrouter-confidence: 0.87
x-clawrouter-reasoning: "Code task with moderate complexity"
```

**SSE debug comments** in streaming responses show the routing decision inline. You always know which model was used, why it was selected, and how confident the classifier was.

**Four routing profiles** give you explicit control:

| Profile   | Behavior                | Savings |
| --------- | ----------------------- | ------- |
| `auto`    | Balanced quality + cost | 74–100% |
| `eco`     | Cheapest possible       | 95–100% |
| `premium` | Best quality always     | 0%      |
| `free`    | NVIDIA GPT-OSS only     | 100%    |

No black box. No mystery routing. Full visibility, full control.
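
The shape of such a classifier is easy to see in miniature: a weighted sum of cheap lexical signals mapped to a tier. The dimensions, weights, and thresholds below are invented for illustration; ClawRouter's real classifier scores 14 dimensions.

```typescript
// Toy weighted classifier: 4 invented dimensions, not ClawRouter's real 14.
type Tier = "LIGHT" | "MEDIUM" | "HEAVY";

const dimensions: Array<{ weight: number; score: (p: string) => number }> = [
  { weight: 0.3, score: (p) => Math.min(p.length / 4000, 1) },                         // token-count proxy
  { weight: 0.3, score: (p) => (/```|function |class |def /.test(p) ? 1 : 0) },        // code presence
  { weight: 0.2, score: (p) => (/\b(prove|derive|step by step)\b/i.test(p) ? 1 : 0) }, // reasoning markers
  { weight: 0.2, score: (p) => (/\bfirst\b[\s\S]*\bthen\b/i.test(p) ? 1 : 0) },        // multi-step pattern
];

function classify(prompt: string): { tier: Tier; score: number } {
  // Weighted sum of all dimension scores, normalized to 0..1.
  const score = dimensions.reduce((sum, d) => sum + d.weight * d.score(prompt), 0);
  const tier: Tier = score < 0.25 ? "LIGHT" : score < 0.6 ? "MEDIUM" : "HEAVY";
  return { tier, score };
}
```

Everything here is regex tests and arithmetic, which is why a classifier of this shape can run locally in well under a millisecond.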

---

## 6. Missing Feature Parity — Images, Tools, Caching

From [#46255](https://github.com/openclaw/openclaw/issues/46255):

> _"Images not passed to OpenRouter models."_

From [#47707](https://github.com/openclaw/openclaw/issues/47707):

> _"Mistral models fail with strict tool call ID requirements."_

OpenRouter doesn't always pass provider-specific features through correctly. Image payloads get dropped. Cache retention headers get ignored. Tool call ID formats cause silent failures with strict providers.

### How ClawRouter Solves This

**Vision auto-detection.** When `image_url` content parts are detected, ClawRouter automatically filters the fallback chain to vision-capable models only. No images dropped.

**Tool calling validation.** Every model has a `toolCalling` flag. When tools are present in the request, ClawRouter forces agentic routing tiers and excludes models without tool support. No silent tool call failures.

**Direct provider routing.** ClawRouter routes through BlockRun's API directly to providers — not through a second aggregator. One hop, not two. Provider-specific features work because there's no middleman translating them.
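
Both guarantees reduce to capability filtering over the fallback chain. A sketch, with an illustrative catalog shape (the field names are assumptions for this example, not ClawRouter's real metadata):

```typescript
// Capability-based chain filtering, sketched with an illustrative catalog shape.
interface ModelInfo { id: string; vision: boolean; toolCalling: boolean; }

interface Needs { vision: boolean; tools: boolean; }

// Detect needs from an OpenAI-style request: image_url content parts and a tools array.
function detectNeeds(messages: Array<{ content: unknown }>, tools?: unknown[]): Needs {
  const vision = messages.some(
    (m) => Array.isArray(m.content) && m.content.some((part: any) => part?.type === "image_url"),
  );
  return { vision, tools: (tools?.length ?? 0) > 0 };
}

// Keep only models that satisfy every required capability.
function filterChain(chain: ModelInfo[], needs: Needs): ModelInfo[] {
  return chain.filter((m) => (!needs.vision || m.vision) && (!needs.tools || m.toolCalling));
}
```

The filter runs before routing, so a request that needs vision or tools can never reach a model that would silently drop them.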

---

## 7. Model Catalog Staleness — "Where's the New Model?"

From [#10687](https://github.com/openclaw/openclaw/issues/10687):

> _"Need fully dynamic model discovery."_

From [#30152](https://github.com/openclaw/openclaw/issues/30152):

> _"Allowlist silently drops models not in catalog."_

When new models launch, OpenRouter's catalog lags. Users configure a model that exists at the provider but isn't in the catalog. The request fails silently or gets rerouted.

### How ClawRouter Solves This

ClawRouter maintains a curated catalog of 46+ models across 8 providers, updated with each release. Delisted models have automatic redirect aliases:

```typescript
// Delisted models redirect automatically
"xai/grok-code-fast-1"  → "deepseek/deepseek-chat"
"google/gemini-2.0-pro" → "google/gemini-3.1-pro"
```

No silent drops. No stale catalog. Models are benchmarked for speed, quality, and tool support before inclusion.

---

## The Full Comparison

|                     | OpenRouter                       | ClawRouter                                     |
| ------------------- | -------------------------------- | ---------------------------------------------- |
| **Authentication**  | API key (leak risk)              | Wallet signature (no keys)                     |
| **Payment**         | Prepaid balance (custodial)      | Per-request USDC (non-custodial)               |
| **Routing**         | Server-side black box            | Local 14-dim classifier, <1ms                  |
| **Fallback**        | Often broken (20+ issues)        | 8-deep chains, per-model isolation             |
| **Model IDs**       | Nested prefixes, mangling bugs   | Clean aliases, single format                   |
| **Cost visibility** | None per-request                 | Headers + JSONL logs + `/stats`                |
| **Empty wallet**    | Request fails                    | Auto-fallback to free tier                     |
| **Rate limits**     | Per-key, shared                  | Per-wallet, independent                        |
| **Vision support**  | Images sometimes dropped         | Auto-detected, vision-only fallback            |
| **Tool calling**    | Silent failures with some models | Flag-based filtering, guaranteed support       |
| **Model catalog**   | Laggy, silent drops              | Curated 46+ models, redirect aliases           |
| **Budget control**  | Monthly invoice                  | Per-session cap (`maxCostPerRun`)              |
| **Setup**           | Create account, paste key        | Agent generates wallet, auto-configured        |
| **Average cost**    | $25/M tokens (Opus direct)       | $2.05/M tokens (auto-routed) = **92% savings** |

---

## Getting Started

```bash
# Install
npm install -g @blockrun/clawrouter

# Start (auto-configures OpenClaw)
clawrouter

# Check your wallet
# /wallet

# View routing stats
# /stats
```

ClawRouter auto-injects itself into `~/.openclaw/openclaw.json` as a provider on startup. Your existing tools, sessions, and extensions are unchanged.

Load a wallet with USDC on Base or Solana, pick a routing profile, and run.

---

_[github.com/BlockRunAI/ClawRouter](https://github.com/BlockRunAI/ClawRouter) · [blockrun.ai](https://blockrun.ai) · `npm install -g @blockrun/clawrouter`_

package/docs/configuration.md
CHANGED

@@ -316,7 +316,7 @@ plugins:
   config:
     # Maximum spend per session/run in USD.
     # Default: disabled (no limit)
-    maxCostPerRun: 0.50
+    maxCostPerRun: 0.50 # $0.50 per session

     # How to enforce the budget cap. Default: graceful
     #
@@ -326,7 +326,7 @@ plugins:
     #
     # strict: immediately returns 429 (X-ClawRouter-Cost-Cap-Exceeded: 1) once
     # the session spend reaches the cap. Use when you need a hard budget ceiling.
-    maxCostPerRunMode: graceful
+    maxCostPerRunMode: graceful # or: strict

     # Note: image generation endpoints (/v1/images/generations) bypass maxCostPerRun.
     # Their cost is charged via x402 micropayment directly and is not tracked per-session.

package/docs/image-generation.md
CHANGED

@@ -51,13 +51,13 @@ The returned URL is a publicly hosted image, ready to use in Telegram, Discord,

 ## Models & Pricing

-| Model ID
-|
-| `google/nano-banana`
-| `google/nano-banana-pro`
-| `openai/dall-e-3`
-| `openai/gpt-image-1`
-| `black-forest/flux-1.1-pro
+| Model ID                    | Shorthand     | Price       | Max Size  | Provider            |
+| --------------------------- | ------------- | ----------- | --------- | ------------------- |
+| `google/nano-banana`        | `nano-banana` | $0.05/image | 1024×1024 | Google Gemini Flash |
+| `google/nano-banana-pro`    | `banana-pro`  | $0.10/image | 4096×4096 | Google Gemini Pro   |
+| `openai/dall-e-3`           | `dall-e-3`    | $0.04/image | 1792×1024 | OpenAI DALL-E 3     |
+| `openai/gpt-image-1`        | `gpt-image`   | $0.02/image | 1536×1024 | OpenAI GPT Image    |
+| `black-forest/flux-1.1-pro` | `flux`        | $0.04/image | 1024×1024 | Black Forest Labs   |

 Default model: `google/nano-banana`.

@@ -71,20 +71,20 @@ OpenAI-compatible endpoint. Route via ClawRouter proxy (`http://localhost:8402`)

 **Request body:**

-| Field | Type | Required | Description
-| -------- | -------- | -------- |
-| `model` | `string` | Yes | Model ID (see table above)
-| `prompt` | `string` | Yes | Text description of the image to generate
-| `size` | `string` | No | Image dimensions, e.g. `"1024x1024"` (default)
-| `n` | `number` | No | Number of images (default: `1`)
+| Field    | Type     | Required | Description                                    |
+| -------- | -------- | -------- | ---------------------------------------------- |
+| `model`  | `string` | Yes      | Model ID (see table above)                     |
+| `prompt` | `string` | Yes      | Text description of the image to generate      |
+| `size`   | `string` | No       | Image dimensions, e.g. `"1024x1024"` (default) |
+| `n`      | `number` | No       | Number of images (default: `1`)                |

 **Response:**

 ```typescript
 {
-  created: number;
+  created: number; // Unix timestamp
   data: Array<{
-    url: string;
+    url: string; // Publicly hosted image URL
     revised_prompt?: string; // Model's rewritten prompt (dall-e-3 only)
   }>;
 }
@@ -96,22 +96,22 @@ Edit an existing image using AI. Route via ClawRouter proxy (`http://localhost:8

 **Request body:**

-| Field | Type | Required | Description
-| -------- | -------- | -------- |
-| `model` | `string` | No | Model ID (default: `openai/gpt-image-1`)
-| `prompt` | `string` | Yes | Text description of the edit to apply
-| `image` | `string` | Yes | Source image — see **Image input formats** below
-| `mask` | `string` | No | Mask image (white = area to edit) — same formats as `image`
-| `size` | `string` | No | Output dimensions, e.g. `"1024x1024"` (default)
+| Field    | Type     | Required | Description                                                 |
+| -------- | -------- | -------- | ----------------------------------------------------------- |
+| `model`  | `string` | No       | Model ID (default: `openai/gpt-image-1`)                    |
+| `prompt` | `string` | Yes      | Text description of the edit to apply                       |
+| `image`  | `string` | Yes      | Source image — see **Image input formats** below            |
+| `mask`   | `string` | No       | Mask image (white = area to edit) — same formats as `image` |
+| `size`   | `string` | No       | Output dimensions, e.g. `"1024x1024"` (default)             |

 **Image input formats** — the `image` and `mask` fields accept any of:

-| Format
-|
-| Local file path
-| Home-relative path
-| HTTP/HTTPS URL
-| Base64 data URI
+| Format             | Example                            | Description                                    |
+| ------------------ | ---------------------------------- | ---------------------------------------------- |
+| Local file path    | `"/Users/me/photo.png"`            | Absolute path — ClawRouter reads the file      |
+| Home-relative path | `"~/photo.png"`                    | Expands `~` to home directory                  |
+| HTTP/HTTPS URL     | `"https://example.com/photo.png"`  | ClawRouter downloads the image automatically   |
+| Base64 data URI    | `"data:image/png;base64,iVBOR..."` | Passed through directly (no conversion needed) |

 Supported image formats: **PNG**, **JPG/JPEG**, **WebP**.

@@ -119,9 +119,9 @@ Supported image formats: **PNG**, **JPG/JPEG**, **WebP**.

 ```typescript
 {
-  created: number;
+  created: number; // Unix timestamp
   data: Array<{
-    url: string;
+    url: string; // Locally cached image URL (http://localhost:8402/images/...)
     revised_prompt?: string; // Model's rewritten prompt
   }>;
 }
@@ -171,7 +171,7 @@ const response = await fetch("http://localhost:8402/v1/images/generations", {
   }),
 });

-const result = await response.json() as {
+const result = (await response.json()) as {
   created: number;
   data: Array<{ url: string; revised_prompt?: string }>;
 };
@@ -206,7 +206,7 @@ print(image_url)
 import OpenAI from "openai";

 const client = new OpenAI({
-  apiKey: "blockrun",
+  apiKey: "blockrun", // any non-empty string
   baseURL: "http://localhost:8402/v1",
 });

@@ -352,12 +352,12 @@ When using ClawRouter with OpenClaw, generate and edit images directly from any
 /img2img --image /tmp/portrait.png --size 1536x1024 add a hat
 ```

-| Flag | Default
-| --------- |
-| `--image` | _(required)_
-| `--mask` | _(none)_
-| `--model` | `gpt-image-1`
-| `--size` | `1024x1024`
+| Flag      | Default       | Description                           |
+| --------- | ------------- | ------------------------------------- |
+| `--image` | _(required)_  | Local image file path (supports `~/`) |
+| `--mask`  | _(none)_      | Mask image (white = area to edit)     |
+| `--model` | `gpt-image-1` | Model to use                          |
+| `--size`  | `1024x1024`   | Output size                           |

 ### Model shorthands

@@ -366,7 +366,7 @@ When using ClawRouter with OpenClaw, generate and edit images directly from any
 | `nano-banana` | `google/nano-banana`        |
 | `banana-pro`  | `google/nano-banana-pro`    |
 | `dall-e-3`    | `openai/dall-e-3`           |
-| `gpt-image` | `openai/gpt-image-1`
+| `gpt-image`   | `openai/gpt-image-1`        |
 | `flux`        | `black-forest/flux-1.1-pro` |

 ---

package/docs/{blog-benchmark-2026-03.md → llm-router-benchmark-46-models-sub-1ms-routing.md}
RENAMED

@@ -1,6 +1,6 @@
 # We Benchmarked 39 AI Models Through Our Payment Gateway. Here's What We Found.

-
+_March 16, 2026 | BlockRun Engineering_

 Last week we ran every model on BlockRun through a real-world latency benchmark — 39 models, same prompts, same payment pipeline, same hardware. No cherry-picked results. No synthetic lab conditions. Just cold, hard numbers from production infrastructure.

@@ -18,47 +18,47 @@ We sent 2 coding prompts per model (256 max tokens, non-streaming) and measured
|
|
|
18
18
|
|
|
19
19
|
### Speed Rankings (End-to-End Latency Through BlockRun)
|
|
20
20
|
|
|
21
|
-
| #
|
|
22
|
-
|
|
23
|
-
| 1
|
|
24
|
-
| 2
|
|
25
|
-
| 3
|
|
26
|
-
| 4
|
|
27
|
-
| 5
|
|
28
|
-
| 6
|
|
29
|
-
| 7
|
|
30
|
-
| 8
|
|
31
|
-
| 9
|
|
32
|
-
| 10
|
|
33
|
-
| 11
|
|
34
|
-
| 12
|
|
35
|
-
| 13
|
|
36
|
-
| 14
|
|
37
|
-
| 15
|
|
38
|
-
| 16
|
|
39
|
-
| 17
|
|
40
|
-
| 18
|
|
41
|
-
| 19
|
|
42
|
-
| 20
|
|
43
|
-
| 21
|
|
44
|
-
| 22
|
|
45
|
-
| 23
|
|
46
|
-
| 24
|
|
47
|
-
| 25
|
|
48
|
-
| 26
|
|
49
|
-
| 27
|
|
50
|
-
| 28
|
|
51
|
-
| 29
|
|
52
|
-
| 30
|
|
53
|
-
| 31
|
|
54
|
-
| 32
|
|
55
|
-
| 33
|
|
56
|
-
| 34
|
|
57
|
-
| 35
|
|
58
|
-
| 36
|
|
59
|
-
| 37
|
|
60
|
-
| 38
|
|
61
|
-
| 39
|
|
21
|
+
| # | Model | Latency | Tok/s | $/1M in | $/1M out |
|
|
22
|
+
| --- | ------------------------------- | ------- | ----- | ------- | -------- |
|
|
23
|
+
| 1 | xai/grok-4-fast-non-reasoning | 1,143ms | 224 | $0.20 | $0.50 |
|
|
24
|
+
| 2 | xai/grok-3-mini | 1,202ms | 215 | $0.30 | $0.50 |
|
|
25
|
+
| 3 | google/gemini-2.5-flash | 1,238ms | 208 | $0.15 | $0.60 |
|
|
26
|
+
| 4 | xai/grok-3 | 1,244ms | 207 | $3.00 | $15.00 |
|
|
27
|
+
| 5 | xai/grok-4-1-fast-non-reasoning | 1,244ms | 206 | $0.20 | $0.50 |
|
|
28
|
+
| 6 | nvidia/gpt-oss-120b | 1,252ms | 204 | FREE | FREE |
|
|
29
|
+
| 7 | minimax/minimax-m2.5 | 1,278ms | 202 | $0.30 | $1.10 |
|
|
30
|
+
| 8 | google/gemini-2.5-pro | 1,294ms | 198 | $1.25 | $10.00 |
|
|
31
|
+
| 9 | xai/grok-4-fast-reasoning | 1,298ms | 198 | $0.20 | $0.50 |
|
|
32
|
+
| 10 | xai/grok-4-0709 | 1,348ms | 190 | $0.20 | $1.50 |
|
|
33
|
+
| 11 | google/gemini-3-pro-preview | 1,352ms | 190 | $1.25 | $10.00 |
|
|
34
| 12 | google/gemini-2.5-flash-lite | 1,353ms | 193 | $0.10 | $0.40 |
| 13 | google/gemini-3-flash-preview | 1,398ms | 183 | $0.15 | $0.60 |
| 14 | deepseek/deepseek-chat | 1,431ms | 179 | $0.27 | $1.10 |
| 15 | deepseek/deepseek-reasoner | 1,454ms | 183 | $0.55 | $2.19 |
| 16 | xai/grok-4-1-fast-reasoning | 1,454ms | 176 | $0.20 | $0.50 |
| 17 | google/gemini-3.1-pro | 1,609ms | 167 | $1.25 | $10.00 |
| 18 | moonshot/kimi-k2.5 | 1,646ms | 156 | $0.60 | $3.00 |
| 19 | anthropic/claude-sonnet-4.6 | 2,110ms | 121 | $3.00 | $15.00 |
| 20 | anthropic/claude-opus-4.6 | 2,139ms | 120 | $15.00 | $75.00 |
| 21 | openai/o3-mini | 2,260ms | 114 | $1.10 | $4.40 |
| 22 | openai/gpt-5-mini | 2,264ms | 114 | $1.10 | $4.40 |
| 23 | anthropic/claude-haiku-4.5 | 2,305ms | 141 | $0.80 | $4.00 |
| 24 | openai/o4-mini | 2,328ms | 111 | $1.10 | $4.40 |
| 25 | openai/gpt-4.1-mini | 2,340ms | 109 | $0.40 | $1.60 |
| 26 | openai/o1 | 2,562ms | 100 | $15.00 | $60.00 |
| 27 | openai/gpt-4.1-nano | 2,640ms | 97 | $0.10 | $0.40 |
| 28 | openai/o1-mini | 2,746ms | 93 | $1.10 | $4.40 |
| 29 | openai/gpt-4o-mini | 2,764ms | 93 | $0.15 | $0.60 |
| 30 | openai/o3 | 2,862ms | 90 | $2.00 | $8.00 |
| 31 | openai/gpt-5-nano | 3,187ms | 81 | $0.50 | $2.00 |
| 32 | openai/gpt-5.2-pro | 3,546ms | 73 | $2.50 | $10.00 |
| 33 | openai/gpt-4o | 5,378ms | 48 | $2.50 | $10.00 |
| 34 | openai/gpt-4.1 | 5,477ms | 47 | $2.00 | $8.00 |
| 35 | openai/gpt-5.3 | 5,910ms | 43 | $2.50 | $10.00 |
| 36 | openai/gpt-5.4 | 6,213ms | 41 | $2.50 | $15.00 |
| 37 | openai/gpt-5.2 | 6,507ms | 40 | $2.50 | $10.00 |
| 38 | openai/gpt-5.4-pro | 6,671ms | 40 | $2.50 | $15.00 |
| 39 | openai/gpt-5.3-codex | 7,935ms | 32 | $2.50 | $10.00 |
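
To read the price columns concretely: one request's dollar cost is token counts multiplied by the per-million-token rates from the Input/Output columns. A quick sketch (the `request_cost` helper is ours, not part of ClawRouter; prices are copied from the table):

```python
# Dollar cost of one request, given per-million-token prices
# (input/output rates taken from the table above).
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# A 2,000-token prompt with a 500-token reply:
cheap = request_cost(2_000, 500, 0.10, 0.40)     # gemini-2.5-flash-lite: ~$0.0004
pricey = request_cost(2_000, 500, 15.00, 75.00)  # claude-opus-4.6: ~$0.0675
```

At that request size, the spread between the cheapest and priciest rows works out to roughly 169x per request.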
## Three Things That Surprised Us

OpenAI's "mini" and "nano" variants are faster (2.2-3.2s range) but still 2x slower than the Grok/Gemini tier.

We cross-referenced our latency data with quality scores from [Artificial Analysis](https://artificialanalysis.ai/leaderboards/models) (Intelligence Index v4.0):

| Model                  | BlockRun Latency | Intelligence Index | Price Tier  |
| ---------------------- | ---------------- | ------------------ | ----------- |
| Gemini 3.1 Pro         | 1,609ms          | 57                 | $1.25/$10   |
| GPT-5.4                | 6,213ms          | 57                 | $2.50/$15   |
| GPT-5.3 Codex          | 7,935ms          | 54                 | $2.50/$10   |
| Claude Opus 4.6        | 2,139ms          | 53                 | $15/$75     |
| Claude Sonnet 4.6      | 2,110ms          | 52                 | $3/$15      |
| Kimi K2.5              | 1,646ms          | 47                 | $0.60/$3    |
| Gemini 3 Flash Preview | 1,398ms          | 46                 | $0.15/$0.60 |
| Grok 4                 | 1,348ms          | 41                 | $0.20/$1.50 |
| Grok 4.1 Fast          | 1,244ms          | 41                 | $0.20/$0.50 |
| DeepSeek V3            | 1,431ms          | 32                 | $0.27/$1.10 |
| Grok 3                 | 1,244ms          | 32                 | $3/$15      |
| Grok 4 Fast            | 1,143ms          | 23                 | $0.20/$0.50 |
| Gemini 2.5 Flash       | 1,238ms          | 20                 | $0.15/$0.60 |

**Gemini 3.1 Pro** is the standout: highest intelligence score (57) at just 1.6 seconds. GPT-5.4 matches its intelligence but takes **4x longer**.
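
That standout claim can be checked mechanically: a model sits on the latency/quality frontier if no other model is at least as fast and at least as smart. A minimal sketch over a few rows copied from the table above (the `frontier` helper is illustrative, not part of ClawRouter):

```python
# (name, latency_ms, intelligence_index) — rows copied from the table above.
models = [
    ("gemini-3.1-pro", 1609, 57),
    ("gpt-5.4", 6213, 57),
    ("claude-opus-4.6", 2139, 53),
    ("kimi-k2.5", 1646, 47),
    ("grok-4-fast", 1143, 23),
]

def frontier(rows):
    """Keep rows no other row dominates: lower-or-equal latency AND
    greater-or-equal intelligence, from a row with different stats."""
    return [
        (name, lat, iq)
        for name, lat, iq in rows
        if not any(
            l2 <= lat and q2 >= iq and (l2, q2) != (lat, iq)
            for _, l2, q2 in rows
        )
    ]

print([name for name, *_ in frontier(models)])
# → ['gemini-3.1-pro', 'grok-4-fast']
```

GPT-5.4 drops out immediately: Gemini 3.1 Pro matches its Intelligence Index at roughly a quarter of the latency.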

Raw benchmark data: benchmark-results.json (github.com/BlockRunAI/ClawR…)

---
_BlockRun is the x402 micropayment gateway for AI. One wallet, 39+ models, pay-per-request with USDC. [Get started](https://blockrun.ai)_
---
The fastest model (Grok 4 Fast) was 7x faster than the slowest (GPT-5.3 Codex). Here's the full breakdown:

**2/** Top 5 fastest (end-to-end latency):

1. xai/grok-4-fast — 1,143ms
2. xai/grok-3-mini — 1,202ms
3. google/gemini-2.5-flash — 1,238ms
4. xai/grok-3 — 1,244ms
5. nvidia/gpt-oss-120b — 1,252ms (FREE)

**3/** Bottom 5 (all OpenAI):

35. openai/gpt-5.3 — 5,910ms
36. openai/gpt-5.4 — 6,213ms
37. openai/gpt-5.2 — 6,507ms
38. openai/gpt-5.4-pro — 6,671ms
39. openai/gpt-5.3-codex — 7,935ms

Every OpenAI 5.x model: 5-8 seconds. Every Grok/Gemini model: ~1.2 seconds.
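
These latency numbers are wall-clock, end-to-end measurements. The benchmark's exact harness isn't reproduced here, but the core of any such measurement is a timer around a blocking call; this generic sketch accepts any zero-argument callable (for example, a function that fires one chat-completion request):

```python
import time

def time_to_completion(call, runs: int = 3) -> float:
    """Median wall-clock latency of call() over several runs, in milliseconds.
    Median rather than mean keeps one slow outlier from skewing the number."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()  # e.g. a blocking chat-completion request
        samples.append((time.perf_counter() - start) * 1000)
    samples.sort()
    return samples[runs // 2]
```

In a real run, `call` would wrap an HTTP request to the model endpoint, so the clock stops only after the full response arrives.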

We tried routing all requests to the fastest models. Users complained the "fast" answers were lower quality.

Lesson: you need to balance speed, quality, AND cost.
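
One way to encode that three-way balance is a single weighted score over normalized speed, quality, and price. This is a hypothetical scoring function for illustration, not ClawRouter's documented routing logic; the weights and normalization ranges are assumptions drawn from this benchmark's observed ranges:

```python
def score(latency_ms: float, iq: float, price_in: float,
          w_speed: float = 0.3, w_quality: float = 0.5,
          w_cost: float = 0.2) -> float:
    """Higher is better. Each axis is normalized to roughly 0..1 using the
    ranges seen in this benchmark (1.1-8s latency, IQ 20-57, $0.10-$15/M input)."""
    speed = 1.0 - min(latency_ms, 8000) / 8000   # faster  -> closer to 1
    quality = (iq - 20) / (57 - 20)              # IQ 20..57 -> 0..1
    cost = 1.0 - min(price_in, 15) / 15          # cheaper -> closer to 1
    return w_speed * speed + w_quality * quality + w_cost * cost

# Numbers from the tables above: Gemini 3.1 Pro vs Claude Opus 4.6.
gemini = score(1609, 57, 1.25)   # fast, smartest, cheap -> high score
opus = score(2139, 53, 15.00)    # smart but expensive   -> lower score
```

Shifting the weights shifts the winner, which is the point: "best model" is a function of what the request actually needs.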

**5/** The efficiency frontier winners:
- Best overall: Gemini 3.1 Pro (IQ 57, 1.6s, $1.25/M)
- Best budget: Gemini 2.5 Flash (IQ 20, 1.2s, $0.15/M)
- Best reasoning: Claude Opus 4.6 (IQ 53, 2.1s, $15/M)

package/docs/routing-profiles.md (CHANGED)

Use `blockrun/eco` for maximum cost savings.

Use `blockrun/auto` for the best quality/price balance.

| Tier      | Primary Model               | Input | Output |
| --------- | --------------------------- | ----- | ------ |
| SIMPLE    | moonshot/kimi-k2.5          | $0.60 | $3.00  |
| MEDIUM    | xai/grok-code-fast-1        | $0.20 | $1.50  |
| COMPLEX   | google/gemini-3.1-pro       | $2.00 | $12.00 |
| REASONING | xai/grok-4-1-fast-reasoning | $0.20 | $0.50  |

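
The tier table maps one-to-one onto a lookup structure. A minimal sketch of how a client could mirror the `blockrun/auto` profile (the dict copies the table; the MEDIUM fallback for unknown tiers is our assumption, not documented ClawRouter behavior):

```python
# Primary model per complexity tier, mirroring the blockrun/auto table above.
AUTO_PROFILE = {
    "SIMPLE": "moonshot/kimi-k2.5",
    "MEDIUM": "xai/grok-code-fast-1",
    "COMPLEX": "google/gemini-3.1-pro",
    "REASONING": "xai/grok-4-1-fast-reasoning",
}

def pick_model(tier: str) -> str:
    # Unknown tiers fall back to MEDIUM (an assumption in this sketch).
    return AUTO_PROFILE.get(tier.upper(), AUTO_PROFILE["MEDIUM"])

print(pick_model("complex"))  # google/gemini-3.1-pro
```
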
---