@blockrun/clawrouter 0.12.63 → 0.12.65
- package/README.md +55 -55
- package/dist/cli.js +50 -14
- package/dist/cli.js.map +1 -1
- package/dist/index.js +57 -16
- package/dist/index.js.map +1 -1
- package/docs/anthropic-cost-savings.md +90 -85
- package/docs/architecture.md +12 -12
- package/docs/{blog-openclaw-cost-overruns.md → clawrouter-cuts-llm-api-costs-500x.md} +27 -27
- package/docs/clawrouter-vs-openrouter-llm-routing-comparison.md +280 -0
- package/docs/configuration.md +2 -2
- package/docs/image-generation.md +39 -39
- package/docs/{blog-benchmark-2026-03.md → llm-router-benchmark-46-models-sub-1ms-routing.md} +61 -64
- package/docs/routing-profiles.md +6 -6
- package/docs/{technical-routing-2026-03.md → smart-llm-router-14-dimension-classifier.md} +29 -28
- package/docs/worker-network.md +438 -347
- package/package.json +3 -2
- package/scripts/reinstall.sh +31 -6
- package/scripts/update.sh +6 -1
- package/docs/assets/blockrun-248-day-cost-overrun-problem.png +0 -0
- package/docs/assets/blockrun-clawrouter-7-layer-token-compression-openclaw.png +0 -0
- package/docs/assets/blockrun-clawrouter-observation-compression-97-percent-token-savings.png +0 -0
- package/docs/assets/blockrun-clawrouter-openclaw-agentic-proxy-architecture.png +0 -0
- package/docs/assets/blockrun-clawrouter-openclaw-automatic-tier-routing-model-selection.png +0 -0
- package/docs/assets/blockrun-clawrouter-openclaw-error-classification-retry-storm-prevention.png +0 -0
- package/docs/assets/blockrun-clawrouter-openclaw-session-memory-journaling-vs-context-compounding.png +0 -0
- package/docs/assets/blockrun-clawrouter-vs-openclaw-standalone-comparison-production-safety.png +0 -0
- package/docs/assets/blockrun-clawrouter-x402-usdc-micropayment-wallet-budget-control.png +0 -0
- package/docs/assets/blockrun-openclaw-inference-layer-blind-spots.png +0 -0
- package/docs/plans/2026-02-03-smart-routing-design.md +0 -267
- package/docs/plans/2026-02-13-e2e-docker-deployment.md +0 -1260
- package/docs/plans/2026-02-28-worker-network.md +0 -947
- package/docs/plans/2026-03-18-error-classification.md +0 -574
- package/docs/plans/2026-03-19-exclude-models.md +0 -538
- package/docs/vs-openrouter.md +0 -157
package/docs/clawrouter-vs-openrouter-llm-routing-comparison.md
ADDED

@@ -0,0 +1,280 @@

# We Read 100 OpenClaw Issues About OpenRouter. Here's What We Built Instead.

> _OpenRouter is the most popular LLM aggregator. It's also the source of the most frustration in OpenClaw's issue tracker._

---

## The Data

We searched OpenClaw's GitHub issues for "openrouter" and read every result. 100 issues. Open and closed. Filed by users who ran into the same structural problems over and over:

| Category                        | Issue Count | Representative Issues |
| ------------------------------- | ----------- | --------------------- |
| **Broken fallback / failover**  | ~20         | [#22136](https://github.com/openclaw/openclaw/issues/22136), [#45663](https://github.com/openclaw/openclaw/issues/45663), [#50389](https://github.com/openclaw/openclaw/issues/50389), [#49079](https://github.com/openclaw/openclaw/issues/49079) |
| **Model ID mangling**           | ~15         | [#49379](https://github.com/openclaw/openclaw/issues/49379), [#50711](https://github.com/openclaw/openclaw/issues/50711), [#25665](https://github.com/openclaw/openclaw/issues/25665), [#2373](https://github.com/openclaw/openclaw/issues/2373) |
| **Authentication / 401 errors** | ~8          | [#51056](https://github.com/openclaw/openclaw/issues/51056), [#34830](https://github.com/openclaw/openclaw/issues/34830), [#26960](https://github.com/openclaw/openclaw/issues/26960) |
| **Cost / billing opacity**      | ~6          | [#25371](https://github.com/openclaw/openclaw/issues/25371), [#50738](https://github.com/openclaw/openclaw/issues/50738), [#38248](https://github.com/openclaw/openclaw/issues/38248) |
| **Routing opacity**             | ~5          | [#7006](https://github.com/openclaw/openclaw/issues/7006), [#35842](https://github.com/openclaw/openclaw/issues/35842) |
| **Missing feature parity**      | ~10         | [#46255](https://github.com/openclaw/openclaw/issues/46255), [#50485](https://github.com/openclaw/openclaw/issues/50485), [#30850](https://github.com/openclaw/openclaw/issues/30850) |
| **Rate limit / key exhaustion** | ~4          | [#8615](https://github.com/openclaw/openclaw/issues/8615), [#48729](https://github.com/openclaw/openclaw/issues/48729) |
| **Model catalog staleness**     | ~5          | [#10687](https://github.com/openclaw/openclaw/issues/10687), [#30152](https://github.com/openclaw/openclaw/issues/30152) |

These aren't edge cases. They're structural consequences of how OpenRouter works: a middleman that adds latency, mangles model IDs, obscures routing decisions, and introduces its own failure modes on top of the providers it aggregates.

---

## 1. Broken Fallback — The #1 Pain Point

From [#45663](https://github.com/openclaw/openclaw/issues/45663):

> _"Provider returned error from OpenRouter does not trigger model failover."_

From [#50389](https://github.com/openclaw/openclaw/issues/50389):

> _"Rate limit errors surfaced to user instead of auto-failover."_

When OpenRouter returns a 429 or a provider error, OpenClaw's failover logic often doesn't recognize it as retriable. The user sees a raw error. The agent stops. ~20 issues document variations of this: HTTP 529 (Anthropic overloaded) not triggering fallback ([#49079](https://github.com/openclaw/openclaw/issues/49079)), invalid model IDs causing a 400 instead of failover ([#50017](https://github.com/openclaw/openclaw/issues/50017)), timeouts in cron sessions with no recovery ([#49597](https://github.com/openclaw/openclaw/issues/49597)).

### How ClawRouter Solves This

ClawRouter maintains 8-deep fallback chains per routing tier. When a model fails:

1. **200ms retry** — short-burst rate limits often recover in milliseconds
2. **Next model** — if the retry fails, move to the next model in the chain
3. **Per-model isolation** — one provider's failure doesn't poison the others
4. **All-failed summary** — if every model in the chain fails, you get a structured error listing every attempt and failure reason

```
[ClawRouter] Trying model 1/6: google/gemini-2.5-flash
[ClawRouter] Model google/gemini-2.5-flash returned 429, retrying in 200ms...
[ClawRouter] Retry failed, trying model 2/6: deepseek/deepseek-chat
[ClawRouter] Success with model: deepseek/deepseek-chat
```

No silent failures. No raw 429s surfaced to the agent.
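
The retry-then-advance loop above can be sketched in a few lines of TypeScript. This is an illustrative sketch only: the function and type names here are invented for the example, not ClawRouter's actual internals.

```typescript
// Illustrative retry-then-advance fallback loop (not ClawRouter's actual source).
type Attempt = { model: string; error: string };

async function callWithFallback(
  chain: string[],
  call: (model: string) => Promise<string>,
): Promise<string> {
  const attempts: Attempt[] = [];
  for (const model of chain) {
    for (let tryNo = 0; tryNo < 2; tryNo++) { // original call + one retry
      try {
        return await call(model); // success: stop here
      } catch (err) {
        attempts.push({ model, error: String(err) });
        if (tryNo === 0) await new Promise((r) => setTimeout(r, 200)); // 200ms retry window
      }
    }
    // Fall through to the next model; this provider's failure stays isolated.
  }
  // All-failed summary: a structured error listing every attempt.
  throw new Error(
    "All models failed:\n" + attempts.map((a) => `${a.model}: ${a.error}`).join("\n"),
  );
}
```

Each provider gets exactly one quick retry before the chain advances, so a transient 429 recovers in 200ms while a hard outage costs at most two attempts.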

---

## 2. Model ID Mangling — Death by Prefix

From [#25665](https://github.com/openclaw/openclaw/issues/25665):

> _"Model config defaults to `openrouter/openrouter/auto` (double prefix)."_

From [#50711](https://github.com/openclaw/openclaw/issues/50711):

> _"Control UI model picker strips `openrouter/` prefix."_

OpenRouter uses nested model IDs: `openrouter/deepseek/deepseek-v3.2`. OpenClaw's UI, Discord bot, and web gateway all handle these differently. Some add the prefix. Some strip it. Some double it. 15 issues trace back to model ID confusion.

### How ClawRouter Solves This

ClawRouter uses clean aliases. You say `sonnet` and get `anthropic/claude-sonnet-4-6`. You say `flash` and get `google/gemini-2.5-flash`. No nested prefixes. No double-prefix bugs.

```typescript
// resolveModelAlias() handles all normalization
"sonnet"   → "anthropic/claude-sonnet-4-6"
"opus"     → "anthropic/claude-opus-4-6"
"flash"    → "google/gemini-2.5-flash"
"grok"     → "xai/grok-4-0314"
"deepseek" → "deepseek/deepseek-chat"
```

One canonical format. No mangling. No UI inconsistency.

---

## 3. API Key Hell — 401s, Leakage, and Rotation

From [#51056](https://github.com/openclaw/openclaw/issues/51056):

> _"OpenRouter fails with '401 Missing Authentication header' despite valid key."_

From [#8615](https://github.com/openclaw/openclaw/issues/8615):

> _"Feature request: native multi-API-key support with load balancing and fallback."_

API keys are the root cause of an entire category of failures. Keys expire. Keys leak into LLM context (every provider sees every other provider's keys in the serialized request). Keys hit rate limits that can't be load-balanced. 8 issues document auth failures alone.

### How ClawRouter Solves This

ClawRouter has no API keys. Zero.

Payment happens via [x402](https://x402.org/) — a cryptographic micropayment protocol. Your agent generates a wallet on first run (BIP-44 derivation, both EVM and Solana). Each request is signed with the wallet's private key. USDC moves per-request.

```
No keys to leak.
No keys to rotate.
No keys to rate-limit.
No keys to expire.
```

The wallet is the identity. The signature is the authentication. Nothing to configure, nothing to paste into a config file, nothing for the LLM to accidentally serialize.
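
To make "the signature is the authentication" concrete, here is a generic sketch using Node's built-in Ed25519 keys. It is purely illustrative: x402 signs payment payloads with the agent's EVM or Solana wallet key, and the wire format is defined by the x402 spec, not by this snippet.

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto";

// Generic signature-as-authentication sketch (NOT the x402 wire format).
// The keypair stands in for the agent's wallet; no API key exists anywhere.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");

// The client signs the request it is paying for...
const request = Buffer.from(JSON.stringify({ model: "sonnet", amountUsdc: "0.0034" }));
const signature = sign(null, request, privateKey); // Ed25519 takes no digest algorithm

// ...and the server verifies against the wallet's public key, which doubles as identity.
const ok = verify(null, request, publicKey, signature);
```

There is no shared secret: tampering with the payload, or signing with a different wallet, makes verification fail, so there is nothing to rotate and nothing to leak into context.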

---

## 4. Cost and Billing Opacity — Surprise Bills

From [#25371](https://github.com/openclaw/openclaw/issues/25371):

> _"OpenRouter 402 billing error misclassified as 'Context overflow', triggering auto-compaction that drains remaining credits faster."_

From [#7006](https://github.com/openclaw/openclaw/issues/7006):

> _"`openrouter/auto` doesn't expose which model was actually used or its cost."_

When OpenRouter runs out of credits, it returns a 402 that OpenClaw misreads as a context overflow. OpenClaw then auto-compacts the context and retries — on the same empty balance. Each retry charges the compaction cost. Credits drain faster. The agent burns money trying to fix a billing error it doesn't understand.

### How ClawRouter Solves This

**Per-request cost visibility.** Every response includes cost headers:

```
x-clawrouter-cost: 0.0034
x-clawrouter-savings: 82%
x-clawrouter-model: google/gemini-2.5-flash
```

**Per-request USDC payments.** No prepaid balance to drain. Each request shows its price before you pay. When the wallet is empty, requests don't fail — they fall back to the free tier (NVIDIA GPT-OSS-120B).

**Budget guard.** `maxCostPerRun` caps per-session spending. Two modes: `graceful` (downgrade to cheaper models) or `strict` (hard stop). The $248/day heartbeat scenario is structurally impossible.

**Usage logging.** Every request logs to `~/.openclaw/blockrun/logs/usage-YYYY-MM-DD.jsonl` with model, tier, cost, baseline cost, savings, and latency. `/stats` shows the breakdown.
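
Because the log is plain JSONL, the breakdown behind `/stats` is easy to reproduce yourself. A minimal sketch, assuming entries carry the fields listed above (the real log's exact field names may differ):

```typescript
// Aggregate one day's JSONL usage log into total spend, savings, and a per-model map.
// Field names (model, tier, cost, baselineCost) are assumed from the description above.
interface UsageEntry {
  model: string;
  tier: string;
  cost: number;         // USD actually paid for the request
  baselineCost: number; // USD a premium model would have cost
}

function summarize(jsonl: string): { total: number; saved: number; byModel: Map<string, number> } {
  const byModel = new Map<string, number>();
  let total = 0;
  let baseline = 0;
  for (const line of jsonl.split("\n")) {
    if (!line.trim()) continue; // skip blank lines
    const e = JSON.parse(line) as UsageEntry;
    total += e.cost;
    baseline += e.baselineCost;
    byModel.set(e.model, (byModel.get(e.model) ?? 0) + e.cost);
  }
  return { total, saved: baseline - total, byModel };
}
```

Point it at the contents of `usage-YYYY-MM-DD.jsonl` and you get the same total/savings split the headers report per request, aggregated per day.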

---

## 5. Routing Opacity — "Which Model Did I Just Pay For?"

From [#7006](https://github.com/openclaw/openclaw/issues/7006):

> _"No visibility into which model `openrouter/auto` actually uses."_

From [#35842](https://github.com/openclaw/openclaw/issues/35842):

> _"Need explicit Claude Sonnet default instead of auto-routing."_

When you use `openrouter/auto`, you don't know what model served your request. You can't debug quality regressions. You can't understand cost spikes. You're paying for a black box.

### How ClawRouter Solves This

ClawRouter's routing is 100% local, open-source, and transparent.

**14-dimension weighted classifier** runs locally in <1ms. It scores every request across token count, code presence, reasoning markers, technical terms, multi-step patterns, question complexity, tool signals, and more.

**Debug headers on every response:**

```
x-clawrouter-profile: auto
x-clawrouter-tier: MEDIUM
x-clawrouter-model: moonshot/kimi-k2.5
x-clawrouter-confidence: 0.87
x-clawrouter-reasoning: "Code task with moderate complexity"
```

**SSE debug comments** in streaming responses show the routing decision inline. You always know which model was used, why it was selected, and how confident the classifier was.

**Four routing profiles** give you explicit control:

| Profile   | Behavior                | Savings |
| --------- | ----------------------- | ------- |
| `auto`    | Balanced quality + cost | 74–100% |
| `eco`     | Cheapest possible       | 95–100% |
| `premium` | Best quality always     | 0%      |
| `free`    | NVIDIA GPT-OSS only     | 100%    |

No black box. No mystery routing. Full visibility, full control.
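
The shape of such a classifier is easy to see in miniature: a weighted sum of cheap lexical signals mapped to a tier. The dimensions, weights, and thresholds below are invented for illustration; ClawRouter's real classifier scores 14 dimensions.

```typescript
// Toy weighted classifier: 4 invented dimensions, not ClawRouter's real 14.
type Tier = "LIGHT" | "MEDIUM" | "HEAVY";

const dimensions: Array<{ weight: number; score: (p: string) => number }> = [
  { weight: 0.3, score: (p) => Math.min(p.length / 4000, 1) },                         // token-count proxy
  { weight: 0.3, score: (p) => (/```|function |class |def /.test(p) ? 1 : 0) },        // code presence
  { weight: 0.2, score: (p) => (/\b(prove|derive|step by step)\b/i.test(p) ? 1 : 0) }, // reasoning markers
  { weight: 0.2, score: (p) => (/\bfirst\b[\s\S]*\bthen\b/i.test(p) ? 1 : 0) },        // multi-step pattern
];

function classify(prompt: string): { tier: Tier; score: number } {
  // Weighted sum of all dimension scores, normalized to 0..1.
  const score = dimensions.reduce((sum, d) => sum + d.weight * d.score(prompt), 0);
  const tier: Tier = score < 0.25 ? "LIGHT" : score < 0.6 ? "MEDIUM" : "HEAVY";
  return { tier, score };
}
```

Everything here is regex tests and arithmetic, which is why a classifier of this shape can run locally in well under a millisecond.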

---

## 6. Missing Feature Parity — Images, Tools, Caching

From [#46255](https://github.com/openclaw/openclaw/issues/46255):

> _"Images not passed to OpenRouter models."_

From [#47707](https://github.com/openclaw/openclaw/issues/47707):

> _"Mistral models fail with strict tool call ID requirements."_

OpenRouter doesn't always pass provider-specific features through correctly. Image payloads get dropped. Cache retention headers get ignored. Tool call ID formats cause silent failures with strict providers.

### How ClawRouter Solves This

**Vision auto-detection.** When `image_url` content parts are detected, ClawRouter automatically filters the fallback chain to vision-capable models only. No images dropped.

**Tool calling validation.** Every model has a `toolCalling` flag. When tools are present in the request, ClawRouter forces agentic routing tiers and excludes models without tool support. No silent tool call failures.

**Direct provider routing.** ClawRouter routes through BlockRun's API directly to providers — not through a second aggregator. One hop, not two. Provider-specific features work because there's no middleman translating them.
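
Both guarantees reduce to capability filtering over the fallback chain. A sketch, with an illustrative catalog shape (the field names are assumptions for this example, not ClawRouter's real metadata):

```typescript
// Capability-based chain filtering, sketched with an illustrative catalog shape.
interface ModelInfo { id: string; vision: boolean; toolCalling: boolean; }

interface Needs { vision: boolean; tools: boolean; }

// Detect needs from an OpenAI-style request: image_url content parts and a tools array.
function detectNeeds(messages: Array<{ content: unknown }>, tools?: unknown[]): Needs {
  const vision = messages.some(
    (m) => Array.isArray(m.content) && m.content.some((part: any) => part?.type === "image_url"),
  );
  return { vision, tools: (tools?.length ?? 0) > 0 };
}

// Keep only models that satisfy every required capability.
function filterChain(chain: ModelInfo[], needs: Needs): ModelInfo[] {
  return chain.filter((m) => (!needs.vision || m.vision) && (!needs.tools || m.toolCalling));
}
```

The filter runs before routing, so a request that needs vision or tools can never reach a model that would silently drop them.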

---

## 7. Model Catalog Staleness — "Where's the New Model?"

From [#10687](https://github.com/openclaw/openclaw/issues/10687):

> _"Need fully dynamic model discovery."_

From [#30152](https://github.com/openclaw/openclaw/issues/30152):

> _"Allowlist silently drops models not in catalog."_

When new models launch, OpenRouter's catalog lags. Users configure a model that exists at the provider but isn't in the catalog. The request fails silently or gets rerouted.

### How ClawRouter Solves This

ClawRouter maintains a curated catalog of 46+ models across 8 providers, updated with each release. Delisted models have automatic redirect aliases:

```typescript
// Delisted models redirect automatically
"xai/grok-code-fast-1"  → "deepseek/deepseek-chat"
"google/gemini-2.0-pro" → "google/gemini-3.1-pro"
```

No silent drops. No stale catalog. Models are benchmarked for speed, quality, and tool support before inclusion.

---

## The Full Comparison

|                     | OpenRouter                       | ClawRouter                                     |
| ------------------- | -------------------------------- | ---------------------------------------------- |
| **Authentication**  | API key (leak risk)              | Wallet signature (no keys)                     |
| **Payment**         | Prepaid balance (custodial)      | Per-request USDC (non-custodial)               |
| **Routing**         | Server-side black box            | Local 14-dim classifier, <1ms                  |
| **Fallback**        | Often broken (20+ issues)        | 8-deep chains, per-model isolation             |
| **Model IDs**       | Nested prefixes, mangling bugs   | Clean aliases, single format                   |
| **Cost visibility** | None per-request                 | Headers + JSONL logs + `/stats`                |
| **Empty wallet**    | Request fails                    | Auto-fallback to free tier                     |
| **Rate limits**     | Per-key, shared                  | Per-wallet, independent                        |
| **Vision support**  | Images sometimes dropped         | Auto-detected, vision-only fallback            |
| **Tool calling**    | Silent failures with some models | Flag-based filtering, guaranteed support       |
| **Model catalog**   | Laggy, silent drops              | Curated 46+ models, redirect aliases           |
| **Budget control**  | Monthly invoice                  | Per-session cap (`maxCostPerRun`)              |
| **Setup**           | Create account, paste key        | Agent generates wallet, auto-configured        |
| **Average cost**    | $25/M tokens (Opus direct)       | $2.05/M tokens (auto-routed) = **92% savings** |

---

## Getting Started

```bash
# Install
npm install -g @blockrun/clawrouter

# Start (auto-configures OpenClaw)
clawrouter

# Check your wallet
# /wallet

# View routing stats
# /stats
```

ClawRouter auto-injects itself into `~/.openclaw/openclaw.json` as a provider on startup. Your existing tools, sessions, and extensions are unchanged.

Load a wallet with USDC on Base or Solana, pick a routing profile, and run.

---

_[github.com/BlockRunAI/ClawRouter](https://github.com/BlockRunAI/ClawRouter) · [blockrun.ai](https://blockrun.ai) · `npm install -g @blockrun/clawrouter`_

package/docs/configuration.md
CHANGED

@@ -316,7 +316,7 @@ plugins:
   config:
     # Maximum spend per session/run in USD.
     # Default: disabled (no limit)
-    maxCostPerRun: 0.50
+    maxCostPerRun: 0.50 # $0.50 per session

     # How to enforce the budget cap. Default: graceful
     #
@@ -326,7 +326,7 @@ plugins:
     #
     # strict: immediately returns 429 (X-ClawRouter-Cost-Cap-Exceeded: 1) once
     # the session spend reaches the cap. Use when you need a hard budget ceiling.
-    maxCostPerRunMode: graceful
+    maxCostPerRunMode: graceful # or: strict

     # Note: image generation endpoints (/v1/images/generations) bypass maxCostPerRun.
     # Their cost is charged via x402 micropayment directly and is not tracked per-session.

package/docs/image-generation.md
CHANGED

@@ -51,13 +51,13 @@ The returned URL is a publicly hosted image, ready to use in Telegram, Discord,

 ## Models & Pricing

-| Model ID
-|
-| `google/nano-banana`
-| `google/nano-banana-pro`
-| `openai/dall-e-3`
-| `openai/gpt-image-1`
-| `black-forest/flux-1.1-pro
+| Model ID                    | Shorthand     | Price       | Max Size  | Provider            |
+| --------------------------- | ------------- | ----------- | --------- | ------------------- |
+| `google/nano-banana`        | `nano-banana` | $0.05/image | 1024×1024 | Google Gemini Flash |
+| `google/nano-banana-pro`    | `banana-pro`  | $0.10/image | 4096×4096 | Google Gemini Pro   |
+| `openai/dall-e-3`           | `dall-e-3`    | $0.04/image | 1792×1024 | OpenAI DALL-E 3     |
+| `openai/gpt-image-1`        | `gpt-image`   | $0.02/image | 1536×1024 | OpenAI GPT Image    |
+| `black-forest/flux-1.1-pro` | `flux`        | $0.04/image | 1024×1024 | Black Forest Labs   |

 Default model: `google/nano-banana`.

@@ -71,20 +71,20 @@ OpenAI-compatible endpoint. Route via ClawRouter proxy (`http://localhost:8402`)

 **Request body:**

-| Field | Type | Required | Description
-| -------- | -------- | -------- |
-| `model` | `string` | Yes | Model ID (see table above)
-| `prompt` | `string` | Yes | Text description of the image to generate
-| `size` | `string` | No | Image dimensions, e.g. `"1024x1024"` (default)
-| `n` | `number` | No | Number of images (default: `1`)
+| Field    | Type     | Required | Description                                    |
+| -------- | -------- | -------- | ---------------------------------------------- |
+| `model`  | `string` | Yes      | Model ID (see table above)                     |
+| `prompt` | `string` | Yes      | Text description of the image to generate      |
+| `size`   | `string` | No       | Image dimensions, e.g. `"1024x1024"` (default) |
+| `n`      | `number` | No       | Number of images (default: `1`)                |

 **Response:**

 ```typescript
 {
-  created: number;
+  created: number; // Unix timestamp
   data: Array<{
-    url: string;
+    url: string; // Publicly hosted image URL
     revised_prompt?: string; // Model's rewritten prompt (dall-e-3 only)
   }>;
 }
@@ -96,22 +96,22 @@ Edit an existing image using AI. Route via ClawRouter proxy (`http://localhost:8

 **Request body:**

-| Field | Type | Required | Description
-| -------- | -------- | -------- |
-| `model` | `string` | No | Model ID (default: `openai/gpt-image-1`)
-| `prompt` | `string` | Yes | Text description of the edit to apply
-| `image` | `string` | Yes | Source image — see **Image input formats** below
-| `mask` | `string` | No | Mask image (white = area to edit) — same formats as `image`
-| `size` | `string` | No | Output dimensions, e.g. `"1024x1024"` (default)
+| Field    | Type     | Required | Description                                                 |
+| -------- | -------- | -------- | ----------------------------------------------------------- |
+| `model`  | `string` | No       | Model ID (default: `openai/gpt-image-1`)                    |
+| `prompt` | `string` | Yes      | Text description of the edit to apply                       |
+| `image`  | `string` | Yes      | Source image — see **Image input formats** below            |
+| `mask`   | `string` | No       | Mask image (white = area to edit) — same formats as `image` |
+| `size`   | `string` | No       | Output dimensions, e.g. `"1024x1024"` (default)             |

 **Image input formats** — the `image` and `mask` fields accept any of:

-| Format
-|
-| Local file path
-| Home-relative path
-| HTTP/HTTPS URL
-| Base64 data URI
+| Format             | Example                            | Description                                    |
+| ------------------ | ---------------------------------- | ---------------------------------------------- |
+| Local file path    | `"/Users/me/photo.png"`            | Absolute path — ClawRouter reads the file      |
+| Home-relative path | `"~/photo.png"`                    | Expands `~` to home directory                  |
+| HTTP/HTTPS URL     | `"https://example.com/photo.png"`  | ClawRouter downloads the image automatically   |
+| Base64 data URI    | `"data:image/png;base64,iVBOR..."` | Passed through directly (no conversion needed) |

 Supported image formats: **PNG**, **JPG/JPEG**, **WebP**.

@@ -119,9 +119,9 @@ Supported image formats: **PNG**, **JPG/JPEG**, **WebP**.

 ```typescript
 {
-  created: number;
+  created: number; // Unix timestamp
   data: Array<{
-    url: string;
+    url: string; // Locally cached image URL (http://localhost:8402/images/...)
     revised_prompt?: string; // Model's rewritten prompt
   }>;
 }
@@ -171,7 +171,7 @@ const response = await fetch("http://localhost:8402/v1/images/generations", {
   }),
 });

-const result = await response.json() as {
+const result = (await response.json()) as {
   created: number;
   data: Array<{ url: string; revised_prompt?: string }>;
 };
@@ -206,7 +206,7 @@ print(image_url)
 import OpenAI from "openai";

 const client = new OpenAI({
-  apiKey: "blockrun",
+  apiKey: "blockrun", // any non-empty string
   baseURL: "http://localhost:8402/v1",
 });

@@ -352,12 +352,12 @@ When using ClawRouter with OpenClaw, generate and edit images directly from any
 /img2img --image /tmp/portrait.png --size 1536x1024 add a hat
 ```

-| Flag | Default
-| --------- |
-| `--image` | _(required)_
-| `--mask` | _(none)_
-| `--model` | `gpt-image-1`
-| `--size` | `1024x1024`
+| Flag      | Default       | Description                           |
+| --------- | ------------- | ------------------------------------- |
+| `--image` | _(required)_  | Local image file path (supports `~/`) |
+| `--mask`  | _(none)_      | Mask image (white = area to edit)     |
+| `--model` | `gpt-image-1` | Model to use                          |
+| `--size`  | `1024x1024`   | Output size                           |

 ### Model shorthands

@@ -366,7 +366,7 @@ When using ClawRouter with OpenClaw, generate and edit images directly from any
 | `nano-banana` | `google/nano-banana`        |
 | `banana-pro`  | `google/nano-banana-pro`    |
 | `dall-e-3`    | `openai/dall-e-3`           |
-| `gpt-image` | `openai/gpt-image-1`
+| `gpt-image`   | `openai/gpt-image-1`        |
 | `flux`        | `black-forest/flux-1.1-pro` |

 ---

package/docs/{blog-benchmark-2026-03.md → llm-router-benchmark-46-models-sub-1ms-routing.md}
RENAMED

@@ -1,6 +1,6 @@
 # We Benchmarked 39 AI Models Through Our Payment Gateway. Here's What We Found.

-
+_March 16, 2026 | BlockRun Engineering_

 Last week we ran every model on BlockRun through a real-world latency benchmark — 39 models, same prompts, same payment pipeline, same hardware. No cherry-picked results. No synthetic lab conditions. Just cold, hard numbers from production infrastructure.

@@ -18,47 +18,47 @@ We sent 2 coding prompts per model (256 max tokens, non-streaming) and measured
|
|
|
18
18
|
|
|
19
19
|
### Speed Rankings (End-to-End Latency Through BlockRun)
|
|
20
20
|
|
|
21
|
-
| #
|
|
22
|
-
|
|
23
|
-
| 1
|
|
24
|
-
| 2
|
|
25
|
-
| 3
|
|
26
|
-
| 4
|
|
27
|
-
| 5
|
|
28
|
-
| 6
|
|
29
|
-
| 7
|
|
30
|
-
| 8
|
|
31
|
-
| 9
|
|
32
|
-
| 10
|
|
33
|
-
| 11
|
|
34
|
-
| 12
|
|
35
|
-
| 13
|
|
36
|
-
| 14
|
|
37
|
-
| 15
|
|
38
|
-
| 16
|
|
39
|
-
| 17
|
|
40
|
-
| 18
|
|
41
|
-
| 19
|
|
42
|
-
| 20
|
|
43
|
-
| 21
|
|
44
|
-
| 22
|
|
45
|
-
| 23
|
|
46
|
-
| 24
|
|
47
|
-
| 25
|
|
48
|
-
| 26
|
|
49
|
-
| 27
|
|
50
|
-
| 28
|
|
51
|
-
| 29
|
|
52
|
-
| 30
|
|
53
|
-
| 31
|
|
54
|
-
| 32
|
|
55
|
-
| 33
|
|
56
|
-
| 34
|
|
57
|
-
| 35
|
|
58
|
-
| 36
|
|
59
|
-
| 37
|
|
60
|
-
| 38
|
|
61
|
-
| 39
|
|
21
|
+
| # | Model | Latency | Tok/s | $/1M in | $/1M out |
|
|
22
|
+
| --- | ------------------------------- | ------- | ----- | ------- | -------- |
|
|
23
|
+
| 1 | xai/grok-4-fast-non-reasoning | 1,143ms | 224 | $0.20 | $0.50 |
|
|
24
|
+
| 2 | xai/grok-3-mini | 1,202ms | 215 | $0.30 | $0.50 |
|
|
25
|
+
| 3 | google/gemini-2.5-flash | 1,238ms | 208 | $0.15 | $0.60 |
|
|
26
|
+
| 4 | xai/grok-3 | 1,244ms | 207 | $3.00 | $15.00 |
|
|
27
|
+
| 5 | xai/grok-4-1-fast-non-reasoning | 1,244ms | 206 | $0.20 | $0.50 |
|
|
28
|
+
| 6 | nvidia/gpt-oss-120b | 1,252ms | 204 | FREE | FREE |
|
|
29
|
+
| 7 | minimax/minimax-m2.5 | 1,278ms | 202 | $0.30 | $1.10 |
|
|
30
|
+
| 8 | google/gemini-2.5-pro | 1,294ms | 198 | $1.25 | $10.00 |
|
|
31
|
+
| 9 | xai/grok-4-fast-reasoning | 1,298ms | 198 | $0.20 | $0.50 |
|
|
32
|
+
| 10 | xai/grok-4-0709 | 1,348ms | 190 | $0.20 | $1.50 |
|
|
33
|
+
| 11 | google/gemini-3-pro-preview | 1,352ms | 190 | $1.25 | $10.00 |
|
|
34
| 12 | google/gemini-2.5-flash-lite | 1,353ms | 193 | $0.10 | $0.40 |
| 13 | google/gemini-3-flash-preview | 1,398ms | 183 | $0.15 | $0.60 |
| 14 | deepseek/deepseek-chat | 1,431ms | 179 | $0.27 | $1.10 |
| 15 | deepseek/deepseek-reasoner | 1,454ms | 183 | $0.55 | $2.19 |
| 16 | xai/grok-4-1-fast-reasoning | 1,454ms | 176 | $0.20 | $0.50 |
| 17 | google/gemini-3.1-pro | 1,609ms | 167 | $1.25 | $10.00 |
| 18 | moonshot/kimi-k2.5 | 1,646ms | 156 | $0.60 | $3.00 |
| 19 | anthropic/claude-sonnet-4.6 | 2,110ms | 121 | $3.00 | $15.00 |
| 20 | anthropic/claude-opus-4.6 | 2,139ms | 120 | $15.00 | $75.00 |
| 21 | openai/o3-mini | 2,260ms | 114 | $1.10 | $4.40 |
| 22 | openai/gpt-5-mini | 2,264ms | 114 | $1.10 | $4.40 |
| 23 | anthropic/claude-haiku-4.5 | 2,305ms | 141 | $0.80 | $4.00 |
| 24 | openai/o4-mini | 2,328ms | 111 | $1.10 | $4.40 |
| 25 | openai/gpt-4.1-mini | 2,340ms | 109 | $0.40 | $1.60 |
| 26 | openai/o1 | 2,562ms | 100 | $15.00 | $60.00 |
| 27 | openai/gpt-4.1-nano | 2,640ms | 97 | $0.10 | $0.40 |
| 28 | openai/o1-mini | 2,746ms | 93 | $1.10 | $4.40 |
| 29 | openai/gpt-4o-mini | 2,764ms | 93 | $0.15 | $0.60 |
| 30 | openai/o3 | 2,862ms | 90 | $2.00 | $8.00 |
| 31 | openai/gpt-5-nano | 3,187ms | 81 | $0.50 | $2.00 |
| 32 | openai/gpt-5.2-pro | 3,546ms | 73 | $2.50 | $10.00 |
| 33 | openai/gpt-4o | 5,378ms | 48 | $2.50 | $10.00 |
| 34 | openai/gpt-4.1 | 5,477ms | 47 | $2.00 | $8.00 |
| 35 | openai/gpt-5.3 | 5,910ms | 43 | $2.50 | $10.00 |
| 36 | openai/gpt-5.4 | 6,213ms | 41 | $2.50 | $15.00 |
| 37 | openai/gpt-5.2 | 6,507ms | 40 | $2.50 | $10.00 |
| 38 | openai/gpt-5.4-pro | 6,671ms | 40 | $2.50 | $15.00 |
| 39 | openai/gpt-5.3-codex | 7,935ms | 32 | $2.50 | $10.00 |
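
To read the price columns concretely: one request's dollar cost is token counts multiplied by the per-million-token rates from the Input/Output columns. A quick sketch (the `request_cost` helper is ours, not part of ClawRouter; prices are copied from the table):

```python
# Dollar cost of one request, given per-million-token prices
# (input/output rates taken from the table above).
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# A 2,000-token prompt with a 500-token reply:
cheap = request_cost(2_000, 500, 0.10, 0.40)     # gemini-2.5-flash-lite: ~$0.0004
pricey = request_cost(2_000, 500, 15.00, 75.00)  # claude-opus-4.6: ~$0.0675
```

At that request size, the spread between the cheapest and priciest rows works out to roughly 169x per request.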
## Three Things That Surprised Us

OpenAI's "mini" and "nano" variants are faster (2.2-3.2s range) but still 2x slower than the Grok/Gemini tier.

We cross-referenced our latency data with quality scores from [Artificial Analysis](https://artificialanalysis.ai/leaderboards/models) (Intelligence Index v4.0):

| Model                  | BlockRun Latency | Intelligence Index | Price Tier  |
| ---------------------- | ---------------- | ------------------ | ----------- |
| Gemini 3.1 Pro         | 1,609ms          | 57                 | $1.25/$10   |
| GPT-5.4                | 6,213ms          | 57                 | $2.50/$15   |
| GPT-5.3 Codex          | 7,935ms          | 54                 | $2.50/$10   |
| Claude Opus 4.6        | 2,139ms          | 53                 | $15/$75     |
| Claude Sonnet 4.6      | 2,110ms          | 52                 | $3/$15      |
| Kimi K2.5              | 1,646ms          | 47                 | $0.60/$3    |
| Gemini 3 Flash Preview | 1,398ms          | 46                 | $0.15/$0.60 |
| Grok 4                 | 1,348ms          | 41                 | $0.20/$1.50 |
| Grok 4.1 Fast          | 1,244ms          | 41                 | $0.20/$0.50 |
| DeepSeek V3            | 1,431ms          | 32                 | $0.27/$1.10 |
| Grok 3                 | 1,244ms          | 32                 | $3/$15      |
| Grok 4 Fast            | 1,143ms          | 23                 | $0.20/$0.50 |
| Gemini 2.5 Flash       | 1,238ms          | 20                 | $0.15/$0.60 |

**Gemini 3.1 Pro** is the standout: highest intelligence score (57) at just 1.6 seconds. GPT-5.4 matches its intelligence but takes **4x longer**.
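
That standout claim can be checked mechanically: a model sits on the latency/quality frontier if no other model is at least as fast and at least as smart. A minimal sketch over a few rows copied from the table above (the `frontier` helper is illustrative, not part of ClawRouter):

```python
# (name, latency_ms, intelligence_index) — rows copied from the table above.
models = [
    ("gemini-3.1-pro", 1609, 57),
    ("gpt-5.4", 6213, 57),
    ("claude-opus-4.6", 2139, 53),
    ("kimi-k2.5", 1646, 47),
    ("grok-4-fast", 1143, 23),
]

def frontier(rows):
    """Keep rows no other row dominates: lower-or-equal latency AND
    greater-or-equal intelligence, from a row with different stats."""
    return [
        (name, lat, iq)
        for name, lat, iq in rows
        if not any(
            l2 <= lat and q2 >= iq and (l2, q2) != (lat, iq)
            for _, l2, q2 in rows
        )
    ]

print([name for name, *_ in frontier(models)])
# → ['gemini-3.1-pro', 'grok-4-fast']
```

GPT-5.4 drops out immediately: Gemini 3.1 Pro matches its Intelligence Index at roughly a quarter of the latency.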

Raw benchmark data: benchmark-results.json (github.com/BlockRunAI/ClawR…)

---
_BlockRun is the x402 micropayment gateway for AI. One wallet, 39+ models, pay-per-request with USDC. [Get started](https://blockrun.ai)_
---
The fastest model (Grok 4 Fast) was 7x faster than the slowest (GPT-5.3 Codex). Here's the full breakdown:

**2/** Top 5 fastest (end-to-end latency):

1. xai/grok-4-fast — 1,143ms
2. xai/grok-3-mini — 1,202ms
3. google/gemini-2.5-flash — 1,238ms
4. xai/grok-3 — 1,244ms
5. nvidia/gpt-oss-120b — 1,252ms (FREE)

**3/** Bottom 5 (all OpenAI):

35. openai/gpt-5.3 — 5,910ms
36. openai/gpt-5.4 — 6,213ms
37. openai/gpt-5.2 — 6,507ms
38. openai/gpt-5.4-pro — 6,671ms
39. openai/gpt-5.3-codex — 7,935ms

Every OpenAI 5.x model: 5-8 seconds. Every Grok/Gemini model: ~1.2 seconds.
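
These latency numbers are wall-clock, end-to-end measurements. The benchmark's exact harness isn't reproduced here, but the core of any such measurement is a timer around a blocking call; this generic sketch accepts any zero-argument callable (for example, a function that fires one chat-completion request):

```python
import time

def time_to_completion(call, runs: int = 3) -> float:
    """Median wall-clock latency of call() over several runs, in milliseconds.
    Median rather than mean keeps one slow outlier from skewing the number."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()  # e.g. a blocking chat-completion request
        samples.append((time.perf_counter() - start) * 1000)
    samples.sort()
    return samples[runs // 2]
```

In a real run, `call` would wrap an HTTP request to the model endpoint, so the clock stops only after the full response arrives.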

We tried routing all requests to the fastest models. Users complained the "fast" answers were lower quality.

Lesson: you need to balance speed, quality, AND cost.
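
One way to encode that three-way balance is a single weighted score over normalized speed, quality, and price. This is a hypothetical scoring function for illustration, not ClawRouter's documented routing logic; the weights and normalization ranges are assumptions drawn from this benchmark's observed ranges:

```python
def score(latency_ms: float, iq: float, price_in: float,
          w_speed: float = 0.3, w_quality: float = 0.5,
          w_cost: float = 0.2) -> float:
    """Higher is better. Each axis is normalized to roughly 0..1 using the
    ranges seen in this benchmark (1.1-8s latency, IQ 20-57, $0.10-$15/M input)."""
    speed = 1.0 - min(latency_ms, 8000) / 8000   # faster  -> closer to 1
    quality = (iq - 20) / (57 - 20)              # IQ 20..57 -> 0..1
    cost = 1.0 - min(price_in, 15) / 15          # cheaper -> closer to 1
    return w_speed * speed + w_quality * quality + w_cost * cost

# Numbers from the tables above: Gemini 3.1 Pro vs Claude Opus 4.6.
gemini = score(1609, 57, 1.25)   # fast, smartest, cheap -> high score
opus = score(2139, 53, 15.00)    # smart but expensive   -> lower score
```

Shifting the weights shifts the winner, which is the point: "best model" is a function of what the request actually needs.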

**5/** The efficiency frontier winners:
- Best overall: Gemini 3.1 Pro (IQ 57, 1.6s, $1.25/M)
- Best budget: Gemini 2.5 Flash (IQ 20, 1.2s, $0.15/M)
- Best reasoning: Claude Opus 4.6 (IQ 53, 2.1s, $15/M)

package/docs/routing-profiles.md (CHANGED)

Use `blockrun/eco` for maximum cost savings.

Use `blockrun/auto` for the best quality/price balance.

| Tier      | Primary Model               | Input | Output |
| --------- | --------------------------- | ----- | ------ |
| SIMPLE    | moonshot/kimi-k2.5          | $0.60 | $3.00  |
| MEDIUM    | xai/grok-code-fast-1        | $0.20 | $1.50  |
| COMPLEX   | google/gemini-3.1-pro       | $2.00 | $12.00 |
| REASONING | xai/grok-4-1-fast-reasoning | $0.20 | $0.50  |

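
The tier table maps one-to-one onto a lookup structure. A minimal sketch of how a client could mirror the `blockrun/auto` profile (the dict copies the table; the MEDIUM fallback for unknown tiers is our assumption, not documented ClawRouter behavior):

```python
# Primary model per complexity tier, mirroring the blockrun/auto table above.
AUTO_PROFILE = {
    "SIMPLE": "moonshot/kimi-k2.5",
    "MEDIUM": "xai/grok-code-fast-1",
    "COMPLEX": "google/gemini-3.1-pro",
    "REASONING": "xai/grok-4-1-fast-reasoning",
}

def pick_model(tier: str) -> str:
    # Unknown tiers fall back to MEDIUM (an assumption in this sketch).
    return AUTO_PROFILE.get(tier.upper(), AUTO_PROFILE["MEDIUM"])

print(pick_model("complex"))  # google/gemini-3.1-pro
```
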
---