npm - @blockrun/clawrouter - Versions diffs - 0.12.61 → 0.12.63 - Mend

@blockrun/clawrouter 0.12.61 → 0.12.63

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (33)

package/docs/anthropic-cost-savings.md +349 -0
package/docs/architecture.md +559 -0
package/docs/assets/blockrun-248-day-cost-overrun-problem.png +0 -0
package/docs/assets/blockrun-clawrouter-7-layer-token-compression-openclaw.png +0 -0
package/docs/assets/blockrun-clawrouter-observation-compression-97-percent-token-savings.png +0 -0
package/docs/assets/blockrun-clawrouter-openclaw-agentic-proxy-architecture.png +0 -0
package/docs/assets/blockrun-clawrouter-openclaw-automatic-tier-routing-model-selection.png +0 -0
package/docs/assets/blockrun-clawrouter-openclaw-error-classification-retry-storm-prevention.png +0 -0
package/docs/assets/blockrun-clawrouter-openclaw-session-memory-journaling-vs-context-compounding.png +0 -0
package/docs/assets/blockrun-clawrouter-vs-openclaw-standalone-comparison-production-safety.png +0 -0
package/docs/assets/blockrun-clawrouter-x402-usdc-micropayment-wallet-budget-control.png +0 -0
package/docs/assets/blockrun-openclaw-inference-layer-blind-spots.png +0 -0
package/docs/blog-benchmark-2026-03.md +184 -0
package/docs/blog-openclaw-cost-overruns.md +197 -0
package/docs/clawrouter-savings.png +0 -0
package/docs/configuration.md +512 -0
package/docs/features.md +257 -0
package/docs/image-generation.md +380 -0
package/docs/plans/2026-02-03-smart-routing-design.md +267 -0
package/docs/plans/2026-02-13-e2e-docker-deployment.md +1260 -0
package/docs/plans/2026-02-28-worker-network.md +947 -0
package/docs/plans/2026-03-18-error-classification.md +574 -0
package/docs/plans/2026-03-19-exclude-models.md +538 -0
package/docs/routing-profiles.md +81 -0
package/docs/subscription-failover.md +320 -0
package/docs/technical-routing-2026-03.md +322 -0
package/docs/troubleshooting.md +159 -0
package/docs/vision.md +49 -0
package/docs/vs-openrouter.md +157 -0
package/docs/worker-network.md +1241 -0
package/package.json +3 -2
package/scripts/reinstall.sh +8 -4
package/scripts/update.sh +8 -4

package/docs/anthropic-cost-savings.md ADDED

@@ -0,0 +1,349 @@
+# Stop Overpaying for Claude: How ClawRouter Cuts Your Anthropic Bill by 70%
+*You love Claude. Your wallet doesn't. Here's how to keep frontier-quality answers — at a fraction of the cost.*
+---
+## The Problem: Claude Is Brilliant, But Expensive
+If you're building with the Anthropic API, you already know Claude is the best reasoning model available. Opus 4.6 runs $5/$25 per million input/output tokens, Sonnet runs $3/$15, and even Haiku costs $1/$5.
+But here's what most developers won't admit: **the majority of your API calls don't need Claude.**
+Think about your typical workload. You're building a SaaS app. Some requests need Claude's reasoning — debugging complex code, analyzing long documents, orchestrating multi-step agent workflows. But most requests are mundane: extracting JSON from text, answering simple user questions, translating a string, summarizing a paragraph.
+You're paying $3-25 per million tokens for work that a $0.10 model handles identically.
+**The problem is simple:** you're paying Claude rates on 100% of your requests, but only ~30% of them need Claude.
+---
+## What Does a Typical Developer Workload Look Like?
+### The Everyday Tasks (~70% of requests)
+These are the requests you fire off constantly and barely think about:
+- **"Extract the name and email from this text and return JSON"** — Any model can do this. You're paying Claude $15/M output tokens for structured extraction that a $0.40 model handles perfectly.
+- **"Summarize this customer support ticket in 2 sentences"** — Summarization is a solved problem. You don't need frontier reasoning here.
+- **"Translate this error message to Spanish"** — Translation is a commodity task. Paying Claude rates for it is like taking a Lamborghini to the grocery store.
+- **"What's the difference between `useEffect` and `useLayoutEffect`?"** — Factual Q&A. Every model gets this right.
+- **"Convert this CSV data to a markdown table"** — Pure formatting. A free model does this identically.
+### The Tasks That Actually Need Claude (~30% of requests)
+This is where you're paying for real value:
+- **Complex code generation** — "Refactor this authentication module to support OAuth2 + PKCE, handle token refresh, and add rate limiting." Multi-file, multi-constraint reasoning. Claude earns its price here.
+- **Long-document analysis** — "Read this 50-page contract and identify all clauses that could expose us to liability over $1M." Context window + reasoning quality matter.
+- **Multi-step agent orchestration** — "Scan these 5 APIs, cross-reference the data, and generate a report with recommendations." Agentic workflows where the model needs to maintain a plan across many steps.
+- **Advanced reasoning** — "Debug this race condition in our distributed system" or "Prove this algorithm is O(n log n)." Tasks where cheaper models lose the thread.
+---
+## The Solution: ClawRouter
+[ClawRouter](https://github.com/blockrunai/ClawRouter) is an open-source local proxy that sits between your app and 41+ AI models. It saves you money in three ways: **smart routing**, **token optimization**, and **response caching**.
+```
+┌─────────────┐     ┌──────────────────────────────┐     ┌──────────────────┐
+│  Your App    │────▶│       ClawRouter              │────▶│  41+ AI Models   │
+│  (OpenAI     │     │       (local proxy)           │     │                  │
+│   SDK)       │     │                               │     │  FREE  (nvidia)  │
+│              │     │  1. Route to cheapest model    │     │  $0.10 (gemini)  │
+│  model:      │     │  2. Compress tokens            │     │  $3.00 (sonnet)  │
+│  "auto"      │     │  3. Cache repeated requests    │     │  $0.20 (grok)    │
+└─────────────┘     └──────────────────────────────┘     └──────────────────┘
+```
+---
+## How You Save: Three Layers
+### Layer 1: Smart Routing (the biggest win)
+ClawRouter scores every prompt against 14 dimensions in <1ms and routes it to the cheapest model that can handle the task.
+```
+"What is the capital of France?"
+  → SIMPLE → nvidia/gpt-oss-120b (FREE)
+"Extract JSON from this text"
+  → SIMPLE → nvidia/gpt-oss-120b (FREE)
+"Refactor this auth module with OAuth2 + PKCE"
+  → COMPLEX → anthropic/claude-sonnet-4.6 ($3/$15)
+"Prove sqrt(2) is irrational, show every step"
+  → REASONING → xai/grok-4-1-fast-reasoning ($0.20/$0.50)
+```
+From real production data across 20,000+ paying user requests:
+| Model | % of Requests | Price (input/output per M) |
+|---|---|---|
+| gemini-2.5-flash-lite | 34.5% | $0.10 / $0.40 |
+| **claude-sonnet-4.6** | **22.7%** | **$3.00 / $15.00** |
+| kimi-k2.5 | 16.2% | $0.60 / $3.00 |
+| minimax-m2.5 | 6.5% | $0.30 / $1.20 |
+| grok-code-fast | 6.1% | $0.20 / $1.50 |
+| grok-reasoning | 2.9% | $0.20 / $0.50 |
+| claude-haiku-4.5 | 2.7% | $1.00 / $5.00 |
+| nvidia/gpt-oss-120b | 2.1% | FREE |
+| Others | 6.3% | varies |
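+**Result:** about 77% of requests go to models cheaper than Sonnet: roughly 5-37x cheaper per token for the paid non-Claude models listed above, 3x cheaper for Haiku, and free for gpt-oss-120b. Only the ~23% that genuinely need Sonnet-class quality still go to Sonnet.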
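As a rough check on the table above, the sketch below recomputes the share of traffic routed away from Sonnet and how many times cheaper each listed paid model is per output token. The shares and prices are copied from the table; the dictionary layout and variable names are illustrative only, and the "Others" bucket is left out of the ratios because its prices vary.

```python
# (share of requests in %, input $/M tokens, output $/M tokens), from the table.
MODELS = {
    "gemini-2.5-flash-lite": (34.5, 0.10, 0.40),
    "claude-sonnet-4.6": (22.7, 3.00, 15.00),
    "kimi-k2.5": (16.2, 0.60, 3.00),
    "minimax-m2.5": (6.5, 0.30, 1.20),
    "grok-code-fast": (6.1, 0.20, 1.50),
    "grok-reasoning": (2.9, 0.20, 0.50),
    "claude-haiku-4.5": (2.7, 1.00, 5.00),
    "nvidia/gpt-oss-120b": (2.1, 0.00, 0.00),
}
OTHERS_SHARE = 6.3  # "Others" row; prices vary, so it is excluded below

sonnet_share, _, sonnet_out = MODELS["claude-sonnet-4.6"]
non_sonnet_share = round(100.0 - sonnet_share, 1)

# Sonnet's output price divided by each paid model's output price.
output_ratio = {
    name: round(sonnet_out / out_price, 1)
    for name, (_, _, out_price) in MODELS.items()
    if out_price > 0 and name != "claude-sonnet-4.6"
}

print(f"Routed away from Sonnet: {non_sonnet_share}%")
for name, ratio in sorted(output_ratio.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {ratio}x cheaper per output token")
```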
+### Layer 2: Token Compression (saves on every request)
+Even when a request does go to Claude, ClawRouter reduces the tokens you pay for. The proxy runs a multi-layer compression pipeline on your request **before** sending it to the provider — and you pay based on the **compressed** token count, not the original.
+**How it works:**
+| Compression Layer | What It Does | Savings |
+|---|---|---|
+| **Deduplication** | Removes duplicate messages in conversation history | 2-5% |
+| **Whitespace normalization** | Strips excess whitespace, trailing spaces, empty lines | 3-8% |
+| **JSON compaction** | Minifies JSON in tool calls and results | 2-4% |
+These three layers are **enabled by default** and are completely safe — they don't change semantic meaning. The compression triggers automatically on requests larger than 180KB (common in agent workflows and long conversations).
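The three default layers can be sketched in a few lines. This is a minimal illustration, not ClawRouter's actual pipeline: the function names are hypothetical, and it assumes every message is a dict whose `content` is a plain string.

```python
import json
import re


def dedupe_messages(messages):
    """Drop messages that exactly repeat an earlier (role, content) pair."""
    seen = set()
    unique = []
    for msg in messages:
        key = (msg["role"], msg["content"])
        if key not in seen:
            seen.add(key)
            unique.append(msg)
    return unique


def normalize_whitespace(text):
    """Strip trailing spaces and collapse runs of empty lines."""
    text = re.sub(r"[ \t]+\n", "\n", text)  # trailing spaces before a newline
    text = re.sub(r"\n{3,}", "\n\n", text)  # 3+ newlines become one blank line
    return text.strip()


def compact_json(text):
    """Minify text that parses as JSON; return anything else unchanged."""
    try:
        return json.dumps(json.loads(text), separators=(",", ":"))
    except (ValueError, TypeError):
        return text


def compress(messages):
    """Apply deduplication, whitespace normalization, then JSON compaction."""
    compressed = []
    for msg in dedupe_messages(messages):
        content = compact_json(normalize_whitespace(msg["content"]))
        compressed.append({**msg, "content": content})
    return compressed
```

A proxy following the description above would call something like `compress()` only when the serialized request exceeds the 180KB threshold.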
+**For agent-heavy workloads** (long tool outputs, multi-turn conversations), the savings are even larger. An optional observation compression layer can reduce massive tool outputs by up to 97% — turning 10KB of verbose log output into 300 characters of essential information.
+**Typical combined savings: 7-15% fewer tokens per request.** On long-context agent workloads: 20-40%.
+This matters most on expensive models. If you're sending a 50K-token agent conversation to Claude Sonnet (about $0.15 of input at $3 per million tokens), 15% compression saves ~$0.02 per request, which adds up to real money at scale.
+### Layer 3: Response Cache + Request Deduplication (saves 100%)
+ClawRouter caches responses locally. If your app sends the same request within 10 minutes, you get an instant response at **zero cost** — no API call, no tokens billed.
+This is more common than you'd think:
+- **Retry logic** — Your app retries on timeout. Without dedup, you pay twice. With ClawRouter, the retry resolves from cache instantly.
+- **Redundant requests** — Multiple users or processes asking the same thing? One API call, multiple responses.
+- **Agent loops** — Agentic frameworks often re-query with identical context. Cache catches these.
+```
+Request 1: "Summarize this document" → API call → $0.02 → cached
+Request 2: "Summarize this document" → cache hit → $0.00 → instant
+Request 3: "Summarize this document" → cache hit → $0.00 → instant
+```
+The deduplicator also catches in-flight duplicates: if two identical requests arrive simultaneously, only one goes to the provider. Both callers get the same response.
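The two behaviors described above (a time-limited response cache and in-flight deduplication) can be combined as in the sketch below. It is an assumption-laden illustration rather than ClawRouter's implementation: `CachingDeduplicator` and `call_provider` are hypothetical names, and the cache key is simply a SHA-256 hash of the request serialized as canonical JSON.

```python
import asyncio
import hashlib
import json
import time

CACHE_TTL_SECONDS = 600  # the 10-minute window described above


class CachingDeduplicator:
    def __init__(self, call_provider, ttl=CACHE_TTL_SECONDS, clock=time.monotonic):
        self._call_provider = call_provider  # async: request dict -> response
        self._ttl = ttl
        self._clock = clock
        self._cache = {}      # key -> (expires_at, response)
        self._in_flight = {}  # key -> asyncio.Task shared by concurrent callers

    @staticmethod
    def _key(request):
        # Canonical JSON, so dict key order does not change the hash.
        canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    async def complete(self, request):
        key = self._key(request)
        cached = self._cache.get(key)
        if cached and cached[0] > self._clock():
            return cached[1]  # cache hit: no provider call
        task = self._in_flight.get(key)
        if task is None:  # first caller starts the real request
            task = asyncio.create_task(self._fetch(key, request))
            self._in_flight[key] = task
        return await task  # concurrent duplicates await the same task

    async def _fetch(self, key, request):
        try:
            response = await self._call_provider(request)
            self._cache[key] = (self._clock() + self._ttl, response)
            return response
        finally:
            self._in_flight.pop(key, None)
```

With this shape, two identical requests issued concurrently and a third issued shortly afterwards result in a single provider call.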
+---
+## The Cost Math (Honest Numbers)
+**10,000 mixed requests per month**, averaging 1,000 input tokens and 500 output tokens each.
+### Direct Anthropic API
+| Approach | Input (10M tokens) | Output (5M tokens) | Monthly Total |
+|---|---|---|---|
+| All Claude Sonnet | $30.00 | $75.00 | **$105.00** |
+| All Claude Opus | $50.00 | $125.00 | **$175.00** |
+### ClawRouter (real paying-user distribution)
+| Tier | % Requests | Routed To | Cost |
+|---|---|---|---|
+| Cheap models | 34.5% | gemini-flash-lite | $0.76 |
+| Mid-tier | 16.2% | kimi-k2.5 | $2.43 |
+| **Claude (complex)** | **22.7%** | **claude-sonnet-4.6** | **$17.44** |
+| Code models | 6.1% | grok-code-fast | $0.52 |
+| Reasoning | 2.9% | grok-reasoning | $0.03 |
+| Haiku | 2.7% | claude-haiku-4.5 | $0.76 |
+| Free | 2.1% | nvidia/gpt-oss-120b | $0.00 |
+| Other | 12.8% | various | $1.18 |
+| **Subtotal (routing)** | | | **$23.12** |
+| Token compression (~10%) | | | **-$2.31** |
+| Cache hits (~5% est.) | | | **-$1.16** |
+| **Final Total** | | | **~$19.65** |
+### The Bottom Line
+| Approach | Monthly Cost | Savings |
+|---|---|---|
+| Direct Claude Sonnet | $105.00 | — |
+| Direct Claude Opus | $175.00 | — |
+| **ClawRouter** | **~$20** | **~81% vs Sonnet, ~89% vs Opus** |
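The arithmetic behind these tables can be reproduced with a short script. The direct-API costs follow from the stated 10,000 requests at 1,000 input and 500 output tokens each. The per-tier routed costs are copied from the ClawRouter table as given, not re-derived from token counts, and the script assumes the compression and cache discounts are both taken as a percentage of the routing subtotal, as that table does.

```python
REQUESTS = 10_000
INPUT_TOKENS, OUTPUT_TOKENS = 1_000, 500  # per request, as stated above


def direct_cost(input_price, output_price):
    """Monthly cost in dollars at the given $/M-token prices."""
    input_millions = REQUESTS * INPUT_TOKENS / 1_000_000
    output_millions = REQUESTS * OUTPUT_TOKENS / 1_000_000
    return input_millions * input_price + output_millions * output_price


sonnet = direct_cost(3.00, 15.00)
opus = direct_cost(5.00, 25.00)

# Per-tier routed costs in dollars, copied from the ClawRouter table.
tier_costs = [0.76, 2.43, 17.44, 0.52, 0.03, 0.76, 0.00, 1.18]
subtotal = sum(tier_costs)
# ~10% token compression and ~5% cache hits, both applied to the subtotal.
final = subtotal - 0.10 * subtotal - 0.05 * subtotal

print(f"Sonnet ${sonnet:.2f}, Opus ${opus:.2f}")
print(f"Routing subtotal ${subtotal:.2f}, final ${final:.2f}")
print(f"Savings vs Sonnet {1 - final / sonnet:.1%}, vs Opus {1 - final / opus:.1%}")
```

Running it reproduces the $105.00 and $175.00 baselines, the $23.12 subtotal, and the ~$19.65 final total, which is the source of the ~81% and ~89% figures.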
+Breaking down where the savings come from:
+| Savings Source | Estimated Impact | How |
+|---|---|---|
+| **Smart routing** | ~78% cost reduction vs. all-Sonnet ($105.00 → $23.12) | 77% of requests → cheaper models |
+| **Token compression** | ~7-15% on remaining cost | Fewer tokens billed per request |
+| **Response cache** | ~3-5% additional | Repeat requests cost $0 |
+| **Request dedup** | Prevents overcharges | Retries don't double-bill |
+---
+## How the 14-Dimension Router Works
+ClawRouter runs a weighted scoring algorithm on every prompt — entirely locally, in under 1 millisecond, zero external API calls.
+| Dimension | Weight | Detects |
+|---|---|---|
+| Reasoning Markers | 0.18 | "prove," "step by step," "analyze" |
+| Code Presence | 0.15 | `function`, `class`, `import`, code blocks |
+| Multi-Step Patterns | 0.12 | "first...then," numbered steps |
+| Technical Terms | 0.10 | Domain-specific vocabulary |
+| Token Count | 0.08 | Short vs. long context |
+| Question Complexity | 0.05 | Nested or compound questions |
+| Creative Markers | 0.05 | Creative writing indicators |
+| Constraint Count | 0.04 | "max," "minimum," "at most" |
+| Imperative Verbs | 0.03 | "create," "generate," "build" |
+| Output Format | 0.03 | JSON, YAML, table, markdown |
+| Simple Indicators | 0.02 | "what is," "define," "translate" |
+| Reference Complexity | 0.02 | "the code above," "the docs" |
+| Domain Specificity | 0.02 | Quantum, genomics, etc. |
+| Negation Complexity | 0.01 | "don't," "never," "avoid" |
+The weighted score maps to four tiers:
+```
+Score < 0.0   →  SIMPLE     →  Free or ultra-cheap models
+Score 0.0–0.3 →  MEDIUM     →  Mid-tier (Kimi K2.5, DeepSeek)
+Score 0.3–0.5 →  COMPLEX    →  Frontier (Claude Sonnet, Gemini Pro)
+Score > 0.5   →  REASONING  →  Specialized (Grok Reasoning, DeepSeek-R)
+```
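The weighted sum and tier mapping can be sketched as follows. The weights and thresholds come from the tables above; everything else is an assumption made for illustration: the dimension key names, the idea that each detector returns a score in [-1, 1], and the handling of the exact boundary values 0.3 and 0.5, which are both assigned to COMPLEX here.

```python
WEIGHTS = {
    "reasoning_markers": 0.18,
    "code_presence": 0.15,
    "multi_step_patterns": 0.12,
    "technical_terms": 0.10,
    "token_count": 0.08,
    "question_complexity": 0.05,
    "creative_markers": 0.05,
    "constraint_count": 0.04,
    "imperative_verbs": 0.03,
    "output_format": 0.03,
    "simple_indicators": 0.02,
    "reference_complexity": 0.02,
    "domain_specificity": 0.02,
    "negation_complexity": 0.01,
}


def weighted_score(dimension_scores):
    """Sum weight * score over all dimensions; missing dimensions count as 0."""
    return sum(w * dimension_scores.get(name, 0.0) for name, w in WEIGHTS.items())


def tier_for(score):
    """Map a weighted score to one of the four tiers."""
    if score < 0.0:
        return "SIMPLE"
    if score < 0.3:
        return "MEDIUM"
    if score <= 0.5:
        return "COMPLEX"
    return "REASONING"


# Hypothetical detector outputs for a short "what is ..." style prompt.
simple_prompt = {"simple_indicators": -1.0, "token_count": -1.0}
print(tier_for(weighted_score(simple_prompt)))
```

Under these assumptions a prompt only reaches REASONING when several heavily weighted dimensions fire together, since the listed weights sum to 0.90.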
+Multilingual support across 9 languages. Tool-calling and vision requests automatically filter for compatible models. If the primary model fails, a fallback chain tries alternatives before returning an error.
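Capability filtering and the fallback chain might look like the sketch below. The model records, the `capabilities` sets, and the `call_model` callback are hypothetical stand-ins; the source does not specify how ClawRouter represents models or which errors trigger a fallback.

```python
class AllModelsFailed(Exception):
    """Raised when every compatible model in the chain has failed."""


def compatible_models(candidates, required):
    """Keep models whose capabilities cover everything the request needs."""
    return [m for m in candidates if required <= m["capabilities"]]


def complete_with_fallback(request, candidates, call_model, required=frozenset()):
    """Try compatible models in order; return (model name, response)."""
    errors = []
    for model in compatible_models(candidates, required):
        try:
            return model["name"], call_model(model["name"], request)
        except Exception as exc:  # a real proxy would classify the error first
            errors.append((model["name"], exc))
    raise AllModelsFailed(errors)
```

A request that needs tool calling would pass `required={"tools"}`, so text-only models are skipped before any call is attempted.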
+---
+## Getting Started: 3 Minutes
+### Step 1: Install
+```bash
+npx @blockrun/clawrouter
+```
+Starts a local proxy on port 8402. Auto-generates a crypto wallet. Done.
+### Step 2: Update Your Code
+**Python** — change 2 lines:
+```python
+from openai import OpenAI
+client = OpenAI(
+    base_url="http://localhost:8402/v1",  # ← was: https://api.anthropic.com
+    api_key="unused"                       # ← ClawRouter handles auth
+)
+response = client.chat.completions.create(
+    model="blockrun/auto",                 # ← was: claude-sonnet-4.6
+    messages=[{"role": "user", "content": "Your prompt here"}]
+)
+```
+**TypeScript** — same idea:
+```typescript
+import OpenAI from "openai";
+const client = new OpenAI({
+  baseURL: "http://localhost:8402/v1",
+  apiKey: "unused",
+});
+const response = await client.chat.completions.create({
+  model: "blockrun/auto",  // or "blockrun/eco" for max savings, "blockrun/premium" for best quality
+  messages: [{ role: "user", content: "Your prompt here" }],
+});
+```
+**Routing profiles:**
+- `blockrun/auto` — Balanced cost/quality (default)
+- `blockrun/eco` — Maximum savings (free tier aggressively)
+- `blockrun/premium` — Best quality (Opus/Sonnet/GPT-5)
+- `blockrun/free` — Free tier only (gpt-oss-120b)
+### Step 3: Fund (optional)
+```bash
+# Your wallet address is shown on startup.
+# Send any amount of USDC on Base chain to that address.
+# $1 is enough for hundreds of requests.
+# Or start with $0 — the free tier model works immediately.
+```
+That's it. Your existing code works. Your output quality on complex tasks stays the same.
+### Check Your Savings
+```
+$ /stats 7
+╔═══════════════════════════════════════════════════════╗
+║        ClawRouter v0.12.12 — Usage Statistics         ║
+╠═══════════════════════════════════════════════════════╣
+║  Period: last 7 days                                  ║
+║  Total Requests: 1,523                                ║
+║  Actual Cost:    $12.35                               ║
+║  Baseline Cost:  $156.23 (if all went to Opus 4.6)   ║
+║  Saved: $143.89 (92.1%)                               ║
+╠═══════════════════════════════════════════════════════╣
+║  SIMPLE     ████████████████     50.2%  (765 reqs)   ║
+║  MEDIUM     ██████████           28.5%  (434 reqs)   ║
+║  COMPLEX    ██████               15.0%  (228 reqs)   ║
+║  REASONING  ██                    6.3%   (96 reqs)   ║
+╚═══════════════════════════════════════════════════════╝
+```
+---
+## Why ClawRouter Instead of OpenRouter?
+| | ClawRouter | OpenRouter |
+|---|---|---|
+| **Smart routing** | Automatic — 14-dimension scorer picks the model | Manual — you pick the model |
+| **Token optimization** | Built-in compression (7-15% savings) | None |
+| **Response caching** | Local cache, repeat requests = $0 | None |
+| **Request dedup** | Retries don't double-bill | None |
+| **Routing latency** | <1ms (local, on your machine) | Additional network hop |
+| **Payments** | Non-custodial USDC on Base (your wallet, your keys) | Prepaid credit balance (custodial) |
+| **Free tier** | NVIDIA GPT-OSS-120B (always available) | Rate-limited `:free` model variants |
+| **API keys** | Zero (the proxy handles all auth) | One OpenRouter API key |
+| **Algorithm** | Open-source, MIT license, modify it yourself | Proprietary |
+The fundamental difference: **OpenRouter is a model marketplace where you choose.** ClawRouter is an intelligent proxy that **chooses for you**, compresses your tokens, caches your responses, and pays per-request with crypto from your own wallet.
+---
+## TL;DR
+| What | Details |
+|---|---|
+| **Problem** | You pay Claude $3-25/M tokens on every request, but ~70% don't need Claude |
+| **Solution** | ClawRouter auto-routes + compresses + caches |
+| **Savings** | ~81% vs Sonnet, ~89% vs Opus |
+| **How** | Routing (~78% vs. all-Sonnet) + token compression (7-15%) + caching (3-5%) |
+| **Code change** | 2 lines (base_url + model name) |
+| **Setup time** | 3 minutes |
+| **Quality tradeoff** | None — complex tasks still go to Claude |
+| **Open source** | MIT license, local proxy, non-custodial payments |
+```bash
+# Start saving now:
+npx @blockrun/clawrouter
+```
+**Links:**
+- [ClawRouter on GitHub](https://github.com/blockrunai/ClawRouter) — MIT License
+- [BlockRun](https://blockrun.ai) — AI model marketplace
+- [x402 Protocol](https://www.x402.org/) — Per-request crypto payments for AI
+---
+*Cost data based on real production traffic from paying users across 20,000+ requests, March 2026. Savings vary by workload — agent-heavy and long-context workloads see larger compression benefits. ClawRouter is open-source and part of the BlockRun ecosystem.*