@blockrun/llm 1.13.0 → 2.0.0
- package/README.md +65 -81
- package/dist/index.cjs +402 -218
- package/dist/index.d.cts +141 -77
- package/dist/index.d.ts +141 -77
- package/dist/index.js +403 -218
- package/package.json +1 -1
package/README.md
CHANGED
@@ -2,7 +2,7 @@
 
 > **@blockrun/llm** is a TypeScript/Node.js SDK for accessing 41+ large language models (GPT-5, Claude, Gemini, Grok, DeepSeek, Kimi, and more) with automatic pay-per-request USDC micropayments via the x402 protocol. No API keys required — your wallet signature is your authentication. Supports **streaming**, smart routing, Base and Solana chains.
 >
-> 🆓 **Includes
+> 🆓 **Includes 8 fully-free NVIDIA-hosted models** (6 visible in `/v1/models`, 2 hidden but directly callable) — DeepSeek V4 Flash (1M context), Nemotron Nano Omni (vision), Qwen3, Llama 4, Mistral, plus the gpt-oss pair. Zero USDC, no rate-limit gimmicks. Use `routingProfile: 'free'` or call any `nvidia/*` model directly.
 
 [](https://www.npmjs.com/package/@blockrun/llm)
 [](LICENSE)
@@ -63,17 +63,18 @@ console.log(result.response); // '4'
 
 | Model ID | Context | Best For |
 |----------|---------|----------|
-| `nvidia/deepseek-v4-
-| `nvidia/deepseek-v4-flash` | 1M | ~5× faster than V4 Pro — chat, summarization, light reasoning (weaker factual recall) |
+| `nvidia/deepseek-v4-flash` | 1M | DeepSeek V4 Flash — 284B / 13B active MoE, ~5× faster than V4 Pro. Best free chat / summarization / light reasoning |
 | `nvidia/nemotron-3-nano-omni-30b-a3b-reasoning` | 256K | Only vision-capable free model — text + images + video (≤2 min) + audio (≤1 hr) |
 | `nvidia/qwen3-next-80b-a3b-thinking` | 131K | 116 tok/s reasoning with thinking mode |
 | `nvidia/mistral-small-4-119b` | 131K | 114 tok/s — fastest free chat |
-| `nvidia/glm-4.7` | 131K | 237 tok/s — GLM-4.7 with thinking mode |
 | `nvidia/llama-4-maverick` | 131K | Meta Llama 4 Maverick MoE |
 | `nvidia/qwen3-coder-480b` | 131K | Coding-optimised 480B MoE |
-| `nvidia/
+| `nvidia/gpt-oss-120b` | 128K | OpenAI open-weight 120B — 123 tok/s. Hidden from `/v1/models` for privacy but direct calls still work |
+| `nvidia/gpt-oss-20b` | 128K | OpenAI open-weight 20B — 155 tok/s. Hidden from `/v1/models` but direct calls still work |
 
->
+> Need V4-Pro-class reasoning? Use the paid `deepseek/deepseek-v4-pro` ($0.50/$1.00 with the 75% promo through 2026-05-31) — `nvidia/deepseek-v4-pro` is currently hidden because NVIDIA's NIM deployment is hung; backend MODEL_REDIRECTS forwards calls to V4 Flash.
+
+> Privacy note: `nvidia/gpt-oss-120b` and `nvidia/gpt-oss-20b` are hidden from `/v1/models` because NVIDIA's free build.nvidia.com tier reserves the right to use prompts/outputs for service improvement. Direct calls by full model ID still work — opt in only when your data isn't sensitive.
 
 ## Quick Start (Solana)
 
@@ -145,13 +146,33 @@ console.log(`Saved ${(result.routing.savings * 100).toFixed(0)}%`); // 'Saved 78
 // Complex reasoning task -> routes to reasoning model
 const complex = await client.smartChat('Prove the Riemann hypothesis step by step');
 console.log(complex.model); // 'xai/grok-4-1-fast-reasoning'
+
+// Inspect the fallback chain SmartChat will walk on transient errors.
+console.log(complex.routing.fallbacks); // ['anthropic/claude-opus-4.7', ...]
+```
+
+### Automatic Fallback on Transient Errors
+
+`smartChat()` populates a tier-specific fallback chain and `chat()` /
+`chatCompletion()` walk it automatically when the primary model returns a
+transient error — timeouts, network failures, or 5xx responses (502/503/504/
+522/524). 4xx errors and `PaymentError` propagate immediately so wallet /
+auth issues surface fast.
+
+```typescript
+// Manually pass a fallback chain to chat() / chatCompletion()
+const reply = await client.chat('nvidia/deepseek-v4-flash', 'hello', {
+  fallbackModels: ['nvidia/llama-4-maverick', 'nvidia/mistral-small-4-119b'],
+});
+// If deepseek-v4-flash times out, the SDK retries against the next model
+// and logs each hop to stderr: "[@blockrun/llm] <from> -> <to> (...)".
 ```
 
 ### Routing Profiles
 
 | Profile | Description | Best For |
 |---------|-------------|----------|
-| `free` | NVIDIA free tier — smart-routes across
+| `free` | NVIDIA free tier — smart-routes across 8 models (DeepSeek V4 Flash, Nemotron Nano Omni, Qwen3, Llama 4, Mistral, plus 2 hidden gpt-oss) | Zero-cost testing, dev, prod |
 | `eco` | Cheapest models per tier (DeepSeek, xAI) | Cost-sensitive production |
 | `auto` | Best balance of cost/quality (default) | General use |
 | `premium` | Top-tier models (OpenAI, Anthropic) | Quality-critical tasks |
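The transient-vs-fatal split documented in this hunk (retry 5xx gateway statuses and timeouts, surface 4xx and payment errors immediately) can be pictured as a small classifier. This is a sketch of the rule as the README states it, not the SDK's internal code; the function name `isTransient` is hypothetical.

```typescript
// Hypothetical sketch of the retry rule described in the hunk above:
// 5xx gateway/timeout statuses are transient (walk the fallback chain),
// 4xx client errors are fatal (propagate immediately).
const TRANSIENT_STATUSES = new Set([502, 503, 504, 522, 524]);

function isTransient(status: number | null): boolean {
  // A null status models a network failure or timeout with no HTTP response.
  if (status === null) return true;
  return TRANSIENT_STATUSES.has(status);
}

console.log(isTransient(503));  // true  -> retry against the next fallback model
console.log(isTransient(401));  // false -> surface auth/wallet errors at once
console.log(isTransient(null)); // true  -> timeouts count as transient
```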
@@ -162,7 +183,7 @@ const result = await client.smartChat(
   'Write production-grade async TypeScript code',
   { routingProfile: 'premium' }
 );
-console.log(result.model); // 'anthropic/claude-opus-4.
+console.log(result.model); // 'anthropic/claude-opus-4.7'
 ```
 
 ### How ClawRouter Works
@@ -229,14 +250,15 @@ Released 2026-04-23 — first fully retrained base since GPT-4.5. 1M context, 12
 | `openai/o4-mini` | $1.10/M | $4.40/M |
 
 ### Anthropic Claude
-| Model | Input Price | Output Price |
-
-| `anthropic/claude-opus-4.
-| `anthropic/claude-opus-4.
-| `anthropic/claude-opus-4` | $
-| `anthropic/claude-
-| `anthropic/claude-sonnet-4` | $3.00/M | $15.00/M |
-| `anthropic/claude-
+| Model | Input Price | Output Price | Context | Notes |
+|-------|-------------|--------------|---------|-------|
+| `anthropic/claude-opus-4.7` | $5.00/M | $25.00/M | **1M** | Flagship — agentic coding + adaptive thinking, 128K output |
+| `anthropic/claude-opus-4.6` | $5.00/M | $25.00/M | 200K | Hidden but still callable — kept as in-family hot-swap fallback |
+| `anthropic/claude-opus-4.5` | $5.00/M | $25.00/M | 200K | |
+| `anthropic/claude-opus-4` | $15.00/M | $75.00/M | 200K | |
+| `anthropic/claude-sonnet-4.6` | $3.00/M | $15.00/M | 200K | Best for reasoning/instructions |
+| `anthropic/claude-sonnet-4` | $3.00/M | $15.00/M | 200K | |
+| `anthropic/claude-haiku-4.5` | $1.00/M | $5.00/M | 200K | |
 
 ### Google Gemini
 | Model | Input Price | Output Price |
@@ -249,10 +271,17 @@ Released 2026-04-23 — first fully retrained base since GPT-4.5. 1M context, 12
 | `google/gemini-2.5-flash-lite` | $0.10/M | $0.40/M |
 
 ### DeepSeek
-
-
-
-
+
+V4 family launched 2026-04-24. DeepSeek upstream now serves the legacy
+`deepseek-chat` / `deepseek-reasoner` aliases as V4 Flash non-thinking /
+thinking modes. V4 Pro is the new flagship paid SKU — 1.6T MoE / 49B active,
+1M context, MMLU-Pro 87.5, GPQA 90.1, SWE-bench 80.6, LiveCodeBench 93.5.
+
+| Model | Input Price | Output Price | Context | Notes |
+|-------|-------------|--------------|---------|-------|
+| `deepseek/deepseek-v4-pro` | $0.50/M | $1.00/M | 1M | V4 flagship — strongest open-weight reasoner. **75% off until 2026-05-31** (list $2.00/$4.00) |
+| `deepseek/deepseek-chat` | $0.20/M | $0.40/M | 1M | V4 Flash non-thinking (paid endpoint with 5MB request bodies; same upstream as `nvidia/deepseek-v4-flash`) |
+| `deepseek/deepseek-reasoner` | $0.20/M | $0.40/M | 1M | V4 Flash thinking (same upstream as `deepseek-chat`, thinking enabled by default) |
 
 ### xAI Grok
 | Model | Input Price | Output Price | Context | Notes |
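The per-token arithmetic behind the pricing rows added in this hunk: rates are quoted per million tokens, so a request costs tokens/1e6 × rate per direction. A small sketch (the helper name is mine, not part of the SDK) checks that the promo figures for `deepseek/deepseek-v4-pro` are consistent with the advertised 75% discount off list:

```typescript
// Cost in USD for one request, given per-million-token rates from the table.
function requestCostUSD(
  inputTokens: number,
  outputTokens: number,
  inputPerM: number,
  outputPerM: number,
): number {
  return (inputTokens / 1e6) * inputPerM + (outputTokens / 1e6) * outputPerM;
}

// deepseek-v4-pro promo rates: $0.50/M in, $1.00/M out (list $2.00/$4.00).
const promo = requestCostUSD(200_000, 50_000, 0.5, 1.0); // ≈ $0.15
const list = requestCostUSD(200_000, 50_000, 2.0, 4.0);  // ≈ $0.60
console.log(promo, list * 0.25); // promo matches 75% off list
```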
@@ -281,23 +310,26 @@ Released 2026-04-23 — first fully retrained base since GPT-4.5. 1M context, 12
 
 ### NVIDIA (Free) + Moonshot
 
-Free tier refreshed 2026-04-28: added
-Omni (vision)
-
-
-
+Free tier refreshed 2026-04-28: added `nvidia/deepseek-v4-flash` (1M context)
+and Nemotron Nano Omni (vision). `nvidia/gpt-oss-120b` and
+`nvidia/gpt-oss-20b` were briefly delisted over privacy concerns then
+**re-enabled 2026-04-30** with `available: true` + `hidden: true` — they
+no longer appear in `/v1/models` (so SmartChat won't auto-pick them) but
+direct calls by full ID still return HTTP 200. `nvidia/deepseek-v4-pro`,
+`nvidia/deepseek-v3.2`, and `nvidia/glm-4.7` are hidden because NVIDIA's
+NIM deployment is hung — backend MODEL_REDIRECTS forwards calls to V4
+Flash / qwen3-coder.
 
 | Model | Input Price | Output Price | Notes |
 |-------|-------------|--------------|-------|
-| `nvidia/deepseek-v4-
-| `nvidia/deepseek-v4-flash` | **FREE** | **FREE** | 284B / 13B active MoE, 1M context — ~5× faster than V4 Pro |
+| `nvidia/deepseek-v4-flash` | **FREE** | **FREE** | 284B / 13B active MoE, 1M context — best free chat / summarization / light reasoning |
 | `nvidia/nemotron-3-nano-omni-30b-a3b-reasoning` | **FREE** | **FREE** | 31B / 3.2B active MoE, 256K — only vision-capable free model |
 | `nvidia/qwen3-next-80b-a3b-thinking` | **FREE** | **FREE** | 116 tok/s — reasoning flagship with thinking mode |
 | `nvidia/mistral-small-4-119b` | **FREE** | **FREE** | 114 tok/s — fastest free chat |
-| `nvidia/glm-4.7` | **FREE** | **FREE** | 237 tok/s — GLM-4.7 with thinking mode |
 | `nvidia/llama-4-maverick` | **FREE** | **FREE** | Meta Llama 4 Maverick MoE |
 | `nvidia/qwen3-coder-480b` | **FREE** | **FREE** | Coding-optimised 480B MoE |
-| `nvidia/
+| `nvidia/gpt-oss-120b` | **FREE** | **FREE** | Hidden from `/v1/models` for privacy but direct calls still work — 123 tok/s |
+| `nvidia/gpt-oss-20b` | **FREE** | **FREE** | Hidden from `/v1/models` but direct calls still work — 155 tok/s |
 | `moonshot/kimi-k2.5` | $0.60/M | $3.00/M | Direct from Moonshot — replaces `nvidia/kimi-k2.5` |
 
 ### E2E Verified Models
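The MODEL_REDIRECTS behavior described in this hunk — hidden models with hung NIM deployments transparently forwarded to working ones — amounts to a lookup before dispatch. The sketch below is an illustration only: the map shape and the exact pairings are assumptions (the README says only that calls forward to "V4 Flash / qwen3-coder"), not the backend's actual table.

```typescript
// Hypothetical sketch of the backend's MODEL_REDIRECTS forwarding.
// Pairings are assumed for illustration.
const MODEL_REDIRECTS: Record<string, string> = {
  "nvidia/deepseek-v4-pro": "nvidia/deepseek-v4-flash",
  "nvidia/deepseek-v3.2": "nvidia/deepseek-v4-flash",
  "nvidia/glm-4.7": "nvidia/qwen3-coder-480b",
};

function resolveModel(requested: string): string {
  // Hidden-but-redirected IDs forward; everything else dispatches as-is.
  return MODEL_REDIRECTS[requested] ?? requested;
}

console.log(resolveModel("nvidia/deepseek-v4-pro")); // forwards to V4 Flash
console.log(resolveModel("nvidia/llama-4-maverick")); // dispatches unchanged
```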
@@ -323,7 +355,6 @@ All models below have been tested end-to-end via the TypeScript SDK (Feb 2026):
 | `openai/gpt-image-2` | $0.06-0.12/image (reasoning-driven, multilingual text rendering, character consistency) |
 | `google/nano-banana` | $0.05/image |
 | `google/nano-banana-pro` | $0.10-0.15/image |
-| `black-forest/flux-1.1-pro` | $0.04/image |
 | `xai/grok-imagine-image` | $0.02/image |
 | `xai/grok-imagine-image-pro` | $0.07/image |
 | `zai/cogview-4` | $0.015/image |
@@ -496,48 +527,6 @@ const result = await client.imageEdit(
 console.log(result.data[0].url);
 ```
 
-## Testnet Usage
-
-For development and testing without real USDC, use the testnet:
-
-```typescript
-import { testnetClient } from '@blockrun/llm';
-
-// Create testnet client (uses Base Sepolia)
-const client = testnetClient({ privateKey: '0x...' });
-
-// Chat with testnet model
-const response = await client.chat('openai/gpt-oss-20b', 'Hello!');
-console.log(response);
-
-// Check if client is on testnet
-console.log(client.isTestnet()); // true
-```
-
-### Testnet Setup
-
-1. Get testnet ETH from [Alchemy Base Sepolia Faucet](https://www.alchemy.com/faucets/base-sepolia)
-2. Get testnet USDC from [Circle USDC Faucet](https://faucet.circle.com/)
-3. Set your wallet key: `export BASE_CHAIN_WALLET_KEY=0x...`
-
-### Available Testnet Models
-
-- `openai/gpt-oss-20b` - $0.001/request (flat price)
-- `openai/gpt-oss-120b` - $0.002/request (flat price)
-
-### Manual Testnet Configuration
-
-```typescript
-import { LLMClient } from '@blockrun/llm';
-
-// Or configure manually
-const client = new LLMClient({
-  privateKey: '0x...',
-  apiUrl: 'https://testnet.blockrun.ai/api'
-});
-const response = await client.chat('openai/gpt-oss-20b', 'Hello!');
-```
-
 ## Usage Examples
 
 ### Simple Chat
@@ -771,7 +760,7 @@ Works on both `LLMClient` (Base) and `SolanaLLMClient`.
 
 ## Exa Web Search (Powered by Exa)
 
-Access [Exa](https://exa.ai)'s neural web search via x402. No API keys needed — pay-per-request
+Access [Exa](https://exa.ai)'s neural web search via x402. No API keys needed — pay-per-request. Available on **`LLMClient` (Base USDC)** and `SolanaLLMClient` (Solana USDC). Use Base as the primary path; the Solana gateway is awaiting `EXA_API_KEY` provisioning.
 
 | Method | Description | Price |
 |---|---|---|
@@ -782,9 +771,9 @@ Access [Exa](https://exa.ai)'s neural web search via x402. No API keys needed
 | `exa(path, body)` | Generic proxy for any Exa endpoint | varies |
 
 ```typescript
-import {
+import { LLMClient } from '@blockrun/llm';
 
-const client = new
+const client = new LLMClient();
 
 // Neural web search ($0.01/request)
 const results = await client.exaSearch("latest AI safety research", { numResults: 5 });
@@ -795,10 +784,6 @@ const similar = await client.exaFindSimilar("https://openai.com/research/gpt-4",
 
 // Extract content from URLs ($0.002/URL)
 const content = await client.exaContents(["https://arxiv.org/abs/2303.08774"]);
-const rich = await client.exaContents(
-  ["https://example.com/page1", "https://example.com/page2"],
-  { text: true, highlights: true }
-);
 
 // AI-generated answer from live web ($0.01/request)
 const answer = await client.exaAnswer("What is the current state of AI safety research?");
@@ -807,7 +792,7 @@ const answer = await client.exaAnswer("What is the current state of AI safety re
 const custom = await client.exa("search", { query: "transformer architecture", numResults: 5 });
 ```
 
-`SolanaLLMClient`
+Same surface on `SolanaLLMClient` once Solana-side `EXA_API_KEY` is provisioned.
 
 ## Configuration
 
@@ -944,7 +929,6 @@ Full TypeScript support with exported types:
 import {
   LLMClient,
   OpenAI,
-  testnetClient,
   type ChatMessage,
   type ChatResponse,
   type ChatOptions,