@probeo/anymodel 0.3.1 → 0.5.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +27 -3
- package/dist/cli.cjs +556 -18
- package/dist/cli.cjs.map +1 -1
- package/dist/cli.js +556 -18
- package/dist/cli.js.map +1 -1
- package/dist/index.cjs +565 -18
- package/dist/index.cjs.map +1 -1
- package/dist/index.d.cts +19 -1
- package/dist/index.d.ts +19 -1
- package/dist/index.js +562 -18
- package/dist/index.js.map +1 -1
- package/package.json +1 -1
package/README.md
CHANGED
@@ -81,9 +81,22 @@ mistral/mistral-large-latest
 groq/llama-3.3-70b-versatile
 deepseek/deepseek-chat
 xai/grok-3
+perplexity/sonar-pro
 ollama/llama3.3
 ```
 
+### Flex Pricing (OpenAI)
+
+Get 50% off OpenAI requests with flexible latency:
+
+```typescript
+const response = await client.chat.completions.create({
+  model: "openai/gpt-4o",
+  messages: [{ role: "user", content: "Hello!" }],
+  service_tier: "flex",
+});
+```
+
 ## Fallback Routing
 
 Try multiple models in order. If one fails, the next is attempted:
@@ -147,7 +160,7 @@ const response = await client.chat.completions.create({
 
 ## Batch Processing
 
-Process many requests with native provider batch APIs or concurrent fallback. OpenAI and
+Process many requests with native provider batch APIs or concurrent fallback. OpenAI, Anthropic, and Google batches are processed server-side — OpenAI at 50% cost, Anthropic with async processing for up to 10K requests, Google at 50% cost via `batchGenerateContent`. Other providers fall back to concurrent execution automatically.
 
 ### Submit and wait
 
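The provider auto-detection described in the new Batch Processing paragraph can be sketched as follows. This is an illustrative reconstruction, not anymodel's internals: the `NATIVE_BATCH_PROVIDERS` set follows the providers the README names, and `batchStrategy` is a hypothetical function name. Models use the `provider/model` prefix format shown throughout the README.

```typescript
// Sketch of the batch routing decision: models are addressed as
// "provider/model", and only some providers expose native batch APIs.
// Set contents follow the README; the function name is illustrative.
const NATIVE_BATCH_PROVIDERS = new Set(["openai", "anthropic", "google"]);

function batchStrategy(model: string): "native" | "concurrent" {
  const provider = model.split("/")[0];
  // Native batch API when available, otherwise concurrent fallback.
  return NATIVE_BATCH_PROVIDERS.has(provider) ? "native" : "concurrent";
}
```

For example, `batchStrategy("openai/gpt-4o")` would pick the native path, while `batchStrategy("groq/llama-3.3-70b-versatile")` would fall back to concurrent execution.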
@@ -168,7 +181,7 @@ for (const result of results.results) {
 
 ### Submit now, check later
 
-Submit a batch and get back an ID immediately — no need to keep the process running for native batches (OpenAI, Anthropic):
+Submit a batch and get back an ID immediately — no need to keep the process running for native batches (OpenAI, Anthropic, Google):
 
 ```typescript
 // Submit and get the batch ID
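The check-later pattern relies on keeping the batch ID somewhere durable between processes. A minimal sketch of that persistence step, under assumptions: the file layout and function names below are illustrative, not anymodel's on-disk format, and the commented retrieval call is a hypothetical method name.

```typescript
import { mkdtempSync, writeFileSync, readFileSync } from "node:fs";
import { join } from "node:path";
import { tmpdir } from "node:os";

// Persist a batch ID so a later process can resume polling.
// File name and layout are illustrative only.
function saveBatchId(dir: string, id: string): string {
  const file = join(dir, "batch-id.txt");
  writeFileSync(file, id, "utf8");
  return file;
}

function loadBatchId(file: string): string {
  return readFileSync(file, "utf8");
}

const dir = mkdtempSync(join(tmpdir(), "anymodel-"));
const file = saveBatchId(dir, "batch_abc123");
// Later, possibly in another process: reload the ID and check the batch.
// const batch = await client.batches.retrieve(loadBatchId(file)); // hypothetical call
```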
@@ -231,6 +244,10 @@ const results = await client.batches.createAndPoll(request, {
 
 Batches are persisted to `./.anymodel/batches/` in the current working directory and survive process restarts.
 
+### Automatic max_tokens
+
+When `max_tokens` isn't set on a batch request, anymodel automatically calculates a safe value per-request based on the estimated input size and the model's context window. This prevents truncated responses and context overflow errors without requiring you to hand-tune each request in a large batch. The estimation uses a ~4 chars/token heuristic with a 5% safety margin — conservative enough to avoid overflows, lightweight enough to skip tokenizer dependencies.
+
 ## Models Endpoint
 
 ```typescript
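The heuristic the new "Automatic max_tokens" section describes (~4 chars/token, 5% safety margin) could look roughly like this. The function name and exact rounding are assumptions; only the two constants come from the README.

```typescript
// Estimate a safe max_tokens from input size and context window,
// using the ~4 chars/token heuristic with 5% headroom described above.
function estimateMaxTokens(input: string, contextWindow: number): number {
  const CHARS_PER_TOKEN = 4;   // rough heuristic, avoids a tokenizer dependency
  const SAFETY_MARGIN = 0.05;  // keep 5% of the window in reserve
  const inputTokens = Math.ceil(input.length / CHARS_PER_TOKEN);
  const usable = Math.floor(contextWindow * (1 - SAFETY_MARGIN));
  // Whatever the input doesn't consume is left for the completion.
  return Math.max(usable - inputTokens, 1);
}
```

For a 4,000-character prompt against an 8,192-token window this leaves 6,782 tokens for the completion, comfortably under the limit.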
@@ -264,6 +281,7 @@ const client = new AnyModel({
     temperature: 0.7,
     max_tokens: 4096,
     retries: 2,
+    timeout: 120, // HTTP timeout in seconds (default: 120 = 2 min, flex: 600 = 10 min)
   },
 });
 
@@ -425,6 +443,7 @@ npx tsx examples/basic.ts batch
 - **Retries**: Automatic retry with exponential backoff on 429/502/503 errors (configurable via `defaults.retries`)
 - **Rate limit tracking**: Per-provider rate limit state, automatically skips rate-limited providers during fallback routing
 - **Parameter stripping**: Unsupported parameters are automatically removed before forwarding to providers
+- **Smart batch defaults**: Automatic `max_tokens` estimation per-request in batches — calculates safe values from input size and model context limits, preventing truncation and overflow without manual tuning
 - **High-volume IO**: All batch file operations use concurrency-limited async queues with atomic durable writes (temp file + fsync + rename) to prevent corruption on crash. Defaults: 20 concurrent reads, 10 concurrent writes — configurable via `io.readConcurrency` and `io.writeConcurrency`
 
 ## Roadmap
@@ -432,10 +451,15 @@ npx tsx examples/basic.ts batch
 - [ ] **A/B testing** — split routing (% traffic to each model) and compare mode (same request to multiple models, return all responses with stats)
 - [ ] **Cost tracking** — per-request and aggregate cost calculation from provider pricing
 - [ ] **Caching** — response caching with configurable TTL for identical requests
-- [x] **Native batch APIs** — OpenAI Batch API (JSONL upload, 50% cost)
+- [x] **Native batch APIs** — OpenAI Batch API (JSONL upload, 50% cost), Anthropic Message Batches (10K requests, async), and Google Gemini Batch (50% cost). Auto-detects provider and routes to native API, falls back to concurrent for other providers
 - [ ] **Result export** — `saveResults()` to write batch results to a configurable output directory
 - [ ] **Prompt logging** — optional request/response logging for debugging and evaluation
 
+## Also Available
+
+- **Python**: [`anymodel-py`](https://github.com/probeo-io/anymodel-py) on PyPI
+- **Go**: [`anymodel-go`](https://github.com/probeo-io/anymodel-go)
+
 ## License
 
 MIT