@probeo/anymodel 0.3.1 → 0.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -81,9 +81,22 @@ mistral/mistral-large-latest
  groq/llama-3.3-70b-versatile
  deepseek/deepseek-chat
  xai/grok-3
+ perplexity/sonar-pro
  ollama/llama3.3
  ```

+ ### Flex Pricing (OpenAI)
+
+ Get 50% off OpenAI requests with flexible latency:
+
+ ```typescript
+ const response = await client.chat.completions.create({
+   model: "openai/gpt-4o",
+   messages: [{ role: "user", content: "Hello!" }],
+   service_tier: "flex",
+ });
+ ```
+
  ## Fallback Routing

  Try multiple models in order. If one fails, the next is attempted:
@@ -147,7 +160,7 @@ const response = await client.chat.completions.create({

  ## Batch Processing

- Process many requests with native provider batch APIs or concurrent fallback. OpenAI and Anthropic batches are processed server-side — OpenAI at 50% cost, Anthropic with async processing for up to 10K requests. Other providers fall back to concurrent execution automatically.
+ Process many requests with native provider batch APIs or concurrent fallback. OpenAI, Anthropic, and Google batches are processed server-side — OpenAI at 50% cost, Anthropic with async processing for up to 10K requests, Google at 50% cost via `batchGenerateContent`. Other providers fall back to concurrent execution automatically.

  ### Submit and wait

@@ -168,7 +181,7 @@ for (const result of results.results) {

  ### Submit now, check later

- Submit a batch and get back an ID immediately — no need to keep the process running for native batches (OpenAI, Anthropic):
+ Submit a batch and get back an ID immediately — no need to keep the process running for native batches (OpenAI, Anthropic, Google):

  ```typescript
  // Submit and get the batch ID
@@ -231,6 +244,10 @@ const results = await client.batches.createAndPoll(request, {

  Batches are persisted to `./.anymodel/batches/` in the current working directory and survive process restarts.

+ ### Automatic max_tokens
+
+ When `max_tokens` isn't set on a batch request, anymodel automatically calculates a safe value per-request based on the estimated input size and the model's context window. This prevents truncated responses and context overflow errors without requiring you to hand-tune each request in a large batch. The estimation uses a ~4 chars/token heuristic with a 5% safety margin — conservative enough to avoid overflows, lightweight enough to skip tokenizer dependencies.
+
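As a rough sketch of the heuristic described in the added section above (illustrative only: the `estimateMaxTokens` helper, and applying the 5% margin to the remaining budget rather than to the input estimate, are assumptions, not anymodel's actual code):

```typescript
// Sketch of a ~4 chars/token estimate with a 5% safety margin.
// Hypothetical helper, not anymodel's real implementation.
const CHARS_PER_TOKEN = 4;
const SAFETY_MARGIN = 0.05;

function estimateMaxTokens(inputChars: number, contextWindow: number): number {
  // Rough input-token count: about 4 characters per token.
  const estimatedInputTokens = Math.ceil(inputChars / CHARS_PER_TOKEN);
  // Whatever context remains is the output budget, trimmed by 5%.
  const remaining = contextWindow - estimatedInputTokens;
  return Math.max(0, Math.floor(remaining * (1 - SAFETY_MARGIN)));
}
```

Under this scheme, a 4,000-character prompt against a 128K-token context window would leave roughly 120K tokens for the response.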
  ## Models Endpoint

  ```typescript
@@ -264,6 +281,7 @@ const client = new AnyModel({
    temperature: 0.7,
    max_tokens: 4096,
    retries: 2,
+   timeout: 120, // HTTP timeout in seconds (default: 120 = 2 min, flex: 600 = 10 min)
  },
  });
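The `retries` value in the config above drives the retry behavior; a minimal sketch of an exponential-backoff loop over 429/502/503 errors (illustrative: the `withRetries` helper and an error object carrying an HTTP `status` field are assumptions, not anymodel's API):

```typescript
// Retryable HTTP status codes, per the retry behavior described in the README.
const RETRYABLE = new Set([429, 502, 503]);

// Hypothetical helper: retry `attempt` up to `retries` extra times,
// doubling the delay each round (500ms, 1s, 2s, ...).
async function withRetries<T>(
  attempt: () => Promise<T>,
  retries = 2,
  baseDelayMs = 500,
): Promise<T> {
  for (let i = 0; ; i++) {
    try {
      return await attempt();
    } catch (err: any) {
      if (i >= retries || !RETRYABLE.has(err?.status)) throw err;
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** i));
    }
  }
}
```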
 
@@ -425,6 +443,7 @@ npx tsx examples/basic.ts batch
  - **Retries**: Automatic retry with exponential backoff on 429/502/503 errors (configurable via `defaults.retries`)
  - **Rate limit tracking**: Per-provider rate limit state, automatically skips rate-limited providers during fallback routing
  - **Parameter stripping**: Unsupported parameters are automatically removed before forwarding to providers
+ - **Smart batch defaults**: Automatic `max_tokens` estimation per-request in batches — calculates safe values from input size and model context limits, preventing truncation and overflow without manual tuning
  - **High-volume IO**: All batch file operations use concurrency-limited async queues with atomic durable writes (temp file + fsync + rename) to prevent corruption on crash. Defaults: 20 concurrent reads, 10 concurrent writes — configurable via `io.readConcurrency` and `io.writeConcurrency`

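The atomic write recipe named in the High-volume IO bullet (temp file + fsync + rename) can be sketched with Node's `fs.promises` (illustrative: `atomicWrite` is a hypothetical helper, not anymodel's exported API):

```typescript
import { promises as fs } from "node:fs";
import { dirname, join } from "node:path";

// Temp file + fsync + rename: the write either fully lands or never
// happens; a crash mid-write leaves the original file untouched.
async function atomicWrite(path: string, data: string): Promise<void> {
  const tmp = join(dirname(path), `.tmp-${process.pid}-${Date.now()}`);
  const handle = await fs.open(tmp, "w");
  try {
    await handle.writeFile(data, "utf8");
    await handle.sync(); // fsync: force the bytes to disk before renaming
  } finally {
    await handle.close();
  }
  // rename() is atomic on POSIX filesystems: readers see the old
  // file or the new one, never a partially written file.
  await fs.rename(tmp, path);
}
```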
  ## Roadmap
@@ -432,10 +451,15 @@ npx tsx examples/basic.ts batch
  - [ ] **A/B testing** — split routing (% traffic to each model) and compare mode (same request to multiple models, return all responses with stats)
  - [ ] **Cost tracking** — per-request and aggregate cost calculation from provider pricing
  - [ ] **Caching** — response caching with configurable TTL for identical requests
- - [x] **Native batch APIs** — OpenAI Batch API (JSONL upload, 50% cost) and Anthropic Message Batches (10K requests, async). Auto-detects provider and routes to native API, falls back to concurrent for other providers
+ - [x] **Native batch APIs** — OpenAI Batch API (JSONL upload, 50% cost), Anthropic Message Batches (10K requests, async), and Google Gemini Batch (50% cost). Auto-detects provider and routes to native API, falls back to concurrent for other providers
  - [ ] **Result export** — `saveResults()` to write batch results to a configurable output directory
  - [ ] **Prompt logging** — optional request/response logging for debugging and evaluation

+ ## Also Available
+
+ - **Python**: [`anymodel-py`](https://github.com/probeo-io/anymodel-py) on PyPI
+ - **Go**: [`anymodel-go`](https://github.com/probeo-io/anymodel-go)
+
  ## License

  MIT