modelmix 4.4.12 → 4.4.16

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -135,9 +135,10 @@ Here's a comprehensive list of available methods:
 
  | Method | Provider | Model | Price (I/O) per 1 M tokens |
  | ------------------ | ---------- | ------------------------------ | -------------------------- |
+ | `gpt54()` | OpenAI | gpt-5.4 | [\$2.50 / \$15.00][1] |
  | `gpt52()` | OpenAI | gpt-5.2 | [\$1.75 / \$14.00][1] |
  | `gpt51()` | OpenAI | gpt-5.1 | [\$1.25 / \$10.00][1] |
- | `gpt5()` | OpenAI | gpt-5 | [\$1.25 / \$10.00][1] |
+ | `gpt53codex()` | OpenAI | gpt-5.3-codex | [\$1.25 / \$14.00][1] |
  | `gpt5mini()` | OpenAI | gpt-5-mini | [\$0.25 / \$2.00][1] |
  | `gpt5nano()` | OpenAI | gpt-5-nano | [\$0.05 / \$0.40][1] |
  | `gpt41()` | OpenAI | gpt-4.1 | [\$2.00 / \$8.00][1] |
@@ -148,8 +149,8 @@ Here's a comprehensive list of available methods:
  | `opus45[think]()` | Anthropic | claude-opus-4-5-20251101 | [\$5.00 / \$25.00][2] |
  | `sonnet46[think]()`| Anthropic | claude-sonnet-4-6 | [\$3.00 / \$15.00][2] |
  | `sonnet45[think]()`| Anthropic | claude-sonnet-4-5-20250929 | [\$3.00 / \$15.00][2] |
- | `haiku35()` | Anthropic | claude-3-5-haiku-20241022 | [\$0.80 / \$4.00][2] |
  | `haiku45[think]()` | Anthropic | claude-haiku-4-5-20251001 | [\$1.00 / \$5.00][2] |
+ | `gemini31pro()` | Google | gemini-3.1-pro-preview | [\$2.00 / \$12.00][3] |
  | `gemini3pro()` | Google | gemini-3-pro-preview | [\$2.00 / \$12.00][3] |
  | `gemini3flash()` | Google | gemini-3-flash-preview | [\$0.50 / \$3.00][3] |
  | `gemini25pro()` | Google | gemini-2.5-pro | [\$1.25 / \$10.00][3] |
@@ -161,8 +162,6 @@ Here's a comprehensive list of available methods:
  | `minimaxM25()` | MiniMax | MiniMax-M2.5 | [\$0.30 / \$1.20][9] |
  | `sonar()` | Perplexity | sonar | [\$1.00 / \$1.00][4] |
  | `sonarPro()` | Perplexity | sonar-pro | [\$3.00 / \$15.00][4] |
- | `scout()` | Groq | Llama-4-Scout-17B-16E-Instruct | [\$0.11 / \$0.34][5] |
- | `maverick()` | Groq | Maverick-17B-128E-Instruct-FP8 | [\$0.20 / \$0.60][5] |
  | `hermes3()` | Lambda | Hermes-3-Llama-3.1-405B-FP8 | [\$0.80 / \$0.80][8] |
  | `qwen3()` | Together | Qwen3-235B-A22B-fp8-tput | [\$0.20 / \$0.60][7] |
  | `kimiK2()` | Together | Kimi-K2-Instruct | [\$1.00 / \$3.00][7] |
@@ -345,11 +344,11 @@ Descriptions support **descriptor objects** with `description`, `required`, `enu
 
  ```javascript
  const result = await model.json(
- { name: 'martin', age: 22, sex: 'm' },
+ { name: 'Martin', age: 22, sex: 'male' },
  {
  name: { description: 'Name of the actor', required: false },
- age: 'Age of the actor', // string still works
- sex: { description: 'Gender', enum: ['m', 'f', null], default: 'm' }
+ age: 'Age of the actor', // string still works
+ sex: { description: 'Gender', enum: ['male', 'female', null], default: null }
  }
  );
  ```
@@ -406,7 +405,9 @@ Every response from `raw()` now includes a `tokens` object with the following st
  tokens: {
  input: 150, // Number of tokens in the prompt/input
  output: 75, // Number of tokens in the completion/output
- total: 225 // Total tokens used (input + output)
+ total: 225, // Total tokens used (input + output)
+ cost: 0.0012, // Estimated cost in USD (null if model not in pricing table)
+ speed: 42 // Output tokens per second (int)
  }
  }
  ```
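As a worked example of the `cost` field documented above, the following sketch derives a cost figure from per-1M-token prices. It is illustrative only and not part of the package; `PRICE` and `estimateCost` are hypothetical names, with the gpt-5.2 rates ($1.75 in / $14.00 out) taken from the pricing table earlier in this README diff.

```javascript
// Hypothetical helper: cost = input * inputPrice/1e6 + output * outputPrice/1e6.
// PRICE uses the gpt-5.2 rates from the table above (USD per 1M tokens).
const PRICE = { input: 1.75, output: 14.00 };

function estimateCost(tokens) {
  // Scale token counts down to millions before applying the per-1M price.
  return (tokens.input * PRICE.input + tokens.output * PRICE.output) / 1e6;
}

console.log(estimateCost({ input: 150, output: 75 })); // 0.0013125 (USD)
```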
@@ -418,10 +419,10 @@ After calling `message()` or `json()`, use `lastRaw` to access the complete resp
  ```javascript
  const text = await model.message();
  console.log(model.lastRaw.tokens);
- // { input: 122, output: 86, total: 541, cost: 0.000319 }
+ // { input: 122, output: 86, total: 541, cost: 0.000319, speed: 38 }
  ```
 
- The `cost` field is the estimated cost in USD based on the model's pricing per 1M tokens (input/output). If the model is not found in the pricing table, `cost` will be `null`.
+ The `cost` field is the estimated cost in USD based on the model's pricing per 1M tokens (input/output). If the model is not found in the pricing table, `cost` will be `null`. The `speed` field is the generation speed measured in output tokens per second (integer).
 
  ## 🐛 Enabling Debug Mode
 
@@ -515,7 +516,7 @@ new ModelMix(args = { options: {}, config: {} })
  - `message`: The text response from the model
  - `think`: Reasoning/thinking content (if available)
  - `toolCalls`: Array of tool calls made by the model (if any)
- - `tokens`: Object with `input`, `output`, and `total` token counts
+ - `tokens`: Object with `input`, `output`, `total` token counts, `cost` (USD), and `speed` (output tokens/sec)
  - `response`: The raw API response
  - `stream(callback)`: Sends the message and streams the response, invoking the callback with each streamed part.
  - `json(schemaExample, descriptions = {}, options = {})`: Forces the model to return a response in a specific JSON format.
package/demo/gemini.js CHANGED
@@ -1,5 +1,5 @@
  import { ModelMix, MixGoogle } from '../index.js';
- try { process.loadEnvFile(); } catch {}
+ try { process.loadEnvFile(); } catch { }
 
  const mmix = new ModelMix({
  options: {
@@ -12,9 +12,9 @@ const mmix = new ModelMix({
  }
  });
 
- // Using gemini25flash (Gemini 2.5 Flash) with built-in method
+ // Using gemini3flash (Gemini 3 Flash) with built-in method
  console.log("\n" + '--------| gemini25flash() |--------');
- const flash = await mmix.gemini25flash()
+ const flash = await mmix.gemini3flash()
  .addText('Hi there! Do you like cats?')
  .message();
 
@@ -22,20 +22,23 @@ console.log(flash);
 
  // Using gemini3pro (Gemini 3 Pro) with custom config
  console.log("\n" + '--------| gemini3pro() with JSON response |--------');
- const pro = mmix.new().gemini3pro();
+ const pro = mmix.new().gemini31pro();
 
  pro.addText('Give me a fun fact about cats');
- const jsonResponse = await pro.json({
+
+ const jsonExampleAndSchema = {
  fact: 'A fun fact about cats',
- category: 'animal behavior'
- });
+ category: 'animal behavior'
+ };
+
+ const jsonResponse = await pro.json(jsonExampleAndSchema, jsonExampleAndSchema);
 
  console.log(jsonResponse);
 
  // Using attach method with MixGoogle for custom model
  console.log("\n" + '--------| Custom Gemini with attach() |--------');
- mmix.attach('gemini-2.5-flash', new MixGoogle());
+ const customModel = mmix.new().attach('gemini-2.5-flash', new MixGoogle());
 
- const custom = await mmix.addText('Tell me a short joke about cats.').message();
+ const custom = await customModel.addText('Tell me a short joke about cats.').message();
  console.log(custom);
 
@@ -0,0 +1,22 @@
+ import { ModelMix } from '../index.js';
+ try { process.loadEnvFile(); } catch {}
+
+ const mmix = new ModelMix({
+ config: {
+ debug: 3
+ }
+ });
+
+ console.log('\n--------| gptRealtime() |--------');
+
+ const realtime = mmix.gptRealtimeMini({
+ options: {
+ stream: true
+ }
+ });
+
+ realtime.addText('Explain quantum entanglement in simple terms.');
+ const response = await realtime.stream(({ delta }) => {
+ process.stdout.write(delta || '');
+ });
+ console.log('\n\n[done]\n', response.tokens);
@@ -8,10 +8,10 @@ const mmix = new ModelMix({
  }
  });
 
- console.log("\n" + '--------| gpt51() |--------');
+ console.log("\n" + '--------| gpt54() |--------');
 
  const gptArgs = { options: { reasoning_effort: "none", verbosity: "low" } };
- const gpt = mmix.gpt51(gptArgs);
+ const gpt = mmix.gpt54(gptArgs);
 
  gpt.addText("Explain quantum entanglement in simple terms.");
  const response = await gpt.message();
@@ -11,7 +11,8 @@
  "dependencies": {
  "dotenv": "^17.2.3",
  "isolated-vm": "^6.0.2",
- "lemonlog": "^1.1.4"
+ "lemonlog": "^1.1.4",
+ "pathmix": "^1.0.0"
  }
  },
  ".api/apis/pplx": {
@@ -290,6 +291,15 @@
  "wrappy": "1"
  }
  },
+ "node_modules/pathmix": {
+ "version": "1.0.0",
+ "resolved": "https://registry.npmjs.org/pathmix/-/pathmix-1.0.0.tgz",
+ "integrity": "sha512-oLbvoOKuyV6TjkKLEYqH5O+q+d+qZwtRNzMrBI93IsCYN0liDw8W8aZq3BPvIaF4jJU+igeO/1p6lCwFfy8E5Q==",
+ "license": "ISC",
+ "engines": {
+ "node": ">=16.0.0"
+ }
+ },
  "node_modules/prebuild-install": {
  "version": "7.1.3",
  "resolved": "https://registry.npmjs.org/prebuild-install/-/prebuild-install-7.1.3.tgz",
package/demo/package.json CHANGED
@@ -15,6 +15,7 @@
  "dependencies": {
  "dotenv": "^17.2.3",
  "isolated-vm": "^6.0.2",
- "lemonlog": "^1.1.4"
+ "lemonlog": "^1.1.4",
+ "pathmix": "^1.0.0"
  }
  }
package/index.js CHANGED
@@ -5,6 +5,7 @@ const { inspect } = require('util');
  const log = require('lemonlog')('ModelMix');
  const Bottleneck = require('bottleneck');
  const path = require('path');
+ const WebSocket = require('ws');
  const generateJsonSchema = require('./schema');
  const { Client } = require("@modelcontextprotocol/sdk/client/index.js");
  const { StdioClientTransport } = require("@modelcontextprotocol/sdk/client/stdio.js");
@@ -14,6 +15,11 @@ const { MCPToolsManager } = require('./mcp-tools');
  // Based on provider pricing pages linked in README
  const MODEL_PRICING = {
  // OpenAI
+ 'gpt-realtime-mini': [0.60, 2.40],
+ 'gpt-realtime': [4.00, 16.00],
+ 'gpt-5.4': [2.50, 15.00],
+ 'gpt-5.4-pro': [30, 180.00],
+ 'gpt-5.3-codex': [1.75, 14.00],
  'gpt-5.2': [1.75, 14.00],
  'gpt-5.2-chat-latest': [1.75, 14.00],
  'gpt-5.1': [1.25, 10.00],
@@ -37,6 +43,7 @@ const MODEL_PRICING = {
  'claude-3-5-haiku-20241022': [0.80, 4.00],
  'claude-haiku-4-5-20251001': [1.00, 5.00],
  // Google
+ 'gemini-3.1-pro-preview': [2.00, 12.00],
  'gemini-3-pro-preview': [2.00, 12.00],
  'gemini-3-flash-preview': [0.50, 3.00],
  'gemini-2.5-pro': [1.25, 10.00],
@@ -267,6 +274,21 @@ class ModelMix {
  gpt52({ options = {}, config = {} } = {}) {
  return this.attach('gpt-5.2', new MixOpenAI({ options, config }));
  }
+ gpt54({ options = {}, config = {} } = {}) {
+ return this.attach('gpt-5.4', new MixOpenAIResponses({ options, config }));
+ }
+ gpt54pro({ options = {}, config = {} } = {}) {
+ return this.attach('gpt-5.4-pro', new MixOpenAIResponses({ options, config }));
+ }
+ gptRealtime({ options = {}, config = {} } = {}) {
+ return this.attach('gpt-realtime', new MixOpenAIWebSocket({ options, config }));
+ }
+ gptRealtimeMini({ options = {}, config = {} } = {}) {
+ return this.attach('gpt-realtime-mini', new MixOpenAIWebSocket({ options, config }));
+ }
+ gpt53codex({ options = {}, config = {} } = {}) {
+ return this.attach('gpt-5.3-codex', new MixOpenAIResponses({ options, config }));
+ }
  gpt52chat({ options = {}, config = {} } = {}) {
  return this.attach('gpt-5.2-chat-latest', new MixOpenAI({ options, config }));
  }
@@ -341,6 +363,9 @@ class ModelMix {
  gemini25flash({ options = {}, config = {} } = {}) {
  return this.attach('gemini-2.5-flash', new MixGoogle({ options, config }));
  }
+ gemini31pro({ options = {}, config = {} } = {}) {
+ return this.attach('gemini-3.1-pro-preview', new MixGoogle({ options, config }));
+ }
  gemini3pro({ options = {}, config = {} } = {}) {
  return this.attach('gemini-3-pro-preview', new MixGoogle({ options, config }));
  }
@@ -889,11 +914,14 @@ class ModelMix {
  providerInstance.streamCallback = this.streamCallback;
  }
 
+ const startTime = Date.now();
  const result = await providerInstance.create({ options: currentOptions, config: currentConfig });
+ const elapsedMs = Date.now() - startTime;
 
- // Calculate cost based on model pricing
  if (result.tokens) {
  result.tokens.cost = ModelMix.calculateCost(currentModelKey, result.tokens);
+ const elapsedSec = elapsedMs / 1000;
+ result.tokens.speed = elapsedSec > 0 ? Math.round(result.tokens.output / elapsedSec) : 0;
  }
 
  if (result.toolCalls && result.toolCalls.length > 0) {
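The speed computation this hunk introduces can be sketched in isolation. This is an illustrative standalone function, not a ModelMix API; `tokensPerSecond` is a hypothetical name.

```javascript
// Output tokens divided by elapsed wall-clock seconds, rounded to an
// integer, with a guard for a zero-length interval (as in the hunk above).
function tokensPerSecond(outputTokens, elapsedMs) {
  const elapsedSec = elapsedMs / 1000;
  return elapsedSec > 0 ? Math.round(outputTokens / elapsedSec) : 0;
}

console.log(tokensPerSecond(86, 2200)); // 39
console.log(tokensPerSecond(10, 0));    // 0
```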
@@ -935,7 +963,7 @@ class ModelMix {
  // debug level 2: Readable summary of output
  if (currentConfig.debug >= 2) {
  const tokenInfo = result.tokens
- ? ` ${result.tokens.input} → ${result.tokens.output} tok` + (result.tokens.cost != null ? ` $${result.tokens.cost.toFixed(4)}` : '')
+ ? ` ${result.tokens.input} → ${result.tokens.output} tok` + (result.tokens.speed ? ` ${result.tokens.speed} t/s` : '') + (result.tokens.cost != null ? ` $${result.tokens.cost.toFixed(4)}` : '')
  : '';
  console.log(`✓${tokenInfo}\n${ModelMix.formatOutputSummary(result, currentConfig.debug).trim()}`);
  }
@@ -1492,6 +1520,339 @@ class MixOpenAI {
  }
  }
 
+ class MixOpenAIResponses extends MixOpenAI {
+ async create({ config = {}, options = {} } = {}) {
+
+ // Keep GPT/o-model option normalization behavior
+ if (options.model?.startsWith('o')) {
+ delete options.max_tokens;
+ delete options.temperature;
+ }
+ if (options.model?.includes('gpt-5')) {
+ if (options.max_tokens) {
+ options.max_completion_tokens = options.max_tokens;
+ delete options.max_tokens;
+ }
+ delete options.temperature;
+ }
+
+ const responsesUrl = this.config.url.replace('/chat/completions', '/responses');
+ const request = MixOpenAIResponses.buildResponsesRequest(options);
+ const response = await axios.post(responsesUrl, request, {
+ headers: this.headers
+ });
+
+ return MixOpenAIResponses.processResponsesResponse(response);
+ }
+
+ static buildResponsesRequest(options = {}) {
+ const request = {
+ model: options.model,
+ input: MixOpenAIResponses.messagesToResponsesInput(options.messages),
+ stream: false
+ };
+
+ if (options.reasoning_effort) request.reasoning = { effort: options.reasoning_effort };
+ if (options.verbosity) request.text = { verbosity: options.verbosity };
+
+ if (typeof options.max_completion_tokens === 'number') {
+ request.max_output_tokens = options.max_completion_tokens;
+ } else if (typeof options.max_tokens === 'number') {
+ request.max_output_tokens = options.max_tokens;
+ }
+
+ if (typeof options.temperature === 'number') request.temperature = options.temperature;
+ if (typeof options.top_p === 'number') request.top_p = options.top_p;
+ if (typeof options.presence_penalty === 'number') request.presence_penalty = options.presence_penalty;
+ if (typeof options.frequency_penalty === 'number') request.frequency_penalty = options.frequency_penalty;
+ if (options.stop !== undefined) request.stop = options.stop;
+ if (typeof options.n === 'number') request.n = options.n;
+ if (options.logit_bias !== undefined) request.logit_bias = options.logit_bias;
+ if (options.user !== undefined) request.user = options.user;
+
+ return request;
+ }
+
+ static processResponsesResponse(response) {
+ const message = MixOpenAIResponses.extractResponsesMessage(response.data);
+ return {
+ message,
+ think: null,
+ toolCalls: [],
+ tokens: MixOpenAIResponses.extractResponsesTokens(response.data),
+ response: response.data
+ };
+ }
+
+ static extractResponsesTokens(data) {
+ if (data.usage) {
+ return {
+ input: data.usage.input_tokens || 0,
+ output: data.usage.output_tokens || 0,
+ total: data.usage.total_tokens || ((data.usage.input_tokens || 0) + (data.usage.output_tokens || 0))
+ };
+ }
+ return {
+ input: 0,
+ output: 0,
+ total: 0
+ };
+ }
+
+ static extractResponsesMessage(data) {
+ if (!Array.isArray(data.output)) return '';
+ return data.output
+ .filter(item => item.type === 'message')
+ .flatMap(item => Array.isArray(item.content) ? item.content : [])
+ .filter(content => content.type === 'output_text' && typeof content.text === 'string')
+ .map(content => content.text)
+ .join('\n')
+ .trim();
+ }
+
+ static messagesToResponsesInput(messages = []) {
+ const mapped = [];
+
+ for (const message of messages) {
+ if (!message || !message.role) continue;
+ if (message.tool_calls || message.role === 'tool') continue;
+
+ let text = '';
+ if (typeof message.content === 'string') {
+ text = message.content;
+ } else if (Array.isArray(message.content)) {
+ text = message.content
+ .filter(item => item && item.type === 'text' && typeof item.text === 'string')
+ .map(item => item.text)
+ .join('\n');
+ }
+
+ if (!text) continue;
+ mapped.push({
+ role: message.role,
+ content: [{ type: 'input_text', text }]
+ });
+ }
+
+ return mapped;
+ }
+ }
+
+ class MixOpenAIWebSocket extends MixOpenAIResponses {
+ getDefaultConfig(customConfig) {
+ return super.getDefaultConfig({
+ realtimeUrl: 'wss://api.openai.com/v1/realtime',
+ websocketTimeoutMs: 120000,
+ ...customConfig
+ });
+ }
+
+ async create({ config = {}, options = {} } = {}) {
+ if (options.model?.startsWith('o')) {
+ delete options.max_tokens;
+ delete options.temperature;
+ }
+ if (options.model?.includes('gpt-5')) {
+ if (options.max_tokens) {
+ options.max_completion_tokens = options.max_tokens;
+ delete options.max_tokens;
+ }
+ delete options.temperature;
+ }
+
+ const mergedConfig = { ...this.config, ...config };
+ const realtimeUrl = `${mergedConfig.realtimeUrl}?model=${encodeURIComponent(options.model)}`;
+ const timeoutMs = mergedConfig.websocketTimeoutMs || 120000;
+
+ return await new Promise((resolve, reject) => {
+ const ws = new WebSocket(realtimeUrl, {
+ headers: {
+ authorization: `Bearer ${mergedConfig.apiKey}`
+ }
+ });
+
+ const events = [];
+ let message = '';
+ let settled = false;
+ let finalResponse = null;
+
+ const timeout = setTimeout(() => {
+ if (settled) return;
+ settled = true;
+ ws.close();
+ reject({
+ message: `Realtime WebSocket timed out after ${timeoutMs}ms`,
+ statusCode: null,
+ details: null,
+ config: mergedConfig,
+ options
+ });
+ }, timeoutMs);
+
+ const cleanUp = () => clearTimeout(timeout);
+
+ ws.on('open', () => {
+ const session = {
+ type: 'realtime',
+ output_modalities: ['text']
+ };
+
+ if (mergedConfig.system) session.instructions = mergedConfig.system;
+ if (Array.isArray(options.tools) && options.tools.length > 0) {
+ session.tools = options.tools;
+ }
+
+ ws.send(JSON.stringify({ type: 'session.update', session }));
+
+ const items = MixOpenAIWebSocket.messagesToConversationItems(options.messages);
+ for (const item of items) {
+ ws.send(JSON.stringify({
+ type: 'conversation.item.create',
+ item
+ }));
+ }
+
+ const responseConfig = { output_modalities: ['text'] };
+ if (typeof options.max_completion_tokens === 'number') {
+ responseConfig.max_output_tokens = Math.min(options.max_completion_tokens, 4096);
+ } else if (typeof options.max_tokens === 'number') {
+ responseConfig.max_output_tokens = Math.min(options.max_tokens, 4096);
+ }
+ if (Array.isArray(options.tools) && options.tools.length > 0) responseConfig.tools = options.tools;
+
+ ws.send(JSON.stringify({
+ type: 'response.create',
+ response: responseConfig
+ }));
+ });
+
+ ws.on('message', raw => {
+ let event;
+ try {
+ event = JSON.parse(raw.toString());
+ } catch {
+ return;
+ }
+
+ events.push(event);
+
+ const isTextDeltaEvent = event.type === 'response.text.delta' || event.type === 'response.output_text.delta';
+ if (isTextDeltaEvent) {
+ const delta = MixOpenAIWebSocket.extractDelta(event);
+ if (delta) {
+ message += delta;
+ if (this.streamCallback) {
+ this.streamCallback({ response: event, message, delta });
+ }
+ }
+ return;
+ }
+
+ if (event.type === 'response.done') {
+ finalResponse = event.response || null;
+ if (!message && finalResponse) {
+ message = MixOpenAIResponses.extractResponsesMessage(finalResponse);
+ }
+
+ if (!settled) {
+ settled = true;
+ cleanUp();
+ ws.close();
+ resolve({
+ message: message.trim(),
+ think: null,
+ toolCalls: [],
+ tokens: MixOpenAIResponses.extractResponsesTokens(finalResponse || {}),
+ response: {
+ response: finalResponse,
+ events
+ }
+ });
+ }
+ return;
+ }
+
+ if (event.type === 'error' && !settled) {
+ settled = true;
+ cleanUp();
+ ws.close();
+ reject({
+ message: event.error?.message || 'Realtime WebSocket error',
+ statusCode: null,
+ details: event.error || event,
+ config: mergedConfig,
+ options
+ });
+ }
+ });
+
+ ws.on('error', error => {
+ if (settled) return;
+ settled = true;
+ cleanUp();
+ reject({
+ message: error.message || 'Realtime WebSocket connection error',
+ statusCode: null,
+ details: null,
+ stack: error.stack,
+ config: mergedConfig,
+ options
+ });
+ });
+
+ ws.on('close', () => {
+ if (settled) return;
+ settled = true;
+ cleanUp();
+ reject({
+ message: 'Realtime WebSocket closed before response.done',
+ statusCode: null,
+ details: null,
+ config: mergedConfig,
+ options
+ });
+ });
+ });
+ }
+
+ static messagesToConversationItems(messages = []) {
+ const items = [];
+
+ for (const message of messages) {
+ if (!message || !message.role) continue;
+ if (message.role === 'tool' || message.tool_calls) continue;
+
+ const role = message.role === 'assistant' ? 'assistant' : (message.role === 'system' ? 'system' : 'user');
+ const content = [];
+
+ if (typeof message.content === 'string') {
+ content.push({
+ type: role === 'assistant' ? 'text' : 'input_text',
+ text: message.content
+ });
+ } else if (Array.isArray(message.content)) {
+ for (const item of message.content) {
+ if (!item || item.type !== 'text' || typeof item.text !== 'string') continue;
+ content.push({
+ type: role === 'assistant' ? 'text' : 'input_text',
+ text: item.text
+ });
+ }
+ }
+
+ if (content.length === 0) continue;
+ items.push({ type: 'message', role, content });
+ }
+
+ return items;
+ }
+
+ static extractDelta(event) {
+ if (typeof event.delta === 'string') return event.delta;
+ return '';
+ }
+ }
+
  class MixOpenRouter extends MixOpenAI {
  getDefaultConfig(customConfig) {
 
@@ -2266,4 +2627,4 @@ class MixGoogle extends MixCustom {
  }
  }
 
- module.exports = { MixCustom, ModelMix, MixAnthropic, MixMiniMax, MixOpenAI, MixOpenRouter, MixPerplexity, MixOllama, MixLMStudio, MixGroq, MixTogether, MixGrok, MixCerebras, MixGoogle, MixFireworks };
+ module.exports = { MixCustom, ModelMix, MixAnthropic, MixMiniMax, MixOpenAI, MixOpenAIResponses, MixOpenAIWebSocket, MixOpenRouter, MixPerplexity, MixOllama, MixLMStudio, MixGroq, MixTogether, MixGrok, MixCerebras, MixGoogle, MixFireworks };
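The `messagesToResponsesInput` mapping added in this file can be exercised in isolation. The standalone copy of its logic below (not the package export itself) shows the input/output shape: tool messages are dropped and array content is flattened to a single `input_text` part.

```javascript
// Standalone copy of the mapping logic added above: chat-style messages
// become Responses-API input items; tool messages and empty text are skipped.
function messagesToResponsesInput(messages = []) {
  const mapped = [];
  for (const message of messages) {
    if (!message || !message.role) continue;
    if (message.tool_calls || message.role === 'tool') continue;

    let text = '';
    if (typeof message.content === 'string') {
      text = message.content;
    } else if (Array.isArray(message.content)) {
      // Flatten { type: 'text', text } parts into one newline-joined string.
      text = message.content
        .filter(item => item && item.type === 'text' && typeof item.text === 'string')
        .map(item => item.text)
        .join('\n');
    }

    if (!text) continue;
    mapped.push({ role: message.role, content: [{ type: 'input_text', text }] });
  }
  return mapped;
}

console.log(messagesToResponsesInput([
  { role: 'user', content: 'Hi' },
  { role: 'tool', content: 'ignored' }
]));
// → [ { role: 'user', content: [ { type: 'input_text', text: 'Hi' } ] } ]
```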
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "modelmix",
- "version": "4.4.12",
+ "version": "4.4.16",
  "description": "🧬 Reliable interface with automatic fallback for AI LLMs.",
  "main": "index.js",
  "repository": {
@@ -16,7 +16,7 @@
  "openai",
  "anthropic",
  "agent",
- "grok4",
+ "realtime",
  "gpt",
  "claude",
  "llama",
@@ -47,16 +47,17 @@
  },
  "homepage": "https://github.com/clasen/ModelMix#readme",
  "dependencies": {
- "@modelcontextprotocol/sdk": "^1.26.0",
+ "@modelcontextprotocol/sdk": "^1.27.1",
  "axios": "^1.13.5",
  "bottleneck": "^2.19.5",
  "file-type": "^16.5.4",
  "form-data": "^4.0.4",
- "lemonlog": "^1.2.0"
+ "lemonlog": "^1.2.0",
+ "ws": "^8.19.0"
  },
  "devDependencies": {
  "chai": "^5.2.1",
- "mocha": "^11.3.0",
+ "mocha": "^11.7.5",
  "nock": "^14.0.9",
  "sinon": "^21.0.0"
  },
@@ -1,41 +1,50 @@
  ---
  name: modelmix
- description: Instructions for using the ModelMix Node.js library to interact with multiple AI LLM providers through a unified interface. Use when integrating AI models (OpenAI, Anthropic, Google, Groq, Perplexity, Grok, etc.), chaining models with fallback, getting structured JSON from LLMs, adding MCP tools, streaming responses, or managing multi-provider AI workflows in Node.js.
+ description: Instructions for using the ModelMix Node.js library to interact with multiple AI LLM providers through a unified interface. Use when writing code that calls AI models (OpenAI, Anthropic, Google, Groq, Perplexity, Grok, MiniMax, Fireworks, Together, Lambda, Cerebras, OpenRouter, Ollama, LM Studio), chaining models with fallback, getting structured JSON from LLMs, adding MCP tools, streaming responses, managing multi-provider AI workflows, round-robin load balancing, or rate limiting API requests in Node.js. Also use when the user mentions "modelmix", "ModelMix", asks to "call an LLM", "query a model", "add AI to my app", or wants to integrate any supported provider.
+ metadata:
+ tags: [llm, ai, openai, anthropic, google, groq, perplexity, grok, mcp, streaming, json-output]
  ---
 
  # ModelMix Library Skill
 
  ## Overview
 
- ModelMix is a Node.js library that provides a unified fluent API to interact with multiple AI LLM providers. It handles automatic fallback between models, round-robin load balancing, structured JSON output, streaming, MCP tool integration, rate limiting, and token tracking.
+ ModelMix is a Node.js library providing a unified fluent API to interact with multiple AI LLM providers. It handles automatic fallback between models, round-robin load balancing, structured JSON output, streaming, MCP tool integration, custom local tools, rate limiting, and token tracking.
 
  Use this skill when:
  - Integrating one or more AI models into a Node.js project
- - Chaining models with automatic fallback
+ - Chaining models with automatic fallback or round-robin
  - Extracting structured JSON from LLMs
  - Adding MCP tools or custom tools to models
+ - Streaming responses from any provider
  - Working with templates and file-based prompts
+ - Tracking token usage and costs
 
- Do NOT use this skill for:
+ Do NOT use for:
  - Python or non-Node.js projects
  - Direct HTTP calls to LLM APIs (use ModelMix instead)
 
- ## Common Tasks
+ ## Quick Reference
 
+ - [Installation](#installation)
+ - [Creating an instance](#creating-an-instance)
+ - [Attaching models](#attaching-models)
  - [Get a text response](#get-a-text-response)
  - [Get structured JSON](#get-structured-json)
  - [Stream a response](#stream-a-response)
- - [Get raw response (tokens, thinking, tool calls)](#get-raw-response-tokens-thinking-tool-calls)
- - [Access full response after `message()` or `json()` with `lastRaw`](#access-full-response-after-message-or-json-with-lastraw)
+ - [Extract a code block](#extract-a-code-block)
+ - [Get raw response (tokens, thinking, tool calls)](#get-raw-response)
+ - [Access full response with lastRaw](#access-full-response-with-lastraw)
  - [Add images](#add-images)
- - [Use templates with placeholders](#use-templates-with-placeholders)
+ - [Templates with placeholders](#templates-with-placeholders)
  - [Round-robin load balancing](#round-robin-load-balancing)
- - [MCP integration (external tools)](#mcp-integration-external-tools)
- - [Custom local tools (addTool)](#custom-local-tools-addtool)
- - [Rate limiting (Bottleneck)](#rate-limiting-bottleneck)
- - [Debug mode](#debug-mode)
- - [Use free-tier models](#use-free-tier-models)
+ - [MCP integration](#mcp-integration)
+ - [Custom local tools](#custom-local-tools)
+ - [Rate limiting](#rate-limiting)
  - [Conversation history](#conversation-history)
+ - [Debug mode](#debug-mode)
+ - [Free-tier models](#free-tier-models)
+ - [Multi-provider routing](#multi-provider-routing)
 
  ## Installation
 
@@ -54,49 +63,77 @@ import { ModelMix } from 'modelmix';
  ### Creating an Instance
 
  ```javascript
- // Static factory (preferred)
  const model = ModelMix.new();
 
- // With global options
  const model = ModelMix.new({
  options: { max_tokens: 4096, temperature: 0.7 },
  config: {
  system: "You are a helpful assistant.",
- max_history: 5,
- debug: 0, // 0=silent, 1=minimal, 2=summary, 3=full (no truncate), 4=verbose
- roundRobin: false // false=fallback, true=rotate models
+ max_history: 5, // -1 = unlimited, 0 = none (default), N = keep last N
+ debug: 0, // 0=silent, 1=minimal, 2=summary, 3=full, 4=verbose
+ roundRobin: false // false=fallback, true=rotate models
  }
  });
  ```
 
- ### Attaching Models (Fluent Chain)
+ ### Attaching Models
 
- Chain shorthand methods to attach providers. First model is primary; others are fallbacks:
+ Chain shorthand methods to attach providers. First model is primary; others are fallbacks (or rotated if `roundRobin: true`):
 
  ```javascript
  const model = ModelMix.new()
  .sonnet46() // primary
- .gpt52() // fallback 1
+ .gpt52() // fallback 1
  .gemini3flash() // fallback 2
  .addText("Hello!")
  ```
 
- If `sonnet45` fails, it automatically tries `gpt5mini`, then `gemini3flash`.
+ If `sonnet46` fails, it automatically tries `gpt52`, then `gemini3flash`.
 
  ## Available Model Shorthands
 
- - **OpenAI**: `gpt52` `gpt51` `gpt5` `gpt5mini` `gpt5nano` `gpt41` `gpt41mini` `gpt41nano`
- - **Anthropic**: `opus46` `opus45` `sonnet46` `sonnet45` `haiku45` `haiku35` (thinking variants: add `think` suffix)
- - **Google**: `gemini3pro` `gemini3flash` `gemini25pro` `gemini25flash`
- - **Grok**: `grok4` `grok41` (thinking variant available)
- - **Perplexity**: `sonar` `sonarPro`
- - **Groq**: `scout` `maverick`
- - **Together**: `qwen3` `kimiK2`
- - **Multi-provider**: `deepseekR1` `gptOss`
- - **MiniMax**: `minimaxM21`
- - **Fireworks**: `deepseekV32` `GLM47`
+ ### OpenAI
+ `gpt52()` `gpt52chat()` `gpt51()` `gpt5()` `gpt5mini()` `gpt5nano()` `gpt45()` `gpt41()` `gpt41mini()` `gpt41nano()` `o3()` `o4mini()`
+
+ ### Anthropic
+ `opus46()` `opus45()` `opus41()` `sonnet46()` `sonnet45()` `sonnet4()` `sonnet37()` `haiku45()` `haiku35()`
+
+ Thinking variants: append `think` — e.g. `opus46think()` `sonnet46think()` `sonnet45think()` `sonnet4think()` `sonnet37think()` `opus45think()` `opus41think()` `haiku45think()`
+
+ ### Google
+ `gemini3pro()` `gemini3flash()` `gemini25pro()` `gemini25flash()`
+
+ ### Grok
+ `grok4()` `grok41()` `grok41think()` `grok3()` `grok3mini()`
+
+ ### Perplexity
+ `sonar()` `sonarPro()`
+
+ ### Groq
+ `scout()` `maverick()`
+
+ ### Together
+ `qwen3()` `kimiK2()` `kimiK2think()` `kimiK25think()` `gptOss()`
+
+ ### MiniMax
+ `minimaxM25()` `minimaxM21()` `minimaxM2()` `minimaxM2Stable()`
+
+ ### Fireworks
+ `deepseekV32()` `GLM5()` `GLM47()`
+
+ ### Cerebras
+ `GLM46()`
+
+ ### OpenRouter
+ `GLM45()`
+
+ ### Multi-provider (auto-fallback across free/paid tiers)
+ `deepseekR1()` `hermes3()` `scout()` `maverick()` `kimiK2()` `GLM47()`
 
- Each method is called as `mix.methodName()` and accepts optional `{ options, config }` to override per-model settings.
+ ### Local
+ `lmstudio()` — for LM Studio local models
+
+ Each method accepts optional `{ options, config }` to override per-model settings.
 
  ## Common Tasks
 
@@ -116,35 +153,30 @@ const result = await ModelMix.new()
116
153
  .gpt5mini()
117
154
  .addText("Name and capital of 3 South American countries.")
118
155
  .json(
119
- { countries: [{ name: "", capital: "" }] }, // schema example
120
- { countries: [{ name: "country name", capital: "in uppercase" }] }, // descriptions
121
- { addNote: true } // options
156
+ { countries: [{ name: "", capital: "" }] },
157
+ { countries: [{ name: "country name", capital: "in uppercase" }] },
158
+ { addNote: true }
122
159
  );
123
- // result.countries → [{ name: "Brazil", capital: "BRASILIA" }, ...]
124
160
  ```
125
161
 
126
162
  `json()` signature: `json(schemaExample, schemaDescription?, { addSchema, addExample, addNote }?)`
127
163
 
128
164
  #### Enhanced descriptors
129
165
 
130
- Descriptions can be **strings** or **descriptor objects** with metadata:
166
+ Descriptions can be strings or descriptor objects with metadata:
131
167
 
132
168
  ```javascript
133
169
  const result = await model.json(
134
170
  { name: 'martin', age: 22, sex: 'Male' },
135
171
  {
136
172
  name: { description: 'Name of the actor', required: false },
137
- age: 'Age of the actor', // string still works
173
+ age: 'Age of the actor',
138
174
  sex: { description: 'Gender', enum: ['Male', 'Female', null] }
139
175
  }
140
176
  );
141
177
  ```
142
178
 
143
- Descriptor properties:
144
- - `description` (string) — field description
145
- - `required` (boolean, default `true`) — if `false`: removed from required array, type becomes nullable
146
- - `enum` (array) — allowed values; if includes `null`, type auto-becomes nullable
147
- - `default` (any) — default value
179
+ Descriptor properties: `description` (string), `required` (boolean, default true — if false, field becomes nullable), `enum` (array — if includes null, type auto-becomes nullable), `default` (any).
148
180
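As a rough mental model of the nullability rules above, here is a small standalone sketch (plain JavaScript, not the library's actual implementation; the helper name `descriptorToSchema` is made up for illustration):

```javascript
// Illustrative only: how a descriptor object could map to a JSON-schema-like
// field, following the rules described above.
function descriptorToSchema(desc, baseType = 'string') {
  // Plain strings are shorthand for a required, non-nullable description.
  if (typeof desc === 'string') {
    return { type: baseType, description: desc, required: true };
  }
  // required: false, or an enum containing null, makes the type nullable.
  const nullable =
    desc.required === false ||
    (Array.isArray(desc.enum) && desc.enum.includes(null));
  const schema = {
    type: nullable ? [baseType, 'null'] : baseType,
    description: desc.description,
    required: desc.required !== false
  };
  if (desc.enum) schema.enum = desc.enum;
  if ('default' in desc) schema.default = desc.default;
  return schema;
}

console.log(descriptorToSchema('Age of the actor', 'number'));
// plain string: stays required and non-nullable
console.log(descriptorToSchema({ description: 'Gender', enum: ['Male', 'Female', null] }));
// enum containing null: type becomes nullable
```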
 
149
181
  #### Array auto-wrap
150
182
 
@@ -166,7 +198,19 @@ await ModelMix.new()
166
198
  });
167
199
  ```
168
200
 
169
- ### Get raw response (tokens, thinking, tool calls)
201
+ ### Extract a code block
202
+
203
+ ```javascript
204
+ const code = await ModelMix.new()
205
+ .gpt5mini()
206
+ .addText("Write a hello world function in JavaScript.")
207
+ .block();
208
+ // Returns only the content inside the first code block
209
+ ```
210
+
211
+ `block()` accepts `{ addSystemExtra }` (default true) — adds system instructions that tell the model to wrap output in a code block.
212
+
213
+ ### Get raw response
170
214
 
171
215
  ```javascript
172
216
  const raw = await ModelMix.new()
@@ -176,15 +220,15 @@ const raw = await ModelMix.new()
176
220
  // raw.message, raw.think, raw.tokens, raw.toolCalls, raw.response
177
221
  ```
178
222
 
179
- ### Access full response after `message()` or `json()` with `lastRaw`
223
+ ### Access full response with lastRaw
180
224
 
181
- After calling `message()`, `json()`, `block()`, or `stream()`, use `lastRaw` to access the complete response (tokens, thinking, tool calls, etc.). It has the same structure as `raw()`.
225
+ After calling `message()`, `json()`, `block()`, or `stream()`, use `lastRaw` to access the complete response:
182
226
 
183
227
  ```javascript
184
228
  const model = ModelMix.new().gpt5mini().addText("Hello!");
185
229
  const text = await model.message();
186
230
  console.log(model.lastRaw.tokens);
187
- // { input: 122, output: 86, total: 541, cost: 0.000319 }
231
+ // { input: 122, output: 86, total: 541, cost: 0.000319, speed: 38 }
188
232
  console.log(model.lastRaw.think); // reasoning content (if available)
189
233
  console.log(model.lastRaw.response); // raw API response
190
234
  ```
@@ -193,13 +237,16 @@ console.log(model.lastRaw.response); // raw API response
193
237
 
194
238
  ```javascript
195
239
  const model = ModelMix.new().sonnet45();
196
- model.addImage('./photo.jpg'); // from file
197
- model.addImageFromUrl('https://example.com/img.png'); // from URL
240
+ model.addImage('./photo.jpg'); // from file
241
+ model.addImageFromUrl('https://example.com/img.png'); // from URL
242
+ model.addImageFromBuffer(imageBuffer); // from Buffer
198
243
  model.addText('Describe this image.');
199
244
  const description = await model.message();
200
245
  ```
201
246
 
202
- ### Use templates with placeholders
247
+ All image methods accept an optional second argument `{ role }` (default `"user"`).
248
+
249
+ ### Templates with placeholders
203
250
 
204
251
  ```javascript
205
252
  const model = ModelMix.new().gpt5mini();
@@ -221,12 +268,11 @@ const pool = ModelMix.new({ config: { roundRobin: true } })
221
268
  .sonnet45()
222
269
  .gemini3flash();
223
270
 
224
- // Each call rotates to the next model
225
271
  const r1 = await pool.new().addText("Request 1").message();
226
272
  const r2 = await pool.new().addText("Request 2").message();
227
273
  ```
228
274
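The difference between fallback and round-robin mode can be sketched in isolation (illustrative plain JavaScript, not the library's internals; `makePicker` is a made-up helper):

```javascript
// Illustrative only: fallback vs round-robin selection over an ordered
// model list, as described above.
function makePicker(models, roundRobin) {
  let next = 0;
  return function pickModel(failedCount = 0) {
    if (roundRobin) {
      const model = models[next % models.length]; // rotate on every call
      next += 1;
      return model;
    }
    return models[failedCount]; // fallback: advance only after a failure
  };
}

const models = ['sonnet45', 'gemini3flash'];

const fallback = makePicker(models, false);
console.log(fallback(0)); // 'sonnet45' (primary, used until it fails)
console.log(fallback(1)); // 'gemini3flash' (tried after one failure)

const rotate = makePicker(models, true);
console.log(rotate()); // 'sonnet45'
console.log(rotate()); // 'gemini3flash'
console.log(rotate()); // 'sonnet45' again
```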
 
229
- ### MCP integration (external tools)
275
+ ### MCP integration
230
276
 
231
277
  ```javascript
232
278
  const model = ModelMix.new({ config: { max_history: 10 } }).gpt5nano();
@@ -238,7 +284,7 @@ console.log(await model.message());
238
284
 
239
285
  Requires `BRAVE_API_KEY` in `.env` for Brave Search MCP.
240
286
 
241
- ### Custom local tools (addTool)
287
+ ### Custom local tools
242
288
 
243
289
  ```javascript
244
290
  const model = ModelMix.new({ config: { max_history: 10 } }).gpt5mini();
@@ -259,7 +305,18 @@ model.addText("What's the weather in Tokyo?");
259
305
  console.log(await model.message());
260
306
  ```
261
307
 
262
- ### Rate limiting (Bottleneck)
308
+ Register multiple tools at once:
309
+
310
+ ```javascript
311
+ model.addTools([
312
+ { tool: { name: "tool_a", description: "...", inputSchema: {...} }, callback: async (args) => {...} },
313
+ { tool: { name: "tool_b", description: "...", inputSchema: {...} }, callback: async (args) => {...} }
314
+ ]);
315
+ ```
316
+
317
+ Manage tools: `model.removeTool("tool_a")` and `model.listTools()` → `{ local, mcp }`.
318
+
319
+ ### Rate limiting
263
320
 
264
321
  ```javascript
265
322
  const model = ModelMix.new({
@@ -272,20 +329,31 @@ const model = ModelMix.new({
272
329
  }).gpt5mini();
273
330
  ```
274
331
 
332
+ ### Conversation history
333
+
334
+ ```javascript
335
+ const chat = ModelMix.new({ config: { max_history: 10 } }).gpt5mini();
336
+ chat.addText("My name is Martin.");
337
+ await chat.message();
338
+ chat.addText("What's my name?");
339
+ const reply = await chat.message(); // "Martin"
340
+ ```
341
+
342
+ `max_history`: 0 = no history (default), N = keep last N exchanges, -1 = unlimited.
343
+
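As a rough mental model of those `max_history` values (illustrative plain JavaScript, not the library's internals; `trimHistory` is a made-up name):

```javascript
// Illustrative only: how the max_history settings described above behave.
// history is an array of past exchanges, oldest first.
function trimHistory(history, maxHistory) {
  if (maxHistory === 0) return [];        // 0 = keep no history (default)
  if (maxHistory === -1) return history;  // -1 = unlimited
  return history.slice(-maxHistory);      // N = keep the last N exchanges
}

const history = ['ex1', 'ex2', 'ex3', 'ex4'];
console.log(trimHistory(history, 2));  // ['ex3', 'ex4']
console.log(trimHistory(history, 0));  // []
console.log(trimHistory(history, -1)); // all four exchanges
```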
275
344
  ### Debug mode
276
345
 
277
346
  ```javascript
278
347
  const model = ModelMix.new({
279
- config: { debug: 2 } // 0=silent, 1=minimal, 2=summary, 3=full (no truncate), 4=verbose
348
+ config: { debug: 2 } // 0=silent, 1=minimal, 2=summary, 3=full, 4=verbose
280
349
  }).gpt5mini();
281
350
  ```
282
351
 
283
- For full debug output, also set the env: `DEBUG=ModelMix* node script.js`
352
+ For full debug output, also set: `DEBUG=ModelMix* node script.js`
284
353
 
285
- ### Use free-tier models
354
+ ### Free-tier models
286
355
 
287
356
  ```javascript
288
- // These use providers with free quotas (OpenRouter, Groq, Cerebras)
289
357
  const model = ModelMix.new()
290
358
  .gptOss()
291
359
  .kimiK2()
@@ -295,48 +363,61 @@ const model = ModelMix.new()
295
363
  console.log(await model.message());
296
364
  ```
297
365
 
298
- ### Conversation history
366
+ These use providers with free quotas (OpenRouter, Groq, Cerebras). If one runs out of quota, ModelMix falls back to the next.
367
+
368
+ ### Multi-provider routing
369
+
370
+ Some model shorthands register the same model across multiple providers for maximum resilience. Control which providers are enabled via the `mix` parameter:
299
371
 
300
372
  ```javascript
301
- const chat = ModelMix.new({ config: { max_history: 10 } }).gpt5mini();
302
- chat.addText("My name is Martin.");
303
- await chat.message();
304
- chat.addText("What's my name?");
305
- const reply = await chat.message(); // "Martin"
373
+ const model = ModelMix.new({
374
+ mix: {
375
+ openrouter: true, // default: true
376
+ cerebras: true, // default: true
377
+ groq: true, // default: true
378
+ together: false, // default: false
379
+ lambda: false, // default: false
380
+ minimax: false, // default: false
381
+ fireworks: false // default: false
382
+ }
383
+ }).deepseekR1();
306
384
  ```
307
385
 
308
386
  ## Agent Usage Rules
309
387
 
310
- - Always check `package.json` for `modelmix` before running `npm install`.
311
- - Use `ModelMix.new()` static factory to create instances (not `new ModelMix()`).
388
+ - Check `package.json` for `modelmix` before running `npm install`.
389
+ - Use `ModelMix.new()` static factory (not `new ModelMix()`).
312
390
  - Store API keys in `.env` and load with `dotenv/config` or `process.loadEnvFile()`. Never hardcode keys.
313
391
  - Chain models for resilience: primary model first, fallbacks after.
314
- - When using MCP tools or `addTool()`, set `max_history` to at least 3.
315
- - Use `.json()` for structured output instead of parsing text manually. Use descriptor objects `{ description, required, enum, default }` in descriptions for richer schema control.
392
+ - When using MCP tools or `addTool()`, set `max_history` to at least 3 — tool call/response pairs consume history slots.
393
+ - Use `.json()` for structured output instead of parsing text manually. Use descriptor objects `{ description, required, enum, default }` for richer schema control.
316
394
  - Use `.message()` for simple text, `.raw()` when you need tokens/thinking/toolCalls.
317
395
  - For thinking models, append `think` to the method name (e.g. `sonnet45think()`).
318
396
  - Template placeholders use `{key}` syntax in both system prompts and user messages.
319
- - The library uses CommonJS internally (`require`) but supports ESM import via `{ ModelMix }`.
320
- - Available provider Mix classes for custom setups: `MixOpenAI`, `MixAnthropic`, `MixGoogle`, `MixPerplexity`, `MixGroq`, `MixTogether`, `MixGrok`, `MixOpenRouter`, `MixOllama`, `MixLMStudio`, `MixCustom`, `MixCerebras`, `MixFireworks`, `MixMiniMax`.
397
+ - The library uses CommonJS internally but supports ESM import via `{ ModelMix }`.
398
+ - GPT-5+ models automatically use `max_completion_tokens` instead of `max_tokens`.
399
+ - o-series models (o3, o4mini) automatically strip `max_tokens` and `temperature` since those APIs don't support them.
400
+ - `addText()`, `addImage()`, `addImageFromUrl()`, and `addImageFromBuffer()` all accept `{ role }` as second argument (default `"user"`).
321
401
 
322
402
  ## API Quick Reference
323
403
 
324
404
  | Method | Returns | Description |
325
405
  | --- | --- | --- |
326
- | `.addText(text)` | `this` | Add user message |
327
- | `.addTextFromFile(path)` | `this` | Add user message from file |
406
+ | `.addText(text, {role?})` | `this` | Add user message |
407
+ | `.addTextFromFile(path, {role?})` | `this` | Add user message from file |
328
408
  | `.setSystem(text)` | `this` | Set system prompt |
329
409
  | `.setSystemFromFile(path)` | `this` | Set system prompt from file |
330
- | `.addImage(path)` | `this` | Add image from file |
331
- | `.addImageFromUrl(url)` | `this` | Add image from URL or data URI |
410
+ | `.addImage(path, {role?})` | `this` | Add image from file |
411
+ | `.addImageFromUrl(url, {role?})` | `this` | Add image from URL or data URI |
412
+ | `.addImageFromBuffer(buffer, {role?})` | `this` | Add image from Buffer |
332
413
  | `.replace({})` | `this` | Set placeholder replacements |
333
414
  | `.replaceKeyFromFile(key, path)` | `this` | Replace placeholder with file content |
334
415
  | `.message()` | `Promise<string>` | Get text response |
335
- | `.json(example, desc?, opts?)` | `Promise<object\|array>` | Get structured JSON. Descriptions support descriptor objects `{ description, required, enum, default }`. Top-level arrays auto-wrapped |
416
+ | `.json(example, desc?, opts?)` | `Promise<object\|array>` | Get structured JSON |
336
417
  | `.raw()` | `Promise<{message, think, toolCalls, tokens, response}>` | Full response |
337
- | `.lastRaw` | `object \| null` | Full response from last `message()`/`json()`/`block()`/`stream()` call |
418
+ | `.lastRaw` | `object \| null` | Full response from last call |
338
419
  | `.stream(callback)` | `Promise` | Stream response |
339
- | `.block()` | `Promise<string>` | Extract code block from response |
420
+ | `.block({addSystemExtra?})` | `Promise<string>` | Extract code block from response |
340
421
  | `.addMCP(package)` | `Promise` | Add MCP server tools |
341
422
  | `.addTool(def, callback)` | `this` | Register custom local tool |
342
423
  | `.addTools([{tool, callback}])` | `this` | Register multiple tools |
@@ -345,6 +426,30 @@ const reply = await chat.message(); // "Martin"
345
426
  | `.new()` | `ModelMix` | Clone instance sharing models |
346
427
  | `.attach(key, provider)` | `this` | Attach custom provider |
347
428
 
429
+ ## Available Provider Classes
430
+
431
+ `MixOpenAI` `MixAnthropic` `MixGoogle` `MixPerplexity` `MixGroq` `MixTogether` `MixGrok` `MixOpenRouter` `MixOllama` `MixLMStudio` `MixCustom` `MixCerebras` `MixFireworks` `MixMiniMax` `MixLambda`
432
+
433
+ ## Troubleshooting
434
+
435
+ **Model fails with "API key not found"**
436
+ The provider's API key env var is not set. Add it to `.env` and ensure it loads before ModelMix runs. Each provider looks for its standard env var (e.g. `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GEMINI_API_KEY`).
437
+
438
+ **Tool calls not working**
439
+ Set `max_history` to at least 3. Tool call/response pairs are stored in history and the model needs to see them to complete the conversation loop.
440
+
441
+ **JSON response parsing fails**
442
+ Add `{ addNote: true }` to the `json()` options — this injects instructions about JSON escaping that prevent common parsing errors. For complex schemas, also try `{ addExample: true }`.
443
+
444
+ **Model returns empty or truncated response**
445
+ Increase `max_tokens` in options. Default is 8192 but some tasks need more. For GPT-5+ models, `max_completion_tokens` is used automatically.
446
+
447
+ **Rate limit errors**
448
+ Configure Bottleneck: `config: { bottleneck: { maxConcurrent: 2, minTime: 2000 } }`. This throttles requests to stay within provider limits.
449
+
450
+ **MCP server fails to connect**
451
+ Ensure the MCP package is installed (`npm install @modelcontextprotocol/server-brave-search`) and required env vars are set. Call `addMCP()` with `await` — it's async.
452
+
348
453
  ## References
349
454
 
350
455
  - [GitHub Repository](https://github.com/clasen/ModelMix)