modelmix 4.4.12 → 4.4.16

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -135,9 +135,10 @@ Here's a comprehensive list of available methods:
 
  | Method | Provider | Model | Price (I/O) per 1 M tokens |
  | ------------------ | ---------- | ------------------------------ | -------------------------- |
+ | `gpt54()` | OpenAI | gpt-5.4 | [\$2.50 / \$15.00][1] |
  | `gpt52()` | OpenAI | gpt-5.2 | [\$1.75 / \$14.00][1] |
  | `gpt51()` | OpenAI | gpt-5.1 | [\$1.25 / \$10.00][1] |
- | `gpt5()` | OpenAI | gpt-5 | [\$1.25 / \$10.00][1] |
+ | `gpt53codex()` | OpenAI | gpt-5.3-codex | [\$1.25 / \$14.00][1] |
  | `gpt5mini()` | OpenAI | gpt-5-mini | [\$0.25 / \$2.00][1] |
  | `gpt5nano()` | OpenAI | gpt-5-nano | [\$0.05 / \$0.40][1] |
  | `gpt41()` | OpenAI | gpt-4.1 | [\$2.00 / \$8.00][1] |
@@ -148,8 +149,8 @@ Here's a comprehensive list of available methods:
  | `opus45[think]()` | Anthropic | claude-opus-4-5-20251101 | [\$5.00 / \$25.00][2] |
  | `sonnet46[think]()`| Anthropic | claude-sonnet-4-6 | [\$3.00 / \$15.00][2] |
  | `sonnet45[think]()`| Anthropic | claude-sonnet-4-5-20250929 | [\$3.00 / \$15.00][2] |
- | `haiku35()` | Anthropic | claude-3-5-haiku-20241022 | [\$0.80 / \$4.00][2] |
  | `haiku45[think]()` | Anthropic | claude-haiku-4-5-20251001 | [\$1.00 / \$5.00][2] |
+ | `gemini31pro()` | Google | gemini-3.1-pro-preview | [\$2.00 / \$12.00][3] |
  | `gemini3pro()` | Google | gemini-3-pro-preview | [\$2.00 / \$12.00][3] |
  | `gemini3flash()` | Google | gemini-3-flash-preview | [\$0.50 / \$3.00][3] |
  | `gemini25pro()` | Google | gemini-2.5-pro | [\$1.25 / \$10.00][3] |
@@ -161,8 +162,6 @@ Here's a comprehensive list of available methods:
  | `minimaxM25()` | MiniMax | MiniMax-M2.5 | [\$0.30 / \$1.20][9] |
  | `sonar()` | Perplexity | sonar | [\$1.00 / \$1.00][4] |
  | `sonarPro()` | Perplexity | sonar-pro | [\$3.00 / \$15.00][4] |
- | `scout()` | Groq | Llama-4-Scout-17B-16E-Instruct | [\$0.11 / \$0.34][5] |
- | `maverick()` | Groq | Maverick-17B-128E-Instruct-FP8 | [\$0.20 / \$0.60][5] |
  | `hermes3()` | Lambda | Hermes-3-Llama-3.1-405B-FP8 | [\$0.80 / \$0.80][8] |
  | `qwen3()` | Together | Qwen3-235B-A22B-fp8-tput | [\$0.20 / \$0.60][7] |
  | `kimiK2()` | Together | Kimi-K2-Instruct | [\$1.00 / \$3.00][7] |
@@ -345,11 +344,11 @@ Descriptions support **descriptor objects** with `description`, `required`, `enu
 
  ```javascript
  const result = await model.json(
- { name: 'martin', age: 22, sex: 'm' },
+ { name: 'Martin', age: 22, sex: 'male' },
  {
  name: { description: 'Name of the actor', required: false },
- age: 'Age of the actor', // string still works
- sex: { description: 'Gender', enum: ['m', 'f', null], default: 'm' }
+ age: 'Age of the actor', // string still works
+ sex: { description: 'Gender', enum: ['male', 'female', null], default: null }
  }
  );
  ```
@@ -406,7 +405,9 @@ Every response from `raw()` now includes a `tokens` object with the following st
  tokens: {
  input: 150, // Number of tokens in the prompt/input
  output: 75, // Number of tokens in the completion/output
- total: 225 // Total tokens used (input + output)
+ total: 225, // Total tokens used (input + output)
+ cost: 0.0012, // Estimated cost in USD (null if model not in pricing table)
+ speed: 42 // Output tokens per second (int)
  }
  }
  ```
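As a worked example of the `cost` field documented above, the following sketch derives a cost figure from per-1M-token prices. It is illustrative only and not part of the package; `PRICE` and `estimateCost` are hypothetical names, with the gpt-5.2 rates ($1.75 in / $14.00 out) taken from the pricing table earlier in this README diff.

```javascript
// Hypothetical helper: cost = input * inputPrice/1e6 + output * outputPrice/1e6.
// PRICE uses the gpt-5.2 rates from the table above (USD per 1M tokens).
const PRICE = { input: 1.75, output: 14.00 };

function estimateCost(tokens) {
  // Scale token counts down to millions before applying the per-1M price.
  return (tokens.input * PRICE.input + tokens.output * PRICE.output) / 1e6;
}

console.log(estimateCost({ input: 150, output: 75 })); // 0.0013125 (USD)
```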
@@ -418,10 +419,10 @@ After calling `message()` or `json()`, use `lastRaw` to access the complete resp
  ```javascript
  const text = await model.message();
  console.log(model.lastRaw.tokens);
- // { input: 122, output: 86, total: 541, cost: 0.000319 }
+ // { input: 122, output: 86, total: 541, cost: 0.000319, speed: 38 }
  ```
 
- The `cost` field is the estimated cost in USD based on the model's pricing per 1M tokens (input/output). If the model is not found in the pricing table, `cost` will be `null`.
+ The `cost` field is the estimated cost in USD based on the model's pricing per 1M tokens (input/output). If the model is not found in the pricing table, `cost` will be `null`. The `speed` field is the generation speed measured in output tokens per second (integer).
 
  ## 🐛 Enabling Debug Mode
 
@@ -515,7 +516,7 @@ new ModelMix(args = { options: {}, config: {} })
  - `message`: The text response from the model
  - `think`: Reasoning/thinking content (if available)
  - `toolCalls`: Array of tool calls made by the model (if any)
- - `tokens`: Object with `input`, `output`, and `total` token counts
+ - `tokens`: Object with `input`, `output`, `total` token counts, `cost` (USD), and `speed` (output tokens/sec)
  - `response`: The raw API response
  - `stream(callback)`: Sends the message and streams the response, invoking the callback with each streamed part.
  - `json(schemaExample, descriptions = {}, options = {})`: Forces the model to return a response in a specific JSON format.
package/demo/gemini.js CHANGED
@@ -1,5 +1,5 @@
  import { ModelMix, MixGoogle } from '../index.js';
- try { process.loadEnvFile(); } catch {}
+ try { process.loadEnvFile(); } catch { }
 
  const mmix = new ModelMix({
  options: {
@@ -12,9 +12,9 @@ const mmix = new ModelMix({
  }
  });
 
- // Using gemini25flash (Gemini 2.5 Flash) with built-in method
+ // Using gemini3flash (Gemini 3 Flash) with built-in method
  console.log("\n" + '--------| gemini25flash() |--------');
- const flash = await mmix.gemini25flash()
+ const flash = await mmix.gemini3flash()
  .addText('Hi there! Do you like cats?')
  .message();
 
@@ -22,20 +22,23 @@ console.log(flash);
 
  // Using gemini3pro (Gemini 3 Pro) with custom config
  console.log("\n" + '--------| gemini3pro() with JSON response |--------');
- const pro = mmix.new().gemini3pro();
+ const pro = mmix.new().gemini31pro();
 
  pro.addText('Give me a fun fact about cats');
- const jsonResponse = await pro.json({
+
+ const jsonExampleAndSchema = {
  fact: 'A fun fact about cats',
- category: 'animal behavior'
- });
+ category: 'animal behavior'
+ };
+
+ const jsonResponse = await pro.json(jsonExampleAndSchema, jsonExampleAndSchema);
 
  console.log(jsonResponse);
 
  // Using attach method with MixGoogle for custom model
  console.log("\n" + '--------| Custom Gemini with attach() |--------');
- mmix.attach('gemini-2.5-flash', new MixGoogle());
+ const customModel = mmix.new().attach('gemini-2.5-flash', new MixGoogle());
 
- const custom = await mmix.addText('Tell me a short joke about cats.').message();
+ const custom = await customModel.addText('Tell me a short joke about cats.').message();
  console.log(custom);
 
@@ -0,0 +1,22 @@
+ import { ModelMix } from '../index.js';
+ try { process.loadEnvFile(); } catch {}
+
+ const mmix = new ModelMix({
+ config: {
+ debug: 3
+ }
+ });
+
+ console.log('\n--------| gptRealtime() |--------');
+
+ const realtime = mmix.gptRealtimeMini({
+ options: {
+ stream: true
+ }
+ });
+
+ realtime.addText('Explain quantum entanglement in simple terms.');
+ const response = await realtime.stream(({ delta }) => {
+ process.stdout.write(delta || '');
+ });
+ console.log('\n\n[done]\n', response.tokens);
@@ -8,10 +8,10 @@ const mmix = new ModelMix({
  }
  });
 
- console.log("\n" + '--------| gpt51() |--------');
+ console.log("\n" + '--------| gpt54() |--------');
 
  const gptArgs = { options: { reasoning_effort: "none", verbosity: "low" } };
- const gpt = mmix.gpt51(gptArgs);
+ const gpt = mmix.gpt54(gptArgs);
 
  gpt.addText("Explain quantum entanglement in simple terms.");
  const response = await gpt.message();
@@ -11,7 +11,8 @@
  "dependencies": {
  "dotenv": "^17.2.3",
  "isolated-vm": "^6.0.2",
- "lemonlog": "^1.1.4"
+ "lemonlog": "^1.1.4",
+ "pathmix": "^1.0.0"
  }
  },
  ".api/apis/pplx": {
@@ -290,6 +291,15 @@
  "wrappy": "1"
  }
  },
+ "node_modules/pathmix": {
+ "version": "1.0.0",
+ "resolved": "https://registry.npmjs.org/pathmix/-/pathmix-1.0.0.tgz",
+ "integrity": "sha512-oLbvoOKuyV6TjkKLEYqH5O+q+d+qZwtRNzMrBI93IsCYN0liDw8W8aZq3BPvIaF4jJU+igeO/1p6lCwFfy8E5Q==",
+ "license": "ISC",
+ "engines": {
+ "node": ">=16.0.0"
+ }
+ },
  "node_modules/prebuild-install": {
  "version": "7.1.3",
  "resolved": "https://registry.npmjs.org/prebuild-install/-/prebuild-install-7.1.3.tgz",
package/demo/package.json CHANGED
@@ -15,6 +15,7 @@
  "dependencies": {
  "dotenv": "^17.2.3",
  "isolated-vm": "^6.0.2",
- "lemonlog": "^1.1.4"
+ "lemonlog": "^1.1.4",
+ "pathmix": "^1.0.0"
  }
  }
package/index.js CHANGED
@@ -5,6 +5,7 @@ const { inspect } = require('util');
  const log = require('lemonlog')('ModelMix');
  const Bottleneck = require('bottleneck');
  const path = require('path');
+ const WebSocket = require('ws');
  const generateJsonSchema = require('./schema');
  const { Client } = require("@modelcontextprotocol/sdk/client/index.js");
  const { StdioClientTransport } = require("@modelcontextprotocol/sdk/client/stdio.js");
@@ -14,6 +15,11 @@ const { MCPToolsManager } = require('./mcp-tools');
  // Based on provider pricing pages linked in README
  const MODEL_PRICING = {
  // OpenAI
+ 'gpt-realtime-mini': [0.60, 2.40],
+ 'gpt-realtime': [4.00, 16.00],
+ 'gpt-5.4': [2.50, 15.00],
+ 'gpt-5.4-pro': [30, 180.00],
+ 'gpt-5.3-codex': [1.75, 14.00],
  'gpt-5.2': [1.75, 14.00],
  'gpt-5.2-chat-latest': [1.75, 14.00],
  'gpt-5.1': [1.25, 10.00],
@@ -37,6 +43,7 @@ const MODEL_PRICING = {
  'claude-3-5-haiku-20241022': [0.80, 4.00],
  'claude-haiku-4-5-20251001': [1.00, 5.00],
  // Google
+ 'gemini-3.1-pro-preview': [2.00, 12.00],
  'gemini-3-pro-preview': [2.00, 12.00],
  'gemini-3-flash-preview': [0.50, 3.00],
  'gemini-2.5-pro': [1.25, 10.00],
@@ -267,6 +274,21 @@ class ModelMix {
  gpt52({ options = {}, config = {} } = {}) {
  return this.attach('gpt-5.2', new MixOpenAI({ options, config }));
  }
+ gpt54({ options = {}, config = {} } = {}) {
+ return this.attach('gpt-5.4', new MixOpenAIResponses({ options, config }));
+ }
+ gpt54pro({ options = {}, config = {} } = {}) {
+ return this.attach('gpt-5.4-pro', new MixOpenAIResponses({ options, config }));
+ }
+ gptRealtime({ options = {}, config = {} } = {}) {
+ return this.attach('gpt-realtime', new MixOpenAIWebSocket({ options, config }));
+ }
+ gptRealtimeMini({ options = {}, config = {} } = {}) {
+ return this.attach('gpt-realtime-mini', new MixOpenAIWebSocket({ options, config }));
+ }
+ gpt53codex({ options = {}, config = {} } = {}) {
+ return this.attach('gpt-5.3-codex', new MixOpenAIResponses({ options, config }));
+ }
  gpt52chat({ options = {}, config = {} } = {}) {
  return this.attach('gpt-5.2-chat-latest', new MixOpenAI({ options, config }));
  }
@@ -341,6 +363,9 @@ class ModelMix {
  gemini25flash({ options = {}, config = {} } = {}) {
  return this.attach('gemini-2.5-flash', new MixGoogle({ options, config }));
  }
+ gemini31pro({ options = {}, config = {} } = {}) {
+ return this.attach('gemini-3.1-pro-preview', new MixGoogle({ options, config }));
+ }
  gemini3pro({ options = {}, config = {} } = {}) {
  return this.attach('gemini-3-pro-preview', new MixGoogle({ options, config }));
  }
@@ -889,11 +914,14 @@ class ModelMix {
  providerInstance.streamCallback = this.streamCallback;
  }
 
+ const startTime = Date.now();
  const result = await providerInstance.create({ options: currentOptions, config: currentConfig });
+ const elapsedMs = Date.now() - startTime;
 
- // Calculate cost based on model pricing
  if (result.tokens) {
  result.tokens.cost = ModelMix.calculateCost(currentModelKey, result.tokens);
+ const elapsedSec = elapsedMs / 1000;
+ result.tokens.speed = elapsedSec > 0 ? Math.round(result.tokens.output / elapsedSec) : 0;
  }
 
  if (result.toolCalls && result.toolCalls.length > 0) {
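The speed computation this hunk introduces can be sketched in isolation. This is an illustrative standalone function, not a ModelMix API; `tokensPerSecond` is a hypothetical name.

```javascript
// Output tokens divided by elapsed wall-clock seconds, rounded to an
// integer, with a guard for a zero-length interval (as in the hunk above).
function tokensPerSecond(outputTokens, elapsedMs) {
  const elapsedSec = elapsedMs / 1000;
  return elapsedSec > 0 ? Math.round(outputTokens / elapsedSec) : 0;
}

console.log(tokensPerSecond(86, 2200)); // 39
console.log(tokensPerSecond(10, 0));    // 0
```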
@@ -935,7 +963,7 @@ class ModelMix {
  // debug level 2: Readable summary of output
  if (currentConfig.debug >= 2) {
  const tokenInfo = result.tokens
- ? ` ${result.tokens.input} → ${result.tokens.output} tok` + (result.tokens.cost != null ? ` $${result.tokens.cost.toFixed(4)}` : '')
+ ? ` ${result.tokens.input} → ${result.tokens.output} tok` + (result.tokens.speed ? ` ${result.tokens.speed} t/s` : '') + (result.tokens.cost != null ? ` $${result.tokens.cost.toFixed(4)}` : '')
  : '';
  console.log(`✓${tokenInfo}\n${ModelMix.formatOutputSummary(result, currentConfig.debug).trim()}`);
  }
@@ -1492,6 +1520,339 @@ class MixOpenAI {
  }
  }
 
+ class MixOpenAIResponses extends MixOpenAI {
+ async create({ config = {}, options = {} } = {}) {
+
+ // Keep GPT/o-model option normalization behavior
+ if (options.model?.startsWith('o')) {
+ delete options.max_tokens;
+ delete options.temperature;
+ }
+ if (options.model?.includes('gpt-5')) {
+ if (options.max_tokens) {
+ options.max_completion_tokens = options.max_tokens;
+ delete options.max_tokens;
+ }
+ delete options.temperature;
+ }
+
+ const responsesUrl = this.config.url.replace('/chat/completions', '/responses');
+ const request = MixOpenAIResponses.buildResponsesRequest(options);
+ const response = await axios.post(responsesUrl, request, {
+ headers: this.headers
+ });
+
+ return MixOpenAIResponses.processResponsesResponse(response);
+ }
+
+ static buildResponsesRequest(options = {}) {
+ const request = {
+ model: options.model,
+ input: MixOpenAIResponses.messagesToResponsesInput(options.messages),
+ stream: false
+ };
+
+ if (options.reasoning_effort) request.reasoning = { effort: options.reasoning_effort };
+ if (options.verbosity) request.text = { verbosity: options.verbosity };
+
+ if (typeof options.max_completion_tokens === 'number') {
+ request.max_output_tokens = options.max_completion_tokens;
+ } else if (typeof options.max_tokens === 'number') {
+ request.max_output_tokens = options.max_tokens;
+ }
+
+ if (typeof options.temperature === 'number') request.temperature = options.temperature;
+ if (typeof options.top_p === 'number') request.top_p = options.top_p;
+ if (typeof options.presence_penalty === 'number') request.presence_penalty = options.presence_penalty;
+ if (typeof options.frequency_penalty === 'number') request.frequency_penalty = options.frequency_penalty;
+ if (options.stop !== undefined) request.stop = options.stop;
+ if (typeof options.n === 'number') request.n = options.n;
+ if (options.logit_bias !== undefined) request.logit_bias = options.logit_bias;
+ if (options.user !== undefined) request.user = options.user;
+
+ return request;
+ }
+
+ static processResponsesResponse(response) {
+ const message = MixOpenAIResponses.extractResponsesMessage(response.data);
+ return {
+ message,
+ think: null,
+ toolCalls: [],
+ tokens: MixOpenAIResponses.extractResponsesTokens(response.data),
+ response: response.data
+ };
+ }
+
+ static extractResponsesTokens(data) {
+ if (data.usage) {
+ return {
+ input: data.usage.input_tokens || 0,
+ output: data.usage.output_tokens || 0,
+ total: data.usage.total_tokens || ((data.usage.input_tokens || 0) + (data.usage.output_tokens || 0))
+ };
+ }
+ return {
+ input: 0,
+ output: 0,
+ total: 0
+ };
+ }
+
+ static extractResponsesMessage(data) {
+ if (!Array.isArray(data.output)) return '';
+ return data.output
+ .filter(item => item.type === 'message')
+ .flatMap(item => Array.isArray(item.content) ? item.content : [])
+ .filter(content => content.type === 'output_text' && typeof content.text === 'string')
+ .map(content => content.text)
+ .join('\n')
+ .trim();
+ }
+
+ static messagesToResponsesInput(messages = []) {
+ const mapped = [];
+
+ for (const message of messages) {
+ if (!message || !message.role) continue;
+ if (message.tool_calls || message.role === 'tool') continue;
+
+ let text = '';
+ if (typeof message.content === 'string') {
+ text = message.content;
+ } else if (Array.isArray(message.content)) {
+ text = message.content
+ .filter(item => item && item.type === 'text' && typeof item.text === 'string')
+ .map(item => item.text)
+ .join('\n');
+ }
+
+ if (!text) continue;
+ mapped.push({
+ role: message.role,
+ content: [{ type: 'input_text', text }]
+ });
+ }
+
+ return mapped;
+ }
+ }
+
+ class MixOpenAIWebSocket extends MixOpenAIResponses {
+ getDefaultConfig(customConfig) {
+ return super.getDefaultConfig({
+ realtimeUrl: 'wss://api.openai.com/v1/realtime',
+ websocketTimeoutMs: 120000,
+ ...customConfig
+ });
+ }
+
+ async create({ config = {}, options = {} } = {}) {
+ if (options.model?.startsWith('o')) {
+ delete options.max_tokens;
+ delete options.temperature;
+ }
+ if (options.model?.includes('gpt-5')) {
+ if (options.max_tokens) {
+ options.max_completion_tokens = options.max_tokens;
+ delete options.max_tokens;
+ }
+ delete options.temperature;
+ }
+
+ const mergedConfig = { ...this.config, ...config };
+ const realtimeUrl = `${mergedConfig.realtimeUrl}?model=${encodeURIComponent(options.model)}`;
+ const timeoutMs = mergedConfig.websocketTimeoutMs || 120000;
+
+ return await new Promise((resolve, reject) => {
+ const ws = new WebSocket(realtimeUrl, {
+ headers: {
+ authorization: `Bearer ${mergedConfig.apiKey}`
+ }
+ });
+
+ const events = [];
+ let message = '';
+ let settled = false;
+ let finalResponse = null;
+
+ const timeout = setTimeout(() => {
+ if (settled) return;
+ settled = true;
+ ws.close();
+ reject({
+ message: `Realtime WebSocket timed out after ${timeoutMs}ms`,
+ statusCode: null,
+ details: null,
+ config: mergedConfig,
+ options
+ });
+ }, timeoutMs);
+
+ const cleanUp = () => clearTimeout(timeout);
+
+ ws.on('open', () => {
+ const session = {
+ type: 'realtime',
+ output_modalities: ['text']
+ };
+
+ if (mergedConfig.system) session.instructions = mergedConfig.system;
+ if (Array.isArray(options.tools) && options.tools.length > 0) {
+ session.tools = options.tools;
+ }
+
+ ws.send(JSON.stringify({ type: 'session.update', session }));
+
+ const items = MixOpenAIWebSocket.messagesToConversationItems(options.messages);
+ for (const item of items) {
+ ws.send(JSON.stringify({
+ type: 'conversation.item.create',
+ item
+ }));
+ }
+
+ const responseConfig = { output_modalities: ['text'] };
+ if (typeof options.max_completion_tokens === 'number') {
+ responseConfig.max_output_tokens = Math.min(options.max_completion_tokens, 4096);
+ } else if (typeof options.max_tokens === 'number') {
+ responseConfig.max_output_tokens = Math.min(options.max_tokens, 4096);
+ }
+ if (Array.isArray(options.tools) && options.tools.length > 0) responseConfig.tools = options.tools;
+
+ ws.send(JSON.stringify({
+ type: 'response.create',
+ response: responseConfig
+ }));
+ });
+
+ ws.on('message', raw => {
+ let event;
+ try {
+ event = JSON.parse(raw.toString());
+ } catch {
+ return;
+ }
+
+ events.push(event);
+
+ const isTextDeltaEvent = event.type === 'response.text.delta' || event.type === 'response.output_text.delta';
+ if (isTextDeltaEvent) {
+ const delta = MixOpenAIWebSocket.extractDelta(event);
+ if (delta) {
+ message += delta;
+ if (this.streamCallback) {
+ this.streamCallback({ response: event, message, delta });
+ }
+ }
+ return;
+ }
+
+ if (event.type === 'response.done') {
+ finalResponse = event.response || null;
+ if (!message && finalResponse) {
+ message = MixOpenAIResponses.extractResponsesMessage(finalResponse);
+ }
+
+ if (!settled) {
+ settled = true;
+ cleanUp();
+ ws.close();
+ resolve({
+ message: message.trim(),
+ think: null,
+ toolCalls: [],
+ tokens: MixOpenAIResponses.extractResponsesTokens(finalResponse || {}),
+ response: {
+ response: finalResponse,
+ events
+ }
+ });
+ }
+ return;
+ }
+
+ if (event.type === 'error' && !settled) {
+ settled = true;
+ cleanUp();
+ ws.close();
+ reject({
+ message: event.error?.message || 'Realtime WebSocket error',
+ statusCode: null,
+ details: event.error || event,
+ config: mergedConfig,
+ options
+ });
+ }
+ });
+
+ ws.on('error', error => {
+ if (settled) return;
+ settled = true;
+ cleanUp();
+ reject({
+ message: error.message || 'Realtime WebSocket connection error',
+ statusCode: null,
+ details: null,
+ stack: error.stack,
+ config: mergedConfig,
+ options
+ });
+ });
+
+ ws.on('close', () => {
+ if (settled) return;
+ settled = true;
+ cleanUp();
+ reject({
+ message: 'Realtime WebSocket closed before response.done',
+ statusCode: null,
+ details: null,
+ config: mergedConfig,
+ options
+ });
+ });
+ });
+ }
+
+ static messagesToConversationItems(messages = []) {
+ const items = [];
+
+ for (const message of messages) {
+ if (!message || !message.role) continue;
+ if (message.role === 'tool' || message.tool_calls) continue;
+
+ const role = message.role === 'assistant' ? 'assistant' : (message.role === 'system' ? 'system' : 'user');
+ const content = [];
+
+ if (typeof message.content === 'string') {
+ content.push({
+ type: role === 'assistant' ? 'text' : 'input_text',
+ text: message.content
+ });
+ } else if (Array.isArray(message.content)) {
+ for (const item of message.content) {
+ if (!item || item.type !== 'text' || typeof item.text !== 'string') continue;
+ content.push({
+ type: role === 'assistant' ? 'text' : 'input_text',
+ text: item.text
+ });
+ }
+ }
+
+ if (content.length === 0) continue;
+ items.push({ type: 'message', role, content });
+ }
+
+ return items;
+ }
+
+ static extractDelta(event) {
+ if (typeof event.delta === 'string') return event.delta;
+ return '';
+ }
+ }
+
  class MixOpenRouter extends MixOpenAI {
  getDefaultConfig(customConfig) {
 
@@ -2266,4 +2627,4 @@ class MixGoogle extends MixCustom {
  }
  }
 
- module.exports = { MixCustom, ModelMix, MixAnthropic, MixMiniMax, MixOpenAI, MixOpenRouter, MixPerplexity, MixOllama, MixLMStudio, MixGroq, MixTogether, MixGrok, MixCerebras, MixGoogle, MixFireworks };
+ module.exports = { MixCustom, ModelMix, MixAnthropic, MixMiniMax, MixOpenAI, MixOpenAIResponses, MixOpenAIWebSocket, MixOpenRouter, MixPerplexity, MixOllama, MixLMStudio, MixGroq, MixTogether, MixGrok, MixCerebras, MixGoogle, MixFireworks };
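The `messagesToResponsesInput` mapping added in this file can be exercised in isolation. The standalone copy of its logic below (not the package export itself) shows the input/output shape: tool messages are dropped and array content is flattened to a single `input_text` part.

```javascript
// Standalone copy of the mapping logic added above: chat-style messages
// become Responses-API input items; tool messages and empty text are skipped.
function messagesToResponsesInput(messages = []) {
  const mapped = [];
  for (const message of messages) {
    if (!message || !message.role) continue;
    if (message.tool_calls || message.role === 'tool') continue;

    let text = '';
    if (typeof message.content === 'string') {
      text = message.content;
    } else if (Array.isArray(message.content)) {
      // Flatten { type: 'text', text } parts into one newline-joined string.
      text = message.content
        .filter(item => item && item.type === 'text' && typeof item.text === 'string')
        .map(item => item.text)
        .join('\n');
    }

    if (!text) continue;
    mapped.push({ role: message.role, content: [{ type: 'input_text', text }] });
  }
  return mapped;
}

console.log(messagesToResponsesInput([
  { role: 'user', content: 'Hi' },
  { role: 'tool', content: 'ignored' }
]));
// → [ { role: 'user', content: [ { type: 'input_text', text: 'Hi' } ] } ]
```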
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "modelmix",
- "version": "4.4.12",
+ "version": "4.4.16",
  "description": "🧬 Reliable interface with automatic fallback for AI LLMs.",
  "main": "index.js",
  "repository": {
@@ -16,7 +16,7 @@
  "openai",
  "anthropic",
  "agent",
- "grok4",
+ "realtime",
  "gpt",
  "claude",
  "llama",
@@ -47,16 +47,17 @@
  },
  "homepage": "https://github.com/clasen/ModelMix#readme",
  "dependencies": {
- "@modelcontextprotocol/sdk": "^1.26.0",
+ "@modelcontextprotocol/sdk": "^1.27.1",
  "axios": "^1.13.5",
  "bottleneck": "^2.19.5",
  "file-type": "^16.5.4",
  "form-data": "^4.0.4",
- "lemonlog": "^1.2.0"
+ "lemonlog": "^1.2.0",
+ "ws": "^8.19.0"
  },
  "devDependencies": {
  "chai": "^5.2.1",
- "mocha": "^11.3.0",
+ "mocha": "^11.7.5",
  "nock": "^14.0.9",
  "sinon": "^21.0.0"
  },
@@ -1,41 +1,50 @@
  ---
  name: modelmix
- description: Instructions for using the ModelMix Node.js library to interact with multiple AI LLM providers through a unified interface. Use when integrating AI models (OpenAI, Anthropic, Google, Groq, Perplexity, Grok, etc.), chaining models with fallback, getting structured JSON from LLMs, adding MCP tools, streaming responses, or managing multi-provider AI workflows in Node.js.
+ description: Instructions for using the ModelMix Node.js library to interact with multiple AI LLM providers through a unified interface. Use when writing code that calls AI models (OpenAI, Anthropic, Google, Groq, Perplexity, Grok, MiniMax, Fireworks, Together, Lambda, Cerebras, OpenRouter, Ollama, LM Studio), chaining models with fallback, getting structured JSON from LLMs, adding MCP tools, streaming responses, managing multi-provider AI workflows, round-robin load balancing, or rate limiting API requests in Node.js. Also use when the user mentions "modelmix", "ModelMix", asks to "call an LLM", "query a model", "add AI to my app", or wants to integrate any supported provider.
+ metadata:
+ tags: [llm, ai, openai, anthropic, google, groq, perplexity, grok, mcp, streaming, json-output]
  ---
 
  # ModelMix Library Skill
 
  ## Overview
 
- ModelMix is a Node.js library that provides a unified fluent API to interact with multiple AI LLM providers. It handles automatic fallback between models, round-robin load balancing, structured JSON output, streaming, MCP tool integration, rate limiting, and token tracking.
+ ModelMix is a Node.js library providing a unified fluent API to interact with multiple AI LLM providers. It handles automatic fallback between models, round-robin load balancing, structured JSON output, streaming, MCP tool integration, custom local tools, rate limiting, and token tracking.
 
  Use this skill when:
  - Integrating one or more AI models into a Node.js project
- - Chaining models with automatic fallback
+ - Chaining models with automatic fallback or round-robin
  - Extracting structured JSON from LLMs
  - Adding MCP tools or custom tools to models
+ - Streaming responses from any provider
  - Working with templates and file-based prompts
+ - Tracking token usage and costs
 
- Do NOT use this skill for:
+ Do NOT use for:
  - Python or non-Node.js projects
  - Direct HTTP calls to LLM APIs (use ModelMix instead)
 
- ## Common Tasks
+ ## Quick Reference
 
+ - [Installation](#installation)
+ - [Creating an instance](#creating-an-instance)
+ - [Attaching models](#attaching-models)
  - [Get a text response](#get-a-text-response)
  - [Get structured JSON](#get-structured-json)
  - [Stream a response](#stream-a-response)
- - [Get raw response (tokens, thinking, tool calls)](#get-raw-response-tokens-thinking-tool-calls)
- - [Access full response after `message()` or `json()` with `lastRaw`](#access-full-response-after-message-or-json-with-lastraw)
+ - [Extract a code block](#extract-a-code-block)
+ - [Get raw response (tokens, thinking, tool calls)](#get-raw-response)
+ - [Access full response with lastRaw](#access-full-response-with-lastraw)
  - [Add images](#add-images)
- - [Use templates with placeholders](#use-templates-with-placeholders)
+ - [Templates with placeholders](#templates-with-placeholders)
  - [Round-robin load balancing](#round-robin-load-balancing)
- - [MCP integration (external tools)](#mcp-integration-external-tools)
- - [Custom local tools (addTool)](#custom-local-tools-addtool)
- - [Rate limiting (Bottleneck)](#rate-limiting-bottleneck)
- - [Debug mode](#debug-mode)
- - [Use free-tier models](#use-free-tier-models)
+ - [MCP integration](#mcp-integration)
+ - [Custom local tools](#custom-local-tools)
+ - [Rate limiting](#rate-limiting)
  - [Conversation history](#conversation-history)
+ - [Debug mode](#debug-mode)
+ - [Free-tier models](#free-tier-models)
+ - [Multi-provider routing](#multi-provider-routing)
 
  ## Installation
 
@@ -54,49 +63,77 @@ import { ModelMix } from 'modelmix';
  ### Creating an Instance
 
  ```javascript
- // Static factory (preferred)
  const model = ModelMix.new();
 
- // With global options
  const model = ModelMix.new({
  options: { max_tokens: 4096, temperature: 0.7 },
  config: {
  system: "You are a helpful assistant.",
- max_history: 5,
- debug: 0, // 0=silent, 1=minimal, 2=summary, 3=full (no truncate), 4=verbose
- roundRobin: false // false=fallback, true=rotate models
+ max_history: 5, // -1 = unlimited, 0 = none (default), N = keep last N
+ debug: 0, // 0=silent, 1=minimal, 2=summary, 3=full, 4=verbose
+ roundRobin: false // false=fallback, true=rotate models
  }
  });
  ```
 
- ### Attaching Models (Fluent Chain)
+ ### Attaching Models
 
- Chain shorthand methods to attach providers. First model is primary; others are fallbacks:
+ Chain shorthand methods to attach providers. First model is primary; others are fallbacks (or rotated if `roundRobin: true`):
 
  ```javascript
  const model = ModelMix.new()
  .sonnet46() // primary
- .gpt52() // fallback 1
+ .gpt52() // fallback 1
  .gemini3flash() // fallback 2
  .addText("Hello!")
  ```
 
- If `sonnet45` fails, it automatically tries `gpt5mini`, then `gemini3flash`.
+ If `sonnet46` fails, it automatically tries `gpt52`, then `gemini3flash`.
 
  ## Available Model Shorthands
 
- - **OpenAI**: `gpt52` `gpt51` `gpt5` `gpt5mini` `gpt5nano` `gpt41` `gpt41mini` `gpt41nano`
- - **Anthropic**: `opus46` `opus45` `sonnet46` `sonnet45` `haiku45` `haiku35` (thinking variants: add `think` suffix)
- - **Google**: `gemini3pro` `gemini3flash` `gemini25pro` `gemini25flash`
- - **Grok**: `grok4` `grok41` (thinking variant available)
- - **Perplexity**: `sonar` `sonarPro`
- - **Groq**: `scout` `maverick`
- - **Together**: `qwen3` `kimiK2`
- - **Multi-provider**: `deepseekR1` `gptOss`
- - **MiniMax**: `minimaxM21`
- - **Fireworks**: `deepseekV32` `GLM47`
+ ### OpenAI
+ `gpt52()` `gpt52chat()` `gpt51()` `gpt5()` `gpt5mini()` `gpt5nano()` `gpt45()` `gpt41()` `gpt41mini()` `gpt41nano()` `o3()` `o4mini()`
+
+ ### Anthropic
+ `opus46()` `opus45()` `opus41()` `sonnet46()` `sonnet45()` `sonnet4()` `sonnet37()` `haiku45()` `haiku35()`
+
+ Thinking variants: append `think` — e.g. `opus46think()` `sonnet46think()` `sonnet45think()` `sonnet4think()` `sonnet37think()` `opus45think()` `opus41think()` `haiku45think()`
+
+ ### Google
+ `gemini3pro()` `gemini3flash()` `gemini25pro()` `gemini25flash()`
+
+ ### Grok
+ `grok4()` `grok41()` `grok41think()` `grok3()` `grok3mini()`
+
+ ### Perplexity
+ `sonar()` `sonarPro()`
+
+ ### Groq
+ `scout()` `maverick()`
+
+ ### Together
+ `qwen3()` `kimiK2()` `kimiK2think()` `kimiK25think()` `gptOss()`
+
+ ### MiniMax
+ `minimaxM25()` `minimaxM21()` `minimaxM2()` `minimaxM2Stable()`
+
+ ### Fireworks
+ `deepseekV32()` `GLM5()` `GLM47()`
+
+ ### Cerebras
+ `GLM46()`
+
+ ### OpenRouter
+ `GLM45()`
+
+ ### Multi-provider (auto-fallback across free/paid tiers)
+ `deepseekR1()` `hermes3()` `scout()` `maverick()` `kimiK2()` `GLM47()`
 
- Each method is called as `mix.methodName()` and accepts optional `{ options, config }` to override per-model settings.
+ ### Local
+ `lmstudio()` — for LM Studio local models
+
+ Each method accepts optional `{ options, config }` to override per-model settings.
 
  ## Common Tasks
 
@@ -116,35 +153,30 @@ const result = await ModelMix.new()
116
153
  .gpt5mini()
117
154
  .addText("Name and capital of 3 South American countries.")
118
155
  .json(
119
- { countries: [{ name: "", capital: "" }] }, // schema example
120
- { countries: [{ name: "country name", capital: "in uppercase" }] }, // descriptions
121
- { addNote: true } // options
156
+ { countries: [{ name: "", capital: "" }] },
157
+ { countries: [{ name: "country name", capital: "in uppercase" }] },
158
+ { addNote: true }
122
159
  );
123
- // result.countries → [{ name: "Brazil", capital: "BRASILIA" }, ...]
124
160
  ```
125
161
 
126
162
  `json()` signature: `json(schemaExample, schemaDescription?, { addSchema, addExample, addNote }?)`
127
163
 
128
164
  #### Enhanced descriptors
129
165
 
130
- Descriptions can be **strings** or **descriptor objects** with metadata:
166
+ Descriptions can be strings or descriptor objects with metadata:
131
167
 
132
168
  ```javascript
133
169
  const result = await model.json(
134
170
  { name: 'martin', age: 22, sex: 'Male' },
135
171
  {
136
172
  name: { description: 'Name of the actor', required: false },
137
- age: 'Age of the actor', // string still works
173
+ age: 'Age of the actor',
138
174
  sex: { description: 'Gender', enum: ['Male', 'Female', null] }
139
175
  }
140
176
  );
141
177
  ```
142
178
 
143
- Descriptor properties:
144
- - `description` (string) — field description
145
- - `required` (boolean, default `true`) — if `false`: removed from required array, type becomes nullable
146
- - `enum` (array) — allowed values; if includes `null`, type auto-becomes nullable
147
- - `default` (any) — default value
179
+ Descriptor properties: `description` (string), `required` (boolean, default true — if false, field becomes nullable), `enum` (array — if includes null, type auto-becomes nullable), `default` (any).
148
180
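As a rough mental model of the nullability rules above, here is a small standalone sketch (plain JavaScript, not the library's actual implementation; the helper name `descriptorToSchema` is made up for illustration):

```javascript
// Illustrative only: how a descriptor object could map to a JSON-schema-like
// field, following the rules described above.
function descriptorToSchema(desc, baseType = 'string') {
  // Plain strings are shorthand for a required, non-nullable description.
  if (typeof desc === 'string') {
    return { type: baseType, description: desc, required: true };
  }
  // required: false, or an enum containing null, makes the type nullable.
  const nullable =
    desc.required === false ||
    (Array.isArray(desc.enum) && desc.enum.includes(null));
  const schema = {
    type: nullable ? [baseType, 'null'] : baseType,
    description: desc.description,
    required: desc.required !== false
  };
  if (desc.enum) schema.enum = desc.enum;
  if ('default' in desc) schema.default = desc.default;
  return schema;
}

console.log(descriptorToSchema('Age of the actor', 'number'));
// plain string: stays required and non-nullable
console.log(descriptorToSchema({ description: 'Gender', enum: ['Male', 'Female', null] }));
// enum containing null: type becomes nullable
```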
 
149
181
  #### Array auto-wrap
150
182
 
@@ -166,7 +198,19 @@ await ModelMix.new()
166
198
  });
167
199
  ```
168
200
 
169
- ### Get raw response (tokens, thinking, tool calls)
201
+ ### Extract a code block
202
+
203
+ ```javascript
204
+ const code = await ModelMix.new()
205
+ .gpt5mini()
206
+ .addText("Write a hello world function in JavaScript.")
207
+ .block();
208
+ // Returns only the content inside the first code block
209
+ ```
210
+
211
+ `block()` accepts `{ addSystemExtra }` (default true) — adds system instructions that tell the model to wrap output in a code block.
212
+
213
+ ### Get raw response
170
214
 
171
215
  ```javascript
172
216
  const raw = await ModelMix.new()
@@ -176,15 +220,15 @@ const raw = await ModelMix.new()
176
220
  // raw.message, raw.think, raw.tokens, raw.toolCalls, raw.response
177
221
  ```
178
222
 
179
- ### Access full response after `message()` or `json()` with `lastRaw`
223
+ ### Access full response with lastRaw
180
224
 
181
- After calling `message()`, `json()`, `block()`, or `stream()`, use `lastRaw` to access the complete response (tokens, thinking, tool calls, etc.). It has the same structure as `raw()`.
225
+ After calling `message()`, `json()`, `block()`, or `stream()`, use `lastRaw` to access the complete response:
182
226
 
183
227
  ```javascript
184
228
  const model = ModelMix.new().gpt5mini().addText("Hello!");
185
229
  const text = await model.message();
186
230
  console.log(model.lastRaw.tokens);
187
- // { input: 122, output: 86, total: 541, cost: 0.000319 }
231
+ // { input: 122, output: 86, total: 541, cost: 0.000319, speed: 38 }
188
232
  console.log(model.lastRaw.think); // reasoning content (if available)
189
233
  console.log(model.lastRaw.response); // raw API response
190
234
  ```
@@ -193,13 +237,16 @@ console.log(model.lastRaw.response); // raw API response
193
237
 
194
238
  ```javascript
195
239
  const model = ModelMix.new().sonnet45();
196
- model.addImage('./photo.jpg'); // from file
197
- model.addImageFromUrl('https://example.com/img.png'); // from URL
240
+ model.addImage('./photo.jpg'); // from file
241
+ model.addImageFromUrl('https://example.com/img.png'); // from URL
242
+ model.addImageFromBuffer(imageBuffer); // from Buffer
198
243
  model.addText('Describe this image.');
199
244
  const description = await model.message();
200
245
  ```
201
246
 
202
- ### Use templates with placeholders
247
+ All image methods accept an optional second argument `{ role }` (default `"user"`).
248
+
249
+ ### Templates with placeholders
203
250
 
204
251
  ```javascript
205
252
  const model = ModelMix.new().gpt5mini();
@@ -221,12 +268,11 @@ const pool = ModelMix.new({ config: { roundRobin: true } })
221
268
  .sonnet45()
222
269
  .gemini3flash();
223
270
 
224
- // Each call rotates to the next model
225
271
  const r1 = await pool.new().addText("Request 1").message();
226
272
  const r2 = await pool.new().addText("Request 2").message();
227
273
  ```
228
274
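The difference between fallback and round-robin mode can be sketched in isolation (illustrative plain JavaScript, not the library's internals; `makePicker` is a made-up helper):

```javascript
// Illustrative only: fallback vs round-robin selection over an ordered
// model list, as described above.
function makePicker(models, roundRobin) {
  let next = 0;
  return function pickModel(failedCount = 0) {
    if (roundRobin) {
      const model = models[next % models.length]; // rotate on every call
      next += 1;
      return model;
    }
    return models[failedCount]; // fallback: advance only after a failure
  };
}

const models = ['sonnet45', 'gemini3flash'];

const fallback = makePicker(models, false);
console.log(fallback(0)); // 'sonnet45' (primary, used until it fails)
console.log(fallback(1)); // 'gemini3flash' (tried after one failure)

const rotate = makePicker(models, true);
console.log(rotate()); // 'sonnet45'
console.log(rotate()); // 'gemini3flash'
console.log(rotate()); // 'sonnet45' again
```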
 
229
- ### MCP integration (external tools)
275
+ ### MCP integration
230
276
 
231
277
  ```javascript
232
278
  const model = ModelMix.new({ config: { max_history: 10 } }).gpt5nano();
@@ -238,7 +284,7 @@ console.log(await model.message());
238
284
 
239
285
  Requires `BRAVE_API_KEY` in `.env` for Brave Search MCP.
240
286
 
241
- ### Custom local tools (addTool)
287
+ ### Custom local tools
242
288
 
243
289
  ```javascript
244
290
  const model = ModelMix.new({ config: { max_history: 10 } }).gpt5mini();
@@ -259,7 +305,18 @@ model.addText("What's the weather in Tokyo?");
259
305
  console.log(await model.message());
260
306
  ```
261
307
 
262
- ### Rate limiting (Bottleneck)
308
+ Register multiple tools at once:
309
+
310
+ ```javascript
311
+ model.addTools([
312
+ { tool: { name: "tool_a", description: "...", inputSchema: {...} }, callback: async (args) => {...} },
313
+ { tool: { name: "tool_b", description: "...", inputSchema: {...} }, callback: async (args) => {...} }
314
+ ]);
315
+ ```
316
+
317
+ Manage tools: `model.removeTool("tool_a")` and `model.listTools()` → `{ local, mcp }`.
318
+
319
+ ### Rate limiting
263
320
 
264
321
  ```javascript
265
322
  const model = ModelMix.new({
@@ -272,20 +329,31 @@ const model = ModelMix.new({
272
329
  }).gpt5mini();
273
330
  ```
274
331
 
332
+ ### Conversation history
333
+
334
+ ```javascript
335
+ const chat = ModelMix.new({ config: { max_history: 10 } }).gpt5mini();
336
+ chat.addText("My name is Martin.");
337
+ await chat.message();
338
+ chat.addText("What's my name?");
339
+ const reply = await chat.message(); // "Martin"
340
+ ```
341
+
342
+ `max_history`: 0 = no history (default), N = keep last N exchanges, -1 = unlimited.
343
+
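As a rough mental model of those `max_history` values (illustrative plain JavaScript, not the library's internals; `trimHistory` is a made-up name):

```javascript
// Illustrative only: how the max_history settings described above behave.
// history is an array of past exchanges, oldest first.
function trimHistory(history, maxHistory) {
  if (maxHistory === 0) return [];        // 0 = keep no history (default)
  if (maxHistory === -1) return history;  // -1 = unlimited
  return history.slice(-maxHistory);      // N = keep the last N exchanges
}

const history = ['ex1', 'ex2', 'ex3', 'ex4'];
console.log(trimHistory(history, 2));  // ['ex3', 'ex4']
console.log(trimHistory(history, 0));  // []
console.log(trimHistory(history, -1)); // all four exchanges
```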
275
344
  ### Debug mode
276
345
 
277
346
  ```javascript
278
347
  const model = ModelMix.new({
279
- config: { debug: 2 } // 0=silent, 1=minimal, 2=summary, 3=full (no truncate), 4=verbose
348
+ config: { debug: 2 } // 0=silent, 1=minimal, 2=summary, 3=full, 4=verbose
280
349
  }).gpt5mini();
281
350
  ```
282
351
 
283
- For full debug output, also set the env: `DEBUG=ModelMix* node script.js`
352
+ For full debug output, also set: `DEBUG=ModelMix* node script.js`
284
353
 
285
- ### Use free-tier models
354
+ ### Free-tier models
286
355
 
287
356
  ```javascript
288
- // These use providers with free quotas (OpenRouter, Groq, Cerebras)
289
357
  const model = ModelMix.new()
290
358
  .gptOss()
291
359
  .kimiK2()
@@ -295,48 +363,61 @@ const model = ModelMix.new()
295
363
  console.log(await model.message());
296
364
  ```
297
365
 
298
- ### Conversation history
366
+ These use providers with free quotas (OpenRouter, Groq, Cerebras). If one runs out of quota, ModelMix falls back to the next.
367
+
368
+ ### Multi-provider routing
369
+
370
+ Some model shorthands register the same model across multiple providers for maximum resilience. Control which providers are enabled via the `mix` parameter:
299
371
 
300
372
  ```javascript
301
- const chat = ModelMix.new({ config: { max_history: 10 } }).gpt5mini();
302
- chat.addText("My name is Martin.");
303
- await chat.message();
304
- chat.addText("What's my name?");
305
- const reply = await chat.message(); // "Martin"
373
+ const model = ModelMix.new({
374
+ mix: {
375
+ openrouter: true, // default: true
376
+ cerebras: true, // default: true
377
+ groq: true, // default: true
378
+ together: false, // default: false
379
+ lambda: false, // default: false
380
+ minimax: false, // default: false
381
+ fireworks: false // default: false
382
+ }
383
+ }).deepseekR1();
306
384
  ```
307
385
 
308
386
  ## Agent Usage Rules
309
387
 
310
- - Always check `package.json` for `modelmix` before running `npm install`.
311
- - Use `ModelMix.new()` static factory to create instances (not `new ModelMix()`).
388
+ - Check `package.json` for `modelmix` before running `npm install`.
389
+ - Use `ModelMix.new()` static factory (not `new ModelMix()`).
312
390
  - Store API keys in `.env` and load with `dotenv/config` or `process.loadEnvFile()`. Never hardcode keys.
313
391
  - Chain models for resilience: primary model first, fallbacks after.
314
- - When using MCP tools or `addTool()`, set `max_history` to at least 3.
315
- - Use `.json()` for structured output instead of parsing text manually. Use descriptor objects `{ description, required, enum, default }` in descriptions for richer schema control.
392
+ - When using MCP tools or `addTool()`, set `max_history` to at least 3 — tool call/response pairs consume history slots.
393
+ - Use `.json()` for structured output instead of parsing text manually. Use descriptor objects `{ description, required, enum, default }` for richer schema control.
316
394
  - Use `.message()` for simple text, `.raw()` when you need tokens/thinking/toolCalls.
317
395
  - For thinking models, append `think` to the method name (e.g. `sonnet45think()`).
318
396
  - Template placeholders use `{key}` syntax in both system prompts and user messages.
319
- - The library uses CommonJS internally (`require`) but supports ESM import via `{ ModelMix }`.
320
- - Available provider Mix classes for custom setups: `MixOpenAI`, `MixAnthropic`, `MixGoogle`, `MixPerplexity`, `MixGroq`, `MixTogether`, `MixGrok`, `MixOpenRouter`, `MixOllama`, `MixLMStudio`, `MixCustom`, `MixCerebras`, `MixFireworks`, `MixMiniMax`.
397
+ - The library uses CommonJS internally but supports ESM import via `{ ModelMix }`.
398
+ - GPT-5+ models automatically use `max_completion_tokens` instead of `max_tokens`.
399
+ - o-series models (o3, o4mini) automatically strip `max_tokens` and `temperature` since those APIs don't support them.
400
+ - `addText()`, `addImage()`, `addImageFromUrl()`, and `addImageFromBuffer()` all accept `{ role }` as second argument (default `"user"`).
321
401
 
322
402
  ## API Quick Reference
323
403
 
324
404
  | Method | Returns | Description |
325
405
  | --- | --- | --- |
326
- | `.addText(text)` | `this` | Add user message |
327
- | `.addTextFromFile(path)` | `this` | Add user message from file |
406
+ | `.addText(text, {role?})` | `this` | Add user message |
407
+ | `.addTextFromFile(path, {role?})` | `this` | Add user message from file |
328
408
  | `.setSystem(text)` | `this` | Set system prompt |
329
409
  | `.setSystemFromFile(path)` | `this` | Set system prompt from file |
330
- | `.addImage(path)` | `this` | Add image from file |
331
- | `.addImageFromUrl(url)` | `this` | Add image from URL or data URI |
410
+ | `.addImage(path, {role?})` | `this` | Add image from file |
411
+ | `.addImageFromUrl(url, {role?})` | `this` | Add image from URL or data URI |
412
+ | `.addImageFromBuffer(buffer, {role?})` | `this` | Add image from Buffer |
332
413
  | `.replace({})` | `this` | Set placeholder replacements |
333
414
  | `.replaceKeyFromFile(key, path)` | `this` | Replace placeholder with file content |
334
415
  | `.message()` | `Promise<string>` | Get text response |
335
- | `.json(example, desc?, opts?)` | `Promise<object\|array>` | Get structured JSON. Descriptions support descriptor objects `{ description, required, enum, default }`. Top-level arrays auto-wrapped |
416
+ | `.json(example, desc?, opts?)` | `Promise<object\|array>` | Get structured JSON |
336
417
  | `.raw()` | `Promise<{message, think, toolCalls, tokens, response}>` | Full response |
337
- | `.lastRaw` | `object \| null` | Full response from last `message()`/`json()`/`block()`/`stream()` call |
418
+ | `.lastRaw` | `object \| null` | Full response from last call |
338
419
  | `.stream(callback)` | `Promise` | Stream response |
339
- | `.block()` | `Promise<string>` | Extract code block from response |
420
+ | `.block({addSystemExtra?})` | `Promise<string>` | Extract code block from response |
340
421
  | `.addMCP(package)` | `Promise` | Add MCP server tools |
341
422
  | `.addTool(def, callback)` | `this` | Register custom local tool |
342
423
  | `.addTools([{tool, callback}])` | `this` | Register multiple tools |
@@ -345,6 +426,30 @@ const reply = await chat.message(); // "Martin"
345
426
  | `.new()` | `ModelMix` | Clone instance sharing models |
346
427
  | `.attach(key, provider)` | `this` | Attach custom provider |
347
428
 
429
+ ## Available Provider Classes
430
+
431
+ `MixOpenAI` `MixAnthropic` `MixGoogle` `MixPerplexity` `MixGroq` `MixTogether` `MixGrok` `MixOpenRouter` `MixOllama` `MixLMStudio` `MixCustom` `MixCerebras` `MixFireworks` `MixMiniMax` `MixLambda`
432
+
433
+ ## Troubleshooting
434
+
435
+ **Model fails with "API key not found"**
436
+ The provider's API key env var is not set. Add it to `.env` and ensure it loads before ModelMix runs. Each provider looks for its standard env var (e.g. `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GEMINI_API_KEY`).
437
+
438
+ **Tool calls not working**
439
+ Set `max_history` to at least 3. Tool call/response pairs are stored in history and the model needs to see them to complete the conversation loop.
440
+
441
+ **JSON response parsing fails**
442
+ Add `{ addNote: true }` to the `json()` options — this injects instructions about JSON escaping that prevent common parsing errors. For complex schemas, also try `{ addExample: true }`.
443
+
444
+ **Model returns empty or truncated response**
445
+ Increase `max_tokens` in options. Default is 8192 but some tasks need more. For GPT-5+ models, `max_completion_tokens` is used automatically.
446
+
447
+ **Rate limit errors**
448
+ Configure Bottleneck: `config: { bottleneck: { maxConcurrent: 2, minTime: 2000 } }`. This throttles requests to stay within provider limits.
449
+
450
+ **MCP server fails to connect**
451
+ Ensure the MCP package is installed (`npm install @modelcontextprotocol/server-brave-search`) and required env vars are set. Call `addMCP()` with `await` — it's async.
452
+
348
453
  ## References
349
454
 
350
455
  - [GitHub Repository](https://github.com/clasen/ModelMix)