npm - llmjs2 - Versions diffs - 1.1.1 → 1.3.1 - Mend

llmjs2 1.1.1 → 1.3.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (21)

package/CONFIG_README.md +98 -0
package/README.md +382 -357
package/cli.js +195 -0
package/config.yaml +149 -0
package/docs/BASIC_USAGE.md +296 -0
package/docs/CLI.md +455 -0
package/docs/GET_STARTED.md +129 -0
package/docs/GUARDRAILS_GUIDE.md +734 -0
package/docs/README.md +47 -0
package/docs/ROUTER_GUIDE.md +397 -0
package/docs/SERVER_MODE.md +350 -0
package/index.js +199 -246
package/package.json +45 -34
package/providers/ollama.js +120 -88
package/providers/openai.js +104 -0
package/providers/openrouter.js +113 -79
package/router.js +248 -0
package/server.js +186 -0
package/test.js +246 -0
package/validate-config.js +87 -0
package/LICENSE +0 -21

package/docs/SERVER_MODE.md ADDED

@@ -0,0 +1,350 @@
+# Server Mode Guide
+Run llmjs2 as an OpenAI-compatible API server with routing and load balancing, so that existing OpenAI clients and applications can integrate with it.
+## Quick Start Server
+### Method 1: Simple JavaScript Server
+Create a server file:
+```javascript
+// server.js
+// Uses ES module syntax: set "type": "module" in package.json or name the file server.mjs
+import { app } from 'llmjs2';
+// Start the server
+app.listen(3000, () => {
+  console.log('🚀 llmjs2 server running on http://localhost:3000');
+});
+```
+Run it:
+```bash
+node server.js
+```
+## API Endpoints
+### Chat Completions
+**Endpoint:** `POST /v1/chat/completions`
+**Content-Type:** `application/json`
+**Request Format** (`tools` is optional and may be omitted):
+```json
+{
+  "model": "ollama/minimax-m2.5:cloud",
+  "messages": [
+    {
+      "role": "user",
+      "content": "Hello! How are you?"
+    }
+  ],
+  "tools": []
+}
+```
+**Response Format:**
+The server returns a response with metadata and the complete message array:
+```json
+{
+  "id": "chatcmpl-123456",
+  "object": "chat.completion",
+  "created": 1640995200,
+  "model": "ollama/minimax-m2.5:cloud",
+  "messages": [
+    {
+      "role": "user",
+      "content": "Hello! How are you?"
+    },
+    {
+      "role": "assistant",
+      "content": "Hello! I'm doing well, thank you for asking!"
+    }
+  ]
+}
+```
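The response shape above can be read with a small helper. This is a minimal sketch: the function name `lastAssistantMessage` and the `null` fallback are assumptions made for illustration, and only the `messages` array layout comes from the example response.

```javascript
// Hypothetical helper (not part of llmjs2): returns the most recent
// assistant message from a response shaped like the example above.
function lastAssistantMessage(data) {
  const messages = Array.isArray(data && data.messages) ? data.messages : [];
  // Walk backwards so the latest assistant turn wins.
  for (let i = messages.length - 1; i >= 0; i -= 1) {
    const message = messages[i];
    if (message && message.role === 'assistant') {
      return message;
    }
  }
  return null; // no assistant reply in the payload
}
```

Searching by `role` instead of taking the last array element guards against payloads where the final entry is not an assistant turn.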
+## Using with OpenAI Clients
+### Direct HTTP Requests
+Because the server returns a `messages` array rather than the OpenAI `choices` array, SDKs that expect `choices` may not parse the reply. Use direct HTTP requests instead:
+```python
+import requests
+response = requests.post(
+    "http://localhost:3000/v1/chat/completions",
+    json={
+        "messages": [{"role": "user", "content": "Hello!"}]
+    }
+)
+data = response.json()
+messages = data["messages"]
+assistant_message = messages[-1]  # Last message is the assistant's response
+print(f"Model used: {data['model']}")
+print(f"Assistant: {assistant_message['content']}")
+```
+### Node.js with fetch
+```javascript
+const response = await fetch('http://localhost:3000/v1/chat/completions', {
+  method: 'POST',
+  headers: {
+    'Content-Type': 'application/json'
+  },
+  body: JSON.stringify({
+    messages: [{ role: 'user', content: 'Hello!' }]
+  })
+});
+const data = await response.json();
+const messages = data.messages;
+const assistantMessage = messages[messages.length - 1]; // Last message is assistant's response
+console.log(`Model used: ${data.model}`);
+console.log(`Assistant: ${assistantMessage.content}`);
+```
+### cURL
+```bash
+curl -X POST http://localhost:3000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer your-api-key" \
+  -d '{
+    "model": "ollama/minimax-m2.5:cloud",
+    "messages": [
+      {
+        "role": "user",
+        "content": "Hello! How are you?"
+      }
+    ]
+  }'
+```
+## Router Integration
+Add intelligent routing and load balancing to your server:
+### Basic Router Setup
+```javascript
+import { router, app } from 'llmjs2';
+const costOptimizedModels = [
+  {
+    "model_name": "text-davinci-001",
+    "llm_params": {
+      "model": "ollama/text-davinci-003",
+      "api_key": process.env.OLLAMA_API_KEY
+    }
+  },
+  {
+    "model_name": "text-davinci-002",
+    "llm_params": {
+      "model": "openrouter/text-davinci-003",
+      "api_key": process.env.OPENROUTER_API_KEY
+    }
+  },
+  {
+    "model_name": "text-davinci-003",
+    "llm_params": {
+      "model": "openai/gpt-3.5-turbo",
+      "api_key": process.env.OPENAI_API_KEY
+    }
+  }
+];
+// Create router with random strategy for load balancing
+const route = router(costOptimizedModels, 'random');
+// Apply router to server
+app.use(route);
+// Start the server
+app.listen(3000, () => {
+  console.log('🚀 llmjs2 server with routing running on http://localhost:3000');
+});
+```
+### Router Strategies
+- **`'random'`**: Randomly selects from available models
+- **`'sequential'`**: Cycles through models in order
+- **`'default'`** or omitted: Load balances across models with the same `model_name`
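To make the three strategies concrete, here is an illustrative sketch of how such selection could work. It is not llmjs2's router implementation: `createSelector`, its injectable `random` argument, and the choice of a random pick within a `model_name` group for the default strategy are all assumptions.

```javascript
// Illustrative only: a stand-alone selector, not llmjs2's actual router.
function createSelector(models, strategy = 'default', random = Math.random) {
  if (!Array.isArray(models) || models.length === 0) {
    throw new Error('models must be a non-empty array');
  }
  let cursor = 0; // position for the sequential strategy

  return function select(modelName) {
    if (strategy === 'random') {
      return models[Math.floor(random() * models.length)];
    }
    if (strategy === 'sequential') {
      const chosen = models[cursor];
      cursor = (cursor + 1) % models.length; // wrap around at the end
      return chosen;
    }
    // 'default' (assumed behaviour): pick among entries that share the
    // requested model_name, or return null when none match.
    const group = models.filter((m) => m.model_name === modelName);
    if (group.length === 0) {
      return null;
    }
    return group[Math.floor(random() * group.length)];
  };
}
```

Passing `random` as a parameter keeps the sketch deterministic under test while defaulting to `Math.random` in normal use.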
+### API Usage with Routing
+```bash
+# Automatic routing (uses router strategy)
+curl -X POST http://localhost:3000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "messages": [{"role": "user", "content": "Hello!"}]
+  }'
+# Direct model routing (bypasses router)
+curl -X POST http://localhost:3000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "openai/gpt-3.5-turbo",
+    "messages": [{"role": "user", "content": "Hello!"}]
+  }'
+```
+### Advanced Routing Examples
+**Multi-Provider Fallback:**
+```javascript
+const fallbackModels = [
+  { "model_name": "gpt-4", "llm_params": { "model": "openai/gpt-4", "api_key": process.env.OPENAI_API_KEY } },
+  { "model_name": "gpt-4", "llm_params": { "model": "ollama/gpt-4", "api_key": process.env.OLLAMA_API_KEY } },
+  { "model_name": "gpt-4", "llm_params": { "model": "openrouter/gpt-4", "api_key": process.env.OPENROUTER_API_KEY } }
+];
+const route = router(fallbackModels, 'random');
+app.use(route);
+```
+**Cost Optimization:**
+```javascript
+const costModels = [
+  { "model_name": "completion", "llm_params": { "model": "ollama/llama2", "api_key": process.env.OLLAMA_API_KEY } },
+  { "model_name": "completion", "llm_params": { "model": "openrouter/free", "api_key": process.env.OPENROUTER_API_KEY } },
+  { "model_name": "completion", "llm_params": { "model": "openai/gpt-3.5-turbo", "api_key": process.env.OPENAI_API_KEY } }
+];
+const route = router(costModels, 'sequential'); // Cycles through the list in order; the cheapest model is listed first
+app.use(route);
+```
+## Function Calling (Tools) Support
+The server supports OpenAI-compatible function calling:
+```bash
+curl -X POST http://localhost:3000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer your-api-key" \
+  -d '{
+    "model": "openrouter/openrouter/free",
+    "messages": [
+      {
+        "role": "user",
+        "content": "What is the weather like in Paris?"
+      }
+    ],
+    "tools": [
+      {
+        "type": "function",
+        "function": {
+          "name": "get_weather",
+          "description": "Get the current weather in a given location",
+          "parameters": {
+            "type": "object",
+            "properties": {
+              "location": {
+                "type": "string",
+                "description": "The city and state, e.g. San Francisco, CA"
+              }
+            },
+            "required": ["location"]
+          }
+        }
+      }
+    ]
+  }'
+```
+## Error Handling
+The server returns standard HTTP status codes with a JSON error body:
+```json
+{
+  "error": {
+    "message": "model is required",
+    "type": "invalid_request_error"
+  }
+}
+```
+Common status codes:
+- `400` - Bad Request (missing parameters)
+- `404` - Not Found (invalid endpoint)
+- `500` - Internal Server Error (API failures)
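The `400` case can be sketched as a validation step that produces the error body shown above. This is a hedged illustration, not the package's code: `validateChatRequest`, the `requireModel` option, and every message string other than `model is required` are assumptions. The `requireModel` flag exists because the routing examples in this guide send requests without a `model`.

```javascript
// Hypothetical validator: returns null when the body is acceptable,
// otherwise a { status, body } pair in the documented error format.
function validateChatRequest(body, { requireModel = true } = {}) {
  const fail = (message) => ({
    status: 400,
    body: { error: { message, type: 'invalid_request_error' } },
  });

  if (!body || typeof body !== 'object') {
    return fail('request body must be a JSON object');
  }
  if (requireModel && !body.model) {
    return fail('model is required');
  }
  if (!Array.isArray(body.messages) || body.messages.length === 0) {
    return fail('messages is required');
  }
  // Each message needs both a role and a content property.
  const malformed = body.messages.some(
    (m) => !m || typeof m !== 'object' || !('role' in m) || !('content' in m)
  );
  if (malformed) {
    return fail('each message needs role and content');
  }
  return null;
}
```

Returning a plain object keeps the check independent of any HTTP framework; the caller decides how to write the status and body to the response.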
+## Environment Variables for Production
+```bash
+# Server configuration
+PORT=3000
+HOST=0.0.0.0
+# API Keys
+OLLAMA_API_KEY=your_production_key
+OPEN_ROUTER_API_KEY=your_production_key
+# Default models
+OLLAMA_DEFAULT_MODEL=minimax-m2.5:cloud
+OPEN_ROUTER_DEFAULT_MODEL=openrouter/free
+```
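A startup script might read these variables as follows. This is a sketch under stated assumptions: `loadServerConfig` is a hypothetical name, and the fallback values for `PORT` and `HOST` simply reuse the values from the example above, since the package's own defaults are not stated here.

```javascript
// Hypothetical config loader: reads the variables listed above from an
// environment object (process.env by default) and validates the port.
function loadServerConfig(env = process.env) {
  const port = Number.parseInt(env.PORT ?? '3000', 10); // assumed default
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }
  return {
    port,
    host: env.HOST ?? '0.0.0.0', // assumed default
    ollamaApiKey: env.OLLAMA_API_KEY,
    openRouterApiKey: env.OPEN_ROUTER_API_KEY,
    ollamaDefaultModel: env.OLLAMA_DEFAULT_MODEL,
    openRouterDefaultModel: env.OPEN_ROUTER_DEFAULT_MODEL,
  };
}
```

Failing fast on a malformed `PORT` surfaces configuration mistakes at startup instead of at the first request.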
+## Monitoring and Logging
+The server logs each request as it is handled, for example:
+```text
+[2024-01-15T10:30:45.123Z] POST /v1/chat/completions
+Headers: {"content-type":"application/json",...}
+Body parsing completed successfully
+Starting completion with model: ollama/minimax-m2.5:cloud
+```
+## Troubleshooting
+### API Request Issues
+**400 Bad Request:**
+- Check that `model` and `messages` are provided
+- Ensure messages have `role` and `content` properties
+**500 Internal Server Error:**
+- Check API keys are valid
+- Verify internet connection
+- Check provider API status
+### CORS Issues
+If you're getting CORS errors in the browser:
+```javascript
+// The server includes CORS headers by default
+// If you need custom CORS, modify the server code
+res.writeHead(statusCode, {
+  'Content-Type': 'application/json',
+  'Access-Control-Allow-Origin': '*', // Change this for production
+  // ... other headers
+});
+```
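For production, the wildcard origin can be replaced with an allow-list. The following is a minimal sketch, assuming you control the code that builds the response headers; `corsHeaders` and its parameters are hypothetical names, and the allowed methods and headers are chosen to match the requests shown in this guide.

```javascript
// Hypothetical helper: builds response headers that echo the request
// origin only when it appears in the allow-list.
function corsHeaders(requestOrigin, allowedOrigins) {
  const headers = {
    'Content-Type': 'application/json',
    Vary: 'Origin', // caches must key on Origin when the value varies
  };
  if (requestOrigin && allowedOrigins.includes(requestOrigin)) {
    headers['Access-Control-Allow-Origin'] = requestOrigin;
    headers['Access-Control-Allow-Methods'] = 'POST, OPTIONS';
    headers['Access-Control-Allow-Headers'] = 'Content-Type, Authorization';
  }
  return headers;
}
```

The result can be passed to `res.writeHead(statusCode, corsHeaders(req.headers.origin, allowed))`. Origins outside the list receive no CORS headers, so the browser blocks the cross-origin read.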
+## Next Steps
+- **[CLI Guide](CLI.md)** - Use the command-line interface
+- **[Basic Usage](BASIC_USAGE.md)** - Learn different API patterns
+- **[Technical Specification](TECHNICAL_SPECIFICATION.md)** - Detailed technical information
+Server mode lets llmjs2 serve clients and applications that send OpenAI-style chat completion requests; read the reply from the `messages` array of the response.