llmjs2 1.1.1 → 1.3.1

@@ -0,0 +1,350 @@
1
+ # Server Mode Guide
2
+
3
+ Run llmjs2 as an OpenAI-compatible API server, with optional routing and load balancing across models, so that existing OpenAI-style clients and applications can call it.
4
+
5
+ ## Quick Start Server
6
+
7
+ ### Method 1: Simple JavaScript Server
8
+
9
+ Create a server file:
10
+
11
+ ```javascript
12
+ // server.js
13
+ import { app } from 'llmjs2';
14
+
15
+ // Start the server
16
+ app.listen(3000, () => {
17
+ console.log('🚀 llmjs2 server running on http://localhost:3000');
18
+ });
19
+ ```
20
+
21
+ Run it:
22
+
23
+ ```bash
24
+ node server.js
25
+ ```
26
+
29
+ ## API Endpoints
30
+
31
+ ### Chat Completions
32
+
33
+ **Endpoint:** `POST /v1/chat/completions`
34
+
35
+ **Content-Type:** `application/json`
36
+
37
+ **Request Format:** `tools` is optional and may be omitted.
38
+
39
+ ```json
40
+ {
41
+ "model": "ollama/minimax-m2.5:cloud",
42
+ "messages": [
43
+ {
44
+ "role": "user",
45
+ "content": "Hello! How are you?"
46
+ }
47
+ ],
48
+ "tools": []
49
+ }
50
+ ```
51
+
52
+ **Response Format:**
53
+
54
+ The response contains metadata (`id`, `object`, `created`, `model`) and the complete `messages` array, with the assistant's reply as the last entry:
55
+
56
+ ```json
57
+ {
58
+ "id": "chatcmpl-123456",
59
+ "object": "chat.completion",
60
+ "created": 1640995200,
61
+ "model": "ollama/minimax-m2.5:cloud",
62
+ "messages": [
63
+ {
64
+ "role": "user",
65
+ "content": "Hello! How are you?"
66
+ },
67
+ {
68
+ "role": "assistant",
69
+ "content": "Hello! I'm doing well, thank you for asking!"
70
+ }
71
+ ]
72
+ }
73
+ ```
74
+
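Because the reply arrives as the last entry of `messages`, client code usually needs a small extraction step. The helper below is a minimal sketch of that step, assuming the response shape shown above; `extractAssistantReply` is a hypothetical name and not part of llmjs2.

```javascript
// Illustrative helper (not an llmjs2 API): return the assistant's text from a
// response shaped like the example above.
function extractAssistantReply(data) {
  if (!data || !Array.isArray(data.messages) || data.messages.length === 0) {
    throw new Error('Response has no messages array');
  }
  const last = data.messages[data.messages.length - 1];
  if (last.role !== 'assistant') {
    throw new Error(`Expected an assistant message last, got role "${last.role}"`);
  }
  return last.content;
}

// Sample response copied from the format documented above.
const sampleResponse = {
  id: 'chatcmpl-123456',
  object: 'chat.completion',
  created: 1640995200,
  model: 'ollama/minimax-m2.5:cloud',
  messages: [
    { role: 'user', content: 'Hello! How are you?' },
    { role: 'assistant', content: "Hello! I'm doing well, thank you for asking!" },
  ],
};

console.log(extractAssistantReply(sampleResponse));
```

Checking the role before returning guards against treating an echoed user message as the reply when the server did not append one.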
75
+ ## Using with OpenAI Clients
76
+
77
+ ### Direct HTTP Requests
78
+
79
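+ The server returns the conversation in a top-level `messages` array rather than the `choices` array used by OpenAI responses, so client libraries that expect `choices` may not parse it. Direct HTTP requests are the safer option: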
80
+
81
+ ```python
82
+ import requests
83
+
84
+ response = requests.post(
85
+ "http://localhost:3000/v1/chat/completions",
86
+ json={
87
+ "messages": [{"role": "user", "content": "Hello!"}]
88
+ }
89
+ )
90
+
91
+ data = response.json()
92
+ messages = data["messages"]
93
+ assistant_message = messages[-1] # Last message is the assistant's response
94
+ print(f"Model used: {data['model']}")
95
+ print(f"Assistant: {assistant_message['content']}")
96
+ ```
97
+
98
+ ### Node.js with fetch
99
+
100
+ ```javascript
101
+ const response = await fetch('http://localhost:3000/v1/chat/completions', {
102
+ method: 'POST',
103
+ headers: {
104
+ 'Content-Type': 'application/json'
105
+ },
106
+ body: JSON.stringify({
107
+ messages: [{ role: 'user', content: 'Hello!' }]
108
+ })
109
+ });
110
+
111
+ const data = await response.json();
112
+ const messages = data.messages;
113
+ const assistantMessage = messages[messages.length - 1]; // Last message is the assistant's response
114
+ console.log(`Model used: ${data.model}`);
115
+ console.log(`Assistant: ${assistantMessage.content}`);
116
+ ```
117
+
118
+ ### cURL
119
+
120
+ ```bash
121
+ curl -X POST http://localhost:3000/v1/chat/completions \
122
+ -H "Content-Type: application/json" \
123
+ -H "Authorization: Bearer your-api-key" \
124
+ -d '{
125
+ "model": "ollama/minimax-m2.5:cloud",
126
+ "messages": [
127
+ {
128
+ "role": "user",
129
+ "content": "Hello! How are you?"
130
+ }
131
+ ]
132
+ }'
133
+ ```
134
+
135
+ ## Router Integration
136
+
137
+ Add intelligent routing and load balancing to your server:
138
+
139
+ ### Basic Router Setup
140
+
141
+ ```javascript
142
+ import { router, app } from 'llmjs2';
143
+
144
+ const costOptimizedModels = [
145
+ {
146
+ "model_name": "text-davinci-001",
147
+ "llm_params": {
148
+ "model": "ollama/text-davinci-003",
149
+ "api_key": process.env.OLLAMA_API_KEY
150
+ }
151
+ },
152
+ {
153
+ "model_name": "text-davinci-002",
154
+ "llm_params": {
155
+ "model": "openrouter/text-davinci-003",
156
+ "api_key": process.env.OPENROUTER_API_KEY
157
+ }
158
+ },
159
+ {
160
+ "model_name": "text-davinci-003",
161
+ "llm_params": {
162
+ "model": "openai/gpt-3.5-turbo",
163
+ "api_key": process.env.OPENAI_API_KEY
164
+ }
165
+ }
166
+ ];
167
+
168
+ // Create router with random strategy for load balancing
169
+ const route = router(costOptimizedModels, 'random');
170
+
171
+ // Apply router to server
172
+ app.use(route);
173
+
174
+ // Start the server
175
+ app.listen(3000, () => {
176
+ console.log('🚀 llmjs2 server with routing running on http://localhost:3000');
177
+ });
178
+ ```
179
+
180
+ ### Router Strategies
181
+
182
+ - **`'random'`**: Randomly selects from available models
183
+ - **`'sequential'`**: Cycles through models in order
184
+ - **`'default'`** or omitted: Load balances across models that share the same `model_name`
185
+
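To make the difference between the strategies concrete, here is a minimal sketch of one plausible reading of each description. It is not llmjs2's implementation: `createSelector`, `pickModel`, and the injectable `random` argument are hypothetical names used only for illustration.

```javascript
// Illustrative only: a stand-alone selector that mimics the three documented
// strategies. The real router may behave differently.
function createSelector(models, strategy = 'default', random = Math.random) {
  let cursor = 0; // next index for the 'sequential' strategy
  return function pickModel(requestedName) {
    if (models.length === 0) throw new Error('No models configured');
    if (strategy === 'random') {
      return models[Math.floor(random() * models.length)];
    }
    if (strategy === 'sequential') {
      const chosen = models[cursor];
      cursor = (cursor + 1) % models.length; // wrap around after the last entry
      return chosen;
    }
    // 'default': spread requests over entries sharing the requested model_name
    const candidates = models.filter((m) => m.model_name === requestedName);
    if (candidates.length === 0) {
      throw new Error(`Unknown model_name: ${requestedName}`);
    }
    return candidates[Math.floor(random() * candidates.length)];
  };
}

const demoModels = [
  { model_name: 'gpt-4', llm_params: { model: 'openai/gpt-4' } },
  { model_name: 'gpt-4', llm_params: { model: 'openrouter/gpt-4' } },
  { model_name: 'completion', llm_params: { model: 'ollama/llama2' } },
];

const nextModel = createSelector(demoModels, 'sequential');
console.log(nextModel().llm_params.model); // openai/gpt-4
console.log(nextModel().llm_params.model); // openrouter/gpt-4
```

Under this reading, `'sequential'` is a round-robin over the whole list, while `'default'` only balances among entries that share a `model_name`.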
186
+ ### API Usage with Routing
187
+
188
+ ```bash
189
+ # Automatic routing (uses router strategy)
190
+ curl -X POST http://localhost:3000/v1/chat/completions \
191
+ -H "Content-Type: application/json" \
192
+ -d '{
193
+ "messages": [{"role": "user", "content": "Hello!"}]
194
+ }'
195
+
196
+ # Direct model routing (bypasses router)
197
+ curl -X POST http://localhost:3000/v1/chat/completions \
198
+ -H "Content-Type: application/json" \
199
+ -d '{
200
+ "model": "openai/gpt-3.5-turbo",
201
+ "messages": [{"role": "user", "content": "Hello!"}]
202
+ }'
203
+ ```
204
+
205
+ ### Advanced Routing Examples
206
+
207
+ **Multi-Provider Fallback:**
208
+
209
+ ```javascript
210
+ const fallbackModels = [
211
+ { "model_name": "gpt-4", "llm_params": { "model": "openai/gpt-4", "api_key": process.env.OPENAI_API_KEY } },
212
+ { "model_name": "gpt-4", "llm_params": { "model": "ollama/gpt-4", "api_key": process.env.OLLAMA_API_KEY } },
213
+ { "model_name": "gpt-4", "llm_params": { "model": "openrouter/gpt-4", "api_key": process.env.OPENROUTER_API_KEY } }
214
+ ];
215
+
216
+ const route = router(fallbackModels, 'random');
217
+ app.use(route);
218
+ ```
219
+
220
+ **Cost Optimization:**
221
+
222
+ ```javascript
223
+ const costModels = [
224
+ { "model_name": "completion", "llm_params": { "model": "ollama/llama2", "api_key": process.env.OLLAMA_API_KEY } },
225
+ { "model_name": "completion", "llm_params": { "model": "openrouter/free", "api_key": process.env.OPENROUTER_API_KEY } },
226
+ { "model_name": "completion", "llm_params": { "model": "openai/gpt-3.5-turbo", "api_key": process.env.OPENAI_API_KEY } }
227
+ ];
228
+
229
+ const route = router(costModels, 'sequential'); // Cycles through the list in order, cheapest entry first
230
+ app.use(route);
231
+ ```
232
+
233
+ ## Function Calling (Tools) Support
234
+
235
+ The server supports OpenAI-compatible function calling:
236
+
237
+ ```bash
238
+ curl -X POST http://localhost:3000/v1/chat/completions \
239
+ -H "Content-Type: application/json" \
240
+ -H "Authorization: Bearer your-api-key" \
241
+ -d '{
242
+ "model": "openrouter/openrouter/free",
243
+ "messages": [
244
+ {
245
+ "role": "user",
246
+ "content": "What is the weather like in Paris?"
247
+ }
248
+ ],
249
+ "tools": [
250
+ {
251
+ "type": "function",
252
+ "function": {
253
+ "name": "get_weather",
254
+ "description": "Get the current weather in a given location",
255
+ "parameters": {
256
+ "type": "object",
257
+ "properties": {
258
+ "location": {
259
+ "type": "string",
260
+ "description": "The city and state, e.g. San Francisco, CA"
261
+ }
262
+ },
263
+ "required": ["location"]
264
+ }
265
+ }
266
+ }
267
+ ]
268
+ }'
269
+ ```
270
+
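Writing tool definitions by hand is error-prone because of the nesting. The sketch below builds one `tools` entry in the shape used by the request above; `defineTool` is a hypothetical helper, not an llmjs2 function, and the check on `required` is an added convenience rather than something the server is known to enforce.

```javascript
// Hypothetical helper: build one entry of the `tools` array in the
// OpenAI-style function-calling shape shown in the request above.
function defineTool(name, description, properties, required = []) {
  for (const key of required) {
    if (!(key in properties)) {
      throw new Error(`Required parameter "${key}" is not declared in properties`);
    }
  }
  return {
    type: 'function',
    function: {
      name,
      description,
      parameters: { type: 'object', properties, required },
    },
  };
}

const weatherTool = defineTool(
  'get_weather',
  'Get the current weather in a given location',
  {
    location: {
      type: 'string',
      description: 'The city and state, e.g. San Francisco, CA',
    },
  },
  ['location']
);

// Request body equivalent to the cURL example above.
const toolRequestBody = {
  model: 'openrouter/openrouter/free',
  messages: [{ role: 'user', content: 'What is the weather like in Paris?' }],
  tools: [weatherTool],
};

console.log(JSON.stringify(toolRequestBody, null, 2));
```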
271
+ ## Error Handling
272
+
273
+ Errors are returned with an HTTP status code and a JSON body containing an `error` object:
274
+
275
+ ```json
276
+ {
277
+ "error": {
278
+ "message": "model is required",
279
+ "type": "invalid_request_error"
280
+ }
281
+ }
282
+ ```
283
+
284
+ Common status codes:
285
+
286
+ - `400` - Bad Request (missing parameters)
287
+ - `404` - Not Found (invalid endpoint)
288
+ - `500` - Internal Server Error (API failures)
289
+
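On the client side, the status code and the `error` object can be combined into one readable message. The following is a minimal sketch that assumes only the three status codes and the error shape listed above; `describeError` and its hint texts are hypothetical.

```javascript
// Illustrative client-side helper (not an llmjs2 API): turn a status code and
// a parsed error body into a single human-readable line.
function describeError(status, body) {
  const hints = {
    400: 'Bad Request: check for missing parameters',
    404: 'Not Found: check the endpoint path',
    500: 'Internal Server Error: the upstream API call failed',
  };
  const hint = hints[status] || `Unexpected status ${status}`;
  const message =
    body && body.error && body.error.message
      ? body.error.message
      : 'no error message in response body';
  return `${hint} (${message})`;
}

console.log(
  describeError(400, {
    error: { message: 'model is required', type: 'invalid_request_error' },
  })
);
// Bad Request: check for missing parameters (model is required)
```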
290
+ ### Environment Variables for Production
291
+
292
+ ```bash
293
+ # Server configuration
294
+ PORT=3000
295
+ HOST=0.0.0.0
296
+
297
+ # API Keys
298
+ OLLAMA_API_KEY=your_production_key
299
+ OPEN_ROUTER_API_KEY=your_production_key
300
+
301
+ # Default models
302
+ OLLAMA_DEFAULT_MODEL=minimax-m2.5:cloud
303
+ OPEN_ROUTER_DEFAULT_MODEL=openrouter/free
304
+ ```
305
+
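The guide does not state whether llmjs2 reads `PORT` and `HOST` on its own, so a custom `server.js` may need to read them explicitly. The sketch below assumes it does; `loadServerConfig` is a hypothetical helper, and the fallback values simply mirror the example above.

```javascript
// Hypothetical helper: read PORT and HOST from an environment object,
// falling back to the example values when they are unset.
function loadServerConfig(env = process.env) {
  const port = Number.parseInt(env.PORT ?? '3000', 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }
  return { port, host: env.HOST ?? '0.0.0.0' };
}

const serverConfig = loadServerConfig({ PORT: '8080', HOST: '127.0.0.1' });
console.log(serverConfig); // { port: 8080, host: '127.0.0.1' }
```

The resulting `port` can then be passed to `app.listen` as in the Quick Start example.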
306
+ ## Monitoring and Logging
307
+
308
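+ The server logs each request's timestamp, method, path, and headers, followed by the body-parsing result and the model selected for the completion: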
309
+
310
+ ```
311
+ [2024-01-15T10:30:45.123Z] POST /v1/chat/completions
312
+ Headers: {"content-type":"application/json",...}
313
+ Body parsing completed successfully
314
+ Starting completion with model: ollama/minimax-m2.5:cloud
315
+ ```
316
+
317
+ ### API Request Issues
318
+
319
+ **400 Bad Request:**
320
+
321
+ - Check that `messages` is provided, and `model` as well unless a router is configured to choose one
322
+ - Ensure every message has `role` and `content` properties
323
+
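The checks above can be run on the client before a request is sent, which avoids a round trip for obviously malformed bodies. This is a minimal sketch, not the server's actual validation; `validateChatRequest` and its `requireModel` flag are hypothetical, with the flag set to `false` when a router picks the model, as in the automatic-routing example.

```javascript
// Illustrative pre-flight check (not llmjs2's own validation): returns a list
// of problems, or an empty list when the body looks acceptable.
function validateChatRequest(body, requireModel = true) {
  if (!body || typeof body !== 'object') {
    return ['body must be a JSON object'];
  }
  const problems = [];
  if (requireModel && (typeof body.model !== 'string' || body.model === '')) {
    problems.push('model is required');
  }
  if (!Array.isArray(body.messages) || body.messages.length === 0) {
    problems.push('messages must be a non-empty array');
    return problems;
  }
  body.messages.forEach((message, index) => {
    if (!message || typeof message.role !== 'string') {
      problems.push(`messages[${index}] is missing role`);
    }
    if (!message || message.content === undefined) {
      problems.push(`messages[${index}] is missing content`);
    }
  });
  return problems;
}

console.log(validateChatRequest({ messages: [{ role: 'user' }] }));
// [ 'model is required', 'messages[0] is missing content' ]
```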
324
+ **500 Internal Server Error:**
325
+
326
+ - Check that the API keys are valid
327
+ - Verify that the server can reach the provider over the network
328
+ - Check the provider's API status
329
+
330
+ ### CORS Issues
331
+
332
+ If you're getting CORS errors in the browser:
333
+
334
+ ```javascript
335
+ // The server includes CORS headers by default
336
+ // If you need custom CORS, modify the server code
337
+ res.writeHead(statusCode, {
338
+ 'Content-Type': 'application/json',
339
+ 'Access-Control-Allow-Origin': '*', // Change this for production
340
+ // ... other headers
341
+ });
342
+ ```
343
+
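For production, a common alternative to `'*'` is to echo the request's origin only when it appears on an allow-list. The sketch below shows that pattern as a pure function; `corsHeaders` and the origins used are hypothetical, and wiring it into the server code is left to the reader.

```javascript
// Illustrative helper (not part of llmjs2): build response headers that allow
// only origins from an explicit allow-list.
function corsHeaders(requestOrigin, allowedOrigins) {
  const headers = { 'Content-Type': 'application/json' };
  if (allowedOrigins.includes(requestOrigin)) {
    headers['Access-Control-Allow-Origin'] = requestOrigin;
    headers['Vary'] = 'Origin'; // caches must key on the Origin header
  }
  return headers;
}

const allowedOrigins = ['https://app.example.com'];
console.log(corsHeaders('https://app.example.com', allowedOrigins));
console.log(corsHeaders('https://evil.example.net', allowedOrigins));

// Possible use inside a handler, assuming access to Node's req and res objects:
// res.writeHead(statusCode, corsHeaders(req.headers.origin, allowedOrigins));
```

Requests from origins outside the list receive no `Access-Control-Allow-Origin` header, so the browser blocks the response.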
344
+ ## Next Steps
345
+
346
+ - **[CLI Guide](CLI.md)** - Use the command-line interface
347
+ - **[Basic Usage](BASIC_USAGE.md)** - Learn different API patterns
348
+ - **[Technical Specification](TECHNICAL_SPECIFICATION.md)** - Detailed technical information
349
+
350
+ Server mode exposes llmjs2 through an OpenAI-style `POST /v1/chat/completions` endpoint, so any client that can send that request and read the returned `messages` array can use it.