aillom-vox-client 1.0.0
- package/LICENSE +15 -0
- package/README.md +272 -0
- package/dist/AillomVox.d.ts +36 -0
- package/dist/AillomVox.js +152 -0
- package/dist/index.d.ts +2 -0
- package/dist/index.js +18 -0
- package/dist/types.d.ts +36 -0
- package/dist/types.js +2 -0
- package/docs/ASTERISK.md +411 -0
- package/docs/PROTOCOL.md +156 -0
- package/docs/PROVIDERS.md +40 -0
- package/docs/TOOLS.md +314 -0
- package/docs/TROUBLESHOOTING.md +86 -0
- package/docs/VOICES.md +219 -0
- package/docs/providers/AILLOMVOX.md +185 -0
- package/docs/providers/AWS.md +32 -0
- package/docs/providers/GEMINI.md +33 -0
- package/docs/providers/GROK.md +25 -0
- package/docs/providers/OPENAI.md +39 -0
- package/docs/providers/QWEN.md +27 -0
- package/docs/providers/ULTRAVOX.md +29 -0
- package/examples/01-basic/app.js +196 -0
- package/examples/01-basic/index.html +27 -0
- package/examples/02-advanced-dashboard/app.js +465 -0
- package/examples/02-advanced-dashboard/index.html +200 -0
- package/examples/02-advanced-dashboard/style.css +501 -0
- package/examples/03-smart-home/index.html +377 -0
- package/examples/04-customer-support/index.html +474 -0
- package/examples/sdk-usage.ts +44 -0
- package/integrations/n8n-nodes-aillomvox/README.md +56 -0
- package/integrations/n8n-nodes-aillomvox/credentials/AillomVoxApi.credentials.ts +29 -0
- package/integrations/n8n-nodes-aillomvox/dist/credentials/AillomVoxApi.credentials.js +30 -0
- package/integrations/n8n-nodes-aillomvox/dist/nodes/AillomVox/AillomVox.node.js +219 -0
- package/integrations/n8n-nodes-aillomvox/dist/nodes/AillomVox/aillomvox.svg +6 -0
- package/integrations/n8n-nodes-aillomvox/gulpfile.js +10 -0
- package/integrations/n8n-nodes-aillomvox/nodes/AillomVox/AillomVox.node.ts +229 -0
- package/integrations/n8n-nodes-aillomvox/nodes/AillomVox/aillomvox.svg +6 -0
- package/integrations/n8n-nodes-aillomvox/package-lock.json +11741 -0
- package/integrations/n8n-nodes-aillomvox/package.json +56 -0
- package/integrations/n8n-nodes-aillomvox/tsconfig.json +32 -0
- package/package.json +55 -0
- package/src/AillomVox.ts +169 -0
- package/src/index.ts +2 -0
- package/src/types.ts +50 -0
- package/tsconfig.json +23 -0
package/docs/ASTERISK.md
ADDED
|
@@ -0,0 +1,411 @@
|
|
|
1
|
+
# 📞 Asterisk 23 Integration Guide - AillomVox
|
|
2
|
+
|
|
3
|
+
Complete guide for integrating **Asterisk 23** with **AillomVox** for real-time Voice AI calls.
|
|
4
|
+
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
## 🎯 Overview
|
|
8
|
+
|
|
9
|
+
AillomVox provides **two integration architectures** for Asterisk:
|
|
10
|
+
|
|
11
|
+
### 1. **Direct Mode** (Simple)
|
|
12
|
+
```
|
|
13
|
+
Asterisk → AudioSocket → AillomVox Gateway → AI Provider
|
|
14
|
+
```
|
|
15
|
+
✅ Simple voice conversations
|
|
16
|
+
❌ No client tools (transfer, AMI control)
|
|
17
|
+
|
|
18
|
+
### 2. **Middleware Mode** (Advanced)
|
|
19
|
+
```
|
|
20
|
+
Asterisk → AudioSocket → Node.js Middleware → AillomVox Gateway
|
|
21
|
+
↓ AMI/ARI
|
|
22
|
+
```
|
|
23
|
+
✅ Client tools enabled (transfer, hangup, dial)
|
|
24
|
+
✅ Full Asterisk control from AI
|
|
25
|
+
|
|
26
|
+
---
|
|
27
|
+
|
|
28
|
+
## 📋 Prerequisites
|
|
29
|
+
|
|
30
|
+
### Asterisk Requirements
|
|
31
|
+
- **Asterisk 23** (or 18+)
|
|
32
|
+
- **AudioSocket** module compiled and enabled
|
|
33
|
+
- **API Key** for AillomVox gateway
|
|
34
|
+
|
|
35
|
+
### Verify AudioSocket Module
|
|
36
|
+
```bash
|
|
37
|
+
asterisk -rx "module show like audiosocket"
|
|
38
|
+
```
|
|
39
|
+
|
|
40
|
+
Expected output:
|
|
41
|
+
```
|
|
42
|
+
Module Description
|
|
43
|
+
res_audiosocket.so AudioSocket
|
|
44
|
+
```
|
|
45
|
+
|
|
46
|
+
If not loaded:
|
|
47
|
+
```bash
|
|
48
|
+
# Load module
|
|
49
|
+
asterisk -rx "module load res_audiosocket"
|
|
50
|
+
|
|
51
|
+
# Make it persistent
|
|
52
|
+
echo "load = res_audiosocket.so" >> /etc/asterisk/modules.conf
|
|
53
|
+
```
|
|
54
|
+
|
|
55
|
+
---
|
|
56
|
+
|
|
57
|
+
## 🔐 Authentication
|
|
58
|
+
|
|
59
|
+
AillomVox uses **API Key** authentication for security and usage tracking.
|
|
60
|
+
|
|
61
|
+
### Getting Your API Key
|
|
62
|
+
|
|
63
|
+
Get your API key from the AillomVox dashboard or contact support.
|
|
64
|
+
|
|
65
|
+
### Storing API Key
|
|
66
|
+
|
|
67
|
+
Store in `extensions.conf`:
|
|
68
|
+
|
|
69
|
+
```ini
|
|
70
|
+
[globals]
|
|
71
|
+
AILLOM_API_KEY=your-api-key-here
|
|
72
|
+
```
|
|
73
|
+
|
|
74
|
+
---
|
|
75
|
+
|
|
76
|
+
## 🏗️ Architecture 1: Direct Mode (Simple)
|
|
77
|
+
|
|
78
|
+
### When to Use
|
|
79
|
+
- Simple AI voice conversations
|
|
80
|
+
- No need to transfer calls or control Asterisk
|
|
81
|
+
- Just want AI to answer and respond
|
|
82
|
+
|
|
83
|
+
### Dialplan Configuration
|
|
84
|
+
|
|
85
|
+
```ini
|
|
86
|
+
[from-internal]
|
|
87
|
+
|
|
88
|
+
; Call extension 6000 to talk to AI
|
|
89
|
+
exten => 6000,1,NoOp(AillomVox Direct Mode)
|
|
90
|
+
same => n,Set(WS_URL=ws://vox.aillom.com/ws?apiKey=${AILLOM_API_KEY})
|
|
91
|
+
same => n,Set(CONFIG={"provider":"aillomvox","voice":"Heitor","language":"pt-BR","system_prompt":"Você é um assistente. Seja conciso.","first_message":"Olá! Como posso ajudar?","sample_rate":8000})
|
|
92
|
+
same => n,Answer()
|
|
93
|
+
same => n,AudioSocket(${WS_URL},${CONFIG})
|
|
94
|
+
same => n,Hangup()
|
|
95
|
+
```
|
|
96
|
+
|
|
97
|
+
### Configuration JSON
|
|
98
|
+
|
|
99
|
+
```json
|
|
100
|
+
{
|
|
101
|
+
"provider": "aillomvox",
|
|
102
|
+
"voice": "Heitor",
|
|
103
|
+
"language": "pt-BR",
|
|
104
|
+
"system_prompt": "Você é um assistente virtual.",
|
|
105
|
+
"first_message": "Olá! Como posso ajudar?",
|
|
106
|
+
"farewell_message": "Obrigado por ligar. Até logo!",
|
|
107
|
+
"sample_rate": 8000,
|
|
108
|
+
"max_duration": 300
|
|
109
|
+
}
|
|
110
|
+
```
|
|
111
|
+
|
|
112
|
+
**Note**: No `tools` array needed in Direct Mode.
|
|
113
|
+
|
|
114
|
+
---
|
|
115
|
+
|
|
116
|
+
## 🏗️ Architecture 2: Middleware Mode (Advanced)
|
|
117
|
+
|
|
118
|
+
### When to Use
|
|
119
|
+
- AI needs to **transfer** calls to extensions
|
|
120
|
+
- AI needs to **hang up** via AMI (not just end the conversation)
|
|
121
|
+
- AI needs to **dial** external numbers
|
|
122
|
+
- Advanced call control
|
|
123
|
+
|
|
124
|
+
### Architecture
|
|
125
|
+
```
|
|
126
|
+
Asterisk → AudioSocket (port 9000) → Node.js Middleware
|
|
127
|
+
↓
|
|
128
|
+
AMI/ARI Control
|
|
129
|
+
↓
|
|
130
|
+
AillomVox Gateway
|
|
131
|
+
```
|
|
132
|
+
|
|
133
|
+
### Step 1: Install Dependencies
|
|
134
|
+
|
|
135
|
+
```bash
|
|
136
|
+
npm install aillom-vox-client asterisk-manager
|
|
137
|
+
```
|
|
138
|
+
|
|
139
|
+
### Step 2: Create Middleware (`asterisk-bridge.js`)
|
|
140
|
+
|
|
141
|
+
```javascript
|
|
142
|
+
const net = require('net');
|
|
143
|
+
const AillomVoxClient = require('aillom-vox-client');
|
|
144
|
+
const AMI = require('asterisk-manager');
|
|
145
|
+
|
|
146
|
+
// AMI connection for Asterisk control
|
|
147
|
+
const ami = new AMI(5038, 'localhost', 'admin', 'secret', true);
|
|
148
|
+
|
|
149
|
+
ami.keepConnected();
|
|
150
|
+
|
|
151
|
+
// AudioSocket server (receives from Asterisk)
|
|
152
|
+
const server = net.createServer((socket) => {
|
|
153
|
+
console.log('[Bridge] New call from Asterisk');
|
|
154
|
+
|
|
155
|
+
// Connect to AillomVox
|
|
156
|
+
const client = new AillomVoxClient({
|
|
157
|
+
apiKey: process.env.AILLOM_API_KEY,
|
|
158
|
+
url: 'wss://vox.aillom.com/ws'
|
|
159
|
+
});
|
|
160
|
+
|
|
161
|
+
// Register client tools
|
|
162
|
+
client.connect({
|
|
163
|
+
provider: 'aillomvox',
|
|
164
|
+
voice: 'Heitor',
|
|
165
|
+
language: 'pt-BR',
|
|
166
|
+
system_prompt: 'Você é um assistente. Se o usuário pedir, transfira para ramal 100 ou desligue a ligação.',
|
|
167
|
+
sample_rate: 8000,
|
|
168
|
+
tools: [{
|
|
169
|
+
name: 'hangup',
|
|
170
|
+
description: 'End the call',
|
|
171
|
+
parameters: { type: 'object', properties: {} }
|
|
172
|
+
}, {
|
|
173
|
+
name: 'transfer',
|
|
174
|
+
description: 'Transfer call to extension',
|
|
175
|
+
parameters: {
|
|
176
|
+
type: 'object',
|
|
177
|
+
properties: {
|
|
178
|
+
extension: { type: 'string', description: 'Target extension' }
|
|
179
|
+
},
|
|
180
|
+
required: ['extension']
|
|
181
|
+
}
|
|
182
|
+
}]
|
|
183
|
+
});
|
|
184
|
+
|
|
185
|
+
// Handle tool calls from AI
|
|
186
|
+
client.on('tool_call', async (tool) => {
|
|
187
|
+
console.log(`[Tool] AI called: ${tool.name}`, tool.args);
|
|
188
|
+
|
|
189
|
+
if (tool.name === 'hangup') {
|
|
190
|
+
console.log('[Tool] Hanging up call');
|
|
191
|
+
socket.end();
|
|
192
|
+
client.disconnect();
|
|
193
|
+
return 'Call ended';
|
|
194
|
+
}
|
|
195
|
+
|
|
196
|
+
if (tool.name === 'transfer') {
|
|
197
|
+
const ext = tool.args.extension;
|
|
198
|
+
console.log(`[Tool] Transferring to ${ext}`);
|
|
199
|
+
|
|
200
|
+
// Use AMI to transfer (implement based on your channel tracking)
|
|
201
|
+
ami.action({
|
|
202
|
+
action: 'redirect',
|
|
203
|
+
channel: 'PJSIP/1234-00000001', // Track this from call setup
|
|
204
|
+
exten: ext,
|
|
205
|
+
context: 'from-internal',
|
|
206
|
+
priority: 1
|
|
207
|
+
});
|
|
208
|
+
|
|
209
|
+
return `Transferred to extension ${ext}`;
|
|
210
|
+
}
|
|
211
|
+
});
|
|
212
|
+
|
|
213
|
+
// Pipe audio: Asterisk ↔ AillomVox
|
|
214
|
+
socket.on('data', (data) => {
|
|
215
|
+
// Incoming data is AudioSocket-framed (3-byte header + payload); parse it and forward only the PCM payload
|
|
216
|
+
client.sendAudio(data);
|
|
217
|
+
});
|
|
218
|
+
|
|
219
|
+
client.on('audio', (pcmData) => {
|
|
220
|
+
socket.write(pcmData);
|
|
221
|
+
});
|
|
222
|
+
|
|
223
|
+
socket.on('end', () => {
|
|
224
|
+
console.log('[Bridge] Asterisk closed connection');
|
|
225
|
+
client.disconnect();
|
|
226
|
+
});
|
|
227
|
+
});
|
|
228
|
+
|
|
229
|
+
server.listen(9000, '127.0.0.1', () => {
|
|
230
|
+
console.log('[Bridge] Listening on 127.0.0.1:9000');
|
|
231
|
+
});
|
|
232
|
+
```
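The `socket.on('data', ...)` handler above forwards bytes exactly as they arrive. Asterisk's AudioSocket protocol prefixes every message with a 3-byte header (one type byte plus a 16-bit big-endian payload length), so a bridge usually has to strip that header before forwarding PCM and add it back when writing audio to Asterisk. The following is a minimal sketch of that framing; the helper names `parseFrames` and `encodeAudioFrame` are hypothetical and are not part of `aillom-vox-client`.

```javascript
// AudioSocket message types, as defined by the Asterisk AudioSocket protocol.
const KIND_HANGUP = 0x00;
const KIND_UUID = 0x01;
const KIND_AUDIO = 0x10;
const KIND_ERROR = 0xff;

// Split a buffer into complete frames. Returns the frames plus any trailing
// bytes that belong to a frame that has not fully arrived yet.
function parseFrames(buffer) {
  const frames = [];
  let offset = 0;
  while (buffer.length - offset >= 3) {
    const kind = buffer.readUInt8(offset);
    const length = buffer.readUInt16BE(offset + 1);
    if (buffer.length - offset - 3 < length) break; // payload incomplete
    frames.push({ kind, payload: buffer.subarray(offset + 3, offset + 3 + length) });
    offset += 3 + length;
  }
  return { frames, rest: buffer.subarray(offset) };
}

// Wrap raw PCM (at most 65535 bytes) in an audio frame for Asterisk.
function encodeAudioFrame(pcm) {
  const header = Buffer.alloc(3);
  header.writeUInt8(KIND_AUDIO, 0);
  header.writeUInt16BE(pcm.length, 1);
  return Buffer.concat([header, pcm]);
}
```

In a bridge like the one above, `rest` would be kept between `data` events and prepended to the next chunk, only `KIND_AUDIO` payloads would go to `client.sendAudio`, and audio from the gateway would pass through `encodeAudioFrame` before `socket.write`.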
|
|
233
|
+
|
|
234
|
+
### Step 3: Configure Asterisk
|
|
235
|
+
|
|
236
|
+
```ini
|
|
237
|
+
[from-internal]
|
|
238
|
+
|
|
239
|
+
; Call extension 7000 using middleware (with client tools)
|
|
240
|
+
exten => 7000,1,NoOp(AillomVox Middleware Mode)
|
|
241
|
+
same => n,Set(CONFIG={"provider":"aillomvox"})
|
|
242
|
+
same => n,Answer()
|
|
243
|
+
same => n,AudioSocket(127.0.0.1:9000,${CONFIG})
|
|
244
|
+
same => n,Hangup()
|
|
245
|
+
```
|
|
246
|
+
|
|
247
|
+
### Step 4: Run Middleware
|
|
248
|
+
|
|
249
|
+
```bash
|
|
250
|
+
AILLOM_API_KEY=your-api-key node asterisk-bridge.js
|
|
251
|
+
```
|
|
252
|
+
|
|
253
|
+
### Step 5: Test
|
|
254
|
+
|
|
255
|
+
Call extension 7000 and say:
|
|
256
|
+
- "Transfer me to extension 100" → AI calls `transfer` tool
|
|
257
|
+
- "Goodbye" → AI calls `hangup` tool
|
|
258
|
+
|
|
259
|
+
---
|
|
260
|
+
|
|
261
|
+
## 🎙️ Audio Format
|
|
262
|
+
|
|
263
|
+
### AudioSocket Protocol
|
|
264
|
+
- **Format**: PCM 16-bit signed little-endian (`slin`)
|
|
265
|
+
- **Sample Rate**: 8000 Hz (telephony standard)
|
|
266
|
+
- **Channels**: 1 (mono)
|
|
267
|
+
- **Encoding**: `pcm_s16le`
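These parameters fix the data rate: 8000 samples/s × 2 bytes × 1 channel = 16000 bytes per second. The sketch below computes chunk sizes from that arithmetic; the 20 ms chunk duration used in the usage note is an assumption (a common telephony packet size), not a value taken from this guide, and the function names are hypothetical.

```javascript
// Bytes needed for `ms` milliseconds of 16-bit PCM audio.
function pcmBytes(sampleRate, ms, channels = 1) {
  const bytesPerSample = 2; // 16-bit signed little-endian
  return Math.round((sampleRate * ms) / 1000) * bytesPerSample * channels;
}

// Split a mono PCM buffer into fixed-duration chunks (the last may be shorter).
function chunkPcm(pcm, sampleRate, ms) {
  const size = pcmBytes(sampleRate, ms);
  const chunks = [];
  for (let i = 0; i < pcm.length; i += size) {
    chunks.push(pcm.subarray(i, i + size));
  }
  return chunks;
}
```

For example, `pcmBytes(8000, 20)` gives 320 bytes per 20 ms chunk at the telephony rate.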
|
|
268
|
+
|
|
269
|
+
### Codec Conversion (ulaw/alaw → slin)
|
|
270
|
+
|
|
271
|
+
**In Brazil and most other countries**, SIP trunks use:
|
|
272
|
+
- **ulaw** (G.711 μ-law) - Standard in the US/Brazil
|
|
273
|
+
- **alaw** (G.711 A-law) - Standard in Europe
|
|
274
|
+
|
|
275
|
+
**Asterisk converts automatically**:
|
|
276
|
+
```
|
|
277
|
+
Trunk SIP (ulaw/alaw) → Asterisk → slin → AudioSocket → AillomVox
|
|
278
|
+
```
|
|
279
|
+
|
|
280
|
+
No configuration is needed: Asterisk performs the conversion transparently.
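To illustrate what this transcoding step does, here is a sketch of G.711 μ-law to 16-bit linear decoding. Asterisk performs this internally, so nothing like it is needed in a dialplan or bridge; the function names are hypothetical and serve only to show the relationship between `ulaw` bytes and `slin` samples.

```javascript
// Decode one G.711 mu-law byte to a signed 16-bit linear sample.
function ulawToLinear(byte) {
  const u = ~byte & 0xff;            // mu-law bytes are stored bit-inverted
  const sign = u & 0x80;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return sign ? -magnitude : magnitude;
}

// Decode a mu-law buffer into pcm_s16le (2 bytes per sample).
function ulawBufferToSlin(ulaw) {
  const out = Buffer.alloc(ulaw.length * 2);
  for (let i = 0; i < ulaw.length; i++) {
    out.writeInt16LE(ulawToLinear(ulaw[i]), i * 2);
  }
  return out;
}
```

Each 1-byte μ-law sample becomes a 2-byte linear sample, which is why the `slin` stream is twice the size of the G.711 stream on the trunk.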
|
|
281
|
+
|
|
282
|
+
### Forcing Codec (Optional)
|
|
283
|
+
|
|
284
|
+
If you run into audio problems, force the codec:
|
|
285
|
+
|
|
286
|
+
```ini
|
|
287
|
+
exten => 6000,1,Set(CHANNEL(audioreadformat)=slin)
|
|
288
|
+
same => n,Set(CHANNEL(audiowriteformat)=slin)
|
|
289
|
+
same => n,Answer()
|
|
290
|
+
same => n,AudioSocket(...)
|
|
291
|
+
```
|
|
292
|
+
|
|
293
|
+
This ensures that Asterisk always delivers 16-bit PCM to AudioSocket.
|
|
294
|
+
|
|
295
|
+
---
|
|
296
|
+
|
|
297
|
+
## 🌍 Multi-Provider Examples (Direct Mode)
|
|
298
|
+
|
|
299
|
+
### AillomVox (Best for Telephony)
|
|
300
|
+
|
|
301
|
+
```ini
|
|
302
|
+
exten => 7001,1,Set(CONFIG={"provider":"aillomvox","voice":"Heitor","sample_rate":8000})
|
|
303
|
+
same => n,AudioSocket(ws://vox.aillom.com/ws?apiKey=${AILLOM_API_KEY},${CONFIG})
|
|
304
|
+
```
|
|
305
|
+
|
|
306
|
+
**Why?** Lowest latency, optimized for 8kHz, $0.03/min.
|
|
307
|
+
|
|
308
|
+
### Gemini 2.5 Flash
|
|
309
|
+
|
|
310
|
+
```ini
|
|
311
|
+
exten => 7002,1,Set(CONFIG={"provider":"gemini","voice":"Puck","sample_rate":8000})
|
|
312
|
+
same => n,AudioSocket(ws://vox.aillom.com/ws?apiKey=${AILLOM_API_KEY},${CONFIG})
|
|
313
|
+
```
|
|
314
|
+
|
|
315
|
+
**Why?** Multimodal, fast, $0.06/min.
|
|
316
|
+
|
|
317
|
+
### OpenAI Realtime
|
|
318
|
+
|
|
319
|
+
```ini
|
|
320
|
+
exten => 7003,1,Set(CONFIG={"provider":"openai","voice":"shimmer","sample_rate":8000})
|
|
321
|
+
same => n,AudioSocket(ws://vox.aillom.com/ws?apiKey=${AILLOM_API_KEY},${CONFIG})
|
|
322
|
+
```
|
|
323
|
+
|
|
324
|
+
**Why?** Best for complex reasoning, $0.10/min.
|
|
325
|
+
|
|
326
|
+
---
|
|
327
|
+
|
|
328
|
+
## ⚠️ Troubleshooting
|
|
329
|
+
|
|
330
|
+
### Problem: "Module audiosocket not loaded"
|
|
331
|
+
|
|
332
|
+
```bash
|
|
333
|
+
asterisk -rx "module load res_audiosocket"
|
|
334
|
+
echo "load = res_audiosocket.so" >> /etc/asterisk/modules.conf
|
|
335
|
+
```
|
|
336
|
+
|
|
337
|
+
### Problem: "Connection refused"
|
|
338
|
+
|
|
339
|
+
**Direct Mode**: Check the firewall and verify that the gateway is reachable
|
|
340
|
+
**Middleware Mode**: Ensure middleware is running on `127.0.0.1:9000`
|
|
341
|
+
|
|
342
|
+
```bash
|
|
343
|
+
# Test direct connection
|
|
344
|
+
curl -I https://vox.aillom.com/health
|
|
345
|
+
|
|
346
|
+
# Test middleware
|
|
347
|
+
netstat -tuln | grep 9000
|
|
348
|
+
```
|
|
349
|
+
|
|
350
|
+
### Problem: "No audio"
|
|
351
|
+
|
|
352
|
+
Force codec:
|
|
353
|
+
```ini
|
|
354
|
+
exten => 6000,1,Set(CHANNEL(audioreadformat)=slin)
|
|
355
|
+
same => n,Set(CHANNEL(audiowriteformat)=slin)
|
|
356
|
+
same => n,AudioSocket(...)
|
|
357
|
+
```
|
|
358
|
+
|
|
359
|
+
---
|
|
360
|
+
|
|
361
|
+
## 📊 Configuration Options
|
|
362
|
+
|
|
363
|
+
| Field | Type | Description | Default |
|
|
364
|
+
|-------|------|-------------|---------|
|
|
365
|
+
| `provider` | string | `aillomvox`, `gemini`, `openai`, `qwen`, `grok`, `aws`, `ultravox` | **Required** |
|
|
366
|
+
| `voice` | string | Voice ID (provider-specific) | **Required** |
|
|
367
|
+
| `language` | string | `pt-BR`, `en-US`, `es-ES`, etc. | `en-US` |
|
|
368
|
+
| `system_prompt` | string | Instructions for AI | **Required** |
|
|
369
|
+
| `first_message` | string | Initial greeting | `null` |
|
|
370
|
+
| `farewell_message` | string | Goodbye message | `null` |
|
|
371
|
+
| `sample_rate` | number | `8000` (telephony), `16000` (HD) | **Required** |
|
|
372
|
+
| `max_duration` | number | Max seconds (120-3600) | `300` |
|
|
373
|
+
| `tools` | array | Client tools (Middleware only) | `[]` |
|
|
374
|
+
|
|
375
|
+
---
|
|
376
|
+
|
|
377
|
+
## 🌐 Production Checklist
|
|
378
|
+
|
|
379
|
+
- [ ] AudioSocket module loaded and persistent
|
|
380
|
+
- [ ] API key stored securely in `[globals]`
|
|
381
|
+
- [ ] Sample rate set to 8000 Hz
|
|
382
|
+
- [ ] Max duration configured
|
|
383
|
+
- [ ] If using Middleware: process manager (PM2, systemd)
|
|
384
|
+
- [ ] If using Middleware: AMI credentials configured
|
|
385
|
+
- [ ] Tested with real calls
|
|
386
|
+
- [ ] Monitoring enabled
|
|
387
|
+
|
|
388
|
+
---
|
|
389
|
+
|
|
390
|
+
## 💡 Which Architecture Should I Use?
|
|
391
|
+
|
|
392
|
+
| Feature | Direct Mode | Middleware Mode |
|
|
393
|
+
|---------|-------------|-----------------|
|
|
394
|
+
| Simple conversations | ✅ Yes | ✅ Yes |
|
|
395
|
+
| Transfer calls | ❌ No | ✅ Yes |
|
|
396
|
+
| Hangup via AMI | ❌ No | ✅ Yes |
|
|
397
|
+
| Dial external numbers | ❌ No | ✅ Yes |
|
|
398
|
+
| Complexity | Low | Medium |
|
|
399
|
+
| Setup | 2 minutes | 10 minutes |
|
|
400
|
+
|
|
401
|
+
**Recommendation**: Start with **Direct Mode**. Upgrade to **Middleware** when you need client tools.
|
|
402
|
+
|
|
403
|
+
---
|
|
404
|
+
|
|
405
|
+
## 📚 Additional Resources
|
|
406
|
+
|
|
407
|
+
- [Client Tools Guide](TOOLS.md)
|
|
408
|
+
- [AillomVox Protocol](PROTOCOL.md)
|
|
409
|
+
- [Provider Comparison](PROVIDERS.md)
|
|
410
|
+
|
|
411
|
+
Happy building! 🎉
|
package/docs/PROTOCOL.md
ADDED
|
@@ -0,0 +1,156 @@
|
|
|
1
|
+
# WebSocket Protocol
|
|
2
|
+
|
|
3
|
+
The AillomVox Gateway uses a WebSocket protocol for full-duplex audio streaming and control messaging.
|
|
4
|
+
|
|
5
|
+
## Connection
|
|
6
|
+
|
|
7
|
+
**Endpoint**: `wss://vox.aillom.com/ws`
|
|
8
|
+
|
|
9
|
+
> **Note**: Authentication is performed in-band (inside the first message), not via HTTP headers or query params.
|
|
10
|
+
|
|
11
|
+
## Message Flow
|
|
12
|
+
|
|
13
|
+
```
|
|
14
|
+
Client Server
|
|
15
|
+
| |
|
|
16
|
+
|--- WebSocket Connect (GET /ws) --->|
|
|
17
|
+
|<--- 101 Switching Protocols -------|
|
|
18
|
+
| |
|
|
19
|
+
|--- JSON Config (type: "config") -->| ← MUST be first message
|
|
20
|
+
| | (Auth + Billing + Provider init)
|
|
21
|
+
| ~500ms stabilization~ |
|
|
22
|
+
| |
|
|
23
|
+
|--- Binary PCM Audio Chunks ------->| ← 16-bit LE Mono
|
|
24
|
+
|<--- Binary PCM Audio Chunks -------| ← AI response audio
|
|
25
|
+
|<--- JSON Events (transcript, etc) -|
|
|
26
|
+
| |
|
|
27
|
+
|--- JSON { type: "hangup" } ------->| ← or server-initiated
|
|
28
|
+
|<--- WebSocket Close ----------------|
|
|
29
|
+
```
|
|
30
|
+
|
|
31
|
+
## 1. Handshake (Client → Server)
|
|
32
|
+
|
|
33
|
+
The **first message** must be a flat JSON config object. Sending binary data before this message results in connection termination (code `1008`).
|
|
34
|
+
|
|
35
|
+
```json
|
|
36
|
+
{
|
|
37
|
+
"type": "config",
|
|
38
|
+
"apikey": "av_your_api_key_here",
|
|
39
|
+
"provider": "aillomvox",
|
|
40
|
+
"voice": "Edward",
|
|
41
|
+
"language": "en-US",
|
|
42
|
+
"sample_rate": 16000,
|
|
43
|
+
"system_prompt": "You are a helpful assistant.",
|
|
44
|
+
"first_message": "Hello! How can I help you?",
|
|
45
|
+
"farewell_message": "Thank you for calling. Goodbye!",
|
|
46
|
+
"max_duration": 300,
|
|
47
|
+
"tools": [
|
|
48
|
+
{
|
|
49
|
+
"name": "hangup",
|
|
50
|
+
"description": "End the call when user says goodbye.",
|
|
51
|
+
"parameters": { "type": "object", "properties": {} }
|
|
52
|
+
}
|
|
53
|
+
]
|
|
54
|
+
}
|
|
55
|
+
```
|
|
56
|
+
|
|
57
|
+
| Field | Required | Type | Description |
|
|
58
|
+
| :--- | :--- | :--- | :--- |
|
|
59
|
+
| `type` | ✅ | string | Must be `"config"` |
|
|
60
|
+
| `apikey` | ✅ | string | Your AillomVox API key |
|
|
61
|
+
| `provider` | ✅ | string | `aillomvox`, `openai`, `gemini`, `aws`, `ultravox`, `grok`, `qwen` |
|
|
62
|
+
| `sample_rate` | ✅ | number | `8000`, `16000`, or `24000` Hz |
|
|
63
|
+
| `system_prompt` | ✅ | string | AI persona instructions |
|
|
64
|
+
| `voice` | | string | Provider-specific voice ID |
|
|
65
|
+
| `language` | | string | Locale code (e.g., `en-US`, `pt-BR`) |
|
|
66
|
+
| `first_message` | | string | Greeting spoken on connect |
|
|
67
|
+
| `farewell_message` | | string | Message before session close |
|
|
68
|
+
| `max_duration` | | number | Session limit in seconds (60–3600) |
|
|
69
|
+
| `tools` | | array | Client-side tool definitions |
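Because an invalid first message ends the session, it can help to check the config locally before sending it. The sketch below encodes the required fields and ranges from the table above; `validateConfig` is a hypothetical helper, not a function exported by the SDK, and the gateway may apply additional rules.

```javascript
const PROVIDERS = ['aillomvox', 'openai', 'gemini', 'aws', 'ultravox', 'grok', 'qwen'];
const SAMPLE_RATES = [8000, 16000, 24000];

// Returns a list of problems; an empty list means the config passes these checks.
function validateConfig(cfg) {
  const errors = [];
  if (cfg.type !== 'config') errors.push('type must be "config"');
  if (typeof cfg.apikey !== 'string' || cfg.apikey === '') errors.push('apikey is required');
  if (!PROVIDERS.includes(cfg.provider)) errors.push('unknown provider');
  if (!SAMPLE_RATES.includes(cfg.sample_rate)) {
    errors.push('sample_rate must be 8000, 16000, or 24000');
  }
  if (typeof cfg.system_prompt !== 'string' || cfg.system_prompt === '') {
    errors.push('system_prompt is required');
  }
  if (cfg.max_duration !== undefined &&
      (typeof cfg.max_duration !== 'number' || cfg.max_duration < 60 || cfg.max_duration > 3600)) {
    errors.push('max_duration must be between 60 and 3600 seconds');
  }
  if (cfg.tools !== undefined && !Array.isArray(cfg.tools)) errors.push('tools must be an array');
  return errors;
}
```

A caller would send the config only when the returned list is empty, and otherwise surface the messages to the developer.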
|
|
70
|
+
|
|
71
|
+
## 2. Audio (Binary Messages)
|
|
72
|
+
|
|
73
|
+
After the handshake, audio flows as raw binary WebSocket messages in both directions.
|
|
74
|
+
|
|
75
|
+
- **Format**: PCM 16-bit Signed Integer, Little Endian
|
|
76
|
+
- **Channels**: Mono (1 channel)
|
|
77
|
+
- **Rate**: Must match `sample_rate` from the handshake
|
|
78
|
+
- **Direction**: Full duplex (bidirectional)
|
|
79
|
+
|
|
80
|
+
> There is no JSON wrapper for audio. If the WebSocket message is binary, it is a raw audio chunk.
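Since binary and text messages share one socket, a client has to branch on the message kind before doing anything else. A minimal sketch, assuming the WebSocket library reports whether a message is binary (the `ws` package for Node.js, for example, passes `(data, isBinary)` to its `message` listeners); `classifyMessage` is a hypothetical helper.

```javascript
// Classify one incoming WebSocket message.
// `data` is a Buffer or string; `isBinary` distinguishes audio from JSON events.
function classifyMessage(data, isBinary) {
  if (isBinary) {
    return { kind: 'audio', pcm: data }; // raw PCM chunk, no wrapper
  }
  try {
    const event = JSON.parse(data.toString('utf8'));
    return { kind: 'event', event };
  } catch (err) {
    return { kind: 'invalid', error: err.message };
  }
}
```

Audio results would go to the playback path, while `event` results would be dispatched on `event.type`.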
|
|
81
|
+
|
|
82
|
+
## 3. Server → Client Events
|
|
83
|
+
|
|
84
|
+
### Transcript
|
|
85
|
+
```json
|
|
86
|
+
{
|
|
87
|
+
"type": "transcript",
|
|
88
|
+
"role": "user",
|
|
89
|
+
"text": "Hello world",
|
|
90
|
+
"final": true
|
|
91
|
+
}
|
|
92
|
+
```
|
|
93
|
+
- `role`: `"user"` or `"assistant"`
|
|
94
|
+
- `final`: `true` when the sentence is complete. Only render `final: true` transcripts to avoid UI flickering.
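The rendering rule above amounts to a filter over incoming events. A small sketch, with the hypothetical helper `finalTranscripts` and a simple `role: text` display format chosen only for illustration:

```javascript
// Keep only completed transcript sentences and format them for display.
function finalTranscripts(events) {
  return events
    .filter((e) => e.type === 'transcript' && e.final === true)
    .map((e) => `${e.role}: ${e.text}`);
}
```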
|
|
95
|
+
|
|
96
|
+
### Tool Call
|
|
97
|
+
```json
|
|
98
|
+
{
|
|
99
|
+
"type": "tool_call",
|
|
100
|
+
"call_id": "abc123",
|
|
101
|
+
"name": "hangup",
|
|
102
|
+
"args": {}
|
|
103
|
+
}
|
|
104
|
+
```
|
|
105
|
+
Client must respond with a `tool_result` within 15 seconds (see below).
|
|
106
|
+
|
|
107
|
+
### Playback Clear Buffer
|
|
108
|
+
```json
|
|
109
|
+
{
|
|
110
|
+
"type": "playback_clear_buffer"
|
|
111
|
+
}
|
|
112
|
+
```
|
|
113
|
+
Sent when the user interrupts the AI. Client **must** immediately flush its audio playback queue.
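One way to honor this is to keep received audio in a queue that can be emptied in a single step. The class and function below are a hypothetical sketch; a real client would also need to stop whatever chunk its audio output is currently playing.

```javascript
// Minimal playback queue: chunks wait here until the audio output pulls them.
class PlaybackQueue {
  constructor() {
    this.chunks = [];
  }
  push(pcm) {
    this.chunks.push(pcm);
  }
  next() {
    return this.chunks.shift(); // undefined when the queue is empty
  }
  clear() {
    this.chunks.length = 0;
  }
  get size() {
    return this.chunks.length;
  }
}

// Returns true when the event was a clear request and the queue was flushed.
function handleClearEvent(event, queue) {
  if (event.type === 'playback_clear_buffer') {
    queue.clear();
    return true;
  }
  return false;
}
```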
|
|
114
|
+
|
|
115
|
+
### Hangup
|
|
116
|
+
```json
|
|
117
|
+
{
|
|
118
|
+
"type": "hangup"
|
|
119
|
+
}
|
|
120
|
+
```
|
|
121
|
+
Server-initiated session end. The client should stop audio, close the socket, and reset the UI.
|
|
122
|
+
|
|
123
|
+
### Error
|
|
124
|
+
```json
|
|
125
|
+
{
|
|
126
|
+
"type": "error",
|
|
127
|
+
"message": "Invalid API Key"
|
|
128
|
+
}
|
|
129
|
+
```
|
|
130
|
+
Error codes: `unauthorized`, `insufficient_balance`, `max_duration_reached`.
|
|
131
|
+
|
|
132
|
+
## 4. Client → Server Events
|
|
133
|
+
|
|
134
|
+
### Tool Result
|
|
135
|
+
```json
|
|
136
|
+
{
|
|
137
|
+
"type": "tool_result",
|
|
138
|
+
"call_id": "abc123",
|
|
139
|
+
"result": "Order status: shipped"
|
|
140
|
+
}
|
|
141
|
+
```
|
|
142
|
+
**Mandatory** response after receiving a `tool_call`. The AI pauses execution until it receives this. Timeout: 15 seconds.
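Because the AI waits on every `tool_call`, a client should always answer, even when the tool is unknown, throws, or runs long. The sketch below shows one way to guarantee a reply; `answerToolCall`, the `handlers` map, and the 10-second local timeout (chosen to stay under the 15-second limit) are assumptions for illustration, not part of the SDK.

```javascript
// Run the handler for a tool_call and always send back a tool_result.
// `handlers` maps tool names to (possibly async) functions; `send` takes a string.
async function answerToolCall(call, handlers, send, timeoutMs = 10000) {
  const handler = handlers[call.name];
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve('Tool timed out'), timeoutMs);
  });
  let result;
  try {
    const work = handler
      ? Promise.resolve().then(() => handler(call.args))
      : Promise.resolve(`Unknown tool: ${call.name}`);
    result = await Promise.race([work, timeout]);
  } catch (err) {
    result = `Tool failed: ${err.message}`;
  } finally {
    clearTimeout(timer);
  }
  send(JSON.stringify({ type: 'tool_result', call_id: call.call_id, result: String(result) }));
}
```

The `call_id` is echoed back unchanged so the gateway can match the result to its pending call.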
|
|
143
|
+
|
|
144
|
+
### Hangup
|
|
145
|
+
```json
|
|
146
|
+
{
|
|
147
|
+
"type": "hangup"
|
|
148
|
+
}
|
|
149
|
+
```
|
|
150
|
+
Client-initiated session end.
|
|
151
|
+
|
|
152
|
+
## 5. Session Governance
|
|
153
|
+
|
|
154
|
+
- **Max Duration**: 60–3600 seconds (default: 300)
|
|
155
|
+
- **Farewell Warning**: At 15 seconds remaining, the AI speaks the `farewell_message`
|
|
156
|
+
- **Force Close**: At 0 seconds, connection closes with `max_duration_reached`
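These rules give each session a fixed timeline that a client can compute up front, for instance to show a countdown. A sketch with the hypothetical helper `sessionTimeline`, using the range and default listed above:

```javascript
const FAREWELL_LEAD_SECONDS = 15;

// Given max_duration (seconds), return when the farewell starts and when the
// connection is force-closed, both measured in seconds from session start.
function sessionTimeline(maxDuration = 300) {
  if (maxDuration < 60 || maxDuration > 3600) {
    throw new RangeError('max_duration must be between 60 and 3600');
  }
  return {
    farewellAt: maxDuration - FAREWELL_LEAD_SECONDS,
    closeAt: maxDuration,
  };
}
```

With the default of 300 seconds, the farewell would begin at 285 seconds and the connection would close at 300.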
|
package/docs/PROVIDERS.md
ADDED
|
@@ -0,0 +1,40 @@
|
|
|
1
|
+
# Supported AI Providers
|
|
2
|
+
|
|
3
|
+
AillomVox aggregates top-tier Voice AI providers. Click on a provider to see detailed configuration and examples.
|
|
4
|
+
|
|
5
|
+
## Provider Index
|
|
6
|
+
|
|
7
|
+
| Provider | Model | Voices | Best For | Documentation |
|
|
8
|
+
| :--- | :--- | :--- | :--- | :--- |
|
|
9
|
+
| **AillomVox** | *Groq + Inworld TTS* | 65 voices | **Speed & Telephony**. Lowest latency, 8kHz native. | [Docs](providers/AILLOMVOX.md) |
|
|
10
|
+
| **OpenAI** | `gpt-realtime-mini` | 6 voices | **Complex Logic**. Math, coding, strict reasoning. | [Docs](providers/OPENAI.md) |
|
|
11
|
+
| **Gemini** | `gemini-2.5-flash` | 5 voices | **Long Context**. Massive memory, complex prompts. | [Docs](providers/GEMINI.md) |
|
|
12
|
+
| **AWS** | `nova-2-sonic` | 3 voices | **Enterprise**. High reliability, AWS compliance. | [Docs](providers/AWS.md) |
|
|
13
|
+
| **UltraVox** | `ultravox-v0.7` | 2 voices | **Emotion**. High emotional intelligence. | [Docs](providers/ULTRAVOX.md) |
|
|
14
|
+
| **Grok** | `grok-beta` | Model-dependent | **Casual/Fun**. Witty, less robotic interactions. | [Docs](providers/GROK.md) |
|
|
15
|
+
| **Qwen** | `qwen3-omni` | Model-dependent | **Cost & Asia**. High performance at lower cost. | [Docs](providers/QWEN.md) |
|
|
16
|
+
|
|
17
|
+
See the complete [Voice Catalog](VOICES.md) for all voices across providers.
|
|
18
|
+
|
|
19
|
+
## Quick Config Example
|
|
20
|
+
|
|
21
|
+
```javascript
|
|
22
|
+
// Connect to AillomVox
|
|
23
|
+
ws.send(JSON.stringify({
|
|
24
|
+
type: "config",
|
|
25
|
+
apikey: "YOUR_API_KEY",
|
|
26
|
+
provider: "aillomvox",
|
|
27
|
+
voice: "Edward",
|
|
28
|
+
language: "en-US",
|
|
29
|
+
sample_rate: 16000,
|
|
30
|
+
system_prompt: "You are a helpful assistant."
|
|
31
|
+
}));
|
|
32
|
+
```
|
|
33
|
+
|
|
34
|
+
---
|
|
35
|
+
|
|
36
|
+
## ⚠️ Known Limitations
|
|
37
|
+
|
|
38
|
+
| Provider | Limitation | Workaround |
|
|
39
|
+
| :--- | :--- | :--- |
|
|
40
|
+
| **Qwen** (`qwen3-omni-flash-realtime`) | **Function calling / Client Tools not supported** in WebSocket Realtime mode. The model will respond with text instead of emitting tool calls. | Use **AWS**, **OpenAI**, or **Gemini** for scenarios requiring tools (e.g., `hangup`, `transfer`). |
|