@jambonz/mcp-schema-server 0.1.0 → 0.1.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/AGENTS.md +545 -7
- package/dist/index.js +141 -0
- package/dist/index.js.map +1 -1
- package/docs/verbs/conference.md +51 -0
- package/docs/verbs/listen.md +71 -0
- package/docs/verbs/stream.md +5 -0
- package/package.json +13 -5
- package/schema/callbacks/base.schema.json +29 -0
- package/schema/callbacks/call-status.schema.json +22 -0
- package/schema/callbacks/conference-status.schema.json +24 -0
- package/schema/callbacks/conference-wait.schema.json +11 -0
- package/schema/callbacks/conference.schema.json +11 -0
- package/schema/callbacks/dequeue.schema.json +19 -0
- package/schema/callbacks/dial-dtmf.schema.json +18 -0
- package/schema/callbacks/dial-hold.schema.json +22 -0
- package/schema/callbacks/dial-refer.schema.json +28 -0
- package/schema/callbacks/dial.schema.json +31 -0
- package/schema/callbacks/enqueue-wait.schema.json +17 -0
- package/schema/callbacks/enqueue.schema.json +27 -0
- package/schema/callbacks/gather-partial.schema.json +54 -0
- package/schema/callbacks/gather.schema.json +60 -0
- package/schema/callbacks/listen.schema.json +21 -0
- package/schema/callbacks/llm.schema.json +30 -0
- package/schema/callbacks/message.schema.json +35 -0
- package/schema/callbacks/play.schema.json +36 -0
- package/schema/callbacks/session-new.schema.json +143 -0
- package/schema/callbacks/session-reconnect.schema.json +9 -0
- package/schema/callbacks/session-redirect.schema.json +38 -0
- package/schema/callbacks/sip-refer-event.schema.json +20 -0
- package/schema/callbacks/sip-refer.schema.json +22 -0
- package/schema/callbacks/sip-request.schema.json +27 -0
- package/schema/callbacks/transcribe-translation.schema.json +24 -0
- package/schema/callbacks/transcribe.schema.json +46 -0
- package/schema/callbacks/verb-status.schema.json +57 -0
- package/schema/components/actionHook.schema.json +1 -1
- package/schema/components/amd.schema.json +68 -0
- package/schema/components/recognizer-assemblyAiOptions.schema.json +35 -0
- package/schema/components/recognizer-awsOptions.schema.json +52 -0
- package/schema/components/recognizer-azureOptions.schema.json +32 -0
- package/schema/components/recognizer-cobaltOptions.schema.json +34 -0
- package/schema/components/recognizer-customOptions.schema.json +27 -0
- package/schema/components/recognizer-deepgramOptions.schema.json +147 -0
- package/schema/components/recognizer-elevenlabsOptions.schema.json +39 -0
- package/schema/components/recognizer-gladiaOptions.schema.json +8 -0
- package/schema/components/recognizer-googleOptions.schema.json +35 -0
- package/schema/components/recognizer-houndifyOptions.schema.json +53 -0
- package/schema/components/recognizer-ibmOptions.schema.json +54 -0
- package/schema/components/recognizer-nuanceOptions.schema.json +150 -0
- package/schema/components/recognizer-nvidiaOptions.schema.json +39 -0
- package/schema/components/recognizer-openaiOptions.schema.json +59 -0
- package/schema/components/recognizer-sonioxOptions.schema.json +46 -0
- package/schema/components/recognizer-speechmaticsOptions.schema.json +91 -0
- package/schema/components/recognizer-verbioOptions.schema.json +46 -0
- package/schema/components/recognizer.schema.json +17 -85
- package/schema/verbs/config.schema.json +8 -12
- package/schema/verbs/dial.schema.json +10 -14
- package/schema/verbs/gather.schema.json +1 -1
- package/schema/verbs/listen.schema.json +2 -3
- package/schema/verbs/llm.schema.json +1 -1
- package/schema/verbs/stream.schema.json +1 -1
- package/schema/verbs/transcribe.schema.json +4 -0
package/AGENTS.md
CHANGED

````diff
@@ -71,7 +71,194 @@ Individual verb schemas are in `schema/verbs/`. Shared component types (synthesi
 - **answer** — Explicitly answer the call.
 - **leave** — Leave a conference or queue.
 
-##
+## Using the @jambonz/sdk
+
+**IMPORTANT**: Always use the `@jambonz/sdk` package (version 0.1.x+) to build jambonz applications. The older packages `@jambonz/node-client-ws` and `@jambonz/node-client` are **deprecated and replaced** by `@jambonz/sdk`. Do NOT use those old packages. Do NOT build apps with raw JSON verb arrays and plain Express/WS — use `@jambonz/sdk`.
+
+Install: `npm install @jambonz/sdk`
+
+### Webhook Application (HTTP)
+
+Import `WebhookResponse` from `@jambonz/sdk/webhook`. Create an Express server, construct a `WebhookResponse` for each request, chain verb methods, and return it via `res.json()`.
+
+**Important**: Every webhook app must include a `POST /call-status` handler. jambonz sends call status change notifications (ringing, in-progress, completed, etc.) to this endpoint. The handler should log the event and return 200.
+
+```typescript
+import express from 'express';
+import { WebhookResponse } from '@jambonz/sdk/webhook';
+
+const app = express();
+app.use(express.json());
+
+app.post('/incoming', (req, res) => {
+  const jambonz = new WebhookResponse();
+  jambonz
+    .say({ text: 'Hello! Welcome to our service.' })
+    .gather({
+      input: ['speech', 'digits'],
+      actionHook: '/handle-input',
+      numDigits: 1,
+      timeout: 10,
+      say: { text: 'Press 1 for sales or 2 for support.' },
+    })
+    .say({ text: 'We did not receive any input. Goodbye.' })
+    .hangup();
+
+  res.json(jambonz);
+});
+
+app.post('/handle-input', (req, res) => {
+  const { digits, speech } = req.body;
+  const jambonz = new WebhookResponse();
+  jambonz.say({ text: `You pressed ${digits || 'nothing'}. Goodbye.` }).hangup();
+  res.json(jambonz);
+});
+
+// Every webhook app must handle call status events
+app.post('/call-status', (req, res) => {
+  console.log(`Call ${req.body.call_sid} status: ${req.body.call_status}`);
+  res.sendStatus(200);
+});
+
+app.listen(3000, () => console.log('Listening on port 3000'));
+```
+
+### WebSocket Application
+
+Import `createEndpoint` from `@jambonz/sdk/websocket`. Create an HTTP server, call `createEndpoint` to set up WebSocket handling, then register path-based services that receive `session` objects.
+
+```typescript
+import http from 'http';
+import { createEndpoint } from '@jambonz/sdk/websocket';
+
+const server = http.createServer();
+const makeService = createEndpoint({ server, port: 3000 });
+
+const svc = makeService({ path: '/' });
+
+svc.on('session:new', (session) => {
+  console.log(`Incoming call: ${session.callSid}`);
+
+  session
+    .say({ text: 'Hello from jambonz over WebSocket!' })
+    .hangup()
+    .send();
+});
+
+console.log('jambonz ws app listening on port 3000');
+```
+
+**Key differences from webhook**: Use `session` instead of `WebhookResponse`. Chain verbs the same way, but call `.send()` at the end to transmit the initial verb array over the WebSocket.
+
+### WebSocket actionHook Events (CRITICAL)
+
+In webhook mode, an `actionHook` is just a URL that jambonz POSTs to. In WebSocket mode, the `actionHook` value becomes an **event name** emitted on the session. You MUST bind a handler for it and respond with `.reply()`.
+
+**Key rules for WebSocket actionHook handling:**
+1. Use `session.on('/hookName', (evt) => {...})` to listen for the actionHook event.
+2. The `evt` object contains the actionHook payload (same fields as the webhook POST body: `reason`, `speech`, `digits`, etc.).
+3. Respond with `.reply()` — NOT `.send()`. `.send()` is only for the initial verb array (the first response to `session:new`). `.reply()` acknowledges the actionHook and provides the next verb array.
+4. If no listener is bound for an actionHook, the SDK auto-replies with an empty verb array.
+
+### WebSocket with Gather (speech echo example)
+
+```typescript
+import http from 'http';
+import { createEndpoint } from '@jambonz/sdk/websocket';
+
+const server = http.createServer();
+const makeService = createEndpoint({ server, port: 3000 });
+
+const svc = makeService({ path: '/' });
+
+svc.on('session:new', (session) => {
+  // Bind actionHook handler BEFORE sending verbs
+  session
+    .on('close', (code: number, _reason: Buffer) => {
+      console.log(`Session ${session.callSid} closed: ${code}`);
+    })
+    .on('error', (err: Error) => {
+      console.error(`Session ${session.callSid} error:`, err);
+    })
+    .on('/echo', (evt: Record<string, any>) => {
+      // This fires when the gather verb completes (actionHook: '/echo')
+      switch (evt.reason) {
+        case 'speechDetected': {
+          const transcript = evt.speech?.alternatives?.[0]?.transcript || 'nothing';
+          session
+            .say({ text: `You said: ${transcript}.` })
+            .gather({
+              input: ['speech'],
+              actionHook: '/echo',
+              timeout: 10,
+              say: { text: 'Please say something else.' },
+            })
+            .reply(); // reply() — NOT send()
+          break;
+        }
+        case 'timeout':
+          session
+            .gather({
+              input: ['speech'],
+              actionHook: '/echo',
+              timeout: 10,
+              say: { text: 'Are you still there? I didn\'t hear anything.' },
+            })
+            .reply();
+          break;
+        default:
+          session.reply();
+          break;
+      }
+    });
+
+  // Send initial verbs to jambonz
+  session
+    .pause({ length: 1 })
+    .gather({
+      input: ['speech'],
+      actionHook: '/echo',
+      timeout: 10,
+      say: { text: 'Please say something and I will echo it back to you.' },
+    })
+    .send(); // send() — first response only
+});
+
+console.log('Speech echo WebSocket app listening on port 3000');
+```
+
+**`.send()` vs `.reply()`:**
+- `.send()` — Use ONCE for the initial verb array in response to `session:new`. This acknowledges the session.
+- `.reply()` — Use for ALL subsequent responses (actionHook events, session:redirect). This acknowledges the hook message and provides the next verb array.
+
+### SDK Verb Method Reference
+
+Both `WebhookResponse` and `Session` support the same chainable verb methods:
+
+`.say(opts)` `.play(opts)` `.gather(opts)` `.dial(opts)` `.llm(opts)` `.conference(opts)` `.enqueue(opts)` `.dequeue(opts)` `.hangup()` `.pause(opts)` `.redirect(opts)` `.config(opts)` `.tag(opts)` `.dtmf(opts)` `.listen(opts)` `.transcribe(opts)` `.message(opts)` `.stream(opts)` `.pipeline(opts)` `.dub(opts)` `.alert(opts)` `.answer(opts)` `.leave()` `.sipDecline(opts)` `.sipRefer(opts)` `.sipRequest(opts)`
+
+All methods accept the same options as the corresponding verb JSON Schema. Methods are chainable — they return `this`.
+
+### REST API Client
+
+```typescript
+import { JambonzClient } from '@jambonz/sdk/client';
+
+const client = new JambonzClient({ baseUrl: 'https://api.jambonz.us', accountSid, apiKey });
+
+// Create an outbound call
+await client.calls.create({ from: '+15085551212', to: { type: 'phone', number: '+15085551213' }, call_hook: '/incoming' });
+
+// Mid-call control
+await client.calls.whisper(callSid, { verb: 'say', text: 'Supervisor listening.' });
+await client.calls.mute(callSid, 'mute');
+await client.calls.redirect(callSid, 'https://example.com/new-flow');
+await client.calls.update(callSid, { call_status: 'completed' });
+```
+
+## Common Patterns (Raw JSON)
+
+These are the raw JSON verb arrays that the SDK generates. You should use the SDK verb methods above, but these show the underlying structure for reference.
 
 ### Simple Greeting and Gather
 ```json
````
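The chained verb methods added in this hunk ultimately serialize to a plain verb array, which is what `res.json()` sends back to jambonz. As a rough illustration only (this is not the SDK source; the `VerbChain` class below is hypothetical), a chainable builder with a `toJSON()` method is all that is needed for `JSON.stringify` to emit the array:

```typescript
// Hypothetical sketch, NOT the @jambonz/sdk implementation: a minimal
// chainable builder that accumulates a jambonz-style verb array.
type Verb = Record<string, unknown>;

class VerbChain {
  private verbs: Verb[] = [];
  say(opts: { text: string }): this {
    this.verbs.push({ verb: 'say', ...opts });
    return this; // returning `this` is what makes the chaining work
  }
  gather(opts: Record<string, unknown>): this {
    this.verbs.push({ verb: 'gather', ...opts });
    return this;
  }
  hangup(): this {
    this.verbs.push({ verb: 'hangup' });
    return this;
  }
  // JSON.stringify (and Express res.json) call toJSON() automatically
  toJSON(): Verb[] {
    return this.verbs;
  }
}

const chain = new VerbChain()
  .say({ text: 'Hello! Welcome to our service.' })
  .gather({ input: ['speech', 'digits'], actionHook: '/handle-input', numDigits: 1, timeout: 10 })
  .hangup();

console.log(JSON.stringify(chain));
```

The real SDK adds validation and many more verbs, but the serialized shape is the same raw JSON shown in the "Common Patterns (Raw JSON)" section.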
````diff
@@ -151,7 +338,7 @@ When a verb completes, jambonz invokes the `actionHook` URL (webhook) or sends a
 
 **gather**: `speech` (object with `alternatives[].transcript`), `digits` (string), `reason` (`speechDetected`, `dtmfDetected`, `timeout`)
 
-**dial**: `dial_call_sid`, `dial_call_status`, `dial_sip_status`, `duration`
+**dial**: `dial_call_sid`, `dial_call_status`, `dial_sip_status`, `dial_sbc_callid`, `duration`
 
 **llm**: `completion_reason` (`normal`, `timeout`, `error`), `llm_usage` (token counts)
 
````
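The gather payload fields listed in that hunk (`reason`, `digits`, `speech.alternatives[].transcript`) can be folded into one result in a handler. A small sketch, assuming only the field names documented above (the `gatherResult` helper itself is hypothetical, not part of the SDK):

```typescript
// Hypothetical helper: normalize a gather actionHook payload to a single
// string result, using the documented fields reason/digits/speech.
interface GatherPayload {
  reason?: string; // 'speechDetected' | 'dtmfDetected' | 'timeout'
  digits?: string;
  speech?: { alternatives?: { transcript?: string; confidence?: number }[] };
}

function gatherResult(evt: GatherPayload): string {
  if (evt.reason === 'dtmfDetected' && evt.digits) return evt.digits;
  if (evt.reason === 'speechDetected') {
    // first alternative is the best hypothesis
    return evt.speech?.alternatives?.[0]?.transcript ?? '';
  }
  return ''; // timeout (or unrecognized reason): nothing captured
}

console.log(gatherResult({ reason: 'speechDetected', speech: { alternatives: [{ transcript: 'sales please' }] } })); // 'sales please'
console.log(gatherResult({ reason: 'dtmfDetected', digits: '1' })); // '1'
```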
````diff
@@ -159,6 +346,62 @@ When a verb completes, jambonz invokes the `actionHook` URL (webhook) or sends a
 
 **transcribe**: `transcription` (object with transcript text)
 
+## Application Environment Variables
+
+Applications can declare configurable parameters that the jambonz portal discovers via an HTTP `OPTIONS` request. The portal renders a UI form from the schema, and configured values are delivered as `env_vars` in the initial call payload.
+
+**When to suggest env vars**: If the user describes an application that includes values likely to change between deployments or users — such as phone numbers to dial, API keys, language preferences, greeting text, queue names, or timeout values — proactively ask whether they'd like to make those configurable via environment variables rather than hardcoding them. Many users won't think of this but will benefit from the flexibility.
+
+### Declaring Environment Variables
+
+Define a schema object where each key is a parameter name and the value describes its type and UI behavior:
+
+```typescript
+const envVars = {
+  API_KEY: { type: 'string', description: 'Your API key', required: true, obscure: true },
+  LANGUAGE: { type: 'string', description: 'TTS language', default: 'en-US', enum: ['en-US', 'es-ES', 'fr-FR'] },
+  MAX_RETRIES: { type: 'number', description: 'Max retry attempts', default: 3 },
+};
+```
+
+Each parameter supports: `type` (required: `'string'` | `'number'` | `'boolean'`), `description` (required), `required`, `default`, `enum`, and `obscure` (masks value in portal UI for secrets).
+
+### WebSocket Apps
+
+Pass `envVars` to `createEndpoint`. The SDK auto-responds to OPTIONS requests:
+
+```typescript
+const makeService = createEndpoint({ server, port: 3000, envVars });
+```
+
+Access values at runtime via `session.data`:
+
+```typescript
+svc.on('session:new', (session) => {
+  const apiKey = session.data.env_vars?.API_KEY;
+});
+```
+
+### Webhook Apps
+
+Use the `envVarsMiddleware` to auto-respond to OPTIONS requests:
+
+```typescript
+import { WebhookResponse, envVarsMiddleware } from '@jambonz/sdk/webhook';
+
+app.use(envVarsMiddleware(envVars));
+```
+
+Access values at runtime via `req.body`:
+
+```typescript
+app.post('/incoming', (req, res) => {
+  const apiKey = req.body.env_vars?.API_KEY;
+});
+```
+
+**Note**: `env_vars` is only present in the initial call webhook, not in subsequent actionHook callbacks.
+
 ## Mid-Call Control
 
 Active calls can be modified asynchronously — inject verbs, mute, redirect, or start recording while the call is in progress.
````
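The env-var schema shape in that hunk (`type`/`required`/`default`/`enum`/`obscure`) lends itself to a tiny resolver. The SDK and portal handle discovery and delivery for you; the `resolveEnvVars` function below is purely a hypothetical sketch of how an app could apply defaults and validate what arrives in `env_vars`:

```typescript
// Hypothetical validator (not SDK code) for the documented schema shape.
interface EnvVarSpec {
  type: 'string' | 'number' | 'boolean';
  description: string;
  required?: boolean;
  default?: unknown;
  enum?: unknown[];
  obscure?: boolean;
}

function resolveEnvVars(
  schema: Record<string, EnvVarSpec>,
  supplied: Record<string, unknown>,
): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [name, spec] of Object.entries(schema)) {
    const value = supplied[name] ?? spec.default; // fall back to declared default
    if (value === undefined) {
      if (spec.required) throw new Error(`missing required env var ${name}`);
      continue;
    }
    if (typeof value !== spec.type) throw new Error(`${name} must be a ${spec.type}`);
    if (spec.enum && !spec.enum.includes(value)) throw new Error(`${name} must be one of ${spec.enum}`);
    out[name] = value;
  }
  return out;
}

const schema: Record<string, EnvVarSpec> = {
  API_KEY: { type: 'string', description: 'Your API key', required: true, obscure: true },
  LANGUAGE: { type: 'string', description: 'TTS language', default: 'en-US', enum: ['en-US', 'es-ES', 'fr-FR'] },
  MAX_RETRIES: { type: 'number', description: 'Max retry attempts', default: 3 },
};

console.log(resolveEnvVars(schema, { API_KEY: 'sk-123' }));
```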
````diff
@@ -223,7 +466,8 @@ session.injectCommand('redirect', { call_hook: '/new-flow' });
 |------|-------------|
 | `session:new` | New call session established. Contains call details. |
 | `session:redirect` | Call was redirected to this app. |
-| `verb:
+| `verb:hook` | An actionHook fired (e.g. gather completed). Contains `hook` (the actionHook name) and `data` (the payload). The SDK emits this as `session.on('/hookName', handler)`. Respond with `.reply()`. |
+| `verb:status` | Informational verb status notification (no reply needed). |
 | `call:status` | Call state changed (e.g. `completed`). |
 | `llm:tool-call` | LLM requested a tool/function call. |
 | `llm:event` | LLM lifecycle event (connected, tokens, etc.). |
````
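The `verb:hook` row explains why `session.on('/hookName', handler)` works: the SDK re-emits the hook message under its hook name. A minimal sketch of that dispatch pattern, under assumed message shapes (the `TinyEmitter` and `dispatch` below are illustrative only, not the SDK internals):

```typescript
// Sketch of hook-name dispatch: a 'verb:hook' message is re-emitted under
// its hook name, falling back to a generic 'verb:hook' listener.
type Handler = (data: unknown) => void;

class TinyEmitter {
  private handlers = new Map<string, Handler[]>();
  on(event: string, h: Handler): this {
    const list = this.handlers.get(event) ?? [];
    list.push(h);
    this.handlers.set(event, list);
    return this;
  }
  // returns true if at least one listener ran
  emit(event: string, data: unknown): boolean {
    const list = this.handlers.get(event) ?? [];
    list.forEach((h) => h(data));
    return list.length > 0;
  }
}

function dispatch(em: TinyEmitter, msg: { type: string; hook?: string; data?: unknown }): void {
  if (msg.type === 'verb:hook' && msg.hook) {
    // specific listener first ('/echo' etc.), then the generic fallback
    if (!em.emit(msg.hook, msg.data)) em.emit('verb:hook', msg.data);
  } else {
    em.emit(msg.type, msg.data); // session:new, call:status, llm:event, ...
  }
}

const em = new TinyEmitter();
let got = '';
em.on('/echo', (data) => { got = (data as { reason: string }).reason; });
dispatch(em, { type: 'verb:hook', hook: '/echo', data: { reason: 'speechDetected' } });
console.log(got); // 'speechDetected'
```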
````diff
@@ -245,16 +489,185 @@ session.injectCommand('redirect', { call_hook: '/new-flow' });
 The SDK `Session` object emits events for common message types:
 
 ```typescript
-
-session.on('
+// ActionHook events — the hook name IS the event name. Respond with .reply()
+session.on('/echo', (data) => { /* gather actionHook fired */ session.say({text: '...'}).reply(); });
+session.on('/dial-result', (data) => { /* dial actionHook */ session.reply(); });
+session.on('/llm-complete', (data) => { /* llm actionHook */ session.hangup().reply(); });
+
+// Fallback — fires for any verb:hook without a specific listener
+session.on('verb:hook', (hook, data) => { /* generic actionHook handler */ });
+
+// Status events (informational — no reply needed)
+session.on('verb:status', (data) => { /* verb status notification */ });
 session.on('call:status', (data) => { /* call state change */ });
+
+// LLM events
 session.on('llm:tool-call', (data) => { /* tool call from LLM */ });
 session.on('llm:event', (data) => { /* LLM event */ });
-
+
+// TTS streaming — specific lifecycle events
+session.on('tts:stream_open', (data) => { /* vendor connection established */ });
+session.on('tts:stream_paused', (data) => { /* backpressure: buffer full */ });
+session.on('tts:stream_resumed', (data) => { /* backpressure released */ });
+session.on('tts:stream_closed', (data) => { /* TTS stream ended */ });
+session.on('tts:user_interruption', (data) => { /* user barge-in (with event data) */ });
+session.on('tts:user_interrupt', () => { /* user barge-in (convenience, no data) */ });
+// Catch-all for any TTS streaming event
+session.on('tts:streaming-event', (data) => { /* data.event_type has the type */ });
+
+// Connection lifecycle
 session.on('close', (code, reason) => { /* connection closed */ });
 session.on('error', (err) => { /* error */ });
 ```
 
+## Audio WebSocket (Listen/Stream)
+
+The `listen` and `stream` verbs open a separate WebSocket connection from jambonz to your application, carrying raw audio. This is independent of the control WebSocket (`ws.jambonz.org`) — it uses the `audio.drachtio.org` subprotocol.
+
+### Receiving Audio in the Same Application
+
+Use `makeService.audio()` to register an audio WebSocket handler on the same server that handles the control pipe:
+
+```typescript
+import http from 'http';
+import { createEndpoint } from '@jambonz/sdk/websocket';
+
+const server = http.createServer();
+const makeService = createEndpoint({ server, port: 3000 });
+
+// Control pipe — handles call sessions
+const svc = makeService({ path: '/' });
+
+// Audio pipe — receives listen/stream audio
+const audioSvc = makeService.audio({ path: '/audio-stream' });
+
+svc.on('session:new', (session) => {
+  session
+    .answer()
+    .say({ text: 'Recording your audio.' })
+    .listen({
+      url: '/audio-stream', // relative path — jambonz connects back to same server
+      sampleRate: 16000,
+      mixType: 'mono',
+      metadata: { purpose: 'recording' },
+    })
+    .hangup()
+    .send();
+});
+
+audioSvc.on('connection', (stream) => {
+  console.log(`Audio from call ${stream.callSid}, rate=${stream.sampleRate}`);
+  console.log('Metadata:', stream.metadata);
+
+  stream.on('audio', (pcm: Buffer) => {
+    // L16 PCM binary frames
+  });
+
+  stream.on('close', () => {
+    console.log('Audio stream closed');
+  });
+});
+```
+
+### AudioStream API
+
+The `stream` object in the `connection` event is an `AudioStream` instance:
+
+**Properties**: `metadata` (initial JSON), `callSid`, `sampleRate`
+
+**Events**:
+- `audio` — L16 PCM binary frame (`Buffer`)
+- `dtmf` — `{digit, duration}` (only if `passDtmf: true` on listen verb)
+- `playDone` — `{id}` (after non-streaming playAudio completes)
+- `mark` — `{name, event}` where event is `'playout'` or `'cleared'`
+- `close` — `(code, reason)`
+- `error` — `(err)`
+
+### Sending Audio Back (Bidirectional)
+
+The listen verb supports bidirectional audio. There are two modes, controlled by the `bidirectionalAudio.streaming` option on the listen verb.
+
+**Non-streaming mode** (`streaming: false`, the default) — send complete audio clips as base64:
+
+```typescript
+stream.playAudio(base64Content, {
+  audioContentType: 'raw', // or 'wav'
+  sampleRate: 16000,
+  id: 'greeting', // optional — returned in playDone event
+  queuePlay: true, // true: queue after current; false: interrupt (default)
+});
+
+stream.on('playDone', (evt) => {
+  console.log(`Finished playing: ${evt.id}`);
+});
+```
+
+Up to 10 playAudio commands can be queued simultaneously.
+
+**Streaming mode** (`streaming: true`) — send raw binary PCM frames directly:
+
+```typescript
+// In the listen verb config:
+// bidirectionalAudio: { enabled: true, streaming: true, sampleRate: 16000 }
+
+stream.on('audio', (pcm) => {
+  // Echo audio back (or send processed/generated audio)
+  stream.sendAudio(pcm);
+});
+```
+
+### Marks (Synchronization Markers)
+
+Marks let you track when streamed audio has been played out to the caller. They work **only with bidirectional streaming mode** — you must enable `bidirectionalAudio: { enabled: true, streaming: true }` on the listen verb.
+
+The pattern is: stream audio via `sendAudio()`, then send a mark. When all the audio sent before the mark finishes playing out, jambonz sends back a mark event with `event: 'playout'`. This is how you know the caller has heard a specific chunk of audio.
+
+```typescript
+// Listen verb must enable bidirectional streaming for marks to work
+session
+  .listen({
+    url: '/audio',
+    actionHook: '/listen-done',
+    bidirectionalAudio: {
+      enabled: true,
+      streaming: true,
+      sampleRate: 8000,
+    },
+  })
+  .send();
+
+// In the audio handler:
+audioSvc.on('connection', (stream) => {
+  // Stream audio, then mark a sync point
+  stream.sendAudio(pcmBuffer);
+  stream.sendMark('chunk-1'); // fires 'playout' when audio above finishes playing
+
+  stream.sendAudio(morePcm);
+  stream.sendMark('chunk-2'); // fires 'playout' when this audio finishes
+
+  // Listen for mark events
+  stream.on('mark', (evt) => {
+    // evt.name = 'chunk-1' or 'chunk-2'
+    // evt.event = 'playout' (audio played) or 'cleared' (mark was cleared)
+  });
+
+  // Clear all pending marks (unplayed marks get event='cleared')
+  stream.clearMarks();
+});
+```
+
+**Important**: Without `bidirectionalAudio.streaming: true`, marks are accepted but never fire — there is no playout buffer to sync against. This is the most common mistake when marks appear to silently fail.
+
+### Other Commands
+
+```typescript
+stream.killAudio(); // Stop playback, flush buffer
+stream.disconnect(); // Close connection, end listen verb
+stream.sendMark('sync-pt'); // Insert synchronization marker
+stream.clearMarks(); // Clear all pending markers
+stream.close(); // Close the WebSocket
+```
+
 ## Recording
 
 jambonz supports SIPREC-based call recording. Recording is controlled mid-call via inject commands (WebSocket) or future REST API extensions.
````
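For sizing the L16 PCM frames and base64 `playAudio` payloads described in that hunk, the arithmetic is fixed: L16 means 16-bit (2-byte) samples, and base64 inflates data by 4/3 with padding. A back-of-envelope helper (these functions are illustrative, not part of the SDK):

```typescript
// L16 PCM: 2 bytes per sample, so bytes = sampleRate * seconds * 2.
function l16Bytes(sampleRate: number, ms: number): number {
  return (sampleRate * ms / 1000) * 2;
}

// Base64 length with padding: 4 output chars per 3 input bytes, rounded up.
function base64Length(bytes: number): number {
  return Math.ceil(bytes / 3) * 4;
}

console.log(l16Bytes(16000, 20));      // 640 bytes per 20 ms frame at 16 kHz
console.log(l16Bytes(8000, 1000));     // 16000 bytes per second at 8 kHz
console.log(base64Length(l16Bytes(8000, 250))); // base64 size of 250 ms of 8 kHz audio
```

This is useful when budgeting buffer sizes for `sendAudio()` streaming or estimating the size of a base64 clip before handing it to `playAudio()`.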
````diff
@@ -285,14 +698,139 @@ Key resources:
 - **Calls** — Create outbound calls, query active calls, modify in-progress calls (redirect, whisper, mute, hangup)
 - **Messages** — Send SMS/MMS messages
 
+## Code Structure
+
+### Single File (default)
+
+For simple applications with 1-2 routes, put everything in a single file. This is the default for all examples in this repo and is perfectly suitable for production use.
+
+### Multi-File with Routes Directory
+
+For applications with 3+ routes or significant per-route logic, split into a `src/` directory with a routes folder:
+
+```
+src/
+  app.ts          ← entry point: server setup, route registration
+  routes/
+    incoming.ts   ← handler for one endpoint/path
+    hold-music.ts
+    queue-exit.ts
+```
+
+**Webhook pattern** — each route file exports an Express route handler:
+
+```typescript
+// src/routes/incoming.ts
+import type { Request, Response } from 'express';
+import { WebhookResponse } from '@jambonz/sdk/webhook';
+
+export default function incoming(_req: Request, res: Response) {
+  const jambonz = new WebhookResponse();
+  jambonz
+    .say({ text: 'Thank you for calling. Please hold.' })
+    .enqueue({ name: 'support', waitHook: '/hold-music', actionHook: '/queue-exit' });
+  res.json(jambonz);
+}
+```
+
+```typescript
+// src/app.ts
+import express from 'express';
+import incoming from './routes/incoming.js';
+import holdMusic from './routes/hold-music.js';
+import queueExit from './routes/queue-exit.js';
+
+const app = express();
+app.use(express.json());
+
+app.post('/incoming', incoming);
+app.post('/hold-music', holdMusic);
+app.post('/queue-exit', queueExit);
+
+app.listen(3000, () => console.log('Listening on port 3000'));
+```
+
+**WebSocket pattern** — there are two cases to consider:
+
+1. **Multiple services** (different `makeService({ path })` calls — each path gets its own `session:new`). Each route file exports a function that takes a session:
+
+```typescript
+// src/routes/caller.ts
+import type { Session } from '@jambonz/sdk/websocket';
+
+export default function caller(session: Session) {
+  session
+    .say({ text: 'Please hold.' })
+    .enqueue({ name: 'support', waitHook: '/hold-music', actionHook: '/queue-exit' })
+    .send();
+}
+```
+
+```typescript
+// src/app.ts
+import http from 'http';
+import { createEndpoint } from '@jambonz/sdk/websocket';
+import caller from './routes/caller.js';
+import agent from './routes/agent.js';
+
+const server = http.createServer();
+const makeService = createEndpoint({ server, port: 3000 });
+
+makeService({ path: '/incoming' }).on('session:new', (session) => caller(session));
+makeService({ path: '/agent' }).on('session:new', (session) => agent(session));
+```
+
+2. **Multiple actionHook handlers on one session** — extract handler functions, but register them all within `session:new`:
+
+```typescript
+// src/routes/echo-handler.ts
+import type { Session } from '@jambonz/sdk/websocket';
+
+export default function echoHandler(session: Session, evt: Record<string, any>) {
+  if (evt.reason === 'speechDetected') {
+    const text = evt.speech?.alternatives?.[0]?.transcript || 'nothing';
+    session.say({ text: `You said: ${text}` })
+      .gather({ input: ['speech'], actionHook: '/echo', timeout: 10 })
+      .reply();
+  } else {
+    session.gather({ input: ['speech'], actionHook: '/echo', timeout: 10,
+      say: { text: 'I didn\'t hear anything. Try again.' } }).reply();
+  }
+}
+```
+
+```typescript
+// src/app.ts — wire it up
+svc.on('session:new', (session) => {
+  session.on('/echo', (evt) => echoHandler(session, evt));
+  session.gather({ input: ['speech'], actionHook: '/echo', timeout: 10,
+    say: { text: 'Say something.' } }).send();
+});
+```
+
+### When to Split
+
+- **1-2 routes, simple logic** → single file
+- **3+ routes or substantial per-route logic** → `src/app.ts` + `src/routes/`
+- **Shared config, prompts, or utilities** → `src/config.ts`, `src/prompts.ts`, etc.
+
+When in doubt, start with a single file. It's easy to split later.
+
 ## Examples
 
 Complete working examples are in the `examples/` directory:
 - **hello-world** — Minimal greeting (webhook + WebSocket)
+- **echo** — Speech echo using gather with actionHook pattern (webhook + WebSocket). The canonical example for understanding actionHook event handling.
 - **ivr-menu** — Interactive menu with speech and DTMF input (webhook)
-- **
+- **dial** — Simple outbound dial to a phone number (webhook)
+- **listen-record** — Record audio using the listen verb to stream to a WebSocket (webhook)
+- **voice-agent** — LLM-powered conversational AI with tool calls (webhook + WebSocket)
+- **openai-realtime** — OpenAI Realtime API voice agent with function calling (WebSocket)
+- **deepgram-voice-agent** — Deepgram Voice Agent API with function calling (WebSocket)
+- **llm-streaming** — Anthropic LLM with TTS token streaming and barge-in (WebSocket)
 - **queue-with-hold** — Call queue with hold music and agent dequeue (webhook + WebSocket)
 - **call-recording** — Mid-call recording control via REST API and inject commands (webhook + WebSocket)
+- **realtime-translator** — Bridges two parties with real-time speech translation using STT, Google Translate, and TTS dub tracks. Multi-file example with `src/routes/` structure (WebSocket)
 
 ## Key Concepts
 
````
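The Code Structure hunk's multi-file pattern boils down to route modules exporting plain handler functions that an entry point wires to paths. Framework-free sketch of that wiring (the registry, paths, and handlers here are hypothetical, standing in for Express or `makeService` registration):

```typescript
// Hypothetical registry illustrating the src/app.ts + src/routes/ pattern:
// each "module" contributes one handler; the app maps paths to handlers.
type RouteHandler = (body: Record<string, unknown>) => unknown;

const routes = new Map<string, RouteHandler>();

function register(path: string, handler: RouteHandler): void {
  routes.set(path, handler);
}

// In the real pattern these handlers live in src/routes/*.ts and are imported
register('/incoming', () => [
  { verb: 'say', text: 'Thank you for calling. Please hold.' },
  { verb: 'enqueue', name: 'support', waitHook: '/hold-music', actionHook: '/queue-exit' },
]);
register('/hold-music', () => [{ verb: 'play', url: 'https://example.com/hold.mp3' }]);

// Dispatch, as Express's router would do for app.post(path, handler)
console.log(routes.get('/incoming')!({}));
```

The payoff of the split is exactly this shape: each route file stays testable in isolation, and the entry point is a flat list of path-to-handler bindings.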