@jambonz/mcp-schema-server 0.1.0 → 0.1.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (62)
  1. package/AGENTS.md +602 -7
  2. package/dist/index.js +141 -0
  3. package/dist/index.js.map +1 -1
  4. package/docs/verbs/conference.md +51 -0
  5. package/docs/verbs/dial.md +8 -0
  6. package/docs/verbs/listen.md +71 -0
  7. package/docs/verbs/stream.md +5 -0
  8. package/package.json +13 -5
  9. package/schema/callbacks/base.schema.json +29 -0
  10. package/schema/callbacks/call-status.schema.json +22 -0
  11. package/schema/callbacks/conference-status.schema.json +24 -0
  12. package/schema/callbacks/conference-wait.schema.json +11 -0
  13. package/schema/callbacks/conference.schema.json +11 -0
  14. package/schema/callbacks/dequeue.schema.json +19 -0
  15. package/schema/callbacks/dial-dtmf.schema.json +18 -0
  16. package/schema/callbacks/dial-hold.schema.json +22 -0
  17. package/schema/callbacks/dial-refer.schema.json +28 -0
  18. package/schema/callbacks/dial.schema.json +31 -0
  19. package/schema/callbacks/enqueue-wait.schema.json +17 -0
  20. package/schema/callbacks/enqueue.schema.json +27 -0
  21. package/schema/callbacks/gather-partial.schema.json +54 -0
  22. package/schema/callbacks/gather.schema.json +60 -0
  23. package/schema/callbacks/listen.schema.json +21 -0
  24. package/schema/callbacks/llm.schema.json +30 -0
  25. package/schema/callbacks/message.schema.json +35 -0
  26. package/schema/callbacks/play.schema.json +36 -0
  27. package/schema/callbacks/session-new.schema.json +143 -0
  28. package/schema/callbacks/session-reconnect.schema.json +9 -0
  29. package/schema/callbacks/session-redirect.schema.json +38 -0
  30. package/schema/callbacks/sip-refer-event.schema.json +20 -0
  31. package/schema/callbacks/sip-refer.schema.json +22 -0
  32. package/schema/callbacks/sip-request.schema.json +27 -0
  33. package/schema/callbacks/transcribe-translation.schema.json +24 -0
  34. package/schema/callbacks/transcribe.schema.json +46 -0
  35. package/schema/callbacks/verb-status.schema.json +57 -0
  36. package/schema/components/actionHook.schema.json +1 -1
  37. package/schema/components/amd.schema.json +68 -0
  38. package/schema/components/recognizer-assemblyAiOptions.schema.json +35 -0
  39. package/schema/components/recognizer-awsOptions.schema.json +52 -0
  40. package/schema/components/recognizer-azureOptions.schema.json +32 -0
  41. package/schema/components/recognizer-cobaltOptions.schema.json +34 -0
  42. package/schema/components/recognizer-customOptions.schema.json +27 -0
  43. package/schema/components/recognizer-deepgramOptions.schema.json +147 -0
  44. package/schema/components/recognizer-elevenlabsOptions.schema.json +39 -0
  45. package/schema/components/recognizer-gladiaOptions.schema.json +8 -0
  46. package/schema/components/recognizer-googleOptions.schema.json +35 -0
  47. package/schema/components/recognizer-houndifyOptions.schema.json +53 -0
  48. package/schema/components/recognizer-ibmOptions.schema.json +54 -0
  49. package/schema/components/recognizer-nuanceOptions.schema.json +150 -0
  50. package/schema/components/recognizer-nvidiaOptions.schema.json +39 -0
  51. package/schema/components/recognizer-openaiOptions.schema.json +59 -0
  52. package/schema/components/recognizer-sonioxOptions.schema.json +46 -0
  53. package/schema/components/recognizer-speechmaticsOptions.schema.json +91 -0
  54. package/schema/components/recognizer-verbioOptions.schema.json +46 -0
  55. package/schema/components/recognizer.schema.json +17 -85
  56. package/schema/verbs/config.schema.json +8 -12
  57. package/schema/verbs/dial.schema.json +10 -14
  58. package/schema/verbs/gather.schema.json +1 -1
  59. package/schema/verbs/listen.schema.json +2 -3
  60. package/schema/verbs/llm.schema.json +1 -1
  61. package/schema/verbs/stream.schema.json +1 -1
  62. package/schema/verbs/transcribe.schema.json +4 -0
package/AGENTS.md CHANGED
@@ -71,7 +71,211 @@ Individual verb schemas are in `schema/verbs/`. Shared component types (synthesi
71
71
  - **answer** — Explicitly answer the call.
72
72
  - **leave** — Leave a conference or queue.
73
73
 
74
- ## Common Patterns
74
+ ### Verb synonyms
75
+ - **listen** and **stream** are synonyms, but always use **stream** when generating application code.
76
+ ## Using the @jambonz/sdk
77
+
78
+ **IMPORTANT**: Always use the `@jambonz/sdk` package (version 0.1.x+) to build jambonz applications. The older packages `@jambonz/node-client-ws` and `@jambonz/node-client` are **deprecated and replaced** by `@jambonz/sdk`. Do NOT use those old packages. Do NOT build apps with raw JSON verb arrays and plain Express/WS — use `@jambonz/sdk`.
79
+
80
+ **IMPORTANT — App Environment Variables vs process.env**: jambonz applications should NEVER use `process.env` for application-configurable values (phone numbers, API keys, language preferences, greeting text, etc.). Instead, use **jambonz application environment variables** — a two-step pattern:
81
+ 1. **Declare** the variables so the jambonz portal can discover them (via `envVars` option for WebSocket, or `envVarsMiddleware` for webhook).
82
+ 2. **Read** the values at runtime from the call payload (`session.data.env_vars` for WebSocket, `req.body.env_vars` for webhook).
83
+
84
+ Both steps are required. Declaring without reading means values are ignored. Reading without declaring means the portal won't know about them and won't send them. See the [Application Environment Variables](#application-environment-variables) section for full details.
85
+
86
+ Install: `npm install @jambonz/sdk`
87
+
88
+ **Dependencies**: Webhook apps also require `express` (`npm install express`). WebSocket apps have no additional dependencies — the SDK includes `ws` internally. When generating a `package.json`, always include all required dependencies.
89
+
90
+ ### Webhook Application (HTTP)
91
+
92
+ Import `WebhookResponse` from `@jambonz/sdk/webhook`. Create an Express server, construct a `WebhookResponse` for each request, chain verb methods, and return it via `res.json()`.
93
+
94
+ **Best practice**: Always include a `POST /call-status` handler. jambonz sends call status change notifications (ringing, in-progress, completed, etc.) to this endpoint. The handler should log the event and return 200. The path `/call-status` is conventional but the user may choose a different path:
95
+
96
+ ```typescript
97
+ app.post('/call-status', (req, res) => {
98
+ console.log(`Call ${req.body.call_sid} status: ${req.body.call_status}`);
99
+ res.sendStatus(200);
100
+ });
101
+ ```
102
+
103
+ ```typescript
104
+ import express from 'express';
105
+ import { WebhookResponse } from '@jambonz/sdk/webhook';
106
+
107
+ const app = express();
108
+ app.use(express.json());
109
+
110
+ app.post('/incoming', (req, res) => {
111
+ const jambonz = new WebhookResponse();
112
+ jambonz
113
+ .say({ text: 'Hello! Welcome to our service.' })
114
+ .gather({
115
+ input: ['speech', 'digits'],
116
+ actionHook: '/handle-input',
117
+ numDigits: 1,
118
+ timeout: 10,
119
+ say: { text: 'Press 1 for sales or 2 for support.' },
120
+ })
121
+ .say({ text: 'We did not receive any input. Goodbye.' })
122
+ .hangup();
123
+
124
+ res.json(jambonz);
125
+ });
126
+
127
+ app.post('/handle-input', (req, res) => {
128
+ const { digits, speech } = req.body;
129
+ const jambonz = new WebhookResponse();
130
+ jambonz.say({ text: `You pressed ${digits || 'nothing'}. Goodbye.` }).hangup();
131
+ res.json(jambonz);
132
+ });
133
+
134
+ // Every webhook app must handle call status events
135
+ app.post('/call-status', (req, res) => {
136
+ console.log(`Call ${req.body.call_sid} status: ${req.body.call_status}`);
137
+ res.sendStatus(200);
138
+ });
139
+
140
+ app.listen(3000, () => console.log('Listening on port 3000'));
141
+ ```
142
+
143
+ ### WebSocket Application
144
+
145
+ Import `createEndpoint` from `@jambonz/sdk/websocket`. Create an HTTP server, call `createEndpoint` to set up WebSocket handling, then register path-based services that receive `session` objects.
146
+
147
+ ```typescript
148
+ import http from 'http';
149
+ import { createEndpoint } from '@jambonz/sdk/websocket';
150
+
151
+ const server = http.createServer();
152
+ const makeService = createEndpoint({ server, port: 3000 });
153
+
154
+ const svc = makeService({ path: '/' });
155
+
156
+ svc.on('session:new', (session) => {
157
+ console.log(`Incoming call: ${session.callSid}`);
158
+
159
+ session
160
+ .say({ text: 'Hello from jambonz over WebSocket!' })
161
+ .hangup()
162
+ .send();
163
+ });
164
+
165
+ console.log('jambonz ws app listening on port 3000');
166
+ ```
167
+
168
+ **Key differences from webhook**: Use `session` instead of `WebhookResponse`. Chain verbs the same way, but call `.send()` at the end to transmit the initial verb array over the WebSocket.
169
+
170
+ ### WebSocket actionHook Events (CRITICAL)
171
+
172
+ In webhook mode, an `actionHook` is just a URL that jambonz POSTs to. In WebSocket mode, the `actionHook` value becomes an **event name** emitted on the session. You MUST bind a handler for it and respond with `.reply()`.
173
+
174
+ **Key rules for WebSocket actionHook handling:**
175
+ 1. Use `session.on('/hookName', (evt) => {...})` to listen for the actionHook event.
176
+ 2. The `evt` object contains the actionHook payload (same fields as the webhook POST body: `reason`, `speech`, `digits`, etc.).
177
+ 3. Respond with `.reply()` — NOT `.send()`. `.send()` is only for the initial verb array (the first response to `session:new`). `.reply()` acknowledges the actionHook and provides the next verb array.
178
+ 4. If no listener is bound for an actionHook, the SDK auto-replies with an empty verb array.
179
+
180
+ ### WebSocket with Gather (speech echo example)
181
+
182
+ ```typescript
183
+ import http from 'http';
184
+ import { createEndpoint } from '@jambonz/sdk/websocket';
185
+
186
+ const server = http.createServer();
187
+ const makeService = createEndpoint({ server, port: 3000 });
188
+
189
+ const svc = makeService({ path: '/' });
190
+
191
+ svc.on('session:new', (session) => {
192
+ // Bind actionHook handler BEFORE sending verbs
193
+ session
194
+ .on('close', (code: number, _reason: Buffer) => {
195
+ console.log(`Session ${session.callSid} closed: ${code}`);
196
+ })
197
+ .on('error', (err: Error) => {
198
+ console.error(`Session ${session.callSid} error:`, err);
199
+ })
200
+ .on('/echo', (evt: Record<string, any>) => {
201
+ // This fires when the gather verb completes (actionHook: '/echo')
202
+ switch (evt.reason) {
203
+ case 'speechDetected': {
204
+ const transcript = evt.speech?.alternatives?.[0]?.transcript || 'nothing';
205
+ session
206
+ .say({ text: `You said: ${transcript}.` })
207
+ .gather({
208
+ input: ['speech'],
209
+ actionHook: '/echo',
210
+ timeout: 10,
211
+ say: { text: 'Please say something else.' },
212
+ })
213
+ .reply(); // reply() — NOT send()
214
+ break;
215
+ }
216
+ case 'timeout':
217
+ session
218
+ .gather({
219
+ input: ['speech'],
220
+ actionHook: '/echo',
221
+ timeout: 10,
222
+ say: { text: 'Are you still there? I didn\'t hear anything.' },
223
+ })
224
+ .reply();
225
+ break;
226
+ default:
227
+ session.reply();
228
+ break;
229
+ }
230
+ });
231
+
232
+ // Send initial verbs to jambonz
233
+ session
234
+ .pause({ length: 1 })
235
+ .gather({
236
+ input: ['speech'],
237
+ actionHook: '/echo',
238
+ timeout: 10,
239
+ say: { text: 'Please say something and I will echo it back to you.' },
240
+ })
241
+ .send(); // send() — first response only
242
+ });
243
+
244
+ console.log('Speech echo WebSocket app listening on port 3000');
245
+ ```
246
+
247
+ **`.send()` vs `.reply()`:**
248
+ - `.send()` — Use ONCE for the initial verb array in response to `session:new`. This acknowledges the session.
249
+ - `.reply()` — Use for ALL subsequent responses (actionHook events, session:redirect). This acknowledges the hook message and provides the next verb array.
250
+
251
+ ### SDK Verb Method Reference
252
+
253
+ Both `WebhookResponse` and `Session` support the same chainable verb methods:
254
+
255
+ `.say(opts)` `.play(opts)` `.gather(opts)` `.dial(opts)` `.llm(opts)` `.conference(opts)` `.enqueue(opts)` `.dequeue(opts)` `.hangup()` `.pause(opts)` `.redirect(opts)` `.config(opts)` `.tag(opts)` `.dtmf(opts)` `.listen(opts)` `.transcribe(opts)` `.message(opts)` `.stream(opts)` `.pipeline(opts)` `.dub(opts)` `.alert(opts)` `.answer(opts)` `.leave()` `.sipDecline(opts)` `.sipRefer(opts)` `.sipRequest(opts)`
256
+
257
+ All methods accept the same options as the corresponding verb JSON Schema. Methods are chainable — they return `this`.
258
+
259
+ ### REST API Client
260
+
261
+ ```typescript
262
+ import { JambonzClient } from '@jambonz/sdk/client';
263
+
264
+ const client = new JambonzClient({ baseUrl: 'https://api.jambonz.us', accountSid, apiKey });
265
+
266
+ // Create an outbound call
267
+ await client.calls.create({ from: '+15085551212', to: { type: 'phone', number: '+15085551213' }, call_hook: '/incoming' });
268
+
269
+ // Mid-call control
270
+ await client.calls.whisper(callSid, { verb: 'say', text: 'Supervisor listening.' });
271
+ await client.calls.mute(callSid, 'mute');
272
+ await client.calls.redirect(callSid, 'https://example.com/new-flow');
273
+ await client.calls.update(callSid, { call_status: 'completed' });
274
+ ```
275
+
276
+ ## Common Patterns (Raw JSON)
277
+
278
+ These are the raw JSON verb arrays that the SDK generates. You should use the SDK verb methods above, but these show the underlying structure for reference.
75
279
 
76
280
  ### Simple Greeting and Gather
77
281
  ```json
@@ -151,7 +355,7 @@ When a verb completes, jambonz invokes the `actionHook` URL (webhook) or sends a
151
355
 
152
356
  **gather**: `speech` (object with `alternatives[].transcript`), `digits` (string), `reason` (`speechDetected`, `dtmfDetected`, `timeout`)
153
357
 
154
- **dial**: `dial_call_sid`, `dial_call_status`, `dial_sip_status`, `duration`
358
+ **dial**: `dial_call_sid`, `dial_call_status`, `dial_sip_status`, `dial_sbc_callid`, `duration`
155
359
 
156
360
  **llm**: `completion_reason` (`normal`, `timeout`, `error`), `llm_usage` (token counts)
157
361
 
@@ -159,6 +363,102 @@ When a verb completes, jambonz invokes the `actionHook` URL (webhook) or sends a
159
363
 
160
364
  **transcribe**: `transcription` (object with transcript text)
161
365
 
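As a sketch of consuming these payload fields, here is a hypothetical helper that summarizes a dial actionHook result (the `summarizeDial` name and `DialResult` type are illustrative, not part of `@jambonz/sdk`; the field names come from the dial payload listed above):

```typescript
// Illustrative only: the fields mirror the dial actionHook payload above;
// the helper itself is hypothetical, not an SDK export.
interface DialResult {
  dial_call_sid?: string;
  dial_call_status?: string;
  dial_sip_status?: number;
  dial_sbc_callid?: string;
  duration?: number;
}

function summarizeDial(evt: DialResult): string {
  if (evt.dial_call_status === 'completed') {
    return `Dialed leg ${evt.dial_call_sid} completed after ${evt.duration}s`;
  }
  return `Dial ended: status=${evt.dial_call_status}, sip=${evt.dial_sip_status}`;
}
```

In a webhook app these fields arrive in `req.body` of the actionHook POST; in a WebSocket app they arrive in the event payload passed to `session.on('/hookName', handler)`.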
366
+ ## Application Environment Variables
367
+
368
+ jambonz has a built-in mechanism for application configuration that is **always preferred over `process.env`**. It works in two required steps:
369
+
370
+ 1. **Declare** — Your app declares its configurable parameters with a schema. The jambonz portal discovers these via an HTTP `OPTIONS` request and renders a configuration form for administrators.
371
+ 2. **Receive** — When a call arrives, jambonz delivers the configured values in the call payload as `env_vars`. Your app reads them from there.
372
+
373
+ **IMPORTANT**: Both steps are required. If you only declare without reading, the values are ignored. If you only read without declaring, the portal won't discover the parameters and won't send them. NEVER use `process.env` for values that should be configurable per-application in the jambonz portal.
374
+
375
+ **When to use env vars**: Phone numbers to dial, API keys, language/voice preferences, greeting text, queue names, timeout values, feature flags, or any value that might change between deployments or users. If in doubt, make it an env var.
376
+
377
+ ### Step 1: Define the Schema
378
+
379
+ Define a schema object where each key is a parameter name and the value describes its type and UI behavior:
380
+
381
+ ```typescript
382
+ const envVars = {
383
+ API_KEY: { type: 'string', description: 'Your API key', required: true, obscure: true },
384
+ LANGUAGE: { type: 'string', description: 'TTS language', default: 'en-US', enum: ['en-US', 'es-ES', 'fr-FR'] },
385
+ MAX_RETRIES: { type: 'number', description: 'Max retry attempts', default: 3 },
386
+ CARRIER: { type: 'string', description: 'Outbound carrier', jambonzResource: 'carriers' },
387
+ SYSTEM_PROMPT: { type: 'string', description: 'LLM system prompt', uiHint: 'textarea' },
388
+ TLS_CERT: { type: 'string', description: 'TLS certificate', uiHint: 'filepicker' },
389
+ };
390
+ ```
391
+
392
+ Each parameter supports:
393
+
394
+ | Property | Required | Description |
395
+ |----------|----------|-------------|
396
+ | `type` | Yes | `'string'` \| `'number'` \| `'boolean'` |
397
+ | `description` | Yes | Human-readable label shown in the portal |
398
+ | `required` | No | Whether the user must provide a value |
399
+ | `default` | No | Pre-filled default value |
400
+ | `enum` | No | Array of allowed values — renders as a dropdown |
401
+ | `obscure` | No | Masks the value in the portal UI (for secrets/API keys) |
402
+ | `uiHint` | No | `'input'` (default single-line), `'textarea'` (multi-line), or `'filepicker'` (file upload with textarea) |
403
+ | `jambonzResource` | No | Populate a dropdown from jambonz account data. Currently supports `'carriers'` (lists VoIP carriers on the account) |
404
+
405
+ **Notes on `jambonzResource`**: When set to `'carriers'`, the portal fetches the VoIP carriers configured for the account and renders them as a dropdown. The stored value is the carrier name. This is preferred over hardcoding carrier names or using `enum` with static values.
406
+
407
+ ### Step 2: Register and Read — WebSocket Apps
408
+
409
+ Pass `envVars` to `createEndpoint` to register the declaration (the SDK auto-responds to OPTIONS requests), then read values from `session.data.env_vars`:
410
+
411
+ ```typescript
412
+ import http from 'http';
413
+ import { createEndpoint } from '@jambonz/sdk/websocket';
414
+
415
+ const envVars = {
416
+ GREETING: { type: 'string', description: 'Greeting message', default: 'Hello!' },
417
+ LANGUAGE: { type: 'string', description: 'TTS language', default: 'en-US' },
418
+ };
419
+
420
+ const server = http.createServer();
421
+ const makeService = createEndpoint({ server, port: 3000, envVars }); // Step 1: declare
422
+
423
+ const svc = makeService({ path: '/' });
424
+
425
+ svc.on('session:new', (session) => {
426
+ const greeting = session.data.env_vars?.GREETING || 'Hello!'; // Step 2: read
427
+ const language = session.data.env_vars?.LANGUAGE || 'en-US';
428
+
429
+ session.say({ text: greeting, language }).hangup().send();
430
+ });
431
+ ```
432
+
433
+ ### Step 2: Register and Read — Webhook Apps
434
+
435
+ Use `envVarsMiddleware` to register the declaration (auto-responds to OPTIONS requests), then read values from `req.body.env_vars`:
436
+
437
+ ```typescript
438
+ import express from 'express';
439
+ import { WebhookResponse, envVarsMiddleware } from '@jambonz/sdk/webhook';
440
+
441
+ const envVars = {
442
+ GREETING: { type: 'string', description: 'Greeting message', default: 'Hello!' },
443
+ LANGUAGE: { type: 'string', description: 'TTS language', default: 'en-US' },
444
+ };
445
+
446
+ const app = express();
447
+ app.use(express.json());
448
+ app.use(envVarsMiddleware(envVars)); // Step 1: declare
449
+
450
+ app.post('/incoming', (req, res) => {
451
+ const greeting = req.body.env_vars?.GREETING || 'Hello!'; // Step 2: read
452
+ const language = req.body.env_vars?.LANGUAGE || 'en-US';
453
+
454
+ const jambonz = new WebhookResponse();
455
+ jambonz.say({ text: greeting, language }).hangup();
456
+ res.json(jambonz);
457
+ });
458
+ ```
459
+
460
+ **Note**: `env_vars` is only present in the initial call webhook (or `session:new` for WebSocket), not in subsequent actionHook callbacks. If you need env var values in actionHook handlers, store them in a variable during the initial call.
461
+
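One way to follow that advice is to resolve configuration once when the call arrives and close over it in the actionHook handlers. The `resolveConfig` helper below is illustrative, not an SDK export; the `GREETING` and `LANGUAGE` names match the env var examples above:

```typescript
type EnvVars = Record<string, string | undefined>;

// Illustrative helper (not part of @jambonz/sdk): resolve env vars with defaults.
// Call it once per call, since env_vars is absent from later actionHook payloads.
function resolveConfig(envVars?: EnvVars) {
  return {
    greeting: envVars?.GREETING ?? 'Hello!',
    language: envVars?.LANGUAGE ?? 'en-US',
  };
}

// Sketch of use in a WebSocket app:
// svc.on('session:new', (session) => {
//   const cfg = resolveConfig(session.data.env_vars); // read once at session:new
//   session.on('/done', () => {
//     // cfg is still available here via the closure
//     session.say({ text: cfg.greeting, language: cfg.language }).hangup().reply();
//   });
//   session.gather({ input: ['speech'], actionHook: '/done', timeout: 10,
//     say: { text: cfg.greeting } }).send();
// });
```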
162
462
  ## Mid-Call Control
163
463
 
164
464
  Active calls can be modified asynchronously — inject verbs, mute, redirect, or start recording while the call is in progress.
@@ -223,7 +523,8 @@ session.injectCommand('redirect', { call_hook: '/new-flow' });
223
523
  |------|-------------|
224
524
  | `session:new` | New call session established. Contains call details. |
225
525
  | `session:redirect` | Call was redirected to this app. |
226
- | `verb:status` | A verb completed or changed status. Contains actionHook data. |
526
+ | `verb:hook` | An actionHook fired (e.g. gather completed). Contains `hook` (the actionHook name) and `data` (the payload). The SDK emits this as `session.on('/hookName', handler)`. Respond with `.reply()`. |
527
+ | `verb:status` | Informational verb status notification (no reply needed). |
227
528
  | `call:status` | Call state changed (e.g. `completed`). |
228
529
  | `llm:tool-call` | LLM requested a tool/function call. |
229
530
  | `llm:event` | LLM lifecycle event (connected, tokens, etc.). |
@@ -245,16 +546,185 @@ session.injectCommand('redirect', { call_hook: '/new-flow' });
245
546
  The SDK `Session` object emits events for common message types:
246
547
 
247
548
  ```typescript
248
- session.on('session:new', (session) => { /* new call */ });
249
- session.on('verb:status', (data) => { /* verb completed */ });
549
+ // ActionHook events: the hook name IS the event name. Respond with .reply()
550
+ session.on('/echo', (data) => { /* gather actionHook fired */ session.say({text: '...'}).reply(); });
551
+ session.on('/dial-result', (data) => { /* dial actionHook */ session.reply(); });
552
+ session.on('/llm-complete', (data) => { /* llm actionHook */ session.hangup().reply(); });
553
+
554
+ // Fallback — fires for any verb:hook without a specific listener
555
+ session.on('verb:hook', (hook, data) => { /* generic actionHook handler */ });
556
+
557
+ // Status events (informational — no reply needed)
558
+ session.on('verb:status', (data) => { /* verb status notification */ });
250
559
  session.on('call:status', (data) => { /* call state change */ });
560
+
561
+ // LLM events
251
562
  session.on('llm:tool-call', (data) => { /* tool call from LLM */ });
252
563
  session.on('llm:event', (data) => { /* LLM event */ });
253
- session.on('tts:user_interrupt', () => { /* user interrupted TTS */ });
564
+
565
+ // TTS streaming — specific lifecycle events
566
+ session.on('tts:stream_open', (data) => { /* vendor connection established */ });
567
+ session.on('tts:stream_paused', (data) => { /* backpressure: buffer full */ });
568
+ session.on('tts:stream_resumed', (data) => { /* backpressure released */ });
569
+ session.on('tts:stream_closed', (data) => { /* TTS stream ended */ });
570
+ session.on('tts:user_interruption', (data) => { /* user barge-in (with event data) */ });
571
+ session.on('tts:user_interrupt', () => { /* user barge-in (convenience, no data) */ });
572
+ // Catch-all for any TTS streaming event
573
+ session.on('tts:streaming-event', (data) => { /* data.event_type has the type */ });
574
+
575
+ // Connection lifecycle
254
576
  session.on('close', (code, reason) => { /* connection closed */ });
255
577
  session.on('error', (err) => { /* error */ });
256
578
  ```
257
579
 
580
+ ## Audio WebSocket (Listen/Stream)
581
+
582
+ The `listen` and `stream` verbs open a separate WebSocket connection from jambonz to your application, carrying raw audio. This is independent of the control WebSocket (`ws.jambonz.org`) — it uses the `audio.drachtio.org` subprotocol.
583
+
584
+ ### Receiving Audio in the Same Application
585
+
586
+ Use `makeService.audio()` to register an audio WebSocket handler on the same server that handles the control pipe:
587
+
588
+ ```typescript
589
+ import http from 'http';
590
+ import { createEndpoint } from '@jambonz/sdk/websocket';
591
+
592
+ const server = http.createServer();
593
+ const makeService = createEndpoint({ server, port: 3000 });
594
+
595
+ // Control pipe — handles call sessions
596
+ const svc = makeService({ path: '/' });
597
+
598
+ // Audio pipe — receives listen/stream audio
599
+ const audioSvc = makeService.audio({ path: '/audio-stream' });
600
+
601
+ svc.on('session:new', (session) => {
602
+ session
603
+ .answer()
604
+ .say({ text: 'Recording your audio.' })
605
+ .listen({
606
+ url: '/audio-stream', // relative path — jambonz connects back to same server
607
+ sampleRate: 16000,
608
+ mixType: 'mono',
609
+ metadata: { purpose: 'recording' },
610
+ })
611
+ .hangup()
612
+ .send();
613
+ });
614
+
615
+ audioSvc.on('connection', (stream) => {
616
+ console.log(`Audio from call ${stream.callSid}, rate=${stream.sampleRate}`);
617
+ console.log('Metadata:', stream.metadata);
618
+
619
+ stream.on('audio', (pcm: Buffer) => {
620
+ // L16 PCM binary frames
621
+ });
622
+
623
+ stream.on('close', () => {
624
+ console.log('Audio stream closed');
625
+ });
626
+ });
627
+ ```
628
+
629
+ ### AudioStream API
630
+
631
+ The `stream` object in the `connection` event is an `AudioStream` instance:
632
+
633
+ **Properties**: `metadata` (initial JSON), `callSid`, `sampleRate`
634
+
635
+ **Events**:
636
+ - `audio` — L16 PCM binary frame (`Buffer`)
637
+ - `dtmf` — `{digit, duration}` (only if `passDtmf: true` on listen verb)
638
+ - `playDone` — `{id}` (after non-streaming playAudio completes)
639
+ - `mark` — `{name, event}` where event is `'playout'` or `'cleared'`
640
+ - `close` — `(code, reason)`
641
+ - `error` — `(err)`
642
+
643
+ ### Sending Audio Back (Bidirectional)
644
+
645
+ The listen verb supports bidirectional audio. There are two modes, controlled by the `bidirectionalAudio.streaming` option on the listen verb.
646
+
647
+ **Non-streaming mode** (`streaming: false`, the default) — send complete audio clips as base64:
648
+
649
+ ```typescript
650
+ stream.playAudio(base64Content, {
651
+ audioContentType: 'raw', // or 'wav'
652
+ sampleRate: 16000,
653
+ id: 'greeting', // optional — returned in playDone event
654
+ queuePlay: true, // false (default) interrupts current playback; true queues after it
655
+ });
656
+
657
+ stream.on('playDone', (evt) => {
658
+ console.log(`Finished playing: ${evt.id}`);
659
+ });
660
+ ```
661
+
662
+ Up to 10 playAudio commands can be queued simultaneously.
663
+
664
+ **Streaming mode** (`streaming: true`) — send raw binary PCM frames directly:
665
+
666
+ ```typescript
667
+ // In the listen verb config:
668
+ // bidirectionalAudio: { enabled: true, streaming: true, sampleRate: 16000 }
669
+
670
+ stream.on('audio', (pcm) => {
671
+ // Echo audio back (or send processed/generated audio)
672
+ stream.sendAudio(pcm);
673
+ });
674
+ ```
675
+
676
+ ### Marks (Synchronization Markers)
677
+
678
+ Marks let you track when streamed audio has been played out to the caller. They work **only with bidirectional streaming mode** — you must enable `bidirectionalAudio: { enabled: true, streaming: true }` on the listen verb.
679
+
680
+ The pattern is: stream audio via `sendAudio()`, then send a mark. When all the audio sent before the mark finishes playing out, jambonz sends back a mark event with `event: 'playout'`. This is how you know the caller has heard a specific chunk of audio.
681
+
682
+ ```typescript
683
+ // Listen verb must enable bidirectional streaming for marks to work
684
+ session
685
+ .listen({
686
+ url: '/audio',
687
+ actionHook: '/listen-done',
688
+ bidirectionalAudio: {
689
+ enabled: true,
690
+ streaming: true,
691
+ sampleRate: 8000,
692
+ },
693
+ })
694
+ .send();
695
+
696
+ // In the audio handler:
697
+ audioSvc.on('connection', (stream) => {
698
+ // Stream audio, then mark a sync point
699
+ stream.sendAudio(pcmBuffer);
700
+ stream.sendMark('chunk-1'); // fires 'playout' when audio above finishes playing
701
+
702
+ stream.sendAudio(morePcm);
703
+ stream.sendMark('chunk-2'); // fires 'playout' when this audio finishes
704
+
705
+ // Listen for mark events
706
+ stream.on('mark', (evt) => {
707
+ // evt.name = 'chunk-1' or 'chunk-2'
708
+ // evt.event = 'playout' (audio played) or 'cleared' (mark was cleared)
709
+ });
710
+
711
+ // Clear all pending marks (unplayed marks get event='cleared')
712
+ stream.clearMarks();
713
+ });
714
+ ```
715
+
716
+ **Important**: Without `bidirectionalAudio.streaming: true`, marks are accepted but never fire — there is no playout buffer to sync against. This is the most common cause of marks appearing to fail silently.
717
+
718
+ ### Other Commands
719
+
720
+ ```typescript
721
+ stream.killAudio(); // Stop playback, flush buffer
722
+ stream.disconnect(); // Close connection, end listen verb
723
+ stream.sendMark('sync-pt'); // Insert synchronization marker
724
+ stream.clearMarks(); // Clear all pending markers
725
+ stream.close(); // Close the WebSocket
726
+ ```
727
+
258
728
  ## Recording
259
729
 
260
730
  jambonz supports SIPREC-based call recording. Recording is controlled mid-call via inject commands (WebSocket) or future REST API extensions.
@@ -285,14 +755,139 @@ Key resources:
285
755
  - **Calls** — Create outbound calls, query active calls, modify in-progress calls (redirect, whisper, mute, hangup)
286
756
  - **Messages** — Send SMS/MMS messages
287
757
 
758
+ ## Code Structure
759
+
760
+ ### Single File (default)
761
+
762
+ For simple applications with 1-2 routes, put everything in a single file. This is the default for all examples in this repo and is perfectly suitable for production use.
763
+
764
+ ### Multi-File with Routes Directory
765
+
766
+ For applications with 3+ routes or significant per-route logic, split into a `src/` directory with a routes folder:
767
+
768
+ ```
769
+ src/
770
+ app.ts ← entry point: server setup, route registration
771
+ routes/
772
+ incoming.ts ← handler for one endpoint/path
773
+ hold-music.ts
774
+ queue-exit.ts
775
+ ```
776
+
777
+ **Webhook pattern** — each route file exports an Express route handler:
778
+
779
+ ```typescript
780
+ // src/routes/incoming.ts
781
+ import type { Request, Response } from 'express';
782
+ import { WebhookResponse } from '@jambonz/sdk/webhook';
783
+
784
+ export default function incoming(_req: Request, res: Response) {
785
+ const jambonz = new WebhookResponse();
786
+ jambonz
787
+ .say({ text: 'Thank you for calling. Please hold.' })
788
+ .enqueue({ name: 'support', waitHook: '/hold-music', actionHook: '/queue-exit' });
789
+ res.json(jambonz);
790
+ }
791
+ ```
792
+
793
+ ```typescript
794
+ // src/app.ts
795
+ import express from 'express';
796
+ import incoming from './routes/incoming.js';
797
+ import holdMusic from './routes/hold-music.js';
798
+ import queueExit from './routes/queue-exit.js';
799
+
800
+ const app = express();
801
+ app.use(express.json());
802
+
803
+ app.post('/incoming', incoming);
804
+ app.post('/hold-music', holdMusic);
805
+ app.post('/queue-exit', queueExit);
806
+
807
+ app.listen(3000, () => console.log('Listening on port 3000'));
808
+ ```
809
+
810
+ **WebSocket pattern** — there are two cases to consider:
811
+
812
+ 1. **Multiple services** (different `makeService({ path })` calls — each path gets its own `session:new`). Each route file exports a function that takes a session:
813
+
814
+ ```typescript
815
+ // src/routes/caller.ts
816
+ import type { Session } from '@jambonz/sdk/websocket';
817
+
818
+ export default function caller(session: Session) {
819
+ session
820
+ .say({ text: 'Please hold.' })
821
+ .enqueue({ name: 'support', waitHook: '/hold-music', actionHook: '/queue-exit' })
822
+ .send();
823
+ }
824
+ ```
825
+
826
+ ```typescript
827
+ // src/app.ts
828
+ import http from 'http';
829
+ import { createEndpoint } from '@jambonz/sdk/websocket';
830
+ import caller from './routes/caller.js';
831
+ import agent from './routes/agent.js';
832
+
833
+ const server = http.createServer();
834
+ const makeService = createEndpoint({ server, port: 3000 });
835
+
836
+ makeService({ path: '/incoming' }).on('session:new', (session) => caller(session));
837
+ makeService({ path: '/agent' }).on('session:new', (session) => agent(session));
838
+ ```
839
+
840
+ 2. **Multiple actionHook handlers on one session** — extract handler functions, but register them all within `session:new`:
841
+
842
+ ```typescript
843
+ // src/routes/echo-handler.ts
844
+ import type { Session } from '@jambonz/sdk/websocket';
845
+
846
+ export default function echoHandler(session: Session, evt: Record<string, any>) {
847
+ if (evt.reason === 'speechDetected') {
848
+ const text = evt.speech?.alternatives?.[0]?.transcript || 'nothing';
849
+ session.say({ text: `You said: ${text}` })
850
+ .gather({ input: ['speech'], actionHook: '/echo', timeout: 10 })
851
+ .reply();
852
+ } else {
853
+ session.gather({ input: ['speech'], actionHook: '/echo', timeout: 10,
854
+ say: { text: 'I didn\'t hear anything. Try again.' } }).reply();
855
+ }
856
+ }
857
+ ```
858
+
859
+ ```typescript
860
+ // src/app.ts — wire it up
861
+ svc.on('session:new', (session) => {
862
+ session.on('/echo', (evt) => echoHandler(session, evt));
863
+ session.gather({ input: ['speech'], actionHook: '/echo', timeout: 10,
864
+ say: { text: 'Say something.' } }).send();
865
+ });
866
+ ```
867
+
868
+ ### When to Split
869
+
870
+ - **1-2 routes, simple logic** → single file
871
+ - **3+ routes or substantial per-route logic** → `src/app.ts` + `src/routes/`
872
+ - **Shared config, prompts, or utilities** → `src/config.ts`, `src/prompts.ts`, etc.
873
+
874
+ When in doubt, start with a single file. It's easy to split later.
875
+
288
876
  ## Examples
289
877
 
290
878
  Complete working examples are in the `examples/` directory:
291
879
  - **hello-world** — Minimal greeting (webhook + WebSocket)
880
+ - **echo** — Speech echo using gather with actionHook pattern (webhook + WebSocket). The canonical example for understanding actionHook event handling.
292
881
  - **ivr-menu** — Interactive menu with speech and DTMF input (webhook)
293
- - **voice-agent** — LLM-powered conversational AI (webhook + WebSocket)
882
+ - **dial** — Simple outbound dial to a phone number (webhook)
883
+ - **listen-record** — Record audio using the listen verb to stream to a WebSocket (webhook)
884
+ - **voice-agent** — LLM-powered conversational AI with tool calls (webhook + WebSocket)
885
+ - **openai-realtime** — OpenAI Realtime API voice agent with function calling (WebSocket)
886
+ - **deepgram-voice-agent** — Deepgram Voice Agent API with function calling (WebSocket)
887
+ - **llm-streaming** — Anthropic LLM with TTS token streaming and barge-in (WebSocket)
294
888
  - **queue-with-hold** — Call queue with hold music and agent dequeue (webhook + WebSocket)
295
889
  - **call-recording** — Mid-call recording control via REST API and inject commands (webhook + WebSocket)
890
+ - **realtime-translator** — Bridges two parties with real-time speech translation using STT, Google Translate, and TTS dub tracks. Multi-file example with `src/routes/` structure (WebSocket)
296
891
 
297
892
  ## Key Concepts
298
893