@jambonz/schema 0.1.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (106)
  1. package/AGENTS.md +974 -0
  2. package/callbacks/amd.schema.json +50 -0
  3. package/callbacks/base.schema.json +29 -0
  4. package/callbacks/call-status.schema.json +22 -0
  5. package/callbacks/conference-status.schema.json +24 -0
  6. package/callbacks/conference-wait.schema.json +11 -0
  7. package/callbacks/conference.schema.json +11 -0
  8. package/callbacks/dequeue.schema.json +19 -0
  9. package/callbacks/dial-dtmf.schema.json +18 -0
  10. package/callbacks/dial-hold.schema.json +22 -0
  11. package/callbacks/dial-refer.schema.json +28 -0
  12. package/callbacks/dial.schema.json +31 -0
  13. package/callbacks/enqueue-wait.schema.json +17 -0
  14. package/callbacks/enqueue.schema.json +27 -0
  15. package/callbacks/gather-partial.schema.json +54 -0
  16. package/callbacks/gather.schema.json +60 -0
  17. package/callbacks/listen.schema.json +21 -0
  18. package/callbacks/llm.schema.json +30 -0
  19. package/callbacks/message.schema.json +35 -0
  20. package/callbacks/pipeline-turn.schema.json +109 -0
  21. package/callbacks/play.schema.json +36 -0
  22. package/callbacks/session-new.schema.json +143 -0
  23. package/callbacks/session-reconnect.schema.json +9 -0
  24. package/callbacks/session-redirect.schema.json +38 -0
  25. package/callbacks/sip-refer-event.schema.json +20 -0
  26. package/callbacks/sip-refer.schema.json +22 -0
  27. package/callbacks/sip-request.schema.json +27 -0
  28. package/callbacks/transcribe-translation.schema.json +24 -0
  29. package/callbacks/transcribe.schema.json +46 -0
  30. package/callbacks/tts-streaming-event.schema.json +77 -0
  31. package/callbacks/verb-status.schema.json +57 -0
  32. package/components/actionHook.schema.json +36 -0
  33. package/components/actionHookDelayAction.schema.json +37 -0
  34. package/components/amd.schema.json +68 -0
  35. package/components/auth.schema.json +18 -0
  36. package/components/bidirectionalAudio.schema.json +22 -0
  37. package/components/fillerNoise.schema.json +25 -0
  38. package/components/llm-base.schema.json +94 -0
  39. package/components/recognizer-assemblyAiOptions.schema.json +66 -0
  40. package/components/recognizer-awsOptions.schema.json +52 -0
  41. package/components/recognizer-azureOptions.schema.json +32 -0
  42. package/components/recognizer-cobaltOptions.schema.json +34 -0
  43. package/components/recognizer-customOptions.schema.json +27 -0
  44. package/components/recognizer-deepgramOptions.schema.json +147 -0
  45. package/components/recognizer-elevenlabsOptions.schema.json +39 -0
  46. package/components/recognizer-gladiaOptions.schema.json +8 -0
  47. package/components/recognizer-googleOptions.schema.json +35 -0
  48. package/components/recognizer-houndifyOptions.schema.json +53 -0
  49. package/components/recognizer-ibmOptions.schema.json +54 -0
  50. package/components/recognizer-nuanceOptions.schema.json +150 -0
  51. package/components/recognizer-nvidiaOptions.schema.json +39 -0
  52. package/components/recognizer-openaiOptions.schema.json +59 -0
  53. package/components/recognizer-sonioxOptions.schema.json +46 -0
  54. package/components/recognizer-speechmaticsOptions.schema.json +100 -0
  55. package/components/recognizer-verbioOptions.schema.json +46 -0
  56. package/components/recognizer.schema.json +216 -0
  57. package/components/synthesizer.schema.json +82 -0
  58. package/components/target.schema.json +105 -0
  59. package/components/vad.schema.json +48 -0
  60. package/docs/components/recognizer.md +78 -0
  61. package/docs/components/synthesizer.md +27 -0
  62. package/docs/guides/session-commands.md +417 -0
  63. package/docs/verbs/conference.md +51 -0
  64. package/docs/verbs/deepgram_s2s.md +108 -0
  65. package/docs/verbs/dial.md +8 -0
  66. package/docs/verbs/listen.md +71 -0
  67. package/docs/verbs/pipeline.md +475 -0
  68. package/docs/verbs/stream.md +5 -0
  69. package/index.js +9 -0
  70. package/jambonz-app.schema.json +112 -0
  71. package/lib/normalize.js +72 -0
  72. package/lib/validator.js +137 -0
  73. package/package.json +39 -0
  74. package/verbs/alert.schema.json +34 -0
  75. package/verbs/answer.schema.json +22 -0
  76. package/verbs/conference.schema.json +107 -0
  77. package/verbs/config.schema.json +218 -0
  78. package/verbs/deepgram_s2s.schema.json +81 -0
  79. package/verbs/dequeue.schema.json +51 -0
  80. package/verbs/dial.schema.json +187 -0
  81. package/verbs/dialogflow.schema.json +148 -0
  82. package/verbs/dtmf.schema.json +49 -0
  83. package/verbs/dub.schema.json +103 -0
  84. package/verbs/elevenlabs_s2s.schema.json +81 -0
  85. package/verbs/enqueue.schema.json +53 -0
  86. package/verbs/gather.schema.json +188 -0
  87. package/verbs/google_s2s.schema.json +42 -0
  88. package/verbs/hangup.schema.json +36 -0
  89. package/verbs/leave.schema.json +22 -0
  90. package/verbs/listen.schema.json +127 -0
  91. package/verbs/llm.schema.json +44 -0
  92. package/verbs/message.schema.json +82 -0
  93. package/verbs/openai_s2s.schema.json +42 -0
  94. package/verbs/pause.schema.json +36 -0
  95. package/verbs/pipeline.schema.json +240 -0
  96. package/verbs/play.schema.json +96 -0
  97. package/verbs/redirect.schema.json +34 -0
  98. package/verbs/s2s.schema.json +39 -0
  99. package/verbs/say.schema.json +107 -0
  100. package/verbs/sip-decline.schema.json +58 -0
  101. package/verbs/sip-refer.schema.json +58 -0
  102. package/verbs/sip-request.schema.json +54 -0
  103. package/verbs/stream.schema.json +103 -0
  104. package/verbs/tag.schema.json +41 -0
  105. package/verbs/transcribe.schema.json +57 -0
  106. package/verbs/ultravox_s2s.schema.json +41 -0
package/AGENTS.md ADDED
@@ -0,0 +1,974 @@
1
+ # jambonz Agent Toolkit
2
+
3
+ jambonz is an open-source CPaaS (Communications Platform as a Service) for building voice and messaging applications. It handles telephony infrastructure — SIP, carriers, phone numbers, media processing — so you can focus on application logic.
4
+
5
+ ## Before You Start — Ask the User
6
+
7
+ Before generating any jambonz application code, ask the user: **"Do you prefer TypeScript or JavaScript?"** Then generate all code examples in their chosen language. If they don't have a preference, default to TypeScript.
8
+
9
+ ## Server Versions
10
+
11
+ jambonz has two editions: **v0.9.x (open source)** and **v10.x (commercial)**. Always target the commercial version (v10.x), where all verb schemas and features are available.
12
+
13
+ ## How jambonz Applications Work
14
+
15
+ A jambonz application controls phone calls by returning **arrays of verbs** — JSON instructions that execute sequentially. The runtime processes each verb in order: speak text, play audio, collect input, dial a number, connect to an AI model, etc.
16
+
17
+ ### The Webhook Lifecycle
18
+
19
+ 1. An incoming call arrives. jambonz invokes your application's URL with call details (caller, called number, SIP headers, etc.).
20
+ 2. Your application returns a JSON array of verbs.
21
+ 3. jambonz executes the verbs in order.
22
+ 4. When a verb with an `actionHook` completes (e.g. `gather` collects speech input), jambonz invokes the actionHook URL with the result.
23
+ 5. The actionHook response (a new verb array) replaces the remaining verb stack.
24
+ 6. This continues until the call ends or a `hangup` verb is executed.
25
+
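The replacement rule in steps 4 and 5 can be modeled in a few lines. This is a minimal sketch, not jambonz's actual implementation: `Verb`, `runUntilHook`, and `applyHookResponse` are hypothetical names introduced here, and the model only covers the case where the actionHook returns a non-empty verb array.

```typescript
// Hypothetical model of the verb stack; not part of jambonz or @jambonz/sdk.
type Verb = { verb: string; actionHook?: string; [key: string]: unknown };

// Execute verbs in order until one with an actionHook completes.
// Returns the verbs that ran and the verbs still waiting on the stack.
function runUntilHook(stack: Verb[]): { executed: Verb[]; remaining: Verb[] } {
  const hookIndex = stack.findIndex((v) => v.actionHook !== undefined);
  if (hookIndex === -1) return { executed: [...stack], remaining: [] };
  return {
    executed: stack.slice(0, hookIndex + 1),
    remaining: stack.slice(hookIndex + 1),
  };
}

// Step 5: the actionHook response replaces whatever was left on the stack.
function applyHookResponse(_remaining: Verb[], response: Verb[]): Verb[] {
  return [...response];
}

const initial: Verb[] = [
  { verb: 'say', text: 'Press 1 for sales or 2 for support.' },
  { verb: 'gather', input: ['digits'], actionHook: '/menu' },
  { verb: 'say', text: 'We did not receive any input. Goodbye.' },
  { verb: 'hangup' },
];

const { executed, remaining } = runUntilHook(initial);
const nextStack = applyHookResponse(remaining, [
  { verb: 'say', text: 'Connecting you to sales.' },
  { verb: 'hangup' },
]);
```

In this model the trailing `say` and `hangup` from the initial array never run, because the `/menu` response took their place.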
26
+ ### Transport Modes
27
+
28
+ jambonz supports two transport modes for delivering verb arrays:
29
+
30
+ - **Webhook (HTTP)**: Your server receives HTTP POST requests with call data and returns JSON verb arrays in the response body. Stateless and simple. Good for IVR menus, call routing, and straightforward flows.
31
+ - **WebSocket**: Your server maintains a persistent websocket connection with jambonz. Verb arrays are sent as JSON messages in both directions. Required for real-time features like LLM conversations, audio streaming, and event-driven flows.
32
+
33
+ The verb schemas and JSON structure are identical in both modes. The difference is the transport.
34
+
35
+ ### When to Use Which
36
+
37
+ - **Webhook**: Simple IVR, call routing, voicemail, basic gather-and-respond patterns.
38
+ - **WebSocket**: LLM-powered voice agents, real-time audio streaming, complex conversational flows, and anything requiring bidirectional communication, asynchronous logic, or streaming TTS.
39
+
40
+ **IMPORTANT**: Any application that uses a speech-to-speech verb (`openai_s2s`, `google_s2s`, `deepgram_s2s`, `ultravox_s2s`, `elevenlabs_s2s`, `s2s`, or `pipeline`) MUST use WebSocket transport, not webhooks. These verbs require persistent bidirectional communication for real-time audio and events. Always use `createEndpoint` from `@jambonz/sdk/websocket` for s2s applications.
41
+
42
+ ## Schema
43
+
44
+ The complete verb schema is at `schema/jambonz-app.schema.json`. This is a JSON Schema (draft 2020-12) that defines the structure of a jambonz application.
45
+
46
+ Individual verb schemas are in `schema/verbs/`. Shared component types (synthesizer, recognizer, target, etc.) are in `schema/components/`.
47
+
48
+ ## Core Verbs
49
+
50
+ ### Audio & Speech
51
+ - **say** — Speak text using TTS. Supports SSML, streaming, multiple voices.
52
+ - **play** — Play an audio file from a URL.
53
+ - **gather** — Collect speech (STT) and/or DTMF input. The workhorse for interactive menus and voice input.
54
+
55
+ ### AI & Real-time
56
+ - **openai_s2s** / **google_s2s** / **deepgram_s2s** / **ultravox_s2s** — Connect the caller to a vendor-specific LLM for real-time voice conversation. These are the **preferred** verbs when the vendor is known. Each handles the full STT→LLM→TTS pipeline with the vendor pre-set.
57
+ - **elevenlabs_s2s** — Connect the caller to an ElevenLabs Conversational AI agent. **Unlike other s2s vendors**, ElevenLabs requires a pre-configured `agent_id` (created in the ElevenLabs dashboard) rather than a model and messages. See [ElevenLabs S2S specifics](#elevenlabs-s2s-specifics) below.
58
+ - **s2s** — Generic LLM voice conversation verb. Use only when the vendor is determined at runtime (e.g. from an env var). Requires `vendor` to be specified.
59
+ - **pipeline** — Higher-level voice AI pipeline with integrated turn detection.
60
+ - **dialogflow** — Connect the caller to a Google Dialogflow agent (ES, CX, or CES).
61
+ - **stream** — Stream raw audio to a websocket endpoint for custom processing.
62
+ - **transcribe** — Real-time call transcription sent to a webhook.
63
+
64
+ ### Call Control
65
+ - **dial** — Place an outbound call and bridge it to the current caller.
66
+ - **conference** — Multi-party conference room.
67
+ - **enqueue** / **dequeue** — Call queuing.
68
+ - **hangup** — End the call.
69
+ - **redirect** — Transfer control to a different webhook.
70
+ - **pause** — Wait for a specified duration.
71
+
72
+ ### SIP
73
+ - **sip:decline** — Reject an incoming call with a SIP error.
74
+ - **sip:request** — Send a SIP request within the dialog.
75
+ - **sip:refer** — Transfer the call via SIP REFER.
76
+
77
+ ### Utility
78
+ - **config** — Set session-level defaults (TTS vendor/voice, STT vendor, VAD, etc.).
79
+ - **tag** — Attach metadata to the call.
80
+ - **dtmf** — Send DTMF tones.
81
+ - **dub** — Mix auxiliary audio tracks into the call.
82
+ - **message** — Send SMS/MMS.
83
+ - **alert** — Send a SIP 180 with Alert-Info.
84
+ - **answer** — Explicitly answer the call.
85
+ - **leave** — Leave a conference or queue.
86
+
87
+ ### Verb Synonyms and Shortcuts
88
+
89
+ **IMPORTANT — Code generation rules:**
90
+
91
+ 1. **Always use `stream`, never `listen`** — they are synonyms; `stream` is the preferred name.
92
+ 2. **Always use the vendor-specific shortcut when the LLM vendor is known** — use `openai_s2s`, `google_s2s`, `elevenlabs_s2s`, `deepgram_s2s`, or `ultravox_s2s`. Do NOT use `llm` or `s2s` with a `vendor` property when a shortcut exists.
93
+ 3. **Use `s2s` (not `llm`) when the vendor is dynamic** — e.g. the vendor comes from an env var or runtime config. Both `s2s` and `llm` are synonyms, but prefer `s2s`.
94
+ 4. **Never use `llm` in generated code** — it is a legacy name. Use either a vendor shortcut or `s2s`.
95
+
96
+ The same rules apply to SDK method calls: use `.openai_s2s(opts)`, `.deepgram_s2s(opts)`, etc. instead of `.llm({ vendor: 'openai', ... })`. Use `.stream(opts)` instead of `.listen(opts)`.
97
+
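The naming rules above are mechanical enough to express as a function. The sketch below is illustrative only: `preferredVerbName` and `VENDOR_SHORTCUTS` are hypothetical names, and this is not the logic in the package's `lib/normalize.js`, whose contents are not shown here.

```typescript
// Hypothetical helper illustrating the naming rules; not part of @jambonz/schema or @jambonz/sdk.
const VENDOR_SHORTCUTS = new Set(['openai', 'google', 'elevenlabs', 'deepgram', 'ultravox']);

function preferredVerbName(verb: string, vendor?: string): string {
  // Rule 1: `listen` is a synonym; emit `stream`.
  if (verb === 'listen') return 'stream';
  if (verb === 'llm' || verb === 's2s') {
    // Rule 2: a known vendor gets its dedicated shortcut.
    if (vendor !== undefined && VENDOR_SHORTCUTS.has(vendor)) return `${vendor}_s2s`;
    // Rules 3 and 4: otherwise use `s2s`, never the legacy `llm`.
    return 's2s';
  }
  // Every other verb keeps its name.
  return verb;
}
```

A code generator could run each verb name through a check like this before emitting it, so that legacy names never reach generated output.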
98
+ ## Using the @jambonz/sdk
99
+
100
+ **IMPORTANT**: Always use the `@jambonz/sdk` package (version 0.1.x+) to build jambonz applications. The older packages `@jambonz/node-client-ws` and `@jambonz/node-client` are **deprecated and replaced** by `@jambonz/sdk`. Do NOT use those old packages. Do NOT build apps with raw JSON verb arrays and plain Express/WS — use `@jambonz/sdk`.
101
+
102
+ **IMPORTANT — App Environment Variables vs process.env**: jambonz applications should NEVER use `process.env` for application-configurable values (phone numbers, API keys, language preferences, greeting text, etc.). Instead, use **jambonz application environment variables** — a two-step pattern:
103
+ 1. **Declare** the variables so the jambonz portal can discover them (via `envVars` option for WebSocket, or `envVarsMiddleware` for webhook).
104
+ 2. **Read** the values at runtime from the call payload (`session.data.env_vars` for WebSocket, `req.body.env_vars` for webhook).
105
+
106
+ Both steps are required. Declaring without reading means values are ignored. Reading without declaring means the portal won't know about them and won't send them. See the [Application Environment Variables](#application-environment-variables) section for full details.
107
+
108
+ Install: `npm install @jambonz/sdk`
109
+
110
+ **Dependencies**: Webhook apps also require `express` (`npm install express`). WebSocket apps have no additional dependencies — the SDK includes `ws` internally. When generating a `package.json`, always include all required dependencies.
111
+
112
+ ### Webhook Application (HTTP)
113
+
114
+ Import `WebhookResponse` from `@jambonz/sdk/webhook`. Create an Express server, construct a `WebhookResponse` for each request, chain verb methods, and return it via `res.json()`.
115
+
116
+ **Best practice**: Always include a `POST /call-status` handler. jambonz sends call status change notifications (ringing, in-progress, completed, etc.) to this endpoint. The handler should log the event and return 200. The path `/call-status` is conventional but the user may choose a different path:
117
+
118
+ ```typescript
119
+ app.post('/call-status', (req, res) => {
120
+   console.log(`Call ${req.body.call_sid} status: ${req.body.call_status}`);
121
+   res.sendStatus(200);
122
+ });
123
+ ```
124
+
125
+ ```typescript
126
+ import express from 'express';
127
+ import { WebhookResponse } from '@jambonz/sdk/webhook';
128
+
129
+ const app = express();
130
+ app.use(express.json());
131
+
132
+ app.post('/incoming', (req, res) => {
133
+   const jambonz = new WebhookResponse();
134
+   jambonz
135
+     .say({ text: 'Hello! Welcome to our service.' })
136
+     .gather({
137
+       input: ['speech', 'digits'],
138
+       actionHook: '/handle-input',
139
+       numDigits: 1,
140
+       timeout: 10,
141
+       say: { text: 'Press 1 for sales or 2 for support.' },
142
+     })
143
+     .say({ text: 'We did not receive any input. Goodbye.' })
144
+     .hangup();
145
+
146
+   res.json(jambonz);
147
+ });
148
+
149
+ app.post('/handle-input', (req, res) => {
150
+   const { digits, speech } = req.body;
151
+   const jambonz = new WebhookResponse();
152
+   jambonz.say({ text: `You entered ${digits || speech?.alternatives?.[0]?.transcript || 'nothing'}. Goodbye.` }).hangup();
153
+   res.json(jambonz);
154
+ });
155
+
156
+ // Every webhook app must handle call status events
157
+ app.post('/call-status', (req, res) => {
158
+   console.log(`Call ${req.body.call_sid} status: ${req.body.call_status}`);
159
+   res.sendStatus(200);
160
+ });
161
+
162
+ app.listen(3000, () => console.log('Listening on port 3000'));
163
+ ```
164
+
165
+ ### WebSocket Application
166
+
167
+ Import `createEndpoint` from `@jambonz/sdk/websocket`. Create an HTTP server, call `createEndpoint` to set up WebSocket handling, then register path-based services that receive `session` objects.
168
+
169
+ ```typescript
170
+ import http from 'http';
171
+ import { createEndpoint } from '@jambonz/sdk/websocket';
172
+
173
+ const server = http.createServer();
174
+ const makeService = createEndpoint({ server, port: 3000 });
175
+
176
+ const svc = makeService({ path: '/' });
177
+
178
+ svc.on('session:new', (session) => {
179
+   console.log(`Incoming call: ${session.callSid}`);
180
+
181
+   session
182
+     .say({ text: 'Hello from jambonz over WebSocket!' })
183
+     .hangup()
184
+     .send();
185
+ });
186
+
187
+ console.log('jambonz ws app listening on port 3000');
188
+ ```
189
+
190
+ **Key differences from webhook**: Use `session` instead of `WebhookResponse`. Chain verbs the same way, but call `.send()` at the end to transmit the initial verb array over the WebSocket.
191
+
192
+ ### WebSocket actionHook Events (CRITICAL)
193
+
194
+ In webhook mode, an `actionHook` is just a URL that jambonz POSTs to. In WebSocket mode, the `actionHook` value becomes an **event name** emitted on the session. You MUST bind a handler for it and respond with `.reply()`.
195
+
196
+ **Key rules for WebSocket actionHook handling:**
197
+ 1. Use `session.on('/hookName', (evt) => {...})` to listen for the actionHook event.
198
+ 2. The `evt` object contains the actionHook payload (same fields as the webhook POST body: `reason`, `speech`, `digits`, etc.).
199
+ 3. Respond with `.reply()` — NOT `.send()`. `.send()` is only for the initial verb array (the first response to `session:new`). `.reply()` acknowledges the actionHook and provides the next verb array.
200
+ 4. If no listener is bound for an actionHook, the SDK auto-replies with an empty verb array.
201
+
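Rule 4 describes a fallback that is easy to overlook. The following sketch models it under stated assumptions: `dispatchHook` and `HookHandler` are hypothetical names, and the real SDK internals may differ. It only shows the observable behavior described above, where a bound handler supplies the next verbs and an unbound hook yields an empty verb array.

```typescript
// Hypothetical model of hook dispatch; the real SDK internals may differ.
type Verb = { verb: string; [key: string]: unknown };
type HookHandler = (evt: Record<string, unknown>) => Verb[];

function dispatchHook(
  handlers: Map<string, HookHandler>,
  hook: string,
  evt: Record<string, unknown>,
): Verb[] {
  const handler = handlers.get(hook);
  // Rule 4: with no listener bound, reply with an empty verb array.
  return handler ? handler(evt) : [];
}

const handlers = new Map<string, HookHandler>();
handlers.set('/echo', (evt) => [{ verb: 'say', text: `Reason: ${String(evt.reason)}` }]);

const handled = dispatchHook(handlers, '/echo', { reason: 'timeout' });
const unhandled = dispatchHook(handlers, '/unknown', {});
```

Here `handled` contains one `say` verb and `unhandled` is an empty array, which is why a misspelled hook name fails silently instead of raising an error.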
202
+ ### WebSocket with Gather (speech echo example)
203
+
204
+ ```typescript
205
+ import http from 'http';
206
+ import { createEndpoint } from '@jambonz/sdk/websocket';
207
+
208
+ const server = http.createServer();
209
+ const makeService = createEndpoint({ server, port: 3000 });
210
+
211
+ const svc = makeService({ path: '/' });
212
+
213
+ svc.on('session:new', (session) => {
214
+   // Bind actionHook handler BEFORE sending verbs
215
+   session
216
+     .on('close', (code: number, _reason: Buffer) => {
217
+       console.log(`Session ${session.callSid} closed: ${code}`);
218
+     })
219
+     .on('error', (err: Error) => {
220
+       console.error(`Session ${session.callSid} error:`, err);
221
+     })
222
+     .on('/echo', (evt: Record<string, any>) => {
223
+       // This fires when the gather verb completes (actionHook: '/echo')
224
+       switch (evt.reason) {
225
+         case 'speechDetected': {
226
+           const transcript = evt.speech?.alternatives?.[0]?.transcript || 'nothing';
227
+           session
228
+             .say({ text: `You said: ${transcript}.` })
229
+             .gather({
230
+               input: ['speech'],
231
+               actionHook: '/echo',
232
+               timeout: 10,
233
+               say: { text: 'Please say something else.' },
234
+             })
235
+             .reply(); // reply(), NOT send()
236
+           break;
237
+         }
238
+         case 'timeout':
239
+           session
240
+             .gather({
241
+               input: ['speech'],
242
+               actionHook: '/echo',
243
+               timeout: 10,
244
+               say: { text: 'Are you still there? I didn\'t hear anything.' },
245
+             })
246
+             .reply();
247
+           break;
248
+         default:
249
+           session.reply();
250
+           break;
251
+       }
252
+     });
253
+
254
+   // Send initial verbs to jambonz
255
+   session
256
+     .pause({ length: 1 })
257
+     .gather({
258
+       input: ['speech'],
259
+       actionHook: '/echo',
260
+       timeout: 10,
261
+       say: { text: 'Please say something and I will echo it back to you.' },
262
+     })
263
+     .send(); // send(): first response only
264
+ });
265
+
266
+ console.log('Speech echo WebSocket app listening on port 3000');
267
+ ```
268
+
269
+ **`.send()` vs `.reply()`:**
270
+ - `.send()` — Use ONCE for the initial verb array in response to `session:new`. This acknowledges the session.
271
+ - `.reply()` — Use for ALL subsequent responses (actionHook events, session:redirect). This acknowledges the hook message and provides the next verb array.
272
+
273
+ ### SDK Verb Method Reference
274
+
275
+ Both `WebhookResponse` and `Session` support the same chainable verb methods:
276
+
277
+ `.say(opts)` `.play(opts)` `.gather(opts)` `.dial(opts)` `.llm(opts)` `.s2s(opts)` `.openai_s2s(opts)` `.google_s2s(opts)` `.elevenlabs_s2s(opts)` `.deepgram_s2s(opts)` `.ultravox_s2s(opts)` `.dialogflow(opts)` `.conference(opts)` `.enqueue(opts)` `.dequeue(opts)` `.hangup()` `.pause(opts)` `.redirect(opts)` `.config(opts)` `.tag(opts)` `.dtmf(opts)` `.listen(opts)` `.transcribe(opts)` `.message(opts)` `.stream(opts)` `.pipeline(opts)` `.dub(opts)` `.alert(opts)` `.answer(opts)` `.leave()` `.sipDecline(opts)` `.sipRefer(opts)` `.sipRequest(opts)`
278
+
279
+ All methods accept the same options as the corresponding verb JSON Schema. Methods are chainable — they return `this`.
280
+
281
+ ### REST API Client
282
+
283
+ ```typescript
284
+ import { JambonzClient } from '@jambonz/sdk/client';
285
+
286
+ const client = new JambonzClient({ baseUrl: 'https://api.jambonz.us', accountSid, apiKey });
287
+
288
+ // Create an outbound call
289
+ await client.calls.create({ from: '+15085551212', to: { type: 'phone', number: '+15085551213' }, call_hook: '/incoming' });
290
+
291
+ // Mid-call control
292
+ await client.calls.whisper(callSid, { verb: 'say', text: 'Supervisor listening.' });
293
+ await client.calls.mute(callSid, 'mute');
294
+ await client.calls.redirect(callSid, 'https://example.com/new-flow');
295
+ await client.calls.update(callSid, { call_status: 'completed' });
296
+ ```
297
+
298
+ ## Common Patterns (Raw JSON)
299
+
300
+ These are the raw JSON verb arrays that the SDK generates. You should use the SDK verb methods above, but these show the underlying structure for reference.
301
+
302
+ ### Simple Greeting and Gather
303
+ ```json
304
+ [
305
+   { "verb": "say", "text": "Welcome. Press 1 for sales, 2 for support." },
306
+   { "verb": "gather", "input": ["digits"], "numDigits": 1, "actionHook": "/menu" }
307
+ ]
308
+ ```
309
+
310
+ ### LLM Voice Agent
311
+ ```json
312
+ [
313
+   {
314
+     "verb": "config",
315
+     "synthesizer": { "vendor": "elevenlabs", "voice": "EXAVITQu4vr4xnSDxMaL" },
316
+     "recognizer": { "vendor": "deepgram", "language": "en-US" }
317
+   },
318
+   {
319
+     "verb": "openai_s2s",
320
+     "model": "gpt-4o",
321
+     "llmOptions": {
322
+       "messages": [{ "role": "system", "content": "You are a helpful assistant." }]
323
+     },
324
+     "actionHook": "/llm-done",
325
+     "toolHook": "/tool-call"
326
+   }
327
+ ]
328
+ ```
329
+
330
+ ### ElevenLabs S2S Specifics
331
+
332
+ ElevenLabs works differently from other s2s vendors. Instead of passing a model and system prompt, you create a **Conversational AI agent** in the ElevenLabs dashboard and pass the `agent_id` to jambonz. The agent's voice, personality, tools, and LLM configuration are all managed on the ElevenLabs side.
333
+
334
+ **Key differences from other s2s verbs:**
335
+ - `auth` requires `agent_id` (required) and optionally `api_key` (enables signed WebSocket URLs)
336
+ - `model` is NOT used — the model is configured in the ElevenLabs agent
337
+ - `llmOptions` should be an empty object `{}` (do NOT pass `messages` or `temperature`)
338
+ - `llmOptions.conversation_initiation_client_data` can optionally send data to the agent at conversation start
339
+ - Always include `eventHook` and `events: ['all']` — omitting eventHook causes errors on the server
340
+
341
+ ```json
342
+ [
343
+   {
344
+     "verb": "elevenlabs_s2s",
345
+     "auth": {
346
+       "agent_id": "your-agent-id",
347
+       "api_key": "your-api-key"
348
+     },
349
+     "llmOptions": {},
350
+     "actionHook": "/s2s-complete",
351
+     "eventHook": "/event",
352
+     "events": ["all"]
353
+   }
354
+ ]
355
+ ```
356
+
357
+ SDK example:
358
+ ```typescript
359
+ session
360
+   .elevenlabs_s2s({
361
+     auth: { agent_id: agentId, api_key: apiKey },
362
+     llmOptions: {},
363
+     actionHook: '/s2s-complete',
364
+     eventHook: '/event',
365
+     events: ['all'],
366
+   })
367
+   .send();
368
+ ```
369
+
370
+ ### Dial with Fallback
371
+ ```json
372
+ [
373
+   { "verb": "say", "text": "Connecting you now." },
374
+   {
375
+     "verb": "dial",
376
+     "target": [{ "type": "phone", "number": "+15085551212" }],
377
+     "answerOnBridge": true,
378
+     "timeout": 30,
379
+     "actionHook": "/dial-result"
380
+   },
381
+   { "verb": "say", "text": "The agent is unavailable. Goodbye." },
382
+   { "verb": "hangup" }
383
+ ]
384
+ ```
385
+
386
+ ### Call Queue
387
+ ```json
388
+ [
389
+   { "verb": "say", "text": "All agents are busy. You are in the queue." },
390
+   {
391
+     "verb": "enqueue",
392
+     "name": "support",
393
+     "waitHook": "/hold-music",
394
+     "actionHook": "/queue-exit"
395
+   }
396
+ ]
397
+ ```
398
+
399
+ ## ActionHook Payloads
400
+
401
+ When a verb completes, jambonz invokes the `actionHook` URL (webhook) or sends an event (WebSocket) with result data. Every actionHook payload includes these base fields:
402
+
403
+ | Field | Description |
404
+ |-------|-------------|
405
+ | `call_sid` | Unique identifier for this call |
406
+ | `account_sid` | Your account identifier |
407
+ | `application_sid` | The application handling this call |
408
+ | `direction` | `inbound` or `outbound` |
409
+ | `from` | Caller phone number or SIP URI |
410
+ | `to` | Called phone number or SIP URI |
411
+ | `call_id` | SIP Call-ID |
412
+ | `call_status` | Current call state (`trying`, `ringing`, `early-media`, `in-progress`, `completed`, `failed`, `busy`, `no-answer`) |
413
+ | `sip_status` | SIP response code (e.g. `200`, `486`) |
414
+
415
+ ### Verb-Specific Payload Fields
416
+
417
+ **gather**: `speech` (object with `alternatives[].transcript`), `digits` (string), `reason` (`speechDetected`, `dtmfDetected`, `timeout`)
418
+
419
+ **dial**: `dial_call_sid`, `dial_call_status`, `dial_sip_status`, `dial_sbc_callid`, `duration`
420
+
421
+ **llm**: `completion_reason` (`normal`, `timeout`, `error`), `llm_usage` (token counts)
422
+
423
+ **enqueue**: `queue_result` (`dequeued`, `hangup`, `error`)
424
+
425
+ **transcribe**: `transcription` (object with transcript text)
426
+
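For typed handlers, the gather fields listed above can be captured in an interface. This is a sketch built only from the fields named in this section. The type names and the `callerInput` helper are hypothetical and are not exported by `@jambonz/sdk`, and real payloads carry additional fields such as the remaining base fields.

```typescript
// Payload shape assembled from the fields listed above; the type names are illustrative.
type GatherReason = 'speechDetected' | 'dtmfDetected' | 'timeout';

interface GatherPayload {
  call_sid: string;
  reason: GatherReason;
  digits?: string;
  speech?: { alternatives?: { transcript: string }[] };
}

// Hypothetical helper: reduce a gather result to the caller's input, or null on timeout.
function callerInput(payload: GatherPayload): string | null {
  switch (payload.reason) {
    case 'dtmfDetected':
      return payload.digits ?? null;
    case 'speechDetected':
      return payload.speech?.alternatives?.[0]?.transcript ?? null;
    default:
      return null;
  }
}
```

Branching on `reason` first avoids treating an empty `digits` string or a missing `speech` object as meaningful input.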
427
+ ## Application Environment Variables
428
+
429
+ jambonz has a built-in mechanism for application configuration that is **always preferred over `process.env`**. It works in two required steps:
430
+
431
+ 1. **Declare** — Your app declares its configurable parameters with a schema. The jambonz portal discovers these via an HTTP `OPTIONS` request and renders a configuration form for administrators.
432
+ 2. **Receive** — When a call arrives, jambonz delivers the configured values in the call payload as `env_vars`. Your app reads them from there.
433
+
434
+ **IMPORTANT**: Both steps are required. If you only declare without reading, the values are ignored. If you only read without declaring, the portal won't discover the parameters and won't send them. NEVER use `process.env` for values that should be configurable per-application in the jambonz portal.
435
+
436
+ **When to use env vars**: Phone numbers to dial, API keys, language/voice preferences, greeting text, queue names, timeout values, feature flags, or any value that might change between deployments or users. If in doubt, make it an env var.
437
+
438
+ ### Step 1: Define the Schema
439
+
440
+ Define a schema object where each key is a parameter name and the value describes its type and UI behavior:
441
+
442
+ ```typescript
443
+ const envVars = {
444
+   API_KEY: { type: 'string', description: 'Your API key', required: true, obscure: true },
445
+   LANGUAGE: { type: 'string', description: 'TTS language', default: 'en-US', enum: ['en-US', 'es-ES', 'fr-FR'] },
446
+   MAX_RETRIES: { type: 'number', description: 'Max retry attempts', default: 3 },
447
+   CARRIER: { type: 'string', description: 'Outbound carrier', jambonzResource: 'carriers' },
448
+   SYSTEM_PROMPT: { type: 'string', description: 'LLM system prompt', uiHint: 'textarea' },
449
+   TLS_CERT: { type: 'string', description: 'TLS certificate', uiHint: 'filepicker' },
450
+ };
451
+ ```
452
+
453
+ Each parameter supports:
454
+
455
+ | Property | Required | Description |
456
+ |----------|----------|-------------|
457
+ | `type` | Yes | `'string'` \| `'number'` \| `'boolean'` |
458
+ | `description` | Yes | Human-readable label shown in the portal |
459
+ | `required` | No | Whether the user must provide a value |
460
+ | `default` | No | Pre-filled default value |
461
+ | `enum` | No | Array of allowed values — renders as a dropdown |
462
+ | `obscure` | No | Masks the value in the portal UI (for secrets/API keys) |
463
+ | `uiHint` | No | `'input'` (default single-line), `'textarea'` (multi-line), or `'filepicker'` (file upload with textarea) |
464
+ | `jambonzResource` | No | Populate a dropdown from jambonz account data. Currently supports `'carriers'` (lists VoIP carriers on the account) |
465
+
466
+ **Notes on `jambonzResource`**: When set to `'carriers'`, the portal fetches the VoIP carriers configured for the account and renders them as a dropdown. The stored value is the carrier name. This is preferred over hardcoding carrier names or using `enum` with static values.
467
+
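Because the declared schema already carries defaults and required flags, an app can resolve its configuration in one place instead of repeating `|| 'fallback'` at each read. The sketch below assumes nothing about the SDK: `EnvVarSpec` mirrors a subset of the property table above, and `resolveEnvVars` is a hypothetical helper, not an SDK export.

```typescript
// Illustrative types mirroring the property table above; not exported by @jambonz/sdk.
type EnvVarValue = string | number | boolean;

interface EnvVarSpec {
  type: 'string' | 'number' | 'boolean';
  description: string;
  required?: boolean;
  default?: EnvVarValue;
  enum?: EnvVarValue[];
}

// Hypothetical helper: merge received env_vars with declared defaults and
// report required parameters that are still missing.
function resolveEnvVars(
  schema: Record<string, EnvVarSpec>,
  received: Record<string, EnvVarValue> = {},
): { values: Record<string, EnvVarValue>; missing: string[] } {
  const values: Record<string, EnvVarValue> = {};
  const missing: string[] = [];
  for (const name of Object.keys(schema)) {
    const spec = schema[name];
    const value = received[name] ?? spec.default;
    if (value !== undefined) values[name] = value;
    else if (spec.required) missing.push(name);
  }
  return { values, missing };
}
```

An app could call this once with the `env_vars` object from the initial call payload and refuse to proceed when `missing` is non-empty.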
468
+ ### Step 2: Register and Read — WebSocket Apps
469
+
470
+ Pass `envVars` to `createEndpoint` to register the declaration (the SDK auto-responds to OPTIONS requests), then read values from `session.data.env_vars`:
471
+
472
+ ```typescript
473
+ import http from 'http';
474
+ import { createEndpoint } from '@jambonz/sdk/websocket';
475
+
476
+ const envVars = {
477
+   GREETING: { type: 'string', description: 'Greeting message', default: 'Hello!' },
478
+   LANGUAGE: { type: 'string', description: 'TTS language', default: 'en-US' },
479
+ };
480
+
481
+ const server = http.createServer();
482
+ const makeService = createEndpoint({ server, port: 3000, envVars }); // Step 1: declare
483
+
484
+ const svc = makeService({ path: '/' });
485
+
486
+ svc.on('session:new', (session) => {
487
+   const greeting = session.data.env_vars?.GREETING || 'Hello!'; // Step 2: read
488
+   const language = session.data.env_vars?.LANGUAGE || 'en-US';
489
+
490
+   session.say({ text: greeting, language }).hangup().send();
491
+ });
492
+ ```
493
+
494
+ ### Step 2: Register and Read — Webhook Apps
495
+
496
+ Use `envVarsMiddleware` to register the declaration (auto-responds to OPTIONS requests), then read values from `req.body.env_vars`:
497
+
498
+ ```typescript
499
+ import express from 'express';
500
+ import { WebhookResponse, envVarsMiddleware } from '@jambonz/sdk/webhook';
501
+
502
+ const envVars = {
503
+   GREETING: { type: 'string', description: 'Greeting message', default: 'Hello!' },
504
+   LANGUAGE: { type: 'string', description: 'TTS language', default: 'en-US' },
505
+ };
506
+
507
+ const app = express();
508
+ app.use(express.json());
509
+ app.use(envVarsMiddleware(envVars)); // Step 1: declare
510
+
511
+ app.post('/incoming', (req, res) => {
512
+   const greeting = req.body.env_vars?.GREETING || 'Hello!'; // Step 2: read
513
+   const language = req.body.env_vars?.LANGUAGE || 'en-US';
514
+
515
+   const jambonz = new WebhookResponse();
516
+   jambonz.say({ text: greeting, language }).hangup();
517
+   res.json(jambonz);
518
+ });
519
+ ```
520
+
521
+ **Note**: `env_vars` is only present in the initial call webhook (or `session:new` for WebSocket), not in subsequent actionHook callbacks. If you need env var values in actionHook handlers, store them in a variable during the initial call.
522
+
+ ## Mid-Call Control
+
+ Active calls can be modified asynchronously: inject verbs, mute, redirect, or start recording while the call is in progress.
+
+ ### REST API (Webhook Apps)
+
+ Use `PUT /v1/Accounts/{accountSid}/Calls/{callSid}` to modify an active call. Each line below is a separate example request body:
+
+ ```json
+ { "whisper": { "verb": "say", "text": "Supervisor is listening." } }
+ { "mute_status": "mute" }
+ { "call_hook": "https://example.com/new-flow" }
+ { "call_status": "completed" }
+ { "listen_status": "pause" }
+ ```
+
+ The SDK provides typed methods:
+
+ ```typescript
+ import { JambonzClient } from '@jambonz/sdk/client';
+ const client = new JambonzClient({ baseUrl, accountSid, apiKey });
+
+ await client.calls.whisper(callSid, { verb: 'say', text: 'Hello' });
+ await client.calls.mute(callSid, 'mute');
+ await client.calls.redirect(callSid, 'https://example.com/new-flow');
+ await client.calls.update(callSid, { call_status: 'completed' });
+ ```
+
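For apps that do not use the SDK client, the same update can be sent as a plain HTTP request. The sketch below only builds the request and does not send it. It assumes the API accepts the key as a Bearer token and that `baseUrl` is the API root without the `/v1` prefix; both are assumptions to check against your deployment. `buildCallUpdate` is a hypothetical helper.

```typescript
// Hypothetical helper: build (but do not send) a mid-call update request.
interface CallUpdateRequest {
  url: string;
  init: { method: 'PUT'; headers: Record<string, string>; body: string };
}

function buildCallUpdate(
  baseUrl: string,
  accountSid: string,
  callSid: string,
  apiKey: string,
  body: Record<string, unknown>,
): CallUpdateRequest {
  const root = baseUrl.replace(/\/+$/, ''); // drop trailing slashes
  const path = `/v1/Accounts/${encodeURIComponent(accountSid)}/Calls/${encodeURIComponent(callSid)}`;
  return {
    url: root + path,
    init: {
      method: 'PUT',
      headers: {
        Authorization: `Bearer ${apiKey}`, // assumption: Bearer authentication
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(body),
    },
  };
}

// Example values are placeholders.
const muteRequest = buildCallUpdate('https://api.example.com/', 'acct-1', 'call-1', 'secret', {
  mute_status: 'mute',
});
// To send it: await fetch(muteRequest.url, muteRequest.init);
```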
+ ### Inject Commands (WebSocket Apps)
+
+ WebSocket sessions can inject commands for immediate execution:
+
+ ```typescript
+ // Recording
+ session.injectRecord('startCallRecording', { siprecServerURL: 'sip:recorder@example.com' });
+ session.injectRecord('stopCallRecording');
+
+ // Whisper a verb to one party
+ session.injectWhisper({ verb: 'say', text: 'You have 5 minutes remaining.' });
+
+ // Mute/unmute
+ session.injectMute('mute');
+ session.injectMute('unmute');
+
+ // Pause/resume audio streaming
+ session.injectListenStatus('pause');
+
+ // Send DTMF
+ session.injectDtmf('1');
+
+ // Attach metadata
+ session.injectTag({ supervisor: 'jane', priority: 'high' });
+
+ // Generic inject (for any command)
+ session.injectCommand('redirect', { call_hook: '/new-flow' });
+ ```
+
+ ## Session Commands
+
+ Beyond verbs, WebSocket apps can perform async operations at any time during a call: TTS token streaming, inject commands (mute, whisper, DTMF, recording), and LLM tool output. These are SDK method calls that execute immediately without affecting the verb stack.
+
+ **Fetch the full reference with `guide:session-commands`.** It covers all commands with SDK methods, events, setup, and complete examples, including how to build a cascaded voice AI agent (app-managed LLM with TTS token streaming).
+
+ Key capabilities:
+ - **TTS token streaming**: `sendTtsTokens()`, `flushTtsTokens()`, `clearTtsTokens()`. Pipe LLM tokens to jambonz incrementally for lowest-latency TTS playback. **Not the same as `autoStreamTts`** (which is a jambonz-internal audio optimization).
+ - **Inject commands**: `injectMute()`, `injectWhisper()`, `injectDtmf()`, `injectRecord()`, `injectTag()`, `injectListenStatus()`. Modify the call mid-stream.
+ - **LLM tool output**: `toolOutput()`. Return tool call results to the pipeline verb's LLM.
+ - **Cascaded voice AI agents**: build your own STT→LLM→TTS loop using `config` (ttsStream + bargeIn) + `sendTtsTokens()`. Full control over LLM interaction and conversation history.
+
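The token streaming capability above can be pictured as a small forwarding loop. This is a sketch under stated assumptions: the `TtsTokenSink` interface is a guess at the minimal shape of the session methods (the real signatures are in `guide:session-commands` and may take other arguments or return promises), and `forwardTokens` is a hypothetical helper.

```typescript
// Assumed shape: the real Session methods may differ in arguments and return types.
interface TtsTokenSink {
  sendTtsTokens(text: string): unknown;
  flushTtsTokens(): unknown;
}

// Forward each non-empty token as it arrives, then flush once the reply is complete.
// Returns the full text so the caller can append it to the conversation history.
function forwardTokens(tokens: Iterable<string>, sink: TtsTokenSink): string {
  let full = '';
  for (const token of tokens) {
    if (token.length === 0) continue; // skip empty deltas
    sink.sendTtsTokens(token);
    full += token;
  }
  sink.flushTtsTokens();
  return full;
}
```

With a real LLM client the tokens usually arrive as an async stream, so the loop would become `for await (const token of stream)` inside an `async` function; the send-then-flush order stays the same.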
+ ## WebSocket Protocol
+
+ ### Message Types (jambonz → app)
+
+ | Type | Description |
+ |------|-------------|
+ | `session:new` | New call session established. Contains call details. |
+ | `session:redirect` | Call was redirected to this app. |
+ | `verb:hook` | An actionHook fired (e.g. gather completed). Contains `hook` (the actionHook name) and `data` (the payload). The SDK emits an event named after the hook, so listen with `session.on('/hookName', handler)` and respond with `.reply()`. |
+ | `verb:status` | Informational verb status notification (no reply needed). |
+ | `call:status` | Call state changed (e.g. `completed`). |
+ | `llm:tool-call` | LLM requested a tool/function call. |
+ | `llm:event` | LLM lifecycle event (connected, tokens, etc.). |
+ | `tts:tokens-result` | Ack for a TTS token streaming message. |
+ | `tts:streaming-event` | TTS streaming lifecycle event (e.g. user interruption). |
+
+ ### Message Types (app → jambonz)
+
+ | Type | Description |
+ |------|-------------|
+ | `ack` | Acknowledge a received message. Include verbs in the `data` array to replace the current verb stack. |
+ | `command` | Send a command (e.g. inject a verb, control recording). |
+ | `llm:tool-output` | Return the result of a tool call to the LLM. |
+ | `tts:tokens` | Stream TTS text tokens for incremental speech synthesis. |
+ | `tts:flush` | Signal end of a TTS token stream. |
+
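To make the `verb:hook` and `ack` rows concrete, here is a rough sketch of how an app without the SDK might build its reply. The tables above only name `type`, `hook`, and `data`; the `msgid` correlation field is an assumption, and `buildAck` is a hypothetical helper.

```typescript
// Assumed envelope: `msgid` is not described in the tables above.
type Verb = { verb: string } & Record<string, unknown>;

interface VerbHookMessage {
  type: 'verb:hook';
  msgid: string;
  hook: string;
  data: Record<string, unknown>;
}

interface AckMessage {
  type: 'ack';
  msgid: string;
  data?: Verb[];
}

// Build the ack for a hook. Passing verbs replaces the current verb stack;
// passing none acknowledges the message without changing the stack.
function buildAck(msg: VerbHookMessage, verbs?: Verb[]): AckMessage {
  const ack: AckMessage = { type: 'ack', msgid: msg.msgid };
  if (verbs && verbs.length > 0) ack.data = verbs;
  return ack;
}
```

The SDK's `.reply()` hides this step, so the sketch is mainly useful for reading protocol traces or writing a client in another language.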
+ ### Session Events (SDK)
+
+ The SDK `Session` object emits events for common message types:
+
+ ```typescript
+ // ActionHook events: the hook name IS the event name. Respond with .reply()
+ session.on('/echo', (data) => { /* gather actionHook fired */ session.say({text: '...'}).reply(); });
+ session.on('/dial-result', (data) => { /* dial actionHook */ session.reply(); });
+ session.on('/llm-complete', (data) => { /* llm actionHook */ session.hangup().reply(); });
+
+ // Fallback: fires for any verb:hook without a specific listener
+ session.on('verb:hook', (hook, data) => { /* generic actionHook handler */ });
+
+ // Status events (informational, no reply needed)
+ session.on('verb:status', (data) => { /* verb status notification */ });
+ session.on('call:status', (data) => { /* call state change */ });
+
+ // LLM events
+ session.on('llm:tool-call', (data) => { /* tool call from LLM */ });
+ session.on('llm:event', (data) => { /* LLM event */ });
+
+ // TTS streaming: specific lifecycle events
+ session.on('tts:stream_open', (data) => { /* vendor connection established */ });
+ session.on('tts:stream_paused', (data) => { /* backpressure: buffer full */ });
+ session.on('tts:stream_resumed', (data) => { /* backpressure released */ });
+ session.on('tts:stream_closed', (data) => { /* TTS stream ended */ });
+ session.on('tts:user_interruption', (data) => { /* user barge-in (with event data) */ });
+ session.on('tts:user_interrupt', () => { /* user barge-in (convenience, no data) */ });
+ // Catch-all for any TTS streaming event
+ session.on('tts:streaming-event', (data) => { /* data.event_type has the type */ });
+
+ // Connection lifecycle
+ session.on('close', (code, reason) => { /* connection closed */ });
+ session.on('error', (err) => { /* error */ });
+ ```
+
+ ## Audio WebSocket (Listen/Stream)
+
+ The `listen` and `stream` verbs open a separate WebSocket connection from jambonz to your application, carrying raw audio. This connection is independent of the control WebSocket: the control pipe uses the `ws.jambonz.org` subprotocol, while the audio pipe uses `audio.drachtio.org`.
+
+ ### Receiving Audio in the Same Application
+
+ Use `makeService.audio()` to register an audio WebSocket handler on the same server that handles the control pipe:
+
+ ```typescript
+ import http from 'http';
+ import { createEndpoint } from '@jambonz/sdk/websocket';
+
+ const server = http.createServer();
+ const makeService = createEndpoint({ server, port: 3000 });
+
+ // Control pipe: handles call sessions
+ const svc = makeService({ path: '/' });
+
+ // Audio pipe: receives listen/stream audio
+ const audioSvc = makeService.audio({ path: '/audio-stream' });
+
+ svc.on('session:new', (session) => {
+   session
+     .answer()
+     .say({ text: 'Recording your audio.' })
+     .listen({
+       url: '/audio-stream', // relative path: jambonz connects back to the same server
+       sampleRate: 16000,
+       mixType: 'mono',
+       metadata: { purpose: 'recording' },
+     })
+     .hangup()
+     .send();
+ });
+
+ audioSvc.on('connection', (stream) => {
+   console.log(`Audio from call ${stream.callSid}, rate=${stream.sampleRate}`);
+   console.log('Metadata:', stream.metadata);
+
+   stream.on('audio', (pcm: Buffer) => {
+     // L16 PCM binary frames
+   });
+
+   stream.on('close', () => {
+     console.log('Audio stream closed');
+   });
+ });
+ ```
+
+ ### AudioStream API
+
+ The `stream` object in the `connection` event is an `AudioStream` instance:
+
+ **Properties**: `metadata` (initial JSON), `callSid`, `sampleRate`
+
+ **Events**:
+ - `audio`: L16 PCM binary frame (`Buffer`)
+ - `dtmf`: `{digit, duration}` (only if `passDtmf: true` on listen verb)
+ - `playDone`: `{id}` (after non-streaming playAudio completes)
+ - `mark`: `{name, event}` where event is `'playout'` or `'cleared'`
+ - `close`: `(code, reason)`
+ - `error`: `(err)`
+
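Because `audio` frames are 16-bit linear PCM, the amount of audio received can be derived from byte counts alone. A minimal sketch, assuming 2 bytes per sample and that the caller knows the channel count (1 for `mixType: 'mono'`); `frameDurationMs` is a hypothetical helper, not an `AudioStream` method.

```typescript
// Duration of a 16-bit linear PCM frame in milliseconds.
// Assumes 2 bytes per sample; `channels` is 1 for mono and 2 for a two-channel mix.
function frameDurationMs(pcm: Uint8Array, sampleRate: number, channels = 1): number {
  const bytesPerSample = 2;
  const samplesPerChannel = pcm.byteLength / (bytesPerSample * channels);
  return (samplesPerChannel * 1000) / sampleRate;
}
```

Inside `stream.on('audio', (pcm) => { ... })` the result could be summed to track how much audio has arrived, using `stream.sampleRate` as the rate. A Node `Buffer` is a `Uint8Array`, so it can be passed directly.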
+ ### Sending Audio Back (Bidirectional)
+
+ The listen verb supports bidirectional audio. There are two modes, controlled by the `bidirectionalAudio.streaming` option on the listen verb.
+
+ **Non-streaming mode** (`streaming: false`, the default) sends complete audio clips as base64:
+
+ ```typescript
+ stream.playAudio(base64Content, {
+   audioContentType: 'raw', // or 'wav'
+   sampleRate: 16000,
+   id: 'greeting', // optional, returned in the playDone event
+   queuePlay: true, // true: queue after current; false: interrupt (default)
+ });
+
+ stream.on('playDone', (evt) => {
+   console.log(`Finished playing: ${evt.id}`);
+ });
+ ```
+
+ Up to 10 playAudio commands can be queued simultaneously.
+
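To try non-streaming playback without an audio file, a test tone can be generated in code. This is a sketch only: it assumes that `'raw'` content means headerless 16-bit little-endian PCM at the stated sample rate, which the text above does not spell out, and `toneBase64` is a hypothetical helper. It relies on Node's global `Buffer`.

```typescript
// Generate a sine tone as 16-bit little-endian PCM and encode it as base64.
// Assumption: 'raw' audio is headerless 16-bit linear PCM at `sampleRate`.
function toneBase64(freqHz: number, durationMs: number, sampleRate: number): string {
  const sampleCount = Math.round((durationMs * sampleRate) / 1000);
  const buf = Buffer.alloc(sampleCount * 2); // 2 bytes per sample
  for (let i = 0; i < sampleCount; i++) {
    // 0.3 keeps the tone well below full scale
    const value = Math.sin((2 * Math.PI * freqHz * i) / sampleRate) * 0.3 * 32767;
    buf.writeInt16LE(Math.round(value), i * 2);
  }
  return buf.toString('base64');
}
```

The result could then be passed as the first argument of `stream.playAudio(...)` with `audioContentType: 'raw'` and a matching `sampleRate`.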
+ **Streaming mode** (`streaming: true`) sends raw binary PCM frames directly:
+
+ ```typescript
+ // In the listen verb config:
+ // bidirectionalAudio: { enabled: true, streaming: true, sampleRate: 16000 }
+
+ stream.on('audio', (pcm) => {
+   // Echo audio back (or send processed/generated audio)
+   stream.sendAudio(pcm);
+ });
+ ```
+
+ ### Marks (Synchronization Markers)
+
+ Marks let you track when streamed audio has been played out to the caller. They work **only with bidirectional streaming mode**: you must enable `bidirectionalAudio: { enabled: true, streaming: true }` on the listen verb.
+
+ The pattern is: stream audio via `sendAudio()`, then send a mark. When all the audio sent before the mark finishes playing out, jambonz sends back a mark event with `event: 'playout'`. This is how you know the caller has heard a specific chunk of audio.
+
+ ```typescript
+ // Listen verb must enable bidirectional streaming for marks to work
+ session
+   .listen({
+     url: '/audio',
+     actionHook: '/listen-done',
+     bidirectionalAudio: {
+       enabled: true,
+       streaming: true,
+       sampleRate: 8000,
+     },
+   })
+   .send();
+
+ // In the audio handler (pcmBuffer and morePcm stand for PCM audio you have prepared):
+ audioSvc.on('connection', (stream) => {
+   // Register the listener before sending any marks
+   stream.on('mark', (evt) => {
+     // evt.name = 'chunk-1' or 'chunk-2'
+     // evt.event = 'playout' (audio played) or 'cleared' (mark was cleared)
+   });
+
+   // Stream audio, then mark a sync point
+   stream.sendAudio(pcmBuffer);
+   stream.sendMark('chunk-1'); // fires 'playout' when audio above finishes playing
+
+   stream.sendAudio(morePcm);
+   stream.sendMark('chunk-2'); // fires 'playout' when this audio finishes
+
+   // Clear all pending marks (unplayed marks get event='cleared')
+   stream.clearMarks();
+ });
+ ```
+
+ **Important**: Without `bidirectionalAudio.streaming: true`, marks are accepted but never fire, because there is no playout buffer to sync against. This is the most common mistake when marks appear to silently fail.
+
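When many marks are in flight, it can help to pair each mark name with a callback. The sketch below is one possible approach, not something the SDK provides: `MarkTracker` is a hypothetical helper, and `MarkStream` is a minimal assumed view of the two `AudioStream` members it needs.

```typescript
// Minimal assumed view of the stream; the real AudioStream has more members.
interface MarkEvent {
  name: string;
  event: 'playout' | 'cleared';
}

interface MarkStream {
  sendMark(name: string): void;
  on(event: 'mark', listener: (evt: MarkEvent) => void): void;
}

// Hypothetical helper: run a callback once when a named mark plays out or is cleared.
class MarkTracker {
  private pending = new Map<string, (outcome: MarkEvent['event']) => void>();
  private stream: MarkStream;

  constructor(stream: MarkStream) {
    this.stream = stream;
    stream.on('mark', (evt) => {
      const callback = this.pending.get(evt.name);
      if (!callback) return; // unknown or already-settled mark
      this.pending.delete(evt.name);
      callback(evt.event);
    });
  }

  // Register the callback first, then send the mark.
  mark(name: string, onSettled: (outcome: MarkEvent['event']) => void): void {
    this.pending.set(name, onSettled);
    this.stream.sendMark(name);
  }

  // Number of marks sent but not yet reported back.
  get outstanding(): number {
    return this.pending.size;
  }
}
```

A caller would create one tracker per connection and call `tracker.mark('chunk-1', (outcome) => { ... })` right after the matching `sendAudio()`. Mark names need to be unique while they are pending, since a reused name overwrites the earlier callback.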
+ ### Other Commands
+
+ ```typescript
+ stream.killAudio();         // Stop playback, flush buffer
+ stream.disconnect();        // Close connection, end listen verb
+ stream.sendMark('sync-pt'); // Insert synchronization marker
+ stream.clearMarks();        // Clear all pending markers
+ stream.close();             // Close the WebSocket
+ ```
+
+ ## Recording
+
+ jambonz supports SIPREC-based call recording. Recording is controlled mid-call via inject commands (WebSocket) or future REST API extensions.
+
+ ### WebSocket Recording
+
+ ```typescript
+ // Start recording: sends audio via SIPREC to a recording server
+ session.injectRecord('startCallRecording', {
+   siprecServerURL: 'sip:recorder@example.com',
+   recordingID: 'my-recording-123', // optional
+ });
+
+ // Pause/resume recording
+ session.injectRecord('pauseCallRecording');
+ session.injectRecord('resumeCallRecording');
+
+ // Stop recording
+ session.injectRecord('stopCallRecording');
+ ```
+
+ **Important**: The `dial` verb must use `anchorMedia: true` for recording to work during bridged calls. Without media anchoring, audio doesn't flow through the jambonz media server.
+
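As an illustration of the note above, a dial verb that keeps media anchored might look like the following. The phone number and hook path are placeholders, and the `target` shape is assumed from the Target concept (a phone number destination) rather than taken from this section.

```typescript
// A dial verb with media anchoring enabled so bridged audio passes through
// the jambonz media server and can be recorded. Values are placeholders.
const dialWithRecordingSupport = {
  verb: 'dial',
  anchorMedia: true,
  actionHook: '/dial-result',
  target: [{ type: 'phone', number: '+15551230000' }], // assumed target shape
};
```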
+ ## REST API
+
+ jambonz provides a REST API for platform management and active call control. The API client is available in the SDK at `@jambonz/sdk/client`.
+
+ Key resources:
+ - **Calls**: Create outbound calls, query active calls, modify in-progress calls (redirect, whisper, mute, hangup)
+ - **Messages**: Send SMS/MMS messages
+
+ ## Code Structure
+
+ ### Single File (default)
+
+ For simple applications with 1-2 routes, put everything in a single file. This is the default for all examples in this repo and is perfectly suitable for production use.
+
+ ### Multi-File with Routes Directory
+
+ For applications with 3+ routes or significant per-route logic, split into a `src/` directory with a routes folder:
+
+ ```
+ src/
+   app.ts              ← entry point: server setup, route registration
+   routes/
+     incoming.ts       ← handler for one endpoint/path
+     hold-music.ts
+     queue-exit.ts
+ ```
+
+ **Webhook pattern**: each route file exports an Express route handler.
+
+ ```typescript
+ // src/routes/incoming.ts
+ import type { Request, Response } from 'express';
+ import { WebhookResponse } from '@jambonz/sdk/webhook';
+
+ export default function incoming(_req: Request, res: Response) {
+   const jambonz = new WebhookResponse();
+   jambonz
+     .say({ text: 'Thank you for calling. Please hold.' })
+     .enqueue({ name: 'support', waitHook: '/hold-music', actionHook: '/queue-exit' });
+   res.json(jambonz);
+ }
+ ```
+
+ ```typescript
+ // src/app.ts
+ import express from 'express';
+ import incoming from './routes/incoming.js';
+ import holdMusic from './routes/hold-music.js';
+ import queueExit from './routes/queue-exit.js';
+
+ const app = express();
+ app.use(express.json());
+
+ app.post('/incoming', incoming);
+ app.post('/hold-music', holdMusic);
+ app.post('/queue-exit', queueExit);
+
+ app.listen(3000, () => console.log('Listening on port 3000'));
+ ```
+
+ **WebSocket pattern**: there are two cases to consider.
+
+ 1. **Multiple services** (different `makeService({ path })` calls, where each path gets its own `session:new`). Each route file exports a function that takes a session:
+
+ ```typescript
+ // src/routes/caller.ts
+ import type { Session } from '@jambonz/sdk/websocket';
+
+ export default function caller(session: Session) {
+   session
+     .say({ text: 'Please hold.' })
+     .enqueue({ name: 'support', waitHook: '/hold-music', actionHook: '/queue-exit' })
+     .send();
+ }
+ ```
+
+ ```typescript
+ // src/app.ts
+ import http from 'http';
+ import { createEndpoint } from '@jambonz/sdk/websocket';
+ import caller from './routes/caller.js';
+ import agent from './routes/agent.js';
+
+ const server = http.createServer();
+ const makeService = createEndpoint({ server, port: 3000 });
+
+ makeService({ path: '/incoming' }).on('session:new', (session) => caller(session));
+ makeService({ path: '/agent' }).on('session:new', (session) => agent(session));
+ ```
+
+ 2. **Multiple actionHook handlers on one session**: extract handler functions, but register them all within `session:new`:
+
+ ```typescript
+ // src/routes/echo-handler.ts
+ import type { Session } from '@jambonz/sdk/websocket';
+
+ export default function echoHandler(session: Session, evt: Record<string, any>) {
+   if (evt.reason === 'speechDetected') {
+     const text = evt.speech?.alternatives?.[0]?.transcript || 'nothing';
+     session.say({ text: `You said: ${text}` })
+       .gather({ input: ['speech'], actionHook: '/echo', timeout: 10 })
+       .reply();
+   } else {
+     session.gather({ input: ['speech'], actionHook: '/echo', timeout: 10,
+       say: { text: 'I didn\'t hear anything. Try again.' } }).reply();
+   }
+ }
+ ```
+
+ ```typescript
+ // src/app.ts: wire it up (svc is created with makeService({ path }) as in the previous example)
+ import echoHandler from './routes/echo-handler.js';
+
+ svc.on('session:new', (session) => {
+   session.on('/echo', (evt) => echoHandler(session, evt));
+   session.gather({ input: ['speech'], actionHook: '/echo', timeout: 10,
+     say: { text: 'Say something.' } }).send();
+ });
+ ```
+
+ ### When to Split
+
+ - **1-2 routes, simple logic** → single file
+ - **3+ routes or substantial per-route logic** → `src/app.ts` + `src/routes/`
+ - **Shared config, prompts, or utilities** → `src/config.ts`, `src/prompts.ts`, etc.
+
+ When in doubt, start with a single file. It's easy to split later.
+
+ ## Examples
+
+ Complete working examples are in the `examples/` directory:
+ - **hello-world**: Minimal greeting (webhook + WebSocket)
+ - **echo**: Speech echo using gather with actionHook pattern (webhook + WebSocket). The canonical example for understanding actionHook event handling.
+ - **ivr-menu**: Interactive menu with speech and DTMF input (webhook)
+ - **dial**: Simple outbound dial to a phone number (webhook)
+ - **listen-record**: Record audio using the listen verb to stream to a WebSocket (webhook)
+ - **voice-agent**: LLM-powered conversational AI with tool calls (webhook + WebSocket)
+ - **openai-realtime**: OpenAI Realtime API voice agent with function calling (WebSocket)
+ - **deepgram-voice-agent**: Deepgram Voice Agent API with function calling (WebSocket)
+ - **elevenlabs-voice-agent**: ElevenLabs Conversational AI agent (WebSocket). Demonstrates the agent_id auth pattern unique to ElevenLabs.
+ - **llm-streaming**: Anthropic LLM with TTS token streaming and barge-in (WebSocket)
+ - **queue-with-hold**: Call queue with hold music and agent dequeue (webhook + WebSocket)
+ - **call-recording**: Mid-call recording control via REST API and inject commands (webhook + WebSocket)
+ - **realtime-translator**: Bridges two parties with real-time speech translation using STT, Google Translate, and TTS dub tracks. Multi-file example with `src/routes/` structure (WebSocket)
+
+ ## Key Concepts
+
+ - **Verb**: A JSON object with a `verb` property that tells jambonz what to do. Verbs execute sequentially.
+ - **ActionHook**: A webhook URL that jambonz calls when a verb completes. Your handler returns the next verb array. The payload includes call details and verb-specific results.
+ - **Synthesizer**: TTS configuration (vendor, voice, language).
+ - **Recognizer**: STT configuration (vendor, language, model).
+ - **Target**: A call destination (phone number, SIP URI, registered user, Teams user).
+ - **Session**: A single phone call. Session-level settings (set via `config`) persist across verbs.
+ - **Inject Command**: Asynchronous mid-call modification (WebSocket). Executes immediately without replacing the verb stack.
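
To tie the Verb and ActionHook definitions together, the sketch below shows a verb array as plain data and a function that produces the next array when a hook fires. The `Verb` type and the `nextVerbs` helper are illustrative only; the verb names and properties mirror the echo example earlier in this document.

```typescript
// Illustrative type: every verb is an object with a `verb` name plus its own options.
type Verb = { verb: string } & Record<string, unknown>;

// The initial verb array: executed in order, top to bottom.
const initialVerbs: Verb[] = [
  { verb: 'say', text: 'Say something.' },
  { verb: 'gather', input: ['speech'], actionHook: '/echo', timeout: 10 },
];

// What a handler for the '/echo' actionHook might return as the next verb array.
function nextVerbs(transcript: string | undefined): Verb[] {
  return [
    { verb: 'say', text: `You said: ${transcript ?? 'nothing'}` },
    { verb: 'hangup' },
  ];
}
```

In a webhook app the returned array is the JSON response body; in a WebSocket app the SDK's `.reply()` sends the equivalent.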