@jambonz/schema 0.1.1 → 0.1.3
- package/AGENTS.md +20 -731
- package/README.md +64 -0
- package/jambonz-app.schema.json +2 -1
- package/package.json +1 -1
- package/verbs/dial.schema.json +14 -4
- package/verbs/hangup.schema.json +1 -1
- package/verbs/rest_dial.schema.json +113 -0
- package/verbs/sip-decline.schema.json +1 -1
- package/verbs/sip-refer.schema.json +1 -1
- package/verbs/sip-request.schema.json +1 -1
package/AGENTS.md
CHANGED
@@ -1,10 +1,8 @@
-# jambonz
+# jambonz Developer Guide
 
-
-
-## Before You Start — Ask the User
+This guide covers the jambonz verb model, transport modes, and protocol. For SDK-specific documentation, see the AGENTS.md in the respective SDK repository.
 
-
+jambonz is an open-source CPaaS (Communications Platform as a Service) for building voice and messaging applications. It handles telephony infrastructure — SIP, carriers, phone numbers, media processing — so you can focus on application logic.
 
 ## Server Versions
 
@@ -37,13 +35,24 @@ The verb schemas and JSON structure are identical in both modes. The difference
 - **Webhook**: Simple IVR, call routing, voicemail, basic gather-and-respond patterns.
 - **WebSocket**: LLM-powered voice agents, real-time audio streaming, complex conversational flows, anything requiring bidirectional communication, or asynchronous logic, or streaming tts.
 
-**IMPORTANT**: Any application that uses a speech-to-speech verb (`openai_s2s`, `google_s2s`, `deepgram_s2s`, `ultravox_s2s`, `elevenlabs_s2s`, `s2s`, or `pipeline`) MUST use WebSocket transport, not webhooks. These verbs require persistent bidirectional communication for real-time audio and events.
+**IMPORTANT**: Any application that uses a speech-to-speech verb (`openai_s2s`, `google_s2s`, `deepgram_s2s`, `ultravox_s2s`, `elevenlabs_s2s`, `s2s`, or `pipeline`) MUST use WebSocket transport, not webhooks. These verbs require persistent bidirectional communication for real-time audio and events.
 
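As a rough illustration of the transport rule above, the following sketch flags verb arrays that need WebSocket transport. The `requiresWebSocket` helper and the `Verb` type are hypothetical names, not part of any jambonz package. The verb names come from the rule itself; `llm` is included only because the guide describes it elsewhere as a legacy synonym for `s2s`.

```typescript
// Verb names that the guide says must run over WebSocket transport.
// `llm` is added here as an assumption, since it is described as a synonym of `s2s`.
const WEBSOCKET_ONLY_VERBS: readonly string[] = [
  'openai_s2s', 'google_s2s', 'deepgram_s2s', 'ultravox_s2s',
  'elevenlabs_s2s', 's2s', 'pipeline', 'llm',
];

// Minimal verb shape: only the `verb` discriminator is relied upon.
interface Verb {
  verb: string;
  [option: string]: unknown;
}

// Hypothetical helper: true when any verb in the array needs WebSocket transport.
function requiresWebSocket(verbs: Verb[]): boolean {
  return verbs.some((v) => WEBSOCKET_ONLY_VERBS.indexOf(v.verb) !== -1);
}

const ivr: Verb[] = [{ verb: 'say', text: 'Hello' }, { verb: 'hangup' }];
const agent: Verb[] = [{ verb: 'openai_s2s', actionHook: '/done' }];

console.log(requiresWebSocket(ivr));   // false
console.log(requiresWebSocket(agent)); // true
```

A check like this could run before deployment to catch a speech-to-speech application that was accidentally configured with a webhook URL.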
 ## Schema
 
-The complete verb schema is at `
+The complete verb schema is at `jambonz-app.schema.json` in the package root. This is a JSON Schema (draft 2020-12) that defines the structure of a jambonz application.
+
+Individual verb schemas are in `verbs/`. Shared component types (synthesizer, recognizer, target, etc.) are in `components/`. Callback payload schemas are in `callbacks/`.
+
+### MCP Server
+
+AI agents can fetch individual schemas on demand via the jambonz MCP server:
 
-
+- **Remote**: `https://mcp-server.jambonz.app/mcp` (no install needed)
+- **Local**: `npx @jambonz/mcp-schema-server` (stdio) or `npx @jambonz/mcp-schema-server --http` (HTTP)
+
+Two tools are available:
+1. **`jambonz_developer_toolkit`** — Returns this guide plus an index of all available schemas. Call this first.
+2. **`get_jambonz_schema`** — Fetch the JSON Schema for any verb, component, or callback (e.g. `verb:say`, `component:synthesizer`, `callback:gather`, `guide:session-commands`).
 
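The schema identifiers quoted above follow a `kind:name` pattern. The sketch below shows one way such an identifier might be parsed and mapped to a file in the package. Both functions are hypothetical helpers. The `verbs/<name>.schema.json` layout matches files listed in this diff (for example `verbs/dial.schema.json`); applying the same suffix to `components/` and `callbacks/` is an assumption, and no path is guessed for `guide:` entries.

```typescript
type SchemaKind = 'verb' | 'component' | 'callback' | 'guide';

interface SchemaRef {
  kind: SchemaKind;
  name: string;
}

const KNOWN_KINDS: readonly string[] = ['verb', 'component', 'callback', 'guide'];

// Split an identifier such as "verb:say" into its kind and name.
function parseSchemaId(id: string): SchemaRef {
  const sep = id.indexOf(':');
  const kind = sep === -1 ? '' : id.slice(0, sep);
  const name = sep === -1 ? '' : id.slice(sep + 1);
  if (KNOWN_KINDS.indexOf(kind) === -1 || name.length === 0) {
    throw new Error(`Unrecognized schema identifier: ${id}`);
  }
  return { kind: kind as SchemaKind, name };
}

// Assumed mapping from kind to package directory; guides have no known path.
function packagePath(ref: SchemaRef): string | undefined {
  const dirs: Record<string, string> = {
    verb: 'verbs',
    component: 'components',
    callback: 'callbacks',
  };
  const dir = dirs[ref.kind];
  return dir ? `${dir}/${ref.name}.schema.json` : undefined;
}

console.log(packagePath(parseSchemaId('verb:say')));               // verbs/say.schema.json
console.log(packagePath(parseSchemaId('guide:session-commands'))); // undefined
```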
 ## Core Verbs
 
@@ -93,211 +102,9 @@ Individual verb schemas are in `schema/verbs/`. Shared component types (synthesi
 3. **Use `s2s` (not `llm`) when the vendor is dynamic** — e.g. the vendor comes from an env var or runtime config. Both `s2s` and `llm` are synonyms, but prefer `s2s`.
 4. **Never use `llm` in generated code** — it is a legacy name. Use either a vendor shortcut or `s2s`.
 
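The naming rules above can be condensed into a small lookup. This is a sketch under assumptions: rules 1 and 2 fall outside this hunk, so the "static vendor uses its shortcut" branch is inferred from rule 4, and the vendor keys other than `openai` are guessed from the verb names. `s2sVerbName` is a hypothetical helper.

```typescript
// Vendor shortcuts named in the guide; the keys are assumed vendor identifiers.
const VENDOR_SHORTCUTS: Record<string, string> = {
  openai: 'openai_s2s',
  google: 'google_s2s',
  deepgram: 'deepgram_s2s',
  ultravox: 'ultravox_s2s',
  elevenlabs: 'elevenlabs_s2s',
};

// Pick the verb name to emit. Pass the vendor only when it is a literal known
// at authoring time; leave it undefined when it comes from env vars or runtime config.
function s2sVerbName(staticVendor?: string): string {
  if (
    staticVendor !== undefined &&
    Object.prototype.hasOwnProperty.call(VENDOR_SHORTCUTS, staticVendor)
  ) {
    return VENDOR_SHORTCUTS[staticVendor];
  }
  return 's2s'; // never the legacy `llm` name
}

console.log(s2sVerbName('openai')); // openai_s2s
console.log(s2sVerbName());         // s2s
```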
-The same rules apply to SDK method calls: use `.openai_s2s(opts)`, `.deepgram_s2s(opts)`, etc. instead of `.llm({ vendor: 'openai', ... })`. Use `.stream(opts)` instead of `.listen(opts)`.
-
-## Using the @jambonz/sdk
-
-**IMPORTANT**: Always use the `@jambonz/sdk` package (version 0.1.x+) to build jambonz applications. The older packages `@jambonz/node-client-ws` and `@jambonz/node-client` are **deprecated and replaced** by `@jambonz/sdk`. Do NOT use those old packages. Do NOT build apps with raw JSON verb arrays and plain Express/WS — use `@jambonz/sdk`.
-
-**IMPORTANT — App Environment Variables vs process.env**: jambonz applications should NEVER use `process.env` for application-configurable values (phone numbers, API keys, language preferences, greeting text, etc.). Instead, use **jambonz application environment variables** — a two-step pattern:
-1. **Declare** the variables so the jambonz portal can discover them (via `envVars` option for WebSocket, or `envVarsMiddleware` for webhook).
-2. **Read** the values at runtime from the call payload (`session.data.env_vars` for WebSocket, `req.body.env_vars` for webhook).
-
-Both steps are required. Declaring without reading means values are ignored. Reading without declaring means the portal won't know about them and won't send them. See the [Application Environment Variables](#application-environment-variables) section for full details.
-
-Install: `npm install @jambonz/sdk`
-
-**Dependencies**: Webhook apps also require `express` (`npm install express`). WebSocket apps have no additional dependencies — the SDK includes `ws` internally. When generating a `package.json`, always include all required dependencies.
-
-### Webhook Application (HTTP)
-
-Import `WebhookResponse` from `@jambonz/sdk/webhook`. Create an Express server, construct a `WebhookResponse` for each request, chain verb methods, and return it via `res.json()`.
-
-**Best practice**: Always include a `POST /call-status` handler. jambonz sends call status change notifications (ringing, in-progress, completed, etc.) to this endpoint. The handler should log the event and return 200. The path `/call-status` is conventional but the user may choose a different path:
-
-```typescript
-app.post('/call-status', (req, res) => {
-  console.log(`Call ${req.body.call_sid} status: ${req.body.call_status}`);
-  res.sendStatus(200);
-});
-```
-
-```typescript
-import express from 'express';
-import { WebhookResponse } from '@jambonz/sdk/webhook';
-
-const app = express();
-app.use(express.json());
-
-app.post('/incoming', (req, res) => {
-  const jambonz = new WebhookResponse();
-  jambonz
-    .say({ text: 'Hello! Welcome to our service.' })
-    .gather({
-      input: ['speech', 'digits'],
-      actionHook: '/handle-input',
-      numDigits: 1,
-      timeout: 10,
-      say: { text: 'Press 1 for sales or 2 for support.' },
-    })
-    .say({ text: 'We did not receive any input. Goodbye.' })
-    .hangup();
-
-  res.json(jambonz);
-});
-
-app.post('/handle-input', (req, res) => {
-  const { digits, speech } = req.body;
-  const jambonz = new WebhookResponse();
-  jambonz.say({ text: `You pressed ${digits || 'nothing'}. Goodbye.` }).hangup();
-  res.json(jambonz);
-});
-
-// Every webhook app must handle call status events
-app.post('/call-status', (req, res) => {
-  console.log(`Call ${req.body.call_sid} status: ${req.body.call_status}`);
-  res.sendStatus(200);
-});
-
-app.listen(3000, () => console.log('Listening on port 3000'));
-```
-
-### WebSocket Application
-
-Import `createEndpoint` from `@jambonz/sdk/websocket`. Create an HTTP server, call `createEndpoint` to set up WebSocket handling, then register path-based services that receive `session` objects.
-
-```typescript
-import http from 'http';
-import { createEndpoint } from '@jambonz/sdk/websocket';
-
-const server = http.createServer();
-const makeService = createEndpoint({ server, port: 3000 });
-
-const svc = makeService({ path: '/' });
-
-svc.on('session:new', (session) => {
-  console.log(`Incoming call: ${session.callSid}`);
-
-  session
-    .say({ text: 'Hello from jambonz over WebSocket!' })
-    .hangup()
-    .send();
-});
-
-console.log('jambonz ws app listening on port 3000');
-```
-
-**Key differences from webhook**: Use `session` instead of `WebhookResponse`. Chain verbs the same way, but call `.send()` at the end to transmit the initial verb array over the WebSocket.
-
-### WebSocket actionHook Events (CRITICAL)
-
-In webhook mode, an `actionHook` is just a URL that jambonz POSTs to. In WebSocket mode, the `actionHook` value becomes an **event name** emitted on the session. You MUST bind a handler for it and respond with `.reply()`.
-
-**Key rules for WebSocket actionHook handling:**
-1. Use `session.on('/hookName', (evt) => {...})` to listen for the actionHook event.
-2. The `evt` object contains the actionHook payload (same fields as the webhook POST body: `reason`, `speech`, `digits`, etc.).
-3. Respond with `.reply()` — NOT `.send()`. `.send()` is only for the initial verb array (the first response to `session:new`). `.reply()` acknowledges the actionHook and provides the next verb array.
-4. If no listener is bound for an actionHook, the SDK auto-replies with an empty verb array.
-
-### WebSocket with Gather (speech echo example)
-
-```typescript
-import http from 'http';
-import { createEndpoint } from '@jambonz/sdk/websocket';
-
-const server = http.createServer();
-const makeService = createEndpoint({ server, port: 3000 });
-
-const svc = makeService({ path: '/' });
-
-svc.on('session:new', (session) => {
-  // Bind actionHook handler BEFORE sending verbs
-  session
-    .on('close', (code: number, _reason: Buffer) => {
-      console.log(`Session ${session.callSid} closed: ${code}`);
-    })
-    .on('error', (err: Error) => {
-      console.error(`Session ${session.callSid} error:`, err);
-    })
-    .on('/echo', (evt: Record<string, any>) => {
-      // This fires when the gather verb completes (actionHook: '/echo')
-      switch (evt.reason) {
-        case 'speechDetected': {
-          const transcript = evt.speech?.alternatives?.[0]?.transcript || 'nothing';
-          session
-            .say({ text: `You said: ${transcript}.` })
-            .gather({
-              input: ['speech'],
-              actionHook: '/echo',
-              timeout: 10,
-              say: { text: 'Please say something else.' },
-            })
-            .reply(); // reply() — NOT send()
-          break;
-        }
-        case 'timeout':
-          session
-            .gather({
-              input: ['speech'],
-              actionHook: '/echo',
-              timeout: 10,
-              say: { text: 'Are you still there? I didn\'t hear anything.' },
-            })
-            .reply();
-          break;
-        default:
-          session.reply();
-          break;
-      }
-    });
-
-  // Send initial verbs to jambonz
-  session
-    .pause({ length: 1 })
-    .gather({
-      input: ['speech'],
-      actionHook: '/echo',
-      timeout: 10,
-      say: { text: 'Please say something and I will echo it back to you.' },
-    })
-    .send(); // send() — first response only
-});
-
-console.log('Speech echo WebSocket app listening on port 3000');
-```
-
-**`.send()` vs `.reply()`:**
-- `.send()` — Use ONCE for the initial verb array in response to `session:new`. This acknowledges the session.
-- `.reply()` — Use for ALL subsequent responses (actionHook events, session:redirect). This acknowledges the hook message and provides the next verb array.
-
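To make the actionHook reply rules above concrete, here is a self-contained model of the dispatch behaviour they describe: a bound handler supplies the next verb array, and an unbound hook is answered with an empty array. This is not the SDK's implementation; `HookDispatcher` and the flat `transcript` payload field are hypothetical simplifications for illustration.

```typescript
type Verb = { verb: string; [option: string]: unknown };
type HookPayload = Record<string, unknown>;
type HookHandler = (data: HookPayload) => Verb[];

// Toy stand-in for a session's actionHook routing (not the real SDK class).
class HookDispatcher {
  private handlers = new Map<string, HookHandler>();

  // Bind a handler to an actionHook name such as '/echo'.
  on(hook: string, handler: HookHandler): this {
    this.handlers.set(hook, handler);
    return this;
  }

  // Return the verb array that would be sent back as the reply.
  // An unbound hook yields an empty array, mirroring the described auto-reply.
  dispatch(hook: string, data: HookPayload): Verb[] {
    const handler = this.handlers.get(hook);
    return handler ? handler(data) : [];
  }
}

const hooks = new HookDispatcher().on('/echo', (data) => {
  const heard = data.transcript !== undefined ? String(data.transcript) : 'nothing';
  return [{ verb: 'say', text: `You said: ${heard}.` }];
});

console.log(hooks.dispatch('/echo', { transcript: 'hello' }));
console.log(hooks.dispatch('/unbound', {}));
```

Modelling the reply as a returned array keeps the example free of network code while preserving the one-hook, one-reply contract.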
-### SDK Verb Method Reference
-
-Both `WebhookResponse` and `Session` support the same chainable verb methods:
-
-`.say(opts)` `.play(opts)` `.gather(opts)` `.dial(opts)` `.llm(opts)` `.s2s(opts)` `.openai_s2s(opts)` `.google_s2s(opts)` `.elevenlabs_s2s(opts)` `.deepgram_s2s(opts)` `.ultravox_s2s(opts)` `.dialogflow(opts)` `.conference(opts)` `.enqueue(opts)` `.dequeue(opts)` `.hangup()` `.pause(opts)` `.redirect(opts)` `.config(opts)` `.tag(opts)` `.dtmf(opts)` `.listen(opts)` `.transcribe(opts)` `.message(opts)` `.stream(opts)` `.pipeline(opts)` `.dub(opts)` `.alert(opts)` `.answer(opts)` `.leave()` `.sipDecline(opts)` `.sipRefer(opts)` `.sipRequest(opts)`
-
-All methods accept the same options as the corresponding verb JSON Schema. Methods are chainable — they return `this`.
-
-### REST API Client
-
-```typescript
-import { JambonzClient } from '@jambonz/sdk/client';
-
-const client = new JambonzClient({ baseUrl: 'https://api.jambonz.us', accountSid, apiKey });
-
-// Create an outbound call
-await client.calls.create({ from: '+15085551212', to: { type: 'phone', number: '+15085551213' }, call_hook: '/incoming' });
-
-// Mid-call control
-await client.calls.whisper(callSid, { verb: 'say', text: 'Supervisor listening.' });
-await client.calls.mute(callSid, 'mute');
-await client.calls.redirect(callSid, 'https://example.com/new-flow');
-await client.calls.update(callSid, { call_status: 'completed' });
-```
-
 ## Common Patterns (Raw JSON)
 
-These are the raw JSON verb arrays that
+These are the raw JSON verb arrays that jambonz applications produce. They show the underlying structure for reference.
 
 ### Simple Greeting and Gather
 ```json
@@ -354,19 +161,6 @@ ElevenLabs works differently from other s2s vendors. Instead of passing a model
 ]
 ```
 
-SDK example:
-```typescript
-session
-  .elevenlabs_s2s({
-    auth: { agent_id: agentId, api_key: apiKey },
-    llmOptions: {},
-    actionHook: '/s2s-complete',
-    eventHook: '/event',
-    events: ['all'],
-  })
-  .send();
-```
-
 ### Dial with Fallback
 ```json
 [
@@ -424,170 +218,6 @@ When a verb completes, jambonz invokes the `actionHook` URL (webhook) or sends a
 
 **transcribe**: `transcription` (object with transcript text)
 
-## Application Environment Variables
-
-jambonz has a built-in mechanism for application configuration that is **always preferred over `process.env`**. It works in two required steps:
-
-1. **Declare** — Your app declares its configurable parameters with a schema. The jambonz portal discovers these via an HTTP `OPTIONS` request and renders a configuration form for administrators.
-2. **Receive** — When a call arrives, jambonz delivers the configured values in the call payload as `env_vars`. Your app reads them from there.
-
-**IMPORTANT**: Both steps are required. If you only declare without reading, the values are ignored. If you only read without declaring, the portal won't discover the parameters and won't send them. NEVER use `process.env` for values that should be configurable per-application in the jambonz portal.
-
-**When to use env vars**: Phone numbers to dial, API keys, language/voice preferences, greeting text, queue names, timeout values, feature flags, or any value that might change between deployments or users. If in doubt, make it an env var.
-
-### Step 1: Define the Schema
-
-Define a schema object where each key is a parameter name and the value describes its type and UI behavior:
-
-```typescript
-const envVars = {
-  API_KEY: { type: 'string', description: 'Your API key', required: true, obscure: true },
-  LANGUAGE: { type: 'string', description: 'TTS language', default: 'en-US', enum: ['en-US', 'es-ES', 'fr-FR'] },
-  MAX_RETRIES: { type: 'number', description: 'Max retry attempts', default: 3 },
-  CARRIER: { type: 'string', description: 'Outbound carrier', jambonzResource: 'carriers' },
-  SYSTEM_PROMPT: { type: 'string', description: 'LLM system prompt', uiHint: 'textarea' },
-  TLS_CERT: { type: 'string', description: 'TLS certificate', uiHint: 'filepicker' },
-};
-```
-
-Each parameter supports:
-
-| Property | Required | Description |
-|----------|----------|-------------|
-| `type` | Yes | `'string'` \| `'number'` \| `'boolean'` |
-| `description` | Yes | Human-readable label shown in the portal |
-| `required` | No | Whether the user must provide a value |
-| `default` | No | Pre-filled default value |
-| `enum` | No | Array of allowed values — renders as a dropdown |
-| `obscure` | No | Masks the value in the portal UI (for secrets/API keys) |
-| `uiHint` | No | `'input'` (default single-line), `'textarea'` (multi-line), or `'filepicker'` (file upload with textarea) |
-| `jambonzResource` | No | Populate a dropdown from jambonz account data. Currently supports `'carriers'` (lists VoIP carriers on the account) |
-
-**Notes on `jambonzResource`**: When set to `'carriers'`, the portal fetches the VoIP carriers configured for the account and renders them as a dropdown. The stored value is the carrier name. This is preferred over hardcoding carrier names or using `enum` with static values.
-
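The declare-then-read pattern leaves the application responsible for handling values that were never configured. The sketch below shows one possible way to merge a declared schema with the `env_vars` object from a call payload, applying `default` and enforcing `required`. `resolveEnvVars` and its types are hypothetical helpers rather than SDK exports, and the sketch does not assume how the portal encodes numbers or booleans.

```typescript
type EnvValue = string | number | boolean;

// Subset of the schema properties described in the table above.
interface EnvVarSpec {
  type: 'string' | 'number' | 'boolean';
  description: string;
  required?: boolean;
  default?: EnvValue;
}

type EnvVarSchema = Record<string, EnvVarSpec>;

// Merge received values with declared defaults; fail fast on missing required values.
function resolveEnvVars(
  schema: EnvVarSchema,
  received: Record<string, EnvValue> | undefined,
): Record<string, EnvValue> {
  const resolved: Record<string, EnvValue> = {};
  for (const name of Object.keys(schema)) {
    const spec = schema[name];
    const value =
      received !== undefined && received[name] !== undefined ? received[name] : spec.default;
    if (value === undefined) {
      if (spec.required) throw new Error(`Missing required env var: ${name}`);
      continue; // optional and unset: leave it out
    }
    resolved[name] = value;
  }
  return resolved;
}

const schema: EnvVarSchema = {
  API_KEY: { type: 'string', description: 'Your API key', required: true },
  GREETING: { type: 'string', description: 'Greeting message', default: 'Hello!' },
};

console.log(resolveEnvVars(schema, { API_KEY: 'secret' }));
```

Centralising the fallback logic avoids repeating `|| 'default'` expressions at every read site, at the cost of one extra helper.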
-### Step 2: Register and Read — WebSocket Apps
-
-Pass `envVars` to `createEndpoint` to register the declaration (the SDK auto-responds to OPTIONS requests), then read values from `session.data.env_vars`:
-
-```typescript
-import http from 'http';
-import { createEndpoint } from '@jambonz/sdk/websocket';
-
-const envVars = {
-  GREETING: { type: 'string', description: 'Greeting message', default: 'Hello!' },
-  LANGUAGE: { type: 'string', description: 'TTS language', default: 'en-US' },
-};
-
-const server = http.createServer();
-const makeService = createEndpoint({ server, port: 3000, envVars }); // Step 1: declare
-
-const svc = makeService({ path: '/' });
-
-svc.on('session:new', (session) => {
-  const greeting = session.data.env_vars?.GREETING || 'Hello!'; // Step 2: read
-  const language = session.data.env_vars?.LANGUAGE || 'en-US';
-
-  session.say({ text: greeting, language }).hangup().send();
-});
-```
-
-### Step 2: Register and Read — Webhook Apps
-
-Use `envVarsMiddleware` to register the declaration (auto-responds to OPTIONS requests), then read values from `req.body.env_vars`:
-
-```typescript
-import express from 'express';
-import { WebhookResponse, envVarsMiddleware } from '@jambonz/sdk/webhook';
-
-const envVars = {
-  GREETING: { type: 'string', description: 'Greeting message', default: 'Hello!' },
-  LANGUAGE: { type: 'string', description: 'TTS language', default: 'en-US' },
-};
-
-const app = express();
-app.use(express.json());
-app.use(envVarsMiddleware(envVars)); // Step 1: declare
-
-app.post('/incoming', (req, res) => {
-  const greeting = req.body.env_vars?.GREETING || 'Hello!'; // Step 2: read
-  const language = req.body.env_vars?.LANGUAGE || 'en-US';
-
-  const jambonz = new WebhookResponse();
-  jambonz.say({ text: greeting, language }).hangup();
-  res.json(jambonz);
-});
-```
-
-**Note**: `env_vars` is only present in the initial call webhook (or `session:new` for WebSocket), not in subsequent actionHook callbacks. If you need env var values in actionHook handlers, store them in a variable during the initial call.
-
-## Mid-Call Control
-
-Active calls can be modified asynchronously — inject verbs, mute, redirect, or start recording while the call is in progress.
-
-### REST API (Webhook Apps)
-
-Use `PUT /v1/Accounts/{accountSid}/Calls/{callSid}` to modify an active call:
-
-```json
-{ "whisper": { "verb": "say", "text": "Supervisor is listening." } }
-{ "mute_status": "mute" }
-{ "call_hook": "https://example.com/new-flow" }
-{ "call_status": "completed" }
-{ "listen_status": "pause" }
-```
-
-The SDK provides typed methods:
-```typescript
-import { JambonzClient } from '@jambonz/sdk/client';
-const client = new JambonzClient({ baseUrl, accountSid, apiKey });
-
-await client.calls.whisper(callSid, { verb: 'say', text: 'Hello' });
-await client.calls.mute(callSid, 'mute');
-await client.calls.redirect(callSid, 'https://example.com/new-flow');
-await client.calls.update(callSid, { call_status: 'completed' });
-```
-
-### Inject Commands (WebSocket Apps)
-
-WebSocket sessions can inject commands for immediate execution:
-
-```typescript
-// Recording
-session.injectRecord('startCallRecording', { siprecServerURL: 'sip:recorder@example.com' });
-session.injectRecord('stopCallRecording');
-
-// Whisper a verb to one party
-session.injectWhisper({ verb: 'say', text: 'You have 5 minutes remaining.' });
-
-// Mute/unmute
-session.injectMute('mute');
-session.injectMute('unmute');
-
-// Pause/resume audio streaming
-session.injectListenStatus('pause');
-
-// Send DTMF
-session.injectDtmf('1');
-
-// Attach metadata
-session.injectTag({ supervisor: 'jane', priority: 'high' });
-
-// Generic inject (for any command)
-session.injectCommand('redirect', { call_hook: '/new-flow' });
-```
-
-## Session Commands
-
-Beyond verbs, WebSocket apps can perform async operations at any time during a call: TTS token streaming, inject commands (mute, whisper, DTMF, recording), and LLM tool output. These are SDK method calls that execute immediately without affecting the verb stack.
-
-**Fetch the full reference with `guide:session-commands`** — covers all commands with SDK methods, events, setup, and complete examples including how to build a cascaded voice AI agent (app-managed LLM with TTS token streaming).
-
-Key capabilities:
-- **TTS token streaming** — `sendTtsTokens()`, `flushTtsTokens()`, `clearTtsTokens()` — pipe LLM tokens to jambonz incrementally for lowest-latency TTS playback. **Not the same as `autoStreamTts`** (which is a jambonz-internal audio optimization).
-- **Inject commands** — `injectMute()`, `injectWhisper()`, `injectDtmf()`, `injectRecord()`, `injectTag()`, `injectListenStatus()` — modify the call mid-stream.
-- **LLM tool output** — `toolOutput()` — return tool call results to the pipeline verb's LLM.
-- **Cascaded voice AI agents** — build your own STT→LLM→TTS loop using `config` (ttsStream + bargeIn) + `sendTtsTokens()`. Full control over LLM interaction and conversation history.
-
 ## WebSocket Protocol
 
 ### Message Types (jambonz → app)
@@ -596,7 +226,7 @@ Key capabilities:
 |------|-------------|
 | `session:new` | New call session established. Contains call details. |
 | `session:redirect` | Call was redirected to this app. |
-| `verb:hook` | An actionHook fired (e.g. gather completed). Contains `hook` (the actionHook name) and `data` (the payload).
+| `verb:hook` | An actionHook fired (e.g. gather completed). Contains `hook` (the actionHook name) and `data` (the payload). Respond with an ack containing the next verb array. |
 | `verb:status` | Informational verb status notification (no reply needed). |
 | `call:status` | Call state changed (e.g. `completed`). |
 | `llm:tool-call` | LLM requested a tool/function call. |
@@ -614,355 +244,14 @@ Key capabilities:
 | `tts:tokens` | Stream TTS text tokens for incremental speech synthesis. |
 | `tts:flush` | Signal end of a TTS token stream. |
 
-### Session Events (SDK)
-
-The SDK `Session` object emits events for common message types:
-
-```typescript
-// ActionHook events — the hook name IS the event name. Respond with .reply()
-session.on('/echo', (data) => { /* gather actionHook fired */ session.say({text: '...'}).reply(); });
-session.on('/dial-result', (data) => { /* dial actionHook */ session.reply(); });
-session.on('/llm-complete', (data) => { /* llm actionHook */ session.hangup().reply(); });
-
-// Fallback — fires for any verb:hook without a specific listener
-session.on('verb:hook', (hook, data) => { /* generic actionHook handler */ });
-
-// Status events (informational — no reply needed)
-session.on('verb:status', (data) => { /* verb status notification */ });
-session.on('call:status', (data) => { /* call state change */ });
-
-// LLM events
-session.on('llm:tool-call', (data) => { /* tool call from LLM */ });
-session.on('llm:event', (data) => { /* LLM event */ });
-
-// TTS streaming — specific lifecycle events
-session.on('tts:stream_open', (data) => { /* vendor connection established */ });
-session.on('tts:stream_paused', (data) => { /* backpressure: buffer full */ });
-session.on('tts:stream_resumed', (data) => { /* backpressure released */ });
-session.on('tts:stream_closed', (data) => { /* TTS stream ended */ });
-session.on('tts:user_interruption', (data) => { /* user barge-in (with event data) */ });
-session.on('tts:user_interrupt', () => { /* user barge-in (convenience, no data) */ });
-// Catch-all for any TTS streaming event
-session.on('tts:streaming-event', (data) => { /* data.event_type has the type */ });
-
-// Connection lifecycle
-session.on('close', (code, reason) => { /* connection closed */ });
-session.on('error', (err) => { /* error */ });
-```
-
-## Audio WebSocket (Listen/Stream)
-
-The `listen` and `stream` verbs open a separate WebSocket connection from jambonz to your application, carrying raw audio. This is independent of the control WebSocket (`ws.jambonz.org`) — it uses the `audio.drachtio.org` subprotocol.
-
-### Receiving Audio in the Same Application
-
-Use `makeService.audio()` to register an audio WebSocket handler on the same server that handles the control pipe:
-
-```typescript
-import http from 'http';
-import { createEndpoint } from '@jambonz/sdk/websocket';
-
-const server = http.createServer();
-const makeService = createEndpoint({ server, port: 3000 });
-
-// Control pipe — handles call sessions
-const svc = makeService({ path: '/' });
-
-// Audio pipe — receives listen/stream audio
-const audioSvc = makeService.audio({ path: '/audio-stream' });
-
-svc.on('session:new', (session) => {
-  session
-    .answer()
-    .say({ text: 'Recording your audio.' })
-    .listen({
-      url: '/audio-stream', // relative path — jambonz connects back to same server
-      sampleRate: 16000,
-      mixType: 'mono',
-      metadata: { purpose: 'recording' },
-    })
-    .hangup()
-    .send();
-});
-
-audioSvc.on('connection', (stream) => {
-  console.log(`Audio from call ${stream.callSid}, rate=${stream.sampleRate}`);
-  console.log('Metadata:', stream.metadata);
-
-  stream.on('audio', (pcm: Buffer) => {
-    // L16 PCM binary frames
-  });
-
-  stream.on('close', () => {
-    console.log('Audio stream closed');
-  });
-});
-```
-
-### AudioStream API
-
-The `stream` object in the `connection` event is an `AudioStream` instance:
-
-**Properties**: `metadata` (initial JSON), `callSid`, `sampleRate`
-
-**Events**:
-- `audio` — L16 PCM binary frame (`Buffer`)
-- `dtmf` — `{digit, duration}` (only if `passDtmf: true` on listen verb)
-- `playDone` — `{id}` (after non-streaming playAudio completes)
-- `mark` — `{name, event}` where event is `'playout'` or `'cleared'`
-- `close` — `(code, reason)`
-- `error` — `(err)`
-
-### Sending Audio Back (Bidirectional)
-
-The listen verb supports bidirectional audio. There are two modes, controlled by the `bidirectionalAudio.streaming` option on the listen verb.
-
-**Non-streaming mode** (`streaming: false`, the default) — send complete audio clips as base64:
-
-```typescript
-stream.playAudio(base64Content, {
-  audioContentType: 'raw', // or 'wav'
-  sampleRate: 16000,
-  id: 'greeting', // optional — returned in playDone event
-  queuePlay: true, // true: queue after current; false: interrupt (default)
-});
-
-stream.on('playDone', (evt) => {
-  console.log(`Finished playing: ${evt.id}`);
-});
-```
-
-Up to 10 playAudio commands can be queued simultaneously.
-
-**Streaming mode** (`streaming: true`) — send raw binary PCM frames directly:
-
-```typescript
-// In the listen verb config:
-// bidirectionalAudio: { enabled: true, streaming: true, sampleRate: 16000 }
-
-stream.on('audio', (pcm) => {
-  // Echo audio back (or send processed/generated audio)
-  stream.sendAudio(pcm);
-});
-```
-
-### Marks (Synchronization Markers)
-
-Marks let you track when streamed audio has been played out to the caller. They work **only with bidirectional streaming mode** — you must enable `bidirectionalAudio: { enabled: true, streaming: true }` on the listen verb.
-
-The pattern is: stream audio via `sendAudio()`, then send a mark. When all the audio sent before the mark finishes playing out, jambonz sends back a mark event with `event: 'playout'`. This is how you know the caller has heard a specific chunk of audio.
-
-```typescript
-// Listen verb must enable bidirectional streaming for marks to work
-session
-  .listen({
-    url: '/audio',
-    actionHook: '/listen-done',
-    bidirectionalAudio: {
-      enabled: true,
-      streaming: true,
-      sampleRate: 8000,
-    },
-  })
-  .send();
-
-// In the audio handler:
-audioSvc.on('connection', (stream) => {
-  // Stream audio, then mark a sync point
-  stream.sendAudio(pcmBuffer);
-  stream.sendMark('chunk-1'); // fires 'playout' when audio above finishes playing
-
-  stream.sendAudio(morePcm);
|
|
776
|
-
stream.sendMark('chunk-2'); // fires 'playout' when this audio finishes
|
|
777
|
-
|
|
778
|
-
// Listen for mark events
|
|
779
|
-
stream.on('mark', (evt) => {
|
|
780
|
-
// evt.name = 'chunk-1' or 'chunk-2'
|
|
781
|
-
// evt.event = 'playout' (audio played) or 'cleared' (mark was cleared)
|
|
782
|
-
});
|
|
783
|
-
|
|
784
|
-
// Clear all pending marks (unplayed marks get event='cleared')
|
|
785
|
-
stream.clearMarks();
|
|
786
|
-
});
|
|
787
|
-
```
|
|
788
|
-
|
|
789
|
-
**Important**: Without `bidirectionalAudio.streaming: true`, marks are accepted but never fire — there is no playout buffer to sync against. This is the most common mistake when marks appear to silently fail.
|
|
790
|
-
|
|
791
|
-
### Other Commands
|
|
792
|
-
|
|
793
|
-
```typescript
|
|
794
|
-
stream.killAudio(); // Stop playback, flush buffer
|
|
795
|
-
stream.disconnect(); // Close connection, end listen verb
|
|
796
|
-
stream.sendMark('sync-pt'); // Insert synchronization marker
|
|
797
|
-
stream.clearMarks(); // Clear all pending markers
|
|
798
|
-
stream.close(); // Close the WebSocket
|
|
799
|
-
```
|
|
800
|
-
|
|
801
|
-
## Recording
|
|
802
|
-
|
|
803
|
-
jambonz supports SIPREC-based call recording. Recording is controlled mid-call via inject commands (WebSocket) or future REST API extensions.
|
|
804
|
-
|
|
805
|
-
### WebSocket Recording
|
|
806
|
-
```typescript
|
|
807
|
-
// Start recording — sends audio via SIPREC to a recording server
|
|
808
|
-
session.injectRecord('startCallRecording', {
|
|
809
|
-
siprecServerURL: 'sip:recorder@example.com',
|
|
810
|
-
recordingID: 'my-recording-123', // optional
|
|
811
|
-
});
|
|
812
|
-
|
|
813
|
-
// Pause/resume recording
|
|
814
|
-
session.injectRecord('pauseCallRecording');
|
|
815
|
-
session.injectRecord('resumeCallRecording');
|
|
816
|
-
|
|
817
|
-
// Stop recording
|
|
818
|
-
session.injectRecord('stopCallRecording');
|
|
819
|
-
```
|
|
820
|
-
|
|
821
|
-
**Important**: The `dial` verb must use `anchorMedia: true` for recording to work during bridged calls. Without media anchoring, audio doesn't flow through the jambonz media server.
|
|
822
|
-
|
|
823
247
|
## REST API
|
|
824
248
|
|
|
825
|
-
jambonz provides a REST API for platform management and active call control.
|
|
249
|
+
jambonz provides a REST API for platform management and active call control.
|
|
826
250
|
|
|
827
251
|
Key resources:
|
|
828
252
|
- **Calls** — Create outbound calls, query active calls, modify in-progress calls (redirect, whisper, mute, hangup)
|
|
829
253
|
- **Messages** — Send SMS/MMS messages
|
|
830
254
|
|
|
831
|
-
## Code Structure
|
|
832
|
-
|
|
833
|
-
### Single File (default)
|
|
834
|
-
|
|
835
|
-
For simple applications with 1-2 routes, put everything in a single file. This is the default for all examples in this repo and is perfectly suitable for production use.
|
|
836
|
-
|
|
837
|
-
### Multi-File with Routes Directory
|
|
838
|
-
|
|
839
|
-
For applications with 3+ routes or significant per-route logic, split into a `src/` directory with a routes folder:
|
|
840
|
-
|
|
841
|
-
```
|
|
842
|
-
src/
|
|
843
|
-
app.ts ← entry point: server setup, route registration
|
|
844
|
-
routes/
|
|
845
|
-
incoming.ts ← handler for one endpoint/path
|
|
846
|
-
hold-music.ts
|
|
847
|
-
queue-exit.ts
|
|
848
|
-
```
|
|
849
|
-
|
|
850
|
-
**Webhook pattern** — each route file exports an Express route handler:
|
|
851
|
-
|
|
852
|
-
```typescript
|
|
853
|
-
// src/routes/incoming.ts
|
|
854
|
-
import type { Request, Response } from 'express';
|
|
855
|
-
import { WebhookResponse } from '@jambonz/sdk/webhook';
|
|
856
|
-
|
|
857
|
-
export default function incoming(_req: Request, res: Response) {
|
|
858
|
-
const jambonz = new WebhookResponse();
|
|
859
|
-
jambonz
|
|
860
|
-
.say({ text: 'Thank you for calling. Please hold.' })
|
|
861
|
-
.enqueue({ name: 'support', waitHook: '/hold-music', actionHook: '/queue-exit' });
|
|
862
|
-
res.json(jambonz);
|
|
863
|
-
}
|
|
864
|
-
```
|
|
865
|
-
|
|
866
|
-
```typescript
|
|
867
|
-
// src/app.ts
|
|
868
|
-
import express from 'express';
|
|
869
|
-
import incoming from './routes/incoming.js';
|
|
870
|
-
import holdMusic from './routes/hold-music.js';
|
|
871
|
-
import queueExit from './routes/queue-exit.js';
|
|
872
|
-
|
|
873
|
-
const app = express();
|
|
874
|
-
app.use(express.json());
|
|
875
|
-
|
|
876
|
-
app.post('/incoming', incoming);
|
|
877
|
-
app.post('/hold-music', holdMusic);
|
|
878
|
-
app.post('/queue-exit', queueExit);
|
|
879
|
-
|
|
880
|
-
app.listen(3000, () => console.log('Listening on port 3000'));
|
|
881
|
-
```
|
|
882
|
-
|
|
883
|
-
**WebSocket pattern** — there are two cases to consider:
|
|
884
|
-
|
|
885
|
-
1. **Multiple services** (different `makeService({ path })` calls — each path gets its own `session:new`). Each route file exports a function that takes a session:
|
|
886
|
-
|
|
887
|
-
```typescript
|
|
888
|
-
// src/routes/caller.ts
|
|
889
|
-
import type { Session } from '@jambonz/sdk/websocket';
|
|
890
|
-
|
|
891
|
-
export default function caller(session: Session) {
|
|
892
|
-
session
|
|
893
|
-
.say({ text: 'Please hold.' })
|
|
894
|
-
.enqueue({ name: 'support', waitHook: '/hold-music', actionHook: '/queue-exit' })
|
|
895
|
-
.send();
|
|
896
|
-
}
|
|
897
|
-
```
|
|
898
|
-
|
|
899
|
-
```typescript
|
|
900
|
-
// src/app.ts
|
|
901
|
-
import http from 'http';
|
|
902
|
-
import { createEndpoint } from '@jambonz/sdk/websocket';
|
|
903
|
-
import caller from './routes/caller.js';
|
|
904
|
-
import agent from './routes/agent.js';
|
|
905
|
-
|
|
906
|
-
const server = http.createServer();
|
|
907
|
-
const makeService = createEndpoint({ server, port: 3000 });
|
|
908
|
-
|
|
909
|
-
makeService({ path: '/incoming' }).on('session:new', (session) => caller(session));
|
|
910
|
-
makeService({ path: '/agent' }).on('session:new', (session) => agent(session));
|
|
911
|
-
```
|
|
912
|
-
|
|
913
|
-
2. **Multiple actionHook handlers on one session** — extract handler functions, but register them all within `session:new`:
|
|
914
|
-
|
|
915
|
-
```typescript
|
|
916
|
-
// src/routes/echo-handler.ts
|
|
917
|
-
import type { Session } from '@jambonz/sdk/websocket';
|
|
918
|
-
|
|
919
|
-
export default function echoHandler(session: Session, evt: Record<string, any>) {
|
|
920
|
-
if (evt.reason === 'speechDetected') {
|
|
921
|
-
const text = evt.speech?.alternatives?.[0]?.transcript || 'nothing';
|
|
922
|
-
session.say({ text: `You said: ${text}` })
|
|
923
|
-
.gather({ input: ['speech'], actionHook: '/echo', timeout: 10 })
|
|
924
|
-
.reply();
|
|
925
|
-
} else {
|
|
926
|
-
session.gather({ input: ['speech'], actionHook: '/echo', timeout: 10,
|
|
927
|
-
say: { text: 'I didn\'t hear anything. Try again.' } }).reply();
|
|
928
|
-
}
|
|
929
|
-
}
|
|
930
|
-
```
|
|
931
|
-
|
|
932
|
-
```typescript
|
|
933
|
-
// src/app.ts — wire it up
|
|
934
|
-
svc.on('session:new', (session) => {
|
|
935
|
-
session.on('/echo', (evt) => echoHandler(session, evt));
|
|
936
|
-
session.gather({ input: ['speech'], actionHook: '/echo', timeout: 10,
|
|
937
|
-
say: { text: 'Say something.' } }).send();
|
|
938
|
-
});
|
|
939
|
-
```
|
|
940
|
-
|
|
941
|
-
### When to Split
|
|
942
|
-
|
|
943
|
-
- **1-2 routes, simple logic** → single file
|
|
944
|
-
- **3+ routes or substantial per-route logic** → `src/app.ts` + `src/routes/`
|
|
945
|
-
- **Shared config, prompts, or utilities** → `src/config.ts`, `src/prompts.ts`, etc.
|
|
946
|
-
|
|
947
|
-
When in doubt, start with a single file. It's easy to split later.
|
|
948
|
-
|
|
949
|
-
## Examples
|
|
950
|
-
|
|
951
|
-
Complete working examples are in the `examples/` directory:
|
|
952
|
-
- **hello-world** — Minimal greeting (webhook + WebSocket)
|
|
953
|
-
- **echo** — Speech echo using gather with actionHook pattern (webhook + WebSocket). The canonical example for understanding actionHook event handling.
|
|
954
|
-
- **ivr-menu** — Interactive menu with speech and DTMF input (webhook)
|
|
955
|
-
- **dial** — Simple outbound dial to a phone number (webhook)
|
|
956
|
-
- **listen-record** — Record audio using the listen verb to stream to a WebSocket (webhook)
|
|
957
|
-
- **voice-agent** — LLM-powered conversational AI with tool calls (webhook + WebSocket)
|
|
958
|
-
- **openai-realtime** — OpenAI Realtime API voice agent with function calling (WebSocket)
|
|
959
|
-
- **deepgram-voice-agent** — Deepgram Voice Agent API with function calling (WebSocket)
|
|
960
|
-
- **elevenlabs-voice-agent** — ElevenLabs Conversational AI agent (WebSocket). Demonstrates the agent_id auth pattern unique to ElevenLabs.
|
|
961
|
-
- **llm-streaming** — Anthropic LLM with TTS token streaming and barge-in (WebSocket)
|
|
962
|
-
- **queue-with-hold** — Call queue with hold music and agent dequeue (webhook + WebSocket)
|
|
963
|
-
- **call-recording** — Mid-call recording control via REST API and inject commands (webhook + WebSocket)
|
|
964
|
-
- **realtime-translator** — Bridges two parties with real-time speech translation using STT, Google Translate, and TTS dub tracks. Multi-file example with `src/routes/` structure (WebSocket)
|
|
965
|
-
|
|
966
255
|
## Key Concepts
|
|
967
256
|
|
|
968
257
|
- **Verb**: A JSON object with a `verb` property that tells jambonz what to do. Verbs execute sequentially.
|
package/README.md
ADDED
|
@@ -0,0 +1,64 @@
|
|
|
1
|
+
# @jambonz/schema
|
|
2
|
+
|
|
3
|
+
JSON Schema definitions and validation for jambonz verb applications.
|
|
4
|
+
|
|
5
|
+
## What's Included
|
|
6
|
+
|
|
7
|
+
- **33 verb schemas** (`verbs/`) -- every jambonz verb (say, gather, dial, openai_s2s, pipeline, etc.)
|
|
8
|
+
- **42 component schemas** (`components/`) -- shared types (synthesizer, recognizer, target, actionHook, etc.)
|
|
9
|
+
- **32 callback schemas** (`callbacks/`) -- actionHook payload definitions for each verb
|
|
10
|
+
- **AGENTS.md** -- language-agnostic developer guide covering the verb model, transport modes, and protocol
|
|
11
|
+
- **docs/** -- additional reference documentation
|
|
12
|
+
- **jambonz-app.schema.json** -- the full application schema (JSON Schema draft 2020-12)
|
|
13
|
+
|
|
14
|
+
## Installation
|
|
15
|
+
|
|
16
|
+
```bash
|
|
17
|
+
npm install @jambonz/schema
|
|
18
|
+
```
|
|
19
|
+
|
|
20
|
+
## Usage
|
|
21
|
+
|
|
22
|
+
```javascript
|
|
23
|
+
const { validate, validateVerb, normalizeJambones } = require('@jambonz/schema');
|
|
24
|
+
|
|
25
|
+
// Validate a single verb object
|
|
26
|
+
const verb = { verb: 'say', text: 'Hello world' };
|
|
27
|
+
const result = validateVerb(verb);
|
|
28
|
+
if (!result.valid) {
|
|
29
|
+
console.error(result.errors);
|
|
30
|
+
}
|
|
31
|
+
|
|
32
|
+
// Validate a full jambonz application (array of verbs)
|
|
33
|
+
const app = [
|
|
34
|
+
{ verb: 'say', text: 'Welcome.' },
|
|
35
|
+
{ verb: 'gather', input: ['speech'], actionHook: '/input', timeout: 10 }
|
|
36
|
+
];
|
|
37
|
+
const appResult = validate(app);
|
|
38
|
+
|
|
39
|
+
// Normalize legacy verb names and formats
|
|
40
|
+
const normalized = normalizeJambones(app);
|
|
41
|
+
```
|
|
42
|
+
|
|
43
|
+
## Schema Format
|
|
44
|
+
|
|
45
|
+
All schemas use **JSON Schema draft 2020-12**. The root application schema (`jambonz-app.schema.json`) references individual verb and component schemas via `$ref`.
|
|
46
|
+
|
|
47
|
+
## API
|
|
48
|
+
|
|
49
|
+
| Function | Description |
|
|
50
|
+
|----------|-------------|
|
|
51
|
+
| `validate(app)` | Validate a verb array or single verb against the schema |
|
|
52
|
+
| `validateVerb(verb)` | Validate a single verb object |
|
|
53
|
+
| `validateApp(app)` | Validate a complete jambonz application array |
|
|
54
|
+
| `normalizeJambones(app)` | Normalize legacy verb names and synonyms (e.g. `listen` -> `stream`, `llm` -> `s2s`) |
|
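The synonym mapping described for `normalizeJambones` can be pictured as a rename of the `verb` property. The sketch below is an illustration only, not the package's implementation: `normalizeVerbNames` and `VERB_SYNONYMS` are hypothetical names, and the map contains just the two pairs the table mentions.

```typescript
// Illustration only: NOT the @jambonz/schema implementation.
type Verb = { verb: string; [key: string]: unknown };

// The two synonym pairs named in the README table; the real list may be longer.
const VERB_SYNONYMS = new Map<string, string>([
  ['listen', 'stream'],
  ['llm', 's2s'],
]);

// Return a copy of the app with legacy verb names replaced by their current names.
function normalizeVerbNames(app: Verb[]): Verb[] {
  return app.map((v) => ({ ...v, verb: VERB_SYNONYMS.get(v.verb) ?? v.verb }));
}

const legacyApp: Verb[] = [
  { verb: 'listen', url: '/audio' },
  { verb: 'say', text: 'Hello' },
];
console.log(normalizeVerbNames(legacyApp).map((v) => v.verb));
```

The function copies each verb rather than mutating it, so the caller's original array stays unchanged.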
|
55
|
+
|
|
56
|
+
## Links
|
|
57
|
+
|
|
58
|
+
- [jambonz.org](https://jambonz.org) -- platform documentation
|
|
59
|
+
- [@jambonz/mcp-schema-server](https://github.com/jambonz/mcp-server) -- MCP server for AI agent integration
|
|
60
|
+
- [GitHub](https://github.com/jambonz/schema)
|
|
61
|
+
|
|
62
|
+
## License
|
|
63
|
+
|
|
64
|
+
MIT
|
package/jambonz-app.schema.json
CHANGED
package/package.json
CHANGED
package/verbs/dial.schema.json
CHANGED
|
@@ -59,9 +59,19 @@
|
|
|
59
59
|
"description": "URL of an audio file to play to the caller while the outbound call is ringing. Replaces the default ringback tone."
|
|
60
60
|
},
|
|
61
61
|
"dtmfCapture": {
|
|
62
|
-
"
|
|
63
|
-
|
|
64
|
-
|
|
62
|
+
"oneOf": [
|
|
63
|
+
{
|
|
64
|
+
"type": "array",
|
|
65
|
+
"items": { "type": "string" },
|
|
66
|
+
"description": "Array of DTMF patterns to capture on both call legs."
|
|
67
|
+
},
|
|
68
|
+
{
|
|
69
|
+
"type": "object",
|
|
70
|
+
"description": "Per-leg DTMF capture configuration with childCall and/or parentCall arrays.",
|
|
71
|
+
"additionalProperties": true
|
|
72
|
+
}
|
|
73
|
+
],
|
|
74
|
+
"description": "Configuration for capturing DTMF digits during the bridged call. Can be a simple array of patterns (applied to both legs) or an object with childCall/parentCall arrays."
|
|
65
75
|
},
|
|
66
76
|
"dtmfHook": {
|
|
67
77
|
"$ref": "../components/actionHook",
|
|
@@ -71,7 +81,7 @@
|
|
|
71
81
|
"type": "object",
|
|
72
82
|
"description": "Custom SIP headers to include on the outbound INVITE.",
|
|
73
83
|
"additionalProperties": {
|
|
74
|
-
"type": "string"
|
|
84
|
+
"oneOf": [{ "type": "string" }, { "type": "number" }]
|
|
75
85
|
}
|
|
76
86
|
},
|
|
77
87
|
"anchorMedia": {
|
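The two hunks above widen what the `dial` schema accepts: `dtmfCapture` may now be either an array of patterns or a per-leg object, and `headers` values may be strings or numbers. A rough TypeScript sketch of those shapes follows. The type names, the `toPerLeg` helper, and the sample patterns are assumptions made for illustration; only `childCall` and `parentCall` come from the schema description.

```typescript
// Hypothetical types mirroring the updated schema; names are illustrative.
type PerLegDtmfCapture = { childCall?: string[]; parentCall?: string[] };
type DtmfCapture = string[] | PerLegDtmfCapture;
type SipHeaders = Record<string, string | number>;

// Expand the array form into the per-leg form. This assumes that
// "applied to both legs" means the same patterns on each leg.
function toPerLeg(capture: DtmfCapture): PerLegDtmfCapture {
  if (Array.isArray(capture)) {
    return { childCall: [...capture], parentCall: [...capture] };
  }
  return capture;
}

// Check that every header value is a string or a number, as the new oneOf allows.
function headersAreValid(headers: Record<string, unknown>): boolean {
  return Object.values(headers).every(
    (value) => typeof value === 'string' || typeof value === 'number'
  );
}

const headers: SipHeaders = { 'X-Campaign': 'spring', 'X-Attempt': 2 };
console.log(toPerLeg(['*2', '*3']), headersAreValid(headers));
```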
package/verbs/hangup.schema.json
CHANGED
package/verbs/rest_dial.schema.json
ADDED
|
@@ -0,0 +1,113 @@
|
|
|
1
|
+
{
|
|
2
|
+
"$schema": "https://json-schema.org/draft/2020-12/schema",
|
|
3
|
+
"$id": "https://jambonz.org/schema/verbs/rest:dial",
|
|
4
|
+
"title": "REST Dial",
|
|
5
|
+
"description": "Internal verb used to originate an outbound call via the REST API. Not typically used directly in application verb arrays.",
|
|
6
|
+
"type": "object",
|
|
7
|
+
"properties": {
|
|
8
|
+
"verb": {
|
|
9
|
+
"const": "rest:dial"
|
|
10
|
+
},
|
|
11
|
+
"id": {
|
|
12
|
+
"type": "string"
|
|
13
|
+
},
|
|
14
|
+
"account_sid": {
|
|
15
|
+
"type": "string"
|
|
16
|
+
},
|
|
17
|
+
"application_sid": {
|
|
18
|
+
"type": "string"
|
|
19
|
+
},
|
|
20
|
+
"call_hook": {
|
|
21
|
+
"oneOf": [
|
|
22
|
+
{ "type": "string" },
|
|
23
|
+
{ "type": "object" }
|
|
24
|
+
],
|
|
25
|
+
"description": "Webhook URL or object for call control."
|
|
26
|
+
},
|
|
27
|
+
"call_status_hook": {
|
|
28
|
+
"oneOf": [
|
|
29
|
+
{ "type": "string" },
|
|
30
|
+
{ "type": "object" }
|
|
31
|
+
],
|
|
32
|
+
"description": "Webhook URL or object for call status notifications."
|
|
33
|
+
},
|
|
34
|
+
"from": {
|
|
35
|
+
"type": "string",
|
|
36
|
+
"description": "The caller ID for the outbound call."
|
|
37
|
+
},
|
|
38
|
+
"callerName": {
|
|
39
|
+
"type": "string",
|
|
40
|
+
"description": "Display name for the caller."
|
|
41
|
+
},
|
|
42
|
+
"fromHost": {
|
|
43
|
+
"type": "string",
|
|
44
|
+
"description": "SIP host to use in the From header."
|
|
45
|
+
},
|
|
46
|
+
"speech_synthesis_vendor": {
|
|
47
|
+
"type": "string"
|
|
48
|
+
},
|
|
49
|
+
"speech_synthesis_voice": {
|
|
50
|
+
"type": "string"
|
|
51
|
+
},
|
|
52
|
+
"speech_synthesis_language": {
|
|
53
|
+
"type": "string"
|
|
54
|
+
},
|
|
55
|
+
"speech_recognizer_vendor": {
|
|
56
|
+
"type": "string"
|
|
57
|
+
},
|
|
58
|
+
"speech_recognizer_language": {
|
|
59
|
+
"type": "string"
|
|
60
|
+
},
|
|
61
|
+
"tag": {
|
|
62
|
+
"type": "object",
|
|
63
|
+
"description": "Arbitrary metadata to attach to the call.",
|
|
64
|
+
"additionalProperties": true
|
|
65
|
+
},
|
|
66
|
+
"to": {
|
|
67
|
+
"$ref": "../components/target",
|
|
68
|
+
"description": "The call destination."
|
|
69
|
+
},
|
|
70
|
+
"headers": {
|
|
71
|
+
"type": "object",
|
|
72
|
+
"description": "Custom SIP headers to include on the outbound INVITE.",
|
|
73
|
+
"additionalProperties": {
|
|
74
|
+
"oneOf": [
|
|
75
|
+
{ "type": "string" },
|
|
76
|
+
{ "type": "number" }
|
|
77
|
+
]
|
|
78
|
+
}
|
|
79
|
+
},
|
|
80
|
+
"timeout": {
|
|
81
|
+
"type": "number",
|
|
82
|
+
"description": "Ring timeout in seconds."
|
|
83
|
+
},
|
|
84
|
+
"amd": {
|
|
85
|
+
"$ref": "../components/amd",
|
|
86
|
+
"description": "Answering machine detection configuration."
|
|
87
|
+
},
|
|
88
|
+
"dual_streams": {
|
|
89
|
+
"type": "boolean",
|
|
90
|
+
"description": "If true, send separate audio streams for each call leg."
|
|
91
|
+
},
|
|
92
|
+
"sipRequestWithinDialogHook": {
|
|
93
|
+
"type": "string",
|
|
94
|
+
"description": "Webhook for in-dialog SIP requests."
|
|
95
|
+
},
|
|
96
|
+
"referHook": {
|
|
97
|
+
"oneOf": [
|
|
98
|
+
{ "type": "string" },
|
|
99
|
+
{ "type": "object" }
|
|
100
|
+
],
|
|
101
|
+
"description": "Webhook for SIP REFER handling."
|
|
102
|
+
},
|
|
103
|
+
"timeLimit": {
|
|
104
|
+
"type": "number",
|
|
105
|
+
"description": "Maximum call duration in seconds."
|
|
106
|
+
}
|
|
107
|
+
},
|
|
108
|
+
"required": [
|
|
109
|
+
"call_hook",
|
|
110
|
+
"from",
|
|
111
|
+
"to"
|
|
112
|
+
]
|
|
113
|
+
}
|
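As a reading aid for the schema above, here is a sketch of an object that satisfies its `required` list (`call_hook`, `from`, `to`). The `RestDial` type covers only a subset of the properties, `missingRequired` is a hypothetical helper, and the shape of `to` is an assumption because `../components/target` is not shown in this diff.

```typescript
// Partial, illustrative type; see the schema for the full property list.
type RestDial = {
  verb: 'rest:dial';
  call_hook: string | Record<string, unknown>;
  from: string;
  to: Record<string, unknown>; // shape defined by ../components/target (not shown here)
  call_status_hook?: string | Record<string, unknown>;
  headers?: Record<string, string | number>;
  timeout?: number;   // ring timeout in seconds
  timeLimit?: number; // maximum call duration in seconds
  tag?: Record<string, unknown>;
};

const REQUIRED_KEYS = ['call_hook', 'from', 'to'] as const;

// List the required keys that are absent from a candidate object.
function missingRequired(candidate: object): string[] {
  return REQUIRED_KEYS.filter((key) => !(key in candidate));
}

const example: RestDial = {
  verb: 'rest:dial',
  call_hook: '/outbound',
  from: '+15551230001',
  to: { type: 'phone', number: '+15551230002' }, // assumed target shape
  headers: { 'X-Attempt': 1 },
  timeout: 30,
};
console.log(missingRequired(example));
```

This checks key presence only. Full validation, including property types and the `oneOf` branches, is the job of a JSON Schema validator.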