@adaptic/maestro 1.1.6 → 1.1.8
- package/.claude/commands/init-maestro.md +225 -279
- package/README.md +19 -2
- package/docs/guides/email-setup.md +399 -0
- package/docs/guides/media-generation-setup.md +349 -0
- package/docs/guides/outbound-governance-setup.md +438 -0
- package/docs/guides/pdf-generation-setup.md +315 -0
- package/docs/guides/poller-daemon-setup.md +550 -0
- package/docs/guides/rag-context-setup.md +459 -0
- package/docs/guides/slack-setup.md +348 -0
- package/docs/guides/voice-sms-setup.md +698 -0
- package/docs/guides/whatsapp-setup.md +282 -0
- package/docs/runbooks/mac-mini-bootstrap.md +21 -0
- package/package.json +1 -1
- package/scaffold/config/caller-id-map.yaml +46 -0
- package/scripts/media-generation/README.md +2 -0
- package/scripts/pdf-generation/README.md +2 -0
- package/scripts/poller/slack-poller.mjs +22 -7
- package/scripts/poller/trigger.mjs +12 -1
- package/scripts/setup/boot-claude-session.sh +4 -8
- package/scripts/setup/configure-macos.sh +8 -4
|
@@ -0,0 +1,698 @@
|
|
|
1
|
+
# Voice & SMS Setup Guide
|
|
2
|
+
|
|
3
|
+
How to enable real-time voice calls (Slack huddles) and SMS messaging for a Maestro agent. This guide covers the full stack: Twilio phone number, inbound/outbound SMS, Slack huddle voice participation (Deepgram STT + ElevenLabs TTS), Cloudflare tunnel exposure, caller ID mapping, and voice transcript parsing.
|
|
4
|
+
|
|
5
|
+
**Prerequisites**: Complete the [Mac Mini Bootstrap](../runbooks/mac-mini-bootstrap.md) and [Agent Persona Setup](agent-persona-setup.md) first. This guide assumes the agent repo is scaffolded and Phases 0–3 of `/init-maestro` are complete.
|
|
6
|
+
|
|
7
|
+
---
|
|
8
|
+
|
|
9
|
+
## Architecture Overview
|
|
10
|
+
|
|
11
|
+
The voice/SMS stack has three layers:
|
|
12
|
+
|
|
13
|
+
```
|
|
14
|
+
┌─────────────────────────────────────────────────────────────┐
|
|
15
|
+
│ Layer 1: SMS (Twilio) │
|
|
16
|
+
│ ┌──────────────┐ ┌──────────────────┐ ┌────────────┐ │
|
|
17
|
+
│ │ Twilio Cloud │───▶│ Cloudflare Tunnel │───▶│ sms-handler│ │
|
|
18
|
+
│ │ (webhooks) │ │ (port 3001) │ │ .mjs │ │
|
|
19
|
+
│ └──────────────┘ └──────────────────┘ └─────┬──────┘ │
|
|
20
|
+
│ │ │
|
|
21
|
+
│ state/inbox/ │
|
|
22
|
+
│ sms/*.yaml │
|
|
23
|
+
├─────────────────────────────────────────────────────────────┤
|
|
24
|
+
│ Layer 2: Slack Huddle Voice │
|
|
25
|
+
│ ┌──────────┐ ┌───────────┐ ┌──────────────────────┐ │
|
|
26
|
+
│ │ Slack │◀─▶│ CDP │◀─▶│ huddle-controller │ │
|
|
27
|
+
│ │ Desktop │ │ (9222) │ │ .mjs │ │
|
|
28
|
+
│ └────┬─────┘ └───────────┘ └──────────────────────┘ │
|
|
29
|
+
│ │ │
|
|
30
|
+
│ ┌────▼──────────────────────────────────────────────────┐ │
|
|
31
|
+
│ │ BlackHole 2ch (capture) ←→ Deepgram STT │ │
|
|
32
|
+
│ │ BlackHole 16ch (playback) ←→ ElevenLabs TTS │ │
|
|
33
|
+
│ │ audio-bridge.mjs │ │
|
|
34
|
+
│ └────────────────────────────────────────────────────────┘ │
|
|
35
|
+
├─────────────────────────────────────────────────────────────┤
|
|
36
|
+
│ Layer 3: Transcript Processing │
|
|
37
|
+
│ ┌────────────────────┐ ┌─────────────────────────────┐ │
|
|
38
|
+
│ │ parse-voice- │──▶│ state/inbox/voice/*.yaml │ │
|
|
39
|
+
│ │ transcript.mjs │ │ (action items, priorities) │ │
|
|
40
|
+
│ └────────────────────┘ └─────────────────────────────┘ │
|
|
41
|
+
└─────────────────────────────────────────────────────────────┘
|
|
42
|
+
```
|
|
43
|
+
|
|
44
|
+
**Data flow summary**:
|
|
45
|
+
- **SMS in**: Twilio → Cloudflare tunnel → `sms-handler.mjs` (port 3001) → inbox YAML → inbox processor
|
|
46
|
+
- **SMS out**: Agent calls `scripts/send-sms.sh` → Twilio REST API → recipient phone
|
|
47
|
+
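- **Voice in**: Slack huddle audio → BlackHole 2ch → sox → Deepgram STT → transcript (this is the `sox` capture path; the default `webaudio` mode captures audio through CDP instead of BlackHole 2ch)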
|
|
48
|
+
- **Voice out**: Claude response → ElevenLabs TTS → sox → BlackHole 16ch → Slack huddle mic
|
|
49
|
+
- **Voice post-call**: Transcript summary → `parse-voice-transcript.mjs` → inbox YAML → action routing
|
|
50
|
+
|
|
51
|
+
---
|
|
52
|
+
|
|
53
|
+
## 1. Twilio Account & Phone Number
|
|
54
|
+
|
|
55
|
+
### 1.1 Create a Twilio Account
|
|
56
|
+
|
|
57
|
+
1. Sign up at https://www.twilio.com/ (pay-as-you-go, no minimum commitment)
|
|
58
|
+
2. Verify your email and phone number
|
|
59
|
+
3. From the Twilio Console dashboard, copy:
|
|
60
|
+
- **Account SID** (starts with `AC`)
|
|
61
|
+
- **Auth Token** (click to reveal)
|
|
62
|
+
|
|
63
|
+
### 1.2 Purchase a Phone Number
|
|
64
|
+
|
|
65
|
+
1. Go to Twilio Console → Phone Numbers → Buy a Number
|
|
66
|
+
2. Search for a number with **both SMS and Voice capabilities** enabled
|
|
67
|
+
3. Select a number in a region appropriate for your agent's primary contacts
|
|
68
|
+
4. Note the number in E.164 format (e.g., `+15551234567`)
|
|
69
|
+
|
|
70
|
+
**Tip**: US numbers are cheapest (~$1.15/month). If your contacts are international, check Twilio's [geo-permissions](https://console.twilio.com/us1/develop/sms/settings/geo-permissions) and enable the countries you need for inbound/outbound SMS.
|
|
71
|
+
|
|
72
|
+
### 1.3 Configure Environment Variables
|
|
73
|
+
|
|
74
|
+
Add to your agent's `.env`:
|
|
75
|
+
|
|
76
|
+
```bash
|
|
77
|
+
# Twilio credentials
|
|
78
|
+
TWILIO_ACCOUNT_SID=ACxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
|
|
79
|
+
TWILIO_AUTH_TOKEN=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
|
|
80
|
+
TWILIO_PHONE_NUMBER=+15551234567
|
|
81
|
+
TWILIO_PHONE_SID=PNxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx # Optional: from Phone Numbers → your number
|
|
82
|
+
```
|
|
83
|
+
|
|
84
|
+
### 1.4 Enable Geo-Permissions (International SMS)
|
|
85
|
+
|
|
86
|
+
If your agent needs to send SMS to international numbers (geo-permissions control which destination countries your account may message):
|
|
87
|
+
|
|
88
|
+
1. Go to Twilio Console → Messaging → Settings → Geo Permissions
|
|
89
|
+
2. Enable each country where your contacts have phone numbers
|
|
90
|
+
3. **Critical for UAE**: Enable "United Arab Emirates" if your principal or contacts use UAE numbers
|
|
91
|
+
|
|
92
|
+
---
|
|
93
|
+
|
|
94
|
+
## 2. SMS — Inbound (Webhook Handler)
|
|
95
|
+
|
|
96
|
+
The inbound SMS handler (`scripts/sms-handler.mjs`) receives Twilio webhook POSTs and writes messages to the agent's inbox for processing.
|
|
97
|
+
|
|
98
|
+
### 2.1 How It Works
|
|
99
|
+
|
|
100
|
+
1. Twilio receives an SMS to your agent's phone number
|
|
101
|
+
2. Twilio sends an HTTP POST to your configured webhook URL
|
|
102
|
+
3. `sms-handler.mjs` parses the message, looks up the sender in `config/caller-id-map.yaml`
|
|
103
|
+
4. Creates a YAML file in `state/inbox/sms/` with sender identity, access level, and priority
|
|
104
|
+
5. Returns empty TwiML (prevents Twilio from auto-replying)
|
|
105
|
+
6. Messages from callers with `ceo` access automatically create priority trigger files for immediate processing
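
The steps above can be sketched as a pair of pure functions. This is a minimal illustration, not the actual `sms-handler.mjs` code: the function names (`lookupCaller`, `buildInboxRecord`), the record field names, and the `high`/`normal` priority labels are assumptions. `From` and `Body` are the standard Twilio webhook parameters, and the map shape follows `config/caller-id-map.yaml`.

```javascript
// Hypothetical sketch: the real sms-handler.mjs may structure its lookup and
// output differently. Field names below are assumptions for illustration.

// Empty TwiML response, so Twilio sends no automatic reply (step 5).
const EMPTY_TWIML = '<?xml version="1.0" encoding="UTF-8"?><Response></Response>';

// Find the user whose phone list contains the sender (step 3).
// Unknown senders fall back to the most restrictive `default` level.
function lookupCaller(callerMap, phone) {
  for (const [id, user] of Object.entries(callerMap.users ?? {})) {
    if ((user.phone ?? []).includes(phone)) {
      return { id, name: user.name, accessLevel: user.access_level ?? "default" };
    }
  }
  return { id: null, name: "Unknown", accessLevel: "default" };
}

// Build the record that would be serialized into state/inbox/sms/ (steps 4 and 6).
function buildInboxRecord(callerMap, webhookParams, now = new Date()) {
  const caller = lookupCaller(callerMap, webhookParams.From);
  const isCeo = caller.accessLevel === "ceo";
  return {
    channel: "sms",
    received_at: now.toISOString(),
    from: webhookParams.From,
    body: webhookParams.Body,
    sender: caller.name,
    access_level: caller.accessLevel,
    priority: isCeo ? "high" : "normal", // assumed priority labels
    create_trigger: isCeo,               // step 6: ceo messages get a trigger file
  };
}

const map = {
  users: {
    "principal-name": { name: "Full Name", phone: ["+15551234567"], access_level: "ceo" },
  },
};
console.log(buildInboxRecord(map, { From: "+15551234567", Body: "Status?" }));
console.log(buildInboxRecord(map, { From: "+15550000000", Body: "Hi" }).access_level);
console.log(EMPTY_TWIML);
```

Keeping the lookup separate from the HTTP handling makes the security-relevant part (who gets which access level) easy to test without a running webhook.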
|
|
106
|
+
|
|
107
|
+
### 2.2 Start the SMS Handler
|
|
108
|
+
|
|
109
|
+
```bash
|
|
110
|
+
# Start the handler (default port 3001)
|
|
111
|
+
node scripts/sms-handler.mjs
|
|
112
|
+
|
|
113
|
+
# Or with a custom port
|
|
114
|
+
SMS_PORT=3005 node scripts/sms-handler.mjs
|
|
115
|
+
```
|
|
116
|
+
|
|
117
|
+
Verify it's running:
|
|
118
|
+
|
|
119
|
+
```bash
|
|
120
|
+
curl http://localhost:3001/health
|
|
121
|
+
# Expected: {"status":"ok","uptime":...}
|
|
122
|
+
```
|
|
123
|
+
|
|
124
|
+
### 2.3 Expose via Cloudflare Tunnel
|
|
125
|
+
|
|
126
|
+
The handler runs locally — Twilio needs a public URL to reach it. Use Cloudflare's free quick tunnels:
|
|
127
|
+
|
|
128
|
+
```bash
|
|
129
|
+
# Install cloudflared if not already present
|
|
130
|
+
brew install cloudflared
|
|
131
|
+
|
|
132
|
+
# Start a tunnel to the SMS handler
|
|
133
|
+
cloudflared tunnel --url http://localhost:3001
|
|
134
|
+
```
|
|
135
|
+
|
|
136
|
+
Cloudflared prints a public URL like `https://random-words-here.trycloudflare.com`. Copy this URL.
|
|
137
|
+
|
|
138
|
+
**Important**: Quick tunnels generate a new URL each time. For production stability, set up a [named tunnel](https://developers.cloudflare.com/cloudflare-one/connections/connect-networks/get-started/) with a fixed subdomain.
|
|
139
|
+
|
|
140
|
+
### 2.4 Configure Twilio Webhook
|
|
141
|
+
|
|
142
|
+
1. Go to Twilio Console → Phone Numbers → Active Numbers → your number
|
|
143
|
+
2. Under **Messaging** → "A message comes in":
|
|
144
|
+
- Set to **Webhook**
|
|
145
|
+
- URL: `https://your-tunnel-url.trycloudflare.com/sms`
|
|
146
|
+
- Method: **HTTP POST**
|
|
147
|
+
3. Click **Save configuration**
|
|
148
|
+
|
|
149
|
+
### 2.5 Test Inbound SMS
|
|
150
|
+
|
|
151
|
+
Send an SMS to your agent's Twilio number from your phone. Check:
|
|
152
|
+
|
|
153
|
+
```bash
|
|
154
|
+
# Verify the message landed in the inbox
|
|
155
|
+
ls state/inbox/sms/
|
|
156
|
+
|
|
157
|
+
# Check the SMS logs
|
|
158
|
+
cat logs/sms/$(date +%Y-%m-%d)/*.jsonl | tail -1 | jq .
|
|
159
|
+
```
|
|
160
|
+
|
|
161
|
+
---
|
|
162
|
+
|
|
163
|
+
## 3. SMS — Outbound
|
|
164
|
+
|
|
165
|
+
Outbound SMS uses `scripts/send-sms.sh`, a shell wrapper around Twilio's REST API with built-in deduplication and audit logging.
|
|
166
|
+
|
|
167
|
+
### 3.1 Send an SMS
|
|
168
|
+
|
|
169
|
+
```bash
|
|
170
|
+
./scripts/send-sms.sh --to "+15551234567" --body "Hello from your agent"
|
|
171
|
+
```
|
|
172
|
+
|
|
173
|
+
Options:
|
|
174
|
+
|
|
175
|
+
| Flag | Description | Required |
|
|
176
|
+
|----------|------------------------------------------------|----------|
|
|
177
|
+
| `--to` | Recipient phone number (E.164 format) | Yes |
|
|
178
|
+
| `--body` | Message text | Yes |
|
|
179
|
+
| `--from` | Sender number (defaults to `$TWILIO_PHONE_NUMBER`) | No |
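
Because `--to` must be in E.164 format, a caller can check the number before invoking the script. The sketch below is an assumption about how such a pre-check could look (the helper name `isE164` is hypothetical and not part of `send-sms.sh`). It tests only the shape of the number: a `+`, a non-zero leading digit, and at most 15 digits in total.

```javascript
// Hypothetical pre-flight check; send-sms.sh may or may not validate this itself.
// E.164: "+" followed by up to 15 digits, the first of which is not 0.
const E164_PATTERN = /^\+[1-9]\d{1,14}$/;

function isE164(phoneNumber) {
  return typeof phoneNumber === "string" && E164_PATTERN.test(phoneNumber);
}

for (const candidate of ["+15551234567", "15551234567", "+1 555 123 4567"]) {
  console.log(candidate, isE164(candidate) ? "looks like E.164" : "rejected");
}
```

A shape check like this catches missing `+` prefixes and embedded spaces; it does not confirm that the number is assigned or reachable.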
|
|
180
|
+
|
|
181
|
+
### 3.2 Deduplication
|
|
182
|
+
|
|
183
|
+
The send script integrates with `scripts/outbound-dedup.sh` to prevent duplicate sends when multiple sessions are running concurrently. This is automatic — no configuration needed.
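
The idea behind time-windowed deduplication can be sketched as follows. This is not how `outbound-dedup.sh` is implemented: the 60-second window, the key format, and the `createDeduper` name are all assumptions, and this in-memory version works only within a single process. Deduplicating across concurrent sessions requires shared state such as files on disk.

```javascript
// Hypothetical sketch of windowed dedup; window length and key format are assumptions.
function createDeduper(windowMs = 60_000, clock = Date.now) {
  const lastSent = new Map(); // key -> timestamp of the last accepted send

  return function shouldSend(to, body) {
    const key = `${to}\u0000${body}`; // recipient + exact message text
    const now = clock();
    const previous = lastSent.get(key);
    if (previous !== undefined && now - previous < windowMs) {
      return false; // identical message sent recently: skip
    }
    lastSent.set(key, now);
    return true;
  };
}

const shouldSend = createDeduper();
console.log(shouldSend("+15551234567", "test")); // first send is allowed
console.log(shouldSend("+15551234567", "test")); // immediate repeat is skipped
```

Injecting the clock keeps the window logic testable without waiting for real time to pass.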
|
|
184
|
+
|
|
185
|
+
### 3.3 Audit Trail
|
|
186
|
+
|
|
187
|
+
Every outbound SMS is logged to two locations:
|
|
188
|
+
|
|
189
|
+
- `logs/sms/YYYY-MM-DD/YYYY-MM-DD-sms.jsonl` — SMS-specific log
|
|
190
|
+
- `logs/audit/YYYY-MM-DD-actions.jsonl` — Global audit trail
|
|
191
|
+
|
|
192
|
+
---
|
|
193
|
+
|
|
194
|
+
## 4. Caller ID Mapping
|
|
195
|
+
|
|
196
|
+
Both the SMS handler and voice transcript parser use `config/caller-id-map.yaml` to identify callers and assign access levels. This file is a **security boundary** — it determines what information the agent can share with each caller.
|
|
197
|
+
|
|
198
|
+
### 4.1 Create the Caller ID Map
|
|
199
|
+
|
|
200
|
+
If your agent was scaffolded with `npx @adaptic/maestro create`, a template exists at `config/caller-id-map.yaml`. If not, create one:
|
|
201
|
+
|
|
202
|
+
```yaml
|
|
203
|
+
# config/caller-id-map.yaml
|
|
204
|
+
#
|
|
205
|
+
# Maps phone numbers, Slack IDs, and emails to user identities with access levels.
|
|
206
|
+
# Used by: sms-handler.mjs, parse-voice-transcript.mjs, user-context-search.py
|
|
207
|
+
#
|
|
208
|
+
# Access levels:
|
|
209
|
+
# ceo — Full access to all paths (memory, knowledge, state, outputs, docs, logs)
|
|
210
|
+
# leadership — Company knowledge + docs + research/briefs + own interaction logs
|
|
211
|
+
# partner — Public knowledge + research + own interaction logs
|
|
212
|
+
# default — Public company knowledge only (knowledge/sources/)
|
|
213
|
+
|
|
214
|
+
users:
|
|
215
|
+
# Principal (the person the agent reports to)
|
|
216
|
+
principal-name:
|
|
217
|
+
name: "Full Name"
|
|
218
|
+
phone: ["+1XXXXXXXXXX"]
|
|
219
|
+
whatsapp: ["+1XXXXXXXXXX"]
|
|
220
|
+
slack_id: "UXXXXXXXXXX"
|
|
221
|
+
email: "name@company.com"
|
|
222
|
+
access_level: ceo
|
|
223
|
+
|
|
224
|
+
# Add team members, partners, and other contacts below
|
|
225
|
+
# team-member:
|
|
226
|
+
# name: "Team Member Name"
|
|
227
|
+
# phone: []
|
|
228
|
+
# slack_id: "UXXXXXXXXXX"
|
|
229
|
+
# email: "member@company.com"
|
|
230
|
+
# access_level: leadership
|
|
231
|
+
```
|
|
232
|
+
|
|
233
|
+
### 4.2 Access Level Reference
|
|
234
|
+
|
|
235
|
+
| Level | Accessible Paths | Typical Recipients |
|
|
236
|
+
|--------------|-----------------------------------------------------------------------------|----------------------------------|
|
|
237
|
+
| `ceo` | Everything: memory/, knowledge/, state/, outputs/, docs/, logs/, all interactions | CEO / principal |
|
|
238
|
+
| `leadership` | knowledge/ (company), docs/, research/briefs, own interaction logs | C-suite, directors, senior team |
|
|
239
|
+
| `partner` | knowledge/sources/ (public), research, own interaction logs | JV partners, advisors |
|
|
240
|
+
| `default` | knowledge/sources/ only (public company knowledge) | Unknown callers, vendors |
|
|
241
|
+
|
|
242
|
+
### 4.3 Hot Reload
|
|
243
|
+
|
|
244
|
+
The SMS handler reloads `caller-id-map.yaml` every 60 seconds automatically. No restart needed when adding contacts.
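
A 60-second reload can be implemented as a small time-to-live cache around the file loader. The sketch below shows the idea under stated assumptions: `createReloadingMap` and its `load` callback are hypothetical names, and the handler's real reload mechanism may differ (for example, a timer rather than reload-on-access).

```javascript
// Hypothetical TTL cache: reloads on access once the cached copy is older than ttlMs.
function createReloadingMap(load, ttlMs = 60_000, clock = Date.now) {
  let cached = null;
  let loadedAt = -Infinity; // forces a load on first access

  return function getMap() {
    const now = clock();
    if (now - loadedAt >= ttlMs) {
      try {
        cached = load(); // e.g. read and parse config/caller-id-map.yaml
        loadedAt = now;
      } catch (err) {
        // Keep serving the previous copy if a reload fails (say, mid-edit YAML);
        // only fail hard when there is nothing to fall back to.
        if (cached === null) throw err;
      }
    }
    return cached;
  };
}

let version = 0;
const getMap = createReloadingMap(() => ({ version: ++version }));
console.log(getMap()); // loads once
console.log(getMap()); // served from cache within the TTL
```

Falling back to the last good copy means a half-saved edit to the map does not take the handler down, at the cost of serving stale access levels until the file parses again.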
|
|
245
|
+
|
|
246
|
+
---
|
|
247
|
+
|
|
248
|
+
## 5. Voice — Slack Huddle Infrastructure
|
|
249
|
+
|
|
250
|
+
The voice stack enables the agent to join Slack huddles, listen via STT, reason via Claude, and speak via TTS. This requires virtual audio routing and three API integrations.
|
|
251
|
+
|
|
252
|
+
### 5.1 Install Audio Dependencies
|
|
253
|
+
|
|
254
|
+
Run the setup script:
|
|
255
|
+
|
|
256
|
+
```bash
|
|
257
|
+
cd scripts/huddle
|
|
258
|
+
./setup-audio.sh
|
|
259
|
+
```
|
|
260
|
+
|
|
261
|
+
This installs:
|
|
262
|
+
|
|
263
|
+
| Component | Install Method | Purpose |
|
|
264
|
+
|-------------------|----------------------|--------------------------------------------------|
|
|
265
|
+
| BlackHole 2ch | `brew install blackhole-2ch` | Captures Slack's speaker output (what others say) |
|
|
266
|
+
| BlackHole 16ch | `brew install blackhole-16ch` | Carries TTS output to Slack's mic (what agent says) |
|
|
267
|
+
| sox | `brew install sox` | Audio capture and playback between devices |
|
|
268
|
+
| SwitchAudioSource | `brew install switchaudio-osx`| Lists and verifies audio devices |
|
|
269
|
+
|
|
270
|
+
**Why two BlackHole devices?** Using separate virtual audio devices for capture (2ch) and playback (16ch) prevents feedback loops. If the same device were used for both, the agent would hear its own TTS output, transcribe it, and respond to itself in an infinite loop.
|
|
271
|
+
|
|
272
|
+
### 5.2 Configure Slack Audio
|
|
273
|
+
|
|
274
|
+
Open Slack → Preferences → Audio & Video:
|
|
275
|
+
|
|
276
|
+
| Setting | Value | Purpose |
|
|
277
|
+
|--------------|------------------|----------------------------------------------|
|
|
278
|
+
| Speaker | BlackHole 2ch | Routes huddle audio to the capture pipeline |
|
|
279
|
+
| Microphone | BlackHole 16ch | Routes TTS output into the huddle |
|
|
280
|
+
|
|
281
|
+
**Note**: System audio (built-in speakers/mic) remains unchanged. Only Slack's audio is rerouted.
|
|
282
|
+
|
|
283
|
+
### 5.3 Verify Audio Setup
|
|
284
|
+
|
|
285
|
+
```bash
|
|
286
|
+
# Check all devices are present
|
|
287
|
+
./scripts/huddle/setup-audio.sh --check
|
|
288
|
+
|
|
289
|
+
# Test capture and playback
|
|
290
|
+
./scripts/huddle/setup-audio.sh --test
|
|
291
|
+
```
|
|
292
|
+
|
|
293
|
+
### 5.4 Configure Voice API Keys
|
|
294
|
+
|
|
295
|
+
Add to `.env`:
|
|
296
|
+
|
|
297
|
+
```bash
|
|
298
|
+
# Speech-to-text (Deepgram)
|
|
299
|
+
# Sign up: https://deepgram.com/ (free allowance for new accounts; check current pricing for limits)
|
|
300
|
+
DEEPGRAM_API_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
|
|
301
|
+
|
|
302
|
+
# Text-to-speech (ElevenLabs)
|
|
303
|
+
# Sign up: https://elevenlabs.io/ (free tier available)
|
|
304
|
+
ELEVENLABS_API_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
|
|
305
|
+
```
|
|
306
|
+
|
|
307
|
+
### 5.5 Launch Slack with CDP
|
|
308
|
+
|
|
309
|
+
The huddle controller automates Slack via Chrome DevTools Protocol. Slack must be launched with remote debugging enabled:
|
|
310
|
+
|
|
311
|
+
```bash
|
|
312
|
+
# Launch Slack with CDP on port 9222
|
|
313
|
+
./scripts/huddle/launch-slack.sh
|
|
314
|
+
|
|
315
|
+
# Or manually:
|
|
316
|
+
/Applications/Slack.app/Contents/MacOS/Slack --remote-debugging-port=9222
|
|
317
|
+
```
|
|
318
|
+
|
|
319
|
+
Add this to your agent's launchd configuration or Login Items so Slack always starts with CDP enabled.
|
|
320
|
+
|
|
321
|
+
### 5.6 Install Huddle Dependencies
|
|
322
|
+
|
|
323
|
+
```bash
|
|
324
|
+
cd scripts/huddle
|
|
325
|
+
npm install
|
|
326
|
+
```
|
|
327
|
+
|
|
328
|
+
This installs `ws` (WebSocket client for CDP), `@anthropic-ai/sdk`, and `dotenv`.
|
|
329
|
+
|
|
330
|
+
### 5.7 Start the Huddle Server
|
|
331
|
+
|
|
332
|
+
```bash
|
|
333
|
+
# Start and listen for huddle invitations
|
|
334
|
+
node scripts/huddle/huddle-server.mjs
|
|
335
|
+
|
|
336
|
+
# Join a specific channel's huddle
|
|
337
|
+
node scripts/huddle/huddle-server.mjs --join general
|
|
338
|
+
|
|
339
|
+
# Initiate a huddle with someone
|
|
340
|
+
node scripts/huddle/huddle-server.mjs --call mehran
|
|
341
|
+
```
|
|
342
|
+
|
|
343
|
+
The huddle server:
|
|
344
|
+
1. Connects to Slack via CDP (port 9222)
|
|
345
|
+
2. Listens for huddle invitations
|
|
346
|
+
3. On join: starts the audio bridge (Deepgram STT ↔ Claude reasoning ↔ ElevenLabs TTS)
|
|
347
|
+
4. On huddle end: saves transcript, routes action items to queues
|
|
348
|
+
|
|
349
|
+
### 5.8 Audio Bridge Configuration
|
|
350
|
+
|
|
351
|
+
The audio bridge supports two capture modes (set via environment variable):
|
|
352
|
+
|
|
353
|
+
| Mode | Env Variable | How It Works | When to Use |
|
|
354
|
+
|------------|--------------------------------|-------------------------------------------------|--------------------------|
|
|
355
|
+
| `webaudio` | `HUDDLE_CAPTURE_MODE=webaudio` | Injects ScriptProcessorNode via CDP (echo-free) | Default, recommended |
|
|
356
|
+
| `sox` | `HUDDLE_CAPTURE_MODE=sox` | Captures from BlackHole 2ch via sox | Fallback if CDP fails |
|
|
357
|
+
|
|
358
|
+
Optional environment variables:
|
|
359
|
+
|
|
360
|
+
```bash
|
|
361
|
+
HUDDLE_CAPTURE_DEVICE="BlackHole 2ch" # sox capture device name
|
|
362
|
+
HUDDLE_PLAYBACK_DEVICE="BlackHole 16ch" # sox playback device name
|
|
363
|
+
HUDDLE_CAPTURE_WS_PORT=3201 # WebSocket port for webaudio capture
|
|
364
|
+
HUDDLE_EVENTS_PORT=3200 # HTTP port for huddle event API
|
|
365
|
+
HUDDLE_USE_API=0 # Set to 1 to use Anthropic API instead of Claude CLI
|
|
366
|
+
```
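
To make the settings above concrete, the following sketch shows one way a script could read them with fallbacks. It is illustrative only: `readHuddleConfig` is a hypothetical helper, and treating the values listed above as the defaults is an assumption (the port defaults 3200 and 3201 match the Quick Reference table; the device-name defaults are inferred).

```javascript
// Hypothetical config reader; the real audio bridge may parse these differently.
function readHuddleConfig(env = process.env) {
  const captureMode = env.HUDDLE_CAPTURE_MODE ?? "webaudio";
  if (captureMode !== "webaudio" && captureMode !== "sox") {
    throw new Error(`Unsupported HUDDLE_CAPTURE_MODE: ${captureMode}`);
  }
  return {
    captureMode,
    captureDevice: env.HUDDLE_CAPTURE_DEVICE ?? "BlackHole 2ch",    // used in sox mode
    playbackDevice: env.HUDDLE_PLAYBACK_DEVICE ?? "BlackHole 16ch",
    captureWsPort: Number(env.HUDDLE_CAPTURE_WS_PORT ?? 3201),      // webaudio mode
    eventsPort: Number(env.HUDDLE_EVENTS_PORT ?? 3200),
    useApi: env.HUDDLE_USE_API === "1", // "1" selects the Anthropic API over the Claude CLI
  };
}

console.log(readHuddleConfig({}));
console.log(readHuddleConfig({ HUDDLE_CAPTURE_MODE: "sox", HUDDLE_EVENTS_PORT: "3300" }));
```

Rejecting unknown capture modes at startup surfaces typos in `.env` immediately instead of silently falling back to a different audio path.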
|
|
367
|
+
|
|
368
|
+
---
|
|
369
|
+
|
|
370
|
+
## 6. Voice Transcript Processing
|
|
371
|
+
|
|
372
|
+
After a huddle ends, the transcript is posted to Slack. The transcript parser extracts structured action items for queue routing.
|
|
373
|
+
|
|
374
|
+
### 6.1 How It Works
|
|
375
|
+
|
|
376
|
+
`scripts/parse-voice-transcript.mjs` processes transcript summaries and:
|
|
377
|
+
|
|
378
|
+
1. Identifies the caller from the phone number (via hardcoded map — keep in sync with `caller-id-map.yaml`)
|
|
379
|
+
2. Extracts caller statements from "Caller said:" blocks
|
|
380
|
+
3. Classifies each statement using regex pattern matching across 6 categories:
|
|
381
|
+
- Direct requests ("I need...", "Can you...", "Please...")
|
|
382
|
+
- Imperative verbs (take, send, create, update, schedule)
|
|
383
|
+
- Questions implying action ("What are our options for...?")
|
|
384
|
+
- Decisions/directives ("We should...", "Let's...")
|
|
385
|
+
- Time-sensitive signals (today, urgent, ASAP, emergency)
|
|
386
|
+
- Follow-up triggers (references to previous conversations)
|
|
387
|
+
4. Assigns priority: `critical` (urgent/emergency), `high` (today/priority), `medium` (this week)
|
|
388
|
+
5. Detects topic shifts and groups statements by topic
|
|
389
|
+
6. CEO callers get automatic priority boost (medium → high)
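
The classification and priority steps above can be sketched with a few regular expressions. This is a simplified illustration rather than the parser's real rule set: it covers only three of the six categories, the patterns are assumptions built from the example phrases listed above, and `classifyStatement` is a hypothetical name.

```javascript
// Hypothetical subset of the category patterns (step 3); the real parser has six.
const ACTION_PATTERNS = {
  direct_request: /\b(i need|can you|please)\b/i,
  imperative: /^(take|send|create|update|schedule)\b/i, // verb at start of statement
  decision: /\b(we should|let's)\b/i,
};

function classifyStatement(text, accessLevel = "default") {
  const categories = Object.entries(ACTION_PATTERNS)
    .filter(([, pattern]) => pattern.test(text))
    .map(([name]) => name);

  // Step 4: keyword-based priority; `medium` is assumed as the fallback.
  let priority = "medium";
  if (/\b(urgent|emergency)\b/i.test(text)) priority = "critical";
  else if (/\b(today|priority)\b/i.test(text)) priority = "high";

  // Step 6: callers with ceo access get medium boosted to high.
  if (accessLevel === "ceo" && priority === "medium") priority = "high";

  return { text, categories, priority, isActionItem: categories.length > 0 };
}

console.log(classifyStatement("Please send the deck today"));
console.log(classifyStatement("Send the report", "ceo"));
console.log(classifyStatement("This is urgent, I need the numbers"));
```

Regex matching of this kind is cheap and predictable but will miss paraphrases, which is one reason to keep the sample transcripts in the test suite up to date.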
|
|
390
|
+
|
|
391
|
+
### 6.2 Usage
|
|
392
|
+
|
|
393
|
+
The parser is called by the inbox processor when it detects a voice transcript message in Slack. It can also be used standalone:
|
|
394
|
+
|
|
395
|
+
```bash
|
|
396
|
+
# Test with a transcript file
|
|
397
|
+
node scripts/parse-voice-transcript.mjs < transcript.txt
|
|
398
|
+
|
|
399
|
+
# Run the test suite
|
|
400
|
+
node scripts/test-voice-parser.mjs
|
|
401
|
+
```
|
|
402
|
+
|
|
403
|
+
### 6.3 Updating Known Callers
|
|
404
|
+
|
|
405
|
+
The parser has a hardcoded `KNOWN_CALLERS` map at the top of the file. When adding a new contact, update both:
|
|
406
|
+
|
|
407
|
+
1. `config/caller-id-map.yaml` — for SMS handler and RAG access
|
|
408
|
+
2. `scripts/parse-voice-transcript.mjs` → `KNOWN_CALLERS` — for transcript caller identification
|
|
409
|
+
|
|
410
|
+
---
|
|
411
|
+
|
|
412
|
+
## 7. Process Management
|
|
413
|
+
|
|
414
|
+
In production, the SMS handler and huddle server run as persistent background processes alongside the poller and daemon.
|
|
415
|
+
|
|
416
|
+
### 7.1 launchd Plists
|
|
417
|
+
|
|
418
|
+
Create launchd agents for persistent process management. These ensure the services restart automatically after crashes or reboots.
|
|
419
|
+
|
|
420
|
+
**SMS Handler** (`~/Library/LaunchAgents/com.adaptic.AGENT_NAME.sms-handler.plist`):
|
|
421
|
+
|
|
422
|
+
```xml
|
|
423
|
+
<?xml version="1.0" encoding="UTF-8"?>
|
|
424
|
+
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
|
|
425
|
+
"http://www.apple.com/DTDs/PropertyList-1.0.dtd">
|
|
426
|
+
<plist version="1.0">
|
|
427
|
+
<dict>
|
|
428
|
+
<key>Label</key>
|
|
429
|
+
<string>com.adaptic.AGENT_NAME.sms-handler</string>
|
|
430
|
+
<key>ProgramArguments</key>
|
|
431
|
+
<array>
|
|
432
|
+
<string>/opt/homebrew/bin/node</string>
|
|
433
|
+
<string>scripts/sms-handler.mjs</string>
|
|
434
|
+
</array>
|
|
435
|
+
<key>WorkingDirectory</key>
|
|
436
|
+
<string>/Users/AGENT_USER/AGENT_REPO</string>
|
|
437
|
+
<key>RunAtLoad</key>
|
|
438
|
+
<true/>
|
|
439
|
+
<key>KeepAlive</key>
|
|
440
|
+
<true/>
|
|
441
|
+
<key>StandardOutPath</key>
|
|
442
|
+
<string>/Users/AGENT_USER/AGENT_REPO/logs/sms-handler-stdout.log</string>
|
|
443
|
+
<key>StandardErrorPath</key>
|
|
444
|
+
<string>/Users/AGENT_USER/AGENT_REPO/logs/sms-handler-stderr.log</string>
|
|
445
|
+
<key>EnvironmentVariables</key>
|
|
446
|
+
<dict>
|
|
447
|
+
<key>PATH</key>
|
|
448
|
+
<string>/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin</string>
|
|
449
|
+
</dict>
|
|
450
|
+
<key>ThrottleInterval</key>
|
|
451
|
+
<integer>10</integer>
|
|
452
|
+
</dict>
|
|
453
|
+
</plist>
|
|
454
|
+
```
|
|
455
|
+
|
|
456
|
+
**Huddle Server** (`~/Library/LaunchAgents/com.adaptic.AGENT_NAME.huddle-server.plist`):
|
|
457
|
+
|
|
458
|
+
```xml
|
|
459
|
+
<?xml version="1.0" encoding="UTF-8"?>
|
|
460
|
+
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
|
|
461
|
+
"http://www.apple.com/DTDs/PropertyList-1.0.dtd">
|
|
462
|
+
<plist version="1.0">
|
|
463
|
+
<dict>
|
|
464
|
+
<key>Label</key>
|
|
465
|
+
<string>com.adaptic.AGENT_NAME.huddle-server</string>
|
|
466
|
+
<key>ProgramArguments</key>
|
|
467
|
+
<array>
|
|
468
|
+
<string>/opt/homebrew/bin/node</string>
|
|
469
|
+
<string>scripts/huddle/huddle-server.mjs</string>
|
|
470
|
+
</array>
|
|
471
|
+
<key>WorkingDirectory</key>
|
|
472
|
+
<string>/Users/AGENT_USER/AGENT_REPO</string>
|
|
473
|
+
<key>RunAtLoad</key>
|
|
474
|
+
<true/>
|
|
475
|
+
<key>KeepAlive</key>
|
|
476
|
+
<true/>
|
|
477
|
+
<key>StandardOutPath</key>
|
|
478
|
+
<string>/Users/AGENT_USER/AGENT_REPO/logs/huddle-server-stdout.log</string>
|
|
479
|
+
<key>StandardErrorPath</key>
|
|
480
|
+
<string>/Users/AGENT_USER/AGENT_REPO/logs/huddle-server-stderr.log</string>
|
|
481
|
+
<key>EnvironmentVariables</key>
|
|
482
|
+
<dict>
|
|
483
|
+
<key>PATH</key>
|
|
484
|
+
<string>/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin</string>
|
|
485
|
+
</dict>
|
|
486
|
+
<key>ThrottleInterval</key>
|
|
487
|
+
<integer>10</integer>
|
|
488
|
+
</dict>
|
|
489
|
+
</plist>
|
|
490
|
+
```
|
|
491
|
+
|
|
492
|
+
Replace `AGENT_NAME`, `AGENT_USER`, and `AGENT_REPO` with your agent's values.
|
|
493
|
+
|
|
494
|
+
Load the plists:
|
|
495
|
+
|
|
496
|
+
```bash
|
|
497
|
+
launchctl load ~/Library/LaunchAgents/com.adaptic.AGENT_NAME.sms-handler.plist
|
|
498
|
+
launchctl load ~/Library/LaunchAgents/com.adaptic.AGENT_NAME.huddle-server.plist
|
|
499
|
+
```
|
|
500
|
+
|
|
501
|
+
### 7.2 Cloudflare Tunnel Persistence
|
|
502
|
+
|
|
503
|
+
For production, set up a named Cloudflare tunnel so the webhook URL doesn't change on restart:
|
|
504
|
+
|
|
505
|
+
```bash
|
|
506
|
+
# Authenticate (one-time)
|
|
507
|
+
cloudflared tunnel login
|
|
508
|
+
|
|
509
|
+
# Create a named tunnel
|
|
510
|
+
cloudflared tunnel create agent-webhooks
|
|
511
|
+
|
|
512
|
+
# Configure routes in ~/.cloudflared/config.yml
|
|
513
|
+
cat > ~/.cloudflared/config.yml << 'EOF'
|
|
514
|
+
tunnel: <TUNNEL_ID>
|
|
515
|
+
credentials-file: /Users/AGENT_USER/.cloudflared/<TUNNEL_ID>.json
|
|
516
|
+
|
|
517
|
+
ingress:
|
|
518
|
+
- hostname: sms.agent.yourdomain.com
|
|
519
|
+
service: http://localhost:3001
|
|
520
|
+
- hostname: whatsapp.agent.yourdomain.com
|
|
521
|
+
service: http://localhost:3002
|
|
522
|
+
- service: http_status:404
|
|
523
|
+
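EOF

# Point each hostname at the tunnel (one-time per hostname)
cloudflared tunnel route dns agent-webhooks sms.agent.yourdomain.com
cloudflared tunnel route dns agent-webhooks whatsapp.agent.yourdomain.com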
|
|
524
|
+
|
|
525
|
+
# Run the tunnel
|
|
526
|
+
cloudflared tunnel run agent-webhooks
|
|
527
|
+
```
|
|
528
|
+
|
|
529
|
+
Create a launchd plist for the tunnel as well, following the same pattern as the SMS handler plist.
|
|
530
|
+
|
|
531
|
+
---
|
|
532
|
+
|
|
533
|
+
## 8. Testing
|
|
534
|
+
|
|
535
|
+
### 8.1 SMS End-to-End Test
|
|
536
|
+
|
|
537
|
+
| # | Test | How to Verify |
|
|
538
|
+
|---|---------------------------------|-------------------------------------------------------|
|
|
539
|
+
| 1 | Send SMS to agent's number | Check `state/inbox/sms/` for new YAML file |
|
|
540
|
+
| 2 | Verify caller ID lookup | YAML file should contain correct sender name + level |
|
|
541
|
+
| 3 | Verify CEO priority detection | CEO messages should create priority trigger file |
|
|
542
|
+
| 4 | Send outbound SMS | `./scripts/send-sms.sh --to "+1..." --body "test"` |
|
|
543
|
+
| 5 | Verify dedup | Send identical outbound twice quickly; second skips |
|
|
544
|
+
| 6 | Verify audit log | Check `logs/audit/YYYY-MM-DD-actions.jsonl` |
|
|
545
|
+
| 7 | Unknown sender handling | SMS from unknown number should get `default` access |
|
|
546
|
+
|
|
547
|
+
### 8.2 Voice End-to-End Test
|
|
548
|
+
|
|
549
|
+
| # | Test | How to Verify |
|
|
550
|
+
|---|---------------------------------|-------------------------------------------------------|
|
|
551
|
+
| 1 | Audio setup | `./scripts/huddle/setup-audio.sh --check` |
|
|
552
|
+
| 2 | Audio capture/playback | `./scripts/huddle/setup-audio.sh --test` |
|
|
553
|
+
| 3 | CDP connection | Huddle server logs "CDP connected" on start |
|
|
554
|
+
| 4 | Join huddle | `node scripts/huddle/huddle-server.mjs --call <user>` |
|
|
555
|
+
| 5 | STT working | Server logs transcribed speech within ~2 seconds |
|
|
556
|
+
| 6 | TTS working | Agent's voice is audible in the huddle |
|
|
557
|
+
| 7 | Transcript saved | Check `logs/huddle/` for transcript file |
|
|
558
|
+
| 8 | Action items extracted | Run parser on transcript; verify structured output |
|
|
559
|
+
|
|
560
|
+
### 8.3 Voice Transcript Parser Test
|
|
561
|
+
|
|
562
|
+
```bash
|
|
563
|
+
node scripts/test-voice-parser.mjs
|
|
564
|
+
```
|
|
565
|
+
|
|
566
|
+
This runs the parser against sample transcripts and validates extraction accuracy.
|
|
567
|
+
|
|
568
|
+
---
|
|
569
|
+
|
|
570
|
+
## 9. Troubleshooting
|
|
571
|
+
|
|
572
|
+
### SMS handler not receiving messages
|
|
573
|
+
|
|
574
|
+
1. **Check tunnel is running**: `curl https://your-tunnel-url/health`
|
|
575
|
+
2. **Check Twilio webhook config**: Twilio Console → Phone Numbers → your number → Messaging → webhook URL must match tunnel
|
|
576
|
+
3. **Check geo-permissions**: Twilio Console → Messaging → Settings → Geo Permissions → sender's country enabled
|
|
577
|
+
4. **Check handler is listening**: `curl http://localhost:3001/health`
|
|
578
|
+
5. **Check logs**: `tail -f logs/sms-handler-stderr.log`
|
|
579
|
+
|
|
580
|
+
### Outbound SMS fails
|
|
581
|
+
|
|
582
|
+
1. **Check Twilio credentials**: The `TWILIO_ACCOUNT_SID` value in `.env` should start with `AC` (`echo $TWILIO_ACCOUNT_SID` only shows it if your shell has loaded `.env`)
|
|
583
|
+
2. **Check phone number format**: Must be E.164 (`+` prefix, country code, no spaces)
|
|
584
|
+
3. **Check Twilio balance**: Low balance prevents sends
|
|
585
|
+
4. **Check dedup**: If you see `DEDUP_SKIP`, the same message was recently sent
|
|
586
|
+
|
|
587
|
+
### Audio devices not found
|
|
588
|
+
|
|
589
|
+
1. **Run setup**: `./scripts/huddle/setup-audio.sh`
|
|
590
|
+
2. **Restart after install**: BlackHole may require a system restart after first install
|
|
591
|
+
3. **Verify devices**: `SwitchAudioSource -a` should list both BlackHole 2ch and 16ch
|
|
592
|
+
4. **Check Slack audio settings**: Slack Preferences → Audio & Video → verify speaker = BlackHole 2ch, mic = BlackHole 16ch
|
|
593
|
+
|
|
594
|
+
### Huddle server can't connect to Slack
|
|
595
|
+
|
|
596
|
+
1. **Check Slack is running with CDP**: `curl http://localhost:9222/json/version` should return Slack version info
|
|
597
|
+
2. **Relaunch Slack**: `./scripts/huddle/launch-slack.sh`
|
|
598
|
+
3. **Check CDP port**: Ensure no other Electron app (VS Code, etc.) is claiming port 9222
|
|
599
|
+
|
|
600
|
+
### Echo in huddles (agent hears itself)
|
|
601
|
+
|
|
602
|
+
1. **Switch capture mode**: Set `HUDDLE_CAPTURE_MODE=webaudio` in `.env` (this bypasses BlackHole for inbound capture, using CDP-injected audio capture instead)
|
|
603
|
+
2. **Verify audio routing**: BlackHole 2ch should be Slack's speaker only; system speaker should remain on built-in
|
|
604
|
+
|
|
605
|
+
### Deepgram not transcribing
|
|
606
|
+
|
|
607
|
+
1. **Check API key**: Verify `DEEPGRAM_API_KEY` is set and valid
|
|
608
|
+
2. **Check audio signal**: `./scripts/huddle/setup-audio.sh --test` should show non-zero bytes captured
|
|
609
|
+
3. **Check WebSocket**: Audio bridge logs should show "Deepgram connected"
|
|
610
|
+
4. **Check model**: Default is `nova-2` — verify your Deepgram plan supports it
|
|
611
|
+
|
|
612
|
+
### ElevenLabs TTS not working
|
|
613
|
+
|
|
614
|
+
1. **Check API key**: Verify `ELEVENLABS_API_KEY` is set and valid
|
|
615
|
+
2. **Check voice ID**: Default voice and model are configured in audio-bridge.mjs — verify they match your ElevenLabs account
|
|
616
|
+
3. **Check playback device**: `sox -n -t coreaudio "BlackHole 16ch" synth 1 sine 440` should play without error
|
|
617
|
+
4. **Check quota**: Free tier has limited characters per month
|
|
618
|
+
|
|
619
|
+
---
|
|
620
|
+
|
|
621
|
+
## 10. Security Considerations
|
|
622
|
+
|
|
623
|
+
### Caller Authentication
|
|
624
|
+
|
|
625
|
+
- All inbound SMS senders are looked up in `caller-id-map.yaml` before processing
|
|
626
|
+
- Unknown numbers receive `default` (most restrictive) access level
|
|
627
|
+
- The agent never shares internal state, logs, or memory with unknown callers
|
|
628
|
+
- Phone number spoofing is possible — do not treat SMS as a secure authentication channel for high-risk actions
|
|
629
|
+
|
|
630
|
+
### Twilio Request Validation
|
|
631
|
+
|
|
632
|
+
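When `TWILIO_ACCOUNT_SID` and `TWILIO_AUTH_TOKEN` are set, the SMS handler can validate that incoming webhooks genuinely come from Twilio by checking the `X-Twilio-Signature` header that Twilio attaches to each request. This prevents third parties from injecting fake messages by POSTing to your webhook URL. The signature covers the full public URL, so the URL used for validation must match the tunnel URL configured in the Twilio Console.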
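
For reference, Twilio's documented signing scheme for form-encoded webhooks is an HMAC-SHA1 over the request URL followed by each POST parameter's name and value in alphabetical key order, keyed with the auth token and Base64-encoded. The sketch below implements that scheme with Node's built-in crypto module. The function names are hypothetical, and whether `sms-handler.mjs` validates requests this way is an assumption.

```javascript
// Sketch of Twilio webhook signature validation (function names are hypothetical).
const nodeCrypto = require("node:crypto");

// Signature = Base64(HMAC-SHA1(authToken, url + key1 + value1 + key2 + value2 ...))
// with POST parameters sorted by key.
function computeTwilioSignature(authToken, url, params) {
  const data = Object.keys(params)
    .sort()
    .reduce((acc, key) => acc + key + params[key], url);
  return nodeCrypto.createHmac("sha1", authToken).update(data, "utf8").digest("base64");
}

// Compare against the X-Twilio-Signature header in constant time.
function isValidTwilioRequest(authToken, signatureHeader, url, params) {
  const expected = Buffer.from(computeTwilioSignature(authToken, url, params));
  const received = Buffer.from(signatureHeader ?? "");
  return expected.length === received.length && nodeCrypto.timingSafeEqual(expected, received);
}

const exampleUrl = "https://your-tunnel-url.trycloudflare.com/sms";
const exampleParams = { From: "+15551234567", Body: "Status?" };
const exampleSignature = computeTwilioSignature("example-auth-token", exampleUrl, exampleParams);
console.log(isValidTwilioRequest("example-auth-token", exampleSignature, exampleUrl, exampleParams));
```

In practice the official `twilio` Node package provides a `validateRequest` helper for the same check, which may be preferable to maintaining this logic by hand.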
|
|
633
|
+
|
|
634
|
+
### Tunnel Security
|
|
635
|
+
|
|
636
|
+
- Quick tunnels (random URLs) provide security-through-obscurity but the URL can leak
|
|
637
|
+
- Named tunnels with Cloudflare Access provide proper authentication
|
|
638
|
+
- Consider adding IP allowlisting in Cloudflare if your Twilio traffic comes from known IPs
|
|
639
|
+
|
|
640
|
+
### Voice Data
|
|
641
|
+
|
|
642
|
+
- Audio is processed in real-time and not stored permanently (only transcripts are kept)
|
|
643
|
+
- Deepgram processes audio on their servers — review their [data handling policy](https://deepgram.com/privacy)
|
|
644
|
+
- ElevenLabs processes text on their servers — review their [privacy policy](https://elevenlabs.io/privacy)
|
|
645
|
+
- All voice transcripts are logged locally in `logs/huddle/`
|
|
646
|
+
|
|
647
|
+
---
|
|
648
|
+
|
|
649
|
+
## Quick Reference
|
|
650
|
+
|
|
651
|
+
### Ports
|
|
652
|
+
|
|
653
|
+
| Service | Default Port | Env Variable |
|
|
654
|
+
|-------------------|-------------|-------------------------|
|
|
655
|
+
| SMS Handler | 3001 | `SMS_PORT` |
|
|
656
|
+
| WhatsApp Handler | 3002 | `WHATSAPP_PORT` |
|
|
657
|
+
| Huddle Events API | 3200 | `HUDDLE_EVENTS_PORT` |
|
|
658
|
+
| Audio Capture WS | 3201 | `HUDDLE_CAPTURE_WS_PORT`|
|
|
659
|
+
| Slack CDP | 9222 | (launch flag) |
|
|
660
|
+
|
|
661
|
+
### Environment Variables (Voice/SMS)
|
|
662
|
+
|
|
663
|
+
| Variable | Required For | Description |
|
|
664
|
+
|---------------------------|----------------|------------------------------------------|
|
|
665
|
+
| `TWILIO_ACCOUNT_SID` | SMS | Twilio account identifier |
|
|
666
|
+
| `TWILIO_AUTH_TOKEN` | SMS | Twilio authentication token |
|
|
667
|
+
| `TWILIO_PHONE_NUMBER` | SMS | Agent's Twilio phone number (E.164) |
|
|
668
|
+
| `TWILIO_PHONE_SID` | SMS (optional) | Phone number SID for advanced features |
|
|
669
|
+
| `DEEPGRAM_API_KEY` | Voice | Deepgram STT API key |
|
|
670
|
+
| `ELEVENLABS_API_KEY` | Voice | ElevenLabs TTS API key |
|
|
671
|
+
| `SMS_PORT` | SMS (optional) | Override default SMS handler port |
|
|
672
|
+
| `HUDDLE_CAPTURE_MODE`     | Voice (optional) | `webaudio` (default) or `sox`          |
|
|
673
|
+
| `HUDDLE_USE_API`          | Voice (optional) | Set `1` to use Anthropic API vs CLI    |
|
|
674
|
+
|
|
675
|
+
### Key Files
|
|
676
|
+
|
|
677
|
+
| File | Purpose |
|
|
678
|
+
|--------------------------------------|-------------------------------------------------|
|
|
679
|
+
| `scripts/sms-handler.mjs` | Inbound SMS webhook handler |
|
|
680
|
+
| `scripts/send-sms.sh` | Outbound SMS sender with dedup |
|
|
681
|
+
| `scripts/huddle/setup-audio.sh` | Audio device installation and verification |
|
|
682
|
+
| `scripts/huddle/launch-slack.sh` | Launch Slack with CDP enabled |
|
|
683
|
+
| `scripts/huddle/start-call.mjs` | Initiate a Slack huddle with a contact |
|
|
684
|
+
| `scripts/huddle/huddle-server.mjs` | Main huddle orchestrator |
|
|
685
|
+
| `scripts/huddle/huddle-controller.mjs` | Slack CDP automation for huddle UI |
|
|
686
|
+
| `scripts/huddle/audio-bridge.mjs` | Bidirectional STT/TTS audio pipeline |
|
|
687
|
+
| `scripts/parse-voice-transcript.mjs`| Extract action items from call transcripts |
|
|
688
|
+
| `config/caller-id-map.yaml` | Phone → identity mapping with access levels |
|
|
689
|
+
|
|
690
|
+
---
|
|
691
|
+
|
|
692
|
+
## Related Documents
|
|
693
|
+
|
|
694
|
+
- [Mac Mini Bootstrap](../runbooks/mac-mini-bootstrap.md) — Full machine setup
|
|
695
|
+
- [Agent Persona Setup](agent-persona-setup.md) — Identity and configuration
|
|
696
|
+
- [Perpetual Operations](../runbooks/perpetual-operations.md) — How the system runs 24/7
|
|
697
|
+
- [Communications Policy](../governance/communications-policy.md) — Voice modes and approval rules
|
|
698
|
+
- [System Architecture](../architecture/system-architecture.md) — Overall system design
|