@adaptic/maestro 1.1.6 → 1.1.8

@@ -0,0 +1,698 @@
1
+ # Voice & SMS Setup Guide
2
+
3
+ How to enable real-time voice calls (Slack huddles) and SMS messaging for a Maestro agent. This guide covers the full stack: Twilio phone number, inbound/outbound SMS, Slack huddle voice participation (Deepgram STT + ElevenLabs TTS), Cloudflare tunnel exposure, caller ID mapping, and voice transcript parsing.
4
+
5
+ **Prerequisites**: Complete the [Mac Mini Bootstrap](../runbooks/mac-mini-bootstrap.md) and [Agent Persona Setup](agent-persona-setup.md) first. This guide assumes the agent repo is scaffolded and `/init-maestro` Phases 0–3 are complete.
6
+
7
+ ---
8
+
9
+ ## Architecture Overview
10
+
11
+ The voice/SMS stack has three layers:
12
+
13
+ ```
14
+ ┌─────────────────────────────────────────────────────────────┐
15
+ │ Layer 1: SMS (Twilio) │
16
+ │ ┌──────────────┐ ┌──────────────────┐ ┌────────────┐ │
17
+ │ │ Twilio Cloud │───▶│ Cloudflare Tunnel │───▶│ sms-handler│ │
18
+ │ │ (webhooks) │ │ (port 3001) │ │ .mjs │ │
19
+ │ └──────────────┘ └──────────────────┘ └─────┬──────┘ │
20
+ │ │ │
21
+ │ state/inbox/ │
22
+ │ sms/*.yaml │
23
+ ├─────────────────────────────────────────────────────────────┤
24
+ │ Layer 2: Slack Huddle Voice │
25
+ │ ┌──────────┐ ┌───────────┐ ┌──────────────────────┐ │
26
+ │ │ Slack │◀─▶│ CDP │◀─▶│ huddle-controller │ │
27
+ │ │ Desktop │ │ (9222) │ │ .mjs │ │
28
+ │ └────┬─────┘ └───────────┘ └──────────────────────┘ │
29
+ │ │ │
30
+ │ ┌────▼──────────────────────────────────────────────────┐ │
31
+ │ │ BlackHole 2ch (capture) ←→ Deepgram STT │ │
32
+ │ │ BlackHole 16ch (playback) ←→ ElevenLabs TTS │ │
33
+ │ │ audio-bridge.mjs │ │
34
+ │ └────────────────────────────────────────────────────────┘ │
35
+ ├─────────────────────────────────────────────────────────────┤
36
+ │ Layer 3: Transcript Processing │
37
+ │ ┌────────────────────┐ ┌─────────────────────────────┐ │
38
+ │ │ parse-voice- │──▶│ state/inbox/voice/*.yaml │ │
39
+ │ │ transcript.mjs │ │ (action items, priorities) │ │
40
+ │ └────────────────────┘ └─────────────────────────────┘ │
41
+ └─────────────────────────────────────────────────────────────┘
42
+ ```
43
+
44
+ **Data flow summary**:
45
+ - **SMS in**: Twilio → Cloudflare tunnel → `sms-handler.mjs` (port 3001) → inbox YAML → inbox processor
46
+ - **SMS out**: Agent calls `scripts/send-sms.sh` → Twilio REST API → recipient phone
47
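+ - **Voice in**: Slack huddle audio → BlackHole 2ch → sox → Deepgram STT → transcript (this is the `sox` capture path; the default `webaudio` mode captures audio via CDP instead, see section 5.8)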
48
+ - **Voice out**: Claude response → ElevenLabs TTS → sox → BlackHole 16ch → Slack huddle mic
49
+ - **Voice post-call**: Transcript summary → `parse-voice-transcript.mjs` → inbox YAML → action routing
50
+
51
+ ---
52
+
53
+ ## 1. Twilio Account & Phone Number
54
+
55
+ ### 1.1 Create a Twilio Account
56
+
57
+ 1. Sign up at https://www.twilio.com/ (pay-as-you-go, no minimum commitment)
58
+ 2. Verify your email and phone number
59
+ 3. From the Twilio Console dashboard, copy:
60
+ - **Account SID** (starts with `AC`)
61
+ - **Auth Token** (click to reveal)
62
+
63
+ ### 1.2 Purchase a Phone Number
64
+
65
+ 1. Go to Twilio Console → Phone Numbers → Buy a Number
66
+ 2. Search for a number with **both SMS and Voice capabilities** enabled
67
+ 3. Select a number in a region appropriate for your agent's primary contacts
68
+ 4. Note the number in E.164 format (e.g., `+15551234567`)
69
+
70
+ **Tip**: US numbers are usually the cheapest (about $1.15/month at the time of writing). If your contacts are international, check Twilio's [geo-permissions](https://console.twilio.com/us1/develop/sms/settings/geo-permissions) and enable the countries you need for inbound/outbound SMS.
71
+
72
+ ### 1.3 Configure Environment Variables
73
+
74
+ Add to your agent's `.env`:
75
+
76
+ ```bash
77
+ # Twilio credentials
78
+ TWILIO_ACCOUNT_SID=ACxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
79
+ TWILIO_AUTH_TOKEN=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
80
+ TWILIO_PHONE_NUMBER=+15551234567
81
+ TWILIO_PHONE_SID=PNxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx # Optional: from Phone Numbers → your number
82
+ ```
83
+
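Before moving on, it can help to sanity-check these values. The following is a minimal sketch and not part of the Maestro scripts: the function names are hypothetical, and the SID pattern (`AC` followed by 32 hex characters) is an assumption about Twilio's current SID format.

```bash
# Hypothetical pre-flight checks for the Twilio variables in .env.
# Not part of the Maestro scripts; the patterns below are assumptions.

# Succeeds if $1 looks like E.164: "+", a non-zero digit, then 1-14 more digits.
is_e164() {
  printf '%s\n' "$1" | grep -Eq '^\+[1-9][0-9]{1,14}$'
}

# Succeeds if $1 looks like an Account SID ("AC" followed by 32 hex characters).
is_account_sid() {
  printf '%s\n' "$1" | grep -Eq '^AC[0-9a-fA-F]{32}$'
}

# Reports every problem found and returns non-zero if any check failed.
check_twilio_env() {
  local status=0
  is_account_sid "${TWILIO_ACCOUNT_SID:-}" || { echo "TWILIO_ACCOUNT_SID looks malformed" >&2; status=1; }
  [ -n "${TWILIO_AUTH_TOKEN:-}" ] || { echo "TWILIO_AUTH_TOKEN is empty" >&2; status=1; }
  is_e164 "${TWILIO_PHONE_NUMBER:-}" || { echo "TWILIO_PHONE_NUMBER is not in E.164 format" >&2; status=1; }
  return "$status"
}
```

After loading `.env` into the current shell (for example with `set -a; . ./.env; set +a`), run `check_twilio_env`; it prints one line per problem and returns a non-zero status if anything looks wrong.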
84
+ ### 1.4 Enable Geo-Permissions (International SMS)
85
+
86
+ If your agent needs to send/receive SMS to/from international numbers:
87
+
88
+ 1. Go to Twilio Console → Messaging → Settings → Geo Permissions
89
+ 2. Enable each country where your contacts have phone numbers
90
+ 3. **Critical for UAE**: Enable "United Arab Emirates" if your principal or contacts use UAE numbers
91
+
92
+ ---
93
+
94
+ ## 2. SMS — Inbound (Webhook Handler)
95
+
96
+ The inbound SMS handler (`scripts/sms-handler.mjs`) receives Twilio webhook POSTs and writes messages to the agent's inbox for processing.
97
+
98
+ ### 2.1 How It Works
99
+
100
+ 1. Twilio receives an SMS to your agent's phone number
101
+ 2. Twilio sends an HTTP POST to your configured webhook URL
102
+ 3. `sms-handler.mjs` parses the message, looks up the sender in `config/caller-id-map.yaml`
103
+ 4. Creates a YAML file in `state/inbox/sms/` with sender identity, access level, and priority
104
+ 5. Returns an empty TwiML response, so Twilio sends no automatic reply
105
+ 6. CEO messages automatically create priority trigger files for immediate processing
106
+
107
+ ### 2.2 Start the SMS Handler
108
+
109
+ ```bash
110
+ # Start the handler (default port 3001)
111
+ node scripts/sms-handler.mjs
112
+
113
+ # Or with a custom port
114
+ SMS_PORT=3005 node scripts/sms-handler.mjs
115
+ ```
116
+
117
+ Verify it's running:
118
+
119
+ ```bash
120
+ curl http://localhost:3001/health
121
+ # Expected: {"status":"ok","uptime":...}
122
+ ```
123
+
124
+ ### 2.3 Expose via Cloudflare Tunnel
125
+
126
+ The handler runs locally — Twilio needs a public URL to reach it. Use Cloudflare's free quick tunnels:
127
+
128
+ ```bash
129
+ # Install cloudflared if not already present
130
+ brew install cloudflared
131
+
132
+ # Start a tunnel to the SMS handler
133
+ cloudflared tunnel --url http://localhost:3001
134
+ ```
135
+
136
+ Cloudflared prints a public URL like `https://random-words-here.trycloudflare.com`. Copy this URL.
137
+
138
+ **Important**: Quick tunnels generate a new URL each time. For production stability, set up a [named tunnel](https://developers.cloudflare.com/cloudflare-one/connections/connect-networks/get-started/) with a fixed subdomain.
139
+
140
+ ### 2.4 Configure Twilio Webhook
141
+
142
+ 1. Go to Twilio Console → Phone Numbers → Active Numbers → your number
143
+ 2. Under **Messaging** → "A message comes in":
144
+ - Set to **Webhook**
145
+ - URL: `https://your-tunnel-url.trycloudflare.com/sms`
146
+ - Method: **HTTP POST**
147
+ 3. Click **Save configuration**
148
+
149
+ ### 2.5 Test Inbound SMS
150
+
151
+ Send an SMS to your agent's Twilio number from your phone. Check:
152
+
153
+ ```bash
154
+ # Verify the message landed in the inbox
155
+ ls state/inbox/sms/
156
+
157
+ # Check the SMS logs
158
+ cat logs/sms/$(date +%Y-%m-%d)/*.jsonl | tail -1 | jq .
159
+ ```
160
+
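To exercise the handler without sending a real SMS, a Twilio-style request can be simulated locally. This is a sketch under assumptions: `simulate_inbound_sms` is a hypothetical helper, it assumes the handler accepts a form-encoded POST on `/sms` with the standard Twilio fields `From`, `To`, `Body`, and `MessageSid`, and an unsigned request like this may be rejected if the handler enforces Twilio signature validation.

```bash
# Hypothetical local smoke test; not part of the Maestro scripts.
# Posts a Twilio-style form payload to the handler. The /sms path and the
# default port come from sections 2.2 and 2.4 of this guide.
# Set DRY_RUN=1 to print the curl command instead of running it.
simulate_inbound_sms() {
  local from="$1" body="$2"
  set -- curl -sS -X POST "http://localhost:${SMS_PORT:-3001}/sms" \
    --data-urlencode "From=${from}" \
    --data-urlencode "To=${TWILIO_PHONE_NUMBER:-+15551234567}" \
    --data-urlencode "Body=${body}" \
    --data-urlencode "MessageSid=SM00000000000000000000000000000000"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

For example, `simulate_inbound_sms "+15557654321" "test message"` followed by `ls state/inbox/sms/` should show whether a new inbox file was written.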
161
+ ---
162
+
163
+ ## 3. SMS — Outbound
164
+
165
+ Outbound SMS uses `scripts/send-sms.sh`, a shell wrapper around Twilio's REST API with built-in deduplication and audit logging.
166
+
167
+ ### 3.1 Send an SMS
168
+
169
+ ```bash
170
+ ./scripts/send-sms.sh --to "+15557654321" --body "Hello from your agent"
171
+ ```
172
+
173
+ Options:
174
+
175
+ | Flag | Description | Required |
176
+ |----------|------------------------------------------------|----------|
177
+ | `--to` | Recipient phone number (E.164 format) | Yes |
178
+ | `--body` | Message text | Yes |
179
+ | `--from` | Sender number (defaults to `$TWILIO_PHONE_NUMBER`) | No |
180
+
181
+ ### 3.2 Deduplication
182
+
183
+ The send script integrates with `scripts/outbound-dedup.sh` to prevent duplicate sends when multiple sessions are running concurrently. This is automatic — no configuration needed.
184
+
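This guide does not show how `scripts/outbound-dedup.sh` works internally. As an illustration only, one way to implement such a guard is a marker file per (recipient, body) pair that expires after a time window. Everything below (`dedup_check`, `DEDUP_DIR`, `DEDUP_WINDOW_SECS`, the 5-minute window) is hypothetical, and `cksum` is used only to keep the sketch portable.

```bash
# Hypothetical dedup guard; scripts/outbound-dedup.sh may work differently.
DEDUP_DIR="${DEDUP_DIR:-${TMPDIR:-/tmp}/sms-dedup}"
DEDUP_WINDOW_SECS="${DEDUP_WINDOW_SECS:-300}"   # assumed 5-minute window

# Returns 0 if (recipient, body) has not been sent within the window,
# otherwise prints DEDUP_SKIP and returns 1.
dedup_check() {
  local to="$1" body="$2" key marker now seen
  mkdir -p "$DEDUP_DIR"
  # cksum (a CRC) keeps the sketch portable; prefer SHA-256 in real use.
  key="$(printf '%s\n%s' "$to" "$body" | cksum | tr ' ' '-')"
  marker="$DEDUP_DIR/$key"
  now="$(date +%s)"

  # Drop the marker once it is older than the window.
  if [ -f "$marker" ]; then
    seen="$(cat "$marker" 2>/dev/null)"
    if [ $((now - ${seen:-$now})) -ge "$DEDUP_WINDOW_SECS" ]; then
      rm -f "$marker"
    fi
  fi

  # With noclobber, ">" fails if the file already exists, so only one of
  # several concurrent sessions can create the marker.
  if ( set -o noclobber; printf '%s' "$now" > "$marker" ) 2>/dev/null; then
    return 0
  fi
  echo "DEDUP_SKIP"
  return 1
}
```

A sender would call `dedup_check "$to" "$body" || exit 0` before contacting the Twilio API, so a second session with the same message inside the window skips the send.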
185
+ ### 3.3 Audit Trail
186
+
187
+ Every outbound SMS is logged to two locations:
188
+
189
+ - `logs/sms/YYYY-MM-DD/YYYY-MM-DD-sms.jsonl` — SMS-specific log
190
+ - `logs/audit/YYYY-MM-DD-actions.jsonl` — Global audit trail
191
+
192
+ ---
193
+
194
+ ## 4. Caller ID Mapping
195
+
196
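+ The SMS handler uses `config/caller-id-map.yaml` to identify callers and assign access levels. The voice transcript parser identifies callers from its own hardcoded `KNOWN_CALLERS` map, which must be kept in sync with this file (see section 6.3). This file is a **security boundary**: it determines what information the agent can share with each caller.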
197
+
198
+ ### 4.1 Create the Caller ID Map
199
+
200
+ If your agent was scaffolded with `npx @adaptic/maestro create`, a template exists at `config/caller-id-map.yaml`. If not, create one:
201
+
202
+ ```yaml
203
+ # config/caller-id-map.yaml
204
+ #
205
+ # Maps phone numbers, Slack IDs, and emails to user identities with access levels.
206
+ # Used by: sms-handler.mjs, parse-voice-transcript.mjs, user-context-search.py
207
+ #
208
+ # Access levels:
209
+ # ceo — Full access to all paths (memory, knowledge, state, outputs, docs, logs)
210
+ # leadership — Company knowledge + docs + research/briefs + own interaction logs
211
+ # partner — Public knowledge + research + own interaction logs
212
+ # default — Public company knowledge only (knowledge/sources/)
213
+
214
+ users:
215
+ # Principal (the person the agent reports to)
216
+ principal-name:
217
+ name: "Full Name"
218
+ phone: ["+1XXXXXXXXXX"]
219
+ whatsapp: ["+1XXXXXXXXXX"]
220
+ slack_id: "UXXXXXXXXXX"
221
+ email: "name@company.com"
222
+ access_level: ceo
223
+
224
+ # Add team members, partners, and other contacts below
225
+ # team-member:
226
+ # name: "Team Member Name"
227
+ # phone: []
228
+ # slack_id: "UXXXXXXXXXX"
229
+ # email: "member@company.com"
230
+ # access_level: leadership
231
+ ```
232
+
233
+ ### 4.2 Access Level Reference
234
+
235
+ | Level | Accessible Paths | Typical Recipients |
236
+ |--------------|-----------------------------------------------------------------------------|----------------------------------|
237
+ | `ceo` | Everything: memory/, knowledge/, state/, outputs/, docs/, logs/, all interactions | CEO / principal |
238
+ | `leadership` | knowledge/ (company), docs/, research/briefs, own interaction logs | C-suite, directors, senior team |
239
+ | `partner` | knowledge/sources/ (public), research, own interaction logs | JV partners, advisors |
240
+ | `default` | knowledge/sources/ only (public company knowledge) | Unknown callers, vendors |
241
+
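The lookup rule (exact phone match, otherwise `default`) can be illustrated with a small shell function. This is a sketch, not the handler's code: `lookup_access_level` is a hypothetical name, and the `awk` program assumes the flat layout of the template above (user keys indented by two spaces, `phone` and `whatsapp` lists on a single line, `access_level` listed after them). A real implementation would use a YAML parser.

```bash
# Hypothetical lookup; assumes the flat layout of the template above.
# Prints the access level for an exact phone-number match, else "default".
lookup_access_level() {
  local number="$1" map="${2:-config/caller-id-map.yaml}"
  awk -v num="$number" '
    /^ *#/ { next }                          # ignore comment lines
    /^  [^ #][^:]*: *$/ { matched = 0 }      # a new user entry begins
    /^ +(phone|whatsapp):/ && index($0, "\"" num "\"") { matched = 1 }
    /^ +access_level:/ && matched { print $2; found = 1; exit }
    END { if (!found) print "default" }
  ' "$map"
}
```

For example, `lookup_access_level "+15557654321"` prints the level of the matching user, and any number that is not listed falls through to `default`, the most restrictive level.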
242
+ ### 4.3 Hot Reload
243
+
244
+ The SMS handler reloads `caller-id-map.yaml` every 60 seconds automatically. No restart needed when adding contacts.
245
+
246
+ ---
247
+
248
+ ## 5. Voice — Slack Huddle Infrastructure
249
+
250
+ The voice stack enables the agent to join Slack huddles, listen via STT, reason via Claude, and speak via TTS. This requires virtual audio routing and three integrations: Deepgram (STT), Claude (reasoning), and ElevenLabs (TTS).
251
+
252
+ ### 5.1 Install Audio Dependencies
253
+
254
+ Run the setup script:
255
+
256
+ ```bash
257
+ cd scripts/huddle
258
+ ./setup-audio.sh
259
+ ```
260
+
261
+ This installs:
262
+
263
+ | Component | Install Method | Purpose |
264
+ |-------------------|----------------------|--------------------------------------------------|
265
+ | BlackHole 2ch | `brew install blackhole-2ch` | Captures Slack's speaker output (what others say) |
266
+ | BlackHole 16ch | `brew install blackhole-16ch` | Carries TTS output to Slack's mic (what agent says) |
267
+ | sox | `brew install sox` | Audio capture and playback between devices |
268
+ | SwitchAudioSource | `brew install switchaudio-osx`| Lists and verifies audio devices |
269
+
270
+ **Why two BlackHole devices?** Using separate virtual audio devices for capture (2ch) and playback (16ch) prevents feedback loops. If the same device were used for both, the agent would hear its own TTS output, transcribe it, and respond to itself in an infinite loop.
271
+
272
+ ### 5.2 Configure Slack Audio
273
+
274
+ Open Slack → Preferences → Audio & Video:
275
+
276
+ | Setting | Value | Purpose |
277
+ |--------------|------------------|----------------------------------------------|
278
+ | Speaker | BlackHole 2ch | Routes huddle audio to the capture pipeline |
279
+ | Microphone | BlackHole 16ch | Routes TTS output into the huddle |
280
+
281
+ **Note**: System audio (built-in speakers/mic) remains unchanged. Only Slack's audio is rerouted.
282
+
283
+ ### 5.3 Verify Audio Setup
284
+
285
+ ```bash
286
+ # Check all devices are present
287
+ ./scripts/huddle/setup-audio.sh --check
288
+
289
+ # Test capture and playback
290
+ ./scripts/huddle/setup-audio.sh --test
291
+ ```
292
+
293
+ ### 5.4 Configure Voice API Keys
294
+
295
+ Add to `.env`:
296
+
297
+ ```bash
298
+ # Speech-to-text (Deepgram)
299
+ # Sign up: https://deepgram.com/ (free trial credit available; check current pricing)
300
+ DEEPGRAM_API_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
301
+
302
+ # Text-to-speech (ElevenLabs)
303
+ # Sign up: https://elevenlabs.io/ (free tier available)
304
+ ELEVENLABS_API_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
305
+ ```
306
+
307
+ ### 5.5 Launch Slack with CDP
308
+
309
+ The huddle controller automates Slack via Chrome DevTools Protocol. Slack must be launched with remote debugging enabled:
310
+
311
+ ```bash
312
+ # Launch Slack with CDP on port 9222
313
+ ./scripts/huddle/launch-slack.sh
314
+
315
+ # Or manually:
316
+ /Applications/Slack.app/Contents/MacOS/Slack --remote-debugging-port=9222
317
+ ```
318
+
319
+ Add this to your agent's launchd configuration or Login Items so Slack always starts with CDP enabled.
320
+
321
+ ### 5.6 Install Huddle Dependencies
322
+
323
+ ```bash
324
+ cd scripts/huddle
325
+ npm install
326
+ ```
327
+
328
+ This installs `ws` (WebSocket client for CDP), `@anthropic-ai/sdk`, and `dotenv`.
329
+
330
+ ### 5.7 Start the Huddle Server
331
+
332
+ ```bash
333
+ # Start and listen for huddle invitations
334
+ node scripts/huddle/huddle-server.mjs
335
+
336
+ # Join a specific channel's huddle
337
+ node scripts/huddle/huddle-server.mjs --join general
338
+
339
+ # Initiate a huddle with someone
340
+ node scripts/huddle/huddle-server.mjs --call mehran
341
+ ```
342
+
343
+ The huddle server:
344
+ 1. Connects to Slack via CDP (port 9222)
345
+ 2. Listens for huddle invitations
346
+ 3. On join: starts the audio bridge (Deepgram STT ↔ Claude reasoning ↔ ElevenLabs TTS)
347
+ 4. On huddle end: saves transcript, routes action items to queues
348
+
349
+ ### 5.8 Audio Bridge Configuration
350
+
351
+ The audio bridge supports two capture modes (set via environment variable):
352
+
353
+ | Mode | Env Variable | How It Works | When to Use |
354
+ |------------|--------------------------------|-------------------------------------------------|--------------------------|
355
+ | `webaudio` | `HUDDLE_CAPTURE_MODE=webaudio` | Injects ScriptProcessorNode via CDP (echo-free) | Default, recommended |
356
+ | `sox` | `HUDDLE_CAPTURE_MODE=sox` | Captures from BlackHole 2ch via sox | Fallback if CDP fails |
357
+
358
+ Optional environment variables:
359
+
360
+ ```bash
361
+ HUDDLE_CAPTURE_DEVICE="BlackHole 2ch" # sox capture device name
362
+ HUDDLE_PLAYBACK_DEVICE="BlackHole 16ch" # sox playback device name
363
+ HUDDLE_CAPTURE_WS_PORT=3201 # WebSocket port for webaudio capture
364
+ HUDDLE_EVENTS_PORT=3200 # HTTP port for huddle event API
365
+ HUDDLE_USE_API=0 # Set to 1 to use Anthropic API instead of Claude CLI
366
+ ```
367
+
368
+ ---
369
+
370
+ ## 6. Voice Transcript Processing
371
+
372
+ After a huddle ends, the transcript is posted to Slack. The transcript parser extracts structured action items for queue routing.
373
+
374
+ ### 6.1 How It Works
375
+
376
+ `scripts/parse-voice-transcript.mjs` processes transcript summaries and:
377
+
378
+ 1. Identifies the caller from the phone number (via hardcoded map — keep in sync with `caller-id-map.yaml`)
379
+ 2. Extracts caller statements from "Caller said:" blocks
380
+ 3. Classifies each statement using regex pattern matching across 6 categories:
381
+ - Direct requests ("I need...", "Can you...", "Please...")
382
+ - Imperative verbs (take, send, create, update, schedule)
383
+ - Questions implying action ("What are our options for...?")
384
+ - Decisions/directives ("We should...", "Let's...")
385
+ - Time-sensitive signals (today, urgent, ASAP, emergency)
386
+ - Follow-up triggers (references to previous conversations)
387
+ 4. Assigns priority: `critical` (urgent/emergency), `high` (today/priority), `medium` (this week)
388
+ 5. Detects topic shifts and groups statements by topic
389
+ 6. CEO callers get automatic priority boost (medium → high)
390
+
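Steps 4 and 6 above can be sketched as a small keyword classifier. This is an illustration only: the real parser is JavaScript and its exact patterns may differ, the function names are hypothetical, and treating `medium` as the fallback when no keyword matches is an assumption.

```bash
# Hypothetical priority rules based on steps 4 and 6 above; the real parser
# is JavaScript and its exact patterns may differ.

# Succeeds if $1 contains any of the |-separated words in $2 as a whole word
# (case-insensitive).
has_word() {
  printf '%s\n' "$1" | grep -Eiq "(^|[^[:alnum:]_])($2)([^[:alnum:]_]|\$)"
}

# Usage: classify_priority "statement text" [access_level]
classify_priority() {
  local text="$1" access="${2:-default}" priority="medium"   # assumed fallback
  if has_word "$text" 'urgent|emergency'; then
    priority="critical"
  elif has_word "$text" 'today|priority'; then
    priority="high"
  fi
  # CEO callers get an automatic boost from medium to high.
  if [ "$access" = "ceo" ] && [ "$priority" = "medium" ]; then
    priority="high"
  fi
  printf '%s\n' "$priority"
}
```

Called as `classify_priority "Please send the deck today"`, the sketch prints `high`; the same call with a statement containing none of the keywords and a second argument of `ceo` is boosted from `medium` to `high`.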
391
+ ### 6.2 Usage
392
+
393
+ The parser is called by the inbox processor when it detects a voice transcript message in Slack. It can also be used standalone:
394
+
395
+ ```bash
396
+ # Test with a transcript file
397
+ node scripts/parse-voice-transcript.mjs < transcript.txt
398
+
399
+ # Run the test suite
400
+ node scripts/test-voice-parser.mjs
401
+ ```
402
+
403
+ ### 6.3 Updating Known Callers
404
+
405
+ The parser has a hardcoded `KNOWN_CALLERS` map at the top of the file. When adding a new contact, update both:
406
+
407
+ 1. `config/caller-id-map.yaml` — for SMS handler and RAG access
408
+ 2. `scripts/parse-voice-transcript.mjs` → `KNOWN_CALLERS` — for transcript caller identification
409
+
410
+ ---
411
+
412
+ ## 7. Process Management
413
+
414
+ In production, the SMS handler and huddle server run as persistent background processes alongside the poller and daemon.
415
+
416
+ ### 7.1 launchd Plists
417
+
418
+ Create launchd agents for persistent process management. These ensure the services restart automatically after crashes or reboots.
419
+
420
+ **SMS Handler** (`~/Library/LaunchAgents/com.adaptic.<agent>.sms-handler.plist`):
421
+
422
+ ```xml
423
+ <?xml version="1.0" encoding="UTF-8"?>
424
+ <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
425
+ "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
426
+ <plist version="1.0">
427
+ <dict>
428
+ <key>Label</key>
429
+ <string>com.adaptic.AGENT_NAME.sms-handler</string>
430
+ <key>ProgramArguments</key>
431
+ <array>
432
+ <string>/opt/homebrew/bin/node</string>
433
+ <string>scripts/sms-handler.mjs</string>
434
+ </array>
435
+ <key>WorkingDirectory</key>
436
+ <string>/Users/AGENT_USER/AGENT_REPO</string>
437
+ <key>RunAtLoad</key>
438
+ <true/>
439
+ <key>KeepAlive</key>
440
+ <true/>
441
+ <key>StandardOutPath</key>
442
+ <string>/Users/AGENT_USER/AGENT_REPO/logs/sms-handler-stdout.log</string>
443
+ <key>StandardErrorPath</key>
444
+ <string>/Users/AGENT_USER/AGENT_REPO/logs/sms-handler-stderr.log</string>
445
+ <key>EnvironmentVariables</key>
446
+ <dict>
447
+ <key>PATH</key>
448
+ <string>/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin</string>
449
+ </dict>
450
+ <key>ThrottleInterval</key>
451
+ <integer>10</integer>
452
+ </dict>
453
+ </plist>
454
+ ```
455
+
456
+ **Huddle Server** (`~/Library/LaunchAgents/com.adaptic.<agent>.huddle-server.plist`):
457
+
458
+ ```xml
459
+ <?xml version="1.0" encoding="UTF-8"?>
460
+ <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
461
+ "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
462
+ <plist version="1.0">
463
+ <dict>
464
+ <key>Label</key>
465
+ <string>com.adaptic.AGENT_NAME.huddle-server</string>
466
+ <key>ProgramArguments</key>
467
+ <array>
468
+ <string>/opt/homebrew/bin/node</string>
469
+ <string>scripts/huddle/huddle-server.mjs</string>
470
+ </array>
471
+ <key>WorkingDirectory</key>
472
+ <string>/Users/AGENT_USER/AGENT_REPO</string>
473
+ <key>RunAtLoad</key>
474
+ <true/>
475
+ <key>KeepAlive</key>
476
+ <true/>
477
+ <key>StandardOutPath</key>
478
+ <string>/Users/AGENT_USER/AGENT_REPO/logs/huddle-server-stdout.log</string>
479
+ <key>StandardErrorPath</key>
480
+ <string>/Users/AGENT_USER/AGENT_REPO/logs/huddle-server-stderr.log</string>
481
+ <key>EnvironmentVariables</key>
482
+ <dict>
483
+ <key>PATH</key>
484
+ <string>/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin</string>
485
+ </dict>
486
+ <key>ThrottleInterval</key>
487
+ <integer>10</integer>
488
+ </dict>
489
+ </plist>
490
+ ```
491
+
492
+ Replace `AGENT_NAME`, `AGENT_USER`, and `AGENT_REPO` with your agent's values.
493
+
494
+ Load the plists:
495
+
496
+ ```bash
497
+ launchctl load ~/Library/LaunchAgents/com.adaptic.AGENT_NAME.sms-handler.plist
498
+ launchctl load ~/Library/LaunchAgents/com.adaptic.AGENT_NAME.huddle-server.plist
499
+ ```
500
+
501
+ ### 7.2 Cloudflare Tunnel Persistence
502
+
503
+ For production, set up a named Cloudflare tunnel so the webhook URL doesn't change on restart:
504
+
505
+ ```bash
506
+ # Authenticate (one-time)
507
+ cloudflared tunnel login
508
+
509
+ # Create a named tunnel
510
+ cloudflared tunnel create agent-webhooks
511
+
512
+ # Configure routes in ~/.cloudflared/config.yml
513
+ cat > ~/.cloudflared/config.yml << 'EOF'
514
+ tunnel: <TUNNEL_ID>
515
+ credentials-file: /Users/AGENT_USER/.cloudflared/<TUNNEL_ID>.json
516
+
517
+ ingress:
518
+ - hostname: sms.agent.yourdomain.com
519
+ service: http://localhost:3001
520
+ - hostname: whatsapp.agent.yourdomain.com
521
+ service: http://localhost:3002
522
+ - service: http_status:404
523
+ EOF
524
+
525
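+ # Point each hostname at the tunnel (creates the DNS records)
+ cloudflared tunnel route dns agent-webhooks sms.agent.yourdomain.com
+ cloudflared tunnel route dns agent-webhooks whatsapp.agent.yourdomain.com
+
+ # Run the tunnel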
526
+ cloudflared tunnel run agent-webhooks
527
+ ```
528
+
529
+ Create a launchd plist for the tunnel as well, following the same pattern as the SMS handler plist.
530
+
531
+ ---
532
+
533
+ ## 8. Testing
534
+
535
+ ### 8.1 SMS End-to-End Test
536
+
537
+ | # | Test | How to Verify |
538
+ |---|---------------------------------|-------------------------------------------------------|
539
+ | 1 | Send SMS to agent's number | Check `state/inbox/sms/` for new YAML file |
540
+ | 2 | Verify caller ID lookup | YAML file should contain correct sender name + level |
541
+ | 3 | Verify CEO priority detection | CEO messages should create priority trigger file |
542
+ | 4 | Send outbound SMS | `./scripts/send-sms.sh --to "+1..." --body "test"` |
543
+ | 5 | Verify dedup | Send identical outbound twice quickly; second skips |
544
+ | 6 | Verify audit log | Check `logs/audit/YYYY-MM-DD-actions.jsonl` |
545
+ | 7 | Unknown sender handling | SMS from unknown number should get `default` access |
546
+
547
+ ### 8.2 Voice End-to-End Test
548
+
549
+ | # | Test | How to Verify |
550
+ |---|---------------------------------|-------------------------------------------------------|
551
+ | 1 | Audio setup | `./scripts/huddle/setup-audio.sh --check` |
552
+ | 2 | Audio capture/playback | `./scripts/huddle/setup-audio.sh --test` |
553
+ | 3 | CDP connection | Huddle server logs "CDP connected" on start |
554
+ | 4 | Start or join a huddle          | `node scripts/huddle/huddle-server.mjs --call <user>` (or `--join <channel>`) |
555
+ | 5 | STT working | Server logs transcribed speech within ~2 seconds |
556
+ | 6 | TTS working | Agent's voice is audible in the huddle |
557
+ | 7 | Transcript saved | Check `logs/huddle/` for transcript file |
558
+ | 8 | Action items extracted | Run parser on transcript; verify structured output |
559
+
560
+ ### 8.3 Voice Transcript Parser Test
561
+
562
+ ```bash
563
+ node scripts/test-voice-parser.mjs
564
+ ```
565
+
566
+ This runs the parser against sample transcripts and validates extraction accuracy.
567
+
568
+ ---
569
+
570
+ ## 9. Troubleshooting
571
+
572
+ ### SMS handler not receiving messages
573
+
574
+ 1. **Check tunnel is running**: `curl https://your-tunnel-url/health`
575
+ 2. **Check Twilio webhook config**: Twilio Console → Phone Numbers → your number → Messaging → webhook URL must match tunnel
576
+ 3. **Check geo-permissions**: Twilio Console → Messaging → Settings → Geo Permissions → sender's country enabled
577
+ 4. **Check handler is listening**: `curl http://localhost:3001/health`
578
+ 5. **Check logs**: `tail -f logs/sms-handler-stderr.log`
579
+
580
+ ### Outbound SMS fails
581
+
582
+ 1. **Check Twilio credentials**: `echo "$TWILIO_ACCOUNT_SID"` should print a value starting with `AC`
583
+ 2. **Check phone number format**: Must be E.164 (`+` prefix, country code, no spaces)
584
+ 3. **Check Twilio balance**: Low balance prevents sends
585
+ 4. **Check dedup**: If you see `DEDUP_SKIP`, the same message was recently sent
586
+
587
+ ### Audio devices not found
588
+
589
+ 1. **Run setup**: `./scripts/huddle/setup-audio.sh`
590
+ 2. **Restart after install**: BlackHole may require a system restart after first install
591
+ 3. **Verify devices**: `SwitchAudioSource -a` should list both BlackHole 2ch and 16ch
592
+ 4. **Check Slack audio settings**: Slack Preferences → Audio & Video → verify speaker = BlackHole 2ch, mic = BlackHole 16ch
593
+
594
+ ### Huddle server can't connect to Slack
595
+
596
+ 1. **Check Slack is running with CDP**: `curl http://localhost:9222/json/version` should return Slack version info
597
+ 2. **Relaunch Slack**: `./scripts/huddle/launch-slack.sh`
598
+ 3. **Check CDP port**: Ensure no other Electron app (VS Code, etc.) is claiming port 9222; `lsof -i :9222` shows which process is listening
599
+
600
+ ### Echo in huddles (agent hears itself)
601
+
602
+ 1. **Switch capture mode**: Set `HUDDLE_CAPTURE_MODE=webaudio` in `.env` (this bypasses BlackHole for inbound capture, using CDP-injected audio capture instead)
603
+ 2. **Verify audio routing**: BlackHole 2ch should be Slack's speaker only; system speaker should remain on built-in
604
+
605
+ ### Deepgram not transcribing
606
+
607
+ 1. **Check API key**: Verify `DEEPGRAM_API_KEY` is set and valid
608
+ 2. **Check audio signal**: `./scripts/huddle/setup-audio.sh --test` should show non-zero bytes captured
609
+ 3. **Check WebSocket**: Audio bridge logs should show "Deepgram connected"
610
+ 4. **Check model**: Default is `nova-2` — verify your Deepgram plan supports it
611
+
612
+ ### ElevenLabs TTS not working
613
+
614
+ 1. **Check API key**: Verify `ELEVENLABS_API_KEY` is set and valid
615
+ 2. **Check voice ID**: Default voice and model are configured in audio-bridge.mjs — verify they match your ElevenLabs account
616
+ 3. **Check playback device**: `sox -n -t coreaudio "BlackHole 16ch" synth 1 sine 440` should play without error
617
+ 4. **Check quota**: Free tier has limited characters per month
618
+
619
+ ---
620
+
621
+ ## 10. Security Considerations
622
+
623
+ ### Caller Authentication
624
+
625
+ - All inbound SMS senders are looked up in `caller-id-map.yaml` before processing
626
+ - Unknown numbers receive `default` (most restrictive) access level
627
+ - The agent never shares internal state, logs, or memory with unknown callers
628
+ - Phone number spoofing is possible — do not treat SMS as a secure authentication channel for high-risk actions
629
+
630
+ ### Twilio Request Validation
631
+
632
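+ When `TWILIO_ACCOUNT_SID` and `TWILIO_AUTH_TOKEN` are set, the SMS handler can validate that incoming webhooks genuinely come from Twilio by checking the `X-Twilio-Signature` header, an HMAC-SHA1 of the request URL and POST parameters keyed with the auth token. The signature covers the public webhook URL, so validation must use the tunnel URL configured in Twilio rather than `localhost`. This prevents third parties from injecting fake messages by POSTing to your webhook URL.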
633
+
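For reference, Twilio's signing scheme for form-encoded webhooks can be reproduced in a few lines of shell. This is a sketch for debugging, not the handler's implementation: the function names are hypothetical, it requires `openssl`, it does not handle parameter values that contain newlines, and the string comparison is not constant-time.

```bash
# Sketch of Twilio's signing scheme for form-encoded webhooks.
# Function names are hypothetical; requires openssl.
# Usage: twilio_signature AUTH_TOKEN PUBLIC_URL "Name=Value" ["Name=Value" ...]
twilio_signature() {
  local token="$1" url="$2" params
  shift 2
  # Sort by parameter name (byte order), then join each name and value with
  # no separators. Limitation: values containing newlines are not handled.
  params="$(printf '%s\n' "$@" | LC_ALL=C sort -t= -k1,1 |
    awk '{ i = index($0, "="); printf "%s%s", substr($0, 1, i - 1), substr($0, i + 1) }')"
  # HMAC-SHA1 over URL + parameters, keyed with the auth token, then Base64.
  printf '%s%s' "$url" "$params" |
    openssl dgst -sha1 -hmac "$token" -binary | openssl base64
}

# Succeeds if the X-Twilio-Signature header value ($1) matches.
# Note: a plain string comparison is not constant-time.
twilio_signature_valid() {
  local header="$1"
  shift
  [ "$(twilio_signature "$@")" = "$header" ]
}
```

When debugging a rejected webhook, compare the output of `twilio_signature "$TWILIO_AUTH_TOKEN" "https://sms.agent.yourdomain.com/sms" "From=..." "To=..." "Body=..."` (with every parameter Twilio sent) against the `X-Twilio-Signature` header; a mismatch usually means the URL or the parameter set differs from what Twilio signed.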
634
+ ### Tunnel Security
635
+
636
+ - Quick tunnels (random URLs) provide security-through-obscurity but the URL can leak
637
+ - Named tunnels with Cloudflare Access provide proper authentication
638
+ - Consider adding IP allowlisting in Cloudflare if your Twilio traffic comes from known IPs
639
+
640
+ ### Voice Data
641
+
642
+ - Audio is processed in real time and not stored permanently (only transcripts are kept)
643
+ - Deepgram processes audio on their servers — review their [data handling policy](https://deepgram.com/privacy)
644
+ - ElevenLabs processes text on their servers — review their [privacy policy](https://elevenlabs.io/privacy)
645
+ - All voice transcripts are logged locally in `logs/huddle/`
646
+
647
+ ---
648
+
649
+ ## Quick Reference
650
+
651
+ ### Ports
652
+
653
+ | Service | Default Port | Env Variable |
654
+ |-------------------|-------------|-------------------------|
655
+ | SMS Handler | 3001 | `SMS_PORT` |
656
+ | WhatsApp Handler | 3002 | `WHATSAPP_PORT` |
657
+ | Huddle Events API | 3200 | `HUDDLE_EVENTS_PORT` |
658
+ | Audio Capture WS | 3201 | `HUDDLE_CAPTURE_WS_PORT`|
659
+ | Slack CDP | 9222 | (launch flag) |
660
+
661
+ ### Environment Variables (Voice/SMS)
662
+
663
+ | Variable | Required For | Description |
664
+ |---------------------------|----------------|------------------------------------------|
665
+ | `TWILIO_ACCOUNT_SID` | SMS | Twilio account identifier |
666
+ | `TWILIO_AUTH_TOKEN` | SMS | Twilio authentication token |
667
+ | `TWILIO_PHONE_NUMBER` | SMS | Agent's Twilio phone number (E.164) |
668
+ | `TWILIO_PHONE_SID` | SMS (optional) | Phone number SID for advanced features |
669
+ | `DEEPGRAM_API_KEY` | Voice | Deepgram STT API key |
670
+ | `ELEVENLABS_API_KEY` | Voice | ElevenLabs TTS API key |
671
+ | `SMS_PORT` | SMS (optional) | Override default SMS handler port |
672
+ | `HUDDLE_CAPTURE_MODE` | Voice (optional)| `webaudio` (default) or `sox` |
673
+ | `HUDDLE_USE_API` | Voice (optional)| Set `1` to use Anthropic API vs CLI |
674
+
675
+ ### Key Files
676
+
677
+ | File | Purpose |
678
+ |--------------------------------------|-------------------------------------------------|
679
+ | `scripts/sms-handler.mjs` | Inbound SMS webhook handler |
680
+ | `scripts/send-sms.sh` | Outbound SMS sender with dedup |
681
+ | `scripts/huddle/setup-audio.sh` | Audio device installation and verification |
682
+ | `scripts/huddle/launch-slack.sh` | Launch Slack with CDP enabled |
683
+ | `scripts/huddle/start-call.mjs` | Initiate a Slack huddle with a contact |
684
+ | `scripts/huddle/huddle-server.mjs` | Main huddle orchestrator |
685
+ | `scripts/huddle/huddle-controller.mjs` | Slack CDP automation for huddle UI |
686
+ | `scripts/huddle/audio-bridge.mjs` | Bidirectional STT/TTS audio pipeline |
687
+ | `scripts/parse-voice-transcript.mjs`| Extract action items from call transcripts |
688
+ | `config/caller-id-map.yaml` | Phone → identity mapping with access levels |
689
+
690
+ ---
691
+
692
+ ## Related Documents
693
+
694
+ - [Mac Mini Bootstrap](../runbooks/mac-mini-bootstrap.md) — Full machine setup
695
+ - [Agent Persona Setup](agent-persona-setup.md) — Identity and configuration
696
+ - [Perpetual Operations](../runbooks/perpetual-operations.md) — How the system runs 24/7
697
+ - [Communications Policy](../governance/communications-policy.md) — Voice modes and approval rules
698
+ - [System Architecture](../architecture/system-architecture.md) — Overall system design