agentvibes 4.6.3 → 4.6.6

package/README.md CHANGED
@@ -1,2121 +1,2162 @@
1
- # 🎤 AgentVibes
2
-
3
- > **Finally! Your agents can talk back!**
4
- >
5
- > 🌐 **[agentvibes.org](https://agentvibes.org)**
6
- >
7
- > Professional text-to-speech for **Claude Code**, **Claude Desktop**, and **OpenClaw** - **Soprano** (Neural), **Piper TTS** (Free!), **macOS Say** (Built-in!), or **Windows SAPI** (Zero Setup!)
8
-
9
- [![npm version](https://img.shields.io/npm/v/agentvibes)](https://www.npmjs.com/package/agentvibes)
10
- [![Test Suite](https://github.com/paulpreibisch/AgentVibes/actions/workflows/test.yml/badge.svg)](https://github.com/paulpreibisch/AgentVibes/actions/workflows/test.yml)
11
- [![Publish](https://github.com/paulpreibisch/AgentVibes/actions/workflows/publish.yml/badge.svg)](https://github.com/paulpreibisch/AgentVibes/actions/workflows/publish.yml)
12
- [![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
13
-
14
- **Author**: Paul Preibisch ([@997Fire](https://x.com/997Fire)) | **Version**: v4.6.2
15
-
16
- ---
17
-
18
- ## 🚀 Quick Links
19
-
20
- | I want to... | Go here |
21
- |--------------|---------|
22
- | **Install AgentVibes** (just `npx`, no git!) | [Quick Start Guide](docs/quick-start.md) |
23
- | **Run Claude Code on Android** | [Android/Termux Setup](#-android--termux) |
24
- | **Secure OpenClaw on Remote Server** | [Security Hardening Guide](docs/security-hardening-guide.md) ⚠️ |
25
- | **Understand what I need** | [Prerequisites](#-prerequisites) |
26
- | **Set up on Windows (Native)** | [Windows Native Setup](WINDOWS-SETUP.md) |
27
- | **Set up on Windows (Claude Desktop/WSL)** | [Windows WSL Guide](mcp-server/WINDOWS_SETUP.md) |
28
- | **Use with OpenClaw** | [OpenClaw Integration](#-openclaw-integration) |
29
- | **Use natural language** | [MCP Setup](docs/mcp-setup.md) |
30
- | **Switch voices** | [Voice Library](docs/voice-library.md) |
31
- | **Fix issues** (git-lfs? MCP tokens? Read this!) | [Troubleshooting](docs/troubleshooting.md) & [FAQ](#-frequently-asked-questions-faq) |
32
-
33
- ---
34
-
35
- ## ✨ What is AgentVibes?
36
-
37
- **AgentVibes adds lively voice narration to your Claude AI sessions!**
38
-
39
- Whether you're coding in Claude Code, chatting in Claude Desktop, or running OpenClaw — AgentVibes brings AI to life with professional voices and personalities.
40
-
41
- ---
42
-
43
- ## 🐛 NEW IN v4.6.2 — Party Mode Voices, LibriTTS Speaker Fix, Agent Pretext
44
-
45
- - **Party mode agents now speak in their unique voices** — SKILL.md wired to `bmad-speak.ps1` per agent
46
- - **LibriTTS speaker IDs resolved correctly** - `Holly-7` is speaker 322, not 7
47
- - **Agent pretext spoken on Windows** — "Mary, Business Analyst here." before every response
48
- - **`parseMultiSpeaker` fallback** — works on fresh installs before `.onnx.json` is patched
49
-
50
- ---
51
-
52
- ## 🌟 NEW IN v4.6.1 - Party Mode Voice Clarity + Agent Config UI Polish
53
-
54
- ### 🔊 Voice Volume Fixed in Party Mode
55
-
56
- - **`normalize=0`** added to ffmpeg `amix` — prevents voices being silenced to 50% when mixed with background music
57
- - **Voice boost `volume=1.5`** applied to every TTS stream - agents are now loud and clear
58
- - **Music intro reduced to 1 second** (`adelay=1000`) — less dead air before each agent speaks
59
- - **Pre-synthesis gap reduction** — WAV files are generated *before* acquiring the mutex, so synthesis overlaps with the previous agent's playback (gap drops from ~4–6s to ~1s)
60
-
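As a rough illustration of how the three ffmpeg settings above could fit together, here is a minimal sketch that builds a `-filter_complex` string. The helper name `buildPartyMixFilter`, the stream labels, the choice to delay the voice stream, the `duration=first` setting, and the 0.2 music level are all assumptions made for this example and are not taken from the AgentVibes source.

```javascript
// Hypothetical helper (not from the AgentVibes codebase): builds an ffmpeg
// -filter_complex string mixing a TTS voice (input 0) with music (input 1).
function buildPartyMixFilter({ voiceBoost = 1.5, musicVolume = 0.2, introMs = 1000 } = {}) {
  // Boost the voice, then delay it so the music plays alone for introMs.
  // all=1 applies the same delay to every channel of the voice stream.
  const voice = `[0:a]volume=${voiceBoost},adelay=${introMs}:all=1[voice]`;
  // Assumed music level; the README only says defaults were lowered to 20%.
  const music = `[1:a]volume=${musicVolume}[music]`;
  // normalize=0 stops amix from scaling each input down by the input count.
  const mix = '[voice][music]amix=inputs=2:duration=first:normalize=0[out]';
  return [voice, music, mix].join(';');
}

console.log(buildPartyMixFilter());
```

The resulting string would be passed to ffmpeg together with `-map "[out]"`; the exact invocation used by the project may differ.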
61
- ### 🎛️ BMAD Agent Config — Preview + Split Fields
62
-
63
- - **Music Track** and **Music Vol** are now separate fields in the agent editor — each opens its own dialog
64
- - **Preview button** plays the selected voice with full effects: personality, reverb, background music track and volume
65
- - **Blinking indicator** (`►█`) highlights the focused button — reuses the shared `attachBtnBlink` utility
66
- - **Preview spinner** animates while audio is playing
67
- - **Tab→Save hint** shown in the volume input dialog
68
-
69
- ### 🚻 Voice Gender Auto-Assign Fixed
70
-
71
- - `inferGender` now strips the numeric suffix from LibriTTS speaker names (e.g. `anna-9` → `anna`) before looking up gender
72
- - Expanded `GENDER_MAP` with 60+ first names covering all bundled voices
73
- - `libritts` blanket-male override removed - LibriTTS voices are now inferred per-name
74
-
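The suffix-stripping lookup described above can be sketched as follows. Only the names `inferGender` and `GENDER_MAP` come from the release notes; the map contents, the `'unknown'` fallback, and the exact regular expression are assumptions for illustration.

```javascript
// Illustrative subset only; the real GENDER_MAP is described as having 60+ names.
const GENDER_MAP = new Map([
  ['anna', 'female'],
  ['holly', 'female'],
  ['ryan', 'male'],
]);

// Sketch of the described behavior, not the project's actual implementation.
function inferGender(voiceName) {
  // Drop a trailing "-<digits>" speaker suffix: "anna-9" -> "anna".
  const base = voiceName.toLowerCase().replace(/-\d+$/, '');
  // Assumed fallback when the name is not in the map.
  return GENDER_MAP.get(base) ?? 'unknown';
}

console.log(inferGender('anna-9'));
```

Using a `Map` rather than a plain object avoids accidental hits on inherited keys such as `constructor`.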
75
- ### 🐛 Other Fixes
76
-
77
- - Volume dialog text now uses `cyan`/`white` — no more invisible-on-dark-background instructions
78
- - After saving agent settings, focus correctly returns to the agent list (Enter re-opens the agent)
79
- - Boundary navigation in agent fields no longer jumps to buttons prematurely
80
-
81
- ---
82
-
83
- ## 🌟 NEW IN v4.6 - Party Mode Auto-Install + Volume Fix
84
-
85
- ### 🎉 BMAD Party Mode TTS - Zero Setup
86
-
87
- Every agent now speaks automatically in any BMAD project — no manual hook configuration needed:
88
-
89
- - Installer copies `bmad-party-speak.sh` (Linux/macOS/WSL) or `bmad-party-speak.ps1` (Windows) to `~/.claude/hooks/`
90
- - `PostToolUse` hook registered in `~/.claude/settings.json` automatically
91
- - `npx agentvibes update` keeps the scripts fresh across all platforms
92
-
93
- ### 🔊 Background Music Volume Default: 20%
94
-
95
- All volume defaults lowered from 70% to 20% — new installs and agents start at a sensible level. `bmad-speak` scripts now inherit the global volume setting instead of ignoring it.
96
-
97
- ### 🐛 Installer Navigation Fix
98
-
99
- Pressing on the completion screen no longer jumps back to the installation step.
100
-
101
- ### 🧪 628 Tests, Zero Failures
102
-
103
- ---
104
-
105
- ## 🌟 v4.5 "Speak Every Language" Release
106
-
107
- ### 🌍 Multilingual TUI — 9 Languages
108
-
109
- Every screen, button, and label in `npx agentvibes` is now fully translated:
110
-
111
- - **English, Spanish, French, German, Portuguese, Japanese, Korean, Chinese (Simplified), Italian**
112
- - Language selection on first launch - pick your language before anything else
113
- - Language sub-tab in Settings — switch live, no restart needed
114
- - All tab labels, buttons, footer hints, status messages, and BMAD/Receiver tabs translated
115
- - Per-language i18n files (`src/i18n/en.js`, `es.js`, `fr.js`, ...) with English fallback
116
-
117
- ### 🪟 Windows Security Hardening
118
-
119
- - **Unpredictable temp files** — `randomUUID()` replaces `Date.now()` in all temp filenames (JS + PowerShell)
120
- - **No shell injection** - `spawnSync` replaces `execSync(..., { shell: true })` for `which` lookups
121
- - **Smart music player detection** — `detectMp3Player()` replaces hardcoded `ffplay` on Windows
122
- - **Boolean fix** — `isWindowsTerminal` now returns `true/false`, not the `WT_SESSION` UUID string
123
-
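The hardening items above map onto standard Node.js APIs. The sketch below shows one plausible shape for each; the function names `tempAudioPath` and `commandExists`, the `agentvibes-` filename prefix, and the use of `where` on Windows are assumptions for this example (only `isWindowsTerminal`, `randomUUID()`, `spawnSync`, and `WT_SESSION` are named in the release notes).

```javascript
const { randomUUID } = require('node:crypto');
const { spawnSync } = require('node:child_process');
const os = require('node:os');
const path = require('node:path');

// Unpredictable temp filename: a UUID cannot be guessed the way a timestamp can.
function tempAudioPath(ext = '.wav') {
  return path.join(os.tmpdir(), `agentvibes-${randomUUID()}${ext}`);
}

// Look up a command without a shell: the name is passed as a single argv
// entry, so shell metacharacters in it are never interpreted.
function commandExists(name) {
  const finder = process.platform === 'win32' ? 'where' : 'which';
  const result = spawnSync(finder, [name], { stdio: 'ignore' });
  return result.status === 0;
}

// Coerce WT_SESSION (a UUID string when set) into a real boolean.
function isWindowsTerminal(env = process.env) {
  return Boolean(env.WT_SESSION);
}

console.log(tempAudioPath(), isWindowsTerminal());
```

Because `spawnSync` receives the arguments as an array, a name such as `ffplay; rm -rf ~` is treated as one (nonexistent) command name rather than two shell commands.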
124
- ### 🎙️ Cross-Platform BMAD Speak
125
-
126
- BMAD (Build More Architect Dreams) is an AI multi-agent framework where specialized agents — Architect, PM, Developer, QA, and Analyst — collaborate to build software. With this release, every agent in a BMAD party mode session now speaks aloud with their own unique voice, personality, and music on Windows — making each role instantly recognizable.
127
-
128
- - `bmad-speak.js` - cross-platform entry point; auto-routes to PowerShell on Windows or bash on Mac/Linux
129
- - `bmad-speak.ps1` — native Windows BMAD speak with per-agent personality routing
130
-
131
- ### 🧪 600 Tests, Zero Failures
132
-
133
- ---
134
-
135
- ## 🌟 v4.4 — Full Platform Parity Release
136
-
137
- ### 🪟 Windows MCP Parity — 27/27 Tools Working
138
-
139
- All MCP tools now work natively on Windows. Previously 12 tools silently failed due to missing scripts:
140
-
141
- - **6 new PowerShell scripts** - personality-manager, speed-manager, language-manager, learn-manager, verbosity-manager, clean-audio-cache
142
- - **Unified provider naming** - `piper` and `sapi` on all platforms (no more `windows-piper`/`windows-sapi`)
143
- - **replay command** added to voice-manager for Windows
144
- - **Adversarial review** - 24 issues found, 10 fixed (3 CRITICAL, 4 HIGH, 3 MEDIUM)
145
- - **28 new tests** covering script parity, effects round-trip, provider management, and naming consistency
146
- - **Feature-platform matrix** — [docs/feature-platform-matrix.md](docs/feature-platform-matrix.md) tracks all 85 features across Linux, macOS, Windows, and WSL
147
-
148
- ### Bug Fixes (HIGH)
149
- - ffmpeg stderr redirected to temp file instead of literal `"NUL"` file
150
- - `AGENTVIBES_NO_PLAY` env var properly cleaned up on error paths
151
- - `PIPER_SPEAKER` env var no longer leaks between voice switches
152
- - Provider config now uses project-local `.claude` (not always global)
153
- - Text sanitization relaxed - `$50 (USD)` no longer becomes `50 USD`
154
-
155
- ---
156
-
157
- ## 🌟 v4.3 — Windows Parity + BMAD Party Mode
158
-
159
- ### 🎭 BMAD Party Mode — Every Agent Has Its Own Voice
160
-
161
- The BMad Method (Build More Architect Dreams) is an AI-driven development framework that helps you build software from ideation through agentic implementation with specialized AI agents, guided workflows, and intelligent planning that adapts to your project's complexity.
162
-
163
- **Every BMAD agent now speaks with their own unique voice, music, and personality.**
164
-
165
- When party mode runs a multi-agent discussion, the Architect, PM, Developer, QA, and Analyst each sound completely different — making every role immediately recognizable.
166
-
167
- **Auto-enabled** — if BMAD is installed, party mode activates automatically. Open the BMad Tab to configure each agent:
168
-
169
- ```bash
170
- npx agentvibes # Press B to open the BMad Tab
171
- ```
172
-
173
- **Per-agent configuration:**
174
- - 🎙️ **Voice** — 914 voices to choose from, auto-assigned gender-aware
175
- - 🎵 **Background Music** — Unique ambient track per agent (cinematic, lo-fi, jazz...)
176
- - 🎚️ **Music Volume** - Per-agent level, or set all at once via Bulk Edit
176
- - 🎛️ **Reverb** - none / room / hall / cathedral / studio per agent
177
- - 💬 **Pretext** - Custom intro phrase ("Winston says:..." before every line)
178
- - 🎭 **Personality** - sarcastic, dramatic, pirate, cheerful, and more
179
- - 🔇 **No Overlap** - Speech lock ensures agents never talk over each other
181
- - ✨ **Markdown-Clean** — Asterisks and formatting stripped before TTS
182
-
183
- ### 🎛️ BMad Tab — Visual Agent Configurator
184
-
185
- The `npx agentvibes` TUI now includes a full **BMad Tab** for managing every agent visually — inspired by the Voices tab, with the same columns and navigation polish:
186
-
187
- ```bash
188
- npx agentvibes # Press B for BMad Tab
189
- ```
190
-
191
- | Agent | Voice | Gender | Provider | Reverb | Music | Vol | Pretext |
192
- |-------|-------|--------|----------|--------|-------|-----|---------|
193
- | 🏢 Winston | Rose Ibex | Female | Piper (LibriTTS) | studio | jazz | 65% | Winston says |
194
- | 🧠 Larry | Kusal | Male | Piper | hall | cinematic | 80% | Larry says |
195
-
196
- **Highlights:**
197
- - **Beautified voice names** - `16Speakers::Rose_Ibex` shows as `Rose Ibex`; `en_US-kusal-medium` shows as `Kusal`
198
- - **Gender & Provider columns** — see voice metadata at a glance, just like the Voices tab
199
- - **Inline row hints** — navigate to any agent and see `[Space] Preview [Enter] Configure` on the row itself
200
- - **Preview spinner** — animated `⠋⠙⠹⠸` braille spinner while audio plays
201
-
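The two name transformations quoted in the highlights can be reproduced with a small function. This is a sketch inferred only from those two examples; the name `beautifyVoiceName` and the handling of any other ID shape are assumptions.

```javascript
// Hypothetical helper inferred from the two README examples only.
function beautifyVoiceName(id) {
  // Multi-speaker form "Model::Speaker_Name": keep the speaker, underscores -> spaces.
  if (id.includes('::')) {
    return id.split('::').pop().replace(/_/g, ' ');
  }
  // Piper form "<locale>-<name>-<quality>": keep the name and capitalize it.
  const match = /^[a-z]{2}_[A-Z]{2}-(.+)-[a-z_]+$/.exec(id);
  if (match) {
    const name = match[1].replace(/_/g, ' ');
    return name.charAt(0).toUpperCase() + name.slice(1);
  }
  return id; // Unknown shape: leave untouched.
}

console.log(beautifyVoiceName('16Speakers::Rose_Ibex'));
console.log(beautifyVoiceName('en_US-kusal-medium'));
```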
202
- | Key | Action |
203
- |-----|--------|
204
- | `↑↓` / `jk` | Navigate agents |
205
- | `Space` | Preview agent (spinner shows while playing) |
206
- | `Enter` | Configure voice, music, volume, reverb, personality, pretext |
207
- | `A` | Auto-assign unique voices (gender-aware, no repeats) |
208
- | `B` | Bulk Edit - set music / volume / pretext / reverb for all agents |
209
- | `X` | Reset agent to defaults |
210
-
211
- ---
212
-
213
- ### 🖥️ SSH Receiver — Hear Your Headless Server
214
-
215
- **Run Claude on a cloud box and hear the TTS on your local machine.**
216
-
217
- The new **Receiver Tab** streams TTS audio from voiceless remote servers to your local machine over TCP — perfect for AWS/GCP dev boxes, WSL2, and SSH sessions.
218
-
219
- ```bash
220
- # On your local machine: open the TUI, go to the Receiver tab, click Start
221
- npx agentvibes
222
-
223
- # On the remote server — AgentVibes auto-detects the receiver and streams
224
- ```
225
-
226
- Zero-config forwarding. Works with Piper, macOS Say, and Soprano.
227
-
228
- ---
229
-
230
- ### ⚡ TTS Latency Reduced by ~1 Second
231
-
232
- - **Batched Node.js calls** - 6 separate profile reads collapsed into 1 (~900ms saved)
232
- - **inotifywait queue** - file-event-based worker, no polling delay
233
- - **Background cache cleanup** - off the critical path, every 10th call
235
-
236
- ---
237
-
238
- ### 🎨 ANSI Banner Colors + Toggle
239
-
240
- Full color in the TTS banner (gold voice, cyan reverb, traffic-light cache). Hide it without muting:
241
-
242
- ```bash
243
- touch ~/.agentvibes/banner-disabled # or say "turn off the TTS banner"
244
- ```
245
-
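The toggle above is a marker file, so checking it from Node.js only needs an existence test. A minimal sketch, assuming the banner is hidden whenever the file exists regardless of its contents (the function name `bannerDisabled` is hypothetical):

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Marker path taken from the README; its presence is assumed to hide the banner.
function bannerDisabled(home = os.homedir()) {
  return fs.existsSync(path.join(home, '.agentvibes', 'banner-disabled'));
}

console.log(bannerDisabled() ? 'banner hidden' : 'banner shown');
```

Deleting the file (`rm ~/.agentvibes/banner-disabled`) would presumably bring the banner back, though the README does not state this explicitly.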
246
- ---
247
-
248
- ### 💬 Intro Text (Pretext) - Your Personal AI Branding
249
-
250
- **Add custom prefixes to every TTS announcement!**
251
-
252
- Configure via the AgentVibes TUI Settings tab:
253
-
254
- ```bash
255
- npx agentvibes # Navigate to Settings tab
256
- ```
257
-
258
- Transform generic AI responses into your personal brand:
259
-
260
- **Before:**
261
- ```
262
- "Starting analysis of the codebase..."
263
- ```
264
-
265
- **After (with "FireBot: " intro text):**
266
- ```
267
- "FireBot: Starting analysis of the codebase..."
268
- ```
269
-
270
- **Perfect for:**
271
- - 🤖 **Personal AI Branding** - Make Claude sound like your custom assistant
272
- - 🏢 **Team Identity** - Company bots with branded voices
273
- - 🎮 **Character Roleplay** - Gaming assistants with character names
274
- - 🎓 **Teaching Contexts** - Professor Bot, Tutor AI, etc.
275
-
276
- **Features:**
277
- - Up to 50 characters
278
- - UTF-8 and emoji support 🎉
279
- - Set during installation or anytime after
280
- - Works with all TTS providers
281
- - Applies to every single announcement
282
-
283
- **Examples:**
284
- - `"JARVIS: "` - Iron Man style
285
- - `"🤖 Assistant: "` - With emoji
286
- - `"CodeBot: "` - Development assistant
287
- - `"Chef AI: "` - Cooking helper
288
-
289
- Configure via: `npx agentvibes` → Settings tab
290
-
291
- ---
292
-
293
- ### 🎵 Custom Background Music - Complete Audio Control
294
-
295
- **Upload your own background music with battle-tested security!**
296
-
297
- Configure via the AgentVibes TUI Music tab:
298
-
299
- ```bash
300
- npx agentvibes # Navigate to Music tab
301
- ```
302
-
303
- Replace the default background tracks with your own audio files.
304
-
305
- **Supported Formats:**
306
- - 🎵 MP3 (.mp3)
307
- - 🎵 WAV (.wav)
308
- - 🎵 OGG (.ogg)
309
- - 🎵 M4A (.m4a)
310
-
311
- **Security First:**
312
- - **180+ attack variations tested** - Path traversal, symlinks, Unicode tricks
313
- - **100% attack rejection rate** - Every malicious attempt blocked
314
- - **OWASP CWE-22 compliant** - Industry-standard security
315
- - ✅ **7 validation layers** - Defense-in-depth architecture
316
- - **File ownership verification** - Only your files accepted
317
- - ✅ **Magic number validation** - Real audio files only
318
- - ✅ **Secure storage** - 600 permissions, restricted directory
319
-
320
- **Smart Validation:**
321
- - Recommended duration: 30-90 seconds (optimal looping)
322
- - Maximum: 300 seconds (5 minutes)
323
- - Maximum size: 50MB
324
- - Automatic format detection
325
- - Duration warnings for non-optimal lengths
326
-
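To make the "magic number" and size checks above concrete, here is a minimal sketch of how a file header could be classified. It is not the project's validator: the function names, the return shapes, and the 12-byte header window are assumptions, and it covers only two of the seven layers the README mentions.

```javascript
const MAX_BYTES = 50 * 1024 * 1024; // 50MB limit from the list above

// Classify a file by its first 12 bytes; returns null if unrecognized.
function detectAudioFormat(header) {
  if (header.length < 12) return null;
  const ascii = (start, end) => header.subarray(start, end).toString('latin1');
  if (ascii(0, 4) === 'RIFF' && ascii(8, 12) === 'WAVE') return 'wav';
  if (ascii(0, 4) === 'OggS') return 'ogg';
  // "ftyp" at offset 4 marks any ISO base media file (M4A, MP4, MOV);
  // a stricter check would also inspect the brand that follows.
  if (ascii(4, 8) === 'ftyp') return 'm4a';
  // MP3: either an ID3v2 tag or an MPEG frame sync (11 set bits).
  if (ascii(0, 3) === 'ID3') return 'mp3';
  if (header[0] === 0xff && (header[1] & 0xe0) === 0xe0) return 'mp3';
  return null;
}

// Hypothetical wrapper combining the size limit with the header check.
function validateUpload(header, sizeBytes) {
  if (sizeBytes > MAX_BYTES) return { ok: false, reason: 'too large' };
  const format = detectAudioFormat(header);
  return format ? { ok: true, format } : { ok: false, reason: 'not a supported audio file' };
}

console.log(validateUpload(Buffer.from('RIFF\0\0\0\0WAVE', 'latin1'), 1024));
```

Checking the bytes rather than the file extension is what stops a renamed text or script file from being accepted as audio.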
327
- **Perfect for:**
328
- - 🎮 **Making coding fun** - Your favorite beats while you build
329
- - 🎼 **Setting the mood** - Match the music to the task (lo-fi for debugging, epic for shipping)
330
- - 🗂️ **Identifying projects** - Different track per repo so you always know which project Claude is in
331
- - 🎹 **Deep focus** - Ambient or classical to stay in flow
332
-
333
- **Features:**
334
- - Preview before setting
335
- - One-command upload
336
- - Works with all TTS providers
337
- - Loops seamlessly under voice
338
- - Easy restore to defaults
339
-
340
- **Menu Options:**
341
- 1. Change music - Upload new audio file
342
- 2. Remove music - Clear custom music
343
- 3. Reset to default - Restore built-in tracks (16 genres)
344
- 4. Enable/Disable - Toggle background music
345
- 5. Preview current - Sample your music
346
-
347
- Configure via: `npx agentvibes` → Music tab
348
-
349
- **Security Certified:** See full audit report at `docs/security/SECURITY-AUDIT.md`
350
-
351
- ---
352
-
353
- ### 🎯 Key Features
354
-
355
- **🌟 v4.2 - BMAD Party Mode & SSH Receiver:**
356
- - 🎭 **BMAD Party Mode Voices** - Each agent speaks with their unique voice, music, reverb, personality
356
- - 🖥️ **SSH Receiver Tab** - Stream TTS audio from headless servers to your local machine over TCP
357
- - 🎛️ **BMad Tab (TUI)** - Visual agent configurator with auto-assign and bulk edit
359
- - ⚡ **TTS Latency -1s** — Batched Node.js calls, inotifywait queue, background cleanup
360
- - 🎨 **ANSI Banner Colors Restored** — Gold/cyan/traffic-light colors in TTS info banner
361
- - 🔕 **Banner Toggle** — Hide TTS banner without muting (`~/.agentvibes/banner-disabled`)
362
- - 🔇 **No Party Mode Overlap** — Agents wait for full audio before next speaks
363
- - 🧹 **Markdown-Clean Speech** - Asterisks/formatting stripped automatically from party mode
364
-
365
- **🌟 NEW IN v3.6.0 — Voice Explorer Release:**
366
- - 🏷️ **Friendly Voice Names** - "Ryan" instead of "en_US-libritts_r-medium-speaker-123"
367
- - 💬 **Intro Text (Pretext)** - Custom prefix for all TTS ("FireBot: Starting...")
368
- - 🎵 **Custom Background Music** - Upload your own audio files with battle-tested security
369
- - 🎨 **Interactive Installer** - Preview voices and music during installation
370
- - 🛡️ **Security Hardening** - 180+ attack variations tested, 100% blocked, OWASP compliant
371
-
372
- **🪟 NEW IN v3.5.5 - Native Windows Support:**
373
- - 🖥️ **Windows Native TTS** - Soprano, Piper, and Windows SAPI providers. No WSL required!
374
- - 🎵 **Background Music** - 16 genre tracks mixed under voice
375
- - 🎛️ **Reverb & Audio Effects** - 5 reverb levels via ffmpeg
376
- - 🔊 **Verbosity Control** - High, Medium, or Low settings
377
- - 🎨 **Beautiful Installer** - `npx agentvibes install` or `.\setup-windows.ps1`
378
-
379
- **⚡ v3.4.0 Highlights:**
380
- - 🎤 **Soprano TTS Provider** - Ultra-fast neural TTS with 20x CPU, 2000x GPU acceleration (thanks [@nathanchase](https://github.com/nathanchase)!)
381
- - 🛡️ **Security Hardening** - 9.5/10 score with comprehensive validation and timeouts
382
- - 🌐 **Environment Intelligence** - PulseAudio tunnel auto-detection for SSH scenarios
383
-
384
- **⚡ Core Features:**
385
- - **One-Command Install** - Get started in 30 seconds (`npx agentvibes install` or `.\setup-windows.ps1` without Node.js)
386
- - 🎭 **Multi-Provider Support** - Soprano (neural), Piper TTS (50+ free voices), macOS Say (100+ built-in), or Windows SAPI
387
- - 🎙️ **27+ Professional AI Voices** - Character voices, accents, and unique personalities
388
- - 🎙️ **Verbosity Control** - Choose how much Claude speaks (LOW, MEDIUM, HIGH)
389
- - 🎙️ **AgentVibes MCP** - Natural language control ("Switch to Aria voice") for Claude Code & Desktop
390
- - 🔊 **SSH Audio Optimization** - Auto-detects remote sessions and eliminates static (VS Code Remote SSH, cloud dev)
391
-
392
- **🎭 Personalization:**
393
- - 🎭 **19 Built-in Personalities** - From sarcastic to flirty, pirate to dry humor
394
- - 💬 **Advanced Sentiment System** - Apply personality styles to ANY voice without changing it
395
- - 🎵 **Voice Preview & Replay** - Listen before you choose, replay last 10 TTS messages
396
-
397
- **🚀 Integrations & Power Features:**
398
- - 🔌 **Enhanced BMAD Plugin** - Auto voice switching for BMAD agents with multilingual support
399
- - 🔊 **Live Audio Feedback** - Hear task acknowledgments and completions in any language
400
- - 🌍 **30+ Languages** - Multilingual support with native voice quality
401
- - 🆓 **Free & Open** - Use Piper TTS with no API key required
402
-
403
- ### 🤗 Hugging Face AI Voice Models
404
-
405
- **AgentVibes' Piper TTS uses AI voice models hosted on Hugging Face** at [rhasspy/piper-voices](https://huggingface.co/rhasspy/piper-voices).
406
-
407
- **What are Hugging Face voice models?**
408
-
409
- Hugging Face voice models are pre-trained artificial intelligence models hosted on the Hugging Face Model Hub platform, designed to convert text into human-like speech (Text-to-Speech or TTS) or perform other speech tasks like voice cloning and speech-to-speech translation. They're accessible via their Transformers library for easy use in applications like voice assistants, audio generation, and more.
410
-
411
- **Key Benefits:**
412
- - 🎯 **Human-like Speech** - VITS-based neural models for natural pronunciation and intonation
413
- - 🌍 **35+ Languages** - Multilingual support with native accents
414
- - 🆓 **100% Open Source** - All Piper voices are free, VITS-based models hosted on Hugging Face
415
- - 🔧 **Developer-Friendly** - Fine-tune, customize, or deploy for various audio projects
416
- - **Offline & Fast** - No API keys, no internet needed once installed
417
-
418
- All 50+ Piper voices AgentVibes provides are sourced from Hugging Face's open-source AI voice models, ensuring high-quality, natural-sounding speech synthesis across all supported platforms.
419
-
420
- ---
421
-
422
- ## 📑 Table of Contents
423
-
424
- ### Getting Started
425
- - [🚀 Quick Start](#-quick-start) - Get voice in 30 seconds (3 simple steps)
426
- - [📱 Android/Termux](#-quick-setup-android--termux-claude-code-on-your-phone) - Run Claude Code on your phone
427
- - [📋 Prerequisites](#-prerequisites) - What you actually need (Node.js + optional tools)
428
- - [✨ What is AgentVibes?](#-what-is-agentvibes) - Overview & key features
429
- - [🌟 NEW FEATURE HIGHLIGHTS](#-new-feature-highlights) - **START HERE!**
430
- - [🎭 BMAD Party Mode](#-bmad-party-mode--multi-agent-voice-conversations) - Per-agent voices, music, reverb
431
- - [🖥️ SSH Receiver](#️-agentvibes-receiver--remote-audio-streaming) - Stream audio from headless servers
432
- - [💬 Intro Text](#-intro-text-pretext---your-personal-ai-branding) - Custom TTS prefixes
433
- - [🎵 Custom Background Music](#-custom-background-music---complete-audio-control) - Upload your own tracks
434
- - [📰 Latest Release](#-latest-release) - v4.3 "Windows Parity" — background music, voice selection, ffmpeg auto-install on Windows
435
- - [🪟 Windows Setup Guide for Claude Desktop](mcp-server/WINDOWS_SETUP.md) - Complete Windows installation with WSL & Python
436
-
437
- ### AgentVibes MCP (Natural Language Control)
438
- - [🎙️ AgentVibes MCP Overview](#%EF%B8%8F-agentvibes-mcp) - **Easiest way** - Natural language commands
439
- - [For Claude Desktop](docs/mcp-setup.md#for-claude-desktop) - Windows/WSL setup, Python requirements
440
-
441
- - [For Claude Code](docs/mcp-setup.md#for-claude-code) - Project-specific setup
442
-
443
- ### Core Features
444
- - [🎤 Commands Reference](#-commands-reference) - All available commands
445
- - [🎙️ Verbosity Control](#%EF%B8%8F-verbosity-control) - Control how much Claude speaks (low/medium/high)
446
- - [🎭 Personalities vs Sentiments](#-personalities-vs-sentiments) - Two systems explained
447
- - [🗣️ Voice Library](#%EF%B8%8F-voice-library) - 914 voices with friendly names
448
- - [🔌 BMAD Plugin](#-bmad-plugin) - Auto voice switching for BMAD agents
449
- - [🎙️ AgentVibes Receiver - NEW!](#%EF%B8%8F-agentvibes-receiver-remote-audio-streaming-from-voiceless-servers) - Remote audio streaming from voiceless servers
450
-
451
- ### Integrations & Platforms
452
- - [🤖 OpenClaw Integration](#-openclaw-integration) - Use AgentVibes with OpenClaw messaging platform
453
- - [🎙️ AgentVibes Skill for OpenClaw](#-agentvibes-skill-for-openclaw---what-you-get) - 50+ voices, effects, personalities for OpenClaw
454
- - [📱 AgentVibes Receiver](#-agentvibes-receiver-local-phone-) - Remote audio on phones/local machines
455
-
456
- ### Advanced Topics
457
- - [📦 Installation Structure](#-installation-structure) - What gets installed
458
- - [💡 Common Workflows](#-common-workflows) - Quick examples
459
- - [🔧 Advanced Features](#-advanced-features) - Custom voices & personalities
460
- - [🔊 Remote Audio Setup](#-remote-audio-setup) - Play TTS from remote servers
461
- - [🛠️ Technical Documentation](#️-technical-documentation) - Audio architecture, cross-platform support, voice resolution
462
- - [🚨 Security Hardening Guide](docs/security-hardening-guide.md) - **REQUIRED if running OpenClaw on remote server**: SSH hardening, Fail2Ban, Tailscale, UFW, AIDE
463
- - [🔬 Technical Deep Dive](docs/technical-deep-dive.md) - How AgentVibes works under the hood
464
- - [❓ Troubleshooting](#-troubleshooting) - Common issues & fixes
465
-
466
- ### Additional Resources
467
- - [🔗 Useful Links](#-useful-links) - Voice typing & AI tools
468
- - [🔄 Updating](#-updating) - Keep AgentVibes current
469
- - [🗑️ Uninstalling](#️-uninstalling) - Remove AgentVibes cleanly
470
- - [❓ FAQ](#-frequently-asked-questions-faq) - **NEW!** Common questions answered (git-lfs, MCP tokens, installation)
471
- - [🍎 macOS Testing](docs/macos-testing.md) - Automated testing on macOS with GitHub Actions
472
- - [🤗 Hugging Face Voice Models](docs/hugging-face-models.md) - Technical details on AI voice models
473
- - [🙏 Credits](#-credits) - Acknowledgments
474
- - [🤝 Contributing](#-contributing) - Show support
475
-
476
- ---
477
-
478
- ## 📰 Latest Release
479
-
480
- **[v4.3 - "Windows Parity" Release](https://github.com/paulpreibisch/AgentVibes/releases/tag/v4.3)** 🎉
481
-
482
- This is the biggest AgentVibes release since the TUI launched in v4.0. Two headline features: **BMAD Party Mode** gives every agent their own voice and music, and the **SSH Receiver** lets you hear your headless server speak on your local machine.
483
-
484
- ### 🎭 BMAD Party Mode - Multi-Agent Voice Conversations
485
-
486
- The BMad Method (Build More Architect Dreams) is an AI-driven development framework module that helps you build software from ideation through agentic implementation with specialized AI agents, guided workflows, and intelligent planning.
487
-
488
- Every agent in a BMAD discussion now speaks with their own individually configured voice, music, reverb, and personality — making the Architect, PM, Developer, QA, and Analyst immediately recognizable the moment they speak.
489
-
490
- **Auto-enabled** - party mode activates automatically when BMAD is detected. Configure agents visually:
491
-
492
- ```bash
493
- npx agentvibes # Press B for BMad Tab
494
- ```
495
-
496
- **Each agent gets:**
497
- - 🎙️ **Their own voice** - 914 to choose from, or auto-assign gender-aware
497
- - 🎵 **Their own music track** - cinematic for the Architect, lo-fi for the Dev
499
- - 🎚️ **Their own volume** — fine-tune per-agent, or bulk-set all at once
500
- - 🎛️ **Their own reverb** — studio, hall, cathedral, room, or none
501
- - 💬 **Their own pretext** — "Winston says:..." before every line
502
- - 🎭 **Their own personality** — sarcastic, dramatic, pirate, cheerful...
503
- - 🔇 **No overlap** — agents wait for full audio before the next one speaks
504
- - ✨ **Markdown stripped** — no "asterisk asterisk" in TTS output
505
-
506
- ### 🎛️ BMad Tab — Full Visual Agent Configurator
507
-
508
- Manage every agent from an interactive table — same polish as the Voices tab:
509
-
510
- | Key | Action |
511
- |-----|--------|
512
- | `Space` | Preview agent with full profile (animated spinner while playing) |
513
- | `Enter` | Configure voice, music, volume, reverb, personality, pretext |
514
- | `A` | Auto-assign unique voices (gender-aware, no repeats) |
515
- | `B` | Bulk Edit - set music / volume / pretext / reverb for all agents |
516
- | `X` | Reset agent to defaults |
517
-
518
- The table shows **Voice, Gender, Provider, Reverb, Music, Vol, Pretext** columns. Voice names are automatically beautified: `16Speakers::Rose_Ibex` → `Rose Ibex`.
519
-
520
- ### 🖥️ SSH Receiver — Hear Your Headless Server
521
-
522
- Stream TTS from a cloud box, WSL2, or any voiceless server directly to your local machine over TCP:
523
-
524
- ```bash
525
- # Local: open TUI → Receiver tab → Start
526
- npx agentvibes
527
-
528
- # Remote: AgentVibes auto-detects the receiver and streams audio to you
529
- ```
530
-
531
- ### ~1 Second Faster TTS
532
-
533
- - 6 Node.js profile reads collapsed into 1 (~900ms saved per speech)
534
- - `inotifywait` queue worker - no polling delay
535
- - Cache cleanup runs off the critical path
536
-
537
- ### 🎨 ANSI Colors Restored + Banner Toggle
538
-
539
- Full color in the TTS banner. Silence it without muting audio:
540
- ```bash
541
- touch ~/.agentvibes/banner-disabled # or: "turn off the TTS banner" via MCP
542
- ```
543
-
544
- ### Quick Install
545
-
546
- ```bash
547
- npx agentvibes install
548
- ```
549
-
550
- 💡 **Tip:** If `npx agentvibes` shows an older version: `npm cache clean --force && npx agentvibes@latest`
551
-
552
- 🐛 **Found a bug?** [GitHub Issues](https://github.com/paulpreibisch/AgentVibes/issues)
553
-
554
- [→ View Complete Release Notes](RELEASE_NOTES.md) | [→ View Previous Release (v4.0.1)](https://github.com/paulpreibisch/AgentVibes/releases/tag/v4.0.1) | [→ View All Releases](https://github.com/paulpreibisch/AgentVibes/releases)
555
-
556
- [↑ Back to top](#-table-of-contents)
557
-
558
- ---
559
-
560
- ## 🎙️ AgentVibes MCP
561
-
562
- AgentVibes was originally created to give the Claude Code assistant a voice! Simply install it with an npx command in your terminal, and Claude Code can talk back to you.
563
-
564
- We've now enhanced this capability by adding an MCP (Model Context Protocol) server. This integration exposes AgentVibes' functionality directly to your AI assistant, allowing you to configure and control AgentVibes using natural language instead of typing "/" slash commands.
565
-
566
- Setting it up is straightforward: just add the MCP server to your Claude Code configuration files.
567
-
568
- But the convenience doesn't stop there. With the MCP server in place, Claude Desktop can now use AgentVibes too!
569
-
570
- We're thrilled about this expansion because it means Claude Desktop can finally talk back as well!
571
-
572
- If you decide to use the MCP server on Claude Desktop, after configuration, give Claude Desktop this command: "every time i give you a command, speak the acknowledgement using agentvibes and the confirmation about what you completed, when done"—and watch the magic happen!
573
-
574
- **🎯 Control AgentVibes with natural language - no slash commands to remember!**
575
-
576
- Just say "Switch to Aria voice" or "Speak in Spanish" instead of typing commands.
577
-
578
- **Works in:** Claude Desktop, Claude Code
579
-
580
- **[→ View Complete MCP Setup Guide](docs/mcp-setup.md)** - Full setup for all platforms, configuration examples, available tools, and MCP vs slash commands comparison
581
-
582
- [↑ Back to top](#-table-of-contents)
583
-
584
- ---
585
-
586
- ## 🚀 Quick Start - Get Voice in 30 Seconds
587
-
588
- **3 Simple Steps:**
589
-
590
- ### 1️⃣ Install
591
- ```bash
592
- npx agentvibes install
593
- ```
594
-
595
- ### 2️⃣ Choose Provider (Auto-Detected)
596
- - **macOS**: Native `say` provider (100+ voices) ✨
597
- - **Linux/WSL**: Piper TTS (50+ free voices) 🎙️
598
- - **Windows Native**: Soprano, Piper, or SAPI 🪟
599
- - **Android**: Termux with auto-setup 📱
600
-
601
- ### 3️⃣ Use in Claude Code
602
- Just code normally - AgentVibes automatically speaks task acknowledgments and completions! 🔊
603
-
604
- ---
605
-
606
- ### TUI Console Commands
607
-
608
- AgentVibes includes a full **Text User Interface (TUI)** built with blessed.js for managing voices, music, settings, and installation — all from a single interactive console.
609
-
610
- | Command | Description |
611
- |---------|-------------|
612
- | `npx agentvibes` | Smart detection — opens Settings if installed, Install if not |
613
- | `npx agentvibes install` | Open the Install tab directly |
614
- | `npx agentvibes config` | Open the Settings tab directly |
615
-
616
- Once inside, use **Tab** / **Shift+Tab** to switch between tabs: **Voices**, **Music**, **BMad**, **Settings**, **Receiver**, and **Install**. Use **[** / **]** to page through voice and music catalogs.
617
-
618
- ---
619
-
620
- **🍎 macOS Users (One-Time Setup):**
621
- ```bash
622
- brew install bash # Required for bash 5.x features
623
- ```
624
- macOS ships with bash 3.2 (from 2007). After this, everything works perfectly!
625
-
626
- ---
627
-
628
- **[→ Full Setup Guide](docs/quick-start.md)** - Advanced options, provider switching, and detailed setup
629
-
630
- [↑ Back to top](#-table-of-contents)
631
-
634
- ---
635
-
636
- ## 📋 Prerequisites - What You Actually Need
637
-
638
- ### Minimum (Core Features)
639
- **✅ REQUIRED:**
640
- - **Node.js** ≥16.0 - Check with: `node --version`
641
-
642
- ### Required for Full Features
643
- **✅ STRONGLY RECOMMENDED:**
644
- - **Python** 3.10+ - Needed for Piper TTS voice engine
645
- - **bash** 5.0+ - macOS only (macOS ships with 3.2 from 2007)
646
-
647
- ### Optional but Recommended
648
- **⭕ OPTIONAL (TTS still works without them):**
649
- - **sox** - Audio effects (reverb, EQ, pitch shifting)
650
- - **ffmpeg** - Background music, audio padding, RDP compression
651
-
652
- ### NOT Required (Despite What You've Heard)
653
- **❌ DEFINITELY NOT NEEDED:**
654
- - ❌ Git or git-lfs (npm handles everything)
655
- - ❌ Repository cloning (unless you're contributing code)
656
- - ❌ Build tools or C++ compilers (pre-built package ready to use)
657
-
658
- ### Installation Methods
659
-
660
- | Method | Command | Use Case |
661
- |--------|---------|----------|
662
- | **✅ RECOMMENDED: NPX (via npm)** | `npx agentvibes install` | **All platforms** - Just want to use AgentVibes |
663
- | **🪟 Windows PowerShell** | `.\setup-windows.ps1` | **Windows** - Standalone installer (no Node.js needed) |
664
- | **⚠️ Git Clone** | `git clone ...` | **Developers Only** - Contributing code |
665
-
666
- **Why npx?** Zero git operations, no build steps, just 30 seconds to voice!
667
-
668
- ### For Developers (Contributing Code)
669
-
670
- If you want to contribute to AgentVibes:
671
- ```bash
672
- git clone https://github.com/paulpreibisch/AgentVibes.git
673
- cd AgentVibes
674
- npm install
675
- npm link
676
- ```
677
-
678
- Requires: Node.js 16+, Git (no git-lfs), and `npm link` familiarity.
679
-
680
- [↑ Back to top](#-table-of-contents)
681
-
682
- ---
683
-
686
- ## 📱 Quick Setup: Android & Termux (Claude Code on Your Phone!)
687
-
688
- **Want to run Claude Code on your Android phone with professional voices?**
689
-
690
- Simply install Termux from F-Droid (NOT Google Play) and run:
691
- ```bash
692
- pkg update && pkg upgrade
693
- pkg install nodejs-lts
694
- npx agentvibes install
695
- ```
696
-
697
- The installer auto-detects Termux and installs everything needed (proot-distro for compatibility, Piper TTS, audio playback).
698
-
699
- **[→ Full Android/Termux Setup Guide](#-android--termux)** - Detailed troubleshooting and verification steps
700
-
701
- [↑ Back to top](#-table-of-contents)
702
-
703
- ---
704
-
705
- ## 📋 System Requirements
706
-
707
- AgentVibes requires certain system dependencies for optimal audio processing and playback. Requirements vary by operating system and TTS provider.
708
-
709
- ### Core Requirements (All Platforms)
710
-
711
- | Tool | Required For | Why It's Needed |
712
- |------|-------------|-----------------|
713
- | **Node.js** ≥16.0 | All platforms | Runtime for AgentVibes installer and MCP server |
714
- | **Bash** ≥5.0 | macOS | Modern bash features (macOS ships with 3.2 from 2007) |
715
- | **Python** 3.10+ | Piper TTS, MCP server | Runs Piper voice engine and MCP server |
716
-
717
- ### Audio Processing Tools (Recommended)
718
-
719
- | Tool | Status | Purpose | Impact if Missing |
720
- |------|--------|---------|------------------|
721
- | **sox** | Recommended | Audio effects (reverb, EQ, pitch, compression) | No audio effects, still works |
722
- | **ffmpeg** | Recommended | Background music mixing, audio padding, RDP compression | No background music or RDP optimization |
723
-
724
- ### Platform-Specific Requirements
725
-
726
- #### 🐧 Linux / WSL
727
-
728
- ```bash
729
- # Ubuntu/Debian
730
- sudo apt-get update
731
- sudo apt-get install -y sox ffmpeg python3-pip pipx
732
-
733
- # Fedora/RHEL
734
- sudo dnf install -y sox ffmpeg python3-pip pipx
735
-
736
- # Arch Linux
737
- sudo pacman -S sox ffmpeg python-pip python-pipx
738
- ```
739
-
740
- **Audio Playback** (one of the following):
741
- - `paplay` (PulseAudio - usually pre-installed)
742
- - `aplay` (ALSA - fallback)
743
- - `mpg123` (fallback)
744
- - `mpv` (fallback)
745
-
746
- **Why these tools?**
747
- - **sox**: Applies audio effects defined in `.claude/config/audio-effects.cfg` (reverb, pitch shifting, EQ, compression)
748
- - **ffmpeg**: Mixes background music tracks, adds silence padding to prevent audio cutoff, compresses audio for RDP/SSH sessions
749
- - **paplay/aplay**: Plays generated TTS audio files
750
- - **pipx**: Isolated Python environment manager for Piper TTS installation
751
-
752
- #### 🍎 macOS
753
-
754
- ```bash
755
- # Install Homebrew if not already installed
756
- /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
757
-
758
- # Required: Modern bash
759
- brew install bash
760
-
761
- # Recommended: Audio processing tools
762
- brew install sox ffmpeg pipx
763
- ```
764
-
765
- **Audio Playback**:
766
- - `afplay` (built-in - always available)
767
- - `say` (built-in - for macOS TTS provider)
768
-
769
- **Why these tools?**
770
- - **bash 5.x**: macOS ships with bash 3.2 which lacks associative arrays and other modern features AgentVibes uses
771
- - **sox**: Same audio effects processing as Linux
772
- - **ffmpeg**: Same background music and padding as Linux
773
- - **afplay**: Built-in macOS audio player
774
- - **say**: Built-in macOS text-to-speech (alternative to Piper)
775
-
776
- #### 🪟 Windows
777
-
778
- **Option A: Native Windows (Recommended)**
779
-
780
- AgentVibes now supports native Windows with three TTS providers. No WSL required!
781
-
782
- ```powershell
783
- # Interactive Node.js installer (recommended)
784
- npx agentvibes install
785
-
786
- # Or use the standalone PowerShell installer
787
- .\setup-windows.ps1
788
- ```
789
-
790
- **Providers available natively:**
791
- - **Soprano** - Ultra-fast neural TTS (best quality, requires `pip install soprano-tts`)
792
- - **Windows Piper** - High quality offline neural voices (auto-downloaded)
793
- - **Windows SAPI** - Built-in Windows voices (zero setup)
794
-
795
- **Requirements:** Node.js 16+, PowerShell 5.1+, ffmpeg (optional, for background music & reverb)
796
-
797
- See [Windows Native Setup Guide](WINDOWS-SETUP.md) for full instructions.
798
-
799
- **Option B: WSL (Legacy)**
800
-
801
- For Claude Desktop or WSL-based workflows, follow the [Windows WSL Guide](mcp-server/WINDOWS_SETUP.md).
802
-
803
- ```powershell
804
- # Install WSL from PowerShell (Administrator)
805
- wsl --install -d Ubuntu
806
- ```
807
-
808
- Then follow Linux requirements above inside WSL.
809
-
810
- #### 🤖 Android / Termux
811
-
812
- **Running Claude Code on Your Android Using Termux**
813
-
814
- AgentVibes fully supports Android devices through the [Termux app](https://termux.dev/). This enables you to run Claude Code with professional TTS voices directly on your Android phone or tablet!
815
-
816
- **Quick Setup:**
817
-
818
- ```bash
819
- # 1. Install Termux from F-Droid (NOT Google Play - it's outdated)
820
- # Download: https://f-droid.org/en/packages/com.termux/
821
-
822
- # 2. Install Node.js in Termux
823
- pkg update && pkg upgrade
824
- pkg install nodejs-lts
825
-
826
- # 3. Install AgentVibes (auto-detects Android and runs Termux installer)
827
- npx agentvibes install
828
- ```
829
-
830
- **What Gets Installed?**
831
-
832
- The Termux installer automatically sets up:
833
- - **proot-distro** with Debian (for glibc compatibility)
834
- - **Piper TTS** via proot wrapper (Android uses bionic libc, not glibc)
835
- - **termux-media-player** for audio playback (`paplay` doesn't work on Android)
836
- - **Audio dependencies**: ffmpeg, sox, bc for processing
837
- - **termux-api** for Android-specific audio routing
838
-
839
- **Why Termux Instead of Standard Installation?**
840
-
841
- Android's architecture requires special handling:
842
- - ❌ Standard pip/pipx fails (missing wheels for bionic libc)
843
- - ❌ Linux binaries require glibc (Android uses bionic)
844
- - ❌ `/tmp` directory is not accessible on Android
845
- - ❌ Standard audio tools like `paplay` don't exist
846
-
847
- The Termux installer solves all of these issues with proot-distro and Android-native audio playback!
848
-
849
- **Requirements:**
850
- - [Termux app](https://f-droid.org/en/packages/com.termux/) (from F-Droid, NOT Google Play)
851
- - [Termux:API](https://f-droid.org/en/packages/com.termux.api/) (for audio playback)
852
- - Android 7.0+ (recommended: Android 10+)
853
- - ~500MB free storage (for Piper TTS + voice models)
854
-
855
- **Audio Playback:**
856
- - Uses `termux-media-player` instead of `paplay`
857
- - Audio automatically routes through Android's media system
858
- - Supports all Piper TTS voices (50+ voices in 30+ languages)
859
-
860
- **Verifying Your Setup:**
861
-
862
- ```bash
863
- # Check Termux environment
864
- echo $PREFIX # Should show /data/data/com.termux/files/usr
865
-
866
- # Check Node.js
867
- node --version # Should be ≥16.0
868
-
869
- # Check if Piper is installed
870
- which piper # Should return /data/data/com.termux/files/usr/bin/piper
871
-
872
- # Test audio playback
873
- termux-media-player play /path/to/audio.wav
874
- ```
875
-
876
- **Troubleshooting:**
877
-
878
- | Issue | Solution |
879
- |-------|----------|
880
- | "piper: not found" | Run `npx agentvibes install` - auto-detects Termux |
881
- | No audio playback | Install Termux:API from F-Droid |
882
- | Permission denied | Run `termux-setup-storage` to grant storage access |
883
- | Slow installation | Use WiFi, not mobile data (~300MB download) |
884
-
885
- **Why F-Droid and Not Google Play?**
886
-
887
- Google Play's Termux version is outdated and unsupported. Always use the [F-Droid version](https://f-droid.org/en/packages/com.termux/) for the latest security updates and compatibility.
888
-
889
- ### TTS Provider Requirements
890
-
891
- #### Piper TTS (Free, Offline)
892
- - **Python** 3.10+
893
- - **pipx** (for isolated installation)
894
- - **Disk Space**: ~50MB per voice model
895
- - **Internet**: Only for initial voice downloads
896
-
897
- ```bash
898
- # Installed automatically by AgentVibes
899
- pipx install piper-tts
900
- ```
901
-
902
- #### macOS Say (Built-in, macOS Only)
903
- - No additional requirements
904
- - 100+ voices pre-installed on macOS
905
- - Use: `/agent-vibes:provider switch macos`
906
-
907
- ### Verifying Your Setup
908
-
909
- ```bash
910
- # Check all dependencies
911
- node --version # Should be ≥16.0
912
- python3 --version # Should be ≥3.10
913
- bash --version # Should be ≥5.0 (macOS users!)
914
- sox --version # Optional but recommended
915
- ffmpeg -version # Optional but recommended
916
- pipx --version # Required for Piper TTS
917
-
918
- # Check audio playback (Linux/WSL)
919
- paplay --version || aplay --version
920
-
921
- # Check audio playback (macOS)
922
- which afplay # Should return /usr/bin/afplay
923
- ```
924
-
925
- ### What Happens Without Optional Dependencies?
926
-
927
- | Missing Tool | Impact | Workaround |
928
- |-------------|--------|------------|
929
- | sox | No audio effects (reverb, EQ, pitch) | TTS still works, just no effects |
930
- | ffmpeg | No background music, no audio padding | TTS still works, audio may cut off slightly early |
931
- | paplay/aplay | No audio playback on Linux | Install at least one audio player |
932
-
933
- **All TTS generation still works** - optional tools only enhance the experience!
934
-
935
- [↑ Back to top](#-table-of-contents)
936
-
937
- ---
938
-
939
- ## 🎭 Choose Your Voice Provider
940
-
941
- All providers are free: **Piper TTS** (offline neural voices), **Soprano** (neural), **macOS Say** (built-in on Mac), and **Windows SAPI** (built-in on Windows). Pick one and switch anytime.
942
-
943
- | Provider | Platform | Cost | Quality | Setup |
944
- |----------|----------|------|---------|-------|
945
- | **macOS Say** | macOS only | Free (built-in) | ⭐⭐⭐⭐ | Zero config |
946
- | **Piper** | Linux/WSL/Windows | Free | ⭐⭐⭐⭐ | Auto-downloads |
947
- | **Soprano** | Linux/WSL/Windows | Free | ⭐⭐⭐⭐⭐ | `pip install soprano-tts` |
948
- | **Windows SAPI** | Windows | Free (built-in) | ⭐⭐⭐ | Zero config |
949
-
950
- On macOS, the native `say` provider is automatically detected and recommended!
951
-
952
- **[→ Provider Comparison Guide](docs/providers.md)**
953
-
954
- [↑ Back to top](#-table-of-contents)
955
-
956
- ---
957
-
958
- ## 🎤 Commands Reference
959
-
960
- AgentVibes provides **50+ slash commands** and **natural language MCP equivalents**.
961
-
962
- **Quick Examples:**
963
- ```bash
964
- # Voice control
965
- /agent-vibes:switch Aria # Or: "Switch to Aria voice"
966
- /agent-vibes:list # Or: "List all voices"
967
-
968
- # Personality & sentiment
969
- /agent-vibes:personality pirate # Or: "Set personality to pirate"
970
- /agent-vibes:sentiment sarcastic # Or: "Apply sarcastic sentiment"
971
-
972
- # Language & learning
973
- /agent-vibes:set-language spanish # Or: "Speak in Spanish"
974
- /agent-vibes:learn # Or: "Enable learning mode"
975
- ```
976
-
977
- **[→ View Complete Command Reference](docs/commands.md)** - All voice, system, personality, sentiment, language, and BMAD commands with MCP equivalents
978
-
979
- ### Intro Text Commands
980
-
981
- ```bash
982
- # Configure intro text — open Settings tab
983
- npx agentvibes
984
-
985
- # View current intro text
986
- cat ~/.claude/config/intro-text.txt
987
- ```
988
-
989
- **MCP Equivalent:**
990
- ```
991
- "Set my intro text to 'FireBot: '"
992
- "What's my current intro text?"
993
- "Clear my intro text"
994
- ```
995
-
996
- ### Custom Music Commands
997
-
998
- ```bash
999
- # Configure background music — open Music tab
1000
- npx agentvibes
1001
- ```
1002
-
1003
- **MCP Equivalent:**
1004
- ```
1005
- "Configure my background music"
1006
- "Add custom background music"
1007
- "Remove custom music"
1008
- "Preview my background music"
1009
- ```
1010
-
1011
- ### Friendly Voice Name Commands
1012
-
1013
- ```bash
1014
- # Switch using friendly name
1015
- /agent-vibes:switch Ryan
1016
- /agent-vibes:switch Sarah
1017
-
1018
- # List all voices with friendly names
1019
- /agent-vibes:list
1020
-
1021
- # Get current voice (shows friendly name if available)
1022
- /agent-vibes:whoami
1023
- ```
1024
-
1025
- **MCP Equivalent:**
1026
- ```
1027
- "Switch to Ryan voice"
1028
- "Use the Sarah voice"
1029
- "List all available voices"
1030
- ```
1031
-
1032
- [↑ Back to top](#-table-of-contents)
1033
-
1034
- ---
1035
-
1036
- ## 🎙️ Verbosity Control
1037
-
1038
- **Control how much Claude speaks while working!** 🔊
1039
-
1040
- Choose from three verbosity levels:
1041
-
1042
- ### LOW (Minimal) 🔇
1043
- - Acknowledgments only (start of task)
1044
- - Completions only (end of task)
1045
- - Perfect for quiet work sessions
1046
-
1047
- ### MEDIUM (Balanced) 🤔
1048
- - Acknowledgments + completions
1049
- - Major decisions ("I'll use grep to search")
1050
- - Key findings ("Found 12 instances")
1051
- - Perfect for understanding decisions without full narration
1052
-
1053
- ### HIGH (Maximum Transparency) 💭
1054
- - All reasoning ("Let me search for all instances")
1055
- - All decisions ("I'll use grep for this")
1056
- - All findings ("Found it at line 1323")
1057
- - Perfect for learning mode, debugging complex tasks
1058
-
1059
- **Quick Commands:**
1060
- ```bash
1061
- /agent-vibes:verbosity # Show current level
1062
- /agent-vibes:verbosity high # Maximum transparency
1063
- /agent-vibes:verbosity medium # Balanced
1064
- /agent-vibes:verbosity low # Minimal (default)
1065
- ```
1066
-
1067
- **MCP Equivalent:**
1068
- ```
1069
- "Set verbosity to high"
1070
- "What's my current verbosity level?"
1071
- ```
1072
-
1073
- 💡 **How it works:** Claude uses emoji markers (💭 🤔 ✓) in its text, and AgentVibes automatically detects and speaks them based on your verbosity level. No manual TTS calls needed!
1074
-
1075
- ⚠️ **Note:** Changes take effect on next Claude Code session restart.
1076
-
1077
- [↑ Back to top](#-table-of-contents)
1078
-
1079
- ---
1080
-
1081
- ## 📚 Language Learning Mode
1082
-
1083
- **🎯 Learn Spanish (or 30+ languages) while you program!** 🌍
1084
-
1085
- Every task acknowledgment plays **twice** - first in English, then in your target language. Context-based learning while you code!
1086
-
1087
- **[→ View Complete Learning Mode Guide](docs/language-learning-mode.md)** - Full tutorial, quick start, commands, speech rate control, supported languages, and pro tips
1088
-
1089
- [↑ Back to top](#-table-of-contents)
1090
-
1091
- ---
1092
-
1093
- ## 🎭 Personalities vs Sentiments
1094
-
1095
- **Two ways to add personality:**
1096
-
1097
- - **🎪 Personalities** - Changes BOTH voice AND speaking style (e.g., `pirate` personality = Pirate Marshal voice + pirate speak)
1098
- - **💭 Sentiments** - Keeps your current voice, only changes speaking style (e.g., Aria voice + sarcastic sentiment)
1099
-
1100
- **[→ Complete Personalities Guide](docs/personalities.md)** - All 19 personalities, create custom ones
1101
-
1102
- [↑ Back to top](#-table-of-contents)
1103
-
1104
- ---
1105
-
1106
- ## 🗣️ Voice Library
1107
-
1108
- Use the **AgentVibes TUI installer** (`/audio-browser`) to browse, sample, and install from 914 voices interactively.
1109
-
1110
- ### Friendly Voice Names
1111
-
1112
- All voices now have memorable names! Instead of technical IDs like `en_US-libritts_r-medium-speaker-123`, just use friendly names like **Ryan**, **Joe**, or **Sarah**.
1113
-
1114
- **Voice Metadata Includes:**
1115
- - Display name and technical ID
1116
- - Gender, accent, and region
1117
- - Personality traits (professional, warm, friendly, etc.)
1118
- - Recommended use cases
1119
- - Quality rating and sample rate
1120
-
1121
- ### Voice Categories
1122
-
1123
- **Curated Voices** (10 personalities):
1124
- These hand-picked voices cover common use cases with clear characteristics.
1125
-
1126
- **Speaker Variations** (904 voices):
1127
- High-quality Piper TTS voices from the libritts-high model. Each speaker has unique vocal characteristics, accents, and tones.
1128
-
1129
- ### Popular Voices
1130
-
1131
- AgentVibes includes professional AI voices from Piper TTS and macOS Say with multilingual support.
1132
-
1133
- 🎧 **Try in Claude Code:** `/agent-vibes:preview` to hear all voices
1134
- 🌍 **Multilingual:** Use Antoni, Rachel, Domi, or Bella for automatic language detection
1135
-
1136
- **[→ View Complete Voice Library](docs/voice-library.md)** - All voices with clickable samples, descriptions, and best use cases
1137
-
1138
- [↑ Back to top](#-table-of-contents)
1139
-
1140
- ---
1141
-
1142
- ## 🔌 BMAD Plugin
1143
-
1144
- **Automatically switch voices when using BMAD agents!**
1145
-
1146
- The BMAD plugin detects when you activate a BMAD agent (e.g., `/BMad:agents:pm`) and automatically uses the assigned voice for that role.
1147
-
1148
- **Version Support**: AgentVibes supports both BMAD v4 and v6-alpha installations. Version detection is automatic - just install BMAD and AgentVibes will detect and configure itself correctly!
1149
-
1150
- ### 🔊 TTS Injection: How It Works
1151
-
1152
- BMAD uses a **loosely-coupled injection system** for voice integration. BMAD source files contain placeholder markers that AgentVibes replaces with speaking instructions during installation:
1153
-
1154
- **Before Installation (BMAD Source):**
1155
- ```xml
1156
- <rules>
1157
- <r>ALWAYS communicate in {communication_language}...</r>
1158
- <!-- TTS_INJECTION:agent-tts -->
1159
- <r>Stay in character until exit selected</r>
1160
- </rules>
1161
- ```
1162
-
1163
- **After Installation (with AgentVibes enabled):**
1164
- ```xml
1165
- <rules>
1166
- <r>ALWAYS communicate in {communication_language}...</r>
1167
- - When responding to user messages, speak your responses using TTS:
1168
- Call: `.claude/hooks/bmad-speak.sh '{agent-id}' '{response-text}'`
1169
- Where {agent-id} is your agent type (pm, architect, dev, etc.)
1170
-
1171
- - Auto Voice Switching: AgentVibes automatically switches to the voice
1172
- assigned for your agent role when activated
1173
- <r>Stay in character until exit selected</r>
1174
- </rules>
1175
- ```
1176
-
1177
- **After Installation (with TTS disabled):**
1178
- ```xml
1179
- <rules>
1180
- <r>ALWAYS communicate in {communication_language}...</r>
1181
- <r>Stay in character until exit selected</r>
1182
- </rules>
1183
- ```
1184
-
1185
- This design means **any TTS provider** can integrate with BMAD by replacing these markers with their own instructions!
1186
-
1187
- **[→ View Complete BMAD Documentation](docs/bmad-plugin.md)** - All agent mappings, language support, TTS injection details, plugin management, and customization
1188
-
1189
- [↑ Back to top](#-table-of-contents)
1190
-
1191
- ---
1192
-
1193
- ## 🤖 OpenClaw Integration
1194
-
1195
- **Use AgentVibes TTS with OpenClaw - the revolutionary AI assistant you can access via any instant messenger!**
1196
-
1197
- **What is OpenClaw?** [OpenClaw](https://openclaw.ai/) is an AI assistant that brings Claude AI to your favorite messaging platforms - WhatsApp, Telegram, Discord, and more. No apps to install, no websites to visit - just message your AI assistant like you would a friend.
1198
-
1199
- 🌐 **Website**: https://openclaw.ai/
1200
-
1201
- AgentVibes seamlessly integrates with OpenClaw, providing professional text-to-speech for AI assistants running on messaging platforms and remote servers.
1202
-
1203
- ### 🚨 CRITICAL: Security Before Running OpenClaw on Any Remote Server
1204
-
1205
- ⚠️ **SECURITY IS NOT OPTIONAL** - Running OpenClaw on a remote server exposes your infrastructure to attack vectors including SSH compromise, credential theft, and lateral movement.
1206
-
1207
- **👉 READ THIS FIRST:** [Security Hardening Guide](docs/security-hardening-guide.md) - **Required reading** covering:
1208
- - ✅ SSH hardening (key-only auth, port 2222, fail2ban)
1209
- - ✅ Firewall configuration (UFW/iptables)
1210
- - ✅ Intrusion detection (AIDE, Wazuh)
1211
- - ✅ VPN tunneling (Tailscale alternative to direct SSH)
1212
-
1213
- **Do not expose your OpenClaw server to the internet without reading this guide.**
1214
-
1215
- ### 🎯 Key Benefits
1216
-
1217
- - **Free & Offline**: No API costs, works without internet
1218
- - **Remote SSH Audio**: Audio tunnels from server to local machine via PulseAudio
1219
- - **50+ Voices**: Professional AI voices in 30+ languages
1220
- - **Zero Config**: Automatic when AgentVibes is installed
1221
-
1222
- ### 🚀 Installation
1223
-
1224
- AgentVibes includes a ready-to-use OpenClaw skill that enables TTS on messaging platforms. The setup involves two components:
1225
-
1226
- #### Component 1: OpenClaw Server (Remote)
1227
-
1228
- Install AgentVibes on your OpenClaw server:
1229
-
1230
- ```bash
1231
- # On your remote server where OpenClaw is running
1232
- npx agentvibes install
1233
- ```
1234
-
1235
- The OpenClaw skill is **automatically included** in the AgentVibes npm package at `.clawdbot/skill/SKILL.md`.
1236
-
1237
- **How to activate the skill in OpenClaw:**
1238
-
1239
- 1. **Locate the skill** - After installing AgentVibes, the skill is at:
1240
- ```
1241
- node_modules/agentvibes/.clawdbot/skill/SKILL.md
1242
- ```
1243
-
1244
- 2. **Link to OpenClaw skills directory** (if OpenClaw uses skills):
1245
- ```bash
1246
- # Example - adjust path based on your OpenClaw installation
1247
- ln -s "$(npm root -g)/agentvibes/.clawdbot/skill/SKILL.md" ~/.openclaw/skills/agentvibes.md
1248
- ```
1249
-
1250
- 3. **OpenClaw auto-detection** - Many OpenClaw setups automatically detect AgentVibes when it's installed. Check your OpenClaw logs for:
1251
- ```
1252
- AgentVibes skill detected and loaded
1253
- ```
1254
-
1255
- ---
1256
-
1257
- #### 🎙️ AgentVibes Voice Management Skill for OpenClaw
1258
-
1259
- Manage your text-to-speech voices across multiple providers with the AgentVibes Voice Management Skill:
1260
-
1261
- **Voice Management Features:**
1262
- - 🎤 **50+ Professional Voices** - Across the Piper TTS (free, offline) and macOS Say providers
1263
- - 🔀 **Multi-Provider Support** - Switch between Piper TTS (free, offline) and macOS Say
1264
- - 👂 **Voice Preview** - Listen to voices before selecting them
1265
- - 🎚️ **Voice Customization** - Add custom voices, set pretext, control speech rate
1266
- - 📋 **Voice Management** - List, switch, replay, and manage your voice library
1267
- - 🔇 **Mute Control** - Mute/unmute TTS output with persistent settings
1268
- - 🌍 **Multilingual Support** - Voices in 30+ languages across all providers
1269
-
1270
- **Installation Confirmation:**
1271
- ✅ The skill is **automatically included** in the AgentVibes npm package at:
1272
- ```
1273
- node_modules/agentvibes/.clawdbot/skill/SKILL.md
1274
- ```
1275
-
1276
- No extra setup needed - when you run `npx agentvibes install` on your OpenClaw server, the skill is ready to use!
1277
-
1278
- **Full Skill Documentation:**
1279
- **[→ View Complete AgentVibes Skill Guide](.clawdbot/skill/SKILL.md)** - 430+ lines covering:
1280
- - Quick start with 50+ voice options
1281
- - Background music & effects management
1282
- - Personality system (19+ styles)
1283
- - Voice effects (reverb, EQ)
1284
- - Speed & verbosity control
1285
- - Remote SSH audio setup
1286
- - Troubleshooting & complete reference
1287
-
1288
- **Popular Voice Examples:**
1289
- ```bash
1290
- # Female voices
1291
- npx agentvibes speak "Hello" --voice en_US-amy-medium
1292
- npx agentvibes speak "Bonjour" --voice fr_FR-siwis-medium
1293
-
1294
- # Male voices
1295
- npx agentvibes speak "Hello" --voice en_US-lessac-medium
1296
- npx agentvibes speak "Good day" --voice en_GB-alan-medium
1297
-
1298
- # Add personality!
1299
- bash ~/.claude/hooks/personality-manager.sh set sarcastic
1300
- bash ~/.claude/hooks/play-tts.sh "Oh wonderful, another request"
1301
- ```
1302
-
1303
- ---
1304
-
1305
- #### Component 2: AgentVibes Receiver (Local/Phone) ⚠️ REQUIRED
1306
-
1307
- **CRITICAL: You MUST install AgentVibes on your phone (or local machine) to receive and play audio!**
1308
-
1309
- Without this, audio cannot be heard - the server generates TTS but needs a receiver to play it.
1310
-
1311
- **Install on Android Phone (Termux):**
1312
-
1313
- 1. **Install Termux from F-Droid** (NOT Google Play):
1314
- - Download: https://f-droid.org/en/packages/com.termux/
1315
-
1316
- 2. **Install Node.js in Termux:**
1317
- ```bash
1318
- pkg update && pkg upgrade
1319
- pkg install nodejs-lts
1320
- ```
1321
-
1322
- 3. **Install AgentVibes in Termux:**
1323
- ```bash
1324
- npx agentvibes install
1325
- ```
1326
-
1327
- 4. **Install Termux:API** (for audio playback):
1328
- - Download: https://f-droid.org/en/packages/com.termux.api/
1329
- - Then in Termux: `pkg install termux-api`
1330
-
1331
- **Install on Local Mac/Linux:**
1332
-
1333
- ```bash
1334
- npx agentvibes install
1335
- ```
1336
-
1337
- **Why is this needed?**
1338
- - The **server generates TTS** but has no speakers (headless)
1339
- - AgentVibes on your **phone acts as the audio receiver** via SSH tunnel
1340
- - Audio tunnels from server → SSH → phone → speakers 🔊
1341
-
1342
- Without AgentVibes installed on the receiving device, you'll generate audio but hear nothing!
1343
-
1344
- #### How It Works: Server → SSH Tunnel → Local Playback
1345
-
1346
- ```
1347
- ┌─────────────────────────────────────────────────────────┐
1348
- │ 1. User messages OpenClaw via Telegram/WhatsApp │
1349
- │ "Tell me about the weather" │
1350
- └─────────────────────────────────────────────────────────┘
1351
-                             ↓
1352
- ┌─────────────────────────────────────────────────────────┐
1353
- │ 2. OpenClaw (Server) processes request with Claude │
1354
- │ AgentVibes skill generates TTS audio │
1355
- └─────────────────────────────────────────────────────────┘
1356
-                             ↓
1357
- ┌─────────────────────────────────────────────────────────┐
1358
- │ 3. Audio tunnels through SSH → PulseAudio (port 14713)│
1359
- │ Server: PULSE_SERVER=tcp:localhost:14713 │
1360
- └─────────────────────────────────────────────────────────┘
1361
-                             ↓
1362
- ┌─────────────────────────────────────────────────────────┐
1363
- │ 4. Local AgentVibes receives and plays audio │
1364
- │ Phone speakers, laptop speakers, etc. │
1365
- │ 🔊 "The weather is sunny and 72 degrees" │
1366
- └─────────────────────────────────────────────────────────┘
1367
- ```
1368
-
1369
- **Architecture:**
1370
- - **Server (OpenClaw)**: Generates TTS, sends via PulseAudio
1371
- - **SSH Tunnel**: RemoteForward port 14713 (encrypted transport)
1372
- - **Local (Termux/Desktop)**: AgentVibes receives audio, plays on speakers
1373
-
1374
- This creates a **Siri-like experience** - message from anywhere, hear responses on your phone! 📱🎤
1375
-
1376
- ### 📝 Usage
1377
-
1378
- #### Basic TTS Commands
1379
-
1380
- ```bash
1381
- # Basic TTS
1382
- npx agentvibes speak "Hello from OpenClaw"
1383
-
1384
- # With different voices
1385
- npx agentvibes speak "Hello" --voice en_US-amy-medium
1386
- npx agentvibes speak "Bonjour" --voice fr_FR-siwis-medium
1387
-
1388
- # List available voices
1389
- npx agentvibes voices
1390
- ```
1391
-
1392
- #### Advanced: Direct Hook Usage with Voice Override
1393
-
1394
- For programmatic control, use the TTS hook directly:
1395
-
1396
- ```bash
1397
- # Basic: Use default voice
1398
- bash ~/.claude/hooks/play-tts.sh "Hello from OpenClaw"
1399
-
1400
- # Advanced: Override voice per message
1401
- bash ~/.claude/hooks/play-tts.sh "Welcome message" "en_US-amy-medium"
1402
- bash ~/.claude/hooks/play-tts.sh "Bonjour!" "fr_FR-siwis-medium"
1403
- bash ~/.claude/hooks/play-tts.sh "British greeting" "en_GB-alan-medium"
1404
- ```
1405
-
1406
- **Parameters:**
1407
- - `$1` - **TEXT** (required): Message to speak
1408
- - `$2` - **VOICE** (optional): Voice name to override default
1409
-
1410
- #### Audio Effects Configuration for OpenClaw
1411
-
1412
- **File**: `.claude/config/audio-effects.cfg`
1413
-
1414
- Customize audio effects, background music, and voice processing per agent or use default settings:
1415
-
1416
- **Format:**
1417
- ```
1418
- AGENT_NAME|SOX_EFFECTS|BACKGROUND_FILE|BACKGROUND_VOLUME
1419
- ```
1420
-
1421
- **Example Configuration:**
1422
-
1423
- ```bash
1424
- # Default - subtle background music
1425
- default||agentvibes_soft_flamenco_loop.mp3|0.30
1426
-
1427
- # Custom agent with reverb + background
1428
- MyAgent|reverb 40 50 90 gain -2|agentvibes_soft_flamenco_loop.mp3|0.20
1429
-
1430
- # Agent with pitch shift and EQ
1431
- Assistant|pitch -100 equalizer 3000 1q +2|agentvibes_dark_chill_step_loop.mp3|0.15
1432
- ```
1433
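The pipe-delimited format above can be split with plain shell tools. The following is a minimal sketch, not AgentVibes' actual parser: the function name `parse_effects_line` is hypothetical, and it assumes one entry per line with `#` marking comments, as in the example configuration.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of AgentVibes): split one audio-effects.cfg
# line of the form AGENT_NAME|SOX_EFFECTS|BACKGROUND_FILE|BACKGROUND_VOLUME.
parse_effects_line() {
  local line="$1" agent effects background volume
  # Skip blank lines and comment lines
  case "$line" in
    ''|'#'*) return 1 ;;
  esac
  # A non-whitespace IFS keeps empty fields, so "default||file|0.30" works
  IFS='|' read -r agent effects background volume <<< "$line"
  printf 'agent=%s\neffects=%s\nbackground=%s\nvolume=%s\n' \
    "$agent" "$effects" "$background" "$volume"
}

parse_effects_line 'MyAgent|reverb 40 50 90 gain -2|agentvibes_soft_flamenco_loop.mp3|0.20'
```

Because `|` is not whitespace, an empty effects field (as in the `default` entry) is preserved as an empty string instead of shifting the remaining fields left.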
-
1434
- **Available SOX Effects:**
1435
-
1436
- | Effect | Syntax | Example | Description |
1437
- |--------|--------|---------|-------------|
1438
- | **Reverb** | `reverb <reverberance> <HF-damping> <room-scale>` | `reverb 40 50 90` | Adds room ambiance (light: 30 40 70, heavy: 50 60 100) |
1439
- | **Pitch** | `pitch <cents>` | `pitch -100` | Shift pitch (100 cents = 1 semitone, negative = lower) |
1440
- | **Equalizer** | `equalizer <freq> <width>q <gain-dB>` | `equalizer 3000 1q +2` | Boost/cut frequencies (bass: 200Hz, treble: 4000Hz) |
1441
- | **Gain** | `gain <dB>` | `gain -2` | Adjust volume (negative = quieter, positive = louder) |
1442
- | **Compand** | `compand <attack,decay> <threshold:in,out>` | `compand 0.3,1 6:-70,-60,-20` | Dynamic range compression (makes quiet parts louder) |
1443
-
1444
- **Background Music Tracks:**
1445
-
1446
- Built-in tracks available in `.claude/audio/tracks/`:
1447
- - `agentvibes_soft_flamenco_loop.mp3` - Warm, rhythmic flamenco
1448
- - `agentvibes_dark_chill_step_loop.mp3` - Modern chill electronic
1449
- - (50+ additional tracks available)
1450
-
1451
- **Background Volume:**
1452
- - `0.10` - Very subtle (10%)
1453
- - `0.20` - Subtle (20%)
1454
- - `0.30` - Moderate (30%, recommended default)
1455
- - `0.40` - Noticeable (40%, party mode)
1456
-
1457
- **Example: OpenClaw Custom Configuration**
1458
-
1459
- Create `.claude/config/audio-effects.cfg` on your OpenClaw server:
1460
-
1461
- ```bash
1462
- # OpenClaw assistant - warm voice with subtle reverb
1463
- OpenClaw|reverb 30 40 70 gain -1|agentvibes_soft_flamenco_loop.mp3|0.25
1464
-
1465
- # Help desk agent - clear, bright voice
1466
- HelpDesk|equalizer 4000 1q +3 compand 0.2,0.5 6:-70,-60,-20|agentvibes_dark_chill_step_loop.mp3|0.15
1467
-
1468
- # Default fallback
1469
- default||agentvibes_soft_flamenco_loop.mp3|0.30
1470
- ```
1471
-
1472
- **How AgentVibes Applies Effects:**
1473
-
1474
- 1. **Generate TTS** - Create base audio with Piper TTS
1475
- 2. **Apply SOX effects** - Process audio (reverb, EQ, pitch, etc.)
1476
- 3. **Mix background** - Blend background music at specified volume
1477
- 4. **Tunnel via SSH** - Send processed audio to local receiver
1478
- 5. **Play on device** - Output to phone/laptop speakers
1479
-
1480
- This allows **per-message customization** or **consistent agent branding** with unique audio signatures!
1481
-
1482
- ### 🔊 Remote SSH Audio
1483
-
1484
- Perfect for running OpenClaw on a remote server with audio on your local machine:
1485
-
1486
- **Quick Setup:**
1487
-
1488
- 1. **Remote server** - Configure PulseAudio:
1489
- ```bash
1490
- echo 'export PULSE_SERVER=tcp:localhost:14713' >> ~/.bashrc
1491
- source ~/.bashrc
1492
- ```
1493
-
1494
- 2. **Local machine** - Add SSH tunnel (`~/.ssh/config`):
1495
- ```
1496
- Host your-server
1497
- RemoteForward 14713 localhost:14713
1498
- ```
1499
-
1500
- 3. **Connect and test**:
1501
- ```bash
1502
- ssh your-server
1503
- agentvibes speak "Testing remote audio from OpenClaw"
1504
- ```
1505
-
1506
- Audio plays on your local speakers! 🔊
1507
-
1508
- ### 📚 Documentation
1509
-
1510
- - **OpenClaw Skill**: [.clawdbot/README.md](.clawdbot/README.md)
1511
- - **OpenClaw Website**: https://openclaw.ai/
1512
- - **Remote Audio Setup**: [docs/remote-audio-setup.md](docs/remote-audio-setup.md)
1513
- - **Security Hardening**: [docs/security-hardening-guide.md](docs/security-hardening-guide.md) ⚠️
1514
-
1515
- [↑ Back to top](#-table-of-contents)
1516
-
1517
- ---
1518
-
1519
- ## 🎙️ AgentVibes Receiver: Remote Audio Streaming from Voiceless Servers
1520
-
1521
- **Receive and play TTS audio from servers that have no audio output!**
1522
-
1523
- AgentVibes Receiver is a lightweight audio client that runs on your phone, tablet, or personal computer. It receives TTS audio from remote voiceless servers where your OpenClaw Personal Assistant or Claude Code project is installed.
1524
-
1525
- ### 🎯 What AgentVibes Receiver Solves
1526
-
1527
- You have OpenClaw running on a Mac mini or remote server with **no audio output**:
1528
- - 🖥️ Mac mini (silent)
1529
- - 🖥️ Ubuntu server (headless)
1530
- - ☁️ AWS/DigitalOcean instance
1531
- - 📦 Docker container
1532
- - 🪟 WSL (Windows Subsystem for Linux)
1533
-
1534
- Users message you via WhatsApp, Telegram, or Discord but get only text responses:
1535
- - ❌ No voice = Less engaging experience
1536
- - ❌ No personality = Feels robotic
1537
- - ❌ No audio cues = Miss important context
1538
-
1539
- **AgentVibes Receiver transforms this:**
1540
- - ✅ OpenClaw speaks with voice (Siri-like experience)
1541
- - ✅ Audio streams to your device automatically
1542
- - ✅ You hear responses on your speakers
1543
- - ✅ Users get a conversational AI experience
1544
-
1545
- ### 🔧 How It Works
1546
-
1547
- **One-time setup:**
1548
- 1. Install AgentVibes on your voiceless server with OpenClaw
1549
- 2. Install AgentVibes Receiver on your personal device (phone/tablet/laptop)
1550
- 3. Connect via SSH tunnel (or Tailscale VPN)
1551
- 4. Done - automatic from then on
1552
-
1553
- **Flow diagram:**
1554
- ```
1555
- ┌──────────────────────────────────────────┐
1556
- │ Your Mac mini / Server │
1557
- │ (OpenClaw + AgentVibes) │
1558
- │ • Generates TTS audio │
1559
- │ • Sends via SSH tunnel │
1560
- └──────────────────────────────────────────┘
1561
- ↓ Encrypted SSH tunnel
1562
- ┌──────────────────────────────────────────┐
1563
- │ Your Phone / Laptop │
1564
- │ (AgentVibes Receiver) │
1565
- │ • Receives audio stream (or text stream) │
1566
- │ • Auto-plays on device speakers │
1567
- └──────────────────────────────────────────┘
1568
- ```
1569
-
1570
- **Real-world example:**
1571
- ```
1572
- 📱 WhatsApp: "Tell me about quantum computing"
1573
-
1574
- 🖥️ Mac mini: OpenClaw processes + generates TTS
1575
- ↓ SSH tunnel (audio or text stream)
1576
- 📱 Your phone (AgentVibes Receiver): Plays audio 🔊
1577
-
1578
- You hear on your device speakers: "Quantum computing uses quantum bits..."
1579
-
1580
- 💬 Conversation feels alive!
1581
- ```
1582
-
1583
- ### Key Features
1584
-
1585
- | Feature | Benefit |
1586
- |---------|---------|
1587
- | **One-Time Pairing** | SSH key setup, automatic reconnect |
1588
- | **Real-Time Streaming** | Low-latency audio playback |
1589
- | **SSH Encryption** | Secure audio tunnel |
1590
- | **Tailscale Support** | Easy VPN for remote servers |
1591
- | **Voice Selection** | Configure server-side voice |
1592
- | **Audio Effects** | Reverb, echo, pitch on server |
1593
- | **Cache Tracking** | Monitor audio generation |
1594
- | **Multiple Servers** | Connect to different OpenClaw instances |
1595
-
1596
- ### 🚀 Perfect For
1597
-
1598
- - 🖥️ **Mac mini + OpenClaw** - Home server with professional voices
1599
- - ☁️ **Remote Servers** - OpenClaw on AWS/GCP/DigitalOcean
1600
- - 📱 **WhatsApp/Telegram** - Users message, hear responses
1601
- - 🎓 **Discord Bots** - Bot speaks with voices
1602
- - 🏗️ **Docker/Containers** - Containerized OpenClaw with audio
1603
- - 🔧 **WSL Development** - Windows developers using voiceless WSL
1604
-
1605
- ### 📝 Setup
1606
-
1607
- ```bash
1608
- # On your server (Mac mini, Ubuntu, AWS, etc.)
1609
- npx agentvibes install
1610
- # Select the OpenClaw option when prompted
1611
- # AgentVibes installs with SSH-Remote provider
1612
-
1613
- # On your personal device (phone, laptop, tablet)
1614
- npx agentvibes receiver setup
1615
- # Pairing prompt with server SSH key
1616
- # Done!
1617
- ```
1618
-
1619
- ### 📚 Documentation
1620
-
1621
- **[→ View AgentVibes Receiver Setup Guide](docs/agentvibes-receiver.md)** - Pairing, SSH configuration, Tailscale setup, troubleshooting
1622
-
1623
- **[→ View OpenClaw Integration Guide](docs/openclaw-integration.md)** - Server setup, voice configuration, audio effects, and best practices
1624
-
1625
- [↑ Back to top](#-table-of-contents)
1626
-
1627
- ---
1628
-
1629
- ## 📦 Installation Structure
1630
-
1631
- **What gets installed:** Commands, hooks, personalities, and plugins in `.claude/` directory.
1632
-
1633
- **[→ View Complete Installation Structure](docs/installation-structure.md)** - Full directory tree, file descriptions, and settings storage
1634
-
1635
- [↑ Back to top](#-table-of-contents)
1636
-
1637
- ---
1638
-
1639
- ## 💡 Common Workflows
1640
-
1641
- ```bash
1642
- # Switch voices
1643
- /agent-vibes:list # See all voices
1644
- /agent-vibes:switch Aria # Change voice
1645
-
1646
- # Try personalities
1647
- /agent-vibes:personality pirate # Pirate voice + style
1648
- /agent-vibes:personality list # See all 19 personalities
1649
-
1650
- # Speak in other languages
1651
- /agent-vibes:set-language spanish # Speak in Spanish
1652
- /agent-vibes:set-language list # See 30+ languages
1653
-
1654
- # Replay audio
1655
- /agent-vibes:replay # Replay last message
1656
- ```
1657
-
1658
- **💡 Tip:** Using MCP? Just say "Switch to Aria voice" or "Speak in Spanish" instead of typing commands.
1659
-
1660
- [↑ Back to top](#-table-of-contents)
1661
-
1662
- ---
1663
-
1664
- ## 🔧 Advanced Features
1665
-
1666
- AgentVibes supports **custom personalities** and **custom voices**.
1667
-
1668
- **Quick Examples:**
1669
- ```bash
1670
- # Create custom personality
1671
- /agent-vibes:personality add mycustom
1672
-
1673
- # Add custom Piper voice
1674
- /agent-vibes:add "My Voice" abc123xyz789
1675
-
1676
- # Use in custom output styles
1677
- [Bash: .claude/hooks/play-tts.sh "Starting" "Aria"]
1678
- ```
1679
-
1680
- **[→ View Advanced Features Guide](docs/advanced-features.md)** - Custom personalities, custom voices, and more
1681
-
1682
- [↑ Back to top](#-table-of-contents)
1683
-
1684
- ---
1685
-
1686
- ## 🔊 Remote Audio Setup
1687
-
1688
- **Running AgentVibes on a remote server?** No problem!
1689
-
1690
- ✅ **Auto-detects SSH sessions** - Works with VS Code Remote SSH, regular SSH, cloud dev environments
1691
- ✅ **Zero configuration** - Audio optimizes automatically
1692
- ✅ **No static/clicking** - Clean playback through SSH tunnels
1693
-
1694
- **[→ Remote Audio Setup Guide](docs/remote-audio-setup.md)** - Full PulseAudio configuration details
1695
-
1696
- [↑ Back to top](#-table-of-contents)
1697
-
1698
- ---
1699
-
1700
- ## 🛠️ Technical Documentation
1701
-
1702
- ### Audio Architecture
1703
-
1704
- AgentVibes uses a cross-platform audio module (`src/console/audio-env.js`) that handles player detection and environment configuration for all supported platforms.
1705
-
1706
- #### Platform Audio Support Matrix
1707
-
1708
- | Platform | PulseAudio Config | MP3 Players (preference order) | WAV Players (preference order) |
1709
- |----------|-------------------|-------------------------------|-------------------------------|
1710
- | **Native Linux** | System default (not overridden) | ffplay → play (sox) → mpg123 → cvlc → mpv | aplay → paplay → play → ffplay |
1711
- | **WSL2** | Auto-detects `/mnt/wslg/PulseServer` | Same as Linux | Same as Linux |
1712
- | **macOS** | Not applicable | ffplay → play → mpg123 → cvlc → mpv → afplay | aplay → paplay → play → ffplay → afplay |
1713
- | **Windows** | Not applicable | ffplay → mpv (if installed) | ffplay → mpv → PowerShell SoundPlayer (built-in) |
1714
-
1715
- #### Key Design Decisions
1716
-
1717
- - **Direct spawn, not shell chains**: Audio players are spawned directly via Node's `spawn()` instead of `sh -c 'cmd1 || cmd2'` chains. VLC/cvlc crashes when stderr is redirected inside shell wrappers.
1718
- - **Player detection at startup**: The available player is detected once using `which` and cached. No runtime fallback chains.
1719
- - **PULSE_SERVER safety**: The WSL2 PulseServer path (`/mnt/wslg/PulseServer`) is only set when the socket file actually exists. Hardcoding it on native Linux silently breaks audio output.
1720
- - **Windows WAV fallback**: PowerShell's `System.Media.SoundPlayer` is used as a built-in fallback when no cross-platform player is installed.
1721
-
1722
- #### Multi-Speaker Voice Models
1723
-
1724
- Piper supports multi-speaker ONNX models (e.g., `16Speakers.onnx`) that contain multiple voices in a single file. AgentVibes expands these automatically:
1725
-
1726
- - The `.onnx.json` metadata file contains `num_speakers` and `speaker_id_map`
1727
- - `scanInstalledVoices()` expands multi-speaker models into individual selectable entries (e.g., `16Speakers::Cori_Samuel`)
1728
- - When selected, the system writes `tts-piper-model.txt` and `tts-piper-speaker-id.txt` to `.claude/`
1729
- - `play-tts-piper.sh` reads these files and passes `--speaker <id>` to the piper binary
1730
-
1731
- #### Voice Directory Resolution
1732
-
1733
- Voice storage follows the same precedence chain in both JavaScript and shell:
1734
-
1735
- 1. `PIPER_VOICES_DIR` environment variable
1736
- 2. Project-local `.claude/piper-voices-dir.txt` (walks up directory tree)
1737
- 3. Global `~/.claude/piper-voices-dir.txt`
1738
- 4. Default `~/.claude/piper-voices`
1739
-
1740
- #### Voice Catalog System
1741
-
1742
- AgentVibes includes a 914-voice catalog (`voice-assignments.json`) that lets users browse, preview, and install voices directly from the Voices tab:
1743
-
1744
- - **10 Curated Voices** — Hand-picked high-quality voices installed by default
1745
- - **904 LibriTTS Speakers** - Automatically extracted from the `16Speakers` multi-speaker model's `speaker_id_map`, plus the full LibriTTS catalog from Hugging Face
1746
- - **Download on Demand** — Uninstalled voices appear greyed-out in the list; pressing Enter opens a download modal that fetches the voice via `piper-voice-manager.sh`
1747
- - **Catalog Metadata** - Each entry includes `voiceId`, `displayName`, `gender`, `type` (curated/libritts), and download URL
1748
- - **LibriTTS Speaker Names** — Raw numeric IDs are patched at load time using `patchLibriTTSSpeakerNames()` which maps speaker IDs to human-readable names from the registry
1749
-
1750
- The catalog is loaded once at tab initialization by `loadCatalog()`. Installed voices (from disk scan) are shown with full color; catalog-only voices are dimmed until downloaded.
1751
-
1752
- #### Required System Dependencies for Background Music
1753
-
1754
- Background music requires an MP3-capable audio player. The installer detects missing players and offers to install `ffmpeg` automatically. If no player is found, the Music tab displays a clear error message.
1755
-
1756
- ```bash
1757
- # Install ffmpeg (recommended — provides ffplay)
1758
- # Ubuntu/Debian/WSL2:
1759
- sudo apt install ffmpeg
1760
-
1761
- # macOS:
1762
- brew install ffmpeg
1763
-
1764
- # Arch Linux:
1765
- sudo pacman -S ffmpeg
1766
- ```
1767
-
1768
- [↑ Back to top](#-table-of-contents)
1769
-
1770
- ---
1771
-
1772
- ## 🔗 Useful Links
1773
-
1774
- ### Voice & AI Tools
1775
-
1776
- - 🎤 **[WhisperTyping](https://whispertyping.com/)** - Fast voice-to-text typing for developers
1777
- - 🗣️ **[OpenWhisper (Azure)](https://learn.microsoft.com/en-us/azure/ai-services/speech-service/whisper-overview)** - Microsoft's speech-to-text service
1778
- - 🆓 **[Piper TTS](https://github.com/rhasspy/piper)** - Free offline neural TTS
1779
- - 🤖 **[Claude Code](https://claude.com/claude-code)** - AI coding assistant
1780
- - 🎭 **[BMAD METHOD](https://github.com/bmad-code-org/BMAD-METHOD)** - Multi-agent framework
1781
-
1782
- ### AgentVibes Resources
1783
-
1784
- - 🐛 **[Issues](https://github.com/paulpreibisch/AgentVibes/issues)** - Report bugs
1785
- - 📝 **[Changelog](https://github.com/paulpreibisch/AgentVibes/releases)** - Version history
1786
- - 📰 **[Technical Deep Dive - LinkedIn Article](https://www.linkedin.com/pulse/agent-vibes-add-voice-claude-code-deep-dive-npx-paul-preibisch-8zrcc/)** - How AgentVibes works under the hood
1787
-
1788
- [↑ Back to top](#-table-of-contents)
1789
-
1790
- ---
1791
-
1792
- ## ❓ Troubleshooting
1793
-
1794
- **Common Issues:**
1795
-
1796
- **❌ Error: "git-lfs is not installed"**
1797
-
1798
- **AgentVibes does NOT require git-lfs.** This error suggests:
1799
-
1800
- 1. **Wrong installation method** - Use npm, not git clone:
1801
- ```bash
1802
- # ✅ CORRECT - Use this:
1803
- npx agentvibes install
1804
-
1805
- # ❌ WRONG - Don't clone unless contributing:
1806
- git clone https://github.com/paulpreibisch/AgentVibes.git
1807
- ```
1808
-
1809
- 2. **Different project** - You may be in a BMAD-METHOD or other repo that uses git-lfs
1810
-
1811
- 3. **Global git config** - Your git may have lfs enabled globally:
1812
- ```bash
1813
- git config --global --list | grep lfs
1814
- ```
1815
-
1816
- **Solution:** Use `npx agentvibes install` - no git operations needed!
1817
-
1818
- ---
1819
-
1820
- **No Audio Playing?**
1821
- 1. Verify hook is installed: `ls -la .claude/hooks/session-start-tts.sh`
1822
- 2. Test: `/agent-vibes:sample Aria`
1823
-
1824
- **Commands Not Found?**
1825
- ```bash
1826
- npx agentvibes install --yes
1827
- ```
1828
-
1829
- **[→ View Complete Troubleshooting Guide](docs/troubleshooting.md)** - Solutions for audio issues, command problems, MCP errors, voice issues, and more
1830
-
1831
- [↑ Back to top](#-table-of-contents)
1832
-
1833
- ---
1834
-
1835
- ## 🔄 Updating
1836
-
1837
- **Quick Update (From Claude Code):**
1838
- ```bash
1839
- /agent-vibes:update
1840
- ```
1841
-
1842
- **Alternative Methods:**
1843
- ```bash
1844
- # Via npx
1845
- npx agentvibes update --yes
1846
-
1847
- # Via npm (if installed globally)
1848
- npm update -g agentvibes && agentvibes update --yes
1849
- ```
1850
-
1851
- **Check Version:** `/agent-vibes:version`
1852
-
1853
- **[→ View Complete Update Guide](docs/updating.md)** - All update methods, version checking, what gets updated, and troubleshooting
1854
-
1855
- [↑ Back to top](#-table-of-contents)
1856
-
1857
- ---
1858
-
1859
- ## 🗑️ Uninstalling
1860
-
1861
- **Quick Uninstall (Project Only):**
1862
- ```bash
1863
- npx agentvibes uninstall
1864
- ```
1865
-
1866
- **Uninstall Options:**
1867
- ```bash
1868
- # Interactive uninstall (confirms before removing)
1869
- npx agentvibes uninstall
1870
-
1871
- # Auto-confirm (skip confirmation prompt)
1872
- npx agentvibes uninstall --yes
1873
-
1874
- # Also remove global configuration
1875
- npx agentvibes uninstall --global
1876
-
1877
- # Complete uninstall including Piper TTS
1878
- npx agentvibes uninstall --global --with-piper
1879
- ```
1880
-
1881
- **What Gets Removed:**
1882
-
1883
- **Project-level (default):**
1884
- - `.claude/commands/agent-vibes/` - Slash commands
1885
- - `.claude/hooks/` - TTS scripts
1886
- - `.claude/personalities/` - Personality templates
1887
- - `.claude/output-styles/` - Output styles
1888
- - `.claude/audio/` - Audio cache
1889
- - `.claude/tts-*.txt` - TTS configuration files
1890
- - `.agentvibes/` - BMAD integration files
1891
-
1892
- **Global (with `--global` flag):**
1893
- - `~/.claude/` - Global configuration
1894
- - `~/.agentvibes/` - Global cache
1895
-
1896
- **Piper TTS (with `--with-piper` flag):**
1897
- - `~/piper/` - Piper TTS installation
1898
-
1899
- **To Reinstall:**
1900
- ```bash
1901
- npx agentvibes install
1902
- ```
1903
-
1904
- **💡 Tips:**
1905
- - Default uninstall only removes project-level files
1906
- - Use `--global` if you want to completely reset AgentVibes
1907
- - Use `--with-piper` if you also want to remove the Piper TTS engine
1908
- - Run `npx agentvibes status` to check installation status
1909
-
1910
- [↑ Back to top](#-table-of-contents)
1911
-
1912
- ---
1913
-
1914
- ## ❓ Frequently Asked Questions (FAQ)
1915
-
1916
- ### Installation & Setup
1917
-
1918
- **Q: Does AgentVibes require git-lfs?**
1919
- **A:** **NO.** AgentVibes has zero git-lfs requirement. Use `npx agentvibes install` - no git operations needed.
1920
-
1921
- **Q: Do I need to clone the GitHub repository?**
1922
- **A:** **NO** (unless you're contributing code). Normal users should use `npx agentvibes install`. Repository cloning is only for developers who want to contribute to the project.
1923
-
1924
- **Q: Why is the GitHub repo so large?**
1925
- **A:** The repo includes demo files and development dependencies (node_modules). The actual npm package you download is **< 50MB** and optimized for users.
1926
-
1927
- **Q: What's the difference between npm install and git clone?**
1928
- **A:**
1929
- - `npx agentvibes install` → **For users** - Downloads pre-built package, zero git operations, instant setup
1930
- - `git clone ...` → **For developers only** - Full source code, development setup, contributing code
1931
-
1932
- **Q: I saw an error about git-lfs, is something wrong?**
1933
- **A:** You're likely:
1934
- 1. Using wrong installation method (use `npx` not `git clone`)
1935
- 2. In a different project directory that uses git-lfs
1936
- 3. Have global git config with lfs enabled
1937
-
1938
- AgentVibes itself does NOT use or require git-lfs.
1939
-
1940
- ### Features & Usage
1941
-
1942
- **Q: Does MCP consume tokens from my context window?**
1943
- **A:** **YES.** Every MCP tool schema adds to the context window. AgentVibes MCP is designed to be minimal (~1500-2000 tokens), but if you're concerned about token usage, you can use slash commands instead of MCP.
1944
-
1945
- **Q: What's the difference between using MCP vs slash commands?**
1946
- **A:**
1947
- - **MCP**: Natural language ("Switch to Aria voice"), uses ~1500-2000 context tokens
1948
- - **Slash commands**: Explicit commands (`/agent-vibes:switch Aria`), zero token overhead
1949
-
1950
- Both do the exact same thing - MCP is more convenient, slash commands are more token-efficient.
1951
-
1952
- **Q: Is AgentVibes just a bash script?**
1953
- **A:** No. AgentVibes includes:
1954
- - Multi-provider TTS abstraction (Piper TTS, macOS Say)
1955
- - Voice management system with 50+ voices
1956
- - Personality & sentiment system
1957
- - Language learning mode with bilingual playback
1958
- - Audio effects processing (reverb, EQ, compression)
1959
- - MCP server for natural language control
1960
- - BMAD integration for multi-agent voice switching
1961
- - Remote audio optimization for SSH/RDP sessions
1962
-
1963
- **Q: Can I use AgentVibes without BMAD?**
1964
- **A:** **YES.** AgentVibes works standalone. BMAD integration is optional - only activates if you install BMAD separately.
1965
-
1966
- **Q: What are the audio dependencies?**
1967
- **A:**
1968
- - **Required**: Node.js 16+, Python 3.10+ (for Piper TTS)
1969
- - **Optional**: sox (audio effects), ffmpeg (background music, padding)
1970
- - All TTS generation works without optional dependencies - they just enhance the experience
1971
-
1972
- ### Voice Features
1973
-
1974
- **Q: How do I browse and install voices?**
1975
- **A:** Use the built-in TUI installer by running `/audio-browser` in Claude Code. Navigate with arrow keys, press ENTER to sample voices, and select one to install. AgentVibes switches to the chosen voice automatically.
1976
-
1977
- **Q: What are friendly voice names?**
1978
- **A:** Instead of technical IDs like `en_US-ryan-high`, you can now use simple names like "Ryan" when switching voices. All 904+ voices have friendly names matched to their characteristics.
1979
-
1980
- **Q: How do I set up custom intro text?**
1981
- **A:** During installation you'll be prompted for intro text. You can also configure it anytime via `npx agentvibes` → Settings tab. Enter text like "FireBot: " and it will prefix all TTS announcements.
1982
-
1983
- **Q: Can I use my own background music?**
1984
- **A:** Yes! Run `npx agentvibes` and open the Music tab. Select "Change music" and provide the path to your audio file (.mp3, .wav, .ogg, or .m4a). Files are validated for security and must be under 50MB.
1985
-
1986
- **Q: What's the recommended duration for custom music?**
1987
- **A:** Between 30-90 seconds is ideal for smooth looping. The system supports up to 300 seconds (5 minutes) but will warn you if the duration is non-optimal.
1988
-
1989
- **Q: Are friendly voice names case-sensitive?**
1990
- **A:** No! You can type "ryan", "Ryan", or "RYAN" - they all work. The voice resolution is case-insensitive.
1991
-
1992
- **Q: Does custom music work with all TTS providers?**
1993
- **A:** Yes! Custom background music works with Piper TTS, Soprano, macOS Say, and Windows SAPI.
1994
-
1995
- **Q: Can I preview music before setting it as my background?**
1996
- **A:** Yes! In `npx agentvibes` Music tab, select "Preview current" to hear your music. During installation, you can also sample all built-in tracks.
1997
-
1998
- **Q: What security measures protect custom music uploads?**
1999
- **A:** AgentVibes implements **defense-in-depth security with 7 validation layers**, tested against 180+ attack variations:
2000
-
2001
- 1. **Path Validation** - `path.resolve()` prevents traversal attacks (../, encoded, Unicode)
2002
- 2. **Home Directory Boundary** - Files must be within your home directory
2003
- 3. **File Existence Check** - Verifies file actually exists
2004
- 4. **File Type Verification** - Must be a regular file (not device, socket, etc.)
2005
- 5. **Ownership Verification** - File must be owned by you (UID check)
2006
- 6. **Format Validation** - Magic number checking ensures real audio files
2007
- 7. **Secure Storage** - Files copied to restricted directory with 600 permissions
2008
-
2009
- **Security Certification:**
2010
- - ✅ 100% attack rejection rate (107/107 tests passed)
2011
- - ✅ OWASP CWE-22 compliant (path traversal prevention)
2012
- - ✅ No information disclosure in error messages
2013
- - ✅ Production-ready and certified secure
2014
-
2015
- See full security audit: `docs/security/SECURITY-AUDIT.md`
2016
-
2017
- **Q: Has the security been independently verified?**
2018
- **A:** Yes! AgentVibes v3.6.0 includes a comprehensive security audit with 180+ attack variations tested. All path traversal, symlink, Unicode, null byte, and edge case attacks were successfully blocked (100% rejection rate). The system is OWASP CWE-22 compliant and includes a detailed security audit report at `docs/security/SECURITY-AUDIT.md`.
2019
-
2020
- **Q: What attack patterns were tested?**
2021
- **A:** The security test suite covers:
2022
- - **Path Traversal:** 100 variations (basic, URL-encoded, Unicode, null bytes, mixed)
2023
- - **Symlink Attacks:** 10 variations (sensitive files, chains, traversal targets)
2024
- - **Hard Link Attacks:** 5 variations (ownership verification)
2025
- - **Edge Cases:** 65+ variations (CRLF, whitespace, Unicode normalization, platform-specific)
2026
-
2027
- Every attack was correctly rejected with no information disclosure.
2028
-
2029
- ### Troubleshooting
2030
-
2031
- **Q: Why isn't Claude speaking?**
2032
- **A:** Common causes:
2033
- 1. Hook not installed - Run `npx agentvibes install --yes`
2034
- 2. Audio player missing - Install `sox` and `ffmpeg`
2035
- 3. TTS protocol not enabled in settings
2036
- 4. Test with `/agent-vibes:sample Aria`
2037
-
2038
- **Q: Can I use this on Windows?**
2039
- **A:** Yes! AgentVibes supports **native Windows** with PowerShell scripts (Soprano, Piper, SAPI providers). See [Windows Native Setup](WINDOWS-SETUP.md). WSL is also supported for legacy workflows - see [Windows WSL Guide](mcp-server/WINDOWS_SETUP.md).
2040
-
2041
- **Q: How do I reduce token usage?**
2042
- **A:**
2043
- 1. Use slash commands instead of MCP (zero context token overhead)
2044
- 2. Set verbosity to LOW (`/agent-vibes:verbosity low`)
2045
- 3. Disable BMAD integration if not using it
2046
-
2047
- [↑ Back to top](#-table-of-contents)
2048
-
2049
- ---
2050
-
2051
- ## ⚠️ Important Disclaimers
2052
-
2053
- **API Costs & Usage:**
2054
- - Usage is completely free with Piper TTS and Mac Say (no API costs)
2055
- - Users are solely responsible for their own API costs and usage
2056
-
2057
-
2058
- **Third-Party Services:**
2059
- - This project integrates with Piper TTS (local processing) and macOS Say (system built-in)
2060
- - We are **not affiliated with, endorsed by, or officially connected** to Anthropic, Apple, or Claude
2061
- - Piper TTS is subject to its terms of service
2062
-
2063
- **Privacy & Data:**
2064
- - **Piper TTS**: All processing happens locally on your machine, no external data transmission
2065
- - **macOS Say**: All processing happens locally using Apple's built-in speech synthesis
2066
-
2067
- **Software License:**
2068
- - Provided "as-is" under Apache 2.0 License without warranty of any kind
2069
- - See [LICENSE](LICENSE) file for full terms
2070
- - No liability for data loss, bugs, service interruptions, or any damages
2071
-
2072
- **Use at Your Own Risk:**
2073
- - This is open-source software maintained by the community
2074
- - Always test in development before production use
2075
- - Monitor your API usage and costs regularly
2076
-
2077
- [↑ Back to top](#-table-of-contents)
2078
-
2079
- ---
2080
-
2081
- ## 🙏 Credits
2082
-
2083
- **Built with ❤️ by [Paul Preibisch](https://github.com/paulpreibisch)**
2084
-
2085
- - 🐦 Twitter: [@997Fire](https://x.com/997Fire)
2086
- - 💼 LinkedIn: [paul-preibisch](https://www.linkedin.com/in/paul-preibisch/)
2087
- - 🌐 GitHub: [paulpreibisch](https://github.com/paulpreibisch)
2088
-
2089
- **Powered by:**
2090
- - [Piper TTS](https://github.com/rhasspy/piper) - Free neural voices
2091
- - [Soprano TTS](https://github.com/suno-ai/bark) - Ultra-fast neural TTS
2092
- - **Windows SAPI** - Native Windows text-to-speech
2093
- - **macOS Say** - Native macOS text-to-speech
2094
- - [Claude Code](https://claude.com/claude-code) - AI coding assistant
2095
- - Licensed under Apache 2.0
2096
-
2097
- **Contributors:**
2098
- - 🎤 [@nathanchase](https://github.com/nathanchase) - Soprano TTS Provider integration (PR #95) - Ultra-fast neural TTS with GPU acceleration
2099
-
2100
- **Special Thanks:**
2101
- - 💡 [Claude Code Hooks Mastery](https://github.com/disler/claude-code-hooks-mastery) by [@disler](https://github.com/disler) - Hooks inspiration
2102
- - 🤖 [BMAD METHOD](https://github.com/bmad-code-org/BMAD-METHOD) - Multi-agent framework with auto voice switching integration
2103
-
2104
- [↑ Back to top](#-table-of-contents)
2105
-
2106
- ---
2107
-
2108
- ## 🤝 Contributing
2109
-
2110
- If AgentVibes makes your coding more fun:
2111
- - ⭐ **Star this repo** on GitHub
2112
- - 🐦 **Tweet** and tag [@997Fire](https://x.com/997Fire)
2113
- - 🎥 **Share videos** of Claude with personality
2114
- - 💬 **Tell dev friends** about voice-powered AI
2115
-
2116
- ---
2117
-
2118
- **Ready to give Claude a voice? Install now and code with personality! 🎤✨**
2119
-
2120
- [↑ Back to top](#-table-of-contents)
2121
-
1
+ # 🎤 AgentVibes
2
+
3
+ > **Finally! Your agents can talk back!**
4
+ >
5
+ > 🌐 **[agentvibes.org](https://agentvibes.org)**
6
+ >
7
+ > Professional text-to-speech for **Claude Code**, **Claude Desktop**, and **OpenClaw** - **Soprano** (Neural), **Piper TTS** (Free!), **macOS Say** (Built-in!), or **Windows SAPI** (Zero Setup!)
8
+
9
+ [![npm version](https://img.shields.io/npm/v/agentvibes)](https://www.npmjs.com/package/agentvibes)
10
+ [![Test Suite](https://github.com/paulpreibisch/AgentVibes/actions/workflows/test.yml/badge.svg)](https://github.com/paulpreibisch/AgentVibes/actions/workflows/test.yml)
11
+ [![Publish](https://github.com/paulpreibisch/AgentVibes/actions/workflows/publish.yml/badge.svg)](https://github.com/paulpreibisch/AgentVibes/actions/workflows/publish.yml)
12
+ [![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
13
+
14
+ **Author**: Paul Preibisch ([@997Fire](https://x.com/997Fire)) | **Version**: v4.6.6
15
+
16
+ ---
17
+
18
+ ## 🚀 Quick Links
19
+
20
+ | I want to... | Go here |
21
+ |--------------|---------|
22
+ | **Install AgentVibes** (just `npx`, no git!) | [Quick Start Guide](docs/quick-start.md) |
23
+ | **Run Claude Code on Android** | [Android/Termux Setup](#-android--termux) |
24
+ | **Secure OpenClaw on Remote Server** | [Security Hardening Guide](docs/security-hardening-guide.md) ⚠️ |
25
+ | **Understand what I need** | [Prerequisites](#-prerequisites) |
26
+ | **Set up on Windows (Native)** | [Windows Native Setup](WINDOWS-SETUP.md) |
27
+ | **Set up on Windows (Claude Desktop/WSL)** | [Windows WSL Guide](mcp-server/WINDOWS_SETUP.md) |
28
+ | **Use with OpenClaw** | [OpenClaw Integration](#-openclaw-integration) |
29
+ | **Use natural language** | [MCP Setup](docs/mcp-setup.md) |
30
+ | **Switch voices** | [Voice Library](docs/voice-library.md) |
31
+ | **Configure BMAD Party Mode** (agents with unique voices) | [BMAD Plugin & Party Mode](#-bmad-plugin) |
32
+ | **Fix issues** (git-lfs? MCP tokens? Read this!) | [Troubleshooting](docs/troubleshooting.md) & [FAQ](#-frequently-asked-questions-faq) |
33
+
34
+ ---
35
+
36
+ ## ✨ What is AgentVibes?
37
+
38
+ **AgentVibes adds lively voice narration to your Claude AI sessions!**
39
+
40
+ Whether you're coding in Claude Code, chatting in Claude Desktop, or running OpenClaw — AgentVibes brings AI to life with professional voices and personalities.
41
+
42
+ ---
43
+
44
+ ## 🧭 NEW IN v4.6.6 — Natural TUI Navigation
45
+
46
+ The Settings TUI now flows the way you'd expect. Down moves top-to-bottom through header → sub-tabs → content → footer. Left/Right switches sub-tabs and moves between footer buttons. Up from content returns to the active sub-tab rather than always jumping to Voice. The Language tab has a proper scrollable list. The Readme tab falls back to the AgentVibes package README when no local one exists. Escape from the installer no longer gets stuck.
47
+
48
+ ---
49
+
50
+ ## 🔧 NEW IN v4.6.5 — Line Endings, TUI Non-Interactive Hint, Release Process
51
+
52
+ - **`.gitattributes`** enforces `LF` for shell scripts/JS/JSON/markdown, `CRLF` for PowerShell; stops `bin/` files showing as modified on Windows
53
+ - **TUI non-interactive hint** — installer header now shows a two-tone hint on row 2: `Skip this UI?` (dim) + `npx agentvibes install --non-interactive` (brighter), matching the `[piper] [en_US-ryan-high]` footer aesthetic
54
+
55
+ ---
56
+
57
+ ## 🐛 NEW IN v4.6.4 - CI & macOS Fixes
58
+
59
+ - **macOS `mktemp` fixed** — 12 calls now use BSD-compatible syntax (XXXXXX at end, then rename to add extension)
60
+ - **CI test suite green** - macOS path symlink, execute permission, and parallel `mktemp` race all fixed
61
+
62
+ ---
63
+
64
+ ## 🐛 NEW IN v4.6.3 - Party Mode Correct Voices
65
+
66
+ - **Party mode agents speak with their configured voices** - `bmad-party-speak.ps1` was extracting the trailing number from the speaker display name suffix (e.g. `14` from `Yara-14`) and passing it as the Piper `--speaker` index. That number is a human-readable disambiguator, not the model index: `Yara-14` is actually speaker 860. The script now looks up the full name in `speaker_id_map` from the `.onnx.json` file, matching what `play-tts-piper.ps1` already did. Before the fix, every configured agent silently played a different voice.
67
+
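The lookup described in the v4.6.3 note can be sketched in JavaScript (the shipped fix is in PowerShell). This is a minimal illustration, assuming the parsed `.onnx.json` exposes a `speaker_id_map` object keyed by speaker name; `resolveSpeakerId` and `sampleConfig` are hypothetical names, not part of AgentVibes.

```javascript
// Hypothetical helper: resolve a Piper speaker index from a display name.
// Assumes `modelConfig` is the parsed `.onnx.json` with a `speaker_id_map`.
function resolveSpeakerId(modelConfig, speakerName) {
  const map = (modelConfig && modelConfig.speaker_id_map) || {};
  if (Object.prototype.hasOwnProperty.call(map, speakerName)) {
    return map[speakerName];
  }
  // Deliberately no fallback to the numeric suffix: it is not the model index.
  return null;
}

// Illustrative data only, using the name/index pair quoted in the release note.
const sampleConfig = { speaker_id_map: { 'Yara-14': 860 } };
const speakerId = resolveSpeakerId(sampleConfig, 'Yara-14');
```

Returning `null` for an unknown name, rather than guessing from the suffix, keeps a misconfigured agent from silently playing the wrong voice.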
68
+ ---
69
+
70
+ ## 🐛 NEW IN v4.6.2 — Party Mode Voices, LibriTTS Speaker Fix, Agent Pretext
71
+
72
+ - **Party mode agents now speak in their unique voices** — SKILL.md wired to `bmad-speak.ps1` per agent
73
+ - **LibriTTS speaker IDs resolved correctly** - `Holly-7` is speaker 322, not 7
74
+ - **Agent pretext spoken on Windows** — "Mary, Business Analyst here." before every response
75
+ - **`parseMultiSpeaker` fallback** — works on fresh installs before `.onnx.json` is patched
76
+
77
+ ---
78
+
79
+ ## 🌟 NEW IN v4.6.1 - Party Mode Voice Clarity + Agent Config UI Polish
80
+
81
+ ### 🔊 Voice Volume Fixed in Party Mode
82
+
83
+ - **`normalize=0`** added to ffmpeg `amix` - prevents voices from being attenuated to 50% when mixed with background music
84
+ - **Voice boost `volume=1.5`** applied to every TTS stream — agents are now loud and clear
85
+ - **Music intro reduced to 1 second** (`adelay=1000`) - less dead air before each agent speaks
86
+ - **Pre-synthesis gap reduction** — WAV files are generated *before* acquiring the mutex, so synthesis overlaps with the previous agent's playback (gap drops from ~4–6s to ~1s)
87
+
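To make the three audio settings above concrete, here is a hedged sketch of how such a filter graph could be assembled. It is not the project's actual script: `buildPartyMixFilter`, the stream labels, and the `duration=first` choice are assumptions; only `normalize=0`, `volume=1.5`, and `adelay=1000` come from the notes above.

```javascript
// Hypothetical helper: build an ffmpeg -filter_complex string that mixes one
// voice stream (input 0) over one background music stream (input 1).
function buildPartyMixFilter({ introMs = 1000, voiceBoost = 1.5, musicVolume = 0.2 } = {}) {
  return [
    // Delay the voice so the music gets a short intro, then boost it.
    `[0:a]adelay=${introMs}|${introMs},volume=${voiceBoost}[voice]`,
    // Keep the music quietly underneath the voice.
    `[1:a]volume=${musicVolume}[music]`,
    // normalize=0 stops amix from automatically scaling its inputs down.
    '[voice][music]amix=inputs=2:duration=first:normalize=0[out]',
  ].join(';');
}

const partyFilter = buildPartyMixFilter();
```

The resulting string would be passed as `ffmpeg -i voice.wav -i music.mp3 -filter_complex "<filter>" -map "[out]" mixed.wav`; the file names are placeholders, and the `normalize` option needs a reasonably recent ffmpeg build.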
88
+ ### 🎛️ BMAD Agent Config — Preview + Split Fields
89
+
90
+ - **Music Track** and **Music Vol** are now separate fields in the agent editor — each opens its own dialog
91
+ - **Preview button** plays the selected voice with full effects: personality, reverb, background music track and volume
92
+ - **Blinking indicator** (`►█`) highlights the focused button — reuses the shared `attachBtnBlink` utility
93
+ - **Preview spinner** animates while audio is playing
94
+ - **Tab→Save hint** shown in the volume input dialog
95
+
96
+ ### 🚻 Voice Gender Auto-Assign Fixed
97
+
98
+ - `inferGender` now strips the numeric suffix from LibriTTS speaker names (e.g. `anna-9` → `anna`) before looking up gender
99
+ - Expanded `GENDER_MAP` with 60+ first names covering all bundled voices
100
+ - `libritts` blanket-male override removed — LibriTTS voices are now inferred per-name
101
+
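The suffix-stripping step can be illustrated with a small sketch. The two map entries and the `'unknown'` fallback are assumptions made for the example; the real `GENDER_MAP` is described above as covering 60+ names, and the real `inferGender` may differ.

```javascript
// Illustrative subset only; entries here are assumed, not copied from the package.
const GENDER_MAP = new Map([
  ['anna', 'female'],
  ['ryan', 'male'],
]);

// Strip a trailing "-<digits>" suffix (e.g. "anna-9" -> "anna"), then look up
// the remaining first name. Unknown names fall back to 'unknown'.
function inferGender(speakerName) {
  const baseName = String(speakerName).replace(/-\d+$/, '').toLowerCase();
  return GENDER_MAP.get(baseName) ?? 'unknown';
}
```

Inferring per name, with an explicit fallback, is what replaces the blanket LibriTTS override mentioned above.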
102
+ ### 🐛 Other Fixes
103
+
104
+ - Volume dialog text now uses `cyan`/`white` — no more invisible-on-dark-background instructions
105
+ - After saving agent settings, focus correctly returns to the agent list (Enter re-opens the agent)
106
+ - Boundary navigation in agent fields no longer jumps to buttons prematurely
107
+
108
+ ---
109
+
110
+ ## 🌟 NEW IN v4.6 — Party Mode Auto-Install + Volume Fix
111
+
112
+ ### 🎉 BMAD Party Mode TTS - Zero Setup
113
+
114
+ Every agent now speaks automatically in any BMAD project, with no manual hook configuration needed:
115
+
116
+ - Installer copies `bmad-party-speak.sh` (Linux/macOS/WSL) or `bmad-party-speak.ps1` (Windows) to `~/.claude/hooks/`
117
+ - `PostToolUse` hook registered in `~/.claude/settings.json` automatically
118
+ - `npx agentvibes update` keeps the scripts fresh across all platforms
119
+
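For readers curious what "registered automatically" involves, the following is a rough sketch of an idempotent registration step. It assumes the Claude Code settings layout of a `hooks.PostToolUse` array holding `{ matcher, hooks }` entries; `registerPostToolUseHook` and the `'*'` matcher are placeholders, not the installer's real code.

```javascript
// Hypothetical helper: add a PostToolUse command hook to a parsed settings
// object without duplicating it on repeated installs or updates.
function registerPostToolUseHook(settings, command, matcher) {
  const next = { ...settings, hooks: { ...(settings.hooks || {}) } };
  const entries = [...(next.hooks.PostToolUse || [])];
  const alreadyRegistered = entries.some((entry) =>
    (entry.hooks || []).some((hook) => hook.command === command)
  );
  if (!alreadyRegistered) {
    entries.push({ matcher, hooks: [{ type: 'command', command }] });
  }
  next.hooks.PostToolUse = entries;
  return next; // the input object is left untouched
}

const hookCommand = '~/.claude/hooks/bmad-party-speak.sh';
const once = registerPostToolUseHook({}, hookCommand, '*');
const twice = registerPostToolUseHook(once, hookCommand, '*');
```

Checking for an existing entry before pushing is what lets an update run repeatedly without stacking duplicate hooks.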
120
+ ### 🔊 Background Music Volume Default: 20%
121
+
122
+ All volume defaults lowered from 70% to 20%, so new installs and agents start at a sensible level. `bmad-speak` scripts now inherit the global volume setting instead of ignoring it.
123
+
124
+ ### 🐛 Installer Navigation Fix
125
+
126
+ Pressing on the completion screen no longer jumps back to the installation step.
127
+
128
+ ### 🧪 628 Tests, Zero Failures
129
+
130
+ ---
131
+
132
+ ## 🌟 v4.5 — "Speak Every Language" Release
133
+
134
+ ### 🌍 Multilingual TUI — 9 Languages
135
+
136
+ Every screen, button, and label in `npx agentvibes` is now fully translated:
137
+
138
+ - **English, Spanish, French, German, Portuguese, Japanese, Korean, Chinese (Simplified), Italian**
139
+ - Language selection on first launch - pick your language before anything else
140
+ - Language sub-tab in Settings — switch live, no restart needed
141
+ - All tab labels, buttons, footer hints, status messages, and BMAD/Receiver tabs translated
142
+ - Per-language i18n files (`src/i18n/en.js`, `es.js`, `fr.js`, ...) with English fallback
143
+
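The English-fallback behaviour can be pictured with a minimal sketch. The `messages` table and `translate` function are hypothetical stand-ins for the per-language files; falling back to the key itself when English also lacks it is an assumption.

```javascript
// Hypothetical message tables; the real ones live in per-language files.
const messages = {
  en: { save: 'Save', cancel: 'Cancel' },
  es: { save: 'Guardar' }, // 'cancel' intentionally missing
};

const hasKey = (table, key) =>
  Boolean(table) && Object.prototype.hasOwnProperty.call(table, key);

// Look up a key in the active language, then in English, then echo the key.
function translate(lang, key) {
  if (hasKey(messages[lang], key)) return messages[lang][key];
  if (hasKey(messages.en, key)) return messages.en[key];
  return key;
}
```

With this shape, a partially translated language still renders every label, which fits a live language switch with no restart.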
144
+ ### 🪟 Windows Security Hardening
145
+
146
+ - **Unpredictable temp files** — `randomUUID()` replaces `Date.now()` in all temp filenames (JS + PowerShell)
147
+ - **No shell injection** — `spawnSync` replaces `execSync(..., { shell: true })` for `which` lookups
148
+ - **Smart music player detection** — `detectMp3Player()` replaces hardcoded `ffplay` on Windows
149
+ - **Boolean fix** - `isWindowsTerminal` now returns `true/false`, not the `WT_SESSION` UUID string
150
+
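Three of the hardening items above map onto short Node.js patterns. The sketch below is illustrative only: the function names other than `isWindowsTerminal`, the `agentvibes-` file prefix, and the use of `where`/`which` are assumptions, not the package's code.

```javascript
const { randomUUID } = require('node:crypto');
const { spawnSync } = require('node:child_process');
const os = require('node:os');
const path = require('node:path');

// Unpredictable temp path: a random UUID instead of a guessable timestamp.
function tempAudioPath(extension = '.wav') {
  return path.join(os.tmpdir(), `agentvibes-${randomUUID()}${extension}`);
}

// Look up an executable without a shell: the name is passed as an argv entry,
// so it is never interpreted as shell syntax.
function findExecutable(name) {
  const lookup = process.platform === 'win32' ? 'where' : 'which';
  const result = spawnSync(lookup, [name], { encoding: 'utf8' });
  if (result.status !== 0 || !result.stdout) return null;
  return result.stdout.split(/\r?\n/)[0].trim() || null;
}

// Coerce the environment variable to a real boolean instead of returning
// the session identifier string itself.
function isWindowsTerminal(env = process.env) {
  return Boolean(env.WT_SESSION);
}
```

Passing arguments as an array is the key design choice: there is no command string for a hostile file or voice name to break out of.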
151
+ ### 🎙️ Cross-Platform BMAD Speak
152
+
153
+ BMAD (Build More Architect Dreams) is an AI multi-agent framework where specialized agents (Architect, PM, Developer, QA, and Analyst) collaborate to build software. With this release, every agent in a BMAD party mode session speaks aloud on Windows with its own unique voice, personality, and music, making each role instantly recognizable.
154
+
155
+ - `bmad-speak.js` — cross-platform entry point; auto-routes to PowerShell on Windows or bash on Mac/Linux
156
+ - `bmad-speak.ps1` — native Windows BMAD speak with per-agent personality routing
157
+
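The auto-routing idea can be sketched as a pure function that picks an interpreter per platform. This is a guess at the shape, not the contents of `bmad-speak.js`: the bash script name `bmad-speak.sh`, the argument order, and the PowerShell flags chosen here are all assumptions.

```javascript
// Hypothetical sketch: choose the interpreter and script for a platform.
// `platform` uses Node's process.platform values ('win32', 'darwin', 'linux').
function speakCommand(platform, agentName, text) {
  if (platform === 'win32') {
    return {
      command: 'powershell.exe',
      args: ['-NoProfile', '-ExecutionPolicy', 'Bypass', '-File', 'bmad-speak.ps1', agentName, text],
    };
  }
  // macOS and Linux share the bash implementation in this sketch.
  return { command: 'bash', args: ['bmad-speak.sh', agentName, text] };
}
```

The returned pair could be handed to `child_process.spawn(command, args)`; keeping the decision in a pure function makes it testable on any platform.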
158
+ ### 🧪 600 Tests, Zero Failures
159
+
160
+ ---
161
+
162
+ ## 🌟 v4.4 — Full Platform Parity Release
163
+
164
+ ### 🪟 Windows MCP Parity — 27/27 Tools Working
165
+
166
+ All MCP tools now work natively on Windows. Previously 12 tools silently failed due to missing scripts:
167
+
168
+ - **6 new PowerShell scripts** — personality-manager, speed-manager, language-manager, learn-manager, verbosity-manager, clean-audio-cache
169
+ - **Unified provider naming** — `piper` and `sapi` on all platforms (no more `windows-piper`/`windows-sapi`)
170
+ - **replay command** added to voice-manager for Windows
171
+ - **Adversarial review** — 24 issues found, 10 fixed (3 CRITICAL, 4 HIGH, 3 MEDIUM)
172
+ - **28 new tests** covering script parity, effects round-trip, provider management, and naming consistency
173
+ - **Feature-platform matrix** — [docs/feature-platform-matrix.md](docs/feature-platform-matrix.md) tracks all 85 features across Linux, macOS, Windows, and WSL
174
+
175
+ ### Bug Fixes (HIGH)
176
+ - ffmpeg stderr redirected to temp file instead of literal `"NUL"` file
177
+ - `AGENTVIBES_NO_PLAY` env var properly cleaned up on error paths
178
+ - `PIPER_SPEAKER` env var no longer leaks between voice switches
179
+ - Provider config now uses project-local `.claude` (not always global)
180
+ - Text sanitization relaxed: `$50 (USD)` no longer becomes `50 USD`
181
+
182
+ ---
183
+
184
+ ## 🌟 v4.3 — Windows Parity + BMAD Party Mode
185
+
186
+ ### 🎭 BMAD Party Mode — Every Agent Has Its Own Voice
187
+
188
+ The BMad Method (Build More Architect Dreams) is an AI-driven development framework that helps you build software from ideation through agentic implementation with specialized AI agents, guided workflows, and intelligent planning that adapts to your project's complexity.
189
+
190
+ **Every BMAD agent now speaks with their own unique voice, music, and personality.**
191
+
192
+ When party mode runs a multi-agent discussion, the Architect, PM, Developer, QA, and Analyst each sound completely different — making every role immediately recognizable.
193
+
194
+ **Auto-enabled** - if BMAD is installed, party mode activates automatically. Open the BMad Tab to configure each agent:
195
+
196
+ ```bash
197
+ npx agentvibes # Press B to open the BMad Tab
198
+ ```
199
+
200
+ **Per-agent configuration:**
201
+ - 🎙️ **Voice** — 914 voices to choose from, auto-assigned gender-aware
202
+ - 🎵 **Background Music** — Unique ambient track per agent (cinematic, lo-fi, jazz...)
203
+ - 🎚️ **Music Volume** — Per-agent level, or set all at once via Bulk Edit
204
+ - 🎛️ **Reverb** — none / room / hall / cathedral / studio per agent
205
+ - 💬 **Pretext** - Custom intro phrase ("Winston says:..." before every line)
206
+ - 🎭 **Personality** - sarcastic, dramatic, pirate, cheerful, and more
207
+ - 🔇 **No Overlap** - Speech lock ensures agents never talk over each other
208
+ - **Markdown-Clean** - Asterisks and formatting stripped before TTS
209
+
210
+ ### 🎛️ BMad Tab — Visual Agent Configurator
211
+
212
+ The `npx agentvibes` TUI now includes a full **BMad Tab** for managing every agent visually — inspired by the Voices tab, with the same columns and navigation polish:
213
+
214
+ ```bash
215
+ npx agentvibes # Press B for BMad Tab
216
+ ```
217
+
218
+ | Agent | Voice | Gender | Provider | Reverb | Music | Vol | Pretext |
219
+ |-------|-------|--------|----------|--------|-------|-----|---------|
220
+ | 🏢 Winston | Rose Ibex | Female | Piper (LibriTTS) | studio | jazz | 65% | Winston says |
221
+ | 🧠 Larry | Kusal | Male | Piper | hall | cinematic | 80% | Larry says |
222
+
223
+ **Highlights:**
224
+ - **Beautified voice names** — `16Speakers::Rose_Ibex` shows as `Rose Ibex`; `en_US-kusal-medium` shows as `Kusal`
225
+ - **Gender & Provider columns** — see voice metadata at a glance, just like the Voices tab
226
+ - **Inline row hints** - navigate to any agent and see `[Space] Preview [Enter] Configure` on the row itself
227
+ - **Preview spinner** — animated `⠋⠙⠹⠸` braille spinner while audio plays
228
+
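The two beautification examples above suggest a simple naming rule. The sketch below reproduces just those two cases; `beautifyVoiceName` and its regular expression are hypothetical and may not match how the TUI handles other voice IDs.

```javascript
// Hypothetical helper reproducing the two documented examples.
function beautifyVoiceName(voiceId) {
  // Multi-speaker form "Model::Speaker_Name": keep the speaker part.
  if (voiceId.includes('::')) {
    return voiceId.split('::').pop().replace(/_/g, ' ');
  }
  // Piper form "<locale>-<name>-<quality>": keep and capitalize the name.
  const match = /^[a-z]{2}_[A-Z]{2}-(.+)-[a-z_]+$/.exec(voiceId);
  if (!match) return voiceId; // unknown shape: show the ID unchanged
  const name = match[1].replace(/_/g, ' ');
  return name.charAt(0).toUpperCase() + name.slice(1);
}
```

Falling through to the raw ID for unrecognized shapes means a new voice format degrades to an ugly label rather than a wrong one.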
229
+ | Key | Action |
230
+ |-----|--------|
231
+ | `↑↓` / `jk` | Navigate agents |
232
+ | `Space` | Preview agent (spinner shows while playing) |
233
+ | `Enter` | Configure voice, music, volume, reverb, personality, pretext |
234
+ | `A` | Auto-assign unique voices (gender-aware, no repeats) |
235
+ | `B` | Bulk Edit — set music / volume / pretext / reverb for all agents |
236
+ | `X` | Reset agent to defaults |
237
+
238
+ ---
239
+
240
+ ### 🖥️ SSH Receiver - Hear Your Headless Server
241
+
242
+ **Run Claude on a cloud box and hear the TTS on your local machine.**
243
+
244
+ The new **Receiver Tab** streams TTS audio from voiceless remote servers to your local machine over TCP — perfect for AWS/GCP dev boxes, WSL2, and SSH sessions.
245
+
246
+ ```bash
247
+ # On your local machine — open TUI, go to Receiver tab, click Start
248
+ npx agentvibes
249
+
250
+ # On the remote server: AgentVibes auto-detects the receiver and streams
251
+ ```
252
+
253
+ Zero-config forwarding. Works with Piper, macOS Say, and Soprano.
254
+
255
+ ---
256
+
257
+ ### ⚡ TTS Latency Reduced by ~1 Second
258
+
259
+ - **Batched Node.js calls** — 6 separate profile reads collapsed into 1 (~900ms saved)
260
+ - **inotifywait queue** — file-event-based worker, no polling delay
261
+ - **Background cache cleanup** — off the critical path every 10th call
262
+
263
+ ---
264
+
265
+ ### 🎨 ANSI Banner Colors + Toggle
266
+
267
+ Full color in the TTS banner (gold voice, cyan reverb, traffic-light cache). Hide it without muting:
268
+
269
+ ```bash
270
+ touch ~/.agentvibes/banner-disabled # or say "turn off the TTS banner"
271
+ ```
272
+
273
+ ---
274
+
275
+ ### 💬 Intro Text (Pretext) - Your Personal AI Branding
276
+
277
+ **Add custom prefixes to every TTS announcement!**
278
+
279
+ Configure via the AgentVibes TUI Settings tab:
280
+
281
+ ```bash
282
+ npx agentvibes # Navigate to Settings tab
283
+ ```
284
+
285
+ Transform generic AI responses into your personal brand:
286
+
287
+ **Before:**
288
+ ```
289
+ "Starting analysis of the codebase..."
290
+ ```
291
+
292
+ **After (with "FireBot: " intro text):**
293
+ ```
294
+ "FireBot: Starting analysis of the codebase..."
295
+ ```
296
+
297
+ **Perfect for:**
298
+ - 🤖 **Personal AI Branding** - Make Claude sound like your custom assistant
299
+ - 🏢 **Team Identity** - Company bots with branded voices
300
+ - 🎮 **Character Roleplay** - Gaming assistants with character names
301
+ - 🎓 **Teaching Contexts** - Professor Bot, Tutor AI, etc.
302
+
303
+ **Features:**
304
+ - Up to 50 characters
305
+ - UTF-8 and emoji support 🎉
306
+ - Set during installation or anytime after
307
+ - Works with all TTS providers
308
+ - Applies to every single announcement
309
+
310
+ **Examples:**
311
+ - `"JARVIS: "` - Iron Man style
312
+ - `"🤖 Assistant: "` - With emoji
313
+ - `"CodeBot: "` - Development assistant
314
+ - `"Chef AI: "` - Cooking helper
315
+
316
+ Configure via: `npx agentvibes` → Settings tab
317
+
318
+ ---
319
+
320
+ ### 🎵 Custom Background Music - Complete Audio Control
321
+
322
+ **Upload your own background music with battle-tested security!**
323
+
324
+ Configure via the AgentVibes TUI Music tab:
325
+
326
+ ```bash
327
+ npx agentvibes # Navigate to Music tab
328
+ ```
329
+
330
+ Replace the default background tracks with your own audio files.
331
+
332
+ **Supported Formats:**
333
+ - 🎵 MP3 (.mp3)
334
+ - 🎵 WAV (.wav)
335
+ - 🎵 OGG (.ogg)
336
+ - 🎵 M4A (.m4a)
337
+
338
+ **Security First:**
339
+ - ✅ **180+ attack variations tested** - Path traversal, symlinks, Unicode tricks
340
+ - ✅ **100% attack rejection rate** - Every malicious attempt blocked
341
+ - **OWASP CWE-22 compliant** - Industry-standard security
342
+ - **7 validation layers** - Defense-in-depth architecture
343
+ - **File ownership verification** - Only your files accepted
344
+ - **Magic number validation** - Real audio files only
345
+ - **Secure storage** - 600 permissions, restricted directory
346
+
347
+ **Smart Validation:**
348
+ - Recommended duration: 30-90 seconds (optimal looping)
349
+ - Maximum: 300 seconds (5 minutes)
350
+ - Maximum size: 50MB
351
+ - Automatic format detection
352
+ - Duration warnings for non-optimal lengths
353
+
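Two of the checks listed above, magic-number validation and the size cap, are easy to illustrate. This sketch is not the audited validator and covers none of the path, ownership, or permission layers; the function names are hypothetical, and reading "50MB" as 50 × 1024 × 1024 bytes is an assumption.

```javascript
const MAX_BYTES = 50 * 1024 * 1024; // assumed interpretation of the 50MB limit

// Identify an audio container from its leading bytes; returns null if unknown.
// `header` is a Buffer holding at least the first 12 bytes of the file.
function sniffAudioFormat(header) {
  const ascii = (start, end) => header.toString('latin1', start, end);
  if (header.length >= 12 && ascii(0, 4) === 'RIFF' && ascii(8, 12) === 'WAVE') return 'wav';
  if (header.length >= 4 && ascii(0, 4) === 'OggS') return 'ogg';
  // ISO base media files (including M4A) carry 'ftyp' at byte offset 4.
  if (header.length >= 8 && ascii(4, 8) === 'ftyp') return 'm4a';
  if (header.length >= 3 && ascii(0, 3) === 'ID3') return 'mp3';
  // Bare MPEG audio frame: the first 11 bits are all set.
  if (header.length >= 2 && header[0] === 0xff && (header[1] & 0xe0) === 0xe0) return 'mp3';
  return null;
}

function validateUpload(header, sizeBytes) {
  if (sizeBytes > MAX_BYTES) return { ok: false, reason: 'too large' };
  const format = sniffAudioFormat(header);
  return format ? { ok: true, format } : { ok: false, reason: 'unrecognized audio header' };
}
```

Checking content bytes rather than the file extension is what stops a renamed non-audio file from being accepted.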
354
+ **Perfect for:**
355
+ - 🎮 **Making coding fun** - Your favorite beats while you build
356
+ - 🎼 **Setting the mood** - Match the music to the task (lo-fi for debugging, epic for shipping)
357
+ - 🗂️ **Identifying projects** - Different track per repo so you always know which project Claude is in
358
+ - 🎹 **Deep focus** - Ambient or classical to stay in flow
359
+
360
+ **Features:**
361
+ - Preview before setting
362
+ - One-command upload
363
+ - Works with all TTS providers
364
+ - Loops seamlessly under voice
365
+ - Easy restore to defaults
366
+
367
+ **Menu Options:**
368
+ 1. Change music - Upload new audio file
369
+ 2. Remove music - Clear custom music
370
+ 3. Reset to default - Restore built-in tracks (16 genres)
371
+ 4. Enable/Disable - Toggle background music
372
+ 5. Preview current - Sample your music
373
+
374
+ Configure via: `npx agentvibes` → Music tab
375
+
376
+ **Security Certified:** See full audit report at `docs/security/SECURITY-AUDIT.md`
377
+
378
+ ---
379
+
380
+ ### 🎯 Key Features
381
+
382
+ **🌟 v4.2 - BMAD Party Mode & SSH Receiver:**
383
+ - 🎭 **BMAD Party Mode Voices** — Each agent speaks with their unique voice, music, reverb, personality
384
+ - 🖥️ **SSH Receiver Tab** — Stream TTS audio from headless servers to your local machine over TCP
385
+ - 🎛️ **BMad Tab (TUI)** - Visual agent configurator with auto-assign and bulk edit
386
+ - **TTS Latency -1s** - Batched Node.js calls, inotifywait queue, background cleanup
387
+ - 🎨 **ANSI Banner Colors Restored** — Gold/cyan/traffic-light colors in TTS info banner
388
+ - 🔕 **Banner Toggle** - Hide TTS banner without muting (`~/.agentvibes/banner-disabled`)
389
+ - 🔇 **No Party Mode Overlap** - Agents wait for full audio before next speaks
390
+ - 🧹 **Markdown-Clean Speech** - Asterisks/formatting stripped automatically from party mode
391
+
392
+ **🌟 NEW IN v3.6.0 — Voice Explorer Release:**
393
+ - 🏷️ **Friendly Voice Names** - "Ryan" instead of "en_US-libritts_r-medium-speaker-123"
394
+ - 💬 **Intro Text (Pretext)** - Custom prefix for all TTS ("FireBot: Starting...")
395
+ - 🎵 **Custom Background Music** - Upload your own audio files with battle-tested security
396
+ - 🎨 **Interactive Installer** - Preview voices and music during installation
397
+ - 🛡️ **Security Hardening** - 180+ attack variations tested, 100% blocked, OWASP compliant
398
+
399
+ **🪟 NEW IN v3.5.5 - Native Windows Support:**
400
+ - 🖥️ **Windows Native TTS** - Soprano, Piper, and Windows SAPI providers. No WSL required!
401
+ - 🎵 **Background Music** - 16 genre tracks mixed under voice
402
+ - 🎛️ **Reverb & Audio Effects** - 5 reverb levels via ffmpeg
403
+ - 🔊 **Verbosity Control** - High, Medium, or Low settings
404
+ - 🎨 **Beautiful Installer** - `npx agentvibes install` or `.\setup-windows.ps1`
405
+
406
+ **⚡ v3.4.0 Highlights:**
407
+ - 🎤 **Soprano TTS Provider** - Ultra-fast neural TTS with 20x CPU, 2000x GPU acceleration (thanks [@nathanchase](https://github.com/nathanchase)!)
408
+ - 🛡️ **Security Hardening** - 9.5/10 score with comprehensive validation and timeouts
409
+ - 🌐 **Environment Intelligence** - PulseAudio tunnel auto-detection for SSH scenarios
410
+
411
+ **⚡ Core Features:**
412
+ - **One-Command Install** - Get started in 30 seconds (`npx agentvibes install` or `.\setup-windows.ps1` without Node.js)
413
+ - 🎭 **Multi-Provider Support** - Soprano (neural), Piper TTS (50+ free voices), macOS Say (100+ built-in), or Windows SAPI
414
+ - 🎙️ **27+ Professional AI Voices** - Character voices, accents, and unique personalities
415
+ - 🎙️ **Verbosity Control** - Choose how much Claude speaks (LOW, MEDIUM, HIGH)
416
+ - 🎙️ **AgentVibes MCP** - Natural language control ("Switch to Aria voice") for Claude Code & Desktop
417
+ - 🔊 **SSH Audio Optimization** - Auto-detects remote sessions and eliminates static (VS Code Remote SSH, cloud dev)
418
+
419
+ **🎭 Personalization:**
420
+ - 🎭 **19 Built-in Personalities** - From sarcastic to flirty, pirate to dry humor
421
+ - 💬 **Advanced Sentiment System** - Apply personality styles to ANY voice without changing it
422
+ - 🎵 **Voice Preview & Replay** - Listen before you choose, replay last 10 TTS messages
423
+
424
+ **🚀 Integrations & Power Features:**
425
+ - 🔌 **Enhanced BMAD Plugin** - Auto voice switching for BMAD agents with multilingual support
426
+ - 🔊 **Live Audio Feedback** - Hear task acknowledgments and completions in any language
427
+ - 🌍 **30+ Languages** - Multilingual support with native voice quality
428
+ - 🆓 **Free & Open** - Use Piper TTS with no API key required
429
+
430
+ ### 🤗 Hugging Face AI Voice Models
431
+
432
+ **AgentVibes' Piper TTS voices are all AI voice models hosted on Hugging Face**, from [rhasspy/piper-voices](https://huggingface.co/rhasspy/piper-voices).
433
+
434
+ **What are Hugging Face voice models?**
435
+
436
+ Hugging Face voice models are pre-trained artificial intelligence models hosted on the Hugging Face Model Hub platform, designed to convert text into human-like speech (Text-to-Speech or TTS) or perform other speech tasks like voice cloning and speech-to-speech translation. They're accessible via their Transformers library for easy use in applications like voice assistants, audio generation, and more.
437
+
438
+ **Key Benefits:**
439
+ - 🎯 **Human-like Speech** - VITS-based neural models for natural pronunciation and intonation
440
+ - 🌍 **35+ Languages** - Multilingual support with native accents
441
+ - 🆓 **100% Open Source** - All Piper voices are free, VITS-based models hosted on Hugging Face
442
+ - 🔧 **Developer-Friendly** - Fine-tune, customize, or deploy for various audio projects
443
+ - **Offline & Fast** - No API keys, no internet needed once installed
444
+
445
+ All 50+ Piper voices AgentVibes provides are sourced from Hugging Face's open-source AI voice models, ensuring high-quality, natural-sounding speech synthesis across all supported platforms.
446
+
447
+ ---
448
+
449
+ ## 📑 Table of Contents
450
+
451
+ ### Getting Started
452
+ - [🚀 Quick Start](#-quick-start) - Get voice in 30 seconds (3 simple steps)
453
+ - [📱 Android/Termux](#-quick-setup-android--termux-claude-code-on-your-phone) - Run Claude Code on your phone
454
+ - [📋 Prerequisites](#-prerequisites) - What you actually need (Node.js + optional tools)
455
+ - [✨ What is AgentVibes?](#-what-is-agentvibes) - Overview & key features
456
+ - [🌟 NEW FEATURE HIGHLIGHTS](#-new-feature-highlights) - **START HERE!**
457
+ - [🎭 BMAD Party Mode](#-bmad-party-mode--multi-agent-voice-conversations) - Per-agent voices, music, reverb
458
+ - [🖥️ SSH Receiver](#️-agentvibes-receiver--remote-audio-streaming) - Stream audio from headless servers
459
+ - [💬 Intro Text](#-intro-text-pretext---your-personal-ai-branding) - Custom TTS prefixes
460
+ - [🎵 Custom Background Music](#-custom-background-music---complete-audio-control) - Upload your own tracks
461
+ - [📰 Latest Release](#-latest-release) - v4.6.6 TUI navigation & UX polish
462
+ - [🪟 Windows Setup Guide for Claude Desktop](mcp-server/WINDOWS_SETUP.md) - Complete Windows installation with WSL & Python
463
+
464
+ ### AgentVibes MCP (Natural Language Control)
465
+ - [🎙️ AgentVibes MCP Overview](#%EF%B8%8F-agentvibes-mcp) - **Easiest way** - Natural language commands
466
+ - [For Claude Desktop](docs/mcp-setup.md#for-claude-desktop) - Windows/WSL setup, Python requirements
467
+
468
+ - [For Claude Code](docs/mcp-setup.md#for-claude-code) - Project-specific setup
469
+
470
+ ### Core Features
471
+ - [🎤 Commands Reference](#-commands-reference) - All available commands
472
+ - [🎙️ Verbosity Control](#%EF%B8%8F-verbosity-control) - Control how much Claude speaks (low/medium/high)
473
+ - [🎭 Personalities vs Sentiments](#-personalities-vs-sentiments) - Two systems explained
474
+ - [🗣️ Voice Library](#%EF%B8%8F-voice-library) - 914 voices with friendly names
475
+ - [🔌 BMAD Plugin](#-bmad-plugin) - Auto voice switching for BMAD agents
476
+ - [🎙️ AgentVibes Receiver - NEW!](#%EF%B8%8F-agentvibes-receiver-remote-audio-streaming-from-voiceless-servers) - Remote audio streaming from voiceless servers
477
+
478
+ ### Integrations & Platforms
479
+ - [🤖 OpenClaw Integration](#-openclaw-integration) - Use AgentVibes with OpenClaw messaging platform
480
+ - [🎙️ AgentVibes Skill for OpenClaw](#-agentvibes-skill-for-openclaw---what-you-get) - 50+ voices, effects, personalities for OpenClaw
481
+ - [📱 AgentVibes Receiver](#-agentvibes-receiver-local-phone-) - Remote audio on phones/local machines
482
+
483
+ ### Advanced Topics
484
+ - [📦 Installation Structure](#-installation-structure) - What gets installed
485
+ - [💡 Common Workflows](#-common-workflows) - Quick examples
486
+ - [🔧 Advanced Features](#-advanced-features) - Custom voices & personalities
487
+ - [🔊 Remote Audio Setup](#-remote-audio-setup) - Play TTS from remote servers
488
+ - [🛠️ Technical Documentation](#️-technical-documentation) - Audio architecture, cross-platform support, voice resolution
489
+ - [🚨 Security Hardening Guide](docs/security-hardening-guide.md) - **REQUIRED if running OpenClaw on remote server**: SSH hardening, Fail2Ban, Tailscale, UFW, AIDE
490
+ - [🔬 Technical Deep Dive](docs/technical-deep-dive.md) - How AgentVibes works under the hood
491
+ - [❓ Troubleshooting](#-troubleshooting) - Common issues & fixes
492
+
493
+ ### Additional Resources
494
+ - [🔗 Useful Links](#-useful-links) - Voice typing & AI tools
495
+ - [🔄 Updating](#-updating) - Keep AgentVibes current
496
+ - [🗑️ Uninstalling](#️-uninstalling) - Remove AgentVibes cleanly
497
+ - [❓ FAQ](#-frequently-asked-questions-faq) - **NEW!** Common questions answered (git-lfs, MCP tokens, installation)
498
+ - [🍎 macOS Testing](docs/macos-testing.md) - Automated testing on macOS with GitHub Actions
499
+ - [🤗 Hugging Face Voice Models](docs/hugging-face-models.md) - Technical details on AI voice models
500
+ - [🙏 Credits](#-credits) - Acknowledgments
501
+ - [🤝 Contributing](#-contributing) - Show support
502
+
503
+ ---
504
+
505
+ ## 📰 Latest Release
506
+
507
+ **[v4.6.6 - Natural TUI Navigation](https://github.com/paulpreibisch/AgentVibes/releases/tag/v4.6.6)**
508
+
509
+ The Settings TUI now navigates the way you'd expect — arrow keys flow naturally through the interface, the Language tab has a proper scrollable list, and the Readme tab always has something useful to show.
510
+
511
+ ### 🐛 Recent Fixes (v4.6.3 / v4.6.4)
512
+
513
+ - **Party mode correct voices** — agents now speak with their individually configured voices. `bmad-party-speak.ps1` was extracting the trailing number from the display name suffix (e.g. `14` from `Yara-14`) as the Piper speaker index — wrong. Fixed to look up the full speaker name in `speaker_id_map` from the `.onnx.json` file.
514
+ - **macOS CI green** — `mktemp` with extension suffix (e.g. `tts-XXXXXX.wav`) silently fails on BSD mktemp. Fixed all 12 occurrences across the TTS pipeline scripts.
515
+ - **macOS path symlink test fix** - `/var/folders/...` resolved to `/private/var/folders/...` in test assertions.
516
+
517
+ ### 🎭 BMAD Party Mode — Multi-Agent Voice Conversations
518
+
519
+ The BMad Method (Build More Architect Dreams) is an AI-driven development framework module that helps you build software from ideation through agentic implementation with specialized AI agents, guided workflows, and intelligent planning.
520
+
521
+ Every agent in a BMAD discussion now speaks with their own individually configured voice, music, reverb, and personality — making the Architect, PM, Developer, QA, and Analyst immediately recognizable the moment they speak.
522
+
523
+ **Auto-enabled** — party mode activates automatically when BMAD is detected. Configure agents visually:
524
+
525
+ ```bash
526
+ npx agentvibes # Press B for BMad Tab
527
+ ```
528
+
529
+ **Each agent gets:**
530
+ - 🎙️ **Their own voice** — 914 to choose from, or auto-assign gender-aware
531
+ - 🎵 **Their own music track** — cinematic for the Architect, lo-fi for the Dev
532
+ - 🎚️ **Their own volume** — fine-tune per-agent, or bulk-set all at once
533
+ - 🎛️ **Their own reverb** - studio, hall, cathedral, room, or none
534
+ - 💬 **Their own pretext** - "Winston says:..." before every line
535
+ - 🎭 **Their own personality** - sarcastic, dramatic, pirate, cheerful...
536
+ - 🔇 **No overlap** — agents wait for full audio before the next one speaks
537
+ - **Markdown stripped** - no "asterisk asterisk" in TTS output
538
+
539
+ ### 🎛️ BMad Tab - Full Visual Agent Configurator
540
+
541
+ Manage every agent from an interactive table, with the same polish as the Voices tab:
542
+
543
+ | Key | Action |
544
+ |-----|--------|
545
+ | `Space` | Preview agent with full profile (animated spinner while playing) |
546
+ | `Enter` | Configure voice, music, volume, reverb, personality, pretext |
547
+ | `A` | Auto-assign unique voices (gender-aware, no repeats) |
548
+ | `B` | Bulk Edit — set music / volume / pretext / reverb for all agents |
549
+ | `X` | Reset agent to defaults |
550
+
551
+ The table shows **Voice, Gender, Provider, Reverb, Music, Vol, Pretext** columns. Voice names are automatically beautified: `16Speakers::Rose_Ibex` → `Rose Ibex`.
552
+
553
+ ### 🖥️ SSH Receiver — Hear Your Headless Server
554
+
555
+ Stream TTS from a cloud box, WSL2, or any voiceless server directly to your local machine over TCP:
556
+
557
+ ```bash
558
+ # Local: open TUI → Receiver tab → Start
559
+ npx agentvibes
560
+
561
+ # Remote: AgentVibes auto-detects the receiver and streams audio to you
562
+ ```
563
+
564
+ ### ~1 Second Faster TTS
565
+
566
+ - 6 Node.js profile reads collapsed into 1 (~900ms saved per speech)
567
+ - `inotifywait` queue worker — no polling delay
568
+ - Cache cleanup runs off the critical path
569
+
570
+ ### 🎨 ANSI Colors Restored + Banner Toggle
571
+
572
+ Full color in the TTS banner. Silence it without muting audio:
573
+ ```bash
574
+ touch ~/.agentvibes/banner-disabled # or: "turn off the TTS banner" via MCP
575
+ ```
576
+
577
+ ### Quick Install
578
+
579
+ ```bash
580
+ npx agentvibes install
581
+ ```
582
+
583
+ 💡 **Tip:** If `npx agentvibes` shows an older version: `npm cache clean --force && npx agentvibes@latest`
584
+
585
+ 🐛 **Found a bug?** [GitHub Issues](https://github.com/paulpreibisch/AgentVibes/issues)
586
+
587
+ [→ View Complete Release Notes](RELEASE_NOTES.md) | [→ View Previous Release (v4.0.1)](https://github.com/paulpreibisch/AgentVibes/releases/tag/v4.0.1) | [→ View All Releases](https://github.com/paulpreibisch/AgentVibes/releases)
588
+
589
+ [↑ Back to top](#-table-of-contents)
590
+
591
+ ---
592
+
593
+ ## 🎙️ AgentVibes MCP
594
+
595
+ AgentVibes was originally created to give the Claude Code assistant a voice! Simply install it with an npx command in your terminal, and Claude Code can talk back to you.
596
+
597
+ We've now enhanced this capability by adding an MCP (Model Context Protocol) server. This integration exposes AgentVibes' functionality directly to your AI assistant, allowing you to configure and control AgentVibes using natural language instead of typing "/" slash commands.
598
+
599
+ Setting it up is straightforward: just add the MCP server to your Claude Code configuration files.
600
+
601
+ But the convenience doesn't stop there. With the MCP server in place, Claude Desktop can now use AgentVibes too!
602
+
603
+ We're thrilled about this expansion because it means Claude Desktop can finally talk back as well!
604
+
605
+ If you decide to use the MCP server on Claude Desktop, after configuration, give Claude Desktop this command: "every time i give you a command, speak the acknowledgement using agentvibes and the confirmation about what you completed, when done"—and watch the magic happen!
606
+
607
+ **🎯 Control AgentVibes with natural language - no slash commands to remember!**
608
+
609
+ Just say "Switch to Aria voice" or "Speak in Spanish" instead of typing commands.
610
+
611
+ **Works in:** Claude Desktop, Claude Code
612
+
613
+ **[→ View Complete MCP Setup Guide](docs/mcp-setup.md)** - Full setup for all platforms, configuration examples, available tools, and MCP vs slash commands comparison
614
+
615
+ [↑ Back to top](#-table-of-contents)
616
+
617
+ ---
618
+
619
+ ## 🚀 Quick Start - Get Voice in 30 Seconds
620
+
621
+ **3 Simple Steps:**
622
+
623
+ ### 1️⃣ Install
624
+ ```bash
625
+ npx agentvibes install
626
+ ```
627
+
628
+ ### 2️⃣ Choose Provider (Auto-Detected)
629
+ - **macOS**: Native `say` provider (100+ voices) ✨
630
+ - **Linux/WSL**: Piper TTS (50+ free voices) 🎙️
631
+ - **Windows Native**: Soprano, Piper, or SAPI 🪟
632
+ - **Android**: Termux with auto-setup 📱
633
+
634
+ ### 3️⃣ Use in Claude Code
635
+ Just code normally - AgentVibes automatically speaks task acknowledgments and completions! 🔊
636
+
637
+ ---
638
+
639
+ ### TUI Console Commands
640
+
641
+ AgentVibes includes a full **Text User Interface (TUI)** built with blessed.js for managing voices, music, settings, and installation — all from a single interactive console.
642
+
643
+ | Command | Description |
644
+ |---------|-------------|
645
+ | `npx agentvibes` | Smart detection opens Settings if installed, Install if not |
646
+ | `npx agentvibes install` | Open the Install tab directly |
647
+ | `npx agentvibes config` | Open the Settings tab directly |
648
+
649
+ Once inside, use **Tab** / **Shift+Tab** to switch between tabs: **Voices**, **Music**, **BMad**, **Settings**, **Receiver**, and **Install**. Use **[** / **]** to page through voice and music catalogs.
650
+
651
+ ---
652
+
653
+ **🍎 macOS Users (One-Time Setup):**
654
+ ```bash
655
+ brew install bash # Required for bash 5.x features
656
+ ```
657
+ macOS ships with bash 3.2 (from 2007). After this, everything works perfectly!
658
+
659
+ ---
660
+
661
+ **[→ Full Setup Guide](docs/quick-start.md)** - Advanced options, provider switching, and detailed setup
662
+
663
+ [↑ Back to top](#-table-of-contents)
664
+
667
+ ---
668
+
669
+ ## 📋 Prerequisites - What You Actually Need
670
+
671
+ ### Minimum (Core Features)
672
+ **✅ REQUIRED:**
673
+ - **Node.js** ≥16.0 - Check with: `node --version`
674
+
675
+ ### Required for Full Features
676
+ **✅ STRONGLY RECOMMENDED:**
677
+ - **Python** 3.10+ - Needed for Piper TTS voice engine
678
+ - **bash** 5.0+ - macOS only (macOS ships with 3.2 from 2007)
679
+
680
+ ### Optional but Recommended
681
+ **⭕ OPTIONAL (TTS still works without them):**
682
+ - **sox** - Audio effects (reverb, EQ, pitch shifting)
683
+ - **ffmpeg** - Background music, audio padding, RDP compression
684
+
685
+ ### NOT Required (Despite What You've Heard)
686
+ **❌ DEFINITELY NOT NEEDED:**
687
+ - ❌ Git or git-lfs (npm handles everything)
688
+ - ❌ Repository cloning (unless you're contributing code)
689
+ - ❌ Build tools or C++ compilers (pre-built package ready to use)
690
+
691
+ ### Installation Methods
692
+
693
+ | Method | Command | Use Case |
694
+ |--------|---------|----------|
695
+ | **✅ RECOMMENDED: NPX (via npm)** | `npx agentvibes install` | **All platforms** - Just want to use AgentVibes |
696
+ | **🪟 Windows PowerShell** | `.\setup-windows.ps1` | **Windows** - Standalone installer (no Node.js needed) |
697
+ | **⚠️ Git Clone** | `git clone ...` | **Developers Only** - Contributing code |
698
+
699
+ **Why npx?** Zero git operations, no build steps, just 30 seconds to voice!
700
+
701
+ ### For Developers (Contributing Code)
702
+
703
+ If you want to contribute to AgentVibes:
704
+ ```bash
705
+ git clone https://github.com/paulpreibisch/AgentVibes.git
706
+ cd AgentVibes
707
+ npm install
708
+ npm link
709
+ ```
710
+
711
+ Requires: Node.js 16+, Git (no git-lfs), and `npm link` familiarity.
712
+
713
+ [↑ Back to top](#-table-of-contents)
714
+
715
+ ---
718
+
719
+ ## 📱 Quick Setup: Android & Termux (Claude Code on Your Phone!)
720
+
721
+ **Want to run Claude Code on your Android phone with professional voices?**
722
+
723
+ Simply install Termux from F-Droid (NOT Google Play) and run:
724
+ ```bash
725
+ pkg update && pkg upgrade
726
+ pkg install nodejs-lts
727
+ npx agentvibes install
728
+ ```
729
+
730
+ The installer auto-detects Termux and installs everything needed (proot-distro for compatibility, Piper TTS, audio playback).
731
+
732
+ **[→ Full Android/Termux Setup Guide](#-android--termux)** - Detailed troubleshooting and verification steps
733
+
734
+ [↑ Back to top](#-table-of-contents)
735
+
736
+ ---
737
+
738
+ ## 📋 System Requirements
739
+
740
+ AgentVibes requires certain system dependencies for optimal audio processing and playback. Requirements vary by operating system and TTS provider.
741
+
742
+ ### Core Requirements (All Platforms)
743
+
744
+ | Tool | Required For | Why It's Needed |
745
+ |------|-------------|-----------------|
746
+ | **Node.js** ≥16.0 | All platforms | Runtime for AgentVibes installer and MCP server |
747
+ | **Bash** ≥5.0 | macOS | Modern bash features (macOS ships with 3.2 from 2007) |
748
+ | **Python** 3.10+ | Piper TTS, MCP server | Runs Piper voice engine and MCP server |
749
+
750
+ ### Audio Processing Tools (Recommended)
751
+
752
+ | Tool | Status | Purpose | Impact if Missing |
753
+ |------|--------|---------|------------------|
754
+ | **sox** | Recommended | Audio effects (reverb, EQ, pitch, compression) | No audio effects, still works |
755
+ | **ffmpeg** | Recommended | Background music mixing, audio padding, RDP compression | No background music or RDP optimization |
756
+
757
+ ### Platform-Specific Requirements
758
+
759
+ #### 🐧 Linux / WSL
760
+
761
+ ```bash
762
+ # Ubuntu/Debian
763
+ sudo apt-get update
764
+ sudo apt-get install -y sox ffmpeg python3-pip pipx
765
+
766
+ # Fedora/RHEL
767
+ sudo dnf install -y sox ffmpeg python3-pip pipx
768
+
769
+ # Arch Linux
770
+ sudo pacman -S sox ffmpeg python-pip python-pipx
771
+ ```
772
+
773
+ **Audio Playback** (one of the following):
774
+ - `paplay` (PulseAudio - usually pre-installed)
775
+ - `aplay` (ALSA - fallback)
776
+ - `mpg123` (fallback)
777
+ - `mpv` (fallback)
778
+
779
+ **Why these tools?**
780
+ - **sox**: Applies audio effects defined in `.claude/config/audio-effects.cfg` (reverb, pitch shifting, EQ, compression)
781
+ - **ffmpeg**: Mixes background music tracks, adds silence padding to prevent audio cutoff, compresses audio for RDP/SSH sessions
782
+ - **paplay/aplay**: Plays generated TTS audio files
783
+ - **pipx**: Isolated Python environment manager for Piper TTS installation
784
+
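The fallback order above can be sketched as a small shell helper. This is an illustrative sketch, not AgentVibes' actual detection code; the function name `first_available` is hypothetical.

```bash
# Hypothetical helper: print the first command from the argument list
# that exists on PATH, or return non-zero if none is found.
first_available() {
  local candidate
  for candidate in "$@"; do
    if command -v "$candidate" >/dev/null 2>&1; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Player order taken from the list above.
if player="$(first_available paplay aplay mpg123 mpv)"; then
  echo "Using audio player: $player"
else
  echo "No supported audio player found" >&2
fi
```

Passing the candidates as arguments keeps the order explicit and makes it easy to check a different set of players.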
785
+ #### 🍎 macOS
786
+
787
+ ```bash
788
+ # Install Homebrew if not already installed
789
+ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
790
+
791
+ # Required: Modern bash
792
+ brew install bash
793
+
794
+ # Recommended: Audio processing tools
795
+ brew install sox ffmpeg pipx
796
+ ```
797
+
798
+ **Audio Playback**:
799
+ - `afplay` (built-in - always available)
800
+ - `say` (built-in - for macOS TTS provider)
801
+
802
+ **Why these tools?**
803
+ - **bash 5.x**: macOS ships with bash 3.2 which lacks associative arrays and other modern features AgentVibes uses
804
+ - **sox**: Same audio effects processing as Linux
805
+ - **ffmpeg**: Same background music and padding as Linux
806
+ - **afplay**: Built-in macOS audio player
807
+ - **say**: Built-in macOS text-to-speech (alternative to Piper)
808
+
809
+ #### 🪟 Windows
810
+
811
+ **Option A: Native Windows (Recommended)**
812
+
813
+ AgentVibes now supports native Windows with three TTS providers. No WSL required!
814
+
815
+ ```powershell
816
+ # Interactive Node.js installer (recommended)
817
+ npx agentvibes install
818
+
819
+ # Or use the standalone PowerShell installer
820
+ .\setup-windows.ps1
821
+ ```
822
+
823
+ **Providers available natively:**
824
+ - **Soprano** - Ultra-fast neural TTS (best quality, requires `pip install soprano-tts`)
825
+ - **Windows Piper** - High quality offline neural voices (auto-downloaded)
826
+ - **Windows SAPI** - Built-in Windows voices (zero setup)
827
+
828
+ **Requirements:** Node.js 16+, PowerShell 5.1+, ffmpeg (optional, for background music & reverb)
829
+
830
+ See [Windows Native Setup Guide](WINDOWS-SETUP.md) for full instructions.
831
+
832
+ **Option B: WSL (Legacy)**
833
+
834
+ For Claude Desktop or WSL-based workflows, follow the [Windows WSL Guide](mcp-server/WINDOWS_SETUP.md).
835
+
836
+ ```powershell
837
+ # Install WSL from PowerShell (Administrator)
838
+ wsl --install -d Ubuntu
839
+ ```
840
+
841
+ Then follow Linux requirements above inside WSL.
842
+
843
+ #### 🤖 Android / Termux
844
+
845
+ **Running Claude Code on Your Android Using Termux**
846
+
847
+ AgentVibes fully supports Android devices through the [Termux app](https://termux.dev/). This enables you to run Claude Code with professional TTS voices directly on your Android phone or tablet!
848
+
849
+ **Quick Setup:**
850
+
851
+ ```bash
852
+ # 1. Install Termux from F-Droid (NOT Google Play - it's outdated)
853
+ # Download: https://f-droid.org/en/packages/com.termux/
854
+
855
+ # 2. Install Node.js in Termux
856
+ pkg update && pkg upgrade
857
+ pkg install nodejs-lts
858
+
859
+ # 3. Install AgentVibes (auto-detects Android and runs Termux installer)
860
+ npx agentvibes install
861
+ ```
862
+
863
+ **What Gets Installed?**
864
+
865
+ The Termux installer automatically sets up:
866
+ - **proot-distro** with Debian (for glibc compatibility)
867
+ - **Piper TTS** via proot wrapper (Android uses bionic libc, not glibc)
868
+ - **termux-media-player** for audio playback (`paplay` doesn't work on Android)
869
+ - **Audio dependencies**: ffmpeg, sox, bc for processing
870
+ - **termux-api** for Android-specific audio routing
871
+
872
+ **Why Termux Instead of Standard Installation?**
873
+
874
+ Android's architecture requires special handling:
875
+ - ❌ Standard pip/pipx fails (missing wheels for bionic libc)
876
+ - ❌ Linux binaries require glibc (Android uses bionic)
877
+ - ❌ `/tmp` directory is not accessible on Android
878
+ - ❌ Standard audio tools like `paplay` don't exist
879
+
880
+ The Termux installer solves all of these issues with proot-distro and Android-native audio playback!
881
+
882
+ **Requirements:**
883
+ - [Termux app](https://f-droid.org/en/packages/com.termux/) (from F-Droid, NOT Google Play)
884
+ - [Termux:API](https://f-droid.org/en/packages/com.termux.api/) (for audio playback)
885
+ - Android 7.0+ (recommended: Android 10+)
886
+ - ~500MB free storage (for Piper TTS + voice models)
887
+
888
+ **Audio Playback:**
889
+ - Uses `termux-media-player` instead of `paplay`
890
+ - Audio automatically routes through Android's media system
891
+ - Supports all Piper TTS voices (50+ voices in 30+ languages)
892
+
893
+ **Verifying Your Setup:**
894
+
895
+ ```bash
896
+ # Check Termux environment
897
+ echo $PREFIX # Should show /data/data/com.termux/files/usr
898
+
899
+ # Check Node.js
900
+ node --version # Should be ≥16.0
901
+
902
+ # Check if Piper is installed
903
+ which piper # Should return /data/data/com.termux/files/usr/bin/piper
904
+
905
+ # Test audio playback
906
+ termux-media-player play /path/to/audio.wav
907
+ ```
908
+
909
+ **Troubleshooting:**
910
+
911
+ | Issue | Solution |
912
+ |-------|----------|
913
+ | "piper: not found" | Run `npx agentvibes install` - auto-detects Termux |
914
+ | No audio playback | Install Termux:API from F-Droid |
915
+ | Permission denied | Run `termux-setup-storage` to grant storage access |
916
+ | Slow installation | Use WiFi, not mobile data (~300MB download) |
917
+
918
+ **Why F-Droid and Not Google Play?**
919
+
920
+ Google Play's Termux version is outdated and unsupported. Always use the [F-Droid version](https://f-droid.org/en/packages/com.termux/) for the latest security updates and compatibility.
921
+
922
+ ### TTS Provider Requirements
923
+
924
+ #### Piper TTS (Free, Offline)
925
+ - **Python** 3.10+
926
+ - **pipx** (for isolated installation)
927
+ - **Disk Space**: ~50MB per voice model
928
+ - **Internet**: Only for initial voice downloads
929
+
930
+ ```bash
931
+ # Installed automatically by AgentVibes
932
+ pipx install piper-tts
933
+ ```
934
+
935
+ #### macOS Say (Built-in, macOS Only)
936
+ - No additional requirements
937
+ - 100+ voices pre-installed on macOS
938
+ - Use: `/agent-vibes:provider switch macos`
939
+
940
+ ### Verifying Your Setup
941
+
942
+ ```bash
943
+ # Check all dependencies
944
+ node --version # Should be ≥16.0
945
+ python3 --version # Should be ≥3.10
946
+ bash --version # Should be ≥5.0 (macOS users!)
947
+ sox --version # Optional but recommended
948
+ ffmpeg -version # Optional but recommended
949
+ pipx --version # Required for Piper TTS
950
+
951
+ # Check audio playback (Linux/WSL)
952
+ paplay --version || aplay --version
953
+
954
+ # Check audio playback (macOS)
955
+ which afplay # Should return /usr/bin/afplay
956
+ ```
957
+
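The version checks above can also be scripted. The following is a minimal sketch that compares only the major version number; `major_at_least` is a hypothetical helper and not part of AgentVibes.

```bash
# Hypothetical helper: succeed if a version string's major number is >= a minimum.
# Accepts strings such as "v18.17.0" (as printed by node --version) or "5.2.15".
major_at_least() {
  local version="${1#v}"        # drop a leading "v" if present
  local major="${version%%.*}"  # keep everything before the first dot
  [ "$major" -ge "$2" ] 2>/dev/null
}

if command -v node >/dev/null 2>&1; then
  if major_at_least "$(node --version)" 16; then
    echo "node: OK"
  else
    echo "node: too old, need >= 16"
  fi
else
  echo "node: not installed"
fi
```

The same helper works for bash by extracting the version first, for example `major_at_least "${BASH_VERSION}" 5`.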
958
+ ### What Happens Without Optional Dependencies?
959
+
960
+ | Missing Tool | Impact | Workaround |
961
+ |-------------|--------|------------|
962
+ | sox | No audio effects (reverb, EQ, pitch) | TTS still works, just no effects |
963
+ | ffmpeg | No background music, no audio padding | TTS still works, audio may cut off slightly early |
964
+ | paplay/aplay | No audio playback on Linux | Install at least one audio player |
965
+
966
+ **All TTS generation still works** - optional tools only enhance the experience!
967
+
968
+ [↑ Back to top](#-table-of-contents)
969
+
970
+ ---
971
+
972
+ ## 🎭 Choose Your Voice Provider
973
+
974
+ All providers are free: **Piper TTS** (offline neural voices), **macOS Say** (built-in on Mac), **Soprano** (neural), and **Windows SAPI** (built-in on Windows). Pick one and switch anytime.
975
+
976
+ | Provider | Platform | Cost | Quality | Setup |
977
+ |----------|----------|------|---------|-------|
978
+ | **macOS Say** | macOS only | Free (built-in) | ⭐⭐⭐⭐ | Zero config |
979
+ | **Piper** | Linux/WSL/Windows | Free | ⭐⭐⭐⭐ | Auto-downloads |
980
+ | **Soprano** | Linux/WSL/Windows | Free | ⭐⭐⭐⭐⭐ | `pip install soprano-tts` |
981
+ | **Windows SAPI** | Windows | Free (built-in) | ⭐⭐⭐ | Zero config |
982
+
983
+ On macOS, the native `say` provider is automatically detected and recommended!
984
+
985
+ **[→ Provider Comparison Guide](docs/providers.md)**
986
+
987
+ [↑ Back to top](#-table-of-contents)
988
+
989
+ ---
990
+
991
+ ## 🎤 Commands Reference
992
+
993
+ AgentVibes provides **50+ slash commands** and **natural language MCP equivalents**.
994
+
995
+ **Quick Examples:**
996
+ ```bash
997
+ # Voice control
998
+ /agent-vibes:switch Aria # Or: "Switch to Aria voice"
999
+ /agent-vibes:list # Or: "List all voices"
1000
+
1001
+ # Personality & sentiment
1002
+ /agent-vibes:personality pirate # Or: "Set personality to pirate"
1003
+ /agent-vibes:sentiment sarcastic # Or: "Apply sarcastic sentiment"
1004
+
1005
+ # Language & learning
1006
+ /agent-vibes:set-language spanish # Or: "Speak in Spanish"
1007
+ /agent-vibes:learn # Or: "Enable learning mode"
1008
+ ```
1009
+
1010
+ **[→ View Complete Command Reference](docs/commands.md)** - All voice, system, personality, sentiment, language, and BMAD commands with MCP equivalents
1011
+
1012
+ ### Intro Text Commands
1013
+
1014
+ ```bash
1015
+ # Configure intro text — open Settings tab
1016
+ npx agentvibes
1017
+
1018
+ # View current intro text
1019
+ cat ~/.claude/config/intro-text.txt
1020
+ ```
1021
+
1022
+ **MCP Equivalent:**
1023
+ ```
1024
+ "Set my intro text to 'FireBot: '"
1025
+ "What's my current intro text?"
1026
+ "Clear my intro text"
1027
+ ```
1028
+
1029
+ ### Custom Music Commands
1030
+
1031
+ ```bash
1032
+ # Configure background music — open Music tab
1033
+ npx agentvibes
1034
+ ```
1035
+
1036
+ **MCP Equivalent:**
1037
+ ```
1038
+ "Configure my background music"
1039
+ "Add custom background music"
1040
+ "Remove custom music"
1041
+ "Preview my background music"
1042
+ ```
1043
+
1044
+ ### Friendly Voice Name Commands
1045
+
1046
+ ```bash
1047
+ # Switch using friendly name
1048
+ /agent-vibes:switch Ryan
1049
+ /agent-vibes:switch Sarah
1050
+
1051
+ # List all voices with friendly names
1052
+ /agent-vibes:list
1053
+
1054
+ # Get current voice (shows friendly name if available)
1055
+ /agent-vibes:whoami
1056
+ ```
1057
+
1058
+ **MCP Equivalent:**
1059
+ ```
1060
+ "Switch to Ryan voice"
1061
+ "Use the Sarah voice"
1062
+ "List all available voices"
1063
+ ```
1064
+
1065
+ [↑ Back to top](#-table-of-contents)
1066
+
1067
+ ---
1068
+
1069
+ ## 🎙️ Verbosity Control
1070
+
1071
+ **Control how much Claude speaks while working!** 🔊
1072
+
1073
+ Choose from three verbosity levels:
1074
+
1075
+ ### LOW (Minimal) 🔇
1076
+ - Acknowledgments (start of task)
1077
+ - Completions (end of task)
1078
+ - Perfect for quiet work sessions
1079
+
1080
+ ### MEDIUM (Balanced) 🤔
1081
+ - Acknowledgments + completions
1082
+ - Major decisions ("I'll use grep to search")
1083
+ - Key findings ("Found 12 instances")
1084
+ - Perfect for understanding decisions without full narration
1085
+
1086
+ ### HIGH (Maximum Transparency) 💭
1087
+ - All reasoning ("Let me search for all instances")
1088
+ - All decisions ("I'll use grep for this")
1089
+ - All findings ("Found it at line 1323")
1090
+ - Perfect for learning mode, debugging complex tasks
1091
+
1092
+ **Quick Commands:**
1093
+ ```bash
1094
+ /agent-vibes:verbosity # Show current level
1095
+ /agent-vibes:verbosity high # Maximum transparency
1096
+ /agent-vibes:verbosity medium # Balanced
1097
+ /agent-vibes:verbosity low # Minimal (default)
1098
+ ```
1099
+
1100
+ **MCP Equivalent:**
1101
+ ```
1102
+ "Set verbosity to high"
1103
+ "What's my current verbosity level?"
1104
+ ```
1105
+
1106
+ 💡 **How it works:** Claude uses emoji markers (💭 🤔 ✓) in its text, and AgentVibes automatically detects and speaks them based on your verbosity level. No manual TTS calls needed!
1107
+
1108
+ ⚠️ **Note:** Changes take effect on next Claude Code session restart.
1109
+
1110
+ [↑ Back to top](#-table-of-contents)
1111
+
1112
+ ---
1113
+
1114
+ ## 📚 Language Learning Mode
1115
+
1116
+ **🎯 Learn Spanish (or 30+ languages) while you program!** 🌍
1117
+
1118
+ Every task acknowledgment plays **twice** - first in English, then in your target language. Context-based learning while you code!
1119
+
1120
+ **[→ View Complete Learning Mode Guide](docs/language-learning-mode.md)** - Full tutorial, quick start, commands, speech rate control, supported languages, and pro tips
1121
+
1122
+ [↑ Back to top](#-table-of-contents)
1123
+
1124
+ ---
1125
+
1126
+ ## 🎭 Personalities vs Sentiments
1127
+
1128
+ **Two ways to add personality:**
1129
+
1130
+ - **🎪 Personalities** - Changes BOTH voice AND speaking style (e.g., `pirate` personality = Pirate Marshal voice + pirate speak)
1131
+ - **💭 Sentiments** - Keeps your current voice, only changes speaking style (e.g., Aria voice + sarcastic sentiment)
1132
+
1133
+ **[→ Complete Personalities Guide](docs/personalities.md)** - All 19 personalities, create custom ones
1134
+
1135
+ [↑ Back to top](#-table-of-contents)
1136
+
1137
+ ---
1138
+
1139
+ ## 🗣️ Voice Library
1140
+
1141
+ Use the **AgentVibes TUI installer** (`/audio-browser`) to browse, sample, and install from 914 voices interactively.
1142
+
1143
+ ### Friendly Voice Names
1144
+
1145
+ All voices now have memorable names! Instead of technical IDs like `en_US-libritts_r-medium-speaker-123`, just use friendly names like **Ryan**, **Joe**, or **Sarah**.
1146
+
1147
+ **Voice Metadata Includes:**
1148
+ - Display name and technical ID
1149
+ - Gender, accent, and region
1150
+ - Personality traits (professional, warm, friendly, etc.)
1151
+ - Recommended use cases
1152
+ - Quality rating and sample rate
1153
+
1154
+ ### Voice Categories
1155
+
1156
+ **Curated Voices** (10 personalities):
1157
+ These hand-picked voices cover common use cases with clear characteristics.
1158
+
1159
+ **Speaker Variations** (904 voices):
1160
+ High-quality Piper TTS voices from the libritts-high model. Each speaker has unique vocal characteristics, accents, and tones.
1161
+
1162
+ ### Popular Voices
1163
+
1164
+ AgentVibes includes professional AI voices from Piper TTS and macOS Say with multilingual support.
1165
+
1166
+ 🎧 **Try in Claude Code:** `/agent-vibes:preview` to hear all voices
1167
+ 🌍 **Multilingual:** Use Antoni, Rachel, Domi, or Bella for automatic language detection
1168
+
1169
+ **[→ View Complete Voice Library](docs/voice-library.md)** - All voices with clickable samples, descriptions, and best use cases
1170
+
1171
+ [↑ Back to top](#-table-of-contents)
1172
+
1173
+ ---
1174
+
1175
+ ## 🔌 BMAD Plugin
1176
+
1177
+ **Automatically switch voices when using BMAD agents!**
1178
+
1179
+ The BMAD plugin detects when you activate a BMAD agent (e.g., `/BMad:agents:pm`) and automatically uses the assigned voice for that role.
1180
+
1181
+ **Version Support**: AgentVibes supports both BMAD v4 and v6-alpha installations. Version detection is automatic - just install BMAD and AgentVibes will detect and configure itself correctly!
1182
+
1183
+ ### 🎭 Party Mode — Screenshots
1184
+
1185
+ Open the **BMad** tab in the AgentVibes TUI (`npx agentvibes`) to configure which voice each agent uses:
1186
+
1187
+ ![BMAD Party Mode Tab](docs/installation-screenshots/screenshot-bmad-party-mode.png)
1188
+
1189
+ > 📸 **Don't have a screenshot yet?** Run `npx agentvibes`, switch to the **BMad** tab, and take a screenshot — then save it as `docs/installation-screenshots/screenshot-bmad-party-mode.png`.
1190
+
1191
+ ### 🔊 TTS Injection: How It Works
1192
+
1193
+ BMAD uses a **loosely-coupled injection system** for voice integration. BMAD source files contain placeholder markers that AgentVibes replaces with speaking instructions during installation:
1194
+
1195
+ **Before Installation (BMAD Source):**
1196
+ ```xml
1197
+ <rules>
1198
+ <r>ALWAYS communicate in {communication_language}...</r>
1199
+ <!-- TTS_INJECTION:agent-tts -->
1200
+ <r>Stay in character until exit selected</r>
1201
+ </rules>
1202
+ ```
1203
+
1204
+ **After Installation (with AgentVibes enabled):**
1205
+ ```xml
1206
+ <rules>
1207
+ <r>ALWAYS communicate in {communication_language}...</r>
1208
+ - When responding to user messages, speak your responses using TTS:
1209
+ Call: `.claude/hooks/bmad-speak.sh '{agent-id}' '{response-text}'`
1210
+ Where {agent-id} is your agent type (pm, architect, dev, etc.)
1211
+
1212
+ - Auto Voice Switching: AgentVibes automatically switches to the voice
1213
+ assigned for your agent role when activated
1214
+ <r>Stay in character until exit selected</r>
1215
+ </rules>
1216
+ ```
1217
+
1218
+ **After Installation (with TTS disabled):**
1219
+ ```xml
1220
+ <rules>
1221
+ <r>ALWAYS communicate in {communication_language}...</r>
1222
+ <r>Stay in character until exit selected</r>
1223
+ </rules>
1224
+ ```
1225
+
1226
+ This design means **any TTS provider** can integrate with BMAD by replacing these markers with their own instructions!
1227
+
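The marker replacement described above can be sketched as a stream filter. This is an assumption-laden illustration, not the installer's real logic: `apply_tts_marker` is a hypothetical name and the injected instruction is placeholder text.

```bash
# Hypothetical filter: reads a BMAD source file on stdin, writes the result to stdout.
# $1 = "on" to inject a speaking instruction; anything else strips the marker line.
apply_tts_marker() {
  local marker='<!-- TTS_INJECTION:agent-tts -->'
  # Placeholder text; the real injected instructions are longer.
  local instruction='- Speak responses via .claude/hooks/bmad-speak.sh'
  if [ "$1" = "on" ]; then
    sed "s|$marker|$instruction|"
  else
    sed "/$marker/d"
  fi
}

# Demo: strip the marker, as in the "TTS disabled" case above.
printf '%s\n' '<r>rule one</r>' '<!-- TTS_INJECTION:agent-tts -->' '<r>rule two</r>' \
  | apply_tts_marker off
```

Because the filter only touches the marker line, the surrounding rules pass through unchanged in both modes.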
1228
+ **[→ View Complete BMAD Documentation](docs/bmad-plugin.md)** - All agent mappings, language support, TTS injection details, plugin management, and customization
1229
+
1230
+ [↑ Back to top](#-table-of-contents)
1231
+
1232
+ ---
1233
+
1234
+ ## 🤖 OpenClaw Integration
1235
+
1236
+ **Use AgentVibes TTS with OpenClaw - the revolutionary AI assistant you can access via any instant messenger!**
1237
+
1238
+ **What is OpenClaw?** [OpenClaw](https://openclaw.ai/) is an AI assistant that brings Claude AI to your favorite messaging platforms: WhatsApp, Telegram, Discord, and more. No apps to install, no websites to visit - just message your AI assistant like you would a friend.
1239
+
1240
+ 🌐 **Website**: https://openclaw.ai/
1241
+
1242
+ AgentVibes seamlessly integrates with OpenClaw, providing professional text-to-speech for AI assistants running on messaging platforms and remote servers.
1243
+
1244
+ ### 🚨 CRITICAL: Security Before Running OpenClaw on Any Remote Server
1245
+
1246
+ ⚠️ **SECURITY IS NOT OPTIONAL** - Running OpenClaw on a remote server exposes your infrastructure to attack vectors including SSH compromise, credential theft, and lateral movement.
1247
+
1248
+ **👉 READ THIS FIRST:** [Security Hardening Guide](docs/security-hardening-guide.md) - **Required reading** covering:
1249
+ - ✅ SSH hardening (key-only auth, port 2222, fail2ban)
1250
+ - ✅ Firewall configuration (UFW/iptables)
1251
+ - ✅ Intrusion detection (AIDE, Wazuh)
1252
+ - ✅ VPN tunneling (Tailscale alternative to direct SSH)
1253
+
1254
+ **Do not expose your OpenClaw server to the internet without reading this guide.**
1255
+
1256
+ ### 🎯 Key Benefits
1257
+
1258
+ - **Free & Offline**: No API costs, works without internet
1259
+ - **Remote SSH Audio**: Audio tunnels from server to local machine via PulseAudio
1260
+ - **50+ Voices**: Professional AI voices in 30+ languages
1261
+ - **Zero Config**: Automatic when AgentVibes is installed
1262
+
1263
+ ### 🚀 Installation
1264
+
1265
+ AgentVibes includes a ready-to-use OpenClaw skill that enables TTS on messaging platforms. The setup involves two components:
1266
+
1267
+ #### Component 1: OpenClaw Server (Remote)
1268
+
1269
+ Install AgentVibes on your OpenClaw server:
1270
+
1271
+ ```bash
1272
+ # On your remote server where OpenClaw is running
1273
+ npx agentvibes install
1274
+ ```
1275
+
1276
+ The OpenClaw skill is **automatically included** in the AgentVibes npm package at `.clawdbot/skill/SKILL.md`.
1277
+
1278
+ **How to activate the skill in OpenClaw:**
1279
+
1280
+ 1. **Locate the skill** - After installing AgentVibes, the skill is at:
1281
+ ```
1282
+ node_modules/agentvibes/.clawdbot/skill/SKILL.md
1283
+ ```
1284
+
1285
+ 2. **Link to OpenClaw skills directory** (if OpenClaw uses skills):
1286
+ ```bash
1287
+ # Example - adjust path based on your OpenClaw installation
1288
+ ln -s "$(npm root -g)/agentvibes/.clawdbot/skill/SKILL.md" ~/.openclaw/skills/agentvibes.md
1289
+ ```
1290
+
1291
+ 3. **OpenClaw auto-detection** - Many OpenClaw setups automatically detect AgentVibes when it's installed. Check your OpenClaw logs for:
1292
+ ```
1293
+ ✓ AgentVibes skill detected and loaded
1294
+ ```
1295
+
1296
+ ---
1297
+
1298
+ #### 🎙️ AgentVibes Voice Management Skill for OpenClaw
1299
+
1300
+ Manage your text-to-speech voices across multiple providers with the AgentVibes Voice Management Skill:
1301
+
1302
+ **Voice Management Features:**
1303
+ - 🎤 **50+ Professional Voices** - Across the Piper TTS (free, offline) and macOS Say providers
1304
+ - 🔀 **Multi-Provider Support** - Switch between Piper TTS (free, offline) and macOS Say
1305
+ - 👂 **Voice Preview** - Listen to voices before selecting them
1306
+ - 🎚️ **Voice Customization** - Add custom voices, set pretext, control speech rate
1307
+ - 📋 **Voice Management** - List, switch, replay, and manage your voice library
1308
+ - 🔇 **Mute Control** - Mute/unmute TTS output with persistent settings
1309
+ - 🌍 **Multilingual Support** - Voices in 30+ languages across all providers
1310
+
1311
+ **Installation Confirmation:**
1312
+ ✅ The skill is **automatically included** in the AgentVibes npm package at:
1313
+ ```
1314
+ node_modules/agentvibes/.clawdbot/skill/SKILL.md
1315
+ ```
1316
+
1317
+ No extra setup needed - when you run `npx agentvibes install` on your OpenClaw server, the skill is ready to use!
1318
+
1319
+ **Full Skill Documentation:**
1320
+ **[→ View Complete AgentVibes Skill Guide](.clawdbot/skill/SKILL.md)** - 430+ lines covering:
1321
+ - Quick start with 50+ voice options
1322
+ - Background music & effects management
1323
+ - Personality system (19+ styles)
1324
+ - Voice effects (reverb, EQ)
1325
+ - Speed & verbosity control
1326
+ - Remote SSH audio setup
1327
+ - Troubleshooting & complete reference
1328
+
1329
+ **Popular Voice Examples:**
1330
+ ```bash
1331
+ # Female voices
1332
+ npx agentvibes speak "Hello" --voice en_US-amy-medium
1333
+ npx agentvibes speak "Bonjour" --voice fr_FR-siwis-medium
1334
+
1335
+ # Male voices
1336
+ npx agentvibes speak "Hello" --voice en_US-lessac-medium
1337
+ npx agentvibes speak "Good day" --voice en_GB-alan-medium
1338
+
1339
+ # Add personality!
1340
+ bash ~/.claude/hooks/personality-manager.sh set sarcastic
1341
+ bash ~/.claude/hooks/play-tts.sh "Oh wonderful, another request"
1342
+ ```
1343
+
1344
+ ---
1345
+
1346
+ #### Component 2: AgentVibes Receiver (Local/Phone) ⚠️ REQUIRED
1347
+
1348
+ **CRITICAL: You MUST install AgentVibes on your phone (or local machine) to receive and play audio!**
1349
+
1350
+ Without this, audio cannot be heard - the server generates TTS but needs a receiver to play it.
1351
+
1352
+ **Install on Android Phone (Termux):**
1353
+
1354
+ 1. **Install Termux from F-Droid** (NOT Google Play):
1355
+ - Download: https://f-droid.org/en/packages/com.termux/
1356
+
1357
+ 2. **Install Node.js in Termux:**
1358
+ ```bash
1359
+ pkg update && pkg upgrade
1360
+ pkg install nodejs-lts
1361
+ ```
1362
+
1363
+ 3. **Install AgentVibes in Termux:**
1364
+ ```bash
1365
+ npx agentvibes install
1366
+ ```
1367
+
1368
+ 4. **Install Termux:API** (for audio playback):
1369
+ - Download: https://f-droid.org/en/packages/com.termux.api/
1370
+ - Then in Termux: `pkg install termux-api`
1371
+
1372
+ **Install on Local Mac/Linux:**
1373
+
1374
+ ```bash
1375
+ npx agentvibes install
1376
+ ```
1377
+
1378
+ **Why is this needed?**
1379
+ - The **server generates TTS** but has no speakers (headless)
1380
+ - AgentVibes on your **phone acts as the audio receiver** via SSH tunnel
1381
+ - Audio tunnels from server → SSH → phone → speakers 🔊
1382
+
1383
+ Without AgentVibes installed on the receiving device, you'll generate audio but hear nothing!
1384
+
1385
+ #### How It Works: Server → SSH Tunnel → Local Playback
1386
+
1387
+ ```
1388
+ ┌─────────────────────────────────────────────────────────┐
1389
+ │ 1. User messages OpenClaw via Telegram/WhatsApp │
1390
+ │ "Tell me about the weather" │
1391
+ └─────────────────────────────────────────────────────────┘
1392
+
1393
+ ┌─────────────────────────────────────────────────────────┐
1394
+ │ 2. OpenClaw (Server) processes request with Claude │
1395
+ │ AgentVibes skill generates TTS audio │
1396
+ └─────────────────────────────────────────────────────────┘
1397
+
1398
+ ┌─────────────────────────────────────────────────────────┐
1399
+ │ 3. Audio tunnels through SSH → PulseAudio (port 14713)│
1400
+ │ Server: PULSE_SERVER=tcp:localhost:14713 │
1401
+ └─────────────────────────────────────────────────────────┘
1402
+
1403
+ ┌─────────────────────────────────────────────────────────┐
1404
+ │ 4. Local AgentVibes receives and plays audio │
1405
+ │ Phone speakers, laptop speakers, etc. │
1406
+ │ 🔊 "The weather is sunny and 72 degrees" │
1407
+ └─────────────────────────────────────────────────────────┘
1408
+ ```
1409
+
1410
+ **Architecture:**
1411
+ - **Server (OpenClaw)**: Generates TTS, sends via PulseAudio
1412
+ - **SSH Tunnel**: RemoteForward port 14713 (encrypted transport)
1413
+ - **Local (Termux/Desktop)**: AgentVibes receives audio, plays on speakers
1414
+
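The tunnel described above can be sketched with a standard SSH remote forward. This is a sketch under stated assumptions: `pulse_forward_option` is a hypothetical helper, `user@server` is a placeholder, and the receiver-side port 4713 (PulseAudio's default TCP port) is an assumption about your local setup rather than something AgentVibes documents here.

```bash
# Hypothetical helper: print the ssh option that forwards the server-side port
# back to a PulseAudio listener on the receiving machine.
pulse_forward_option() {
  local server_port="${1:-14713}"  # port shown in the diagram above
  local local_port="${2:-4713}"    # ASSUMPTION: receiver's PulseAudio TCP port
  printf -- '-R %s:localhost:%s\n' "$server_port" "$local_port"
}

# Example (user@server is a placeholder); the command is printed, not executed.
echo "ssh $(pulse_forward_option) user@server"

# On the server, per the diagram above:
#   export PULSE_SERVER=tcp:localhost:14713
```

The same forward can be made permanent with a `RemoteForward` line in `~/.ssh/config` on the receiving machine.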
1415
+ This creates a **Siri-like experience** - message from anywhere, hear responses on your phone! 📱🎤
1416
+
1417
+ ### 📝 Usage
1418
+
1419
+ #### Basic TTS Commands
1420
+
1421
+ ```bash
1422
+ # Basic TTS
1423
+ npx agentvibes speak "Hello from OpenClaw"
1424
+
1425
+ # With different voices
1426
+ npx agentvibes speak "Hello" --voice en_US-amy-medium
1427
+ npx agentvibes speak "Bonjour" --voice fr_FR-siwis-medium
1428
+
1429
+ # List available voices
1430
+ npx agentvibes voices
1431
+ ```
1432
+
1433
+ #### Advanced: Direct Hook Usage with Voice Override
1434
+
1435
+ For programmatic control, use the TTS hook directly:
1436
+
1437
+ ```bash
1438
+ # Basic: Use default voice
1439
+ bash ~/.claude/hooks/play-tts.sh "Hello from OpenClaw"
1440
+
1441
+ # Advanced: Override voice per message
1442
+ bash ~/.claude/hooks/play-tts.sh "Welcome message" "en_US-amy-medium"
1443
+ bash ~/.claude/hooks/play-tts.sh "Bonjour!" "fr_FR-siwis-medium"
1444
+ bash ~/.claude/hooks/play-tts.sh "British greeting" "en_GB-alan-medium"
1445
+ ```
1446
+
1447
+ **Parameters:**
1448
+ - `$1` - **TEXT** (required): Message to speak
1449
+ - `$2` - **VOICE** (optional): Voice name to override default
1450
+
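The two positional parameters map onto ordinary shell parameter expansion. The following is a minimal sketch, not the actual contents of `play-tts.sh`: the `resolve_tts_args` helper and the `DEFAULT_VOICE` value are assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: mirrors the $1 (TEXT) / $2 (VOICE) contract above.
DEFAULT_VOICE="en_US-amy-medium"   # assumed default, for illustration only

resolve_tts_args() {
  local text="${1:-}"
  local voice="${2:-$DEFAULT_VOICE}"   # fall back when no override is given
  if [ -z "$text" ]; then
    echo "usage: resolve_tts_args TEXT [VOICE]" >&2
    return 1
  fi
  # Emit "voice|text" so a caller can see which voice was chosen
  printf '%s|%s\n' "$voice" "$text"
}

resolve_tts_args "Bonjour!" "fr_FR-siwis-medium"
resolve_tts_args "Hello from OpenClaw"
```

An empty message is rejected early, so a missing `$1` fails loudly instead of producing silent audio.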
1451
+ #### Audio Effects Configuration for OpenClaw
1452
+
1453
+ **File**: `.claude/config/audio-effects.cfg`
1454
+
1455
+ Customize audio effects, background music, and voice processing per agent or use default settings:
1456
+
1457
+ **Format:**
1458
+ ```
1459
+ AGENT_NAME|SOX_EFFECTS|BACKGROUND_FILE|BACKGROUND_VOLUME
1460
+ ```
1461
+
1462
+ **Example Configuration:**
1463
+
1464
+ ```bash
1465
+ # Default - subtle background music
1466
+ default||agentvibes_soft_flamenco_loop.mp3|0.30
1467
+
1468
+ # Custom agent with reverb + background
1469
+ MyAgent|reverb 40 50 90 gain -2|agentvibes_soft_flamenco_loop.mp3|0.20
1470
+
1471
+ # Agent with pitch shift and EQ
1472
+ Assistant|pitch -100 equalizer 3000 1q +2|agentvibes_dark_chill_step_loop.mp3|0.15
1473
+ ```
1474
+
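Each configuration line is four pipe-separated fields, so it can be split with `IFS` and `read`. This is a sketch under that assumption only; `parse_effects_line` is a hypothetical name and the real parser in AgentVibes may handle comments and whitespace differently.

```bash
#!/usr/bin/env bash
# Sketch: split one audio-effects.cfg line into its four fields.
# parse_effects_line is a hypothetical helper, not an AgentVibes function.
parse_effects_line() {
  local line="$1" agent effects background volume
  # A non-whitespace IFS keeps empty fields, so "default||file|0.30" works
  IFS='|' read -r agent effects background volume <<< "$line"
  printf 'agent=%s\neffects=%s\nbackground=%s\nvolume=%s\n' \
    "$agent" "$effects" "$background" "$volume"
}

parse_effects_line 'MyAgent|reverb 40 50 90 gain -2|agentvibes_soft_flamenco_loop.mp3|0.20'
parse_effects_line 'default||agentvibes_soft_flamenco_loop.mp3|0.30'
```

The second call shows the `default` entry: the effects field comes back empty, which a caller can treat as "no SoX processing".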
1475
+ **Available SOX Effects:**
1476
+
1477
+ | Effect | Syntax | Example | Description |
1478
+ |--------|--------|---------|-------------|
1479
+ | **Reverb** | `reverb <reverberance> <HF-damping> <room-scale>` | `reverb 40 50 90` | Adds room ambiance (light: 30 40 70, heavy: 50 60 100) |
1480
+ | **Pitch** | `pitch <cents>` | `pitch -100` | Shift pitch (100 cents = 1 semitone, negative = lower) |
1481
+ | **Equalizer** | `equalizer <freq> <width>q <gain-dB>` | `equalizer 3000 1q +2` | Boost/cut frequencies (bass: 200Hz, treble: 4000Hz) |
1482
+ | **Gain** | `gain <dB>` | `gain -2` | Adjust volume (negative = quieter, positive = louder) |
1483
+ | **Compand** | `compand <attack,decay> <soft-knee-dB:in-dB,out-dB,...>` | `compand 0.3,1 6:-70,-60,-20` | Dynamic range compression (makes quiet parts louder) |
1484
+
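The cents unit in the Pitch row is logarithmic: 1200 cents is one octave, so a shift of `c` cents scales frequency by `2^(c/1200)`. A small worked example, with `cents_to_ratio` as a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Sketch: convert a pitch shift in cents to a frequency ratio.
# cents_to_ratio is a hypothetical helper used only for this illustration.
cents_to_ratio() {
  LC_ALL=C awk -v c="$1" 'BEGIN { printf "%.4f\n", 2 ^ (c / 1200) }'
}

cents_to_ratio -100   # one semitone down, prints 0.9439
cents_to_ratio 1200   # one octave up, prints 2.0000
```

So `pitch -100` lowers every frequency by roughly 5.6%, which is why small values are enough to change how a voice is perceived.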
1485
+ **Background Music Tracks:**
1486
+
1487
+ Built-in tracks available in `.claude/audio/tracks/`:
1488
+ - `agentvibes_soft_flamenco_loop.mp3` - Warm, rhythmic flamenco
1489
+ - `agentvibes_dark_chill_step_loop.mp3` - Modern chill electronic
1490
+ - (50+ additional tracks available)
1491
+
1492
+ **Background Volume:**
1493
+ - `0.10` - Very subtle (10%)
1494
+ - `0.20` - Subtle (20%)
1495
+ - `0.30` - Moderate (30%, recommended default)
1496
+ - `0.40` - Noticeable (40%, party mode)
1497
+
1498
+ **Example: OpenClaw Custom Configuration**
1499
+
1500
+ Create `.claude/config/audio-effects.cfg` on your OpenClaw server:
1501
+
1502
+ ```bash
1503
+ # OpenClaw assistant - warm voice with subtle reverb
1504
+ OpenClaw|reverb 30 40 70 gain -1|agentvibes_soft_flamenco_loop.mp3|0.25
1505
+
1506
+ # Help desk agent - clear, bright voice
1507
+ HelpDesk|equalizer 4000 1q +3 compand 0.2,0.5 6:-70,-60,-20|agentvibes_dark_chill_step_loop.mp3|0.15
1508
+
1509
+ # Default fallback
1510
+ default||agentvibes_soft_flamenco_loop.mp3|0.30
1511
+ ```
1512
+
1513
+ **How AgentVibes Applies Effects:**
1514
+
1515
+ 1. **Generate TTS** - Create base audio with Piper TTS
1516
+ 2. **Apply SOX effects** - Process audio (reverb, EQ, pitch, etc.)
1517
+ 3. **Mix background** - Blend background music at specified volume
1518
+ 4. **Tunnel via SSH** - Send processed audio to local receiver
1519
+ 5. **Play on device** - Output to phone/laptop speakers
1520
+
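Steps 1 to 3 can be pictured as three commands chained through intermediate files. The sketch below only prints a plan rather than running anything, and it is an assumption about how such a pipeline could look, not the project's actual script: `plan_tts_pipeline` and the file names `raw.wav`, `fx.wav`, and `mixed.wav` are hypothetical, while `piper --model/--output_file` and `sox -m -v` are existing options of those tools.

```bash
#!/usr/bin/env bash
# Dry-run sketch of steps 1-3: prints commands instead of executing them.
# plan_tts_pipeline and all intermediate file names are hypothetical.
plan_tts_pipeline() {
  local text="$1" model="$2" effects="$3" music="$4" volume="$5"
  # 1. Generate base audio (piper reads the text on stdin)
  printf 'echo %q | piper --model %q --output_file raw.wav\n' "$text" "$model"
  # 2. Apply SoX effects only when the config field is non-empty
  if [ -n "$effects" ]; then
    printf 'sox raw.wav fx.wav %s\n' "$effects"
  else
    printf 'cp raw.wav fx.wav\n'
  fi
  # 3. Mix the voice at full level with the background at the configured level
  printf 'sox -m -v 1.0 fx.wav -v %s %q mixed.wav\n' "$volume" "$music"
}

plan_tts_pipeline "Hello from OpenClaw" "en_US-amy-medium.onnx" \
  "reverb 30 40 70 gain -1" "agentvibes_soft_flamenco_loop.mp3" "0.25"
```

Printing the plan first makes it easy to check the effect string and volume from `audio-effects.cfg` before any audio is generated. Whether a given SoX build can read MP3 input depends on how it was compiled.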
1521
+ This allows **per-message customization** or **consistent agent branding** with unique audio signatures!
1522
+
1523
+ ### 🔊 Remote SSH Audio
1524
+
1525
+ Perfect for running OpenClaw on a remote server with audio on your local machine:
1526
+
1527
+ **Quick Setup:**
1528
+
1529
+ 1. **Remote server** - Configure PulseAudio:
1530
+ ```bash
1531
+ echo 'export PULSE_SERVER=tcp:localhost:14713' >> ~/.bashrc
1532
+ source ~/.bashrc
1533
+ ```
1534
+
1535
+ 2. **Local machine** - Add SSH tunnel (`~/.ssh/config`):
1536
+ ```
1537
+ Host your-server
1538
+ RemoteForward 14713 localhost:14713
1539
+ ```
1540
+
1541
+ 3. **Connect and test**:
1542
+ ```bash
1543
+ ssh your-server
1544
+ npx agentvibes speak "Testing remote audio from OpenClaw"
1545
+ ```
1546
+
1547
+ Audio plays on your local speakers! 🔊
1548
+
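When the test is silent, a first check is whether `PULSE_SERVER` on the server really has the `tcp:HOST:PORT` shape that the tunnel expects. A minimal sketch, assuming a hypothetical `parse_pulse_server` helper that handles only that shape:

```bash
#!/usr/bin/env bash
# Sketch: split a PULSE_SERVER value of the form tcp:HOST:PORT.
# parse_pulse_server is a hypothetical helper; other forms are rejected.
parse_pulse_server() {
  local value="$1" rest
  case "$value" in
    tcp:*:*) ;;
    *) echo "unsupported PULSE_SERVER value: $value" >&2; return 1 ;;
  esac
  rest="${value#tcp:}"                      # e.g. localhost:14713
  printf '%s %s\n' "${rest%%:*}" "${rest##*:}"
}

parse_pulse_server "tcp:localhost:14713"   # prints: localhost 14713
```

The port printed here should match the first number in the `RemoteForward` line on the local machine.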
1549
+ ### 📚 Documentation
1550
+
1551
+ - **OpenClaw Skill**: [.clawdbot/README.md](.clawdbot/README.md)
1552
+ - **OpenClaw Website**: https://openclaw.ai/
1553
+ - **Remote Audio Setup**: [docs/remote-audio-setup.md](docs/remote-audio-setup.md)
1554
+ - **Security Hardening**: [docs/security-hardening-guide.md](docs/security-hardening-guide.md) ⚠️
1555
+
1556
+ [↑ Back to top](#-table-of-contents)
1557
+
1558
+ ---
1559
+
1560
+ ## 🎙️ AgentVibes Receiver: Remote Audio Streaming from Voiceless Servers
1561
+
1562
+ **Receive and play TTS audio from servers that have no audio output!**
1563
+
1564
+ AgentVibes Receiver is a lightweight audio client that runs on your phone, tablet, or personal computer. It receives TTS audio from the remote voiceless server where your OpenClaw Personal Assistant or Claude Code project is installed.
1565
+
1566
+ ### 🎯 What AgentVibes Receiver Solves
1567
+
1568
+ You have OpenClaw running on a Mac mini or remote server with **no audio output**:
1569
+ - 🖥️ Mac mini (silent)
1570
+ - 🖥️ Ubuntu server (headless)
1571
+ - ☁️ AWS/DigitalOcean instance
1572
+ - 📦 Docker container
1573
+ - 🪟 WSL (Windows Subsystem for Linux)
1574
+
1575
+ Users message you via WhatsApp, Telegram, or Discord, but only get text responses:
1576
+ - ❌ No voice = Less engaging experience
1577
+ - ❌ No personality = Feels robotic
1578
+ - ❌ No audio cues = Miss important context
1579
+
1580
+ **AgentVibes Receiver transforms this:**
1581
+ - ✅ OpenClaw speaks with voice (Siri-like experience)
1582
+ - ✅ Audio streams to your device automatically
1583
+ - ✅ You hear responses on your speakers
1584
+ - ✅ Users get a conversational AI experience
1585
+
1586
+ ### 🔧 How It Works
1587
+
1588
+ **One-time setup:**
1589
+ 1. Install AgentVibes on your voiceless server with OpenClaw
1590
+ 2. Install AgentVibes Receiver on your personal device (phone/tablet/laptop)
1591
+ 3. Connect via SSH tunnel (or Tailscale VPN)
1592
+ 4. Done - automatic from then on
1593
+
1594
+ **Flow diagram:**
1595
+ ```
1596
+ ┌──────────────────────────────────────────┐
1597
+ │ Your Mac mini / Server │
1598
+ │ (OpenClaw + AgentVibes) │
1599
+ │ • Generates TTS audio │
1600
+ │ • Sends via SSH tunnel │
1601
+ └──────────────────────────────────────────┘
1602
+ ↓ Encrypted SSH tunnel
1603
+ ┌──────────────────────────────────────────┐
1604
+ │ Your Phone / Laptop │
1605
+ │ (AgentVibes Receiver) │
1606
+ │ • Receives audio stream (or text stream) │
1607
+ │ • Auto-plays on device speakers │
1608
+ └──────────────────────────────────────────┘
1609
+ ```
1610
+
1611
+ **Real-world example:**
1612
+ ```
1613
+ 📱 WhatsApp: "Tell me about quantum computing"
1614
+
1615
+ 🖥️ Mac mini: OpenClaw processes + generates TTS
1616
+ ↓ SSH tunnel (audio or text stream)
1617
+ 📱 Your phone (AgentVibes Receiver): Plays audio 🔊
1618
+
1619
+ You hear on your device speakers: "Quantum computing uses quantum bits..."
1620
+
1621
+ 💬 Conversation feels alive!
1622
+ ```
1623
+
1624
+ ### ✨ Key Features
1625
+
1626
+ | Feature | Benefit |
1627
+ |---------|---------|
1628
+ | **One-Time Pairing** | SSH key setup, automatic reconnect |
1629
+ | **Real-Time Streaming** | Low-latency audio playback |
1630
+ | **SSH Encryption** | Secure audio tunnel |
1631
+ | **Tailscale Support** | Easy VPN for remote servers |
1632
+ | **Voice Selection** | Configure server-side voice |
1633
+ | **Audio Effects** | Reverb, echo, pitch on server |
1634
+ | **Cache Tracking** | Monitor audio generation |
1635
+ | **Multiple Servers** | Connect to different OpenClaw instances |
1636
+
1637
+ ### 🚀 Perfect For
1638
+
1639
+ - 🖥️ **Mac mini + OpenClaw** - Home server with professional voices
1640
+ - ☁️ **Remote Servers** - OpenClaw on AWS/GCP/DigitalOcean
1641
+ - 📱 **WhatsApp/Telegram** - Users message, hear responses
1642
+ - 🎓 **Discord Bots** - Bot speaks with voices
1643
+ - 🏗️ **Docker/Containers** - Containerized OpenClaw with audio
1644
+ - 🔧 **WSL Development** - Windows developers using voiceless WSL
1645
+
1646
+ ### 📝 Setup
1647
+
1648
+ ```bash
1649
+ # On your server (Mac mini, Ubuntu, AWS, etc.)
1650
+ npx agentvibes install
1651
+ # Select the OpenClaw option when prompted
1652
+ # AgentVibes installs with SSH-Remote provider
1653
+
1654
+ # On your personal device (phone, laptop, tablet)
1655
+ npx agentvibes receiver setup
1656
+ # Pairing prompt with server SSH key
1657
+ # Done!
1658
+ ```
1659
+
1660
+ ### 📚 Documentation
1661
+
1662
+ **[→ View AgentVibes Receiver Setup Guide](docs/agentvibes-receiver.md)** - Pairing, SSH configuration, Tailscale setup, troubleshooting
1663
+
1664
+ **[→ View OpenClaw Integration Guide](docs/openclaw-integration.md)** - Server setup, voice configuration, audio effects, and best practices
1665
+
1666
+ [↑ Back to top](#-table-of-contents)
1667
+
1668
+ ---
1669
+
1670
+ ## 📦 Installation Structure
1671
+
1672
+ **What gets installed:** Commands, hooks, personalities, and plugins in the `.claude/` directory.
1673
+
1674
+ **[→ View Complete Installation Structure](docs/installation-structure.md)** - Full directory tree, file descriptions, and settings storage
1675
+
1676
+ [↑ Back to top](#-table-of-contents)
1677
+
1678
+ ---
1679
+
1680
+ ## 💡 Common Workflows
1681
+
1682
+ ```bash
1683
+ # Switch voices
1684
+ /agent-vibes:list # See all voices
1685
+ /agent-vibes:switch Aria # Change voice
1686
+
1687
+ # Try personalities
1688
+ /agent-vibes:personality pirate # Pirate voice + style
1689
+ /agent-vibes:personality list # See all 19 personalities
1690
+
1691
+ # Speak in other languages
1692
+ /agent-vibes:set-language spanish # Speak in Spanish
1693
+ /agent-vibes:set-language list # See 30+ languages
1694
+
1695
+ # Replay audio
1696
+ /agent-vibes:replay # Replay last message
1697
+ ```
1698
+
1699
+ **💡 Tip:** Using MCP? Just say "Switch to Aria voice" or "Speak in Spanish" instead of typing commands.
1700
+
1701
+ [↑ Back to top](#-table-of-contents)
1702
+
1703
+ ---
1704
+
1705
+ ## 🔧 Advanced Features
1706
+
1707
+ AgentVibes supports **custom personalities** and **custom voices**.
1708
+
1709
+ **Quick Examples:**
1710
+ ```bash
1711
+ # Create custom personality
1712
+ /agent-vibes:personality add mycustom
1713
+
1714
+ # Add custom Piper voice
1715
+ /agent-vibes:add "My Voice" abc123xyz789
1716
+
1717
+ # Use in custom output styles
1718
+ [Bash: .claude/hooks/play-tts.sh "Starting" "Aria"]
1719
+ ```
1720
+
1721
+ **[→ View Advanced Features Guide](docs/advanced-features.md)** - Custom personalities, custom voices, and more
1722
+
1723
+ [↑ Back to top](#-table-of-contents)
1724
+
1725
+ ---
1726
+
1727
+ ## 🔊 Remote Audio Setup
1728
+
1729
+ **Running AgentVibes on a remote server?** No problem!
1730
+
1731
+ ✅ **Auto-detects SSH sessions** - Works with VS Code Remote SSH, regular SSH, cloud dev environments
1732
+ ✅ **Zero configuration** - Audio optimizes automatically
1733
+ ✅ **No static/clicking** - Clean playback through SSH tunnels
1734
+
1735
+ **[→ Remote Audio Setup Guide](docs/remote-audio-setup.md)** - Full PulseAudio configuration details
1736
+
1737
+ [↑ Back to top](#-table-of-contents)
1738
+
1739
+ ---
1740
+
1741
+ ## 🛠️ Technical Documentation
1742
+
1743
+ ### Audio Architecture
1744
+
1745
+ AgentVibes uses a cross-platform audio module (`src/console/audio-env.js`) that handles player detection and environment configuration for all supported platforms.
1746
+
1747
+ #### Platform Audio Support Matrix
1748
+
1749
+ | Platform | PulseAudio Config | MP3 Players (preference order) | WAV Players (preference order) |
1750
+ |----------|-------------------|-------------------------------|-------------------------------|
1751
+ | **Native Linux** | System default (not overridden) | ffplay → play (sox) → mpg123 → cvlc → mpv | aplay → paplay → play → ffplay |
1752
+ | **WSL2** | Auto-detects `/mnt/wslg/PulseServer` | Same as Linux | Same as Linux |
1753
+ | **macOS** | Not applicable | ffplay → play → mpg123 → cvlc → mpv → afplay | aplay → paplay → play → ffplay → afplay |
1754
+ | **Windows** | Not applicable | ffplay → mpv (if installed) | ffplay → mpv → PowerShell SoundPlayer (built-in) |
1755
+
1756
+ #### Key Design Decisions
1757
+
1758
+ - **Direct spawn, not shell chains**: Audio players are spawned directly via Node's `spawn()` instead of `sh -c 'cmd1 || cmd2'` chains. VLC/cvlc crashes when stderr is redirected inside shell wrappers.
1759
+ - **Player detection at startup**: The available player is detected once using `which` and cached. No runtime fallback chains.
1760
+ - **PULSE_SERVER safety**: The WSL2 PulseServer path (`/mnt/wslg/PulseServer`) is only set when the socket file actually exists. Hardcoding it on native Linux silently breaks audio output.
1761
+ - **Windows WAV fallback**: PowerShell's `System.Media.SoundPlayer` is used as a built-in fallback when no cross-platform player is installed.
1762
+
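Two of these decisions, detect-once caching and the guarded `PULSE_SERVER` assignment, are easy to picture in shell. The project's own logic is described as living in `src/console/audio-env.js`, so this is only an analogous sketch: `detect_player`, `configure_pulse_for_wsl`, and the `DETECTED_PLAYER` variable are hypothetical names, and the `unix:` prefix is the usual PulseAudio form for a socket path.

```bash
#!/usr/bin/env bash
# Sketch of "detect once, then reuse" and of the PULSE_SERVER safety rule.
# All names here are hypothetical; the real implementation is in JavaScript.
DETECTED_PLAYER=""

detect_player() {
  local candidate
  if [ -n "${DETECTED_PLAYER:-}" ]; then
    return 0                      # already cached, skip the PATH lookups
  fi
  for candidate in "$@"; do       # candidates in preference order
    if command -v "$candidate" >/dev/null 2>&1; then
      DETECTED_PLAYER="$candidate"
      return 0
    fi
  done
  return 1
}

configure_pulse_for_wsl() {
  local socket="${1:-/mnt/wslg/PulseServer}"
  # Only point PulseAudio at the WSLg socket when it really exists
  if [ -S "$socket" ]; then
    export PULSE_SERVER="unix:$socket"
  fi
}

configure_pulse_for_wsl
if detect_player ffplay play mpg123 cvlc mpv; then
  echo "MP3 player: $DETECTED_PLAYER"
else
  echo "no MP3 player found" >&2
fi
```

On native Linux the socket test fails, so `PULSE_SERVER` is left at whatever the system already provides.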
1763
+ #### Multi-Speaker Voice Models
1764
+
1765
+ Piper supports multi-speaker ONNX models (e.g., `16Speakers.onnx`) that contain multiple voices in a single file. AgentVibes expands these automatically:
1766
+
1767
+ - The `.onnx.json` metadata file contains `num_speakers` and `speaker_id_map`
1768
+ - `scanInstalledVoices()` expands multi-speaker models into individual selectable entries (e.g., `16Speakers::Cori_Samuel`)
1769
+ - When selected, the system writes `tts-piper-model.txt` and `tts-piper-speaker-id.txt` to `.claude/`
1770
+ - `play-tts-piper.sh` reads these files and passes `--speaker <id>` to the piper binary
1771
+
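The last two bullets describe a simple hand-off through two text files. A minimal sketch of the reading side, assuming `tts-piper-model.txt` holds a model path on a single line and that the speaker file is absent or empty for single-speaker models; `build_piper_args` is a hypothetical name, and this is not the content of `play-tts-piper.sh`:

```bash
#!/usr/bin/env bash
# Sketch: build piper arguments from the two state files described above.
# build_piper_args is hypothetical; --model and --speaker are piper flags.
build_piper_args() {
  local dir="$1" model speaker
  model="$(cat "$dir/tts-piper-model.txt" 2>/dev/null)" || return 1
  if [ -z "$model" ]; then
    return 1
  fi
  printf -- '--model %s' "$model"
  # Add --speaker only when a speaker ID has been written
  if [ -s "$dir/tts-piper-speaker-id.txt" ]; then
    speaker="$(cat "$dir/tts-piper-speaker-id.txt")"
    printf -- ' --speaker %s' "$speaker"
  fi
  printf '\n'
}

build_piper_args "$PWD/.claude" || echo "no Piper model selected" >&2
```

Keeping the speaker ID in its own file means a single-speaker model needs no special case: the file is simply missing and `--speaker` is omitted.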
1772
+ #### Voice Directory Resolution
1773
+
1774
+ Voice storage follows the same precedence chain in both JavaScript and shell:
1775
+
1776
+ 1. `PIPER_VOICES_DIR` environment variable
1777
+ 2. Project-local `.claude/piper-voices-dir.txt` (walks up directory tree)
1778
+ 3. Global `~/.claude/piper-voices-dir.txt`
1779
+ 4. Default `~/.claude/piper-voices`
1780
+
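The precedence chain can be sketched as a single shell function. This is an illustration of the four steps as listed, not a copy of the project's code: `resolve_voices_dir` is a hypothetical name, and the sketch assumes each `piper-voices-dir.txt` holds the directory on its first line.

```bash
#!/usr/bin/env bash
# Sketch of the four-step precedence chain for the voice directory.
# resolve_voices_dir is a hypothetical name; real code may differ in detail.
resolve_voices_dir() {
  local dir="${1:-$PWD}" file

  # 1. Environment variable wins
  if [ -n "${PIPER_VOICES_DIR:-}" ]; then
    printf '%s\n' "$PIPER_VOICES_DIR"
    return 0
  fi

  # 2. Project-local file, walking up toward the filesystem root
  while :; do
    file="$dir/.claude/piper-voices-dir.txt"
    if [ -s "$file" ]; then
      head -n 1 "$file"
      return 0
    fi
    case "$dir" in
      /|.|"") break ;;            # reached the top, stop walking
    esac
    dir="$(dirname "$dir")"
  done

  # 3. Global file in the home directory
  file="$HOME/.claude/piper-voices-dir.txt"
  if [ -s "$file" ]; then
    head -n 1 "$file"
    return 0
  fi

  # 4. Built-in default
  printf '%s\n' "$HOME/.claude/piper-voices"
}

resolve_voices_dir
```

Putting the environment variable first lets a single shell session override the location without editing any file.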
1781
+ #### Voice Catalog System
1782
+
1783
+ AgentVibes includes a 914-voice catalog (`voice-assignments.json`) that lets users browse, preview, and install voices directly from the Voices tab:
1784
+
1785
+ - **10 Curated Voices** — Hand-picked high-quality voices installed by default
1786
+ - **904 LibriTTS Speakers** - Automatically extracted from the `16Speakers` multi-speaker model's `speaker_id_map`, plus the full LibriTTS catalog from Hugging Face
1787
+ - **Download on Demand** — Uninstalled voices appear greyed-out in the list; pressing Enter opens a download modal that fetches the voice via `piper-voice-manager.sh`
1788
+ - **Catalog Metadata** — Each entry includes `voiceId`, `displayName`, `gender`, `type` (curated/libritts), and download URL
1789
+ - **LibriTTS Speaker Names** — Raw numeric IDs are patched at load time using `patchLibriTTSSpeakerNames()` which maps speaker IDs to human-readable names from the registry
1790
+
1791
+ The catalog is loaded once at tab initialization by `loadCatalog()`. Installed voices (from disk scan) are shown with full color; catalog-only voices are dimmed until downloaded.
1792
+
1793
+ #### Required System Dependencies for Background Music
1794
+
1795
+ Background music requires an MP3-capable audio player. The installer detects missing players and offers to install `ffmpeg` automatically. If no player is found, the Music tab displays a clear error message.
1796
+
1797
+ ```bash
1798
+ # Install ffmpeg (recommended; provides ffplay)
1799
+ # Ubuntu/Debian/WSL2:
1800
+ sudo apt install ffmpeg
1801
+
1802
+ # macOS:
1803
+ brew install ffmpeg
1804
+
1805
+ # Arch Linux:
1806
+ sudo pacman -S ffmpeg
1807
+ ```
1808
+
1809
+ [↑ Back to top](#-table-of-contents)
1810
+
1811
+ ---
1812
+
1813
+ ## 🔗 Useful Links
1814
+
1815
+ ### Voice & AI Tools
1816
+
1817
+ - 🎤 **[WhisperTyping](https://whispertyping.com/)** - Fast voice-to-text typing for developers
1818
+ - 🗣️ **[OpenWhisper (Azure)](https://learn.microsoft.com/en-us/azure/ai-services/speech-service/whisper-overview)** - Microsoft's speech-to-text service
1819
+ - 🆓 **[Piper TTS](https://github.com/rhasspy/piper)** - Free offline neural TTS
1820
+ - 🤖 **[Claude Code](https://claude.com/claude-code)** - AI coding assistant
1821
+ - 🎭 **[BMAD METHOD](https://github.com/bmad-code-org/BMAD-METHOD)** - Multi-agent framework
1822
+
1823
+ ### AgentVibes Resources
1824
+
1825
+ - 🐛 **[Issues](https://github.com/paulpreibisch/AgentVibes/issues)** - Report bugs
1826
+ - 📝 **[Changelog](https://github.com/paulpreibisch/AgentVibes/releases)** - Version history
1827
+ - 📰 **[Technical Deep Dive - LinkedIn Article](https://www.linkedin.com/pulse/agent-vibes-add-voice-claude-code-deep-dive-npx-paul-preibisch-8zrcc/)** - How AgentVibes works under the hood
1828
+
1829
+ [↑ Back to top](#-table-of-contents)
1830
+
1831
+ ---
1832
+
1833
+ ## ❓ Troubleshooting
1834
+
1835
+ **Common Issues:**
1836
+
1837
+ **❌ Error: "git-lfs is not installed"**
1838
+
1839
+ **AgentVibes does NOT require git-lfs.** This error suggests:
1840
+
1841
+ 1. **Wrong installation method** - Use npm, not git clone:
1842
+ ```bash
1843
+ # ✅ CORRECT - Use this:
1844
+ npx agentvibes install
1845
+
1846
+ # ❌ WRONG - Don't clone unless contributing:
1847
+ git clone https://github.com/paulpreibisch/AgentVibes.git
1848
+ ```
1849
+
1850
+ 2. **Different project** - You may be in a BMAD-METHOD or other repo that uses git-lfs
1851
+
1852
+ 3. **Global git config** - Your git may have lfs enabled globally:
1853
+ ```bash
1854
+ git config --global --list | grep lfs
1855
+ ```
1856
+
1857
+ **Solution:** Use `npx agentvibes install` - no git operations needed!
1858
+
1859
+ ---
1860
+
1861
+ **No Audio Playing?**
1862
+ 1. Verify hook is installed: `ls -la .claude/hooks/session-start-tts.sh`
1863
+ 2. Test: `/agent-vibes:sample Aria`
1864
+
1865
+ **Commands Not Found?**
1866
+ ```bash
1867
+ npx agentvibes install --yes
1868
+ ```
1869
+
1870
+ **[→ View Complete Troubleshooting Guide](docs/troubleshooting.md)** - Solutions for audio issues, command problems, MCP errors, voice issues, and more
1871
+
1872
+ [↑ Back to top](#-table-of-contents)
1873
+
1874
+ ---
1875
+
1876
+ ## 🔄 Updating
1877
+
1878
+ **Quick Update (From Claude Code):**
1879
+ ```bash
1880
+ /agent-vibes:update
1881
+ ```
1882
+
1883
+ **Alternative Methods:**
1884
+ ```bash
1885
+ # Via npx
1886
+ npx agentvibes update --yes
1887
+
1888
+ # Via npm (if installed globally)
1889
+ npm update -g agentvibes && agentvibes update --yes
1890
+ ```
1891
+
1892
+ **Check Version:** `/agent-vibes:version`
1893
+
1894
+ **[→ View Complete Update Guide](docs/updating.md)** - All update methods, version checking, what gets updated, and troubleshooting
1895
+
1896
+ [↑ Back to top](#-table-of-contents)
1897
+
1898
+ ---
1899
+
1900
+ ## 🗑️ Uninstalling
1901
+
1902
+ **Quick Uninstall (Project Only):**
1903
+ ```bash
1904
+ npx agentvibes uninstall
1905
+ ```
1906
+
1907
+ **Uninstall Options:**
1908
+ ```bash
1909
+ # Interactive uninstall (confirms before removing)
1910
+ npx agentvibes uninstall
1911
+
1912
+ # Auto-confirm (skip confirmation prompt)
1913
+ npx agentvibes uninstall --yes
1914
+
1915
+ # Also remove global configuration
1916
+ npx agentvibes uninstall --global
1917
+
1918
+ # Complete uninstall including Piper TTS
1919
+ npx agentvibes uninstall --global --with-piper
1920
+ ```
1921
+
1922
+ **What Gets Removed:**
1923
+
1924
+ **Project-level (default):**
1925
+ - `.claude/commands/agent-vibes/` - Slash commands
1926
+ - `.claude/hooks/` - TTS scripts
1927
+ - `.claude/personalities/` - Personality templates
1928
+ - `.claude/output-styles/` - Output styles
1929
+ - `.claude/audio/` - Audio cache
1930
+ - `.claude/tts-*.txt` - TTS configuration files
1931
+ - `.agentvibes/` - BMAD integration files
1932
+
1933
+ **Global (with `--global` flag):**
1934
+ - `~/.claude/` - Global configuration
1935
+ - `~/.agentvibes/` - Global cache
1936
+
1937
+ **Piper TTS (with `--with-piper` flag):**
1938
+ - `~/piper/` - Piper TTS installation
1939
+
1940
+ **To Reinstall:**
1941
+ ```bash
1942
+ npx agentvibes install
1943
+ ```
1944
+
1945
+ **💡 Tips:**
1946
+ - Default uninstall only removes project-level files
1947
+ - Use `--global` if you want to completely reset AgentVibes
1948
+ - Use `--with-piper` if you also want to remove the Piper TTS engine
1949
+ - Run `npx agentvibes status` to check installation status
1950
+
1951
+ [↑ Back to top](#-table-of-contents)
1952
+
1953
+ ---
1954
+
1955
+ ## Frequently Asked Questions (FAQ)
1956
+
1957
+ ### Installation & Setup
1958
+
1959
+ **Q: Does AgentVibes require git-lfs?**
1960
+ **A:** **NO.** AgentVibes has zero git-lfs requirement. Use `npx agentvibes install` - no git operations needed.
1961
+
1962
+ **Q: Do I need to clone the GitHub repository?**
1963
+ **A:** **NO** (unless you're contributing code). Normal users should use `npx agentvibes install`. Repository cloning is only for developers who want to contribute to the project.
1964
+
1965
+ **Q: Why is the GitHub repo so large?**
1966
+ **A:** The repo includes demo files and development dependencies (node_modules). The actual npm package you download is **< 50MB** and optimized for users.
1967
+
1968
+ **Q: What's the difference between npm install and git clone?**
1969
+ **A:**
1970
+ - `npx agentvibes install` → **For users** - Downloads pre-built package, zero git operations, instant setup
1971
+ - `git clone ...` → **For developers only** - Full source code, development setup, contributing code
1972
+
1973
+ **Q: I saw an error about git-lfs, is something wrong?**
1974
+ **A:** You're likely:
1975
+ 1. Using the wrong installation method (use `npx`, not `git clone`)
1976
+ 2. In a different project directory that uses git-lfs
1977
+ 3. Using a global git config with lfs enabled
1978
+
1979
+ AgentVibes itself does NOT use or require git-lfs.
1980
+
1981
+ ### Features & Usage
1982
+
1983
+ **Q: Does MCP consume tokens from my context window?**
1984
+ **A:** **YES.** Every MCP tool schema adds to the context window. AgentVibes MCP is designed to be minimal (~1500-2000 tokens), but if you're concerned about token usage, you can use slash commands instead of MCP.
1985
+
1986
+ **Q: What's the difference between using MCP vs slash commands?**
1987
+ **A:**
1988
+ - **MCP**: Natural language ("Switch to Aria voice"), uses ~1500-2000 context tokens
1989
+ - **Slash commands**: Explicit commands (`/agent-vibes:switch Aria`), zero token overhead
1990
+
1991
+ Both do the exact same thing - MCP is more convenient, slash commands are more token-efficient.
1992
+
1993
+ **Q: Is AgentVibes just a bash script?**
1994
+ **A:** No. AgentVibes includes:
1995
+ - Multi-provider TTS abstraction (Piper TTS, macOS Say)
1996
+ - Voice management system with 50+ voices
1997
+ - Personality & sentiment system
1998
+ - Language learning mode with bilingual playback
1999
+ - Audio effects processing (reverb, EQ, compression)
2000
+ - MCP server for natural language control
2001
+ - BMAD integration for multi-agent voice switching
2002
+ - Remote audio optimization for SSH/RDP sessions
2003
+
2004
+ **Q: Can I use AgentVibes without BMAD?**
2005
+ **A:** **YES.** AgentVibes works standalone. BMAD integration is optional - only activates if you install BMAD separately.
2006
+
2007
+ **Q: What are the audio dependencies?**
2008
+ **A:**
2009
+ - **Required**: Node.js 16+, Python 3.10+ (for Piper TTS)
2010
+ - **Optional**: sox (audio effects), ffmpeg (background music, padding)
2011
+ - All TTS generation works without optional dependencies - they just enhance the experience
2012
+
2013
+ ### Voice Features
2014
+
2015
+ **Q: How do I browse and install voices?**
2016
+ **A:** Use the built-in TUI installer by running `/audio-browser` in Claude Code. Navigate with arrow keys, press ENTER to sample voices, and select one to install. AgentVibes switches to the chosen voice automatically.
2017
+
2018
+ **Q: What are friendly voice names?**
2019
+ **A:** Instead of technical IDs like `en_US-ryan-high`, you can now use simple names like "Ryan" when switching voices. All 904+ voices have friendly names matched to their characteristics.
2020
+
2021
+ **Q: How do I set up custom intro text?**
2022
+ **A:** During installation you'll be prompted for intro text. You can also configure it anytime via `npx agentvibes` → Settings tab. Enter text like "FireBot: " and it will prefix all TTS announcements.
2023
+
2024
+ **Q: Can I use my own background music?**
2025
+ **A:** Yes! Run `npx agentvibes` and open the Music tab. Select "Change music" and provide the path to your audio file (.mp3, .wav, .ogg, or .m4a). Files are validated for security and must be under 50MB.
2026
+
2027
+ **Q: What's the recommended duration for custom music?**
2028
+ **A:** Between 30-90 seconds is ideal for smooth looping. The system supports up to 300 seconds (5 minutes) but will warn you if the duration is non-optimal.
2029
+
2030
+ **Q: Are friendly voice names case-sensitive?**
2031
+ **A:** No! You can type "ryan", "Ryan", or "RYAN" - they all work. The voice resolution is case-insensitive.
2032
+
2033
+ **Q: Does custom music work with all TTS providers?**
2034
+ **A:** Yes! Custom background music works with Piper TTS, Soprano, macOS Say, and Windows SAPI.
2035
+
2036
+ **Q: Can I preview music before setting it as my background?**
2037
+ **A:** Yes! In `npx agentvibes` → Music tab, select "Preview current" to hear your music. During installation, you can also sample all built-in tracks.
2038
+
2039
+ **Q: What security measures protect custom music uploads?**
2040
+ **A:** AgentVibes implements **defense-in-depth security with 7 validation layers**, tested against 180+ attack variations:
2041
+
2042
+ 1. **Path Validation** - `path.resolve()` prevents traversal attacks (../, encoded, Unicode)
2043
+ 2. **Home Directory Boundary** - Files must be within your home directory
2044
+ 3. **File Existence Check** - Verifies file actually exists
2045
+ 4. **File Type Verification** - Must be a regular file (not device, socket, etc.)
2046
+ 5. **Ownership Verification** - File must be owned by you (UID check)
2047
+ 6. **Format Validation** - Magic number checking ensures real audio files
2048
+ 7. **Secure Storage** - Files copied to restricted directory with 600 permissions
2049
+
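Layers 1 to 5 translate naturally into file tests. The project is described as using `path.resolve()` in JavaScript, so the following shell version is only an analogous sketch under that assumption: `validate_music_path` is a hypothetical name, symlinks are rejected outright as a simplification, and layers 6 and 7 (magic-number check and secure copy) are left out.

```bash
#!/usr/bin/env bash
# Sketch of validation layers 1-5 as shell tests.
# validate_music_path is hypothetical; errors are deliberately generic.
validate_music_path() {
  local input="$1" home dir resolved
  home="$(cd "$HOME" && pwd -P)" || return 1
  # Layer 3: the file must exist (checked first so its directory resolves)
  if [ ! -e "$input" ]; then echo "rejected" >&2; return 1; fi
  # Layer 1: canonicalize the directory part (removes ../ and symlinked dirs)
  dir="$(cd "$(dirname "$input")" && pwd -P)" || return 1
  resolved="$dir/$(basename "$input")"
  # Layer 2: the canonical path must sit inside the home directory
  case "$resolved" in
    "$home"/*) ;;
    *) echo "rejected" >&2; return 1 ;;
  esac
  # Layer 4: regular file only; symlinks are refused in this sketch
  if [ ! -f "$resolved" ] || [ -L "$resolved" ]; then
    echo "rejected" >&2; return 1
  fi
  # Layer 5: the file must be owned by the current user
  if [ ! -O "$resolved" ]; then echo "rejected" >&2; return 1; fi
  printf '%s\n' "$resolved"
}

validate_music_path "$HOME/Music/loop.mp3" || echo "file not accepted" >&2
```

Every failure prints the same generic message, which matches the stated goal of not disclosing which check a path failed.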
2050
+ **Security Certification:**
2051
+ - ✅ 100% attack rejection rate (107/107 tests passed)
2052
+ - ✅ OWASP CWE-22 compliant (path traversal prevention)
2053
+ - ✅ No information disclosure in error messages
2054
+ - ✅ Production-ready and certified secure
2055
+
2056
+ See full security audit: `docs/security/SECURITY-AUDIT.md`
2057
+
2058
+ **Q: Has the security been independently verified?**
2059
+ **A:** Yes! AgentVibes v3.6.0 includes a comprehensive security audit with 180+ attack variations tested. All path traversal, symlink, Unicode, null byte, and edge case attacks were successfully blocked (100% rejection rate). The system is OWASP CWE-22 compliant and includes a detailed security audit report at `docs/security/SECURITY-AUDIT.md`.
2060
+
2061
+ **Q: What attack patterns were tested?**
2062
+ **A:** The security test suite covers:
2063
+ - **Path Traversal:** 100 variations (basic, URL-encoded, Unicode, null bytes, mixed)
2064
+ - **Symlink Attacks:** 10 variations (sensitive files, chains, traversal targets)
2065
+ - **Hard Link Attacks:** 5 variations (ownership verification)
2066
+ - **Edge Cases:** 65+ variations (CRLF, whitespace, Unicode normalization, platform-specific)
2067
+
2068
+ Every attack was correctly rejected with no information disclosure.
2069
+
2070
+ ### Troubleshooting
2071
+
2072
+ **Q: Why isn't Claude speaking?**
2073
+ **A:** Common causes:
2074
+ 1. Hook not installed - Run `npx agentvibes install --yes`
2075
+ 2. Audio player missing - Install `sox` and `ffmpeg`
2076
+ 3. TTS protocol not enabled in settings
2077
+ 4. Test with `/agent-vibes:sample Aria`
2078
+
2079
+ **Q: Can I use this on Windows?**
2080
+ **A:** Yes! AgentVibes supports **native Windows** with PowerShell scripts (Soprano, Piper, SAPI providers). See [Windows Native Setup](WINDOWS-SETUP.md). WSL is also supported for legacy workflows - see [Windows WSL Guide](mcp-server/WINDOWS_SETUP.md).
2081
+
2082
+ **Q: How do I reduce token usage?**
2083
+ **A:**
2084
+ 1. Use slash commands instead of MCP (zero context token overhead)
2085
+ 2. Set verbosity to LOW (`/agent-vibes:verbosity low`)
2086
+ 3. Disable BMAD integration if not using it
2087
+
2088
+ [↑ Back to top](#-table-of-contents)
2089
+
2090
+ ---
2091
+
2092
+ ## ⚠️ Important Disclaimers
2093
+
2094
+ **API Costs & Usage:**
2095
+ - Usage is completely free with Piper TTS and macOS Say (no API costs)
2096
+ - Users are solely responsible for their own API costs and usage
2097
+
2098
+
2099
+ **Third-Party Services:**
2100
+ - This project integrates with Piper TTS (local processing) and macOS Say (system built-in)
2101
+ - We are **not affiliated with, endorsed by, or officially connected** to Anthropic, Apple, or Claude
2102
+ - Piper TTS is subject to its terms of service
2103
+
2104
+ **Privacy & Data:**
2105
+ - **Piper TTS**: All processing happens locally on your machine, no external data transmission
2106
+ - **macOS Say**: All processing happens locally using Apple's built-in speech synthesis
2107
+
2108
+ **Software License:**
2109
+ - Provided "as-is" under Apache 2.0 License without warranty of any kind
2110
+ - See [LICENSE](LICENSE) file for full terms
2111
+ - No liability for data loss, bugs, service interruptions, or any damages
2112
+
2113
+ **Use at Your Own Risk:**
2114
+ - This is open-source software maintained by the community
2115
+ - Always test in development before production use
2116
+ - Monitor your API usage and costs regularly
2117
+
2118
+ [↑ Back to top](#-table-of-contents)
2119
+
2120
+ ---
2121
+
2122
+ ## 🙏 Credits
2123
+
2124
+ **Built with ❤️ by [Paul Preibisch](https://github.com/paulpreibisch)**
2125
+
2126
+ - 🐦 Twitter: [@997Fire](https://x.com/997Fire)
2127
+ - 💼 LinkedIn: [paul-preibisch](https://www.linkedin.com/in/paul-preibisch/)
2128
+ - 🌐 GitHub: [paulpreibisch](https://github.com/paulpreibisch)
2129
+
2130
+ **Powered by:**
2131
+ - [Piper TTS](https://github.com/rhasspy/piper) - Free neural voices
2132
+ - [Soprano TTS](https://github.com/suno-ai/bark) - Ultra-fast neural TTS
2133
+ - **Windows SAPI** - Native Windows text-to-speech
2134
+ - **macOS Say** - Native macOS text-to-speech
2135
+ - [Claude Code](https://claude.com/claude-code) - AI coding assistant
2136
+ - Licensed under Apache 2.0
2137
+
2138
+ **Contributors:**
2139
+ - 🎤 [@nathanchase](https://github.com/nathanchase) - Soprano TTS Provider integration (PR #95) - Ultra-fast neural TTS with GPU acceleration
2140
+
2141
+ **Special Thanks:**
2142
+ - 💡 [Claude Code Hooks Mastery](https://github.com/disler/claude-code-hooks-mastery) by [@disler](https://github.com/disler) - Hooks inspiration
2143
+ - 🤖 [BMAD METHOD](https://github.com/bmad-code-org/BMAD-METHOD) - Multi-agent framework with auto voice switching integration
2144
+
2145
+ [↑ Back to top](#-table-of-contents)
2146
+
2147
+ ---
2148
+
2149
+ ## 🤝 Contributing
2150
+
2151
+ If AgentVibes makes your coding more fun:
2152
+ - ⭐ **Star this repo** on GitHub
2153
+ - 🐦 **Tweet** and tag [@997Fire](https://x.com/997Fire)
2154
+ - 🎥 **Share videos** of Claude with personality
2155
+ - 💬 **Tell dev friends** about voice-powered AI
2156
+
2157
+ ---
2158
+
2159
+ **Ready to give Claude a voice? Install now and code with personality! 🎤✨**
2160
+
2161
+ [↑ Back to top](#-table-of-contents)
2162
+