agentvibes 2.1.0 → 2.1.2
- package/.bmad-core/agent-teams/team-all.yaml +15 -0
- package/.bmad-core/agent-teams/team-fullstack.yaml +19 -0
- package/.bmad-core/agent-teams/team-ide-minimal.yaml +11 -0
- package/.bmad-core/agent-teams/team-no-ui.yaml +14 -0
- package/.bmad-core/agents/analyst.md +84 -0
- package/.bmad-core/agents/architect.md +85 -0
- package/.bmad-core/agents/bmad-master.md +110 -0
- package/.bmad-core/agents/bmad-orchestrator.md +147 -0
- package/.bmad-core/agents/dev.md +81 -0
- package/.bmad-core/agents/pm.md +84 -0
- package/.bmad-core/agents/po.md +79 -0
- package/.bmad-core/agents/qa.md +87 -0
- package/.bmad-core/agents/sm.md +65 -0
- package/.bmad-core/agents/ux-expert.md +69 -0
- package/.bmad-core/checklists/architect-checklist.md +440 -0
- package/.bmad-core/checklists/change-checklist.md +184 -0
- package/.bmad-core/checklists/pm-checklist.md +372 -0
- package/.bmad-core/checklists/po-master-checklist.md +434 -0
- package/.bmad-core/checklists/story-dod-checklist.md +96 -0
- package/.bmad-core/checklists/story-draft-checklist.md +155 -0
- package/.bmad-core/core-config.yaml +22 -0
- package/.bmad-core/data/bmad-kb.md +809 -0
- package/.bmad-core/data/brainstorming-techniques.md +38 -0
- package/.bmad-core/data/elicitation-methods.md +156 -0
- package/.bmad-core/data/technical-preferences.md +5 -0
- package/.bmad-core/data/test-levels-framework.md +148 -0
- package/.bmad-core/data/test-priorities-matrix.md +174 -0
- package/.bmad-core/enhanced-ide-development-workflow.md +248 -0
- package/.bmad-core/install-manifest.yaml +230 -0
- package/.bmad-core/tasks/advanced-elicitation.md +119 -0
- package/.bmad-core/tasks/apply-qa-fixes.md +150 -0
- package/.bmad-core/tasks/brownfield-create-epic.md +162 -0
- package/.bmad-core/tasks/brownfield-create-story.md +149 -0
- package/.bmad-core/tasks/correct-course.md +72 -0
- package/.bmad-core/tasks/create-brownfield-story.md +314 -0
- package/.bmad-core/tasks/create-deep-research-prompt.md +280 -0
- package/.bmad-core/tasks/create-doc.md +103 -0
- package/.bmad-core/tasks/create-next-story.md +114 -0
- package/.bmad-core/tasks/document-project.md +345 -0
- package/.bmad-core/tasks/execute-checklist.md +88 -0
- package/.bmad-core/tasks/facilitate-brainstorming-session.md +138 -0
- package/.bmad-core/tasks/generate-ai-frontend-prompt.md +53 -0
- package/.bmad-core/tasks/index-docs.md +175 -0
- package/.bmad-core/tasks/kb-mode-interaction.md +77 -0
- package/.bmad-core/tasks/nfr-assess.md +345 -0
- package/.bmad-core/tasks/qa-gate.md +163 -0
- package/.bmad-core/tasks/review-story.md +316 -0
- package/.bmad-core/tasks/risk-profile.md +355 -0
- package/.bmad-core/tasks/shard-doc.md +187 -0
- package/.bmad-core/tasks/test-design.md +176 -0
- package/.bmad-core/tasks/trace-requirements.md +266 -0
- package/.bmad-core/tasks/validate-next-story.md +136 -0
- package/.bmad-core/templates/architecture-tmpl.yaml +651 -0
- package/.bmad-core/templates/brainstorming-output-tmpl.yaml +156 -0
- package/.bmad-core/templates/brownfield-architecture-tmpl.yaml +477 -0
- package/.bmad-core/templates/brownfield-prd-tmpl.yaml +281 -0
- package/.bmad-core/templates/competitor-analysis-tmpl.yaml +307 -0
- package/.bmad-core/templates/front-end-architecture-tmpl.yaml +219 -0
- package/.bmad-core/templates/front-end-spec-tmpl.yaml +350 -0
- package/.bmad-core/templates/fullstack-architecture-tmpl.yaml +824 -0
- package/.bmad-core/templates/market-research-tmpl.yaml +253 -0
- package/.bmad-core/templates/prd-tmpl.yaml +203 -0
- package/.bmad-core/templates/project-brief-tmpl.yaml +222 -0
- package/.bmad-core/templates/qa-gate-tmpl.yaml +103 -0
- package/.bmad-core/templates/story-tmpl.yaml +138 -0
- package/.bmad-core/user-guide.md +577 -0
- package/.bmad-core/utils/bmad-doc-template.md +327 -0
- package/.bmad-core/utils/workflow-management.md +71 -0
- package/.bmad-core/workflows/brownfield-fullstack.yaml +298 -0
- package/.bmad-core/workflows/brownfield-service.yaml +188 -0
- package/.bmad-core/workflows/brownfield-ui.yaml +198 -0
- package/.bmad-core/workflows/greenfield-fullstack.yaml +241 -0
- package/.bmad-core/workflows/greenfield-service.yaml +207 -0
- package/.bmad-core/workflows/greenfield-ui.yaml +236 -0
- package/.bmad-core/working-in-the-brownfield.md +606 -0
- package/.claude/commands/BMad/analyst.md +88 -0
- package/.claude/commands/BMad/architect.md +89 -0
- package/.claude/commands/BMad/bmad-master.md +114 -0
- package/.claude/commands/BMad/bmad-orchestrator.md +151 -0
- package/.claude/commands/BMad/dev.md +85 -0
- package/.claude/commands/BMad/pm.md +88 -0
- package/.claude/commands/BMad/po.md +83 -0
- package/.claude/commands/BMad/qa.md +91 -0
- package/.claude/commands/BMad/sm.md +69 -0
- package/.claude/commands/BMad/tasks/advanced-elicitation.md +123 -0
- package/.claude/commands/BMad/tasks/apply-qa-fixes.md +154 -0
- package/.claude/commands/BMad/tasks/brownfield-create-epic.md +166 -0
- package/.claude/commands/BMad/tasks/brownfield-create-story.md +153 -0
- package/.claude/commands/BMad/tasks/correct-course.md +76 -0
- package/.claude/commands/BMad/tasks/create-brownfield-story.md +318 -0
- package/.claude/commands/BMad/tasks/create-deep-research-prompt.md +284 -0
- package/.claude/commands/BMad/tasks/create-doc.md +107 -0
- package/.claude/commands/BMad/tasks/create-next-story.md +118 -0
- package/.claude/commands/BMad/tasks/document-project.md +349 -0
- package/.claude/commands/BMad/tasks/execute-checklist.md +92 -0
- package/.claude/commands/BMad/tasks/facilitate-brainstorming-session.md +142 -0
- package/.claude/commands/BMad/tasks/generate-ai-frontend-prompt.md +57 -0
- package/.claude/commands/BMad/tasks/index-docs.md +179 -0
- package/.claude/commands/BMad/tasks/kb-mode-interaction.md +81 -0
- package/.claude/commands/BMad/tasks/nfr-assess.md +349 -0
- package/.claude/commands/BMad/tasks/qa-gate.md +167 -0
- package/.claude/commands/BMad/tasks/review-story.md +320 -0
- package/.claude/commands/BMad/tasks/risk-profile.md +359 -0
- package/.claude/commands/BMad/tasks/shard-doc.md +191 -0
- package/.claude/commands/BMad/tasks/test-design.md +180 -0
- package/.claude/commands/BMad/tasks/trace-requirements.md +270 -0
- package/.claude/commands/BMad/tasks/validate-next-story.md +140 -0
- package/.claude/commands/BMad/ux-expert.md +73 -0
- package/.claude/hooks/piper-installer.sh +2 -2
- package/README.md +10 -11
- package/docs/technical-deep-dive.md +905 -0
- package/linkedin/vibe-coding-and-pulseaudio.md +121 -0
- package/mcp-server/agentvibes.db +0 -0
- package/package.json +1 -1
- package/scripts/audio-tunnel.config +17 -0
- package/src/installer.js +3 -3
|
@@ -0,0 +1,905 @@
|
|
|
1
|
+
# How AgentVibes Works Under the Hood: A Technical Deep Dive
|
|
2
|
+
|
|
3
|
+
Two months ago, I wanted to add voice and personality to my Claude coding agents so they would speak acknowledgments and completions—making my development workflow more engaging and keeping me in flow state. Fast forward to today, and we've built an amazing working system that not only speaks with over 150 voices, but does so with distinct personalities ranging from zen masters to sarcastic companions that add a bit of sass to your coding sessions.
|
|
4
|
+
|
|
5
|
+
In this article, we take a deep dive into how AgentVibes works under the hood: the architecture, the design patterns, and the implementations that make it all possible. Best of all, **this is an open source project that is completely free**, and it can change how you work with AI coding assistants.
|
|
6
|
+
|
|
7
|
+
## The Big Picture: What Problem Does AgentVibes Solve?
|
|
8
|
+
|
|
9
|
+
Claude Code is an amazing AI coding assistant, but it's entirely text-based. You type a request, Claude responds with text, runs commands, and writes code. But what if Claude could *tell* you when it's starting a task? What if it could vocally confirm when it's done? What if it could do all this with personality—speaking with dry wit and sass, zen-like calmness, or whatever style fits your mood?
|
|
10
|
+
|
|
11
|
+
That's exactly what AgentVibes does. It transforms Claude Code from a silent text assistant into a voice-enabled AI companion with character and charm.
|
|
12
|
+
|
|
13
|
+
## Architecture Overview: The Four Core Systems
|
|
14
|
+
|
|
15
|
+
AgentVibes is built on four interconnected systems:
|
|
16
|
+
|
|
17
|
+
1. **Output Style System** - The AI's instructions for when to speak
|
|
18
|
+
2. **Hook System** - The bash scripts that generate and play audio
|
|
19
|
+
3. **Provider System** - The TTS engines (ElevenLabs or Piper)
|
|
20
|
+
4. **MCP Server** - Natural language control interface
|
|
21
|
+
|
|
22
|
+
Let's explore each one.
|
|
23
|
+
|
|
24
|
+
---
|
|
25
|
+
|
|
26
|
+
## System 1: The Output Style - Teaching Claude When to Speak
|
|
27
|
+
|
|
28
|
+
### What is an Output Style?
|
|
29
|
+
|
|
30
|
+
In Claude Code, an "output style" is essentially a set of instructions that tells the AI assistant *how* to format and present its responses. Think of it as a personality overlay that changes Claude's behavior without changing its core capabilities.
|
|
31
|
+
|
|
32
|
+
AgentVibes provides an output style called "Agent Vibes" (located at `.claude/output-styles/agent-vibes.md`). This markdown file contains detailed instructions that become part of Claude's system prompt when activated.
|
|
33
|
+
|
|
34
|
+
### The Two-Point Protocol
|
|
35
|
+
|
|
36
|
+
The core genius of the AgentVibes output style is its **Two-Point TTS Protocol**:
|
|
37
|
+
|
|
38
|
+
**1. ACKNOWLEDGMENT** (Start of task)
|
|
39
|
+
When Claude receives a user command, it:
|
|
40
|
+
- Checks current personality/sentiment settings
|
|
41
|
+
- Generates a unique acknowledgment in that style
|
|
42
|
+
- Executes the TTS script to speak it
|
|
43
|
+
- Then proceeds with the actual work
|
|
44
|
+
|
|
45
|
+
**2. COMPLETION** (End of task)
|
|
46
|
+
After completing the task, Claude:
|
|
47
|
+
- Uses the same personality/sentiment as acknowledgment
|
|
48
|
+
- Generates a unique completion message
|
|
49
|
+
- Executes the TTS script again
|
|
50
|
+
|
|
51
|
+
Here's the critical part from `.claude/output-styles/agent-vibes.md`:
|
|
52
|
+
|
|
53
|
+
```
|
|
54
|
+
### 1. ACKNOWLEDGMENT (Start of task)
|
|
55
|
+
After receiving a user command:
|
|
56
|
+
1. Check sentiment FIRST: `SENTIMENT=$(cat .claude/tts-sentiment.txt 2>/dev/null)`
|
|
57
|
+
2. If no sentiment, check personality: `PERSONALITY=$(cat .claude/tts-personality.txt 2>/dev/null)`
|
|
58
|
+
3. Use sentiment if set, otherwise use personality
|
|
59
|
+
4. **Generate UNIQUE acknowledgment** - Use AI to create a fresh response in that style
|
|
60
|
+
5. Execute TTS: `.claude/hooks/play-tts.sh "[message]" "[VoiceName]"`
|
|
61
|
+
6. Proceed with work
|
|
62
|
+
```
|
|
63
|
+
|
|
64
|
+
### Why This Matters
|
|
65
|
+
|
|
66
|
+
This two-point protocol creates natural conversational flow:
|
|
67
|
+
- User: "Check git status"
|
|
68
|
+
- Claude (spoken): "I'll check that for you right away"
|
|
69
|
+
- Claude (text): *runs git status command*
|
|
70
|
+
- Claude (spoken): "Your repository is clean and up to date"
|
|
71
|
+
|
|
72
|
+
The AI doesn't just blindly execute—it *communicates* like a helpful assistant would.
|
|
73
|
+
|
|
74
|
+
### Settings Priority System
|
|
75
|
+
|
|
76
|
+
AgentVibes has a sophisticated three-tier priority system for how Claude should speak:
|
|
77
|
+
|
|
78
|
+
**Priority 0: Language** (`.claude/tts-language.txt`)
|
|
79
|
+
- Controls which language TTS speaks
|
|
80
|
+
- Examples: "english", "spanish", "french"
|
|
81
|
+
- When set to non-English, ALL TTS is in that language
|
|
82
|
+
|
|
83
|
+
**Priority 1: Sentiment** (`.claude/tts-sentiment.txt`)
|
|
84
|
+
- Applies personality style WITHOUT changing voice
|
|
85
|
+
- Examples: "sarcastic", "flirty", "professional"
|
|
86
|
+
- Keeps your current voice but changes speaking style
|
|
87
|
+
|
|
88
|
+
**Priority 2: Personality** (`.claude/tts-personality.txt`)
|
|
89
|
+
- Changes BOTH voice AND speaking style
|
|
90
|
+
- Examples: "sarcastic" = Jessica Anne Bogart voice + dry wit
|
|
91
|
+
- Each personality has an assigned voice
|
|
92
|
+
|
|
93
|
+
The output style checks these in order—if language is set, speak in that language. If sentiment is set, use that style. Otherwise fall back to personality.
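
The lookup order described above can be sketched in a few lines of Python. This is only an illustration of the priority rules, not code from AgentVibes itself: the function names are hypothetical, and the `"english"` and `"normal"` defaults are assumptions for the case where no file is set.

```python
from pathlib import Path
from typing import Optional


def read_setting(claude_dir: Path, filename: str) -> Optional[str]:
    """Return the stripped contents of a settings file, or None if missing or empty."""
    try:
        value = (claude_dir / filename).read_text(encoding="utf-8").strip()
    except FileNotFoundError:
        return None
    return value or None


def resolve_speaking_style(claude_dir: Path) -> dict:
    """Apply the priority order: language always applies, sentiment beats personality."""
    language = read_setting(claude_dir, "tts-language.txt") or "english"  # assumed default
    sentiment = read_setting(claude_dir, "tts-sentiment.txt")
    personality = read_setting(claude_dir, "tts-personality.txt")
    return {
        "language": language,
        "style": sentiment or personality or "normal",  # assumed default
        # Only a personality (with no sentiment overriding it) also selects a voice.
        "switches_voice": sentiment is None and personality is not None,
    }
```

Calling `resolve_speaking_style(Path(".claude"))` in a project returns the language, the style to speak in, and whether that style also implies a voice change.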
|
|
94
|
+
|
|
95
|
+
---
|
|
96
|
+
|
|
97
|
+
## System 2: The Hook System - Where the Magic Happens
|
|
98
|
+
|
|
99
|
+
The hook system is a collection of bash scripts in `.claude/hooks/` that do the actual work of generating and playing audio. Let's trace the journey of a TTS request.
|
|
100
|
+
|
|
101
|
+
### The Entry Point: play-tts.sh
|
|
102
|
+
|
|
103
|
+
When Claude's output style executes `.claude/hooks/play-tts.sh "Hello world" "Aria"`, here's what happens:
|
|
104
|
+
|
|
105
|
+
**File: `.claude/hooks/play-tts.sh`** (the router)
|
|
106
|
+
|
|
107
|
+
```bash
TEXT="$1"              # "Hello world"
VOICE_OVERRIDE="$2"    # "Aria" (optional)

# Get active provider (elevenlabs or piper)
ACTIVE_PROVIDER=$(get_active_provider)

# Route to provider-specific implementation
case "$ACTIVE_PROVIDER" in
  elevenlabs)
    exec "$SCRIPT_DIR/play-tts-elevenlabs.sh" "$TEXT" "$VOICE_OVERRIDE"
    ;;
  piper)
    exec "$SCRIPT_DIR/play-tts-piper.sh" "$TEXT" "$VOICE_OVERRIDE"
    ;;
esac
```
|
|
124
|
+
|
|
125
|
+
This script is a **provider router**. It doesn't generate audio itself—it delegates to the appropriate provider implementation. This is the provider abstraction pattern in action.
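
The same routing idea can be expressed as a lookup table. The Python sketch below is a hypothetical illustration of the pattern, not part of AgentVibes: `build_tts_command` and `PROVIDER_SCRIPTS` are made-up names, and unlike the bash router it rejects unknown providers explicitly and omits the voice argument when none is given.

```python
from pathlib import PurePosixPath
from typing import List, Optional

# Provider name -> script that implements it (script names taken from the article)
PROVIDER_SCRIPTS = {
    "elevenlabs": "play-tts-elevenlabs.sh",
    "piper": "play-tts-piper.sh",
}


def build_tts_command(
    hooks_dir: str, provider: str, text: str, voice: Optional[str] = None
) -> List[str]:
    """Return the argv that would run the provider-specific TTS script."""
    if provider not in PROVIDER_SCRIPTS:
        raise ValueError(f"Unknown TTS provider: {provider!r}")
    command = [str(PurePosixPath(hooks_dir) / PROVIDER_SCRIPTS[provider]), text]
    if voice:
        command.append(voice)  # optional voice override
    return command
```

Adding a third provider would then only require a new entry in the table, which is the main appeal of the abstraction.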
|
|
126
|
+
|
|
127
|
+
### Provider Implementations
|
|
128
|
+
|
|
129
|
+
Each provider has its own script that handles the specifics:
|
|
130
|
+
|
|
131
|
+
**For ElevenLabs** (`.claude/hooks/play-tts-elevenlabs.sh`):
|
|
132
|
+
1. Resolves voice name to voice ID (looks up "Aria" → actual voice ID)
|
|
133
|
+
2. Detects current language setting (for multilingual support)
|
|
134
|
+
3. Makes API call to ElevenLabs with text, voice, and language
|
|
135
|
+
4. Saves audio to temp file
|
|
136
|
+
5. Plays audio using system player (paplay/aplay/mpg123)
|
|
137
|
+
6. Handles SSH detection and audio optimization
|
|
138
|
+
|
|
139
|
+
**For Piper** (`.claude/hooks/play-tts-piper.sh`):
|
|
140
|
+
1. Resolves voice name to Piper model (e.g., "en_US-lessac-medium")
|
|
141
|
+
2. Downloads voice model if not cached
|
|
142
|
+
3. Runs local Piper TTS engine (no API call)
|
|
143
|
+
4. Saves audio to temp file
|
|
144
|
+
5. Plays audio using system player
|
|
145
|
+
|
|
146
|
+
### The Personality Manager
|
|
147
|
+
|
|
148
|
+
One of the most interesting hooks is `personality-manager.sh`. Let's see how it works.
|
|
149
|
+
|
|
150
|
+
When you run `/agent-vibes:personality sarcastic`, this script:
|
|
151
|
+
|
|
152
|
+
```bash
# 1. Validates personality exists
if [[ ! -f "$PERSONALITIES_DIR/${PERSONALITY}.md" ]]; then
  echo "❌ Personality not found: $PERSONALITY"
  exit 1
fi

# 2. Saves personality to config file
echo "$PERSONALITY" > "$PERSONALITY_FILE"

# 3. Detects active provider (ElevenLabs or Piper)
ACTIVE_PROVIDER=$(cat "$CLAUDE_DIR/tts-provider.txt")

# 4. Reads assigned voice from personality file
if [[ "$ACTIVE_PROVIDER" == "piper" ]]; then
  ASSIGNED_VOICE=$(get_personality_data "$PERSONALITY" "piper_voice")
else
  ASSIGNED_VOICE=$(get_personality_data "$PERSONALITY" "voice")
fi

# 5. Switches to that voice automatically
"$VOICE_MANAGER" switch "$ASSIGNED_VOICE" --silent

# 6. Plays a personality-appropriate acknowledgment
REMARK=$(pick_random_example_from_personality_file)
.claude/hooks/play-tts.sh "$REMARK"
```
|
|
179
|
+
|
|
180
|
+
### Personality Configuration Files
|
|
181
|
+
|
|
182
|
+
Each personality is defined in a markdown file like `.claude/personalities/sarcastic.md`:
|
|
183
|
+
|
|
184
|
+
```markdown
|
|
185
|
+
---
|
|
186
|
+
name: sarcastic
|
|
187
|
+
description: Dry wit and cutting observations
|
|
188
|
+
elevenlabs_voice: Jessica Anne Bogart
|
|
189
|
+
piper_voice: en_US-amy-medium
|
|
190
|
+
---
|
|
191
|
+
|
|
192
|
+
## AI Instructions
|
|
193
|
+
Use dry wit, cutting observations, and dismissive compliance. Model after
|
|
194
|
+
iconic sarcastic characters like Dr. House, Chandler Bing, and Miranda Priestly.
|
|
195
|
+
|
|
196
|
+
Rotate through different sarcastic approaches:
|
|
197
|
+
- Condescending intelligence: "Fascinating. You've discovered debugging."
|
|
198
|
+
- Quick zingers: "Could this build BE any slower?"
|
|
199
|
+
- Icy dismissiveness: "By all means, continue at a glacial pace"
|
|
200
|
+
|
|
201
|
+
## Example Responses
|
|
202
|
+
- "Oh joy, another merge conflict. Just what I needed today."
|
|
203
|
+
- "Wow, a syntax error. I'm shocked. Shocked, I tell you."
|
|
204
|
+
- "Sure, I'll run that test. Right after I finish curing world hunger."
|
|
205
|
+
```
|
|
206
|
+
|
|
207
|
+
**Personal note:** I've laughed out loud multiple times while coding with the sarcastic personality active. There's something delightfully entertaining about having your AI assistant respond with perfectly timed sass when you ask it to debug yet another type error.
|
|
208
|
+
|
|
209
|
+
The AI reads this file and uses the "AI Instructions" section to generate unique responses in that style. The example responses are just guidance—the AI creates fresh variations each time.
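
To make the role of the front matter concrete, here is a minimal Python sketch of how a script could pull the provider-specific voice out of a personality file like the one above. It is an assumption-laden illustration: `parse_front_matter` and `voice_for_provider` are hypothetical helpers, and the parser only handles flat `key: value` lines, not full YAML.

```python
from typing import Dict, Optional


def parse_front_matter(markdown: str) -> Dict[str, str]:
    """Parse flat `key: value` pairs between the leading `---` markers."""
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":  # end of front matter
            break
        key, separator, value = line.partition(":")
        if separator:
            fields[key.strip()] = value.strip()
    return fields


def voice_for_provider(fields: Dict[str, str], provider: str) -> Optional[str]:
    """Look up `<provider>_voice`, e.g. `piper_voice` or `elevenlabs_voice`."""
    return fields.get(f"{provider}_voice")


SAMPLE = """---
name: sarcastic
description: Dry wit and cutting observations
elevenlabs_voice: Jessica Anne Bogart
piper_voice: en_US-amy-medium
---

## AI Instructions
Use dry wit, cutting observations, and dismissive compliance.
"""

fields = parse_front_matter(SAMPLE)
```

With the sample above, `voice_for_provider(fields, "piper")` yields the Piper model name and `voice_for_provider(fields, "elevenlabs")` the ElevenLabs voice name.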
|
|
210
|
+
|
|
211
|
+
### Provider Manager
|
|
212
|
+
|
|
213
|
+
The provider manager (`provider-manager.sh`) handles switching between ElevenLabs and Piper:
|
|
214
|
+
|
|
215
|
+
```bash
# Get active provider
get_active_provider() {
  local provider_file=""

  # Check project-local first, then global
  if [[ -f ".claude/tts-provider.txt" ]]; then
    provider_file=".claude/tts-provider.txt"
  elif [[ -f "$HOME/.claude/tts-provider.txt" ]]; then
    provider_file="$HOME/.claude/tts-provider.txt"
  fi

  # Falls back to "elevenlabs" when no provider file is found
  cat "$provider_file" 2>/dev/null || echo "elevenlabs"
}

# Switch provider
switch_provider() {
  local new_provider="$1"
  echo "$new_provider" > "$CLAUDE_DIR/tts-provider.txt"
  echo "✅ Switched to $new_provider provider"
}
```
|
|
237
|
+
|
|
238
|
+
This allows seamless switching between paid (ElevenLabs) and free (Piper) TTS without changing any other configuration.
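
The project-local-then-global lookup is worth seeing in isolation. The following Python sketch mirrors the logic of `get_active_provider` under stated assumptions: directories are passed in explicitly so the function is easy to test, and an empty file is treated as unset, which is a choice of this sketch rather than documented AgentVibes behavior.

```python
from pathlib import Path


def get_active_provider(project_dir: Path, home_dir: Path, default: str = "elevenlabs") -> str:
    """Return the provider from the project's .claude/ first, then the home one."""
    for base in (project_dir, home_dir):
        provider_file = base / ".claude" / "tts-provider.txt"
        if provider_file.is_file():
            value = provider_file.read_text(encoding="utf-8").strip()
            if value:
                return value
    return default  # same fallback as the bash version
```

A typical call would be `get_active_provider(Path.cwd(), Path.home())`.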
|
|
239
|
+
|
|
240
|
+
---
|
|
241
|
+
|
|
242
|
+
## System 3: The Provider System - Two Engines, One Interface
|
|
243
|
+
|
|
244
|
+
AgentVibes supports two TTS providers with the same interface:
|
|
245
|
+
|
|
246
|
+
### ElevenLabs Provider
|
|
247
|
+
|
|
248
|
+
**Architecture:** Cloud-based API
|
|
249
|
+
|
|
250
|
+
**How it works:**
|
|
251
|
+
1. Accepts text, voice name, and language code
|
|
252
|
+
2. Makes HTTPS POST request to ElevenLabs API
|
|
253
|
+
3. Receives MP3 audio stream
|
|
254
|
+
4. Detects if running over SSH (checks `$SSH_CONNECTION`)
|
|
255
|
+
5. If SSH detected, converts to OGG format (prevents audio corruption)
|
|
256
|
+
6. Plays audio using local audio player
|
|
257
|
+
|
|
258
|
+
**Code snippet from `.claude/hooks/play-tts-elevenlabs.sh`:**
|
|
259
|
+
|
|
260
|
+
```bash
# Make API request; curl writes the audio straight to $AUDIO_FILE
# NOTE: $TEXT is interpolated unescaped, so quotes in the text would break the JSON
curl -s -X POST \
  "https://api.elevenlabs.io/v1/text-to-speech/${VOICE_ID}" \
  -H "xi-api-key: ${API_KEY}" \
  -H "Content-Type: application/json" \
  -d "{
    \"text\": \"$TEXT\",
    \"model_id\": \"eleven_multilingual_v2\",
    \"language_code\": \"$LANGUAGE_CODE\",
    \"voice_settings\": {
      \"stability\": 0.5,
      \"similarity_boost\": 0.75
    }
  }" \
  --output "$AUDIO_FILE"

# SSH audio optimization
if [[ -n "$SSH_CONNECTION" ]]; then
  # Convert MP3 to OGG to prevent corruption over SSH
  ffmpeg -i "$AUDIO_FILE" -c:a libopus -b:a 128k "$OGG_FILE"
  AUDIO_FILE="$OGG_FILE"
fi

# Play audio (fall back to aplay if paplay fails)
paplay "$AUDIO_FILE" 2>/dev/null || aplay "$AUDIO_FILE"
```
|
|
287
|
+
|
|
288
|
+
### Piper Provider
|
|
289
|
+
|
|
290
|
+
**Architecture:** Local neural TTS
|
|
291
|
+
|
|
292
|
+
**How it works:**
|
|
293
|
+
1. Accepts text and voice model name
|
|
294
|
+
2. Downloads voice model if not cached (stored in `~/.local/share/piper/`)
|
|
295
|
+
3. Runs Piper engine locally (no internet required)
|
|
296
|
+
4. Generates WAV audio
|
|
297
|
+
5. Plays audio using local audio player
|
|
298
|
+
|
|
299
|
+
**Code snippet from `.claude/hooks/play-tts-piper.sh`:**
|
|
300
|
+
|
|
301
|
+
```bash
# Check if voice model exists
VOICE_PATH="$HOME/.local/share/piper/voices/${VOICE}.onnx"

if [[ ! -f "$VOICE_PATH" ]]; then
  # Download voice model
  "$SCRIPT_DIR/piper-download-voices.sh" "$VOICE"
fi

# Generate speech locally
echo "$TEXT" | piper \
  --model "$VOICE_PATH" \
  --output_file "$AUDIO_FILE"

# Play audio (fall back to aplay if paplay fails)
paplay "$AUDIO_FILE" 2>/dev/null || aplay "$AUDIO_FILE"
```
|
|
318
|
+
|
|
319
|
+
### Why Two Providers?
|
|
320
|
+
|
|
321
|
+
**ElevenLabs:**
|
|
322
|
+
- ✅ Superior voice quality
|
|
323
|
+
- ✅ 150+ voices with distinct characters
|
|
324
|
+
- ✅ Perfect multilingual support (29 languages)
|
|
325
|
+
- ❌ Requires API key and paid plan
|
|
326
|
+
- ❌ Needs internet connection
|
|
327
|
+
- ❌ API costs per character
|
|
328
|
+
|
|
329
|
+
**Piper:**
|
|
330
|
+
- ✅ Completely free
|
|
331
|
+
- ✅ Works offline
|
|
332
|
+
- ✅ No API key needed
|
|
333
|
+
- ✅ 50+ voices
|
|
334
|
+
- ❌ Moderate voice quality
|
|
335
|
+
- ❌ Basic multilingual support
|
|
336
|
+
- ❌ Requires local installation
|
|
337
|
+
|
|
338
|
+
By supporting both, AgentVibes lets users choose based on their priorities: quality vs. cost.
|
|
339
|
+
|
|
340
|
+
---
|
|
341
|
+
|
|
342
|
+
## System 4: The MCP Server - Natural Language Control
|
|
343
|
+
|
|
344
|
+
The Model Context Protocol (MCP) server is AgentVibes' newest feature. It exposes all AgentVibes functionality through a standardized protocol that AI assistants can use.
|
|
345
|
+
|
|
346
|
+
### What is MCP?
|
|
347
|
+
|
|
348
|
+
MCP is a protocol that allows AI assistants to discover and use external tools. Think of it as a REST API for AI assistants: instead of manually typing commands like `/agent-vibes:switch Aria`, you can just say "Switch to Aria voice" and the AI figures out the right tool to call.
|
|
349
|
+
|
|
350
|
+
### The MCP Server Architecture
|
|
351
|
+
|
|
352
|
+
**File: `mcp-server/server.py`** (Python implementation)
|
|
353
|
+
|
|
354
|
+
```python
import asyncio
from typing import Optional


class AgentVibesServer:
    """MCP Server for AgentVibes TTS functionality"""

    # Helper methods _find_claude_dir and _run_script are not shown in this excerpt.

    def __init__(self):
        # Find the .claude directory (where hooks live)
        self.claude_dir = self._find_claude_dir()
        self.hooks_dir = self.claude_dir / "hooks"

    async def text_to_speech(
        self,
        text: str,
        voice: Optional[str] = None,
        personality: Optional[str] = None,
        language: Optional[str] = None,
    ) -> str:
        """Convert text to speech using AgentVibes"""

        # Temporarily set personality if specified
        if personality:
            await self._run_script(
                "personality-manager.sh",
                ["set", personality]
            )

        # Temporarily set language if specified
        if language:
            await self._run_script(
                "language-manager.sh",
                ["set", language]
            )

        # Call the TTS script
        args = ["bash", str(self.hooks_dir / "play-tts.sh"), text]
        if voice:
            args.append(voice)

        # Execute asynchronously (non-blocking)
        result = await asyncio.create_subprocess_exec(
            *args,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
        )

        return "✅ Audio played successfully"
```
|
|
400
|
+
|
|
401
|
+
### How MCP Tools are Registered
|
|
402
|
+
|
|
403
|
+
The server registers tools that the AI can discover:
|
|
404
|
+
|
|
405
|
+
```python
@server.list_tools()
async def list_tools() -> list[Tool]:
    return [
        Tool(
            name="text_to_speech",
            description="Speak text using AgentVibes TTS",
            inputSchema={
                "type": "object",
                "properties": {
                    "text": {"type": "string"},
                    "voice": {"type": "string"},
                    "personality": {"type": "string"},
                    "language": {"type": "string"},
                },
                # JSON Schema marks optional fields by leaving them out of "required"
                "required": ["text"],
            },
        ),
        Tool(name="switch_voice", ...),
        Tool(name="list_voices", ...),
        Tool(name="set_personality", ...),
        # ... 20+ more tools
    ]
```
|
|
428
|
+
|
|
429
|
+
### MCP in Action
|
|
430
|
+
|
|
431
|
+
When you say "Switch to Aria voice" in Claude Desktop with AgentVibes MCP installed:
|
|
432
|
+
|
|
433
|
+
1. Claude receives your natural language request
|
|
434
|
+
2. Claude sees the `switch_voice` tool is available
|
|
435
|
+
3. Claude calls: `switch_voice(voice_name="Aria")`
|
|
436
|
+
4. MCP server executes: `bash .claude/hooks/voice-manager.sh switch Aria`
|
|
437
|
+
5. Voice manager saves "Aria" to `.claude/tts-voice.txt`
|
|
438
|
+
6. MCP server returns: "✅ Switched to Aria voice"
|
|
439
|
+
7. Claude responds to you with confirmation
|
|
440
|
+
|
|
441
|
+
You never had to know the slash command syntax or where files are stored!
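
Steps 3 and 4 above amount to translating a tool call into a shell command. The Python sketch below shows one way such a translation could look. It is hypothetical: `TOOL_COMMANDS` and `tool_to_argv` are illustrative names, and the `personality` argument key for `set_personality` is an assumption (only the `switch_voice` mapping is spelled out in the walkthrough).

```python
from typing import Dict, List, Tuple

# tool name -> (hook script, subcommand, name of the argument to forward)
TOOL_COMMANDS: Dict[str, Tuple[str, str, str]] = {
    "switch_voice": ("voice-manager.sh", "switch", "voice_name"),
    "set_personality": ("personality-manager.sh", "set", "personality"),  # arg key assumed
}


def tool_to_argv(tool: str, arguments: Dict[str, str], hooks_dir: str = ".claude/hooks") -> List[str]:
    """Translate an MCP tool call into the bash command line that serves it."""
    script, subcommand, argument_key = TOOL_COMMANDS[tool]
    return ["bash", f"{hooks_dir}/{script}", subcommand, arguments[argument_key]]
```

For the example in the list, `tool_to_argv("switch_voice", {"voice_name": "Aria"})` produces the argv for `bash .claude/hooks/voice-manager.sh switch Aria`.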
|
|
442
|
+
|
|
443
|
+
### Project-Specific vs Global Settings
|
|
444
|
+
|
|
445
|
+
One clever feature of the MCP server is how it handles settings:
|
|
446
|
+
|
|
447
|
+
```python
# Determine where to save settings based on context
cwd = Path.cwd()

if (cwd / ".claude").is_dir() and cwd != self.agentvibes_root:
    # Real Claude Code project with .claude directory
    env["CLAUDE_PROJECT_DIR"] = str(cwd)
    # Settings will be saved to project's .claude/
else:
    # Claude Desktop, Warp, or non-project context
    # Settings will be saved to ~/.claude/
    pass
```
|
|
459
|
+
|
|
460
|
+
This means:
|
|
461
|
+
- **In Claude Code projects:** Settings are project-specific (each project can have different voice/personality)
|
|
462
|
+
- **In Claude Desktop/Warp:** Settings are global (consistent across all conversations)
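
The rule behind these two cases fits in a small pure function. This Python sketch is an illustration under assumptions, not the server's actual code: `resolve_settings_dir` is a hypothetical name, and the working directory, AgentVibes root, and home directory are passed in explicitly so the behavior is easy to check.

```python
from pathlib import Path


def resolve_settings_dir(cwd: Path, agentvibes_root: Path, home: Path) -> Path:
    """Project-local .claude/ when cwd is a real project, else the global one."""
    if (cwd / ".claude").is_dir() and cwd != agentvibes_root:
        return cwd / ".claude"  # project-specific settings
    return home / ".claude"     # global settings (Claude Desktop, Warp, ...)
```

Excluding the AgentVibes root itself keeps the server from treating its own checkout as a user project.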
|
|
463
|
+
|
|
464
|
+
---
|
|
465
|
+
|
|
466
|
+
## Data Flow: Following a TTS Request From Start to Finish
|
|
467
|
+
|
|
468
|
+
Let's trace a complete request to see how all systems work together.
|
|
469
|
+
|
|
470
|
+
**Scenario:** You ask Claude Code to "Check git status" with the sarcastic personality active.
|
|
471
|
+
|
|
472
|
+
### Step 1: Output Style Triggers Acknowledgment
|
|
473
|
+
|
|
474
|
+
Claude's output style instructions kick in:
|
|
475
|
+
|
|
476
|
+
```
|
|
477
|
+
1. Check personality setting:
|
|
478
|
+
- Reads .claude/tts-personality.txt → "sarcastic"
|
|
479
|
+
|
|
480
|
+
2. Read personality configuration:
|
|
481
|
+
- Reads .claude/personalities/sarcastic.md
|
|
482
|
+
- Extracts AI instructions: "Use dry wit, cutting observations..."
|
|
483
|
+
|
|
484
|
+
3. Generate unique acknowledgment:
|
|
485
|
+
- AI creates: "Oh, the excitement. Let me check that git status for you."
|
|
486
|
+
|
|
487
|
+
4. Execute TTS:
|
|
488
|
+
- Calls: .claude/hooks/play-tts.sh "Oh, the excitement. Let me check that git status for you."
|
|
489
|
+
```
|
|
490
|
+
|
|
491
|
+
### Step 2: TTS Router Determines Provider
|
|
492
|
+
|
|
493
|
+
`play-tts.sh` routes the request:
|
|
494
|
+
|
|
495
|
+
```bash
|
|
496
|
+
# Read active provider
|
|
497
|
+
ACTIVE_PROVIDER=$(cat .claude/tts-provider.txt)  # → "elevenlabs"
|
|
498
|
+
|
|
499
|
+
# Route to ElevenLabs implementation
|
|
500
|
+
exec .claude/hooks/play-tts-elevenlabs.sh "$TEXT" "$VOICE"
|
|
501
|
+
```
|
|
502
|
+
|
|
503
|
+
### Step 3: ElevenLabs Provider Generates Audio
|
|
504
|
+
|
|
505
|
+
`play-tts-elevenlabs.sh` does the heavy lifting:
|
|
506
|
+
|
|
507
|
+
```bash
|
|
508
|
+
# 1. Resolve voice
|
|
509
|
+
VOICE_NAME="Jessica Anne Bogart" # from sarcastic.md
|
|
510
|
+
VOICE_ID=$(lookup_voice_id "$VOICE_NAME")  # → "abc123xyz789"
|
|
511
|
+
|
|
512
|
+
# 2. Detect language
|
|
513
|
+
LANGUAGE_CODE=$(cat .claude/tts-language.txt)  # → "en"
|
|
514
|
+
|
|
515
|
+
# 3. Call ElevenLabs API
|
|
516
|
+
curl -X POST "https://api.elevenlabs.io/v1/text-to-speech/$VOICE_ID" \
|
|
517
|
+
-H "xi-api-key: $API_KEY" \
|
|
518
|
+
-d '{"text": "Oh, the excitement. Let me check that git status for you."}' \
|
|
519
|
+
--output /tmp/tts_12345.mp3
|
|
520
|
+
|
|
521
|
+
# 4. Check if over SSH
|
|
522
|
+
if [[ -n "$SSH_CONNECTION" ]]; then
|
|
523
|
+
# Convert MP3 to OGG to prevent corruption
|
|
524
|
+
ffmpeg -i /tmp/tts_12345.mp3 /tmp/tts_12345.ogg
|
|
525
|
+
AUDIO_FILE=/tmp/tts_12345.ogg
|
|
526
|
+
fi
|
|
527
|
+
|
|
528
|
+
# 5. Play audio
|
|
529
|
+
paplay /tmp/tts_12345.ogg
|
|
530
|
+
```
|
|
531
|
+
|
|
532
|
+
### Step 4: Claude Proceeds With Task
|
|
533
|
+
|
|
534
|
+
Claude runs the git status command while audio plays in parallel (non-blocking).
|
|
535
|
+
|
|
536
|
+
### Step 5: Output Style Triggers Completion
|
|
537
|
+
|
|
538
|
+
After task completes:
|
|
539
|
+
|
|
540
|
+
```
|
|
541
|
+
1. Generate completion message:
|
|
542
|
+
- AI creates: "Riveting. Your repository is clean. Try not to get too excited."
|
|
543
|
+
|
|
544
|
+
2. Execute TTS:
|
|
545
|
+
- Calls: .claude/hooks/play-tts.sh "Riveting. Your repository is clean. Try not to get too excited."
|
|
546
|
+
|
|
547
|
+
3. The same flow as Steps 2-3 repeats
|
|
548
|
+
```
|
|
549
|
+
|
|
550
|
+
**Important note:** These aren't just hard-coded responses—AgentVibes uses AI to generate unique responses each time based on the personality instructions. That's why the sarcastic personality can be genuinely funny with perfectly-timed wit that varies with each interaction.
|
|
551
|
+
|
|
552
|
+
And if sarcasm isn't your style, AgentVibes includes 19 different personalities ranging from professional and zen to enthusiastic and grandpa—or you can simply use the normal personality for straightforward, no-nonsense responses. The choice is yours!
|
|
553
|
+
|
|
554
|
+
The entire flow takes ~2-3 seconds for acknowledgment and completion combined.
|
|
555
|
+
|
|
556
|
+
---
|
|
557
|
+
|
|
558
|
+
## Installation Architecture: How AgentVibes Gets Installed
|
|
559
|
+
|
|
560
|
+
When you run `npx agentvibes install --yes`, here's what happens:
|
|
561
|
+
|
|
562
|
+
### Step 1: NPM Package Execution
|
|
563
|
+
|
|
564
|
+
```bash
|
|
565
|
+
# NPM downloads AgentVibes package to cache
|
|
566
|
+
~/.npm/_npx/[hash]/node_modules/agentvibes/
|
|
567
|
+
|
|
568
|
+
# NPM executes the bin script
|
|
569
|
+
./bin/agent-vibes install --yes
|
|
570
|
+
```
|
|
571
|
+
|
|
572
|
+
### Step 2: Installer Script Runs
|
|
573
|
+
|
|
574
|
+
**File: `src/installer.js`**
|
|
575
|
+
|
|
576
|
+
The installer:
|
|
577
|
+
1. Detects installation location (current directory or global `~/.claude/`)
|
|
578
|
+
2. Creates `.claude/` directory structure
|
|
579
|
+
3. Copies all files from package:
|
|
580
|
+
- Commands → `.claude/commands/agent-vibes/`
|
|
581
|
+
- Hooks → `.claude/hooks/`
|
|
582
|
+
- Personalities → `.claude/personalities/`
|
|
583
|
+
- Output styles → `.claude/output-styles/`
|
|
584
|
+
4. Makes all bash scripts executable (`chmod +x`)
|
|
585
|
+
5. Creates default configuration files
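
The installer steps above can be approximated in shell. This is only a sketch of the described behavior, not the actual code in `src/installer.js`: the function name `install_agentvibes` and the assumption that the package holds `commands/`, `hooks/`, `personalities/`, and `output-styles/` folders side by side are hypothetical. The defaults (`Aria`, `normal`) are the ones this article mentions elsewhere.

```bash
#!/usr/bin/env bash
# Illustrative sketch only; the real logic lives in src/installer.js.
# install_agentvibes and the source folder layout are hypothetical.
install_agentvibes() {
  local src="$1"    # assumed: directory holding the four source folders
  local dest="$2"   # target .claude directory (project-local or ~/.claude)

  # Step 2: create the .claude/ directory structure.
  mkdir -p "$dest/commands/agent-vibes" "$dest/hooks" \
           "$dest/personalities" "$dest/output-styles"

  # Step 3: copy each category to its destination.
  cp -R "$src/commands/."      "$dest/commands/agent-vibes/"
  cp -R "$src/hooks/."         "$dest/hooks/"
  cp -R "$src/personalities/." "$dest/personalities/"
  cp -R "$src/output-styles/." "$dest/output-styles/"

  # Step 4: make the bash scripts executable.
  find "$dest/hooks" -type f -name '*.sh' -exec chmod +x {} +

  # Step 5: write default settings only if none exist yet.
  [[ -f "$dest/tts-voice.txt" ]] || echo "Aria" > "$dest/tts-voice.txt"
  [[ -f "$dest/tts-personality.txt" ]] || echo "normal" > "$dest/tts-personality.txt"
}
```

Writing defaults only when the files are missing means a re-install in this sketch would not overwrite a voice or personality the user has already chosen.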
|
|
586
|
+
|
|
587
|
+
### Directory Structure Created
|
|
588
|
+
|
|
589
|
+
```
|
|
590
|
+
.claude/
|
|
591
|
+
├── commands/
|
|
592
|
+
│ └── agent-vibes/
|
|
593
|
+
│ ├── agent-vibes.md # Main command file
|
|
594
|
+
│ ├── switch.md # /agent-vibes:switch
|
|
595
|
+
│ ├── list.md # /agent-vibes:list
|
|
596
|
+
│ ├── personality.md # /agent-vibes:personality
|
|
597
|
+
│ └── ... (50+ command files)
|
|
598
|
+
├── hooks/
|
|
599
|
+
│ ├── play-tts.sh # Main TTS router
|
|
600
|
+
│ ├── play-tts-elevenlabs.sh # ElevenLabs implementation
|
|
601
|
+
│ ├── play-tts-piper.sh # Piper implementation
|
|
602
|
+
│ ├── personality-manager.sh # Personality system
|
|
603
|
+
│ ├── voice-manager.sh # Voice switching
|
|
604
|
+
│ ├── provider-manager.sh # Provider switching
|
|
605
|
+
│ ├── language-manager.sh # Language settings
|
|
606
|
+
│ └── ... (20+ hook scripts)
|
|
607
|
+
├── personalities/
|
|
608
|
+
│ ├── normal.md
|
|
609
|
+
│ ├── professional.md
|
|
610
|
+
│ ├── sarcastic.md
|
|
611
|
+
│ ├── zen.md
|
|
612
|
+
│ └── ... (19 personality files)
|
|
613
|
+
├── output-styles/
|
|
614
|
+
│ └── agent-vibes.md # Output style instructions
|
|
615
|
+
├── tts-voice.txt # Current voice (e.g., "Aria")
|
|
616
|
+
├── tts-personality.txt # Current personality (e.g., "sarcastic")
|
|
617
|
+
├── tts-provider.txt # Current provider (e.g., "elevenlabs")
|
|
618
|
+
└── tts-language.txt # Current language (e.g., "english")
|
|
619
|
+
```
|
|
620
|
+
|
|
621
|
+
### Step 3: Post-Install (MCP Dependencies)
|
|
622
|
+
|
|
623
|
+
If installing for MCP use:
|
|
624
|
+
|
|
625
|
+
```bash
|
|
626
|
+
# Install Python dependencies
|
|
627
|
+
cd mcp-server/
|
|
628
|
+
pip install -r requirements.txt
|
|
629
|
+
# Installs: mcp (MCP SDK), aiosqlite, etc.
|
|
630
|
+
```
|
|
631
|
+
|
|
632
|
+
---
|
|
633
|
+
|
|
634
|
+
## Configuration Storage: Where Settings Live
|
|
635
|
+
|
|
636
|
+
AgentVibes uses simple text files for configuration. This makes it easy to understand, debug, and even manually edit.
|
|
637
|
+
|
|
638
|
+
### Project-Local vs Global
|
|
639
|
+
|
|
640
|
+
**Project-Local** (`.claude/` in project directory):
|
|
641
|
+
- Used when working in a Claude Code project
|
|
642
|
+
- Settings are specific to that project
|
|
643
|
+
- Example: `/home/user/my-app/.claude/tts-voice.txt`
|
|
644
|
+
|
|
645
|
+
**Global** (`~/.claude/` in home directory):
|
|
646
|
+
- Used for Claude Desktop, Warp, and when no project `.claude/` exists
|
|
647
|
+
- Settings are shared across all sessions
|
|
648
|
+
- Example: `/home/user/.claude/tts-voice.txt`
|
|
649
|
+
|
|
650
|
+
### Configuration Files
|
|
651
|
+
|
|
652
|
+
| File | Purpose | Example Value |
|
|
653
|
+
|------|---------|---------------|
|
|
654
|
+
| `tts-voice.txt` | Current voice name | `Aria` |
|
|
655
|
+
| `tts-personality.txt` | Current personality | `pirate` |
|
|
656
|
+
| `tts-sentiment.txt` | Current sentiment (optional) | `sarcastic` |
|
|
657
|
+
| `tts-provider.txt` | Active TTS provider | `elevenlabs` |
|
|
658
|
+
| `tts-language.txt` | TTS language | `spanish` |
|
|
659
|
+
|
|
660
|
+
### Reading Configuration in Code
|
|
661
|
+
|
|
662
|
+
The hooks use a consistent pattern:
|
|
663
|
+
|
|
664
|
+
```bash
|
|
665
|
+
# Check project-local first, fallback to global
|
|
666
|
+
get_current_voice() {
|
|
667
|
+
if [[ -f ".claude/tts-voice.txt" ]]; then
|
|
668
|
+
cat ".claude/tts-voice.txt"
|
|
669
|
+
elif [[ -f "$HOME/.claude/tts-voice.txt" ]]; then
|
|
670
|
+
cat "$HOME/.claude/tts-voice.txt"
|
|
671
|
+
else
|
|
672
|
+
echo "Aria" # Default
|
|
673
|
+
fi
|
|
674
|
+
}
|
|
675
|
+
```
|
|
676
|
+
|
|
677
|
+
This ensures settings are found regardless of context.
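
The same lookup order can be generalized to any of the configuration files in the table. The helper below is a hypothetical sketch (`read_setting` is not a function from the AgentVibes hooks); it only restates the project-local, then global, then default pattern shown in `get_current_voice`.

```bash
#!/usr/bin/env bash
# Hypothetical helper generalizing get_current_voice to any setting file.
# Usage: read_setting <file-name> <default-value>
read_setting() {
  local name="$1" default="$2" dir
  # Project-local .claude/ wins; the global ~/.claude/ is the fallback.
  for dir in ".claude" "$HOME/.claude"; do
    if [[ -f "$dir/$name" ]]; then
      cat "$dir/$name"
      return 0
    fi
  done
  # Neither file exists: use the caller-supplied default.
  echo "$default"
}

VOICE=$(read_setting tts-voice.txt "Aria")
PERSONALITY=$(read_setting tts-personality.txt "normal")
```

Because the project-local path is relative, the result depends on the directory the hook is run from, which is what makes per-project settings possible.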
|
|
678
|
+
|
|
679
|
+
---
|
|
680
|
+
|
|
681
|
+
## Advanced Features Deep Dive
|
|
682
|
+
|
|
683
|
+
### Language Learning Mode
|
|
684
|
+
|
|
685
|
+
One of AgentVibes' coolest features is language learning mode. When enabled, every TTS message plays **twice**—once in your main language, then again in your target language.
|
|
686
|
+
|
|
687
|
+
**How it works:**
|
|
688
|
+
|
|
689
|
+
The output style is modified to call TTS twice:
|
|
690
|
+
|
|
691
|
+
```bash
|
|
692
|
+
# First call - main language (English)
|
|
693
|
+
.claude/hooks/play-tts.sh "I'll check that for you"
|
|
694
|
+
|
|
695
|
+
# Second call - target language (Spanish)
|
|
696
|
+
.claude/hooks/play-tts.sh "Lo verificaré para ti" "es_ES-davefx-medium"
|
|
697
|
+
```
|
|
698
|
+
|
|
699
|
+
In the example above, the translated text is passed to the hook directly. The target-language audio is then synthesized with an ElevenLabs multilingual voice or a language-specific Piper voice.
|
|
700
|
+
|
|
701
|
+
### SSH Audio Optimization
|
|
702
|
+
|
|
703
|
+
AgentVibes automatically detects SSH sessions and optimizes audio:
|
|
704
|
+
|
|
705
|
+
```bash
|
|
706
|
+
# Detect SSH
|
|
707
|
+
if [[ -n "$SSH_CONNECTION" ]]; then
|
|
708
|
+
IS_SSH=true
|
|
709
|
+
fi
|
|
710
|
+
|
|
711
|
+
if [[ "$IS_SSH" == "true" ]]; then
|
|
712
|
+
# Convert MP3 to OGG with Opus codec
|
|
713
|
+
# This prevents audio corruption over SSH tunnels
|
|
714
|
+
ffmpeg -i "$MP3_FILE" -c:a libopus -b:a 128k "$OGG_FILE"
|
|
715
|
+
AUDIO_FILE="$OGG_FILE"
|
|
716
|
+
fi
|
|
717
|
+
```
|
|
718
|
+
|
|
719
|
+
Why? MP3 audio streamed over an SSH tunnel can arrive corrupted. The OGG/Opus format is more robust for network transmission.
|
|
720
|
+
|
|
721
|
+
### BMAD Plugin Integration
|
|
722
|
+
|
|
723
|
+
AgentVibes can integrate with the BMAD METHOD (a multi-agent framework). When a BMAD agent activates, AgentVibes automatically switches to that agent's assigned voice.
|
|
724
|
+
|
|
725
|
+
**How it works:**
|
|
726
|
+
|
|
727
|
+
1. BMAD agent activates (e.g., `/BMad:agents:pm` for the product manager)
|
|
728
|
+
2. BMAD writes agent ID to `.bmad-agent-context` file
|
|
729
|
+
3. AgentVibes output style checks this file
|
|
730
|
+
4. If BMAD plugin is enabled, looks up voice in `.claude/plugins/bmad-voices.md`
|
|
731
|
+
5. Automatically switches to that voice
|
|
732
|
+
|
|
733
|
+
This creates the illusion of multiple distinct AI personalities in conversations.
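
Steps 2 through 4 amount to a key lookup. The sketch below shows one way such a lookup could work. It assumes that the mapping file contains markdown table rows of the form `| agent-id | voice |`; that layout, and the function name `lookup_bmad_voice`, are assumptions made for illustration and are not taken from the AgentVibes source.

```bash
#!/usr/bin/env bash
# Hypothetical lookup. The table layout "| agent-id | voice |" is assumed,
# not confirmed by the AgentVibes source.
# Usage: lookup_bmad_voice <agent-id> <mapping-file>
lookup_bmad_voice() {
  local agent_id="$1" table="$2"
  awk -F'|' -v id="$agent_id" '
    {
      key = $2; voice = $3
      # Trim the padding around each table cell.
      gsub(/^[ \t]+|[ \t]+$/, "", key)
      gsub(/^[ \t]+|[ \t]+$/, "", voice)
      if (key == id) { print voice; exit }
    }' "$table"
}

# Example (paths as described in the steps above):
#   lookup_bmad_voice "$(cat .bmad-agent-context)" .claude/plugins/bmad-voices.md
```

An empty result means no voice is assigned to that agent, in which case a caller would presumably keep the current voice.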
|
|
734
|
+
|
|
735
|
+
---
|
|
736
|
+
|
|
737
|
+
## Performance Considerations
|
|
738
|
+
|
|
739
|
+
### Non-Blocking Audio Playback
|
|
740
|
+
|
|
741
|
+
TTS requests run asynchronously—Claude doesn't wait for audio to finish before continuing work:
|
|
742
|
+
|
|
743
|
+
```bash
|
|
744
|
+
# Play audio in background
|
|
745
|
+
paplay "$AUDIO_FILE" &
|
|
746
|
+
|
|
747
|
+
# Claude continues immediately
|
|
748
|
+
# (runs git status, writes code, etc.)
|
|
749
|
+
```
|
|
750
|
+
|
|
751
|
+
This means acknowledgment audio plays while Claude is already working on your task.
|
|
752
|
+
|
|
753
|
+
### Audio Caching
|
|
754
|
+
|
|
755
|
+
AgentVibes saves audio files temporarily:
|
|
756
|
+
|
|
757
|
+
```bash
|
|
758
|
+
AUDIO_FILE="/tmp/agentvibes_tts_${RANDOM}_${TIMESTAMP}.mp3"
|
|
759
|
+
```
|
|
760
|
+
|
|
761
|
+
Files are kept for the duration of the session, which allows the `/agent-vibes:replay` command to work. Cleanup happens automatically when the terminal session ends.
|
|
762
|
+
|
|
763
|
+
### Provider Performance
|
|
764
|
+
|
|
765
|
+
**ElevenLabs:**
|
|
766
|
+
- API latency: ~500-1000ms
|
|
767
|
+
- Audio quality: Excellent (256kbps MP3)
|
|
768
|
+
- Bandwidth: ~32KB per second of audio (at 256kbps)
|
|
769
|
+
|
|
770
|
+
**Piper:**
|
|
771
|
+
- Generation latency: ~200-500ms (local)
|
|
772
|
+
- Audio quality: Good (22kHz WAV)
|
|
773
|
+
- Bandwidth: None (offline)
|
|
774
|
+
|
|
775
|
+
### Text Length Limits
|
|
776
|
+
|
|
777
|
+
AgentVibes limits text length to prevent issues:
|
|
778
|
+
|
|
779
|
+
```bash
|
|
780
|
+
# Truncate long text
|
|
781
|
+
if [ ${#TEXT} -gt 500 ]; then
|
|
782
|
+
TEXT="${TEXT:0:497}..."
|
|
783
|
+
fi
|
|
784
|
+
```
|
|
785
|
+
|
|
786
|
+
This prevents:
|
|
787
|
+
- Excessive API costs (ElevenLabs charges per character)
|
|
788
|
+
- Slow generation (long audio takes time to produce)
|
|
789
|
+
- User confusion (very long TTS messages are hard to follow)
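
The truncation rule can be wrapped in a small function so the limit is easy to test or change. This is a sketch: the name `truncate_for_tts` and the optional second argument are hypothetical, while the 500-character limit and the trailing `...` come from the snippet above.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the truncation shown above.
# Usage: truncate_for_tts <text> [max-length]   (default max-length: 500)
truncate_for_tts() {
  local text="$1" max="${2:-500}"
  if (( ${#text} > max )); then
    # Keep max-3 characters so the result, including "...", is exactly max long.
    text="${text:0:max-3}..."
  fi
  printf '%s\n' "$text"
}

TEXT=$(truncate_for_tts "Riveting. Your repository is clean.")
```

Text at or under the limit passes through unchanged, so short acknowledgments are never altered.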
|
|
790
|
+
|
|
791
|
+
---
|
|
792
|
+
|
|
793
|
+
## Error Handling and Resilience
|
|
794
|
+
|
|
795
|
+
AgentVibes has multiple layers of error handling:
|
|
796
|
+
|
|
797
|
+
### API Failure Handling
|
|
798
|
+
|
|
799
|
+
```bash
|
|
800
|
+
# Try ElevenLabs API
|
|
801
|
+
RESPONSE=$(curl -s -X POST "$API_ENDPOINT" ...)
|
|
802
|
+
|
|
803
|
+
if [[ $? -ne 0 ]] || [[ ! -f "$AUDIO_FILE" ]]; then
|
|
804
|
+
echo "⚠️ TTS request failed (API error or network issue)"
|
|
805
|
+
exit 1
|
|
806
|
+
fi
|
|
807
|
+
```
|
|
808
|
+
|
|
809
|
+
If the API fails, the error is logged but does not crash Claude Code; the task continues without audio.
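
One way to get this behavior on the calling side is to run the TTS command through a wrapper that reports a failure but always returns success. The function `speak_or_continue` below is a hypothetical sketch of that idea, not code from the AgentVibes hooks.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run a TTS command but never propagate its failure,
# so the surrounding task is not interrupted when audio is unavailable.
speak_or_continue() {
  if ! "$@"; then
    echo "TTS unavailable, continuing without audio" >&2
  fi
  return 0
}

# Example:
#   speak_or_continue .claude/hooks/play-tts.sh "Your repository is clean."
```

Note also that `curl -s` on its own exits with status 0 for HTTP error responses such as 401 or 429; adding `--fail` makes curl return a non-zero status in those cases, so a check on `$?` can catch them.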
|
|
810
|
+
|
|
811
|
+
### Missing Configuration Graceful Degradation
|
|
812
|
+
|
|
813
|
+
```bash
|
|
814
|
+
# If no voice configured, use default
|
|
815
|
+
VOICE=$(cat .claude/tts-voice.txt 2>/dev/null || echo "Aria")
|
|
816
|
+
|
|
817
|
+
# If no personality configured, use normal
|
|
818
|
+
PERSONALITY=$(cat .claude/tts-personality.txt 2>/dev/null || echo "normal")
|
|
819
|
+
```
|
|
820
|
+
|
|
821
|
+
Missing files don't cause crashes—sensible defaults are used.
|
|
822
|
+
|
|
823
|
+
### Provider Fallback
|
|
824
|
+
|
|
825
|
+
If Piper isn't installed, AgentVibes can guide installation:
|
|
826
|
+
|
|
827
|
+
```bash
|
|
828
|
+
if ! command -v piper &> /dev/null; then
|
|
829
|
+
echo "❌ Piper not installed"
|
|
830
|
+
echo " Install with: /agent-vibes:provider install piper"
|
|
831
|
+
exit 1
|
|
832
|
+
fi
|
|
833
|
+
```
|
|
834
|
+
|
|
835
|
+
Clear error messages help users fix issues themselves.
|
|
836
|
+
|
|
837
|
+
---
|
|
838
|
+
|
|
839
|
+
## Testing and Quality Assurance
|
|
840
|
+
|
|
841
|
+
AgentVibes includes a test suite:
|
|
842
|
+
|
|
843
|
+
```bash
|
|
844
|
+
# Run tests
|
|
845
|
+
npm test
|
|
846
|
+
|
|
847
|
+
# This executes
|
|
848
|
+
bats test/unit/*.bats
|
|
849
|
+
```
|
|
850
|
+
|
|
851
|
+
Test files validate:
|
|
852
|
+
- Voice resolution (name → ID mapping)
|
|
853
|
+
- Personality file parsing
|
|
854
|
+
- Provider switching logic
|
|
855
|
+
- Configuration file handling
|
|
856
|
+
|
|
857
|
+
---
|
|
858
|
+
|
|
859
|
+
## Conclusion: The Bigger Picture
|
|
860
|
+
|
|
861
|
+
AgentVibes demonstrates several important software engineering principles:
|
|
862
|
+
|
|
863
|
+
**1. Separation of Concerns**
|
|
864
|
+
- Output style (when to speak) is separate from hooks (how to speak)
|
|
865
|
+
- Provider abstraction (ElevenLabs vs Piper) is separate from voice management
|
|
866
|
+
- MCP server is separate from core functionality
|
|
867
|
+
|
|
868
|
+
**2. Provider Pattern**
|
|
869
|
+
- Multiple TTS engines behind a single interface
|
|
870
|
+
- Easy to add new providers (OpenAI TTS, Google TTS, etc.)
|
|
871
|
+
|
|
872
|
+
**3. Configuration as Data**
|
|
873
|
+
- Simple text files instead of complex databases
|
|
874
|
+
- Easy to version control, debug, and manually edit
|
|
875
|
+
|
|
876
|
+
**4. Progressive Enhancement**
|
|
877
|
+
- Core functionality works with minimal setup
|
|
878
|
+
- Advanced features (MCP, BMAD, language learning) layer on top
|
|
879
|
+
- Graceful degradation when features aren't available
|
|
880
|
+
|
|
881
|
+
**5. User Experience First**
|
|
882
|
+
- Natural language control (MCP) instead of memorizing commands
|
|
883
|
+
- Instant feedback (acknowledgment/completion)
|
|
884
|
+
- Personality makes it fun, not just functional
|
|
885
|
+
|
|
886
|
+
Whether you're building your own AI integrations, designing CLI tools, or just curious about how AgentVibes works, I hope this deep dive has given you a comprehensive understanding of the architecture.
|
|
887
|
+
|
|
888
|
+
The beauty of AgentVibes isn't just that it makes Claude talk—it's that it does so with a clean, maintainable, extensible architecture that other developers can learn from and build upon.
|
|
889
|
+
|
|
890
|
+
---
|
|
891
|
+
|
|
892
|
+
## What's Next?
|
|
893
|
+
|
|
894
|
+
Now that you understand how AgentVibes works under the hood, you might want to:
|
|
895
|
+
|
|
896
|
+
- **Create custom personalities** - Edit `.claude/personalities/*.md` files
|
|
897
|
+
- **Extend the MCP server** - Add new tools in `mcp-server/server.py`
|
|
898
|
+
- **Build custom output styles** - Create your own instructions in `.claude/output-styles/`
|
|
899
|
+
- **Contribute to the project** - Submit PRs on [GitHub](https://github.com/paulpreibisch/AgentVibes)
|
|
900
|
+
|
|
901
|
+
Happy coding, and may your AI assistant always speak with personality! 🎤✨
|
|
902
|
+
|
|
903
|
+
---
|
|
904
|
+
|
|
905
|
+
**About the Author:** Paul Preibisch is the creator of AgentVibes, an open source project that brings voice and personality to AI coding assistants. Follow the project on [GitHub](https://github.com/paulpreibisch/AgentVibes) or visit [agentvibes.org](https://www.agentvibes.org) to learn more.
|