agentvibes 2.0.3 → 2.0.6

@@ -0,0 +1,574 @@
1
+ # Provider System Architecture
2
+
3
+ > **Context**: Multi-provider TTS architecture for AgentVibes
4
+ > **Purpose**: Unified interface for pluggable TTS providers
5
+ > **Pattern**: Router + Provider Implementation pattern
6
+ > **Status**: Production (v2.0.0+)
7
+
8
+ ---
9
+
10
+ ## 📋 Table of Contents
11
+
12
+ - [Overview](#overview)
13
+ - [Design Principles](#design-principles)
14
+ - [Component Architecture](#component-architecture)
15
+ - [Request Flow](#request-flow)
16
+ - [Provider Interface Contract](#provider-interface-contract)
17
+ - [State Management](#state-management)
18
+ - [Extension Points](#extension-points)
19
+ - [Related Files](#related-files)
20
+ - [Migration Guide](#migration-guide)
21
+
22
+ ---
23
+
24
+ ## Overview
25
+
26
+ AgentVibes implements a **multi-provider TTS architecture** that allows seamless switching between different text-to-speech services. The system abstracts provider-specific details behind a unified interface, enabling:
27
+
28
+ - 🔄 **Runtime provider switching** with a single command, no manual configuration edits
29
+ - 🎤 **Automatic voice mapping** across providers
30
+ - 💾 **Persistent state management** for user preferences
31
+ - 🔌 **Plugin extensibility** for adding new providers
32
+ - ⚡ **Zero-downtime switching** with immediate effect
33
+
34
+ **Current Providers:**
35
+ - **ElevenLabs** - Premium AI voices (150+ voices, 30+ languages)
36
+ - **Piper TTS** - Free offline neural voices (50+ voices, 18 languages)
37
+
38
+ ---
39
+
40
+ ## Design Principles
41
+
42
+ ### 1. Provider Abstraction
43
+ **Unified interface hides provider complexity**
44
+ - Single entry point for all TTS requests (`play-tts.sh`)
45
+ - Provider-agnostic API for callers
46
+ - Implementation details encapsulated in provider scripts
47
+
48
+ ### 2. Loose Coupling
49
+ **Provider implementations are independent**
50
+ - Each provider is a standalone script
51
+ - No cross-provider dependencies
52
+ - Adding/removing providers doesn't affect others
53
+ - Provider failures are isolated
54
+
55
+ ### 3. Simple State Management
56
+ **File-based configuration for transparency**
57
+ - `.claude/tts-provider.txt` - Active provider name
58
+ - `.claude/personalities/*.md` - Voice mappings per provider
59
+ - No databases or complex state stores
60
+ - Human-readable and git-friendly
61
+
62
+ ### 4. Backward Compatibility
63
+ **Seamless migration from single-provider**
64
+ - Old `voice:` format still works (treated as ElevenLabs)
65
+ - New `voices: {elevenlabs: X, piper: Y}` format preferred
66
+ - Migration warnings guide users to new format
67
+ - No forced breaking changes
68
+
69
+ ---
70
+
71
+ ## Component Architecture
72
+
73
+ ```
74
+ ┌─────────────────────────────────────────────────────────────┐
75
+ │                      User/Claude Code                       │
76
+ └──────────────────────────┬──────────────────────────────────┘
77
+                            │
78
+                            ▼
79
+ ┌─────────────────────────────────────────────────────────────┐
80
+ │                    play-tts.sh (Router)                     │
81
+ │  • Personality provider override check                      │
82
+ │  • Active provider resolution                               │
83
+ │  • Provider script delegation                               │
84
+ └──────────────────────────┬──────────────────────────────────┘
85
+                            │
86
+           ┌────────────────┼────────────────┐
87
+           │                                 │
88
+           ▼                                 ▼
89
+ ┌──────────────────────┐               ┌──────────────────────┐
90
+ │ play-tts-elevenlabs  │               │ play-tts-piper       │
91
+ │                      │               │                      │
92
+ │ • API requests       │               │ • Local synthesis    │
93
+ │ • Voice mapping      │               │ • Voice models       │
94
+ │ • Language codes     │               │ • Offline mode       │
95
+ │ • Error handling     │               │ • Fast generation    │
96
+ └──────────┬───────────┘               └──────────┬───────────┘
97
+            │                                      │
98
+            ▼                                      ▼
99
+ ┌─────────────────────────────────────────────────────────────┐
100
+ │                  Audio Output (mpv/afplay)                  │
101
+ └─────────────────────────────────────────────────────────────┘
102
+ ```
103
+
104
+ ### Router Layer
105
+
106
+ **File**: `.claude/hooks/play-tts.sh`
107
+
108
+ **Responsibilities:**
109
+ - Check for personality provider override
110
+ - Read active provider from state file
111
+ - Validate provider exists
112
+ - Delegate to provider-specific implementation
113
+ - Handle errors and provide fallbacks
114
+
115
+ **Key Logic:**
116
+ ```bash
117
+ # Check personality override first
118
+ PROVIDER=""
119
+ if [[ -n "$(personality_provider)" ]]; then  # personality sets a provider: field
120
+   PROVIDER=$(personality_provider)
121
+ fi
122
+
123
+ # Fall back to global provider
124
+ if [[ -z "$PROVIDER" ]]; then
125
+   PROVIDER=$(get_active_provider)
126
+ fi
127
+
128
+ # Delegate to provider script
129
+ exec "$(get_provider_script_path "$PROVIDER")" "$@"
130
+ ```
131
+
132
+ ### Provider Implementations
133
+
134
+ **Files**: `.claude/hooks/play-tts-{provider}.sh`
135
+
136
+ Each provider implements:
137
+ - Text-to-speech synthesis
138
+ - Voice name resolution
139
+ - Language handling
140
+ - Error messaging
141
+ - Audio playback integration
142
+
143
+ **Current Implementations:**
144
+ - `play-tts-elevenlabs.sh` - ElevenLabs API integration
145
+ - `play-tts-piper.sh` - Piper TTS local synthesis
146
+
147
+ **Future Providers:**
148
+ - `play-tts-azure.sh` - Microsoft Azure TTS
149
+ - `play-tts-google.sh` - Google Cloud TTS
150
+ - `play-tts-aws.sh` - Amazon Polly
151
+
152
+ ### Provider Manager
153
+
154
+ **File**: `.claude/hooks/provider-manager.sh`
155
+
156
+ **Functions:**
157
+ - `get_active_provider()` - Read current provider from state
158
+ - `set_active_provider()` - Update state with new provider
159
+ - `list_providers()` - Discover available provider scripts
160
+ - `validate_provider()` - Check if provider exists and is valid
161
+ - `get_provider_script_path()` - Resolve provider script location
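
The discovery and validation helpers listed above could be sketched as follows. This is a minimal illustration, assuming providers are found purely through the `play-tts-{provider}.sh` naming convention. `HOOKS_DIR` and the lowercase-alphanumeric name rule are assumptions introduced here; the real `provider-manager.sh` may resolve and validate differently.

```shell
#!/usr/bin/env bash
# Sketch only: HOOKS_DIR is an assumed variable, not part of AgentVibes.
HOOKS_DIR="${HOOKS_DIR:-.claude/hooks}"

# Print the script path a provider name maps to (no existence check).
get_provider_script_path() {
  printf '%s/play-tts-%s.sh\n' "$HOOKS_DIR" "$1"
}

# Succeed only if the name is well formed and its script is executable.
# The name pattern is an assumption; it also blocks path tricks like "../x".
validate_provider() {
  [[ "$1" =~ ^[a-z0-9]+$ ]] || return 1
  [[ -x "$(get_provider_script_path "$1")" ]]
}

# List provider names by stripping the play-tts- prefix and .sh suffix.
list_providers() {
  local script name
  for script in "$HOOKS_DIR"/play-tts-*.sh; do
    [[ -e "$script" ]] || continue   # the glob matched nothing
    name=${script##*/play-tts-}
    printf '%s\n' "${name%.sh}"
  done
}
```

Because the glob requires a hyphen after `play-tts`, the router script `play-tts.sh` is never reported as a provider.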
162
+
163
+ ### State Layer
164
+
165
+ **Files:**
166
+ - `.claude/tts-provider.txt` - Active provider name (project-local)
167
+ - `~/.claude/tts-provider.txt` - Global fallback
168
+ - `.claude/personalities/*.md` - Per-personality voice mappings
169
+ - `.claude/plugins/bmad-voices.md` - BMAD agent voice table
170
+
171
+ **State Flow:**
172
+ ```
173
+ Project-local (.claude/tts-provider.txt)
174
+   ↓
175
+ Global fallback (~/.claude/tts-provider.txt)
176
+   ↓
177
+ Default (elevenlabs)
178
+ ```
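
The lookup chain above might be implemented along these lines. This is a sketch under stated assumptions: `PROJECT_DIR` and `DEFAULT_PROVIDER` are hypothetical names, and the whitespace stripping is a defensive choice rather than documented behavior.

```shell
#!/usr/bin/env bash
# Sketch only: PROJECT_DIR and DEFAULT_PROVIDER are assumed names.
PROJECT_DIR="${PROJECT_DIR:-.}"
DEFAULT_PROVIDER="elevenlabs"

# Print the first non-empty provider name found in the lookup chain.
get_active_provider() {
  local file provider
  for file in "$PROJECT_DIR/.claude/tts-provider.txt" "$HOME/.claude/tts-provider.txt"; do
    if [[ -r "$file" ]]; then
      # Read the first line only; tolerate a missing trailing newline.
      IFS= read -r provider < "$file" || true
      provider=${provider//[[:space:]]/}   # strip stray whitespace such as CR
      if [[ -n "$provider" ]]; then
        printf '%s\n' "$provider"
        return 0
      fi
    fi
  done
  printf '%s\n' "$DEFAULT_PROVIDER"
}

# Persist a provider name to the project-local state file.
set_active_provider() {
  mkdir -p "$PROJECT_DIR/.claude"
  printf '%s\n' "$1" > "$PROJECT_DIR/.claude/tts-provider.txt"
}
```

An empty state file is treated the same as a missing one, so the chain falls through to the next level instead of returning a blank provider name.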
179
+
180
+ ---
181
+
182
+ ## Request Flow
183
+
184
+ ### Standard TTS Request
185
+
186
+ ```
187
+ 1. User triggers TTS (e.g., task completion)
188
+
189
+
190
+ 2. play-tts.sh receives request
191
+ • Parameters: $1=text, $2=voice_name (optional)
192
+
193
+
194
+ 3. Check personality provider override
195
+ • Read current personality from .claude/tts-personality.txt
196
+ • Extract provider: field from personality YAML
197
+ • If set and valid, use personality provider
198
+
199
+
200
+ 4. Fall back to global provider (if no override)
201
+ • provider-manager.sh: get_active_provider()
202
+ • Read .claude/tts-provider.txt
203
+ • Default to "elevenlabs" if not set
204
+
205
+
206
+ 5. Resolve provider script path
207
+ • provider-manager.sh: get_provider_script_path()
208
+ • Returns .claude/hooks/play-tts-{provider}.sh
209
+ • Validate script exists and is executable
210
+
211
+
212
+ 6. Delegate to provider implementation
213
+ • exec play-tts-{provider}.sh "$text" "$voice"
214
+ • Provider handles synthesis
215
+ • Provider plays audio
216
+
217
+
218
+ 7. Audio output via system player (mpv/afplay)
219
+ ```
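
Step 3 of the flow above depends on reading a single field out of the personality's frontmatter. One way to sketch that with `awk` is shown below. It assumes the personality name in `.claude/tts-personality.txt` maps to `.claude/personalities/<name>.md`, and it only understands simple unindented `key: value` lines, not full YAML. `frontmatter_field` and `PROJECT_DIR` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch only: PROJECT_DIR and frontmatter_field are assumed names.
PROJECT_DIR="${PROJECT_DIR:-.}"

# Print the value of a top-level key from a file's YAML frontmatter.
frontmatter_field() {
  local file=$1 key=$2
  awk -v key="$key" '
    /^---[ \t]*$/ { fences++; next }            # count the --- delimiters
    fences >= 2   { exit }                      # past the frontmatter: stop
    fences == 1 && index($0, key ":") == 1 {    # unindented key inside frontmatter
      value = substr($0, length(key) + 2)
      sub(/[ \t]+#.*$/, "", value)              # drop a trailing comment
      gsub(/^[ \t]+|[ \t]+$/, "", value)        # trim surrounding whitespace
      print value
      exit
    }
  ' "$file"
}

# Print the provider override of the current personality, or nothing.
personality_provider() {
  local name file
  name=$(cat "$PROJECT_DIR/.claude/tts-personality.txt" 2>/dev/null) || return 0
  file="$PROJECT_DIR/.claude/personalities/$name.md"
  [[ -r "$file" ]] || return 0
  frontmatter_field "$file" provider
}
```

An empty result means "no override", which lets the router fall through to the global provider in step 4.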
220
+
221
+ ### Provider Switch Request
222
+
223
+ ```
224
+ 1. User runs: /agent-vibes:provider switch piper
225
+
226
+
227
+ 2. provider-commands.sh validates provider
228
+ • Check provider exists
229
+ • Check platform compatibility (Piper requires WSL/Linux)
230
+ • Check language compatibility with current language
231
+
232
+
233
+ 3. User confirmation prompt
234
+ • Show current vs new provider
235
+ • Show any warnings (language fallbacks, platform issues)
236
+ • Require explicit Y/n confirmation
237
+
238
+
239
+ 4. Update state
240
+ • provider-manager.sh: set_active_provider(piper)
241
+ • Write "piper" to .claude/tts-provider.txt
242
+
243
+
244
+ 5. Test new provider
245
+ • play-tts.sh "Provider switched successfully"
246
+ • Immediate feedback to user
247
+ ```
248
+
249
+ ---
250
+
251
+ ## Provider Interface Contract
252
+
253
+ Every provider implementation must adhere to this contract:
254
+
255
+ ### Command Line Interface
256
+
257
+ ```bash
258
+ play-tts-{provider}.sh <text> [voice_name]
259
+ ```
260
+
261
+ **Parameters:**
262
+ - `$1` (required): Text to synthesize
263
+ - `$2` (optional): Voice name override
264
+
265
+ ### Return Codes
266
+
267
+ - `0` - Success (audio played successfully)
268
+ - `1` - Provider error (API failure, network issue, etc.)
269
+ - `2` - Configuration error (missing API key, invalid voice, etc.)
270
+ - `3` - Platform error (unsupported platform, missing dependencies)
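
A caller can use these codes to report the failure category without knowing which provider ran. The wrapper below is a small sketch; `describe_tts_status` and `run_provider` are hypothetical helper names, not part of AgentVibes.

```shell
#!/usr/bin/env bash
# Map a provider exit status to the category defined by the contract.
describe_tts_status() {
  case "$1" in
    0) echo "success" ;;
    1) echo "provider error" ;;
    2) echo "configuration error" ;;
    3) echo "platform error" ;;
    *) echo "unknown error ($1)" ;;
  esac
}

# Run a provider command, report any failure on stderr, and pass the status on.
run_provider() {
  local status=0
  "$@" || status=$?
  if (( status != 0 )); then
    echo "TTS failed: $(describe_tts_status "$status")" >&2
  fi
  return "$status"
}
```

For example, `run_provider .claude/hooks/play-tts-piper.sh "Hello"` would print `TTS failed: platform error` if the provider script exited with status 3.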
271
+
272
+ ### Standard Output
273
+
274
+ ```bash
275
+ 🌍 Using spanish voice: Antoni # Language info
276
+ 🎤 Using voice: Aria (session override) # Voice selection
277
+ 🎵 Saved to: /path/to/audio.mp3 # Audio file path
278
+ 🎭 Using personality override: piper # Provider override
279
+ ```
280
+
281
+ ### Error Output (stderr)
282
+
283
+ ```bash
284
+ ❌ Provider 'foo' not found
285
+ ⚠️ Language 'arabic' not supported by piper, using English
286
+ 🔑 ElevenLabs API key not found
287
+ ```
288
+
289
+ ### Audio Output
290
+
291
+ - Must play audio via system player (mpv/afplay)
292
+ - Must pad audio to prevent static on WSL
293
+ - Must clean up temporary files
294
+ - Must not cut off the start or end of the audio
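
The player selection and cleanup requirements could be met with something like the sketch below. It assumes the file passed in is a temporary intermediate (for example a padded copy) rather than the saved output, and it does not attempt the WSL padding itself. `pick_player` and `play_and_cleanup` are hypothetical names.

```shell
#!/usr/bin/env bash
# Print the first available player, or return 3 (platform error) if none exists.
pick_player() {
  local player
  for player in mpv afplay; do
    if command -v "$player" >/dev/null 2>&1; then
      echo "$player"
      return 0
    fi
  done
  return 3
}

# Play a temporary file, then delete it whether or not playback succeeded.
play_and_cleanup() {
  local file=$1 player status=0
  if player=$(pick_player); then
    case "$player" in
      mpv)    mpv --really-quiet "$file" || status=1 ;;
      afplay) afplay "$file" || status=1 ;;
    esac
  else
    status=3
  fi
  rm -f -- "$file"
  return "$status"
}
```

Returning 3 when no player is installed keeps the sketch aligned with the platform-error code in the return code table.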
295
+
296
+ ### Voice Resolution
297
+
298
+ Providers must support:
299
+ 1. Voice name parameter from caller
300
+ 2. Language-specific voice mapping
301
+ 3. Personality-specific voice mapping
302
+ 4. Fallback to default voice
303
+
304
+ **Priority Order:**
305
+ ```
306
+ 1. Voice name parameter ($2)
307
+ 2. Language-specific voice (e.g., Spanish → Antoni)
308
+ 3. Personality voice (e.g., pirate → Pirate Marshal)
309
+ 4. Default voice (e.g., Aria)
310
+ ```
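
The priority order reduces to "first non-empty candidate wins". A minimal sketch, assuming the caller has already looked up the language and personality voices (`resolve_voice` is a hypothetical helper name):

```shell
#!/usr/bin/env bash
# Print the first non-empty candidate, in the documented priority order.
# Arguments: voice parameter, language voice, personality voice, default voice.
resolve_voice() {
  local candidate
  for candidate in "$@"; do
    if [[ -n "$candidate" ]]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 2   # configuration error: no voice available at all
}
```

A provider script might call it as `VOICE=$(resolve_voice "$2" "$LANGUAGE_VOICE" "$PERSONALITY_VOICE" "Aria")`, where the two middle variables are placeholders for whatever the language and personality lookups return.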
311
+
312
+ ---
313
+
314
+ ## State Management
315
+
316
+ ### Provider State
317
+
318
+ **Location**: `.claude/tts-provider.txt` (project) or `~/.claude/tts-provider.txt` (global)
319
+
320
+ **Format**: Plain text, single line
321
+ ```
322
+ elevenlabs
323
+ ```
324
+
325
+ **Read by**: `get_active_provider()`
326
+ **Written by**: `set_active_provider()`
327
+
328
+ ### Voice Mappings
329
+
330
+ **Location**: `.claude/personalities/*.md`
331
+
332
+ **Format**: YAML frontmatter
333
+ ```yaml
334
+ ---
335
+ name: pirate
336
+ voices:
337
+   elevenlabs: Pirate Marshal
338
+   piper: en_GB-northern_english_male-medium
339
+ provider: piper # Optional override
340
+ ---
341
+ ```
342
+
343
+ **Read by**: `personality-manager.sh:get_personality_data()`
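
To illustrate how a per-provider voice could be read from this frontmatter, here is a hedged `awk` sketch. It handles only the indented block form shown above plus the legacy `voice:` line (treated as the ElevenLabs voice), not inline `{...}` mappings or general YAML. `get_personality_voice` is a hypothetical name, and the real `get_personality_data()` may work differently.

```shell
#!/usr/bin/env bash
# Print the voice a personality file maps to the given provider.
# Sketch only: parses simple frontmatter lines, not full YAML.
get_personality_voice() {
  local file=$1 provider=$2
  awk -v provider="$provider" '
    /^---[ \t]*$/      { if (++fences == 2) exit; next }
    fences != 1        { next }
    /^voices:[ \t]*$/  { in_voices = 1; next }
    /^[^ \t]/          { in_voices = 0 }     # any unindented key ends the voices block
    /^voice:/ && provider == "elevenlabs" {  # legacy single-provider format
      legacy = $0
      sub(/^voice:[ \t]*/, "", legacy)
    }
    in_voices {
      line = $0
      sub(/^[ \t]+/, "", line)
      if (index(line, provider ":") == 1) {
        found = substr(line, length(provider) + 2)
        sub(/[ \t]+#.*$/, "", found)         # drop a trailing comment
        sub(/^[ \t]+/, "", found)
      }
    }
    END { if (found != "") print found; else if (legacy != "") print legacy }
  ' "$file"
}
```

When both formats are present, the `voices:` entry wins; the legacy value is only a fallback for ElevenLabs.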
344
+
345
+ ### BMAD Agent Voices
346
+
347
+ **Location**: `.claude/plugins/bmad-voices.md`
348
+
349
+ **Format**: Markdown table
350
+ ```markdown
351
+ | Agent ID | Name | ElevenLabs Voice | Piper Voice | Personality |
352
+ |----------|------|------------------|-------------|-------------|
353
+ | pm | John | Jessica Anne Bogart | en_US-lessac-medium | professional |
354
+ ```
355
+
356
+ **Read by**: `bmad-voice-manager.sh:get_agent_voice()`
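
A lookup against this table can be sketched by splitting each row on `|`. The column positions below are taken from the table shown above; the function signature is an assumption and may not match the real `get_agent_voice()`.

```shell
#!/usr/bin/env bash
# Print the voice for an agent ID from the multi-provider BMAD voice table.
# Sketch only: the argument order (file, agent ID, provider) is assumed.
get_agent_voice() {
  local file=$1 agent_id=$2 provider=$3
  awk -F'|' -v id="$agent_id" -v provider="$provider" '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    trim($2) == id {
      # Fields: $2 agent ID, $3 name, $4 ElevenLabs voice, $5 Piper voice
      print trim(provider == "piper" ? $5 : $4)
      exit
    }
  ' "$file"
}
```

Because every row starts with `|`, the first awk field is empty and the agent ID lands in `$2`. An unknown agent ID prints nothing.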
357
+
358
+ ---
359
+
360
+ ## Extension Points
361
+
362
+ ### Adding a New Provider
363
+
364
+ To add a new TTS provider to AgentVibes:
365
+
366
+ #### 1. Create Provider Script
367
+
368
+ **File**: `.claude/hooks/play-tts-{provider}.sh`
369
+
370
+ ```bash
371
+ #!/bin/bash
372
+ # Provider: {Your Provider Name}
373
+
374
+ SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
375
+ TEXT="${1:-Hello World}"
376
+ VOICE_OVERRIDE="${2:-}"
377
+
378
+ # 1. Voice resolution
379
+ if [[ -n "$VOICE_OVERRIDE" ]]; then
380
+   VOICE="$VOICE_OVERRIDE"
381
+ else
382
+   # Check language, personality, or default
383
+   VOICE="default-voice"
384
+ fi
385
+
386
+ # 2. Synthesize speech
387
+ # ... your provider-specific synthesis, which must write audio to "$AUDIO_FILE" ...
388
+
389
+ # 3. Play audio
390
+ mpv --really-quiet "$AUDIO_FILE"
391
+
392
+ # 4. Exit with appropriate code
393
+ exit 0
394
+ ```
395
+
396
+ #### 2. Add Voice Mappings
397
+
398
+ Update all personality files in `.claude/personalities/*.md`:
399
+
400
+ ```yaml
401
+ ---
402
+ name: sarcastic
403
+ voices:
404
+   elevenlabs: Jessica Anne Bogart
405
+   piper: en_US-amy-medium
406
+   yourprovider: your-voice-name # Add this line
407
+ ---
408
+ ```
409
+
410
+ #### 3. Update Installer
411
+
412
+ Add provider option to `bin/agent-vibes` installer:
413
+
414
+ ```javascript
415
+ const providers = [
416
+   { name: 'ElevenLabs', value: 'elevenlabs', ... },
417
+   { name: 'Piper', value: 'piper', ... },
418
+   { name: 'Your Provider', value: 'yourprovider', ... } // Add this
419
+ ];
420
+ ```
421
+
422
+ #### 4. Add Provider Commands
423
+
424
+ Update `.claude/hooks/provider-commands.sh`:
425
+
426
+ ```bash
427
+ case "$provider_name" in
428
+   elevenlabs) ... ;;
429
+   piper) ... ;;
430
+   yourprovider) # Add this
431
+     echo "Your Provider - Description"
432
+     echo "Platform: All"
433
+     echo "Cost: ..."
434
+     ;;
435
+ esac
436
+ ```
437
+
438
+ #### 5. Create Documentation
439
+
440
+ Create `docs/providers/yourprovider-setup.md` with:
441
+ - Installation instructions
442
+ - Configuration requirements
443
+ - Platform compatibility
444
+ - Voice library information
445
+ - Troubleshooting guide
446
+
447
+ #### 6. Test Provider
448
+
449
+ ```bash
450
+ # Switch to new provider
451
+ /agent-vibes:provider switch yourprovider
452
+
453
+ # Test with personalities
454
+ /agent-vibes:personality sarcastic
455
+ # Trigger TTS to hear new voice
456
+
457
+ # Test languages
458
+ /agent-vibes:set-language spanish
459
+ # Trigger TTS to verify language support
460
+ ```
461
+
462
+ ---
463
+
464
+ ## Related Files
465
+
466
+ ### Core Provider System
467
+
468
+ | File | Purpose |
469
+ |------|---------|
470
+ | `.claude/hooks/play-tts.sh` | Router - main entry point |
471
+ | `.claude/hooks/provider-manager.sh` | Provider management functions |
472
+ | `.claude/hooks/play-tts-elevenlabs.sh` | ElevenLabs implementation |
473
+ | `.claude/hooks/play-tts-piper.sh` | Piper implementation |
474
+ | `.claude/hooks/provider-commands.sh` | Slash commands for providers |
475
+
476
+ ### Voice Management
477
+
478
+ | File | Purpose |
479
+ |------|---------|
480
+ | `.claude/hooks/personality-manager.sh` | Personality voice resolution |
481
+ | `.claude/hooks/language-manager.sh` | Language-specific voice mapping |
482
+ | `.claude/hooks/bmad-voice-manager.sh` | BMAD agent voice lookup |
483
+ | `.claude/language-voices.yaml` | Language-to-voice mappings |
484
+
485
+ ### State Files
486
+
487
+ | File | Purpose |
488
+ |------|---------|
489
+ | `.claude/tts-provider.txt` | Active provider name |
490
+ | `.claude/personalities/*.md` | Voice mappings per personality |
491
+ | `.claude/plugins/bmad-voices.md` | BMAD agent voice table |
492
+
493
+ ### Slash Commands
494
+
495
+ | File | Purpose |
496
+ |------|---------|
497
+ | `.claude/commands/agent-vibes/provider.md` | Provider management commands |
498
+ | `.claude/commands/agent-vibes/switch.md` | Voice switching (single provider) |
499
+ | `.claude/commands/agent-vibes/list.md` | List voices (provider-aware) |
500
+
501
+ ---
502
+
503
+ ## Migration Guide
504
+
505
+ ### From Single-Provider (v1.x) to Multi-Provider (v2.x)
506
+
507
+ #### Personality Files
508
+
509
+ **Old Format (v1.x)**:
510
+ ```yaml
511
+ ---
512
+ name: sarcastic
513
+ voice: Jessica Anne Bogart
514
+ ---
515
+ ```
516
+
517
+ **New Format (v2.x)**:
518
+ ```yaml
519
+ ---
520
+ name: sarcastic
521
+ voices:
522
+   elevenlabs: Jessica Anne Bogart
523
+   piper: en_US-amy-medium
524
+ ---
525
+ ```
526
+
527
+ **Migration**: The old format still works (treated as `voices.elevenlabs`), but the new format is recommended for multi-provider support.
528
+
529
+ #### BMAD Plugin
530
+
531
+ **Old Format (v1.x)**:
532
+ ```markdown
533
+ | Agent | Name | Voice | Personality |
534
+ |-------|------|-------|-------------|
535
+ | pm | John | Jessica | professional |
536
+ ```
537
+
538
+ **New Format (v2.x)**:
539
+ ```markdown
540
+ | Agent | Name | ElevenLabs Voice | Piper Voice | Personality |
541
+ |-------|------|------------------|-------------|-------------|
542
+ | pm | John | Jessica Anne Bogart | en_US-lessac-medium | professional |
543
+ ```
544
+
545
+ **Migration**: `bmad-voice-manager.sh` detects the format automatically by checking for the `| ElevenLabs Voice |` column header.
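
The header check could be as small as the sketch below. `bmad_table_format` is a hypothetical name, and the `v1`/`v2` labels are illustrative rather than values the real script is known to use.

```shell
#!/usr/bin/env bash
# Print "v2" if the table has the multi-provider header, otherwise "v1".
bmad_table_format() {
  if grep -qF '| ElevenLabs Voice |' "$1"; then
    echo "v2"
  else
    echo "v1"
  fi
}
```

`grep -F` matches the header as a fixed string, so the `|` characters are not interpreted as regex alternation.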
546
+
547
+ #### API Changes
548
+
549
+ No breaking API changes. All existing scripts continue to work:
550
+ - `play-tts.sh "text"` - Still works
551
+ - `/agent-vibes:switch voice` - Still works
552
+ - Personality switching - Still works
553
+
554
+ New features added:
555
+ - `/agent-vibes:provider switch piper` - New
556
+ - `provider:` field in personality frontmatter - New
557
+ - Multi-provider voice tables - New
558
+
559
+ ---
560
+
561
+ ## Learn More
562
+
563
+ - **Provider Comparison**: [agentvibes.org/providers](https://agentvibes.org/providers)
564
+ - **Setup Guides**: [docs/providers/](../providers/)
565
+ - **Voice Mapping Format**: [docs/voice-mapping-format.md](../voice-mapping-format.md)
566
+ - **Troubleshooting**: [docs/troubleshooting.md](../troubleshooting.md)
567
+
568
+ **Questions?** Visit [agentvibes.org/support](https://agentvibes.org/support)
569
+
570
+ ---
571
+
572
+ **Last Updated**: 2025-01-05
573
+ **Architecture Version**: 2.0.0
574
+ **Status**: Production