abstractvoice 0.1.0__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,1132 @@
1
+ Metadata-Version: 2.4
2
+ Name: abstractvoice
3
+ Version: 0.1.0
4
+ Summary: A modular Python library for voice interactions with AI systems
5
+ Author-email: Laurent-Philippe Albou <contact@abstractcore.ai>
6
+ License-Expression: MIT
7
+ Project-URL: Repository, https://github.com/lpalbou/abstractvoice
8
+ Project-URL: Documentation, https://github.com/lpalbou/abstractvoice#readme
9
+ Classifier: Development Status :: 3 - Alpha
10
+ Classifier: Intended Audience :: Developers
11
+ Classifier: Programming Language :: Python :: 3
12
+ Classifier: Programming Language :: Python :: 3.8
13
+ Classifier: Programming Language :: Python :: 3.9
14
+ Classifier: Programming Language :: Python :: 3.10
15
+ Classifier: Programming Language :: Python :: 3.11
16
+ Classifier: Programming Language :: Python :: 3.12
17
+ Requires-Python: >=3.8
18
+ Description-Content-Type: text/markdown
19
+ License-File: LICENSE
20
+ Requires-Dist: numpy>=1.24.0
21
+ Requires-Dist: sounddevice>=0.4.6
22
+ Requires-Dist: webrtcvad>=2.0.10
23
+ Requires-Dist: PyAudio>=0.2.13
24
+ Requires-Dist: openai-whisper>=20230314
25
+ Requires-Dist: coqui-tts>=0.27.0
26
+ Requires-Dist: torch>=2.0.0
27
+ Requires-Dist: torchaudio>=2.0.0
28
+ Requires-Dist: librosa>=0.10.0
29
+ Requires-Dist: soundfile>=0.12.1
30
+ Requires-Dist: requests>=2.31.0
31
+ Requires-Dist: flask>=2.0.0
32
+ Requires-Dist: tiktoken>=0.6.0
33
+ Provides-Extra: dev
34
+ Requires-Dist: pytest>=7.0.0; extra == "dev"
35
+ Requires-Dist: black>=22.0.0; extra == "dev"
36
+ Requires-Dist: flake8>=5.0.0; extra == "dev"
37
+ Dynamic: license-file
38
+
39
+ # AbstractVoice
40
+
41
+ [![PyPI version](https://img.shields.io/pypi/v/abstractvoice.svg)](https://pypi.org/project/abstractvoice/)
42
+ [![Python Version](https://img.shields.io/pypi/pyversions/abstractvoice)](https://pypi.org/project/abstractvoice/)
43
+ [![License](https://img.shields.io/pypi/l/abstractvoice)](https://github.com/lpalbou/abstractvoice/blob/main/LICENSE)
44
+ [![GitHub stars](https://img.shields.io/github/stars/lpalbou/abstractvoice?style=social)](https://github.com/lpalbou/abstractvoice/stargazers)
45
+
46
+ A modular Python library for voice interactions with AI systems, providing text-to-speech (TTS) and speech-to-text (STT) capabilities with interrupt handling.
47
+
48
+ While we provide CLI and web examples, AbstractVoice is designed to be integrated into other projects.
49
+
50
+ ## Features
51
+
52
+ - **High-Quality TTS**: Best-in-class speech synthesis with VITS model
53
+ - Natural prosody and intonation
54
+ - Adjustable speed without pitch distortion (using librosa time-stretching)
55
+ - Multiple quality levels (VITS best, fast_pitch fallback)
56
+ - Automatic fallback if espeak-ng is not installed
57
+ - **Cross-Platform**: Works on macOS, Linux, and Windows
58
+ - Best quality: Install espeak-ng (easy on all platforms)
59
+ - Fallback mode: Works without any system dependencies
60
+ - **Speech-to-Text**: Accurate voice recognition using OpenAI's Whisper
61
+ - **Voice Activity Detection**: Efficient speech detection using WebRTC VAD (illustrated below)
62
+ - **Interrupt Handling**: Stop TTS by speaking or using stop commands
63
+ - **Modular Design**: Easily integrate with any text generation system
64
+
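+ For reference, the WebRTC VAD mentioned above comes from the `webrtcvad` dependency. A minimal, illustrative check (AbstractVoice drives this internally; you normally never call it yourself):
+
+ ```python
+ import webrtcvad
+
+ vad = webrtcvad.Vad(2)  # aggressiveness 0-3; higher filters non-speech more aggressively
+
+ # webrtcvad expects 16-bit mono PCM frames of 10/20/30 ms at 8/16/32/48 kHz.
+ # Here: 30 ms of silence at 16 kHz (480 samples * 2 bytes each).
+ frame = b"\x00\x00" * 480
+ print(vad.is_speech(frame, 16000))  # -> False for silence
+ ```
+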
65
+ ## Installation
66
+
67
+ ### Prerequisites
68
+
69
+ - Python 3.8+ (3.11+ recommended)
70
+ - PortAudio for audio input/output
71
+ - **Recommended**: espeak-ng for best voice quality (VITS model)
72
+
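+ PortAudio is typically installed with your system package manager; for example (package names may vary by platform and version):
+
+ ```bash
+ # macOS
+ brew install portaudio
+
+ # Ubuntu/Debian
+ sudo apt-get install portaudio19-dev
+ ```
+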
73
+ ### Installing espeak-ng (Recommended for Best Quality)
74
+
75
+ AbstractVoice will work without espeak-ng, but voice quality will be significantly better with it:
76
+
77
+ **macOS:**
78
+ ```bash
79
+ brew install espeak-ng
80
+ ```
81
+
82
+ **Linux (Ubuntu/Debian):**
83
+ ```bash
84
+ sudo apt-get install espeak-ng
85
+ ```
86
+
87
+ **Linux (Fedora/RHEL):**
88
+ ```bash
89
+ sudo dnf install espeak-ng  # use yum on older releases
90
+ ```
91
+
92
+ **Windows:**
93
+ ```bash
94
+ # Option 1: Using Conda
95
+ conda install -c conda-forge espeak-ng
96
+
97
+ # Option 2: Using Chocolatey
98
+ choco install espeak-ng
99
+
100
+ # Option 3: Download installer from https://github.com/espeak-ng/espeak-ng/releases
101
+ ```
102
+
103
+ **Without espeak-ng:** AbstractVoice will automatically fall back to a simpler TTS model (fast_pitch) that works everywhere but has lower voice quality.
104
+
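+ If you want to know ahead of time which model the automatic selection is likely to pick, a simple heuristic is to look for the espeak-ng binary on your PATH (a sketch, not AbstractVoice's actual detection code):
+
+ ```python
+ import shutil
+
+ if shutil.which("espeak-ng"):
+     print("espeak-ng found: VITS (best quality) should be usable")
+ else:
+     print("espeak-ng missing: expect fallback to fast_pitch")
+ ```
+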
105
+ ### Basic Installation
106
+
107
+ ```bash
108
+ # Install from PyPI
109
+ pip install abstractvoice
110
+
111
+ # Or clone the repository
112
+ git clone https://github.com/lpalbou/abstractvoice.git
113
+ cd abstractvoice
114
+ pip install -e .
115
+ ```
116
+
117
+ ### Development Installation
118
+
119
+ ```bash
120
+ # Install with development dependencies
121
+ pip install "abstractvoice[dev]"
122
+ ```
123
+
124
+ ### From Requirements File
125
+
126
+ ```bash
127
+ # Install all dependencies including the package
128
+ pip install -r requirements.txt
129
+ ```
130
+
131
+ ## Quick Start
132
+
133
+ ### Using AbstractVoice from the Command Line
134
+
135
+ The easiest way to get started is to use AbstractVoice directly from your shell:
136
+
137
+ ```bash
138
+ # Start AbstractVoice in voice mode (TTS ON, STT ON)
139
+ abstractvoice
140
+ # → Automatically uses VITS if espeak-ng installed (best quality)
141
+ # → Falls back to fast_pitch if espeak-ng not found
142
+
143
+ # Or start with custom settings
144
+ abstractvoice --model gemma3:latest --whisper base
145
+
146
+ # Start in text-only mode (TTS enabled, listening disabled)
147
+ abstractvoice --no-listening
148
+ ```
149
+
150
+ Once started, you can interact with the AI using voice or text. Use `/help` to see all available commands.
151
+
152
+ **Note**: AbstractVoice automatically selects the best available TTS model. For best quality, install espeak-ng (see Installation section above).
153
+
154
+ ### Integrating AbstractVoice in Your Python Project
155
+
156
+ Here's a simple example of how to integrate AbstractVoice into your own application:
157
+
158
+ ```python
159
+ from abstractvoice import VoiceManager
160
+ import time
161
+
162
+ # Initialize voice manager
163
+ voice_manager = VoiceManager(debug_mode=False)
164
+
165
+ # Text to speech
166
+ voice_manager.speak("Hello, I am an AI assistant. How can I help you today?")
167
+
168
+ # Wait for speech to complete
169
+ while voice_manager.is_speaking():
170
+ time.sleep(0.1)
171
+
172
+ # Speech to text with callback
173
+ def on_transcription(text):
174
+ print(f"User said: {text}")
175
+ if text.lower() != "stop":
176
+ # Process with your text generation system
177
+ response = f"You said: {text}"
178
+ voice_manager.speak(response)
179
+
180
+ # Start voice recognition
181
+ voice_manager.listen(on_transcription)
182
+
183
+ # Wait for user to say "stop" or press Ctrl+C
184
+ try:
185
+ while voice_manager.is_listening():
186
+ time.sleep(0.1)
187
+ except KeyboardInterrupt:
188
+ pass
189
+
190
+ # Clean up
191
+ voice_manager.cleanup()
192
+ ```
193
+
194
+ ## Running Examples
195
+
196
+ The package includes several examples that demonstrate different ways to use AbstractVoice.
197
+
198
+ ### Voice Mode (Default)
199
+
200
+ If installed globally, you can launch AbstractVoice directly in voice mode:
201
+
202
+ ```bash
203
+ # Start AbstractVoice in voice mode (TTS ON, STT ON)
204
+ abstractvoice
205
+
206
+ # With options
207
+ abstractvoice --debug --whisper base --model gemma3:latest --api http://localhost:11434/api/chat
208
+ ```
209
+
210
+ **Command line options:**
211
+ - `--debug` - Enable debug mode with detailed logging
212
+ - `--api <url>` - URL of the Ollama API (default: http://localhost:11434/api/chat)
213
+ - `--model <name>` - Ollama model to use (default: granite3.3:2b)
214
+ - Examples: cogito:3b, phi4-mini:latest, qwen2.5:latest, gemma3:latest, etc.
215
+ - `--whisper <model>` - Whisper model to use (default: tiny)
216
+ - Options: tiny, base, small, medium, large
217
+ - `--no-listening` - Disable speech-to-text (listening), TTS still works
218
+ - **Note**: This creates a "TTS-only" mode where you type and the AI speaks back
219
+ - `--system <prompt>` - Custom system prompt
220
+
221
+ ### Command-Line REPL
222
+
223
+ ```bash
224
+ # Run the CLI example (TTS ON, STT OFF)
225
+ abstractvoice-cli cli
226
+
227
+ # With debug mode
228
+ abstractvoice-cli cli --debug
229
+ ```
230
+
231
+ #### REPL Commands
232
+
233
+ All commands must start with `/` except `stop`:
234
+
235
+ **Basic Commands:**
236
+ - `/exit`, `/q`, `/quit` - Exit REPL
237
+ - `/clear` - Clear conversation history
238
+ - `/help` - Show help information
239
+ - `stop` - Stop voice mode or TTS (voice command, no `/` needed)
240
+
241
+ **Voice & Audio:**
242
+ - `/tts on|off` - Toggle text-to-speech
243
+ - `/voice <mode>` - Voice input modes:
244
+ - `off` - Disable voice input
245
+ - `full` - Continuous listening, interrupts TTS on speech detection
246
+ - `wait` - Pause listening while speaking (recommended, reduces self-interruption)
247
+ - `stop` - Only stop on 'stop' keyword (planned)
248
+ - `ptt` - Push-to-talk mode (planned)
249
+ - `/speed <number>` - Set TTS speed (0.5-2.0, default: 1.0, **pitch preserved**)
250
+ - `/tts_model <model>` - Switch TTS model:
251
+ - `vits` - **Best quality** (requires espeak-ng)
252
+ - `fast_pitch` - Good quality (works everywhere)
253
+ - `glow-tts` - Alternative (similar quality to fast_pitch)
254
+ - `tacotron2-DDC` - Legacy (slower, lower quality)
255
+ - `/whisper <model>` - Switch Whisper model (tiny|base|small|medium|large)
256
+ - `/stop` - Stop voice mode or TTS playback
257
+ - `/pause` - Pause current TTS playback (can be resumed)
258
+ - `/resume` - Resume paused TTS playback
259
+
260
+ **LLM Configuration:**
261
+ - `/model <name>` - Change LLM model (e.g., `/model gemma3:latest`)
262
+ - `/system <prompt>` - Set system prompt (e.g., `/system You are a helpful coding assistant`)
263
+ - `/temperature <val>` - Set temperature (0.0-2.0, default: 0.7)
264
+ - `/max_tokens <num>` - Set max tokens (default: 4096)
265
+
266
+ **Chat Management:**
267
+ - `/save <filename>` - Save chat history (e.g., `/save conversation`)
268
+ - `/load <filename>` - Load chat history (e.g., `/load conversation`)
269
+ - `/tokens` - Display token usage statistics (see the tiktoken sketch after this list)
270
+
271
+ **Sending Messages:**
272
+ - `<message>` - Any text without `/` prefix is sent to the LLM
273
+
274
+ **Note**: Commands without `/` (except `stop`) are sent to the LLM as regular messages.
275
+
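+ For reference, token counting of the kind `/tokens` reports can be approximated with tiktoken (a declared dependency of this package). A hypothetical sketch; the REPL's actual accounting may differ:
+
+ ```python
+ import tiktoken
+
+ enc = tiktoken.get_encoding("cl100k_base")  # assumed encoding; the actual choice may differ
+
+ def count_tokens(messages):
+     # Naive sum over message contents; real accounting may add per-message overhead.
+     return sum(len(enc.encode(m["content"])) for m in messages)
+
+ history = [
+     {"role": "user", "content": "Hello!"},
+     {"role": "assistant", "content": "Hi, how can I help?"},
+ ]
+ print(count_tokens(history))
+ ```
+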
276
+ ### Web API
277
+
278
+ ```bash
279
+ # Run the web API example
280
+ abstractvoice-cli web
281
+
282
+ # With different host and port
283
+ abstractvoice-cli web --host 0.0.0.0 --port 8000
284
+ ```
285
+
286
+ You can also run a simplified version that doesn't load the full models:
287
+
288
+ ```bash
289
+ # Run the web API with simulation mode
290
+ abstractvoice-cli web --simulate
291
+ ```
292
+
293
+ #### Troubleshooting Web API
294
+
295
+ If you encounter issues with the web API:
296
+
297
+ 1. **404 Not Found**: Make sure you're accessing the correct endpoints (e.g., `/api/test`, `/api/tts`; see the curl example after this list)
298
+ 2. **Connection Issues**: Ensure no other service is using the port
299
+ 3. **Model Loading Errors**: Try running with `--simulate` flag to test without loading models
300
+ 4. **Dependencies**: Ensure all required packages are installed:
301
+ ```bash
302
+ pip install flask soundfile numpy requests
303
+ ```
304
+ 5. **Test with a simple Flask script**:
305
+ ```python
306
+ from flask import Flask
307
+ app = Flask(__name__)
308
+ @app.route('/')
309
+ def home():
310
+ return "Flask works!"
311
+ app.run(host='127.0.0.1', port=5000)
312
+ ```
313
+
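+ A quick smoke test from the shell (assuming Flask's default port 5000 on localhost; adjust host and port to match your `abstractvoice-cli web` invocation):
+
+ ```bash
+ curl http://127.0.0.1:5000/api/test
+ ```
+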
314
+ ### Simple Demo
315
+
316
+ ```bash
317
+ # Run the simple example
318
+ abstractvoice-cli simple
319
+ ```
320
+
321
+ ## Documentation
322
+
323
+ ### 📚 Documentation Overview
324
+
325
+ - **[README.md](README.md)** - This file: User guide, API reference, and examples
326
+ - **[CONTRIBUTING.md](CONTRIBUTING.md)** - Contribution guidelines and development setup
327
+ - **[CHANGELOG.md](CHANGELOG.md)** - Version history and release notes
328
+ - **[docs/](docs/)** - Technical documentation for developers
329
+
330
+ ### 🎯 Quick Navigation
331
+
332
+ - **Getting Started**: [Installation](#installation) and [Quick Start](#quick-start)
333
+ - **Pause/Resume Control**: [TTS Control](#quick-reference-tts-control) section
334
+ - **Integration Examples**: [Integration Guide](#integration-guide-for-third-party-applications)
335
+ - **Technical Details**: [docs/architecture.md](docs/architecture.md) - How immediate pause/resume works
336
+ - **Development**: [CONTRIBUTING.md](CONTRIBUTING.md) - Setup and guidelines
337
+
338
+ ## Component Overview
339
+
340
+ ### VoiceManager
341
+
342
+ The main class that coordinates TTS and STT functionality:
343
+
344
+ ```python
345
+ from abstractvoice import VoiceManager
346
+
347
+ # Simple initialization (automatic model selection)
348
+ # - Uses VITS if espeak-ng is installed (best quality)
349
+ # - Falls back to fast_pitch if espeak-ng is missing
350
+ manager = VoiceManager()
351
+
352
+ # Or specify a model explicitly
353
+ manager = VoiceManager(
354
+ tts_model="tts_models/en/ljspeech/vits", # Best quality (needs espeak-ng)
355
+ # tts_model="tts_models/en/ljspeech/fast_pitch", # Good (works everywhere)
356
+ whisper_model="tiny",
357
+ debug_mode=False
358
+ )
359
+
360
+ # === TTS (Text-to-Speech) ===
361
+
362
+ # Basic speech synthesis
363
+ manager.speak("Hello world")
364
+
365
+ # With speed control (pitch preserved via time-stretching!)
366
+ manager.speak("This is 20% faster", speed=1.2)
367
+ manager.speak("This is half speed", speed=0.5)
368
+
369
+ # Check if speaking
370
+ if manager.is_speaking():
371
+ manager.stop_speaking()
372
+
373
+ # Pause and resume TTS (IMMEDIATE response)
374
+ manager.speak("This is a long sentence that can be paused and resumed immediately")
375
+ time.sleep(1)
376
+ success = manager.pause_speaking() # Pause IMMEDIATELY (~20ms response)
377
+ if success:
378
+ print("TTS paused immediately")
379
+
380
+ time.sleep(2)
381
+ success = manager.resume_speaking() # Resume IMMEDIATELY from exact position
382
+ if success:
383
+ print("TTS resumed from exact position")
384
+
385
+ # Check pause status
386
+ if manager.is_paused():
387
+ manager.resume_speaking()
388
+
389
+ # Change TTS speed globally
390
+ manager.set_speed(1.3) # All subsequent speech will be 30% faster
391
+
392
+ # Change TTS model dynamically
393
+ manager.set_tts_model("tts_models/en/ljspeech/glow-tts")
394
+
395
+ # Available TTS models (quality ranking):
396
+ # - "tts_models/en/ljspeech/vits" (BEST quality, requires espeak-ng)
397
+ # - "tts_models/en/ljspeech/fast_pitch" (fallback, works everywhere)
398
+ # - "tts_models/en/ljspeech/glow-tts" (alternative fallback)
399
+ # - "tts_models/en/ljspeech/tacotron2-DDC" (legacy)
400
+
401
+ # === STT (Speech-to-Text) ===
402
+
403
+ def on_transcription(text):
404
+ print(f"You said: {text}")
405
+
406
+ manager.listen(on_transcription, on_stop=None)
407
+ manager.stop_listening()
408
+ manager.is_listening()
409
+
410
+ # Change Whisper model
411
+ manager.set_whisper("base") # tiny, base, small, medium, large
412
+
413
+ # === Voice Modes ===
414
+
415
+ # Control how voice recognition behaves during TTS
416
+ manager.set_voice_mode("wait") # Pause listening while speaking (recommended)
417
+ manager.set_voice_mode("full") # Keep listening, interrupt on speech
418
+ manager.set_voice_mode("off") # Disable voice recognition
419
+
420
+ # === VAD (Voice Activity Detection) ===
421
+
422
+ manager.change_vad_aggressiveness(2) # 0-3, higher = more aggressive
423
+
424
+ # === Cleanup ===
425
+
426
+ manager.cleanup()
427
+ ```
428
+
429
+ ### TTSEngine
430
+
431
+ Handles text-to-speech synthesis:
432
+
433
+ ```python
434
+ from abstractvoice.tts import TTSEngine
435
+
436
+ # Initialize with fast_pitch model (default, no external dependencies)
437
+ tts = TTSEngine(
438
+ model_name="tts_models/en/ljspeech/fast_pitch",
439
+ debug_mode=False,
440
+ streaming=True # Enable progressive playback for long text
441
+ )
442
+
443
+ # Speak with speed control (pitch preserved via time-stretching)
444
+ tts.speak(text, speed=1.2, callback=None) # 20% faster, same pitch
445
+
446
+ # Immediate pause and resume control
447
+ success = tts.pause() # Pause IMMEDIATELY (~20ms response)
448
+ success = tts.resume() # Resume IMMEDIATELY from exact position
449
+ is_paused = tts.is_paused() # Check if currently paused
450
+
451
+ tts.stop() # Stop completely (cannot resume)
452
+ tts.is_active() # Check if active
453
+ ```
454
+
455
+ **Important Note on Speed Parameter:**
456
+ - The speed parameter now uses proper time-stretching (via librosa)
457
+ - Changing speed does NOT affect pitch anymore
458
+ - Range: 0.5 (half speed) to 2.0 (double speed)
459
+ - Example: `speed=1.3` makes speech 30% faster while preserving natural pitch
460
+
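+ To illustrate what pitch-preserving time-stretching means, here is the underlying librosa call applied to a standalone audio file (a sketch of the technique, not AbstractVoice's internal code path):
+
+ ```python
+ import librosa
+ import soundfile as sf
+
+ y, sr = librosa.load("speech.wav", sr=None)         # any mono speech clip
+ faster = librosa.effects.time_stretch(y, rate=1.3)  # 30% faster, pitch unchanged
+ sf.write("speech_fast.wav", faster, sr)
+ ```
+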
461
+ ### VoiceRecognizer
462
+
463
+ Manages speech recognition with VAD:
464
+
465
+ ```python
466
+ from abstractvoice.recognition import VoiceRecognizer
467
+
468
+ def on_transcription(text):
469
+ print(f"Transcribed: {text}")
470
+
471
+ def on_stop():
472
+ print("Stop command detected")
473
+
474
+ recognizer = VoiceRecognizer(transcription_callback=on_transcription,
475
+ stop_callback=on_stop,
476
+ whisper_model="tiny",
477
+ debug_mode=False)
478
+ recognizer.start(tts_interrupt_callback=None)
479
+ recognizer.stop()
480
+ recognizer.change_whisper_model("base")
481
+ recognizer.change_vad_aggressiveness(2)
482
+ ```
483
+
484
+ ## Quick Reference: TTS Control
485
+
486
+ ### Pause and Resume TTS
487
+
488
+ **Professional-grade pause/resume control** with immediate response and no terminal interference.
489
+
490
+ **In CLI/REPL:**
491
+ ```bash
492
+ /pause # Pause current TTS playback IMMEDIATELY
493
+ /resume # Resume paused TTS playback IMMEDIATELY
494
+ /stop # Stop TTS completely (cannot resume)
495
+ ```
496
+
497
+ **Programmatic Usage:**
498
+
499
+ #### Basic Pause/Resume
500
+ ```python
501
+ from abstractvoice import VoiceManager
502
+ import time
503
+
504
+ vm = VoiceManager()
505
+
506
+ # Start speech
507
+ vm.speak("This is a long sentence that demonstrates immediate pause and resume functionality.")
508
+
509
+ # Pause immediately (takes effect within ~20ms)
510
+ time.sleep(1)
511
+ result = vm.pause_speaking()
512
+ if result:
513
+ print("✓ TTS paused immediately")
514
+
515
+ # Resume immediately (takes effect within ~20ms)
516
+ time.sleep(2)
517
+ result = vm.resume_speaking()
518
+ if result:
519
+ print("✓ TTS resumed immediately")
520
+ ```
521
+
522
+ #### Advanced Control with Status Checking
523
+ ```python
524
+ from abstractvoice import VoiceManager
525
+ import time
526
+
527
+ vm = VoiceManager()
528
+
529
+ # Start long speech
530
+ vm.speak("This is a very long text that will be used to demonstrate the advanced pause and resume control features.")
531
+
532
+ # Wait and pause
533
+ time.sleep(1.5)
534
+ if vm.is_speaking():
535
+ vm.pause_speaking()
536
+ print("Speech paused")
537
+
538
+ # Check pause status
539
+ if vm.is_paused():
540
+ print("Confirmed: TTS is paused")
541
+ time.sleep(2)
542
+
543
+ # Resume from exact position
544
+ vm.resume_speaking()
545
+ print("Speech resumed from exact position")
546
+
547
+ # Wait for completion
548
+ while vm.is_speaking():
549
+ time.sleep(0.1)
550
+ print("Speech completed")
551
+ ```
552
+
553
+ #### Interactive Control Example
554
+ ```python
555
+ from abstractvoice import VoiceManager
556
+ import threading
557
+ import time
558
+
559
+ vm = VoiceManager()
560
+
561
+ def control_speech():
562
+ """Interactive control in separate thread"""
563
+ time.sleep(2)
564
+ print("Pausing speech...")
565
+ vm.pause_speaking()
566
+
567
+ time.sleep(3)
568
+ print("Resuming speech...")
569
+ vm.resume_speaking()
570
+
571
+ # Start long speech
572
+ long_text = """
573
+ This is a comprehensive demonstration of AbstractVoice's immediate pause and resume functionality.
574
+ The system uses non-blocking audio streaming with callback-based control.
575
+ You can pause and resume at any time with immediate response.
576
+ The audio continues from the exact position where it was paused.
577
+ """
578
+
579
+ # Start control thread
580
+ control_thread = threading.Thread(target=control_speech, daemon=True)
581
+ control_thread.start()
582
+
583
+ # Start speech (non-blocking)
584
+ vm.speak(long_text)
585
+
586
+ # Wait for completion
587
+ while vm.is_speaking() or vm.is_paused():
588
+ time.sleep(0.1)
589
+
590
+ vm.cleanup()
591
+ ```
592
+
593
+ #### Error Handling
594
+ ```python
595
+ from abstractvoice import VoiceManager
596
+
597
+ vm = VoiceManager()
598
+
599
+ # Start speech
600
+ vm.speak("Testing pause/resume with error handling")
601
+
602
+ # Safe pause with error handling
603
+ try:
604
+ if vm.is_speaking():
605
+ success = vm.pause_speaking()
606
+ if success:
607
+ print("Successfully paused")
608
+ else:
609
+ print("No active speech to pause")
610
+
611
+ # Safe resume with error handling
612
+ if vm.is_paused():
613
+ success = vm.resume_speaking()
614
+ if success:
615
+ print("Successfully resumed")
616
+ else:
617
+ print("Was not paused or playback completed")
618
+
619
+ except Exception as e:
620
+ print(f"Error controlling TTS: {e}")
621
+ ```
622
+
623
+ **Key Features:**
624
+ - **⚡ Immediate Response**: Pause/resume takes effect within ~20ms
625
+ - **🎯 Exact Position**: Resumes from precise audio position (no repetition)
626
+ - **🖥️ No Terminal Interference**: Uses OutputStream callbacks, never blocks terminal
627
+ - **🔒 Thread-Safe**: Safe to call from any thread or callback
628
+ - **📊 Reliable Status**: `is_paused()` and `is_speaking()` always accurate
629
+ - **🔄 Seamless Streaming**: Works with ongoing text synthesis
630
+
631
+ **How it works** (a minimal sketch follows this list):
632
+ - Uses `sounddevice.OutputStream` with callback function
633
+ - Pause immediately outputs silence in next audio callback (~20ms)
634
+ - Resume immediately continues audio output from exact position
635
+ - No blocking `sd.stop()` calls that interfere with terminal I/O
636
+ - Thread-safe with proper locking mechanisms
637
+
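+ The following is a minimal sketch of that callback pattern using `sounddevice` directly. It illustrates the technique described above and is not AbstractVoice's actual implementation:
+
+ ```python
+ import threading
+
+ import numpy as np
+ import sounddevice as sd
+
+ class PausablePlayer:
+     """Play a mono float32 buffer with immediate pause/resume via the stream callback."""
+
+     def __init__(self, audio, samplerate=22050):
+         self.audio = np.asarray(audio, dtype=np.float32)
+         self.pos = 0
+         self.paused = False
+         self.lock = threading.Lock()
+         self.stream = sd.OutputStream(samplerate=samplerate, channels=1,
+                                       callback=self._callback)
+         self.stream.start()
+
+     def _callback(self, outdata, frames, time_info, status):
+         with self.lock:
+             if self.paused or self.pos >= len(self.audio):
+                 outdata.fill(0)  # emit silence: pause takes effect within one block
+                 return
+             chunk = self.audio[self.pos:self.pos + frames]
+             outdata[:len(chunk), 0] = chunk
+             outdata[len(chunk):, 0] = 0  # zero-pad the final partial block
+             self.pos += len(chunk)
+
+     def pause(self):
+         with self.lock:
+             self.paused = True   # next callback outputs silence (~20 ms away)
+
+     def resume(self):
+         with self.lock:
+             self.paused = False  # next callback continues from self.pos
+ ```
+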
638
+ ## Quick Reference: Speed & Model Control
639
+
640
+ ### Changing TTS Speed
641
+
642
+ **In CLI/REPL:**
643
+ ```bash
644
+ /speed 1.2 # 20% faster, pitch preserved
645
+ /speed 0.8 # 20% slower, pitch preserved
646
+ ```
647
+
648
+ **Programmatically:**
649
+ ```python
650
+ from abstractvoice import VoiceManager
651
+
652
+ vm = VoiceManager()
653
+
654
+ # Method 1: Set global speed
655
+ vm.set_speed(1.3) # All speech will be 30% faster
656
+ vm.speak("This will be 30% faster")
657
+
658
+ # Method 2: Per-speech speed
659
+ vm.speak("This is 50% faster", speed=1.5)
660
+ vm.speak("This is normal speed", speed=1.0)
661
+ vm.speak("This is half speed", speed=0.5)
662
+
663
+ # Get current speed
664
+ current = vm.get_speed() # Returns 1.3 from set_speed() above
665
+ ```
666
+
667
+ ### Changing TTS Model
668
+
669
+ **In CLI/REPL:**
670
+ ```bash
671
+ /tts_model vits # Best quality (needs espeak-ng)
672
+ /tts_model fast_pitch # Good quality (works everywhere)
673
+ /tts_model glow-tts # Alternative model
674
+ /tts_model tacotron2-DDC # Legacy model
675
+ ```
676
+
677
+ **Programmatically:**
678
+ ```python
679
+ from abstractvoice import VoiceManager
680
+
681
+ # Method 1: Set at initialization
682
+ vm = VoiceManager(tts_model="tts_models/en/ljspeech/glow-tts")
683
+
684
+ # Method 2: Change dynamically at runtime
685
+ vm.set_tts_model("tts_models/en/ljspeech/fast_pitch")
686
+ vm.speak("Using fast_pitch now")
687
+
688
+ vm.set_tts_model("tts_models/en/ljspeech/glow-tts")
689
+ vm.speak("Using glow-tts now")
690
+
691
+ # Available models (quality ranking):
692
+ models = [
693
+ "tts_models/en/ljspeech/vits", # BEST (requires espeak-ng)
694
+ "tts_models/en/ljspeech/fast_pitch", # Good (works everywhere)
695
+ "tts_models/en/ljspeech/glow-tts", # Alternative fallback
696
+ "tts_models/en/ljspeech/tacotron2-DDC" # Legacy
697
+ ]
698
+ ```
699
+
700
+ ### Complete Example: Experiment with Settings
701
+
702
+ ```python
703
+ from abstractvoice import VoiceManager
704
+ import time
705
+
706
+ vm = VoiceManager()
707
+
708
+ # Test different models (vits requires espeak-ng)
709
+ for model in ["vits", "fast_pitch", "glow-tts", "tacotron2-DDC"]:
710
+ full_name = f"tts_models/en/ljspeech/{model}"
711
+ vm.set_tts_model(full_name)
712
+
713
+ # Test different speeds with each model
714
+ for speed in [0.8, 1.0, 1.2]:
715
+ vm.speak(f"Testing {model} at {speed}x speed", speed=speed)
716
+ while vm.is_speaking():
717
+ time.sleep(0.1)
718
+ ```
719
+
720
+ ## Integration Guide for Third-Party Applications
721
+
722
+ AbstractVoice is designed as a lightweight, modular library for easy integration into your applications. This guide covers everything you need to know.
723
+
724
+ ### Quick Start: Basic Integration
725
+
726
+ ```python
727
+ from abstractvoice import VoiceManager
728
+
729
+ # 1. Initialize (automatic best-quality model selection)
730
+ vm = VoiceManager()
731
+
732
+ # 2. Text-to-Speech
733
+ vm.speak("Hello from my app!")
734
+
735
+ # 3. Speech-to-Text with callback
736
+ def handle_speech(text):
737
+ print(f"User said: {text}")
738
+ # Process text in your app...
739
+
740
+ vm.listen(on_transcription=handle_speech)
741
+ ```
742
+
743
+ ### Model Selection: Automatic vs Explicit
744
+
745
+ **Automatic (Recommended):**
746
+ ```python
747
+ # Automatically uses best available model
748
+ vm = VoiceManager()
749
+ # → Uses VITS if espeak-ng installed (best quality)
750
+ # → Falls back to fast_pitch if espeak-ng missing
751
+ ```
752
+
753
+ **Explicit:**
754
+ ```python
755
+ # Force a specific model (bypasses auto-detection)
756
+ vm = VoiceManager(tts_model="tts_models/en/ljspeech/fast_pitch")
757
+
758
+ # Or change dynamically at runtime
759
+ vm.set_tts_model("tts_models/en/ljspeech/vits")
760
+ ```
761
+
762
+ ### Voice Quality Levels
763
+
764
+ | Model | Quality | Speed | Requirements |
765
+ |-------|---------|-------|--------------|
766
+ | **vits** | ⭐⭐⭐⭐⭐ Excellent | Fast | espeak-ng |
767
+ | **fast_pitch** | ⭐⭐⭐ Good | Fast | None |
768
+ | **glow-tts** | ⭐⭐⭐ Good | Fast | None |
769
+ | **tacotron2-DDC** | ⭐⭐ Fair | Slow | None |
770
+
771
+ ### Customization Options
772
+
773
+ ```python
774
+ from abstractvoice import VoiceManager
775
+
776
+ vm = VoiceManager(
777
+ # TTS Configuration
778
+ tts_model="tts_models/en/ljspeech/vits", # Model to use
779
+
780
+ # STT Configuration
781
+ whisper_model="base", # tiny, base, small, medium, large
782
+
783
+ # Debugging
784
+ debug_mode=True # Enable detailed logging
785
+ )
786
+
787
+ # Runtime customization
788
+ vm.set_speed(1.2) # Adjust TTS speed (0.5-2.0)
789
+ vm.set_tts_model("...") # Change TTS model
790
+ vm.set_whisper("small") # Change STT model
791
+ vm.set_voice_mode("wait") # wait, full, or off
792
+ vm.change_vad_aggressiveness(2) # VAD sensitivity (0-3)
793
+ ```
794
+
795
+ ### Integration Patterns
796
+
797
+ #### Pattern 1: TTS Only (No Voice Input)
798
+ ```python
799
+ vm = VoiceManager()
800
+
801
+ # Speak with different speeds
802
+ vm.speak("Normal speed")
803
+ vm.speak("Fast speech", speed=1.5)
804
+ vm.speak("Slow speech", speed=0.7)
805
+
806
+ # Control playback with immediate response
807
+ if vm.is_speaking():
808
+ success = vm.pause_speaking() # Pause IMMEDIATELY (~20ms)
809
+ if success:
810
+ print("Speech paused immediately")
811
+ # or
812
+ vm.stop_speaking() # Stop completely (cannot resume)
813
+
814
+ # Resume from exact position
815
+ if vm.is_paused():
816
+ success = vm.resume_speaking() # Resume IMMEDIATELY (~20ms)
817
+ if success:
818
+ print("Speech resumed from exact position")
819
+ ```
820
+
821
+ #### Pattern 2: STT Only (No Text-to-Speech)
822
+ ```python
823
+ vm = VoiceManager()
824
+
825
+ def process_speech(text):
826
+ # Send to your backend, save to DB, etc.
827
+ your_app.process(text)
828
+
829
+ vm.listen(on_transcription=process_speech)
830
+ ```
831
+
832
+ #### Pattern 3: Full Voice Interaction
833
+ ```python
834
+ vm = VoiceManager()
835
+
836
+ def on_speech(text):
837
+ response = your_llm.generate(text)
838
+ vm.speak(response)
839
+
840
+ def on_stop():
841
+ print("User said stop")
842
+ vm.cleanup()
843
+
844
+ vm.listen(
845
+ on_transcription=on_speech,
846
+ on_stop=on_stop
847
+ )
848
+ ```
849
+
850
+ ### Error Handling
851
+
852
+ ```python
853
+ try:
854
+ vm = VoiceManager()
855
+ vm.speak("Test")
856
+ except Exception as e:
857
+ print(f"TTS Error: {e}")
858
+ # Handle missing dependencies, etc.
859
+
860
+ # Check model availability
861
+ try:
862
+ vm.set_tts_model("tts_models/en/ljspeech/vits")
863
+ print("VITS available")
864
+ except Exception:
865
+ print("VITS not available, using fallback")
866
+ vm.set_tts_model("tts_models/en/ljspeech/fast_pitch")
867
+ ```
868
+
869
+ ### Threading and Async Support
870
+
871
+ AbstractVoice handles threading internally for TTS and STT:
872
+
873
+ ```python
874
+ # TTS is non-blocking
875
+ vm.speak("Long text...") # Returns immediately
876
+ # Your code continues while speech plays
877
+
878
+ # Check status
879
+ if vm.is_speaking():
880
+ print("Still speaking...")
881
+
882
+ # Wait for completion
883
+ while vm.is_speaking():
884
+ time.sleep(0.1)
885
+
886
+ # STT runs in background thread
887
+ vm.listen(on_transcription=callback) # Returns immediately
888
+ # Callbacks fire on background thread
889
+ ```
890
+
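+ If your application needs transcriptions on its main thread (common with GUI frameworks), a standard queue hand-off works. This is an integration sketch, not part of AbstractVoice's API:
+
+ ```python
+ import queue
+
+ from abstractvoice import VoiceManager
+
+ vm = VoiceManager()
+ inbox = queue.Queue()
+
+ # The background STT thread enqueues; the main thread dequeues at its own pace.
+ vm.listen(on_transcription=inbox.put)
+
+ while vm.is_listening():
+     try:
+         text = inbox.get(timeout=0.1)
+     except queue.Empty:
+         continue
+     print(f"Main thread got: {text}")
+ ```
+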
891
+ ### Cleanup and Resource Management
892
+
893
+ ```python
894
+ # Always cleanup when done
895
+ vm.cleanup()
896
+
897
+ # Or use context manager pattern
898
+ from contextlib import contextmanager
899
+
900
+ @contextmanager
901
+ def voice_manager():
902
+ vm = VoiceManager()
903
+ try:
904
+ yield vm
905
+ finally:
906
+ vm.cleanup()
907
+
908
+ # Usage
909
+ with voice_manager() as vm:
910
+ vm.speak("Hello")
911
+ ```
912
+
913
+ ### Configuration for Different Environments
914
+
915
+ **Development (fast iteration):**
916
+ ```python
917
+ vm = VoiceManager(
918
+ tts_model="tts_models/en/ljspeech/fast_pitch", # Fast
919
+ whisper_model="tiny", # Fast STT
920
+ debug_mode=True
921
+ )
922
+ ```
923
+
924
+ **Production (best quality):**
925
+ ```python
926
+ vm = VoiceManager(
927
+ tts_model="tts_models/en/ljspeech/vits", # Best quality
928
+ whisper_model="base", # Good accuracy
929
+ debug_mode=False
930
+ )
931
+ ```
932
+
933
+ **Embedded/Resource-Constrained:**
934
+ ```python
935
+ vm = VoiceManager(
936
+ tts_model="tts_models/en/ljspeech/fast_pitch", # Lower memory
937
+ whisper_model="tiny", # Smallest model
938
+ debug_mode=False
939
+ )
940
+ ```
941
+
942
+ ## Integration with Text Generation Systems
943
+
944
+ AbstractVoice is designed to be a lightweight, modular library that you can easily integrate into your own applications. Here are complete examples for common use cases:
945
+
946
+ ### Example 1: Voice-Enabled Chatbot with Ollama
947
+
948
+ ```python
949
+ from abstractvoice import VoiceManager
950
+ import requests
951
+ import time
952
+
953
+ # Initialize voice manager
954
+ voice_manager = VoiceManager()
955
+
956
+ # Function to call Ollama API
957
+ def generate_text(prompt):
958
+ response = requests.post("http://localhost:11434/api/chat", json={
959
+ "model": "granite3.3:2b",
960
+ "messages": [{"role": "user", "content": prompt}],
961
+ "stream": False
962
+ })
963
+ return response.json()["message"]["content"]
964
+
965
+ # Callback for speech recognition
966
+ def on_transcription(text):
967
+ if text.lower() == "stop":
968
+ return
969
+
970
+ print(f"User: {text}")
971
+
972
+ # Generate response
973
+ response = generate_text(text)
974
+ print(f"AI: {response}")
975
+
976
+ # Speak response
977
+ voice_manager.speak(response)
978
+
979
+ # Start listening
980
+ voice_manager.listen(on_transcription)
981
+
982
+ # Keep running until interrupted
983
+ try:
984
+ while voice_manager.is_listening():
985
+ time.sleep(0.1)
986
+ except KeyboardInterrupt:
987
+ voice_manager.cleanup()
988
+ ```
989
+
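+ A variant that streams tokens from Ollama and speaks sentence-by-sentence can reduce perceived latency. The sentence splitting below is deliberately naive and for illustration only:
+
+ ```python
+ import json
+
+ import requests
+
+ def generate_text_streaming(prompt, on_sentence):
+     response = requests.post("http://localhost:11434/api/chat", json={
+         "model": "granite3.3:2b",
+         "messages": [{"role": "user", "content": prompt}],
+         "stream": True,
+     }, stream=True)
+
+     buffer = ""
+     for line in response.iter_lines():
+         if not line:
+             continue
+         chunk = json.loads(line)
+         buffer += chunk.get("message", {}).get("content", "")
+         # Flush on sentence-ending punctuation (naive segmentation).
+         if buffer.rstrip().endswith((".", "!", "?")):
+             on_sentence(buffer.strip())
+             buffer = ""
+     if buffer.strip():
+         on_sentence(buffer.strip())
+
+ # Usage: speak each sentence as it completes.
+ # generate_text_streaming("Tell me a story", voice_manager.speak)
+ ```
+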
990
+ ### Example 2: Voice-Enabled Assistant with OpenAI
991
+
992
+ ```python
993
+ from abstractvoice import VoiceManager
994
+ from openai import OpenAI
995
+ import time
996
+
997
+ # Initialize
998
+ voice_manager = VoiceManager()
999
+ client = OpenAI(api_key="your-api-key")  # requires openai>=1.0
1000
+
1001
+ def on_transcription(text):
1002
+ print(f"User: {text}")
1003
+
1004
+ # Get response from OpenAI
1005
+ response = client.chat.completions.create(
1006
+ model="gpt-4",
1007
+ messages=[{"role": "user", "content": text}]
1008
+ )
1009
+
1010
+ ai_response = response.choices[0].message.content
1011
+ print(f"AI: {ai_response}")
1012
+
1013
+ # Speak the response
1014
+ voice_manager.speak(ai_response)
1015
+
1016
+ # Start voice interaction
1017
+ voice_manager.listen(on_transcription)
1018
+
1019
+ # Keep running
1020
+ try:
1021
+ while voice_manager.is_listening():
1022
+ time.sleep(0.1)
1023
+ except KeyboardInterrupt:
1024
+ voice_manager.cleanup()
1025
+ ```
1026
+
1027
+ ### Example 3: Text-to-Speech Only (No Voice Input)
1028
+
1029
+ ```python
1030
+ from abstractvoice import VoiceManager
1031
+ import time
1032
+
1033
+ # Initialize voice manager
1034
+ voice_manager = VoiceManager()
1035
+
1036
+ # Simple text-to-speech
1037
+ voice_manager.speak("Hello! This is a test of the text to speech system.")
1038
+
1039
+ # Wait for speech to finish
1040
+ while voice_manager.is_speaking():
1041
+ time.sleep(0.1)
1042
+
1043
+ # Adjust speed
1044
+ voice_manager.set_speed(1.5)
1045
+ voice_manager.speak("This speech is 50% faster.")
1046
+
1047
+ while voice_manager.is_speaking():
1048
+ time.sleep(0.1)
1049
+
1050
+ # Cleanup
1051
+ voice_manager.cleanup()
1052
+ ```
1053
+
1054
+ ### Example 4: Speech-to-Text Only (No TTS)
1055
+
1056
+ ```python
1057
+ from abstractvoice import VoiceManager
1058
+ import time
1059
+
1060
+ voice_manager = VoiceManager()
1061
+
1062
+ def on_transcription(text):
1063
+ print(f"Transcribed: {text}")
1064
+ # Do something with the transcribed text
1065
+ # e.g., save to file, send to API, etc.
1066
+
1067
+ # Start listening
1068
+ voice_manager.listen(on_transcription)
1069
+
1070
+ # Keep running
1071
+ try:
1072
+ while voice_manager.is_listening():
1073
+ time.sleep(0.1)
1074
+ except KeyboardInterrupt:
1075
+ voice_manager.cleanup()
1076
+ ```
1077
+
1078
+ ### Key Integration Points
1079
+
1080
+ **VoiceManager Configuration:**
1081
+ ```python
1082
+ # Full configuration example
1083
+ voice_manager = VoiceManager(
1084
+ tts_model="tts_models/en/ljspeech/fast_pitch", # Default (no external deps)
1085
+ whisper_model="base", # Whisper STT model (tiny, base, small, medium, large)
1086
+ debug_mode=True # Enable debug logging
1087
+ )
1088
+
1089
+ # Alternative TTS models (all pure Python, cross-platform):
1090
+ # - "tts_models/en/ljspeech/fast_pitch" - Default (fast, good quality)
1091
+ # - "tts_models/en/ljspeech/glow-tts" - Alternative (similar quality)
1092
+ # - "tts_models/en/ljspeech/tacotron2-DDC" - Legacy (older, slower)
1093
+
1094
+ # Set voice mode (full, wait, off)
1095
+ voice_manager.set_voice_mode("wait") # Recommended to avoid self-interruption
1096
+
1097
+ # Adjust settings (speed now preserves pitch!)
1098
+ voice_manager.set_speed(1.2) # TTS speed (default is 1.0, range 0.5-2.0)
1099
+ voice_manager.change_vad_aggressiveness(2) # VAD sensitivity (0-3)
1100
+ ```
1101
+
1102
+ **Callback Functions:**
1103
+ ```python
1104
+ def on_transcription(text):
1105
+ """Called when speech is transcribed"""
1106
+ print(f"User said: {text}")
1107
+ # Your custom logic here
1108
+
1109
+ def on_stop():
1110
+ """Called when user says 'stop'"""
1111
+ print("Stopping voice mode")
1112
+ # Your cleanup logic here
1113
+
1114
+ voice_manager.listen(
1115
+ on_transcription=on_transcription,
1116
+ on_stop=on_stop
1117
+ )
1118
+ ```
1119
+
1120
+ ## Perspectives
1121
+
1122
+ This is an early-stage project that I designed with examples for Ollama; I will adapt the examples and AbstractVoice to work with any LLM provider (Anthropic, OpenAI, etc.).
1123
+
1124
+ The next iteration will leverage [AbstractCore](https://www.abstractcore.ai) directly to handle everything related to LLMs: providers, models, and configurations.
1125
+
1126
+ ## License and Acknowledgments
1127
+
1128
+ AbstractVoice is licensed under the [MIT License](LICENSE).
1129
+
1130
+ This project depends on several open-source libraries and models, each with its own license. Please see [ACKNOWLEDGMENTS.md](ACKNOWLEDGMENTS.md) for a detailed list of dependencies and their respective licenses.
1131
+
1132
+ Some dependencies, particularly certain TTS models, may have non-commercial use restrictions. If you plan to use AbstractVoice in a commercial application, please ensure you are using models that permit commercial use or obtain appropriate licenses.