PyPI - abstractvoice - Versions diffs - 0.5.1__tar.gz → 0.6.1__tar.gz - Mend

abstractvoice 0.5.1tar.gz → 0.6.1tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (94)

abstractvoice-0.6.1/PKG-INFO ADDED

@@ -0,0 +1,213 @@
+Metadata-Version: 2.4
+Name: abstractvoice
+Version: 0.6.1
+Summary: A modular Python library for voice interactions with AI systems
+Author-email: Laurent-Philippe Albou <contact@abstractcore.ai>
+License-Expression: MIT
+Project-URL: Repository, https://github.com/lpalbou/abstractvoice
+Project-URL: Documentation, https://github.com/lpalbou/abstractvoice#readme
+Classifier: Development Status :: 3 - Alpha
+Classifier: Intended Audience :: Developers
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.8
+Classifier: Programming Language :: Python :: 3.9
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Requires-Python: >=3.8
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: numpy>=1.24.0
+Requires-Dist: requests>=2.31.0
+Requires-Dist: appdirs>=1.4.0
+Requires-Dist: piper-tts>=1.2.0
+Requires-Dist: huggingface_hub>=0.20.0
+Requires-Dist: faster-whisper>=0.10.0
+Requires-Dist: sounddevice>=0.4.6
+Requires-Dist: soundfile>=0.12.1
+Requires-Dist: webrtcvad>=2.0.10
+Provides-Extra: voice
+Requires-Dist: sounddevice>=0.4.6; extra == "voice"
+Requires-Dist: webrtcvad>=2.0.10; extra == "voice"
+Requires-Dist: soundfile>=0.12.1; extra == "voice"
+Provides-Extra: audio-fx
+Requires-Dist: librosa>=0.10.0; extra == "audio-fx"
+Provides-Extra: cloning
+Requires-Dist: f5-tts>=1.1.0; extra == "cloning"
+Provides-Extra: chroma
+Requires-Dist: torch>=2.0.0; extra == "chroma"
+Requires-Dist: torchaudio>=2.0.0; extra == "chroma"
+Requires-Dist: torchvision>=0.15.0; extra == "chroma"
+Requires-Dist: transformers>=5.0.0rc0; extra == "chroma"
+Requires-Dist: accelerate>=1.0.0; extra == "chroma"
+Requires-Dist: av>=14.0.0; extra == "chroma"
+Requires-Dist: librosa>=0.11.0; extra == "chroma"
+Requires-Dist: audioread>=3.0.0; extra == "chroma"
+Requires-Dist: pillow>=11.0.0; extra == "chroma"
+Requires-Dist: safetensors>=0.5.0; extra == "chroma"
+Provides-Extra: aec
+Requires-Dist: aec-audio-processing>=1.0.1; extra == "aec"
+Provides-Extra: stt
+Requires-Dist: openai-whisper>=20230314; extra == "stt"
+Requires-Dist: tiktoken>=0.6.0; extra == "stt"
+Provides-Extra: web
+Requires-Dist: flask>=2.0.0; extra == "web"
+Provides-Extra: all
+Requires-Dist: piper-tts>=1.2.0; extra == "all"
+Requires-Dist: sounddevice>=0.4.6; extra == "all"
+Requires-Dist: webrtcvad>=2.0.10; extra == "all"
+Requires-Dist: openai-whisper>=20230314; extra == "all"
+Requires-Dist: librosa>=0.10.0; extra == "all"
+Requires-Dist: soundfile>=0.12.1; extra == "all"
+Requires-Dist: flask>=2.0.0; extra == "all"
+Requires-Dist: tiktoken>=0.6.0; extra == "all"
+Requires-Dist: f5-tts>=1.1.0; extra == "all"
+Requires-Dist: aec-audio-processing>=1.0.1; extra == "all"
+Provides-Extra: dev
+Requires-Dist: pytest>=7.0.0; extra == "dev"
+Requires-Dist: black>=22.0.0; extra == "dev"
+Requires-Dist: flake8>=5.0.0; extra == "dev"
+Provides-Extra: voice-full
+Requires-Dist: sounddevice>=0.4.6; extra == "voice-full"
+Requires-Dist: webrtcvad>=2.0.10; extra == "voice-full"
+Requires-Dist: openai-whisper>=20230314; extra == "voice-full"
+Requires-Dist: librosa>=0.10.0; extra == "voice-full"
+Requires-Dist: soundfile>=0.12.1; extra == "voice-full"
+Requires-Dist: tiktoken>=0.6.0; extra == "voice-full"
+Provides-Extra: core-stt
+Requires-Dist: openai-whisper>=20230314; extra == "core-stt"
+Requires-Dist: tiktoken>=0.6.0; extra == "core-stt"
+Provides-Extra: audio-only
+Requires-Dist: sounddevice>=0.4.6; extra == "audio-only"
+Requires-Dist: webrtcvad>=2.0.10; extra == "audio-only"
+Requires-Dist: soundfile>=0.12.1; extra == "audio-only"
+Dynamic: license-file
+# AbstractVoice
+A modular Python library for **voice I/O** around AI applications.
+- **TTS (default)**: Piper (cross-platform, no system deps)
+- **STT (default)**: faster-whisper
+- **Local assistant**: `listen()` + `speak()` with playback/listening control
+- **Headless/server**: `speak_to_bytes()` / `speak_to_file()` and `transcribe_*`
+Status: **alpha** (`0.6.1`). The supported integrator surface is documented in `docs/api.md`.
+Next: `docs/getting-started.md` (recommended setup + first smoke tests).
+> AbstractVoice will ultimately be integrated as the voice modality of AbstractFramework.
+> An OpenAI-compatible voice endpoint is an optional demo/integration layer (see backlog).
+---
+## Install
+```bash
+pip install abstractvoice
+```
+Optional extras (feature flags):
+```bash
+pip install "abstractvoice[all]"
+```
+Notes:
+- `abstractvoice[all]` enables most optional features (incl. cloning + AEC + audio-fx), but **does not** include the GPU-heavy Chroma runtime.
+- For the full list of extras (and platform troubleshooting), see `docs/installation.md`.
+### Explicit model downloads (recommended; never implicit in the REPL)
+Some features rely on large model weights/artifacts. AbstractVoice will **not**
+download these implicitly inside the REPL (offline-first).
+After installing, prefetch explicitly (cross-platform):
+```bash
+abstractvoice-prefetch --stt small
+abstractvoice-prefetch --piper en
+abstractvoice-prefetch --openf5
+abstractvoice-prefetch --chroma
+```
+Or equivalently:
+```bash
+python -m abstractvoice download --stt small
+python -m abstractvoice download --piper en
+python -m abstractvoice download --openf5
+python -m abstractvoice download --chroma
+```
+Notes:
+- `--piper <lang>` downloads the Piper ONNX voice for that language into `~/.piper/models`.
+- `--openf5` is ~5.4GB. `--chroma` is very large (GPU-heavy).
+---
+## Quick smoke tests
+### REPL (fastest end-to-end)
+```bash
+abstractvoice --verbose
+# or (from a source checkout):
+python -m abstractvoice cli --verbose
+```
+Notes:
+- Mic voice input is **off by default** for fast startup. Enable with `--voice-mode stop` (or in-session: `/voice stop`).
+- The REPL is **offline-first**: no implicit model downloads. Use the explicit download commands above.
+See `docs/repl_guide.md`.
+### Minimal Python
+```python
+from abstractvoice import VoiceManager
+vm = VoiceManager()
+vm.speak("Hello! This is AbstractVoice.")
+```
+---
+## Public API (stable surface)
+See `docs/api.md` for the supported integrator contract.
+At a glance:
+- **TTS**: `speak()`, `stop_speaking()`, `pause_speaking()`, `resume_speaking()`, `speak_to_bytes()`, `speak_to_file()`
+- **STT**: `transcribe_file()`, `transcribe_from_bytes()`
+- **Mic**: `listen()`, `stop_listening()`, `pause_listening()`, `resume_listening()`
+---
+## Documentation (minimal set)
+- **Docs index**: `docs/README.md`
+- **Getting started**: `docs/getting-started.md`
+- **FAQ**: `docs/faq.md`
+- **Orientation**: `docs/overview.md`
+- **Acronyms**: `docs/acronyms.md`
+- **Public API**: `docs/api.md`
+- **REPL guide**: `docs/repl_guide.md`
+- **Install troubleshooting**: `docs/installation.md`
+- **Multilingual support**: `docs/multilingual.md`
+- **Architecture (internal)**: `docs/architecture.md` + `docs/adr/`
+- **Model management (Piper-first)**: `docs/model-management.md`
+- **Licensing notes**: `docs/voices-and-licenses.md`
+---
+## Project
+- **Changelog**: `CHANGELOG.md`
+- **Contributing**: `CONTRIBUTING.md`
+- **Security**: `SECURITY.md`
+- **Acknowledgments**: `ACKNOWLEDGMENTS.md`
+## License
+MIT. See `LICENSE`.

abstractvoice-0.6.1/README.md ADDED

@@ -0,0 +1,128 @@
+# AbstractVoice
+A modular Python library for **voice I/O** around AI applications.
+- **TTS (default)**: Piper (cross-platform, no system deps)
+- **STT (default)**: faster-whisper
+- **Local assistant**: `listen()` + `speak()` with playback/listening control
+- **Headless/server**: `speak_to_bytes()` / `speak_to_file()` and `transcribe_*`
+Status: **alpha** (`0.6.1`). The supported integrator surface is documented in `docs/api.md`.
+Next: `docs/getting-started.md` (recommended setup + first smoke tests).
+> AbstractVoice will ultimately be integrated as the voice modality of AbstractFramework.
+> An OpenAI-compatible voice endpoint is an optional demo/integration layer (see backlog).
+---
+## Install
+```bash
+pip install abstractvoice
+```
+Optional extras (feature flags):
+```bash
+pip install "abstractvoice[all]"
+```
+Notes:
+- `abstractvoice[all]` enables most optional features (incl. cloning + AEC + audio-fx), but **does not** include the GPU-heavy Chroma runtime.
+- For the full list of extras (and platform troubleshooting), see `docs/installation.md`.
+### Explicit model downloads (recommended; never implicit in the REPL)
+Some features rely on large model weights/artifacts. AbstractVoice will **not**
+download these implicitly inside the REPL (offline-first).
+After installing, prefetch explicitly (cross-platform):
+```bash
+abstractvoice-prefetch --stt small
+abstractvoice-prefetch --piper en
+abstractvoice-prefetch --openf5
+abstractvoice-prefetch --chroma
+```
+Or equivalently:
+```bash
+python -m abstractvoice download --stt small
+python -m abstractvoice download --piper en
+python -m abstractvoice download --openf5
+python -m abstractvoice download --chroma
+```
+Notes:
+- `--piper <lang>` downloads the Piper ONNX voice for that language into `~/.piper/models`.
+- `--openf5` is ~5.4GB. `--chroma` is very large (GPU-heavy).
+---
+## Quick smoke tests
+### REPL (fastest end-to-end)
+```bash
+abstractvoice --verbose
+# or (from a source checkout):
+python -m abstractvoice cli --verbose
+```
+Notes:
+- Mic voice input is **off by default** for fast startup. Enable with `--voice-mode stop` (or in-session: `/voice stop`).
+- The REPL is **offline-first**: no implicit model downloads. Use the explicit download commands above.
+See `docs/repl_guide.md`.
+### Minimal Python
+```python
+from abstractvoice import VoiceManager
+vm = VoiceManager()
+vm.speak("Hello! This is AbstractVoice.")
+```
+---
+## Public API (stable surface)
+See `docs/api.md` for the supported integrator contract.
+At a glance:
+- **TTS**: `speak()`, `stop_speaking()`, `pause_speaking()`, `resume_speaking()`, `speak_to_bytes()`, `speak_to_file()`
+- **STT**: `transcribe_file()`, `transcribe_from_bytes()`
+- **Mic**: `listen()`, `stop_listening()`, `pause_listening()`, `resume_listening()`
+---
+## Documentation (minimal set)
+- **Docs index**: `docs/README.md`
+- **Getting started**: `docs/getting-started.md`
+- **FAQ**: `docs/faq.md`
+- **Orientation**: `docs/overview.md`
+- **Acronyms**: `docs/acronyms.md`
+- **Public API**: `docs/api.md`
+- **REPL guide**: `docs/repl_guide.md`
+- **Install troubleshooting**: `docs/installation.md`
+- **Multilingual support**: `docs/multilingual.md`
+- **Architecture (internal)**: `docs/architecture.md` + `docs/adr/`
+- **Model management (Piper-first)**: `docs/model-management.md`
+- **Licensing notes**: `docs/voices-and-licenses.md`
+---
+## Project
+- **Changelog**: `CHANGELOG.md`
+- **Contributing**: `CONTRIBUTING.md`
+- **Security**: `SECURITY.md`
+- **Acknowledgments**: `ACKNOWLEDGMENTS.md`
+## License
+MIT. See `LICENSE`.

{abstractvoice-0.5.1 → abstractvoice-0.6.1}/abstractvoice/__init__.py RENAMED

@@ -29,8 +29,5 @@ warnings.filterwarnings(
 # Import the main class for public API
 from .voice_manager import VoiceManager
-# Import simple APIs for third-party applications
-from .simple_model_manager import list_models, download_model, get_status, is_ready
-__version__ = "0.5.1"
-__all__ = ['VoiceManager', 'list_models', 'download_model', 'get_status', 'is_ready']
+__version__ = "0.6.1"
+__all__ = ["VoiceManager"]
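
Note on the hunk above: 0.6.1 drops `list_models`, `download_model`, `get_status`, and `is_ready` from the package's top-level exports, so `from abstractvoice import list_models` no longer works. Code that must run against both versions could probe for the names instead of importing them unconditionally. The following is a minimal sketch, not something shipped by the package: `import_first_available` and `resolve_list_models` are hypothetical helpers, and whether `abstractvoice.simple_model_manager` still exists in 0.6.1 is an assumption this diff excerpt does not confirm.

```python
import importlib
from typing import Any, Iterable, Optional, Tuple


def import_first_available(candidates: Iterable[Tuple[str, str]]) -> Optional[Any]:
    """Return the first (module, attribute) pair that resolves, else None."""
    for module_name, attr_name in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # Module missing (or one of its own imports is missing): try the next one.
            continue
        if hasattr(module, attr_name):
            return getattr(module, attr_name)
    return None


def resolve_list_models() -> Optional[Any]:
    """Look up `list_models` where 0.5.1 exported it, then in its old home module.

    The second candidate is an assumption: the diff only shows that the
    top-level re-export was removed, not what happened to the module itself.
    """
    return import_first_available(
        [
            ("abstractvoice", "list_models"),
            ("abstractvoice.simple_model_manager", "list_models"),
        ]
    )
```

A caller would check the result for `None` and fall back to the explicit `download` command when the helper functions are unavailable.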

{abstractvoice-0.5.1 → abstractvoice-0.6.1}/abstractvoice/__main__.py RENAMED

@@ -16,8 +16,9 @@ def print_examples():
     print("  web       - Web API example")
     print("  simple    - Simple usage example")
     print("  check-deps - Check dependency compatibility")
+    print("  download  - Explicitly prefetch model artifacts")
     print("\nUsage: python -m abstractvoice <example> [--language <lang>] [args...]")
-    print("\nSupported languages: en, fr, es, de, it, ru, multilingual")
+    print("\nSupported languages: en, fr, de, es, ru, zh")
     print("\nExamples:")
     print("  python -m abstractvoice cli --language fr    # French CLI")
     print("  python -m abstractvoice simple --language ru # Russian simple example")
@@ -99,7 +100,7 @@ def main():
     parser = argparse.ArgumentParser(description="AbstractVoice examples")
     parser.add_argument("example", nargs="?", help="Example to run (cli, web, simple, check-deps)")
     parser.add_argument("--language", "--lang", default="en",
-                      choices=["en", "fr", "es", "de", "it", "ru", "multilingual"],
+                      choices=["en", "fr", "de", "es", "ru", "zh"],
                       help="Voice language for examples")
     # Parse just the first argument and language
@@ -119,6 +120,84 @@ def main():
             print("This might indicate a dependency issue.")
         return
+    if args.example == "download":
+        dl = argparse.ArgumentParser(description="AbstractVoice explicit downloads")
+        dl.add_argument("--stt", dest="stt_model", default=None, help="Prefetch faster-whisper model (e.g. small)")
+        dl.add_argument(
+            "--openf5",
+            action="store_true",
+            help="Prefetch OpenF5 artifacts for cloning (~5.4GB, requires abstractvoice[cloning])",
+        )
+        dl.add_argument(
+            "--chroma",
+            action="store_true",
+            help="Prefetch Chroma-4B artifacts (~14GB+, requires HF access; install abstractvoice[chroma] to run inference)",
+        )
+        dl.add_argument(
+            "--piper",
+            dest="piper_language",
+            default=None,
+            help="Prefetch Piper voice model for a language (e.g. en/fr/de).",
+        )
+        dl_args = dl.parse_args(remaining)
+        if not dl_args.stt_model and not dl_args.openf5 and not dl_args.chroma and not dl_args.piper_language:
+            print("Nothing to download. Examples:")
+            print("  python -m abstractvoice download --stt small")
+            print("  python -m abstractvoice download --openf5")
+            print("  python -m abstractvoice download --chroma")
+            print("  python -m abstractvoice download --piper en")
+            return
+        if dl_args.stt_model:
+            try:
+                from abstractvoice.adapters.stt_faster_whisper import FasterWhisperAdapter
+                model = str(dl_args.stt_model).strip()
+                print(f"Downloading STT model (faster-whisper): {model}")
+                stt = FasterWhisperAdapter(model_size=model, device="cpu", compute_type="int8", allow_downloads=True)
+                if not stt.is_available():
+                    raise RuntimeError("Model download/load failed.")
+                print("✅ STT model ready.")
+            except Exception as e:
+                print(f"❌ STT download failed: {e}")
+        if dl_args.openf5:
+            try:
+                from abstractvoice.cloning.engine_f5 import F5TTSVoiceCloningEngine
+                print("Downloading OpenF5 artifacts (cloning)…")
+                engine = F5TTSVoiceCloningEngine(debug=True)
+                engine.ensure_openf5_artifacts_downloaded()
+                print("✅ OpenF5 artifacts ready.")
+            except Exception as e:
+                print(f"❌ OpenF5 download failed: {e}")
+        if dl_args.chroma:
+            try:
+                from abstractvoice.cloning.engine_chroma import ChromaVoiceCloningEngine
+                print("Downloading Chroma artifacts (cloning)…")
+                engine = ChromaVoiceCloningEngine(debug=True)
+                engine.ensure_chroma_artifacts_downloaded()
+                print("✅ Chroma artifacts ready.")
+            except Exception as e:
+                print(f"❌ Chroma download failed: {e}")
+        if dl_args.piper_language:
+            try:
+                from abstractvoice.adapters.tts_piper import PiperTTSAdapter
+                lang = str(dl_args.piper_language).strip().lower()
+                print(f"Downloading Piper voice model: {lang}")
+                piper = PiperTTSAdapter(language=lang, allow_downloads=True, auto_load=False)
+                if not piper.ensure_model_downloaded(lang):
+                    raise RuntimeError("Piper model download failed.")
+                print("✅ Piper model ready.")
+            except Exception as e:
+                print(f"❌ Piper download failed: {e}")
+        return
     # Set remaining args as sys.argv for the examples, including language
     if args.language != "en":
         remaining = ["--language", args.language] + remaining
@@ -138,4 +217,4 @@ def main():
 if __name__ == "__main__":
-    main()
+    main()
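
Note on the `download` hunk above: the new subcommand parses four flags and then handles them in a fixed order (STT, OpenF5, Chroma, Piper), whatever order they appear on the command line. The sketch below reproduces only the argument handling so it can be exercised without any model code. The flag names, `dest` values, and the `strip()`/`lower()` normalisation are taken from the diff; `build_download_parser` and `requested_downloads` are hypothetical names, and returning a list of labels instead of triggering downloads is a simplification.

```python
import argparse
from typing import List


def build_download_parser() -> argparse.ArgumentParser:
    """Mirror the four flags the diff adds for `python -m abstractvoice download`."""
    dl = argparse.ArgumentParser(description="AbstractVoice explicit downloads")
    dl.add_argument("--stt", dest="stt_model", default=None)
    dl.add_argument("--openf5", action="store_true")
    dl.add_argument("--chroma", action="store_true")
    dl.add_argument("--piper", dest="piper_language", default=None)
    return dl


def requested_downloads(argv: List[str]) -> List[str]:
    """Return labels for the requested artifacts, in the order the diff handles them."""
    args = build_download_parser().parse_args(argv)
    requested: List[str] = []
    if args.stt_model:
        requested.append(f"stt:{str(args.stt_model).strip()}")
    if args.openf5:
        requested.append("openf5")
    if args.chroma:
        requested.append("chroma")
    if args.piper_language:
        # The diff lower-cases the language code before using it.
        requested.append(f"piper:{str(args.piper_language).strip().lower()}")
    return requested
```

An empty result corresponds to the "Nothing to download" branch in the diff, which prints usage examples and returns.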

abstractvoice-0.6.1/abstractvoice/adapters/__init__.py ADDED

@@ -0,0 +1,12 @@
+"""Adapter interfaces for TTS and STT engines.
+This module defines base interfaces for pluggable TTS and STT engines,
+enabling easy integration of new speech synthesis and recognition backends
+while maintaining API compatibility.
+"""
+from .base import TTSAdapter, STTAdapter
+from .tts_piper import PiperTTSAdapter
+from .stt_faster_whisper import FasterWhisperAdapter
+__all__ = ['TTSAdapter', 'STTAdapter', 'PiperTTSAdapter', 'FasterWhisperAdapter']

abstractvoice-0.6.1/abstractvoice/adapters/base.py ADDED

@@ -0,0 +1,207 @@
+"""Base adapter interfaces for TTS and STT engines.
+These abstract base classes define the contract that all TTS and STT adapters
+must implement, ensuring consistent API across different backends.
+"""
+from abc import ABC, abstractmethod
+from typing import Optional, Dict, Any, Union
+import numpy as np
+import io
+class TTSAdapter(ABC):
+    """Abstract base class for Text-to-Speech adapters.
+    All TTS engines must implement this interface to be compatible with
+    the VoiceManager. This ensures we can swap engines without breaking
+    existing code.
+    """
+    @abstractmethod
+    def synthesize(self, text: str) -> np.ndarray:
+        """Convert text to audio array for immediate playback.
+        Args:
+            text: The text to synthesize
+        Returns:
+            Audio data as numpy array (shape: [samples,], dtype: float32, range: -1.0 to 1.0)
+        """
+        pass
+    @abstractmethod
+    def synthesize_to_bytes(self, text: str, format: str = 'wav') -> bytes:
+        """Convert text to audio bytes for network transmission or file storage.
+        This method is essential for client-server architectures where the backend
+        generates speech and sends it to clients for playback.
+        Args:
+            text: The text to synthesize
+            format: Audio format ('wav', 'mp3', 'ogg'). Default: 'wav'
+        Returns:
+            Audio data as bytes in the specified format
+        """
+        pass
+    @abstractmethod
+    def synthesize_to_file(self, text: str, output_path: str, format: Optional[str] = None) -> str:
+        """Convert text to audio file.
+        Args:
+            text: The text to synthesize
+            output_path: Path to save the audio file
+            format: Audio format (optional, inferred from file extension if not provided)
+        Returns:
+            Path to the saved audio file
+        """
+        pass
+    @abstractmethod
+    def set_language(self, language: str) -> bool:
+        """Switch the TTS language.
+        Args:
+            language: ISO 639-1 language code (e.g., 'en', 'fr', 'de')
+        Returns:
+            True if language switch successful, False otherwise
+        """
+        pass
+    @abstractmethod
+    def get_supported_languages(self) -> list[str]:
+        """Get list of supported language codes.
+        Returns:
+            List of ISO 639-1 language codes
+        """
+        pass
+    @abstractmethod
+    def get_sample_rate(self) -> int:
+        """Get the sample rate of the synthesized audio.
+        Returns:
+            Sample rate in Hz (e.g., 22050, 16000)
+        """
+        pass
+    @abstractmethod
+    def is_available(self) -> bool:
+        """Check if this TTS engine is available and functional.
+        Returns:
+            True if the engine can be used, False if dependencies missing or initialization failed
+        """
+        pass
+    def get_info(self) -> Dict[str, Any]:
+        """Get metadata about this TTS engine.
+        Returns:
+            Dictionary with engine information (name, version, languages, etc.)
+        """
+        return {
+            'name': self.__class__.__name__,
+            'languages': self.get_supported_languages(),
+            'sample_rate': self.get_sample_rate(),
+            'available': self.is_available()
+        }
+class STTAdapter(ABC):
+    """Abstract base class for Speech-to-Text adapters.
+    All STT engines must implement this interface to be compatible with
+    the VoiceManager.
+    """
+    @abstractmethod
+    def transcribe(self, audio_path: str, language: Optional[str] = None) -> str:
+        """Transcribe audio file to text.
+        Args:
+            audio_path: Path to audio file
+            language: Target language (optional, auto-detect if not provided)
+        Returns:
+            Transcribed text
+        """
+        pass
+    @abstractmethod
+    def transcribe_from_bytes(self, audio_bytes: bytes, language: Optional[str] = None) -> str:
+        """Transcribe audio from bytes (network use case).
+        This method is essential for client-server architectures where clients
+        record audio and send it to the backend for transcription.
+        Args:
+            audio_bytes: Audio data as bytes
+            language: Target language (optional, auto-detect if not provided)
+        Returns:
+            Transcribed text
+        """
+        pass
+    @abstractmethod
+    def transcribe_from_array(self, audio_array: np.ndarray, sample_rate: int,
+                             language: Optional[str] = None) -> str:
+        """Transcribe audio from numpy array.
+        Args:
+            audio_array: Audio data as numpy array
+            sample_rate: Sample rate of the audio in Hz
+            language: Target language (optional, auto-detect if not provided)
+        Returns:
+            Transcribed text
+        """
+        pass
+    @abstractmethod
+    def set_language(self, language: str) -> bool:
+        """Set the default language for transcription.
+        Args:
+            language: ISO 639-1 language code
+        Returns:
+            True if successful, False otherwise
+        """
+        pass
+    @abstractmethod
+    def get_supported_languages(self) -> list[str]:
+        """Get list of supported language codes.
+        Returns:
+            List of ISO 639-1 language codes
+        """
+        pass
+    @abstractmethod
+    def is_available(self) -> bool:
+        """Check if this STT engine is available and functional.
+        Returns:
+            True if the engine can be used, False otherwise
+        """
+        pass
+    def get_info(self) -> Dict[str, Any]:
+        """Get metadata about this STT engine.
+        Returns:
+            Dictionary with engine information
+        """
+        return {
+            'name': self.__class__.__name__,
+            'languages': self.get_supported_languages(),
+            'available': self.is_available()
+        }
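
Note on `base.py` above: the adapter contract pairs abstract methods (which every backend must implement) with a concrete `get_info()` built on top of them, and it specifies synthesized audio as float samples in the range -1.0 to 1.0. The sketch below illustrates both ideas with the standard library only. `MiniTTSAdapter` and `ToneAdapter` are hypothetical stand-ins, not the package's classes: the stand-in uses plain lists instead of NumPy arrays, omits several methods, and makes `synthesize_to_bytes` concrete (it is abstract in the real interface) to show one possible float-to-WAV encoding.

```python
import io
import math
import struct
import wave
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class MiniTTSAdapter(ABC):
    """Trimmed stand-in for the TTSAdapter contract shown in the diff."""

    @abstractmethod
    def synthesize(self, text: str) -> List[float]:
        """Return mono samples in the range -1.0 to 1.0."""

    @abstractmethod
    def get_supported_languages(self) -> List[str]:
        """Return ISO 639-1 language codes."""

    @abstractmethod
    def get_sample_rate(self) -> int:
        """Return the sample rate in Hz."""

    @abstractmethod
    def is_available(self) -> bool:
        """Return True if the engine can be used."""

    def synthesize_to_bytes(self, text: str) -> bytes:
        """Encode the float samples as 16-bit mono PCM WAV bytes."""
        pcm = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
            for s in self.synthesize(text)
        )
        buffer = io.BytesIO()
        with wave.open(buffer, "wb") as wav:
            wav.setnchannels(1)
            wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
            wav.setframerate(self.get_sample_rate())
            wav.writeframes(pcm)
        return buffer.getvalue()

    def get_info(self) -> Dict[str, Any]:
        """Same keys as the concrete get_info() in the diff."""
        return {
            "name": self.__class__.__name__,
            "languages": self.get_supported_languages(),
            "sample_rate": self.get_sample_rate(),
            "available": self.is_available(),
        }


class ToneAdapter(MiniTTSAdapter):
    """Toy backend: ignores the text and emits 10 ms of a 440 Hz tone."""

    def synthesize(self, text: str) -> List[float]:
        rate = self.get_sample_rate()
        return [0.5 * math.sin(2 * math.pi * 440 * i / rate) for i in range(rate // 100)]

    def get_supported_languages(self) -> List[str]:
        return ["en"]

    def get_sample_rate(self) -> int:
        return 22050

    def is_available(self) -> bool:
        return True
```

Because the base class derives from `ABC`, a backend that forgets one of the abstract methods fails at instantiation with `TypeError` rather than later at call time, which is what allows a manager class to swap engines safely.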
