sinapsis-speech 0.1.0__py3-none-any.whl → 0.2.0__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- sinapsis_elevenlabs/src/sinapsis_elevenlabs/helpers/env_var_keys.py +1 -1
- sinapsis_elevenlabs/src/sinapsis_elevenlabs/helpers/voice_utils.py +7 -23
- sinapsis_elevenlabs/src/sinapsis_elevenlabs/templates/elevenlabs_base.py +13 -23
- sinapsis_elevenlabs/src/sinapsis_elevenlabs/templates/elevenlabs_voice_generation.py +4 -1
- sinapsis_f5_tts/src/sinapsis_f5_tts/__init__.py +0 -0
- sinapsis_f5_tts/src/sinapsis_f5_tts/templates/__init__.py +20 -0
- sinapsis_f5_tts/src/sinapsis_f5_tts/templates/f5_tts_inference.py +357 -0
- {sinapsis_speech-0.1.0.dist-info → sinapsis_speech-0.2.0.dist-info}/METADATA +117 -63
- sinapsis_speech-0.2.0.dist-info/RECORD +21 -0
- {sinapsis_speech-0.1.0.dist-info → sinapsis_speech-0.2.0.dist-info}/WHEEL +1 -1
- sinapsis_speech-0.2.0.dist-info/top_level.txt +3 -0
- sinapsis_zonos/src/sinapsis_zonos/__init__.py +0 -0
- sinapsis_zonos/src/sinapsis_zonos/helpers/__init__.py +0 -0
- sinapsis_zonos/src/sinapsis_zonos/helpers/zonos_keys.py +67 -0
- sinapsis_zonos/src/sinapsis_zonos/helpers/zonos_tts_utils.py +153 -0
- sinapsis_zonos/src/sinapsis_zonos/templates/__init__.py +20 -0
- sinapsis_zonos/src/sinapsis_zonos/templates/zonos_tts.py +172 -0
- sinapsis_speech-0.1.0.dist-info/RECORD +0 -13
- sinapsis_speech-0.1.0.dist-info/licenses/LICENSE +0 -661
- sinapsis_speech-0.1.0.dist-info/top_level.txt +0 -1
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
Metadata-Version: 2.4
|
|
2
2
|
Name: sinapsis-speech
|
|
3
|
-
Version: 0.
|
|
3
|
+
Version: 0.2.0
|
|
4
4
|
Summary: Generate speech using various libraries.
|
|
5
5
|
Author-email: SinapsisAI <dev@sinapsis-ai.com>
|
|
6
6
|
License: GNU AFFERO GENERAL PUBLIC LICENSE
|
|
@@ -666,25 +666,20 @@ License: GNU AFFERO GENERAL PUBLIC LICENSE
|
|
|
666
666
|
<https://www.gnu.org/licenses/>.
|
|
667
667
|
|
|
668
668
|
Project-URL: Homepage, https://sinapsis.tech
|
|
669
|
-
Project-URL: Documentation, https://docs.sinapsis.tech/docs
|
|
669
|
+
Project-URL: Documentation, https://docs.sinapsis.tech/docs/sinapsis-speech
|
|
670
670
|
Project-URL: Tutorials, https://docs.sinapsis.tech/tutorials
|
|
671
671
|
Project-URL: Repository, https://github.com/Sinapsis-AI/sinapsis-speech.git
|
|
672
672
|
Requires-Python: >=3.10
|
|
673
673
|
Description-Content-Type: text/markdown
|
|
674
|
-
License-File: LICENSE
|
|
675
674
|
Requires-Dist: pip>=24.3.1
|
|
676
|
-
Requires-Dist: sinapsis>=0.
|
|
677
|
-
Provides-Extra: elevenlabs-app
|
|
678
|
-
Requires-Dist: sinapsis-elevenlabs; extra == "elevenlabs-app"
|
|
679
|
-
Requires-Dist: sinapsis-speech[gradio-app]; extra == "elevenlabs-app"
|
|
680
|
-
Provides-Extra: gradio-app
|
|
681
|
-
Requires-Dist: gradio>=5.14.0; extra == "gradio-app"
|
|
682
|
-
Requires-Dist: sinapsis-data-readers>=0.1.0; extra == "gradio-app"
|
|
675
|
+
Requires-Dist: sinapsis>=0.2.2
|
|
683
676
|
Provides-Extra: all
|
|
684
|
-
Requires-Dist: sinapsis-elevenlabs; extra == "all"
|
|
685
|
-
Requires-Dist: sinapsis-
|
|
686
|
-
Requires-Dist: sinapsis-speech[
|
|
687
|
-
|
|
677
|
+
Requires-Dist: sinapsis-elevenlabs[all]; extra == "all"
|
|
678
|
+
Requires-Dist: sinapsis-f5-tts[all]; extra == "all"
|
|
679
|
+
Requires-Dist: sinapsis-speech[webapp]; extra == "all"
|
|
680
|
+
Requires-Dist: sinapsis-zonos[all]; extra == "all"
|
|
681
|
+
Provides-Extra: gradio-app
|
|
682
|
+
Requires-Dist: sinapsis[webapp]>=0.2.3; extra == "gradio-app"
|
|
688
683
|
|
|
689
684
|
<h1 align="center">
|
|
690
685
|
<br>
|
|
@@ -702,7 +697,7 @@ Sinapsis Speech
|
|
|
702
697
|
<p align="center">
|
|
703
698
|
<a href="#installation">🐍 Installation</a> •
|
|
704
699
|
<a href="#packages">📦 Packages</a> •
|
|
705
|
-
<a href="#webapp">🌐
|
|
700
|
+
<a href="#webapp">🌐 Webapps</a> •
|
|
706
701
|
<a href="#documentation">📙 Documentation</a> •
|
|
707
702
|
<a href="#packages">🔍 License</a>
|
|
708
703
|
</p>
|
|
@@ -715,47 +710,93 @@ Sinapsis Speech
|
|
|
715
710
|
> Sinapsis projects require Python 3.10 or higher.
|
|
716
711
|
>
|
|
717
712
|
|
|
718
|
-
|
|
719
|
-
If you need to install <code>uv</code> please see the [official documentation](https://docs.astral.sh/uv/getting-started/installation/#installation-methods).
|
|
713
|
+
This repo includes packages for performing speech synthesis using different tools:
|
|
720
714
|
|
|
715
|
+
* <code>sinapsis-elevenlabs</code>
|
|
716
|
+
* <code>sinapsis-f5-tts</code>
|
|
717
|
+
* <code>sinapsis-zonos</code>
|
|
721
718
|
|
|
722
|
-
|
|
719
|
+
Install using your preferred package manager. We strongly recommend using <code>uv</code>. To install <code>uv</code>, refer to the [official documentation](https://docs.astral.sh/uv/getting-started/installation/#installation-methods).
|
|
723
720
|
|
|
724
|
-
|
|
721
|
+
|
|
722
|
+
Install with <code>uv</code>:
|
|
725
723
|
```bash
|
|
726
|
-
|
|
724
|
+
uv pip install sinapsis-elevenlabs --extra-index-url https://pypi.sinapsis.tech
|
|
727
725
|
```
|
|
728
|
-
|
|
726
|
+
Or with raw <code>pip</code>:
|
|
729
727
|
```bash
|
|
730
|
-
|
|
728
|
+
pip install sinapsis-elevenlabs --extra-index-url https://pypi.sinapsis.tech
|
|
731
729
|
```
|
|
732
|
-
|
|
730
|
+
|
|
731
|
+
**Replace `sinapsis-elevenlabs` with the name of the package you intend to install**.
|
|
732
|
+
|
|
733
|
+
> [!IMPORTANT]
|
|
734
|
+
> Templates in each package may require additional dependencies. For development, we recommend installing the package with all optional dependencies:
|
|
735
|
+
>
|
|
736
|
+
With <code>uv</code>:
|
|
737
|
+
|
|
738
|
+
```bash
|
|
739
|
+
uv pip install sinapsis-elevenlabs[all] --extra-index-url https://pypi.sinapsis.tech
|
|
740
|
+
```
|
|
741
|
+
Or with raw <code>pip</code>:
|
|
742
|
+
```bash
|
|
743
|
+
pip install sinapsis-elevenlabs[all] --extra-index-url https://pypi.sinapsis.tech
|
|
744
|
+
```
|
|
745
|
+
|
|
746
|
+
**Be sure to substitute `sinapsis-elevenlabs` with the appropriate package name**.
|
|
747
|
+
|
|
748
|
+
|
|
733
749
|
|
|
734
750
|
> [!TIP]
|
|
735
751
|
> You can also install all the packages within this project:
|
|
736
752
|
>
|
|
737
753
|
```bash
|
|
738
|
-
|
|
754
|
+
uv pip install sinapsis-speech[all] --extra-index-url https://pypi.sinapsis.tech
|
|
739
755
|
```
|
|
740
756
|
|
|
741
757
|
|
|
742
758
|
<h2 id="packages">📦 Packages</h2>
|
|
743
759
|
|
|
744
|
-
|
|
760
|
+
This repository is organized into modular packages, each designed for integration with different text-to-speech tools. These packages provide ready-to-use templates for speech synthesis. Below is an overview of the available packages:
|
|
745
761
|
|
|
746
762
|
<details>
|
|
747
|
-
<summary id="elevenlabs"><strong><span style="font-size: 1.4em;">
|
|
763
|
+
<summary id="elevenlabs"><strong><span style="font-size: 1.4em;"> Sinapsis ElevenLabs </span></strong></summary>
|
|
764
|
+
|
|
765
|
+
This package offers a suite of templates and utilities designed for effortless integration, configuration, and execution of **text-to-speech (TTS)** and **voice generation** functionalities powered by [ElevenLabs](https://elevenlabs.io/).
|
|
748
766
|
|
|
749
|
-
|
|
767
|
+
- **ElevenLabsTTS**: Template for converting text into speech using ElevenLabs' voice models.
|
|
750
768
|
|
|
751
|
-
- **
|
|
769
|
+
- **ElevenLabsVoiceGeneration**: Template for generating custom synthetic voices based on user-provided descriptions.
|
|
752
770
|
|
|
753
|
-
|
|
771
|
+
For specific instructions and further details, see the [README.md](https://github.com/Sinapsis-AI/sinapsis-speech/blob/main/packages/sinapsis_elevenlabs/README.md).
|
|
754
772
|
|
|
755
773
|
</details>
|
|
756
|
-
|
|
757
|
-
|
|
758
|
-
|
|
774
|
+
|
|
775
|
+
|
|
776
|
+
<details>
|
|
777
|
+
<summary id="f5tts"><strong><span style="font-size: 1.4em;"> Sinapsis F5-TTS</span></strong></summary>
|
|
778
|
+
|
|
779
|
+
This package provides a template for seamlessly integrating, configuring, and running **text-to-speech (TTS)** functionalities powered by [F5TTS](https://github.com/SWivid/F5-TTS).
|
|
780
|
+
|
|
781
|
+
- **F5TTSInference**: Converts text to speech using the F5TTS model with voice cloning capabilities.
|
|
782
|
+
|
|
783
|
+
For specific instructions and further details, see the [README.md](https://github.com/Sinapsis-AI/sinapsis-speech/blob/main/packages/sinapsis_f5_tts/README.md).
|
|
784
|
+
|
|
785
|
+
</details>
|
|
786
|
+
|
|
787
|
+
<details>
|
|
788
|
+
<summary id="zonos"><strong><span style="font-size: 1.4em;"> Sinapsis Zonos</span></strong></summary>
|
|
789
|
+
|
|
790
|
+
This package provides a single template for integrating, configuring, and running **text-to-speech (TTS)** and **voice cloning** functionalities powered by [Zonos](https://github.com/Zyphra/Zonos/tree/main).
|
|
791
|
+
|
|
792
|
+
- **ZonosTTS**: Template for converting text to speech or performing voice cloning based on the presence of an audio sample.
|
|
793
|
+
|
|
794
|
+
For specific instructions and further details, see the [README.md](https://github.com/Sinapsis-AI/sinapsis-speech/blob/main/packages/sinapsis_zonos/README.md).
|
|
795
|
+
|
|
796
|
+
</details>
|
|
797
|
+
|
|
798
|
+
<h2 id="webapp">🌐 Webapps</h2>
|
|
799
|
+
The webapps included in this project showcase the modularity of the templates, in this case for speech generation tasks.
|
|
759
800
|
|
|
760
801
|
> [!IMPORTANT]
|
|
761
802
|
> To run the app you first need to clone this repository:
|
|
@@ -768,89 +809,102 @@ cd sinapsis-speech
|
|
|
768
809
|
> [!NOTE]
|
|
769
810
|
> If you'd like to enable external app sharing in Gradio, `export GRADIO_SHARE_APP=True`
|
|
770
811
|
|
|
771
|
-
> [!IMPORTANT]
|
|
772
|
-
> The CosyVoice model requires at least 4GB of ram to work.
|
|
773
812
|
|
|
774
813
|
> [!IMPORTANT]
|
|
775
|
-
> Elevenlabs requires an
|
|
776
|
-
If you already have an account, go to the [token page](https://elevenlabs.io/app/settings/api-keys) and generate a token.
|
|
814
|
+
> Elevenlabs requires an API key to run any inference. To get started, visit the [official website](https://elevenlabs.io) and create an account. If you already have an account, go to the [API keys page](https://elevenlabs.io/app/settings/api-keys) to generate a token.
|
|
777
815
|
|
|
778
816
|
> [!IMPORTANT]
|
|
779
|
-
>
|
|
817
|
+
> Set your env var using <code> export ELEVENLABS_API_KEY='your-api-key'</code>
|
|
780
818
|
|
|
819
|
+
> [!IMPORTANT]
|
|
820
|
+
> F5-TTS requires a reference audio file for voice cloning. Make sure you have a reference audio file in the artifacts directory.
|
|
781
821
|
|
|
782
|
-
> [!
|
|
783
|
-
>
|
|
822
|
+
> [!NOTE]
|
|
823
|
+
> Agent configuration can be changed through the `AGENT_CONFIG_PATH` env var. You can check the available configurations in each package configs folder.
|
|
784
824
|
|
|
785
825
|
|
|
786
826
|
<details>
|
|
787
|
-
<summary id="docker"><strong><span style="font-size: 1.4em;">🐳
|
|
827
|
+
<summary id="docker"><strong><span style="font-size: 1.4em;">🐳 Docker</span></strong></summary>
|
|
788
828
|
|
|
789
|
-
**IMPORTANT
|
|
829
|
+
**IMPORTANT**: This Docker image depends on the `sinapsis-nvidia:base` image. For detailed instructions, please refer to the [Sinapsis README](https://github.com/Sinapsis-ai/sinapsis?tab=readme-ov-file#docker).
|
|
790
830
|
|
|
791
|
-
1. **Build the
|
|
831
|
+
1. **Build the sinapsis-speech image**:
|
|
792
832
|
```bash
|
|
793
833
|
docker compose -f docker/compose.yaml build
|
|
794
834
|
```
|
|
795
835
|
|
|
796
|
-
|
|
797
|
-
|
|
836
|
+
2. **Start the app container**:
|
|
837
|
+
For ElevenLabs:
|
|
798
838
|
```bash
|
|
799
839
|
docker compose -f docker/compose_apps.yaml up -d sinapsis-elevenlabs
|
|
800
840
|
```
|
|
841
|
+
For F5-TTS:
|
|
842
|
+
```bash
|
|
843
|
+
docker compose -f docker/compose_apps.yaml up -d sinapsis-f5_tts
|
|
844
|
+
```
|
|
845
|
+
For Zonos:
|
|
846
|
+
```bash
|
|
847
|
+
docker compose -f docker/compose_apps.yaml up -d sinapsis-zonos
|
|
848
|
+
```
|
|
801
849
|
|
|
802
|
-
|
|
803
|
-
|
|
850
|
+
3. **Check the logs**
|
|
851
|
+
For ElevenLabs:
|
|
804
852
|
```bash
|
|
805
853
|
docker logs -f sinapsis-elevenlabs
|
|
806
854
|
```
|
|
807
|
-
|
|
855
|
+
For F5-TTS:
|
|
856
|
+
```bash
|
|
857
|
+
docker logs -f sinapsis-f5tts
|
|
858
|
+
```
|
|
859
|
+
For Zonos:
|
|
860
|
+
```bash
|
|
861
|
+
docker logs -f sinapsis-zonos
|
|
862
|
+
```
|
|
863
|
+
4. **The logs will display the URL to access the webapp, e.g.,:**:
|
|
808
864
|
```bash
|
|
809
865
|
Running on local URL: http://127.0.0.1:7860
|
|
810
866
|
```
|
|
811
|
-
|
|
867
|
+
**NOTE**: The url may be different, check the output of logs.
|
|
868
|
+
5. **To stop the app**:
|
|
812
869
|
```bash
|
|
813
|
-
docker compose -f docker/compose_apps.yaml down
|
|
870
|
+
docker compose -f docker/compose_apps.yaml down
|
|
814
871
|
```
|
|
815
872
|
</details>
|
|
816
873
|
|
|
817
874
|
<details>
|
|
818
875
|
<summary id="virtual-environment"><strong><span style="font-size: 1.4em;">💻 UV</span></strong></summary>
|
|
819
876
|
|
|
877
|
+
To run the webapp using the <code>uv</code> package manager, follow these steps:
|
|
820
878
|
|
|
821
879
|
1. **Sync the virtual environment**:
|
|
822
880
|
|
|
823
881
|
```bash
|
|
824
882
|
uv sync --frozen
|
|
825
883
|
```
|
|
826
|
-
2. Install the wheel
|
|
884
|
+
2. **Install the wheel**:
|
|
827
885
|
|
|
828
886
|
```bash
|
|
829
887
|
uv pip install sinapsis-speech[all] --extra-index-url https://pypi.sinapsis.tech
|
|
830
888
|
```
|
|
831
889
|
|
|
832
|
-
|
|
833
|
-
|
|
834
|
-
|
|
890
|
+
3. **Run the webapp**:
|
|
891
|
+
For ElevenLabs:
|
|
835
892
|
```bash
|
|
836
|
-
|
|
893
|
+
uv run webapps/elevenlabs/elevenlabs_tts_app.py
|
|
837
894
|
```
|
|
838
|
-
|
|
895
|
+
For F5-TTS:
|
|
839
896
|
```bash
|
|
840
|
-
|
|
897
|
+
uv run webapps/f5-tts/f5_tts_app.py
|
|
841
898
|
```
|
|
842
|
-
|
|
843
|
-
|
|
844
|
-
5. **Launch the demo**:
|
|
845
|
-
|
|
899
|
+
For Zonos:
|
|
846
900
|
```bash
|
|
847
|
-
|
|
901
|
+
uv run webapps/zonos/zonos_tts_app.py
|
|
848
902
|
```
|
|
849
|
-
|
|
903
|
+
4. **The terminal will display the URL to access the webapp (e.g.)**:
|
|
850
904
|
```bash
|
|
851
905
|
Running on local URL: http://127.0.0.1:7860
|
|
852
906
|
```
|
|
853
|
-
**NOTE**: The URL
|
|
907
|
+
**NOTE**: The URL may vary; check the terminal output for the correct address.
|
|
854
908
|
|
|
855
909
|
</details>
|
|
856
910
|
|
|
@@ -0,0 +1,21 @@
|
|
|
1
|
+
sinapsis_elevenlabs/src/sinapsis_elevenlabs/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
|
|
2
|
+
sinapsis_elevenlabs/src/sinapsis_elevenlabs/helpers/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
|
|
3
|
+
sinapsis_elevenlabs/src/sinapsis_elevenlabs/helpers/env_var_keys.py,sha256=j8J64iplBNaff1WvmfJ03eJozE1f5SdqtqQeldV2vPY,998
|
|
4
|
+
sinapsis_elevenlabs/src/sinapsis_elevenlabs/helpers/voice_utils.py,sha256=fR1r1aaoFy_rQGfJLunUNdZfVxDyAo7shevS4TAXH_M,2420
|
|
5
|
+
sinapsis_elevenlabs/src/sinapsis_elevenlabs/templates/__init__.py,sha256=pyTWPBLN_P6sxFTF1QqfL7iTZd9E0EaggpfwB0qLLHI,579
|
|
6
|
+
sinapsis_elevenlabs/src/sinapsis_elevenlabs/templates/elevenlabs_base.py,sha256=MQglkwvyOVk4krXTXoMSPZ4yCeDBq9vMpI3riz87aIg,8291
|
|
7
|
+
sinapsis_elevenlabs/src/sinapsis_elevenlabs/templates/elevenlabs_tts.py,sha256=WVTROfB2ODAksHmWwV5RKcub3Hoc29OM_eAw75c9yio,2847
|
|
8
|
+
sinapsis_elevenlabs/src/sinapsis_elevenlabs/templates/elevenlabs_voice_generation.py,sha256=bKo7zhfsiZwsn-qZx_MCVAIx_MmaKnaP3lc-07AwAaY,2819
|
|
9
|
+
sinapsis_f5_tts/src/sinapsis_f5_tts/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
|
|
10
|
+
sinapsis_f5_tts/src/sinapsis_f5_tts/templates/__init__.py,sha256=28BOPAr9GG1jYcrXi45ZWO1n2FAZJOdDcmRkOXdEYmk,496
|
|
11
|
+
sinapsis_f5_tts/src/sinapsis_f5_tts/templates/f5_tts_inference.py,sha256=7EBxw-tRthbPDz0zFopaLdBhv7DXwxyMGXam6F1MwGs,15802
|
|
12
|
+
sinapsis_zonos/src/sinapsis_zonos/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
|
|
13
|
+
sinapsis_zonos/src/sinapsis_zonos/helpers/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
|
|
14
|
+
sinapsis_zonos/src/sinapsis_zonos/helpers/zonos_keys.py,sha256=m1GdOYfzP73JGmtxH30mNiqbNkzFsQl9o2QaT7QxSVU,2470
|
|
15
|
+
sinapsis_zonos/src/sinapsis_zonos/helpers/zonos_tts_utils.py,sha256=8Tr2YgxjBfRqv_Hf6sw36X2pLzW7fdQWqa6QPBxNZK8,6419
|
|
16
|
+
sinapsis_zonos/src/sinapsis_zonos/templates/__init__.py,sha256=A-_F0K3hbEFqeWWAh4YftgU9CFX-WHrauSiCAww9yp8,482
|
|
17
|
+
sinapsis_zonos/src/sinapsis_zonos/templates/zonos_tts.py,sha256=KsNuT8cFTTjTEqjfEWsIr4B-DjGhVacSw2SdPckuFvk,7507
|
|
18
|
+
sinapsis_speech-0.2.0.dist-info/METADATA,sha256=-qJhZCqgMvFKr7iZBbv6lIleFa2DCTb0wXp1B2dKs18,48741
|
|
19
|
+
sinapsis_speech-0.2.0.dist-info/WHEEL,sha256=CmyFI0kx5cdEMTLiONQRbGQwjIoR1aIYB7eCAQ4KPJ0,91
|
|
20
|
+
sinapsis_speech-0.2.0.dist-info/top_level.txt,sha256=vQFjL84TMSRld2lKvEVMUNyY2b3AVluCT1Ijws7o7_c,51
|
|
21
|
+
sinapsis_speech-0.2.0.dist-info/RECORD,,
|
|
File without changes
|
|
File without changes
|
|
@@ -0,0 +1,67 @@
|
|
|
1
|
+
# -*- coding: utf-8 -*-
|
|
2
|
+
from typing import Literal
|
|
3
|
+
|
|
4
|
+
from pydantic import BaseModel
|
|
5
|
+
from pydantic.dataclasses import dataclass
|
|
6
|
+
|
|
7
|
+
|
|
8
|
+
@dataclass(frozen=True)
|
|
9
|
+
class TTSKeys:
|
|
10
|
+
"""
|
|
11
|
+
A class to hold constants for the keys used in the Text-to-Speech (TTS) model configuration.
|
|
12
|
+
|
|
13
|
+
These keys represent standard fields that are used to configure various parameters of the TTS model,
|
|
14
|
+
such as speaker attributes, emotions, and other audio-related settings. They are typically used in
|
|
15
|
+
templates and potentially a TTS web application to adjust and access specific TTS settings.
|
|
16
|
+
"""
|
|
17
|
+
|
|
18
|
+
speaker: Literal["speaker"] = "speaker"
|
|
19
|
+
emotion: Literal["emotion"] = "emotion"
|
|
20
|
+
vqscore_8: Literal["vqscore_8"] = "vqscore_8"
|
|
21
|
+
fmax: Literal["fmax"] = "fmax"
|
|
22
|
+
pitch_std: Literal["pitch_std"] = "pitch_std"
|
|
23
|
+
speaking_rate: Literal["speaking_rate"] = "speaking_rate"
|
|
24
|
+
dnsmos_ovrl: Literal["dnsmos_ovrl"] = "dnsmos_ovrl"
|
|
25
|
+
speaker_noised: Literal["speaker_noised"] = "speaker_noised"
|
|
26
|
+
wav: Literal["wav"] = "wav"
|
|
27
|
+
en_language: Literal["en-us"] = "en-us"
|
|
28
|
+
min_p: Literal["min_p"] = "min_p"
|
|
29
|
+
|
|
30
|
+
|
|
31
|
+
class SamplingParams(BaseModel):
|
|
32
|
+
"""
|
|
33
|
+
A class to hold the sampling parameters for the TTS model.
|
|
34
|
+
|
|
35
|
+
Attributes:
|
|
36
|
+
min_p (float): Minimum token probability, scaled by the highest token probability. Range: 0-1. Default: 0.0.
|
|
37
|
+
top_k (int): Number of top tokens to sample from. Range: 0-1024. Default: 0.
|
|
38
|
+
top_p (float): Cumulative probability threshold for nucleus sampling. Range: 0-1. Default: 0.0.
|
|
39
|
+
linear (float): Controls the token unusualness. Range: -2.0 to 2.0. Default: 0.0.
|
|
40
|
+
conf (float): Confidence level for randomness. Range: -2.0 to 2.0. Default: 0.0.
|
|
41
|
+
quad (float): Controls how much low probabilities are reduced. Range: -2.0 to 2.0. Default: 0.0.
|
|
42
|
+
"""
|
|
43
|
+
|
|
44
|
+
min_p: float = 0.0
|
|
45
|
+
top_k: int = 0
|
|
46
|
+
top_p: float = 0.0
|
|
47
|
+
linear: float = 0.0
|
|
48
|
+
conf: float = 0.0
|
|
49
|
+
quad: float = 0.0
|
|
50
|
+
|
|
51
|
+
|
|
52
|
+
class EmotionsConfig(BaseModel):
|
|
53
|
+
"""
|
|
54
|
+
A class to hold emotional attributes that influence the tone of the generated speech.
|
|
55
|
+
|
|
56
|
+
These emotions are represented as float values and are used to adjust the emotional tone of the speech.
|
|
57
|
+
Higher values can represent a stronger presence of a particular emotion.
|
|
58
|
+
"""
|
|
59
|
+
|
|
60
|
+
happiness: float = 0
|
|
61
|
+
sadness: float = 0
|
|
62
|
+
disgust: float = 0
|
|
63
|
+
fear: float = 0
|
|
64
|
+
surprise: float = 0
|
|
65
|
+
anger: float = 0
|
|
66
|
+
other: float = 0
|
|
67
|
+
neutral: float = 0
|
|
@@ -0,0 +1,153 @@
|
|
|
1
|
+
# -*- coding: utf-8 -*-
|
|
2
|
+
from typing import Set
|
|
3
|
+
|
|
4
|
+
import torch
|
|
5
|
+
import torchaudio
|
|
6
|
+
from sinapsis_core.template_base.template import TemplateAttributeType
|
|
7
|
+
from sinapsis_core.utils.logging_utils import sinapsis_logger
|
|
8
|
+
from zonos.conditioning import make_cond_dict, supported_language_codes
|
|
9
|
+
from zonos.model import Zonos
|
|
10
|
+
|
|
11
|
+
from sinapsis_zonos.helpers.zonos_keys import SamplingParams, TTSKeys
|
|
12
|
+
|
|
13
|
+
|
|
14
|
+
def get_audio_prefix_codes(prefix_path: str | None, model: Zonos) -> torch.Tensor | None:
|
|
15
|
+
"""Generates audio prefix codes from an audio file.
|
|
16
|
+
|
|
17
|
+
Args:
|
|
18
|
+
prefix_path (str): Path to the audio file to generate the prefix codes from.
|
|
19
|
+
model (Zonos): The Zonos model used to generate the audio prefix codes.
|
|
20
|
+
|
|
21
|
+
Returns:
|
|
22
|
+
torch.Tensor | None: The generated audio prefix codes if available, otherwise None.
|
|
23
|
+
"""
|
|
24
|
+
if prefix_path:
|
|
25
|
+
waveform, sample_rate = torchaudio.load(prefix_path)
|
|
26
|
+
waveform = waveform.mean(0, keepdim=True)
|
|
27
|
+
waveform = model.autoencoder.preprocess(waveform, sample_rate)
|
|
28
|
+
return model.autoencoder.encode(waveform.unsqueeze(0))
|
|
29
|
+
return None
|
|
30
|
+
|
|
31
|
+
|
|
32
|
+
def get_conditioning(
|
|
33
|
+
attributes: TemplateAttributeType, model: Zonos, input_text: str, device: torch.device
|
|
34
|
+
) -> torch.Tensor:
|
|
35
|
+
"""
|
|
36
|
+
Generates conditioning tensor for the input text, combining it with speaker embeddings and emotions.
|
|
37
|
+
|
|
38
|
+
Args:
|
|
39
|
+
attributes (TemplateAttributeType): attributes with configuration for the conditioning dictionary
|
|
40
|
+
of the model.
|
|
41
|
+
model (Zonos): Model to be used during inference, where the setup is modified.
|
|
42
|
+
input_text (str): The text to be converted to speech.
|
|
43
|
+
device (torch.device): Device where model should be loaded.
|
|
44
|
+
|
|
45
|
+
Returns:
|
|
46
|
+
torch.Tensor: The generated conditioning tensor for speech synthesis.
|
|
47
|
+
"""
|
|
48
|
+
speaker_embedding = get_speaker_embedding(attributes.speaker_audio, attributes.unconditional_keys, model, device)
|
|
49
|
+
emotion_data = get_emotion_tensor(attributes, device)
|
|
50
|
+
validate_language(attributes)
|
|
51
|
+
|
|
52
|
+
vq_data = torch.tensor([attributes.vq_score] * 8, device=device).unsqueeze(0)
|
|
53
|
+
|
|
54
|
+
conditioning_dict = make_cond_dict(
|
|
55
|
+
text=input_text,
|
|
56
|
+
language=attributes.language,
|
|
57
|
+
speaker=speaker_embedding,
|
|
58
|
+
emotion=emotion_data,
|
|
59
|
+
vqscore_8=vq_data,
|
|
60
|
+
fmax=attributes.fmax,
|
|
61
|
+
pitch_std=attributes.pitch_std,
|
|
62
|
+
speaking_rate=attributes.speaking_rate,
|
|
63
|
+
dnsmos_ovrl=attributes.dnsmos,
|
|
64
|
+
speaker_noised=attributes.denoised_speaker,
|
|
65
|
+
device=device,
|
|
66
|
+
unconditional_keys=attributes.unconditional_keys,
|
|
67
|
+
)
|
|
68
|
+
return model.prepare_conditioning(conditioning_dict)
|
|
69
|
+
|
|
70
|
+
|
|
71
|
+
def get_emotion_tensor(attributes: TemplateAttributeType, device: torch.device) -> torch.Tensor:
|
|
72
|
+
"""
|
|
73
|
+
Extracts or constructs an emotion tensor from the given attributes.
|
|
74
|
+
|
|
75
|
+
If `attributes.emotions` is present, its values are serialized and converted into a tensor.
|
|
76
|
+
If not, a default zero tensor of shape (8,) is returned, and the `emotion` key is
|
|
77
|
+
added to `attributes.unconditional_keys` (if not already included) to indicate unconditional conditioning.
|
|
78
|
+
|
|
79
|
+
Args:
|
|
80
|
+
attributes (TemplateAttributeType): Attributes for Zonos TTS model configuration.
|
|
81
|
+
device (torch.device): The device on which the tensor should be created.
|
|
82
|
+
|
|
83
|
+
Returns:
|
|
84
|
+
torch.Tensor: A tensor representing emotion values, either user-provided or default.
|
|
85
|
+
"""
|
|
86
|
+
if attributes.emotions:
|
|
87
|
+
emotion_values = list(map(float, attributes.emotions.model_dump().values()))
|
|
88
|
+
return torch.tensor(emotion_values, device=device)
|
|
89
|
+
else:
|
|
90
|
+
if TTSKeys.emotion not in attributes.unconditional_keys:
|
|
91
|
+
attributes.unconditional_keys.add(TTSKeys.emotion)
|
|
92
|
+
return torch.tensor([0.0] * 8, device=device)
|
|
93
|
+
|
|
94
|
+
|
|
95
|
+
def get_sampling_params(sampling_params: SamplingParams | dict) -> dict:
|
|
96
|
+
"""
|
|
97
|
+
Returns a dictionary of sampling parameters for audio generation.
|
|
98
|
+
|
|
99
|
+
If `sampling_params` is a Pydantic model, its non-null fields are serialized using `model_dump()`.
|
|
100
|
+
If `sampling_params` is empty, a default dictionary with a minimum probability value is returned.
|
|
101
|
+
|
|
102
|
+
Args:
|
|
103
|
+
sampling_params (SamplingParams | dict): A SamplingParams Pydantic model or dictionary.
|
|
104
|
+
|
|
105
|
+
Returns:
|
|
106
|
+
dict: A dictionary of sampling parameters, either user-defined or with a default fallback.
|
|
107
|
+
"""
|
|
108
|
+
if isinstance(sampling_params, SamplingParams):
|
|
109
|
+
return sampling_params.model_dump(exclude_none=True)
|
|
110
|
+
return {TTSKeys.min_p: 0.1}
|
|
111
|
+
|
|
112
|
+
|
|
113
|
+
def get_speaker_embedding(
|
|
114
|
+
speaker_path: str | None, unconditional_keys: Set[str], model: Zonos, device: torch.device
|
|
115
|
+
) -> torch.Tensor | None:
|
|
116
|
+
"""Extracts speaker embedding from an audio file.
|
|
117
|
+
|
|
118
|
+
Args:
|
|
119
|
+
speaker_path (str): Path to the audio file from which the speaker embedding will be extracted.
|
|
120
|
+
unconditional_keys (Set[str]): Set of conditioning keys to be treated as unconditional during speech synthesis.
|
|
121
|
+
This will be used to determine whether a speaker embedding is needed.
|
|
122
|
+
model (Zonos): The Zonos model used for generating the speaker embedding.
|
|
123
|
+
|
|
124
|
+
Returns:
|
|
125
|
+
torch.Tensor | None: The speaker embedding if available, otherwise None.
|
|
126
|
+
"""
|
|
127
|
+
if speaker_path and TTSKeys.speaker not in unconditional_keys:
|
|
128
|
+
waveform, sample_rate = torchaudio.load(speaker_path)
|
|
129
|
+
speaker_embedding = model.make_speaker_embedding(waveform, sample_rate)
|
|
130
|
+
return speaker_embedding.to(device, dtype=torch.bfloat16)
|
|
131
|
+
return None
|
|
132
|
+
|
|
133
|
+
|
|
134
|
+
def init_seed(attributes: TemplateAttributeType) -> None:
|
|
135
|
+
"""Initializes the seed for reproducible results."""
|
|
136
|
+
if attributes.randomized_seed:
|
|
137
|
+
attributes.seed = torch.randint(0, 2**32 - 1, (1,)).item()
|
|
138
|
+
torch.manual_seed(attributes.seed)
|
|
139
|
+
|
|
140
|
+
|
|
141
|
+
def validate_language(attributes: TemplateAttributeType) -> None:
|
|
142
|
+
"""
|
|
143
|
+
Validates and updates the language attribute in the provided TTS configuration.
|
|
144
|
+
|
|
145
|
+
Checks if `attributes.language` is included in the list of supported language codes.
|
|
146
|
+
If the language is unsupported, logs an error and defaults it to `TTSKeys.en_language`.
|
|
147
|
+
|
|
148
|
+
Args:
|
|
149
|
+
attributes (TemplateAttributeType): The model attributes containing the language setting.
|
|
150
|
+
"""
|
|
151
|
+
if attributes.language not in supported_language_codes:
|
|
152
|
+
sinapsis_logger.error(f"Language {attributes.language} not supported. Defaulting to {TTSKeys.en_language}")
|
|
153
|
+
attributes.language = TTSKeys.en_language
|
|
@@ -0,0 +1,20 @@
|
|
|
1
|
+
# -*- coding: utf-8 -*-
|
|
2
|
+
import importlib
|
|
3
|
+
from typing import Callable
|
|
4
|
+
|
|
5
|
+
_root_lib_path = "sinapsis_zonos.templates"
|
|
6
|
+
|
|
7
|
+
_template_lookup = {
|
|
8
|
+
"ZonosTTS": f"{_root_lib_path}.zonos_tts",
|
|
9
|
+
}
|
|
10
|
+
|
|
11
|
+
|
|
12
|
+
def __getattr__(name: str) -> Callable:
|
|
13
|
+
if name in _template_lookup:
|
|
14
|
+
module = importlib.import_module(_template_lookup[name])
|
|
15
|
+
return getattr(module, name)
|
|
16
|
+
|
|
17
|
+
raise AttributeError(f"template `{name}` not found in {_root_lib_path}")
|
|
18
|
+
|
|
19
|
+
|
|
20
|
+
__all__ = list(_template_lookup.keys())
|