sinapsis-csm 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,249 @@
1
+ Metadata-Version: 2.4
2
+ Name: sinapsis-csm
3
+ Version: 0.1.0
4
+ Summary: Text to speech using CSM TTS model
5
+ Author-email: SinapsisAI <dev@sinapsis.tech>
6
+ Project-URL: Homepage, https://sinapsis.tech
7
+ Project-URL: Documentation, https://docs.sinapsis.tech/docs
8
+ Project-URL: Tutorials, https://docs.sinapsis.tech/tutorials
9
+ Project-URL: Repository, https://github.com/Sinapsis-AI/sinapsis-speech.git
10
+ Requires-Python: >=3.10
11
+ Description-Content-Type: text/markdown
12
+ Requires-Dist: silentcipher
13
+ Requires-Dist: csm
14
+ Provides-Extra: data-tools
15
+ Requires-Dist: sinapsis-data-readers[all]>=0.1.2; extra == "data-tools"
16
+ Requires-Dist: sinapsis-data-writers[soundfile]>=0.1.2; extra == "data-tools"
17
+ Provides-Extra: all
18
+ Requires-Dist: sinapsis-csm[data-tools]; extra == "all"
19
+
20
+ <h1 align="center">
21
+ <br>
22
+ <a href="https://sinapsis.tech/">
23
+ <img
24
+ src="https://github.com/Sinapsis-AI/brand-resources/blob/main/sinapsis_logo/4x/logo.png?raw=true"
25
+ alt="" width="300">
26
+ </a><br>
27
+ Sinapsis CSM
28
+ <br>
29
+ </h1>
30
+
31
+ <p align="center">
32
+ <a href="#installation">🐍 Installation</a> •
33
+ <a href="#features">🚀 Features</a> •
34
+ <a href="#example">📚 Usage example</a> •
35
+ <a href="#documentation">📙 Documentation</a> •
36
+ <a href="#license">🔍 License</a>
37
+ </p>
38
+
39
+ This **Sinapsis CSM** package integrates a lightweight, efficient text-to-speech engine using the CSM model. It provides a simple template to convert input text into speech using Sinapsis.
40
+
41
+ ---
42
+
43
+ <h2 id="installation">🐍 Installation</h2>
44
+
45
+ > [!IMPORTANT]
46
+ > Sinapsis project requires Python 3.10 or higher.
47
+
48
+ Install using your preferred package manager. We strongly recommend using <code>uv</code>. To install <code>uv</code>, refer to the [official documentation](https://docs.astral.sh/uv/getting-started/installation/#installation-methods).
49
+
50
+ Install with <code>uv</code>:
51
+ ```bash
52
+ uv pip install sinapsis-csm --extra-index-url https://pypi.sinapsis.tech
53
+ ```
54
+
55
+ Or with raw <code>pip</code>:
56
+ ```bash
57
+ pip install sinapsis-csm --extra-index-url https://pypi.sinapsis.tech
58
+ ```
59
+
60
+ > [!IMPORTANT]
61
+ > Templates in each package may require additional dependencies. For development, we recommend installing the package with all the optional dependencies:
62
+
63
+ With <code>uv</code>:
64
+ ```bash
65
+ uv pip install sinapsis-csm[all] --extra-index-url https://pypi.sinapsis.tech
66
+ ```
67
+
68
+ Or with raw <code>pip</code>:
69
+ ```bash
70
+ pip install sinapsis-csm[all] --extra-index-url https://pypi.sinapsis.tech
71
+ ```
72
+
73
+ To run this package you need a HuggingFace token. See the [official instructions](https://huggingface.co/docs/hub/security-tokens)
74
+ and set it using
75
+ ```bash
76
+ export HF_TOKEN=<token-provided-by-hf>
77
+ ```
78
+
79
+ and test it through the CLI or the webapp.
80
+
81
+ Access to the following models is needed:
82
+
83
+ * [Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B)
84
+ * [CSM-1B](https://huggingface.co/sesame/csm-1b)
85
+
86
+ ---
87
+
88
+ <h2 id="features">🚀 Features</h2>
89
+
90
+ <h3>Templates Supported</h3>
91
+
92
+ - **CSMTTS**: Converts text into speech using the CSM model.
93
+
94
+ <details>
95
+ <summary>Attributes</summary>
96
+
97
+ - `speaker_id` (int, default: 0): Speaker identity index.
98
+ - `max_audio_length_ms` (int, default: 10000): Max audio length in milliseconds.
99
+ - `device` ("cpu" or "cuda", default: "cpu"): Device used for inference.
100
+ - `context` (list[str] | None, default: None): Optional list of past utterances for context.
101
+ - `sample_rate_hz` (int, default: 24000): Output audio sample rate.
102
+ </details>
103
+
104
+ ---
105
+
106
+ <h2 id="example">📚 Usage example</h2>
107
+
108
+ This example shows how to use the **CSMTTS** template to convert text into speech and save it to disk.
109
+
110
+ <details>
111
+ <summary><strong><span style="font-size: 1.2em;">Agent config</span></strong></summary>
112
+
113
+ ```yaml
114
+ agent:
+   name: csm_tts_agent
+   description: Agent that synthesizes speech from text using the CSM model.
+
+ templates:
+ - template_name: InputTemplate
+   class_name: InputTemplate
+   attributes: {}
+
+ - template_name: TextInput
+   class_name: TextInput
+   template_input: InputTemplate
+   attributes:
+     text: "Hi, my name is Taylor and this is Sinapsis"
+
+ - template_name: CSMTTS
+   class_name: CSMTTS
+   template_input: TextInput
+   attributes:
+     speaker_id: 0
+     max_audio_length_ms: 10000
+     device: cpu
+     context: null
+     sample_rate_hz: 24000
+
+ - template_name: AudioWriterSoundFile
+   class_name: AudioWriterSoundFile
+   template_input: CSMTTS
+   attributes:
+     save_dir: csm_tts
+     extension: wav
145
+ ```
146
+
147
+ </details>
148
+
149
+ To run the config, use:
150
+
151
+ ```bash
152
+ sinapsis run packages/sinapsis_csm/src/sinapsis_csm/configs/csm_agent.yml
153
+ ```
154
+
155
+ > [!NOTE]
156
+ > The `TextInput` and `AudioWriterSoundFile` templates come from the [sinapsis-data-readers](https://github.com/Sinapsis-AI/sinapsis-data-tools) and [sinapsis-data-writers](https://github.com/Sinapsis-AI/sinapsis-data-tools) packages. Make sure they are installed to use this example.
157
+
158
+ ---
159
+
160
+
161
+ <h2 id="webapp">🌐 Webapp</h2>
162
+ The webapp included in this project showcases the modularity of the CSM template for speech generation tasks.
163
+
164
+ > [!IMPORTANT]
165
+ > To run the app you first need to clone this repository:
166
+
167
+ ```bash
168
+ git clone git@github.com:Sinapsis-ai/sinapsis-speech.git
169
+ cd sinapsis-speech
170
+ ```
171
+
172
+ > [!NOTE]
173
+ > If you'd like to enable external app sharing in Gradio, `export GRADIO_SHARE_APP=True`
174
+
175
+
176
+ <details>
177
+ <summary id="docker"><strong><span style="font-size: 1.4em;">🐳 Docker</span></strong></summary>
178
+
179
+ **IMPORTANT**: This Docker image depends on the `sinapsis-nvidia:base` image. Refer to the official [sinapsis](https://github.com/Sinapsis-ai/sinapsis?tab=readme-ov-file#docker) instructions to build that base image with Docker.
180
+
181
+ 1. **Build the sinapsis-speech image**:
182
+ ```bash
183
+ docker compose -f docker/compose.yaml build
184
+ ```
185
+
186
+ 2. **Start the app container**:
187
+ ```bash
188
+ docker compose -f docker/compose_apps.yaml up -d sinapsis-csm
189
+ ```
190
+ 3. **Check the logs**:
191
+ ```bash
192
+ docker logs -f sinapsis-csm
193
+ ```
194
+ 4. **The logs will display the URL to access the webapp, e.g.**:
195
+ ```bash
196
+ Running on local URL: http://127.0.0.1:7860
197
+ ```
198
+
199
+ **NOTE**: The URL may be different; check the log output.
200
+
201
+ 5. **To stop the app**:
202
+ ```bash
203
+ docker compose -f docker/compose_apps.yaml down
204
+ ```
205
+ </details>
206
+
207
+ <details>
208
+ <summary id="virtual-environment"><strong><span style="font-size: 1.4em;">💻 UV</span></strong></summary>
209
+
210
+ To run the webapp using the <code>uv</code> package manager, follow these steps:
211
+
212
+ 1. **Sync the virtual environment**:
213
+
214
+ ```bash
215
+ uv sync --frozen
216
+ ```
217
+ 2. **Install the wheel**:
218
+
219
+ ```bash
220
+ uv pip install sinapsis-speech[all] --extra-index-url https://pypi.sinapsis.tech
221
+ ```
222
+ 3. **Run the webapp**:
223
+
224
+ ```bash
225
+ uv run webapps/packet_tts_apps/csm_tts_app.py
226
+ ```
227
+ 4. **The terminal will display the URL to access the webapp, e.g.**:
228
+ ```bash
229
+ Running on local URL: http://127.0.0.1:7860
230
+ ```
231
+ **NOTE**: The URL may vary; check the terminal output for the correct address.
232
+
233
+ </details>
234
+
235
+
236
+
237
+ <h2 id="documentation">📙 Documentation</h2>
238
+
239
+ Documentation is available on the [Sinapsis website](https://docs.sinapsis.tech/docs).
240
+
241
+ Tutorials and guides for different templates and agents are available at [docs.sinapsis.tech/tutorials](https://docs.sinapsis.tech/tutorials).
242
+
243
+ ---
244
+
245
+ <h2 id="license">🔍 License</h2>
246
+
247
+ This project is licensed under the AGPLv3 license, which encourages open collaboration and sharing. For more details, please refer to the [LICENSE](LICENSE) file.
248
+
249
+ For commercial use, please refer to our [official Sinapsis website](https://sinapsis.tech) for information on obtaining a commercial license.
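The installation notes in this README require an `HF_TOKEN` environment variable because the `sesame/csm-1b` and `meta-llama/Llama-3.2-1B` checkpoints need authorized access. A minimal preflight sketch follows. `require_hf_token` is a hypothetical helper, not part of sinapsis-csm, and it only checks that the variable is set, not that the token is valid or has been granted access to either model.

```python
import os
from collections.abc import Mapping


def require_hf_token(env: Mapping[str, str] = os.environ) -> str:
    """Return the HF_TOKEN value, or raise a clear error if it is missing or blank."""
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; export it before running the agent.")
    return token
```

Calling this before `sinapsis run` would surface a missing token immediately instead of partway through a model download.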
@@ -0,0 +1,230 @@
1
+ <h1 align="center">
2
+ <br>
3
+ <a href="https://sinapsis.tech/">
4
+ <img
5
+ src="https://github.com/Sinapsis-AI/brand-resources/blob/main/sinapsis_logo/4x/logo.png?raw=true"
6
+ alt="" width="300">
7
+ </a><br>
8
+ Sinapsis CSM
9
+ <br>
10
+ </h1>
11
+
12
+ <p align="center">
13
+ <a href="#installation">🐍 Installation</a> •
14
+ <a href="#features">🚀 Features</a> •
15
+ <a href="#example">📚 Usage example</a> •
16
+ <a href="#documentation">📙 Documentation</a> •
17
+ <a href="#license">🔍 License</a>
18
+ </p>
19
+
20
+ This **Sinapsis CSM** package integrates a lightweight, efficient text-to-speech engine using the CSM model. It provides a simple template to convert input text into speech using Sinapsis.
21
+
22
+ ---
23
+
24
+ <h2 id="installation">🐍 Installation</h2>
25
+
26
+ > [!IMPORTANT]
27
+ > Sinapsis project requires Python 3.10 or higher.
28
+
29
+ Install using your preferred package manager. We strongly recommend using <code>uv</code>. To install <code>uv</code>, refer to the [official documentation](https://docs.astral.sh/uv/getting-started/installation/#installation-methods).
30
+
31
+ Install with <code>uv</code>:
32
+ ```bash
33
+ uv pip install sinapsis-csm --extra-index-url https://pypi.sinapsis.tech
34
+ ```
35
+
36
+ Or with raw <code>pip</code>:
37
+ ```bash
38
+ pip install sinapsis-csm --extra-index-url https://pypi.sinapsis.tech
39
+ ```
40
+
41
+ > [!IMPORTANT]
42
+ > Templates in each package may require additional dependencies. For development, we recommend installing the package with all the optional dependencies:
43
+
44
+ With <code>uv</code>:
45
+ ```bash
46
+ uv pip install sinapsis-csm[all] --extra-index-url https://pypi.sinapsis.tech
47
+ ```
48
+
49
+ Or with raw <code>pip</code>:
50
+ ```bash
51
+ pip install sinapsis-csm[all] --extra-index-url https://pypi.sinapsis.tech
52
+ ```
53
+
54
+ To run this package you need a HuggingFace token. See the [official instructions](https://huggingface.co/docs/hub/security-tokens)
55
+ and set it using
56
+ ```bash
57
+ export HF_TOKEN=<token-provided-by-hf>
58
+ ```
59
+
60
+ and test it through the CLI or the webapp.
61
+
62
+ Access to the following models is needed:
63
+
64
+ * [Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B)
65
+ * [CSM-1B](https://huggingface.co/sesame/csm-1b)
66
+
67
+ ---
68
+
69
+ <h2 id="features">🚀 Features</h2>
70
+
71
+ <h3>Templates Supported</h3>
72
+
73
+ - **CSMTTS**: Converts text into speech using the CSM model.
74
+
75
+ <details>
76
+ <summary>Attributes</summary>
77
+
78
+ - `speaker_id` (int, default: 0): Speaker identity index.
79
+ - `max_audio_length_ms` (int, default: 10000): Max audio length in milliseconds.
80
+ - `device` ("cpu" or "cuda", default: "cpu"): Device used for inference.
81
+ - `context` (list[str] | None, default: None): Optional list of past utterances for context.
82
+ - `sample_rate_hz` (int, default: 24000): Output audio sample rate.
83
+ </details>
84
+
85
+ ---
86
+
87
+ <h2 id="example">📚 Usage example</h2>
88
+
89
+ This example shows how to use the **CSMTTS** template to convert text into speech and save it to disk.
90
+
91
+ <details>
92
+ <summary><strong><span style="font-size: 1.2em;">Agent config</span></strong></summary>
93
+
94
+ ```yaml
95
+ agent:
+   name: csm_tts_agent
+   description: Agent that synthesizes speech from text using the CSM model.
+
+ templates:
+ - template_name: InputTemplate
+   class_name: InputTemplate
+   attributes: {}
+
+ - template_name: TextInput
+   class_name: TextInput
+   template_input: InputTemplate
+   attributes:
+     text: "Hi, my name is Taylor and this is Sinapsis"
+
+ - template_name: CSMTTS
+   class_name: CSMTTS
+   template_input: TextInput
+   attributes:
+     speaker_id: 0
+     max_audio_length_ms: 10000
+     device: cpu
+     context: null
+     sample_rate_hz: 24000
+
+ - template_name: AudioWriterSoundFile
+   class_name: AudioWriterSoundFile
+   template_input: CSMTTS
+   attributes:
+     save_dir: csm_tts
+     extension: wav
126
+ ```
127
+
128
+ </details>
129
+
130
+ To run the config, use:
131
+
132
+ ```bash
133
+ sinapsis run packages/sinapsis_csm/src/sinapsis_csm/configs/csm_agent.yml
134
+ ```
135
+
136
+ > [!NOTE]
137
+ > The `TextInput` and `AudioWriterSoundFile` templates come from the [sinapsis-data-readers](https://github.com/Sinapsis-AI/sinapsis-data-tools) and [sinapsis-data-writers](https://github.com/Sinapsis-AI/sinapsis-data-tools) packages. Make sure they are installed to use this example.
138
+
139
+ ---
140
+
141
+
142
+ <h2 id="webapp">🌐 Webapp</h2>
143
+ The webapp included in this project showcases the modularity of the CSM template for speech generation tasks.
144
+
145
+ > [!IMPORTANT]
146
+ > To run the app you first need to clone this repository:
147
+
148
+ ```bash
149
+ git clone git@github.com:Sinapsis-ai/sinapsis-speech.git
150
+ cd sinapsis-speech
151
+ ```
152
+
153
+ > [!NOTE]
154
+ > If you'd like to enable external app sharing in Gradio, `export GRADIO_SHARE_APP=True`
155
+
156
+
157
+ <details>
158
+ <summary id="docker"><strong><span style="font-size: 1.4em;">🐳 Docker</span></strong></summary>
159
+
160
+ **IMPORTANT**: This Docker image depends on the `sinapsis-nvidia:base` image. Refer to the official [sinapsis](https://github.com/Sinapsis-ai/sinapsis?tab=readme-ov-file#docker) instructions to build that base image with Docker.
161
+
162
+ 1. **Build the sinapsis-speech image**:
163
+ ```bash
164
+ docker compose -f docker/compose.yaml build
165
+ ```
166
+
167
+ 2. **Start the app container**:
168
+ ```bash
169
+ docker compose -f docker/compose_apps.yaml up -d sinapsis-csm
170
+ ```
171
+ 3. **Check the logs**:
172
+ ```bash
173
+ docker logs -f sinapsis-csm
174
+ ```
175
+ 4. **The logs will display the URL to access the webapp, e.g.**:
176
+ ```bash
177
+ Running on local URL: http://127.0.0.1:7860
178
+ ```
179
+
180
+ **NOTE**: The URL may be different; check the log output.
181
+
182
+ 5. **To stop the app**:
183
+ ```bash
184
+ docker compose -f docker/compose_apps.yaml down
185
+ ```
186
+ </details>
187
+
188
+ <details>
189
+ <summary id="virtual-environment"><strong><span style="font-size: 1.4em;">💻 UV</span></strong></summary>
190
+
191
+ To run the webapp using the <code>uv</code> package manager, follow these steps:
192
+
193
+ 1. **Sync the virtual environment**:
194
+
195
+ ```bash
196
+ uv sync --frozen
197
+ ```
198
+ 2. **Install the wheel**:
199
+
200
+ ```bash
201
+ uv pip install sinapsis-speech[all] --extra-index-url https://pypi.sinapsis.tech
202
+ ```
203
+ 3. **Run the webapp**:
204
+
205
+ ```bash
206
+ uv run webapps/packet_tts_apps/csm_tts_app.py
207
+ ```
208
+ 4. **The terminal will display the URL to access the webapp, e.g.**:
209
+ ```bash
210
+ Running on local URL: http://127.0.0.1:7860
211
+ ```
212
+ **NOTE**: The URL may vary; check the terminal output for the correct address.
213
+
214
+ </details>
215
+
216
+
217
+
218
+ <h2 id="documentation">📙 Documentation</h2>
219
+
220
+ Documentation is available on the [Sinapsis website](https://docs.sinapsis.tech/docs).
221
+
222
+ Tutorials and guides for different templates and agents are available at [docs.sinapsis.tech/tutorials](https://docs.sinapsis.tech/tutorials).
223
+
224
+ ---
225
+
226
+ <h2 id="license">🔍 License</h2>
227
+
228
+ This project is licensed under the AGPLv3 license, which encourages open collaboration and sharing. For more details, please refer to the [LICENSE](LICENSE) file.
229
+
230
+ For commercial use, please refer to our [official Sinapsis website](https://sinapsis.tech) for information on obtaining a commercial license.
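The agent config in the README's usage example chains four templates through `template_input`. The sketch below mirrors that config as a plain Python dict and checks that every `template_input` names a template declared earlier in the list. The field names come from the README; `find_dangling_inputs` is a hypothetical helper, and treating declaration order as significant is an assumption of this sketch rather than a documented Sinapsis rule.

```python
# Plain-dict mirror of the README's YAML agent config.
AGENT_CONFIG = {
    "agent": {"name": "csm_tts_agent"},
    "templates": [
        {"template_name": "InputTemplate", "class_name": "InputTemplate", "attributes": {}},
        {
            "template_name": "TextInput",
            "class_name": "TextInput",
            "template_input": "InputTemplate",
            "attributes": {"text": "Hi, my name is Taylor and this is Sinapsis"},
        },
        {
            "template_name": "CSMTTS",
            "class_name": "CSMTTS",
            "template_input": "TextInput",
            "attributes": {
                "speaker_id": 0,
                "max_audio_length_ms": 10000,
                "device": "cpu",
                "context": None,
                "sample_rate_hz": 24000,
            },
        },
        {
            "template_name": "AudioWriterSoundFile",
            "class_name": "AudioWriterSoundFile",
            "template_input": "CSMTTS",
            "attributes": {"save_dir": "csm_tts", "extension": "wav"},
        },
    ],
}


def find_dangling_inputs(config: dict) -> list[str]:
    """Return names of templates whose template_input was not declared earlier."""
    seen: set[str] = set()
    dangling: list[str] = []
    for template in config["templates"]:
        upstream = template.get("template_input")
        if upstream is not None and upstream not in seen:
            dangling.append(template["template_name"])
        seen.add(template["template_name"])
    return dangling
```

For the config as written, the helper returns an empty list; renaming `TextInput` without updating the `CSMTTS` entry would make it report `CSMTTS`.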
@@ -0,0 +1,65 @@
1
+ [project]
2
+ name = "sinapsis-csm"
3
+ version = "0.1.0"
4
+ description = "Text to speech using CSM TTS model"
5
+ readme = "README.md"
6
+ requires-python = ">=3.10"
7
+ authors = [
8
+ {name = "SinapsisAI", email = "dev@sinapsis.tech"},
9
+ ]
10
+ license-files = ["LICENSE"]
11
+ dependencies = [
12
+ "silentcipher",
13
+ "csm",
14
+ ]
15
+
16
+ [build-system]
17
+ requires = ["setuptools", "wheel"]
18
+ build-backend = "setuptools.build_meta"
19
+
20
+ [tool.uv.sources]
21
+ csm = { git = "https://github.com/Natalia-OsorioClavijo/csm.git" }
22
+ sinapsis-csm = { workspace = true }
23
+ silentcipher = { git = "https://github.com/SesameAILabs/silentcipher", rev = "master" }
24
+
25
+
26
+ [tool.ruff]
27
+ lint.select = [
28
+ "ARG",
29
+ "ANN",
30
+ "BLE",
31
+ "C4",
32
+ "E",
33
+ "F",
34
+ "FIX",
35
+ "FLY",
36
+ "I",
37
+ "PERF",
38
+ "PIE",
39
+ "RUF",
40
+ "RSE",
41
+ "SIM",
42
+ "SLOT",
43
+ "T10",
44
+ "T20",
45
+ "TD",
46
+ "TID",
47
+ ]
48
+ lint.ignore = ['ANN401']
49
+ line-length = 120
50
+ show-fixes = true
51
+
52
+ [project.urls]
53
+ Homepage = "https://sinapsis.tech"
54
+ Documentation = "https://docs.sinapsis.tech/docs"
55
+ Tutorials = "https://docs.sinapsis.tech/tutorials"
56
+ Repository = "https://github.com/Sinapsis-AI/sinapsis-speech.git"
57
+
58
+ [project.optional-dependencies]
59
+ data-tools = [
60
+ "sinapsis-data-readers[all]>=0.1.2",
61
+ "sinapsis-data-writers[soundfile]>=0.1.2",
62
+ ]
63
+
64
+ all = [
65
+ "sinapsis-csm[data-tools]"]
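The `all` extra above points back at the project itself (`sinapsis-csm[data-tools]`), so installing `sinapsis-csm[all]` ends up pulling in the two `data-tools` requirements. The sketch below illustrates that expansion on a dict copied from the `[project.optional-dependencies]` table. `expand_extra` is a hypothetical helper written for illustration; it is not how pip or uv resolve extras, and it has no protection against circular references.

```python
import re

PROJECT_NAME = "sinapsis-csm"

# Copied from [project.optional-dependencies] in pyproject.toml.
OPTIONAL_DEPENDENCIES = {
    "data-tools": [
        "sinapsis-data-readers[all]>=0.1.2",
        "sinapsis-data-writers[soundfile]>=0.1.2",
    ],
    "all": ["sinapsis-csm[data-tools]"],
}

# Matches a requirement that refers back to this project, e.g. "sinapsis-csm[data-tools]".
_SELF_REF = re.compile(rf"^{re.escape(PROJECT_NAME)}\[([^\]]+)\]$")


def expand_extra(extra: str, table: dict[str, list[str]] = OPTIONAL_DEPENDENCIES) -> list[str]:
    """Flatten an extra, following references back to the project itself."""
    resolved: list[str] = []
    for requirement in table[extra]:
        match = _SELF_REF.match(requirement)
        if match:
            # A self-reference may list several extras separated by commas.
            for nested in match.group(1).split(","):
                resolved.extend(expand_extra(nested.strip(), table))
        else:
            resolved.append(requirement)
    return resolved
```

Under these assumptions, `expand_extra("all")` and `expand_extra("data-tools")` produce the same two requirement strings.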
@@ -0,0 +1,4 @@
1
+ [egg_info]
2
+ tag_build =
3
+ tag_date = 0
4
+
File without changes
@@ -0,0 +1,43 @@
+ # -*- coding: utf-8 -*-
+ from typing import Literal
+
+ import torch
+ from csm.generator import Generator
+ from csm.models import Model
+
+
+ class CSMGenerator:
+     """
+     Wrapper around the CSM model providing a simple interface
+     for text-to-speech generation
+     """
+
+     def __init__(self, device: Literal["cpu", "cuda"] = "cpu", sample_rate: int = 24000) -> None:
+         self.device: str = device
+         self.sample_rate: int = sample_rate
+         self.model: Model = Model.from_pretrained("sesame/csm-1b")
+         self.model.to(device=device)
+         self.model.sample_rate = sample_rate
+         self.generator = Generator(self.model)
+
+     def generate(
+         self, text: str, speaker: int = 0, context: list[str] | None = None, max_audio_length_ms: int = 10000
+     ) -> torch.Tensor:
+         if context is None:
+             context = []
+         return self.generator.generate(
+             text=text,
+             speaker=speaker,
+             context=context,
+             max_audio_length_ms=max_audio_length_ms,
+         )
+
+
+ def load_csm_1b(device: Literal["cpu", "cuda"] = "cpu", sample_rate: int = 24000) -> CSMGenerator:
+     """
+     Loads and configures the CSM TTS model.
+
+     Returns:
+         CSMGenerator: Model wrapper with ready-to-use generate method.
+     """
+     return CSMGenerator(device=device, sample_rate=sample_rate)
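The wrapper's two numeric defaults, `max_audio_length_ms=10000` and `sample_rate=24000`, together bound the length of the returned waveform. The sketch below shows only that arithmetic. `max_samples` and `duration_ms` are hypothetical helpers, and how the CSM generator enforces the cap internally is not shown in this diff.

```python
def max_samples(max_audio_length_ms: int, sample_rate_hz: int) -> int:
    """Upper bound on waveform length, in samples, for a given duration cap."""
    if max_audio_length_ms < 0 or sample_rate_hz <= 0:
        raise ValueError("duration must be >= 0 and sample rate must be > 0")
    return max_audio_length_ms * sample_rate_hz // 1000


def duration_ms(num_samples: int, sample_rate_hz: int) -> float:
    """Duration in milliseconds of a waveform with num_samples samples."""
    if sample_rate_hz <= 0:
        raise ValueError("sample rate must be > 0")
    return 1000 * num_samples / sample_rate_hz
```

With the defaults, 10000 ms at 24000 Hz corresponds to at most 240000 samples.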
@@ -0,0 +1,19 @@
+ import importlib
+ from typing import Callable
+ from sinapsis_csm.templates.csm_tts import CSMTTS
+
+ _root_lib_path = "sinapsis_csm.templates"
+ _template_lookup = {
+     "CSMTTS": f"{_root_lib_path}.csm_tts",
+ }
+
+ def __getattr__(name: str) -> Callable:
+     if name in _template_lookup:
+         module = importlib.import_module(_template_lookup[name])
+         return getattr(module, name)
+     raise AttributeError(f"Template `{name}` not found in `{_root_lib_path}`.")
+
+
+ __all__ = ["CSMTTS"]
+
+
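The module-level `__getattr__` above follows the lazy-attribute pattern from PEP 562: a name is resolved to its defining module only when first accessed. Note that this file also imports `CSMTTS` eagerly on its third line, so for that name the hook is never reached. The sketch below reproduces the pattern on a synthetic module so it runs without Sinapsis installed. The lookup table and the `lazy_pkg` module are illustrative stand-ins that use standard-library names.

```python
import importlib
import types

# Hypothetical lookup table: public name -> module that defines it.
_LOOKUP = {
    "JSONDecoder": "json.decoder",
    "OrderedDict": "collections",
}


def _lazy_getattr(name: str):
    """Import the defining module on first access and return the attribute."""
    if name in _LOOKUP:
        module = importlib.import_module(_LOOKUP[name])
        return getattr(module, name)
    raise AttributeError(f"`{name}` not found in lazy_pkg.")


# A synthetic module; placing __getattr__ in its namespace enables the PEP 562 hook.
lazy_pkg = types.ModuleType("lazy_pkg")
lazy_pkg.__getattr__ = _lazy_getattr
```

Accessing `lazy_pkg.OrderedDict` triggers the import of `collections`, while an unknown name raises `AttributeError`, which keeps `hasattr` and `from ... import` behaving as callers expect.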
@@ -0,0 +1,88 @@
+ from typing import Literal
+ import torch
+ from sinapsis_core.data_containers.data_packet import AudioPacket, DataContainer
+ from sinapsis_core.template_base import Template
+ from sinapsis_core.template_base.base_models import TemplateAttributes, TemplateAttributeType
+ from sinapsis_csm.helpers.generator import load_csm_1b
+
+
+ class CSMTTS(Template):
+     """
+     Sinapsis template for converting text into speech using the CSM TTS model.
+     """
+
+     class AttributesBaseModel(TemplateAttributes):  # type: ignore
+         """
+         Defines configurable attributes for the CSMTTS template.
+         """
+         speaker_id: int = 0
+         max_audio_length_ms: int = 10000
+         device: Literal["cuda", "cpu"] = "cpu"
+         context: list[str] | None = None
+         sample_rate_hz: int = 24000
+
+     def __init__(self, attributes: TemplateAttributeType) -> None:
+         """
+         Initializes the template and loads the CSM model.
+
+         Args:
+             attributes (TemplateAttributeType): User-defined attributes from YAML configuration.
+         """
+         super().__init__(attributes)
+         self.model = load_csm_1b(
+             device=self.attributes.device,
+             sample_rate=self.attributes.sample_rate_hz
+         )
+
+     def generate_audio(self, text: str) -> torch.Tensor:
+         """
+         Converts input text to audio using the CSM model.
+
+         Args:
+             text (str): Input text string.
+
+         Returns:
+             torch.Tensor: Audio waveform tensor.
+         """
+         context = self.attributes.context if self.attributes.context else []
+         return self.model.generate(
+             text=text,
+             speaker=self.attributes.speaker_id,
+             context=context,
+             max_audio_length_ms=self.attributes.max_audio_length_ms,
+         )
+
+     def generate_audio_packet(self, audio: torch.Tensor, source_text: str) -> AudioPacket:
+         """
+         Wraps a raw audio tensor into a Sinapsis-compatible AudioPacket.
+
+         Args:
+             audio (torch.Tensor): Audio waveform.
+             source_text (str): Original input text used for generation.
+
+         Returns:
+             AudioPacket: Encapsulated audio data with metadata.
+         """
+         audio_np = audio.cpu().numpy()
+         return AudioPacket(
+             content=audio_np,
+             sample_rate=self.attributes.sample_rate_hz,
+             generic_data={"source_text": source_text, "model": "CSM"}
+         )
+
+     def execute(self, container: DataContainer) -> DataContainer:
+         """
+         Main method executed by Sinapsis. Converts all text packets in the input container to audio.
+
+         Args:
+             container (DataContainer): Input container with text packets.
+
+         Returns:
+             DataContainer: Output container with generated audio packets.
+         """
+         for packet in container.texts:
+             audio = self.generate_audio(packet.content)
+             audio_packet = self.generate_audio_packet(audio, packet.content)
+             audio_packet.source = self.instance_name
+             container.audios.append(audio_packet)
+         return container
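The `execute` method above walks every text packet in the container, synthesizes audio for it, and appends one audio packet per text while leaving the texts in place. The sketch below reproduces that data flow with standard-library stand-ins so it runs without `sinapsis_core`, `torch`, or the CSM weights. Every class and function in it (`FakeTextPacket`, `FakeAudioPacket`, `FakeContainer`, `fake_synthesize`, `execute`) is a hypothetical substitute, and the "audio" is just a list of zeros.

```python
from dataclasses import dataclass, field


@dataclass
class FakeTextPacket:
    """Stand-in for a Sinapsis text packet."""
    content: str


@dataclass
class FakeAudioPacket:
    """Stand-in for AudioPacket, with the fields the template sets."""
    content: list[float]
    sample_rate: int
    source: str = ""
    generic_data: dict = field(default_factory=dict)


@dataclass
class FakeContainer:
    """Stand-in for DataContainer with only the two lists the template touches."""
    texts: list[FakeTextPacket] = field(default_factory=list)
    audios: list[FakeAudioPacket] = field(default_factory=list)


def fake_synthesize(text: str) -> list[float]:
    """Stand-in for the model call: one silent sample per input character."""
    return [0.0] * len(text)


def execute(container: FakeContainer, instance_name: str = "CSMTTS", sample_rate_hz: int = 24000) -> FakeContainer:
    """Append one audio packet per text packet, tagging it with its source text."""
    for packet in container.texts:
        audio = fake_synthesize(packet.content)
        container.audios.append(
            FakeAudioPacket(
                content=audio,
                sample_rate=sample_rate_hz,
                source=instance_name,
                generic_data={"source_text": packet.content, "model": "CSM"},
            )
        )
    return container
```

Because the text packets are kept, a downstream template can still read the original strings alongside the generated audio.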
@@ -0,0 +1,249 @@
1
+ Metadata-Version: 2.4
2
+ Name: sinapsis-csm
3
+ Version: 0.1.0
4
+ Summary: Text to speech using CSM TTS model
5
+ Author-email: SinapsisAI <dev@sinapsis.tech>
6
+ Project-URL: Homepage, https://sinapsis.tech
7
+ Project-URL: Documentation, https://docs.sinapsis.tech/docs
8
+ Project-URL: Tutorials, https://docs.sinapsis.tech/tutorials
9
+ Project-URL: Repository, https://github.com/Sinapsis-AI/sinapsis-speech.git
10
+ Requires-Python: >=3.10
11
+ Description-Content-Type: text/markdown
12
+ Requires-Dist: silentcipher
13
+ Requires-Dist: csm
14
+ Provides-Extra: data-tools
15
+ Requires-Dist: sinapsis-data-readers[all]>=0.1.2; extra == "data-tools"
16
+ Requires-Dist: sinapsis-data-writers[soundfile]>=0.1.2; extra == "data-tools"
17
+ Provides-Extra: all
18
+ Requires-Dist: sinapsis-csm[data-tools]; extra == "all"
19
+
20
+ <h1 align="center">
21
+ <br>
22
+ <a href="https://sinapsis.tech/">
23
+ <img
24
+ src="https://github.com/Sinapsis-AI/brand-resources/blob/main/sinapsis_logo/4x/logo.png?raw=true"
25
+ alt="" width="300">
26
+ </a><br>
27
+ Sinapsis CSM
28
+ <br>
29
+ </h1>
30
+
31
+ <p align="center">
32
+ <a href="#installation">🐍 Installation</a> •
33
+ <a href="#features">🚀 Features</a> •
34
+ <a href="#example">📚 Usage example</a> •
35
+ <a href="#documentation">📙 Documentation</a> •
36
+ <a href="#license">🔍 License</a>
37
+ </p>
38
+
39
+ This **Sinapsis CSM** package integrates a lightweight, efficient text-to-speech engine using the CSM model. It provides a simple template to convert input text into speech using Sinapsis.
40
+
41
+ ---
42
+
43
+ <h2 id="installation">🐍 Installation</h2>
44
+
45
+ > [!IMPORTANT]
46
+ > Sinapsis project requires Python 3.10 or higher.
47
+
48
+ Install using your preferred package manager. We strongly recommend using <code>uv</code>. To install <code>uv</code>, refer to the [official documentation](https://docs.astral.sh/uv/getting-started/installation/#installation-methods).
49
+
50
+ Install with <code>uv</code>:
51
+ ```bash
52
+ uv pip install sinapsis-csm --extra-index-url https://pypi.sinapsis.tech
53
+ ```
54
+
55
+ Or with raw <code>pip</code>:
56
+ ```bash
57
+ pip install sinapsis-csm --extra-index-url https://pypi.sinapsis.tech
58
+ ```
59
+
60
+ > [!IMPORTANT]
61
+ > Templates in each package may require additional dependencies. For development, we recommend installing the package with all the optional dependencies:
62
+
63
+ With <code>uv</code>:
64
+ ```bash
65
+ uv pip install sinapsis-csm[all] --extra-index-url https://pypi.sinapsis.tech
66
+ ```
67
+
68
+ Or with raw <code>pip</code>:
69
+ ```bash
70
+ pip install sinapsis-csm[all] --extra-index-url https://pypi.sinapsis.tech
71
+ ```
72
+
73
+ To run this package you need a HuggingFace token. See the [official instructions](https://huggingface.co/docs/hub/security-tokens)
74
+ and set it using
75
+ ```bash
76
+ export HF_TOKEN=<token-provided-by-hf>
77
+ ```
78
+
79
+ and test it through the CLI or the webapp.
80
+
81
+ Access to the following models is needed:
82
+
83
+ * [Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B)
84
+ * [CSM-1B](https://huggingface.co/sesame/csm-1b)
85
+
86
+ ---
87
+
88
+ <h2 id="features">🚀 Features</h2>
89
+
90
+ <h3>Templates Supported</h3>
91
+
92
+ - **CSMTTS**: Converts text into speech using the CSM model.
93
+
94
+ <details>
95
+ <summary>Attributes</summary>
96
+
97
+ - `speaker_id` (int, default: 0): Speaker identity index.
98
+ - `max_audio_length_ms` (int, default: 10000): Max audio length in milliseconds.
99
+ - `device` ("cpu" or "cuda", default: "cpu"): Device used for inference.
100
+ - `context` (list[str] | None, default: None): Optional list of past utterances for context.
101
+ - `sample_rate_hz` (int, default: 24000): Output audio sample rate.
102
+ </details>
103
+
104
+ ---
105
+
106
+ <h2 id="example">📚 Usage example</h2>
107
+
108
+ This example shows how to use the **CSMTTS** template to convert text into speech and save it to disk.
109
+
110
+ <details>
111
+ <summary><strong><span style="font-size: 1.2em;">Agent config</span></strong></summary>
112
+
113
+ ```yaml
114
+ agent:
+   name: csm_tts_agent
+   description: Agent that synthesizes speech from text using the CSM model.
+
+ templates:
+ - template_name: InputTemplate
+   class_name: InputTemplate
+   attributes: {}
+
+ - template_name: TextInput
+   class_name: TextInput
+   template_input: InputTemplate
+   attributes:
+     text: "Hi, my name is Taylor and this is Sinapsis"
+
+ - template_name: CSMTTS
+   class_name: CSMTTS
+   template_input: TextInput
+   attributes:
+     speaker_id: 0
+     max_audio_length_ms: 10000
+     device: cpu
+     context: null
+     sample_rate_hz: 24000
+
+ - template_name: AudioWriterSoundFile
+   class_name: AudioWriterSoundFile
+   template_input: CSMTTS
+   attributes:
+     save_dir: csm_tts
+     extension: wav
145
+ ```
146
+
147
+ </details>
148
+
149
+ To run the config, use:
150
+
151
+ ```bash
152
+ sinapsis run packages/sinapsis_csm/src/sinapsis_csm/configs/csm_agent.yml
153
+ ```
154
+
155
+ > [!NOTE]
156
+ > The `TextInput` and `AudioWriterSoundFile` templates come from the [sinapsis-data-readers](https://github.com/Sinapsis-AI/sinapsis-data-tools) and [sinapsis-data-writers](https://github.com/Sinapsis-AI/sinapsis-data-tools) packages. Make sure they are installed to use this example.
157
+
158
+ ---
159
+
160
+
161
+ <h2 id="webapp">🌐 Webapp</h2>
162
+ The webapp included in this project showcases the modularity of the CSM template for speech generation tasks.
163
+
164
+ > [!IMPORTANT]
165
+ > To run the app you first need to clone this repository:
166
+
167
+ ```bash
168
+ git clone git@github.com:Sinapsis-ai/sinapsis-speech.git
169
+ cd sinapsis-speech
170
+ ```
171
+
172
+ > [!NOTE]
173
+ > If you'd like to enable external app sharing in Gradio, `export GRADIO_SHARE_APP=True`
174
+
175
+
176
+ <details>
177
+ <summary id="docker"><strong><span style="font-size: 1.4em;">🐳 Docker</span></strong></summary>
178
+
179
+ **IMPORTANT**: This Docker image depends on the `sinapsis-nvidia:base` image. Refer to the official [sinapsis](https://github.com/Sinapsis-ai/sinapsis?tab=readme-ov-file#docker) instructions to build that base image with Docker.
180
+
181
+ 1. **Build the sinapsis-speech image**:
182
+ ```bash
183
+ docker compose -f docker/compose.yaml build
184
+ ```
185
+
186
+ 2. **Start the app container**:
187
+ ```bash
188
+ docker compose -f docker/compose_apps.yaml up -d sinapsis-csm
189
+ ```
190
+ 3. **Check the logs**:
191
+ ```bash
192
+ docker logs -f sinapsis-csm
193
+ ```
194
+ 4. **The logs will display the URL to access the webapp, e.g.**:
195
+ ```bash
196
+ Running on local URL: http://127.0.0.1:7860
197
+ ```
198
+
199
+ **NOTE**: The URL may be different; check the log output.
200
+
201
+ 5. **To stop the app**:
202
+ ```bash
203
+ docker compose -f docker/compose_apps.yaml down
204
+ ```
205
+ </details>
206
+
207
+ <details>
208
+ <summary id="virtual-environment"><strong><span style="font-size: 1.4em;">💻 UV</span></strong></summary>
209
+
210
+ To run the webapp using the <code>uv</code> package manager, follow these steps:
211
+
212
+ 1. **Sync the virtual environment**:
213
+
214
+ ```bash
215
+ uv sync --frozen
216
+ ```
217
+ 2. **Install the wheel**:
218
+
219
+ ```bash
220
+ uv pip install sinapsis-speech[all] --extra-index-url https://pypi.sinapsis.tech
221
+ ```
222
+ 3. **Run the webapp**:
223
+
224
+ ```bash
225
+ uv run webapps/packet_tts_apps/csm_tts_app.py
226
+ ```
227
+ 4. **The terminal will display the URL to access the webapp, e.g.**:
228
+ ```bash
229
+ Running on local URL: http://127.0.0.1:7860
230
+ ```
231
+ **NOTE**: The URL may vary; check the terminal output for the correct address.
232
+
233
+ </details>
234
+
235
+
236
+
237
+ <h2 id="documentation">📙 Documentation</h2>
238
+
239
+ Documentation is available on the [Sinapsis website](https://docs.sinapsis.tech/docs).
240
+
241
+ Tutorials and guides for different templates and agents are available at [docs.sinapsis.tech/tutorials](https://docs.sinapsis.tech/tutorials).
242
+
243
+ ---
244
+
245
+ <h2 id="license">🔍 License</h2>
246
+
247
+ This project is licensed under the AGPLv3 license, which encourages open collaboration and sharing. For more details, please refer to the [LICENSE](LICENSE) file.
248
+
249
+ For commercial use, please refer to our [official Sinapsis website](https://sinapsis.tech) for information on obtaining a commercial license.
@@ -0,0 +1,11 @@
1
+ README.md
2
+ pyproject.toml
3
+ src/sinapsis_csm/__init__.py
4
+ src/sinapsis_csm.egg-info/PKG-INFO
5
+ src/sinapsis_csm.egg-info/SOURCES.txt
6
+ src/sinapsis_csm.egg-info/dependency_links.txt
7
+ src/sinapsis_csm.egg-info/requires.txt
8
+ src/sinapsis_csm.egg-info/top_level.txt
9
+ src/sinapsis_csm/helpers/generator.py
10
+ src/sinapsis_csm/templates/__init__.py
11
+ src/sinapsis_csm/templates/csm_tts.py
@@ -0,0 +1,9 @@
1
+ silentcipher
2
+ csm
3
+
4
+ [all]
5
+ sinapsis-csm[data-tools]
6
+
7
+ [data-tools]
8
+ sinapsis-data-readers[all]>=0.1.2
9
+ sinapsis-data-writers[soundfile]>=0.1.2
@@ -0,0 +1 @@
1
+ sinapsis_csm