neutts 0.1.0__py3-none-any.whl → 0.1.1__py3-none-any.whl

@@ -0,0 +1,288 @@
1
+ Metadata-Version: 2.4
2
+ Name: neutts
3
+ Version: 0.1.1
4
+ Summary: NeuTTS - a package for text-to-speech generation using Neuphonics TTS models.
5
+ Author-email: neuphonic <general@neuphonic.com>
6
+ Requires-Python: >=3.9
7
+ Description-Content-Type: text/markdown
8
+ License-File: LICENSE
9
+ Requires-Dist: librosa==0.11.0
10
+ Requires-Dist: neucodec>=0.0.4
11
+ Requires-Dist: numpy~=2.2.6
12
+ Requires-Dist: phonemizer>=3.0.0
13
+ Requires-Dist: resemble-perth==1.0.1
14
+ Requires-Dist: soundfile==0.13.1
15
+ Requires-Dist: torch>=2.8.0
16
+ Requires-Dist: transformers~=4.56.1
17
+ Provides-Extra: llama
18
+ Requires-Dist: llama-cpp-python; extra == "llama"
19
+ Provides-Extra: onnx
20
+ Requires-Dist: onnxruntime; extra == "onnx"
21
+ Provides-Extra: all
22
+ Requires-Dist: llama-cpp-python; extra == "all"
23
+ Requires-Dist: onnxruntime; extra == "all"
24
+ Dynamic: license-file
25
+
26
+ # NeuTTS
27
+
28
+ HuggingFace 🤗:
29
+
30
+ - NeuTTS-Air: [Model](https://huggingface.co/neuphonic/neutts-air), [Q8 GGUF](https://huggingface.co/neuphonic/neutts-air-q8-gguf), [Q4 GGUF](https://huggingface.co/neuphonic/neutts-air-q4-gguf), [Spaces](https://huggingface.co/spaces/neuphonic/neutts-air)
31
+ - NeuTTS-Nano: [Model](https://huggingface.co/neuphonic/neutts-nano), [Q8 GGUF](https://huggingface.co/neuphonic/neutts-nano-q8-gguf), [Q4 GGUF](https://huggingface.co/neuphonic/neutts-nano-q4-gguf), [Spaces](https://huggingface.co/spaces/neuphonic/neutts-nano)
32
+
33
+
34
+ [NeuTTS-Nano Demo Video](https://github.com/user-attachments/assets/629ec5b2-4818-4fa6-987a-99fcbadc56bc)
35
+
36
+ _Created by [Neuphonic](http://neuphonic.com/) - building faster, smaller, on-device voice AI_
37
+
38
+ State-of-the-art Voice AI has been locked behind web APIs for too long. NeuTTS is a collection of open source, on-device, TTS speech language models with instant voice cloning. Built off of LLM backbones, NeuTTS brings natural-sounding speech, real-time performance, built-in security and speaker cloning to your local device - unlocking a new category of embedded voice agents, assistants, toys, and compliance-safe apps.
39
+
40
+ ## Key Features
41
+
42
+ - 🗣Best-in-class realism for their size - produce natural, ultra-realistic voices that sound human, at the sweet spot between speed, size, and quality for real-world applications
43
+ - 📱Optimised for on-device deployment - provided in GGML format, ready to run on phones, laptops, or even Raspberry Pis
44
+ - 👫Instant voice cloning - create your own speaker with as little as 3 seconds of audio
45
+ - 🚄Simple LM + codec architecture - making development and deployment simple
46
+
47
+ > [!CAUTION]
48
+ > Websites like neutts.com are popping up and they're not affiliated with Neuphonic, our GitHub or this repo.
49
+ >
50
+ > We are on neuphonic.com only. Please be careful out there! 🙏
51
+
52
+ ## Model Details
53
+
54
+
55
+
56
+ NeuTTS models are built from small LLM backbones - lightweight yet capable language models optimised for text understanding and generation - as well as a powerful combination of technologies designed for efficiency and quality:
57
+
58
+ - **Supported Languages**: English
59
+ - **Audio Codec**: [NeuCodec](https://huggingface.co/neuphonic/neucodec) - our 50 Hz neural audio codec that achieves exceptional audio quality at low bitrates using a single codebook
60
+ - **Context Window**: 2048 tokens, enough for processing ~30 seconds of audio (including prompt duration)
61
+ - **Format**: Available in GGML format for efficient on-device inference
62
+ - **Responsibility**: Watermarked outputs
63
+ - **Inference Speed**: Real-time generation on mid-range devices
64
+ - **Power Consumption**: Optimised for mobile and embedded devices
65
+
66
+
67
+ | | NeuTTSAir | NeuTTSNano |
68
+ |---|---:|---:|
69
+ | **# Params (Active)** | ~360M | ~120M |
70
+ | **# Params (Emb + Active)** | ~552M | ~229M |
71
+ | **Cloning** | Yes | Yes |
72
+ | **License** | Apache 2.0 | NeuTTS Open License 1.0 |
73
+
74
+ ## Throughput Benchmarking
75
+
76
+ The two models were benchmarked using the Q4 quantisations [neutts-air-Q4-0](https://huggingface.co/neuphonic/neutts-air-q4-gguf) and [neutts-nano-Q4-0](https://huggingface.co/neuphonic/neutts-nano-q4-gguf).
77
+ Benchmarks on CPU were run through llama-bench (llama.cpp) to measure prefill and decode throughput at multiple context sizes.
78
+
79
+ For GPUs (specifically the RTX 4090), we use vLLM to maximise throughput and run benchmarks with the [vLLM benchmark](https://docs.vllm.ai/en/stable/cli/bench/throughput/).
80
+
81
+ We include benchmarks on four devices: Galaxy A25 5G, AMD Ryzen 9 HX 370, iMac M4 16GB, and NVIDIA GeForce RTX 4090.
82
+
83
+
84
+ | | NeuTTSAir | NeuTTSNano |
85
+ |---|---:|---:|
86
+ | **Galaxy A25 5G (CPU only)** | 20 tokens/s | 45 tokens/s |
87
+ | **AMD Ryzen 9 HX 370 (CPU only)** | 119 tokens/s | 221 tokens/s |
88
+ | **iMac M4 16 GB (CPU only)** | 111 tokens/s | 195 tokens/s |
89
+ | **RTX 4090** | 16194 tokens/s | 19268 tokens/s |
90
+
91
+
92
+ > [!NOTE]
93
+ > llama-bench used 14 threads for prefill and 16 threads for decode (as configured in the benchmark run) on the AMD Ryzen 9 HX 370 and iMac M4 16GB, and 6 threads for each on the Galaxy A25 5G. The reported tokens/s figures are for 500 prefill tokens and 250 generated output tokens.
94
+
95
+ > [!NOTE]
96
+ > These benchmarks cover only the Speech Language Model and do not include the Codec, which is needed for a full audio generation pipeline.
97
+
98
+ ## Get Started with NeuTTS
99
+
100
+ > [!NOTE]
101
+ > We have added a [streaming example](examples/basic_streaming_example.py) using the `llama-cpp-python` library as well as a [finetuning script](examples/finetune.py). For finetuning, please refer to the [finetune guide](TRAINING.md) for more details.
102
+
103
+ 1. **Clone Git Repo**
104
+
105
+ ```bash
106
+ git clone https://github.com/neuphonic/neutts.git
107
+ cd neutts
108
+ ```
109
+
110
+ 2. **Install `espeak` (required dependency)**
111
+
112
+ Please refer to the following link for instructions on how to install `espeak`:
113
+
114
+ https://github.com/espeak-ng/espeak-ng/blob/master/docs/guide.md
115
+
116
+ ```bash
117
+ # macOS
118
+ brew install espeak-ng
119
+
120
+ # Ubuntu/Debian
121
+ sudo apt install espeak-ng
122
+
123
+ # Windows install
124
+ # via chocolatey (https://community.chocolatey.org/packages?page=1&prerelease=False&moderatorQueue=False&tags=espeak)
125
+ choco install espeak-ng
126
+ # via winget
127
+ winget install -e --id eSpeak-NG.eSpeak-NG
128
+ # via msi (need to add to path or follow the "Windows users who installed via msi" step below)
129
+ # find the msi at https://github.com/espeak-ng/espeak-ng/releases
130
+ ```
131
+
132
+ Windows users who installed via msi, or whose install is not on the path, need to run the following (see https://github.com/bootphon/phonemizer/issues/163):
133
+ ```pwsh
134
+ $env:PHONEMIZER_ESPEAK_LIBRARY = "c:\Program Files\eSpeak NG\libespeak-ng.dll"
135
+ $env:PHONEMIZER_ESPEAK_PATH = "c:\Program Files\eSpeak NG"
136
+ setx PHONEMIZER_ESPEAK_LIBRARY "c:\Program Files\eSpeak NG\libespeak-ng.dll"
137
+ setx PHONEMIZER_ESPEAK_PATH "c:\Program Files\eSpeak NG"
138
+ ```
139
+
140
+ 3. **Install NeuTTS**
141
+ ```bash
142
+ pip install neutts
143
+ ```
144
+ Alternatively:
145
+ ```bash
146
+ pip install "neutts[all]" # also installs the onnxruntime and llama-cpp-python dependencies
147
+ ```
148
+
149
+
150
+ 4. **(Optional) Install Llama-cpp-python to use the `GGUF` models.**
151
+
152
+ ```bash
153
+ pip install "neutts[llama]"
154
+ ```
155
+
156
+ To run llama-cpp-python with GPU support (CUDA, MPS), please refer to:
157
+ https://pypi.org/project/llama-cpp-python/
158
+
159
+ 5. **(Optional) Install onnxruntime to use the `.onnx` decoder.**
160
+ If you want to run the ONNX decoder:
161
+ ```bash
162
+ pip install "neutts[onnx]"
163
+ ```
164
+
165
+ ## Running the Model
166
+
167
+ Run the basic example script to synthesize speech:
168
+
169
+ ```bash
170
+ python -m examples.basic_example \
171
+ --input_text "My name is Andy. I'm 25 and I just moved to London. The underground is pretty confusing, but it gets me around in no time at all." \
172
+ --ref_audio samples/jo.wav \
173
+ --ref_text samples/jo.txt
174
+ ```
175
+
176
+ To specify a particular model repo for the backbone or codec, add the `--backbone` argument. Available backbones are listed in the [NeuTTS-Air](https://huggingface.co/collections/neuphonic/neutts-air) and [NeuTTS-Nano](https://huggingface.co/collections/neuphonic/neutts-nano) Hugging Face collections.
177
+
178
+ Several examples are available, including a Jupyter notebook in the `examples` folder.
179
+
180
+ ### One-Code Block Usage
181
+
182
+ ```python
183
+ from neutts import NeuTTS
184
+ import soundfile as sf
185
+
186
+ tts = NeuTTS(
187
+ backbone_repo="neuphonic/neutts-nano", # or 'neuphonic/neutts-nano-q4-gguf' with llama-cpp-python installed
188
+ backbone_device="cpu",
189
+ codec_repo="neuphonic/neucodec",
190
+ codec_device="cpu"
191
+ )
192
+ input_text = "My name is Andy. I'm 25 and I just moved to London. The underground is pretty confusing, but it gets me around in no time at all."
193
+
194
+ ref_text = "samples/jo.txt"
195
+ ref_audio_path = "samples/jo.wav"
196
+
197
+ ref_text = open(ref_text, "r").read().strip()
198
+ ref_codes = tts.encode_reference(ref_audio_path)
199
+
200
+ wav = tts.infer(input_text, ref_codes, ref_text)
201
+ sf.write("test.wav", wav, 24000)
202
+ ```
203
+
204
+ ### Streaming
205
+
206
+ Speech can also be synthesised in _streaming mode_, where audio is generated in chunks and played as it is generated. This requires `pyaudio` to be installed. To try it, run:
207
+
208
+ ```bash
209
+ python -m examples.basic_streaming_example \
210
+ --input_text "My name is Andy. I'm 25 and I just moved to London. The underground is pretty confusing, but it gets me around in no time at all." \
211
+ --ref_codes samples/jo.pt \
212
+ --ref_text samples/jo.txt
213
+ ```
214
+
215
+ Again, a particular model repo can be specified with the `--backbone` argument - note that for streaming the model must be in GGUF format.
216
+
217
+ ## Preparing References for Cloning
218
+
219
+ NeuTTS requires two inputs:
220
+
221
+ 1. A reference audio sample (`.wav` file)
222
+ 2. A text string
223
+
224
+ The model then synthesises the text as speech in the style of the reference audio. This is what enables the NeuTTS models' instant voice cloning capability.
225
+
226
+ ### Example Reference Files
227
+
228
+ You can find some ready-to-use samples in the `samples` folder:
229
+
230
+ - `samples/dave.wav`
231
+ - `samples/jo.wav`
232
+
233
+ ### Guidelines for Best Results
234
+
235
+ For optimal performance, reference audio samples should be:
236
+
237
+ 1. **Mono channel**
238
+ 2. **16-44 kHz sample rate**
239
+ 3. **3–15 seconds in length**
240
+ 4. **Saved as a `.wav` file**
241
+ 5. **Clean** — minimal to no background noise
242
+ 6. **Natural, continuous speech** — like a monologue or conversation, with few pauses, so the model can capture tone effectively
243
+
244
+ ## Guidelines for Minimizing Latency
245
+
246
+ For optimal performance on-device:
247
+
248
+ 1. Use the GGUF model backbones
249
+ 2. Pre-encode references
250
+ 3. Use the [onnx codec decoder](https://huggingface.co/neuphonic/neucodec-onnx-decoder)
251
+
252
+ Take a look at the minimal-latency example in the [examples README](examples/README.md#minimal-latency-example) to get started.
253
+
254
+ ## Responsibility
255
+
256
+ Every audio file generated by NeuTTS is watermarked with the [Perth (Perceptual Threshold) Watermarker](https://github.com/resemble-ai/perth).
257
+
258
+ ## Disclaimer
259
+
260
+ Don't use this model to do bad things… please.
261
+
262
+ ## Developer Requirements
263
+
264
+ To run the pre-commit hooks when contributing to this project, first install `pre-commit`:
265
+
266
+ ```bash
267
+ pip install pre-commit
268
+ ```
269
+
270
+ Then:
271
+
272
+ ```bash
273
+ pre-commit install
274
+ ```
275
+
276
+ ## Running Tests
277
+
278
+ First, install the dev requirements:
279
+
280
+ ```bash
281
+ pip install -r requirements-dev.txt
282
+ ```
283
+
284
+ To run the tests:
285
+
286
+ ```bash
287
+ pytest tests/
288
+ ```
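
Reviewer note (not part of the packaged METADATA): the README's "Guidelines for Best Results" (mono, 16-44 kHz, 3-15 seconds, `.wav`) can be checked mechanically before cloning. The following is a minimal sketch using only the Python standard library; `check_reference_wav` is a hypothetical helper, not a NeuTTS API, and treating "44 kHz" as 44,100 Hz is an assumption.

```python
import wave


def check_reference_wav(path, min_rate=16_000, max_rate=44_100,
                        min_s=3.0, max_s=15.0):
    """Return a list of guideline violations for a PCM .wav file.

    Hypothetical helper: the thresholds mirror the README guidelines,
    with the upper sample rate assumed to mean 44.1 kHz.
    """
    problems = []
    # The stdlib wave module reads uncompressed PCM WAV files only.
    with wave.open(path, "rb") as wf:
        channels = wf.getnchannels()
        rate = wf.getframerate()
        duration = wf.getnframes() / rate

    if channels != 1:
        problems.append(f"expected mono, found {channels} channels")
    if not min_rate <= rate <= max_rate:
        problems.append(f"sample rate {rate} Hz outside {min_rate}-{max_rate} Hz")
    if not min_s <= duration <= max_s:
        problems.append(f"duration {duration:.1f} s outside {min_s}-{max_s} s")
    return problems
```

Calling `check_reference_wav("samples/jo.wav")` returns an empty list when the file meets all three measurable guidelines. Background noise and speech continuity cannot be checked this way and still need a listen.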
@@ -1,9 +1,9 @@
1
1
  neutts/__init__.py,sha256=hddhq1_HdIr30GtoeYhARPumA4ET1W7QW5hgqMVQT8o,49
2
2
  neutts/neutts.py,sha256=TMpshBT_yoInKrz6_WV9dnzQgvvT50djaGzwAKfxfzI,17747
3
- neutts-0.1.0.dist-info/licenses/LICENSE,sha256=7CGCbnYiDp_SkdQn1VHp-Vx5NKdGWLOjrIQxGwpoZaw,11085
3
+ neutts-0.1.1.dist-info/licenses/LICENSE,sha256=7CGCbnYiDp_SkdQn1VHp-Vx5NKdGWLOjrIQxGwpoZaw,11085
4
4
  neuttsair/__init__.py,sha256=Z1fqgoEL9bkcUtwHcO6y4ly0vINa99pISVFZKFO9_UQ,55
5
5
  neuttsair/neutts.py,sha256=1GakVYRZZnlTtlsZGvRxDRY5768sywURonLHLBSjeiU,257
6
- neutts-0.1.0.dist-info/METADATA,sha256=sWlF41oo1V7zCiR0ZZS7ix9-xCnNlQL5cHW63nioheE,509
7
- neutts-0.1.0.dist-info/WHEEL,sha256=wUyA8OaulRlbfwMtmQsvNngGrxQHAvkKcvRmdizlJi0,92
8
- neutts-0.1.0.dist-info/top_level.txt,sha256=eyRQcARniW63_drAEjVvpLnnki4ngBSPkRNuw5jlHXQ,17
9
- neutts-0.1.0.dist-info/RECORD,,
6
+ neutts-0.1.1.dist-info/METADATA,sha256=bAeMe-QVsvte0fE8DN9qG--ix2K7fA8jDE3WwdNyeEs,10880
7
+ neutts-0.1.1.dist-info/WHEEL,sha256=wUyA8OaulRlbfwMtmQsvNngGrxQHAvkKcvRmdizlJi0,92
8
+ neutts-0.1.1.dist-info/top_level.txt,sha256=eyRQcARniW63_drAEjVvpLnnki4ngBSPkRNuw5jlHXQ,17
9
+ neutts-0.1.1.dist-info/RECORD,,
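
Reviewer note (not part of the wheel): each RECORD line above has the form `path,sha256=<digest>,<size>`, where the digest is the URL-safe base64 encoding of the SHA-256 hash with the trailing `=` padding removed, as the wheel format specifies. A minimal sketch for recomputing an entry from a file's bytes follows; `record_entry` is a hypothetical helper name.

```python
import base64
import hashlib


def record_entry(path_in_wheel, data):
    """Build a RECORD-style line for a file's bytes (hypothetical helper)."""
    digest = hashlib.sha256(data).digest()
    # URL-safe base64 with the '=' padding stripped, as used in RECORD files.
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"{path_in_wheel},sha256={encoded},{len(data)}"
```

Reading a member with `zipfile.ZipFile(wheel_path).read(name)` and passing the bytes to `record_entry` should reproduce the corresponding line. This is how the jump in METADATA size from 509 to 10880 bytes can be confirmed against the actual file.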
@@ -1,16 +0,0 @@
1
- Metadata-Version: 2.4
2
- Name: neutts
3
- Version: 0.1.0
4
- Summary: NeuTTS - a package for text-to-speech generation using Neuphonics TTS models.
5
- Author-email: neuphonic <general@neuphonic.com>
6
- Requires-Python: >=3.9
7
- License-File: LICENSE
8
- Requires-Dist: librosa==0.11.0
9
- Requires-Dist: neucodec>=0.0.4
10
- Requires-Dist: numpy~=2.2.6
11
- Requires-Dist: phonemizer>=3.0.0
12
- Requires-Dist: resemble-perth==1.0.1
13
- Requires-Dist: soundfile==0.13.1
14
- Requires-Dist: torch>=2.8.0
15
- Requires-Dist: transformers~=4.56.1
16
- Dynamic: license-file
File without changes