ttscli 0.1.0__tar.gz

@@ -0,0 +1,79 @@
+ ---
+ name: tts
+ description: Convert text to speech using the tts CLI. Use when the user asks to read text aloud, generate audio, speak something, or convert text to speech.
+ ---
+
+ # TTS - Text to Speech Skill
+
+ You have access to the `tts` CLI for text-to-speech with voice cloning, powered by Qwen3-TTS running locally.
+
+ ## How to use
+
+ When the user asks you to speak, read aloud, or generate audio from text, use the `tts` CLI via the Bash tool.
+
+ ### Core commands
+
+ **Speak text aloud (with streaming playback):**
+ ```bash
+ tts say "Text to speak"
+ ```
+
+ **Save to a WAV file:**
+ ```bash
+ tts say "Text to speak" --save output.wav --no-play
+ ```
+
+ **Speak and save simultaneously:**
+ ```bash
+ tts say "Text to speak" --save output.wav
+ ```
+
+ **Generate audio file (no playback):**
+ ```bash
+ tts generate "Text to speak" -o output.wav
+ ```
+
+ ### Options
+
+ | Flag | Description |
+ |------|-------------|
+ | `-v, --voice NAME` | Use a specific cloned voice |
+ | `-l, --language CODE` | Language code (default: en, also: zh, ja, ko, etc.) |
+ | `-m, --model SIZE` | Model: 1.7B (quality) or 0.6B (speed) |
+ | `-i, --instruct TEXT` | Speaking style instruction (e.g., "Speak slowly and calmly") |
+ | `-s, --save PATH` | Save audio to WAV file |
+ | `--no-play` | Don't play audio, only save |
+ | `--no-stream` | Disable streaming (generate all then play) |
+ | `--seed INT` | Random seed for reproducibility |
+ | `-f, --file PATH` | Read text from file instead of argument |
+
+ ### Voice management
+
+ ```bash
+ tts voice list  # List available voices
+ tts voice add recording.wav --text "transcript" --voice myvoice  # Add a voice
+ tts voice default myvoice  # Set default voice
+ tts voice info myvoice  # Show voice details
+ ```
+
+ ### Piping text
+
+ ```bash
+ echo "Hello world" | tts say
+ ```
+
+ ## Guidelines
+
+ 1. **Interpret $ARGUMENTS as the text to speak or as instructions about what to generate.** If the user provides plain text, speak it directly. If they provide instructions (e.g., "read the README aloud"), follow them.
+ 2. **Default to `tts say`** for quick playback. Use `tts generate` only when the user explicitly wants a file without playback.
+ 3. **Always include `--instruct "Speak at a moderate, natural pace"`** by default for a comfortable listening speed. Adjust the instruct text based on context:
+    - Short notifications/alerts: `"Speak clearly and at a normal pace"`
+    - Long paragraphs/explanations: `"Speak at a slightly slower, clear pace for easy listening"`
+    - If the user asks for faster/slower speed, adjust accordingly (e.g., `"Speak quickly"`, `"Speak very slowly"`)
+    - Combine speed with tone when appropriate (e.g., `"Speak slowly and calmly"`, `"Speak quickly with excitement"`)
+ 4. **Ask about voice preference** only if the user hasn't specified one and has multiple voices available. Otherwise use the default voice.
+ 5. **For long text from files**, use `tts say --file <path>` or pipe the content.
+ 6. **Use `--instruct`** when the user describes a tone or speaking style (e.g., "read this excitedly", "speak in a calm voice").
+ 7. **Language detection**: If the text is clearly in a non-English language, set `--language` appropriately (zh for Chinese, ja for Japanese, ko for Korean, etc.).
+ 8. **For saving files**, default to `.wav` format and suggest a descriptive filename based on the content.
+ 9. **Run tts commands with a timeout** of 300000ms (5 minutes) since audio generation can take time for long text.
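As a minimal sketch of guidelines 2, 3, 7 and 9 above, the command line could be assembled and run from Python as shown below. The helper names `build_say_command` and `run_say` are hypothetical and not part of ttscli; only flags listed in the Options table of this skill file are used, and the 300-second timeout mirrors guideline 9.

```python
import subprocess

DEFAULT_INSTRUCT = "Speak at a moderate, natural pace"  # default from guideline 3
TIMEOUT_SECONDS = 300  # 300000 ms, per guideline 9


def build_say_command(text, voice=None, language=None,
                      instruct=DEFAULT_INSTRUCT, save=None, play=True):
    """Return the argv list for `tts say` (hypothetical helper, not part of ttscli)."""
    cmd = ["tts", "say", text, "--instruct", instruct]
    if voice:
        cmd += ["--voice", voice]
    if language:
        cmd += ["--language", language]
    if save:
        cmd += ["--save", save]
    if not play:
        cmd.append("--no-play")
    return cmd


def run_say(text, **options):
    """Run the command with the timeout; raises if the CLI is missing or exits non-zero."""
    return subprocess.run(build_say_command(text, **options),
                          check=True, timeout=TIMEOUT_SECONDS)
```

Passing the arguments as a list rather than a shell string avoids quoting problems when the text itself contains quotes or shell metacharacters.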
@@ -0,0 +1,28 @@
+ name: Publish to PyPI
+
+ on:
+   release:
+     types: [published]
+
+ permissions:
+   id-token: write
+
+ jobs:
+   publish:
+     runs-on: ubuntu-latest
+     environment: pypi
+     steps:
+       - uses: actions/checkout@v4
+
+       - uses: actions/setup-python@v5
+         with:
+           python-version: "3.12"
+
+       - name: Install build tools
+         run: pip install build
+
+       - name: Build package
+         run: python -m build
+
+       - name: Publish to PyPI
+         uses: pypa/gh-action-pypi-publish@release/v1
@@ -0,0 +1,70 @@
+ # Python
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.so
+ .Python
+ build/
+ develop-eggs/
+ dist/
+ downloads/
+ eggs/
+ .eggs/
+ lib/
+ lib64/
+ parts/
+ sdist/
+ var/
+ wheels/
+ *.egg-info/
+ .installed.cfg
+ *.egg
+
+ # Virtual environments
+ venv/
+ ENV/
+ env/
+ .venv
+
+ # IDE
+ .vscode/
+ .idea/
+ *.swp
+ *.swo
+ *~
+
+ # Testing
+ .pytest_cache/
+ .coverage
+ htmlcov/
+ .tox/
+ *.cover
+
+ # Data
+ *.db
+ *.db-journal
+ tts_data/
+
+ # Distribution
+ dist/
+ build/
+ *.whl
+
+ # Audio output
+ *.wav
+ *.mp3
+
+ # Keep demo intro audio
+ !ttscli_intro.wav
+
+ # OS
+ .DS_Store
+ Thumbs.db
+
+ # Config (local)
+ config.local.toml
+
+ # Demo build artifacts
+ demo/out/
+ demo/intro/node_modules/
+ demo/intro/.remotion/
@@ -0,0 +1 @@
+ 3.11
@@ -0,0 +1,29 @@
+ # Changelog
+
+ All notable changes to this project will be documented in this file.
+
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
+ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+ ## [0.1.0] - 2025-02-19
+
+ ### Added
+
+ - `tts generate` — generate speech from text and save to WAV
+ - `tts say` — speak text aloud with streaming audio playback
+ - `tts voice add` — add audio samples to a voice (creates voice if needed)
+ - `tts voice list` — list all voices
+ - `tts voice info` — show voice details
+ - `tts voice delete` — delete a voice
+ - `tts voice default` — set/show default voice
+ - `tts config show` — display current configuration
+ - `tts config set` — update configuration values
+ - PyTorch backend using Qwen3-TTS models (1.7B and 0.6B)
+ - MLX backend for Apple Silicon (via mlx-audio)
+ - Automatic platform detection (Apple Silicon → MLX, otherwise → PyTorch)
+ - Voice cloning from reference audio samples
+ - Streaming audio playback with chunked generation
+ - JSON output mode (`--output json` / `--json`) for scripting
+ - Rich terminal output with tables and progress spinners
+ - Configuration via TOML files or CLI flags
+ - Generation history tracking (last 100 entries)
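The "automatic platform detection" entry can be pictured with the sketch below. It is an assumption about how such a check might look, not the package's actual code: `pick_backend` is a hypothetical name, and Apple Silicon is identified here as macOS (`Darwin`) running on an `arm64` machine.

```python
import platform


def pick_backend(system=None, machine=None):
    """Return "mlx" on Apple Silicon, otherwise "pytorch" (illustrative only)."""
    # Fall back to the values reported for the running interpreter.
    system = system or platform.system()
    machine = machine or platform.machine()
    if system == "Darwin" and machine == "arm64":
        return "mlx"
    return "pytorch"
```

The optional parameters exist only so the decision can be exercised without the matching hardware. Note that a Python interpreter running under Rosetta reports `x86_64`, so this check would select PyTorch in that case.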
@@ -0,0 +1,121 @@
+ # Installation Guide
+
+ ## Prerequisites
+
+ - Python 3.11 or higher
+ - pip or uv package manager
+
+ ## Installation
+
+ ### From PyPI
+
+ ```bash
+ # Basic install
+ pip install ttscli
+
+ # With PyTorch backend (CUDA / CPU)
+ pip install "ttscli[pytorch]"
+
+ # With MLX backend (Apple Silicon)
+ pip install "ttscli[mlx]"
+ ```
+
+ ### From source
+
+ ```bash
+ git clone https://github.com/jiweiyuan/ttscli.git
+ cd ttscli
+ pip install -e ".[pytorch]"
+ ```
+
+ Or using uv:
+
+ ```bash
+ uv pip install -e ".[pytorch]"
+ ```
+
+ ### Verify installation
+
+ ```bash
+ tts --version
+ ```
+
+ You should see:
+ ```
+ tts version 0.1.0
+ ```
+
+ ### Test basic commands
+
+ ```bash
+ # List voices (should be empty initially)
+ tts voice list
+
+ # Show config
+ tts config show
+
+ # View help
+ tts --help
+ tts say --help
+ ```
+
+ ## Configuration (optional)
+
+ ### Set custom data directory
+
+ ```bash
+ tts config set data_dir /path/to/your/data
+ ```
+
+ ## Troubleshooting
+
+ ### Command not found
+
+ If the `tts` command is not found, make sure your Python scripts directory is on your PATH:
+
+ ```bash
+ # Add to ~/.bashrc or ~/.zshrc
+ export PATH="$HOME/.local/bin:$PATH"
+ ```
+
+ Or use the module directly:
+
+ ```bash
+ python -m ttscli --version
+ ```
+
+ ### Import errors
+
+ Make sure all dependencies are installed:
+
+ ```bash
+ pip install -e ".[pytorch]" --force-reinstall
+ ```
+
+ ### Permission errors
+
+ On some systems you may need to install in user mode:
+
+ ```bash
+ pip install --user -e .
+ ```
+
+ ## Development Installation
+
+ For development with testing tools:
+
+ ```bash
+ pip install -e ".[dev]"
+ ```
+
+ This installs additional packages: pytest, black, ruff, etc.
+
+ ## Uninstallation
+
+ ```bash
+ pip uninstall ttscli
+ ```
+
+ ## Next Steps
+
+ See [README.md](README.md) for usage guide and examples.
ttscli-0.1.0/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2025 Voicebox Team
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
ttscli-0.1.0/PKG-INFO ADDED
@@ -0,0 +1,230 @@
+ Metadata-Version: 2.4
+ Name: ttscli
+ Version: 0.1.0
+ Summary: Command-line interface for text-to-speech with voice cloning
+ Project-URL: Homepage, https://github.com/jiweiyuan/ttscli
+ Project-URL: Repository, https://github.com/jiweiyuan/ttscli
+ Project-URL: Issues, https://github.com/jiweiyuan/ttscli/issues
+ Project-URL: Changelog, https://github.com/jiweiyuan/ttscli/blob/main/CHANGELOG.md
+ Author: Voicebox Team
+ License: MIT
+ License-File: LICENSE
+ Keywords: cli,speech-synthesis,tts,voice-cloning
+ Classifier: Development Status :: 3 - Alpha
+ Classifier: Intended Audience :: Developers
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
+ Requires-Python: >=3.11
+ Requires-Dist: numpy<2.0.0,>=1.24.0
+ Requires-Dist: pydantic-settings>=2.2.0
+ Requires-Dist: pydantic>=2.7.0
+ Requires-Dist: rich>=13.7.0
+ Requires-Dist: sounddevice>=0.4.6
+ Requires-Dist: soundfile>=0.12.1
+ Requires-Dist: toml>=0.10.2
+ Requires-Dist: typer[all]>=0.12.0
+ Provides-Extra: dev
+ Requires-Dist: black>=24.3.0; extra == 'dev'
+ Requires-Dist: psutil>=5.9.0; extra == 'dev'
+ Requires-Dist: pytest-asyncio>=0.23.0; extra == 'dev'
+ Requires-Dist: pytest-cov>=5.0.0; extra == 'dev'
+ Requires-Dist: pytest>=8.1.0; extra == 'dev'
+ Requires-Dist: ruff>=0.3.0; extra == 'dev'
+ Provides-Extra: mlx
+ Requires-Dist: librosa>=0.10.0; extra == 'mlx'
+ Requires-Dist: mlx-audio>=0.3.0; extra == 'mlx'
+ Requires-Dist: mlx>=0.5.0; extra == 'mlx'
+ Provides-Extra: pytorch
+ Requires-Dist: accelerate>=0.25.0; extra == 'pytorch'
+ Requires-Dist: librosa>=0.10.0; extra == 'pytorch'
+ Requires-Dist: pillow>=10.0.0; extra == 'pytorch'
+ Requires-Dist: qwen-tts>=0.1.0; extra == 'pytorch'
+ Requires-Dist: scipy>=1.11.0; extra == 'pytorch'
+ Requires-Dist: sentencepiece>=0.1.99; extra == 'pytorch'
+ Requires-Dist: torch>=2.1.0; extra == 'pytorch'
+ Requires-Dist: transformers>=4.36.0; extra == 'pytorch'
+ Description-Content-Type: text/markdown
+
+ # TTS CLI
+
+ A command-line interface for text-to-speech with voice cloning, powered by [Qwen3-TTS](https://huggingface.co/Qwen/Qwen3-TTS-12Hz-1.7B-Base).
+
+ Supports **PyTorch** (CUDA / CPU) and **MLX** (Apple Silicon) backends with automatic platform detection.
+
+ ## Features
+
+ - 🎙️ **Voice cloning** — clone any voice from a short audio sample
+ - 🔊 **Streaming playback** — hear audio as it generates, no waiting
+ - 🍎 **Apple Silicon native** — MLX backend for fast local inference
+ - 🎛️ **Two model sizes** — 1.7B (quality) and 0.6B (speed)
+ - 📝 **JSON output** — machine-readable output for scripting and pipelines
+ - ⚙️ **Configurable** — TOML config files or CLI flags
+
+ ## Installation
+
+ **Requires Python 3.11+**
+
+ ```bash
+ # Basic install
+ pip install ttscli
+
+ # With PyTorch backend
+ pip install "ttscli[pytorch]"
+
+ # With MLX backend (Apple Silicon)
+ pip install "ttscli[mlx]"
+
+ # Development
+ pip install "ttscli[dev]"
+ ```
+
+ Or install from source:
+
+ ```bash
+ git clone https://github.com/jiweiyuan/ttscli.git
+ cd ttscli
+ pip install -e ".[pytorch]"
+ ```
+
+ Verify:
+
+ ```bash
+ tts --version
+ ```
+
+ ## Quick Start
+
+ ### 1. Add a voice sample
+
+ ```bash
+ tts voice add recording.wav --text "The transcript of the recording" --voice myvoice
+ ```
+
+ ### 2. Speak aloud (streaming)
+
+ ```bash
+ tts say "Hello, how are you today?" --voice myvoice
+ ```
+
+ ### 3. Save to file
+
+ ```bash
+ tts say "Hello world" --voice myvoice -o hello.wav --no-play
+ ```
+
+ ## Commands
+
+ ### `tts say`
+
+ Generate speech from text. Plays aloud with streaming by default.
+
+ ```bash
+ tts say "Text to speak" [OPTIONS]
+
+ Options:
+   -v, --voice TEXT      Voice name (default: configured default)
+   -l, --language TEXT   Language code (default: en)
+   -m, --model TEXT      Model size: 1.7B or 0.6B (default: 1.7B)
+   -o, --output PATH     Save to WAV file
+   -i, --instruct TEXT   Speaking style instruction
+   --no-play             Don't play audio, only save to file
+   --no-stream           Disable streaming (generate all, then play)
+   --seed INT            Random seed for reproducibility
+ ```
+
+ Examples:
+
+ ```bash
+ tts say "Hello, how are you?"  # play aloud
+ tts say "Good morning" --voice myvoice  # use specific voice
+ tts say "Hello world" -o hello.wav  # play and save
+ tts say "Hello world" -o hello.wav --no-play  # save only
+ tts say "Breaking news!" -i "Speak urgently"  # with style instruction
+ tts say "Slow and steady" --no-stream  # generate all, then play
+ ```
+
+ ### `tts voice`
+
+ Manage voices and audio samples.
+
+ ```bash
+ tts voice add <audio_file> [OPTIONS]  # Add sample (creates voice if needed)
+ tts voice list  # List all voices
+ tts voice info [VOICE]  # Show voice details
+ tts voice delete <VOICE> [-y]  # Delete a voice
+ tts voice default [VOICE]  # Set/show default voice
+ tts voice default --unset  # Unset default voice
+ ```
+
+ ### `tts config`
+
+ View and update configuration.
+
+ ```bash
+ tts config show  # Show current config
+ tts config set <key> <value>  # Set a config value
+ ```
+
+ Available config keys: `data_dir`, `default_voice`, `default_language`, `default_model`, `output_format`, `auto_play`
+
+ ## JSON Output
+
+ Use `--json` or `--output json` for machine-readable output:
+
+ ```bash
+ tts --json voice list
+ tts --output json say "Hello" --voice myvoice
+ ```
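For scripting, the JSON mode could be consumed from Python roughly as follows. This is a sketch under assumptions: `tts_json_command` and `run_tts_json` are hypothetical helpers, the JSON document is assumed to be the only thing written to stdout, and its schema is not documented above, so the parsed value is returned as-is for the caller to inspect.

```python
import json
import subprocess


def tts_json_command(*args):
    """Build argv with the global --json flag placed before the subcommand."""
    return ["tts", "--json", *args]


def run_tts_json(*args):
    """Run the command and decode stdout as JSON; the schema is left to the caller."""
    result = subprocess.run(tts_json_command(*args),
                            capture_output=True, text=True, check=True)
    return json.loads(result.stdout)
```

For example, `run_tts_json("voice", "list")` corresponds to the first command in the block above.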
+
+ ## Configuration
+
+ Configuration is loaded from (in order of priority):
+
+ 1. CLI flags (`--data-dir`, `--output`)
+ 2. Config files:
+    - `./tts.toml` (project-local)
+    - `~/.config/tts/config.toml`
+    - `~/.tts/config.toml`
+
+ Example `config.toml`:
+
+ ```toml
+ default_voice = "myvoice"
+ default_language = "en"
+ default_model = "1.7B"
+ output_format = "rich"
+ data_dir = "~/tts"
+ ```
+
+ ## Data Storage
+
+ All data is stored in `~/tts/` by default:
+
+ ```text
+ ~/tts/
+ ├── voices.json       # Voice definitions and metadata
+ ├── samples/          # Audio samples for voice cloning
+ └── generations/      # Generated audio files
+ ```
+
+ ## Requirements
+
+ - Python 3.11+
+ - **PyTorch backend**: torch, transformers, qwen-tts
+ - **MLX backend** (Apple Silicon): mlx, mlx-audio
+ - Audio: soundfile, sounddevice
+ - **System dependency**: [SoX](https://sox.sourceforge.net/) (required by qwen-tts)
+
+ ```bash
+ # macOS
+ brew install sox
+ # Ubuntu/Debian
+ sudo apt install sox
+ ```
+
+ ## License
+
+ MIT — see [LICENSE](LICENSE) for details.