PyPI - ttscli - Versions diffs - 0.1.0__tar.gz - Mend

ttscli 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (32)

ttscli-0.1.0/.claude/skills/tts/SKILL.md +79 -0
ttscli-0.1.0/.github/workflows/pypi.yml +28 -0
ttscli-0.1.0/.gitignore +70 -0
ttscli-0.1.0/.python-version +1 -0
ttscli-0.1.0/CHANGELOG.md +29 -0
ttscli-0.1.0/INSTALL.md +121 -0
ttscli-0.1.0/LICENSE +21 -0
ttscli-0.1.0/PKG-INFO +230 -0
ttscli-0.1.0/README.md +180 -0
ttscli-0.1.0/docs/PUBLISH.md +494 -0
ttscli-0.1.0/install.sh +136 -0
ttscli-0.1.0/pyproject.toml +93 -0
ttscli-0.1.0/tests/__init__.py +0 -0
ttscli-0.1.0/tests/conftest.py +52 -0
ttscli-0.1.0/tests/test_cli.py +68 -0
ttscli-0.1.0/tests/test_commands/__init__.py +0 -0
ttscli-0.1.0/tests/test_commands/test_voice.py +92 -0
ttscli-0.1.0/tests/test_performance.py +103 -0
ttscli-0.1.0/tests/test_say.py +282 -0
ttscli-0.1.0/ttscli/__init__.py +3 -0
ttscli-0.1.0/ttscli/__main__.py +9 -0
ttscli-0.1.0/ttscli/audio.py +8 -0
ttscli-0.1.0/ttscli/backends/__init__.py +182 -0
ttscli-0.1.0/ttscli/backends/mlx.py +350 -0
ttscli-0.1.0/ttscli/backends/pytorch.py +234 -0
ttscli-0.1.0/ttscli/cli.py +139 -0
ttscli-0.1.0/ttscli/commands.py +787 -0
ttscli-0.1.0/ttscli/config.py +72 -0
ttscli-0.1.0/ttscli/output_format.py +25 -0
ttscli-0.1.0/ttscli/platform.py +57 -0
ttscli-0.1.0/ttscli/storage.py +20 -0
ttscli-0.1.0/ttscli/voices.py +241 -0

ttscli-0.1.0/.claude/skills/tts/SKILL.md ADDED

@@ -0,0 +1,79 @@
+---
+name: tts
+description: Convert text to speech using the tts CLI. Use when the user asks to read text aloud, generate audio, speak something, or convert text to speech.
+---
+# TTS - Text to Speech Skill
+You have access to the `tts` CLI for text-to-speech with voice cloning, powered by Qwen3-TTS running locally.
+## How to use
+When the user asks you to speak, read aloud, or generate audio from text, use the `tts` CLI via the Bash tool.
+### Core commands
+**Speak text aloud (with streaming playback):**
+```bash
+tts say "Text to speak"
+```
+**Save to a WAV file:**
+```bash
+tts say "Text to speak" --save output.wav --no-play
+```
+**Speak and save simultaneously:**
+```bash
+tts say "Text to speak" --save output.wav
+```
+**Generate audio file (no playback):**
+```bash
+tts generate "Text to speak" -o output.wav
+```
+### Options
+| Flag | Description |
+|------|-------------|
+| `-v, --voice NAME` | Use a specific cloned voice |
+| `-l, --language CODE` | Language code (default: en, also: zh, ja, ko, etc.) |
+| `-m, --model SIZE` | Model: 1.7B (quality) or 0.6B (speed) |
+| `-i, --instruct TEXT` | Speaking style instruction (e.g., "Speak slowly and calmly") |
+| `-s, --save PATH` | Save audio to WAV file |
+| `--no-play` | Don't play audio, only save |
+| `--no-stream` | Disable streaming (generate all then play) |
+| `--seed INT` | Random seed for reproducibility |
+| `-f, --file PATH` | Read text from file instead of argument |
+### Voice management
+```bash
+tts voice list                    # List available voices
+tts voice add recording.wav --text "transcript" --voice myvoice  # Add a voice
+tts voice default myvoice         # Set default voice
+tts voice info myvoice            # Show voice details
+```
+### Piping text
+```bash
+echo "Hello world" | tts say
+```
+## Guidelines
+1. **Interpret $ARGUMENTS as the text to speak or as instructions about what to generate.** If the user provides plain text, speak it directly. If they provide instructions (e.g., "read the README aloud"), follow them.
+2. **Default to `tts say`** for quick playback. Use `tts generate` only when the user explicitly wants a file without playback.
+3. **Always include `--instruct "Speak at a moderate, natural pace"`** by default for a comfortable listening speed. Adjust the instruct text based on context:
+   - Short notifications/alerts: `"Speak clearly and at a normal pace"`
+   - Long paragraphs/explanations: `"Speak at a slightly slower, clear pace for easy listening"`
+   - If the user asks for faster/slower speed, adjust accordingly (e.g., `"Speak quickly"`, `"Speak very slowly"`)
+   - Combine speed with tone when appropriate (e.g., `"Speak slowly and calmly"`, `"Speak quickly with excitement"`)
+4. **Ask about voice preference** only if the user hasn't specified one and has multiple voices available. Otherwise use the default voice.
+5. **For long text from files**, use `tts say --file <path>` or pipe the content.
+6. **Use `--instruct`** when the user describes a tone or speaking style (e.g., "read this excitedly", "speak in a calm voice").
+7. **Language detection**: If the text is clearly in a non-English language, set `--language` appropriately (zh for Chinese, ja for Japanese, ko for Korean, etc.).
+8. **For saving files**, default to `.wav` format and suggest a descriptive filename based on the content.
+9. **Run tts commands with a timeout** of 300000ms (5 minutes) since audio generation can take time for long text.
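
Illustration only, not part of the package: guidelines 3 and 9 above (a default `--instruct` string and a five-minute timeout) could be applied from Python roughly as follows. `build_say_command` and `speak` are hypothetical helper names. The flags are copied from the SKILL.md options table; note that README.md in the same release documents `-o, --output` rather than `--save` for writing a file, so the exact flag should be checked against `tts say --help`.

```python
import subprocess

DEFAULT_INSTRUCT = "Speak at a moderate, natural pace"  # SKILL.md guideline 3
TIMEOUT_SECONDS = 300  # SKILL.md guideline 9: 300000 ms


def build_say_command(text, voice=None, language=None,
                      instruct=DEFAULT_INSTRUCT, save=None, play=True):
    """Assemble the argv list for `tts say` (hypothetical helper)."""
    cmd = ["tts", "say", text, "--instruct", instruct]
    if voice:
        cmd += ["--voice", voice]
    if language:
        cmd += ["--language", language]
    if save:
        # Flag name as given in SKILL.md; README.md lists -o/--output instead.
        cmd += ["--save", save]
    if not play:
        cmd.append("--no-play")
    return cmd


def speak(text, **options):
    """Run `tts say`, failing on a non-zero exit or after five minutes."""
    return subprocess.run(build_say_command(text, **options),
                          check=True, timeout=TIMEOUT_SECONDS)
```

A short alert might then be spoken with `speak("Build finished", instruct="Speak clearly and at a normal pace")`. Passing the arguments as a list avoids shell quoting problems when the text contains quotes or newlines.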

ttscli-0.1.0/.github/workflows/pypi.yml ADDED

@@ -0,0 +1,28 @@
+name: Publish to PyPI
+on:
+  release:
+    types: [published]
+permissions:
+  id-token: write
+jobs:
+  publish:
+    runs-on: ubuntu-latest
+    environment: pypi
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+      - name: Install build tools
+        run: pip install build
+      - name: Build package
+        run: python -m build
+      - name: Publish to PyPI
+        uses: pypa/gh-action-pypi-publish@release/v1

ttscli-0.1.0/.gitignore ADDED

@@ -0,0 +1,70 @@
+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+# Virtual environments
+venv/
+ENV/
+env/
+.venv
+# IDE
+.vscode/
+.idea/
+*.swp
+*.swo
+*~
+# Testing
+.pytest_cache/
+.coverage
+htmlcov/
+.tox/
+*.cover
+# Data
+*.db
+*.db-journal
+tts_data/
+# Distribution
+dist/
+build/
+*.whl
+# Audio output
+*.wav
+*.mp3
+# Keep demo intro audio
+!ttscli_intro.wav
+# OS
+.DS_Store
+Thumbs.db
+# Config (local)
+config.local.toml
+# Demo build artifacts
+demo/out/
+demo/intro/node_modules/
+demo/intro/.remotion/

ttscli-0.1.0/.python-version ADDED

@@ -0,0 +1 @@
+3.11

ttscli-0.1.0/CHANGELOG.md ADDED

@@ -0,0 +1,29 @@
+# Changelog
+All notable changes to this project will be documented in this file.
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## [0.1.0] - 2025-02-19
+### Added
+- `tts generate` — generate speech from text and save to WAV
+- `tts say` — speak text aloud with streaming audio playback
+- `tts voice add` — add audio samples to a voice (creates voice if needed)
+- `tts voice list` — list all voices
+- `tts voice info` — show voice details
+- `tts voice delete` — delete a voice
+- `tts voice default` — set/show default voice
+- `tts config show` — display current configuration
+- `tts config set` — update configuration values
+- PyTorch backend using Qwen3-TTS models (1.7B and 0.6B)
+- MLX backend for Apple Silicon (via mlx-audio)
+- Automatic platform detection (Apple Silicon → MLX, otherwise → PyTorch)
+- Voice cloning from reference audio samples
+- Streaming audio playback with chunked generation
+- JSON output mode (`--output json` / `--json`) for scripting
+- Rich terminal output with tables and progress spinners
+- Configuration via TOML files or CLI flags
+- Generation history tracking (last 100 entries)
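
Illustration only, not part of the package: the changelog entry "Automatic platform detection (Apple Silicon → MLX, otherwise → PyTorch)" could be implemented along these lines. The contents of `ttscli/platform.py` are not shown in this section, so `pick_backend` is a hypothetical name and the check below is an assumption about how such detection is commonly done, not the package's actual logic.

```python
import platform


def pick_backend(system=None, machine=None):
    """Return "mlx" on Apple Silicon, otherwise "pytorch" (hypothetical helper)."""
    # Fall back to the running interpreter's platform when no values are given.
    system = system if system is not None else platform.system()
    machine = machine if machine is not None else platform.machine()
    # macOS on Apple Silicon reports system "Darwin" and machine "arm64".
    if system == "Darwin" and machine == "arm64":
        return "mlx"
    return "pytorch"
```

Accepting `system` and `machine` as optional arguments keeps the decision testable on any host, for example `pick_backend("Linux", "x86_64")`.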

ttscli-0.1.0/INSTALL.md ADDED

@@ -0,0 +1,121 @@
+# Installation Guide
+## Prerequisites
+- Python 3.11 or higher
+- pip or uv package manager
+## Installation
+### From PyPI
+```bash
+# Basic install
+pip install tts-cli
+# With PyTorch backend (CUDA / CPU)
+pip install tts-cli[pytorch]
+# With MLX backend (Apple Silicon)
+pip install tts-cli[mlx]
+```
+### From source
+```bash
+git clone https://github.com/your-org/ttscli.git
+cd ttscli
+pip install -e ".[pytorch]"
+```
+Or using uv:
+```bash
+uv pip install -e ".[pytorch]"
+```
+### Verify installation
+```bash
+tts --version
+```
+You should see:
+```
+tts version 0.1.0
+```
+### Test basic commands
+```bash
+# List voices (should be empty initially)
+tts voice list
+# Show config
+tts config show
+# View help
+tts --help
+tts say --help
+```
+## Configuration (optional)
+### Set custom data directory
+```bash
+tts config set data_dir /path/to/your/data
+```
+## Troubleshooting
+### Command not found
+If `tts` command is not found, ensure your Python scripts directory is in PATH:
+```bash
+# Add to ~/.bashrc or ~/.zshrc
+export PATH="$HOME/.local/bin:$PATH"
+```
+Or use the module directly:
+```bash
+python -m ttscli --version
+```
+### Import errors
+Make sure all dependencies are installed:
+```bash
+pip install -e ".[pytorch]" --force-reinstall
+```
+### Permission errors
+On some systems you may need to install in user mode:
+```bash
+pip install --user -e .
+```
+## Development Installation
+For development with testing tools:
+```bash
+pip install -e ".[dev]"
+```
+This installs additional packages: pytest, black, ruff, etc.
+## Uninstallation
+```bash
+pip uninstall tts-cli
+```
+## Next Steps
+See [README.md](README.md) for usage guide and examples.

ttscli-0.1.0/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2025 Voicebox Team
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

ttscli-0.1.0/PKG-INFO ADDED

@@ -0,0 +1,230 @@
+Metadata-Version: 2.4
+Name: ttscli
+Version: 0.1.0
+Summary: Command-line interface for text-to-speech with voice cloning
+Project-URL: Homepage, https://github.com/jiweiyuan/ttscli
+Project-URL: Repository, https://github.com/jiweiyuan/ttscli
+Project-URL: Issues, https://github.com/jiweiyuan/ttscli/issues
+Project-URL: Changelog, https://github.com/jiweiyuan/ttscli/blob/main/CHANGELOG.md
+Author: Voicebox Team
+License: MIT
+License-File: LICENSE
+Keywords: cli,speech-synthesis,tts,voice-cloning
+Classifier: Development Status :: 3 - Alpha
+Classifier: Intended Audience :: Developers
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
+Requires-Python: >=3.11
+Requires-Dist: numpy<2.0.0,>=1.24.0
+Requires-Dist: pydantic-settings>=2.2.0
+Requires-Dist: pydantic>=2.7.0
+Requires-Dist: rich>=13.7.0
+Requires-Dist: sounddevice>=0.4.6
+Requires-Dist: soundfile>=0.12.1
+Requires-Dist: toml>=0.10.2
+Requires-Dist: typer[all]>=0.12.0
+Provides-Extra: dev
+Requires-Dist: black>=24.3.0; extra == 'dev'
+Requires-Dist: psutil>=5.9.0; extra == 'dev'
+Requires-Dist: pytest-asyncio>=0.23.0; extra == 'dev'
+Requires-Dist: pytest-cov>=5.0.0; extra == 'dev'
+Requires-Dist: pytest>=8.1.0; extra == 'dev'
+Requires-Dist: ruff>=0.3.0; extra == 'dev'
+Provides-Extra: mlx
+Requires-Dist: librosa>=0.10.0; extra == 'mlx'
+Requires-Dist: mlx-audio>=0.3.0; extra == 'mlx'
+Requires-Dist: mlx>=0.5.0; extra == 'mlx'
+Provides-Extra: pytorch
+Requires-Dist: accelerate>=0.25.0; extra == 'pytorch'
+Requires-Dist: librosa>=0.10.0; extra == 'pytorch'
+Requires-Dist: pillow>=10.0.0; extra == 'pytorch'
+Requires-Dist: qwen-tts>=0.1.0; extra == 'pytorch'
+Requires-Dist: scipy>=1.11.0; extra == 'pytorch'
+Requires-Dist: sentencepiece>=0.1.99; extra == 'pytorch'
+Requires-Dist: torch>=2.1.0; extra == 'pytorch'
+Requires-Dist: transformers>=4.36.0; extra == 'pytorch'
+Description-Content-Type: text/markdown
+# TTS CLI
+A command-line interface for text-to-speech with voice cloning, powered by [Qwen3-TTS](https://huggingface.co/Qwen/Qwen3-TTS-12Hz-1.7B-Base).
+Supports **PyTorch** (CUDA / CPU) and **MLX** (Apple Silicon) backends with automatic platform detection.
+## Features
+- 🎙️ **Voice cloning** — clone any voice from a short audio sample
+- 🔊 **Streaming playback** — hear audio as it generates, no waiting
+- 🍎 **Apple Silicon native** — MLX backend for fast local inference
+- 🎛️ **Two model sizes** — 1.7B (quality) and 0.6B (speed)
+- 📝 **JSON output** — machine-readable output for scripting and pipelines
+- ⚙️ **Configurable** — TOML config files or CLI flags
+## Installation
+**Requires Python 3.11+**
+```bash
+# Basic install
+pip install tts-cli
+# With PyTorch backend
+pip install tts-cli[pytorch]
+# With MLX backend (Apple Silicon)
+pip install tts-cli[mlx]
+# Development
+pip install tts-cli[dev]
+```
+Or install from source:
+```bash
+git clone https://github.com/your-org/ttscli.git
+cd ttscli
+pip install -e ".[pytorch]"
+```
+Verify:
+```bash
+tts --version
+```
+## Quick Start
+### 1. Add a voice sample
+```bash
+tts voice add recording.wav --text "The transcript of the recording" --voice myvoice
+```
+### 2. Speak aloud (streaming)
+```bash
+tts say "Hello, how are you today?" --voice myvoice
+```
+### 3. Save to file
+```bash
+tts say "Hello world" --voice myvoice -o hello.wav --no-play
+```
+## Commands
+### `tts say`
+Generate speech from text. Plays aloud with streaming by default.
+```bash
+tts say "Text to speak" [OPTIONS]
+Options:
+  -v, --voice TEXT     Voice name (default: configured default)
+  -l, --language TEXT  Language code (default: en)
+  -m, --model TEXT     Model size: 1.7B or 0.6B (default: 1.7B)
+  -o, --output PATH   Save to WAV file
+  -i, --instruct TEXT  Speaking style instruction
+  --no-play            Don't play audio, only save to file
+  --no-stream          Disable streaming (generate all, then play)
+  --seed INT           Random seed for reproducibility
+```
+Examples:
+```bash
+tts say "Hello, how are you?"                      # play aloud
+tts say "Good morning" --voice myvoice             # use specific voice
+tts say "Hello world" -o hello.wav                 # play and save
+tts say "Hello world" -o hello.wav --no-play       # save only
+tts say "Breaking news!" -i "Speak urgently"       # with style instruction
+tts say "Slow and steady" --no-stream              # generate all, then play
+```
+### `tts voice`
+Manage voices and audio samples.
+```bash
+tts voice add <audio_file> [OPTIONS]   # Add sample (creates voice if needed)
+tts voice list                          # List all voices
+tts voice info [VOICE]                  # Show voice details
+tts voice delete <VOICE> [-y]           # Delete a voice
+tts voice default [VOICE]               # Set/show default voice
+tts voice default --unset               # Unset default voice
+```
+### `tts config`
+View and update configuration.
+```bash
+tts config show                # Show current config
+tts config set <key> <value>   # Set a config value
+```
+Available config keys: `data_dir`, `default_voice`, `default_language`, `default_model`, `output_format`, `auto_play`
+## JSON Output
+Use `--json` or `--output json` for machine-readable output:
+```bash
+tts --json voice list
+tts --output json say "Hello" --voice myvoice
+```
+## Configuration
+Configuration is loaded from (in order of priority):
+1. CLI flags (`--data-dir`, `--output`)
+2. Config files:
+   - `./tts.toml` (project-local)
+   - `~/.config/tts/config.toml`
+   - `~/.tts/config.toml`
+Example `config.toml`:
+```toml
+default_voice = "myvoice"
+default_language = "en"
+default_model = "1.7B"
+output_format = "rich"
+data_dir = "~/tts"
+```
+## Data Storage
+All data is stored in `~/tts/` by default:
+```text
+~/tts/
+├── voices.json       # Voice definitions and metadata
+├── samples/          # Audio samples for voice cloning
+└── generations/      # Generated audio files
+```
+## Requirements
+- Python 3.11+
+- **PyTorch backend**: torch, transformers, qwen-tts
+- **MLX backend** (Apple Silicon): mlx, mlx-audio
+- Audio: soundfile, sounddevice
+- **System dependency**: [SoX](https://sox.sourceforge.net/) (required by qwen-tts)
+  ```bash
+  # macOS
+  brew install sox
+  # Ubuntu/Debian
+  sudo apt install sox
+  ```
+## License
+MIT — see [LICENSE](LICENSE) for details.
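
Illustration only, not part of the package: the configuration priority described in the README (CLI flags first, then `./tts.toml`, `~/.config/tts/config.toml`, `~/.tts/config.toml`) can be sketched as a layered merge. This assumes the files are listed in priority order and that values from several files are merged key by key; the package might instead stop at the first file found. `merge_config` and `load_file_configs` are hypothetical names, and the sketch uses the standard-library `tomllib` rather than the `toml` dependency the package declares.

```python
from pathlib import Path

# Search order as listed in the README, highest priority first (assumed).
CONFIG_PATHS = [
    Path("tts.toml"),
    Path.home() / ".config" / "tts" / "config.toml",
    Path.home() / ".tts" / "config.toml",
]


def merge_config(cli_overrides, file_configs):
    """Merge settings: earlier file_configs entries win, CLI flags win overall."""
    merged = {}
    for cfg in reversed(file_configs):  # apply lowest priority first
        merged.update(cfg)
    # Only flags that were actually passed (not None) override file values.
    merged.update({k: v for k, v in cli_overrides.items() if v is not None})
    return merged


def load_file_configs(paths=CONFIG_PATHS):
    """Parse each existing TOML file, preserving the priority order."""
    import tomllib  # standard library since Python 3.11

    configs = []
    for path in paths:
        if path.is_file():
            with path.open("rb") as fh:
                configs.append(tomllib.load(fh))
    return configs
```

A caller would combine the two as `merge_config({"data_dir": cli_data_dir}, load_file_configs())`, where `cli_data_dir` is `None` when the `--data-dir` flag was not given.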