speak-cli 0.1.2

package/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2025 hoveychen
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,171 @@
1
+ # speak-cli
2
+
3
+ [![Release](https://img.shields.io/github/v/release/hoveychen/speak-cli)](https://github.com/hoveychen/speak-cli/releases/latest)
4
+ [![Go](https://img.shields.io/badge/go-1.22+-00ADD8?logo=go)](https://go.dev/)
5
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
6
+ [![Platform](https://img.shields.io/badge/platform-macOS%20%7C%20Windows-lightgrey)](https://github.com/hoveychen/speak-cli/releases/latest)
7
+
8
+ **speak-cli** is a fast, offline-capable multilingual text-to-speech CLI powered by [Kokoro](https://github.com/thewh1teagle/kokoro-onnx). It auto-detects language, supports 150+ voices, and uses the faster MLX backend automatically on Apple Silicon.
9
+
10
+ ```bash
11
+ speak "Hello, world!"
12
+ speak "你好,欢迎使用 speak-cli"
13
+ speak -v af_sky -s 1.2 "Speed it up a bit"
14
+ speak --output hello.wav "Save to file"
15
+ ```
16
+
17
+ ---
18
+
19
+ ## Features
20
+
21
+ - **Auto language detection** — switches between English and Chinese based on text content
22
+ - **150+ voices** — 54 voices bundled with the English model (American and British English, Spanish, French, Hindi, Japanese, Portuguese, Italian) and 103 Chinese voices
23
+ - **Apple Silicon optimised** — uses MLX backend for English on M-series Macs, falls back to ONNX automatically
24
+ - **Offline after first use** — engine and models are cached in `~/.cache/speak-cli/`
25
+ - **Save to file** — export speech as WAV with `--output`
26
+ - **Tiny binary** — ~8 MB Go binary; engine and models are downloaded on demand
27
+
28
+ ## Supported Platforms
29
+
30
+ | Platform | Architecture | Backend |
31
+ |----------|-------------|---------|
32
+ | macOS | Apple Silicon (arm64) | MLX (fast) + ONNX fallback |
33
+ | macOS | Intel (amd64) | ONNX |
34
+ | Windows | amd64 | ONNX |
35
+ | Linux | — | Export to file only (no audio playback) |
36
+
37
+ ---
38
+
39
+ ## Installation
40
+
41
+ ### npm (Node.js 18+)
42
+
43
+ ```bash
44
+ npm install -g speak-cli
45
+ ```
46
+
47
+ ### One-line install
48
+
49
+ **macOS:**
50
+ ```bash
51
+ curl -fsSL https://raw.githubusercontent.com/hoveychen/speak-cli/main/install.sh | bash
52
+ ```
53
+
54
+ **Windows (PowerShell):**
55
+ ```powershell
56
+ powershell -c "irm https://raw.githubusercontent.com/hoveychen/speak-cli/main/install.ps1 | iex"
57
+ ```
58
+
59
+ Auto-detects your platform and architecture, installs to `~/.local/bin` (macOS) or `%LOCALAPPDATA%\speak-cli\bin` (Windows), and adds it to your PATH automatically.
60
+
61
+ To install a specific version or change the install directory:
62
+ ```bash
63
+ # macOS
64
+ curl -fsSL https://raw.githubusercontent.com/hoveychen/speak-cli/main/install.sh | SPEAK_VERSION=v0.2.0 SPEAK_INSTALL_DIR=/usr/local/bin bash
65
+
66
+ # Windows
67
+ $env:SPEAK_VERSION="v0.2.0"; $env:SPEAK_INSTALL_DIR="C:\Tools"; powershell -c "irm https://raw.githubusercontent.com/hoveychen/speak-cli/main/install.ps1 | iex"
68
+ ```
69
+
70
+ ### Build from source
71
+
72
+ Requires Go 1.22+.
73
+
74
+ ```bash
75
+ git clone https://github.com/hoveychen/speak-cli.git
76
+ cd speak-cli
77
+ make build # outputs to bin/
78
+ ```
79
+
80
+ ---
81
+
82
+ ## Usage
83
+
84
+ ### Speak text
85
+
86
+ ```bash
87
+ speak "Hello, world!" # English (auto-detected)
88
+ speak "你好,欢迎使用 speak-cli" # Chinese (auto-detected)
89
+ speak --lang zh "Kokoro is great" # Force language
90
+ speak -v af_sky "Choose a specific voice" # Choose voice
91
+ speak -s 1.5 "Speak 50% faster" # Adjust speed (0.5–2.0)
92
+ speak -o out.wav "Save as WAV" # Export to file
93
+ ```
94
+
95
+ ### List voices
96
+
97
+ ```bash
98
+ speak voices # All voices
99
+ speak voices --lang en # English only
100
+ speak voices --lang zh # Chinese only
101
+ ```
102
+
103
+ ### Pre-download assets (for offline use)
104
+
105
+ ```bash
106
+ speak init # Download both English and Chinese
107
+ speak init --lang en # English only
108
+ speak init --lang zh # Chinese only
109
+ ```
110
+
111
+ ### All flags
112
+
113
+ ```
114
+ Flags:
115
+ --lang string Language: auto, en, zh (default "auto")
116
+ -v, --voice string Voice name (default depends on language)
117
+ -s, --speed float Speed multiplier 0.5–2.0 (default 1.0)
118
+ -o, --output string Save WAV to file instead of playing
119
+ --no-progress Suppress download progress bar
120
+ -h, --help Help
121
+ ```
122
+
123
+ ---
124
+
125
+ ## How it works
126
+
127
+ On first use, `speak` downloads the appropriate engine bundle and model:
128
+
129
+ ```
130
+ ~/.cache/speak-cli/
131
+ ├── en/
132
+ │ ├── engine/ # PyInstaller-packaged Kokoro ONNX or MLX engine
133
+ │ └── model/ # model.onnx + voices.bin (~88 MB INT8)
134
+ └── zh/
135
+ ├── engine/ # PyInstaller-packaged Kokoro ONNX engine
136
+ └── model/ # model.onnx + voices.bin + config.json (~82 MB INT8)
137
+ ```
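As a rough illustration of the layout above, this Node.js sketch computes where a language's engine and model directories would live and reports whether they are present. The helper names `cachePaths` and `isCached` are hypothetical and not part of speak-cli, and treating "both directories exist" as "cached" is an assumption; the Go binary may apply its own checks.

```javascript
const os = require("os");
const path = require("path");
const fs = require("fs");

// Hypothetical helper: derive the per-language cache directories shown in the tree.
function cachePaths(lang, home = os.homedir()) {
  const root = path.join(home, ".cache", "speak-cli", lang);
  return {
    engine: path.join(root, "engine"),
    model: path.join(root, "model"),
  };
}

// Assumption: a language counts as cached when both directories exist.
function isCached(lang, home = os.homedir()) {
  const { engine, model } = cachePaths(lang, home);
  return fs.existsSync(engine) && fs.existsSync(model);
}

for (const lang of ["en", "zh"]) {
  console.log(`${lang}: ${isCached(lang) ? "cached" : "not cached"}`);
}
```

A check like this could be run before going offline to decide whether `speak init` is still needed.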
138
+
139
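+ The Go binary manages downloads, invokes the engine subprocess with JSON arguments, and plays back the WAV output. Language auto-detection checks the text for characters in the Unicode CJK Unified Ideographs block (U+4E00–U+9FFF); if any are present, the text is treated as Chinese, otherwise as English.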
140
+
141
+ ---
142
+
143
+ ## Models
144
+
145
+ | Language | Model | Size | Source |
146
+ |----------|-------|------|--------|
147
+ | English | Kokoro v1.0 (INT8 ONNX) | ~88 MB | [thewh1teagle/kokoro-onnx](https://huggingface.co/thewh1teagle/kokoro-onnx) |
148
+ | Chinese | Kokoro v1.1-zh (INT8 ONNX) | ~82 MB | [hoveyc/speak-cli-models](https://huggingface.co/hoveyc/speak-cli-models) |
149
+ | English (MLX) | Kokoro-82M-bf16 | streamed | [mlx-community/Kokoro-82M-bf16](https://huggingface.co/mlx-community/Kokoro-82M-bf16) |
150
+
151
+ ---
152
+
153
+ ## AI Agent Integration
154
+
155
+ speak-cli ships with AI skills compatible with [Claude Code](https://claude.ai/claude-code), [Cursor](https://cursor.com), and other AI coding agents. Install the skill to let your AI assistant use speak-cli:
156
+
157
+ ```bash
158
+ npx skills add hoveychen/speak-cli
159
+ ```
160
+
161
+ ---
162
+
163
+ ## Contributing
164
+
165
+ Contributions are welcome! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
166
+
167
+ ---
168
+
169
+ ## License
170
+
171
+ [MIT](LICENSE)
package/npm/install.js ADDED
@@ -0,0 +1,93 @@
1
+ #!/usr/bin/env node
2
+ "use strict";
3
+
4
+ const https = require("https");
5
+ const fs = require("fs");
6
+ const path = require("path");
7
+
8
+ // Platform → GitHub Release asset mapping
9
+ const PLATFORMS = {
10
+ "darwin-arm64": "speak-darwin-arm64",
11
+ "darwin-x64": "speak-darwin-amd64",
12
+ "win32-x64": "speak-windows-amd64.exe",
13
+ };
14
+
15
+ const pkg = require("../package.json");
16
+ const version = pkg.version;
17
+ const key = `${process.platform}-${process.arch}`;
18
+ const asset = PLATFORMS[key];
19
+
20
+ if (!asset) {
21
+ console.error(
22
+ `speak-cli: unsupported platform ${process.platform}/${process.arch}\n` +
23
+ `Supported: macOS (arm64, x64), Windows (x64)\n` +
24
+ `Build from source: https://github.com/hoveychen/speak-cli`
25
+ );
26
+ process.exit(1);
27
+ }
28
+
29
+ const binDir = path.join(__dirname, "bin");
30
+ const binName = process.platform === "win32" ? "speak.exe" : "speak";
31
+ const binPath = path.join(binDir, binName);
32
+ const url = `https://github.com/hoveychen/speak-cli/releases/download/v${version}/${asset}`;
33
+
34
+ function download(url, dest, redirects) {
35
+ if (redirects <= 0) {
36
+ console.error("speak-cli: too many redirects");
37
+ process.exit(1);
38
+ }
39
+
40
+ return new Promise((resolve, reject) => {
41
+ const mod = url.startsWith("https") ? https : require("http");
42
+ mod
43
+ .get(url, (res) => {
44
+ // Follow redirects (GitHub uses 302)
45
+ if (res.statusCode >= 300 && res.statusCode < 400 && res.headers.location) {
46
+ res.resume();
47
+ return resolve(download(new URL(res.headers.location, url).href, dest, redirects - 1));
48
+ }
49
+
50
+ if (res.statusCode !== 200) {
51
+ res.resume();
52
+ return reject(new Error(`HTTP ${res.statusCode} downloading ${url}`));
53
+ }
54
+
55
+ const totalBytes = parseInt(res.headers["content-length"], 10) || 0;
56
+ let downloaded = 0;
57
+
58
+ const file = fs.createWriteStream(dest);
59
+ res.on("data", (chunk) => {
60
+ downloaded += chunk.length;
61
+ if (totalBytes > 0 && process.stderr.isTTY) {
62
+ const pct = ((downloaded / totalBytes) * 100).toFixed(0);
63
+ process.stderr.write(`\rspeak-cli: downloading ${pct}%`);
64
+ }
65
+ });
66
+ res.pipe(file);
67
+ file.on("finish", () => {
68
+ if (process.stderr.isTTY) process.stderr.write("\n");
69
+ file.close(resolve);
70
+ });
71
+ file.on("error", reject);
72
+ })
73
+ .on("error", reject);
74
+ });
75
+ }
76
+
77
+ async function main() {
78
+ fs.mkdirSync(binDir, { recursive: true });
79
+
80
+ console.log(`speak-cli: downloading speak v${version} for ${key}...`);
81
+ await download(url, binPath, 5);
82
+
83
+ if (process.platform !== "win32") {
84
+ fs.chmodSync(binPath, 0o755);
85
+ }
86
+
87
+ console.log(`speak-cli: installed to ${binPath}`);
88
+ }
89
+
90
+ main().catch((err) => {
91
+ console.error(`speak-cli: ${err.message}`);
92
+ process.exit(1);
93
+ });
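To see which release asset the installer above would request, the platform table and URL template can be exercised on their own. This is a minimal sketch: `resolveAssetUrl` is a hypothetical helper name, while the table entries and URL pattern are copied from `install.js`.

```javascript
// Same platform-to-asset table as install.js.
const PLATFORMS = {
  "darwin-arm64": "speak-darwin-arm64",
  "darwin-x64": "speak-darwin-amd64",
  "win32-x64": "speak-windows-amd64.exe",
};

// Hypothetical helper (not in the package): returns the GitHub release URL,
// or null when the platform/arch pair has no prebuilt binary.
function resolveAssetUrl(platform, arch, version) {
  const asset = PLATFORMS[`${platform}-${arch}`];
  if (!asset) return null;
  return `https://github.com/hoveychen/speak-cli/releases/download/v${version}/${asset}`;
}

// Show what the current machine would download for version 0.1.2.
console.log(resolveAssetUrl(process.platform, process.arch, "0.1.2"));
```

On Linux the helper returns `null`, which corresponds to the "unsupported platform" branch in the script.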
package/npm/run.js ADDED
@@ -0,0 +1,23 @@
1
+ #!/usr/bin/env node
2
+ "use strict";
3
+
4
+ const { execFileSync } = require("child_process");
5
+ const path = require("path");
6
+ const fs = require("fs");
7
+
8
+ const binName = process.platform === "win32" ? "speak.exe" : "speak";
9
+ const binPath = path.join(__dirname, "bin", binName);
10
+
11
+ if (!fs.existsSync(binPath)) {
12
+ console.error(
13
+ "speak-cli: binary not found. Run `node npm/install.js` or reinstall:\n" +
14
+ " npm install -g speak-cli"
15
+ );
16
+ process.exit(1);
17
+ }
18
+
19
+ try {
20
+ execFileSync(binPath, process.argv.slice(2), { stdio: "inherit" });
21
+ } catch (err) {
22
+ process.exit(err.status ?? 1);
23
+ }
package/package.json ADDED
@@ -0,0 +1,34 @@
1
+ {
2
+ "name": "speak-cli",
3
+ "version": "0.1.2",
4
+ "description": "Multilingual text-to-speech CLI powered by Kokoro",
5
+ "bin": {
6
+ "speak": "npm/run.js"
7
+ },
8
+ "scripts": {
9
+ "postinstall": "node npm/install.js"
10
+ },
11
+ "files": [
12
+ "npm/",
13
+ "skills/",
14
+ "README.md",
15
+ "LICENSE"
16
+ ],
17
+ "os": [
18
+ "darwin",
19
+ "win32"
20
+ ],
21
+ "keywords": [
22
+ "tts",
23
+ "text-to-speech",
24
+ "kokoro",
25
+ "cli",
26
+ "offline",
27
+ "multilingual"
28
+ ],
29
+ "license": "MIT",
30
+ "repository": {
31
+ "type": "git",
32
+ "url": "https://github.com/hoveychen/speak-cli.git"
33
+ }
34
+ }
@@ -0,0 +1,183 @@
1
+ ---
2
+ name: speak-cli
3
+ description: Multilingual text-to-speech CLI — synthesize speech, select voices, export audio
4
+ ---
5
+
6
+ # speak-cli
7
+
8
+ `speak` is a fast, offline-capable multilingual text-to-speech CLI powered by Kokoro. It auto-detects language, supports 150+ voices across 8 languages, and uses the faster MLX backend automatically on Apple Silicon.
9
+
10
+ After first use, the engine and models are cached locally — subsequent runs start instantly with no network required.
11
+
12
+ ## Installation
13
+
14
+ Before using speak commands, verify it is installed:
15
+
16
+ ```bash
17
+ speak --help
18
+ ```
19
+
20
+ If not installed, use one of:
21
+
22
+ ```bash
23
+ # npm (Node.js 18+)
24
+ npm install -g speak-cli
25
+
26
+ # macOS
27
+ curl -fsSL https://raw.githubusercontent.com/hoveychen/speak-cli/main/install.sh | bash
28
+
29
+ # Windows PowerShell
30
+ powershell -c "irm https://raw.githubusercontent.com/hoveychen/speak-cli/main/install.ps1 | iex"
31
+ ```
32
+
33
+ ## Commands Reference
34
+
35
+ ### `speak [flags] <text>` — Synthesize speech
36
+
37
+ Speak text aloud or save to WAV. Language is auto-detected from the text unless `--lang` is specified.
38
+
39
+ | Flag | Default | Description |
40
+ |------|---------|-------------|
41
+ | `--lang` | `auto` | Language: `auto`, `en`, `zh`, `es`, `fr`, `hi`, `it`, `ja`, `pt` |
42
+ | `-v, --voice` | per-language default | Voice name (see voice naming below) |
43
+ | `-s, --speed` | `1.0` | Speed multiplier, range `0.5`–`2.0` |
44
+ | `-o, --output` | _(play audio)_ | Save WAV to file instead of playing |
45
+ | `--no-progress` | `false` | Suppress download progress bar |
46
+
47
+ **Important:** The text must be passed as a single argument, so always wrap it in quotes.
48
+
49
+ ### `speak voices [--lang <lang>]` — List available voices
50
+
51
+ Lists all available voices offline (no engine or model download needed).
52
+
53
+ - `--lang all` (default): show all voices
54
+ - `--lang en`: English voices only
55
+ - `--lang zh`: Chinese voices only
56
+ - Also supports: `es`, `fr`, `hi`, `it`, `ja`, `pt`
57
+
58
+ ### `speak init [--lang <lang>]` — Pre-download assets
59
+
60
+ Downloads engine and model files for offline use.
61
+
62
+ - `--lang all` (default): download all languages
63
+ - `--lang en`: English only (~88 MB model + engine)
64
+ - `--lang zh`: Chinese only (~82 MB model + engine)
65
+
66
+ ## Language Detection
67
+
68
+ When `--lang` is `auto` (default), speak inspects the text for CJK characters (Unicode U+4E00–U+9FFF):
69
+ - If any CJK character is found → Chinese (`zh`)
70
+ - Otherwise → English (`en`)
71
+
72
+ For other languages (Spanish, French, Hindi, Italian, Japanese, Portuguese), you **must** specify `--lang` explicitly.
73
+
74
+ **Supported languages:** `en`, `zh`, `es`, `fr`, `hi`, `it`, `ja`, `pt`
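The detection rule described above can be restated in a few lines of JavaScript. This is only a sketch of the documented behaviour, not the tool's code (speak itself is a Go binary), and `detectLanguage` is a hypothetical name.

```javascript
// Sketch of the documented rule: any character in U+4E00..U+9FFF selects Chinese,
// everything else falls through to English.
function detectLanguage(text) {
  return /[\u4E00-\u9FFF]/.test(text) ? "zh" : "en";
}

console.log(detectLanguage("Hello, world!"));           // en
console.log(detectLanguage("你好,欢迎使用 speak-cli")); // zh
console.log(detectLanguage("こんにちは"));               // en (hiragana is outside the range)
```

The last call shows why `--lang ja` is required: kana-only Japanese text contains no characters in the checked block, and Japanese text that does contain kanji would be classified as Chinese under this rule.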
75
+
76
+ ## Voice Naming Convention
77
+
78
+ Voices follow the pattern `{language}{gender}_{name}`:
79
+
80
+ | Prefix | Language | Gender |
81
+ |--------|----------|--------|
82
+ | `af_` | American English | Female |
83
+ | `am_` | American English | Male |
84
+ | `bf_` | British English | Female |
85
+ | `bm_` | British English | Male |
86
+ | `ef_` | Spanish | Female |
87
+ | `em_` | Spanish | Male |
88
+ | `ff_` | French | Female |
89
+ | `hf_` | Hindi | Female |
90
+ | `hm_` | Hindi | Male |
91
+ | `if_` | Italian | Female |
92
+ | `im_` | Italian | Male |
93
+ | `jf_` | Japanese | Female |
94
+ | `jm_` | Japanese | Male |
95
+ | `pf_` | Portuguese (BR) | Female |
96
+ | `pm_` | Portuguese (BR) | Male |
97
+ | `zf_` | Mandarin Chinese | Female |
98
+ | `zm_` | Mandarin Chinese | Male |
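The naming pattern in the table can be decoded mechanically. The sketch below assumes every voice name has the form of one language letter, one gender letter, an underscore, and a name; `parseVoice` is a hypothetical helper, and it does not check that a given voice actually exists (use `speak voices` for that).

```javascript
// Language and gender letters taken from the prefix table above.
const LANGUAGES = {
  a: "American English",
  b: "British English",
  e: "Spanish",
  f: "French",
  h: "Hindi",
  i: "Italian",
  j: "Japanese",
  p: "Portuguese (BR)",
  z: "Mandarin Chinese",
};
const GENDERS = { f: "Female", m: "Male" };

// Hypothetical helper: split a voice name into its documented parts,
// or return null when it does not follow the pattern.
function parseVoice(voice) {
  const match = /^([a-z])([fm])_(.+)$/.exec(voice);
  if (!match) return null;
  const [, lang, gender, name] = match;
  if (!Object.hasOwn(LANGUAGES, lang)) return null;
  return { language: LANGUAGES[lang], gender: GENDERS[gender], name };
}

console.log(parseVoice("af_sky"));
console.log(parseVoice("zm_010"));
```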
99
+
100
+ **Default voices per language:**
101
+
102
+ | Language | Default Voice |
103
+ |----------|--------------|
104
+ | English | `af_heart` |
105
+ | Chinese | `zf_001` |
106
+ | Spanish | `ef_dora` |
107
+ | French | `ff_siwis` |
108
+ | Hindi | `hf_alpha` |
109
+ | Italian | `if_sara` |
110
+ | Japanese | `jf_alpha` |
111
+ | Portuguese | `pf_dora` |
112
+
113
+ Use `speak voices --lang <code>` to see all available voices for a language.
114
+
115
+ ## Common Patterns
116
+
117
+ ```bash
118
+ # Basic speech (language auto-detected)
119
+ speak "Hello, world!"
120
+ speak "你好,欢迎使用 speak-cli"
121
+
122
+ # Choose a specific voice
123
+ speak -v af_sky "A different voice"
124
+ speak -v zm_010 "换一个男声"
125
+
126
+ # Adjust speed
127
+ speak -s 1.5 "Speak faster"
128
+ speak -s 0.7 "Speak slower"
129
+
130
+ # Save to WAV file
131
+ speak -o greeting.wav "Hello, world!"
132
+
133
+ # Force language (required for es/fr/hi/it/ja/pt)
134
+ speak --lang ja "こんにちは"
135
+ speak --lang es "Hola, mundo"
136
+
137
+ # Scripting: suppress progress bar
138
+ speak --no-progress -o out.wav "Silent download"
139
+
140
+ # Batch export
141
+ for i in 1 2 3; do
142
+ speak -o "part${i}.wav" "This is part ${i}"
143
+ done
144
+ ```
145
+
146
+ ## Offline Setup
147
+
148
+ For offline environments, pre-download assets:
149
+
150
+ ```bash
151
+ speak init # All languages
152
+ speak init --lang en # English only
153
+ speak init --lang zh # Chinese only
154
+ ```
155
+
156
+ Assets are cached in `~/.cache/speak-cli/`:
157
+ - Engine: PyInstaller-packaged Kokoro engine (~200 MB)
158
+ - Models: ONNX model + voice data (~80–90 MB per language)
159
+ - On Apple Silicon: MLX engine is also downloaded for English (faster)
160
+
161
+ ## Platform Support
162
+
163
+ | Platform | Architecture | Backend | Notes |
164
+ |----------|-------------|---------|-------|
165
+ | macOS | Apple Silicon (arm64) | MLX + ONNX fallback | Fastest on M-series |
166
+ | macOS | Intel (amd64) | ONNX | |
167
+ | Windows | amd64 | ONNX | |
168
+ | Linux | — | Export only | Use `--output` to save WAV; no audio playback |
169
+
170
+ ## Troubleshooting
171
+
172
+ - **First run is slow**: The engine and model are downloaded on first use. Use `speak init` to pre-download.
173
+ - **MLX fallback message on Apple Silicon**: If the MLX engine cannot load (e.g., missing Metal libraries), speak automatically falls back to the ONNX engine. This is normal and does not affect output quality.
174
+ - **"unsupported platform" error**: speak currently supports macOS and Windows. On Linux, build from source.
175
+ - **Output format**: Only WAV output is supported. Use external tools (ffmpeg) to convert to other formats.
176
+
177
+ ## Constraints
178
+
179
+ - Text must be passed as a **single argument** — always wrap in quotes
180
+ - Speed range: `0.5` to `2.0`
181
+ - Output format: WAV only (use `--output` flag)
182
+ - No streaming or pipe support — the full audio is synthesized before playback
183
+ - Auto-detection only distinguishes English and Chinese; other languages require `--lang`
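When calling speak from a script, the constraints above can be enforced before the process is spawned. The following sketch builds an argument array using only the flags documented in this file; `buildSpeakArgs` is a hypothetical wrapper, not something the package provides.

```javascript
// Hypothetical wrapper: validate the documented constraints and build an argv array.
function buildSpeakArgs(text, { lang, voice, speed, output } = {}) {
  if (typeof text !== "string" || text.length === 0) {
    throw new TypeError("text must be a non-empty string");
  }
  // Documented speed range is 0.5 to 2.0 (NaN also fails this check).
  if (speed !== undefined && !(speed >= 0.5 && speed <= 2.0)) {
    throw new RangeError("speed must be between 0.5 and 2.0");
  }
  const args = [];
  if (lang) args.push("--lang", lang);
  if (voice) args.push("-v", voice);
  if (speed !== undefined) args.push("-s", String(speed));
  if (output) args.push("-o", output);
  // The text is one array element, so it reaches speak as a single argument.
  args.push(text);
  return args;
}

console.log(buildSpeakArgs("Hello, world!", { voice: "af_sky", speed: 1.2 }));
```

The resulting array can be passed to `execFileSync("speak", args, { stdio: "inherit" })` from Node's `child_process` module. Because no shell is involved, the text stays a single argument without any extra quoting.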