speak-cli 0.1.2
- package/LICENSE +21 -0
- package/README.md +171 -0
- package/npm/install.js +93 -0
- package/npm/run.js +23 -0
- package/package.json +34 -0
- package/skills/speak-cli/SKILL.md +183 -0
package/LICENSE
ADDED
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2025 hoveychen
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
package/README.md
ADDED
@@ -0,0 +1,171 @@
+# speak-cli
+
+[](https://github.com/hoveychen/speak-cli/releases/latest)
+[](https://go.dev/)
+[](LICENSE)
+[](https://github.com/hoveychen/speak-cli/releases/latest)
+
+**speak-cli** is a fast, offline-capable multilingual text-to-speech CLI powered by [Kokoro](https://github.com/thewh1teagle/kokoro-onnx). It auto-detects language, supports 150+ voices, and uses the faster MLX backend automatically on Apple Silicon.
+
+```bash
+speak "Hello, world!"
+speak "你好,欢迎使用 speak-cli"
+speak -v af_sky -s 1.2 "Speed it up a bit"
+speak --output hello.wav "Save to file"
+```
+
+---
+
+## Features
+
+- **Auto language detection** — switches between English and Chinese based on text content
+- **150+ voices** — 54 voices covering American and British English, Spanish, French, Hindi, Japanese, Portuguese, and Italian, plus 103 Chinese voices
+- **Apple Silicon optimised** — uses MLX backend for English on M-series Macs, falls back to ONNX automatically
+- **Offline after first use** — engine and models are cached in `~/.cache/speak-cli/`
+- **Save to file** — export speech as WAV with `--output`
+- **Tiny binary** — ~8 MB Go binary; engine and models are downloaded on demand
+
+## Supported Platforms
+
+| Platform | Architecture | Backend |
+|----------|-------------|---------|
+| macOS | Apple Silicon (arm64) | MLX (fast) + ONNX fallback |
+| macOS | Intel (amd64) | ONNX |
+| Windows | amd64 | ONNX |
+| Linux | — | Export to file only (no audio playback) |
+
+---
+
+## Installation
+
+### npm (Node.js 18+)
+
+```bash
+npm install -g speak-cli
+```
+
+### One-line install
+
+**macOS:**
+```bash
+curl -fsSL https://raw.githubusercontent.com/hoveychen/speak-cli/main/install.sh | bash
+```
+
+**Windows (PowerShell):**
+```powershell
+powershell -c "irm https://raw.githubusercontent.com/hoveychen/speak-cli/main/install.ps1 | iex"
+```
+
+The installer auto-detects your platform and architecture, installs to `~/.local/bin` (macOS) or `%LOCALAPPDATA%\speak-cli\bin` (Windows), and adds that directory to your PATH.
+
+To install a specific version or change the install directory:
+```bash
+# macOS (the variables must be set on bash, the command that runs the script, not on curl)
+curl -fsSL https://raw.githubusercontent.com/hoveychen/speak-cli/main/install.sh | SPEAK_VERSION=v0.2.0 SPEAK_INSTALL_DIR=/usr/local/bin bash
+
+# Windows (PowerShell)
+$env:SPEAK_VERSION="v0.2.0"; $env:SPEAK_INSTALL_DIR="C:\Tools"; powershell -c "irm https://raw.githubusercontent.com/hoveychen/speak-cli/main/install.ps1 | iex"
+```
+
+### Build from source
+
+Requires Go 1.22+.
+
+```bash
+git clone https://github.com/hoveychen/speak-cli.git
+cd speak-cli
+make build   # outputs to bin/
+```
+
+---
+
+## Usage
+
+### Speak text
+
+```bash
+speak "Hello, world!"                       # English (auto-detected)
+speak "你好,欢迎使用 speak-cli"              # Chinese (auto-detected)
+speak --lang zh "Kokoro is great"           # Force language
+speak -v af_sky "Choose a specific voice"   # Choose voice
+speak -s 1.5 "Speak 50% faster"             # Adjust speed (0.5–2.0)
+speak -o out.wav "Save as WAV"              # Export to file
+```
+
+### List voices
+
+```bash
+speak voices             # All voices
+speak voices --lang en   # English only
+speak voices --lang zh   # Chinese only
+```
+
+### Pre-download assets (for offline use)
+
+```bash
+speak init             # Download both English and Chinese
+speak init --lang en   # English only
+speak init --lang zh   # Chinese only
+```
+
+### All flags
+
+```
+Flags:
+      --lang string     Language: auto, en, zh (default "auto")
+  -v, --voice string    Voice name (default depends on language)
+  -s, --speed float     Speed multiplier 0.5–2.0 (default 1.0)
+  -o, --output string   Save WAV to file instead of playing
+      --no-progress     Suppress download progress bar
+  -h, --help            Help
+```
+
+---
+
+## How it works
+
+On first use, `speak` downloads the appropriate engine bundle and model:
+
+```
+~/.cache/speak-cli/
+├── en/
+│   ├── engine/    # PyInstaller-packaged Kokoro ONNX or MLX engine
+│   └── model/     # model.onnx + voices.bin (~88 MB INT8)
+└── zh/
+    ├── engine/    # PyInstaller-packaged Kokoro ONNX engine
+    └── model/     # model.onnx + voices.bin + config.json (~82 MB INT8)
+```
+
+The Go binary manages downloads, invokes the engine subprocess with JSON arguments, and plays back the WAV output. Language auto-detection checks the text for characters in the Unicode CJK Unified Ideographs block (U+4E00–U+9FFF).
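
The detection rule described above can be sketched in a few lines. This is an illustrative JavaScript version written for this walkthrough, not the Go code that ships in the binary (the Go source is not part of this diff); the function name `detectLang` is hypothetical.

```javascript
// Hypothetical helper: mirrors the documented rule, not the shipped Go code.
// Any code point in U+4E00–U+9FFF selects Chinese; everything else is English.
function detectLang(text) {
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    if (cp >= 0x4e00 && cp <= 0x9fff) return "zh";
  }
  return "en";
}

console.log(detectLang("Hello, world!"));            // en
console.log(detectLang("你好,欢迎使用 speak-cli"));   // zh
console.log(detectLang("Hola, mundo"));              // en
```

Under this rule a single ideograph anywhere in the text is enough to select Chinese, and any text without one is treated as English, which is why other languages have to be selected with `--lang`.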
+
+---
+
+## Models
+
+| Language | Model | Size | Source |
+|----------|-------|------|--------|
+| English | Kokoro v1.0 (INT8 ONNX) | ~88 MB | [thewh1teagle/kokoro-onnx](https://huggingface.co/thewh1teagle/kokoro-onnx) |
+| Chinese | Kokoro v1.1-zh (INT8 ONNX) | ~82 MB | [hoveyc/speak-cli-models](https://huggingface.co/hoveyc/speak-cli-models) |
+| English (MLX) | Kokoro-82M-bf16 | streamed | [mlx-community/Kokoro-82M-bf16](https://huggingface.co/mlx-community/Kokoro-82M-bf16) |
+
+---
+
+## AI Agent Integration
+
+speak-cli ships with AI skills compatible with [Claude Code](https://claude.ai/claude-code), [Cursor](https://cursor.com), and other AI coding agents. Install the skill to let your AI assistant use speak-cli:
+
+```bash
+npx skills add hoveychen/speak-cli
+```
+
+---
+
+## Contributing
+
+Contributions are welcome! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
+
+---
+
+## License
+
+[MIT](LICENSE)
package/npm/install.js
ADDED
@@ -0,0 +1,93 @@
+#!/usr/bin/env node
+"use strict";
+
+const https = require("https");
+const fs = require("fs");
+const path = require("path");
+
+// Platform → GitHub Release asset mapping
+const PLATFORMS = {
+  "darwin-arm64": "speak-darwin-arm64",
+  "darwin-x64": "speak-darwin-amd64",
+  "win32-x64": "speak-windows-amd64.exe",
+};
+
+const pkg = require("../package.json");
+const version = pkg.version;
+const key = `${process.platform}-${process.arch}`;
+const asset = PLATFORMS[key];
+
+if (!asset) {
+  console.error(
+    `speak-cli: unsupported platform ${process.platform}/${process.arch}\n` +
+      `Supported: macOS (arm64, x64), Windows (x64)\n` +
+      `Build from source: https://github.com/hoveychen/speak-cli`
+  );
+  process.exit(1);
+}
+
+const binDir = path.join(__dirname, "bin");
+const binName = process.platform === "win32" ? "speak.exe" : "speak";
+const binPath = path.join(binDir, binName);
+const url = `https://github.com/hoveychen/speak-cli/releases/download/v${version}/${asset}`;
+
+function download(url, dest, redirects) {
+  if (redirects <= 0) {
+    console.error("speak-cli: too many redirects");
+    process.exit(1);
+  }
+
+  return new Promise((resolve, reject) => {
+    const mod = url.startsWith("https") ? https : require("http");
+    mod
+      .get(url, (res) => {
+        // Follow redirects (GitHub uses 302)
+        if (res.statusCode >= 300 && res.statusCode < 400 && res.headers.location) {
+          res.resume();
+          return resolve(download(res.headers.location, dest, redirects - 1));
+        }
+
+        if (res.statusCode !== 200) {
+          res.resume();
+          return reject(new Error(`HTTP ${res.statusCode} downloading ${url}`));
+        }
+
+        const totalBytes = parseInt(res.headers["content-length"], 10) || 0;
+        let downloaded = 0;
+
+        const file = fs.createWriteStream(dest);
+        res.on("data", (chunk) => {
+          downloaded += chunk.length;
+          if (totalBytes > 0 && process.stderr.isTTY) {
+            const pct = ((downloaded / totalBytes) * 100).toFixed(0);
+            process.stderr.write(`\rspeak-cli: downloading ${pct}%`);
+          }
+        });
+        res.pipe(file);
+        file.on("finish", () => {
+          if (process.stderr.isTTY) process.stderr.write("\n");
+          file.close(resolve);
+        });
+        file.on("error", reject);
+      })
+      .on("error", reject);
+  });
+}
+
+async function main() {
+  fs.mkdirSync(binDir, { recursive: true });
+
+  console.log(`speak-cli: downloading speak v${version} for ${key}...`);
+  await download(url, binPath, 5);
+
+  if (process.platform !== "win32") {
+    fs.chmodSync(binPath, 0o755);
+  }
+
+  console.log(`speak-cli: installed to ${binPath}`);
+}
+
+main().catch((err) => {
+  console.error(`speak-cli: ${err.message}`);
+  process.exit(1);
+});
package/npm/run.js
ADDED
@@ -0,0 +1,23 @@
+#!/usr/bin/env node
+"use strict";
+
+const { execFileSync } = require("child_process");
+const path = require("path");
+const fs = require("fs");
+
+const binName = process.platform === "win32" ? "speak.exe" : "speak";
+const binPath = path.join(__dirname, "bin", binName);
+
+if (!fs.existsSync(binPath)) {
+  console.error(
+    "speak-cli: binary not found. Run `node npm/install.js` or reinstall:\n" +
+      " npm install -g speak-cli"
+  );
+  process.exit(1);
+}
+
+try {
+  execFileSync(binPath, process.argv.slice(2), { stdio: "inherit" });
+} catch (err) {
+  process.exit(err.status ?? 1);
+}
package/package.json
ADDED
@@ -0,0 +1,34 @@
+{
+  "name": "speak-cli",
+  "version": "0.1.2",
+  "description": "Multilingual text-to-speech CLI powered by Kokoro",
+  "bin": {
+    "speak": "npm/run.js"
+  },
+  "scripts": {
+    "postinstall": "node npm/install.js"
+  },
+  "files": [
+    "npm/",
+    "skills/",
+    "README.md",
+    "LICENSE"
+  ],
+  "os": [
+    "darwin",
+    "win32"
+  ],
+  "keywords": [
+    "tts",
+    "text-to-speech",
+    "kokoro",
+    "cli",
+    "offline",
+    "multilingual"
+  ],
+  "license": "MIT",
+  "repository": {
+    "type": "git",
+    "url": "https://github.com/hoveychen/speak-cli.git"
+  }
+}
package/skills/speak-cli/SKILL.md
ADDED
@@ -0,0 +1,183 @@
+---
+name: speak-cli
+description: Multilingual text-to-speech CLI — synthesize speech, select voices, export audio
+---
+
+# speak-cli
+
+`speak` is a fast, offline-capable multilingual text-to-speech CLI powered by Kokoro. It auto-detects language, supports 150+ voices across 8 languages, and uses the faster MLX backend automatically on Apple Silicon.
+
+After first use, the engine and models are cached locally — subsequent runs start instantly with no network required.
+
+## Installation
+
+Before running `speak` commands, verify that it is installed:
+
+```bash
+speak --help
+```
+
+If not installed, use one of:
+
+```bash
+# npm (Node.js 18+)
+npm install -g speak-cli
+
+# macOS
+curl -fsSL https://raw.githubusercontent.com/hoveychen/speak-cli/main/install.sh | bash
+
+# Windows PowerShell
+powershell -c "irm https://raw.githubusercontent.com/hoveychen/speak-cli/main/install.ps1 | iex"
+```
+
+## Commands Reference
+
+### `speak [flags] <text>` — Synthesize speech
+
+Speak text aloud or save to WAV. Language is auto-detected from the text unless `--lang` is specified.
+
+| Flag | Default | Description |
+|------|---------|-------------|
+| `--lang` | `auto` | Language: `auto`, `en`, `zh`, `es`, `fr`, `hi`, `it`, `ja`, `pt` |
+| `-v, --voice` | per-language default | Voice name (see voice naming below) |
+| `-s, --speed` | `1.0` | Speed multiplier, range `0.5`–`2.0` |
+| `-o, --output` | _(play audio)_ | Save WAV to file instead of playing |
+| `--no-progress` | `false` | Suppress download progress bar |
+
+**Important:** The text must be passed as a single argument, so always wrap it in quotes.
+
+### `speak voices [--lang <lang>]` — List available voices
+
+Lists all available voices offline (no engine or model download needed).
+
+- `--lang all` (default): show all voices
+- `--lang en`: English voices only
+- `--lang zh`: Chinese voices only
+- Also supports: `es`, `fr`, `hi`, `it`, `ja`, `pt`
+
+### `speak init [--lang <lang>]` — Pre-download assets
+
+Downloads engine and model files for offline use.
+
+- `--lang all` (default): download all languages
+- `--lang en`: English only (~88 MB model + engine)
+- `--lang zh`: Chinese only (~82 MB model + engine)
+
+## Language Detection
+
+When `--lang` is `auto` (default), speak inspects the text for CJK characters (Unicode U+4E00–U+9FFF):
+- If any CJK character is found → Chinese (`zh`)
+- Otherwise → English (`en`)
+
+For other languages (Spanish, French, Hindi, Italian, Japanese, Portuguese), you **must** specify `--lang` explicitly.
+
+**Supported languages:** `en`, `zh`, `es`, `fr`, `hi`, `it`, `ja`, `pt`
+
+## Voice Naming Convention
+
+Voices follow the pattern `{language}{gender}_{name}`:
+
+| Prefix | Language | Gender |
+|--------|----------|--------|
+| `af_` | American English | Female |
+| `am_` | American English | Male |
+| `bf_` | British English | Female |
+| `bm_` | British English | Male |
+| `ef_` | Spanish | Female |
+| `em_` | Spanish | Male |
+| `ff_` | French | Female |
+| `hf_` | Hindi | Female |
+| `hm_` | Hindi | Male |
+| `if_` | Italian | Female |
+| `im_` | Italian | Male |
+| `jf_` | Japanese | Female |
+| `jm_` | Japanese | Male |
+| `pf_` | Portuguese (BR) | Female |
+| `pm_` | Portuguese (BR) | Male |
+| `zf_` | Mandarin Chinese | Female |
+| `zm_` | Mandarin Chinese | Male |
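
As an illustration of the naming pattern, the prefix table above can be turned into a small parser. This is a sketch only: `parseVoice` and the two lookup objects are hypothetical names, and the code checks the shape of a voice name, not whether that voice actually exists (use `speak voices` for that).

```javascript
// Prefix letters are taken from the table above; the helper itself is hypothetical.
const LANGUAGES = {
  a: "American English",
  b: "British English",
  e: "Spanish",
  f: "French",
  h: "Hindi",
  i: "Italian",
  j: "Japanese",
  p: "Portuguese (BR)",
  z: "Mandarin Chinese",
};
const GENDERS = { f: "Female", m: "Male" };

// Splits e.g. "af_sky" into language, gender, and name. Returns null when the
// string does not match the {language}{gender}_{name} pattern.
function parseVoice(voice) {
  const match = /^([a-z])([fm])_(.+)$/.exec(voice);
  if (!match) return null;
  const [, lang, gender, name] = match;
  if (!(lang in LANGUAGES)) return null;
  return { language: LANGUAGES[lang], gender: GENDERS[gender], name };
}

console.log(parseVoice("af_sky"));
console.log(parseVoice("zm_010"));
console.log(parseVoice("sky")); // null: no language/gender prefix
```

Because the check is purely structural, a name such as `fm_x` would also parse even though the table lists no French male prefix.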
+
+**Default voices per language:**
+
+| Language | Default Voice |
+|----------|--------------|
+| English | `af_heart` |
+| Chinese | `zf_001` |
+| Spanish | `ef_dora` |
+| French | `ff_siwis` |
+| Hindi | `hf_alpha` |
+| Italian | `if_sara` |
+| Japanese | `jf_alpha` |
+| Portuguese | `pf_dora` |
+
+Use `speak voices --lang <code>` to see all available voices for a language.
+
+## Common Patterns
+
+```bash
+# Basic speech (language auto-detected)
+speak "Hello, world!"
+speak "你好,欢迎使用 speak-cli"
+
+# Choose a specific voice
+speak -v af_sky "A different voice"
+speak -v zm_010 "换一个男声"
+
+# Adjust speed
+speak -s 1.5 "Speak faster"
+speak -s 0.7 "Speak slower"
+
+# Save to WAV file
+speak -o greeting.wav "Hello, world!"
+
+# Force language (required for es/fr/hi/it/ja/pt)
+speak --lang ja "こんにちは"
+speak --lang es "Hola, mundo"
+
+# Scripting: suppress progress bar
+speak --no-progress -o out.wav "Silent download"
+
+# Batch export
+for i in 1 2 3; do
+  speak -o "part${i}.wav" "This is part ${i}"
+done
+```
+
+## Offline Setup
+
+For offline environments, pre-download assets:
+
+```bash
+speak init             # All languages
+speak init --lang en   # English only
+speak init --lang zh   # Chinese only
+```
+
+Assets are cached in `~/.cache/speak-cli/`:
+- Engine: PyInstaller-packaged Kokoro engine (~200 MB)
+- Models: ONNX model + voice data (~80–90 MB per language)
+- On Apple Silicon: MLX engine is also downloaded for English (faster)
+
+## Platform Support
+
+| Platform | Architecture | Backend | Notes |
+|----------|-------------|---------|-------|
+| macOS | Apple Silicon (arm64) | MLX + ONNX fallback | Fastest on M-series |
+| macOS | Intel (amd64) | ONNX | |
+| Windows | amd64 | ONNX | |
+| Linux | — | Export only | Use `--output` to save WAV; no audio playback |
+
+## Troubleshooting
+
+- **First run is slow**: The engine and model are downloaded on first use. Use `speak init` to pre-download.
+- **MLX fallback message on Apple Silicon**: If the MLX engine cannot load (e.g., missing Metal libraries), speak automatically falls back to the ONNX engine. This is normal and does not affect output quality.
+- **"unsupported platform" error**: speak currently supports macOS and Windows. On Linux, build from source.
+- **Output format**: Only WAV output is supported. Use external tools (ffmpeg) to convert to other formats.
+
+## Constraints
+
+- Text must be passed as a **single argument** — always wrap in quotes
+- Speed range: `0.5` to `2.0`
+- Output format: WAV only (use `--output` flag)
+- No streaming or pipe support — the full audio is synthesized before playback
+- Auto-detection only distinguishes English and Chinese; other languages require `--lang`
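
A caller that drives `speak` from a script may want to enforce these constraints before starting a process. The sketch below is one possible way to do that in Node.js; `buildSpeakArgs` and its option names are hypothetical, and only the flags documented above (`--lang`, `-v`, `-s`, `-o`, `--no-progress`) are used.

```javascript
// Hypothetical wrapper: builds an argument array for `speak` and enforces the
// documented constraints before any process is started.
function buildSpeakArgs(text, options = {}) {
  const { lang, voice, speed, output, noProgress } = options;
  if (typeof text !== "string" || text.length === 0) {
    throw new Error("text must be a non-empty string");
  }
  const args = [];
  if (lang !== undefined) args.push("--lang", lang);
  if (voice !== undefined) args.push("-v", voice);
  if (speed !== undefined) {
    // Written as a negated range test so that NaN is rejected as well.
    if (typeof speed !== "number" || !(speed >= 0.5 && speed <= 2.0)) {
      throw new RangeError("speed must be between 0.5 and 2.0");
    }
    args.push("-s", String(speed));
  }
  if (output !== undefined) args.push("-o", output);
  if (noProgress) args.push("--no-progress");
  // The text stays one array element, so it reaches speak as a single argument.
  args.push(text);
  return args;
}

console.log(buildSpeakArgs("Hello, world!", { speed: 1.5, output: "greeting.wav" }));
```

On a system where `speak` resolves to an executable on PATH, the resulting array could be passed to `execFileSync("speak", args, { stdio: "inherit" })`, the same `child_process` call that `npm/run.js` uses. Passing an array avoids shell quoting, so the single-argument rule holds without extra escaping.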