voicemix 1.3.0 → 1.3.4
- package/README.md +5 -0
- package/package.json +1 -1
- package/providers/cartesia.js +3 -3
- package/providers/elevenlabs.js +3 -3
- package/providers/resemble.js +3 -3
- package/skills/voicemix/SKILL.md +254 -0
package/README.md
CHANGED
````diff
@@ -8,6 +8,11 @@ VoiceMix is a flexible text-to-speech library that allows you to generate speech
 npm install voicemix dotenv
 ```
 
+> **AI Skill**: You can also add VoiceMix as a skill for AI agentic development:
+> ```bash
+> npx skills add https://github.com/clasen/VoiceMix --skill voicemix
+> ```
+
 ### Environment Setup
 
 Create a `.env` file in your project root with your API keys:
````
package/package.json
CHANGED
package/providers/cartesia.js
CHANGED
````diff
@@ -17,9 +17,6 @@ export class CartesiaProvider {
 }
 };
 
-if (!this.apiKey) {
-throw new ProviderError('Cartesia API key is required', 'cartesia');
-}
 }
 
 _getRequestOptions(voiceId, text, format = 'mp3') {
@@ -53,6 +50,9 @@ export class CartesiaProvider {
 
 async save(voiceId, text, format, filePath, fileName) {
 try {
+if (!this.apiKey) {
+throw new ProviderError('Cartesia API key is required', 'cartesia');
+}
 if (!voiceId) {
 throw new ProviderError('Voice ID is required', 'cartesia');
 }
````
package/providers/elevenlabs.js
CHANGED
````diff
@@ -15,9 +15,6 @@ export class ElevenLabsProvider {
 use_speaker_boost: true
 };
 
-if (!this.apiKey) {
-throw new ProviderError('ElevenLabs API key is required', 'elevenlabs');
-}
 }
 
 monolingual_v1() {
@@ -64,6 +61,9 @@ export class ElevenLabsProvider {
 
 async save(voiceId, text, format, filePath, fileName) {
 try {
+if (!this.apiKey) {
+throw new ProviderError('ElevenLabs API key is required', 'elevenlabs');
+}
 if (!voiceId) {
 throw new ProviderError('Voice ID is required', 'elevenlabs');
 }
````
package/providers/resemble.js
CHANGED
````diff
@@ -13,9 +13,6 @@ export class ResembleProvider {
 precision: 'PCM_16'
 };
 
-if (!this.apiKey) {
-throw new ProviderError('Resemble API key is required', 'resemble');
-}
 }
 
 _getRequestOptions(endpoint, data) {
@@ -33,6 +30,9 @@ export class ResembleProvider {
 
 async save(voiceId, text, format, filePath, fileName) {
 try {
+if (!this.apiKey) {
+throw new ProviderError('Resemble API key is required', 'resemble');
+}
 if (!voiceId) {
 throw new ProviderError('Voice ID is required', 'resemble');
 }
````
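All three providers receive the same change: the `if (!this.apiKey)` guard moves out of the constructor and into the start of `save()`, inside its `try` block. As far as these hunks show, a provider can therefore be constructed without a key, and the missing key is reported only when audio is requested. The sketch below models that lazy-validation pattern; `SketchProvider` and the stand-in `ProviderError` are hypothetical, and how the real `save()` handles the error in its `catch` block is not visible in this diff.

```javascript
// Hypothetical stand-in for VoiceMix's ProviderError; the real class is not shown in this diff.
class ProviderError extends Error {
  constructor(message, provider) {
    super(message);
    this.name = 'ProviderError';
    this.provider = provider;
  }
}

// Hypothetical provider mirroring the 1.3.4 change: the constructor stores the key
// without validating it, and save() checks it before doing any work.
class SketchProvider {
  constructor(apiKey) {
    this.apiKey = apiKey; // may be undefined; construction no longer throws
  }

  async save(voiceId, text) {
    if (!this.apiKey) {
      throw new ProviderError('Sketch API key is required', 'sketch');
    }
    if (!voiceId) {
      throw new ProviderError('Voice ID is required', 'sketch');
    }
    // A real provider would call its TTS API and write a file here.
    return `${voiceId}.mp3`;
  }
}

// Resolves to the ProviderError message from save(), or null if save() succeeded.
async function trySave(provider, voiceId, text) {
  try {
    await provider.save(voiceId, text);
    return null;
  } catch (err) {
    return err instanceof ProviderError ? err.message : String(err);
  }
}

// Construction succeeds even though no key is available...
const keyless = new SketchProvider();
// ...and the failure surfaces only when save() runs.
trySave(keyless, 'voice-1', 'Hello').then((msg) => console.log(msg)); // Sketch API key is required
```

The trade-off is timing: a missing key no longer fails fast when the provider is created, so code that relied on the constructor throwing at startup will now see the error on the first `save()` call instead.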
package/skills/voicemix/SKILL.md
ADDED
````diff
@@ -0,0 +1,254 @@
+---
+name: voicemix
+description: Instructions for generating speech audio files using the VoiceMix Node.js library
+version: 1.3.2
+tags: [nodejs, tts, text-to-speech, elevenlabs, resemble-ai, cartesia, audio]
+---
+
+# VoiceMix Skill
+
+## Table of Contents
+
+- [Overview](#overview)
+- [Installation](#installation)
+- [Environment Setup](#environment-setup)
+- [Core API](#core-api)
+- [Common Tasks](#common-tasks)
+- [Agent Usage Rules](#agent-usage-rules)
+- [Troubleshooting](#troubleshooting)
+- [References](#references)
+
+## Overview
+
+VoiceMix is a chainable text-to-speech Node.js library that generates audio files (mp3/wav) from text using multiple TTS provider APIs.
+
+**Use this skill when:**
+
+- A task requires generating speech audio from text
+- Integrating text-to-speech into a Node.js project
+- Working with ElevenLabs, Resemble AI, or Cartesia APIs
+- Batch-processing scripts of dialogue lines into audio files
+
+**Do NOT use this skill when:**
+
+- The task requires real-time streaming audio playback (VoiceMix writes to files)
+- The target environment is browser-only (VoiceMix uses Node.js `fs`)
+- Speech-to-text (transcription) is needed — this is text-**to**-speech only
+
+## Installation
+
+```bash
+npm install voicemix dotenv
+```
+
+- Requires Node.js with ESM support (`"type": "module"` in `package.json`).
+- `dotenv` is optional but recommended for loading API keys from `.env`.
+
+## Environment Setup
+
+Create a `.env` file with the relevant provider key(s):
+
+```plaintext
+ELEVENLABS_API_KEY="your-elevenlabs-key"
+RESEMBLE_API_KEY="your-resemble-key"
+CARTESIA_API_KEY="your-cartesia-key"
+```
+
+Only the key for the provider being used is required. ElevenLabs is the default provider.
+
````
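Since only the active provider's key is required, a script can check for that single variable before it builds anything. The sketch assumes the variable names from the `.env` example above; `PROVIDER_ENV_VARS` and `requireProviderKey` are hypothetical helpers, not part of VoiceMix, and `env` is passed in explicitly so the check is easy to test.

```javascript
// Env var names taken from the .env example; the map itself is a hypothetical helper.
const PROVIDER_ENV_VARS = {
  elevenlabs: 'ELEVENLABS_API_KEY', // default provider
  resemble: 'RESEMBLE_API_KEY',
  cartesia: 'CARTESIA_API_KEY',
};

// Returns the key for the chosen provider, or throws a descriptive error if it is unset.
function requireProviderKey(provider = 'elevenlabs', env = process.env) {
  const varName = PROVIDER_ENV_VARS[provider];
  if (!varName) {
    throw new Error(`Unknown provider "${provider}"`);
  }
  const key = env[varName];
  if (!key) {
    throw new Error(`${varName} is not set; add it to .env and load it before generating audio`);
  }
  return key;
}

// Only the Cartesia key is present, which is enough when Cartesia is the active provider.
const env = { CARTESIA_API_KEY: 'test-key' };
console.log(requireProviderKey('cartesia', env)); // test-key
```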
````diff
+## Core API
+
+VoiceMix uses a **chainable fluent API**. Every method (except `save()`) returns `this`.
+
+### Constructor
+
+```javascript
+import { VoiceMix } from 'voicemix';
+
+const vm = new VoiceMix(); // defaults: ElevenLabs, multilingual_v2, mp3, cwd
+const vm = new VoiceMix({ filePath: './audio', format: 'wav' }); // with options
+```
+
+Constructor options (all optional):
+
+| Option | Default | Description |
+|--------------|------------|------------------------------------|
+| `filePath` | `'./'` | Output directory for audio files |
+| `format` | `'mp3'` | Output format (`mp3` or `wav`) |
+| `filePrefix` | `''` | Prefix for generated filenames |
+| `drymode` | `false` | Skip API calls, return filename |
+| `apiKey` | env var | Override provider API key |
+
+### Provider Selection
+
+```javascript
+vm.useElevenLabs(apiKey?) // default — reads ELEVENLABS_API_KEY
+vm.useResemble(apiKey?) // reads RESEMBLE_API_KEY
+vm.useCartesia(apiKey?) // reads CARTESIA_API_KEY
+```
+
+### ElevenLabs Model Selection
+
+```javascript
+vm.monolingual_v1() // English only
+vm.multilingual_v1() // First multilingual
+vm.multilingual_v2() // Default — improved multilingual
+vm.v3() // Latest, most advanced
+```
+
+### Speech Generation Chain
+
+```javascript
+vm.voice('voiceId') // required — set provider voice ID
+.say('Hello world') // required — set text, auto-generates hashed filename
+.save(); // returns Promise<string> resolving to the full file path
+```
+
+### Additional Methods
+
+```javascript
+vm.lang('en-US') // set language (used by Resemble for SSML)
+vm.prompt('Friendly tone') // set voice style prompt (Resemble only)
+vm.path('./output') // change output directory
+vm.prefix('ch1_') // set filename prefix
+vm.file('custom-name') // override auto-generated filename
+vm.id('voiceId') // alias for .voice()
+```
+
+### Resemble-Specific
+
+```javascript
+vm.setSampleRate(48000) // default 48000
+vm.setPrecision('PCM_16') // MULAW | PCM_16 | PCM_24 | PCM_32
+vm.setOutputFormat('mp3') // mp3 | wav
+```
+
````
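Two ideas carry the Core API: configuration methods return `this` so they chain, and `.say()` derives a filename from the text and settings so identical input maps to the same file. `ToyVoiceMix` below is a hypothetical toy model of just those two ideas. Its FNV-1a hash over a JSON string of text, voice, and format is an assumption (the real hashing scheme is not documented here), and its `save()` behaves like dry mode, resolving to the would-be path without calling a TTS API or writing a file.

```javascript
// FNV-1a (32-bit) over UTF-16 code units: a small, dependency-free deterministic hash.
// Assumption: VoiceMix's real filename hash is not documented here and may differ.
function fnv1a(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, '0');
}

// Hypothetical toy model of the chainable API; not VoiceMix's implementation.
class ToyVoiceMix {
  constructor({ filePath = './', format = 'mp3', filePrefix = '' } = {}) {
    this.filePath = filePath;
    this.format = format;
    this.filePrefix = filePrefix;
    this.voiceId = null;
    this.text = null;
    this.fileName = null;
  }

  // Every configuration method returns `this`, so calls can be chained.
  voice(voiceId) {
    this.voiceId = voiceId;
    return this;
  }

  prefix(filePrefix) {
    this.filePrefix = filePrefix;
    return this;
  }

  path(filePath) {
    this.filePath = filePath;
    return this;
  }

  // Stores the text and derives a deterministic filename from text + voice + format.
  say(text) {
    this.text = text;
    this.fileName = fnv1a(JSON.stringify([text, this.voiceId, this.format]));
    return this;
  }

  // Dry-mode style save: resolves to the would-be path; no API call, no file written.
  async save() {
    if (!this.voiceId) {
      throw new Error('Voice ID is required');
    }
    const dir = this.filePath.endsWith('/') ? this.filePath.slice(0, -1) : this.filePath;
    return `${dir}/${this.filePrefix}${this.fileName}.${this.format}`;
  }
}

const first = new ToyVoiceMix({ filePath: './audio' }).voice('v1').prefix('ch1_').say('Hello world');
const second = new ToyVoiceMix({ filePath: './audio' }).voice('v1').prefix('ch1_').say('Hello world');
console.log(first.fileName === second.fileName); // true: same input, same name
first.save().then((fullPath) => console.log(fullPath)); // ./audio/ch1_ + 8 hex digits + .mp3
```

Because this toy hashes at `.say()` time, voice and format must be set before `.say()`; the real library may compute the name at a different point.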
````diff
+## Common Tasks
+
+### Generate a Single Audio File (ElevenLabs)
+
+```javascript
+import { VoiceMix } from 'voicemix';
+import dotenv from 'dotenv';
+dotenv.config();
+
+const vm = new VoiceMix();
+
+await vm
+.voice('EbhcCfMvNsbvjN6OhjpJ')
+.say('Hello, world!')
+.save();
+```
+
+### Generate with ElevenLabs v3
+
+```javascript
+const vm = new VoiceMix();
+
+await vm
+.v3()
+.voice('dxvGlXoa4TLMyfYR6uC9')
+.say('This uses the latest ElevenLabs model.')
+.save();
+```
+
+### Generate with Resemble AI (with Prompt Styling)
+
+```javascript
+const vm = new VoiceMix();
+
+await vm
+.useResemble()
+.prompt('Friendly and conversational tone')
+.voice('ba875a0a')
+.lang('en-US')
+.say('Your text here')
+.save();
+```
+
+### Generate with Cartesia
+
+```javascript
+const vm = new VoiceMix();
+
+await vm
+.useCartesia()
+.voice('6ccbfb76-1fc6-48f7-b71d-91ac6298247b')
+.say('Your text here')
+.save();
+```
+
+### Batch Process a Script from JSON
+
+```javascript
+import { VoiceMix } from 'voicemix';
+import fs from 'fs';
+
+const script = JSON.parse(fs.readFileSync('./lines.json', 'utf8'));
+const vm = new VoiceMix({ filePath: './audio' });
+
+for (const entry of script) {
+await vm
+.prompt(entry.prompt || 'Friendly and conversational tone')
+.voice(entry.voiceId)
+.say(entry.english)
+.save();
+}
+```
+
+Expected `lines.json` format:
+
+```json
+[
+{
+"prompt": "Friendly and conversational tone",
+"english": "Hello, how are you today?",
+"voiceId": "EbhcCfMvNsbvjN6OhjpJ"
+}
+]
+```
+
+### Save to a Custom Path and Filename
+
+```javascript
+const vm = new VoiceMix();
+
+await vm
+.voice('EbhcCfMvNsbvjN6OhjpJ')
+.path('./output/chapter1')
+.prefix('line_')
+.say('Opening narration here.')
+.save();
+```
+
````
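The batch example trusts every entry in `lines.json`, so a malformed entry may only surface when its own `.save()` call runs, after earlier lines have already been generated. A small pre-check can report every problem up front. `validateScript` is a hypothetical helper written against the `lines.json` shape shown above (`voiceId`, `english`, optional `prompt`); it applies the same prompt fallback as the example and keeps only entries that pass.

```javascript
// Hypothetical pre-flight check for the lines.json shape used in the batch example.
// Returns { entries, errors }: valid entries (with the prompt default applied) and problems found.
function validateScript(script, defaultPrompt = 'Friendly and conversational tone') {
  const entries = [];
  const errors = [];
  if (!Array.isArray(script)) {
    return { entries, errors: ['lines.json must contain a JSON array'] };
  }
  script.forEach((entry, index) => {
    if (typeof entry !== 'object' || entry === null) {
      errors.push(`entry ${index}: not an object`);
      return;
    }
    const errorsBefore = errors.length;
    if (typeof entry.voiceId !== 'string' || entry.voiceId === '') {
      errors.push(`entry ${index}: missing "voiceId"`);
    }
    if (typeof entry.english !== 'string' || entry.english.trim() === '') {
      errors.push(`entry ${index}: missing "english" text`);
    }
    // Keep the entry only if it produced no new errors; same fallback as entry.prompt || default.
    if (errors.length === errorsBefore) {
      entries.push({ ...entry, prompt: entry.prompt || defaultPrompt });
    }
  });
  return { entries, errors };
}

const sample = JSON.parse(`[
  { "prompt": "Friendly and conversational tone", "english": "Hello, how are you today?", "voiceId": "EbhcCfMvNsbvjN6OhjpJ" },
  { "english": "No voice here" }
]`);
const { entries, errors } = validateScript(sample);
console.log(errors.join('\n')); // entry 1: missing "voiceId"
```

In a real script, a non-empty `errors` array is the natural point to stop before constructing `VoiceMix` and spending API calls.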
````diff
+## Agent Usage Rules
+
+1. **Always load environment variables** — call `dotenv.config()` (or equivalent) before constructing `VoiceMix` so provider API keys are available via `process.env`.
+2. **Check if `voicemix` is already installed** before running `npm install`.
+3. **Ensure `"type": "module"`** exists in the project's `package.json` — VoiceMix is ESM-only.
+4. **Never hardcode API keys** — use environment variables or pass keys via constructor/provider methods.
+5. **Always `await` the `.save()` call** — it returns a Promise. Without `await`, files may not be written before the process exits.
+6. **Voice ID is required** — calling `.save()` without `.voice()` throws a `ValidationError`.
+7. **Use the correct voice IDs for the selected provider** — ElevenLabs, Resemble, and Cartesia voice IDs are not interchangeable.
+8. **Filenames are auto-hashed** — `.say(text)` generates a deterministic filename from the text + config. The same input produces the same filename (useful for caching). Use `.file('name')` only when an explicit filename is needed.
+9. **Provider methods are provider-scoped** — `.prompt()` only works with Resemble; `.v3()` only works with ElevenLabs. Calling them on the wrong provider is a no-op (no error thrown).
+10. **Batch processing uses an internal queue** (batch size 3) — multiple `.save()` calls are automatically batched and processed concurrently.
+
````
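Rule 10 describes an internal queue with a batch size of 3. The sketch below shows one common way to run work in fixed-size batches with `Promise.all`; `runInBatches` is a hypothetical helper, and whether VoiceMix's actual queue waits for a whole batch or refills a sliding window is not documented here. Note that the batch example under Common Tasks awaits each `.save()` inside its loop, so it starts each request only after the previous one has resolved.

```javascript
// Hypothetical helper: runs async task functions in fixed-size batches.
// Each batch starts up to `batchSize` tasks at once and waits for all of them
// before starting the next batch. Results keep the original task order.
async function runInBatches(tasks, batchSize = 3) {
  const results = [];
  for (let i = 0; i < tasks.length; i += batchSize) {
    const batch = tasks.slice(i, i + batchSize);
    // Start every task in this batch, then wait for the whole batch to settle.
    const batchResults = await Promise.all(batch.map((task) => task()));
    results.push(...batchResults);
  }
  return results;
}

// Demo: fake "save" tasks that resolve after a short delay and record peak concurrency.
let inFlight = 0;
let peak = 0;
const fakeSave = (name) => async () => {
  inFlight++;
  peak = Math.max(peak, inFlight);
  await new Promise((resolve) => setTimeout(resolve, 10));
  inFlight--;
  return `${name}.mp3`;
};

const tasks = ['a', 'b', 'c', 'd', 'e'].map(fakeSave);
runInBatches(tasks).then((files) => {
  console.log(files.join(', '), 'peak:', peak); // a.mp3, b.mp3, c.mp3, d.mp3, e.mp3 peak: 3
});
```

Fixed batches are simple but sit partly idle while the slowest task in each batch finishes; a sliding-window pool keeps three tasks running at all times at the cost of slightly more bookkeeping.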
````diff
+## Troubleshooting
+
+| Symptom | Cause | Fix |
+|---------|-------|-----|
+| `ProviderError: ElevenLabs API key is required` | Missing env var | Set `ELEVENLABS_API_KEY` in `.env` and call `dotenv.config()` |
+| `ProviderError: Cartesia API key is required` | Missing env var | Set `CARTESIA_API_KEY` in `.env` |
+| `ValidationError: Voice ID is required` | `.voice()` not called | Chain `.voice('id')` before `.save()` |
+| 401 / 403 from provider API | Invalid or expired key | Verify the API key in provider dashboard |
+| Files not appearing | `save()` not awaited | Add `await` before `.save()` |
+| Wrong provider voice ID | Mixing IDs across providers | Use voice IDs from the active provider's dashboard |
+
+## References
+
+- [npm: voicemix](https://www.npmjs.com/package/voicemix)
+- [GitHub: clasen/VoiceMix](https://github.com/clasen/VoiceMix)
+- [ElevenLabs API docs](https://elevenlabs.io/docs)
+- [Resemble AI docs](https://docs.resemble.ai/)
+- [Cartesia docs](https://docs.cartesia.ai/)
````