voicemix 1.3.0 → 1.3.4

package/README.md CHANGED
@@ -8,6 +8,11 @@ VoiceMix is a flexible text-to-speech library that allows you to generate speech
  npm install voicemix dotenv
  ```
 
+ > **AI Skill**: You can also add VoiceMix as a skill for AI agentic development:
+ > ```bash
+ > npx skills add https://github.com/clasen/VoiceMix --skill voicemix
+ > ```
+
  ### Environment Setup
 
  Create a `.env` file in your project root with your API keys:
package/package.json CHANGED
@@ -1,7 +1,7 @@
  {
  "name": "voicemix",
  "type": "module",
- "version": "1.3.0",
+ "version": "1.3.4",
  "description": "🗣️ VoiceMix - A simple text-to-speech tool using ElevenLabs, Cartesia and Resemble AI APIs.",
  "main": "index.js",
  "repository": {
@@ -17,9 +17,6 @@ export class CartesiaProvider {
  }
  };
 
- if (!this.apiKey) {
- throw new ProviderError('Cartesia API key is required', 'cartesia');
- }
  }
 
  _getRequestOptions(voiceId, text, format = 'mp3') {
@@ -53,6 +50,9 @@ export class CartesiaProvider {
 
  async save(voiceId, text, format, filePath, fileName) {
  try {
+ if (!this.apiKey) {
+ throw new ProviderError('Cartesia API key is required', 'cartesia');
+ }
  if (!voiceId) {
  throw new ProviderError('Voice ID is required', 'cartesia');
  }
@@ -15,9 +15,6 @@ export class ElevenLabsProvider {
  use_speaker_boost: true
  };
 
- if (!this.apiKey) {
- throw new ProviderError('ElevenLabs API key is required', 'elevenlabs');
- }
  }
 
  monolingual_v1() {
@@ -64,6 +61,9 @@ export class ElevenLabsProvider {
 
  async save(voiceId, text, format, filePath, fileName) {
  try {
+ if (!this.apiKey) {
+ throw new ProviderError('ElevenLabs API key is required', 'elevenlabs');
+ }
  if (!voiceId) {
  throw new ProviderError('Voice ID is required', 'elevenlabs');
  }
@@ -13,9 +13,6 @@ export class ResembleProvider {
  precision: 'PCM_16'
  };
 
- if (!this.apiKey) {
- throw new ProviderError('Resemble API key is required', 'resemble');
- }
  }
 
  _getRequestOptions(endpoint, data) {
@@ -33,6 +30,9 @@ export class ResembleProvider {
 
  async save(voiceId, text, format, filePath, fileName) {
  try {
+ if (!this.apiKey) {
+ throw new ProviderError('Resemble API key is required', 'resemble');
+ }
  if (!voiceId) {
  throw new ProviderError('Voice ID is required', 'resemble');
  }
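
Note (not part of the package diff): the three provider hunks above make the same change, moving the API-key check out of the constructor and into `save()`. The sketch below illustrates that deferred-validation pattern in isolation. `SketchProvider` and `SketchProviderError` are hypothetical names chosen for illustration; they are not the classes shipped in voicemix, and the real providers do more work than this.

```javascript
// Hypothetical names: SketchProviderError and SketchProvider are illustrative,
// not the classes shipped in voicemix.
class SketchProviderError extends Error {
  constructor(message, provider) {
    super(message);
    this.name = 'SketchProviderError';
    this.provider = provider;
  }
}

class SketchProvider {
  constructor(apiKey) {
    // As in the 1.3.4 hunks: the constructor only stores the key and never throws.
    this.apiKey = apiKey;
  }

  // Synchronous guard that any credential-dependent method can call.
  assertApiKey() {
    if (!this.apiKey) {
      throw new SketchProviderError('API key is required', 'sketch');
    }
  }

  async save(voiceId, text) {
    // Validation happens at call time, so a missing key rejects the returned Promise.
    this.assertApiKey();
    if (!voiceId) {
      throw new SketchProviderError('Voice ID is required', 'sketch');
    }
    return `would synthesize ${text.length} characters with voice ${voiceId}`;
  }
}

// Constructing without a key succeeds; the failure surfaces only when save() runs.
const provider = new SketchProvider(undefined);
provider
  .save('voice-1', 'Hello')
  .catch((err) => console.log(`${err.name}: ${err.message}`));
```

One practical consequence of this design, assuming the real providers behave like the sketch, is that a provider object can be constructed even when its key is absent, and only the provider that is actually used needs a key.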
@@ -0,0 +1,254 @@
+ ---
+ name: voicemix
+ description: Instructions for generating speech audio files using the VoiceMix Node.js library
+ version: 1.3.2
+ tags: [nodejs, tts, text-to-speech, elevenlabs, resemble-ai, cartesia, audio]
+ ---
+
+ # VoiceMix Skill
+
+ ## Table of Contents
+
+ - [Overview](#overview)
+ - [Installation](#installation)
+ - [Environment Setup](#environment-setup)
+ - [Core API](#core-api)
+ - [Common Tasks](#common-tasks)
+ - [Agent Usage Rules](#agent-usage-rules)
+ - [Troubleshooting](#troubleshooting)
+ - [References](#references)
+
+ ## Overview
+
+ VoiceMix is a chainable text-to-speech Node.js library that generates audio files (mp3/wav) from text using multiple TTS provider APIs.
+
+ **Use this skill when:**
+
+ - A task requires generating speech audio from text
+ - Integrating text-to-speech into a Node.js project
+ - Working with ElevenLabs, Resemble AI, or Cartesia APIs
+ - Batch-processing scripts of dialogue lines into audio files
+
+ **Do NOT use this skill when:**
+
+ - The task requires real-time streaming audio playback (VoiceMix writes to files)
+ - The target environment is browser-only (VoiceMix uses Node.js `fs`)
+ - Speech-to-text (transcription) is needed — this is text-**to**-speech only
+
+ ## Installation
+
+ ```bash
+ npm install voicemix dotenv
+ ```
+
+ - Requires Node.js with ESM support (`"type": "module"` in `package.json`).
+ - `dotenv` is optional but recommended for loading API keys from `.env`.
+
+ ## Environment Setup
+
+ Create a `.env` file with the relevant provider key(s):
+
+ ```plaintext
+ ELEVENLABS_API_KEY="your-elevenlabs-key"
+ RESEMBLE_API_KEY="your-resemble-key"
+ CARTESIA_API_KEY="your-cartesia-key"
+ ```
+
+ Only the key for the provider being used is required. ElevenLabs is the default provider.
+
+ ## Core API
+
+ VoiceMix uses a **chainable fluent API**. Every method (except `save()`) returns `this`.
+
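
Note (not part of the package diff): the skill file states that every method except `save()` returns `this`. The sketch below shows how such a fluent chain is typically built. `SketchChain` is a hypothetical stand-in written for illustration, not the VoiceMix class, and its option names only mirror the ones documented in the skill file.

```javascript
// SketchChain is a hypothetical stand-in, not the VoiceMix class.
class SketchChain {
  constructor({ filePath = './', format = 'mp3' } = {}) {
    this.config = { filePath, format, voiceId: null, text: null };
  }

  // Each setter mutates the instance and returns `this`, which is what enables chaining.
  voice(voiceId) {
    this.config.voiceId = voiceId;
    return this;
  }

  say(text) {
    this.config.text = text;
    return this;
  }

  path(filePath) {
    this.config.filePath = filePath;
    return this;
  }

  // The terminal call returns a Promise instead of `this`, which ends the chain.
  async save() {
    if (!this.config.voiceId) throw new Error('Voice ID is required');
    return { ...this.config };
  }
}

const chain = new SketchChain().voice('voice-1').path('./audio').say('Hello world');
chain.save().then((result) => console.log(result));
```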
+ ### Constructor
+
+ ```javascript
+ import { VoiceMix } from 'voicemix';
+
+ const vm = new VoiceMix(); // defaults: ElevenLabs, multilingual_v2, mp3, cwd
+ const vm = new VoiceMix({ filePath: './audio', format: 'wav' }); // with options
+ ```
+
+ Constructor options (all optional):
+
+ | Option | Default | Description |
+ |--------------|------------|------------------------------------|
+ | `filePath` | `'./'` | Output directory for audio files |
+ | `format` | `'mp3'` | Output format (`mp3` or `wav`) |
+ | `filePrefix` | `''` | Prefix for generated filenames |
+ | `drymode` | `false` | Skip API calls, return filename |
+ | `apiKey` | env var | Override provider API key |
+
+ ### Provider Selection
+
+ ```javascript
+ vm.useElevenLabs(apiKey?) // default — reads ELEVENLABS_API_KEY
+ vm.useResemble(apiKey?) // reads RESEMBLE_API_KEY
+ vm.useCartesia(apiKey?) // reads CARTESIA_API_KEY
+ ```
+
+ ### ElevenLabs Model Selection
+
+ ```javascript
+ vm.monolingual_v1() // English only
+ vm.multilingual_v1() // First multilingual
+ vm.multilingual_v2() // Default — improved multilingual
+ vm.v3() // Latest, most advanced
+ ```
+
+ ### Speech Generation Chain
+
+ ```javascript
+ vm.voice('voiceId') // required — set provider voice ID
+ .say('Hello world') // required — set text, auto-generates hashed filename
+ .save(); // returns Promise<string> resolving to the full file path
+ ```
+
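
Note (not part of the package diff): the skill file says `.say()` auto-generates a hashed filename, and later that the same input yields the same name. The hashing scheme itself is not shown in this diff, so the sketch below is only an assumption about how a deterministic name could be derived. It uses 32-bit FNV-1a because that is short and dependency-free; `fnv1a32` and `sketchFileName` are hypothetical helpers, not voicemix functions.

```javascript
// Hypothetical helpers: voicemix's real hashing scheme is not shown in this diff.
// 32-bit FNV-1a over the UTF-8 bytes of the input, returned as 8 hex characters.
function fnv1a32(input) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (const byte of new TextEncoder().encode(input)) {
    hash ^= byte;
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime, keep 32 bits
  }
  return hash.toString(16).padStart(8, '0');
}

// Derive a filename from the text plus the settings that affect the audio.
function sketchFileName({ text, voiceId, format = 'mp3', prefix = '' }) {
  const key = JSON.stringify([voiceId, format, text]);
  return `${prefix}${fnv1a32(key)}.${format}`;
}

const first = sketchFileName({ text: 'Hello world', voiceId: 'voice-1' });
const second = sketchFileName({ text: 'Hello world', voiceId: 'voice-1' });
console.log(first === second); // same input, same name
```

A name derived this way can serve as a cache key: if a file with that name already exists, the synthesis request could be skipped. Whether voicemix does this is not stated in the diff.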
+ ### Additional Methods
+
+ ```javascript
+ vm.lang('en-US') // set language (used by Resemble for SSML)
+ vm.prompt('Friendly tone') // set voice style prompt (Resemble only)
+ vm.path('./output') // change output directory
+ vm.prefix('ch1_') // set filename prefix
+ vm.file('custom-name') // override auto-generated filename
+ vm.id('voiceId') // alias for .voice()
+ ```
+
+ ### Resemble-Specific
+
+ ```javascript
+ vm.setSampleRate(48000) // default 48000
+ vm.setPrecision('PCM_16') // MULAW | PCM_16 | PCM_24 | PCM_32
+ vm.setOutputFormat('mp3') // mp3 | wav
+ ```
+
+ ## Common Tasks
+
+ ### Generate a Single Audio File (ElevenLabs)
+
+ ```javascript
+ import { VoiceMix } from 'voicemix';
+ import dotenv from 'dotenv';
+ dotenv.config();
+
+ const vm = new VoiceMix();
+
+ await vm
+ .voice('EbhcCfMvNsbvjN6OhjpJ')
+ .say('Hello, world!')
+ .save();
+ ```
+
+ ### Generate with ElevenLabs v3
+
+ ```javascript
+ const vm = new VoiceMix();
+
+ await vm
+ .v3()
+ .voice('dxvGlXoa4TLMyfYR6uC9')
+ .say('This uses the latest ElevenLabs model.')
+ .save();
+ ```
+
+ ### Generate with Resemble AI (with Prompt Styling)
+
+ ```javascript
+ const vm = new VoiceMix();
+
+ await vm
+ .useResemble()
+ .prompt('Friendly and conversational tone')
+ .voice('ba875a0a')
+ .lang('en-US')
+ .say('Your text here')
+ .save();
+ ```
+
+ ### Generate with Cartesia
+
+ ```javascript
+ const vm = new VoiceMix();
+
+ await vm
+ .useCartesia()
+ .voice('6ccbfb76-1fc6-48f7-b71d-91ac6298247b')
+ .say('Your text here')
+ .save();
+ ```
+
+ ### Batch Process a Script from JSON
+
+ ```javascript
+ import { VoiceMix } from 'voicemix';
+ import fs from 'fs';
+
+ const script = JSON.parse(fs.readFileSync('./lines.json', 'utf8'));
+ const vm = new VoiceMix({ filePath: './audio' });
+
+ for (const entry of script) {
+ await vm
+ .prompt(entry.prompt || 'Friendly and conversational tone')
+ .voice(entry.voiceId)
+ .say(entry.english)
+ .save();
+ }
+ ```
+
+ Expected `lines.json` format:
+
+ ```json
+ [
+ {
+ "prompt": "Friendly and conversational tone",
+ "english": "Hello, how are you today?",
+ "voiceId": "EbhcCfMvNsbvjN6OhjpJ"
+ }
+ ]
+ ```
+
+ ### Save to a Custom Path and Filename
+
+ ```javascript
+ const vm = new VoiceMix();
+
+ await vm
+ .voice('EbhcCfMvNsbvjN6OhjpJ')
+ .path('./output/chapter1')
+ .prefix('line_')
+ .say('Opening narration here.')
+ .save();
+ ```
+
+ ## Agent Usage Rules
+
+ 1. **Always load environment variables** — call `dotenv.config()` (or equivalent) before constructing `VoiceMix` so provider API keys are available via `process.env`.
+ 2. **Check if `voicemix` is already installed** before running `npm install`.
+ 3. **Ensure `"type": "module"`** exists in the project's `package.json` — VoiceMix is ESM-only.
+ 4. **Never hardcode API keys** — use environment variables or pass keys via constructor/provider methods.
+ 5. **Always `await` the `.save()` call** — it returns a Promise. Without `await`, files may not be written before the process exits.
+ 6. **Voice ID is required** — calling `.save()` without `.voice()` throws a `ValidationError`.
+ 7. **Use the correct voice IDs for the selected provider** — ElevenLabs, Resemble, and Cartesia voice IDs are not interchangeable.
+ 8. **Filenames are auto-hashed** — `.say(text)` generates a deterministic filename from the text + config. The same input produces the same filename (useful for caching). Use `.file('name')` only when an explicit filename is needed.
+ 9. **Provider methods are provider-scoped** — `.prompt()` only works with Resemble; `.v3()` only works with ElevenLabs. Calling them on the wrong provider is a no-op (no error thrown).
+ 10. **Batch processing uses an internal queue** (batch size 3) — multiple `.save()` calls are automatically batched and processed concurrently.
+
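
Note (not part of the package diff): rule 10 above mentions an internal queue with batch size 3, but the queue code is not included in this diff. The sketch below shows one common way to cap concurrency at a fixed batch size, as an assumption about the general technique rather than a description of voicemix internals. `chunk`, `runInBatches`, and `fakeSave` are hypothetical names.

```javascript
// Hypothetical helpers: voicemix's internal queue is not part of this diff.
// Split a list into consecutive groups of at most `size` items.
function chunk(items, size) {
  const groups = [];
  for (let i = 0; i < items.length; i += size) {
    groups.push(items.slice(i, i + size));
  }
  return groups;
}

// Run `task` over every item, at most `batchSize` at a time.
// Tasks within a batch run concurrently; the next batch starts only after
// every task in the current batch has resolved.
async function runInBatches(items, task, batchSize = 3) {
  const results = [];
  for (const group of chunk(items, batchSize)) {
    const resolved = await Promise.all(group.map((item) => task(item)));
    results.push(...resolved);
  }
  return results;
}

// Stand-in for a save() call that resolves to a file path.
const fakeSave = async (line) => `./audio/${line}.mp3`;

runInBatches(['a', 'b', 'c', 'd'], fakeSave).then((paths) => console.log(paths));
```

Because `Promise.all` rejects as soon as one task fails, this sketch stops at the first failing batch. A variant built on `Promise.allSettled` would let the remaining items continue.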
+ ## Troubleshooting
+
+ | Symptom | Cause | Fix |
+ |---------|-------|-----|
+ | `ProviderError: ElevenLabs API key is required` | Missing env var | Set `ELEVENLABS_API_KEY` in `.env` and call `dotenv.config()` |
+ | `ProviderError: Cartesia API key is required` | Missing env var | Set `CARTESIA_API_KEY` in `.env` |
+ | `ValidationError: Voice ID is required` | `.voice()` not called | Chain `.voice('id')` before `.save()` |
+ | 401 / 403 from provider API | Invalid or expired key | Verify the API key in provider dashboard |
+ | Files not appearing | `save()` not awaited | Add `await` before `.save()` |
+ | Wrong provider voice ID | Mixing IDs across providers | Use voice IDs from the active provider's dashboard |
+
+ ## References
+
+ - [npm: voicemix](https://www.npmjs.com/package/voicemix)
+ - [GitHub: clasen/VoiceMix](https://github.com/clasen/VoiceMix)
+ - [ElevenLabs API docs](https://elevenlabs.io/docs)
+ - [Resemble AI docs](https://docs.resemble.ai/)
+ - [Cartesia docs](https://docs.cartesia.ai/)