@inworld/tts 0.0.0
- package/CHANGELOG.md +9 -0
- package/LICENSE +21 -0
- package/README.md +332 -0
- package/dist/index.cjs +1580 -0
- package/package.json +77 -0
- package/src/client.js +929 -0
- package/src/config.js +135 -0
- package/src/encoding.js +23 -0
- package/src/errors.js +31 -0
- package/src/index.d.ts +363 -0
- package/src/index.js +149 -0
- package/src/player.browser.js +53 -0
- package/src/player.js +143 -0
- package/src/voice.js +498 -0
- package/src/write-file.browser.js +7 -0
- package/src/write-file.js +11 -0
package/CHANGELOG.md
ADDED
package/LICENSE
ADDED
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2025 Inworld AI

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

package/README.md
ADDED
@@ -0,0 +1,332 @@
# @inworld/tts

[npm package](https://www.npmjs.com/package/@inworld/tts) · [License](LICENSE) · [Node.js](https://nodejs.org/)

Node.js and browser SDK for the Inworld TTS API: generate speech, stream audio, and manage voices.

**[API Reference](API_REFERENCE.md)** · **[Changelog](CHANGELOG.md)** · **[Platform](https://platform.inworld.ai)**

---

## Install

```bash
npm install @inworld/tts
```

Supports ESM, CommonJS, and browser bundlers (Vite, webpack 5, Rollup, esbuild).

---

## Authentication

Pass your API key directly or set `INWORLD_API_KEY` in your environment:

```js
import { InworldTTS } from '@inworld/tts'; // TypeScript: types are bundled, no @types/ needed

const tts = InworldTTS(); // reads INWORLD_API_KEY from env
// Or pass the key directly:
// const tts = InworldTTS({ apiKey: 'your_key' });
```

Get your key at [platform.inworld.ai](https://platform.inworld.ai). For browser usage with JWT, see [Browser](#browser).

---

## Quickstart

```js
import { InworldTTS } from '@inworld/tts';

const tts = InworldTTS();
await tts.generate({ text: 'Hello, world!', voice: 'Dennis', outputFile: 'hello.mp3' });
```

---

## Models

| Model ID | Quality | Default for |
|----------|---------|-------------|
| `inworld-tts-1.5-max` | Higher quality | `generate()` |
| `inworld-tts-1.5-mini` | Lower latency | `stream()` |

Use `max` when quality is the priority (e.g. audiobooks, voiceovers). Use `mini` for real-time use cases (e.g. voice assistants).

---

## Constructor

```js
const tts = InworldTTS({
  apiKey: 'your_key',        // or set INWORLD_API_KEY env var
  timeout: 120_000,          // HTTP timeout in ms (default: per-method)
  maxConcurrentRequests: 4,  // parallel chunk requests for long text (default: 2)
  maxRetries: 2,             // retry on network errors / 5xx with exponential backoff (default: 2)
  debug: true,               // log timing and retry info
});
```

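For readers unfamiliar with the exponential backoff mentioned for `maxRetries`, here is a minimal sketch of the general technique. It is not the SDK's implementation: `retryWithBackoff`, `baseDelayMs`, `shouldRetry`, and the doubling factor are all assumptions chosen for illustration.

```js
// Hypothetical helper, not part of @inworld/tts: retries an async function,
// doubling the wait between attempts. Delay values are assumptions.
async function retryWithBackoff(fn, options = {}) {
  const { maxRetries = 2, baseDelayMs = 500, shouldRetry = () => true } = options;
  let lastError;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastError = err;
      // Stop when retries are exhausted or the error is not worth retrying.
      if (attempt === maxRetries || !shouldRetry(err)) break;
      // Wait baseDelayMs, then 2x, then 4x, ... before the next attempt.
      const delayMs = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}
```

With `maxRetries: 2` the function is called at most three times. A `shouldRetry` predicate could limit retries to network failures and 5xx responses, which is the behavior the constructor comment describes.
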
See [Constructor](API_REFERENCE.md#constructor) in the API Reference for full parameter details and per-method timeout defaults.

---

## generate()

Synthesize speech from text of any length. The returned promise resolves once all audio is ready.

```js
// Save to file
await tts.generate({ text: 'Hello!', voice: 'Dennis', outputFile: 'hello.mp3' });

// Get bytes for further processing
const audio = await tts.generate({ text: 'Hello!', voice: 'Dennis' });

// Generate, save, and play
await tts.generate({ text: 'Hello!', voice: 'Dennis', outputFile: 'hello.mp3', play: true });
```

## stream()

Stream audio as it is synthesized; the first chunk arrives sooner than with `generate()`. Each call accepts at most 2000 characters.

```js
for await (const chunk of tts.stream({ text: 'Hello, world!', voice: 'Dennis' })) {
  // chunk is a Uint8Array: pipe it to an audio player or accumulate it
}
```

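If you choose to accumulate the chunks, something like the following sketch would join them into one buffer. `collectChunks` is a hypothetical helper, not an SDK export; it only assumes that the iterable yields `Uint8Array` values, as the comment in the loop above states.

```js
// Hypothetical helper, not part of @inworld/tts: drains any async iterable of
// Uint8Array chunks and returns them joined into a single Uint8Array.
async function collectChunks(chunks) {
  const parts = [];
  let total = 0;
  for await (const chunk of chunks) {
    parts.push(chunk);
    total += chunk.length;
  }
  // Copy every part into one contiguous buffer, in arrival order.
  const out = new Uint8Array(total);
  let offset = 0;
  for (const part of parts) {
    out.set(part, offset);
    offset += part.length;
  }
  return out;
}
```

Usage would look like `const audio = await collectChunks(tts.stream({ text, voice }))`. Note that this waits for the full stream, so it gives up the latency advantage of playing chunks as they arrive.
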
## Timestamps

`generateWithTimestamps()` and `streamWithTimestamps()` return word- or character-level timing alongside audio.

```js
const { audio, timestamps } = await tts.generateWithTimestamps({
  text: 'Hello, world!',
  voice: 'Dennis',
  timestampType: 'WORD',
});
const wa = timestamps.wordAlignment;
wa.words.forEach((w, i) =>
  console.log(`${w}: ${wa.wordStartTimeSeconds[i].toFixed(2)}s – ${wa.wordEndTimeSeconds[i].toFixed(2)}s`)
);
```

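The parallel arrays in `wordAlignment` can be awkward to work with for captions or word highlighting. A small sketch, assuming only the three field names used in the example above (`toWordCues` and `cueAt` are hypothetical helpers, not SDK exports):

```js
// Hypothetical helper: converts the parallel alignment arrays into one
// object per word.
function toWordCues(wordAlignment) {
  const { words, wordStartTimeSeconds, wordEndTimeSeconds } = wordAlignment;
  return words.map((word, i) => ({
    word,
    start: wordStartTimeSeconds[i],
    end: wordEndTimeSeconds[i],
  }));
}

// Returns the cue being spoken at `timeSeconds`, or undefined when the time
// falls between words or outside the audio.
function cueAt(cues, timeSeconds) {
  return cues.find((cue) => timeSeconds >= cue.start && timeSeconds < cue.end);
}
```

`cueAt(toWordCues(wa), player.currentTime)` is one way to drive word highlighting, where `player` stands for whatever audio element or player you use.
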
See [generateWithTimestamps()](API_REFERENCE.md#generatewithtimestamps) and [streamWithTimestamps()](API_REFERENCE.md#streamwithtimestamps) for full details.

## play()

Play audio from a `Uint8Array` or file path. Encoding is auto-detected from magic bytes or file extension.

```js
const audio = await tts.generate({ text: 'Hello!', voice: 'Dennis' });
await tts.play(audio);

await tts.play('hello.mp3');                    // file path also accepted (Node.js only)
await tts.play(pcmBytes, { encoding: 'PCM' });  // encoding hint for raw PCM/ALAW/MULAW
```

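To illustrate what detection from magic bytes means, here is a rough sketch based on well-known container signatures. It is not the SDK's detection logic; `sniffAudioFormat` and the labels it returns are assumptions for illustration.

```js
// Hypothetical sketch, not the SDK's implementation: guesses a container
// format from the first bytes of a Uint8Array. Raw PCM/ALAW/MULAW carry no
// signature, so they fall through to null and need an explicit hint.
function sniffAudioFormat(bytes) {
  const ascii = (start, length) =>
    String.fromCharCode(...bytes.subarray(start, start + length));

  if (bytes.length >= 12 && ascii(0, 4) === 'RIFF' && ascii(8, 4) === 'WAVE') return 'WAV';
  if (bytes.length >= 4 && ascii(0, 4) === 'OggS') return 'OGG';
  if (bytes.length >= 4 && ascii(0, 4) === 'fLaC') return 'FLAC';
  if (bytes.length >= 3 && ascii(0, 3) === 'ID3') return 'MP3'; // ID3v2 tag
  // MPEG audio frame sync: the first 11 bits of a frame header are all set.
  if (bytes.length >= 2 && bytes[0] === 0xff && (bytes[1] & 0xe0) === 0xe0) return 'MP3';
  return null;
}
```

The frame-sync check is a heuristic and can match other MPEG-style streams, which is one reason an explicit `encoding` hint is still useful for headerless audio.
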
See [play()](API_REFERENCE.md#play) for platform player details and browser constraints.

---

## listVoices()

List voices in your workspace, with an optional language filter.

```js
const voices = await tts.listVoices();
const enVoices = await tts.listVoices({ lang: 'EN_US' });
const multi = await tts.listVoices({ lang: ['EN_US', 'ES_ES'] });
```

## getVoice()

Get details of a specific voice.

```js
const voice = await tts.getVoice('workspace__my_clone');
```

## updateVoice()

Update a voice's display name, description, or tags.

```js
await tts.updateVoice({ voice: 'workspace__my_clone', displayName: 'Narrator', tags: ['calm'] });
```

## deleteVoice()

Delete a voice from your workspace.

```js
await tts.deleteVoice('workspace__my_clone');
```

## cloneVoice()

Clone a voice from one or more audio recordings (WAV/MP3). Accepts `Uint8Array` buffers or file path strings (Node.js only).

```js
const result = await tts.cloneVoice({
  audioSamples: ['sample.wav'],
  displayName: 'My Clone',
});
const voiceId = result.voice.voiceId;
```

> **Note:** Voice cloning is a long-running operation (up to 5 min). If it times out, check `listVoices()`, since the voice may have been created anyway.

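One way to do the check described in the note is to look for the clone by its display name. The sketch below is hypothetical: it assumes `listVoices()` resolves to an array of objects with a `displayName` field, which this README does not confirm, so check the API Reference for the actual shape.

```js
// Hypothetical helper, not part of @inworld/tts. Assumes each voice object
// has a string `displayName` field; the real response shape may differ.
function findVoicesByDisplayName(voices, displayName) {
  const wanted = displayName.trim().toLowerCase();
  return voices.filter(
    (voice) =>
      typeof voice.displayName === 'string' &&
      voice.displayName.trim().toLowerCase() === wanted
  );
}
```

After a timeout, `findVoicesByDisplayName(await tts.listVoices(), 'My Clone')` would return any matching voices. Retry the clone only if the result is empty, to avoid creating duplicates.
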
## designVoice()

Design a voice from a text description; no recording is needed.

```js
const result = await tts.designVoice({
  designPrompt: 'A warm, friendly narrator',
  previewText: 'Hello, welcome to our audiobook.',
});
const preview = result.previewVoices[0];
```

## publishVoice()

Publish a designed or cloned voice preview to your library.

```js
await tts.publishVoice({ voice: preview.voiceId, displayName: 'My Custom Voice' });
```

## migrateFromElevenLabs()

Migrate a voice from ElevenLabs to your Inworld workspace. No ElevenLabs SDK required.

```js
const result = await tts.migrateFromElevenLabs({
  elevenLabsApiKey: 'el_...',
  elevenLabsVoiceId: 'abc123',
});
console.log(`${result.elevenLabsName} → ${result.inworldVoiceId}`);
```

See [Voice Management](API_REFERENCE.md#voice-management) in the API Reference for all parameters.

---

## Errors

| Class | When |
|-------|------|
| `MissingApiKeyError` | No API key or token found at construction |
| `ApiError` | API returned 4xx/5xx; exposes `.code` and `.details` |
| `NetworkError` | Connection or timeout failure |

All inherit from `InworldTTSError`.

```js
import { InworldTTS, ApiError, MissingApiKeyError, NetworkError } from '@inworld/tts';

try {
  const tts = InworldTTS(); // MissingApiKeyError is thrown here, at construction
  const audio = await tts.generate({ text: 'Hello!', voice: 'Dennis' });
} catch (err) {
  if (err instanceof MissingApiKeyError) console.error('Missing API key');
  else if (err instanceof ApiError) console.error(`HTTP ${err.code}: ${err.message}`);
  else if (err instanceof NetworkError) console.error(`Network error: ${err.message}`);
  else throw err;
}
```

---

## Browser

For browser usage, use a short-lived JWT from your backend instead of exposing your API key.

```js
const fetchToken = async () => {
  const { token } = await fetch('/api/tts-token').then(r => r.json());
  return token;
};

const tts = InworldTTS({
  token: await fetchToken(),
  onTokenExpiring: fetchToken, // called automatically when token is about to expire
});

// play() must be called inside a user event handler (browser autoplay policy)
button.onclick = async () => {
  const audio = await tts.generate({ text: 'Hello!', voice: 'Dennis', encoding: 'MP3' });
  await tts.play(audio);
};
```

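When debugging token refresh, it can help to see how long a token has left. The sketch below reads the standard JWT `exp` claim without verifying the signature. `secondsUntilExpiry` is a hypothetical helper, and it assumes the token your backend issues actually carries a numeric `exp` claim.

```js
// Hypothetical helper, not part of @inworld/tts: reads the standard `exp`
// claim (seconds since the Unix epoch) from a JWT. It does NOT verify the
// signature, so use it for diagnostics only.
function secondsUntilExpiry(token, nowMs = Date.now()) {
  const payloadPart = token.split('.')[1];
  if (!payloadPart) throw new Error('Not a JWT: missing payload segment');
  // JWT segments are base64url; convert to plain base64 before decoding.
  const base64 = payloadPart.replace(/-/g, '+').replace(/_/g, '/');
  const padded = base64.padEnd(Math.ceil(base64.length / 4) * 4, '=');
  const { exp } = JSON.parse(atob(padded));
  if (typeof exp !== 'number') throw new Error('Token has no numeric exp claim');
  return exp - Math.floor(nowMs / 1000);
}
```

A negative result means the token has already expired. `atob` is available globally in browsers and in recent Node.js versions.
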
For development only, you can use an API key directly with `dangerouslyAllowBrowser: true`, but the key will be visible in DevTools and any usage made with it is billed to your account.

A complete working example is in [`examples/browser/`](examples/browser/).

> **Encoding in browser:** Prefer `encoding: 'MP3'`, which is supported natively in all browsers. `OGG_OPUS`/`FLAC` work in Chrome and Firefox but not Safari. `LINEAR16`, `PCM`, `ALAW`, `MULAW` cannot be played by `play()` in the browser.

---

## Examples

Runnable examples are in the [`examples/`](examples/) directory:

| File | What it shows |
|------|---------------|
| [`hello_world.js`](examples/hello_world.js) | Text → MP3 in 3 lines |
| [`stream_audio.js`](examples/stream_audio.js) | Real-time streaming |
| [`list_voices.js`](examples/list_voices.js) | List all voices with optional language filter |
| [`clone_voice.js`](examples/clone_voice.js) | Clone a voice from a WAV/MP3 recording |
| [`design_voice.js`](examples/design_voice.js) | Design a voice from a text description, preview, and publish |
| [`generate_timestamps.js`](examples/generate_timestamps.js) | Word-level timestamps |
| [`examples/browser/`](examples/browser/) | Browser usage with JWT auth |

---

## Troubleshooting

### `MissingApiKeyError` / `ApiError` 401

Set `INWORLD_API_KEY` or pass `apiKey` directly. If the key is set but rejected, regenerate it at [platform.inworld.ai](https://platform.inworld.ai).

### `play()` blocked by browser (autoplay policy)

Move `play()` inside a user event handler:

```js
button.onclick = async () => await tts.play(audio);
```

### `ApiError`: text exceeds 2000 character limit

`stream()` accepts at most 2000 characters per call. Use `generate()` instead, which handles text of any length automatically.

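If streaming is still required for longer text, one option is to split the text yourself and call `stream()` once per piece. The sketch below is an assumption about how that could be done, not something the SDK provides; `splitText` is a hypothetical helper and its sentence detection is deliberately simple.

```js
// Hypothetical helper, not part of @inworld/tts: splits text into pieces of
// at most `limit` characters, preferring to break after sentence punctuation,
// then at a space, and only then with a hard cut.
function splitText(text, limit = 2000) {
  const pieces = [];
  let rest = text.trim();
  while (rest.length > limit) {
    const head = rest.slice(0, limit);
    // Find the last sentence end (., ! or ? followed by whitespace) in `head`.
    let cut = -1;
    for (const match of head.matchAll(/[.!?]\s/g)) cut = match.index + 1;
    if (cut <= 0) cut = head.lastIndexOf(' ');
    if (cut <= 0) cut = limit; // no natural break: hard cut
    pieces.push(rest.slice(0, cut).trim());
    rest = rest.slice(cut).trim();
  }
  if (rest.length > 0) pieces.push(rest);
  return pieces;
}
```

Each piece can then be passed to `stream()` in order. Whether prosody stays natural across piece boundaries is not something this sketch addresses.
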
### `NetworkError: Request timed out`

Increase the timeout or add retries:

```js
const tts = InworldTTS({ timeout: 120_000, maxRetries: 3 });
```

### `require('@inworld/tts')` throws `ERR_REQUIRE_ESM`

Both ESM and CommonJS are supported. Make sure you are on Node.js ≥18 and, if you bundle, that your bundler respects the `exports` field in `package.json` (webpack 5, Vite, Rollup, esbuild).

```js
// ESM
import { InworldTTS } from '@inworld/tts';

// CommonJS
const { InworldTTS } = require('@inworld/tts');
```

---

## License

[MIT](LICENSE)