@inworld/tts 0.0.0

Potentially problematic release: this version of @inworld/tts might be problematic.

package/CHANGELOG.md ADDED
@@ -0,0 +1,9 @@
+ # Changelog
+
+ ## [Unreleased]
+
+ ### Added
+
+ - CI workflow for build validation on push and pull requests
+ - Release workflow for automated npm publishing via GitHub Actions
+ - Initial SDK: generate, stream, voice management, cloning, design
package/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2025 Inworld AI
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,332 @@
+ # @inworld/tts
+
+ [![npm version](https://img.shields.io/npm/v/@inworld/tts.svg)](https://www.npmjs.com/package/@inworld/tts)
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
+ [![Node.js ≥18](https://img.shields.io/badge/node-%3E%3D18-brightgreen.svg)](https://nodejs.org/)
+
+ Node.js and browser SDK for the Inworld TTS API: generate, stream, and manage voices.
+
+ **[API Reference](API_REFERENCE.md)** · **[Changelog](CHANGELOG.md)** · **[Platform](https://platform.inworld.ai)**
+
+ ---
+
+ ## Install
+
+ ```bash
+ npm install @inworld/tts
+ ```
+
+ Supports ESM, CommonJS, and browser bundlers (Vite, webpack 5, Rollup, esbuild).
+
+ ---
+
+ ## Authentication
+
+ Pass your API key directly or set `INWORLD_API_KEY` in your environment:
+
+ ```js
+ import { InworldTTS } from '@inworld/tts'; // TypeScript: types are bundled, no @types/ needed
+
+ // Option 1: read INWORLD_API_KEY from the environment
+ const tts = InworldTTS();
+
+ // Option 2: pass the key directly (use one option or the other)
+ const ttsWithKey = InworldTTS({ apiKey: 'your_key' });
+ ```
+
+ Get your key at [platform.inworld.ai](https://platform.inworld.ai). For browser usage with JWT, see [Browser](#browser).
+
+ ---
+
+ ## Quickstart
+
+ ```js
+ import { InworldTTS } from '@inworld/tts';
+
+ const tts = InworldTTS();
+ await tts.generate({ text: 'Hello, world!', voice: 'Dennis', outputFile: 'hello.mp3' });
+ ```
+
+ ---
+
+ ## Models
+
+ | Model ID | Quality | Default for |
+ |----------|---------|-------------|
+ | `inworld-tts-1.5-max` | Higher quality | `generate()` |
+ | `inworld-tts-1.5-mini` | Lower latency | `stream()` |
+
+ Use `max` when quality is the priority (e.g. audiobooks, voiceovers). Use `mini` for real-time use cases (e.g. voice assistants).
+
+ ---
+
+ ## Constructor
+
+ ```js
+ const tts = InworldTTS({
+   apiKey: 'your_key',        // or set INWORLD_API_KEY env var
+   timeout: 120_000,          // HTTP timeout in ms (default: per-method)
+   maxConcurrentRequests: 4,  // parallel chunk requests for long text (default: 2)
+   maxRetries: 2,             // retry on network errors / 5xx with exponential backoff (default: 2)
+   debug: true,               // log timing and retry info
+ });
+ ```
+
+ See [Constructor](API_REFERENCE.md#constructor) in the API Reference for full parameter details and per-method timeout defaults.
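The `maxRetries` option above is described as retrying with exponential backoff. The SDK's actual delay schedule is not shown in this README, so the following is only a minimal sketch of the general technique. `retryWithBackoff`, `baseDelayMs`, and `shouldRetry` are hypothetical names for illustration and are not part of `@inworld/tts`.

```javascript
// Hypothetical helper illustrating exponential backoff; NOT the SDK's internal code.
async function retryWithBackoff(fn, options = {}) {
  const { maxRetries = 2, baseDelayMs = 250, shouldRetry = () => true } = options;
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      // Give up once the retry budget is spent or the error is not retryable.
      if (attempt >= maxRetries || !shouldRetry(err)) throw err;
      // The delay doubles on each attempt: base, 2 * base, 4 * base, ...
      const delayMs = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

With `maxRetries: 2`, the wrapped function runs at most three times: the first call plus two retries.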
+
+ ---
+
+ ## generate()
+
+ Synthesize speech from text of any length. The returned promise resolves once all audio is ready.
+
+ ```js
+ // Save to file
+ await tts.generate({ text: 'Hello!', voice: 'Dennis', outputFile: 'hello.mp3' });
+
+ // Get bytes for further processing
+ const audio = await tts.generate({ text: 'Hello!', voice: 'Dennis' });
+
+ // Generate, save, and play
+ await tts.generate({ text: 'Hello!', voice: 'Dennis', outputFile: 'hello.mp3', play: true });
+ ```
+
+ ## stream()
+
+ Stream audio as it is synthesized; the first chunk arrives sooner than with `generate()`. Maximum 2000 characters per call.
+
+ ```js
+ for await (const chunk of tts.stream({ text: 'Hello, world!', voice: 'Dennis' })) {
+   // chunk is a Uint8Array: pipe it to an audio player or accumulate it
+ }
+ ```
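The loop comment above mentions accumulating chunks. A minimal sketch of that step, assuming only that the stream yields `Uint8Array` chunks as stated; `collectChunks` is a hypothetical helper, not an SDK function.

```javascript
// Hypothetical helper: drain any async iterable of Uint8Array chunks into one buffer.
async function collectChunks(chunks) {
  const parts = [];
  let total = 0;
  for await (const chunk of chunks) {
    parts.push(chunk);
    total += chunk.length;
  }
  // Copy every chunk into a single contiguous Uint8Array.
  const out = new Uint8Array(total);
  let offset = 0;
  for (const part of parts) {
    out.set(part, offset);
    offset += part.length;
  }
  return out;
}

// Usage (assumes a `tts` client constructed as shown earlier):
// const audio = await collectChunks(tts.stream({ text: 'Hello, world!', voice: 'Dennis' }));
```

Buffering everything gives up the latency benefit of streaming, so this fits cases where the chunks are also being played or forwarded as they arrive.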
+
+ ## Timestamps
+
+ `generateWithTimestamps()` and `streamWithTimestamps()` return word- or character-level timing alongside audio.
+
+ ```js
+ const { audio, timestamps } = await tts.generateWithTimestamps({
+   text: 'Hello, world!',
+   voice: 'Dennis',
+   timestampType: 'WORD',
+ });
+ const wa = timestamps.wordAlignment;
+ wa.words.forEach((w, i) =>
+   console.log(`${w}: ${wa.wordStartTimeSeconds[i].toFixed(2)}s – ${wa.wordEndTimeSeconds[i].toFixed(2)}s`)
+ );
+ ```
+
+ See [generateWithTimestamps()](API_REFERENCE.md#generatewithtimestamps) and [streamWithTimestamps()](API_REFERENCE.md#streamwithtimestamps) for full details.
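The example above reads `wordAlignment` as three parallel arrays. For captions or word highlighting it can be easier to work with one object per word. A minimal sketch, assuming the `wordAlignment` shape used in that example (`words`, `wordStartTimeSeconds`, `wordEndTimeSeconds`); `toWordCues` and `wordAt` are hypothetical helpers, not SDK functions.

```javascript
// Hypothetical helper: zip the parallel arrays into one { word, start, end } object per word.
function toWordCues(wordAlignment) {
  const { words, wordStartTimeSeconds, wordEndTimeSeconds } = wordAlignment;
  return words.map((word, i) => ({
    word,
    start: wordStartTimeSeconds[i],
    end: wordEndTimeSeconds[i],
  }));
}

// Hypothetical helper: find the word being spoken at time t (in seconds),
// or undefined when t falls between words.
function wordAt(cues, t) {
  return cues.find((cue) => t >= cue.start && t < cue.end);
}

// Usage (assumes `timestamps` from generateWithTimestamps() above):
// const cues = toWordCues(timestamps.wordAlignment);
// const current = wordAt(cues, audioElement.currentTime);
```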
+
+ ## play()
+
+ Play audio from a `Uint8Array` or file path. Encoding is auto-detected from magic bytes or file extension.
+
+ ```js
+ const audio = await tts.generate({ text: 'Hello!', voice: 'Dennis' });
+ await tts.play(audio);
+
+ await tts.play('hello.mp3');                    // file path also accepted (Node.js only)
+ await tts.play(pcmBytes, { encoding: 'PCM' });  // encoding hint for raw PCM/ALAW/MULAW
+ ```
+
+ See [play()](API_REFERENCE.md#play) for platform player details and browser constraints.
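`play()` is described above as detecting the encoding from magic bytes. The SDK's own detection logic is not shown in this README; the sketch below only illustrates the general idea with well-known container signatures. `sniffEncoding` and its lowercase return labels are hypothetical and do not correspond to SDK encoding names.

```javascript
// Hypothetical helper: guess a container format from the first bytes of a buffer.
function sniffEncoding(bytes) {
  const ascii = (start, len) => String.fromCharCode(...bytes.subarray(start, start + len));
  // WAV: "RIFF" at offset 0 and "WAVE" at offset 8
  if (bytes.length >= 12 && ascii(0, 4) === 'RIFF' && ascii(8, 4) === 'WAVE') return 'wav';
  // Ogg pages start with "OggS"
  if (bytes.length >= 4 && ascii(0, 4) === 'OggS') return 'ogg';
  // FLAC streams start with "fLaC"
  if (bytes.length >= 4 && ascii(0, 4) === 'fLaC') return 'flac';
  // MP3 with an ID3v2 tag starts with "ID3"
  if (bytes.length >= 3 && ascii(0, 3) === 'ID3') return 'mp3';
  // Bare MPEG audio frame: 11 sync bits set (0xFF, then the top 3 bits of the next byte)
  if (bytes.length >= 2 && bytes[0] === 0xff && (bytes[1] & 0xe0) === 0xe0) return 'mp3';
  // Headerless data (raw PCM, A-law, mu-law) cannot be identified this way.
  return null;
}
```

The `null` case is why raw PCM/ALAW/MULAW audio needs the explicit `encoding` hint shown in the `play()` example.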
+
+ ---
+
+ ## listVoices()
+
+ List voices in your workspace, with an optional language filter.
+
+ ```js
+ const voices = await tts.listVoices();
+ const enVoices = await tts.listVoices({ lang: 'EN_US' });
+ const multi = await tts.listVoices({ lang: ['EN_US', 'ES_ES'] });
+ ```
+
+ ## getVoice()
+
+ Get details of a specific voice.
+
+ ```js
+ const voice = await tts.getVoice('workspace__my_clone');
+ ```
+
+ ## updateVoice()
+
+ Update a voice's display name, description, or tags.
+
+ ```js
+ await tts.updateVoice({ voice: 'workspace__my_clone', displayName: 'Narrator', tags: ['calm'] });
+ ```
+
+ ## deleteVoice()
+
+ Delete a voice from your workspace.
+
+ ```js
+ await tts.deleteVoice('workspace__my_clone');
+ ```
+
+ ## cloneVoice()
+
+ Clone a voice from one or more audio recordings (WAV/MP3). Accepts `Uint8Array` buffers or file path strings (Node.js only).
+
+ ```js
+ const result = await tts.cloneVoice({
+   audioSamples: ['sample.wav'],
+   displayName: 'My Clone',
+ });
+ const voiceId = result.voice.voiceId;
+ ```
+
+ > **Note:** Voice cloning is a long-running operation (up to 5 min). If it times out, check `listVoices()`, because the voice may have been created anyway.
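The note above suggests checking `listVoices()` after a timeout. A minimal sketch of that check follows. It rests on two assumptions that this README does not confirm: that `listVoices()` resolves to an array, and that each voice object carries a `displayName` field. `findVoiceByDisplayName` is a hypothetical helper; consult the API Reference for the real response shape.

```javascript
// Hypothetical helper. ASSUMES listVoices() resolves to an array of voice
// objects with a `displayName` field; verify against the API Reference.
async function findVoiceByDisplayName(tts, displayName) {
  const voices = await tts.listVoices();
  return voices.find((voice) => voice.displayName === displayName) ?? null;
}

// Usage after a cloneVoice() call that timed out:
// const existing = await findVoiceByDisplayName(tts, 'My Clone');
// if (existing) console.log('Clone was created despite the timeout');
```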
+
+ ## designVoice()
+
+ Design a voice from a text description; no recording is needed.
+
+ ```js
+ const result = await tts.designVoice({
+   designPrompt: 'A warm, friendly narrator',
+   previewText: 'Hello, welcome to our audiobook.',
+ });
+ const preview = result.previewVoices[0];
+ ```
+
+ ## publishVoice()
+
+ Publish a designed or cloned voice preview to your library.
+
+ ```js
+ await tts.publishVoice({ voice: preview.voiceId, displayName: 'My Custom Voice' });
+ ```
+
+ ## migrateFromElevenLabs()
+
+ Migrate a voice from ElevenLabs to your Inworld workspace. No ElevenLabs SDK required.
+
+ ```js
+ const result = await tts.migrateFromElevenLabs({
+   elevenLabsApiKey: 'el_...',
+   elevenLabsVoiceId: 'abc123',
+ });
+ console.log(`${result.elevenLabsName} → ${result.inworldVoiceId}`);
+ ```
+
+ See [Voice Management](API_REFERENCE.md#voice-management) in the API Reference for all parameters.
+
+ ---
+
+ ## Errors
+
+ | Class | When |
+ |-------|------|
+ | `MissingApiKeyError` | No API key or token found at construction |
+ | `ApiError` | API returned 4xx/5xx; has `.code` and `.details` |
+ | `NetworkError` | Connection or timeout failure |
+
+ All inherit from `InworldTTSError`.
+
+ ```js
+ import { InworldTTS, ApiError, MissingApiKeyError, NetworkError } from '@inworld/tts';
+
+ try {
+   const tts = InworldTTS(); // MissingApiKeyError is thrown here, at construction
+   const audio = await tts.generate({ text: 'Hello!', voice: 'Dennis' });
+ } catch (err) {
+   if (err instanceof MissingApiKeyError) console.error('Missing API key');
+   else if (err instanceof ApiError) console.error(`HTTP ${err.code}: ${err.message}`);
+   else if (err instanceof NetworkError) console.error(`Network error: ${err.message}`);
+   else throw err;
+ }
+ ```
+
+ ---
+
+ ## Browser
+
+ For browser usage, use a short-lived JWT from your backend instead of exposing your API key.
+
+ ```js
+ const fetchToken = async () => {
+   const { token } = await fetch('/api/tts-token').then(r => r.json());
+   return token;
+ };
+
+ const tts = InworldTTS({
+   token: await fetchToken(),
+   onTokenExpiring: fetchToken, // called automatically when token is about to expire
+ });
+
+ // play() must be called inside a user event handler (browser autoplay policy)
+ button.onclick = async () => {
+   const audio = await tts.generate({ text: 'Hello!', voice: 'Dennis', encoding: 'MP3' });
+   await tts.play(audio);
+ };
+ ```
+
+ For development only, you can use an API key directly with `dangerouslyAllowBrowser: true`, but the key will be visible in DevTools and any usage made with it is billed to your account.
+
+ A complete working example is in [`examples/browser/`](examples/browser/).
+
+ > **Encoding in the browser:** Prefer `encoding: 'MP3'`, which is supported natively in all browsers. `OGG_OPUS`/`FLAC` work in Chrome and Firefox but not Safari. `LINEAR16`, `PCM`, `ALAW`, `MULAW` cannot be played by `play()` in the browser.
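If an application wants something other than MP3 where the browser supports it, the choice can be made by feature detection rather than by browser name. A minimal sketch using the standard `HTMLMediaElement.canPlayType()` method; `pickBrowserEncoding`, the MIME mapping, and the preference order are assumptions for illustration, and MP3 stays the fallback as the note above recommends.

```javascript
// Assumed mapping from the encoding names in the note above to common MIME types.
const MIME_BY_ENCODING = {
  OGG_OPUS: 'audio/ogg; codecs="opus"',
  FLAC: 'audio/flac',
  MP3: 'audio/mpeg',
};

// Hypothetical helper: return the first preferred encoding the browser reports
// it can play. `canPlayType` is injected so the function also runs outside a browser.
function pickBrowserEncoding(canPlayType, preferred = ['OGG_OPUS', 'FLAC', 'MP3']) {
  for (const encoding of preferred) {
    // canPlayType() returns '', 'maybe', or 'probably'; '' means unsupported.
    if (canPlayType(MIME_BY_ENCODING[encoding]) !== '') return encoding;
  }
  return 'MP3'; // safe default
}

// In a browser:
// const probe = document.createElement('audio');
// const encoding = pickBrowserEncoding((mime) => probe.canPlayType(mime));
```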
+
+ ---
+
+ ## Examples
+
+ Runnable examples are in the [`examples/`](examples/) directory:
+
+ | File | What it shows |
+ |------|---------------|
+ | [`hello_world.js`](examples/hello_world.js) | Text → MP3 in 3 lines |
+ | [`stream_audio.js`](examples/stream_audio.js) | Real-time streaming |
+ | [`list_voices.js`](examples/list_voices.js) | List all voices with optional language filter |
+ | [`clone_voice.js`](examples/clone_voice.js) | Clone a voice from a WAV/MP3 recording |
+ | [`design_voice.js`](examples/design_voice.js) | Design a voice from a text description, preview, and publish |
+ | [`generate_timestamps.js`](examples/generate_timestamps.js) | Word-level timestamps |
+ | [`examples/browser/`](examples/browser/) | Browser usage with JWT auth |
+
+ ---
+
+ ## Troubleshooting
+
+ ### `MissingApiKeyError` / `ApiError` 401
+
+ Set `INWORLD_API_KEY` or pass `apiKey` directly. If the key is set but rejected, regenerate it at [platform.inworld.ai](https://platform.inworld.ai).
+
+ ### `play()` blocked by browser (autoplay policy)
+
+ Move `play()` inside a user event handler:
+
+ ```js
+ button.onclick = async () => await tts.play(audio);
+ ```
+
+ ### `ApiError`: text exceeds 2000 character limit
+
+ `stream()` accepts at most 2000 characters per call. Use `generate()` instead; it handles any text length automatically.
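`generate()` is the documented way to handle long text. If streaming is still wanted for a long passage, one option is to split the text into pieces of at most 2000 characters and stream them one after another. A minimal sketch; `splitText` is a hypothetical helper that prefers sentence boundaries, then spaces, and only hard-cuts as a last resort. It measures length with JavaScript's `String.length` (UTF-16 code units), which may not match how the API counts characters.

```javascript
// Hypothetical helper: split text into pieces no longer than maxLen.
function splitText(text, maxLen = 2000) {
  const pieces = [];
  let rest = text.trim();
  while (rest.length > maxLen) {
    const head = rest.slice(0, maxLen);
    // Prefer the last sentence end inside the window, then the last space, else a hard cut.
    const sentenceEnd = Math.max(
      head.lastIndexOf('. '),
      head.lastIndexOf('! '),
      head.lastIndexOf('? ')
    );
    const space = head.lastIndexOf(' ');
    const cut = sentenceEnd > 0 ? sentenceEnd + 1 : space > 0 ? space : maxLen;
    pieces.push(rest.slice(0, cut).trim());
    rest = rest.slice(cut).trim();
  }
  if (rest.length > 0) pieces.push(rest);
  return pieces;
}

// Usage (assumes a `tts` client constructed as shown earlier):
// for (const piece of splitText(longText)) {
//   for await (const chunk of tts.stream({ text: piece, voice: 'Dennis' })) { /* play or buffer */ }
// }
```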
+
+ ### `NetworkError: Request timed out`
+
+ Increase the timeout or raise the retry count:
+
+ ```js
+ const tts = InworldTTS({ timeout: 120_000, maxRetries: 3 });
+ ```
+
+ ### `require('@inworld/tts')` throws `ERR_REQUIRE_ESM`
+
+ Both ESM and CommonJS are supported. Ensure you are on Node.js ≥18 and, if you bundle, that your bundler respects the `exports` field (webpack 5, Vite, Rollup, esbuild).
+
+ ```js
+ // ESM
+ import { InworldTTS } from '@inworld/tts';
+
+ // CommonJS
+ const { InworldTTS } = require('@inworld/tts');
+ ```
+
+ ---
+
+ ## License
+
+ [MIT](LICENSE)