@imgly/plugin-ai-audio-generation-web 0.2.17 → 1.69.0-nightly.20260130

package/README.md CHANGED
@@ -1,478 +1,9 @@
- # IMG.LY AI Audio Generation for Web
+ # @imgly/plugin-ai-audio-generation-web
 
- A plugin for integrating AI audio generation capabilities into CreativeEditor SDK.
+ AI audio generation plugin for the CE.SDK editor
 
- ## Overview
-
- The `@imgly/plugin-ai-audio-generation-web` package enables users to generate audio content using AI directly within CreativeEditor SDK. This shipped provider leverages the [ElevenLabs](https://elevenlabs.io) platform to provide high-quality text-to-speech and sound effect generation.
-
- Features include:
-
- - Text-to-speech generation with multiple voices
- - Sound effect generation from text descriptions
- - Voice selection interface
- - Speed adjustment
- - Automatic history tracking
- - Seamless integration with CreativeEditor SDK
-
- ## Installation
-
- ```bash
- npm install @imgly/plugin-ai-audio-generation-web
- ```
-
- ## Usage
-
- ### Basic Configuration
-
- To use the plugin, import it and configure it with your preferred providers:
-
- #### Single Provider Configuration
-
- ```typescript
- import CreativeEditorSDK from '@cesdk/cesdk-js';
- import AudioGeneration from '@imgly/plugin-ai-audio-generation-web';
- import Elevenlabs from '@imgly/plugin-ai-audio-generation-web/elevenlabs';
-
- // Initialize CreativeEditor SDK
- CreativeEditorSDK.create(domElement, {
-   license: 'your-license-key'
-   // Other configuration options...
- }).then(async (cesdk) => {
-   // Add the audio generation plugin
-   cesdk.addPlugin(
-     AudioGeneration({
-       // Text-to-speech provider
-       text2speech: Elevenlabs.ElevenMultilingualV2({
-         proxyUrl: 'http://your-proxy-server.com/api/proxy',
-         headers: {
-           'x-custom-header': 'value',
-           'x-client-version': '1.0.0'
-         },
-         // Optional: Configure default property values
-         properties: {
-           voice_id: 'pNInz6obpgDQGcFmaJgB', // Default voice (Adam)
-           voice_settings_stability: 0.5,
-           voice_settings_similarity_boost: 0.75
-         }
-       }),
-
-       // Sound effects provider (optional)
-       text2sound: Elevenlabs.ElevenSoundEffects({
-         proxyUrl: 'http://your-proxy-server.com/api/proxy',
-         headers: {
-           'x-custom-header': 'value',
-           'x-client-version': '1.0.0'
-         }
-       }),
-
-       // Optional configuration
-       debug: false,
-       dryRun: false
-     })
-   );
- });
- ```
-
- #### Multiple Providers Configuration
-
- You can configure multiple providers for each generation type, and users will see a selection box to choose between them:
-
- ```typescript
- import CreativeEditorSDK from '@cesdk/cesdk-js';
- import AudioGeneration from '@imgly/plugin-ai-audio-generation-web';
- import Elevenlabs from '@imgly/plugin-ai-audio-generation-web/elevenlabs';
-
- // Initialize CreativeEditor SDK
- CreativeEditorSDK.create(domElement, {
-   license: 'your-license-key'
-   // Other configuration options...
- }).then(async (cesdk) => {
-   // Add the audio generation plugin with multiple providers
-   cesdk.addPlugin(
-     AudioGeneration({
-       // Multiple text-to-speech providers
-       text2speech: [
-         Elevenlabs.ElevenMultilingualV2({
-           proxyUrl: 'http://your-proxy-server.com/api/proxy',
-           headers: {
-             'x-custom-header': 'value',
-             'x-client-version': '1.0.0'
-           }
-         }),
-         // Add more providers here as they become available
-         // OtherProvider.SomeModel({
-         //   proxyUrl: 'http://your-proxy-server.com/api/proxy',
-         //   headers: {
-         //     'x-api-key': 'your-key',
-         //     'x-source': 'cesdk'
-         //   }
-         // })
-       ],
-
-       // Sound effects provider (optional)
-       text2sound: Elevenlabs.ElevenSoundEffects({
-         proxyUrl: 'http://your-proxy-server.com/api/proxy',
-         headers: {
-           'x-custom-header': 'value',
-           'x-client-version': '1.0.0'
-         }
-       }),
-
-       // Optional configuration
-       debug: false,
-       dryRun: false
-     })
-   );
- });
- ```
-
- ### Providers
-
- The plugin comes with two pre-configured providers for ElevenLabs:
-
- #### 1. ElevenMultilingualV2 (Text-to-Speech)
-
- A versatile text-to-speech engine that supports multiple languages and voices:
-
- ```typescript
- text2speech: Elevenlabs.ElevenMultilingualV2({
-   proxyUrl: 'http://your-proxy-server.com/api/proxy',
-   headers: {
-     'x-custom-header': 'value',
-     'x-client-version': '1.0.0'
-   },
-   // Optional: Configure default property values
-   properties: {
-     voice_id: 'pNInz6obpgDQGcFmaJgB', // Default voice (Adam)
-     voice_settings_stability: 0.5, // Voice stability (0.0-1.0)
-     voice_settings_similarity_boost: 0.75 // Voice similarity (0.0-1.0)
-   }
- });
- ```
-
- Key features:
-
- - Multiple voice options
- - Multilingual support
- - Adjustable speaking speed
- - Natural-sounding speech
- - Custom headers support for API requests
-
- **Custom Translations:**
-
- ```typescript
- cesdk.i18n.setTranslations({
-   en: {
-     'ly.img.plugin-ai-audio-generation-web.elevenlabs/monolingual/v1.property.prompt': 'Enter text to convert to speech',
-     'ly.img.plugin-ai-audio-generation-web.elevenlabs/monolingual/v1.property.voice_id': 'Select Voice',
-     'ly.img.plugin-ai-audio-generation-web.elevenlabs/monolingual/v1.property.speed': 'Playback Speed'
-   }
- });
- ```
-
- #### 2. ElevenSoundEffects (Text-to-Sound)
-
- A sound effect generator that creates audio from text descriptions:
-
- ```typescript
- text2sound: Elevenlabs.ElevenSoundEffects({
-   proxyUrl: 'http://your-proxy-server.com/api/proxy',
-   headers: {
-     'x-custom-header': 'value',
-     'x-client-version': '1.0.0'
-   },
-   // Optional: Configure default property values
-   properties: {
-     duration_seconds: 10, // Duration of sound effect
-     prompt_influence: 0.3 // How much the prompt influences generation
-   }
- });
- ```
-
- Key features:
-
- - Generate sound effects from text descriptions
- - Create ambient sounds, effects, and music
- - Seamless integration with CreativeEditor SDK
- - Automatic thumbnails and duration detection
- - Custom headers support for API requests
-
- **Custom Translations:**
-
- ```typescript
- cesdk.i18n.setTranslations({
-   en: {
-     'ly.img.plugin-ai-audio-generation-web.elevenlabs/sound-generation.property.prompt': 'Describe the sound you want to create',
-     'ly.img.plugin-ai-audio-generation-web.elevenlabs/sound-generation.property.duration': 'Audio Length'
-   }
- });
- ```
-
- ### Feature Control
-
- You can control various aspects of the audio generation plugin using the Feature API:
-
- ```typescript
- // Disable provider selection for speech
- cesdk.feature.enable('ly.img.plugin-ai-audio-generation-web.speech.providerSelect', false);
-
- // Disable provider selection for sound effects
- cesdk.feature.enable('ly.img.plugin-ai-audio-generation-web.sound.providerSelect', false);
-
- // Control individual provider visibility
- cesdk.feature.enable('ly.img.plugin-ai-audio-generation-web.providerSelect', false);
- ```
-
- For more information about Feature API and available feature flags, see the [@imgly/plugin-ai-generation-web documentation](https://github.com/imgly/plugins/tree/main/packages/plugin-ai-generation-web#available-feature-flags).
-
- ### Customizing Labels and Translations
-
- You can customize all labels and text in the AI audio generation interface using the translation system. This allows you to provide better labels for your users in any language.
-
- #### Translation Key Structure
-
- The system checks for translations in this order (highest to lowest priority):
-
- 1. **Provider-specific**: `ly.img.plugin-ai-audio-generation-web.${provider}.property.${field}` - Override labels for a specific AI provider
- 2. **Generic**: `ly.img.plugin-ai-generation-web.property.${field}` - Override labels for all AI plugins
-
- #### Basic Example
-
- ```typescript
- // Customize labels for your AI audio generation interface
- cesdk.i18n.setTranslations({
-   en: {
-     // Generic labels (applies to ALL AI plugins)
-     'ly.img.plugin-ai-generation-web.property.prompt': 'Describe what you want to create',
-     'ly.img.plugin-ai-generation-web.property.voice_id': 'Voice Selection',
-     'ly.img.plugin-ai-generation-web.property.speed': 'Speaking Speed',
-
-     // Provider-specific for ElevenMultilingualV2
-     'ly.img.plugin-ai-audio-generation-web.elevenlabs/monolingual/v1.property.prompt': 'Enter text to speak',
-     'ly.img.plugin-ai-audio-generation-web.elevenlabs/monolingual/v1.property.voice_id': 'Choose Voice',
-     'ly.img.plugin-ai-audio-generation-web.elevenlabs/monolingual/v1.property.speed': 'Speech Speed',
-
-     // Provider-specific for ElevenSoundEffects
-     'ly.img.plugin-ai-audio-generation-web.elevenlabs/sound-generation.property.prompt': 'Describe the sound effect',
-     'ly.img.plugin-ai-audio-generation-web.elevenlabs/sound-generation.property.duration': 'Sound Duration'
-   }
- });
- ```
-
- ### Configuration Options
-
- The plugin accepts the following configuration options:
-
- | Option | Type | Description | Default |
- | ------------- | -------------------- | ----------------------------------------------- | --------- |
- | `text2speech` | Provider \| Provider[] | Provider(s) for text-to-speech generation. When multiple providers are provided, users can select between them | undefined |
- | `text2sound` | Provider \| Provider[] | Provider(s) for sound effect generation. When multiple providers are provided, users can select between them | undefined |
- | `debug` | boolean | Enable debug logging | false |
- | `dryRun` | boolean | Simulate generation without API calls | false |
- | `middleware` | Function[] | Array of middleware functions for the generation | undefined |
-
- ### Middleware Configuration
-
- The `middleware` option allows you to add pre-processing and post-processing capabilities to the generation process:
-
- ```typescript
- import AudioGeneration from '@imgly/plugin-ai-audio-generation-web';
- import Elevenlabs from '@imgly/plugin-ai-audio-generation-web/elevenlabs';
- import { loggingMiddleware, rateLimitMiddleware } from '@imgly/plugin-ai-generation-web';
-
- // Create middleware functions
- const logging = loggingMiddleware();
- const rateLimit = rateLimitMiddleware({
-   maxRequests: 15,
-   timeWindowMs: 60000, // 1 minute
-   onRateLimitExceeded: (input, options, info) => {
-     console.log(`Audio generation rate limit exceeded: ${info.currentCount}/${info.maxRequests}`);
-     return false; // Reject request
-   }
- });
-
- // Create custom middleware
- const customMiddleware = async (input, options, next) => {
-   console.log('Before generation:', input);
-
-   // Add custom fields or modify the input
-   const modifiedInput = {
-     ...input,
-     customField: 'custom value'
-   };
-
-   // Call the next middleware or generation function
-   const result = await next(modifiedInput, options);
-
-   console.log('After generation:', result);
-
-   // You can also modify the result before returning it
-   return result;
- };
-
- // Apply middleware to plugin
- cesdk.addPlugin(
-   AudioGeneration({
-     text2speech: Elevenlabs.ElevenMultilingualV2({
-       proxyUrl: 'http://your-proxy-server.com/api/proxy'
-     }),
-     middleware: [logging, rateLimit, customMiddleware] // Apply middleware in order
-   })
- );
- ```
-
- Built-in middleware options:
-
- - **loggingMiddleware**: Logs generation requests and responses
- - **rateLimitMiddleware**: Limits the number of generation requests in a time window
-
- You can also create custom middleware functions to meet your specific needs.
-
- #### Preventing Default Feedback
-
- Middleware can suppress default UI feedback behaviors using `options.preventDefault()`:
-
- ```typescript
- const customErrorMiddleware = async (input, options, next) => {
-   try {
-     return await next(input, options);
-   } catch (error) {
-     // Prevent default error notification
-     options.preventDefault();
-
-     // Show custom error notification
-     options.cesdk?.ui.showNotification({
-       type: 'error',
-       message: `Audio generation failed: ${error.message}`,
-       action: {
-         label: 'Try Again',
-         onClick: () => {/* retry logic */}
-       }
-     });
-
-     throw error;
-   }
- };
- ```
-
- **What gets prevented:**
- - Error/success notifications
- - Block error state
- - Console error logging
-
- **What is NOT prevented:**
- - Pending → Ready transition (loading spinner always stops)
-
- For more details, see the [@imgly/plugin-ai-generation-web documentation](https://github.com/imgly/plugins/tree/main/packages/plugin-ai-generation-web#preventing-default-feedback).
-
- ### Using a Proxy
-
- For security reasons, it's recommended to use a proxy server to handle API requests to ElevenLabs. The proxy URL is required when configuring providers:
-
- ```typescript
- text2speech: Elevenlabs.ElevenMultilingualV2({
-   proxyUrl: 'http://your-proxy-server.com/api/proxy',
-   headers: {
-     'x-custom-header': 'value',
-     'x-client-version': '1.0.0'
-   }
- });
- ```
-
- The `headers` option allows you to include custom HTTP headers in all API requests. This is useful for:
- - Adding custom client identification headers
- - Including version information
- - Passing through metadata required by your API
- - Adding correlation IDs for request tracing
-
- You'll need to implement a proxy server that forwards requests to ElevenLabs and handles authentication.
-
- ## API Reference
-
- ### Main Plugin
-
- ```typescript
- AudioGeneration(options: PluginConfiguration): EditorPlugin
- ```
-
- Creates and returns a plugin that can be added to CreativeEditor SDK.
-
- ### Plugin Configuration
-
- ```typescript
- interface PluginConfiguration {
-   // Provider(s) for text-to-speech generation
-   text2speech?: AiAudioProvider | AiAudioProvider[];
-
-   // Provider(s) for sound effect generation
-   text2sound?: AiAudioProvider | AiAudioProvider[];
-
-   // Enable debug logging
-   debug?: boolean;
-
-   // Skip actual API calls for testing
-   dryRun?: boolean;
-
-   // Extend the generation process
-   middleware?: GenerationMiddleware;
- }
- ```
-
- ### ElevenLabs Providers
-
- #### ElevenMultilingualV2
-
- ```typescript
- Elevenlabs.ElevenMultilingualV2(config: {
-   proxyUrl: string;
-   headers?: Record<string, string>;
-   debug?: boolean;
- }): AiAudioProvider
- ```
-
- #### ElevenSoundEffects
-
- ```typescript
- Elevenlabs.ElevenSoundEffects(config: {
-   proxyUrl: string;
-   headers?: Record<string, string>;
-   debug?: boolean;
- }): AiAudioProvider
- ```
-
- ## UI Integration
-
- The plugin automatically registers the following UI components:
-
- 1. **Speech Generation Panel**: A sidebar panel for text-to-speech generation
- 2. **Sound Generation Panel**: A sidebar panel for generating sound effects
- 3. **Voice Selection Panel**: A panel for choosing different voice options
- 4. **History Library**: Displays previously generated audio clips
-
- ### Panel IDs
-
- - Main speech panel: `ly.img.ai.elevenlabs/monolingual/v1`
- - Main sound panel: `ly.img.ai.elevenlabs/sound-generation`
- - Voice selection panel: `ly.img.ai.audio-generation/speech/elevenlabs.voiceSelection`
-
- ### Asset History
-
- Generated audio files are automatically stored in asset sources with the following IDs:
-
- - Text-to-Speech: `elevenlabs/monolingual/v1.history`
- - Sound Effects: `elevenlabs/sound-generation.history`
-
- ## Translations
-
- For customization and localization, see the [translations.json](https://github.com/imgly/plugins/tree/main/packages/plugin-ai-audio-generation-web/translations.json) file which contains provider-specific translation keys for audio generation interfaces.
-
- ## Related Packages
-
- - [@imgly/plugin-ai-generation-web](https://github.com/imgly/plugins/tree/main/packages/plugin-ai-generation-web) - Core utilities for AI generation
- - [@imgly/plugin-ai-image-generation-web](https://github.com/imgly/plugins/tree/main/packages/plugin-ai-image-generation-web) - AI image generation
- - [@imgly/plugin-ai-video-generation-web](https://github.com/imgly/plugins/tree/main/packages/plugin-ai-video-generation-web) - AI video generation
+ For documentation, visit: https://img.ly/docs/cesdk
 
  ## License
 
- This plugin is part of the IMG.LY plugin ecosystem for CreativeEditor SDK. Please refer to the license terms in the package.
+ This plugin is part of the IMG.LY plugin ecosystem for CreativeEditor SDK.