@felores/kie-ai-mcp-server 1.5.0 → 1.7.2
- package/README.md +197 -70
- package/dist/index.js +426 -166
- package/dist/kie-ai-client.d.ts +4 -2
- package/dist/kie-ai-client.js +92 -32
- package/dist/types.d.ts +165 -39
- package/dist/types.js +73 -18
- package/package.json +2 -2
package/README.md
CHANGED
|
@@ -22,6 +22,7 @@ Access the world's best AI models through a single, developer-friendly API. Gene
|
|
|
22
22
|
- **Nano Banana**: Lightning-fast image generation and editing
|
|
23
23
|
- **ElevenLabs**: Studio-quality text-to-speech and sound effects
|
|
24
24
|
- **ByteDance Seedance**: High-quality video with text-to-video and image-to-video
|
|
25
|
+
- **ByteDance Seedream V4**: Advanced image generation and editing with unified interface
|
|
25
26
|
|
|
26
27
|
### 💰 **Affordable Pricing**
|
|
27
28
|
Pay-as-you-go credit system means you only pay for what you use. Good for startups and enterprises looking to reduce AI costs.
|
|
@@ -340,24 +341,25 @@ Using explicit model (overrides default V5):
|
|
|
340
341
|
**Note**: In custom mode, `style` and `title` are required. If `instrumental` is false, `prompt` is used as exact lyrics. The `callBackUrl` is optional and will use the `KIE_AI_CALLBACK_URL` environment variable if not provided. The `model` parameter defaults to "V5" but can be explicitly set to any available version.
|
|
341
342
|
|
|
342
343
|
### 9. `elevenlabs_tts`
|
|
343
|
-
Generate speech from text using ElevenLabs
|
|
344
|
+
Generate speech from text using ElevenLabs TTS models (Turbo 2.5 by default, with optional Multilingual v2 support).
|
|
344
345
|
|
|
345
346
|
**Parameters:**
|
|
346
347
|
- `text` (string, required): The text to convert to speech (max 5000 characters)
|
|
348
|
+
- `model` (enum, optional): TTS model to use - "turbo" (faster, default) or "multilingual" (supports context)
|
|
347
349
|
- `voice` (enum, optional): Voice to use - "Rachel", "Aria", "Roger", "Sarah", "Laura", "Charlie", "George", "Callum", "River", "Liam", "Charlotte", "Alice", "Matilda", "Will", "Jessica", "Eric", "Chris", "Brian", "Daniel", "Lily", "Bill" (default: "Rachel")
|
|
348
350
|
- `stability` (number, optional): Voice stability (0-1, step 0.01, default: 0.5)
|
|
349
351
|
- `similarity_boost` (number, optional): Similarity boost (0-1, step 0.01, default: 0.75)
|
|
350
352
|
- `style` (number, optional): Style exaggeration (0-1, step 0.01, default: 0)
|
|
351
353
|
- `speed` (number, optional): Speech speed (0.7-1.2, step 0.01, default: 1.0)
|
|
352
354
|
- `timestamps` (boolean, optional): Whether to return timestamps for each word (default: false)
|
|
353
|
-
- `previous_text` (string, optional): Text that came before current request
|
|
354
|
-
- `next_text` (string, optional): Text that comes after current request
|
|
355
|
-
- `language_code` (string, optional): ISO 639-1 language code for language enforcement (max 500 chars)
|
|
355
|
+
- `previous_text` (string, optional): Text that came before current request (multilingual model only, max 5000 chars)
|
|
356
|
+
- `next_text` (string, optional): Text that comes after current request (multilingual model only, max 5000 chars)
|
|
357
|
+
- `language_code` (string, optional): ISO 639-1 language code for language enforcement (turbo model only, max 500 chars)
|
|
356
358
|
- `callBackUrl` (string, optional): URL to receive task completion updates (uses KIE_AI_CALLBACK_URL environment variable if not provided)
|
|
357
359
|
|
|
358
360
|
**Examples:**
|
|
359
361
|
|
|
360
|
-
Basic TTS generation:
|
|
362
|
+
Basic TTS generation (uses Turbo model by default):
|
|
361
363
|
```json
|
|
362
364
|
{
|
|
363
365
|
"text": "Hello, this is a test of the ElevenLabs text-to-speech system.",
|
|
@@ -365,86 +367,36 @@ Basic TTS generation:
|
|
|
365
367
|
}
|
|
366
368
|
```
|
|
367
369
|
|
|
368
|
-
|
|
369
|
-
```json
|
|
370
|
-
{
|
|
371
|
-
"text": "Welcome to our presentation on artificial intelligence",
|
|
372
|
-
"voice": "Aria",
|
|
373
|
-
"stability": 0.8,
|
|
374
|
-
"similarity_boost": 0.9,
|
|
375
|
-
"style": 0.3,
|
|
376
|
-
"speed": 1.1
|
|
377
|
-
}
|
|
378
|
-
```
|
|
379
|
-
|
|
380
|
-
With continuity for longer texts:
|
|
381
|
-
```json
|
|
382
|
-
{
|
|
383
|
-
"text": "This is the second part of our conversation.",
|
|
384
|
-
"voice": "Roger",
|
|
385
|
-
"previous_text": "This is the first part of our conversation.",
|
|
386
|
-
"next_text": "This is the third part of our conversation."
|
|
387
|
-
}
|
|
388
|
-
```
|
|
389
|
-
|
|
390
|
-
**Note**: The `callBackUrl` is optional and will use the `KIE_AI_CALLBACK_URL` environment variable if not provided. Generation typically takes 30 seconds to 2 minutes depending on text length.
|
|
391
|
-
|
|
392
|
-
### 10. `elevenlabs_tts_turbo`
|
|
393
|
-
Generate speech from text using ElevenLabs Turbo 2.5 TTS model (faster generation with language enforcement support).
|
|
394
|
-
|
|
395
|
-
**Parameters:**
|
|
396
|
-
- `text` (string, required): The text to convert to speech (max 5000 characters)
|
|
397
|
-
- `voice` (enum, optional): Voice to use - "Rachel", "Aria", "Roger", "Sarah", "Laura", "Charlie", "George", "Callum", "River", "Liam", "Charlotte", "Alice", "Matilda", "Will", "Jessica", "Eric", "Chris", "Brian", "Daniel", "Lily", "Bill" (default: "Rachel")
|
|
398
|
-
- `stability` (number, optional): Voice stability (0-1, step 0.01, default: 0.5)
|
|
399
|
-
- `similarity_boost` (number, optional): Similarity boost (0-1, step 0.01, default: 0.75)
|
|
400
|
-
- `style` (number, optional): Style exaggeration (0-1, step 0.01, default: 0)
|
|
401
|
-
- `speed` (number, optional): Speech speed (0.7-1.2, step 0.01, default: 1.0)
|
|
402
|
-
- `timestamps` (boolean, optional): Whether to return timestamps for each word (default: false)
|
|
403
|
-
- `previous_text` (string, optional): Text that came before current request for continuity (max 5000 chars)
|
|
404
|
-
- `next_text` (string, optional): Text that comes after current request for continuity (max 5000 chars)
|
|
405
|
-
- `language_code` (string, optional): ISO 639-1 language code for language enforcement - Turbo 2.5 supports this feature (max 500 chars)
|
|
406
|
-
- `callBackUrl` (string, optional): URL to receive task completion updates (uses KIE_AI_CALLBACK_URL environment variable if not provided)
|
|
407
|
-
|
|
408
|
-
**Examples:**
|
|
409
|
-
|
|
410
|
-
Fast TTS generation:
|
|
411
|
-
```json
|
|
412
|
-
{
|
|
413
|
-
"text": "This is a fast generation using the Turbo model.",
|
|
414
|
-
"voice": "Aria"
|
|
415
|
-
}
|
|
416
|
-
```
|
|
417
|
-
|
|
418
|
-
With language enforcement:
|
|
370
|
+
Fast generation with language enforcement (Turbo model):
|
|
419
371
|
```json
|
|
420
372
|
{
|
|
421
373
|
"text": "Bonjour, comment allez-vous?",
|
|
422
374
|
"voice": "Rachel",
|
|
375
|
+
"model": "turbo",
|
|
423
376
|
"language_code": "fr"
|
|
424
377
|
}
|
|
425
378
|
```
|
|
426
379
|
|
|
427
|
-
Advanced controls with
|
|
380
|
+
Advanced voice controls with context (Multilingual model):
|
|
428
381
|
```json
|
|
429
382
|
{
|
|
430
|
-
"text": "This is part
|
|
383
|
+
"text": "This is the second part of our conversation.",
|
|
431
384
|
"voice": "Roger",
|
|
432
|
-
"
|
|
433
|
-
"
|
|
434
|
-
"
|
|
435
|
-
"
|
|
385
|
+
"model": "multilingual",
|
|
386
|
+
"stability": 0.8,
|
|
387
|
+
"similarity_boost": 0.9,
|
|
388
|
+
"previous_text": "This is the first part of our conversation.",
|
|
389
|
+
"next_text": "This is the third part of our conversation."
|
|
436
390
|
}
|
|
437
391
|
```
|
|
438
392
|
|
|
439
|
-
**
|
|
440
|
-
- **
|
|
441
|
-
- **
|
|
442
|
-
- **Same Voice Options**: All 21 voices available
|
|
443
|
-
- **Same Quality**: Maintains high audio quality with faster processing
|
|
393
|
+
**Model Comparison:**
|
|
394
|
+
- **Turbo 2.5** (default): Faster generation (15-60 seconds), supports language enforcement with `language_code`
|
|
395
|
+
- **Multilingual v2**: Supports context with `previous_text`/`next_text`, generation takes 30-120 seconds
|
|
444
396
|
|
|
445
|
-
**Note**: The `callBackUrl` is optional and will use the `KIE_AI_CALLBACK_URL` environment variable if not provided. Turbo
|
|
397
|
+
**Note**: The `callBackUrl` is optional and will use the `KIE_AI_CALLBACK_URL` environment variable if not provided. Choose Turbo model for speed and language enforcement, or Multilingual model for context-aware speech generation.
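
The model-specific rules in the note above can be sketched as a small pre-flight check. This is a minimal illustration, assuming a client wants to catch mismatched parameters before calling the tool. The `TtsRequest` type and `checkTtsRequest` helper are hypothetical names and are not taken from the package. Whether the server rejects or silently ignores a mismatched parameter is not stated in the README.

```typescript
// Illustrative only: these names are assumptions, not the package's API.
type TtsModel = "turbo" | "multilingual";

interface TtsRequest {
  text: string;
  model?: TtsModel;
  voice?: string;
  previous_text?: string;
  next_text?: string;
  language_code?: string;
}

// Returns a list of human-readable problems; an empty list means the
// request is consistent with the parameter notes in the README.
function checkTtsRequest(req: TtsRequest): string[] {
  const problems: string[] = [];
  const model: TtsModel = req.model ?? "turbo"; // README: Turbo is the default

  if (req.text.length > 5000) {
    problems.push("text exceeds 5000 characters");
  }
  if (model === "turbo") {
    // Context fields are documented as multilingual-only.
    if (req.previous_text !== undefined || req.next_text !== undefined) {
      problems.push('previous_text/next_text require model "multilingual"');
    }
  } else if (req.language_code !== undefined) {
    // Language enforcement is documented as turbo-only.
    problems.push('language_code requires model "turbo"');
  }
  return problems;
}
```

Returning a list instead of throwing lets a caller report every mismatch at once, which may be friendlier when the request is assembled by an LLM agent.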
|
|
446
398
|
|
|
447
|
-
###
|
|
399
|
+
### 10. `elevenlabs_ttsfx`
|
|
448
400
|
Generate sound effects from text descriptions using ElevenLabs Sound Effects v2 model.
|
|
449
401
|
|
|
450
402
|
**Parameters:**
|
|
@@ -564,6 +516,181 @@ Video with specific ending frame:
|
|
|
564
516
|
|
|
565
517
|
**Note**: The `callBackUrl` is optional and will use the `KIE_AI_CALLBACK_URL` environment variable if not provided. Video generation typically takes 2-5 minutes depending on quality and complexity.
|
|
566
518
|
|
|
519
|
+
### 13. `bytedance_seedream_image`
|
|
520
|
+
Generate and edit images using ByteDance Seedream V4 models (unified tool for both text-to-image and image editing).
|
|
521
|
+
|
|
522
|
+
**Parameters:**
|
|
523
|
+
- `prompt` (string, required): Text prompt for image generation or editing (max 10000 chars)
|
|
524
|
+
- `image_urls` (array, optional): Array of image URLs for editing mode (1-10 images, if not provided, uses text-to-image)
|
|
525
|
+
- `image_size` (string, optional): Image aspect ratio (default: "1:1")
|
|
526
|
+
- Options: `1:1`, `4:3`, `3:4`, `16:9`, `9:16`, `21:9`, `9:21`, `3:2`, `2:3`
|
|
527
|
+
- `image_resolution` (string, optional): Image resolution (default: "1K")
|
|
528
|
+
- `1K`: Standard resolution (1024px on shortest side)
|
|
529
|
+
- `2K`: High resolution (2048px on shortest side)
|
|
530
|
+
- `4K`: Ultra high resolution (4096px on shortest side)
|
|
531
|
+
- `max_images` (integer, optional): Number of images to generate (1-6, default: 1)
|
|
532
|
+
- `seed` (integer, optional): Random seed for reproducible results (default: -1 for random)
|
|
533
|
+
- `callBackUrl` (string, optional): URL for task completion notifications
|
|
534
|
+
|
|
535
|
+
**Examples:**
|
|
536
|
+
|
|
537
|
+
Text-to-image generation:
|
|
538
|
+
```json
|
|
539
|
+
{
|
|
540
|
+
"prompt": "A majestic dragon perched atop a crystal mountain at sunset, digital art style",
|
|
541
|
+
"image_size": "16:9",
|
|
542
|
+
"image_resolution": "2K",
|
|
543
|
+
"max_images": 2,
|
|
544
|
+
"seed": 42
|
|
545
|
+
}
|
|
546
|
+
```
|
|
547
|
+
|
|
548
|
+
Image editing:
|
|
549
|
+
```json
|
|
550
|
+
{
|
|
551
|
+
"prompt": "Transform the day scene into a magical night with glowing stars and moonlight",
|
|
552
|
+
"image_urls": ["https://example.com/day-landscape.jpg"],
|
|
553
|
+
"image_size": "16:9",
|
|
554
|
+
"image_resolution": "2K",
|
|
555
|
+
"max_images": 1
|
|
556
|
+
}
|
|
557
|
+
```
|
|
558
|
+
|
|
559
|
+
Multiple image editing:
|
|
560
|
+
```json
|
|
561
|
+
{
|
|
562
|
+
"prompt": "Apply a consistent cyberpunk aesthetic to all images with neon lights and futuristic elements",
|
|
563
|
+
"image_urls": [
|
|
564
|
+
"https://example.com/character1.jpg",
|
|
565
|
+
"https://example.com/character2.jpg",
|
|
566
|
+
"https://example.com/background.jpg"
|
|
567
|
+
],
|
|
568
|
+
"image_resolution": "4K",
|
|
569
|
+
"max_images": 3
|
|
570
|
+
}
|
|
571
|
+
```
|
|
572
|
+
|
|
573
|
+
**Key Features:**
|
|
574
|
+
- **Unified Interface**: Single tool for both text-to-image and image editing
|
|
575
|
+
- **Smart Mode Detection**: Automatically detects mode based on presence of `image_urls`
|
|
576
|
+
- **High Resolution**: Support for 1K, 2K, and 4K output
|
|
577
|
+
- **Multiple Images**: Generate up to 6 images in a single request
|
|
578
|
+
- **Batch Editing**: Edit up to 10 images simultaneously with consistent style
|
|
579
|
+
- **Reproducible Results**: Seed control for consistent output
|
|
580
|
+
|
|
581
|
+
**Note**: The `callBackUrl` is optional and will use the `KIE_AI_CALLBACK_URL` environment variable if not provided. Image generation typically takes 30-120 seconds depending on resolution and complexity.
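
The "Smart Mode Detection" behaviour described above can be sketched as follows. This is an assumption-laden illustration and not the package's implementation: `SeedreamRequest` and `detectSeedreamMode` are hypothetical names, and treating an empty `image_urls` array the same as an omitted one is a guess that the README does not confirm.

```typescript
// Illustrative only: names and the empty-array behaviour are assumptions.
type SeedreamMode = "text-to-image" | "image-edit";

interface SeedreamRequest {
  prompt: string;
  image_urls?: string[];
  image_size?: string;
  image_resolution?: "1K" | "2K" | "4K";
  max_images?: number;
  seed?: number;
}

function detectSeedreamMode(req: SeedreamRequest): SeedreamMode {
  if (req.prompt.length > 10000) {
    throw new RangeError("prompt exceeds 10000 characters");
  }
  const urls = req.image_urls ?? [];
  if (urls.length > 10) {
    throw new RangeError("image_urls accepts at most 10 images");
  }
  const maxImages = req.max_images ?? 1; // README default
  if (Math.floor(maxImages) !== maxImages || maxImages < 1 || maxImages > 6) {
    throw new RangeError("max_images must be an integer from 1 to 6");
  }
  // Presence of at least one URL selects editing; otherwise text-to-image.
  return urls.length > 0 ? "image-edit" : "text-to-image";
}
```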
|
|
582
|
+
|
|
583
|
+
### 14. `runway_aleph_video`
|
|
584
|
+
Transform videos using Runway Aleph video-to-video generation with AI-powered editing.
|
|
585
|
+
|
|
586
|
+
**Parameters:**
|
|
587
|
+
- `prompt` (string, required): Text prompt describing desired video transformation (max 1000 chars)
|
|
588
|
+
- `videoUrl` (string, required): URL of the input video to transform
|
|
589
|
+
- `waterMark` (string, optional): Watermark text to add to the video (max 100 chars, default: "")
|
|
590
|
+
- `uploadCn` (boolean, optional): Whether to upload to China servers (default: false)
|
|
591
|
+
- `aspectRatio` (enum, optional): Output video aspect ratio (default: "16:9")
|
|
592
|
+
- Options: `16:9`, `9:16`, `4:3`, `3:4`, `1:1`, `21:9`
|
|
593
|
+
- `seed` (integer, optional): Random seed for reproducible results (1-999999)
|
|
594
|
+
- `referenceImage` (string, optional): URL of reference image for style guidance
|
|
595
|
+
- `callBackUrl` (string, optional): URL for task completion notifications
|
|
596
|
+
|
|
597
|
+
**Examples:**
|
|
598
|
+
|
|
599
|
+
Basic video transformation:
|
|
600
|
+
```json
|
|
601
|
+
{
|
|
602
|
+
"prompt": "Transform this video into a cinematic anime style with vibrant colors",
|
|
603
|
+
"videoUrl": "https://example.com/input-video.mp4",
|
|
604
|
+
"aspectRatio": "16:9"
|
|
605
|
+
}
|
|
606
|
+
```
|
|
607
|
+
|
|
608
|
+
Advanced transformation with reference image:
|
|
609
|
+
```json
|
|
610
|
+
{
|
|
611
|
+
"prompt": "Apply the artistic style of the reference image to this video",
|
|
612
|
+
"videoUrl": "https://example.com/cooking-video.mp4",
|
|
613
|
+
"referenceImage": "https://example.com/van-gogh-painting.jpg",
|
|
614
|
+
"seed": 123456,
|
|
615
|
+
"waterMark": "My Channel"
|
|
616
|
+
}
|
|
617
|
+
```
|
|
618
|
+
|
|
619
|
+
Vertical video for social media:
|
|
620
|
+
```json
|
|
621
|
+
{
|
|
622
|
+
"prompt": "Convert to a dreamy, ethereal style with soft lighting",
|
|
623
|
+
"videoUrl": "https://example.com/landscape-video.mp4",
|
|
624
|
+
"aspectRatio": "9:16",
|
|
625
|
+
"uploadCn": false
|
|
626
|
+
}
|
|
627
|
+
```
|
|
628
|
+
|
|
629
|
+
**Key Features:**
|
|
630
|
+
- **Video-to-Video Transformation**: Transform existing videos with AI-powered editing
|
|
631
|
+
- **Style Transfer**: Apply artistic styles from text prompts or reference images
|
|
632
|
+
- **Aspect Ratio Control**: Convert between horizontal, vertical, and square formats
|
|
633
|
+
- **Reproducible Results**: Seed control for consistent transformations
|
|
634
|
+
- **Watermark Support**: Add custom watermarks to transformed videos
|
|
635
|
+
- **Reference Guidance**: Use reference images to guide the transformation style
|
|
636
|
+
|
|
637
|
+
**Note**: The `callBackUrl` is optional and will use the `KIE_AI_CALLBACK_URL` environment variable if not provided. Video-to-video transformation typically takes 3-8 minutes depending on complexity and length.
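
As a rough sketch of how the documented defaults and the `KIE_AI_CALLBACK_URL` fallback could be applied before a request is sent: `RunwayAlephInput` and `buildRunwayAlephRequest` are hypothetical names, and the environment is passed in as a plain object so the example does not depend on any runtime. The package's own handling may differ.

```typescript
// Illustrative only: names are assumptions, limits are taken from the README.
interface RunwayAlephInput {
  prompt: string;
  videoUrl: string;
  waterMark?: string;
  uploadCn?: boolean;
  aspectRatio?: string;
  seed?: number;
  referenceImage?: string;
  callBackUrl?: string;
}

const ASPECT_RATIOS = ["16:9", "9:16", "4:3", "3:4", "1:1", "21:9"];

function buildRunwayAlephRequest(
  input: RunwayAlephInput,
  env: { [key: string]: string | undefined }
): RunwayAlephInput {
  const aspectRatio = input.aspectRatio ?? "16:9";
  if (ASPECT_RATIOS.indexOf(aspectRatio) === -1) {
    throw new RangeError("unsupported aspectRatio: " + aspectRatio);
  }
  if (input.prompt.length > 1000) {
    throw new RangeError("prompt exceeds 1000 characters");
  }
  if (input.seed !== undefined && (input.seed < 1 || input.seed > 999999)) {
    throw new RangeError("seed must be between 1 and 999999");
  }
  return {
    ...input,
    waterMark: input.waterMark ?? "",
    uploadCn: input.uploadCn ?? false,
    aspectRatio,
    // An explicit callBackUrl wins; otherwise fall back to the environment.
    callBackUrl: input.callBackUrl ?? env["KIE_AI_CALLBACK_URL"],
  };
}
```

In a Node.js process the second argument would typically be `process.env`.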
|
|
638
|
+
|
|
639
|
+
### 15. `wan_video`
|
|
640
|
+
Generate videos using Alibaba Wan 2.5 models (unified tool for both text-to-video and image-to-video).
|
|
641
|
+
|
|
642
|
+
**Parameters:**
|
|
643
|
+
- `prompt` (string, required): Text prompt for video generation (max 800 chars)
|
|
644
|
+
- `image_url` (string, optional): URL of input image for image-to-video generation (if not provided, uses text-to-video)
|
|
645
|
+
- `aspect_ratio` (string, optional): Video aspect ratio for text-to-video (default: "16:9")
|
|
646
|
+
- Options: `16:9`, `9:16`, `1:1`
|
|
647
|
+
- `resolution` (string, optional): Video resolution (default: "1080p")
|
|
648
|
+
- `720p`: Faster generation
|
|
649
|
+
- `1080p`: Higher quality
|
|
650
|
+
- `duration` (string, optional): Video duration for image-to-video (default: "5")
|
|
651
|
+
- Options: `5`, `10` seconds
|
|
652
|
+
- `negative_prompt` (string, optional): Negative prompt to describe content to avoid (max 500 chars, default: "")
|
|
653
|
+
- `enable_prompt_expansion` (boolean, optional): Enable prompt rewriting using LLM (default: true)
|
|
654
|
+
- `seed` (integer, optional): Random seed for reproducible results
|
|
655
|
+
- `callBackUrl` (string, optional): URL for task completion notifications
|
|
656
|
+
|
|
657
|
+
**Examples:**
|
|
658
|
+
|
|
659
|
+
Text-to-video generation:
|
|
660
|
+
```json
|
|
661
|
+
{
|
|
662
|
+
"prompt": "A dimly lit jazz bar at night, wooden tables glowing under warm pendant lights. Patrons sip drinks and chat quietly while a three-piece band performs on stage. The saxophone player stands under a spotlight, gleaming instrument reflecting the light. No dialogue. Ambient audio: smooth live jazz music with saxophone and piano, clinking glasses, low murmur of audience conversations.",
|
|
663
|
+
"aspect_ratio": "16:9",
|
|
664
|
+
"resolution": "1080p",
|
|
665
|
+
"enable_prompt_expansion": true,
|
|
666
|
+
"seed": 42
|
|
667
|
+
}
|
|
668
|
+
```
|
|
669
|
+
|
|
670
|
+
Image-to-video generation:
|
|
671
|
+
```json
|
|
672
|
+
{
|
|
673
|
+
"prompt": "The same woman from the reference image looks directly into the camera, takes a breath, then smiles brightly and speaks with enthusiasm: 'Have you heard? Alibaba Wan 2.5 API is now available on Kie.ai!'",
|
|
674
|
+
"image_url": "https://example.com/portrait.jpg",
|
|
675
|
+
"duration": "5",
|
|
676
|
+
"resolution": "1080p",
|
|
677
|
+
"negative_prompt": "blurry, low quality",
|
|
678
|
+
"seed": 123
|
|
679
|
+
}
|
|
680
|
+
```
|
|
681
|
+
|
|
682
|
+
**Key Features:**
|
|
683
|
+
- **Unified Interface**: Single tool for both text-to-video and image-to-video
|
|
684
|
+
- **Smart Mode Detection**: Automatically detects mode based on presence of `image_url`
|
|
685
|
+
- **Prompt Expansion**: LLM-powered prompt rewriting for better results with short prompts
|
|
686
|
+
- **Flexible Resolutions**: 720p for speed, 1080p for quality
|
|
687
|
+
- **Aspect Ratio Control**: Support for horizontal, vertical, and square formats (text-to-video)
|
|
688
|
+
- **Duration Control**: 5 or 10 second options for image-to-video
|
|
689
|
+
- **Negative Prompts**: Fine-tune results by specifying what to avoid
|
|
690
|
+
- **Reproducible Results**: Seed control for consistent output
|
|
691
|
+
|
|
692
|
+
**Note**: The `callBackUrl` is optional and will use the `KIE_AI_CALLBACK_URL` environment variable if not provided. Video generation typically takes 2-6 minutes depending on resolution and complexity.
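
The parameter list above ties `aspect_ratio` to text-to-video and `duration` to image-to-video. A minimal sketch of that split, assuming the mode is chosen purely by whether `image_url` is present: `WanRequest`, `WanPlan`, and `planWanRequest` are hypothetical names, and dropping the parameter that does not apply to the selected mode is an assumption, not documented behaviour.

```typescript
// Illustrative only: names and the "drop unused field" rule are assumptions.
type WanMode = "text-to-video" | "image-to-video";

interface WanRequest {
  prompt: string;
  image_url?: string;
  aspect_ratio?: "16:9" | "9:16" | "1:1";
  resolution?: "720p" | "1080p";
  duration?: "5" | "10";
}

interface WanPlan {
  mode: WanMode;
  resolution: "720p" | "1080p";
  aspect_ratio?: "16:9" | "9:16" | "1:1";
  duration?: "5" | "10";
}

function planWanRequest(req: WanRequest): WanPlan {
  if (req.prompt.length > 800) {
    throw new RangeError("prompt exceeds 800 characters");
  }
  const resolution = req.resolution ?? "1080p"; // README default
  if (req.image_url !== undefined) {
    // Image-to-video: duration applies, aspect_ratio is left out.
    return { mode: "image-to-video", resolution, duration: req.duration ?? "5" };
  }
  // Text-to-video: aspect_ratio applies, duration is left out.
  return { mode: "text-to-video", resolution, aspect_ratio: req.aspect_ratio ?? "16:9" };
}
```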
|
|
693
|
+
|
|
567
694
|
## Why Developers Choose Kie.ai Over Alternatives
|
|
568
695
|
|
|
569
696
|
### 💸 **Better Value Than Fal.ai**
|
|
@@ -834,7 +961,7 @@ nano_banana_generate: "Modern minimalist app icon for fitness tracker"
|
|
|
834
961
|
bytedance_seedance_video: "Screen recording showing app features, clean interface"
|
|
835
962
|
|
|
836
963
|
# Add narration
|
|
837
|
-
|
|
964
|
+
elevenlabs_tts: "Tap here to get started with your new profile"
|
|
838
965
|
```
|
|
839
966
|
|
|
840
967
|
### 🏢 **Enterprise Applications**
|