@felores/kie-ai-mcp-server 1.5.0 → 1.7.2

package/README.md CHANGED
@@ -22,6 +22,7 @@ Access the world's best AI models through a single, developer-friendly API. Gene
  - **Nano Banana**: Lightning-fast image generation and editing
  - **ElevenLabs**: Studio-quality text-to-speech and sound effects
  - **ByteDance Seedance**: High-quality video with text-to-video and image-to-video
+ - **ByteDance Seedream V4**: Advanced image generation and editing with unified interface
 
  ### 💰 **Affordable Pricing**
  Pay-as-you-go credit system means you only pay for what you use. Good for startups and enterprises looking to reduce AI costs.
@@ -340,24 +341,25 @@ Using explicit model (overrides default V5):
  **Note**: In custom mode, `style` and `title` are required. If `instrumental` is false, `prompt` is used as exact lyrics. The `callBackUrl` is optional and will use the `KIE_AI_CALLBACK_URL` environment variable if not provided. The `model` parameter defaults to "V5" but can be explicitly set to any available version.
 
  ### 9. `elevenlabs_tts`
- Generate speech from text using ElevenLabs multilingual TTS v2 model.
+ Generate speech from text using ElevenLabs TTS models (Turbo 2.5 by default, with optional Multilingual v2 support).
 
  **Parameters:**
  - `text` (string, required): The text to convert to speech (max 5000 characters)
+ - `model` (enum, optional): TTS model to use - "turbo" (faster, default) or "multilingual" (supports context)
  - `voice` (enum, optional): Voice to use - "Rachel", "Aria", "Roger", "Sarah", "Laura", "Charlie", "George", "Callum", "River", "Liam", "Charlotte", "Alice", "Matilda", "Will", "Jessica", "Eric", "Chris", "Brian", "Daniel", "Lily", "Bill" (default: "Rachel")
  - `stability` (number, optional): Voice stability (0-1, step 0.01, default: 0.5)
  - `similarity_boost` (number, optional): Similarity boost (0-1, step 0.01, default: 0.75)
  - `style` (number, optional): Style exaggeration (0-1, step 0.01, default: 0)
  - `speed` (number, optional): Speech speed (0.7-1.2, step 0.01, default: 1.0)
  - `timestamps` (boolean, optional): Whether to return timestamps for each word (default: false)
- - `previous_text` (string, optional): Text that came before current request for continuity (max 5000 chars)
- - `next_text` (string, optional): Text that comes after current request for continuity (max 5000 chars)
- - `language_code` (string, optional): ISO 639-1 language code for language enforcement (max 500 chars)
+ - `previous_text` (string, optional): Text that came before current request (multilingual model only, max 5000 chars)
+ - `next_text` (string, optional): Text that comes after current request (multilingual model only, max 5000 chars)
+ - `language_code` (string, optional): ISO 639-1 language code for language enforcement (turbo model only, max 500 chars)
  - `callBackUrl` (string, optional): URL to receive task completion updates (uses KIE_AI_CALLBACK_URL environment variable if not provided)
 
  **Examples:**
 
- Basic TTS generation:
+ Basic TTS generation (uses Turbo model by default):
  ```json
  {
  "text": "Hello, this is a test of the ElevenLabs text-to-speech system.",
@@ -365,86 +367,36 @@ Basic TTS generation:
  }
  ```
 
- Advanced voice controls:
- ```json
- {
- "text": "Welcome to our presentation on artificial intelligence",
- "voice": "Aria",
- "stability": 0.8,
- "similarity_boost": 0.9,
- "style": 0.3,
- "speed": 1.1
- }
- ```
-
- With continuity for longer texts:
- ```json
- {
- "text": "This is the second part of our conversation.",
- "voice": "Roger",
- "previous_text": "This is the first part of our conversation.",
- "next_text": "This is the third part of our conversation."
- }
- ```
-
- **Note**: The `callBackUrl` is optional and will use the `KIE_AI_CALLBACK_URL` environment variable if not provided. Generation typically takes 30 seconds to 2 minutes depending on text length.
-
- ### 10. `elevenlabs_tts_turbo`
- Generate speech from text using ElevenLabs Turbo 2.5 TTS model (faster generation with language enforcement support).
-
- **Parameters:**
- - `text` (string, required): The text to convert to speech (max 5000 characters)
- - `voice` (enum, optional): Voice to use - "Rachel", "Aria", "Roger", "Sarah", "Laura", "Charlie", "George", "Callum", "River", "Liam", "Charlotte", "Alice", "Matilda", "Will", "Jessica", "Eric", "Chris", "Brian", "Daniel", "Lily", "Bill" (default: "Rachel")
- - `stability` (number, optional): Voice stability (0-1, step 0.01, default: 0.5)
- - `similarity_boost` (number, optional): Similarity boost (0-1, step 0.01, default: 0.75)
- - `style` (number, optional): Style exaggeration (0-1, step 0.01, default: 0)
- - `speed` (number, optional): Speech speed (0.7-1.2, step 0.01, default: 1.0)
- - `timestamps` (boolean, optional): Whether to return timestamps for each word (default: false)
- - `previous_text` (string, optional): Text that came before current request for continuity (max 5000 chars)
- - `next_text` (string, optional): Text that comes after current request for continuity (max 5000 chars)
- - `language_code` (string, optional): ISO 639-1 language code for language enforcement - Turbo 2.5 supports this feature (max 500 chars)
- - `callBackUrl` (string, optional): URL to receive task completion updates (uses KIE_AI_CALLBACK_URL environment variable if not provided)
-
- **Examples:**
-
- Fast TTS generation:
- ```json
- {
- "text": "This is a fast generation using the Turbo model.",
- "voice": "Aria"
- }
- ```
-
- With language enforcement:
+ Fast generation with language enforcement (Turbo model):
  ```json
  {
  "text": "Bonjour, comment allez-vous?",
  "voice": "Rachel",
+ "model": "turbo",
  "language_code": "fr"
  }
  ```
 
- Advanced controls with continuity:
+ Advanced voice controls with context (Multilingual model):
  ```json
  {
- "text": "This is part two of our series.",
+ "text": "This is the second part of our conversation.",
  "voice": "Roger",
- "stability": 0.9,
- "similarity_boost": 0.8,
- "previous_text": "This is part one of our series.",
- "language_code": "en"
+ "model": "multilingual",
+ "stability": 0.8,
+ "similarity_boost": 0.9,
+ "previous_text": "This is the first part of our conversation.",
+ "next_text": "This is the third part of our conversation."
  }
  ```
 
- **Key Differences from Multilingual TTS:**
- - **Faster Generation**: Turbo 2.5 processes text 15-60 seconds (vs 30-120 seconds for multilingual)
- - **Language Enforcement**: Supports ISO 639-1 language codes for consistent language output
- - **Same Voice Options**: All 21 voices available
- - **Same Quality**: Maintains high audio quality with faster processing
+ **Model Comparison:**
+ - **Turbo 2.5** (default): Faster generation (15-60 seconds), supports language enforcement with `language_code`
+ - **Multilingual v2**: Supports context with `previous_text`/`next_text`, generation takes 30-120 seconds
 
- **Note**: The `callBackUrl` is optional and will use the `KIE_AI_CALLBACK_URL` environment variable if not provided. Turbo 2.5 generation is faster and supports language enforcement.
+ **Note**: The `callBackUrl` is optional and will use the `KIE_AI_CALLBACK_URL` environment variable if not provided. Choose Turbo model for speed and language enforcement, or Multilingual model for context-aware speech generation.
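
Editor's sketch (not part of the package diff): the merged `elevenlabs_tts` tool now carries model-specific parameters, so a caller may want to check arguments before sending them. The helper below is a minimal illustration of the rules stated in the 1.7.2 README text; `build_tts_args` and the constant names are hypothetical, and the server's own validation may differ.

```python
from typing import Any

# Limits and model restrictions copied from the README text in this diff.
# The helper and constant names are hypothetical, not part of the package.
UNIT_RANGE_PARAMS = ("stability", "similarity_boost", "style")
MODEL_ONLY_PARAMS = {
    "previous_text": "multilingual",
    "next_text": "multilingual",
    "language_code": "turbo",
}


def build_tts_args(text: str, model: str = "turbo", **options: Any) -> dict[str, Any]:
    """Return an argument dict for `elevenlabs_tts`, or raise ValueError."""
    if not text or len(text) > 5000:
        raise ValueError("text must be 1-5000 characters")
    if model not in ("turbo", "multilingual"):
        raise ValueError('model must be "turbo" or "multilingual"')

    # previous_text/next_text are documented for multilingual only,
    # language_code for turbo only.
    for name, required_model in MODEL_ONLY_PARAMS.items():
        if name in options and model != required_model:
            raise ValueError(f"{name} is only documented for the {required_model} model")

    for name in UNIT_RANGE_PARAMS:
        if name in options and not 0 <= options[name] <= 1:
            raise ValueError(f"{name} must be between 0 and 1")
    if "speed" in options and not 0.7 <= options["speed"] <= 1.2:
        raise ValueError("speed must be between 0.7 and 1.2")

    return {"text": text, "model": model, **options}
```

For example, `build_tts_args("Bonjour", language_code="fr")` passes, while combining `previous_text` with the default Turbo model raises `ValueError`.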
 
- ### 11. `elevenlabs_ttsfx`
+ ### 10. `elevenlabs_ttsfx`
  Generate sound effects from text descriptions using ElevenLabs Sound Effects v2 model.
 
  **Parameters:**
@@ -564,6 +516,181 @@ Video with specific ending frame:
 
  **Note**: The `callBackUrl` is optional and will use the `KIE_AI_CALLBACK_URL` environment variable if not provided. Video generation typically takes 2-5 minutes depending on quality and complexity.
 
+ ### 13. `bytedance_seedream_image`
+ Generate and edit images using ByteDance Seedream V4 models (unified tool for both text-to-image and image editing).
+
+ **Parameters:**
+ - `prompt` (string, required): Text prompt for image generation or editing (max 10000 chars)
+ - `image_urls` (array, optional): Array of image URLs for editing mode (1-10 images, if not provided, uses text-to-image)
+ - `image_size` (string, optional): Image aspect ratio (default: "1:1")
+ - Options: `1:1`, `4:3`, `3:4`, `16:9`, `9:16`, `21:9`, `9:21`, `3:2`, `2:3`
+ - `image_resolution` (string, optional): Image resolution (default: "1K")
+ - `1K`: Standard resolution (1024px on shortest side)
+ - `2K`: High resolution (2048px on shortest side)
+ - `4K`: Ultra high resolution (4096px on shortest side)
+ - `max_images` (integer, optional): Number of images to generate (1-6, default: 1)
+ - `seed` (integer, optional): Random seed for reproducible results (default: -1 for random)
+ - `callBackUrl` (string, optional): URL for task completion notifications
+
+ **Examples:**
+
+ Text-to-image generation:
+ ```json
+ {
+ "prompt": "A majestic dragon perched atop a crystal mountain at sunset, digital art style",
+ "image_size": "16:9",
+ "image_resolution": "2K",
+ "max_images": 2,
+ "seed": 42
+ }
+ ```
+
+ Image editing:
+ ```json
+ {
+ "prompt": "Transform the day scene into a magical night with glowing stars and moonlight",
+ "image_urls": ["https://example.com/day-landscape.jpg"],
+ "image_size": "16:9",
+ "image_resolution": "2K",
+ "max_images": 1
+ }
+ ```
+
+ Multiple image editing:
+ ```json
+ {
+ "prompt": "Apply a consistent cyberpunk aesthetic to all images with neon lights and futuristic elements",
+ "image_urls": [
+ "https://example.com/character1.jpg",
+ "https://example.com/character2.jpg",
+ "https://example.com/background.jpg"
+ ],
+ "image_resolution": "4K",
+ "max_images": 3
+ }
+ ```
+
+ **Key Features:**
+ - **Unified Interface**: Single tool for both text-to-image and image editing
+ - **Smart Mode Detection**: Automatically detects mode based on presence of `image_urls`
+ - **High Resolution**: Support for 1K, 2K, and 4K output
+ - **Multiple Images**: Generate up to 6 images in a single request
+ - **Batch Editing**: Edit up to 10 images simultaneously with consistent style
+ - **Reproducible Results**: Seed control for consistent output
+
+ **Note**: The `callBackUrl` is optional and will use the `KIE_AI_CALLBACK_URL` environment variable if not provided. Image generation typically takes 30-120 seconds depending on resolution and complexity.
+
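Editor's sketch (not part of the package diff): the "Smart Mode Detection" rule and the parameter limits for `bytedance_seedream_image` can be illustrated as below. All function and constant names are hypothetical. `approx_dimensions` is only a naive reading of "N px on shortest side" and may not match the pixel sizes the service actually returns.

```python
from typing import Any

# Option lists copied from the README text in this diff; names are hypothetical.
ASPECT_RATIOS = {"1:1", "4:3", "3:4", "16:9", "9:16", "21:9", "9:21", "3:2", "2:3"}
SHORT_SIDE_PX = {"1K": 1024, "2K": 2048, "4K": 4096}


def seedream_mode(args: dict[str, Any]) -> str:
    """Mirror the documented rule: a non-empty image_urls list means editing."""
    return "edit" if args.get("image_urls") else "text-to-image"


def validate_seedream_args(args: dict[str, Any]) -> None:
    """Raise ValueError if args break a limit stated in the README."""
    prompt = args.get("prompt", "")
    if not prompt or len(prompt) > 10000:
        raise ValueError("prompt must be 1-10000 characters")
    urls = args.get("image_urls")
    if urls is not None and not 1 <= len(urls) <= 10:
        raise ValueError("image_urls must contain 1-10 URLs")
    if args.get("image_size", "1:1") not in ASPECT_RATIOS:
        raise ValueError("unsupported image_size")
    if args.get("image_resolution", "1K") not in SHORT_SIDE_PX:
        raise ValueError("unsupported image_resolution")
    if not 1 <= args.get("max_images", 1) <= 6:
        raise ValueError("max_images must be between 1 and 6")


def approx_dimensions(image_size: str = "1:1", image_resolution: str = "1K") -> tuple[int, int]:
    """Estimate (width, height), assuming the shortest side is pinned exactly."""
    w, h = (int(part) for part in image_size.split(":"))
    short = SHORT_SIDE_PX[image_resolution]
    return round(short * w / min(w, h)), round(short * h / min(w, h))
```

Under that assumption, a `16:9` request at `2K` would come out near 3641 by 2048 pixels.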
+ ### 14. `runway_aleph_video`
+ Transform videos using Runway Aleph video-to-video generation with AI-powered editing.
+
+ **Parameters:**
+ - `prompt` (string, required): Text prompt describing desired video transformation (max 1000 chars)
+ - `videoUrl` (string, required): URL of the input video to transform
+ - `waterMark` (string, optional): Watermark text to add to the video (max 100 chars, default: "")
+ - `uploadCn` (boolean, optional): Whether to upload to China servers (default: false)
+ - `aspectRatio` (enum, optional): Output video aspect ratio (default: "16:9")
+ - Options: `16:9`, `9:16`, `4:3`, `3:4`, `1:1`, `21:9`
+ - `seed` (integer, optional): Random seed for reproducible results (1-999999)
+ - `referenceImage` (string, optional): URL of reference image for style guidance
+ - `callBackUrl` (string, optional): URL for task completion notifications
+
+ **Examples:**
+
+ Basic video transformation:
+ ```json
+ {
+ "prompt": "Transform this video into a cinematic anime style with vibrant colors",
+ "videoUrl": "https://example.com/input-video.mp4",
+ "aspectRatio": "16:9"
+ }
+ ```
+
+ Advanced transformation with reference image:
+ ```json
+ {
+ "prompt": "Apply the artistic style of the reference image to this video",
+ "videoUrl": "https://example.com/cooking-video.mp4",
+ "referenceImage": "https://example.com/van-gogh-painting.jpg",
+ "seed": 123456,
+ "waterMark": "My Channel"
+ }
+ ```
+
+ Vertical video for social media:
+ ```json
+ {
+ "prompt": "Convert to a dreamy, ethereal style with soft lighting",
+ "videoUrl": "https://example.com/landscape-video.mp4",
+ "aspectRatio": "9:16",
+ "uploadCn": false
+ }
+ ```
+
+ **Key Features:**
+ - **Video-to-Video Transformation**: Transform existing videos with AI-powered editing
+ - **Style Transfer**: Apply artistic styles from text prompts or reference images
+ - **Aspect Ratio Control**: Convert between horizontal, vertical, and square formats
+ - **Reproducible Results**: Seed control for consistent transformations
+ - **Watermark Support**: Add custom watermarks to transformed videos
+ - **Reference Guidance**: Use reference images to guide the transformation style
+
+ **Note**: The `callBackUrl` is optional and will use the `KIE_AI_CALLBACK_URL` environment variable if not provided. Video-to-video transformation typically takes 3-8 minutes depending on complexity and length.
+
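Editor's sketch (not part of the package diff): the `runway_aleph_video` limits and the `callBackUrl` fallback described in the note above could be modelled as follows. The README describes the fallback as server behaviour. `resolve_callback_url` and `build_runway_aleph_args` are hypothetical names used only to make the rule concrete.

```python
import os
from typing import Any, Mapping, Optional

# Options copied from the README text in this diff; helper names are hypothetical.
RUNWAY_ASPECT_RATIOS = {"16:9", "9:16", "4:3", "3:4", "1:1", "21:9"}


def resolve_callback_url(
    explicit: Optional[str], env: Mapping[str, str] = os.environ
) -> Optional[str]:
    """Prefer an explicit callBackUrl, else fall back to KIE_AI_CALLBACK_URL."""
    return explicit or env.get("KIE_AI_CALLBACK_URL")


def build_runway_aleph_args(
    prompt: str,
    video_url: str,
    *,
    env: Mapping[str, str] = os.environ,
    **options: Any,
) -> dict[str, Any]:
    """Return an argument dict for `runway_aleph_video`, or raise ValueError."""
    if not prompt or len(prompt) > 1000:
        raise ValueError("prompt must be 1-1000 characters")
    if not video_url:
        raise ValueError("videoUrl is required")
    if len(options.get("waterMark", "")) > 100:
        raise ValueError("waterMark must be at most 100 characters")
    if options.get("aspectRatio", "16:9") not in RUNWAY_ASPECT_RATIOS:
        raise ValueError("unsupported aspectRatio")
    seed = options.get("seed")
    if seed is not None and not 1 <= seed <= 999999:
        raise ValueError("seed must be between 1 and 999999")

    args: dict[str, Any] = {"prompt": prompt, "videoUrl": video_url, **options}
    callback = resolve_callback_url(options.get("callBackUrl"), env)
    if callback:
        args["callBackUrl"] = callback
    return args
```

Passing `env` explicitly keeps the helper testable without touching the real process environment.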
+ ### 15. `wan_video`
+ Generate videos using Alibaba Wan 2.5 models (unified tool for both text-to-video and image-to-video).
+
+ **Parameters:**
+ - `prompt` (string, required): Text prompt for video generation (max 800 chars)
+ - `image_url` (string, optional): URL of input image for image-to-video generation (if not provided, uses text-to-video)
+ - `aspect_ratio` (string, optional): Video aspect ratio for text-to-video (default: "16:9")
+ - Options: `16:9`, `9:16`, `1:1`
+ - `resolution` (string, optional): Video resolution (default: "1080p")
+ - `720p`: Faster generation
+ - `1080p`: Higher quality
+ - `duration` (string, optional): Video duration for image-to-video (default: "5")
+ - Options: `5`, `10` seconds
+ - `negative_prompt` (string, optional): Negative prompt to describe content to avoid (max 500 chars, default: "")
+ - `enable_prompt_expansion` (boolean, optional): Enable prompt rewriting using LLM (default: true)
+ - `seed` (integer, optional): Random seed for reproducible results
+ - `callBackUrl` (string, optional): URL for task completion notifications
+
+ **Examples:**
+
+ Text-to-video generation:
+ ```json
+ {
+ "prompt": "A dimly lit jazz bar at night, wooden tables glowing under warm pendant lights. Patrons sip drinks and chat quietly while a three-piece band performs on stage. The saxophone player stands under a spotlight, gleaming instrument reflecting the light. No dialogue. Ambient audio: smooth live jazz music with saxophone and piano, clinking glasses, low murmur of audience conversations.",
+ "aspect_ratio": "16:9",
+ "resolution": "1080p",
+ "enable_prompt_expansion": true,
+ "seed": 42
+ }
+ ```
+
+ Image-to-video generation:
+ ```json
+ {
+ "prompt": "The same woman from the reference image looks directly into the camera, takes a breath, then smiles brightly and speaks with enthusiasm: 'Have you heard? Alibaba Wan 2.5 API is now available on Kie.ai!'",
+ "image_url": "https://example.com/portrait.jpg",
+ "duration": "5",
+ "resolution": "1080p",
+ "negative_prompt": "blurry, low quality",
+ "seed": 123
+ }
+ ```
+
+ **Key Features:**
+ - **Unified Interface**: Single tool for both text-to-video and image-to-video
+ - **Smart Mode Detection**: Automatically detects mode based on presence of `image_url`
+ - **Prompt Expansion**: LLM-powered prompt rewriting for better results with short prompts
+ - **Flexible Resolutions**: 720p for speed, 1080p for quality
+ - **Aspect Ratio Control**: Support for horizontal, vertical, and square formats (text-to-video)
+ - **Duration Control**: 5 or 10 second options for image-to-video
+ - **Negative Prompts**: Fine-tune results by specifying what to avoid
+ - **Reproducible Results**: Seed control for consistent output
+
+ **Note**: The `callBackUrl` is optional and will use the `KIE_AI_CALLBACK_URL` environment variable if not provided. Video generation typically takes 2-6 minutes depending on resolution and complexity.
+
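Editor's sketch (not part of the package diff): `wan_video` switches mode on the presence of `image_url`, and the README ties `aspect_ratio` to text-to-video and `duration` to image-to-video. The helper below shows one possible reading of those rules. The names are hypothetical, and dropping the option that belongs to the other mode is an assumption, not documented server behaviour.

```python
from typing import Any

# Option lists copied from the README text in this diff; names are hypothetical.
WAN_ASPECT_RATIOS = {"16:9", "9:16", "1:1"}
WAN_RESOLUTIONS = {"720p", "1080p"}
WAN_DURATIONS = {"5", "10"}


def wan_mode(args: dict[str, Any]) -> str:
    """Mirror the documented rule: image_url present means image-to-video."""
    return "image-to-video" if args.get("image_url") else "text-to-video"


def _check(value: str, allowed: set[str], name: str) -> str:
    if value not in allowed:
        raise ValueError(f"{name} must be one of {sorted(allowed)}")
    return value


def normalize_wan_args(args: dict[str, Any]) -> dict[str, Any]:
    """Apply documented defaults and keep only the mode-relevant option."""
    prompt = args.get("prompt", "")
    if not prompt or len(prompt) > 800:
        raise ValueError("prompt must be 1-800 characters")
    negative = args.get("negative_prompt", "")
    if len(negative) > 500:
        raise ValueError("negative_prompt must be at most 500 characters")

    out: dict[str, Any] = {
        "prompt": prompt,
        "resolution": _check(args.get("resolution", "1080p"), WAN_RESOLUTIONS, "resolution"),
        "negative_prompt": negative,
        "enable_prompt_expansion": args.get("enable_prompt_expansion", True),
    }
    if wan_mode(args) == "image-to-video":
        out["image_url"] = args["image_url"]
        out["duration"] = _check(args.get("duration", "5"), WAN_DURATIONS, "duration")
    else:
        out["aspect_ratio"] = _check(args.get("aspect_ratio", "16:9"), WAN_ASPECT_RATIOS, "aspect_ratio")
    if "seed" in args:
        out["seed"] = args["seed"]
    return out
```

Note that `duration` is a string (`"5"` or `"10"`) in the README, so the sketch rejects the integer `5`.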
  ## Why Developers Choose Kie.ai Over Alternatives
 
  ### 💸 **Better Value Than Fal.ai**
@@ -834,7 +961,7 @@ nano_banana_generate: "Modern minimalist app icon for fitness tracker"
  bytedance_seedance_video: "Screen recording showing app features, clean interface"
 
  # Add narration
- elevenlabs_tts_turbo: "Tap here to get started with your new profile"
+ elevenlabs_tts: "Tap here to get started with your new profile"
  ```
 
  ### 🏢 **Enterprise Applications**