npm - @felores/kie-ai-mcp-server - Versions diffs - 1.5.0 → 1.7.2 - Mend

@felores/kie-ai-mcp-server 1.5.0 → 1.7.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (7)

package/README.md CHANGED

@@ -22,6 +22,7 @@ Access the world's best AI models through a single, developer-friendly API. Gene
 - **Nano Banana**: Lightning-fast image generation and editing
 - **ElevenLabs**: Studio-quality text-to-speech and sound effects
 - **ByteDance Seedance**: High-quality video with text-to-video and image-to-video
+- **ByteDance Seedream V4**: Advanced image generation and editing with unified interface
 ### 💰 **Affordable Pricing**
 Pay-as-you-go credit system means you only pay for what you use. Good for startups and enterprises looking to reduce AI costs.
@@ -340,24 +341,25 @@ Using explicit model (overrides default V5):
 **Note**: In custom mode, `style` and `title` are required. If `instrumental` is false, `prompt` is used as exact lyrics. The `callBackUrl` is optional and will use the `KIE_AI_CALLBACK_URL` environment variable if not provided. The `model` parameter defaults to "V5" but can be explicitly set to any available version.
 ### 9. `elevenlabs_tts`
-Generate speech from text using ElevenLabs multilingual TTS v2 model.
+Generate speech from text using ElevenLabs TTS models (Turbo 2.5 by default, with optional Multilingual v2 support).
 **Parameters:**
 - `text` (string, required): The text to convert to speech (max 5000 characters)
+- `model` (enum, optional): TTS model to use - "turbo" (faster, default) or "multilingual" (supports context)
 - `voice` (enum, optional): Voice to use - "Rachel", "Aria", "Roger", "Sarah", "Laura", "Charlie", "George", "Callum", "River", "Liam", "Charlotte", "Alice", "Matilda", "Will", "Jessica", "Eric", "Chris", "Brian", "Daniel", "Lily", "Bill" (default: "Rachel")
 - `stability` (number, optional): Voice stability (0-1, step 0.01, default: 0.5)
 - `similarity_boost` (number, optional): Similarity boost (0-1, step 0.01, default: 0.75)
 - `style` (number, optional): Style exaggeration (0-1, step 0.01, default: 0)
 - `speed` (number, optional): Speech speed (0.7-1.2, step 0.01, default: 1.0)
 - `timestamps` (boolean, optional): Whether to return timestamps for each word (default: false)
-- `previous_text` (string, optional): Text that came before current request for continuity (max 5000 chars)
-- `next_text` (string, optional): Text that comes after current request for continuity (max 5000 chars)
-- `language_code` (string, optional): ISO 639-1 language code for language enforcement (max 500 chars)
+- `previous_text` (string, optional): Text that came before current request (multilingual model only, max 5000 chars)
+- `next_text` (string, optional): Text that comes after current request (multilingual model only, max 5000 chars)
+- `language_code` (string, optional): ISO 639-1 language code for language enforcement (turbo model only, max 500 chars)
 - `callBackUrl` (string, optional): URL to receive task completion updates (uses KIE_AI_CALLBACK_URL environment variable if not provided)
 **Examples:**
-Basic TTS generation:
+Basic TTS generation (uses Turbo model by default):
 ```json
 {
   "text": "Hello, this is a test of the ElevenLabs text-to-speech system.",
@@ -365,86 +367,36 @@ Basic TTS generation:
 }
 ```
-Advanced voice controls:
-```json
-{
-  "text": "Welcome to our presentation on artificial intelligence",
-  "voice": "Aria",
-  "stability": 0.8,
-  "similarity_boost": 0.9,
-  "style": 0.3,
-  "speed": 1.1
-}
-```
-With continuity for longer texts:
-```json
-{
-  "text": "This is the second part of our conversation.",
-  "voice": "Roger",
-  "previous_text": "This is the first part of our conversation.",
-  "next_text": "This is the third part of our conversation."
-}
-```
-**Note**: The `callBackUrl` is optional and will use the `KIE_AI_CALLBACK_URL` environment variable if not provided. Generation typically takes 30 seconds to 2 minutes depending on text length.
-### 10. `elevenlabs_tts_turbo`
-Generate speech from text using ElevenLabs Turbo 2.5 TTS model (faster generation with language enforcement support).
-**Parameters:**
-- `text` (string, required): The text to convert to speech (max 5000 characters)
-- `voice` (enum, optional): Voice to use - "Rachel", "Aria", "Roger", "Sarah", "Laura", "Charlie", "George", "Callum", "River", "Liam", "Charlotte", "Alice", "Matilda", "Will", "Jessica", "Eric", "Chris", "Brian", "Daniel", "Lily", "Bill" (default: "Rachel")
-- `stability` (number, optional): Voice stability (0-1, step 0.01, default: 0.5)
-- `similarity_boost` (number, optional): Similarity boost (0-1, step 0.01, default: 0.75)
-- `style` (number, optional): Style exaggeration (0-1, step 0.01, default: 0)
-- `speed` (number, optional): Speech speed (0.7-1.2, step 0.01, default: 1.0)
-- `timestamps` (boolean, optional): Whether to return timestamps for each word (default: false)
-- `previous_text` (string, optional): Text that came before current request for continuity (max 5000 chars)
-- `next_text` (string, optional): Text that comes after current request for continuity (max 5000 chars)
-- `language_code` (string, optional): ISO 639-1 language code for language enforcement - Turbo 2.5 supports this feature (max 500 chars)
-- `callBackUrl` (string, optional): URL to receive task completion updates (uses KIE_AI_CALLBACK_URL environment variable if not provided)
-**Examples:**
-Fast TTS generation:
-```json
-{
-  "text": "This is a fast generation using the Turbo model.",
-  "voice": "Aria"
-}
-```
-With language enforcement:
+Fast generation with language enforcement (Turbo model):
 ```json
 {
   "text": "Bonjour, comment allez-vous?",
   "voice": "Rachel",
+  "model": "turbo",
   "language_code": "fr"
 }
 ```
-Advanced controls with continuity:
+Advanced voice controls with context (Multilingual model):
 ```json
 {
-  "text": "This is part two of our series.",
+  "text": "This is the second part of our conversation.",
   "voice": "Roger",
-  "stability": 0.9,
-  "similarity_boost": 0.8,
-  "previous_text": "This is part one of our series.",
-  "language_code": "en"
+  "model": "multilingual",
+  "stability": 0.8,
+  "similarity_boost": 0.9,
+  "previous_text": "This is the first part of our conversation.",
+  "next_text": "This is the third part of our conversation."
 }
 ```
-**Key Differences from Multilingual TTS:**
-- **Faster Generation**: Turbo 2.5 processes text 15-60 seconds (vs 30-120 seconds for multilingual)
-- **Language Enforcement**: Supports ISO 639-1 language codes for consistent language output
-- **Same Voice Options**: All 21 voices available
-- **Same Quality**: Maintains high audio quality with faster processing
+**Model Comparison:**
+- **Turbo 2.5** (default): Faster generation (15-60 seconds), supports language enforcement with `language_code`
+- **Multilingual v2**: Supports context with `previous_text`/`next_text`, generation takes 30-120 seconds
-**Note**: The `callBackUrl` is optional and will use the `KIE_AI_CALLBACK_URL` environment variable if not provided. Turbo 2.5 generation is faster and supports language enforcement.
+**Note**: The `callBackUrl` is optional and will use the `KIE_AI_CALLBACK_URL` environment variable if not provided. Choose Turbo model for speed and language enforcement, or Multilingual model for context-aware speech generation.
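Reviewer sketch (not part of the package): the merged `elevenlabs_tts` tool now ties some parameters to a specific model, so a caller may want to check arguments before submitting a task. The helper below is a minimal illustration of the rules listed in the parameter descriptions above. The names `TtsArgs` and `checkTtsArgs` are hypothetical, and whether the server rejects or silently ignores mismatched fields is not stated in this diff.

```typescript
// Hypothetical client-side check; mirrors the README parameter notes only.
type TtsModel = "turbo" | "multilingual";

interface TtsArgs {
  text: string;
  model?: TtsModel;
  previous_text?: string;
  next_text?: string;
  language_code?: string;
}

function checkTtsArgs(args: TtsArgs): string[] {
  const problems: string[] = [];
  // The README states that "turbo" is the default model.
  const model: TtsModel = args.model === undefined ? "turbo" : args.model;

  if (args.text.length > 5000) {
    problems.push("text exceeds 5000 characters");
  }
  // previous_text / next_text are documented as multilingual-only.
  if (model === "turbo" && (args.previous_text !== undefined || args.next_text !== undefined)) {
    problems.push("previous_text/next_text apply to the multilingual model only");
  }
  // language_code is documented as turbo-only.
  if (model === "multilingual" && args.language_code !== undefined) {
    problems.push("language_code applies to the turbo model only");
  }
  return problems;
}
```

An empty array means the arguments are consistent with the documented model split; each string otherwise names one mismatch.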
-### 11. `elevenlabs_ttsfx`
+### 10. `elevenlabs_ttsfx`
 Generate sound effects from text descriptions using ElevenLabs Sound Effects v2 model.
 **Parameters:**
@@ -564,6 +516,181 @@ Video with specific ending frame:
 **Note**: The `callBackUrl` is optional and will use the `KIE_AI_CALLBACK_URL` environment variable if not provided. Video generation typically takes 2-5 minutes depending on quality and complexity.
+### 13. `bytedance_seedream_image`
+Generate and edit images using ByteDance Seedream V4 models (unified tool for both text-to-image and image editing).
+**Parameters:**
+- `prompt` (string, required): Text prompt for image generation or editing (max 10000 chars)
+- `image_urls` (array, optional): Array of image URLs for editing mode (1-10 images, if not provided, uses text-to-image)
+- `image_size` (string, optional): Image aspect ratio (default: "1:1")
+  - Options: `1:1`, `4:3`, `3:4`, `16:9`, `9:16`, `21:9`, `9:21`, `3:2`, `2:3`
+- `image_resolution` (string, optional): Image resolution (default: "1K")
+  - `1K`: Standard resolution (1024px on shortest side)
+  - `2K`: High resolution (2048px on shortest side)
+  - `4K`: Ultra high resolution (4096px on shortest side)
+- `max_images` (integer, optional): Number of images to generate (1-6, default: 1)
+- `seed` (integer, optional): Random seed for reproducible results (default: -1 for random)
+- `callBackUrl` (string, optional): URL for task completion notifications
+**Examples:**
+Text-to-image generation:
+```json
+{
+  "prompt": "A majestic dragon perched atop a crystal mountain at sunset, digital art style",
+  "image_size": "16:9",
+  "image_resolution": "2K",
+  "max_images": 2,
+  "seed": 42
+}
+```
+Image editing:
+```json
+{
+  "prompt": "Transform the day scene into a magical night with glowing stars and moonlight",
+  "image_urls": ["https://example.com/day-landscape.jpg"],
+  "image_size": "16:9",
+  "image_resolution": "2K",
+  "max_images": 1
+}
+```
+Multiple image editing:
+```json
+{
+  "prompt": "Apply a consistent cyberpunk aesthetic to all images with neon lights and futuristic elements",
+  "image_urls": [
+    "https://example.com/character1.jpg",
+    "https://example.com/character2.jpg",
+    "https://example.com/background.jpg"
+  ],
+  "image_resolution": "4K",
+  "max_images": 3
+}
+```
+**Key Features:**
+- **Unified Interface**: Single tool for both text-to-image and image editing
+- **Smart Mode Detection**: Automatically detects mode based on presence of `image_urls`
+- **High Resolution**: Support for 1K, 2K, and 4K output
+- **Multiple Images**: Generate up to 6 images in a single request
+- **Batch Editing**: Edit up to 10 images simultaneously with consistent style
+- **Reproducible Results**: Seed control for consistent output
+**Note**: The `callBackUrl` is optional and will use the `KIE_AI_CALLBACK_URL` environment variable if not provided. Image generation typically takes 30-120 seconds depending on resolution and complexity.
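Reviewer sketch (not part of the package): the "Smart Mode Detection" described above can be pictured as a simple branch on `image_urls`. The function below is an assumption about how such a check might look, not the server's actual code. `SeedreamArgs` and `detectSeedreamMode` are hypothetical names, and treating an empty array as text-to-image is a guess that the README does not confirm.

```typescript
// Hypothetical illustration of mode selection based on image_urls.
interface SeedreamArgs {
  prompt: string;
  image_urls?: string[];
}

type SeedreamMode = "text-to-image" | "image-edit";

function detectSeedreamMode(args: SeedreamArgs): SeedreamMode {
  const urls = args.image_urls;
  // Assumption: a missing or empty list means plain text-to-image.
  if (urls === undefined || urls.length === 0) {
    return "text-to-image";
  }
  // The README documents 1-10 images for editing mode.
  if (urls.length > 10) {
    throw new Error("image_urls accepts between 1 and 10 images");
  }
  return "image-edit";
}
```

A single tool with an optional `image_urls` field keeps the request shape identical across both modes, which is presumably why the README calls it a unified interface.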
+### 14. `runway_aleph_video`
+Transform videos using Runway Aleph video-to-video generation with AI-powered editing.
+**Parameters:**
+- `prompt` (string, required): Text prompt describing desired video transformation (max 1000 chars)
+- `videoUrl` (string, required): URL of the input video to transform
+- `waterMark` (string, optional): Watermark text to add to the video (max 100 chars, default: "")
+- `uploadCn` (boolean, optional): Whether to upload to China servers (default: false)
+- `aspectRatio` (enum, optional): Output video aspect ratio (default: "16:9")
+  - Options: `16:9`, `9:16`, `4:3`, `3:4`, `1:1`, `21:9`
+- `seed` (integer, optional): Random seed for reproducible results (1-999999)
+- `referenceImage` (string, optional): URL of reference image for style guidance
+- `callBackUrl` (string, optional): URL for task completion notifications
+**Examples:**
+Basic video transformation:
+```json
+{
+  "prompt": "Transform this video into a cinematic anime style with vibrant colors",
+  "videoUrl": "https://example.com/input-video.mp4",
+  "aspectRatio": "16:9"
+}
+```
+Advanced transformation with reference image:
+```json
+{
+  "prompt": "Apply the artistic style of the reference image to this video",
+  "videoUrl": "https://example.com/cooking-video.mp4",
+  "referenceImage": "https://example.com/van-gogh-painting.jpg",
+  "seed": 123456,
+  "waterMark": "My Channel"
+}
+```
+Vertical video for social media:
+```json
+{
+  "prompt": "Convert to a dreamy, ethereal style with soft lighting",
+  "videoUrl": "https://example.com/landscape-video.mp4",
+  "aspectRatio": "9:16",
+  "uploadCn": false
+}
+```
+**Key Features:**
+- **Video-to-Video Transformation**: Transform existing videos with AI-powered editing
+- **Style Transfer**: Apply artistic styles from text prompts or reference images
+- **Aspect Ratio Control**: Convert between horizontal, vertical, and square formats
+- **Reproducible Results**: Seed control for consistent transformations
+- **Watermark Support**: Add custom watermarks to transformed videos
+- **Reference Guidance**: Use reference images to guide the transformation style
+**Note**: The `callBackUrl` is optional and will use the `KIE_AI_CALLBACK_URL` environment variable if not provided. Video-to-video transformation typically takes 3-8 minutes depending on complexity and length.
+### 15. `wan_video`
+Generate videos using Alibaba Wan 2.5 models (unified tool for both text-to-video and image-to-video).
+**Parameters:**
+- `prompt` (string, required): Text prompt for video generation (max 800 chars)
+- `image_url` (string, optional): URL of input image for image-to-video generation (if not provided, uses text-to-video)
+- `aspect_ratio` (string, optional): Video aspect ratio for text-to-video (default: "16:9")
+  - Options: `16:9`, `9:16`, `1:1`
+- `resolution` (string, optional): Video resolution (default: "1080p")
+  - `720p`: Faster generation
+  - `1080p`: Higher quality
+- `duration` (string, optional): Video duration for image-to-video (default: "5")
+  - Options: `5`, `10` seconds
+- `negative_prompt` (string, optional): Negative prompt to describe content to avoid (max 500 chars, default: "")
+- `enable_prompt_expansion` (boolean, optional): Enable prompt rewriting using LLM (default: true)
+- `seed` (integer, optional): Random seed for reproducible results
+- `callBackUrl` (string, optional): URL for task completion notifications
+**Examples:**
+Text-to-video generation:
+```json
+{
+  "prompt": "A dimly lit jazz bar at night, wooden tables glowing under warm pendant lights. Patrons sip drinks and chat quietly while a three-piece band performs on stage. The saxophone player stands under a spotlight, gleaming instrument reflecting the light. No dialogue. Ambient audio: smooth live jazz music with saxophone and piano, clinking glasses, low murmur of audience conversations.",
+  "aspect_ratio": "16:9",
+  "resolution": "1080p",
+  "enable_prompt_expansion": true,
+  "seed": 42
+}
+```
+Image-to-video generation:
+```json
+{
+  "prompt": "The same woman from the reference image looks directly into the camera, takes a breath, then smiles brightly and speaks with enthusiasm: 'Have you heard? Alibaba Wan 2.5 API is now available on Kie.ai!'",
+  "image_url": "https://example.com/portrait.jpg",
+  "duration": "5",
+  "resolution": "1080p",
+  "negative_prompt": "blurry, low quality",
+  "seed": 123
+}
+```
+**Key Features:**
+- **Unified Interface**: Single tool for both text-to-video and image-to-video
+- **Smart Mode Detection**: Automatically detects mode based on presence of `image_url`
+- **Prompt Expansion**: LLM-powered prompt rewriting for better results with short prompts
+- **Flexible Resolutions**: 720p for speed, 1080p for quality
+- **Aspect Ratio Control**: Support for horizontal, vertical, and square formats (text-to-video)
+- **Duration Control**: 5 or 10 second options for image-to-video
+- **Negative Prompts**: Fine-tune results by specifying what to avoid
+- **Reproducible Results**: Seed control for consistent output
+**Note**: The `callBackUrl` is optional and will use the `KIE_AI_CALLBACK_URL` environment variable if not provided. Video generation typically takes 2-6 minutes depending on resolution and complexity.
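Reviewer sketch (not part of the package): `wan_video` documents `aspect_ratio` as a text-to-video setting and `duration` as an image-to-video setting, with the mode chosen by the presence of `image_url`. The helper below illustrates that split under stated assumptions. `WanArgs`, `WanPlan`, and `planWanRequest` are hypothetical names, and the idea that the off-mode field is simply not applicable (rather than rejected) is an assumption this diff does not confirm.

```typescript
// Hypothetical illustration of mode selection for wan_video.
interface WanArgs {
  prompt: string;
  image_url?: string;
  aspect_ratio?: "16:9" | "9:16" | "1:1";
  duration?: "5" | "10";
}

interface WanPlan {
  mode: "text-to-video" | "image-to-video";
  // Fields supplied by the caller that the README ties to the other mode.
  notApplicable: string[];
}

function planWanRequest(args: WanArgs): WanPlan {
  const notApplicable: string[] = [];
  if (args.image_url !== undefined) {
    // aspect_ratio is documented for text-to-video only.
    if (args.aspect_ratio !== undefined) {
      notApplicable.push("aspect_ratio");
    }
    return { mode: "image-to-video", notApplicable: notApplicable };
  }
  // duration is documented for image-to-video only.
  if (args.duration !== undefined) {
    notApplicable.push("duration");
  }
  return { mode: "text-to-video", notApplicable: notApplicable };
}
```

Surfacing the off-mode fields, instead of dropping them silently, gives a caller a chance to notice a request that mixes settings from both modes.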
 ## Why Developers Choose Kie.ai Over Alternatives
 ### 💸 **Better Value Than Fal.ai**
@@ -834,7 +961,7 @@ nano_banana_generate: "Modern minimalist app icon for fitness tracker"
 bytedance_seedance_video: "Screen recording showing app features, clean interface"
 # Add narration
-elevenlabs_tts_turbo: "Tap here to get started with your new profile"
+elevenlabs_tts: "Tap here to get started with your new profile"
 ```
 ### 🏢 **Enterprise Applications**