@runpod/ai-sdk-provider 1.1.0 → 1.3.0

package/CHANGELOG.md CHANGED
@@ -1,5 +1,21 @@
  # @runpod/ai-sdk-provider
 
+ ## 1.3.0
+
+ ### Minor Changes
+
+ - 973fae6: Add support for the Tongyi-MAI Z-Image Turbo image model with validated sizes and aspect ratios.
+
+ ## 1.2.0
+
+ ### Minor Changes
+
+ - cf0c976: Add transcription model support with `pruna/whisper-v3-large`
+   - Add `transcriptionModel()` and `transcription()` methods to the provider
+   - Support audio transcription via RunPod's Whisper endpoint
+   - Accept audio as `Uint8Array`, base64 string, or URL via providerOptions
+   - Return transcription text, segments with timing, detected language, and duration
+
  ## 1.1.0
 
  ### Minor Changes
package/README.md CHANGED
@@ -278,22 +278,23 @@ Check out our [examples](https://github.com/runpod/examples/tree/main/ai-sdk/get
 
  ### Supported Models
 
- | Model ID                               | Type | Resolution      | Aspect Ratios                             |
- | -------------------------------------- | ---- | --------------- | ----------------------------------------- |
- | `alibaba/wan-2.6`                      | t2i  | 1024x1024       | 1:1, 4:3, 3:4                             |
- | `pruna/p-image-t2i`                    | t2i  | up to 1440x1440 | 1:1, 16:9, 9:16, 4:3, 3:4, 3:2, 2:3       |
- | `pruna/p-image-edit`                   | edit | up to 1440x1440 | 1:1, 16:9, 9:16, 4:3, 3:4, 3:2, 2:3       |
- | `google/nano-banana-edit`              | edit | up to 4096x4096 | 1:1, 4:3, 3:4                             |
- | `google/nano-banana-pro-edit`          | edit | 1k, 2k, 4k      | 1:1, 16:9, 9:16, 4:3, 3:4, 3:2, 2:3, 21:9 |
- | `bytedance/seedream-3.0`               | t2i  | up to 4096x4096 | 1:1, 4:3, 3:4                             |
- | `bytedance/seedream-4.0`               | t2i  | up to 4096x4096 | 1:1, 4:3, 3:4                             |
- | `bytedance/seedream-4.0-edit`          | edit | up to 4096x4096 | uses size                                 |
- | `qwen/qwen-image`                      | t2i  | up to 4096x4096 | 1:1, 4:3, 3:4                             |
- | `qwen/qwen-image-edit`                 | edit | up to 4096x4096 | 1:1, 4:3, 3:4                             |
- | `qwen/qwen-image-edit-2511`            | edit | up to 1536x1536 | 1:1, 4:3, 3:4                             |
- | `black-forest-labs/flux-1-schnell`     | t2i  | up to 2048x2048 | 1:1, 4:3, 3:4                             |
- | `black-forest-labs/flux-1-dev`         | t2i  | up to 2048x2048 | 1:1, 4:3, 3:4                             |
- | `black-forest-labs/flux-1-kontext-dev` | edit | up to 2048x2048 | 1:1, 4:3, 3:4                             |
+ | Model ID                               | Type | Resolution        | Aspect Ratios                                   |
+ | -------------------------------------- | ---- | ----------------- | ----------------------------------------------- |
+ | `alibaba/wan-2.6`                      | t2i  | 768x768–1280x1280 | 1:1, 16:9, 9:16, 4:3, 3:4, 3:2, 2:3, 21:9, 9:21 |
+ | `pruna/p-image-t2i`                    | t2i  | up to 1440x1440   | 1:1, 16:9, 9:16, 4:3, 3:4, 3:2, 2:3             |
+ | `pruna/p-image-edit`                   | edit | up to 1440x1440   | 1:1, 16:9, 9:16, 4:3, 3:4, 3:2, 2:3             |
+ | `google/nano-banana-edit`              | edit | up to 4096x4096   | 1:1, 4:3, 3:4                                   |
+ | `google/nano-banana-pro-edit`          | edit | 1k, 2k, 4k        | 1:1, 16:9, 9:16, 4:3, 3:4, 3:2, 2:3, 21:9       |
+ | `bytedance/seedream-3.0`               | t2i  | up to 4096x4096   | 1:1, 4:3, 3:4                                   |
+ | `bytedance/seedream-4.0`               | t2i  | up to 4096x4096   | 1:1, 4:3, 3:4                                   |
+ | `bytedance/seedream-4.0-edit`          | edit | up to 4096x4096   | uses size                                       |
+ | `qwen/qwen-image`                      | t2i  | up to 4096x4096   | 1:1, 4:3, 3:4                                   |
+ | `qwen/qwen-image-edit`                 | edit | up to 4096x4096   | 1:1, 4:3, 3:4                                   |
+ | `qwen/qwen-image-edit-2511`            | edit | up to 1536x1536   | 1:1, 4:3, 3:4                                   |
+ | `tongyi-mai/z-image-turbo`             | t2i  | up to 1536x1536   | 1:1, 4:3, 3:4, 3:2, 2:3, 16:9, 9:16             |
+ | `black-forest-labs/flux-1-schnell`     | t2i  | up to 2048x2048   | 1:1, 4:3, 3:4                                   |
+ | `black-forest-labs/flux-1-dev`         | t2i  | up to 2048x2048   | 1:1, 4:3, 3:4                                   |
+ | `black-forest-labs/flux-1-kontext-dev` | edit | up to 2048x2048   | 1:1, 4:3, 3:4                                   |
 
  For the full list of models, see the [Runpod Public Endpoint Reference](https://docs.runpod.io/hub/public-endpoint-reference).
 
@@ -355,6 +356,38 @@ const { image } = await generateImage({
  });
  ```
 
+ #### Alibaba (WAN 2.6)
+
+ Text-to-image model with flexible resolution support.
+
+ **Resolution constraints:**
+
+ - Total pixels: 589,824 (768x768) to 1,638,400 (1280x1280)
+ - Aspect ratio: 1:4 to 4:1
+ - Default: 1280x1280
+
+ **Recommended resolutions for common aspect ratios:**
+
+ | Aspect Ratio | Resolution |
+ | :----------- | :--------- |
+ | 1:1          | 1280x1280  |
+ | 2:3          | 800x1200   |
+ | 3:2          | 1200x800   |
+ | 3:4          | 960x1280   |
+ | 4:3          | 1280x960   |
+ | 9:16         | 720x1280   |
+ | 16:9         | 1280x720   |
+ | 21:9         | 1344x576   |
+ | 9:21         | 576x1344   |
+
+ ```ts
+ const { image } = await generateImage({
+   model: runpod.image('alibaba/wan-2.6'),
+   prompt: 'A serene mountain landscape at dawn',
+   aspectRatio: '16:9',
+ });
+ ```
+
  #### Google (Nano Banana Pro)
 
  | Option | Values |
@@ -403,6 +436,14 @@ const { image } = await generateImage({
  });
  ```
 
+ #### Tongyi-MAI (Z-Image Turbo)
+
+ Supported model: `tongyi-mai/z-image-turbo`
+
+ - Supported sizes (validated by the provider): 512x512, 768x768, 1024x1024, 1280x1280, 1536x1536, 512x768, 768x512, 1024x768, 768x1024, 1328x1328, 1472x1140, 1140x1472, 768x432, 1024x576, 1280x720, 1536x864, 432x768, 576x1024, 720x1280, 864x1536
+ - Supported `aspectRatio` values: 1:1, 4:3, 3:4, 3:2, 2:3, 16:9, 9:16 (each maps to one of the sizes above; use `size` for exact dimensions)
+ - Additional parameters: `strength`, `output_format`, `enable_safety_checker`, `seed`
+
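+ A minimal sketch (assuming the extra parameters above are passed through `providerOptions.runpod`, as with the other model families in this README):
+
+ ```ts
+ import { runpod } from '@runpod/ai-sdk-provider';
+ import { experimental_generateImage as generateImage } from 'ai';
+
+ const { image } = await generateImage({
+   model: runpod.image('tongyi-mai/z-image-turbo'),
+   prompt: 'A neon-lit street market at night',
+   size: '1280x720', // one of the validated sizes listed above
+   providerOptions: {
+     runpod: {
+       seed: 42,
+       output_format: 'png', // assumed value; the accepted formats are not documented here
+     },
+   },
+ });
+ ```
+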
  ## Speech Models
 
  Generate speech using the AI SDK's `generateSpeech` and `runpod.speech(...)`:
@@ -533,6 +574,185 @@ const result = await generateSpeech({
  });
  ```
 
+ ## Transcription Models
+
+ Transcribe audio using the AI SDK's `experimental_transcribe` and `runpod.transcription(...)`:
+
+ ```ts
+ import { runpod } from '@runpod/ai-sdk-provider';
+ import { experimental_transcribe as transcribe } from 'ai';
+
+ const result = await transcribe({
+   model: runpod.transcription('pruna/whisper-v3-large'),
+   audio: new URL('https://image.runpod.ai/demo/transcription-demo.wav'),
+ });
+
+ console.log(result.text);
+ ```
+
+ **Returns:**
+
+ - `result.text` - Full transcription text
+ - `result.segments` - Array of segments with timing info
+   - `segment.text` - Segment text
+   - `segment.startSecond` - Start time in seconds
+   - `segment.endSecond` - End time in seconds
+ - `result.language` - Detected language code
+ - `result.durationInSeconds` - Audio duration
+ - `result.warnings` - Array of any warnings
+ - `result.providerMetadata.runpod.jobId` - Runpod job ID
+
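+ For example, the segments can be used to print a timestamped transcript (a short sketch; `result` comes from the `transcribe` call above):
+
+ ```ts
+ for (const segment of result.segments) {
+   // startSecond/endSecond are the timing fields described above
+   console.log(`[${segment.startSecond.toFixed(1)}s -> ${segment.endSecond.toFixed(1)}s] ${segment.text}`);
+ }
+ ```
+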
+ ### Audio Input
+
+ You can provide audio in several ways:
+
+ ```ts
+ import { readFileSync } from 'fs';
+
+ // URL (recommended for large files)
+ const fromUrl = await transcribe({
+   model: runpod.transcription('pruna/whisper-v3-large'),
+   audio: new URL('https://image.runpod.ai/demo/transcription-demo.wav'),
+ });
+
+ // Local file as Uint8Array
+ const audioData = readFileSync('./audio.wav');
+
+ const fromFile = await transcribe({
+   model: runpod.transcription('pruna/whisper-v3-large'),
+   audio: audioData,
+ });
+ ```
+
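+ Base64-encoded audio strings are also accepted (per the 1.2.0 changelog entry). A sketch, assuming the AI SDK's standard handling of base64 data in the `audio` argument:
+
+ ```ts
+ import { readFileSync } from 'fs';
+
+ // Encode the file as base64 and pass it as a plain string
+ const base64Audio = readFileSync('./audio.wav').toString('base64');
+
+ const result = await transcribe({
+   model: runpod.transcription('pruna/whisper-v3-large'),
+   audio: base64Audio,
+ });
+ ```
+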
+ ### Examples
+
+ Check out our [examples](https://github.com/runpod/examples/tree/main/ai-sdk/getting-started) for more code snippets on how to use all the different models.
+
+ ### Supported Models
+
+ - `pruna/whisper-v3-large`
+
+ ### Provider Options
+
+ Use `providerOptions.runpod` for model-specific parameters:
+
+ | Option               | Type      | Default | Description                                    |
+ | -------------------- | --------- | ------- | ---------------------------------------------- |
+ | `audio`              | `string`  | -       | URL to audio file (alternative to binary data) |
+ | `prompt`             | `string`  | -       | Context prompt to guide transcription          |
+ | `language`           | `string`  | Auto    | ISO-639-1 language code (e.g., 'en', 'es')     |
+ | `word_timestamps`    | `boolean` | `false` | Include word-level timestamps                  |
+ | `translate`          | `boolean` | `false` | Translate audio to English                     |
+ | `enable_vad`         | `boolean` | `false` | Enable voice activity detection                |
+ | `maxPollAttempts`    | `number`  | `120`   | Max polling attempts                           |
+ | `pollIntervalMillis` | `number`  | `2000`  | Polling interval (ms)                          |
+
+ **Example (providerOptions):**
+
+ ```ts
+ const result = await transcribe({
+   model: runpod.transcription('pruna/whisper-v3-large'),
+   audio: new URL('https://image.runpod.ai/demo/transcription-demo.wav'),
+   providerOptions: {
+     runpod: {
+       language: 'en',
+       prompt: 'This is a demo of audio transcription',
+       word_timestamps: true,
+     },
+   },
+ });
+ ```
+
+ ## Video Models
+
+ Generate videos using the AI SDK's `experimental_generateVideo` and `runpod.video(...)`:
+
+ ```ts
+ import { runpod } from '@runpod/ai-sdk-provider';
+ import { experimental_generateVideo as generateVideo } from 'ai';
+
+ // Text-to-video
+ const result = await generateVideo({
+   model: runpod.video('alibaba/wan-2.6-t2v'),
+   prompt: 'A golden retriever running on a sunny beach, cinematic, 4k',
+ });
+
+ console.log(result.video.url);
+ ```
+
+ ```ts
+ // Image-to-video
+ const result = await generateVideo({
+   model: runpod.video('alibaba/wan-2.6-i2v'),
+   prompt: 'Animate this scene with gentle camera movement',
+   image: new URL('https://example.com/image.png'),
+ });
+
+ console.log(result.video.url);
+ ```
+
+ **Returns:**
+
+ - `result.video` - Generated video (`{ type: 'url', url, mediaType: 'video/mp4' }`)
+ - `result.warnings` - Array of any warnings
+ - `result.providerMetadata.runpod.jobId` - Runpod job ID
+
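+ Because the video is returned as a URL, it can be downloaded and saved locally. A generic sketch (not provider-specific), using Node's global `fetch` and `fs/promises`:
+
+ ```ts
+ import { writeFile } from 'fs/promises';
+
+ // Fetch the generated video and write it to disk
+ const response = await fetch(result.video.url);
+ await writeFile('output.mp4', Buffer.from(await response.arrayBuffer()));
+ ```
+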
+ ### Examples
+
+ Check out our [examples](https://github.com/runpod/examples/tree/main/ai-sdk/getting-started) for more code snippets on how to use all the different models.
+
+ ### Supported Models
+
+ | Model ID                                | Type        | Company             |
+ | --------------------------------------- | ----------- | ------------------- |
+ | `pruna/p-video`                         | t2v         | Pruna AI            |
+ | `vidu/q3-t2v`                           | t2v         | Shengshu Technology |
+ | `vidu/q3-i2v`                           | i2v         | Shengshu Technology |
+ | `kwaivgi/kling-v2.6-std-motion-control` | i2v + video | KwaiVGI (Kuaishou)  |
+ | `kwaivgi/kling-video-o1-r2v`            | i2v         | KwaiVGI (Kuaishou)  |
+ | `kwaivgi/kling-v2.1-i2v-pro`            | i2v         | KwaiVGI (Kuaishou)  |
+ | `alibaba/wan-2.6-t2v`                   | t2v         | Alibaba             |
+ | `alibaba/wan-2.6-i2v`                   | i2v         | Alibaba             |
+ | `alibaba/wan-2.5`                       | i2v         | Alibaba             |
+ | `alibaba/wan-2.2-t2v-720-lora`          | t2v         | Alibaba             |
+ | `alibaba/wan-2.2-i2v-720`               | i2v         | Alibaba             |
+ | `alibaba/wan-2.1-i2v-720`               | i2v         | Alibaba             |
+ | `bytedance/seedance-v1.5-pro-i2v`       | i2v         | ByteDance           |
+ | `openai/sora-2-pro-i2v`                 | i2v         | OpenAI              |
+ | `openai/sora-2-i2v`                     | i2v         | OpenAI              |
+
+ ### Provider Options
+
+ Use `providerOptions.runpod` for model-specific parameters:
+
+ | Option                | Type     | Default | Description                           |
+ | --------------------- | -------- | ------- | ------------------------------------- |
+ | `negative_prompt`     | `string` | -       | What to avoid in the generated video  |
+ | `guidance_scale`      | `number` | -       | Guidance scale for prompt adherence   |
+ | `num_inference_steps` | `number` | -       | Number of inference steps             |
+ | `style`               | `string` | -       | Style preset (model-specific)         |
+ | `maxPollAttempts`     | `number` | `120`   | Max polling attempts                  |
+ | `pollIntervalMillis`  | `number` | `5000`  | Polling interval (ms)                 |
+
+ Any additional model-specific parameters can be passed through `providerOptions.runpod` and will be forwarded to the API.
+
+ **Example (providerOptions):**
+
+ ```ts
+ const result = await generateVideo({
+   model: runpod.video('alibaba/wan-2.6-t2v'),
+   prompt: 'A serene mountain landscape with flowing water',
+   duration: 5,
+   aspectRatio: '16:9',
+   seed: 42,
+   providerOptions: {
+     runpod: {
+       negative_prompt: 'blurry, low quality',
+       guidance_scale: 7.5,
+     },
+   },
+ });
+ ```
+
  ## About Runpod
 
  [Runpod](https://runpod.io) is the foundation for developers to build, deploy, and scale custom AI systems.
package/dist/index.d.mts CHANGED
@@ -1,4 +1,4 @@
- import { LanguageModelV3, ImageModelV3, SpeechModelV3 } from '@ai-sdk/provider';
+ import { LanguageModelV3, ImageModelV3, SpeechModelV3, TranscriptionModelV3, Experimental_VideoModelV3 } from '@ai-sdk/provider';
  import { FetchFunction } from '@ai-sdk/provider-utils';
  export { OpenAICompatibleErrorData as RunpodErrorData } from '@ai-sdk/openai-compatible';
  import { z } from 'zod';
@@ -56,6 +56,22 @@ interface RunpodProvider {
    Creates a speech model for speech generation.
    */
    speech(modelId: string): SpeechModelV3;
+   /**
+   Creates a transcription model for audio transcription.
+   */
+   transcriptionModel(modelId: string): TranscriptionModelV3;
+   /**
+   Creates a transcription model for audio transcription.
+   */
+   transcription(modelId: string): TranscriptionModelV3;
+   /**
+   Creates a video model for video generation.
+   */
+   videoModel(modelId: string): Experimental_VideoModelV3;
+   /**
+   Creates a video model for video generation.
+   */
+   video(modelId: string): Experimental_VideoModelV3;
  }
  declare function createRunpod(options?: RunpodProviderSettings): RunpodProvider;
  declare const runpod: RunpodProvider;
@@ -64,7 +80,101 @@ type RunpodChatModelId = 'qwen/qwen3-32b-awq' | (string & {});
 
  type RunpodCompletionModelId = 'qwen/qwen3-32b-awq' | (string & {});
 
- type RunpodImageModelId = 'qwen/qwen-image' | 'qwen/qwen-image-edit' | 'qwen/qwen-image-edit-2511' | 'bytedance/seedream-3.0' | 'bytedance/seedream-4.0' | 'bytedance/seedream-4.0-edit' | 'black-forest-labs/flux-1-kontext-dev' | 'black-forest-labs/flux-1-schnell' | 'black-forest-labs/flux-1-dev' | 'alibaba/wan-2.6' | 'google/nano-banana-edit' | 'nano-banana-edit';
+ type RunpodImageModelId = 'qwen/qwen-image' | 'qwen/qwen-image-edit' | 'qwen/qwen-image-edit-2511' | 'bytedance/seedream-3.0' | 'bytedance/seedream-4.0' | 'bytedance/seedream-4.0-edit' | 'black-forest-labs/flux-1-kontext-dev' | 'black-forest-labs/flux-1-schnell' | 'black-forest-labs/flux-1-dev' | 'alibaba/wan-2.6' | 'tongyi-mai/z-image-turbo' | 'google/nano-banana-edit' | 'nano-banana-edit';
+
+ type RunpodTranscriptionModelId = 'pruna/whisper-v3-large' | (string & {});
+ interface RunpodTranscriptionProviderOptions {
+   /**
+    * URL to audio file. Use this if you want to pass an audio URL directly
+    * instead of binary audio data.
+    */
+   audio?: string;
+   /**
+    * Optional context prompt to guide the transcription (initial_prompt in Whisper).
+    */
+   prompt?: string;
+   /**
+    * Alias for prompt - the initial prompt for the first window.
+    */
+   initial_prompt?: string;
+   /**
+    * Language of the audio in ISO-639-1 format (e.g., 'en', 'es', 'fr').
+    * If not specified, Whisper will auto-detect the language.
+    */
+   language?: string;
+   /**
+    * Whether to include word-level timestamps in the response.
+    * @default false
+    */
+   word_timestamps?: boolean;
+   /**
+    * Whisper model to use.
+    * Options: 'tiny', 'base', 'small', 'medium', 'large-v1', 'large-v2', 'large-v3', 'turbo'
+    * @default 'base'
+    */
+   model?: string;
+   /**
+    * Output format for transcription.
+    * Options: 'plain_text', 'formatted_text', 'srt', 'vtt'
+    * @default 'plain_text'
+    */
+   transcription?: string;
+   /**
+    * Whether to translate the audio to English.
+    * @default false
+    */
+   translate?: boolean;
+   /**
+    * Whether to enable voice activity detection.
+    * @default false
+    */
+   enable_vad?: boolean;
+   /**
+    * Maximum number of polling attempts before timing out.
+    * @default 120
+    */
+   maxPollAttempts?: number;
+   /**
+    * Interval between polling attempts in milliseconds.
+    * @default 2000
+    */
+   pollIntervalMillis?: number;
+ }
+
+ type RunpodVideoModelId = 'pruna/p-video' | 'vidu/q3-t2v' | 'vidu/q3-i2v' | 'kwaivgi/kling-v2.6-std-motion-control' | 'kwaivgi/kling-video-o1-r2v' | 'kwaivgi/kling-v2.1-i2v-pro' | 'alibaba/wan-2.6-t2v' | 'alibaba/wan-2.6-i2v' | 'alibaba/wan-2.5' | 'alibaba/wan-2.2-t2v-720-lora' | 'alibaba/wan-2.2-i2v-720' | 'alibaba/wan-2.1-i2v-720' | 'bytedance/seedance-v1.5-pro-i2v' | 'openai/sora-2-pro-i2v' | 'openai/sora-2-i2v' | (string & {});
+ interface RunpodVideoProviderOptions {
+   /**
+    * Negative prompt to guide what to avoid in the generated video.
+    */
+   negative_prompt?: string;
+   /**
+    * Style preset for video generation (model-specific).
+    */
+   style?: string;
+   /**
+    * Guidance scale for prompt adherence.
+    */
+   guidance_scale?: number;
+   /**
+    * Number of inference steps.
+    */
+   num_inference_steps?: number;
+   /**
+    * Maximum number of polling attempts before timing out.
+    * @default 120
+    */
+   maxPollAttempts?: number;
+   /**
+    * Interval between polling attempts in milliseconds.
+    * @default 5000
+    */
+   pollIntervalMillis?: number;
+   /**
+    * Additional model-specific parameters are passed through via
+    * index signature.
+    */
+   [key: string]: unknown;
+ }
 
  declare const runpodImageErrorSchema: z.ZodObject<{
    error: z.ZodOptional<z.ZodString>;
@@ -78,4 +188,4 @@ declare const runpodImageErrorSchema: z.ZodObject<{
  }>;
  type RunpodImageErrorData = z.infer<typeof runpodImageErrorSchema>;
 
- export { type RunpodChatModelId, type RunpodCompletionModelId, type RunpodImageErrorData, type RunpodImageModelId, type RunpodProvider, type RunpodProviderSettings, createRunpod, runpod };
+ export { type RunpodChatModelId, type RunpodCompletionModelId, type RunpodImageErrorData, type RunpodImageModelId, type RunpodProvider, type RunpodProviderSettings, type RunpodTranscriptionModelId, type RunpodTranscriptionProviderOptions, type RunpodVideoModelId, type RunpodVideoProviderOptions, createRunpod, runpod };
package/dist/index.d.ts CHANGED
@@ -1,4 +1,4 @@
- import { LanguageModelV3, ImageModelV3, SpeechModelV3 } from '@ai-sdk/provider';
+ import { LanguageModelV3, ImageModelV3, SpeechModelV3, TranscriptionModelV3, Experimental_VideoModelV3 } from '@ai-sdk/provider';
  import { FetchFunction } from '@ai-sdk/provider-utils';
  export { OpenAICompatibleErrorData as RunpodErrorData } from '@ai-sdk/openai-compatible';
  import { z } from 'zod';
@@ -56,6 +56,22 @@ interface RunpodProvider {
    Creates a speech model for speech generation.
    */
    speech(modelId: string): SpeechModelV3;
+   /**
+   Creates a transcription model for audio transcription.
+   */
+   transcriptionModel(modelId: string): TranscriptionModelV3;
+   /**
+   Creates a transcription model for audio transcription.
+   */
+   transcription(modelId: string): TranscriptionModelV3;
+   /**
+   Creates a video model for video generation.
+   */
+   videoModel(modelId: string): Experimental_VideoModelV3;
+   /**
+   Creates a video model for video generation.
+   */
+   video(modelId: string): Experimental_VideoModelV3;
  }
  declare function createRunpod(options?: RunpodProviderSettings): RunpodProvider;
  declare const runpod: RunpodProvider;
@@ -64,7 +80,101 @@ type RunpodChatModelId = 'qwen/qwen3-32b-awq' | (string & {});
 
  type RunpodCompletionModelId = 'qwen/qwen3-32b-awq' | (string & {});
 
- type RunpodImageModelId = 'qwen/qwen-image' | 'qwen/qwen-image-edit' | 'qwen/qwen-image-edit-2511' | 'bytedance/seedream-3.0' | 'bytedance/seedream-4.0' | 'bytedance/seedream-4.0-edit' | 'black-forest-labs/flux-1-kontext-dev' | 'black-forest-labs/flux-1-schnell' | 'black-forest-labs/flux-1-dev' | 'alibaba/wan-2.6' | 'google/nano-banana-edit' | 'nano-banana-edit';
+ type RunpodImageModelId = 'qwen/qwen-image' | 'qwen/qwen-image-edit' | 'qwen/qwen-image-edit-2511' | 'bytedance/seedream-3.0' | 'bytedance/seedream-4.0' | 'bytedance/seedream-4.0-edit' | 'black-forest-labs/flux-1-kontext-dev' | 'black-forest-labs/flux-1-schnell' | 'black-forest-labs/flux-1-dev' | 'alibaba/wan-2.6' | 'tongyi-mai/z-image-turbo' | 'google/nano-banana-edit' | 'nano-banana-edit';
+
+ type RunpodTranscriptionModelId = 'pruna/whisper-v3-large' | (string & {});
+ interface RunpodTranscriptionProviderOptions {
+   /**
+    * URL to audio file. Use this if you want to pass an audio URL directly
+    * instead of binary audio data.
+    */
+   audio?: string;
+   /**
+    * Optional context prompt to guide the transcription (initial_prompt in Whisper).
+    */
+   prompt?: string;
+   /**
+    * Alias for prompt - the initial prompt for the first window.
+    */
+   initial_prompt?: string;
+   /**
+    * Language of the audio in ISO-639-1 format (e.g., 'en', 'es', 'fr').
+    * If not specified, Whisper will auto-detect the language.
+    */
+   language?: string;
+   /**
+    * Whether to include word-level timestamps in the response.
+    * @default false
+    */
+   word_timestamps?: boolean;
+   /**
+    * Whisper model to use.
+    * Options: 'tiny', 'base', 'small', 'medium', 'large-v1', 'large-v2', 'large-v3', 'turbo'
+    * @default 'base'
+    */
+   model?: string;
+   /**
+    * Output format for transcription.
+    * Options: 'plain_text', 'formatted_text', 'srt', 'vtt'
+    * @default 'plain_text'
+    */
+   transcription?: string;
+   /**
+    * Whether to translate the audio to English.
+    * @default false
+    */
+   translate?: boolean;
+   /**
+    * Whether to enable voice activity detection.
+    * @default false
+    */
+   enable_vad?: boolean;
+   /**
+    * Maximum number of polling attempts before timing out.
+    * @default 120
+    */
+   maxPollAttempts?: number;
+   /**
+    * Interval between polling attempts in milliseconds.
+    * @default 2000
+    */
+   pollIntervalMillis?: number;
+ }
+
+ type RunpodVideoModelId = 'pruna/p-video' | 'vidu/q3-t2v' | 'vidu/q3-i2v' | 'kwaivgi/kling-v2.6-std-motion-control' | 'kwaivgi/kling-video-o1-r2v' | 'kwaivgi/kling-v2.1-i2v-pro' | 'alibaba/wan-2.6-t2v' | 'alibaba/wan-2.6-i2v' | 'alibaba/wan-2.5' | 'alibaba/wan-2.2-t2v-720-lora' | 'alibaba/wan-2.2-i2v-720' | 'alibaba/wan-2.1-i2v-720' | 'bytedance/seedance-v1.5-pro-i2v' | 'openai/sora-2-pro-i2v' | 'openai/sora-2-i2v' | (string & {});
+ interface RunpodVideoProviderOptions {
+   /**
+    * Negative prompt to guide what to avoid in the generated video.
+    */
+   negative_prompt?: string;
+   /**
+    * Style preset for video generation (model-specific).
+    */
+   style?: string;
+   /**
+    * Guidance scale for prompt adherence.
+    */
+   guidance_scale?: number;
+   /**
+    * Number of inference steps.
+    */
+   num_inference_steps?: number;
+   /**
+    * Maximum number of polling attempts before timing out.
+    * @default 120
+    */
+   maxPollAttempts?: number;
+   /**
+    * Interval between polling attempts in milliseconds.
+    * @default 5000
+    */
+   pollIntervalMillis?: number;
+   /**
+    * Additional model-specific parameters are passed through via
+    * index signature.
+    */
+   [key: string]: unknown;
+ }
 
  declare const runpodImageErrorSchema: z.ZodObject<{
    error: z.ZodOptional<z.ZodString>;
@@ -78,4 +188,4 @@ declare const runpodImageErrorSchema: z.ZodObject<{
  }>;
  type RunpodImageErrorData = z.infer<typeof runpodImageErrorSchema>;
 
- export { type RunpodChatModelId, type RunpodCompletionModelId, type RunpodImageErrorData, type RunpodImageModelId, type RunpodProvider, type RunpodProviderSettings, createRunpod, runpod };
+ export { type RunpodChatModelId, type RunpodCompletionModelId, type RunpodImageErrorData, type RunpodImageModelId, type RunpodProvider, type RunpodProviderSettings, type RunpodTranscriptionModelId, type RunpodTranscriptionProviderOptions, type RunpodVideoModelId, type RunpodVideoProviderOptions, createRunpod, runpod };