getaiapi 1.3.1 → 2.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,12 +1,12 @@
  # getaiapi
 
- **One function to call any AI model.**
+ **Typed AI provider SDKs. One import per provider.**
 
  [![npm version](https://img.shields.io/npm/v/getaiapi)](https://www.npmjs.com/package/getaiapi)
  [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
  [![TypeScript](https://img.shields.io/badge/TypeScript-strict-blue.svg)](https://www.typescriptlang.org/)
 
- A unified TypeScript library that wraps 1,890+ AI models across 5 providers into a single `generate()` function. One input shape. One output shape. Any model.
+ Each AI provider gets a typed namespace with one function per model. No generic `generate()`, no model strings, no mapping layers. What you type is what gets sent.
 
  ## Install
 
@@ -14,639 +14,652 @@ A unified TypeScript library that wraps 1,890+ AI models across 5 providers into
  npm install getaiapi
  ```
 
- ## Quick Start
+ ## Kling AI
 
- ```typescript
- import { generate } from 'getaiapi'
+ 69 models across 20 endpoints. Each model is a typed function with Kling-native field names.
 
- const result = await generate({
- model: 'flux-schnell',
- prompt: 'a cat wearing sunglasses'
- })
+ ### Setup
 
- console.log(result.outputs[0].url)
+ ```bash
+ export KLING_ACCESS_KEY="your-access-key"
+ export KLING_SECRET_KEY="your-secret-key"
  ```
 
- ## More Examples
-
- **Text generation (LLMs)**
+ Or configure programmatically:
 
  ```typescript
- const answer = await generate({
- model: 'claude-sonnet-4-6',
- prompt: 'Explain quantum computing in one paragraph'
- })
+ import { kling } from 'getaiapi'
 
- console.log(answer.outputs[0].content)
+ kling.configure({ accessKey: '...', secretKey: '...' })
  ```
 
- With system prompt and parameters:
-
- ```typescript
- const reply = await generate({
- model: 'gpt-4o',
- prompt: 'Write a haiku about TypeScript',
- options: {
- system: 'You are a creative poet.',
- temperature: 0.9,
- max_tokens: 100,
- }
- })
- ```
+ ### Text to Video
 
- **Text-to-video**
+ 9 models: V1 Standard, V1.6 Pro/Standard, V2 Master, V2.1 Master, V2.5 Turbo Pro, V2.6 Pro, V3 Pro/Standard.
 
  ```typescript
- const video = await generate({
- model: 'veo3.1',
- prompt: 'a timelapse of a flower blooming in a garden'
- })
- ```
-
- **Image editing**
+ import { kling } from 'getaiapi'
 
- ```typescript
- const edited = await generate({
- model: 'gpt-image-1.5-edit',
- image: 'https://example.com/photo.jpg',
- prompt: 'add a rainbow in the sky'
+ const result = await kling.textToVideoV3Pro({
+ prompt: 'a golden retriever running on a beach at sunset',
+ duration: '5',
+ aspect_ratio: '16:9',
+ sound: 'on',
  })
+
+ console.log(result.videos[0].url)
  ```
 
- **Multi-image references** (e.g., character + location consistency)
+ | Function | Model | Mode |
+ |----------|-------|------|
+ | `textToVideoV1Standard` | kling-v1 | std |
+ | `textToVideoV1_6Pro` | kling-v1-6 | pro |
+ | `textToVideoV1_6Standard` | kling-v1-6 | std |
+ | `textToVideoV2Master` | kling-v2-master | — |
+ | `textToVideoV2_1Master` | kling-v2-1-master | — |
+ | `textToVideoV2_5TurboPro` | kling-v2-5-turbo | pro |
+ | `textToVideoV2_6Pro` | kling-v2-6 | pro |
+ | `textToVideoV3Pro` | kling-v3 | pro |
+ | `textToVideoV3Standard` | kling-v3 | std |
+
+ **Input: `TextToVideoInput`**
 
  ```typescript
- const scene = await generate({
- model: 'google-nano-banana-pro-edit',
- prompt: 'cinematic shot of the character in the location',
- image: 'https://example.com/character.jpg',
- images: [
- 'https://example.com/character.jpg',
- 'https://example.com/location.jpg',
- ],
- })
+ {
+ prompt: string // required
+ negative_prompt?: string
+ duration?: string // '5' or '10'
+ aspect_ratio?: string // '16:9', '9:16', '1:1'
+ cfg_scale?: number
+ sound?: 'on' | 'off' // generate audio
+ }
  ```
 
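A request that exercises the optional fields might look like the following sketch (values are illustrative; the object matches the `TextToVideoInput` shape above and can be passed to any of the `textToVideo*` functions):

```typescript
// Illustrative TextToVideoInput payload; field names come from the
// documented input shape. Pass it to e.g. kling.textToVideoV2_5TurboPro(payload).
const payload = {
  prompt: 'a lighthouse on a cliff during a storm',
  negative_prompt: 'blur, low quality',
  duration: '10',       // '5' or '10'
  aspect_ratio: '9:16', // '16:9', '9:16', '1:1'
  cfg_scale: 0.7,
  sound: 'off',
}

console.log(payload.duration) // prints: 10
```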
- **Text-to-speech**
+ ### Image to Video
+
+ 13 models: V1 Standard, V1.5 Pro, V1.6 Pro/Standard, V2 Master, V2.1 Master/Pro/Standard, V2.5 Turbo Pro/Standard, V2.6 Pro, V3 Pro/Standard.
 
  ```typescript
- const speech = await generate({
- model: 'elevenlabs-v3',
- prompt: 'Hello, welcome to getaiapi.',
- options: { voice_id: 'rachel' }
+ const result = await kling.imageToVideoV3Pro({
+ image: 'https://example.com/photo.jpg',
+ prompt: 'animate this photo with gentle wind',
+ duration: '5',
  })
  ```
 
- **Upscale an image**
+ | Function | Model | Mode |
+ |----------|-------|------|
+ | `imageToVideoV1Standard` | kling-v1 | std |
+ | `imageToVideoV1_5Pro` | kling-v1-5 | pro |
+ | `imageToVideoV1_6Pro` | kling-v1-6 | pro |
+ | `imageToVideoV1_6Standard` | kling-v1-6 | std |
+ | `imageToVideoV2Master` | kling-v2-master | — |
+ | `imageToVideoV2_1Master` | kling-v2-1-master | — |
+ | `imageToVideoV2_1Pro` | kling-v2-1 | pro |
+ | `imageToVideoV2_1Standard` | kling-v2-1 | std |
+ | `imageToVideoV2_5TurboPro` | kling-v2-5-turbo | pro |
+ | `imageToVideoV2_5TurboStandard` | kling-v2-5-turbo | std |
+ | `imageToVideoV2_6Pro` | kling-v2-6 | pro |
+ | `imageToVideoV3Pro` | kling-v3 | pro |
+ | `imageToVideoV3Standard` | kling-v3 | std |
+
+ **Input: `ImageToVideoInput`**
 
  ```typescript
- const upscaled = await generate({
- model: 'topaz-upscale-image',
- image: 'https://example.com/low-res.jpg'
- })
+ {
+ image: string // required — URL or base64
+ prompt?: string
+ negative_prompt?: string
+ duration?: string
+ aspect_ratio?: string
+ cfg_scale?: number
+ sound?: 'on' | 'off'
+ image_tail?: string // end frame image URL
+ voice_list?: Array<{ voice_id: string }>
+ element_list?: Array<{ id: string; image: string }>
+ }
  ```
 
- **Kling native provider** (bypass fal-ai, call Kling API directly)
+ ### Omni Video
+
+ 17 models across O1 and O3 variants. Supports text-to-video, image-to-video, reference-to-video, video editing, and video reference — all through one endpoint.
 
  ```typescript
- const video = await generate({
- model: 'kling-video-v3-pro-text-to-video',
- provider: 'kling', // uses KLING_ACCESS_KEY directly
- prompt: 'a golden retriever running on a beach at sunset',
+ const result = await kling.omniVideoO3ProTextToVideo({
+ prompt: 'a cyberpunk city at night',
  duration: '5',
- options: { aspect_ratio: '16:9', sound: 'on' },
+ aspect_ratio: '16:9',
  })
  ```
 
- **Remove background**
+ | Function | Model | Mode |
+ |----------|-------|------|
+ | `omniVideoO1ImageToVideo` | kling-video-o1 | — |
+ | `omniVideoO1ReferenceToVideo` | kling-video-o1 | — |
+ | `omniVideoO1StandardImageToVideo` | kling-video-o1 | std |
+ | `omniVideoO1StandardReferenceToVideo` | kling-video-o1 | std |
+ | `omniVideoO1StandardVideoEdit` | kling-video-o1 | std |
+ | `omniVideoO1StandardVideoReference` | kling-video-o1 | std |
+ | `omniVideoO1VideoEdit` | kling-video-o1 | — |
+ | `omniVideoO1VideoReference` | kling-video-o1 | — |
+ | `omniVideoO3ProImageToVideo` | kling-v3-omni | pro |
+ | `omniVideoO3ProReferenceToVideo` | kling-v3-omni | pro |
+ | `omniVideoO3ProTextToVideo` | kling-v3-omni | pro |
+ | `omniVideoO3ProVideoEdit` | kling-v3-omni | pro |
+ | `omniVideoO3ProVideoReference` | kling-v3-omni | pro |
+ | `omniVideoO3StandardReferenceToVideo` | kling-v3-omni | std |
+ | `omniVideoO3StandardTextToVideo` | kling-v3-omni | std |
+ | `omniVideoO3StandardVideoEdit` | kling-v3-omni | std |
+ | `omniVideoO3StandardVideoReference` | kling-v3-omni | std |
+
+ **Input: `OmniVideoInput`**
 
  ```typescript
- const cutout = await generate({
- model: 'birefnet-v2',
- image: 'https://example.com/portrait.jpg'
- })
+ {
+ prompt: string // required
+ image?: string
+ negative_prompt?: string
+ duration?: string
+ aspect_ratio?: string
+ cfg_scale?: number
+ sound?: 'on' | 'off'
+ element_list?: Array<{ id: string; image: string }>
+ }
  ```
 
- ## Async Job Control
+ ### Image Generation
 
- For long-running jobs (video generation, training), you can submit a job and poll for status separately instead of blocking until completion.
+ 2 models on `v1/images/generations` and 3 models on `v1/images/omni-image`.
 
  ```typescript
- import { submit, poll } from 'getaiapi'
-
- // Submit — returns immediately with the provider's task ID
- const job = await submit({
- model: 'veo3.1',
- prompt: 'a timelapse of a flower blooming',
+ const result = await kling.imageO1({
+ prompt: 'a watercolor painting of a mountain lake',
+ n: 2,
+ aspect_ratio: '16:9',
  })
 
- console.log(job.id) // provider task ID
- console.log(job.status) // 'pending' | 'processing' | 'completed'
+ console.log(result.images[0].url)
+ ```
 
- // Poll check status manually (call in a loop, on a timer, etc.)
- let result = await poll(job)
+ | Function | Endpoint | Model |
+ |----------|----------|-------|
+ | `imageV3TextToImage` | generations | kling-v3 |
+ | `imageV3ImageToImage` | generations | kling-v3 |
+ | `imageO1` | omni-image | kling-image-o1 |
+ | `imageO3TextToImage` | omni-image | kling-v3-omni |
+ | `imageO3ImageToImage` | omni-image | kling-v3-omni |
 
- while (result.status === 'pending' || result.status === 'processing') {
- await new Promise(r => setTimeout(r, 2000))
- result = await poll(job)
- }
+ **Input: `ImageGenerationInput` / `OmniImageInput`**
 
- if (result.status === 'completed') {
- console.log(result.outputs[0].url)
+ ```typescript
+ {
+ prompt: string // required
+ image?: string // for image-to-image
+ n?: number // number of outputs
+ aspect_ratio?: string
  }
  ```
 
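Image-to-image on the `generations` endpoint works the same way: supply the optional `image` field. A sketch (URL and values illustrative; the live call is commented out because it needs credentials):

```typescript
// ImageGenerationInput payload for image-to-image; `image` selects the
// source picture, `n` asks for two outputs.
const payload = {
  prompt: 'the same scene as an oil painting',
  image: 'https://example.com/photo.jpg',
  n: 2,
  aspect_ratio: '1:1',
}
// const result = await kling.imageV3ImageToImage(payload)

console.log(payload.n) // prints: 2
```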
- Synchronous providers (like OpenRouter) return `status: 'completed'` from `submit()` immediately -- check status before polling.
-
- `submitAndPoll()` is an alias for `generate()` that makes the blocking behavior explicit:
+ ### Virtual Try-On
 
  ```typescript
- import { submitAndPoll } from 'getaiapi'
-
- const result = await submitAndPoll({
- model: 'flux-schnell',
- prompt: 'a cat in space',
+ const result = await kling.virtualTryOn({
+ human_image: 'https://example.com/person.jpg',
+ cloth_image: 'https://example.com/shirt.jpg',
  })
  ```
 
- ## Configuration
+ **Input: `VirtualTryOnInput`**
 
- ### Option 1: Environment Variables
+ ```typescript
+ {
+ human_image: string // required
+ cloth_image: string // required
+ }
+ ```
 
- Set API keys as environment variables. You only need keys for the providers you plan to call.
+ ### AI Avatar
 
- ```bash
- # fal-ai (1,201 models)
- export FAL_KEY="your-fal-key"
+ 4 models: V1 Pro/Standard, V2 Pro/Standard.
 
- # Replicate (687 models)
- export REPLICATE_API_TOKEN="your-replicate-token"
+ ```typescript
+ const result = await kling.avatarV2Pro({
+ image: 'https://example.com/portrait.jpg',
+ sound_file: 'https://example.com/speech.mp3',
+ prompt: 'talking head presentation',
+ })
+ ```
 
- # WaveSpeed (66 models)
- export WAVESPEED_API_KEY="your-wavespeed-key"
+ | Function | Mode |
+ |----------|------|
+ | `avatarV1Pro` | pro |
+ | `avatarV1Standard` | std |
+ | `avatarV2Pro` | pro |
+ | `avatarV2Standard` | std |
 
- # OpenRouter (24 LLM models — Claude, GPT, Gemini, Llama, etc.)
- export OPENROUTER_API_KEY="your-openrouter-key"
+ **Input: `AvatarInput`**
 
- # Kling AI (69 models — native API, bypasses fal-ai middleman)
- export KLING_ACCESS_KEY="your-access-key"
- export KLING_SECRET_KEY="your-secret-key"
+ ```typescript
+ {
+ image: string // required — portrait image
+ sound_file?: string // audio for lip sync
+ prompt?: string
+ }
  ```
 
- ### Option 2: Programmatic Configuration
-
- Use `configure()` to set keys in code -- useful when your env vars have different names or keys come from a secrets manager.
+ ### Lip Sync
 
  ```typescript
- import { configure } from 'getaiapi'
-
- configure({
- keys: {
- 'fal-ai': process.env.MY_FAL_TOKEN,
- 'replicate': process.env.MY_REPLICATE_TOKEN,
- 'wavespeed': process.env.MY_WAVESPEED_TOKEN,
- 'openrouter': process.env.MY_OPENROUTER_TOKEN,
- 'kling': `${process.env.MY_KLING_AK}:${process.env.MY_KLING_SK}`,
- },
+ const result = await kling.lipSyncAudioToVideo({
+ sound_file: 'https://example.com/speech.mp3',
  })
  ```
 
- You can also set keys and storage together:
+ | Function | Description |
+ |----------|-------------|
+ | `lipSyncAudioToVideo` | Audio-driven lip sync |
+ | `lipSyncTextToVideo` | Text-driven lip sync |
+
+ **Input: `LipSyncInput`**
 
  ```typescript
- configure({
- keys: {
- 'fal-ai': 'your-fal-key',
- },
- storage: {
- accountId: 'your-r2-account',
- bucketName: 'your-bucket',
- accessKeyId: 'your-r2-key',
- secretAccessKey: 'your-r2-secret',
- publicUrlBase: 'https://cdn.example.com',
- },
- })
+ {
+ sound_file?: string // audio URL
+ }
  ```
 
- Or set just provider keys with `configureAuth()`:
+ ### Video Effects
 
- ```typescript
- import { configureAuth } from 'getaiapi'
+ 4 models: V1 Standard, V1.5 Pro, V1.6 Pro/Standard.
 
- configureAuth({
- 'fal-ai': myKeyVault.get('fal'),
- 'replicate': myKeyVault.get('replicate'),
+ ```typescript
+ const result = await kling.effectsV1_6Pro({
+ image: 'https://example.com/photo.jpg',
  })
  ```
 
- Programmatic keys take priority over environment variables. Any provider not set programmatically falls back to its default env var.
-
- Models are automatically filtered to only show providers where you have a valid key configured.
+ | Function |
+ |----------|
+ | `effectsV1Standard` |
+ | `effectsV1_5Pro` |
+ | `effectsV1_6Pro` |
+ | `effectsV1_6Standard` |
 
- ## Model Discovery
+ **Input: `EffectsInput`**
 
  ```typescript
- import { listModels, resolveModel, deriveCategory } from 'getaiapi'
+ {
+ image: string // required
+ }
+ ```
 
- // List all models
- const all = listModels()
+ ### Motion Control
 
- // Filter by input/output modality
- const imageModels = listModels({ input: 'text', output: 'image' })
+ 4 models: V2.6 Pro/Standard, V3 Pro/Standard.
 
- // Filter by provider
- const falModels = listModels({ provider: 'fal-ai' })
+ ```typescript
+ const result = await kling.motionControlV3Pro({
+ image_url: 'https://example.com/scene.jpg',
+ prompt: 'camera pan left',
+ })
+ ```
 
- // Search by name
- const fluxModels = listModels({ query: 'flux' })
+ | Function | Model | Mode |
+ |----------|-------|------|
+ | `motionControlV2_6Pro` | kling-v2-6 | pro |
+ | `motionControlV2_6Standard` | kling-v2-6 | std |
+ | `motionControlV3Pro` | kling-v3 | pro |
+ | `motionControlV3Standard` | kling-v3 | std |
 
- // Resolve a specific model
- const model = resolveModel('flux-schnell')
- // => { canonical_name, aliases, modality, providers }
+ **Input: `MotionControlInput`**
 
- // Derive a display label from modality
- deriveCategory(model) // => "text-to-image"
+ ```typescript
+ {
+ image_url: string // required
+ video_url?: string
+ prompt?: string
+ keep_original_sound?: boolean
+ character_orientation?: string
+ element_list?: Array<{ id: string; image: string }>
+ }
  ```
 
- ## Modality
+ ### Text to Speech (Sync)
 
- Models declare their input and output types via `modality`. There are no fixed categories — modality is the source of truth.
+ Returns immediately — no polling.
 
- **Input types:** `text`, `image`, `audio`, `video`
+ ```typescript
+ const result = await kling.tts({ text: 'Hello world' })
+ console.log(result.audios[0].url)
+ ```
 
- **Output types:** `image`, `video`, `audio`, `text`, `3d`, `segmentation`
+ **Input: `TtsInput`**
 
- Common combinations across 1,890+ models (69 with native Kling provider):
+ ```typescript
+ {
+ text: string // required
+ }
+ ```
 
- | Inputs | Outputs | Example |
- |---|---|---|
- | text | image | `flux-schnell`, `ideogram-v3` |
- | text | video | `veo3.1`, `sora-2` |
- | image, text | image | `gpt-image-1.5-edit`, `flux-2-pro-edit` |
- | image, text | video | `kling-video-v3-pro`, `seedance-v1.5-pro` |
- | text | audio | `elevenlabs-v3`, `minimax-music-v2` |
- | text | text | `claude-sonnet-4-6`, `gpt-4o` |
- | image | image | `topaz-upscale-image`, `birefnet-v2` |
- | image | 3d | `trellis-image-to-3d` |
- | audio | text | `whisper` |
+ ### Video to Audio
 
- ## Providers
+ Generates audio for a video. Returns both the merged video and the generated audio tracks.
 
- | Provider | Models | Auth Env Var | Protocol |
- |---|---|---|---|
- | fal-ai | 1,201 | `FAL_KEY` | Native fetch |
- | Replicate | 687 | `REPLICATE_API_TOKEN` | Native fetch |
- | Kling AI | 69 | `KLING_ACCESS_KEY` | Native fetch + JWT |
- | WaveSpeed | 66 | `WAVESPEED_API_KEY` | Native fetch |
- | OpenRouter | 24 | `OPENROUTER_API_KEY` | Native fetch |
+ ```typescript
+ const result = await kling.videoToAudio({
+ video_url: 'https://example.com/video.mp4',
+ sound_effect_prompt: 'ocean waves crashing',
+ })
 
- Many Kling models are available through both fal-ai and the native Kling provider. Using `provider: 'kling'` calls the Kling API directly with JWT authentication, bypassing intermediary markup. Set both `KLING_ACCESS_KEY` and `KLING_SECRET_KEY` env vars (or pass them combined as `accessKey:secretKey` via `configure()`).
+ console.log(result.videos[0].url) // merged video with audio
+ console.log(result.audios[0].url_mp3) // audio track (mp3)
+ console.log(result.audios[0].url_wav) // audio track (wav)
+ ```
 
- **Provider portability** -- the same code works across providers. Parameter names are aligned: `generate_audio`, `end_image_url`, `voice_ids`, and `elements` work identically whether you use `provider: 'fal-ai'` or `provider: 'kling'`. The library automatically translates to each provider's native field names (e.g., `generate_audio: true` becomes `sound: "on"` for Kling, stays `generate_audio: true` for fal-ai).
+ **Input: `VideoToAudioInput`**
 
- Zero external dependencies -- all provider communication uses native `fetch`. Works in Node.js, Vercel Edge, Cloudflare Workers, Deno, Bun, and any ESM runtime -- no `fs` or special bundler config needed.
+ ```typescript
+ {
+ video_url?: string // mutually exclusive with video_id
+ video_id?: string // mutually exclusive with video_url
+ sound_effect_prompt?: string
+ bgm_prompt?: string // background music prompt
+ asmr_mode?: boolean // enhanced detailed sound effects
+ }
+ ```
370
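Since `video_url` and `video_id` are mutually exclusive, a caller may want to validate the input before submitting. A small guard (a hypothetical helper, not part of getaiapi):

```typescript
interface VideoToAudioInput {
  video_url?: string
  video_id?: string
  sound_effect_prompt?: string
  bgm_prompt?: string
  asmr_mode?: boolean
}

// Throws unless exactly one of video_url / video_id is provided.
function assertVideoSource(input: VideoToAudioInput): void {
  const sources = [input.video_url, input.video_id].filter(v => v !== undefined)
  if (sources.length !== 1) {
    throw new Error('provide exactly one of video_url or video_id')
  }
}

assertVideoSource({ video_url: 'https://example.com/video.mp4' }) // ok
```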
 
309
- ## API Reference
371
+ ### Text to Audio
310
372
 
311
- ### `generate(request: GenerateRequest): Promise<GenerateResponse>`
373
+ ```typescript
374
+ const result = await kling.textToAudio({
375
+ prompt: 'thunderstorm with heavy rain',
376
+ duration: 5.0,
377
+ })
312
378
 
313
- The core function. Resolves the model, maps parameters, calls the provider, and returns a unified response.
379
+ console.log(result.audios[0].url) // normalized from url_mp3
380
+ console.log(result.audios[0].url_mp3) // mp3 URL
381
+ console.log(result.audios[0].url_wav) // wav URL
382
+ ```
314
383
 
315
- **GenerateRequest**
384
+ **Input: `TextToAudioInput`**
316
385
 
317
386
  ```typescript
318
- interface GenerateRequest<P extends ProviderName = ProviderName> {
319
- model: string // required - model name
320
- provider?: P // preferred provider (optional)
321
- prompt?: string // text prompt
322
- image?: string | File // input image (URL or File)
323
- images?: (string | File)[] // multiple reference images
324
- audio?: string | File // input audio
325
- video?: string | File // input video
326
- negative_prompt?: string // what to avoid
327
- count?: number // number of outputs
328
- size?: string | { width: number; height: number } // output dimensions
329
- seed?: number // reproducibility seed
330
- guidance?: number // guidance scale
331
- steps?: number // inference steps
332
- strength?: number // denoising strength
333
- format?: 'png' | 'jpeg' | 'webp' | 'mp4' | 'mp3' | 'wav' | 'obj' | 'glb'
334
- quality?: number // output quality
335
- safety?: boolean // enable safety checker
336
- duration?: string // output duration (video/audio)
337
- options?: ProviderOptionsFor<P> // provider-specific overrides
387
+ {
388
+ prompt: string // required
389
+ duration: number // required 3.0 to 10.0
338
390
  }
339
391
  ```
340
392
 
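Because `duration` must fall within 3.0 to 10.0 seconds, callers taking user input may want to clamp it first (a hypothetical helper, not part of getaiapi):

```typescript
// Clamp a requested duration into the documented 3.0–10.0 second range.
function clampDuration(seconds: number): number {
  return Math.min(10.0, Math.max(3.0, seconds))
}

console.log(clampDuration(1))   // prints: 3
console.log(clampDuration(5.5)) // prints: 5.5
console.log(clampDuration(42))  // prints: 10
```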
- The generic `P` narrows `options` by provider. Use `GenerateRequest<'kling'>` for type-safe Kling options:
+ ### Voice Clone
 
  ```typescript
- const req: GenerateRequest<'kling'> = {
- model: 'kling-video-v3-pro-image-to-video',
- provider: 'kling',
- image: 'https://example.com/img.png',
- prompt: 'Animate this photo',
- options: {
- sound: 'on', // typed: 'on' | 'off'
- aspect_ratio: '16:9', // typed: string
- cfg_scale: 0.5, // typed: number
- },
- }
- ```
+ const result = await kling.createVoice({
+ voice_name: 'my-voice',
+ voice_url: 'https://example.com/sample.mp3',
+ })
 
- Without a generic, `options` accepts any `Record<string, unknown>` (backward compatible).
+ console.log(result.voices[0].voice_id)
+ console.log(result.voices[0].trial_url)
+ ```
 
- **GenerateResponse**
+ **Input: `CreateVoiceInput`**
 
  ```typescript
- interface GenerateResponse {
- id: string
- model: string
- provider: string
- status: 'completed' | 'failed'
- outputs: OutputItem[]
- metadata: {
- seed?: number
- inference_time_ms?: number
- cost?: number
- safety_flagged?: boolean
- tokens?: number // total tokens (LLM only)
- prompt_tokens?: number // input tokens (LLM only)
- completion_tokens?: number // output tokens (LLM only)
- }
- }
-
- interface OutputItem {
- type: 'image' | 'video' | 'audio' | 'text' | '3d' | 'segmentation'
- url?: string // URL for media outputs
- content?: string // text content for LLM outputs
- content_type: string
- size_bytes?: number
+ {
+ voice_name: string // required
+ voice_url?: string // audio sample URL
+ video_id?: string // or extract from video
  }
  ```
 
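A cloned voice can then drive speech in Image to Video through the documented `voice_list` field. A sketch of the wiring (the `kling.*` calls are commented out because they require live credentials; `voice-123` is a stand-in for a real `voice_id`):

```typescript
// const voice = await kling.createVoice({
//   voice_name: 'narrator',
//   voice_url: 'https://example.com/sample.mp3',
// })
const voiceId = 'voice-123' // stand-in for voice.voices[0].voice_id

const request = {
  image: 'https://example.com/portrait.jpg',
  prompt: 'the person speaks to the camera',
  sound: 'on',
  voice_list: [{ voice_id: voiceId }],
}
// const result = await kling.imageToVideoV3Pro(request)

console.log(request.voice_list[0].voice_id) // prints: voice-123
```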
- ### `submit(request: GenerateRequest): Promise<SubmitResponse>`
415
+ ### Multi-Shot
389
416
 
390
- Submits a job to the provider and returns immediately without waiting for completion. Returns the provider's task ID and enough context to poll later.
417
+ Generate multi-angle reference images from a frontal image. Each image returns 3 angle variants.
391
418
 
392
419
  ```typescript
393
- interface SubmitResponse {
394
- id: string // provider's task/request ID
395
- model: string // canonical model name
396
- provider: ProviderName // which provider handled it
397
- endpoint: string // needed for polling
398
- status: 'pending' | 'processing' | 'completed'
399
- }
400
- ```
420
+ const result = await kling.multiShot({
421
+ element_frontal_image: 'https://example.com/face.jpg',
422
+ })
401
423
 
402
- ### `poll(job: SubmitResponse): Promise<PollResponse>`
424
+ console.log(result.images[0].url_1) // angle 1
425
+ console.log(result.images[0].url_2) // angle 2
426
+ console.log(result.images[0].url_3) // angle 3
427
+ ```
403
428
 
404
- Checks the status of a submitted job once. Returns current status, and includes mapped outputs and metadata when completed.
429
+ **Input: `MultiShotInput`**
405
430
 
406
431
  ```typescript
407
- interface PollResponse {
408
- id: string
409
- model: string
410
- provider: ProviderName
411
- status: 'completed' | 'failed' | 'processing' | 'pending'
412
- outputs?: OutputItem[] // populated when completed
413
- metadata?: GenerateResponse['metadata'] // populated when completed
414
- error?: string // populated when failed
432
+ {
433
+ element_frontal_image: string // required
415
434
  }
416
435
  ```
417
436
 
418
- ### `submitAndPoll(request: GenerateRequest): Promise<GenerateResponse>`
419
-
420
- Alias for `generate()`. Submits a job and polls until completion. Use this when you want the blocking behavior but want to be explicit about it.
421
-
422
- ### `listModels(filters?: ListModelsFilters): ModelEntry[]`
437
+ ### Reference to Image
423
438
 
424
- Returns all models in the registry. Accepts optional filters:
425
-
426
- - `input` -- filter by input modality (e.g. `'text'`, `'image'`, `'audio'`, `'video'`)
427
- - `output` -- filter by output modality (e.g. `'image'`, `'video'`, `'text'`, `'3d'`)
428
- - `provider` -- filter by provider (e.g. `'fal-ai'`)
429
- - `query` -- search canonical names and aliases
430
-
431
- ### `resolveModel(name: string): ModelEntry`
432
-
433
- Resolves a model by name. Accepts canonical names, aliases, and normalized variants. Throws if no match is found.
439
+ ```typescript
440
+ const result = await kling.referenceToImage({
441
+ prompt: 'portrait in watercolor style',
442
+ n: 2,
443
+ })
444
+ ```
434
445
 
435
- ### `deriveCategory(model: ModelEntry): string`
446
+ **Input: `ReferenceToImageInput`**
436
447
 
437
- Derives a display category label from a model's modality (e.g. `"text-to-image"`).
448
+ ```typescript
449
+ {
450
+ prompt: string // required
451
+ n?: number
452
+ aspect_ratio?: string
453
+ }
454
+ ```
438
455
 
439
- ## R2 Storage (Asset Uploads)
456
+ ### Expand Image
440
457
 
441
- getaiapi includes built-in Cloudflare R2 storage support that automatically uploads binary assets before sending them to providers. Two modes are supported:
458
+ Outpainting expand an image beyond its borders.
442
459
 
443
- - **`public`** (default) — requires a publicly readable bucket; returns public URLs (via `publicUrlBase` or the R2 endpoint)
444
- - **`presigned`** works with private buckets; returns time-limited presigned GET URLs signed with S3 Signature V4 (no public access needed, `publicUrlBase` is not required)
460
+ ```typescript
461
+ const result = await kling.expandImage({
462
+ image: 'https://example.com/photo.jpg',
463
+ prompt: 'extend the landscape',
464
+ })
465
+ ```
445
466
 
446
- ### Setup
467
+ **Input: `ExpandImageInput`**
447
468
 
448
- Set these environment variables:
469
+ ```typescript
470
+ {
471
+ image: string // required
472
+ prompt?: string
473
+ n?: number
474
+ }
475
+ ```
449
476
 
450
- ```bash
451
- # Required
452
- export R2_ACCOUNT_ID="your-cloudflare-account-id"
453
- export R2_BUCKET_NAME="your-bucket-name"
454
- export R2_ACCESS_KEY_ID="your-r2-access-key"
455
- export R2_SECRET_ACCESS_KEY="your-r2-secret-key"
477
+ ### Extend Video
456
478
 
457
- # Optional - custom public URL (only needed for mode: 'public')
458
- export R2_PUBLIC_URL="https://cdn.example.com"
479
+ Continue a video beyond its last frame.
459
480
 
460
- # Optional - use presigned URLs for private buckets (default: 'public')
461
- export R2_STORAGE_MODE="presigned"
462
- export R2_PRESIGN_EXPIRES_IN="3600" # seconds, default: 3600, max: 604800 (7 days)
481
+ ```typescript
482
+ const result = await kling.extendVideo({
483
+ prompt: 'the camera continues to pan right',
484
+ })
463
485
  ```
464
486
 
465
- #### How to get your R2 Public URL (public mode only)
487
+ **Input: `ExtendVideoInput`**
466
488
 
467
- If using `mode: 'presigned'`, you can skip this — no public bucket access is needed.
489
+ ```typescript
490
+ {
491
+ prompt?: string
492
+ negative_prompt?: string
493
+ }
494
+ ```
468
495
 
469
- 1. Log in to the [Cloudflare dashboard](https://dash.cloudflare.com)
470
- 2. Go to **R2 Object Storage** in the left sidebar
471
- 3. Click on your bucket
472
- 4. Go to the **Settings** tab
473
- 5. Under **Public access**, click **Allow Access**
474
- 6. Cloudflare will provide a public URL like `https://<bucket>.<account-id>.r2.dev` — use this as your `R2_PUBLIC_URL`
475
- 7. (Optional) You can also connect a **Custom Domain** under the same section for a cleaner URL like `https://cdn.yourdomain.com`
496
+ ### Identify Face (Sync)
476
497
 
477
- Then call `configureStorage()` once at startup:
498
+ Detect faces in a video for lip-sync targeting. Returns immediately — no polling.
478
499
 
479
500
  ```typescript
480
- import { configureStorage } from 'getaiapi'
481
-
482
- // Read from environment variables
483
- configureStorage()
501
+ const result = await kling.identifyFace({
502
+ video_url: 'https://example.com/video.mp4',
503
+ })
484
504
 
485
- // Or pass config directly
486
- configureStorage({
487
- accountId: 'your-account-id',
488
- bucketName: 'your-bucket',
489
- accessKeyId: 'your-key',
490
- secretAccessKey: 'your-secret',
491
- publicUrlBase: 'https://cdn.example.com', // optional
492
- autoUpload: false, // optional
493
- mode: 'public', // 'public' | 'presigned' (default: 'public')
494
- presignExpiresIn: 3600, // presigned URL TTL in seconds (default: 3600)
505
+ console.log(result.session_id)
506
+ result.face_data.forEach(face => {
507
+ console.log(face.face_id, face.face_image, face.start_time, face.end_time)
495
508
  })
496
509
  ```

- ### Automatic Uploads in `generate()`
-
- Once storage is configured, any `Buffer`, `Blob`, `File`, or `ArrayBuffer` values in provider params are automatically uploaded to R2 and replaced with public URLs before the request is sent to the provider. This works recursively -- nested objects and arrays are traversed, so params like Kling's `elements[].frontal_image_url` are handled automatically. No code changes needed -- it just works.
+ **Input: `IdentifyFaceInput`**

  ```typescript
- import { generate, configureStorage } from 'getaiapi'
- import { readFileSync } from 'fs'
-
- configureStorage()
-
- const result = await generate({
-   model: 'gpt-image-1.5-edit',
-   image: readFileSync('./photo.jpg'), // Buffer uploaded to R2 automatically
-   prompt: 'add a rainbow in the sky',
- })
+ {
+   video_url?: string // mutually exclusive with video_id
+   video_id?: string // mutually exclusive with video_url
+ }
  ```

- To also re-upload URL strings through R2 (useful when providers can't access the original URL), pass `reupload: true` per-call:
+ ### Image Recognize (Sync)
+
+ Returns immediately — no polling.

  ```typescript
- const result = await generate({
-   model: 'kling-video-pro',
-   image: 'https://private-server.com/img.jpg',
-   prompt: 'animate this image',
-   options: { reupload: true },
+ const result = await kling.imageRecognize({
+   image: 'https://example.com/photo.jpg',
  })
  ```

- Or enable it globally with `autoUpload: true` in the storage config.
-
- ### Cleanup / Lifecycle
+ **Input: `ImageRecognizeInput`**

- Assets uploaded automatically via `generate()` use the `getaiapi-tmp/` key prefix. You can set a [Cloudflare R2 lifecycle rule](https://developers.cloudflare.com/r2/buckets/object-lifecycles/) to auto-expire objects under that prefix (e.g. delete after 24 hours) so ephemeral generation assets don't accumulate.
+ ```typescript
+ {
+   image: string // required
+ }
+ ```

- ### Standalone Upload / Delete
+ ## Output Types

- You can also use R2 storage directly:
+ All functions return typed results based on output modality:

  ```typescript
- import { uploadAsset, deleteAsset, configureStorage } from 'getaiapi'
+ // Video endpoints (textToVideo, imageToVideo, omniVideo, avatar, lipSync, effects, motionControl, extendVideo)
+ interface KlingVideoResult {
+   task_id: string
+   videos: Array<{ id: string; url: string; duration: string }>
+ }

- configureStorage()
+ // Image endpoints (imageGeneration, omniImage, virtualTryOn, referenceToImage, expandImage)
+ interface KlingImageResult {
+   task_id: string
+   images: Array<{ index: number; url: string }>
+ }

- // Upload a buffer
- const { url, key, size_bytes, content_type } = await uploadAsset(
-   Buffer.from('hello world'),
-   { contentType: 'text/plain', prefix: 'uploads' }
- )
- console.log(url) // https://cdn.example.com/uploads/a1b2c3d4-...
+ // Audio endpoints (tts, textToAudio)
+ interface KlingAudioResult {
+   task_id: string
+   audios: Array<{ id: string; url: string; url_mp3?: string; url_wav?: string; duration?: string; duration_mp3?: string; duration_wav?: string }>
+ }

- // Delete by key
- await deleteAsset(key)
- ```
+ // Multi-shot endpoint — 3 angle URLs per image
+ interface KlingMultiShotResult {
+   task_id: string
+   images: Array<{ index: number; url_1: string; url_2: string; url_3: string }>
+ }

- ### Presigned URLs (Private Buckets)
+ // Voice clone endpoint
+ interface KlingVoiceResult {
+   task_id: string
+   voices: Array<{ voice_id: string; voice_name: string; trial_url: string; owned_by: string }>
+ }

- If your R2 bucket doesn't have public read access, use presigned mode. Instead of returning a public URL, `uploadAsset` will return a time-limited presigned GET URL signed with S3 Signature V4.
+ // Video-to-audio endpoint: merged video + generated audio
+ interface KlingVideoAudioResult {
+   task_id: string
+   videos: Array<{ id: string; url: string; duration: string }>
+   audios: Array<{ id: string; url_mp3?: string; url_wav?: string; duration_mp3?: string; duration_wav?: string }>
+ }

- ```typescript
- configureStorage({
-   accountId: 'your-account-id',
-   bucketName: 'private-bucket',
-   accessKeyId: 'your-key',
-   secretAccessKey: 'your-secret',
-   mode: 'presigned', // uploadAsset returns presigned URLs
-   presignExpiresIn: 1800, // URLs expire after 30 minutes
- })
+ // Face detection (identifyFace) — sync, no task_id
+ interface KlingFaceResult {
+   session_id: string
+   face_data: Array<{ face_id: string; face_image: string; start_time: number; end_time: number }>
+ }

- const { url } = await uploadAsset(Buffer.from('secret data'), {
-   contentType: 'application/octet-stream',
- })
- // url is a presigned GET URL, valid for 30 minutes
+ // Generic JSON (imageRecognize)
+ interface KlingJsonResult {
+   task_id: string
+   data: unknown
+ }
  ```
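
These result shapes are plain data, so downstream code needs no SDK-specific machinery. A minimal sketch of consuming a `KlingVideoResult` (the interface is restated locally to be self-contained; `firstVideoUrl` and the sample object are hypothetical, not part of getaiapi):

```typescript
// KlingVideoResult restated locally so the sketch is self-contained.
interface KlingVideoResult {
  task_id: string
  videos: Array<{ id: string; url: string; duration: string }>
}

// Return the first rendered video's URL, or undefined if the task produced none.
function firstVideoUrl(result: KlingVideoResult): string | undefined {
  return result.videos[0]?.url
}

// Hypothetical result object, shaped like what a video endpoint resolves to.
const sample: KlingVideoResult = {
  task_id: 'task-123',
  videos: [{ id: 'v1', url: 'https://cdn.example.com/out.mp4', duration: '5.0' }],
}

console.log(firstVideoUrl(sample)) // → https://cdn.example.com/out.mp4
```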

- You can also generate presigned URLs for existing objects:
-
- ```typescript
- import { presignAsset } from 'getaiapi'
+ ## Polling Control

- const url = presignAsset('uploads/my-file.png')
- // => https://<account>.r2.cloudflarestorage.com/<bucket>/uploads/my-file.png?X-Amz-Algorithm=...
+ All functions accept optional polling parameters:

- // Custom expiry per-call (overrides config default)
- const shortUrl = presignAsset('uploads/my-file.png', { expiresIn: 300 }) // 5 minutes
+ ```typescript
+ await kling.textToVideoV3Pro({
+   prompt: 'a sunset',
+   timeout: 600_000, // max wait time in ms (default: 300_000 = 5 min)
+   pollInterval: 5_000, // poll frequency in ms (default: 3_000)
+ })
  ```

- **UploadOptions**
+ Sync endpoints (`tts`, `imageRecognize`, `identifyFace`) return immediately regardless of these settings.
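
For async endpoints, the waiting behavior can be pictured as a simple deadline loop. A rough illustration of the kind of polling the library performs internally (not getaiapi source; `checkTask` is a hypothetical status probe):

```typescript
// Poll a task until it resolves or the deadline passes.
// `checkTask` resolves to the finished result, or null while the task is still pending.
async function pollUntilDone<T>(
  checkTask: () => Promise<T | null>,
  timeout = 300_000,    // same default as the library: 5 minutes
  pollInterval = 3_000, // same default as the library: 3 seconds
): Promise<T> {
  const deadline = Date.now() + timeout
  while (Date.now() < deadline) {
    const result = await checkTask()
    if (result !== null) return result
    await new Promise(resolve => setTimeout(resolve, pollInterval))
  }
  throw new Error(`task did not finish within ${timeout} ms`)
}
```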

- | Option | Type | Description |
- |---|---|---|
- | `key` | `string` | Custom object key (default: auto-generated UUID) |
- | `contentType` | `string` | MIME type (default: detected from input or `application/octet-stream`) |
- | `prefix` | `string` | Key prefix / folder (e.g. `"uploads"`) |
- | `maxBytes` | `number` | Max upload size in bytes (default: 500 MB) |
+ ## Extra Parameters

- ### Storage Errors
+ All input types accept additional Kling-native fields via an index signature. Pass any parameter the Kling API supports:

  ```typescript
- import { StorageError } from 'getaiapi'
-
- try {
-   await uploadAsset(buffer)
- } catch (err) {
-   if (err instanceof StorageError) {
-     console.error(err.operation) // 'upload' | 'delete' | 'config'
-     console.error(err.statusCode) // HTTP status from R2, if applicable
-   }
- }
+ await kling.textToVideoV3Pro({
+   prompt: 'a sunset',
+   camera_control: { type: 'simple', config: { horizontal: 5 } },
+   callback_url: 'https://example.com/webhook',
+ })
  ```

  ## Error Handling

- All errors extend `GetAIApiError` and can be caught uniformly or by type:
-
- | Error | When |
- |---|---|
- | `AuthError` | Missing or invalid API key for a provider |
- | `ModelNotFoundError` | Model name could not be resolved |
- | `ValidationError` | Invalid input parameters |
- | `ProviderError` | Provider returned an error response |
- | `TimeoutError` | Generation exceeded the timeout |
- | `RateLimitError` | Provider returned HTTP 429 |
- | `StorageError` | R2 upload, delete, or config failure |
-
  ```typescript
- import { generate, AuthError, ModelNotFoundError } from 'getaiapi'
+ import { kling, KlingAuthError, KlingTimeoutError, KlingTaskFailedError } from 'getaiapi'

  try {
-   const result = await generate({ model: 'flux-schnell', prompt: 'a cat' })
+   await kling.textToVideoV3Pro({ prompt: 'test' })
  } catch (err) {
-   if (err instanceof AuthError) {
-     console.error(`Set ${err.envVar} to use ${err.provider}`)
+   if (err instanceof KlingAuthError) {
+     // Missing or invalid credentials
    }
-   if (err instanceof ModelNotFoundError) {
-     console.error(err.message) // includes "did you mean" suggestions
+   if (err instanceof KlingTimeoutError) {
+     // Task took too long (increase timeout)
+   }
+   if (err instanceof KlingTaskFailedError) {
+     // Kling rejected the task (content violation, bad params, etc.)
+     console.error(err.taskId, err.message)
    }
  }
  ```

- ## Migrating from v0.x
+ | Error | Code | When |
+ |-------|------|------|
+ | `KlingAuthError` | `AUTH_ERROR` | Missing credentials or 401 response |
+ | `KlingRateLimitError` | `RATE_LIMIT` | HTTP 429 or body codes 1100-1102 |
+ | `KlingApiError` | `API_ERROR` | Provider returned an error |
+ | `KlingTimeoutError` | `TIMEOUT` | Polling exceeded timeout |
+ | `KlingTaskFailedError` | `TASK_FAILED` | Task status is 'failed' |

- v1.0.0 replaces the category-based architecture with a modality-first design. Key changes:
+ All errors extend `KlingError`, which extends `Error`.
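
Because everything shares one base class, a single `instanceof KlingError` check can route all provider failures. A stand-in sketch (the classes are redefined locally for illustration only; the `code` property is an assumption drawn from the Code column above):

```typescript
// Local stand-ins for the exported error classes, for illustration only.
class KlingError extends Error {
  constructor(message: string, public code: string) {
    super(message)
    this.name = new.target.name
  }
}

class KlingTimeoutError extends KlingError {
  constructor(message: string) {
    super(message, 'TIMEOUT')
  }
}

// One catch-all branch for any Kling failure; everything else is rethrown.
function describeKlingFailure(err: unknown): string {
  if (err instanceof KlingError) {
    return `${err.code}: ${err.message}`
  }
  throw err
}

console.log(describeKlingFailure(new KlingTimeoutError('polling exceeded 300000 ms')))
// → TIMEOUT: polling exceeded 300000 ms
```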

- - `getModel()` is now `resolveModel()`
- - `listModels({ category: '...' })` is now `listModels({ input: '...', output: '...' })`
- - No more `readFileSync` -- works in edge runtimes without any bundler config
+ ## Deprecated: v1 Unified Gateway

- See the full [Migration Guide](docs/MIGRATION.md) for details.
+ The previous `generate()`, `submit()`, `poll()` APIs and the multi-provider registry are deprecated but still exported for backward compatibility. They will be removed in the next major version.

- ## Documentation
+ ```typescript
+ // Deprecated — still works but will be removed
+ import { generate } from 'getaiapi'
+ await generate({ model: 'flux-schnell', prompt: '...' })

- Full documentation available at [interactive10.com/getaiapi.html](https://www.interactive10.com/getaiapi.html)
+ // New: use provider-specific typed functions
+ import { kling } from 'getaiapi'
+ await kling.textToVideoV3Pro({ prompt: '...' })
+ ```

  ## License