@ai-sdk/xai 4.0.0-beta.7 → 4.0.0-beta.75

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (51)
  1. package/CHANGELOG.md +660 -9
  2. package/README.md +2 -0
  3. package/dist/index.d.ts +213 -68
  4. package/dist/index.js +2074 -781
  5. package/dist/index.js.map +1 -1
  6. package/docs/01-xai.mdx +445 -54
  7. package/package.json +15 -15
  8. package/src/convert-to-xai-chat-messages.ts +48 -27
  9. package/src/convert-xai-chat-usage.ts +3 -3
  10. package/src/files/xai-files-api.ts +16 -0
  11. package/src/files/xai-files-options.ts +19 -0
  12. package/src/files/xai-files.ts +94 -0
  13. package/src/index.ts +9 -4
  14. package/src/map-xai-finish-reason.ts +2 -2
  15. package/src/realtime/index.ts +2 -0
  16. package/src/realtime/xai-realtime-event-mapper.ts +399 -0
  17. package/src/realtime/xai-realtime-model-options.ts +3 -0
  18. package/src/realtime/xai-realtime-model.ts +101 -0
  19. package/src/remove-additional-properties.ts +24 -0
  20. package/src/responses/convert-to-xai-responses-input.ts +100 -23
  21. package/src/responses/convert-xai-responses-usage.ts +3 -3
  22. package/src/responses/map-xai-responses-finish-reason.ts +3 -2
  23. package/src/responses/xai-responses-api.ts +31 -1
  24. package/src/responses/{xai-responses-options.ts → xai-responses-language-model-options.ts} +12 -7
  25. package/src/responses/xai-responses-language-model.ts +157 -60
  26. package/src/responses/xai-responses-prepare-tools.ts +10 -8
  27. package/src/tool/code-execution.ts +2 -2
  28. package/src/tool/file-search.ts +2 -2
  29. package/src/tool/mcp-server.ts +2 -2
  30. package/src/tool/view-image.ts +2 -2
  31. package/src/tool/view-x-video.ts +2 -2
  32. package/src/tool/web-search.ts +4 -2
  33. package/src/tool/x-search.ts +2 -2
  34. package/src/{xai-chat-options.ts → xai-chat-language-model-options.ts} +28 -13
  35. package/src/xai-chat-language-model.ts +65 -29
  36. package/src/xai-chat-prompt.ts +2 -1
  37. package/src/xai-error.ts +13 -3
  38. package/src/xai-image-model.ts +28 -11
  39. package/src/xai-prepare-tools.ts +9 -8
  40. package/src/xai-provider.ts +115 -19
  41. package/src/xai-speech-model-options.ts +55 -0
  42. package/src/xai-speech-model.ts +167 -0
  43. package/src/xai-transcription-model-options.ts +70 -0
  44. package/src/xai-transcription-model.ts +166 -0
  45. package/src/xai-video-model-options.ts +145 -0
  46. package/src/xai-video-model.ts +129 -22
  47. package/dist/index.d.mts +0 -377
  48. package/dist/index.mjs +0 -3070
  49. package/dist/index.mjs.map +0 -1
  50. package/src/xai-video-options.ts +0 -23
  51. package/src/{xai-image-options.ts → xai-image-model-options.ts} +0 -0
package/docs/01-xai.mdx CHANGED
@@ -1,6 +1,6 @@
1
1
  ---
2
2
  title: xAI Grok
3
- description: Learn how to use xAI Grok.
3
+ description: Learn how to use xAI Grok and Imagine.
4
4
  ---
5
5
 
6
6
  # xAI Grok Provider
@@ -73,10 +73,10 @@ You can use the following optional settings to customize the xAI provider instan
73
73
  ## Language Models
74
74
 
75
75
  You can create [xAI models](https://console.x.ai) using a provider instance. The
76
- first argument is the model id, e.g. `grok-3`.
76
+ first argument is the model id, e.g. `grok-4.20-non-reasoning`.
77
77
 
78
78
  ```ts
79
- const model = xai('grok-3');
79
+ const model = xai('grok-4.20-non-reasoning');
80
80
  ```
81
81
 
82
82
  By default, `xai(modelId)` uses the Responses API. To use the [Chat Completions API](https://docs.x.ai/docs/api-reference#chat-completions) (legacy), use `xai.chat(modelId)`.
@@ -90,7 +90,7 @@ import { xai } from '@ai-sdk/xai';
90
90
  import { generateText } from 'ai';
91
91
 
92
92
  const { text } = await generateText({
93
- model: xai('grok-3'),
93
+ model: xai('grok-4.20-non-reasoning'),
94
94
  prompt: 'Write a vegetarian lasagna recipe for 4 people.',
95
95
  });
96
96
  ```
@@ -99,12 +99,76 @@ xAI language models can also be used in the `streamText` function
99
99
  and support structured data generation with [`Output`](/docs/reference/ai-sdk-core/output)
100
100
  (see [AI SDK Core](/docs/ai-sdk-core)).
101
101
 
102
+ ### Reasoning Effort
103
+
104
+ For reasoning-capable models you can control how much effort the model spends
105
+ thinking before responding via `providerOptions.xai.reasoningEffort`. This
106
+ works for both the Responses API (default) and the Chat Completions API
107
+ (`xai.chat()`).
108
+
109
+ ```ts
110
+ import { xai } from '@ai-sdk/xai';
111
+ import { generateText } from 'ai';
112
+
113
+ const { text } = await generateText({
114
+ model: xai('grok-4.3'),
115
+ prompt: 'Explain quantum entanglement.',
116
+ providerOptions: {
117
+ xai: { reasoningEffort: 'medium' },
118
+ },
119
+ });
120
+ ```
121
+
122
+ Supported values:
123
+
124
+ - `'none'` — Disables reasoning entirely; no thinking tokens are used. Best
125
+ for simple use cases that need a near-instant response. Supported by
126
+ `grok-4.3` and newer reasoning models.
127
+ - `'low'` (default) — Uses some reasoning tokens, but still fast. Good for
128
+ general agentic use and tool calling.
129
+ - `'medium'` — More thinking for less-latency-sensitive applications such as
130
+ complex data analysis and long-context reasoning.
131
+ - `'high'` — More reasoning tokens for deeper thinking. Suited for very
132
+ challenging problems, complex math, multi-step logic, and competition-level
133
+ tasks.
134
+
135
+ <Note>
136
+ Not every Grok model accepts every value. See xAI's [reasoning
137
+ docs](https://docs.x.ai/docs/guides/reasoning) for the values supported by
138
+ your selected model. `'none'` requires `grok-4.3` or newer.
139
+ </Note>
140
+
141
+ ## Realtime Models
142
+
143
+ <Note type="warning">Realtime is an experimental feature.</Note>
144
+
145
+ You can create models that call the [xAI Realtime API](https://docs.x.ai/docs/guides/realtime)
146
+ using the `.experimental_realtime()` factory method.
147
+
148
+ ```ts
149
+ import { xai } from '@ai-sdk/xai';
150
+
151
+ const model = xai.experimental_realtime('grok-voice-latest');
152
+ ```
153
+
154
+ Realtime sessions run in the browser and require a short-lived token created on
155
+ your server with `xai.experimental_realtime.getToken()`:
156
+
157
+ ```ts
158
+ const token = await xai.experimental_realtime.getToken({
159
+ model: 'grok-voice-latest',
160
+ });
161
+ ```
162
+
163
+ See [Realtime](/docs/ai-sdk-core/realtime) for the complete setup and tool
164
+ calling pattern.
165
+
102
166
  ## Responses API (Agentic Tools)
103
167
 
104
168
  The xAI Responses API is the default when using `xai(modelId)`. You can also use `xai.responses(modelId)` explicitly. This enables the model to autonomously orchestrate tool calls and research on xAI's servers.
105
169
 
106
170
  ```ts
107
- const model = xai.responses('grok-4-fast-non-reasoning');
171
+ const model = xai.responses('grok-4.20-non-reasoning');
108
172
  ```
109
173
 
110
174
  The Responses API provides server-side tools that the model can autonomously execute during its reasoning process:
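
Note on the hunk above: the new Reasoning Effort section states that `'none'` is only accepted by `grok-4.3` and newer reasoning models, and that `'low'` is the default. A minimal sketch of guarding that constraint before a call could look like the following. `resolveReasoningEffort` and `MODELS_SUPPORTING_NONE` are hypothetical names used for illustration and are not part of `@ai-sdk/xai`; the allowlist is an assumption the caller would maintain from xAI's reasoning docs.

```ts
// Hypothetical helper, not part of @ai-sdk/xai.
type ReasoningEffort = 'none' | 'low' | 'medium' | 'high';

// Assumption: the caller keeps this list in sync with xAI's reasoning docs.
const MODELS_SUPPORTING_NONE: ReadonlySet<string> = new Set(['grok-4.3']);

// Returns the requested effort, falling back to the documented default
// ('low') when 'none' is requested for a model that is not on the allowlist.
function resolveReasoningEffort(
  modelId: string,
  requested: ReasoningEffort,
): ReasoningEffort {
  if (requested === 'none' && !MODELS_SUPPORTING_NONE.has(modelId)) {
    return 'low';
  }
  return requested;
}

// The result has the same shape as the `providerOptions` shown in the hunk.
const providerOptions = {
  xai: { reasoningEffort: resolveReasoningEffort('grok-3-mini', 'none') },
};
```

Falling back rather than throwing is a design choice of this sketch; a stricter caller might prefer to surface an error instead.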
@@ -132,7 +196,11 @@ const { text } = await generateText({
132
196
  role: 'user',
133
197
  content: [
134
198
  { type: 'text', text: 'What do you see in this image?' },
135
- { type: 'image', image: fs.readFileSync('./image.png') },
199
+ {
200
+ type: 'file',
201
+ mediaType: 'image',
202
+ data: fs.readFileSync('./image.png'),
203
+ },
136
204
  ],
137
205
  },
138
206
  ],
@@ -148,7 +216,7 @@ import { xai } from '@ai-sdk/xai';
148
216
  import { generateText } from 'ai';
149
217
 
150
218
  const { text, sources } = await generateText({
151
- model: xai.responses('grok-4-fast-non-reasoning'),
219
+ model: xai.responses('grok-4.20-non-reasoning'),
152
220
  prompt: 'What are the latest developments in AI?',
153
221
  tools: {
154
222
  web_search: xai.tools.webSearch({
@@ -172,9 +240,13 @@ console.log('Citations:', sources);
172
240
 
173
241
  Exclude specified domains from search (max 5). Cannot be used with `allowedDomains`.
174
242
 
243
+ - **enableImageSearch** _boolean_
244
+
245
+ Allow the model to perform image search as a separate Web Search mode when images are useful for the answer. The model can choose between regular web search and image search; responses can include Markdown image embeds when relevant.
246
+
175
247
  - **enableImageUnderstanding** _boolean_
176
248
 
177
- Enable the model to view and analyze images found during search. Increases token usage.
249
+ Enable the model to view and analyze images found during search. Increases token usage. To allow explicitly searching for images, use `enableImageSearch`.
178
250
 
179
251
  ### X Search Tool
180
252
 
@@ -182,7 +254,7 @@ The X search tool enables searching X (Twitter) for posts, with filtering by han
182
254
 
183
255
  ```ts
184
256
  const { text, sources } = await generateText({
185
- model: xai.responses('grok-4-fast-non-reasoning'),
257
+ model: xai.responses('grok-4.20-non-reasoning'),
186
258
  prompt: 'What are people saying about AI on X this week?',
187
259
  tools: {
188
260
  x_search: xai.tools.xSearch({
@@ -228,7 +300,7 @@ The code execution tool enables the model to write and execute Python code for c
228
300
 
229
301
  ```ts
230
302
  const { text } = await generateText({
231
- model: xai.responses('grok-4-fast-non-reasoning'),
303
+ model: xai.responses('grok-4.20-non-reasoning'),
232
304
  prompt:
233
305
  'Calculate the compound interest for $10,000 at 5% annually for 10 years',
234
306
  tools: {
@@ -243,7 +315,7 @@ The view image tool enables the model to view and analyze images:
243
315
 
244
316
  ```ts
245
317
  const { text } = await generateText({
246
- model: xai.responses('grok-4-fast-non-reasoning'),
318
+ model: xai.responses('grok-4.20-non-reasoning'),
247
319
  prompt: 'Describe what you see in the image',
248
320
  tools: {
249
321
  view_image: xai.tools.viewImage(),
@@ -257,7 +329,7 @@ The view X video tool enables the model to view and analyze videos from X (Twitt
257
329
 
258
330
  ```ts
259
331
  const { text } = await generateText({
260
- model: xai.responses('grok-4-fast-non-reasoning'),
332
+ model: xai.responses('grok-4.20-non-reasoning'),
261
333
  prompt: 'Summarize the content of this X video',
262
334
  tools: {
263
335
  view_x_video: xai.tools.viewXVideo(),
@@ -271,7 +343,7 @@ The MCP server tool enables the model to connect to remote [Model Context Protoc
271
343
 
272
344
  ```ts
273
345
  const { text } = await generateText({
274
- model: xai.responses('grok-4-fast-non-reasoning'),
346
+ model: xai.responses('grok-4.20-non-reasoning'),
275
347
  prompt: 'Use the weather tool to check conditions in San Francisco',
276
348
  tools: {
277
349
  weather_server: xai.tools.mcpServer({
@@ -319,7 +391,7 @@ import { xai, type XaiLanguageModelResponsesOptions } from '@ai-sdk/xai';
319
391
  import { streamText } from 'ai';
320
392
 
321
393
  const result = streamText({
322
- model: xai.responses('grok-4-1-fast-reasoning'),
394
+ model: xai.responses('grok-4.20-reasoning'),
323
395
  prompt: 'What documents do you have access to?',
324
396
  tools: {
325
397
  file_search: xai.tools.fileSearch({
@@ -352,8 +424,8 @@ const result = streamText({
352
424
  Include file search results in the response. When set to `['file_search_call.results']`, the response will contain the actual search results with file content and scores.
353
425
 
354
426
  <Note>
355
- File search requires grok-4 family models and the Responses API. Vector stores
356
- can be created using the [xAI
427
+ File search requires grok-4 family models (including grok-4.20) and the
428
+ Responses API. Vector stores can be created using the [xAI
357
429
  API](https://docs.x.ai/docs/guides/using-collections/api).
358
430
  </Note>
359
431
 
@@ -365,8 +437,8 @@ You can combine multiple server-side tools for comprehensive research:
365
437
  import { xai } from '@ai-sdk/xai';
366
438
  import { streamText } from 'ai';
367
439
 
368
- const { fullStream } = streamText({
369
- model: xai.responses('grok-4-fast-non-reasoning'),
440
+ const { stream } = streamText({
441
+ model: xai.responses('grok-4.20-non-reasoning'),
370
442
  prompt: 'Research AI safety developments and calculate risk metrics',
371
443
  tools: {
372
444
  web_search: xai.tools.webSearch(),
@@ -382,7 +454,7 @@ const { fullStream } = streamText({
382
454
  },
383
455
  });
384
456
 
385
- for await (const part of fullStream) {
457
+ for await (const part of stream) {
386
458
  if (part.type === 'text-delta') {
387
459
  process.stdout.write(part.text);
388
460
  } else if (part.type === 'source' && part.sourceType === 'url') {
@@ -400,7 +472,7 @@ import { xai, type XaiLanguageModelResponsesOptions } from '@ai-sdk/xai';
400
472
  import { generateText } from 'ai';
401
473
 
402
474
  const result = await generateText({
403
- model: xai.responses('grok-4-fast-non-reasoning'),
475
+ model: xai.responses('grok-4.20-non-reasoning'),
404
476
  providerOptions: {
405
477
  xai: {
406
478
  reasoningEffort: 'high',
@@ -412,9 +484,9 @@ const result = await generateText({
412
484
 
413
485
  The following provider options are available:
414
486
 
415
- - **reasoningEffort** _'low' | 'medium' | 'high'_
487
+ - **reasoningEffort** _'none' | 'low' | 'medium' | 'high'_
416
488
 
417
- Control the reasoning effort for the model. Higher effort may produce more thorough results at the cost of increased latency and token usage.
489
+ Control the reasoning effort for the model. See [Reasoning Effort](#reasoning-effort) for details on each value. `'none'` disables reasoning entirely (requires `grok-4.3` or newer).
418
490
 
419
491
  - **logprobs** _boolean_
420
492
 
@@ -445,19 +517,16 @@ The following provider options are available:
445
517
 
446
518
  | Model | Image Input | Object Generation | Tool Usage | Tool Streaming | Reasoning |
447
519
  | ----------------------------- | ------------------- | ------------------- | ------------------- | ------------------- | ------------------- |
448
- | `grok-4-1` | <Cross size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> | <Cross size={18} /> |
520
+ | `grok-4.20-reasoning` | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> |
521
+ | `grok-4.20-non-reasoning` | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> | <Cross size={18} /> |
449
522
  | `grok-4-1-fast-reasoning` | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> |
450
523
  | `grok-4-1-fast-non-reasoning` | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> | <Cross size={18} /> |
451
- | `grok-4-fast-non-reasoning` | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> | <Cross size={18} /> |
524
+ | `grok-4-1` | <Cross size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> | <Cross size={18} /> |
452
525
  | `grok-4-fast-reasoning` | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> |
526
+ | `grok-4-fast-non-reasoning` | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> | <Cross size={18} /> |
453
527
  | `grok-code-fast-1` | <Cross size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> |
454
- | `grok-4` | <Cross size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> | <Cross size={18} /> |
455
- | `grok-4-0709` | <Cross size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> | <Cross size={18} /> |
456
- | `grok-4-latest` | <Cross size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> | <Cross size={18} /> |
457
528
  | `grok-3` | <Cross size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> | <Cross size={18} /> |
458
- | `grok-3-latest` | <Cross size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> | <Cross size={18} /> |
459
529
  | `grok-3-mini` | <Cross size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> |
460
- | `grok-3-mini-latest` | <Cross size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> |
461
530
 
462
531
  <Note>
463
532
  The table above lists popular models. Please see the [xAI
@@ -465,6 +534,198 @@ The following provider options are available:
465
534
  can also pass any available provider model ID as a string if needed.
466
535
  </Note>
467
536
 
537
+ ## Speech Models
538
+
539
+ You can create models that call the [xAI Text to Speech API](https://docs.x.ai/developers/model-capabilities/audio/text-to-speech)
540
+ using the `.speech()` factory method. xAI's text-to-speech endpoint does not
541
+ require a model identifier.
542
+
543
+ ```ts
544
+ const model = xai.speech();
545
+ ```
546
+
547
+ Use xAI speech models with the `generateSpeech` function:
548
+
549
+ ```ts
550
+ import { xai } from '@ai-sdk/xai';
551
+ import { generateSpeech } from 'ai';
552
+
553
+ const result = await generateSpeech({
554
+ model: xai.speech(),
555
+ text: 'Hello from the AI SDK!',
556
+ voice: 'ara',
557
+ language: 'en',
558
+ outputFormat: 'mp3',
559
+ speed: 1.1,
560
+ });
561
+ ```
562
+
563
+ ### Supported Parameters
564
+
565
+ - **text** _string_ (required)
566
+
567
+ Text to synthesize. You can include xAI speech tags such as `[pause]`,
568
+ `[laugh]`, or `<whisper>...</whisper>` directly in the text.
569
+
570
+ - **voice** _string_
571
+
572
+ The xAI voice ID. Defaults to `'eve'`. Built-in voice IDs are `'eve'`,
573
+ `'ara'`, `'rex'`, `'sal'`, and `'leo'`; custom voice IDs are also accepted.
574
+
575
+ - **language** _string_
576
+
577
+ A BCP-47 language code or `'auto'` for automatic detection. Defaults to
578
+ `'auto'` when omitted.
579
+
580
+ - **speed** _number_
581
+
582
+ Speech rate multiplier from `0.7` to `1.5`.
583
+
584
+ - **outputFormat** _string_
585
+
586
+ The audio codec to generate. Supported values are `'mp3'`, `'wav'`, `'pcm'`,
587
+ `'mulaw'`, and `'alaw'`. Defaults to `'mp3'`.
588
+
589
+ <Note>
590
+ xAI does not expose a separate `instructions` field for text-to-speech. Use
591
+ speech tags in `text` to control expressive delivery.
592
+ </Note>
593
+
594
+ ### Provider Options
595
+
596
+ You can pass xAI-specific controls using `providerOptions.xai`:
597
+
598
+ ```ts
599
+ import { xai, type XaiSpeechModelOptions } from '@ai-sdk/xai';
600
+ import { generateSpeech } from 'ai';
601
+
602
+ const result = await generateSpeech({
603
+ model: xai.speech(),
604
+ text: 'A high fidelity narration sample.',
605
+ outputFormat: 'mp3',
606
+ providerOptions: {
607
+ xai: {
608
+ sampleRate: 44100,
609
+ bitRate: 192000,
610
+ optimizeStreamingLatency: 1,
611
+ textNormalization: true,
612
+ } satisfies XaiSpeechModelOptions,
613
+ },
614
+ });
615
+ ```
616
+
617
+ - **sampleRate** _8000 | 16000 | 22050 | 24000 | 44100 | 48000_
618
+
619
+ Sample rate of the output audio in Hz.
620
+
621
+ - **bitRate** _32000 | 64000 | 96000 | 128000 | 192000_
622
+
623
+ MP3 bit rate in bits per second. Only applies when `outputFormat` is
624
+ `'mp3'`.
625
+
626
+ - **optimizeStreamingLatency** _0 | 1 | 2_
627
+
628
+ Latency optimization level. Higher values reduce time to first audio with a
629
+ quality tradeoff at chunk boundaries.
630
+
631
+ - **textNormalization** _boolean_
632
+
633
+ Whether to normalize written-form input text before synthesizing speech.
634
+
635
+ ### Model Capabilities
636
+
637
+ | Model | Language | Speed | Output Formats |
638
+ | --------- | ------------------- | ------------------- | -------------------------- |
639
+ | `default` | <Check size={18} /> | <Check size={18} /> | mp3, wav, pcm, mulaw, alaw |
640
+
641
+ ## Transcription Models
642
+
643
+ You can create models that call the [xAI Speech to Text API](https://docs.x.ai/developers/model-capabilities/audio/speech-to-text)
644
+ using the `.transcription()` factory method. xAI's batch transcription endpoint
645
+ does not require a model identifier.
646
+
647
+ ```ts
648
+ const model = xai.transcription();
649
+ ```
650
+
651
+ Use xAI transcription models with the `transcribe` function:
652
+
653
+ ```ts
654
+ import { xai } from '@ai-sdk/xai';
655
+ import { transcribe } from 'ai';
656
+ import { readFile } from 'fs/promises';
657
+
658
+ const result = await transcribe({
659
+ model: xai.transcription(),
660
+ audio: await readFile('meeting.mp3'),
661
+ });
662
+ ```
663
+
664
+ ### Provider Options
665
+
666
+ You can pass xAI-specific controls using `providerOptions.xai`:
667
+
668
+ ```ts
669
+ import { xai, type XaiTranscriptionModelOptions } from '@ai-sdk/xai';
670
+ import { transcribe } from 'ai';
671
+ import { readFile } from 'fs/promises';
672
+
673
+ const result = await transcribe({
674
+ model: xai.transcription(),
675
+ audio: await readFile('meeting.mp3'),
676
+ providerOptions: {
677
+ xai: {
678
+ language: 'en',
679
+ format: true,
680
+ keyterm: ['AI SDK', 'Grok'],
681
+ diarize: true,
682
+ } satisfies XaiTranscriptionModelOptions,
683
+ },
684
+ });
685
+ ```
686
+
687
+ - **audioFormat** _`pcm` | `mulaw` | `alaw`_
688
+
689
+ Audio encoding for raw, headerless input audio.
690
+
691
+ - **sampleRate** _8000 | 16000 | 22050 | 24000 | 44100 | 48000_
692
+
693
+ Sample rate of the input audio in Hz.
694
+
695
+ - **language** _string_
696
+
697
+ Language code used for inverse text normalization.
698
+
699
+ - **format** _boolean_
700
+
701
+ Enables inverse text normalization. Requires `language`.
702
+
703
+ - **multichannel** _boolean_
704
+
705
+ Enables per-channel transcription for interleaved multichannel audio.
706
+
707
+ - **channels** _2 | 3 | 4 | 5 | 6 | 7 | 8_
708
+
709
+ Number of interleaved audio channels.
710
+
711
+ - **diarize** _boolean_
712
+
713
+ Enables speaker diarization.
714
+
715
+ - **keyterm** _string | string[]_
716
+
717
+ One or more terms to bias transcription toward.
718
+
719
+ - **fillerWords** _boolean_
720
+
721
+ Includes filler words such as `uh` and `um` in the transcript.
722
+
723
+ ### Model Capabilities
724
+
725
+ | Model | Word Timestamps | Diarization | Multichannel |
726
+ | --------- | ------------------- | ------------------- | ------------------- |
727
+ | `default` | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> |
728
+
468
729
  ## Image Models
469
730
 
470
731
  You can create xAI image models using the `.image()` factory method. For more on image generation with the AI SDK see [generateImage()](/docs/reference/ai-sdk-core/generate-image).
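
Note on the hunk above: the added Speech and Transcription sections document a few value constraints (`speed` from `0.7` to `1.5`, `bitRate` applying only to `'mp3'` output, and `format` requiring `language`). A minimal sketch of checking them before a request is shown below. The interfaces and the functions `checkSpeechRequest` and `checkTranscriptionOptions` are hypothetical and not part of `@ai-sdk/xai`; how the provider itself reacts to violations is not stated in the diff, so reporting them as problems is an assumption of this sketch.

```ts
// Hypothetical pre-flight checks, not part of @ai-sdk/xai.
interface SpeechRequestSketch {
  speed?: number;
  outputFormat?: 'mp3' | 'wav' | 'pcm' | 'mulaw' | 'alaw';
  bitRate?: 32000 | 64000 | 96000 | 128000 | 192000;
}

interface TranscriptionOptionsSketch {
  language?: string;
  format?: boolean;
}

// Collects human-readable problems instead of throwing on the first one.
function checkSpeechRequest(req: SpeechRequestSketch): string[] {
  const problems: string[] = [];
  if (req.speed !== undefined && (req.speed < 0.7 || req.speed > 1.5)) {
    problems.push('speed must be between 0.7 and 1.5');
  }
  // The docs state that outputFormat defaults to 'mp3' when omitted.
  const format = req.outputFormat ?? 'mp3';
  if (req.bitRate !== undefined && format !== 'mp3') {
    problems.push("bitRate only applies when outputFormat is 'mp3'");
  }
  return problems;
}

// The docs state that `format` (inverse text normalization) requires `language`.
function checkTranscriptionOptions(
  options: TranscriptionOptionsSketch,
): string[] {
  return options.format === true && options.language === undefined
    ? ['format requires language']
    : [];
}
```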
@@ -516,7 +777,7 @@ const { images } = await generateImage({
516
777
 
517
778
  #### Multi-Image Editing
518
779
 
519
- Combine or reference multiple input images (up to 3) in the prompt:
780
+ Combine or reference multiple input images in the prompt:
520
781
 
521
782
  ```ts
522
783
  import { xai } from '@ai-sdk/xai';
@@ -554,37 +815,53 @@ const { images } = await generateImage({
554
815
 
555
816
  <Note>
556
817
  Input images can be provided as `Buffer`, `ArrayBuffer`, `Uint8Array`, or
557
- base64-encoded strings. Up to 3 input images are supported per request.
818
+ base64-encoded strings.
558
819
  </Note>
559
820
 
560
- ### Model-specific options
821
+ ### Image Provider Options
561
822
 
562
- You can customize the image generation behavior with model-specific settings:
823
+ You can customize the image generation behavior with provider-specific settings via `providerOptions.xai`:
563
824
 
564
825
  ```ts
565
- import { xai } from '@ai-sdk/xai';
826
+ import { xai, type XaiImageModelOptions } from '@ai-sdk/xai';
566
827
  import { generateImage } from 'ai';
567
828
 
568
829
  const { images } = await generateImage({
569
- model: xai.image('grok-imagine-image'),
830
+ model: xai.image('grok-imagine-image-pro'),
570
831
  prompt: 'A futuristic cityscape at sunset',
571
832
  aspectRatio: '16:9',
572
- n: 2,
833
+ providerOptions: {
834
+ xai: {
835
+ resolution: '2k',
836
+ quality: 'high',
837
+ } satisfies XaiImageModelOptions,
838
+ },
573
839
  });
574
840
  ```
575
841
 
576
- ### Model Capabilities
842
+ - **resolution** _'1k' | '2k'_
843
+
844
+ Output resolution. `1k` produces ~1024×1024 images, `2k` produces ~2048×2048
845
+ images (actual dimensions vary based on aspect ratio). Available for
846
+ `grok-imagine-image-pro`.
847
+
848
+ - **quality** _'low' | 'medium' | 'high'_
577
849
 
578
- | Model | Aspect Ratios | Image Editing |
579
- | -------------------- | ----------------------------------------------------------------------------------------------------------- | ------------------- |
580
- | `grok-imagine-image` | `1:1`, `16:9`, `9:16`, `4:3`, `3:4`, `3:2`, `2:3`, `2:1`, `1:2`, `19.5:9`, `9:19.5`, `20:9`, `9:20`, `auto` | <Check size={18} /> |
850
+ Image quality level. Higher quality may increase generation time.
851
+
852
+ ### Image Model Capabilities
853
+
854
+ | Model | Resolution | Aspect Ratios | Image Editing |
855
+ | ------------------------ | ---------- | ----------------------------------------------------------------------------------------------------------- | ------------------- |
856
+ | `grok-imagine-image-pro` | `1k`, `2k` | `1:1`, `16:9`, `9:16`, `4:3`, `3:4`, `3:2`, `2:3`, `2:1`, `1:2`, `19.5:9`, `9:19.5`, `20:9`, `9:20`, `auto` | <Check size={18} /> |
857
+ | `grok-imagine-image` | `1k` | `1:1`, `16:9`, `9:16`, `4:3`, `3:4`, `3:2`, `2:3`, `2:1`, `1:2`, `19.5:9`, `9:19.5`, `20:9`, `9:20`, `auto` | <Check size={18} /> |
581
858
 
582
859
  ## Video Models
583
860
 
584
861
  You can create xAI video models using the `.video()` factory method.
585
862
  For more on video generation with the AI SDK see [generateVideo()](/docs/reference/ai-sdk-core/generate-video).
586
863
 
587
- This provider supports three video generation modes: text-to-video, image-to-video, and video editing.
864
+ This provider supports standard video generation from text prompts or image input, plus explicit video editing, video extension, and reference-to-video (R2V) operations.
588
865
 
589
866
  ### Text-to-Video
590
867
 
@@ -594,7 +871,7 @@ Generate videos from text prompts:
594
871
  import { xai, type XaiVideoModelOptions } from '@ai-sdk/xai';
595
872
  import { experimental_generateVideo as generateVideo } from 'ai';
596
873
 
597
- const { videos } = await generateVideo({
874
+ const { video } = await generateVideo({
598
875
  model: xai.video('grok-imagine-video'),
599
876
  prompt: 'A chicken flying into the sunset in the style of 90s anime.',
600
877
  aspectRatio: '16:9',
@@ -607,15 +884,15 @@ const { videos } = await generateVideo({
607
884
  });
608
885
  ```
609
886
 
610
- ### Image-to-Video
887
+ ### Generation with Image Input
611
888
 
612
- Generate videos using an image as the starting frame with an optional text prompt:
889
+ Generate videos using an image as the starting frame with an optional text prompt. This uses the standard generation path rather than a separate provider mode:
613
890
 
614
891
  ```ts
615
892
  import { xai, type XaiVideoModelOptions } from '@ai-sdk/xai';
616
893
  import { experimental_generateVideo as generateVideo } from 'ai';
617
894
 
618
- const { videos } = await generateVideo({
895
+ const { video } = await generateVideo({
619
896
  model: xai.video('grok-imagine-video'),
620
897
  prompt: {
621
898
  image: 'https://example.com/start-frame.png',
@@ -638,11 +915,12 @@ Edit an existing video using a text prompt by providing a source video URL via p
638
915
  import { xai, type XaiVideoModelOptions } from '@ai-sdk/xai';
639
916
  import { experimental_generateVideo as generateVideo } from 'ai';
640
917
 
641
- const { videos } = await generateVideo({
918
+ const { video } = await generateVideo({
642
919
  model: xai.video('grok-imagine-video'),
643
920
  prompt: 'Give the person sunglasses and a hat',
644
921
  providerOptions: {
645
922
  xai: {
923
+ mode: 'edit-video',
646
924
  videoUrl: 'https://example.com/source-video.mp4',
647
925
  pollTimeoutMs: 600000, // 10 minutes
648
926
  } satisfies XaiVideoModelOptions,
@@ -668,6 +946,7 @@ import { experimental_generateVideo as generateVideo } from 'ai';
668
946
 
669
947
  const providerOptions = {
670
948
  xai: {
949
+ mode: 'edit-video',
671
950
  videoUrl: 'https://example.com/source-video.mp4',
672
951
  pollTimeoutMs: 600000,
673
952
  } satisfies XaiVideoModelOptions,
@@ -689,19 +968,107 @@ const [withSunglasses, withScarf] = await Promise.all([
689
968
  model: xai.video('grok-imagine-video'),
690
969
  prompt: 'Add sunglasses',
691
970
  providerOptions: {
692
- xai: { videoUrl: step1VideoUrl, pollTimeoutMs: 600000 },
971
+ xai: {
972
+ mode: 'edit-video',
973
+ videoUrl: step1VideoUrl,
974
+ pollTimeoutMs: 600000,
975
+ },
693
976
  },
694
977
  }),
695
978
  generateVideo({
696
979
  model: xai.video('grok-imagine-video'),
697
980
  prompt: 'Add a scarf',
698
981
  providerOptions: {
699
- xai: { videoUrl: step1VideoUrl, pollTimeoutMs: 600000 },
982
+ xai: {
983
+ mode: 'edit-video',
984
+ videoUrl: step1VideoUrl,
985
+ pollTimeoutMs: 600000,
986
+ },
700
987
  },
701
988
  }),
702
989
  ]);
703
990
  ```
704
991
 
992
+ ### Video Extension
993
+
994
+ Extend an existing video from its last frame. The `duration` controls the length of the extension only, not the total output. The output inherits `aspectRatio` and `resolution` from the source video.
995
+
996
+ ```ts
997
+ import { xai, type XaiVideoModelOptions } from '@ai-sdk/xai';
998
+ import { experimental_generateVideo as generateVideo } from 'ai';
999
+
1000
+ // Step 1: Generate a source video
1001
+ const source = await generateVideo({
1002
+ model: xai.video('grok-imagine-video'),
1003
+ prompt: 'A cat sitting on a sunlit windowsill, tail gently swishing.',
1004
+ duration: 5,
1005
+ aspectRatio: '16:9',
1006
+ providerOptions: {
1007
+ xai: {
1008
+ pollTimeoutMs: 600000,
1009
+ } satisfies XaiVideoModelOptions,
1010
+ },
1011
+ });
1012
+
1013
+ const sourceUrl = source.providerMetadata?.xai?.videoUrl as string;
1014
+
1015
+ // Step 2: Extend the video with a new scene
1016
+ const extended = await generateVideo({
1017
+ model: xai.video('grok-imagine-video'),
1018
+ prompt: 'The cat turns its head, notices a butterfly, and leaps off.',
1019
+ duration: 6,
1020
+ providerOptions: {
1021
+ xai: {
1022
+ mode: 'extend-video',
1023
+ videoUrl: sourceUrl,
1024
+ pollTimeoutMs: 600000,
1025
+ } satisfies XaiVideoModelOptions,
1026
+ },
1027
+ });
1028
+ ```
1029
+
1030
+ <Note>
1031
+ Video extension does not support custom `aspectRatio` or `resolution` — the
1032
+ output inherits those from the source video. `duration` is supported and
1033
+ controls how long the extension is (not the total video length).
1034
+ </Note>
1035
+
1036
+ ### Reference-to-Video (R2V)
1037
+
1038
+ Provide reference images to guide the video's style and content. Unlike image-to-video, reference images are not used as the first frame — the model incorporates their visual elements into the generated video. Each reference image can be a public HTTPS URL or a base64 data URI.
1039
+
1040
+ ```ts
1041
+ import { xai, type XaiVideoModelOptions } from '@ai-sdk/xai';
1042
+ import { experimental_generateVideo as generateVideo } from 'ai';
1043
+
1044
+ const { video } = await generateVideo({
1045
+ model: xai.video('grok-imagine-video'),
1046
+ prompt:
1047
+ 'The comic cat from <IMAGE_1> and the comic dog from <IMAGE_2> ' +
1048
+ 'are having a playful chase through a sunlit park. ' +
1049
+ 'Cinematic slow-motion, warm afternoon light.',
1050
+ duration: 8,
1051
+ aspectRatio: '16:9',
1052
+ providerOptions: {
1053
+ xai: {
1054
+ mode: 'reference-to-video',
1055
+ referenceImageUrls: [
1056
+ 'https://example.com/comic-cat.png',
1057
+ 'https://example.com/comic-dog.png',
1058
+ ],
1059
+ pollTimeoutMs: 600000,
1060
+ } satisfies XaiVideoModelOptions,
1061
+ },
1062
+ });
1063
+ ```
1064
+
1065
+ Use `<IMAGE_1>`, `<IMAGE_2>`, etc. in your prompt to reference specific images. Up to 7 reference images are supported per request.
1066
+
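The placeholder convention above can be checked before a request is sent. The following is a minimal sketch, assuming a hypothetical helper named `checkReferencePrompt` that is not part of `@ai-sdk/xai`. It only verifies that each `<IMAGE_n>` placeholder points at an entry in `referenceImageUrls` and that the documented limit of 1 to 7 images is respected.

```ts
// Hypothetical helper for illustration; not an @ai-sdk/xai export.
const MAX_REFERENCE_IMAGES = 7; // limit stated in the documentation above

function checkReferencePrompt(
  prompt: string,
  referenceImageUrls: string[],
): string[] {
  const problems: string[] = [];
  const count = referenceImageUrls.length;

  if (count < 1 || count > MAX_REFERENCE_IMAGES) {
    problems.push(
      `expected 1-${MAX_REFERENCE_IMAGES} reference images, got ${count}`,
    );
  }

  // Find every <IMAGE_n> placeholder and check that image n was supplied.
  const placeholder = /<IMAGE_(\d+)>/g;
  let match: RegExpExecArray | null;
  while ((match = placeholder.exec(prompt)) !== null) {
    const index = Number(match[1]);
    if (index < 1 || index > count) {
      problems.push(`${match[0]} has no matching reference image`);
    }
  }

  return problems;
}
```

For example, `checkReferencePrompt('<IMAGE_3> runs', [catUrl, dogUrl])` returns one problem, because only two images were supplied.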
1067
+ <Note>
1068
+ Reference-to-video supports `duration`, `aspectRatio`, and `resolution`. Use
1069
+ `mode` to select the operation; the modes are mutually exclusive.
1070
+ </Note>
1071
+
705
1072
  ### Video Provider Options
706
1073
 
707
1074
  The following provider options are available via `providerOptions.xai`.
@@ -721,10 +1088,27 @@ You can validate the provider options using the `XaiVideoModelOptions` type.
721
1088
  `1280x720` maps to `720p` and `854x480` maps to `480p`.
722
1089
  Use this provider option to pass the native format directly.
723
1090
 
1091
+ - **mode** _'edit-video' | 'extend-video' | 'reference-to-video'_
1092
+
1093
+ Selects the video operation explicitly. The modes are mutually exclusive:
1094
+ - `'edit-video'` — edit an existing video (requires `videoUrl`)
1095
+ - `'extend-video'` — extend a video from its last frame (requires `videoUrl`)
1096
+ - `'reference-to-video'` — generate from reference images (requires `referenceImageUrls`)
1097
+
1098
+ When omitted, standard generation is used. Legacy inputs are still auto-detected from fields for backward compatibility.
1099
+
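The pairing of each mode with its required field can be sketched as a small pre-flight check. The `VideoModeOptions` interface and `missingRequiredField` function below are hypothetical, written only for illustration; the actual option type is `XaiVideoModelOptions`, and this sketch does not model the legacy auto-detection path.

```ts
// Illustrative local types; the real option type is XaiVideoModelOptions.
type VideoMode = 'edit-video' | 'extend-video' | 'reference-to-video';

interface VideoModeOptions {
  mode?: VideoMode;
  videoUrl?: string;
  referenceImageUrls?: string[];
}

// Returns the name of the field the selected mode requires but lacks,
// or null when nothing is missing.
function missingRequiredField(options: VideoModeOptions): string | null {
  switch (options.mode) {
    case 'edit-video':
    case 'extend-video':
      return options.videoUrl ? null : 'videoUrl';
    case 'reference-to-video':
      return options.referenceImageUrls &&
        options.referenceImageUrls.length > 0
        ? null
        : 'referenceImageUrls';
    default:
      // mode omitted: standard generation, no extra field required
      return null;
  }
}
```

Calling `missingRequiredField({ mode: 'extend-video' })` returns `'videoUrl'`, while an options object without `mode` returns `null`.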
724
1100
  - **videoUrl** _string_
725
1101
 
726
- URL of a source video for video editing. When provided, the prompt is used
727
- to describe the desired edits to the video.
1102
+ URL of a source video. Used with `mode: 'edit-video'` for video editing
1103
+ and `mode: 'extend-video'` for video extension.
1104
+
1105
+ - **referenceImageUrls** _string[]_
1106
+
1107
+ Array of 1–7 reference images (public HTTPS URLs or base64 data URIs) for
1108
+ reference-to-video (R2V) generation. The model incorporates visual
1109
+ elements from these images without using them as the first frame. Use
1110
+ `<IMAGE_1>`, `<IMAGE_2>`, etc. in the prompt to reference specific
1111
+ images. Used with `mode: 'reference-to-video'`.
728
1112
 
729
1113
  <Note>
730
1114
  Video generation is an asynchronous process that can take several minutes.
@@ -744,14 +1128,21 @@ desired ratio.
744
1128
 
745
1129
  For **video editing**, the output matches the input video's aspect ratio and
746
1130
  resolution. Custom `duration`, `aspectRatio`, and `resolution` are not
747
- supported - the output resolution is capped at 720p (e.g., a 1080p input
1131
+ supported; the output resolution is capped at 720p (e.g., a 1080p input
748
1132
  will be downsized to 720p).
749
1133
 
1134
+ For **video extension**, the output inherits `aspectRatio` and `resolution`
1135
+ from the source video. `duration` is supported and controls only the
1136
+ extension length.
1137
+
1138
+ For **reference-to-video (R2V)**, you can specify `duration`, `aspectRatio`,
1139
+ and `resolution` just like text-to-video.
1140
+
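The per-operation rules above can be summarized as a lookup table. This is a sketch under stated assumptions: `SUPPORTED_SETTINGS` and `unsupportedSettings` are hypothetical names, and `'text-to-video'` is used here as a label for the case where `mode` is omitted. Image-to-video is left out because this section does not describe its constraints.

```ts
// Hypothetical names for illustration; not @ai-sdk/xai exports.
type Operation =
  | 'text-to-video' // label assumed here for "mode omitted"
  | 'edit-video'
  | 'extend-video'
  | 'reference-to-video';

type Setting = 'duration' | 'aspectRatio' | 'resolution';

// Which call settings each operation accepts, per the text above.
const SUPPORTED_SETTINGS: Record<Operation, Setting[]> = {
  'text-to-video': ['duration', 'aspectRatio', 'resolution'],
  'edit-video': [], // output matches the input video
  'extend-video': ['duration'], // duration is the extension length only
  'reference-to-video': ['duration', 'aspectRatio', 'resolution'],
};

// Lists the requested settings that the operation does not support.
function unsupportedSettings(
  operation: Operation,
  requested: Setting[],
): Setting[] {
  const supported = SUPPORTED_SETTINGS[operation];
  return requested.filter(setting => supported.indexOf(setting) === -1);
}
```

For instance, `unsupportedSettings('extend-video', ['duration', 'aspectRatio'])` returns `['aspectRatio']`, which a caller could surface as a warning before submitting the request.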
750
1141
  ### Video Model Capabilities
751
1142
 
752
- | Model | Duration | Aspect Ratios | Resolution | Image-to-Video | Video Editing |
753
- | -------------------- | -------- | ------------------------------------------------- | -------------- | ------------------- | ------------------- |
754
- | `grok-imagine-video` | 1–15s | `1:1`, `16:9`, `9:16`, `4:3`, `3:4`, `3:2`, `2:3` | `480p`, `720p` | <Check size={18} /> | <Check size={18} /> |
1143
+ | Model | Duration | Aspect Ratios | Resolution | Image-to-Video | Editing | Extension | R2V |
1144
+ | -------------------- | -------- | ------------------------------------------------- | -------------- | ------------------- | ------------------- | ------------------- | ------------------- |
1145
+ | `grok-imagine-video` | 1–15s | `1:1`, `16:9`, `9:16`, `4:3`, `3:4`, `3:2`, `2:3` | `480p`, `720p` | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> |
755
1146
 
756
1147
  <Note>
757
1148
  You can also pass any available provider model ID as a string if needed.