@ai-sdk/xai 4.0.0-beta.7 → 4.0.0-beta.75
- package/CHANGELOG.md +660 -9
- package/README.md +2 -0
- package/dist/index.d.ts +213 -68
- package/dist/index.js +2074 -781
- package/dist/index.js.map +1 -1
- package/docs/01-xai.mdx +445 -54
- package/package.json +15 -15
- package/src/convert-to-xai-chat-messages.ts +48 -27
- package/src/convert-xai-chat-usage.ts +3 -3
- package/src/files/xai-files-api.ts +16 -0
- package/src/files/xai-files-options.ts +19 -0
- package/src/files/xai-files.ts +94 -0
- package/src/index.ts +9 -4
- package/src/map-xai-finish-reason.ts +2 -2
- package/src/realtime/index.ts +2 -0
- package/src/realtime/xai-realtime-event-mapper.ts +399 -0
- package/src/realtime/xai-realtime-model-options.ts +3 -0
- package/src/realtime/xai-realtime-model.ts +101 -0
- package/src/remove-additional-properties.ts +24 -0
- package/src/responses/convert-to-xai-responses-input.ts +100 -23
- package/src/responses/convert-xai-responses-usage.ts +3 -3
- package/src/responses/map-xai-responses-finish-reason.ts +3 -2
- package/src/responses/xai-responses-api.ts +31 -1
- package/src/responses/{xai-responses-options.ts → xai-responses-language-model-options.ts} +12 -7
- package/src/responses/xai-responses-language-model.ts +157 -60
- package/src/responses/xai-responses-prepare-tools.ts +10 -8
- package/src/tool/code-execution.ts +2 -2
- package/src/tool/file-search.ts +2 -2
- package/src/tool/mcp-server.ts +2 -2
- package/src/tool/view-image.ts +2 -2
- package/src/tool/view-x-video.ts +2 -2
- package/src/tool/web-search.ts +4 -2
- package/src/tool/x-search.ts +2 -2
- package/src/{xai-chat-options.ts → xai-chat-language-model-options.ts} +28 -13
- package/src/xai-chat-language-model.ts +65 -29
- package/src/xai-chat-prompt.ts +2 -1
- package/src/xai-error.ts +13 -3
- package/src/xai-image-model.ts +28 -11
- package/src/xai-prepare-tools.ts +9 -8
- package/src/xai-provider.ts +115 -19
- package/src/xai-speech-model-options.ts +55 -0
- package/src/xai-speech-model.ts +167 -0
- package/src/xai-transcription-model-options.ts +70 -0
- package/src/xai-transcription-model.ts +166 -0
- package/src/xai-video-model-options.ts +145 -0
- package/src/xai-video-model.ts +129 -22
- package/dist/index.d.mts +0 -377
- package/dist/index.mjs +0 -3070
- package/dist/index.mjs.map +0 -1
- package/src/xai-video-options.ts +0 -23
- package/src/{xai-image-options.ts → xai-image-model-options.ts} +0 -0
package/docs/01-xai.mdx
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
---
|
|
2
2
|
title: xAI Grok
|
|
3
|
-
description: Learn how to use xAI Grok.
|
|
3
|
+
description: Learn how to use xAI Grok and Imagine.
|
|
4
4
|
---
|
|
5
5
|
|
|
6
6
|
# xAI Grok Provider
|
|
@@ -73,10 +73,10 @@ You can use the following optional settings to customize the xAI provider instan
|
|
|
73
73
|
## Language Models
|
|
74
74
|
|
|
75
75
|
You can create [xAI models](https://console.x.ai) using a provider instance. The
|
|
76
|
-
first argument is the model id, e.g. `grok-
|
|
76
|
+
first argument is the model id, e.g. `grok-4.20-non-reasoning`.
|
|
77
77
|
|
|
78
78
|
```ts
|
|
79
|
-
const model = xai('grok-
|
|
79
|
+
const model = xai('grok-4.20-non-reasoning');
|
|
80
80
|
```
|
|
81
81
|
|
|
82
82
|
By default, `xai(modelId)` uses the Responses API. To use the [Chat Completions API](https://docs.x.ai/docs/api-reference#chat-completions) (legacy), use `xai.chat(modelId)`.
|
|
@@ -90,7 +90,7 @@ import { xai } from '@ai-sdk/xai';
|
|
|
90
90
|
import { generateText } from 'ai';
|
|
91
91
|
|
|
92
92
|
const { text } = await generateText({
|
|
93
|
-
model: xai('grok-
|
|
93
|
+
model: xai('grok-4.20-non-reasoning'),
|
|
94
94
|
prompt: 'Write a vegetarian lasagna recipe for 4 people.',
|
|
95
95
|
});
|
|
96
96
|
```
|
|
@@ -99,12 +99,76 @@ xAI language models can also be used in the `streamText` function
|
|
|
99
99
|
and support structured data generation with [`Output`](/docs/reference/ai-sdk-core/output)
|
|
100
100
|
(see [AI SDK Core](/docs/ai-sdk-core)).
|
|
101
101
|
|
|
102
|
+
### Reasoning Effort
|
|
103
|
+
|
|
104
|
+
For reasoning-capable models you can control how much effort the model spends
|
|
105
|
+
thinking before responding via `providerOptions.xai.reasoningEffort`. This
|
|
106
|
+
works for both the Responses API (default) and the Chat Completions API
|
|
107
|
+
(`xai.chat()`).
|
|
108
|
+
|
|
109
|
+
```ts
|
|
110
|
+
import { xai } from '@ai-sdk/xai';
|
|
111
|
+
import { generateText } from 'ai';
|
|
112
|
+
|
|
113
|
+
const { text } = await generateText({
|
|
114
|
+
model: xai('grok-4.3'),
|
|
115
|
+
prompt: 'Explain quantum entanglement.',
|
|
116
|
+
providerOptions: {
|
|
117
|
+
xai: { reasoningEffort: 'medium' },
|
|
118
|
+
},
|
|
119
|
+
});
|
|
120
|
+
```
|
|
121
|
+
|
|
122
|
+
Supported values:
|
|
123
|
+
|
|
124
|
+
- `'none'` — Disables reasoning entirely; no thinking tokens are used. Best
|
|
125
|
+
for simple use cases that need a near-instant response. Supported by
|
|
126
|
+
`grok-4.3` and newer reasoning models.
|
|
127
|
+
- `'low'` (default) — Uses some reasoning tokens, but still fast. Good for
|
|
128
|
+
general agentic use and tool calling.
|
|
129
|
+
- `'medium'` — More thinking for less-latency-sensitive applications such as
|
|
130
|
+
complex data analysis and long-context reasoning.
|
|
131
|
+
- `'high'` — More reasoning tokens for deeper thinking. Suited for very
|
|
132
|
+
challenging problems, complex math, multi-step logic, and competition-level
|
|
133
|
+
tasks.
|
|
134
|
+
|
|
135
|
+
<Note>
|
|
136
|
+
Not every Grok model accepts every value. See xAI's [reasoning
|
|
137
|
+
docs](https://docs.x.ai/docs/guides/reasoning) for the values supported by
|
|
138
|
+
your selected model. `'none'` requires `grok-4.3` or newer.
|
|
139
|
+
</Note>
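The guidance on reasoning effort values can be captured in a small lookup. This is a minimal sketch only: `pickReasoningEffort` and the `Workload` labels are hypothetical names invented for illustration, not part of `@ai-sdk/xai`, and the mapping simply restates the descriptions in the list of supported values.

```ts
// Hypothetical helper: maps a coarse workload label to a reasoning effort.
// The workload labels are illustrative assumptions, not SDK types.
type ReasoningEffort = 'none' | 'low' | 'medium' | 'high';
type Workload = 'instant-reply' | 'agentic' | 'analysis' | 'hard-problem';

function pickReasoningEffort(workload: Workload): ReasoningEffort {
  switch (workload) {
    case 'instant-reply':
      return 'none'; // no thinking tokens; documented as requiring grok-4.3 or newer
    case 'agentic':
      return 'low'; // documented default; general agentic use and tool calling
    case 'analysis':
      return 'medium'; // complex data analysis, long-context reasoning
    case 'hard-problem':
      return 'high'; // complex math, multi-step logic
  }
}
```

The result would be passed as `providerOptions.xai.reasoningEffort`. Whether a given model accepts the chosen value still has to be checked against xAI's reasoning docs.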
|
|
140
|
+
|
|
141
|
+
## Realtime Models
|
|
142
|
+
|
|
143
|
+
<Note type="warning">Realtime is an experimental feature.</Note>
|
|
144
|
+
|
|
145
|
+
You can create models that call the [xAI Realtime API](https://docs.x.ai/docs/guides/realtime)
|
|
146
|
+
using the `.experimental_realtime()` factory method.
|
|
147
|
+
|
|
148
|
+
```ts
|
|
149
|
+
import { xai } from '@ai-sdk/xai';
|
|
150
|
+
|
|
151
|
+
const model = xai.experimental_realtime('grok-voice-latest');
|
|
152
|
+
```
|
|
153
|
+
|
|
154
|
+
Realtime sessions run in the browser and require a short-lived token created on
|
|
155
|
+
your server with `xai.experimental_realtime.getToken()`:
|
|
156
|
+
|
|
157
|
+
```ts
|
|
158
|
+
const token = await xai.experimental_realtime.getToken({
|
|
159
|
+
model: 'grok-voice-latest',
|
|
160
|
+
});
|
|
161
|
+
```
|
|
162
|
+
|
|
163
|
+
See [Realtime](/docs/ai-sdk-core/realtime) for the complete setup and tool
|
|
164
|
+
calling pattern.
|
|
165
|
+
|
|
102
166
|
## Responses API (Agentic Tools)
|
|
103
167
|
|
|
104
168
|
The xAI Responses API is the default when using `xai(modelId)`. You can also use `xai.responses(modelId)` explicitly. This enables the model to autonomously orchestrate tool calls and research on xAI's servers.
|
|
105
169
|
|
|
106
170
|
```ts
|
|
107
|
-
const model = xai.responses('grok-4-
|
|
171
|
+
const model = xai.responses('grok-4.20-non-reasoning');
|
|
108
172
|
```
|
|
109
173
|
|
|
110
174
|
The Responses API provides server-side tools that the model can autonomously execute during its reasoning process:
|
|
@@ -132,7 +196,11 @@ const { text } = await generateText({
|
|
|
132
196
|
role: 'user',
|
|
133
197
|
content: [
|
|
134
198
|
{ type: 'text', text: 'What do you see in this image?' },
|
|
135
|
-
{
|
|
199
|
+
{
|
|
200
|
+
type: 'file',
|
|
201
|
+
mediaType: 'image',
|
|
202
|
+
data: fs.readFileSync('./image.png'),
|
|
203
|
+
},
|
|
136
204
|
],
|
|
137
205
|
},
|
|
138
206
|
],
|
|
@@ -148,7 +216,7 @@ import { xai } from '@ai-sdk/xai';
|
|
|
148
216
|
import { generateText } from 'ai';
|
|
149
217
|
|
|
150
218
|
const { text, sources } = await generateText({
|
|
151
|
-
model: xai.responses('grok-4-
|
|
219
|
+
model: xai.responses('grok-4.20-non-reasoning'),
|
|
152
220
|
prompt: 'What are the latest developments in AI?',
|
|
153
221
|
tools: {
|
|
154
222
|
web_search: xai.tools.webSearch({
|
|
@@ -172,9 +240,13 @@ console.log('Citations:', sources);
|
|
|
172
240
|
|
|
173
241
|
Exclude specified domains from search (max 5). Cannot be used with `allowedDomains`.
|
|
174
242
|
|
|
243
|
+
- **enableImageSearch** _boolean_
|
|
244
|
+
|
|
245
|
+
Allow the model to perform image search as a separate Web Search mode when images are useful for the answer. The model can choose between regular web search and image search; responses can include Markdown image embeds when relevant.
|
|
246
|
+
|
|
175
247
|
- **enableImageUnderstanding** _boolean_
|
|
176
248
|
|
|
177
|
-
Enable the model to view and analyze images found during search. Increases token usage.
|
|
249
|
+
Enable the model to view and analyze images found during search. Increases token usage. To allow explicitly searching for images, use `enableImageSearch`.
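The domain filter constraints for the web search tool (at most 5 excluded domains, and no mixing of excluded and allowed domains) can be checked before the tool is built. The following is a minimal sketch: `checkDomainFilters` is a hypothetical helper that is not part of `@ai-sdk/xai`, and applying the same limit of 5 to `allowedDomains` is an assumption.

```ts
// Hypothetical pre-flight check for web search domain filters.
// Not part of @ai-sdk/xai; it only mirrors the documented constraints.
interface DomainFilters {
  allowedDomains?: string[];
  excludedDomains?: string[];
}

// Documented for excludedDomains; assumed to apply to allowedDomains as well.
const MAX_DOMAINS = 5;

function checkDomainFilters(filters: DomainFilters): string[] {
  const problems: string[] = [];
  const allowed = filters.allowedDomains ?? [];
  const excluded = filters.excludedDomains ?? [];

  // The two filters are mutually exclusive.
  if (allowed.length > 0 && excluded.length > 0) {
    problems.push('allowedDomains and excludedDomains cannot be used together');
  }
  if (excluded.length > MAX_DOMAINS) {
    problems.push(`excludedDomains has ${excluded.length} entries (max ${MAX_DOMAINS})`);
  }
  if (allowed.length > MAX_DOMAINS) {
    problems.push(`allowedDomains has ${allowed.length} entries (max ${MAX_DOMAINS})`);
  }
  return problems;
}
```

An empty result means the filters satisfy the constraints checked here; the API may enforce further rules that this sketch does not cover.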
|
|
178
250
|
|
|
179
251
|
### X Search Tool
|
|
180
252
|
|
|
@@ -182,7 +254,7 @@ The X search tool enables searching X (Twitter) for posts, with filtering by han
|
|
|
182
254
|
|
|
183
255
|
```ts
|
|
184
256
|
const { text, sources } = await generateText({
|
|
185
|
-
model: xai.responses('grok-4-
|
|
257
|
+
model: xai.responses('grok-4.20-non-reasoning'),
|
|
186
258
|
prompt: 'What are people saying about AI on X this week?',
|
|
187
259
|
tools: {
|
|
188
260
|
x_search: xai.tools.xSearch({
|
|
@@ -228,7 +300,7 @@ The code execution tool enables the model to write and execute Python code for c
|
|
|
228
300
|
|
|
229
301
|
```ts
|
|
230
302
|
const { text } = await generateText({
|
|
231
|
-
model: xai.responses('grok-4-
|
|
303
|
+
model: xai.responses('grok-4.20-non-reasoning'),
|
|
232
304
|
prompt:
|
|
233
305
|
'Calculate the compound interest for $10,000 at 5% annually for 10 years',
|
|
234
306
|
tools: {
|
|
@@ -243,7 +315,7 @@ The view image tool enables the model to view and analyze images:
|
|
|
243
315
|
|
|
244
316
|
```ts
|
|
245
317
|
const { text } = await generateText({
|
|
246
|
-
model: xai.responses('grok-4-
|
|
318
|
+
model: xai.responses('grok-4.20-non-reasoning'),
|
|
247
319
|
prompt: 'Describe what you see in the image',
|
|
248
320
|
tools: {
|
|
249
321
|
view_image: xai.tools.viewImage(),
|
|
@@ -257,7 +329,7 @@ The view X video tool enables the model to view and analyze videos from X (Twitt
|
|
|
257
329
|
|
|
258
330
|
```ts
|
|
259
331
|
const { text } = await generateText({
|
|
260
|
-
model: xai.responses('grok-4-
|
|
332
|
+
model: xai.responses('grok-4.20-non-reasoning'),
|
|
261
333
|
prompt: 'Summarize the content of this X video',
|
|
262
334
|
tools: {
|
|
263
335
|
view_x_video: xai.tools.viewXVideo(),
|
|
@@ -271,7 +343,7 @@ The MCP server tool enables the model to connect to remote [Model Context Protoc
|
|
|
271
343
|
|
|
272
344
|
```ts
|
|
273
345
|
const { text } = await generateText({
|
|
274
|
-
model: xai.responses('grok-4-
|
|
346
|
+
model: xai.responses('grok-4.20-non-reasoning'),
|
|
275
347
|
prompt: 'Use the weather tool to check conditions in San Francisco',
|
|
276
348
|
tools: {
|
|
277
349
|
weather_server: xai.tools.mcpServer({
|
|
@@ -319,7 +391,7 @@ import { xai, type XaiLanguageModelResponsesOptions } from '@ai-sdk/xai';
|
|
|
319
391
|
import { streamText } from 'ai';
|
|
320
392
|
|
|
321
393
|
const result = streamText({
|
|
322
|
-
model: xai.responses('grok-4-
|
|
394
|
+
model: xai.responses('grok-4.20-reasoning'),
|
|
323
395
|
prompt: 'What documents do you have access to?',
|
|
324
396
|
tools: {
|
|
325
397
|
file_search: xai.tools.fileSearch({
|
|
@@ -352,8 +424,8 @@ const result = streamText({
|
|
|
352
424
|
Include file search results in the response. When set to `['file_search_call.results']`, the response will contain the actual search results with file content and scores.
|
|
353
425
|
|
|
354
426
|
<Note>
|
|
355
|
-
File search requires grok-4 family models
|
|
356
|
-
can be created using the [xAI
|
|
427
|
+
File search requires grok-4 family models (including grok-4.20) and the
|
|
428
|
+
Responses API. Vector stores can be created using the [xAI
|
|
357
429
|
API](https://docs.x.ai/docs/guides/using-collections/api).
|
|
358
430
|
</Note>
|
|
359
431
|
|
|
@@ -365,8 +437,8 @@ You can combine multiple server-side tools for comprehensive research:
|
|
|
365
437
|
import { xai } from '@ai-sdk/xai';
|
|
366
438
|
import { streamText } from 'ai';
|
|
367
439
|
|
|
368
|
-
const {
|
|
369
|
-
model: xai.responses('grok-4-
|
|
440
|
+
const { stream } = streamText({
|
|
441
|
+
model: xai.responses('grok-4.20-non-reasoning'),
|
|
370
442
|
prompt: 'Research AI safety developments and calculate risk metrics',
|
|
371
443
|
tools: {
|
|
372
444
|
web_search: xai.tools.webSearch(),
|
|
@@ -382,7 +454,7 @@ const { fullStream } = streamText({
|
|
|
382
454
|
},
|
|
383
455
|
});
|
|
384
456
|
|
|
385
|
-
for await (const part of
|
|
457
|
+
for await (const part of stream) {
|
|
386
458
|
if (part.type === 'text-delta') {
|
|
387
459
|
process.stdout.write(part.text);
|
|
388
460
|
} else if (part.type === 'source' && part.sourceType === 'url') {
|
|
@@ -400,7 +472,7 @@ import { xai, type XaiLanguageModelResponsesOptions } from '@ai-sdk/xai';
|
|
|
400
472
|
import { generateText } from 'ai';
|
|
401
473
|
|
|
402
474
|
const result = await generateText({
|
|
403
|
-
model: xai.responses('grok-4-
|
|
475
|
+
model: xai.responses('grok-4.20-non-reasoning'),
|
|
404
476
|
providerOptions: {
|
|
405
477
|
xai: {
|
|
406
478
|
reasoningEffort: 'high',
|
|
@@ -412,9 +484,9 @@ const result = await generateText({
|
|
|
412
484
|
|
|
413
485
|
The following provider options are available:
|
|
414
486
|
|
|
415
|
-
- **reasoningEffort** _'low' | 'medium' | 'high'_
|
|
487
|
+
- **reasoningEffort** _'none' | 'low' | 'medium' | 'high'_
|
|
416
488
|
|
|
417
|
-
Control the reasoning effort for the model.
|
|
489
|
+
Control the reasoning effort for the model. See [Reasoning Effort](#reasoning-effort) for details on each value. `'none'` disables reasoning entirely (requires `grok-4.3` or newer).
|
|
418
490
|
|
|
419
491
|
- **logprobs** _boolean_
|
|
420
492
|
|
|
@@ -445,19 +517,16 @@ The following provider options are available:
|
|
|
445
517
|
|
|
446
518
|
| Model | Image Input | Object Generation | Tool Usage | Tool Streaming | Reasoning |
|
|
447
519
|
| ----------------------------- | ------------------- | ------------------- | ------------------- | ------------------- | ------------------- |
|
|
448
|
-
| `grok-4-
|
|
520
|
+
| `grok-4.20-reasoning` | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> |
|
|
521
|
+
| `grok-4.20-non-reasoning` | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> | <Cross size={18} /> |
|
|
449
522
|
| `grok-4-1-fast-reasoning` | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> |
|
|
450
523
|
| `grok-4-1-fast-non-reasoning` | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> | <Cross size={18} /> |
|
|
451
|
-
| `grok-4-
|
|
524
|
+
| `grok-4-1` | <Cross size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> | <Cross size={18} /> |
|
|
452
525
|
| `grok-4-fast-reasoning` | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> |
|
|
526
|
+
| `grok-4-fast-non-reasoning` | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> | <Cross size={18} /> |
|
|
453
527
|
| `grok-code-fast-1` | <Cross size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> |
|
|
454
|
-
| `grok-4` | <Cross size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> | <Cross size={18} /> |
|
|
455
|
-
| `grok-4-0709` | <Cross size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> | <Cross size={18} /> |
|
|
456
|
-
| `grok-4-latest` | <Cross size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> | <Cross size={18} /> |
|
|
457
528
|
| `grok-3` | <Cross size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> | <Cross size={18} /> |
|
|
458
|
-
| `grok-3-latest` | <Cross size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> | <Cross size={18} /> |
|
|
459
529
|
| `grok-3-mini` | <Cross size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> |
|
|
460
|
-
| `grok-3-mini-latest` | <Cross size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> |
|
|
461
530
|
|
|
462
531
|
<Note>
|
|
463
532
|
The table above lists popular models. Please see the [xAI
|
|
@@ -465,6 +534,198 @@ The following provider options are available:
|
|
|
465
534
|
can also pass any available provider model ID as a string if needed.
|
|
466
535
|
</Note>
|
|
467
536
|
|
|
537
|
+
## Speech Models
|
|
538
|
+
|
|
539
|
+
You can create models that call the [xAI Text to Speech API](https://docs.x.ai/developers/model-capabilities/audio/text-to-speech)
|
|
540
|
+
using the `.speech()` factory method. xAI's text-to-speech endpoint does not
|
|
541
|
+
require a model identifier.
|
|
542
|
+
|
|
543
|
+
```ts
|
|
544
|
+
const model = xai.speech();
|
|
545
|
+
```
|
|
546
|
+
|
|
547
|
+
Use xAI speech models with the `generateSpeech` function:
|
|
548
|
+
|
|
549
|
+
```ts
|
|
550
|
+
import { xai } from '@ai-sdk/xai';
|
|
551
|
+
import { generateSpeech } from 'ai';
|
|
552
|
+
|
|
553
|
+
const result = await generateSpeech({
|
|
554
|
+
model: xai.speech(),
|
|
555
|
+
text: 'Hello from the AI SDK!',
|
|
556
|
+
voice: 'ara',
|
|
557
|
+
language: 'en',
|
|
558
|
+
outputFormat: 'mp3',
|
|
559
|
+
speed: 1.1,
|
|
560
|
+
});
|
|
561
|
+
```
|
|
562
|
+
|
|
563
|
+
### Supported Parameters
|
|
564
|
+
|
|
565
|
+
- **text** _string_ (required)
|
|
566
|
+
|
|
567
|
+
Text to synthesize. You can include xAI speech tags such as `[pause]`,
|
|
568
|
+
`[laugh]`, or `<whisper>...</whisper>` directly in the text.
|
|
569
|
+
|
|
570
|
+
- **voice** _string_
|
|
571
|
+
|
|
572
|
+
The xAI voice ID. Defaults to `'eve'`. Built-in voice IDs are `'eve'`,
|
|
573
|
+
`'ara'`, `'rex'`, `'sal'`, and `'leo'`; custom voice IDs are also accepted.
|
|
574
|
+
|
|
575
|
+
- **language** _string_
|
|
576
|
+
|
|
577
|
+
A BCP-47 language code or `'auto'` for automatic detection. Defaults to
|
|
578
|
+
`'auto'` when omitted.
|
|
579
|
+
|
|
580
|
+
- **speed** _number_
|
|
581
|
+
|
|
582
|
+
Speech rate multiplier from `0.7` to `1.5`.
|
|
583
|
+
|
|
584
|
+
- **outputFormat** _string_
|
|
585
|
+
|
|
586
|
+
The audio codec to generate. Supported values are `'mp3'`, `'wav'`, `'pcm'`,
|
|
587
|
+
`'mulaw'`, and `'alaw'`. Defaults to `'mp3'`.
|
|
588
|
+
|
|
589
|
+
<Note>
|
|
590
|
+
xAI does not expose a separate `instructions` field for text-to-speech. Use
|
|
591
|
+
speech tags in `text` to control expressive delivery.
|
|
592
|
+
</Note>
|
|
593
|
+
|
|
594
|
+
### Provider Options
|
|
595
|
+
|
|
596
|
+
You can pass xAI-specific controls using `providerOptions.xai`:
|
|
597
|
+
|
|
598
|
+
```ts
|
|
599
|
+
import { xai, type XaiSpeechModelOptions } from '@ai-sdk/xai';
|
|
600
|
+
import { generateSpeech } from 'ai';
|
|
601
|
+
|
|
602
|
+
const result = await generateSpeech({
|
|
603
|
+
model: xai.speech(),
|
|
604
|
+
text: 'A high fidelity narration sample.',
|
|
605
|
+
outputFormat: 'mp3',
|
|
606
|
+
providerOptions: {
|
|
607
|
+
xai: {
|
|
608
|
+
sampleRate: 44100,
|
|
609
|
+
bitRate: 192000,
|
|
610
|
+
optimizeStreamingLatency: 1,
|
|
611
|
+
textNormalization: true,
|
|
612
|
+
} satisfies XaiSpeechModelOptions,
|
|
613
|
+
},
|
|
614
|
+
});
|
|
615
|
+
```
|
|
616
|
+
|
|
617
|
+
- **sampleRate** _8000 | 16000 | 22050 | 24000 | 44100 | 48000_
|
|
618
|
+
|
|
619
|
+
Sample rate of the output audio in Hz.
|
|
620
|
+
|
|
621
|
+
- **bitRate** _32000 | 64000 | 96000 | 128000 | 192000_
|
|
622
|
+
|
|
623
|
+
MP3 bit rate in bits per second. Only applies when `outputFormat` is
|
|
624
|
+
`'mp3'`.
|
|
625
|
+
|
|
626
|
+
- **optimizeStreamingLatency** _0 | 1 | 2_
|
|
627
|
+
|
|
628
|
+
Latency optimization level. Higher values reduce time to first audio with a
|
|
629
|
+
quality tradeoff at chunk boundaries.
|
|
630
|
+
|
|
631
|
+
- **textNormalization** _boolean_
|
|
632
|
+
|
|
633
|
+
Whether to normalize written-form input text before synthesizing speech.
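The documented limits for speech requests (speed range, output formats, sample rates, and MP3-only bit rates) can be validated locally before calling `generateSpeech`. This is a sketch under stated assumptions: `checkSpeechRequest` and the `SpeechRequest` shape are hypothetical and not part of `@ai-sdk/xai`; the allowed values are copied from the parameter lists in this section.

```ts
// Hypothetical validator mirroring the documented xAI speech limits.
// Not part of @ai-sdk/xai.
const OUTPUT_FORMATS: string[] = ['mp3', 'wav', 'pcm', 'mulaw', 'alaw'];
const SAMPLE_RATES: number[] = [8000, 16000, 22050, 24000, 44100, 48000];
const BIT_RATES: number[] = [32000, 64000, 96000, 128000, 192000];

interface SpeechRequest {
  speed?: number;
  outputFormat?: string; // documented default: 'mp3'
  sampleRate?: number;
  bitRate?: number;
}

function checkSpeechRequest(req: SpeechRequest): string[] {
  const problems: string[] = [];
  const format = req.outputFormat ?? 'mp3';

  if (!OUTPUT_FORMATS.includes(format)) {
    problems.push(`unsupported outputFormat: ${format}`);
  }
  // Written as a negated range check so that NaN is rejected too.
  if (req.speed !== undefined && !(req.speed >= 0.7 && req.speed <= 1.5)) {
    problems.push(`speed ${req.speed} is outside 0.7 to 1.5`);
  }
  if (req.sampleRate !== undefined && !SAMPLE_RATES.includes(req.sampleRate)) {
    problems.push(`unsupported sampleRate: ${req.sampleRate}`);
  }
  if (req.bitRate !== undefined) {
    if (!BIT_RATES.includes(req.bitRate)) {
      problems.push(`unsupported bitRate: ${req.bitRate}`);
    } else if (format !== 'mp3') {
      problems.push('bitRate only applies when outputFormat is mp3');
    }
  }
  return problems;
}
```

Flagging a `bitRate` on a non-MP3 format is a design choice of this sketch; the docs only say the option does not apply in that case, so the API may simply ignore it.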
|
|
634
|
+
|
|
635
|
+
### Model Capabilities
|
|
636
|
+
|
|
637
|
+
| Model | Language | Speed | Output Formats |
|
|
638
|
+
| --------- | ------------------- | ------------------- | -------------------------- |
|
|
639
|
+
| `default` | <Check size={18} /> | <Check size={18} /> | mp3, wav, pcm, mulaw, alaw |
|
|
640
|
+
|
|
641
|
+
## Transcription Models
|
|
642
|
+
|
|
643
|
+
You can create models that call the [xAI Speech to Text API](https://docs.x.ai/developers/model-capabilities/audio/speech-to-text)
|
|
644
|
+
using the `.transcription()` factory method. xAI's batch transcription endpoint
|
|
645
|
+
does not require a model identifier.
|
|
646
|
+
|
|
647
|
+
```ts
|
|
648
|
+
const model = xai.transcription();
|
|
649
|
+
```
|
|
650
|
+
|
|
651
|
+
Use xAI transcription models with the `transcribe` function:
|
|
652
|
+
|
|
653
|
+
```ts
|
|
654
|
+
import { xai } from '@ai-sdk/xai';
|
|
655
|
+
import { transcribe } from 'ai';
|
|
656
|
+
import { readFile } from 'fs/promises';
|
|
657
|
+
|
|
658
|
+
const result = await transcribe({
|
|
659
|
+
model: xai.transcription(),
|
|
660
|
+
audio: await readFile('meeting.mp3'),
|
|
661
|
+
});
|
|
662
|
+
```
|
|
663
|
+
|
|
664
|
+
### Provider Options
|
|
665
|
+
|
|
666
|
+
You can pass xAI-specific controls using `providerOptions.xai`:
|
|
667
|
+
|
|
668
|
+
```ts
|
|
669
|
+
import { xai, type XaiTranscriptionModelOptions } from '@ai-sdk/xai';
|
|
670
|
+
import { transcribe } from 'ai';
|
|
671
|
+
import { readFile } from 'fs/promises';
|
|
672
|
+
|
|
673
|
+
const result = await transcribe({
|
|
674
|
+
model: xai.transcription(),
|
|
675
|
+
audio: await readFile('meeting.mp3'),
|
|
676
|
+
providerOptions: {
|
|
677
|
+
xai: {
|
|
678
|
+
language: 'en',
|
|
679
|
+
format: true,
|
|
680
|
+
keyterm: ['AI SDK', 'Grok'],
|
|
681
|
+
diarize: true,
|
|
682
|
+
} satisfies XaiTranscriptionModelOptions,
|
|
683
|
+
},
|
|
684
|
+
});
|
|
685
|
+
```
|
|
686
|
+
|
|
687
|
+
- **audioFormat** _`pcm` | `mulaw` | `alaw`_
|
|
688
|
+
|
|
689
|
+
Audio encoding for raw, headerless input audio.
|
|
690
|
+
|
|
691
|
+
- **sampleRate** _8000 | 16000 | 22050 | 24000 | 44100 | 48000_
|
|
692
|
+
|
|
693
|
+
Sample rate of the input audio in Hz.
|
|
694
|
+
|
|
695
|
+
- **language** _string_
|
|
696
|
+
|
|
697
|
+
Language code used for inverse text normalization.
|
|
698
|
+
|
|
699
|
+
- **format** _boolean_
|
|
700
|
+
|
|
701
|
+
Enables inverse text normalization. Requires `language`.
|
|
702
|
+
|
|
703
|
+
- **multichannel** _boolean_
|
|
704
|
+
|
|
705
|
+
Enables per-channel transcription for interleaved multichannel audio.
|
|
706
|
+
|
|
707
|
+
- **channels** _2 | 3 | 4 | 5 | 6 | 7 | 8_
|
|
708
|
+
|
|
709
|
+
Number of interleaved audio channels.
|
|
710
|
+
|
|
711
|
+
- **diarize** _boolean_
|
|
712
|
+
|
|
713
|
+
Enables speaker diarization.
|
|
714
|
+
|
|
715
|
+
- **keyterm** _string | string[]_
|
|
716
|
+
|
|
717
|
+
One or more terms to bias transcription toward.
|
|
718
|
+
|
|
719
|
+
- **fillerWords** _boolean_
|
|
720
|
+
|
|
721
|
+
Includes filler words such as `uh` and `um` in the transcript.
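Two of the transcription options carry constraints that are easy to miss: `format` requires `language`, and `channels` must be between 2 and 8. The sketch below checks both before a request is sent. `checkTranscriptionOptions` and its local interface are hypothetical and not part of `@ai-sdk/xai`; the interface covers only the fields used here.

```ts
// Hypothetical pre-flight check for xAI transcription provider options.
// Not part of @ai-sdk/xai; field names follow the option list in this section.
interface TranscriptionOptionsSketch {
  language?: string;
  format?: boolean;
  multichannel?: boolean;
  channels?: number;
  diarize?: boolean;
  keyterm?: string | string[];
}

function checkTranscriptionOptions(options: TranscriptionOptionsSketch): string[] {
  const problems: string[] = [];

  // Inverse text normalization needs a language code.
  if (options.format === true && !options.language) {
    problems.push('format requires language');
  }

  // Documented channel counts are the integers 2 through 8.
  const channels = options.channels;
  if (
    channels !== undefined &&
    !(Number.isInteger(channels) && channels >= 2 && channels <= 8)
  ) {
    problems.push(`channels must be an integer from 2 to 8, got ${channels}`);
  }

  return problems;
}
```

An empty array means both constraints hold; any relationship between `channels` and `multichannel` is not stated in the docs and is deliberately left unchecked.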
|
|
722
|
+
|
|
723
|
+
### Model Capabilities
|
|
724
|
+
|
|
725
|
+
| Model | Word Timestamps | Diarization | Multichannel |
|
|
726
|
+
| --------- | ------------------- | ------------------- | ------------------- |
|
|
727
|
+
| `default` | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> |
|
|
728
|
+
|
|
468
729
|
## Image Models
|
|
469
730
|
|
|
470
731
|
You can create xAI image models using the `.image()` factory method. For more on image generation with the AI SDK see [generateImage()](/docs/reference/ai-sdk-core/generate-image).
|
|
@@ -516,7 +777,7 @@ const { images } = await generateImage({
|
|
|
516
777
|
|
|
517
778
|
#### Multi-Image Editing
|
|
518
779
|
|
|
519
|
-
Combine or reference multiple input images
|
|
780
|
+
Combine or reference multiple input images in the prompt:
|
|
520
781
|
|
|
521
782
|
```ts
|
|
522
783
|
import { xai } from '@ai-sdk/xai';
|
|
@@ -554,37 +815,53 @@ const { images } = await generateImage({
|
|
|
554
815
|
|
|
555
816
|
<Note>
|
|
556
817
|
Input images can be provided as `Buffer`, `ArrayBuffer`, `Uint8Array`, or
|
|
557
|
-
base64-encoded strings.
|
|
818
|
+
base64-encoded strings.
|
|
558
819
|
</Note>
|
|
559
820
|
|
|
560
|
-
###
|
|
821
|
+
### Image Provider Options
|
|
561
822
|
|
|
562
|
-
You can customize the image generation behavior with
|
|
823
|
+
You can customize the image generation behavior with provider-specific settings via `providerOptions.xai`:
|
|
563
824
|
|
|
564
825
|
```ts
|
|
565
|
-
import { xai } from '@ai-sdk/xai';
|
|
826
|
+
import { xai, type XaiImageModelOptions } from '@ai-sdk/xai';
|
|
566
827
|
import { generateImage } from 'ai';
|
|
567
828
|
|
|
568
829
|
const { images } = await generateImage({
|
|
569
|
-
model: xai.image('grok-imagine-image'),
|
|
830
|
+
model: xai.image('grok-imagine-image-pro'),
|
|
570
831
|
prompt: 'A futuristic cityscape at sunset',
|
|
571
832
|
aspectRatio: '16:9',
|
|
572
|
-
|
|
833
|
+
providerOptions: {
|
|
834
|
+
xai: {
|
|
835
|
+
resolution: '2k',
|
|
836
|
+
quality: 'high',
|
|
837
|
+
} satisfies XaiImageModelOptions,
|
|
838
|
+
},
|
|
573
839
|
});
|
|
574
840
|
```
|
|
575
841
|
|
|
576
|
-
|
|
842
|
+
- **resolution** _'1k' | '2k'_
|
|
843
|
+
|
|
844
|
+
Output resolution. `1k` produces ~1024×1024 images, `2k` produces ~2048×2048
|
|
845
|
+
images (actual dimensions vary based on aspect ratio). Available for
|
|
846
|
+
`grok-imagine-image-pro`.
|
|
847
|
+
|
|
848
|
+
- **quality** _'low' | 'medium' | 'high'_
|
|
577
849
|
|
|
578
|
-
|
|
579
|
-
|
|
580
|
-
|
|
850
|
+
Image quality level. Higher quality may increase generation time.
|
|
851
|
+
|
|
852
|
+
### Image Model Capabilities
|
|
853
|
+
|
|
854
|
+
| Model | Resolution | Aspect Ratios | Image Editing |
|
|
855
|
+
| ------------------------ | ---------- | ----------------------------------------------------------------------------------------------------------- | ------------------- |
|
|
856
|
+
| `grok-imagine-image-pro` | `1k`, `2k` | `1:1`, `16:9`, `9:16`, `4:3`, `3:4`, `3:2`, `2:3`, `2:1`, `1:2`, `19.5:9`, `9:19.5`, `20:9`, `9:20`, `auto` | <Check size={18} /> |
|
|
857
|
+
| `grok-imagine-image` | `1k` | `1:1`, `16:9`, `9:16`, `4:3`, `3:4`, `3:2`, `2:3`, `2:1`, `1:2`, `19.5:9`, `9:19.5`, `20:9`, `9:20`, `auto` | <Check size={18} /> |
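The resolution column of the capabilities table can be turned into a lookup for rejecting unsupported combinations early. This is a minimal sketch: `supportsResolution` is a hypothetical helper, not part of `@ai-sdk/xai`, and it only knows the two models listed in the table.

```ts
// Hypothetical lookup built from the image capabilities table.
// Not part of @ai-sdk/xai.
type Resolution = '1k' | '2k';

const SUPPORTED_RESOLUTIONS: Record<string, Resolution[]> = {
  'grok-imagine-image-pro': ['1k', '2k'],
  'grok-imagine-image': ['1k'],
};

function supportsResolution(modelId: string, resolution: Resolution): boolean {
  const supported = SUPPORTED_RESOLUTIONS[modelId];
  // Model IDs missing from the table are treated as unsupported.
  return supported !== undefined && supported.includes(resolution);
}
```

Returning `false` for unknown model IDs is a conservative choice of this sketch; the provider itself accepts any model ID string.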
|
|
581
858
|
|
|
582
859
|
## Video Models
|
|
583
860
|
|
|
584
861
|
You can create xAI video models using the `.video()` factory method.
|
|
585
862
|
For more on video generation with the AI SDK see [generateVideo()](/docs/reference/ai-sdk-core/generate-video).
|
|
586
863
|
|
|
587
|
-
This provider supports
|
|
864
|
+
This provider supports standard video generation from text prompts or image input, plus explicit video editing, video extension, and reference-to-video (R2V) operations.
|
|
588
865
|
|
|
589
866
|
### Text-to-Video
|
|
590
867
|
|
|
@@ -594,7 +871,7 @@ Generate videos from text prompts:
|
|
|
594
871
|
import { xai, type XaiVideoModelOptions } from '@ai-sdk/xai';
|
|
595
872
|
import { experimental_generateVideo as generateVideo } from 'ai';
|
|
596
873
|
|
|
597
|
-
const {
|
|
874
|
+
const { video } = await generateVideo({
|
|
598
875
|
model: xai.video('grok-imagine-video'),
|
|
599
876
|
prompt: 'A chicken flying into the sunset in the style of 90s anime.',
|
|
600
877
|
aspectRatio: '16:9',
|
|
@@ -607,15 +884,15 @@ const { videos } = await generateVideo({
|
|
|
607
884
|
});
|
|
608
885
|
```
|
|
609
886
|
|
|
610
|
-
### Image
|
|
887
|
+
### Generation with Image Input
|
|
611
888
|
|
|
612
|
-
Generate videos using an image as the starting frame with an optional text prompt:
|
|
889
|
+
Generate videos using an image as the starting frame with an optional text prompt. This uses the standard generation path rather than a separate provider mode:
|
|
613
890
|
|
|
614
891
|
```ts
|
|
615
892
|
import { xai, type XaiVideoModelOptions } from '@ai-sdk/xai';
|
|
616
893
|
import { experimental_generateVideo as generateVideo } from 'ai';
|
|
617
894
|
|
|
618
|
-
const {
|
|
895
|
+
const { video } = await generateVideo({
|
|
619
896
|
model: xai.video('grok-imagine-video'),
|
|
620
897
|
prompt: {
|
|
621
898
|
image: 'https://example.com/start-frame.png',
|
|
@@ -638,11 +915,12 @@ Edit an existing video using a text prompt by providing a source video URL via p
|
|
|
638
915
|
import { xai, type XaiVideoModelOptions } from '@ai-sdk/xai';
|
|
639
916
|
import { experimental_generateVideo as generateVideo } from 'ai';
|
|
640
917
|
|
|
641
|
-
const {
|
|
918
|
+
const { video } = await generateVideo({
|
|
642
919
|
model: xai.video('grok-imagine-video'),
|
|
643
920
|
prompt: 'Give the person sunglasses and a hat',
|
|
644
921
|
providerOptions: {
|
|
645
922
|
xai: {
|
|
923
|
+
mode: 'edit-video',
|
|
646
924
|
videoUrl: 'https://example.com/source-video.mp4',
|
|
647
925
|
pollTimeoutMs: 600000, // 10 minutes
|
|
     } satisfies XaiVideoModelOptions,
@@ -668,6 +946,7 @@ import { experimental_generateVideo as generateVideo } from 'ai';
 
 const providerOptions = {
   xai: {
+    mode: 'edit-video',
     videoUrl: 'https://example.com/source-video.mp4',
     pollTimeoutMs: 600000,
   } satisfies XaiVideoModelOptions,
@@ -689,19 +968,107 @@ const [withSunglasses, withScarf] = await Promise.all([
     model: xai.video('grok-imagine-video'),
     prompt: 'Add sunglasses',
     providerOptions: {
-      xai: {
+      xai: {
+        mode: 'edit-video',
+        videoUrl: step1VideoUrl,
+        pollTimeoutMs: 600000,
+      },
     },
   }),
   generateVideo({
     model: xai.video('grok-imagine-video'),
     prompt: 'Add a scarf',
     providerOptions: {
-      xai: {
+      xai: {
+        mode: 'edit-video',
+        videoUrl: step1VideoUrl,
+        pollTimeoutMs: 600000,
+      },
     },
   }),
 ]);
 ```
 
+### Video Extension
+
+Extend an existing video from its last frame. The `duration` controls the length of the extension only, not the total output. The output inherits `aspectRatio` and `resolution` from the source video.
+
+```ts
+import { xai, type XaiVideoModelOptions } from '@ai-sdk/xai';
+import { experimental_generateVideo as generateVideo } from 'ai';
+
+// Step 1: Generate a source video
+const source = await generateVideo({
+  model: xai.video('grok-imagine-video'),
+  prompt: 'A cat sitting on a sunlit windowsill, tail gently swishing.',
+  duration: 5,
+  aspectRatio: '16:9',
+  providerOptions: {
+    xai: {
+      pollTimeoutMs: 600000,
+    } satisfies XaiVideoModelOptions,
+  },
+});
+
+const sourceUrl = source.providerMetadata?.xai?.videoUrl as string;
+
+// Step 2: Extend the video with a new scene
+const extended = await generateVideo({
+  model: xai.video('grok-imagine-video'),
+  prompt: 'The cat turns its head, notices a butterfly, and leaps off.',
+  duration: 6,
+  providerOptions: {
+    xai: {
+      mode: 'extend-video',
+      videoUrl: sourceUrl,
+      pollTimeoutMs: 600000,
+    } satisfies XaiVideoModelOptions,
+  },
+});
+```
+
+<Note>
+  Video extension does not support custom `aspectRatio` or `resolution` — the
+  output inherits those from the source video. `duration` is supported and
+  controls how long the extension is (not the total video length).
+</Note>
+
+### Reference-to-Video (R2V)
+
+Provide reference images to guide the video's style and content. Unlike image-to-video, reference images are not used as the first frame — the model incorporates their visual elements into the generated video. Each reference image can be a public HTTPS URL or a base64 data URI.
+
+```ts
+import { xai, type XaiVideoModelOptions } from '@ai-sdk/xai';
+import { experimental_generateVideo as generateVideo } from 'ai';
+
+const { video } = await generateVideo({
+  model: xai.video('grok-imagine-video'),
+  prompt:
+    'The comic cat from <IMAGE_1> and the comic dog from <IMAGE_2> ' +
+    'are having a playful chase through a sunlit park. ' +
+    'Cinematic slow-motion, warm afternoon light.',
+  duration: 8,
+  aspectRatio: '16:9',
+  providerOptions: {
+    xai: {
+      mode: 'reference-to-video',
+      referenceImageUrls: [
+        'https://example.com/comic-cat.png',
+        'https://example.com/comic-dog.png',
+      ],
+      pollTimeoutMs: 600000,
+    } satisfies XaiVideoModelOptions,
+  },
+});
+```
+
+Use `<IMAGE_1>`, `<IMAGE_2>`, etc. in your prompt to reference specific images. Up to 7 reference images are supported per request.
+
+<Note>
+  Reference-to-video supports `duration`, `aspectRatio`, and `resolution`. Use
+  `mode` to select the operation — each mode is mutually exclusive.
+</Note>
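As a rough illustration of the placeholder convention documented above, the sketch below checks that every `<IMAGE_n>` placeholder in a prompt points at an entry in `referenceImageUrls`, and that the array holds between 1 and 7 images. The helper `checkReferencePrompt` is hypothetical and not part of `@ai-sdk/xai`. It also assumes placeholders are 1-based and follow array order, which the example suggests but the docs do not state outright.

```ts
// Hypothetical pre-flight check; not part of @ai-sdk/xai.
const MAX_REFERENCE_IMAGES = 7; // limit stated in the docs above

function checkReferencePrompt(
  prompt: string,
  referenceImageUrls: string[],
): string[] {
  const problems: string[] = [];
  const count = referenceImageUrls.length;

  if (count < 1 || count > MAX_REFERENCE_IMAGES) {
    problems.push(
      `expected 1-${MAX_REFERENCE_IMAGES} reference images, got ${count}`,
    );
  }

  // Assumption: <IMAGE_n> is 1-based and follows the order of the array.
  const placeholder = /<IMAGE_(\d+)>/g;
  let match: RegExpExecArray | null;
  while ((match = placeholder.exec(prompt)) !== null) {
    const index = Number(match[1]);
    if (index < 1 || index > count) {
      problems.push(`${match[0]} has no matching reference image`);
    }
  }

  return problems;
}
```

An empty result means the prompt and the image list agree under these assumptions; it says nothing about how the xAI API itself treats a mismatched placeholder.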
+
 ### Video Provider Options
 
 The following provider options are available via `providerOptions.xai`.
@@ -721,10 +1088,27 @@ You can validate the provider options using the `XaiVideoModelOptions` type.
   `1280x720` maps to `720p` and `854x480` maps to `480p`.
   Use this provider option to pass the native format directly.
 
+- **mode** _'edit-video' | 'extend-video' | 'reference-to-video'_
+
+  Selects the explicit video operation. Each mode is mutually exclusive:
+  - `'edit-video'` — edit an existing video (requires `videoUrl`)
+  - `'extend-video'` — extend a video from its last frame (requires `videoUrl`)
+  - `'reference-to-video'` — generate from reference images (requires `referenceImageUrls`)
+
+  When omitted, standard generation is used. Legacy inputs are still auto-detected from fields for backward compatibility.
+
 - **videoUrl** _string_
 
-  URL of a source video
-
+  URL of a source video. Used with `mode: 'edit-video'` for video editing
+  and `mode: 'extend-video'` for video extension.
+
+- **referenceImageUrls** _string[]_
+
+  Array of reference image URLs (1–7 images) or base64 data URIs for
+  reference-to-video (R2V) generation. The model incorporates visual
+  elements from these images without using them as the first frame. Use
+  `<IMAGE_1>`, `<IMAGE_2>`, etc. in the prompt to reference specific
+  images. Used with `mode: 'reference-to-video'`.
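The pairing between `mode` and its required field, as listed above, can be sketched as a small client-side check. This is a minimal illustration only: `VideoOptionsSketch` is a local stand-in for the documented fields (the real type is `XaiVideoModelOptions`), and `missingFieldFor` is a hypothetical helper, not something the package exports.

```ts
// Local mirror of the documented fields; the real type is XaiVideoModelOptions.
type VideoMode = 'edit-video' | 'extend-video' | 'reference-to-video';

interface VideoOptionsSketch {
  mode?: VideoMode;
  videoUrl?: string;
  referenceImageUrls?: string[];
}

// Hypothetical helper: returns the name of the required field that is
// missing for the selected mode, or null when nothing is missing.
function missingFieldFor(options: VideoOptionsSketch): string | null {
  switch (options.mode) {
    case 'edit-video':
    case 'extend-video':
      return options.videoUrl ? null : 'videoUrl';
    case 'reference-to-video':
      return options.referenceImageUrls &&
        options.referenceImageUrls.length > 0
        ? null
        : 'referenceImageUrls';
    default:
      // No mode: standard generation, no extra field is required.
      return null;
  }
}
```

The check only covers the explicit `mode` path. The legacy auto-detection mentioned above is left out because the docs do not describe its rules.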
 
 <Note>
   Video generation is an asynchronous process that can take several minutes.
@@ -744,14 +1128,21 @@ desired ratio.
 
 For **video editing**, the output matches the input video's aspect ratio and
 resolution. Custom `duration`, `aspectRatio`, and `resolution` are not
-supported
+supported — the output resolution is capped at 720p (e.g., a 1080p input
 will be downsized to 720p).
 
+For **video extension**, the output inherits `aspectRatio` and `resolution`
+from the source video. `duration` is supported and controls only the
+extension length.
+
+For **reference-to-video (R2V)**, you can specify `duration`, `aspectRatio`,
+and `resolution` just like text-to-video.
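The per-operation rules in the paragraphs above can be written down as a lookup table. The sketch below is transcribed from that prose and is not an API of `@ai-sdk/xai`: `supportedSettings` and `unsupportedSettings` are hypothetical names, and the table only covers the three explicit modes.

```ts
type VideoOperation = 'edit-video' | 'extend-video' | 'reference-to-video';
type GenerationSetting = 'duration' | 'aspectRatio' | 'resolution';

// Transcribed from the documentation above; not exported by @ai-sdk/xai.
const supportedSettings: Record<VideoOperation, GenerationSetting[]> = {
  'edit-video': [], // output follows the input video
  'extend-video': ['duration'], // duration is the extension length only
  'reference-to-video': ['duration', 'aspectRatio', 'resolution'],
};

// Hypothetical helper: lists the settings a caller passed that the chosen
// operation does not support according to the table above.
function unsupportedSettings(
  operation: VideoOperation,
  requested: Partial<Record<GenerationSetting, unknown>>,
): GenerationSetting[] {
  const allowed = supportedSettings[operation];
  return (Object.keys(requested) as GenerationSetting[]).filter(
    key => requested[key] !== undefined && allowed.indexOf(key) === -1,
  );
}
```

For example, `unsupportedSettings('extend-video', { duration: 6, aspectRatio: '16:9' })` returns `['aspectRatio']`, since only `duration` applies to an extension.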
+
 ### Video Model Capabilities
 
-| Model | Duration | Aspect Ratios | Resolution | Image-to-Video |
-| -------------------- | -------- | ------------------------------------------------- | -------------- | ------------------- | ------------------- |
-| `grok-imagine-video` | 1–15s | `1:1`, `16:9`, `9:16`, `4:3`, `3:4`, `3:2`, `2:3` | `480p`, `720p` | <Check size={18} /> | <Check size={18} /> |
+| Model | Duration | Aspect Ratios | Resolution | Image-to-Video | Editing | Extension | R2V |
+| -------------------- | -------- | ------------------------------------------------- | -------------- | ------------------- | ------------------- | ------------------- | ------------------- |
+| `grok-imagine-video` | 1–15s | `1:1`, `16:9`, `9:16`, `4:3`, `3:4`, `3:2`, `2:3` | `480p`, `720p` | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> |
 
 <Note>
   You can also pass any available provider model ID as a string if needed.