ima2-gen 1.1.17 → 1.1.19

Files changed (60)
  1. package/README.md +26 -4
  2. package/bin/commands/capabilities.js +6 -0
  3. package/bin/commands/capabilities.ts +6 -0
  4. package/bin/commands/video.js +215 -0
  5. package/bin/commands/video.ts +205 -0
  6. package/bin/ima2.js +61 -6
  7. package/bin/ima2.ts +54 -6
  8. package/docs/API.md +73 -4
  9. package/docs/CLI.md +38 -0
  10. package/docs/README.ja.md +2 -2
  11. package/docs/README.ko.md +15 -3
  12. package/docs/README.zh-CN.md +2 -2
  13. package/lib/agentGenerationPlanner.js +18 -1
  14. package/lib/agentGenerationPlanner.ts +21 -1
  15. package/lib/agentRuntime.js +105 -1
  16. package/lib/agentRuntime.ts +118 -1
  17. package/lib/agentTypes.js +1 -0
  18. package/lib/agentTypes.ts +2 -1
  19. package/lib/assetLifecycle.js +12 -8
  20. package/lib/assetLifecycle.ts +12 -8
  21. package/lib/capabilities.js +9 -0
  22. package/lib/capabilities.ts +9 -0
  23. package/lib/grokVideoAdapter.js +45 -1
  24. package/lib/grokVideoAdapter.ts +49 -1
  25. package/lib/historyList.js +1 -0
  26. package/lib/historyList.ts +1 -0
  27. package/lib/imageModels.js +1 -1
  28. package/lib/imageModels.ts +2 -2
  29. package/lib/oauthLauncher.js +5 -2
  30. package/lib/oauthLauncher.ts +5 -3
  31. package/lib/videoSeriesChain.js +24 -0
  32. package/lib/videoSeriesChain.ts +29 -0
  33. package/node_modules/progrok/README.md +300 -22
  34. package/node_modules/progrok/dist/index.js +558 -173
  35. package/node_modules/progrok/dist/index.js.map +1 -1
  36. package/node_modules/progrok/package.json +3 -3
  37. package/node_modules/progrok/skills/progrok/SKILL.md +145 -109
  38. package/package.json +2 -2
  39. package/routes/video.js +10 -1
  40. package/routes/video.ts +11 -1
  41. package/ui/dist/.vite/manifest.json +12 -12
  42. package/ui/dist/assets/AgentWorkspace-DE_wg90f.js +3 -0
  43. package/ui/dist/assets/{CardNewsWorkspace-6y_HNp3I.js → CardNewsWorkspace--Myc5pAp.js} +1 -1
  44. package/ui/dist/assets/NodeCanvas-4U5oOT2y.js +7 -0
  45. package/ui/dist/assets/{PromptBuilderPanel-BQlPtGGm.js → PromptBuilderPanel-DNW1U8zI.js} +2 -2
  46. package/ui/dist/assets/{PromptImportDialog-aNk40wLt.js → PromptImportDialog-o-4Sqki1.js} +2 -2
  47. package/ui/dist/assets/{PromptImportDiscoverySection-B6NKkVBz.js → PromptImportDiscoverySection-BAbrRP8B.js} +1 -1
  48. package/ui/dist/assets/{PromptImportFolderSection-9-xbe-FM.js → PromptImportFolderSection-L-XI2noz.js} +1 -1
  49. package/ui/dist/assets/{PromptLibraryPanel-CbEY0AM6.js → PromptLibraryPanel-CrW9LYGD.js} +2 -2
  50. package/ui/dist/assets/{SettingsWorkspace-ao9ymIWt.js → SettingsWorkspace-Dn4SYTyZ.js} +1 -1
  51. package/ui/dist/assets/index-B6tcw_UF.css +1 -0
  52. package/ui/dist/assets/{index-DP88bEQf.js → index-BONbNNIi.js} +1 -1
  53. package/ui/dist/assets/index-CeSZ2L3-.js +32 -0
  54. package/ui/dist/index.html +2 -2
  55. package/vendor/progrok-0.1.1.tgz +0 -0
  56. package/ui/dist/assets/AgentWorkspace-CLHwx6u4.js +0 -3
  57. package/ui/dist/assets/NodeCanvas-DR2N5Dib.js +0 -7
  58. package/ui/dist/assets/index-B0re600T.js +0 -32
  59. package/ui/dist/assets/index-CXJEgTOQ.css +0 -1
  60. package/vendor/progrok-0.1.0.tgz +0 -0
package/README.md CHANGED
@@ -16,9 +16,9 @@
 
  `ima2-gen` is a local image generation studio for people who want the ChatGPT/Codex image workflow in a small desktop-like web app.
 
- Run it with `npx`, sign in with Codex OAuth, type a prompt, and keep iterating with history, references, node branches, multimode batches, and Canvas Mode cleanup. No OpenAI API key is required for the default path, but API-key generation and bundled Grok generation are also supported when configured.
+ Run it with `npx`, sign in with ChatGPT OAuth or Grok OAuth, and start generating images and videos. Iterate with history, references, node branches, multimode batches, Canvas Mode cleanup, and Grok Video generation. No API key required: free ChatGPT OAuth and a SuperGrok subscription cover everything.
 
- ![ima2-gen classic generation screen with prompt composer, generated image, compact model label, and result metadata.](assets/screenshots/classic-generate-light.png)
+ ![ima2-gen video playback with gallery sidebar showing generated images and videos.](assets/screenshots/classic-generate-light.png)
 
  ## Quick Start
 
@@ -35,6 +35,13 @@ npx @openai/codex login
  npx ima2-gen serve
  ```
 
+ To generate a video from the CLI:
+
+ ```bash
+ ima2 video "a cat playing piano" --duration 5 --resolution 720p
+ ima2 video "animate this scene" --ref photo.png --duration 10
+ ```
+
  If `3333` is already occupied, `ima2-gen` binds the next available port and writes the actual URL to `~/.ima2/server.json`. Use `ima2 open` or the URL printed in the terminal instead of assuming the port.
 
  You can also install it globally:
@@ -44,6 +51,17 @@ npm install -g ima2-gen
  ima2 serve
  ```
 
+ ### Setup
+
+ `ima2 setup` offers four authentication choices:
+
+ 1. **GPT OAuth** — login with ChatGPT account (free, images only)
+ 2. **Grok OAuth** — login with xAI/Grok account (images + video)
+ 3. **Both** — GPT OAuth + Grok OAuth (full feature access)
+ 4. **API Key** — paste your OpenAI API key (paid)
+
+ Video generation requires Grok OAuth (option 2 or 3). Run `ima2 grok login` separately if you already have GPT OAuth configured and want to add video support.
+
  Before updating a global install on Windows, stop any running `ima2 serve`
  process. If npm reports `EBUSY` or `resource busy or locked`, close ima2
  terminals, end stale `node.exe` processes if needed, and retry. If the lock
@@ -54,9 +72,10 @@ persists, reboot and run the update before starting ima2 again.
  - **Classic mode**: generate, edit, reuse the current image, paste references, and continue from history.
  - **Node mode**: branch a good image into multiple directions without losing the original.
  - **Multimode batches**: launch several Classic outputs from one prompt, watch slot-by-slot progress, and continue from the best result.
+ - **Video generation**: create short videos from text, a single image, or multiple reference images via Grok video models. SSE streaming shows planning → submitted → progress % → done.
  - **Canvas Mode**: zoom, pan, annotate, erase, clean backgrounds, keep transparent previews, and export either alpha or matte-backed versions.
  - **Local gallery**: keep generated assets on your machine with session-aware history. By default the gallery shows the current session and an All Images toggle reveals the full history; the default scope is sticky across sessions. Each image records its generation time and reasoning effort in the result metadata, so they persist across reloads.
- - **Reference images**: drag, drop, paste, and attach up to 5 references; large images are compressed before upload.
+ - **Reference images**: drag, drop, paste, and attach up to 5 references (images) or up to 7 references (video); large images are compressed before upload.
  - **Prompt library imports**: import local prompt packs, GitHub folders, and curated GPT-image prompt hints into the built-in prompt library.
  - **Mobile shell**: use the app bar, compose sheet, and compact settings toggle on smaller screens.
  - **Observable jobs**: active and recent jobs are tracked with safe logs and request IDs.
@@ -73,7 +92,7 @@ Image generation can run through the local Codex/ChatGPT OAuth path, a configure
 
  If no provider is specified, the app keeps the current OAuth/default behavior. API-key generation defaults to `gpt-5.4-mini`, `low` reasoning, and `1024x1024` unless the request passes validated model, reasoning, size, or web-search options. Grok defaults to `grok-imagine-image`; `quality: "high"` promotes the final image call to `grok-imagine-image-quality`.
 
- Grok video generation (T2V/I2V) is not shipped in `1.1.15`. The video files in `docs/grok-video-i2v-plan.md` and `docs/grok-video-i2v-research.md` are implementation planning and research notes only; the published runtime remains image-only.
+ Grok video generation uses `grok-imagine-video` (default) or `grok-imagine-video-1.5-preview`. Three modes are auto-detected from reference count: text-to-video (0 refs), image-to-video (1 ref), and reference-to-video (2–7 refs, max 10s duration). Video controls include duration (1–15s), resolution (480p, 720p), and aspect ratio (1:1, 16:9, 9:16, 4:3, 3:4, 3:2, 2:3, auto).
 
  ![Settings workspace showing OAuth active and API key provider available.](assets/screenshots/settings-oauth-generation.png)
 
@@ -166,6 +185,7 @@ These require a running `ima2 serve`. The CLI covers every server route. The mos
  | `ima2 gen <prompt>` | Generate from the CLI |
  | `ima2 edit <file> --prompt <text>` | Edit an existing image |
  | `ima2 multimode <prompt>` | Multi-image SSE generation |
+ | `ima2 video <prompt>` | Video generation via Grok (SSE streaming with progress) |
  | `ima2 ls [--session <id>] [--favorites]` | List recent history |
  | `ima2 show <name> [--metadata]` | Reveal a generated asset |
  | `ima2 prompt ls -q <search>` | Search the prompt library |
@@ -179,6 +199,8 @@ The server advertises its actual port at `~/.ima2/server.json`. If `3333` is bus
  ima2 gen "poster" --model gpt-5.4 --reasoning-effort high
  ima2 edit input.png --prompt "make it rainy" --web-search
  ima2 multimode "two cats playing" -n 2
+ ima2 video "a cat playing piano" --duration 5 --resolution 720p
+ ima2 video "animate this" --ref photo.png --aspect-ratio 16:9
  ima2 inflight ls --terminal
  ima2 config set imageModels.reasoningEffort high
  ```
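Review note: the README change above describes three video modes selected by reference count, with a shorter duration cap for reference-to-video. A minimal sketch of that rule follows. The function names `detectVideoMode` and `checkDuration` are hypothetical and do not appear in the package, and treating durations as whole seconds is an assumption based on the CLI's use of `parseInt`.

```javascript
// Hypothetical helpers illustrating the README rule; not code from the package.
// 0 refs -> text-to-video, 1 ref -> image-to-video, 2-7 refs -> reference-to-video.
function detectVideoMode(refCount) {
  if (!Number.isInteger(refCount) || refCount < 0 || refCount > 7) {
    throw new RangeError("reference count must be an integer from 0 to 7");
  }
  if (refCount === 0) return { mode: "text-to-video", maxDuration: 15 };
  if (refCount === 1) return { mode: "image-to-video", maxDuration: 15 };
  // The README states a 10s cap for reference-to-video.
  return { mode: "reference-to-video", maxDuration: 10 };
}

// Assumption: durations are whole seconds, starting at 1.
function checkDuration(refCount, seconds) {
  const { mode, maxDuration } = detectVideoMode(refCount);
  const ok = Number.isInteger(seconds) && seconds >= 1 && seconds <= maxDuration;
  return { mode, ok };
}

console.log(checkDuration(0, 15)); // { mode: 'text-to-video', ok: true }
console.log(checkDuration(3, 12)); // { mode: 'reference-to-video', ok: false }
```

The CLI code later in this diff checks only the 1 to 15 range and the 7-reference limit, so the 10s cap for multi-reference requests is presumably enforced elsewhere, for example on the server.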
package/bin/commands/capabilities.js CHANGED
@@ -70,6 +70,12 @@ function printText(capabilities) {
      if (capabilities.valid?.imageModels?.grokSupported?.length) {
          out(` grok models: ${capabilities.valid.imageModels.grokSupported.join(", ")}`);
      }
+     if (capabilities.valid?.videoModels?.supported?.length) {
+         out(` video models: ${capabilities.valid.videoModels.supported.join(", ")}`);
+         out(` video resolutions: ${capabilities.valid.videoModels.resolutions?.join(", ")}`);
+         out(` video aspect ratios: ${capabilities.valid.videoModels.aspectRatios?.join(", ")}`);
+         out(` video duration: ${capabilities.valid.videoModels.durationRange?.[0]}-${capabilities.valid.videoModels.durationRange?.[1]}s`);
+     }
      out(` reasoning: ${capabilities.valid?.reasoningEfforts?.join(", ")}`);
      out(` quality: ${capabilities.valid?.quality?.join(", ")}`);
      out(` modes: ${capabilities.valid?.modes?.join(", ")}`);
package/bin/commands/capabilities.ts CHANGED
@@ -74,6 +74,12 @@ function printText(capabilities: any): void {
    if (capabilities.valid?.imageModels?.grokSupported?.length) {
      out(` grok models: ${capabilities.valid.imageModels.grokSupported.join(", ")}`);
    }
+   if (capabilities.valid?.videoModels?.supported?.length) {
+     out(` video models: ${capabilities.valid.videoModels.supported.join(", ")}`);
+     out(` video resolutions: ${capabilities.valid.videoModels.resolutions?.join(", ")}`);
+     out(` video aspect ratios: ${capabilities.valid.videoModels.aspectRatios?.join(", ")}`);
+     out(` video duration: ${capabilities.valid.videoModels.durationRange?.[0]}-${capabilities.valid.videoModels.durationRange?.[1]}s`);
+   }
    out(` reasoning: ${capabilities.valid?.reasoningEfforts?.join(", ")}`);
    out(` quality: ${capabilities.valid?.quality?.join(", ")}`);
    out(` modes: ${capabilities.valid?.modes?.join(", ")}`);
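Review note: the `printText` additions read four fields from `capabilities.valid.videoModels`. The sketch below shows the payload shape those reads imply and a formatter that omits a line when its field is missing. In the diff's version, a missing `resolutions` array would be interpolated as the text `undefined`, since optional chaining yields `undefined` inside a template literal. The function name `videoCapabilityLines` is hypothetical, and the sample values are assumptions taken from the README and CLI constants in this diff, not from `lib/capabilities.js`.

```javascript
// Hypothetical formatter; mirrors the fields read by printText in the diff.
function videoCapabilityLines(capabilities) {
  const video = capabilities?.valid?.videoModels;
  if (!video?.supported?.length) return [];

  const lines = [`video models: ${video.supported.join(", ")}`];
  if (Array.isArray(video.resolutions)) {
    lines.push(`video resolutions: ${video.resolutions.join(", ")}`);
  }
  if (Array.isArray(video.aspectRatios)) {
    lines.push(`video aspect ratios: ${video.aspectRatios.join(", ")}`);
  }
  if (Array.isArray(video.durationRange) && video.durationRange.length === 2) {
    const [min, max] = video.durationRange;
    lines.push(`video duration: ${min}-${max}s`);
  }
  return lines;
}

// Assumed sample payload (values drawn from the README text, not the server).
const sampleCapabilities = {
  valid: {
    videoModels: {
      supported: ["grok-imagine-video", "grok-imagine-video-1.5-preview"],
      resolutions: ["480p", "720p"],
      aspectRatios: ["1:1", "16:9", "9:16", "4:3", "3:4", "3:2", "2:3", "auto"],
      durationRange: [1, 15],
    },
  },
};

console.log(videoCapabilityLines(sampleCapabilities).join("\n"));
```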
package/bin/commands/video.js ADDED
@@ -0,0 +1,215 @@
+ import { parseArgs } from "../lib/args.js";
+ import { resolveServer } from "../lib/client.js";
+ import { streamSse } from "../lib/sse.js";
+ import { out, die, color, json, exitCodeForError } from "../lib/output.js";
+ import { config } from "../../config.js";
+ import { readFile, writeFile, mkdir } from "node:fs/promises";
+ import { dirname, join } from "node:path";
+ const VALID_RESOLUTIONS = new Set(["480p", "720p"]);
+ const VALID_ASPECT_RATIOS = new Set(["1:1", "16:9", "9:16", "4:3", "3:4", "3:2", "2:3", "auto"]);
+ const VALID_MODELS = new Set(["grok-imagine-video", "grok-imagine-video-1.5-preview"]);
+ const SPEC = {
+     flags: {
+         duration: { type: "string", default: "5" },
+         resolution: { type: "string", default: "480p" },
+         "aspect-ratio": { type: "string", default: "auto" },
+         model: { type: "string" },
+         topic: { type: "string" },
+         ref: { type: "string", repeatable: true },
+         out: { short: "o", type: "string" },
+         "out-dir": { short: "d", type: "string" },
+         json: { type: "boolean" },
+         timeout: { type: "string", default: "600" },
+         server: { type: "string" },
+         session: { type: "string" },
+         help: { short: "h", type: "boolean" },
+     },
+ };
+ const HELP = `
+ ima2 video <prompt...> [options]
+
+ Generate a video via the Grok video provider (SSE streaming).
+
+ Options:
+ --duration <1..15> Duration in seconds. Default: 5
+ --resolution <480p|720p> Default: 480p
+ --aspect-ratio <ratio|auto> 1:1, 16:9, 9:16, 4:3, 3:4, 3:2, 2:3, auto. Default: auto
+ --model <name> grok-imagine-video, grok-imagine-video-1.5-preview
+ --topic <text> Series topic for prompt chain continuity
+ --ref <file> Attach source/reference image (repeatable, max 7)
+ -o, --out <file> Output file path
+ -d, --out-dir <dir> Output directory
+ --json Print JSON result to stdout
+ --timeout <sec> Default: 600
+ --server <url> Override server URL
+ --session <id> Session ID
+
+ Modes (auto-detected from --ref count):
+ 0 refs → text-to-video
+ 1 ref → image-to-video
+ 2-7 refs → reference-to-video (max 10s duration)
+
+ Examples:
+ ima2 video "a cat playing piano"
+ ima2 video "animate this" --ref photo.png --duration 10
+ ima2 video "cinematic" --resolution 720p --aspect-ratio 16:9 -o out.mp4
+ `;
+ export default async function videoCmd(argv) {
+     const args = parseArgs(argv, SPEC);
+     if (args.help) {
+         out(HELP);
+         return;
+     }
+     const prompt = args.positional.join(" ");
+     if (!prompt)
+         die(2, "prompt is required");
+     const duration = parseInt(String(args.duration)) || 5;
+     if (duration < 1 || duration > 15)
+         die(2, "--duration must be between 1 and 15");
+     const resolution = String(args.resolution);
+     if (!VALID_RESOLUTIONS.has(resolution))
+         die(2, "--resolution must be one of: 480p, 720p");
+     const aspectRatio = String(args["aspect-ratio"]);
+     if (!VALID_ASPECT_RATIOS.has(aspectRatio))
+         die(2, "--aspect-ratio must be one of: 1:1, 16:9, 9:16, 4:3, 3:4, 3:2, 2:3, auto");
+     if (args.model && !VALID_MODELS.has(String(args.model))) {
+         die(2, "--model must be one of: grok-imagine-video, grok-imagine-video-1.5-preview");
+     }
+     const refs = (Array.isArray(args.ref) ? args.ref : []);
+     if (refs.length > 7)
+         die(2, "max 7 --ref attachments for video");
+     let server;
+     try {
+         server = await resolveServer({ serverFlag: args.server });
+     }
+     catch (e) {
+         die(exitCodeForError(e), e.message);
+         throw e;
+     }
+     const referenceImages = await Promise.all(refs.map(async (p) => {
+         const buf = await readFile(p);
+         return buf.toString("base64");
+     }));
+     const timeoutMs = (parseInt(String(args.timeout)) || 600) * 1000;
+     const requestId = `req_cli_video_${Date.now().toString(36)}`;
+     const body = {
+         prompt,
+         provider: "grok",
+         duration,
+         resolution,
+         aspectRatio,
+         requestId,
+     };
+     if (args.model)
+         body.model = args.model;
+     if (args.session)
+         body.sessionId = args.session;
+     if (args.topic)
+         body.topic = args.topic;
+     if (referenceImages.length === 1) {
+         body.sourceImage = referenceImages[0];
+     }
+     else if (referenceImages.length > 1) {
+         body.referenceImages = referenceImages;
+     }
+     const ac = new AbortController();
+     let timedOut = false;
+     const timeoutTimer = setTimeout(() => { timedOut = true; ac.abort(); }, timeoutMs);
+     const onSig = () => { ac.abort(); process.exit(130); };
+     process.once("SIGINT", onSig);
+     process.once("SIGTERM", onSig);
+     const url = `${server.base}/api/video/generate`;
+     let doneData = null;
+     let lastProgress = -1;
+     try {
+         for await (const ev of streamSse(url, { body, signal: ac.signal, headers: { "X-Request-Id": requestId } })) {
+             switch (ev.event) {
+                 case "planning":
+                     if (!args.json)
+                         out(color.dim("[planning] preparing video generation..."));
+                     break;
+                 case "submitted":
+                     if (!args.json)
+                         out(color.dim(`[submitted] xai request: ${ev.data.xaiVideoRequestId || "..."}`));
+                     break;
+                 case "progress": {
+                     const pct = typeof ev.data.progress === "number" ? Math.round(ev.data.progress * 100) : null;
+                     if (pct !== null && pct !== lastProgress && !args.json) {
+                         const bar = renderBar(pct);
+                         process.stdout.write(`\r ${bar} ${pct}%`);
+                         lastProgress = pct;
+                     }
+                     break;
+                 }
+                 case "done":
+                     if (!args.json && lastProgress >= 0)
+                         process.stdout.write("\n");
+                     doneData = ev.data;
+                     break;
+                 case "error":
+                     if (!args.json && lastProgress >= 0)
+                         process.stdout.write("\n");
+                     die(1, `video error: ${ev.data.error || ev.data}${ev.data.code ? ` (${ev.data.code})` : ""}`);
+             }
+         }
+     }
+     catch (e) {
+         if (e.name === "AbortError" && !timedOut)
+             return;
+         if (!args.json && lastProgress >= 0)
+             process.stdout.write("\n");
+         die(exitCodeForError(e), e.message);
+     }
+     finally {
+         clearTimeout(timeoutTimer);
+         process.off("SIGINT", onSig);
+         process.off("SIGTERM", onSig);
+     }
+     if (!doneData?.filename)
+         die(1, "server did not return a video filename");
+     // Determine output path
+     const filename = String(doneData.filename);
+     const explicitOut = args.out ? String(args.out) : null;
+     const outDir = args["out-dir"] ? String(args["out-dir"]) : null;
+     let target;
+     if (explicitOut) {
+         target = explicitOut;
+     }
+     else if (outDir) {
+         target = join(outDir, filename);
+     }
+     else {
+         target = join(config.storage.generatedDir, filename);
+     }
+     // Download the video file from server
+     const videoUrl = `${server.base}${doneData.url || `/generated/${encodeURIComponent(filename)}`}`;
+     const dlRes = await fetch(videoUrl, { signal: AbortSignal.timeout(30_000) });
+     if (!dlRes.ok)
+         die(1, `failed to download video: HTTP ${dlRes.status}`);
+     const videoBuf = Buffer.from(await dlRes.arrayBuffer());
+     await mkdir(dirname(target), { recursive: true }).catch(() => { });
+     await writeFile(target, videoBuf);
+     if (args.json) {
+         json({
+             ok: true,
+             requestId: doneData.requestId,
+             path: target,
+             filename,
+             elapsed: doneData.elapsed,
+             video: doneData.video,
+             revisedPrompt: doneData.revisedPrompt,
+         });
+     }
+     else {
+         out(color.green("✓ ") + target);
+         if (doneData.elapsed)
+             out(color.dim(`elapsed ${doneData.elapsed}s`));
+         if (doneData.revisedPrompt)
+             out(color.dim(`revised: ${String(doneData.revisedPrompt).slice(0, 80)}`));
+     }
+ }
+ function renderBar(pct) {
+     const width = 20;
+     const filled = Math.round((pct / 100) * width);
+     return color.green("█".repeat(filled)) + color.dim("░".repeat(width - filled));
+ }
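Review note: the new command parses `--duration` with `parseInt(String(args.duration)) || 5`. Because `0` and `NaN` are both falsy, an explicit `--duration 0` or a non-numeric value never reaches the range check and falls back to 5 instead. The sketch below reproduces that expression next to a stricter variant. `parseDurationStrict` is a hypothetical alternative and not part of the package.

```javascript
// Same expression as in the diff, isolated so its behaviour can be inspected.
function parseDurationLenient(value) {
  return parseInt(String(value)) || 5;
}

// Hypothetical stricter variant: accepts only whole seconds from 1 to 15.
function parseDurationStrict(value) {
  const text = String(value).trim();
  if (!/^\d+$/.test(text)) {
    throw new RangeError(`--duration must be a whole number, got "${text}"`);
  }
  const seconds = Number(text);
  if (seconds < 1 || seconds > 15) {
    throw new RangeError("--duration must be between 1 and 15");
  }
  return seconds;
}

console.log(parseDurationLenient("0"));   // 5 (0 is falsy, so the default wins)
console.log(parseDurationLenient("abc")); // 5 (NaN is falsy)
console.log(parseDurationLenient("7.9")); // 7 (parseInt truncates)
console.log(parseDurationLenient("10s")); // 10 (trailing text is ignored)
console.log(parseDurationStrict("10"));   // 10
```

Whether the lenient fallback is intentional is not stated in the diff. The stricter form trades convenience for an explicit error on malformed input.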
package/bin/commands/video.ts ADDED
@@ -0,0 +1,205 @@
+ import { parseArgs } from "../lib/args.js";
+ import { resolveServer } from "../lib/client.js";
+ import { streamSse } from "../lib/sse.js";
+ import { out, die, color, json, exitCodeForError } from "../lib/output.js";
+ import { config } from "../../config.js";
+ import { readFile, writeFile, mkdir } from "node:fs/promises";
+ import { dirname, join } from "node:path";
+
+ const VALID_RESOLUTIONS = new Set(["480p", "720p"]);
+ const VALID_ASPECT_RATIOS = new Set(["1:1", "16:9", "9:16", "4:3", "3:4", "3:2", "2:3", "auto"]);
+ const VALID_MODELS = new Set(["grok-imagine-video", "grok-imagine-video-1.5-preview"]);
+
+ const SPEC = {
+   flags: {
+     duration: { type: "string", default: "5" },
+     resolution: { type: "string", default: "480p" },
+     "aspect-ratio": { type: "string", default: "auto" },
+     model: { type: "string" },
+     topic: { type: "string" },
+     ref: { type: "string", repeatable: true },
+     out: { short: "o", type: "string" },
+     "out-dir": { short: "d", type: "string" },
+     json: { type: "boolean" },
+     timeout: { type: "string", default: "600" },
+     server: { type: "string" },
+     session: { type: "string" },
+     help: { short: "h", type: "boolean" },
+   },
+ };
+
+ const HELP = `
+ ima2 video <prompt...> [options]
+
+ Generate a video via the Grok video provider (SSE streaming).
+
+ Options:
+ --duration <1..15> Duration in seconds. Default: 5
+ --resolution <480p|720p> Default: 480p
+ --aspect-ratio <ratio|auto> 1:1, 16:9, 9:16, 4:3, 3:4, 3:2, 2:3, auto. Default: auto
+ --model <name> grok-imagine-video, grok-imagine-video-1.5-preview
+ --topic <text> Series topic for prompt chain continuity
+ --ref <file> Attach source/reference image (repeatable, max 7)
+ -o, --out <file> Output file path
+ -d, --out-dir <dir> Output directory
+ --json Print JSON result to stdout
+ --timeout <sec> Default: 600
+ --server <url> Override server URL
+ --session <id> Session ID
+
+ Modes (auto-detected from --ref count):
+ 0 refs → text-to-video
+ 1 ref → image-to-video
+ 2-7 refs → reference-to-video (max 10s duration)
+
+ Examples:
+ ima2 video "a cat playing piano"
+ ima2 video "animate this" --ref photo.png --duration 10
+ ima2 video "cinematic" --resolution 720p --aspect-ratio 16:9 -o out.mp4
+ `;
+
+ export default async function videoCmd(argv: string[]) {
+   const args = parseArgs(argv, SPEC);
+   if (args.help) { out(HELP); return; }
+
+   const prompt = args.positional.join(" ");
+   if (!prompt) die(2, "prompt is required");
+
+   const duration = parseInt(String(args.duration)) || 5;
+   if (duration < 1 || duration > 15) die(2, "--duration must be between 1 and 15");
+
+   const resolution = String(args.resolution);
+   if (!VALID_RESOLUTIONS.has(resolution)) die(2, "--resolution must be one of: 480p, 720p");
+
+   const aspectRatio = String(args["aspect-ratio"]);
+   if (!VALID_ASPECT_RATIOS.has(aspectRatio)) die(2, "--aspect-ratio must be one of: 1:1, 16:9, 9:16, 4:3, 3:4, 3:2, 2:3, auto");
+
+   if (args.model && !VALID_MODELS.has(String(args.model))) {
+     die(2, "--model must be one of: grok-imagine-video, grok-imagine-video-1.5-preview");
+   }
+
+   const refs = (Array.isArray(args.ref) ? args.ref : []) as string[];
+   if (refs.length > 7) die(2, "max 7 --ref attachments for video");
+
+   let server;
+   try { server = await resolveServer({ serverFlag: args.server }); }
+   catch (e: unknown) { die(exitCodeForError(e), (e as Error).message); throw e; }
+
+   const referenceImages = await Promise.all(refs.map(async (p: string) => {
+     const buf = await readFile(p);
+     return buf.toString("base64");
+   }));
+
+   const timeoutMs = (parseInt(String(args.timeout)) || 600) * 1000;
+   const requestId = `req_cli_video_${Date.now().toString(36)}`;
+
+   const body: Record<string, unknown> = {
+     prompt,
+     provider: "grok",
+     duration,
+     resolution,
+     aspectRatio,
+     requestId,
+   };
+   if (args.model) body.model = args.model;
+   if (args.session) body.sessionId = args.session;
+   if (args.topic) body.topic = args.topic;
+   if (referenceImages.length === 1) {
+     body.sourceImage = referenceImages[0];
+   } else if (referenceImages.length > 1) {
+     body.referenceImages = referenceImages;
+   }
+
+   const ac = new AbortController();
+   let timedOut = false;
+   const timeoutTimer = setTimeout(() => { timedOut = true; ac.abort(); }, timeoutMs);
+   const onSig = () => { ac.abort(); process.exit(130); };
+   process.once("SIGINT", onSig);
+   process.once("SIGTERM", onSig);
+
+   const url = `${server.base}/api/video/generate`;
+   let doneData: Record<string, unknown> | null = null;
+   let lastProgress = -1;
+
+   try {
+     for await (const ev of streamSse(url, { body, signal: ac.signal, headers: { "X-Request-Id": requestId } })) {
+       switch (ev.event) {
+         case "planning":
+           if (!args.json) out(color.dim("[planning] preparing video generation..."));
+           break;
+         case "submitted":
+           if (!args.json) out(color.dim(`[submitted] xai request: ${ev.data.xaiVideoRequestId || "..."}`));
+           break;
+         case "progress": {
+           const pct = typeof ev.data.progress === "number" ? Math.round(ev.data.progress * 100) : null;
+           if (pct !== null && pct !== lastProgress && !args.json) {
+             const bar = renderBar(pct);
+             process.stdout.write(`\r ${bar} ${pct}%`);
+             lastProgress = pct;
+           }
+           break;
+         }
+         case "done":
+           if (!args.json && lastProgress >= 0) process.stdout.write("\n");
+           doneData = ev.data;
+           break;
+         case "error":
+           if (!args.json && lastProgress >= 0) process.stdout.write("\n");
+           die(1, `video error: ${ev.data.error || ev.data}${ev.data.code ? ` (${ev.data.code})` : ""}`);
+       }
+     }
+   } catch (e: unknown) {
+     if ((e as Error).name === "AbortError" && !timedOut) return;
+     if (!args.json && lastProgress >= 0) process.stdout.write("\n");
+     die(exitCodeForError(e), (e as Error).message);
+   } finally {
+     clearTimeout(timeoutTimer);
+     process.off("SIGINT", onSig);
+     process.off("SIGTERM", onSig);
+   }
+
+   if (!doneData?.filename) die(1, "server did not return a video filename");
+
+   // Determine output path
+   const filename = String(doneData.filename);
+   const explicitOut = args.out ? String(args.out) : null;
+   const outDir = args["out-dir"] ? String(args["out-dir"]) : null;
+   let target: string;
+   if (explicitOut) {
+     target = explicitOut;
+   } else if (outDir) {
+     target = join(outDir, filename);
+   } else {
+     target = join(config.storage.generatedDir, filename);
+   }
+
+   // Download the video file from server
+   const videoUrl = `${server.base}${doneData.url || `/generated/${encodeURIComponent(filename)}`}`;
+   const dlRes = await fetch(videoUrl, { signal: AbortSignal.timeout(30_000) });
+   if (!dlRes.ok) die(1, `failed to download video: HTTP ${dlRes.status}`);
+   const videoBuf = Buffer.from(await dlRes.arrayBuffer());
+   await mkdir(dirname(target), { recursive: true }).catch(() => {});
+   await writeFile(target, videoBuf);
+
+   if (args.json) {
+     json({
+       ok: true,
+       requestId: doneData.requestId,
+       path: target,
+       filename,
+       elapsed: doneData.elapsed,
+       video: doneData.video,
+       revisedPrompt: doneData.revisedPrompt,
+     });
+   } else {
+     out(color.green("✓ ") + target);
+     if (doneData.elapsed) out(color.dim(`elapsed ${doneData.elapsed}s`));
+     if (doneData.revisedPrompt) out(color.dim(`revised: ${String(doneData.revisedPrompt).slice(0, 80)}`));
+   }
+ }
+
+ function renderBar(pct: number): string {
+   const width = 20;
+   const filled = Math.round((pct / 100) * width);
+   return color.green("█".repeat(filled)) + color.dim("░".repeat(width - filled));
+ }
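Review note: the command's event loop handles five event names (`planning`, `submitted`, `progress`, `done`, `error`) and redraws a 20-cell bar only when the rounded percentage changes. The sketch below replays that logic over an in-memory list of events so it can be run without a server. `renderPlainBar` and `summarizeEvents` are hypothetical names, the bar uses plain `#` and `-` characters instead of the package's `color` helper, and the sample request ID and filename are made up for illustration. That `progress` is a fraction between 0 and 1 is inferred from the `* 100` in the diff.

```javascript
// Plain-text stand-in for renderBar: same width and rounding, no colour helper.
function renderPlainBar(pct, width = 20) {
  const clamped = Math.min(100, Math.max(0, pct));
  const filled = Math.round((clamped / 100) * width);
  return "#".repeat(filled) + "-".repeat(width - filled);
}

// Folds { event, data } records into printable lines plus the final payload.
function summarizeEvents(events) {
  const lines = [];
  let lastProgress = -1;
  let done = null;
  for (const ev of events) {
    switch (ev.event) {
      case "planning":
        lines.push("[planning] preparing video generation...");
        break;
      case "submitted":
        lines.push(`[submitted] xai request: ${ev.data?.xaiVideoRequestId ?? "..."}`);
        break;
      case "progress": {
        const raw = ev.data?.progress;
        const pct = typeof raw === "number" ? Math.round(raw * 100) : null;
        // Skip redraws when the rounded percentage has not changed.
        if (pct !== null && pct !== lastProgress) {
          lines.push(`${renderPlainBar(pct)} ${pct}%`);
          lastProgress = pct;
        }
        break;
      }
      case "done":
        done = ev.data;
        break;
      case "error":
        throw new Error(`video error: ${ev.data?.error ?? "unknown"}`);
    }
  }
  return { lines, done };
}

// Illustrative event sequence; the ID and filename are invented sample values.
const demoRun = summarizeEvents([
  { event: "planning", data: {} },
  { event: "submitted", data: { xaiVideoRequestId: "vid_123" } },
  { event: "progress", data: { progress: 0.45 } },
  { event: "progress", data: { progress: 0.45 } },
  { event: "done", data: { filename: "clip.mp4" } },
]);
console.log(demoRun.lines.join("\n"));
console.log(demoRun.done.filename);
```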
package/bin/ima2.js CHANGED
@@ -62,11 +62,13 @@ async function setup() {
      const rl = createInterface({ input: process.stdin, output: process.stdout });
      console.log("\n ima2-gen — GPT Image 2 Generator\n");
      console.log(" Choose authentication method:\n");
-     console.log(" 1) API Key — paste your OpenAI API key (paid)");
-     console.log(" 2) OAuth — login with ChatGPT account (free)\n");
-     const choice = await rl.question(" Enter 1 or 2: ");
+     console.log(" 1) GPT OAuth — login with ChatGPT account (free, images only)");
+     console.log(" 2) Grok OAuth — login with xAI/Grok account (images + video)");
+     console.log(" 3) Both — GPT OAuth + Grok OAuth");
+     console.log(" 4) API Key — paste your OpenAI API key (paid)\n");
+     const choice = await rl.question(" Enter 1-4: ");
      const config = loadConfig();
-     if (choice.trim() === "1") {
+     if (choice.trim() === "4") {
          const key = await rl.question(" OpenAI API Key: ");
          if (!key.startsWith("sk-")) {
              console.log(" Invalid API key format. Expected sk-...");
@@ -78,12 +80,62 @@ async function setup() {
          saveConfig(config);
          console.log("\n API key saved. Starting server...\n");
      }
+     else if (choice.trim() === "2") {
+         config.provider = "grok";
+         config.oauth = config.oauth || {};
+         config.oauth.disableAutoStart = true;
+         delete config.apiKey;
+         saveConfig(config);
+         console.log("\n Starting Grok OAuth login...\n");
+         try {
+             execSync(`node ${JSON.stringify(join(ROOT, "bin", "ima2.js"))} grok login`, { stdio: "inherit" });
+         }
+         catch {
+             console.log("\n Grok login failed or cancelled. You can retry with 'ima2 grok login'.\n");
+             rl.close();
+             process.exit(1);
+         }
+         console.log(" Grok configured. Run 'ima2 serve' to start.\n");
+     }
+     else if (choice.trim() === "3") {
+         config.provider = "oauth";
+         delete config.apiKey;
+         if (config.oauth)
+             delete config.oauth.disableAutoStart;
+         saveConfig(config);
+         console.log("\n Setting up both GPT OAuth + Grok OAuth...\n");
+         // GPT OAuth
+         const auth = detectCodexAuth();
+         if (!auth.authed) {
+             console.log(" Running GPT OAuth login...\n");
+             try {
+                 execSync(`${resolveBin("npx")} @openai/codex login`, { stdio: "inherit" });
+             }
+             catch {
+                 console.log("\n GPT login failed. Continuing with Grok...\n");
+             }
+         }
+         else {
+             console.log(` GPT OAuth session found.\n`);
+         }
+         // Grok OAuth
+         console.log(" Running Grok OAuth login...\n");
+         try {
+             execSync(`node ${JSON.stringify(join(ROOT, "bin", "ima2.js"))} grok login`, { stdio: "inherit" });
+         }
+         catch {
+             console.log("\n Grok login failed. You can retry with 'ima2 grok login'.\n");
+         }
+         console.log(" Both providers configured.\n");
+     }
      else {
+         // Default: GPT OAuth (choice 1 or anything else)
          config.provider = "oauth";
+         config.oauth = config.oauth || {};
+         config.oauth.disableAutoStart = false;
          delete config.apiKey;
          saveConfig(config);
          console.log("\n Starting OAuth login...\n");
-         // Check if codex auth exists (file OR keyring via `codex login status`)
          const auth = detectCodexAuth();
          const hasAuth = auth.authed;
          if (!hasAuth) {
@@ -211,6 +263,7 @@ function showHelp() {
 
  Client commands (require a running 'ima2 serve'):
  gen <prompt> Generate image(s) from prompt (ima2 gen --help)
+ video <prompt> Generate video via Grok (ima2 video --help)
  edit <file> Edit an existing image (ima2 edit --help)
  ls List recent history (ima2 ls --help)
  show <name> Show one history item (ima2 show --help)
@@ -256,6 +309,7 @@ function showHelp() {
  ima2 serve --dev Start with verbose server diagnostics
  ima2 gen "a shiba in space" Generate from CLI
  ima2 gen "merge" --ref a.png --ref b.png -q high -o out.png
+ ima2 video "a cat playing piano" --duration 10
  ima2 ls -n 10 Last 10 generations
  ima2 skill Print agent usage skill
  ima2 capabilities --json Inspect supported models/options
@@ -271,7 +325,7 @@ if (args.includes("-v") || args.includes("--version")) {
      process.exit(0);
  }
  if ((!command || args.includes("-h") || args.includes("--help"))
-     && !["doctor", "gen", "edit", "ls", "show", "ps", "cancel", "session", "history", "prompt", "multimode", "node", "annotate", "canvas-versions", "metadata", "comfy", "cardnews", "inflight", "storage", "billing", "providers", "oauth", "grok", "config", "defaults", "capabilities", "skill", "ping"].includes(command)) {
+     && !["doctor", "gen", "video", "edit", "ls", "show", "ps", "cancel", "session", "history", "prompt", "multimode", "node", "annotate", "canvas-versions", "metadata", "comfy", "cardnews", "inflight", "storage", "billing", "providers", "oauth", "grok", "config", "defaults", "capabilities", "skill", "ping"].includes(command)) {
      showHelp();
      process.exit(command ? 0 : 1);
  }
@@ -314,6 +368,7 @@ switch (command) {
          }
          break;
      case "gen":
+     case "video":
      case "edit":
      case "ls":
      case "show":
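Review note: the `setup()` hunks in `package/bin/ima2.js` map the four menu choices onto config changes and login steps. The sketch below restates that mapping as a pure function so the branches can be compared side by side. `applySetupChoice` and its return shape are hypothetical. The API-key branch is left untouched because the diff shows only the key prompt and the `sk-` prefix check, not the fields that branch writes.

```javascript
// Hypothetical pure restatement of the setup() branches shown in the diff.
// Returns a copy of the config plus the logins that setup would attempt.
function applySetupChoice(config, choice) {
  const next = { ...config };
  if (config.oauth) next.oauth = { ...config.oauth };

  switch (String(choice).trim()) {
    case "4":
      // API key path: config writes are not fully visible in this diff.
      return { config: next, logins: [], needsApiKey: true };
    case "2":
      next.provider = "grok";
      next.oauth = next.oauth || {};
      next.oauth.disableAutoStart = true;
      delete next.apiKey;
      return { config: next, logins: ["grok"], needsApiKey: false };
    case "3":
      next.provider = "oauth";
      delete next.apiKey;
      if (next.oauth) delete next.oauth.disableAutoStart;
      // GPT login runs only when no existing Codex session is detected.
      return { config: next, logins: ["gpt", "grok"], needsApiKey: false };
    default:
      // Choice 1 or any other input falls through to GPT OAuth.
      next.provider = "oauth";
      next.oauth = next.oauth || {};
      next.oauth.disableAutoStart = false;
      delete next.apiKey;
      return { config: next, logins: ["gpt"], needsApiKey: false };
  }
}

const grokOnly = applySetupChoice({ apiKey: "sk-example" }, "2");
console.log(grokOnly.config.provider, grokOnly.config.oauth.disableAutoStart);
// grok true
```

One behaviour worth noting from the diff itself: any input other than 2, 3, or 4, including an empty line, selects GPT OAuth rather than prompting again.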