ima2-gen 2.0.1 → 2.0.2
- package/CHANGELOG.md +150 -0
- package/README.md +10 -1
- package/bin/commands/backfillThumbs.js +6 -0
- package/bin/commands/gen.js +6 -0
- package/bin/ima2.js +14 -10
- package/docs/API.md +131 -8
- package/docs/CLI.md +2 -1
- package/docs/FAQ.ko.md +16 -0
- package/docs/FAQ.md +30 -0
- package/docs/README.ko.md +7 -3
- package/docs/migration/runtime-test-inventory.md +15 -1
- package/lib/agentImageVideoGen.js +261 -0
- package/lib/agentRuntime.js +7 -262
- package/lib/agyImageAdapter.js +35 -8
- package/lib/errorClassify.js +8 -7
- package/lib/eventBus.js +71 -0
- package/lib/geminiApiImageAdapter.js +16 -20
- package/lib/generationErrors.js +3 -1
- package/lib/grokImageAdapter.js +68 -129
- package/lib/grokImageCore.js +153 -0
- package/lib/grokMultimodeAdapter.js +5 -3
- package/lib/grokVideoCanvas.js +13 -0
- package/lib/grokVideoPlannerPrompt.js +53 -6
- package/lib/historyList.js +1 -0
- package/lib/inflight.js +54 -17
- package/lib/multimodeHelpers.js +10 -0
- package/lib/nodeHelpers.js +59 -0
- package/lib/oauthProxy/prompts.js +30 -36
- package/lib/promptBuilder/systemPrompt.js +2 -5
- package/lib/promptSafetyPolicy.js +1 -5
- package/lib/responsesFallback.js +2 -1
- package/lib/routeHelpers.js +44 -0
- package/lib/ssePublish.js +12 -0
- package/lib/storyboardPrefix.js +28 -0
- package/lib/thumbBackfill.js +16 -5
- package/package.json +4 -1
- package/routes/agy.js +44 -0
- package/routes/auth.js +6 -2
- package/routes/edit.js +7 -1
- package/routes/events.js +78 -0
- package/routes/generate.js +99 -127
- package/routes/index.js +4 -0
- package/routes/multimode.js +99 -56
- package/routes/nodes.js +59 -103
- package/routes/video.js +100 -17
- package/skills/ima2/SKILL.md +98 -21
- package/ui/dist/.vite/manifest.json +12 -12
- package/ui/dist/assets/{AgentWorkspace-CYv84Rus.js → AgentWorkspace-Dth6YijN.js} +1 -1
- package/ui/dist/assets/{CardNewsWorkspace-Dqyc1WZ1.js → CardNewsWorkspace-Dav3K5CT.js} +1 -1
- package/ui/dist/assets/{NodeCanvas-ChEXzQbb.js → NodeCanvas-C4ifFzB1.js} +1 -1
- package/ui/dist/assets/{PromptBuilderPanel-B95ZufnR.js → PromptBuilderPanel-CEcyU9PL.js} +1 -1
- package/ui/dist/assets/{PromptImportDialog-DGOwFQET.js → PromptImportDialog-CgQ94Gth.js} +2 -2
- package/ui/dist/assets/{PromptImportDiscoverySection-CgvdnR49.js → PromptImportDiscoverySection-CuzyzbNI.js} +1 -1
- package/ui/dist/assets/{PromptImportFolderSection-CfUye9J8.js → PromptImportFolderSection-DHLGlO6l.js} +1 -1
- package/ui/dist/assets/{PromptLibraryPanel-B9kndPw1.js → PromptLibraryPanel-BOe18we8.js} +2 -2
- package/ui/dist/assets/SettingsWorkspace-Cdgnm4Wa.js +1 -0
- package/ui/dist/assets/{index-BhcvL0g-.js → index-C5PSahkr.js} +1 -1
- package/ui/dist/assets/index-Dn2AhL6d.css +1 -0
- package/ui/dist/assets/index-Tjqx6wUV.js +23 -0
- package/ui/dist/index.html +2 -2
- package/ui/dist/assets/SettingsWorkspace-B3tgLrmF.js +0 -1
- package/ui/dist/assets/index-BtK3YhJc.js +0 -39
- package/ui/dist/assets/index-ClOLOjnA.css +0 -1
package/CHANGELOG.md
ADDED
|
@@ -0,0 +1,150 @@
|
|
|
1
|
+
# Changelog
|
|
2
|
+
|
|
3
|
+
All notable changes to this project are documented in this file.
|
|
4
|
+
|
|
5
|
+
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
|
|
6
|
+
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
|
|
7
|
+
|
|
8
|
+
## [Unreleased]
|
|
9
|
+
|
|
10
|
+
### Added
|
|
11
|
+
|
|
12
|
+
- **SSE multiplexing** — shared `GET /api/events` endpoint with ring-buffer replay and `Last-Event-ID` reconnect support (`lib/eventBus.ts`, `routes/events.ts`).
|
|
13
|
+
- **Async POST generation mode** — multimode, node, and video routes accept async POST and dual-emit progress on both per-request SSE and the shared event bus.
|
|
14
|
+
- **Frontend event channel** — singleton `EventSource` client (`ui/src/lib/eventChannel.ts`) replaces per-request SSE streams for UI generation flows.
|
|
15
|
+
- **Subscribe-before-fetch contract** — `tests/async-stream-subscribe-order.test.js` locks the race where ultra-fast server publish could arrive before client handler registration.
|
|
16
|
+
- Store modularization — split monolithic `useAppStore` into focused impl modules (`storeGenImpl`, `storeNodeGenImpl`, `storeVideoImpl`, `storeInflightImpl`, etc.).
|
|
17
|
+
- Frontend/API barrel splits — `ui/src/lib/api.ts` and `ui/src/index.css` decomposed into ≤500-line modules.
|
|
18
|
+
- Storyboard workflow — 9-panel grid with black Panel 1 lead-in for image and video generation.
|
|
19
|
+
- Gallery hang fix — video decoder/connection exhaustion on focus change (RCA 01).
|
|
20
|
+
|
|
21
|
+
### Changed
|
|
22
|
+
|
|
23
|
+
- UI clients migrated from per-request SSE to `eventChannel` + async POST for multimode, node, and video generation.
|
|
24
|
+
- Multimode concurrency tracking uses `activeFlightIds` Set instead of `multimodeAbortControllers`.
|
|
25
|
+
- Test suite grew to **968** cases across **186** files (65 runtime-importing, 121 contract-only).
|
|
26
|
+
|
|
27
|
+
### Fixed
|
|
28
|
+
|
|
29
|
+
- SSE multiplexing hardening — inflight cancel/done race guards, replay-gap handling, subscribe/timeout/requestId races, and frontend reconnect/error parsing (`sseStreamError.ts`).
|
|
30
|
+
- Node route validation order — `startJob`/202 response moved after request validation.
|
|
31
|
+
- CI typecheck — unused imports in card-news tests and store split type mismatches.
|
|
32
|
+
- Thumbnail backfill failure reporting (#94).
|
|
33
|
+
- AGY Windows pipe handling, Gemini API aspect ratio string values, multimode same-prompt batching.
|
|
34
|
+
- Moderation over-filtering — removed safety tags and added error enrichment.
|
|
35
|
+
|
|
36
|
+
## [2.0.1] - 2026-06-03
|
|
37
|
+
|
|
38
|
+
### Added
|
|
39
|
+
|
|
40
|
+
- **Gemini API provider** (`provider: "gemini-api"`) — direct Generative Language API and Vertex AI paths with `nano-banana-2` / `nano-banana-pro` model picker, aspect ratio, and resolution controls.
|
|
41
|
+
- **Grok billing quota bar** — `$used/$limit` on QuotaCard via `GET /api/quota`.
|
|
42
|
+
- **Switch Account** — device-code OAuth re-auth for Grok and Codex without leaving the app.
|
|
43
|
+
- **Grok video model picker** — V / V1.5 selection in video controls.
|
|
44
|
+
- Image/video thumbnails and history sidebar cards.
|
|
45
|
+
- Centralized recursive thumbnail backfill.
|
|
46
|
+
- Gemini/Vertex API key management routes and web UI.
|
|
47
|
+
|
|
48
|
+
### Changed
|
|
49
|
+
|
|
50
|
+
- Provider plumbing and CLI parity for gemini-api, grok-api, and vertex paths.
|
|
51
|
+
- Grok model/size pickers and adapter updates.
|
|
52
|
+
- Pages and developer docs reorganized to be feature-centric.
|
|
53
|
+
|
|
54
|
+
### Fixed
|
|
55
|
+
|
|
56
|
+
- Preserve video metadata in sequence history and thumbnail fallbacks in history UI.
|
|
57
|
+
- Vertex AI integration — auth mode persistence, skip unsupported `response_format`, prefer Vertex over API key when both configured.
|
|
58
|
+
- Gemini image cost corrected to official pricing; aspect ratio/resolution UI layout polish.
|
|
59
|
+
- Skip GPT pixel-limit size confirm for Grok/Gemini providers.
|
|
60
|
+
- Reap orphaned codex device-auth child on abandoned Switch Account flow.
|
|
61
|
+
- Document Gemini providers in CLI help; harden provider paths and CLI metadata.
|
|
62
|
+
|
|
63
|
+
### Security
|
|
64
|
+
|
|
65
|
+
- Atomic `config.json` writes in keys routes; atomic token write with codex env scrubbing.
|
|
66
|
+
- Cap sharp input pixels to prevent decompression bombs.
|
|
67
|
+
- Audit fixes — crypto session IDs, session cap, double-click guard, API key in header only.
|
|
68
|
+
|
|
69
|
+
## [2.0.0] - 2026-06-02
|
|
70
|
+
|
|
71
|
+
### Added
|
|
72
|
+
|
|
73
|
+
- Major version bump packaging the Gemini API, Grok API key, Vertex AI, and expanded provider surface shipped in the 1.1.x preview line.
|
|
74
|
+
|
|
75
|
+
## [1.1.23] - 2026-06-02
|
|
76
|
+
|
|
77
|
+
### Added
|
|
78
|
+
|
|
79
|
+
- Gallery skeleton shimmer and F5 refresh fix (#93).
|
|
80
|
+
- Hero one-click install scripts on the documentation site.
|
|
81
|
+
|
|
82
|
+
## [1.1.22] - 2026-06-02
|
|
83
|
+
|
|
84
|
+
### Fixed
|
|
85
|
+
|
|
86
|
+
- Graceful shutdown releases file handles on Windows (EBUSY fix).
|
|
87
|
+
- Ctrl+C clean shutdown — database close, child process stop, file lock release.
|
|
88
|
+
|
|
89
|
+
## [1.1.21] - 2026-05-31
|
|
90
|
+
|
|
91
|
+
### Changed
|
|
92
|
+
|
|
93
|
+
- Bump bundled progrok 0.1.1 → 0.2.0 (video edit + extend commands).
|
|
94
|
+
|
|
95
|
+
## [1.1.15] - 2026-05-31
|
|
96
|
+
|
|
97
|
+
### Added
|
|
98
|
+
|
|
99
|
+
- **Agent Mode** — conversational image workspace with sessions, turns, durable queue, slash commands (`/api/agent/*`).
|
|
100
|
+
- **Grok provider** — bundled progrok, Classic/Node/Agent through search + planner + xAI Images API.
|
|
101
|
+
- **Video generation** — text/image/reference-to-video via Grok, edit/extend/frame/analyze routes, branch-local last-frame continuation.
|
|
102
|
+
- `GET /api/capabilities` discovery endpoint (#62).
|
|
103
|
+
- `POST /api/prompt-builder/chat` assistant and `ima2 prompt build` CLI wrapper.
|
|
104
|
+
- Grok model/size pickers, billing API, and `ima2 grok` helpers.
|
|
105
|
+
|
|
106
|
+
### Fixed
|
|
107
|
+
|
|
108
|
+
- Prompt Studio regression (#75), long-prompt preview (#77), prompt autofill perf (#78).
|
|
109
|
+
- Per-image metadata persistence (#79), batch comparison matrix (#80).
|
|
110
|
+
|
|
111
|
+
## [1.1.10] - 2026-05-06
|
|
112
|
+
|
|
113
|
+
### Added
|
|
114
|
+
|
|
115
|
+
- API-key provider Responses parity for generate/edit/multimode/node (#49).
|
|
116
|
+
- Masked-edit feature flag groundwork (`IMA2_OAUTH_MASKED_EDIT_ENABLED`, #31).
|
|
117
|
+
- Gallery default-to-current-session with All Images toggle (#42).
|
|
118
|
+
- Centralized `persistenceRegistry` for `ima2.*` localStorage keys (#43).
|
|
119
|
+
- `typecheck:tests` and `test:inventory` quality gates.
|
|
120
|
+
|
|
121
|
+
### Changed
|
|
122
|
+
|
|
123
|
+
- Split `lib/oauthProxy.ts` into `lib/oauthProxy/*` subtree.
|
|
124
|
+
- Added `lib/runtimeContext.ts`, `lib/responsesImageAdapter.ts`, `lib/providerOptions.ts`, `lib/errInfo.ts`, `lib/promptSafetyPolicy.ts`.
|
|
125
|
+
|
|
126
|
+
## [1.1.0] - 2026-04-25
|
|
127
|
+
|
|
128
|
+
### Added
|
|
129
|
+
|
|
130
|
+
- TypeScript migration complete — route, lib, server, config, and bin sources are `*.ts` with committed build artifacts.
|
|
131
|
+
- CLI feature parity with server API (#45).
|
|
132
|
+
- Canvas Mode workspace split and dual-mask cleanup.
|
|
133
|
+
- OS-trash soft-delete for history.
|
|
134
|
+
|
|
135
|
+
## [1.0.3] - 2026-04-23
|
|
136
|
+
|
|
137
|
+
### Added
|
|
138
|
+
|
|
139
|
+
- Initial npm publish of `ima2-gen` — local OAuth image generation studio with Classic mode, Node mode, Canvas Mode, and CLI.
|
|
140
|
+
|
|
141
|
+
[Unreleased]: https://github.com/lidge-jun/ima2-gen/compare/v2.0.1...HEAD
|
|
142
|
+
[2.0.1]: https://github.com/lidge-jun/ima2-gen/compare/v2.0.0...v2.0.1
|
|
143
|
+
[2.0.0]: https://github.com/lidge-jun/ima2-gen/compare/v1.1.23...v2.0.0
|
|
144
|
+
[1.1.23]: https://github.com/lidge-jun/ima2-gen/compare/v1.1.22...v1.1.23
|
|
145
|
+
[1.1.22]: https://github.com/lidge-jun/ima2-gen/compare/v1.1.21...v1.1.22
|
|
146
|
+
[1.1.21]: https://github.com/lidge-jun/ima2-gen/compare/v1.1.20...v1.1.21
|
|
147
|
+
[1.1.15]: https://github.com/lidge-jun/ima2-gen/compare/v1.1.14...v1.1.15
|
|
148
|
+
[1.1.10]: https://github.com/lidge-jun/ima2-gen/compare/v1.1.9...v1.1.10
|
|
149
|
+
[1.1.0]: https://github.com/lidge-jun/ima2-gen/compare/v1.0.11...v1.1.0
|
|
150
|
+
[1.0.3]: https://github.com/lidge-jun/ima2-gen/releases/tag/v1.0.3
|
package/README.md
CHANGED
|
@@ -69,7 +69,7 @@ Each script checks for nvm/fnm/brew/winget, installs Node LTS through the best a
|
|
|
69
69
|
1. **GPT OAuth** — login with ChatGPT account (free, images only)
|
|
70
70
|
2. **Grok OAuth** — login with xAI/Grok account (images + video)
|
|
71
71
|
3. **Both** — GPT OAuth + Grok OAuth (full feature access)
|
|
72
|
-
4. **
|
|
72
|
+
4. **Web setup** — configure everything in the web UI
|
|
73
73
|
|
|
74
74
|
Video generation requires Grok OAuth (option 2 or 3). Run `ima2 grok login` separately if you already have GPT OAuth configured and want to add video support; it defaults to the manual-paste flow.
|
|
75
75
|
|
|
@@ -97,6 +97,10 @@ Ctrl+C now performs a clean shutdown — closing the database, stopping child pr
|
|
|
97
97
|
- **Mobile shell**: use the app bar, compose sheet, and compact settings toggle on smaller screens.
|
|
98
98
|
- **Observable jobs**: active and recent jobs are tracked with safe logs and request IDs.
|
|
99
99
|
|
|
100
|
+
### SSE Multiplexing
|
|
101
|
+
|
|
102
|
+
The web UI uses a single `GET /api/events` Server-Sent Events connection for all generation progress. Multimode, node, and video requests are submitted as async POST (`202 { requestId }`) and progress events are multiplexed through a shared event bus. This avoids exhausting the browser's limit of six concurrent connections per origin, which previously caused gallery hangs during concurrent generation. CLI clients that do not send `async: true` still receive per-request SSE streams for backward compatibility.
|
|
103
|
+
|
|
100
104
|
## Provider Paths
|
|
101
105
|
|
|
102
106
|
Image generation can run through the local Codex/ChatGPT OAuth path, a configured OpenAI API key, the bundled Grok provider, or the Gemini provider via Antigravity CLI.
|
|
@@ -105,11 +109,14 @@ Image generation can run through the local Codex/ChatGPT OAuth path, a configure
|
|
|
105
109
|
- `provider: "api"` calls the OpenAI Responses API with the hosted `image_generation` tool.
|
|
106
110
|
- `provider: "grok"` starts bundled `progrok` on `127.0.0.1:18645`, runs mandatory xAI Web Search plus a planner pass (default: `grok-4.3`, configurable in settings or via `--planner-model`), then calls xAI Images API through the local proxy.
|
|
107
111
|
- `provider: "agy"` spawns the Antigravity CLI (`agy -p`) to generate images via Google Gemini's `default_api:generate_image` tool (model: `nano-banana-2`). Output is fixed at 1024×1024 JPEG, max 3 reference images. No web search, quality, or size controls.
|
|
112
|
+
- `provider: "gemini-api"` calls the Google Generative Language API directly. Supports two models: `nano-banana-2` (Gemini 3.1 Flash Image) and `nano-banana-pro` (Gemini 3 Pro Image). Auth is via `GEMINI_API_KEY` env var, web UI key management, or a Vertex AI service account JSON (`VERTEX_SERVICE_ACCOUNT_JSON`). When both an API key and Vertex credentials are configured, Vertex takes priority. Supports variable aspect ratios (1:1 through 21:9) and four resolution tiers (512px, 1K, 2K, 4K); these controls are only honored on the direct API path — the Vertex AI endpoint ignores aspect/size because it does not accept the `response_format` field. Per-model cost differs: `nano-banana-2` (Flash): 512=$0.001, 1K=$0.003, 2K=$0.004, 4K=$0.006; `nano-banana-pro`: 1K=$0.007, 2K=$0.007, 4K=$0.013. No web search or mask controls.
|
|
108
113
|
- API-key generation supports classic generate, edit, mask-guided edit, multimode, and node generation.
|
|
109
114
|
- Grok generation supports Classic, Node, and Agent flows. If a Classic reference, Node parent image, or Agent current image is present, ima2 switches the final Grok call to xAI image edit so image-to-image context is preserved.
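The per-model `gemini-api` prices listed in the provider bullets can be read as a small lookup table. The sketch below is illustrative only: `estimateGeminiCostUsd` and the tier labels used as keys (`"512"`, `"1K"`, `"2K"`, `"4K"`) are assumptions made for this example, not names taken from the package.

```javascript
// Prices (USD per image) copied from the provider description above.
// The key spelling for tiers is an assumption made for this sketch.
const GEMINI_IMAGE_COST_USD = {
  "nano-banana-2": { "512": 0.001, "1K": 0.003, "2K": 0.004, "4K": 0.006 },
  "nano-banana-pro": { "1K": 0.007, "2K": 0.007, "4K": 0.013 },
};

// Hypothetical helper: returns the listed price times `count`, or null
// when the model/tier pair is not in the table (for example,
// nano-banana-pro has no 512 tier listed).
function estimateGeminiCostUsd(model, tier, count = 1) {
  const perImage = GEMINI_IMAGE_COST_USD[model]?.[tier];
  return perImage === undefined ? null : perImage * count;
}
```

Returning `null` for an unlisted pair keeps the caller from silently treating an unknown tier as free.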
|
|
110
115
|
|
|
111
116
|
If no provider is specified, the app keeps the current GPT OAuth/default behavior. API-key generation defaults to `gpt-5.4-mini`, `low` reasoning, and `1024x1024` unless the request passes validated model, reasoning, size, or web-search options. Grok defaults to `grok-imagine-image`; `quality: "high"` promotes the final image call to `grok-imagine-image-quality`.
|
|
112
117
|
|
|
118
|
+
Grok image generation exposes a model picker (`grok-imagine-image` / `grok-imagine-image-quality`) and a size picker (aspect ratio + 1k/2k resolution). The Settings page shows a billing/quota bar with `$used/$limit` drawn from the Grok billing API, and a **Switch Account** button that starts a device-code OAuth flow (`POST /api/auth/switch`) for re-authenticating without leaving the app.
|
|
119
|
+
|
|
113
120
|
Grok video generation uses `grok-imagine-video` (default) or `grok-imagine-video-1.5-preview`. Three modes are auto-detected from reference count: text-to-video (0 refs), image-to-video (1 ref), and reference-to-video (2–7 refs, max 10s duration). `grok-imagine-video-1.5-preview` supports image-to-video but not `reference_images` Ref2V, so 2+ refs use `grok-imagine-video` as the effective model. Video edit and extension are also base-model only. Video controls include duration (1–15s), resolution (480p, 720p), and aspect ratio (1:1, 16:9, 9:16, 4:3, 3:4, 3:2, 2:3, auto).
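The mode auto-detection and model fallback described above can be sketched as a pure function. This is a minimal illustration under stated assumptions: `resolveVideoMode` is a hypothetical name, and throwing on more than 7 references is a choice made for this example, since the text does not say how the server reports that case.

```javascript
// Hypothetical helper mirroring the rules in the paragraph above.
function resolveVideoMode(refCount, requestedModel = "grok-imagine-video") {
  if (!Number.isInteger(refCount) || refCount < 0 || refCount > 7) {
    // Assumption: out-of-range counts are rejected; the real error shape is not documented here.
    throw new RangeError("refCount must be an integer from 0 to 7");
  }
  const mode =
    refCount === 0 ? "text-to-video"
    : refCount === 1 ? "image-to-video"
    : "reference-to-video";
  // Reference-to-video is base-model only, so 2+ refs fall back to grok-imagine-video.
  const effectiveModel =
    mode === "reference-to-video" ? "grok-imagine-video" : requestedModel;
  // Reference-to-video is capped at 10s; the general duration range is 1-15s.
  const maxDurationSec = mode === "reference-to-video" ? 10 : 15;
  return { mode, effectiveModel, maxDurationSec };
}
```

For example, `resolveVideoMode(3, "grok-imagine-video-1.5-preview")` reports `reference-to-video` with `grok-imagine-video` as the effective model.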
|
|
114
121
|
|
|
115
122
|

|
|
@@ -260,6 +267,8 @@ environment variables > ~/.ima2/config.json > built-in defaults
|
|
|
260
267
|
| `IMA2_GROK_IMAGE_MODEL_DEFAULT` | `grok-imagine-image` | Default final Grok image model |
|
|
261
268
|
| `IMA2_GROK_GENERATION_TIMEOUT_MS` | `120000` | Timeout for the final Grok Images API call |
|
|
262
269
|
| `IMA2_OAUTH_MASKED_EDIT_ENABLED` | `false` | Opt-in feature flag for masked-edit requests on the OAuth path (#31, groundwork only) |
|
|
270
|
+
| `GEMINI_API_KEY` | — | API key for `provider: "gemini-api"` direct Generative Language API path |
|
|
271
|
+
| `VERTEX_SERVICE_ACCOUNT_JSON` | — | Google service account JSON for Vertex AI auth with `provider: "gemini-api"`; takes priority over `GEMINI_API_KEY` when both are set |
|
|
263
272
|
|
|
264
273
|
### Logging modes
|
|
265
274
|
|
|
package/bin/commands/backfillThumbs.js
CHANGED
|
@@ -15,4 +15,10 @@ export async function backfillThumbs() {
|
|
|
15
15
|
if (r.created > 0)
|
|
16
16
|
invalidateHistoryIndex();
|
|
17
17
|
console.log(`[thumbs] Done: ${r.created} created, ${r.skipped} skipped (already exist), ${r.failed} failed out of ${r.total} media files.`);
|
|
18
|
+
if (r.failures.length > 0) {
|
|
19
|
+
console.log(`[thumbs] Showing ${r.failures.length} thumbnail failure(s):`);
|
|
20
|
+
for (const failure of r.failures) {
|
|
21
|
+
console.log(` - ${failure.kind}: ${failure.file} (${failure.reason})`);
|
|
22
|
+
}
|
|
23
|
+
}
|
|
18
24
|
}
|
package/bin/commands/gen.js
CHANGED
|
@@ -39,6 +39,11 @@ const HELP = `
|
|
|
39
39
|
|
|
40
40
|
Generate image(s) via the running ima2 server.
|
|
41
41
|
|
|
42
|
+
Batch/async note:
|
|
43
|
+
Use -n <N> for multiple candidates in one request. Independent CLI
|
|
44
|
+
commands can run concurrently against the server; monitor active requestIds
|
|
45
|
+
with 'ima2 ps --json' and stop one with 'ima2 cancel <requestId>'.
|
|
46
|
+
|
|
42
47
|
Options:
|
|
43
48
|
-q, --quality <low|medium|high> Default: low
|
|
44
49
|
-s, --size <WxH | auto> Default: 1024x1024
|
|
@@ -63,6 +68,7 @@ const HELP = `
|
|
|
63
68
|
|
|
64
69
|
Examples:
|
|
65
70
|
ima2 gen "a shiba in space"
|
|
71
|
+
ima2 gen "a shiba in space" -n 4 -d ./out
|
|
66
72
|
ima2 gen "poster" --model gpt-5.4 --mode direct --moderation low
|
|
67
73
|
ima2 gen "merge" --ref a.png --ref b.png -q high -o out.png
|
|
68
74
|
cat prompt.txt | ima2 gen --stdin -n 2 -d ./out
|
package/bin/ima2.js
CHANGED
|
@@ -65,20 +65,15 @@ async function setup() {
|
|
|
65
65
|
console.log(" 1) GPT OAuth — login with ChatGPT account (free, images only)");
|
|
66
66
|
console.log(" 2) Grok OAuth — login with xAI/Grok account (images + video)");
|
|
67
67
|
console.log(" 3) Both — GPT OAuth + Grok OAuth");
|
|
68
|
-
console.log(" 4)
|
|
68
|
+
console.log(" 4) Web setup — configure everything in the web UI\n");
|
|
69
69
|
const choice = await rl.question(" Enter 1-4: ");
|
|
70
70
|
const config = loadConfig();
|
|
71
71
|
if (choice.trim() === "4") {
|
|
72
|
-
|
|
73
|
-
|
|
74
|
-
console.log(" Invalid API key format. Expected sk-...");
|
|
75
|
-
rl.close();
|
|
76
|
-
process.exit(1);
|
|
77
|
-
}
|
|
78
|
-
config.provider = "api";
|
|
79
|
-
config.apiKey = key.trim();
|
|
72
|
+
config.provider = "oauth";
|
|
73
|
+
delete config.apiKey;
|
|
80
74
|
saveConfig(config);
|
|
81
|
-
console.log("\n
|
|
75
|
+
console.log("\n You can set up everything from the web UI.");
|
|
76
|
+
console.log(" Run 'ima2 serve', then open Settings in the browser to sign in or add API keys.\n");
|
|
82
77
|
}
|
|
83
78
|
else if (choice.trim() === "2") {
|
|
84
79
|
config.provider = "grok";
|
|
@@ -260,6 +255,12 @@ function showHelp() {
|
|
|
260
255
|
|
|
261
256
|
Usage: ima2 <command> [options]
|
|
262
257
|
|
|
258
|
+
Generation workflow:
|
|
259
|
+
Image/video jobs run on the server. For multiple candidates, prefer
|
|
260
|
+
'ima2 gen -n <N>' or 'ima2 multimode <prompt>' instead of repeating
|
|
261
|
+
one-image prompts. Start independent CLI jobs concurrently when needed;
|
|
262
|
+
use 'ima2 ps --json' to monitor requestIds and 'ima2 cancel <id>' to stop.
|
|
263
|
+
|
|
263
264
|
Server commands:
|
|
264
265
|
serve [--dev] Start the image generation server
|
|
265
266
|
setup, login Configure API key or GPT OAuth (interactive)
|
|
@@ -316,6 +317,9 @@ function showHelp() {
|
|
|
316
317
|
ima2 serve Start server
|
|
317
318
|
ima2 serve --dev Start with verbose server diagnostics
|
|
318
319
|
ima2 gen "a shiba in space" Generate from CLI
|
|
320
|
+
ima2 gen "a shiba in space" -n 4 -d ./out
|
|
321
|
+
Generate 4 candidates in one request
|
|
322
|
+
ima2 ps --json Watch active async generation jobs
|
|
319
323
|
ima2 gen "merge" --ref a.png --ref b.png -q high -o out.png
|
|
320
324
|
ima2 video "a cat playing piano" --duration 10
|
|
321
325
|
ima2 ls -n 10 Last 10 generations
|
package/docs/API.md
CHANGED
|
@@ -10,14 +10,14 @@ http://localhost:3333
|
|
|
10
10
|
|
|
11
11
|
## Provider Policy
|
|
12
12
|
|
|
13
|
-
Image generation supports OAuth, API-key, Grok, and Gemini (agy) providers.
|
|
13
|
+
Image generation supports OAuth, API-key, Grok, and Gemini (`agy` and `gemini-api`) providers.
|
|
14
14
|
|
|
15
15
|
- `provider: "oauth"` uses the local Codex OAuth proxy.
|
|
16
16
|
- `provider: "api"` uses the OpenAI Responses API with the hosted `image_generation` tool.
|
|
17
17
|
- `provider: "grok"` uses the bundled progrok xAI proxy. Classic, Node, and Agent generation run mandatory xAI Web Search through `/v1/responses`, then run a `grok-4.3` planner call with a forced local `generate_image` function, then ima2 executes xAI `/v1/images/generations`. If reference images, a Node parent image, or an Agent current image are attached, the final step switches to xAI `/v1/images/edits` so image-to-image context is preserved.
|
|
18
18
|
- `provider: "agy"` spawns the Antigravity CLI (`agy -p`) to generate images via Google Gemini's `default_api:generate_image` tool. Model is `nano-banana-2`. Output is fixed at 1024×1024 JPEG. Max 3 reference images (i2i). No web search, quality, size, or mask controls. Multimode returns a single image. Video is unsupported (`AGY_VIDEO_UNSUPPORTED`).
|
|
19
19
|
- `provider: "grok-api"` uses a direct xAI API key instead of the bundled progrok OAuth proxy. Same pipeline as `grok` (Web Search → planner → `/v1/images/generations`), same aspect ratio and resolution options. Requires an xAI API key configured via the web UI key management or `XAI_API_KEY` env var. Also supports video generation.
|
|
20
|
-
- `provider: "gemini-api"` calls the Google Generative Language API directly (or Vertex AI with a service account JSON). Supports models `nano-banana-2` (Gemini 3.1 Flash Image) and `nano-banana-pro` (Gemini 3 Pro Image). Supports variable aspect ratios and
|
|
20
|
+
- `provider: "gemini-api"` calls the Google Generative Language API directly (or Vertex AI with a service account JSON). Supports models `nano-banana-2` (Gemini 3.1 Flash Image) and `nano-banana-pro` (Gemini 3 Pro Image). Supports variable aspect ratios (1:1 through 21:9) and four resolution tiers (512px, 1K, 2K, 4K); these are honored only on the direct API path — the Vertex AI endpoint (`aiplatform.googleapis.com`) rejects the `response_format` field and always returns a default 1K/1:1 image regardless of requested size. Auth: `GEMINI_API_KEY` env var, web UI key management (`/api/keys/gemini`), or a Vertex AI service account JSON (`VERTEX_SERVICE_ACCOUNT_JSON` or `/api/keys/vertex`). When both Vertex credentials and an API key are configured, Vertex takes priority. The chosen auth mode (`apikey` or `vertex`) persists to `~/.ima2/config.json` as `geminiAuthMode` and is restored on server startup. Per-model cost: `nano-banana-2` (Flash): 512=$0.001, 1K=$0.003, 2K=$0.004, 4K=$0.006; `nano-banana-pro`: 1K=$0.007, 2K=$0.007, 4K=$0.013. No web search or mask controls.
|
|
21
21
|
- API-key generation covers classic generate, edit, mask-guided edit, multimode, and node generation.
|
|
22
22
|
- If `provider: "api"` is requested without an API key, routes fail before upstream with `401` and `API_KEY_REQUIRED`.
|
|
23
23
|
- Grok generation maps `size` to xAI `aspect_ratio` and `resolution`; it does not send an OpenAI-style `size` field upstream. Grok edit uses xAI `/v1/images/edits`; Grok mask edit remains unsupported and returns `GROK_MASK_UNSUPPORTED`.
|
|
@@ -35,6 +35,16 @@ Generation section below for the full endpoint specification.
|
|
|
35
35
|
| `GET` | `/api/oauth/status` | OAuth proxy status and visible models |
|
|
36
36
|
| `GET` | `/api/grok/status` | Bundled progrok status and visible xAI image models |
|
|
37
37
|
| `GET` | `/api/billing` | Billing/status probe, including API key source when configured |
|
|
38
|
+
| `GET` | `/api/quota` | Provider quota: returns `{ codex, grok }`. Grok result includes `billing: { usedUsd, limitUsd }` and a `monthly` percent window drawn from the xAI billing API. |
|
|
39
|
+
|
|
40
|
+
## Account Switching
|
|
41
|
+
|
|
42
|
+
| Method | Path | Notes |
|
|
43
|
+
|---|---|---|
|
|
44
|
+
| `POST` | `/api/auth/switch` | Start a device-code OAuth flow. Body: `{ "provider": "grok" \| "codex" }`. Returns `{ sessionId, userCode, verificationUrl }`. |
|
|
45
|
+
| `GET` | `/api/auth/switch/:sessionId` | Poll switch-account session status. Returns `{ status }` where status is `pending`, `complete`, `error`, or `expired`. |
|
|
46
|
+
|
|
47
|
+
The Switch Account flow opens a browser verification URL. Once the user completes the device-code step, the server saves the new credentials (Grok: `~/.progrok/auth.json`; Codex: via `codex login --device-auth`) and the session transitions to `complete`. This endpoint is surfaced as a **Switch Account** button in the Settings QuotaCard for Grok and Codex providers.
|
|
38
48
|
|
|
39
49
|
## Storage
|
|
40
50
|
|
|
@@ -84,9 +94,81 @@ Storage `state` values:
|
|
|
84
94
|
| `GET` | `/api/inflight` | Active jobs only by default |
|
|
85
95
|
| `GET` | `/api/inflight?includeTerminal=1` | Includes recent terminal jobs for debugging |
|
|
86
96
|
| `DELETE` | `/api/inflight/:requestId` | Cancel or forget an active job |
|
|
97
|
+
| `GET` | `/api/events` | Persistent SSE multiplex channel for all async generation progress (see below) |
|
|
87
98
|
|
|
88
99
|
In-flight logs and responses use `requestId` for correlation. Logs should not include raw prompts, reference data URLs, generated base64, tokens, cookies, auth headers, or raw upstream bodies.
|
|
89
100
|
|
|
101
|
+
## Events (SSE Multiplexing)
|
|
102
|
+
|
|
103
|
+
### `GET /api/events` (SSE Multiplexing)
|
|
104
|
+
|
|
105
|
+
Single persistent Server-Sent Events channel that carries progress for all async generation jobs. The browser UI opens one `EventSource` here instead of holding a per-request SSE connection for each job, avoiding browser per-origin connection limits.
|
|
106
|
+
|
|
107
|
+
| Query | Notes |
|
|
108
|
+
|---|---|
|
|
109
|
+
| `lastEventId` | Optional. Reconnect cursor; also accepted via the `Last-Event-ID` request header |
|
|
110
|
+
|
|
111
|
+
**Response**: `text/event-stream` (persistent). Each frame uses standard SSE fields `id`, `event`, and `data` (JSON).
|
|
112
|
+
|
|
113
|
+
**Connection limits**: When active listeners reach 512, the server returns `503` with `SSE_CAPACITY` before opening the stream.
|
|
114
|
+
|
|
115
|
+
**Heartbeat**: Every 15 seconds the server writes a comment frame:
|
|
116
|
+
|
|
117
|
+
```text
|
|
118
|
+
: ping
|
|
119
|
+
```
|
|
120
|
+
|
|
121
|
+
**Replay**: On reconnect, the server replays events from an in-memory ring buffer (size 2000) for IDs newer than `lastEventId`. Large image payloads (>1000 characters) are omitted from replay with `_imageOmitted: true` in the `data` payload. If the requested ID is older than the oldest buffered event, the server emits a `replay-gap` event before live fan-out:
|
|
122
|
+
|
|
123
|
+
| Event | Data | Description |
|
|
124
|
+
|---|---|---|
|
|
125
|
+
| `replay-gap` | `{ lastEventId, oldestAvailableId }` | Client should reconcile inflight state (for example via `GET /api/inflight`) |
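The replay behavior described above can be sketched with a bounded in-memory buffer. This is not the package's `eventBus` implementation: `ReplayBuffer` and its method names are hypothetical, the omission of large image payloads is not modeled, and the exact boundary at which a gap is reported (here: at least one event between the cursor and the oldest buffered event was dropped) is an assumption.

```javascript
// Hypothetical ring-buffer sketch; the documented default size is 2000.
class ReplayBuffer {
  constructor(capacity = 2000) {
    this.capacity = capacity;
    this.events = [];
    this.nextId = 1;
  }

  // Store an event under a monotonically increasing id, evicting the oldest
  // entry once the buffer is full.
  publish(event, data) {
    const entry = { id: this.nextId++, event, data };
    this.events.push(entry);
    if (this.events.length > this.capacity) this.events.shift();
    return entry;
  }

  // Return what a reconnecting client with cursor `lastEventId` would receive:
  // an optional replay-gap marker followed by every buffered event newer
  // than the cursor.
  replayAfter(lastEventId) {
    const oldest = this.events.length > 0 ? this.events[0].id : this.nextId;
    const out = [];
    if (lastEventId + 1 < oldest) {
      out.push({
        event: "replay-gap",
        data: { lastEventId, oldestAvailableId: oldest },
      });
    }
    for (const entry of this.events) {
      if (entry.id > lastEventId) out.push(entry);
    }
    return out;
  }
}
```

A client that sees `replay-gap` cannot rebuild the missed events from the stream alone, which is why the table points it at `GET /api/inflight` to reconcile.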
|
|
126
|
+
|
|
127
|
+
**Job routing**: Every `data` payload includes `jobId` (same value as the job's `requestId`). Event bodies also carry `requestId` where applicable. Clients filter events by matching `data.jobId` or `data.requestId` to the job they started.
|
|
128
|
+
|
|
129
|
+
**Event types** (fan-out to all connected clients):
|
|
130
|
+
|
|
131
|
+
| Event | Emitted by | Description |
|
|
132
|
+
|---|---|---|
|
|
133
|
+
| `phase` | node, multimode, video | Lifecycle phase change |
|
|
134
|
+
| `partial` | node, multimode | Progressive preview image (base64 data URL) |
|
|
135
|
+
| `image` | multimode | Final saved `GenerateItem` for one sequence image |
|
|
136
|
+
| `done` | node, multimode, video | Terminal success payload (route-specific shape) |
|
|
137
|
+
| `error` | all generation routes | Terminal failure |
|
|
138
|
+
| `submitted` | video | Job submitted to xAI |
|
|
139
|
+
| `progress` | video | Progress fraction 0.0–1.0 |
|
|
140
|
+
| `planning` | video | Video planner running |
|
|
141
|
+
|
|
142
|
+
Example SSE frame:
|
|
143
|
+
|
|
144
|
+
```text
|
|
145
|
+
id: 42
|
|
146
|
+
event: phase
|
|
147
|
+
data: {"requestId":"req_abc","jobId":"req_abc","phase":"streaming"}
|
|
148
|
+
```
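In a browser, `EventSource` parses frames like the one above automatically. For clients that read the stream by hand, a minimal parser plus the job-routing filter might look like the sketch below; `parseSseFrame` and `belongsToJob` are hypothetical names, and the parser handles only the `id`, `event`, and `data` fields used here.

```javascript
// Parse one SSE frame (the text between blank lines) into { id, event, data }.
// Returns null for frames without data, such as the ": ping" heartbeat comment.
function parseSseFrame(frame) {
  const parsed = { id: null, event: "message", data: null };
  const dataLines = [];
  for (const line of frame.split(/\r?\n/)) {
    if (line === "" || line.startsWith(":")) continue; // blank line or comment
    const colon = line.indexOf(":");
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? "" : line.slice(colon + 1);
    if (value.startsWith(" ")) value = value.slice(1); // strip one leading space
    if (field === "id") parsed.id = value;
    else if (field === "event") parsed.event = value;
    else if (field === "data") dataLines.push(value);
  }
  if (dataLines.length === 0) return null;
  parsed.data = JSON.parse(dataLines.join("\n"));
  return parsed;
}

// Job routing: keep only events whose payload names the job this client started.
function belongsToJob(parsed, requestId) {
  return (
    parsed !== null &&
    (parsed.data.jobId === requestId || parsed.data.requestId === requestId)
  );
}
```

Because every connected client receives every event, the filter step is what keeps one tab from reacting to another tab's jobs.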
|
|
149
|
+
|
|
150
|
+
### Async generation mode
|
|
151
|
+
|
|
152
|
+
`POST /api/node/generate`, `POST /api/generate/multimode`, and `POST /api/video/generate` support an async POST mode for clients that already hold `GET /api/events`:
|
|
153
|
+
|
|
154
|
+
```json
|
|
155
|
+
{
|
|
156
|
+
"async": true,
|
|
157
|
+
"requestId": "req_xxx",
|
|
158
|
+
"...": "other route fields"
|
|
159
|
+
}
|
|
160
|
+
```
|
|
161
|
+
|
|
162
|
+
| Outcome | HTTP | Body |
|
|
163
|
+
|---|---|---|
|
|
164
|
+
| Accepted | `202` | `{ "requestId": "req_xxx" }` |
|
|
165
|
+
| Duplicate active `requestId` | `409` | `REQUEST_ID_IN_USE` |
|
|
166
|
+
| More than 12 concurrent active jobs | `429` | `TOO_MANY_JOBS` with `Retry-After: 5` |
|
|
167
|
+
|
|
168
|
+
Progress events are published on `GET /api/events`. The POST response returns immediately; clients must not expect SSE on the POST connection when `async: true`.
|
|
169
|
+
|
|
170
|
+
CLI and legacy clients omit `async` and keep the original behavior: per-request SSE on the same POST response (`Accept: text/event-stream` where applicable). The server dual-emits in that mode — it writes SSE to the POST response and also publishes the same events on `GET /api/events`.
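The async outcomes in the table above can be handled with a small amount of client logic. The following is a sketch under stated assumptions: `buildAsyncBody` and `interpretAsyncResponse` are hypothetical helpers, the header object is assumed to use lowercase keys, and the JSON shape of the `409` and `429` error bodies is not relied on because it is not specified here.

```javascript
// Add the async-mode fields to a route-specific request body.
function buildAsyncBody(requestId, fields = {}) {
  return { ...fields, async: true, requestId };
}

// Map an HTTP status from an async POST to a client-side decision.
function interpretAsyncResponse(status, body = {}, headers = {}) {
  switch (status) {
    case 202:
      // Accepted: progress will arrive on GET /api/events, not on this response.
      return { kind: "accepted", requestId: body.requestId };
    case 409:
      // REQUEST_ID_IN_USE: the caller should pick a fresh requestId.
      return { kind: "duplicate" };
    case 429: {
      // TOO_MANY_JOBS: honor Retry-After, falling back to the documented 5 seconds.
      const seconds = Number(headers["retry-after"]);
      const wait = Number.isFinite(seconds) && seconds > 0 ? seconds : 5;
      return { kind: "retry", retryAfterMs: wait * 1000 };
    }
    default:
      return { kind: "failed", status };
  }
}
```

Since the POST returns immediately, a client would typically register its handler for the chosen `requestId` on the event channel before sending the request, so that a very fast first event is not missed.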
|
|
171
|
+
|
|
90
172
|
## Generation
|
|
91
173
|
|
|
92
174
|
### `POST /api/generate`
|
|
@@ -195,14 +277,46 @@ Body fields:
|
|
|
195
277
|
}
|
|
196
278
|
```
|
|
197
279
|
|
|
198
|
-
When `parentNodeId` is present, the server loads the stored parent node image and uses the edit path.
|
|
280
|
+
When `parentNodeId` is present, the server loads the stored parent node image and uses the edit path. Node-local references are allowed on both root and child/edit nodes; for child/edit nodes the parent image is sent first, then references, then the text prompt.
|
|
199
281
|
|
|
200
282
|
With `provider: "grok"`, Node Mode uses the same xAI search + `grok-4.3` planner + Images API pipeline as classic generation. A parent node image, `externalSrc`, or extra references are passed to the planner and then to xAI `/v1/images/edits`; otherwise the final call uses `/v1/images/generations`. Grok Node requests are capped at three total input images, counting the parent/current image plus references, and return `GROK_REF_TOO_MANY` before upstream when that limit is exceeded. `quality: "high"` promotes the final image model to `grok-imagine-image-quality`.
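The input ordering and the Grok image cap described above can be sketched as a small validation step. This is an assumption-laden illustration: `buildNodeInputs`, `GROK_MAX_INPUT_IMAGES`, and the part objects are hypothetical; only the ordering rule, the limit of three, and the `GROK_REF_TOO_MANY` code come from the text.

```javascript
const GROK_MAX_INPUT_IMAGES = 3; // cap stated in the text; constant name is hypothetical

// Hypothetical helper: order inputs as parent image, references, then prompt,
// and reject Grok requests with more than three input images in total.
function buildNodeInputs({ provider, parentImage = null, references = [], prompt }) {
  const images = parentImage ? [parentImage, ...references] : [...references];
  if (provider === "grok" && images.length > GROK_MAX_INPUT_IMAGES) {
    const err = new Error("Too many input images for Grok Node request");
    err.code = "GROK_REF_TOO_MANY";
    throw err; // fails before any upstream call is made
  }
  return [
    ...images.map((image) => ({ type: "image", image })),
    { type: "text", text: prompt },
  ];
}
```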
|
|
201
283
|
|
|
202
|
-
The route can stream Server-Sent Events when the client sends `Accept: text/event-stream`. Possible events include `phase`, `partial`, `done`, and `error`.
|
|
284
|
+
The route can stream Server-Sent Events when the client sends `Accept: text/event-stream`. Possible events include `phase`, `partial`, `done`, and `error`. Alternatively, send `{ "async": true, "requestId": "req_xxx" }` in the body to receive `202 { requestId }` immediately and follow progress on `GET /api/events` (see Events section).
|
|
203
285
|
|
|
204
286
|
Grok Node SSE responses do not include Responses API `partial` image events because the xAI Images API call is synchronous JSON. They still emit `phase` and `done`/`error` events so the Node UI can use the same in-flight lifecycle.
|
|
205
287
|
|
|
288
|
+
### `POST /api/generate/multimode` (SSE)
|
|
289
|
+
|
|
290
|
+
Multi-image sequence generation. The POST response is always an SSE stream unless async mode is used.
|
|
291
|
+
|
|
292
|
+
```json
|
|
293
|
+
{
|
|
294
|
+
"prompt": "a story in four panels",
|
|
295
|
+
"maxImages": 4,
|
|
296
|
+
"quality": "medium",
|
|
297
|
+
"size": "1024x1024",
|
|
298
|
+
"format": "png",
|
|
299
|
+
"moderation": "low",
|
|
300
|
+
"model": "gpt-5.4",
|
|
301
|
+
"provider": "oauth",
|
|
302
|
+
"references": [],
|
|
303
|
+
"requestId": "optional-client-id",
|
|
304
|
+
"async": false
|
|
305
|
+
}
|
|
306
|
+
```
|
|
307
|
+
|
|
308
|
+
Send `Accept: text/event-stream` for per-request SSE on the POST connection. Or set `"async": true` with a client `requestId` to get `202 { requestId }` and receive events on `GET /api/events`.
|
|
309
|
+
|
|
310
|
+
**SSE events**:
|
|
311
|
+
|
|
312
|
+
| Event | Data | Description |
|
|
313
|
+
|---|---|---|
|
|
314
|
+
| `phase` | `{ requestId, phase, sequenceId?, maxImages? }` | Lifecycle phase |
|
|
315
|
+
| `partial` | `{ requestId, image, index }` | Progressive preview |
|
|
316
|
+
| `image` | full `GenerateItem` | One saved sequence image |
|
|
317
|
+
| `done` | route-specific summary; may include `status: "partial"` after timeout if at least one image was saved | Sequence complete |
|
|
318
|
+
| `error` | `{ requestId, error, code?, status? }` | Generation failed |
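A client consuming these events could fold them into per-request state. The reducer below is a minimal sketch under that assumption; the state shape and the names `createSequenceState` and `applyMultimodeEvent` are hypothetical, and only the event names and fields listed in the table are taken from the docs.

```javascript
// Hypothetical client-side state for one multimode request.
function createSequenceState(requestId) {
  return { requestId, phase: null, previews: {}, images: [], status: "running", error: null };
}

// Apply one SSE event (name + parsed data) and return the next state.
function applyMultimodeEvent(state, eventName, data) {
  switch (eventName) {
    case "phase":
      return { ...state, phase: data.phase };
    case "partial":
      // Keep only the latest preview per image index.
      return { ...state, previews: { ...state.previews, [data.index]: data.image } };
    case "image":
      return { ...state, images: [...state.images, data] };
    case "done":
      // `status: "partial"` may appear after a timeout with some images saved.
      return { ...state, status: data.status === "partial" ? "partial" : "done" };
    case "error":
      return { ...state, status: "error", error: data.code ?? data.error };
    default:
      return state; // ignore unknown events
  }
}
```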
|
|
319
|
+
|
|
206
320
|
### `GET /api/node/:nodeId`
|
|
207
321
|
|
|
208
322
|
Fetch stored node metadata and asset URL.
|
|
@@ -228,7 +342,7 @@ Server-side validation may return these reference codes:
|
|
|
228
342
|
|
|
229
343
|
### `POST /api/video/generate` (SSE)
|
|
230
344
|
|
|
231
|
-
Generate a video via the Grok video provider. Returns Server-Sent Events.
|
|
345
|
+
Generate a video via the Grok video provider. By default the route returns Server-Sent Events on the POST connection; in async mode (`{ "async": true, "requestId": "req_xxx" }`) it returns `202 { requestId }` and publishes progress on `GET /api/events` (see Events section).
|
|
232
346
|
|
|
233
347
|
```json
|
|
234
348
|
{
|
|
@@ -263,7 +377,7 @@ Generate a video via the Grok video provider. Returns Server-Sent Events.
|
|
|
263
377
|
| Field | Type | Default | Notes |
|
|
264
378
|
|---|---|---|---|
|
|
265
379
|
| `prompt` | string | — | Required |
|
|
266
|
-
| `provider` | string | `"grok"` |
|
|
380
|
+
| `provider` | string | `"grok"` | `"grok"` or `"grok-api"` |
|
|
267
381
|
| `model` | string | `grok-imagine-video` | Video model |
|
|
268
382
|
| `duration` | integer | `5` | 1–15 seconds (clamped to 10 for reference-to-video) |
|
|
269
383
|
| `resolution` | string | `"480p"` | `480p` or `720p` |
|
|
@@ -488,6 +602,9 @@ Style-sheet extraction can require an API key/openai client. Image generation al
|
|
|
488
602
|
| `GEMINI_API_SAFETY_BLOCKED` | Gemini API generation blocked by safety filter |
|
|
489
603
|
| `GEMINI_API_NO_IMAGE` | Gemini API returned no image in response |
|
|
490
604
|
| `VIDEO_PROVIDER_UNSUPPORTED` | Video generation requires provider `"grok"` or `"grok-api"` |
|
|
605
|
+
| `SSE_CAPACITY` | More than 512 concurrent `GET /api/events` listeners |
|
|
606
|
+
| `REQUEST_ID_IN_USE` | Async POST used a `requestId` that already has an active job |
|
|
607
|
+
| `TOO_MANY_JOBS` | More than 12 concurrent active generation jobs (`Retry-After: 5`) |
|
|
491
608
|
|
|
492
609
|
## Key Management
|
|
493
610
|
|
|
@@ -529,7 +646,7 @@ Most server routes under `/api/*` have a CLI wrapper. The exception is **Agent M
|
|
|
529
646
|
| `POST /api/node/generate` (SSE) / `GET /api/node/:id` | `ima2 node generate` / `ima2 node show` |
|
|
530
647
|
| `GET /api/history` | `ima2 ls` |
|
|
531
648
|
| `DELETE /api/history/:name` / `…/permanent` | `ima2 history rm [--permanent]` |
|
|
532
|
-
| `POST /api/history/restore` | `ima2 history restore --trash-id` |
|
|
649
|
+
| `POST /api/history/:filename/restore` | `ima2 history restore --trash-id` |
|
|
533
650
|
| `POST /api/history/favorite` | `ima2 history favorite` |
|
|
534
651
|
| `POST /api/history/import-local` | `ima2 history import` |
|
|
535
652
|
| `POST /api/metadata/read` | `ima2 metadata` / `ima2 show --metadata` |
|
|
@@ -544,10 +661,16 @@ Most server routes under `/api/*` have a CLI wrapper. The exception is **Agent M
|
|
|
544
661
|
| `…/api/cardnews/…` (gated on `features.cardNews`) | `ima2 cardnews …` |
|
|
545
662
|
| `POST /api/comfy/export-image` | `ima2 comfy export` |
|
|
546
663
|
| `GET /api/inflight` / `DELETE /api/inflight/:id` | `ima2 inflight ls` (alias `ps`) / `ima2 inflight rm` (alias `cancel`) |
|
|
664
|
+
| `GET /api/events` (SSE multiplex) | Web UI only (persistent `EventSource`; no CLI wrapper) |
|
|
547
665
|
| `GET /api/storage/status` / `POST /api/storage/open-generated-dir` | `ima2 storage status` / `ima2 storage open` |
|
|
548
666
|
| `GET /api/billing` / `GET /api/providers` / `GET /api/oauth/status` / `GET /api/grok/status` | `ima2 billing` / `ima2 providers` / `ima2 oauth status` / `ima2 grok status` |
|
|
667
|
+
| `GET /api/quota` | `ima2 billing` (includes Grok `usedUsd`/`limitUsd`) |
|
|
668
|
+
| `POST /api/auth/switch` / `GET /api/auth/switch/:sessionId` | Web UI only (Settings > QuotaCard > Switch Account) |
|
|
549
669
|
| `GET /api/health` | `ima2 ping` |
|
|
550
670
|
| `GET /api/capabilities` | `ima2 capabilities` |
|
|
671
|
+
| `GET /api/config/grok-planner` | — (Grok planner model query) |
|
|
672
|
+
| `PATCH /api/config/grok-planner` | — (Grok planner model update) |
|
|
673
|
+
| `GET /api/agy/status` | — (Antigravity CLI install status) |
|
|
551
674
|
| `POST /api/history/backfill-thumbnails` | `ima2 backfill-thumbs` |
|
|
552
675
|
| `GET /api/keys/status`, `PUT/DELETE /api/keys/:provider`, `PUT/DELETE /api/keys/vertex` | Web UI only (Settings > API Keys) |
|
|
553
676
|
| `GET/POST/PATCH/DELETE /api/agent/*` (sessions, turns, queue) | — (Agent Mode; web UI only, no CLI) |
|
|
@@ -556,7 +679,7 @@ Most server routes under `/api/*` have a CLI wrapper. The exception is **Agent M
|
|
|
556
679
|
Notes:
|
|
557
680
|
- `ima2 history favorite` and `ima2 annotate …` send `X-Ima2-Browser-Id: cli-<sha1prefix>` derived from the config dir, so CLI activity does not collide with browser sessions.
|
|
558
681
|
- `ima2 session graph save` performs a GET-then-PUT with `If-Match: "<version>"` to guard against `GRAPH_VERSION_CONFLICT`.
|
|
559
|
-
- `ima2 history import` and `ima2 canvas-versions save/update` send raw bytes with `Content-Type: image/<png|jpeg|webp>`; the SSE endpoints (`multimode`, `node generate`) use `Accept: text/event-stream`.
|
|
682
|
+
- `ima2 history import` and `ima2 canvas-versions save/update` send raw bytes with `Content-Type: image/<png|jpeg|webp>`; the SSE endpoints (`multimode`, `node generate`, `video`) use `Accept: text/event-stream`. The web UI instead uses `GET /api/events` plus `async: true` on POST routes.
|
|
560
683
|
- `ima2 cardnews …` checks `runtimeConfig.features.cardNews` before calling the gated endpoints; when disabled the CLI exits 2 with a clear message instead of producing a 404.
|
|
561
684
|
|
|
562
685
|
## CLI Discovery
|
package/docs/CLI.md
CHANGED
|
@@ -62,6 +62,7 @@ Provider override semantics:
|
|
|
62
62
|
- `oauth` forces the local OAuth proxy path.
|
|
63
63
|
- `grok` uses the bundled progrok xAI proxy (`127.0.0.1:18645`). Classic generation first runs mandatory xAI Web Search through Responses API, then asks `grok-4.3` to call ima2's local `generate_image` tool, then ima2 executes xAI `/v1/images/generations`. If `--ref` images are attached, the final step uses xAI `/v1/images/edits` instead so image-to-image/reference context is preserved. Models: `grok-imagine-image`, `grok-imagine-image-quality`. Size is mapped to xAI `aspect_ratio` and `resolution`; the UI web-search toggle is OpenAI-provider-only because Grok search is always on in this path.
|
|
64
64
|
- `agy` spawns the Antigravity CLI to generate via Google Gemini (`nano-banana-2`). Fixed 1024×1024 JPEG output, max 3 refs. No web search, quality, size, or mask controls.
|
|
65
|
+
- `gemini-api` calls the Google Generative Language API directly. Models: `nano-banana-2` (Gemini 3.1 Flash Image) and `nano-banana-pro` (Gemini 3 Pro Image). Use `--model nano-banana-2` or `--model nano-banana-pro` to select. Supports `--size` for aspect ratio and resolution (512px–4K) on the direct API path; Vertex AI ignores aspect/size. Requires `GEMINI_API_KEY` or a Vertex AI service account (`VERTEX_SERVICE_ACCOUNT_JSON`). Switching to the `agy` or `gemini-api` provider auto-selects the corresponding Gemini model; switching away resets to the GPT default.
|
|
65
66
|
- `auto` preserves route default behavior and currently resolves to GPT OAuth unless server routing changes.
|
|
66
67
|
|
|
67
68
|
`ima2 serve` starts the bundled Grok proxy automatically. No separate `progrok`
|
|
@@ -317,7 +318,7 @@ Card News requires the server to be started with `IMA2_CARD_NEWS=1` (or `feature
|
|
|
317
318
|
| `ima2 inflight rm <requestId>` | Force-remove a stuck job |
|
|
318
319
|
| `ima2 storage status` | Storage inspection (richer than `doctor`) |
|
|
319
320
|
| `ima2 storage open` | Open the generated dir in the OS file manager (POST) |
|
|
320
|
-
| `ima2 billing` | API usage / quota |
|
|
321
|
+
| `ima2 billing` | API usage / quota; Grok result includes `billing.usedUsd` / `billing.limitUsd` drawn from the xAI billing API |
|
|
321
322
|
| `ima2 providers` | Configured providers |
|
|
322
323
|
| `ima2 oauth status` | OAuth proxy state |
|
|
323
324
|
| `ima2 grok status` | Bundled progrok / xAI image-model probe state |
|
package/docs/FAQ.ko.md
CHANGED
|
@@ -323,6 +323,22 @@ export HTTPS_PROXY=http://127.0.0.1:7890
|
|
|
323
323
|
|
|
324
324
|
GPT OAuth may require access to OpenAI and ChatGPT/Codex-related hosts. A corporate firewall, TLS inspection, VPN, or proxy can break the flow. If login failures and `failed to fetch` errors keep repeating, try a different network as well.
|
|
325
325
|
|
|
326
|
+
## SSE multiplexing
|
|
327
|
+
|
|
328
|
+
### Why does the web UI use a single SSE connection?
|
|
329
|
+
|
|
330
|
+
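Browsers limit the number of concurrent HTTP connections to the same origin (usually 6). When several images were generated at once, each request occupied an SSE connection, so with multimode, node, and video running at the same time the connections saturated and gallery thumbnails stalled.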
|
|
331
|
+
|
|
332
|
+
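The web UI now opens only one SSE connection via `GET /api/events` and multiplexes all generation progress events over it. Generation requests sent with `async: true` receive an immediate `202 { requestId }` response and release the connection right away. The CLI is unaffected and keeps using the existing per-request SSE.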
|
|
333
|
+
|
|
334
|
+
### What happens if the SSE connection drops?
|
|
335
|
+
|
|
336
|
+
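The event channel client reconnects automatically with exponential backoff. On reconnect it sends `Last-Event-ID` and receives the events it missed from the server's ring buffer (up to 2000 entries). If some events have already been evicted from the buffer, a `replay-gap` event reports it.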
|
|
337
|
+
|
|
338
|
+
### What is the limit on concurrent jobs?
|
|
339
|
+
|
|
340
|
+
The server limits concurrent generation jobs to 12 (`MAX_CONCURRENT_JOBS`). Requests over the limit receive `429` with `Retry-After: 5`. The SSE endpoint itself supports up to 512 simultaneous connections.
|
|
341
|
+
|
|
326
342
|
## CLI troubleshooting order
|
|
327
343
|
|
|
328
344
|
Check the following in order.
|
package/docs/FAQ.md
CHANGED
|
@@ -103,6 +103,20 @@ ima2 serve
|
|
|
103
103
|
|
|
104
104
|
If this happens on a company network, a firewall, VPN, proxy, or captive portal may also be blocking the OAuth flow.
|
|
105
105
|
|
|
106
|
+
### How do I use the Gemini providers?
|
|
107
|
+
|
|
108
|
+
Two Gemini providers are available:
|
|
109
|
+
|
|
110
|
+
- **`agy`** — uses the Antigravity CLI (`agy -p`) with no API key needed. Requires the `agy` binary to be installed and logged in. Model is `nano-banana-2`, output is fixed at 1024×1024.
|
|
111
|
+
|
|
112
|
+
- **`gemini-api`** — calls the Google Generative Language API directly. Add a `GEMINI_API_KEY` env var, or configure a key via Settings > API Keys. For Vertex AI, add a service account JSON via Settings or the `VERTEX_SERVICE_ACCOUNT_JSON` env var. When both an API key and Vertex credentials are present, Vertex takes priority. Use the auth-mode dropdown in Settings to switch between `apikey` and `vertex`; the choice is saved and restored automatically.
|
|
113
|
+
|
|
114
|
+
The `gemini-api` provider supports two models: `nano-banana-2` (Gemini 3.1 Flash Image) and `nano-banana-pro` (Gemini 3 Pro Image). The web UI shows aspect-ratio and resolution controls (512px–4K) for `gemini-api`; these are honored only on the direct Gemini API path and are ignored by Vertex AI.
|
|
115
|
+
|
|
116
|
+
### How do I re-authenticate Grok or Codex without restarting?
|
|
117
|
+
|
|
118
|
+
Use the **Switch Account** button in Settings > QuotaCard for the provider. This starts a device-code OAuth flow: a new browser tab opens the verification URL, you complete the login, and the server automatically picks up the new credentials. The Grok quota bar also shows `$used / $limit` (in USD) drawn from the xAI billing API.
|
|
119
|
+
|
|
106
120
|
## Models and quota
|
|
107
121
|
|
|
108
122
|
### Which model should I use?
|
|
@@ -334,6 +348,22 @@ Use the host and port from your proxy client. If `ima2-gen` still fails after th
|
|
|
334
348
|
|
|
335
349
|
GPT OAuth may require access to OpenAI and ChatGPT/Codex-related hosts. A corporate firewall, TLS inspection, VPN, or proxy can break the flow. Try a different network if login and `failed to fetch` errors keep repeating.
|
|
336
350
|
|
|
351
|
+
## SSE Multiplexing
|
|
352
|
+
|
|
353
|
+
### Why does the web UI use a single SSE connection?
|
|
354
|
+
|
|
355
|
+
Browsers limit the number of concurrent HTTP connections to the same origin (typically 6). When generating multiple images at once, each generation request used to hold a Server-Sent Events connection open. With multimode, node, and video running simultaneously, the browser would run out of connections and gallery thumbnails would hang.
|
|
356
|
+
|
|
357
|
+
The web UI now opens a single persistent `GET /api/events` SSE connection and multiplexes all generation progress through it. Generation requests use `async: true` and receive an immediate `202 { requestId }` response, which frees the POST connection right away. The CLI is unaffected: it still uses per-request SSE when `async` is not set.
|
|
358
|
+
|
|
359
|
+
### What happens if the SSE connection drops?
|
|
360
|
+
|
|
361
|
+
The event channel client reconnects automatically with exponential backoff. On reconnect, it sends `Last-Event-ID` so the server can replay missed events from its ring buffer (up to 2000 entries). If events have been evicted from the buffer, the server sends a `replay-gap` event so the client knows some updates may have been lost.
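The replay behavior described above can be pictured with a small ring buffer. This is a sketch under stated assumptions rather than the server's actual code: `EventRing`, `replaySince`, `backoffDelayMs`, the `replay-gap` payload shape, and the backoff constants are all hypothetical; only the 2000-entry capacity, the `Last-Event-ID` replay, and the `replay-gap` event name come from the text.

```javascript
// Hypothetical ring buffer keyed by monotonically increasing event ids.
class EventRing {
  constructor(capacity = 2000) {
    this.capacity = capacity;
    this.events = [];
    this.nextId = 1;
  }

  push(event, data) {
    const entry = { id: this.nextId++, event, data };
    this.events.push(entry);
    if (this.events.length > this.capacity) this.events.shift(); // evict oldest
    return entry;
  }

  // Return events newer than lastEventId, prefixed by a replay-gap marker
  // when some of the missed events have already been evicted.
  replaySince(lastEventId) {
    const missed = this.events.filter((e) => e.id > lastEventId);
    const oldestId = this.events.length ? this.events[0].id : this.nextId;
    const gap = lastEventId + 1 < oldestId;
    return gap ? [{ event: "replay-gap", data: { lastEventId } }, ...missed] : missed;
  }
}

// Hypothetical client backoff schedule: 1s, 2s, 4s, ... capped at 30s.
function backoffDelayMs(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}
```

A client that receives `replay-gap` would typically refetch current state (for example via `GET /api/inflight`) instead of trusting the replayed events alone.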
|
|
362
|
+
|
|
363
|
+
### What is the maximum number of concurrent jobs?
|
|
364
|
+
|
|
365
|
+
The server caps concurrent generation jobs at 12 (`MAX_CONCURRENT_JOBS`). Additional requests receive `429` with `Retry-After: 5`. The SSE endpoint itself caps at 512 simultaneous connections.
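To make the limit concrete, the admission check could look like the sketch below. It is illustrative only: `JobRegistry` and its methods are hypothetical, and checking for a duplicate `requestId` before checking capacity is an assumption. The limit of 12, the status codes, and the error names are taken from the docs.

```javascript
const MAX_CONCURRENT_JOBS = 12; // limit named in the text

// Hypothetical in-memory registry of active request ids.
class JobRegistry {
  constructor(limit = MAX_CONCURRENT_JOBS) {
    this.limit = limit;
    this.active = new Set();
  }

  // Returns the HTTP status an async POST would be answered with.
  tryStart(requestId) {
    if (this.active.has(requestId)) return { status: 409, code: "REQUEST_ID_IN_USE" };
    if (this.active.size >= this.limit) {
      return { status: 429, code: "TOO_MANY_JOBS", retryAfter: 5 };
    }
    this.active.add(requestId);
    return { status: 202, requestId };
  }

  // Called when a job reaches a terminal `done` or `error` event.
  finish(requestId) {
    this.active.delete(requestId);
  }
}
```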
|
|
366
|
+
|
|
337
367
|
## CLI troubleshooting checklist
|
|
338
368
|
|
|
339
369
|
Run these in order:
|
package/docs/README.ko.md
CHANGED
|
@@ -78,7 +78,7 @@ v1.1.22부터 Ctrl+C가 DB, 소켓, 자식 프로세스를 깨끗하게 정리
|
|
|
78
78
|
1. **GPT OAuth** — log in with a ChatGPT account (free, images only)
|
|
79
79
|
2. **Grok OAuth** — log in with an xAI/Grok account (images + video)
|
|
80
80
|
3. **Both** — GPT and Grok together (all features)
|
|
81
|
-
4. **
|
|
81
|
+
4. **Web setup** — configure everything in the web UI
|
|
82
82
|
|
|
83
83
|
Video generation requires Grok OAuth (option 2 or 3).
|
|
84
84
|
|
|
@@ -95,6 +95,10 @@ v1.1.22부터 Ctrl+C가 DB, 소켓, 자식 프로세스를 깨끗하게 정리
|
|
|
95
95
|
- **Mobile shell**: on small screens, the app is operated through the app bar, compose sheet, and compact settings toggle.
|
|
96
96
|
- **Observable jobs**: in-progress and recently completed jobs are tracked by request ID.
|
|
97
97
|
|
|
98
|
+
### SSE multiplexing
|
|
99
|
+
|
|
100
|
+
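The web UI receives all generation progress over a single `GET /api/events` Server-Sent Events connection. Multimode, node, and video requests are submitted as async POSTs (`202 { requestId }`), and progress events are multiplexed through the event bus. This resolves the gallery hang during concurrent generation that was caused by the browser's 6-connection limit. CLI clients that do not send `async: true` can keep using the existing per-request SSE stream.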
|
|
101
|
+
|
|
98
102
|
## Image generation providers
|
|
99
103
|
|
|
100
104
|
Image generation supports local Codex/ChatGPT OAuth, an OpenAI API key, and the bundled Grok provider.
|
|
@@ -232,8 +236,8 @@ environment variables > ~/.ima2/config.json > built-in defaults
|
|
|
232
236
|
| `IMA2_NO_GROK_PROXY` | — | Set to `1` to disable progrok auto-start |
|
|
233
237
|
| `IMA2_GROK_PLANNER_MODEL` | `grok-4.3` | Grok planner model (can also be changed in the settings UI or with the `--planner-model` CLI flag) |
|
|
234
238
|
| `IMA2_GROK_IMAGE_MODEL_DEFAULT` | `grok-imagine-image` | Default Grok image model |
|
|
235
|
-
| `IMA2_LOG_LEVEL` | `
|
|
236
|
-
| `IMA2_INFLIGHT_TERMINAL_TTL_MS` | `
|
|
239
|
+
| `IMA2_LOG_LEVEL` | `info` | `info` for normal `serve`, `debug` in dev mode. Supports `debug`, `info`, `warn`, `error`, `silent` |
|
|
240
|
+
| `IMA2_INFLIGHT_TERMINAL_TTL_MS` | `300000` | How long recent jobs are retained for debugging (5 minutes) |
|
|
237
241
|
| `OPENAI_API_KEY` | — | API key for the `provider: "api"` Responses image path and auxiliary features |
|
|
238
242
|
|
|
239
243
|
### Log modes
|