imgx-mcp 1.5.1 → 1.6.1
- package/CHANGELOG.md +35 -0
- package/README.md +29 -6
- package/dist/cli.bundle.js +34 -11
- package/dist/image-generation-skill.zip +0 -0
- package/dist/mcp.bundle.js +37 -12
- package/package.json +2 -2
- package/skills/image-generation/SKILL.md +518 -13
- package/skills/image-generation/references/providers.md +18 -11
package/CHANGELOG.md
CHANGED
|
@@ -1,5 +1,40 @@
|
|
|
1
1
|
# Changelog
|
|
2
2
|
|
|
3
|
+
## 1.6.1 (2026-03-24)
|
|
4
|
+
|
|
5
|
+
### Added
|
|
6
|
+
|
|
7
|
+
- **README badges** — npm version, npm downloads, Cursor Directory, MIT license
|
|
8
|
+
- **"What sets imgx-mcp apart" section** — highlights context-aware prompt optimization, 24 editing techniques, and session management with undo/redo
|
|
9
|
+
- **Cursor Directory link** in README Links section
|
|
10
|
+
|
|
11
|
+
### Changed
|
|
12
|
+
|
|
13
|
+
- **Description updated** across package.json, server.json, plugin configs — now emphasizes context-aware prompt optimization, bundled editing techniques, and session undo/redo
|
|
14
|
+
- **Listed on Cursor Directory** — https://cursor.directory/plugins/imgx-mcp
|
|
15
|
+
|
|
16
|
+
## 1.6.0 (2026-03-06)
|
|
17
|
+
|
|
18
|
+
### Added
|
|
19
|
+
|
|
20
|
+
- **`gemini-2.5-flash-image` (Nano Banana)** — re-added as the only Gemini image model with a **free tier** (10 RPM / 500 RPD, no credit card required). Max 1024×1024, 7 aspect ratios. Ideal entry point for users without paid API access
|
|
21
|
+
- **`gpt-image-1.5`** — OpenAI's latest image model. ~4x faster, 20% cheaper than gpt-image-1, improved text rendering and editing precision. Same API, drop-in replacement
|
|
22
|
+
- **`gpt-image-1-mini`** — Budget OpenAI model at $0.005–$0.036/image. Same API compatibility
|
|
23
|
+
- **`background` parameter** — `transparent`, `opaque`, or `auto` (OpenAI only). Available on `generate_image`, `edit_image`, and `edit_last` MCP tools, and as `--background` / `-b` CLI flag. Use `transparent` for icons, logos, and stickers with transparent PNG/WebP output
|
|
24
|
+
- **`quality` parameter** — `low`, `medium`, `high`, or `auto` (OpenAI only). Direct quality control that overrides the resolution-based mapping. Available on all MCP tools and as `--quality` / `-q` CLI flag
|
|
25
|
+
- **Viral style templates** — Ghibli, action figure, 3D clay, pixel art, and more in SKILL.md
|
|
26
|
+
- **Specialized use case guides** — Icon set generation, seamless patterns, technical diagrams, story sequences with character consistency
|
|
27
|
+
- **Multi-image consistency techniques** — Design token approach, character DNA template, style reference chains
|
|
28
|
+
- **Platform size guide** — Recommended aspect ratios and resolutions for social media, OGP, app stores, print, and blog platforms
|
|
29
|
+
- **Model aliases** — Nano Banana Pro, Nano Banana 2, Nano Banana, GPT Image 1.5, GPT Image Mini (+ Japanese variants) mapped in Skill for natural language model selection
|
|
30
|
+
|
|
31
|
+
### Changed
|
|
32
|
+
|
|
33
|
+
- **Default model changed to free tier** — `gemini-2.5-flash-image` (Nano Banana) is now the default model. Users start free (500 images/day, no credit card), and can upgrade to paid models (Nano Banana 2, Nano Banana Pro) when they need higher quality, 4K resolution, or extended aspect ratios
|
|
34
|
+
- **Gemini 3.1 Flash model added** — `gemini-3.1-flash-image-preview` (Nano Banana 2) as paid alternative. Supports 4K resolution and 14 aspect ratios
|
|
35
|
+
- **SKILL.md rewritten** — added detailed model specs (3 Gemini + 3 OpenAI), model selection guide, 9 use case templates (including transparent background), 24 editing techniques, prompt enhancement guide (Subject-Context-Style framework), and refinement patterns
|
|
36
|
+
- **providers.md rewritten** — 3-model comparison table with per-model capabilities
|
|
37
|
+
|
|
3
38
|
## 1.5.1 (2026-03-05)
|
|
4
39
|
|
|
5
40
|
### Fixed
|
package/README.md
CHANGED
|
@@ -1,9 +1,20 @@
|
|
|
1
1
|
# imgx-mcp
|
|
2
2
|
|
|
3
|
+
[](https://www.npmjs.com/package/imgx-mcp)
|
|
4
|
+
[](https://www.npmjs.com/package/imgx-mcp)
|
|
5
|
+
[](https://cursor.directory/plugins/imgx-mcp)
|
|
6
|
+
[](https://opensource.org/licenses/MIT)
|
|
7
|
+
|
|
3
8
|
AI image generation and editing MCP server. Works with Claude Code, Gemini CLI, Cursor, Windsurf, and any MCP-compatible tool.
|
|
4
9
|
|
|
5
10
|
Generate images from text, edit existing images with text instructions, iterate on results — all from your AI coding environment.
|
|
6
11
|
|
|
12
|
+
### What sets imgx-mcp apart
|
|
13
|
+
|
|
14
|
+
- **No prompt engineering** — Your AI agent keeps conversation context and auto-constructs optimized prompts. Say what you need; the agent handles prompt structure, model selection, and platform-specific sizing
|
|
15
|
+
- **24 editing techniques built in** — Atmosphere, composition, style transfer, element manipulation, and trending styles — bundled as a Skill your agent applies on demand
|
|
16
|
+
- **Session management with undo/redo** — Edit iteratively, step back to any point, branch off, or switch between parallel sessions — version control for images
|
|
17
|
+
|
|
7
18
|
## Quick start
|
|
8
19
|
|
|
9
20
|
Add to your tool's MCP config (`.mcp.json`, `settings.json`, etc.):
|
|
@@ -70,9 +81,18 @@ Claude Desktop supports skills via ZIP upload:
|
|
|
70
81
|
|
|
71
82
|
> Update the skill by re-downloading and re-uploading the ZIP after new releases.
|
|
72
83
|
|
|
73
|
-
### What the
|
|
84
|
+
### What the Skill brings
|
|
85
|
+
|
|
86
|
+
The MCP server gives the AI the *ability* to generate and edit images. The Skill adds the *knowledge* of how to use those tools well — so you don't need to learn prompt syntax, model specifications, or service-specific parameters.
|
|
87
|
+
|
|
88
|
+
- **Automatic prompt construction** — Say "I need a cover image." The AI builds a structured prompt using the Subject-Context-Style framework: what to show, where to place it, how it should look
|
|
89
|
+
- **24 editing techniques** — Atmosphere adjustment, composition changes, element manipulation, style transfer. "Make it warmer" or "add depth of field" — the AI selects the right instruction for the model
|
|
90
|
+
- **Intelligent model selection** — Starts with the free model. Suggests paid upgrades only when your needs exceed free tier capabilities, and explains what changes
|
|
91
|
+
- **Platform-aware sizing** — "Twitter OGP" or "App Store screenshot" — the AI picks the correct aspect ratio and resolution. Covers social media, OGP, app stores, print, and blog platforms
|
|
92
|
+
- **Trending style templates** — Ghibli, action figure in box, 3D clay, pixel art, chibi, and more. Name the style and the AI applies the right prompt structure
|
|
93
|
+
- **Multi-image consistency** — Design tokens and character DNA templates maintain visual coherence across slide decks, social media series, and brand assets
|
|
74
94
|
|
|
75
|
-
The
|
|
95
|
+
The image generation models already have these capabilities. The Skill is what makes them accessible without specialized knowledge.
|
|
76
96
|
|
|
77
97
|
### MCP server vs Skill
|
|
78
98
|
|
|
@@ -157,7 +177,7 @@ edit_last → imgx-a1b2c3d4-1.png
|
|
|
157
177
|
|
|
158
178
|
Set up at least one provider:
|
|
159
179
|
|
|
160
|
-
**Gemini** — get a key from [Google AI Studio](https://aistudio.google.com/apikey) (free tier available):
|
|
180
|
+
**Gemini** — get a key from [Google AI Studio](https://aistudio.google.com/apikey) (free tier available for `gemini-2.5-flash-image`):
|
|
161
181
|
|
|
162
182
|
```bash
|
|
163
183
|
imgx config set api-key YOUR_GEMINI_API_KEY --provider gemini
|
|
@@ -277,8 +297,8 @@ The same `npx` pattern works with Cursor, Windsurf, Continue.dev, Cline, Zed, an
|
|
|
277
297
|
|
|
278
298
|
| Provider | Models | Capabilities |
|
|
279
299
|
|----------|--------|-------------|
|
|
280
|
-
| Gemini | `gemini-3-pro-image-preview
|
|
281
|
-
| OpenAI | `gpt-image-1` | Generate, edit, aspect ratio, multi-output, output format (PNG/JPEG/WebP) |
|
|
300
|
+
| Gemini | `gemini-2.5-flash-image` (Nano Banana — **free tier**, default), `gemini-3-pro-image-preview` (Nano Banana Pro), `gemini-3.1-flash-image-preview` (Nano Banana 2) | Generate, edit, aspect ratio (up to 14 ratios), resolution (up to 4K), reference images, person control |
|
|
301
|
+
| OpenAI | `gpt-image-1`, `gpt-image-1.5` (faster, 20% cheaper), `gpt-image-1-mini` (budget) | Generate, edit, aspect ratio, multi-output, output format (PNG/JPEG/WebP), background transparency |
|
|
282
302
|
|
|
283
303
|
## Architecture
|
|
284
304
|
|
|
@@ -360,10 +380,12 @@ imgx capabilities # Detailed capabilities of current provider
|
|
|
360
380
|
| `--output` | `-o` | Output file path (auto-generated if omitted) |
|
|
361
381
|
| `--input` | `-i` | Input image to edit (`edit` command only) |
|
|
362
382
|
| `--last` | `-l` | Use last output as input (`edit` command only) |
|
|
363
|
-
| `--aspect-ratio` | `-a` | `1:1`, `16:9`, `9:16`, `4:3`, `3:4`, `2:3`, `3:2` |
|
|
383
|
+
| `--aspect-ratio` | `-a` | `1:1`, `16:9`, `9:16`, `4:3`, `3:4`, `2:3`, `3:2` + Gemini 3.x: `1:4`, `1:8`, `4:1`, `4:5`, `5:4`, `8:1`, `21:9` |
|
|
364
384
|
| `--resolution` | `-r` | `1K`, `2K`, `4K` |
|
|
365
385
|
| `--count` | `-n` | Number of images to generate |
|
|
366
386
|
| `--format` | `-f` | Output format: `png`, `jpeg`, `webp` (OpenAI only) |
|
|
387
|
+
| `--background` | `-b` | Background: `transparent`, `opaque`, `auto` (OpenAI only) |
|
|
388
|
+
| `--quality` | `-q` | Quality: `low`, `medium`, `high`, `auto` (OpenAI only) |
|
|
367
389
|
| `--model` | `-m` | Model name |
|
|
368
390
|
| `--provider` | | Provider name (default: `gemini`) |
|
|
369
391
|
| `--output-dir` | `-d` | Output directory |
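The `--background` and `--quality` rows above list a fixed set of accepted values and are marked OpenAI only. As far as this diff shows, the CLI passes both through as plain strings, so a caller that wants to check them up front could use something like the sketch below. `checkOpenAIOptions` is a hypothetical helper for illustration, not part of imgx-mcp.

```javascript
// Accepted values copied from the options table above.
const ALLOWED = {
  background: ["transparent", "opaque", "auto"],
  quality: ["low", "medium", "high", "auto"],
};

// Hypothetical helper (not part of imgx-mcp): returns a list of problems
// found in a set of parsed option values, or an empty list when all is fine.
function checkOpenAIOptions(values, provider) {
  const problems = [];
  for (const [name, allowed] of Object.entries(ALLOWED)) {
    const value = values[name];
    if (value === undefined) continue; // option not given, nothing to check
    if (provider !== "openai") {
      problems.push(`--${name} is OpenAI only (provider: ${provider})`);
    } else if (!allowed.includes(value)) {
      problems.push(`--${name} must be one of: ${allowed.join(", ")}`);
    }
  }
  return problems;
}

const okCase = checkOpenAIOptions({ background: "transparent", quality: "high" }, "openai");
const badValue = checkOpenAIOptions({ quality: "ultra" }, "openai");
const wrongProvider = checkOpenAIOptions({ background: "opaque" }, "gemini");
console.log(okCase, badValue, wrongProvider);
```

Returning a list of messages rather than throwing lets the caller decide whether a mismatch is an error or only a warning.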
|
|
@@ -503,6 +525,7 @@ MIT — [SOMA COFFEE KYOTO](https://github.com/somacoffeekyoto)
|
|
|
503
525
|
- [Official page](https://somacoffee.net/imgx-mcp/)
|
|
504
526
|
- [GitHub](https://github.com/somacoffeekyoto/imgx-mcp)
|
|
505
527
|
- [npm](https://www.npmjs.com/package/imgx-mcp)
|
|
528
|
+
- [Cursor Directory](https://cursor.directory/plugins/imgx-mcp)
|
|
506
529
|
- [MCP Registry](https://registry.modelcontextprotocol.io)
|
|
507
530
|
- [SOMA COFFEE KYOTO](https://somacoffee.net)
|
|
508
531
|
- [X (@somacoffeekyoto)](https://x.com/somacoffeekyoto)
|
package/dist/cli.bundle.js
CHANGED
|
@@ -39527,8 +39527,12 @@ var Capability;
|
|
|
39527
39527
|
// build/providers/gemini/capabilities.js
|
|
39528
39528
|
var GEMINI_PROVIDER_INFO = {
|
|
39529
39529
|
name: "gemini",
|
|
39530
|
-
models: [
|
|
39531
|
-
|
|
39530
|
+
models: [
|
|
39531
|
+
"gemini-3-pro-image-preview",
|
|
39532
|
+
"gemini-3.1-flash-image-preview",
|
|
39533
|
+
"gemini-2.5-flash-image"
|
|
39534
|
+
],
|
|
39535
|
+
defaultModel: "gemini-2.5-flash-image",
|
|
39532
39536
|
capabilities: /* @__PURE__ */ new Set([
|
|
39533
39537
|
Capability.TEXT_TO_IMAGE,
|
|
39534
39538
|
Capability.ASPECT_RATIO,
|
|
@@ -39539,12 +39543,19 @@ var GEMINI_PROVIDER_INFO = {
|
|
|
39539
39543
|
]),
|
|
39540
39544
|
aspectRatios: [
|
|
39541
39545
|
"1:1",
|
|
39546
|
+
"1:4",
|
|
39547
|
+
"1:8",
|
|
39542
39548
|
"2:3",
|
|
39543
39549
|
"3:2",
|
|
39544
39550
|
"3:4",
|
|
39551
|
+
"4:1",
|
|
39545
39552
|
"4:3",
|
|
39553
|
+
"4:5",
|
|
39554
|
+
"5:4",
|
|
39555
|
+
"8:1",
|
|
39546
39556
|
"9:16",
|
|
39547
|
-
"16:9"
|
|
39557
|
+
"16:9",
|
|
39558
|
+
"21:9"
|
|
39548
39559
|
],
|
|
39549
39560
|
resolutions: ["1K", "2K", "4K"]
|
|
39550
39561
|
};
|
|
@@ -39646,7 +39657,7 @@ import { readFileSync as readFileSync4 } from "node:fs";
|
|
|
39646
39657
|
// build/providers/openai/capabilities.js
|
|
39647
39658
|
var OPENAI_PROVIDER_INFO = {
|
|
39648
39659
|
name: "openai",
|
|
39649
|
-
models: ["gpt-image-1"],
|
|
39660
|
+
models: ["gpt-image-1", "gpt-image-1.5", "gpt-image-1-mini"],
|
|
39650
39661
|
defaultModel: "gpt-image-1",
|
|
39651
39662
|
capabilities: /* @__PURE__ */ new Set([
|
|
39652
39663
|
Capability.TEXT_TO_IMAGE,
|
|
@@ -39734,8 +39745,9 @@ var OpenAIProvider = class {
|
|
|
39734
39745
|
prompt: input.prompt,
|
|
39735
39746
|
n: input.count || 1,
|
|
39736
39747
|
size: mapSize(input.aspectRatio),
|
|
39737
|
-
quality: mapQuality(input.resolution),
|
|
39738
|
-
...input.outputFormat ? { output_format: input.outputFormat } : {}
|
|
39748
|
+
quality: input.quality || mapQuality(input.resolution),
|
|
39749
|
+
...input.outputFormat ? { output_format: input.outputFormat } : {},
|
|
39750
|
+
...input.background ? { background: input.background } : {}
|
|
39739
39751
|
})
|
|
39740
39752
|
});
|
|
39741
39753
|
const json = await response.json();
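The changed lines in the hunk above follow a simple pattern: an explicit `quality` takes precedence over the value derived from `resolution`, and `output_format` and `background` are only included when set. A minimal sketch of that pattern follows. The `mapQuality` body here is a hypothetical stand-in, since the real mapping is not shown in this diff, and `buildOptionalFields` is an illustrative name.

```javascript
// Hypothetical stand-in for the bundle's mapQuality(); the real
// resolution-to-quality mapping is not part of this diff.
function mapQuality(resolution) {
  const table = { "1K": "low", "2K": "medium", "4K": "high" };
  return table[resolution] || "auto";
}

// Builds the optional request fields the way the changed lines do:
// an explicit quality overrides the resolution-based value, and
// output_format / background are omitted entirely when unset.
function buildOptionalFields(input) {
  return {
    quality: input.quality || mapQuality(input.resolution),
    ...(input.outputFormat ? { output_format: input.outputFormat } : {}),
    ...(input.background ? { background: input.background } : {}),
  };
}

const explicit = buildOptionalFields({
  resolution: "4K",
  quality: "low",
  background: "transparent",
});
const derived = buildOptionalFields({ resolution: "2K" });
console.log(explicit); // quality taken from the explicit value
console.log(derived);  // quality taken from the stand-in mapping; no background key
```

Spreading an empty object for unset options keeps the keys out of the request body instead of sending them as `undefined`.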
|
|
@@ -39763,8 +39775,9 @@ var OpenAIProvider = class {
|
|
|
39763
39775
|
prompt: input.prompt,
|
|
39764
39776
|
n: String(input.count || 1),
|
|
39765
39777
|
size: mapSize(input.aspectRatio),
|
|
39766
|
-
quality: mapQuality(input.resolution),
|
|
39767
|
-
...input.outputFormat ? { output_format: input.outputFormat } : {}
|
|
39778
|
+
quality: input.quality || mapQuality(input.resolution),
|
|
39779
|
+
...input.outputFormat ? { output_format: input.outputFormat } : {},
|
|
39780
|
+
...input.background ? { background: input.background } : {}
|
|
39768
39781
|
};
|
|
39769
39782
|
const { body, contentType: ct } = buildMultipart(fields, [
|
|
39770
39783
|
{
|
|
@@ -39865,7 +39878,9 @@ async function runGenerate(provider, args) {
|
|
|
39865
39878
|
aspectRatio: args.aspectRatio,
|
|
39866
39879
|
count: args.count,
|
|
39867
39880
|
resolution: args.resolution,
|
|
39868
|
-
outputFormat: args.outputFormat
|
|
39881
|
+
outputFormat: args.outputFormat,
|
|
39882
|
+
background: args.background,
|
|
39883
|
+
quality: args.quality
|
|
39869
39884
|
}, args.model);
|
|
39870
39885
|
if (!result.success || result.images.length === 0) {
|
|
39871
39886
|
fail(result.error || "Generation failed");
|
|
@@ -39904,7 +39919,9 @@ async function runEdit(provider, args) {
|
|
|
39904
39919
|
prompt: args.prompt,
|
|
39905
39920
|
aspectRatio: args.aspectRatio,
|
|
39906
39921
|
resolution: args.resolution,
|
|
39907
|
-
outputFormat: args.outputFormat
|
|
39922
|
+
outputFormat: args.outputFormat,
|
|
39923
|
+
background: args.background,
|
|
39924
|
+
quality: args.quality
|
|
39908
39925
|
}, args.model);
|
|
39909
39926
|
if (!result.success || result.images.length === 0) {
|
|
39910
39927
|
fail(result.error || "Edit failed");
|
|
@@ -40293,7 +40310,7 @@ function runRedo() {
|
|
|
40293
40310
|
}
|
|
40294
40311
|
|
|
40295
40312
|
// build/cli/index.js
|
|
40296
|
-
var VERSION2 = "1.
|
|
40313
|
+
var VERSION2 = "1.6.1";
|
|
40297
40314
|
var HELP = `imgx v${VERSION2} \u2014 AI image generation and editing for MCP-compatible AI agents
|
|
40298
40315
|
|
|
40299
40316
|
Commands:
|
|
@@ -40322,6 +40339,8 @@ Options:
|
|
|
40322
40339
|
-n, --count <number> Number of images to generate
|
|
40323
40340
|
-r, --resolution <size> Resolution: 1K, 2K, 4K
|
|
40324
40341
|
-f, --format <type> Output format: png, jpeg, webp (OpenAI only)
|
|
40342
|
+
-b, --background <type> Background: transparent, opaque, auto (OpenAI only)
|
|
40343
|
+
-q, --quality <level> Quality: low, medium, high, auto (OpenAI only)
|
|
40325
40344
|
-m, --model <model> Model name
|
|
40326
40345
|
--provider <name> Provider: gemini, openai (default: gemini)
|
|
40327
40346
|
-d, --output-dir <dir> Output directory
|
|
@@ -40411,6 +40430,8 @@ function main() {
|
|
|
40411
40430
|
count: { type: "string", short: "n" },
|
|
40412
40431
|
resolution: { type: "string", short: "r" },
|
|
40413
40432
|
format: { type: "string", short: "f" },
|
|
40433
|
+
background: { type: "string", short: "b" },
|
|
40434
|
+
quality: { type: "string", short: "q" },
|
|
40414
40435
|
model: { type: "string", short: "m" },
|
|
40415
40436
|
provider: { type: "string" },
|
|
40416
40437
|
"output-dir": { type: "string", short: "d" },
|
|
@@ -40447,6 +40468,8 @@ function main() {
|
|
|
40447
40468
|
aspectRatio: values["aspect-ratio"] || resolveDefault("aspectRatio") || void 0,
|
|
40448
40469
|
resolution: values.resolution || resolveDefault("resolution") || void 0,
|
|
40449
40470
|
outputFormat: values.format || void 0,
|
|
40471
|
+
background: values.background || void 0,
|
|
40472
|
+
quality: values.quality || void 0,
|
|
40450
40473
|
model,
|
|
40451
40474
|
count: values.count ? parseInt(values.count, 10) : void 0
|
|
40452
40475
|
};
|
|
package/dist/image-generation-skill.zip
CHANGED
|
Binary file
|
package/dist/mcp.bundle.js
CHANGED
|
@@ -69664,8 +69664,12 @@ var Capability;
|
|
|
69664
69664
|
// build/providers/gemini/capabilities.js
|
|
69665
69665
|
var GEMINI_PROVIDER_INFO = {
|
|
69666
69666
|
name: "gemini",
|
|
69667
|
-
models: [
|
|
69668
|
-
|
|
69667
|
+
models: [
|
|
69668
|
+
"gemini-3-pro-image-preview",
|
|
69669
|
+
"gemini-3.1-flash-image-preview",
|
|
69670
|
+
"gemini-2.5-flash-image"
|
|
69671
|
+
],
|
|
69672
|
+
defaultModel: "gemini-2.5-flash-image",
|
|
69669
69673
|
capabilities: /* @__PURE__ */ new Set([
|
|
69670
69674
|
Capability.TEXT_TO_IMAGE,
|
|
69671
69675
|
Capability.ASPECT_RATIO,
|
|
@@ -69676,12 +69680,19 @@ var GEMINI_PROVIDER_INFO = {
|
|
|
69676
69680
|
]),
|
|
69677
69681
|
aspectRatios: [
|
|
69678
69682
|
"1:1",
|
|
69683
|
+
"1:4",
|
|
69684
|
+
"1:8",
|
|
69679
69685
|
"2:3",
|
|
69680
69686
|
"3:2",
|
|
69681
69687
|
"3:4",
|
|
69688
|
+
"4:1",
|
|
69682
69689
|
"4:3",
|
|
69690
|
+
"4:5",
|
|
69691
|
+
"5:4",
|
|
69692
|
+
"8:1",
|
|
69683
69693
|
"9:16",
|
|
69684
|
-
"16:9"
|
|
69694
|
+
"16:9",
|
|
69695
|
+
"21:9"
|
|
69685
69696
|
],
|
|
69686
69697
|
resolutions: ["1K", "2K", "4K"]
|
|
69687
69698
|
};
|
|
@@ -69787,7 +69798,7 @@ import { readFileSync as readFileSync4 } from "node:fs";
|
|
|
69787
69798
|
// build/providers/openai/capabilities.js
|
|
69788
69799
|
var OPENAI_PROVIDER_INFO = {
|
|
69789
69800
|
name: "openai",
|
|
69790
|
-
models: ["gpt-image-1"],
|
|
69801
|
+
models: ["gpt-image-1", "gpt-image-1.5", "gpt-image-1-mini"],
|
|
69791
69802
|
defaultModel: "gpt-image-1",
|
|
69792
69803
|
capabilities: /* @__PURE__ */ new Set([
|
|
69793
69804
|
Capability.TEXT_TO_IMAGE,
|
|
@@ -69875,8 +69886,9 @@ var OpenAIProvider = class {
|
|
|
69875
69886
|
prompt: input.prompt,
|
|
69876
69887
|
n: input.count || 1,
|
|
69877
69888
|
size: mapSize(input.aspectRatio),
|
|
69878
|
-
quality: mapQuality(input.resolution),
|
|
69879
|
-
...input.outputFormat ? { output_format: input.outputFormat } : {}
|
|
69889
|
+
quality: input.quality || mapQuality(input.resolution),
|
|
69890
|
+
...input.outputFormat ? { output_format: input.outputFormat } : {},
|
|
69891
|
+
...input.background ? { background: input.background } : {}
|
|
69880
69892
|
})
|
|
69881
69893
|
});
|
|
69882
69894
|
const json2 = await response.json();
|
|
@@ -69904,8 +69916,9 @@ var OpenAIProvider = class {
|
|
|
69904
69916
|
prompt: input.prompt,
|
|
69905
69917
|
n: String(input.count || 1),
|
|
69906
69918
|
size: mapSize(input.aspectRatio),
|
|
69907
|
-
quality: mapQuality(input.resolution),
|
|
69908
|
-
...input.outputFormat ? { output_format: input.outputFormat } : {}
|
|
69919
|
+
quality: input.quality || mapQuality(input.resolution),
|
|
69920
|
+
...input.outputFormat ? { output_format: input.outputFormat } : {},
|
|
69921
|
+
...input.background ? { background: input.background } : {}
|
|
69909
69922
|
};
|
|
69910
69923
|
const { body, contentType: ct } = buildMultipart(fields, [
|
|
69911
69924
|
{
|
|
@@ -69982,7 +69995,7 @@ function buildImageContent(images, paths, extra) {
|
|
|
69982
69995
|
}
|
|
69983
69996
|
var server = new McpServer({
|
|
69984
69997
|
name: "imgx",
|
|
69985
|
-
version: "1.
|
|
69998
|
+
version: "1.6.1"
|
|
69986
69999
|
});
|
|
69987
70000
|
initGemini();
|
|
69988
70001
|
initOpenAI();
|
|
@@ -70003,6 +70016,8 @@ server.tool("generate_image", "Generate an image from a text prompt", {
|
|
|
70003
70016
|
resolution: external_exports3.enum(["1K", "2K", "4K"]).optional().describe("Output resolution"),
|
|
70004
70017
|
count: external_exports3.number().int().min(1).max(4).optional().describe("Number of images"),
|
|
70005
70018
|
output_format: external_exports3.enum(["png", "jpeg", "webp"]).optional().describe("Output format"),
|
|
70019
|
+
background: external_exports3.enum(["transparent", "opaque", "auto"]).optional().describe("Background transparency (OpenAI only). Use 'transparent' for transparent PNG/WebP"),
|
|
70020
|
+
quality: external_exports3.enum(["low", "medium", "high", "auto"]).optional().describe("Image quality (OpenAI only). Overrides resolution-based quality mapping"),
|
|
70006
70021
|
model: external_exports3.string().optional().describe("Model name"),
|
|
70007
70022
|
provider: external_exports3.string().optional().describe("Provider name")
|
|
70008
70023
|
}, async (args) => {
|
|
@@ -70013,7 +70028,9 @@ server.tool("generate_image", "Generate an image from a text prompt", {
|
|
|
70013
70028
|
aspectRatio: args.aspect_ratio,
|
|
70014
70029
|
resolution: args.resolution,
|
|
70015
70030
|
count: args.count,
|
|
70016
|
-
outputFormat: args.output_format
|
|
70031
|
+
outputFormat: args.output_format,
|
|
70032
|
+
background: args.background,
|
|
70033
|
+
quality: args.quality
|
|
70017
70034
|
};
|
|
70018
70035
|
const result = await prov.generate(input, args.model);
|
|
70019
70036
|
if (!result.success || result.images.length === 0) {
|
|
@@ -70049,6 +70066,8 @@ server.tool("edit_image", "Edit an existing image with text instructions", {
|
|
|
70049
70066
|
aspect_ratio: external_exports3.enum(["1:1", "2:3", "3:2", "3:4", "4:3", "9:16", "16:9"]).optional().describe("Aspect ratio"),
|
|
70050
70067
|
resolution: external_exports3.enum(["1K", "2K", "4K"]).optional().describe("Output resolution"),
|
|
70051
70068
|
output_format: external_exports3.enum(["png", "jpeg", "webp"]).optional().describe("Output format"),
|
|
70069
|
+
background: external_exports3.enum(["transparent", "opaque", "auto"]).optional().describe("Background transparency (OpenAI only)"),
|
|
70070
|
+
quality: external_exports3.enum(["low", "medium", "high", "auto"]).optional().describe("Image quality (OpenAI only)"),
|
|
70052
70071
|
model: external_exports3.string().optional().describe("Model name"),
|
|
70053
70072
|
provider: external_exports3.string().optional().describe("Provider name")
|
|
70054
70073
|
}, async (args) => {
|
|
@@ -70064,7 +70083,9 @@ server.tool("edit_image", "Edit an existing image with text instructions", {
|
|
|
70064
70083
|
inputImage: args.input,
|
|
70065
70084
|
aspectRatio: args.aspect_ratio,
|
|
70066
70085
|
resolution: args.resolution,
|
|
70067
|
-
outputFormat: args.output_format
|
|
70086
|
+
outputFormat: args.output_format,
|
|
70087
|
+
background: args.background,
|
|
70088
|
+
quality: args.quality
|
|
70068
70089
|
};
|
|
70069
70090
|
const result = await prov.edit(input, args.model);
|
|
70070
70091
|
if (!result.success || result.images.length === 0) {
|
|
@@ -70094,6 +70115,8 @@ server.tool("edit_last", "Edit the last generated/edited image with new text ins
|
|
|
70094
70115
|
aspect_ratio: external_exports3.enum(["1:1", "2:3", "3:2", "3:4", "4:3", "9:16", "16:9"]).optional().describe("Aspect ratio"),
|
|
70095
70116
|
resolution: external_exports3.enum(["1K", "2K", "4K"]).optional().describe("Output resolution"),
|
|
70096
70117
|
output_format: external_exports3.enum(["png", "jpeg", "webp"]).optional().describe("Output format"),
|
|
70118
|
+
background: external_exports3.enum(["transparent", "opaque", "auto"]).optional().describe("Background transparency (OpenAI only)"),
|
|
70119
|
+
quality: external_exports3.enum(["low", "medium", "high", "auto"]).optional().describe("Image quality (OpenAI only)"),
|
|
70097
70120
|
model: external_exports3.string().optional().describe("Model name"),
|
|
70098
70121
|
provider: external_exports3.string().optional().describe("Provider name")
|
|
70099
70122
|
}, async (args) => {
|
|
@@ -70115,7 +70138,9 @@ server.tool("edit_last", "Edit the last generated/edited image with new text ins
|
|
|
70115
70138
|
inputImage: active.filePaths[0],
|
|
70116
70139
|
aspectRatio: args.aspect_ratio,
|
|
70117
70140
|
resolution: args.resolution,
|
|
70118
|
-
outputFormat: args.output_format
|
|
70141
|
+
outputFormat: args.output_format,
|
|
70142
|
+
background: args.background,
|
|
70143
|
+
quality: args.quality
|
|
70119
70144
|
};
|
|
70120
70145
|
const result = await prov.edit(input, args.model);
|
|
70121
70146
|
if (!result.success || result.images.length === 0) {
|
package/package.json
CHANGED
|
@@ -1,8 +1,8 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "imgx-mcp",
|
|
3
|
-
"version": "1.
|
|
3
|
+
"version": "1.6.1",
|
|
4
4
|
"mcpName": "io.github.somacoffeekyoto/imgx",
|
|
5
|
-
"description": "AI image generation and editing
|
|
5
|
+
"description": "AI image generation and editing MCP server with context-aware prompt optimization, 24 editing techniques, and session undo/redo",
|
|
6
6
|
"type": "module",
|
|
7
7
|
"bin": {
|
|
8
8
|
"imgx": "dist/cli.bundle.js",
|
package/skills/image-generation/SKILL.md
CHANGED
|
@@ -7,12 +7,44 @@ description: Generate and edit AI images using Gemini or OpenAI. Text-to-image,
|
|
|
7
7
|
|
|
8
8
|
Generate and edit images using the imgx MCP tools. Gemini and OpenAI providers supported.
|
|
9
9
|
|
|
10
|
+
## Default model behavior
|
|
11
|
+
|
|
12
|
+
**When the user does not specify a model, use Nano Banana (`gemini-2.5-flash-image`)** — the free tier model. This lets users start immediately without paid API access (500 images/day, no credit card).
|
|
13
|
+
|
|
14
|
+
Suggest upgrading to a paid model when:
|
|
15
|
+
- The user is unsatisfied with quality and wants improvement
|
|
16
|
+
- The user needs 4K resolution or extended aspect ratios (1:4, 1:8, 4:1, 8:1, 21:9)
|
|
17
|
+
- The user needs high text rendering accuracy (→ Nano Banana 2)
|
|
18
|
+
- The user explicitly asks for higher quality or a specific paid model
|
|
19
|
+
- The task clearly requires maximum quality (e.g. final production assets, print)
|
|
20
|
+
|
|
21
|
+
When suggesting an upgrade, briefly explain what the paid model adds. Example:
|
|
22
|
+
> "This was generated with the free model (Nano Banana). For higher resolution (up to 4K) and more aspect ratio options, I can re-generate with Nano Banana 2 or Pro — these require paid API access."
|
|
23
|
+
|
|
10
24
|
## When to use
|
|
11
25
|
|
|
12
26
|
- User asks to create, generate, or make an image
|
|
13
27
|
- User asks to edit, modify, or change an existing image
|
|
14
28
|
- User needs a cover image, diagram, icon, or visual asset
|
|
15
29
|
- User wants to refine an image iteratively ("make it darker", "change the background")
|
|
30
|
+
- User mentions a model by alias (Nano Banana, NB2, etc.) — see Model aliases below
|
|
31
|
+
|
|
32
|
+
## Model aliases
|
|
33
|
+
|
|
34
|
+
Users may refer to models by their alias. Map these to the correct `model` parameter value:
|
|
35
|
+
|
|
36
|
+
| Alias (case-insensitive) | Model ID | Provider |
|
|
37
|
+
|--------------------------|----------|----------|
|
|
38
|
+
| Nano Banana Pro, NanoBanana Pro, NB Pro, ナノバナナプロ | `gemini-3-pro-image-preview` | gemini |
|
|
39
|
+
| Nano Banana 2, NanoBanana 2, NB2, ナノバナナ2, ナノバナナツー | `gemini-3.1-flash-image-preview` | gemini |
|
|
40
|
+
| Nano Banana, NanoBanana, NB, ナノバナナ | `gemini-2.5-flash-image` | gemini |
|
|
41
|
+
| GPT Image, gpt-image | `gpt-image-1` | openai |
|
|
42
|
+
| GPT Image 1.5 | `gpt-image-1.5` | openai |
|
|
43
|
+
| GPT Image Mini, gpt-mini | `gpt-image-1-mini` | openai |
|
|
44
|
+
|
|
45
|
+
When the user says "ナノバナナ2で画像作って" → use `generate_image` with `model="gemini-3.1-flash-image-preview"`.
|
|
46
|
+
When the user says "Nano Banana Proで前の画像を作り直して" → use `edit_last` with `model="gemini-3-pro-image-preview"`.
|
|
47
|
+
When the user says "ナノバナナで画像作って" or "NB" → use `generate_image` with `model="gemini-2.5-flash-image"` (free tier model).
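The Skill leaves this alias mapping to the agent reading the table. Expressed as code, an exact, case-insensitive lookup over the same table could look like the sketch below. `resolveModelAlias` is a hypothetical name and is not part of imgx-mcp.

```javascript
// Alias table transcribed from the Model aliases section above.
const MODEL_ALIASES = [
  { model: "gemini-3-pro-image-preview", provider: "gemini",
    aliases: ["Nano Banana Pro", "NanoBanana Pro", "NB Pro", "ナノバナナプロ"] },
  { model: "gemini-3.1-flash-image-preview", provider: "gemini",
    aliases: ["Nano Banana 2", "NanoBanana 2", "NB2", "ナノバナナ2", "ナノバナナツー"] },
  { model: "gemini-2.5-flash-image", provider: "gemini",
    aliases: ["Nano Banana", "NanoBanana", "NB", "ナノバナナ"] },
  { model: "gpt-image-1", provider: "openai", aliases: ["GPT Image", "gpt-image"] },
  { model: "gpt-image-1.5", provider: "openai", aliases: ["GPT Image 1.5"] },
  { model: "gpt-image-1-mini", provider: "openai", aliases: ["GPT Image Mini", "gpt-mini"] },
];

// Build a lowercase lookup once so matching is case-insensitive and exact.
// Exact matching keeps "Nano Banana" from shadowing "Nano Banana Pro".
const ALIAS_LOOKUP = new Map();
for (const entry of MODEL_ALIASES) {
  for (const alias of entry.aliases) {
    ALIAS_LOOKUP.set(alias.toLowerCase(), { model: entry.model, provider: entry.provider });
  }
}

// Hypothetical helper: returns { model, provider }, or null for unknown names.
function resolveModelAlias(name) {
  return ALIAS_LOOKUP.get(name.trim().toLowerCase()) || null;
}

console.log(resolveModelAlias("NB2"));
console.log(resolveModelAlias("nano banana pro"));
console.log(resolveModelAlias("something else"));
```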
|
|
16
48
|
|
|
17
49
|
## Setup
|
|
18
50
|
|
|
@@ -53,7 +85,7 @@ After adding, restart Claude Code for the MCP server to connect.
|
|
|
53
85
|
|
|
54
86
|
Get at least one API key:
|
|
55
87
|
|
|
56
|
-
- **Gemini** (default
|
|
88
|
+
- **Gemini** (default): [Google AI Studio](https://aistudio.google.com/apikey)
|
|
57
89
|
- **OpenAI**: [OpenAI Platform](https://platform.openai.com/api-keys)
|
|
58
90
|
|
|
59
91
|
Set the key in the `.mcp.json` env section (above), or via CLI:
|
|
@@ -61,6 +93,132 @@ Set the key in the `.mcp.json` env section (above), or via CLI:
|
|
|
61
93
|
npx imgx-mcp config set api-key YOUR_KEY --provider gemini
|
|
62
94
|
```
|
|
63
95
|
|
|
96
|
+
### 3. Project root (optional but recommended)
|
|
97
|
+
|
|
98
|
+
imgx-mcp uses the project root to determine where `.imgx/` (history + default image output) is created. Without it, images go to `~/Pictures/imgx/` and history to `~/.config/imgx/`.
|
|
99
|
+
|
|
100
|
+
| Method | Scope | How to set |
|
|
101
|
+
|--------|-------|------------|
|
|
102
|
+
| `IMGX_PROJECT_ROOT` env var | Per-client (highest priority) | Add to `env` in `.mcp.json` or `claude_desktop_config.json` |
|
|
103
|
+
| Auto-detection (MCP roots / `.imgxrc` search) | Automatic | Works on CLI agents (Claude Code, Gemini CLI). Not available on Claude Desktop |
|
|
104
|
+
| `imgx config set project-root /path` | All clients on the machine | Stored in user config |
|
|
105
|
+
|
|
106
|
+
Detection priority: env var > MCP roots > `.imgxrc` upward search > user config `projectRoot`.
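The priority order above amounts to "first source that has a value wins". A minimal sketch, assuming each source has already been read into a plain value; `pickProjectRoot` and its argument names are hypothetical, and how imgx-mcp actually gathers each value is not shown in this diff.

```javascript
// Each argument is the value a source would supply, or undefined when
// that source has nothing. Obtaining the values (reading the env var,
// asking the MCP client for roots, walking up to find .imgxrc, loading
// the user config) is outside this sketch.
function pickProjectRoot({ envRoot, mcpRoot, imgxrcDir, userConfigRoot }) {
  const candidates = [
    ["IMGX_PROJECT_ROOT", envRoot],
    ["MCP roots", mcpRoot],
    [".imgxrc search", imgxrcDir],
    ["user config projectRoot", userConfigRoot],
  ];
  for (const [source, value] of candidates) {
    if (value) return { source, root: value };
  }
  return null; // no project root found: the documented fallback locations apply
}

console.log(pickProjectRoot({ envRoot: "/work/app", mcpRoot: "/other" }));
console.log(pickProjectRoot({ imgxrcDir: "/repo", userConfigRoot: "/home/me/proj" }));
console.log(pickProjectRoot({}));
```

Returning the winning source alongside the path makes it easier to explain to a user why a particular directory was chosen.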
|
|
107
|
+
|
|
108
|
+
**Claude Code** usually auto-detects via MCP roots — no extra config needed. **Claude Desktop** does not support auto-detection, so set `IMGX_PROJECT_ROOT` in the env.
|
|
109
|
+
|
|
110
|
+
#### `.imgxrc` project config
|
|
111
|
+
|
|
112
|
+
Create with `npx imgx-mcp init` or manually. Shared via Git (do not put API keys here):
|
|
113
|
+
|
|
114
|
+
```json
|
|
115
|
+
{
|
|
116
|
+
"defaults": {
|
|
117
|
+
"model": "gemini-2.5-flash-image",
|
|
118
|
+
"outputDir": "./assets/images",
|
|
119
|
+
"aspectRatio": "16:9"
|
|
120
|
+
}
|
|
121
|
+
}
|
|
122
|
+
```
|
|
123
|
+
|
|
124
|
+
#### Claude Desktop config example
|
|
125
|
+
|
|
126
|
+
```json
|
|
127
|
+
{
|
|
128
|
+
"mcpServers": {
|
|
129
|
+
"imgx": {
|
|
130
|
+
"command": "npx",
|
|
131
|
+
"args": ["--package=imgx-mcp", "-y", "imgx-mcp"],
|
|
132
|
+
"env": {
|
|
133
|
+
"GEMINI_API_KEY": "your-key",
|
|
134
|
+
"IMGX_PROJECT_ROOT": "C:\\Users\\you\\my-project"
|
|
135
|
+
}
|
|
136
|
+
}
|
|
137
|
+
}
|
|
138
|
+
}
|
|
139
|
+
```
|
|
140
|
+
|
|
141
|
+
## Models and image specs
|
|
142
|
+
|
|
143
|
+
### Nano Banana Pro — `gemini-3-pro-image-preview`
|
|
144
|
+
|
|
145
|
+
Google's highest-quality image generation model. Paid only.
|
|
146
|
+
|
|
147
|
+
| Spec | Value |
|
|
148
|
+
|------|-------|
|
|
149
|
+
| Resolution | 1K (1024px), 2K (2048px), 4K (4096px) |
|
|
150
|
+
| Aspect ratios | 14: `1:1`, `1:4`, `1:8`, `2:3`, `3:2`, `3:4`, `4:1`, `4:3`, `4:5`, `5:4`, `8:1`, `9:16`, `16:9`, `21:9` |
|
|
151
|
+
| Output format | PNG |
|
|
152
|
+
| Text rendering | Good |
|
|
153
|
+
| Photorealism | High |
|
|
154
|
+
| Cost | ~$0.134/image |
|
|
155
|
+
| Best for | High-quality hero images, photorealistic scenes, detailed illustrations |
|
|
156
|
+
|
|
157
|
+
### Nano Banana 2 — `gemini-3.1-flash-image-preview`
|
|
158
|
+
|
|
159
|
+
Fast model with Pro-level capabilities at lower cost. Improved text rendering.
|
|
160
|
+
|
|
161
|
+
| Spec | Value |
|
|
162
|
+
|------|-------|
|
|
163
|
+
| Resolution | 1K (1024px), 2K (2048px), 4K (4096px) |
|
|
164
|
+
| Aspect ratios | 14: `1:1`, `1:4`, `1:8`, `2:3`, `3:2`, `3:4`, `4:1`, `4:3`, `4:5`, `5:4`, `8:1`, `9:16`, `16:9`, `21:9` |
|
|
165
|
+
| Output format | PNG |
|
|
166
|
+
| Text rendering | High (~90% accuracy) |
|
|
167
|
+
| Photorealism | Good |
|
|
168
|
+
| Cost | $0.045-$0.151/image (resolution dependent) |
|
|
169
|
+
| Best for | Rapid iteration, text-heavy images, marketing mockups, cost-sensitive workflows |
|
|
170
|
+
|
|
171
|
+
### Nano Banana — `gemini-2.5-flash-image`
|
|
172
|
+
|
|
173
|
+
The only Gemini image model with a **free tier**. Best entry point for trying imgx-mcp without cost.
|
|
174
|
+
|
|
175
|
+
| Spec | Value |
|
|
176
|
+
|------|-------|
|
|
177
|
+
| Resolution | 1K (1024px) max |
|
|
178
|
+
| Aspect ratios | 7: `1:1`, `2:3`, `3:2`, `3:4`, `4:3`, `9:16`, `16:9` |
|
|
179
|
+
| Output format | PNG |
|
|
180
|
+
| Text rendering | Fair |
|
|
181
|
+
| Photorealism | Good |
|
|
182
|
+
| Free tier | **Yes** — 10 RPM / 500 RPD (no credit card required) |
|
|
183
|
+
| Paid tier | $0.039/image |
|
|
184
|
+
| Best for | Free usage, quick prototyping, learning the workflow |
|
|
185
|
+
| Limitations | No 2K or 4K output, no extended aspect ratios (`1:4`, `1:8`, `4:1`, `8:1`, `21:9`, etc.) |
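The three Gemini spec tables above can be condensed into a small lookup that flags unsupported combinations before a request is sent. This is only a sketch built from the values in those tables; `GEMINI_SPECS` and `checkGeminiRequest` are illustrative names, not part of imgx-mcp.

```typescript
// Aspect ratios copied from the spec tables above.
const BASE_RATIOS = ["1:1", "2:3", "3:2", "3:4", "4:3", "9:16", "16:9"];
const EXTENDED_RATIOS = BASE_RATIOS.concat(["1:4", "1:8", "4:1", "4:5", "5:4", "8:1", "21:9"]);

interface GeminiImageSpec {
  ratios: string[];
  resolutions: string[];
}

const GEMINI_SPECS: Record<string, GeminiImageSpec> = {
  "gemini-3-pro-image-preview": { ratios: EXTENDED_RATIOS, resolutions: ["1K", "2K", "4K"] },
  "gemini-3.1-flash-image-preview": { ratios: EXTENDED_RATIOS, resolutions: ["1K", "2K", "4K"] },
  "gemini-2.5-flash-image": { ratios: BASE_RATIOS, resolutions: ["1K"] },
};

// Returns human-readable problems; an empty list means the request fits the model.
function checkGeminiRequest(model: string, aspectRatio: string, resolution: string): string[] {
  const spec = GEMINI_SPECS[model];
  if (spec === undefined) {
    return [`unknown model: ${model}`];
  }
  const problems: string[] = [];
  if (spec.ratios.indexOf(aspectRatio) === -1) {
    problems.push(`${model} does not support aspect ratio ${aspectRatio}`);
  }
  if (spec.resolutions.indexOf(resolution) === -1) {
    problems.push(`${model} does not support resolution ${resolution}`);
  }
  return problems;
}

// The free model rejects both an extended ratio and 4K; Nano Banana 2 accepts them.
const freeTierProblems = checkGeminiRequest("gemini-2.5-flash-image", "21:9", "4K");
const paidTierProblems = checkGeminiRequest("gemini-3.1-flash-image-preview", "21:9", "4K");
```

A check like this is a convenient point to suggest an upgrade to a paid model instead of letting the request fail.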
|
|
186
|
+
|
|
187
|
+
### OpenAI models
|
|
188
|
+
|
|
189
|
+
Three models are available. All share the same API, the same parameters, and the same capabilities (multi-output, format selection).
|
|
190
|
+
|
|
191
|
+
| Spec | gpt-image-1 | gpt-image-1.5 | gpt-image-1-mini |
|
|
192
|
+
|------|-------------|----------------|------------------|
|
|
193
|
+
| Resolution | Auto | Auto | Auto |
|
|
194
|
+
| Aspect ratios | 7 | 7 | 7 |
|
|
195
|
+
| Output format | PNG, JPEG, WebP | PNG, JPEG, WebP | PNG, JPEG, WebP |
|
|
196
|
+
| Text rendering | Good | High (improved) | Fair |
|
|
197
|
+
| Speed | Standard | ~4x faster | Standard |
|
|
198
|
+
| Cost | $0.02-$0.19/image | ~20% cheaper than gpt-image-1 | $0.005-$0.036/image |
|
|
199
|
+
| Best for | General use | Fast iteration, text-heavy, editing precision | Budget, bulk generation |
|
|
200
|
+
|
|
201
|
+
### Model selection guide
|
|
202
|
+
|
|
203
|
+
| Situation | Recommended model |
|
|
204
|
+
|-----------|-------------------|
|
|
205
|
+
| **Default / no model specified** | **Nano Banana** (free, 500/day) |
|
|
206
|
+
| User wants better quality | Nano Banana Pro (`model="gemini-3-pro-image-preview"`) — paid |
|
|
207
|
+
| Fast iteration with 4K / extended ratios | Nano Banana 2 (`model="gemini-3.1-flash-image-preview"`) — paid |
|
|
208
|
+
| Text on images (logos, cards, mockups) | Nano Banana 2 (best text rendering) — paid |
|
|
209
|
+
| Ultra-wide / tall images (8:1, 1:8, 21:9) | Gemini 3.x models (14 aspect ratios) — paid |
|
|
210
|
+
| Need transparent PNG (icons, logos) | OpenAI (`background="transparent"`) — paid |
|
|
211
|
+
| Need JPEG/WebP output | OpenAI (`output_format="jpeg"`) — paid |
|
|
212
|
+
| Multiple variations at once | OpenAI (`count=3`) — paid |
|
|
213
|
+
| OpenAI fast + cheap | gpt-image-1.5 (`model="gpt-image-1.5"`) — 4x faster, 20% cheaper |
|
|
214
|
+
| OpenAI ultra-budget | gpt-image-1-mini (`model="gpt-image-1-mini"`) — from $0.005/image |
|
|
215
|
+
| OpenAI fast draft (low cost) | Any OpenAI model with `quality="low"` — fastest, cheapest |
|
|
216
|
+
| OpenAI maximum detail | Any OpenAI model with `quality="high"` — best quality, slower |
|
|
217
|
+
| Compare providers side-by-side | Generate with Gemini, then OpenAI |
|
|
218
|
+
| Budget-conscious bulk generation | Nano Banana 2 (lowest per-image cost among the paid Gemini 3.x models at 1K) |
|
|
219
|
+
|
|
220
|
+
**Upgrade path**: Nano Banana (free) → Nano Banana 2 (fast, affordable paid) → Nano Banana Pro (highest quality paid)
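The selection guide can be read as a short decision procedure. The sketch below encodes one possible ordering of the rows above; the `ImageNeeds` fields, the function name, and the priority order are assumptions made for illustration. For OpenAI-only features it returns just the provider, because the guide does not name a specific OpenAI model for those rows.

```typescript
interface ImageNeeds {
  transparentBackground?: boolean;         // OpenAI `background="transparent"`
  outputFormat?: "png" | "jpeg" | "webp";  // JPEG/WebP are OpenAI only
  count?: number;                          // several variations in one call
  extendedRatioOr4K?: boolean;             // e.g. 21:9, 8:1, or 4K output
  textHeavy?: boolean;                     // logos, cards, mockups
  highestQuality?: boolean;
}

interface ModelChoice {
  provider: "gemini" | "openai";
  model?: string; // omitted when the guide names only the provider
  paid: boolean;
}

function recommendModel(needs: ImageNeeds): ModelChoice {
  const needsOpenAiFeature =
    needs.transparentBackground === true ||
    (needs.outputFormat !== undefined && needs.outputFormat !== "png") ||
    (needs.count ?? 1) > 1;
  if (needsOpenAiFeature) {
    return { provider: "openai", paid: true };
  }
  if (needs.highestQuality === true) {
    return { provider: "gemini", model: "gemini-3-pro-image-preview", paid: true };
  }
  if (needs.textHeavy === true || needs.extendedRatioOr4K === true) {
    return { provider: "gemini", model: "gemini-3.1-flash-image-preview", paid: true };
  }
  // Default row of the guide: the free Nano Banana model.
  return { provider: "gemini", model: "gemini-2.5-flash-image", paid: false };
}

const draftChoice = recommendModel({});
const bannerChoice = recommendModel({ extendedRatioOr4K: true });
const stickerChoice = recommendModel({ transparentBackground: true });
```

Checking OpenAI-only features first reflects that no Gemini model in the tables above offers them; the rest of the ordering is a judgment call and can be rearranged.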
|
|
221
|
+
|
|
64
222
|
## MCP tools
|
|
65
223
|
|
|
66
224
|
Use these tools directly. No Bash needed.
|
|
@@ -72,11 +230,13 @@ Generate an image from a text prompt.
|
|
|
72
230
|
| Parameter | Required | Description |
|
|
73
231
|
|-----------|----------|-------------|
|
|
74
232
|
| `prompt` | Yes | Image description |
|
|
75
|
-
| `aspect_ratio` | No |
|
|
233
|
+
| `aspect_ratio` | No | See model specs above for supported ratios |
|
|
76
234
|
| `resolution` | No | `1K`, `2K`, `4K` (Gemini only) |
|
|
77
235
|
| `count` | No | Number of images (OpenAI only) |
|
|
78
236
|
| `output_format` | No | `png`, `jpeg`, `webp` (OpenAI only) |
|
|
79
|
-
| `
|
|
237
|
+
| `background` | No | `transparent`, `opaque`, `auto` (OpenAI only). Use `transparent` for transparent PNG/WebP |
|
|
238
|
+
| `quality` | No | `low`, `medium`, `high`, `auto` (OpenAI only). Overrides resolution-based mapping |
|
|
239
|
+
| `model` | No | Model name, or an alias from the mapping above |
|
|
80
240
|
| `provider` | No | `gemini` (default) or `openai` |
|
|
81
241
|
| `output` | No | Output file path |
|
|
82
242
|
| `output_dir` | No | Output directory |
|
|
@@ -92,7 +252,9 @@ Edit an existing image with text instructions. No mask needed — the model dete
|
|
|
92
252
|
| `aspect_ratio` | No | Output aspect ratio |
|
|
93
253
|
| `resolution` | No | Output resolution (Gemini only) |
|
|
94
254
|
| `output_format` | No | `png`, `jpeg`, `webp` (OpenAI only) |
|
|
95
|
-
| `
|
|
255
|
+
| `background` | No | `transparent`, `opaque`, `auto` (OpenAI only) |
|
|
256
|
+
| `quality` | No | `low`, `medium`, `high`, `auto` (OpenAI only) |
|
|
257
|
+
| `model` | No | Model name, or an alias from the mapping above |
|
|
96
258
|
| `provider` | No | `gemini` (default) or `openai` |
|
|
97
259
|
| `output` | No | Output file path |
|
|
98
260
|
| `output_dir` | No | Output directory |
|
|
@@ -107,7 +269,9 @@ Edit the last generated or edited image. No input path needed — automatically
|
|
|
107
269
|
| `aspect_ratio` | No | Output aspect ratio |
|
|
108
270
|
| `resolution` | No | Output resolution (Gemini only) |
|
|
109
271
|
| `output_format` | No | `png`, `jpeg`, `webp` (OpenAI only) |
|
|
110
|
-
| `
|
|
272
|
+
| `background` | No | `transparent`, `opaque`, `auto` (OpenAI only) |
|
|
273
|
+
| `quality` | No | `low`, `medium`, `high`, `auto` (OpenAI only) |
|
|
274
|
+
| `model` | No | Model name, or an alias from the mapping above |
|
|
111
275
|
| `provider` | No | `gemini` (default) or `openai` |
|
|
112
276
|
| `output` | No | Output file path |
|
|
113
277
|
| `output_dir` | No | Output directory |
|
|
@@ -165,10 +329,11 @@ Change the default output directory for generated images.
|
|
|
165
329
|
### Blog cover image
|
|
166
330
|
|
|
167
331
|
```
|
|
168
|
-
1. generate_image: prompt="A developer's desk with laptop showing terminal, coffee cup, warm morning light" aspect_ratio="16:9"
|
|
332
|
+
1. generate_image: prompt="A developer's desk with laptop showing terminal, coffee cup, warm morning light" aspect_ratio="16:9"
|
|
333
|
+
(uses free Nano Banana model by default)
|
|
169
334
|
2. Review the result with the user
|
|
170
335
|
3. edit_last: prompt="Make the color palette warmer" (if user wants changes)
|
|
171
|
-
4.
|
|
336
|
+
4. If user wants higher quality → re-generate with model="gemini-3-pro-image-preview" resolution="2K"
|
|
172
337
|
```
|
|
173
338
|
|
|
174
339
|
### Iterative refinement
|
|
@@ -176,7 +341,7 @@ Change the default output directory for generated images.
|
|
|
176
341
|
The `edit_last` tool is the key to conversational image editing. Each call takes the previous output as input:
|
|
177
342
|
|
|
178
343
|
```
|
|
179
|
-
generate_image
|
|
344
|
+
generate_image -> edit_last -> edit_last -> edit_last -> done
|
|
180
345
|
```
|
|
181
346
|
|
|
182
347
|
Tell the user what was generated, ask if they want changes, and use `edit_last` to apply them. This is the most natural workflow.
|
|
@@ -186,10 +351,12 @@ Tell the user what was generated, ask if they want changes, and use `edit_last`
|
|
|
186
351
|
Use `undo_edit` and `redo_edit` to navigate through edit history:
|
|
187
352
|
|
|
188
353
|
```
|
|
189
|
-
generate_image
|
|
354
|
+
generate_image -> edit_last -> edit_last -> undo_edit -> undo_edit -> redo_edit
|
|
190
355
|
```
|
|
191
356
|
|
|
192
|
-
|
|
357
|
+
After undo, calling `edit_last` branches from the current position — abandoned entries and their files are automatically deleted from disk.
|
|
358
|
+
|
|
359
|
+
Each generate starts a new session. Use `edit_history` to see all sessions, and `switch_session` to resume work on a previous image chain. `edit_last` uses the current position in the switched session.
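The undo, redo, and branching behavior described above can be modeled as a list of entries plus a cursor. This is a toy model for reasoning about the workflow, not the package's implementation; the class name and file names are hypothetical, and actual file deletion is replaced by returning the abandoned entries.

```typescript
// A toy model of one session's edit history; file names stand in for images.
class EditSession {
  private entries: string[] = [];
  private position = -1; // index of the current image, -1 when empty

  // Record a new image (generate_image or edit_last). Entries after the
  // current position are abandoned, mirroring branch-after-undo.
  push(file: string): string[] {
    const abandoned = this.entries.slice(this.position + 1);
    this.entries = this.entries.slice(0, this.position + 1);
    this.entries.push(file);
    this.position = this.entries.length - 1;
    return abandoned; // a real implementation would delete these files
  }

  // Step back one entry, stopping at the first one.
  undo(): string | undefined {
    if (this.position > 0) this.position -= 1;
    return this.current();
  }

  // Step forward one entry, stopping at the last one.
  redo(): string | undefined {
    if (this.position < this.entries.length - 1) this.position += 1;
    return this.current();
  }

  current(): string | undefined {
    return this.position >= 0 ? this.entries[this.position] : undefined;
  }
}

// generate_image -> edit_last -> edit_last -> undo_edit -> undo_edit -> redo_edit
const session = new EditSession();
session.push("base.png");
session.push("edit-a.png");
session.push("edit-b.png");
session.undo();
session.undo();
const afterRedo = session.redo();

// A further edit_last branches here and abandons "edit-b.png".
const abandonedFiles = session.push("edit-c.png");
```

In this model `redo` stays possible until a new edit is pushed, which is why abandoned entries are only discarded at branch time.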
|
|
193
360
|
|
|
194
361
|
### Comparing providers
|
|
195
362
|
|
|
@@ -210,19 +377,357 @@ Generate the same prompt with different providers to let the user choose:
|
|
|
210
377
|
2. For Gemini, generate multiple times with slight prompt variations
|
|
211
378
|
```
|
|
212
379
|
|
|
380
|
+
## Common use cases and techniques
|
|
381
|
+
|
|
382
|
+
When the user describes what they need, suggest appropriate parameters and approach based on context.
|
|
383
|
+
|
|
384
|
+
### Use case: OGP / social share images
|
|
385
|
+
|
|
386
|
+
- Aspect ratio: `16:9` (Twitter/X, Facebook). The OGP-standard `1.91:1` is not a supported ratio; `16:9` is the closest match
|
|
387
|
+
- Start with Nano Banana (free) for drafting. Upgrade to `2K` resolution with Nano Banana 2 or Pro for the final version
|
|
388
|
+
- For text on the image — suggest Nano Banana 2 (best text rendering, paid)
|
|
389
|
+
- Prompt tip: Describe the scene plus any text overlay you want rendered directly
|
|
390
|
+
|
|
391
|
+
### Use case: Blog / article cover
|
|
392
|
+
|
|
393
|
+
- Aspect ratio: `16:9` or `3:2`
|
|
394
|
+
- Resolution: `2K` (balances quality and file size)
|
|
395
|
+
- Prompt tip: Describe the main visual concept. Avoid metaphorical descriptions — be literal about what should appear
|
|
396
|
+
|
|
397
|
+
### Use case: Presentation slides
|
|
398
|
+
|
|
399
|
+
- Aspect ratio: `16:9`
|
|
400
|
+
- Resolution: `2K`
|
|
401
|
+
- Use a consistent visual theme across slides (describe the same color palette, style, and composition framing)
|
|
402
|
+
- Prompt tip: Include "slide design" or "presentation visual" for cleaner layout
|
|
403
|
+
|
|
404
|
+
### Use case: App store screenshots / product images
|
|
405
|
+
|
|
406
|
+
- Aspect ratio: `9:16` (portrait), `16:9` (landscape), `1:1` (square)
|
|
407
|
+
- Draft with Nano Banana (free), then regenerate at `4K` with Nano Banana 2 or Pro (paid) for retina displays
|
|
408
|
+
- Prompt tip: Describe the device frame and screen content you want shown
|
|
409
|
+
|
|
410
|
+
### Use case: Vertical content (Stories, Reels, Shorts)
|
|
411
|
+
|
|
412
|
+
- Aspect ratio: `9:16`
|
|
413
|
+
- Full-bleed imagery works best — describe edge-to-edge scenes
|
|
414
|
+
|
|
415
|
+
### Use case: Ultra-wide banner
|
|
416
|
+
|
|
417
|
+
- Aspect ratio: `21:9` or `8:1` — requires Gemini 3.x models (paid)
|
|
418
|
+
- Good for website hero banners, email headers, panoramic scenes
|
|
419
|
+
- Note: Nano Banana (free) does not support extended ratios. Suggest upgrade if user needs these
|
|
420
|
+
|
|
421
|
+
### Use case: Tall / narrow (Pinterest, infographic header)
|
|
422
|
+
|
|
423
|
+
- Aspect ratio: `1:4` or `1:8` — requires Gemini 3.x models (paid)
|
|
424
|
+
- Describe vertical flow — elements stacked top to bottom
|
|
425
|
+
|
|
426
|
+
### Use case: Icons, logos, stickers (transparent background)
|
|
427
|
+
|
|
428
|
+
- Use OpenAI with `background="transparent"` and `output_format="png"` (or `webp`)
|
|
429
|
+
- JPEG does not support transparency — use PNG or WebP
|
|
430
|
+
- Aspect ratio: `1:1` for icons
|
|
431
|
+
- Prompt tip: Describe only the subject. Do not describe the background — the API handles removal
|
|
432
|
+
|
|
433
|
+
### Use case: WordPress / web content
|
|
434
|
+
|
|
435
|
+
- Prefer `output_format="jpeg"` (OpenAI) for smaller file size
|
|
436
|
+
- Or generate with Gemini (PNG) and let the CMS handle conversion
|
|
437
|
+
- `2K` resolution is sufficient for web
|
|
438
|
+
|
|
439
|
+
## Popular editing techniques
|
|
440
|
+
|
|
441
|
+
When the user wants to modify an image, suggest these proven approaches with `edit_last`:
|
|
442
|
+
|
|
443
|
+
### Atmosphere and mood
|
|
444
|
+
|
|
445
|
+
| Technique | Prompt example |
|
|
446
|
+
|-----------|---------------|
|
|
447
|
+
| Warm up | "Make the color palette warmer, shift toward golden/amber tones" |
|
|
448
|
+
| Cool down | "Shift the color palette to cooler blue tones" |
|
|
449
|
+
| Dramatic lighting | "Add dramatic side lighting with deep shadows" |
|
|
450
|
+
| Golden hour | "Change the lighting to golden hour, warm sun low on the horizon" |
|
|
451
|
+
| Night / dark mode | "Convert to a nighttime scene with dark sky and artificial lighting" |
|
|
452
|
+
| Foggy / misty | "Add atmospheric fog in the background" |
|
|
453
|
+
|
|
454
|
+
### Composition adjustments
|
|
455
|
+
|
|
456
|
+
| Technique | Prompt example |
|
|
457
|
+
|-----------|---------------|
|
|
458
|
+
| Simplify background | "Replace the busy background with a clean, solid dark background" |
|
|
459
|
+
| Add depth of field | "Blur the background to create shallow depth of field, keep foreground sharp" |
|
|
460
|
+
| Add vignette | "Add a subtle vignette effect, darker edges" |
|
|
461
|
+
| Change perspective | "Change the viewpoint to a top-down bird's eye view" |
|
|
462
|
+
| Zoom in | "Crop tighter on the main subject, remove surrounding elements" |
|
|
463
|
+
|
|
464
|
+
### Element manipulation
|
|
465
|
+
|
|
466
|
+
| Technique | Prompt example |
|
|
467
|
+
|-----------|---------------|
|
|
468
|
+
| Add object | "Add a steaming coffee cup on the left side of the desk" |
|
|
469
|
+
| Remove object | "Remove the laptop from the scene" |
|
|
470
|
+
| Change color | "Change the shirt color from blue to red" |
|
|
471
|
+
| Add text | "Add the text 'HELLO WORLD' in bold white letters at the top" |
|
|
472
|
+
| Swap material | "Change the wooden table to marble" |
|
|
473
|
+
| Change season | "Change the scene from summer to autumn, add fall foliage" |
|
|
474
|
+
| Add weather | "Add rain falling and puddles on the ground" |
|
|
475
|
+
|
|
476
|
+
### Style transfer
|
|
477
|
+
|
|
478
|
+
| Technique | Prompt example |
|
|
479
|
+
|-----------|---------------|
|
|
480
|
+
| Illustration style | "Convert to a flat vector illustration style" |
|
|
481
|
+
| Watercolor | "Redraw as a watercolor painting with soft edges" |
|
|
482
|
+
| Pencil sketch | "Convert to a detailed pencil sketch" |
|
|
483
|
+
| Pixel art | "Redraw as pixel art in 16-bit style" |
|
|
484
|
+
| Anime / manga | "Redraw in anime art style" |
|
|
485
|
+
| Vintage photo | "Apply a vintage film photo look with grain and faded colors" |
|
|
486
|
+
|
|
487
|
+
### Practical refinement patterns
|
|
488
|
+
|
|
489
|
+
These multi-step sequences are common in real workflows:
|
|
490
|
+
|
|
491
|
+
**Quality escalation**: Start with Nano Banana (free) for drafting. When the concept is right, offer to re-generate with Nano Banana 2 (paid, fast, 4K) or Nano Banana Pro (paid, highest quality) for the final version.
|
|
492
|
+
|
|
493
|
+
**A/B comparison**: Generate the same prompt with `provider="gemini"` then `provider="openai"` and show both to the user.
|
|
494
|
+
|
|
495
|
+
**Iterative detail building**: Start broad ("a coffee shop interior"), then add details step by step ("add plants by the window", "put a barista behind the counter", "add warm overhead lighting").
|
|
496
|
+
|
|
497
|
+
**Style exploration**: Generate a base image, then apply different style transfers with `edit_last` to find the right mood. Use `undo_edit` to return to the base and try another style.
|
|
498
|
+
|
|
499
|
+
## Viral and trending image styles
|
|
500
|
+
|
|
501
|
+
Popular AI image styles that users may request. Use these prompt templates with `generate_image` or `edit_last`.
|
|
502
|
+
|
|
503
|
+
| Style | Prompt template | Notes |
|
|
504
|
+
|-------|----------------|-------|
|
|
505
|
+
| Ghibli / anime scene | "Redraw in Studio Ghibli anime style, soft watercolor textures, warm natural lighting, pastoral atmosphere" | Apply via `edit_last` to transform existing images |
|
|
506
|
+
| Action figure in box | "A realistic action figure of [subject] in a sealed toy box with clear plastic window, product packaging, brand logo area at top, accessories visible" | Works well with `1:1` or `3:4` aspect ratio |
|
|
507
|
+
| 3D clay figure | "A cute 3D clay figure of [subject], rounded smooth surfaces, soft pastel colors, miniature diorama, studio lighting" | The original "Nano Banana" viral style |
|
|
508
|
+
| "Hug your past self" | "A person in [current clothing] hugging a smaller version of themselves as a [child/teenager], warm emotional lighting, photo-realistic" | Emotional / personal branding content |
|
|
509
|
+
| Pet portrait (humanized) | "A [breed] dog/cat dressed in [outfit], sitting in a [setting], portrait style, dignified pose, realistic fur texture" | Popular for social media profiles |
|
|
510
|
+
| Chibi character | "A chibi-style character of [description], oversized head, small body, big expressive eyes, simple background, cute proportions" | Good for avatars and stickers |
|
|
511
|
+
| Pixel art retro | "16-bit pixel art of [subject], retro game aesthetic, limited color palette, clean pixel edges" | Nostalgic developer/gaming content |
|
|
512
|
+
|
|
513
|
+
When the user requests a trending style, use the appropriate template and adjust based on their subject. Combine with `background="transparent"` (OpenAI) for stickers.
|
|
514
|
+
|
|
515
|
+
## Specialized use case guides
|
|
516
|
+
|
|
517
|
+
### Icon set generation
|
|
518
|
+
|
|
519
|
+
Generate multiple icons with consistent style for an app or project:
|
|
520
|
+
|
|
521
|
+
```
|
|
522
|
+
1. Define the style: "Flat minimalist icon, 2px stroke, rounded corners, single accent color #FF6B35 on white"
|
|
523
|
+
2. generate_image: prompt="[style] of a home/house symbol" aspect_ratio="1:1"
|
|
524
|
+
3. generate_image: prompt="[style] of a settings gear symbol" aspect_ratio="1:1"
|
|
525
|
+
4. generate_image: prompt="[style] of a user profile symbol" aspect_ratio="1:1"
|
|
526
|
+
```
|
|
527
|
+
|
|
528
|
+
Key: Repeat the exact same style description in every prompt. This is more reliable than using `edit_last` for style consistency across separate icons.
|
|
529
|
+
|
|
530
|
+
For transparent icons: Use OpenAI with `background="transparent"` and describe only the icon subject.
|
|
531
|
+
|
|
532
|
+
### Seamless pattern
|
|
533
|
+
|
|
534
|
+
```
|
|
535
|
+
1. generate_image: prompt="Seamless tileable pattern of [elements], evenly distributed, no visible seam edges, [style]"
|
|
536
|
+
2. edit_last: prompt="Make the pattern more evenly distributed, ensure elements don't cluster at edges"
|
|
537
|
+
```
|
|
538
|
+
|
|
539
|
+
Tip: Include "seamless tileable pattern" and "no visible seam edges" in the prompt.
|
|
540
|
+
|
|
541
|
+
### Technical diagram / architecture
|
|
542
|
+
|
|
543
|
+
```
|
|
544
|
+
1. generate_image: prompt="Clean technical architecture diagram showing [components], labeled boxes connected by arrows, white background, minimal style, clear hierarchy"
|
|
545
|
+
2. edit_last: prompt="Add a label '[text]' to the top box"
|
|
546
|
+
```
|
|
547
|
+
|
|
548
|
+
For accurate text labels, use Nano Banana 2 (best text rendering) or OpenAI gpt-image-1.5.
|
|
549
|
+
|
|
550
|
+
### Story sequence (consistent characters)
|
|
551
|
+
|
|
552
|
+
Maintain visual consistency across a sequence of images:
|
|
553
|
+
|
|
554
|
+
```
|
|
555
|
+
1. Define a character DNA: "A woman with short dark hair, round glasses, wearing a navy blue cardigan and white t-shirt"
|
|
556
|
+
2. generate_image: prompt="[character DNA], sitting at a desk reading a book, warm indoor lighting"
|
|
557
|
+
3. generate_image: prompt="[character DNA], standing at a coffee shop counter ordering, morning light through windows"
|
|
558
|
+
4. generate_image: prompt="[character DNA], walking on a city street with a tote bag, afternoon sun"
|
|
559
|
+
```
|
|
560
|
+
|
|
561
|
+
Key: Copy the exact character description into every prompt. Add scene-specific context after the character DNA. Consistency improves when using the same model and provider across all images.
|
|
562
|
+
|
|
563
|
+
## Multi-image consistency techniques
|
|
564
|
+
|
|
565
|
+
When the user needs multiple images that look like they belong together (slide decks, social media series, brand assets):
|
|
566
|
+
|
|
567
|
+
### Design token approach
|
|
568
|
+
|
|
569
|
+
Define visual constants and reuse them across all prompts:
|
|
570
|
+
|
|
571
|
+
```
|
|
572
|
+
Color: "earth tones, warm browns (#8B6914) and sage green (#87A96B)"
|
|
573
|
+
Style: "flat illustration with subtle paper texture, 2D, no gradients"
|
|
574
|
+
Lighting: "soft diffused natural light, no harsh shadows"
|
|
575
|
+
Framing: "centered subject, 20% padding, clean background"
|
|
576
|
+
```
|
|
577
|
+
|
|
578
|
+
Prepend these tokens to every prompt: `"[tokens], [subject-specific content]"`
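As a sketch of the design token approach, the tokens can live in one array and be prepended mechanically so that no prompt drifts from the shared wording. The token strings come from the block above; the helper name and the two subjects are made up for illustration.

```typescript
// Visual constants from the design token block above, kept in one place.
const designTokens = [
  "earth tones, warm browns (#8B6914) and sage green (#87A96B)",
  "flat illustration with subtle paper texture, 2D, no gradients",
  "soft diffused natural light, no harsh shadows",
  "centered subject, 20% padding, clean background",
];

// Build "[tokens], [subject-specific content]" with identical token text every time.
function withTokens(tokens: string[], subject: string): string {
  return `${tokens.join(", ")}, ${subject}`;
}

// Hypothetical subjects for a three-image series.
const seriesSubjects = [
  "a team planning session around a whiteboard",
  "a single potted plant on a desk",
  "a paper airplane in mid-flight",
];

const seriesPrompts = seriesSubjects.map((subject) => withTokens(designTokens, subject));
```

Each entry of `seriesPrompts` can then be passed as the `prompt` of a separate `generate_image` call, using the same model and provider throughout.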
|
|
579
|
+
|
|
580
|
+
### Character DNA template
|
|
581
|
+
|
|
582
|
+
For recurring characters or mascots, write a fixed description block:
|
|
583
|
+
|
|
584
|
+
```
|
|
585
|
+
Character: "A friendly robot with a round head, single blue eye, matte silver body, short stubby arms, standing upright"
|
|
586
|
+
```
|
|
587
|
+
|
|
588
|
+
Never paraphrase — copy the exact same text each time.
|
|
589
|
+
|
|
590
|
+
### Style reference chain
|
|
591
|
+
|
|
592
|
+
Use one generated image as the style anchor:
|
|
593
|
+
|
|
594
|
+
```
|
|
595
|
+
1. generate_image: prompt="[detailed style + first scene]" → establish the look
|
|
596
|
+
2. For subsequent images: describe the same style explicitly + new scene content
|
|
597
|
+
3. If style drifts: undo_edit back, regenerate with more explicit style description
|
|
598
|
+
```
|
|
599
|
+
|
|
600
|
+
### Consistency tips
|
|
601
|
+
|
|
602
|
+
- **Same model, same provider** across all images in a set
|
|
603
|
+
- **Front-load the style description** before scene-specific content
|
|
604
|
+
- **Use exact phrases** — always write "soft watercolor", not "watercolor" in one prompt and "painted in watercolors" in another
|
|
605
|
+
- **Generate at the same resolution** — mixing resolutions changes perceived style
|
|
606
|
+
- **Review and regenerate** — if one image in a set drifts, regenerate it rather than trying to edit it to match
|
|
607
|
+
|
|
608
|
+
## Platform size guide
|
|
609
|
+
|
|
610
|
+
Recommended aspect ratios and resolutions for common platforms. When the user mentions a platform, suggest these settings automatically.
|
|
611
|
+
|
|
612
|
+
### Social media
|
|
613
|
+
|
|
614
|
+
| Platform | Use case | Aspect ratio | Resolution | Notes |
|
|
615
|
+
|----------|----------|-------------|------------|-------|
|
|
616
|
+
| Twitter/X | Post image | `16:9` | `2K` | 1200x675 recommended, larger is fine |
|
|
617
|
+
| Twitter/X | Profile header | `3:1` (use `21:9`) | `2K` | 1500x500 recommended |
|
|
618
|
+
| Facebook | Shared post | `16:9` | `2K` | |
|
|
619
|
+
| Facebook | Cover photo | `21:9` | `2K` | 820x312 recommended |
|
|
620
|
+
| Instagram | Feed post | `1:1` or `4:5` | `2K` | Square or portrait |
|
|
621
|
+
| Instagram | Story/Reel | `9:16` | `2K` | 1080x1920 |
|
|
622
|
+
| LinkedIn | Post image | `16:9` or `1:1` | `2K` | |
|
|
623
|
+
| YouTube | Thumbnail | `16:9` | `2K` | 1280x720 minimum |
|
|
624
|
+
|
|
625
|
+
### OGP (Open Graph Protocol)
|
|
626
|
+
|
|
627
|
+
| Platform | Recommended size | Aspect ratio | Notes |
|
|
628
|
+
|----------|-----------------|-------------|-------|
|
|
629
|
+
| Twitter/X Cards | 1200x630 | `~1.91:1` (use `16:9`) | Summary with large image |
|
|
630
|
+
| Facebook OGP | 1200x630 | `~1.91:1` (use `16:9`) | Same as Twitter |
|
|
631
|
+
| LinkedIn OGP | 1200x627 | `~1.91:1` (use `16:9`) | Same ratio |
|
|
632
|
+
| Slack unfurl | 1200x630 | `16:9` | Same as OGP standard |
|
|
633
|
+
|
|
634
|
+
For OGP images: Use `16:9` at `2K` resolution. This covers all major platforms.
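The "use `16:9`" advice can be checked numerically by finding the supported ratio closest to a platform's recommended pixel size. The sketch below compares ratios on a log scale so that wide and tall shapes are treated symmetrically; the function names are illustrative, and the ratio list is the 14-ratio set of the Gemini 3.x models.

```typescript
// The 14 aspect ratios supported by the Gemini 3.x models.
const SUPPORTED_RATIOS = [
  "1:1", "1:4", "1:8", "2:3", "3:2", "3:4", "4:1",
  "4:3", "4:5", "5:4", "8:1", "9:16", "16:9", "21:9",
];

// Convert "W:H" into a width/height number, e.g. "16:9" -> 1.777...
function ratioValue(ratio: string): number {
  const parts = ratio.split(":");
  return Number(parts[0]) / Number(parts[1]);
}

// Pick the candidate whose shape is closest to width:height.
// Distance is |ln(candidate) - ln(target)|, so 2:1 and 1:2 are equally far from 1:1.
function closestRatio(width: number, height: number, candidates: string[] = SUPPORTED_RATIOS): string {
  const target = Math.log(width / height);
  let best = "";
  let bestDistance = Infinity;
  for (const candidate of candidates) {
    const distance = Math.abs(Math.log(ratioValue(candidate)) - target);
    if (distance < bestDistance) {
      best = candidate;
      bestDistance = distance;
    }
  }
  return best;
}

const ogpRatio = closestRatio(1200, 630);    // OGP standard size
const headerRatio = closestRatio(1500, 500); // Twitter/X profile header
const storyRatio = closestRatio(1080, 1920); // Instagram Story/Reel
```

For the sizes listed in the tables above this yields `16:9` for 1200x630, `21:9` for 1500x500, and `9:16` for 1080x1920, matching the recommendations. Passing the 7-ratio list as `candidates` gives the answer for the free Nano Banana model.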
|
|
635
|
+
|
|
636
|
+
### App stores
|
|
637
|
+
|
|
638
|
+
| Platform | Use case | Aspect ratio | Resolution |
|
|
639
|
+
|----------|----------|-------------|------------|
|
|
640
|
+
| iOS App Store | Screenshot (iPhone) | `9:16` | `4K` (retina) |
|
|
641
|
+
| iOS App Store | Screenshot (iPad) | `3:4` | `4K` |
|
|
642
|
+
| Google Play | Screenshot | `9:16` | `4K` |
|
|
643
|
+
| App Store | Feature graphic | `16:9` | `2K` |
|
|
644
|
+
|
|
645
|
+
### Print and documents
|
|
646
|
+
|
|
647
|
+
| Use case | Aspect ratio | Resolution | Notes |
|
|
648
|
+
|----------|-------------|------------|-------|
|
|
649
|
+
| A4 document | `3:4` | `4K` | Portrait orientation |
|
|
650
|
+
| Letter | `4:5` | `4K` | US letter approximation |
|
|
651
|
+
| Presentation (16:9) | `16:9` | `2K`–`4K` | Standard widescreen |
|
|
652
|
+
| Business card | `16:9` or `3:2` | `2K` | Landscape orientation |
|
|
653
|
+
|
|
654
|
+
### Blog platforms
|
|
655
|
+
|
|
656
|
+
| Platform | Cover image | Aspect ratio | Notes |
|
|
657
|
+
|----------|-------------|-------------|-------|
|
|
658
|
+
| note.com | Header | `16:9` | PNG recommended |
|
|
659
|
+
| Dev.to | Cover | `16:9` | 1000x420 minimum |
|
|
660
|
+
| Medium | Header | `16:9` or `3:2` | |
|
|
661
|
+
| WordPress | Featured image | `16:9` | JPEG for file size |
|
|
662
|
+
| Qiita | OGP | `16:9` | Auto-generated if not set |
|
|
663
|
+
|
|
664
|
+
## Writing effective prompts
|
|
665
|
+
|
|
666
|
+
Structure prompts with three layers: **Subject → Context → Style**. Each layer adds specificity.
|
|
667
|
+
|
|
668
|
+
### Subject (what)
|
|
669
|
+
|
|
670
|
+
Name the main subject concretely. Avoid abstract descriptions.
|
|
671
|
+
|
|
672
|
+
| Weak | Strong |
|
|
673
|
+
|------|--------|
|
|
674
|
+
| "coffee scene" | "a ceramic pour-over dripper on a wooden table with a freshly brewed cup" |
|
|
675
|
+
| "developer working" | "a developer's hands on a laptop keyboard, terminal showing green text on dark background" |
|
|
676
|
+
| "nature" | "a single oak tree on a grass hill, autumn leaves half-fallen" |
|
|
677
|
+
|
|
678
|
+
### Context (where / when / with what)
|
|
679
|
+
|
|
680
|
+
Add environment, lighting, and surrounding elements.
|
|
681
|
+
|
|
682
|
+
| Element | Example |
|
|
683
|
+
|---------|---------|
|
|
684
|
+
| Lighting | "soft natural light from a left window", "harsh overhead fluorescent", "golden hour backlight" |
|
|
685
|
+
| Setting | "in a minimalist Scandinavian kitchen", "on a rainy Tokyo street at night" |
|
|
686
|
+
| Surrounding objects | "with a notebook and pen beside it", "next to a stack of books" |
|
|
687
|
+
| Time/season | "early morning", "winter snowfall outside the window" |
|
|
688
|
+
|
|
689
|
+
### Style (how it looks)
|
|
690
|
+
|
|
691
|
+
Specify the visual treatment.
|
|
692
|
+
|
|
693
|
+
| Element | Example |
|
|
694
|
+
|---------|---------|
|
|
695
|
+
| Photography style | "shallow depth of field, f/1.8", "wide-angle shot from below" |
|
|
696
|
+
| Art style | "flat vector illustration", "watercolor with soft edges", "detailed pencil sketch" |
|
|
697
|
+
| Color palette | "earth tones, warm browns and greens", "monochrome with single red accent" |
|
|
698
|
+
| Mood | "calm and contemplative", "energetic and vibrant" |
|
|
699
|
+
|
|
700
|
+
### Complete prompt example
|
|
701
|
+
|
|
702
|
+
```
|
|
703
|
+
Subject: A barista pouring steamed milk into a latte, creating a rosetta pattern
|
|
704
|
+
Context: At a wooden counter in a small coffee shop, warm pendant light overhead, coffee equipment in the background
|
|
705
|
+
Style: Close-up shot, shallow depth of field, warm earth tones, natural lighting
|
|
706
|
+
```
|
|
707
|
+
|
|
708
|
+
→ `"A barista pouring steamed milk into a latte creating a rosetta pattern, at a wooden counter in a small coffee shop, warm pendant light overhead, coffee equipment in background, close-up shot, shallow depth of field, warm earth tones, natural lighting"`
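The Subject, Context, Style layering can be applied mechanically. The following sketch joins the three layers in that order so the subject always comes first; `PromptLayers` and `composePrompt` are illustrative names, and the light normalization (trimming, lowercasing the first letter of later layers) is a choice made here rather than something the source prescribes.

```typescript
interface PromptLayers {
  subject: string;
  context?: string;
  style?: string;
}

// Join the layers as "subject, context, style", skipping empty layers.
function composePrompt(layers: PromptLayers): string {
  const kept: string[] = [layers.subject.trim()];
  for (const layer of [layers.context, layers.style]) {
    if (layer === undefined) continue;
    const text = layer.trim();
    if (text === "") continue;
    // Later layers continue the sentence, so start them in lowercase.
    kept.push(text.charAt(0).toLowerCase() + text.slice(1));
  }
  return kept.join(", ");
}

const latteLayers: PromptLayers = {
  subject: "A barista pouring steamed milk into a latte, creating a rosetta pattern",
  context: "At a wooden counter in a small coffee shop, warm pendant light overhead, coffee equipment in the background",
  style: "Close-up shot, shallow depth of field, warm earth tones, natural lighting",
};

const lattePrompt = composePrompt(latteLayers);
```

Keeping the layers separate until the last step makes it easy to reuse one style layer across several subjects.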
|
|
709
|
+
|
|
710
|
+
### Prompt tips
|
|
711
|
+
|
|
712
|
+
- **Be literal, not metaphorical** — "a bridge connecting two cliffs" not "bridging the gap between ideas"
|
|
713
|
+
- **Front-load the subject** — Image models tend to weight the beginning of the prompt more heavily
|
|
714
|
+
- **Specify what you don't want** sparingly — "no text" or "no people" can help, but negative prompts are less reliable than positive descriptions
|
|
715
|
+
- **For text in images** — Put the exact text in quotes: `"with the text 'HELLO WORLD' in bold white sans-serif at the top center"`
|
|
716
|
+
- **For editing** — Describe only the change, not the entire image. "Make the sky sunset orange" not "A scene with everything the same but the sky is now sunset orange"
|
|
717
|
+
|
|
213
718
|
## Tips
|
|
214
719
|
|
|
215
720
|
- **Be specific in prompts**: "A wooden table with a ceramic pour-over dripper, steam rising, soft natural light from left" works better than "coffee scene"
|
|
216
721
|
- **Use edit_last for iteration**: Don't ask the user to specify file paths. Just use `edit_last` after any generation or edit
|
|
217
722
|
- **Check provider capabilities**: Use `list_providers` if unsure what a provider supports
|
|
218
|
-
- **Where `.imgx/` is created**: The `.imgx/` directory holds both edit history (`output-history.json`) and default image output. When a project root is detected, it's created at `<project-root>/.imgx/`. Without a project root, images go to `~/Pictures/imgx/` and history to `~/.config/imgx/`. All clients sharing the same project root share the same history
|
|
723
|
+
- **Where `.imgx/` is created**: The `.imgx/` directory holds both edit history (`output-history.json`) and default image output. When a project root is detected, it's created at `<project-root>/.imgx/`. Without a project root, images go to `~/Pictures/imgx/` and history to `~/.config/imgx/`. All clients sharing the same project root share the same history. See the **Project root** setup section above for configuration methods
|
|
219
724
|
- **Default output**: Images save to `<project-root>/.imgx/<session-id>/` (project auto-detected). Falls back to `~/Pictures/imgx/` when no project is detected. Use `output` or `output_dir` to customize
|
|
220
725
|
- **Custom output_dir and history**: When `output_dir` is specified on `generate_image`, the path is recorded as session metadata in `output-history.json`. `edit_last` reads this to inherit the output location. Only image files go to the custom path — history always stays in `.imgx/` (or global config directory)
|
|
221
726
|
- **Inline preview**: MCP responses include base64 image data for inline display in supported clients
|
|
222
727
|
- **Undo/redo**: Use `undo_edit` and `redo_edit` to step through edit history. Each session holds up to 10 entries
|
|
223
728
|
- **Sessions**: Each `generate_image` starts a new session. Use `edit_history` to see all sessions and `switch_session` to resume a previous one
|
|
224
|
-
- **Sequential naming**: When `output` specifies a filename, `edit_last` appends sequential numbers: `cover.png`
|
|
225
|
-
- **Project scope**: History is stored per-project in `<project-root>/.imgx/output-history.json`. `clear_history` only affects the current project
|
|
729
|
+
- **Sequential naming**: When `output` specifies a filename, `edit_last` appends sequential numbers: `cover.png` -> `cover-1.png` -> `cover-2.png`. When `edit_last` branches after an undo, the discarded files are deleted automatically
|
|
730
|
+
- **Project scope**: History is stored per-project in `<project-root>/.imgx/output-history.json`. `clear_history` only affects the current project. Relative paths in `output` and `output_dir` are resolved against the project root
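The sequential naming rule from the tips above (`cover.png` -> `cover-1.png` -> `cover-2.png`) can be sketched as a small helper. This is an illustration of the pattern for simple file names only; `sequentialName` is a hypothetical function, and how imgx-mcp handles unusual paths is not covered by the source.

```typescript
// Insert "-<step>" before the extension; step 0 is the original name.
function sequentialName(fileName: string, step: number): string {
  if (step === 0) return fileName;
  const dot = fileName.lastIndexOf(".");
  if (dot <= 0) {
    // No extension (or a leading-dot name): append the suffix at the end.
    return `${fileName}-${step}`;
  }
  return `${fileName.slice(0, dot)}-${step}${fileName.slice(dot)}`;
}

// Names for the original image and two follow-up edits.
const coverNames = [0, 1, 2].map((step) => sequentialName("cover.png", step));
```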
|
|
226
731
|
|
|
227
732
|
## CLI fallback
|
|
228
733
|
|
|
@@ -5,18 +5,20 @@
|
|
|
5
5
|
| Item | Value |
|
|
6
6
|
|------|-------|
|
|
7
7
|
| Provider name | `gemini` |
|
|
8
|
-
| Default model | `gemini-
|
|
9
|
-
|
|
|
8
|
+
| Default model | `gemini-2.5-flash-image` (Nano Banana — **free tier**) |
|
|
9
|
+
| Paid models | `gemini-3-pro-image-preview` (Nano Banana Pro), `gemini-3.1-flash-image-preview` (Nano Banana 2) |
|
|
10
10
|
| API key env var | `GEMINI_API_KEY` |
|
|
11
11
|
|
|
12
12
|
### Model comparison
|
|
13
13
|
|
|
14
|
-
| Feature | gemini-3-pro-image-preview | gemini-2.5-flash-image |
|
|
15
|
-
|
|
16
|
-
| Quality |
|
|
17
|
-
| Speed | Slower | Faster |
|
|
18
|
-
| Cost | ~$0.134/image |
|
|
19
|
-
| Resolution | 1K, 2K, 4K | 1K, 2K |
|
|
14
|
+
| Feature | Nano Banana Pro (`gemini-3-pro-image-preview`) | Nano Banana 2 (`gemini-3.1-flash-image-preview`) | Nano Banana (`gemini-2.5-flash-image`) |
|
|
15
|
+
|---------|------------------------------------------------|--------------------------------------------------|----------------------------------------|
|
|
16
|
+
| Quality | Highest | Good (improved text rendering, ~90% accuracy) | Good |
|
|
17
|
+
| Speed | Slower | Faster | Fast |
|
|
18
|
+
| Cost | ~$0.134/image | $0.045-$0.151/image (resolution dependent) | $0.039/image |
|
|
19
|
+
| Resolution | 1K, 2K, 4K | 1K, 2K, 4K | 1K (1024px max) |
|
|
20
|
+
| Aspect ratios | 14 | 14 | 7 |
|
|
21
|
+
| Free tier | No | No | **Yes** (10 RPM / 500 RPD) |
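The comparison table can be read as a small lookup problem: given a required resolution and whether a free tier is needed, which Gemini model qualifies? The sketch below encodes only the table's resolution, aspect-ratio, and free-tier rows. The function name `pickGeminiModel` and the preference order (free model first, then Nano Banana 2, then Pro) are assumptions for illustration, not behavior documented for imgx-mcp.

```typescript
type GeminiResolution = "1K" | "2K" | "4K";

interface GeminiModelInfo {
  id: string;
  resolutions: GeminiResolution[];
  aspectRatioCount: number;
  freeTier: boolean;
}

// Values copied from the model comparison table; array order is an assumed
// preference (cheapest first), not something the table specifies.
const GEMINI_MODELS: GeminiModelInfo[] = [
  { id: "gemini-2.5-flash-image", resolutions: ["1K"], aspectRatioCount: 7, freeTier: true },
  { id: "gemini-3.1-flash-image-preview", resolutions: ["1K", "2K", "4K"], aspectRatioCount: 14, freeTier: false },
  { id: "gemini-3-pro-image-preview", resolutions: ["1K", "2K", "4K"], aspectRatioCount: 14, freeTier: false },
];

// Hypothetical selector: returns the first model that supports the resolution
// and, if requested, offers a free tier. Returns undefined when nothing fits.
function pickGeminiModel(resolution: GeminiResolution, requireFreeTier: boolean): string | undefined {
  const match = GEMINI_MODELS.find(
    (m) => m.resolutions.includes(resolution) && (!requireFreeTier || m.freeTier),
  );
  return match?.id;
}

console.log(pickGeminiModel("1K", true));
console.log(pickGeminiModel("4K", false));
console.log(pickGeminiModel("4K", true));
```

The last call has no match because, per the table, only the 1K-limited model has a free tier.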
|
|
20
22
|
|
|
21
23
|
### Capabilities
|
|
22
24
|
|
|
@@ -24,8 +26,8 @@
|
|
|
24
26
|
|------------|---------------|-------------|
|
|
25
27
|
| TEXT_TO_IMAGE | (default) | Generate from text |
|
|
26
28
|
| IMAGE_EDITING | `input` | Edit with text instructions |
|
|
27
|
-
| ASPECT_RATIO | `aspect_ratio` |
|
|
28
|
-
| RESOLUTION_CONTROL | `resolution` | `1K`, `2K`, `4K` |
|
|
29
|
+
| ASPECT_RATIO | `aspect_ratio` | 3.x models: 14 ratios (`1:1`, `1:4`, `1:8`, `2:3`, `3:2`, `3:4`, `4:1`, `4:3`, `4:5`, `5:4`, `8:1`, `9:16`, `16:9`, `21:9`). 2.5 Flash: 7 ratios (`1:1`, `2:3`, `3:2`, `3:4`, `4:3`, `9:16`, `16:9`) |
|
|
30
|
+
| RESOLUTION_CONTROL | `resolution` | 3.x models: `1K`, `2K`, `4K`. 2.5 Flash: `1K` only |
|
|
29
31
|
| REFERENCE_IMAGES | — | Use reference images (future) |
|
|
30
32
|
| PERSON_CONTROL | — | Control person generation (future) |
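Because the supported aspect ratios now depend on the model family, a caller may want to check a request before sending it. The following is a minimal sketch using the ratio lists from the ASPECT_RATIO row; the function name `isAspectRatioSupported` is hypothetical, and treating every model other than `gemini-2.5-flash-image` as a 3.x model is a simplifying assumption.

```typescript
// Ratio lists copied from the ASPECT_RATIO row of the capabilities table.
const RATIOS_3X: string[] = [
  "1:1", "1:4", "1:8", "2:3", "3:2", "3:4", "4:1",
  "4:3", "4:5", "5:4", "8:1", "9:16", "16:9", "21:9",
];
const RATIOS_25_FLASH: string[] = ["1:1", "2:3", "3:2", "3:4", "4:3", "9:16", "16:9"];

// Hypothetical check (not imgx-mcp's API): does this model accept the ratio?
// Assumption: any model id other than the 2.5 Flash one is a 3.x model.
function isAspectRatioSupported(model: string, ratio: string): boolean {
  const allowed = model === "gemini-2.5-flash-image" ? RATIOS_25_FLASH : RATIOS_3X;
  return allowed.includes(ratio);
}

console.log(isAspectRatioSupported("gemini-2.5-flash-image", "21:9"));
console.log(isAspectRatioSupported("gemini-3-pro-image-preview", "21:9"));
```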
|
|
31
33
|
|
|
@@ -35,6 +37,7 @@
|
|
|
35
37
|
|------|-------|
|
|
36
38
|
| Provider name | `openai` |
|
|
37
39
|
| Default model | `gpt-image-1` |
|
|
40
|
+
| Additional models | `gpt-image-1.5` (~4x faster and 20% cheaper than `gpt-image-1`), `gpt-image-1-mini` (budget, $0.005–$0.036/image) |
|
|
38
41
|
| API key env var | `OPENAI_API_KEY` |
|
|
39
42
|
|
|
40
43
|
### Capabilities
|
|
@@ -46,15 +49,19 @@
|
|
|
46
49
|
| ASPECT_RATIO | `aspect_ratio` | 7 ratios: `1:1`, `2:3`, `3:2`, `3:4`, `4:3`, `9:16`, `16:9` |
|
|
47
50
|
| MULTIPLE_OUTPUTS | `count` | Generate up to 4 images per request |
|
|
48
51
|
| OUTPUT_FORMAT | `output_format` | PNG, JPEG, WebP |
|
|
52
|
+
| BACKGROUND | `background` | `transparent`, `opaque`, `auto`. Transparent PNG/WebP for icons, logos, stickers |
|
|
53
|
+
| QUALITY | `quality` | `low`, `medium`, `high`, `auto`. Direct quality control (overrides resolution mapping) |
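The new `background` and `quality` parameters interact with the existing `output_format` and `count` options. As a rough sketch, a pre-flight check might look like the code below. The type and function names are hypothetical, the lowercase format strings are an assumption (the table lists them as PNG, JPEG, WebP), and the only rules encoded are the ones the table implies: transparency is for PNG/WebP, and `count` goes up to 4.

```typescript
type OpenAIBackground = "transparent" | "opaque" | "auto";
type OpenAIQuality = "low" | "medium" | "high" | "auto";
// Assumption: formats are passed as lowercase strings.
type OpenAIOutputFormat = "png" | "jpeg" | "webp";

interface OpenAIImageOptions {
  background?: OpenAIBackground;
  quality?: OpenAIQuality;
  output_format?: OpenAIOutputFormat;
  count?: number;
}

// Hypothetical validator (not imgx-mcp's API): returns human-readable problems,
// or an empty array when the combination looks consistent with the table.
function checkOpenAIOptions(opts: OpenAIImageOptions): string[] {
  const problems: string[] = [];
  if (opts.background === "transparent" && opts.output_format === "jpeg") {
    problems.push("transparent background needs png or webp output");
  }
  if (opts.count !== undefined && (!Number.isInteger(opts.count) || opts.count < 1 || opts.count > 4)) {
    problems.push("count must be an integer from 1 to 4");
  }
  return problems;
}

console.log(checkOpenAIOptions({ background: "transparent", output_format: "jpeg" }));
console.log(checkOpenAIOptions({ background: "transparent", output_format: "png", quality: "high", count: 4 }));
```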
|
|
49
54
|
|
|
50
55
|
### Provider comparison
|
|
51
56
|
|
|
52
57
|
| Feature | Gemini | OpenAI |
|
|
53
58
|
|---------|--------|--------|
|
|
54
59
|
| Edit (text-only, no mask) | Yes | Yes |
|
|
55
|
-
| Resolution control | Yes (1K/2K/4K) | No |
|
|
60
|
+
| Resolution control | Yes (3.x: 1K/2K/4K, 2.5: 1K only) | No |
|
|
61
|
+
| Aspect ratios | 3.x: 14, 2.5: 7 | 7 |
|
|
56
62
|
| Multiple outputs | No | Yes (up to 4) |
|
|
57
63
|
| Output format selection | No (PNG only) | Yes (PNG/JPEG/WebP) |
|
|
64
|
+
| Background transparency | No | Yes (`transparent`/`opaque`/`auto`) |
|
|
58
65
|
| Iterative editing (`edit_last`) | Yes | Yes |
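The provider comparison can also be expressed as data, which makes it easy to ask which provider covers a given set of requirements. This is an illustrative sketch only: the feature keys and the `providersSupporting` helper are invented names, and the boolean values simply restate the table (Gemini's resolution control is marked true even though 2.5 Flash is limited to 1K).

```typescript
interface ProviderFeatures {
  resolutionControl: boolean;
  multipleOutputs: boolean;
  outputFormatSelection: boolean;
  backgroundTransparency: boolean;
}

type ProviderName = "gemini" | "openai";

// Values restate the provider comparison table.
const PROVIDER_FEATURES: Record<ProviderName, ProviderFeatures> = {
  gemini: { resolutionControl: true, multipleOutputs: false, outputFormatSelection: false, backgroundTransparency: false },
  openai: { resolutionControl: false, multipleOutputs: true, outputFormatSelection: true, backgroundTransparency: true },
};

// Hypothetical helper: list the providers that support every required feature.
function providersSupporting(required: (keyof ProviderFeatures)[]): ProviderName[] {
  const names = Object.keys(PROVIDER_FEATURES) as ProviderName[];
  return names.filter((name) => required.every((feature) => PROVIDER_FEATURES[name][feature]));
}

console.log(providersSupporting(["backgroundTransparency"]));
console.log(providersSupporting(["resolutionControl", "multipleOutputs"]));
```

The second call returns an empty list, since no single provider in the table offers both resolution control and multiple outputs.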
|
|
59
66
|
|
|
60
67
|
## Adding new providers
|