pixel-surgeon-mcp 1.1.0 → 1.1.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +48 -28
- package/dist/index.js +4 -4
- package/package.json +1 -1
package/README.md
CHANGED
@@ -6,13 +6,14 @@
 
 <p align="center">
 <strong>MCP server for AI image & video generation, editing, and transplant-grade region repair</strong><br/>
-Powered by Gemini 3.1 Flash Image, OpenAI GPT Image 2, and Veo 3
+Powered by Gemini 3.1 Flash Image, OpenAI GPT Image 2, Grok Imagine, and Veo 3
 </p>
 
 <p align="center">
 <img src="https://img.shields.io/badge/MCP-stdio-blue" alt="MCP stdio" />
 <img src="https://img.shields.io/badge/Gemini_3.1-Flash_Image-4285F4?logo=google" alt="Gemini" />
 <img src="https://img.shields.io/badge/GPT_Image_2-OpenAI-412991?logo=openai&logoColor=white" alt="OpenAI" />
+<img src="https://img.shields.io/badge/Grok_Imagine-xAI-000000?logo=x&logoColor=white" alt="Grok" />
 <img src="https://img.shields.io/badge/Veo_3-Video-34A853?logo=google" alt="Veo 3" />
 <img src="https://img.shields.io/badge/TypeScript-5.9-3178C6?logo=typescript&logoColor=white" alt="TypeScript" />
 </p>
@@ -23,15 +24,19 @@ An [MCP](https://modelcontextprotocol.io) server that gives Claude (or any MCP c
 
 ## How it works
 
-pixel-surgeon-mcp is a **multi-provider** image generation server. You can use
+pixel-surgeon-mcp is a **multi-provider** image generation server. You can use any combination of providers and switch between them per-request:
 
-### Gemini (Google)
+### Gemini (Google) — balanced
 
-Google's image generation pipeline uses a two-stage approach: **Gemini 3.1 Pro** reasons about your prompt, then **Gemini 3.1 Flash Image** renders the pixels. Supports 9 aspect ratios at 512/1K/2K/4K resolution.
+Google's image generation pipeline uses a two-stage approach: **Gemini 3.1 Pro** reasons about your prompt, then **Gemini 3.1 Flash Image** renders the pixels. Supports 9 aspect ratios at 512/1K/2K/4K resolution. Best price/performance ratio, with a free tier available.
 
-### OpenAI GPT Image 2
+### OpenAI GPT Image 2 — highest quality
 
-OpenAI's latest image model with dramatically improved text rendering and visual fidelity. Supports flexible resolutions — pixel-surgeon maps your chosen size and aspect ratio to the optimal pixel dimensions automatically. Quality levels: `medium` (fast) and `high` (print-ready). **Excellent for infographics, diagrams, and text-heavy images** where
+OpenAI's latest image model with dramatically improved text rendering and visual fidelity. Supports flexible resolutions — pixel-surgeon maps your chosen size and aspect ratio to the optimal pixel dimensions automatically. Quality levels: `medium` (fast) and `high` (print-ready). **Excellent for infographics, diagrams, and text-heavy images** where other models struggle. Slower and more expensive.
+
+### Grok Imagine (xAI) — fastest
+
+xAI's Aurora-powered image model. Fastest generation speed and lowest cost. Supports 7 aspect ratios at fixed resolutions (~1K). Good for rapid prototyping and iteration.
 
 ### Veo 3 (Video)
 
@@ -64,6 +69,7 @@ AI image models struggle with text-heavy images. The fix tools solve this by sen
 | `gemini-2.5-flash-image` | Google | 1K max (free tier) | Quick drafts, prototyping |
 | `gpt-image-2` | OpenAI | Flexible (up to 4K) | Text-heavy images, infographics, diagrams, typography |
 | `gpt-image-1` | OpenAI | 3 fixed sizes | Legacy support |
+| `grok-imagine` | xAI | Fixed (~1K per ratio) | Fast iteration, lowest cost |
 
 Force a specific model per-call via the `model` tool parameter, or set `DEFAULT_IMAGE_MODEL` env var.
 
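The README hunk above gives a precedence rule for model selection. A per-call `model` parameter wins, and the `DEFAULT_IMAGE_MODEL` environment variable applies otherwise. The function below is a minimal sketch of that rule and is not the package's code. The name `resolveModel` and the `FALLBACK_MODEL` value are hypothetical. The model IDs are copied from the table rows visible in this diff, which may not be the full table.

```typescript
// Model IDs taken from the README table rows shown in this diff;
// the package's full table may list more models.
const KNOWN_MODELS = [
  "gemini-2.5-flash-image",
  "gpt-image-2",
  "gpt-image-1",
  "grok-imagine",
] as const;

type ModelId = (typeof KNOWN_MODELS)[number];

// Hypothetical fallback used when neither the call nor the environment names a model.
const FALLBACK_MODEL: ModelId = "gemini-2.5-flash-image";

// Type guard: narrows an arbitrary string to a known model ID.
function isKnownModel(value: string): value is ModelId {
  return (KNOWN_MODELS as readonly string[]).indexOf(value) !== -1;
}

// Precedence: per-call `model` argument, then DEFAULT_IMAGE_MODEL, then the fallback.
// Blank values count as unset. Unknown IDs throw instead of being silently ignored.
function resolveModel(
  perCallModel: string | undefined,
  env: Record<string, string | undefined>,
): ModelId {
  const fromCall = perCallModel?.trim() || undefined;
  const fromEnv = env["DEFAULT_IMAGE_MODEL"]?.trim() || undefined;
  const candidate = fromCall ?? fromEnv;
  if (candidate === undefined) return FALLBACK_MODEL;
  if (!isKnownModel(candidate)) {
    throw new Error(`Unknown image model: ${candidate}`);
  }
  return candidate;
}
```

In Node you would pass `process.env` as the second argument. Accepting the environment as a parameter keeps the function pure and easy to test.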
@@ -80,10 +86,10 @@ Magazine editorial, bold typography, halftone textures. Cream, black, and terrac
 
 <img src="assets/style-neo-brutalist.png" alt="neo-brutalist style example" width="400" />
 
-### `
-1960s Space Age meets 1980s arcade. Cathode blue, amber, and salmon palette.
+### `duval-software-infographic`
+Duval Software's signature retro-futurist infographic style. 1960s Space Age meets 1980s arcade. Cathode blue, amber, and salmon palette. Great for diagrams and system overviews.
 
-<img src="assets/style-neo-retro-futurism.png" alt="
+<img src="assets/style-neo-retro-futurism.png" alt="duval-software-infographic style example" width="400" />
 
 ### `fractal-arcade`
 Dithered fractals, Sierpinski patterns, low-poly. CRT retro, Amiga/EGA palette.
@@ -99,7 +105,7 @@ Technical diagrams, system flows, data pipelines. Dark navy, cyan, and electric
 
 ### Get your API key(s)
 
-You need at least one provider API key. You can use
+You need at least one provider API key. You can use any combination for maximum flexibility.
 
 #### Google (Gemini + Veo 3)
 
@@ -118,45 +124,59 @@ You need at least one provider API key. You can use both for maximum flexibility
 
 > GPT Image 2 excels at text rendering, infographics, and diagrams. If you primarily need text-heavy images, this is the provider to use.
 
-
+#### xAI (Grok Imagine)
+
+1. Go to [xAI Console](https://console.x.ai/)
+2. Sign in or create an account
+3. Create an API key and copy it
+
+> Grok Imagine is the fastest and cheapest provider. Great for rapid iteration and prototyping. Fixed output resolutions (~1K) with no size control.
 
-
+### Quick start (npx)
 
-
+No install needed — run directly with npx. Pass whichever API keys you have:
 
 ```bash
-
-cd pixel-surgeon-mcp
-npm install
-npm run build
+npx pixel-surgeon-mcp
 ```
 
-
+#### Claude Code CLI
 
-
+```bash
+claude mcp add pixel-surgeon \
+-e GOOGLE_API_KEY=your-google-key \
+-e OPENAI_API_KEY=your-openai-key \
+-e XAI_API_KEY=your-xai-key \
+-- npx pixel-surgeon-mcp
+```
+
+#### Claude Desktop / MCP client config
 
 ```json
 {
 "mcpServers": {
 "pixel-surgeon": {
-"command": "
-"args": ["
+"command": "npx",
+"args": ["pixel-surgeon-mcp"],
 "env": {
 "GOOGLE_API_KEY": "your-google-api-key",
-"OPENAI_API_KEY": "your-openai-api-key"
+"OPENAI_API_KEY": "your-openai-api-key",
+"XAI_API_KEY": "your-xai-api-key"
 }
 }
 }
 }
 ```
 
-
+### Install from source
+
+If you prefer a local clone:
 
 ```bash
-
-
-
-
+git clone https://github.com/j-east/pixel-surgeon-mcp.git
+cd pixel-surgeon-mcp
+npm install
+npm run build
 ```
 
 ### Image output
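The install hunk above requires at least one provider key and uses three environment variables: `GOOGLE_API_KEY`, `OPENAI_API_KEY`, and `XAI_API_KEY`. A startup check along the following lines could report which providers are usable. It is a sketch under assumptions: the `detectProviders` and `requireAtLeastOneProvider` helpers and the provider labels are hypothetical, and the package may validate keys differently or not at all.

```typescript
// Environment variable names as they appear in the README config examples.
const PROVIDER_ENV_VARS = {
  google: "GOOGLE_API_KEY", // Gemini image models and Veo 3
  openai: "OPENAI_API_KEY", // GPT Image 2 / GPT Image 1
  xai: "XAI_API_KEY", // Grok Imagine
} as const;

type Provider = keyof typeof PROVIDER_ENV_VARS;

const ALL_PROVIDERS = Object.keys(PROVIDER_ENV_VARS) as Provider[];

// Hypothetical helper: returns providers whose key is set and not blank,
// in declaration order (google, openai, xai).
function detectProviders(env: Record<string, string | undefined>): Provider[] {
  const available: Provider[] = [];
  for (const provider of ALL_PROVIDERS) {
    const value = env[PROVIDER_ENV_VARS[provider]];
    if (value !== undefined && value.trim() !== "") available.push(provider);
  }
  return available;
}

// Fails fast when no provider key is configured, naming the variables to set.
function requireAtLeastOneProvider(env: Record<string, string | undefined>): Provider[] {
  const available = detectProviders(env);
  if (available.length === 0) {
    const names = ALL_PROVIDERS.map((p) => PROVIDER_ENV_VARS[p]).join(", ");
    throw new Error(`No provider API key found; set at least one of: ${names}`);
  }
  return available;
}
```

Checking for blank values as well as missing ones catches a common mistake: a config entry copied with an empty string where the key should be.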
@@ -192,7 +212,7 @@ Add entries to the `STYLE_PRESETS` object in `src/index.ts`. Your PR should incl
 
 ### Model adapters
 
-The server currently supports Gemini, OpenAI, and Veo 3. We'd love adapters for other image/video generation APIs — Stable Diffusion, Flux, etc. If you're interested in adding one, open an issue first so we can align on the interface.
+The server currently supports Gemini, OpenAI, Grok Imagine, and Veo 3. We'd love adapters for other image/video generation APIs — Stable Diffusion, Flux, etc. If you're interested in adding one, open an issue first so we can align on the interface.
 
 ## Built by Duval Software
 
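The "Model adapters" paragraph asks contributors to agree on an interface before adding a provider such as Stable Diffusion or Flux. To make that discussion concrete, here is one hypothetical shape: an adapter contract plus a registry keyed by model ID. None of these names (`ImageModelAdapter`, `ImageRequest`, `ImageResult`, `AdapterRegistry`) come from the package. This is a sketch for illustration, not its actual interface.

```typescript
// Hypothetical request/response shapes; not the package's real types.
interface ImageRequest {
  prompt: string;
  aspectRatio: string; // e.g. "4:5" or "16:9", as used by the style presets
}

interface ImageResult {
  mimeType: string;
  data: Uint8Array;
}

// Hypothetical contract a new provider adapter might implement.
interface ImageModelAdapter {
  readonly modelId: string; // value a caller would pass as `model`
  readonly envVar: string; // API key variable the adapter needs
  generate(request: ImageRequest): Promise<ImageResult>;
}

// Registry keyed by model ID, so a per-call model choice can be dispatched.
class AdapterRegistry {
  private readonly adapters = new Map<string, ImageModelAdapter>();

  // Rejects duplicate IDs so two adapters cannot silently shadow each other.
  register(adapter: ImageModelAdapter): void {
    if (this.adapters.has(adapter.modelId)) {
      throw new Error(`Adapter already registered: ${adapter.modelId}`);
    }
    this.adapters.set(adapter.modelId, adapter);
  }

  get(modelId: string): ImageModelAdapter {
    const adapter = this.adapters.get(modelId);
    if (adapter === undefined) throw new Error(`No adapter for model: ${modelId}`);
    return adapter;
  }

  list(): string[] {
    return Array.from(this.adapters.keys());
  }
}
```

A registry keeps per-request provider switching a simple lookup. Under this design, adding a provider means registering one more object rather than editing a dispatch switch.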
package/dist/index.js
CHANGED
@@ -1689,8 +1689,8 @@ const STYLE_PRESETS = {
 promptPrefix: "Neo-brutalist minimalist design. Magazine editorial style layout. Off-white / cream background with bold black typography in a heavy-weight grotesque sans-serif font, slightly overlapping and breaking the grid. Accent color: muted burnt orange or terracotta used sparingly as stripe or block elements. Raw, unpolished aesthetic — visible grid lines, asymmetric layout, oversized type that bleeds off edges. Subtle halftone texture overlay. Monospaced subtext in lowercase. No gradients, no glossy effects, no heavy saturation. Clean but edgy, restrained but bold.",
 defaultAspectRatio: "4:5",
 },
-"
-description: "
+"retro-futuristic-arcade": {
+description: "Retro-futurist infographic style. 1960s Space Age optimism meets 1980s arcade aesthetics. Cathode blue, warm amber, salmon red, warm green palette. CRT scanlines, atomic-age geometry, pixel-grid accents. Great for diagrams, system overviews, and technical illustrations.",
 promptPrefix: "Neo-retro-futurism style. Blend of 1960s Space Age futurism and 1980s video game aesthetics with a modern neo-retro sensibility. Color palette: deep cathode-ray blue (#1a3a5c to #4a9eff glowing CRT blue), warm amber (#d4a017 to #ffcc44), salmon red (#e8735a to #ff6b6b), and warm muted greens (#5a8a5c to #8bbd7b). Dark background evoking a CRT monitor with subtle scanline texture and faint phosphor glow. Typography: mix of retrofuturist geometric sans-serif (like Eurostile, Microgramma, or Bank Gothic) with pixel-grid or bitmap-style secondary text. Design elements: atomic-age starbursts, orbital ellipses, rounded-rectangle pods, jet-age swooshes, and subtle 8-bit pixel patterns along borders or dividers. Faint CRT curvature vignette at edges. Thin vector grid lines receding to a vanishing point. Icons and illustrations should feel like arcade cabinet art meets Googie architecture meets NASA mission patches. Warm analog glow on all light sources — no harsh pure whites, everything filtered through amber or blue phosphor. The overall mood is optimistic, adventurous, and slightly nostalgic — a future that never was, rendered through a cathode ray tube.",
 defaultAspectRatio: "4:5",
 },
@@ -1699,8 +1699,8 @@ const STYLE_PRESETS = {
 promptPrefix: "Geometric dithered illustration style. All shading done through dithering patterns, halftone dots, and geometric cross-hatch grids — NO smooth gradients anywhere. Every surface rendered with visible pixel-level dithering like a 16-color EGA/VGA palette pushed through ordered Bayer matrix dithering. Fractal geometric patterns in the background — Sierpinski triangles, hexagonal tessellations, recursive diamond grids. Color palette: deep cathode-ray blue (#1a3a5c to #4a9eff), warm amber (#d4a017 to #ffcc44), salmon red (#e8735a), warm muted greens (#5a8a5c). Subjects built from clean geometric shapes — triangular facets, polygonal planes, like a low-poly render but flat and 2D with dithered color fills instead of smooth shading. Think: Saul Bass designed a character select screen for an Amiga game. Geometric line-art icons. Chunky retrofuturist typeface for headers, smaller geometric caps for subtitles. Horizontal scanline overlay. No photorealism, no soft shadows, no AI-gradient smoothness. Every color transition is a hard dither pattern. Clean, precise, geometric, but retro-cool.",
 defaultAspectRatio: "4:5",
 },
-"
-description: "
+"duval-software-infographic": {
+description: "Duval Software's clean technical infographic for architecture diagrams, system flows, and data pipelines. Dark navy background, cyan/electric blue glowing connection lines, geometric nodes, professional and precise.",
 promptPrefix: "Clean, professional technical infographic on a dark navy (#0a1628) background with subtle grid lines. Use cyan (#00d4ff) and electric blue (#4a9eff) glowing connection lines between components. White and light gray text only — no bright colors for text. Components rendered as clean geometric shapes: rounded rectangles, hexagons, circles with thin borders and subtle inner glow. Icons are minimal line-art style (server racks, phones, browsers, databases, cloud services). Typography: modern sans-serif (like Inter or SF Pro) — bold for titles, regular weight for labels, monospace for technical details (ports, protocols, versions). Layout follows clear left-to-right or top-to-bottom data flow with labeled arrows showing protocols and data formats. No decorative illustrations, no clip art, no logos, no random embellishments. Include a thin tech stack bar at the bottom. The overall feel is a polished engineering diagram you'd present to a CTO — precise, minimal, and authoritative.",
 defaultAspectRatio: "16:9",
 },
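Both `dist/index.js` hunks edit entries of the `STYLE_PRESETS` object. Each entry has `description`, `promptPrefix`, and `defaultAspectRatio` fields. The sketch below types that shape in TypeScript and shows one plausible way a preset could be applied to a request. The `applyStyle` helper, the rule of joining the prefix to the user prompt with a blank line, and the override behaviour for aspect ratio are all assumptions; the diff does not show how the server consumes a preset. The prefix and description strings are abbreviated.

```typescript
// Field names match the STYLE_PRESETS entries visible in dist/index.js.
interface StylePreset {
  description: string;
  promptPrefix: string;
  defaultAspectRatio: string;
}

// Two keys and ratios from the diff; text fields abbreviated for the example.
const STYLE_PRESETS: Record<string, StylePreset> = {
  "retro-futuristic-arcade": {
    description: "Retro-futurist infographic style.",
    promptPrefix: "Neo-retro-futurism style.",
    defaultAspectRatio: "4:5",
  },
  "duval-software-infographic": {
    description: "Duval Software's clean technical infographic.",
    promptPrefix: "Clean, professional technical infographic on a dark navy (#0a1628) background.",
    defaultAspectRatio: "16:9",
  },
};

// Hypothetical helper: prepends the preset's prefix to the user's prompt and
// uses the preset's default aspect ratio unless the caller supplies one.
function applyStyle(
  styleName: string,
  userPrompt: string,
  aspectRatio?: string,
): { prompt: string; aspectRatio: string } {
  const preset: StylePreset | undefined = STYLE_PRESETS[styleName];
  if (preset === undefined) {
    throw new Error(`Unknown style preset: ${styleName}`);
  }
  return {
    prompt: `${preset.promptPrefix}\n\n${userPrompt}`,
    aspectRatio: aspectRatio ?? preset.defaultAspectRatio,
  };
}
```

For example, `applyStyle("duval-software-infographic", "Show the MCP request flow")` yields a 16:9 request whose prompt starts with the dark-navy prefix. Passing `"1:1"` as the third argument overrides the ratio while keeping the style.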