pixel-surgeon-mcp 1.0.0

package/LICENSE ADDED
MIT License

Copyright (c) 2026 John Evans

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
package/README.md ADDED
<p align="center">
<img src="assets/architecture.png" alt="pixel-surgeon-mcp architecture" width="800" />
</p>

<h1 align="center">pixel-surgeon-mcp</h1>

<p align="center">
<strong>MCP server for AI image &amp; video generation, editing, and transplant-grade region repair</strong><br/>
Powered by Gemini 3.1 Flash Image, OpenAI GPT Image 2, and Veo 3
</p>

<p align="center">
<img src="https://img.shields.io/badge/MCP-stdio-blue" alt="MCP stdio" />
<img src="https://img.shields.io/badge/Gemini_3.1-Flash_Image-4285F4?logo=google" alt="Gemini" />
<img src="https://img.shields.io/badge/GPT_Image_2-OpenAI-412991?logo=openai&logoColor=white" alt="OpenAI" />
<img src="https://img.shields.io/badge/Veo_3-Video-34A853?logo=google" alt="Veo 3" />
<img src="https://img.shields.io/badge/TypeScript-5.9-3178C6?logo=typescript&logoColor=white" alt="TypeScript" />
</p>

---

An [MCP](https://modelcontextprotocol.io) server that gives Claude (or any MCP client) the ability to generate images, edit them, fix garbled text, and create videos, all through natural language.

## How it works

pixel-surgeon-mcp is a **multi-provider** image generation server. You can configure either provider or both, and switch between them on a per-request basis:

### Gemini (Google)

Google's image generation pipeline uses a two-stage approach: **Gemini 3.1 Pro** reasons about your prompt, then **Gemini 3.1 Flash Image** renders the pixels. It supports 9 aspect ratios at 512, 1K, 2K, or 4K resolution.

### OpenAI GPT Image 2

OpenAI's latest image model, with dramatically improved text rendering and visual fidelity. It accepts flexible resolutions, and pixel-surgeon automatically maps your chosen size and aspect ratio to suitable pixel dimensions. Quality levels are `medium` (fast) and `high` (print-ready). **Excellent for infographics, diagrams, and text-heavy images**, where Gemini models struggle.

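One way to picture the size mapping is the sketch below. It rests on assumptions the README does not state: the size tier fixes the long edge (`1K` = 1024 px, `2K` = 2048 px, `4K` = 3840 px), both edges are rounded to multiples of 16, and the aspect ratio arrives as a `"W:H"` string. The function name `toPixelDimensions` is hypothetical; check OpenAI's current image API documentation for the real size constraints.

```typescript
// Hypothetical sketch: map a size tier and an aspect ratio to pixel dimensions.
// The long-edge table and the multiple-of-16 rule are assumptions for
// illustration, not documented pixel-surgeon-mcp or OpenAI behavior.
type SizeTier = "1K" | "2K" | "4K";

const LONG_EDGE: Record<SizeTier, number> = { "1K": 1024, "2K": 2048, "4K": 3840 };
const STEP = 16; // assumed: both edges must be multiples of 16

function roundToStep(value: number): number {
  return Math.max(STEP, Math.round(value / STEP) * STEP);
}

function toPixelDimensions(tier: SizeTier, aspectRatio: string): { width: number; height: number } {
  const parts = aspectRatio.split(":").map(Number);
  const [w = NaN, h = NaN] = parts;
  if (parts.length !== 2 || !(w > 0) || !(h > 0)) {
    throw new Error(`Invalid aspect ratio: ${aspectRatio}`);
  }
  const longEdge = LONG_EDGE[tier];
  // The longer side gets the tier's long edge; the shorter side keeps the ratio.
  if (w >= h) {
    return { width: longEdge, height: roundToStep((longEdge * h) / w) };
  }
  return { width: roundToStep((longEdge * w) / h), height: longEdge };
}
```

Under these assumptions, `toPixelDimensions("1K", "16:9")` returns `{ width: 1024, height: 576 }`.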
### Veo 3 (Video)

For video, the server calls **Veo 3** asynchronously and polls until the result is ready. Veo 3 generates both the video and an ambient audio track. It supports 16:9 and 9:16 aspect ratios at 5s or 8s duration.

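Async polling of this kind usually reduces to a loop that checks a job's status on an interval until it completes, fails, or times out. The sketch below is a generic version of that loop, not the server's actual code: `checkStatus` is a hypothetical callback standing in for whatever call reads the Veo job's state, and the default interval and timeout are arbitrary.

```typescript
// Generic poll-until-done loop. `checkStatus` is a hypothetical stand-in for
// the real Veo status call; the default interval and timeout are arbitrary.
type JobStatus<T> =
  | { state: "running" }
  | { state: "done"; result: T }
  | { state: "failed"; error: string };

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(() => resolve(), ms));
}

async function pollUntilDone<T>(
  checkStatus: () => Promise<JobStatus<T>>,
  intervalMs = 10_000,
  timeoutMs = 6 * 60_000,
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const status = await checkStatus();
    if (status.state === "done") return status.result;
    if (status.state === "failed") throw new Error(`Video job failed: ${status.error}`);
    // Give up if the next check would land past the deadline.
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Video job still running after ${timeoutMs} ms`);
    }
    await sleep(intervalMs);
  }
}
```

Taking the interval and deadline as parameters makes it easy to shrink them in tests, for example with a fake `checkStatus` that reports `running` a few times before `done`.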
### Region repair

AI image models struggle to render legible text in text-heavy images. The fix tools (`fix_image`, `fix_region`, and `interactive_fix`) work around this by sending smaller regions of the image to the provider, then stitching the repaired regions back into the original with histogram-matched compositing so they blend in seamlessly.

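Histogram matching remaps each channel of a repaired region so that its distribution of values follows a reference, such as the original pixels around the region. The sketch below is the textbook CDF-matching version for 8-bit channels, written without image libraries. It illustrates the technique only; the flat `Uint8Array`-per-channel layout is an assumption, and pixel-surgeon's own compositing code may work differently.

```typescript
// Textbook CDF-based histogram matching for one 8-bit channel; run it once
// per R, G and B channel. Illustrative only, not pixel-surgeon's own code.
function cumulativeHistogram(channel: Uint8Array): Float64Array {
  const counts = new Float64Array(256);
  for (let i = 0; i < channel.length; i++) counts[channel[i]] += 1;
  const cdf = new Float64Array(256);
  let running = 0;
  for (let level = 0; level < 256; level++) {
    running += counts[level];
    cdf[level] = running / channel.length; // fraction of pixels at or below this level
  }
  return cdf;
}

function matchChannel(source: Uint8Array, reference: Uint8Array): Uint8Array {
  const srcCdf = cumulativeHistogram(source);
  const refCdf = cumulativeHistogram(reference);
  // Lookup table: each source level maps to the lowest reference level
  // whose cumulative fraction reaches the source level's fraction.
  const lut = new Uint8Array(256);
  let r = 0;
  for (let s = 0; s < 256; s++) {
    while (r < 255 && refCdf[r] < srcCdf[s]) r++;
    lut[s] = r;
  }
  return source.map((v) => lut[v]);
}
```

For an RGB region, split it into three channel arrays, call `matchChannel` on each against the matching channel of the reference pixels, and interleave the results before compositing.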
## Tools

| Tool | Description |
|------|-------------|
| `generate_image` | Text-to-image generation (single image) |
| `generate_images` | Parallel batch generation (1-8 images) |
| `generate_video` | Text-to-video via Veo 3 with audio (5s or 8s) |
| `edit_image` | Edit an existing image with natural language instructions |
| `fix_image` | Grid-based tile repair for garbled text (2x2, 3x3, etc.) |
| `fix_region` | Targeted region repair with automatic aspect ratio snapping |
| `interactive_fix` | Browser-based crop UI with multi-shot selection |
| `list_images` | List generated images and videos |
| `save_image` | Import an external image into the workspace |
| `remove_background` | Remove image background (alpha channel transparency) |

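`fix_region` snaps a requested crop to a supported aspect ratio, and the Key implementation details section adds that the crop's center point is preserved. A minimal sketch of that idea follows. The `SUPPORTED_RATIOS` list is an assumption (the README only says Gemini supports 9 aspect ratios), and keeping the crop's area roughly constant, shrinking oversized crops, and shifting a crop back inside the image are design choices of this sketch rather than documented behavior.

```typescript
// Hypothetical aspect-ratio snapping for a crop rectangle.
// SUPPORTED_RATIOS is an assumed list, not taken from pixel-surgeon-mcp.
interface Rect { x: number; y: number; width: number; height: number; }

const SUPPORTED_RATIOS: Array<[number, number]> = [
  [1, 1], [2, 3], [3, 2], [3, 4], [4, 3], [4, 5], [5, 4], [9, 16], [16, 9],
];

function nearestRatio(width: number, height: number): number {
  const target = Math.log(width / height);
  let best = 1;
  let bestDist = Infinity;
  for (const [w, h] of SUPPORTED_RATIOS) {
    const dist = Math.abs(Math.log(w / h) - target); // compare ratios on a log scale
    if (dist < bestDist) { bestDist = dist; best = w / h; }
  }
  return best;
}

function snapCrop(crop: Rect, imageWidth: number, imageHeight: number): Rect {
  const ratio = nearestRatio(crop.width, crop.height);
  const area = crop.width * crop.height;
  // Keep roughly the same area at the snapped ratio.
  let width = Math.sqrt(area * ratio);
  let height = Math.sqrt(area / ratio);
  // Shrink uniformly if the snapped crop would be larger than the image.
  const scale = Math.min(1, imageWidth / width, imageHeight / height);
  width = Math.round(width * scale);
  height = Math.round(height * scale);
  // Center on the original crop's center, then shift inside the image only if needed.
  const cx = crop.x + crop.width / 2;
  const cy = crop.y + crop.height / 2;
  const x = Math.min(Math.max(0, Math.round(cx - width / 2)), imageWidth - width);
  const y = Math.min(Math.max(0, Math.round(cy - height / 2)), imageHeight - height);
  return { x, y, width, height };
}
```

Comparing ratios on a log scale treats 2:1 and 1:2 as equally far from 1:1, which a plain difference of `width / height` values would not.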
## Models

| Model | Provider | Resolution | Best for |
|-------|----------|-----------|----------|
| `gemini-3.1-flash-image` | Google | 512 / 1K / 2K / 4K | General image generation, photo-realistic scenes |
| `gemini-2.5-flash-image` | Google | 1K max (free tier) | Quick drafts, prototyping |
| `gpt-image-2` | OpenAI | Flexible (up to 4K) | Text-heavy images, infographics, diagrams, typography |
| `gpt-image-1` | OpenAI | 3 fixed sizes | Legacy support |

Force a specific model on each call with the `model` tool parameter, or set a default with the `DEFAULT_IMAGE_MODEL` environment variable.

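The precedence described above (per-call `model` parameter first, then `DEFAULT_IMAGE_MODEL`) can be sketched as a small resolver. The final fallback to `gemini-3.1-flash-image` and the rule that routes `gpt-image-*` models to OpenAI and everything else to Google are assumptions for illustration; the README does not say what happens when neither value is set, and `resolveModel` is a hypothetical name.

```typescript
// Hypothetical model resolver. The last-resort default and the
// provider-routing rule are assumptions, not documented behavior.
type Provider = "google" | "openai";

const KNOWN_MODELS = [
  "gemini-3.1-flash-image",
  "gemini-2.5-flash-image",
  "gpt-image-2",
  "gpt-image-1",
] as const;
type ModelId = (typeof KNOWN_MODELS)[number];

const ASSUMED_DEFAULT: ModelId = "gemini-3.1-flash-image";

function isModelId(value: string): value is ModelId {
  return (KNOWN_MODELS as readonly string[]).includes(value);
}

function resolveModel(
  requested: string | undefined,
  env: Record<string, string | undefined>,
): { model: ModelId; provider: Provider } {
  // Per-call parameter wins, then the environment default, then the assumed default.
  const candidate = requested ?? env["DEFAULT_IMAGE_MODEL"] ?? ASSUMED_DEFAULT;
  if (!isModelId(candidate)) {
    throw new Error(`Unknown image model: ${candidate}`);
  }
  const provider: Provider = candidate.startsWith("gpt-image") ? "openai" : "google";
  return { model: candidate, provider };
}
```

In a Node server, the `env` argument would typically be `process.env`.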
### Gemini automatic fallback

If a Gemini generation call fails with a billing or prepayment error, the server automatically retries on the free-tier **`gemini-2.5-flash-image`** model. The viewer shows a yellow banner when this happens. Free-tier limits: 1K maximum resolution, 10 requests per minute (RPM), and 500 requests per day (RPD).

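A retry-on-billing-error fallback like the one described above can be written as a wrapper around the generation call. In this sketch, `generate` is a hypothetical callback, and `isBillingError` (matching on the error message) is an assumption; the real server may inspect a status code or error type instead. The returned `usedFallback` flag is the kind of signal a viewer could use to decide whether to show its banner.

```typescript
// Hypothetical fallback wrapper; the billing-error heuristic is an assumption.
type Generate = (model: string, prompt: string) => Promise<Uint8Array>;

const FALLBACK_MODEL = "gemini-2.5-flash-image";

function isBillingError(err: unknown): boolean {
  const message = err instanceof Error ? err.message : String(err);
  return /billing|prepa(y|id)/i.test(message);
}

async function generateWithFallback(
  generate: Generate,
  model: string,
  prompt: string,
): Promise<{ image: Uint8Array; model: string; usedFallback: boolean }> {
  try {
    return { image: await generate(model, prompt), model, usedFallback: false };
  } catch (err) {
    // Only billing/prepay failures trigger the free-tier retry; rethrow anything else.
    if (!isBillingError(err) || model === FALLBACK_MODEL) throw err;
    const image = await generate(FALLBACK_MODEL, prompt);
    return { image, model: FALLBACK_MODEL, usedFallback: true };
  }
}
```

A fuller version would also clamp the requested resolution to the free tier's 1K maximum before retrying.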
## Style presets

All generation and edit tools support an optional `style` parameter:

### `neo-brutalist`
Magazine editorial, bold typography, halftone textures. Cream, black, and terracotta palette.

<img src="assets/style-neo-brutalist.png" alt="neo-brutalist style example" width="400" />

### `neo-retro-futurism`
1960s Space Age meets 1980s arcade. Cathode blue, amber, and salmon palette.

<img src="assets/style-neo-retro-futurism.png" alt="neo-retro-futurism style example" width="400" />

### `fractal-arcade`
Dithered fractals, Sierpinski patterns, low-poly. CRT retro, Amiga/EGA palette.

<img src="assets/style-fractal-arcade.png" alt="fractal-arcade style example" width="400" />

### `clean-tech-infographic`
Technical diagrams, system flows, data pipelines. Dark navy, cyan, and electric blue.

<img src="assets/style-clean-tech-infographic.png" alt="clean-tech-infographic style example" width="600" />

## Setup

### Get your API key(s)

You need at least one provider API key. Configure both for maximum flexibility.

#### Google (Gemini + Veo 3)

1. Go to [Google AI Studio](https://aistudio.google.com/apikey)
2. Sign in with your Google account
3. Click **Create API Key** and copy it

> **Prepayment required.** Gemini 3.1 Flash Image and Veo 3 require billing and prepaid credits. The free-tier fallback (`gemini-2.5-flash-image`) has a lower maximum resolution and stricter rate limits. See [Google AI pricing](https://ai.google.dev/pricing).

#### OpenAI (GPT Image 2)

1. Go to [OpenAI API](https://platform.openai.com/api-keys)
2. Sign in or create an account
3. Click **Create new secret key** and copy it
4. Make sure your account has API credits; image generation is billed per request

> GPT Image 2 excels at text rendering, infographics, and diagrams. If you mainly need text-heavy images, this is the provider to use.

### Prerequisites

- Node.js 18+

### Install

```bash
git clone https://github.com/j-east/pixel-surgeon-mcp.git
cd pixel-surgeon-mcp
npm install
npm run build
```

### Configure your MCP client

Add the server to your Claude Code or Claude Desktop config, including whichever API keys you have:

```json
{
  "mcpServers": {
    "pixel-surgeon": {
      "command": "node",
      "args": ["/path/to/pixel-surgeon-mcp/dist/index.js"],
      "env": {
        "GOOGLE_API_KEY": "your-google-api-key",
        "OPENAI_API_KEY": "your-openai-api-key"
      }
    }
  }
}
```

Or via the Claude Code CLI:

```bash
claude mcp add pixel-surgeon \
  -e GOOGLE_API_KEY=your-google-key \
  -e OPENAI_API_KEY=your-openai-key \
  -- node /path/to/pixel-surgeon-mcp/dist/index.js
```

### Image output

Generated images are saved to `~/Pictures/pixel-surgeon/`. A local browser viewer auto-launches on first use for full-resolution previews with model selection, respin controls, and search.

## Development

```bash
npm run dev    # tsx watch mode
npm run build  # compile TypeScript
npm run start  # run compiled server
```

## Key implementation details

- **Aspect ratio snapping**: crops are adjusted to the nearest Gemini-supported aspect ratio while preserving the crop's center point
- **Histogram matching**: per-channel RGB normalization makes composited regions blend seamlessly with the surrounding image
- **Human-in-the-loop**: `interactive_fix` opens a browser crop UI, blocks on a Promise until the user submits, fires parallel Gemini calls, and lets the user pick the best result
- **MCP size limits**: full-resolution images are saved to disk; downsampled versions (under 950 KB) are returned in MCP responses

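The human-in-the-loop bullet above describes blocking a tool call on a Promise until the browser UI submits. A common way to do that is to store the Promise's `resolve` function in a map keyed by a request ID and call it when the submission arrives. The sketch below shows that pattern with hypothetical names (`waitForSelection`, `submitSelection`, `CropSelection`); the timeout is a design choice of the sketch, and the HTTP endpoint, the parallel Gemini calls, and the result picker are omitted.

```typescript
// Hypothetical sketch: park a tool call on a Promise until the browser UI responds.
interface CropSelection { x: number; y: number; width: number; height: number; }

const pending = new Map<string, (selection: CropSelection) => void>();

function waitForSelection(requestId: string, timeoutMs = 10 * 60_000): Promise<CropSelection> {
  return new Promise((resolve, reject) => {
    // Reject if the user never submits, so the tool call does not hang forever.
    const timer = setTimeout(() => {
      pending.delete(requestId);
      reject(new Error(`No selection received for ${requestId}`));
    }, timeoutMs);
    pending.set(requestId, (selection) => {
      clearTimeout(timer);
      pending.delete(requestId);
      resolve(selection);
    });
  });
}

// Called by the (omitted) HTTP handler when the user submits the crop UI.
function submitSelection(requestId: string, selection: CropSelection): boolean {
  const resolveSelection = pending.get(requestId);
  if (!resolveSelection) return false; // unknown or expired request
  resolveSelection(selection);
  return true;
}
```

Because each resolver removes itself from the map, a duplicate or late submission for the same request ID is ignored rather than resolving twice.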
181
+ ## Contributing
182
+
183
+ PRs are welcome! We're especially looking for:
184
+
185
+ ### New style presets
186
+
187
+ Add entries to the `STYLE_PRESETS` object in `src/index.ts`. Your PR should include:
188
+
189
+ - The preset definition (name, prompt prefix, default aspect ratio)
190
+ - 2-3 example images generated with the preset (drop them in your PR description)
191
+ - A short description of the visual style for the README table
192
+
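A preset entry carries the three pieces of information listed above: a name, a prompt prefix, and a default aspect ratio. The shape below is a guess for illustration only. The real `STYLE_PRESETS` object in `src/index.ts` may use different field names, and the `example-style` preset, its prompt text, and the `applyStyle` helper are hypothetical placeholders.

```typescript
// Hypothetical preset shape; field names are assumptions, not the real
// STYLE_PRESETS definition in src/index.ts.
interface StylePreset {
  name: string;
  promptPrefix: string;
  defaultAspectRatio: string;
}

const STYLE_PRESETS: Record<string, StylePreset> = {
  // Placeholder entry for illustration only.
  "example-style": {
    name: "Example Style",
    promptPrefix: "Flat vector illustration, limited pastel palette, soft grain texture.",
    defaultAspectRatio: "1:1",
  },
};

// Prepend the preset's prefix, and use its aspect ratio unless the caller set one.
function applyStyle(
  prompt: string,
  style: string | undefined,
  aspectRatio: string | undefined,
): { prompt: string; aspectRatio: string | undefined } {
  if (style === undefined) return { prompt, aspectRatio };
  const preset = STYLE_PRESETS[style];
  if (!preset) throw new Error(`Unknown style preset: ${style}`);
  return {
    prompt: `${preset.promptPrefix} ${prompt}`,
    aspectRatio: aspectRatio ?? preset.defaultAspectRatio,
  };
}
```

Letting an explicit `aspectRatio` override the preset's default keeps presets from silently fighting a caller's layout choice.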
### Model adapters

The server currently supports Gemini, OpenAI, and Veo 3. We'd love adapters for other image and video generation APIs, such as Stable Diffusion or Flux. If you're interested in adding one, open an issue first so we can align on the interface.

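As a possible starting point for that discussion, here is one shape a provider adapter could take. Everything in it is hypothetical (`ImageProvider`, `GenerateRequest`, the in-memory registry, and the stub adapter); it only illustrates how per-request model selection could dispatch to different backends, and the interface the maintainers settle on may look quite different.

```typescript
// Hypothetical adapter interface, for discussion only.
interface GenerateRequest {
  prompt: string;
  aspectRatio: string;
}

interface ImageProvider {
  id: string;
  supports(model: string): boolean;
  generate(model: string, request: GenerateRequest): Promise<Uint8Array>;
}

const providers: ImageProvider[] = [];

function registerProvider(provider: ImageProvider): void {
  providers.push(provider);
}

// Route a request to the first registered provider that claims the model.
function dispatch(model: string, request: GenerateRequest): Promise<Uint8Array> {
  const provider = providers.find((p) => p.supports(model));
  if (!provider) {
    return Promise.reject(new Error(`No provider registered for model ${model}`));
  }
  return provider.generate(model, request);
}

// A stub adapter standing in for a real backend; it returns a zero-filled buffer.
registerProvider({
  id: "stub",
  supports: (model) => model.startsWith("stub-"),
  generate: async (_model, request) => new Uint8Array(request.prompt.length),
});
```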
## Built by Duval Software

pixel-surgeon-mcp is maintained by [John Evans](https://github.com/j-east), part of the engineering team at [Duval Software](https://duvalsoftware.com), a software engineering firm in Jacksonville Beach, FL, that builds AI-powered tools and custom integrations. If you need MCP servers, AI pipelines, or production tooling built, [get in touch](https://duvalsoftware.com).

## License

MIT
@@ -0,0 +1,2 @@
#!/usr/bin/env node
export {};