@houtini/gemini-mcp 2.2.3 → 2.3.0

package/README.md CHANGED
@@ -3,19 +3,36 @@
3
3
  [![npm version](https://img.shields.io/npm/v/@houtini/gemini-mcp.svg?style=flat-square)](https://www.npmjs.com/package/@houtini/gemini-mcp)
4
4
  [![MCP Registry](https://img.shields.io/badge/MCP-Registry-blue?style=flat-square)](https://registry.modelcontextprotocol.io)
5
5
  [![Known Vulnerabilities](https://snyk.io/test/github/houtini-ai/gemini-mcp/badge.svg)](https://snyk.io/test/github/houtini-ai/gemini-mcp)
6
- ${badge_line}
7
6
 
8
- **I've been running this MCP server in my Claude Desktop setup for several months, and it's one of the few I leave enabled permanently.** Not because Gemini replaces Claude -- it doesn't -- but because grounded search, deep research, image generation, and video are things Gemini does well. Having them as tools inside Claude beats switching between browser tabs.
7
+ I've been running this MCP server in my Claude Desktop setup for months. It's one of the few I leave on permanently, not because Gemini replaces Claude, but because grounded search, image generation, SVG diagrams, and video are things Gemini does genuinely well. Having them as tools inside Claude beats switching browser tabs.
9
8
 
10
9
  Thirteen tools. One `npx` command.
11
10
 
12
- ### MCP App previews
11
+ <p align="center">
12
+ <a href="https://glama.ai/mcp/servers/@houtini-ai/gemini-mcp">
13
+ <img width="380" height="200" src="https://glama.ai/mcp/servers/@houtini-ai/gemini-mcp/badge" alt="Gemini MCP server" />
14
+ </a>
15
+ </p>
13
16
 
14
- Generated images and diagrams render inline in Claude Desktop with zoom controls, file paths, and prompt context:
17
+ ---
18
+
19
+ > **Quick Navigation**
20
+ >
21
+ > [Get started](#get-started-in-two-minutes) | [What it does](#what-it-does) | [SVG generation](#svg-generation) | [Image output](#image-output-and-storage) | [Configuration](#configuration-reference) | [Tools](#tools-reference) | [Models](#model-reference) | [Requirements](#requirements)
22
+
23
+ ---
24
+
25
+ ## What it looks like
26
+
27
+ Generated images, SVGs, and videos render inline in Claude Desktop with zoom controls, file paths, and prompt context:
15
28
 
16
29
  | Image generation | SVG / diagram generation |
17
30
  |:---:|:---:|
18
- | ![Image preview in MCP App](image-preview-mcp-app.jpg) | ![Diagram preview in MCP App](diagram-preview-mcp-app.jpg) |
31
+ | ![Image preview](image-preview-mcp-app.jpg) | ![SVG preview](diagram-preview-mcp-app.jpg) |
32
+
33
+ | Image embed | SVG embed | Video embed |
34
+ |:---:|:---:|:---:|
35
+ | ![Image embed](image-embed.png) | ![SVG embed](svg-embed.png) | ![Video embed](video-embed.png) |
19
36
 
20
37
  ---
21
38
 
@@ -23,7 +40,7 @@ Generated images and diagrams render inline in Claude Desktop with zoom controls
23
40
 
24
41
  **Step 1: Get a Gemini API key**
25
42
 
26
- Go to [Google AI Studio](https://aistudio.google.com/apikey) and create one. The free tier covers most development use -- you'll hit rate limits on deep research if you're hammering it, but for day-to-day work it's fine.
43
+ Go to [Google AI Studio](https://aistudio.google.com/apikey) and create one. The free tier covers most development use; you'll hit rate limits on deep research if you're hammering it, but for day-to-day work it's fine.
27
44
 
28
45
  **Step 2: Add to your Claude Desktop config**
29
46
 
@@ -47,7 +64,8 @@ Config file locations:
47
64
 
48
65
  **Step 3: Restart Claude Desktop**
49
66
 
50
- That's it. The tools show up automatically. `npx` pulls the package on first run -- no separate install.
67
+ That's it. Tools show up automatically. `npx` pulls the package on first run, so no separate install is needed.
68
+
51
69
 
52
70
  ### Local build instead
53
71
 
@@ -76,6 +94,26 @@ Then point your config at the local build:
76
94
  }
77
95
  ```
78
96
 
97
+ ### Claude Code (CLI)
98
+
99
+ Claude Code uses a different registration mechanism — it doesn't read `claude_desktop_config.json`. Use `claude mcp add` instead:
100
+
101
+ ```bash
102
+ claude mcp add -e GEMINI_API_KEY=your-api-key-here -s user gemini -- npx -y @houtini/gemini-mcp
103
+ ```
104
+
105
+ With optional image output directory:
106
+
107
+ ```bash
108
+ claude mcp add \
109
+ -e GEMINI_API_KEY=your-api-key-here \
110
+ -e GEMINI_IMAGE_OUTPUT_DIR=/path/to/output \
111
+ -s user \
112
+ gemini -- npx -y @houtini/gemini-mcp
113
+ ```
114
+
115
+ Verify with `claude mcp get gemini` — you should see `Status: Connected`.
116
+
79
117
  ---
80
118
 
81
119
  ## What it does
@@ -86,9 +124,7 @@ Then point your config at the local build:
86
124
  Use gemini:gemini_chat to ask: "What changed in the MCP spec in the last month?"
87
125
  ```
88
126
 
89
- Grounding is on by default. Gemini searches Google before answering, so you get current information rather than training data cutoff answers. Sources come back as markdown links.
90
-
91
- For questions where you want reasoning over live search -- "explain this code" or similar -- set `grounding: false`.
127
+ Grounding is on by default. Gemini searches Google before answering, so you get current information rather than training cutoff answers. Sources come back as markdown links. For questions where you want pure reasoning — "explain this code" or similar — set `grounding: false`.
92
128
 
93
129
  Supports `thinking_level` on Gemini 3 models: `high` for maximum reasoning depth, `low` to keep it fast, `medium`/`minimal` on Gemini 3 Flash only.
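The `thinking_level` rule above can be sketched as a small lookup. This is only an illustration: `allowedThinkingLevels` and `isValidThinkingLevel` are hypothetical helpers, and matching on the model-name prefix and the word `flash` is an assumption, not something the package is known to do.

```typescript
// Hypothetical helpers; the name-matching rules are assumptions for illustration.
type ThinkingLevel = "minimal" | "low" | "medium" | "high";

function allowedThinkingLevels(model: string): ThinkingLevel[] {
  // thinking_level applies to Gemini 3 models only.
  if (model.indexOf("gemini-3") !== 0) return [];
  // Gemini 3 Flash additionally accepts `minimal` and `medium`.
  if (model.indexOf("flash") !== -1) return ["minimal", "low", "medium", "high"];
  return ["low", "high"];
}

function isValidThinkingLevel(model: string, level: ThinkingLevel): boolean {
  return allowedThinkingLevels(model).indexOf(level) !== -1;
}
```

A caller could use `isValidThinkingLevel(model, level)` to reject an unsupported combination before sending the request.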
94
130
 
@@ -100,9 +136,10 @@ Use gemini:gemini_deep_research with:
100
136
  max_iterations=5
101
137
  ```
102
138
 
103
- Runs multiple grounded search iterations, then synthesises a full report. Takes 2-5 minutes depending on complexity. Worth it for anything where you need comprehensive coverage rather than a quick answer.
139
+ Runs multiple grounded search iterations, then synthesises a full report. Takes 2-5 minutes depending on complexity; worth it for anything needing comprehensive coverage rather than a quick answer.
140
+
141
+ Set `max_iterations` to 3-4 in Claude Desktop (4-minute tool timeout). In IDEs (Cursor, Windsurf, VS Code) or agent frameworks, 7-10 iterations produces noticeably better synthesis. Pass `focus_areas` as an array to steer toward specific angles.
104
142
 
105
- Set `max_iterations` to 3-4 in Claude Desktop (4-minute tool timeout). In IDEs (Cursor, Windsurf, VS Code) or agent frameworks with longer timeout tolerance, 7-10 iterations produces noticeably better synthesis. Pass `focus_areas` as an array to steer toward specific angles.
106
143
 
107
144
  ### Image generation with search grounding
108
145
 
@@ -115,7 +152,7 @@ Use gemini:generate_image with:
115
152
 
116
153
  Default model is `gemini-3-pro-image-preview` (Nano Banana Pro). Also supports `gemini-2.5-flash-image` for faster generation.
117
154
 
118
- When `use_search=true`, Gemini searches Google for current data before generating. Financial and news queries work reliably and return 2-5 grounding sources as markdown links. Weather queries are inconsistent (Gemini API limitation, not a code issue).
155
+ When `use_search=true`, Gemini searches Google for current data before generating. Financial and news queries work reliably. The full-resolution image saves to disk automatically; the inline preview is resized for transport but the original is untouched.
119
156
 
120
157
  ### Video generation with Veo 3.1
121
158
 
@@ -126,18 +163,21 @@ Use gemini:generate_video with:
126
163
  durationSeconds=8
127
164
  ```
128
165
 
129
- Uses Google's Veo 3.1 model. Generates 4-8 second videos at up to 4K resolution with native synchronised audio. Processing takes 2-5 minutes -- the tool polls automatically until the video is ready.
166
+ Uses Google's Veo 3.1 model. Generates 4-8 second videos at up to 4K with native synchronised audio. Processing takes 2-5 minutes; the tool polls automatically until ready.
167
+
168
+ Options worth knowing:
169
+ - `aspectRatio` — `16:9` landscape or `9:16` portrait/vertical
170
+ - `generateAudio` — on by default, produces dialogue and sound effects matching the prompt
171
+ - `sampleCount` — generate up to 4 variations in one call
172
+ - `seed` — deterministic output across runs
173
+ - `generateThumbnail` — extracts a frame via ffmpeg (needs ffmpeg in PATH)
174
+ - `firstFrameImage` — animate from a starting image (image-to-video)
130
175
 
131
- Options worth knowing about:
132
- - `aspectRatio` -- `16:9` (landscape, default) or `9:16` (portrait/vertical)
133
- - `generateAudio` -- on by default, produces dialogue and sound effects matching the prompt
134
- - `sampleCount` -- generate up to 4 variations in one call
135
- - `seed` -- for deterministic output across runs
136
- - `generateThumbnail` -- extracts a frame via ffmpeg (needs ffmpeg in PATH)
137
- - `generateHTMLPlayer` -- creates a local HTML player alongside the video
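To make the option list concrete, here is a minimal sketch of a pre-flight check on `generate_video` arguments. The field names come from the README. The `VideoOptions` interface, the `validateVideoOptions` helper, and the exact limits it enforces are assumptions drawn from the prose ("4-8 second videos", "up to 4 variations"), not the package's own validation.

```typescript
// Illustrative argument shape; types and limits are assumptions based on the README text.
interface VideoOptions {
  prompt: string;
  durationSeconds?: number;      // README: 4-8 second videos
  aspectRatio?: "16:9" | "9:16"; // landscape or portrait
  generateAudio?: boolean;       // README: on by default
  sampleCount?: number;          // README: up to 4 variations per call
  seed?: number;                 // deterministic output across runs
  generateThumbnail?: boolean;   // needs ffmpeg in PATH
}

// Returns a list of human-readable problems; an empty list means the options look sane.
function validateVideoOptions(o: VideoOptions): string[] {
  const problems: string[] = [];
  if (o.prompt.trim().length === 0) problems.push("prompt is empty");
  const d = o.durationSeconds;
  if (d !== undefined && (d < 4 || d > 8)) problems.push("durationSeconds must be between 4 and 8");
  const n = o.sampleCount;
  if (n !== undefined && (Math.floor(n) !== n || n < 1 || n > 4)) {
    problems.push("sampleCount must be an integer from 1 to 4");
  }
  return problems;
}
```

Checking locally like this is cheap compared with waiting through a multi-minute generation only to find an argument was out of range.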
138
176
 
139
177
  ### SVG generation
140
178
 
179
+ This is the one people underestimate. SVG output isn't just diagrams — it's production-ready vector graphics you can drop straight into a codebase, a presentation, or a web page. Clean, scalable, no raster artefacts.
180
+
141
181
  ```
142
182
  Use gemini:generate_svg with:
143
183
  prompt="Architecture diagram showing a microservices system with API gateway, three services, and a shared database"
@@ -146,11 +186,22 @@ Use gemini:generate_svg with:
146
186
  height=600
147
187
  ```
148
188
 
149
- Generates clean, production-ready SVG code for diagrams, illustrations, icons, and data visualisations. Styles: `technical` (diagrams), `artistic` (illustrations), `minimal` (simple), `data-viz` (charts).
189
+ Four styles:
190
+
191
+ | Style | Best for |
192
+ |-------|----------|
193
+ | `technical` | Architecture diagrams, flowcharts, system maps |
194
+ | `artistic` | Illustrations, decorative graphics, icons |
195
+ | `minimal` | Clean data visualisations, simple charts |
196
+ | `data-viz` | Complex charts, dashboards, infographics |
197
+
198
+ The output is actual SVG code — edit it, animate it, embed it in HTML, commit it to a repo. No rasterising, no export steps, no Figma required.
199
+
200
+ ![SVG generation in Claude Desktop](svg-embed.png)
150
201
 
151
202
  ### Image editing and analysis
152
203
 
153
- **Conversational editing** -- Gemini 3 Pro Image maintains context across editing turns using thought signatures. The server captures these automatically. Pass them back on subsequent edit calls for full continuity:
204
+ **Conversational editing:** Gemini 3 Pro Image maintains context across editing turns. Pass thought signatures back on subsequent `edit_image` calls for full continuity:
154
205
 
155
206
  ```
156
207
  Use gemini:edit_image with:
@@ -158,21 +209,18 @@ Use gemini:edit_image with:
158
209
  images=[{data: imageBase64, mimeType: "image/png", thoughtSignature: "fromPreviousCall"}]
159
210
  ```
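A minimal sketch of carrying a thought signature from one call into the next `edit_image` call. The `ImageInput` shape mirrors the `images` entry in the call above. `PreviousResult` and `nextEditInput` are hypothetical names, and the fields of the previous result are an assumption about what a client would hold on to.

```typescript
// One entry in the `images` argument, matching the fields used in the call above.
interface ImageInput {
  data: string; // base64-encoded image
  mimeType: string;
  thoughtSignature?: string;
}

// Hypothetical record of what the client kept from the previous generate/edit call.
interface PreviousResult {
  imageBase64: string;
  mimeType: string;
  thoughtSignature?: string;
}

// Builds the next edit input, forwarding the signature only when one exists.
function nextEditInput(prev: PreviousResult): ImageInput {
  const input: ImageInput = { data: prev.imageBase64, mimeType: prev.mimeType };
  if (prev.thoughtSignature) input.thoughtSignature = prev.thoughtSignature;
  return input;
}
```

The result would be passed as `images=[nextEditInput(previous)]` on the following edit.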
160
211
 
161
- Skip thought signatures and each edit starts from scratch.
162
-
163
- **Analysis** -- two tools for different purposes:
164
- - `describe_image` -- Fast general descriptions using Gemini 3 Flash
165
- - `analyze_image` -- Structured extraction and detailed reasoning using Gemini 3.1 Pro
212
+ **Analysis** uses two tools for different purposes:
213
+ - `describe_image`: Fast general descriptions using Gemini 3 Flash
214
+ - `analyze_image`: Structured extraction and detailed reasoning using Gemini 3.1 Pro
166
215
 
167
216
  **Load local files:**
168
217
  ```
169
218
  Use gemini:load_image_from_path with filePath="C:/screenshots/error.png"
170
219
  ```
171
- Returns base64 data ready for any image tool.
172
220
 
173
221
  ### Media resolution control
174
222
 
175
- Reduce token usage by up to 75% whilst maintaining quality:
223
+ Reduce token usage by up to 75% whilst maintaining quality for the task:
176
224
 
177
225
  | Level | Tokens | Savings | Best for |
178
226
  |-------|--------|---------|----------|
@@ -181,7 +229,8 @@ Reduce token usage by up to 75% whilst maintaining quality:
181
229
  | `MEDIA_RESOLUTION_HIGH` | 1120 | default | Detailed analysis |
182
230
  | `MEDIA_RESOLUTION_ULTRA_HIGH` | 2000+ | per-image only | Maximum detail |
183
231
 
184
- For PDF OCR, MEDIUM gives identical text extraction quality to HIGH at half the tokens. Set `global_media_resolution` to apply to all images, or override per-image with `mediaResolution`.
232
+ For PDF OCR, MEDIUM gives identical text extraction quality to HIGH at half the tokens.
233
+
185
234
 
186
235
  ### Landing page generation
187
236
 
@@ -194,17 +243,17 @@ Use gemini:generate_landing_page with:
194
243
  sections=["hero", "features", "pricing", "cta"]
195
244
  ```
196
245
 
197
- Returns a self-contained HTML file -- inline CSS and vanilla JS, no external dependencies. Styles: `minimal`, `bold`, `corporate`, `startup`.
246
+ Returns a self-contained HTML file: inline CSS and vanilla JS, no external dependencies. Styles: `minimal`, `bold`, `corporate`, `startup`.
198
247
 
199
248
  ### Professional chart design systems
200
249
 
201
- The `gemini_prompt_assistant` tool includes 9 professional chart design systems:
250
+ `gemini_prompt_assistant` includes 9 professional chart design systems:
202
251
 
203
252
  | System | Inspiration | Best for |
204
253
  |--------|------------|----------|
205
- | **storytelling** | Cole Nussbaumer Knaflic | Executive presentations -- everything muted except one bold highlight |
206
- | **financial** | Financial Times | Editorial journalism -- FT Pink background, serif titles |
207
- | **terminal** | Bloomberg / Fintech | High-density dark mode with electric neon |
254
+ | **storytelling** | Cole Nussbaumer Knaflic | Executive presentations |
255
+ | **financial** | Financial Times | Editorial journalism (FT Pink, serif titles) |
256
+ | **terminal** | Bloomberg / Fintech | High-density dark mode with neon |
208
257
  | **modernist** | W.E.B. Du Bois | Bold geometric blocks, stark contrasts |
209
258
  | **professional** | IBM Carbon / Tailwind | Enterprise dashboards |
210
259
  | **editorial** | FiveThirtyEight / Economist | Data journalism |
@@ -212,28 +261,19 @@ The `gemini_prompt_assistant` tool includes 9 professional chart design systems:
212
261
  | **minimal** | Edward Tufte | Maximum data-ink ratio |
213
262
  | **dark** | Observable | Modern dark mode |
214
263
 
215
- ```
216
- Use gemini:gemini_prompt_assistant with:
217
- request_type="template"
218
- use_case="product"
219
- desired_outcome="Generate a professional product comparison chart"
220
- ```
221
-
222
264
  ### Help system
223
265
 
224
266
  ```
225
267
  Use gemini:gemini_help with topic="overview"
226
268
  ```
227
269
 
228
- Documentation for all features without leaving Claude. Topics: `overview`, `image_generation`, `image_editing`, `image_analysis`, `chat`, `deep_research`, `grounding`, `media_resolution`, `models`, `all`.
270
+ Full documentation without leaving Claude. Topics: `overview`, `image_generation`, `image_editing`, `image_analysis`, `chat`, `deep_research`, `grounding`, `media_resolution`, `models`, `all`.
229
271
 
230
272
  ---
231
273
 
232
274
  ## Image output and storage
233
275
 
234
- **Default behaviour:** Images return as inline base64 previews (quality 100, 1024px) rendered directly in Claude.
235
-
236
- **Persistent storage:** Set `GEMINI_IMAGE_OUTPUT_DIR` to auto-save all generated images:
276
+ By default, images return as inline previews rendered directly in Claude. Set `GEMINI_IMAGE_OUTPUT_DIR` to auto-save everything:
237
277
 
238
278
  ```json
239
279
  "env": {
@@ -242,18 +282,14 @@ Documentation for all features without leaving Claude. Topics: `overview`, `imag
242
282
  }
243
283
  ```
244
284
 
245
- Every image saves with a timestamp filename. The tool returns both the inline preview and the file path.
246
-
247
- **Per-call override:** Pass `outputPath` on any generation tool to save to a specific location.
285
+ The server uses a two-tier approach to handle the MCP protocol's 1MB JSON-RPC limit whilst preserving full-resolution files:
248
286
 
249
- The server uses a two-tier compression approach to handle the MCP protocol's ~1MB JSON-RPC limit whilst preserving full-resolution files on disk:
287
+ | Tier | Purpose |
288
+ |------|---------|
289
+ | **Full-res** | Saved to disk immediately, untouched |
290
+ | **Preview** | Resized JPEG for inline transport — dynamically sized to fit under the cap |
250
291
 
251
- | Tier | Quality | Max dimension | Purpose |
252
- |------|---------|---------------|---------|
253
- | **Full-res** | Original | Original | Saved to disk |
254
- | **Viewer preview** | 100 | 1024px | MCP App inline preview (~400KB) |
255
-
256
- Gemini returns 2-5MB images. The full image is saved to disk immediately, and a compressed preview is created for the MCP App viewer.
292
+ Gemini returns 2-5MB images. The resize is smart: it measures the non-image overhead in each response and calculates the exact binary budget available, stepping down dimensions (800→600→400→300→200px) until it fits. The full image is always there on disk.
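The budget arithmetic described above can be sketched as follows. This is a reading of the README, not the package's code: the function names, the `encodedSizeAt` callback, and treating "1MB" as 1 MiB are all assumptions. The 3/4 factor reflects base64 encoding, which emits 4 characters for every 3 bytes.

```typescript
// Sketch only: names and the exact limit are assumptions, not the package's implementation.
const JSON_RPC_LIMIT_BYTES = 1024 * 1024;          // "1MB" read as 1 MiB (assumption)
const DIMENSION_STEPS = [800, 600, 400, 300, 200]; // step-down sequence from the README

// Raw image bytes that still fit after subtracting the non-image overhead.
function binaryBudget(overheadBytes: number, limitBytes: number = JSON_RPC_LIMIT_BYTES): number {
  const available = Math.max(0, limitBytes - overheadBytes);
  return Math.floor(available / 4) * 3; // base64: 3 bytes become 4 characters
}

interface PreviewChoice {
  maxDimension: number;
  bytes: number;
}

// `encodedSizeAt` is a hypothetical callback returning the preview's size in bytes
// when resized to the given maximum dimension.
function pickPreview(
  encodedSizeAt: (maxDimension: number) => number,
  overheadBytes: number,
  limitBytes: number = JSON_RPC_LIMIT_BYTES
): PreviewChoice | null {
  const budget = binaryBudget(overheadBytes, limitBytes);
  for (const maxDimension of DIMENSION_STEPS) {
    const bytes = encodedSizeAt(maxDimension);
    if (bytes <= budget) return { maxDimension, bytes };
  }
  return null; // nothing fits; a caller could return only the file path
}
```

Trying the largest dimension first means the preview is only as small as the response overhead forces it to be.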
257
293
 
258
294
  ---
259
295
 
@@ -261,32 +297,31 @@ Gemini returns 2-5MB images. The full image is saved to disk immediately, and a
261
297
 
262
298
  | Variable | Required | Default | Description |
263
299
  |----------|----------|---------|-------------|
264
- | `GEMINI_API_KEY` | Yes | -- | Google AI API key from [AI Studio](https://aistudio.google.com/apikey) |
300
+ | `GEMINI_API_KEY` | Yes | | Google AI API key from [AI Studio](https://aistudio.google.com/apikey) |
265
301
  | `GEMINI_DEFAULT_MODEL` | No | `gemini-3.1-pro-preview` | Default model for `gemini_chat` and `analyze_image` |
266
302
  | `GEMINI_DEFAULT_GROUNDING` | No | `true` | Enable Google Search grounding by default |
267
- | `GEMINI_IMAGE_OUTPUT_DIR` | No | -- | Auto-save directory for generated images |
303
+ | `GEMINI_IMAGE_OUTPUT_DIR` | No | | Auto-save directory for generated images and videos |
268
304
  | `GEMINI_ALLOW_EXPERIMENTAL` | No | `false` | Include experimental/preview models in auto-discovery |
269
305
  | `GEMINI_MCP_LOG_FILE` | No | `false` | Write logs to `~/.gemini-mcp/logs/` |
270
306
  | `DEBUG_MCP` | No | `false` | Log to stderr for debugging tool calls |
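As a rough illustration of how the variables in this table might be read with their documented defaults, here is a small sketch. The variable names and default values come from the table. `ServerSettings`, `parseBool`, and `loadSettings` are hypothetical and do not reflect the package's actual `config` module.

```typescript
// Hypothetical settings loader; only the variable names and defaults come from the table.
type Env = Record<string, string | undefined>;

interface ServerSettings {
  apiKey: string;
  defaultModel: string;
  defaultGrounding: boolean;
  imageOutputDir: string | undefined;
  allowExperimental: boolean;
}

// Treats an unset or empty variable as "use the default"; only "true" enables a flag.
function parseBool(value: string | undefined, fallback: boolean): boolean {
  if (value === undefined || value === "") return fallback;
  return value.toLowerCase() === "true";
}

function loadSettings(env: Env): ServerSettings {
  const apiKey = env.GEMINI_API_KEY;
  if (!apiKey) throw new Error("GEMINI_API_KEY is required");
  return {
    apiKey: apiKey,
    defaultModel: env.GEMINI_DEFAULT_MODEL || "gemini-3.1-pro-preview",
    defaultGrounding: parseBool(env.GEMINI_DEFAULT_GROUNDING, true),
    imageOutputDir: env.GEMINI_IMAGE_OUTPUT_DIR,
    allowExperimental: parseBool(env.GEMINI_ALLOW_EXPERIMENTAL, false),
  };
}
```

In a Node process this would typically be called as `loadSettings(process.env)`; taking the environment as a parameter keeps the function easy to test.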
271
307
 
272
- ---
273
308
 
274
309
  ## Tools reference
275
310
 
276
311
  | Tool | Description |
277
312
  |------|-------------|
278
- | `gemini_chat` | Chat with Gemini 3.1 Pro. Google Search grounding on by default. Supports `thinking_level` for Gemini 3 |
313
+ | `gemini_chat` | Chat with Gemini 3.1 Pro. Google Search grounding on by default. Supports `thinking_level` |
279
314
  | `gemini_deep_research` | Multi-step iterative research with Google Search. Synthesises comprehensive reports |
280
- | `gemini_list_models` | Lists available models from the API |
315
+ | `gemini_list_models` | Lists available models from the Gemini API |
281
316
  | `gemini_help` | Documentation for all features without leaving Claude |
282
317
  | `gemini_prompt_assistant` | Expert guidance for image generation with 9 chart design systems |
283
- | `generate_image` | Image generation with search grounding and thought signatures for conversational editing |
284
- | `edit_image` | Edit images with natural-language instructions. Supports multi-turn continuity |
318
+ | `generate_image` | Image generation with optional search grounding. Full-res saved to disk |
319
+ | `edit_image` | Edit images with natural-language instructions. Multi-turn continuity via thought signatures |
285
320
  | `describe_image` | Fast image descriptions using Gemini 3 Flash |
286
321
  | `analyze_image` | Structured extraction and analysis using Gemini 3.1 Pro |
287
322
  | `load_image_from_path` | Read a local image file and return base64 for any image tool |
288
- | `generate_video` | Video generation with Veo 3.1 -- 4-8 seconds at up to 4K with native audio |
289
- | `generate_svg` | Production-ready SVG graphics for diagrams, illustrations, and data visualisations |
323
+ | `generate_video` | Video generation with Veo 3.1: 4-8 seconds at up to 4K with native audio |
324
+ | `generate_svg` | Production-ready SVG: diagrams, illustrations, icons, data visualisations |
290
325
  | `generate_landing_page` | Self-contained HTML landing pages with inline CSS/JS |
291
326
 
292
327
  ---
@@ -296,12 +331,12 @@ Gemini returns 2-5MB images. The full image is saved to disk immediately, and a
296
331
  | Model | Used by | Notes |
297
332
  |-------|---------|-------|
298
333
  | `gemini-3.1-pro-preview` | `gemini_chat`, `analyze_image` | Default. Advanced reasoning |
299
- | `gemini-3-pro-image-preview` | `generate_image`, `edit_image` | Nano Banana Pro -- highest quality generation |
334
+ | `gemini-3-pro-image-preview` | `generate_image`, `edit_image` | Nano Banana Pro, highest quality image generation |
300
335
  | `gemini-2.5-flash-image` | `generate_image` (optional) | Faster generation, higher volume |
301
336
  | `gemini-3-flash-preview` | `describe_image` | Fast general descriptions |
302
- | `veo-3.1-generate-preview` | `generate_video` | Veo 3.1 -- 4K video with native audio |
337
+ | `veo-3.1-generate-preview` | `generate_video` | Veo 3.1: 4K video with native audio |
303
338
 
304
- **Gemini 3 notes:** Temperature is forced to 1.0 on Gemini 3 models (Google's requirement -- lower values cause looping). Thought signatures are captured automatically for conversational image editing. Thinking level only applies to `gemini_chat`.
339
+ **Gemini 3 notes:** Temperature is forced to 1.0 on Gemini 3 models (Google's requirement; lower values cause looping). Thinking level only applies to `gemini_chat`.
305
340
 
306
341
  ---
307
342
 
@@ -33,7 +33,7 @@ export const config = {
33
33
  },
34
34
  server: {
35
35
  name: 'gemini-mcp',
36
- version: '2.2.3',
36
+ version: '2.2.4',
37
37
  imageOutputDir: process.env.GEMINI_IMAGE_OUTPUT_DIR
38
38
  },
39
39
  logging: {