@lutery/vision-mcp 1.0.0 → 1.0.1

package/README.md CHANGED
@@ -1,428 +1,227 @@
1
1
  # Vision MCP
2
2
 
3
- MCP Server providing vision capabilities for LLMs via GLM-4.6V, SiliconFlow, and ModelScope. This server enables LLMs without native vision support or with expensive vision models to access cost-effective visual analysis capabilities.
3
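+ An STDIO-based MCP Server that gives LLMs without vision capabilities (or whose vision models are costly) a unified image-analysis capability. By switching the Provider (configured through environment variables), you can use multimodal models from different platforms and vendors.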
4
4
 
5
- ## Features
5
+ ## Supported Models / Providers
6
6
 
7
- - 🤖 **Multiple Model Support**: GLM-4.6V, SiliconFlow, and ModelScope vision models
8
- - 🖼️ **Flexible Image Input**: URL, base64 data URL, or local file paths
9
- - 📊 **Multiple Analysis Types**: Image description, UI analysis, object detection, OCR, and structured extraction
10
- - 🔧 **System Prompt Templates**: Built-in templates for common vision tasks
11
- - 📦 **Easy Deployment**: STDIO MCP Server, runs with npx
12
- - 🔒 **Secure**: Environment-based configuration, sensitive data masking in logs
7
+ Select the provider with `VISION_MODEL_TYPE`:
13
8
 
14
- ### Streaming Response Support
9
+ | type | Provider | Default `VISION_API_BASE_URL` | Default `VISION_MODEL_NAME` | Notes |
10
+ |------|----------|----------------------------|--------------------------|------|
11
+ | `glm-4.6v` | Zhipu GLM-4.6V | `https://open.bigmodel.cn/api/paas/v4` | `glm-4.6v` | `glm` is an alias (equivalent to `glm-4.6v`) |
12
+ | `glm` | GLM-4.6V (alias) | `https://open.bigmodel.cn/api/paas/v4` | `glm-4.6v` | Kept for compatibility with older configurations |
13
+ | `siliconflow` | SiliconFlow (OpenAI-compatible) | `https://api.siliconflow.cn/v1` | `Qwen/Qwen2-VL-72B-Instruct` | Wide choice of vision models |
14
+ | `modelscope` | ModelScope API-Inference (OpenAI-compatible) | `https://api-inference.modelscope.cn/v1` | `ZhipuAI/GLM-4.6V` | Requires real-name verification and Aliyun account binding; usage quotas apply |
15
+ | `openai` | OpenAI | `https://api.openai.com/v1` | `gpt-4o` | Uses the Chat Completions API |
16
+ | `claude` | Anthropic Claude (Messages API) | `https://api.anthropic.com` | `claude-3-5-sonnet-20241022` | Do not include `/v1` in `baseUrl` |
17
+ | `gemini` | Google Gemini (generateContent API) | `https://api.gptsapi.net` | `gemini-2.0-flash-exp` | Defaults to a proxy address; change it to the official endpoint or your own gateway |
15
18
 
16
- Current adapters explicitly disable streaming responses (`stream: false`) and are designed for complete JSON responses. This ensures compatibility with both GLM-4.6V and SiliconFlow APIs.
19
+ Get an API key / token from each platform's console:
17
20
 
18
- **Note**: Streaming-only providers are not currently supported. If a provider only supports streaming responses (Server-Sent Events/text/event-stream format), the adapter will fail as it expects a complete JSON response. To add support for streaming providers, a streaming response parser would need to be implemented.
21
+ - GLM (Zhipu): https://open.bigmodel.cn/
22
+ - SiliconFlow: https://cloud.siliconflow.cn/
23
+ - ModelScope: https://modelscope.cn/my/myaccesstoken
24
+ - OpenAI: https://platform.openai.com/
25
+ - Claude (Anthropic): https://console.anthropic.com/
26
+ - Gemini (Google AI): https://ai.google.dev/
19
27
 
20
- ## Quick Start
28
+ ## Features
21
29
 
22
- ### Installation
30
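+ - One-step switching between providers (only environment variables change)
31
+ - Image input: URL / base64 data URL / local file path
32
+ - Built-in system prompt templates: UI analysis, OCR, object detection, structured extraction, and more
33
+ - Security: API keys are automatically masked in logs, and thinking/reasoning content returned by the model is filtered out
34
+ - Strict MCP compliance: stdout carries only JSON-RPC; logs go to stderr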
23
35
 
24
- 1. Clone or download this repository
25
- 2. Install dependencies:
36
+ ## Installation and Running
26
37
 
27
- ```bash
28
- cd vision_mcp
29
- npm install
30
- ```
31
-
32
- ### Configuration
38
+ Requirement: Node.js >= 18
33
39
 
34
- Create a `.env` file in the project root:
40
+ ### Launching via an MCP client as an NPM package (recommended)
35
41
 
36
- #### Option 1: GLM-4.6V
42
+ In your MCP client (e.g. Claude Desktop), set the command to `npx`:
37
43
 
38
- ```bash
39
- VISION_MODEL_TYPE=glm-4.6v
40
- VISION_MODEL_NAME=glm-4.6v
41
- VISION_API_BASE_URL=https://open.bigmodel.cn/api/paas/v4
42
- VISION_API_KEY=your-glm-api-key
44
+ ```json
45
+ {
46
+ "mcpServers": {
47
+ "vision-mcp": {
48
+ "command": "npx",
49
+ "args": ["-y", "@lutery/vision-mcp"],
50
+ "env": {
51
+ "VISION_MODEL_TYPE": "siliconflow",
52
+ "VISION_API_KEY": "sk-your-key",
53
+ "VISION_MODEL_NAME": "Qwen/Qwen2-VL-72B-Instruct",
54
+ "VISION_API_BASE_URL": "https://api.siliconflow.cn/v1"
55
+ }
56
+ }
57
+ }
58
+ }
43
59
  ```
44
60
 
45
- #### Option 2: SiliconFlow
46
-
47
- ```bash
48
- VISION_MODEL_TYPE=siliconflow
49
- VISION_MODEL_NAME=Qwen/Qwen2-VL-72B-Instruct
50
- VISION_API_BASE_URL=https://api.siliconflow.cn/v1
51
- VISION_API_KEY=your-siliconflow-api-key
52
- ```
61
+ Notes:
62
+ - `VISION_MODEL_NAME` / `VISION_API_BASE_URL` can be omitted (the provider's defaults are then used)
63
+ - For more detailed configuration options, refer to `.env.example`
53
64
 
54
- #### Option 3: ModelScope API-Inference
65
+ Alternatively, install the package globally and run the executable directly (the `bin` name is `vision-mcp`):
55
66
 
56
67
  ```bash
57
- VISION_MODEL_TYPE=modelscope
58
- VISION_MODEL_NAME=ZhipuAI/GLM-4.6V
59
- VISION_API_BASE_URL=https://api-inference.modelscope.cn/v1
60
- VISION_API_KEY=your-modelscope-token
68
+ npm i -g @lutery/vision-mcp
69
+ vision-mcp
61
70
  ```
62
71
 
63
- **Note**: ModelScope requires:
64
- - Real-name authentication on your ModelScope account
65
- - Aliyun account binding
66
- - API usage limits apply (see [API Limits](https://www.modelscope.cn/docs/model-service/API-Inference/limits))
67
-
68
- ### Build
72
+ ### Running locally for development
69
73
 
70
74
  ```bash
75
+ cd mcp/vision_mcp
76
+ npm install
71
77
  npm run build
72
- ```
73
-
74
- ### Run (local)
75
-
76
- ```bash
77
78
  node dist/index.js
78
79
  ```
79
80
 
80
- If successful, you'll see: `Vision MCP Server is running on stdio` in stderr.
81
-
82
- ### Run (npx)
83
-
84
- ```bash
85
- # Local package (requires build first)
86
- npx .
87
-
88
- # Published package
89
- npx -y @lutery/vision-mcp
90
- ```
91
-
92
- ## MCP Client Configuration
81
+ On successful startup, the server prints `Vision MCP Server is running on stdio` to stderr.
93
82
 
94
- ### Claude Desktop
83
+ ## Configuration (Environment Variables)
95
84
 
96
- Add to your Claude Desktop configuration:
85
+ Minimum required:
97
86
 
98
- ```json
99
- {
100
- "mcpServers": {
101
- "vision-mcp": {
102
- "command": "npx",
103
- "args": ["-y", "@lutery/vision-mcp"],
104
- "env": {
105
- "VISION_MODEL_TYPE": "glm-4.6v",
106
- "VISION_MODEL_NAME": "glm-4.6v",
107
- "VISION_API_BASE_URL": "https://open.bigmodel.cn/api/paas/v4",
108
- "VISION_API_KEY": "your-api-key"
109
- }
110
- }
111
- }
112
- }
113
- ```
87
+ - `VISION_MODEL_TYPE`: selects the provider
88
+ - `VISION_API_KEY`: the key/token for that provider
114
89
 
115
- Or with a local installation:
90
+ Common optional settings:
116
91
 
117
- ```json
118
- {
119
- "mcpServers": {
120
- "vision-mcp": {
121
- "command": "node",
122
- "args": ["/path/to/vision_mcp/dist/index.js"],
123
- "env": {
124
- "VISION_MODEL_TYPE": "glm-4.6v",
125
- "VISION_MODEL_NAME": "glm-4.6v",
126
- "VISION_API_BASE_URL": "https://open.bigmodel.cn/api/paas/v4",
127
- "VISION_API_KEY": "your-api-key"
128
- }
129
- }
130
- }
131
- }
132
- ```
92
+ | Variable | Description | Default |
93
+ |------|------|------|
94
+ | `VISION_MODEL_NAME` | Model name | Built-in default for each provider |
95
+ | `VISION_API_BASE_URL` | API base URL (without a specific endpoint) | Built-in default for each provider |
96
+ | `VISION_API_TIMEOUT` | Request timeout (milliseconds) | `60000` |
97
+ | `VISION_MAX_RETRIES` | Maximum number of retries | `2` |
98
+ | `VISION_STRICT_URL_VALIDATION` | Strictly require image URLs to end with `.jpg/.jpeg/.png/.webp` | `true` |
99
+ | `LOG_LEVEL` | Log level: `debug`/`info`/`warn`/`error` | `info` |
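The settings above, combined with the per-provider defaults from the provider table, could be resolved along these lines. This is a minimal TypeScript sketch under stated assumptions, not the package's actual configuration module. `loadConfig`, `VisionConfig`, and `intFromEnv` are hypothetical names, and case-insensitive matching of `VISION_MODEL_TYPE` is assumed. The variable names, defaults, and error messages are taken from this README.

```typescript
// Hypothetical sketch: resolves VISION_* variables into a config object,
// falling back to the per-provider defaults listed in the provider table.
type ProviderType = "glm-4.6v" | "siliconflow" | "modelscope" | "openai" | "claude" | "gemini";

interface VisionConfig {
  type: ProviderType;
  apiKey: string;
  model: string;
  baseUrl: string;
  timeoutMs: number;
  maxRetries: number;
  strictUrlValidation: boolean;
  logLevel: string;
}

const PROVIDER_DEFAULTS: Record<ProviderType, { baseUrl: string; model: string }> = {
  "glm-4.6v": { baseUrl: "https://open.bigmodel.cn/api/paas/v4", model: "glm-4.6v" },
  siliconflow: { baseUrl: "https://api.siliconflow.cn/v1", model: "Qwen/Qwen2-VL-72B-Instruct" },
  modelscope: { baseUrl: "https://api-inference.modelscope.cn/v1", model: "ZhipuAI/GLM-4.6V" },
  openai: { baseUrl: "https://api.openai.com/v1", model: "gpt-4o" },
  claude: { baseUrl: "https://api.anthropic.com", model: "claude-3-5-sonnet-20241022" },
  gemini: { baseUrl: "https://api.gptsapi.net", model: "gemini-2.0-flash-exp" },
};

// Parses an integer variable, keeping the fallback when it is unset or not a number.
function intFromEnv(value: string | undefined, fallback: number): number {
  const n = Number.parseInt(value ?? "", 10);
  return Number.isNaN(n) ? fallback : n;
}

function loadConfig(env: Record<string, string | undefined>): VisionConfig {
  const rawType = env.VISION_MODEL_TYPE?.trim().toLowerCase();
  if (!rawType) throw new Error("Missing VISION_MODEL_TYPE");
  const type = rawType === "glm" ? "glm-4.6v" : rawType; // "glm" is an alias
  // hasOwnProperty avoids accepting inherited keys such as "toString".
  if (!Object.prototype.hasOwnProperty.call(PROVIDER_DEFAULTS, type)) {
    throw new Error(`Unsupported model type: ${rawType}`);
  }
  const provider = type as ProviderType;
  const apiKey = env.VISION_API_KEY?.trim();
  if (!apiKey) throw new Error("Missing VISION_API_KEY");
  const defaults = PROVIDER_DEFAULTS[provider];
  return {
    type: provider,
    apiKey,
    model: env.VISION_MODEL_NAME || defaults.model,
    baseUrl: env.VISION_API_BASE_URL || defaults.baseUrl,
    timeoutMs: intFromEnv(env.VISION_API_TIMEOUT, 60000),
    maxRetries: intFromEnv(env.VISION_MAX_RETRIES, 2),
    strictUrlValidation: env.VISION_STRICT_URL_VALIDATION?.toLowerCase() !== "false",
    logLevel: env.LOG_LEVEL || "info",
  };
}
```

In Node.js it would be called as `loadConfig(process.env)`; taking the environment as a parameter keeps the function easy to test.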
133
100
 
134
- ### Cursor/Codex CLI
101
+ Provider-specific settings:
135
102
 
136
- Similar configuration for other MCP-compatible clients.
103
+ - Claude
104
+ - `VISION_CLAUDE_API_VERSION`: Anthropic API version (default `2023-06-01`)
105
+ - Gemini
106
+ - `VISION_GEMINI_API_VERSION`: `v1beta` / `v1` (default `v1beta`)
107
+ - `VISION_GEMINI_AUTH_MODE`: `bearer` / `x-goog` / `query` (default `bearer`)
108
+ - `VISION_GEMINI_IMAGE_PART_MODE`: `inline_data` / `inline_bytes` (default `inline_data`)
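One plausible mapping from these settings to HTTP authentication is sketched below. The `x-api-key` and `anthropic-version` headers are Anthropic's documented Messages API headers, and `x-goog-api-key` and the `key` query parameter are documented Gemini API options. The function names, and the assumption that `bearer` means an `Authorization: Bearer` header (typical for proxies), are hypothetical. `VISION_GEMINI_IMAGE_PART_MODE` is not covered.

```typescript
// Hypothetical helpers; names and structure are illustrative only.
type GeminiAuthMode = "bearer" | "x-goog" | "query";

interface PreparedRequest {
  url: string;
  headers: Record<string, string>;
}

// Anthropic Messages API: key in x-api-key, version in anthropic-version.
function claudeHeaders(apiKey: string, apiVersion = "2023-06-01"): Record<string, string> {
  return {
    "content-type": "application/json",
    "x-api-key": apiKey,
    "anthropic-version": apiVersion,
  };
}

// Gemini: attach the key according to VISION_GEMINI_AUTH_MODE (default "bearer").
function applyGeminiAuth(endpoint: string, apiKey: string, mode: GeminiAuthMode = "bearer"): PreparedRequest {
  const headers: Record<string, string> = { "content-type": "application/json" };
  if (mode === "bearer") {
    // Assumed to suit proxies/gateways that expect a Bearer token.
    headers["authorization"] = `Bearer ${apiKey}`;
    return { url: endpoint, headers };
  }
  if (mode === "x-goog") {
    headers["x-goog-api-key"] = apiKey;
    return { url: endpoint, headers };
  }
  // "query" mode: the key travels in the URL, so this URL must be masked before logging.
  const url = new URL(endpoint);
  url.searchParams.set("key", apiKey);
  return { url: url.toString(), headers };
}
```

Keeping the `query` mode last makes it explicit that only this branch puts the secret in the URL, which is why a URL-sanitizing step is needed before logging.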
137
109
 
138
- ## Using the Tools
110
+ ## MCP Tools
139
111
 
140
- ### 1. Analyze Image
112
+ This server registers three tools:
141
113
 
142
- Main tool for image analysis:
114
+ ### 1) `analyze_image`
143
115
 
144
- ```javascript
145
- // Tool: analyze_image
146
- // Parameters:
147
- {
148
- "image": "https://example.com/image.jpg", // Image URL, base64, or local path
149
- "prompt": "Describe this UI design in detail", // Analysis prompt
150
- "output_format": "text", // Optional: "text" or "json"
151
- "template": "ui-analysis" // Optional: see templates below
152
- }
153
- ```
154
-
155
- #### Example Prompts
116
+ Parameters:
156
117
 
157
- **UI Analysis:**
158
118
  ```json
159
119
  {
160
- "image": "./screenshot.png",
161
- "prompt": "Analyze this UI design and extract all UI components with their positions and styles",
120
+ "image": "https://example.com/a.png",
121
+ "prompt": "Describe which components this interface contains",
122
+ "output_format": "text",
162
123
  "template": "ui-analysis"
163
124
  }
164
125
  ```
165
126
 
166
- **Object Detection:**
167
- ```json
168
- {
169
- "image": "https://example.com/photo.jpg",
170
- "prompt": "Detect all objects and provide their coordinates",
171
- "template": "object-detection"
172
- }
173
- ```
127
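+ Field descriptions:
128
+ - `image`: a URL, base64 data URL, or local file path
129
+ - `prompt`: a description of the analysis task
130
+ - `output_format`: `text` or `json` (a preference hint only; the returned JSON is not validated)
131
+ - `template`: an optional system template (see `list_templates` below)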
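A hypothetical validator for these fields is sketched below; the README does not show how the server actually checks tool arguments. Consistent with the note on `output_format`, it only checks that the value is `text` or `json` and never validates the model's output. `parseAnalyzeImageArgs` and `AnalyzeImageArgs` are illustrative names.

```typescript
// Hypothetical argument validator for the analyze_image tool.
interface AnalyzeImageArgs {
  image: string;
  prompt: string;
  output_format: "text" | "json";
  template?: string;
}

function parseAnalyzeImageArgs(input: unknown): AnalyzeImageArgs {
  if (typeof input !== "object" || input === null) {
    throw new Error("analyze_image expects an object of arguments");
  }
  const args = input as Record<string, unknown>;
  // image and prompt are required, non-empty strings.
  for (const field of ["image", "prompt"]) {
    const value = args[field];
    if (typeof value !== "string" || value.trim() === "") {
      throw new Error(`analyze_image: "${field}" must be a non-empty string`);
    }
  }
  // output_format is optional and defaults to "text" in this sketch.
  const format = args.output_format ?? "text";
  if (format !== "text" && format !== "json") {
    throw new Error(`analyze_image: unsupported output_format "${String(format)}"`);
  }
  const template = args.template;
  if (template !== undefined && typeof template !== "string") {
    throw new Error(`analyze_image: "template" must be a string`);
  }
  return {
    image: args.image as string,
    prompt: args.prompt as string,
    output_format: format as "text" | "json",
    template: template as string | undefined,
  };
}
```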
174
132
 
175
- **OCR:**
176
- ```json
177
- {
178
- "image": "data:image/png;base64,iVBORw0KGgo...",
179
- "prompt": "Extract all text from this image",
180
- "template": "ocr"
181
- }
182
- ```
133
+ ### 2) `list_templates`
183
134
 
184
- **Structured Extraction:**
185
- ```json
186
- {
187
- "image": "./form.jpg",
188
- "prompt": "Extract all form fields and values as JSON",
189
- "output_format": "json"
190
- }
191
- ```
135
+ Lists the built-in system prompt templates (including each template's id and purpose).
192
136
 
193
- ### 2. List Templates
137
+ ### 3) `get_config`
194
138
 
195
- List available system prompt templates:
139
+ Returns the currently active model configuration (the API key is masked).
196
140
 
197
- ```javascript
198
- // Tool: list_templates
199
- // Parameters: none
200
- ```
141
+ ## Image Input Formats
201
142
 
202
- Available templates:
203
- - `general-description` - General image description
204
- - `ui-analysis` - UI prototype and interface analysis
205
- - `object-detection` - Object detection and localization
206
- - `ocr` - Text extraction (OCR)
207
- - `structured-extraction` - Structured data extraction
143
+ Three input types are supported:
208
144
 
209
- ### 3. Get Config
145
+ 1) URL
210
146
 
211
- Get current model configuration:
212
-
213
- ```javascript
214
- // Tool: get_config
215
- // Parameters: none
147
+ ```text
148
+ https://example.com/image.png
216
149
  ```
217
150
 
218
- ## Image Input Formats
151
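+ Strict validation is enabled by default: the URL must end with `.jpg/.jpeg/.png/.webp`, otherwise an error is raised. Set `VISION_STRICT_URL_VALIDATION=false` to relax this (a warning is emitted instead).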
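The strict/relaxed behavior could be implemented roughly as follows. A minimal sketch, assuming the extension is checked against the URL path (so a query string such as `?v=2` does not hide it) and that only `http:` and `https:` URLs are accepted; the README only says the URL must end with one of these extensions, so the real check may be stricter or looser. `checkImageUrl` is a hypothetical name.

```typescript
// Hypothetical URL check mirroring VISION_STRICT_URL_VALIDATION.
const IMAGE_EXTENSIONS = [".jpg", ".jpeg", ".png", ".webp"];

interface UrlCheck {
  ok: boolean;
  warning?: string;
}

function checkImageUrl(raw: string, strict: boolean): UrlCheck {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return { ok: false, warning: `Not a valid URL: ${raw}` };
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    return { ok: false, warning: `Unsupported protocol: ${url.protocol}` };
  }
  // Compare against the path only, case-insensitively.
  const path = url.pathname.toLowerCase();
  if (IMAGE_EXTENSIONS.some((ext) => path.endsWith(ext))) return { ok: true };
  const warning = `URL does not end with ${IMAGE_EXTENSIONS.join("/")}: ${raw}`;
  // Strict mode rejects; relaxed mode accepts but reports the warning.
  return strict ? { ok: false, warning } : { ok: true, warning };
}
```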
219
152
 
220
- ### 1. URL
153
+ 2) Base64 Data URL
221
154
 
222
- ```
223
- https://example.com/image.jpg
155
+ ```text
156
+ data:image/png;base64,iVBORw0KGgo...
224
157
  ```
225
158
 
226
- ### 2. Base64 Data URL
159
+ Supported MIME types: `image/jpeg` / `image/jpg` / `image/png` / `image/webp`.
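Splitting a data URL into its MIME type and payload, while rejecting MIME types outside this list, might look like this. `parseImageDataUrl` is a hypothetical helper, and the strict base64 character check is an assumption about how malformed input should be treated.

```typescript
// Hypothetical parser; accepts only the MIME types listed above.
const SUPPORTED_MIME_TYPES = new Set(["image/jpeg", "image/jpg", "image/png", "image/webp"]);

interface ParsedDataUrl {
  mimeType: string;
  base64: string;
}

function parseImageDataUrl(input: string): ParsedDataUrl {
  // data:<mime>;base64,<payload> with standard base64 characters and padding.
  const match = /^data:([^;,]+);base64,([A-Za-z0-9+/]+={0,2})$/.exec(input.trim());
  if (!match) throw new Error("Not a base64 data URL");
  const mimeType = match[1].toLowerCase();
  if (!SUPPORTED_MIME_TYPES.has(mimeType)) {
    throw new Error(`Unsupported image MIME type: ${mimeType}`);
  }
  return { mimeType, base64: match[2] };
}
```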
227
160
 
228
- ```
229
- data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD...
230
- ```
231
-
232
- ### 3. Local File Path
161
+ 3) Local file path
233
162
 
234
- ```
235
- /path/to/image.png
236
- ./relative/path/image.jpg
163
+ ```text
164
+ ./test/image.png
165
+ D:\\path\\to\\image.jpg
237
166
  ```
238
167
 
239
- Note: Local paths only work if the MCP server has access to the filesystem.
240
- Note: URL validation is strict by default (see `VISION_STRICT_URL_VALIDATION`).
168
+ The MCP Server process must have read access to the path; only `.jpg/.jpeg/.png/.webp` files are supported.
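Telling the three input forms apart, and deriving a MIME type for a local file from its extension, could be sketched as follows. The helper names are hypothetical, and the sketch deliberately stops short of reading the file, which in Node.js would typically be done with `fs.promises.readFile`.

```typescript
// Hypothetical classifier for the three supported input forms.
type ImageInputKind = "url" | "data-url" | "local-path";

const EXTENSION_MIME: Record<string, string> = {
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".png": "image/png",
  ".webp": "image/webp",
};

function classifyImageInput(input: string): ImageInputKind {
  const value = input.trim();
  if (/^data:/i.test(value)) return "data-url";
  if (/^https?:\/\//i.test(value)) return "url";
  return "local-path"; // e.g. ./test/image.png or D:\path\to\image.jpg
}

// Derives the MIME type from the file name (POSIX or Windows separators).
function mimeTypeForLocalPath(path: string): string {
  const fileName = path.split(/[\\/]/).pop() ?? "";
  const dot = fileName.lastIndexOf(".");
  const ext = dot >= 0 ? fileName.slice(dot).toLowerCase() : "";
  const mime = EXTENSION_MIME[ext];
  if (!mime) throw new Error(`Unsupported local image file: ${path}`);
  return mime;
}
```

Checking for `http(s)://` explicitly, rather than trying `new URL(...)` first, keeps a Windows drive letter such as `D:` from being mistaken for a URL scheme.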
241
169
 
242
- ## Environment Variables
170
+ Note: the Gemini provider does not accept image URLs directly, so the Gemini adapter downloads the URL and converts it to base64 (subject to size and timeout limits).
243
171
 
244
- | Variable | Description | Default | Required |
245
- |----------|-------------|---------|----------|
246
- | `VISION_MODEL_TYPE` | Model type: `glm` (alias for `glm-4.6v`), `glm-4.6v`, `siliconflow`, or `modelscope` | - | Yes |
247
- | `VISION_MODEL_NAME` | Model name for the API | See defaults below | Yes |
248
- | `VISION_API_BASE_URL` | API base URL (must be base path, no `/chat/completions`) | See defaults below | Yes |
249
- | `VISION_API_KEY` | API key for authentication | - | Yes |
250
- | `VISION_API_TIMEOUT` | Request timeout in milliseconds | 60000 | No |
251
- | `VISION_MAX_RETRIES` | Maximum retry attempts | 2 | No |
252
- | `VISION_STRICT_URL_VALIDATION` | Enforce strict image URL validation | `true` | No |
253
- | `LOG_LEVEL` | Log level: `debug`, `info`, `warn`, `error` | `info` | No |
172
+ ## About Streaming Responses
254
173
 
255
- **Notes**:
256
- - `VISION_STRICT_URL_VALIDATION` defaults to `true`, enforcing strict validation that URLs must end with supported image extensions (`.jpg`, `.jpeg`, `.png`, `.webp`). Set to `false` to allow non-image URLs with a warning only.
257
- - For GLM-4.6V provider, both `glm` and `glm-4.6v` values work for `VISION_MODEL_TYPE`. `glm` is provided as a convenient alias.
174
+ All adapters force `stream: false` and parse the response as one complete JSON body.
258
175
 
259
- ### Model Defaults
176
+ Upstreams that only support SSE / `text/event-stream` are currently not supported (a streaming parser would have to be implemented).
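For the OpenAI-compatible providers, a non-streaming request body and a guard against SSE responses might look like the sketch below. The `image_url` content part follows the OpenAI Chat Completions vision format; the helper names and the order of image and text parts are assumptions rather than the adapters' confirmed behavior.

```typescript
// Hypothetical request builder for OpenAI-compatible /chat/completions endpoints.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface ChatMessage {
  role: "system" | "user";
  content: string | ContentPart[];
}

function buildChatBody(model: string, image: string, prompt: string, systemPrompt?: string) {
  const messages: ChatMessage[] = [];
  if (systemPrompt) messages.push({ role: "system", content: systemPrompt });
  messages.push({
    role: "user",
    content: [
      { type: "image_url", image_url: { url: image } }, // https URL or data URL
      { type: "text", text: prompt },
    ],
  });
  return { model, messages, stream: false }; // always request one complete JSON body
}

// Rejects event-stream responses, since only complete JSON bodies are parsed.
function assertJsonResponse(contentType: string | null): void {
  if (contentType?.toLowerCase().includes("text/event-stream")) {
    throw new Error("Streaming (SSE) responses are not supported; expected JSON");
  }
}
```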
260
177
 
261
- **GLM-4.6V:**
262
- ```bash
263
- VISION_MODEL_NAME=glm-4.6v
264
- VISION_API_BASE_URL=https://open.bigmodel.cn/api/paas/v4
265
- ```
178
+ ## Development and Testing
266
179
 
267
- **SiliconFlow:**
268
180
  ```bash
269
- VISION_MODEL_NAME=Qwen/Qwen2-VL-72B-Instruct
270
- VISION_API_BASE_URL=https://api.siliconflow.cn/v1
181
+ cd mcp/vision_mcp
182
+ npm install
183
+ npm run build
271
184
  ```
272
185
 
273
- ## API Keys
274
-
275
- ### GLM-4.6V
276
-
277
- Get your API key from: [智谱 AI 开放平台](https://open.bigmodel.cn/)
278
-
279
- Format: `xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.xxxxxxxxxxxxxxxxxxxx`
280
-
281
- ### SiliconFlow
282
-
283
- Get your API key from: [SiliconFlow](https://cloud.siliconflow.cn/)
284
-
285
- Format: `sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx`
286
-
287
- ## MCP Protocol Note
186
+ Tests:
288
187
 
289
- **IMPORTANT**: This is a STDIO-based MCP Server. According to MCP protocol:
188
+ - Run unit tests only (no API key required):
290
189
 
291
- - **DO NOT** use `console.log()` or write to stdout
292
- - **USE ONLY** `console.error()` for logging (stderr)
293
- - stdout is reserved for JSON-RPC communication
294
-
295
- The server handles this automatically. If you fork this project, ensure you follow this rule.
296
-
297
- ## Development
298
-
299
- ### Project Structure
300
-
301
- ```
302
- vision_mcp/
303
- ├── src/
304
- │ ├── index.ts # MCP Server entry point
305
- │ ├── config/
306
- │ │ └── model-config.ts # Configuration management
307
- │ ├── tools/
308
- │ │ └── vision-tool.ts # Vision analysis tool
309
- │ ├── adapters/
310
- │ │ ├── base-adapter.ts # Base adapter class
311
- │ │ ├── glm-adapter.ts # GLM-4.6V adapter
312
- │ │ └── siliconflow-adapter.ts # SiliconFlow adapter
313
- │ ├── prompts/
314
- │ │ └── system.ts # System prompt templates
315
- │ └── utils/
316
- │ ├── errors.ts # Error handling
317
- │ ├── logger.ts # Logging utilities
318
- │ └── image-input.ts # Image input normalization
319
- ├── package.json
320
- ├── tsconfig.json
321
- └── README.md
190
+ ```bash
191
+ npm run test:unit
322
192
  ```
323
193
 
324
- ### Building
194
+ - Run integration tests (requires the `VISION_*` environment variables to be configured):
325
195
 
326
196
  ```bash
327
- # Install dependencies
328
- npm install
329
-
330
- # Build TypeScript
331
- npm run build
332
-
333
- # Run tests
334
197
  npm test
335
198
  ```
336
199
 
337
- ### Testing Notes
338
-
339
- - `npm test` uses `VISION_API_KEY` (default) or provider-specific keys in the test script:
340
- - `SILICONFLOW_API_KEY`
341
- - `GLM_API_KEY`
342
- - If no API key is set, the tests will exit with a clear error message.
343
-
344
- ## Troubleshooting
345
-
346
- ### 1. "Failed to load model configuration"
347
-
348
- - Check all required environment variables are set
349
- - Verify `VISION_MODEL_TYPE` is either `glm-4.6v` or `siliconflow`
350
-
351
- ### 2. "API Key not found"
352
-
353
- - Set `VISION_API_KEY` in your environment
354
- - Verify the API key format matches the model requirements
355
-
356
- ### 3. "Connection timeout"
357
-
358
- - Increase `VISION_API_TIMEOUT` value
359
- - Check network connectivity to the API endpoint
360
- - Verify API endpoint URL is correct
361
-
362
- ### 4. "Invalid image URL"
363
-
364
- - Ensure URL is publicly accessible
365
- - Check URL format (http:// or https://)
366
- - Verify image format is supported
367
-
368
- ### 5. "Permission denied reading file"
369
-
370
- - MCP server needs filesystem access for local files
371
- - Use absolute paths or ensure relative paths are accessible
372
- - Check file permissions
373
-
374
- ### 6. "Invalid API endpoint" or "404 Not Found"
375
-
376
- - Ensure `VISION_API_BASE_URL` is the base path only, without `/chat/completions`
377
- - Correct: `https://api.siliconflow.cn/v1`
378
- - Incorrect: `https://api.siliconflow.cn/v1/chat/completions`
379
- - Check the error details for the full request URL to diagnose endpoint issues
380
-
381
- ## Security Notes
382
-
383
- - API keys are loaded from environment variables, never hardcoded
384
- - API keys are masked in logs
385
- - Images are not persisted by default
386
- - MCP server should run in trusted environments only (no built-in auth)
387
- - **Thinking/Reasoning Content Filtering**: Model thinking/reasoning content is automatically filtered from responses to prevent exposing internal reasoning to MCP clients. This filtering is unconditional and applied to all supported models regardless of configuration.
388
-
389
- ## Security Best Practices
390
-
391
- ⚠️ **IMPORTANT**: Never commit API keys or credentials to the repository!
200
+ ## Troubleshooting
392
201
 
393
- - **Use environment variables** for sensitive data (`.env` file)
394
- - **Keep local test credentials** in `.gitignore`'d files (e.g., `test_key.local.md`)
395
- - **Rotate keys immediately** if accidentally exposed or committed
396
- - **See** `doc/test_key.example.md` for test setup template
397
- - **Never** copy real API keys into documentation, code comments, or issue trackers
202
+ ### 1) Configuration fails to load: `Missing VISION_MODEL_TYPE` / `Unsupported model type`
398
203
 
399
- **Key Protection Checklist**:
400
- - [ ] `.env` is in `.gitignore`
401
- - [ ] `.env.local` is in `.gitignore`
402
- - [ ] No real keys in `test_key.md` (use `test_key.example.md` instead)
403
- - [ ] No keys in documentation or comments
404
- - [ ] Review git history for accidental key commits (`git log --all --full-history -S --source --all -- "*secret*" "*key*" "*password*" "test_key.md"`)
204
+ - Make sure `VISION_MODEL_TYPE` is set
205
+ - Valid values: `glm` / `glm-4.6v` / `siliconflow` / `modelscope` / `openai` / `claude` / `gemini`
405
206
 
406
- ## License
207
+ ### 2) `Missing VISION_API_KEY`
407
208
 
408
- MIT
209
+ - Make sure `VISION_API_KEY` is set (in `.env` or in the MCP client's `env` block)
409
210
 
410
- ## Contributing
211
+ ### 3) 404 / endpoint errors
411
212
 
412
- 1. Fork the repository
413
- 2. Create a feature branch
414
- 3. Make your changes
415
- 4. Add tests
416
- 5. Submit a pull request
213
+ - `VISION_API_BASE_URL` must be the base URL only, without a specific endpoint
214
+ - OpenAI / SiliconFlow / ModelScope: `/chat/completions` is appended automatically
215
+ - Claude: `/v1/messages` is appended automatically (do not write `baseUrl` as `.../v1`)
216
+ - Gemini: `/{apiVersion}/models/{model}:generateContent` is appended automatically
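These concatenation rules fit in one small function, which also shows why a Claude `baseUrl` ending in `/v1` yields a 404: the result would contain `/v1/v1/messages`. `buildEndpoint` is hypothetical, and applying the `/chat/completions` rule to GLM is an assumption (the list above does not mention GLM).

```typescript
// Hypothetical helper mirroring the endpoint rules listed above.
type Provider = "glm" | "glm-4.6v" | "siliconflow" | "modelscope" | "openai" | "claude" | "gemini";

function buildEndpoint(provider: Provider, baseUrl: string, model: string, apiVersion = "v1beta"): string {
  const base = baseUrl.replace(/\/+$/, ""); // tolerate trailing slashes
  switch (provider) {
    case "claude":
      return `${base}/v1/messages`; // so baseUrl must not already end in /v1
    case "gemini":
      return `${base}/${apiVersion}/models/${model}:generateContent`;
    default:
      // OpenAI-compatible providers (GLM is assumed to follow the same rule).
      return `${base}/chat/completions`;
  }
}
```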
417
217
 
418
- ## Support
218
+ ### 4) Image URL validation fails
419
219
 
420
- For issues and questions:
421
- - Open an issue on the repository
422
- - Check model documentation:
423
- - [GLM-4.6V Docs](https://docs.bigmodel.cn/)
424
- - [SiliconFlow Docs](https://docs.siliconflow.cn/)
220
+ - By default, the URL must end with `.jpg/.jpeg/.png/.webp`
221
+ - To relax the check, set `VISION_STRICT_URL_VALIDATION=false`
425
222
 
223
+ ## Security Notes
426
224
 
427
- ## TODO
428
- - [ ] 适配modelscope的视觉模型接口请求:https://www.modelscope.cn/docs/model-service/API-Inference/intro
225
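+ - Never write logs to stdout (it is reserved for MCP JSON-RPC); this project sends all logs to stderr
226
+ - API keys are masked in logs
227
+ - Thinking/reasoning content returned by the model is filtered out unconditionally, to avoid exposing internal reasoning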
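The three rules above can be illustrated with small helpers. This is a sketch, not the project's logger. The masking format (first and last four characters) is an assumption, and so is the shape of the reasoning content, which is assumed to arrive either as `<think>...</think>` blocks inside the answer or in a separate `reasoning_content` field, as some OpenAI-compatible APIs return it.

```typescript
// Hypothetical helpers illustrating the security rules above.

// Keeps only a short prefix/suffix of a secret so logs stay useful but safe.
function maskApiKey(key: string): string {
  if (key.length <= 8) return "***";
  return `${key.slice(0, 4)}***${key.slice(-4)}`;
}

// Assumed message shape: reasoning may sit in reasoning_content or in <think> tags.
interface AssistantMessage {
  content?: string | null;
  reasoning_content?: string | null;
}

function visibleAnswer(message: AssistantMessage): string {
  // reasoning_content is deliberately ignored; <think> blocks are stripped from content.
  return (message.content ?? "").replace(/<think>[\s\S]*?<\/think>/gi, "").trim();
}

// Logs go to stderr via console.error; stdout stays reserved for JSON-RPC.
function logInfo(message: string, apiKey?: string): void {
  const safe = apiKey ? message.split(apiKey).join(maskApiKey(apiKey)) : message;
  console.error(`[info] ${safe}`);
}
```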
@@ -0,0 +1,73 @@
1
+ /**
2
+ * Gemini Adapter
3
+ *
4
+ * @description Gemini generateContent API adapter implementation, supporting Google Gemini multimodal vision models
5
+ * @see https://ai.google.dev/api/rest/v1beta/models/generateContent
6
+ */
7
+ import { BaseVisionModelAdapter, VisionModelResponse } from './base-adapter.js';
8
+ import { ModelConfig } from '../config/model-config.js';
9
+ export interface GeminiAdapterOptions {
10
+ apiVersion?: string;
11
+ authMode?: 'bearer' | 'x-goog' | 'query';
12
+ imagePartMode?: 'inline_data' | 'inline_bytes';
13
+ maxTokens?: number;
14
+ }
15
+ export declare class GeminiAdapter extends BaseVisionModelAdapter {
16
+ private options;
17
+ constructor(config: ModelConfig, options?: GeminiAdapterOptions);
18
+ analyze(imageData: string, prompt: string): Promise<string>;
19
+ analyzeWithResponse(imageData: string, prompt: string): Promise<VisionModelResponse>;
20
+ private callGeminiAPI;
21
+ /**
22
+ * Builds the API URL
23
+ * Format: {baseUrl}/{apiVersion}/models/{model}:generateContent
24
+ */
25
+ private buildApiUrl;
26
+ /**
27
+ * Builds the authentication headers
28
+ */
29
+ private buildAuthHeaders;
30
+ /**
31
+ * 构建请求体
32
+ */
33
+ private buildRequest;
34
+ /**
35
+ * Builds the image part
36
+ * Supports three input formats:
37
+ * 1. HTTP(S) URL - downloaded and converted to base64
38
+ * 2. Data URL - parsed directly
39
+ * 3. Base64 string - assumed to be an image
40
+ */
41
+ private buildImagePart;
42
+ /**
43
+ * 创建 inline_data part
44
+ */
45
+ private createInlineDataPart;
46
+ /**
47
+ * Sanitizes a URL for logging (masks sensitive query parameters)
48
+ * @param url - The original URL
49
+ * @returns The sanitized URL (in query auth mode, the key is replaced with ***)
50
+ */
51
+ private sanitizeUrl;
52
+ /**
53
+ * 下载图片并转换为 base64
54
+ * Gemini 不支持直接的 URL 输入,必须先下载
55
+ */
56
+ private downloadImageAsBase64;
57
+ /**
58
+ * 处理错误响应
59
+ */
60
+ private handleErrorResponse;
61
+ /**
62
+ * Parses the response data
63
+ */
64
+ private parseResponseData;
65
+ /**
66
+ * Normalizes a Gemini response into a VisionModelResponse
67
+ * Supports two response formats:
68
+ * 1. Official format: candidates[0].content.parts[].text
69
+ * 2. Proxy format: candidates[0].output.parts[].text
70
+ */
71
+ private normalizeResponse;
72
+ }
73
+ //# sourceMappingURL=gemini-adapter.d.ts.map
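Possible bodies for three of the private helpers declared above (`createInlineDataPart`, `normalizeResponse`, `sanitizeUrl`) are sketched below. These are illustrations, not the compiled implementation; the standalone function names and types are hypothetical. The `inline_data`/`mime_type` field names and the `thought` flag on parts come from the public Gemini API, the `output` shape comes from the doc comment above, and the `inline_bytes` image-part mode is not covered.

```typescript
// Hypothetical counterparts of the private helpers declared above.
interface GeminiPart {
  text?: string;
  thought?: boolean; // Gemini marks thought parts with thought: true
}

interface GeminiCandidate {
  content?: { parts?: GeminiPart[] }; // official format
  output?: { parts?: GeminiPart[] }; // proxy format, per the doc comment
}

interface GeminiResponse {
  candidates?: GeminiCandidate[];
}

// createInlineDataPart: REST field names for inline image bytes.
function createInlineDataPart(mimeType: string, base64: string) {
  return { inline_data: { mime_type: mimeType, data: base64 } };
}

// normalizeResponse: read candidates[0] in either shape, skipping thought parts.
function extractGeminiText(response: GeminiResponse): string {
  const candidate = response.candidates?.[0];
  const parts = candidate?.content?.parts ?? candidate?.output?.parts ?? [];
  const text = parts
    .filter((part) => part.thought !== true)
    .map((part) => part.text ?? "")
    .join("");
  if (!text) throw new Error("Gemini response contained no text parts");
  return text;
}

// sanitizeUrl: mask the `key` query parameter used by the query auth mode.
function sanitizeUrl(raw: string): string {
  const url = new URL(raw);
  if (url.searchParams.has("key")) url.searchParams.set("key", "***");
  return url.toString();
}
```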
@@ -0,0 +1 @@
1
+ {"version":3,"file":"gemini-adapter.d.ts","sourceRoot":"","sources":["../../src/adapters/gemini-adapter.ts"],"names":[],"mappings":"AAAA;;;;;GAKG;AAEH,OAAO,EAAE,sBAAsB,EAAE,mBAAmB,EAAE,MAAM,mBAAmB,CAAC;AAChF,OAAO,EAAE,WAAW,EAAE,MAAM,2BAA2B,CAAC;AAMxD,MAAM,WAAW,oBAAoB;IACnC,UAAU,CAAC,EAAE,MAAM,CAAC;IACpB,QAAQ,CAAC,EAAE,QAAQ,GAAG,QAAQ,GAAG,OAAO,CAAC;IACzC,aAAa,CAAC,EAAE,aAAa,GAAG,cAAc,CAAC;IAC/C,SAAS,CAAC,EAAE,MAAM,CAAC;CACpB;AAOD,qBAAa,aAAc,SAAQ,sBAAsB;IACvD,OAAO,CAAC,OAAO,CAAiC;gBAEpC,MAAM,EAAE,WAAW,EAAE,OAAO,GAAE,oBAAyB;IAgC7D,OAAO,CAAC,SAAS,EAAE,MAAM,EAAE,MAAM,EAAE,MAAM,GAAG,OAAO,CAAC,MAAM,CAAC;IAyB3D,mBAAmB,CAAC,SAAS,EAAE,MAAM,EAAE,MAAM,EAAE,MAAM,GAAG,OAAO,CAAC,mBAAmB,CAAC;YA0B5E,aAAa;IAuD3B;;;OAGG;IACH,OAAO,CAAC,WAAW;IAcnB;;OAEG;IACH,OAAO,CAAC,gBAAgB;IAcxB;;OAEG;YACW,YAAY;IAmB1B;;;;;;OAMG;YACW,cAAc;IAmB5B;;OAEG;IACH,OAAO,CAAC,oBAAoB;IAiB5B;;;;OAIG;IACH,OAAO,CAAC,WAAW;IAOnB;;;OAGG;YACW,qBAAqB;IAuEnC;;OAEG;YACW,mBAAmB;IAgCjC;;OAEG;YACW,iBAAiB;IAiC/B;;;;;OAKG;IACH,OAAO,CAAC,iBAAiB;CAsE1B"}