@lutery/vision-mcp 1.0.0 → 1.0.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +136 -337
- package/dist/adapters/gemini-adapter.d.ts +73 -0
- package/dist/adapters/gemini-adapter.d.ts.map +1 -0
- package/dist/adapters/gemini-adapter.js +406 -0
- package/dist/adapters/gemini-adapter.js.map +1 -0
- package/dist/providers/provider-registry.d.ts.map +1 -1
- package/dist/providers/provider-registry.js +23 -1
- package/dist/providers/provider-registry.js.map +1 -1
- package/dist/utils/errors.d.ts +12 -0
- package/dist/utils/errors.d.ts.map +1 -1
- package/dist/utils/errors.js +44 -1
- package/dist/utils/errors.js.map +1 -1
- package/dist/utils/logger.d.ts +13 -1
- package/dist/utils/logger.d.ts.map +1 -1
- package/dist/utils/logger.js +56 -9
- package/dist/utils/logger.js.map +1 -1
- package/dist/utils/thinking-extractors.d.ts +11 -0
- package/dist/utils/thinking-extractors.d.ts.map +1 -1
- package/dist/utils/thinking-extractors.js +47 -0
- package/dist/utils/thinking-extractors.js.map +1 -1
- package/dist/utils/thinking-filter.d.ts.map +1 -1
- package/dist/utils/thinking-filter.js +2 -1
- package/dist/utils/thinking-filter.js.map +1 -1
- package/package.json +2 -2
package/README.md
CHANGED
@@ -1,428 +1,227 @@
 # Vision MCP
 
-
+A STDIO-based MCP server that gives LLMs without vision capabilities (or whose vision models are too costly) a unified image-analysis capability. Switching providers through environment variables lets you use multimodal models from different platforms and vendors.
 
-##
+## Supported Models / Providers
 
-
-- 🖼️ **Flexible Image Input**: URL, base64 data URL, or local file paths
-- 📊 **Multiple Analysis Types**: Image description, UI analysis, object detection, OCR, and structured extraction
-- 🔧 **System Prompt Templates**: Built-in templates for common vision tasks
-- 📦 **Easy Deployment**: STDIO MCP Server, runs with npx
-- 🔒 **Secure**: Environment-based configuration, sensitive data masking in logs
+Select the provider with `VISION_MODEL_TYPE`:
 
-
+| type | Provider | Default `VISION_API_BASE_URL` | Default `VISION_MODEL_NAME` | Notes |
+|------|----------|----------------------------|--------------------------|------|
+| `glm-4.6v` | Zhipu GLM-4.6V | `https://open.bigmodel.cn/api/paas/v4` | `glm-4.6v` | `glm` is an alias (equivalent to `glm-4.6v`) |
+| `glm` | GLM-4.6V (alias) | `https://open.bigmodel.cn/api/paas/v4` | `glm-4.6v` | Kept for compatibility with older configs |
+| `siliconflow` | SiliconFlow (OpenAI-compatible) | `https://api.siliconflow.cn/v1` | `Qwen/Qwen2-VL-72B-Instruct` | Wide choice of vision models |
+| `modelscope` | ModelScope API-Inference (OpenAI-compatible) | `https://api-inference.modelscope.cn/v1` | `ZhipuAI/GLM-4.6V` | Requires real-name verification and Aliyun account binding; usage quotas apply |
+| `openai` | OpenAI | `https://api.openai.com/v1` | `gpt-4o` | Uses Chat Completions |
+| `claude` | Anthropic Claude (Messages API) | `https://api.anthropic.com` | `claude-3-5-sonnet-20241022` | Do not append `/v1` to `baseUrl` |
+| `gemini` | Google Gemini (generateContent API) | `https://api.gptsapi.net` | `gemini-2.0-flash-exp` | Defaults to a proxy address; can be changed to the official endpoint or a self-hosted gateway |
 
-
+Get an API key / token from each platform's console:
 
-
+- GLM (Zhipu): https://open.bigmodel.cn/
+- SiliconFlow: https://cloud.siliconflow.cn/
+- ModelScope: https://modelscope.cn/my/myaccesstoken
+- OpenAI: https://platform.openai.com/
+- Claude (Anthropic): https://console.anthropic.com/
+- Gemini (Google AI): https://ai.google.dev/
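
The provider table pairs each `VISION_MODEL_TYPE` value with a default base URL and model name, and the README's troubleshooting entries name the `Missing VISION_MODEL_TYPE`, `Unsupported model type` and `Missing VISION_API_KEY` errors. A minimal TypeScript sketch of such a lookup follows; `PROVIDER_DEFAULTS`, `TYPE_ALIASES` and `resolveModelConfig` are hypothetical names rather than the package's real code, and the error wording is assumed from those entries.

```typescript
// Hypothetical sketch: resolve the effective provider settings from VISION_* variables.
// Defaults mirror the README's provider table; all names here are illustrative only.
interface ProviderDefaults {
  baseUrl: string;
  modelName: string;
}

const PROVIDER_DEFAULTS = new Map<string, ProviderDefaults>([
  ["glm-4.6v", { baseUrl: "https://open.bigmodel.cn/api/paas/v4", modelName: "glm-4.6v" }],
  ["siliconflow", { baseUrl: "https://api.siliconflow.cn/v1", modelName: "Qwen/Qwen2-VL-72B-Instruct" }],
  ["modelscope", { baseUrl: "https://api-inference.modelscope.cn/v1", modelName: "ZhipuAI/GLM-4.6V" }],
  ["openai", { baseUrl: "https://api.openai.com/v1", modelName: "gpt-4o" }],
  ["claude", { baseUrl: "https://api.anthropic.com", modelName: "claude-3-5-sonnet-20241022" }],
  ["gemini", { baseUrl: "https://api.gptsapi.net", modelName: "gemini-2.0-flash-exp" }],
]);

// `glm` is documented as an alias for `glm-4.6v`.
const TYPE_ALIASES = new Map<string, string>([["glm", "glm-4.6v"]]);

interface ResolvedModelConfig {
  type: string;
  apiKey: string;
  modelName: string;
  baseUrl: string;
}

function resolveModelConfig(env: Record<string, string | undefined>): ResolvedModelConfig {
  const rawType = env.VISION_MODEL_TYPE?.trim();
  if (!rawType) throw new Error("Missing VISION_MODEL_TYPE");
  const type = TYPE_ALIASES.get(rawType) ?? rawType;
  const defaults = PROVIDER_DEFAULTS.get(type);
  if (!defaults) throw new Error(`Unsupported model type: ${rawType}`);
  const apiKey = env.VISION_API_KEY?.trim();
  if (!apiKey) throw new Error("Missing VISION_API_KEY");
  return {
    type,
    apiKey,
    // Empty or unset overrides fall back to the provider defaults.
    modelName: env.VISION_MODEL_NAME || defaults.modelName,
    baseUrl: env.VISION_API_BASE_URL || defaults.baseUrl,
  };
}
```

A `Map` is used instead of a plain object so that a value such as `toString` cannot accidentally match an inherited property.
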
 
-##
+## Features
 
-
+- Switch providers in one step (only environment variables change)
+- Image input: URL / base64 data URL / local file path
+- Built-in system prompt templates: UI analysis, OCR, object detection, structured extraction, and more
+- Security: API keys are masked in logs automatically, and thinking/reasoning content returned by the model is filtered out
+- Strict MCP compliance: stdout carries only JSON-RPC; logs go to stderr
 
-
-2. Install dependencies:
+## Installation and Running
 
-
-cd vision_mcp
-npm install
-```
-
-### Configuration
+Requirements: Node.js >= 18
 
-
+### Launching as an NPM package from an MCP client (recommended)
 
-
+In your MCP client (e.g. Claude Desktop), set the command to `npx`:
 
-```
-
-
-
-
+```json
+{
+  "mcpServers": {
+    "vision-mcp": {
+      "command": "npx",
+      "args": ["-y", "@lutery/vision-mcp"],
+      "env": {
+        "VISION_MODEL_TYPE": "siliconflow",
+        "VISION_API_KEY": "sk-your-key",
+        "VISION_MODEL_NAME": "Qwen/Qwen2-VL-72B-Instruct",
+        "VISION_API_BASE_URL": "https://api.siliconflow.cn/v1"
+      }
+    }
+  }
+}
 ```
 
-
-
-
-VISION_MODEL_TYPE=siliconflow
-VISION_MODEL_NAME=Qwen/Qwen2-VL-72B-Instruct
-VISION_API_BASE_URL=https://api.siliconflow.cn/v1
-VISION_API_KEY=your-siliconflow-api-key
-```
+Notes:
+- `VISION_MODEL_NAME` / `VISION_API_BASE_URL` can be omitted (the provider's defaults are used)
+- For the full set of configuration options, see `.env.example`
 
-
+Alternatively, install the package globally and run the executable directly (the `bin` name is `vision-mcp`):
 
 ```bash
-
-
-VISION_API_BASE_URL=https://api-inference.modelscope.cn/v1
-VISION_API_KEY=your-modelscope-token
+npm i -g @lutery/vision-mcp
+vision-mcp
 ```
 
-
-- Real-name authentication on your ModelScope account
-- Aliyun account binding
-- API usage limits apply (see [API Limits](https://www.modelscope.cn/docs/model-service/API-Inference/limits))
-
-### Build
+### Running locally for development
 
 ```bash
+cd mcp/vision_mcp
+npm install
 npm run build
-```
-
-### Run (local)
-
-```bash
 node dist/index.js
 ```
 
-
-
-### Run (npx)
-
-```bash
-# Local package (requires build first)
-npx .
-
-# Published package
-npx -y @lutery/vision-mcp
-```
-
-## MCP Client Configuration
+Once started, the server writes `Vision MCP Server is running on stdio` to stderr.
 
-
+## Configuration (environment variables)
 
-
+Minimum required:
 
-
-
-  "mcpServers": {
-    "vision-mcp": {
-      "command": "npx",
-      "args": ["-y", "@lutery/vision-mcp"],
-      "env": {
-        "VISION_MODEL_TYPE": "glm-4.6v",
-        "VISION_MODEL_NAME": "glm-4.6v",
-        "VISION_API_BASE_URL": "https://open.bigmodel.cn/api/paas/v4",
-        "VISION_API_KEY": "your-api-key"
-      }
-    }
-  }
-}
-```
+- `VISION_MODEL_TYPE`: selects the provider
+- `VISION_API_KEY`: the key/token for that provider
 
-
+Common optional settings:
 
-
-
-
-
-
-
-
-
-        "VISION_MODEL_NAME": "glm-4.6v",
-        "VISION_API_BASE_URL": "https://open.bigmodel.cn/api/paas/v4",
-        "VISION_API_KEY": "your-api-key"
-      }
-    }
-  }
-}
-```
+| Variable | Description | Default |
+|------|------|------|
+| `VISION_MODEL_NAME` | Model name | Built-in default for each provider |
+| `VISION_API_BASE_URL` | API base URL (without a specific endpoint) | Built-in default for each provider |
+| `VISION_API_TIMEOUT` | Request timeout (milliseconds) | `60000` |
+| `VISION_MAX_RETRIES` | Maximum number of retries | `2` |
+| `VISION_STRICT_URL_VALIDATION` | Strictly require image URLs to end in `.jpg/.jpeg/.png/.webp` | `true` |
+| `LOG_LEVEL` | Log level: `debug`/`info`/`warn`/`error` | `info` |
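
The optional-settings table gives a default for each variable. A sketch of reading them with those defaults, assuming that an invalid value silently falls back to the default (the package may instead reject it); `parseOptionalSettings` and its helpers are hypothetical.

```typescript
// Hypothetical sketch: parse the optional VISION_* settings using the README defaults.
type LogLevel = "debug" | "info" | "warn" | "error";
const LOG_LEVELS: readonly LogLevel[] = ["debug", "info", "warn", "error"];

interface OptionalSettings {
  timeoutMs: number;
  maxRetries: number;
  strictUrlValidation: boolean;
  logLevel: LogLevel;
}

// Returns `fallback` when the value is missing or not a non-negative integer.
function parseNonNegativeInt(value: string | undefined, fallback: number): number {
  if (value === undefined || value.trim() === "") return fallback;
  const n = Number(value);
  return Number.isInteger(n) && n >= 0 ? n : fallback;
}

// Returns `value` only if it is one of `allowed`; otherwise returns `fallback`.
function pickOne<T extends string>(value: string | undefined, allowed: readonly T[], fallback: T): T {
  return allowed.find((a) => a === value) ?? fallback;
}

function parseOptionalSettings(env: Record<string, string | undefined>): OptionalSettings {
  return {
    timeoutMs: parseNonNegativeInt(env.VISION_API_TIMEOUT, 60000),
    maxRetries: parseNonNegativeInt(env.VISION_MAX_RETRIES, 2),
    // Anything other than an explicit "false" keeps strict URL validation on.
    strictUrlValidation: env.VISION_STRICT_URL_VALIDATION?.trim().toLowerCase() !== "false",
    logLevel: pickOne(env.LOG_LEVEL, LOG_LEVELS, "info"),
  };
}
```
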
 
-
+Provider-specific settings:
 
-
+- Claude
+  - `VISION_CLAUDE_API_VERSION`: Anthropic API version (default `2023-06-01`)
+- Gemini
+  - `VISION_GEMINI_API_VERSION`: `v1beta` / `v1` (default `v1beta`)
+  - `VISION_GEMINI_AUTH_MODE`: `bearer` / `x-goog` / `query` (default `bearer`)
+  - `VISION_GEMINI_IMAGE_PART_MODE`: `inline_data` / `inline_bytes` (default `inline_data`)
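
For `claude`, the `VISION_CLAUDE_API_VERSION` value (default `2023-06-01`) corresponds to the `anthropic-version` header of Anthropic's Messages API. A sketch of a non-streaming request under that assumption: the `x-api-key` and `anthropic-version` headers, the required `max_tokens` field and the base64 `image` content block follow Anthropic's public API, while `buildClaudeRequest`, the data-URL-only input and the 1024-token default are hypothetical.

```typescript
// Hypothetical sketch of a Claude Messages API request built from a base64 data URL.
type ClaudeContentBlock =
  | { type: "image"; source: { type: "base64"; media_type: string; data: string } }
  | { type: "text"; text: string };

interface ClaudeRequest {
  url: string;
  headers: Record<string, string>;
  body: {
    model: string;
    max_tokens: number;
    stream: false;
    messages: { role: "user"; content: ClaudeContentBlock[] }[];
  };
}

function buildClaudeRequest(
  baseUrl: string,
  apiKey: string,
  model: string,
  dataUrl: string,
  prompt: string,
  apiVersion = "2023-06-01",
  maxTokens = 1024,
): ClaudeRequest {
  const match = /^data:([^;,]+);base64,(.+)$/.exec(dataUrl);
  if (!match) throw new Error("Expected a base64 data URL");
  return {
    // baseUrl must not already end in /v1; the /v1/messages path is appended here.
    url: `${baseUrl.replace(/\/+$/, "")}/v1/messages`,
    headers: { "content-type": "application/json", "x-api-key": apiKey, "anthropic-version": apiVersion },
    body: {
      model,
      max_tokens: maxTokens, // required by the Messages API
      stream: false,
      messages: [
        {
          role: "user",
          content: [
            { type: "image", source: { type: "base64", media_type: match[1], data: match[2] } },
            { type: "text", text: prompt },
          ],
        },
      ],
    },
  };
}
```
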
 
-##
+## MCP Tools
 
-
+The server registers three tools:
 
-
+### 1) `analyze_image`
 
-
-// Tool: analyze_image
-// Parameters:
-{
-  "image": "https://example.com/image.jpg", // Image URL, base64, or local path
-  "prompt": "Describe this UI design in detail", // Analysis prompt
-  "output_format": "text", // Optional: "text" or "json"
-  "template": "ui-analysis" // Optional: see templates below
-}
-```
-
-#### Example Prompts
+Parameters:
 
-**UI Analysis:**
 ```json
 {
-  "image": "
-  "prompt": "
+  "image": "https://example.com/a.png",
+  "prompt": "Describe which components this interface contains",
+  "output_format": "text",
   "template": "ui-analysis"
 }
 ```
 
-
-
-
-
-
-  "template": "object-detection"
-}
-```
+Field reference:
+- `image`: URL / base64 data URL / local path
+- `prompt`: a description of the analysis task
+- `output_format`: `text` or `json` (a preference hint only; JSON output is not validated)
+- `template`: optional system template (see `list_templates` below)
 
-
-```json
-{
-  "image": "data:image/png;base64,iVBORw0KGgo...",
-  "prompt": "Extract all text from this image",
-  "template": "ocr"
-}
-```
+### 2) `list_templates`
 
-
-```json
-{
-  "image": "./form.jpg",
-  "prompt": "Extract all form fields and values as JSON",
-  "output_format": "json"
-}
-```
+Lists the built-in system prompt templates (including each template's id and purpose).
 
-###
+### 3) `get_config`
 
-
+Returns the currently active model configuration (with the API key masked).
 
-
-// Tool: list_templates
-// Parameters: none
-```
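
Over stdio, an MCP client invokes `analyze_image` with a JSON-RPC 2.0 `tools/call` request, one JSON message per line. A minimal sketch of building that message: the `method`, `params.name` and `params.arguments` fields follow the MCP specification, the argument fields come from the field reference above, and `analyzeImageRequest` / `frameForStdio` are hypothetical helpers.

```typescript
// Hypothetical sketch: the JSON-RPC message an MCP client would write to the server's stdin.
interface AnalyzeImageArgs {
  image: string;
  prompt: string;
  output_format?: "text" | "json";
  template?: string;
}

interface ToolsCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: "analyze_image"; arguments: AnalyzeImageArgs };
}

function analyzeImageRequest(id: number, args: AnalyzeImageArgs): ToolsCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name: "analyze_image", arguments: args } };
}

// JSON.stringify without indentation escapes newlines inside strings,
// so each message stays on exactly one line as the stdio transport requires.
function frameForStdio(message: object): string {
  return `${JSON.stringify(message)}\n`;
}
```
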
+## Image input format
 
-
-- `general-description` - General image description
-- `ui-analysis` - UI prototype and interface analysis
-- `object-detection` - Object detection and localization
-- `ocr` - Text extraction (OCR)
-- `structured-extraction` - Structured data extraction
+Three input forms are supported:
 
-
+1) URL
 
-
-
-```javascript
-// Tool: get_config
-// Parameters: none
+```text
+https://example.com/image.png
 ```
 
-
+Strict validation is on by default: the URL must end in `.jpg/.jpeg/.png/.webp`, otherwise an error is returned. Set `VISION_STRICT_URL_VALIDATION=false` to relax this (a warning is logged instead).
 
-
+2) Base64 Data URL
 
-```
-
+```text
+data:image/png;base64,iVBORw0KGgo...
 ```
 
-
+Supported MIME types: `image/jpeg` / `image/jpg` / `image/png` / `image/webp`.
 
-
-data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD...
-```
-
-### 3. Local File Path
+3) Local file path
 
-```
-/
-
+```text
+./test/image.png
+D:\\path\\to\\image.jpg
 ```
 
-
-Note: URL validation is strict by default (see `VISION_STRICT_URL_VALIDATION`).
+The MCP server process must be able to read the path; only `.jpg/.jpeg/.png/.webp` files are supported.
 
-
+Note: the Gemini provider does not accept image URLs directly, so the Gemini adapter downloads the URL and converts it to base64 (subject to size and timeout limits).
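
The three input forms and their checks can be sketched as one classifier. Everything here is illustrative: `classifyImageInput` is a hypothetical name, checking only the URL path (so a query string such as `?v=2` does not hide the extension) is an assumption, and actually reading the file or downloading the URL is left out.

```typescript
// Hypothetical sketch of classifying and validating the `image` argument.
const IMAGE_EXTENSIONS = [".jpg", ".jpeg", ".png", ".webp"];
const DATA_URL_MIME_TYPES = ["image/jpeg", "image/jpg", "image/png", "image/webp"];

type ImageInput =
  | { kind: "url"; url: string; warnings: string[] }
  | { kind: "data-url"; mimeType: string; base64: string; warnings: string[] }
  | { kind: "file"; path: string; warnings: string[] };

function hasImageExtension(pathOrUrl: string): boolean {
  const lower = pathOrUrl.toLowerCase();
  return IMAGE_EXTENSIONS.some((ext) => lower.endsWith(ext));
}

function classifyImageInput(image: string, strictUrlValidation = true): ImageInput {
  const trimmed = image.trim();

  if (/^https?:\/\//i.test(trimmed)) {
    const { pathname } = new URL(trimmed);
    if (hasImageExtension(pathname)) return { kind: "url", url: trimmed, warnings: [] };
    const message = `URL does not end in ${IMAGE_EXTENSIONS.join("/")}: ${trimmed}`;
    // Strict mode rejects the URL; relaxed mode keeps it and records a warning.
    if (strictUrlValidation) throw new Error(message);
    return { kind: "url", url: trimmed, warnings: [message] };
  }

  if (trimmed.startsWith("data:")) {
    const match = /^data:([^;,]+);base64,([A-Za-z0-9+/=]+)$/.exec(trimmed);
    if (!match) throw new Error("Malformed base64 data URL");
    const mimeType = match[1].toLowerCase();
    if (DATA_URL_MIME_TYPES.indexOf(mimeType) === -1) throw new Error(`Unsupported MIME type: ${mimeType}`);
    return { kind: "data-url", mimeType, base64: match[2], warnings: [] };
  }

  // Anything else is treated as a local path the server process must be able to read.
  if (!hasImageExtension(trimmed)) throw new Error(`Unsupported file type: ${trimmed}`);
  return { kind: "file", path: trimmed, warnings: [] };
}
```
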
 
-
-|----------|-------------|---------|----------|
-| `VISION_MODEL_TYPE` | Model type: `glm` (alias for `glm-4.6v`), `glm-4.6v`, `siliconflow`, or `modelscope` | - | Yes |
-| `VISION_MODEL_NAME` | Model name for the API | See defaults below | Yes |
-| `VISION_API_BASE_URL` | API base URL (must be base path, no `/chat/completions`) | See defaults below | Yes |
-| `VISION_API_KEY` | API key for authentication | - | Yes |
-| `VISION_API_TIMEOUT` | Request timeout in milliseconds | 60000 | No |
-| `VISION_MAX_RETRIES` | Maximum retry attempts | 2 | No |
-| `VISION_STRICT_URL_VALIDATION` | Enforce strict image URL validation | `true` | No |
-| `LOG_LEVEL` | Log level: `debug`, `info`, `warn`, `error` | `info` | No |
+## About streaming responses
 
-
-- `VISION_STRICT_URL_VALIDATION` defaults to `true`, enforcing strict validation that URLs must end with supported image extensions (`.jpg`, `.jpeg`, `.png`, `.webp`). Set to `false` to allow non-image URLs with a warning only.
-- For GLM-4.6V provider, both `glm` and `glm-4.6v` values work for `VISION_MODEL_TYPE`. `glm` is provided as a convenient alias.
+All adapters force `stream: false` and parse the response as one complete JSON body.
 
-
+Upstreams that only support SSE / `text/event-stream` are not supported yet (a dedicated streaming parser would be needed).
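
For the OpenAI-compatible providers, a non-streaming request and its single JSON reply could look like the sketch below. The body shape (`messages` with `text` and `image_url` content parts, `stream: false`) and the `choices[0].message.content` field follow the public Chat Completions format; `buildChatCompletionsBody`, `extractChatText` and sending the template text as a `system` message are assumptions about this package.

```typescript
// Hypothetical sketch: a non-streaming Chat Completions body and its response parser.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface ChatMessage {
  role: "system" | "user";
  content: string | ContentPart[];
}

interface ChatCompletionsBody {
  model: string;
  stream: false; // never stream, so the reply is one JSON document
  max_tokens?: number;
  messages: ChatMessage[];
}

function buildChatCompletionsBody(
  model: string,
  imageUrl: string, // http(s) URL or base64 data URL
  prompt: string,
  systemPrompt?: string,
  maxTokens?: number,
): ChatCompletionsBody {
  const messages: ChatMessage[] = [];
  if (systemPrompt) messages.push({ role: "system", content: systemPrompt });
  messages.push({
    role: "user",
    content: [
      { type: "text", text: prompt },
      { type: "image_url", image_url: { url: imageUrl } },
    ],
  });
  const body: ChatCompletionsBody = { model, stream: false, messages };
  if (maxTokens !== undefined) body.max_tokens = maxTokens;
  return body;
}

// Reads the assistant text from a complete (non-SSE) response body.
function extractChatText(response: unknown): string {
  if (typeof response !== "object" || response === null) throw new Error("Response is not a JSON object");
  const choices = (response as { choices?: { message?: { content?: unknown } }[] }).choices;
  const content = choices?.[0]?.message?.content;
  if (typeof content !== "string") throw new Error("Missing choices[0].message.content");
  return content;
}
```
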
 
-
-```bash
-VISION_MODEL_NAME=glm-4.6v
-VISION_API_BASE_URL=https://open.bigmodel.cn/api/paas/v4
-```
+## Development and Testing
 
-**SiliconFlow:**
 ```bash
-
-
+cd mcp/vision_mcp
+npm install
+npm run build
 ```
 
-
-
-### GLM-4.6V
-
-Get your API key from: [Zhipu AI Open Platform](https://open.bigmodel.cn/)
-
-Format: `xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.xxxxxxxxxxxxxxxxxxxx`
-
-### SiliconFlow
-
-Get your API key from: [SiliconFlow](https://cloud.siliconflow.cn/)
-
-Format: `sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx`
-
-## MCP Protocol Note
+Tests:
 
-
+- Unit tests only (no API key required):
 
-
-
-- stdout is reserved for JSON-RPC communication
-
-The server handles this automatically. If you fork this project, ensure you follow this rule.
-
-## Development
-
-### Project Structure
-
-```
-vision_mcp/
-├── src/
-│ ├── index.ts # MCP Server entry point
-│ ├── config/
-│ │ └── model-config.ts # Configuration management
-│ ├── tools/
-│ │ └── vision-tool.ts # Vision analysis tool
-│ ├── adapters/
-│ │ ├── base-adapter.ts # Base adapter class
-│ │ ├── glm-adapter.ts # GLM-4.6V adapter
-│ │ └── siliconflow-adapter.ts # SiliconFlow adapter
-│ ├── prompts/
-│ │ └── system.ts # System prompt templates
-│ └── utils/
-│ ├── errors.ts # Error handling
-│ ├── logger.ts # Logging utilities
-│ └── image-input.ts # Image input normalization
-├── package.json
-├── tsconfig.json
-└── README.md
+```bash
+npm run test:unit
 ```
 
-
+- Integration tests (require the `VISION_*` environment variables to be configured):
 
 ```bash
-# Install dependencies
-npm install
-
-# Build TypeScript
-npm run build
-
-# Run tests
 npm test
 ```
 
-
-
-- `npm test` uses `VISION_API_KEY` (default) or provider-specific keys in the test script:
-  - `SILICONFLOW_API_KEY`
-  - `GLM_API_KEY`
-- If no API key is set, the tests will exit with a clear error message.
-
-## Troubleshooting
-
-### 1. "Failed to load model configuration"
-
-- Check all required environment variables are set
-- Verify `VISION_MODEL_TYPE` is either `glm-4.6v` or `siliconflow`
-
-### 2. "API Key not found"
-
-- Set `VISION_API_KEY` in your environment
-- Verify the API key format matches the model requirements
-
-### 3. "Connection timeout"
-
-- Increase `VISION_API_TIMEOUT` value
-- Check network connectivity to the API endpoint
-- Verify API endpoint URL is correct
-
-### 4. "Invalid image URL"
-
-- Ensure URL is publicly accessible
-- Check URL format (http:// or https://)
-- Verify image format is supported
-
-### 5. "Permission denied reading file"
-
-- MCP server needs filesystem access for local files
-- Use absolute paths or ensure relative paths are accessible
-- Check file permissions
-
-### 6. "Invalid API endpoint" or "404 Not Found"
-
-- Ensure `VISION_API_BASE_URL` is the base path only, without `/chat/completions`
-- Correct: `https://api.siliconflow.cn/v1`
-- Incorrect: `https://api.siliconflow.cn/v1/chat/completions`
-- Check the error details for the full request URL to diagnose endpoint issues
-
-## Security Notes
-
-- API keys are loaded from environment variables, never hardcoded
-- API keys are masked in logs
-- Images are not persisted by default
-- MCP server should run in trusted environments only (no built-in auth)
-- **Thinking/Reasoning Content Filtering**: Model thinking/reasoning content is automatically filtered from responses to prevent exposing internal reasoning to MCP clients. This filtering is unconditional and applied to all supported models regardless of configuration.
-
-## Security Best Practices
-
-⚠️ **IMPORTANT**: Never commit API keys or credentials to the repository!
+## Troubleshooting
 
-
-- **Keep local test credentials** in `.gitignore`'d files (e.g., `test_key.local.md`)
-- **Rotate keys immediately** if accidentally exposed or committed
-- **See** `doc/test_key.example.md` for test setup template
-- **Never** copy real API keys into documentation, code comments, or issue trackers
+### 1) Configuration fails to load: `Missing VISION_MODEL_TYPE` / `Unsupported model type`
 
-
--
-- [ ] `.env.local` is in `.gitignore`
-- [ ] No real keys in `test_key.md` (use `test_key.example.md` instead)
-- [ ] No keys in documentation or comments
-- [ ] Review git history for accidental key commits (`git log --all --full-history -S --source --all -- "*secret*" "*key*" "*password*" "test_key.md"`)
+- Make sure `VISION_MODEL_TYPE` is set
+- Valid values: `glm` / `glm-4.6v` / `siliconflow` / `modelscope` / `openai` / `claude` / `gemini`
 
-
+### 2) `Missing VISION_API_KEY`
 
-
+- Make sure `VISION_API_KEY` is set (in `.env` or in the MCP client's `env` block)
 
-
+### 3) 404 / endpoint errors
 
-
-
-
-
-5. Submit a pull request
+- `VISION_API_BASE_URL` must be the base URL only; do not include a specific endpoint
+- OpenAI / SiliconFlow / ModelScope: `/chat/completions` is appended automatically
+- Claude: `/v1/messages` is appended automatically (do not set `baseUrl` to `.../v1`)
+- Gemini: `/{apiVersion}/models/{model}:generateContent` is appended automatically
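
The endpoint rules above amount to appending a fixed path to the base URL. A sketch of that joining step; `buildEndpoint` is hypothetical, and treating `glm-4.6v` like the OpenAI-compatible providers is an assumption based on Zhipu's public `/chat/completions` endpoint, since the list itself only names OpenAI, SiliconFlow and ModelScope.

```typescript
// Hypothetical sketch: append each provider's endpoint path to VISION_API_BASE_URL.
type ProviderType = "glm-4.6v" | "siliconflow" | "modelscope" | "openai" | "claude" | "gemini";

function buildEndpoint(
  type: ProviderType,
  baseUrl: string,
  model: string,
  geminiApiVersion = "v1beta",
): string {
  // Strip trailing slashes so the joined URL never contains "//".
  const base = baseUrl.replace(/\/+$/, "");
  switch (type) {
    case "claude":
      // A base URL ending in /v1 would turn this into /v1/v1/messages.
      return `${base}/v1/messages`;
    case "gemini":
      return `${base}/${geminiApiVersion}/models/${model}:generateContent`;
    default:
      // OpenAI-compatible providers (GLM assumed to behave the same way).
      return `${base}/chat/completions`;
  }
}
```

This also shows why a base URL that already ends in `/chat/completions` leads to a 404: the path would be appended a second time.
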
 
-
+### 4) Image URL validation fails
 
-
--
-- Check model documentation:
-  - [GLM-4.6V Docs](https://docs.bigmodel.cn/)
-  - [SiliconFlow Docs](https://docs.siliconflow.cn/)
+- By default the URL must end in `.jpg/.jpeg/.png/.webp`
+- To relax this: `VISION_STRICT_URL_VALIDATION=false`
 
+## Security Notes
 
-
--
+- Never log to stdout (stdout is reserved for MCP JSON-RPC); this project sends all logs to stderr
+- API keys are masked in logs
+- Thinking/reasoning content returned by the model is always filtered out, so internal reasoning is not exposed
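
The security notes say logs go to stderr with API keys masked. In Node.js, `console.error` writes to stderr, so stdout stays free for JSON-RPC. A sketch of such a logger, assuming a `LOG_LEVEL` threshold and a mask that keeps the first and last four characters; `maskApiKey`, `redact`, `createLogger` and the line format are hypothetical.

```typescript
// Hypothetical sketch: a level-filtered stderr logger that masks the API key.
type LogLevel = "debug" | "info" | "warn" | "error";
const LEVEL_ORDER: Record<LogLevel, number> = { debug: 10, info: 20, warn: 30, error: 40 };

// Keeps a short prefix and suffix so a key stays recognisable without being usable.
function maskApiKey(key: string): string {
  if (key.length <= 8) return "***";
  return `${key.slice(0, 4)}***${key.slice(-4)}`;
}

// Replaces every occurrence of `secret` in `text` with its masked form.
function redact(text: string, secret: string): string {
  return secret ? text.split(secret).join(maskApiKey(secret)) : text;
}

function createLogger(
  minLevel: LogLevel,
  apiKey: string,
  write: (line: string) => void = (line) => console.error(line), // stderr, never stdout
) {
  const log = (level: LogLevel, message: string): void => {
    if (LEVEL_ORDER[level] < LEVEL_ORDER[minLevel]) return; // below the configured threshold
    write(`[${new Date().toISOString()}] [${level.toUpperCase()}] ${redact(message, apiKey)}`);
  };
  return {
    debug: (m: string) => log("debug", m),
    info: (m: string) => log("info", m),
    warn: (m: string) => log("warn", m),
    error: (m: string) => log("error", m),
  };
}
```
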
package/dist/adapters/gemini-adapter.d.ts
@@ -0,0 +1,73 @@
+/**
+ * Gemini Adapter
+ *
+ * @description Gemini generateContent API adapter implementation; supports Google Gemini multimodal vision models
+ * @see https://ai.google.dev/api/rest/v1beta/models/generateContent
+ */
+import { BaseVisionModelAdapter, VisionModelResponse } from './base-adapter.js';
+import { ModelConfig } from '../config/model-config.js';
+export interface GeminiAdapterOptions {
+    apiVersion?: string;
+    authMode?: 'bearer' | 'x-goog' | 'query';
+    imagePartMode?: 'inline_data' | 'inline_bytes';
+    maxTokens?: number;
+}
+export declare class GeminiAdapter extends BaseVisionModelAdapter {
+    private options;
+    constructor(config: ModelConfig, options?: GeminiAdapterOptions);
+    analyze(imageData: string, prompt: string): Promise<string>;
+    analyzeWithResponse(imageData: string, prompt: string): Promise<VisionModelResponse>;
+    private callGeminiAPI;
+    /**
+     * Builds the API URL
+     * Format: {baseUrl}/{apiVersion}/models/{model}:generateContent
+     */
+    private buildApiUrl;
+    /**
+     * Builds the authentication headers
+     */
+    private buildAuthHeaders;
+    /**
+     * Builds the request body
+     */
+    private buildRequest;
+    /**
+     * Builds the image part
+     * Supports three input formats:
+     * 1. HTTP(S) URL - downloaded and converted to base64
+     * 2. Data URL - parsed directly
+     * 3. Base64 string - assumed to be an image
+     */
+    private buildImagePart;
+    /**
+     * Creates an inline_data part
+     */
+    private createInlineDataPart;
+    /**
+     * Sanitizes a URL for logging (removes sensitive query parameters)
+     * @param url - Original URL
+     * @returns Sanitized URL (in query mode the key is replaced with ***)
+     */
+    private sanitizeUrl;
+    /**
+     * Downloads an image and converts it to base64
+     * Gemini does not accept URL input directly, so the image must be downloaded first
+     */
+    private downloadImageAsBase64;
+    /**
+     * Handles error responses
+     */
+    private handleErrorResponse;
+    /**
+     * Parses the response data
+     */
+    private parseResponseData;
+    /**
+     * Normalizes a Gemini response into a VisionModelResponse
+     * Supports two response formats:
+     * 1. Official format: candidates[0].content.parts[].text
+     * 2. Proxy format: candidates[0].output.parts[].text
+     */
+    private normalizeResponse;
+}
+//# sourceMappingURL=gemini-adapter.d.ts.map
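
The declaration file exposes `buildApiUrl`, `buildAuthHeaders` and `sanitizeUrl` only as private members without bodies. A sketch of how the three `authMode` values could shape the request, written as free functions: `bearer` sending `Authorization: Bearer <key>` (a common proxy convention) is an assumption, while the `x-goog-api-key` header and the `key` query parameter are the two ways the public Gemini API accepts a key. `buildGeminiTarget` is a hypothetical name.

```typescript
// Hypothetical sketch of the Gemini auth modes; not the adapter's actual implementation.
type GeminiAuthMode = "bearer" | "x-goog" | "query";

interface GeminiRequestTarget {
  url: string;
  headers: Record<string, string>;
}

// Format from the declaration comment: {baseUrl}/{apiVersion}/models/{model}:generateContent
function buildApiUrl(baseUrl: string, apiVersion: string, model: string): string {
  return `${baseUrl.replace(/\/+$/, "")}/${apiVersion}/models/${model}:generateContent`;
}

function buildGeminiTarget(
  baseUrl: string,
  apiVersion: string,
  model: string,
  apiKey: string,
  authMode: GeminiAuthMode,
): GeminiRequestTarget {
  const url = new URL(buildApiUrl(baseUrl, apiVersion, model));
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  switch (authMode) {
    case "bearer": // common for OpenAI-style proxies and gateways
      headers["Authorization"] = `Bearer ${apiKey}`;
      break;
    case "x-goog": // header accepted by the official Gemini API
      headers["x-goog-api-key"] = apiKey;
      break;
    case "query": // ?key=... query parameter, also accepted by the official API
      url.searchParams.set("key", apiKey);
      break;
  }
  return { url: url.toString(), headers };
}

// Masks the `key` query parameter so query-mode URLs are safe to log.
function sanitizeUrl(rawUrl: string): string {
  const url = new URL(rawUrl);
  if (url.searchParams.has("key")) url.searchParams.set("key", "***");
  return url.toString();
}
```

Only the `query` mode puts the key into the URL, which is why `sanitizeUrl` matters for logging in that mode.
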
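
`buildRequest`, `createInlineDataPart` and `normalizeResponse` are likewise declared without bodies. A sketch assuming the `inline_data` image-part mode, which maps to the public REST field `inline_data: { mime_type, data }` (the `inline_bytes` shape is not visible in this diff and is not modelled), `generationConfig.maxOutputTokens` for `maxTokens`, and the two response layouts named in the `normalizeResponse` comment. Dropping parts marked `thought: true` is an assumed way of applying the thinking filter to Gemini output; all function bodies are hypothetical.

```typescript
// Hypothetical sketch of a Gemini generateContent request and response normalization.
interface GeminiPart {
  text?: string;
  thought?: boolean;
  inline_data?: { mime_type: string; data: string };
}

interface GeminiRequest {
  contents: { role: "user"; parts: GeminiPart[] }[];
  generationConfig?: { maxOutputTokens: number };
}

// Builds an inline_data part from a base64 data URL (http(s) URLs would be downloaded first).
function createInlineDataPart(dataUrl: string): GeminiPart {
  const match = /^data:([^;,]+);base64,(.+)$/.exec(dataUrl);
  if (!match) throw new Error("Expected a base64 data URL");
  return { inline_data: { mime_type: match[1], data: match[2] } };
}

function buildRequest(dataUrl: string, prompt: string, maxTokens?: number): GeminiRequest {
  const request: GeminiRequest = {
    contents: [{ role: "user", parts: [{ text: prompt }, createInlineDataPart(dataUrl)] }],
  };
  if (maxTokens !== undefined) request.generationConfig = { maxOutputTokens: maxTokens };
  return request;
}

interface GeminiCandidate {
  content?: { parts?: GeminiPart[] }; // official format
  output?: { parts?: GeminiPart[] }; // proxy format
}

// Joins the visible text of the first candidate, skipping parts flagged as thoughts.
function normalizeResponse(response: { candidates?: GeminiCandidate[] }): string {
  const candidate = response.candidates?.[0];
  const parts = candidate?.content?.parts ?? candidate?.output?.parts;
  if (!parts) throw new Error("No candidates[0].content.parts or candidates[0].output.parts in response");
  return parts
    .filter((p) => !p.thought && typeof p.text === "string")
    .map((p) => p.text as string)
    .join("");
}
```
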
package/dist/adapters/gemini-adapter.d.ts.map
@@ -0,0 +1 @@
{"version":3,"file":"gemini-adapter.d.ts","sourceRoot":"","sources":["../../src/adapters/gemini-adapter.ts"],"names":[],"mappings":"AAAA;;;;;GAKG;AAEH,OAAO,EAAE,sBAAsB,EAAE,mBAAmB,EAAE,MAAM,mBAAmB,CAAC;AAChF,OAAO,EAAE,WAAW,EAAE,MAAM,2BAA2B,CAAC;AAMxD,MAAM,WAAW,oBAAoB;IACnC,UAAU,CAAC,EAAE,MAAM,CAAC;IACpB,QAAQ,CAAC,EAAE,QAAQ,GAAG,QAAQ,GAAG,OAAO,CAAC;IACzC,aAAa,CAAC,EAAE,aAAa,GAAG,cAAc,CAAC;IAC/C,SAAS,CAAC,EAAE,MAAM,CAAC;CACpB;AAOD,qBAAa,aAAc,SAAQ,sBAAsB;IACvD,OAAO,CAAC,OAAO,CAAiC;gBAEpC,MAAM,EAAE,WAAW,EAAE,OAAO,GAAE,oBAAyB;IAgC7D,OAAO,CAAC,SAAS,EAAE,MAAM,EAAE,MAAM,EAAE,MAAM,GAAG,OAAO,CAAC,MAAM,CAAC;IAyB3D,mBAAmB,CAAC,SAAS,EAAE,MAAM,EAAE,MAAM,EAAE,MAAM,GAAG,OAAO,CAAC,mBAAmB,CAAC;YA0B5E,aAAa;IAuD3B;;;OAGG;IACH,OAAO,CAAC,WAAW;IAcnB;;OAEG;IACH,OAAO,CAAC,gBAAgB;IAcxB;;OAEG;YACW,YAAY;IAmB1B;;;;;;OAMG;YACW,cAAc;IAmB5B;;OAEG;IACH,OAAO,CAAC,oBAAoB;IAiB5B;;;;OAIG;IACH,OAAO,CAAC,WAAW;IAOnB;;;OAGG;YACW,qBAAqB;IAuEnC;;OAEG;YACW,mBAAmB;IAgCjC;;OAEG;YACW,iBAAiB;IAiC/B;;;;;OAKG;IACH,OAAO,CAAC,iBAAiB;CAsE1B"}
|