@lutery/vision-mcp 1.0.2 → 1.0.3
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +91 -91
- package/package.json +5 -1
package/README.md (changed)
@@ -1,44 +1,44 @@
 # Vision MCP
 
-
+An STDIO-based MCP Server that provides unified image analysis capabilities for LLMs lacking visual abilities (or with expensive vision models). By switching Providers (via environment variables), you can use multimodal models from different platforms/vendors.
 
-##
+## Supported Models / Providers
 
-
+Select a provider via `VISION_MODEL_TYPE`:
 
-| type | Provider |
-
-| `glm` |
-| `siliconflow` | SiliconFlow
-| `modelscope` | ModelScope API-Inference
-| `openai` | OpenAI | `https://api.openai.com/v1` | `gpt-4o` |
-| `claude` | Anthropic Claude
-| `gemini` | Google Gemini
+| type | Provider | Default `VISION_API_BASE_URL` | Default `VISION_MODEL_NAME` | Notes |
+|------|----------|-------------------------------|----------------------------|-------|
+| `glm` | Zhipu GLM-4.6V | `https://open.bigmodel.cn/api/paas/v4` | `glm-4.6v` | GLM-4.6V Zhipu vision model |
+| `siliconflow` | SiliconFlow (OpenAI compatible) | `https://api.siliconflow.cn/v1` | `Qwen/Qwen2-VL-72B-Instruct` | Rich vision model selection |
+| `modelscope` | ModelScope API-Inference (OpenAI compatible) | `https://api-inference.modelscope.cn/v1` | `ZhipuAI/GLM-4.6V` | Requires real-name verification/Aliyun binding, subject to quotas |
+| `openai` | OpenAI | `https://api.openai.com/v1` | `gpt-4o` | Chat Completions compatible |
+| `claude` | Anthropic Claude (Messages API) | `https://api.anthropic.com` | `claude-3-5-sonnet-20241022` | `baseUrl` should not include `/v1` |
+| `gemini` | Google Gemini (generateContent API) | `https://generativelanguage.googleapis.com` | `gemini-2.0-flash-exp` | Official entry; proxies/gateways can override via `VISION_API_BASE_URL` |
 
-
+Get API Key / Token (from respective platform consoles):
 
-- GLM
-- SiliconFlow
-- ModelScope
-- OpenAI
-- Claude
-- Gemini
+- GLM (Zhipu): https://open.bigmodel.cn/
+- SiliconFlow: https://cloud.siliconflow.cn/
+- ModelScope: https://modelscope.cn/my/myaccesstoken
+- OpenAI: https://platform.openai.com/
+- Claude (Anthropic): https://console.anthropic.com/
+- Gemini (Google AI): https://ai.google.dev/
 
-##
+## Features
 
--
--
--
--
--
+- One-click provider switching (just change environment variables)
+- Image input support: URL / base64 data URL / local file path
+- Built-in system prompt templates: UI analysis, OCR, object detection, structured extraction, etc.
+- Security: Automatic API key masking in logs, filters model-returned thinking/reasoning content
+- MCP compliant: stdout reserved for JSON-RPC, logs go to stderr
 
-##
+## Installation & Usage
 
-
+Requirements: Node.js >= 18
 
-###
+### Running as NPM package from MCP client (Recommended)
 
-
+Configure your MCP client (e.g., Claude Desktop) to use `npx`:
 
 ```json
 {
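The new README's provider table pairs each `VISION_MODEL_TYPE` value with a default base URL and model, which `VISION_API_BASE_URL` and `VISION_MODEL_NAME` can override. The TypeScript below is a minimal sketch of that lookup, assuming a plain record keyed by provider type. `PROVIDER_DEFAULTS`, `isModelType` and `resolveProvider` are hypothetical names, not taken from the package source, and the error messages only mirror the wording used in the README's Troubleshooting section.

```typescript
// Hypothetical sketch: defaults copied from the README provider table,
// not the package's actual source.
type ModelType = "glm" | "siliconflow" | "modelscope" | "openai" | "claude" | "gemini";

interface ProviderSettings {
  baseUrl: string;
  model: string;
}

const PROVIDER_DEFAULTS: Record<ModelType, ProviderSettings> = {
  glm: { baseUrl: "https://open.bigmodel.cn/api/paas/v4", model: "glm-4.6v" },
  siliconflow: { baseUrl: "https://api.siliconflow.cn/v1", model: "Qwen/Qwen2-VL-72B-Instruct" },
  modelscope: { baseUrl: "https://api-inference.modelscope.cn/v1", model: "ZhipuAI/GLM-4.6V" },
  openai: { baseUrl: "https://api.openai.com/v1", model: "gpt-4o" },
  claude: { baseUrl: "https://api.anthropic.com", model: "claude-3-5-sonnet-20241022" },
  gemini: { baseUrl: "https://generativelanguage.googleapis.com", model: "gemini-2.0-flash-exp" },
};

// Type guard: is the string one of the six supported provider types?
function isModelType(value: string): value is ModelType {
  return Object.prototype.hasOwnProperty.call(PROVIDER_DEFAULTS, value);
}

// Resolve effective settings from an env-like map; explicit
// VISION_API_BASE_URL / VISION_MODEL_NAME win, empty strings fall back.
function resolveProvider(
  env: Record<string, string | undefined>,
): ProviderSettings & { type: ModelType } {
  const type = env.VISION_MODEL_TYPE;
  if (!type) throw new Error("Missing VISION_MODEL_TYPE");
  if (!isModelType(type)) throw new Error(`Unsupported model type: ${type}`);
  const defaults = PROVIDER_DEFAULTS[type];
  return {
    type,
    baseUrl: env.VISION_API_BASE_URL || defaults.baseUrl,
    model: env.VISION_MODEL_NAME || defaults.model,
  };
}
```

For example, `resolveProvider({ VISION_MODEL_TYPE: "openai" })` yields the `https://api.openai.com/v1` base URL and `gpt-4o`, matching the table row, while setting `VISION_MODEL_NAME` replaces only the model.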
@@ -57,18 +57,18 @@
 }
 ```
 
-
-- `VISION_MODEL_NAME` / `VISION_API_BASE_URL`
--
+Notes:
+- `VISION_MODEL_NAME` / `VISION_API_BASE_URL` are optional (will use provider defaults)
+- For more configuration options, refer to `.env.example`
 
-
+You can also install globally and use the executable directly (binary name: `vision-mcp`):
 
 ```bash
 npm i -g @lutery/vision-mcp
 vision-mcp
 ```
 
-###
+### Local development
 
 ```bash
 cd mcp/vision_mcp
@@ -77,65 +77,65 @@ npm run build
 node dist/index.js
 ```
 
-
+On successful startup, you'll see `Vision MCP Server is running on stdio` in stderr.
 
-##
+## Configuration (Environment Variables)
 
-
+Minimum required:
 
-- `VISION_MODEL_TYPE
-- `VISION_API_KEY
+- `VISION_MODEL_TYPE`: Select provider
+- `VISION_API_KEY`: Key/token for the selected provider
 
-
+Common optional:
 
-|
-
-| `VISION_MODEL_NAME` |
-| `VISION_API_BASE_URL` | API
-| `VISION_API_TIMEOUT` |
-| `VISION_MAX_RETRIES` |
-| `VISION_STRICT_URL_VALIDATION` |
-| `LOG_LEVEL` |
+| Variable | Description | Default |
+|----------|-------------|---------|
+| `VISION_MODEL_NAME` | Model name | Provider built-in defaults |
+| `VISION_API_BASE_URL` | API base URL (without specific endpoint) | Provider built-in defaults |
+| `VISION_API_TIMEOUT` | Timeout (milliseconds) | `60000` |
+| `VISION_MAX_RETRIES` | Maximum retry attempts | `2` |
+| `VISION_STRICT_URL_VALIDATION` | Strict validation for image URLs ending in `.jpg/.jpeg/.png/.webp` | `true` |
+| `LOG_LEVEL` | Log level: `debug`/`info`/`warn`/`error` | `info` |
 
-Provider
+Provider-specific configuration:
 
 - Claude
-- `VISION_CLAUDE_API_VERSION
+- `VISION_CLAUDE_API_VERSION`: Anthropic API version (default `2023-06-01`)
 
-## MCP
+## MCP Tools
 
-
+This server registers 3 tools:
 
 ### 1) `analyze_image`
 
-
+Parameters:
 
 ```json
 {
   "image": "https://example.com/a.png",
-  "prompt": "
+  "prompt": "Describe the components in this interface",
   "output_format": "text",
   "template": "ui-analysis"
 }
 ```
 
-
-- `image
-- `prompt
-- `output_format
-- `template
+Field descriptions:
+- `image`: Supports URL / base64 data URL / local path
+- `prompt`: Your analysis task description
+- `output_format`: `text` or `json` (hint preference; JSON not strictly validated)
+- `template`: Optional system template (see `list_templates` below)
 
 ### 2) `list_templates`
 
-
+Lists built-in system prompt templates (including id, usage description, etc.).
 
 ### 3) `get_config`
 
-
+Returns currently active model configuration (API key is masked).
 
-##
+## Image Input Specifications
 
-
+Supports three input types:
 
 1) URL
 
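The optional-variable table documents defaults of `60000` ms for the timeout, `2` retries, strict URL validation on, and log level `info`, and `get_config` is described as returning the active configuration with the API key masked. Below is a sketch of both, assuming the settings arrive as `process.env`-style strings. `loadRuntimeOptions`, `parseIntOr` and `maskApiKey` are hypothetical helpers; falling back to the default on malformed input, and a mask that keeps the first and last four characters, are assumptions rather than documented behavior.

```typescript
// Hypothetical sketch of the optional settings in the README table;
// fallback-on-invalid-input and the mask format are assumptions.
type LogLevel = "debug" | "info" | "warn" | "error";

interface RuntimeOptions {
  timeoutMs: number;
  maxRetries: number;
  strictUrlValidation: boolean;
  logLevel: LogLevel;
}

const LOG_LEVELS: readonly string[] = ["debug", "info", "warn", "error"];

// Parse a non-negative integer; unset, empty, or malformed values use the fallback.
function parseIntOr(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const n = Number(raw);
  return Number.isInteger(n) && n >= 0 ? n : fallback;
}

function loadRuntimeOptions(env: Record<string, string | undefined>): RuntimeOptions {
  const level = (env.LOG_LEVEL ?? "info").toLowerCase();
  return {
    timeoutMs: parseIntOr(env.VISION_API_TIMEOUT, 60000),
    maxRetries: parseIntOr(env.VISION_MAX_RETRIES, 2),
    // Strict by default; only an explicit "false" relaxes it.
    strictUrlValidation: env.VISION_STRICT_URL_VALIDATION?.trim().toLowerCase() !== "false",
    logLevel: LOG_LEVELS.includes(level) ? (level as LogLevel) : "info",
  };
}

// Keep the first and last four characters; fully mask short keys.
function maskApiKey(key: string): string {
  if (key.length <= 8) return "*".repeat(key.length);
  return key.slice(0, 4) + "*".repeat(key.length - 8) + key.slice(-4);
}
```

Treating only the literal string `false` as "off" keeps the documented default (`true`) in force for typos or empty values, which errs toward rejecting questionable image URLs.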
@@ -143,7 +143,7 @@ Provider-specific configuration:
 https://example.com/image.png
 ```
 
-
+Strict validation enabled by default: URL must end with `.jpg/.jpeg/.png/.webp`, otherwise error. Can be relaxed with `VISION_STRICT_URL_VALIDATION=false` (warning only).
 
 2) Base64 Data URL
 
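The strict URL rule in this hunk can be sketched as a suffix test with two outcomes: reject when strict, warn when relaxed. This sketch assumes the check applies to the URL path only, so a query string such as `?v=2` is ignored, which the README does not specify. `validateImageUrl` is a hypothetical name, and relaxed mode reports through a warning callback that defaults to `console.warn`, which writes to stderr in Node and so stays clear of the JSON-RPC stdout channel.

```typescript
// Hypothetical sketch of the strict image-URL check; checking only the
// URL path (ignoring ?query and #hash) is an assumption.
const ALLOWED_EXTENSIONS = [".jpg", ".jpeg", ".png", ".webp"];

function validateImageUrl(
  url: string,
  strict: boolean,
  warn: (message: string) => void = console.warn, // console.warn writes to stderr in Node
): void {
  const pathname = new URL(url).pathname.toLowerCase(); // throws TypeError on malformed URLs
  if (ALLOWED_EXTENSIONS.some((ext) => pathname.endsWith(ext))) return;
  const message = `Image URL should end with ${ALLOWED_EXTENSIONS.join("/")}: ${url}`;
  if (strict) throw new Error(message); // default behavior: reject
  warn(message); // VISION_STRICT_URL_VALIDATION=false: warn only
}
```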
@@ -151,26 +151,26 @@ https://example.com/image.png
 data:image/png;base64,iVBORw0KGgo...
 ```
 
-
+Supported MIME types: `image/jpeg` / `image/jpg` / `image/png` / `image/webp`.
 
-3)
+3) Local file path
 
 ```text
 ./test/image.png
 D:\\path\\to\\image.jpg
 ```
 
-
+Requires MCP Server process to have read access; only supports `.jpg/.jpeg/.png/.webp`.
 
-
+Note: Gemini provider doesn't support direct URL image input; this project downloads URLs and converts to base64 in the Gemini adapter (subject to size and timeout limits).
 
-##
+## About Streaming
 
-
+All adapters enforce `stream: false` and parse as "complete JSON response".
 
-
+SSE / `text/event-stream` responses are currently not supported (would require additional streaming parser implementation).
 
-##
+## Development & Testing
 
 ```bash
 cd mcp/vision_mcp
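The three documented input forms can be told apart by prefix: `data:` for base64 data URLs, `http(s)://` for URLs, and anything else as a local path. A hypothetical `classifyImageInput` sketch under that assumption follows. It rejects data-URL MIME types outside the README's list and local files without one of the four allowed extensions. The real adapters may order or word these checks differently, and they would still have to read the file (or, for Gemini, download the URL) before building a request.

```typescript
// Hypothetical sketch: tell the README's three input forms apart by prefix.
type ImageInput =
  | { kind: "url"; url: string }
  | { kind: "dataUrl"; mime: string; base64: string }
  | { kind: "path"; path: string };

const ALLOWED_MIME = new Set(["image/jpeg", "image/jpg", "image/png", "image/webp"]);
const ALLOWED_FILE_EXTENSIONS = [".jpg", ".jpeg", ".png", ".webp"];

function classifyImageInput(image: string): ImageInput {
  if (image.startsWith("data:")) {
    // Expected shape: data:<mime>;base64,<payload>
    const match = /^data:([^;,]+);base64,([\s\S]*)$/.exec(image);
    if (!match) throw new Error("Malformed base64 data URL");
    const [, rawMime = "", payload = ""] = match;
    const mime = rawMime.toLowerCase();
    if (!ALLOWED_MIME.has(mime)) throw new Error(`Unsupported MIME type: ${mime}`);
    return { kind: "dataUrl", mime, base64: payload };
  }
  if (/^https?:\/\//i.test(image)) return { kind: "url", url: image };
  // Everything else is treated as a local path (e.g. ./test/image.png).
  const lower = image.toLowerCase();
  if (!ALLOWED_FILE_EXTENSIONS.some((ext) => lower.endsWith(ext))) {
    throw new Error(`Unsupported local file type: ${image}`);
  }
  return { kind: "path", path: image };
}
```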
@@ -178,45 +178,45 @@ npm install
 npm run build
 ```
 
-
+Testing:
 
--
+- Unit tests only (no API keys required):
 
 ```bash
 npm run test:unit
 ```
 
--
+- Integration tests (requires `VISION_*` environment variables configured):
 
 ```bash
 npm test
 ```
 
-##
+## Troubleshooting
 
-### 1)
+### 1) Configuration loading failed: `Missing VISION_MODEL_TYPE` / `Unsupported model type`
 
--
--
+- Ensure `VISION_MODEL_TYPE` is set
+- Valid values: `glm` / `siliconflow` / `modelscope` / `openai` / `claude` / `gemini`
 
 ### 2) `Missing VISION_API_KEY`
 
--
+- Ensure `VISION_API_KEY` is set (in `.env` or MCP client `env`)
 
-### 3) 404 / endpoint
+### 3) 404 / endpoint errors
 
-- `VISION_API_BASE_URL`
-- OpenAI / SiliconFlow / ModelScope
-- Claude
-- Gemini
+- `VISION_API_BASE_URL` must be the "base" URL, without specific endpoint
+- OpenAI / SiliconFlow / ModelScope: automatically appends `/chat/completions`
+- Claude: automatically appends `/v1/messages` (don't write `baseUrl` as `.../v1`)
+- Gemini: automatically appends `/{apiVersion}/models/{model}:generateContent`
 
-### 4)
+### 4) Image URL validation failed
 
--
--
+- Default requires URL to end with `.jpg/.jpeg/.png/.webp`
+- To relax: `VISION_STRICT_URL_VALIDATION=false`
 
-##
+## Security Notes
 
--
-- API
--
+- Don't log to stdout (stdout reserved for MCP JSON-RPC), all logs go to stderr
+- API keys are masked in logs
+- Unconditionally filters model-returned thinking/reasoning content to avoid leaking internal inference information
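The 404 troubleshooting entries describe how each adapter derives its request URL from `VISION_API_BASE_URL`. The following is a sketch of those rules, with `buildEndpoint` as a hypothetical helper. GLM is assumed to follow the OpenAI-style `/chat/completions` path, since the README does not list it, and the `v1beta` default for the Gemini API version is also an assumption.

```typescript
// Hypothetical sketch of the endpoint rules from the 404 troubleshooting notes.
// GLM following the OpenAI-style path and the "v1beta" Gemini default are assumptions.
type ProviderType = "glm" | "siliconflow" | "modelscope" | "openai" | "claude" | "gemini";

function buildEndpoint(
  type: ProviderType,
  baseUrl: string,
  model: string,
  geminiApiVersion = "v1beta",
): string {
  const base = baseUrl.replace(/\/+$/, ""); // tolerate a trailing slash on the base URL
  switch (type) {
    case "claude":
      // A base URL already ending in /v1 would yield /v1/v1/messages (a 404).
      return `${base}/v1/messages`;
    case "gemini":
      return `${base}/${geminiApiVersion}/models/${model}:generateContent`;
    default:
      // OpenAI-compatible providers: OpenAI, SiliconFlow, ModelScope (and GLM, assumed).
      return `${base}/chat/completions`;
  }
}
```

This also shows why the README insists on a bare base URL: the adapter always appends the full path, so any endpoint segment already present in `VISION_API_BASE_URL` ends up duplicated.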
package/package.json (changed)
@@ -1,6 +1,6 @@
 {
   "name": "@lutery/vision-mcp",
-  "version": "1.0.2",
+  "version": "1.0.3",
   "description": "MCP Server providing vision capabilities for LLMs via GLM-4.6V, SiliconFlow, and ModelScope",
   "type": "module",
   "bin": {
@@ -18,6 +18,10 @@
     "test:smoke": "node test/smoke-glm-adapter.mjs",
     "test:integration": "npm run build && node -r dotenv/config test/simple-test.js"
   },
+  "repository": {
+    "type": "git",
+    "url": "https://github.com/lutery/mcp.git"
+  },
   "keywords": [
     "mcp",
     "vision",