@lutery/vision-mcp 1.0.2 → 1.0.3

Files changed (2)
  1. package/README.md +91 -91
  2. package/package.json +5 -1
package/README.md CHANGED
@@ -1,44 +1,44 @@
  # Vision MCP

- 一个基于 STDIO 的 MCP Server,为不具备视觉能力(或视觉模型成本较高)的 LLM 提供统一的图片分析能力。通过切换 Provider(环境变量配置),即可使用不同平台/厂商的多模态模型。
+ An STDIO-based MCP Server that provides unified image analysis capabilities for LLMs lacking visual abilities (or with expensive vision models). By switching Providers (via environment variables), you can use multimodal models from different platforms/vendors.

- ## 支持的模型 / Provider
+ ## Supported Models / Providers

- 通过 `VISION_MODEL_TYPE` 选择提供商:
+ Select a provider via `VISION_MODEL_TYPE`:

- | type | Provider | 默认 `VISION_API_BASE_URL` | 默认 `VISION_MODEL_NAME` | 备注 |
- |------|----------|----------------------------|--------------------------|------|
- | `glm` | 智谱 GLM-4.6V | `https://open.bigmodel.cn/api/paas/v4` | `glm-4.6v` | GLM-4.6V 智谱视觉模型 |
- | `siliconflow` | SiliconFlow(OpenAI 兼容) | `https://api.siliconflow.cn/v1` | `Qwen/Qwen2-VL-72B-Instruct` | 视觉模型丰富 |
- | `modelscope` | ModelScope API-Inference(OpenAI 兼容) | `https://api-inference.modelscope.cn/v1` | `ZhipuAI/GLM-4.6V` | 需实名/绑定阿里云,受限额影响 |
- | `openai` | OpenAI | `https://api.openai.com/v1` | `gpt-4o` | 适配 Chat Completions |
- | `claude` | Anthropic Claude(Messages API) | `https://api.anthropic.com` | `claude-3-5-sonnet-20241022` | `baseUrl` 不要带 `/v1` |
- | `gemini` | Google Gemini(generateContent API) | `https://generativelanguage.googleapis.com` | `gemini-2.0-flash-exp` | 官方入口;代理/网关可通过 `VISION_API_BASE_URL` 覆盖 |
+ | type | Provider | Default `VISION_API_BASE_URL` | Default `VISION_MODEL_NAME` | Notes |
+ |------|----------|-------------------------------|----------------------------|-------|
+ | `glm` | Zhipu GLM-4.6V | `https://open.bigmodel.cn/api/paas/v4` | `glm-4.6v` | GLM-4.6V Zhipu vision model |
+ | `siliconflow` | SiliconFlow (OpenAI compatible) | `https://api.siliconflow.cn/v1` | `Qwen/Qwen2-VL-72B-Instruct` | Rich vision model selection |
+ | `modelscope` | ModelScope API-Inference (OpenAI compatible) | `https://api-inference.modelscope.cn/v1` | `ZhipuAI/GLM-4.6V` | Requires real-name verification/Aliyun binding, subject to quotas |
+ | `openai` | OpenAI | `https://api.openai.com/v1` | `gpt-4o` | Chat Completions compatible |
+ | `claude` | Anthropic Claude (Messages API) | `https://api.anthropic.com` | `claude-3-5-sonnet-20241022` | `baseUrl` should not include `/v1` |
+ | `gemini` | Google Gemini (generateContent API) | `https://generativelanguage.googleapis.com` | `gemini-2.0-flash-exp` | Official entry; proxies/gateways can override via `VISION_API_BASE_URL` |

- 获取 API Key / Token(各平台控制台):
+ Get API Key / Token (from respective platform consoles):

- - GLM(智谱):https://open.bigmodel.cn/
- - SiliconFlow:https://cloud.siliconflow.cn/
- - ModelScope:https://modelscope.cn/my/myaccesstoken
- - OpenAI:https://platform.openai.com/
- - Claude(Anthropic):https://console.anthropic.com/
- - Gemini(Google AI):https://ai.google.dev/
+ - GLM (Zhipu): https://open.bigmodel.cn/
+ - SiliconFlow: https://cloud.siliconflow.cn/
+ - ModelScope: https://modelscope.cn/my/myaccesstoken
+ - OpenAI: https://platform.openai.com/
+ - Claude (Anthropic): https://console.anthropic.com/
+ - Gemini (Google AI): https://ai.google.dev/

- ## 特性
+ ## Features

- - Provider 一键切换(仅需改环境变量)
- - 图片输入:URL / base64 data URL / 本地文件路径
- - 内置系统提示词模板:UI 分析、OCR、目标检测、结构化提取等
- - 安全:日志自动脱敏 API Key,且会过滤模型返回的 thinking/reasoning 内容
- - 严格遵守 MCP:stdout 仅用于 JSON-RPC,日志走 stderr
+ - One-click provider switching (just change environment variables)
+ - Image input support: URL / base64 data URL / local file path
+ - Built-in system prompt templates: UI analysis, OCR, object detection, structured extraction, etc.
+ - Security: Automatic API key masking in logs, filters model-returned thinking/reasoning content
+ - MCP compliant: stdout reserved for JSON-RPC, logs go to stderr

- ## 安装与运行
+ ## Installation & Usage

- 要求:Node.js >= 18
+ Requirements: Node.js >= 18

- ### 作为 NPM 包被 MCP 客户端启动(推荐)
+ ### Running as NPM package from MCP client (Recommended)

- MCP 客户端(如 Claude Desktop)里配置命令为 `npx`:
+ Configure your MCP client (e.g., Claude Desktop) to use `npx`:

  ```json
  {
@@ -57,18 +57,18 @@
  }
  ```

- 说明:
- - `VISION_MODEL_NAME` / `VISION_API_BASE_URL` 可省略(会使用该 provider 的默认值)
- - 如需更详细的配置项,建议直接参考 `.env.example`
+ Notes:
+ - `VISION_MODEL_NAME` / `VISION_API_BASE_URL` are optional (will use provider defaults)
+ - For more configuration options, refer to `.env.example`

- 也可以全局安装后直接使用可执行文件(`bin` 名称为 `vision-mcp`):
+ You can also install globally and use the executable directly (binary name: `vision-mcp`):

  ```bash
  npm i -g @lutery/vision-mcp
  vision-mcp
  ```

- ### 本地开发运行
+ ### Local development

  ```bash
  cd mcp/vision_mcp
@@ -77,65 +77,65 @@ npm run build
  node dist/index.js
  ```

- 成功启动后,会在 stderr 输出 `Vision MCP Server is running on stdio`。
+ On successful startup, you'll see `Vision MCP Server is running on stdio` in stderr.

- ## 配置(环境变量)
+ ## Configuration (Environment Variables)

- 最小必填:
+ Minimum required:

- - `VISION_MODEL_TYPE`:选择 provider
- - `VISION_API_KEY`:对应 provider 的 key/token
+ - `VISION_MODEL_TYPE`: Select provider
+ - `VISION_API_KEY`: Key/token for the selected provider

- 常用可选:
+ Common optional:

- | 变量 | 说明 | 默认 |
- |------|------|------|
- | `VISION_MODEL_NAME` | 模型名称 | provider 内置默认值 |
- | `VISION_API_BASE_URL` | API 基础地址(不要带具体 endpoint) | provider 内置默认值 |
- | `VISION_API_TIMEOUT` | 超时时间(毫秒) | `60000` |
- | `VISION_MAX_RETRIES` | 最大重试次数 | `2` |
- | `VISION_STRICT_URL_VALIDATION` | 严格校验图片 URL 是否以 `.jpg/.jpeg/.png/.webp` 结尾 | `true` |
- | `LOG_LEVEL` | 日志级别:`debug`/`info`/`warn`/`error` | `info` |
+ | Variable | Description | Default |
+ |----------|-------------|---------|
+ | `VISION_MODEL_NAME` | Model name | Provider built-in defaults |
+ | `VISION_API_BASE_URL` | API base URL (without specific endpoint) | Provider built-in defaults |
+ | `VISION_API_TIMEOUT` | Timeout (milliseconds) | `60000` |
+ | `VISION_MAX_RETRIES` | Maximum retry attempts | `2` |
+ | `VISION_STRICT_URL_VALIDATION` | Strict validation for image URLs ending in `.jpg/.jpeg/.png/.webp` | `true` |
+ | `LOG_LEVEL` | Log level: `debug`/`info`/`warn`/`error` | `info` |
 
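The two required variables and the documented defaults can be read as a small resolution step. The following is a minimal sketch of that step, not the package's actual loader: `resolveConfig`, `VisionConfig`, and `PROVIDER_DEFAULTS` are hypothetical names, and the way `VISION_STRICT_URL_VALIDATION` is parsed (anything other than the string `false` counts as enabled) is an assumption. The default URLs, model names, and numeric defaults are copied from the tables in this README.

```typescript
// Hypothetical sketch: type and function names are illustrative, not the package's API.
type ModelType = "glm" | "siliconflow" | "modelscope" | "openai" | "claude" | "gemini";

interface VisionConfig {
  type: ModelType;
  apiKey: string;
  baseUrl: string;
  model: string;
  timeoutMs: number;
  maxRetries: number;
  strictUrlValidation: boolean;
}

// Defaults copied from the provider table in the README.
const PROVIDER_DEFAULTS: Record<ModelType, { baseUrl: string; model: string }> = {
  glm: { baseUrl: "https://open.bigmodel.cn/api/paas/v4", model: "glm-4.6v" },
  siliconflow: { baseUrl: "https://api.siliconflow.cn/v1", model: "Qwen/Qwen2-VL-72B-Instruct" },
  modelscope: { baseUrl: "https://api-inference.modelscope.cn/v1", model: "ZhipuAI/GLM-4.6V" },
  openai: { baseUrl: "https://api.openai.com/v1", model: "gpt-4o" },
  claude: { baseUrl: "https://api.anthropic.com", model: "claude-3-5-sonnet-20241022" },
  gemini: { baseUrl: "https://generativelanguage.googleapis.com", model: "gemini-2.0-flash-exp" },
};

// Parse an integer variable, falling back when it is unset or not a number.
function intOr(value: string | undefined, fallback: number): number {
  const parsed = Number.parseInt(value ?? "", 10);
  return Number.isNaN(parsed) ? fallback : parsed;
}

function resolveConfig(env: Record<string, string | undefined>): VisionConfig {
  const type = env.VISION_MODEL_TYPE;
  if (!type) throw new Error("Missing VISION_MODEL_TYPE");
  if (!Object.prototype.hasOwnProperty.call(PROVIDER_DEFAULTS, type)) {
    throw new Error(`Unsupported model type: ${type}`);
  }
  const apiKey = env.VISION_API_KEY;
  if (!apiKey) throw new Error("Missing VISION_API_KEY");

  const defaults = PROVIDER_DEFAULTS[type as ModelType];
  return {
    type: type as ModelType,
    apiKey,
    baseUrl: env.VISION_API_BASE_URL ?? defaults.baseUrl,
    model: env.VISION_MODEL_NAME ?? defaults.model,
    timeoutMs: intOr(env.VISION_API_TIMEOUT, 60000),
    maxRetries: intOr(env.VISION_MAX_RETRIES, 2),
    // Assumption: only the literal string "false" disables strict validation.
    strictUrlValidation: env.VISION_STRICT_URL_VALIDATION !== "false",
  };
}
```

Calling `resolveConfig(process.env)` with only `VISION_MODEL_TYPE=claude` and `VISION_API_KEY` set would, under these assumptions, yield the Claude base URL and model from the provider table together with the `60000` ms timeout and `2` retries.
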
- Provider 特有配置:
+ Provider-specific configuration:

  - Claude
- - `VISION_CLAUDE_API_VERSION`:Anthropic API 版本(默认 `2023-06-01`)
+ - `VISION_CLAUDE_API_VERSION`: Anthropic API version (default `2023-06-01`)

- ## MCP 工具(Tools)
+ ## MCP Tools

- 本服务注册了 3 个工具:
+ This server registers 3 tools:

  ### 1) `analyze_image`

- 参数:
+ Parameters:

  ```json
  {
  "image": "https://example.com/a.png",
- "prompt": "请描述这个界面有哪些组件",
+ "prompt": "Describe the components in this interface",
  "output_format": "text",
  "template": "ui-analysis"
  }
  ```

- 字段说明:
- - `image`:支持 URL / base64 data URL / 本地路径
- - `prompt`:你的分析任务描述
- - `output_format`:`text` 或 `json`(提示偏好;不会强制校验 JSON)
- - `template`:可选系统模板(见下方 `list_templates`)
+ Field descriptions:
+ - `image`: Supports URL / base64 data URL / local path
+ - `prompt`: Your analysis task description
+ - `output_format`: `text` or `json` (hint preference; JSON not strictly validated)
+ - `template`: Optional system template (see `list_templates` below)

  ### 2) `list_templates`

- 列出内置系统提示词模板(包含 id、用途说明等)。
+ Lists built-in system prompt templates (including id, usage description, etc.).

  ### 3) `get_config`

- 返回当前生效的模型配置(API Key 会脱敏)。
+ Returns currently active model configuration (API key is masked).

- ## 图片输入规范
+ ## Image Input Specifications

- 支持三种输入:
+ Supports three input types:

  1) URL

@@ -143,7 +143,7 @@ Provider 特有配置:
  https://example.com/image.png
  ```

- 默认开启严格校验:URL 必须以 `.jpg/.jpeg/.png/.webp` 结尾,否则报错。可通过 `VISION_STRICT_URL_VALIDATION=false` 放宽(仅告警)。
+ Strict validation enabled by default: URL must end with `.jpg/.jpeg/.png/.webp`, otherwise error. Can be relaxed with `VISION_STRICT_URL_VALIDATION=false` (warning only).

  2) Base64 Data URL

@@ -151,26 +151,26 @@ https://example.com/image.png
  data:image/png;base64,iVBORw0KGgo...
  ```

- 支持的 MIME:`image/jpeg` / `image/jpg` / `image/png` / `image/webp`。
+ Supported MIME types: `image/jpeg` / `image/jpg` / `image/png` / `image/webp`.

- 3) 本地文件路径
+ 3) Local file path

  ```text
  ./test/image.png
  D:\\path\\to\\image.jpg
  ```

- 要求 MCP Server 进程对该路径可读;仅支持 `.jpg/.jpeg/.png/.webp`。
+ Requires MCP Server process to have read access; only supports `.jpg/.jpeg/.png/.webp`.

- 补充:Gemini provider 不支持直接传 URL 图片,本项目会在 Gemini 适配器内下载 URL 并转 base64(有大小与超时限制)。
+ Note: Gemini provider doesn't support direct URL image input; this project downloads URLs and converts to base64 in the Gemini adapter (subject to size and timeout limits).

- ## 关于流式响应(Streaming)
+ ## About Streaming

- 所有适配器均强制 `stream: false`,并按“完整 JSON 响应”进行解析。
+ All adapters enforce `stream: false` and parse as "complete JSON response".

- 如果某个上游只支持 SSE / `text/event-stream`,目前不支持(需要额外实现流式解析器)。
+ SSE / `text/event-stream` responses are currently not supported (would require additional streaming parser implementation).

- ## 开发与测试
+ ## Development & Testing

  ```bash
  cd mcp/vision_mcp
@@ -178,45 +178,45 @@ npm install
  npm run build
  ```

- 测试:
+ Testing:

- - 仅跑单测(不需要任何 API Key):
+ - Unit tests only (no API keys required):

  ```bash
  npm run test:unit
  ```

- - 跑集成测试(需要配置好 `VISION_*` 环境变量):
+ - Integration tests (requires `VISION_*` environment variables configured):

  ```bash
  npm test
  ```

- ## 常见问题(Troubleshooting)
+ ## Troubleshooting

- ### 1) 配置加载失败:`Missing VISION_MODEL_TYPE` / `Unsupported model type`
+ ### 1) Configuration loading failed: `Missing VISION_MODEL_TYPE` / `Unsupported model type`

- - 确认设置了 `VISION_MODEL_TYPE`
- - 可用值:`glm` / `siliconflow` / `modelscope` / `openai` / `claude` / `gemini`
+ - Ensure `VISION_MODEL_TYPE` is set
+ - Valid values: `glm` / `siliconflow` / `modelscope` / `openai` / `claude` / `gemini`

  ### 2) `Missing VISION_API_KEY`

- - 确认 `VISION_API_KEY` 已设置(在 `.env` 或 MCP 客户端 `env` 里)
+ - Ensure `VISION_API_KEY` is set (in `.env` or MCP client `env`)

- ### 3) 404 / endpoint 错误
+ ### 3) 404 / endpoint errors

- - `VISION_API_BASE_URL` 必须是“base”,不要带具体 endpoint
- - OpenAI / SiliconFlow / ModelScope:会自动拼 `/chat/completions`
- - Claude:会自动拼 `/v1/messages`(`baseUrl` 不要写成 `.../v1`)
- - Gemini:会自动拼 `/{apiVersion}/models/{model}:generateContent`
+ - `VISION_API_BASE_URL` must be the "base" URL, without specific endpoint
+ - OpenAI / SiliconFlow / ModelScope: automatically appends `/chat/completions`
+ - Claude: automatically appends `/v1/messages` (don't write `baseUrl` as `.../v1`)
+ - Gemini: automatically appends `/{apiVersion}/models/{model}:generateContent`
 
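The endpoint rules listed under troubleshooting item 3 can be illustrated with a short sketch. This is not the adapters' code: `buildEndpoint` and `EndpointProvider` are hypothetical names, the trailing-slash trimming is an assumption, and the Gemini API version is passed in by the caller because the README does not state its default. GLM is omitted because the list does not say what is appended for it.

```typescript
// Hypothetical sketch of the documented suffix rules; not taken from the package source.
type EndpointProvider = "openai" | "siliconflow" | "modelscope" | "claude" | "gemini";

function buildEndpoint(
  provider: EndpointProvider,
  baseUrl: string,
  model: string,
  geminiApiVersion?: string, // required for "gemini"; the package default is not documented here
): string {
  // Assumption: trailing slashes on the base URL are dropped before appending.
  const base = baseUrl.replace(/\/+$/, "");
  switch (provider) {
    case "openai":
    case "siliconflow":
    case "modelscope":
      return `${base}/chat/completions`;
    case "claude":
      return `${base}/v1/messages`;
    case "gemini":
      if (!geminiApiVersion) throw new Error("geminiApiVersion is required for gemini");
      return `${base}/${geminiApiVersion}/models/${model}:generateContent`;
  }
}
```

Under these rules a Claude base URL that already ends in `/v1` would produce `.../v1/v1/messages`, which is one way the 404 in this item can arise.
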
- ### 4) 图片 URL 校验失败
+ ### 4) Image URL validation failed

- - 默认要求 URL 以 `.jpg/.jpeg/.png/.webp` 结尾
- - 如需放宽:`VISION_STRICT_URL_VALIDATION=false`
+ - Default requires URL to end with `.jpg/.jpeg/.png/.webp`
+ - To relax: `VISION_STRICT_URL_VALIDATION=false`
 
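As a rough illustration of the strict versus relaxed behaviour described in item 4, the check might look like the sketch below. `validateImageUrl` is a hypothetical name, and two details are assumptions rather than documented behaviour: the comparison is case-insensitive, and it is applied to the whole URL string (so a URL with a query string after the extension would fail).

```typescript
// Hypothetical sketch; the package's real validator may differ in both details noted above.
const ALLOWED_EXTENSIONS = [".jpg", ".jpeg", ".png", ".webp"];

// Returns null when the URL passes, or a warning message in relaxed mode.
// In strict mode a failing URL throws instead.
function validateImageUrl(url: string, strict: boolean = true): string | null {
  const lower = url.toLowerCase();
  if (ALLOWED_EXTENSIONS.some((ext) => lower.endsWith(ext))) {
    return null;
  }
  const message = `Image URL does not end with ${ALLOWED_EXTENSIONS.join("/")}: ${url}`;
  if (strict) {
    throw new Error(message);
  }
  return message; // the caller would log this as a warning on stderr
}
```

With `VISION_STRICT_URL_VALIDATION=false` the caller would pass `strict = false` and continue after logging the returned warning.
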
- ## 安全说明
+ ## Security Notes

- - 不要在 stdout 打日志(stdout 仅用于 MCP JSON-RPC),本项目日志统一走 stderr
- - API Key 会在日志中脱敏
- - 会无条件过滤模型返回的 thinking/reasoning 内容,避免泄露内部推理信息
+ - Don't log to stdout (stdout reserved for MCP JSON-RPC), all logs go to stderr
+ - API keys are masked in logs
+ - Unconditionally filters model-returned thinking/reasoning content to avoid leaking internal inference information
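
The first two security notes (logs on stderr only, API keys masked) can be sketched as follows. The helper names `maskSecret`, `redact`, and `logInfo` are hypothetical, and the masking format (keep the first and last four characters) is an assumption; the README does not say how the package masks keys.

```typescript
// Hypothetical sketch of key masking plus stderr-only logging; not the package's logger.

// Keep the first and last `visible` characters; fully mask short secrets.
function maskSecret(secret: string, visible: number = 4): string {
  if (secret.length <= visible * 2) {
    return "*".repeat(secret.length);
  }
  const hidden = "*".repeat(secret.length - visible * 2);
  return secret.slice(0, visible) + hidden + secret.slice(-visible);
}

// Replace every occurrence of the API key in a log message with its masked form.
function redact(message: string, apiKey?: string): string {
  if (!apiKey) return message;
  return message.split(apiKey).join(maskSecret(apiKey));
}

function logInfo(message: string, apiKey?: string): void {
  // console.error writes to stderr in Node.js, leaving stdout free for JSON-RPC frames.
  console.error(`[info] ${redact(message, apiKey)}`);
}
```

Routing every log call through a single helper like `logInfo` keeps stdout clean by construction, which matters for an STDIO server where stdout carries the JSON-RPC stream.
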
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@lutery/vision-mcp",
- "version": "1.0.2",
+ "version": "1.0.3",
  "description": "MCP Server providing vision capabilities for LLMs via GLM-4.6V, SiliconFlow, and ModelScope",
  "type": "module",
  "bin": {
@@ -18,6 +18,10 @@
  "test:smoke": "node test/smoke-glm-adapter.mjs",
  "test:integration": "npm run build && node -r dotenv/config test/simple-test.js"
  },
+ "repository": {
+ "type": "git",
+ "url": "https://github.com/lutery/mcp.git"
+ },
  "keywords": [
  "mcp",
  "vision",