@msalman5230/image-understand-mcp 1.0.1 → 1.0.3
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/LICENSE +21 -0
- package/README.md +81 -91
- package/package.json +2 -1
package/LICENSE
ADDED
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 MSalman5230
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
package/README.md
CHANGED
@@ -1,84 +1,9 @@
 # Image Understand MCP Server
 
-Local MCP server that lets an LLM agent without native vision understand local image files through Google Gemini/
+Local MCP server that lets an LLM agent without native vision understand local image files through Google Gemini/Gemma models.
 
 The server runs over stdio and exposes image analysis tools for local image paths.
 
-## Requirements
-
-- Node.js 18 or newer
-- A Gemini API key in `GEMINI_API_KEY`
-- Local image files (`.png`, `.jpg`, `.jpeg`, `.webp`, `.gif`, `.bmp`, `.heic`, `.heif`)
-
-## Install
-
-```bash
-npm install
-npm run build
-```
-
-## Publish for `npx`
-
-The npm package is published as `@msalman5230/image-understand-mcp` and exposes a CLI binary named `image-understand-mcp`, so users do not need to point their MCP client at `dist/index.js`.
-
-Before publishing:
-
-```bash
-npm run check
-npm pack --dry-run
-```
-
-Publish:
-
-```bash
-npm login
-npm publish --access public
-```
-
-Scoped npm packages must use `--access public` on publish unless you want a private/restricted package.
-
-After that, MCP clients can launch the server with:
-
-```bash
-npx -y @msalman5230/image-understand-mcp
-```
-
-For unreleased local testing, keep using `node dist/index.js`, or run `npm link` from this repo and use the linked `image-understand-mcp` binary.
-
-## Release Versions
-
-The first public release is `1.0.0`.
-
-For future releases, use npm's semver bump command from the repo root:
-
-```bash
-npm version patch
-git push origin main --follow-tags
-```
-
-Use `patch` for fixes, `minor` for backward-compatible features, and `major` for breaking changes.
-
-## GitHub Actions Publishing
-
-After the first manual publish, configure npm Trusted Publishing for package `@msalman5230/image-understand-mcp`:
-
-- Publisher: GitHub Actions
-- Repository: `MSalman5230/image-understand-mcp`
-- Workflow filename: `publish.yml`
-
-Once trusted publishing is configured, pushing a `v*.*.*` tag publishes that package version automatically.
-
-## Environment
-
-- `GEMINI_API_KEY`: required Google Gemini API key
-- `GEMINI_MODEL`: optional model ID, defaults to `gemini-3.5-flash`
-- `IMAGE_UNDERSTAND_INLINE_LIMIT_BYTES`: optional inline image limit, defaults to 18 MiB
-- `IMAGE_UNDERSTAND_MAX_IMAGE_BYTES`: optional maximum image size, defaults to 100 MiB
-
-The MCP server reads only the environment of the process that launches it. It does not load `.env`, `.env.local`, or any other dotenv file. For Codex/OpenCode usage, pass `GEMINI_API_KEY` and `GEMINI_MODEL` through that client config or through the parent shell environment.
-
-Gemma support in v1 is configuration-based: set `GEMINI_MODEL` to a Google-accessible, vision-capable Gemma model ID if your account/runtime supports it. This server does not include a local Gemma runtime.
-
 ## Tool
 
 `analyze_image`
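This hunk removes the Requirements and Environment lists, which reappear under `## Development` later in the README. Together they imply two pre-flight checks on an image path: the file must have one of the listed extensions, and its size must be compared against the inline limit and the maximum. The TypeScript sketch below shows those checks. The names `isSupportedImage` and `planImage`, the case-insensitive extension match, and the three-way outcome are assumptions made for illustration and are not taken from this package's code. The README also does not say what the server does with an image that exceeds the inline limit but stays under the maximum.

```typescript
// Hypothetical pre-flight checks; not taken from @msalman5230/image-understand-mcp.
// Extension list copied from the README's Requirements section.
const SUPPORTED_EXTENSIONS: ReadonlySet<string> = new Set([
  ".png", ".jpg", ".jpeg", ".webp", ".gif", ".bmp", ".heic", ".heif",
]);

// Outcome labels are illustrative. "too-large-for-inline" only means the file exceeds
// IMAGE_UNDERSTAND_INLINE_LIMIT_BYTES; how the server handles that case is not documented here.
type ImagePlan = "inline" | "too-large-for-inline" | "rejected";

// Lower-cased extension of the last path segment, accepting both / and \ separators.
// A file named just ".png" is treated as having no extension.
function extensionOf(filePath: string): string {
  const base = filePath.split(/[\\/]/).pop() ?? "";
  const dot = base.lastIndexOf(".");
  return dot > 0 ? base.slice(dot).toLowerCase() : "";
}

function isSupportedImage(filePath: string): boolean {
  return SUPPORTED_EXTENSIONS.has(extensionOf(filePath));
}

// Compare a file size against the two documented limits.
function planImage(sizeBytes: number, inlineLimitBytes: number, maxImageBytes: number): ImagePlan {
  if (sizeBytes > maxImageBytes) return "rejected";
  return sizeBytes <= inlineLimitBytes ? "inline" : "too-large-for-inline";
}

// Example with the README defaults (18 MiB inline limit, 100 MiB maximum).
const MiB = 1024 * 1024;
console.log(isSupportedImage("C:/path/to/image.JPG"), planImage(25 * MiB, 18 * MiB, 100 * MiB));
// prints: true too-large-for-inline
```

The size limits are passed in as plain parameters so that the checks do not depend on how the limits are loaded from the environment.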
@@ -113,20 +38,20 @@ The tool returns human-readable text plus structured content:
 Add this to `~/.codex/config.toml` after publishing the package to npm:
 
 ```toml
-[mcp_servers.image_understand]
-command = "npx"
-args = ["-y", "@msalman5230/image-understand-mcp"]
-env = { GEMINI_API_KEY = "YOUR_KEY", GEMINI_MODEL = "gemini-3.5-flash" }
-```
+[mcp_servers.image_understand]
+command = "npx"
+args = ["-y", "@msalman5230/image-understand-mcp"]
+env = { GEMINI_API_KEY = "YOUR_KEY", GEMINI_MODEL = "gemini-3.5-flash" }
+```
 
 You can also keep the API key outside the config and let Codex inherit the environment:
 
 ```toml
-[mcp_servers.image_understand]
-command = "npx"
-args = ["-y", "@msalman5230/image-understand-mcp"]
-env = { GEMINI_MODEL = "gemini-3.5-flash" }
-```
+[mcp_servers.image_understand]
+command = "npx"
+args = ["-y", "@msalman5230/image-understand-mcp"]
+env = { GEMINI_MODEL = "gemini-3.5-flash" }
+```
 
 For local development before publishing, use the built file directly:
 
@@ -145,10 +70,10 @@ Add this to `opencode.json`:
 {
 "$schema": "https://opencode.ai/config.json",
 "mcp": {
-"image_understand": {
-"type": "local",
-"command": ["npx", "-y", "@msalman5230/image-understand-mcp"],
-"enabled": true,
+"image_understand": {
+"type": "local",
+"command": ["npx", "-y", "@msalman5230/image-understand-mcp"],
+"enabled": true,
 "environment": {
 "GEMINI_API_KEY": "{env:GEMINI_API_KEY}",
 "GEMINI_MODEL": "gemini-3.5-flash"
@@ -168,9 +93,32 @@ In OpenCode, MCP tools are shown as normal tools, often with the MCP server name
 
 ## Development
 
+### Requirements
+
+- Node.js 18 or newer
+- A Gemini API key in `GEMINI_API_KEY`
+- Local image files (`.png`, `.jpg`, `.jpeg`, `.webp`, `.gif`, `.bmp`, `.heic`, `.heif`)
+
+### Environment
+
+- `GEMINI_API_KEY`: required Google Gemini API key
+- `GEMINI_MODEL`: optional model ID, defaults to `gemini-3.5-flash`
+- `IMAGE_UNDERSTAND_INLINE_LIMIT_BYTES`: optional inline image limit, defaults to 18 MiB
+- `IMAGE_UNDERSTAND_MAX_IMAGE_BYTES`: optional maximum image size, defaults to 100 MiB
+
+The MCP server reads only the environment of the process that launches it. It does not load `.env`, `.env.local`, or any other dotenv file. For Codex/OpenCode usage, pass `GEMINI_API_KEY` and `GEMINI_MODEL` through that client config or through the parent shell environment.
+
+Gemma support in v1 is configuration-based: set `GEMINI_MODEL` to a Google-accessible, vision-capable Gemma model ID if your account/runtime supports it. This server does not include a local Gemma runtime.
+
+### Install
+
 ```bash
-npm
+npm install
 npm run build
+```
+
+```bash
+npm test
 npm run check
 ```
 
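The Environment list added in this hunk defines a small configuration contract: one required key and three optional settings with defaults. The TypeScript sketch below resolves that contract. It assumes the two `*_BYTES` variables contain plain positive integers, since the README does not specify a format, and it treats blank values the same as unset ones. `loadConfig` and `parseBytes` are hypothetical names, not part of this package's API.

```typescript
// Hypothetical configuration loader; names and validation rules are assumptions.
// Default values are the ones the README lists.
const MIB = 1024 * 1024;

interface ServerConfig {
  apiKey: string;
  model: string;
  inlineLimitBytes: number;
  maxImageBytes: number;
}

type Env = Record<string, string | undefined>;

// Assumed format: a positive integer byte count. Unset or blank falls back to the default.
function parseBytes(env: Env, name: string, fallback: number): number {
  const raw = env[name]?.trim();
  if (!raw) return fallback;
  const value = Number(raw);
  if (!Number.isSafeInteger(value) || value <= 0) {
    throw new Error(`${name} must be a positive integer byte count, got "${raw}"`);
  }
  return value;
}

// The environment is passed in explicitly. A Node.js entry point would call
// loadConfig(process.env), because the server reads no dotenv files.
function loadConfig(env: Env): ServerConfig {
  const apiKey = env["GEMINI_API_KEY"]?.trim();
  if (!apiKey) throw new Error("GEMINI_API_KEY is required");
  return {
    apiKey,
    model: env["GEMINI_MODEL"]?.trim() || "gemini-3.5-flash",
    inlineLimitBytes: parseBytes(env, "IMAGE_UNDERSTAND_INLINE_LIMIT_BYTES", 18 * MIB),
    maxImageBytes: parseBytes(env, "IMAGE_UNDERSTAND_MAX_IMAGE_BYTES", 100 * MIB),
  };
}
```

Taking the environment as an argument keeps the function easy to test. It also matches the README's statement that only the launching process's environment is consulted: whatever the client config or parent shell sets is exactly what the function receives.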
@@ -184,3 +132,45 @@ npm run smoke -- "C:/path/to/image.jpg" "What is this image?"
 The smoke script loads `.env.local` for development convenience. The MCP server itself does not load dotenv files.
 
 For stdio MCP servers, stdout is reserved for JSON-RPC messages. This server writes diagnostics to stderr only.
+
+### Publish for `npx`
+
+The npm package is published as `@msalman5230/image-understand-mcp` and exposes a CLI binary named `image-understand-mcp`, so users do not need to point their MCP client at `dist/index.js`.
+
+Before publishing:
+
+```bash
+npm run check
+npm pack --dry-run
+```
+
+Publish:
+
+```bash
+npm login
+npm publish --access public
+```
+
+Scoped npm packages must use `--access public` on publish unless you want a private/restricted package.
+
+After that, MCP clients can launch the server with:
+
+```bash
+npx -y @msalman5230/image-understand-mcp
+```
+
+For unreleased local testing, keep using `node dist/index.js`, or run `npm link` from this repo and use the linked `image-understand-mcp` binary.
+
+### Release Versions
+
+The first public release is `1.0.0`.
+
+For future releases, use npm's semver bump command from the repo root:
+
+```bash
+npm version patch
+git push origin main --follow-tags
+```
+
+Use `patch` for fixes, `minor` for backward-compatible features, and `major` for breaking changes.
+
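The context lines of this hunk state the stdio rule: stdout is reserved for JSON-RPC messages, and the server writes diagnostics to stderr. The TypeScript sketch below shows that split. It follows the MCP specification's stdio transport, where each JSON-RPC message is one newline-terminated line with no embedded newlines. `frameForStdout` and `logDiagnostic` are hypothetical helpers; they are not exported by this package or by the MCP SDK.

```typescript
// Hypothetical helpers illustrating the stdout/stderr split; not this package's code.
// MCP stdio transport: stdout carries newline-delimited JSON-RPC messages only.

interface JsonRpcMessage {
  jsonrpc: "2.0";
  id?: number | string;
  method?: string;
  params?: unknown;
  result?: unknown;
}

// JSON.stringify with no indent argument escapes newlines inside string values,
// so each frame is exactly one line terminated by "\n".
function frameForStdout(message: JsonRpcMessage): string {
  return JSON.stringify(message) + "\n";
}

// Diagnostics go to a sink that defaults to console.error, which writes to stderr in Node.js.
// Using console.log instead would write to stdout and corrupt the JSON-RPC stream.
function logDiagnostic(text: string, sink: (line: string) => void = (line) => console.error(line)): void {
  sink(`[image-understand-mcp] ${text}`);
}

// Example: a tool result whose text contains a newline still frames as a single line.
const frame = frameForStdout({ jsonrpc: "2.0", id: 1, result: { text: "line 1\nline 2" } });
logDiagnostic(`framed a ${frame.length}-character message`);
```

Routing every diagnostic through one stderr sink makes it harder for a stray `console.log` to leak non-protocol text onto stdout. In a stdio server, such text would be read by the client as a malformed message.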
package/package.json
CHANGED
@@ -1,7 +1,8 @@
 {
 "name": "@msalman5230/image-understand-mcp",
-"version": "1.0.
+"version": "1.0.3",
 "description": "Local MCP server that lets text-only agents understand local images through Gemini vision models.",
+"license": "MIT",
 "type": "module",
 "repository": {
 "type": "git",