@msalman5230/image-understand-mcp 1.0.2 → 1.0.4
- package/README.md +40 -91
- package/package.json +1 -1
package/README.md
CHANGED
@@ -1,84 +1,9 @@
 # Image Understand MCP Server
 
-Local MCP server that lets an LLM agent without native vision understand local image files through Google Gemini/
+Local MCP server that lets an LLM agent without native vision understand local image files through Google Gemini/emma models.
 
 The server runs over stdio and exposes image analysis tools for local image paths.
 
-## Requirements
-
-- Node.js 18 or newer
-- A Gemini API key in `GEMINI_API_KEY`
-- Local image files (`.png`, `.jpg`, `.jpeg`, `.webp`, `.gif`, `.bmp`, `.heic`, `.heif`)
-
-## Install
-
-```bash
-npm install
-npm run build
-```
-
-## Publish for `npx`
-
-The npm package is published as `@msalman5230/image-understand-mcp` and exposes a CLI binary named `image-understand-mcp`, so users do not need to point their MCP client at `dist/index.js`.
-
-Before publishing:
-
-```bash
-npm run check
-npm pack --dry-run
-```
-
-Publish:
-
-```bash
-npm login
-npm publish --access public
-```
-
-Scoped npm packages must use `--access public` on publish unless you want a private/restricted package.
-
-After that, MCP clients can launch the server with:
-
-```bash
-npx -y @msalman5230/image-understand-mcp
-```
-
-For unreleased local testing, keep using `node dist/index.js`, or run `npm link` from this repo and use the linked `image-understand-mcp` binary.
-
-## Release Versions
-
-The first public release is `1.0.0`.
-
-For future releases, use npm's semver bump command from the repo root:
-
-```bash
-npm version patch
-git push origin main --follow-tags
-```
-
-Use `patch` for fixes, `minor` for backward-compatible features, and `major` for breaking changes.
-
-## GitHub Actions Publishing
-
-After the first manual publish, configure npm Trusted Publishing for package `@msalman5230/image-understand-mcp`:
-
-- Publisher: GitHub Actions
-- Repository: `MSalman5230/image-understand-mcp`
-- Workflow filename: `publish.yml`
-
-Once trusted publishing is configured, pushing a `v*.*.*` tag publishes that package version automatically.
-
-## Environment
-
-- `GEMINI_API_KEY`: required Google Gemini API key
-- `GEMINI_MODEL`: optional model ID, defaults to `gemini-3.5-flash`
-- `IMAGE_UNDERSTAND_INLINE_LIMIT_BYTES`: optional inline image limit, defaults to 18 MiB
-- `IMAGE_UNDERSTAND_MAX_IMAGE_BYTES`: optional maximum image size, defaults to 100 MiB
-
-The MCP server reads only the environment of the process that launches it. It does not load `.env`, `.env.local`, or any other dotenv file. For Codex/OpenCode usage, pass `GEMINI_API_KEY` and `GEMINI_MODEL` through that client config or through the parent shell environment.
-
-Gemma support in v1 is configuration-based: set `GEMINI_MODEL` to a Google-accessible, vision-capable Gemma model ID if your account/runtime supports it. This server does not include a local Gemma runtime.
-
 ## Tool
 
 `analyze_image`
@@ -113,20 +38,20 @@ The tool returns human-readable text plus structured content:
 Add this to `~/.codex/config.toml` after publishing the package to npm:
 
 ```toml
-[mcp_servers.image_understand]
-command = "npx"
-args = ["-y", "@msalman5230/image-understand-mcp"]
-env = { GEMINI_API_KEY = "YOUR_KEY", GEMINI_MODEL = "gemini-3.5-flash" }
-```
+[mcp_servers.image_understand]
+command = "npx"
+args = ["-y", "@msalman5230/image-understand-mcp"]
+env = { GEMINI_API_KEY = "YOUR_KEY", GEMINI_MODEL = "gemini-3.5-flash" }
+```
 
 You can also keep the API key outside the config and let Codex inherit the environment:
 
 ```toml
-[mcp_servers.image_understand]
-command = "npx"
-args = ["-y", "@msalman5230/image-understand-mcp"]
-env = { GEMINI_MODEL = "gemini-3.5-flash" }
-```
+[mcp_servers.image_understand]
+command = "npx"
+args = ["-y", "@msalman5230/image-understand-mcp"]
+env = { GEMINI_MODEL = "gemini-3.5-flash" }
+```
 
 For local development before publishing, use the built file directly:
 
@@ -145,10 +70,10 @@ Add this to `opencode.json`:
 {
 "$schema": "https://opencode.ai/config.json",
 "mcp": {
-"image_understand": {
-"type": "local",
-"command": ["npx", "-y", "@msalman5230/image-understand-mcp"],
-"enabled": true,
+"image_understand": {
+"type": "local",
+"command": ["npx", "-y", "@msalman5230/image-understand-mcp"],
+"enabled": true,
 "environment": {
 "GEMINI_API_KEY": "{env:GEMINI_API_KEY}",
 "GEMINI_MODEL": "gemini-3.5-flash"
@@ -168,9 +93,32 @@ In OpenCode, MCP tools are shown as normal tools, often with the MCP server name
 
 ## Development
 
+### Requirements
+
+- Node.js 18 or newer
+- A Gemini API key in `GEMINI_API_KEY`
+- Local image files (`.png`, `.jpg`, `.jpeg`, `.webp`, `.gif`, `.bmp`, `.heic`, `.heif`)
+
+### Environment
+
+- `GEMINI_API_KEY`: required Google Gemini API key
+- `GEMINI_MODEL`: optional model ID, defaults to `gemini-3.5-flash`
+- `IMAGE_UNDERSTAND_INLINE_LIMIT_BYTES`: optional inline image limit, defaults to 18 MiB
+- `IMAGE_UNDERSTAND_MAX_IMAGE_BYTES`: optional maximum image size, defaults to 100 MiB
+
+The MCP server reads only the environment of the process that launches it. It does not load `.env`, `.env.local`, or any other dotenv file. For Codex/OpenCode usage, pass `GEMINI_API_KEY` and `GEMINI_MODEL` through that client config or through the parent shell environment.
+
+Gemma support in v1 is configuration-based: set `GEMINI_MODEL` to a Google-accessible, vision-capable Gemma model ID if your account/runtime supports it. This server does not include a local Gemma runtime.
+
+### Install
+
 ```bash
-npm
+npm install
 npm run build
+```
+
+```bash
+npm test
 npm run check
 ```
 
@@ -184,3 +132,4 @@ npm run smoke -- "C:/path/to/image.jpg" "What is this image?"
 The smoke script loads `.env.local` for development convenience. The MCP server itself does not load dotenv files.
 
 For stdio MCP servers, stdout is reserved for JSON-RPC messages. This server writes diagnostics to stderr only.
+
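Review note: the README text in this diff says the server takes its settings only from the launching process's environment and keeps stdout free for JSON-RPC. The following is a minimal sketch of what that behaviour could look like, not the package's actual source. The names `ServerConfig`, `readConfig`, `parseBytes`, and `logDiagnostic` are hypothetical, and the error handling for a missing key or an invalid byte limit is an assumption; only the variable names and defaults come from the README.

```typescript
// Hypothetical sketch: none of these names are taken from the package source.
type ServerConfig = {
  apiKey: string;
  model: string;
  inlineLimitBytes: number;
  maxImageBytes: number;
};

const MIB = 1024 * 1024;

// Parse an optional byte limit; fall back to the documented default when unset.
// Rejecting non-integer or non-positive values is an assumption of this sketch.
function parseBytes(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const n = Number(raw);
  if (!Number.isInteger(n) || n <= 0) {
    throw new Error(`Invalid byte limit: ${raw}`);
  }
  return n;
}

// Build the config from an environment map only; no dotenv file is consulted.
function readConfig(env: Record<string, string | undefined>): ServerConfig {
  const apiKey = env.GEMINI_API_KEY;
  if (!apiKey) {
    // Assumed behaviour: fail fast when the required key is absent.
    throw new Error("GEMINI_API_KEY is required");
  }
  return {
    apiKey,
    model: env.GEMINI_MODEL ?? "gemini-3.5-flash",
    inlineLimitBytes: parseBytes(env.IMAGE_UNDERSTAND_INLINE_LIMIT_BYTES, 18 * MIB),
    maxImageBytes: parseBytes(env.IMAGE_UNDERSTAND_MAX_IMAGE_BYTES, 100 * MIB),
  };
}

// console.error writes to stderr, so stdout stays reserved for JSON-RPC messages.
function logDiagnostic(message: string): void {
  console.error(`[image-understand-mcp] ${message}`);
}
```

In a Node.js process, a caller would pass `process.env` to `readConfig`, which is why values placed in `.env.local` have no effect unless the MCP client or parent shell exports them.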
package/package.json
CHANGED