@msalman5230/image-understand-mcp 1.0.1 → 1.0.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (3)
  1. package/LICENSE +21 -0
  2. package/README.md +81 -91
  3. package/package.json +2 -1
package/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 MSalman5230
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
package/README.md CHANGED
@@ -1,84 +1,9 @@
  # Image Understand MCP Server
  
- Local MCP server that lets an LLM agent without native vision understand local image files through Google Gemini/Gemma model ID.
+ Local MCP server that lets an LLM agent without native vision understand local image files through Google Gemini/Gemma models.
  
  The server runs over stdio and exposes image analysis tools for local image paths.
  
- ## Requirements
-
- - Node.js 18 or newer
- - A Gemini API key in `GEMINI_API_KEY`
- - Local image files (`.png`, `.jpg`, `.jpeg`, `.webp`, `.gif`, `.bmp`, `.heic`, `.heif`)
-
- ## Install
-
- ```bash
- npm install
- npm run build
- ```
-
- ## Publish for `npx`
-
- The npm package is published as `@msalman5230/image-understand-mcp` and exposes a CLI binary named `image-understand-mcp`, so users do not need to point their MCP client at `dist/index.js`.
-
- Before publishing:
-
- ```bash
- npm run check
- npm pack --dry-run
- ```
-
- Publish:
-
- ```bash
- npm login
- npm publish --access public
- ```
-
- Scoped npm packages must use `--access public` on publish unless you want a private/restricted package.
-
- After that, MCP clients can launch the server with:
-
- ```bash
- npx -y @msalman5230/image-understand-mcp
- ```
-
- For unreleased local testing, keep using `node dist/index.js`, or run `npm link` from this repo and use the linked `image-understand-mcp` binary.
-
- ## Release Versions
-
- The first public release is `1.0.0`.
-
- For future releases, use npm's semver bump command from the repo root:
-
- ```bash
- npm version patch
- git push origin main --follow-tags
- ```
-
- Use `patch` for fixes, `minor` for backward-compatible features, and `major` for breaking changes.
-
- ## GitHub Actions Publishing
-
- After the first manual publish, configure npm Trusted Publishing for package `@msalman5230/image-understand-mcp`:
-
- - Publisher: GitHub Actions
- - Repository: `MSalman5230/image-understand-mcp`
- - Workflow filename: `publish.yml`
-
- Once trusted publishing is configured, pushing a `v*.*.*` tag publishes that package version automatically.
-
- ## Environment
-
- - `GEMINI_API_KEY`: required Google Gemini API key
- - `GEMINI_MODEL`: optional model ID, defaults to `gemini-3.5-flash`
- - `IMAGE_UNDERSTAND_INLINE_LIMIT_BYTES`: optional inline image limit, defaults to 18 MiB
- - `IMAGE_UNDERSTAND_MAX_IMAGE_BYTES`: optional maximum image size, defaults to 100 MiB
-
- The MCP server reads only the environment of the process that launches it. It does not load `.env`, `.env.local`, or any other dotenv file. For Codex/OpenCode usage, pass `GEMINI_API_KEY` and `GEMINI_MODEL` through that client config or through the parent shell environment.
-
- Gemma support in v1 is configuration-based: set `GEMINI_MODEL` to a Google-accessible, vision-capable Gemma model ID if your account/runtime supports it. This server does not include a local Gemma runtime.
-
  ## Tool
  
  `analyze_image`
@@ -113,20 +38,20 @@ The tool returns human-readable text plus structured content:
  Add this to `~/.codex/config.toml` after publishing the package to npm:
  
  ```toml
- [mcp_servers.image_understand]
- command = "npx"
- args = ["-y", "@msalman5230/image-understand-mcp"]
- env = { GEMINI_API_KEY = "YOUR_KEY", GEMINI_MODEL = "gemini-3.5-flash" }
- ```
+ [mcp_servers.image_understand]
+ command = "npx"
+ args = ["-y", "@msalman5230/image-understand-mcp"]
+ env = { GEMINI_API_KEY = "YOUR_KEY", GEMINI_MODEL = "gemini-3.5-flash" }
+ ```
  
  You can also keep the API key outside the config and let Codex inherit the environment:
  
  ```toml
- [mcp_servers.image_understand]
- command = "npx"
- args = ["-y", "@msalman5230/image-understand-mcp"]
- env = { GEMINI_MODEL = "gemini-3.5-flash" }
- ```
+ [mcp_servers.image_understand]
+ command = "npx"
+ args = ["-y", "@msalman5230/image-understand-mcp"]
+ env = { GEMINI_MODEL = "gemini-3.5-flash" }
+ ```
  
  For local development before publishing, use the built file directly:
  
@@ -145,10 +70,10 @@ Add this to `opencode.json`:
  {
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
- "image_understand": {
- "type": "local",
- "command": ["npx", "-y", "@msalman5230/image-understand-mcp"],
- "enabled": true,
+ "image_understand": {
+ "type": "local",
+ "command": ["npx", "-y", "@msalman5230/image-understand-mcp"],
+ "enabled": true,
  "environment": {
  "GEMINI_API_KEY": "{env:GEMINI_API_KEY}",
  "GEMINI_MODEL": "gemini-3.5-flash"
@@ -168,9 +93,32 @@ In OpenCode, MCP tools are shown as normal tools, often with the MCP server name
  
  ## Development
  
+ ### Requirements
+
+ - Node.js 18 or newer
+ - A Gemini API key in `GEMINI_API_KEY`
+ - Local image files (`.png`, `.jpg`, `.jpeg`, `.webp`, `.gif`, `.bmp`, `.heic`, `.heif`)
+
+ ### Environment
+
+ - `GEMINI_API_KEY`: required Google Gemini API key
+ - `GEMINI_MODEL`: optional model ID, defaults to `gemini-3.5-flash`
+ - `IMAGE_UNDERSTAND_INLINE_LIMIT_BYTES`: optional inline image limit, defaults to 18 MiB
+ - `IMAGE_UNDERSTAND_MAX_IMAGE_BYTES`: optional maximum image size, defaults to 100 MiB
+
+ The MCP server reads only the environment of the process that launches it. It does not load `.env`, `.env.local`, or any other dotenv file. For Codex/OpenCode usage, pass `GEMINI_API_KEY` and `GEMINI_MODEL` through that client config or through the parent shell environment.
+
+ Gemma support in v1 is configuration-based: set `GEMINI_MODEL` to a Google-accessible, vision-capable Gemma model ID if your account/runtime supports it. This server does not include a local Gemma runtime.
+
+ ### Install
+
  ```bash
- npm test
+ npm install
  npm run build
+ ```
+
+ ```bash
+ npm test
  npm run check
  ```
  
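Reviewer's note on the relocated Environment section above: the four documented variables and their defaults could be resolved roughly as follows. This is a minimal sketch, not the package's actual source. `ServerConfig`, `parseBytes`, and `loadConfig` are hypothetical names, and the validation rules (rejecting non-integer or non-positive byte counts) are an assumption rather than documented behavior.

```typescript
// Hypothetical sketch: only the four environment variable names and their
// defaults come from the README; everything else is illustrative.
interface ServerConfig {
  apiKey: string;
  model: string;
  inlineLimitBytes: number;
  maxImageBytes: number;
}

const MIB = 1024 * 1024;

// Parse an optional byte count, falling back to the documented default.
function parseBytes(raw: string | undefined, fallback: number, name: string): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    // Assumed behavior: fail fast on values that are not positive integers.
    throw new Error(`${name} must be a positive integer number of bytes, got "${raw}"`);
  }
  return value;
}

// Build the config from an environment map only; no dotenv files are read.
function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  const apiKey = env.GEMINI_API_KEY;
  if (!apiKey) throw new Error("GEMINI_API_KEY is required");
  return {
    apiKey,
    model: env.GEMINI_MODEL || "gemini-3.5-flash",
    inlineLimitBytes: parseBytes(
      env.IMAGE_UNDERSTAND_INLINE_LIMIT_BYTES,
      18 * MIB,
      "IMAGE_UNDERSTAND_INLINE_LIMIT_BYTES",
    ),
    maxImageBytes: parseBytes(
      env.IMAGE_UNDERSTAND_MAX_IMAGE_BYTES,
      100 * MIB,
      "IMAGE_UNDERSTAND_MAX_IMAGE_BYTES",
    ),
  };
}
```

In a Node.js process this would typically be called once at startup as `loadConfig(process.env)`, which matches the README's statement that the server reads only the environment of the process that launches it.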
@@ -184,3 +132,45 @@ npm run smoke -- "C:/path/to/image.jpg" "What is this image?"
  The smoke script loads `.env.local` for development convenience. The MCP server itself does not load dotenv files.
  
  For stdio MCP servers, stdout is reserved for JSON-RPC messages. This server writes diagnostics to stderr only.
+
+ ### Publish for `npx`
+
+ The npm package is published as `@msalman5230/image-understand-mcp` and exposes a CLI binary named `image-understand-mcp`, so users do not need to point their MCP client at `dist/index.js`.
+
+ Before publishing:
+
+ ```bash
+ npm run check
+ npm pack --dry-run
+ ```
+
+ Publish:
+
+ ```bash
+ npm login
+ npm publish --access public
+ ```
+
+ Scoped npm packages must use `--access public` on publish unless you want a private/restricted package.
+
+ After that, MCP clients can launch the server with:
+
+ ```bash
+ npx -y @msalman5230/image-understand-mcp
+ ```
+
+ For unreleased local testing, keep using `node dist/index.js`, or run `npm link` from this repo and use the linked `image-understand-mcp` binary.
+
+ ### Release Versions
+
+ The first public release is `1.0.0`.
+
+ For future releases, use npm's semver bump command from the repo root:
+
+ ```bash
+ npm version patch
+ git push origin main --follow-tags
+ ```
+
+ Use `patch` for fixes, `minor` for backward-compatible features, and `major` for breaking changes.
+
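Reviewer's note on the Release Versions text above: the `patch` / `minor` / `major` rule can be illustrated with a small function. This is a simplified sketch of the semver arithmetic only, under the assumption of plain `MAJOR.MINOR.PATCH` versions. `bumpVersion` is a hypothetical helper, not part of the package, and it does not reproduce everything `npm version` does (for example prerelease identifiers, the git commit, or the tag).

```typescript
// Hypothetical helper illustrating the semver bump rule from the README.
type Bump = "patch" | "minor" | "major";

function bumpVersion(version: string, bump: Bump): string {
  // Only plain MAJOR.MINOR.PATCH strings are handled in this sketch.
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (!match) {
    throw new Error(`Not a plain MAJOR.MINOR.PATCH version: ${version}`);
  }
  const major = Number(match[1]);
  const minor = Number(match[2]);
  const patch = Number(match[3]);

  switch (bump) {
    case "major":
      // Breaking change: reset minor and patch.
      return `${major + 1}.0.0`;
    case "minor":
      // Backward-compatible feature: reset patch.
      return `${major}.${minor + 1}.0`;
    case "patch":
      // Fix: increment patch only.
      return `${major}.${minor}.${patch + 1}`;
  }
}
```

Under this rule, the `1.0.1` to `1.0.3` change shown in this diff corresponds to two successive `patch` bumps.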
package/package.json CHANGED
@@ -1,7 +1,8 @@
  {
  "name": "@msalman5230/image-understand-mcp",
- "version": "1.0.1",
+ "version": "1.0.3",
  "description": "Local MCP server that lets text-only agents understand local images through Gemini vision models.",
+ "license": "MIT",
  "type": "module",
  "repository": {
  "type": "git",