@jochenyang/opencode-vision 1.0.0 → 1.0.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,8 +1,45 @@
- # opencode-vision
+ <p align="center">
+ <picture>
+ <source media="(prefers-color-scheme: dark)" srcset="assets/logo.svg">
+ <img src="assets/logo.svg" width="64" alt="opencode-vision logo">
+ </picture>
+ </p>
+ <h1 align="center">opencode-vision</h1>
+ <p align="center">
+ 🌐 <a href="README_en.md">English</a> · <strong>中文</strong>
+ </p>
+ <p align="center">
+ Lets OpenCode models without multimodal support "see" images too
+ <br />
+ Auto-save image → guide the model to call the vision tool → return a description
+ </p>
+ <p align="center">
+ <a href="https://www.npmjs.com/package/@jochenyang/opencode-vision">
+ <img src="https://img.shields.io/npm/v/@jochenyang/opencode-vision?style=flat-square" alt="npm version">
+ </a>
+ <a href="LICENSE">
+ <img src="https://img.shields.io/badge/License-MIT-yellow.svg?style=flat-square" alt="MIT License">
+ </a>
+ <a href="https://github.com/JochenYang/opencode-vision">
+ <img src="https://img.shields.io/github/stars/JochenYang/opencode-vision?style=flat-square" alt="GitHub stars">
+ </a>
+ </p>
+
+ ---
+
+ ### ✨ One-line install
 
- A plugin + tool that adds image recognition to [OpenCode](https://github.com/opencode-ai/opencode).
+ ```bash
+ npx @jochenyang/opencode-vision
+ ```
 
- When the model itself does not support multimodal input, the images a user pastes are automatically saved to a temp directory and the model is guided to call the vision tool to recognize them. Both single and multiple images are supported.
+ Uninstalling is just as easy:
+
+ ```bash
+ npx @jochenyang/opencode-vision --uninstall
+ ```
+
+ ---
 
  ## How It Works
 
@@ -74,10 +111,14 @@ OpenCode automatically discovers `~/.config/opencode/tools/` and `~/.config/opencode/plu
 
  > If the directory does not exist, create it manually.
 
- ### Via npx (coming soon)
+ ### Via npx
 
  ```bash
- npx opencode-vision install
+ # Install
+ npx @jochenyang/opencode-vision
+
+ # Uninstall
+ npx @jochenyang/opencode-vision --uninstall
  ```
 
  ## Verification
@@ -145,4 +186,4 @@ $env:VISION_MODEL = 'your-vision-model'
 
  ## License
 
- MIT
+ [MIT](LICENSE)
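The README describes automatically saving pasted images to a temp directory. A minimal TypeScript sketch of that first step follows. It assumes images arrive as base64 `data:` URLs; `parseDataUrl`, `tempFileName`, and the MIME-to-extension table are hypothetical. The name pattern `pasted-{timestamp}-{random}.{ext}` is taken from the English README's Notes, and the plugin would join the name with `os.tmpdir()/opencode-vision/`.

```typescript
// Hypothetical helpers, not the plugin's actual code.

interface DecodedImage {
  mime: string
  base64: string
  ext: string
}

// Assumed mapping from MIME type to file extension.
const EXT_BY_MIME: Record<string, string | undefined> = {
  "image/png": "png",
  "image/jpeg": "jpg",
  "image/gif": "gif",
  "image/webp": "webp",
}

// Splits "data:<mime>;base64,<payload>" into its parts; returns null otherwise.
function parseDataUrl(url: string): DecodedImage | null {
  const match = /^data:([^;,]+);base64,([A-Za-z0-9+\/=]+)$/.exec(url)
  if (!match) return null
  const mime = match[1].toLowerCase()
  return { mime, base64: match[2], ext: EXT_BY_MIME[mime] ?? "png" }
}

// Builds pasted-{timestamp}-{random}.{ext}. The random suffix keeps repeated
// pastes of the same image in separate files; clock and RNG are injectable.
function tempFileName(ext: string, now: number = Date.now(), rand: () => number = Math.random): string {
  const suffix = rand().toString(36).slice(2, 8) || "0"
  return `pasted-${now}-${suffix}.${ext}`
}
```

Injecting `now` and `rand` is only a convenience for deterministic tests; in a real plugin both would come from `Date.now()` and `Math.random()`.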
package/README_en.md ADDED
@@ -0,0 +1,210 @@
+ <p align="center">
+ 🌐 <strong>English</strong> · <a href="README.md">中文</a>
+ </p>
+
+ <p align="center">
+ <picture>
+ <source media="(prefers-color-scheme: dark)" srcset="assets/logo.svg">
+ <img src="assets/logo.svg" width="64" alt="opencode-vision logo">
+ </picture>
+ </p>
+ <h1 align="center">opencode-vision</h1>
+ <p align="center">
+ 🌐 <strong>English</strong> · <a href="README.md">中文</a>
+ </p>
+ <p align="center">
+ Let non-vision OpenCode models "see" pasted images
+ <br />
+ Auto-saves images → guides model to call vision tool → returns description
+ </p>
+ <p align="center">
+ <a href="https://www.npmjs.com/package/@jochenyang/opencode-vision">
+ <img src="https://img.shields.io/npm/v/@jochenyang/opencode-vision?style=flat-square" alt="npm version">
+ </a>
+ <a href="LICENSE">
+ <img src="https://img.shields.io/badge/License-MIT-yellow.svg?style=flat-square" alt="MIT License">
+ </a>
+ <a href="https://github.com/JochenYang/opencode-vision">
+ <img src="https://img.shields.io/github/stars/JochenYang/opencode-vision?style=flat-square" alt="GitHub stars">
+ </a>
+ </p>
+ <p align="center">
+ <a href="https://www.npmjs.com/package/@jochenyang/opencode-vision">
+ <img src="https://img.shields.io/npm/v/@jochenyang/opencode-vision?style=flat-square" alt="npm version">
+ </a>
+ <a href="LICENSE">
+ <img src="https://img.shields.io/badge/License-MIT-yellow.svg?style=flat-square" alt="MIT License">
+ </a>
+ <a href="https://github.com/JochenYang/opencode-vision">
+ <img src="https://img.shields.io/github/stars/JochenYang/opencode-vision?style=flat-square" alt="GitHub stars">
+ </a>
+ </p>
+ </p>
+
+ ---
+
+ ### ✨ One-line install
+
+ ```bash
+ npx @jochenyang/opencode-vision
+ ```
+
+ Uninstall:
+
+ ```bash
+ npx @jochenyang/opencode-vision --uninstall
+ ```
+
+ ---
+
+ ## How It Works
+
+ ```
+ User pastes image + "What is this?"
+
+ vision-helper plugin (experimental.chat.messages.transform)
+ ├─ Decode base64 → save to temp directory
+ ├─ Replace original image part with a short placeholder (remove ERROR noise from unsupportedParts)
+ └─ Inject path hint before user's text
+
+ Model sees the path hint → automatically calls the vision tool
+
+ vision tool calls the vision API → returns image description
+ ```
+
+ - **Single image** → model calls `vision(path)` to read one image
+ - **Multiple images** → model calls `vision(paths=[...])` to process all at once
+
+ ## Prerequisites
+
+ - [OpenCode](https://github.com/opencode-ai/opencode) installed
+ - An OpenAI-compatible vision API (e.g., Aliyun DashScope, OpenAI, etc.)
+ - Environment variables configured (recommended system-wide)
+
+ ## Environment Variables
+
+ | Variable | Description | Example |
+ | ----------------- | ------------------------------------------------------ | ------------------------------- |
+ | `VISION_API_KEY` | Vision API key | `sk-your-api-key` |
+ | `VISION_API_URL` | Vision API base URL<br>(tool auto-appends `/chat/completions`) | `https://your-api-endpoint/v1` |
+ | `VISION_MODEL` | Vision model name | `your-vision-model` |
+
+ ### Windows (System-wide)
+
+ ```powershell
+ [System.Environment]::SetEnvironmentVariable('VISION_API_KEY', 'sk-your-api-key', 'User')
+ [System.Environment]::SetEnvironmentVariable('VISION_API_URL', 'https://your-api-endpoint/v1', 'User')
+ [System.Environment]::SetEnvironmentVariable('VISION_MODEL', 'your-vision-model', 'User')
+ ```
+
+ **Restart your terminal** after setting.
+
+ ### macOS / Linux
+
+ Add to `~/.zshrc` or `~/.bashrc`:
+
+ ```bash
+ export VISION_API_KEY="sk-your-api-key"
+ export VISION_API_URL="https://your-api-endpoint/v1"
+ export VISION_MODEL="your-vision-model"
+ ```
+
+ ## Installation
+
+ ### Manual
+
+ Copy the two files to OpenCode's global config directory:
+
+ ```bash
+ # Tool
+ cp tools/vision.ts ~/.config/opencode/tools/
+
+ # Plugin
+ cp plugins/vision-helper.ts ~/.config/opencode/plugins/
+ ```
+
+ OpenCode auto-discovers files under `~/.config/opencode/tools/` and `~/.config/opencode/plugins/` — **no need to modify `opencode.json`**.
+
+ > Create the directories if they don't exist.
+
+ ### Via npx
+
+ ```bash
+ # Install
+ npx @jochenyang/opencode-vision
+
+ # Uninstall
+ npx @jochenyang/opencode-vision --uninstall
+ ```
+
+ ## Verification
+
+ Start OpenCode:
+
+ ```bash
+ opencode
+ ```
+
+ Paste an image and ask:
+
+ ```
+ [Image] What is this?
+ ```
+
+ Expected behavior:
+
+ 1. The model cannot read the image directly (doesn't support multimodal)
+ 2. The plugin saves the image to temp and injects a path hint
+ 3. The model automatically calls the `vision` tool
+ 4. The model returns an image description
+
+ ## Project Structure
+
+ ```
+ opencode-vision/
+ ├── tools/
+ │ └── vision.ts # Vision tool — calls the vision API
+ ├── plugins/
+ │ └── vision-helper.ts # Plugin — saves images, injects hints, removes ERROR noise
+ ├── bin/
+ │ └── install.js # CLI install/uninstall script
+ ├── package.json
+ ├── README.md
+ ├── README_en.md
+ └── LICENSE
+ ```
+
+ ### Tool: `tools/vision.ts`
+
+ - Reads local image files and describes them via a vision API
+ - Supports `path` (single) and `paths` (multiple) parameters
+ - Compatible with any OpenAI Chat Completions API
+
+ ### Plugin: `plugins/vision-helper.ts`
+
+ - Hook: `experimental.chat.messages.transform`
+ - Processes right before the message is sent to the model
+ - Saves images to `os.tmpdir()/opencode-vision/`
+ - Injects path hints before user text (not persisted to chat history)
+ - Replaces original image parts to prevent ERROR noise from `unsupportedParts`
+
+ ## Notes
+
+ - Images are saved to the system temp directory `os.tmpdir()/opencode-vision/` — automatically cleaned on reboot
+ - Temp files are named `pasted-{timestamp}-{random}.{ext}`
+ - Same image pasted multiple times in one session creates separate temp files
+ - Vision API calls use `max_tokens: 4096`, sufficient for detailed multi-image descriptions
+
+ ## Custom Vision API
+
+ Compatible with any OpenAI Chat Completions vision API. Just change the environment variables:
+
+ ```bash
+ export VISION_API_KEY="sk-your-api-key"
+ export VISION_API_URL="https://your-api-endpoint/v1"
+ export VISION_MODEL="your-vision-model"
+ ```
+
+ ## License
+
+ [MIT](LICENSE)
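The English README states that the vision tool accepts `path` or `paths`, appends `/chat/completions` to `VISION_API_URL`, and sends `max_tokens: 4096`. The sketch below shows what such a request could look like in the OpenAI Chat Completions format, with each image inlined as a `data:` URL inside an `image_url` content part. The function names and the prompt argument are assumptions; this is not the contents of `tools/vision.ts`.

```typescript
// Hypothetical request builder modeled on the README's description.

interface VisionArgs {
  path?: string
  paths?: string[]
}

interface ImageInput {
  mime: string
  base64: string
}

// The tool takes either `path` (single) or `paths` (multiple); normalize to a list.
function normalizePaths(args: VisionArgs): string[] {
  if (args.paths && args.paths.length > 0) return args.paths
  if (args.path) return [args.path]
  throw new Error("vision: provide `path` or `paths`")
}

// VISION_API_URL is a base URL; strip trailing slashes, then append the endpoint.
function completionsUrl(baseUrl: string): string {
  return baseUrl.replace(/\/+$/, "") + "/chat/completions"
}

// One user message: a text prompt followed by every image as an image_url part.
function buildRequestBody(model: string, prompt: string, images: ImageInput[]) {
  return {
    model,
    max_tokens: 4096, // value stated in the README's Notes
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: prompt },
          ...images.map((img) => ({
            type: "image_url",
            image_url: { url: `data:${img.mime};base64,${img.base64}` },
          })),
        ],
      },
    ],
  }
}
```

To send it, a caller would POST `JSON.stringify(buildRequestBody(...))` to `completionsUrl(VISION_API_URL)` with an `Authorization: Bearer <VISION_API_KEY>` header. That header is the usual convention for OpenAI-compatible endpoints, but whether the tool uses exactly this header is an assumption. The description would then be read from `choices[0].message.content`. Putting all images in a single message lets one call cover the multi-image case.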
@@ -0,0 +1,13 @@
+ <svg xmlns="http://www.w3.org/2000/svg" width="48" height="48" viewBox="0 0 48 48" fill="none">
+ <!-- Magnifying glass circle -->
+ <circle cx="20" cy="20" r="12" stroke="#6366f1" stroke-width="3" fill="none"/>
+ <!-- Magnifying glass handle -->
+ <line x1="29" y1="29" x2="40" y2="40" stroke="#6366f1" stroke-width="3.5" stroke-linecap="round"/>
+ <!-- Eye inside -->
+ <path d="M13 20c0-3.3 2.7-6 6-6s6 2.7 6 6-2.7 6-6 6-6-2.7-6-6z" stroke="#6366f1" stroke-width="1.5" fill="none"/>
+ <circle cx="19" cy="20" r="2.5" fill="#6366f1"/>
+ <!-- Small sparkle dots -->
+ <circle cx="8" cy="8" r="1.5" fill="#a5b4fc"/>
+ <circle cx="36" cy="10" r="1" fill="#a5b4fc"/>
+ <circle cx="42" cy="30" r="1.2" fill="#a5b4fc"/>
+ </svg>
package/bin/install.js CHANGED
@@ -5,12 +5,19 @@ const os = require("os")
 
  const SRC = path.join(__dirname, "..")
  const DST = path.join(os.homedir(), ".config", "opencode")
+ const isWin = process.platform === "win32"
 
  const FILES = [
    ["tools/vision.ts", "tools/vision.ts"],
    ["plugins/vision-helper.ts", "plugins/vision-helper.ts"],
  ]
 
+ const ENV_VARS = [
+   { name: "VISION_API_KEY", desc: "Vision API key", example: "sk-your-api-key" },
+   { name: "VISION_API_URL", desc: "Vision API base URL", example: "https://your-api-endpoint/v1" },
+   { name: "VISION_MODEL", desc: "Vision model name", example: "your-vision-model" },
+ ]
+
  function log(msg, ok = true) {
    const prefix = ok ? "\x1b[32m ✓\x1b[0m" : "\x1b[31m ✗\x1b[0m"
    console.log(`${prefix} ${msg}`)
@@ -20,45 +27,82 @@ function title(msg) {
    console.log(`\n\x1b[36m═══ ${msg} \x1b[0m\n`)
  }
 
+ function printEnvGuide() {
+   console.log("\n The following environment variables must be set for image recognition to work:")
+   console.log()
+
+   for (const v of ENV_VARS) {
+     console.log(` \x1b[33m${v.name}\x1b[0m`)
+     console.log(` → ${v.desc}`)
+     console.log(` → Example: ${v.example}`)
+     console.log()
+   }
+
+   if (isWin) {
+     console.log(" \x1b[36mWindows system-level setup (administrator PowerShell):\x1b[0m")
+     console.log()
+     for (const v of ENV_VARS) {
+       console.log(` [System.Environment]::SetEnvironmentVariable('${v.name}', '${v.example}', 'User')`)
+     }
+     console.log()
+     console.log(" Restart the terminal for the changes to take effect.")
+   } else {
+     console.log(" \x1b[36mmacOS / Linux setup (add to ~/.zshrc or ~/.bashrc):\x1b[0m")
+     console.log()
+     for (const v of ENV_VARS) {
+       console.log(` export ${v.name}="${v.example}"`)
+     }
+     console.log()
+     console.log(" Then run source ~/.zshrc or restart the terminal.")
+   }
+ }
+
+ async function checkVars() {
+   let missing = 0
+   for (const v of ENV_VARS) {
+     const val = process.env[v.name]
+     if (val) {
+       const masked = v.name === "VISION_API_KEY" ? val.slice(0, 6) + "****" : val
+       log(`${v.name} = ${masked}`)
+     } else {
+       log(`${v.name} is not set`, false)
+       missing++
+     }
+   }
+   return missing
+ }
+
  async function doInstall() {
    title("opencode-vision install")
 
+   // ── Copy files ──
    for (const [, rel] of FILES) {
      const dir = path.join(DST, path.dirname(rel))
      if (!fs.existsSync(dir)) {
        fs.mkdirSync(dir, { recursive: true })
      }
    }
-
    for (const [srcRel, dstRel] of FILES) {
      const src = path.join(SRC, srcRel)
      const dst = path.join(DST, dstRel)
-
      if (!fs.existsSync(src)) {
        log(`Source file not found: ${srcRel}`, false)
        continue
      }
-
      fs.copyFileSync(src, dst)
      log(`Installed ${dstRel}`)
    }
 
+   // ── Environment variable check ──
    title("Environment variable check")
-   const vars = {
-     VISION_API_KEY: process.env.VISION_API_KEY,
-     VISION_API_URL: process.env.VISION_API_URL,
-     VISION_MODEL: process.env.VISION_MODEL,
-   }
+   const missing = await checkVars()
 
-   for (const [name, val] of Object.entries(vars)) {
-     if (val) {
-       const masked = name === "VISION_API_KEY" ? val.slice(0, 6) + "****" : val
-       log(`${name} = ${masked}`)
-     } else {
-       log(`${name} is not set; configure it before use`, false)
-     }
+   if (missing > 0) {
+     console.log(`\n \x1b[33m⚠ ${missing} environment variable(s) not set.\x1b[0m`)
+     printEnvGuide()
    }
 
+   // ── OpenCode detection ──
    title("OpenCode detection")
    try {
      const { execSync } = require("child_process")
@@ -73,9 +117,13 @@ async function doInstall() {
    }
 
    title("Installation complete")
-   console.log(" Restart OpenCode to start using it.")
-   console.log(" Try pasting an image:")
-   console.log(' [image] "What is this?"')
+   console.log(" Files are in place; restart OpenCode to start using it.")
+   if (missing > 0) {
+     console.log(" ⚠ Environment variables are incomplete, so image recognition will not work.")
+     console.log(" Set them as described above, then restart OpenCode.")
+   }
+   console.log(" 📝 Usage: paste an image and ask a question")
+   console.log(' "[image] What is this?"')
  }
 
  async function doUninstall() {
@@ -92,7 +140,6 @@ async function doUninstall() {
      log(`Removed ${rel}`)
      removed++
 
-     // If the directory is now empty, remove it as well
      const dir = path.dirname(dst)
      if (fs.existsSync(dir) && fs.readdirSync(dir).length === 0) {
        fs.rmdirSync(dir)
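The reworked installer counts unset variables and prints the setup guide only when something is missing. The same logic can be expressed as pure TypeScript functions, which makes the masking rule (first six characters of `VISION_API_KEY`, then `****`) and the per-platform commands easy to test without touching `process.env`. The function names are hypothetical; the behavior mirrors the `checkVars` and `printEnvGuide` hunks above.

```typescript
// Hypothetical pure-function version of the installer's environment check.

const ENV_VARS = [
  { name: "VISION_API_KEY", example: "sk-your-api-key" },
  { name: "VISION_API_URL", example: "https://your-api-endpoint/v1" },
  { name: "VISION_MODEL", example: "your-vision-model" },
] as const

type Env = Record<string, string | undefined>

// Only the API key is masked: first 6 characters, then "****".
function displayValue(name: string, value: string): string {
  return name === "VISION_API_KEY" ? value.slice(0, 6) + "****" : value
}

// Names of variables that are unset or empty (install.js tests `if (val)`).
function missingVars(env: Env): string[] {
  return ENV_VARS.filter((v) => !env[v.name]).map((v) => v.name)
}

// The setup commands install.js prints for Windows or macOS/Linux.
function setupLines(isWin: boolean): string[] {
  return ENV_VARS.map((v) =>
    isWin
      ? `[System.Environment]::SetEnvironmentVariable('${v.name}', '${v.example}', 'User')`
      : `export ${v.name}="${v.example}"`
  )
}
```

In the installer itself, `missingVars(process.env).length` would play the role of the `missing` counter that gates `printEnvGuide()` and the final warning.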
package/package.json CHANGED
@@ -1,19 +1,28 @@
  {
    "name": "@jochenyang/opencode-vision",
-   "version": "1.0.0",
+   "version": "1.0.2",
    "description": "Vision plugin + tool for OpenCode — automatically handles pasted images for non-vision models",
-   "keywords": ["opencode", "vision", "image", "ai", "plugin", "tool"],
+   "keywords": [
+     "opencode",
+     "vision",
+     "image",
+     "ai",
+     "plugin",
+     "tool"
+   ],
    "homepage": "https://github.com/jochenyang/opencode-vision",
    "license": "MIT",
    "author": "Jochen Yang",
    "bin": {
-     "opencode-vision": "./bin/install.js"
+     "opencode-vision": "bin/install.js"
    },
    "files": [
      "bin/",
      "tools/",
      "plugins/",
-     "README.md"
+     "assets/",
+     "README.md",
+     "README_en.md"
    ],
    "engines": {
      "node": ">=18"
@@ -7,8 +7,8 @@ const TMP_DIR = path.join(tmpdir(), "opencode-vision")
  /**
   * Right before messages are sent to the model, detect image attachments in user messages:
   * 1. Save the images to a temp directory
-  * 2. Inject a path hint before the user's text so that models without multimodal support call the vision tool automatically
-  * 3. Replace the original image parts to avoid noisy ERROR text from unsupportedParts
+  * 2. Replace the original image parts with short placeholders (eliminating ERROR noise from unsupportedParts)
+  * 3. Inject the path hint (as a newly pushed part: not persisted, not shown in the UI)
   */
  export default (async () => {
    await Bun.write(path.join(TMP_DIR, ".check"), "").catch(() => {})
@@ -40,8 +40,7 @@ export default (async () => {
 
          if (saved.length === 0) continue
 
-         // Replace the original image part with short placeholder text so unsupportedParts does not produce noisy ERROR output
-         // Iterate in reverse to avoid index shifts
+         // Replace the original image part with a short placeholder so unsupportedParts does not produce noisy ERROR output
          for (const { index, filePath } of saved.toReversed()) {
            msg.parts.splice(index, 1, {
              type: "text",
@@ -49,16 +48,16 @@ export default (async () => {
            } as never)
          }
 
-         // Build the path hint
-         const hintText = saved.length === 1
+         // Build the path hint (pushed as a new part: not persisted, not shown in the UI)
+         const hints = saved.length === 1
            ? `[Image auto-saved to ${saved[0].filePath} — use the vision tool to read it]`
            : `[Images auto-saved to:\n${saved.map((s) => ` ${s.filePath}`).join("\n")}\n— use the vision tool with paths=[...] to read them all at once]`
 
-         // Inject it before the user's text
-         const firstText = msg.parts.find((p) => p.type === "text" && !p.synthetic)
-         if (firstText && typeof firstText.text === "string") {
-           firstText.text = hintText + "\n" + firstText.text
-         }
+         // Push a new part instead of modifying an existing one, so UI rendering is unaffected
+         ;(msg.parts as unknown as Record<string, unknown>[]).push({
+           type: "text" as const,
+           text: hints,
+         })
        }
      },
    }
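In 1.0.2 the plugin stops prepending the hint to the user's first text part and instead pushes a separate part, leaving the visible message untouched. The following is a minimal sketch of that flow under simplified part shapes, which are assumptions because real OpenCode message parts carry more fields. The save step is injected so the logic runs without Bun or the filesystem. The placeholder text is hypothetical because the hunk does not show it, and the hint wording is adapted from the hunk.

```typescript
// Simplified, assumed shapes for message parts.
type Part =
  | { type: "text"; text: string }
  | { type: "file"; mime: string; url: string }

interface Message {
  role: "user" | "assistant"
  parts: Part[]
}

type FilePart = Extract<Part, { type: "file" }>

// `save` writes an image part to disk and returns its path, or null on failure.
function transformMessage(msg: Message, save: (p: FilePart) => string | null): void {
  if (msg.role !== "user") return

  const saved: { index: number; filePath: string }[] = []
  msg.parts.forEach((p, index) => {
    if (p.type === "file" && p.mime.startsWith("image/")) {
      const filePath = save(p)
      if (filePath) saved.push({ index, filePath })
    }
  })
  if (saved.length === 0) return

  // Swap each image part for a text placeholder (hypothetical wording).
  // A 1-for-1 splice never shifts indices; reverse order mirrors the hunk.
  for (const { index, filePath } of [...saved].reverse()) {
    msg.parts.splice(index, 1, { type: "text", text: `[image: ${filePath}]` })
  }

  // Append the hint as a new part rather than editing the user's own text.
  const hint = saved.length === 1
    ? `[Image auto-saved to ${saved[0].filePath}; use the vision tool to read it]`
    : `[Images auto-saved to:\n${saved.map((s) => `  ${s.filePath}`).join("\n")}\nUse the vision tool with paths=[...] to read them all at once]`
  msg.parts.push({ type: "text", text: hint })
}
```

`[...saved].reverse()` stands in for the hunk's `saved.toReversed()`, which needs an ES2023 library target. Because each splice replaces one part with one part, the iteration order does not affect correctness, which may be why 1.0.2 dropped the "iterate in reverse" comment.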