@jochenyang/opencode-vision 1.0.0 → 1.0.1

package/README.md CHANGED
@@ -1,8 +1,45 @@
1
- # opencode-vision
1
+ <p align="center">
2
+ <picture>
3
+ <source media="(prefers-color-scheme: dark)" srcset="assets/logo.svg">
4
+ <img src="assets/logo.svg" width="64" alt="opencode-vision logo">
5
+ </picture>
6
+ </p>
7
+ <h1 align="center">opencode-vision</h1>
8
+ <p align="center">
9
+ 🌐 <a href="README_en.md">English</a> · <strong>中文</strong>
10
+ </p>
11
+ <p align="center">
12
+ Let OpenCode models without multimodal support "understand" images
13
+ <br />
14
+ Auto-saves images → guides the model to call the vision tool → returns a description
15
+ </p>
16
+ <p align="center">
17
+ <a href="https://www.npmjs.com/package/@jochenyang/opencode-vision">
18
+ <img src="https://img.shields.io/npm/v/@jochenyang/opencode-vision?style=flat-square" alt="npm version">
19
+ </a>
20
+ <a href="LICENSE">
21
+ <img src="https://img.shields.io/badge/License-MIT-yellow.svg?style=flat-square" alt="MIT License">
22
+ </a>
23
+ <a href="https://github.com/JochenYang/opencode-vision">
24
+ <img src="https://img.shields.io/github/stars/JochenYang/opencode-vision?style=flat-square" alt="GitHub stars">
25
+ </a>
26
+ </p>
27
+
28
+ ---
29
+
30
+ ### ✨ One-line install
2
31
 
3
- A plugin + tool that gives [OpenCode](https://github.com/opencode-ai/opencode) image recognition capabilities.
32
+ ```bash
33
+ npx @jochenyang/opencode-vision
34
+ ```
4
35
 
5
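- When the model itself does not support multimodal input, images pasted by the user are automatically saved to a temp directory and the model is guided to call the vision tool to recognize them. Supports single and multiple images.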
36
+ Uninstalling is just as simple:
37
+
38
+ ```bash
39
+ npx @jochenyang/opencode-vision --uninstall
40
+ ```
41
+
42
+ ---
6
43
 
7
44
  ## How It Works
8
45
 
@@ -74,10 +111,14 @@ OpenCode auto-discovers `~/.config/opencode/tools/` and `~/.config/opencode/plu
74
111
 
75
112
  > Create the directories manually if they don't exist.
76
113
 
77
- ### Via npx (coming soon)
114
+ ### Via npx
78
115
 
79
116
  ```bash
80
- npx opencode-vision install
117
+ # Install
118
+ npx @jochenyang/opencode-vision
119
+
120
+ # Uninstall
121
+ npx @jochenyang/opencode-vision --uninstall
81
122
  ```
82
123
 
83
124
  ## Verification
@@ -145,4 +186,4 @@ $env:VISION_MODEL = 'your-vision-model'
145
186
 
146
187
  ## License
147
188
 
148
- MIT
189
+ [MIT](LICENSE)
package/README_en.md ADDED
@@ -0,0 +1,210 @@
1
+ <p align="center">
2
+ 🌐 <strong>English</strong> · <a href="README.md">中文</a>
3
+ </p>
4
+
5
+ <p align="center">
6
+ <picture>
7
+ <source media="(prefers-color-scheme: dark)" srcset="assets/logo.svg">
8
+ <img src="assets/logo.svg" width="64" alt="opencode-vision logo">
9
+ </picture>
10
+ </p>
11
+ <h1 align="center">opencode-vision</h1>
15
+ <p align="center">
16
+ Let non-vision OpenCode models "see" pasted images
17
+ <br />
18
+ Auto-saves images → guides model to call vision tool → returns description
19
+ </p>
20
+ <p align="center">
21
+ <a href="https://www.npmjs.com/package/@jochenyang/opencode-vision">
22
+ <img src="https://img.shields.io/npm/v/@jochenyang/opencode-vision?style=flat-square" alt="npm version">
23
+ </a>
24
+ <a href="LICENSE">
25
+ <img src="https://img.shields.io/badge/License-MIT-yellow.svg?style=flat-square" alt="MIT License">
26
+ </a>
27
+ <a href="https://github.com/JochenYang/opencode-vision">
28
+ <img src="https://img.shields.io/github/stars/JochenYang/opencode-vision?style=flat-square" alt="GitHub stars">
29
+ </a>
30
+ </p>
43
+
44
+ ---
45
+
46
+ ### ✨ One-line install
47
+
48
+ ```bash
49
+ npx @jochenyang/opencode-vision
50
+ ```
51
+
52
+ Uninstall:
53
+
54
+ ```bash
55
+ npx @jochenyang/opencode-vision --uninstall
56
+ ```
57
+
58
+ ---
59
+
60
+ ## How It Works
61
+
62
+ ```
63
+ User pastes image + "What is this?"
64
+
65
+ vision-helper plugin (experimental.chat.messages.transform)
66
+ ├─ Decode base64 → save to temp directory
67
+ ├─ Replace original image part with a short placeholder (remove ERROR noise from unsupportedParts)
68
+ └─ Inject path hint before user's text
69
+
70
+ Model sees the path hint → automatically calls the vision tool
71
+
72
+ vision tool calls the vision API → returns image description
73
+ ```
74
+
75
+ - **Single image** → model calls `vision(path)` to read one image
76
+ - **Multiple images** → model calls `vision(paths=[...])` to process all at once
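The "decode base64 → save to temp directory" step above can be sketched as follows. This is an illustration only, not the plugin's actual code: the helper name `saveDataUrl` is hypothetical, and it assumes the pasted image arrives as a `data:image/...;base64,` URL. The directory and file-name pattern follow the README's own description (`os.tmpdir()/opencode-vision/`, `pasted-{timestamp}-{random}.{ext}`).

```typescript
import { randomBytes } from "node:crypto"
import { mkdirSync, writeFileSync } from "node:fs"
import { tmpdir } from "node:os"
import * as path from "node:path"

// Same temp directory the README describes.
const TMP_DIR = path.join(tmpdir(), "opencode-vision")

// Hypothetical helper: decodes a base64 image data URL and writes it to TMP_DIR.
// Assumption: the input looks like "data:image/png;base64,....".
function saveDataUrl(dataUrl: string): { filePath: string; bytes: number } {
  const match = /^data:image\/([a-z0-9.+-]+);base64,(.+)$/i.exec(dataUrl)
  if (!match) throw new Error("not a base64 image data URL")

  const subtype = match[1].toLowerCase()
  const payload = match[2]
  // Map the MIME subtype to a file extension (e.g. "svg+xml" becomes "svg").
  const ext = subtype === "jpeg" ? "jpg" : subtype.split("+")[0]

  const data = Buffer.from(payload, "base64")
  const name = `pasted-${Date.now()}-${randomBytes(3).toString("hex")}.${ext}`
  const filePath = path.join(TMP_DIR, name)

  mkdirSync(TMP_DIR, { recursive: true })
  writeFileSync(filePath, data)
  return { filePath, bytes: data.length }
}

// Usage: "aGVsbG8=" is base64 for the 5-byte string "hello".
const result = saveDataUrl("data:image/png;base64,aGVsbG8=")
console.log(result.filePath)
```

The returned path is what a hint such as `[Image auto-saved to ...]` would point the model at.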
77
+
78
+ ## Prerequisites
79
+
80
+ - [OpenCode](https://github.com/opencode-ai/opencode) installed
81
+ - An OpenAI-compatible vision API (e.g., Aliyun DashScope, OpenAI, etc.)
82
+ - Environment variables configured (recommended system-wide)
83
+
84
+ ## Environment Variables
85
+
86
+ | Variable | Description | Example |
87
+ | ----------------- | ------------------------------------------------------ | ------------------------------- |
88
+ | `VISION_API_KEY` | Vision API key | `sk-your-api-key` |
89
+ | `VISION_API_URL` | Vision API base URL<br>(tool auto-appends `/chat/completions`) | `https://your-api-endpoint/v1` |
90
+ | `VISION_MODEL` | Vision model name | `your-vision-model` |
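The table states that the tool appends `/chat/completions` to `VISION_API_URL`. A minimal sketch of how the three variables might be resolved, assuming a hypothetical helper `resolveVisionConfig` (not taken from the package) and assuming trailing slashes are trimmed before the path is appended:

```typescript
// Hypothetical helper for illustration; the package's actual code may differ.
type VisionConfig = { apiKey: string; endpoint: string; model: string }

function resolveVisionConfig(env: Record<string, string | undefined>): VisionConfig {
  const apiKey = env.VISION_API_KEY
  const baseUrl = env.VISION_API_URL
  const model = env.VISION_MODEL
  if (!apiKey || !baseUrl || !model) {
    throw new Error("VISION_API_KEY, VISION_API_URL and VISION_MODEL must all be set")
  }
  // Assumption: trailing slashes are trimmed so the result has no double slash.
  const endpoint = baseUrl.replace(/\/+$/, "") + "/chat/completions"
  return { apiKey, endpoint, model }
}
```

In practice this would be called as `resolveVisionConfig(process.env)`, so a base URL of `https://your-api-endpoint/v1` resolves to `https://your-api-endpoint/v1/chat/completions`.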
91
+
92
+ ### Windows (System-wide)
93
+
94
+ ```powershell
95
+ [System.Environment]::SetEnvironmentVariable('VISION_API_KEY', 'sk-your-api-key', 'User')
96
+ [System.Environment]::SetEnvironmentVariable('VISION_API_URL', 'https://your-api-endpoint/v1', 'User')
97
+ [System.Environment]::SetEnvironmentVariable('VISION_MODEL', 'your-vision-model', 'User')
98
+ ```
99
+
100
+ **Restart your terminal** after setting.
101
+
102
+ ### macOS / Linux
103
+
104
+ Add to `~/.zshrc` or `~/.bashrc`:
105
+
106
+ ```bash
107
+ export VISION_API_KEY="sk-your-api-key"
108
+ export VISION_API_URL="https://your-api-endpoint/v1"
109
+ export VISION_MODEL="your-vision-model"
110
+ ```
111
+
112
+ ## Installation
113
+
114
+ ### Manual
115
+
116
+ Copy the two files to OpenCode's global config directory:
117
+
118
+ ```bash
119
+ # Tool
120
+ cp tools/vision.ts ~/.config/opencode/tools/
121
+
122
+ # Plugin
123
+ cp plugins/vision-helper.ts ~/.config/opencode/plugins/
124
+ ```
125
+
126
+ OpenCode auto-discovers files under `~/.config/opencode/tools/` and `~/.config/opencode/plugins/` — **no need to modify `opencode.json`**.
127
+
128
+ > Create the directories if they don't exist.
129
+
130
+ ### Via npx
131
+
132
+ ```bash
133
+ # Install
134
+ npx @jochenyang/opencode-vision
135
+
136
+ # Uninstall
137
+ npx @jochenyang/opencode-vision --uninstall
138
+ ```
139
+
140
+ ## Verification
141
+
142
+ Start OpenCode:
143
+
144
+ ```bash
145
+ opencode
146
+ ```
147
+
148
+ Paste an image and ask:
149
+
150
+ ```
151
+ [Image] What is this?
152
+ ```
153
+
154
+ Expected behavior:
155
+
156
+ 1. The model cannot read the image directly (it does not support multimodal input)
157
+ 2. The plugin saves the image to temp and injects a path hint
158
+ 3. The model automatically calls the `vision` tool
159
+ 4. The model returns an image description
160
+
161
+ ## Project Structure
162
+
163
+ ```
164
+ opencode-vision/
165
+ ├── tools/
166
+ │ └── vision.ts # Vision tool — calls the vision API
167
+ ├── plugins/
168
+ │ └── vision-helper.ts # Plugin — saves images, injects hints, removes ERROR noise
169
+ ├── bin/
170
+ │ └── install.js # CLI install/uninstall script
171
+ ├── package.json
172
+ ├── README.md
173
+ ├── README_en.md
174
+ └── LICENSE
175
+ ```
176
+
177
+ ### Tool: `tools/vision.ts`
178
+
179
+ - Reads local image files and describes them via a vision API
180
+ - Supports `path` (single) and `paths` (multiple) parameters
181
+ - Compatible with any OpenAI Chat Completions API
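Since the tool targets OpenAI-compatible Chat Completions endpoints, a request for one or more images could plausibly be shaped as below. This is a sketch under assumptions: `buildVisionRequest` and its types are hypothetical names, and the images are assumed to be sent inline as base64 data URLs. The `max_tokens: 4096` value comes from the README's notes.

```typescript
import { Buffer } from "node:buffer"

// Hypothetical input type: raw image bytes plus their MIME type.
type ImageInput = { mimeType: string; data: Uint8Array }

// Content parts in the OpenAI Chat Completions vision message format.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } }

// Builds one user message that carries the prompt followed by every image.
function buildVisionRequest(model: string, prompt: string, images: ImageInput[]) {
  const content: ContentPart[] = [{ type: "text", text: prompt }]
  for (const image of images) {
    const base64 = Buffer.from(image.data).toString("base64")
    content.push({
      type: "image_url",
      image_url: { url: `data:${image.mimeType};base64,${base64}` },
    })
  }
  return {
    model,
    messages: [{ role: "user" as const, content }],
    max_tokens: 4096,
  }
}
```

Putting all images into a single message is one way the `paths` parameter could process several images in one call; the package may batch them differently.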
182
+
183
+ ### Plugin: `plugins/vision-helper.ts`
184
+
185
+ - Hook: `experimental.chat.messages.transform`
186
+ - Runs right before the message is sent to the model
187
+ - Saves images to `os.tmpdir()/opencode-vision/`
188
+ - Injects path hints before user text (not persisted to chat history)
189
+ - Replaces original image parts to prevent ERROR noise from `unsupportedParts`
190
+
191
+ ## Notes
192
+
193
+ - Images are saved to the system temp directory `os.tmpdir()/opencode-vision/`; cleanup is left to the OS, which on some systems clears the temp directory on reboot
194
+ - Temp files are named `pasted-{timestamp}-{random}.{ext}`
195
+ - Pasting the same image multiple times in one session creates a separate temp file each time
196
+ - Vision API calls use `max_tokens: 4096`, sufficient for detailed multi-image descriptions
197
+
198
+ ## Custom Vision API
199
+
200
+ Compatible with any OpenAI Chat Completions vision API. Just change the environment variables:
201
+
202
+ ```bash
203
+ export VISION_API_KEY="sk-your-api-key"
204
+ export VISION_API_URL="https://your-api-endpoint/v1"
205
+ export VISION_MODEL="your-vision-model"
206
+ ```
207
+
208
+ ## License
209
+
210
+ [MIT](LICENSE)
@@ -0,0 +1,13 @@
1
+ <svg xmlns="http://www.w3.org/2000/svg" width="48" height="48" viewBox="0 0 48 48" fill="none">
2
+ <!-- Magnifying glass circle -->
3
+ <circle cx="20" cy="20" r="12" stroke="#6366f1" stroke-width="3" fill="none"/>
4
+ <!-- Magnifying glass handle -->
5
+ <line x1="29" y1="29" x2="40" y2="40" stroke="#6366f1" stroke-width="3.5" stroke-linecap="round"/>
6
+ <!-- Eye inside -->
7
+ <path d="M13 20c0-3.3 2.7-6 6-6s6 2.7 6 6-2.7 6-6 6-6-2.7-6-6z" stroke="#6366f1" stroke-width="1.5" fill="none"/>
8
+ <circle cx="19" cy="20" r="2.5" fill="#6366f1"/>
9
+ <!-- Small sparkle dots -->
10
+ <circle cx="8" cy="8" r="1.5" fill="#a5b4fc"/>
11
+ <circle cx="36" cy="10" r="1" fill="#a5b4fc"/>
12
+ <circle cx="42" cy="30" r="1.2" fill="#a5b4fc"/>
13
+ </svg>
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@jochenyang/opencode-vision",
3
- "version": "1.0.0",
3
+ "version": "1.0.1",
4
4
  "description": "Vision plugin + tool for OpenCode — automatically handles pasted images for non-vision models",
5
5
  "keywords": ["opencode", "vision", "image", "ai", "plugin", "tool"],
6
6
  "homepage": "https://github.com/jochenyang/opencode-vision",
@@ -13,7 +13,9 @@
13
13
  "bin/",
14
14
  "tools/",
15
15
  "plugins/",
16
- "README.md"
16
+ "assets/",
17
+ "README.md",
18
+ "README_en.md"
17
19
  ],
18
20
  "engines": {
19
21
  "node": ">=18"
@@ -7,8 +7,8 @@ const TMP_DIR = path.join(tmpdir(), "opencode-vision")
7
7
  /**
8
8
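  * Right before a message is sent to the model, detect image attachments in the user message:
9
9
  * 1. Save the images to a temp directory
10
- * 2. Inject a path hint before the user's text so that non-multimodal models call the vision tool automatically
11
- * 3. Replace the original image parts so that unsupportedParts does not produce noisy ERROR text
10
+ * 2. Replace the original image parts with a short placeholder (removes the ERROR noise from unsupportedParts)
11
+ * 3. Inject a path hint (a newly pushed part; not persisted, not visible in the UI)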
12
12
  */
13
13
  export default (async () => {
14
14
  await Bun.write(path.join(TMP_DIR, ".check"), "").catch(() => {})
@@ -40,8 +40,7 @@ export default (async () => {
40
40
 
41
41
  if (saved.length === 0) continue
42
42
 
43
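- // Replace the original image part with a short text placeholder so that unsupportedParts does not produce noisy ERROR output
44
- // Iterate in reverse to avoid index shifts
43
+ // Replace the original image part with a short placeholder so that unsupportedParts does not produce noisy ERROR output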
45
44
  for (const { index, filePath } of saved.toReversed()) {
46
45
  msg.parts.splice(index, 1, {
47
46
  type: "text",
@@ -49,16 +48,16 @@ export default (async () => {
49
48
  } as never)
50
49
  }
51
50
 
52
- // Build the path hint
53
- const hintText = saved.length === 1
51
+ // Build the path hint (a newly pushed part; not persisted, not visible in the UI)
52
+ const hints = saved.length === 1
54
53
  ? `[Image auto-saved to ${saved[0].filePath} — use the vision tool to read it]`
55
54
  : `[Images auto-saved to:\n${saved.map((s) => ` ${s.filePath}`).join("\n")}\n— use the vision tool with paths=[...] to read them all at once]`
56
55
 
57
- // Prepend it to the user's text
58
- const firstText = msg.parts.find((p) => p.type === "text" && !p.synthetic)
59
- if (firstText && typeof firstText.text === "string") {
60
- firstText.text = hintText + "\n" + firstText.text
61
- }
56
+ // Push a new part instead of modifying an existing one, to avoid affecting UI rendering
57
+ ;(msg.parts as unknown as Record<string, unknown>[]).push({
58
+ type: "text" as const,
59
+ text: hints,
60
+ })
62
61
  }
63
62
  },
64
63
  }
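The net effect of this hunk on a message's `parts` array can be sketched in isolation. The version below is simplified and partly assumed: the `Part` and `Saved` types, the `[image]` placeholder text, and the shortened hint wording are illustrative, since the real placeholder text is not visible in the diff. It loops forward because a one-for-one `splice` does not shift indexes, whereas the plugin iterates over `saved.toReversed()`.

```typescript
// Simplified stand-ins for the plugin's message part and saved-image records.
type Part = { type: string; text?: string }
type Saved = { index: number; filePath: string }

function applyVisionHints(parts: Part[], saved: Saved[]): void {
  if (saved.length === 0) return

  // Swap each image part for a short text placeholder (one-for-one replacement).
  for (const { index } of saved) {
    parts.splice(index, 1, { type: "text", text: "[image]" })
  }

  const paths = saved.map((s) => s.filePath)
  const hint =
    paths.length === 1
      ? `[Image auto-saved to ${paths[0]}; use the vision tool to read it]`
      : `[Images auto-saved to:\n${paths.join("\n")}\nuse the vision tool with paths=[...] to read them all at once]`

  // 1.0.1 behavior: append the hint as a new part instead of editing the user's text part.
  parts.push({ type: "text", text: hint })
}

// Usage: one pasted image followed by the user's question.
const demoParts: Part[] = [{ type: "file" }, { type: "text", text: "What is this?" }]
applyVisionHints(demoParts, [{ index: 0, filePath: "/tmp/opencode-vision/pasted-1-abc.png" }])
console.log(demoParts)
```

Because `push` appends, the hint now lands after the user's text part, while the README still describes it as injected before the user's text.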