npm - @jochenyang/opencode-vision - Versions diffs - 1.0.0 → 1.0.1 - Mend

@jochenyang/opencode-vision 1.0.0 → 1.0.1


Files changed (5)

package/README.md +47 -6
package/README_en.md +210 -0
package/assets/logo.svg +13 -0
package/package.json +4 -2
package/plugins/vision-helper.ts +10 -11

package/README.md CHANGED

@@ -1,8 +1,45 @@
-# opencode-vision
+<p align="center">
+  <picture>
+    <source media="(prefers-color-scheme: dark)" srcset="assets/logo.svg">
+    <img src="assets/logo.svg" width="64" alt="opencode-vision logo">
+  </picture>
+</p>
+<h1 align="center">opencode-vision</h1>
+<p align="center">
+  🌐 <a href="README_en.md">English</a> · <strong>中文</strong>
+</p>
+<p align="center">
+  Let OpenCode models without multimodal support "see" images
+  <br />
+  Auto-save images → guide the model to call the vision tool → return a description
+</p>
+<p align="center">
+  <a href="https://www.npmjs.com/package/@jochenyang/opencode-vision">
+    <img src="https://img.shields.io/npm/v/@jochenyang/opencode-vision?style=flat-square" alt="npm version">
+  </a>
+  <a href="LICENSE">
+    <img src="https://img.shields.io/badge/License-MIT-yellow.svg?style=flat-square" alt="MIT License">
+  </a>
+  <a href="https://github.com/JochenYang/opencode-vision">
+    <img src="https://img.shields.io/github/stars/JochenYang/opencode-vision?style=flat-square" alt="GitHub stars">
+  </a>
+</p>
+---
+### ✨ One-line install
-A plugin + tool that adds image recognition to [OpenCode](https://github.com/opencode-ai/opencode).
+```bash
+npx @jochenyang/opencode-vision
+```
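-When the model itself does not support multimodal input, images pasted by the user are automatically saved to a temp directory and the model is guided to call the vision tool to recognize them. Both single and multiple images are supported.
+Uninstalling is just as easy: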
+```bash
+npx @jochenyang/opencode-vision --uninstall
+```
+---
 ## How It Works
@@ -74,10 +111,14 @@ OpenCode automatically discovers `~/.config/opencode/tools/` and `~/.config/opencode/plu
 > If the corresponding directory doesn't exist, create it manually.
-### Via npx (coming soon)
+### Via npx
 ```bash
-npx opencode-vision install
+# Install
+npx @jochenyang/opencode-vision
+# Uninstall
+npx @jochenyang/opencode-vision --uninstall
 ```
 ## Verification
@@ -145,4 +186,4 @@ $env:VISION_MODEL = 'your-vision-model'
 ## License
-MIT
+[MIT](LICENSE)
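Both the npx commands and the manual steps in README_en.md come down to copying two files into OpenCode's global config directory. `bin/install.js` is not part of this diff, so the TypeScript below is a hypothetical sketch of what an installer like it could do, not its actual code. The `FILES` table, the function names (`targetPaths`, `install`, `uninstall`), and the choice to delete only the two copied files on uninstall are all assumptions.

```typescript
import { copyFileSync, existsSync, mkdirSync, rmSync } from "node:fs";
import { homedir } from "node:os";
import * as path from "node:path";

// Hypothetical installer sketch, NOT the package's bin/install.js.
interface InstalledFile {
  src: string; // path inside the package
  subdir: string; // OpenCode auto-discovery folder
}

const FILES: InstalledFile[] = [
  { src: "tools/vision.ts", subdir: "tools" },
  { src: "plugins/vision-helper.ts", subdir: "plugins" },
];

// Global config root taken from the README: ~/.config/opencode
const DEFAULT_CONFIG_ROOT = path.join(homedir(), ".config", "opencode");

// e.g. <root>/tools/vision.ts
function targetOf(file: InstalledFile, configRoot: string): string {
  return path.join(configRoot, file.subdir, path.basename(file.src));
}

function targetPaths(configRoot: string = DEFAULT_CONFIG_ROOT): string[] {
  return FILES.map((f) => targetOf(f, configRoot));
}

// Copies both files, creating tools/ and plugins/ if they don't exist yet.
function install(pkgRoot: string, configRoot: string = DEFAULT_CONFIG_ROOT): string[] {
  const written: string[] = [];
  for (const file of FILES) {
    const dest = targetOf(file, configRoot);
    mkdirSync(path.dirname(dest), { recursive: true });
    copyFileSync(path.join(pkgRoot, file.src), dest);
    written.push(dest);
  }
  return written;
}

// Deletes only the two files this sketch installs and reports what it removed.
function uninstall(configRoot: string = DEFAULT_CONFIG_ROOT): string[] {
  const removed: string[] = [];
  for (const dest of targetPaths(configRoot)) {
    if (existsSync(dest)) {
      rmSync(dest);
      removed.push(dest);
    }
  }
  return removed;
}
```

Run from an unpacked package, `install(process.cwd())` would copy both files and `uninstall()` would delete them again. Limiting uninstall to known file names, rather than removing whole directories, leaves any other user tools and plugins in place.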

package/README_en.md ADDED

@@ -0,0 +1,210 @@
+<p align="center">
+  🌐 <strong>English</strong> · <a href="README.md">中文</a>
+</p>
+<p align="center">
+  <picture>
+    <source media="(prefers-color-scheme: dark)" srcset="assets/logo.svg">
+    <img src="assets/logo.svg" width="64" alt="opencode-vision logo">
+  </picture>
+</p>
+<h1 align="center">opencode-vision</h1>
+<p align="center">
+  Let non-vision OpenCode models "see" pasted images
+  <br />
+  Auto-saves images → guides model to call vision tool → returns description
+</p>
+<p align="center">
+  <a href="https://www.npmjs.com/package/@jochenyang/opencode-vision">
+    <img src="https://img.shields.io/npm/v/@jochenyang/opencode-vision?style=flat-square" alt="npm version">
+  </a>
+  <a href="LICENSE">
+    <img src="https://img.shields.io/badge/License-MIT-yellow.svg?style=flat-square" alt="MIT License">
+  </a>
+  <a href="https://github.com/JochenYang/opencode-vision">
+    <img src="https://img.shields.io/github/stars/JochenYang/opencode-vision?style=flat-square" alt="GitHub stars">
+  </a>
+</p>
+---
+### ✨ One-line install
+```bash
+npx @jochenyang/opencode-vision
+```
+Uninstall:
+```bash
+npx @jochenyang/opencode-vision --uninstall
+```
+---
+## How It Works
+```
+User pastes image + "What is this?"
+  ↓
+vision-helper plugin (experimental.chat.messages.transform)
+  ├─ Decode base64 → save to temp directory
+  ├─ Replace original image part with a short placeholder (remove ERROR noise from unsupportedParts)
+  └─ Inject path hint before user's text
+  ↓
+Model sees the path hint → automatically calls the vision tool
+  ↓
+vision tool calls the vision API → returns image description
+```
+- **Single image** → model calls `vision(path)` to read one image
+- **Multiple images** → model calls `vision(paths=[...])` to process all at once
+## Prerequisites
+- [OpenCode](https://github.com/opencode-ai/opencode) installed
+- An OpenAI-compatible vision API (e.g., Aliyun DashScope, OpenAI, etc.)
+- Environment variables configured (recommended system-wide)
+## Environment Variables
+| Variable          | Description                                            | Example                         |
+| ----------------- | ------------------------------------------------------ | ------------------------------- |
+| `VISION_API_KEY`  | Vision API key                                         | `sk-your-api-key`               |
+| `VISION_API_URL`  | Vision API base URL<br>(tool auto-appends `/chat/completions`) | `https://your-api-endpoint/v1`  |
+| `VISION_MODEL`    | Vision model name                                      | `your-vision-model`             |
+### Windows (System-wide)
+```powershell
+[System.Environment]::SetEnvironmentVariable('VISION_API_KEY', 'sk-your-api-key', 'User')
+[System.Environment]::SetEnvironmentVariable('VISION_API_URL', 'https://your-api-endpoint/v1', 'User')
+[System.Environment]::SetEnvironmentVariable('VISION_MODEL', 'your-vision-model', 'User')
+```
+**Restart your terminal** after setting.
+### macOS / Linux
+Add to `~/.zshrc` or `~/.bashrc`:
+```bash
+export VISION_API_KEY="sk-your-api-key"
+export VISION_API_URL="https://your-api-endpoint/v1"
+export VISION_MODEL="your-vision-model"
+```
+## Installation
+### Manual
+Copy the two files to OpenCode's global config directory:
+```bash
+# Tool
+cp tools/vision.ts ~/.config/opencode/tools/
+# Plugin
+cp plugins/vision-helper.ts ~/.config/opencode/plugins/
+```
+OpenCode auto-discovers files under `~/.config/opencode/tools/` and `~/.config/opencode/plugins/` — **no need to modify `opencode.json`**.
+> Create the directories if they don't exist.
+### Via npx
+```bash
+# Install
+npx @jochenyang/opencode-vision
+# Uninstall
+npx @jochenyang/opencode-vision --uninstall
+```
+## Verification
+Start OpenCode:
+```bash
+opencode
+```
+Paste an image and ask:
+```
+[Image] What is this?
+```
+Expected behavior:
+1. The model cannot read the image directly (it doesn't support multimodal input)
+2. The plugin saves the image to temp and injects a path hint
+3. The model automatically calls the `vision` tool
+4. The model returns an image description
+## Project Structure
+```
+opencode-vision/
+├── tools/
+│   └── vision.ts          # Vision tool — calls the vision API
+├── plugins/
+│   └── vision-helper.ts   # Plugin — saves images, injects hints, removes ERROR noise
+├── bin/
+│   └── install.js         # CLI install/uninstall script
+├── package.json
+├── README.md
+├── README_en.md
+└── LICENSE
+```
+### Tool: `tools/vision.ts`
+- Reads local image files and describes them via a vision API
+- Supports `path` (single) and `paths` (multiple) parameters
+- Compatible with any OpenAI Chat Completions API
+### Plugin: `plugins/vision-helper.ts`
+- Hook: `experimental.chat.messages.transform`
+- Processes right before the message is sent to the model
+- Saves images to `os.tmpdir()/opencode-vision/`
+- Injects path hints before user text (not persisted to chat history)
+- Replaces original image parts to prevent ERROR noise from `unsupportedParts`
+## Notes
+- Images are saved to the system temp directory `os.tmpdir()/opencode-vision/` — automatically cleaned on reboot
+- Temp files are named `pasted-{timestamp}-{random}.{ext}`
+- Same image pasted multiple times in one session creates separate temp files
+- Vision API calls use `max_tokens: 4096`, sufficient for detailed multi-image descriptions
+## Custom Vision API
+Compatible with any OpenAI Chat Completions vision API. Just change the environment variables:
+```bash
+export VISION_API_KEY="sk-your-api-key"
+export VISION_API_URL="https://your-api-endpoint/v1"
+export VISION_MODEL="your-vision-model"
+```
+## License
+[MIT](LICENSE)
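The environment-variable table and the Notes section pin down most of the request the vision tool sends: the base URL plus `/chat/completions`, the model name, and `max_tokens: 4096`. `tools/vision.ts` itself is not in this diff, so this is a hypothetical sketch of building such a request in the standard OpenAI Chat Completions format (a text part followed by one `image_url` part per image, each encoded as a base64 data URL). The `Bearer` auth header, the prompt, and the names `VisionEnv`, `chatCompletionsUrl`, and `buildVisionRequest` are assumptions.

```typescript
// Hypothetical sketch: builds an OpenAI-compatible Chat Completions request
// from the README's environment variables. Names and prompt are assumptions.

interface VisionEnv {
  VISION_API_KEY?: string;
  VISION_API_URL?: string;
  VISION_MODEL?: string;
}

interface ImageInput {
  mime: string; // e.g. "image/png"
  base64: string; // raw base64 payload without a "data:" prefix
}

// The README says the tool appends /chat/completions to VISION_API_URL;
// trailing slashes are trimmed first so the URL never contains "//chat".
function chatCompletionsUrl(baseUrl: string): string {
  return baseUrl.replace(/\/+$/, "") + "/chat/completions";
}

function buildVisionRequest(env: VisionEnv, images: ImageInput[], prompt: string) {
  const { VISION_API_KEY, VISION_API_URL, VISION_MODEL } = env;
  if (!VISION_API_KEY || !VISION_API_URL || !VISION_MODEL) {
    throw new Error("Set VISION_API_KEY, VISION_API_URL and VISION_MODEL");
  }
  if (images.length === 0) {
    throw new Error("Pass at least one image");
  }

  // One text part, then one image_url part per image, all in a single user message.
  const content = [
    { type: "text", text: prompt },
    ...images.map((img) => ({
      type: "image_url",
      image_url: { url: `data:${img.mime};base64,${img.base64}` },
    })),
  ];

  return {
    url: chatCompletionsUrl(VISION_API_URL),
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${VISION_API_KEY}`, // assumed auth scheme
      },
      body: JSON.stringify({
        model: VISION_MODEL,
        max_tokens: 4096, // value from the README's Notes section
        messages: [{ role: "user", content }],
      }),
    },
  };
}
```

Sending it would then be `const { url, init } = buildVisionRequest(env, images, prompt); const res = await fetch(url, init);`, with the description typically read from `choices[0].message.content` of the JSON response. Packing every image into one message is one way to let a `paths=[...]` call be answered with a single API request.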

package/assets/logo.svg ADDED

@@ -0,0 +1,13 @@
+<svg xmlns="http://www.w3.org/2000/svg" width="48" height="48" viewBox="0 0 48 48" fill="none">
+  <!-- Magnifying glass circle -->
+  <circle cx="20" cy="20" r="12" stroke="#6366f1" stroke-width="3" fill="none"/>
+  <!-- Magnifying glass handle -->
+  <line x1="29" y1="29" x2="40" y2="40" stroke="#6366f1" stroke-width="3.5" stroke-linecap="round"/>
+  <!-- Eye inside -->
+  <path d="M13 20c0-3.3 2.7-6 6-6s6 2.7 6 6-2.7 6-6 6-6-2.7-6-6z" stroke="#6366f1" stroke-width="1.5" fill="none"/>
+  <circle cx="19" cy="20" r="2.5" fill="#6366f1"/>
+  <!-- Small sparkle dots -->
+  <circle cx="8" cy="8" r="1.5" fill="#a5b4fc"/>
+  <circle cx="36" cy="10" r="1" fill="#a5b4fc"/>
+  <circle cx="42" cy="30" r="1.2" fill="#a5b4fc"/>
+</svg>

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@jochenyang/opencode-vision",
-  "version": "1.0.0",
+  "version": "1.0.1",
   "description": "Vision plugin + tool for OpenCode — automatically handles pasted images for non-vision models",
   "keywords": ["opencode", "vision", "image", "ai", "plugin", "tool"],
   "homepage": "https://github.com/jochenyang/opencode-vision",
@@ -13,7 +13,9 @@
     "bin/",
     "tools/",
     "plugins/",
-    "README.md"
+    "assets/",
+    "README.md",
+    "README_en.md"
   ],
   "engines": {
     "node": ">=18"

package/plugins/vision-helper.ts CHANGED

@@ -7,8 +7,8 @@ const TMP_DIR = path.join(tmpdir(), "opencode-vision")
 /**
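 * Right before messages are sent to the model, detect image attachments in user messages:
 * 1. Save the images to a temp directory
- * 2. Inject a path hint before the user's text so that models without multimodal support automatically call the vision tool
- * 3. Replace the original image parts so unsupportedParts doesn't produce noisy ERROR text
+ * 2. Replace the original image parts with a short placeholder (removes the ERROR noise from unsupportedParts)
+ * 3. Inject a path hint (a newly pushed part: not persisted, not visible in the UI)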
  */
 export default (async () => {
   await Bun.write(path.join(TMP_DIR, ".check"), "").catch(() => {})
@@ -40,8 +40,7 @@ export default (async () => {
         if (saved.length === 0) continue
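-        // Replace the original image part with a short text placeholder so unsupportedParts doesn't emit noisy ERRORs
-        // Iterate in reverse to avoid index shifts
+        // Replace the original image part with a short placeholder so unsupportedParts doesn't emit noisy ERRORs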
         for (const { index, filePath } of saved.toReversed()) {
           msg.parts.splice(index, 1, {
             type: "text",
@@ -49,16 +48,16 @@ export default (async () => {
           } as never)
         }
-        // Build the path hint
-        const hintText = saved.length === 1
+        // Build the path hint (pushed as a new part: not persisted, not visible in the UI)
+        const hints = saved.length === 1
           ? `[Image auto-saved to ${saved[0].filePath} — use the vision tool to read it]`
           : `[Images auto-saved to:\n${saved.map((s) => `  ${s.filePath}`).join("\n")}\n— use the vision tool with paths=[...] to read them all at once]`
-        // Inject it before the user's text
-        const firstText = msg.parts.find((p) => p.type === "text" && !p.synthetic)
-        if (firstText && typeof firstText.text === "string") {
-          firstText.text = hintText + "\n" + firstText.text
-        }
+        // Push a new part instead of modifying an existing one, so UI rendering is unaffected
+        ;(msg.parts as unknown as Record<string, unknown>[]).push({
+          type: "text" as const,
+          text: hints,
+        })
       }
     },
   }
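Taken together, the 1.0.1 changes to `vision-helper.ts` keep the placeholder replacement but move the hint out of the user's text part into a separately pushed part. The sketch below restates that flow as plain, testable TypeScript, using Node's `fs` in place of `Bun.write`. The `Part`/`Message` shapes (image parts as `{ type: "file", mime, url }` with a base64 data URL), the placeholder wording, the simplified hint text, and the helper names `tempFileName`, `saveToTemp`, `buildHint`, and `transformMessages` are assumptions, since the hunks above either elide them or do not show OpenCode's types.

```typescript
import { mkdirSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import * as path from "node:path";

// Hypothetical part/message shapes; OpenCode's real types may differ.
interface TextPart { type: "text"; text: string }
interface FilePart { type: "file"; mime: string; url: string } // url: base64 data URL
type Part = TextPart | FilePart;
interface Message { role: "user" | "assistant"; parts: Part[] }

// Same directory as TMP_DIR in the plugin.
const TMP_DIR = path.join(tmpdir(), "opencode-vision");

// README format pasted-{timestamp}-{random}.{ext}; taking ext from the MIME
// subtype is a naive assumption ("image/jpeg" -> "jpeg").
function tempFileName(mime: string, now: number = Date.now()): string {
  const ext = mime.split("/")[1] || "png";
  const random = Math.random().toString(36).slice(2, 8);
  return `pasted-${now}-${random}.${ext}`;
}

// Decodes the data URL payload, writes it under dir, and returns the file path.
function saveToTemp(part: FilePart, dir: string = TMP_DIR): string {
  const base64 = part.url.slice(part.url.indexOf(",") + 1);
  mkdirSync(dir, { recursive: true });
  const filePath = path.join(dir, tempFileName(part.mime));
  writeFileSync(filePath, Buffer.from(base64, "base64"));
  return filePath;
}

// Simplified hint wording; the plugin's exact strings are in the diff.
function buildHint(paths: string[]): string {
  return paths.length === 1
    ? `[Image auto-saved to ${paths[0]}; use the vision tool to read it]`
    : `[Images auto-saved to:\n${paths.map((p) => `  ${p}`).join("\n")}\nuse the vision tool with paths=[...] to read them all at once]`;
}

// Mutates messages in place: image parts in user messages become short text
// placeholders, and one hint part is appended to each affected message.
function transformMessages(messages: Message[], save: (part: FilePart) => string = saveToTemp): void {
  for (const msg of messages) {
    if (msg.role !== "user") continue;
    const saved: string[] = [];
    msg.parts.forEach((part, index) => {
      if (part.type !== "file" || !part.mime.startsWith("image/")) return;
      const filePath = save(part);
      saved.push(filePath);
      // 1-for-1 replacement, so the indices of the other parts never shift.
      msg.parts[index] = { type: "text", text: `[image: ${filePath}]` }; // hypothetical placeholder
    });
    if (saved.length === 0) continue;
    // Append a new part instead of editing the user's own text part.
    msg.parts.push({ type: "text", text: buildHint(saved) });
  }
}
```

Because each image part is replaced 1-for-1, the remaining parts keep their indices, so the order in which placeholders are written does not matter in this sketch.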