opencode-see-image 0.3.0 → 0.4.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (3)
  1. package/README.md +21 -19
  2. package/index.ts +195 -195
  3. package/package.json +1 -1
package/README.md CHANGED
@@ -32,25 +32,27 @@ Install the opencode-see-image plugin so I can send you screenshots. Do this:
  After I restart and attach a screenshot, you should call the see_image tool to view it.
  ```
 
- Then restart opencode.
-
  ## Prerequisites
 
- You need a connected vision-capable provider. The plugin auto-detects whichever you have connected **either of these works**:
+ You need a connected vision-capable provider. The plugin auto-detects whichever you have connected, **either of these works**:
 
- ### Option A — Free (OpenCode Zen)
+ ### Free (OpenCode Zen)
  1. Run `/connect` in opencode
  2. Select **opencode** (OpenCode Zen)
  3. Paste your API key from [opencode.ai/auth](https://opencode.ai/auth)
 
- The plugin falls back to **big-pickle** (free, vision-capable, ~20s). No subscription needed.
+ The plugin falls back to **big-pickle** (~12000ms). No subscription needed.
 
- ### Option B — Paid, fast (OpenCode Go)
+ ### Paid, w/ OpenCode Go
  1. Run `/connect` in opencode
  2. Select **opencode-go**
  3. Paste your API key from [opencode.ai/auth](https://opencode.ai/auth)
 
- The plugin prefers **minimax-m3** via opencode-go (~3s) when available.
+ The plugin prefers **minimax-m3** via opencode-go (~3000ms) when available.
+
+ ### Paid, w/ another provider
+
+ Set the `SEE_IMAGE_*` env vars to point at any Anthropic-Messages-compatible endpoint. See [Configuration](#configuration) below.
 
  **Resolution order:** explicit `SEE_IMAGE_API_KEY` env → configured `SEE_IMAGE_PROVIDER` → `opencode-go` (MiniMax M3) → `opencode` (big-pickle, free).
 
@@ -85,7 +87,7 @@ The plugin registers a `see_image` tool with two arguments:
  | `filePath` | string | yes | Path to the image. Absolute path, or a bare filename like `"Screenshot 2026-06-18 at 17.32.24.png"` to auto-locate. |
  | `question` | string | no | A specific question about the image. Defaults to a general detailed description. Use this to focus on a particular detail (e.g. `"What error is shown in the terminal?"`). |
 
- Your model calls this tool automatically when you attach a screenshot you don't need to do anything special. The `question` arg is optional; the model uses it when you ask something specific about the image.
+ Your model calls this tool automatically when you attach a screenshot, you don't need to do anything special. The `question` arg is optional; the model uses it when you ask something specific about the image.
 
  ## Configuration
 
@@ -122,39 +124,39 @@ export SEE_IMAGE_MODEL="kimi-k2.7-code"
 
  | Model | Speed | Notes |
  |---|---|---|
- | `big-pickle` | ~20s | Free. Accurate. Default fallback when only Zen is connected. |
+ | `big-pickle` | ~12000ms | Free. Accurate. Default fallback when only Zen is connected. |
 
  **Paid (OpenCode Go):**
 
  | Model | Speed | Notes |
  |---|---|---|
- | `minimax-m3` | ~3s | Default. Fast, clean text output. |
- | `kimi-k2.7-code` | ~7s | Clean output, accurate. |
- | `kimi-k2.6` | ~20s | Accurate but slow. |
- | `qwen3.7-plus` | ~20s | Emits thinking blocks (handled). |
+ | `minimax-m3` | ~3000ms | Default. Fast, clean text output. |
+ | `kimi-k2.7-code` | ~7000ms | Clean output, accurate. |
+ | `kimi-k2.6` | ~20000ms | Accurate but slow. |
+ | `qwen3.7-plus` | ~20000ms | Emits thinking blocks (handled). |
 
  ## Updating
 
- **Auto-update (built in):** the plugin checks npm for a newer version on every opencode startup. If one exists, it runs `bun update` automatically and shows a toast: *"opencode-see-image updated to X.Y.Z restart opencode to apply"*. You just need to restart opencode to load the new version. Nothing to configure.
+ **Auto-update (built in):** the plugin checks npm for a newer version on every opencode startup. If one exists, it updates itself via `opencode plugin --force` (uses opencode's bundled bun, no global bun needed) and shows a toast: *"opencode-see-image updated to X.Y.Z, restart opencode to apply"*. You just need to restart opencode to load the new version. Nothing to configure.
 
  **Manual update** (if you want to force it now):
  ```bash
- cd ~/.cache/opencode && bun update opencode-see-image
+ opencode plugin opencode-see-image --force --global
  ```
- Then restart opencode.
+ Then restart opencode. (No bun required, this uses opencode's own bun.)
 
  **Pin a version** in your config to opt out of auto-updates:
  ```jsonc
- "plugin": ["opencode-see-image@0.3.0"]
+ "plugin": ["opencode-see-image@0.4.1"]
  ```
 
  ## File search locations
 
  When opencode rejects an image attachment, the model only receives a bare filename. `see_image` searches these locations in order:
 
- 1. `$TMPDIR/TemporaryItems/NSIRD_screencaptureui_*/` where macOS stashes dragged screenshots
+ 1. `$TMPDIR/TemporaryItems/NSIRD_screencaptureui_*/` (where macOS stashes dragged screenshots)
  2. `$TMPDIR/TemporaryItems/`
- 3. `~/Desktop` default screenshot save location
+ 3. `~/Desktop` (default screenshot save location)
  4. `~/Downloads`
  5. Current working directory
 
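The search order in the README's "File search locations" list can be illustrated with a small TypeScript sketch. This is not the plugin's `resolveFilePath`; it is a hypothetical simplification. The `Fs` interface and the `join`, `searchDirs`, and `locate` functions are invented for illustration, filesystem access is injected so the ordering can be exercised without touching disk, the absolute-path check is a POSIX-only stand-in for `path.isAbsolute`, and each directory is checked directly rather than with the plugin's shallow recursive search.

```typescript
// Hypothetical sketch of the bare-filename lookup order described in the README.
// Filesystem access is injected (not the real fs module) so the logic is testable.
interface Fs {
  exists(path: string): boolean
  listDir(dir: string): string[] // entry names; empty array if the dir is missing
}

// Join path segments with "/" and collapse duplicate slashes (POSIX-only).
function join(...parts: string[]): string {
  return parts.join("/").replace(/\/+/g, "/")
}

// Build the ordered list of directories to search for a bare filename.
function searchDirs(tmpdir: string, home: string, cwd: string, fs: Fs): string[] {
  const tempItems = join(tmpdir, "TemporaryItems")
  const dirs: string[] = []
  // 1. $TMPDIR/TemporaryItems/NSIRD_screencaptureui_*/ (macOS screenshot temp dirs)
  for (const entry of fs.listDir(tempItems)) {
    if (entry.startsWith("NSIRD_screencaptureui_")) dirs.push(join(tempItems, entry))
  }
  // 2-5. Fixed fallbacks, in the README's order.
  dirs.push(tempItems, join(home, "Desktop"), join(home, "Downloads"), cwd)
  return dirs
}

// Return the first existing match, or null if the file is not found anywhere.
function locate(name: string, tmpdir: string, home: string, cwd: string, fs: Fs): string | null {
  // Existing absolute paths are used as-is (stand-in for path.isAbsolute).
  if (name.startsWith("/") && fs.exists(name)) return name
  for (const dir of searchDirs(tmpdir, home, cwd, fs)) {
    const candidate = join(dir, name)
    if (fs.exists(candidate)) return candidate
  }
  return null
}
```

Because the `NSIRD_screencaptureui_*` directories are searched first, a dragged screenshot is found before any same-named file on `~/Desktop`.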
package/index.ts CHANGED
@@ -4,10 +4,6 @@ import os from "os"
  import fs from "fs"
  import type { Plugin } from "@opencode-ai/plugin"
 
- // ─── Configuration (env-overridable) ────────────────────────────────────────
- // Defaults target opencode-go's MiniMax M3. Users on other providers can
- // override via environment variables without editing this file.
-
  const ENDPOINT =
  process.env.SEE_IMAGE_ENDPOINT ||
  "https://opencode.ai/zen/go/v1/messages"
@@ -27,61 +23,6 @@ const EXT_MEDIA: Record<string, string> = {
  bmp: "image/bmp",
  }
 
- // ─── Auth ───────────────────────────────────────────────────────────────────
- // Resolves a usable API key + the endpoint + model to use. Falls back through
- // a chain so users with any connected opencode subscription (paid opencode-go
- // or free opencode Zen) get a working default with zero config.
- function resolveAuth(): { key: string; endpoint: string; model: string } {
- // 1. Explicit env vars win outright.
- if (process.env.SEE_IMAGE_API_KEY) {
- return { key: process.env.SEE_IMAGE_API_KEY, endpoint: ENDPOINT, model: MODEL }
- }
-
- // 2. Walk opencode's auth store and try the configured provider first,
- // then a curated fallback chain (paid → free).
- const authPath = path.join(os.homedir(), ".local/share/opencode/auth.json")
- let auth: any = {}
- try {
- auth = JSON.parse(fs.readFileSync(authPath, "utf8"))
- } catch {
- // ignore — handled by the empty-auth path below
- }
-
- // Each candidate: [providerId, endpoint, defaultModel]
- const candidates: Array<[string, string, string]> = [
- [PROVIDER_ID, ENDPOINT, MODEL],
- // Free fallback: OpenCode Zen's big-pickle supports vision at no cost.
- ["opencode", "https://opencode.ai/zen/v1/messages", "big-pickle"],
- // Paid fallbacks on opencode-go:
- ["opencode-go", "https://opencode.ai/zen/go/v1/messages", "minimax-m3"],
- ]
-
- const tried: string[] = []
- for (const [pid, ep, mdl] of candidates) {
- const entry = auth[pid]
- const k = entry && (entry.key || entry.access)
- if (k) {
- // If the user pinned PROVIDER_ID but not MODEL, honor the candidate's
- // default model only when provider matches the pinned one; otherwise
- // keep the configured MODEL (may be set via env).
- const useModel =
- pid === PROVIDER_ID || !process.env.SEE_IMAGE_MODEL ? mdl : MODEL
- return { key: k, endpoint: ep, model: useModel }
- }
- tried.push(pid)
- }
-
- throw new Error(
- `see_image: no API key. Connect a provider in opencode via /connect — ` +
- `either "opencode-go" (paid, fast MiniMax M3) or "opencode" (free, big-pickle). ` +
- `Or set SEE_IMAGE_API_KEY explicitly. (Checked providers: ${tried.join(", ") || "none"} in ${authPath}.)`,
- )
- }
-
- // ─── File resolution ────────────────────────────────────────────────────────
- // When opencode rejects an image attachment, the model only sees a bare
- // filename (no path). This resolves bare filenames by searching the places
- // macOS / opencode tend to stash screenshots.
  function resolveFilePath(name: string, cwd: string): string {
  if (path.isAbsolute(name) && fs.existsSync(name)) return name
 
@@ -91,8 +32,6 @@ function resolveFilePath(name: string, cwd: string): string {
  const tmpdir = process.env.TMPDIR || "/tmp"
  const searchDirs: string[] = []
 
- // macOS screenshot tool temp dirs (NSIRD_screencaptureui_<rand>) — this is
- // where dragged screenshots actually land, not ~/Desktop.
  const tempItems = path.join(tmpdir, "TemporaryItems")
  if (fs.existsSync(tempItems)) {
  try {
@@ -116,7 +55,6 @@ function resolveFilePath(name: string, cwd: string): string {
  } catch {}
  }
 
- // Shallow recursive search in the top-level search dirs.
  for (const dir of searchDirs) {
  if (!dir || !fs.existsSync(dir)) continue
  try {
@@ -133,103 +71,135 @@ function resolveFilePath(name: string, cwd: string): string {
  )
  }
 
- // ─── Tool definition ────────────────────────────────────────────────────────
- const seeImageTool = tool({
- description:
- 'See an image/screenshot that the current model cannot view. Use when the user attaches an image and you get a "this model does not support image input" / "Cannot read" error, or when a screenshot/image is referenced ("see this", "can you see", .png/.jpg). Routes the image to a vision-capable model and returns a detailed textual description you can reason about as if you saw it. Pass filePath as an absolute path OR a bare filename (auto-located in macOS screenshot temp dirs, ~/Desktop, ~/Downloads, cwd).',
- args: {
- filePath: tool.schema
- .string()
- .describe(
- 'Path to the image. Absolute path, or a bare filename like "Screenshot 2026-06-18 at 17.32.24.png" to auto-locate.',
- ),
- question: tool.schema
- .string()
- .optional()
- .describe(
- "Optional specific question about the image. Defaults to a general detailed description.",
- ),
- },
- async execute(args, context) {
- const fullPath = resolveFilePath(args.filePath, context.directory)
- const ext = path.extname(fullPath).slice(1).toLowerCase()
- const mediaType = EXT_MEDIA[ext] || "image/png"
-
- const buf = fs.readFileSync(fullPath)
- const b64 = Buffer.from(buf).toString("base64")
-
- const prompt =
- args.question && args.question.trim().length > 0
- ? args.question
- : "Describe this image in detail. If it is a screenshot, describe the UI, text content, and layout precisely. This description will be used by another model to answer the user, so be thorough and accurate."
-
- const { key, endpoint, model } = resolveAuth()
-
- const body = {
- model,
- max_tokens: 2048,
- messages: [
- {
- role: "user",
- content: [
- {
- type: "image",
- source: { type: "base64", media_type: mediaType, data: b64 },
- },
+ async function seeImageViaSDK(
+ client: any,
+ dataUrl: string,
+ mediaType: string,
+ prompt: string,
+ ): Promise<{ text: string; model: string; provider: string }> {
+ const envProvider = process.env.SEE_IMAGE_PROVIDER
+ const envModel = process.env.SEE_IMAGE_MODEL
+ const candidates: Array<{ providerID: string; modelID: string }> = []
+ if (envProvider && envModel) {
+ candidates.push({ providerID: envProvider, modelID: envModel })
+ }
+ candidates.push({ providerID: "opencode-go", modelID: "minimax-m3" })
+ candidates.push({ providerID: "opencode", modelID: "big-pickle" })
+
+ const errors: string[] = []
+
+ for (const { providerID, modelID } of candidates) {
+ let sessionID: string | undefined
+ try {
+ const sessionRes = await client.session.create({ body: {} })
+ sessionID = sessionRes.data?.id
+ if (!sessionID) {
+ errors.push(`${providerID}/${modelID}: no session ID`)
+ continue
+ }
+
+ const result = await client.session.prompt({
+ path: { id: sessionID },
+ body: {
+ model: { providerID, modelID },
+ parts: [
+ { type: "file", mime: mediaType, url: dataUrl },
  { type: "text", text: prompt },
  ],
+ tools: {},
+ system:
+ "You are a vision assistant. Describe the image accurately and concisely. Answer with text only.",
  },
- ],
- }
+ })
 
- const res = await fetch(endpoint, {
- method: "POST",
- headers: {
- "x-api-key": key,
- "anthropic-version": API_VERSION,
- "content-type": "application/json",
- "user-agent": USER_AGENT,
- },
- body: JSON.stringify(body),
- })
+ const parts = result.data?.parts ?? []
+ const text = (parts as any[])
+ .filter((p: any) => p.type === "text")
+ .map((p: any) => p.text)
+ .filter((t: any) => typeof t === "string" && t.length > 0)
+ .join("\n")
+ .trim()
 
- if (!res.ok) {
- const errText = await res.text()
- throw new Error(
- `see_image: vision call to "${model}" @ ${endpoint} failed: HTTP ${res.status} ${errText.slice(0, 300)}`,
- )
+ if (text) {
+ return { text, model: modelID, provider: providerID }
+ }
+ errors.push(`${providerID}/${modelID}: no text in response`)
+ } catch (e: any) {
+ errors.push(`${providerID}/${modelID}: ${e?.message ?? e}`)
+ } finally {
+ if (sessionID) {
+ await client.session
+ .delete({ path: { id: sessionID } })
+ .catch(() => {})
+ }
  }
+ }
 
- const data: any = await res.json()
- // Join all text blocks, skipping thinking/signature blocks (some models
- // like qwen/minimax-m2.7 emit reasoning before the answer).
- const text = data?.content
- ?.map((c: any) => c.text)
- .filter((t: any) => typeof t === "string" && t.length > 0)
- .join("\n")
- .trim()
-
- if (!text) {
- throw new Error(
- `see_image: model "${model}" returned no text. Response: ${JSON.stringify(data).slice(0, 300)}`,
- )
- }
+ throw new Error(
+ `see_image: SDK vision call failed for all candidates. ${errors.join("; ")}`,
+ )
+ }
 
- context.metadata({
- title: `see_image: ${path.basename(fullPath)}`,
- metadata: { model, endpoint, file: fullPath },
- })
+ async function seeImageViaHTTP(
+ b64: string,
+ mediaType: string,
+ prompt: string,
+ ): Promise<{ text: string; model: string; provider: string }> {
+ const key = process.env.SEE_IMAGE_API_KEY!
+ const body = {
+ model: MODEL,
+ max_tokens: 2048,
+ messages: [
+ {
+ role: "user",
+ content: [
+ {
+ type: "image",
+ source: { type: "base64", media_type: mediaType, data: b64 },
+ },
+ { type: "text", text: prompt },
+ ],
+ },
+ ],
+ }
 
- return text
- },
- })
+ const res = await fetch(ENDPOINT, {
+ method: "POST",
+ headers: {
+ "x-api-key": key,
+ "anthropic-version": API_VERSION,
+ "content-type": "application/json",
+ "user-agent": USER_AGENT,
+ },
+ body: JSON.stringify(body),
+ })
+
+ if (!res.ok) {
+ const errText = await res.text()
+ throw new Error(
+ `see_image: HTTP vision call to "${MODEL}" failed: HTTP ${res.status}, ${errText.slice(0, 300)}`,
+ )
+ }
 
- // ─── System prompt injection (the "skill") ──────────────────────────────────
- // Injected via experimental.chat.system.transform so the triggering logic
- // ships with the plugin — no separate SKILL.md install needed.
- const SYSTEM_INSTRUCTIONS = `# See Image (vision bridge) opencode-see-image plugin
+ const data: any = await res.json()
+ const text = data?.content
+ ?.map((c: any) => c.text)
+ .filter((t: any) => typeof t === "string" && t.length > 0)
+ .join("\n")
+ .trim()
+
+ if (!text) {
+ throw new Error(
+ `see_image: model "${MODEL}" returned no text. Response: ${JSON.stringify(data).slice(0, 300)}`,
+ )
+ }
 
- You have access to a \`see_image\` tool. The current model may not support image input directly. When a user attaches a screenshot or image, opencode rejects it and you only receive an error string containing the **filename** — no path, no pixels. Use \`see_image\` to actually view it.
+ return { text, model: MODEL, provider: PROVIDER_ID }
+ }
+
+ const SYSTEM_INSTRUCTIONS = `# See Image (vision bridge), opencode-see-image plugin
+
+ You have access to a \`see_image\` tool. The current model may not support image input directly. When a user attaches a screenshot or image, opencode rejects it and you only receive an error string containing the **filename**, no path, no pixels. Use \`see_image\` to actually view it.
 
  ## When to use \`see_image\`
 
@@ -238,13 +208,13 @@ Use ONLY when one of these is true:
  2. The user references an image/screenshot they expect you to see ("see this", "look at this", "can you see this", ".png"/".jpg")
  3. The user pastes an image path they want you to inspect
 
- Do NOT use \`see_image\` for reading text files use the \`read\` tool for those.
+ Do NOT use \`see_image\` for reading text files, use the \`read\` tool for those.
 
  ## How to use it
 
  1. **Extract the filename** from the error string (the quoted name), or use the path the user gave.
  2. **Call \`see_image\`** with \`filePath\` set to the bare filename (it auto-locates) or an absolute path. Pass an optional \`question\` if the user asked something specific.
- 3. **Answer using the returned description** as if you saw the image. Be natural don't mention that you used another model unless asked.
+ 3. **Answer using the returned description** as if you saw the image. Be natural, don't mention that you used another model unless asked.
 
  ## Important
 
@@ -252,18 +222,11 @@ Do NOT use \`see_image\` for reading text files — use the \`read\` tool for th
  - If the tool cannot find the file, tell the user the filename and ask for a full path or to drag the file into the project directory.
  - To inspect a specific detail, pass a targeted \`question\` (e.g. "What error is shown in the terminal?").`
 
- // ─── Auto-update ────────────────────────────────────────────────────────────
- // Runs once at plugin init (async, non-blocking). Checks npm for a newer
- // version, runs `bun update` in the opencode plugin cache if available, and
- // toasts the user to restart opencode to apply. Never throws — failures are
- // logged and swallowed so the plugin always loads.
-
  const PKG_NAME = "opencode-see-image"
  const REGISTRY_LATEST = `https://registry.npmjs.org/${PKG_NAME}/latest`
 
  function currentVersion(): string | null {
  try {
- // import.meta.url points at this module inside the bun cache.
  const here = new URL(".", import.meta.url)
  const pkgPath = new URL("package.json", here)
  const pkg = JSON.parse(fs.readFileSync(pkgPath, "utf8"))
@@ -291,80 +254,117 @@ async function maybeAutoUpdate(
  log: (msg: string, level?: string) => void,
  ) {
  const current = currentVersion()
- if (!current) {
- log("could not determine current version; skipping update check", "debug")
- return
- }
+ if (!current) return
 
  let latest: string
  try {
  const res = await fetch(REGISTRY_LATEST, {
  headers: { accept: "application/json" },
  })
- if (!res.ok) {
- log(`registry fetch returned HTTP ${res.status}`, "debug")
- return
- }
+ if (!res.ok) return
  const data: any = await res.json()
  latest = data?.version
- if (!latest) {
- log("registry response had no version field", "debug")
- return
- }
- } catch (e: any) {
- log(`registry fetch failed: ${e?.message ?? e}`, "debug")
+ if (!latest) return
+ } catch {
  return
  }
 
- if (!semverGt(latest, current)) {
- log(`up to date (${current})`, "debug")
- return
- }
+ if (!semverGt(latest, current)) return
 
- log(`update available: ${current} ${latest}; running bun update`, "info")
+ log(`update available: ${current} -> ${latest}; updating`, "info")
 
- // Update the plugin in opencode's cache. --no-save keeps the lockfile
- // resolution intact while still pulling the new tarball. We cd into the
- // cache dir because bun operates on the nearest package.json/lockfile.
- const cacheDir = path.join(os.homedir(), ".cache/opencode")
+ // Use opencode's own plugin command to re-resolve from npm. This uses
+ // opencode's bundled bun, so it works even when bun isn't installed
+ // globally on the user's PATH.
+ const opencodeBin =
+ process.env.OPENCODE_BIN ||
+ path.join(os.homedir(), ".opencode/bin/opencode")
  try {
- await $`cd ${cacheDir} && bun update ${PKG_NAME} --no-save`.quiet()
+ await $`${opencodeBin} plugin ${PKG_NAME} --force --global`.quiet()
  } catch (e: any) {
- log(`bun update failed: ${e?.message ?? e}`, "warn")
- return
+ // Fallback: try bare `opencode` on PATH
+ try {
+ await $`opencode plugin ${PKG_NAME} --force --global`.quiet()
+ } catch (e2: any) {
+ log(`plugin update failed: ${e2?.message ?? e2}`, "warn")
+ return
+ }
  }
 
- // Tell the user to restart. Toast is non-blocking; if it fails, we log.
  try {
  await client?.tui?.showToast?.({
  body: {
- message: `${PKG_NAME} updated to ${latest} restart opencode to apply`,
+ message: `${PKG_NAME} updated to ${latest}, restart opencode to apply`,
  variant: "success",
  },
  })
  } catch {
- log(`update applied: ${current} ${latest}; restart opencode to load`, "info")
+ log(`update applied: ${current} -> ${latest}; restart opencode to load`, "info")
  }
  }
 
- // ─── Plugin export ──────────────────────────────────────────────────────────
  const SeeImagePlugin: Plugin = async (ctx) => {
  const { client, $ } = ctx
 
  const log = (message: string, level: string = "info") => {
  try {
- client?.app?.log?.({
- body: { service: PKG_NAME, level, message },
- })
- } catch {
- // logging is best-effort
- }
+ client?.app?.log?.({ body: { service: PKG_NAME, level, message } })
+ } catch {}
  }
 
- // Fire-and-forget the update check. Never awaited so plugin init is not
- // delayed by network. Errors are swallowed inside.
  maybeAutoUpdate(client, $, log).catch(() => {})
 
+ const seeImageTool = tool({
+ description:
+ 'See an image/screenshot that the current model cannot view. Use when the user attaches an image and you get a "this model does not support image input" / "Cannot read" error, or when a screenshot/image is referenced ("see this", "can you see", .png/.jpg). Routes the image to a vision-capable model and returns a detailed textual description you can reason about as if you saw it. Pass filePath as an absolute path OR a bare filename (auto-located in macOS screenshot temp dirs, ~/Desktop, ~/Downloads, cwd).',
+ args: {
+ filePath: tool.schema
+ .string()
+ .describe(
+ 'Path to the image. Absolute path, or a bare filename like "Screenshot 2026-06-18 at 17.32.24.png" to auto-locate.',
+ ),
+ question: tool.schema
+ .string()
+ .optional()
+ .describe(
+ "Optional specific question about the image. Defaults to a general detailed description.",
+ ),
+ },
+ async execute(args, context) {
+ const fullPath = resolveFilePath(args.filePath, context.directory)
+ const ext = path.extname(fullPath).slice(1).toLowerCase()
+ const mediaType = EXT_MEDIA[ext] || "image/png"
+
+ const buf = fs.readFileSync(fullPath)
+ const b64 = Buffer.from(buf).toString("base64")
+ const dataUrl = `data:${mediaType};base64,${b64}`
+
+ const prompt =
+ args.question && args.question.trim().length > 0
+ ? args.question
+ : "Describe this image in detail. If it is a screenshot, describe the UI, text content, and layout precisely. This description will be used by another model to answer the user, so be thorough and accurate."
+
+ let result: { text: string; model: string; provider: string }
+
+ if (process.env.SEE_IMAGE_API_KEY) {
+ result = await seeImageViaHTTP(b64, mediaType, prompt)
+ } else {
+ result = await seeImageViaSDK(client, dataUrl, mediaType, prompt)
+ }
+
+ context.metadata({
+ title: `see_image: ${path.basename(fullPath)}`,
+ metadata: {
+ model: result.model,
+ provider: result.provider,
+ file: fullPath,
+ },
+ })
+
+ return result.text
+ },
+ })
+
  return {
  tool: {
  see_image: seeImageTool,
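The main behavioral change in index.ts is that, when `SEE_IMAGE_API_KEY` is unset, `seeImageViaSDK` tries provider/model candidates in order (the env pair only when both `SEE_IMAGE_PROVIDER` and `SEE_IMAGE_MODEL` are set, then `opencode-go`/`minimax-m3`, then `opencode`/`big-pickle`), opens a throwaway session per attempt, keeps only non-empty `text` parts, and deletes the session in `finally`. The sketch below models that loop under stated assumptions: `VisionClient`, `buildCandidates`, and `firstDescription` are hypothetical names, `VisionClient` is a stand-in interface rather than the opencode SDK client type, and the image and prompt parts sent to `session.prompt` are omitted.

```typescript
// Hypothetical sketch of the candidate fallback in seeImageViaSDK.
// VisionClient is a stand-in interface, NOT the real opencode SDK client type.
interface Candidate {
  providerID: string
  modelID: string
}

interface VisionClient {
  createSession(): Promise<string | undefined>
  // Image/prompt parts are omitted here; the real call sends a file part and a text part.
  prompt(sessionID: string, c: Candidate): Promise<Array<{ type: string; text?: string }>>
  deleteSession(sessionID: string): Promise<void>
}

// Same ordering as the diff: env pair first (only if both are set), then the defaults.
function buildCandidates(env: Record<string, string | undefined>): Candidate[] {
  const list: Candidate[] = []
  if (env.SEE_IMAGE_PROVIDER && env.SEE_IMAGE_MODEL) {
    list.push({ providerID: env.SEE_IMAGE_PROVIDER, modelID: env.SEE_IMAGE_MODEL })
  }
  list.push({ providerID: "opencode-go", modelID: "minimax-m3" })
  list.push({ providerID: "opencode", modelID: "big-pickle" })
  return list
}

// Try each candidate in turn; return the first non-empty text answer.
async function firstDescription(
  client: VisionClient,
  candidates: Candidate[],
): Promise<Candidate & { text: string }> {
  const errors: string[] = []
  for (const c of candidates) {
    const label = `${c.providerID}/${c.modelID}`
    let sessionID: string | undefined
    try {
      sessionID = await client.createSession()
      if (!sessionID) {
        errors.push(`${label}: no session ID`)
        continue
      }
      const parts = await client.prompt(sessionID, c)
      // Keep only non-empty text parts (e.g. skip reasoning parts).
      const text = parts
        .filter((p) => p.type === "text" && typeof p.text === "string" && p.text.length > 0)
        .map((p) => p.text as string)
        .join("\n")
        .trim()
      if (text) return { text, ...c }
      errors.push(`${label}: no text in response`)
    } catch (e) {
      errors.push(`${label}: ${e instanceof Error ? e.message : String(e)}`)
    } finally {
      // Always delete the throwaway session; cleanup failures are ignored.
      if (sessionID) await client.deleteSession(sessionID).catch(() => {})
    }
  }
  throw new Error(`all candidates failed. ${errors.join("; ")}`)
}
```

Injecting the client keeps the ordering, error aggregation, and cleanup testable with an in-memory fake: a candidate that throws is recorded and skipped, and every session that was created is deleted whether or not the attempt succeeded.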
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "opencode-see-image",
- "version": "0.3.0",
+ "version": "0.4.1",
  "description": "Give non-vision opencode models the ability to see images/screenshots by routing them to a vision-capable model (MiniMax M3 via opencode-go by default).",
  "type": "module",
  "main": "index.ts",