opencode-vision 0.2.0 → 0.2.1
- package/SKILL.md +24 -6
- package/package.json +1 -1
package/SKILL.md
CHANGED
@@ -15,8 +15,8 @@ description: >-
 ordering/equality/layout/readability/state/diff/describe), asks the
 user once per session which vision model to use, assembles a versioned
 request, delegates, parses the typed report. Image paths from
-screenshot_out_file/filePath; inline-only images saved to /tmp via
-
+screenshot_out_file/filePath; inline-only images saved to /tmp via node
+(not shell echo, to avoid embedding image bytes in commands).
 ---
 
 # Vision — Visual Judgment Skill
@@ -199,19 +199,37 @@ Some tool results return image attachments with
 e.g. `cua-driver_zoom` (inline-only, no path param), or
 `playwright_browser_take_screenshot` called without a `filename`. The
 vision subagent needs a file path to `read`. Save the inline image to
-disk first
+disk first.
+
+**Prefer avoiding inline images altogether**: when calling
+`cua-driver_get_window_state`, always pass `screenshot_out_file` so a
+file path is available directly. When calling
+`chrome-devtools_take_screenshot` or `playwright_browser_take_screenshot`,
+always pass `filePath` / `filename`. This avoids the inline-only case
+entirely and is the safest path.
+
+If you must handle an inline-only image, write the base64 payload to a
+file using `node -e` (not `echo | base64 -d`, which embeds the raw
+image data in a shell command — screenshots may contain sensitive
+content like tokens or credentials):
 
 ```
 If a tool result has attachments[].url starting "data:image/...;base64,"
 but no file path:
 1. Extract the base64 payload from the data URL (the part after
 ";base64,").
-2. Write it to /tmp/vision-<random>.png
-
+2. Write it to /tmp/vision-<random>.png using node, which avoids
+passing the base64 through the shell:
+node -e "require('fs').writeFileSync('/tmp/vision-<random>.png',
+Buffer.from('<base64>','base64'))"
+Or write a small script to /tmp and run it, passing the base64 via
+stdin to avoid it appearing in the command line.
 3. Use that path in the request's images[].path.
 ```
 
-
+Do not use `echo "<base64>" | base64 -d` — it embeds the raw image
+bytes in the shell command, creating an exfiltration risk if the
+screenshot contains sensitive data.
 
 ## Step 4. Pick model (once per session)
 
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
 "name": "opencode-vision",
-"version": "0.2.0",
+"version": "0.2.1",
 "description": "Typed visual-judgment skill for opencode. Registers 10 vision subagents (one per top-tier vision model across OpenAI, Kimi for Coding, Ollama Cloud, and opencode-go) and a skill that teaches a text-only orchestrator to extract visual-judgment intent, classify it into a typed judgment, and delegate to a vision subagent with a versioned request/report contract.",
 "type": "module",
 "main": "./dist/index.js",