npm - opencode-vision - Versions diffs - 0.2.0 → 0.2.1 - Mend

opencode-vision 0.2.0 → 0.2.1

Files changed (2)

package/SKILL.md +24 -6
package/package.json +1 -1

package/SKILL.md CHANGED

@@ -15,8 +15,8 @@ description: >-
   ordering/equality/layout/readability/state/diff/describe), asks the
   user once per session which vision model to use, assembles a versioned
   request, delegates, parses the typed report. Image paths from
-  screenshot_out_file/filePath; inline-only images saved to /tmp via
-  base64 -d.
+  screenshot_out_file/filePath; inline-only images saved to /tmp via node
+  (not shell echo, to avoid embedding image bytes in commands).
 ---
 # Vision — Visual Judgment Skill
@@ -199,19 +199,37 @@ Some tool results return image attachments with
 e.g. `cua-driver_zoom` (inline-only, no path param), or
 `playwright_browser_take_screenshot` called without a `filename`. The
 vision subagent needs a file path to `read`. Save the inline image to
-disk first:
+disk first.
+**Prefer avoiding inline images altogether**: when calling
+`cua-driver_get_window_state`, always pass `screenshot_out_file` so a
+file path is available directly. When calling
+`chrome-devtools_take_screenshot` or `playwright_browser_take_screenshot`,
+always pass `filePath` / `filename`. This avoids the inline-only case
+entirely and is the safest path.
+If you must handle an inline-only image, write the base64 payload to a
+file using `node -e` (not `echo | base64 -d`, which embeds the raw
+image data in a shell command — screenshots may contain sensitive
+content like tokens or credentials):
 ```
 If a tool result has attachments[].url starting "data:image/...;base64,"
 but no file path:
   1. Extract the base64 payload from the data URL (the part after
      ";base64,").
-  2. Write it to /tmp/vision-<random>.png via bash:
-       echo "<base64>" | base64 -d > /tmp/vision-<random>.png
+  2. Write it to /tmp/vision-<random>.png using node, which avoids
+     passing the base64 through the shell:
+       node -e "require('fs').writeFileSync('/tmp/vision-<random>.png',
+       Buffer.from('<base64>','base64'))"
+     Or write a small script to /tmp and run it, passing the base64 via
+     stdin to avoid it appearing in the command line.
   3. Use that path in the request's images[].path.
 ```
-This is the recommended handling for any inline-only image result.
+Do not use `echo "<base64>" | base64 -d` — it embeds the raw image
+bytes in the shell command, creating an exfiltration risk if the
+screenshot contains sensitive data.
 ## Step 4. Pick model (once per session)
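
Step 2 of the added instructions mentions, as an alternative to `node -e`, writing a small script and passing the base64 via stdin so the payload never appears in a command line. The package does not ship such a script, so the following is only a minimal sketch of what one could look like. The file name `save-inline-image.js`, the `--from-stdin` flag, and the helper names are hypothetical, and the sketch assumes the full data URL arrives on stdin and that a `.png` extension is acceptable.

```javascript
// Hypothetical helper, e.g. saved as /tmp/save-inline-image.js (name is an assumption).
const fs = require('fs');
const os = require('os');
const path = require('path');
const crypto = require('crypto');

// Return the base64 payload of a data URL (the part after ";base64,").
function extractBase64(dataUrl) {
  const marker = ';base64,';
  const idx = dataUrl.indexOf(marker);
  if (!dataUrl.startsWith('data:image/') || idx === -1) {
    throw new Error('expected a data:image/...;base64, URL');
  }
  return dataUrl.slice(idx + marker.length).trim();
}

// Decode the payload and write it to <dir>/vision-<random>.png; returns the path.
// os.tmpdir() is used here instead of a hard-coded /tmp (an assumption of this sketch).
function saveInlineImage(dataUrl, dir = os.tmpdir()) {
  const bytes = Buffer.from(extractBase64(dataUrl), 'base64');
  const name = `vision-${crypto.randomBytes(6).toString('hex')}.png`;
  const file = path.join(dir, name);
  // Owner-only permissions, since screenshots may contain sensitive content.
  fs.writeFileSync(file, bytes, { mode: 0o600 });
  return file;
}

// Collect a readable stream into a single UTF-8 string.
async function readAll(stream) {
  const chunks = [];
  for await (const chunk of stream) chunks.push(Buffer.from(chunk));
  return Buffer.concat(chunks).toString('utf8');
}

// Only read stdin when explicitly asked to (hypothetical flag).
if (process.argv.includes('--from-stdin')) {
  readAll(process.stdin)
    .then((text) => console.log(saveInlineImage(text.trim())))
    .catch((err) => {
      console.error(err.message);
      process.exitCode = 1;
    });
}
```

Run as `node /tmp/save-inline-image.js --from-stdin` with the data URL supplied on stdin, the script prints the path of the written file, which can then go into the request's `images[].path`. Unlike the inline `node -e "...'<base64>'..."` form, the payload is not part of the process arguments.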

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "opencode-vision",
-  "version": "0.2.0",
+  "version": "0.2.1",
   "description": "Typed visual-judgment skill for opencode. Registers 10 vision subagents (one per top-tier vision model across OpenAI, Kimi for Coding, Ollama Cloud, and opencode-go) and a skill that teaches a text-only orchestrator to extract visual-judgment intent, classify it into a typed judgment, and delegate to a vision subagent with a versioned request/report contract.",
   "type": "module",
   "main": "./dist/index.js",