opencode-vision 0.1.0

package/README.md ADDED
@@ -0,0 +1,165 @@
1
+ # vision — Visual Judgment Skill for opencode
2
+
3
+ A typed visual-judgment contract for text-only orchestrators (GLM 5.2).
4
+ The orchestrator captures a screenshot via a browser/computer-use MCP,
5
+ extracts the visual-judgment intent, assembles a versioned request JSON,
6
+ delegates to a vision subagent, and parses a typed report.
7
+
8
+ ## What it gives you
9
+
10
+ - **10 vision subagents** registered programmatically at init — one per
11
+ top-tier vision model across OpenAI, Kimi for Coding, Ollama Cloud, and
12
+ opencode-go.
13
+ - **A stable typed contract** — two versioned JSON Schemas
14
+ (`visual-judgment-request.v1` / `visual-judgment-report.v1`) replace the
15
+ old "design your own schema" free-for-all.
16
+ - **Per-session model selection** — the skill asks the user once which
17
+ vision model to use, then reuses it for the rest of the session.
18
+ - **10 judgment types** — `presence`, `absence`, `alignment`, `ordering`,
19
+ `equality`, `layout`, `readability`, `state`, `diff`, `describe`.
20
+ - **MCP integration** — works with chrome-devtools, Playwright, and
21
+ cua-driver screenshots. Uses the a11y/AX tree when it answers the
22
+ question; delegates to a vision subagent only when pixels matter.
23
+
24
+ ## Install
25
+
26
+ Add the plugin to your `~/.config/opencode/opencode.json`:
27
+
28
+ ```json
29
+ {
30
+ "$schema": "https://opencode.ai/config.json",
31
+ "plugin": [
32
+ "opencode-vision"
33
+ ],
34
+ "skills": {
35
+ "paths": [
36
+ "~/.cache/opencode/node_modules/opencode-vision"
37
+ ]
38
+ }
39
+ }
40
+ ```
41
+
42
+ opencode auto-installs the npm package via Bun on next launch — no separate
43
+ `npm install` step needed. The skill ships inside the package (in `SKILL.md`),
44
+ so point `skills.paths` at the installed package location so opencode's skill
45
+ loader can find it.
46
+
47
+ The old `~/.config/opencode/agents/visual-judge.md` subagent is obsolete:
48
+ this plugin replaces it with 10 typed `vision-*` subagents. Delete the old
49
+ file if present:
50
+
51
+ ```bash
52
+ rm -f ~/.config/opencode/agents/visual-judge.md
53
+ ```
54
+
55
+ Restart opencode for the config to take effect.
56
+
57
+ > **Why `skills.paths` points at the installed package:** opencode's plugin
58
+ > loader resolves the npm package to its `dist/index.js` entrypoint and
59
+ > runs the `config(cfg)` hook that registers the 10 subagents. But opencode's
60
+ > *skill* loader scans directories for `SKILL.md` — it does not look inside
61
+ > npm packages automatically. So we point `skills.paths` at the installed
62
+ > package directory, where `SKILL.md` ships as a published file. The path
63
+ > above (`~/.cache/opencode/node_modules/opencode-vision`) is where Bun
64
+ > caches opencode plugins; adjust if your cache lives elsewhere.
65
+
66
+ ## Verify
67
+
68
+ ```bash
69
+ opencode debug agent vision-openai-gpt-5.5
70
+ ```
71
+
72
+ The output should show the registered subagent with `model: openai/gpt-5.5`
73
+ and `mode: subagent`.
74
+
75
+ To check all 10:
76
+
77
+ ```bash
78
+ opencode debug agent vision-openai-gpt-5.5
79
+ opencode debug agent vision-kimi-for-coding-k2p7
80
+ opencode debug agent vision-ollama-cloud-gemini-3-flash-preview
81
+ opencode debug agent vision-ollama-cloud-gemma4-31b
82
+ opencode debug agent vision-ollama-cloud-minimax-m3
83
+ opencode debug agent vision-ollama-cloud-qwen3.5-397b
84
+ opencode debug agent vision-opencode-go-kimi-k2.7-code
85
+ opencode debug agent vision-opencode-go-minimax-m3
86
+ opencode debug agent vision-opencode-go-qwen3.7-plus
87
+ opencode debug agent vision-opencode-go-mimo-v2.5
88
+ ```
89
+
90
+ ## Smoke test
91
+
92
+ Ask the orchestrator something visual:
93
+
94
+ > Visually verify the screenshot at /tmp/foo.png shows a centered button.
95
+
96
+ The orchestrator should:
97
+ 1. Detect the visual-judgment intent.
98
+ 2. Ask you (once) which vision model to use.
99
+ 3. Assemble a `visual-judgment-request.v1` JSON with
100
+ `judgment.type: alignment`.
101
+ 4. Delegate to the chosen `vision-*` subagent.
102
+ 5. Parse the report and tell you pass/fail with the button's position.
103
+
104
+ ## File layout (source)
105
+
106
+ ```
107
+ opencode/vision/ # this sub-package, published as opencode-vision
108
+ package.json # npm package metadata; main -> dist/index.js
109
+ plugin.ts # source: registers 10 vision-* subagents via config(cfg)
110
+ dist/ # built on prepublishOnly (gitignored)
111
+ index.js # built bundle — the package entrypoint
112
+ vision-models.json # 10-entry manifest (one top-tier per provider × family)
113
+ subagent-body.md # shared subagent prompt template
114
+ SKILL.md # intent-capture protocol + per-session question + MCP integration
115
+ schemas/
116
+ visual-judgment-request.v1.json
117
+ visual-judgment-report.v1.json
118
+ README.md # this file
119
+ ```
120
+
121
+ ## Build & publish (maintainers)
122
+
123
+ ```bash
124
+ cd opencode/vision
125
+ bun run build # builds dist/index.js
126
+ npm publish # runs prepublishOnly -> build -> publish
127
+ ```
128
+
129
+ The `files` field in `package.json` controls what ships: `dist/`,
130
+ `SKILL.md`, `schemas/`, `subagent-body.md`, `vision-models.json`,
131
+ `README.md`. Neither the `.ts` source nor `node_modules` is published.
132
+
133
+ ## Catalog (10 models, 4 providers)
134
+
135
+ Curation rule: one top-tier model per provider × vendor family; drop
136
+ non-reasoning, drop superseded within a provider, drop coding-specialized,
137
+ drop Pro/billing variants of the same family; keep cross-provider
138
+ duplicates.
139
+
140
+ | Provider | Model | Family |
141
+ |---|---|---|
142
+ | openai | gpt-5.5 | GPT-5.5 |
143
+ | kimi-for-coding | k2p7 | Kimi K2.7 |
144
+ | ollama-cloud | gemini-3-flash-preview | Gemini |
145
+ | ollama-cloud | gemma4:31b | Gemma |
146
+ | ollama-cloud | minimax-m3 | MiniMax |
147
+ | ollama-cloud | qwen3.5:397b | Qwen 3.5 |
148
+ | opencode-go | kimi-k2.7-code | Kimi K2.7 (cross-provider route) |
149
+ | opencode-go | minimax-m3 | MiniMax (cross-provider route) |
150
+ | opencode-go | qwen3.7-plus | Qwen 3.7 |
151
+ | opencode-go | mimo-v2.5 | MiMo |
152
+
153
+ To add a model: add one entry to the `models` array in `vision-models.json` and restart opencode.
154
+ The plugin re-reads the manifest at init.
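As a rough sketch of what a new entry might look like: the built plugin (`dist/index.js`) reads `provider`, `model_id`, and `name` from each item of `manifest.models` and derives the subagent name from the first two. The `acme` entry below is hypothetical, and the real manifest may carry fields that are not shown here.

```typescript
// Fields the built plugin reads from each manifest entry.
interface VisionModelEntry {
  provider: string;
  model_id: string;
  name: string;
}

// Same naming rule as dist/index.js: "/" and ":" in the model id become "-".
function subagentName(entry: VisionModelEntry): string {
  return "vision-" + entry.provider + "-" + entry.model_id.replace(/[/:]/g, "-");
}

// Hypothetical new entry; "acme" and "see:7b" are made-up values.
const newEntry: VisionModelEntry = {
  provider: "acme",
  model_id: "see:7b",
  name: "Acme See 7B",
};

console.log(subagentName(newEntry)); // vision-acme-see-7b
```

After a restart, `opencode debug agent vision-acme-see-7b` would be the matching check, assuming the provider itself is configured in opencode.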
155
+
156
+ ## Schemas
157
+
158
+ Published via GitHub raw URLs (branch `main`):
159
+
160
+ - Request: `https://raw.githubusercontent.com/WeZZard/skills/main/opencode/vision/schemas/visual-judgment-request.v1.json`
161
+ - Report: `https://raw.githubusercontent.com/WeZZard/skills/main/opencode/vision/schemas/visual-judgment-report.v1.json`
162
+
163
+ The files also live in this repo under `opencode/vision/schemas/` for
164
+ editing. Each URL is the canonical `$id`/`$schema` reference used by
165
+ `SKILL.md` and the subagent body.
package/SKILL.md ADDED
@@ -0,0 +1,396 @@
1
+ ---
2
+ name: vision
3
+ description: >-
4
+ Use when you must verify, check, or evaluate what is visually rendered in
5
+ one or more images — e.g. "visually verify the screenshot shows a centered
6
+ button", "check the icon is visible", "does the layout match the design",
7
+ acceptance criteria mentioning on-screen state. Captures visual-judgment
8
+ intent from user prompts or MCP task outputs, classifies it into a typed
9
+ judgment, asks the user once per session which vision model to use,
10
+ assembles a versioned request, delegates to a vision subagent, and parses
11
+ the typed report. Requires locally-stored image files (cua-driver
12
+ screenshots, Playwright/chrome-devtools captures, user-provided paths).
13
+ ---
14
+
15
+ # Vision — Visual Judgment Skill
16
+
17
+ You are a text-only orchestrator (GLM 5.2). You cannot see images. When a
18
+ task requires visual verification, you delegate to a vision subagent that
19
+ returns a typed report. This skill defines the extraction pipeline:
20
+ **Detect → Classify → Assemble → Pick model → Delegate → Parse**.
21
+
22
+ ## Why this skill exists
23
+
24
+ You are text-only (`attachment: false`). You cannot verify visual properties
25
+ yourself — alignment, color, readability, layout. A vision subagent can.
26
+ This skill gives you a stable contract for talking to one.
27
+
28
+ ## The two schemas
29
+
30
+ - **Request** (what you emit, passed as the `task` prompt):
31
+ https://raw.githubusercontent.com/WeZZard/skills/main/opencode/vision/schemas/visual-judgment-request.v1.json
32
+ - **Report** (what the subagent returns):
33
+ https://raw.githubusercontent.com/WeZZard/skills/main/opencode/vision/schemas/visual-judgment-report.v1.json
34
+
35
+ ## Step 1. Detect
36
+
37
+ Visual-judgment intent arrives from two sources. Recognize both.
38
+
39
+ ### Source A — explicit visual-judgment language in a user prompt
40
+
41
+ Trigger lexicon (any of these suggests visual judgment):
42
+ - "visually verify", "visually check", "screenshot shows"
43
+ - "looks right", "looks wrong", "looks broken"
44
+ - "centered", "aligned", "overlapping", "misaligned"
45
+ - "visible", "hidden", "not showing"
46
+ - "readable", "legible", "too small", "low contrast"
47
+ - "on" / "off" / "checked" / "disabled" (for a control's visual state)
48
+ - "matches the design", "matches the mockup"
49
+ - acceptance criteria mentioning on-screen state
50
+ - a user-provided image path (e.g. `/tmp/foo.png`)
51
+
52
+ If the user's request contains image-attachment references or a path to a
52
+ screenshot or other image file, that is also a trigger.
54
+
55
+ ### Source B — a gap between an MCP task output and a visual criterion
56
+
57
+ A `browser-use-*` or `computer-use-cua` subagent returned a screenshot
58
+ path plus a text description of what is on screen. But the user's
59
+ criterion is visual (positional, color, readability, layout) and the text
60
+ description cannot fully prove it. You recognize the gap and extract a
61
+ visual-judgment intent from the combination of user criterion + MCP output.
62
+
63
+ **Example**: user says "log into the app and check the dashboard looks
64
+ right." You spawn `browser-use-chrome-devtools` to navigate + screenshot.
65
+ It returns `/tmp/dashboard.png` + "sidebar with nav items, bar chart,
66
+ welcome header." The text describes structure, but "looks right" is a
67
+ visual layout quality the text can't fully prove → you detect a
68
+ visual-judgment need.
69
+
70
+ ## Step 2. Classify
71
+
72
+ Map the NL task to one of the 10 closed `judgment.type` values. Each has
73
+ typed `parameters`.
74
+
75
+ | Type | When to use | Typed parameters |
76
+ |---|---|---|
77
+ | `presence` | Is X visible on screen? | `subject`, `expectation: present\|absent` |
78
+ | `absence` | Is X NOT visible? (dual of presence) | `subject`, `expectation: absent` |
79
+ | `alignment` | Is X centered, left-aligned, or top-aligned along an axis? | `subject`, `axis`, `expectation`, `tolerance` |
80
+ | `ordering` | Are items in expected left-to-right or top-to-bottom order? | `direction: ltr\|ttb`, `expected[]` |
81
+ | `equality` | Do two images render the same thing? | `subjects[2]`, `threshold: exact\|perceptual` |
82
+ | `layout` | Open-ended structural check (arrangement, spacing) | `expectations` (NL) |
83
+ | `readability` | Is text legible? (contrast, size) | `subject` |
84
+ | `state` | Is a control in a given state? (toggle, checkbox) | `subject`, `expectedState` |
85
+ | `diff` | What changed between two screenshots? | `baseline`, `current` (image labels) |
86
+ | `describe` | Open-ended description of what's on screen | `focus` |
87
+
88
+ Worked examples (one per type) are in the appendix at the bottom of this
89
+ file. When in doubt, pick the most specific type that fits; fall back to
90
+ `describe` if nothing fits.
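One way to picture the closed set is as a TypeScript discriminated union. This is only a sketch transcribed from the table above: value sets the table does not spell out (`axis`, `expectation` for alignment, `tolerance`, `expectedState`) are left as plain strings, and treating every parameter as required is an assumption, since the request schema is the authority. The `summarize` helper is hypothetical.

```typescript
// Typed parameters per judgment type, transcribed from the Step 2 table.
type Judgment =
  | { type: "presence"; parameters: { subject: string; expectation: "present" | "absent" } }
  | { type: "absence"; parameters: { subject: string; expectation: "absent" } }
  | { type: "alignment"; parameters: { subject: string; axis: string; expectation: string; tolerance: string } }
  | { type: "ordering"; parameters: { direction: "ltr" | "ttb"; expected: string[] } }
  | { type: "equality"; parameters: { subjects: [string, string]; threshold: "exact" | "perceptual" } }
  | { type: "layout"; parameters: { expectations: string } }
  | { type: "readability"; parameters: { subject: string } }
  | { type: "state"; parameters: { subject: string; expectedState: string } }
  | { type: "diff"; parameters: { baseline: string; current: string } }
  | { type: "describe"; parameters: { focus: string } };

// Hypothetical helper: a one-line summary, e.g. for the task description.
function summarize(j: Judgment): string {
  switch (j.type) {
    case "ordering":
      return `ordering (${j.parameters.direction}): ${j.parameters.expected.join(" > ")}`;
    case "equality":
      return `equality: ${j.parameters.subjects.join(" vs ")}`;
    case "diff":
      return `diff: ${j.parameters.baseline} -> ${j.parameters.current}`;
    case "layout":
      return `layout: ${j.parameters.expectations}`;
    case "describe":
      return `describe: ${j.parameters.focus}`;
    default:
      // presence, absence, alignment, readability, and state all carry a subject.
      return `${j.type}: ${j.parameters.subject}`;
  }
}

console.log(summarize({ type: "state", parameters: { subject: "notifications toggle", expectedState: "on" } }));
```

With a union like this, the compiler rejects a `presence` judgment that carries `axis`, which mirrors the "most specific type that fits" rule.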
91
+
92
+ ## Step 3. Assemble
93
+
94
+ Construct the `visual-judgment-request.v1` JSON object.
95
+
96
+ ### 3a. Gather image paths
97
+
98
+ Image paths come from:
99
+
100
+ | Source | How to get the path |
101
+ |---|---|
102
+ | User-provided | Use the path the user gave (e.g. `/tmp/foo.png`). |
103
+ | chrome-devtools MCP | `chrome-devtools_take_screenshot({ filePath: "/tmp/shot.png" })` — saves PNG to disk. |
104
+ | Playwright MCP | `playwright_browser_take_screenshot({ filename: "shot.png" })` — saves to the configured output directory. |
105
+ | cua-driver MCP | `cua-driver_get_window_state({ pid, window_id, screenshot_out_file: "/tmp/win.png" })` — saves window screenshot to disk. Also returns the AX tree as text. |
106
+ | Browser-use subagent output | The subagent returns the path in its text response; extract it. |
107
+
108
+ For each image, assign a `label` (short, used in `observations[].imageLabel`)
109
+ and a `role` (`baseline` = before/reference, `current` = the thing under
110
+ test, `reference` = design target).
111
+
112
+ ### 3b. Dual-track: a11y tree vs. visual judgment
113
+
114
+ Before delegating, check whether the text tree already answers the
115
+ question. All three MCPs (chrome-devtools, Playwright, cua-driver) return
116
+ an accessibility/AX tree alongside the screenshot. You can read that text
117
+ directly — no vision call needed.
118
+
119
+ | Criterion | Source | Delegate to vision? |
120
+ |---|---|---|
121
+ | "Button exists" | a11y tree (element present) | No |
122
+ | "Button is enabled/disabled" | a11y tree (`AXEnabled`) | No |
123
+ | "Button text says 'Submit'" | a11y tree (`AXTitle`/`AXValue`) | No |
124
+ | "Button is centered" | Screenshot (positional) | **Yes** |
125
+ | "Text is readable" | Screenshot (contrast/size) | **Yes** |
126
+ | "Toggle is blue" | Screenshot (color) | **Yes** |
127
+ | "Layout matches design" | Screenshot (structural) | **Yes** |
128
+ | "Two screenshots are identical" | Screenshot pair | **Yes** |
129
+
130
+ Use the cheap text source first. Only pay for a vision call when the text
131
+ tree cannot answer.
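The routing rule in the table above can be sketched as a small predicate. The category names are illustrative and not part of either schema; falling back to vision when no tree is available is an assumption that follows from "only when the text tree cannot answer".

```typescript
// Kinds of evidence a criterion may need (hypothetical labels).
type CriterionKind =
  | "existence" | "enabled-state" | "text-content" // answerable from the a11y tree
  | "position" | "contrast" | "color" | "structure" | "image-pair"; // need pixels

const TEXT_TREE_KINDS: ReadonlySet<CriterionKind> = new Set<CriterionKind>([
  "existence",
  "enabled-state",
  "text-content",
]);

// Returns true when a vision call is needed.
function needsVision(kind: CriterionKind, hasA11yTree: boolean): boolean {
  // Without a tree there is no cheap text source, so fall through to vision.
  if (!hasA11yTree) return true;
  return !TEXT_TREE_KINDS.has(kind);
}

console.log(needsVision("enabled-state", true)); // false
console.log(needsVision("position", true)); // true
```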
132
+
133
+ ### 3c. Fill typed parameters + NL criteria
134
+
135
+ Fill `judgment.parameters` per the type (Step 2 table). If the typed
136
+ parameters cannot fully express the nuance, add a free-form `criteria`
137
+ string as a fallback for the subagent. Also set `responseContract` if you
138
+ want something specific back beyond the fixed report envelope.
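A minimal sketch of the assembly step, assuming the field names used in the appendix examples (`$schema`, `id`, `preferredModel`, `images`, `judgment`, `criteria`, `responseContract`). The `buildRequest` helper and the "at least one image" check are illustrative and not part of the skill; the request schema decides what is actually required.

```typescript
const REQUEST_SCHEMA =
  "https://raw.githubusercontent.com/WeZZard/skills/main/opencode/vision/schemas/visual-judgment-request.v1.json";

interface ImageRef {
  path: string;
  label: string;
  role: "baseline" | "current" | "reference";
}

interface VisualJudgmentRequest {
  $schema: string;
  id: string;
  preferredModel: string;
  images: ImageRef[];
  judgment: { type: string; parameters: Record<string, unknown> };
  criteria?: string;
  responseContract?: string;
}

// Hypothetical helper that assembles one request object.
function buildRequest(
  id: string,
  preferredModel: string,
  images: ImageRef[],
  judgment: VisualJudgmentRequest["judgment"],
  extras: { criteria?: string; responseContract?: string } = {},
): VisualJudgmentRequest {
  // Assumption: a judgment without any image makes no sense.
  if (images.length === 0) throw new Error("at least one image is required");
  const request: VisualJudgmentRequest = { $schema: REQUEST_SCHEMA, id, preferredModel, images, judgment };
  // Optional fields are set only when provided, so they never appear as undefined.
  if (extras.criteria) request.criteria = extras.criteria;
  if (extras.responseContract) request.responseContract = extras.responseContract;
  return request;
}

const request = buildRequest(
  "vj-003",
  "openai/gpt-5.5",
  [{ path: "/tmp/header.png", label: "header", role: "current" }],
  { type: "alignment", parameters: { subject: "logo", axis: "horizontal", expectation: "centered", tolerance: "loose" } },
  { criteria: "Logo should be roughly centered in the header band." },
);
console.log(JSON.stringify(request, null, 2));
```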
139
+
140
+ ### 3d. Edge case — MCP output has no screenshot path
141
+
142
+ If a browser-use subagent returned only text (no path) but a visual
143
+ judgment is still needed, capture a screenshot yourself by driving the
144
+ MCP directly (see 3a table), or re-task the subagent with explicit
145
+ screenshot-save instructions.
146
+
147
+ ### 3e. Edge case — built-in computer-use MCP
148
+
149
+ The built-in Claude Code `computer-use` MCP returns screenshots as inline
150
+ base64 images, not file paths. You cannot see inline images (you are
151
+ text-only), and the vision subagent needs a file path to `read`. Prefer
152
+ `cua-driver` for desktop visual judgments — it has `screenshot_out_file`.
153
+
154
+ ## Step 4. Pick model (once per session)
155
+
156
+ opencode has no per-call model override and no LLM-set session variable.
157
+ The model choice is carried in your own context for the rest of the
158
+ session.
159
+
160
+ **On the first visual-judgment need in a session**, before delegating,
161
+ call the `question` tool once:
162
+
163
+ ```
164
+ question({
165
+ questions: [{
166
+ header: "Vision model",
167
+ question: "I found several models that support vision tasks. Which model would you prefer for visual judgments this session?",
168
+ options: [
169
+ { label: "openai/gpt-5.5", description: "Highest accuracy (Recommended)" },
170
+ { label: "kimi-for-coding/k2p7", description: "Kimi K2.7 Code" },
171
+ { label: "ollama-cloud/gemini-3-flash-preview", description: "Gemini 3 Flash, 1M context" },
172
+ { label: "ollama-cloud/gemma4:31b", description: "Gemma 4 31B" },
173
+ { label: "ollama-cloud/minimax-m3", description: "MiniMax M3" },
174
+ { label: "ollama-cloud/qwen3.5:397b", description: "Qwen 3.5 397B" },
175
+ { label: "opencode-go/kimi-k2.7-code", description: "Kimi K2.7 Code via opencode-go" },
176
+ { label: "opencode-go/minimax-m3", description: "MiniMax M3 via opencode-go" },
177
+ { label: "opencode-go/qwen3.7-plus", description: "Qwen 3.7 Plus, 1M context" },
178
+ { label: "opencode-go/mimo-v2.5", description: "MiMo V2.5, 1M context" }
179
+ ]
180
+ }]
181
+ })
182
+ ```
183
+
184
+ The tool auto-adds an "Other" option (type your own). After the user
185
+ answers:
186
+
187
+ - Map the answer to a `subagent_type` via the table below.
188
+ - Remember the choice for the rest of the session. Do not ask again.
189
+ Reuse the chosen model for all subsequent visual judgments in this
190
+ session.
191
+ - If the user picks "Other" and types a model id, map it to the closest
192
+ matching `vision-*` subagent from the table, or fall back to
193
+ `vision-openai-gpt-5.5` if no match.
194
+
195
+ ### `preferredModel → subagent_type` mapping table
196
+
197
+ ```
198
+ openai/gpt-5.5 -> vision-openai-gpt-5.5
199
+ kimi-for-coding/k2p7 -> vision-kimi-for-coding-k2p7
200
+ ollama-cloud/gemini-3-flash-preview -> vision-ollama-cloud-gemini-3-flash-preview
201
+ ollama-cloud/gemma4:31b -> vision-ollama-cloud-gemma4-31b
202
+ ollama-cloud/minimax-m3 -> vision-ollama-cloud-minimax-m3
203
+ ollama-cloud/qwen3.5:397b -> vision-ollama-cloud-qwen3.5-397b
204
+ opencode-go/kimi-k2.7-code -> vision-opencode-go-kimi-k2.7-code
205
+ opencode-go/minimax-m3 -> vision-opencode-go-minimax-m3
206
+ opencode-go/qwen3.7-plus -> vision-opencode-go-qwen3.7-plus
207
+ opencode-go/mimo-v2.5 -> vision-opencode-go-mimo-v2.5
208
+ ```
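Every row in the mapping table follows one rule: prefix `vision-` and replace `/` and `:` with `-`. A sketch of that rule with the documented fallback is shown below; it uses exact matching only, so the "closest matching" step for free-typed answers is left out, and `toSubagentType` is a hypothetical name.

```typescript
// Model ids from the mapping table.
const KNOWN_MODELS: readonly string[] = [
  "openai/gpt-5.5",
  "kimi-for-coding/k2p7",
  "ollama-cloud/gemini-3-flash-preview",
  "ollama-cloud/gemma4:31b",
  "ollama-cloud/minimax-m3",
  "ollama-cloud/qwen3.5:397b",
  "opencode-go/kimi-k2.7-code",
  "opencode-go/minimax-m3",
  "opencode-go/qwen3.7-plus",
  "opencode-go/mimo-v2.5",
];

const FALLBACK_SUBAGENT = "vision-openai-gpt-5.5";

// Maps a preferredModel answer to a subagent_type; unknown ids use the fallback.
function toSubagentType(preferredModel: string): string {
  const id = preferredModel.trim();
  if (!KNOWN_MODELS.includes(id)) return FALLBACK_SUBAGENT;
  return "vision-" + id.replace(/[/:]/g, "-");
}

console.log(toSubagentType("ollama-cloud/qwen3.5:397b")); // vision-ollama-cloud-qwen3.5-397b
```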
209
+
210
+ ## Step 5. Delegate
211
+
212
+ Spawn the subagent with the assembled request JSON as the `prompt`:
213
+
214
+ ```
215
+ task({
216
+ subagent_type: "<mapped subagent_type>",
217
+ description: "<short, e.g. 'Verify Submit button is centered'>",
218
+ prompt: <the full visual-judgment-request.v1 JSON object>
219
+ })
220
+ ```
221
+
222
+ ## Step 6. Parse
223
+
224
+ The subagent returns a `visual-judgment-report.v1` JSON object. Branch on
225
+ `status` and `verdict`:
226
+
227
+ - `status: "ok"` + `verdict: "pass"` → criterion met. Report success to
228
+ the user, citing `observations[]` as evidence.
229
+ - `status: "ok"` + `verdict: "fail"` → criterion not met. Report failure,
230
+ citing the specific `observations[]` (e.g. "button is 42px right of
231
+ center"). Include `reasoning`.
232
+ - `status: "ok"` + `verdict: "inconclusive"` → informational (for `diff`
233
+ and `describe`) or genuinely undeterminable. Surface `observations[]`
234
+ and `diff[]` directly to the user.
235
+ - `status: "error"` → the subagent could not analyze the image(s). Check
236
+ `errors[]` (codes: `file_not_found`, `unsupported_format`,
237
+ `model_unavailable`). If `model_unavailable`, retry with a different
238
+ model from the mapping table, or re-ask the user.
239
+ - `status: "insufficient-evidence"` → the subagent analyzed the image but
240
+ cannot reach a verdict. Report this honestly; do not invent a verdict.
241
+
242
+ Surface `observations[]` as citations so the user sees what the subagent
243
+ actually saw. Include `confidence` in your report to the user.
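The branching above might be sketched as follows. The `Report` type covers only the fields this step reads, and the `Outcome` shape and `interpret` name are hypothetical; the check for a malformed `ok` report reflects the report schema's rule that `verdict`, `confidence`, and `observations` are required when `status` is `ok`.

```typescript
interface Observation { imageLabel: string; subject: string; note?: string }

interface Report {
  id: string;
  status: "ok" | "error" | "insufficient-evidence";
  verdict?: "pass" | "fail" | "inconclusive";
  confidence?: number;
  observations?: Observation[];
  reasoning?: string;
  errors?: { imageLabel: string; code: string }[];
}

// Hypothetical result shape for the orchestrator's next action.
type Outcome =
  | { kind: "pass" | "fail" | "informational"; confidence: number; evidence: Observation[] }
  | { kind: "retry-other-model" }
  | { kind: "report-error"; codes: string[] }
  | { kind: "no-verdict" };

function interpret(report: Report): Outcome {
  if (report.status === "error") {
    const codes = (report.errors ?? []).map((e) => e.code);
    // model_unavailable is the one error code with a documented recovery path.
    return codes.includes("model_unavailable")
      ? { kind: "retry-other-model" }
      : { kind: "report-error", codes };
  }
  if (report.status === "insufficient-evidence") return { kind: "no-verdict" };
  // status is "ok": these three fields must be present.
  if (report.verdict === undefined || report.confidence === undefined || report.observations === undefined) {
    throw new Error("malformed report: status ok without verdict/confidence/observations");
  }
  const kind = report.verdict === "inconclusive" ? "informational" : report.verdict;
  return { kind, confidence: report.confidence, evidence: report.observations };
}

console.log(interpret({ id: "vj-001", status: "insufficient-evidence" }).kind); // no-verdict
```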
244
+
245
+ ## Two integration patterns
246
+
247
+ ### Pattern 1 — Direct (simple "screenshot + judge")
248
+
249
+ Use when the browser/desktop interaction is trivial (just navigate and
250
+ look). You drive the MCP directly, capture one screenshot, delegate one
251
+ judgment.
252
+
253
+ ```
254
+ You: chrome-devtools_navigate_page({ url: "http://localhost:3000" })
255
+ You: chrome-devtools_take_screenshot({ filePath: "/tmp/login.png" })
256
+ You: [assemble request with /tmp/login.png]
257
+ You: task({ subagent_type: "vision-openai-gpt-5.5", prompt: <request> })
258
+ ```
259
+
260
+ ### Pattern 2 — Two-phase (complex interaction, then judge)
261
+
262
+ Use when interaction is non-trivial (navigate, click, fill, navigate
263
+ again). You spawn a `browser-use-*` or `computer-use-cua` subagent to
264
+ perform the interaction and capture a screenshot. It returns the path.
265
+ You then delegate to a `vision-*` subagent.
266
+
267
+ ```
268
+ You: task({
269
+ subagent_type: "browser-use-chrome-devtools",
270
+ prompt: "Navigate to /login, fill credentials, click Submit, wait for
271
+ dashboard, take a screenshot to /tmp/dashboard.png. Return
272
+ the file path and a brief text description."
273
+ })
274
+ -> subagent returns: "/tmp/dashboard.png, sidebar + chart + header"
275
+ You: [assemble request with /tmp/dashboard.png, judgment.type=layout]
276
+ You: task({ subagent_type: "vision-openai-gpt-5.5", prompt: <request> })
277
+ ```
278
+
279
+ Separation of concerns: the browser subagent knows how to drive; the
280
+ vision subagent knows how to see.
281
+
282
+ ---
283
+
284
+ ## Appendix — worked examples per judgment type
285
+
286
+ ### presence — "is X visible?"
287
+ ```json
288
+ {
289
+ "$schema": "https://raw.githubusercontent.com/WeZZard/skills/main/opencode/vision/schemas/visual-judgment-request.v1.json",
290
+ "id": "vj-001",
291
+ "preferredModel": "openai/gpt-5.5",
292
+ "images": [{ "path": "/tmp/login.png", "label": "login-screen", "role": "current" }],
293
+ "judgment": { "type": "presence", "parameters": { "subject": "Submit button", "expectation": "present" } },
294
+ "criteria": "A clickable button labeled 'Submit' or equivalent, within the login form area.",
295
+ "responseContract": "Return pass/fail and note the button's position if found."
296
+ }
297
+ ```
298
+
299
+ ### absence — "is X NOT visible?"
300
+ ```json
301
+ {
302
+ "id": "vj-002", "preferredModel": "openai/gpt-5.5",
303
+ "images": [{ "path": "/tmp/post-logout.png", "label": "home", "role": "current" }],
304
+ "judgment": { "type": "absence", "parameters": { "subject": "error banner", "expectation": "absent" } },
305
+ "criteria": "No red/error banner at the top of the page or anywhere on screen."
306
+ }
307
+ ```
308
+
309
+ ### alignment — "is X centered on an axis?"
310
+ ```json
311
+ {
312
+ "id": "vj-003", "preferredModel": "openai/gpt-5.5",
313
+ "images": [{ "path": "/tmp/header.png", "label": "header", "role": "current" }],
314
+ "judgment": { "type": "alignment", "parameters": { "subject": "logo", "axis": "horizontal", "expectation": "centered", "tolerance": "loose" } },
315
+ "criteria": "Logo should be roughly centered in the header band, allowing minor off-center within ~5%."
316
+ }
317
+ ```
318
+
319
+ ### ordering — "are items in expected LTR/TTB order?"
320
+ ```json
321
+ {
322
+ "id": "vj-004", "preferredModel": "openai/gpt-5.5",
323
+ "images": [{ "path": "/tmp/nav.png", "label": "navbar", "role": "current" }],
324
+ "judgment": { "type": "ordering", "parameters": { "direction": "ltr", "expected": ["Home", "Products", "About", "Contact"] } },
325
+ "criteria": "Items read left-to-right in the specified order."
326
+ }
327
+ ```
328
+
329
+ ### equality — "do two images match?"
330
+ ```json
331
+ {
332
+ "id": "vj-005", "preferredModel": "openai/gpt-5.5",
333
+ "images": [
334
+ { "path": "/tmp/chart-v1.png", "label": "v1", "role": "baseline" },
335
+ { "path": "/tmp/chart-v2.png", "label": "v2", "role": "current" }
336
+ ],
337
+ "judgment": { "type": "equality", "parameters": { "subjects": ["v1", "v2"], "threshold": "perceptual" } },
338
+ "criteria": "Minor pixel-level anti-aliasing differences are acceptable; structural differences are not."
339
+ }
340
+ ```
341
+
342
+ ### layout — "does the structure match expectations?"
343
+ ```json
344
+ {
345
+ "id": "vj-006", "preferredModel": "openai/gpt-5.5",
346
+ "images": [{ "path": "/tmp/form.png", "label": "signup-form", "role": "current" }],
347
+ "judgment": { "type": "layout", "parameters": { "expectations": "Fields stacked vertically; equal vertical gaps; labels above inputs." } },
348
+ "criteria": "Email, Password, Confirm Password fields in that top-to-bottom order."
349
+ }
350
+ ```
351
+
352
+ ### readability — "is the text legible?"
353
+ ```json
354
+ {
355
+ "id": "vj-007", "preferredModel": "openai/gpt-5.5",
356
+ "images": [{ "path": "/tmp/page.png", "label": "page", "role": "current" }],
357
+ "judgment": { "type": "readability", "parameters": { "subject": "footer text" } },
358
+ "criteria": "Footer text should be readable at normal viewing distance; not blurry, not too small, sufficient contrast."
359
+ }
360
+ ```
361
+
362
+ ### state — "is the control in the expected state?"
363
+ ```json
364
+ {
365
+ "id": "vj-008", "preferredModel": "openai/gpt-5.5",
366
+ "images": [{ "path": "/tmp/settings.png", "label": "settings-panel", "role": "current" }],
367
+ "judgment": { "type": "state", "parameters": { "subject": "notifications toggle", "expectedState": "on" } },
368
+ "criteria": "Toggle knob should be on the right side with the accent color (blue)."
369
+ }
370
+ ```
371
+
372
+ ### diff — "what changed between two screenshots?"
373
+ ```json
374
+ {
375
+ "id": "vj-009", "preferredModel": "openai/gpt-5.5",
376
+ "images": [
377
+ { "path": "/tmp/before.png", "label": "before", "role": "baseline" },
378
+ { "path": "/tmp/after.png", "label": "after", "role": "current" }
379
+ ],
380
+ "judgment": { "type": "diff", "parameters": { "baseline": "before", "current": "after" } },
381
+ "criteria": "Report all visual differences: added/removed/changed elements, color shifts, position changes."
382
+ }
383
+ ```
384
+
385
+ ### describe — "what's on screen?"
386
+ ```json
387
+ {
388
+ "id": "vj-010", "preferredModel": "openai/gpt-5.5",
389
+ "images": [{ "path": "/tmp/screenshot.png", "label": "screen", "role": "current" }],
390
+ "judgment": { "type": "describe", "parameters": { "focus": "overall layout and primary UI elements" } },
391
+ "criteria": "Capture: app type, main regions, primary actions, color scheme."
392
+ }
393
+ ```
394
+
395
+ For `diff` and `describe`, expect `verdict: "inconclusive"` — these are
396
+ informational, not pass/fail. Use `diff[]` and `observations[]` directly.
package/dist/index.js ADDED
@@ -0,0 +1,44 @@
1
+ // plugin.ts
2
+ import { readFileSync, existsSync } from "node:fs";
3
+ import { fileURLToPath } from "node:url";
4
+ import { dirname, join } from "node:path";
5
+ var bundleDir = dirname(fileURLToPath(import.meta.url));
6
+ var candidateDirs = [bundleDir, join(bundleDir, "..")];
7
+ var dataDir = candidateDirs.find((d) => existsSync(join(d, "vision-models.json")) && existsSync(join(d, "subagent-body.md"))) ?? bundleDir;
8
+ var manifest = JSON.parse(readFileSync(join(dataDir, "vision-models.json"), "utf8"));
9
+ var bodyTpl = readFileSync(join(dataDir, "subagent-body.md"), "utf8");
10
+ var PERMISSION = {
11
+ edit: "deny",
12
+ read: "allow",
13
+ glob: "allow",
14
+ grep: "allow",
15
+ list: "allow",
16
+ external_directory: {
17
+ "/private/tmp/**": "allow",
18
+ "/private/var/folders/**": "allow"
19
+ }
20
+ };
21
+ function subagentName(entry) {
22
+ return "vision-" + entry.provider + "-" + entry.model_id.replace(/[/:]/g, "-");
23
+ }
24
+ var plugin = async () => ({
25
+ config: async (cfg) => {
26
+ cfg.agent ??= {};
27
+ for (const e of manifest.models) {
28
+ const name = subagentName(e);
29
+ cfg.agent[name] ??= {};
30
+ Object.assign(cfg.agent[name], {
31
+ description: `Visual judgment subagent (${e.name}). Consumes a visual-judgment-request.v1 JSON, analyzes images, emits a visual-judgment-report.v1 JSON. Not coupled to any screenshot tool or UI framework - works with any locally stored image.`,
32
+ mode: "subagent",
33
+ model: `${e.provider}/${e.model_id}`,
34
+ temperature: 0.1,
35
+ prompt: bodyTpl.replaceAll("{{model_name}}", e.name).replaceAll("{{provider}}", e.provider).replaceAll("{{model_id}}", e.model_id),
36
+ permission: PERMISSION
37
+ });
38
+ }
39
+ }
40
+ });
41
+ var plugin_default = { id: "vision", server: plugin };
42
+ export {
43
+ plugin_default as default
44
+ };
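One detail of the `config` hook worth illustrating is how it merges into an existing agent entry: `??=` keeps an entry the user already defined, and `Object.assign` then overwrites only the keys the plugin generates. The sketch below reproduces those two steps in isolation with a hard-coded entry in place of the manifest; the `Config` type, the `register` helper, and the user-set `color` key are hypothetical.

```typescript
type AgentConfig = Record<string, unknown>;

interface Config {
  agent?: Record<string, AgentConfig>;
}

// Same merge steps as the config hook, for a single generated entry.
function register(cfg: Config, name: string, generated: AgentConfig): void {
  cfg.agent ??= {};
  cfg.agent[name] ??= {};
  Object.assign(cfg.agent[name], generated);
}

// Hypothetical user config that already customizes one vision subagent.
const cfg: Config = {
  agent: { "vision-openai-gpt-5.5": { temperature: 0.7, color: "blue" } },
};

register(cfg, "vision-openai-gpt-5.5", {
  mode: "subagent",
  model: "openai/gpt-5.5",
  temperature: 0.1,
});

// temperature is overwritten with 0.1; the unrelated "color" key is kept.
console.log(cfg.agent?.["vision-openai-gpt-5.5"]);
```

In other words, under this merge a user cannot override `model`, `temperature`, `prompt`, or `permission` for a `vision-*` subagent from their own config, because the plugin-generated values win.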
package/package.json ADDED
@@ -0,0 +1,55 @@
1
+ {
2
+ "name": "opencode-vision",
3
+ "version": "0.1.0",
4
+ "description": "Typed visual-judgment skill for opencode. Registers 10 vision subagents (one per top-tier vision model across OpenAI, Kimi for Coding, Ollama Cloud, and opencode-go) and a skill that teaches a text-only orchestrator to extract visual-judgment intent, classify it into a typed judgment, and delegate to a vision subagent with a versioned request/report contract.",
5
+ "type": "module",
6
+ "main": "./dist/index.js",
7
+ "exports": {
8
+ ".": {
9
+ "import": "./dist/index.js"
10
+ }
11
+ },
12
+ "files": [
13
+ "dist",
14
+ "SKILL.md",
15
+ "schemas",
16
+ "subagent-body.md",
17
+ "vision-models.json",
18
+ "README.md"
19
+ ],
20
+ "scripts": {
21
+ "prebuild": "rm -rf dist",
22
+ "build": "bun build ./plugin.ts --outfile ./dist/index.js --target node --format esm --packages external",
23
+ "prepublishOnly": "bun run build",
24
+ "typecheck": "tsc --noEmit"
25
+ },
26
+ "keywords": [
27
+ "opencode",
28
+ "opencode-plugin",
29
+ "opencode-ai",
30
+ "vision",
31
+ "visual-judgment",
32
+ "image-analysis",
33
+ "screenshot",
34
+ "subagent",
35
+ "skill"
36
+ ],
37
+ "license": "MIT",
38
+ "repository": {
39
+ "type": "git",
40
+ "url": "git+https://github.com/WeZZard/skills.git",
41
+ "directory": "opencode/vision"
42
+ },
43
+ "homepage": "https://github.com/WeZZard/skills/tree/main/opencode/vision#readme",
44
+ "bugs": {
45
+ "url": "https://github.com/WeZZard/skills/issues"
46
+ },
47
+ "peerDependencies": {
48
+ "@opencode-ai/plugin": "^1.4.7"
49
+ },
50
+ "devDependencies": {
51
+ "@opencode-ai/plugin": "^1.4.7",
52
+ "@types/node": "^22.13.9",
53
+ "typescript": "^5.8.2"
54
+ }
55
+ }
package/schemas/visual-judgment-report.v1.json ADDED
@@ -0,0 +1,88 @@
1
+ {
2
+ "$schema": "https://json-schema.org/draft/2020-12/schema",
3
+ "$id": "https://raw.githubusercontent.com/WeZZard/skills/main/opencode/vision/schemas/visual-judgment-report.v1.json",
4
+ "title": "Visual Judgment Report v1",
5
+ "description": "Typed report envelope returned by a vision subagent after analyzing images against a visual-judgment-request.",
6
+ "type": "object",
7
+ "required": ["$schema", "id", "status"],
8
+ "additionalProperties": false,
9
+ "properties": {
10
+ "$schema": {
11
+ "const": "https://raw.githubusercontent.com/WeZZard/skills/main/opencode/vision/schemas/visual-judgment-report.v1.json",
12
+ "description": "Schema version identifier."
13
+ },
14
+ "id": {
15
+ "type": "string",
16
+ "description": "Correlation id — MUST echo the request id."
17
+ },
18
+ "status": {
19
+ "enum": ["ok", "error", "insufficient-evidence"],
20
+ "description": "ok = analysis succeeded; error = could not analyze (image corrupt, model unavailable); insufficient-evidence = analyzed but cannot reach a verdict."
21
+ },
22
+ "verdict": {
23
+ "enum": ["pass", "fail", "inconclusive"],
24
+ "description": "pass = criterion met; fail = criterion not met; inconclusive = informational (diff/describe) or genuinely undeterminable."
25
+ },
26
+ "confidence": {
27
+ "type": "number",
28
+ "minimum": 0,
29
+ "maximum": 1,
30
+ "description": "0.0–1.0. How confident the subagent is in the verdict."
31
+ },
32
+ "observations": {
33
+ "type": "array",
34
+ "description": "Typed observations about what was seen in each image. Structure varies per judgment.type by convention.",
35
+ "items": {
36
+ "type": "object",
37
+ "required": ["imageLabel", "subject"],
38
+ "additionalProperties": false,
39
+ "properties": {
40
+ "imageLabel": { "type": "string", "description": "Label of the image this observation is about (matches request images[].label)." },
41
+ "subject": { "type": "string", "description": "What this observation describes, e.g. 'Submit button', 'footer text'." },
42
+ "properties": {
43
+ "type": "object",
44
+ "additionalProperties": true,
45
+ "description": "Type-specific findings, e.g. {found:true,position:'bottom-center'} for presence, {centerOffsetPx:42} for alignment, {knobPosition:'right',trackColor:'blue'} for state."
46
+ },
47
+ "note": { "type": "string", "description": "Free-form clarifying note." }
48
+ }
49
+ }
50
+ },
51
+ "diff": {
52
+ "type": "array",
53
+ "description": "Structured change list (populated for judgment.type=diff).",
54
+ "items": {
55
+ "type": "object",
56
+ "additionalProperties": false,
57
+ "properties": {
58
+ "from": { "type": "string", "description": "What was there in the baseline." },
59
+ "to": { "type": "string", "description": "What is there in the current." },
60
+ "description": { "type": "string", "description": "Human-readable description of the change." }
61
+ }
62
+ }
63
+ },
64
+ "reasoning": {
65
+ "type": "string",
66
+ "description": "One-paragraph justification linking observations to verdict."
67
+ },
68
+ "errors": {
69
+ "type": "array",
70
+ "description": "Populated when status=error. One entry per image that could not be analyzed.",
71
+ "items": {
72
+ "type": "object",
73
+ "required": ["imageLabel", "code"],
74
+ "additionalProperties": false,
75
+ "properties": {
76
+ "imageLabel": { "type": "string" },
77
+ "code": { "type": "string", "description": "e.g. 'file_not_found', 'unsupported_format', 'model_unavailable'." }
78
+ }
79
+ }
80
+ }
81
+ },
82
+ "allOf": [
83
+ {
84
+ "if": { "properties": { "status": { "const": "ok" } } },
85
+ "then": { "required": ["verdict", "confidence", "observations"] }
86
+ }
87
+ ]
88
+ }
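The `allOf` / `if` / `then` clause above makes `verdict`, `confidence`, and `observations` required only when `status` is `"ok"`. The following is a minimal sketch of how a caller might mirror that rule, together with the id echo, in plain Python. It is not the package's own validation code and not a full JSON Schema validator; the function name `check_report` and the wording of the messages are hypothetical.

```python
# Hand-rolled checks for a visual-judgment-report.v1 object (illustrative only).
REPORT_SCHEMA_ID = (
    "https://raw.githubusercontent.com/WeZZard/skills/main/"
    "opencode/vision/schemas/visual-judgment-report.v1.json"
)
STATUSES = {"ok", "error", "insufficient-evidence"}
VERDICTS = {"pass", "fail", "inconclusive"}


def check_report(report: dict, request_id: str) -> list:
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    if report.get("$schema") != REPORT_SCHEMA_ID:
        problems.append("$schema does not match the report v1 id")
    if report.get("id") != request_id:
        problems.append("id does not echo the request id")

    status = report.get("status")
    if status not in STATUSES:
        problems.append(f"unknown status: {status!r}")
    if status == "ok":
        # Mirrors the schema's if/then: status=ok makes three fields required.
        for field in ("verdict", "confidence", "observations"):
            if field not in report:
                problems.append(f"status=ok requires {field}")

    if "verdict" in report and report["verdict"] not in VERDICTS:
        problems.append("verdict must be pass, fail, or inconclusive")
    if "confidence" in report:
        conf = report["confidence"]
        is_number = isinstance(conf, (int, float)) and not isinstance(conf, bool)
        if not (is_number and 0 <= conf <= 1):
            problems.append("confidence must be a number in [0, 1]")
    return problems
```

A report with `status` set to `"error"` passes these checks without a verdict, which matches the schema: the conditional requirement applies to `"ok"` only.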
@@ -0,0 +1,236 @@
1
+ {
2
+ "$schema": "https://json-schema.org/draft/2020-12/schema",
3
+ "$id": "https://raw.githubusercontent.com/WeZZard/skills/main/opencode/vision/schemas/visual-judgment-request.v1.json",
4
+ "title": "Visual Judgment Request v1",
5
+ "description": "Typed intent envelope emitted by the orchestrator and passed as the task prompt to a vision subagent.",
6
+ "type": "object",
7
+ "required": ["$schema", "id", "images", "judgment"],
8
+ "additionalProperties": false,
9
+ "properties": {
10
+ "$schema": {
11
+ "const": "https://raw.githubusercontent.com/WeZZard/skills/main/opencode/vision/schemas/visual-judgment-request.v1.json",
12
+ "description": "Schema version identifier."
13
+ },
14
+ "id": {
15
+ "type": "string",
16
+ "description": "Correlation id (uuid recommended). The report must echo this id."
17
+ },
18
+ "preferredModel": {
19
+ "type": "string",
20
+ "pattern": "^.+/.+$",
21
+ "description": "Provider/model-id of the vision model the orchestrator selected (informational; the subagent is already bound to a model via its config)."
22
+ },
23
+ "images": {
24
+ "type": "array",
25
+ "minItems": 1,
26
+ "description": "One or more locally-stored image files to analyze.",
27
+ "items": {
28
+ "type": "object",
29
+ "required": ["path"],
30
+ "additionalProperties": false,
31
+ "properties": {
32
+ "path": {
33
+ "type": "string",
34
+ "description": "Absolute or project-relative path to the image file (PNG/JPEG/WebP)."
35
+ },
36
+ "label": {
37
+ "type": "string",
38
+ "description": "Short human-readable label for the image, used in observations[].imageLabel."
39
+ },
40
+ "role": {
41
+ "enum": ["baseline", "current", "reference"],
42
+ "description": "Role of this image in the judgment: baseline (the before state), current (the thing under test), reference (the design or target to match)."
43
+ }
44
+ }
45
+ }
46
+ },
47
+ "judgment": {
48
+ "description": "The typed judgment to perform. Exactly one type applies; its parameters are type-specific.",
49
+ "oneOf": [
50
+ {
51
+ "type": "object",
52
+ "required": ["type", "parameters"],
53
+ "additionalProperties": false,
54
+ "properties": {
55
+ "type": { "const": "presence" },
56
+ "parameters": {
57
+ "type": "object",
58
+ "required": ["subject", "expectation"],
59
+ "additionalProperties": false,
60
+ "properties": {
61
+ "subject": { "type": "string", "description": "The element/text/icon to find, e.g. 'Submit button'." },
62
+ "expectation": { "enum": ["present", "absent"], "description": "Whether the subject should be visible." }
63
+ }
64
+ }
65
+ }
66
+ },
67
+ {
68
+ "type": "object",
69
+ "required": ["type", "parameters"],
70
+ "additionalProperties": false,
71
+ "properties": {
72
+ "type": { "const": "absence" },
73
+ "parameters": {
74
+ "type": "object",
75
+ "required": ["subject", "expectation"],
76
+ "additionalProperties": false,
77
+ "properties": {
78
+ "subject": { "type": "string", "description": "The element that should NOT be visible." },
79
+ "expectation": { "const": "absent", "description": "Must be 'absent' for this judgment type." }
80
+ }
81
+ }
82
+ }
83
+ },
84
+ {
85
+ "type": "object",
86
+ "required": ["type", "parameters"],
87
+ "additionalProperties": false,
88
+ "properties": {
89
+ "type": { "const": "alignment" },
90
+ "parameters": {
91
+ "type": "object",
92
+ "required": ["subject", "axis"],
93
+ "additionalProperties": false,
94
+ "properties": {
95
+ "subject": { "type": "string" },
96
+ "axis": { "enum": ["horizontal", "vertical", "both"] },
97
+ "expectation": { "type": "string", "description": "e.g. 'centered', 'left-aligned', 'top'." },
98
+ "tolerance": { "enum": ["strict", "loose"], "description": "strict ≈ exact; loose ≈ within ~5%." }
99
+ }
100
+ }
101
+ }
102
+ },
103
+ {
104
+ "type": "object",
105
+ "required": ["type", "parameters"],
106
+ "additionalProperties": false,
107
+ "properties": {
108
+ "type": { "const": "ordering" },
109
+ "parameters": {
110
+ "type": "object",
111
+ "required": ["direction", "expected"],
112
+ "additionalProperties": false,
113
+ "properties": {
114
+ "direction": { "enum": ["ltr", "ttb"], "description": "Left-to-right or top-to-bottom." },
115
+ "expected": { "type": "array", "items": { "type": "string" }, "description": "Expected order of subjects." }
116
+ }
117
+ }
118
+ }
119
+ },
120
+ {
121
+ "type": "object",
122
+ "required": ["type", "parameters"],
123
+ "additionalProperties": false,
124
+ "properties": {
125
+ "type": { "const": "equality" },
126
+ "parameters": {
127
+ "type": "object",
128
+ "required": ["subjects"],
129
+ "additionalProperties": false,
130
+ "properties": {
131
+ "subjects": {
132
+ "type": "array",
133
+ "minItems": 2,
134
+ "maxItems": 2,
135
+ "items": { "type": "string" },
136
+ "description": "Labels of the two images to compare."
137
+ },
138
+ "threshold": { "enum": ["exact", "perceptual"], "description": "exact = pixel-identical; perceptual = structurally same." }
139
+ }
140
+ }
141
+ }
142
+ },
143
+ {
144
+ "type": "object",
145
+ "required": ["type", "parameters"],
146
+ "additionalProperties": false,
147
+ "properties": {
148
+ "type": { "const": "layout" },
149
+ "parameters": {
150
+ "type": "object",
151
+ "required": ["expectations"],
152
+ "additionalProperties": false,
153
+ "properties": {
154
+ "expectations": { "type": "string", "description": "NL description of expected layout, e.g. 'sidebar on left, main content right, equal gaps'." }
155
+ }
156
+ }
157
+ }
158
+ },
159
+ {
160
+ "type": "object",
161
+ "required": ["type", "parameters"],
162
+ "additionalProperties": false,
163
+ "properties": {
164
+ "type": { "const": "readability" },
165
+ "parameters": {
166
+ "type": "object",
167
+ "required": ["subject"],
168
+ "additionalProperties": false,
169
+ "properties": {
170
+ "subject": { "type": "string", "description": "The text/region to check for legibility." }
171
+ }
172
+ }
173
+ }
174
+ },
175
+ {
176
+ "type": "object",
177
+ "required": ["type", "parameters"],
178
+ "additionalProperties": false,
179
+ "properties": {
180
+ "type": { "const": "state" },
181
+ "parameters": {
182
+ "type": "object",
183
+ "required": ["subject", "expectedState"],
184
+ "additionalProperties": false,
185
+ "properties": {
186
+ "subject": { "type": "string", "description": "The control, e.g. 'notifications toggle'." },
187
+ "expectedState": { "type": "string", "description": "e.g. 'on', 'off', 'checked', 'disabled'." }
188
+ }
189
+ }
190
+ }
191
+ },
192
+ {
193
+ "type": "object",
194
+ "required": ["type", "parameters"],
195
+ "additionalProperties": false,
196
+ "properties": {
197
+ "type": { "const": "diff" },
198
+ "parameters": {
199
+ "type": "object",
200
+ "required": ["baseline", "current"],
201
+ "additionalProperties": false,
202
+ "properties": {
203
+ "baseline": { "type": "string", "description": "Label of the before image." },
204
+ "current": { "type": "string", "description": "Label of the after image." }
205
+ }
206
+ }
207
+ }
208
+ },
209
+ {
210
+ "type": "object",
211
+ "required": ["type", "parameters"],
212
+ "additionalProperties": false,
213
+ "properties": {
214
+ "type": { "const": "describe" },
215
+ "parameters": {
216
+ "type": "object",
217
+ "required": ["focus"],
218
+ "additionalProperties": false,
219
+ "properties": {
220
+ "focus": { "type": "string", "description": "What to describe, e.g. 'overall layout and primary UI elements'." }
221
+ }
222
+ }
223
+ }
224
+ }
225
+ ]
226
+ },
227
+ "criteria": {
228
+ "type": "string",
229
+ "description": "Natural-language fallback for nuance the typed parameters cannot capture. Free-form guidance for the subagent."
230
+ },
231
+ "responseContract": {
232
+ "type": "string",
233
+ "description": "What the caller wants back, beyond the fixed report envelope. e.g. 'Return pass/fail and note the button position if found.'"
234
+ }
235
+ }
236
+ }
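As an illustration of the request envelope defined above, here is a minimal sketch that assembles a `presence` request in Python. The helper name `build_presence_request`, its default arguments, and the example image path are hypothetical; only the field names, enum values, and schema id come from the schema itself.

```python
import json
import uuid

REQUEST_SCHEMA_ID = (
    "https://raw.githubusercontent.com/WeZZard/skills/main/"
    "opencode/vision/schemas/visual-judgment-request.v1.json"
)


def build_presence_request(image_path, subject, expectation="present", label="current"):
    """Assemble a request dict for judgment.type = "presence"."""
    if expectation not in ("present", "absent"):
        raise ValueError("expectation must be 'present' or 'absent'")
    return {
        "$schema": REQUEST_SCHEMA_ID,
        # The schema recommends a uuid; the report must echo this id.
        "id": str(uuid.uuid4()),
        "images": [{"path": image_path, "label": label, "role": "current"}],
        "judgment": {
            "type": "presence",
            "parameters": {"subject": subject, "expectation": expectation},
        },
    }


# Example: serialize the request so it can be passed as the subagent's prompt.
request = build_presence_request("shots/login.png", "Submit button")
prompt_text = json.dumps(request, indent=2)
```

The optional `criteria` and `responseContract` fields are left out here; a caller could add them as plain strings when the typed parameters do not capture the full intent.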
@@ -0,0 +1,45 @@
1
+ You are a visual judgment subagent powered by {{model_name}}
2
+ ({{provider}}/{{model_id}}).
3
+
4
+ ## Input
5
+
6
+ You receive a `visual-judgment-request.v1` JSON object as your prompt.
7
+ Read it, then read each image file listed in `images[].path` using the
8
+ `read` tool. Analyze them against `judgment.type` and `judgment.parameters`.
9
+
10
+ The request schema lives at:
11
+ https://raw.githubusercontent.com/WeZZard/skills/main/opencode/vision/schemas/visual-judgment-request.v1.json
12
+
13
+ ## Output
14
+
15
+ Emit a `visual-judgment-report.v1` JSON object — nothing else. No prose,
16
+ no markdown fences, no commentary. The envelope is fixed:
17
+
18
+ - `$schema`: "https://raw.githubusercontent.com/WeZZard/skills/main/opencode/vision/schemas/visual-judgment-report.v1.json"
19
+ - `id`: echo the request id
20
+ - `status`: "ok" | "error" | "insufficient-evidence"
21
+ - `verdict`: "pass" | "fail" | "inconclusive" (only when status="ok")
22
+ - `confidence`: 0.0-1.0 (required when status="ok")
23
+ - `observations[]`: typed per `judgment.type` (required when status="ok")
24
+ - `diff[]`: structured change list (for judgment.type="diff")
25
+ - `reasoning`: one-paragraph justification linking observations to verdict
26
+ - `errors[]`: if any image could not be analyzed
27
+
28
+ The report schema lives at:
29
+ https://raw.githubusercontent.com/WeZZard/skills/main/opencode/vision/schemas/visual-judgment-report.v1.json
30
+
31
+ ## Rules
32
+
33
+ - Report what you actually observe. Do not guess.
34
+ - Be specific: positions, colors, sizes, alignment, visibility, ordering.
35
+ - If a subject described in the request is not visible, say so in
36
+ `observations[].note`.
37
+ - If you cannot analyze an image (corrupted, wrong format, file not found),
38
+ set `status: "error"` and add an `errors[]` entry with the `imageLabel` and
39
+ a `code` (e.g. "file_not_found", "unsupported_format").
40
+ - For `diff` and `describe` judgments, set `verdict: "inconclusive"` —
41
+ these are informational, not pass/fail.
42
+ - Validate your output against the report schema at the URL above. If the
43
+ fetch fails, still emit a correctly shaped envelope.
44
+ - You MUST NOT spawn subagents. You are a leaf in the execution tree.
45
+ - You MUST NOT run the graph engine or any orchestrator-only command.
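The template above contains three placeholders: `{{model_name}}`, `{{provider}}`, and `{{model_id}}`. A minimal sketch of how they might be filled from a manifest entry follows. It assumes that `{{model_name}}` maps to the manifest's `name` field, which the diff does not state, and the function name `render_prompt` is hypothetical; the plugin's actual rendering code is not shown here.

```python
import re

# Matches {{identifier}} placeholders such as {{model_name}}.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def render_prompt(template: str, model: dict) -> str:
    """Fill the prompt template from one manifest entry."""
    values = {
        "model_name": model["name"],  # assumed mapping: name -> model_name
        "provider": model["provider"],
        "model_id": model["model_id"],
    }

    def substitute(match):
        key = match.group(1)
        if key not in values:
            # Fail loudly rather than leave a raw placeholder in the prompt.
            raise KeyError(f"unknown placeholder: {key}")
        return values[key]

    return PLACEHOLDER.sub(substitute, template)
```

Raising on an unknown placeholder is a design choice for this sketch: a silently unrendered `{{...}}` would otherwise reach the model as literal text.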
@@ -0,0 +1,15 @@
1
+ {
2
+ "_comment": "Vision subagent manifest. Curation rule: one top-tier model per provider and vendor-family pair. Drop non-reasoning models, models superseded within a provider, coding-specialized models, and Pro/billing variants of the same family. Keep cross-provider duplicates, because the same model via a different route has different billing, rate, and latency. To add a model, add one line here and restart opencode.",
3
+ "models": [
4
+ { "provider": "openai", "model_id": "gpt-5.5", "name": "GPT-5.5" },
5
+ { "provider": "kimi-for-coding", "model_id": "k2p7", "name": "Kimi K2.7 Code" },
6
+ { "provider": "ollama-cloud", "model_id": "gemini-3-flash-preview", "name": "Gemini 3 Flash" },
7
+ { "provider": "ollama-cloud", "model_id": "gemma4:31b", "name": "Gemma 4 31B" },
8
+ { "provider": "ollama-cloud", "model_id": "minimax-m3", "name": "MiniMax M3" },
9
+ { "provider": "ollama-cloud", "model_id": "qwen3.5:397b", "name": "Qwen 3.5 397B" },
10
+ { "provider": "opencode-go", "model_id": "kimi-k2.7-code", "name": "Kimi K2.7 Code" },
11
+ { "provider": "opencode-go", "model_id": "minimax-m3", "name": "MiniMax M3" },
12
+ { "provider": "opencode-go", "model_id": "qwen3.7-plus", "name": "Qwen 3.7 Plus" },
13
+ { "provider": "opencode-go", "model_id": "mimo-v2.5", "name": "MiMo V2.5" }
14
+ ]
15
+ }
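Each manifest entry pairs a `provider` with a `model_id`, and the request schema's `preferredModel` field expects a `provider/model-id` string matching `^.+/.+$`. The sketch below derives those strings from a manifest and rejects duplicates. The function name `model_refs` and the uniqueness check are assumptions made for illustration, not behavior confirmed by the package.

```python
import re

# Same pattern as preferredModel in visual-judgment-request.v1.
MODEL_REF = re.compile(r"^.+/.+$")


def model_refs(manifest: dict) -> list:
    """Return "provider/model_id" strings for every manifest entry."""
    refs = [f'{m["provider"]}/{m["model_id"]}' for m in manifest["models"]]
    for ref in refs:
        if not MODEL_REF.match(ref):
            raise ValueError(f"not a provider/model-id reference: {ref!r}")
    if len(set(refs)) != len(refs):
        # The same model may appear under two providers, but each
        # provider/model_id pair should be listed only once.
        raise ValueError("duplicate provider/model_id pair in manifest")
    return refs


# Example with two entries copied from the manifest above.
example = {
    "models": [
        {"provider": "openai", "model_id": "gpt-5.5", "name": "GPT-5.5"},
        {"provider": "ollama-cloud", "model_id": "gemma4:31b", "name": "Gemma 4 31B"},
    ]
}
refs = model_refs(example)
```

Cross-provider duplicates such as `minimax-m3` under both `ollama-cloud` and `opencode-go` stay distinct under this scheme, since the provider prefix differs.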