@agent-sh/computer-use-linux 0.2.4 → 0.2.5
- package/README.md +19 -7
- package/package.json +1 -1
package/README.md
@@ -45,11 +45,13 @@ MCP tools exposed by the server:
 - `list_windows` — compositor windows with title, app id, wm_class, focus state, client type (Wayland/X11), and bounds
 - `focused_window` — the window currently holding keyboard focus
 - `get_app_state` — combined screenshot + accessibility tree for a chosen app, with element indices that the input tools accept
-- `screenshot` — capture the screen as a PNG; can target a window, which is raised to the front and cropped to just that window
+- `screenshot` — capture the screen as a bounded PNG or JPEG image; can target a window, which is raised to the front and cropped to just that window
+
+Screenshot payloads are size-bounded by default before they are returned to the MCP host: max 1920 px width/height and 2 MiB image bytes, with hard caps even when callers request more. Agents that need more detail can pass `max_width`, `max_height`, `max_bytes`, `scale`, `format: "jpeg"`, or `quality`, preferably with a window target or crop. PNG remains the default; JPEG lets callers trade lossless pixels for a smaller payload before the byte cap forces further resizing. Returned screenshot metadata includes `coordinate_width`, `coordinate_height`, `scale`, `format`, and `quality` so callers can convert from a downscaled preview to desktop coordinate pixels.
 
 **Input**
-- `click` — by element index, semantic selector, or
-- `drag` —
+- `click` — by element index, semantic selector, or desktop coordinate pixels
+- `drag` — desktop coordinate drag (start / end)
 - `scroll` — page-based scroll on an element or at a pixel location
 - `press_key` — keys / chords; can focus a window or terminal first
 - `type_text` — literal text input, optionally targeted at a window or terminal
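The size bounding and preview-to-desktop conversion described in the screenshot paragraph above can be sketched in Python. This is a minimal illustration under stated assumptions, not the server's implementation: the 1920 px and 2 MiB defaults and the argument names (`max_width`, `format`, `quality`) come from the README, while the aspect-preserving fit, the shrink factor `step`, the injected `encode_size` callback, the optional window origin, and every helper name here are hypothetical. The request envelope follows the MCP JSON-RPC `tools/call` shape.

```python
import json
import math
from typing import Callable, Tuple

# Defaults quoted in the README: at most 1920 px per side and 2 MiB of image bytes.
DEFAULT_MAX_SIDE = 1920
DEFAULT_MAX_BYTES = 2 * 1024 * 1024


def fit_within(width: int, height: int,
               max_w: int = DEFAULT_MAX_SIDE,
               max_h: int = DEFAULT_MAX_SIDE) -> Tuple[int, int, float]:
    """Downscale (never upscale) so both sides fit, keeping the aspect ratio."""
    scale = min(1.0, max_w / width, max_h / height)
    new_w = min(max_w, max(1, round(width * scale)))
    new_h = min(max_h, max(1, round(height * scale)))
    return new_w, new_h, scale


def shrink_to_byte_cap(width: int, height: int,
                       encode_size: Callable[[int, int], int],
                       max_bytes: int = DEFAULT_MAX_BYTES,
                       step: float = 0.8) -> Tuple[int, int]:
    """Shrink by the hypothetical factor `step` until the encoded size fits.

    `encode_size(w, h)` stands in for "encode as PNG/JPEG and measure";
    real encoded sizes depend on image content, format, and quality.
    """
    w, h = width, height
    while encode_size(w, h) > max_bytes and (w > 1 or h > 1):
        w = max(1, math.floor(w * step))
        h = max(1, math.floor(h * step))
    return w, h


def preview_to_desktop(x: float, y: float,
                       image_w: int, image_h: int,
                       coordinate_width: int, coordinate_height: int,
                       origin_x: int = 0, origin_y: int = 0) -> Tuple[int, int]:
    """Map a pixel in the returned image to desktop coordinate pixels.

    origin_x/origin_y are an assumption for window-cropped captures
    (for example a window's bounds); leave them at 0 for full-screen shots.
    """
    return (origin_x + round(x * coordinate_width / image_w),
            origin_y + round(y * coordinate_height / image_h))


def screenshot_request(request_id: int, **arguments) -> str:
    """Build an MCP JSON-RPC `tools/call` request for the `screenshot` tool."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "screenshot", "arguments": arguments},
    })


if __name__ == "__main__":
    print(fit_within(3840, 2160))  # (1920, 1080, 0.5)
    print(shrink_to_byte_cap(1920, 1080, lambda w, h: w * h * 4))
    print(preview_to_desktop(960, 540, 1920, 1080, 3840, 2160))  # (1920, 1080)
    # Illustrative values; the README does not state the valid range of `quality`.
    print(screenshot_request(1, max_width=1280, format="jpeg", quality=70))
```

Mapping the centre of a 1920 x 1080 preview of a 3840 x 2160 desktop yields (1920, 1080), the kind of desktop coordinate pixels the `click` and `drag` tools accept. The sketch derives the ratio from the returned image size and `coordinate_width`/`coordinate_height` rather than from `scale`, because the README does not say in which direction `scale` is expressed.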
@@ -304,10 +306,16 @@ Most setups need none of these — `doctor` and the installers pick sensible def
 
 | Variable | Effect |
 | --- | --- |
-| `COMPUTER_USE_LINUX_COSMIC_HELPER` | Path to the `computer-use-linux-cosmic` helper when it isn't next to the binary or on `PATH
-| `CU_DISABLE_ABS_POINTER` | Disable the uinput absolute pointer and click through `ydotool` instead (for setups where the abs-pointer device misbehaves)
-| `COMPUTER_USE_LINUX_FORCE_PORTAL_POINTER` / `…_KEYBOARD` | Always route pointer / keyboard through the RemoteDesktop portal on Wayland, skipping auto-detection
-| `COMPUTER_USE_LINUX_FORCE_YDOTOOL_POINTER` / `…_KEYBOARD` | Always route pointer / keyboard through `ydotool`, skipping the portal and KDE clipboard paths
+| `COMPUTER_USE_LINUX_COSMIC_HELPER` | Path to the `computer-use-linux-cosmic` helper when it isn't next to the binary or on `PATH` (`CODEX_COMPUTER_USE_COSMIC_HELPER` is also accepted by embedded Codex builds). |
+| `CU_DISABLE_ABS_POINTER` | Disable the uinput absolute pointer and click through `ydotool` instead (for setups where the abs-pointer device misbehaves); embedded Codex builds may use `CODEX_COMPUTER_USE_DISABLE_ABS_POINTER`. |
+| `COMPUTER_USE_LINUX_FORCE_PORTAL_POINTER` / `…_KEYBOARD` | Always route pointer / keyboard through the RemoteDesktop portal on Wayland, skipping auto-detection; embedded Codex builds may use `CODEX_COMPUTER_USE_FORCE_PORTAL_POINTER` / `…_KEYBOARD`. |
+| `COMPUTER_USE_LINUX_FORCE_YDOTOOL_POINTER` / `…_KEYBOARD` | Always route pointer / keyboard through `ydotool`, skipping the portal and KDE clipboard paths; embedded Codex builds may use `CODEX_COMPUTER_USE_FORCE_YDOTOOL_POINTER` / `…_KEYBOARD`. |
+
+**Build-time identity overrides** (set while compiling a downstream embedded
+bundle): `CUL_GNOME_EXTENSION_UUID`, `CUL_DBUS_SERVICE`, and
+`CUL_DBUS_OBJECT_PATH` replace the default standalone GNOME Shell extension
+UUID and DBus endpoint in both the Rust probes and the generated extension
+files.
 
 **npm wrapper** (set during `npm install`, or before running):
 
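The environment-variable table above pairs each standalone variable with an alias that embedded Codex builds may read. A minimal Python sketch of how such a lookup could behave, assuming an embedded build where both names are honoured and the standalone name takes precedence. The README states neither the precedence nor which values count as "on", so `lookup`, `flag_enabled`, and the `FALSEY` set are hypothetical; the `…_KEYBOARD` variants are left out because the table elides their full names.

```python
import os
from typing import Mapping, Optional

# Standalone name -> alias that embedded Codex builds may use (copied from the table).
CODEX_ALIASES = {
    "COMPUTER_USE_LINUX_COSMIC_HELPER": "CODEX_COMPUTER_USE_COSMIC_HELPER",
    "CU_DISABLE_ABS_POINTER": "CODEX_COMPUTER_USE_DISABLE_ABS_POINTER",
    "COMPUTER_USE_LINUX_FORCE_PORTAL_POINTER": "CODEX_COMPUTER_USE_FORCE_PORTAL_POINTER",
    "COMPUTER_USE_LINUX_FORCE_YDOTOOL_POINTER": "CODEX_COMPUTER_USE_FORCE_YDOTOOL_POINTER",
}

# Hypothetical: values treated as "off" for on/off flags.
FALSEY = {"", "0", "false", "no", "off"}


def lookup(name: str, env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the standalone variable, falling back to its Codex alias.

    Standalone-first precedence is an assumption; the README only says the
    alias "is also accepted" or "may" be used by embedded Codex builds.
    """
    env = os.environ if env is None else env
    value = env.get(name)
    if value is None and name in CODEX_ALIASES:
        value = env.get(CODEX_ALIASES[name])
    return value


def flag_enabled(name: str, env: Optional[Mapping[str, str]] = None) -> bool:
    """Treat a set value that is not in FALSEY as "on" (hypothetical rule)."""
    value = lookup(name, env)
    return value is not None and value.strip().lower() not in FALSEY


if __name__ == "__main__":
    demo = {"CODEX_COMPUTER_USE_DISABLE_ABS_POINTER": "1"}
    print(flag_enabled("CU_DISABLE_ABS_POINTER", demo))  # True
    print(lookup("COMPUTER_USE_LINUX_COSMIC_HELPER", demo))  # None
```

Passing an explicit mapping keeps the sketch testable; in practice these variables would simply be exported in the environment of the server process before it starts.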
@@ -357,6 +365,10 @@ If you're running this on a shared workstation, set `ydotoold`'s socket permissi
 
 If `doctor` is green and a specific tool still misbehaves, file an issue with the JSON output of `doctor` and the failing tool's request payload.
 
+## Related
+
+- [agent-workspace-linux](https://github.com/agent-sh/agent-workspace-linux) — the sibling MCP that gives an agent its **own** isolated Linux desktop (a hidden Xvfb display with its own apps and browser) instead of driving yours. It is the inverse of this project: `computer-use-linux` automates the desktop you are already on; `agent-workspace-linux` sandboxes the agent in a separate one. Use them together.
+
 ## Contributing
 
 Contributions are welcome. See [CONTRIBUTING.md](CONTRIBUTING.md) for the local development workflow, CI gates, and PR expectations. Report security vulnerabilities through [SECURITY.md](SECURITY.md), not public issues.
package/package.json
@@ -1,6 +1,6 @@
 {
 "name": "@agent-sh/computer-use-linux",
-"version": "0.2.
+"version": "0.2.5",
 "description": "Linux desktop-control MCP server: AT-SPI accessibility trees, Wayland/X11 input, screenshots, and compositor window targeting.",
 "license": "MIT",
 "type": "commonjs",