@agent-sh/computer-use-linux 0.2.4 → 0.2.6
- package/README.md +21 -8
- package/package.json +1 -1
package/README.md
CHANGED
@@ -24,7 +24,7 @@ The Rust crate is published as [`computer-use-linux`](https://crates.io/crates/c

 Most computer-use MCP servers are macOS-only (they lean on AppKit, AXUIElement, CGEvent). The few that target Linux either drive `xdotool` against an X11 root window or shell out to OCR over screenshots. Four things set this one apart:

-- **Wayland actually works.** Pointer actions can use the `org.freedesktop.portal.RemoteDesktop` interface on Wayland, with `ydotool` / `ydotoold` (uinput) as the deterministic fallback and keyboard/text path. Screenshots use the GNOME Shell DBus screenshot method when present
+- **Wayland actually works.** Pointer actions can use the `org.freedesktop.portal.RemoteDesktop` interface on Wayland, with `ydotool` / `ydotoold` (uinput) as the deterministic fallback and keyboard/text path. Screenshots use the GNOME Shell DBus screenshot method when present, `org.freedesktop.portal.Screenshot` otherwise, and fall back to spawning `gnome-screenshot` for background/systemd contexts where both DBus paths are denied.
 - **Window targeting is compositor-aware.** The window registry tries GNOME Shell extension, GNOME Shell Introspect, COSMIC Wayland helper, KWin DBus scripting, Hyprland `hyprctl`, and i3 IPC in order, then reports exactly which backend won or why each backend failed.
 - **Semantic selectors, not pixel coordinates.** Tools like `click`, `perform_action`, and `set_value` accept `role` / `name` / `text` / `states` selectors backed by AT-SPI. Pixel coordinates remain available as a fallback for rendering-only surfaces (canvas, games, X clients without ATK).
 - **One JSON readiness report.** `computer-use-linux doctor` returns a structured document covering platform, portals, AT-SPI, windowing, input, and a `readiness` summary with explicit blockers and a recommended next step. MCP hosts can render or surface that to the user without parsing prose.
@@ -45,11 +45,13 @@ MCP tools exposed by the server:
 - `list_windows` — compositor windows with title, app id, wm_class, focus state, client type (Wayland/X11), and bounds
 - `focused_window` — the window currently holding keyboard focus
 - `get_app_state` — combined screenshot + accessibility tree for a chosen app, with element indices that the input tools accept
-- `screenshot` — capture the screen as a PNG; can target a window, which is raised to the front and cropped to just that window
+- `screenshot` — capture the screen as a bounded PNG or JPEG image; can target a window, which is raised to the front and cropped to just that window
+
+Screenshot payloads are size-bounded by default before they are returned to the MCP host: max 1920 px width/height and 2 MiB image bytes, with hard caps even when callers request more. Agents that need more detail can pass `max_width`, `max_height`, `max_bytes`, `scale`, `format: "jpeg"`, or `quality`, preferably with a window target or crop. PNG remains the default; JPEG lets callers trade lossless pixels for a smaller payload before the byte cap forces further resizing. Returned screenshot metadata includes `coordinate_width`, `coordinate_height`, `scale`, `format`, and `quality` so callers can convert from a downscaled preview to desktop coordinate pixels.

 **Input**
-- `click` — by element index, semantic selector, or
-- `drag` —
+- `click` — by element index, semantic selector, or desktop coordinate pixels
+- `drag` — desktop coordinate drag (start / end)
 - `scroll` — page-based scroll on an element or at a pixel location
 - `press_key` — keys / chords; can focus a window or terminal first
 - `type_text` — literal text input, optionally targeted at a window or terminal
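The README text added in the hunk above says the screenshot metadata lets callers convert from a downscaled preview to desktop coordinate pixels. The sketch below shows one way a caller might do that arithmetic. It is an illustration under assumptions, not the server's implementation: it assumes `coordinate_width` / `coordinate_height` give the desktop-coordinate size of the captured area, that the caller reads the preview's pixel size from the decoded image, and that the preview keeps the aspect ratio. The function names `fit_within` and `preview_to_desktop` are hypothetical, and any window-origin offset for cropped captures is left out.

```python
def fit_within(width, height, max_width=1920, max_height=1920):
    """Shrink (width, height) to fit the bounds, keeping the aspect ratio.

    Illustrative only: the 1920 px defaults come from the README text;
    the rounding strategy here is an assumption.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    factor = min(max_width / width, max_height / height, 1.0)
    return max(1, int(width * factor)), max(1, int(height * factor))


def preview_to_desktop(x, y, image_width, image_height,
                       coordinate_width, coordinate_height):
    """Map a pixel in the returned preview to desktop coordinate pixels.

    Assumes coordinate_width/coordinate_height (from the screenshot
    metadata) describe the same area as the preview image.
    """
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image dimensions must be positive")
    desktop_x = round(x * coordinate_width / image_width)
    desktop_y = round(y * coordinate_height / image_height)
    # Clamp so rounding never produces a point outside the captured area.
    desktop_x = min(max(desktop_x, 0), coordinate_width - 1)
    desktop_y = min(max(desktop_y, 0), coordinate_height - 1)
    return desktop_x, desktop_y
```

With a 3840x2160 desktop returned as a 1920x1080 preview, the preview pixel (960, 540) maps to desktop pixel (1920, 1080), which is the value a caller would then pass to a coordinate-based tool such as `click`.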
@@ -304,10 +306,17 @@ Most setups need none of these — `doctor` and the installers pick sensible def

 | Variable | Effect |
 | --- | --- |
-| `COMPUTER_USE_LINUX_COSMIC_HELPER` | Path to the `computer-use-linux-cosmic` helper when it isn't next to the binary or on `PATH
-| `CU_DISABLE_ABS_POINTER` | Disable the uinput absolute pointer and click through `ydotool` instead (for setups where the abs-pointer device misbehaves)
-| `COMPUTER_USE_LINUX_FORCE_PORTAL_POINTER` / `…_KEYBOARD` | Always route pointer / keyboard through the RemoteDesktop portal on Wayland, skipping auto-detection
-| `COMPUTER_USE_LINUX_FORCE_YDOTOOL_POINTER` / `…_KEYBOARD` | Always route pointer / keyboard through `ydotool`, skipping the portal and KDE clipboard paths
+| `COMPUTER_USE_LINUX_COSMIC_HELPER` | Path to the `computer-use-linux-cosmic` helper when it isn't next to the binary or on `PATH` (`CODEX_COMPUTER_USE_COSMIC_HELPER` is also accepted by embedded Codex builds). |
+| `CU_DISABLE_ABS_POINTER` | Disable the uinput absolute pointer and click through `ydotool` instead (for setups where the abs-pointer device misbehaves); embedded Codex builds may use `CODEX_COMPUTER_USE_DISABLE_ABS_POINTER`. |
+| `COMPUTER_USE_LINUX_FORCE_PORTAL_POINTER` / `…_KEYBOARD` | Always route pointer / keyboard through the RemoteDesktop portal on Wayland, skipping auto-detection; embedded Codex builds may use `CODEX_COMPUTER_USE_FORCE_PORTAL_POINTER` / `…_KEYBOARD`. |
+| `COMPUTER_USE_LINUX_FORCE_YDOTOOL_POINTER` / `…_KEYBOARD` | Always route pointer / keyboard through `ydotool`, skipping the portal and KDE clipboard paths; embedded Codex builds may use `CODEX_COMPUTER_USE_FORCE_YDOTOOL_POINTER` / `…_KEYBOARD`. |
+| `COMPUTER_USE_LINUX_SCREENSHOT_BACKEND` | Force a single screenshot backend, skipping the fallback chain. Accepts `gnome-shell`, `portal`, or `gnome-screenshot`. Pin `gnome-screenshot` for background/systemd contexts where the GNOME Shell and portal DBus paths are denied. |
+
+**Build-time identity overrides** (set while compiling a downstream embedded
+bundle): `CUL_GNOME_EXTENSION_UUID`, `CUL_DBUS_SERVICE`, and
+`CUL_DBUS_OBJECT_PATH` replace the default standalone GNOME Shell extension
+UUID and DBus endpoint in both the Rust probes and the generated extension
+files.

 **npm wrapper** (set during `npm install`, or before running):

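The new `COMPUTER_USE_LINUX_SCREENSHOT_BACKEND` row in the hunk above describes pinning one backend instead of walking the fallback chain (GNOME Shell DBus, then the portal, then `gnome-screenshot`). The sketch below models that selection logic for readers who want to reason about it. It is a hypothetical illustration, not code from the package: the function name `screenshot_backend_order` is invented, and rejecting unknown values with an error is an assumption, since the README does not say how invalid values are handled.

```python
import os

ENV_VAR = "COMPUTER_USE_LINUX_SCREENSHOT_BACKEND"

# Fallback order as described in the README text: GNOME Shell DBus first,
# then the screenshot portal, then spawning gnome-screenshot.
FALLBACK_CHAIN = ["gnome-shell", "portal", "gnome-screenshot"]


def screenshot_backend_order(env=None):
    """Return the list of backends to try, in order.

    If the variable is unset or empty, the full fallback chain is returned.
    If it names a known backend, only that backend is returned.
    Raising on unknown values is an assumption made for this sketch.
    """
    env = os.environ if env is None else env
    forced = env.get(ENV_VAR, "").strip()
    if not forced:
        return list(FALLBACK_CHAIN)
    if forced not in FALLBACK_CHAIN:
        raise ValueError(f"unsupported screenshot backend: {forced!r}")
    return [forced]
```

For a systemd user service where both DBus paths are denied, the README's advice corresponds to setting the variable to `gnome-screenshot`, so that `screenshot_backend_order` in this model would return only that backend.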
@@ -357,6 +366,10 @@ If you're running this on a shared workstation, set `ydotoold`'s socket permissi

 If `doctor` is green and a specific tool still misbehaves, file an issue with the JSON output of `doctor` and the failing tool's request payload.

+## Related
+
+- [agent-workspace-linux](https://github.com/agent-sh/agent-workspace-linux) — the sibling MCP that gives an agent its **own** isolated Linux desktop (a hidden Xvfb display with its own apps and browser) instead of driving yours. It is the inverse of this project: `computer-use-linux` automates the desktop you are already on; `agent-workspace-linux` sandboxes the agent in a separate one. Use them together.
+
 ## Contributing

 Contributions are welcome. See [CONTRIBUTING.md](CONTRIBUTING.md) for the local development workflow, CI gates, and PR expectations. Report security vulnerabilities through [SECURITY.md](SECURITY.md), not public issues.
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
 "name": "@agent-sh/computer-use-linux",
-"version": "0.2.
+"version": "0.2.6",
 "description": "Linux desktop-control MCP server: AT-SPI accessibility trees, Wayland/X11 input, screenshots, and compositor window targeting.",
 "license": "MIT",
 "type": "commonjs",