@agent-sh/computer-use-linux 0.2.4 → 0.2.5

Files changed (2)
  1. package/README.md +19 -7
  2. package/package.json +1 -1
package/README.md CHANGED
@@ -45,11 +45,13 @@ MCP tools exposed by the server:
  - `list_windows` — compositor windows with title, app id, wm_class, focus state, client type (Wayland/X11), and bounds
  - `focused_window` — the window currently holding keyboard focus
  - `get_app_state` — combined screenshot + accessibility tree for a chosen app, with element indices that the input tools accept
- - `screenshot` — capture the screen as a PNG; can target a window, which is raised to the front and cropped to just that window
+ - `screenshot` — capture the screen as a bounded PNG or JPEG image; can target a window, which is raised to the front and cropped to just that window
+
+ Screenshot payloads are size-bounded by default before they are returned to the MCP host: max 1920 px width/height and 2 MiB image bytes, with hard caps even when callers request more. Agents that need more detail can pass `max_width`, `max_height`, `max_bytes`, `scale`, `format: "jpeg"`, or `quality`, preferably with a window target or crop. PNG remains the default; JPEG lets callers trade lossless pixels for a smaller payload before the byte cap forces further resizing. Returned screenshot metadata includes `coordinate_width`, `coordinate_height`, `scale`, `format`, and `quality` so callers can convert from a downscaled preview to desktop coordinate pixels.
 
  **Input**
- - `click` — by element index, semantic selector, or pixel coordinates
- - `drag` — pixel-coordinate drag (start / end)
+ - `click` — by element index, semantic selector, or desktop coordinate pixels
+ - `drag` — desktop coordinate drag (start / end)
  - `scroll` — page-based scroll on an element or at a pixel location
  - `press_key` — keys / chords; can focus a window or terminal first
  - `type_text` — literal text input, optionally targeted at a window or terminal
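Note on the screenshot paragraph added in this hunk: the conversion from a downscaled preview back to desktop coordinate pixels can be sketched as below. This is a minimal illustration, not the server's implementation. It assumes that `coordinate_width` and `coordinate_height` give the size of the captured area in desktop coordinate pixels and that the image's own pixel size is known to the caller. The function names `fit_within` and `to_desktop_point` are hypothetical, the exact resizing and rounding the server applies may differ, and any window or crop origin offset is not handled here.

```python
from typing import Tuple

# Default bound quoted in the README text; hard caps are not specified there.
DEFAULT_MAX_SIDE = 1920


def fit_within(width: int, height: int,
               max_width: int = DEFAULT_MAX_SIDE,
               max_height: int = DEFAULT_MAX_SIDE) -> Tuple[int, int, float]:
    """Shrink (never enlarge) a size to fit the bounds, keeping aspect ratio.

    Returns (new_width, new_height, scale). Hypothetical helper.
    """
    scale = min(1.0, max_width / width, max_height / height)
    return max(1, int(width * scale)), max(1, int(height * scale)), scale


def to_desktop_point(x: int, y: int,
                     image_width: int, image_height: int,
                     coordinate_width: int,
                     coordinate_height: int) -> Tuple[int, int]:
    """Map a pixel in the returned image to desktop coordinate pixels.

    Assumes the image covers the whole coordinate area with no offset.
    """
    desktop_x = min(x * coordinate_width // image_width, coordinate_width - 1)
    desktop_y = min(y * coordinate_height // image_height, coordinate_height - 1)
    return desktop_x, desktop_y
```

Under these assumptions, a 3840x2160 desktop would be previewed at 1920x1080, and a point picked at the center of the preview maps back to the center of the desktop before being passed to `click` or `drag`.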
@@ -304,10 +306,16 @@ Most setups need none of these — `doctor` and the installers pick sensible def
 
  | Variable | Effect |
  | --- | --- |
- | `COMPUTER_USE_LINUX_COSMIC_HELPER` | Path to the `computer-use-linux-cosmic` helper when it isn't next to the binary or on `PATH`. |
- | `CU_DISABLE_ABS_POINTER` | Disable the uinput absolute pointer and click through `ydotool` instead (for setups where the abs-pointer device misbehaves). |
- | `COMPUTER_USE_LINUX_FORCE_PORTAL_POINTER` / `…_KEYBOARD` | Always route pointer / keyboard through the RemoteDesktop portal on Wayland, skipping auto-detection. |
- | `COMPUTER_USE_LINUX_FORCE_YDOTOOL_POINTER` / `…_KEYBOARD` | Always route pointer / keyboard through `ydotool`, skipping the portal and KDE clipboard paths. |
+ | `COMPUTER_USE_LINUX_COSMIC_HELPER` | Path to the `computer-use-linux-cosmic` helper when it isn't next to the binary or on `PATH` (`CODEX_COMPUTER_USE_COSMIC_HELPER` is also accepted by embedded Codex builds). |
+ | `CU_DISABLE_ABS_POINTER` | Disable the uinput absolute pointer and click through `ydotool` instead (for setups where the abs-pointer device misbehaves); embedded Codex builds may use `CODEX_COMPUTER_USE_DISABLE_ABS_POINTER`. |
+ | `COMPUTER_USE_LINUX_FORCE_PORTAL_POINTER` / `…_KEYBOARD` | Always route pointer / keyboard through the RemoteDesktop portal on Wayland, skipping auto-detection; embedded Codex builds may use `CODEX_COMPUTER_USE_FORCE_PORTAL_POINTER` / `…_KEYBOARD`. |
+ | `COMPUTER_USE_LINUX_FORCE_YDOTOOL_POINTER` / `…_KEYBOARD` | Always route pointer / keyboard through `ydotool`, skipping the portal and KDE clipboard paths; embedded Codex builds may use `CODEX_COMPUTER_USE_FORCE_YDOTOOL_POINTER` / `…_KEYBOARD`. |
+
+ **Build-time identity overrides** (set while compiling a downstream embedded
+ bundle): `CUL_GNOME_EXTENSION_UUID`, `CUL_DBUS_SERVICE`, and
+ `CUL_DBUS_OBJECT_PATH` replace the default standalone GNOME Shell extension
+ UUID and DBus endpoint in both the Rust probes and the generated extension
+ files.
 
  **npm wrapper** (set during `npm install`, or before running):
 
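Note on the table rows changed in this hunk: the COSMIC helper path can now come from either of two variable names. A rough sketch of such a lookup follows. The two variable names and the helper binary name are taken from the README text; the lookup order (standalone name first), the function name `resolve_cosmic_helper`, and the simplified `PATH` fallback are assumptions for illustration. The "next to the binary" fallback mentioned in the README is omitted, and the real resolution logic may differ.

```python
import os
import shutil
from typing import Mapping, Optional

# Variable names from the README table; checking them in this order is an assumption.
HELPER_ENV_NAMES = (
    "COMPUTER_USE_LINUX_COSMIC_HELPER",
    "CODEX_COMPUTER_USE_COSMIC_HELPER",
)


def resolve_cosmic_helper(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return an explicitly configured helper path, else search PATH.

    Hypothetical helper: returns None when nothing is configured or found.
    """
    for name in HELPER_ENV_NAMES:
        value = env.get(name, "").strip()
        if value:
            return value
    # Fall back to a PATH search for the helper binary named in the README.
    return shutil.which("computer-use-linux-cosmic", path=env.get("PATH"))
```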
@@ -357,6 +365,10 @@ If you're running this on a shared workstation, set `ydotoold`'s socket permissi
 
  If `doctor` is green and a specific tool still misbehaves, file an issue with the JSON output of `doctor` and the failing tool's request payload.
 
+ ## Related
+
+ - [agent-workspace-linux](https://github.com/agent-sh/agent-workspace-linux) — the sibling MCP that gives an agent its **own** isolated Linux desktop (a hidden Xvfb display with its own apps and browser) instead of driving yours. It is the inverse of this project: `computer-use-linux` automates the desktop you are already on; `agent-workspace-linux` sandboxes the agent in a separate one. Use them together.
+
  ## Contributing
 
  Contributions are welcome. See [CONTRIBUTING.md](CONTRIBUTING.md) for the local development workflow, CI gates, and PR expectations. Report security vulnerabilities through [SECURITY.md](SECURITY.md), not public issues.
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@agent-sh/computer-use-linux",
- "version": "0.2.4",
+ "version": "0.2.5",
  "description": "Linux desktop-control MCP server: AT-SPI accessibility trees, Wayland/X11 input, screenshots, and compositor window targeting.",
  "license": "MIT",
  "type": "commonjs",