@agent-sh/computer-use-linux 0.2.2 → 0.2.4

package/README.md CHANGED
@@ -1,6 +1,6 @@
  <div align="center">
  <h1>computer-use-linux</h1>
- <p><strong>Linux desktop control for MCP hosts.</strong></p>
+ <p><strong>Control a real Linux desktop from any MCP host.</strong></p>
  <p>
  <a href="https://github.com/agent-sh/computer-use-linux/actions/workflows/ci.yml"><img src="https://github.com/agent-sh/computer-use-linux/actions/workflows/ci.yml/badge.svg" alt="CI"></a>
  <a href="https://crates.io/crates/computer-use-linux"><img src="https://img.shields.io/crates/v/computer-use-linux.svg" alt="crates.io"></a>
@@ -9,20 +9,20 @@
  </p>
  </div>
 
- Linux desktop control for any MCP host: AT-SPI accessibility trees, portal screenshots, Wayland/X11 input, and multi-compositor window targeting for GNOME, KDE/KWin, Hyprland, i3, and COSMIC.
+ `computer-use-linux` reads accessibility trees, takes screenshots, and drives clicks, scrolls, and keystrokes across GNOME, KDE/KWin, Hyprland, i3, and COSMIC — Wayland-first, X11 best-effort.
 
  ```bash
- npm install -g @agent-sh/computer-use-linux@0.2.1
+ npm install -g @agent-sh/computer-use-linux
  computer-use-linux doctor | jq .readiness
  ```
 
- Current release: [`v0.2.1`](https://github.com/agent-sh/computer-use-linux/releases/tag/v0.2.1). The Rust crate is published as [`computer-use-linux`](https://crates.io/crates/computer-use-linux), and the npm wrapper is published as [`@agent-sh/computer-use-linux`](https://www.npmjs.com/package/@agent-sh/computer-use-linux).
+ The Rust crate is published as [`computer-use-linux`](https://crates.io/crates/computer-use-linux) and the npm wrapper as [`@agent-sh/computer-use-linux`](https://www.npmjs.com/package/@agent-sh/computer-use-linux). Prebuilt binaries ship with the [latest release](https://github.com/agent-sh/computer-use-linux/releases/latest).
 
  ## What this is
 
  `computer-use-linux` is a Rust MCP server and CLI for Linux desktop control. The crate ships the main `computer-use-linux` binary plus a small `computer-use-linux-cosmic` helper used only for COSMIC Wayland window management. Any MCP host — Codex Desktop's Linux build, Claude Desktop, [Hermes Agent](https://github.com/NousResearch/hermes-agent), or your own client — can spawn it and gain full control of the local Linux desktop: read accessibility trees, list and focus windows, take screenshots, click, drag, scroll, type, and invoke semantic accessibility actions.
 
- Most computer-use MCP servers are macOS-only (they rely on AppKit, AXUIElement, CGEvent). The few that target Linux either drive `xdotool` against an X11 root window or shell out to OCR over screenshots. This crate is different on four points worth caring about:
+ Most computer-use MCP servers are macOS-only (they lean on AppKit, AXUIElement, CGEvent). The few that target Linux either drive `xdotool` against an X11 root window or shell out to OCR over screenshots. Four things set this one apart:
 
  - **Wayland actually works.** Pointer actions can use the `org.freedesktop.portal.RemoteDesktop` interface on Wayland, with `ydotool` / `ydotoold` (uinput) as the deterministic fallback and keyboard/text path. Screenshots use the GNOME Shell DBus screenshot method when present and `org.freedesktop.portal.Screenshot` otherwise.
  - **Window targeting is compositor-aware.** The window registry tries GNOME Shell extension, GNOME Shell Introspect, COSMIC Wayland helper, KWin DBus scripting, Hyprland `hyprctl`, and i3 IPC in order, then reports exactly which backend won or why each backend failed.
@@ -33,10 +33,10 @@ The crate was extracted from [`codex-desktop-linux`](https://github.com/avifenes
 
  ## Features
 
- 15 MCP tools exposed by the current `v0.2.1` server:
+ MCP tools exposed by the server:
 
  **Diagnostics**
- - `doctor` — single-shot JSON readiness report (platform, portals, accessibility, windowing, input, readiness summary)
+ - `doctor` — single-shot JSON readiness report (platform, portals, accessibility, windowing, input, readiness summary, and a capability map of available backends)
  - `setup_accessibility` — enables GNOME's `org.gnome.desktop.interface toolkit-accessibility` setting so toolkit apps expose AT-SPI trees
  - `setup_window_targeting` — installs and enables the bundled GNOME Shell extension when `org.gnome.Shell.Introspect` is locked down
 
@@ -45,6 +45,7 @@ The crate was extracted from [`codex-desktop-linux`](https://github.com/avifenes
  - `list_windows` — compositor windows with title, app id, wm_class, focus state, client type (Wayland/X11), and bounds
  - `focused_window` — the window currently holding keyboard focus
  - `get_app_state` — combined screenshot + accessibility tree for a chosen app, with element indices that the input tools accept
+ - `screenshot` — capture the screen as a PNG; can target a window, which is raised to the front and cropped to just that window
 
  **Input**
  - `click` — by element index, semantic selector, or pixel coordinates
@@ -68,7 +69,7 @@ The crate was extracted from [`codex-desktop-linux`](https://github.com/avifenes
  | --- | --- | --- |
  | Read-only observation | `doctor`, `list_apps`, `list_windows`, `focused_window`, `get_app_state` | `readOnlyHint=true`; may reveal app, window, accessibility, and screenshot contents. `get_app_state` may trigger the desktop screenshot portal prompt. |
  | Local setup mutators | `setup_accessibility`, `setup_window_targeting` | `readOnlyHint=false`, `destructiveHint=false`, `idempotentHint=true`; modifies user desktop configuration by enabling accessibility or installing/enabling the GNOME window-targeting extension. |
- | UI state mutators | `activate_window`, `scroll` | `readOnlyHint=false`, `destructiveHint=false`; changes focus or scroll position in the live desktop. |
+ | UI state mutators | `activate_window`, `scroll`, `screenshot` | `readOnlyHint=false`, `destructiveHint=false`; changes focus or scroll position in the live desktop, or raises a window to capture it. |
  | Desktop action mutators | `click`, `drag`, `press_key`, `type_text`, `perform_action`, `set_value` | `readOnlyHint=false`, `destructiveHint=true`, `openWorldHint=true`; can trigger arbitrary actions in whatever local application is targeted. |
 
  Annotations are safety hints, not an authorization system. MCP hosts should still ask the user before calls that could submit, delete, send, purchase, overwrite, or otherwise commit state.
@@ -124,7 +125,7 @@ computer-use-linux doctor | jq .readiness
  Installs the Rust binaries from crates.io. You still handle the system-level pieces yourself: `ydotoold`, AT-SPI, desktop portals, and the GNOME extension if you need the GNOME Wayland exact-focus backend.
 
  ```bash
- cargo install computer-use-linux --version 0.2.1
+ cargo install computer-use-linux
  computer-use-linux doctor
  ```
 
@@ -148,7 +149,7 @@ computer-use-linux setup-window-targeting # GNOME Shell extension
  Good for users who already have Node.js and want a no-Rust install. The npm package downloads and verifies the matching main and COSMIC helper binaries during install, then the wrapper sets `COMPUTER_USE_LINUX_COSMIC_HELPER` to the bundled helper automatically.
 
  ```bash
- npm install -g @agent-sh/computer-use-linux@0.2.1
+ npm install -g @agent-sh/computer-use-linux
  computer-use-linux doctor
  ```
 
@@ -158,15 +159,15 @@ You will still need `ydotoold` running and AT-SPI enabled (run `computer-use-lin
 
  Linux x86_64 / aarch64 builds are published with each tag. Each binary ships a `.sha256` next to it.
 
- - Release: <https://github.com/agent-sh/computer-use-linux/releases/tag/v0.2.1>
+ - Latest release: <https://github.com/agent-sh/computer-use-linux/releases/latest>
 
  ```bash
  target=x86_64-unknown-linux-gnu
- version=v0.2.1
+ base=https://github.com/agent-sh/computer-use-linux/releases/latest/download
  for binary in computer-use-linux computer-use-linux-cosmic; do
  asset="$binary-$target"
- curl -L -O "https://github.com/agent-sh/computer-use-linux/releases/download/$version/$asset"
- curl -L -O "https://github.com/agent-sh/computer-use-linux/releases/download/$version/$asset.sha256"
+ curl -L -O "$base/$asset"
+ curl -L -O "$base/$asset.sha256"
  sha256sum -c "$asset.sha256"
  install -m 0755 "$asset" "$HOME/.local/bin/$binary"
  done
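The hunk above replaces the pinned `releases/download/$version` URL with the `releases/latest/download` redirect. For readers who still want a reproducible install, the following is a minimal sketch of a helper that builds either form. The name `release_url` and its optional version argument are hypothetical and not part of the package; the two URL shapes are the ones that appear in the removed and added lines.

```shell
# Hypothetical helper (not shipped with the package): build a release-asset URL.
# Without a version argument it uses the "latest" redirect from the new README;
# with a tag such as v0.2.1 it uses the pinned form from the old README.
repo=https://github.com/agent-sh/computer-use-linux

release_url() {
  asset=$1
  version=${2:-}
  if [ -n "$version" ]; then
    printf '%s/releases/download/%s/%s\n' "$repo" "$version" "$asset"
  else
    printf '%s/releases/latest/download/%s\n' "$repo" "$asset"
  fi
}

# Print both forms for the main binary on x86_64.
release_url computer-use-linux-x86_64-unknown-linux-gnu
release_url computer-use-linux-x86_64-unknown-linux-gnu v0.2.1
```

Either URL can be passed to `curl -L -O` exactly as in the loop above; the `.sha256` file sits next to the asset in both cases.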
@@ -182,6 +183,24 @@ The binary speaks the `rmcp` 2024-11-05 stdio protocol. Pass `mcp` as the only a
 
  The Linux build of Codex Desktop already bundles this binary as a plugin. You don't need to wire it up manually — the plugin definition lives in [`codex-desktop-linux`](https://github.com/avifenesh/codex-desktop-linux) under its `plugins/` directory and is enabled by default. To upgrade the plugin in place, replace the binary it ships with the one from this repo's release assets.
 
+ ### Claude Code (CLI)
+
+ Use the `claude mcp add` command to register the binary as a stdio MCP server. Pick a scope:
+
+ - `--scope user` — available across all projects for your user.
+ - `--scope project` — written to `.mcp.json` at the project root for team sharing.
+ - `--scope local` (default) — only the current project, stored in `~/.claude.json`.
+
+ ```bash
+ # User-wide install (recommended for desktop control)
+ claude mcp add --scope user computer-use-linux -- computer-use-linux mcp
+
+ # Verify the server is registered and reachable
+ claude mcp list
+ ```
+
+ If `computer-use-linux` is not on `PATH`, pass the absolute path (e.g. `~/.local/bin/computer-use-linux`). Inside a Claude Code session, run `/mcp` to confirm the tools are loaded.
+
  ### Claude Desktop
 
  Edit `~/.config/Claude/claude_desktop_config.json`:
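The new Claude Code section in this hunk says to pass an absolute path when `computer-use-linux` is not on `PATH`. A small sketch of that fallback follows. It only prints the `claude mcp add` command rather than running it; the helper name `mcp_add_command` is hypothetical, and `~/.local/bin` is simply the example path the README mentions.

```shell
# Sketch only: print the `claude mcp add` command, preferring a binary found on
# PATH and falling back to ~/.local/bin (the example path from the README).
# `mcp_add_command` is a hypothetical helper name, not part of either tool.
mcp_add_command() {
  bin=$(command -v computer-use-linux 2>/dev/null) || bin="$HOME/.local/bin/computer-use-linux"
  printf 'claude mcp add --scope user computer-use-linux -- %s mcp\n' "$bin"
}

# Show the command that would be run; pipe it to `sh` to actually register.
mcp_add_command
```

Printing first keeps the step reviewable, which seems appropriate for a server that can drive the whole desktop.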
@@ -197,7 +216,7 @@ Edit `~/.config/Claude/claude_desktop_config.json`:
  }
  ```
 
- Restart Claude Desktop. The 15 tools should appear in the tools list.
+ Restart Claude Desktop. The tools should appear in the tools list.
 
  ### Hermes Agent
 
@@ -277,6 +296,28 @@ Spawn the binary with `["mcp"]` as the argv tail. It speaks JSON-RPC over stdio
 
  Its socket should appear at `/run/user/$UID/.ydotool_socket`.
 
+ ## Environment variables
+
+ Most setups need none of these — `doctor` and the installers pick sensible defaults. They exist for overriding auto-detected paths and input backends.
+
+ **Server runtime** (set in the MCP host's environment):
+
+ | Variable | Effect |
+ | --- | --- |
+ | `COMPUTER_USE_LINUX_COSMIC_HELPER` | Path to the `computer-use-linux-cosmic` helper when it isn't next to the binary or on `PATH`. |
+ | `CU_DISABLE_ABS_POINTER` | Disable the uinput absolute pointer and click through `ydotool` instead (for setups where the abs-pointer device misbehaves). |
+ | `COMPUTER_USE_LINUX_FORCE_PORTAL_POINTER` / `…_KEYBOARD` | Always route pointer / keyboard through the RemoteDesktop portal on Wayland, skipping auto-detection. |
+ | `COMPUTER_USE_LINUX_FORCE_YDOTOOL_POINTER` / `…_KEYBOARD` | Always route pointer / keyboard through `ydotool`, skipping the portal and KDE clipboard paths. |
+
+ **npm wrapper** (set during `npm install`, or before running):
+
+ | Variable | Effect |
+ | --- | --- |
+ | `COMPUTER_USE_LINUX_BIN` | Run this binary instead of the one bundled by the npm package. |
+ | `COMPUTER_USE_LINUX_DOWNLOAD_BASE` | Override the GitHub release base URL the installer downloads from (mirrors, air-gapped hosts). |
+ | `COMPUTER_USE_LINUX_SKIP_DOWNLOAD=1` | Skip the post-install binary download entirely. |
+ | `COMPUTER_USE_LINUX_LOCAL_BINARY` / `…_LOCAL_COSMIC_HELPER` | Install from a local build instead of downloading (used by CI and local testing). |
+
  ## Architecture
 
  - **Accessibility tree** — [`atspi`](https://crates.io/crates/atspi) crate (tokio backend) talks to the AT-SPI registry on the user session bus. The tree is flattened to `(role, name, text, states, bounds)` tuples and indexed; element indices are stable for the duration of a `get_app_state` snapshot.
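The environment-variable tables added in this file list `COMPUTER_USE_LINUX_BIN` as an npm-wrapper override. One way to try such an override without exporting it for the whole session is to scope it to a single command, sketched below. The helper name `with_bin` and the example path are hypothetical, and the demonstration uses `printenv` so that it runs without the package installed.

```shell
# Hypothetical helper: run one command with COMPUTER_USE_LINUX_BIN set,
# without exporting the variable into the rest of the shell session.
with_bin() {
  bin_path=$1
  shift
  env COMPUTER_USE_LINUX_BIN="$bin_path" "$@"
}

# Demonstration with printenv; in practice the command would be something
# like `computer-use-linux doctor`, and the path would point at a local build.
with_bin /opt/builds/computer-use-linux printenv COMPUTER_USE_LINUX_BIN
```

The same pattern should apply to the server-runtime variables, although the README only documents an explicit value (`=1`) for `COMPUTER_USE_LINUX_SKIP_DOWNLOAD`, so values for the others would be an assumption.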
package/npm/README.md CHANGED
@@ -1,7 +1,7 @@
  # computer-use-linux
 
- NPM wrapper for the `computer-use-linux` MCP server. Current release:
- [`@agent-sh/computer-use-linux@0.2.1`](https://www.npmjs.com/package/@agent-sh/computer-use-linux/v/0.2.1).
+ NPM wrapper for the `computer-use-linux` MCP server, published as
+ [`@agent-sh/computer-use-linux`](https://www.npmjs.com/package/@agent-sh/computer-use-linux).
 
  Security note: this server can control the local Linux desktop. Tools such as
  `click`, `type_text`, `press_key`, `perform_action`, and `set_value` are
@@ -10,7 +10,7 @@ mutating and can change real application state. The MCP tool list includes
  desktop actions.
 
  ```bash
- npm install -g @agent-sh/computer-use-linux@0.2.1
+ npm install -g @agent-sh/computer-use-linux
  computer-use-linux doctor
  hermes skills tap add agent-sh/computer-use-linux
  hermes skills install agent-sh/computer-use-linux/computer-use-linux
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@agent-sh/computer-use-linux",
- "version": "0.2.2",
+ "version": "0.2.4",
  "description": "Linux desktop-control MCP server: AT-SPI accessibility trees, Wayland/X11 input, screenshots, and compositor window targeting.",
  "license": "MIT",
  "type": "commonjs",
@@ -1,7 +1,6 @@
  ---
  name: computer-use-linux
  description: "Use when Hermes needs Linux desktop observation or control through the computer-use-linux MCP server."
- version: 0.2.1
  author: agent-sh
  license: MIT
  platforms: [linux]
@@ -26,14 +25,14 @@ Do not use this for remote browsers, websites, or headless automation when a bro
  Preferred install for Hermes users:
 
  ```bash
- npm install -g @agent-sh/computer-use-linux@0.2.1
+ npm install -g @agent-sh/computer-use-linux
  computer-use-linux doctor | jq .readiness
  ```
 
  Rust users can install the same server from crates.io:
 
  ```bash
- cargo install computer-use-linux --version 0.2.1
+ cargo install computer-use-linux
  computer-use-linux doctor | jq .readiness
  ```