@agent-sh/computer-use-linux 0.2.1

package/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Avi Fenesh
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,312 @@
1
+ # computer-use-linux
2
+
3
+ Linux desktop control for any MCP host — AT-SPI accessibility trees, portal screenshots, Wayland/X11 input, and multi-compositor window targeting for GNOME, KDE/KWin, Hyprland, i3, and COSMIC.
4
+
5
+ [![CI](https://github.com/avifenesh/computer-use-linux/actions/workflows/ci.yml/badge.svg)](https://github.com/avifenesh/computer-use-linux/actions/workflows/ci.yml)
6
+ [![crates.io](https://img.shields.io/crates/v/computer-use-linux.svg)](https://crates.io/crates/computer-use-linux)
7
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
8
+
9
+ Current release: [`v0.2.1`](https://github.com/avifenesh/computer-use-linux/releases/tag/v0.2.1). The Rust crate is published as [`computer-use-linux`](https://crates.io/crates/computer-use-linux), and the npm wrapper is published as [`@agent-sh/computer-use-linux`](https://www.npmjs.com/package/@agent-sh/computer-use-linux).
10
+
11
+ ## What this is
12
+
13
+ `computer-use-linux` is a Rust MCP server and CLI for Linux desktop control. The crate ships the main `computer-use-linux` binary plus a small `computer-use-linux-cosmic` helper used only for COSMIC Wayland window management. Any MCP host — Codex Desktop's Linux build, Claude Desktop, [Hermes Agent](https://github.com/NousResearch/hermes-agent), or your own client — can spawn it and gain full control of the local Linux desktop: read accessibility trees, list and focus windows, take screenshots, click, drag, scroll, type, and invoke semantic accessibility actions.
14
+
15
+ Most computer-use MCP servers are macOS-only (they rely on AppKit, AXUIElement, CGEvent). The few that target Linux either drive `xdotool` against an X11 root window or shell out to OCR over screenshots. This crate is different on four points worth caring about:
16
+
17
+ - **Wayland actually works.** Pointer actions can use the `org.freedesktop.portal.RemoteDesktop` interface on Wayland, with `ydotool` / `ydotoold` (uinput) as the deterministic fallback and keyboard/text path. Screenshots use the GNOME Shell DBus screenshot method when present and `org.freedesktop.portal.Screenshot` otherwise.
18
+ - **Window targeting is compositor-aware.** The window registry tries GNOME Shell extension, GNOME Shell Introspect, COSMIC Wayland helper, KWin DBus scripting, Hyprland `hyprctl`, and i3 IPC in order, then reports exactly which backend won or why each backend failed.
19
+ - **Semantic selectors, not pixel coordinates.** Tools like `click`, `perform_action`, and `set_value` accept `role` / `name` / `text` / `states` selectors backed by AT-SPI. Pixel coordinates remain available as a fallback for rendering-only surfaces (canvas, games, X clients without ATK).
20
+ - **One JSON readiness report.** `computer-use-linux doctor` returns a structured document covering platform, portals, AT-SPI, windowing, input, and a `readiness` summary with explicit blockers and a recommended next step. MCP hosts can render or surface that to the user without parsing prose.
21
+
22
+ The crate was extracted from [`codex-desktop-linux`](https://github.com/avifenesh/codex-desktop-linux) (the Linux distribution of Codex Desktop), which still bundles this binary as a built-in plugin. This standalone repo is the upstream.
23
+
24
+ ## Features
25
+
26
+ 15 MCP tools exposed by the current `v0.2.1` server:
27
+
28
+ **Diagnostics**
29
+ - `doctor` — single-shot JSON readiness report (platform, portals, accessibility, windowing, input, readiness summary)
30
+ - `setup_accessibility` — enables GNOME's `org.gnome.desktop.interface toolkit-accessibility` setting so toolkit apps expose AT-SPI trees
31
+ - `setup_window_targeting` — installs and enables the bundled GNOME Shell extension when `org.gnome.Shell.Introspect` is locked down
32
+
33
+ **Discovery**
34
+ - `list_apps` — running desktop apps visible to the AT-SPI registry
35
+ - `list_windows` — compositor windows with title, app id, wm_class, focus state, client type (Wayland/X11), and bounds
36
+ - `focused_window` — the window currently holding keyboard focus
37
+ - `get_app_state` — combined screenshot + accessibility tree for a chosen app, with element indices that the input tools accept
38
+
39
+ **Input**
40
+ - `click` — by element index, semantic selector, or pixel coordinates
41
+ - `drag` — pixel-coordinate drag (start / end)
42
+ - `scroll` — page-based scroll on an element or at a pixel location
43
+ - `press_key` — keys / chords; can focus a window or terminal first
44
+ - `type_text` — literal text input, optionally targeted at a window or terminal
45
+
46
+ **Semantic actions**
47
+ - `perform_action` — invoke any AT-SPI action exposed by an element (`Press`, `Activate`, `Toggle`, …); defaults to the primary action
48
+ - `set_value` — write to a settable accessibility element (text fields, sliders, spinners)
49
+
50
+ **Navigation**
51
+ - `activate_window` — focus a window by `window_id`, `pid`, `app_id`, `wm_class`, `title`, or terminal selectors
52
+
53
+ ### MCP safety contract
54
+
55
+ `computer-use-linux` is not a read-only data source. It can observe the local desktop and, when a mutating tool is called, can change real application state. The `tools/list` response includes MCP `ToolAnnotations` so hosts can surface this distinction before invocation:
56
+
57
+ | Class | Tools | Contract |
58
+ | --- | --- | --- |
59
+ | Read-only observation | `doctor`, `list_apps`, `list_windows`, `focused_window`, `get_app_state` | `readOnlyHint=true`; may reveal app, window, accessibility, and screenshot contents. `get_app_state` may trigger the desktop screenshot portal prompt. |
60
+ | Local setup mutators | `setup_accessibility`, `setup_window_targeting` | `readOnlyHint=false`, `destructiveHint=false`, `idempotentHint=true`; modifies user desktop configuration by enabling accessibility or installing/enabling the GNOME window-targeting extension. |
61
+ | UI state mutators | `activate_window`, `scroll` | `readOnlyHint=false`, `destructiveHint=false`; changes focus or scroll position in the live desktop. |
62
+ | Desktop action mutators | `click`, `drag`, `press_key`, `type_text`, `perform_action`, `set_value` | `readOnlyHint=false`, `destructiveHint=true`, `openWorldHint=true`; can trigger arbitrary actions in whatever local application is targeted. |
63
+
64
+ Annotations are safety hints, not an authorization system. MCP hosts should still ask the user before calls that could submit, delete, send, purchase, overwrite, or otherwise commit state.
65
+
66
+ The binary also exposes the same capabilities from the CLI for scripting and debugging:
67
+
68
+ ```
69
+ computer-use-linux mcp # stdio MCP server
70
+ computer-use-linux doctor # JSON readiness report
71
+ computer-use-linux setup # enable AT-SPI
72
+ computer-use-linux setup-window-targeting # install GNOME Shell extension
73
+ computer-use-linux apps
74
+ computer-use-linux state [APP_NAME]
75
+ computer-use-linux screenshot # JSON screenshot summary
76
+ computer-use-linux windows
77
+ ```
78
+
79
+ ## Support matrix
80
+
81
+ Validated manually on Ubuntu 25.10 (GNOME Shell 50.1, Wayland). Other compositor backends are implemented and covered by parser / contract tests, but real desktop behavior still depends on each session exposing its expected control API.
82
+
83
+ | Desktop/session | Window backend | Notes |
84
+ | --- | --- | --- |
85
+ | GNOME Wayland | GNOME Shell extension first, `org.gnome.Shell.Introspect` fallback | Full target. The extension provides exact window activation when GNOME blocks native introspection; Introspect can list windows and focus apps by `app_id` when allowed. |
86
+ | GNOME X11 | `org.gnome.Shell.Introspect` when allowed | AT-SPI and `ydotool` work; the bundled GNOME Shell extension is only needed for GNOME Wayland. Exact per-window focus may be unavailable without the extension backend. |
87
+ | KDE Plasma / KWin | temporary KWin DBus scripting | Lists and focuses windows through `org.kde.KWin` scripting when the session bus exposes it. |
88
+ | Hyprland | `hyprctl clients -j` and `hyprctl dispatch focuswindow` | Requires `hyprctl` in the desktop session. |
89
+ | i3 | `i3-msg`; optional `xprop` for PID hydration | Lists and focuses i3 windows over the active i3 IPC socket. |
90
+ | COSMIC Wayland | `computer-use-linux-cosmic` helper | Installed automatically by `./install.sh`, `cargo install`, and npm. For custom/manual layouts, put the helper next to the main binary, on `PATH`, or point `COMPUTER_USE_LINUX_COSMIC_HELPER` at it. |
91
+ | Sway / generic wlroots | no dedicated backend yet | AT-SPI, screenshots, and global `ydotool` input can still work; exact window list/focus is currently unavailable unless another backend applies. |
92
+ | Generic X11 / XFCE / other WMs | no dedicated backend yet | AT-SPI plus `ydotool` global input only, unless running under i3. |
93
+
94
+ If you run on a desktop not covered above, or a covered backend does not come up cleanly, please open an issue with the output of `computer-use-linux doctor` so we can extend the matrix honestly.
95
+
96
+ ## Install
97
+
98
+ COSMIC users do not need a second package or a separate helper install when using `./install.sh`, `cargo install`, or the npm wrapper. Those paths install `computer-use-linux-cosmic` alongside the main binary automatically. Only manual prebuilt-binary installs need you to copy both release assets.
99
+
100
+ ### Option A — `./install.sh` from a clone
101
+
102
+ Installs system packages on Debian/Ubuntu, Fedora/RHEL-like, or Arch-like distros; installs Rust if needed; builds both release binaries; installs them to `~/.local/bin`; enables `ydotoold` as a user service; enables GNOME AT-SPI settings when running under GNOME; and installs the bundled GNOME Shell extension on GNOME Wayland.
103
+
104
+ ```bash
105
+ git clone https://github.com/avifenesh/computer-use-linux
106
+ cd computer-use-linux
107
+ ./install.sh
108
+ # log out and back in if the GNOME extension was newly installed
109
+ computer-use-linux doctor | jq .readiness
110
+ ```
111
+
112
+ ### Option B — `cargo install` (Rust binaries, no system setup)
113
+
114
+ Installs the Rust binaries from crates.io. You still handle the system-level pieces yourself: `ydotoold`, AT-SPI, desktop portals, and the GNOME extension if you need the GNOME Wayland exact-focus backend.
115
+
116
+ ```bash
117
+ cargo install computer-use-linux --version 0.2.1
118
+ computer-use-linux doctor
119
+ ```
120
+
121
+ For unreleased changes from `main`, install directly from Git:
122
+
123
+ ```bash
124
+ cargo install --git https://github.com/avifenesh/computer-use-linux
125
+ ```
126
+
127
+ Then, as needed:
128
+
129
+ ```bash
130
+ sudo apt install ydotool at-spi2-core # or your distro's equivalent
131
+ systemctl --user enable --now ydotoold
132
+ computer-use-linux setup # gsettings AT-SPI bridge
133
+ computer-use-linux setup-window-targeting # GNOME Shell extension
134
+ ```
135
+
136
+ ### Option C — npm wrapper (binary download)
137
+
138
+ Good for users who already have Node.js and want a no-Rust install. The npm package downloads and verifies the matching main and COSMIC helper binaries during install, then the wrapper sets `COMPUTER_USE_LINUX_COSMIC_HELPER` to the bundled helper automatically.
139
+
140
+ ```bash
141
+ npm install -g @agent-sh/computer-use-linux@0.2.1
142
+ computer-use-linux doctor
143
+ ```
144
+
145
+ You will still need `ydotoold` running and AT-SPI enabled (run `computer-use-linux setup` and the systemd commands above).
146
+
147
+ ### Option D — prebuilt binaries
148
+
149
+ Linux x86_64 / aarch64 builds are published with each tag. Each binary ships a `.sha256` next to it.
150
+
151
+ - Release: <https://github.com/avifenesh/computer-use-linux/releases/tag/v0.2.1>
152
+
153
+ ```bash
154
+ target=x86_64-unknown-linux-gnu
155
+ version=v0.2.1
156
+ for binary in computer-use-linux computer-use-linux-cosmic; do
157
+ asset="$binary-$target"
158
+ curl -L -O "https://github.com/avifenesh/computer-use-linux/releases/download/$version/$asset"
159
+ curl -L -O "https://github.com/avifenesh/computer-use-linux/releases/download/$version/$asset.sha256"
160
+ sha256sum -c "$asset.sha256"
161
+ install -m 0755 "$asset" "$HOME/.local/bin/$binary"
162
+ done
163
+ ```
164
+
165
+ You will still need `ydotoold` running and AT-SPI enabled (run `computer-use-linux setup` and the systemd commands above).
166
+
167
+ ## Wire it into your MCP host
168
+
169
+ The binary speaks MCP over stdio (protocol revision 2024-11-05, implemented with the `rmcp` crate). Pass `mcp` as the only argument; everything else is configured through MCP tool calls.
170
+
171
+ ### Codex Desktop (Linux build)
172
+
173
+ The Linux build of Codex Desktop already bundles this binary as a plugin. You don't need to wire it up manually — the plugin definition lives in [`codex-desktop-linux`](https://github.com/avifenesh/codex-desktop-linux) under its `plugins/` directory and is enabled by default. To upgrade the plugin in place, replace the binary it ships with the one from this repo's release assets.
174
+
175
+ ### Claude Desktop
176
+
177
+ Edit `~/.config/Claude/claude_desktop_config.json`:
178
+
179
+ ```json
180
+ {
181
+ "mcpServers": {
182
+ "computer-use-linux": {
183
+ "command": "computer-use-linux",
184
+ "args": ["mcp"]
185
+ }
186
+ }
187
+ }
188
+ ```
189
+
190
+ Restart Claude Desktop. The 15 tools should appear in the tools list.
191
+
192
+ ### Hermes Agent
193
+
194
+ If `computer-use-linux` is on your `PATH`, let Hermes discover it:
195
+
196
+ ```bash
197
+ hermes mcp add computer-use-linux --command computer-use-linux --args mcp
198
+ hermes mcp test computer-use-linux
199
+ ```
200
+
201
+ Press Enter at the "Enable all tools?" prompt to expose all 15 tools. Hermes registers them as `mcp_computer_use_linux_<tool>` and creates the `mcp-computer-use-linux` runtime toolset.
202
+
203
+ If you installed the binary somewhere that is not on `PATH`, pass the absolute path as `--command`.
204
+
205
+ You can also edit `~/.hermes/config.yaml` directly:
206
+
207
+ ```yaml
208
+ mcp_servers:
209
+ computer-use-linux:
210
+ command: computer-use-linux
211
+ args: ["mcp"]
212
+
213
+ # Optional: expose the tools to subagents as well.
214
+ inherit_mcp_toolsets: true
215
+ ```
216
+
217
+ ### Generic MCP client
218
+
219
+ Spawn the binary with `["mcp"]` as the argv tail. It speaks JSON-RPC over stdio per MCP protocol revision 2024-11-05; capability discovery happens through `tools/list` and the `doctor` tool. The server normally needs no MCP-specific configuration, but the desktop runtime environment still matters (`DBUS_SESSION_BUS_ADDRESS`, `XDG_RUNTIME_DIR`, portals, AT-SPI, `ydotoold`, and optionally `COMPUTER_USE_LINUX_COSMIC_HELPER`).
220
+
221
+ ## First-run checklist
222
+
223
+ 1. **Run `doctor`.**
224
+
225
+ ```bash
226
+ computer-use-linux doctor | jq .readiness
227
+ ```
228
+
229
+ Aim for `can_register_mcp_tools`, `can_build_accessibility_tree`, `can_send_development_input`, and `can_query_windows` all `true`. The `blockers` array should be empty.
230
+
231
+ 2. **If `accessibility.at_spi_bus.ok = false`** — run `computer-use-linux setup` (or call the `setup_accessibility` MCP tool). This sets:
232
+ - `org.gnome.desktop.interface toolkit-accessibility true`
233
+
234
+ You may need to restart toolkit-using apps for the change to take effect.
235
+
236
+ 3. **If `windowing.can_list_windows = false`** — inspect `doctor.windowing.backends`. On GNOME Wayland, run `computer-use-linux setup-window-targeting` (or call `setup_window_targeting`) to install the bundled `computer-use-linux@avifenesh.dev` Shell extension, then log out and back in so GNOME Shell loads it. On KDE, Hyprland, i3, or COSMIC, install or expose the matching compositor tool/helper shown in the backend details.
237
+
238
+ 4. **Grant the screencast portal on first screenshot.** The first time `get_app_state` or any screenshot subcommand runs, GNOME will pop a portal dialog asking to share the screen. Accept once and tick "remember" to make it sticky for the session.
239
+
240
+ 5. **Confirm `ydotoold` is running.**
241
+
242
+ ```bash
243
+ systemctl --user status ydotoold
244
+ ```
245
+
246
+ Its socket should appear at `/run/user/$UID/.ydotool_socket`.
247
+
248
+ ## Architecture
249
+
250
+ - **Accessibility tree** — [`atspi`](https://crates.io/crates/atspi) crate (tokio backend) talks to the AT-SPI registry on the user session bus. The tree is flattened to `(role, name, text, states, bounds)` tuples and indexed; element indices are stable for the duration of a `get_app_state` snapshot.
251
+ - **DBus where desktops expose it** — [`zbus`](https://crates.io/crates/zbus) for portal calls (`org.freedesktop.portal.Screenshot`, `…RemoteDesktop`, `…ScreenCast`), GNOME Shell screenshots (`org.gnome.Shell.Screenshot`), the bundled GNOME extension's `dev.avifenesh.ComputerUseLinux.WindowControl` service, and temporary KWin scripting.
252
+ - **MCP transport** — [`rmcp`](https://crates.io/crates/rmcp) with the `transport-io` feature; stdio framing, no network.
253
+ - **Input fallback** — when the remote-desktop portal isn't available or the host wants deterministic injection, the binary writes to `ydotoold`'s socket, which writes to `/dev/uinput`. `install.sh` can configure `ydotoold`; the `setup` command only enables the GNOME AT-SPI bridge.
254
+ - **Window registry** — `list_windows`, `focused_window`, `activate_window`, `press_key`, and `type_text` share a backend registry. It tries GNOME extension, GNOME Introspect, COSMIC helper, KWin scripting, Hyprland `hyprctl`, and i3 IPC in that order, skipping empty or failed backends so another compositor backend can answer.
255
+ - **GNOME extension fallback** — recent GNOME builds deny `org.gnome.Shell.Introspect.GetWindows` to non-blessed clients. The bundled Shell extension exposes window data and exact activation under `dev.avifenesh.ComputerUseLinux.WindowControl`.
256
+ - **COSMIC helper** — `computer-use-linux-cosmic` talks to COSMIC toplevel protocols and is resolved from `COMPUTER_USE_LINUX_COSMIC_HELPER`, next to the running binary, or from `PATH`.
257
+ - **Terminal enrichment** — `list_windows` cross-references each terminal window with its controlling TTY and the foreground process on that TTY, so `type_text` / `press_key` can target "the terminal where `pytest` is running" without the host ever knowing the window id.
258
+
259
+ ## Security
260
+
261
+ Computer-use tooling is, by definition, a privilege-escalation surface. The threat model:
262
+
263
+ - **`ydotoold` runs as a per-user systemd service** with read/write access to `/dev/uinput`. Any process that can connect to its socket (`/run/user/$UID/.ydotool_socket`, mode `0600` by default) can synthesize arbitrary input — keypresses, clicks, anything. Keep the socket in the user runtime dir (the default), not in `/tmp` or any world-readable location. Do not run `ydotoold` as a system service.
264
+ - **The screencast portal asks for permission once per session.** Granting it lets the calling MCP host capture the screen for the rest of the session. If you don't want that, decline the portal dialog and use `get_app_state` with `include_screenshot: false`.
265
+ - **AT-SPI exposes window contents to any client on your session bus.** Enabling the AT-SPI bridge (`setup_accessibility`) is a prerequisite for this binary; it's also what screen readers use, and it shares the same trust boundary.
266
+ - **The GNOME Shell extension** is loaded only into your user's GNOME Shell, runs in the Shell's JS sandbox, and exposes a single DBus interface on the user session bus. It does not request any extra permissions.
267
+ - **No network.** This binary opens no TCP/UDP listener, makes no outbound Internet connections, and ships no telemetry. It does use local session transports such as DBus and the per-user `ydotoold` Unix socket.
268
+ - **Mutating tools are explicit.** The MCP tool list annotates read-only versus mutating tools, and CI fails if the published tool annotations drift from the table above. Treat those annotations as hints; the host is still responsible for user approval and policy.
269
+
270
+ If you're running this on a shared workstation, set `ydotoold`'s socket permissions to `0600` (the default) and audit which processes on your user can `connect()` to it.
271
+
272
+ ## Troubleshooting
273
+
274
+ `computer-use-linux doctor` is the source of truth. Common failure modes and fixes:
275
+
276
+ - **`accessibility.at_spi_bus.ok = false`** — AT-SPI registry isn't running or the toolkit bridge is off. Fix: `computer-use-linux setup` (or call the `setup_accessibility` MCP tool). Restart the apps you want to drive.
277
+ - **`windowing.gnome_shell_introspect.ok = false` and `gnome_shell_extension_dbus.ok = false`** — GNOME blocks introspection and the extension isn't installed. Fix: `computer-use-linux setup-window-targeting`, then log out and log back in.
278
+ - **`input.ydotool_socket.ok = false`** — daemon isn't running. Fix: `systemctl --user enable --now ydotoold`. If the unit doesn't exist, install the `ydotool` package and rerun `./install.sh` (or copy the unit from `systemd/ydotoold.service` in this repo).
279
+ - **`input.uinput.ok = false`** — `/dev/uinput` isn't accessible to your user. Fix: add yourself to the `input` group (`sudo usermod -aG input $USER`) and re-login. On distros that ship `uinput` as a kernel module without auto-loading it, add `uinput` to `/etc/modules-load.d/`.
280
+ - **Portal calls hang or time out** — `xdg-desktop-portal` or its backend (`-gnome`, `-gtk`, `-kde`, `-wlr`) crashed. Fix: check `journalctl --user -u xdg-desktop-portal -u xdg-desktop-portal-gnome --since '5 min ago'` and restart the relevant unit.
281
+ - **KWin / Hyprland / i3 / COSMIC windowing is unavailable** — check `doctor.windowing.backends`. KWin needs session-bus scripting; Hyprland needs `hyprctl`; i3 needs `i3-msg` and its IPC socket. COSMIC needs `computer-use-linux-cosmic`, which the standard installers provide automatically; if you copied binaries by hand, copy the helper too or set `COMPUTER_USE_LINUX_COSMIC_HELPER`.
282
+ - **Screenshots return black frames on multi-monitor setups** — known portal / compositor edge case. Use `get_app_state` with `include_screenshot: false` and rely on AT-SPI until the portal backend is healthy.
283
+ - **`type_text` types into the wrong window** — pass an explicit target (`window_id`, `pid`, `wm_class`, `title`, or for terminals `tty` / `terminal_pid` / `terminal_command` / `terminal_cwd`). Without a target, input goes to whatever window currently has compositor focus.
284
+
285
+ If `doctor` is green and a specific tool still misbehaves, file an issue with the JSON output of `doctor` and the failing tool's request payload.
286
+
287
+ ## Credits
288
+
289
+ Extracted from [`codex-desktop-linux`](https://github.com/avifenesh/codex-desktop-linux), the Linux distribution of Codex Desktop, which continues to ship this same binary as a bundled plugin. Maintained by [Avi Fenesh](https://github.com/avifenesh).
290
+
291
+ Built on top of:
292
+
293
+ - [`atspi`](https://crates.io/crates/atspi) — AT-SPI bindings
294
+ - [`zbus`](https://crates.io/crates/zbus) — async DBus
295
+ - [`rmcp`](https://crates.io/crates/rmcp) — MCP runtime
296
+ - [`ydotool`](https://github.com/ReimuNotMoe/ydotool) — Wayland-friendly uinput driver
297
+ - [`cosmic-protocols`](https://crates.io/crates/cosmic-protocols) — COSMIC Wayland toplevel protocol bindings
298
+
299
+ ## Publishing
300
+
301
+ Publishing is tag-driven from GitHub Actions. The repository needs these Actions secrets:
302
+
303
+ ```bash
304
+ gh secret set CARGO_REGISTRY_TOKEN -R avifenesh/computer-use-linux
305
+ gh secret set NPM_TOKEN -R avifenesh/computer-use-linux
306
+ ```
307
+
308
+ Then bump `Cargo.toml` and `package.json` together, update `CHANGELOG.md`, and push a `vX.Y.Z` tag. CI runs the full Rust and MCP safety gates, builds release assets for both architectures, publishes `computer-use-linux` to crates.io, and publishes the npm wrapper after the GitHub release binaries are available.
309
+
310
+ ## License
311
+
312
+ MIT — see [LICENSE](LICENSE).
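
Note on the README's MCP safety contract: the table of `ToolAnnotations` hints suggests a simple host-side gate. The following is a minimal sketch of such a gate, not code from the package. It assumes each `tools/list` entry carries an `annotations` object using the hint names from the table; `classifyTool`, the three category labels, and the sample entries are hypothetical.

```javascript
'use strict';

// Hypothetical host-side policy: classify a `tools/list` entry by its annotations.
function classifyTool(tool) {
  const hints = tool.annotations || {};
  // Only an explicit readOnlyHint=true counts as observation.
  if (hints.readOnlyHint === true) return 'observe';
  // A missing destructiveHint is treated as destructive, the cautious default.
  if (hints.destructiveHint === false) return 'mutate';
  return 'confirm';
}

// Made-up sample entries mirroring rows of the README table, for illustration only.
const sampleTools = [
  { name: 'doctor', annotations: { readOnlyHint: true } },
  { name: 'scroll', annotations: { readOnlyHint: false, destructiveHint: false } },
  { name: 'click', annotations: { readOnlyHint: false, destructiveHint: true, openWorldHint: true } },
  { name: 'unknown_tool' },
];

for (const tool of sampleTools) {
  console.log(`${tool.name}: ${classifyTool(tool)}`);
}
```

Treating unannotated tools as `confirm` follows the README's own caveat that annotations are hints, so the host should fail closed when they are absent.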
package/npm/README.md ADDED
@@ -0,0 +1,26 @@
1
+ # computer-use-linux
2
+
3
+ NPM wrapper for the `computer-use-linux` MCP server. Current release:
4
+ [`@agent-sh/computer-use-linux@0.2.1`](https://www.npmjs.com/package/@agent-sh/computer-use-linux/v/0.2.1).
5
+
6
+ Security note: this server can control the local Linux desktop. Tools such as
7
+ `click`, `type_text`, `press_key`, `perform_action`, and `set_value` are
8
+ mutating and can change real application state. The MCP tool list includes
9
+ `ToolAnnotations` so hosts can distinguish read-only observation from mutating
10
+ desktop actions.
11
+
12
+ ```bash
13
+ npm install -g @agent-sh/computer-use-linux@0.2.1
14
+ computer-use-linux doctor
15
+ hermes mcp add computer-use-linux --command computer-use-linux --args mcp
16
+ hermes mcp test computer-use-linux
17
+ ```
18
+
19
+ The package downloads the matching Linux x86_64 or aarch64 binary from the
20
+ GitHub release for this package version and verifies the `.sha256` asset before
21
+ installing it. It also installs the matching `computer-use-linux-cosmic` helper
22
+ used for COSMIC desktop window targeting.
23
+
24
+ If you already built or installed the binary yourself, set
25
+ `COMPUTER_USE_LINUX_BIN=/path/to/computer-use-linux` to make the wrapper use
26
+ that executable instead.
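
Note on `computer-use-linux doctor`: the main README's first-run checklist names four readiness flags and a `blockers` array. A host script could check them roughly as sketched here. This assumes the flags sit directly under `readiness` as booleans and that `blockers` is an array; that shape is inferred from the README, not verified against real output. `summarizeReadiness` and the sample report are hypothetical.

```javascript
'use strict';

// Flags the README's first-run checklist says should all be true.
const REQUIRED_FLAGS = [
  'can_register_mcp_tools',
  'can_build_accessibility_tree',
  'can_send_development_input',
  'can_query_windows',
];

// Hypothetical helper: takes the parsed `doctor` JSON and reports what is missing.
function summarizeReadiness(report) {
  const readiness = (report && report.readiness) || {};
  const failedFlags = REQUIRED_FLAGS.filter((flag) => readiness[flag] !== true);
  const blockers = Array.isArray(readiness.blockers) ? readiness.blockers : [];
  return {
    ready: failedFlags.length === 0 && blockers.length === 0,
    failedFlags,
    blockers,
  };
}

// Made-up sample input for illustration only, not real doctor output.
const sample = {
  readiness: {
    can_register_mcp_tools: true,
    can_build_accessibility_tree: true,
    can_send_development_input: false,
    can_query_windows: true,
    blockers: ['example blocker text'],
  },
};

console.log(summarizeReadiness(sample));
```

In practice the input would come from parsing the command's stdout with `JSON.parse` rather than from a literal.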
package/npm/bin/computer-use-linux.js ADDED
@@ -0,0 +1,56 @@
1
+ #!/usr/bin/env node
2
+ 'use strict';
3
+
4
+ const fs = require('node:fs');
5
+ const path = require('node:path');
6
+ const { spawn } = require('node:child_process');
7
+
8
+ const binaryName = `computer-use-linux-${process.platform}-${process.arch}`;
9
+ const bundledBinary = path.join(__dirname, binaryName);
10
+ const binary = process.env.COMPUTER_USE_LINUX_BIN || bundledBinary;
11
+ const bundledCosmicHelper = path.join(__dirname, 'computer-use-linux-cosmic');
12
+
13
+ if (!fs.existsSync(binary)) {
14
+ console.error(
15
+ [
16
+ `computer-use-linux binary not found: ${binary}`,
17
+ 'Reinstall the package, run `npm rebuild @agent-sh/computer-use-linux`, or set COMPUTER_USE_LINUX_BIN.',
18
+ ].join('\n')
19
+ );
20
+ process.exit(127);
21
+ }
22
+
23
+ const env = { ...process.env };
24
+ if (!env.COMPUTER_USE_LINUX_COSMIC_HELPER && fs.existsSync(bundledCosmicHelper)) {
25
+ env.COMPUTER_USE_LINUX_COSMIC_HELPER = bundledCosmicHelper;
26
+ }
27
+
28
+ const child = spawn(binary, process.argv.slice(2), {
29
+ stdio: 'inherit',
30
+ env,
31
+ });
32
+
33
+ for (const signal of ['SIGINT', 'SIGTERM', 'SIGHUP']) {
34
+ process.on(signal, () => {
35
+ if (!child.killed) {
36
+ child.kill(signal);
37
+ }
38
+ });
39
+ }
40
+
41
+ child.on('error', (error) => {
42
+ console.error(`failed to start computer-use-linux: ${error.message}`);
43
+ process.exit(127);
44
+ });
45
+
46
+ child.on('exit', (code, signal) => {
47
+ if (signal) {
48
+ const signalExitCodes = {
49
+ SIGHUP: 129,
50
+ SIGINT: 130,
51
+ SIGTERM: 143,
52
+ };
53
+ process.exit(signalExitCodes[signal] || 1);
54
+ }
55
+ process.exit(code ?? 1);
56
+ });
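
Note on the wrapper's exit handling: the fixed table (129, 130, 143) follows the shell convention of 128 plus the signal number. One possible generalization, shown only as a sketch and not as what the package does, derives the code from Node's `os.constants.signals`. The helper name `exitCodeForSignal` is hypothetical.

```javascript
'use strict';

const os = require('node:os');

// Hypothetical alternative to the fixed table: derive the code from the signal number.
function exitCodeForSignal(signal) {
  const number = os.constants.signals[signal];
  // Unknown signal names fall back to 1, the same fallback the wrapper uses.
  return typeof number === 'number' ? 128 + number : 1;
}

for (const signal of ['SIGHUP', 'SIGINT', 'SIGTERM']) {
  console.log(signal, exitCodeForSignal(signal));
}
```

The fixed table in the wrapper is easier to audit; the computed form covers signals the table omits, at the cost of depending on platform signal numbering.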
package/npm/install.js ADDED
@@ -0,0 +1,142 @@
1
+ #!/usr/bin/env node
2
+ 'use strict';
3
+
4
+ const crypto = require('node:crypto');
5
+ const fs = require('node:fs');
6
+ const https = require('node:https');
7
+ const os = require('node:os');
8
+ const path = require('node:path');
9
+
10
+ const pkg = require('../package.json');
11
+
12
+ const archToTarget = {
13
+ x64: 'x86_64',
14
+ arm64: 'aarch64',
15
+ };
16
+
17
+ const binDir = path.join(__dirname, 'bin');
18
+ const binaryPath = path.join(binDir, `computer-use-linux-${process.platform}-${process.arch}`);
19
+ const cosmicHelperPath = path.join(binDir, 'computer-use-linux-cosmic');
20
+
21
+ function fail(message) {
22
+ console.error(`[computer-use-linux] ${message}`);
23
+ process.exit(1);
24
+ }
25
+
26
+ function copyLocalBinary(source) {
27
+ fs.mkdirSync(binDir, { recursive: true });
28
+ fs.copyFileSync(source, binaryPath);
29
+ fs.chmodSync(binaryPath, 0o755);
30
+ if (process.env.COMPUTER_USE_LINUX_LOCAL_COSMIC_HELPER) {
31
+ fs.copyFileSync(process.env.COMPUTER_USE_LINUX_LOCAL_COSMIC_HELPER, cosmicHelperPath);
32
+ fs.chmodSync(cosmicHelperPath, 0o755);
33
+ }
34
+ console.log(`[computer-use-linux] installed local binary from ${source}`);
35
+ }
36
+
37
+ function download(url, destination, redirects = 5) {
38
+ return new Promise((resolve, reject) => {
39
+ const request = https.get(url, (response) => {
40
+ if (
41
+ response.statusCode >= 300 &&
42
+ response.statusCode < 400 &&
43
+ response.headers.location &&
44
+ redirects > 0
45
+ ) {
46
+ response.resume();
47
+ const nextUrl = new URL(response.headers.location, url).toString();
48
+ download(nextUrl, destination, redirects - 1).then(resolve, reject);
49
+ return;
50
+ }
51
+
52
+ if (response.statusCode !== 200) {
53
+ response.resume();
54
+ reject(new Error(`download failed with HTTP ${response.statusCode}: ${url}`));
55
+ return;
56
+ }
57
+
58
+ const file = fs.createWriteStream(destination, { mode: 0o600 });
59
+ response.pipe(file);
60
+ file.on('finish', () => file.close(resolve));
61
+ file.on('error', reject);
62
+ });
63
+ request.on('error', reject);
64
+ });
65
+ }
66
+
67
+ function parseSha256(text) {
68
+ const match = text.match(/\b[a-fA-F0-9]{64}\b/);
69
+ if (!match) {
70
+ throw new Error('sha256 file did not contain a 64-character hex digest');
71
+ }
72
+ return match[0].toLowerCase();
73
+ }
74
+
75
+ function sha256File(filePath) {
76
+ const hash = crypto.createHash('sha256');
77
+ hash.update(fs.readFileSync(filePath));
78
+ return hash.digest('hex');
79
+ }
80
+
81
+ async function main() {
82
+ if (process.env.COMPUTER_USE_LINUX_SKIP_DOWNLOAD === '1') {
83
+ console.log('[computer-use-linux] skipping binary download');
84
+ return;
85
+ }
86
+
87
+ if (process.env.COMPUTER_USE_LINUX_LOCAL_BINARY) {
88
+ copyLocalBinary(process.env.COMPUTER_USE_LINUX_LOCAL_BINARY);
89
+ return;
90
+ }
91
+
92
+ if (process.platform !== 'linux') {
93
+ fail(`unsupported platform: ${process.platform}. This package only supports Linux.`);
94
+ }
95
+
96
+ const targetArch = archToTarget[process.arch];
97
+ if (!targetArch) {
98
+ fail(`unsupported CPU architecture: ${process.arch}. Supported: x64, arm64.`);
99
+ }
100
+
101
+ const asset = `computer-use-linux-${targetArch}-unknown-linux-gnu`;
102
+ const cosmicAsset = `computer-use-linux-cosmic-${targetArch}-unknown-linux-gnu`;
103
+ const baseUrl =
104
+ process.env.COMPUTER_USE_LINUX_DOWNLOAD_BASE ||
105
+ `https://github.com/avifenesh/computer-use-linux/releases/download/v${pkg.version}`;
106
+ const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'computer-use-linux-'));
107
+ const tmpBinary = path.join(tmpDir, asset);
108
+ const tmpSha = path.join(tmpDir, `${asset}.sha256`);
109
+ const tmpCosmic = path.join(tmpDir, cosmicAsset);
110
+ const tmpCosmicSha = path.join(tmpDir, `${cosmicAsset}.sha256`);
111
+
112
+ try {
113
+ console.log(`[computer-use-linux] downloading ${asset} from ${baseUrl}`);
114
+ await download(`${baseUrl}/${asset}`, tmpBinary);
115
+ await download(`${baseUrl}/${asset}.sha256`, tmpSha);
116
+ await download(`${baseUrl}/${cosmicAsset}`, tmpCosmic);
117
+ await download(`${baseUrl}/${cosmicAsset}.sha256`, tmpCosmicSha);
118
+
119
+ const expected = parseSha256(fs.readFileSync(tmpSha, 'utf8'));
120
+ const actual = sha256File(tmpBinary);
121
+ if (actual !== expected) {
122
+ fail(`sha256 mismatch for ${asset}: expected ${expected}, got ${actual}`);
123
+ }
124
+
125
+ const expectedCosmic = parseSha256(fs.readFileSync(tmpCosmicSha, 'utf8'));
126
+ const actualCosmic = sha256File(tmpCosmic);
127
+ if (actualCosmic !== expectedCosmic) {
128
+ fail(`sha256 mismatch for ${cosmicAsset}: expected ${expectedCosmic}, got ${actualCosmic}`);
129
+ }
130
+
131
+ fs.mkdirSync(binDir, { recursive: true });
132
+ fs.copyFileSync(tmpBinary, binaryPath);
133
+ fs.chmodSync(binaryPath, 0o755);
134
+ fs.copyFileSync(tmpCosmic, cosmicHelperPath);
135
+ fs.chmodSync(cosmicHelperPath, 0o755);
136
+ console.log(`[computer-use-linux] installed ${asset} and ${cosmicAsset}`);
137
+ } finally {
138
+ fs.rmSync(tmpDir, { recursive: true, force: true });
139
+ }
140
+ }
141
+
142
+ main().catch((error) => fail(error.message));
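
Note on the checksum step in `install.js`: `fail()` calls `process.exit(1)`, and `process.exit` ends the process without running pending `finally` blocks, so a digest mismatch raised inside the `try` would leave the temporary directory behind. A minimal sketch of a variant that throws instead, so cleanup still runs, is below. `verifyFile` and `demo` are hypothetical names, and the payload and digests are made up for illustration.

```javascript
'use strict';

const crypto = require('node:crypto');
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Same extraction rule as install.js: the first 64-character hex run in the text.
function parseSha256(text) {
  const match = text.match(/\b[a-fA-F0-9]{64}\b/);
  if (!match) {
    throw new Error('sha256 file did not contain a 64-character hex digest');
  }
  return match[0].toLowerCase();
}

function sha256File(filePath) {
  return crypto.createHash('sha256').update(fs.readFileSync(filePath)).digest('hex');
}

// Hypothetical helper: throws instead of exiting, so the caller's `finally` still runs.
function verifyFile(filePath, shaText) {
  const expected = parseSha256(shaText);
  const actual = sha256File(filePath);
  if (actual !== expected) {
    throw new Error(`sha256 mismatch: expected ${expected}, got ${actual}`);
  }
}

function demo() {
  const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'sha-demo-'));
  try {
    const file = path.join(tmpDir, 'asset');
    fs.writeFileSync(file, 'example payload');
    // A matching digest in sha256sum layout passes silently.
    verifyFile(file, `${sha256File(file)}  asset\n`);
    try {
      // An all-zero digest cannot match, so this call throws.
      verifyFile(file, `${'0'.repeat(64)}  asset\n`);
      return { mismatchDetected: false, tmpDir };
    } catch {
      return { mismatchDetected: true, tmpDir };
    }
  } finally {
    // Runs on both return paths because nothing called process.exit.
    fs.rmSync(tmpDir, { recursive: true, force: true });
  }
}

const result = demo();
console.log(result.mismatchDetected, fs.existsSync(result.tmpDir));
```

With this shape, the top-level `main().catch(...)` handler remains the single place that reports the error and sets the exit status.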
package/package.json ADDED
@@ -0,0 +1,49 @@
1
+ {
2
+ "name": "@agent-sh/computer-use-linux",
3
+ "version": "0.2.1",
4
+ "description": "Linux desktop-control MCP server: AT-SPI accessibility trees, Wayland/X11 input, screenshots, and compositor window targeting.",
5
+ "license": "MIT",
6
+ "type": "commonjs",
7
+ "homepage": "https://github.com/avifenesh/computer-use-linux#readme",
8
+ "repository": {
9
+ "type": "git",
10
+ "url": "git+https://github.com/avifenesh/computer-use-linux.git"
11
+ },
12
+ "bugs": {
13
+ "url": "https://github.com/avifenesh/computer-use-linux/issues"
14
+ },
15
+ "keywords": [
16
+ "mcp",
17
+ "computer-use",
18
+ "linux",
19
+ "wayland",
20
+ "accessibility",
21
+ "hermes-agent",
22
+ "codex"
23
+ ],
24
+ "os": [
25
+ "linux"
26
+ ],
27
+ "cpu": [
28
+ "x64",
29
+ "arm64"
30
+ ],
31
+ "bin": {
32
+ "computer-use-linux": "npm/bin/computer-use-linux.js"
33
+ },
34
+ "files": [
35
+ "LICENSE",
36
+ "README.md",
37
+ "npm/README.md",
38
+ "npm/bin/computer-use-linux.js",
39
+ "npm/install.js"
40
+ ],
41
+ "scripts": {
42
+ "postinstall": "node npm/install.js",
43
+ "pack:check": "npm pack --dry-run",
44
+ "test:wrapper": "node npm/bin/computer-use-linux.js --help"
45
+ },
46
+ "engines": {
47
+ "node": ">=18"
48
+ }
49
+ }