@agent-sh/computer-use-linux 0.2.1 → 0.2.2

package/README.md CHANGED
@@ -1,12 +1,22 @@
- # computer-use-linux
+ <div align="center">
+ <h1>computer-use-linux</h1>
+ <p><strong>Linux desktop control for MCP hosts.</strong></p>
+ <p>
+ <a href="https://github.com/agent-sh/computer-use-linux/actions/workflows/ci.yml"><img src="https://github.com/agent-sh/computer-use-linux/actions/workflows/ci.yml/badge.svg" alt="CI"></a>
+ <a href="https://crates.io/crates/computer-use-linux"><img src="https://img.shields.io/crates/v/computer-use-linux.svg" alt="crates.io"></a>
+ <a href="https://www.npmjs.com/package/@agent-sh/computer-use-linux"><img src="https://img.shields.io/npm/v/@agent-sh/computer-use-linux.svg" alt="npm"></a>
+ <a href="LICENSE"><img src="https://img.shields.io/badge/License-MIT-yellow.svg" alt="License: MIT"></a>
+ </p>
+ </div>
+
+ Linux desktop control for any MCP host: AT-SPI accessibility trees, portal screenshots, Wayland/X11 input, and multi-compositor window targeting for GNOME, KDE/KWin, Hyprland, i3, and COSMIC.
 
- Linux desktop control for any MCP host — AT-SPI accessibility trees, portal screenshots, Wayland/X11 input, and multi-compositor window targeting for GNOME, KDE/KWin, Hyprland, i3, and COSMIC.
-
- [![CI](https://github.com/avifenesh/computer-use-linux/actions/workflows/ci.yml/badge.svg)](https://github.com/avifenesh/computer-use-linux/actions/workflows/ci.yml)
- [![crates.io](https://img.shields.io/crates/v/computer-use-linux.svg)](https://crates.io/crates/computer-use-linux)
- [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
+ ```bash
+ npm install -g @agent-sh/computer-use-linux@0.2.1
+ computer-use-linux doctor | jq .readiness
+ ```
 
- Current release: [`v0.2.1`](https://github.com/avifenesh/computer-use-linux/releases/tag/v0.2.1). The Rust crate is published as [`computer-use-linux`](https://crates.io/crates/computer-use-linux), and the npm wrapper is published as [`@agent-sh/computer-use-linux`](https://www.npmjs.com/package/@agent-sh/computer-use-linux).
+ Current release: [`v0.2.1`](https://github.com/agent-sh/computer-use-linux/releases/tag/v0.2.1). The Rust crate is published as [`computer-use-linux`](https://crates.io/crates/computer-use-linux), and the npm wrapper is published as [`@agent-sh/computer-use-linux`](https://www.npmjs.com/package/@agent-sh/computer-use-linux).
 
  ## What this is
 
@@ -102,7 +112,7 @@ COSMIC users do not need a second package or a separate helper install when usin
  Installs system packages on Debian/Ubuntu, Fedora/RHEL-like, or Arch-like distros; installs Rust if needed; builds both release binaries; installs them to `~/.local/bin`; enables `ydotoold` as a user service; enables GNOME AT-SPI settings when running under GNOME; and installs the bundled GNOME Shell extension on GNOME Wayland.
 
  ```bash
- git clone https://github.com/avifenesh/computer-use-linux
+ git clone https://github.com/agent-sh/computer-use-linux
  cd computer-use-linux
  ./install.sh
  # log out and back in if the GNOME extension was newly installed
@@ -121,7 +131,7 @@ computer-use-linux doctor
  For unreleased changes from `main`, install directly from Git:
 
  ```bash
- cargo install --git https://github.com/avifenesh/computer-use-linux
+ cargo install --git https://github.com/agent-sh/computer-use-linux
  ```
 
  Then, as needed:
@@ -148,15 +158,15 @@ You will still need `ydotoold` running and AT-SPI enabled (run `computer-use-lin
 
  Linux x86_64 / aarch64 builds are published with each tag. Each binary ships a `.sha256` next to it.
 
- - Release: <https://github.com/avifenesh/computer-use-linux/releases/tag/v0.2.1>
+ - Release: <https://github.com/agent-sh/computer-use-linux/releases/tag/v0.2.1>
 
  ```bash
  target=x86_64-unknown-linux-gnu
  version=v0.2.1
  for binary in computer-use-linux computer-use-linux-cosmic; do
    asset="$binary-$target"
-   curl -L -O "https://github.com/avifenesh/computer-use-linux/releases/download/$version/$asset"
-   curl -L -O "https://github.com/avifenesh/computer-use-linux/releases/download/$version/$asset.sha256"
+   curl -L -O "https://github.com/agent-sh/computer-use-linux/releases/download/$version/$asset"
+   curl -L -O "https://github.com/agent-sh/computer-use-linux/releases/download/$version/$asset.sha256"
    sha256sum -c "$asset.sha256"
    install -m 0755 "$asset" "$HOME/.local/bin/$binary"
  done
@@ -191,29 +201,51 @@ Restart Claude Desktop. The 15 tools should appear in the tools list.
 
  ### Hermes Agent
 
- If `computer-use-linux` is on your `PATH`, let Hermes discover it:
+ Install the companion Hermes skill so Hermes has the desktop-specific runbook:
 
  ```bash
- hermes mcp add computer-use-linux --command computer-use-linux --args mcp
- hermes mcp test computer-use-linux
+ hermes skills tap add agent-sh/computer-use-linux
+ hermes skills install agent-sh/computer-use-linux/computer-use-linux
  ```
 
- Press Enter at the "Enable all tools?" prompt to expose all 15 tools. Hermes registers them as `mcp_computer_use_linux_<tool>` and creates the `mcp-computer-use-linux` runtime toolset.
+ The skill is optional but recommended for Hermes users. It teaches Hermes how to install, configure, verify, and call the Linux desktop MCP safely. It follows the same `skills/<name>/SKILL.md` tap layout used by Hermes community skills.
 
- If you installed the binary somewhere that is not on `PATH`, pass the absolute path as `--command`.
+ Then add the stdio MCP server:
+
+ ```bash
+ hermes mcp add computer-use-linux --command computer-use-linux --args mcp
+ hermes mcp test computer-use-linux
+ hermes mcp configure computer-use-linux
+ ```
 
- You can also edit `~/.hermes/config.yaml` directly:
+ `configure` opens Hermes' tool-selection UI for the server. The generated config should look like this:
 
  ```yaml
  mcp_servers:
    computer-use-linux:
      command: computer-use-linux
      args: ["mcp"]
+     timeout: 120
+     connect_timeout: 30
 
  # Optional: expose the tools to subagents as well.
  inherit_mcp_toolsets: true
  ```
 
+ If you installed the binary somewhere that is not on `PATH`, pass the absolute path as `--command`.
+
+ Restart Hermes after editing the config. Hermes registers the tools as `mcp_computer_use_linux_<tool>` and creates the `mcp-computer-use-linux` runtime toolset.
+
+ You can verify both sides before asking Hermes to use the desktop:
+
+ ```bash
+ computer-use-linux doctor | jq .readiness
+ hermes skills inspect agent-sh/computer-use-linux/computer-use-linux
+ hermes chat --toolsets mcp-computer-use-linux -q "List the current desktop windows."
+ ```
+
+ For one-off installs without adding the tap first, Hermes also accepts `hermes skills install agent-sh/computer-use-linux/skills/computer-use-linux`.
+
  ### Generic MCP client
 
  Spawn the binary with `["mcp"]` as the argv tail. It speaks JSON-RPC over stdio per the rmcp 2024-11-05 protocol; capability discovery happens through `tools/list` and the `doctor` tool. The server normally needs no MCP-specific configuration, but desktop runtime environment still matters (`DBUS_SESSION_BUS_ADDRESS`, `XDG_RUNTIME_DIR`, portals, AT-SPI, `ydotoold`, and optionally `COMPUTER_USE_LINUX_COSMIC_HELPER`).
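
Reviewer note: the generic client flow described in the context line above can be sketched in Node as follows. This is a minimal illustration, not code from the package. The message shapes follow the public MCP 2024-11-05 specification (newline-delimited JSON-RPC over stdio); `buildHandshake`, `listTools`, the client name, and the client version are hypothetical names chosen for this example, and the sketch assumes `computer-use-linux` is on `PATH`.

```javascript
// Sketch of an MCP stdio client handshake (assumed shapes per the MCP
// 2024-11-05 spec). buildHandshake/listTools are illustrative helpers only.
const { spawn } = require('node:child_process');

const PROTOCOL_VERSION = '2024-11-05';

// Build the three newline-delimited JSON-RPC messages a client sends first.
function buildHandshake(clientName = 'example-client') {
  const messages = [
    {
      jsonrpc: '2.0',
      id: 1,
      method: 'initialize',
      params: {
        protocolVersion: PROTOCOL_VERSION,
        capabilities: {},
        clientInfo: { name: clientName, version: '0.0.0' }, // placeholder values
      },
    },
    { jsonrpc: '2.0', method: 'notifications/initialized' },
    { jsonrpc: '2.0', id: 2, method: 'tools/list', params: {} },
  ];
  return messages.map((m) => JSON.stringify(m) + '\n');
}

// Spawn the server with ["mcp"] as the argv tail and resolve with its tools.
function listTools(command = 'computer-use-linux') {
  const [initialize, initialized, toolsList] = buildHandshake();
  return new Promise((resolve, reject) => {
    const child = spawn(command, ['mcp'], { stdio: ['pipe', 'pipe', 'inherit'] });
    let buffer = '';
    child.on('error', reject);
    child.stdout.setEncoding('utf8');
    child.stdout.on('data', (chunk) => {
      buffer += chunk;
      let newline;
      while ((newline = buffer.indexOf('\n')) !== -1) {
        const line = buffer.slice(0, newline).trim();
        buffer = buffer.slice(newline + 1);
        if (!line) continue;
        let message;
        try { message = JSON.parse(line); } catch { continue; } // skip non-JSON lines
        // Only after the initialize response: confirm, then ask for the tools.
        if (message.id === 1) child.stdin.write(initialized + toolsList);
        if (message.id === 2) {
          child.kill();
          if (message.error) reject(new Error(message.error.message));
          else resolve(message.result.tools);
        }
      }
    });
    child.stdin.write(initialize);
  });
}
```

On a machine with the server installed, `listTools().then((tools) => console.log(tools.map((t) => t.name)))` should print the tool names; the desktop environment variables named in the README line still have to be present in the spawning process.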
@@ -284,6 +316,10 @@ If you're running this on a shared workstation, set `ydotoold`'s socket permissi
 
  If `doctor` is green and a specific tool still misbehaves, file an issue with the JSON output of `doctor` and the failing tool's request payload.
 
+ ## Contributing
+
+ Contributions are welcome. See [CONTRIBUTING.md](CONTRIBUTING.md) for the local development workflow, CI gates, and PR expectations. Report security vulnerabilities through [SECURITY.md](SECURITY.md), not public issues.
+
  ## Credits
 
  Extracted from [`codex-desktop-linux`](https://github.com/avifenesh/codex-desktop-linux), the Linux distribution of Codex Desktop, which continues to ship this same binary as a bundled plugin. Maintained by [Avi Fenesh](https://github.com/avifenesh).
@@ -301,8 +337,8 @@ Built on top of:
  Publishing is tag-driven from GitHub Actions. The repository needs these Actions secrets:
 
  ```bash
- gh secret set CARGO_REGISTRY_TOKEN -R avifenesh/computer-use-linux
- gh secret set NPM_TOKEN -R avifenesh/computer-use-linux
+ gh secret set CARGO_REGISTRY_TOKEN -R agent-sh/computer-use-linux
+ gh secret set NPM_TOKEN -R agent-sh/computer-use-linux
  ```
 
  Then bump `Cargo.toml` and `package.json` together, update `CHANGELOG.md`, and push a `vX.Y.Z` tag. CI runs the full Rust and MCP safety gates, builds release assets for both architectures, publishes `computer-use-linux` to crates.io, and publishes the npm wrapper after the GitHub release binaries are available.
package/npm/README.md CHANGED
@@ -12,8 +12,22 @@ desktop actions.
  ```bash
  npm install -g @agent-sh/computer-use-linux@0.2.1
  computer-use-linux doctor
+ hermes skills tap add agent-sh/computer-use-linux
+ hermes skills install agent-sh/computer-use-linux/computer-use-linux
  hermes mcp add computer-use-linux --command computer-use-linux --args mcp
  hermes mcp test computer-use-linux
+ hermes mcp configure computer-use-linux
+ ```
+
+ The generated Hermes config should look like this:
+
+ ```yaml
+ mcp_servers:
+   computer-use-linux:
+     command: computer-use-linux
+     args: ["mcp"]
+     timeout: 120
+     connect_timeout: 30
  ```
 
  The package downloads the matching Linux x86_64 or aarch64 binary from the
package/npm/install.js CHANGED
@@ -102,7 +102,7 @@ async function main() {
  const cosmicAsset = `computer-use-linux-cosmic-${targetArch}-unknown-linux-gnu`;
  const baseUrl =
    process.env.COMPUTER_USE_LINUX_DOWNLOAD_BASE ||
-   `https://github.com/avifenesh/computer-use-linux/releases/download/v${pkg.version}`;
+   `https://github.com/agent-sh/computer-use-linux/releases/download/v${pkg.version}`;
  const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'computer-use-linux-'));
  const tmpBinary = path.join(tmpDir, asset);
  const tmpSha = path.join(tmpDir, `${asset}.sha256`);
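
Reviewer note: the hunk above only changes the default download base, so the sketch below illustrates how that base, the asset name, and the `.sha256` sidecar plausibly fit together. It is not the installer's actual code. `releaseUrls` and `matchesSha256` are hypothetical helpers, the `${baseUrl}/${asset}` join is an assumption, and the checksum parsing assumes the usual `sha256sum` layout of a hex digest followed by the file name.

```javascript
// Illustrative helpers only; names and URL layout are assumptions based on
// the README's asset="$binary-$target" convention and the hunk above.
const crypto = require('node:crypto');

const DEFAULT_BASE = 'https://github.com/agent-sh/computer-use-linux/releases/download';

// Resolve the binary and checksum URLs, honoring the env override.
function releaseUrls(version, targetArch, env = process.env) {
  const asset = `computer-use-linux-${targetArch}-unknown-linux-gnu`;
  const baseUrl = env.COMPUTER_USE_LINUX_DOWNLOAD_BASE || `${DEFAULT_BASE}/v${version}`;
  return { binary: `${baseUrl}/${asset}`, sha256: `${baseUrl}/${asset}.sha256` };
}

// Compare downloaded bytes against the first field of a sha256sum-style file.
function matchesSha256(data, shaFileText) {
  const expected = shaFileText.trim().split(/\s+/)[0].toLowerCase();
  const actual = crypto.createHash('sha256').update(data).digest('hex');
  return expected === actual;
}
```

Because the base URL comes from `COMPUTER_USE_LINUX_DOWNLOAD_BASE` when it is set, a mirror that still points at the old `avifenesh` location would keep working independently of this change.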
package/package.json CHANGED
@@ -1,16 +1,16 @@
  {
    "name": "@agent-sh/computer-use-linux",
-   "version": "0.2.1",
+   "version": "0.2.2",
    "description": "Linux desktop-control MCP server: AT-SPI accessibility trees, Wayland/X11 input, screenshots, and compositor window targeting.",
    "license": "MIT",
    "type": "commonjs",
-   "homepage": "https://github.com/avifenesh/computer-use-linux#readme",
+   "homepage": "https://github.com/agent-sh/computer-use-linux#readme",
    "repository": {
      "type": "git",
-     "url": "git+https://github.com/avifenesh/computer-use-linux.git"
+     "url": "git+https://github.com/agent-sh/computer-use-linux.git"
    },
    "bugs": {
-     "url": "https://github.com/avifenesh/computer-use-linux/issues"
+     "url": "https://github.com/agent-sh/computer-use-linux/issues"
    },
    "keywords": [
      "mcp",
@@ -34,6 +34,7 @@
    "files": [
      "LICENSE",
      "README.md",
+     "skills/computer-use-linux/SKILL.md",
      "npm/README.md",
      "npm/bin/computer-use-linux.js",
      "npm/install.js"
package/skills/computer-use-linux/SKILL.md ADDED
@@ -0,0 +1,125 @@
+ ---
+ name: computer-use-linux
+ description: "Use when Hermes needs Linux desktop observation or control through the computer-use-linux MCP server."
+ version: 0.2.1
+ author: agent-sh
+ license: MIT
+ platforms: [linux]
+ ---
+
+ # computer-use-linux
+
+ Use `computer-use-linux` when Hermes needs to observe or operate a local Linux desktop through MCP: inspect the accessibility tree, list/focus windows, take screenshots, click, scroll, type, press keys, or invoke AT-SPI actions.
+
+ ## When to Use
+
+ Use this skill when:
+ - The user wants Hermes to control a Linux GUI app.
+ - You need desktop state from AT-SPI, screenshots, or compositor window metadata.
+ - You are configuring the `computer-use-linux` MCP server for Hermes.
+ - A desktop action needs target-aware input instead of blind shell commands.
+
+ Do not use this for remote browsers, websites, or headless automation when a browser-specific tool is available. Do not assume desktop actions are safe just because the MCP connection works.
+
+ ## Install
+
+ Preferred install for Hermes users:
+
+ ```bash
+ npm install -g @agent-sh/computer-use-linux@0.2.1
+ computer-use-linux doctor | jq .readiness
+ ```
+
+ Rust users can install the same server from crates.io:
+
+ ```bash
+ cargo install computer-use-linux --version 0.2.1
+ computer-use-linux doctor | jq .readiness
+ ```
+
+ If `doctor` reports missing input or accessibility support, run:
+
+ ```bash
+ computer-use-linux setup
+ systemctl --user enable --now ydotoold
+ computer-use-linux setup-window-targeting
+ computer-use-linux doctor | jq .readiness
+ ```
+
49
+ On GNOME Wayland, log out and back in after `setup-window-targeting` if the GNOME Shell extension was newly installed.
50
+
51
+ ## Configure Hermes
52
+
53
+ Add the server with the Hermes MCP CLI:
54
+
55
+ ```bash
56
+ hermes mcp add computer-use-linux --command computer-use-linux --args mcp
57
+ hermes mcp test computer-use-linux
58
+ hermes mcp configure computer-use-linux
59
+ ```
60
+
61
+ `configure` opens Hermes' tool-selection UI for this MCP server.
62
+
63
+ The generated config should look like this:
64
+
65
+ ```yaml
66
+ mcp_servers:
67
+ computer-use-linux:
68
+ command: computer-use-linux
69
+ args: ["mcp"]
70
+ timeout: 120
71
+ connect_timeout: 30
72
+ ```
73
+
74
+ If the binary is not on `PATH`, pass the absolute path to `--command`.
75
+
76
+ Hermes registers tools using the `mcp_<server>_<tool>` pattern. With this config, tool names are prefixed as `mcp_computer_use_linux_`, for example:
77
+
78
+ | MCP tool | Hermes tool name |
79
+ | --- | --- |
80
+ | `doctor` | `mcp_computer_use_linux_doctor` |
81
+ | `get_app_state` | `mcp_computer_use_linux_get_app_state` |
82
+ | `list_windows` | `mcp_computer_use_linux_list_windows` |
83
+ | `click` | `mcp_computer_use_linux_click` |
84
+ | `type_text` | `mcp_computer_use_linux_type_text` |
85
+
86
+ Restart Hermes after changing MCP config.
87
+
+ ## Procedure
+
+ 1. Start every desktop-control session with `doctor`.
+ 2. If `can_build_accessibility_tree` is false, run `setup` and restart the target app.
+ 3. If `can_query_windows` is false on GNOME Wayland, run `setup-window-targeting` and ask the user to log out and back in if setup says the shell extension needs a reload.
+ 4. Before targeted input, call `list_windows` or `focused_window` and verify the intended window by title, app id, pid, or wm class.
+ 5. Prefer semantic targeting from `get_app_state`: use element indices or role/name/text/states selectors.
+ 6. Use coordinates only when the UI surface has no useful accessibility tree.
+ 7. For text input, prefer `type_text` with a target selector (`window_id`, `pid`, `app_id`, `wm_class`, `title`, `tty`, `terminal_pid`, `terminal_command`, or `terminal_cwd`) rather than relying on current focus.
+ 8. After mutating actions, re-check state with `get_app_state`, `focused_window`, or an app-specific readback.
+
+ ## Pitfalls
+
+ - Already-running GTK, Qt, and Electron apps may need a restart after AT-SPI is enabled.
+ - GNOME may show a portal prompt on the first screenshot or `get_app_state` call with screenshots enabled.
+ - Desktop input is stateful. Avoid concurrent tool calls against this MCP server.
+ - `click`, `drag`, `press_key`, `type_text`, `perform_action`, and `set_value` can change real application state.
+ - `ydotoold` should run as a per-user service with its socket under `/run/user/$UID`, not as a system-wide service.
+ - On COSMIC, the standard npm, Cargo, and install-script paths install the `computer-use-linux-cosmic` helper automatically. Manual binary installs must copy both binaries.
+
+ ## Verification
+
+ Run:
+
+ ```bash
+ computer-use-linux doctor | jq .readiness
+ hermes chat --toolsets mcp-computer-use-linux -q "List the current desktop windows."
+ ```
+
+ Ready output should have:
+
+ - `can_register_mcp_tools: true`
+ - `can_build_accessibility_tree: true`
+ - `can_query_windows: true`
+ - `can_send_development_input: true`
+ - `blockers: []`
+
+ If Hermes does not expose the tools, check startup logs for MCP discovery errors and confirm the server name in `config.yaml` is exactly `computer-use-linux`.
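
Reviewer note: the "Ready output" list in the new skill file can be turned into a scripted check. The sketch below assumes the `readiness` object printed by `computer-use-linux doctor | jq .readiness` carries exactly the five keys the skill lists. `readinessProblems` is a hypothetical helper, and the shape of individual `blockers` entries is not shown in this diff, so they are only stringified.

```javascript
// Minimal readiness check, assuming the field names listed in SKILL.md.
const REQUIRED_FLAGS = [
  'can_register_mcp_tools',
  'can_build_accessibility_tree',
  'can_query_windows',
  'can_send_development_input',
];

// Return a list of human-readable problems; an empty list means "ready".
function readinessProblems(readiness) {
  const problems = REQUIRED_FLAGS.filter((flag) => readiness[flag] !== true);
  if (!Array.isArray(readiness.blockers)) {
    problems.push('blockers (missing or not an array)');
  } else {
    // Blocker entry shape is unknown here, so report each one as JSON.
    for (const blocker of readiness.blockers) {
      problems.push(`blocker: ${JSON.stringify(blocker)}`);
    }
  }
  return problems;
}
```

A caller could parse the `jq .readiness` output with `JSON.parse` and refuse to start a desktop-control session while `readinessProblems(...)` is non-empty, which matches step 1 of the skill's procedure.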