native-devtools-mcp 0.3.6 → 0.4.0
- package/README.md +97 -10
- package/package.json +13 -4
package/README.md
CHANGED
@@ -4,16 +4,18 @@
 
 
 
-
+
 
 
-**Give your AI agent "eyes" and "hands" for native desktop applications.**
+**Give your AI agent "eyes" and "hands" for native desktop and mobile applications.**
 
-A Model Context Protocol (MCP) server that provides **Computer Use** capabilities: screenshots, OCR, input simulation, and window management.
+A Model Context Protocol (MCP) server that provides **Computer Use** capabilities: screenshots, OCR, input simulation, and window management — for **native desktop apps** and **Android devices**, not just browsers.
 
-
+**Works with:** [Claude Desktop](https://claude.ai/download) • [Claude Code](https://docs.anthropic.com/en/docs/claude-code) • [Cursor](https://cursor.com) • Any MCP-compatible client
 
-[
+[//]: # "Search keywords: MCP, MCP server, Model Context Protocol, computer use, desktop automation, UI automation, native app testing, test automation, e2e testing, RPA, screenshots, OCR, template matching, accessibility, mouse, keyboard, screen reading, macOS, Windows, Android, ADB, mobile testing, Claude, Claude Code, Cursor, AI agent, native-devtools-mcp"
+
+[Features](#-features) • [Installation](#-installation) • [For AI Agents](#-for-ai-agents-llms) • [Android](#-android-support) • [Permissions](#-required-permissions-macos)
 
 <table>
 <tr>
@@ -30,10 +32,6 @@ A Model Context Protocol (MCP) server that provides **Computer Use** capabilitie
 
 ---
 
-## 🔍 Search Keywords
-
-MCP, Model Context Protocol, computer use, desktop automation, UI automation, RPA, screenshots, OCR, screen reading, mouse, keyboard, macOS, Windows, native-devtools-mcp.
-
 ## 🚀 Features
 
 - **👀 Computer Vision:** Capture screenshots of screens, windows, or specific regions. Includes built-in OCR (text recognition) to "read" the screen.
@@ -41,6 +39,7 @@ MCP, Model Context Protocol, computer use, desktop automation, UI automation, RP
 - **🪟 Window Management:** List open windows, find applications, and bring them to focus.
 - **🧩 Template Matching:** Find non-text UI elements (icons, shapes) using `load_image` + `find_image`, returning precise click coordinates.
 - **🔒 Local & Private:** 100% local execution. No screenshots or data are ever sent to external servers.
+- **📱 Android Support:** Connect to Android devices over ADB for screenshots, input simulation, UI element search, and app management — all from the same MCP server.
 - **🔌 Dual-Mode Interaction:**
 1. **Visual/Native:** Works with *any* app via screenshots & coordinates (Universal).
 2. **AppDebugKit:** Deep integration for supported apps to inspect the UI tree (DOM-like structure).
@@ -57,7 +56,7 @@ This MCP server is designed to be **highly discoverable and usable** by AI model
 3. `find_text`: A shortcut to find text on screen and get its coordinates immediately. Uses the platform **accessibility API** (macOS Accessibility / Windows UI Automation) for precise element-level matching, with OCR fallback.
 4. `load_image` / `find_image`: Template matching for non-text UI elements (icons, shapes), returning screen coordinates for clicking.
 
-## 📦 Installation
+## 📦 Installation
 
 The install steps are identical on macOS and Windows.
 
@@ -84,6 +83,12 @@ cd native-devtools-mcp
 cargo build --release
 # Binary: ./target/release/native-devtools-mcp
 ```
+
+To include Android device support, enable the `android` feature flag:
+
+```bash
+cargo build --release --features android
+```
 </details>
 
 ## ⚙️ Configuration
@@ -179,6 +184,77 @@ Use `find_image` when the target is **not text** (icons, toggles, custom control
 
 Optional inputs like `mask_id`, `search_region`, `scales`, and `rotations` can improve precision and performance.
 
+## 📱 Android Support
+
+Android support is available as an optional feature flag. It lets the MCP server communicate with Android devices over ADB (USB or Wi-Fi), providing screenshots, input simulation, UI element search, and app management.
+
+### Prerequisites
+
+1. **ADB installed** on the host machine (`brew install android-platform-tools` on macOS, or install via [Android SDK](https://developer.android.com/tools/releases/platform-tools))
+2. **USB debugging enabled** on the Android device (Settings > Developer options > USB debugging)
+3. **ADB server running** — starts automatically when you run `adb devices`
+
+### Building with Android support
+
+```bash
+cargo build --release --features android
+```
+
+### Android tools
+
+All Android tools are prefixed with `android_` and appear dynamically after connecting to a device:
+
+| Tool | Description |
+|------|-------------|
+| `android_list_devices` | List all ADB-connected devices (always available) |
+| `android_connect` | Connect to a device by serial number |
+| `android_disconnect` | Disconnect from the current device |
+| `android_screenshot` | Capture the device screen |
+| `android_find_text` | Find UI elements by text (via uiautomator) |
+| `android_click` | Tap at screen coordinates |
+| `android_swipe` | Swipe between two points |
+| `android_type_text` | Type text on the device |
+| `android_press_key` | Press a key (e.g., `KEYCODE_HOME`, `KEYCODE_BACK`) |
+| `android_launch_app` | Launch an app by package name |
+| `android_list_apps` | List installed packages |
+| `android_get_display_info` | Get screen resolution and density |
+| `android_get_current_activity` | Get the current foreground activity |
+
+### Typical workflow
+
+```
+android_list_devices → find your device serial
+android_connect(serial="...") → connect (unlocks android_* tools)
+android_screenshot → see what's on screen
+android_find_text(text="OK") → locate a button
+android_click(x=..., y=...) → tap it
+```
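Review note: the workflow added here maps onto MCP `tools/call` requests sent over JSON-RPC 2.0. The sketch below is a minimal illustration in Python, not code from this package. It assumes the argument names shown in the workflow (`serial`, `text`, `x`, `y`); the helpers `tool_call` and `workflow_requests`, the serial `emulator-5554`, and the coordinates are hypothetical. The diff does not show the response format of `android_find_text`, so the tap coordinates are passed in by hand rather than parsed from a result.

```python
import json
from itertools import count

# JSON-RPC request ids only need to be unique within a session.
_ids = count(1)


def tool_call(name, arguments=None):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


def workflow_requests(serial, label, x, y):
    """Return the five requests of the README's typical workflow, in order.

    `x` and `y` stand in for coordinates a client would read from the
    android_find_text result (format not shown in this diff).
    """
    return [
        tool_call("android_list_devices"),
        tool_call("android_connect", {"serial": serial}),
        tool_call("android_screenshot"),
        tool_call("android_find_text", {"text": label}),
        tool_call("android_click", {"x": x, "y": y}),
    ]


if __name__ == "__main__":
    # Hypothetical serial and coordinates, for illustration only.
    for request in workflow_requests("emulator-5554", "OK", 540, 1200):
        print(json.dumps(request))
```

Because the README says the `android_*` tools appear only after connecting, a real client would likely refresh its tool list after `android_connect` succeeds before issuing the remaining calls.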
+
+### Known issues
+
+> **MIUI / HyperOS (Xiaomi, Redmi, POCO devices):** Input injection (`android_click`, `android_type_text`, `android_press_key`, `android_swipe`) and `android_find_text` (via uiautomator) require an additional security toggle:
+>
+> **Settings > Developer options > USB debugging (Security settings)** — enable this toggle. MIUI may require you to sign in with a Mi account to enable it.
+>
+> Without this, you'll see `INJECT_EVENTS permission` errors for input tools and `could not get idle state` errors for `android_find_text`. Screenshot and device info tools work without this toggle.
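Review note: the two error strings quoted in the MIUI note are distinctive enough that a client could turn them into an actionable hint. A minimal sketch in Python, assuming the errors reach the client as plain text containing those substrings; the function `miui_hint` and the table `_SIGNATURES` are hypothetical and not part of this package.

```python
MIUI_HINT = (
    "On MIUI / HyperOS, enable Settings > Developer options > "
    "USB debugging (Security settings)."
)

# Error substrings quoted in the README note, mapped to the tools they affect.
_SIGNATURES = {
    "INJECT_EVENTS permission": "input tools",
    "could not get idle state": "android_find_text",
}


def miui_hint(error_message):
    """Return a hint if the error matches a known MIUI symptom, else None."""
    for needle, affected in _SIGNATURES.items():
        if needle in error_message:
            return f"{affected}: {MIUI_HINT}"
    return None


if __name__ == "__main__":
    print(miui_hint("ERROR: could not get idle state."))
    print(miui_hint("device offline"))
```

Matching on substrings keeps the check independent of whatever prefix the failing command adds around the message; unrelated errors fall through as `None`.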
+
+> **Wireless ADB:** To connect without a USB cable, first connect via USB and run:
+> ```bash
+> adb tcpip 5555
+> adb connect <phone-ip>:5555
+> ```
+> Then use the `<phone-ip>:5555` serial in `android_connect`.
+
+### Smoke tests
+
+Smoke tests verify all Android tools against a real connected device. They are `#[ignore]`d by default and must be run explicitly:
+
+```bash
+cargo test --features android --test android_smoke_tests -- --ignored --test-threads=1
+```
+
+Tests must run sequentially (`--test-threads=1`) since they share a single physical device. The device must be unlocked and awake.
+
 ## 🏗️ Architecture
 
 ```mermaid
@@ -186,6 +262,7 @@ graph TD
 Client[Claude / LLM Client] <-->|JSON-RPC 2.0| Server[native-devtools-mcp]
 Server -->|Direct API| Sys[System APIs]
 Server -->|WebSocket| Debug[AppDebugKit]
+Server -->|ADB Protocol| Android[Android Device]
 
 subgraph "Your Machine"
 Sys -->|Screen/OCR| macOS[CoreGraphics / Vision]
@@ -193,6 +270,12 @@ graph TD
 Sys -->|Text Search| UIA[UI Automation]
 Debug -.->|Inspect| App[Target App]
 end
+
+subgraph "Android Device (USB/Wi-Fi)"
+Android -->|screencap| Screen[Screenshots]
+Android -->|input| Input[Tap / Swipe / Type]
+Android -->|uiautomator| UITree[UI Hierarchy]
+end
 ```
 
 <details>
@@ -208,6 +291,10 @@ graph TD
 | | Input | `SendInput` (Win32) |
 | | Text Search (`find_text`) | `UI Automation` (primary), WinRT OCR (fallback) |
 | | OCR | `Windows.Media.Ocr` (WinRT) |
+| **Android** | Screenshots | `screencap` / ADB framebuffer |
+| | Input | `adb shell input` (tap, swipe, text, keyevent) |
+| | Text Search (`find_text`) | `uiautomator dump` (accessibility tree) |
+| | Device Communication | `adb_client` crate (native Rust ADB protocol) |
 
 ### Screenshot Coordinate Precision
 
package/package.json
CHANGED
@@ -1,7 +1,7 @@
 {
 "name": "native-devtools-mcp",
-"version": "0.
-"description": "MCP server for
+"version": "0.4.0",
+"description": "MCP server for native desktop app testing — screenshot, OCR, click, type, find_text, template matching. macOS & Windows.",
 "license": "MIT",
 "repository": {
 "type": "git",
@@ -10,19 +10,28 @@
 "homepage": "https://github.com/sh3ll3x3c/native-devtools-mcp",
 "keywords": [
 "mcp",
+"mcp-server",
 "model-context-protocol",
 "computer-use",
 "desktop-automation",
 "ui-automation",
+"native-app",
+"test-automation",
+"e2e-testing",
 "rpa",
 "ocr",
 "screenshot",
+"template-matching",
+"accessibility",
+"find-text",
 "screen-reading",
 "mouse",
 "keyboard",
 "ai-agent",
 "llm",
 "claude",
+"claude-code",
+"cursor",
 "gemini",
 "gpt",
 "devtools",
@@ -39,8 +48,8 @@
 "bin"
 ],
 "optionalDependencies": {
-"@sh3ll3x3c/native-devtools-mcp-darwin-arm64": "0.
-"@sh3ll3x3c/native-devtools-mcp-win32-x64": "0.
+"@sh3ll3x3c/native-devtools-mcp-darwin-arm64": "0.4.0",
+"@sh3ll3x3c/native-devtools-mcp-win32-x64": "0.4.0"
 },
 "engines": {
 "node": ">=18"
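Review note: in this release the package version and both platform packages under `optionalDependencies` all read `0.4.0`. Assuming they are meant to stay in lockstep (the diff does not say so explicitly), a release check could be sketched as below. The function `mismatched_optional_deps` is hypothetical, and `MANIFEST` is a trimmed copy of the new `package.json` values shown in the diff.

```python
import json


def mismatched_optional_deps(manifest):
    """Return optional dependencies whose pin differs from the package version."""
    version = manifest["version"]
    deps = manifest.get("optionalDependencies", {})
    return {name: pinned for name, pinned in deps.items() if pinned != version}


# Trimmed to the fields that matter for the check; values taken from the diff.
MANIFEST = json.loads("""
{
  "name": "native-devtools-mcp",
  "version": "0.4.0",
  "optionalDependencies": {
    "@sh3ll3x3c/native-devtools-mcp-darwin-arm64": "0.4.0",
    "@sh3ll3x3c/native-devtools-mcp-win32-x64": "0.4.0"
  }
}
""")

if __name__ == "__main__":
    # An empty dict means every platform package matches the main version.
    print(mismatched_optional_deps(MANIFEST))
```

The check compares pins by exact string equality, which fits exact versions like the ones in this diff; ranges such as `^0.4.0` would need a semver-aware comparison instead.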