native-devtools-mcp 0.3.6 → 0.4.0

Files changed (2)
  1. package/README.md +97 -10
  2. package/package.json +13 -4
package/README.md CHANGED
@@ -4,16 +4,18 @@
 
  ![Version](https://img.shields.io/npm/v/native-devtools-mcp?style=flat-square)
  ![License](https://img.shields.io/npm/l/native-devtools-mcp?style=flat-square)
- ![Platform](https://img.shields.io/badge/platform-macOS%20%7C%20Windows-blue?style=flat-square)
+ ![Platform](https://img.shields.io/badge/platform-macOS%20%7C%20Windows%20%7C%20Android-blue?style=flat-square)
  ![Downloads](https://img.shields.io/npm/dt/native-devtools-mcp?style=flat-square)
 
- **Give your AI agent "eyes" and "hands" for native desktop applications.**
+ **Give your AI agent "eyes" and "hands" for native desktop and mobile applications.**
 
- A Model Context Protocol (MCP) server that provides **Computer Use** capabilities: screenshots, OCR, input simulation, and window management.
+ A Model Context Protocol (MCP) server that provides **Computer Use** capabilities: screenshots, OCR, input simulation, and window management — for **native desktop apps** and **Android devices**, not just browsers.
 
- [//]: # "Search keywords: MCP, Model Context Protocol, computer use, desktop automation, UI automation, RPA, screenshots, OCR, mouse, keyboard, screen reading, macOS, Windows, native-devtools-mcp"
+ **Works with:** [Claude Desktop](https://claude.ai/download) • [Claude Code](https://docs.anthropic.com/en/docs/claude-code) • [Cursor](https://cursor.com) • Any MCP-compatible client
 
- [Features](#-features) • [Installation](#-installation) • [For AI Agents](#-for-ai-agents-llms) • [Permissions](#-required-permissions-macos)
+ [//]: # "Search keywords: MCP, MCP server, Model Context Protocol, computer use, desktop automation, UI automation, native app testing, test automation, e2e testing, RPA, screenshots, OCR, template matching, accessibility, mouse, keyboard, screen reading, macOS, Windows, Android, ADB, mobile testing, Claude, Claude Code, Cursor, AI agent, native-devtools-mcp"
+
+ [Features](#-features) • [Installation](#-installation) • [For AI Agents](#-for-ai-agents-llms) • [Android](#-android-support) • [Permissions](#-required-permissions-macos)
 
  <table>
  <tr>
@@ -30,10 +32,6 @@ A Model Context Protocol (MCP) server that provides **Computer Use** capabilitie
 
  ---
 
- ## 🔍 Search Keywords
-
- MCP, Model Context Protocol, computer use, desktop automation, UI automation, RPA, screenshots, OCR, screen reading, mouse, keyboard, macOS, Windows, native-devtools-mcp.
-
  ## 🚀 Features
 
  - **👀 Computer Vision:** Capture screenshots of screens, windows, or specific regions. Includes built-in OCR (text recognition) to "read" the screen.
@@ -41,6 +39,7 @@ MCP, Model Context Protocol, computer use, desktop automation, UI automation, RP
  - **🪟 Window Management:** List open windows, find applications, and bring them to focus.
  - **🧩 Template Matching:** Find non-text UI elements (icons, shapes) using `load_image` + `find_image`, returning precise click coordinates.
  - **🔒 Local & Private:** 100% local execution. No screenshots or data are ever sent to external servers.
+ - **📱 Android Support:** Connect to Android devices over ADB for screenshots, input simulation, UI element search, and app management — all from the same MCP server.
  - **🔌 Dual-Mode Interaction:**
  1. **Visual/Native:** Works with *any* app via screenshots & coordinates (Universal).
  2. **AppDebugKit:** Deep integration for supported apps to inspect the UI tree (DOM-like structure).
@@ -57,7 +56,7 @@ This MCP server is designed to be **highly discoverable and usable** by AI model
  3. `find_text`: A shortcut to find text on screen and get its coordinates immediately. Uses the platform **accessibility API** (macOS Accessibility / Windows UI Automation) for precise element-level matching, with OCR fallback.
  4. `load_image` / `find_image`: Template matching for non-text UI elements (icons, shapes), returning screen coordinates for clicking.
 
- ## 📦 Installation (macOS + Windows)
+ ## 📦 Installation
 
  The install steps are identical on macOS and Windows.
 
@@ -84,6 +83,12 @@ cd native-devtools-mcp
  cargo build --release
  # Binary: ./target/release/native-devtools-mcp
  ```
+
+ To include Android device support, enable the `android` feature flag:
+
+ ```bash
+ cargo build --release --features android
+ ```
  </details>
 
  ## ⚙️ Configuration
@@ -179,6 +184,77 @@ Use `find_image` when the target is **not text** (icons, toggles, custom control
 
  Optional inputs like `mask_id`, `search_region`, `scales`, and `rotations` can improve precision and performance.
 
+ ## 📱 Android Support
+
+ Android support is available as an optional feature flag. It lets the MCP server communicate with Android devices over ADB (USB or Wi-Fi), providing screenshots, input simulation, UI element search, and app management.
+
+ ### Prerequisites
+
+ 1. **ADB installed** on the host machine (`brew install android-platform-tools` on macOS, or install via [Android SDK](https://developer.android.com/tools/releases/platform-tools))
+ 2. **USB debugging enabled** on the Android device (Settings > Developer options > USB debugging)
+ 3. **ADB server running** — starts automatically when you run `adb devices`
+
+ ### Building with Android support
+
+ ```bash
+ cargo build --release --features android
+ ```
+
+ ### Android tools
+
+ All Android tools are prefixed with `android_` and appear dynamically after connecting to a device:
+
+ | Tool | Description |
+ |------|-------------|
+ | `android_list_devices` | List all ADB-connected devices (always available) |
+ | `android_connect` | Connect to a device by serial number |
+ | `android_disconnect` | Disconnect from the current device |
+ | `android_screenshot` | Capture the device screen |
+ | `android_find_text` | Find UI elements by text (via uiautomator) |
+ | `android_click` | Tap at screen coordinates |
+ | `android_swipe` | Swipe between two points |
+ | `android_type_text` | Type text on the device |
+ | `android_press_key` | Press a key (e.g., `KEYCODE_HOME`, `KEYCODE_BACK`) |
+ | `android_launch_app` | Launch an app by package name |
+ | `android_list_apps` | List installed packages |
+ | `android_get_display_info` | Get screen resolution and density |
+ | `android_get_current_activity` | Get the current foreground activity |
+
+ ### Typical workflow
+
+ ```
+ android_list_devices → find your device serial
+ android_connect(serial="...") → connect (unlocks android_* tools)
+ android_screenshot → see what's on screen
+ android_find_text(text="OK") → locate a button
+ android_click(x=..., y=...) → tap it
+ ```
+
+ ### Known issues
+
+ > **MIUI / HyperOS (Xiaomi, Redmi, POCO devices):** Input injection (`android_click`, `android_type_text`, `android_press_key`, `android_swipe`) and `android_find_text` (via uiautomator) require an additional security toggle:
+ >
+ > **Settings > Developer options > USB debugging (Security settings)** — enable this toggle. MIUI may require you to sign in with a Mi account to enable it.
+ >
+ > Without this, you'll see `INJECT_EVENTS permission` errors for input tools and `could not get idle state` errors for `android_find_text`. Screenshot and device info tools work without this toggle.
+
+ > **Wireless ADB:** To connect without a USB cable, first connect via USB and run:
+ > ```bash
+ > adb tcpip 5555
+ > adb connect <phone-ip>:5555
+ > ```
+ > Then use the `<phone-ip>:5555` serial in `android_connect`.
+
+ ### Smoke tests
+
+ Smoke tests verify all Android tools against a real connected device. They are `#[ignore]`d by default and must be run explicitly:
+
+ ```bash
+ cargo test --features android --test android_smoke_tests -- --ignored --test-threads=1
+ ```
+
+ Tests must run sequentially (`--test-threads=1`) since they share a single physical device. The device must be unlocked and awake.
+
  ## 🏗️ Architecture
 
  ```mermaid
@@ -186,6 +262,7 @@ graph TD
  Client[Claude / LLM Client] <-->|JSON-RPC 2.0| Server[native-devtools-mcp]
  Server -->|Direct API| Sys[System APIs]
  Server -->|WebSocket| Debug[AppDebugKit]
+ Server -->|ADB Protocol| Android[Android Device]
 
  subgraph "Your Machine"
  Sys -->|Screen/OCR| macOS[CoreGraphics / Vision]
@@ -193,6 +270,12 @@ graph TD
  Sys -->|Text Search| UIA[UI Automation]
  Debug -.->|Inspect| App[Target App]
  end
+
+ subgraph "Android Device (USB/Wi-Fi)"
+ Android -->|screencap| Screen[Screenshots]
+ Android -->|input| Input[Tap / Swipe / Type]
+ Android -->|uiautomator| UITree[UI Hierarchy]
+ end
  ```
 
  <details>
@@ -208,6 +291,10 @@ graph TD
  | | Input | `SendInput` (Win32) |
  | | Text Search (`find_text`) | `UI Automation` (primary), WinRT OCR (fallback) |
  | | OCR | `Windows.Media.Ocr` (WinRT) |
+ | **Android** | Screenshots | `screencap` / ADB framebuffer |
+ | | Input | `adb shell input` (tap, swipe, text, keyevent) |
+ | | Text Search (`find_text`) | `uiautomator dump` (accessibility tree) |
+ | | Device Communication | `adb_client` crate (native Rust ADB protocol) |
 
  ### Screenshot Coordinate Precision
 
package/package.json CHANGED
@@ -1,7 +1,7 @@
  {
  "name": "native-devtools-mcp",
- "version": "0.3.6",
- "description": "MCP server for computer-use / desktop automation of native apps (screenshots, OCR, input)",
+ "version": "0.4.0",
+ "description": "MCP server for native desktop app testing screenshot, OCR, click, type, find_text, template matching. macOS & Windows.",
  "license": "MIT",
  "repository": {
  "type": "git",
@@ -10,19 +10,28 @@
  "homepage": "https://github.com/sh3ll3x3c/native-devtools-mcp",
  "keywords": [
  "mcp",
+ "mcp-server",
  "model-context-protocol",
  "computer-use",
  "desktop-automation",
  "ui-automation",
+ "native-app",
+ "test-automation",
+ "e2e-testing",
  "rpa",
  "ocr",
  "screenshot",
+ "template-matching",
+ "accessibility",
+ "find-text",
  "screen-reading",
  "mouse",
  "keyboard",
  "ai-agent",
  "llm",
  "claude",
+ "claude-code",
+ "cursor",
  "gemini",
  "gpt",
  "devtools",
@@ -39,8 +48,8 @@
  "bin"
  ],
  "optionalDependencies": {
- "@sh3ll3x3c/native-devtools-mcp-darwin-arm64": "0.3.6",
- "@sh3ll3x3c/native-devtools-mcp-win32-x64": "0.3.6"
+ "@sh3ll3x3c/native-devtools-mcp-darwin-arm64": "0.4.0",
+ "@sh3ll3x3c/native-devtools-mcp-win32-x64": "0.4.0"
  },
  "engines": {
  "node": ">=18"