native-devtools-mcp 0.5.1 → 0.7.0

Files changed (2)
  1. package/README.md +38 -8
  2. package/package.json +3 -3
package/README.md CHANGED
@@ -1,10 +1,10 @@
  # native-devtools-mcp
 
- `native-devtools-mcp` is a Model Context Protocol (MCP) server for computer use on macOS, Windows, and Android. It gives AI agents and MCP clients direct control over native desktop apps and Android devices through screenshots, OCR, accessibility-based text lookup, input simulation, window management, and ADB.
+ `native-devtools-mcp` is a Model Context Protocol (MCP) server for computer use on macOS, Windows, and Android. It gives AI agents and MCP clients direct control over native desktop apps, Chrome/Electron browsers, and Android devices through screenshots, OCR, accessibility-based text lookup, input simulation, window management, Chrome DevTools Protocol (CDP), and ADB.
 
- Use it when browser-only automation is not enough: Electron apps, system dialogs, desktop tools, native app testing, and Android device workflows. It works with [Claude Desktop](https://claude.ai/download), [Claude Code](https://docs.anthropic.com/en/docs/claude-code), [Cursor](https://cursor.com), and other MCP-compatible clients.
+ Use it when browser-only automation is not enough: Electron apps (Signal, Discord, VS Code), Chrome browser automation, system dialogs, desktop tools, native app testing, and Android device workflows. It works with [Claude Desktop](https://claude.ai/download), [Claude Code](https://docs.anthropic.com/en/docs/claude-code), [Cursor](https://cursor.com), and other MCP-compatible clients.
 
- Useful for MCP-based computer use, desktop automation, UI automation, native app testing, e2e testing, RPA, screen reading, mouse and keyboard control, and Android device automation.
+ Useful for MCP-based computer use, desktop automation, browser automation, UI automation, native app testing, e2e testing, RPA, screen reading, mouse and keyboard control, Chrome DevTools Protocol automation, and Android device automation.
 
  ```bash
  npx -y native-devtools-mcp
@@ -15,6 +15,7 @@ npx -y native-devtools-mcp
  - `click`, `type_text`, `scroll`, `launch_app`, `quit_app`, and window management
  - `element_at_point` for inspecting accessible UI elements at screen coordinates
  - `load_image` + `find_image` for non-text UI elements such as icons and custom controls
+ - Chrome/Electron automation via CDP: snapshots, click, fill, navigate, type, and tab management
  - Android screenshots, text lookup, input, and app control over ADB
  - Local execution: screenshots and input stay on the machine
 
@@ -25,7 +26,7 @@ npx -y native-devtools-mcp
  ![Platform](https://img.shields.io/badge/platform-macOS%20%7C%20Windows%20%7C%20Android-blue?style=flat-square)
  ![Downloads](https://img.shields.io/npm/dt/native-devtools-mcp?style=flat-square)
 
- [Features](#-features) • [Installation](#-installation) • [Getting Started](#-getting-started) • [Recipes](#-recipes-and-examples) • [Security & Trust](#-security--trust) • [For AI Agents](#-for-ai-agents-llms) • [Android](#-android-support)
+ [Features](#-features) • [Installation](#-installation) • [Getting Started](#-getting-started) • [Recipes](#-recipes-and-examples) • [Security & Trust](#-security--trust) • [For AI Agents](#-for-ai-agents-llms) • [Chrome/Electron (CDP)](#-browser-automation-cdp) • [Android](#-android-support)
 
  <div align="center">
  <table>
@@ -50,10 +51,12 @@ npx -y native-devtools-mcp
  - **🧩 Template Matching:** Find non-text UI elements (icons, shapes) using `load_image` + `find_image`, returning precise click coordinates.
  - **🔒 Local & Private:** 100% local execution. No screenshots or data are ever sent to external servers.
  - **📱 Android Support:** Connect to Android devices over ADB for screenshots, input simulation, UI element search, and app management — all from the same MCP server.
- - **🔍 Hover Tracking:** Track cursor hover transitions across UI elements in real-time. Configurable dwell threshold filters pass-through noise — designed for LLMs observing user navigation patterns. macOS only.
+ - **🔍 Hover Tracking:** Track cursor hover transitions across UI elements in real-time. Configurable dwell threshold filters pass-through noise — designed for LLMs observing user navigation patterns.
+ - **🌐 Browser Automation (CDP):** Connect to Chrome/Electron apps via Chrome DevTools Protocol. Take accessibility tree snapshots, click elements by UID, evaluate JavaScript, and manage tabs — all without a separate Node.js server.
  - **🔌 Dual-Mode Interaction:**
  1. **Visual/Native:** Works with *any* app via screenshots & coordinates (Universal).
  2. **AppDebugKit:** Deep integration for supported apps to inspect the UI tree (DOM-like structure).
+ 3. **CDP:** Connect to Chrome/Electron via `--remote-debugging-port` for DOM-level element targeting and JS evaluation.
 
  ## 🤖 For AI Agents (LLMs)
 
@@ -67,8 +70,10 @@ This MCP server is designed to be **highly discoverable and usable** by AI model
  3. `find_text`: A shortcut to find text on screen and get its coordinates immediately. Uses the platform **accessibility API** (macOS Accessibility / Windows UI Automation) for precise element-level matching, with OCR fallback.
  4. `element_at_point`: Inspect the accessibility element at given screen coordinates — returns name, role, label, value, bounds, pid, and app_name. Note: privacy-focused Electron apps (e.g. Signal) may restrict their AX tree, returning only a container — use `take_screenshot` with OCR as a fallback.
  5. `load_image` / `find_image`: Template matching for non-text UI elements (icons, shapes), returning screen coordinates for clicking.
- 6. `start_hover_tracking` / `get_hover_events` / `stop_hover_tracking`: Track cursor hover transitions across UI elements. Configurable dwell threshold filters pass-throughs. macOS only.
- 7. `launch_app` / `quit_app`: Launch apps with optional CLI args, or gracefully/forcefully quit them.
+ 6. `start_hover_tracking` / `get_hover_events` / `stop_hover_tracking`: Track cursor hover transitions across UI elements. Configurable dwell threshold filters pass-throughs.
+ 7. `start_recording` / `stop_recording`: Record the frontmost app's window at ~5fps as timestamped JPEG frames. Automatically follows app switches.
+ 8. `launch_app` / `quit_app`: Launch apps with optional CLI args, or gracefully/forcefully quit them.
+ 9. `cdp_connect` / `cdp_take_snapshot` / `cdp_click` / `cdp_fill` / `cdp_navigate`: Connect to Chrome or Electron apps via CDP for DOM-level automation — snapshots, clicking, typing, navigation, and tab management without a separate Node.js server.
 
  ## 📦 Installation
 
@@ -247,6 +252,27 @@ Use `find_image` when the target is **not text** (icons, toggles, custom control
 
  Optional inputs like `mask_id`, `search_region`, `scales`, and `rotations` can improve precision and performance.
 
+ ## 🌐 Browser Automation (CDP)
+
+ Connect to Chrome or Electron apps via the Chrome DevTools Protocol for DOM-level automation — more reliable than coordinate-based clicking for web content.
+
+ ```bash
+ # Launch Chrome with remote debugging
+ launch_app(app_name="Google Chrome", args=["--remote-debugging-port=9222", "--user-data-dir=/tmp/chrome-profile"])
+
+ # Connect and automate
+ cdp_connect(port=9222)
+ cdp_navigate(url="https://example.com")
+ cdp_take_snapshot() # accessibility tree with element UIDs
+ cdp_fill(uid="10", value="search query")
+ cdp_press_key(key="Enter")
+ cdp_wait_for(text=["Results"])
+ ```
+
+ **16 CDP tools** — click, hover, fill, type, press key, navigate, handle dialogs, manage tabs, evaluate JS, and more. Works with Chrome 136+, Chromium, and Electron apps (Signal, Discord, VS Code, Slack). See [`AGENTS.md`](./AGENTS.md) for full tool reference.
+
+ **Chrome 136+ note:** Requires `--user-data-dir=<path>` alongside `--remote-debugging-port` (Chrome silently ignores the debug port with the default profile). Electron apps only need `--remote-debugging-port`.
+
  ## 📱 Android Support
 
  Android support is built-in. The MCP server communicates with Android devices over ADB (USB or Wi-Fi), providing screenshots, input simulation, UI element search, and app management.
@@ -344,12 +370,15 @@ graph TD
  | | Input | `CGEvent` (CoreGraphics) |
  | | Text Search (`find_text`) | `Accessibility API` (primary), Vision OCR (fallback) |
  | | Element Inspection (`element_at_point`) | `AXUIElementCopyElementAtPosition` + AX tree walk fallback (Accessibility API) |
- | | Hover Tracking (`start_hover_tracking`) | `CGEvent` cursor + Accessibility API polling (macOS only) |
+ | | Hover Tracking (`start_hover_tracking`) | `CGEvent` cursor + Accessibility API polling |
+ | | Screen Recording (`start_recording`) | `CGWindowListCreateImage` at configurable fps |
  | | OCR | `VNRecognizeTextRequest` (Vision Framework) |
  | **Windows** | Screenshots | `BitBlt` (GDI) |
  | | Input | `SendInput` (Win32) |
  | | Text Search (`find_text`) | `UI Automation` (primary), WinRT OCR (fallback) |
  | | Element Inspection (`element_at_point`) | `IUIAutomation::ElementFromPoint` (UI Automation) |
+ | | Hover Tracking (`start_hover_tracking`) | `GetCursorPos` + UI Automation polling |
+ | | Screen Recording (`start_recording`) | `BitBlt` (GDI) at configurable fps |
  | | OCR | `Windows.Media.Ocr` (WinRT) |
  | **Android** | Screenshots | `screencap` / ADB framebuffer |
  | | Input | `adb shell input` (tap, swipe, text, keyevent) |
@@ -390,6 +419,7 @@ Works out of the box on **Windows 10/11**.
  * `find_text` uses **UI Automation (UIA)** as the primary search mechanism, querying the accessibility tree for element names. This is the same accessibility-first approach used on macOS (with the Accessibility API). Falls back to OCR automatically when UIA finds no matches.
  * OCR uses the built-in Windows Media OCR engine (offline).
  * **Note:** Cannot interact with "Run as Administrator" windows unless the MCP server itself is also running as Administrator.
+ * **Screen Recording Performance:** Screen recording uses GDI/BitBlt at configurable fps (default 5). For higher fps requirements or game capture scenarios, DXGI Desktop Duplication API would provide hardware-accelerated capture — this is a planned future upgrade.
 
  ## 📜 License
 
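Reviewer note on the README changes above: the new "Chrome 136+ note" amounts to a small rule for building launch flags. The following is a minimal Python sketch of that rule, not code from the package. The helper name `debug_launch_args` and its parameters are hypothetical. The only flags used are the two named in the README, `--remote-debugging-port` and `--user-data-dir`.

```python
from typing import List, Optional


def debug_launch_args(port: int,
                      user_data_dir: Optional[str] = None,
                      electron: bool = False) -> List[str]:
    """Build launch flags for a CDP-debuggable Chrome or Electron process.

    Hypothetical helper illustrating the README note: Chrome 136+ needs
    --user-data-dir alongside --remote-debugging-port, while Electron
    apps only need the port flag.
    """
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")

    args = [f"--remote-debugging-port={port}"]
    if not electron:
        if not user_data_dir:
            # Per the README, Chrome silently ignores the debug port
            # when it runs with the default profile, so fail loudly here.
            raise ValueError("Chrome 136+ requires an explicit user_data_dir")
        args.append(f"--user-data-dir={user_data_dir}")
    return args


# Same flags as the launch_app(...) call shown in the README diff.
chrome_args = debug_launch_args(9222, "/tmp/chrome-profile")
electron_args = debug_launch_args(9222, electron=True)
```

Raising an error when the profile directory is missing is a design choice of this sketch: since the README says Chrome ignores the port silently, an early failure is easier to diagnose than a later `cdp_connect` that cannot reach the browser.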
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "native-devtools-mcp",
- "version": "0.5.1",
+ "version": "0.7.0",
  "mcpName": "io.github.sh3ll3x3c/native-devtools",
  "description": "MCP server for native app testing — screenshot, OCR, click, type, find_text, template matching. macOS, Windows & Android.",
  "license": "MIT",
@@ -53,8 +53,8 @@
  "bin"
  ],
  "optionalDependencies": {
- "@sh3ll3x3c/native-devtools-mcp-darwin-arm64": "0.5.1",
- "@sh3ll3x3c/native-devtools-mcp-win32-x64": "0.5.1"
+ "@sh3ll3x3c/native-devtools-mcp-darwin-arm64": "0.7.0",
+ "@sh3ll3x3c/native-devtools-mcp-win32-x64": "0.7.0"
  },
  "engines": {
  "node": ">=18"