native-devtools-mcp 0.4.2 → 0.4.4

Files changed (2)
  1. package/README.md +74 -46
  2. package/package.json +3 -3
package/README.md CHANGED
@@ -15,7 +15,7 @@ A Model Context Protocol (MCP) server that provides **Computer Use** capabilitie
15
15
 
16
16
  [//]: # "Search keywords: MCP, MCP server, Model Context Protocol, computer use, desktop automation, UI automation, native app testing, test automation, e2e testing, RPA, screenshots, OCR, template matching, accessibility, mouse, keyboard, screen reading, macOS, Windows, Android, ADB, mobile testing, Claude, Claude Code, Cursor, AI agent, native-devtools-mcp"
17
17
 
18
- [Features](#-features) • [Installation](#-installation) • [For AI Agents](#-for-ai-agents-llms) • [Android](#-android-support) • [Permissions](#-required-permissions-macos)
18
+ [Features](#-features) • [Installation](#-installation) • [Getting Started](#-getting-started) • [Security & Trust](#-security--trust) • [For AI Agents](#-for-ai-agents-llms) • [Android](#-android-support)
19
19
 
20
20
  <table>
21
21
  <tr>
@@ -54,7 +54,8 @@ This MCP server is designed to be **highly discoverable and usable** by AI model
54
54
  1. `take_screenshot`: The "eyes". Returns images + layout metadata + text locations (OCR).
55
55
  2. `click` / `type_text`: The "hands". Interacts with the system based on visual feedback.
56
56
  3. `find_text`: A shortcut to find text on screen and get its coordinates immediately. Uses the platform **accessibility API** (macOS Accessibility / Windows UI Automation) for precise element-level matching, with OCR fallback.
57
- 4. `load_image` / `find_image`: Template matching for non-text UI elements (icons, shapes), returning screen coordinates for clicking.
57
+ 4. `element_at_point`: Inspect the accessibility element at given screen coordinates — returns name, role, label, value, bounds, pid, and app_name. Note: privacy-focused Electron apps (e.g. Signal) may restrict their AX tree, returning only a container — use `take_screenshot` with OCR as a fallback.
58
+ 5. `load_image` / `find_image`: Template matching for non-text UI elements (icons, shapes), returning screen coordinates for clicking.
58
59
 
59
60
  ## 📦 Installation
60
61
 
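Note on the `element_at_point` entry added in this hunk: the README says the tool returns name, role, label, value, bounds, pid, and app_name, and that apps with a restricted AX tree may return only a container, in which case `take_screenshot` with OCR is the suggested fallback. A minimal Python sketch of how a client might make that decision follows. The field names come from the README line; the `ElementInfo` type, the shape of `bounds`, the `CONTAINER_ROLES` set, and the `needs_ocr_fallback` heuristic are assumptions for illustration and are not part of the package.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical set of accessibility roles treated as "only a container".
# The README does not define this list; adjust it for the apps you test.
CONTAINER_ROLES = {"AXGroup", "AXWindow", "AXScrollArea", "AXWebArea"}


@dataclass(frozen=True)
class ElementInfo:
    """Fields the README lists for element_at_point (types are assumed)."""
    name: Optional[str]
    role: Optional[str]
    label: Optional[str]
    value: Optional[str]
    bounds: Optional[Tuple[float, float, float, float]]  # assumed x, y, width, height
    pid: Optional[int]
    app_name: Optional[str]


def needs_ocr_fallback(element: ElementInfo) -> bool:
    """Return True when the element looks like a bare container.

    Heuristic (an assumption): a container role with no name, label,
    or value carries no useful text, so OCR is the better source.
    """
    has_text = any((element.name, element.label, element.value))
    return element.role in CONTAINER_ROLES and not has_text
```

With this heuristic, a result such as an `AXGroup` with no text from a privacy-focused app would trigger the OCR path, while a named button would not.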
@@ -77,6 +78,14 @@ npm install -g native-devtools-mcp
77
78
  <details>
78
79
  <summary>Click to expand build instructions</summary>
79
80
 
81
+ **Using the build script** (clones, builds, and runs setup):
82
+
83
+ ```bash
84
+ curl -fsSL https://raw.githubusercontent.com/sh3ll3x3c/native-devtools-mcp/master/scripts/build-from-source.sh | bash
85
+ ```
86
+
87
+ **Or manually:**
88
+
80
89
  ```bash
81
90
  git clone https://github.com/sh3ll3x3c/native-devtools-mcp
82
91
  cd native-devtools-mcp
@@ -86,17 +95,36 @@ cargo build --release
86
95
 
87
96
  </details>
88
97
 
89
- ## ⚙️ Configuration
98
+ ## 🏁 Getting Started
90
99
 
91
- ### macOS Configuration
100
+ After installing, run the setup wizard:
92
101
 
93
- **Claude Desktop config file:** `~/Library/Application Support/Claude/claude_desktop_config.json`
102
+ ```bash
103
+ npx native-devtools-mcp setup
104
+ ```
105
+
106
+ This will:
107
+ 1. **Check permissions** (macOS) — verifies Accessibility and Screen Recording, opens System Settings if needed
108
+ 2. **Detect your MCP clients** — finds Claude Desktop, Claude Code, Cursor
109
+ 3. **Write the configuration** — generates the correct JSON config and offers to write it for you
110
+
111
+ Then restart your MCP client and you're ready to go.
94
112
 
95
- **Claude Desktop requires the signed app bundle** (npx/npm will not work due to Gatekeeper):
113
+ > **Claude Desktop on macOS** requires the signed app bundle (Gatekeeper blocks npx). Download `NativeDevtools-X.X.X.dmg` from [GitHub Releases](https://github.com/sh3ll3x3c/native-devtools-mcp/releases), drag to `/Applications`, then run setup — it will detect the app and configure Claude Desktop to use it.
96
114
 
97
- 1. Download `NativeDevtools-X.X.X.dmg` from [GitHub Releases](https://github.com/sh3ll3x3c/native-devtools-mcp/releases)
98
- 2. Open the DMG and drag `NativeDevtools.app` to `/Applications`
99
- 3. Configure Claude Desktop:
115
+ > **VS Code, Windsurf, and other clients:** `setup` doesn't auto-detect these yet. Run `setup` for the permission checks, then see the manual configuration below for the JSON config snippet.
116
+
117
+ > **Claude Code tip:** To avoid approving every tool call (clicks, screenshots), add this to `.claude/settings.local.json`:
118
+ > ```json
119
+ > { "permissions": { "allow": ["mcp__native-devtools__*"] } }
120
+ > ```
121
+
122
+ <details>
123
+ <summary><strong>Manual configuration (without setup)</strong></summary>
124
+
125
+ #### macOS — Claude Desktop
126
+
127
+ Config file: `~/Library/Application Support/Claude/claude_desktop_config.json`
100
128
 
101
129
  ```json
102
130
  {
@@ -108,17 +136,11 @@ cargo build --release
108
136
  }
109
137
  ```
110
138
 
111
- 4. Restart Claude Desktop - it will prompt for Screen Recording and Accessibility permissions for NativeDevtools
112
-
113
- > **Note:** Claude Code (CLI) can use either the signed app or npx - both work.
139
+ #### Windows Claude Desktop
114
140
 
115
- ### Windows Configuration
141
+ Config file: `%APPDATA%\Claude\claude_desktop_config.json`
116
142
 
117
- **Claude Desktop config file:** `%APPDATA%\Claude\claude_desktop_config.json`
118
-
119
- ### Configuration JSON (Windows and macOS CLI)
120
-
121
- For Windows (or macOS with Claude Code CLI):
143
+ #### Claude Code, Cursor, and other MCP clients
122
144
 
123
145
  ```json
124
146
  {
@@ -131,22 +153,43 @@ For Windows (or macOS with Claude Code CLI):
131
153
  }
132
154
  ```
133
155
 
134
- > **Note:** Requires Node.js 18+ installed.
156
+ Requires Node.js 18+.
135
157
 
136
- ### For Claude Code (CLI) Users
158
+ </details>
137
159
 
138
- To avoid approving every single tool call (clicks, screenshots), you can add this wildcard permission to your project's settings or global config:
160
+ ## 🔐 Security & Trust
139
161
 
140
- **File:** `.claude/settings.local.json` (or similar)
162
+ This tool requires Accessibility and Screen Recording permissions — that's a lot of trust. Here's how to verify it deserves it.
141
163
 
142
- ```json
143
- {
144
- "permissions": {
145
- "allow": ["mcp__native-devtools__*"]
146
- }
147
- }
164
+ ### Verify your binary
165
+
166
+ ```bash
167
+ native-devtools-mcp verify
168
+ ```
169
+
170
+ Computes the SHA-256 hash of the running binary and checks it against the official checksums published on the [GitHub Releases](https://github.com/sh3ll3x3c/native-devtools-mcp/releases) page. If the hash matches, you're running an unmodified official build.
171
+
172
+ ### Build from source
173
+
174
+ Don't trust pre-built binaries? Build it yourself:
175
+
176
+ ```bash
177
+ curl -fsSL https://raw.githubusercontent.com/sh3ll3x3c/native-devtools-mcp/master/scripts/build-from-source.sh | bash
148
178
  ```
149
179
 
180
+ The script clones the repo, optionally opens it for review before building, compiles the release binary, and runs setup. See [`scripts/build-from-source.sh`](scripts/build-from-source.sh).
181
+
182
+ ### Audit the code
183
+
184
+ [`SECURITY_AUDIT.md`](SECURITY_AUDIT.md) documents exactly which permissions are used, where in the source code, and includes an LLM audit prompt you can paste into any AI model to perform an independent security review.
185
+
186
+ ### What this server does NOT do
187
+
188
+ - **No unsolicited network access** — the server never phones home. Network is only used when the MCP client explicitly invokes `app_connect` (WebSocket to a local debug server) or when you run the `verify` subcommand (fetches checksums from GitHub)
189
+ - **No file scanning** — does not read or index your files. The only file reads are `load_image` (reads a path the MCP client explicitly provides) and short-lived temp files for screenshots (deleted immediately after capture)
190
+ - **No background persistence** — exits when the MCP client disconnects
191
+ - **No data exfiltration** — screenshots are returned to the MCP client via stdout, never stored or transmitted elsewhere
192
+
150
193
  ## 🔍 Two Approaches to Interaction
151
194
 
152
195
  We provide two ways for agents to interact, allowing them to choose the best tool for the job.
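Note on the `verify` subcommand added in this hunk: the README describes it as computing the SHA-256 hash of the running binary and checking it against the checksums published on the GitHub Releases page. The same kind of check can be sketched by hand in Python as below. This is an illustration under stated assumptions, not the package's implementation: the function names are hypothetical, and the published checksum is assumed to be a hex string you have copied from the release page yourself.

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, published_hex: str) -> bool:
    """Compare a file's digest with a published hex checksum.

    `published_hex` is whatever checksum string you obtained from the
    release page; surrounding whitespace and letter case are ignored.
    """
    expected = published_hex.strip().lower()
    return hmac.compare_digest(sha256_of_file(path), expected)
```

A mismatch only tells you the file differs from the release artifact you compared against; it does not say why.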
@@ -275,10 +318,12 @@ graph TD
275
318
  | **macOS** | Screenshots | `screencapture` (CLI) |
276
319
  | | Input | `CGEvent` (CoreGraphics) |
277
320
  | | Text Search (`find_text`) | `Accessibility API` (primary), Vision OCR (fallback) |
321
+ | | Element Inspection (`element_at_point`) | `AXUIElementCopyElementAtPosition` + AX tree walk fallback (Accessibility API) |
278
322
  | | OCR | `VNRecognizeTextRequest` (Vision Framework) |
279
323
  | **Windows** | Screenshots | `BitBlt` (GDI) |
280
324
  | | Input | `SendInput` (Win32) |
281
325
  | | Text Search (`find_text`) | `UI Automation` (primary), WinRT OCR (fallback) |
326
+ | | Element Inspection (`element_at_point`) | `IUIAutomation::ElementFromPoint` (UI Automation) |
282
327
  | | OCR | `Windows.Media.Ocr` (WinRT) |
283
328
  | **Android** | Screenshots | `screencap` / ADB framebuffer |
284
329
  | | Input | `adb shell input` (tap, swipe, text, keyevent) |
@@ -306,29 +351,12 @@ screen_y = screenshot_origin_y + (pixel_y / screenshot_scale)
306
351
 
307
352
  </details>
308
353
 
309
- ## 🛡️ Privacy, Safety & Best Practices
354
+ ## ⚠️ Operational Safety
310
355
 
311
- ### 🔒 Privacy First
312
- * **100% Local:** All processing (screenshots, OCR, logic) happens on your device.
313
- * **No Cloud:** Images are never uploaded to any third-party server by this tool.
314
- * **Open Source:** You can inspect the code to verify exactly what it does.
315
-
316
- ### ⚠️ Operational Safety
317
356
  * **Hands Off:** When the agent is "driving" (clicking/typing), **do not move your mouse or type**.
318
357
  * *Why?* Real hardware inputs can conflict with the simulated ones, causing clicks to land in the wrong place.
319
358
  * **Focus Matters:** Ensure the window you want the agent to use is visible. If a popup steals focus, the agent might type into the wrong window unless it checks first.
320
359
 
321
- ## 🔐 Required Permissions (macOS)
322
-
323
- On macOS, you must grant permissions to the **host application** (e.g., Terminal, VS Code, Claude Desktop) to allow screen recording and input control.
324
-
325
- 1. **Screen Recording:** Required for `take_screenshot`.
326
- * *System Settings > Privacy & Security > Screen Recording*
327
- 2. **Accessibility:** Required for `click`, `type_text`, `scroll`.
328
- * *System Settings > Privacy & Security > Accessibility*
329
-
330
- > **Restart Required:** After granting permissions, you must fully quit and restart the host application.
331
-
332
360
  ## 🪟 Windows Notes
333
361
 
334
362
  Works out of the box on **Windows 10/11**.
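Note on the last README hunk: its header quotes the README's coordinate formula, `screen_y = screenshot_origin_y + (pixel_y / screenshot_scale)`. A small Python sketch of that conversion is below. Only the y formula appears in this diff; the matching x formula, the `ScreenshotMeta` type, and its field names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ScreenshotMeta:
    """Hypothetical container for the screenshot metadata in the formula."""
    origin_x: float  # assumed counterpart of screenshot_origin_y
    origin_y: float  # screenshot_origin_y in the README formula
    scale: float     # screenshot_scale in the README formula


def pixel_to_screen(meta: ScreenshotMeta, pixel_x: float, pixel_y: float) -> Tuple[float, float]:
    """Map a pixel in the screenshot to screen coordinates.

    The y line follows the README formula; the x line assumes the
    same relationship holds on the horizontal axis.
    """
    if meta.scale <= 0:
        raise ValueError("scale must be positive")
    screen_x = meta.origin_x + (pixel_x / meta.scale)
    screen_y = meta.origin_y + (pixel_y / meta.scale)
    return (screen_x, screen_y)
```

For example, with an origin of (100, 50) and a scale of 2.0, the pixel (400, 300) maps to the screen point (300.0, 200.0).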
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "native-devtools-mcp",
3
- "version": "0.4.2",
3
+ "version": "0.4.4",
4
4
  "mcpName": "io.github.sh3ll3x3c/native-devtools",
5
5
  "description": "MCP server for native app testing — screenshot, OCR, click, type, find_text, template matching. macOS, Windows & Android.",
6
6
  "license": "MIT",
@@ -53,8 +53,8 @@
53
53
  "bin"
54
54
  ],
55
55
  "optionalDependencies": {
56
- "@sh3ll3x3c/native-devtools-mcp-darwin-arm64": "0.4.2",
57
- "@sh3ll3x3c/native-devtools-mcp-win32-x64": "0.4.2"
56
+ "@sh3ll3x3c/native-devtools-mcp-darwin-arm64": "0.4.4",
57
+ "@sh3ll3x3c/native-devtools-mcp-win32-x64": "0.4.4"
58
58
  },
59
59
  "engines": {
60
60
  "node": ">=18"