native-devtools-mcp 0.2.1 → 0.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (2)
  1. package/README.md +210 -0
  2. package/package.json +3 -3
package/README.md ADDED
@@ -0,0 +1,210 @@
+ # native-devtools-mcp
+
+ <div align="center">
+
+ ![Version](https://img.shields.io/npm/v/native-devtools-mcp?style=flat-square)
+ ![License](https://img.shields.io/npm/l/native-devtools-mcp?style=flat-square)
+ ![Platform](https://img.shields.io/badge/platform-macOS%20%7C%20Windows-blue?style=flat-square)
+ ![Downloads](https://img.shields.io/npm/dt/native-devtools-mcp?style=flat-square)
+
+ **Give your AI agent "eyes" and "hands" for native desktop applications.**
+
+ A Model Context Protocol (MCP) server that provides **Computer Use** capabilities: screenshots, OCR, input simulation, and window management.
+
+ [Features](#-features) • [Installation](#-installation) • [For AI Agents](#-for-ai-agents-llms) • [Permissions](#-required-permissions-macos)
+
+ ![Demo](demo.gif)
+
+ </div>
+
+ ---
+
+ ## 🚀 Features
+
+ - **👀 Computer Vision:** Capture screenshots of screens, windows, or specific regions. Includes built-in OCR (text recognition) to "read" the screen.
+ - **🖱️ Input Simulation:** Click, drag, scroll, and type text naturally. Supports global coordinates and window-relative actions.
+ - **🪟 Window Management:** List open windows, find applications, and bring them to focus.
+ - **🔒 Local & Private:** 100% local execution. No screenshots or data are ever sent to external servers.
+ - **🔌 Dual-Mode Interaction:**
+ 1. **Visual/Native:** Works with *any* app via screenshots & coordinates (Universal).
+ 2. **AppDebugKit:** Deep integration for supported apps to inspect the UI tree (DOM-like structure).
+
+ ## 🤖 For AI Agents (LLMs)
+
+ This MCP server is designed to be **highly discoverable and usable** by AI models (Claude, Gemini, GPT).
+
+ - **[📄 Read `agents.md`](./agents.md):** A compact, token-optimized technical reference designed specifically for ingestion by LLMs. It contains intent definitions, schema examples, and reasoning patterns.
+
+ **Core Capabilities for System Prompts:**
+ 1. `take_screenshot`: The "eyes". Returns images + layout metadata + text locations (OCR).
+ 2. `click` / `type_text`: The "hands". Interacts with the system based on visual feedback.
+ 3. `find_text`: A shortcut to find text on screen and get its coordinates immediately.
+
+ ## 📦 Installation (macOS + Windows)
+
+ The install steps are identical on macOS and Windows.
+
+ ### Option 1: Run with `npx` (no install needed)
+
+ ```bash
+ npx -y native-devtools-mcp
+ ```
+
+ ### Option 2: Global install
+
+ ```bash
+ npm install -g native-devtools-mcp
+ ```
+
+ ### Option 3: Build from source (Rust)
+
+ <details>
+ <summary>Click to expand build instructions</summary>
+
+ ```bash
+ git clone https://github.com/sh3ll3x3c/native-devtools-mcp
+ cd native-devtools-mcp
+ cargo build --release
+ # Binary: ./target/release/native-devtools-mcp
+ ```
+ </details>
+
+ ## ⚙️ Configuration
+
+ ### macOS Configuration
+
+ **Claude Desktop config file:** `~/Library/Application Support/Claude/claude_desktop_config.json`
+
+ **Claude Desktop requires the signed app bundle** (npx/npm will not work due to Gatekeeper):
+
+ 1. Download `NativeDevtools-X.X.X.dmg` from [GitHub Releases](https://github.com/sh3ll3x3c/native-devtools-mcp/releases)
+ 2. Open the DMG and drag `NativeDevtools.app` to `/Applications`
+ 3. Configure Claude Desktop:
+
+ ```json
+ {
+ "mcpServers": {
+ "native-devtools": {
+ "command": "/Applications/NativeDevtools.app/Contents/MacOS/native-devtools-mcp"
+ }
+ }
+ }
+ ```
+
+ 4. Restart Claude Desktop - it will prompt for Screen Recording and Accessibility permissions for NativeDevtools
+
+ > **Note:** Claude Code (CLI) can use either the signed app or npx - both work.
+
+ ### Windows Configuration
+
+ **Claude Desktop config file:** `%APPDATA%\Claude\claude_desktop_config.json`
+
+ ### Configuration JSON (Windows and macOS CLI)
+
+ For Windows (or macOS with Claude Code CLI):
+
+ ```json
+ {
+ "mcpServers": {
+ "native-devtools": {
+ "command": "npx",
+ "args": ["-y", "native-devtools-mcp"]
+ }
+ }
+ }
+ ```
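
The two configuration snippets above differ only in the `command` (and optional `args`) of the `native-devtools` entry. The following is a minimal Python sketch, not part of the package, of merging that entry into an existing config without disturbing other servers. The helper name `add_native_devtools` and the `other-server` entry are hypothetical and exist only for illustration.

```python
import json


def add_native_devtools(config, command="npx", args=("-y", "native-devtools-mcp")):
    """Return a copy of `config` with a `native-devtools` entry under `mcpServers`.

    Hypothetical helper: the defaults mirror the npx snippet shown in the README.
    """
    merged = json.loads(json.dumps(config))  # deep copy via a JSON round trip
    servers = merged.setdefault("mcpServers", {})
    entry = {"command": command}
    if args:
        entry["args"] = list(args)
    servers["native-devtools"] = entry
    return merged


# An existing config with an unrelated (made-up) server entry.
existing = {"mcpServers": {"other-server": {"command": "other"}}}
updated = add_native_devtools(existing)
print(json.dumps(updated, indent=2))
```

For the signed macOS app bundle, the same helper could be called with `command="/Applications/NativeDevtools.app/Contents/MacOS/native-devtools-mcp"` and `args=()`, which yields an entry with no `args` key.
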
+
+ > **Note:** Requires Node.js 18+ installed.
+
+ ### For Claude Code (CLI) Users
+
+ To avoid approving every single tool call (clicks, screenshots), you can add this wildcard permission to your project's settings or global config:
+
+ **File:** `.claude/settings.local.json` (or similar)
+
+ ```json
+ {
+ "permissions": {
+ "allow": ["mcp__native-devtools__*"]
+ }
+ }
+ ```
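
To see what the wildcard above would cover, here is a small Python sketch. It assumes that tool identifiers take the form `mcp__<server>__<tool>` and that `*` behaves like a shell-style glob. Claude Code's actual matcher may differ, and `is_allowed` is a hypothetical name.

```python
from fnmatch import fnmatchcase

# The allow list from the settings snippet above.
ALLOW = ["mcp__native-devtools__*"]


def is_allowed(tool_name, allow=ALLOW):
    """Return True if `tool_name` matches any pattern in the allow list.

    Assumption: patterns are treated as case-sensitive shell-style globs.
    """
    return any(fnmatchcase(tool_name, pattern) for pattern in allow)


for name in ("mcp__native-devtools__click", "mcp__other-server__click"):
    print(name, is_allowed(name))
```

Under these assumptions, every tool exposed by the `native-devtools` server is pre-approved, while tools from other servers still require confirmation.
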
+
+ ## 🔍 Two Approaches to Interaction
+
+ We provide two ways for agents to interact, allowing them to choose the best tool for the job.
+
+ ### 1. The "Visual" Approach (Universal)
+ **Best for:** 99% of apps (Electron, Qt, Games, Browsers).
+ * **How it works:** The agent takes a screenshot, analyzes it visually (or uses OCR), and clicks at coordinates.
+ * **Tools:** `take_screenshot`, `find_text`, `click`, `type_text`.
+ * **Example:** "Click the button that looks like a gear icon."
+
+ ### 2. The "Structural" Approach (AppDebugKit)
+ **Best for:** Apps specifically instrumented with our AppDebugKit library (mostly for developers testing their own apps).
+ * **How it works:** The agent connects to a debug port and queries the UI tree (like HTML DOM).
+ * **Tools:** `app_connect`, `app_query`, `app_click`.
+ * **Example:** `app_click(element_id="submit-button")`.
+
+ ## 🏗️ Architecture
+
+ ```mermaid
+ graph TD
+ Client[Claude / LLM Client] <-->|JSON-RPC 2.0| Server[native-devtools-mcp]
+ Server -->|Direct API| Sys[System APIs]
+ Server -->|WebSocket| Debug[AppDebugKit]
+
+ subgraph "Your Machine"
+ Sys -->|Screen/OCR| macOS[CoreGraphics / Vision]
+ Sys -->|Input| Win[Win32 / SendInput]
+ Debug -.->|Inspect| App[Target App]
+ end
+ ```
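
The diagram labels the client/server link as JSON-RPC 2.0. In MCP, a tool invocation is a `tools/call` request whose `params` carry the tool `name` and an `arguments` object. The sketch below shows roughly what a client might send for `find_text`. The argument name `text` is an assumption, since the README does not list the tool schemas, and `tool_call` is a hypothetical helper.

```python
import itertools
import json

# Monotonic request ids; JSON-RPC 2.0 requires each request to carry an id.
_ids = itertools.count(1)


def tool_call(name, arguments):
    """Serialize an MCP `tools/call` request as a single JSON line."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)


# "text" is an assumed argument name, used here only for illustration.
line = tool_call("find_text", {"text": "Submit"})
print(line)
```
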
+
+ <details>
+ <summary><strong>🔧 Technical Details (Under the Hood)</strong></summary>
+
+ | OS | Feature | API Used |
+ |----|---------|----------|
+ | **macOS** | Screenshots | `screencapture` (CLI) |
+ | | Input | `CGEvent` (CoreGraphics) |
+ | | OCR | `VNRecognizeTextRequest` (Vision Framework) |
+ | **Windows** | Screenshots | `BitBlt` (GDI) |
+ | | Input | `SendInput` (Win32) |
+ | | OCR | `Windows.Media.Ocr` (WinRT) |
+
+ </details>
+
+ ## 🛡️ Privacy, Safety & Best Practices
+
+ ### 🔒 Privacy First
+ * **100% Local:** All processing (screenshots, OCR, logic) happens on your device.
+ * **No Cloud:** Images are never uploaded to any third-party server by this tool.
+ * **Open Source:** You can inspect the code to verify exactly what it does.
+
+ ### ⚠️ Operational Safety
+ * **Hands Off:** When the agent is "driving" (clicking/typing), **do not move your mouse or type**.
+ * *Why?* Real hardware inputs can conflict with the simulated ones, causing clicks to land in the wrong place.
+ * **Focus Matters:** Ensure the window you want the agent to use is visible. If a popup steals focus, the agent might type into the wrong window unless it checks first.
+
+ ## 🔐 Required Permissions (macOS)
+
+ On macOS, you must grant permissions to the **host application** (e.g., Terminal, VS Code, Claude Desktop) to allow screen recording and input control.
+
+ 1. **Screen Recording:** Required for `take_screenshot`.
+ * *System Settings > Privacy & Security > Screen Recording*
+ 2. **Accessibility:** Required for `click`, `type_text`, `scroll`.
+ * *System Settings > Privacy & Security > Accessibility*
+
+ > **Restart Required:** After granting permissions, you must fully quit and restart the host application.
+
+ ## 🪟 Windows Notes
+
+ Works out of the box on **Windows 10/11**.
+ * Uses standard Win32 APIs (GDI, SendInput).
+ * OCR uses the built-in Windows Media OCR engine (offline).
+ * **Note:** Cannot interact with "Run as Administrator" windows unless the MCP server itself is also running as Administrator.
+
+ ## 📜 License
+
+ MIT © [sh3ll3x3c](https://github.com/sh3ll3x3c)
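
A note on the `package.json` change in this diff: the platform packages under `optionalDependencies` are pinned to the same value as the top-level `version` (0.2.1 before, 0.3.0 after). A release check along the following lines could flag a pin that was left behind. This is a sketch under that assumption, and `mismatched_pins` is a hypothetical helper, not something the package ships.

```python
def mismatched_pins(manifest):
    """Return the optional dependencies whose pin differs from the package version."""
    version = manifest["version"]
    optional = manifest.get("optionalDependencies", {})
    return {name: pin for name, pin in optional.items() if pin != version}


# The fields of the 0.3.0 manifest that are visible in the diff.
manifest = {
    "name": "native-devtools-mcp",
    "version": "0.3.0",
    "optionalDependencies": {
        "@sh3ll3x3c/native-devtools-mcp-darwin-arm64": "0.3.0",
        "@sh3ll3x3c/native-devtools-mcp-win32-x64": "0.3.0",
    },
}
print(mismatched_pins(manifest))  # prints {}
```
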
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "native-devtools-mcp",
- "version": "0.2.1",
+ "version": "0.3.0",
  "description": "MCP server for testing native desktop applications",
  "license": "MIT",
  "repository": {
@@ -25,8 +25,8 @@
  "bin"
  ],
  "optionalDependencies": {
- "@sh3ll3x3c/native-devtools-mcp-darwin-arm64": "0.2.1",
- "@sh3ll3x3c/native-devtools-mcp-win32-x64": "0.2.1"
+ "@sh3ll3x3c/native-devtools-mcp-darwin-arm64": "0.3.0",
+ "@sh3ll3x3c/native-devtools-mcp-win32-x64": "0.3.0"
  },
  "engines": {
  "node": ">=18"