native-devtools-mcp 0.3.1 → 0.3.2

Files changed (2)
  1. package/README.md +26 -1
  2. package/package.json +18 -4
package/README.md CHANGED
@@ -11,6 +11,8 @@
 
  A Model Context Protocol (MCP) server that provides **Computer Use** capabilities: screenshots, OCR, input simulation, and window management.
 
+ [//]: # "Search keywords: MCP, Model Context Protocol, computer use, desktop automation, UI automation, RPA, screenshots, OCR, mouse, keyboard, screen reading, macOS, Windows, native-devtools-mcp"
+
  [Features](#-features) • [Installation](#-installation) • [For AI Agents](#-for-ai-agents-llms) • [Permissions](#-required-permissions-macos)
 
  ![Demo](demo.gif)
@@ -19,6 +21,10 @@ A Model Context Protocol (MCP) server that provides **Computer Use** capabilitie
 
  ---
 
+ ## 🔍 Search Keywords
+
+ MCP, Model Context Protocol, computer use, desktop automation, UI automation, RPA, screenshots, OCR, screen reading, mouse, keyboard, macOS, Windows, native-devtools-mcp.
+
  ## 🚀 Features
 
  - **👀 Computer Vision:** Capture screenshots of screens, windows, or specific regions. Includes built-in OCR (text recognition) to "read" the screen.
@@ -33,7 +39,7 @@ A Model Context Protocol (MCP) server that provides **Computer Use** capabilitie
 
  This MCP server is designed to be **highly discoverable and usable** by AI models (Claude, Gemini, GPT).
 
- - **[📄 Read `agents.md`](./agents.md):** A compact, token-optimized technical reference designed specifically for ingestion by LLMs. It contains intent definitions, schema examples, and reasoning patterns.
+ - **[📄 Read `AGENTS.md`](./AGENTS.md):** A compact, token-optimized technical reference designed specifically for ingestion by LLMs. It contains intent definitions, schema examples, and reasoning patterns.
 
  **Core Capabilities for System Prompts:**
  1. `take_screenshot`: The "eyes". Returns images + layout metadata + text locations (OCR).
@@ -173,6 +179,25 @@ graph TD
  | | Input | `SendInput` (Win32) |
  | | OCR | `Windows.Media.Ocr` (WinRT) |
 
+ ### Screenshot Coordinate Precision
+
+ Screenshots include metadata for accurate coordinate conversion:
+
+ - `screenshot_origin_x/y`: Screen-space origin of the captured area (in points)
+ - `screenshot_scale`: Display scale factor (e.g., 2.0 for Retina displays)
+ - `screenshot_pixel_width/height`: Actual pixel dimensions of the image
+ - `screenshot_window_id`: Window ID (for window captures)
+
+ **Coordinate conversion:**
+ ```
+ screen_x = screenshot_origin_x + (pixel_x / screenshot_scale)
+ screen_y = screenshot_origin_y + (pixel_y / screenshot_scale)
+ ```
+
+ **Implementation notes:**
+ - **Window captures** (macOS): Uses `screencapture -o` which excludes window shadow. The captured image dimensions match `kCGWindowBounds × scale` exactly, ensuring click coordinates derived from screenshots land on intended UI elements.
+ - **Region captures**: Origin coordinates are aligned to integers to match the actual captured area.
+
  </details>
 
  ## 🛡️ Privacy, Safety & Best Practices
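
The coordinate conversion added to the README in this hunk can be sketched in a few lines of client-side code. This is a minimal illustration, not code from the package: the `ScreenshotMeta` class and the `pixel_to_screen` helper are hypothetical names, and only the three metadata fields and the two formulas are taken from the README text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreenshotMeta:
    """Hypothetical container for the metadata fields named in the README."""
    screenshot_origin_x: float  # screen-space origin of the capture, in points
    screenshot_origin_y: float
    screenshot_scale: float     # display scale factor, e.g. 2.0 on Retina


def pixel_to_screen(meta: ScreenshotMeta, pixel_x: float, pixel_y: float) -> tuple[float, float]:
    """Map a pixel position in the screenshot to screen-space points.

    Applies the README formula:
        screen = origin + pixel / scale
    """
    if meta.screenshot_scale <= 0:
        raise ValueError("screenshot_scale must be positive")
    screen_x = meta.screenshot_origin_x + pixel_x / meta.screenshot_scale
    screen_y = meta.screenshot_origin_y + pixel_y / meta.screenshot_scale
    return screen_x, screen_y


# Example: a capture whose origin is (100, 50) points on a 2x display.
meta = ScreenshotMeta(screenshot_origin_x=100.0, screenshot_origin_y=50.0, screenshot_scale=2.0)
print(pixel_to_screen(meta, 300, 120))  # (250.0, 110.0)
```

Dividing by the scale before adding the origin matters because the origin is reported in points while OCR and image positions are in pixels; mixing the two units would shift clicks on high-density displays.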
package/package.json CHANGED
@@ -1,7 +1,7 @@
  {
  "name": "native-devtools-mcp",
- "version": "0.3.1",
- "description": "MCP server for testing native desktop applications",
+ "version": "0.3.2",
+ "description": "MCP server for computer-use / desktop automation of native apps (screenshots, OCR, input)",
  "license": "MIT",
  "repository": {
  "type": "git",
@@ -11,6 +11,20 @@
  "keywords": [
  "mcp",
  "model-context-protocol",
+ "computer-use",
+ "desktop-automation",
+ "ui-automation",
+ "rpa",
+ "ocr",
+ "screenshot",
+ "screen-reading",
+ "mouse",
+ "keyboard",
+ "ai-agent",
+ "llm",
+ "claude",
+ "gemini",
+ "gpt",
  "devtools",
  "desktop",
  "testing",
@@ -25,8 +39,8 @@
  "bin"
  ],
  "optionalDependencies": {
- "@sh3ll3x3c/native-devtools-mcp-darwin-arm64": "0.3.1",
- "@sh3ll3x3c/native-devtools-mcp-win32-x64": "0.3.1"
+ "@sh3ll3x3c/native-devtools-mcp-darwin-arm64": "0.3.2",
+ "@sh3ll3x3c/native-devtools-mcp-win32-x64": "0.3.2"
  },
  "engines": {
  "node": ">=18"