claude-kvm-native 1.0.0

package/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 Rıza Emre ARAS <r.emrearas@proton.me>
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,244 @@
+ # Claude KVM
+
+ Claude KVM is an MCP tool that controls your remote desktop environment over VNC, with optional SSH access.
+
+ ## Architecture
+
+ Claude KVM follows an **atomic instrument** design: each tool does one thing, and Claude orchestrates the flow. The system provides three independent channels, each optimized for a different type of interaction:
+
+ ```mermaid
+ graph TB
+     subgraph MCP["MCP Client (Claude)"]
+         AI["🤖 Claude"]
+     end
+
+     subgraph Server["claude-kvm · MCP Server (stdio)"]
+         direction TB
+         Router["Tool Router<br/><code>index.js</code>"]
+
+         subgraph Channels["Channels"]
+             direction LR
+             subgraph VNC_Ch["VNC Channel"]
+                 direction TB
+                 VNC_Client["VNC Client<br/><code>lib/vnc.js</code>"]
+                 HID["HID Controller<br/><code>lib/hid.js</code>"]
+                 Capture["Screen Capture<br/><code>lib/capture.js</code>"]
+             end
+
+             subgraph SSH_Ch["SSH Channel"]
+                 direction TB
+                 SSH_Client["SSH Client<br/><code>lib/ssh.js</code>"]
+             end
+
+             subgraph VLM_Ch["VLM Channel"]
+                 direction TB
+                 VLM_Bin["claude-kvm-vlm<br/><i>Apple Silicon binary</i>"]
+             end
+         end
+     end
+
+     subgraph Local["Host Machine (Apple Silicon)"]
+         MLX["MLX Framework<br/><i>FastVLM 0.5B</i>"]
+     end
+
+     subgraph Target["Target Machine"]
+         VNC_Server["VNC Server<br/><i>:5900</i>"]
+         SSH_Server["SSH Server<br/><i>:22</i>"]
+
+         Desktop["🖥️ Desktop Environment"]
+         Shell["💻 Shell"]
+     end
+
+     AI <--->|"stdio<br/>JSON-RPC"| Router
+
+     Router --> VNC_Client
+     Router --> HID
+     Router --> Capture
+     Router --> SSH_Client
+     Router --> VLM_Bin
+
+     VNC_Client <-->|"RFB Protocol<br/>TCP :5900"| VNC_Server
+     HID --> VNC_Client
+     Capture --> VNC_Client
+     Capture -->|"PNG crop"| VLM_Bin
+
+     SSH_Client <-->|"SSH Protocol<br/>TCP :22"| SSH_Server
+     VLM_Bin -->|"stdin: PNG<br/>stdout: text"| MLX
+
+     VNC_Server --> Desktop
+     SSH_Server --> Shell
+
+     classDef server fill:#1a1a2e,stroke:#16213e,color:#e5e5e5
+     classDef channel fill:#0f3460,stroke:#533483,color:#e5e5e5
+     classDef target fill:#1a1a2e,stroke:#e94560,color:#e5e5e5
+     classDef local fill:#1a1a2e,stroke:#533483,color:#e5e5e5
+
+     class Router server
+     class VNC_Client,HID,Capture,SSH_Client,VLM_Bin channel
+     class VNC_Server,SSH_Server,Desktop,Shell target
+     class MLX local
+ ```
+
+ ### Channel Overview
+
+ | Channel | Transport | Purpose | Tools |
+ |---------|-------------------|--------------------------------------------------|---------------------------------------------------------------------------|
+ | **VNC** | RFB over TCP | Visual control: screen capture, mouse, keyboard | `screenshot` `cursor_crop` `diff_check` `set_baseline` `mouse` `keyboard` |
+ | **SSH** | SSH over TCP | Text I/O: shell commands, file ops, osascript | `ssh` |
+ | **VLM** | stdin/stdout pipe | Pixel → text: on-device OCR and visual Q&A | `vlm_query` |
+
+ ### How They Work Together
+
+ Each channel has a strength. Claude picks the most efficient one, or combines them:
+
+ - **Read a web page** → VNC navigates, VLM reads text from a region, no screenshot needed
+ - **Run a shell command** → SSH returns text directly, faster than typing in a terminal via VNC
+ - **Verify a change** → `diff_check` detects change (5ms, no image), `cursor_crop` confirms placement (small image), `screenshot` only when needed (full image)
+ - **Debug a dialog** → VLM reads the button labels, SSH runs `osascript` to get window info, VNC clicks the right button
+
+ ### Three-Layer Screen Strategy
+
+ Claude minimizes token cost with a progressive verification approach:
+
+ ```
+ diff_check   →  changeDetected: true/false   ~5ms     (text only, no image)
+ cursor_crop  →  300×300px around cursor      ~200ms   (small image)
+ screenshot   →  full screen capture          ~1200ms  (full image, HiDPI)
+ ```
+
+ Start cheap, escalate only when needed.
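
The escalation order above can be sketched as a small decision function. This is an illustrative sketch, not code from the package: `nextVerificationStep` and the `cropConclusive` flag are hypothetical names, and only the three tool names come from the README.

```javascript
// Hypothetical helper: given what is known so far, return the name of the
// next (cheapest sufficient) tool to call, or null when no check is needed.
//   changeDetected: result of diff_check, undefined if not run yet
//   cropConclusive: whether cursor_crop answered the question (assumed flag),
//                   undefined if cursor_crop has not run yet
function nextVerificationStep({ changeDetected, cropConclusive } = {}) {
  if (changeDetected === undefined) return 'diff_check';  // text only, cheapest
  if (changeDetected === false) return null;              // nothing changed, stop
  if (cropConclusive === undefined) return 'cursor_crop'; // small image
  if (cropConclusive === false) return 'screenshot';      // full image, last resort
  return null;                                            // crop was enough
}
```

The function never returns `screenshot` unless both cheaper checks have already run, which mirrors the "start cheap" rule.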
+
+ ### Coordinate Scaling
+
+ The VNC server's native resolution is scaled down to fit within `DISPLAY_MAX_DIMENSION` (default: 1280px). Claude works in scaled coordinates; the server transparently converts between native and scaled space:
+
+ ```
+ Native:  3840 × 2400  (VNC server framebuffer)
+ Scaled:  1280 × 800   (what Claude sees and targets)
+
+ click_at(640, 400)  →  VNC receives (1920, 1200)
+ ```
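
The conversion can be illustrated with a short sketch that reproduces the numbers in the example. The function name `makeScaler`, the rounding, and the choice never to upscale a small framebuffer are assumptions made for illustration; the package's actual implementation may differ.

```javascript
// Hypothetical sketch of scaled <-> native coordinate conversion.
// maxDimension plays the role of DISPLAY_MAX_DIMENSION (default 1280).
function makeScaler(nativeWidth, nativeHeight, maxDimension = 1280) {
  // Assumption: a framebuffer already within the limit is left at factor 1.
  const factor = Math.min(1, maxDimension / Math.max(nativeWidth, nativeHeight));
  return {
    scaledWidth: Math.round(nativeWidth * factor),
    scaledHeight: Math.round(nativeHeight * factor),
    // Scaled coordinates (what Claude targets) to native framebuffer pixels.
    toNative: (x, y) => ({ x: Math.round(x / factor), y: Math.round(y / factor) }),
    // Native framebuffer pixels to scaled coordinates.
    toScaled: (x, y) => ({ x: Math.round(x * factor), y: Math.round(y * factor) }),
  };
}

const scaler = makeScaler(3840, 2400);
const nativePoint = scaler.toNative(640, 400); // { x: 1920, y: 1200 }
```

Keeping a single `factor` for both axes preserves the aspect ratio, which is why 3840 × 2400 maps to 1280 × 800.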
+
+ ## Usage
+
+ Create a `.mcp.json` file in your project root directory:
+
+ ```json
+ {
+   "mcpServers": {
+     "claude-kvm": {
+       "command": "npx",
+       "args": ["-y", "claude-kvm"],
+       "env": {
+         "VNC_HOST": "192.168.1.100",
+         "VNC_PORT": "5900",
+         "VNC_AUTH": "auto",
+         "VNC_USERNAME": "user",
+         "VNC_PASSWORD": "pass",
+         "SSH_HOST": "192.168.1.100",
+         "SSH_USER": "user",
+         "SSH_PASSWORD": "pass",
+         "CLAUDE_KVM_VLM_TOOL_PATH": "/path/to/claude-kvm-vlm"
+       }
+     }
+   }
+ }
+ ```
+
+ Only the VNC connection parameters are required. SSH and all other parameters are optional.
+
+ ### Configuration
+
+ #### VNC
+
+ | Parameter | Default | Description |
+ |------------------------------|-------------|------------------------------------------------|
+ | `VNC_HOST` | `127.0.0.1` | VNC server address |
+ | `VNC_PORT` | `5900` | VNC port number |
+ | `VNC_AUTH` | `auto` | Authentication mode (`auto` / `none`) |
+ | `VNC_USERNAME` | | Username (for VeNCrypt Plain / ARD) |
+ | `VNC_PASSWORD` | | Password |
+ | `VNC_CONNECT_TIMEOUT_MS` | `10000` | TCP connection timeout (ms) |
+ | `VNC_SCREENSHOT_TIMEOUT_MS` | `3000` | Screenshot frame wait timeout (ms) |
+
+ #### SSH (optional)
+
+ | Parameter | Default | Description |
+ |-----------------|---------|----------------------------------------------|
+ | `SSH_HOST` | | SSH server address (required to enable SSH) |
+ | `SSH_USER` | | SSH username (required to enable SSH) |
+ | `SSH_PASSWORD` | | SSH password (for password auth) |
+ | `SSH_KEY` | | Path to private key file (for key auth) |
+ | `SSH_PORT` | `22` | SSH port number |
+
+ The SSH tool is only registered when both `SSH_HOST` and `SSH_USER` are set. Authentication uses either password or key, whichever is provided.
+
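The registration rule can be sketched as a function that turns environment variables into an SSH configuration, or returns `null` when the tool should not be registered. `resolveSshConfig` and the shape of the returned object are hypothetical; only the variable names and the default port come from the table above. How the package behaves when both `SSH_PASSWORD` and `SSH_KEY` are set is not stated, so the sketch simply passes both through.

```javascript
// Hypothetical sketch: decide whether the ssh tool is enabled and collect its settings.
function resolveSshConfig(env) {
  // Both SSH_HOST and SSH_USER are required; otherwise the tool is not registered.
  if (!env.SSH_HOST || !env.SSH_USER) return null;

  const config = {
    host: env.SSH_HOST,
    username: env.SSH_USER,
    port: Number(env.SSH_PORT || 22), // documented default
  };
  // Pass through whichever credentials are provided (no precedence is assumed).
  if (env.SSH_PASSWORD) config.password = env.SSH_PASSWORD;
  if (env.SSH_KEY) config.privateKeyPath = env.SSH_KEY;
  return config;
}
```

Calling `resolveSshConfig(process.env)` at startup would then be enough to decide whether to expose the `ssh` tool.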
+ #### VLM (optional, macOS only)
+
+ | Parameter | Default | Description |
+ |----------------------------|---------|------------------------------------------------------------------------------------------------|
+ | `CLAUDE_KVM_VLM_TOOL_PATH` | | Absolute path to `claude-kvm-vlm` binary (macOS arm64). Enables the `vlm_query` tool when set. |
+
+ The `vlm_query` tool is only registered when `CLAUDE_KVM_VLM_TOOL_PATH` is set. Requires Apple Silicon.
+
+ ##### Quick Install
+
+ ```bash
+ brew tap ARAS-Workspace/tap
+ brew install claude-kvm-vlm
+ ```
+
+ The `claude-kvm-vlm` binary is built, code-signed and notarized via CI:
+
+ - [Build Workflow](https://github.com/ARAS-Workspace/claude-kvm/actions/runs/22114321867)
+ - [Source Code](https://github.com/ARAS-Workspace/claude-kvm/tree/vlm-tool)
+
+ #### Display & Input
+
+ | Parameter | Default | Description |
+ |------------------------------|-------------|------------------------------------------------|
+ | `DISPLAY_MAX_DIMENSION` | `1280` | Maximum dimension to scale screenshots to (px) |
+ | `HID_CLICK_HOLD_MS` | `80` | Mouse click hold duration (ms) |
+ | `HID_KEY_HOLD_MS` | `50` | Key press hold duration (ms) |
+ | `HID_TYPING_DELAY_MIN_MS` | `30` | Typing delay lower bound (ms) |
+ | `HID_TYPING_DELAY_MAX_MS` | `100` | Typing delay upper bound (ms) |
+ | `HID_SCROLL_EVENTS_PER_STEP` | `5` | VNC scroll events per scroll step |
+ | `DIFF_PIXEL_THRESHOLD` | `30` | Per-channel pixel difference threshold (0-255) |
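
To make the meaning of a per-channel threshold concrete, here is a minimal sketch of a pixel comparison. It assumes same-sized RGBA byte buffers and reports a change as soon as any color channel of any pixel differs by more than the threshold; the function name `changeDetected` and that "any pixel" rule are assumptions, and the package may aggregate differences differently.

```javascript
// Hypothetical sketch of a per-channel pixel diff between two RGBA frames.
// threshold plays the role of DIFF_PIXEL_THRESHOLD (0-255, default 30).
function changeDetected(baseline, current, threshold = 30) {
  // Assumption: a size mismatch (e.g. resolution change) counts as a change.
  if (baseline.length !== current.length) return true;
  for (let i = 0; i < baseline.length; i += 4) {
    // Compare R, G and B; the alpha byte at i + 3 is ignored.
    for (let c = 0; c < 3; c++) {
      if (Math.abs(baseline[i + c] - current[i + c]) > threshold) return true;
    }
  }
  return false;
}
```

A higher threshold tolerates compression noise and cursor blink at the cost of missing subtle changes.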
+
+ ## Tools
+
+ | Tool | Returns | Description |
+ |-----------------|------------------|-----------------------------------------------------------|
+ | `mouse` | `(x, y)` | Mouse actions: move, hover, click, click_at, scroll, drag |
+ | `keyboard` | `OK` | Keyboard actions: press, combo, type, paste |
+ | `screenshot` | `OK` + image | Capture full screen |
+ | `cursor_crop` | `(x, y)` + image | Small crop around cursor position |
+ | `diff_check` | `changeDetected` | Lightweight pixel change detection against baseline |
+ | `set_baseline` | `OK` | Save current screen as diff reference |
+ | `health_check` | JSON | VNC/SSH status, resolution, uptime, memory |
+ | `ssh` | stdout/stderr | Execute a command on the remote machine via SSH |
+ | `vlm_query` | text | On-device VLM query on a cropped screen region (macOS) |
+ | `wait` | `OK` | Wait for a specified duration |
+ | `task_complete` | summary | Mark task as completed |
+ | `task_failed` | reason | Mark task as failed |
+
+ ## Authentication
+
+ ### VNC
+
+ Supports multiple VNC authentication methods:
+
+ - **None**: no authentication
+ - **VNC Auth**: password-based challenge-response (DES)
+ - **ARD**: Apple Remote Desktop (Diffie-Hellman + AES)
+ - **VeNCrypt**: TLS-wrapped auth (Plain, VNC, None subtypes)
+
+ macOS Screen Sharing (ARD) is auto-detected via the `RFB 003.889` version string.
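
In the RFB handshake the server opens with a 12-byte ProtocolVersion message of the form `RFB xxx.yyy\n`, so the detection described above amounts to parsing that string. The following is a minimal sketch; `parseRfbVersion` and the returned field names are hypothetical and not taken from `lib/vnc.js`.

```javascript
// Hypothetical sketch: parse the server's RFB ProtocolVersion greeting and
// flag the 003.889 version string that the README associates with ARD.
function parseRfbVersion(greeting) {
  const match = /^RFB (\d{3})\.(\d{3})\n$/.exec(greeting);
  if (!match) return null; // not a well-formed ProtocolVersion message
  const major = Number(match[1]);
  const minor = Number(match[2]);
  return {
    major,
    minor,
    isAppleRemoteDesktop: major === 3 && minor === 889,
  };
}
```

A client would typically read exactly 12 bytes from the socket, decode them as ASCII, and pass the result to a function like this before choosing an authentication method.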
+
+ ### SSH
+
+ Supports password and private key authentication. When the target is macOS, the SSH tool enables AppleScript execution (`osascript`), clipboard access (`pbpaste`/`pbcopy`), and system-level control.
+
+ ---
+
+ Copyright (c) 2025 Riza Emre ARAS - MIT License