computer-control 0.1.0 → 0.1.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (3)
  1. package/README.md +77 -153
  2. package/dist/cli.js +370 -312
  3. package/package.json +1 -1
package/README.md CHANGED
````diff
@@ -1,154 +1,78 @@
  # Computer Control
 
- Browser automation and macOS desktop control for AI agents via the Model Context Protocol (MCP).
-
- ## Features
-
- **Browser Mode** (Chrome Extension)
- - Take screenshots of web pages
- - Click, type, scroll, and navigate
- - Read page content and accessibility trees
- - Execute JavaScript in page context
- - Record and export GIF recordings
- - Manage tabs and windows
-
- **Mac Mode** (Native macOS)
- - Control mouse and keyboard
- - Take screenshots and OCR
- - Read accessibility trees
- - Execute AppleScript
- - Record GIF screen captures
-
- ## Quick Start
-
- ### Option 1: Install from Chrome Web Store (Recommended)
-
- 1. **Install the extension** from the [Chrome Web Store](https://chrome.google.com/webstore/detail/computer-control/kenhnnhgbbgkdbedfmijnllgpcognghl)
-
- 2. **Install the CLI**
- ```bash
- npm install -g computer-control
- # or
- bun install -g computer-control
- ```
-
- 3. **Run the setup wizard**
- ```bash
- computer-control browser install
- ```
- When prompted for the extension ID, enter: `kenhnnhgbbgkdbedfmijnllgpcognghl`
-
- 4. **Add to your MCP config** (Claude Code, Cursor, etc.)
- ```json
- {
-   "mcpServers": {
-     "computer-control-browser": {
-       "command": "computer-control",
-       "args": ["browser", "serve", "--skip-permissions"]
-     }
-   }
- }
- ```
-
- 5. **Restart your AI assistant** and start automating!
-
- ### Option 2: Load Extension from Source
-
- 1. **Clone the repository**
- ```bash
- git clone https://github.com/mergd/computer-use.git
- cd computer-use
- bun install
- bun run build
- ```
-
- 2. **Build the extension**
- ```bash
- cd extension && ./build.sh
- ```
-
- 3. **Load in Chrome**
- - Open `chrome://extensions`
- - Enable "Developer mode"
- - Click "Load unpacked"
- - Select the `extension/dist` folder
- - Copy the extension ID (32 lowercase letters)
-
- 4. **Run the setup wizard**
- ```bash
- computer-control browser install --extension-id YOUR_EXTENSION_ID
- ```
-
- 5. **Add to MCP config** (same as above)
-
- ## Mac Mode Setup
-
- For native macOS control (no browser needed):
+ MCP server for browser automation and macOS desktop control. Give your AI agent eyes and hands.
 
- ```bash
- # Run the setup wizard
- computer-control mac setup
+ ## Getting Started
+
+ ### Browser Mode
+
+ Install the CLI and the [Chrome extension](https://chromewebstore.google.com/detail/computer-control/kenhnnhgbbgkdbedfmijnllgpcognghl):
 
- # Check status
- computer-control mac status
+ ```bash
+ npm i -g computer-control
  ```
 
- **Requirements:**
- - `cliclick` for mouse/keyboard: `brew install cliclick`
- - `gifsicle` for GIF recording: `brew install gifsicle`
+ Start the server (native messaging bridge is registered automatically on first run):
 
- **macOS Permissions** (grant to your terminal app):
- - Accessibility (System Settings → Privacy & Security → Accessibility)
- - Screen Recording (System Settings → Privacy & Security → Screen Recording)
+ ```bash
+ computer-control browser serve
+ ```
+
+ Then add the MCP endpoint to your AI client (Claude Code, Cursor, etc.):
 
- **MCP Config:**
  ```json
  {
    "mcpServers": {
-     "computer-control-mac": {
-       "command": "computer-control",
-       "args": ["mac", "serve"]
+     "browser": {
+       "url": "http://127.0.0.1:62220/mcp"
      }
    }
  }
  ```
````
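In 0.1.2 the browser setup no longer starts with the `browser install` wizard; the README says the native messaging bridge is registered automatically the first time `browser serve` runs. Whatever does that registering has to write a Chrome native messaging host manifest. The TypeScript sketch below builds one from the field names in Chrome's native messaging documentation and validates the extension ID, which the 0.1.0 README describes as 32 lowercase letters (Chrome IDs in fact use only `a` through `p`). It is an illustration, not the package's code: the host name `com.example.computer_control` and the binary path `/usr/local/bin/computer-control-host` are hypothetical, and only the ID `kenhnnhgbbgkdbedfmijnllgpcognghl` comes from the README.

```typescript
import { homedir } from "node:os";
import { join } from "node:path";

// Fields of a Chrome native messaging host manifest using the stdio transport.
interface NativeHostManifest {
  name: string;
  description: string;
  path: string;
  type: "stdio";
  allowed_origins: string[];
}

// Chrome extension IDs are 32 characters drawn from the letters a-p.
const EXTENSION_ID = /^[a-p]{32}$/;

// Host names: lowercase alphanumerics, underscores and dots, with no
// leading, trailing, or consecutive dots.
const HOST_NAME = /^[a-z0-9_]+(\.[a-z0-9_]+)*$/;

function buildHostManifest(
  hostName: string,
  binaryPath: string,
  extensionIds: string[],
): NativeHostManifest {
  if (!HOST_NAME.test(hostName)) {
    throw new Error(`invalid native host name: ${hostName}`);
  }
  // On macOS and Linux the host path must be absolute.
  if (!binaryPath.startsWith("/")) {
    throw new Error(`host path must be absolute: ${binaryPath}`);
  }
  for (const id of extensionIds) {
    if (!EXTENSION_ID.test(id)) {
      throw new Error(`invalid extension ID: ${id}`);
    }
  }
  return {
    name: hostName,
    description: "computer-control native messaging bridge (sketch)",
    path: binaryPath,
    type: "stdio",
    // Each origin is a full chrome-extension:// URL ending in a slash.
    allowed_origins: extensionIds.map((id) => `chrome-extension://${id}/`),
  };
}

// Per-user manifest location for Google Chrome on macOS.
function macManifestPath(hostName: string): string {
  return join(
    homedir(),
    "Library",
    "Application Support",
    "Google",
    "Chrome",
    "NativeMessagingHosts",
    `${hostName}.json`,
  );
}

// Hypothetical host name and binary path; only the extension ID is from the README.
const manifest = buildHostManifest(
  "com.example.computer_control",
  "/usr/local/bin/computer-control-host",
  ["kenhnnhgbbgkdbedfmijnllgpcognghl"],
);
console.log(macManifestPath(manifest.name));
console.log(JSON.stringify(manifest, null, 2));
```

`macManifestPath` returns the per-user directory where Chrome on macOS looks for host manifests; to see what the package itself registered, `computer-control browser status` remains the supported check.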
````diff
 
- ## CLI Commands
+ ### Mac Mode
+
+ Native macOS control — no browser needed. Install deps, run the wizard, and you're set:
 
+ ```bash
+ brew install cliclick gifsicle
+ npm i -g computer-control
+ computer-control mac setup
  ```
- computer-control browser
- ├── install   Interactive setup wizard
- ├── status    Check installation status
- ├── serve     Start MCP server
- ├── path      Print extension directory
- └── uninstall Remove native host
 
- computer-control mac
- ├── setup     Interactive setup wizard
- ├── status    Check dependencies & permissions
- └── serve     Start MCP server
+ Grant **Accessibility** and **Screen Recording** permissions to your terminal app when prompted.
+
+ ```json
+ {
+   "mcpServers": {
+     "mac": {
+       "command": "computer-control",
+       "args": ["mac", "serve"]
+     }
+   }
+ }
  ```
````
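The two MCP entries in the 0.1.2 README use different transports: browser mode points the client at an HTTP URL served by an already running `computer-control browser serve`, while Mac mode gives a `command` and `args` that the client launches itself. A minimal TypeScript sketch that assembles either entry or both, using only the server keys, URL, and arguments from the README. `buildConfig` is a hypothetical helper, and some clients may want extra fields (such as an explicit transport type) for URL-based servers, so check your client's documentation.

```typescript
// A server the client reaches over HTTP (browser mode in the 0.1.2 README).
interface HttpServerEntry {
  url: string;
}

// A server the client spawns as a subprocess (Mac mode).
interface CommandServerEntry {
  command: string;
  args: string[];
}

type ServerEntry = HttpServerEntry | CommandServerEntry;

interface McpConfig {
  mcpServers: Record<string, ServerEntry>;
}

// Endpoint from the README; `computer-control browser serve` must be running.
const BROWSER_MCP_URL = "http://127.0.0.1:62220/mcp";

// Hypothetical helper that assembles the README's two config entries.
function buildConfig(opts: { browser?: boolean; mac?: boolean }): McpConfig {
  const mcpServers: Record<string, ServerEntry> = {};
  if (opts.browser) {
    mcpServers.browser = { url: BROWSER_MCP_URL };
  }
  if (opts.mac) {
    // The client starts `computer-control mac serve` itself.
    mcpServers.mac = { command: "computer-control", args: ["mac", "serve"] };
  }
  return { mcpServers };
}

console.log(JSON.stringify(buildConfig({ browser: true, mac: true }), null, 2));
```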
````diff
 
- ## Available Tools
+ ## Tools
 
- ### Browser Mode
+ ### Browser
 
  | Tool | Description |
  |------|-------------|
- | `read_page` | Get accessibility tree of page elements |
- | `find` | Find elements by natural language query |
- | `computer` | Mouse/keyboard actions and screenshots |
- | `navigate` | Navigate to URL or go back/forward |
+ | `computer` | Mouse, keyboard, and screenshots |
+ | `read_page` | Accessibility tree of page elements |
+ | `find` | Find elements by natural language |
+ | `navigate` | Go to URL, back, forward |
  | `form_input` | Set form input values |
- | `javascript_tool` | Execute JavaScript in page context |
- | `get_page_text` | Extract raw text content from page |
- | `tabs_context` | Get tab group context info |
- | `tabs_create` | Create new tab in MCP group |
+ | `javascript_tool` | Execute JS in page context |
+ | `get_page_text` | Extract raw text content |
+ | `tabs_context` | Tab group context |
+ | `tabs_create` | Open new tab |
  | `resize_window` | Resize browser window |
- | `gif_creator` | Record and export GIF of browser actions |
- | `upload_image` | Upload image to file input or drag target |
+ | `gif_creator` | Record browser actions as GIF |
+ | `upload_image` | Upload image to file input |
 
- ### Mac Mode
+ ### Mac
 
  | Tool | Description |
  |------|-------------|
````
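The tables name the tools but not how a client invokes them. Under the Model Context Protocol, a client first completes the `initialize` handshake, can discover each tool's input schema with `tools/list`, and then calls a tool with a JSON-RPC 2.0 `tools/call` request whose `params` carry `name` and `arguments`. The sketch below only builds those messages and the POST options for an HTTP endpoint such as `http://127.0.0.1:62220/mcp`; it does not perform the handshake. The `url` argument passed to `navigate` is an assumption for illustration; the real argument names come from the server's `tools/list` response.

```typescript
// JSON-RPC 2.0 request envelope used by MCP.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

// Options that can be passed to fetch() for an MCP HTTP POST.
interface PostOptions {
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

let nextId = 1;

// Ask the server for its tools and their input schemas.
function toolsList(): JsonRpcRequest {
  return { jsonrpc: "2.0", id: nextId++, method: "tools/list" };
}

// Invoke one tool by name, e.g. "navigate" or "read_page" from the tables.
function toolCall(name: string, args: Record<string, unknown>): JsonRpcRequest {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

function postOptions(msg: JsonRpcRequest): PostOptions {
  return {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // The server may reply with a JSON body or an SSE stream.
      Accept: "application/json, text/event-stream",
    },
    body: JSON.stringify(msg),
  };
}

// `url` is an assumed argument name; read the real schema from tools/list.
const call = toolCall("navigate", { url: "https://example.com" });
console.log(JSON.stringify(postOptions(call)));
```

Once a session is initialized, these options can be passed to `fetch` along with the endpoint URL; any session ID header the server issued during `initialize` has to be added as well.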
````diff
@@ -156,66 +80,66 @@ computer-control mac
  | `mouse_click` | Click at coordinates |
  | `mouse_move` | Move cursor |
  | `mouse_scroll` | Scroll in direction |
- | `mouse_drag` | Drag from one point to another |
+ | `mouse_drag` | Drag between points |
  | `type_text` | Type text at cursor |
  | `key_press` | Press key with modifiers |
  | `run_applescript` | Execute AppleScript |
- | `get_active_window` | Get focused window info |
+ | `get_active_window` | Focused window info |
  | `list_windows` | List open windows |
  | `focus_app` | Bring app to foreground |
- | `get_accessibility_tree` | Get UI element hierarchy |
+ | `get_accessibility_tree` | UI element hierarchy |
  | `ocr_screen` | Extract text via OCR |
  | `find` | Find elements by natural language |
- | `gif_start/stop/export` | Record screen as GIF |
-
- ## Troubleshooting
+ | `gif_start` / `gif_stop` / `gif_export` | Record screen as GIF |
 
- ### Extension not connecting
+ ## CLI Reference
 
- 1. Check the extension is enabled in `chrome://extensions`
- 2. Verify the native host is registered:
- ```bash
- computer-control browser status
- ```
- 3. Make sure Chrome is running
- 4. Try restarting Chrome
+ ```
+ computer-control browser
+   serve     Start MCP server (auto-registers native host)
+   status    Check installation
+   install   Re-register native host (or use custom extension ID)
+   uninstall Remove native host
 
- ### Permission errors on Mac
+ computer-control mac
+   setup     Setup wizard (deps + permissions)
+   status    Check deps & permissions
+   serve     Start MCP server
+ ```
 
- Grant permissions to your terminal app in System Settings:
- - Privacy & Security → Accessibility
- - Privacy & Security → Screen Recording
+ ## Troubleshooting
 
- Then restart your terminal.
+ **Extension not connecting?**
+ Run `computer-control browser status` to check the native host registration. Make sure Chrome is running and the extension is enabled. Restart Chrome if needed.
 
- ### MCP server not starting
+ **Permission errors on Mac?**
+ Add your terminal app to Accessibility and Screen Recording in System Settings → Privacy & Security. Restart the terminal after.
 
- Check if the port is already in use:
+ **Port conflict?**
  ```bash
- lsof -i :62222 # Browser mode WebSocket port
- lsof -i :62220 # Browser mode HTTP port
+ lsof -i :62222 # WebSocket port
+ lsof -i :62220 # HTTP port
  ```
````
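The troubleshooting section checks the browser-mode ports with `lsof`. The same check can be scripted from Node: try to bind the port briefly and treat `EADDRINUSE` as "in use". This is a minimal sketch, assuming Node's built-in `node:net` module; it probes only `127.0.0.1`, so a "free" result does not rule out a process bound to another interface, and `lsof` is still what tells you *which* process holds the port.

```typescript
import { createServer } from "node:net";

// Browser-mode ports listed in the README's troubleshooting section.
const PORTS: Record<string, number> = { http: 62220, websocket: 62222 };

// Resolves true if the port can be bound on `host`, false on EADDRINUSE.
// It binds briefly and closes again, so it only probes this one address.
function isPortFree(port: number, host = "127.0.0.1"): Promise<boolean> {
  return new Promise((resolve, reject) => {
    const probe = createServer();
    probe.once("error", (err: NodeJS.ErrnoException) => {
      if (err.code === "EADDRINUSE") resolve(false);
      else reject(err);
    });
    probe.once("listening", () => {
      probe.close(() => resolve(true));
    });
    probe.listen(port, host);
  });
}

async function main(): Promise<void> {
  for (const [label, port] of Object.entries(PORTS)) {
    const free = await isPortFree(port);
    console.log(`${label} port ${port}: ${free ? "free" : "in use"}`);
  }
}

main().catch((err) => {
  console.error(err);
  process.exitCode = 1;
});
```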
````diff
 
  ## Development
 
  ```bash
- # Install dependencies
+ git clone https://github.com/mergd/computer-use.git
+ cd computer-use
  bun install
-
- # Build everything
  bun run build
 
- # Build extension only
- cd extension && ./build.sh
-
  # Run from source
  bun src/cli.ts browser serve --skip-permissions
  bun src/cli.ts mac serve
+
+ # Build extension from source
+ cd extension && ./build.sh
  ```
 
  ## Privacy
 
- See [PRIVACY.md](PRIVACY.md) for our privacy policy.
+ See [PRIVACY.md](PRIVACY.md).
 
  ## License
````