computer-control 0.1.0

package/README.md ADDED
@@ -0,0 +1,222 @@
1
+ # Computer Control
2
+
3
+ Browser automation and macOS desktop control for AI agents via the Model Context Protocol (MCP).
4
+
5
+ ## Features
6
+
7
+ **Browser Mode** (Chrome Extension)
8
+ - Take screenshots of web pages
9
+ - Click, type, scroll, and navigate
10
+ - Read page content and accessibility trees
11
+ - Execute JavaScript in page context
12
+ - Record and export GIF recordings
13
+ - Manage tabs and windows
14
+
15
+ **Mac Mode** (Native macOS)
16
+ - Control mouse and keyboard
17
+ - Take screenshots and extract text via OCR
18
+ - Read accessibility trees
19
+ - Execute AppleScript
20
+ - Record GIF screen captures
21
+
22
+ ## Quick Start
23
+
24
+ ### Option 1: Install from Chrome Web Store (Recommended)
25
+
26
+ 1. **Install the extension** from the [Chrome Web Store](https://chrome.google.com/webstore/detail/computer-control/kenhnnhgbbgkdbedfmijnllgpcognghl)
27
+
28
+ 2. **Install the CLI**
29
+ ```bash
30
+ npm install -g computer-control
31
+ # or
32
+ bun install -g computer-control
33
+ ```
34
+
35
+ 3. **Run the setup wizard**
36
+ ```bash
37
+ computer-control browser install
38
+ ```
39
+ When prompted for the extension ID, enter: `kenhnnhgbbgkdbedfmijnllgpcognghl`
40
+
41
+ 4. **Add to your MCP config** (Claude Code, Cursor, etc.)
42
+ ```json
43
+ {
44
+ "mcpServers": {
45
+ "computer-control-browser": {
46
+ "command": "computer-control",
47
+ "args": ["browser", "serve", "--skip-permissions"]
48
+ }
49
+ }
50
+ }
51
+ ```
52
+
53
+ 5. **Restart your AI assistant** and start automating!
54
+
55
+ ### Option 2: Load Extension from Source
56
+
57
+ 1. **Clone the repository**
58
+ ```bash
59
+ git clone https://github.com/mergd/computer-use.git
60
+ cd computer-use
61
+ bun install
62
+ bun run build
63
+ ```
64
+
65
+ 2. **Build the extension**
66
+ ```bash
67
+ cd extension && ./build.sh
68
+ ```
69
+
70
+ 3. **Load in Chrome**
71
+ - Open `chrome://extensions`
72
+ - Enable "Developer mode"
73
+ - Click "Load unpacked"
74
+ - Select the `extension/dist` folder
75
+ - Copy the extension ID shown on the extension's card (32 lowercase letters)
76
+
77
+ 4. **Run the setup wizard**
78
+ ```bash
79
+ computer-control browser install --extension-id YOUR_EXTENSION_ID
80
+ ```
81
+
82
+ 5. **Add to your MCP config** using the same `computer-control-browser` entry shown in Option 1, step 4
83
+
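Before passing an ID to `--extension-id`, it can help to check its shape. The following is a minimal sketch, assuming Chrome's usual ID format of exactly 32 letters in the range `a` to `p`. The helper name `is_valid_extension_id` is hypothetical and not part of the `computer-control` CLI.

```bash
# Hypothetical helper (not part of computer-control): succeeds only if the
# argument looks like a Chrome extension ID, i.e. exactly 32 letters a-p.
is_valid_extension_id() {
  # LC_ALL=C keeps the a-p range strictly ASCII regardless of locale.
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-p]{32}$'
}

# Usage: validate the ID before handing it to the setup wizard.
ext_id="kenhnnhgbbgkdbedfmijnllgpcognghl"
if is_valid_extension_id "$ext_id"; then
  echo "ID looks valid: $ext_id"
else
  echo "ID does not look like a Chrome extension ID: $ext_id" >&2
fi
```

A check like this only catches copy-paste mistakes such as a truncated ID or a leftover `YOUR_EXTENSION_ID` placeholder; it does not confirm that the extension is actually installed.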
84
+ ## Mac Mode Setup
85
+
86
+ For native macOS control (no browser needed):
87
+
88
+ ```bash
89
+ # Run the setup wizard
90
+ computer-control mac setup
91
+
92
+ # Check status
93
+ computer-control mac status
94
+ ```
95
+
96
+ **Requirements:**
97
+ - `cliclick` for mouse/keyboard: `brew install cliclick`
98
+ - `gifsicle` for GIF recording: `brew install gifsicle`
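The two requirements above can also be checked by hand. This is a minimal sketch that only looks for the binaries on `PATH`; the helper name `missing_tools` is hypothetical, and `computer-control mac status` remains the supported way to check dependencies and permissions.

```bash
# Hypothetical helper: print each named tool that is not on PATH,
# one per line. Prints nothing when every tool is found.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Usage: report the Mac Mode requirements that still need installing.
for tool in $(missing_tools cliclick gifsicle); do
  echo "Missing: $tool (try: brew install $tool)"
done
```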
99
+
100
+ **macOS Permissions** (grant to your terminal app):
101
+ - Accessibility (System Settings → Privacy & Security → Accessibility)
102
+ - Screen Recording (System Settings → Privacy & Security → Screen Recording)
103
+
104
+ **MCP Config:**
105
+ ```json
106
+ {
107
+ "mcpServers": {
108
+ "computer-control-mac": {
109
+ "command": "computer-control",
110
+ "args": ["mac", "serve"]
111
+ }
112
+ }
113
+ }
114
+ ```
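If you use both modes, the browser and Mac entries can likely live side by side in one config. The sketch below assumes your MCP client accepts several entries under `mcpServers`. The helper name `write_mcp_config` and the target path are hypothetical; the two server entries are copied from the configs in this README.

```bash
# Hypothetical helper: write a config containing both server entries to the
# given path. The path is an assumption; use the file your MCP client reads.
write_mcp_config() {
  target="$1"
  # Never clobber an existing config; merge by hand in that case.
  if [ -e "$target" ]; then
    echo "Refusing to overwrite existing file: $target" >&2
    return 1
  fi
  cat > "$target" <<'EOF'
{
  "mcpServers": {
    "computer-control-browser": {
      "command": "computer-control",
      "args": ["browser", "serve", "--skip-permissions"]
    },
    "computer-control-mac": {
      "command": "computer-control",
      "args": ["mac", "serve"]
    }
  }
}
EOF
}

# Usage (example path only):
# write_mcp_config ./mcp.json
```

The helper refuses to overwrite an existing file because most MCP configs already hold other servers, which a blind rewrite would drop.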
115
+
116
+ ## CLI Commands
117
+
118
+ ```
119
+ computer-control browser
120
+ ├── install Interactive setup wizard
121
+ ├── status Check installation status
122
+ ├── serve Start MCP server
123
+ ├── path Print extension directory
124
+ └── uninstall Remove native host
125
+
126
+ computer-control mac
127
+ ├── setup Interactive setup wizard
128
+ ├── status Check dependencies & permissions
129
+ └── serve Start MCP server
130
+ ```
131
+
132
+ ## Available Tools
133
+
134
+ ### Browser Mode
135
+
136
+ | Tool | Description |
137
+ |------|-------------|
138
+ | `read_page` | Get accessibility tree of page elements |
139
+ | `find` | Find elements by natural language query |
140
+ | `computer` | Mouse/keyboard actions and screenshots |
141
+ | `navigate` | Navigate to URL or go back/forward |
142
+ | `form_input` | Set form input values |
143
+ | `javascript_tool` | Execute JavaScript in page context |
144
+ | `get_page_text` | Extract raw text content from page |
145
+ | `tabs_context` | Get tab group context info |
146
+ | `tabs_create` | Create new tab in MCP group |
147
+ | `resize_window` | Resize browser window |
148
+ | `gif_creator` | Record and export GIF of browser actions |
149
+ | `upload_image` | Upload image to file input or drag target |
150
+
151
+ ### Mac Mode
152
+
153
+ | Tool | Description |
154
+ |------|-------------|
155
+ | `screenshot` | Capture screen or region |
156
+ | `mouse_click` | Click at coordinates |
157
+ | `mouse_move` | Move cursor |
158
+ | `mouse_scroll` | Scroll in direction |
159
+ | `mouse_drag` | Drag from one point to another |
160
+ | `type_text` | Type text at cursor |
161
+ | `key_press` | Press key with modifiers |
162
+ | `run_applescript` | Execute AppleScript |
163
+ | `get_active_window` | Get focused window info |
164
+ | `list_windows` | List open windows |
165
+ | `focus_app` | Bring app to foreground |
166
+ | `get_accessibility_tree` | Get UI element hierarchy |
167
+ | `ocr_screen` | Extract text via OCR |
168
+ | `find` | Find elements by natural language |
169
+ | `gif_start/stop/export` | Record screen as GIF |
170
+
171
+ ## Troubleshooting
172
+
173
+ ### Extension not connecting
174
+
175
+ 1. Check the extension is enabled in `chrome://extensions`
176
+ 2. Verify the native host is registered:
177
+ ```bash
178
+ computer-control browser status
179
+ ```
180
+ 3. Make sure Chrome is running
181
+ 4. Try restarting Chrome
182
+
183
+ ### Permission errors on Mac
184
+
185
+ Grant permissions to your terminal app in System Settings:
186
+ - Privacy & Security → Accessibility
187
+ - Privacy & Security → Screen Recording
188
+
189
+ Then restart your terminal so the new permissions take effect.
190
+
191
+ ### MCP server not starting
192
+
193
+ Check whether either browser-mode port is already in use:
194
+ ```bash
195
+ lsof -i :62222 # Browser mode WebSocket port
196
+ lsof -i :62220 # Browser mode HTTP port
197
+ ```
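The same check can be wrapped in a small function that reports only listening sockets. This is a sketch, assuming `lsof` is installed; the helper name `port_status` is hypothetical, and the port numbers are the two listed above.

```bash
# Hypothetical helper: report whether anything is listening on a TCP port.
# Returns 0 if the port is free, 1 if it is in use, 2 if lsof is unavailable.
port_status() {
  port="$1"
  if ! command -v lsof >/dev/null 2>&1; then
    echo "lsof not found; cannot check port $port" >&2
    return 2
  fi
  # -t prints only PIDs; -sTCP:LISTEN limits matches to listening sockets;
  # -nP skips host and port name lookups.
  pids="$(lsof -nP -t -iTCP:"$port" -sTCP:LISTEN 2>/dev/null || true)"
  if [ -n "$pids" ]; then
    echo "port $port in use by PID(s): $(echo $pids)"
    return 1
  fi
  echo "port $port is free"
}

# Usage: check the WebSocket port and the HTTP port in turn.
for p in 62222 62220; do
  port_status "$p" || true
done
```

If a port is reported as in use, the listed PID shows which process to inspect; it may simply be an earlier `computer-control browser serve` that is still running.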
198
+
199
+ ## Development
200
+
201
+ ```bash
202
+ # Install dependencies
203
+ bun install
204
+
205
+ # Build everything
206
+ bun run build
207
+
208
+ # Build extension only
209
+ cd extension && ./build.sh
210
+
211
+ # Run from source
212
+ bun src/cli.ts browser serve --skip-permissions
213
+ bun src/cli.ts mac serve
214
+ ```
215
+
216
+ ## Privacy
217
+
218
+ See [PRIVACY.md](PRIVACY.md) for our privacy policy.
219
+
220
+ ## License
221
+
222
+ MIT