screenhand 0.2.0 → 0.3.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +165 -446
- package/bin/darwin-arm64/macos-bridge +0 -0
- package/dist/mcp-desktop.js +3615 -400
- package/dist/scripts/export-help-center.js +112 -0
- package/dist/scripts/marketing-loop.js +117 -0
- package/dist/scripts/observer-daemon.js +288 -0
- package/dist/scripts/orchestrator-daemon.js +399 -0
- package/dist/scripts/threads-campaign.js +208 -0
- package/dist/src/community/fetcher.js +109 -0
- package/dist/src/community/index.js +6 -0
- package/dist/src/community/publisher.js +191 -0
- package/dist/src/community/remote-api.js +121 -0
- package/dist/src/community/types.js +3 -0
- package/dist/src/community/validator.js +95 -0
- package/dist/src/context-tracker.js +489 -0
- package/dist/src/ingestion/coverage-auditor.js +233 -0
- package/dist/src/ingestion/doc-parser.js +164 -0
- package/dist/src/ingestion/index.js +8 -0
- package/dist/src/ingestion/menu-scanner.js +152 -0
- package/dist/src/ingestion/reference-merger.js +186 -0
- package/dist/src/ingestion/shortcut-extractor.js +180 -0
- package/dist/src/ingestion/tutorial-extractor.js +170 -0
- package/dist/src/ingestion/types.js +3 -0
- package/dist/src/jobs/manager.js +82 -14
- package/dist/src/jobs/runner.js +138 -15
- package/dist/src/learning/engine.js +356 -0
- package/dist/src/learning/index.js +9 -0
- package/dist/src/learning/locator-policy.js +120 -0
- package/dist/src/learning/pattern-policy.js +89 -0
- package/dist/src/learning/recovery-policy.js +116 -0
- package/dist/src/learning/sensor-policy.js +115 -0
- package/dist/src/learning/timing-model.js +204 -0
- package/dist/src/learning/topology-policy.js +90 -0
- package/dist/src/learning/types.js +9 -0
- package/dist/src/logging/timeline-logger.js +4 -1
- package/dist/src/memory/playbook-seeds.js +200 -0
- package/dist/src/memory/recall.js +60 -8
- package/dist/src/memory/service.js +30 -5
- package/dist/src/memory/store.js +34 -5
- package/dist/src/native/bridge-client.js +253 -31
- package/dist/src/observer/state.js +199 -0
- package/dist/src/observer/types.js +43 -0
- package/dist/src/orchestrator/state.js +68 -0
- package/dist/src/orchestrator/types.js +22 -0
- package/dist/src/perception/ax-source.js +162 -0
- package/dist/src/perception/cdp-source.js +162 -0
- package/dist/src/perception/coordinator.js +771 -0
- package/dist/src/perception/frame-differ.js +287 -0
- package/dist/src/perception/index.js +22 -0
- package/dist/src/perception/manager.js +199 -0
- package/dist/src/perception/types.js +47 -0
- package/dist/src/perception/vision-source.js +399 -0
- package/dist/src/planner/deterministic.js +298 -0
- package/dist/src/planner/executor.js +870 -0
- package/dist/src/planner/goal-store.js +92 -0
- package/dist/src/planner/index.js +21 -0
- package/dist/src/planner/planner.js +520 -0
- package/dist/src/planner/tool-registry.js +71 -0
- package/dist/src/planner/types.js +22 -0
- package/dist/src/platform/explorer.js +213 -0
- package/dist/src/platform/help-center-markdown.js +527 -0
- package/dist/src/platform/learner.js +257 -0
- package/dist/src/playbook/engine.js +296 -11
- package/dist/src/playbook/mcp-recorder.js +204 -0
- package/dist/src/playbook/recorder.js +3 -2
- package/dist/src/playbook/runner.js +1 -1
- package/dist/src/playbook/store.js +139 -10
- package/dist/src/recovery/detectors.js +156 -0
- package/dist/src/recovery/engine.js +327 -0
- package/dist/src/recovery/index.js +20 -0
- package/dist/src/recovery/strategies.js +274 -0
- package/dist/src/recovery/types.js +20 -0
- package/dist/src/runtime/accessibility-adapter.js +55 -18
- package/dist/src/runtime/applescript-adapter.js +8 -2
- package/dist/src/runtime/cdp-chrome-adapter.js +1 -1
- package/dist/src/runtime/executor.js +23 -3
- package/dist/src/runtime/locator-cache.js +24 -2
- package/dist/src/runtime/service.js +59 -15
- package/dist/src/runtime/session-manager.js +4 -1
- package/dist/src/runtime/vision-adapter.js +2 -1
- package/dist/src/state/app-map-types.js +72 -0
- package/dist/src/state/app-map.js +1974 -0
- package/dist/src/state/entity-tracker.js +108 -0
- package/dist/src/state/fusion.js +96 -0
- package/dist/src/state/index.js +21 -0
- package/dist/src/state/ladder-generator.js +236 -0
- package/dist/src/state/persistence.js +156 -0
- package/dist/src/state/types.js +17 -0
- package/dist/src/state/world-model.js +1456 -0
- package/dist/src/util/atomic-write.js +19 -4
- package/dist/src/util/sanitize.js +146 -0
- package/dist-app-maps/com.figma.Desktop.json +959 -0
- package/dist-app-maps/com.hnc.Discord.json +1146 -0
- package/dist-app-maps/notion.id.json +2831 -0
- package/dist-playbooks/canva-screenhand-carousel.json +445 -0
- package/dist-playbooks/codex-desktop.json +76 -0
- package/dist-playbooks/competitor-research-stack.json +122 -0
- package/dist-playbooks/davinci-color-grade.json +153 -0
- package/dist-playbooks/davinci-edit-timeline.json +162 -0
- package/dist-playbooks/davinci-render.json +114 -0
- package/dist-playbooks/devto.json +52 -0
- package/dist-playbooks/discord.json +41 -0
- package/dist-playbooks/google-flow-create-project.json +59 -0
- package/dist-playbooks/google-flow-edit-image.json +90 -0
- package/dist-playbooks/google-flow-edit-video.json +90 -0
- package/dist-playbooks/google-flow-generate-image.json +68 -0
- package/dist-playbooks/google-flow-generate-video.json +191 -0
- package/dist-playbooks/google-flow-open-project.json +48 -0
- package/dist-playbooks/google-flow-open-scenebuilder.json +64 -0
- package/dist-playbooks/google-flow-search-assets.json +64 -0
- package/dist-playbooks/instagram.json +57 -0
- package/dist-playbooks/linkedin.json +52 -0
- package/dist-playbooks/n8n.json +43 -0
- package/dist-playbooks/reddit.json +52 -0
- package/dist-playbooks/threads.json +59 -0
- package/dist-playbooks/x-twitter.json +59 -0
- package/dist-playbooks/youtube.json +59 -0
- package/dist-references/canva.json +646 -0
- package/dist-references/codex-desktop.json +305 -0
- package/dist-references/davinci-resolve-keyboard.json +594 -0
- package/dist-references/davinci-resolve-menu-map.json +1139 -0
- package/dist-references/davinci-resolve-menus-batch1.json +116 -0
- package/dist-references/davinci-resolve-menus-batch2.json +372 -0
- package/dist-references/davinci-resolve-menus-batch3.json +330 -0
- package/dist-references/davinci-resolve-menus-batch4.json +297 -0
- package/dist-references/davinci-resolve-shortcuts.json +333 -0
- package/dist-references/devpost.json +186 -0
- package/dist-references/devto.json +317 -0
- package/dist-references/discord.json +549 -0
- package/dist-references/figma.json +1186 -0
- package/dist-references/finder.json +146 -0
- package/dist-references/google-ads-transparency.json +95 -0
- package/dist-references/google-flow.json +649 -0
- package/dist-references/instagram.json +341 -0
- package/dist-references/linkedin.json +324 -0
- package/dist-references/meta-ad-library.json +86 -0
- package/dist-references/n8n.json +387 -0
- package/dist-references/notes.json +27 -0
- package/dist-references/notion.json +163 -0
- package/dist-references/reddit.json +341 -0
- package/dist-references/threads.json +337 -0
- package/dist-references/x-twitter.json +403 -0
- package/dist-references/youtube.json +373 -0
- package/native/macos-bridge/Package.swift +22 -0
- package/native/macos-bridge/Sources/AccessibilityBridge.swift +482 -0
- package/native/macos-bridge/Sources/AppManagement.swift +339 -0
- package/native/macos-bridge/Sources/CoreGraphicsBridge.swift +537 -0
- package/native/macos-bridge/Sources/ObserverBridge.swift +120 -0
- package/native/macos-bridge/Sources/StreamCapture.swift +136 -0
- package/native/macos-bridge/Sources/VisionBridge.swift +238 -0
- package/native/macos-bridge/Sources/main.swift +498 -0
- package/native/windows-bridge/AppManagement.cs +234 -0
- package/native/windows-bridge/InputBridge.cs +436 -0
- package/native/windows-bridge/Program.cs +270 -0
- package/native/windows-bridge/ScreenCapture.cs +453 -0
- package/native/windows-bridge/UIAutomationBridge.cs +571 -0
- package/native/windows-bridge/WindowsBridge.csproj +17 -0
- package/package.json +12 -1
- package/scripts/postinstall.cjs +127 -0
- package/dist/.audit-log.jsonl +0 -55
- package/dist/.screenhand/memory/.lock +0 -1
- package/dist/.screenhand/memory/actions.jsonl +0 -85
- package/dist/.screenhand/memory/errors.jsonl +0 -5
- package/dist/.screenhand/memory/errors.jsonl.bak +0 -4
- package/dist/.screenhand/memory/state.json +0 -35
- package/dist/.screenhand/memory/state.json.bak +0 -35
- package/dist/.screenhand/memory/strategies.jsonl +0 -12
- package/dist/agent/cli.js +0 -73
- package/dist/agent/loop.js +0 -258
- package/dist/config.js +0 -9
- package/dist/index.js +0 -56
- package/dist/logging/timeline-logger.js +0 -29
- package/dist/mcp/mcp-stdio-server.js +0 -448
- package/dist/mcp/server.js +0 -347
- package/dist/mcp-entry.js +0 -59
- package/dist/memory/recall.js +0 -160
- package/dist/memory/research.js +0 -98
- package/dist/memory/seeds.js +0 -89
- package/dist/memory/session.js +0 -161
- package/dist/memory/store.js +0 -391
- package/dist/memory/types.js +0 -4
- package/dist/monitor/codex-monitor.js +0 -377
- package/dist/monitor/task-queue.js +0 -84
- package/dist/monitor/types.js +0 -49
- package/dist/native/bridge-client.js +0 -174
- package/dist/native/macos-bridge-client.js +0 -5
- package/dist/npm-publish-helper.js +0 -117
- package/dist/npm-token-cdp.js +0 -113
- package/dist/npm-token-create.js +0 -135
- package/dist/npm-token-finish.js +0 -126
- package/dist/playbook/engine.js +0 -193
- package/dist/playbook/index.js +0 -4
- package/dist/playbook/recorder.js +0 -519
- package/dist/playbook/runner.js +0 -392
- package/dist/playbook/store.js +0 -166
- package/dist/playbook/types.js +0 -4
- package/dist/runtime/accessibility-adapter.js +0 -377
- package/dist/runtime/app-adapter.js +0 -48
- package/dist/runtime/applescript-adapter.js +0 -283
- package/dist/runtime/ax-role-map.js +0 -80
- package/dist/runtime/browser-adapter.js +0 -36
- package/dist/runtime/cdp-chrome-adapter.js +0 -505
- package/dist/runtime/composite-adapter.js +0 -205
- package/dist/runtime/executor.js +0 -250
- package/dist/runtime/locator-cache.js +0 -12
- package/dist/runtime/planning-loop.js +0 -47
- package/dist/runtime/service.js +0 -372
- package/dist/runtime/session-manager.js +0 -28
- package/dist/runtime/state-observer.js +0 -105
- package/dist/runtime/vision-adapter.js +0 -208
- package/dist/test-mcp-protocol.js +0 -138
- package/dist/types.js +0 -1
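The file list shows that 0.3.1 no longer ships the `dist/.screenhand/memory/` JSONL stores (`actions.jsonl`, `errors.jsonl`, `strategies.jsonl`) that were packaged in 0.2.0. The 0.2.0 README text removed in the diff below describes those stores as parsed per line, with corrupted lines skipped rather than treated as fatal. A minimal TypeScript sketch of that rule follows; `parseJsonl` and the `ActionRecord` shape are hypothetical illustrations, not ScreenHand's actual code or schema.

```typescript
// Hypothetical record shape; ScreenHand's real schema is not shown in this diff.
interface ActionRecord {
  tool: string;
  ok: boolean;
}

// Parse JSONL text one line at a time. A line that fails JSON.parse is
// counted and skipped instead of aborting the whole load.
function parseJsonl<T>(text: string): { records: T[]; skipped: number } {
  const records: T[] = [];
  let skipped = 0;
  for (const line of text.split("\n")) {
    const trimmed = line.trim(); // also strips a trailing "\r" from CRLF files
    if (trimmed === "") continue; // blank lines (e.g. the final newline) are not errors
    try {
      records.push(JSON.parse(trimmed) as T);
    } catch {
      skipped += 1;
    }
  }
  return { records, skipped };
}

const sample = [
  '{"tool":"apps","ok":true}',
  '{"tool":"focus","ok":tr', // truncated append, e.g. a crash mid-write
  '{"tool":"ui_press","ok":false}',
  "",
].join("\n");

const { records, skipped } = parseJsonl<ActionRecord>(sample);
console.log(records.map((r) => r.tool).join(","), skipped); // apps,ui_press 1
```

Returning a skip count instead of dropping bad lines silently lets a caller log how much of a store was unreadable after an interrupted write.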
package/README.md
CHANGED
@@ -2,9 +2,9 @@
 
 # ScreenHand
 
-**
+**Let AI control your desktop — click buttons, fill forms, automate workflows in ~50ms with zero extra AI calls.**
 
-An open-source [MCP server](https://modelcontextprotocol.io/) for macOS and Windows
+An open-source [MCP server](https://modelcontextprotocol.io/) for macOS and Windows. Works with Claude, Cursor, Codex CLI, and any MCP-compatible client.
 
 [](LICENSE)
 [](https://www.npmjs.com/package/screenhand)
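The 0.3.1 README pitches ScreenHand as working with "any MCP-compatible client" and describes it as a standard MCP server over stdio started with `npx -y screenhand`. Under the MCP stdio transport, a client writes JSON-RPC 2.0 messages to the server's stdin, one message per line with no embedded newlines, and opens the session with an `initialize` request followed by a `notifications/initialized` notification. The TypeScript sketch below only builds those wire frames and does not launch ScreenHand. The protocol version string, the client name, and the `title` argument key for `ui_press` are assumptions: the diff names the tool but not its input schema.

```typescript
type JsonRpcMessage = {
  jsonrpc: "2.0";
  id?: number; // requests carry an id; notifications do not
  method: string;
  params?: Record<string, unknown>;
};

// MCP's stdio transport delimits messages by newlines. JSON.stringify without
// an indent argument never emits raw newlines (they are escaped inside
// strings), so appending "\n" yields one valid frame.
function frame(msg: JsonRpcMessage): string {
  return JSON.stringify(msg) + "\n";
}

let nextId = 1;
function request(method: string, params: Record<string, unknown>): JsonRpcMessage {
  return { jsonrpc: "2.0", id: nextId++, method, params };
}

const handshake: JsonRpcMessage[] = [
  request("initialize", {
    protocolVersion: "2024-11-05", // assumption: one published MCP revision; negotiated at runtime
    capabilities: {},
    clientInfo: { name: "example-client", version: "0.0.1" }, // hypothetical client identity
  }),
  { jsonrpc: "2.0", method: "notifications/initialized" },
];

// Hypothetical argument key: the README only says ui_press clicks a UI element by its title.
const call = request("tools/call", { name: "ui_press", arguments: { title: "New Note" } });

const wire = [...handshake, call].map(frame).join("");
console.log(wire);
```

An MCP client library would normally perform this handshake and read the server's line-delimited responses itself; hand-built frames like these are mainly handy for poking at a stdio server from a shell.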
@@ -12,570 +12,289 @@ An open-source [MCP server](https://modelcontextprotocol.io/) for macOS and Wind
 []()
 []()
 
-[
+[Quick Start](#quick-start) | [What It Does](#what-it-does) | [Example](#example) | [All 111 Tools](docs/tools.md) | [Architecture](docs/architecture.md) | [Website](https://screenhand.com)
 
 </div>
 
 ---
 
-
+<!-- TODO: Add demo GIF here — 15 sec showing Claude controlling a real app -->
 
-
-- `0` extra AI calls for native clicks, typing, and UI element lookup
-- `70+` tools across desktop apps, browser automation, OCR, memory, sessions, jobs, and playbooks
-- `macOS + Windows` behind the same MCP interface
-- **Multi-agent safe** — session leases prevent conflicts between Claude, Cursor, and Codex
-- **Background worker** — queue jobs and let the daemon process them continuously
+## The Problem
 
-
+AI assistants can write code but can't use your computer. Every click requires a screenshot → LLM interpretation → coordinate guess — **3-5 seconds and an API call per action**.
 
-ScreenHand
+ScreenHand gives AI direct access to native OS APIs. No screenshots needed for clicks. No AI calls for button presses.
 
-
-
-
-
-
-
-
-- **Coordinate** multiple AI agents with session leases and stall detection
-
-It works as an [MCP (Model Context Protocol)](https://modelcontextprotocol.io/) server, meaning any MCP-compatible AI client can use it out of the box.
-
-| Problem | ScreenHand Solution |
-|---|---|
-| AI can't see your screen | Screenshots + OCR return all visible text |
-| AI can't click UI elements | Accessibility API finds and clicks elements in ~50ms |
-| AI can't control browsers | Chrome DevTools Protocol gives full page control |
-| AI can't automate workflows | 70+ tools for cross-app automation |
-| Only works on one OS | Native bridges for both macOS and Windows |
-| Multiple agents conflict | Session leases with heartbeat and stall detection |
-| Jobs need manual triggering | Worker daemon processes the queue continuously |
+| | Without ScreenHand | With ScreenHand |
+|---|---|---|
+| Click a button | Screenshot → LLM → coordinate click (~3-5s) | Native Accessibility API (~50ms) |
+| Cost per action | 1 LLM API call | 0 LLM calls |
+| Accuracy | Coordinate guessing — misses on layout shift | Exact element targeting by role/name |
+| Browser control | Needs focus, screenshot per action | CDP in background (~10ms), no focus needed |
+| Works across apps | One app at a time | Cross-app workflows, multi-agent coordination |
 
 ## Quick Start
 
-###
+### 1. Add to your AI client (one step)
 
-
+<details open>
+<summary><b>Claude Code</b> (recommended)</summary>
 
 ```bash
-
-cd screenhand
-npm install
-npm run build:native # macOS — builds Swift bridge
-# npm run build:native:windows # Windows — builds .NET bridge
+claude mcp add screenhand -- npx -y screenhand
 ```
 
-
+Done. That's it.
+</details>
 
-
+<details>
+<summary><b>Claude Desktop</b></summary>
 
 Add to `~/Library/Application Support/Claude/claude_desktop_config.json`:
-
 ```json
 {
   "mcpServers": {
     "screenhand": {
       "command": "npx",
-      "args": ["
+      "args": ["-y", "screenhand"]
     }
   }
 }
 ```
+</details>
 
-
-
-Add to your project `.mcp.json` or `~/.claude/settings.json`:
+<details>
+<summary><b>Cursor</b></summary>
 
+Add to `.cursor/mcp.json`:
 ```json
 {
   "mcpServers": {
     "screenhand": {
       "command": "npx",
-      "args": ["
+      "args": ["-y", "screenhand"]
     }
   }
 }
 ```
+</details>
 
-
-
-Add to `.cursor/mcp.json` in your project (or `~/.cursor/mcp.json` for global):
-
-```json
-{
-  "mcpServers": {
-    "screenhand": {
-      "command": "npx",
-      "args": ["tsx", "/path/to/screenhand/mcp-desktop.ts"]
-    }
-  }
-}
-```
-
-### OpenAI Codex CLI
+<details>
+<summary><b>OpenAI Codex CLI</b></summary>
 
 Add to `~/.codex/config.toml`:
-
 ```toml
 [mcp.screenhand]
 command = "npx"
-args = ["
+args = ["-y", "screenhand"]
 transport = "stdio"
 ```
+</details>
 
-
+<details>
+<summary><b>Any MCP Client</b></summary>
 
-
+ScreenHand is a standard MCP server over stdio. Run with `npx -y screenhand`.
+</details>
 
-
-{
-  "mcpServers": {
-    "screenhand": {
-      "command": "npx",
-      "args": ["tsx", "/path/to/screenhand/mcp-desktop.ts"]
-    }
-  }
-}
-```
+### 2. Grant permissions
 
-
+**macOS**: System Settings > Privacy & Security > Accessibility > enable your terminal app.
 
-
+**Windows**: No special permissions needed.
 
-
+### 3. Browser control (optional)
 
-
+Launch Chrome with remote debugging to enable browser tools:
+```bash
+open -a "Google Chrome" --args --remote-debugging-port=9222
+```
 
-
+That's it. Your AI client now has 111 tools for desktop automation.
 
-
+<details>
+<summary><b>Building from source</b> (contributors only)</summary>
 
-
+```bash
+git clone https://github.com/manushi4/screenhand.git
+cd screenhand && npm install && npm run build:native
+```
 
-
-
-| `screenshot` | Full screenshot + OCR — returns all visible text | ~600ms |
-| `screenshot_file` | Screenshot saved to file (for viewing the image) | ~400ms |
-| `ocr` | OCR with element positions and bounding boxes | ~600ms |
+On Windows, use `npm run build:native:windows` instead.
+</details>
 
-
+---
 
-
-|------|-------------|-------|
-| `apps` | List running apps with bundle IDs and PIDs | ~10ms |
-| `windows` | List visible windows with positions and sizes | ~10ms |
-| `focus` | Bring an app to the front | ~10ms |
-| `launch` | Launch an app by bundle ID or name | ~1s |
-| `ui_tree` | Full UI element tree — instant, no OCR needed | ~50ms |
-| `ui_find` | Find a UI element by text or title | ~50ms |
-| `ui_press` | Click a UI element by its title | ~50ms |
-| `ui_set_value` | Set value of a text field, slider, etc. | ~50ms |
-| `menu_click` | Click a menu bar item by path | ~100ms |
+## What It Does
 
-
+ScreenHand gives AI agents seven capabilities:
 
-
-
-| `click` | Click at screen coordinates |
-| `click_text` | Find text via OCR and click it (fallback) |
-| `type_text` | Type text via keyboard |
-| `key` | Key combo (e.g. `cmd+s`, `ctrl+shift+n`) |
-| `drag` | Drag from point A to B |
-| `scroll` | Scroll at a position |
+### Desktop Control — 19 tools
+Click buttons, type text, read UI trees, navigate menus, drag, scroll — all via native Accessibility APIs in ~50ms. Works with any app: Finder, Notes, VS Code, Xcode, System Settings, etc.
 
-###
+### Browser Automation — 15 tools
+Full Chrome control via DevTools Protocol. Navigate, click, type, run JavaScript, fill forms — all in the background at ~10ms. Built-in anti-detection (`browser_stealth`, `browser_human_click`) for sites with bot protection.
 
-
-
-| `browser_tabs` | List all open Chrome tabs |
-| `browser_open` | Open URL in new tab |
-| `browser_navigate` | Navigate active tab to URL |
-| `browser_js` | Run JavaScript in a tab |
-| `browser_dom` | Query DOM with CSS selectors |
-| `browser_click` | Click element by CSS selector (uses CDP mouse events) |
-| `browser_type` | Type into an input field (uses CDP keyboard events, React-compatible) |
-| `browser_wait` | Wait for a page condition |
-| `browser_page_info` | Get page title, URL, and content |
+### Smart Fallbacks — 8 tools
+`click_with_fallback`, `type_with_fallback`, etc. automatically try Accessibility → CDP → OCR → coordinates. You don't have to pick the right method — ScreenHand figures it out.
 
-###
+### Memory & Learning — 14 tools
+Gets smarter every session. Logs tool calls, saves winning strategies, tracks error patterns with fixes. Zero config, zero latency overhead (in-memory cache, async disk writes). Ships with 12 seed strategies for common macOS workflows. 6 learning policies: locator stability, sensor effectiveness, recovery ranking, pattern recognition, adaptive timing, and topology (navigation edge reliability).
 
-
+### App Mastery Map — automatic per-app spatial understanding
+Builds a persistent reverse-engineered blueprint of every app from normal tool usage. 8 features record automatically: page zones, navigation graph (BFS pathfinding), hierarchy, I/O contracts, state machine, element visibility, timing profiles, and ready signals. Mastery levels (beginner → pro → expert → grandmaster) honestly reflect how well ScreenHand knows each app. Maps stored at `~/.screenhand/app-maps/`.
 
-
-
-| `browser_stealth` | Inject anti-detection patches (hides webdriver flag, fakes plugins/languages) |
-| `browser_fill_form` | Human-like typing with random delays via CDP keyboard events |
-| `browser_human_click` | Realistic mouse event sequence (mouseMoved → mousePressed → mouseReleased) |
+### Jobs & Orchestration — 34 tools
+Queue multi-step jobs, run them via background worker daemon, coordinate multiple AI agents with session leases, detect stalls, auto-recover. Survives client restarts.
 
-
+### Perception & Planning — 17 tools
+Continuous screen awareness (3-rate perception loop at 100ms/300ms/1000ms), real-time world model with entity tracking, goal-oriented planning with auto-decomposition, recovery engine with self-healing. The system always knows what's on screen and feeds observations into the App Mastery Map.
 
-
+> **Full reference**: See all [111 tools with descriptions](docs/tools.md).
 
-
+---
 
-
-|------|-------------|
-| `execution_plan` | Generate an execution plan for a task |
-| `click_with_fallback` | Click using the best available method |
-| `type_with_fallback` | Type using the best available method |
-| `read_with_fallback` | Read content using the best available method |
-| `locate_with_fallback` | Find an element using the best available method |
-| `select_with_fallback` | Select an option using the best available method |
-| `scroll_with_fallback` | Scroll using the best available method |
-| `wait_for_state` | Wait for a UI state using the best available method |
+## Example
 
-
+**Browser** — Claude controls Chrome in the background while you work:
 
-
+```
+You: Search for "screenhand" on Instagram
 
-
-
-| `platform_guide` | Get automation guide for a platform (selectors, URLs, flows, errors+solutions) |
-| `export_playbook` | Auto-generate a playbook from your session. Share it to help others. |
+→ browser_tabs() # ~10ms
+[34DF5DE1] Instagram — https://www.instagram.com/
 
-
-
-
-platform_guide({ platform: "devpost", section: "flows" }) # Step-by-step workflows
-platform_guide({ platform: "devpost" }) # Full playbook
-```
+→ browser_js({ code: "/* click Search icon */" }) # ~10ms
+→ browser_fill_form({ selector: "input", text: "screenhand" }) # ~50ms (human-like)
+→ browser_js({ code: "/* extract results */" }) # ~10ms
 
-
-```
-export_playbook({ platform: "twitter", domain: "twitter.com" })
+Found @screenhand_ as the top result.
 ```
-This auto-extracts URLs, selectors, errors+solutions from your session and saves a ready-to-share `playbooks/twitter.json`.
-
-Available platforms: `devpost`. Add more by running `export_playbook` or creating JSON files in `playbooks/`.
-
-Zero performance cost — files only read when `platform_guide` is called.
-
-### AppleScript (macOS only)
-
-| Tool | What it does |
-|------|-------------|
-| `applescript` | Run any AppleScript command |
-
-### Memory (Learning) — zero-config, zero-latency
-
-ScreenHand gets smarter every time you use it — **no manual setup needed**.
-
-**What happens automatically:**
-- Every tool call is logged (async, non-blocking — adds ~0ms to response time)
-- After 3+ consecutive successes, the winning sequence is saved as a reusable strategy
-- Known error patterns are tracked with resolutions (e.g. "launch times out → use focus() instead")
-- On every tool call, the response includes **auto-recall hints**:
-  - Error warnings if the tool has failed before
-  - Next-step suggestions if you're mid-way through a known strategy
-
-**Predefined seed strategies:**
-- Ships with 12 common macOS workflows (Photo Booth, Chrome navigation, copy/paste, Finder, export PDF, etc.)
-- Loaded automatically on first boot — the system has knowledge from day one
-- Seeds are searchable via `memory_recall` and provide next-step hints like any learned strategy
-
-**Background web research:**
-- When a tool fails and no resolution exists, ScreenHand searches for a fix in the background (non-blocking)
-- Uses Claude API (haiku, if `ANTHROPIC_API_KEY` is set) or DuckDuckGo instant answers as fallback
-- Resolutions are saved to both error cache and strategy store — zero-latency recall next time
-- Completely silent and fire-and-forget — never blocks tool responses or throws errors
-
-**Fingerprint matching & feedback loop:**
-- Each strategy is fingerprinted by its tool sequence (e.g. `apps→focus→ui_press`)
-- O(1) exact-match lookup when the agent follows a known sequence
-- Success/failure outcomes are tracked per strategy — unreliable strategies are auto-penalized and eventually skipped
-- Keyword-based fuzzy search with reliability scoring for `memory_recall`
-
-**Production-grade under the hood:**
-- All data cached in RAM at startup — lookups are ~0ms, disk is only for persistence
-- Disk writes are async and buffered (100ms debounce) — never block tool calls
-- Sync flush on process exit (SIGINT/SIGTERM) — no lost writes
-- Per-line JSONL parsing — corrupted lines are skipped, not fatal
-- LRU eviction: 500 strategies, 200 error patterns max (oldest evicted automatically)
-- File locking (`.lock` + PID) prevents corruption from concurrent instances
-- Action log auto-rotates at 10 MB
-- Data lives in `.screenhand/memory/` as JSONL (grep-friendly, no database)
-
-| Tool | What it does |
-|------|-------------|
-| `memory_snapshot` | Get current memory state snapshot |
-| `memory_recall` | Search past strategies by task description |
-| `memory_save` | Manually save the current session as a strategy |
-| `memory_record_error` | Record an error pattern with an optional fix |
-| `memory_record_learning` | Record a verified pattern (what works/fails) |
-| `memory_query_patterns` | Search learnings by scope and method |
-| `memory_errors` | View all known error patterns and resolutions |
-| `memory_stats` | Action counts, success rates, top tools, disk usage |
-| `memory_clear` | Clear actions, strategies, errors, or all data |
-
|
|
311
|
-
### Session Supervisor — multi-agent coordination
|
|
312
|
-
|
|
313
|
-
Lease-based window locking with heartbeat, stall detection, and automatic recovery. Prevents multiple AI agents from fighting over the same app window.
|
|
314
|
-
|
|
315
|
-
| Tool | What it does |
|
|
316
|
-
|------|-------------|
|
|
317
|
-
| `session_claim` | Claim exclusive control of an app window |
|
|
318
|
-
| `session_heartbeat` | Keep your lease alive (call every 60s) |
|
|
319
|
-
| `session_release` | Release your session lease |
|
|
320
|
-
| `supervisor_status` | Active sessions, health metrics, stall detection |
|
|
321
|
-
| `supervisor_start` | Start the supervisor background daemon |
|
|
322
|
-
| `supervisor_stop` | Stop the supervisor daemon |
|
|
323
|
-
| `supervisor_pause` | Pause supervisor monitoring |
|
|
324
|
-
| `supervisor_resume` | Resume supervisor monitoring |
|
|
325
|
-
| `supervisor_install` | Install supervisor as a launchd service (macOS) |
|
|
326
|
-
| `supervisor_uninstall` | Uninstall supervisor launchd service |
|
|
327
|
-
| `recovery_queue_add` | Add a recovery action to the supervisor's queue |
|
|
328
|
-
| `recovery_queue_list` | List pending recovery actions |
|
|
329
|
-
|
|
330
|
-
The supervisor runs as a **detached daemon** that survives MCP/client restarts. It monitors active sessions, detects stalls, expires abandoned leases, and queues recovery actions.
````diff
-
-### Jobs & Worker Daemon
-
-Queue multi-step automation jobs and let a background worker process them continuously. Jobs can target specific apps/windows and execute via playbook engine or free-form steps.
-
-| Tool | What it does |
-|------|-------------|
-| `job_create` | Create a job with steps (optionally tied to a playbook + bundleId/windowId) |
-| `job_status` | Get the status of a job |
-| `job_list` | List jobs by state (queued, running, done, failed, blocked) |
-| `job_transition` | Transition a job to a new state |
-| `job_step_done` | Mark a job step as done |
-| `job_step_fail` | Mark a job step as failed |
-| `job_resume` | Resume a blocked/waiting job |
-| `job_dequeue` | Dequeue the next queued job |
-| `job_remove` | Remove a job |
-| `job_run` | Execute a single queued job through the runner |
-| `job_run_all` | Process all queued jobs sequentially |
-| `worker_start` | Start the background worker daemon |
-| `worker_stop` | Stop the worker daemon |
-| `worker_status` | Get worker daemon status and recent results |
-
-**Job state machine:** `queued → running → done | failed | blocked | waiting_human`
````
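The removed state-machine line above can be written out as an explicit transition table. This sketch encodes the arrows the README lists. It also adds one assumption the README does not state: `job_resume` sends a `blocked` or `waiting_human` job back to `queued`. `canTransition`, `transition`, and `isTerminal` are hypothetical helpers, not the package's `job_transition` implementation.

```typescript
// Hypothetical transition table for the job states listed above.
// Arrows out of queued and running come from the README; the resume arrows
// (blocked/waiting_human -> queued) are an assumption.

type JobState = "queued" | "running" | "done" | "failed" | "blocked" | "waiting_human";

const TRANSITIONS: Record<JobState, readonly JobState[]> = {
  queued: ["running"],
  running: ["done", "failed", "blocked", "waiting_human"],
  blocked: ["queued"],       // assumed target of job_resume
  waiting_human: ["queued"], // assumed target of job_resume
  done: [],                  // terminal
  failed: [],                // terminal
};

function canTransition(from: JobState, to: JobState): boolean {
  return TRANSITIONS[from].includes(to);
}

// Returns the new state, or throws so an illegal move cannot corrupt the queue.
function transition(from: JobState, to: JobState): JobState {
  if (!canTransition(from, to)) {
    throw new Error(`illegal job transition: ${from} -> ${to}`);
  }
  return to;
}

function isTerminal(state: JobState): boolean {
  return TRANSITIONS[state].length === 0;
}
```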
````diff
-
-**Worker daemon features:**
-- Runs as a detached process — survives MCP/client restarts
-- Continuously polls the job queue and executes via JobRunner
-- Playbook integration — jobs with a `playbookId` execute through PlaybookEngine
-- Focuses/validates the target `bundleId`/`windowId` before each step
-- Persists status and recent results to `~/.screenhand/worker/state.json`
-- Single-instance enforcement via PID file
-- Graceful shutdown on SIGINT/SIGTERM
````
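Two of the removed worker features above are single-instance enforcement via a PID file and graceful shutdown on SIGINT/SIGTERM. Both can be sketched with Node's standard library alone. The `worker.pid` file name, the stale-PID takeover, and all function names are assumptions. The README says only that a PID file is used.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical single-instance lock. The file name and stale-PID handling are
// assumptions; the README only says the worker enforces one instance via a PID file.
const DEFAULT_PID_FILE = path.join(os.homedir(), ".screenhand", "worker", "worker.pid");

function isProcessAlive(pid: number): boolean {
  try {
    process.kill(pid, 0); // signal 0 checks existence/permission without signalling
    return true;
  } catch (err) {
    // EPERM: the process exists but belongs to another user.
    return (err as NodeJS.ErrnoException).code === "EPERM";
  }
}

function acquirePidLock(pidFile: string = DEFAULT_PID_FILE): boolean {
  fs.mkdirSync(path.dirname(pidFile), { recursive: true });
  try {
    // "wx" creates the file exclusively and fails with EEXIST if it already exists.
    fs.writeFileSync(pidFile, String(process.pid), { flag: "wx" });
    return true;
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code !== "EEXIST") throw err;
  }
  const owner = Number.parseInt(fs.readFileSync(pidFile, "utf8"), 10);
  if (Number.isInteger(owner) && owner > 0 && isProcessAlive(owner)) {
    return false; // another worker already holds the lock
  }
  // Stale file from a crashed worker: take it over (not atomic; two racing workers could both succeed).
  fs.writeFileSync(pidFile, String(process.pid));
  return true;
}

function releasePidLock(pidFile: string = DEFAULT_PID_FILE): void {
  try {
    // Only delete the file if this process still owns it.
    if (fs.readFileSync(pidFile, "utf8").trim() === String(process.pid)) {
      fs.unlinkSync(pidFile);
    }
  } catch {
    // Already removed: nothing to do.
  }
}

// Graceful shutdown: let the caller finish up, drop the lock, then exit.
function installShutdownHandlers(onStop: () => void, pidFile: string = DEFAULT_PID_FILE): void {
  const stop = (): void => {
    onStop();
    releasePidLock(pidFile);
    process.exit(0);
  };
  process.once("SIGINT", stop);
  process.once("SIGTERM", stop);
}
```

The exclusive `wx` create makes the common path atomic. The stale-file takeover path is not atomic. A production daemon would need a stronger lock if several workers could restart at the same instant after a crash.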
````diff

-
-# Start the worker daemon directly
-npx tsx scripts/worker-daemon.ts
-npx tsx scripts/worker-daemon.ts --poll 5000 --max-jobs 10
````
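The removed commands above start the worker with `--poll 5000 --max-jobs 10`. The README does not define these flags. This sketch assumes they mean a poll interval in milliseconds and a cap on the number of jobs processed before exit. In the loop below, `dequeue` and `runJob` are hypothetical stand-ins for the real queue and JobRunner. The defaults are also assumptions.

```typescript
// Hypothetical worker loop. Flag semantics and defaults are assumptions, not
// scripts/worker-daemon.ts behavior.

interface WorkerOptions {
  pollMs: number;  // how long to wait when the queue is empty
  maxJobs: number; // stop after this many jobs (Infinity = run until stopped)
}

interface Job {
  id: string;
}

function parseWorkerArgs(argv: string[]): WorkerOptions {
  const opts: WorkerOptions = { pollMs: 2_000, maxJobs: Infinity }; // assumed defaults
  for (let i = 0; i < argv.length; i++) {
    const value = argv[i + 1];
    if (argv[i] === "--poll" && value !== undefined) {
      opts.pollMs = Number(value);
      i++; // skip the consumed value
    } else if (argv[i] === "--max-jobs" && value !== undefined) {
      opts.maxJobs = Number(value);
      i++;
    }
  }
  // Rejects NaN, zero, and negative values.
  if (!(opts.pollMs > 0) || !(opts.maxJobs > 0)) {
    throw new Error("--poll and --max-jobs must be positive numbers");
  }
  return opts;
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function runWorker(
  opts: WorkerOptions,
  dequeue: () => Promise<Job | undefined>,
  runJob: (job: Job) => Promise<void>,
  shouldStop: () => boolean = () => false, // e.g. flipped by a SIGTERM handler
  sleep: (ms: number) => Promise<void> = defaultSleep,
): Promise<number> {
  let processed = 0;
  while (processed < opts.maxJobs && !shouldStop()) {
    const job = await dequeue();
    if (job === undefined) {
      await sleep(opts.pollMs); // queue empty: back off, then poll again
      continue;
    }
    try {
      await runJob(job);
    } catch (err) {
      // One failing job should not kill the daemon; a real runner would mark it failed.
      console.error(`job ${job.id} failed:`, err);
    }
    processed++;
  }
  return processed;
}
```

Injecting `sleep` and `shouldStop` keeps the loop testable without real timers. A daemon would wire `shouldStop` to its signal handlers.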
````diff
+**Desktop** — native app control without screenshots:

-
-
+```
+→ apps() # List running apps ~10ms
+→ focus("com.apple.Notes") # Bring Notes to front ~10ms
+→ ui_tree() # Read full UI element tree ~50ms
+→ ui_press("New Note") # Click "New Note" button ~50ms
+→ type_text("Hello world") # Type text ~30ms
 ```

-
+**Cross-app** — chain actions across your whole desktop:

 ```
-
-
-
-
-┌────────────────────────▼────────────────────────────┐
-│ mcp-desktop.ts │
-│ (MCP Server — 70+ tools) │
-├───────────┬──────────┬──────────────────────────────┤
-│ Native │ Chrome │ Memory / Supervisor / Jobs │
-│ Bridge │ CDP │ / Playbooks / Worker │
-└─────┬─────┴────┬─────┴──────────────────────────────┘
-│ │
-┌─────▼─────┐ ┌──▼──────┐ ┌──────────────┐ ┌──────────────┐
-│macos-bridge│ │ Chrome │ │ Supervisor │ │ Worker │
-│(Swift, AX) │ │DevTools │ │ Daemon │ │ Daemon │
-└────────────┘ └─────────┘ └──────────────┘ └──────────────┘
+→ browser_js(...) # Extract data from Chrome
+→ focus("com.apple.Notes") # Switch to Notes
+→ type_text(extractedData) # Paste it in
+→ key("cmd+s") # Save
 ```
````
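Over MCP, each arrow in the added cross-app example above becomes a JSON-RPC `tools/call` request. The envelope follows the MCP specification: `jsonrpc`, `id`, `method`, and `params.name` / `params.arguments`. The stdio transport carries one JSON message per line. The argument names `code`, `bundleId`, `text`, and `keys` are assumptions. The real input schemas come from the server's `tools/list` response.

```typescript
// Sketch: the cross-app chain as MCP tools/call requests.
// Argument names are hypothetical; consult tools/list for ScreenHand's real input schemas.

interface ToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: ToolCall;
}

// Step 1 runs alone: its result is needed before the remaining steps can be built.
const extractStep: ToolCall = {
  name: "browser_js",
  arguments: { code: "document.querySelector('h1')?.textContent ?? ''" },
};

// Steps 2-4 depend on the text returned by browser_js.
function pasteSteps(extractedData: string): ToolCall[] {
  return [
    { name: "focus", arguments: { bundleId: "com.apple.Notes" } },
    { name: "type_text", arguments: { text: extractedData } },
    { name: "key", arguments: { keys: "cmd+s" } },
  ];
}

function toRequests(calls: ToolCall[], firstId: number): ToolCallRequest[] {
  return calls.map((params, i): ToolCallRequest => ({
    jsonrpc: "2.0",
    id: firstId + i,
    method: "tools/call",
    params,
  }));
}

// stdio framing: one JSON object per line. JSON.stringify without an indent
// argument never emits a raw newline, so each message stays on one line.
function frame(requests: ToolCallRequest[]): string {
  return requests.map((r) => JSON.stringify(r) + "\n").join("");
}
```

A client would send the first frame and wait for the response with `id: 1`. It would then read the extracted text from that result and only afterwards build and send the remaining three requests.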
````diff

-
-
-| Path | Purpose |
-|---|---|
-| `mcp-desktop.ts` | MCP server entrypoint — all tool definitions |
-| `src/native/bridge-client.ts` | TypeScript ↔ native bridge communication |
-| `native/macos-bridge/` | Swift binary using Accessibility API + OCR |
-| `native/windows-bridge/` | C# binary using UI Automation + SendInput |
-| `src/memory/` | Persistent memory service (strategies, errors, learnings) |
-| `src/supervisor/` | Session leases, stall detection, recovery |
-| `src/jobs/` | Job queue, runner, worker state persistence |
-| `src/playbook/` | Playbook engine and store |
-| `src/runtime/` | Execution contract, accessibility adapter, fallback chain |
-| `scripts/worker-daemon.ts` | Standalone worker daemon process |
-| `scripts/supervisor-daemon.ts` | Standalone supervisor daemon process |
+---

-
+## Claude Code Plugin

-
+If you use Claude Code, ScreenHand includes a plugin with **13 skills and 5 agents** that wrap all 111 tools into intent-oriented workflows.

-```
-
-├── memory/ # strategies, errors, learnings (JSONL)
-├── supervisor/ # supervisor daemon state
-├── locks/ # session lease files
-├── jobs/ # job queue persistence
-├── worker/ # worker daemon state, PID, logs
-└── playbooks/ # saved playbook definitions
+```bash
+./install-plugin.sh # after npm install && npm run build:native
 ```
````
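The removed `~/.screenhand/` layout above stores strategies, errors, and learnings as JSONL. The FAQ further down says disk writes are asynchronous. The following sketches an append-only JSONL store built on `fs/promises`. The record fields and the `strategies.jsonl` file name are hypothetical.

```typescript
import { promises as fsp } from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Sketch of an append-only JSONL memory file. Record fields and file names are
// hypothetical; ScreenHand's src/memory/ may store something different.

interface MemoryRecord {
  kind: "strategy" | "error" | "learning";
  app: string;  // e.g. a bundle id such as "com.apple.Notes"
  note: string;
  ts: number;   // epoch milliseconds
}

// Hypothetical default location mirroring the ~/.screenhand/memory/ entry.
const STRATEGIES_FILE = path.join(os.homedir(), ".screenhand", "memory", "strategies.jsonl");

// Append one record as one line. Uses fs/promises, so the event loop is not blocked.
async function appendRecord(file: string, record: MemoryRecord): Promise<void> {
  await fsp.mkdir(path.dirname(file), { recursive: true });
  // JSON.stringify without an indent argument escapes newlines inside strings,
  // so every record occupies exactly one line.
  await fsp.appendFile(file, JSON.stringify(record) + "\n", "utf8");
}

// Read all records; a missing file means an empty memory.
async function readRecords(file: string): Promise<MemoryRecord[]> {
  let text: string;
  try {
    text = await fsp.readFile(file, "utf8");
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code === "ENOENT") return [];
    throw err;
  }
  const records: MemoryRecord[] = [];
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue;
    try {
      records.push(JSON.parse(line) as MemoryRecord);
    } catch {
      // Skip a torn last line left by an interrupted write.
    }
  }
  return records;
}
```

Because each record is one line, appends never rewrite earlier data. A crash can damage at most the final line, which the reader skips.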
````diff

-
+| Skill | What it does |
+|-------|-------------|
+| `/automate` | Control any desktop app |
+| `/post-social` | Post to X, LinkedIn, Instagram, Reddit, Threads, Discord |
+| `/run-campaign` | Multi-platform marketing campaigns |
+| `/edit-video` | DaVinci Resolve automation |
+| `/design-figma` | Figma design via Plugin API + browser |
+| `/edit-canva` | Canva template editing |
+| `/scrape-web` | Data extraction with anti-detection |
+| `/fill-form` | Human-like form filling |
+| `/qa-smoke-test` | Automated UI testing |
+| `/record-workflow` | Record into reusable playbooks |
+| `/learn-platform` | Discover how to automate a new app/site |
+| `/run-jobs` | Job queues, background workers |
+| `/manage-system` | Supervisor, memory, diagnostics |
+
+5 specialized agents: **marketing**, **design**, **QA**, **scraper**, **orchestrator**.
+
+---

-
+## How It Works

 ```
-AI Client (Claude, Cursor,
+AI Client (Claude, Cursor, Codex CLI)
 ↓ MCP protocol (stdio)
 ScreenHand MCP Server (TypeScript)
 ↓ JSON-RPC (stdio)
 Native Bridge (Swift on macOS / C# on Windows)
-↓
-
+↓ OS APIs
+Accessibility, CoreGraphics, Vision, UI Automation, SendInput
 ```
````
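Both hops in the diagram above carry JSON-RPC over stdio. The reading side therefore has to split a character stream into messages and match each response to its request `id`. The following is a minimal sketch that assumes newline-delimited framing, which MCP's stdio transport uses. Whether the Swift/C# bridge frames its messages the same way is an assumption. `LineDecoder` and `PendingRequests` are hypothetical names.

```typescript
// Sketch: reading newline-delimited JSON-RPC from a child process's stdout.
// Call child.stdout.setEncoding("utf8") first so multi-byte characters split
// across chunks are decoded correctly before push() sees them.

interface JsonRpcResponse {
  jsonrpc: "2.0";
  id: number;
  result?: unknown;
  error?: { code: number; message: string };
}

class LineDecoder {
  private buffer = "";

  // Feed one chunk; returns every complete message it finished (possibly none).
  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const messages: unknown[] = [];
    let newline = this.buffer.indexOf("\n");
    while (newline !== -1) {
      const line = this.buffer.slice(0, newline).trim();
      this.buffer = this.buffer.slice(newline + 1);
      if (line.length > 0) messages.push(JSON.parse(line));
      newline = this.buffer.indexOf("\n");
    }
    return messages; // any partial line stays buffered for the next chunk
  }
}

class PendingRequests {
  private readonly waiting = new Map<number, (response: JsonRpcResponse) => void>();
  private nextId = 1;

  // Allocate an id and a promise that settles when its response arrives.
  register(): { id: number; done: Promise<JsonRpcResponse> } {
    const id = this.nextId++;
    const done = new Promise<JsonRpcResponse>((resolve) => {
      this.waiting.set(id, resolve);
    });
    return { id, done };
  }

  // Route a decoded response to its waiter; returns false for unknown ids.
  settle(response: JsonRpcResponse): boolean {
    const resolve = this.waiting.get(response.id);
    if (resolve === undefined) return false;
    this.waiting.delete(response.id);
    resolve(response);
    return true;
  }
}
```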
````diff

-
-- **macOS**: Swift binary using Accessibility APIs, CoreGraphics, and Vision framework (OCR)
-- **Windows**: C# (.NET 8) binary using UI Automation, SendInput, GDI+, and Windows.Media.Ocr
-2. **TypeScript MCP server** — routes tools to the correct bridge, handles Chrome CDP, manages sessions, runs jobs
-3. **MCP protocol** — standard Model Context Protocol so any AI client can connect
-
-The native bridge is auto-selected based on your OS. Both bridges speak the same JSON-RPC protocol, so all tools work identically on both platforms.
-
-## Use Cases
-
-### App Debugging
-Claude reads UI trees, clicks through flows, and checks element states — faster than clicking around yourself.
-
-### Design Inspection
-Screenshots + OCR to read exactly what's on screen. `ui_tree` shows component structure like React DevTools but for any native app.
-
-### Browser Automation
-Fill forms, scrape data, run JavaScript, navigate pages — all through Chrome DevTools Protocol.
-
-### Cross-App Workflows
-Read from one app, paste into another, chain actions across your whole desktop. Example: extract data from a spreadsheet, search it in Chrome, paste results into Notes.
-
-### Multi-Agent Coordination
-Run Claude, Cursor, and Codex simultaneously — each claims its own app window via session leases. The supervisor detects stalls and recovers.
-
-### Background Job Processing
-Queue automation jobs with `job_create`, start the worker daemon with `worker_start`, and let it process tasks continuously — even after you close your AI client.
+ScreenHand reads the UI tree and DOM directly — no screenshots needed for most operations. When screenshots are needed (canvas apps, visual verification), OCR runs in ~600ms via the native Vision framework.

-
-Click buttons, verify text appears, catch visual regressions — all driven by AI.
+---

 ## Requirements

-
-
-- macOS 12+
-- Node.js 18+
-- Accessibility permissions: System Settings > Privacy & Security > Accessibility > enable your terminal
-- Chrome with `--remote-debugging-port=9222` (only for browser tools)
-
-### Windows
-
-- Windows 10 (1809+)
-- Node.js 18+
-- [.NET 8 SDK](https://dotnet.microsoft.com/download/dotnet/8.0)
-- No special permissions needed — UI Automation works without admin
-- Chrome with `--remote-debugging-port=9222` (only for browser tools)
-- Build: `npm run build:native:windows`
-
-## Skills (Slash Commands)
-
-ScreenHand ships with Claude Code slash commands:
-
-- `/screenshot` — capture your screen and describe what's visible
-- `/debug-ui` — inspect the UI tree of any app
-- `/automate` — describe a task and Claude does it
-
-**Install globally** so they work in any project:
-
-```bash
-./install-skills.sh
-```
-
-## Development
-
-```bash
-npm run dev # Run MCP server with tsx (hot reload)
-npm run check # type-check (covers all entry files)
-npm test # run test suite
-npm run build # compile TypeScript
-npm run build:native # build Swift bridge (macOS)
-npm run build:native:windows # build .NET bridge (Windows)
-```
-
-## FAQ
-
-### What is ScreenHand?
-ScreenHand is an open-source MCP server that gives AI assistants like Claude the ability to see and control your desktop. It provides 70+ tools for screenshots, UI inspection, clicking, typing, browser automation, session management, job queuing, and playbook execution on both macOS and Windows.
-
-### How does ScreenHand differ from Anthropic's Computer Use?
-Anthropic's Computer Use is a cloud-based feature built into Claude. ScreenHand is an open-source, local-first tool that runs entirely on your machine with no cloud dependency. It uses native OS APIs (Accessibility on macOS, UI Automation on Windows) which are faster and more reliable than screenshot-based approaches.
-
-### How does ScreenHand work with OpenClaw?
-
-ScreenHand **integrates with** OpenClaw as an MCP server — giving your Claw agent native desktop speed instead of screenshot-based clicking.
-
-| | Without ScreenHand | With ScreenHand |
+| | macOS | Windows |
 |---|---|---|
-|
-|
-|
+| OS | macOS 12+ | Windows 10 (1809+) |
+| Runtime | Node.js 18+ | Node.js 18+ |
+| Native | Swift (included) | [.NET 8 SDK](https://dotnet.microsoft.com/download/dotnet/8.0) |
+| Permissions | Accessibility access for terminal | None (UI Automation works without admin) |
+| Browser | Chrome with `--remote-debugging-port=9222` | Same |
+
+## Docs
+
+| Document | What's in it |
+|----------|-------------|
+| [All 111 Tools](docs/tools.md) | Complete tool reference with descriptions and speeds |
+| [Architecture](docs/architecture.md) | 7-layer design, app tiers, performance targets |
+| [App Mastery Map](docs/app-mastery-map.md) | Layer 7: persistent spatial understanding, 8 auto-recording features |
+| [Bug Tracker](docs/l2-bug-tracker.md) | 103 bugs found and fixed, 80-scenario validation results |
+| [Testing Plan](docs/testing-plan.md) | L1/L2 test methodology and gate criteria |

-
+## FAQ

-
-
-"mcpServers": {
-"screenhand": {
-"command": "npx",
-"args": ["tsx", "/path/to/screenhand/mcp-desktop.ts"]
-}
-}
-}
-```
+<details>
+<summary><b>How is this different from Anthropic's Computer Use?</b></summary>

-
+Computer Use is cloud-based and screenshot-driven. ScreenHand is local-first, uses native OS APIs (50ms vs 3-5s per action), costs zero API calls for clicks/typing, and runs entirely on your machine.
+</details>

-
-
+<details>
+<summary><b>What apps can it control?</b></summary>

-
-
+Any app with Accessibility support (most macOS/Windows apps). Chrome and Electron apps get full DOM access via CDP. Canvas-heavy apps (games, Photoshop viewport) use OCR as fallback.
+</details>
````
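The added answer above implies an order of preference. DOM access over CDP comes first where available, then the Accessibility tree for native apps, then OCR when neither exposes the target. The following is a hypothetical sketch of such a fallback chain. It uses the README's approximate latencies: ~10ms for CDP, ~50ms for Accessibility, and ~600ms for OCR. The `Target` fields and the ordering rule are assumptions, not the logic in `src/runtime/`.

```typescript
// Hypothetical fallback chain: try the fastest method that can reach the target,
// fall back to slower ones on failure. Fields and ordering are assumptions.

type Method = "cdp" | "accessibility" | "ocr";

interface Target {
  hasDevToolsPort: boolean;      // Chrome/Electron reachable over CDP
  hasAccessibilityTree: boolean; // element exposed via AX (macOS) or UI Automation (Windows)
}

// Approximate per-action latencies quoted in the README's FAQ.
const APPROX_LATENCY_MS: Record<Method, number> = { cdp: 10, accessibility: 50, ocr: 600 };

function methodChain(target: Target): Method[] {
  const chain: Method[] = [];
  if (target.hasDevToolsPort) chain.push("cdp");
  if (target.hasAccessibilityTree) chain.push("accessibility");
  chain.push("ocr"); // screen-reading fallback, e.g. for canvas-heavy apps
  return chain;
}

async function runWithFallback<T>(
  chain: Method[],
  attempt: (method: Method) => Promise<T>,
): Promise<{ method: Method; value: T }> {
  const failures: string[] = [];
  for (const method of chain) {
    try {
      return { method, value: await attempt(method) };
    } catch (err) {
      failures.push(`${method}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`all methods failed (${failures.join("; ")})`);
}
```

Summing `APPROX_LATENCY_MS` over a chain gives a rough worst case. For a native app whose element is missing from the tree, that is 50 + 600 = 650 ms before OCR returns.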
````diff

-
-
+<details>
+<summary><b>Is it safe?</b></summary>

-
-
+Runs locally, never sends screen data externally. PII is redacted from all persisted data (memory, playbooks, strategies). Dangerous protocols (`javascript:`, `data:`) are blocked. AppleScript and browser JS execution are audit-logged.
+</details>
````
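The added safety answer above says `javascript:` and `data:` URLs are blocked. A minimal sketch of such a check can use the WHATWG `URL` parser built into Node. The parser lowercases the scheme and strips tabs and newlines, so obfuscated spellings such as `JavaScript:` or `java\tscript:` produce the same protocol string. The function name and the return shape are assumptions.

```typescript
// Sketch of a scheme deny-list applied before navigating. Not ScreenHand's code.

const BLOCKED_PROTOCOLS: ReadonlySet<string> = new Set(["javascript:", "data:"]);

type UrlCheck = { ok: true; url: URL } | { ok: false; reason: string };

function checkNavigationUrl(raw: string): UrlCheck {
  let url: URL;
  try {
    // The WHATWG parser trims surrounding whitespace/control characters,
    // removes embedded tabs and newlines, and lowercases the scheme.
    url = new URL(raw);
  } catch {
    return { ok: false, reason: "not an absolute URL" };
  }
  if (BLOCKED_PROTOCOLS.has(url.protocol)) {
    return { ok: false, reason: `blocked protocol ${url.protocol}` };
  }
  return { ok: true, url };
}
```

A deny-list matches what the FAQ describes. If other schemes such as `file:` should also be refused, an allow-list of `http:` and `https:` is the stricter alternative.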
````diff

-
-
+<details>
+<summary><b>Does it work with multiple AI agents at once?</b></summary>

-
-
+Yes. Session leases with heartbeat prevent conflicts. The supervisor daemon detects stalls and recovers. Each agent claims its own app window.
+</details>

-
-
+<details>
+<summary><b>How fast is it?</b></summary>

-
-
+Accessibility: ~50ms. Chrome CDP: ~10ms (background, no focus needed). OCR: ~600ms. Memory lookups: ~0ms (in-memory cache). All disk writes are async and non-blocking.
+</details>

 ## Contributing

-Contributions are welcome! Please open an issue first to discuss what you'd like to change.
-
 ```bash
 git clone https://github.com/manushi4/screenhand.git
-cd screenhand
-npm
-npm run build:native
-npm test
+cd screenhand && npm install && npm run build:native
+npm test # 1306 tests, 53 files
 ```

 ## Contact
@@ -586,7 +305,7 @@ npm test

 ## License

-AGPL-3.0-only — Copyright (C) 2025 Clazro Technology Private Limited
+AGPL-3.0-only — Copyright (C) 2025-2026 Clazro Technology Private Limited

 ---

````