screenhand 0.1.1 → 0.3.0
- package/README.md +193 -109
- package/bin/darwin-arm64/macos-bridge +0 -0
- package/dist/mcp-desktop.js +5876 -0
- package/dist/scripts/codex-monitor-daemon.js +335 -0
- package/dist/scripts/export-help-center.js +112 -0
- package/dist/scripts/marketing-loop.js +117 -0
- package/dist/scripts/observer-daemon.js +288 -0
- package/dist/scripts/orchestrator-daemon.js +399 -0
- package/dist/scripts/supervisor-daemon.js +272 -0
- package/dist/scripts/threads-campaign.js +208 -0
- package/dist/scripts/worker-daemon.js +228 -0
- package/dist/src/agent/cli.js +82 -0
- package/dist/src/agent/loop.js +274 -0
- package/dist/src/community/fetcher.js +109 -0
- package/dist/src/community/index.js +6 -0
- package/dist/src/community/publisher.js +191 -0
- package/dist/src/community/remote-api.js +121 -0
- package/dist/src/community/types.js +3 -0
- package/dist/src/community/validator.js +95 -0
- package/{src/config.ts → dist/src/config.js} +5 -10
- package/dist/src/context-tracker.js +489 -0
- package/{src/index.ts → dist/src/index.js} +32 -52
- package/dist/src/ingestion/coverage-auditor.js +233 -0
- package/dist/src/ingestion/doc-parser.js +164 -0
- package/dist/src/ingestion/index.js +8 -0
- package/dist/src/ingestion/menu-scanner.js +152 -0
- package/dist/src/ingestion/reference-merger.js +186 -0
- package/dist/src/ingestion/shortcut-extractor.js +180 -0
- package/dist/src/ingestion/tutorial-extractor.js +170 -0
- package/dist/src/ingestion/types.js +3 -0
- package/dist/src/jobs/manager.js +305 -0
- package/dist/src/jobs/runner.js +806 -0
- package/dist/src/jobs/store.js +102 -0
- package/dist/src/jobs/types.js +30 -0
- package/dist/src/jobs/worker.js +97 -0
- package/dist/src/learning/engine.js +356 -0
- package/dist/src/learning/index.js +9 -0
- package/dist/src/learning/locator-policy.js +120 -0
- package/dist/src/learning/pattern-policy.js +89 -0
- package/dist/src/learning/recovery-policy.js +116 -0
- package/dist/src/learning/sensor-policy.js +115 -0
- package/dist/src/learning/timing-model.js +204 -0
- package/dist/src/learning/topology-policy.js +90 -0
- package/dist/src/learning/types.js +9 -0
- package/dist/src/logging/timeline-logger.js +48 -0
- package/dist/src/mcp/mcp-stdio-server.js +464 -0
- package/dist/src/mcp/server.js +363 -0
- package/dist/src/mcp-entry.js +60 -0
- package/dist/src/memory/playbook-seeds.js +200 -0
- package/dist/src/memory/recall.js +222 -0
- package/dist/src/memory/research.js +104 -0
- package/dist/src/memory/seeds.js +101 -0
- package/dist/src/memory/service.js +446 -0
- package/dist/src/memory/session.js +169 -0
- package/dist/src/memory/store.js +451 -0
- package/{src/runtime/locator-cache.ts → dist/src/memory/types.js} +1 -17
- package/dist/src/monitor/codex-monitor.js +382 -0
- package/dist/src/monitor/task-queue.js +97 -0
- package/dist/src/monitor/types.js +62 -0
- package/dist/src/native/bridge-client.js +412 -0
- package/{src/native/macos-bridge-client.ts → dist/src/native/macos-bridge-client.js} +0 -1
- package/dist/src/observer/state.js +199 -0
- package/dist/src/observer/types.js +43 -0
- package/dist/src/orchestrator/state.js +68 -0
- package/dist/src/orchestrator/types.js +22 -0
- package/dist/src/perception/ax-source.js +162 -0
- package/dist/src/perception/cdp-source.js +162 -0
- package/dist/src/perception/coordinator.js +771 -0
- package/dist/src/perception/frame-differ.js +287 -0
- package/dist/src/perception/index.js +22 -0
- package/dist/src/perception/manager.js +199 -0
- package/dist/src/perception/types.js +47 -0
- package/dist/src/perception/vision-source.js +399 -0
- package/dist/src/planner/deterministic.js +298 -0
- package/dist/src/planner/executor.js +870 -0
- package/dist/src/planner/goal-store.js +92 -0
- package/dist/src/planner/index.js +21 -0
- package/dist/src/planner/planner.js +520 -0
- package/dist/src/planner/tool-registry.js +71 -0
- package/dist/src/planner/types.js +22 -0
- package/dist/src/platform/explorer.js +213 -0
- package/dist/src/platform/help-center-markdown.js +527 -0
- package/dist/src/platform/learner.js +257 -0
- package/dist/src/playbook/engine.js +486 -0
- package/dist/src/playbook/index.js +20 -0
- package/dist/src/playbook/mcp-recorder.js +204 -0
- package/dist/src/playbook/recorder.js +536 -0
- package/dist/src/playbook/runner.js +408 -0
- package/dist/src/playbook/store.js +312 -0
- package/dist/src/playbook/types.js +17 -0
- package/dist/src/recovery/detectors.js +156 -0
- package/dist/src/recovery/engine.js +327 -0
- package/dist/src/recovery/index.js +20 -0
- package/dist/src/recovery/strategies.js +274 -0
- package/dist/src/recovery/types.js +20 -0
- package/dist/src/runtime/accessibility-adapter.js +430 -0
- package/dist/src/runtime/app-adapter.js +64 -0
- package/dist/src/runtime/applescript-adapter.js +305 -0
- package/dist/src/runtime/ax-role-map.js +96 -0
- package/dist/src/runtime/browser-adapter.js +52 -0
- package/dist/src/runtime/cdp-chrome-adapter.js +521 -0
- package/dist/src/runtime/composite-adapter.js +221 -0
- package/dist/src/runtime/execution-contract.js +159 -0
- package/dist/src/runtime/executor.js +286 -0
- package/dist/src/runtime/locator-cache.js +50 -0
- package/dist/src/runtime/planning-loop.js +63 -0
- package/dist/src/runtime/service.js +432 -0
- package/dist/src/runtime/session-manager.js +63 -0
- package/dist/src/runtime/state-observer.js +121 -0
- package/dist/src/runtime/vision-adapter.js +225 -0
- package/dist/src/state/app-map-types.js +72 -0
- package/dist/src/state/app-map.js +1974 -0
- package/dist/src/state/entity-tracker.js +108 -0
- package/dist/src/state/fusion.js +96 -0
- package/dist/src/state/index.js +21 -0
- package/dist/src/state/ladder-generator.js +236 -0
- package/dist/src/state/persistence.js +156 -0
- package/dist/src/state/types.js +17 -0
- package/dist/src/state/world-model.js +1456 -0
- package/dist/src/supervisor/locks.js +186 -0
- package/dist/src/supervisor/supervisor.js +403 -0
- package/dist/src/supervisor/types.js +30 -0
- package/dist/src/test-mcp-protocol.js +154 -0
- package/dist/src/types.js +17 -0
- package/dist/src/util/atomic-write.js +133 -0
- package/dist/src/util/sanitize.js +146 -0
- package/dist-app-maps/com.figma.Desktop.json +959 -0
- package/dist-app-maps/com.hnc.Discord.json +1146 -0
- package/dist-app-maps/notion.id.json +2831 -0
- package/dist-playbooks/canva-screenhand-carousel.json +445 -0
- package/dist-playbooks/codex-desktop.json +76 -0
- package/dist-playbooks/competitor-research-stack.json +122 -0
- package/dist-playbooks/davinci-color-grade.json +153 -0
- package/dist-playbooks/davinci-edit-timeline.json +162 -0
- package/dist-playbooks/davinci-render.json +114 -0
- package/dist-playbooks/devto.json +52 -0
- package/dist-playbooks/discord.json +41 -0
- package/dist-playbooks/google-flow-create-project.json +59 -0
- package/dist-playbooks/google-flow-edit-image.json +90 -0
- package/dist-playbooks/google-flow-edit-video.json +90 -0
- package/dist-playbooks/google-flow-generate-image.json +68 -0
- package/dist-playbooks/google-flow-generate-video.json +191 -0
- package/dist-playbooks/google-flow-open-project.json +48 -0
- package/dist-playbooks/google-flow-open-scenebuilder.json +64 -0
- package/dist-playbooks/google-flow-search-assets.json +64 -0
- package/dist-playbooks/instagram.json +57 -0
- package/dist-playbooks/linkedin.json +52 -0
- package/dist-playbooks/n8n.json +43 -0
- package/dist-playbooks/reddit.json +52 -0
- package/dist-playbooks/threads.json +59 -0
- package/dist-playbooks/x-twitter.json +59 -0
- package/dist-playbooks/youtube.json +59 -0
- package/dist-references/canva.json +646 -0
- package/dist-references/codex-desktop.json +305 -0
- package/dist-references/davinci-resolve-keyboard.json +594 -0
- package/dist-references/davinci-resolve-menu-map.json +1139 -0
- package/dist-references/davinci-resolve-menus-batch1.json +116 -0
- package/dist-references/davinci-resolve-menus-batch2.json +372 -0
- package/dist-references/davinci-resolve-menus-batch3.json +330 -0
- package/dist-references/davinci-resolve-menus-batch4.json +297 -0
- package/dist-references/davinci-resolve-shortcuts.json +333 -0
- package/dist-references/devto.json +317 -0
- package/dist-references/discord.json +549 -0
- package/dist-references/figma.json +1186 -0
- package/dist-references/finder.json +146 -0
- package/dist-references/google-ads-transparency.json +95 -0
- package/dist-references/google-flow.json +649 -0
- package/dist-references/instagram.json +341 -0
- package/dist-references/linkedin.json +324 -0
- package/dist-references/meta-ad-library.json +86 -0
- package/dist-references/n8n.json +387 -0
- package/dist-references/notes.json +27 -0
- package/dist-references/notion.json +163 -0
- package/dist-references/reddit.json +341 -0
- package/dist-references/threads.json +337 -0
- package/dist-references/x-twitter.json +403 -0
- package/dist-references/youtube.json +373 -0
- package/native/macos-bridge/Package.swift +1 -0
- package/native/macos-bridge/Sources/AccessibilityBridge.swift +257 -36
- package/native/macos-bridge/Sources/AppManagement.swift +212 -2
- package/native/macos-bridge/Sources/CoreGraphicsBridge.swift +348 -53
- package/native/macos-bridge/Sources/StreamCapture.swift +136 -0
- package/native/macos-bridge/Sources/VisionBridge.swift +165 -7
- package/native/macos-bridge/Sources/main.swift +169 -16
- package/native/windows-bridge/Program.cs +5 -0
- package/native/windows-bridge/ScreenCapture.cs +124 -0
- package/package.json +29 -4
- package/scripts/postinstall.cjs +127 -0
- package/.claude/commands/automate.md +0 -28
- package/.claude/commands/debug-ui.md +0 -19
- package/.claude/commands/screenshot.md +0 -15
- package/.github/FUNDING.yml +0 -1
- package/.github/ISSUE_TEMPLATE/bug_report.md +0 -27
- package/.github/ISSUE_TEMPLATE/feature_request.md +0 -20
- package/.mcp.json +0 -8
- package/DESKTOP_MCP_GUIDE.md +0 -92
- package/SECURITY.md +0 -44
- package/docs/architecture.md +0 -47
- package/install-skills.sh +0 -19
- package/mcp-bridge.ts +0 -271
- package/mcp-desktop.ts +0 -1221
- package/playbooks/instagram.json +0 -41
- package/playbooks/instagram_v2.json +0 -201
- package/playbooks/x_v1.json +0 -211
- package/scripts/devpost-live-loop.mjs +0 -421
- package/src/logging/timeline-logger.ts +0 -55
- package/src/mcp/server.ts +0 -449
- package/src/memory/recall.ts +0 -191
- package/src/memory/research.ts +0 -146
- package/src/memory/seeds.ts +0 -123
- package/src/memory/session.ts +0 -201
- package/src/memory/store.ts +0 -434
- package/src/memory/types.ts +0 -69
- package/src/native/bridge-client.ts +0 -239
- package/src/runtime/accessibility-adapter.ts +0 -487
- package/src/runtime/app-adapter.ts +0 -169
- package/src/runtime/applescript-adapter.ts +0 -376
- package/src/runtime/ax-role-map.ts +0 -102
- package/src/runtime/browser-adapter.ts +0 -129
- package/src/runtime/cdp-chrome-adapter.ts +0 -676
- package/src/runtime/composite-adapter.ts +0 -274
- package/src/runtime/executor.ts +0 -396
- package/src/runtime/planning-loop.ts +0 -81
- package/src/runtime/service.ts +0 -448
- package/src/runtime/session-manager.ts +0 -50
- package/src/runtime/state-observer.ts +0 -136
- package/src/runtime/vision-adapter.ts +0 -297
- package/src/types.ts +0 -297
- package/tests/bridge-client.test.ts +0 -176
- package/tests/browser-stealth.test.ts +0 -210
- package/tests/composite-adapter.test.ts +0 -64
- package/tests/mcp-server.test.ts +0 -151
- package/tests/memory-recall.test.ts +0 -339
- package/tests/memory-research.test.ts +0 -159
- package/tests/memory-seeds.test.ts +0 -120
- package/tests/memory-store.test.ts +0 -392
- package/tests/types.test.ts +0 -92
- package/tsconfig.check.json +0 -17
- package/tsconfig.json +0 -19
- package/vitest.config.ts +0 -8
- package/{playbooks → dist-references}/devpost.json +0 -0
package/README.md
CHANGED
|
@@ -2,86 +2,62 @@
|
|
|
2
2
|
|
|
3
3
|
# ScreenHand
|
|
4
4
|
|
|
5
|
-
**
|
|
5
|
+
**Let AI control your desktop — click buttons, fill forms, automate workflows in ~50ms with zero extra AI calls.**
|
|
6
6
|
|
|
7
|
-
|
|
7
|
+
An open-source [MCP server](https://modelcontextprotocol.io/) for macOS and Windows. Works with Claude, Cursor, Codex CLI, and any MCP-compatible client.
|
|
8
8
|
|
|
9
9
|
[](LICENSE)
|
|
10
10
|
[](https://www.npmjs.com/package/screenhand)
|
|
11
|
+
[](https://github.com/manushi4/screenhand/actions/workflows/ci.yml)
|
|
11
12
|
[]()
|
|
12
13
|
[]()
|
|
13
14
|
|
|
14
|
-
[
|
|
15
|
+
[Quick Start](#quick-start) | [What It Does](#what-it-does) | [Example](#example) | [All 111 Tools](docs/tools.md) | [Architecture](docs/architecture.md) | [Website](https://screenhand.com)
|
|
15
16
|
|
|
16
17
|
</div>
|
|
17
18
|
|
|
18
19
|
---
|
|
19
20
|
|
|
21
|
+
<!-- TODO: Add demo GIF here — 15 sec showing Claude controlling a real app -->
|
|
22
|
+
|
|
20
23
|
## The Problem
|
|
21
24
|
|
|
22
|
-
AI assistants
|
|
25
|
+
AI assistants can write code but can't use your computer. Every click requires a screenshot → LLM interpretation → coordinate guess — **3-5 seconds and an API call per action**.
|
|
23
26
|
|
|
24
|
-
|
|
27
|
+
ScreenHand gives AI direct access to native OS APIs. No screenshots needed for clicks. No AI calls for button presses.
|
|
25
28
|
|
|
26
|
-
|
|
29
|
+
| | Without ScreenHand | With ScreenHand |
|
|
30
|
+
|---|---|---|
|
|
31
|
+
| Click a button | Screenshot → LLM → coordinate click (~3-5s) | Native Accessibility API (~50ms) |
|
|
32
|
+
| Cost per action | 1 LLM API call | 0 LLM calls |
|
|
33
|
+
| Accuracy | Coordinate guessing — misses on layout shift | Exact element targeting by role/name |
|
|
34
|
+
| Browser control | Needs focus, screenshot per action | CDP in background (~10ms), no focus needed |
|
|
35
|
+
| Works across apps | One app at a time | Cross-app workflows, multi-agent coordination |
|
|
27
36
|
|
|
28
|
-
|
|
37
|
+
## Quick Start
|
|
29
38
|
|
|
30
|
-
|
|
31
|
-
- **Read** UI elements directly via native Accessibility APIs
|
|
32
|
-
- **Click** buttons, menus, and links
|
|
33
|
-
- **Type** text into any input field
|
|
34
|
-
- **Control** Chrome tabs via DevTools Protocol
|
|
35
|
-
- **Automate** cross-app workflows
|
|
39
|
+
### 1. Add to your AI client (one step)
|
|
36
40
|
|
|
37
|
-
|
|
38
|
-
|
|
39
|
-
| MCP protocol (stdio)
|
|
40
|
-
ScreenHand
|
|
41
|
-
| Native OS APIs
|
|
42
|
-
Your Desktop (any app, any browser)
|
|
43
|
-
```
|
|
44
|
-
|
|
45
|
-
## Quick Start
|
|
41
|
+
<details open>
|
|
42
|
+
<summary><b>Claude Code</b> (recommended)</summary>
|
|
46
43
|
|
|
47
44
|
```bash
|
|
48
|
-
|
|
49
|
-
cd screenhand
|
|
50
|
-
npm install
|
|
51
|
-
npm run build:native # macOS — builds Swift bridge
|
|
52
|
-
# npm run build:native:windows # Windows — builds .NET bridge
|
|
45
|
+
claude mcp add screenhand -- npx -y screenhand
|
|
53
46
|
```
|
|
54
47
|
|
|
55
|
-
|
|
56
|
-
|
|
57
|
-
<details>
|
|
58
|
-
<summary><strong>Claude Desktop</strong></summary>
|
|
59
|
-
|
|
60
|
-
Add to `~/Library/Application Support/Claude/claude_desktop_config.json`:
|
|
61
|
-
|
|
62
|
-
```json
|
|
63
|
-
{
|
|
64
|
-
"mcpServers": {
|
|
65
|
-
"screenhand": {
|
|
66
|
-
"command": "npx",
|
|
67
|
-
"args": ["tsx", "/path/to/screenhand/mcp-desktop.ts"]
|
|
68
|
-
}
|
|
69
|
-
}
|
|
70
|
-
}
|
|
71
|
-
```
|
|
48
|
+
Done. That's it.
|
|
72
49
|
</details>
|
|
73
50
|
|
|
74
51
|
<details>
|
|
75
|
-
<summary><
|
|
76
|
-
|
|
77
|
-
Add to your project `.mcp.json` or `~/.claude/settings.json`:
|
|
52
|
+
<summary><b>Claude Desktop</b></summary>
|
|
78
53
|
|
|
54
|
+
Add to `~/Library/Application Support/Claude/claude_desktop_config.json`:
|
|
79
55
|
```json
|
|
80
56
|
{
|
|
81
57
|
"mcpServers": {
|
|
82
58
|
"screenhand": {
|
|
83
59
|
"command": "npx",
|
|
84
|
-
"args": ["
|
|
60
|
+
"args": ["-y", "screenhand"]
|
|
85
61
|
}
|
|
86
62
|
}
|
|
87
63
|
}
|
|
@@ -89,16 +65,15 @@ Add to your project `.mcp.json` or `~/.claude/settings.json`:
|
|
|
89
65
|
</details>
|
|
90
66
|
|
|
91
67
|
<details>
|
|
92
|
-
<summary><
|
|
93
|
-
|
|
94
|
-
Add to `.cursor/mcp.json` in your project (or `~/.cursor/mcp.json` globally):
|
|
68
|
+
<summary><b>Cursor</b></summary>
|
|
95
69
|
|
|
70
|
+
Add to `.cursor/mcp.json`:
|
|
96
71
|
```json
|
|
97
72
|
{
|
|
98
73
|
"mcpServers": {
|
|
99
74
|
"screenhand": {
|
|
100
75
|
"command": "npx",
|
|
101
|
-
"args": ["
|
|
76
|
+
"args": ["-y", "screenhand"]
|
|
102
77
|
}
|
|
103
78
|
}
|
|
104
79
|
}
|
|
@@ -106,127 +81,236 @@ Add to `.cursor/mcp.json` in your project (or `~/.cursor/mcp.json` globally):
|
|
|
106
81
|
</details>
|
|
107
82
|
|
|
108
83
|
<details>
|
|
109
|
-
<summary><
|
|
84
|
+
<summary><b>OpenAI Codex CLI</b></summary>
|
|
110
85
|
|
|
111
86
|
Add to `~/.codex/config.toml`:
|
|
112
|
-
|
|
113
87
|
```toml
|
|
114
88
|
[mcp.screenhand]
|
|
115
89
|
command = "npx"
|
|
116
|
-
args = ["
|
|
90
|
+
args = ["-y", "screenhand"]
|
|
117
91
|
transport = "stdio"
|
|
118
92
|
```
|
|
119
93
|
</details>
|
|
120
94
|
|
|
121
95
|
<details>
|
|
122
|
-
<summary><
|
|
96
|
+
<summary><b>Any MCP Client</b></summary>
|
|
123
97
|
|
|
124
|
-
ScreenHand is a standard MCP server over stdio.
|
|
98
|
+
ScreenHand is a standard MCP server over stdio. Run with `npx -y screenhand`.
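
For a client that is not listed above, the first messages written to the server's stdin might look like the following sketch. It assumes the MCP stdio transport's newline-delimited JSON-RPC 2.0 framing and the standard `initialize` and `tools/list` methods; the `frame` helper, the `clientInfo` values, and the protocol version string are illustrative choices, not taken from ScreenHand.

```typescript
// Illustrative only: builds the messages a generic MCP client would send.
// It does not start ScreenHand; a real client would spawn `npx -y screenhand`
// and write these strings to the child process's stdin.
type JsonRpcRequest = {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: unknown;
};

// The stdio transport carries one JSON message per line.
function frame(message: JsonRpcRequest): string {
  return JSON.stringify(message) + "\n";
}

const initializeRequest: JsonRpcRequest = {
  jsonrpc: "2.0",
  id: 1,
  method: "initialize",
  params: {
    protocolVersion: "2024-11-05", // assumed version string
    capabilities: {},
    clientInfo: { name: "example-client", version: "0.0.1" }, // hypothetical client
  },
};

// Sent after the handshake to discover the tools the server exposes.
const listToolsRequest: JsonRpcRequest = {
  jsonrpc: "2.0",
  id: 2,
  method: "tools/list",
};

const wireMessages: string[] = [frame(initializeRequest), frame(listToolsRequest)];
```

A full client would also wait for the `initialize` response and send the `notifications/initialized` notification before calling `tools/list`; an MCP SDK normally handles that handshake.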
|
|
125
99
|
</details>
|
|
126
100
|
|
|
127
|
-
|
|
101
|
+
### 2. Grant permissions
|
|
128
102
|
|
|
129
|
-
|
|
103
|
+
**macOS**: System Settings > Privacy & Security > Accessibility > enable your terminal app.
|
|
130
104
|
|
|
131
|
-
|
|
132
|
-
Tell your AI "submit this form on 10 websites" or "export all these reports as PDFs" — and it does it. ScreenHand handles the clicking, typing, and navigating across any app.
|
|
105
|
+
**Windows**: No special permissions needed.
|
|
133
106
|
|
|
134
|
-
###
|
|
135
|
-
Instead of clicking through your app manually, let Claude inspect the full UI element tree, check states, and walk through flows — all from your terminal.
|
|
107
|
+
### 3. Browser control (optional)
|
|
136
108
|
|
|
137
|
-
|
|
138
|
-
|
|
109
|
+
Launch Chrome with remote debugging to enable browser tools:
|
|
110
|
+
```bash
|
|
111
|
+
open -a "Google Chrome" --args --remote-debugging-port=9222
|
|
112
|
+
```
|
|
139
113
|
|
|
140
|
-
|
|
141
|
-
Read data from a spreadsheet, search it in Chrome, paste results into Notes — chain actions across your entire desktop.
|
|
114
|
+
That's it. Your AI client now has 111 tools for desktop automation.
|
|
142
115
|
|
|
143
|
-
|
|
144
|
-
|
|
116
|
+
<details>
|
|
117
|
+
<summary><b>Building from source</b> (contributors only)</summary>
|
|
145
118
|
|
|
146
|
-
|
|
119
|
+
```bash
|
|
120
|
+
git clone https://github.com/manushi4/screenhand.git
|
|
121
|
+
cd screenhand && npm install && npm run build:native
|
|
122
|
+
```
|
|
147
123
|
|
|
148
|
-
|
|
124
|
+
On Windows, use `npm run build:native:windows` instead.
|
|
125
|
+
</details>
|
|
149
126
|
|
|
150
|
-
|
|
151
|
-
|----------|----------|----------|
|
|
152
|
-
| **Screen** | `screenshot`, `ocr` | See what's on screen, read all visible text |
|
|
153
|
-
| **App Control** | `ui_tree`, `ui_press`, `menu_click` | Read and interact with any native app |
|
|
154
|
-
| **Keyboard & Mouse** | `click`, `type_text`, `key`, `drag` | Direct input control |
|
|
155
|
-
| **Chrome Browser** | `browser_navigate`, `browser_js`, `browser_dom` | Full browser automation via CDP |
|
|
156
|
-
| **Memory** | `memory_recall`, `memory_save` | ScreenHand learns from past sessions |
|
|
157
|
-
| **AppleScript** | `applescript` | Run AppleScript on macOS |
|
|
127
|
+
---
|
|
158
128
|
|
|
159
|
-
|
|
129
|
+
## What It Does
|
|
160
130
|
|
|
161
|
-
|
|
131
|
+
ScreenHand gives AI agents seven capabilities:
|
|
162
132
|
|
|
163
|
-
|
|
164
|
-
|
|
165
|
-
|
|
166
|
-
|
|
167
|
-
|
|
168
|
-
|
|
169
|
-
|
|
133
|
+
### Desktop Control — 19 tools
|
|
134
|
+
Click buttons, type text, read UI trees, navigate menus, drag, scroll — all via native Accessibility APIs in ~50ms. Works with any app: Finder, Notes, VS Code, Xcode, System Settings, etc.
|
|
135
|
+
|
|
136
|
+
### Browser Automation — 15 tools
|
|
137
|
+
Full Chrome control via DevTools Protocol. Navigate, click, type, run JavaScript, fill forms — all in the background at ~10ms. Built-in anti-detection (`browser_stealth`, `browser_human_click`) for sites with bot protection.
|
|
138
|
+
|
|
139
|
+
### Smart Fallbacks — 8 tools
|
|
140
|
+
`click_with_fallback`, `type_with_fallback`, etc. automatically try Accessibility → CDP → OCR → coordinates. You don't have to pick the right method — ScreenHand figures it out.
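
The fallback order described above can be sketched as a loop that tries each method in turn and returns the first success. This is a minimal illustration, not ScreenHand's implementation: `Strategy`, `runWithFallback`, and the stub strategies are hypothetical names, and the stubs are synchronous for brevity where real tool calls would be asynchronous.

```typescript
// Hypothetical fallback chain: Accessibility → CDP → OCR → coordinates.
type Strategy = { name: string; attempt: () => string };
type Outcome = { method: string; result: string; failures: string[] };

// Try each strategy in order; return the first success and record earlier failures.
function runWithFallback(strategies: Strategy[]): Outcome {
  const failures: string[] = [];
  for (const strategy of strategies) {
    try {
      return { method: strategy.name, result: strategy.attempt(), failures };
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      failures.push(`${strategy.name}: ${reason}`);
    }
  }
  throw new Error(`all methods failed: ${failures.join("; ")}`);
}

// Stubs standing in for the real methods; the first one fails on purpose.
const clickStrategies: Strategy[] = [
  { name: "accessibility", attempt: () => { throw new Error("element not found"); } },
  { name: "cdp", attempt: () => "clicked via DOM selector" },
  { name: "ocr", attempt: () => "clicked matched text" },
  { name: "coordinates", attempt: () => "clicked at fixed point" },
];

const clickOutcome = runWithFallback(clickStrategies);
```

Here `clickOutcome.method` is `"cdp"`, because the accessibility stub throws and the CDP stub is the next method in the list. Recording the failures alongside the winning method is one way to give a learning layer something to rank later.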
|
|
141
|
+
|
|
142
|
+
### Memory & Learning — 14 tools
|
|
143
|
+
Gets smarter every session. Logs tool calls, saves winning strategies, tracks error patterns with fixes. Zero config, zero latency overhead (in-memory cache, async disk writes). Ships with 12 seed strategies for common macOS workflows. 6 learning policies: locator stability, sensor effectiveness, recovery ranking, pattern recognition, adaptive timing, and topology (navigation edge reliability).
|
|
144
|
+
|
|
145
|
+
### App Mastery Map — automatic per-app spatial understanding
|
|
146
|
+
Builds a persistent reverse-engineered blueprint of every app from normal tool usage. 8 features record automatically: page zones, navigation graph (BFS pathfinding), hierarchy, I/O contracts, state machine, element visibility, timing profiles, and ready signals. Mastery levels (beginner → pro → expert → grandmaster) honestly reflect how well ScreenHand knows each app. Maps stored at `~/.screenhand/app-maps/`.
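
To make the "navigation graph (BFS pathfinding)" idea concrete, here is a minimal breadth-first search over a screen graph. The graph shape, the screen names, and `shortestPath` are assumptions for illustration; the actual app-map format under `~/.screenhand/app-maps/` is not shown in this README.

```typescript
// Hypothetical navigation graph: each key is a screen, each value lists the
// screens reachable from it with one action.
type NavGraph = Record<string, string[]>;

// Breadth-first search: returns the path with the fewest steps, or null.
function shortestPath(graph: NavGraph, start: string, goal: string): string[] | null {
  if (start === goal) return [start];
  const previous = new Map<string, string>();
  const visited = new Set<string>([start]);
  const queue: string[] = [start];

  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const next of graph[current] ?? []) {
      if (visited.has(next)) continue;
      visited.add(next);
      previous.set(next, current);
      if (next === goal) {
        // Walk the predecessor links back to the start to rebuild the path.
        const path: string[] = [goal];
        let node: string = goal;
        while (node !== start) {
          node = previous.get(node) as string;
          path.unshift(node);
        }
        return path;
      }
      queue.push(next);
    }
  }
  return null;
}

// Made-up screens for a notes-style app.
const exampleGraph: NavGraph = {
  "main-window": ["folder-list", "settings"],
  "folder-list": ["note-editor"],
  "settings": ["accounts"],
  "note-editor": ["share-sheet"],
};

const route = shortestPath(exampleGraph, "main-window", "share-sheet");
```

With this graph, `route` is `["main-window", "folder-list", "note-editor", "share-sheet"]`. BFS suits this job because every edge costs one action, so the first path found is also the shortest.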
|
|
147
|
+
|
|
148
|
+
### Jobs & Orchestration — 34 tools
|
|
149
|
+
Queue multi-step jobs, run them via background worker daemon, coordinate multiple AI agents with session leases, detect stalls, auto-recover. Survives client restarts.
|
|
170
150
|
|
|
171
|
-
|
|
151
|
+
### Perception & Planning — 17 tools
|
|
152
|
+
Continuous screen awareness (3-rate perception loop at 100ms/300ms/1000ms), real-time world model with entity tracking, goal-oriented planning with auto-decomposition, recovery engine with self-healing. The system always knows what's on screen and feeds observations into the App Mastery Map.
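
One way to picture a three-rate loop is a single 100ms base tick on which slower tiers fire only when their interval divides the elapsed time. The sketch below assumes exactly that; the tier names and the `dueTiers` helper are hypothetical, and the README does not say which sensors run at which rate.

```typescript
// Hypothetical tiers matching the 100ms / 300ms / 1000ms rates named above.
type Tier = { name: string; intervalMs: number };

const tiers: Tier[] = [
  { name: "fast", intervalMs: 100 },
  { name: "medium", intervalMs: 300 },
  { name: "slow", intervalMs: 1000 },
];

// Given the elapsed time on a 100ms base tick, list the tiers that should run.
function dueTiers(elapsedMs: number): string[] {
  return tiers
    .filter((tier) => elapsedMs % tier.intervalMs === 0)
    .map((tier) => tier.name);
}

// Which tiers fire on each of the first ten ticks.
const firstSecond: string[][] = [];
for (let elapsed = 100; elapsed <= 1000; elapsed += 100) {
  firstSecond.push(dueTiers(elapsed));
}
```

In this model the fast tier runs on every tick, the medium tier on ticks 300, 600, and 900, and the slow tier once at 1000ms, so expensive sensing is paid for far less often than cheap sensing.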
|
|
153
|
+
|
|
154
|
+
> **Full reference**: See all [111 tools with descriptions](docs/tools.md).
|
|
155
|
+
|
|
156
|
+
---
|
|
157
|
+
|
|
158
|
+
## Example
|
|
159
|
+
|
|
160
|
+
**Browser** — Claude controls Chrome in the background while you work:
|
|
161
|
+
|
|
162
|
+
```
|
|
163
|
+
You: Search for "screenhand" on Instagram
|
|
164
|
+
|
|
165
|
+
→ browser_tabs() # ~10ms
|
|
166
|
+
[34DF5DE1] Instagram — https://www.instagram.com/
|
|
167
|
+
|
|
168
|
+
→ browser_js({ code: "/* click Search icon */" }) # ~10ms
|
|
169
|
+
→ browser_fill_form({ selector: "input", text: "screenhand" }) # ~50ms (human-like)
|
|
170
|
+
→ browser_js({ code: "/* extract results */" }) # ~10ms
|
|
171
|
+
|
|
172
|
+
Found @screenhand_ as the top result.
|
|
173
|
+
```
|
|
174
|
+
|
|
175
|
+
**Desktop** — native app control without screenshots:
|
|
176
|
+
|
|
177
|
+
```
|
|
178
|
+
→ apps() # List running apps ~10ms
|
|
179
|
+
→ focus("com.apple.Notes") # Bring Notes to front ~10ms
|
|
180
|
+
→ ui_tree() # Read full UI element tree ~50ms
|
|
181
|
+
→ ui_press("New Note") # Click "New Note" button ~50ms
|
|
182
|
+
→ type_text("Hello world") # Type text ~30ms
|
|
183
|
+
```
|
|
184
|
+
|
|
185
|
+
**Cross-app** — chain actions across your whole desktop:
|
|
186
|
+
|
|
187
|
+
```
|
|
188
|
+
→ browser_js(...) # Extract data from Chrome
|
|
189
|
+
→ focus("com.apple.Notes") # Switch to Notes
|
|
190
|
+
→ type_text(extractedData) # Paste it in
|
|
191
|
+
→ key("cmd+s") # Save
|
|
192
|
+
```
|
|
193
|
+
|
|
194
|
+
---
|
|
195
|
+
|
|
196
|
+
## Claude Code Plugin
|
|
197
|
+
|
|
198
|
+
If you use Claude Code, ScreenHand includes a plugin with **13 skills and 5 agents** that wrap all 111 tools into intent-oriented workflows.
|
|
172
199
|
|
|
173
200
|
```bash
|
|
174
|
-
npm run
|
|
175
|
-
npm test # run test suite
|
|
176
|
-
npm run build # compile TypeScript
|
|
177
|
-
npm run build:native # build native bridge
|
|
201
|
+
./install-plugin.sh # after npm install && npm run build:native
|
|
178
202
|
```
|
|
179
203
|
|
|
204
|
+
| Skill | What it does |
|
|
205
|
+
|-------|-------------|
|
|
206
|
+
| `/automate` | Control any desktop app |
|
|
207
|
+
| `/post-social` | Post to X, LinkedIn, Instagram, Reddit, Threads, Discord |
|
|
208
|
+
| `/run-campaign` | Multi-platform marketing campaigns |
|
|
209
|
+
| `/edit-video` | DaVinci Resolve automation |
|
|
210
|
+
| `/design-figma` | Figma design via Plugin API + browser |
|
|
211
|
+
| `/edit-canva` | Canva template editing |
|
|
212
|
+
| `/scrape-web` | Data extraction with anti-detection |
|
|
213
|
+
| `/fill-form` | Human-like form filling |
|
|
214
|
+
| `/qa-smoke-test` | Automated UI testing |
|
|
215
|
+
| `/record-workflow` | Record into reusable playbooks |
|
|
216
|
+
| `/learn-platform` | Discover how to automate a new app/site |
|
|
217
|
+
| `/run-jobs` | Job queues, background workers |
|
|
218
|
+
| `/manage-system` | Supervisor, memory, diagnostics |
|
|
219
|
+
|
|
220
|
+
5 specialized agents: **marketing**, **design**, **QA**, **scraper**, **orchestrator**.
|
|
221
|
+
|
|
222
|
+
---
|
|
223
|
+
|
|
224
|
+
## How It Works
|
|
225
|
+
|
|
226
|
+
```
|
|
227
|
+
AI Client (Claude, Cursor, Codex CLI)
|
|
228
|
+
↓ MCP protocol (stdio)
|
|
229
|
+
ScreenHand MCP Server (TypeScript)
|
|
230
|
+
↓ JSON-RPC (stdio)
|
|
231
|
+
Native Bridge (Swift on macOS / C# on Windows)
|
|
232
|
+
↓ OS APIs
|
|
233
|
+
Accessibility, CoreGraphics, Vision, UI Automation, SendInput
|
|
234
|
+
```
|
|
235
|
+
|
|
236
|
+
ScreenHand reads the UI tree and DOM directly — no screenshots needed for most operations. When screenshots are needed (canvas apps, visual verification), OCR runs in ~600ms via the native Vision framework.
|
|
237
|
+
|
|
238
|
+
---
|
|
239
|
+
|
|
240
|
+
## Requirements
|
|
241
|
+
|
|
242
|
+
| | macOS | Windows |
|
|
243
|
+
|---|---|---|
|
|
244
|
+
| OS | macOS 12+ | Windows 10 (1809+) |
|
|
245
|
+
| Runtime | Node.js 18+ | Node.js 18+ |
|
|
246
|
+
| Native | Swift (included) | [.NET 8 SDK](https://dotnet.microsoft.com/download/dotnet/8.0) |
|
|
247
|
+
| Permissions | Accessibility access for terminal | None (UI Automation works without admin) |
|
|
248
|
+
| Browser | Chrome with `--remote-debugging-port=9222` | Same |
|
|
249
|
+
|
|
250
|
+
## Docs
|
|
251
|
+
|
|
252
|
+
| Document | What's in it |
|
|
253
|
+
|----------|-------------|
|
|
254
|
+
| [All 111 Tools](docs/tools.md) | Complete tool reference with descriptions and speeds |
|
|
255
|
+
| [Architecture](docs/architecture.md) | 7-layer design, app tiers, performance targets |
|
|
256
|
+
| [App Mastery Map](docs/app-mastery-map.md) | Layer 7: persistent spatial understanding, 8 auto-recording features |
|
|
257
|
+
| [Bug Tracker](docs/l2-bug-tracker.md) | 103 bugs found and fixed, 80-scenario validation results |
|
|
258
|
+
| [Testing Plan](docs/testing-plan.md) | L1/L2 test methodology and gate criteria |
|
|
259
|
+
|
|
180
260
|
## FAQ
|
|
181
261
|
|
|
182
262
|
<details>
|
|
183
|
-
<summary><
|
|
263
|
+
<summary><b>How is this different from Anthropic's Computer Use?</b></summary>
|
|
184
264
|
|
|
185
|
-
|
|
265
|
+
Computer Use is cloud-based and screenshot-driven. ScreenHand is local-first, uses native OS APIs (50ms vs 3-5s per action), costs zero API calls for clicks/typing, and runs entirely on your machine.
|
|
186
266
|
</details>
|
|
187
267
|
|
|
188
268
|
<details>
|
|
189
|
-
<summary><
|
|
269
|
+
<summary><b>What apps can it control?</b></summary>
|
|
190
270
|
|
|
191
|
-
|
|
271
|
+
Any app with Accessibility support (most macOS/Windows apps). Chrome and Electron apps get full DOM access via CDP. Canvas-heavy apps (games, Photoshop viewport) use OCR as fallback.
|
|
192
272
|
</details>
|
|
193
273
|
|
|
194
274
|
<details>
|
|
195
|
-
<summary><
|
|
275
|
+
<summary><b>Is it safe?</b></summary>
|
|
196
276
|
|
|
197
|
-
|
|
277
|
+
Runs locally, never sends screen data externally. PII is redacted from all persisted data (memory, playbooks, strategies). Dangerous protocols (`javascript:`, `data:`) are blocked. AppleScript and browser JS execution are audit-logged.
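
A protocol check of the kind mentioned above could look like the following sketch. It uses the standard WHATWG `URL` class available in Node.js and browsers; `BLOCKED_PROTOCOLS` and `isNavigationAllowed` are hypothetical names, and ScreenHand's real check may differ.

```typescript
// Hypothetical guard: refuse URLs whose scheme is on a block list.
const BLOCKED_PROTOCOLS = new Set<string>(["javascript:", "data:"]);

function isNavigationAllowed(rawUrl: string): boolean {
  let parsed: URL;
  try {
    // The URL parser lowercases the scheme, so "JavaScript:" is caught too.
    parsed = new URL(rawUrl);
  } catch (err) {
    // Anything that does not parse as an absolute URL is rejected outright.
    return false;
  }
  return !BLOCKED_PROTOCOLS.has(parsed.protocol);
}
```

Parsing the URL first, rather than matching on the raw string, avoids bypasses that rely on mixed case or leading whitespace in the scheme.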
|
|
198
278
|
</details>
|
|
199
279
|
|
|
200
280
|
<details>
|
|
201
|
-
<summary><
|
|
281
|
+
<summary><b>Does it work with multiple AI agents at once?</b></summary>
|
|
202
282
|
|
|
203
|
-
|
|
283
|
+
Yes. Session leases with heartbeat prevent conflicts. The supervisor daemon detects stalls and recovers. Each agent claims its own app window.
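
The lease-plus-heartbeat idea can be sketched as a table that grants a resource to one owner at a time and frees it when heartbeats stop. This is an illustration under assumptions: `LeaseTable`, the 5-second TTL, and the resource and agent names are made up, and time is passed in explicitly so the behavior is easy to follow.

```typescript
// Hypothetical lease table: one owner per resource, expiring without heartbeats.
type Lease = { owner: string; expiresAt: number };

class LeaseTable {
  private leases = new Map<string, Lease>();
  private ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Grant the lease if the resource is free, expired, or already held by this owner.
  claim(resource: string, owner: string, now: number): boolean {
    const current = this.leases.get(resource);
    if (current && current.owner !== owner && current.expiresAt > now) {
      return false;
    }
    this.leases.set(resource, { owner, expiresAt: now + this.ttlMs });
    return true;
  }

  // Extend a live lease; fails if the caller no longer holds it.
  heartbeat(resource: string, owner: string, now: number): boolean {
    const current = this.leases.get(resource);
    if (!current || current.owner !== owner || current.expiresAt <= now) {
      return false;
    }
    current.expiresAt = now + this.ttlMs;
    return true;
  }
}

const table = new LeaseTable(5000);
const steps = [
  table.claim("notes-window", "agent-a", 0),        // agent-a takes the window
  table.claim("notes-window", "agent-b", 1000),     // refused: lease is live
  table.heartbeat("notes-window", "agent-a", 4000), // extended to t = 9000
  table.claim("notes-window", "agent-b", 6000),     // still refused
  table.claim("notes-window", "agent-b", 9000),     // granted: heartbeats stopped
];
```

If an agent stalls and stops sending heartbeats, its lease lapses after the TTL and another agent can claim the window, which is the property a stall-detecting supervisor relies on.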
|
|
204
284
|
</details>
|
|
205
285
|
|
|
206
286
|
<details>
|
|
207
|
-
<summary><
|
|
287
|
+
<summary><b>How fast is it?</b></summary>
|
|
208
288
|
|
|
209
|
-
|
|
289
|
+
Accessibility: ~50ms. Chrome CDP: ~10ms (background, no focus needed). OCR: ~600ms. Memory lookups: ~0ms (in-memory cache). All disk writes are async and non-blocking.
|
|
210
290
|
</details>
|
|
211
291
|
|
|
212
292
|
## Contributing
|
|
213
293
|
|
|
214
|
-
Contributions welcome! Please open an issue first to discuss what you'd like to change.
|
|
215
|
-
|
|
216
294
|
```bash
|
|
217
295
|
git clone https://github.com/manushi4/screenhand.git
|
|
218
|
-
cd screenhand
|
|
219
|
-
npm
|
|
296
|
+
cd screenhand && npm install && npm run build:native
|
|
297
|
+
npm test # 1306 tests, 53 files
|
|
220
298
|
```
|
|
221
299
|
|
|
300
|
+
## Contact
|
|
301
|
+
|
|
302
|
+
- **Email**: [khushi@clazro.com](mailto:khushi@clazro.com)
|
|
303
|
+
- **Issues**: [github.com/manushi4/screenhand/issues](https://github.com/manushi4/screenhand/issues)
|
|
304
|
+
- **Website**: [screenhand.com](https://screenhand.com)
|
|
305
|
+
|
|
222
306
|
## License
|
|
223
307
|
|
|
224
|
-
|
|
308
|
+
AGPL-3.0-only — Copyright (C) 2025-2026 Clazro Technology Private Limited
|
|
225
309
|
|
|
226
310
|
---
|
|
227
311
|
|
|
228
312
|
<div align="center">
|
|
229
313
|
|
|
230
|
-
**[screenhand.com](https://screenhand.com)** |
|
|
314
|
+
**[screenhand.com](https://screenhand.com)** | [khushi@clazro.com](mailto:khushi@clazro.com) | A product of **Clazro Technology Private Limited**
|
|
231
315
|
|
|
232
316
|
</div>
|
|
Binary file
|