screenhand 0.1.1 → 0.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (241)
  1. package/README.md +193 -109
  2. package/bin/darwin-arm64/macos-bridge +0 -0
  3. package/dist/mcp-desktop.js +5876 -0
  4. package/dist/scripts/codex-monitor-daemon.js +335 -0
  5. package/dist/scripts/export-help-center.js +112 -0
  6. package/dist/scripts/marketing-loop.js +117 -0
  7. package/dist/scripts/observer-daemon.js +288 -0
  8. package/dist/scripts/orchestrator-daemon.js +399 -0
  9. package/dist/scripts/supervisor-daemon.js +272 -0
  10. package/dist/scripts/threads-campaign.js +208 -0
  11. package/dist/scripts/worker-daemon.js +228 -0
  12. package/dist/src/agent/cli.js +82 -0
  13. package/dist/src/agent/loop.js +274 -0
  14. package/dist/src/community/fetcher.js +109 -0
  15. package/dist/src/community/index.js +6 -0
  16. package/dist/src/community/publisher.js +191 -0
  17. package/dist/src/community/remote-api.js +121 -0
  18. package/dist/src/community/types.js +3 -0
  19. package/dist/src/community/validator.js +95 -0
  20. package/{src/config.ts → dist/src/config.js} +5 -10
  21. package/dist/src/context-tracker.js +489 -0
  22. package/{src/index.ts → dist/src/index.js} +32 -52
  23. package/dist/src/ingestion/coverage-auditor.js +233 -0
  24. package/dist/src/ingestion/doc-parser.js +164 -0
  25. package/dist/src/ingestion/index.js +8 -0
  26. package/dist/src/ingestion/menu-scanner.js +152 -0
  27. package/dist/src/ingestion/reference-merger.js +186 -0
  28. package/dist/src/ingestion/shortcut-extractor.js +180 -0
  29. package/dist/src/ingestion/tutorial-extractor.js +170 -0
  30. package/dist/src/ingestion/types.js +3 -0
  31. package/dist/src/jobs/manager.js +305 -0
  32. package/dist/src/jobs/runner.js +806 -0
  33. package/dist/src/jobs/store.js +102 -0
  34. package/dist/src/jobs/types.js +30 -0
  35. package/dist/src/jobs/worker.js +97 -0
  36. package/dist/src/learning/engine.js +356 -0
  37. package/dist/src/learning/index.js +9 -0
  38. package/dist/src/learning/locator-policy.js +120 -0
  39. package/dist/src/learning/pattern-policy.js +89 -0
  40. package/dist/src/learning/recovery-policy.js +116 -0
  41. package/dist/src/learning/sensor-policy.js +115 -0
  42. package/dist/src/learning/timing-model.js +204 -0
  43. package/dist/src/learning/topology-policy.js +90 -0
  44. package/dist/src/learning/types.js +9 -0
  45. package/dist/src/logging/timeline-logger.js +48 -0
  46. package/dist/src/mcp/mcp-stdio-server.js +464 -0
  47. package/dist/src/mcp/server.js +363 -0
  48. package/dist/src/mcp-entry.js +60 -0
  49. package/dist/src/memory/playbook-seeds.js +200 -0
  50. package/dist/src/memory/recall.js +222 -0
  51. package/dist/src/memory/research.js +104 -0
  52. package/dist/src/memory/seeds.js +101 -0
  53. package/dist/src/memory/service.js +446 -0
  54. package/dist/src/memory/session.js +169 -0
  55. package/dist/src/memory/store.js +451 -0
  56. package/{src/runtime/locator-cache.ts → dist/src/memory/types.js} +1 -17
  57. package/dist/src/monitor/codex-monitor.js +382 -0
  58. package/dist/src/monitor/task-queue.js +97 -0
  59. package/dist/src/monitor/types.js +62 -0
  60. package/dist/src/native/bridge-client.js +412 -0
  61. package/{src/native/macos-bridge-client.ts → dist/src/native/macos-bridge-client.js} +0 -1
  62. package/dist/src/observer/state.js +199 -0
  63. package/dist/src/observer/types.js +43 -0
  64. package/dist/src/orchestrator/state.js +68 -0
  65. package/dist/src/orchestrator/types.js +22 -0
  66. package/dist/src/perception/ax-source.js +162 -0
  67. package/dist/src/perception/cdp-source.js +162 -0
  68. package/dist/src/perception/coordinator.js +771 -0
  69. package/dist/src/perception/frame-differ.js +287 -0
  70. package/dist/src/perception/index.js +22 -0
  71. package/dist/src/perception/manager.js +199 -0
  72. package/dist/src/perception/types.js +47 -0
  73. package/dist/src/perception/vision-source.js +399 -0
  74. package/dist/src/planner/deterministic.js +298 -0
  75. package/dist/src/planner/executor.js +870 -0
  76. package/dist/src/planner/goal-store.js +92 -0
  77. package/dist/src/planner/index.js +21 -0
  78. package/dist/src/planner/planner.js +520 -0
  79. package/dist/src/planner/tool-registry.js +71 -0
  80. package/dist/src/planner/types.js +22 -0
  81. package/dist/src/platform/explorer.js +213 -0
  82. package/dist/src/platform/help-center-markdown.js +527 -0
  83. package/dist/src/platform/learner.js +257 -0
  84. package/dist/src/playbook/engine.js +486 -0
  85. package/dist/src/playbook/index.js +20 -0
  86. package/dist/src/playbook/mcp-recorder.js +204 -0
  87. package/dist/src/playbook/recorder.js +536 -0
  88. package/dist/src/playbook/runner.js +408 -0
  89. package/dist/src/playbook/store.js +312 -0
  90. package/dist/src/playbook/types.js +17 -0
  91. package/dist/src/recovery/detectors.js +156 -0
  92. package/dist/src/recovery/engine.js +327 -0
  93. package/dist/src/recovery/index.js +20 -0
  94. package/dist/src/recovery/strategies.js +274 -0
  95. package/dist/src/recovery/types.js +20 -0
  96. package/dist/src/runtime/accessibility-adapter.js +430 -0
  97. package/dist/src/runtime/app-adapter.js +64 -0
  98. package/dist/src/runtime/applescript-adapter.js +305 -0
  99. package/dist/src/runtime/ax-role-map.js +96 -0
  100. package/dist/src/runtime/browser-adapter.js +52 -0
  101. package/dist/src/runtime/cdp-chrome-adapter.js +521 -0
  102. package/dist/src/runtime/composite-adapter.js +221 -0
  103. package/dist/src/runtime/execution-contract.js +159 -0
  104. package/dist/src/runtime/executor.js +286 -0
  105. package/dist/src/runtime/locator-cache.js +50 -0
  106. package/dist/src/runtime/planning-loop.js +63 -0
  107. package/dist/src/runtime/service.js +432 -0
  108. package/dist/src/runtime/session-manager.js +63 -0
  109. package/dist/src/runtime/state-observer.js +121 -0
  110. package/dist/src/runtime/vision-adapter.js +225 -0
  111. package/dist/src/state/app-map-types.js +72 -0
  112. package/dist/src/state/app-map.js +1974 -0
  113. package/dist/src/state/entity-tracker.js +108 -0
  114. package/dist/src/state/fusion.js +96 -0
  115. package/dist/src/state/index.js +21 -0
  116. package/dist/src/state/ladder-generator.js +236 -0
  117. package/dist/src/state/persistence.js +156 -0
  118. package/dist/src/state/types.js +17 -0
  119. package/dist/src/state/world-model.js +1456 -0
  120. package/dist/src/supervisor/locks.js +186 -0
  121. package/dist/src/supervisor/supervisor.js +403 -0
  122. package/dist/src/supervisor/types.js +30 -0
  123. package/dist/src/test-mcp-protocol.js +154 -0
  124. package/dist/src/types.js +17 -0
  125. package/dist/src/util/atomic-write.js +133 -0
  126. package/dist/src/util/sanitize.js +146 -0
  127. package/dist-app-maps/com.figma.Desktop.json +959 -0
  128. package/dist-app-maps/com.hnc.Discord.json +1146 -0
  129. package/dist-app-maps/notion.id.json +2831 -0
  130. package/dist-playbooks/canva-screenhand-carousel.json +445 -0
  131. package/dist-playbooks/codex-desktop.json +76 -0
  132. package/dist-playbooks/competitor-research-stack.json +122 -0
  133. package/dist-playbooks/davinci-color-grade.json +153 -0
  134. package/dist-playbooks/davinci-edit-timeline.json +162 -0
  135. package/dist-playbooks/davinci-render.json +114 -0
  136. package/dist-playbooks/devto.json +52 -0
  137. package/dist-playbooks/discord.json +41 -0
  138. package/dist-playbooks/google-flow-create-project.json +59 -0
  139. package/dist-playbooks/google-flow-edit-image.json +90 -0
  140. package/dist-playbooks/google-flow-edit-video.json +90 -0
  141. package/dist-playbooks/google-flow-generate-image.json +68 -0
  142. package/dist-playbooks/google-flow-generate-video.json +191 -0
  143. package/dist-playbooks/google-flow-open-project.json +48 -0
  144. package/dist-playbooks/google-flow-open-scenebuilder.json +64 -0
  145. package/dist-playbooks/google-flow-search-assets.json +64 -0
  146. package/dist-playbooks/instagram.json +57 -0
  147. package/dist-playbooks/linkedin.json +52 -0
  148. package/dist-playbooks/n8n.json +43 -0
  149. package/dist-playbooks/reddit.json +52 -0
  150. package/dist-playbooks/threads.json +59 -0
  151. package/dist-playbooks/x-twitter.json +59 -0
  152. package/dist-playbooks/youtube.json +59 -0
  153. package/dist-references/canva.json +646 -0
  154. package/dist-references/codex-desktop.json +305 -0
  155. package/dist-references/davinci-resolve-keyboard.json +594 -0
  156. package/dist-references/davinci-resolve-menu-map.json +1139 -0
  157. package/dist-references/davinci-resolve-menus-batch1.json +116 -0
  158. package/dist-references/davinci-resolve-menus-batch2.json +372 -0
  159. package/dist-references/davinci-resolve-menus-batch3.json +330 -0
  160. package/dist-references/davinci-resolve-menus-batch4.json +297 -0
  161. package/dist-references/davinci-resolve-shortcuts.json +333 -0
  162. package/dist-references/devto.json +317 -0
  163. package/dist-references/discord.json +549 -0
  164. package/dist-references/figma.json +1186 -0
  165. package/dist-references/finder.json +146 -0
  166. package/dist-references/google-ads-transparency.json +95 -0
  167. package/dist-references/google-flow.json +649 -0
  168. package/dist-references/instagram.json +341 -0
  169. package/dist-references/linkedin.json +324 -0
  170. package/dist-references/meta-ad-library.json +86 -0
  171. package/dist-references/n8n.json +387 -0
  172. package/dist-references/notes.json +27 -0
  173. package/dist-references/notion.json +163 -0
  174. package/dist-references/reddit.json +341 -0
  175. package/dist-references/threads.json +337 -0
  176. package/dist-references/x-twitter.json +403 -0
  177. package/dist-references/youtube.json +373 -0
  178. package/native/macos-bridge/Package.swift +1 -0
  179. package/native/macos-bridge/Sources/AccessibilityBridge.swift +257 -36
  180. package/native/macos-bridge/Sources/AppManagement.swift +212 -2
  181. package/native/macos-bridge/Sources/CoreGraphicsBridge.swift +348 -53
  182. package/native/macos-bridge/Sources/StreamCapture.swift +136 -0
  183. package/native/macos-bridge/Sources/VisionBridge.swift +165 -7
  184. package/native/macos-bridge/Sources/main.swift +169 -16
  185. package/native/windows-bridge/Program.cs +5 -0
  186. package/native/windows-bridge/ScreenCapture.cs +124 -0
  187. package/package.json +29 -4
  188. package/scripts/postinstall.cjs +127 -0
  189. package/.claude/commands/automate.md +0 -28
  190. package/.claude/commands/debug-ui.md +0 -19
  191. package/.claude/commands/screenshot.md +0 -15
  192. package/.github/FUNDING.yml +0 -1
  193. package/.github/ISSUE_TEMPLATE/bug_report.md +0 -27
  194. package/.github/ISSUE_TEMPLATE/feature_request.md +0 -20
  195. package/.mcp.json +0 -8
  196. package/DESKTOP_MCP_GUIDE.md +0 -92
  197. package/SECURITY.md +0 -44
  198. package/docs/architecture.md +0 -47
  199. package/install-skills.sh +0 -19
  200. package/mcp-bridge.ts +0 -271
  201. package/mcp-desktop.ts +0 -1221
  202. package/playbooks/instagram.json +0 -41
  203. package/playbooks/instagram_v2.json +0 -201
  204. package/playbooks/x_v1.json +0 -211
  205. package/scripts/devpost-live-loop.mjs +0 -421
  206. package/src/logging/timeline-logger.ts +0 -55
  207. package/src/mcp/server.ts +0 -449
  208. package/src/memory/recall.ts +0 -191
  209. package/src/memory/research.ts +0 -146
  210. package/src/memory/seeds.ts +0 -123
  211. package/src/memory/session.ts +0 -201
  212. package/src/memory/store.ts +0 -434
  213. package/src/memory/types.ts +0 -69
  214. package/src/native/bridge-client.ts +0 -239
  215. package/src/runtime/accessibility-adapter.ts +0 -487
  216. package/src/runtime/app-adapter.ts +0 -169
  217. package/src/runtime/applescript-adapter.ts +0 -376
  218. package/src/runtime/ax-role-map.ts +0 -102
  219. package/src/runtime/browser-adapter.ts +0 -129
  220. package/src/runtime/cdp-chrome-adapter.ts +0 -676
  221. package/src/runtime/composite-adapter.ts +0 -274
  222. package/src/runtime/executor.ts +0 -396
  223. package/src/runtime/planning-loop.ts +0 -81
  224. package/src/runtime/service.ts +0 -448
  225. package/src/runtime/session-manager.ts +0 -50
  226. package/src/runtime/state-observer.ts +0 -136
  227. package/src/runtime/vision-adapter.ts +0 -297
  228. package/src/types.ts +0 -297
  229. package/tests/bridge-client.test.ts +0 -176
  230. package/tests/browser-stealth.test.ts +0 -210
  231. package/tests/composite-adapter.test.ts +0 -64
  232. package/tests/mcp-server.test.ts +0 -151
  233. package/tests/memory-recall.test.ts +0 -339
  234. package/tests/memory-research.test.ts +0 -159
  235. package/tests/memory-seeds.test.ts +0 -120
  236. package/tests/memory-store.test.ts +0 -392
  237. package/tests/types.test.ts +0 -92
  238. package/tsconfig.check.json +0 -17
  239. package/tsconfig.json +0 -19
  240. package/vitest.config.ts +0 -8
  241. package/{playbooks → dist-references}/devpost.json +0 -0
package/README.md CHANGED
@@ -2,86 +2,62 @@

  # ScreenHand

- **Give AI eyes and hands on your desktop.**
+ **Let AI control your desktop: click buttons, fill forms, automate workflows in ~50ms with zero extra AI calls.**

- ScreenHand is an [MCP server](https://modelcontextprotocol.io/) that lets AI agents see your screen, click buttons, type text, and control any app on macOS and Windows.
+ An open-source [MCP server](https://modelcontextprotocol.io/) for macOS and Windows. Works with Claude, Cursor, Codex CLI, and any MCP-compatible client.

  [![License: AGPL-3.0](https://img.shields.io/badge/License-AGPL--3.0-blue.svg)](LICENSE)
  [![npm: screenhand](https://img.shields.io/npm/v/screenhand)](https://www.npmjs.com/package/screenhand)
+ [![CI](https://github.com/manushi4/screenhand/actions/workflows/ci.yml/badge.svg)](https://github.com/manushi4/screenhand/actions/workflows/ci.yml)
  [![Platform: macOS & Windows](https://img.shields.io/badge/Platform-macOS%20%7C%20Windows-green)]()
  [![MCP Compatible](https://img.shields.io/badge/MCP-Compatible-purple)]()

- [Website](https://screenhand.com) | [Quick Start](#quick-start) | [Use Cases](#use-cases) | [FAQ](#faq)
+ [Quick Start](#quick-start) | [What It Does](#what-it-does) | [Example](#example) | [All 111 Tools](docs/tools.md) | [Architecture](docs/architecture.md) | [Website](https://screenhand.com)

  </div>

  ---

+ <!-- TODO: Add demo GIF here — 15 sec showing Claude controlling a real app -->
+
  ## The Problem

- AI assistants are powerful but they're blind. They can't see what's on your screen, click a button, or type into an app. If you want Claude to help you automate a workflow, debug a UI, or fill out a form, you're stuck copy-pasting screenshots and describing what you see.
+ AI assistants can write code but can't use your computer. Every click requires a screenshot → LLM interpretation → coordinate guess: **3-5 seconds and an API call per action**.

- **ScreenHand fixes that.** It gives any AI agent direct access to your desktop through native OS APIs, not slow screenshot-and-guess loops.
+ ScreenHand gives AI direct access to native OS APIs. No screenshots needed for clicks. No AI calls for button presses.

- ## How It Works
+ | | Without ScreenHand | With ScreenHand |
+ |---|---|---|
+ | Click a button | Screenshot → LLM → coordinate click (~3-5s) | Native Accessibility API (~50ms) |
+ | Cost per action | 1 LLM API call | 0 LLM calls |
+ | Accuracy | Coordinate guessing — misses on layout shift | Exact element targeting by role/name |
+ | Browser control | Needs focus, screenshot per action | CDP in background (~10ms), no focus needed |
+ | Works across apps | One app at a time | Cross-app workflows, multi-agent coordination |

- You connect ScreenHand to your AI client (Claude, Cursor, Codex CLI, etc.) via the [Model Context Protocol](https://modelcontextprotocol.io/). Once connected, your AI can:
+ ## Quick Start

- - **See** your screen via screenshots and OCR
- - **Read** UI elements directly via native Accessibility APIs
- - **Click** buttons, menus, and links
- - **Type** text into any input field
- - **Control** Chrome tabs via DevTools Protocol
- - **Automate** cross-app workflows
+ ### 1. Add to your AI client (one step)

- ```
- Your AI Client (Claude, Cursor, etc.)
- | MCP protocol (stdio)
- ScreenHand
- | Native OS APIs
- Your Desktop (any app, any browser)
- ```
-
- ## Quick Start
+ <details open>
+ <summary><b>Claude Code</b> (recommended)</summary>

  ```bash
- git clone https://github.com/manushi4/screenhand.git
- cd screenhand
- npm install
- npm run build:native # macOS — builds Swift bridge
- # npm run build:native:windows # Windows — builds .NET bridge
+ claude mcp add screenhand -- npx -y screenhand
  ```

- ### Connect to Your AI Client
-
- <details>
- <summary><strong>Claude Desktop</strong></summary>
-
- Add to `~/Library/Application Support/Claude/claude_desktop_config.json`:
-
- ```json
- {
-   "mcpServers": {
-     "screenhand": {
-       "command": "npx",
-       "args": ["tsx", "/path/to/screenhand/mcp-desktop.ts"]
-     }
-   }
- }
- ```
+ Done. That's it.
  </details>

  <details>
- <summary><strong>Claude Code</strong></summary>
-
- Add to your project `.mcp.json` or `~/.claude/settings.json`:
+ <summary><b>Claude Desktop</b></summary>

+ Add to `~/Library/Application Support/Claude/claude_desktop_config.json`:
  ```json
  {
    "mcpServers": {
      "screenhand": {
        "command": "npx",
-       "args": ["tsx", "/path/to/screenhand/mcp-desktop.ts"]
+       "args": ["-y", "screenhand"]
      }
    }
  }
@@ -89,16 +65,15 @@ Add to your project `.mcp.json` or `~/.claude/settings.json`:
  </details>

  <details>
- <summary><strong>Cursor</strong></summary>
-
- Add to `.cursor/mcp.json` in your project (or `~/.cursor/mcp.json` globally):
+ <summary><b>Cursor</b></summary>

+ Add to `.cursor/mcp.json`:
  ```json
  {
    "mcpServers": {
      "screenhand": {
        "command": "npx",
-       "args": ["tsx", "/path/to/screenhand/mcp-desktop.ts"]
+       "args": ["-y", "screenhand"]
      }
    }
  }
@@ -106,127 +81,236 @@ Add to `.cursor/mcp.json` in your project (or `~/.cursor/mcp.json` globally):
  </details>

  <details>
- <summary><strong>OpenAI Codex CLI</strong></summary>
+ <summary><b>OpenAI Codex CLI</b></summary>

  Add to `~/.codex/config.toml`:
-
  ```toml
  [mcp.screenhand]
  command = "npx"
- args = ["tsx", "/path/to/screenhand/mcp-desktop.ts"]
+ args = ["-y", "screenhand"]
  transport = "stdio"
  ```
  </details>

  <details>
- <summary><strong>Any MCP Client</strong></summary>
+ <summary><b>Any MCP Client</b></summary>

- ScreenHand is a standard MCP server over stdio. Point any MCP-compatible client at `mcp-desktop.ts`.
+ ScreenHand is a standard MCP server over stdio. Run with `npx -y screenhand`.
  </details>

- Replace `/path/to/screenhand` with the actual path where you cloned the repo.
+ ### 2. Grant permissions

- ## Use Cases
+ **macOS**: System Settings > Privacy & Security > Accessibility > enable your terminal app.

- ### Automate Repetitive Workflows
- Tell your AI "submit this form on 10 websites" or "export all these reports as PDFs" — and it does it. ScreenHand handles the clicking, typing, and navigating across any app.
+ **Windows**: No special permissions needed.

- ### Debug UIs Faster
- Instead of clicking through your app manually, let Claude inspect the full UI element tree, check states, and walk through flows — all from your terminal.
+ ### 3. Browser control (optional)

- ### Browser Automation Without Selenium
- Fill forms, scrape data, run JavaScript, and navigate pages through Chrome DevTools Protocol. Works with sites that block traditional automation.
+ Launch Chrome with remote debugging to enable browser tools:
+ ```bash
+ open -a "Google Chrome" --args --remote-debugging-port=9222
+ ```

- ### Cross-App Workflows
- Read data from a spreadsheet, search it in Chrome, paste results into Notes — chain actions across your entire desktop.
+ That's it. Your AI client now has 111 tools for desktop automation.

- ### AI-Powered UI Testing
- Click buttons, verify text appears, check element states, and catch regressions — all driven by your AI agent.
+ <details>
+ <summary><b>Building from source</b> (contributors only)</summary>

- ## What's Included
+ ```bash
+ git clone https://github.com/manushi4/screenhand.git
+ cd screenhand && npm install && npm run build:native
+ ```

- ScreenHand exposes **70+ tools** organized by what you need to do:
+ On Windows, use `npm run build:native:windows` instead.
+ </details>

- | Category | Examples | What For |
- |----------|----------|----------|
- | **Screen** | `screenshot`, `ocr` | See what's on screen, read all visible text |
- | **App Control** | `ui_tree`, `ui_press`, `menu_click` | Read and interact with any native app |
- | **Keyboard & Mouse** | `click`, `type_text`, `key`, `drag` | Direct input control |
- | **Chrome Browser** | `browser_navigate`, `browser_js`, `browser_dom` | Full browser automation via CDP |
- | **Memory** | `memory_recall`, `memory_save` | ScreenHand learns from past sessions |
- | **AppleScript** | `applescript` | Run AppleScript on macOS |
+ ---

- For the full tool reference, see the [tool documentation](DESKTOP_MCP_GUIDE.md).
+ ## What It Does

- ## Requirements
+ ScreenHand gives AI agents seven capabilities:

- | | macOS | Windows |
- |---|---|---|
- | **OS** | macOS 12+ | Windows 10 (1809+) |
- | **Runtime** | Node.js 18+ | Node.js 18+ |
- | **Permissions** | Accessibility (System Settings) | None (no admin needed) |
- | **Browser tools** | Chrome with `--remote-debugging-port=9222` | Same |
- | **Build** | `npm run build:native` | `npm run build:native:windows` |
+ ### Desktop Control: 19 tools
+ Click buttons, type text, read UI trees, navigate menus, drag, scroll — all via native Accessibility APIs in ~50ms. Works with any app: Finder, Notes, VS Code, Xcode, System Settings, etc.
+
+ ### Browser Automation: 15 tools
+ Full Chrome control via DevTools Protocol. Navigate, click, type, run JavaScript, fill forms — all in the background at ~10ms. Built-in anti-detection (`browser_stealth`, `browser_human_click`) for sites with bot protection.
+
+ ### Smart Fallbacks: 8 tools
+ `click_with_fallback`, `type_with_fallback`, etc. automatically try Accessibility → CDP → OCR → coordinates. You don't have to pick the right method — ScreenHand figures it out.
+
+ ### Memory & Learning — 14 tools
+ Gets smarter every session. Logs tool calls, saves winning strategies, tracks error patterns with fixes. Zero config, zero latency overhead (in-memory cache, async disk writes). Ships with 12 seed strategies for common macOS workflows. 6 learning policies: locator stability, sensor effectiveness, recovery ranking, pattern recognition, adaptive timing, and topology (navigation edge reliability).
+
+ ### App Mastery Map — automatic per-app spatial understanding
+ Builds a persistent reverse-engineered blueprint of every app from normal tool usage. 8 features record automatically: page zones, navigation graph (BFS pathfinding), hierarchy, I/O contracts, state machine, element visibility, timing profiles, and ready signals. Mastery levels (beginner → pro → expert → grandmaster) honestly reflect how well ScreenHand knows each app. Maps stored at `~/.screenhand/app-maps/`.
+
+ ### Jobs & Orchestration — 34 tools
+ Queue multi-step jobs, run them via background worker daemon, coordinate multiple AI agents with session leases, detect stalls, auto-recover. Survives client restarts.

- ## Development
+ ### Perception & Planning — 17 tools
+ Continuous screen awareness (3-rate perception loop at 100ms/300ms/1000ms), real-time world model with entity tracking, goal-oriented planning with auto-decomposition, recovery engine with self-healing. The system always knows what's on screen and feeds observations into the App Mastery Map.
+
+ > **Full reference**: See all [111 tools with descriptions](docs/tools.md).
+
+ ---
+
+ ## Example
+
+ **Browser** — Claude controls Chrome in the background while you work:
+
+ ```
+ You: Search for "screenhand" on Instagram
+
+ → browser_tabs() # ~10ms
+ [34DF5DE1] Instagram — https://www.instagram.com/
+
+ → browser_js({ code: "/* click Search icon */" }) # ~10ms
+ → browser_fill_form({ selector: "input", text: "screenhand" }) # ~50ms (human-like)
+ → browser_js({ code: "/* extract results */" }) # ~10ms
+
+ Found @screenhand_ as the top result.
+ ```
+
+ **Desktop** — native app control without screenshots:
+
+ ```
+ → apps() # List running apps ~10ms
+ → focus("com.apple.Notes") # Bring Notes to front ~10ms
+ → ui_tree() # Read full UI element tree ~50ms
+ → ui_press("New Note") # Click "New Note" button ~50ms
+ → type_text("Hello world") # Type text ~30ms
+ ```
+
+ **Cross-app** — chain actions across your whole desktop:
+
+ ```
+ → browser_js(...) # Extract data from Chrome
+ → focus("com.apple.Notes") # Switch to Notes
+ → type_text(extractedData) # Paste it in
+ → key("cmd+s") # Save
+ ```
+
+ ---
+
+ ## Claude Code Plugin
+
+ If you use Claude Code, ScreenHand includes a plugin with **13 skills and 5 agents** that wrap all 111 tools into intent-oriented workflows.

  ```bash
- npm run check # type-check
- npm test # run test suite
- npm run build # compile TypeScript
- npm run build:native # build native bridge
+ ./install-plugin.sh # after npm install && npm run build:native
  ```

+ | Skill | What it does |
+ |-------|-------------|
+ | `/automate` | Control any desktop app |
+ | `/post-social` | Post to X, LinkedIn, Instagram, Reddit, Threads, Discord |
+ | `/run-campaign` | Multi-platform marketing campaigns |
+ | `/edit-video` | DaVinci Resolve automation |
+ | `/design-figma` | Figma design via Plugin API + browser |
+ | `/edit-canva` | Canva template editing |
+ | `/scrape-web` | Data extraction with anti-detection |
+ | `/fill-form` | Human-like form filling |
+ | `/qa-smoke-test` | Automated UI testing |
+ | `/record-workflow` | Record into reusable playbooks |
+ | `/learn-platform` | Discover how to automate a new app/site |
+ | `/run-jobs` | Job queues, background workers |
+ | `/manage-system` | Supervisor, memory, diagnostics |
+
+ 5 specialized agents: **marketing**, **design**, **QA**, **scraper**, **orchestrator**.
+
+ ---
+
+ ## How It Works
+
+ ```
+ AI Client (Claude, Cursor, Codex CLI)
+ ↓ MCP protocol (stdio)
+ ScreenHand MCP Server (TypeScript)
+ ↓ JSON-RPC (stdio)
+ Native Bridge (Swift on macOS / C# on Windows)
+ ↓ OS APIs
+ Accessibility, CoreGraphics, Vision, UI Automation, SendInput
+ ```
+
+ ScreenHand reads the UI tree and DOM directly — no screenshots needed for most operations. When screenshots are needed (canvas apps, visual verification), OCR runs in ~600ms via the native Vision framework.
+
+ ---
+
+ ## Requirements
+
+ | | macOS | Windows |
+ |---|---|---|
+ | OS | macOS 12+ | Windows 10 (1809+) |
+ | Runtime | Node.js 18+ | Node.js 18+ |
+ | Native | Swift (included) | [.NET 8 SDK](https://dotnet.microsoft.com/download/dotnet/8.0) |
+ | Permissions | Accessibility access for terminal | None (UI Automation works without admin) |
+ | Browser | Chrome with `--remote-debugging-port=9222` | Same |
+
+ ## Docs
+
+ | Document | What's in it |
+ |----------|-------------|
+ | [All 111 Tools](docs/tools.md) | Complete tool reference with descriptions and speeds |
+ | [Architecture](docs/architecture.md) | 7-layer design, app tiers, performance targets |
+ | [App Mastery Map](docs/app-mastery-map.md) | Layer 7: persistent spatial understanding, 8 auto-recording features |
+ | [Bug Tracker](docs/l2-bug-tracker.md) | 103 bugs found and fixed, 80-scenario validation results |
+ | [Testing Plan](docs/testing-plan.md) | L1/L2 test methodology and gate criteria |
+
  ## FAQ

  <details>
- <summary><strong>What is ScreenHand?</strong></summary>
+ <summary><b>How is this different from Anthropic's Computer Use?</b></summary>

- An MCP server that gives AI agents the ability to see and control your desktop. It uses native OS APIs (Accessibility on macOS, UI Automation on Windows) for fast, reliable automation, not slow screenshot-based guessing.
+ Computer Use is cloud-based and screenshot-driven. ScreenHand is local-first, uses native OS APIs (50ms vs 3-5s per action), costs zero API calls for clicks/typing, and runs entirely on your machine.
  </details>

  <details>
- <summary><strong>How is this different from Anthropic's Computer Use?</strong></summary>
+ <summary><b>What apps can it control?</b></summary>

- Computer Use is cloud-based and built into Claude. ScreenHand is open-source, runs locally on your machine, and uses native OS APIs which are faster and more reliable than screenshot-based approaches. It also works with any MCP-compatible client, not just Claude.
+ Any app with Accessibility support (most macOS/Windows apps). Chrome and Electron apps get full DOM access via CDP. Canvas-heavy apps (games, Photoshop viewport) use OCR as fallback.
  </details>

  <details>
- <summary><strong>Is it safe?</strong></summary>
+ <summary><b>Is it safe?</b></summary>

- ScreenHand runs entirely on your machine; no screen data is sent to external servers. All tool calls are audit-logged. See our [Security Policy](SECURITY.md) for details on permissions and boundaries.
+ Runs locally, never sends screen data externally. PII is redacted from all persisted data (memory, playbooks, strategies). Dangerous protocols (`javascript:`, `data:`) are blocked. AppleScript and browser JS execution are audit-logged.
  </details>

  <details>
- <summary><strong>What AI clients work with it?</strong></summary>
+ <summary><b>Does it work with multiple AI agents at once?</b></summary>

- Any MCP-compatible client: Claude Desktop, Claude Code, Cursor, Windsurf, OpenAI Codex CLI, and more.
+ Yes. Session leases with heartbeat prevent conflicts. The supervisor daemon detects stalls and recovers. Each agent claims its own app window.
  </details>

  <details>
- <summary><strong>Can it control any app?</strong></summary>
+ <summary><b>How fast is it?</b></summary>

- On macOS, any app that exposes Accessibility elements (most do). On Windows, any app supporting UI Automation. For apps with custom rendering (games, some Electron apps), OCR is available as a fallback.
+ Accessibility: ~50ms. Chrome CDP: ~10ms (background, no focus needed). OCR: ~600ms. Memory lookups: ~0ms (in-memory cache). All disk writes are async and non-blocking.
  </details>

  ## Contributing

- Contributions welcome! Please open an issue first to discuss what you'd like to change.
-
  ```bash
  git clone https://github.com/manushi4/screenhand.git
- cd screenhand
- npm install && npm run build:native && npm test
+ cd screenhand && npm install && npm run build:native
+ npm test # 1306 tests, 53 files
  ```

+ ## Contact
+
+ - **Email**: [khushi@clazro.com](mailto:khushi@clazro.com)
+ - **Issues**: [github.com/manushi4/screenhand/issues](https://github.com/manushi4/screenhand/issues)
+ - **Website**: [screenhand.com](https://screenhand.com)
+
  ## License

- [AGPL-3.0](LICENSE) — Copyright (C) 2025 Clazro Technology Private Limited
+ AGPL-3.0-only — Copyright (C) 2025-2026 Clazro Technology Private Limited

  ---

  <div align="center">

- **[screenhand.com](https://screenhand.com)** | Built by **[Clazro Technology Private Limited](https://github.com/manushi4)**
+ **[screenhand.com](https://screenhand.com)** | [khushi@clazro.com](mailto:khushi@clazro.com) | A product of **Clazro Technology Private Limited**

  </div>
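The new Quick Start replaces the `tsx /path/to/screenhand/mcp-desktop.ts` launch with `npx -y screenhand` in every client config. If you script that change instead of editing `claude_desktop_config.json` by hand, it is a one-key merge under `mcpServers`. The TypeScript sketch below is illustrative only: `withScreenhandServer` and the `McpConfig` type are hypothetical names, not part of ScreenHand, and reading or writing the file on disk is left out.

```typescript
// Shape of an MCP client config such as claude_desktop_config.json.
// Only the fields used here are modelled; other keys are preserved as-is.
interface McpServerEntry {
  command: string;
  args: string[];
}

interface McpConfig {
  mcpServers?: Record<string, McpServerEntry>;
  [key: string]: unknown;
}

// Hypothetical helper: returns a new config with a `screenhand` entry that uses
// the `npx -y screenhand` launch from the 0.3.0 README. Other configured servers
// and top-level keys are copied over untouched; the input is not mutated.
function withScreenhandServer(config: McpConfig): McpConfig {
  return {
    ...config,
    mcpServers: {
      ...(config.mcpServers ?? {}),
      screenhand: { command: "npx", args: ["-y", "screenhand"] },
    },
  };
}

// Usage: merge into an existing config and print the JSON you would write back.
const existing: McpConfig = {
  mcpServers: { other: { command: "node", args: ["server.js"] } },
};
console.log(JSON.stringify(withScreenhandServer(existing), null, 2));
```

Returning a fresh object rather than mutating `config` keeps the original around for a backup or a before/after comparison.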
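The README's "Smart Fallbacks" section says tools such as `click_with_fallback` try Accessibility → CDP → OCR → coordinates in turn. The package's own implementation is not shown in this diff; what follows is a minimal TypeScript sketch of that ordered-fallback pattern, in which the `Strategy` type, the `Method` names and `runWithFallback` are all hypothetical.

```typescript
// One channel through which an action (e.g. a click) can be attempted.
type Method = "accessibility" | "cdp" | "ocr" | "coordinates";

interface Strategy {
  method: Method;
  attempt: () => Promise<boolean>; // resolves true on success
}

interface FallbackResult {
  method: Method | null; // the strategy that succeeded, or null if none did
  tried: Method[];       // every method attempted, in order
}

// Try each strategy in the given order and stop at the first success.
// A thrown error is treated the same as a failed attempt.
async function runWithFallback(strategies: Strategy[]): Promise<FallbackResult> {
  const tried: Method[] = [];
  for (const s of strategies) {
    tried.push(s.method);
    try {
      if (await s.attempt()) return { method: s.method, tried };
    } catch {
      // fall through to the next strategy
    }
  }
  return { method: null, tried };
}

// Usage with stubbed strategies: Accessibility fails and CDP succeeds,
// so OCR and raw coordinates are never attempted.
const demo: Strategy[] = [
  { method: "accessibility", attempt: async () => false },
  { method: "cdp", attempt: async () => true },
  { method: "ocr", attempt: async () => true },
  { method: "coordinates", attempt: async () => true },
];

runWithFallback(demo).then((r) => console.log(r.method, r.tried));
```

Treating an exception like an ordinary failure is an assumption of this sketch; a real implementation might instead stop early on errors that no other channel can fix, such as a missing Accessibility permission.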
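The App Mastery Map section says the per-app navigation graph supports BFS pathfinding. A TypeScript sketch of that idea under stated assumptions: screens are plain string IDs, each edge carries the action that triggers it, and every edge costs the same. `NavEdge` and `findPath` are hypothetical names; the package's `app-map.js` is not reproduced in this diff.

```typescript
// A navigation edge: performing `action` on screen `from` leads to screen `to`.
interface NavEdge {
  from: string;
  to: string;
  action: string;
}

// Breadth-first search over the navigation graph. Because every edge has the
// same cost, the first time BFS reaches `target` it has found a route with the
// fewest steps. Returns the actions to perform, or null if `target` is unreachable.
function findPath(edges: NavEdge[], start: string, target: string): string[] | null {
  if (start === target) return [];

  // Adjacency list keyed by source screen.
  const adjacency = new Map<string, NavEdge[]>();
  for (const e of edges) {
    const list = adjacency.get(e.from) ?? [];
    list.push(e);
    adjacency.set(e.from, list);
  }

  // Remember the edge that first reached each screen so the route can be rebuilt.
  const cameFrom = new Map<string, NavEdge>();
  const visited = new Set<string>([start]);
  const queue: string[] = [start];

  while (queue.length > 0) {
    const screen = queue.shift()!;
    for (const edge of adjacency.get(screen) ?? []) {
      if (visited.has(edge.to)) continue;
      visited.add(edge.to);
      cameFrom.set(edge.to, edge);
      if (edge.to === target) {
        // Walk back from the target to the start, collecting actions.
        const actions: string[] = [];
        let node = target;
        while (node !== start) {
          const step = cameFrom.get(node)!;
          actions.unshift(step.action);
          node = step.from;
        }
        return actions;
      }
      queue.push(edge.to);
    }
  }
  return null;
}

// Usage with a hypothetical app graph.
const graph: NavEdge[] = [
  { from: "main", to: "settings", action: "menu:Settings" },
  { from: "main", to: "editor", action: "click:New Note" },
  { from: "settings", to: "accounts", action: "click:Accounts" },
  { from: "editor", to: "accounts", action: "menu:Account Info" },
  { from: "accounts", to: "main", action: "click:Done" },
];
console.log(findPath(graph, "main", "accounts")); // menu:Settings, then click:Accounts
```

Plain BFS ignores edge reliability. The README's topology policy tracks reliability per navigation edge, and routing by that score would call for a weighted search such as Dijkstra's algorithm instead.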
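The FAQ answer on multiple agents mentions session leases with heartbeat and a supervisor that detects stalls. A minimal in-memory TypeScript sketch of that pattern, assuming a fixed time-to-live and a caller-supplied clock (`now`, in milliseconds) so the behaviour is deterministic. `LeaseTable` and its methods are hypothetical and are not taken from the package's `dist/src/supervisor/locks.js`, whose contents this diff does not show.

```typescript
// A lease gives one agent exclusive use of a resource (e.g. an app window)
// until `expiresAt`, unless the holder keeps renewing it with heartbeats.
interface Lease {
  holder: string;
  expiresAt: number; // epoch milliseconds
}

class LeaseTable {
  private readonly leases = new Map<string, Lease>();
  private readonly ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Claim `resource` for `agent`. Succeeds if the resource is free, its
  // previous lease has expired (the holder stalled), or `agent` already holds it.
  claim(resource: string, agent: string, now: number): boolean {
    const current = this.leases.get(resource);
    if (current && current.holder !== agent && current.expiresAt > now) {
      return false;
    }
    this.leases.set(resource, { holder: agent, expiresAt: now + this.ttlMs });
    return true;
  }

  // Extend the lease; only the current, unexpired holder may renew it.
  heartbeat(resource: string, agent: string, now: number): boolean {
    const current = this.leases.get(resource);
    if (!current || current.holder !== agent || current.expiresAt <= now) {
      return false;
    }
    current.expiresAt = now + this.ttlMs;
    return true;
  }

  // Resources whose holders have stopped sending heartbeats.
  stalled(now: number): string[] {
    const result: string[] = [];
    this.leases.forEach((lease, resource) => {
      if (lease.expiresAt <= now) result.push(resource);
    });
    return result;
  }
}

// Usage: two agents competing for the Notes window with a 5 s lease.
const table = new LeaseTable(5000);
console.log(table.claim("com.apple.Notes", "agent-a", 0));        // true: window was free
console.log(table.claim("com.apple.Notes", "agent-b", 1000));     // false: agent-a's lease runs to 5000
console.log(table.heartbeat("com.apple.Notes", "agent-a", 4000)); // true: renewed until 9000
console.log(table.claim("com.apple.Notes", "agent-b", 10000));    // true: agent-a stopped heartbeating
```

Passing `now` explicitly keeps the logic testable; a daemon would supply `Date.now()` and would need shared storage (a file or database) if leases must survive restarts or span processes.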
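The new "Is it safe?" answer states that dangerous protocols (`javascript:`, `data:`) are blocked. One way to express such a check in TypeScript is to parse the input with the WHATWG `URL` parser and compare `url.protocol`. In this sketch `isNavigationAllowed`, `parseUrl` and `BLOCKED_PROTOCOLS` are hypothetical names, and rejecting unparseable or relative input is an assumption, not documented ScreenHand behaviour.

```typescript
// Protocols the README says are blocked. Other schemes (vbscript:, file:, ...)
// are NOT covered by this list.
const BLOCKED_PROTOCOLS = new Set(["javascript:", "data:"]);

// Parse with the WHATWG URL parser; returns null for relative or invalid input.
function parseUrl(raw: string): URL | null {
  try {
    return new URL(raw);
  } catch {
    return null;
  }
}

// The parser lowercases the scheme and strips leading spaces/control characters
// (and embedded tabs/newlines), so variants like " JavaScript:alert(1)" are
// still recognised as `javascript:`.
function isNavigationAllowed(raw: string): boolean {
  const url = parseUrl(raw);
  if (url === null) return false; // conservative: refuse what we cannot parse
  return !BLOCKED_PROTOCOLS.has(url.protocol);
}

// Usage
console.log(isNavigationAllowed("https://screenhand.com"));
console.log(isNavigationAllowed(" JavaScript:alert(1)"));
```

A blocklist only covers the schemes it names; an allowlist of `http:` and `https:` is the stricter choice when no other schemes are needed.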
Binary file