screenhand 0.1.0 → 0.1.1

Files changed (103)

package/.claude/commands/automate.md +28 -0
package/.claude/commands/debug-ui.md +19 -0
package/.claude/commands/screenshot.md +15 -0
package/.github/FUNDING.yml +1 -0
package/.github/ISSUE_TEMPLATE/bug_report.md +27 -0
package/.github/ISSUE_TEMPLATE/feature_request.md +20 -0
package/.mcp.json +8 -0
package/DESKTOP_MCP_GUIDE.md +92 -0
package/LICENSE +661 -21
package/README.md +97 -292
package/SECURITY.md +44 -0
package/docs/architecture.md +47 -0
package/install-skills.sh +19 -0
package/mcp-bridge.ts +271 -0
package/mcp-desktop.ts +1221 -0
package/native/macos-bridge/Package.swift +21 -0
package/native/macos-bridge/Sources/AccessibilityBridge.swift +261 -0
package/native/macos-bridge/Sources/AppManagement.swift +129 -0
package/native/macos-bridge/Sources/CoreGraphicsBridge.swift +242 -0
package/native/macos-bridge/Sources/ObserverBridge.swift +120 -0
package/native/macos-bridge/Sources/VisionBridge.swift +80 -0
package/native/macos-bridge/Sources/main.swift +345 -0
package/native/windows-bridge/AppManagement.cs +234 -0
package/native/windows-bridge/InputBridge.cs +436 -0
package/native/windows-bridge/Program.cs +265 -0
package/native/windows-bridge/ScreenCapture.cs +329 -0
package/native/windows-bridge/UIAutomationBridge.cs +571 -0
package/native/windows-bridge/WindowsBridge.csproj +17 -0
package/package.json +3 -14
package/playbooks/devpost.json +186 -0
package/playbooks/instagram.json +41 -0
package/playbooks/instagram_v2.json +201 -0
package/playbooks/x_v1.json +211 -0
package/scripts/devpost-live-loop.mjs +421 -0
package/src/config.ts +30 -0
package/src/index.ts +92 -0
package/src/logging/timeline-logger.ts +55 -0
package/src/mcp/server.ts +449 -0
package/src/memory/recall.ts +191 -0
package/src/memory/research.ts +146 -0
package/src/memory/seeds.ts +123 -0
package/src/memory/session.ts +201 -0
package/src/memory/store.ts +434 -0
package/src/memory/types.ts +69 -0
package/src/native/bridge-client.ts +239 -0
package/src/native/macos-bridge-client.ts +22 -0
package/src/runtime/accessibility-adapter.ts +487 -0
package/src/runtime/app-adapter.ts +169 -0
package/src/runtime/applescript-adapter.ts +376 -0
package/src/runtime/ax-role-map.ts +102 -0
package/src/runtime/browser-adapter.ts +129 -0
package/src/runtime/cdp-chrome-adapter.ts +676 -0
package/src/runtime/composite-adapter.ts +274 -0
package/src/runtime/executor.ts +396 -0
package/src/runtime/locator-cache.ts +33 -0
package/src/runtime/planning-loop.ts +81 -0
package/src/runtime/service.ts +448 -0
package/src/runtime/session-manager.ts +50 -0
package/src/runtime/state-observer.ts +136 -0
package/src/runtime/vision-adapter.ts +297 -0
package/src/types.ts +297 -0
package/tests/bridge-client.test.ts +176 -0
package/tests/browser-stealth.test.ts +210 -0
package/tests/composite-adapter.test.ts +64 -0
package/tests/mcp-server.test.ts +151 -0
package/tests/memory-recall.test.ts +339 -0
package/tests/memory-research.test.ts +159 -0
package/tests/memory-seeds.test.ts +120 -0
package/tests/memory-store.test.ts +392 -0
package/tests/types.test.ts +92 -0
package/tsconfig.check.json +17 -0
package/tsconfig.json +19 -0
package/vitest.config.ts +8 -0
package/dist/config.js +0 -9
package/dist/index.js +0 -55
package/dist/logging/timeline-logger.js +0 -29
package/dist/mcp/mcp-stdio-server.js +0 -284
package/dist/mcp/server.js +0 -347
package/dist/mcp-entry.js +0 -62
package/dist/memory/recall.js +0 -160
package/dist/memory/research.js +0 -98
package/dist/memory/seeds.js +0 -89
package/dist/memory/session.js +0 -161
package/dist/memory/store.js +0 -391
package/dist/memory/types.js +0 -4
package/dist/native/bridge-client.js +0 -173
package/dist/native/macos-bridge-client.js +0 -5
package/dist/runtime/accessibility-adapter.js +0 -377
package/dist/runtime/app-adapter.js +0 -48
package/dist/runtime/applescript-adapter.js +0 -283
package/dist/runtime/ax-role-map.js +0 -80
package/dist/runtime/browser-adapter.js +0 -36
package/dist/runtime/cdp-chrome-adapter.js +0 -505
package/dist/runtime/composite-adapter.js +0 -205
package/dist/runtime/executor.js +0 -250
package/dist/runtime/locator-cache.js +0 -12
package/dist/runtime/planning-loop.js +0 -47
package/dist/runtime/service.js +0 -372
package/dist/runtime/session-manager.js +0 -28
package/dist/runtime/state-observer.js +0 -105
package/dist/runtime/vision-adapter.js +0 -208
package/dist/test-mcp-protocol.js +0 -138
package/dist/types.js +0 -1

package/.claude/commands/automate.md ADDED

@@ -0,0 +1,28 @@
+Automate a desktop workflow described by the user.
+The user will describe what they want done: $ARGUMENTS
+Plan and execute the workflow step by step using the desktop automation MCP tools:
+## Planning
+1. Break the task into discrete steps
+2. Identify which apps are involved (`apps`, `windows`)
+3. For each step, pick the FASTEST approach — try them in this order:
+   - **Accessibility (FASTEST — always try first)**: `ui_tree` → `ui_find` → `ui_press` / `ui_set_value`. ~50ms per action, no screenshots.
+   - **Keyboard shortcuts**: `key` for known shortcuts (cmd+s, cmd+c, etc.) — instant
+   - **AppleScript**: `applescript` for scriptable apps (Finder, Mail, Notes) — fast
+   - **Chrome CDP**: `browser_dom` → `browser_click` / `browser_type` — direct DOM, no vision
+   - **Visual (LAST RESORT only)**: `screenshot` → `click_text` — slow, only when Accessibility can't see the element (canvas, games, images)
+IMPORTANT: Do NOT use screenshot/OCR/click_text to interact with standard UI elements. Use ui_tree + ui_press instead — it's 10x faster and more reliable.
+## Execution
+- Execute each step, verifying success before moving to the next
+- After key actions, use `screenshot` or `ui_tree` to confirm the expected state
+- If a step fails, try an alternative approach before giving up
+- Report progress as you go
+## Completion
+- Summarize what was done
+- Note any steps that required fallbacks
+- Flag anything that didn't work as expected
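
The approach order in the Planning list of automate.md amounts to a first-match selection. The sketch below is one way to picture it, not code from the package: the `Approach` and `Target` types, their field names, and `pickApproach` are all hypothetical, and a real client would discover these capabilities by calling the MCP tools rather than reading flags.

```typescript
// Hypothetical names throughout: Approach, Target, and pickApproach are
// illustrative and are not part of the screenhand package.
type Approach = "accessibility" | "keyboard" | "applescript" | "cdp" | "visual";

interface Target {
  inAccessibilityTree: boolean; // element shows up via ui_tree / ui_find
  hasKnownShortcut: boolean;    // e.g. cmd+s, cmd+c
  appIsScriptable: boolean;     // e.g. Finder, Mail, Notes
  inChromeDom: boolean;         // reachable via browser_dom
}

// Order copied from the Planning list; "visual" is the last resort.
const ORDER: Array<[Approach, (t: Target) => boolean]> = [
  ["accessibility", (t) => t.inAccessibilityTree],
  ["keyboard", (t) => t.hasKnownShortcut],
  ["applescript", (t) => t.appIsScriptable],
  ["cdp", (t) => t.inChromeDom],
];

function pickApproach(target: Target): Approach {
  for (const [approach, applies] of ORDER) {
    if (applies(target)) return approach;
  }
  return "visual"; // screenshot + click_text
}

// A standard button that is both in the Accessibility tree and in the DOM:
const saveButton: Target = {
  inAccessibilityTree: true,
  hasKnownShortcut: true,
  appIsScriptable: false,
  inChromeDom: true,
};
console.log(pickApproach(saveButton)); // accessibility
```

Keeping the order in a single table mirrors the instruction text: a step that fails can retry with the next entry, which matches the "try an alternative approach before giving up" rule under Execution.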

package/.claude/commands/debug-ui.md ADDED

@@ -0,0 +1,19 @@
+Inspect and debug the UI structure of an app.
+1. Use `apps` to list running applications
+2. If the user specified an app name ($ARGUMENTS), find its PID. Otherwise use the frontmost app.
+3. Use `focus` to bring the app to the front
+4. Use `ui_tree` with the app's PID to get the full Accessibility tree
+5. Use `windows` to get the window bounds
+Then analyze and report:
+- App name and bundle ID
+- Window hierarchy and layout
+- Interactive elements (buttons, text fields, menus) with their states (enabled/disabled, value)
+- Navigation structure
+- Any elements that look broken or inaccessible
+- Suggested selectors for automating key actions (titles to use with `ui_press`, `ui_find`)
+Format as a structured report with sections.
+$ARGUMENTS

package/.claude/commands/screenshot.md ADDED

@@ -0,0 +1,15 @@
+Take a screenshot of the current screen and describe what you see.
+1. Use the `screenshot` MCP tool to capture the screen and OCR it
+2. Use `apps` to identify which apps are running
+3. Use `windows` to see window positions
+Then provide a clear summary:
+- What apps are visible
+- What the user appears to be doing
+- Key UI elements and text on screen
+- Any notable state (dialogs open, errors visible, etc.)
+If the user provides an app name as argument, focus on that app: use `focus` to bring it forward first, then screenshot.
+$ARGUMENTS

package/.github/FUNDING.yml ADDED

@@ -0,0 +1 @@
+github: manushi4

package/.github/ISSUE_TEMPLATE/bug_report.md ADDED

@@ -0,0 +1,27 @@
+---
+name: Bug Report
+about: Report a bug in ScreenHand
+title: "[Bug] "
+labels: bug
+---
+**Platform**
+- [ ] macOS
+- [ ] Windows
+**Describe the bug**
+A clear description of what went wrong.
+**To reproduce**
+1. Tool called: `...`
+2. Parameters: `...`
+3. Error/unexpected output: `...`
+**Expected behavior**
+What you expected to happen.
+**Environment**
+- OS version:
+- Node.js version:
+- ScreenHand version:
+- AI client (Claude Desktop / Claude Code / Cursor / other):

package/.github/ISSUE_TEMPLATE/feature_request.md ADDED

@@ -0,0 +1,20 @@
+---
+name: Feature Request
+about: Suggest a new tool or improvement for ScreenHand
+title: "[Feature] "
+labels: enhancement
+---
+**What problem does this solve?**
+Describe the use case.
+**Proposed solution**
+How should it work? What MCP tool name/parameters would you expect?
+**Alternatives considered**
+Any workarounds you've tried.
+**Platform**
+- [ ] macOS
+- [ ] Windows
+- [ ] Both

package/.mcp.json ADDED

@@ -0,0 +1,8 @@
+{
+  "mcpServers": {
+    "desktop": {
+      "command": "npx",
+      "args": ["tsx", "mcp-desktop.ts"]
+    }
+  }
+}
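
To make the added `.mcp.json` easier to read, here is a minimal sketch of how a client could turn each `mcpServers` entry into the command line it would spawn. The `McpServerEntry` and `McpConfig` interfaces and the `commandLines` helper are assumptions for illustration; they model only the two fields that appear in this file and are not taken from the package or from any particular client.

```typescript
// Hypothetical types covering only the fields present in the added .mcp.json.
interface McpServerEntry {
  command: string;
  args?: string[];
}

interface McpConfig {
  mcpServers: Record<string, McpServerEntry>;
}

// The file content as added in this version.
const raw = `{
  "mcpServers": {
    "desktop": {
      "command": "npx",
      "args": ["tsx", "mcp-desktop.ts"]
    }
  }
}`;

// Hypothetical helper: join command and args into one display string per server.
function commandLines(config: McpConfig): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [name, entry] of Object.entries(config.mcpServers)) {
    out[name] = [entry.command, ...(entry.args ?? [])].join(" ");
  }
  return out;
}

const lines = commandLines(JSON.parse(raw) as McpConfig);
console.log(lines.desktop); // npx tsx mcp-desktop.ts
```

Read this way, the entry registers a server named `desktop` that runs `mcp-desktop.ts` through `tsx`, so the path is resolved relative to whatever directory the client starts the process in.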

package/DESKTOP_MCP_GUIDE.md ADDED

@@ -0,0 +1,92 @@
+# ScreenHand MCP — Usage Guide
+You have access to the ScreenHand MCP server that can control any macOS/Windows application and Chrome browser. Use it for app debugging, design inspection, UI testing, and automation.
+## Quick Reference
+### How to discover what's on screen
+1. `apps` → get running apps with PIDs
+2. `windows` → get window IDs and positions
+3. `ui_tree(pid, maxDepth)` → get full UI structure instantly (50ms, no OCR)
+4. `screenshot(windowId)` → capture + OCR a window (600ms, use when you need visual text)
+### How to interact with native macOS apps (Finder, Notes, Xcode, etc.)
+- **Read UI structure**: `ui_tree(pid=1234, maxDepth=4)` — returns every button, menu, text field with positions
+- **Find element**: `ui_find(pid=1234, title="Save")` — find by text
+- **Click element**: `ui_press(pid=1234, title="Save")` — click by accessibility
+- **Click menu**: `menu_click(pid=1234, menuPath="File/Save As...")` — click any menu item
+- **Set text**: `ui_set_value(pid=1234, title="Search", value="hello")`
+- **Key combo**: `key(combo="cmd+s")` or `key(combo="cmd+shift+n")`
+### How to interact with Chrome/web pages (FAST — use this, not OCR)
+- **List tabs**: `browser_tabs()`
+- **Open URL**: `browser_open(url="https://example.com")`
+- **Run JS**: `browser_js(code="document.title")` — execute any JavaScript, returns result
+- **Query DOM**: `browser_dom(selector="button.primary")` — find elements with text, positions, attributes
+- **Click element**: `browser_click(selector="#submit-btn")`
+- **Type in input**: `browser_type(selector="input[name=search]", text="hello")`
+- **Wait for load**: `browser_wait(condition="document.querySelector('.results')")`
+- **Page content**: `browser_page_info()` — title, URL, text content
+### How to use AppleScript
+- `applescript(script='tell application "Finder" to get name of every file of desktop')`
+- `applescript(script='tell application "Safari" to set URL of current tab of front window to "https://google.com"')`
+## Rules & Best Practices
+### Speed hierarchy — always prefer the fastest method:
+1. **Accessibility (ui_tree, ui_press, menu_click)** — 50ms, structured, reliable. Use for ALL native app interactions.
+2. **CDP (browser_js, browser_dom, browser_click)** — 10ms, structured. Use for ALL web/browser interactions.
+3. **AppleScript** — 50ms, for app-specific scripting (Finder files, Safari URLs, Mail compose).
+4. **OCR (screenshot, click_text)** — 600ms, last resort. Only use when AX and CDP aren't available.
+### For app debugging:
+- Start with `apps` to find the PID
+- Use `ui_tree(pid, maxDepth=4)` to see the full UI hierarchy — every button, text field, label, with positions
+- Use `ui_tree(pid, maxDepth=6)` for deep inspection of complex views
+- Use `screenshot(windowId)` only when you need to see actual rendered text/images
+### For design inspection:
+- `screenshot_file(windowId)` returns the image path — you can read it to see the actual design
+- `ui_tree` shows the component structure (like React DevTools but for any app)
+- `browser_dom(selector="*")` with limit shows the DOM tree of any web page
+- `browser_js` can extract computed styles: `getComputedStyle(el).color`
+### For web app debugging:
+- Use `browser_js` to run any debugging code — console.log, inspect state, check network
+- Use `browser_dom` to find elements and their properties
+- Use `browser_wait` before interacting with dynamic content
+- Chain: `browser_navigate` → `browser_wait` → `browser_dom` → `browser_click`
+### Common patterns:
+**Debug a native app's UI:**
+```
+apps → find pid
+ui_tree(pid, 4) → see structure
+ui_find(pid, "button text") → locate element
+ui_press(pid, "button text") → interact
+```
+**Debug a web page:**
+```
+browser_tabs → find the tab
+browser_dom("main", tabId) → see page structure
+browser_js("document.querySelector('.error')?.textContent", tabId) → inspect
+```
+**Automate a flow:**
+```
+launch(bundleId) → open app
+ui_tree(pid, 3) → understand layout
+menu_click(pid, "File/New") → trigger action
+ui_set_value(pid, "Name", "test") → fill form
+ui_press(pid, "Save") → submit
+```
+### Important notes:
+- Chrome must be running with `--remote-debugging-port=9222 --user-data-dir=/tmp/chrome-debug` for browser_* tools
+- PIDs change when apps restart — always call `apps` first to get current PIDs
+- Window IDs change when windows are recreated — call `windows` to get current IDs
+- `ui_tree` requires Accessibility permissions (System Settings → Privacy → Accessibility)
+- For clicking by coordinates, use `click(x, y)` — coordinates are screen-absolute
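
The speed hierarchy in DESKTOP_MCP_GUIDE.md quotes approximate per-action latencies, and a quick calculation shows why the guide treats OCR as a last resort. The sketch below only adds up the figures quoted in the guide; `LATENCY_MS`, `Method`, and `estimateFlowMs` are hypothetical names, and real timings will also depend on how quickly the target app responds.

```typescript
// Approximate per-action latencies as quoted in DESKTOP_MCP_GUIDE.md (ms).
const LATENCY_MS = {
  accessibility: 50,
  cdp: 10,
  applescript: 50,
  ocr: 600,
} as const;

type Method = keyof typeof LATENCY_MS;

// Hypothetical helper: rough total for a flow, ignoring app response time.
function estimateFlowMs(steps: Method[]): number {
  return steps.reduce((total, method) => total + LATENCY_MS[method], 0);
}

// The five-step "Automate a flow" pattern, done two ways.
const viaAccessibility = estimateFlowMs(new Array<Method>(5).fill("accessibility"));
const viaOcr = estimateFlowMs(new Array<Method>(5).fill("ocr"));

console.log(viaAccessibility, viaOcr); // 250 3000
```

On these figures the Accessibility route is about 12 times faster than OCR for the same five steps, consistent with the "10x faster" note in automate.md.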