screenhand 0.3.4 → 0.3.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (2)
  1. package/dist/mcp-desktop.js +67 -75
  2. package/package.json +1 -1
package/dist/mcp-desktop.js CHANGED
@@ -267,100 +267,92 @@ async function ensureCDP(overridePort) {
  throw new Error("Chrome not running with --remote-debugging-port. Launch with: /Applications/Google\\ Chrome.app/Contents/MacOS/Google\\ Chrome --remote-debugging-port=9222 --user-data-dir=/tmp/chrome-debug");
  }
  const server = new McpServer({ name: "screenhand", version: "3.0.0" }, {
- instructions: `ScreenHand gives you native desktop control on macOS/Windows. 111 tools. Never click blind — always follow: KNOW → SEE → NAVIGATE → ACT → VERIFY → STOP.
+ instructions: `ScreenHand gives you native desktop control on macOS/Windows. 111 tools.
 
- ## The Golden Sequence (follow this order)
+ ## Quick Actions (just do it)
+ For simple tasks, go direct — no setup needed:
 
- ### 1. KNOW (before touching anything)
- platform_guide("figma") → get selectors, flows, known errors for this app/site
- memory_recall("figma export") → check if you've done this before — reuse past strategies
- scan_menu_bar() → discover all menu items in the current app
+ focus("com.apple.Notes") → ui_press("New Note") → type_text("hello") → key("cmd+s")
+ browser_navigate("https://...") → browser_click("#btn") → browser_js("return ...")
+
+ ## Tool Speed (fastest first)
+ 1. **ui_press / key / type_text** — native AX, ~50ms
+ 2. **browser_* tools** — CDP, ~10ms (background, no focus needed)
+ 3. ***_with_fallback** — auto-tries AX → CDP → OCR (~100-500ms)
+ 4. **screenshot + ocr** — visual, ~600ms (canvas apps only)
+ 5. **applescript** — macOS scripting (Finder, Mail, Safari)
 
- If platform_guide() has no data: platform_explore("bundleId") to auto-discover the app, or platform_learn("domain") for websites.
+ ## The Golden Sequence (for multi-step workflows)
+ For complex tasks with 3+ steps, follow this order:
+
+ ### 1. KNOW (before touching anything)
+ platform_guide("figma") → get selectors, flows, known errors
+ memory_recall("figma export") → reuse past strategies
+ If unknown app: platform_explore("bundleId") or platform_learn("domain")
 
  ### 2. SEE (understand current state)
  apps() → what's running?
- perception_start() → turn on continuous monitoring (3-rate: 100ms/300ms/1000ms)
- world_state() → current app, windows, controls, dialogs
- screenshot() → visual confirmation if needed
+ perception_start() → continuous monitoring (for multi-step only)
+ world_state() → current app, windows, controls
 
- perception_start() keeps world_state() continuously updated. Use it for complex multi-step workflows.
-
- ### 3. NAVIGATE (get to the right place)
+ ### 3. NAVIGATE
  focus("com.figma.Desktop") → bring app to front
- ui_tree() → see all clickable elements with roles and labels
- ui_find("Export") → check if a specific target exists before clicking
+ ui_tree() → see all clickable elements
+ ui_find("Export") → check if target exists
 
- ### 4. ACT (do the thing)
- click_with_fallback("Export") → click element (auto-tries AX → CDP → OCR → coordinates)
- type_with_fallback("filename") → type text with auto-fallback
+ ### 4. ACT
+ click_with_fallback("Export") → click (auto-tries multiple methods)
+ type_with_fallback("filename") → type with fallback
  key("cmd+shift+e") → keyboard shortcuts
- drag(fromX, fromY, toX, toY) → drag and drop
- scroll(direction) → scroll up/down/left/right
-
- Always prefer *_with_fallback tools over bare click/type — they auto-recover when one method fails.
-
- ### 5. VERIFY (confirm it worked)
- world_state() → did UI change as expected?
- world_state_diff() → what exactly changed since last check?
- screenshot() → visual proof
-
- ### 6. STOP (clean up)
- perception_stop() → stop monitoring (save resources)
- memory_save("figma_export", ...) → save successful strategy for next time
 
- ## For Web/Browser (Chrome, Electron apps)
- browser_navigate("https://...") → go to URL
- browser_stealth() → activate FIRST if site has bot detection
- browser_dom() → read page structure (CSS selectors)
- browser_click("#submit") → click element by CSS selector
- browser_type("input", "text") → type into form field
- browser_fill_form({...}) → fill multiple fields at once (human-like timing)
- browser_js("return ...") → run JavaScript for complex extraction/actions
- browser_wait("selector") → wait for element to appear
- browser_human_click(x, y) → human-like click with randomized timing
+ ### 5. VERIFY
+ world_state() → did UI change?
+ world_state_diff() → what changed?
 
- All browser tools work in the background (~10ms) — no need to focus Chrome.
+ ### 6. STOP
+ perception_stop() → stop monitoring
+ memory_save("task", ...) → save strategy for next time
 
- ## For Complex Multi-Step Tasks (let ScreenHand plan it)
- plan_goal("Export video as H.264") → describe WHAT you want → ScreenHand generates steps from playbooks, strategies, and references
- plan_execute(goalId) → auto-run deterministic steps, pauses at LLM steps for your judgment
- plan_step_resolve(goalId, tool, params) → you provide the tool+params for LLM steps
- plan_status(goalId) → check progress
- plan_cancel(goalId) → abort if needed
+ ## Strategy Selection (optional for when you want to be smart about it)
+ Use these tools to pick the best approach. Skip for quick one-off actions.
 
- On success, the strategy is auto-saved to memory for future reuse.
+ **coverage_report(bundleId)** → what does ScreenHand know about this app?
+ - Empty (0 selectors/flows) → learn first: scan_menu_bar() + platform_explore()
+ - Has data + high stability → go fast: direct tools (ui_press, key)
+ - Has error patterns → be careful: use *_with_fallback tools
 
- ## For Repeatable Workflows (automate once, run forever)
- playbook_record() → start recording your actions
- ... do the work ...
- export_playbook() → save as reusable playbook
- job_create("daily post", steps) → make it a persistent job
- worker_start() → background daemon runs jobs autonomously
+ **learning_status(bundleId)** → how experienced is ScreenHand with this app?
+ - 100+ samples → app is well-known, direct tools are safe
+ - 0 samples → unknown app, use *_with_fallback
+ - AX score high → use ui_tree + ui_press
+ - CDP score high → it's a web app, use browser_* tools
+ - Vision score high → canvas app, use screenshot + ocr
 
- Jobs survive MCP client restarts. worker_start() runs independently.
+ ## Browser Automation
+ browser_navigate/browser_click/browser_type/browser_js — all work in background (~10ms)
+ browser_stealth() — activate before sites with bot detection
+ browser_fill_form({...}) — human-like multi-field form filling
+ browser_human_click(x, y) — randomized timing to avoid detection
 
- ## For Multi-Agent Coordination
- session_claim() → claim exclusive access to an app window (lease-based)
- session_heartbeat() → keep your lease alive (call periodically)
- session_release() → release when done
- supervisor_start() → daemon that detects stalled agents and auto-recovers
+ ## Planning (let ScreenHand figure out the steps)
+ plan_goal("Export video as H.264") → generates step-by-step plan from playbooks/strategies/references
+ plan_execute(goalId) → auto-runs known steps, pauses at LLM steps for your judgment
+ plan_step_resolve(goalId, tool, params) → you resolve paused steps
+ plan_status(goalId) / plan_list() / plan_cancel(goalId)
 
- ## Self-Healing (automatic — no action needed)
- When any tool fails, ScreenHand automatically tries alternative strategies (AX → CDP → OCR → coordinates). Learning is also automatic — every tool call teaches which selectors work, optimal timing, and recovery rankings per app. Check with:
- - learning_status() → see learned preferences per app
- - recovery_status() → see active cooldowns and cached strategies
- - recovery_configure() → tune recovery budget (max time, max retries)
+ ## Repeatable Workflows
+ playbook_record() → do work → export_playbook() → job_create("name", steps) → worker_start()
+ Jobs survive restarts. Worker daemon runs independently.
 
- ## Tool Speed Priority
- 1. **ui_tree + ui_press** native Accessibility API, ~50ms (fastest, most reliable)
- 2. **browser_* tools** Chrome DevTools Protocol, ~10ms (background, no focus needed)
- 3. ***_with_fallback** — auto-tries multiple methods (~100-500ms)
- 4. **screenshot + ocr** — visual capture, ~600ms (only for canvas apps)
- 5. **applescript** — macOS scripting (Finder, Mail, Safari, etc.)
+ ## Multi-Agent
+ session_claim() → work → session_heartbeat() → session_release()
+ supervisor_start() → auto-detects stalled agents and recovers
 
- ## Key Rule
- Never click blind. Always: KNOW → SEE → NAVIGATE → ACT → VERIFY.
+ ## Self-Healing (automatic)
+ Tool failures auto-retry with alternative strategies. Learning is automatic every call improves selectors, timing, and recovery per app.
+ - learning_status() — inspect learned knowledge
+ - recovery_status() — check recovery state
+ - recovery_configure() — tune recovery budget
  `,
  });
  // ═══════════════════════════════════════════════
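The 0.3.6 instructions describe the `*_with_fallback` tools as trying AX, then CDP, then OCR, in speed order. ScreenHand's own implementation is not part of this diff, so the following is only a minimal sketch of that ordered-fallback pattern. `clickWithFallback` and the three stub strategies are hypothetical names chosen for illustration:

```javascript
// Hypothetical sketch of an ordered fallback chain; not ScreenHand's actual code.
// Each strategy is an async function that resolves on success or throws on failure.
async function clickWithFallback(target, strategies) {
  const errors = [];
  for (const { name, run } of strategies) {
    try {
      const result = await run(target);
      return { method: name, result, errors };
    } catch (err) {
      // Remember why this method failed, then move on to the next (slower) one.
      errors.push(`${name}: ${err.message}`);
    }
  }
  throw new Error(`All methods failed for "${target}": ${errors.join("; ")}`);
}

// Stub strategies, ordered fastest first like the Tool Speed list (assumed names).
const strategies = [
  { name: "AX", run: async () => { throw new Error("element not in accessibility tree"); } },
  { name: "CDP", run: async () => { throw new Error("target is not a browser page"); } },
  { name: "OCR", run: async (target) => `clicked text "${target}"` },
];

clickWithFallback("Export", strategies).then((r) => console.log(r.method, r.errors.length));
// prints: OCR 2
```

Trying the cheapest method first keeps the common case fast, and collecting each failure reason means a total failure still explains which methods were attempted.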
@@ -6248,7 +6240,7 @@ server.tool("ingest_tutorial", "Extract structured playbook steps from a video t
  }],
  };
  });
- server.tool("coverage_report", "Generate a coverage report for an app — shows what knowledge we have (shortcuts, selectors, flows, playbooks, errors) and identifies gaps with recommendations.", {
+ server.tool("coverage_report", "Check what ScreenHand knows about an app: shortcuts, selectors, flows, playbooks, error patterns, and stability %. Useful before complex workflows to decide strategy: learn first (if empty), go fast (if high coverage), or use fallback tools (if error patterns exist). Optional for quick actions.", {
  bundleId: z.string().describe("macOS bundle ID (e.g. com.blackmagic-design.DaVinciResolveLite)"),
  appName: z.string().describe("Human-readable app name"),
  includeLiveMenuScan: z.boolean().optional().describe("Also scan the live menu bar for comparison (requires app to be running, needs pid)"),
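The new `coverage_report` description and the "Strategy Selection" notes in the instructions string amount to a small decision table. Below is a minimal sketch of that table as a pure function. The report fields (`selectors`, `flows`, `errorPatterns`, `stability`, `samples`, `scores`), the 0.8 stability cutoff, and the rule order are all assumptions, since this diff does not show the tool's actual output format:

```javascript
// Hypothetical sketch of the "Strategy Selection" rules; the report shape, the
// 0.8 stability cutoff, and the rule order are assumptions, not ScreenHand's API.
const TOOL_FAMILIES = {
  ax: ["ui_tree", "ui_press"],
  cdp: ["browser_navigate", "browser_click", "browser_js"],
  vision: ["screenshot", "ocr"],
};

function chooseStrategy({ selectors = 0, flows = 0, errorPatterns = 0, stability = 0, samples = 0, scores = {} }) {
  // Empty coverage: learn the app before acting on it.
  if (selectors === 0 && flows === 0) {
    return { mode: "learn", tools: ["scan_menu_bar", "platform_explore"] };
  }
  // Known error patterns or no learning samples: use self-recovering tools.
  if (errorPatterns > 0 || samples === 0) {
    return { mode: "careful", tools: ["click_with_fallback", "type_with_fallback"] };
  }
  // Pick the tool family with the highest learned score (AX, CDP, or vision).
  const family = Object.keys(TOOL_FAMILIES).reduce((best, f) =>
    (scores[f] ?? 0) > (scores[best] ?? 0) ? f : best);
  // Well-known (100+ samples) and stable: go direct; otherwise keep a fallback first.
  const fast = samples >= 100 && stability >= 0.8;
  const tools = fast ? TOOL_FAMILIES[family] : ["click_with_fallback", ...TOOL_FAMILIES[family]];
  return { mode: fast ? "fast" : "careful", tools };
}

const example = chooseStrategy({ selectors: 40, flows: 5, stability: 0.95, samples: 250, scores: { ax: 0.9, cdp: 0.1 } });
console.log(example.mode, example.tools.join(" + ")); // prints: fast ui_tree + ui_press
```

This sketch checks error patterns before stability. That is only one reading of the notes, which list the rules without saying which takes precedence.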
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "screenhand",
- "version": "0.3.4",
+ "version": "0.3.6",
  "mcpName": "io.github.manushi4/screenhand",
  "description": "Give AI eyes and hands on your desktop. ScreenHand is an open-source MCP server that lets Claude and other AI agents see your screen, click buttons, type text, and control any app on macOS and Windows.",
  "homepage": "https://screenhand.com",