screenhand 0.3.0 → 0.3.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (2)
  1. package/dist/mcp-desktop.js +90 -10
  2. package/package.json +1 -1
package/dist/mcp-desktop.js CHANGED
@@ -262,7 +262,87 @@ async function ensureCDP(overridePort) {
   }
   throw new Error("Chrome not running with --remote-debugging-port. Launch with: /Applications/Google\\ Chrome.app/Contents/MacOS/Google\\ Chrome --remote-debugging-port=9222 --user-data-dir=/tmp/chrome-debug");
   }
- const server = new McpServer({ name: "screenhand", version: "3.0.0" });
+ const server = new McpServer({ name: "screenhand", version: "3.0.0" }, {
+ instructions: `ScreenHand gives you native desktop control on macOS/Windows. 111 tools across 7 capability tiers.
+
+ ## Quick Patterns
+
+ **Click something**: ui_find("Search") → ui_press("Search") (~50ms, no screenshot needed)
+ **Type text**: click target first, then type_text("hello") or key("cmd+a") for shortcuts
+ **Read screen**: ui_tree() for structured elements, screenshot() + ocr() for visual content
+ **Browser**: browser_navigate/browser_js/browser_click — works in background via CDP (~10ms)
+ **Cross-app**: focus("com.apple.Notes") → type_text() → key("cmd+s") — chain apps freely
+
+ ## When to Use Advanced Features
+
+ ### World State & Perception (know what's on screen)
+ - **perception_start()** — turn on continuous screen monitoring (3-rate: 100ms/300ms/1000ms). Use BEFORE complex multi-step workflows so you always know what's on screen.
+ - **world_state()** — check current app, windows, controls, dialogs. Use to verify state before acting. Use verbose=true to see all controls.
+ - **world_state_diff()** — find stale/outdated UI elements. Use after long pauses to check what changed.
+ - **perception_stop()** — turn off when done to save resources.
+ - Pattern: perception_start() → do work → world_state() to verify → perception_stop()
+
+ ### Learning & Memory (get smarter over time)
+ - **Learning is automatic** — every tool call teaches ScreenHand which selectors work, which fail, optimal timing per app. No action needed.
+ - **memory_save(key, value)** — save a strategy or finding for future sessions (persists to disk).
+ - **memory_recall(query)** — retrieve saved strategies, past errors, what worked before. ALWAYS recall before attempting unfamiliar platforms.
+ - **learning_status()** — see what ScreenHand has learned: locator preferences, recovery rankings, timing budgets per app.
+ - **learning_reset()** — nuclear option, clears all learning. Rarely needed.
+ - Pattern: memory_recall("instagram post") → use recalled strategy → if new approach works, memory_save() it
+
+ ### Self-Healing & Recovery (handle errors automatically)
+ - **Recovery is automatic** — when a tool fails, ScreenHand tries alternative strategies (AX → CDP → OCR → coordinates) without you doing anything.
+ - **recovery_status()** — see cooldowns, active strategies, which fixes are cached.
+ - **recovery_configure()** — adjust recovery budget (max time, max strategies to try).
+ - ***_with_fallback tools** (click_with_fallback, type_with_fallback, etc.) — use these instead of bare click/type when reliability matters. They auto-try multiple methods.
+ - Pattern: Use *_with_fallback tools for critical actions. If something still fails, check recovery_status() to understand why.
+
+ ### Platform Knowledge (know HOW to automate an app)
+ - **platform_guide("figma")** — get selectors, flows, known errors for a platform. Call FIRST when automating any app/site.
+ - **platform_explore("bundleId")** — auto-discover an unknown app's UI structure.
+ - **platform_learn("domain")** — learn a website's structure by crawling.
+ - **scan_menu_bar()** — discover all menu items in the current app.
+ - Pattern: platform_guide() first → if not found, platform_explore() → then automate
+
+ ### Jobs & Multi-Step Workflows (survive restarts)
+ - **job_create(name, steps[])** — define a multi-step workflow that persists to disk.
+ - **job_run(jobId)** — execute a job. Survives MCP client restarts.
+ - **worker_start()** — start background daemon that processes jobs autonomously.
+ - **playbook_record()** / **export_playbook()** — record your actions into reusable playbooks.
+ - Pattern: For repeatable workflows, record as playbook → export → job_create from playbook → worker_start
+
+ ### Multi-Agent Coordination (multiple AI agents sharing one machine)
+ - **session_claim()** — claim exclusive access to an app window (lease-based).
+ - **session_heartbeat()** — keep your lease alive.
+ - **session_release()** — release when done.
+ - **supervisor_start()** — background daemon that detects stalled agents and recovers.
+ - Pattern: session_claim() → do work → session_heartbeat() periodically → session_release()
+
+ ### Planning (let ScreenHand figure out the steps)
+ - **plan_goal("Export video as H.264")** — describe WHAT you want, ScreenHand generates a step-by-step plan. It searches playbooks, saved strategies, and reference knowledge to build the plan. Does NOT execute — returns the plan for review.
+ - **plan_execute(goalId)** — run the plan automatically. Deterministic steps (known selectors/flows) run internally. Pauses at LLM steps where your judgment is needed — resolve them with plan_step_resolve().
+ - **plan_step(goalId)** — execute one step at a time (for more control than plan_execute).
+ - **plan_step_resolve(goalId, tool, params)** — when a plan pauses at an LLM step, YOU decide which tool and params to use. The server executes it, verifies postconditions, and advances.
+ - **plan_status(goalId)** — check progress: which step you're on, what's done, what's left.
+ - **plan_list()** — see all goals (active, completed, failed).
+ - **plan_cancel(goalId)** — abort a goal.
+ - Pattern: plan_goal("do X") → review steps → plan_execute() → resolve LLM steps as they pause → on success, strategy auto-saved to memory
+
+ ## Tool Selection Priority
+ 1. **ui_tree + ui_press** for native app elements (fastest, most reliable)
+ 2. **browser_* tools** for web content in Chrome/Electron
+ 3. ***_with_fallback** when you're unsure which method will work
+ 4. **screenshot + ocr** only for canvas apps or visual verification
+ 5. **applescript** for macOS-specific automation (Finder, Mail, etc.)
+
+ ## Tips
+ - Always call platform_guide() before automating a new app/site
+ - Use memory_recall() before attempting something you might have done before
+ - Start perception_start() for complex workflows, stop when done
+ - Prefer *_with_fallback tools over bare tools for reliability
+ - browser_stealth() before visiting sites with bot detection
+ `,
+ });
   // ═══════════════════════════════════════════════
   // LEARNING MEMORY — cached, auto-recall, non-blocking
   // ═══════════════════════════════════════════════
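The only code change in the hunk above is passing a second options object (carrying `instructions`) to the `McpServer` constructor. A minimal sketch of that shape, with a stub class standing in for the real `@modelcontextprotocol/sdk` import (the stub and its fields are assumptions for illustration, not the package's actual implementation):

```javascript
// Stub standing in for the SDK's McpServer — illustration only.
class McpServer {
  constructor(serverInfo, options = {}) {
    this.serverInfo = serverInfo;              // { name, version }
    this.instructions = options.instructions;  // usage guidance surfaced to MCP clients
  }
}

// Before (0.3.0): metadata only.
const oldServer = new McpServer({ name: "screenhand", version: "3.0.0" });

// After (0.3.2): same metadata, plus an instructions string for the model.
const newServer = new McpServer(
  { name: "screenhand", version: "3.0.0" },
  { instructions: "ScreenHand gives you native desktop control on macOS/Windows." },
);

console.log(newServer.instructions.startsWith("ScreenHand")); // true
```

The instructions string is static metadata: it travels with the server handshake, so clients see the tool-selection guidance before making any tool calls.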
@@ -3100,7 +3180,7 @@ originalTool("memory_snapshot", "Get current memory state snapshot — session i
   const snap = memory.getSnapshot();
   return { content: [{ type: "text", text: JSON.stringify(snap, null, 2) }] };
   });
- originalTool("memory_recall", "Have I done something like this before? Searches past successful strategies by keyword similarity.", {
+ originalTool("memory_recall", "Search past successful strategies by keyword. ALWAYS call this before automating an unfamiliar platform — it may have a saved strategy from a previous session. Returns matching strategies with step-by-step actions that worked before.", {
   task: z.string().describe("Describe the task you want to accomplish"),
   limit: z.number().optional().describe("Max results (default 5)"),
   }, async ({ task, limit }) => {
@@ -3114,7 +3194,7 @@ originalTool("memory_recall", "Have I done something like this before? Searches
   }).join("\n\n");
   return { content: [{ type: "text", text }] };
   });
- originalTool("memory_save", "This approach worked remember it. Saves the current session's action sequence as a reusable strategy.", {
+ originalTool("memory_save", "Save a successful approach for future sessions. Call this after completing a task so next time you (or another agent) can memory_recall() it instead of figuring it out again. Persists to disk — survives restarts.", {
   task: z.string().describe("Short description of the task that was accomplished"),
   tags: z.array(z.string()).optional().describe("Optional tags for easier recall"),
   }, async ({ task, tags }) => {
@@ -4819,7 +4899,7 @@ originalTool("worker_status", "Get the current status of the worker daemon (read
   // ═══════════════════════════════════════════════
   // PLANNER — goal-oriented planning
   // ═══════════════════════════════════════════════
- originalTool("plan_goal", "Create a goal and generate an execution plan. Returns the plan source (playbook/strategy/llm), steps, and confidence. Does NOT execute — use the returned plan for review or pass to job system.", {
+ originalTool("plan_goal", "Describe WHAT you want to achieve: ScreenHand builds a step-by-step plan by searching playbooks, saved strategies, and platform references. Returns steps with confidence scores. Does NOT execute — review the plan, then use plan_execute() or plan_step() to run it. Use for complex multi-step workflows instead of figuring out each step yourself.", {
   goal: z.string().describe("What you want to achieve (e.g. 'Export Premiere Pro timeline as H.264')"),
   }, async ({ goal: goalDescription }) => {
   const goal = planner.createGoal(goalDescription);
@@ -4854,7 +4934,7 @@ originalTool("plan_goal", "Create a goal and generate an execution plan. Returns
   _meta: { goalId: goal.id, plan },
   };
   });
- originalTool("plan_execute", "Execute a goal's plan automatically. Runs deterministic steps internally. Pauses at LLM steps and returns the step description for you to resolve with plan_step_resolve. On completion, saves the strategy to memory for future reuse.", {
+ originalTool("plan_execute", "Run a plan automatically. Known steps (from playbooks/references) execute internally at full speed. Pauses at LLM steps where YOUR judgment is needed; call plan_step_resolve() to provide the tool+params. On completion, the successful strategy is auto-saved to memory for future reuse.", {
   goalId: z.string().describe("Goal ID from plan_goal"),
   }, async ({ goalId }) => {
   const goal = goalStore.get(goalId);
@@ -5048,7 +5128,7 @@ originalTool("perception_status", "Get continuous perception status: multi-rate
   }
   return { content: [{ type: "text", text: lines.join("\n") }] };
   });
- originalTool("world_state", "Get the current world model state: focused app, window/control counts, active dialogs, and last scan age. Use verbose=true to dump all controls.", {
+ originalTool("world_state", "Get what's currently on screen: focused app, windows, controls, dialogs, scroll position. Call this to verify UI state before acting. Use verbose=true to see all controls with roles/labels/positions. Works best after perception_start() which keeps it continuously updated.", {
   verbose: z.boolean().optional().default(false).describe("Dump all controls with roles, labels, positions, and confidence"),
   }, async ({ verbose }) => {
   const state = worldModel.getState();
@@ -5164,7 +5244,7 @@ originalTool("world_state_diff", "Get stale UI controls that haven't been refres
   lines.push(` ... and ${stale.length - 20} more`);
   return { content: [{ type: "text", text: lines.join("\n") }] };
   });
- originalTool("learning_status", "Get learning engine stats: locator preferences, recovery strategy rankings, adaptive budgets, and sensor preferences for a given app.", {
+ originalTool("learning_status", "See what ScreenHand has learned about an app: which selectors work best, which recovery strategies succeed, optimal timing budgets, and sensor preferences. Learning happens automatically — every tool call teaches the system. Use this to inspect learned knowledge or debug why something isn't working.", {
   bundleId: z.string().optional().describe("App bundle ID to query (default: currently focused app)"),
   }, async ({ bundleId }) => {
   const bid = bundleId ?? worldModel.getState().focusedApp?.bundleId ?? "unknown";
@@ -5198,7 +5278,7 @@ originalTool("learning_status", "Get learning engine stats: locator preferences,
   return { content: [{ type: "text", text: lines.join("\n") }] };
   });
   // ── Perception lifecycle ──
- originalTool("perception_start", "Start continuous perception for the currently focused app (or specify bundleId). Begins multi-rate AX/CDP/vision polling loop: FAST (100ms AX events), MEDIUM (300ms AX/CDP poll), SLOW (1000ms vision/OCR).", {
+ originalTool("perception_start", "Start continuous screen monitoring; ScreenHand will constantly track what's on screen (UI changes, new dialogs, element positions) and update world_state automatically. Call BEFORE complex multi-step workflows. 3-rate loop: FAST (100ms AX events), MEDIUM (300ms full tree), SLOW (1000ms visual OCR). Call perception_stop() when done.", {
   bundleId: z.string().optional().describe("Optional: specify app bundle ID directly instead of using focused app"),
   }, async ({ bundleId: overrideBundleId }) => {
   // Already running check
@@ -5339,7 +5419,7 @@ originalTool("plan_cancel", "Cancel an active goal, marking it as failed.", {
   return { content: [{ type: "text", text: `Goal cancelled: ${goalId}` }] };
   });
   // ── Recovery status + configure ──
- originalTool("recovery_status", "Get recovery engine status: cooldowns, reference cache, learning engine connection.", {}, async () => {
+ originalTool("recovery_status", "Check self-healing status: active cooldowns, cached recovery strategies, and learning engine connection. Recovery is automatic — when tools fail, ScreenHand tries alternative approaches (AX → CDP → OCR → coordinates). Use this to understand why recovery succeeded or failed.", {}, async () => {
   const status = recoveryEngine.getStatus();
   const lines = [
   "Recovery Engine Status:",
@@ -5349,7 +5429,7 @@ originalTool("recovery_status", "Get recovery engine status: cooldowns, referenc
   ];
   return { content: [{ type: "text", text: lines.join("\n") }] };
   });
- originalTool("recovery_configure", "Update recovery engine default budget configuration.", {
+ originalTool("recovery_configure", "Tune self-healing behavior: set max recovery time and max strategies to try when a tool fails. Default: tries multiple approaches within a time budget. Increase for critical actions, decrease for speed.", {
   maxRecoveryTimeMs: z.number().optional().describe("Max time for recovery attempts in ms"),
   maxStrategies: z.number().optional().describe("Max number of strategies to try"),
   }, async ({ maxRecoveryTimeMs, maxStrategies }) => {
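Every hunk in this file changes only the description string passed to the same `originalTool(name, description, schema, handler)` registration helper. A runnable sketch of that pattern, with a plain `Map` registry and a trivial handler standing in for the real zod schema and persisted memory store (all names here are inferred from the diff, not verified against the package):

```javascript
// Minimal stand-in for the registration helper seen throughout the diff.
const tools = new Map();

function originalTool(name, description, schema, handler) {
  tools.set(name, { description, schema, handler });
}

// Register a tool the way the memory_recall hunk does (schema shown as a
// plain object; the real code uses zod validators like z.string()).
originalTool(
  "memory_recall",
  "Search past successful strategies by keyword.",
  { task: "string", limit: "number (optional)" },
  async ({ task, limit = 5 }) => {
    const matches = []; // the real handler queries the persisted memory store
    const text = matches.length
      ? matches.slice(0, limit).join("\n\n")
      : `No saved strategies matching: ${task}`;
    return { content: [{ type: "text", text }] }; // MCP text-result shape
  },
);

// Dispatch it the way an incoming MCP tool call would be handled.
const { handler } = tools.get("memory_recall");
handler({ task: "instagram post" }).then((result) => {
  console.log(result.content[0].text);
});
```

Because the handlers and schemas are untouched, the 0.3.0 → 0.3.2 change is behavior-neutral: only the descriptions (the text an LLM reads when choosing tools) were rewritten to be more directive.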
package/package.json CHANGED
@@ -1,6 +1,6 @@
   {
   "name": "screenhand",
- "version": "0.3.0",
+ "version": "0.3.2",
   "mcpName": "io.github.manushi4/screenhand",
   "description": "Give AI eyes and hands on your desktop. ScreenHand is an open-source MCP server that lets Claude and other AI agents see your screen, click buttons, type text, and control any app on macOS and Windows.",
   "homepage": "https://screenhand.com",