screenhand 0.3.0 → 0.3.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/dist/mcp-desktop.js +78 -8
- package/package.json +1 -1
package/dist/mcp-desktop.js
CHANGED
|
@@ -262,7 +262,77 @@ async function ensureCDP(overridePort) {
|
|
|
262
262
|
}
|
|
263
263
|
throw new Error("Chrome not running with --remote-debugging-port. Launch with: /Applications/Google\\ Chrome.app/Contents/MacOS/Google\\ Chrome --remote-debugging-port=9222 --user-data-dir=/tmp/chrome-debug");
|
|
264
264
|
}
|
|
265
|
-
const server = new McpServer({ name: "screenhand", version: "3.0.0" }
|
|
265
|
+
const server = new McpServer({ name: "screenhand", version: "3.0.0" }, {
|
|
266
|
+
instructions: `ScreenHand gives you native desktop control on macOS/Windows. 111 tools across 7 capability tiers.
|
|
267
|
+
|
|
268
|
+
## Quick Patterns
|
|
269
|
+
|
|
270
|
+
**Click something**: ui_find("Search") → ui_press("Search") (~50ms, no screenshot needed)
|
|
271
|
+
**Type text**: click target first, then type_text("hello") or key("cmd+a") for shortcuts
|
|
272
|
+
**Read screen**: ui_tree() for structured elements, screenshot() + ocr() for visual content
|
|
273
|
+
**Browser**: browser_navigate/browser_js/browser_click — works in background via CDP (~10ms)
|
|
274
|
+
**Cross-app**: focus("com.apple.Notes") → type_text() → key("cmd+s") — chain apps freely
|
|
275
|
+
|
|
276
|
+
## When to Use Advanced Features
|
|
277
|
+
|
|
278
|
+
### World State & Perception (know what's on screen)
|
|
279
|
+
- **perception_start()** — turn on continuous screen monitoring (3-rate: 100ms/300ms/1000ms). Use BEFORE complex multi-step workflows so you always know what's on screen.
|
|
280
|
+
- **world_state()** — check current app, windows, controls, dialogs. Use to verify state before acting. Use verbose=true to see all controls.
|
|
281
|
+
- **world_state_diff()** — find stale/outdated UI elements. Use after long pauses to check what changed.
|
|
282
|
+
- **perception_stop()** — turn off when done to save resources.
|
|
283
|
+
- Pattern: perception_start() → do work → world_state() to verify → perception_stop()
|
|
284
|
+
|
|
285
|
+
### Learning & Memory (get smarter over time)
|
|
286
|
+
- **Learning is automatic** — every tool call teaches ScreenHand which selectors work, which fail, optimal timing per app. No action needed.
|
|
287
|
+
- **memory_save(key, value)** — save a strategy or finding for future sessions (persists to disk).
|
|
288
|
+
- **memory_recall(query)** — retrieve saved strategies, past errors, what worked before. ALWAYS recall before attempting unfamiliar platforms.
|
|
289
|
+
- **learning_status()** — see what ScreenHand has learned: locator preferences, recovery rankings, timing budgets per app.
|
|
290
|
+
- **learning_reset()** — nuclear option, clears all learning. Rarely needed.
|
|
291
|
+
- Pattern: memory_recall("instagram post") → use recalled strategy → if new approach works, memory_save() it
|
|
292
|
+
|
|
293
|
+
### Self-Healing & Recovery (handle errors automatically)
|
|
294
|
+
- **Recovery is automatic** — when a tool fails, ScreenHand tries alternative strategies (AX → CDP → OCR → coordinates) without you doing anything.
|
|
295
|
+
- **recovery_status()** — see cooldowns, active strategies, which fixes are cached.
|
|
296
|
+
- **recovery_configure()** — adjust recovery budget (max time, max strategies to try).
|
|
297
|
+
- ***_with_fallback tools** (click_with_fallback, type_with_fallback, etc.) — use these instead of bare click/type when reliability matters. They auto-try multiple methods.
|
|
298
|
+
- Pattern: Use *_with_fallback tools for critical actions. If something still fails, check recovery_status() to understand why.
|
|
299
|
+
|
|
300
|
+
### Platform Knowledge (know HOW to automate an app)
|
|
301
|
+
- **platform_guide("figma")** — get selectors, flows, known errors for a platform. Call FIRST when automating any app/site.
|
|
302
|
+
- **platform_explore("bundleId")** — auto-discover an unknown app's UI structure.
|
|
303
|
+
- **platform_learn("domain")** — learn a website's structure by crawling.
|
|
304
|
+
- **scan_menu_bar()** — discover all menu items in the current app.
|
|
305
|
+
- Pattern: platform_guide() first → if not found, platform_explore() → then automate
|
|
306
|
+
|
|
307
|
+
### Jobs & Multi-Step Workflows (survive restarts)
|
|
308
|
+
- **job_create(name, steps[])** — define a multi-step workflow that persists to disk.
|
|
309
|
+
- **job_run(jobId)** — execute a job. Survives MCP client restarts.
|
|
310
|
+
- **worker_start()** — start background daemon that processes jobs autonomously.
|
|
311
|
+
- **playbook_record()** / **export_playbook()** — record your actions into reusable playbooks.
|
|
312
|
+
- Pattern: For repeatable workflows, record as playbook → export → job_create from playbook → worker_start
|
|
313
|
+
|
|
314
|
+
### Multi-Agent Coordination (multiple AI agents sharing one machine)
|
|
315
|
+
- **session_claim()** — claim exclusive access to an app window (lease-based).
|
|
316
|
+
- **session_heartbeat()** — keep your lease alive.
|
|
317
|
+
- **session_release()** — release when done.
|
|
318
|
+
- **supervisor_start()** — background daemon that detects stalled agents and recovers.
|
|
319
|
+
- Pattern: session_claim() → do work → session_heartbeat() periodically → session_release()
|
|
320
|
+
|
|
321
|
+
## Tool Selection Priority
|
|
322
|
+
1. **ui_tree + ui_press** for native app elements (fastest, most reliable)
|
|
323
|
+
2. **browser_* tools** for web content in Chrome/Electron
|
|
324
|
+
3. ***_with_fallback** when you're unsure which method will work
|
|
325
|
+
4. **screenshot + ocr** only for canvas apps or visual verification
|
|
326
|
+
5. **applescript** for macOS-specific automation (Finder, Mail, etc.)
|
|
327
|
+
|
|
328
|
+
## Tips
|
|
329
|
+
- Always call platform_guide() before automating a new app/site
|
|
330
|
+
- Use memory_recall() before attempting something you might have done before
|
|
331
|
+
- Start perception_start() for complex workflows, stop when done
|
|
332
|
+
- Prefer *_with_fallback tools over bare tools for reliability
|
|
333
|
+
- browser_stealth() before visiting sites with bot detection
|
|
334
|
+
`,
|
|
335
|
+
});
|
|
266
336
|
// ═══════════════════════════════════════════════
|
|
267
337
|
// LEARNING MEMORY — cached, auto-recall, non-blocking
|
|
268
338
|
// ═══════════════════════════════════════════════
|
|
@@ -3100,7 +3170,7 @@ originalTool("memory_snapshot", "Get current memory state snapshot — session i
|
|
|
3100
3170
|
const snap = memory.getSnapshot();
|
|
3101
3171
|
return { content: [{ type: "text", text: JSON.stringify(snap, null, 2) }] };
|
|
3102
3172
|
});
|
|
3103
|
-
originalTool("memory_recall", "
|
|
3173
|
+
originalTool("memory_recall", "Search past successful strategies by keyword. ALWAYS call this before automating an unfamiliar platform — it may have a saved strategy from a previous session. Returns matching strategies with step-by-step actions that worked before.", {
|
|
3104
3174
|
task: z.string().describe("Describe the task you want to accomplish"),
|
|
3105
3175
|
limit: z.number().optional().describe("Max results (default 5)"),
|
|
3106
3176
|
}, async ({ task, limit }) => {
|
|
@@ -3114,7 +3184,7 @@ originalTool("memory_recall", "Have I done something like this before? Searches
|
|
|
3114
3184
|
}).join("\n\n");
|
|
3115
3185
|
return { content: [{ type: "text", text }] };
|
|
3116
3186
|
});
|
|
3117
|
-
originalTool("memory_save", "
|
|
3187
|
+
originalTool("memory_save", "Save a successful approach for future sessions. Call this after completing a task so next time you (or another agent) can memory_recall() it instead of figuring it out again. Persists to disk — survives restarts.", {
|
|
3118
3188
|
task: z.string().describe("Short description of the task that was accomplished"),
|
|
3119
3189
|
tags: z.array(z.string()).optional().describe("Optional tags for easier recall"),
|
|
3120
3190
|
}, async ({ task, tags }) => {
|
|
@@ -5048,7 +5118,7 @@ originalTool("perception_status", "Get continuous perception status: multi-rate
|
|
|
5048
5118
|
}
|
|
5049
5119
|
return { content: [{ type: "text", text: lines.join("\n") }] };
|
|
5050
5120
|
});
|
|
5051
|
-
originalTool("world_state", "Get
|
|
5121
|
+
originalTool("world_state", "Get what's currently on screen: focused app, windows, controls, dialogs, scroll position. Call this to verify UI state before acting. Use verbose=true to see all controls with roles/labels/positions. Works best after perception_start() which keeps it continuously updated.", {
|
|
5052
5122
|
verbose: z.boolean().optional().default(false).describe("Dump all controls with roles, labels, positions, and confidence"),
|
|
5053
5123
|
}, async ({ verbose }) => {
|
|
5054
5124
|
const state = worldModel.getState();
|
|
@@ -5164,7 +5234,7 @@ originalTool("world_state_diff", "Get stale UI controls that haven't been refres
|
|
|
5164
5234
|
lines.push(` ... and ${stale.length - 20} more`);
|
|
5165
5235
|
return { content: [{ type: "text", text: lines.join("\n") }] };
|
|
5166
5236
|
});
|
|
5167
|
-
originalTool("learning_status", "
|
|
5237
|
+
originalTool("learning_status", "See what ScreenHand has learned about an app: which selectors work best, which recovery strategies succeed, optimal timing budgets, and sensor preferences. Learning happens automatically — every tool call teaches the system. Use this to inspect learned knowledge or debug why something isn't working.", {
|
|
5168
5238
|
bundleId: z.string().optional().describe("App bundle ID to query (default: currently focused app)"),
|
|
5169
5239
|
}, async ({ bundleId }) => {
|
|
5170
5240
|
const bid = bundleId ?? worldModel.getState().focusedApp?.bundleId ?? "unknown";
|
|
@@ -5198,7 +5268,7 @@ originalTool("learning_status", "Get learning engine stats: locator preferences,
|
|
|
5198
5268
|
return { content: [{ type: "text", text: lines.join("\n") }] };
|
|
5199
5269
|
});
|
|
5200
5270
|
// ── Perception lifecycle ──
|
|
5201
|
-
originalTool("perception_start", "Start continuous
|
|
5271
|
+
originalTool("perception_start", "Start continuous screen monitoring — ScreenHand will constantly track what's on screen (UI changes, new dialogs, element positions) and update world_state automatically. Call BEFORE complex multi-step workflows. 3-rate loop: FAST (100ms AX events), MEDIUM (300ms full tree), SLOW (1000ms visual OCR). Call perception_stop() when done.", {
|
|
5202
5272
|
bundleId: z.string().optional().describe("Optional: specify app bundle ID directly instead of using focused app"),
|
|
5203
5273
|
}, async ({ bundleId: overrideBundleId }) => {
|
|
5204
5274
|
// Already running check
|
|
@@ -5339,7 +5409,7 @@ originalTool("plan_cancel", "Cancel an active goal, marking it as failed.", {
|
|
|
5339
5409
|
return { content: [{ type: "text", text: `Goal cancelled: ${goalId}` }] };
|
|
5340
5410
|
});
|
|
5341
5411
|
// ── Recovery status + configure ──
|
|
5342
|
-
originalTool("recovery_status", "
|
|
5412
|
+
originalTool("recovery_status", "Check self-healing status: active cooldowns, cached recovery strategies, and learning engine connection. Recovery is automatic — when tools fail, ScreenHand tries alternative approaches (AX → CDP → OCR → coordinates). Use this to understand why recovery succeeded or failed.", {}, async () => {
|
|
5343
5413
|
const status = recoveryEngine.getStatus();
|
|
5344
5414
|
const lines = [
|
|
5345
5415
|
"Recovery Engine Status:",
|
|
@@ -5349,7 +5419,7 @@ originalTool("recovery_status", "Get recovery engine status: cooldowns, referenc
|
|
|
5349
5419
|
];
|
|
5350
5420
|
return { content: [{ type: "text", text: lines.join("\n") }] };
|
|
5351
5421
|
});
|
|
5352
|
-
originalTool("recovery_configure", "
|
|
5422
|
+
originalTool("recovery_configure", "Tune self-healing behavior: set max recovery time and max strategies to try when a tool fails. Default: tries multiple approaches within a time budget. Increase for critical actions, decrease for speed.", {
|
|
5353
5423
|
maxRecoveryTimeMs: z.number().optional().describe("Max time for recovery attempts in ms"),
|
|
5354
5424
|
maxStrategies: z.number().optional().describe("Max number of strategies to try"),
|
|
5355
5425
|
}, async ({ maxRecoveryTimeMs, maxStrategies }) => {
|
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "screenhand",
|
|
3
|
-
"version": "0.3.
|
|
3
|
+
"version": "0.3.1",
|
|
4
4
|
"mcpName": "io.github.manushi4/screenhand",
|
|
5
5
|
"description": "Give AI eyes and hands on your desktop. ScreenHand is an open-source MCP server that lets Claude and other AI agents see your screen, click buttons, type text, and control any app on macOS and Windows.",
|
|
6
6
|
"homepage": "https://screenhand.com",
|