@harusame64/desktop-touch-mcp 0.11.12 → 0.13.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (2)
  1. package/README.md +97 -0
  2. package/package.json +1 -1
package/README.md CHANGED
@@ -497,6 +497,103 @@ Setting `DESKTOP_TOUCH_FORCE_FOCUS=1` makes `forceFocus: true` the default for a
 
  ---
 
+ ## Auto Guard (v0.12+)
+
+ Action tools (`mouse_click`, `mouse_drag`, `keyboard_type`, `keyboard_press`, `click_element`, `set_element_value`, `browser_click_element`, `browser_navigate`) automatically guard each action when you pass `windowTitle` or `tabId`. The guard:
+
+ - Verifies the target window's identity (detects process restarts and HWND replacement)
+ - Confirms that click coordinates are inside the target window rect
+ - Returns `post.perception.status` on every response, including failures, so the LLM can recover without a screenshot
+
+ **Disabling auto guard:** set `DESKTOP_TOUCH_AUTO_GUARD=0` to restore v0.11.12 behavior (no auto guard):
+
+ ```json
+ {
+   "mcpServers": {
+     "desktop-touch": {
+       "type": "stdio",
+       "command": "npx",
+       "args": ["-y", "@harusame64/desktop-touch-mcp"],
+       "env": {
+         "DESKTOP_TOUCH_AUTO_GUARD": "0"
+       }
+     }
+   }
+ }
+ ```
+
+ When auto guard is enabled (default), `post.perception.status` will be one of:
+
+ | Status | Meaning |
+ |---|---|
+ | `ok` | Guard passed; target verified |
+ | `unguarded` | `windowTitle` not provided; action ran without guard |
+ | `target_not_found` | No window matched the given title |
+ | `ambiguous_target` | Multiple windows matched; use a more specific title |
+ | `identity_changed` | Window was replaced (process restart / HWND change) |
+ | `unsafe_coordinates` | Click coordinates are outside the target window rect |
+ | `needs_escalation` | Use `browser_click_element` or specify `windowTitle` |
+
+ When `unsafe_coordinates` or `identity_changed` is returned, the response may include a `suggestedFix.fixId`. Pass that `fixId` to the relevant tool call to approve the recovery:
+
+ ```json
+ { "name": "mouse_click", "arguments": { "fixId": "fix-..." } }
+ { "name": "keyboard_type", "arguments": { "fixId": "fix-...", "text": "hello" } }
+ { "name": "click_element", "arguments": { "fixId": "fix-..." } }
+ { "name": "browser_click_element", "arguments": { "fixId": "fix-..." } }
+ ```
+
+ The fix is one-shot and expires after 15 seconds. The server revalidates the target process identity before executing it.
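
To make the recovery flow concrete, here is a minimal client-side sketch of how a caller might branch on `post.perception.status`. The seven status values come from the table above. Everything else is an assumption for illustration: the `GuardedResponse` shape (in particular, that `suggestedFix` sits at the top level of the response), the `NextStep` type, and the `nextStep` helper are hypothetical and not part of the package.

```typescript
// Status values as listed in the README table.
type PerceptionStatus =
  | "ok"
  | "unguarded"
  | "target_not_found"
  | "ambiguous_target"
  | "identity_changed"
  | "unsafe_coordinates"
  | "needs_escalation";

// ASSUMED response shape: the README only names `post.perception.status`
// and `suggestedFix.fixId`; the exact location of `suggestedFix` is a guess.
interface GuardedResponse {
  post: { perception: { status: PerceptionStatus } };
  suggestedFix?: { fixId: string };
}

// Hypothetical description of what the caller should do next.
type NextStep =
  | { kind: "continue" }
  | { kind: "approve_fix"; fixId: string }
  | { kind: "retarget"; hint: string };

function nextStep(res: GuardedResponse): NextStep {
  const status = res.post.perception.status;
  switch (status) {
    case "ok":
    case "unguarded":
      // The action ran; nothing to recover from.
      return { kind: "continue" };
    case "identity_changed":
    case "unsafe_coordinates":
      // A fix may be offered; it is one-shot and short-lived, so use it promptly.
      return res.suggestedFix
        ? { kind: "approve_fix", fixId: res.suggestedFix.fixId }
        : { kind: "retarget", hint: "re-resolve the window and retry" };
    case "target_not_found":
      return { kind: "retarget", hint: "check the window title" };
    case "ambiguous_target":
      return { kind: "retarget", hint: "use a more specific title" };
    case "needs_escalation":
      return { kind: "retarget", hint: "use browser_click_element or pass windowTitle" };
  }
}
```

An `approve_fix` result would then be turned into a follow-up tool call carrying only the `fixId` (plus `text` for `keyboard_type`), as in the JSON examples above.
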
+
+ ---
+
+ ## v0.13 Additions
+
+ ### Target-Identity Timeline
+
+ The server tracks a semantic timeline of what happened to each target window or tab. Recent events are included in:
+
+ - `get_history` → `recentTargetKeys`: array of the 3 most recently active target keys (compact, no event bodies)
+ - `perception_read(lensId)` → `recentEvents`: up to 10 events for that lens's target, each with `tsMs`, `semantic`, and `summary`
+
+ To browse timelines, enable the perception MCP resources by setting `DESKTOP_TOUCH_PERCEPTION_RESOURCES=1`:
+
+ ```json
+ { "env": { "DESKTOP_TOUCH_PERCEPTION_RESOURCES": "1" } }
+ ```
+
+ MCP resources available when enabled:
+
+ | URI | Content |
+ |---|---|
+ | `perception://target/{targetKey}/timeline` | Semantic event timeline for a target |
+ | `perception://targets/recent` | Most recently active target keys |
+ | `perception://lens/{lensId}/summary` | Lens attention/guard state |
+
+ ### Manual Lens Eviction: FIFO → LRU
+
+ Manual lenses (created via `perception_register`) are now evicted in **least-recently-used** order instead of insertion order. Calling `perception_read`, `evaluatePreToolGuards`, or `buildEnvelopeFor` on a lens marks it as most recently used. The hard limit of 16 active lenses is unchanged.
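
The difference between FIFO and LRU eviction can be illustrated with a small sketch. This is not the server's implementation; `LruLensRegistry`, `register`, and `touch` are hypothetical names, and the sketch relies only on the fact that a JavaScript `Map` iterates in insertion order. The default limit of 16 mirrors the hard limit stated above.

```typescript
// Illustrative LRU registry: the first key in the Map is always the
// least recently used entry, because every use re-inserts the key at the end.
class LruLensRegistry<V> {
  private readonly lenses = new Map<string, V>();
  private readonly limit: number;

  constructor(limit: number = 16) {
    this.limit = limit;
  }

  // Adds a lens and returns the id of the evicted lens, if any.
  register(lensId: string, lens: V): string | undefined {
    let evicted: string | undefined;
    if (!this.lenses.has(lensId) && this.lenses.size >= this.limit) {
      evicted = this.lenses.keys().next().value;
      if (evicted !== undefined) this.lenses.delete(evicted);
    }
    this.lenses.delete(lensId); // re-registering also counts as a use
    this.lenses.set(lensId, lens);
    return evicted;
  }

  // Reads a lens and marks it as most recently used.
  touch(lensId: string): V | undefined {
    if (!this.lenses.has(lensId)) return undefined;
    const lens = this.lenses.get(lensId) as V;
    this.lenses.delete(lensId);
    this.lenses.set(lensId, lens);
    return lens;
  }
}
```

With a limit of 2, registering `a` and `b`, touching `a`, and then registering `c` evicts `b`. Under insertion-order (FIFO) eviction, `a` would have been evicted instead.
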
+
+ ### browser_eval Structured Mode
+
+ Pass `withPerception: true` to receive a structured JSON response with `post.perception` instead of raw text:
+
+ ```json
+ { "name": "browser_eval", "arguments": { "expression": "document.title", "withPerception": true } }
+ ```
+
+ Returns `{ ok: true, result: "...", post: { perception: { status: "ok", ... } } }`.
+
+ ### mouse_drag Cross-Window Guard
+
+ `mouse_drag` now guards both the start and end coordinates. Drags that cross window boundaries (or reach the desktop wallpaper) are blocked by default. To allow intentional cross-window or range-selection drags:
+
+ ```json
+ { "name": "mouse_drag", "arguments": { "startX": 100, "startY": 100, "endX": 900, "endY": 900, "allowCrossWindowDrag": true } }
+ ```
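
As a rough mental model of the two-endpoint check, the sketch below tests whether both drag endpoints fall inside one window rectangle. It is an assumption-laden illustration: `Rect`, `Point`, `contains`, and `checkDrag` are hypothetical names, the rect is treated as exclusive on its right and bottom edges, and how the server actually resolves which window owns each point is not described in the README.

```typescript
interface Rect { left: number; top: number; right: number; bottom: number; }
interface Point { x: number; y: number; }

// True when the point lies inside the rect (right/bottom edges exclusive).
function contains(r: Rect, p: Point): boolean {
  return p.x >= r.left && p.x < r.right && p.y >= r.top && p.y < r.bottom;
}

// Hypothetical verdict; the real server reports a perception status instead.
type DragVerdict = "ok" | "blocked";

// Blocks a drag unless both endpoints are inside the target rect,
// or the caller explicitly opts in to cross-window drags.
function checkDrag(
  target: Rect,
  start: Point,
  end: Point,
  allowCrossWindowDrag: boolean = false,
): DragVerdict {
  if (allowCrossWindowDrag) return "ok";
  return contains(target, start) && contains(target, end) ? "ok" : "blocked";
}
```

For an 800x600 window at the origin, a drag from (100, 100) to (900, 900) ends outside the rect and is blocked in this model unless `allowCrossWindowDrag` is `true`.
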
+
+ ---
+
  ## Known limitations
 
  | Limitation | Detail | Workaround |
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@harusame64/desktop-touch-mcp",
-   "version": "0.11.12",
+   "version": "0.13.0",
    "description": "LLM-native Windows computer-use MCP server with 56 tools for screenshots, UIA, mouse/keyboard, Chrome CDP, terminal, SmartScroll, and perception guards",
    "engines": {
      "node": ">=20.0.0"