@harusame64/desktop-touch-mcp 0.11.12 → 0.13.0
- package/README.md +97 -0
- package/package.json +1 -1
package/README.md
CHANGED
@@ -497,6 +497,103 @@ Setting `DESKTOP_TOUCH_FORCE_FOCUS=1` makes `forceFocus: true` the default for a
 
 ---
 
+## Auto Guard (v0.12+)
+
+Action tools (`mouse_click`, `mouse_drag`, `keyboard_type`, `keyboard_press`, `click_element`, `set_element_value`, `browser_click_element`, `browser_navigate`) automatically guard each action when you pass `windowTitle` / `tabId`:
+
+- Verifies target window identity (process restart / HWND replacement detected)
+- Confirms click coordinates are inside the target window rect
+- Returns `post.perception.status` on every response — including failures — so the LLM can recover without a screenshot
+
+**Disabling auto guard** — set `DESKTOP_TOUCH_AUTO_GUARD=0` to restore v0.11.12 behavior (no auto guard):
+
+```json
+{
+  "mcpServers": {
+    "desktop-touch": {
+      "type": "stdio",
+      "command": "npx",
+      "args": ["-y", "@harusame64/desktop-touch-mcp"],
+      "env": {
+        "DESKTOP_TOUCH_AUTO_GUARD": "0"
+      }
+    }
+  }
+}
+```
+
+When auto guard is enabled (default), `post.perception.status` will be one of:
+
+| Status | Meaning |
+|---|---|
+| `ok` | Guard passed — target verified |
+| `unguarded` | `windowTitle` not provided; action ran without guard |
+| `target_not_found` | No window matched the given title |
+| `ambiguous_target` | Multiple windows matched; use a more specific title |
+| `identity_changed` | Window was replaced (process restart / HWND change) |
+| `unsafe_coordinates` | Click coordinates are outside the target window rect |
+| `needs_escalation` | Use `browser_click_element` or specify `windowTitle` |
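Reviewer note: a client consuming these statuses might branch on them roughly as follows. This is a minimal sketch, not part of the package; only the status strings come from the table above, while the `NextStep` labels and the `nextStep` helper are hypothetical names chosen for illustration.

```typescript
// Status strings are taken from the README table; everything else is illustrative.
type PerceptionStatus =
  | "ok"
  | "unguarded"
  | "target_not_found"
  | "ambiguous_target"
  | "identity_changed"
  | "unsafe_coordinates"
  | "needs_escalation";

// Hypothetical labels for what a caller might do next.
type NextStep = "continue" | "retarget" | "apply_fix_or_retarget" | "escalate";

function nextStep(status: PerceptionStatus): NextStep {
  switch (status) {
    case "ok":
    case "unguarded":
      // The action ran; "unguarded" only means no guard was applied.
      return "continue";
    case "target_not_found":
    case "ambiguous_target":
      // The title matched nothing or too much: pick a better windowTitle.
      return "retarget";
    case "identity_changed":
    case "unsafe_coordinates":
      // The README says these may carry a suggestedFix.fixId.
      return "apply_fix_or_retarget";
    case "needs_escalation":
      // Use browser_click_element or specify windowTitle, per the table.
      return "escalate";
  }
}
```

Treating `unguarded` the same as `ok` is a design choice of this sketch; a stricter caller could instead retry with `windowTitle` set.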
+
+When `unsafe_coordinates` or `identity_changed` is returned, the response may include a `suggestedFix.fixId`. Pass that `fixId` to the relevant tool call to approve the recovery:
+
+```json
+{ "name": "mouse_click", "arguments": { "fixId": "fix-..." } }
+{ "name": "keyboard_type", "arguments": { "fixId": "fix-...", "text": "hello" } }
+{ "name": "click_element", "arguments": { "fixId": "fix-..." } }
+{ "name": "browser_click_element", "arguments": { "fixId": "fix-..." } }
+```
+
+The fix is one-shot and expires in 15 seconds. The server revalidates the target process identity before executing.
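Reviewer note: because a fix is one-shot and short-lived, a client may want to avoid sending a `fixId` that is probably stale. The sketch below tracks this on the client side only. `PendingFix`, `rememberFix`, and `takeFix` are hypothetical names, and the local timer is only an estimate, since the server's 15-second window presumably starts when the fix is issued rather than when the client receives it.

```typescript
// 15 seconds comes from the README; the bookkeeping around it is an assumption.
const FIX_TTL_MS = 15_000;

interface PendingFix {
  fixId: string;
  receivedAtMs: number;
  used: boolean;
}

// Record a suggestedFix.fixId at the moment the response arrives.
function rememberFix(fixId: string, nowMs: number): PendingFix {
  return { fixId, receivedAtMs: nowMs, used: false };
}

// Return the fixId if it is unused and still inside the local TTL estimate,
// marking it used so it is never sent twice; otherwise return null.
function takeFix(fix: PendingFix, nowMs: number): string | null {
  if (fix.used || nowMs - fix.receivedAtMs >= FIX_TTL_MS) {
    return null;
  }
  fix.used = true;
  return fix.fixId;
}
```

When `takeFix` returns `null`, the caller would repeat the original action to obtain a fresh status instead of reusing the old `fixId`.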
+
+---
+
+## v0.13 Additions
+
+### Target-Identity Timeline
+
+The server tracks a semantic timeline of what happened to each target window/tab. Recent events are included in:
+
+- `get_history` → `recentTargetKeys`: array of 3 most recently active target keys (compact, no event bodies)
+- `perception_read(lensId)` → `recentEvents`: up to 10 events for that lens's target, each with `tsMs`, `semantic`, `summary`
+
+Enable the MCP resources below to browse timelines:
+
+```json
+{ "env": { "DESKTOP_TOUCH_PERCEPTION_RESOURCES": "1" } }
+```
+
+MCP resources available when enabled:
+
+| URI | Content |
+|---|---|
+| `perception://target/{targetKey}/timeline` | Semantic event timeline for a target |
+| `perception://targets/recent` | Most recently active target keys |
+| `perception://lens/{lensId}/summary` | Lens attention/guard state |
+
+### Manual Lens Eviction: FIFO → LRU
+
+Manual lenses (created via `perception_register`) are now evicted by **least-recently-used** instead of insertion order. Using `perception_read`, `evaluatePreToolGuards`, or `buildEnvelopeFor` on a lens promotes it. The hard limit of 16 active lenses is unchanged.
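Reviewer note: the difference between FIFO and LRU eviction can be illustrated with a small registry built on `Map`, which iterates keys in insertion order. This is a generic sketch and not the package's implementation; `LruLensRegistry`, `register`, and `touch` are hypothetical names, and only the limit of 16 comes from the README.

```typescript
class LruLensRegistry<V> {
  // Map preserves insertion order, so the first key is the least recently used
  // as long as every access re-inserts the entry.
  private readonly lenses = new Map<string, V>();
  private readonly limit: number;

  constructor(limit: number = 16) {
    this.limit = limit;
  }

  // Add a lens; if the registry is full, evict the least recently used one
  // and return its id.
  register(lensId: string, lens: V): string | undefined {
    let evicted: string | undefined;
    if (!this.lenses.has(lensId) && this.lenses.size >= this.limit) {
      evicted = this.lenses.keys().next().value;
      if (evicted !== undefined) {
        this.lenses.delete(evicted);
      }
    }
    this.lenses.delete(lensId);
    this.lenses.set(lensId, lens);
    return evicted;
  }

  // Read a lens and promote it to most recently used. A FIFO registry would
  // skip the delete/set pair and leave the order untouched.
  touch(lensId: string): V | undefined {
    const lens = this.lenses.get(lensId);
    if (lens === undefined) {
      return undefined;
    }
    this.lenses.delete(lensId);
    this.lenses.set(lensId, lens);
    return lens;
  }
}
```

With a limit of 2, registering `a` and `b`, touching `a`, and then registering `c` evicts `b`; under FIFO the same sequence would have evicted `a`.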
+
+### browser_eval Structured Mode
+
+Pass `withPerception: true` to receive a structured JSON response with `post.perception` instead of raw text:
+
+```json
+{ "name": "browser_eval", "arguments": { "expression": "document.title", "withPerception": true } }
+```
+
+Returns `{ ok: true, result: "...", post: { perception: { status: "ok", ... } } }`.
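Reviewer note: a caller that supports both modes has to tell a structured response apart from raw text. The sketch below assumes the structured response reaches the client as a JSON string, which the README does not state; `StructuredEvalResult` and `parseStructuredEval` are hypothetical names, and only the `ok`, `result`, and `post.perception.status` fields come from the README.

```typescript
// Field names follow the README example; the rest of the shape is left open.
interface StructuredEvalResult {
  ok: boolean;
  result?: unknown;
  post: { perception: { status: string } };
}

// Return the parsed structured response, or null when the text is raw output
// (not JSON, or JSON without the expected fields).
function parseStructuredEval(text: string): StructuredEvalResult | null {
  let value: unknown;
  try {
    value = JSON.parse(text);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) {
    return null;
  }
  const record = value as Record<string, unknown>;
  const post = record.post as Record<string, unknown> | null | undefined;
  const perception = post?.perception as Record<string, unknown> | null | undefined;
  if (typeof record.ok !== "boolean" || typeof perception?.status !== "string") {
    return null;
  }
  return value as StructuredEvalResult;
}
```

A `null` result means the caller should treat the text as the raw-text mode output.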
+
+### mouse_drag Cross-Window Guard
+
+`mouse_drag` now guards both start and end coordinates. Drags that cross window boundaries (or reach the desktop wallpaper) are blocked by default. To allow intentional cross-window or range-selection drags:
+
+```json
+{ "name": "mouse_drag", "arguments": { "startX": 100, "startY": 100, "endX": 900, "endY": 900, "allowCrossWindowDrag": true } }
+```
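Reviewer note: a caller can estimate in advance whether a drag is likely to need `allowCrossWindowDrag`. The sketch below checks both endpoints against a single window rectangle; it assumes the guard amounts to a containment test, which the README does not confirm (the server may also consider overlapping windows). `Rect`, `contains`, and `dragLeavesWindow` are hypothetical names.

```typescript
// Window rectangle in screen coordinates; right and bottom are exclusive,
// following the usual Win32 RECT convention.
interface Rect {
  left: number;
  top: number;
  right: number;
  bottom: number;
}

function contains(rect: Rect, x: number, y: number): boolean {
  return x >= rect.left && x < rect.right && y >= rect.top && y < rect.bottom;
}

// True when either endpoint falls outside the target window, i.e. the drag
// would probably be blocked unless allowCrossWindowDrag is set.
function dragLeavesWindow(
  target: Rect,
  startX: number,
  startY: number,
  endX: number,
  endY: number,
): boolean {
  return !(contains(target, startX, startY) && contains(target, endX, endY));
}
```

For a window occupying (0, 0) to (800, 600), the drag from (100, 100) to (900, 900) in the README example ends outside the window, so this check reports that it leaves the window.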
+
+---
+
 ## Known limitations
 
 | Limitation | Detail | Workaround |
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "@harusame64/desktop-touch-mcp",
-  "version": "0.
+  "version": "0.13.0",
   "description": "LLM-native Windows computer-use MCP server with 56 tools for screenshots, UIA, mouse/keyboard, Chrome CDP, terminal, SmartScroll, and perception guards",
   "engines": {
     "node": ">=20.0.0"