@harusame64/desktop-touch-mcp 1.8.0 → 1.9.0
- package/README.md +14 -33
- package/bin/launcher.js +3 -3
- package/package.json +3 -2
package/README.md
CHANGED
@@ -10,10 +10,11 @@
 npx -y @harusame64/desktop-touch-mcp
 ```
 
-
+29 tools, native Rust engine (UIA in 2 ms), zero-config PowerShell fallback, full CJK support, MIT licensed. Add the snippet above to your Claude / Cursor / VS Code Copilot config and Claude can drive Notepad, Excel, Chrome, Windows Terminal, and any other app on your machine.
 
->
->
+> **Why this over pixel-clicking?** Two ideas run through every tool: **discover-then-act** — `desktop_discover` returns interactive entities with short-lived leases instead of raw coordinates, so `desktop_act` operates on *what* you mean, not *where* it was — and **per-action perception guards** that verify the target window's identity and bounds before input lands, catching wrong-window typing and stale-coordinate clicks before they happen.
+>
+> Under the hood: an **82× average speedup** from the Rust native engine (UIA focus queries in 2 ms, SSE2-accelerated image diffing at 13–15×), with a transparent PowerShell fallback when the engine is absent. The npm launcher fetches only the GitHub Release tag matching the installed version and verifies the Windows runtime zip before extraction.
 
 ---
 
@@ -122,7 +123,7 @@ For a local checkout, register the built server directly:
 
 ---
 
-## Tools (
+## Tools (29 Optimized Tools)
 
 > 📖 **Full Reference**: [`docs/system-overview.md`](docs/system-overview.md) — Exhaustive guide on parameters, return schemas, and coordinate math.
 
@@ -166,6 +167,11 @@ For a local checkout, register the built server directly:
 | `run_macro` | Batch up to 50 operations into a single round-trip for maximum efficiency. |
 | `clipboard` / `notification_show` | System-level text exchange and user alerts. |
 
+### 📊 Office (Excel)
+| Tool | Description |
+|---|---|
+| `excel` | Author and run Excel VBA macros via COM. `action='run_vba'` writes a macro into a managed Trusted Location and runs it; `action='check_access_vbom'` is a read-only preflight. Runs VBA where formula-only tools cannot. One-time setup: `node scripts/enable-access-vbom.mjs`. |
+
 ---
 
 ## Standard workflow (v1.0.0)
@@ -558,7 +564,7 @@ Setting `DESKTOP_TOUCH_FORCE_FOCUS=1` makes `forceFocus: true` the default for a
 
 ---
 
-## Auto Guard
+## Auto Guard
 
 Action tools (`mouse_click`, `mouse_drag`, `keyboard(action='type'/'press')`, `click_element`, `desktop_act`, `browser_click`, `browser_navigate`) automatically guard each action when you pass `windowTitle` / `tabId`:
 
@@ -608,32 +614,7 @@ The fix is one-shot and expires in 15 seconds. The server revalidates the target
 
 ---
 
-##
-
-### Target-Identity Timeline
-
-The server tracks a semantic timeline of what happened to each target window/tab. Recent events are included in:
-
-- `get_history` → `recentTargetKeys`: array of 3 most recently active target keys (compact, no event bodies)
-- `perception_read(lensId)` → `recentEvents`: up to 10 events for that lens's target, each with `tsMs`, `semantic`, `summary`
-
-Enable the MCP resources below to browse timelines:
-
-```json
-{ "env": { "DESKTOP_TOUCH_PERCEPTION_RESOURCES": "1" } }
-```
-
-MCP resources available when enabled:
-
-| URI | Content |
-|---|---|
-| `perception://target/{targetKey}/timeline` | Semantic event timeline for a target |
-| `perception://targets/recent` | Most recently active target keys |
-| `perception://lens/{lensId}/summary` | Lens attention/guard state |
-
-### Manual Lens Eviction: FIFO → LRU
-
-Manual lenses (created via `perception_register`) are now evicted by **least-recently-used** instead of insertion order. Using `perception_read`, `evaluatePreToolGuards`, or `buildEnvelopeFor` on a lens promotes it. The hard limit of 16 active lenses is unchanged.
+## Advanced response options
 
 ### browser_eval Structured Mode
 
@@ -748,7 +729,7 @@ Flag semantics (exact-match: only the literal string `"1"` counts):
 
 ### Deprecated: `DESKTOP_TOUCH_ENABLE_FUKUWARAI_V2`
 
-This was the opt-in switch in v0.16.x.
+This was the opt-in switch in v0.16.x. It is still accepted for backward compatibility but no longer required — the server prints a deprecation warning on startup when it is set. Remove it from your config; V2 is on by default.
 
 ### Recovery when V2 fails
 
@@ -759,7 +740,7 @@ If `desktop_act` returns `ok: false`, read `reason` and follow the built-in reco
 - `entity_outside_viewport` → `scroll` / `scroll(action='to_element')`, then re-call `desktop_discover`
 - `executor_failed` → fall back to `click_element` / `mouse_click` / `browser_click`
 
-For `desktop_discover` warnings (`visual_provider_unavailable`, `visual_provider_warming`, `cdp_provider_failed`, …),
+For `desktop_discover` warnings (`visual_provider_unavailable`, `visual_provider_warming`, `cdp_provider_failed`, …), the coordinate-based tools (`screenshot(detail='text')`, `click_element`, `mouse_click`, `terminal`, …) remain available as an escape hatch.
 
 ---
 
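The recovery list in the README hunk above maps a failed `desktop_act` result to fallback tools. A minimal sketch of how a client might encode that mapping follows. The result shape `{ ok, reason }` comes from the README text; the `RECOVERY_STEPS` table and the `recoverySteps` helper are hypothetical names introduced here, and only the two reasons visible in this diff are covered.

```javascript
// Hypothetical lookup built only from the two `reason` values visible in the
// hunk above; the full README lists more reasons that this diff does not show.
const RECOVERY_STEPS = new Map([
  // Scroll the entity into view, then re-run discovery to obtain a fresh lease.
  ["entity_outside_viewport", ["scroll", "desktop_discover"]],
  // Fall back to the lower-level click tools.
  ["executor_failed", ["click_element", "mouse_click", "browser_click"]],
]);

// Returns the tool names to try next, in order, for a `desktop_act` result.
// An empty array means either the call succeeded or the reason is not covered.
function recoverySteps(result) {
  if (result.ok) return [];
  return RECOVERY_STEPS.get(result.reason) ?? [];
}

console.log(recoverySteps({ ok: false, reason: "executor_failed" }));
```

A `Map` is used rather than a plain object so that an unexpected `reason` such as `"constructor"` cannot resolve to an inherited property.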
package/bin/launcher.js
CHANGED
@@ -18,15 +18,15 @@ import path from "node:path";
 import { Readable } from "node:stream";
 import { pipeline } from "node:stream/promises";
 
-const PACKAGE_VERSION = "1.
+const PACKAGE_VERSION = "1.9.0";
 const RELEASE_TAG = `v${PACKAGE_VERSION}`;
 const REPO_API_URL = `https://api.github.com/repos/Harusame64/desktop-touch-mcp/releases/tags/${RELEASE_TAG}`;
 const ASSET_NAME = "desktop-touch-mcp-windows.zip";
 const RELEASE_METADATA_FILE = ".desktop-touch-release.json";
 const RELEASE_MANIFEST = {
-  tagName: "v1.
+  tagName: "v1.9.0",
   assetName: ASSET_NAME,
-  sha256: "
+  sha256: "d1c913ba71986802c335c3d23bd419d58d60ebb36a0d8826a7441dcd5d4403a0",
 };
 const CACHE_ROOT = process.env.DESKTOP_TOUCH_MCP_HOME
   ? path.resolve(process.env.DESKTOP_TOUCH_MCP_HOME)
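The README states that the launcher verifies the runtime zip before extraction, and `RELEASE_MANIFEST` pins a `sha256` value. The verification code itself is not part of this diff, so the following is only a sketch of how such a check could look, assuming the downloaded bytes are hashed and compared to the pinned digest. `matchesManifest` is a hypothetical helper, and the demo digest is the well-known SHA-256 of the string `abc`, not a real release hash.

```javascript
// Hypothetical helper, not taken from launcher.js.
// Resolves to true only if SHA-256(bytes) equals the hex digest in manifest.sha256.
async function matchesManifest(bytes, manifest) {
  // Dynamic import keeps the sketch usable from both ESM and CommonJS files.
  const { createHash, timingSafeEqual } = await import("node:crypto");
  const actual = createHash("sha256").update(bytes).digest();
  const expected = Buffer.from(manifest.sha256, "hex");
  // timingSafeEqual throws when lengths differ, so compare lengths first.
  return actual.length === expected.length && timingSafeEqual(actual, expected);
}

// Demo manifest using the standard SHA-256 test vector for "abc".
const demoManifest = {
  assetName: "demo.zip",
  sha256: "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad",
};

matchesManifest(Buffer.from("abc"), demoManifest).then((ok) => {
  console.log(ok ? "digest matches, safe to extract" : "digest mismatch, aborting");
});
```

Pinning the digest in the published package means a replaced or corrupted release asset fails the check even when the tag name still matches.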
package/package.json
CHANGED
@@ -1,8 +1,8 @@
 {
   "name": "@harusame64/desktop-touch-mcp",
-  "version": "1.
+  "version": "1.9.0",
   "mcpName": "io.github.Harusame64/desktop-touch-mcp",
-  "description": "Let Claude, Cursor, or any MCP client see and operate your Windows 10/11 desktop.
+  "description": "Let Claude, Cursor, or any MCP client see and operate your Windows 10/11 desktop. 29 tools for screenshots, UI Automation, Chrome CDP, keyboard/mouse, terminal, with semantic discover-then-act targeting and per-action perception guards that avoid wrong-window typing and stale-coordinate clicks.",
   "keywords": [
     "mcp",
     "mcp-server",
@@ -81,6 +81,7 @@
     "test:capture": "node scripts/test-capture.mjs",
     "test:unit": "vitest run --project=unit",
     "test:e2e": "vitest run --project=e2e",
+    "e2e:stop": "node scripts/e2e-stop.mjs",
     "test:headed": "HEADED=1 vitest run --project=e2e",
     "test:e2e:browser": "vitest run --project=e2e \"tests/e2e/browser-*.test.ts\"",
     "test:e2e:window": "vitest run --project=e2e tests/e2e/dock-window.test.ts tests/e2e/dock-auto.test.ts tests/e2e/focus-integrity.test.ts tests/e2e/force-focus.test.ts tests/e2e/screenshot-electron.test.ts tests/e2e/ui-elements-cache.test.ts",