@harusame64/desktop-touch-mcp 1.8.0 → 1.9.0

package/README.md CHANGED
@@ -10,10 +10,11 @@
  npx -y @harusame64/desktop-touch-mcp
  ```
 
- 28 tools, native Rust engine (UIA in 2 ms), zero-config PowerShell fallback, full CJK support, MIT licensed. Add the snippet above to your Claude / Cursor / VS Code Copilot config and Claude can drive Notepad, Excel, Chrome, Windows Terminal, and any other app on your machine.
+ 29 tools, native Rust engine (UIA in 2 ms), zero-config PowerShell fallback, full CJK support, MIT licensed. Add the snippet above to your Claude / Cursor / VS Code Copilot config and Claude can drive Notepad, Excel, Chrome, Windows Terminal, and any other app on your machine.
 
- > *v0.15: **82× average speedup** via Rust native engine: UIA focus queries in 2 ms, SSE2-accelerated image diffing at 13–15× native speed. Zero-config: the engine auto-loads when present, with transparent PowerShell fallback.*
- > *v0.15.5: **Pinned release verification** — the npm launcher now fetches only the matching GitHub Release tag and verifies the Windows runtime zip before extraction.*
+ > **Why this over pixel-clicking?** Two ideas run through every tool: **discover-then-act** (`desktop_discover` returns interactive entities with short-lived leases instead of raw coordinates, so `desktop_act` operates on *what* you mean, not *where* it was) and **per-action perception guards** that verify the target window's identity and bounds before input lands, catching wrong-window typing and stale-coordinate clicks before they happen.
+ >
+ > Under the hood: an **82× average speedup** from the Rust native engine (UIA focus queries in 2 ms, SSE2-accelerated image diffing at 13–15×), with a transparent PowerShell fallback when the engine is absent. The npm launcher fetches only the GitHub Release tag matching the installed version and verifies the Windows runtime zip before extraction.
 
  ---
 
@@ -122,7 +123,7 @@ For a local checkout, register the built server directly:
 
  ---
 
- ## Tools (28 Optimized Tools)
+ ## Tools (29 Optimized Tools)
 
  > 📖 **Full Reference**: [`docs/system-overview.md`](docs/system-overview.md) — Exhaustive guide on parameters, return schemas, and coordinate math.
 
@@ -166,6 +167,11 @@ For a local checkout, register the built server directly:
  | `run_macro` | Batch up to 50 operations into a single round-trip for maximum efficiency. |
  | `clipboard` / `notification_show` | System-level text exchange and user alerts. |
 
+ ### 📊 Office (Excel)
+ | Tool | Description |
+ |---|---|
+ | `excel` | Author and run Excel VBA macros via COM. `action='run_vba'` writes a macro into a managed Trusted Location and runs it; `action='check_access_vbom'` is a read-only preflight. Runs VBA where formula-only tools cannot. One-time setup: `node scripts/enable-access-vbom.mjs`. |
+
  ---
 
  ## Standard workflow (v1.0.0)
@@ -558,7 +564,7 @@ Setting `DESKTOP_TOUCH_FORCE_FOCUS=1` makes `forceFocus: true` the default for a
 
  ---
 
- ## Auto Guard (v0.12+)
+ ## Auto Guard
 
  Action tools (`mouse_click`, `mouse_drag`, `keyboard(action='type'/'press')`, `click_element`, `desktop_act`, `browser_click`, `browser_navigate`) automatically guard each action when you pass `windowTitle` / `tabId`:
 
@@ -608,32 +614,7 @@ The fix is one-shot and expires in 15 seconds. The server revalidates the target
 
  ---
 
- ## v0.13 Additions
-
- ### Target-Identity Timeline
-
- The server tracks a semantic timeline of what happened to each target window/tab. Recent events are included in:
-
- - `get_history` → `recentTargetKeys`: array of 3 most recently active target keys (compact, no event bodies)
- - `perception_read(lensId)` → `recentEvents`: up to 10 events for that lens's target, each with `tsMs`, `semantic`, `summary`
-
- Enable the MCP resources below to browse timelines:
-
- ```json
- { "env": { "DESKTOP_TOUCH_PERCEPTION_RESOURCES": "1" } }
- ```
-
- MCP resources available when enabled:
-
- | URI | Content |
- |---|---|
- | `perception://target/{targetKey}/timeline` | Semantic event timeline for a target |
- | `perception://targets/recent` | Most recently active target keys |
- | `perception://lens/{lensId}/summary` | Lens attention/guard state |
-
- ### Manual Lens Eviction: FIFO → LRU
-
- Manual lenses (created via `perception_register`) are now evicted by **least-recently-used** instead of insertion order. Using `perception_read`, `evaluatePreToolGuards`, or `buildEnvelopeFor` on a lens promotes it. The hard limit of 16 active lenses is unchanged.
+ ## Advanced response options
 
  ### browser_eval Structured Mode
 
@@ -748,7 +729,7 @@ Flag semantics (exact-match: only the literal string `"1"` counts):
 
  ### Deprecated: `DESKTOP_TOUCH_ENABLE_FUKUWARAI_V2`
 
- This was the opt-in switch in v0.16.x. From v0.17 it is accepted for compatibility but no longer required — the server prints a deprecation warning on startup when it is set. It will be removed in v0.18. Remove it from your config when you upgrade.
+ This was the opt-in switch in v0.16.x. It is still accepted for backward compatibility but no longer required — the server prints a deprecation warning on startup when it is set. Remove it from your config; V2 is on by default.
 
  ### Recovery when V2 fails
 
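Note on the hunk above: its header quotes the README's exact-match rule for flags (only the literal string `"1"` counts), and the changed line describes a startup deprecation warning. The following is a minimal sketch of how such checks could look, not the server's actual code. The helper names `isFlagEnabled` and `deprecationWarnings` are hypothetical, and treating "set" as "defined with any value" is an assumption.

```javascript
// Hypothetical helpers; the server's real flag parser is not part of this diff.

// Exact-match rule: only the literal string "1" enables a flag.
function isFlagEnabled(env, name) {
  return env[name] === "1";
}

// Assumption: the deprecated variable triggers a warning whenever it is
// defined, regardless of its value.
function deprecationWarnings(env) {
  const warnings = [];
  if (env.DESKTOP_TOUCH_ENABLE_FUKUWARAI_V2 !== undefined) {
    warnings.push(
      "DESKTOP_TOUCH_ENABLE_FUKUWARAI_V2 is deprecated: V2 is on by default, remove it from your config."
    );
  }
  return warnings;
}

// Example: print any warnings for the current process environment.
for (const message of deprecationWarnings(process.env)) {
  console.warn(message);
}
```

Under this rule, values such as `"true"`, `"yes"`, or `"1 "` (with trailing whitespace) leave a flag disabled.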
@@ -759,7 +740,7 @@ If `desktop_act` returns `ok: false`, read `reason` and follow the built-in reco
  - `entity_outside_viewport` → `scroll` / `scroll(action='to_element')`, then re-call `desktop_discover`
  - `executor_failed` → fall back to `click_element` / `mouse_click` / `browser_click`
 
- For `desktop_discover` warnings (`visual_provider_unavailable`, `visual_provider_warming`, `cdp_provider_failed`, …), V1 tools (`screenshot`, `click_element`, `get_ui_elements`, `terminal(action='send')`, …) remain available as an escape hatch.
+ For `desktop_discover` warnings (`visual_provider_unavailable`, `visual_provider_warming`, `cdp_provider_failed`, …), the coordinate-based tools (`screenshot(detail='text')`, `click_element`, `mouse_click`, `terminal`, …) remain available as an escape hatch.
 
  ---
 
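Note on the recovery hunk above: a client could encode the two `reason` values visible in this diff as a lookup table. This is a rough sketch only. The `{ ok, reason }` result shape follows the README text quoted in the hunk header, while `RECOVERY_STEPS`, `planRecovery`, and the `kind` labels are hypothetical names. Reasons that fall outside the hunk are deliberately left unmapped.

```javascript
// Hypothetical client-side lookup covering only the reasons visible in this diff.
const RECOVERY_STEPS = new Map([
  // Bring the entity into view, then discover again to get a fresh lease.
  ["entity_outside_viewport", { kind: "sequence", tools: ["scroll", "desktop_discover"] }],
  // Try the coordinate-based tools as alternatives.
  ["executor_failed", { kind: "fallback", tools: ["click_element", "mouse_click", "browser_click"] }],
]);

// Returns null when no recovery is needed or when the reason is unknown here.
function planRecovery(result) {
  if (result.ok) return null;
  return RECOVERY_STEPS.get(result.reason) ?? null;
}

// Example: a failed desktop_act result whose target scrolled out of view.
const plan = planRecovery({ ok: false, reason: "entity_outside_viewport" });
console.log(plan.kind, plan.tools.join(" -> "));
```

A `Map` is used instead of a plain object so that an unexpected `reason` such as `"toString"` cannot match an inherited property.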
package/bin/launcher.js CHANGED
@@ -18,15 +18,15 @@ import path from "node:path";
  import { Readable } from "node:stream";
  import { pipeline } from "node:stream/promises";
 
- const PACKAGE_VERSION = "1.8.0";
+ const PACKAGE_VERSION = "1.9.0";
  const RELEASE_TAG = `v${PACKAGE_VERSION}`;
  const REPO_API_URL = `https://api.github.com/repos/Harusame64/desktop-touch-mcp/releases/tags/${RELEASE_TAG}`;
  const ASSET_NAME = "desktop-touch-mcp-windows.zip";
  const RELEASE_METADATA_FILE = ".desktop-touch-release.json";
  const RELEASE_MANIFEST = {
- tagName: "v1.8.0",
+ tagName: "v1.9.0",
  assetName: ASSET_NAME,
- sha256: "a4b330dba135e74707d082cbe1859ae9176b497373610165037f8ab59de35f8e",
+ sha256: "d1c913ba71986802c335c3d23bd419d58d60ebb36a0d8826a7441dcd5d4403a0",
  };
  const CACHE_ROOT = process.env.DESKTOP_TOUCH_MCP_HOME
  ? path.resolve(process.env.DESKTOP_TOUCH_MCP_HOME)
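Note on the manifest above: it pins a release tag together with a SHA-256 digest, and the README says the runtime zip is verified before extraction. The launcher's verification code is not part of this diff, so the following is only a sketch of how a pinned-digest check could be written with Node's built-in `node:crypto` module. `verifySha256` is a hypothetical name, and hashing an in-memory buffer (rather than streaming the download) is an assumption made for brevity.

```javascript
import { createHash } from "node:crypto";

// Hypothetical helper: compares the SHA-256 of `data` with a pinned hex digest
// and throws on mismatch, so the caller never extracts an unverified archive.
function verifySha256(data, expectedHex) {
  const actual = createHash("sha256").update(data).digest("hex");
  if (actual !== expectedHex.toLowerCase()) {
    throw new Error(`SHA-256 mismatch: expected ${expectedHex}, got ${actual}`);
  }
  return actual;
}

// Example with a well-known test vector (the digest of the ASCII string "abc").
verifySha256(
  Buffer.from("abc"),
  "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
);
```

For a large asset, feeding the download stream into the hash object would avoid holding the whole zip in memory.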
package/package.json CHANGED
@@ -1,8 +1,8 @@
  {
  "name": "@harusame64/desktop-touch-mcp",
- "version": "1.8.0",
+ "version": "1.9.0",
  "mcpName": "io.github.Harusame64/desktop-touch-mcp",
- "description": "Let Claude, Cursor, or any MCP client see and operate your Windows 10/11 desktop. 28 tools for screenshots, UI Automation, Chrome CDP, keyboard/mouse, terminal, with semantic discover-then-act targeting and per-action perception guards that avoid wrong-window typing and stale-coordinate clicks.",
+ "description": "Let Claude, Cursor, or any MCP client see and operate your Windows 10/11 desktop. 29 tools for screenshots, UI Automation, Chrome CDP, keyboard/mouse, terminal, with semantic discover-then-act targeting and per-action perception guards that avoid wrong-window typing and stale-coordinate clicks.",
  "keywords": [
  "mcp",
  "mcp-server",
@@ -81,6 +81,7 @@
  "test:capture": "node scripts/test-capture.mjs",
  "test:unit": "vitest run --project=unit",
  "test:e2e": "vitest run --project=e2e",
+ "e2e:stop": "node scripts/e2e-stop.mjs",
  "test:headed": "HEADED=1 vitest run --project=e2e",
  "test:e2e:browser": "vitest run --project=e2e \"tests/e2e/browser-*.test.ts\"",
  "test:e2e:window": "vitest run --project=e2e tests/e2e/dock-window.test.ts tests/e2e/dock-auto.test.ts tests/e2e/focus-integrity.test.ts tests/e2e/force-focus.test.ts tests/e2e/screenshot-electron.test.ts tests/e2e/ui-elements-cache.test.ts",