@harusame64/desktop-touch-mcp 1.4.3 → 1.5.0
- package/README.ja.md +7 -3
- package/README.md +7 -3
- package/bin/launcher.js +3 -3
- package/package.json +28 -2
package/README.ja.md
CHANGED
@@ -4,9 +4,13 @@
 
 [English](README.md)
 
->
+> **Computer-use MCP server for Windows.** Lets MCP clients such as Claude, Cursor, and VS Code Copilot "see" and "operate" your Windows 10/11 desktop: screenshots, UI Automation, Chrome CDP, keyboard / mouse, terminal. It features a **semantic discover-then-act design** rather than coordinate roulette, and a **per-action perception guard** that prevents input into the wrong window before it happens.
 
-
+```bash
+npx -y @harusame64/desktop-touch-mcp
+```
+
+28 tools, native Rust engine (UIA in 2 ms), transparent PowerShell fallback, full Japanese/CJK support, MIT. Just add the one line above to the MCP config of Claude / Cursor / VS Code Copilot, and Claude can operate Notepad, Excel, Chrome, Windows Terminal, and any other app.
 
 > *v0.15: **82× faster on average** with the Rust native engine: UIA focus retrieval in 2 ms, SSE2 SIMD image diffing 13–15× faster. No configuration needed: the engine loads automatically and falls back transparently to PowerShell when absent.*
 > *v0.15.5: **Pinned release verification**: the npm launcher fetches only the matching GitHub Release tag and verifies the Windows runtime zip before extracting it.*
@@ -510,7 +514,7 @@ Opt-in flag in v0.16.x. In v0.17, accepted for backward compatibility
 
 | Limitation | Detail | Workaround |
 |---|---|---|
-| Background capture of games / video players is black or hangs | DirectX fullscreen apps and the like, `PW_RENDERFULLCONTENT (flag=2)`
+| Background capture of games / video players is black or hangs | DirectX fullscreen apps and the like may not redraw even with `PW_RENDERFULLCONTENT (flag=2)`. Since v1.4.4, window-targeted `screenshot(detail='image')` switches automatically to the BitBlt fallback when PrintWindow returns nothing or returns an all-black + zero-variance frame, but cases where PrintWindow hangs do not fall back | Switch to the legacy PrintWindow flag with `screenshot({mode:'background', fullContent:false})`. If the result is still black, the BitBlt fallback of the default `mode='normal'` returns the on-screen rect (identifiable via `hints.captureFallbackReason: 'printwindow-all-black'`) |
 | UIA call overhead | Rust native: focus retrieval ~2 ms, tree traversal ~100 ms. PowerShell fallback: ~300 ms | Fetch everything at once with `workspace_snapshot` before acting, then check differences with `diffMode` |
 | Chrome / WinUI3 UIA elements are empty | Chromium exposes UIA only to a limited extent | Use DOM-based clicks with `browser_open` + `browser_locate`. For visual confirmation only, use `screenshot(detail="image")` |
 | Layer buffer TTL | The buffer is cleared automatically after 90 seconds without activity → the next `diffMode` becomes an I-frame | After a long wait, reset explicitly with `workspace_snapshot` |
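The capture row in this hunk describes two conditions that trigger the BitBlt fallback: PrintWindow returning nothing, and PrintWindow returning an all-black, zero-variance frame. The sketch below illustrates that decision on a grayscale pixel buffer. It is not the package's implementation: the function names, the grayscale input, and the `printwindow-no-data` label are assumptions made for illustration. Only the `printwindow-all-black` reason string appears in the README.

```javascript
// Hypothetical sketch of the fallback decision described in the table row.
// Assumption: the PrintWindow result is available as a Uint8Array of
// grayscale pixel values (0 = black), or null/empty when nothing came back.

function isAllBlackZeroVariance(pixels) {
  if (pixels.length === 0) return false;
  let sum = 0;
  let sumSq = 0;
  for (const p of pixels) {
    sum += p;
    sumSq += p * p;
  }
  const mean = sum / pixels.length;
  const variance = sumSq / pixels.length - mean * mean;
  // Mean 0 with variance 0 means every pixel is exactly black.
  return mean === 0 && variance === 0;
}

// Returns a reason string when the BitBlt fallback should be used,
// or null when the PrintWindow frame looks usable.
function captureFallbackReason(printWindowPixels) {
  if (printWindowPixels == null || printWindowPixels.length === 0) {
    return "printwindow-no-data"; // hypothetical label, not from the README
  }
  if (isAllBlackZeroVariance(printWindowPixels)) {
    return "printwindow-all-black"; // reason string quoted in the README
  }
  return null;
}

console.log(captureFallbackReason(new Uint8Array(16)));
console.log(captureFallbackReason(Uint8Array.from([0, 0, 5, 0])));
```

A hung PrintWindow call never returns a buffer at all, which is why the row notes that this case is not covered by the fallback.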
package/README.md
CHANGED
@@ -4,9 +4,13 @@
 
 [日本語](README.ja.md)
 
-> **
+> **Computer-use MCP server for Windows.** Lets Claude, Cursor, or any MCP client see and operate your Windows 10/11 desktop — screenshots, UI Automation, Chrome CDP, keyboard / mouse, terminal — with **semantic discover-then-act targeting** that avoids pixel-coordinate guessing, and **per-action perception guards** that catch wrong-window typing before it happens.
 
-
+```bash
+npx -y @harusame64/desktop-touch-mcp
+```
+
+28 tools, native Rust engine (UIA in 2 ms), zero-config PowerShell fallback, full CJK support, MIT licensed. Add the snippet above to your Claude / Cursor / VS Code Copilot config and Claude can drive Notepad, Excel, Chrome, Windows Terminal, and any other app on your machine.
 
 > *v0.15: **82× average speedup** via Rust native engine — UIA focus queries in 2 ms, SSE2-accelerated image diffing at 13–15× native speed. Zero-config: the engine auto-loads when present, with transparent PowerShell fallback.*
 > *v0.15.5: **Pinned release verification** — the npm launcher now fetches only the matching GitHub Release tag and verifies the Windows runtime zip before extraction.*
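The new README text says to add the `npx` command to the MCP client's config but this hunk does not show what that entry looks like. As a rough sketch, the snippet below builds an entry in the `mcpServers` JSON shape that Claude Desktop reads. The `desktop-touch` label is an arbitrary name chosen here, and other clients may expect a different file or key layout, so treat the shape as an assumption rather than the package's documented setup.

```javascript
// Sketch: build an MCP client config entry for the npx command in the README.
// Assumption: the client reads a JSON object with an "mcpServers" map
// (the shape used by Claude Desktop). "desktop-touch" is an arbitrary label.
function buildMcpConfig(serverLabel) {
  return {
    mcpServers: {
      [serverLabel]: {
        command: "npx",
        args: ["-y", "@harusame64/desktop-touch-mcp"],
      },
    },
  };
}

const config = buildMcpConfig("desktop-touch");

// Print the JSON that would be merged into the client's config file.
console.log(JSON.stringify(config, null, 2));
```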
@@ -703,7 +707,7 @@ For `desktop_discover` warnings (`visual_provider_unavailable`, `visual_provider
 
 | Limitation | Detail | Workaround |
 |---|---|---|
-| Games / video players may return black or hang in
+| Games / video players may return black or hang in PrintWindow capture | DirectX fullscreen apps may not redraw under `PW_RENDERFULLCONTENT`. Window-targeted `screenshot(detail='image')` already falls back to BitBlt automatically when PrintWindow returns no data or an all-black + zero-variance frame, but DirectX surfaces that hang the call don't surface as fallback. | Retry with `screenshot({mode:'background', fullContent:false})` to switch to the legacy PrintWindow flag; if still black, the BitBlt fallback path (default `mode='normal'`) will at least return the on-screen rect — `hints.captureFallbackReason` will say `printwindow-all-black` |
 | UIA call overhead | ~2 ms (focus) / ~100 ms (tree) via Rust native engine; ~300 ms via PowerShell fallback | Rust engine loads automatically; `workspace_snapshot` uses a 2 s timeout internally |
 | Chrome / WinUI3 UIA elements are empty | Chromium exposes only limited UIA | `screenshot(detail='text')` auto-detects Chromium and falls back to Windows OCR (`hints.chromiumGuard=true`). For richer DOM access use `browser_open` + `browser_locate` |
 | Chromium title-regex misses when sites rewrite `document.title` | Guard relies on the ` - Google Chrome` suffix being present; some sites push it off the end of a long title | Title is treated as plain Chrome (UIA runs). OCR path is still reachable via `ocrFallback='always'` or when UIA returns `<5` elements (`uiaSparse`) |
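The last two rows of this table describe how the OCR path is chosen: a title guard keyed on the ` - Google Chrome` suffix, an `ocrFallback='always'` override, and a sparse-UIA threshold of fewer than 5 elements. The sketch below shows one way those three conditions could combine. The regex, the function names, the argument object, and the `"auto"` default are all assumptions for illustration. The README only states the suffix dependency, the override value, and the `<5` threshold.

```javascript
// Hypothetical sketch of the OCR decision described in the table.
// The regex and names below are illustrative, not the package's code.
const CHROME_TITLE_SUFFIX = / - Google Chrome$/;

// True when the window title still ends with the Chrome suffix.
function looksLikeChromium(windowTitle) {
  return CHROME_TITLE_SUFFIX.test(windowTitle);
}

// ocrFallback: "always" is quoted in the README; "auto" is an assumed default.
function shouldUseOcr({ windowTitle, uiaElementCount, ocrFallback = "auto" }) {
  if (ocrFallback === "always") return true; // explicit override
  if (looksLikeChromium(windowTitle)) return true; // title guard matched
  return uiaElementCount < 5; // sparse UIA tree ("uiaSparse" in the README)
}

// A title that lost its suffix misses the guard unless UIA is sparse.
console.log(shouldUseOcr({ windowTitle: "Docs - Google Chrome", uiaElementCount: 50 }));
console.log(shouldUseOcr({ windowTitle: "A long title that lost its suf", uiaElementCount: 50 }));
```

The second call illustrates the limitation in the final row: once a site pushes the suffix off the end of the title, the guard cannot match and only the override or the sparse-UIA check reaches OCR.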
package/bin/launcher.js
CHANGED
@@ -18,15 +18,15 @@ import path from "node:path";
 import { Readable } from "node:stream";
 import { pipeline } from "node:stream/promises";
 
-const PACKAGE_VERSION = "1.
+const PACKAGE_VERSION = "1.5.0";
 const RELEASE_TAG = `v${PACKAGE_VERSION}`;
 const REPO_API_URL = `https://api.github.com/repos/Harusame64/desktop-touch-mcp/releases/tags/${RELEASE_TAG}`;
 const ASSET_NAME = "desktop-touch-mcp-windows.zip";
 const RELEASE_METADATA_FILE = ".desktop-touch-release.json";
 const RELEASE_MANIFEST = {
-  tagName: "v1.
+  tagName: "v1.5.0",
   assetName: ASSET_NAME,
-  sha256: "
+  sha256: "d8c056709d34f46024bfb7918fc6cf8da0847ede613e9b4c4609ea3d34d4431c",
 };
 const CACHE_ROOT = process.env.DESKTOP_TOUCH_MCP_HOME
   ? path.resolve(process.env.DESKTOP_TOUCH_MCP_HOME)
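`RELEASE_MANIFEST` pins a tag, an asset name, and a SHA-256 digest, and the README describes the launcher as verifying the runtime zip before extraction. The hunk does not show the verification code itself, so the following is only a sketch of what checking a downloaded buffer against such a manifest could look like. `verifyAsset` and `demoManifest` are hypothetical names, and the sketch uses CommonJS `require` so it runs standalone, whereas the launcher uses ES module imports.

```javascript
// Minimal sketch of pinned-hash verification; not the launcher's actual code.
const { createHash } = require("node:crypto");

// Hypothetical helper: hash the downloaded bytes and compare them with the
// sha256 pinned in a manifest shaped like RELEASE_MANIFEST.
function verifyAsset(bytes, manifest) {
  const actual = createHash("sha256").update(bytes).digest("hex");
  if (actual !== manifest.sha256.toLowerCase()) {
    throw new Error(
      `sha256 mismatch for ${manifest.assetName}: expected ${manifest.sha256}, got ${actual}`
    );
  }
  return actual;
}

// Demo manifest pinned to the well-known SHA-256 of empty input,
// so the example runs without downloading anything.
const demoManifest = {
  tagName: "v0.0.0-demo",
  assetName: "demo.zip",
  sha256: "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
};

console.log(verifyAsset(Buffer.alloc(0), demoManifest));
```

Pinning the digest in the published package means a replaced or tampered release asset fails the comparison even if it is served under the expected tag and file name.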
package/package.json
CHANGED
@@ -1,8 +1,34 @@
 {
   "name": "@harusame64/desktop-touch-mcp",
-  "version": "1.
+  "version": "1.5.0",
   "mcpName": "io.github.Harusame64/desktop-touch-mcp",
-  "description": "
+  "description": "Let Claude, Cursor, or any MCP client see and operate your Windows 10/11 desktop. 28 tools for screenshots, UI Automation, Chrome CDP, keyboard/mouse, terminal, with semantic discover-then-act targeting and per-action perception guards that avoid wrong-window typing and stale-coordinate clicks.",
+  "keywords": [
+    "mcp",
+    "mcp-server",
+    "model-context-protocol",
+    "claude",
+    "claude-desktop",
+    "claude-code",
+    "anthropic",
+    "cursor",
+    "vscode-copilot",
+    "computer-use",
+    "agentic",
+    "ai-agent",
+    "llm",
+    "windows",
+    "windows-automation",
+    "win32",
+    "uia",
+    "ui-automation",
+    "screenshot",
+    "screen-capture",
+    "chrome-cdp",
+    "chrome-devtools-protocol",
+    "terminal-automation",
+    "desktop-automation"
+  ],
   "engines": {
     "node": ">=20.0.0"
   },