@harusame64/desktop-touch-mcp 0.15.3 → 0.15.4
- package/README.ja.md +2 -0
- package/README.md +4 -2
- package/package.json +8 -3
package/README.ja.md
CHANGED
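@@ -10,12 +10,14 @@ Claude sees the desktop directly and operates it directly.
 An MCP server that provides 57 tools integrating mouse, keyboard, screenshots, Windows UI Automation, Chrome DevTools Protocol, terminal, SmartScroll, and the Reactive Perception Graph.
 
 > *v0.15: **82× average speedup** from the Rust native engine: UIA focus queries in 2 ms, SSE2 SIMD image diffing 13–15× faster. Zero config: the engine loads automatically and falls back transparently to PowerShell when absent.*
+> *v0.15.4: **Set-of-Marks (SoM) visual fallback**: games, RDP sessions, and unsupported Electron apps still return actionable elements via OCR + Rust image preprocessing. Works even when UIA is completely unavailable.*
 
 ---
 
 ## Features
 
 - **⚡ High-performance Rust native core**: The UIA bridge and image-diff engine are implemented in Rust (`napi-rs` + `windows-rs`) and loaded as a native `.node` addon. Direct COM calls from a dedicated MTA thread eliminate PowerShell process startup: `getFocusedElement` takes **2 ms** (160× faster), and `getUiElements` completes in **about 100 ms** using a batched BFS algorithm that minimizes cross-process RPC. Image diffing reaches 13–15× throughput with **SSE2 SIMD**. When the native engine is unavailable, every function falls back transparently to PowerShell; no configuration is required.
+- **🎯 Set-of-Marks (SoM) visual fallback**: Even when UIA does not work at all in games, RDP sessions, or unsupported Electron apps, `screenshot(detail="text")` automatically starts the Hybrid Non-CDP pipeline. Rust image preprocessing → Windows OCR → clustering → a PNG annotated with red bounding boxes and numbered badges (`[1]`, `[2]`…), returned together with an element list carrying `clickAt` coordinates. No CDP required.
 - **LLM-native design**: Designed not to imitate human operation but around how an LLM can act quickly without consuming context. Combining batched execution of multiple operations with `run_macro` (fewer API round-trips) and **MPEG P-frame-style layer diffing** (`diffMode`) cuts wasted image transfers and inference loops to the minimum.
 - **Reactive Perception Graph**: Register a `lensId` for a window or browser tab and pass it to subsequent action tools to get a safety guard before each action and `post.perception` feedback after it. This reduces repeated `screenshot` / `get_context` calls and prevents typing into the wrong window or clicking stale coordinates.
 - **Full Japanese/CJK support**: Uses Win32 `GetWindowTextW` to read window titles, avoiding nut-js garbling. IME-bypass input is also supported.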
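The `diffMode` bullet in this hunk describes transferring only what changed since the previous frame. A minimal sketch of that idea follows, assuming each window's pixels are available as a byte buffer. `WindowFrame`, `fnv1a`, and `changedWindows` are hypothetical names chosen for illustration, and fingerprint comparison is a stand-in technique, not the package's actual implementation.

```typescript
// Hypothetical shape: one captured window per entry (not the package's real type).
interface WindowFrame {
  id: string;         // illustrative window identifier
  pixels: Uint8Array; // raw pixel bytes for that window
}

// FNV-1a 32-bit hash: a cheap fingerprint of a pixel buffer.
function fnv1a(bytes: Uint8Array): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < bytes.length; i++) {
    h ^= bytes[i];
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Returns only the windows whose fingerprint differs from the previous frame,
// and updates the fingerprint cache in place.
function changedWindows(
  cache: Map<string, number>,
  frame: WindowFrame[],
): WindowFrame[] {
  const changed: WindowFrame[] = [];
  for (let i = 0; i < frame.length; i++) {
    const w = frame[i];
    const h = fnv1a(w.pixels);
    if (cache.get(w.id) !== h) changed.push(w);
    cache.set(w.id, h);
  }
  return changed;
}
```

The first call reports every window because the cache is empty; later calls report only windows whose bytes changed. A hash can collide in principle, so a production version would likely compare pixels directly or use a stronger digest.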
package/README.md
CHANGED
@@ -1,6 +1,6 @@
 # desktop-touch-mcp
 
-[](https://glama.ai/mcp/servers/Harusame64/desktop-touch-mcp)
 
 [日本語](README.ja.md)
 
@@ -9,12 +9,14 @@
 An MCP server that gives Claude eyes and hands on Windows — 57 tools covering screenshots, mouse, keyboard, Windows UI Automation, Chrome DevTools Protocol, clipboard, desktop notifications, SmartScroll, and a Reactive Perception Graph for safe multi-step automation, designed from the ground up for LLM efficiency.
 
 > *v0.15: **82× average speedup** via Rust native engine — UIA focus queries in 2 ms, SSE2-accelerated image diffing at 13–15× native speed. Zero-config: the engine auto-loads when present, with transparent PowerShell fallback.*
+> *v0.15.4: **Set-of-Marks (SoM) visual fallback** — games, RDP sessions, and non-accessible Electron apps now return clickable elements via OCR + Rust image preprocessing, even when UIA is completely unavailable.*
 
 ---
 
 ## Features
 
 - **⚡ High-performance Rust Native Core** — The UIA bridge and image-diff engine are written in Rust (`napi-rs` + `windows-rs`) and loaded as a native `.node` addon. Direct COM calls from a dedicated MTA thread eliminate PowerShell process spawning — `getFocusedElement` completes in **2 ms** (160× faster), and `getUiElements` returns full trees in **~100 ms** with a batch BFS algorithm that minimizes cross-process RPC. Image-diff operations use **SSE2 SIMD** for 13–15× throughput. When the native engine is unavailable, every function transparently falls back to PowerShell — zero config required.
+- **🎯 Set-of-Marks (SoM) visual fallback** — Games, RDP sessions, and non-accessible Electron apps return clickable elements even when UIA is completely blind. `screenshot(detail="text")` automatically detects UIA sparsity and activates a Hybrid Non-CDP pipeline: Rust-powered grayscale + bilinear upscale → Windows OCR → clustering → red bounding-box annotation with numbered badges (`[1]`, `[2]`…). Two parallel representations returned: a visual PNG for spatial orientation and a semantic `elements[]` list with `clickAt` coords — no CDP required.
 - **LLM-native design** — Built around how LLMs think, not how humans click. `run_macro` batches multiple operations into a single API call; `diffMode` sends only the windows that changed since the last frame. Minimal tokens, minimal round-trips.
 - **Reactive Perception Graph** — Register a `lensId` for a window or browser tab, pass it to action tools, and get guard-checked `post.perception` feedback after each action. It reduces repeated `screenshot` / `get_context` calls and prevents wrong-window typing or stale-coordinate clicks.
 - **Full CJK support** — Uses Win32 `GetWindowTextW` for window titles, avoiding nut-js garbling. IME bypass input supported for Japanese/Chinese/Korean environments.
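The SoM bullet in this hunk names "grayscale + bilinear upscale" as the first pipeline stage before OCR. The package does this in Rust; the sketch below shows the same two operations in TypeScript purely for illustration. The function names, the RGBA input layout, the BT.601 luma weights, and the integer scale factor are all assumptions, not details taken from the package.

```typescript
// Convert RGBA bytes to one luma byte per pixel (BT.601 weights, an assumed choice).
function toGrayscale(rgba: Uint8Array, width: number, height: number): Uint8Array {
  const gray = new Uint8Array(width * height);
  for (let i = 0; i < width * height; i++) {
    const r = rgba[i * 4];
    const g = rgba[i * 4 + 1];
    const b = rgba[i * 4 + 2];
    gray[i] = Math.round(0.299 * r + 0.587 * g + 0.114 * b);
  }
  return gray;
}

// Upscale a grayscale image by an integer factor using bilinear interpolation.
function upscaleBilinear(src: Uint8Array, w: number, h: number, factor: number): Uint8Array {
  const W = w * factor;
  const H = h * factor;
  const dst = new Uint8Array(W * H);
  for (let y = 0; y < H; y++) {
    // Map the destination pixel centre back into source coordinates, clamped to the image.
    const sy = Math.min(h - 1, Math.max(0, (y + 0.5) / factor - 0.5));
    const y0 = Math.floor(sy);
    const y1 = Math.min(h - 1, y0 + 1);
    const fy = sy - y0;
    for (let x = 0; x < W; x++) {
      const sx = Math.min(w - 1, Math.max(0, (x + 0.5) / factor - 0.5));
      const x0 = Math.floor(sx);
      const x1 = Math.min(w - 1, x0 + 1);
      const fx = sx - x0;
      // Blend horizontally on the two neighbouring rows, then vertically.
      const top = src[y0 * w + x0] * (1 - fx) + src[y0 * w + x1] * fx;
      const bottom = src[y1 * w + x0] * (1 - fx) + src[y1 * w + x1] * fx;
      dst[y * W + x] = Math.round(top * (1 - fy) + bottom * fy);
    }
  }
  return dst;
}
```

Upscaling small UI text before OCR is a common way to improve recognition of low-resolution glyphs, which is presumably why the pipeline does it; the factor actually used by the package is not stated in the source.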
@@ -118,7 +120,7 @@ For a local checkout, register the built server directly:
 ### Screenshot (5)
 | Tool | Description |
 |---|---|
-| `screenshot` | Main capture. Supports `detail`, `dotByDot`, `dotByDotMaxDimension`, `grayscale`, `region` sub-crop, `diffMode` |
+| `screenshot` | Main capture. Supports `detail`, `dotByDot`, `dotByDotMaxDimension`, `grayscale`, `region` sub-crop, `diffMode`. `detail="text"` auto-activates the SoM pipeline when UIA is blind (games, RDP, custom Electron) |
 | `screenshot_background` | Capture a background window without focusing it (PrintWindow API) |
 | `screenshot_ocr` | Windows.Media.Ocr on a window; returns word-level text + screen clickAt coords |
 | `get_screen_info` | Monitor layout, DPI, cursor position |
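The updated `screenshot` row says `detail="text"` activates the SoM pipeline, which the README describes as OCR followed by clustering into numbered elements with `clickAt` coordinates. The sketch below shows one plausible way to cluster word-level OCR boxes into such elements. The `OcrWord` and `SomElement` shapes, the `maxGap` threshold, and the same-line heuristic are all hypothetical; the real response schema and clustering rules are not given in this diff.

```typescript
interface Rect { x: number; y: number; width: number; height: number; }
interface OcrWord { text: string; rect: Rect; }               // assumed OCR output shape
interface SomElement {                                        // assumed element shape
  id: number;                                                 // badge number, starting at 1
  text: string;
  rect: Rect;
  clickAt: { x: number; y: number };                          // centre of the bounding box
}

// Two boxes are treated as being on the same line when their vertical centres are close.
function sameLine(a: Rect, b: Rect): boolean {
  const ca = a.y + a.height / 2;
  const cb = b.y + b.height / 2;
  return Math.abs(ca - cb) <= Math.min(a.height, b.height) / 2;
}

// Smallest rectangle covering both inputs.
function union(a: Rect, b: Rect): Rect {
  const x = Math.min(a.x, b.x);
  const y = Math.min(a.y, b.y);
  const right = Math.max(a.x + a.width, b.x + b.width);
  const bottom = Math.max(a.y + a.height, b.y + b.height);
  return { x, y, width: right - x, height: bottom - y };
}

// Greedy left-to-right merge: a word joins a group on the same line
// if the horizontal gap to that group is at most maxGap pixels.
function clusterWords(words: OcrWord[], maxGap = 12): SomElement[] {
  const sorted = words.slice().sort((a, b) => a.rect.x - b.rect.x);
  const groups: { text: string; rect: Rect }[] = [];
  for (const w of sorted) {
    const hit = groups.find(
      (g) => sameLine(g.rect, w.rect) && w.rect.x - (g.rect.x + g.rect.width) <= maxGap,
    );
    if (hit) {
      hit.text += " " + w.text;
      hit.rect = union(hit.rect, w.rect);
    } else {
      groups.push({ text: w.text, rect: { ...w.rect } });
    }
  }
  // Number the groups in reading order: top to bottom, then left to right.
  groups.sort((a, b) => a.rect.y - b.rect.y || a.rect.x - b.rect.x);
  return groups.map((g, i) => ({
    id: i + 1,
    text: g.text,
    rect: g.rect,
    clickAt: {
      x: Math.round(g.rect.x + g.rect.width / 2),
      y: Math.round(g.rect.y + g.rect.height / 2),
    },
  }));
}
```

A caller would pick an element by its badge number and pass its `clickAt` point to a mouse tool. The greedy merge is deliberately simple; it does not handle rotated text or multi-line labels.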
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
 "name": "@harusame64/desktop-touch-mcp",
-"version": "0.15.
+"version": "0.15.4",
 "mcpName": "io.github.Harusame64/desktop-touch-mcp",
 "description": "LLM-native Windows computer-use MCP server with 56 tools for screenshots, UIA, mouse/keyboard, Chrome CDP, terminal, SmartScroll, and perception guards",
 "engines": {
@@ -47,8 +47,13 @@
 "dev": "tsc --watch",
 "test": "vitest run",
 "test:capture": "node scripts/test-capture.mjs",
-"test:
-"test:
+"test:unit": "vitest run --project=unit",
+"test:e2e": "vitest run --project=e2e",
+"test:headed": "HEADED=1 vitest run --project=e2e",
+"test:e2e:browser": "vitest run --project=e2e \"tests/e2e/browser-*.test.ts\"",
+"test:e2e:window": "vitest run --project=e2e tests/e2e/dock-window.test.ts tests/e2e/dock-auto.test.ts tests/e2e/focus-integrity.test.ts tests/e2e/force-focus.test.ts tests/e2e/screenshot-electron.test.ts tests/e2e/ui-elements-cache.test.ts",
+"test:e2e:input": "vitest run --project=e2e tests/e2e/keyboard-focus-lost.test.ts tests/e2e/mouse-focus-lost.test.ts tests/e2e/terminal.test.ts",
+"test:e2e:perception": "vitest run --project=e2e tests/e2e/perception-mvp.test.ts tests/e2e/rich-narration-edge.test.ts",
 "test:watch": "vitest",
 "generate:stub-catalog": "node scripts/generate-stub-tool-catalog.mjs",
 "check:stub-catalog": "node scripts/generate-stub-tool-catalog.mjs && git diff --exit-code src/stub-tool-catalog.ts",