@harusame64/desktop-touch-mcp 0.15.3 → 0.15.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.ja.md CHANGED
@@ -10,12 +10,14 @@ Claude がデスクトップを直接見て、直接操作する。
10
10
  マウス・キーボード・スクリーンショット・Windows UI Automation・Chrome DevTools Protocol・ターミナル・SmartScroll・Reactive Perception Graph を統合した 57 のツールを提供する MCP サーバーです。
11
11
 
12
12
  > *v0.15: Rust ネイティブエンジンにより**平均 82 倍高速化** — UIA フォーカス取得 2ms、SSE2 SIMD 画像差分 13〜15 倍速。設定不要:エンジンは自動ロード、不在時は PowerShell に透過フォールバック。*
13
+ > *v0.15.5: **固定リリース検証** — npm ランチャーは対応する GitHub Release tag だけを取得し、Windows runtime zip を検証してから展開します。*
13
14
 
14
15
  ---
15
16
 
16
17
  ## 特徴
17
18
 
18
19
  - **⚡ 高性能 Rust ネイティブコア** — UIA ブリッジと画像差分エンジンを Rust (`napi-rs` + `windows-rs`) で実装し、ネイティブ `.node` アドオンとしてロード。専用 MTA スレッドからの直接 COM 呼び出しにより PowerShell プロセス起動を排除 — `getFocusedElement` は **2ms**(160 倍高速)、`getUiElements` はバッチ型 BFS アルゴリズムでクロスプロセス RPC を最小化し **約 100ms** で完了。画像差分は **SSE2 SIMD** で 13〜15 倍のスループット。ネイティブエンジンが利用不可の場合、全関数が PowerShell に透過フォールバック — 設定不要。
20
+ - **🎯 Set-of-Marks (SoM) ビジュアルフォールバック** — ゲーム・RDP・非対応 Electron アプリで UIA が完全に機能しない場合でも、`screenshot(detail="text")` が Hybrid Non-CDP パイプラインを自動起動。Rust 画像前処理 → Windows OCR → クラスタリング → 赤い枠線 + 番号バッジ(`[1]`、`[2]`…)付き PNG 画像を生成し、`clickAt` 座標付きの要素リストを返します。CDP 不要。
19
21
  - **LLM ネイティブ設計** — 人間の操作を模倣するのではなく、「LLM がいかにコンテキストを消費せず高速に動けるか」を前提に設計。`run_macro` による複数操作の一括実行(API 往復の削減)と、**MPEG P-frame 方式のレイヤー差分** (`diffMode`) を組み合わせることで、無駄な画像転送や推論ループを極限まで削ぎ落とす。
20
22
  - **Reactive Perception Graph** — ウィンドウやブラウザタブに `lensId` を登録し、以後の action tool に渡すだけで、操作前の安全 guard と操作後の `post.perception` フィードバックを受け取れます。`screenshot` / `get_context` の反復を減らし、別ウィンドウへの誤入力や古い座標クリックを防ぎます。
21
23
  - **日本語/CJK 完全対応** — ウィンドウタイトル取得に Win32 `GetWindowTextW` を使用。nut-js の文字化けを回避。IME バイパス入力にも対応。
@@ -49,7 +51,9 @@ Claude がデスクトップを直接見て、直接操作する。
49
51
  npx -y @harusame64/desktop-touch-mcp
50
52
  ```
51
53
 
52
- npm ランチャーは初回起動時に GitHub Releases から最新の `desktop-touch-mcp-windows.zip` をダウンロードし、`%USERPROFILE%\.desktop-touch-mcp` に展開してキャッシュします。以後は新しい GitHub Release が出ていない限り、キャッシュ済みの実体を再利用します。
54
+ npm ランチャーは npm package version に厳密に対応する runtime だけを取得します。`X.Y.Z` を実行した場合は GitHub Release `vX.Y.Z` のみを参照し、`desktop-touch-mcp-windows.zip` をダウンロードして SHA256 を検証できた場合にだけ `%USERPROFILE%\.desktop-touch-mcp` へ展開します。検証済みキャッシュは次回以降も再利用されます。
55
+
56
+ キャッシュの保存先は `DESKTOP_TOUCH_MCP_HOME` で変更できます。
53
57
 
54
58
  ### Claude CLI への登録
55
59
 
@@ -90,7 +94,11 @@ cd desktop-touch-mcp
90
94
  npm install
91
95
  ```
92
96
 
93
- `npm install` 実行時に `prepare` スクリプトが TypeScript を `dist/` にコンパイルします。別途 `npm run build` は不要です。
97
+ `npm install` 後にビルドを実行してください。
98
+
99
+ ```bash
100
+ npm run build
101
+ ```
94
102
 
95
103
  ローカルチェックアウトを使う場合は、ビルド済みのサーバーを直接登録します。
96
104
 
package/README.md CHANGED
@@ -1,6 +1,6 @@
1
1
  # desktop-touch-mcp
2
2
 
3
- [![desktop-touch-mcp MCP server](https://glama.ai/mcp/servers/Harusame64/desktop-touch-mcp/badges/score.svg)](https://glama.ai/mcp/servers/Harusame64/desktop-touch-mcp)
3
+ [![desktop-touch-mcp MCP server](https://glama.ai/mcp/servers/Harusame64/desktop-touch-mcp/badges/card.svg)](https://glama.ai/mcp/servers/Harusame64/desktop-touch-mcp)
4
4
 
5
5
  [日本語](README.ja.md)
6
6
 
@@ -9,12 +9,14 @@
9
9
  An MCP server that gives Claude eyes and hands on Windows — 57 tools covering screenshots, mouse, keyboard, Windows UI Automation, Chrome DevTools Protocol, clipboard, desktop notifications, SmartScroll, and a Reactive Perception Graph for safe multi-step automation, designed from the ground up for LLM efficiency.
10
10
 
11
11
  > *v0.15: **82× average speedup** via Rust native engine — UIA focus queries in 2 ms, SSE2-accelerated image diffing at 13–15× native speed. Zero-config: the engine auto-loads when present, with transparent PowerShell fallback.*
12
+ > *v0.15.5: **Pinned release verification** — the npm launcher now fetches only the matching GitHub Release tag and verifies the Windows runtime zip before extraction.*
12
13
 
13
14
  ---
14
15
 
15
16
  ## Features
16
17
 
17
18
  - **⚡ High-performance Rust Native Core** — The UIA bridge and image-diff engine are written in Rust (`napi-rs` + `windows-rs`) and loaded as a native `.node` addon. Direct COM calls from a dedicated MTA thread eliminate PowerShell process spawning — `getFocusedElement` completes in **2 ms** (160× faster), and `getUiElements` returns full trees in **~100 ms** with a batch BFS algorithm that minimizes cross-process RPC. Image-diff operations use **SSE2 SIMD** for 13–15× throughput. When the native engine is unavailable, every function transparently falls back to PowerShell — zero config required.
19
+ - **🎯 Set-of-Marks (SoM) visual fallback** — Games, RDP sessions, and non-accessible Electron apps return clickable elements even when UIA is completely blind. `screenshot(detail="text")` automatically detects UIA sparsity and activates a Hybrid Non-CDP pipeline: Rust-powered grayscale + bilinear upscale → Windows OCR → clustering → red bounding-box annotation with numbered badges (`[1]`, `[2]`…). Two parallel representations returned: a visual PNG for spatial orientation and a semantic `elements[]` list with `clickAt` coords — no CDP required.
18
20
  - **LLM-native design** — Built around how LLMs think, not how humans click. `run_macro` batches multiple operations into a single API call; `diffMode` sends only the windows that changed since the last frame. Minimal tokens, minimal round-trips.
19
21
  - **Reactive Perception Graph** — Register a `lensId` for a window or browser tab, pass it to action tools, and get guard-checked `post.perception` feedback after each action. It reduces repeated `screenshot` / `get_context` calls and prevents wrong-window typing or stale-coordinate clicks.
20
22
  - **Full CJK support** — Uses Win32 `GetWindowTextW` for window titles, avoiding nut-js garbling. IME bypass input supported for Japanese/Chinese/Korean environments.
@@ -48,7 +50,9 @@ An MCP server that gives Claude eyes and hands on Windows — 57 tools covering
48
50
  npx -y @harusame64/desktop-touch-mcp
49
51
  ```
50
52
 
51
- The npm launcher downloads the latest `desktop-touch-mcp-windows.zip` from GitHub Releases on first run and caches it under `%USERPROFILE%\.desktop-touch-mcp`. Later runs reuse the cached release unless a newer GitHub Release is available.
53
+ The npm launcher resolves runtime strictly by npm package version. For package `X.Y.Z`, it fetches only GitHub Release tag `vX.Y.Z`, downloads `desktop-touch-mcp-windows.zip`, verifies its SHA256 digest, and only then expands it under `%USERPROFILE%\.desktop-touch-mcp`. Verified cached releases are reused on later runs.
54
+
55
+ Set `DESKTOP_TOUCH_MCP_HOME` to override the cache root directory.
52
56
 
53
57
  ### Register with Claude CLI
54
58
 
@@ -90,7 +94,11 @@ cd desktop-touch-mcp
90
94
  npm install
91
95
  ```
92
96
 
93
- `npm install` runs the `prepare` script, which compiles TypeScript to `dist/`. No separate build step is required.
97
+ Build after install:
98
+
99
+ ```bash
100
+ npm run build
101
+ ```
94
102
 
95
103
  For a local checkout, register the built server directly:
96
104
 
@@ -118,7 +126,7 @@ For a local checkout, register the built server directly:
118
126
  ### Screenshot (5)
119
127
  | Tool | Description |
120
128
  |---|---|
121
- | `screenshot` | Main capture. Supports `detail`, `dotByDot`, `dotByDotMaxDimension`, `grayscale`, `region` sub-crop, `diffMode` |
129
+ | `screenshot` | Main capture. Supports `detail`, `dotByDot`, `dotByDotMaxDimension`, `grayscale`, `region` sub-crop, `diffMode`. `detail="text"` auto-activates the SoM pipeline when UIA is blind (games, RDP, custom Electron) |
122
130
  | `screenshot_background` | Capture a background window without focusing it (PrintWindow API) |
123
131
  | `screenshot_ocr` | Windows.Media.Ocr on a window; returns word-level text + screen clickAt coords |
124
132
  | `get_screen_info` | Monitor layout, DPI, cursor position |
package/bin/launcher.js CHANGED
@@ -1,7 +1,7 @@
1
1
  #!/usr/bin/env node
2
2
 
3
3
  import { execFile, spawn } from "node:child_process";
4
- import { createWriteStream, existsSync } from "node:fs";
4
+ import { createReadStream, createWriteStream, existsSync } from "node:fs";
5
5
  import {
6
6
  mkdir,
7
7
  mkdtemp,
@@ -11,13 +11,22 @@ import {
11
11
  rm,
12
12
  writeFile,
13
13
  } from "node:fs/promises";
14
+ import { createHash } from "node:crypto";
14
15
  import os from "node:os";
15
16
  import path from "node:path";
16
17
  import { Readable } from "node:stream";
17
18
  import { pipeline } from "node:stream/promises";
18
19
 
19
- const REPO_API_URL = "https://api.github.com/repos/Harusame64/desktop-touch-mcp/releases/latest";
20
+ const PACKAGE_VERSION = "0.15.5";
21
+ const RELEASE_TAG = `v${PACKAGE_VERSION}`;
22
+ const REPO_API_URL = `https://api.github.com/repos/Harusame64/desktop-touch-mcp/releases/tags/${RELEASE_TAG}`;
20
23
  const ASSET_NAME = "desktop-touch-mcp-windows.zip";
24
+ const RELEASE_METADATA_FILE = ".desktop-touch-release.json";
25
+ const RELEASE_MANIFEST = {
26
+ tagName: "v0.15.5",
27
+ assetName: ASSET_NAME,
28
+ sha256: "0150c5b1b2bfda9b39f2401bab49bf9fd02e5db1f3915300490ea7a93603f402",
29
+ };
21
30
  const CACHE_ROOT = process.env.DESKTOP_TOUCH_MCP_HOME
22
31
  ? path.resolve(process.env.DESKTOP_TOUCH_MCP_HOME)
23
32
  : path.join(os.homedir(), ".desktop-touch-mcp");
@@ -42,33 +51,83 @@ function releaseDirForTag(tagName) {
42
51
  return path.join(RELEASES_DIR, tagToDirName(tagName));
43
52
  }
44
53
 
45
- async function isInstalled(releaseDir) {
46
- return existsSync(path.join(releaseDir, "dist", "index.js"));
54
+ function releaseMetadataPath(releaseDir) {
55
+ return path.join(releaseDir, RELEASE_METADATA_FILE);
47
56
  }
48
57
 
49
- async function readCurrentRelease() {
58
+ function expectedReleaseSpec() {
59
+ if (RELEASE_MANIFEST.tagName !== RELEASE_TAG) {
60
+ throw new Error(
61
+ `Release manifest mismatch: PACKAGE_VERSION=${PACKAGE_VERSION}, manifest=${RELEASE_MANIFEST.tagName}`
62
+ );
63
+ }
64
+ if (!RELEASE_MANIFEST.sha256 || RELEASE_MANIFEST.assetName !== ASSET_NAME) {
65
+ throw new Error(`Missing release manifest for ${RELEASE_TAG}`);
66
+ }
67
+ if (!/^[a-f0-9]{64}$/i.test(RELEASE_MANIFEST.sha256)) {
68
+ throw new Error(`Invalid release SHA256 manifest for ${RELEASE_TAG}`);
69
+ }
70
+ return {
71
+ tagName: RELEASE_MANIFEST.tagName,
72
+ assetName: RELEASE_MANIFEST.assetName,
73
+ sha256: String(RELEASE_MANIFEST.sha256).toLowerCase(),
74
+ };
75
+ }
76
+
77
+ async function readReleaseMetadata(releaseDir) {
78
+ try {
79
+ const raw = await readFile(releaseMetadataPath(releaseDir), "utf8");
80
+ return JSON.parse(raw);
81
+ } catch {
82
+ return null;
83
+ }
84
+ }
85
+
86
+ async function isInstalled(releaseDir, expected) {
87
+ if (!existsSync(path.join(releaseDir, "dist", "index.js"))) return false;
88
+ const metadata = await readReleaseMetadata(releaseDir);
89
+ if (!metadata) return false;
90
+ return (
91
+ metadata.tagName === expected.tagName &&
92
+ metadata.assetName === expected.assetName &&
93
+ String(metadata.sha256 || "").toLowerCase() === expected.sha256
94
+ );
95
+ }
96
+
97
+ async function readCurrentRelease(expected) {
50
98
  try {
51
99
  const raw = await readFile(CURRENT_FILE, "utf8");
52
100
  const parsed = JSON.parse(raw);
53
101
  if (typeof parsed?.tagName !== "string") return null;
102
+ if (parsed.tagName !== expected.tagName) return null;
103
+ if (parsed.assetName !== expected.assetName) return null;
104
+ if (String(parsed.sha256 || "").toLowerCase() !== expected.sha256) return null;
54
105
  const releaseDir = releaseDirForTag(parsed.tagName);
55
- if (!(await isInstalled(releaseDir))) return null;
106
+ if (!(await isInstalled(releaseDir, expected))) return null;
56
107
  return { tagName: parsed.tagName, releaseDir };
57
108
  } catch {
58
109
  return null;
59
110
  }
60
111
  }
61
112
 
62
- async function writeCurrentRelease(tagName) {
113
+ async function writeReleaseMetadata(releaseDir, expected) {
114
+ await writeFile(
115
+ releaseMetadataPath(releaseDir),
116
+ `${JSON.stringify({ ...expected, updatedAt: new Date().toISOString() }, null, 2)}\n`,
117
+ "utf8"
118
+ );
119
+ }
120
+
121
+ async function writeCurrentRelease(expected) {
63
122
  await mkdir(CACHE_ROOT, { recursive: true });
64
123
  await writeFile(
65
124
  CURRENT_FILE,
66
- `${JSON.stringify({ tagName, updatedAt: new Date().toISOString() }, null, 2)}\n`,
125
+ `${JSON.stringify({ ...expected, updatedAt: new Date().toISOString() }, null, 2)}\n`,
67
126
  "utf8"
68
127
  );
69
128
  }
70
129
 
71
- async function fetchLatestRelease() {
130
+ async function fetchReleaseByTag(expected) {
72
131
  const response = await fetch(REPO_API_URL, {
73
132
  headers: {
74
133
  "Accept": "application/vnd.github+json",
@@ -77,7 +136,7 @@ async function fetchLatestRelease() {
77
136
  });
78
137
 
79
138
  if (!response.ok) {
80
- throw new Error(`GitHub Releases API returned ${response.status} ${response.statusText}`);
139
+ throw new Error(`GitHub Releases API returned ${response.status} ${response.statusText} for ${expected.tagName}`);
81
140
  }
82
141
 
83
142
  const release = await response.json();
@@ -86,13 +145,16 @@ async function fetchLatestRelease() {
86
145
  : undefined;
87
146
 
88
147
  if (!release.tag_name || !asset?.browser_download_url) {
89
- throw new Error(`Latest release does not contain ${ASSET_NAME}`);
148
+ throw new Error(`Release ${expected.tagName} does not contain ${ASSET_NAME}`);
90
149
  }
91
150
 
92
151
  const tagName = String(release.tag_name);
93
152
  if (!/^v\d+\.\d+\.\d+$/.test(tagName)) {
94
153
  throw new Error(`Unexpected tag format: ${tagName}`);
95
154
  }
155
+ if (tagName !== expected.tagName) {
156
+ throw new Error(`Unexpected tag: expected ${expected.tagName}, got ${tagName}`);
157
+ }
96
158
 
97
159
  return {
98
160
  tagName,
@@ -100,6 +162,25 @@ async function fetchLatestRelease() {
100
162
  };
101
163
  }
102
164
 
165
+ async function sha256File(filePath) {
166
+ const hash = createHash("sha256");
167
+ await new Promise((resolve, reject) => {
168
+ const stream = createReadStream(filePath);
169
+ stream.on("error", reject);
170
+ stream.on("data", (chunk) => hash.update(chunk));
171
+ stream.on("end", resolve);
172
+ });
173
+ return hash.digest("hex").toLowerCase();
174
+ }
175
+
176
+ async function verifySha256(filePath, expectedSha256) {
177
+ const actual = await sha256File(filePath);
178
+ const expected = String(expectedSha256).toLowerCase();
179
+ if (actual !== expected) {
180
+ throw new Error(`SHA256 mismatch for ${ASSET_NAME}: expected ${expected}, got ${actual}`);
181
+ }
182
+ }
183
+
103
184
  async function downloadFile(url, destination) {
104
185
  const response = await fetch(url, {
105
186
  headers: {
@@ -144,19 +225,19 @@ async function expandZip(zipPath, destination) {
144
225
  }
145
226
 
146
227
  async function findExtractedRoot(extractDir) {
147
- if (await isInstalled(extractDir)) return extractDir;
228
+ if (existsSync(path.join(extractDir, "dist", "index.js"))) return extractDir;
148
229
 
149
230
  const entries = await readdir(extractDir, { withFileTypes: true });
150
231
  for (const entry of entries) {
151
232
  if (!entry.isDirectory()) continue;
152
233
  const candidate = path.join(extractDir, entry.name);
153
- if (await isInstalled(candidate)) return candidate;
234
+ if (existsSync(path.join(candidate, "dist", "index.js"))) return candidate;
154
235
  }
155
236
 
156
237
  throw new Error("Release zip did not contain dist/index.js");
157
238
  }
158
239
 
159
- async function installRelease(release) {
240
+ async function installRelease(release, expected) {
160
241
  await mkdir(RELEASES_DIR, { recursive: true });
161
242
 
162
243
  const targetDir = releaseDirForTag(release.tagName);
@@ -167,13 +248,15 @@ async function installRelease(release) {
167
248
  try {
168
249
  log(`Downloading ${ASSET_NAME} from ${release.tagName}`);
169
250
  await downloadFile(release.assetUrl, zipPath);
251
+ await verifySha256(zipPath, expected.sha256);
170
252
  await mkdir(extractDir, { recursive: true });
171
253
  await expandZip(zipPath, extractDir);
172
254
 
173
255
  const extractedRoot = await findExtractedRoot(extractDir);
174
256
  await rm(targetDir, { recursive: true, force: true });
175
257
  await rename(extractedRoot, targetDir);
176
- await writeCurrentRelease(release.tagName);
258
+ await writeReleaseMetadata(targetDir, expected);
259
+ await writeCurrentRelease(expected);
177
260
  log(`Installed ${release.tagName} to ${targetDir}`);
178
261
  return targetDir;
179
262
  } finally {
@@ -182,26 +265,21 @@ async function installRelease(release) {
182
265
  }
183
266
 
184
267
  async function ensureRelease() {
185
- const current = await readCurrentRelease();
186
-
187
- let latest;
188
- try {
189
- latest = await fetchLatestRelease();
190
- } catch (error) {
191
- if (current) {
192
- log(`Could not check GitHub Releases; using cached ${current.tagName}`);
193
- return current.releaseDir;
194
- }
195
- throw error;
268
+ const expected = expectedReleaseSpec();
269
+ const targetDir = releaseDirForTag(expected.tagName);
270
+ if (await isInstalled(targetDir, expected)) {
271
+ await writeCurrentRelease(expected);
272
+ return targetDir;
196
273
  }
197
274
 
198
- const targetDir = releaseDirForTag(latest.tagName);
199
- if (await isInstalled(targetDir)) {
200
- await writeCurrentRelease(latest.tagName);
201
- return targetDir;
275
+ const current = await readCurrentRelease(expected);
276
+ if (current) {
277
+ return current.releaseDir;
202
278
  }
203
279
 
204
- return installRelease(latest);
280
+ const release = await fetchReleaseByTag(expected);
281
+
282
+ return installRelease(release, expected);
205
283
  }
206
284
 
207
285
  function launchServer(releaseDir) {
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@harusame64/desktop-touch-mcp",
3
- "version": "0.15.3",
3
+ "version": "0.15.5",
4
4
  "mcpName": "io.github.Harusame64/desktop-touch-mcp",
5
5
  "description": "LLM-native Windows computer-use MCP server with 56 tools for screenshots, UIA, mouse/keyboard, Chrome CDP, terminal, SmartScroll, and perception guards",
6
6
  "engines": {
@@ -41,14 +41,20 @@
41
41
  "scripts": {
42
42
  "build": "tsc",
43
43
  "sync-version": "node scripts/sync-version.mjs",
44
- "version": "npm run sync-version && git add src/version.ts",
44
+ "check:launcher-manifest": "node scripts/check-launcher-manifest.mjs",
45
+ "version": "npm run sync-version && git add src/version.ts bin/launcher.js",
45
46
  "prepare": "tsc",
46
47
  "start": "node dist/index.js",
47
48
  "dev": "tsc --watch",
48
49
  "test": "vitest run",
49
50
  "test:capture": "node scripts/test-capture.mjs",
50
- "test:e2e": "vitest run tests/e2e",
51
- "test:headed": "HEADED=1 vitest run tests/e2e",
51
+ "test:unit": "vitest run --project=unit",
52
+ "test:e2e": "vitest run --project=e2e",
53
+ "test:headed": "HEADED=1 vitest run --project=e2e",
54
+ "test:e2e:browser": "vitest run --project=e2e \"tests/e2e/browser-*.test.ts\"",
55
+ "test:e2e:window": "vitest run --project=e2e tests/e2e/dock-window.test.ts tests/e2e/dock-auto.test.ts tests/e2e/focus-integrity.test.ts tests/e2e/force-focus.test.ts tests/e2e/screenshot-electron.test.ts tests/e2e/ui-elements-cache.test.ts",
56
+ "test:e2e:input": "vitest run --project=e2e tests/e2e/keyboard-focus-lost.test.ts tests/e2e/mouse-focus-lost.test.ts tests/e2e/terminal.test.ts",
57
+ "test:e2e:perception": "vitest run --project=e2e tests/e2e/perception-mvp.test.ts tests/e2e/rich-narration-edge.test.ts",
52
58
  "test:watch": "vitest",
53
59
  "generate:stub-catalog": "node scripts/generate-stub-tool-catalog.mjs",
54
60
  "check:stub-catalog": "node scripts/generate-stub-tool-catalog.mjs && git diff --exit-code src/stub-tool-catalog.ts",