@harusame64/desktop-touch-mcp 1.7.2 → 1.9.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.ja.md CHANGED
@@ -197,6 +197,7 @@ DOM を触る `browser_*` ツールは `includeContext:false` で末尾の `acti
197
197
  ### ターミナル (2)
198
198
  | ツール | 概要 |
199
199
  |---|---|
200
+ | `terminal(action='run')` | コマンド送信 → 完了待ち → 出力取得を 1 コールで実行。完了判定は `until`: `quiet` / `pattern` / `exit`(コマンドの**終了**を待ち exit code を返す → [ターミナルの完了判定](#ターミナルの完了判定-until)) |
200
201
  | `terminal(action='read')` | Windows Terminal / PowerShell / cmd / WSL のテキストをUIA/OCRで取得。`sinceMarker`差分対応 |
201
202
  | `terminal(action='send')` | ターミナルへコマンド送信。clipboard paste既定でIME安全 |
202
203
 
@@ -263,6 +264,39 @@ Lease ライフサイクル:
263
264
 
264
265
  Reactive Perception Graph は desktop-touch の低コストな状況把握レイヤーです。対象の同一性・フォーカス・矩形・準備状態・guard 結果を操作間で維持し、Claude が小さな操作のたびにスクリーンショットで確認し直さなくて済むようにします。
265
266
 
267
+ ---
268
+ ## ターミナルの完了判定 (`until`)
269
+
270
+ `terminal(action='run')` はコマンド送信 → 完了待ち → 出力取得を 1 コールで行います。「完了」の判定方法は `until` で選びます:
271
+
272
+ | モード | 待つ対象 | 用途 |
273
+ |---|---|---|
274
+ | `quiet`(既定) | 出力が `quietMs` 静かになるまで | 短い対話コマンド |
275
+ | `pattern` | 出力に現れる文字列/正規表現 | 最終マーカーが分かる長時間コマンド |
276
+ | `exit` | コマンドの**終了そのもの** | 完了や exit code が必要なとき |
277
+
278
+ > **アンカーの注意 (#384):** 最終行が改行で終わらない出力は、マーカーが次プロンプトに密着して行境界が無くなります(`printf X` → `Xuser@host:~$`)。よって行末アンカー付き `pattern`(`X\s*\n` / `X$`)は**バインド不能**です。**完了検出は `mode:'exit'`**、content マッチは**裸マーカー**(`\n`/`$` を付けない)を使ってください。`mode:'pattern'` には opt-in の `quietMs` settle fallback もあります: `until:{mode:'pattern', pattern, quietMs:1000}` は、pattern 未一致でも出力が指定 ms 安定したら `reason:'quiet'`(`matchedPattern` なし)で完了し、`timeoutMs` までのハングを防ぎます。opt-in(未指定なら pattern を待ち続ける=silent gap のある長時間コマンドは無影響)。
279
+
280
+ ### `until:{mode:'exit'}` — 本当の完了 + exit code
281
+
282
+ ヒューリスティックなモードは「センチネルを末尾に付ける」定番(`some-task; echo DONE` を `DONE` で待つ)で誤判定しがちです。センチネルは**エコーされたコマンド行**にも現れ、複数行コマンドではそのエコーと実出力をバッファだけから区別できません。`mode:'exit'` はこれを構造的に解決します — サーバが**表示形と入力形が異なる**完了マーカーをコマンド末尾に注入するため、エコーには決して一致せず(複数行入力でも)、実際のプロセス exit code を返します:
283
+
284
+ ```js
285
+ terminal({
286
+ action: 'run',
287
+ windowTitle: 'pwsh',
288
+ input: 'npm run build',
289
+ until: { mode: 'exit', shell: 'powershell' },
290
+ })
291
+ // → completion: { reason: 'exited', exitCode: 0, elapsedMs: … }
292
+ // output: 注入マーカーは除去され、コマンドの実出力のみ
293
+ ```
294
+
295
+ - **`shell` は明示指定推奨**(`'bash'` / `'powershell'`)。`shell:'auto'` はターミナル窓のプロセスから判定しますが、SSH / WSL の**中で**動く shell は見えません(窓はローカルホストのまま)。リモート/ネストしたセッションではリモート側の shell を渡してください(`auto` は警告を出し外側の shell を選ぶ場合があります)。プロセスを真に特定できない窓(Windows Terminal 等)は `ExitModeShellAmbiguous` を返します。
296
+ - **first-class shell:** `bash` と `powershell`。`cmd.exe` は未対応(`ExitModeShellUnsupported`)。
297
+ - **未完の構文で終わる入力は即座に reject**(`ExitModeUnsafeInput`)。閉じていない引用符 / here-doc / `$(…)` / 末尾の `\` または PowerShell バッククォートなどはハングせず弾きます。
298
+ - exit mode は配送を自前制御するため、配送系の `sendOptions`(`method` / `preferClipboard` / `pressEnter` / `chunkSize` / `pasteKey`)は `InvalidArgs` で reject します(focus 系オプションは利用可)。
299
+
266
300
  ---
267
301
  ## ブラウザ CDP 自動化
268
302
 
package/README.md CHANGED
@@ -10,10 +10,11 @@
10
10
  npx -y @harusame64/desktop-touch-mcp
11
11
  ```
12
12
 
13
- 28 tools, native Rust engine (UIA in 2 ms), zero-config PowerShell fallback, full CJK support, MIT licensed. Add the snippet above to your Claude / Cursor / VS Code Copilot config and Claude can drive Notepad, Excel, Chrome, Windows Terminal, and any other app on your machine.
13
+ 29 tools, native Rust engine (UIA in 2 ms), zero-config PowerShell fallback, full CJK support, MIT licensed. Add the snippet above to your Claude / Cursor / VS Code Copilot config and Claude can drive Notepad, Excel, Chrome, Windows Terminal, and any other app on your machine.
14
14
 
15
- > *v0.15: **82× average speedup** via Rust native engine UIA focus queries in 2 ms, SSE2-accelerated image diffing at 13–15× native speed. Zero-config: the engine auto-loads when present, with transparent PowerShell fallback.*
16
- > *v0.15.5: **Pinned release verification** — the npm launcher now fetches only the matching GitHub Release tag and verifies the Windows runtime zip before extraction.*
15
+ > **Why this over pixel-clicking?** Two ideas run through every tool: **discover-then-act** `desktop_discover` returns interactive entities with short-lived leases instead of raw coordinates, so `desktop_act` operates on *what* you mean, not *where* it was and **per-action perception guards** that verify the target window's identity and bounds before input lands, catching wrong-window typing and stale-coordinate clicks before they happen.
16
+ >
17
+ > Under the hood: an **82× average speedup** from the Rust native engine (UIA focus queries in 2 ms, SSE2-accelerated image diffing at 13–15×), with a transparent PowerShell fallback when the engine is absent. The npm launcher fetches only the GitHub Release tag matching the installed version and verifies the Windows runtime zip before extraction.
17
18
 
18
19
  ---
19
20
 
@@ -122,7 +123,7 @@ For a local checkout, register the built server directly:
122
123
 
123
124
  ---
124
125
 
125
- ## Tools (28 Optimized Tools)
126
+ ## Tools (29 Optimized Tools)
126
127
 
127
128
  > 📖 **Full Reference**: [`docs/system-overview.md`](docs/system-overview.md) — Exhaustive guide on parameters, return schemas, and coordinate math.
128
129
 
@@ -159,13 +160,18 @@ For a local checkout, register the built server directly:
159
160
  ### 🛠️ Utilities & Workflow
160
161
  | Tool | Description |
161
162
  |---|---|
162
- | `terminal` | Unified command execution: `run` (send + wait + read), `read` (OCR/UIA), and `send`. |
163
+ | `terminal` | Unified command execution: `run` (send + wait + read), `read` (OCR/UIA), and `send`. `run` completion modes: `quiet`, `pattern`, and `exit` (waits for the command to finish + returns its exit code — see [Terminal command completion](#terminal-command-completion-until)). |
163
164
  | `wait_until` | Efficient server-side polling for window, focus, text, or URL state changes. |
164
165
  | `window_dock` / `focus_window` | Window management: `pin` (always-on-top), `unpin`, `dock` (corner snap), and `focus`. |
165
166
  | `workspace_launch` | Launch apps and auto-detect new HWNDs (supports localized titles). |
166
167
  | `run_macro` | Batch up to 50 operations into a single round-trip for maximum efficiency. |
167
168
  | `clipboard` / `notification_show` | System-level text exchange and user alerts. |
168
169
 
170
+ ### 📊 Office (Excel)
171
+ | Tool | Description |
172
+ |---|---|
173
+ | `excel` | Author and run Excel VBA macros via COM. `action='run_vba'` writes a macro into a managed Trusted Location and runs it; `action='check_access_vbom'` is a read-only preflight. Runs VBA where formula-only tools cannot. One-time setup: `node scripts/enable-access-vbom.mjs`. |
174
+
169
175
  ---
170
176
 
171
177
  ## Standard workflow (v1.0.0)
@@ -203,6 +209,66 @@ Lease lifecycle:
203
209
 
204
210
  ---
205
211
 
212
+ ## Terminal command completion (`until`)
213
+
214
+ `terminal(action='run')` sends a command, waits for it to complete, and reads the
215
+ output in one call. How it decides "complete" is controlled by `until`:
216
+
217
+ | Mode | Waits for | Best for |
218
+ |---|---|---|
219
+ | `quiet` (default) | output to fall silent for `quietMs` | short interactive commands |
220
+ | `pattern` | a string/regex you expect in the output | long commands with a known final marker |
221
+ | `exit` | the command to actually **finish** | when you need completion or the exit code |
222
+
223
+ > **Anchoring caveat (#384):** a command whose final line has no trailing newline
224
+ > glues the marker to the next prompt with no line boundary (`printf X` →
225
+ > `Xuser@host:~$`), so an end-anchored `pattern` (`X\s*\n` / `X$`) can never bind.
226
+ > For *completion* use `mode:'exit'`; for *content* matching use a bare marker
227
+ > (no `\n`/`$`). `mode:'pattern'` also accepts an optional `quietMs` settle
228
+ > fallback: `until:{mode:'pattern', pattern, quietMs:1000}` completes with
229
+ > `reason:'quiet'` (no `matchedPattern`) once output is stable for that long
230
+ > without a match — instead of hanging until `timeoutMs`. It is opt-in (omit
231
+ > `quietMs` to keep waiting for the pattern; long commands with mid-run silent
232
+ > gaps are unaffected).
233
+
234
+ ### `until:{mode:'exit'}` — real completion + exit code
235
+
236
+ The heuristic modes can misfire on the common "append a sentinel" idiom
237
+ (`some-task; echo DONE` matched by `DONE`): the sentinel also shows up in the
238
+ **echoed command line**, and for multi-line commands there is no reliable way to
239
+ tell that echo apart from real output. `mode:'exit'` removes the guesswork — the
240
+ server appends its own completion marker whose *printed* form differs from its
241
+ *typed* form, so it never matches the echoed command (even for multi-line input),
242
+ and it returns the real process exit code:
243
+
244
+ ```js
245
+ terminal({
246
+ action: 'run',
247
+ windowTitle: 'pwsh',
248
+ input: 'npm run build',
249
+ until: { mode: 'exit', shell: 'powershell' },
250
+ })
251
+ // → completion: { reason: 'exited', exitCode: 0, elapsedMs: … }
252
+ // output: just the command's real output (the injected marker is stripped)
253
+ ```
254
+
255
+ - **Pass `shell` explicitly** (`'bash'` or `'powershell'`). `shell:'auto'` detects
256
+ the shell from the terminal window, but it cannot see a shell running *inside*
257
+ SSH or WSL — the window still looks like its local host — so for remote/nested
258
+ sessions pass the remote side's shell (`auto` otherwise warns and may pick the
259
+ outer shell). A window whose process is genuinely unidentifiable (e.g. Windows
260
+ Terminal) returns `ExitModeShellAmbiguous`.
261
+ - **First-class shells:** `bash` and `powershell`. `cmd.exe` is not supported yet
262
+ (`ExitModeShellUnsupported`).
263
+ - **Unsafe input is rejected up front** (`ExitModeUnsafeInput`) rather than
264
+ hanging: a command ending mid-construct (unterminated quote, here-doc, `$(…)`,
265
+ a trailing `\` or PowerShell backtick).
266
+ - Exit mode controls its own delivery, so delivery-shaping `sendOptions`
267
+ (`method` / `preferClipboard` / `pressEnter` / `chunkSize` / `pasteKey`) are
268
+ rejected with `InvalidArgs`; focus options remain accepted.
269
+
270
+ ---
271
+
206
272
  ## Browser CDP automation
207
273
 
208
274
  For web automation, connect Chrome or Edge with the remote debugging port enabled — no Selenium or Playwright needed.
@@ -498,7 +564,7 @@ Setting `DESKTOP_TOUCH_FORCE_FOCUS=1` makes `forceFocus: true` the default for a
498
564
 
499
565
  ---
500
566
 
501
- ## Auto Guard (v0.12+)
567
+ ## Auto Guard
502
568
 
503
569
  Action tools (`mouse_click`, `mouse_drag`, `keyboard(action='type'/'press')`, `click_element`, `desktop_act`, `browser_click`, `browser_navigate`) automatically guard each action when you pass `windowTitle` / `tabId`:
504
570
 
@@ -548,32 +614,7 @@ The fix is one-shot and expires in 15 seconds. The server revalidates the target
548
614
 
549
615
  ---
550
616
 
551
- ## v0.13 Additions
552
-
553
- ### Target-Identity Timeline
554
-
555
- The server tracks a semantic timeline of what happened to each target window/tab. Recent events are included in:
556
-
557
- - `get_history` → `recentTargetKeys`: array of 3 most recently active target keys (compact, no event bodies)
558
- - `perception_read(lensId)` → `recentEvents`: up to 10 events for that lens's target, each with `tsMs`, `semantic`, `summary`
559
-
560
- Enable the MCP resources below to browse timelines:
561
-
562
- ```json
563
- { "env": { "DESKTOP_TOUCH_PERCEPTION_RESOURCES": "1" } }
564
- ```
565
-
566
- MCP resources available when enabled:
567
-
568
- | URI | Content |
569
- |---|---|
570
- | `perception://target/{targetKey}/timeline` | Semantic event timeline for a target |
571
- | `perception://targets/recent` | Most recently active target keys |
572
- | `perception://lens/{lensId}/summary` | Lens attention/guard state |
573
-
574
- ### Manual Lens Eviction: FIFO → LRU
575
-
576
- Manual lenses (created via `perception_register`) are now evicted by **least-recently-used** instead of insertion order. Using `perception_read`, `evaluatePreToolGuards`, or `buildEnvelopeFor` on a lens promotes it. The hard limit of 16 active lenses is unchanged.
617
+ ## Advanced response options
577
618
 
578
619
  ### browser_eval Structured Mode
579
620
 
@@ -688,7 +729,7 @@ Flag semantics (exact-match: only the literal string `"1"` counts):
688
729
 
689
730
  ### Deprecated: `DESKTOP_TOUCH_ENABLE_FUKUWARAI_V2`
690
731
 
691
- This was the opt-in switch in v0.16.x. From v0.17 it is accepted for compatibility but no longer required — the server prints a deprecation warning on startup when it is set. It will be removed in v0.18. Remove it from your config when you upgrade.
732
+ This was the opt-in switch in v0.16.x. It is still accepted for backward compatibility but no longer required — the server prints a deprecation warning on startup when it is set. Remove it from your config; V2 is on by default.
692
733
 
693
734
  ### Recovery when V2 fails
694
735
 
@@ -699,7 +740,7 @@ If `desktop_act` returns `ok: false`, read `reason` and follow the built-in reco
699
740
  - `entity_outside_viewport` → `scroll` / `scroll(action='to_element')`, then re-call `desktop_discover`
700
741
  - `executor_failed` → fall back to `click_element` / `mouse_click` / `browser_click`
701
742
 
702
- For `desktop_discover` warnings (`visual_provider_unavailable`, `visual_provider_warming`, `cdp_provider_failed`, …), V1 tools (`screenshot`, `click_element`, `get_ui_elements`, `terminal(action='send')`, …) remain available as an escape hatch.
743
+ For `desktop_discover` warnings (`visual_provider_unavailable`, `visual_provider_warming`, `cdp_provider_failed`, …), the coordinate-based tools (`screenshot(detail='text')`, `click_element`, `mouse_click`, `terminal`, …) remain available as an escape hatch.
703
744
 
704
745
  ---
705
746
 
package/bin/launcher.js CHANGED
@@ -18,15 +18,15 @@ import path from "node:path";
18
18
  import { Readable } from "node:stream";
19
19
  import { pipeline } from "node:stream/promises";
20
20
 
21
- const PACKAGE_VERSION = "1.7.2";
21
+ const PACKAGE_VERSION = "1.9.0";
22
22
  const RELEASE_TAG = `v${PACKAGE_VERSION}`;
23
23
  const REPO_API_URL = `https://api.github.com/repos/Harusame64/desktop-touch-mcp/releases/tags/${RELEASE_TAG}`;
24
24
  const ASSET_NAME = "desktop-touch-mcp-windows.zip";
25
25
  const RELEASE_METADATA_FILE = ".desktop-touch-release.json";
26
26
  const RELEASE_MANIFEST = {
27
- tagName: "v1.7.2",
27
+ tagName: "v1.9.0",
28
28
  assetName: ASSET_NAME,
29
- sha256: "668db752d568475898bce5eac0c82f05e6880053214eb668fbea3cc8506d6ff4",
29
+ sha256: "d1c913ba71986802c335c3d23bd419d58d60ebb36a0d8826a7441dcd5d4403a0",
30
30
  };
31
31
  const CACHE_ROOT = process.env.DESKTOP_TOUCH_MCP_HOME
32
32
  ? path.resolve(process.env.DESKTOP_TOUCH_MCP_HOME)
package/package.json CHANGED
@@ -1,8 +1,8 @@
1
1
  {
2
2
  "name": "@harusame64/desktop-touch-mcp",
3
- "version": "1.7.2",
3
+ "version": "1.9.0",
4
4
  "mcpName": "io.github.Harusame64/desktop-touch-mcp",
5
- "description": "Let Claude, Cursor, or any MCP client see and operate your Windows 10/11 desktop. 28 tools for screenshots, UI Automation, Chrome CDP, keyboard/mouse, terminal, with semantic discover-then-act targeting and per-action perception guards that avoid wrong-window typing and stale-coordinate clicks.",
5
+ "description": "Let Claude, Cursor, or any MCP client see and operate your Windows 10/11 desktop. 29 tools for screenshots, UI Automation, Chrome CDP, keyboard/mouse, terminal, with semantic discover-then-act targeting and per-action perception guards that avoid wrong-window typing and stale-coordinate clicks.",
6
6
  "keywords": [
7
7
  "mcp",
8
8
  "mcp-server",
@@ -81,6 +81,7 @@
81
81
  "test:capture": "node scripts/test-capture.mjs",
82
82
  "test:unit": "vitest run --project=unit",
83
83
  "test:e2e": "vitest run --project=e2e",
84
+ "e2e:stop": "node scripts/e2e-stop.mjs",
84
85
  "test:headed": "HEADED=1 vitest run --project=e2e",
85
86
  "test:e2e:browser": "vitest run --project=e2e \"tests/e2e/browser-*.test.ts\"",
86
87
  "test:e2e:window": "vitest run --project=e2e tests/e2e/dock-window.test.ts tests/e2e/dock-auto.test.ts tests/e2e/focus-integrity.test.ts tests/e2e/force-focus.test.ts tests/e2e/screenshot-electron.test.ts tests/e2e/ui-elements-cache.test.ts",
@@ -89,6 +90,7 @@
89
90
  "test:watch": "vitest",
90
91
  "generate:stub-catalog": "node scripts/generate-stub-tool-catalog.mjs",
91
92
  "check:stub-catalog": "node scripts/generate-stub-tool-catalog.mjs && git diff --exit-code src/stub-tool-catalog.ts",
93
+ "check:failwith-fixtures": "node scripts/extract-failwith-shape-fixtures.mjs && git diff --exit-code tests/fixtures/failwith-callsite-shapes.json",
92
94
  "check:napi-safe": "node scripts/check-napi-safe.mjs",
93
95
  "check:native-types": "node scripts/check-native-types.mjs",
94
96
  "check:no-koffi": "node scripts/check-no-koffi.mjs",