npm - @harusame64/desktop-touch-mcp - Versions diffs - 1.7.2 → 1.9.0 - Mend

@harusame64/desktop-touch-mcp 1.7.2 → 1.9.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (4) hide show

package/README.ja.md CHANGED Viewed

@@ -197,6 +197,7 @@ DOM を触る `browser_*` ツールは `includeContext:false` で末尾の `acti
 ### ターミナル (2)
 | ツール | 概要 |
 |---|---|
+| `terminal(action='run')` | コマンド送信 → 完了待ち → 出力取得を 1 コールで実行。完了判定は `until`: `quiet` / `pattern` / `exit`（コマンドの**終了**を待ち exit code を返す → [ターミナルの完了判定](#ターミナルの完了判定-until)） |
 | `terminal(action='read')` | Windows Terminal / PowerShell / cmd / WSL のテキストをUIA/OCRで取得。`sinceMarker`差分対応 |
 | `terminal(action='send')` | ターミナルへコマンド送信。clipboard paste既定でIME安全 |
@@ -263,6 +264,39 @@ Lease ライフサイクル:
 Reactive Perception Graph は desktop-touch の低コストな状況把握レイヤーです。対象の同一性・フォーカス・矩形・準備状態・guard 結果を操作間で維持し、Claude が小さな操作のたびにスクリーンショットで確認し直さなくて済むようにします。
+---
+## ターミナルの完了判定 (`until`)
+`terminal(action='run')` はコマンド送信 → 完了待ち → 出力取得を 1 コールで行います。「完了」の判定方法は `until` で選びます:
+| モード | 待つ対象 | 用途 |
+|---|---|---|
+| `quiet`（既定） | 出力が `quietMs` 静かになるまで | 短い対話コマンド |
+| `pattern` | 出力に現れる文字列/正規表現 | 最終マーカーが分かる長時間コマンド |
+| `exit` | コマンドの**終了そのもの** | 完了や exit code が必要なとき |
+> **アンカーの注意 (#384):** 最終行が改行で終わらない出力は、マーカーが次プロンプトに密着して行境界が無くなります（`printf X` → `Xuser@host:~$`）。よって行末アンカー付き `pattern`（`X\s*\n` / `X$`）は**バインド不能**です。**完了検出は `mode:'exit'`**、content マッチは**裸マーカー**（`\n`/`$` を付けない）を使ってください。`mode:'pattern'` には opt-in の `quietMs` settle fallback もあります: `until:{mode:'pattern', pattern, quietMs:1000}` は、pattern 未一致でも出力が指定 ms 安定したら `reason:'quiet'`（`matchedPattern` なし）で完了し、`timeoutMs` までのハングを防ぎます。opt-in（未指定なら pattern を待ち続ける＝silent gap のある長時間コマンドは無影響）。
+### `until:{mode:'exit'}` — 本当の完了 + exit code
+ヒューリスティックなモードは「センチネルを末尾に付ける」定番（`some-task; echo DONE` を `DONE` で待つ）で誤判定しがちです。センチネルは**エコーされたコマンド行**にも現れ、複数行コマンドではそのエコーと実出力をバッファだけから区別できません。`mode:'exit'` はこれを構造的に解決します — サーバが**表示形と入力形が異なる**完了マーカーをコマンド末尾に注入するため、エコーには決して一致せず（複数行入力でも）、実際のプロセス exit code を返します:
+```js
+terminal({
+  action: 'run',
+  windowTitle: 'pwsh',
+  input: 'npm run build',
+  until: { mode: 'exit', shell: 'powershell' },
+})
+// → completion: { reason: 'exited', exitCode: 0, elapsedMs: … }
+//   output: 注入マーカーは除去され、コマンドの実出力のみ
+```
+- **`shell` は明示指定推奨**（`'bash'` / `'powershell'`）。`shell:'auto'` はターミナル窓のプロセスから判定しますが、SSH / WSL の**中で**動く shell は見えません（窓はローカルホストのまま）。リモート/ネストしたセッションではリモート側の shell を渡してください（`auto` は警告を出し外側の shell を選ぶ場合があります）。プロセスを真に特定できない窓（Windows Terminal 等）は `ExitModeShellAmbiguous` を返します。
+- **first-class shell:** `bash` と `powershell`。`cmd.exe` は未対応（`ExitModeShellUnsupported`）。
+- **未完の構文で終わる入力は即座に reject**（`ExitModeUnsafeInput`）。閉じていない引用符 / here-doc / `$(…)` / 末尾の `\` または PowerShell バッククォートなどはハングせず弾きます。
+- exit mode は配送を自前制御するため、配送系の `sendOptions`（`method` / `preferClipboard` / `pressEnter` / `chunkSize` / `pasteKey`）は `InvalidArgs` で reject します（focus 系オプションは利用可）。
 ---
 ## ブラウザ CDP 自動化

package/README.md CHANGED Viewed

@@ -10,10 +10,11 @@
 npx -y @harusame64/desktop-touch-mcp
 ```
-28 tools, native Rust engine (UIA in 2 ms), zero-config PowerShell fallback, full CJK support, MIT licensed. Add the snippet above to your Claude / Cursor / VS Code Copilot config and Claude can drive Notepad, Excel, Chrome, Windows Terminal, and any other app on your machine.
+29 tools, native Rust engine (UIA in 2 ms), zero-config PowerShell fallback, full CJK support, MIT licensed. Add the snippet above to your Claude / Cursor / VS Code Copilot config and Claude can drive Notepad, Excel, Chrome, Windows Terminal, and any other app on your machine.
-> *v0.15: **82× average speedup** via Rust native engine — UIA focus queries in 2 ms, SSE2-accelerated image diffing at 13–15× native speed. Zero-config: the engine auto-loads when present, with transparent PowerShell fallback.*
-> *v0.15.5: **Pinned release verification** — the npm launcher now fetches only the matching GitHub Release tag and verifies the Windows runtime zip before extraction.*
+> **Why this over pixel-clicking?** Two ideas run through every tool: **discover-then-act** — `desktop_discover` returns interactive entities with short-lived leases instead of raw coordinates, so `desktop_act` operates on *what* you mean, not *where* it was — and **per-action perception guards** that verify the target window's identity and bounds before input lands, catching wrong-window typing and stale-coordinate clicks before they happen.
+>
+> Under the hood: an **82× average speedup** from the Rust native engine (UIA focus queries in 2 ms, SSE2-accelerated image diffing at 13–15×), with a transparent PowerShell fallback when the engine is absent. The npm launcher fetches only the GitHub Release tag matching the installed version and verifies the Windows runtime zip before extraction.
 ---
@@ -122,7 +123,7 @@ For a local checkout, register the built server directly:
 ---
-## Tools (28 Optimized Tools)
+## Tools (29 Optimized Tools)
 > 📖 **Full Reference**: [`docs/system-overview.md`](docs/system-overview.md) — Exhaustive guide on parameters, return schemas, and coordinate math.
@@ -159,13 +160,18 @@ For a local checkout, register the built server directly:
 ### 🛠️ Utilities & Workflow
 | Tool | Description |
 |---|---|
-| `terminal` | Unified command execution: `run` (send + wait + read), `read` (OCR/UIA), and `send`. |
+| `terminal` | Unified command execution: `run` (send + wait + read), `read` (OCR/UIA), and `send`. `run` completion modes: `quiet`, `pattern`, and `exit` (waits for the command to finish + returns its exit code — see [Terminal command completion](#terminal-command-completion-until)). |
 | `wait_until` | Efficient server-side polling for window, focus, text, or URL state changes. |
 | `window_dock` / `focus_window` | Window management: `pin` (always-on-top), `unpin`, `dock` (corner snap), and `focus`. |
 | `workspace_launch` | Launch apps and auto-detect new HWNDs (supports localized titles). |
 | `run_macro` | Batch up to 50 operations into a single round-trip for maximum efficiency. |
 | `clipboard` / `notification_show` | System-level text exchange and user alerts. |
+### 📊 Office (Excel)
+| Tool | Description |
+|---|---|
+| `excel` | Author and run Excel VBA macros via COM. `action='run_vba'` writes a macro into a managed Trusted Location and runs it; `action='check_access_vbom'` is a read-only preflight. Runs VBA where formula-only tools cannot. One-time setup: `node scripts/enable-access-vbom.mjs`. |
 ---
 ## Standard workflow (v1.0.0)
@@ -203,6 +209,66 @@ Lease lifecycle:
 ---
+## Terminal command completion (`until`)
+`terminal(action='run')` sends a command, waits for it to complete, and reads the
+output in one call. How it decides "complete" is controlled by `until`:
+| Mode | Waits for | Best for |
+|---|---|---|
+| `quiet` (default) | output to fall silent for `quietMs` | short interactive commands |
+| `pattern` | a string/regex you expect in the output | long commands with a known final marker |
+| `exit` | the command to actually **finish** | when you need completion or the exit code |
+> **Anchoring caveat (#384):** a command whose final line has no trailing newline
+> glues the marker to the next prompt with no line boundary (`printf X` →
+> `Xuser@host:~$`), so an end-anchored `pattern` (`X\s*\n` / `X$`) can never bind.
+> For *completion* use `mode:'exit'`; for *content* matching use a bare marker
+> (no `\n`/`$`). `mode:'pattern'` also accepts an optional `quietMs` settle
+> fallback: `until:{mode:'pattern', pattern, quietMs:1000}` completes with
+> `reason:'quiet'` (no `matchedPattern`) once output is stable for that long
+> without a match — instead of hanging until `timeoutMs`. It is opt-in (omit
+> `quietMs` to keep waiting for the pattern; long commands with mid-run silent
+> gaps are unaffected).
+### `until:{mode:'exit'}` — real completion + exit code
+The heuristic modes can misfire on the common "append a sentinel" idiom
+(`some-task; echo DONE` matched by `DONE`): the sentinel also shows up in the
+**echoed command line**, and for multi-line commands there is no reliable way to
+tell that echo apart from real output. `mode:'exit'` removes the guesswork — the
+server appends its own completion marker whose *printed* form differs from its
+*typed* form, so it never matches the echoed command (even for multi-line input),
+and it returns the real process exit code:
+```js
+terminal({
+  action: 'run',
+  windowTitle: 'pwsh',
+  input: 'npm run build',
+  until: { mode: 'exit', shell: 'powershell' },
+})
+// → completion: { reason: 'exited', exitCode: 0, elapsedMs: … }
+//   output: just the command's real output (the injected marker is stripped)
+```
+- **Pass `shell` explicitly** (`'bash'` or `'powershell'`). `shell:'auto'` detects
+  the shell from the terminal window, but it cannot see a shell running *inside*
+  SSH or WSL — the window still looks like its local host — so for remote/nested
+  sessions pass the remote side's shell (`auto` otherwise warns and may pick the
+  outer shell). A window whose process is genuinely unidentifiable (e.g. Windows
+  Terminal) returns `ExitModeShellAmbiguous`.
+- **First-class shells:** `bash` and `powershell`. `cmd.exe` is not supported yet
+  (`ExitModeShellUnsupported`).
+- **Unsafe input is rejected up front** (`ExitModeUnsafeInput`) rather than
+  hanging: a command ending mid-construct (unterminated quote, here-doc, `$(…)`,
+  a trailing `\` or PowerShell backtick).
+- Exit mode controls its own delivery, so delivery-shaping `sendOptions`
+  (`method` / `preferClipboard` / `pressEnter` / `chunkSize` / `pasteKey`) are
+  rejected with `InvalidArgs`; focus options remain accepted.
+---
 ## Browser CDP automation
 For web automation, connect Chrome or Edge with the remote debugging port enabled — no Selenium or Playwright needed.
@@ -498,7 +564,7 @@ Setting `DESKTOP_TOUCH_FORCE_FOCUS=1` makes `forceFocus: true` the default for a
 ---
-## Auto Guard (v0.12+)
+## Auto Guard
 Action tools (`mouse_click`, `mouse_drag`, `keyboard(action='type'/'press')`, `click_element`, `desktop_act`, `browser_click`, `browser_navigate`) automatically guard each action when you pass `windowTitle` / `tabId`:
@@ -548,32 +614,7 @@ The fix is one-shot and expires in 15 seconds. The server revalidates the target
 ---
-## v0.13 Additions
-### Target-Identity Timeline
-The server tracks a semantic timeline of what happened to each target window/tab. Recent events are included in:
-- `get_history` → `recentTargetKeys`: array of 3 most recently active target keys (compact, no event bodies)
-- `perception_read(lensId)` → `recentEvents`: up to 10 events for that lens's target, each with `tsMs`, `semantic`, `summary`
-Enable the MCP resources below to browse timelines:
-```json
-{ "env": { "DESKTOP_TOUCH_PERCEPTION_RESOURCES": "1" } }
-```
-MCP resources available when enabled:
-| URI | Content |
-|---|---|
-| `perception://target/{targetKey}/timeline` | Semantic event timeline for a target |
-| `perception://targets/recent` | Most recently active target keys |
-| `perception://lens/{lensId}/summary` | Lens attention/guard state |
-### Manual Lens Eviction: FIFO → LRU
-Manual lenses (created via `perception_register`) are now evicted by **least-recently-used** instead of insertion order. Using `perception_read`, `evaluatePreToolGuards`, or `buildEnvelopeFor` on a lens promotes it. The hard limit of 16 active lenses is unchanged.
+## Advanced response options
 ### browser_eval Structured Mode
@@ -688,7 +729,7 @@ Flag semantics (exact-match: only the literal string `"1"` counts):
 ### Deprecated: `DESKTOP_TOUCH_ENABLE_FUKUWARAI_V2`
-This was the opt-in switch in v0.16.x. From v0.17 it is accepted for compatibility but no longer required — the server prints a deprecation warning on startup when it is set. It will be removed in v0.18. Remove it from your config when you upgrade.
+This was the opt-in switch in v0.16.x. It is still accepted for backward compatibility but no longer required — the server prints a deprecation warning on startup when it is set. Remove it from your config; V2 is on by default.
 ### Recovery when V2 fails
@@ -699,7 +740,7 @@ If `desktop_act` returns `ok: false`, read `reason` and follow the built-in reco
 - `entity_outside_viewport` → `scroll` / `scroll(action='to_element')`, then re-call `desktop_discover`
 - `executor_failed` → fall back to `click_element` / `mouse_click` / `browser_click`
-For `desktop_discover` warnings (`visual_provider_unavailable`, `visual_provider_warming`, `cdp_provider_failed`, …), V1 tools (`screenshot`, `click_element`, `get_ui_elements`, `terminal(action='send')`, …) remain available as an escape hatch.
+For `desktop_discover` warnings (`visual_provider_unavailable`, `visual_provider_warming`, `cdp_provider_failed`, …), the coordinate-based tools (`screenshot(detail='text')`, `click_element`, `mouse_click`, `terminal`, …) remain available as an escape hatch.
 ---

package/bin/launcher.js CHANGED Viewed

@@ -18,15 +18,15 @@ import path from "node:path";
 import { Readable } from "node:stream";
 import { pipeline } from "node:stream/promises";
-const PACKAGE_VERSION = "1.7.2";
+const PACKAGE_VERSION = "1.9.0";
 const RELEASE_TAG = `v${PACKAGE_VERSION}`;
 const REPO_API_URL = `https://api.github.com/repos/Harusame64/desktop-touch-mcp/releases/tags/${RELEASE_TAG}`;
 const ASSET_NAME = "desktop-touch-mcp-windows.zip";
 const RELEASE_METADATA_FILE = ".desktop-touch-release.json";
 const RELEASE_MANIFEST = {
-  tagName: "v1.7.2",
+  tagName: "v1.9.0",
   assetName: ASSET_NAME,
-  sha256: "668db752d568475898bce5eac0c82f05e6880053214eb668fbea3cc8506d6ff4",
+  sha256: "d1c913ba71986802c335c3d23bd419d58d60ebb36a0d8826a7441dcd5d4403a0",
 };
 const CACHE_ROOT = process.env.DESKTOP_TOUCH_MCP_HOME
   ? path.resolve(process.env.DESKTOP_TOUCH_MCP_HOME)

package/package.json CHANGED Viewed

@@ -1,8 +1,8 @@
 {
   "name": "@harusame64/desktop-touch-mcp",
-  "version": "1.7.2",
+  "version": "1.9.0",
   "mcpName": "io.github.Harusame64/desktop-touch-mcp",
-  "description": "Let Claude, Cursor, or any MCP client see and operate your Windows 10/11 desktop. 28 tools for screenshots, UI Automation, Chrome CDP, keyboard/mouse, terminal, with semantic discover-then-act targeting and per-action perception guards that avoid wrong-window typing and stale-coordinate clicks.",
+  "description": "Let Claude, Cursor, or any MCP client see and operate your Windows 10/11 desktop. 29 tools for screenshots, UI Automation, Chrome CDP, keyboard/mouse, terminal, with semantic discover-then-act targeting and per-action perception guards that avoid wrong-window typing and stale-coordinate clicks.",
   "keywords": [
     "mcp",
     "mcp-server",
@@ -81,6 +81,7 @@
     "test:capture": "node scripts/test-capture.mjs",
     "test:unit": "vitest run --project=unit",
     "test:e2e": "vitest run --project=e2e",
+    "e2e:stop": "node scripts/e2e-stop.mjs",
     "test:headed": "HEADED=1 vitest run --project=e2e",
     "test:e2e:browser": "vitest run --project=e2e \"tests/e2e/browser-*.test.ts\"",
     "test:e2e:window": "vitest run --project=e2e tests/e2e/dock-window.test.ts tests/e2e/dock-auto.test.ts tests/e2e/focus-integrity.test.ts tests/e2e/force-focus.test.ts tests/e2e/screenshot-electron.test.ts tests/e2e/ui-elements-cache.test.ts",
@@ -89,6 +90,7 @@
     "test:watch": "vitest",
     "generate:stub-catalog": "node scripts/generate-stub-tool-catalog.mjs",
     "check:stub-catalog": "node scripts/generate-stub-tool-catalog.mjs && git diff --exit-code src/stub-tool-catalog.ts",
+    "check:failwith-fixtures": "node scripts/extract-failwith-shape-fixtures.mjs && git diff --exit-code tests/fixtures/failwith-callsite-shapes.json",
     "check:napi-safe": "node scripts/check-napi-safe.mjs",
     "check:native-types": "node scripts/check-native-types.mjs",
     "check:no-koffi": "node scripts/check-no-koffi.mjs",