@harusame64/desktop-touch-mcp 1.7.1 → 1.8.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.ja.md CHANGED
@@ -197,6 +197,7 @@ DOM を触る `browser_*` ツールは `includeContext:false` で末尾の `acti
197
197
  ### ターミナル (2)
198
198
  | ツール | 概要 |
199
199
  |---|---|
200
+ | `terminal(action='run')` | コマンド送信 → 完了待ち → 出力取得を 1 コールで実行。完了判定は `until`: `quiet` / `pattern` / `exit`(コマンドの**終了**を待ち exit code を返す → [ターミナルの完了判定](#ターミナルの完了判定-until)) |
200
201
  | `terminal(action='read')` | Windows Terminal / PowerShell / cmd / WSL のテキストをUIA/OCRで取得。`sinceMarker`差分対応 |
201
202
  | `terminal(action='send')` | ターミナルへコマンド送信。clipboard paste既定でIME安全 |
202
203
 
@@ -263,6 +264,39 @@ Lease ライフサイクル:
263
264
 
264
265
  Reactive Perception Graph は desktop-touch の低コストな状況把握レイヤーです。対象の同一性・フォーカス・矩形・準備状態・guard 結果を操作間で維持し、Claude が小さな操作のたびにスクリーンショットで確認し直さなくて済むようにします。
265
266
 
267
+ ---
268
+ ## ターミナルの完了判定 (`until`)
269
+
270
+ `terminal(action='run')` はコマンド送信 → 完了待ち → 出力取得を 1 コールで行います。「完了」の判定方法は `until` で選びます:
271
+
272
+ | モード | 待つ対象 | 用途 |
273
+ |---|---|---|
274
+ | `quiet`(既定) | 出力が `quietMs` 静かになるまで | 短い対話コマンド |
275
+ | `pattern` | 出力に現れる文字列/正規表現 | 最終マーカーが分かる長時間コマンド |
276
+ | `exit` | コマンドの**終了そのもの** | 完了や exit code が必要なとき |
277
+
278
+ > **アンカーの注意 (#384):** 最終行が改行で終わらない出力は、マーカーが次プロンプトに密着して行境界が無くなります(`printf X` → `Xuser@host:~$`)。よって行末アンカー付き `pattern`(`X\s*\n` / `X$`)は**バインド不能**です。**完了検出は `mode:'exit'`**、content マッチは**裸マーカー**(`\n`/`$` を付けない)を使ってください。`mode:'pattern'` には opt-in の `quietMs` settle fallback もあります: `until:{mode:'pattern', pattern, quietMs:1000}` は、pattern 未一致でも出力が指定 ms 安定したら `reason:'quiet'`(`matchedPattern` なし)で完了し、`timeoutMs` までのハングを防ぎます。opt-in(未指定なら pattern を待ち続ける=silent gap のある長時間コマンドは無影響)。
279
+
280
+ ### `until:{mode:'exit'}` — 本当の完了 + exit code
281
+
282
+ ヒューリスティックなモードは「センチネルを末尾に付ける」定番(`some-task; echo DONE` を `DONE` で待つ)で誤判定しがちです。センチネルは**エコーされたコマンド行**にも現れ、複数行コマンドではそのエコーと実出力をバッファだけから区別できません。`mode:'exit'` はこれを構造的に解決します — サーバが**表示形と入力形が異なる**完了マーカーをコマンド末尾に注入するため、エコーには決して一致せず(複数行入力でも)、実際のプロセス exit code を返します:
283
+
284
+ ```js
285
+ terminal({
286
+ action: 'run',
287
+ windowTitle: 'pwsh',
288
+ input: 'npm run build',
289
+ until: { mode: 'exit', shell: 'powershell' },
290
+ })
291
+ // → completion: { reason: 'exited', exitCode: 0, elapsedMs: … }
292
+ // output: 注入マーカーは除去され、コマンドの実出力のみ
293
+ ```
294
+
295
+ - **`shell` は明示指定推奨**(`'bash'` / `'powershell'`)。`shell:'auto'` はターミナル窓のプロセスから判定しますが、SSH / WSL の**中で**動く shell は見えません(窓はローカルホストのまま)。リモート/ネストしたセッションではリモート側の shell を渡してください(`auto` は警告を出し外側の shell を選ぶ場合があります)。プロセスを真に特定できない窓(Windows Terminal 等)は `ExitModeShellAmbiguous` を返します。
296
+ - **first-class shell:** `bash` と `powershell`。`cmd.exe` は未対応(`ExitModeShellUnsupported`)。
297
+ - **未完の構文で終わる入力は即座に reject**(`ExitModeUnsafeInput`)。閉じていない引用符 / here-doc / `$(…)` / 末尾の `\` または PowerShell バッククォートなどはハングせず弾きます。
298
+ - exit mode は配送を自前制御するため、配送系の `sendOptions`(`method` / `preferClipboard` / `pressEnter` / `chunkSize` / `pasteKey`)は `InvalidArgs` で reject します(focus 系オプションは利用可)。
299
+
266
300
  ---
267
301
  ## ブラウザ CDP 自動化
268
302
 
package/README.md CHANGED
@@ -159,7 +159,7 @@ For a local checkout, register the built server directly:
159
159
  ### 🛠️ Utilities & Workflow
160
160
  | Tool | Description |
161
161
  |---|---|
162
- | `terminal` | Unified command execution: `run` (send + wait + read), `read` (OCR/UIA), and `send`. |
162
+ | `terminal` | Unified command execution: `run` (send + wait + read), `read` (OCR/UIA), and `send`. `run` completion modes: `quiet`, `pattern`, and `exit` (waits for the command to finish + returns its exit code — see [Terminal command completion](#terminal-command-completion-until)). |
163
163
  | `wait_until` | Efficient server-side polling for window, focus, text, or URL state changes. |
164
164
  | `window_dock` / `focus_window` | Window management: `pin` (always-on-top), `unpin`, `dock` (corner snap), and `focus`. |
165
165
  | `workspace_launch` | Launch apps and auto-detect new HWNDs (supports localized titles). |
@@ -203,6 +203,66 @@ Lease lifecycle:
203
203
 
204
204
  ---
205
205
 
206
+ ## Terminal command completion (`until`)
207
+
208
+ `terminal(action='run')` sends a command, waits for it to complete, and reads the
209
+ output in one call. How it decides "complete" is controlled by `until`:
210
+
211
+ | Mode | Waits for | Best for |
212
+ |---|---|---|
213
+ | `quiet` (default) | output to fall silent for `quietMs` | short interactive commands |
214
+ | `pattern` | a string/regex you expect in the output | long commands with a known final marker |
215
+ | `exit` | the command to actually **finish** | when you need completion or the exit code |
216
+
217
+ > **Anchoring caveat (#384):** a command whose final line has no trailing newline
218
+ > glues the marker to the next prompt with no line boundary (`printf X` →
219
+ > `Xuser@host:~$`), so an end-anchored `pattern` (`X\s*\n` / `X$`) can never bind.
220
+ > For *completion* use `mode:'exit'`; for *content* matching use a bare marker
221
+ > (no `\n`/`$`). `mode:'pattern'` also accepts an optional `quietMs` settle
222
+ > fallback: `until:{mode:'pattern', pattern, quietMs:1000}` completes with
223
+ > `reason:'quiet'` (no `matchedPattern`) once output is stable for that long
224
+ > without a match — instead of hanging until `timeoutMs`. It is opt-in (omit
225
+ > `quietMs` to keep waiting for the pattern; long commands with mid-run silent
226
+ > gaps are unaffected).
227
+
228
+ ### `until:{mode:'exit'}` — real completion + exit code
229
+
230
+ The heuristic modes can misfire on the common "append a sentinel" idiom
231
+ (`some-task; echo DONE` matched by `DONE`): the sentinel also shows up in the
232
+ **echoed command line**, and for multi-line commands there is no reliable way to
233
+ tell that echo apart from real output. `mode:'exit'` removes the guesswork — the
234
+ server appends its own completion marker whose *printed* form differs from its
235
+ *typed* form, so it never matches the echoed command (even for multi-line input),
236
+ and it returns the real process exit code:
237
+
238
+ ```js
239
+ terminal({
240
+ action: 'run',
241
+ windowTitle: 'pwsh',
242
+ input: 'npm run build',
243
+ until: { mode: 'exit', shell: 'powershell' },
244
+ })
245
+ // → completion: { reason: 'exited', exitCode: 0, elapsedMs: … }
246
+ // output: just the command's real output (the injected marker is stripped)
247
+ ```
248
+
249
+ - **Pass `shell` explicitly** (`'bash'` or `'powershell'`). `shell:'auto'` detects
250
+ the shell from the terminal window, but it cannot see a shell running *inside*
251
+ SSH or WSL — the window still looks like its local host — so for remote/nested
252
+ sessions pass the remote side's shell (`auto` otherwise warns and may pick the
253
+ outer shell). A window whose process is genuinely unidentifiable (e.g. Windows
254
+ Terminal) returns `ExitModeShellAmbiguous`.
255
+ - **First-class shells:** `bash` and `powershell`. `cmd.exe` is not supported yet
256
+ (`ExitModeShellUnsupported`).
257
+ - **Unsafe input is rejected up front** (`ExitModeUnsafeInput`) rather than
258
+ hanging: a command ending mid-construct (unterminated quote, here-doc, `$(…)`,
259
+ a trailing `\` or PowerShell backtick).
260
+ - Exit mode controls its own delivery, so delivery-shaping `sendOptions`
261
+ (`method` / `preferClipboard` / `pressEnter` / `chunkSize` / `pasteKey`) are
262
+ rejected with `InvalidArgs`; focus options remain accepted.
263
+
264
+ ---
265
+
206
266
  ## Browser CDP automation
207
267
 
208
268
  For web automation, connect Chrome or Edge with the remote debugging port enabled — no Selenium or Playwright needed.
package/bin/launcher.js CHANGED
@@ -18,15 +18,15 @@ import path from "node:path";
18
18
  import { Readable } from "node:stream";
19
19
  import { pipeline } from "node:stream/promises";
20
20
 
21
- const PACKAGE_VERSION = "1.7.1";
21
+ const PACKAGE_VERSION = "1.8.0";
22
22
  const RELEASE_TAG = `v${PACKAGE_VERSION}`;
23
23
  const REPO_API_URL = `https://api.github.com/repos/Harusame64/desktop-touch-mcp/releases/tags/${RELEASE_TAG}`;
24
24
  const ASSET_NAME = "desktop-touch-mcp-windows.zip";
25
25
  const RELEASE_METADATA_FILE = ".desktop-touch-release.json";
26
26
  const RELEASE_MANIFEST = {
27
- tagName: "v1.7.1",
27
+ tagName: "v1.8.0",
28
28
  assetName: ASSET_NAME,
29
- sha256: "180d047bfe5b4e7014721b782ac30db1512cc18c440d276909e39cd27e56e07c",
29
+ sha256: "a4b330dba135e74707d082cbe1859ae9176b497373610165037f8ab59de35f8e",
30
30
  };
31
31
  const CACHE_ROOT = process.env.DESKTOP_TOUCH_MCP_HOME
32
32
  ? path.resolve(process.env.DESKTOP_TOUCH_MCP_HOME)
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@harusame64/desktop-touch-mcp",
3
- "version": "1.7.1",
3
+ "version": "1.8.0",
4
4
  "mcpName": "io.github.Harusame64/desktop-touch-mcp",
5
5
  "description": "Let Claude, Cursor, or any MCP client see and operate your Windows 10/11 desktop. 28 tools for screenshots, UI Automation, Chrome CDP, keyboard/mouse, terminal, with semantic discover-then-act targeting and per-action perception guards that avoid wrong-window typing and stale-coordinate clicks.",
6
6
  "keywords": [
@@ -89,6 +89,7 @@
89
89
  "test:watch": "vitest",
90
90
  "generate:stub-catalog": "node scripts/generate-stub-tool-catalog.mjs",
91
91
  "check:stub-catalog": "node scripts/generate-stub-tool-catalog.mjs && git diff --exit-code src/stub-tool-catalog.ts",
92
+ "check:failwith-fixtures": "node scripts/extract-failwith-shape-fixtures.mjs && git diff --exit-code tests/fixtures/failwith-callsite-shapes.json",
92
93
  "check:napi-safe": "node scripts/check-napi-safe.mjs",
93
94
  "check:native-types": "node scripts/check-native-types.mjs",
94
95
  "check:no-koffi": "node scripts/check-no-koffi.mjs",