ucu-mcp 0.3.6 → 0.3.8

package/CHANGELOG.md CHANGED
@@ -5,6 +5,38 @@ All notable changes to this project will be documented in this file.
  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
  and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+ ## [0.3.8] - 2026-06-08
+
+ ### Fixed
+
+ - `focus_app` no longer trips the user-activity pause. It used to be classified as `"other"` (neither observe nor input) so a recent mouse movement could block `focus_app` for 2 s; it is now in `OBSERVE_ACTIONS`, matching the production `withSafety` default. Symptom: OpenCode could not switch the active target app (e.g. CC Switch) without retrying until the cursor had been still for 2 s.
+ - `doctor` native-helper path resolution now checks `process.argv[1]` (npm / npx / global install), walks `import.meta.url` up to 4 levels, and falls back to `npm root -g`. Previously, when the MCP client launched `ucu-mcp` from a cwd other than the project root (the common case for `npx ucu-mcp`), the helper binaries would report as missing even though they were in the tarball. The new report includes `path` and a `tried[]` list so the model can see what was checked.
+ - `doctor` recommendations now list each missing macOS permission on its own line, name the host terminal app (so the user knows which entry to grant in System Settings), and add an Electron AX hint for the common case where `list_windows` returns `[]` even with Accessibility granted.
+
+ ### Tests
+
+ - `safety-guard`: `focus_app` is in `OBSERVE_ACTIONS`; `classifyAction("focus_app") === "observe"`; `withSafety`'s default `skipUserActivityPause` lets the call through even mid user-activity.
+ - `errors`: `WindowNotFoundError` preserves an inline `hint` field set by the platform layer, surfaced in the MCP error response.
+ - `macos-platform`: OCR JXA `"Failed to load screenshot image"` is re-thrown as `CaptureError` with a hint pointing at the missing Screen Recording permission (the typical cause is `screencapture` writing a 0-byte file when TCC denies Screen Recording, not the helper binary being absent).
+ - `tools-layer`: `doctor` report carries `terminalApp` and the richer `nativeHelpers = { cgevent, ocr } = { ok, path, tried[] }` shape.
+
+ ## [0.3.7] - 2026-06-07
+
+ ### Fixed
+
+ - `find_element` value-schema test is no longer a tautology. The 0.3.6 release fixed a *symptom* of the bug (the old test called `handler()` directly, bypassing the McpServer schema-validation wrapper, and then asserted `r.isError === true` which was `undefined`); the underlying tautology remained: the test re-created a local `z.string().min(1).optional()` instead of exercising the real schema. 0.3.7 exports the actual `findElementInputSchema` from `src/mcp/tools.ts` and the test now imports it via `findElementInputSchema.value`, so the assertion genuinely pins the production schema. Pins the 0.3.2 commit `46d4ddd` semantic.
+ - CHANGELOG/JXA `textMatches` comment math is now correct: 3 sources → 1 RegExp = **2 fewer** compilations per matched element. The 0.3.5/0.3.6 wording "three fewer" was off by one and has been corrected in both `src/platform/macos.ts` and this CHANGELOG.
+
+ ### Tests
+
+ - `macos-platform`: text-side regex pre-validation now has a regression test mirroring the existing value-side test (`findElement({text:"[", textMode:"regex"})` throws `PlatformError` with an `Invalid regex pattern` message). Pins the original text-side guard that the 0.3.2 commit mirrored onto the value side.
+ - `macos-platform`: all-no-bounds edge case for the `near` sort — when every result is missing `bounds`, the original JXA order is preserved. Pins the 0.3.2 commit `0710eca` no-bounds fallback against a future refactor that introduces a non-stable comparator.
+
+ ### Hygiene
+
+ - `findElementInputSchema` is now a named export from `src/mcp/tools.ts` (with a JSDoc comment explaining why the schema is exported) so the unit test can assert the production schema directly instead of constructing a local copy.
+ - Added `prepublishOnly` script to `package.json` that runs `npm test && npm run build` before `npm publish`. This is a structural guard against the yank rhythm that hit 0.3.3 and 0.3.5: a failed test or build will now block the publish at the npm level, not at the human level. (Raman review Minor #3)
+
  ## [0.3.5] - 2026-06-06 *(Yanked — see 0.3.6)*
 
  ### Tests
@@ -16,6 +48,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
  ### Changed
 
  - JXA `textMatches` regex branch now compiles the `RegExp` once per element instead of once per source (name / value / description) — three fewer compilations per matched element when `textMode="regex"`. The TS-side pre-validation in `findElement` guarantees the pattern is valid, so the `RegExp` constructor cannot throw here. (Herschel review perf Minor)
+ - JXA `textMatches` regex branch now compiles the `RegExp` once per element instead of once per source (name / value / description) — **two** fewer compilations per matched element when `textMode="regex"` (corrected in 0.3.7; 0.3.5/0.3.6 said "three fewer" which was off by one: 3 sources → 1 regex = 2 saved). The TS-side pre-validation in `findElement` guarantees the pattern is valid, so the `RegExp` constructor cannot throw here. (Herschel review perf Minor)
 
  ### Fixed
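The corrected arithmetic in the `textMatches` entry (3 sources, 1 `RegExp`, 2 compilations saved) can be checked with a small counting sketch. The names `AxElement`, `compile`, `matchPerSource`, and `matchOncePerElement` are hypothetical and are not taken from `src/platform/macos.ts`; the sketch only models how many `RegExp` objects are constructed for one element.

```typescript
// Hypothetical element shape: the three text sources named in the changelog.
interface AxElement {
  name: string;
  value: string;
  description: string;
}

let compilations = 0;

// Wrapper that counts how many RegExp objects get constructed.
function compile(pattern: string): RegExp {
  compilations += 1;
  return new RegExp(pattern);
}

// Old behaviour (sketch): one RegExp per source, so three per element.
function matchPerSource(el: AxElement, pattern: string): boolean {
  const sources = [el.name, el.value, el.description];
  return sources.map((s) => compile(pattern).test(s)).some(Boolean);
}

// New behaviour (sketch): one RegExp per element, reused for every source.
function matchOncePerElement(el: AxElement, pattern: string): boolean {
  const re = compile(pattern);
  return [el.name, el.value, el.description].some((s) => re.test(s));
}

const el: AxElement = { name: "Save", value: "", description: "Save the file" };

compilations = 0;
matchPerSource(el, "^Save");
const before = compilations; // three sources, three compilations

compilations = 0;
matchOncePerElement(el, "^Save");
const after = compilations; // one compilation for the whole element

const saved = before - after;
console.log({ before, after, saved });
```

The saving is the difference between the two counts, which is why "three fewer" could not be right: one compilation is still needed.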
 
@@ -5,11 +5,32 @@
  * a shared safety/permission/retry pipeline (`withSafety`).
  */
  import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
+ import { z } from "zod";
  import type { AppTarget } from "../platform/base.js";
  /**
  * Get the currently active target context (set by focus_app).
  */
  export declare function getActiveTarget(): AppTarget | undefined;
+ export declare const findElementInputSchema: {
+ text: z.ZodOptional<z.ZodString>;
+ role: z.ZodOptional<z.ZodString>;
+ app: z.ZodOptional<z.ZodString>;
+ depth: z.ZodOptional<z.ZodNumber>;
+ includeBounds: z.ZodDefault<z.ZodBoolean>;
+ maxResults: z.ZodDefault<z.ZodNumber>;
+ textMode: z.ZodDefault<z.ZodEnum<{
+ contains: "contains";
+ exact: "exact";
+ regex: "regex";
+ }>>;
+ visibleOnly: z.ZodDefault<z.ZodBoolean>;
+ value: z.ZodOptional<z.ZodString>;
+ index: z.ZodOptional<z.ZodNumber>;
+ near: z.ZodOptional<z.ZodObject<{
+ x: z.ZodNumber;
+ y: z.ZodNumber;
+ }, z.core.$strip>>;
+ };
  export declare function startUserActivityMonitor(): void;
  export declare function stopUserActivityMonitor(): void;
  export declare function registerTools(server: McpServer): void;
@@ -37,6 +37,24 @@ const captureAfterFields = {
  captureMaxWidth: z.number().default(1280).describe("Maximum width for the post-action screenshot"),
  captureFormat: z.enum(["png", "jpeg"]).default("jpeg").describe("Format for the post-action screenshot"),
  };
+ // Exported so unit tests can pin the schema constraint directly instead
+ // of going through the McpServer wrapper (which `handler()` calls
+ // bypass). (Herschel review Major: 0.3.5's value='' test was a
+ // tautology because it re-created a local zod schema instead of
+ // asserting against this one.)
+ export const findElementInputSchema = {
+ text: z.string().optional().describe("Text to search"),
+ role: z.string().optional().describe("AX role"),
+ app: z.string().optional().describe("Target app"),
+ depth: z.number().optional().describe("AX tree depth"),
+ includeBounds: z.boolean().default(true).describe("Include bounds"),
+ maxResults: z.number().min(1).max(200).default(50).describe("Max results"),
+ textMode: z.enum(["contains", "exact", "regex"]).default("contains").describe("Text matching mode: contains (default), exact, or regex"),
+ visibleOnly: z.boolean().default(false).describe("Only return elements with valid on-screen bounds"),
+ value: z.string().min(1).optional().describe("Filter by AX element value (text/regex/exact, see textMode). Empty string is treated as unset (omit the field instead)."),
+ index: z.number().int().nonnegative().optional().describe("Return only the Nth match (0-based) after all other filtering and sorting"),
+ near: z.object({ x: z.number(), y: z.number() }).optional().describe("Sort results by ascending distance to this point and return closest first"),
+ };
  async function resolvePoint(x, y, windowId) {
  if (!windowId)
  return { x, y };
@@ -76,13 +94,21 @@ function errorDetails(error) {
  const err = error instanceof Error ? error : new Error(String(error));
  const code = error instanceof UcuError ? error.code : "UNKNOWN_ERROR";
  const retryable = error instanceof UcuError ? error.retryable : false;
- return {
+ // Some platform errors carry an inline `hint` field (added by macos.ts focusApp
+ // for the Electron AX case, etc.). Surface it under `hint` so the model can
+ // see remediation without parsing the message string.
+ const inlineHint = err.hint;
+ const details = {
  name: err.name,
  code,
  retryable,
  message: err.message,
  recovery: recoveryHint(code),
  };
+ if (typeof inlineHint === "string" && inlineHint.length > 0) {
+ details.hint = inlineHint;
+ }
+ return details;
  }
  let _actionCounter = 0;
  function nextActionId() {
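The `errorDetails` change above can be exercised in isolation with a minimal typed sketch. The `UcuError` class here is a hypothetical stand-in with only the `code` and `retryable` fields the hunk reads, the `recovery` field is left out because `recoveryHint` is not shown in this diff, and the error code and messages are invented for illustration.

```typescript
// Hypothetical stand-in for the package's UcuError base class.
class UcuError extends Error {
  code: string;
  retryable: boolean;
  hint?: string;
  constructor(message: string, code: string, retryable = false) {
    super(message);
    // Keep `instanceof` working even when compiled to an ES5 target.
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = "UcuError";
    this.code = code;
    this.retryable = retryable;
  }
}

interface ErrorDetails {
  name: string;
  code: string;
  retryable: boolean;
  message: string;
  hint?: string;
}

// Mirrors the hunk: copy `hint` only when it is a non-empty string.
function errorDetails(error: unknown): ErrorDetails {
  const err = error instanceof Error ? error : new Error(String(error));
  const code = error instanceof UcuError ? error.code : "UNKNOWN_ERROR";
  const retryable = error instanceof UcuError ? error.retryable : false;
  const inlineHint = (err as Error & { hint?: unknown }).hint;
  const details: ErrorDetails = { name: err.name, code, retryable, message: err.message };
  if (typeof inlineHint === "string" && inlineHint.length > 0) {
    details.hint = inlineHint;
  }
  return details;
}

// Illustrative inputs; the code string and messages are made up.
const hinted = new UcuError("Window not found: CC Switch", "WINDOW_NOT_FOUND");
hinted.hint = "Possibly an Electron app whose AX tree is not exposed.";
const withHint = errorDetails(hinted);
const plain = errorDetails(new Error("boom"));
const fromString = errorDetails("not an Error instance");
console.log(withHint, plain, fromString);
```

Because the field is added only when present, responses for errors without a hint keep the shape they had before this change.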
@@ -263,7 +289,28 @@ export function registerTools(server) {
  includeMinimized: z.boolean().optional().describe("Include minimized windows"),
  }, async (params) => {
  const windows = await withSafety({ action: "list_windows", params: {}, requiresAccessibility: true, execute: () => getPlatform().listWindows(params.includeMinimized) });
- return { content: [{ type: "text", text: JSON.stringify(windows, null, 2) }] };
+ // Attach a diagnostic hint when the result is empty so the model can
+ // tell the difference between "no windows are open" and "AX enumeration
+ // failed for the target app" (common with Electron apps like CC Switch,
+ // VS Code, Discord). The windows list itself is the source of truth; the
+ // hint is advisory only.
+ let diagnostics;
+ if (windows.length === 0) {
+ let accessibility = "unknown";
+ try {
+ const { checkPermission } = await import("../safety/permissions.js");
+ const { granted } = await checkPermission("accessibility");
+ accessibility = granted ? "granted" : "denied";
+ }
+ catch { /* keep unknown */ }
+ const axNote = accessibility === "denied"
+ ? "Accessibility is currently denied to this terminal — grant it via System Settings > Privacy & Security > Accessibility, then retry."
+ : accessibility === "granted"
+ ? "Accessibility is granted. If you expected a specific app to appear here, it is likely an Electron app whose AX tree is not exposed to System Events; try modifying its config file or database directly rather than driving the UI."
+ : "Accessibility status is unknown. Run `doctor` first to verify.";
+ diagnostics = { hint: `list_windows returned 0 windows. ${axNote}`, accessibility };
+ }
+ return { content: [{ type: "text", text: JSON.stringify(diagnostics ? { windows, diagnostics } : windows, null, 2) }] };
  });
  registry.register("list_windows");
  registerTool("list_apps", "List all running applications", {}, async () => {
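One consequence of the `list_windows` hunk above is that the JSON payload is a bare array when windows exist and an object `{ windows, diagnostics }` when the list is empty. The sketch below models that with two hypothetical helpers, `buildListWindowsPayload` and `extractWindows`, neither of which exists in the package; the hint strings are abbreviated placeholders rather than the wording in the diff.

```typescript
type AccessibilityState = "granted" | "denied" | "unknown";

interface Diagnostics {
  hint: string;
  accessibility: AccessibilityState;
}

interface WithDiagnostics<T> {
  windows: T[];
  diagnostics: Diagnostics;
}

// Builds the value that the handler would pass to JSON.stringify.
function buildListWindowsPayload<T>(
  windows: T[],
  accessibility: AccessibilityState,
): T[] | WithDiagnostics<T> {
  if (windows.length > 0) return windows; // non-empty: plain array, as before
  // Abbreviated notes; the real messages are longer.
  const axNote =
    accessibility === "denied"
      ? "Accessibility is denied."
      : accessibility === "granted"
        ? "Accessibility is granted; the app may be Electron."
        : "Accessibility status is unknown.";
  return {
    windows,
    diagnostics: { hint: `list_windows returned 0 windows. ${axNote}`, accessibility },
  };
}

// Client-side helper (an assumption, not part of ucu-mcp): accept both shapes.
function extractWindows<T>(payload: T[] | WithDiagnostics<T>): T[] {
  return Array.isArray(payload) ? payload : payload.windows;
}

const some = buildListWindowsPayload(["Finder"], "unknown");
const empty = buildListWindowsPayload<string>([], "granted");
console.log(extractWindows(some), extractWindows(empty));
```

A consumer that parses the text content and assumes an array would need a check like `extractWindows` to handle the empty case.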
@@ -366,55 +413,98 @@ export function registerTools(server) {
  });
  registry.register("drag");
  registerTool("doctor", "Check system permissions, native helpers, and client readiness", {}, async () => {
- const { checkPermissions } = await import("../safety/permissions.js");
+ const { checkPermissions, getPermissionInstructions, getTerminalAppName } = await import("../safety/permissions.js");
  const { MacOSPlatform: MacPlat } = await import("../platform/macos.js");
- const { existsSync } = await import("node:fs");
- const { join, dirname } = await import("node:path");
+ const { existsSync, statSync } = await import("node:fs");
+ const { join, dirname, resolve } = await import("node:path");
  const { fileURLToPath } = await import("node:url");
  const { execFileSync } = await import("node:child_process");
  const permissions = await checkPermissions();
  const screenLocked = process.platform === "darwin" ? new MacPlat().isScreenLocked?.() ?? false : false;
- let nativeHelpers;
- if (process.platform === "darwin") {
+ const termApp = process.platform === "darwin" ? getTerminalAppName() : undefined;
+ // Resolve native helper binaries across every install layout we have seen:
+ // - dev: process.cwd() === project root
+ // - npm install --prefix X: argv[1] is in X/node_modules/ucu-mcp/...
+ // - global install via npm: argv[1] is in $(npm root -g)/ucu-mcp/...
+ // - npx: argv[1] is in ~/.npm/_npx/.../node_modules/ucu-mcp/...
+ // - bin/ucu-mcp.js is the entry; dist/src/*/tools.js is the module path
+ function resolveHelperPath(relParts) {
+ const tried = [];
+ const tryPaths = [];
  const moduleDir = dirname(fileURLToPath(import.meta.url));
- const checkPaths = (subdirs) => {
- const paths = [
- join(process.cwd(), ...subdirs),
- join(moduleDir, "..", ...subdirs),
- join(moduleDir, "..", "..", ...subdirs),
- ];
- return paths.some(p => { try {
- return existsSync(p);
+ const argv1 = process.argv[1] ? resolve(process.argv[1]) : "";
+ const argv1Dir = argv1 ? dirname(argv1) : "";
+ // (1) process.cwd() — dev invocation
+ tryPaths.push(join(process.cwd(), ...relParts));
+ // (2) argv[1] dir — npm / npx / global
+ if (argv1Dir) {
+ tryPaths.push(join(argv1Dir, ...relParts));
+ tryPaths.push(join(argv1Dir, "..", ...relParts));
+ tryPaths.push(join(argv1Dir, "..", "..", ...relParts));
+ }
+ // (3) module dir — dist/bin or dist/src/mcp; walk up to 4 levels
+ tryPaths.push(join(moduleDir, "..", ...relParts));
+ tryPaths.push(join(moduleDir, "..", "..", ...relParts));
+ tryPaths.push(join(moduleDir, "..", "..", "..", ...relParts));
+ tryPaths.push(join(moduleDir, "..", "..", "..", "..", ...relParts));
+ // (4) npm root -g for global install (best effort)
+ if (process.platform === "darwin") {
+ try {
+ const npmRoot = execFileSync("npm", ["root", "-g"], { encoding: "utf-8", timeout: 2000 }).trim();
+ if (npmRoot) {
+ tryPaths.push(join(npmRoot, "ucu-mcp", ...relParts));
+ }
  }
- catch {
- return false;
- } });
- };
+ catch { /* npm not on PATH is fine */ }
+ }
+ for (const p of tryPaths) {
+ tried.push(p);
+ try {
+ if (existsSync(p) && statSync(p).isFile())
+ return { path: p, tried };
+ }
+ catch { /* skip */ }
+ }
+ return { path: null, tried };
+ }
+ let nativeHelpers;
+ if (process.platform === "darwin") {
+ const cgevent = resolveHelperPath(["native", "cgevent", "cgevent-helper"]);
+ const ocr = resolveHelperPath(["native", "ocr", "ocr-helper"]);
  nativeHelpers = {
- cgevent: checkPaths(["native", "cgevent", "cgevent-helper"]),
- ocr: checkPaths(["native", "ocr", "ocr-helper"]),
+ cgevent: { ok: cgevent.path !== null, path: cgevent.path, tried: cgevent.tried.slice(0, 3) },
+ ocr: { ok: ocr.path !== null, path: ocr.path, tried: ocr.tried.slice(0, 3) },
  };
  }
  let readiness = "ready";
  const issues = [];
  if (!permissions.granted) {
  readiness = "blocked";
- issues.push("Missing macOS permissions: " + permissions.missing.join(", "));
+ for (const m of (permissions.missing ?? [])) {
+ issues.push(`Missing macOS permission: ${m}`);
+ }
  }
  if (screenLocked) {
  readiness = "blocked";
  issues.push("Screen is locked");
  }
  if (process.platform === "darwin" && nativeHelpers) {
- if (!nativeHelpers.cgevent) {
+ if (!nativeHelpers.cgevent.ok) {
  readiness = readiness === "ready" ? "degraded" : readiness;
- issues.push("Native CGEvent helper not found (input synthesis may crash on macOS Sequoia+)");
+ issues.push("Native CGEvent helper not found (input synthesis may crash on macOS Sequoia+). Run `npm run build` to compile it, or reinstall ucu-mcp so the helper ships from the tarball.");
  }
- if (!nativeHelpers.ocr) {
+ if (!nativeHelpers.ocr.ok) {
  readiness = readiness === "ready" ? "degraded" : readiness;
- issues.push("Native OCR helper not found (OCR may fail on macOS Sequoia+)");
+ issues.push("Native OCR helper not found (OCR may fail on macOS Sequoia+). Run `npm run build` to compile it, or reinstall ucu-mcp so the helper ships from the tarball.");
  }
  }
+ // Heuristic AX hint: if Accessibility is granted but list_windows consistently
+ // returns empty for the only app the model cared about, the model has likely
+ // hit the Electron AX limitation (Electron windows do not expose AX to System
+ // Events unless Accessibility is also granted to the Electron process itself,
+ // and the app has accessibility features enabled). This block is read-only —
+ // we never hit JXA here because the doctor must stay fast and side-effect free.
+ const electronHint = "If the target app is Electron (e.g. CC Switch, VS Code, Discord), list_windows may return [] even with Accessibility granted to your terminal. Grant Accessibility to the Electron app itself in System Settings > Privacy & Security > Accessibility, and restart the app. As a workaround, modify the app\'s config file or database directly rather than driving the UI.";
  const clients = {};
  for (const bin of ["claude", "codex", "opencode", "npx"]) {
  try {
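The search order used by `resolveHelperPath` in the hunk above can be modelled without touching the filesystem. The sketch below is a simplified, POSIX-only approximation: `Layout`, `candidatePaths`, `resolveHelper`, and the two string helpers are hypothetical names, the directory paths are invented, and the file check is an injected predicate instead of `existsSync` plus `statSync`.

```typescript
// Minimal POSIX-only path helpers so the sketch needs no imports.
function parent(dir: string, levels: number): string {
  const parts = dir.split("/").filter((p) => p.length > 0);
  return "/" + parts.slice(0, Math.max(0, parts.length - levels)).join("/");
}

function joinPath(dir: string, rel: string[]): string {
  return [dir.replace(/\/+$/, ""), ...rel].join("/");
}

// Hypothetical description of where the process was started from.
interface Layout {
  cwd: string;
  argv1Dir?: string;      // directory of process.argv[1], if any
  moduleDir: string;      // directory of the compiled tools module
  npmGlobalRoot?: string; // output of `npm root -g`, if available
}

function candidatePaths(layout: Layout, rel: string[]): string[] {
  const out: string[] = [];
  out.push(joinPath(layout.cwd, rel)); // (1) dev invocation
  if (layout.argv1Dir) {
    // (2) entry script dir, then one and two levels up
    for (const up of [0, 1, 2]) out.push(joinPath(parent(layout.argv1Dir, up), rel));
  }
  // (3) module dir, one to four levels up
  for (const up of [1, 2, 3, 4]) out.push(joinPath(parent(layout.moduleDir, up), rel));
  // (4) global install root, best effort
  if (layout.npmGlobalRoot) out.push(joinPath(layout.npmGlobalRoot, ["ucu-mcp", ...rel]));
  return out;
}

function resolveHelper(
  layout: Layout,
  rel: string[],
  isFile: (p: string) => boolean,
): { path: string | null; tried: string[] } {
  const tried: string[] = [];
  for (const p of candidatePaths(layout, rel)) {
    tried.push(p);
    if (isFile(p)) return { path: p, tried }; // first hit wins
  }
  return { path: null, tried };
}

// Invented npx-style layout: cwd is unrelated to the package directory.
const pkg = "/tmp/npx-cache/node_modules/ucu-mcp";
const layout: Layout = { cwd: "/work", argv1Dir: `${pkg}/bin`, moduleDir: `${pkg}/dist/src/mcp` };
const rel = ["native", "cgevent", "cgevent-helper"];
const helperOnDisk = `${pkg}/native/cgevent/cgevent-helper`;

const found = resolveHelper(layout, rel, (p) => p === helperOnDisk);
const missing = resolveHelper(layout, rel, () => false);
console.log(found.path, found.tried.length, missing.tried.length);
```

In this layout the cwd candidate misses and the helper is found one level above the entry script's directory, which is the case the old three-path check could not cover. Note that the report in the diff keeps only the first three entries of `tried`.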
@@ -427,16 +517,24 @@ export function registerTools(server) {
  }
  const recommendations = [];
  if (readiness === "blocked") {
- recommendations.push("Grant missing permissions in System Settings > Privacy & Security, then restart the MCP client.");
+ for (const m of (permissions.missing ?? [])) {
+ const app = termApp ?? "your terminal app";
+ recommendations.push(`${m}: ${getPermissionInstructions(m)} (Grant to ${app}.)`);
+ }
+ if (screenLocked)
+ recommendations.push("Unlock the screen, then retry.");
  }
- else if (readiness === "degraded") {
- if (nativeHelpers && (!nativeHelpers.cgevent || !nativeHelpers.ocr)) {
- recommendations.push("Run 'npm run build' to compile native Swift helpers.");
+ if (readiness !== "ready") {
+ if (process.platform === "darwin" && nativeHelpers && (!nativeHelpers.cgevent.ok || !nativeHelpers.ocr.ok)) {
+ recommendations.push("Run `npm run build` in the ucu-mcp project to compile native Swift helpers (cgevent-helper, ocr-helper).");
  }
  }
- else {
+ if (readiness === "ready") {
  recommendations.push("All checks passed. MCP client can proceed with automation.");
  }
+ else if (process.platform === "darwin") {
+ recommendations.push(electronHint);
+ }
  const report = {
  readiness,
  issues: issues.length > 0 ? issues : undefined,
@@ -445,6 +543,7 @@ export function registerTools(server) {
  node: process.version,
  permissions,
  screenLocked,
+ terminalApp: termApp,
  nativeHelpers,
  clients,
  safety: {
@@ -547,15 +646,7 @@ export function registerTools(server) {
  return actionResponse("move", { moved: true, x: pt.x, y: pt.y }, { x: pt.x, y: pt.y, windowId: params.windowId }, params.captureAfter, params.captureFormat, params.captureMaxWidth);
  });
  registry.register("move");
- registerTool("find_element", "Find accessibility elements by text, role, or value. Supports value/index/near selectors.", {
- text: z.string().optional().describe("Text to search"), role: z.string().optional().describe("AX role"), app: z.string().optional().describe("Target app"),
- depth: z.number().optional().describe("AX tree depth"), includeBounds: z.boolean().default(true).describe("Include bounds"), maxResults: z.number().min(1).max(200).default(50).describe("Max results"),
- textMode: z.enum(["contains", "exact", "regex"]).default("contains").describe("Text matching mode: contains (default), exact, or regex"),
- visibleOnly: z.boolean().default(false).describe("Only return elements with valid on-screen bounds"),
- value: z.string().min(1).optional().describe("Filter by AX element value (text/regex/exact, see textMode). Empty string is treated as unset (omit the field instead)."),
- index: z.number().int().nonnegative().optional().describe("Return only the Nth match (0-based) after all other filtering and sorting"),
- near: z.object({ x: z.number(), y: z.number() }).optional().describe("Sort results by ascending distance to this point and return closest first"),
- }, async (params) => {
+ registerTool("find_element", "Find accessibility elements by text, role, or value. Supports value/index/near selectors.", findElementInputSchema, async (params) => {
  const effectiveApp = params.app || getActiveTarget()?.appName;
  const response = await withSafety({ action: "find_element", params: {}, requiresAccessibility: true,
  execute: () => getPlatform().findElement({ text: params.text, role: params.role, app: effectiveApp, depth: params.depth, includeBounds: params.includeBounds, maxResults: params.maxResults, textMode: params.textMode, visibleOnly: params.visibleOnly, value: params.value, index: params.index, near: params.near }) });
@@ -248,7 +248,21 @@ export class MacOSPlatform {
  await new Promise((resolve) => setTimeout(resolve, 150));
  } while (Date.now() < deadline);
  if (!target) {
- throw new WindowNotFoundError(app);
+ // Wrap with a more diagnostic message: many real-world failures are
+ // Electron apps that do not expose their AX tree to System Events
+ // (CC Switch, VS Code, Discord, Slack). WindowNotFoundError carries the
+ // app name so the tool handler can surface a remediation hint. The
+ // bare WindowNotFoundError("CC Switch") was indistinguishable from
+ // "the app is not running", which led models to retry forever.
+ const err = new WindowNotFoundError(app);
+ err.hint =
+ "list_windows returned no match for this app. If the app is running, " +
+ "the most likely cause is that it is an Electron app whose AX tree is " +
+ "not exposed to System Events (System Settings > Privacy & Security > " +
+ "Accessibility must be granted to the Electron process itself, not just " +
+ "to the host terminal). As a workaround, modify the app's config file " +
+ "or database directly.";
+ throw err;
  }
  this.activeTarget = {
  targetId: randomUUID(),
@@ -779,8 +793,18 @@ export class MacOSPlatform {
  `;
  const out = execFileSync("osascript", ["-l", "JavaScript", "-e", jxaScript], { encoding: "utf-8", timeout: 30000 }).trim();
  const parsed = JSON.parse(out);
- if (parsed.error)
- throw new CaptureError(`ocr failed: ${parsed.error}`);
+ if (parsed.error) {
+ // Distinguish permission-class failures from real Vision errors.
+ // screencapture writes a 0-byte file when Screen Recording is not granted,
+ // and the JXA NSImage init then fails with "Failed to load screenshot image".
+ // Surface that as a PermissionError hint so the model can suggest the right fix.
+ const hint = parsed.error === "Failed to load screenshot image"
+ ? " (the screenshot file is empty or unreadable — Screen Recording permission is most likely missing; run `doctor` and grant Screen Recording to the host terminal, then retry)"
+ : parsed.error === "Failed to get CGImage from screenshot"
+ ? " (the screenshot could not be decoded — likely an empty capture; check Screen Recording permission)"
+ : "";
+ throw new CaptureError(`ocr failed: ${parsed.error}${hint}`);
+ }
  const imgWidth = buf.readUInt32BE(16);
  const scaleFactorX = screenSize.width / (region ? region.width : (imgWidth / scaleFactor));
  const elements = parsed.elements.map((el) => ({
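The OCR hunk above appends a hint to the `CaptureError` message for two known JXA error strings and leaves every other error untouched. A table-driven sketch of that mapping follows; `OCR_HINTS`, `ocrErrorHint`, and `ocrErrorMessage` are hypothetical names, and the hint texts are shortened paraphrases of the ones in the diff.

```typescript
// Known permission-class OCR failures and the (abbreviated) hint for each.
const OCR_HINTS = new Map<string, string>([
  [
    "Failed to load screenshot image",
    " (screenshot file is empty or unreadable; Screen Recording permission is most likely missing)",
  ],
  [
    "Failed to get CGImage from screenshot",
    " (screenshot could not be decoded; check Screen Recording permission)",
  ],
]);

// Exact-match lookup, mirroring the `===` comparisons in the hunk.
function ocrErrorHint(error: string): string {
  return OCR_HINTS.get(error) ?? "";
}

// Builds the message that would be passed to the CaptureError constructor.
function ocrErrorMessage(error: string): string {
  return `ocr failed: ${error}${ocrErrorHint(error)}`;
}

const permissionCase = ocrErrorMessage("Failed to load screenshot image");
const otherCase = ocrErrorMessage("Vision request failed");
console.log(permissionCase);
console.log(otherCase);
```

The error class stays `CaptureError` in the diff; only the message text changes, so callers that branch on the error type are unaffected.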
@@ -100,6 +100,10 @@ export const OBSERVE_ACTIONS = new Set([
  "wait_for_element",
  "doctor",
  "clipboard_read",
+ // focus_app only sets the active target context via AppleScript activate
+ // and an AX window lookup — it does not synthesize mouse or keyboard input,
+ // so the user-activity pause must not block it. (OpenCode 0.3.7 follow-up)
+ "focus_app",
  ]);
  /** Actions that synthesize user input — need full user-activity protection. */
  export const INPUT_ACTIONS = new Set([
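The effect of moving `focus_app` into `OBSERVE_ACTIONS` can be sketched as follows. Only the set names, `classifyAction`, the three class labels, and the 2 s window come from the changelog; the body of `classifyAction`, the `shouldPause` helper, the `INPUT_ACTIONS` members, and the rule that every non-observe action is paused are assumptions made for illustration.

```typescript
type ActionClass = "observe" | "input" | "other";

// Members visible in the hunk, plus the newly added focus_app.
const OBSERVE_ACTIONS = new Set<string>([
  "wait_for_element",
  "doctor",
  "clipboard_read",
  "focus_app",
]);

// Hypothetical members; the real list is not shown in this diff.
const INPUT_ACTIONS = new Set<string>(["drag", "move"]);

function classifyAction(action: string): ActionClass {
  if (OBSERVE_ACTIONS.has(action)) return "observe";
  if (INPUT_ACTIONS.has(action)) return "input";
  return "other";
}

const USER_ACTIVITY_PAUSE_MS = 2000; // the 2 s window from the changelog

// Assumed rule: anything that is not "observe" waits out recent user activity.
function shouldPause(action: string, msSinceUserActivity: number): boolean {
  return classifyAction(action) !== "observe" && msSinceUserActivity < USER_ACTIVITY_PAUSE_MS;
}

const focusClass = classifyAction("focus_app");
const focusPaused = shouldPause("focus_app", 100);
const dragPaused = shouldPause("drag", 100);
const dragAfterIdle = shouldPause("drag", 5000);
console.log({ focusClass, focusPaused, dragPaused, dragAfterIdle });
```

Under these assumptions, an action missing from both sets falls into `"other"` and is paused, which matches the symptom the changelog describes for `focus_app` before 0.3.8.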
@@ -8,6 +8,10 @@ export interface PermissionDetail {
  granted: boolean;
  instructions: string;
  }
+ /**
+ * Get the name of the terminal app that the user needs to authorize.
+ */
+ export declare function getTerminalAppName(): string;
  export declare function checkPermissions(): Promise<PermissionCheckResult>;
  export declare function checkPermission(type: "accessibility" | "screenRecording"): Promise<{
  granted: boolean;
@@ -4,7 +4,7 @@ const execFileAsync = promisify(execFile);
  /**
  * Get the name of the terminal app that the user needs to authorize.
  */
- function getTerminalAppName() {
+ export function getTerminalAppName() {
  // Walk up the process tree to find the terminal emulator
  const ppid = process.ppid;
  // Common terminal app names
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "ucu-mcp",
- "version": "0.3.6",
+ "version": "0.3.8",
  "description": "MCP server for Universal Computer Use — desktop automation for AI agents via Model Context Protocol",
  "type": "module",
  "bin": {
@@ -26,6 +26,7 @@
  "test:integration": "vitest run tests/integration/",
  "test:macos-gui": "UCU_MACOS_GUI_SMOKE=1 vitest run tests/integration/macos-gui-smoke.test.ts",
  "test:client-cli": "UCU_CLIENT_CLI_SMOKE=1 vitest run tests/integration/client-cli-smoke.test.ts",
+ "prepublishOnly": "npx vitest run tests/unit/ && npm run build",
  "build:native": "cd native/cgevent && swiftc -O -o cgevent-helper main.swift -framework CoreGraphics -framework Foundation && cd ../ocr && swiftc -O -o ocr-helper main.swift -framework Vision -framework AppKit"
  },
  "keywords": [