npm - ucu-mcp - Versions diffs - 0.2.0 → 0.3.0 - Mend

ucu-mcp 0.2.0 → 0.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (17) hide show

package/CHANGELOG.md +17 -54
package/README.md +90 -4
package/dist/src/mcp/server.js +11 -6
package/dist/src/mcp/tools.d.ts +6 -1
package/dist/src/mcp/tools.js +206 -47
package/dist/src/platform/base.d.ts +26 -1
package/dist/src/platform/linux.d.ts +4 -2
package/dist/src/platform/linux.js +51 -0
package/dist/src/platform/macos.d.ts +6 -2
package/dist/src/platform/macos.js +160 -16
package/dist/src/platform/windows.d.ts +4 -2
package/dist/src/platform/windows.js +33 -0
package/dist/src/safety/guard.d.ts +8 -1
package/dist/src/safety/guard.js +43 -4
package/dist/src/util/errors.d.ts +6 -0
package/dist/src/util/errors.js +8 -0
package/package.json +2 -2

package/CHANGELOG.md CHANGED Viewed

@@ -5,67 +5,30 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
-## [0.2.0] - 2026-06-05
+## [0.3.0] - 2026-06-06
-### Changed
-- Replaced JXA keyboard/mouse input with native Swift CGEvent helper (`native/cgevent/cgevent-helper`), eliminating SIGSEGV crashes on macOS Sequoia+
-- `listWindows` switched from `CGWindowListCopyWindowInfo` to System Events for reliable window enumeration
-- `getWindowState` adapted to use System Events window IDs instead of CGWindow IDs
-- Fixed OCR JXA script — `isValid` guard now correctly handles missing/broken references
-### Fixed
-- `typeInElement` now properly escapes `$` in text to prevent JXA template-literal interpolation errors
-- AX element cache now refetches stale references instead of throwing
-- MCP server version now resolves from `package.json` instead of advertising stale `0.1.0`
-- `screenshot.maxWidth`, `screenshot.windowId`, and action `captureAfter` encode options now reach the execution path
-- `captureAfter` now returns a separate MCP image content item instead of embedding screenshot bytes in JSON text
-- Window-relative coordinate tools now reject stale `windowId` values instead of falling back to raw screen coordinates
-- Real input actions no longer use the shared retry wrapper after a partial failure
-- macOS AX traversal now uses `uiElements()` with `elements()` fallback, fixing TextEdit `AXTextArea` discovery
-- User activity monitoring now starts with the MCP server and initializes the cursor baseline before polling
-- Added client-friendly aliases and defaults: `press_key.modifiers`, `scroll.deltaX=0`, `wait_for_element.timeoutMs/intervalMs`, and `move.captureAfter`
-- README tool tables and OCR/captureAfter response examples now match the live MCP schema
-- macOS platform failures now use structured `UcuError` subclasses for screenshots, window lookup, AX permissions, stale elements, cursor queries, and input synthesis
-- MCP tool failures now return `isError: true` with JSON `error.name`, `error.code`, `error.retryable`, `error.message`, and `error.recovery` instead of forcing clients to parse plain text
-- `wait_for_element` no longer masks Accessibility/platform failures as ordinary timeouts; missing elements still time out, but real lookup failures surface through the structured MCP error response
-- macOS `listWindows` now uses a short defensive-copy cache for repeated window lookups, reducing back-to-back window resolution calls from seconds to near-zero while `focusApp` still invalidates before activating a target app
-- Added optional real client CLI smoke coverage for Claude Code CLI, Codex CLI, and OpenCode MCP visibility
-- README now includes verified `claude mcp add`, `codex mcp add`, and OpenCode `opencode.json` setup paths
-### Tests
-- Unit test count grew from 83 → 161
-- Optional client CLI smoke: 3/3 passing with `npm run test:client-cli`
-- GUI smoke tests 6/6 passing (`UCU_MACOS_GUI_SMOKE=1`)
-## [0.1.0] - 2026-06-02
 ### Added
-- Initial release of UCU-MCP (Universal Computer Use MCP Server)
-- **22 MCP tools** for desktop automation via Model Context Protocol:
-  - Screen capture: `screen_capture`, `screen_capture_active_window`
-  - Mouse control: `mouse_move`, `mouse_click`, `mouse_double_click`, `mouse_drag`, `mouse_scroll`
-  - Keyboard control: `keyboard_type`, `keyboard_hotkey`, `keyboard_key`
-  - Clipboard: `clipboard_read`, `clipboard_write`
-  - Window management: `window_list`, `window_activate`, `window_close`
-  - Application control: `app_launch`, `app_quit`
-  - System: `system_info`, `process_list`, `process_terminate`
-  - Safety: `doctor` command for permission and environment diagnostics
-- **Safety features**:
-  - URL blocklist to prevent navigation to sensitive sites
-  - Lock screen guard (macOS) — blocks automation when screen is locked
-  - Typed text injection scan — validates keyboard input before injection
-  - Focus steal suppression — prevents accidental focus changes during automation
-  - User interaction monitor — tracks user activity for safety coordination
-- **macOS platform support** with Accessibility API integration
-- TypeScript-first codebase with full type definitions
-- CLI entry point with `doctor` diagnostic command
+- Scenario-based MCP Instructions — tool-usage guidance organized by task pattern (form fill, menu bar click, screen read, app switch, verify action, wait for change, recover stale target, clipboard)
+- findElement multi-strategy: `value` filter (AX value, respects textMode), `index` selector (0-based Nth match), `near` sorter (ascending distance to point)
+- wait_for_element `until` parameter: `appear` (default), `disappear` (poll until gone), `value_change` (poll until first match value differs)
+- Action Receipt v1 — unified receipt structure for all action-class tools (click, double_click, scroll, drag, move, type_text, press_key, click_element, set_value, type_in_element)
+- Receipt fields: actionId (base36-timestamp unique ID), action, status (ok/partial/blocked), target (location context), result (business result), capture (screenshot metadata), warnings, next (suggested next step)
+- Partial receipt when action succeeds but post-action screenshot fails: status="partial", capture.error contains error details, warnings includes "Post-action screenshot capture failed"
+- Target Session v1 — `focus_app` now returns stable target metadata (`targetId`, `appName`, `pid`, `windowId`, `title`, `capturedAt`) for follow-up tool calls
+- `TARGET_STALE` structured errors for active target windows that disappear before `get_window_state`
 ### Changed
+- MCP instructions rewritten from generic description to scenario-driven workflow recommendations
+- `wait_for_element` description updated to reflect `until` parameter semantics
+- `find_element` schema extended with `value`, `index`, `near` parameters
+- Action tool responses now wrap business results under `result` instead of returning them at the top level
+- captureAfter failures now surface through receipt.capture.error instead of a flat captureError object
+- `get_window_state` can use the prior `focus_app` target when `windowId` is omitted
+- AX tools (`find_element`, `wait_for_element`, `click_element`, `set_value`, `type_in_element`) can use the prior `focus_app` target when `app` is omitted
 - Rewrote `src/mcp/tools.ts` with comprehensive 22-tool registry:
   - Unified `withSafety` wrapper for all automation actions
   - `captureAfter` helper for post-action screenshots

package/README.md CHANGED Viewed

@@ -81,7 +81,7 @@ UCU-MCP provides 22 tools across five categories:
 | `screenshot` | Capture screen, window, or region as PNG/JPEG image content | `display?`, `windowId?`, `region?`, `maxWidth?`, `format?` |
 | `list_windows` | List all on-screen windows with IDs, titles, bounds | `includeMinimized?` |
 | `list_apps` | List visible macOS apps with pid, frontmost state, and window count | — |
-| `focus_app` | Select an app/window target context for later AX tools | `app` |
+| `focus_app` | Select an app/window target context for later AX tools; returns `targetId`, `appName`, `pid`, `windowId`, `title`, and `capturedAt` | `app` |
 | `get_window_state` | Get accessibility tree of a window, or the prior focus_app target when windowId is omitted | `windowId?`, `depth?`, `includeBounds?` |
 | `get_screen_size` | Get screen dimensions | `display?` |
 | `ocr` | Perform OCR on screen or region; returns text with bounding boxes and confidence | `display?`, `region?` |
@@ -108,8 +108,8 @@ UCU-MCP provides 22 tools across five categories:
 | Tool | Description | Key Parameters |
 |------|-------------|----------------|
-| `find_element` | Find UI element by text, role, or description using AX APIs | `text?`, `role?`, `app?`, `depth?`, `includeBounds?`, `maxResults?` |
-| `click_element` | Click an AX element by its id (from find_element); refetches equivalent elements after UI updates | `elementId`, `app?`, `captureAfter?` |
+| `find_element` | Find UI element by text, role, or description using AX APIs, using the current focus_app target when app is omitted | `text?`, `role?`, `app?`, `depth?`, `includeBounds?`, `maxResults?` |
+| `click_element` | Click an AX element by its id (from find_element), using the current focus_app target when app is omitted; refetches equivalent elements after UI updates | `elementId`, `app?`, `captureAfter?` |
 | `set_value` | Set an AX element's value directly without focusing it, using the current focus_app target when app is omitted | `elementId`, `value`, `app?`, `captureAfter?` |
 | `type_in_element` | Type text into a specific AX text field element; may focus the element and refetches equivalent elements after UI updates | `elementId`, `text`, `app?`, `clearFirst?`, `captureAfter?` |
@@ -121,10 +121,12 @@ UCU-MCP provides 22 tools across five categories:
 | `wait` | Wait for UI state to settle after launches, animations, or navigation | `ms` |
 | `wait_for_element` | Poll the AX tree until a matching element appears | `text?`, `role?`, `app?`, `timeout?`, `timeoutMs?`, `interval?`, `intervalMs?` |
-Action tools accept `captureAfter`, `captureMaxWidth`, and `captureFormat` so an agent can receive a post-action screenshot as a second MCP image content item in the same response instead of spending another round trip on `screenshot`.
+Action tools accept `captureAfter`, `captureMaxWidth`, and `captureFormat` so an agent can receive a post-action screenshot as a second MCP image content item in the same response instead of spending another round trip on `screenshot`. When `captureAfter` is requested and the action succeeds, the tool returns an `ActionReceipt` (see the Action Receipt section below) with `capture.status: "ok"`. If post-action capture fails, the receipt has `status: "partial"` and `capture.status: "error"` with the error details. If `captureAfter` is omitted, `capture.status` is `"skipped"`.
 For fast AX discovery on large windows, use `find_element` with `includeBounds=false` and a small `maxResults`. Keep bounds enabled when the result may be used for coordinate fallback.
+`focus_app` establishes a session target for follow-up observation and AX actions. After focusing an app, `get_window_state` may omit `windowId`, and AX tools may omit `app`. If the focused window closes or is replaced, UCU-MCP returns a structured `TARGET_STALE` error so the agent can refresh with `focus_app` or `list_windows` instead of silently acting on a different target.
 ## OCR Tool Usage
 The `ocr` tool captures a screenshot and runs optical character recognition, returning each detected text element with its position and confidence score.
@@ -210,6 +212,90 @@ The AX (Accessibility) element tools let you interact with UI controls by their
 }
 ```
+## Action Receipt
+Action tools (`click`, `double_click`, `scroll`, `drag`, `move`, `type_text`, `press_key`, `click_element`, `set_value`, `type_in_element`) return a unified `ActionReceipt` JSON object that wraps the action result, target information, and optional post-action screenshot metadata.
+### Receipt structure
+| Field | Type | Description |
+|-------|------|-------------|
+| `actionId` | `string` | Unique base36-timestamp ID (e.g. `a1x9z2k-1`) |
+| `action` | `string` | Tool name that produced this receipt |
+| `status` | `"ok" \| "partial" \| "blocked"` | Overall action status |
+| `target` | `object` | What was acted upon (coordinates, elementId, app, windowId) |
+| `result` | `object` | Original business result (clicked, x, y, etc.) |
+| `capture` | `object` | Screenshot metadata (requested, status, format, maxWidth, error) |
+| `warnings` | `string[]` | Non-fatal warnings array |
+| `next` | `string` | Suggested next action |
+### Examples
+**Success with captureAfter:**
+```json
+{
+  "actionId": "a1x9z2k-1",
+  "action": "click",
+  "status": "ok",
+  "target": { "x": 100, "y": 200 },
+  "result": { "clicked": true, "x": 100, "y": 200 },
+  "capture": {
+    "requested": true,
+    "status": "ok",
+    "format": "jpeg",
+    "maxWidth": 1280
+  },
+  "warnings": [],
+  "next": "find_element or get_window_state"
+}
+```
+**Success without captureAfter:**
+```json
+{
+  "actionId": "a1x9z2k-2",
+  "action": "click_element",
+  "status": "ok",
+  "target": { "elementId": "AXButton-42", "app": "Safari" },
+  "result": { "clicked": true, "elementId": "AXButton-42" },
+  "capture": {
+    "requested": false,
+    "status": "skipped"
+  },
+  "warnings": [],
+  "next": "find_element or get_window_state"
+}
+```
+**Partial when capture fails:**
+```json
+{
+  "actionId": "a1x9z2k-3",
+  "action": "click",
+  "status": "partial",
+  "target": { "x": 100, "y": 200 },
+  "result": { "clicked": true, "x": 100, "y": 200 },
+  "capture": {
+    "requested": true,
+    "status": "error",
+    "format": "jpeg",
+    "maxWidth": 1280,
+    "error": {
+      "name": "CaptureError",
+      "code": "CAPTURE_FAILED",
+      "retryable": true,
+      "message": "Screenshot capture failed after action",
+      "recovery": "Check Screen Recording permission and retry."
+    }
+  },
+  "warnings": ["Post-action screenshot capture failed"],
+  "next": "screenshot"
+}
+```
 ## macOS Permission Setup
 UCU-MCP on macOS requires two system permissions:

package/dist/src/mcp/server.js CHANGED Viewed

@@ -5,15 +5,20 @@ import { fileURLToPath } from "node:url";
 import { createStdioTransport } from "./transport.js";
 import { registerTools, startUserActivityMonitor } from "./tools.js";
 const UCU_MCP_INSTRUCTIONS = `
-UCU-MCP is a cross-client computer-use server for Claude Code CLI, Claude Code Desktop, OpenCode, and other MCP clients.
+UCU-MCP is a cross-client computer-use server for Claude Code CLI/Desktop, OpenCode, and other MCP clients.
-Use screenshots and window state to observe before acting. On macOS, prefer list_apps/focus_app to establish the target app context, then use AX element tools when an element can be identified: find_element, then click_element, set_value, or type_in_element. Fall back to coordinates only when AX lookup is unavailable or ambiguous.
+Pick the right tool sequence for the task:
-Before repeated UI work, call get_screen_size and list_windows so coordinates and target windows are explicit. Use get_window_state for structured UI trees. Use ocr when visible text is not exposed through Accessibility. For tight observe-act loops, set captureAfter=true on action tools to receive a post-action screenshot in the same tool response.
+• Fill a form field → find_element (text/role) + type_in_element or set_value. Prefer AX over coordinates.
+• Click a menu bar item → get_screen_size + click with coordinates (menu bar is not in the AX tree).
+• Read what's on screen → screenshot; for text not in AX use ocr; for a structured tree use get_window_state.
+• Switch between apps → list_apps, then focus_app; subsequent tools use the active target context.
+• Verify an action succeeded → captureAfter=true on action tools, or call screenshot afterwards.
+• Wait for UI to change → wait_for_element (until: "appear" default; also "disappear" or "value_change").
+• Recover from TARGET_STALE → call focus_app again for the target app, then retry the action.
+• Read or write the clipboard → clipboard_read / clipboard_write.
-Safety model: actions are blocked while macOS is locked, dangerous shortcuts and sensitive windows are blocked, and suspicious injected text is rejected. For text entry into UI controls, prefer type_in_element because it can refetch equivalent AX elements if the UI tree changes.
-For Claude Code CLI/Desktop and OpenCode configs, run the ucu-mcp executable over stdio. If tools fail on macOS, run doctor first to check Accessibility and Screen Recording permissions. Windows and Linux adapters are explicit stubs until their native backends are implemented.
+General rules: on macOS call list_apps/focus_app first to establish target context, then prefer AX tools (find_element → click_element / type_in_element / set_value). Use coordinates only when AX lookup is unavailable. Actions are blocked while macOS is locked; dangerous shortcuts and sensitive windows are blocked; suspicious injected text is rejected. type_in_element can refetch equivalent AX elements when the UI tree changes. Run doctor to check Accessibility and Screen Recording permissions. Windows and Linux adapters are explicit stubs until their native backends are implemented.
 `.trim();
 function getPackageVersion() {
     let dir = dirname(fileURLToPath(import.meta.url));

package/dist/src/mcp/tools.d.ts CHANGED Viewed

@@ -1,10 +1,15 @@
 /**
  * Tool registry for UCU-MCP.
  *
- * Registers 22 MCP tools on the server and dispatches each call through
+ * Registers 24 MCP tools on the server and dispatches each call through
  * a shared safety/permission/retry pipeline (`withSafety`).
  */
 import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
+import type { AppTarget } from "../platform/base.js";
+/**
+ * Get the currently active target context (set by focus_app).
+ */
+export declare function getActiveTarget(): AppTarget | undefined;
 export declare function startUserActivityMonitor(): void;
 export declare function stopUserActivityMonitor(): void;
 export declare function registerTools(server: McpServer): void;