ucu-mcp 0.2.0 → 0.3.1
- package/CHANGELOG.md +31 -53
- package/README.md +90 -4
- package/dist/src/mcp/server.js +11 -6
- package/dist/src/mcp/tools.d.ts +6 -1
- package/dist/src/mcp/tools.js +219 -46
- package/dist/src/platform/base.d.ts +26 -1
- package/dist/src/platform/linux.d.ts +4 -2
- package/dist/src/platform/linux.js +51 -0
- package/dist/src/platform/macos.d.ts +6 -2
- package/dist/src/platform/macos.js +160 -16
- package/dist/src/platform/windows.d.ts +4 -2
- package/dist/src/platform/windows.js +33 -0
- package/dist/src/safety/guard.d.ts +8 -1
- package/dist/src/safety/guard.js +43 -4
- package/dist/src/util/errors.d.ts +26 -1
- package/dist/src/util/errors.js +43 -11
- package/dist/src/util/metrics.d.ts +37 -0
- package/dist/src/util/metrics.js +97 -0
- package/native/cgevent/cgevent-helper +0 -0
- package/native/ocr/ocr-helper +0 -0
- package/package.json +2 -2
package/CHANGELOG.md
CHANGED
@@ -5,67 +5,45 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
-## [0.
+## [0.3.1] - 2026-06-06
 
-###
+### Bug fixes
+
+- `wait_for_element` `until="value_change"` mode no longer spins until timeout when the matched element's initial `value` is `undefined` (e.g. progress indicators / status text without an AX value). A separate `hasInitial` flag now tracks "first sample captured" so a captured `undefined` is preserved as the baseline. On timeout, `value_change` mode now reports `"never_appeared"` (no match ever found) vs `"value_unchanged"` (match found but value did not change) so the model can branch on the result. (Singer review Major fix)
+
+### Note
+
+Post-0.3.0 release changes already merged on `main` and folded into 0.3.1:
+
+- Scenario-based MCP `instructions` covering forms, menu bar, screen read, app switch, verify action, wait for UI, recover from `TARGET_STALE`, clipboard read/write. (47dbcff)
+- `find_element` multi-strategy: new `value` (textMode-aware), `index` (Nth match), `near` (distance-sorted) selectors. (47dbcff)
+- `wait_for_element` `until` parameter: new modes `appear` (default), `disappear`, `value_change`. (47dbcff)
+- `UcuError` class static `code` renamed to `defaultCode` to avoid clashing with the instance `code` field. (eec7afd)
+
+## [0.3.0] - 2026-06-06
 
-- Replaced JXA keyboard/mouse input with native Swift CGEvent helper (`native/cgevent/cgevent-helper`), eliminating SIGSEGV crashes on macOS Sequoia+
-- `listWindows` switched from `CGWindowListCopyWindowInfo` to System Events for reliable window enumeration
-- `getWindowState` adapted to use System Events window IDs instead of CGWindow IDs
-- Fixed OCR JXA script — `isValid` guard now correctly handles missing/broken references
-
-### Fixed
-
-- `typeInElement` now properly escapes `$` in text to prevent JXA template-literal interpolation errors
-- AX element cache now refetches stale references instead of throwing
-- MCP server version now resolves from `package.json` instead of advertising stale `0.1.0`
-- `screenshot.maxWidth`, `screenshot.windowId`, and action `captureAfter` encode options now reach the execution path
-- `captureAfter` now returns a separate MCP image content item instead of embedding screenshot bytes in JSON text
-- Window-relative coordinate tools now reject stale `windowId` values instead of falling back to raw screen coordinates
-- Real input actions no longer use the shared retry wrapper after a partial failure
-- macOS AX traversal now uses `uiElements()` with `elements()` fallback, fixing TextEdit `AXTextArea` discovery
-- User activity monitoring now starts with the MCP server and initializes the cursor baseline before polling
-- Added client-friendly aliases and defaults: `press_key.modifiers`, `scroll.deltaX=0`, `wait_for_element.timeoutMs/intervalMs`, and `move.captureAfter`
-- README tool tables and OCR/captureAfter response examples now match the live MCP schema
-- macOS platform failures now use structured `UcuError` subclasses for screenshots, window lookup, AX permissions, stale elements, cursor queries, and input synthesis
-- MCP tool failures now return `isError: true` with JSON `error.name`, `error.code`, `error.retryable`, `error.message`, and `error.recovery` instead of forcing clients to parse plain text
-- `wait_for_element` no longer masks Accessibility/platform failures as ordinary timeouts; missing elements still time out, but real lookup failures surface through the structured MCP error response
-- macOS `listWindows` now uses a short defensive-copy cache for repeated window lookups, reducing back-to-back window resolution calls from seconds to near-zero while `focusApp` still invalidates before activating a target app
-- Added optional real client CLI smoke coverage for Claude Code CLI, Codex CLI, and OpenCode MCP visibility
-- README now includes verified `claude mcp add`, `codex mcp add`, and OpenCode `opencode.json` setup paths
-
-### Tests
-
-- Unit test count grew from 83 → 161
-- Optional client CLI smoke: 3/3 passing with `npm run test:client-cli`
-- GUI smoke tests 6/6 passing (`UCU_MACOS_GUI_SMOKE=1`)
-
-## [0.1.0] - 2026-06-02
 
 ### Added
 
--
--
-
-
-
-
-
-
-- System: `system_info`, `process_list`, `process_terminate`
-- Safety: `doctor` command for permission and environment diagnostics
-- **Safety features**:
-- URL blocklist to prevent navigation to sensitive sites
-- Lock screen guard (macOS) — blocks automation when screen is locked
-- Typed text injection scan — validates keyboard input before injection
-- Focus steal suppression — prevents accidental focus changes during automation
-- User interaction monitor — tracks user activity for safety coordination
-- **macOS platform support** with Accessibility API integration
-- TypeScript-first codebase with full type definitions
-- CLI entry point with `doctor` diagnostic command
+- Scenario-based MCP Instructions — tool-usage guidance organized by task pattern (form fill, menu bar click, screen read, app switch, verify action, wait for change, recover stale target, clipboard)
+- findElement multi-strategy: `value` filter (AX value, respects textMode), `index` selector (0-based Nth match), `near` sorter (ascending distance to point)
+- wait_for_element `until` parameter: `appear` (default), `disappear` (poll until gone), `value_change` (poll until first match value differs)
+- Action Receipt v1 — unified receipt structure for all action-class tools (click, double_click, scroll, drag, move, type_text, press_key, click_element, set_value, type_in_element)
+- Receipt fields: actionId (base36-timestamp unique ID), action, status (ok/partial/blocked), target (location context), result (business result), capture (screenshot metadata), warnings, next (suggested next step)
+- Partial receipt when action succeeds but post-action screenshot fails: status="partial", capture.error contains error details, warnings includes "Post-action screenshot capture failed"
+- Target Session v1 — `focus_app` now returns stable target metadata (`targetId`, `appName`, `pid`, `windowId`, `title`, `capturedAt`) for follow-up tool calls
+- `TARGET_STALE` structured errors for active target windows that disappear before `get_window_state`
 
 ### Changed
 
+- MCP instructions rewritten from generic description to scenario-driven workflow recommendations
+- `wait_for_element` description updated to reflect `until` parameter semantics
+- `find_element` schema extended with `value`, `index`, `near` parameters
+- Action tool responses now wrap business results under `result` instead of returning them at the top level
+- captureAfter failures now surface through receipt.capture.error instead of a flat captureError object
+- `get_window_state` can use the prior `focus_app` target when `windowId` is omitted
+- AX tools (`find_element`, `wait_for_element`, `click_element`, `set_value`, `type_in_element`) can use the prior `focus_app` target when `app` is omitted
+
 - Rewrote `src/mcp/tools.ts` with comprehensive 22-tool registry:
 - Unified `withSafety` wrapper for all automation actions
 - `captureAfter` helper for post-action screenshots
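The 0.3.1 bug-fix entry describes a separate `hasInitial` flag, so that an `undefined` first value still counts as the baseline. The TypeScript sketch below illustrates that idea only and is not the package's implementation. The `Sample` and `Outcome` types, the name `watchValueChange`, the `"changed"` success label, and the choice to model polling as a pre-collected array rather than timed AX lookups are all assumptions. Only `"never_appeared"` and `"value_unchanged"` come from the changelog.

```typescript
// Hypothetical shapes for illustration; not the package's actual types.
type Sample = { found: false } | { found: true; value: string | undefined };
type Outcome = "changed" | "never_appeared" | "value_unchanged";

// Each array entry stands in for one polling attempt. Reaching the end of
// the array stands in for the timeout.
function watchValueChange(samples: Sample[]): Outcome {
  let hasInitial = false; // "first sample captured"
  let initial: string | undefined; // baseline; may legitimately be undefined

  for (const sample of samples) {
    if (!sample.found) continue; // no match on this poll
    if (!hasInitial) {
      // Record the baseline even when the value is undefined. Checking
      // `initial === undefined` instead of a flag would treat every poll
      // as the first one and never detect a change.
      hasInitial = true;
      initial = sample.value;
      continue;
    }
    if (sample.value !== initial) return "changed";
  }
  return hasInitial ? "value_unchanged" : "never_appeared";
}
```

With this shape, a caller can tell "the element never showed up" apart from "the element was there but its value stayed the same", which is the distinction the changelog says the model should branch on.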
package/README.md
CHANGED
@@ -81,7 +81,7 @@ UCU-MCP provides 22 tools across five categories:
 | `screenshot` | Capture screen, window, or region as PNG/JPEG image content | `display?`, `windowId?`, `region?`, `maxWidth?`, `format?` |
 | `list_windows` | List all on-screen windows with IDs, titles, bounds | `includeMinimized?` |
 | `list_apps` | List visible macOS apps with pid, frontmost state, and window count | — |
-| `focus_app` | Select an app/window target context for later AX tools | `app` |
+| `focus_app` | Select an app/window target context for later AX tools; returns `targetId`, `appName`, `pid`, `windowId`, `title`, and `capturedAt` | `app` |
 | `get_window_state` | Get accessibility tree of a window, or the prior focus_app target when windowId is omitted | `windowId?`, `depth?`, `includeBounds?` |
 | `get_screen_size` | Get screen dimensions | `display?` |
 | `ocr` | Perform OCR on screen or region; returns text with bounding boxes and confidence | `display?`, `region?` |
@@ -108,8 +108,8 @@ UCU-MCP provides 22 tools across five categories:
 
 | Tool | Description | Key Parameters |
 |------|-------------|----------------|
-| `find_element` | Find UI element by text, role, or description using AX APIs | `text?`, `role?`, `app?`, `depth?`, `includeBounds?`, `maxResults?` |
-| `click_element` | Click an AX element by its id (from find_element); refetches equivalent elements after UI updates | `elementId`, `app?`, `captureAfter?` |
+| `find_element` | Find UI element by text, role, or description using AX APIs, using the current focus_app target when app is omitted | `text?`, `role?`, `app?`, `depth?`, `includeBounds?`, `maxResults?` |
+| `click_element` | Click an AX element by its id (from find_element), using the current focus_app target when app is omitted; refetches equivalent elements after UI updates | `elementId`, `app?`, `captureAfter?` |
 | `set_value` | Set an AX element's value directly without focusing it, using the current focus_app target when app is omitted | `elementId`, `value`, `app?`, `captureAfter?` |
 | `type_in_element` | Type text into a specific AX text field element; may focus the element and refetches equivalent elements after UI updates | `elementId`, `text`, `app?`, `clearFirst?`, `captureAfter?` |
 
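The changelog lists three new `find_element` selectors: a `value` filter, a 0-based `index`, and a `near` sorter by ascending distance to a point. The following TypeScript sketch shows one way they could compose; it is not taken from the package. The `AxElement` and `Selector` shapes, the function name `selectElements`, the exact-match comparison (textMode handling is left out), the use of a single point per element, and the filter-then-sort-then-index order are all assumptions.

```typescript
// Hypothetical shapes for illustration; not the package's actual types.
interface AxElement {
  id: string;
  value?: string;
  x: number; // assumed reference point of the element
  y: number;
}

interface Selector {
  value?: string;
  index?: number; // 0-based Nth match
  near?: { x: number; y: number };
}

function selectElements(elements: AxElement[], selector: Selector): AxElement[] {
  let matches = elements.slice(); // never mutate the caller's array

  if (selector.value !== undefined) {
    const wanted = selector.value;
    // Exact comparison only; a real textMode would allow other match modes.
    matches = matches.filter((element) => element.value === wanted);
  }

  if (selector.near !== undefined) {
    const point = selector.near;
    const distance = (element: AxElement): number => {
      const dx = element.x - point.x;
      const dy = element.y - point.y;
      return Math.sqrt(dx * dx + dy * dy);
    };
    matches.sort((a, b) => distance(a) - distance(b)); // ascending distance
  }

  if (selector.index !== undefined) {
    const picked = matches[selector.index];
    return picked === undefined ? [] : [picked];
  }
  return matches;
}
```

Applying `index` last means "the Nth closest element with this value", which seems the most useful reading when `near` and `index` are combined; the package may order these steps differently.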
@@ -121,10 +121,12 @@ UCU-MCP provides 22 tools across five categories:
 | `wait` | Wait for UI state to settle after launches, animations, or navigation | `ms` |
 | `wait_for_element` | Poll the AX tree until a matching element appears | `text?`, `role?`, `app?`, `timeout?`, `timeoutMs?`, `interval?`, `intervalMs?` |
 
-Action tools accept `captureAfter`, `captureMaxWidth`, and `captureFormat` so an agent can receive a post-action screenshot as a second MCP image content item in the same response instead of spending another round trip on `screenshot`.
+Action tools accept `captureAfter`, `captureMaxWidth`, and `captureFormat` so an agent can receive a post-action screenshot as a second MCP image content item in the same response instead of spending another round trip on `screenshot`. When `captureAfter` is requested and the action succeeds, the tool returns an `ActionReceipt` (see the Action Receipt section below) with `capture.status: "ok"`. If post-action capture fails, the receipt has `status: "partial"` and `capture.status: "error"` with the error details. If `captureAfter` is omitted, `capture.status` is `"skipped"`.
 
 For fast AX discovery on large windows, use `find_element` with `includeBounds=false` and a small `maxResults`. Keep bounds enabled when the result may be used for coordinate fallback.
 
+`focus_app` establishes a session target for follow-up observation and AX actions. After focusing an app, `get_window_state` may omit `windowId`, and AX tools may omit `app`. If the focused window closes or is replaced, UCU-MCP returns a structured `TARGET_STALE` error so the agent can refresh with `focus_app` or `list_windows` instead of silently acting on a different target.
+
 ## OCR Tool Usage
 
 The `ocr` tool captures a screenshot and runs optical character recognition, returning each detected text element with its position and confidence score.
@@ -210,6 +212,90 @@ The AX (Accessibility) element tools let you interact with UI controls by their
 }
 ```
 
+## Action Receipt
+
+Action tools (`click`, `double_click`, `scroll`, `drag`, `move`, `type_text`, `press_key`, `click_element`, `set_value`, `type_in_element`) return a unified `ActionReceipt` JSON object that wraps the action result, target information, and optional post-action screenshot metadata.
+
+### Receipt structure
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `actionId` | `string` | Unique base36-timestamp ID (e.g. `a1x9z2k-1`) |
+| `action` | `string` | Tool name that produced this receipt |
+| `status` | `"ok" \| "partial" \| "blocked"` | Overall action status |
+| `target` | `object` | What was acted upon (coordinates, elementId, app, windowId) |
+| `result` | `object` | Original business result (clicked, x, y, etc.) |
+| `capture` | `object` | Screenshot metadata (requested, status, format, maxWidth, error) |
+| `warnings` | `string[]` | Non-fatal warnings array |
+| `next` | `string` | Suggested next action |
+
+### Examples
+
+**Success with captureAfter:**
+
+```json
+{
+  "actionId": "a1x9z2k-1",
+  "action": "click",
+  "status": "ok",
+  "target": { "x": 100, "y": 200 },
+  "result": { "clicked": true, "x": 100, "y": 200 },
+  "capture": {
+    "requested": true,
+    "status": "ok",
+    "format": "jpeg",
+    "maxWidth": 1280
+  },
+  "warnings": [],
+  "next": "find_element or get_window_state"
+}
+```
+
+**Success without captureAfter:**
+
+```json
+{
+  "actionId": "a1x9z2k-2",
+  "action": "click_element",
+  "status": "ok",
+  "target": { "elementId": "AXButton-42", "app": "Safari" },
+  "result": { "clicked": true, "elementId": "AXButton-42" },
+  "capture": {
+    "requested": false,
+    "status": "skipped"
+  },
+  "warnings": [],
+  "next": "find_element or get_window_state"
+}
+```
+
+**Partial when capture fails:**
+
+```json
+{
+  "actionId": "a1x9z2k-3",
+  "action": "click",
+  "status": "partial",
+  "target": { "x": 100, "y": 200 },
+  "result": { "clicked": true, "x": 100, "y": 200 },
+  "capture": {
+    "requested": true,
+    "status": "error",
+    "format": "jpeg",
+    "maxWidth": 1280,
+    "error": {
+      "name": "CaptureError",
+      "code": "CAPTURE_FAILED",
+      "retryable": true,
+      "message": "Screenshot capture failed after action",
+      "recovery": "Check Screen Recording permission and retry."
+    }
+  },
+  "warnings": ["Post-action screenshot capture failed"],
+  "next": "screenshot"
+}
+```
+
 ## macOS Permission Setup
 
 UCU-MCP on macOS requires two system permissions:
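For a client consuming these receipts, a typed view of the structure makes the `status` and `capture.status` branches explicit. The TypeScript sketch below transcribes the fields from the README table and examples; the package's real declarations are not shown in this diff and may differ. The optional markers, the `followUp` helper and its return labels, and the reading of `"blocked"` as "the action did not run" are assumptions.

```typescript
// Field names follow the README table; optionality and nested types are guesses.
interface CaptureInfo {
  requested: boolean;
  status: "ok" | "error" | "skipped";
  format?: string;
  maxWidth?: number;
  error?: {
    name: string;
    code: string;
    retryable: boolean;
    message: string;
    recovery: string;
  };
}

interface ActionReceipt {
  actionId: string;
  action: string;
  status: "ok" | "partial" | "blocked";
  target: Record<string, unknown>;
  result: Record<string, unknown>;
  capture: CaptureInfo;
  warnings: string[];
  next: string;
}

// Hypothetical helper: decide what a client does after reading a receipt.
function followUp(receipt: ActionReceipt): "continue" | "screenshot" | "stop" {
  if (receipt.status === "blocked") {
    return "stop"; // assumed meaning: the action was not performed
  }
  if (receipt.status === "partial" && receipt.capture.status === "error") {
    // The action itself succeeded; only the post-action capture failed,
    // so fetch a screenshot separately instead of repeating the action.
    return "screenshot";
  }
  return "continue";
}
```

The key design point is that a `"partial"` receipt should not trigger a retry of the action, since repeating a click or keystroke that already happened could change the UI a second time.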
package/dist/src/mcp/server.js
CHANGED
@@ -5,15 +5,20 @@ import { fileURLToPath } from "node:url";
 import { createStdioTransport } from "./transport.js";
 import { registerTools, startUserActivityMonitor } from "./tools.js";
 const UCU_MCP_INSTRUCTIONS = `
-UCU-MCP is a cross-client computer-use server for Claude Code CLI
+UCU-MCP is a cross-client computer-use server for Claude Code CLI/Desktop, OpenCode, and other MCP clients.
 
-
+Pick the right tool sequence for the task:
 
-
+• Fill a form field → find_element (text/role) + type_in_element or set_value. Prefer AX over coordinates.
+• Click a menu bar item → get_screen_size + click with coordinates (menu bar is not in the AX tree).
+• Read what's on screen → screenshot; for text not in AX use ocr; for a structured tree use get_window_state.
+• Switch between apps → list_apps, then focus_app; subsequent tools use the active target context.
+• Verify an action succeeded → captureAfter=true on action tools, or call screenshot afterwards.
+• Wait for UI to change → wait_for_element (until: "appear" default; also "disappear" or "value_change").
+• Recover from TARGET_STALE → call focus_app again for the target app, then retry the action.
+• Read or write the clipboard → clipboard_read / clipboard_write.
 
-
-
-For Claude Code CLI/Desktop and OpenCode configs, run the ucu-mcp executable over stdio. If tools fail on macOS, run doctor first to check Accessibility and Screen Recording permissions. Windows and Linux adapters are explicit stubs until their native backends are implemented.
+General rules: on macOS call list_apps/focus_app first to establish target context, then prefer AX tools (find_element → click_element / type_in_element / set_value). Use coordinates only when AX lookup is unavailable. Actions are blocked while macOS is locked; dangerous shortcuts and sensitive windows are blocked; suspicious injected text is rejected. type_in_element can refetch equivalent AX elements when the UI tree changes. Run doctor to check Accessibility and Screen Recording permissions. Windows and Linux adapters are explicit stubs until their native backends are implemented.
 `.trim();
 function getPackageVersion() {
     let dir = dirname(fileURLToPath(import.meta.url));
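The new instructions tell clients to recover from `TARGET_STALE` by calling `focus_app` again and then retrying the action. A client-side wrapper for that pattern might look like the TypeScript sketch below. It assumes synchronous callbacks in place of real MCP tool calls, and the `ToolResult` and `ToolError` shapes and the name `withStaleRecovery` are hypothetical; only the `TARGET_STALE` code and the refocus-then-retry sequence come from the source.

```typescript
// Hypothetical result shapes; a real client would map MCP responses into them.
interface ToolError {
  code: string;
  message: string;
}

type ToolResult<T> = { ok: true; value: T } | { ok: false; error: ToolError };

// Run an action once; if it fails with TARGET_STALE, refocus the app and
// retry exactly one more time. Any other failure is returned unchanged.
function withStaleRecovery<T>(
  runAction: () => ToolResult<T>,
  refocus: () => void, // stands in for calling focus_app on the target app
): ToolResult<T> {
  const first = runAction();
  if (first.ok || first.error.code !== "TARGET_STALE") {
    return first;
  }
  refocus();
  return runAction();
}
```

Limiting the retry to a single attempt keeps a window that keeps disappearing from turning into an endless refocus loop; the second failure is handed back to the caller to decide.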
package/dist/src/mcp/tools.d.ts
CHANGED
@@ -1,10 +1,15 @@
 /**
  * Tool registry for UCU-MCP.
  *
- * Registers
+ * Registers 24 MCP tools on the server and dispatches each call through
  * a shared safety/permission/retry pipeline (`withSafety`).
  */
 import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
+import type { AppTarget } from "../platform/base.js";
+/**
+ * Get the currently active target context (set by focus_app).
+ */
+export declare function getActiveTarget(): AppTarget | undefined;
 export declare function startUserActivityMonitor(): void;
 export declare function stopUserActivityMonitor(): void;
 export declare function registerTools(server: McpServer): void;
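The declaration above only shows that `getActiveTarget()` returns `AppTarget | undefined`. The TypeScript sketch below shows one way a module-level target set by `focus_app` could back the "omit `app`" behavior described in the README. It is an illustration under assumptions: the `AppTarget` field names are taken from the README's `focus_app` row, but their types are guesses, and `setActiveTarget` and `resolveApp` are hypothetical helpers that do not appear in the diff.

```typescript
// Field names from the README; the types here are assumptions.
interface AppTarget {
  targetId: string;
  appName: string;
  pid: number;
  windowId: number;
  title: string;
  capturedAt: string;
}

// Module-level session state: at most one active target at a time.
let activeTarget: AppTarget | undefined;

// Hypothetical setter, as focus_app might call after resolving a window.
function setActiveTarget(target: AppTarget | undefined): void {
  activeTarget = target;
}

function getActiveTarget(): AppTarget | undefined {
  return activeTarget;
}

// Hypothetical helper: an explicit `app` argument wins; otherwise fall back
// to the focus_app target, and fail loudly when neither is available.
function resolveApp(explicitApp?: string): string {
  if (explicitApp !== undefined) return explicitApp;
  const target = getActiveTarget();
  if (target === undefined) {
    throw new Error("No app given and no focus_app target is active");
  }
  return target.appName;
}
```

Failing loudly when no target exists matches the README's stated goal of never silently acting on a different target.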