ucu-mcp 0.4.2 → 0.5.0

@@ -0,0 +1,255 @@
+ # Tool Reference
+
+ UCU-MCP exposes 26 tools across five categories. All action tools accept
+ optional `captureAfter` / `captureMaxWidth` / `captureFormat` parameters that
+ screenshot the result and append it to the response.
+
+ Coordinate inputs are **screen-absolute** unless noted (window-relative only
+ when a `windowId` is explicitly passed).
+
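To make the two coordinate spaces concrete, here is a minimal sketch that converts a window-relative point to a screen-absolute one using the window's `bounds` from `list_windows`. It assumes `bounds` has the shape `{x, y, width, height}` with a top-left origin, which this reference does not spell out, and `toScreenAbsolute` is a hypothetical helper, not part of ucu-mcp.

```typescript
// Assumed shape of WindowInfo.bounds; the reference does not define it.
type Bounds = { x: number; y: number; width: number; height: number };
type Point = { x: number; y: number };

// Hypothetical helper: map a window-relative point to screen-absolute coordinates.
function toScreenAbsolute(bounds: Bounds, p: Point): Point {
  if (p.x < 0 || p.y < 0 || p.x > bounds.width || p.y > bounds.height) {
    throw new RangeError("point lies outside the window bounds");
  }
  return { x: bounds.x + p.x, y: bounds.y + p.y };
}

// A point 10px right and 20px down from the window's top-left corner.
const abs = toScreenAbsolute({ x: 100, y: 50, width: 800, height: 600 }, { x: 10, y: 20 });
// abs is { x: 110, y: 70 }
```

Passing `windowId` to `click` should make this conversion unnecessary; the sketch is only useful when mixing window-relative measurements with tools that take screen-absolute input.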
+ ---
+
+ ## Screen & Window
+
+ ### `screenshot`
+ Capture the full screen, a region, or a specific window.
+
+ | Param | Type | Default | Notes |
+ |---|---|---|---|
+ | `display` | number | 0 | Display index |
+ | `windowId` | string | — | From `list_windows`; captures that window |
+ | `region` | `{x,y,width,height}` | — | Mutually exclusive with `windowId` |
+ | `format` | `"png"` \| `"jpeg"` | `"png"` | |
+ | `maxWidth` | number | 1280 | Resize preserving aspect ratio |
+ | `describe` | boolean | false | Append a text `ScreenDescription` block (OCR + AX) after the image |
+ | `describeOptions` | object | — | `{axDepth=3, ocrBlocks=50, includeAx=true}` when `describe=true` |
+
+ Returns one image content block (+ one text block if `describe=true`). Use
+ `describe=true` when image content may not reach the model (relay/URL downgrade).
+
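The `windowId` / `region` exclusivity can be checked on the client before calling the tool. The sketch below is a hypothetical pre-check (`checkScreenshotArgs` and `withScreenshotDefaults` are illustrative names, not ucu-mcp APIs); the server performs its own validation, and the Troubleshooting notes report this combination as `UNSUPPORTED_PARAMETER`.

```typescript
type Region = { x: number; y: number; width: number; height: number };
type ScreenshotArgs = {
  display?: number;
  windowId?: string;
  region?: Region;
  format?: "png" | "jpeg";
  maxWidth?: number;
  describe?: boolean;
};

// Hypothetical client-side pre-check; returns an error code or null when the args look fine.
function checkScreenshotArgs(args: ScreenshotArgs): string | null {
  if (args.windowId !== undefined && args.region !== undefined) {
    return "UNSUPPORTED_PARAMETER"; // windowId and region are mutually exclusive
  }
  return null;
}

// Spell out the documented defaults without mutating the caller's object.
function withScreenshotDefaults(args: ScreenshotArgs): ScreenshotArgs {
  return {
    ...args,
    display: args.display ?? 0,
    format: args.format ?? "png",
    maxWidth: args.maxWidth ?? 1280,
    describe: args.describe ?? false,
  };
}
```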
+ ### `list_windows`
+ List visible windows. Returns `WindowInfo[]` (`{id, title, processName, pid,
+ bounds, isMinimized, isOnScreen}`). When empty, includes a `diagnostics` hint
+ distinguishing permission-denied vs Electron-opacity.
+
+ | Param | Type | Default |
+ |---|---|---|
+ | `includeMinimized` | boolean | false |
+
+ ### `get_window_state`
+ AX tree of a window. Returns `WindowState` = `{window, focusedElement?, tree?}`.
+ The `tree` is a depth-limited `ElementInfo` (`{role, name, value, states,
+ bounds?, children?}`).
+
+ | Param | Type | Default |
+ |---|---|---|
+ | `windowId` | string | active target |
+ | `depth` | number | 3 (capped at 10) |
+ | `includeBounds` | boolean | false |
+
+ ### `get_screen_size`
+ Returns `{width, height, scaleFactor, estimated?}`. Synchronous, low-cost.
+
+ | Param | Type | Default |
+ |---|---|---|
+ | `display` | number | 0 |
+
+ ### `ocr`
+ Run Vision OCR on the full screen or a region. Returns `{elements:
+ OcrElement[], fullText}`. Each `OcrElement` = `{text, x, y, width, height,
+ confidence}` in screen-absolute coordinates.
+
+ | Param | Type | Default |
+ |---|---|---|
+ | `display` | number | 0 |
+ | `region` | `{x,y,width,height}` | full screen |
+
+ ### `describe_screen`
+ Structured text description of the screen — the **vision-degraded fallback**.
+ Returns `ScreenDescription` = `{capturedAt, screen, foregroundWindow,
+ ocr:{blocks, fullText, status}, ax:{elements?, status, windowId?}, errors[]}`.
+ Each source is collected independently; failures land in `errors` (never thrown).
+ Password fields are masked to `[REDACTED]`.
+
+ | Param | Type | Default | Notes |
+ |---|---|---|---|
+ | `display` | number | 0 | |
+ | `ocr` | boolean | true | Requires Screen Recording when true |
+ | `includeAx` | boolean | true | Requires Accessibility when true |
+ | `axDepth` | number | 3 | Capped at 10 |
+ | `ocrBlocks` | number | 50 | Max OCR elements returned |
+ | `windowId` | string | active target | AX traversal target |
+
+ Use when: image content blocks are not visible to you; you need
+ machine-readable layout; you want OCR + AX in one call with graceful failure.
+
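Because each source can fail or be skipped on its own, a caller has to decide which half of the result to trust. The sketch below shows one possible policy, assuming a trimmed-down result type: only the `"failed"` and `"skipped"` status values appear in these docs, so any other status is treated as usable. `pickSource` and `ScreenDescriptionLite` are hypothetical names.

```typescript
// Minimal subset of ScreenDescription; field shapes beyond the names in the
// reference are assumptions made for this sketch.
type ScreenDescriptionLite = {
  ocr: { blocks: unknown[]; fullText: string; status: string };
  ax: { elements?: unknown[]; status: string };
  errors: unknown[];
};

type Plan = "use_ax" | "use_ocr" | "nothing_usable";

// Prefer AX data when present, fall back to OCR blocks, otherwise give up.
function pickSource(desc: ScreenDescriptionLite): Plan {
  const unusable = (status: string) => status === "failed" || status === "skipped";
  if (!unusable(desc.ax.status) && (desc.ax.elements?.length ?? 0) > 0) return "use_ax";
  if (!unusable(desc.ocr.status) && desc.ocr.blocks.length > 0) return "use_ocr";
  return "nothing_usable";
}
```

Preferring AX first mirrors the advice elsewhere in this reference to favor element tools over coordinate clicks; a caller that only needs text could reverse the order.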
+ ---
+
+ ## Mouse & Input
+
+ All accept `captureAfter` / `captureMaxWidth` / `captureFormat`.
+
+ ### `click` / `double_click`
+ Click at screen coordinates. `button` ∈ `left|right|middle`.
+
+ | Param | Type |
+ |---|---|
+ | `x`, `y` | number |
+ | `button` | `"left"` \| `"right"` \| `"middle"` |
+ | `windowId` | string (optional, makes x/y window-relative) |
+
+ ### `scroll`
+ | Param | Type | Notes |
+ |---|---|---|
+ | `x`, `y` | number | position |
+ | `deltaX`, `deltaY` | number | negative deltaY = scroll up |
+
+ ### `drag`
+ | Param | Type |
+ |---|---|
+ | `startX`, `startY`, `endX`, `endY` | number |
+ | `button` | `"left"` \| `"right"` \| `"middle"` |
+ | `duration` | number (ms) |
+
+ ### `move`
+ Move cursor without clicking. Params: `x`, `y`.
+
+ ### `get_cursor_position`
+ Returns `{x, y}`.
+
+ ---
+
+ ## Keyboard
+
+ ### `type_text`
+ Type a string at the current cursor position (CGEvent background injection).
+
+ | Param | Type |
+ |---|---|
+ | `text` | string |
+ | `delay` | number (ms, optional) |
+
+ ### `press_key`
+ Press a key combo. Supports special keys, single letters a–z, single digits 0–9.
+
+ | Param | Type | Notes |
+ |---|---|---|
+ | `key` | string | e.g. `"enter"`, `"m"`, `"5"` |
+ | `keys` | string[] | alternative to `key` for multi-tap |
+ | `modifiers` | string[] | `cmd`, `shift`, `alt`/`option`, `ctrl`/`control`, `capslock` |
+
+ Blocked combos: `cmd+q`, `cmd+shift+q`, `cmd+option+q`, `cmd+l`, `alt+f4`,
+ `ctrl+alt+del` (logout/lock).
+
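A caller can avoid a round trip that ends in `SAFETY_BLOCKED` by checking a combo locally first. The sketch below normalizes the documented modifier aliases and compares against the blocked list above. It is a hypothetical client-side pre-check, not the safety guard's actual matching logic, and `canonicalCombo` / `isBlockedCombo` are illustrative names.

```typescript
// Blocked combos from the list above, in canonical form:
// modifiers sorted alphabetically, then the key, joined by "+".
const BLOCKED = new Set([
  "cmd+q",
  "cmd+shift+q",
  "alt+cmd+q", // cmd+option+q
  "cmd+l",
  "alt+f4",
  "alt+ctrl+del", // ctrl+alt+del
]);

// Documented aliases: option -> alt, control -> ctrl.
const ALIASES: Record<string, string> = { option: "alt", control: "ctrl" };

function canonicalCombo(key: string, modifiers: string[] = []): string {
  const mods = modifiers.map((m) => m.toLowerCase()).map((m) => ALIASES[m] ?? m);
  const unique = Array.from(new Set(mods)).sort();
  return [...unique, key.toLowerCase()].join("+");
}

function isBlockedCombo(key: string, modifiers: string[] = []): boolean {
  return BLOCKED.has(canonicalCombo(key, modifiers));
}
```

Sorting the modifiers makes the check order-independent, so `["option", "cmd"]` and `["cmd", "alt"]` map to the same entry.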
+ ---
+
+ ## AX Element Interaction
+
+ These operate on the active target's window (set via `focus_app`). Prefer these
+ over coordinate clicks.
+
+ ### `find_element`
+ Find AX elements by text/role/value. Returns `{results: FindElementResult[],
+ metrics}`. Each result has an `id` for use in the element tools below. When
+ there are 0 results, the response includes a hint pointing to the
+ `screenshot`+`ocr`+`click(x,y)` path (Electron opacity).
+
+ | Param | Type | Default | Notes |
+ |---|---|---|---|
+ | `text` | string | — | Match element name/description |
+ | `role` | string | — | e.g. `AXButton`, `AXTextField` |
+ | `value` | string | — | Match current value |
+ | `textMode` | `"contains"` \| `"exact"` \| `"regex"` | `"contains"` | |
+ | `app` | string | active target | |
+ | `depth` | number | 5 | |
+ | `index` | number | — | Return only the Nth match (0-based) |
+ | `near` | `{x,y}` | — | Sort by ascending distance, closest first |
+ | `visibleOnly` | boolean | false | |
+ | `includeBounds` | boolean | false | |
+
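The `near` ordering can be pictured with a small sketch. It assumes distance is Euclidean and measured from each element's bounds center, and that results carry `bounds` (as with `includeBounds: true`); the reference states only "ascending distance, closest first", so the server's actual metric may differ. `sortByDistance` is a hypothetical helper.

```typescript
type Point = { x: number; y: number };
type Bounds = { x: number; y: number; width: number; height: number };
// Assumed subset of FindElementResult, for illustration only.
type Found = { id: string; bounds: Bounds };

function center(b: Bounds): Point {
  return { x: b.x + b.width / 2, y: b.y + b.height / 2 };
}

// Return a sorted copy of the results, closest to `near` first.
function sortByDistance(results: Found[], near: Point): Found[] {
  const dist = (r: Found): number => {
    const c = center(r.bounds);
    return Math.hypot(c.x - near.x, c.y - near.y);
  };
  return [...results].sort((a, b) => dist(a) - dist(b));
}
```

Combining `near` with `index: 0` would then select the single closest match.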
+ ### `click_element`
+ Click by element `id`. AXPress first; on AX failure falls back to coordinate
+ click at the element's bounds center (handles Tauri/Electron silent swallows).
+
+ | Param | Type |
+ |---|---|
+ | `elementId` | string |
+ | `app` | string (optional) |
+
+ ### `set_value`
+ Set an AX element's value directly (no key synthesis). Best for text fields,
+ checkboxes, sliders.
+
+ | Param | Type |
+ |---|---|
+ | `elementId` | string |
+ | `value` | string |
+ | `app` | string (optional) |
+
+ ### `type_in_element`
+ Focus an element and type into it. Refetches an equivalent AX node if the
+ original `elementId` is stale (UI tree changed).
+
+ | Param | Type | Default |
+ |---|---|---|
+ | `elementId` | string | |
+ | `text` | string | |
+ | `clearFirst` | boolean | false |
+ | `app` | string | active target |
+
+ ### `click_menu_bar_extra`
+ Click a menu-bar status item (tray icon) — for menu-bar-only apps (e.g.
+ cc-switch) that `focus_app` cannot target. After clicking, the menu opens; use
+ `find_element` to locate menu items, or `screenshot` + `ocr` if the menu's AX
+ tree is opaque.
+
+ | Param | Type | Notes |
+ |---|---|---|
+ | `app` | string | Target app name |
+ | `description` | string | Match by description/name substring |
+ | `name` | string | Match by name/description substring |
+ | `index` | number | 0-based among matched items |
+
+ ---
+
+ ## Runtime & Synchronization
+
+ ### `list_apps`
+ Returns `AppInfo[]` = `{name, pid, isFrontmost, windowCount}`. Background-only
+ processes are filtered.
+
+ ### `focus_app`
+ Set the active target context. Establishes a window for AX tools; for
+ menu-bar-only apps it falls back to a tray target (`windowId: "tray"`) when a
+ matching status item (the kind `click_menu_bar_extra` clicks) is found.
+
+ | Param | Type |
+ |---|---|
+ | `app` | string |
+
+ ### `wait`
+ Pause execution.
+
+ | Param | Type |
+ |---|---|
+ | `ms` | number (1–60000) |
+
+ ### `wait_for_element`
+ Poll until the `until` condition is met for an AX element matching the
+ selector. Returns the match or times out.
+
+ | Param | Type | Default |
+ |---|---|---|
+ | `text` / `role` / `value` | string | — |
+ | `app` | string | active target |
+ | `until` | `"appear"` \| `"disappear"` \| `"value_change"` | `"appear"` |
+ | `timeout` / `timeoutMs` | number (ms) | 5000 |
+ | `interval` / `intervalMs` | number (ms) | 500 |
+
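The three `until` modes amount to a per-poll check. The sketch below shows one plausible reading, in which `"value_change"` compares against the value seen on the first poll; that baseline is an assumption, since the reference does not define it. `conditionMet` and `Observation` are hypothetical names.

```typescript
type Until = "appear" | "disappear" | "value_change";

// What one poll observed: the matched element's value, or null if nothing matched.
type Observation = { value: string } | null;

// Decide whether the current poll satisfies the wait condition.
// `first` is the observation from the initial poll (assumed baseline for "value_change").
function conditionMet(until: Until, first: Observation, current: Observation): boolean {
  switch (until) {
    case "appear":
      return current !== null;
    case "disappear":
      return current === null;
    case "value_change":
      return first !== null && current !== null && current.value !== first.value;
  }
}
```

A polling loop would call this once per `interval` until it returns true or `timeout` elapses.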
+ ### `doctor`
+ Verify permissions, native helpers, and client readiness. Returns a JSON report
+ with `platform`, `safety`, `nativeHelpers`, `clients`, and `recommendations`.
+ Run this first when something is misbehaving.
+
+ ### `clipboard_read` / `clipboard_write`
+ Read/write the system clipboard. The safety guard blocks `clipboard_write`
+ calls whose text matches injection patterns (e.g. shell-escape sequences).
@@ -0,0 +1,142 @@
+ # Troubleshooting
+
+ ## First Checks
+
+ 1. Run `doctor` — verifies Accessibility + Screen Recording permissions and
+ native helpers. Most failures are permission issues.
+ 2. Confirm the target app is running and has at least one on-screen window:
+ `list_apps` + `list_windows`.
+ 3. Check `errors[]` in `describe_screen` responses — it names which source
+ (ocr/ax/foreground/screen) failed and why.
+
+ ---
+
+ ## Error Code Table
+
+ Every error response carries a `code` and a `hint`. The table below maps codes
+ to recovery steps (mirrors the runtime `recoveryHint`).
+
+ | Code | Meaning | Recovery |
+ |---|---|---|
+ | `WINDOW_NOT_FOUND` | The target window does not exist or is not on screen. | `list_windows` again, retry with a fresh `windowId`, or omit `windowId` for screen coordinates. |
+ | `TARGET_STALE` | The active target window changed pid or closed. | `focus_app` for the target app again, then retry. `type_in_element` auto-refetches equivalent nodes. |
+ | `ELEMENT_NOT_FOUND` | No AX element matched the selector. | `find_element` again with broader selectors (different `text`, `textMode:"contains"`, drop `role`). If still empty, the app may be Electron-opaque — see below. |
+ | `PERMISSION_DENIED` | Accessibility or Screen Recording not granted. | Run `doctor`, then grant the missing permission in System Settings → Privacy & Security, and **restart the launching client** (changes do not apply to already-running processes). |
+ | `SAFETY_BLOCKED` | Action rejected by the safety guard (dangerous shortcut, sensitive window, suspicious text). | Choose a less risky action, or ask the user to perform it manually. Blocked shortcuts include `cmd+q`, `cmd+shift+q`, `cmd+l`, `alt+f4`. |
+ | `INPUT_FAILED` | Input synthesis (click/type/keypress) failed at the CGEvent layer. | Observe current state with `screenshot` or `get_window_state`, then retry only if safe. |
+ | `CAPTURE_FAILED` | Screenshot/OCR failed (usually Screen Recording permission). | `doctor` → grant Screen Recording → restart client. |
+ | `COORDINATE_OUT_OF_BOUNDS` | Click/drag coordinates are outside the active display/window. | `get_screen_size` or `list_windows`, retry with coordinates inside bounds. |
+ | `UNSUPPORTED_PARAMETER` | A parameter combination is invalid (e.g. `screenshot` with both `windowId` and `region`). | Remove or replace the unsupported parameter; inspect `tools/list` for the schema. |
+
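For a client that wants to act on error codes programmatically, the table can be condensed into a lookup. The sketch below maps each code to the first tool named in its Recovery column, with `null` where the fix is to change the request rather than call a tool. `FIRST_RECOVERY_TOOL` and `mayNeedUser` are hypothetical names, and the mapping is a reading of the table, not the runtime `recoveryHint` itself.

```typescript
type ErrorCode =
  | "WINDOW_NOT_FOUND"
  | "TARGET_STALE"
  | "ELEMENT_NOT_FOUND"
  | "PERMISSION_DENIED"
  | "SAFETY_BLOCKED"
  | "INPUT_FAILED"
  | "CAPTURE_FAILED"
  | "COORDINATE_OUT_OF_BOUNDS"
  | "UNSUPPORTED_PARAMETER";

// First tool to call when recovering; null means adjust the request instead.
const FIRST_RECOVERY_TOOL: Record<ErrorCode, string | null> = {
  WINDOW_NOT_FOUND: "list_windows",
  TARGET_STALE: "focus_app",
  ELEMENT_NOT_FOUND: "find_element",
  PERMISSION_DENIED: "doctor",
  SAFETY_BLOCKED: null,
  INPUT_FAILED: "screenshot",
  CAPTURE_FAILED: "doctor",
  COORDINATE_OUT_OF_BOUNDS: "get_screen_size",
  UNSUPPORTED_PARAMETER: null,
};

// Codes whose recovery may require the user (granting a permission or
// performing the action manually), so blind retries are unlikely to help.
function mayNeedUser(code: ErrorCode): boolean {
  return code === "PERMISSION_DENIED" || code === "SAFETY_BLOCKED";
}
```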
+ ---
+
+ ## Permission Issues
+
+ macOS requires two permissions for full functionality:
+
+ - **Accessibility** — needed for all AX tools (`find_element`,
+ `click_element`, `get_window_state`, `list_windows`, `click_menu_bar_extra`).
+ - **Screen Recording** — needed for `screenshot`, `ocr`, and `describe_screen`
+ (with `ocr: true`).
+
+ Grant via **System Settings → Privacy & Security → Accessibility / Screen
+ Recording**, enabling the entry for the launching terminal/client app (Terminal,
+ iTerm, Claude Code, Codex, etc.).
+
+ **Critical:** permission changes do not apply to already-running processes.
+ After granting, **quit and restart the client** that launches `ucu-mcp`.
+
+ Run `ucu-mcp doctor` (or the `doctor` tool) to verify — it reports per-permission
+ status and which process to authorize.
+
+ ---
+
+ ## Electron / Tauri / WebView AX Opacity
+
+ **Symptom:** `find_element` returns 0 results, `get_window_state` returns a
+ near-empty tree (just an `AXGroup`), `list_windows` shows the window but AX
+ tools can't see into it.
+
+ **Cause:** Electron/Tauri/WebView apps render their UI in a composited layer
+ that macOS AX cannot introspect. The AX tree exposes only the window frame and
+ traffic-light buttons.
+
+ **Workaround — pixel path:**
+ ```
+ screenshot({})
+ ocr({})
+ → elements[].text locates the target UI text; each element has a bounding box {x, y, width, height}
+ click({ x: el.x + el.width/2, y: el.y + el.height/2 })
+ ```
+
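The center-of-bounding-box step in the pixel path can be written as a small helper. This sketch uses the `OcrElement` shape from the tool reference and assumes `x`, `y` are the top-left corner, as the formula above implies. `clickPointFor` is a hypothetical helper, and the case-insensitive substring match and the rounding are choices made for this example.

```typescript
// OcrElement as described in the tool reference (screen-absolute coordinates).
type OcrElement = {
  text: string;
  x: number;
  y: number;
  width: number;
  height: number;
  confidence: number;
};

// Return the click point for the first element whose text contains `needle`
// (case-insensitive), or null when nothing matches.
function clickPointFor(elements: OcrElement[], needle: string): { x: number; y: number } | null {
  const wanted = needle.toLowerCase();
  const hit = elements.find((el) => el.text.toLowerCase().includes(wanted));
  if (!hit) return null;
  return {
    x: Math.round(hit.x + hit.width / 2),
    y: Math.round(hit.y + hit.height / 2),
  };
}
```

The returned point can be passed straight to `click({ x, y })`; a `null` result is the cue to re-run `ocr` or broaden the search text.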
+ `find_element` and `list_windows` emit a `hint` describing this fallback when
+ they detect the pattern. For one-shot planning, `describe_screen` gives OCR + AX
+ together.
+
+ ---
+
+ ## Menu-Bar / Tray App Not Reachable
+
+ **Symptom:** `focus_app("tray-app")` throws `WINDOW_NOT_FOUND`; the app has no
+ window in `list_windows`.
+
+ **Cause:** Pure menu-bar (LSUIElement) apps have no window; their status item is
+ hosted by the `SystemUIServer` system process.
+
+ **Workaround:**
+ ```
+ click_menu_bar_extra({ app: "tray-app", name: "TrayApp" }) # opens tray menu
+ find_element({ text: "Settings", app: "tray-app" }) # menu items are AX-visible
+ click_element({ elementId })
+ ```
+
+ `focus_app` automatically falls back to a tray target when `click_menu_bar_extra`
+ finds a matching status item, so subsequent AX tools work against the menu.
+
+ If the tray menu itself is Electron-opaque:
+ ```
+ click_menu_bar_extra({ app: "tray-app" })
+ screenshot({})
+ ocr({}) → locate menu item by text → click(x, y)
+ ```
+
+ ---
+
+ ## OCR Failures
+
+ **Symptom:** `ocr` or `describe_screen` reports OCR failure (`ocr.status:
+ "failed"`), or native OCR helper not found in `doctor`.
+
+ **Checks:**
+ 1. Screen Recording permission granted and client restarted.
+ 2. Native helper present — `doctor` reports `ocr` helper status. If missing, the
+ npm package may be corrupted; reinstall.
+ 3. Screen is not locked (OCR captures a black frame when locked).
+
+ `describe_screen` degrades gracefully — OCR failure still returns AX state, so
+ you can fall back to `get_window_state` / `find_element`.
+
+ ---
+
+ ## describe_screen Returns Empty / All-skipped
+
+ **Symptom:** `describe_screen` returns `errors: []` but `ocr.status:
+ "skipped"` and `ax.status: "skipped"`.
+
+ **Cause:** both sources were disabled, either because you passed
+ `ocr: false, includeAx: false` or because the client stripped the schema
+ defaults (in live MCP use the SDK applies `ocr: true, includeAx: true`).
+
+ **Fix:** explicitly pass `ocr: true, includeAx: true`.
+
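One way to guard against a client that strips schema defaults is to always send the documented defaults explicitly. The sketch below builds such an argument object; `explicitDescribeArgs` is a hypothetical helper, and the default values are taken from the `describe_screen` parameter table.

```typescript
type DescribeScreenArgs = {
  display?: number;
  ocr?: boolean;
  includeAx?: boolean;
  axDepth?: number;
  ocrBlocks?: number;
  windowId?: string;
};

// Return a copy of the arguments with the documented defaults spelled out,
// so ocr/includeAx are always present in the request.
function explicitDescribeArgs(args: DescribeScreenArgs = {}): DescribeScreenArgs {
  return {
    ...args,
    display: args.display ?? 0,
    ocr: args.ocr ?? true,
    includeAx: args.includeAx ?? true,
    axDepth: Math.min(args.axDepth ?? 3, 10), // axDepth is capped at 10
    ocrBlocks: args.ocrBlocks ?? 50,
  };
}
```

Values the caller sets deliberately, such as `ocr: false`, are preserved; only missing fields are filled in.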
+ ---
+
+ ## Actions Blocked While macOS Is Locked
+
+ **Symptom:** input actions (`click`, `type_text`, `press_key`, …) fail;
+ observation actions (`screenshot`, `ocr`) return black/empty frames.
+
+ **Cause:** the safety guard refuses to synthesize input while the screen is
+ locked (the user is not present to supervise).
+
+ **Fix:** wait for the user to unlock, or ask them to unlock. There is no bypass.
@@ -0,0 +1,146 @@
+ # Workflows
+
+ Common task playbooks. Each shows the preferred tool sequence and the fallback
+ path when the primary path is blocked.
+
+ ---
+
+ ## 1. Fill a form field
+
+ **Primary (AX):**
+ ```
+ focus_app("Safari")
+ find_element({ text: "Email", role: "AXTextField" })
+ → elementId "Safari/w0/42"
+ type_in_element({ elementId: "Safari/w0/42", text: "user@example.com" })
+ ```
+
+ **Fallback (AX value set, for non-text controls):**
+ ```
+ set_value({ elementId: "...", value: "option" })
+ ```
+
+ **Fallback (coordinates, when AX is opaque):**
+ ```
+ screenshot({})
+ ocr({}) → elements[].text === "Email" → {x, y, width, height}
+ click({ x: el.x + el.width/2, y: el.y + el.height/2 })
+ type_text({ text: "user@example.com" })
+ ```
+
+ ---
+
+ ## 2. Operate a menu-bar / tray app (e.g. cc-switch)
+
+ Tray apps' status items live in `SystemUIServer`, not the app's own window AX
+ tree. `focus_app` alone may return `WINDOW_NOT_FOUND`.
+
+ ```
+ focus_app("cc-switch") # establishes tray target if status item found
+ click_menu_bar_extra({ app: "cc-switch", name: "switch" }) # opens the menu
+ # menu is now open — find items inside it:
+ find_element({ text: "使用统计", app: "cc-switch" }) # menu label, "Usage Statistics"
+ → elementId
+ click_element({ elementId })
+ ```
+
+ If the menu's AX tree is opaque (some Tauri/Electron menus):
+ ```
+ click_menu_bar_extra({ app: "cc-switch" })
+ screenshot({})
+ ocr({}) → locate "使用统计" by text → coordinates
+ click({ x, y })
+ ```
+
+ ---
+
+ ## 3. Electron / WebView opaque UI
+
+ Electron/Tauri apps often expose only a near-empty `AXGroup`. The runtime `hint`
+ on `find_element` and `list_windows` tells you when this is happening.
+
+ ```
+ find_element({ text: "Submit" })
+ → 0 results, hint: "app is likely Electron... screenshot → ocr → click(x,y)"
+
+ screenshot({})
+ ocr({ region: { x, y, width, height } }) # or full screen
+ → elements[].text === "Submit" → {x, y, width, height}
+ click({ x: el.x + el.width/2, y: el.y + el.height/2 })
+ ```
+
+ For repeated interaction with a known-opaque app, snapshot once with
+ `describe_screen` to plan, then drive by coordinates.
+
+ ---
+
+ ## 4. Vision-degraded environment (image content not visible)
+
+ When the model cannot see `screenshot` image blocks (relay/downgrade to URLs),
+ switch to text-based screen reading:
+
+ ```
+ describe_screen({ ocr: true, includeAx: true })
+ → { screen, foregroundWindow, ocr:{blocks}, ax:{elements}, errors }
+
+ # or, if you also want the image for clients that DO support it:
+ screenshot({ describe: true })
+ → [image block, text description block]
+ ```
+
+ `describe_screen` never throws: OCR and AX are each wrapped in their own
+ try/catch, so a Vision failure still returns AX state and vice versa. Check
+ `errors[]` to know what was skipped/failed.
+
+ ---
+
+ ## 5. Recover from TARGET_STALE
+
+ The active window target can go stale (window closed, app restarted, pid
+ changed). AX tools throw `TARGET_STALE`.
+
+ ```
+ # error response includes hint: "Run focus_app again for the target app..."
+ focus_app("Safari") # re-establishes target
+ find_element({ text: "Save" }) # retry — cache refetches equivalent nodes
+ click_element({ elementId })
+ ```
+
+ `type_in_element` automatically refetches an equivalent AX node if the original
+ `elementId` is stale, so a single retry often succeeds without `focus_app`.
+
+ ---
+
+ ## 6. Verify an action succeeded
+
+ Always verify after clicks/types — UI may not have updated, or the wrong element
+ was hit.
+
+ ```
+ click_element({ elementId, captureAfter: true }) # screenshot in response
+ # or explicitly:
+ screenshot({})
+ # or check AX state:
+ get_window_state({}) → focusedElement / tree reflects the change
+ # or wait for a specific change:
+ wait_for_element({ text: "Saved", until: "appear", timeout: 3000 })
+ ```
+
+ ---
+
+ ## 7. Multi-step task with error recovery
+
+ ```
+ doctor() # verify permissions first
+ list_apps()
+ focus_app("Notes")
+ find_element({ text: "New Note" }) → id
+ click_element({ elementId: id, captureAfter: true })
+
+ # if click_element throws ELEMENT_NOT_FOUND:
+ find_element({ text: "New Note" }) → id2 # refetch, id may have changed
+ click_element({ elementId: id2 })
+
+ # bodyId: id of the note body, obtained from a separate find_element call
+ type_in_element({ elementId: bodyId, text: "Hello" })
+ screenshot({}) # confirm content
+ ```