ucu-mcp 0.4.2 → 0.5.0
- package/CHANGELOG.md +32 -0
- package/README.md +28 -3
- package/dist/bin/ucu-mcp.js +0 -0
- package/dist/src/mcp/server.js +1 -0
- package/dist/src/mcp/tools/app-tools.js +7 -2
- package/dist/src/mcp/tools/element-tools.js +11 -0
- package/dist/src/mcp/tools/helpers.d.ts +2 -0
- package/dist/src/mcp/tools/helpers.js +4 -0
- package/dist/src/mcp/tools/index.d.ts +2 -2
- package/dist/src/mcp/tools/index.js +3 -2
- package/dist/src/mcp/tools/keyboard-tools.js +2 -2
- package/dist/src/mcp/tools/screen-tools.js +139 -3
- package/dist/src/platform/base.d.ts +31 -0
- package/dist/src/platform/macos/base.d.ts +3 -1
- package/dist/src/platform/macos/base.js +7 -1
- package/dist/src/platform/macos/element.d.ts +20 -0
- package/dist/src/platform/macos/element.js +255 -0
- package/dist/src/platform/macos/screen.js +21 -18
- package/dist/src/platform/macos/window.js +22 -0
- package/dist/src/safety/guard.js +17 -2
- package/dist/src/utils/input.js +24 -16
- package/package.json +2 -1
- package/skills/ucu-mcp/SKILL.md +100 -0
- package/skills/ucu-mcp/agents/openai.yaml +4 -0
- package/skills/ucu-mcp/references/tool-reference.md +255 -0
- package/skills/ucu-mcp/references/troubleshooting.md +142 -0
- package/skills/ucu-mcp/references/workflows.md +146 -0
package/skills/ucu-mcp/references/tool-reference.md
@@ -0,0 +1,255 @@
# Tool Reference

UCU-MCP exposes 26 tools across five categories. All action tools accept
optional `captureAfter` / `captureMaxWidth` / `captureFormat` parameters that
screenshot the result and append it to the response.

Coordinate inputs are **screen-absolute** unless noted (window-relative only
when a `windowId` is explicitly passed).

---

## Screen & Window

### `screenshot`
Capture the full screen, a region, or a specific window.

| Param | Type | Default | Notes |
|---|---|---|---|
| `display` | number | 0 | Display index |
| `windowId` | string | — | From `list_windows`; captures that window |
| `region` | `{x,y,width,height}` | — | Mutually exclusive with `windowId` |
| `format` | `"png"` \| `"jpeg"` | `"png"` | |
| `maxWidth` | number | 1280 | Resize preserving aspect ratio |
| `describe` | boolean | false | Append a text `ScreenDescription` block (OCR + AX) after the image |
| `describeOptions` | object | — | `{axDepth=3, ocrBlocks=50, includeAx=true}` when `describe=true` |

Returns one image content block (+ one text block if `describe=true`). Use
`describe=true` when image content may not reach the model (relay/URL downgrade).
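
The `windowId`/`region` exclusivity and the `maxWidth` resize can be reasoned about on the client before calling the tool. The TypeScript sketch below is illustrative only: `ScreenshotArgs`, `checkScreenshotArgs`, and `scaledSize` are hypothetical names, not part of UCU-MCP, and it assumes that images already narrower than `maxWidth` are returned unscaled and that heights are rounded to the nearest pixel.

```typescript
// Hypothetical argument shape, transcribed from the parameter table above.
interface Region { x: number; y: number; width: number; height: number }
interface ScreenshotArgs {
  display?: number;
  windowId?: string;
  region?: Region;
  format?: "png" | "jpeg";
  maxWidth?: number;
  describe?: boolean;
}

// Returns a list of problems; an empty list means no conflict was detected.
function checkScreenshotArgs(args: ScreenshotArgs): string[] {
  const problems: string[] = [];
  if (args.windowId !== undefined && args.region !== undefined) {
    problems.push("windowId and region are mutually exclusive");
  }
  return problems;
}

// Output size after an aspect-preserving resize to at most maxWidth pixels wide.
// Assumption: narrower images are left untouched and heights are rounded.
function scaledSize(
  width: number,
  height: number,
  maxWidth: number = 1280,
): { width: number; height: number } {
  if (width <= maxWidth) return { width, height };
  return { width: maxWidth, height: Math.round((height * maxWidth) / width) };
}
```

Running the check locally avoids a round trip that the troubleshooting table says would end in `UNSUPPORTED_PARAMETER`.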

### `list_windows`
List visible windows. Returns `WindowInfo[]` (`{id, title, processName, pid,
bounds, isMinimized, isOnScreen}`). When empty, includes a `diagnostics` hint
distinguishing permission-denied vs Electron-opacity.

| Param | Type | Default |
|---|---|---|
| `includeMinimized` | boolean | false |

### `get_window_state`
AX tree of a window. Returns `WindowState` = `{window, focusedElement?, tree?}`.
The `tree` is a depth-limited `ElementInfo` (`{role, name, value, states,
bounds?, children?}`).

| Param | Type | Default |
|---|---|---|
| `windowId` | string | active target |
| `depth` | number | 3 (capped at 10) |
| `includeBounds` | boolean | false |

### `get_screen_size`
Returns `{width, height, scaleFactor, estimated?}`. Synchronous, low-cost.

| Param | Type | Default |
|---|---|---|
| `display` | number | 0 |

### `ocr`
Run Vision OCR on the full screen or a region. Returns `{elements:
OcrElement[], fullText}`. Each `OcrElement` = `{text, x, y, width, height,
confidence}` in screen-absolute coordinates.

| Param | Type | Default |
|---|---|---|
| `display` | number | 0 |
| `region` | `{x,y,width,height}` | full screen |
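
Because `OcrElement` boxes are screen-absolute, the center of a box can be passed straight to `click`. A minimal TypeScript sketch follows; `findOcrElement` and `centerOf` are hypothetical helpers, and the case-insensitive substring match and the caller-chosen confidence threshold are assumptions rather than anything UCU-MCP prescribes.

```typescript
// Shape taken from the `ocr` description above.
interface OcrElement {
  text: string;
  x: number;
  y: number;
  width: number;
  height: number;
  confidence: number;
}

// Pick the most confident element whose text contains `needle` (case-insensitive).
function findOcrElement(
  elements: OcrElement[],
  needle: string,
  minConfidence: number = 0,
): OcrElement | undefined {
  const wanted = needle.trim().toLowerCase();
  return elements
    .filter((e) => e.confidence >= minConfidence && e.text.toLowerCase().includes(wanted))
    .sort((a, b) => b.confidence - a.confidence)[0];
}

// Center of the bounding box, rounded to whole pixels, ready for click({ x, y }).
function centerOf(e: OcrElement): { x: number; y: number } {
  return { x: Math.round(e.x + e.width / 2), y: Math.round(e.y + e.height / 2) };
}
```

The scale of `confidence` is not stated in this reference, so a threshold should be chosen only after inspecting real responses.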

### `describe_screen`
Structured text description of the screen: the **vision-degraded fallback**.
Returns `ScreenDescription` = `{capturedAt, screen, foregroundWindow,
ocr:{blocks, fullText, status}, ax:{elements?, status, windowId?}, errors[]}`.
Each source is collected independently; failures land in `errors` (never thrown).
Password fields are masked to `[REDACTED]`.

| Param | Type | Default | Notes |
|---|---|---|---|
| `display` | number | 0 | |
| `ocr` | boolean | true | Requires Screen Recording when true |
| `includeAx` | boolean | true | Requires Accessibility when true |
| `axDepth` | number | 3 | Capped at 10 |
| `ocrBlocks` | number | 50 | Max OCR elements returned |
| `windowId` | string | active target | AX traversal target |

Use when: image content blocks are not visible to you; you need
machine-readable layout; you want OCR + AX in one call with graceful failure.
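
Since each source reports its own `status`, a caller can decide which data to trust before planning the next step. The sketch below is one possible way to do that in TypeScript. `ScreenDescriptionLike`, `Plan`, and `planFrom` are hypothetical, only the `"failed"` and `"skipped"` status values appear in this documentation, and treating every other value as usable is an assumption.

```typescript
// Minimal view of the ScreenDescription fields this sketch reads.
interface ScreenDescriptionLike {
  ocr: { status: string; fullText?: string };
  ax: { status: string; elements?: unknown[] };
  errors: unknown[];
}

type Plan = "use-both" | "use-ax" | "use-ocr" | "nothing-usable";

// Decide which sources can be relied on for the next step.
function planFrom(desc: ScreenDescriptionLike): Plan {
  // Assumption: any status other than "failed" or "skipped" means data is present.
  const unusable = (status: string): boolean => status === "failed" || status === "skipped";
  const ocrOk = !unusable(desc.ocr.status);
  const axOk = !unusable(desc.ax.status);
  if (ocrOk && axOk) return "use-both";
  if (axOk) return "use-ax";
  if (ocrOk) return "use-ocr";
  return "nothing-usable";
}
```

When the result is `"nothing-usable"`, the `errors` array is the place to look for the reason.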

---

## Mouse & Input

All accept `captureAfter` / `captureMaxWidth` / `captureFormat`.

### `click` / `double_click`
Click at screen coordinates. `button` ∈ `left|right|middle`.

| Param | Type |
|---|---|
| `x`, `y` | number |
| `button` | `"left"` \| `"right"` \| `"middle"` |
| `windowId` | string (optional, makes x/y window-relative) |
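
To reason about what a window-relative click resolves to, the relationship between the two coordinate spaces can be sketched as below. This assumes window-relative means an offset from the top-left corner of the window's `bounds`, and that `bounds` has the shape `{x, y, width, height}`; neither is spelled out in this reference. `toScreenPoint` and `isInsideWindow` are hypothetical helpers, and the server performs its own conversion when `windowId` is passed.

```typescript
interface Bounds { x: number; y: number; width: number; height: number }
interface Point { x: number; y: number }

// Convert a window-relative point to screen-absolute coordinates.
function toScreenPoint(p: Point, windowBounds: Bounds): Point {
  return { x: windowBounds.x + p.x, y: windowBounds.y + p.y };
}

// True when a window-relative point falls inside the window's own area.
function isInsideWindow(p: Point, windowBounds: Bounds): boolean {
  return p.x >= 0 && p.y >= 0 && p.x < windowBounds.width && p.y < windowBounds.height;
}
```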

### `scroll`
| Param | Type | Notes |
|---|---|---|
| `x`, `y` | number | position |
| `deltaX`, `deltaY` | number | negative deltaY = scroll up |

### `drag`
| Param | Type |
|---|---|
| `startX`, `startY`, `endX`, `endY` | number |
| `button` | `"left"` \| `"right"` \| `"middle"` |
| `duration` | number (ms) |

### `move`
Move cursor without clicking. Params: `x`, `y`.

### `get_cursor_position`
Returns `{x, y}`.

---

## Keyboard

### `type_text`
Type a string at the current cursor position (CGEvent background injection).

| Param | Type |
|---|---|
| `text` | string |
| `delay` | number (ms, optional) |

### `press_key`
Press a key combo. Supports special keys, single letters a–z, single digits 0–9.

| Param | Type | Notes |
|---|---|---|
| `key` | string | e.g. `"enter"`, `"m"`, `"5"` |
| `keys` | string[] | alternative to `key` for multi-tap |
| `modifiers` | string[] | `cmd`, `shift`, `alt`/`option`, `ctrl`/`control`, `capslock` |

Blocked combos: `cmd+q`, `cmd+shift+q`, `cmd+option+q`, `cmd+l`, `alt+f4`,
`ctrl+alt+del` (logout/lock).
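
A client can pre-check a combo against this list before sending it. The TypeScript sketch below is not the safety guard's actual matching logic, which is not shown in this reference; it assumes that the documented aliases (`alt`/`option`, `ctrl`/`control`) are equivalent and that modifier order does not matter. `canonical`, `comboOf`, and `isBlocked` are hypothetical names.

```typescript
// Modifier aliases listed in the press_key parameter table.
const MODIFIER_ALIASES: Record<string, string> = { option: "alt", control: "ctrl" };

// Normalize "Cmd+Option+Q" style strings: lower-case, alias-free, modifiers sorted.
function canonical(combo: string): string {
  const parts = combo
    .toLowerCase()
    .split("+")
    .map((p) => p.trim())
    .filter((p) => p.length > 0);
  const key = parts[parts.length - 1] ?? "";
  const mods = parts
    .slice(0, -1)
    .map((m) => MODIFIER_ALIASES[m] ?? m)
    .sort();
  return [...mods, key].join("+");
}

// Blocked combos exactly as listed above, stored in canonical form.
const BLOCKED = new Set(
  ["cmd+q", "cmd+shift+q", "cmd+option+q", "cmd+l", "alt+f4", "ctrl+alt+del"].map(canonical),
);

// Build the canonical combo for a press_key call.
function comboOf(key: string, modifiers: string[] = []): string {
  return canonical([...modifiers, key].join("+"));
}

function isBlocked(key: string, modifiers: string[] = []): boolean {
  return BLOCKED.has(comboOf(key, modifiers));
}
```

A local match only predicts a `SAFETY_BLOCKED` response; the guard on the server remains the authority.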

---

## AX Element Interaction

These operate on the active target's window (set via `focus_app`). Prefer these
over coordinate clicks.

### `find_element`
Find AX elements by text/role/value. Returns `{results: FindElementResult[],
metrics}`. Each result has an `id` for use in the element tools below. When 0
results, includes a hint guiding to `screenshot`+`ocr`+`click(x,y)` (Electron
opacity).

| Param | Type | Default | Notes |
|---|---|---|---|
| `text` | string | — | Match element name/description |
| `role` | string | — | e.g. `AXButton`, `AXTextField` |
| `value` | string | — | Match current value |
| `textMode` | `"contains"` \| `"exact"` \| `"regex"` | `"contains"` | |
| `app` | string | active target | |
| `depth` | number | 5 | |
| `index` | number | — | Return only the Nth match (0-based) |
| `near` | `{x,y}` | — | Sort by ascending distance, closest first |
| `visibleOnly` | boolean | false | |
| `includeBounds` | boolean | false | |
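
The `near` ordering can be reproduced on the client, for example to re-rank results that were fetched with `includeBounds: true`. How the server measures distance is not documented here, so the sketch below assumes Euclidean distance from the center of each result's `bounds` and places results without bounds last. `ResultLike` and `sortByDistance` are hypothetical names.

```typescript
interface Bounds { x: number; y: number; width: number; height: number }
interface ResultLike { id: string; bounds?: Bounds }

// Return a new array sorted by ascending distance from `near`, closest first.
function sortByDistance<T extends ResultLike>(results: T[], near: { x: number; y: number }): T[] {
  const dist = (r: T): number => {
    if (!r.bounds) return Number.POSITIVE_INFINITY; // unknown position sorts last
    const cx = r.bounds.x + r.bounds.width / 2;
    const cy = r.bounds.y + r.bounds.height / 2;
    return Math.hypot(cx - near.x, cy - near.y);
  };
  return [...results].sort((a, b) => {
    const da = dist(a);
    const db = dist(b);
    return da === db ? 0 : da - db; // avoid Infinity - Infinity = NaN
  });
}
```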

### `click_element`
Click by element `id`. Tries AXPress first; if the AX action fails, falls back
to a coordinate click at the center of the element's bounds (this handles
Tauri/Electron apps that silently swallow AX presses).

| Param | Type |
|---|---|
| `elementId` | string |
| `app` | string (optional) |

### `set_value`
Set an AX element's value directly (no key synthesis). Best for text fields,
checkboxes, sliders.

| Param | Type |
|---|---|
| `elementId` | string |
| `value` | string |
| `app` | string (optional) |

### `type_in_element`
Focus an element and type into it. Refetches an equivalent AX node if the
original `elementId` is stale (UI tree changed).

| Param | Type | Default |
|---|---|---|
| `elementId` | string | |
| `text` | string | |
| `clearFirst` | boolean | false |
| `app` | string | active target |

### `click_menu_bar_extra`
Click a menu-bar status item (tray icon). Intended for menu-bar-only apps (e.g.
cc-switch) that `focus_app` cannot target. After clicking, the menu opens; use
`find_element` to locate menu items, or `screenshot` + `ocr` if the menu's AX
tree is opaque.

| Param | Type | Notes |
|---|---|---|
| `app` | string | Target app name |
| `description` | string | Match by description/name substring |
| `name` | string | Match by name/description substring |
| `index` | number | 0-based among matched items |

---

## Runtime & Synchronization

### `list_apps`
Returns `AppInfo[]` = `{name, pid, isFrontmost, windowCount}`. Background-only
processes are filtered.

### `focus_app`
Set the active target context. Establishes a window for AX tools; falls back to
a tray target (`windowId: "tray"`) for menu-bar-only apps if
`click_menu_bar_extra` status items are found.

| Param | Type |
|---|---|
| `app` | string |

### `wait`
Pause execution.

| Param | Type |
|---|---|
| `ms` | number (1–60000) |

### `wait_for_element`
Poll until an AX element matches. Returns the match or times out.

| Param | Type | Default |
|---|---|---|
| `text` / `role` / `value` | string | — |
| `app` | string | active target |
| `until` | `"appear"` \| `"disappear"` \| `"value_change"` | `"appear"` |
| `timeout` / `timeoutMs` | number (ms) | 5000 |
| `interval` / `intervalMs` | number (ms) | 500 |
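
The polling behavior described here can be pictured with the sketch below. It is not the server's implementation: `pollUntil` and its `probe` callback are hypothetical, the `"value_change"` mode is left out, and the choice to stop as soon as another interval would overrun the deadline is an assumption. The defaults mirror the table above.

```typescript
type Until = "appear" | "disappear";

// Poll `probe` until the element state matches `until`, or the timeout elapses.
// `probe` is a hypothetical callback that resolves true while a matching element exists.
async function pollUntil(
  probe: () => Promise<boolean>,
  until: Until = "appear",
  timeoutMs: number = 5000,
  intervalMs: number = 500,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const present = await probe();
    // "appear" wants present === true, "disappear" wants present === false.
    if ((until === "appear") === present) return true;
    // Give up when sleeping one more interval would pass the deadline.
    if (Date.now() + intervalMs > deadline) return false;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

In practice `probe` would wrap a `find_element` call and report whether `results` is non-empty.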

### `doctor`
Verify permissions, native helpers, and client readiness. Returns a JSON report
with `platform`, `safety`, `nativeHelpers`, `clients`, and `recommendations`.
Run this first when something is misbehaving.

### `clipboard_read` / `clipboard_write`
Read/write the system clipboard. Text-injection patterns passed to
`clipboard_write` (e.g. shell-escape sequences) are blocked by the safety guard.
package/skills/ucu-mcp/references/troubleshooting.md
@@ -0,0 +1,142 @@
# Troubleshooting

## First Checks

1. Run `doctor`. It verifies Accessibility + Screen Recording permissions and
   native helpers. Most failures are permission issues.
2. Confirm the target app is running and not minimized to the point of having no
   on-screen window: `list_apps` + `list_windows`.
3. Check `errors[]` in `describe_screen` responses. It names which source
   (ocr/ax/foreground/screen) failed and why.

---

## Error Code Table

Every error response carries a `code` and a `hint`. The table below maps codes
to recovery steps (mirrors the runtime `recoveryHint`).

| Code | Meaning | Recovery |
|---|---|---|
| `WINDOW_NOT_FOUND` | The target window does not exist or is not on screen. | `list_windows` again, retry with a fresh `windowId`, or omit `windowId` for screen coordinates. |
| `TARGET_STALE` | The active target window changed pid or closed. | `focus_app` for the target app again, then retry. `type_in_element` auto-refetches equivalent nodes. |
| `ELEMENT_NOT_FOUND` | No AX element matched the selector. | `find_element` again with broader selectors (different `text`, `textMode:"contains"`, drop `role`). If still empty, the app may be Electron-opaque — see below. |
| `PERMISSION_DENIED` | Accessibility or Screen Recording not granted. | Run `doctor`, then grant the missing permission in System Settings → Privacy & Security, and **restart the launching client** (changes do not apply to already-running processes). |
| `SAFETY_BLOCKED` | Action rejected by the safety guard (dangerous shortcut, sensitive window, suspicious text). | Choose a less risky action, or ask the user to perform it manually. Blocked shortcuts include `cmd+q`, `cmd+shift+q`, `cmd+l`, `alt+f4`. |
| `INPUT_FAILED` | Input synthesis (click/type/keypress) failed at the CGEvent layer. | Observe current state with `screenshot` or `get_window_state`, then retry only if safe. |
| `CAPTURE_FAILED` | Screenshot/OCR failed (usually Screen Recording permission). | `doctor` → grant Screen Recording → restart client. |
| `COORDINATE_OUT_OF_BOUNDS` | Click/drag coordinates are outside the active display/window. | `get_screen_size` or `list_windows`, retry with coordinates inside bounds. |
| `UNSUPPORTED_PARAMETER` | A parameter combination is invalid (e.g. `screenshot` with both `windowId` and `region`). | Remove or replace the unsupported parameter; inspect `tools/list` for the schema. |
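
For an agent loop, the table can be condensed into a lookup that names the first tool to call after each code. The sketch below is a reading of the table, not the runtime `recoveryHint` logic: `NEXT_TOOL`, `NEEDS_CLIENT_RESTART`, and `recoveryFor` are hypothetical, and picking a single first tool per code is a simplification.

```typescript
// First tool to call after each error code, transcribed from the table above.
// null means no tool-based recovery: change the request or ask the user.
const NEXT_TOOL: Record<string, string | null> = {
  WINDOW_NOT_FOUND: "list_windows",
  TARGET_STALE: "focus_app",
  ELEMENT_NOT_FOUND: "find_element",
  PERMISSION_DENIED: "doctor",
  SAFETY_BLOCKED: null,
  INPUT_FAILED: "screenshot",
  CAPTURE_FAILED: "doctor",
  COORDINATE_OUT_OF_BOUNDS: "get_screen_size",
  UNSUPPORTED_PARAMETER: null,
};

// Codes whose recovery involves granting a permission and restarting the client.
const NEEDS_CLIENT_RESTART = new Set(["PERMISSION_DENIED", "CAPTURE_FAILED"]);

interface RecoveryStep {
  tool: string | null;
  restartClient: boolean;
  known: boolean; // false for codes that are not in the table
}

function recoveryFor(code: string): RecoveryStep {
  const known = Object.prototype.hasOwnProperty.call(NEXT_TOOL, code);
  return {
    tool: known ? (NEXT_TOOL[code] ?? null) : null,
    restartClient: NEEDS_CLIENT_RESTART.has(code),
    known,
  };
}
```

The `hint` carried by each error response should still take precedence over a static lookup like this one.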

---

## Permission Issues

macOS requires two permissions for full functionality:

- **Accessibility**: needed for all AX tools (`find_element`,
  `click_element`, `get_window_state`, `list_windows`, `click_menu_bar_extra`).
- **Screen Recording**: needed for `screenshot`, `ocr`, and `describe_screen`
  (with `ocr: true`).

Grant via **System Settings → Privacy & Security → Accessibility / Screen
Recording**, enabling the entry for the launching terminal/client app (Terminal,
iTerm, Claude Code, Codex, etc.).

**Critical:** permission changes do not apply to already-running processes.
After granting, **quit and restart the client** that launches `ucu-mcp`.

Run `ucu-mcp doctor` (or the `doctor` tool) to verify. It reports per-permission
status and which process to authorize.
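
The permission requirements above can be written down as a small lookup, which is handy when deciding whether a `PERMISSION_DENIED` was to be expected for a given call. The sketch below covers only the tools this section names; `requiredPermissions` is a hypothetical helper, and it makes no claim about tools that are not listed.

```typescript
type Permission = "Accessibility" | "Screen Recording";

// Tools named in the Permission Issues list above.
const AX_TOOLS = new Set([
  "find_element",
  "click_element",
  "get_window_state",
  "list_windows",
  "click_menu_bar_extra",
]);
const CAPTURE_TOOLS = new Set(["screenshot", "ocr"]);

// describe_screen depends on its `ocr` / `includeAx` flags, which both default to true.
function requiredPermissions(
  tool: string,
  args: { ocr?: boolean; includeAx?: boolean } = {},
): Permission[] {
  if (AX_TOOLS.has(tool)) return ["Accessibility"];
  if (CAPTURE_TOOLS.has(tool)) return ["Screen Recording"];
  if (tool === "describe_screen") {
    const needed: Permission[] = [];
    if (args.includeAx ?? true) needed.push("Accessibility");
    if (args.ocr ?? true) needed.push("Screen Recording");
    return needed;
  }
  return []; // not covered by the list above
}
```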

---

## Electron / Tauri / WebView AX Opacity

**Symptom:** `find_element` returns 0 results, `get_window_state` returns a
near-empty tree (just an `AXGroup`), `list_windows` shows the window but AX tools
can't see into it.

**Cause:** Electron/Tauri/WebView apps render their UI in a composited layer
that macOS AX often cannot introspect. In that case the AX tree exposes only the
window frame and traffic-light buttons.

**Workaround (pixel path):**
```
screenshot({})
ocr({})
  → blocks[].text locates the target UI text with bounding box {x,y,width,height}
click({ x: block.x + block.width/2, y: block.y + block.height/2 })
```

`find_element` and `list_windows` emit a `hint` describing this fallback when
they detect the pattern. For one-shot planning, `describe_screen` gives OCR + AX
together.

---

## Menu-Bar / Tray App Not Reachable

**Symptom:** `focus_app({ app: "tray-app" })` throws `WINDOW_NOT_FOUND`; the app
has no window in `list_windows`.

**Cause:** Pure menu-bar (LSUIElement) apps have no window; their status item is
hosted by the `SystemUIServer` system process.

**Workaround:**
```
click_menu_bar_extra({ app: "tray-app", name: "TrayApp" })   # opens tray menu
find_element({ text: "Settings", app: "tray-app" })          # menu items are AX-visible
click_element({ elementId })
```

`focus_app` automatically falls back to a tray target when `click_menu_bar_extra`
finds a matching status item, so subsequent AX tools work against the menu.

If the tray menu itself is Electron-opaque:
```
click_menu_bar_extra({ app: "tray-app" })
screenshot({})
ocr({})  → locate menu item by text → click({ x, y })
```

---

## OCR Failures

**Symptom:** `ocr` or `describe_screen` reports OCR failure
(`ocr.status: "failed"`), or `doctor` reports that the native OCR helper was not
found.

**Checks:**
1. Screen Recording permission granted and client restarted.
2. Native helper present: `doctor` reports `ocr` helper status. If missing, the
   npm package may be corrupted; reinstall.
3. Screen is not locked (OCR captures a black frame when locked).

`describe_screen` degrades gracefully: an OCR failure still returns AX state, so
you can fall back to `get_window_state` / `find_element`.

---

## describe_screen Returns Empty / All-skipped

**Symptom:** `describe_screen` returns `errors: []` but `ocr.status: "skipped"`
and `ax.status: "skipped"`.

**Cause:** both sources were disabled, either because you passed
`ocr: false, includeAx: false` or because the parameters arrived without their
defaults. In live MCP use the SDK applies the `ocr: true, includeAx: true`
defaults, so if both sources are skipped without you disabling them, the client
most likely stripped the defaults.

**Fix:** explicitly pass `ocr: true, includeAx: true`.

---

## Actions Blocked While macOS Is Locked

**Symptom:** input actions (`click`, `type_text`, `press_key`, …) fail;
observation actions (`screenshot`, `ocr`) return black/empty frames.

**Cause:** the safety guard refuses to synthesize input while the screen is
locked (the user is not present to supervise).

**Fix:** wait for the user to unlock, or ask them to unlock. There is no bypass.
package/skills/ucu-mcp/references/workflows.md
@@ -0,0 +1,146 @@
# Workflows

Common task playbooks. Each shows the preferred tool sequence and the fallback
path when the primary path is blocked.

---

## 1. Fill a form field

**Primary (AX):**
```
focus_app({ app: "Safari" })
find_element({ text: "Email", role: "AXTextField" })
  → elementId "Safari/w0/42"
type_in_element({ elementId: "Safari/w0/42", text: "user@example.com" })
```

**Fallback (AX value set, for non-text controls):**
```
set_value({ elementId: "...", value: "option" })
```

**Fallback (coordinates, when AX is opaque):**
```
screenshot({})
ocr({})  → blocks[].text === "Email" → {x, y, width, height}
click({ x: block.x + block.width/2, y: block.y + block.height/2 })
type_text({ text: "user@example.com" })
```
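
The primary path and the coordinate fallback can be chained in client code roughly as follows. This is a sketch under several assumptions: `call` is a hypothetical function that invokes a UCU-MCP tool and resolves with its parsed result, OCR hits are read from `elements` (the field name the tool reference gives for `ocr`), the `screenshot` step is omitted on the assumption that `ocr` captures the screen itself, and an exact text match selects the label. `fillField` is not part of the package.

```typescript
// Hypothetical transport: invoke a tool by name and resolve with its parsed result.
type ToolCall = (name: string, args: Record<string, unknown>) => Promise<unknown>;

interface FindResult { results: { id: string }[] }
interface OcrResult {
  elements: { text: string; x: number; y: number; width: number; height: number }[];
}

// Try the AX path first; fall back to OCR + coordinates when no element is found.
async function fillField(
  call: ToolCall,
  label: string,
  text: string,
): Promise<"ax" | "pixel" | "not-found"> {
  const found = (await call("find_element", { text: label, role: "AXTextField" })) as FindResult;
  const first = found.results[0];
  if (first) {
    await call("type_in_element", { elementId: first.id, text });
    return "ax";
  }
  // AX is opaque: locate the label on screen and click the center of its box.
  const ocr = (await call("ocr", {})) as OcrResult;
  const block = ocr.elements.find((e) => e.text === label);
  if (!block) return "not-found";
  await call("click", { x: block.x + block.width / 2, y: block.y + block.height / 2 });
  await call("type_text", { text });
  return "pixel";
}
```

Returning which path was taken lets the caller decide how carefully to verify the result afterwards.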

---

## 2. Operate a menu-bar / tray app (e.g. cc-switch)

Tray apps' status items live in `SystemUIServer`, not the app's own window AX
tree. `focus_app` alone may return `WINDOW_NOT_FOUND`.

```
focus_app({ app: "cc-switch" })                   # establishes tray target if status item found
click_menu_bar_extra({ app: "cc-switch", name: "switch" })   # opens the menu
# menu is now open; find items inside it:
find_element({ text: "使用统计", app: "cc-switch" })   # menu label, "Usage statistics"
  → elementId
click_element({ elementId })
```

If the menu's AX tree is opaque (some Tauri/Electron menus):
```
click_menu_bar_extra({ app: "cc-switch" })
screenshot({})
ocr({})  → locate "使用统计" by text → coordinates
click({ x, y })
```

---

## 3. Electron / WebView opaque UI

Electron/Tauri apps often expose only a near-empty `AXGroup`. The runtime `hint`
on `find_element` and `list_windows` tells you when this is happening.

```
find_element({ text: "Submit" })
  → 0 results, hint: "app is likely Electron... screenshot → ocr → click(x,y)"

screenshot({})
ocr({ region: { x, y, width, height } })   # or full screen
  → blocks[].text === "Submit" → {x, y, width, height}
click({ x: block.x + block.width/2, y: block.y + block.height/2 })
```

For repeated interaction with a known-opaque app, snapshot once with
`describe_screen` to plan, then drive by coordinates.

---

## 4. Vision-degraded environment (image content not visible)

When the model cannot see `screenshot` image blocks (relay/downgrade to URLs),
switch to text-based screen reading:

```
describe_screen({ ocr: true, includeAx: true })
  → { screen, foregroundWindow, ocr:{blocks}, ax:{elements}, errors }

# or, if you also want the image for clients that DO support it:
screenshot({ describe: true })
  → [image block, text description block]
```

`describe_screen` never throws: OCR and AX are each wrapped in their own
try/catch, so a Vision failure still returns AX state and vice versa. Check
`errors[]` to know what was skipped/failed.

---

## 5. Recover from TARGET_STALE

The active window target can go stale (window closed, app restarted, pid
changed). AX tools throw `TARGET_STALE`.

```
# error response includes hint: "Run focus_app again for the target app..."
focus_app({ app: "Safari" })         # re-establishes target
find_element({ text: "Save" })       # retry; cache refetches equivalent nodes
click_element({ elementId })
```

`type_in_element` automatically refetches an equivalent AX node if the original
`elementId` is stale, so a single retry often succeeds without `focus_app`.
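
The recovery sequence can be wrapped in a small retry helper. The sketch below assumes that a failed tool call surfaces as a thrown object carrying the `code` field; an MCP client may instead return an error result, in which case the check would need to inspect that result. `ToolCall`, `ToolError`, and `withStaleRetry` are hypothetical names, and the helper retries exactly once.

```typescript
// Hypothetical transport: invoke a tool by name and resolve with its parsed result.
type ToolCall = (name: string, args: Record<string, unknown>) => Promise<unknown>;

interface ToolError { code?: string }

// Run `action`; on TARGET_STALE, re-establish the target with focus_app and retry once.
async function withStaleRetry<T>(
  call: ToolCall,
  app: string,
  action: () => Promise<T>,
): Promise<T> {
  try {
    return await action();
  } catch (err) {
    // Any other error (or a non-object throw) is passed through unchanged.
    if ((err as ToolError | null)?.code !== "TARGET_STALE") throw err;
    await call("focus_app", { app });
    return await action(); // a second failure propagates to the caller
  }
}
```

Because element ids may change after refocusing, `action` should perform its own `find_element` rather than reuse an id captured earlier.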

---

## 6. Verify an action succeeded

Always verify after clicking or typing: the UI may not have updated, or the
wrong element may have been hit.

```
click_element({ elementId, captureAfter: true })   # screenshot in response
# or explicitly:
screenshot({})
# or check AX state:
get_window_state({})  → focusedElement / tree reflects the change
# or wait for a specific change:
wait_for_element({ text: "Saved", until: "appear", timeout: 3000 })
```

---

## 7. Multi-step task with error recovery

```
doctor()                                 # verify permissions first
list_apps()
focus_app({ app: "Notes" })
find_element({ text: "New Note" })  → id
click_element({ elementId: id, captureAfter: true })

# if click_element throws ELEMENT_NOT_FOUND:
find_element({ text: "New Note" })  → id2     # refetch, id may have changed
click_element({ elementId: id2 })

# bodyId: id of the note body element, from a separate find_element call (not shown)
type_in_element({ elementId: bodyId, text: "Hello" })
screenshot({})                           # confirm content
```