@amaster.ai/pi-computer-use 0.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/LICENSE +201 -0
- package/README.md +136 -0
- package/bin/darwin-arm64/.version +2 -0
- package/bin/darwin-arm64/CuaDriver.app/Contents/CodeResources +0 -0
- package/bin/darwin-arm64/CuaDriver.app/Contents/Info.plist +32 -0
- package/bin/darwin-arm64/CuaDriver.app/Contents/MacOS/cua-driver +0 -0
- package/bin/darwin-arm64/CuaDriver.app/Contents/Resources/Skills/cua-driver/README.md +140 -0
- package/bin/darwin-arm64/CuaDriver.app/Contents/Resources/Skills/cua-driver/RECORDING.md +113 -0
- package/bin/darwin-arm64/CuaDriver.app/Contents/Resources/Skills/cua-driver/SKILL.md +887 -0
- package/bin/darwin-arm64/CuaDriver.app/Contents/Resources/Skills/cua-driver/TESTS.md +232 -0
- package/bin/darwin-arm64/CuaDriver.app/Contents/Resources/Skills/cua-driver/WEB_APPS.md +471 -0
- package/bin/darwin-arm64/CuaDriver.app/Contents/_CodeSignature/CodeResources +172 -0
- package/bin/darwin-x64/.version +2 -0
- package/bin/darwin-x64/CuaDriver.app/Contents/CodeResources +0 -0
- package/bin/darwin-x64/CuaDriver.app/Contents/Info.plist +32 -0
- package/bin/darwin-x64/CuaDriver.app/Contents/MacOS/cua-driver +0 -0
- package/bin/darwin-x64/CuaDriver.app/Contents/Resources/Skills/cua-driver/README.md +140 -0
- package/bin/darwin-x64/CuaDriver.app/Contents/Resources/Skills/cua-driver/RECORDING.md +113 -0
- package/bin/darwin-x64/CuaDriver.app/Contents/Resources/Skills/cua-driver/SKILL.md +887 -0
- package/bin/darwin-x64/CuaDriver.app/Contents/Resources/Skills/cua-driver/TESTS.md +232 -0
- package/bin/darwin-x64/CuaDriver.app/Contents/Resources/Skills/cua-driver/WEB_APPS.md +471 -0
- package/bin/darwin-x64/CuaDriver.app/Contents/_CodeSignature/CodeResources +172 -0
- package/bin/linux-x64/.version +2 -0
- package/bin/linux-x64/cua-driver +0 -0
- package/bin/win32-arm64/.version +2 -0
- package/bin/win32-arm64/cua-driver-uia.exe +0 -0
- package/bin/win32-arm64/cua-driver.exe +0 -0
- package/bin/win32-x64/.version +2 -0
- package/bin/win32-x64/cua-driver-uia.exe +0 -0
- package/bin/win32-x64/cua-driver.exe +0 -0
- package/dist/config.d.ts +18 -0
- package/dist/config.d.ts.map +1 -0
- package/dist/config.js +15 -0
- package/dist/config.js.map +1 -0
- package/dist/index.d.ts +6 -0
- package/dist/index.d.ts.map +1 -0
- package/dist/index.js +610 -0
- package/dist/index.js.map +1 -0
- package/dist/mcp-client.d.ts +22 -0
- package/dist/mcp-client.d.ts.map +1 -0
- package/dist/mcp-client.js +91 -0
- package/dist/mcp-client.js.map +1 -0
- package/dist/vision.d.ts +6 -0
- package/dist/vision.d.ts.map +1 -0
- package/dist/vision.js +76 -0
- package/dist/vision.js.map +1 -0
- package/package.json +72 -0
- package/preview.png +0 -0
- package/scripts/postinstall.js +29 -0
package/LICENSE
ADDED
@@ -0,0 +1,201 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/

TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

1. Definitions.

"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.

"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.

"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.

"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.

"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.

"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.

"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).

"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.

"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."

"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.

2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.

3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.

4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:

(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and

(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and

(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and

(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.

You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.

5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.

6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.

7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.

8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.

9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.

END OF TERMS AND CONDITIONS

APPENDIX: How to apply the Apache License to your work.

To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.

Copyright [yyyy] [name of copyright owner]

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

package/README.md
ADDED
@@ -0,0 +1,136 @@
# @amaster.ai/pi-computer-use

pi-coding-agent extension that wraps [cua-driver-rs](https://github.com/trycua/cua/), exposing desktop automation tools with a `computer_use_` prefix.

## Features

- **Zero external dependencies** — pre-compiled cua-driver-rs binaries are bundled for every supported platform, so no separate driver install is needed
- **MCP stdio communication** — spawns `cua-driver mcp` via `StdioClientTransport` and speaks JSON-RPC over stdio
- **Dynamic tool discovery** — auto-discovers upstream MCP tools and registers them with a `computer_use_` prefix; falls back to a built-in tool list when cua-driver fails to start
- **Smart tool filtering** — excludes non-essential tools (agent cursor, recording, config, raw screenshot) and exposes 17 driver tools (input, query, app lifecycle) plus 1 vision tool
- **Optional visual analysis** — `computer_use_analyze_screenshot` via a configurable vision model
- **Cross-platform permission handling** — detects platform-specific permission issues (macOS TCC, Windows UAC, Linux display server access) and returns actionable guidance
- **Graceful degradation** — tools are always registered even when cua-driver cannot connect; a lazy reconnect is attempted on each tool call
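
Discovery, filtering, and the fallback list could fit together roughly as below. This is a minimal sketch that assumes upstream tool names are the unprefixed forms of the tools listed under Exposed Tools; `selectTools`, `UpstreamTool`, and the contents of both sets are hypothetical illustrations, not the extension's actual code.

```typescript
// Hypothetical sketch: filter upstream MCP tools and add the computer_use_ prefix.
interface UpstreamTool {
  name: string;
  description?: string;
}

const PREFIX = "computer_use_";

// Partial, illustrative exclusion set. Only names documented elsewhere in this
// package are listed; agent-cursor, config, zoom and browser tools would also go here.
const EXCLUDED = new Set<string>(["screenshot", "set_recording", "get_recording_state"]);

// Illustrative built-in fallback, used when cua-driver cannot be started.
const FALLBACK_TOOLS: UpstreamTool[] = [
  { name: "click" },
  { name: "type_text" },
  { name: "list_apps" },
];

// Returns the prefixed names that would be registered with pi-coding-agent.
function selectTools(upstream: UpstreamTool[] | null): string[] {
  const source = upstream && upstream.length > 0 ? upstream : FALLBACK_TOOLS;
  return source
    .filter((tool) => !EXCLUDED.has(tool.name))
    .map((tool) => PREFIX + tool.name);
}
```

Passing `null` (driver failed to start) or an empty list yields the prefixed fallback names, so tools are registered either way.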

## Install

```bash
bun add @amaster.ai/pi-computer-use
```

Requires Node.js >= 20 and `@earendil-works/pi-coding-agent >= 0.74.0`.

## Usage

Install the package and pi-coding-agent automatically discovers and loads the extension. All tools are registered on `session_start`.

Configure via `.pi/settings.json` (project-level) or `~/.pi/agent/settings.json` (user-level) under the `"pi-computer-use"` key:

```json
{
  "pi-computer-use": {
    "mode": "bundled"
  }
}
```

## Configuration

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `mode` | `'bundled' \| 'path'` | `'bundled'` | Binary resolution strategy: `'bundled'` uses the binary shipped under `bin/`, `'path'` uses `binaryPath` |
| `binaryPath` | `string` | — | Custom cua-driver binary path (requires `mode: 'path'`) |
| `extraArgs` | `string[]` | — | Extra CLI arguments passed to cua-driver |
| `visionModel` | `VisionModelConfig` | — | Enables `computer_use_analyze_screenshot` (see Vision Model below) |
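
The table can be read as the TypeScript shape below. This is a hypothetical sketch: the real declarations ship in `dist/config.d.ts` and may differ, and `resolveConfig` is an illustrative helper, not an exported API.

```typescript
// Hypothetical mirror of the Configuration table.
type Mode = "bundled" | "path";

interface VisionModelConfig {
  provider: string; // e.g. "openai", as in the Vision Model example
  model: string; // e.g. "gpt-4o"
}

interface PiComputerUseConfig {
  mode?: Mode;
  binaryPath?: string;
  extraArgs?: string[];
  visionModel?: VisionModelConfig;
}

interface ResolvedConfig {
  mode: Mode;
  binaryPath?: string;
  extraArgs: string[];
  visionModel?: VisionModelConfig;
}

// Applies the documented defaults. In this sketch, `mode: 'path'` without a
// binaryPath is rejected, and binaryPath is ignored in bundled mode.
function resolveConfig(raw: PiComputerUseConfig = {}): ResolvedConfig {
  const mode: Mode = raw.mode ?? "bundled";
  if (mode === "path" && !raw.binaryPath) {
    throw new Error("pi-computer-use: mode 'path' requires binaryPath");
  }
  return {
    mode,
    binaryPath: mode === "path" ? raw.binaryPath : undefined,
    extraArgs: raw.extraArgs ?? [],
    visionModel: raw.visionModel,
  };
}
```

Failing fast on a `path` configuration without `binaryPath` surfaces the mistake at startup rather than on the first tool call.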

### Vision Model (Optional)

Enable `computer_use_analyze_screenshot` by referencing a model already configured in Pi's model registry (`models.json`):

```json
{
  "pi-computer-use": {
    "visionModel": {
      "provider": "openai",
      "model": "gpt-4o"
    }
  }
}
```

The extension resolves the API key, base URL, and headers from the model registry automatically — no need to duplicate credentials here.

## Exposed Tools (17 + 1 vision)

### Input

| Tool | Description |
|------|-------------|
| `computer_use_click` | Left-click via element_index or x/y coordinates |
| `computer_use_double_click` | Double-click at x/y or on an AX element |
| `computer_use_right_click` | Right-click (context menu) |
| `computer_use_type_text` | Insert text via AX or CGEvent fallback |
| `computer_use_press_key` | Press and release a single key |
| `computer_use_hotkey` | Press a key combination (e.g. Cmd+C) |
| `computer_use_scroll` | Scroll by line or page in a direction |
| `computer_use_drag` | Press-drag-release gesture between two points |
| `computer_use_set_value` | Set value on UI elements (popups, sliders, steppers) |

### Query

| Tool | Description |
|------|-------------|
| `computer_use_get_screen_size` | Get display dimensions and scale factor |
| `computer_use_get_cursor_position` | Get current mouse cursor position |
| `computer_use_get_accessibility_tree` | Lightweight desktop snapshot (apps, windows, bounds) |
| `computer_use_get_window_state` | Full AX tree of a window with actionable element indices |
| `computer_use_list_windows` | List all top-level windows with bounds and z-order |
| `computer_use_list_apps` | List running and installed apps with state flags |

### App Lifecycle

| Tool | Description |
|------|-------------|
| `computer_use_launch_app` | Launch an app in the background without stealing focus |
| `computer_use_kill_app` | Force-terminate a process by pid |

### Vision (requires `visionModel` config)

| Tool | Description |
|------|-------------|
| `computer_use_analyze_screenshot` | Take a screenshot and analyze it with a vision model |

## Excluded Tools (16)

Agent cursor styling, recording/replay, config management, zoom, raw screenshot (use `computer_use_analyze_screenshot` instead), and browser-specific operations are filtered out.

## Permissions

On `session_start`, the extension checks permissions via cua-driver's `check_permissions` tool and reports platform-specific guidance for anything missing:

| Platform | Accessibility | Screen Capture |
|----------|---------------|----------------|
| macOS | System Settings → Privacy & Security → Accessibility | System Settings → Privacy & Security → Screen & System Audio Recording |
| Windows | Run as Administrator / UI Automation access | Check DRM or security policy |
| Linux | AT-SPI accessibility service | PipeWire portal or X11 access |

When cua-driver fails to connect (missing permissions, binary not found, etc.):

1. The user is notified with a platform-appropriate warning.
2. Tools are still registered using a built-in fallback schema.
3. On each tool call, a lazy reconnect is attempted; if it still fails, a friendly error with permission instructions is returned.
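
The three steps above can be sketched as a small wrapper. It assumes an injected `connect` function that stands in for starting `cua-driver mcp`. `LazyDriver`, `DriverClient`, and `PERMISSION_HINTS` are hypothetical names, and the hint strings paraphrase the Permissions table.

```typescript
// Hypothetical sketch of per-call lazy reconnect with permission guidance.
type Platform = "darwin" | "win32" | "linux";

interface DriverClient {
  call(tool: string, args: Record<string, unknown>): Promise<string>;
}

const PERMISSION_HINTS: Record<Platform, string> = {
  darwin: "Grant Accessibility and Screen & System Audio Recording in System Settings → Privacy & Security.",
  win32: "Run as Administrator or allow UI Automation access; check DRM or security policy for capture.",
  linux: "Enable the AT-SPI accessibility service and PipeWire portal or X11 access.",
};

class LazyDriver {
  private client: DriverClient | null = null;
  private readonly connect: () => Promise<DriverClient>;
  private readonly platform: Platform;

  constructor(connect: () => Promise<DriverClient>, platform: Platform) {
    this.connect = connect;
    this.platform = platform;
  }

  async callTool(tool: string, args: Record<string, unknown>): Promise<string> {
    let client = this.client;
    if (client === null) {
      // No live connection: try to (re)connect as part of this call.
      try {
        client = await this.connect();
      } catch (err) {
        const reason = err instanceof Error ? err.message : String(err);
        return `cua-driver unavailable (${reason}). ${PERMISSION_HINTS[this.platform]}`;
      }
      this.client = client;
    }
    try {
      return await client.call(tool, args);
    } catch (err) {
      this.client = null; // force a reconnect attempt on the next call
      throw err;
    }
  }
}
```

Dropping the cached client after a failed call means the next call retries the connection instead of reusing a dead transport.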

## Supported Platforms

| Platform | Binary |
|----------|--------|
| macOS ARM64 | `bin/darwin-arm64/CuaDriver.app/Contents/MacOS/cua-driver` |
| macOS x64 | `bin/darwin-x64/CuaDriver.app/Contents/MacOS/cua-driver` |
| Linux x64 | `bin/linux-x64/cua-driver` |
| Windows x64 | `bin/win32-x64/cua-driver.exe` |
| Windows ARM64 | `bin/win32-arm64/cua-driver.exe` |
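
Below is a sketch of how `mode: 'bundled'` might map a platform to one of these binaries. It assumes the layout matches the package file list, where the macOS binaries sit inside `CuaDriver.app/Contents/MacOS/`. `bundledBinaryPath` is a hypothetical helper; in Node the `platform` and `arch` arguments would typically come from `process.platform` and `process.arch`.

```typescript
// Hypothetical bundled-binary lookup keyed by "<platform>-<arch>".
const BUNDLED: Record<string, string> = {
  "darwin-arm64": "bin/darwin-arm64/CuaDriver.app/Contents/MacOS/cua-driver",
  "darwin-x64": "bin/darwin-x64/CuaDriver.app/Contents/MacOS/cua-driver",
  "linux-x64": "bin/linux-x64/cua-driver",
  "win32-x64": "bin/win32-x64/cua-driver.exe",
  "win32-arm64": "bin/win32-arm64/cua-driver.exe",
};

function bundledBinaryPath(packageRoot: string, platform: string, arch: string): string {
  const relative = BUNDLED[`${platform}-${arch}`];
  if (relative === undefined) {
    throw new Error(`No bundled cua-driver for ${platform}-${arch}`);
  }
  // A plain "/" join keeps the sketch dependency-free; real code would use path.join.
  return `${packageRoot.replace(/\/+$/, "")}/${relative}`;
}
```

With a single lookup table, an unsupported platform such as `linux-arm64` fails with a clear error instead of trying to spawn a missing binary.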

## License

Apache-2.0

Binary file

@@ -0,0 +1,32 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>CFBundleIdentifier</key>
<string>com.trycua.driver</string>
<key>CFBundleName</key>
<string>Cua Driver</string>
<key>CFBundleDisplayName</key>
<string>Cua Driver</string>
<key>CFBundleExecutable</key>
<string>cua-driver</string>
<key>CFBundleIconFile</key>
<string>AppIcon</string>
<key>CFBundleIconName</key>
<string>AppIcon</string>
<key>CFBundlePackageType</key>
<string>APPL</string>
<key>CFBundleShortVersionString</key>
<string>0.2.0</string>
<key>CFBundleVersion</key>
<string>1</string>
<key>LSMinimumSystemVersion</key>
<string>14.0</string>
<key>LSUIElement</key>
<true/>
<key>NSHighResolutionCapable</key>
<true/>
<key>NSSupportsAutomaticTermination</key>
<true/>
</dict>
</plist>

Binary file

@@ -0,0 +1,140 @@
# cua-driver — Claude Code skill

A [Claude Code](https://code.claude.com) skill that teaches Claude to
drive native macOS apps via the
[`cua-driver`](https://github.com/trycua/cua/tree/main/libs/cua-driver)
CLI — snapshot an app's accessibility tree, click/type/scroll by
`element_index`, and verify via re-snapshot. Backgrounded-first: no
focus steal, no cursor warp, no Space follow.

## What the skill covers

- The snapshot-before-AND-after invariant that keeps the agent honest
  about whether an action actually landed.
- The backgrounded-click recipe (yabai focus-without-raise + stamped
  SLEventPostToPid) that lets synthetic clicks land on Chrome web
  content without raising the window or pulling the user across Spaces.
- Web-app quirks (`WEB_APPS.md`) — Chromium/WebKit/Electron/Tauri,
  including the minimized-Chrome keyboard-commit caveat and the
  `set_value` workaround.
- Trajectory recording (`RECORDING.md`) — optional per-session
  recording + replay for demos and regressions.
- Canvas/viewport apps (Blender, Unity, GHOST, Qt, wxWidgets) —
  HID-tap fallback when AX is empty.

See `SKILL.md` for the main body.

## Prerequisites

1. **macOS 14 or newer** — the driver depends on SkyLight private SPIs
   that were stabilized in Sonoma.
2. **`cua-driver` CLI + `CuaDriver.app`** — installable one-liner:
   ```bash
   /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/cua-driver/scripts/install.sh)"
   ```
   Or from a clone of `trycua/cua`:
   ```bash
   cd libs/cua-driver
   scripts/install-local.sh  # builds + installs + symlinks for dev use
   ```
   The driver runs as an `.app` bundle because macOS TCC grants are
   tied to a stable bundle id (`com.trycua.driver`). The CLI symlink
   lets Claude invoke tools via plain shell.
3. **TCC grants on `CuaDriver.app`** — **Accessibility** and
   **Screen Recording** in System Settings → Privacy & Security.
   Verify with:
   ```bash
   cua-driver check_permissions
   ```
   Both fields must be `true`. If not, the app appears in the
   relevant panes of System Settings after first use; toggle it on
   there.

## Install

The skill is two drop-in directories.

**Personal scope** (all Claude Code sessions on your machine):

```bash
mkdir -p ~/.claude/skills
cp -R Skills/cua-driver ~/.claude/skills/
```

Or symlink if you want edits-in-place:

```bash
ln -s "$PWD/Skills/cua-driver" ~/.claude/skills/cua-driver
```

**Project scope** (committed alongside a specific repo):

```bash
mkdir -p .claude/skills
cp -R /path/to/cua/libs/cua-driver/Skills/cua-driver .claude/skills/
```

## Invoking the skill

Claude Code auto-invokes the skill when you ask for macOS GUI
automation — e.g. "open the Downloads folder in Finder", "click the
Save button in Numbers", "navigate to trycua.com in Chrome". You can
also invoke it explicitly:

```
/cua-driver
```

## Claude Code MCP compatibility mode

For normal skill-driven use, prefer the CLI or the standard MCP server. If you want Claude Code's vision/computer-use-style flow to ground on CuaDriver screenshots, register the compatibility server:

```bash
claude mcp add --transport stdio cua-computer-use -- cua-driver mcp --claude-code-computer-use-compat
```

This mode exposes the normal CuaDriver tools and changes only `screenshot`. The compatibility screenshot requires `pid` and `window_id`, captures that window only, and establishes a window-local pixel coordinate frame. It does not call Anthropic APIs or expose Anthropic's native computer-use API tool.

Use MCP for this Claude Code vision/computer-use-style path. CLI screenshots still work as CuaDriver calls, but they do not expose the `mcp__cua-computer-use__screenshot` tool name that Claude Code appears to use as the image-grounding cue.
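
To make the window-local frame concrete, here is a sketch of converting between screen-absolute points and window-local screenshot pixels. It assumes window bounds are reported in screen points with a top-left origin and that `scaleFactor` is pixels per point (for example 2 on a Retina display). The helper and type names are hypothetical, not CuaDriver APIs; `RECORDING.md` describes the same point-to-pixel mapping for `click.png`.

```typescript
// Hypothetical coordinate helpers; bounds are assumed to be in screen points.
interface Point {
  x: number;
  y: number;
}

interface WindowBounds {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Screen-absolute point -> pixel inside the window's screenshot.
function screenToWindowPixels(p: Point, win: WindowBounds, scaleFactor: number): Point {
  return { x: (p.x - win.x) * scaleFactor, y: (p.y - win.y) * scaleFactor };
}

// Pixel inside the window's screenshot -> screen-absolute point.
function windowPixelsToScreen(px: Point, win: WindowBounds, scaleFactor: number): Point {
  return { x: win.x + px.x / scaleFactor, y: win.y + px.y / scaleFactor };
}
```

The two functions are inverses, so a point picked on a window screenshot can be mapped back to the screen point the driver would act on.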

## Files

- `SKILL.md` — the main skill body (~890 lines). Loaded on first
  invocation; stays in context for the session.
- `WEB_APPS.md` — browsers, Electron, Tauri (Chromium + WebKit). Loaded
  on demand when SKILL.md's pointer is followed.
- `RECORDING.md` — trajectory recording / replay. Loaded on demand.
- `TESTS.md` — manual test scripts for end-to-end skill verification.

## Troubleshooting

- `cua-driver: command not found` → re-run the installer or add
  `.build/CuaDriver.app/Contents/MacOS/` to `$PATH`.
- `No cached AX state for pid X window_id W` → element_index was
  reused across turns, or across different windows of the same app.
  Call `get_window_state({pid, window_id})` first in the same turn,
  with the same window_id you're about to act against.
- Empty `tree_markdown` → `capture_mode` is set to `vision`, which
  skips the AX walk by design. Flip back to the default `som`
  (`cua-driver config set capture_mode som`) to get the tree.
  Tiny screenshot → likely a stale window capture. See "Behavior
  matrix" in SKILL.md for the full mode table.
- System-alert beep when pressing Return on a minimized Chrome
  omnibox → the keyboard-commit-on-minimized limitation. Use
  `set_value` on the field instead, or AX-click a Go/Submit button.
  See `WEB_APPS.md`.
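
The `No cached AX state` error is easier to reason about with a toy model: element indices are cached per `(pid, window_id)`, and the cache is cleared when a new turn begins. This is an illustrative sketch under that assumption, not cua-driver's implementation; `AxIndexCache` and its methods are hypothetical.

```typescript
// Toy model of per-turn element_index caching (hypothetical, for illustration).
class AxIndexCache {
  private readonly entries = new Map<string, string[]>();

  private static key(pid: number, windowId: number): string {
    return `${pid}:${windowId}`;
  }

  // A new turn starts: every previously returned index expires.
  beginTurn(): void {
    this.entries.clear();
  }

  // Stand-in for get_window_state: remember the element list for one window.
  snapshot(pid: number, windowId: number, elements: string[]): void {
    this.entries.set(AxIndexCache.key(pid, windowId), elements);
  }

  // Stand-in for an action tool resolving element_index against the cache.
  resolve(pid: number, windowId: number, elementIndex: number): string {
    const elements = this.entries.get(AxIndexCache.key(pid, windowId));
    if (elements === undefined) {
      throw new Error(`No cached AX state for pid ${pid} window_id ${windowId}`);
    }
    const element = elements[elementIndex];
    if (element === undefined) {
      throw new Error(`element_index ${elementIndex} out of range`);
    }
    return element;
  }
}
```

In this model, both failure modes in the troubleshooting entry (a new turn, or a different `window_id` of the same pid) miss the cache, and only a fresh snapshot in the same turn fixes them.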

## Updates

The skill evolves alongside the driver. To update:

```bash
cd /path/to/cua && git pull
# if you copied: re-copy
cp -R libs/cua-driver/Skills/cua-driver ~/.claude/skills/
# if you symlinked: nothing needed
```

## License

MIT. Same license as the parent `trycua/cua` repo.

@@ -0,0 +1,113 @@
# Recording & replaying trajectories

Session-scoped capture of action sequences + pre/post state, suitable
for demos, regression diffs, and training data. Invoked only when the
user explicitly asks to record — the skill does not auto-enable this.

`set_recording` turns on a session-scoped trajectory recorder. While
enabled, every action-tool call (`click`, `right_click`, `scroll`,
`type_text`, `press_key`, `hotkey`, `set_value`)
writes a numbered turn folder under a caller-chosen output
directory. Read-only and control tools (`get_window_state`, `list_windows`,
`screenshot`, `list_apps`, permission probes, agent-cursor getters /
setters, and `set_recording` itself) are not recorded.
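
The recording filter above amounts to a membership test on tool names. Here is a minimal sketch that assumes only the seven listed action tools are recorded; `RECORDED_ACTIONS` and `recordedCalls` are hypothetical names.

```typescript
// The seven action tools listed above; every other tool is treated as unrecorded.
const RECORDED_ACTIONS = new Set<string>([
  "click",
  "right_click",
  "scroll",
  "type_text",
  "press_key",
  "hotkey",
  "set_value",
]);

// Given a log of tool calls, return the ones that would produce turn folders.
function recordedCalls(calls: string[], recordingEnabled: boolean): string[] {
  if (!recordingEnabled) return [];
  return calls.filter((tool) => RECORDED_ACTIONS.has(tool));
}
```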
|
|
14
|
+
|
|
15
|
+
## Enable / disable
|
|
16
|
+
|
|
17
|
+
Two equivalent surfaces: the `set_recording` MCP tool, or the
|
|
18
|
+
friendlier `cua-driver recording` subcommand group (wraps
|
|
19
|
+
`set_recording` + `get_recording_state` with human-readable output).
|
|
20
|
+
|
|
21
|
+
```
cua-driver recording start ~/cua-trajectories/run-1
# … run the workflow …
cua-driver recording status # -> enabled / disabled, next_turn, output_dir
cua-driver recording stop # -> "Recording disabled (N turns captured in …)"
```

Raw-tool equivalent:

```
cua-driver set_recording '{"enabled":true,"output_dir":"~/cua-trajectories/run-1"}'
cua-driver get_recording_state
cua-driver set_recording '{"enabled":false}'
```

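When scripting the raw-tool form, serializing the JSON argument is less error-prone than hand-quoting it. A minimal Python sketch, assuming (as the examples above show) that the CLI takes the tool name followed by one JSON string; `set_recording_argv` is a hypothetical helper, not part of cua-driver. The returned list can go straight to `subprocess.run` with no shell quoting, and passing it through `shlex.join` reproduces the single-quoted commands shown above, which also keeps the shell from expanding `~` before the driver sees it.

```python
import json
import shlex
from typing import Any, Dict, List, Optional


def set_recording_argv(enabled: bool, output_dir: Optional[str] = None) -> List[str]:
    """Build argv for `cua-driver set_recording '<json>'` (hypothetical helper)."""
    payload: Dict[str, Any] = {"enabled": enabled}
    if output_dir is not None:
        # Leave "~" unexpanded: output_dir is expanded on the driver side.
        payload["output_dir"] = output_dir
    # Compact separators match the hand-written JSON in the examples above.
    return ["cua-driver", "set_recording", json.dumps(payload, separators=(",", ":"))]


if __name__ == "__main__":
    # shlex.join single-quotes the JSON so a shell passes it through verbatim.
    print(shlex.join(set_recording_argv(True, "~/cua-trajectories/run-1")))
    print(shlex.join(set_recording_argv(False)))
```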
The `recording` subcommands require a running daemon
(`cua-driver serve &`) because recording state is per-process.
`output_dir` expands `~` and is created (with intermediates) if
missing. Turn numbering starts at `1` every time recording is
(re-)enabled, regardless of any existing contents in the directory.
State lives in memory only: a daemon restart resets it to disabled.

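To predict where a run's output will land, the numbering rules above can be modeled in a few lines. This Python sketch illustrates the rules under stated assumptions and is not the driver's code: the set of recorded tool names is copied from the list at the top of this page, folder names follow the five-digit `turn-NNNNN` scheme described in the next section, and `plan_turn_folders` is a hypothetical helper.

```python
import os
from typing import Iterable, List, Tuple

# Action tools that write a turn folder while recording is on (from the list above).
RECORDED_TOOLS = frozenset(
    {"click", "right_click", "scroll", "type_text", "press_key", "hotkey", "set_value"}
)


def plan_turn_folders(output_dir: str, calls: Iterable[str]) -> Tuple[str, List[Tuple[str, str]]]:
    """Return the expanded output dir and the (tool, folder) pairs a fresh recording would write.

    Hypothetical helper that models the documented rules; it creates nothing on disk.
    """
    root = os.path.expanduser(output_dir)  # mirror the documented "~" expansion
    plan: List[Tuple[str, str]] = []
    turn = 0  # numbering restarts at 1 on every (re-)enable
    for tool in calls:
        if tool not in RECORDED_TOOLS:
            continue  # read-only tools and set_recording itself are not recorded
        turn += 1
        plan.append((tool, os.path.join(root, f"turn-{turn:05d}")))
    return root, plan
```

For example, the calls `get_window_state`, `click`, `screenshot`, `type_text` plan two folders: `turn-00001` for the click and `turn-00002` for the typing. Because numbering restarts at 1, re-enabling into the same directory would produce the same folder names again, so a fresh `output_dir` per run keeps runs from mixing.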
## What each turn folder contains

Each action writes to `turn-NNNNN/` (five-digit zero-padded counter):

- `app_state.json`: post-action AX snapshot for the target pid, in the
  same shape `get_window_state` returns (`tree_markdown`,
  `element_count`, `turn_id`, etc.) minus the screenshot fields. The
  recorder resolves a frontmost window internally (visible and on the
  current Space preferred, largest area as fallback): individual action
  tools may carry a `window_id`, but the recorder itself has no
  caller-supplied window anchor.
- `screenshot.png`: post-action capture of the same window the
  recorder just snapshotted. Omitted when the pid has no visible
  window.
- `action.json`: the tool name, full input arguments, result
  summary, pid, click point (when applicable), and ISO-8601 timestamp.
- `click.png`: for click-family actions only (`click`,
  `right_click`), a copy of `screenshot.png` with a red dot drawn at
  the click point (screen-absolute point → window-local pixels via
  the screenshot's `scale_factor`). Absent for other tools and for
  clicks whose point falls outside the captured window.

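The `click.png` mapping from a screen-absolute point to window-local pixels amounts to subtracting the window origin and multiplying by `scale_factor`. A minimal Python sketch, assuming the captured window's frame (origin and size) is known in screen points; `WindowFrame` and `click_to_window_pixels` are hypothetical names, and the driver's own rounding may differ.

```python
from typing import NamedTuple, Optional, Tuple


class WindowFrame(NamedTuple):
    """Captured window's frame in screen points (hypothetical structure)."""
    x: float
    y: float
    width: float
    height: float


def click_to_window_pixels(
    click_x: float, click_y: float, frame: WindowFrame, scale_factor: float
) -> Optional[Tuple[int, int]]:
    """Map a screen-absolute click point to pixel coordinates in the window screenshot.

    Returns None when the point falls outside the window, mirroring the case
    where no click.png is written.
    """
    local_x = click_x - frame.x  # window-local offset, still in points
    local_y = click_y - frame.y
    if not (0 <= local_x < frame.width and 0 <= local_y < frame.height):
        return None
    # Points -> pixels: on a 2x (Retina) display one point spans two pixels.
    return round(local_x * scale_factor), round(local_y * scale_factor)
```

For a window at (100, 200) measuring 800 by 600 points on a 2x display, a click at screen point (150, 250) lands at pixel (100, 100) in `screenshot.png`; a click at (50, 250) is outside the window, so no dot would be drawn.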
## When to use it

- Demos and screen recordings: play the turn folder back to show
  exactly what the agent saw and what it did.
- Replay for regression: re-run the same sequence against a future
  build and diff the new trajectory against the saved one.
- Training data collection: each turn is a
  `(state, action, next_state)` triple ready for offline learning.

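Per the folder contents listed above, a turn folder stores only the post-action snapshot, so one way to assemble training triples is to use the previous turn's `app_state.json` as each turn's `state`. The Python sketch below does that; the pairing rule is an assumption (it drops the first turn, whose pre-action state is not stored), and `load_triples` is a hypothetical helper that reads only the `action.json` and `app_state.json` files.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple

Triple = Tuple[Optional[Dict[str, Any]], Dict[str, Any], Optional[Dict[str, Any]]]


def _read_json(path: Path) -> Optional[Dict[str, Any]]:
    """Load a JSON file, or return None if the recorder did not write it."""
    return json.loads(path.read_text()) if path.exists() else None


def load_triples(trajectory_dir: str) -> List[Triple]:
    """Build (state, action, next_state) triples from consecutive turn folders (hypothetical helper)."""
    root = Path(trajectory_dir).expanduser()
    # Zero-padded names make lexical order equal to turn order.
    turns = sorted(p for p in root.glob("turn-*") if p.is_dir())
    triples: List[Triple] = []
    prev_state: Optional[Dict[str, Any]] = None
    for i, folder in enumerate(turns):
        action = json.loads((folder / "action.json").read_text())
        next_state = _read_json(folder / "app_state.json")
        if i > 0:  # assumption: the first turn has no recorded pre-action state
            triples.append((prev_state, action, next_state))
        prev_state = next_state
    return triples
```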
## When to invoke it

This skill does **not** auto-enable recording. The client invokes
`set_recording` explicitly when the user asks to capture a session.
If the user says "record this session" or similar, call
`set_recording({enabled:true, output_dir:…})` before the first
action, and `set_recording({enabled:false})` when done.

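Pairing the enable and disable calls is easy to get wrong when a workflow fails halfway. A Python sketch, assuming a generic `call_tool(name, arguments)` function supplied by whatever MCP client is in use (hypothetical here): the `finally` clause turns recording off even if an action raises, so the daemon is not left recording.

```python
from contextlib import contextmanager
from typing import Any, Callable, Dict, Iterator

# Hypothetical client hook: sends one MCP tool call and returns its result.
CallTool = Callable[[str, Dict[str, Any]], Any]


@contextmanager
def recording(call_tool: CallTool, output_dir: str) -> Iterator[None]:
    """Enable recording before the first action and always disable it afterwards."""
    call_tool("set_recording", {"enabled": True, "output_dir": output_dir})
    try:
        yield
    finally:
        call_tool("set_recording", {"enabled": False})
```

Inside `with recording(call_tool, "~/cua-trajectories/run-1"):` every action call is captured; leaving the block, normally or through an exception, sends `set_recording({enabled:false})`.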
## Replaying a recorded trajectory

`replay_trajectory({dir})` walks `<dir>/turn-NNNNN/` folders in
lexical order, reads each `action.json`, and re-invokes the recorded
tool with its recorded `arguments`. Optional knobs: `delay_ms`
(pacing between turns, default 500) and `stop_on_error` (halt on
first failure, default true).

```
cua-driver recording start ~/cua-trajectories/demo1
# … run the workflow …
cua-driver recording stop
# Later: replay against a new build.
cua-driver replay_trajectory '{"dir":"~/cua-trajectories/demo1","delay_ms":500}'
```

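The replay semantics described above (lexical order, one tool call per `action.json`, a pause between turns, an optional early stop) fit in a short loop. This Python sketch only illustrates that control flow and is not `replay_trajectory` itself: `invoke(tool, arguments)` is a hypothetical callback that performs the tool call and raises on failure, and the key holding the tool name inside `action.json` is assumed to be `"tool"` (only `arguments` is named in this document).

```python
import json
import time
from pathlib import Path
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical callback: runs one tool call and raises on failure.
Invoke = Callable[[str, Dict[str, Any]], Any]


def replay_sketch(
    trajectory_dir: str,
    invoke: Invoke,
    delay_ms: int = 500,
    stop_on_error: bool = True,
    tool_key: str = "tool",  # assumed key name for the recorded tool
) -> List[Tuple[str, bool, Any]]:
    """Re-run each recorded turn; return (folder, succeeded, result_or_error) per turn."""
    root = Path(trajectory_dir).expanduser()
    # Lexical order of the zero-padded names equals recording order.
    folders = sorted(p for p in root.iterdir() if p.is_dir() and p.name.startswith("turn-"))
    outcomes: List[Tuple[str, bool, Any]] = []
    for i, folder in enumerate(folders):
        if i > 0 and delay_ms > 0:
            time.sleep(delay_ms / 1000)  # pacing between turns
        action = json.loads((folder / "action.json").read_text())
        try:
            result = invoke(action[tool_key], action.get("arguments", {}))
        except Exception as exc:  # record the failure, then maybe halt
            outcomes.append((folder.name, False, str(exc)))
            if stop_on_error:
                break
            continue
        outcomes.append((folder.name, True, result))
    return outcomes
```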
Important caveat: **element_index doesn't survive across sessions**.
Indices are assigned fresh on every `get_window_state` snapshot,
keyed on `(pid, window_id)`, so a recorded
`click({pid, window_id, element_index: 14})` from yesterday won't
resolve today: the pid is usually different, and the window_id always
is. The call returns `Invalid element_index` or
`No cached AX state`. Pixel clicks (`click({pid, x, y})`) and keyboard
tools (`press_key`, `hotkey`, `type_text` without element_index)
replay cleanly; element-indexed actions require a live snapshot that
replay doesn't currently re-emit (read-only tools like
`get_window_state` aren't recorded). For a reliable replay, either
compose the trajectory from pixel + keyboard primitives, or capture
it as a regression artifact (compare the failure/success pattern
across builds) rather than a re-driving script.

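Before replaying, a saved trajectory can be screened for the element-indexed turns this caveat says will fail. A Python sketch, assuming each `action.json` has been loaded into a dict whose `arguments` mirror the original call and whose tool name sits under a `"tool"` key (an assumed key name); `split_by_replayability` is hypothetical. Any turn whose arguments include `element_index` is flagged.

```python
from typing import Any, Dict, List, Tuple


def split_by_replayability(
    actions: List[Dict[str, Any]], tool_key: str = "tool"  # assumed key name
) -> Tuple[List[Tuple[int, str]], List[Tuple[int, str]]]:
    """Return (replay_safe, needs_live_snapshot) lists of (turn_number, tool_name)."""
    safe: List[Tuple[int, str]] = []
    needs_snapshot: List[Tuple[int, str]] = []
    for turn, action in enumerate(actions, start=1):
        args = action.get("arguments", {})
        entry = (turn, action[tool_key])
        # element_index points into a snapshot from the recording session, which replay lacks.
        if "element_index" in args:
            needs_snapshot.append(entry)
        else:
            safe.append(entry)
    return safe, needs_snapshot
```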
If recording is still enabled while replay runs, the replay is
itself recorded into the current output directory; that's the
intended regression-diff workflow.
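
With both runs on disk, the diff itself can start as a turn-by-turn comparison of `action.json` contents. A Python sketch over already-loaded action dicts; the compared keys (`"tool"` and `"result"`) are assumed names for the tool-name and result-summary fields, and `diff_trajectories` is hypothetical. Comparing `app_state.json` snapshots the same way would extend it to UI state.

```python
from itertools import zip_longest
from typing import Any, Dict, List, Sequence, Tuple

Action = Dict[str, Any]


def diff_trajectories(
    baseline: Sequence[Action],
    candidate: Sequence[Action],
    keys: Sequence[str] = ("tool", "result"),  # assumed action.json field names
) -> List[Tuple[int, str]]:
    """List (turn_number, difference) pairs between two recorded runs (hypothetical helper)."""
    diffs: List[Tuple[int, str]] = []
    for turn, (old, new) in enumerate(zip_longest(baseline, candidate), start=1):
        if old is None or new is None:
            # One run has more turns than the other.
            diffs.append((turn, "only in " + ("candidate" if old is None else "baseline")))
            continue
        for key in keys:
            if old.get(key) != new.get(key):
                diffs.append((turn, f"{key}: {old.get(key)!r} -> {new.get(key)!r}"))
    return diffs
```

Run it on the action lists from the saved run and from the replay's run; an empty result means every turn called the same tool and produced the same result summary.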