@amaster.ai/pi-computer-use 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (49)
  1. package/LICENSE +201 -0
  2. package/README.md +136 -0
  3. package/bin/darwin-arm64/.version +2 -0
  4. package/bin/darwin-arm64/CuaDriver.app/Contents/CodeResources +0 -0
  5. package/bin/darwin-arm64/CuaDriver.app/Contents/Info.plist +32 -0
  6. package/bin/darwin-arm64/CuaDriver.app/Contents/MacOS/cua-driver +0 -0
  7. package/bin/darwin-arm64/CuaDriver.app/Contents/Resources/Skills/cua-driver/README.md +140 -0
  8. package/bin/darwin-arm64/CuaDriver.app/Contents/Resources/Skills/cua-driver/RECORDING.md +113 -0
  9. package/bin/darwin-arm64/CuaDriver.app/Contents/Resources/Skills/cua-driver/SKILL.md +887 -0
  10. package/bin/darwin-arm64/CuaDriver.app/Contents/Resources/Skills/cua-driver/TESTS.md +232 -0
  11. package/bin/darwin-arm64/CuaDriver.app/Contents/Resources/Skills/cua-driver/WEB_APPS.md +471 -0
  12. package/bin/darwin-arm64/CuaDriver.app/Contents/_CodeSignature/CodeResources +172 -0
  13. package/bin/darwin-x64/.version +2 -0
  14. package/bin/darwin-x64/CuaDriver.app/Contents/CodeResources +0 -0
  15. package/bin/darwin-x64/CuaDriver.app/Contents/Info.plist +32 -0
  16. package/bin/darwin-x64/CuaDriver.app/Contents/MacOS/cua-driver +0 -0
  17. package/bin/darwin-x64/CuaDriver.app/Contents/Resources/Skills/cua-driver/README.md +140 -0
  18. package/bin/darwin-x64/CuaDriver.app/Contents/Resources/Skills/cua-driver/RECORDING.md +113 -0
  19. package/bin/darwin-x64/CuaDriver.app/Contents/Resources/Skills/cua-driver/SKILL.md +887 -0
  20. package/bin/darwin-x64/CuaDriver.app/Contents/Resources/Skills/cua-driver/TESTS.md +232 -0
  21. package/bin/darwin-x64/CuaDriver.app/Contents/Resources/Skills/cua-driver/WEB_APPS.md +471 -0
  22. package/bin/darwin-x64/CuaDriver.app/Contents/_CodeSignature/CodeResources +172 -0
  23. package/bin/linux-x64/.version +2 -0
  24. package/bin/linux-x64/cua-driver +0 -0
  25. package/bin/win32-arm64/.version +2 -0
  26. package/bin/win32-arm64/cua-driver-uia.exe +0 -0
  27. package/bin/win32-arm64/cua-driver.exe +0 -0
  28. package/bin/win32-x64/.version +2 -0
  29. package/bin/win32-x64/cua-driver-uia.exe +0 -0
  30. package/bin/win32-x64/cua-driver.exe +0 -0
  31. package/dist/config.d.ts +18 -0
  32. package/dist/config.d.ts.map +1 -0
  33. package/dist/config.js +15 -0
  34. package/dist/config.js.map +1 -0
  35. package/dist/index.d.ts +6 -0
  36. package/dist/index.d.ts.map +1 -0
  37. package/dist/index.js +610 -0
  38. package/dist/index.js.map +1 -0
  39. package/dist/mcp-client.d.ts +22 -0
  40. package/dist/mcp-client.d.ts.map +1 -0
  41. package/dist/mcp-client.js +91 -0
  42. package/dist/mcp-client.js.map +1 -0
  43. package/dist/vision.d.ts +6 -0
  44. package/dist/vision.d.ts.map +1 -0
  45. package/dist/vision.js +76 -0
  46. package/dist/vision.js.map +1 -0
  47. package/package.json +72 -0
  48. package/preview.png +0 -0
  49. package/scripts/postinstall.js +29 -0
package/LICENSE ADDED
@@ -0,0 +1,201 @@
1
+ Apache License
2
+ Version 2.0, January 2004
3
+ http://www.apache.org/licenses/
4
+
5
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6
+
7
+ 1. Definitions.
8
+
9
+ "License" shall mean the terms and conditions for use, reproduction,
10
+ and distribution as defined by Sections 1 through 9 of this document.
11
+
12
+ "Licensor" shall mean the copyright owner or entity authorized by
13
+ the copyright owner that is granting the License.
14
+
15
+ "Legal Entity" shall mean the union of the acting entity and all
16
+ other entities that control, are controlled by, or are under common
17
+ control with that entity. For the purposes of this definition,
18
+ "control" means (i) the power, direct or indirect, to cause the
19
+ direction or management of such entity, whether by contract or
20
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
21
+ outstanding shares, or (iii) beneficial ownership of such entity.
22
+
23
+ "You" (or "Your") shall mean an individual or Legal Entity
24
+ exercising permissions granted by this License.
25
+
26
+ "Source" form shall mean the preferred form for making modifications,
27
+ including but not limited to software source code, documentation
28
+ source, and configuration files.
29
+
30
+ "Object" form shall mean any form resulting from mechanical
31
+ transformation or translation of a Source form, including but
32
+ not limited to compiled object code, generated documentation,
33
+ and conversions to other media types.
34
+
35
+ "Work" shall mean the work of authorship, whether in Source or
36
+ Object form, made available under the License, as indicated by a
37
+ copyright notice that is included in or attached to the work
38
+ (an example is provided in the Appendix below).
39
+
40
+ "Derivative Works" shall mean any work, whether in Source or Object
41
+ form, that is based on (or derived from) the Work and for which the
42
+ editorial revisions, annotations, elaborations, or other modifications
43
+ represent, as a whole, an original work of authorship. For the purposes
44
+ of this License, Derivative Works shall not include works that remain
45
+ separable from, or merely link (or bind by name) to the interfaces of,
46
+ the Work and Derivative Works thereof.
47
+
48
+ "Contribution" shall mean any work of authorship, including
49
+ the original version of the Work and any modifications or additions
50
+ to that Work or Derivative Works thereof, that is intentionally
51
+ submitted to Licensor for inclusion in the Work by the copyright owner
52
+ or by an individual or Legal Entity authorized to submit on behalf of
53
+ the copyright owner. For the purposes of this definition, "submitted"
54
+ means any form of electronic, verbal, or written communication sent
55
+ to the Licensor or its representatives, including but not limited to
56
+ communication on electronic mailing lists, source code control systems,
57
+ and issue tracking systems that are managed by, or on behalf of, the
58
+ Licensor for the purpose of discussing and improving the Work, but
59
+ excluding communication that is conspicuously marked or otherwise
60
+ designated in writing by the copyright owner as "Not a Contribution."
61
+
62
+ "Contributor" shall mean Licensor and any individual or Legal Entity
63
+ on behalf of whom a Contribution has been received by Licensor and
64
+ subsequently incorporated within the Work.
65
+
66
+ 2. Grant of Copyright License. Subject to the terms and conditions of
67
+ this License, each Contributor hereby grants to You a perpetual,
68
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
69
+ copyright license to reproduce, prepare Derivative Works of,
70
+ publicly display, publicly perform, sublicense, and distribute the
71
+ Work and such Derivative Works in Source or Object form.
72
+
73
+ 3. Grant of Patent License. Subject to the terms and conditions of
74
+ this License, each Contributor hereby grants to You a perpetual,
75
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
76
+ (except as stated in this section) patent license to make, have made,
77
+ use, offer to sell, sell, import, and otherwise transfer the Work,
78
+ where such license applies only to those patent claims licensable
79
+ by such Contributor that are necessarily infringed by their
80
+ Contribution(s) alone or by combination of their Contribution(s)
81
+ with the Work to which such Contribution(s) was submitted. If You
82
+ institute patent litigation against any entity (including a
83
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
84
+ or a Contribution incorporated within the Work constitutes direct
85
+ or contributory patent infringement, then any patent licenses
86
+ granted to You under this License for that Work shall terminate
87
+ as of the date such litigation is filed.
88
+
89
+ 4. Redistribution. You may reproduce and distribute copies of the
90
+ Work or Derivative Works thereof in any medium, with or without
91
+ modifications, and in Source or Object form, provided that You
92
+ meet the following conditions:
93
+
94
+ (a) You must give any other recipients of the Work or
95
+ Derivative Works a copy of this License; and
96
+
97
+ (b) You must cause any modified files to carry prominent notices
98
+ stating that You changed the files; and
99
+
100
+ (c) You must retain, in the Source form of any Derivative Works
101
+ that You distribute, all copyright, patent, trademark, and
102
+ attribution notices from the Source form of the Work,
103
+ excluding those notices that do not pertain to any part of
104
+ the Derivative Works; and
105
+
106
+ (d) If the Work includes a "NOTICE" text file as part of its
107
+ distribution, then any Derivative Works that You distribute must
108
+ include a readable copy of the attribution notices contained
109
+ within such NOTICE file, excluding those notices that do not
110
+ pertain to any part of the Derivative Works, in at least one
111
+ of the following places: within a NOTICE text file distributed
112
+ as part of the Derivative Works; within the Source form or
113
+ documentation, if provided along with the Derivative Works; or,
114
+ within a display generated by the Derivative Works, if and
115
+ wherever such third-party notices normally appear. The contents
116
+ of the NOTICE file are for informational purposes only and
117
+ do not modify the License. You may add Your own attribution
118
+ notices within Derivative Works that You distribute, alongside
119
+ or as an addendum to the NOTICE text from the Work, provided
120
+ that such additional attribution notices cannot be construed
121
+ as modifying the License.
122
+
123
+ You may add Your own copyright statement to Your modifications and
124
+ may provide additional or different license terms and conditions
125
+ for use, reproduction, or distribution of Your modifications, or
126
+ for any such Derivative Works as a whole, provided Your use,
127
+ reproduction, and distribution of the Work otherwise complies with
128
+ the conditions stated in this License.
129
+
130
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
131
+ any Contribution intentionally submitted for inclusion in the Work
132
+ by You to the Licensor shall be under the terms and conditions of
133
+ this License, without any additional terms or conditions.
134
+ Notwithstanding the above, nothing herein shall supersede or modify
135
+ the terms of any separate license agreement you may have executed
136
+ with Licensor regarding such Contributions.
137
+
138
+ 6. Trademarks. This License does not grant permission to use the trade
139
+ names, trademarks, service marks, or product names of the Licensor,
140
+ except as required for reasonable and customary use in describing the
141
+ origin of the Work and reproducing the content of the NOTICE file.
142
+
143
+ 7. Disclaimer of Warranty. Unless required by applicable law or
144
+ agreed to in writing, Licensor provides the Work (and each
145
+ Contributor provides its Contributions) on an "AS IS" BASIS,
146
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147
+ implied, including, without limitation, any warranties or conditions
148
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149
+ PARTICULAR PURPOSE. You are solely responsible for determining the
150
+ appropriateness of using or redistributing the Work and assume any
151
+ risks associated with Your exercise of permissions under this License.
152
+
153
+ 8. Limitation of Liability. In no event and under no legal theory,
154
+ whether in tort (including negligence), contract, or otherwise,
155
+ unless required by applicable law (such as deliberate and grossly
156
+ negligent acts) or agreed to in writing, shall any Contributor be
157
+ liable to You for damages, including any direct, indirect, special,
158
+ incidental, or consequential damages of any character arising as a
159
+ result of this License or out of the use or inability to use the
160
+ Work (including but not limited to damages for loss of goodwill,
161
+ work stoppage, computer failure or malfunction, or any and all
162
+ other commercial damages or losses), even if such Contributor
163
+ has been advised of the possibility of such damages.
164
+
165
+ 9. Accepting Warranty or Additional Liability. While redistributing
166
+ the Work or Derivative Works thereof, You may choose to offer,
167
+ and charge a fee for, acceptance of support, warranty, indemnity,
168
+ or other liability obligations and/or rights consistent with this
169
+ License. However, in accepting such obligations, You may act only
170
+ on Your own behalf and on Your sole responsibility, not on behalf
171
+ of any other Contributor, and only if You agree to indemnify,
172
+ defend, and hold each Contributor harmless for any liability
173
+ incurred by, or claims asserted against, such Contributor by reason
174
+ of your accepting any such warranty or additional liability.
175
+
176
+ END OF TERMS AND CONDITIONS
177
+
178
+ APPENDIX: How to apply the Apache License to your work.
179
+
180
+ To apply the Apache License to your work, attach the following
181
+ boilerplate notice, with the fields enclosed by brackets "[]"
182
+ replaced with your own identifying information. (Don't include
183
+ the brackets!) The text should be enclosed in the appropriate
184
+ comment syntax for the file format. We also recommend that a
185
+ file or class name and description of purpose be included on the
186
+ same "printed page" as the copyright notice for easier
187
+ identification within third-party archives.
188
+
189
+ Copyright [yyyy] [name of copyright owner]
190
+
191
+ Licensed under the Apache License, Version 2.0 (the "License");
192
+ you may not use this file except in compliance with the License.
193
+ You may obtain a copy of the License at
194
+
195
+ http://www.apache.org/licenses/LICENSE-2.0
196
+
197
+ Unless required by applicable law or agreed to in writing, software
198
+ distributed under the License is distributed on an "AS IS" BASIS,
199
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200
+ See the License for the specific language governing permissions and
201
+ limitations under the License.
package/README.md ADDED
@@ -0,0 +1,136 @@
1
+ # @amaster.ai/pi-computer-use
2
+
3
+ ![pi-computer-use preview](https://raw.githubusercontent.com/TGYD-helige/pi/master/packages/pi-computer-use/preview.png)
4
+
5
+ pi-coding-agent extension that wraps [cua-driver-rs](https://github.com/trycua/cua/), exposing desktop automation tools with a `computer_use_` prefix.
6
+
7
+ ## Features
8
+
9
+ - **Zero external dependencies** — pre-compiled cua-driver-rs binaries bundled for all platforms
10
+ - **MCP stdio communication** — spawns `cua-driver mcp` via `StdioClientTransport`, JSON-RPC over stdio
11
+ - **Dynamic tool discovery** — auto-discovers upstream MCP tools and registers with `computer_use_` prefix; falls back to a built-in tool list when cua-driver fails to start
12
+ - **Smart tool filtering** — excludes non-essential tools (agent cursor, recording, config, raw screenshot), exposes 17 action tools + 1 vision tool
13
+ - **Optional visual analysis** — `computer_use_analyze_screenshot` via configurable vision model
14
+ - **Cross-platform permission handling** — detects platform-specific permission issues (macOS TCC, Windows UAC, Linux display server access) and returns actionable guidance
15
+ - **Graceful degradation** — tools are always registered even when cua-driver cannot connect; lazy reconnect is attempted on each tool call
16
+
17
+ ## Install
18
+
19
+ ```bash
20
+ bun add @amaster.ai/pi-computer-use
21
+ ```
22
+
23
+ Requires Node.js >= 20 and `@earendil-works/pi-coding-agent >= 0.74.0`.
24
+
25
+ ## Usage
26
+
27
+ Install the package and pi-coding-agent will automatically discover and load the extension. All tools are registered on `session_start`.
28
+
29
+ Configure via `.pi/settings.json` (project-level) or `~/.pi/agent/settings.json` (user-level) under the `"pi-computer-use"` key:
30
+
31
+ ```json
32
+ {
33
+ "pi-computer-use": {
34
+ "mode": "bundled"
35
+ }
36
+ }
37
+ ```
38
+
39
+ ## Configuration
40
+
41
+ | Option | Type | Default | Description |
42
+ |--------|------|---------|-------------|
43
+ | `mode` | `'bundled' \| 'path'` | `'bundled'` | Binary resolution strategy |
44
+ | `binaryPath` | `string` | — | Custom cua-driver binary path (requires `mode: 'path'`) |
45
+ | `extraArgs` | `string[]` | — | Extra CLI arguments passed to cua-driver |
46
+ | `visionModel` | `VisionModelConfig` | — | Enable visual screenshot analysis |
47
+
48
+ ### Vision Model (Optional)
49
+
50
+ Enable `computer_use_analyze_screenshot` by referencing a model already configured in Pi's model registry (`models.json`):
51
+
52
+ ```json
53
+ {
54
+ "pi-computer-use": {
55
+ "visionModel": {
56
+ "provider": "openai",
57
+ "model": "gpt-4o"
58
+ }
59
+ }
60
+ }
61
+ ```
62
+
63
+ The extension resolves API key, base URL, and headers from the model registry automatically — no need to duplicate credentials here.
64
+
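Editor's sketch: the Configuration table above can be read as a small typed object. The snippet below is a minimal illustration, assuming the shapes shown in the table and the JSON examples. The names `PiComputerUseConfig` and `validateConfig` are hypothetical, and the package's real types in `dist/config.d.ts` may differ. Whether the extension rejects or silently ignores a stray `binaryPath` is also an assumption.

```typescript
// Shape inferred from the Configuration table; not the package's actual types.
interface VisionModelConfig {
  provider: string;
  model: string;
}

interface PiComputerUseConfig {
  mode?: "bundled" | "path";
  binaryPath?: string;
  extraArgs?: string[];
  visionModel?: VisionModelConfig;
}

// Hypothetical helper: returns human-readable problems, empty when the config looks valid.
function validateConfig(config: PiComputerUseConfig): string[] {
  const problems: string[] = [];
  const mode = config.mode ?? "bundled"; // documented default
  if (mode === "path" && !config.binaryPath) {
    problems.push("mode 'path' needs binaryPath");
  }
  if (mode === "bundled" && config.binaryPath) {
    problems.push("binaryPath is only used with mode 'path'");
  }
  return problems;
}
```

Under these assumptions, an empty object is valid because `mode` defaults to `'bundled'`, while `{ mode: "path" }` without a `binaryPath` reports one problem.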
65
+ ## Exposed Tools (17 + 1 vision)
66
+
67
+ ### Input
68
+
69
+ | Tool | Description |
70
+ |------|-------------|
71
+ | `computer_use_click` | Left-click via element_index or x/y coordinates |
72
+ | `computer_use_double_click` | Double-click at x/y or on an AX element |
73
+ | `computer_use_right_click` | Right-click (context menu) |
74
+ | `computer_use_type_text` | Insert text via AX or CGEvent fallback |
75
+ | `computer_use_press_key` | Press and release a single key |
76
+ | `computer_use_hotkey` | Press a key combination (e.g. Cmd+C) |
77
+ | `computer_use_scroll` | Scroll by line or page in a direction |
78
+ | `computer_use_drag` | Press-drag-release gesture between two points |
79
+ | `computer_use_set_value` | Set value on UI elements (popups, sliders, steppers) |
80
+
81
+ ### Query
82
+
83
+ | Tool | Description |
84
+ |------|-------------|
85
+ | `computer_use_get_screen_size` | Get display dimensions and scale factor |
86
+ | `computer_use_get_cursor_position` | Get current mouse cursor position |
87
+ | `computer_use_get_accessibility_tree` | Lightweight desktop snapshot (apps, windows, bounds) |
88
+ | `computer_use_get_window_state` | Full AX tree of a window with actionable element indices |
89
+ | `computer_use_list_windows` | List all top-level windows with bounds and z-order |
90
+ | `computer_use_list_apps` | List running and installed apps with state flags |
91
+
92
+ ### App Lifecycle
93
+
94
+ | Tool | Description |
95
+ |------|-------------|
96
+ | `computer_use_launch_app` | Launch an app in the background without focus steal |
97
+ | `computer_use_kill_app` | Force-terminate a process by pid |
98
+
99
+ ### Vision (requires `visionModel` config)
100
+
101
+ | Tool | Description |
102
+ |------|-------------|
103
+ | `computer_use_analyze_screenshot` | Take a screenshot and analyze it with a vision model |
104
+
105
+ ## Excluded Tools (16)
106
+
107
+ Agent cursor styling, recording/replay, config management, zoom, raw screenshot (use `analyze_screenshot` instead), and browser-specific operations are filtered out.
108
+
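Editor's sketch: the filtering and prefixing described above could look roughly like this. It assumes that each upstream tool name equals the documented name with the `computer_use_` prefix removed, and it uses an allowlist for brevity; the extension itself may instead keep a list of excluded tools. `toExposedToolNames` is a hypothetical name.

```typescript
const TOOL_PREFIX = "computer_use_";

// The 17 action tools from the tables above, with the prefix stripped.
// Assumption: these match the upstream MCP tool names.
const EXPOSED_UPSTREAM_TOOLS: ReadonlySet<string> = new Set([
  "click", "double_click", "right_click", "type_text", "press_key",
  "hotkey", "scroll", "drag", "set_value",
  "get_screen_size", "get_cursor_position", "get_accessibility_tree",
  "get_window_state", "list_windows", "list_apps",
  "launch_app", "kill_app",
]);

// Keep only allowlisted upstream tools and register them under the prefix.
function toExposedToolNames(upstreamNames: string[]): string[] {
  return upstreamNames
    .filter((name) => EXPOSED_UPSTREAM_TOOLS.has(name))
    .map((name) => `${TOOL_PREFIX}${name}`);
}
```

With this sketch, an upstream list containing `click`, `screenshot`, and `set_recording` would expose only `computer_use_click`.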
109
+ ## Permissions
110
+
111
+ On `session_start`, the extension checks permissions via cua-driver's `check_permissions` tool. Platform-specific guidance is provided:
112
+
113
+ | Platform | Accessibility | Screen Capture |
114
+ |----------|--------------|----------------|
115
+ | macOS | System Settings → Privacy & Security → Accessibility | System Settings → Privacy & Security → Screen & System Audio Recording |
116
+ | Windows | Run as Administrator / UI Automation access | Check DRM or security policy |
117
+ | Linux | AT-SPI accessibility service | PipeWire portal or X11 access |
118
+
119
+ When cua-driver fails to connect (missing permissions, binary not found, etc.):
120
+ 1. User is notified with a platform-appropriate warning
121
+ 2. Tools are still registered using a built-in fallback schema
122
+ 3. On each tool call, lazy reconnect is attempted; if it still fails, a friendly error with permission instructions is returned
123
+
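Editor's sketch: the three fallback steps above amount to a lazy reconnect on every tool call. The class below is a simplified illustration, not the extension's code. `LazyDriver` and its members are hypothetical, and the connection attempt is kept synchronous for brevity, whereas a real MCP stdio connection would be asynchronous.

```typescript
class LazyDriver {
  private connected = false;
  private readonly connect: () => boolean;
  private readonly guidance: string;

  // `connect` returns true on success; `guidance` is the permission help text.
  constructor(connect: () => boolean, guidance: string) {
    this.connect = connect;
    this.guidance = guidance;
  }

  // Retry the connection only while disconnected, then either run the tool
  // or fall back to the guidance message.
  callTool(invoke: () => string): string {
    if (!this.connected) {
      try {
        this.connected = this.connect();
      } catch {
        this.connected = false;
      }
    }
    return this.connected ? invoke() : this.guidance;
  }
}
```

The design choice here is that a failed connection never throws to the caller: the tool stays registered and each call gets either a result or actionable guidance.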
124
+ ## Supported Platforms
125
+
126
+ | Platform | Binary |
127
+ |----------|--------|
128
+ | macOS ARM64 | `bin/darwin-arm64/cua-driver` |
129
+ | macOS x64 | `bin/darwin-x64/cua-driver` |
130
+ | Linux x64 | `bin/linux-x64/cua-driver` |
131
+ | Windows x64 | `bin/win32-x64/cua-driver.exe` |
132
+ | Windows ARM64 | `bin/win32-arm64/cua-driver.exe` |
133
+
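Editor's sketch: the table above keys each binary on a `<platform>-<arch>` directory under `bin/`. A minimal lookup, assuming Node-style platform and architecture strings such as `darwin` and `arm64`, might look like the following. `bundledBinaryDir` is a hypothetical helper, and it returns only the directory, since the executable's exact location inside it is not confirmed here.

```typescript
// Targets listed in the Supported Platforms table.
const SUPPORTED_TARGETS: ReadonlySet<string> = new Set([
  "darwin-arm64",
  "darwin-x64",
  "linux-x64",
  "win32-x64",
  "win32-arm64",
]);

// Returns the bundled directory for a target, or undefined when unsupported.
function bundledBinaryDir(platform: string, arch: string): string | undefined {
  const target = `${platform}-${arch}`;
  return SUPPORTED_TARGETS.has(target) ? `bin/${target}` : undefined;
}
```

In Node.js the two arguments would typically come from `process.platform` and `process.arch`; an unsupported pair such as Linux on ARM64 yields `undefined`, which a caller could turn into a prompt to use `mode: 'path'`.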
134
+ ## License
135
+
136
+ Apache-2.0
package/bin/darwin-arm64/.version ADDED
@@ -0,0 +1,2 @@
1
+ cua-driver-v0.2.0
2
+ swift
package/bin/darwin-arm64/CuaDriver.app/Contents/Info.plist ADDED
@@ -0,0 +1,32 @@
1
+ <?xml version="1.0" encoding="UTF-8"?>
2
+ <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
3
+ <plist version="1.0">
4
+ <dict>
5
+ <key>CFBundleIdentifier</key>
6
+ <string>com.trycua.driver</string>
7
+ <key>CFBundleName</key>
8
+ <string>Cua Driver</string>
9
+ <key>CFBundleDisplayName</key>
10
+ <string>Cua Driver</string>
11
+ <key>CFBundleExecutable</key>
12
+ <string>cua-driver</string>
13
+ <key>CFBundleIconFile</key>
14
+ <string>AppIcon</string>
15
+ <key>CFBundleIconName</key>
16
+ <string>AppIcon</string>
17
+ <key>CFBundlePackageType</key>
18
+ <string>APPL</string>
19
+ <key>CFBundleShortVersionString</key>
20
+ <string>0.2.0</string>
21
+ <key>CFBundleVersion</key>
22
+ <string>1</string>
23
+ <key>LSMinimumSystemVersion</key>
24
+ <string>14.0</string>
25
+ <key>LSUIElement</key>
26
+ <true/>
27
+ <key>NSHighResolutionCapable</key>
28
+ <true/>
29
+ <key>NSSupportsAutomaticTermination</key>
30
+ <true/>
31
+ </dict>
32
+ </plist>
package/bin/darwin-arm64/CuaDriver.app/Contents/Resources/Skills/cua-driver/README.md ADDED
@@ -0,0 +1,140 @@
1
+ # cua-driver — Claude Code skill
2
+
3
+ A [Claude Code](https://code.claude.com) skill that teaches Claude to
4
+ drive native macOS apps via the
5
+ [`cua-driver`](https://github.com/trycua/cua/tree/main/libs/cua-driver)
6
+ CLI — snapshot an app's accessibility tree, click/type/scroll by
7
+ `element_index`, and verify via re-snapshot. Backgrounded-first: no
8
+ focus steal, no cursor warp, no Space follow.
9
+
10
+ ## What the skill covers
11
+
12
+ - The snapshot-before-AND-after invariant that keeps the agent honest
13
+ about whether an action actually landed.
14
+ - The backgrounded-click recipe (yabai focus-without-raise + stamped
15
+ SLEventPostToPid) that lets synthetic clicks land on Chrome web
16
+ content without raising the window or pulling the user across Spaces.
17
+ - Web-app quirks (`WEB_APPS.md`) — Chromium/WebKit/Electron/Tauri,
18
+ including the minimized-Chrome keyboard-commit caveat and the
19
+ `set_value` workaround.
20
+ - Trajectory recording (`RECORDING.md`) — optional per-session
21
+ recording + replay for demos and regressions.
22
+ - Canvas/viewport apps (Blender, Unity, GHOST, Qt, wxWidgets) —
23
+ HID-tap fallback when AX is empty.
24
+
25
+ See `SKILL.md` for the main body.
26
+
27
+ ## Prerequisites
28
+
29
+ 1. **macOS 14 or newer** — the driver depends on SkyLight private SPIs
30
+ that were stabilized in Sonoma.
31
+ 2. **`cua-driver` CLI + `CuaDriver.app`** — installable one-liner:
32
+ ```bash
33
+ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/cua-driver/scripts/install.sh)"
34
+ ```
35
+ Or from a clone of `trycua/cua`:
36
+ ```bash
37
+ cd libs/cua-driver
38
+ scripts/install-local.sh # builds + installs + symlinks for dev use
39
+ ```
40
+ The driver runs as an `.app` bundle because macOS TCC grants are
41
+ tied to a stable bundle id (`com.trycua.driver`). The CLI symlink
42
+ lets Claude invoke tools via plain shell.
43
+ 3. **TCC grants on `CuaDriver.app`** — **Accessibility** and
44
+ **Screen Recording** in System Settings → Privacy & Security.
45
+ Verify with:
46
+ ```bash
47
+ cua-driver check_permissions
48
+ ```
49
+ Both fields must be `true`. If not, the app appears in the
50
+ relevant panes of System Settings after first use; toggle it on
51
+ there.
52
+
53
+ ## Install
54
+
55
+ The skill is a single drop-in directory that can be installed at either of two scopes.
56
+
57
+ **Personal scope** (all Claude Code sessions on your machine):
58
+
59
+ ```bash
60
+ mkdir -p ~/.claude/skills
61
+ cp -R Skills/cua-driver ~/.claude/skills/
62
+ ```
63
+
64
+ Or symlink if you want edits-in-place:
65
+
66
+ ```bash
67
+ ln -s "$PWD/Skills/cua-driver" ~/.claude/skills/cua-driver
68
+ ```
69
+
70
+ **Project scope** (committed alongside a specific repo):
71
+
72
+ ```bash
73
+ mkdir -p .claude/skills
74
+ cp -R /path/to/cua/libs/cua-driver/Skills/cua-driver .claude/skills/
75
+ ```
76
+
77
+ ## Invoking the skill
78
+
79
+ Claude Code auto-invokes the skill when you ask for macOS GUI
80
+ automation — e.g. "open the Downloads folder in Finder", "click the
81
+ Save button in Numbers", "navigate to trycua.com in Chrome". You can
82
+ also invoke it explicitly:
83
+
84
+ ```
85
+ /cua-driver
86
+ ```
87
+
88
+ ## Claude Code MCP compatibility mode
89
+
90
+ For normal skill-driven use, prefer the CLI or the standard MCP server. If you want Claude Code's vision/computer-use-style flow to ground on CuaDriver screenshots, register the compatibility server:
91
+
92
+ ```bash
93
+ claude mcp add --transport stdio cua-computer-use -- cua-driver mcp --claude-code-computer-use-compat
94
+ ```
95
+
96
+ This mode exposes the normal CuaDriver tools and changes only `screenshot`. The compatibility screenshot requires `pid` and `window_id`, captures that window only, and establishes a window-local pixel coordinate frame. It does not call Anthropic APIs or expose Anthropic's native computer-use API tool.
97
+
98
+ Use MCP for this Claude Code vision/computer-use-style path. CLI screenshots still work as CuaDriver calls, but they do not expose the `mcp__cua-computer-use__screenshot` tool name that Claude Code appears to use as the image-grounding cue.
99
+
100
+ ## Files
101
+
102
+ - `SKILL.md` — the main skill body (~900 lines). Loaded on first
103
+ invocation; stays in context for the session.
104
+ - `WEB_APPS.md` — browsers, Electron, Tauri (Chromium + WebKit). Loaded
105
+ on demand when SKILL.md's pointer is followed.
106
+ - `RECORDING.md` — trajectory recording / replay. Loaded on demand.
107
+ - `TESTS.md` — manual test scripts for end-to-end skill verification.
108
+
109
+ ## Troubleshooting
110
+
111
+ - `cua-driver: command not found` → re-run the installer or add
112
+ `.build/CuaDriver.app/Contents/MacOS/` to `$PATH`.
113
+ - `No cached AX state for pid X window_id W` → element_index was
114
+ reused across turns, or across different windows of the same app.
115
+ Call `get_window_state({pid, window_id})` first in the same turn,
116
+ with the same window_id you're about to act against.
117
+ - Empty `tree_markdown` → `capture_mode` is set to `vision`, which
118
+ skips the AX walk by design. Flip back to the default `som`
119
+ (`cua-driver config set capture_mode som`) to get the tree.
120
+ Tiny screenshot → likely a stale window capture. See "Behavior
121
+ matrix" in SKILL.md for the full mode table.
122
+ - System-alert beep when pressing Return on a minimized Chrome
123
+ omnibox → the keyboard-commit-on-minimized limitation. Use
124
+ `set_value` on the field instead, or AX-click a Go/Submit button.
125
+ See `WEB_APPS.md`.
126
+
127
+ ## Updates
128
+
129
+ The skill evolves alongside the driver. To update:
130
+
131
+ ```bash
132
+ cd /path/to/cua && git pull
133
+ # if you copied: re-copy
134
+ cp -R libs/cua-driver/Skills/cua-driver ~/.claude/skills/
135
+ # if you symlinked: nothing needed
136
+ ```
137
+
138
+ ## License
139
+
140
+ MIT. Same license as the parent `trycua/cua` repo.
package/bin/darwin-arm64/CuaDriver.app/Contents/Resources/Skills/cua-driver/RECORDING.md ADDED
@@ -0,0 +1,113 @@
1
+ # Recording & replaying trajectories
2
+
3
+ Session-scoped capture of action sequences + pre/post state, suitable
4
+ for demos, regression diffs, and training data. Invoked only when the
5
+ user explicitly asks to record — the skill does not auto-enable this.
6
+
7
+ `set_recording` turns on a session-scoped trajectory recorder. While
8
+ enabled, every action-tool call (`click`, `right_click`, `scroll`,
9
+ `type_text`, `press_key`, `hotkey`, `set_value`)
10
+ writes a numbered turn folder under a caller-chosen output
11
+ directory. Non-action tools (`get_window_state`, `list_windows`,
12
+ `screenshot`, `list_apps`, permission probes, agent-cursor getters /
13
+ setters, and `set_recording` itself) are not recorded.
14
+
15
+ ## Enable / disable
16
+
17
+ Two equivalent surfaces: the `set_recording` MCP tool, or the
18
+ friendlier `cua-driver recording` subcommand group (wraps
19
+ `set_recording` + `get_recording_state` with human-readable output).
20
+
21
+ ```
22
+ cua-driver recording start ~/cua-trajectories/run-1
23
+ # … run the workflow …
24
+ cua-driver recording status # -> enabled / disabled, next_turn, output_dir
25
+ cua-driver recording stop # -> "Recording disabled (N turns captured in …)"
26
+ ```
27
+
28
+ Raw-tool equivalent:
29
+
30
+ ```
31
+ cua-driver set_recording '{"enabled":true,"output_dir":"~/cua-trajectories/run-1"}'
32
+ cua-driver get_recording_state
33
+ cua-driver set_recording '{"enabled":false}'
34
+ ```
35
+
36
+ The `recording` subcommands require a running daemon (`cua-driver
37
+ serve &`) because recording state is per-process. `output_dir` expands
38
+ `~` and is created (with intermediates) if missing. Turn numbering
39
+ starts at `1` every time recording is (re-)enabled, regardless of any
40
+ existing contents in the directory. State lives in memory only — a
41
+ daemon restart resets to disabled.
42
+
43
+ ## What each turn folder contains
44
+
45
+ Each action writes to `turn-NNNNN/` (five-digit zero-padded counter):
46
+
47
+ - `app_state.json` — post-action AX snapshot for the target pid, same
48
+ shape `get_window_state` returns (tree_markdown, element_count,
49
+ turn_id, etc.) minus the screenshot fields. The recorder resolves a
50
+ frontmost window internally (visible + on-current-Space preferred,
51
+ max-area fallback) because, although individual action tools carry a
52
+ window_id, the recorder itself has no caller-supplied anchor.
53
+ - `screenshot.png` — post-action capture of the same window the
54
+ recorder just snapshotted. Omitted when the pid has no visible
55
+ window.
56
+ - `action.json` — the tool name, full input arguments, result
57
+ summary, pid, click point (when applicable), ISO-8601 timestamp.
58
+ - `click.png` — only for click-family actions (`click`,
59
+ `right_click`): a copy of `screenshot.png` with a red dot drawn at
60
+ the click point (screen-absolute point → window-local pixels via
61
+ the screenshot's `scale_factor`). Absent for other tools and for
62
+ clicks whose point falls outside the captured window.
63
+
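Editor's sketch: the `turn-NNNNN` naming above uses a fixed-width counter, so a plain string sort agrees with numeric order. The helpers below illustrate that property; `turnFolderName` and `sortTurnFolders` are hypothetical names, not part of `cua-driver`.

```typescript
// Format a 1-based turn counter as a five-digit, zero-padded folder name.
function turnFolderName(turn: number): string {
  if (!Number.isInteger(turn) || turn < 1) {
    throw new RangeError("turn must be a positive integer");
  }
  return `turn-${String(turn).padStart(5, "0")}`;
}

// Default string sort; correct for turn folders because the width is fixed.
function sortTurnFolders(names: string[]): string[] {
  return [...names].sort();
}
```

For example, `turnFolderName(1)` yields `turn-00001`, and sorting `turn-00010` together with `turn-00002` places `turn-00002` first.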
64
+ ## When to use it
65
+
66
+ - Demos and screen recordings — play the turn folder back to show
67
+ exactly what the agent saw and what it did.
68
+ - Replay for regression — re-run the same sequence against a future
69
+ build and diff the new trajectory against the saved one.
70
+ - Training data collection — each turn is a
71
+ `(state, action, next_state)` triple ready for offline learning.
72
+
73
+ ## When to invoke it
74
+
75
+ This skill does **not** auto-enable recording. The client invokes
76
+ `set_recording` explicitly when the user asks to capture a session.
77
+ If the user says "record this session" or similar, call
78
+ `set_recording({enabled:true, output_dir:…})` before the first
79
+ action, and `set_recording({enabled:false})` when done.
80
+
81
+ ## Replaying a recorded trajectory
82
+
83
+ `replay_trajectory({dir})` walks `<dir>/turn-NNNNN/` folders in
84
+ lexical order, reads each `action.json`, and re-invokes the recorded
85
+ tool with its recorded `arguments`. Optional knobs: `delay_ms`
86
+ (pacing between turns, default 500) and `stop_on_error` (halt on
87
+ first failure, default true).
88
+
89
+ ```
90
+ cua-driver recording start ~/cua-trajectories/demo1
91
+ # … run the workflow …
92
+ cua-driver recording stop
93
+ # Later: replay against a new build.
94
+ cua-driver replay_trajectory '{"dir":"~/cua-trajectories/demo1","delay_ms":500}'
95
+ ```
96
+
97
+ Important caveat: **element_index doesn't survive across sessions**.
98
+ Indices are assigned fresh on every `get_window_state` snapshot,
99
+ keyed on `(pid, window_id)`, so a recorded
100
+ `click({pid, window_id, element_index: 14})` from yesterday won't
101
+ resolve today — the pid is usually different, the window_id always
102
+ is. The call returns `Invalid element_index` or `No cached AX
103
+ state`. Pixel clicks (`click({pid, x, y})`) and keyboard tools
104
+ (`press_key`, `hotkey`, `type_text` without element_index) replay cleanly; element-indexed actions require a
105
+ live snapshot that replay doesn't currently re-emit (read-only tools
106
+ like `get_window_state` aren't recorded). For a reliable replay, either
107
+ compose the trajectory from pixel + keyboard primitives, or capture
108
+ it as a regression artifact (compare the failure/success pattern
109
+ across builds) rather than a re-driving script.
110
+
111
+ If recording is still enabled while replay runs, the replay is
112
+ itself recorded into the current output directory — that's the
113
+ intended regression-diff workflow.
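Editor's sketch: the `element_index` caveat above suggests checking a trajectory before relying on replay. The function below is a rough pre-flight check under stated assumptions. The `tool` field name in `RecordedAction` is hypothetical (only `arguments` is named in the text), and tools the caveat does not mention, such as `scroll` and `set_value`, are conservatively treated as not replayable.

```typescript
// Assumed subset of action.json; `tool` is a hypothetical field name.
interface RecordedAction {
  tool: string;
  arguments: Record<string, unknown>;
}

const KEYBOARD_TOOLS: ReadonlySet<string> = new Set(["press_key", "hotkey", "type_text"]);
const CLICK_TOOLS: ReadonlySet<string> = new Set(["click", "right_click"]);

// True when the action is a pixel click or a keyboard call without element_index.
function replaysCleanly(action: RecordedAction): boolean {
  const args = action.arguments;
  if ("element_index" in args) return false; // indices do not survive sessions
  if (KEYBOARD_TOOLS.has(action.tool)) return true;
  if (CLICK_TOOLS.has(action.tool)) {
    return typeof args.x === "number" && typeof args.y === "number";
  }
  return false; // not covered by the caveat; treated conservatively here
}
```

Running this over every `action.json` in a recorded directory would show up front which turns are likely to fail with `Invalid element_index` or `No cached AX state`.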