@amaster.ai/pi-computer-use 0.1.1-beta.0 → 0.1.1

Files changed (74)
  1. package/README.md +136 -0
  2. package/bin/darwin-arm64/.version +2 -0
  3. package/bin/darwin-arm64/CuaDriver.app/Contents/CodeResources +0 -0
  4. package/bin/darwin-arm64/CuaDriver.app/Contents/Info.plist +32 -0
  5. package/bin/darwin-arm64/CuaDriver.app/Contents/MacOS/cua-driver +0 -0
  6. package/bin/darwin-arm64/CuaDriver.app/Contents/Resources/Skills/cua-driver/README.md +140 -0
  7. package/bin/darwin-arm64/CuaDriver.app/Contents/Resources/Skills/cua-driver/RECORDING.md +113 -0
  8. package/bin/darwin-arm64/CuaDriver.app/Contents/Resources/Skills/cua-driver/SKILL.md +887 -0
  9. package/bin/darwin-arm64/CuaDriver.app/Contents/Resources/Skills/cua-driver/TESTS.md +232 -0
  10. package/bin/darwin-arm64/CuaDriver.app/Contents/Resources/Skills/cua-driver/WEB_APPS.md +471 -0
  11. package/bin/darwin-arm64/CuaDriver.app/Contents/_CodeSignature/CodeResources +172 -0
  12. package/bin/darwin-x64/.version +2 -0
  13. package/bin/darwin-x64/CuaDriver.app/Contents/CodeResources +0 -0
  14. package/bin/darwin-x64/CuaDriver.app/Contents/Info.plist +32 -0
  15. package/bin/darwin-x64/CuaDriver.app/Contents/MacOS/cua-driver +0 -0
  16. package/bin/darwin-x64/CuaDriver.app/Contents/Resources/Skills/cua-driver/README.md +140 -0
  17. package/bin/darwin-x64/CuaDriver.app/Contents/Resources/Skills/cua-driver/RECORDING.md +113 -0
  18. package/bin/darwin-x64/CuaDriver.app/Contents/Resources/Skills/cua-driver/SKILL.md +887 -0
  19. package/bin/darwin-x64/CuaDriver.app/Contents/Resources/Skills/cua-driver/TESTS.md +232 -0
  20. package/bin/darwin-x64/CuaDriver.app/Contents/Resources/Skills/cua-driver/WEB_APPS.md +471 -0
  21. package/bin/darwin-x64/CuaDriver.app/Contents/_CodeSignature/CodeResources +172 -0
  22. package/bin/linux-x64/.version +2 -0
  23. package/bin/linux-x64/cua-driver +0 -0
  24. package/bin/win32-arm64/.version +2 -0
  25. package/bin/win32-arm64/cua-driver-uia.exe +0 -0
  26. package/bin/win32-arm64/cua-driver.exe +0 -0
  27. package/bin/win32-x64/.version +2 -0
  28. package/bin/win32-x64/cua-driver-uia.exe +0 -0
  29. package/bin/win32-x64/cua-driver.exe +0 -0
  30. package/dist/config.d.ts +7 -20
  31. package/dist/config.d.ts.map +1 -1
  32. package/dist/config.js +8 -18
  33. package/dist/config.js.map +1 -1
  34. package/dist/index.d.ts.map +1 -1
  35. package/dist/index.js +553 -72
  36. package/dist/index.js.map +1 -1
  37. package/dist/mcp-client.d.ts +22 -0
  38. package/dist/mcp-client.d.ts.map +1 -0
  39. package/dist/mcp-client.js +91 -0
  40. package/dist/mcp-client.js.map +1 -0
  41. package/dist/vision.d.ts.map +1 -1
  42. package/dist/vision.js +19 -0
  43. package/dist/vision.js.map +1 -1
  44. package/package.json +25 -5
  45. package/preview.png +0 -0
  46. package/scripts/postinstall.js +29 -0
  47. package/dist/__tests__/computer-client.test.d.ts +0 -2
  48. package/dist/__tests__/computer-client.test.d.ts.map +0 -1
  49. package/dist/__tests__/computer-client.test.js +0 -174
  50. package/dist/__tests__/computer-client.test.js.map +0 -1
  51. package/dist/__tests__/index.test.d.ts +0 -2
  52. package/dist/__tests__/index.test.d.ts.map +0 -1
  53. package/dist/__tests__/index.test.js +0 -385
  54. package/dist/__tests__/index.test.js.map +0 -1
  55. package/dist/__tests__/server-process.test.d.ts +0 -2
  56. package/dist/__tests__/server-process.test.d.ts.map +0 -1
  57. package/dist/__tests__/server-process.test.js +0 -127
  58. package/dist/__tests__/server-process.test.js.map +0 -1
  59. package/dist/__tests__/vision.test.d.ts +0 -2
  60. package/dist/__tests__/vision.test.d.ts.map +0 -1
  61. package/dist/__tests__/vision.test.js +0 -36
  62. package/dist/__tests__/vision.test.js.map +0 -1
  63. package/dist/actions.d.ts +0 -15
  64. package/dist/actions.d.ts.map +0 -1
  65. package/dist/actions.js +0 -45
  66. package/dist/actions.js.map +0 -1
  67. package/dist/computer-client.d.ts +0 -13
  68. package/dist/computer-client.d.ts.map +0 -1
  69. package/dist/computer-client.js +0 -109
  70. package/dist/computer-client.js.map +0 -1
  71. package/dist/server-process.d.ts +0 -9
  72. package/dist/server-process.d.ts.map +0 -1
  73. package/dist/server-process.js +0 -76
  74. package/dist/server-process.js.map +0 -1
package/README.md ADDED
@@ -0,0 +1,136 @@
1
+ # @amaster.ai/pi-computer-use
2
+
3
+ ![pi-computer-use preview](https://raw.githubusercontent.com/TGYD-helige/pi/master/packages/pi-computer-use/preview.png)
4
+
5
+ pi-coding-agent extension that wraps [cua-driver-rs](https://github.com/trycua/cua/), exposing desktop automation tools with a `computer_use_` prefix.
6
+
7
+ ## Features
8
+
9
+ - **Zero external dependencies** — pre-compiled cua-driver-rs binaries bundled for all platforms
10
+ - **MCP stdio communication** — spawns `cua-driver mcp` via `StdioClientTransport`, JSON-RPC over stdio
11
+ - **Dynamic tool discovery** — auto-discovers upstream MCP tools and registers with `computer_use_` prefix; falls back to a built-in tool list when cua-driver fails to start
12
+ - **Smart tool filtering** — excludes non-essential tools (agent cursor, recording, config, raw screenshot), exposes 17 action tools + 1 vision tool
13
+ - **Optional visual analysis** — `computer_use_analyze_screenshot` via configurable vision model
14
+ - **Cross-platform permission handling** — detects platform-specific permission issues (macOS TCC, Windows UAC, Linux display server access) and returns actionable guidance
15
+ - **Graceful degradation** — tools are always registered even when cua-driver cannot connect; lazy reconnect is attempted on each tool call
16
+
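The discovery, prefixing, and filtering behavior in the feature list above can be sketched as follows. This is a minimal illustration, not the extension's actual code: `EXCLUDED` holds only the upstream names this diff mentions as filtered (the full 16-entry list is not enumerated here), `FALLBACK_TOOLS` is a hypothetical subset, and the function names are invented. The request shape is the standard MCP `tools/call` JSON-RPC 2.0 message.

```python
import json

PREFIX = "computer_use_"

# Upstream names this diff mentions as filtered out (raw screenshot, recording,
# replay). The real exclusion list has 16 entries that are not enumerated here.
EXCLUDED = {"screenshot", "set_recording", "get_recording_state", "replay_trajectory"}

# Hypothetical subset of the built-in fallback list.
FALLBACK_TOOLS = ["click", "type_text", "press_key"]


def register_names(upstream_tools):
    """Return prefixed names for discovered tools, or for the fallback list.

    Pass None to model the case where cua-driver failed to start.
    """
    source = upstream_tools if upstream_tools is not None else FALLBACK_TOOLS
    return [PREFIX + name for name in source if name not in EXCLUDED]


def to_mcp_request(request_id, prefixed_name, arguments):
    """Strip the prefix and build an MCP `tools/call` JSON-RPC 2.0 request."""
    if not prefixed_name.startswith(PREFIX):
        raise ValueError(f"not a {PREFIX} tool: {prefixed_name}")
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": prefixed_name[len(PREFIX):], "arguments": arguments},
    })
```

Keeping the prefix handling in one place means the upstream driver never sees the `computer_use_` names, while the agent never sees the unprefixed ones.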
17
+ ## Install
18
+
19
+ ```bash
20
+ bun add @amaster.ai/pi-computer-use
21
+ ```
22
+
23
+ Requires Node.js >= 20 and `@earendil-works/pi-coding-agent >= 0.74.0`.
24
+
25
+ ## Usage
26
+
27
+ Install the package and pi-coding-agent will automatically discover and load the extension. All tools are registered on `session_start`.
28
+
29
+ Configure via `.pi/settings.json` (project-level) or `~/.pi/agent/settings.json` (user-level) under the `"pi-computer-use"` key:
30
+
31
+ ```json
32
+ {
33
+ "pi-computer-use": {
34
+ "mode": "bundled"
35
+ }
36
+ }
37
+ ```
38
+
39
+ ## Configuration
40
+
41
+ | Option | Type | Default | Description |
42
+ |--------|------|---------|-------------|
43
+ | `mode` | `'bundled' \| 'path'` | `'bundled'` | Binary resolution strategy |
44
+ | `binaryPath` | `string` | — | Custom cua-driver binary path (requires `mode: 'path'`) |
45
+ | `extraArgs` | `string[]` | — | Extra CLI arguments passed to cua-driver |
46
+ | `visionModel` | `VisionModelConfig` | — | Enable visual screenshot analysis |
47
+
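As a rough illustration of how the options in the table fit together, the sketch below applies the documented default (`mode: 'bundled'`) and checks that `binaryPath` and `mode` agree. The function names, the empty-list default for `extraArgs`, and the choice to raise on inconsistent options are assumptions; the extension may handle these cases differently.

```python
import json

SECTION = "pi-computer-use"  # settings key named in the Usage section


def normalize_config(raw):
    """Apply the table's defaults and check that the options agree."""
    # extraArgs defaulting to [] is an assumption; the table lists no default.
    config = {"mode": "bundled", "binaryPath": None, "extraArgs": [], "visionModel": None}
    config.update(raw or {})
    if config["mode"] not in ("bundled", "path"):
        raise ValueError(f"unknown mode: {config['mode']!r}")
    if config["mode"] == "path" and not config["binaryPath"]:
        raise ValueError("mode 'path' needs binaryPath")
    if config["mode"] == "bundled" and config["binaryPath"]:
        raise ValueError("binaryPath requires mode 'path'")
    return config


def load_config(settings_text):
    """Read the extension's section out of a settings.json document."""
    return normalize_config(json.loads(settings_text).get(SECTION))
```

For example, `load_config('{"pi-computer-use": {"mode": "bundled"}}')` yields a config with `mode` set to `"bundled"` and the remaining options at their defaults.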
48
+ ### Vision Model (Optional)
49
+
50
+ Enable `computer_use_analyze_screenshot` by referencing a model already configured in Pi's model registry (`models.json`):
51
+
52
+ ```json
53
+ {
54
+ "pi-computer-use": {
55
+ "visionModel": {
56
+ "provider": "openai",
57
+ "model": "gpt-4o"
58
+ }
59
+ }
60
+ }
61
+ ```
62
+
63
+ The extension resolves API key, base URL, and headers from the model registry automatically — no need to duplicate credentials here.
64
+
65
+ ## Exposed Tools (17 + 1 vision)
66
+
67
+ ### Input
68
+
69
+ | Tool | Description |
70
+ |------|-------------|
71
+ | `computer_use_click` | Left-click via element_index or x/y coordinates |
72
+ | `computer_use_double_click` | Double-click at x/y or on an AX element |
73
+ | `computer_use_right_click` | Right-click (context menu) |
74
+ | `computer_use_type_text` | Insert text via AX or CGEvent fallback |
75
+ | `computer_use_press_key` | Press and release a single key |
76
+ | `computer_use_hotkey` | Press a key combination (e.g. Cmd+C) |
77
+ | `computer_use_scroll` | Scroll by line or page in a direction |
78
+ | `computer_use_drag` | Press-drag-release gesture between two points |
79
+ | `computer_use_set_value` | Set value on UI elements (popups, sliders, steppers) |
80
+
81
+ ### Query
82
+
83
+ | Tool | Description |
84
+ |------|-------------|
85
+ | `computer_use_get_screen_size` | Get display dimensions and scale factor |
86
+ | `computer_use_get_cursor_position` | Get current mouse cursor position |
87
+ | `computer_use_get_accessibility_tree` | Lightweight desktop snapshot (apps, windows, bounds) |
88
+ | `computer_use_get_window_state` | Full AX tree of a window with actionable element indices |
89
+ | `computer_use_list_windows` | List all top-level windows with bounds and z-order |
90
+ | `computer_use_list_apps` | List running and installed apps with state flags |
91
+
92
+ ### App Lifecycle
93
+
94
+ | Tool | Description |
95
+ |------|-------------|
96
+ | `computer_use_launch_app` | Launch an app in the background without focus steal |
97
+ | `computer_use_kill_app` | Force-terminate a process by pid |
98
+
99
+ ### Vision (requires `visionModel` config)
100
+
101
+ | Tool | Description |
102
+ |------|-------------|
103
+ | `computer_use_analyze_screenshot` | Take a screenshot and analyze it with a vision model |
104
+
105
+ ## Excluded Tools (16)
106
+
107
+ Agent cursor styling, recording/replay, config management, zoom, raw screenshot (use `computer_use_analyze_screenshot` instead), and browser-specific operations are filtered out.
108
+
109
+ ## Permissions
110
+
111
+ On `session_start`, the extension checks permissions via cua-driver's `check_permissions` tool. Platform-specific guidance is provided:
112
+
113
+ | Platform | Accessibility | Screen Capture |
114
+ |----------|--------------|----------------|
115
+ | macOS | System Settings → Privacy & Security → Accessibility | System Settings → Privacy & Security → Screen & System Audio Recording |
116
+ | Windows | Run as Administrator / UI Automation access | Check DRM or security policy |
117
+ | Linux | AT-SPI accessibility service | PipeWire portal or X11 access |
118
+
119
+ When cua-driver fails to connect (missing permissions, binary not found, etc.):
120
+ 1. User is notified with a platform-appropriate warning
121
+ 2. Tools are still registered using a built-in fallback schema
122
+ 3. On each tool call, lazy reconnect is attempted; if it still fails, a friendly error with permission instructions is returned
123
+
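The three-step degradation path above amounts to a client that retries the connection on every tool call. A minimal sketch, assuming a hypothetical `connect` callable that either returns a session or raises `OSError`, and a `guidance` string holding the platform-specific instructions (neither name comes from the package):

```python
class LazyDriverClient:
    """Retries the cua-driver connection on each tool call until it succeeds."""

    def __init__(self, connect, guidance):
        self._connect = connect    # callable: returns a session or raises OSError
        self._guidance = guidance  # platform-specific permission instructions
        self._session = None       # stays None while the driver is unreachable

    def call_tool(self, name, arguments):
        if self._session is None:
            try:
                self._session = self._connect()
            except OSError as exc:
                # Still failing: return a friendly error instead of raising.
                return {"ok": False, "error": f"cua-driver unavailable ({exc}). {self._guidance}"}
        # Here a session is simply a callable taking (name, arguments).
        return {"ok": True, "result": self._session(name, arguments)}
```

Because the session is only cached after a successful connect, granting the missing permission between two calls is enough for the second call to go through without restarting the agent.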
124
+ ## Supported Platforms
125
+
126
+ | Platform | Binary |
127
+ |----------|--------|
128
+ | macOS ARM64 | `bin/darwin-arm64/cua-driver` |
129
+ | macOS x64 | `bin/darwin-x64/cua-driver` |
130
+ | Linux x64 | `bin/linux-x64/cua-driver` |
131
+ | Windows x64 | `bin/win32-x64/cua-driver.exe` |
132
+ | Windows ARM64 | `bin/win32-arm64/cua-driver.exe` |
133
+
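The platform table above maps directly onto a lookup keyed by platform and architecture. The sketch below copies the paths as the table states them and uses Node-style identifiers (`darwin`, `linux`, `win32`; `arm64`, `x64`) as keys; the function name is hypothetical. Note that the file list in this diff places the macOS executables under `CuaDriver.app/Contents/MacOS/`, so the two macOS entries may not match the shipped layout exactly.

```python
# Paths copied from the Supported Platforms table, relative to the package root.
BUNDLED_BINARIES = {
    ("darwin", "arm64"): "bin/darwin-arm64/cua-driver",
    ("darwin", "x64"): "bin/darwin-x64/cua-driver",
    ("linux", "x64"): "bin/linux-x64/cua-driver",
    ("win32", "x64"): "bin/win32-x64/cua-driver.exe",
    ("win32", "arm64"): "bin/win32-arm64/cua-driver.exe",
}


def bundled_binary(platform, arch):
    """Return the bundled binary path for a platform/arch pair, or raise."""
    try:
        return BUNDLED_BINARIES[(platform, arch)]
    except KeyError:
        raise RuntimeError(f"no bundled cua-driver for {platform}-{arch}") from None
```

A pair that is absent from the table, such as Linux ARM64, raises rather than falling back to a different architecture.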
134
+ ## License
135
+
136
+ Apache-2.0
package/bin/darwin-arm64/.version ADDED
@@ -0,0 +1,2 @@
1
+ cua-driver-v0.2.0
2
+ swift
package/bin/darwin-arm64/CuaDriver.app/Contents/Info.plist ADDED
@@ -0,0 +1,32 @@
1
+ <?xml version="1.0" encoding="UTF-8"?>
2
+ <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
3
+ <plist version="1.0">
4
+ <dict>
5
+ <key>CFBundleIdentifier</key>
6
+ <string>com.trycua.driver</string>
7
+ <key>CFBundleName</key>
8
+ <string>Cua Driver</string>
9
+ <key>CFBundleDisplayName</key>
10
+ <string>Cua Driver</string>
11
+ <key>CFBundleExecutable</key>
12
+ <string>cua-driver</string>
13
+ <key>CFBundleIconFile</key>
14
+ <string>AppIcon</string>
15
+ <key>CFBundleIconName</key>
16
+ <string>AppIcon</string>
17
+ <key>CFBundlePackageType</key>
18
+ <string>APPL</string>
19
+ <key>CFBundleShortVersionString</key>
20
+ <string>0.2.0</string>
21
+ <key>CFBundleVersion</key>
22
+ <string>1</string>
23
+ <key>LSMinimumSystemVersion</key>
24
+ <string>14.0</string>
25
+ <key>LSUIElement</key>
26
+ <true/>
27
+ <key>NSHighResolutionCapable</key>
28
+ <true/>
29
+ <key>NSSupportsAutomaticTermination</key>
30
+ <true/>
31
+ </dict>
32
+ </plist>
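For anyone auditing the bundled app, the keys in this property list can be read with Python's standard `plistlib`. The sketch below is only an inspection aid under stated assumptions: `SAMPLE` is an abbreviated copy of the plist above, and `bundle_summary` and its output field names are invented for illustration.

```python
import plistlib

# Abbreviated copy of the Info.plist shown above.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
  <key>CFBundleIdentifier</key><string>com.trycua.driver</string>
  <key>CFBundleShortVersionString</key><string>0.2.0</string>
  <key>LSMinimumSystemVersion</key><string>14.0</string>
  <key>LSUIElement</key><true/>
</dict>
</plist>"""


def bundle_summary(plist_bytes):
    """Extract the fields most relevant to installation and permissions."""
    info = plistlib.loads(plist_bytes)
    return {
        "bundle_id": info["CFBundleIdentifier"],
        "version": info["CFBundleShortVersionString"],
        "min_macos": info["LSMinimumSystemVersion"],
        # LSUIElement marks an agent app that runs without a Dock icon.
        "agent_app": info.get("LSUIElement", False),
    }
```

On this sample the summary reports bundle id `com.trycua.driver`, version `0.2.0`, and a minimum macOS version of `14.0`.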
package/bin/darwin-arm64/CuaDriver.app/Contents/Resources/Skills/cua-driver/README.md ADDED
@@ -0,0 +1,140 @@
1
+ # cua-driver — Claude Code skill
2
+
3
+ A [Claude Code](https://code.claude.com) skill that teaches Claude to
4
+ drive native macOS apps via the
5
+ [`cua-driver`](https://github.com/trycua/cua/tree/main/libs/cua-driver)
6
+ CLI — snapshot an app's accessibility tree, click/type/scroll by
7
+ `element_index`, and verify via re-snapshot. Backgrounded-first: no
8
+ focus steal, no cursor warp, no Space follow.
9
+
10
+ ## What the skill covers
11
+
12
+ - The snapshot-before-AND-after invariant that keeps the agent honest
13
+ about whether an action actually landed.
14
+ - The backgrounded-click recipe (yabai focus-without-raise + stamped
15
+ SLEventPostToPid) that lets synthetic clicks land on Chrome web
16
+ content without raising the window or pulling the user across Spaces.
17
+ - Web-app quirks (`WEB_APPS.md`) — Chromium/WebKit/Electron/Tauri,
18
+ including the minimized-Chrome keyboard-commit caveat and the
19
+ `set_value` workaround.
20
+ - Trajectory recording (`RECORDING.md`) — optional per-session
21
+ recording + replay for demos and regressions.
22
+ - Canvas/viewport apps (Blender, Unity, GHOST, Qt, wxWidgets) —
23
+ HID-tap fallback when AX is empty.
24
+
25
+ See `SKILL.md` for the main body.
26
+
27
+ ## Prerequisites
28
+
29
+ 1. **macOS 14 or newer** — the driver depends on SkyLight private SPIs
30
+ that were stabilized in Sonoma.
31
+ 2. **`cua-driver` CLI + `CuaDriver.app`** — installable one-liner:
32
+ ```bash
33
+ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/cua-driver/scripts/install.sh)"
34
+ ```
35
+ Or from a clone of `trycua/cua`:
36
+ ```bash
37
+ cd libs/cua-driver
38
+ scripts/install-local.sh # builds + installs + symlinks for dev use
39
+ ```
40
+ The driver runs as an `.app` bundle because macOS TCC grants are
41
+ tied to a stable bundle id (`com.trycua.driver`). The CLI symlink
42
+ lets Claude invoke tools via plain shell.
43
+ 3. **TCC grants on `CuaDriver.app`** — **Accessibility** and
44
+ **Screen Recording** in System Settings → Privacy & Security.
45
+ Verify with:
46
+ ```bash
47
+ cua-driver check_permissions
48
+ ```
49
+ Both fields must be `true`. If not, the app appears in the
50
+ relevant panes of System Settings after first use; toggle it on
51
+ there.
52
+
53
+ ## Install
54
+
55
+ The skill is a single drop-in directory, installable at either of two scopes.
56
+
57
+ **Personal scope** (all Claude Code sessions on your machine):
58
+
59
+ ```bash
60
+ mkdir -p ~/.claude/skills
61
+ cp -R Skills/cua-driver ~/.claude/skills/
62
+ ```
63
+
64
+ Or symlink if you want edits-in-place:
65
+
66
+ ```bash
67
+ ln -s "$PWD/Skills/cua-driver" ~/.claude/skills/cua-driver
68
+ ```
69
+
70
+ **Project scope** (committed alongside a specific repo):
71
+
72
+ ```bash
73
+ mkdir -p .claude/skills
74
+ cp -R /path/to/cua/libs/cua-driver/Skills/cua-driver .claude/skills/
75
+ ```
76
+
77
+ ## Invoking the skill
78
+
79
+ Claude Code auto-invokes the skill when you ask for macOS GUI
80
+ automation — e.g. "open the Downloads folder in Finder", "click the
81
+ Save button in Numbers", "navigate to trycua.com in Chrome". You can
82
+ also invoke it explicitly:
83
+
84
+ ```
85
+ /cua-driver
86
+ ```
87
+
88
+ ## Claude Code MCP compatibility mode
89
+
90
+ For normal skill-driven use, prefer the CLI or the standard MCP server. If you want Claude Code's vision/computer-use-style flow to ground on CuaDriver screenshots, register the compatibility server:
91
+
92
+ ```bash
93
+ claude mcp add --transport stdio cua-computer-use -- cua-driver mcp --claude-code-computer-use-compat
94
+ ```
95
+
96
+ This mode exposes the normal CuaDriver tools and changes only `screenshot`. The compatibility screenshot requires `pid` and `window_id`, captures that window only, and establishes a window-local pixel coordinate frame. It does not call Anthropic APIs or expose Anthropic's native computer-use API tool.
97
+
98
+ Use MCP for this Claude Code vision/computer-use-style path. CLI screenshots still work as CuaDriver calls, but they do not expose the `mcp__cua-computer-use__screenshot` tool name that Claude Code appears to use as the image-grounding cue.
99
+
100
+ ## Files
101
+
102
+ - `SKILL.md` — the main skill body (~500 lines). Loaded on first
103
+ invocation; stays in context for the session.
104
+ - `WEB_APPS.md` — browsers, Electron, Tauri (Chromium + WebKit). Loaded
105
+ on demand when SKILL.md's pointer is followed.
106
+ - `RECORDING.md` — trajectory recording / replay. Loaded on demand.
107
+ - `TESTS.md` — manual test scripts for end-to-end skill verification.
108
+
109
+ ## Troubleshooting
110
+
111
+ - `cua-driver: command not found` → re-run the installer or add
112
+ `.build/CuaDriver.app/Contents/MacOS/` to `$PATH`.
113
+ - `No cached AX state for pid X window_id W` → element_index was
114
+ reused across turns, or across different windows of the same app.
115
+ Call `get_window_state({pid, window_id})` first in the same turn,
116
+ with the same window_id you're about to act against.
117
+ - Empty `tree_markdown` → `capture_mode` is set to `vision`, which
118
+ skips the AX walk by design. Flip back to the default `som`
119
+ (`cua-driver config set capture_mode som`) to get the tree.
120
+ Tiny screenshot → likely a stale window capture. See "Behavior
121
+ matrix" in SKILL.md for the full mode table.
122
+ - System-alert beep when pressing Return on a minimized Chrome
123
+ omnibox → the keyboard-commit-on-minimized limitation. Use
124
+ `set_value` on the field instead, or AX-click a Go/Submit button.
125
+ See `WEB_APPS.md`.
126
+
127
+ ## Updates
128
+
129
+ The skill evolves alongside the driver. To update:
130
+
131
+ ```bash
132
+ cd /path/to/cua && git pull
133
+ # if you copied: re-copy
134
+ cp -R libs/cua-driver/Skills/cua-driver ~/.claude/skills/
135
+ # if you symlinked: nothing needed
136
+ ```
137
+
138
+ ## License
139
+
140
+ MIT. Same license as the parent `trycua/cua` repo.
package/bin/darwin-arm64/CuaDriver.app/Contents/Resources/Skills/cua-driver/RECORDING.md ADDED
@@ -0,0 +1,113 @@
1
+ # Recording & replaying trajectories
2
+
3
+ Session-scoped capture of action sequences + pre/post state, suitable
4
+ for demos, regression diffs, and training data. Invoked only when the
5
+ user explicitly asks to record — the skill does not auto-enable this.
6
+
7
+ `set_recording` turns on a session-scoped trajectory recorder. While
8
+ enabled, every action-tool call (`click`, `right_click`, `scroll`,
9
+ `type_text`, `press_key`, `hotkey`, `set_value`)
10
+ writes a numbered turn folder under a caller-chosen output
11
+ directory. Read-only tools (`get_window_state`, `list_windows`,
12
+ `screenshot`, `list_apps`, permission probes, agent-cursor getters /
13
+ setters, and `set_recording` itself) are not recorded.
14
+
15
+ ## Enable / disable
16
+
17
+ Two equivalent surfaces: the `set_recording` MCP tool, or the
18
+ friendlier `cua-driver recording` subcommand group (wraps
19
+ `set_recording` + `get_recording_state` with human-readable output).
20
+
21
+ ```
22
+ cua-driver recording start ~/cua-trajectories/run-1
23
+ # … run the workflow …
24
+ cua-driver recording status # -> enabled / disabled, next_turn, output_dir
25
+ cua-driver recording stop # -> "Recording disabled (N turns captured in …)"
26
+ ```
27
+
28
+ Raw-tool equivalent:
29
+
30
+ ```
31
+ cua-driver set_recording '{"enabled":true,"output_dir":"~/cua-trajectories/run-1"}'
32
+ cua-driver get_recording_state
33
+ cua-driver set_recording '{"enabled":false}'
34
+ ```
35
+
36
+ The `recording` subcommands require a running daemon (`cua-driver
37
+ serve &`) because recording state is per-process. `output_dir` expands
38
+ `~` and is created (with intermediates) if missing. Turn numbering
39
+ starts at `1` every time recording is (re-)enabled, regardless of any
40
+ existing contents in the directory. State lives in memory only — a
41
+ daemon restart resets to disabled.
42
+
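The state rules in the paragraph above (tilde expansion, directory creation, numbering that restarts at 1, memory-only state) can be modeled in a few lines. This is a sketch of the described behavior, not the driver's implementation; the class and attribute names are assumptions, apart from `next_turn` and `output_dir`, which echo the status output shown earlier.

```python
import os


class RecorderState:
    """In-memory recording state; a fresh instance models a daemon restart."""

    def __init__(self):
        self.enabled = False
        self.output_dir = None
        self.next_turn = 1

    def enable(self, output_dir):
        path = os.path.expanduser(output_dir)  # expand a leading ~
        os.makedirs(path, exist_ok=True)       # create intermediates if missing
        # Numbering restarts at 1 even if the directory already has turns.
        self.enabled, self.output_dir, self.next_turn = True, path, 1

    def disable(self):
        """Turn recording off and return how many turns were captured."""
        captured = self.next_turn - 1
        self.enabled = False
        return captured
```

One consequence worth noting: because numbering restarts at 1, re-enabling recording into a directory that already holds turns will reuse existing turn numbers, so a fresh directory per run keeps trajectories separate.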
43
+ ## What each turn folder contains
44
+
45
+ Each action writes to `turn-NNNNN/` (five-digit zero-padded counter):
46
+
47
+ - `app_state.json` — post-action AX snapshot for the target pid, same
48
+ shape `get_window_state` returns (tree_markdown, element_count,
49
+ turn_id, etc.) minus the screenshot fields. The recorder resolves a
50
+ frontmost window internally (visible + on-current-Space preferred,
51
+ max-area fallback) since individual action tools carry a
52
+ window_id but the recorder has no caller-supplied anchor.
53
+ - `screenshot.png` — post-action capture of the same window the
54
+ recorder just snapshotted. Omitted when the pid has no visible
55
+ window.
56
+ - `action.json` — the tool name, full input arguments, result
57
+ summary, pid, click point (when applicable), ISO-8601 timestamp.
58
+ - `click.png` — only for click-family actions (`click`,
59
+ `right_click`): a copy of `screenshot.png` with a red dot drawn at
60
+ the click point (screen-absolute point → window-local pixels via
61
+ the screenshot's `scale_factor`). Absent for other tools and for
62
+ clicks whose point falls outside the captured window.
63
+
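Two details from the list above lend themselves to a short worked example: the five-digit folder name and the conversion of a screen-absolute click point into window-local pixels. The sketch assumes the window origin is given in screen points with a top-left origin and that the capture size is in pixels; both function names are hypothetical.

```python
def turn_dir_name(turn):
    """Folder name for a turn: five-digit zero-padded counter."""
    return f"turn-{turn:05d}"


def click_pixel(screen_point, window_origin, window_size_px, scale_factor):
    """Map a screen-absolute point to window-local pixels.

    Returns None when the point falls outside the captured window, which is
    the case where click.png is absent.
    """
    px = round((screen_point[0] - window_origin[0]) * scale_factor)
    py = round((screen_point[1] - window_origin[1]) * scale_factor)
    width, height = window_size_px
    if 0 <= px < width and 0 <= py < height:
        return (px, py)
    return None
```

For a window whose origin is at (100, 200) on a 2x display, a click at screen point (150, 220) lands at pixel (100, 40) in the screenshot.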
64
+ ## When to use it
65
+
66
+ - Demos and screen recordings — play the turn folder back to show
67
+ exactly what the agent saw and what it did.
68
+ - Replay for regression — re-run the same sequence against a future
69
+ build and diff the new trajectory against the saved one.
70
+ - Training data collection — each turn is a
71
+ `(state, action, next_state)` triple ready for offline learning.
72
+
73
+ ## When to invoke it
74
+
75
+ This skill does **not** auto-enable recording. The client invokes
76
+ `set_recording` explicitly when the user asks to capture a session.
77
+ If the user says "record this session" or similar, call
78
+ `set_recording({enabled:true, output_dir:…})` before the first
79
+ action, and `set_recording({enabled:false})` when done.
80
+
81
+ ## Replaying a recorded trajectory
82
+
83
+ `replay_trajectory({dir})` walks `<dir>/turn-NNNNN/` folders in
84
+ lexical order, reads each `action.json`, and re-invokes the recorded
85
+ tool with its recorded `arguments`. Optional knobs: `delay_ms`
86
+ (pacing between turns, default 500) and `stop_on_error` (halt on
87
+ first failure, default true).
88
+
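The replay loop described above can be sketched as a directory walk. This is an illustration under assumptions rather than the driver's code: the `tool` key in `action.json` is a guess at the field name (the text only says the file holds the tool name), `arguments` follows the wording above, and `invoke` is a hypothetical callable standing in for the real tool dispatch.

```python
import json
import os
import time


def replay(directory, invoke, delay_ms=500, stop_on_error=True):
    """Re-invoke recorded actions in lexical turn order.

    Returns a list of (turn_folder, succeeded) pairs.
    """
    root = os.path.expanduser(directory)
    turns = sorted(
        name for name in os.listdir(root)
        if name.startswith("turn-") and os.path.isdir(os.path.join(root, name))
    )
    results = []
    for index, name in enumerate(turns):
        with open(os.path.join(root, name, "action.json"), encoding="utf-8") as handle:
            action = json.load(handle)
        try:
            # "tool" is an assumed key name; "arguments" follows the text above.
            invoke(action["tool"], action["arguments"])
            results.append((name, True))
        except Exception:
            results.append((name, False))
            if stop_on_error:
                break  # halt on first failure
        if index < len(turns) - 1:
            time.sleep(delay_ms / 1000)  # pacing between turns
    return results
```

Lexical order matches numeric order here only because the counter is zero-padded to a fixed width.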
89
+ ```
90
+ cua-driver recording start ~/cua-trajectories/demo1
91
+ # … run the workflow …
92
+ cua-driver recording stop
93
+ # Later: replay against a new build.
94
+ cua-driver replay_trajectory '{"dir":"~/cua-trajectories/demo1","delay_ms":500}'
95
+ ```
96
+
97
+ Important caveat: **element_index doesn't survive across sessions**.
98
+ Indices are assigned fresh on every `get_window_state` snapshot,
99
+ keyed on `(pid, window_id)`, so a recorded
100
+ `click({pid, window_id, element_index: 14})` from yesterday won't
101
+ resolve today — the pid is usually different, the window_id always
102
+ is. The call returns `Invalid element_index` or `No cached AX
103
+ state`. Pixel clicks (`click({pid, x, y})`) and keyboard tools
104
+ (`press_key`, `hotkey`, `type_text` without element_index) replay cleanly; element-indexed actions require a
105
+ live snapshot that replay doesn't currently re-emit (read-only tools
106
+ like `get_window_state` aren't recorded). For a reliable replay, either
107
+ compose the trajectory from pixel + keyboard primitives, or capture
108
+ it as a regression artifact (compare the failure/success pattern
109
+ across builds) rather than a re-driving script.
110
+
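Given the caveat above, it can help to screen a trajectory before replaying it. The sketch below separates recorded actions that carry an `element_index` (which need a live snapshot) from those that do not; the function names and the in-memory action shape (a dict with an `arguments` mapping) are assumptions for illustration.

```python
def replays_cleanly(action):
    """True when the recorded call does not depend on a cached AX snapshot."""
    return "element_index" not in action.get("arguments", {})


def split_trajectory(actions):
    """Partition recorded actions into (clean, fragile) lists, order preserved."""
    clean = [action for action in actions if replays_cleanly(action)]
    fragile = [action for action in actions if not replays_cleanly(action)]
    return clean, fragile
```

A trajectory whose fragile list is empty is composed purely of pixel and keyboard primitives, which is the shape the text recommends for reliable re-driving.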
111
+ If recording is still enabled while replay runs, the replay is
112
+ itself recorded into the current output directory — that's the
113
+ intended regression-diff workflow.