agent-device 0.10.1 → 0.10.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (210)
  1. package/README.md +4 -1
  2. package/dist/src/376.js +3 -0
  3. package/dist/src/bin.js +74 -73
  4. package/dist/src/daemon.js +39 -39
  5. package/dist/src/index.d.ts +559 -5
  6. package/dist/src/index.js +3 -1
  7. package/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerTests+CommandExecution.swift +8 -0
  8. package/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerTests+Interaction.swift +60 -0
  9. package/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerTests+Lifecycle.swift +1 -1
  10. package/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerTests+Models.swift +4 -0
  11. package/macos-helper/Package.swift +18 -0
  12. package/macos-helper/Sources/AgentDeviceMacOSHelper/SnapshotTraversal.swift +543 -0
  13. package/macos-helper/Sources/AgentDeviceMacOSHelper/main.swift +545 -0
  14. package/package.json +5 -1
  15. package/skills/agent-device/SKILL.md +57 -343
  16. package/skills/agent-device/references/bootstrap-install.md +207 -0
  17. package/skills/agent-device/references/coordinate-system.md +24 -4
  18. package/skills/agent-device/references/debugging.md +115 -0
  19. package/skills/agent-device/references/exploration.md +235 -0
  20. package/skills/agent-device/references/macos-desktop.md +55 -58
  21. package/skills/agent-device/references/remote-tenancy.md +69 -47
  22. package/skills/agent-device/references/verification.md +102 -0
  23. package/dist/src/224.js +0 -2
  24. package/dist/src/274.js +0 -1
  25. package/dist/src/331.js +0 -3
  26. package/dist/src/bin.d.ts +0 -1
  27. package/dist/src/cli-client-commands.d.ts +0 -8
  28. package/dist/src/cli.d.ts +0 -6
  29. package/dist/src/client-metro.d.ts +0 -64
  30. package/dist/src/client-normalizers.d.ts +0 -20
  31. package/dist/src/client-shared.d.ts +0 -20
  32. package/dist/src/client-types.d.ts +0 -269
  33. package/dist/src/client.d.ts +0 -5
  34. package/dist/src/core/app-events.d.ts +0 -8
  35. package/dist/src/core/batch.d.ts +0 -17
  36. package/dist/src/core/capabilities.d.ts +0 -3
  37. package/dist/src/core/click-button.d.ts +0 -20
  38. package/dist/src/core/dispatch-payload.d.ts +0 -1
  39. package/dist/src/core/dispatch-resolve.d.ts +0 -29
  40. package/dist/src/core/dispatch-series.d.ts +0 -7
  41. package/dist/src/core/dispatch.d.ts +0 -35
  42. package/dist/src/core/open-target.d.ts +0 -4
  43. package/dist/src/core/settings-contract.d.ts +0 -8
  44. package/dist/src/daemon/action-utils.d.ts +0 -3
  45. package/dist/src/daemon/android-system-dialog.d.ts +0 -11
  46. package/dist/src/daemon/app-log-android.d.ts +0 -4
  47. package/dist/src/daemon/app-log-ios.d.ts +0 -6
  48. package/dist/src/daemon/app-log-process.d.ts +0 -15
  49. package/dist/src/daemon/app-log-stream.d.ts +0 -19
  50. package/dist/src/daemon/app-log.d.ts +0 -28
  51. package/dist/src/daemon/artifact-archive.d.ts +0 -12
  52. package/dist/src/daemon/artifact-download.d.ts +0 -12
  53. package/dist/src/daemon/artifact-materialization.d.ts +0 -17
  54. package/dist/src/daemon/artifact-registry.d.ts +0 -12
  55. package/dist/src/daemon/config.d.ts +0 -16
  56. package/dist/src/daemon/context.d.ts +0 -23
  57. package/dist/src/daemon/device-ready.d.ts +0 -6
  58. package/dist/src/daemon/handlers/find.d.ts +0 -40
  59. package/dist/src/daemon/handlers/install-source.d.ts +0 -44
  60. package/dist/src/daemon/handlers/interaction-common.d.ts +0 -12
  61. package/dist/src/daemon/handlers/interaction-fill.d.ts +0 -3
  62. package/dist/src/daemon/handlers/interaction-flags.d.ts +0 -4
  63. package/dist/src/daemon/handlers/interaction-get.d.ts +0 -3
  64. package/dist/src/daemon/handlers/interaction-is.d.ts +0 -3
  65. package/dist/src/daemon/handlers/interaction-press.d.ts +0 -3
  66. package/dist/src/daemon/handlers/interaction-scroll.d.ts +0 -3
  67. package/dist/src/daemon/handlers/interaction-selector.d.ts +0 -27
  68. package/dist/src/daemon/handlers/interaction-snapshot.d.ts +0 -8
  69. package/dist/src/daemon/handlers/interaction-targeting.d.ts +0 -28
  70. package/dist/src/daemon/handlers/interaction-touch.d.ts +0 -46
  71. package/dist/src/daemon/handlers/interaction.d.ts +0 -16
  72. package/dist/src/daemon/handlers/lease.d.ts +0 -8
  73. package/dist/src/daemon/handlers/parse-utils.d.ts +0 -3
  74. package/dist/src/daemon/handlers/record-trace-android.d.ts +0 -18
  75. package/dist/src/daemon/handlers/record-trace-ios.d.ts +0 -52
  76. package/dist/src/daemon/handlers/record-trace-recording.d.ts +0 -32
  77. package/dist/src/daemon/handlers/record-trace.d.ts +0 -10
  78. package/dist/src/daemon/handlers/session-batch.d.ts +0 -2
  79. package/dist/src/daemon/handlers/session-close.d.ts +0 -31
  80. package/dist/src/daemon/handlers/session-deploy.d.ts +0 -37
  81. package/dist/src/daemon/handlers/session-device-utils.d.ts +0 -26
  82. package/dist/src/daemon/handlers/session-open-target.d.ts +0 -3
  83. package/dist/src/daemon/handlers/session-open.d.ts +0 -22
  84. package/dist/src/daemon/handlers/session-perf.d.ts +0 -2
  85. package/dist/src/daemon/handlers/session-replay-heal.d.ts +0 -8
  86. package/dist/src/daemon/handlers/session-replay-script.d.ts +0 -3
  87. package/dist/src/daemon/handlers/session-runtime-command.d.ts +0 -9
  88. package/dist/src/daemon/handlers/session-runtime.d.ts +0 -36
  89. package/dist/src/daemon/handlers/session-startup-metrics.d.ts +0 -11
  90. package/dist/src/daemon/handlers/session.d.ts +0 -50
  91. package/dist/src/daemon/handlers/snapshot-alert.d.ts +0 -13
  92. package/dist/src/daemon/handlers/snapshot-capture.d.ts +0 -27
  93. package/dist/src/daemon/handlers/snapshot-session.d.ts +0 -15
  94. package/dist/src/daemon/handlers/snapshot-settings.d.ts +0 -24
  95. package/dist/src/daemon/handlers/snapshot-wait.d.ts +0 -37
  96. package/dist/src/daemon/handlers/snapshot.d.ts +0 -16
  97. package/dist/src/daemon/http-server.d.ts +0 -26
  98. package/dist/src/daemon/install-source-resolution.d.ts +0 -5
  99. package/dist/src/daemon/is-predicates.d.ts +0 -15
  100. package/dist/src/daemon/lease-context.d.ts +0 -9
  101. package/dist/src/daemon/lease-registry.d.ts +0 -63
  102. package/dist/src/daemon/materialized-path-registry.d.ts +0 -15
  103. package/dist/src/daemon/network-log.d.ts +0 -32
  104. package/dist/src/daemon/record-trace-errors.d.ts +0 -6
  105. package/dist/src/daemon/recording-gestures.d.ts +0 -3
  106. package/dist/src/daemon/recording-telemetry.d.ts +0 -20
  107. package/dist/src/daemon/recording-timing.d.ts +0 -24
  108. package/dist/src/daemon/request-cancel.d.ts +0 -9
  109. package/dist/src/daemon/request-lock-policy.d.ts +0 -2
  110. package/dist/src/daemon/request-router.d.ts +0 -23
  111. package/dist/src/daemon/runtime-hints.d.ts +0 -19
  112. package/dist/src/daemon/script-utils.d.ts +0 -28
  113. package/dist/src/daemon/scroll-planner.d.ts +0 -12
  114. package/dist/src/daemon/selectors-build.d.ts +0 -5
  115. package/dist/src/daemon/selectors-match.d.ts +0 -6
  116. package/dist/src/daemon/selectors-parse.d.ts +0 -29
  117. package/dist/src/daemon/selectors-resolve.d.ts +0 -33
  118. package/dist/src/daemon/selectors.d.ts +0 -5
  119. package/dist/src/daemon/server-lifecycle.d.ts +0 -23
  120. package/dist/src/daemon/session-open-script.d.ts +0 -7
  121. package/dist/src/daemon/session-routing.d.ts +0 -3
  122. package/dist/src/daemon/session-selector.d.ts +0 -10
  123. package/dist/src/daemon/session-store.d.ts +0 -33
  124. package/dist/src/daemon/snapshot-diff.d.ts +0 -20
  125. package/dist/src/daemon/snapshot-processing.d.ts +0 -9
  126. package/dist/src/daemon/touch-reference-frame.d.ts +0 -7
  127. package/dist/src/daemon/transport.d.ts +0 -6
  128. package/dist/src/daemon/types.d.ts +0 -171
  129. package/dist/src/daemon/upload-registry.d.ts +0 -7
  130. package/dist/src/daemon/upload.d.ts +0 -5
  131. package/dist/src/daemon-client.d.ts +0 -40
  132. package/dist/src/daemon.d.ts +0 -1
  133. package/dist/src/platforms/android/adb.d.ts +0 -5
  134. package/dist/src/platforms/android/app-lifecycle.d.ts +0 -31
  135. package/dist/src/platforms/android/device-input-state.d.ts +0 -19
  136. package/dist/src/platforms/android/devices.d.ts +0 -26
  137. package/dist/src/platforms/android/index.d.ts +0 -8
  138. package/dist/src/platforms/android/input-actions.d.ts +0 -16
  139. package/dist/src/platforms/android/install-artifact.d.ts +0 -11
  140. package/dist/src/platforms/android/manifest.d.ts +0 -1
  141. package/dist/src/platforms/android/notifications.d.ts +0 -11
  142. package/dist/src/platforms/android/open-target.d.ts +0 -4
  143. package/dist/src/platforms/android/screenshot.d.ts +0 -16
  144. package/dist/src/platforms/android/sdk.d.ts +0 -2
  145. package/dist/src/platforms/android/settings.d.ts +0 -3
  146. package/dist/src/platforms/android/snapshot.d.ts +0 -7
  147. package/dist/src/platforms/android/ui-hierarchy.d.ts +0 -21
  148. package/dist/src/platforms/appearance.d.ts +0 -2
  149. package/dist/src/platforms/boot-diagnostics.d.ts +0 -14
  150. package/dist/src/platforms/install-source.d.ts +0 -29
  151. package/dist/src/platforms/ios/app-filter.d.ts +0 -2
  152. package/dist/src/platforms/ios/apps.d.ts +0 -34
  153. package/dist/src/platforms/ios/config.d.ts +0 -10
  154. package/dist/src/platforms/ios/devicectl.d.ts +0 -13
  155. package/dist/src/platforms/ios/devices.d.ts +0 -40
  156. package/dist/src/platforms/ios/ensure-simulator.d.ts +0 -18
  157. package/dist/src/platforms/ios/index.d.ts +0 -3
  158. package/dist/src/platforms/ios/install-artifact.d.ts +0 -18
  159. package/dist/src/platforms/ios/launch-diagnostics.d.ts +0 -11
  160. package/dist/src/platforms/ios/macos-apps.d.ts +0 -12
  161. package/dist/src/platforms/ios/plist.d.ts +0 -1
  162. package/dist/src/platforms/ios/runner-client.d.ts +0 -38
  163. package/dist/src/platforms/ios/runner-errors.d.ts +0 -20
  164. package/dist/src/platforms/ios/runner-macos-products.d.ts +0 -3
  165. package/dist/src/platforms/ios/runner-session.d.ts +0 -30
  166. package/dist/src/platforms/ios/runner-transport.d.ts +0 -10
  167. package/dist/src/platforms/ios/runner-xctestrun-products.d.ts +0 -2
  168. package/dist/src/platforms/ios/runner-xctestrun.d.ts +0 -38
  169. package/dist/src/platforms/ios/screenshot-status-bar.d.ts +0 -2
  170. package/dist/src/platforms/ios/screenshot.d.ts +0 -14
  171. package/dist/src/platforms/ios/simctl.d.ts +0 -7
  172. package/dist/src/platforms/ios/simulator.d.ts +0 -11
  173. package/dist/src/platforms/permission-utils.d.ts +0 -9
  174. package/dist/src/recording/overlay.d.ts +0 -10
  175. package/dist/src/upload-client.d.ts +0 -7
  176. package/dist/src/utils/args.d.ts +0 -27
  177. package/dist/src/utils/cli-config.d.ts +0 -10
  178. package/dist/src/utils/cli-option-schema.d.ts +0 -19
  179. package/dist/src/utils/cli-options.d.ts +0 -13
  180. package/dist/src/utils/command-schema.d.ts +0 -122
  181. package/dist/src/utils/device-isolation.d.ts +0 -3
  182. package/dist/src/utils/device.d.ts +0 -35
  183. package/dist/src/utils/diagnostics.d.ts +0 -30
  184. package/dist/src/utils/errors.d.ts +0 -26
  185. package/dist/src/utils/exec.d.ts +0 -32
  186. package/dist/src/utils/finders.d.ts +0 -12
  187. package/dist/src/utils/interactors.d.ts +0 -38
  188. package/dist/src/utils/json-input.d.ts +0 -1
  189. package/dist/src/utils/keyed-lock.d.ts +0 -1
  190. package/dist/src/utils/output.d.ts +0 -27
  191. package/dist/src/utils/path-resolution.d.ts +0 -8
  192. package/dist/src/utils/payload-input.d.ts +0 -12
  193. package/dist/src/utils/process-identity.d.ts +0 -11
  194. package/dist/src/utils/remote-config.d.ts +0 -15
  195. package/dist/src/utils/remote-open.d.ts +0 -9
  196. package/dist/src/utils/retry.d.ts +0 -54
  197. package/dist/src/utils/screenshot-diff.d.ts +0 -23
  198. package/dist/src/utils/session-binding.d.ts +0 -18
  199. package/dist/src/utils/snapshot-lines.d.ts +0 -12
  200. package/dist/src/utils/snapshot.d.ts +0 -42
  201. package/dist/src/utils/timeouts.d.ts +0 -3
  202. package/dist/src/utils/version.d.ts +0 -2
  203. package/dist/src/utils/video.d.ts +0 -9
  204. package/skills/agent-device/references/batching.md +0 -79
  205. package/skills/agent-device/references/logs-and-debug.md +0 -113
  206. package/skills/agent-device/references/perf-metrics.md +0 -53
  207. package/skills/agent-device/references/permissions.md +0 -70
  208. package/skills/agent-device/references/session-management.md +0 -101
  209. package/skills/agent-device/references/snapshot-refs.md +0 -102
  210. package/skills/agent-device/references/video-recording.md +0 -49
package/skills/agent-device/references/debugging.md +115 -0
@@ -0,0 +1,115 @@
+ # Debugging
+
+ ## When to open this file
+
+ Open this file when the task turns into failure triage, logs, network inspection, permission prompts, setup trouble, or unstable session behavior.
+
+ ## Main commands to reach for first
+
+ - `logs clear --restart`
+ - `network dump`
+ - `logs path`
+ - `logs doctor`
+ - `alert wait`
+ - `alert accept` or `alert dismiss`
+
+ ## Most common mistake to avoid
+
+ Do not leave logging on for normal flows or dump full log files into context. Keep debug windows short and inspect logs with `grep` or `tail`.
+
+ ## Canonical loop
+
+ ```bash
+ agent-device open MyApp --platform ios
+ agent-device logs clear --restart
+ agent-device network dump 25
+ agent-device logs path
+ agent-device close
+ ```
+
+ ## Log and network flow
+
+ Logging is off by default. Enable it only when you need a debugging window.
+
+ - Default app logs live under `~/.agent-device/sessions/<session>/app.log`.
+ - `logs clear --restart` is the fastest clean repro loop.
+ - `network dump [limit] [summary|headers|body|all]` parses recent HTTP(s) entries from the same session app log.
+ - `logs doctor` checks backend and runtime readiness for the current session and device.
+ - `logs mark "before tap"` inserts a timestamped marker into the app log.
+ - Session app logs can contain runtime data, headers, or payload fragments. Review them before sharing.
+ - `logs start` requires an active app session and appends to `app.log`.
+ - `logs stop` stops streaming. `close` also stops logging.
+ - `logs clear` truncates `app.log` and removes rotated `app.log.N` files, and requires logging to be stopped first.
+ - `logs path` returns the log path plus metadata about the active backend and file state.
+ - `network log` is an alias for `network dump`.
+
+ Operational limits:
+
+ - `app.log` rotates to `app.log.1` after 5 MB by default.
+ - `network dump` scans the last 4000 app-log lines, returns up to 200 entries, and truncates header or payload fields at 2048 characters.
+ - Retention knobs:
+   - `AGENT_DEVICE_APP_LOG_MAX_BYTES`
+   - `AGENT_DEVICE_APP_LOG_MAX_FILES`
+ - Redaction hook:
+   - `AGENT_DEVICE_APP_LOG_REDACT_PATTERNS`
+
+ Useful shell follow-up after `logs path`:
+
+ ```bash
+ grep -n -E "Error|Exception|Fatal|crash" <path>
+ tail -50 <path>
+ ```
+
+ ## Alerts and permissions
+
+ Use `alert` for iOS simulator permission dialogs instead of tapping coordinates.
+
+ ```bash
+ agent-device alert wait 5000
+ agent-device alert accept
+ ```
+
+ - `alert` is only supported on iOS simulators.
+ - `alert accept` and `alert dismiss` retry internally for a short window, so you usually do not need manual sleeps.
+ - iOS 16+ "Allow Paste" prompts are suppressed under XCUITest. Use `xcrun simctl pbcopy booted` when you need to seed simulator clipboard content directly.
+
+ ## Setup problems worth recognizing early
+
+ - iOS snapshots do not require macOS Accessibility permissions.
+ - iOS physical-device XCTest setup does require valid signing and provisioning.
+ - If physical-device runner setup fails, prefer Xcode Automatic Signing first.
+ - Optional overrides are:
+   - `AGENT_DEVICE_IOS_TEAM_ID`
+   - `AGENT_DEVICE_IOS_SIGNING_IDENTITY`
+   - `AGENT_DEVICE_IOS_PROVISIONING_PROFILE`
+   - `AGENT_DEVICE_IOS_BUNDLE_ID`
+ - If daemon startup is timing out during setup, increase `AGENT_DEVICE_DAEMON_TIMEOUT_MS`.
+ - If daemon startup fails with stale metadata hints, clean `~/.agent-device/daemon.json` and `~/.agent-device/daemon.lock`, then retry.
+ - Free Apple Developer personal-team accounts may reject generic bundle IDs. Use a unique reverse-DNS value for `AGENT_DEVICE_IOS_BUNDLE_ID` when that happens.
+
+ ## Common failure patterns
+
+ - `snapshot` returns 0 nodes: the app may no longer be foregrounded or the UI is not stable yet. Re-open the app or retry when state settles.
+ - Logs are empty: confirm you opened an app session before `logs clear --restart`.
+ - Android logs look stale after relaunch: retry the repro window after the process rebinds.
+ - Permission prompts block the flow: wait for the alert and handle it explicitly.
+ - If snapshots keep returning 0 nodes on an iOS simulator, restart Simulator and re-open the app.
+ - If a macOS snapshot looks incomplete, compare with `snapshot --raw --platform macos` to separate collector filtering from missing AX content.
+
+ ## Crash triage fast path
+
+ Always start from the session app log, then branch by platform.
+
+ ```bash
+ agent-device logs path
+ grep -n -E "SIGABRT|SIGSEGV|EXC_|fatal|exception|terminated|killed|jetsam|memorystatus|FATAL EXCEPTION|Abort message" <path>
+ ```
+
+ - iOS: if the log suggests `ReportCrash`, `SIGABRT`, or `EXC_*`, inspect `~/Library/Logs/DiagnosticReports`.
+ - Android: if the app log is not enough, use `adb logcat` for `FATAL EXCEPTION`, `Abort message`, or `signal` lines around process death.
+ - If no crash signature appears in app logs, stop collecting broad logs and switch to the platform-native crash source.
+
+ ## When to leave this file
+
+ - Return to [exploration.md](exploration.md) once the app is stable again.
+ - Load [verification.md](verification.md) if you need evidence artifacts after reproducing the issue.
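The crash triage fast path in debugging.md runs a single `grep` against the path reported by `agent-device logs path`. Because `app.log` rotates to `app.log.1` after 5 MB by default, a crash logged just before rotation can sit in an older file. The Bash sketch below wraps that `grep` in a helper that also scans rotated `app.log.N` siblings. The function name `crash_signatures` and the `CRASH_PATTERN` variable are hypothetical (not part of agent-device); the regex is copied from the fast path. The sketch assumes you pass the bare log path, since `logs path` also prints backend and file-state metadata.

```bash
# Hypothetical helper, not part of agent-device.
# Regex copied verbatim from debugging.md's "Crash triage fast path".
CRASH_PATTERN='SIGABRT|SIGSEGV|EXC_|fatal|exception|terminated|killed|jetsam|memorystatus|FATAL EXCEPTION|Abort message'

# Usage: crash_signatures /path/to/app.log
# The path is the one reported by `agent-device logs path` (assumed bare path).
crash_signatures() {
  local log="$1" rotated
  # Start the file list with the live log, then append rotated siblings
  # (app.log.1, app.log.2, ...) that actually exist on disk.
  set -- "$log"
  for rotated in "$log".[0-9]*; do
    if [ -e "$rotated" ]; then
      set -- "$@" "$rotated"
    fi
  done
  # -H prefixes each match with its file name, -n with its line number.
  # grep exits non-zero when no signature matches in any file.
  grep -H -n -E -e "$CRASH_PATTERN" -- "$@"
}
```

For example, `crash_signatures ~/.agent-device/sessions/<session>/app.log | tail -20` keeps the output short, in line with the advice not to dump full logs into context. A non-zero exit status means no signature matched, which is the cue to switch to `~/Library/Logs/DiagnosticReports` on iOS or `adb logcat` on Android.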
package/skills/agent-device/references/exploration.md +235 -0
@@ -0,0 +1,235 @@
+ # Exploration
+
+ ## When to open this file
+
+ Open this file when the app or screen is already running and you need to discover the UI, choose targets, read state, wait for conditions, or perform normal interactions.
+
+ ## Read-only first
+
+ - If the question is what text, labels, or structure is visible on screen, start with plain `snapshot`.
+ - Escalate to `snapshot -i` only when you need refs such as `@e3` for interactive exploration or a requested action.
+ - If you intend to `press`, `fill`, or otherwise interact, start with `snapshot -i` and fall back to plain `snapshot` only if interactive refs are unavailable.
+ - Prefer `get`, `is`, or `find` before mutating the UI when a read-only command can answer the question.
+ - You may take the smallest reversible UI action needed to unblock inspection, such as dismissing a popup, closing an alert, or backing out of an unintended surface.
+ - Do not type or fill text just to make hidden information easier to access unless the user asked for that interaction.
+ - Do not use external sources to infer missing UI state unless the user explicitly asked.
+ - If the answer is not visible or exposed in the UI, report that gap instead of compensating with search, navigation, or text entry.
+
+ ## Decision shortcut
+
+ - User asks what is visible on screen: `snapshot`
+ - User asks for exact text from a known target: `get text`
+ - User asks you to tap, type, or choose an element: `snapshot -i`, then act
+ - UI does not expose the answer: say so plainly; do not browse or force the app into a new state unless asked
+
+ ## Read-only commands
+
+ - `snapshot`
+ - `get`
+ - `is`
+ - `find`
+
+ ## Interaction commands
+
+ - `snapshot -i`
+ - `press`
+ - `fill`
+ - `type`
+ - `wait`
+
+ ## Most common mistake to avoid
+
+ Do not treat `@ref` values as durable after navigation or dynamic updates. Re-snapshot after the UI changes, and switch to selectors when the flow must stay stable.
+
+ ## Common example loops
+
+ These are examples, not required exact sequences. Adapt them to the app, state, and task at hand.
+
+ ### Interactive exploration loop
+
+ ```bash
+ agent-device open Settings --platform ios
+ agent-device snapshot -i
+ agent-device press @e3
+ agent-device wait visible 'label="Privacy & Security"' 3000
+ agent-device get text 'label="Privacy & Security"'
+ agent-device close
+ ```
+
+ ### Screen verification loop
+
+ ```bash
+ agent-device open MyApp --platform ios
+ # perform the necessary actions to reach the state you need to verify
+ agent-device snapshot
+ # verify whether the expected element or text is present
+ agent-device close
+ ```
+
+ ## Snapshot choices
+
+ - Use plain `snapshot` when you only need to verify whether visible text or structure is on screen.
+ - Use `snapshot -i` when you need refs such as `@e3` for interactive exploration or for an intended interaction.
+ - Treat large text-surface lines in `snapshot -i` as discovery output. If a node shows preview or truncation metadata, use `get text @ref` only after you have already decided that `snapshot -i` is needed for that surface.
+ - Use `snapshot -i -s "Camera"` or `snapshot -i -s @e3` when you want a smaller, scoped result.
+
+ Example:
+
+ ```bash
+ agent-device snapshot -i
+ ```
+
+ Sample output:
+
+ ```text
+ Page: com.apple.Preferences
+ App: com.apple.Preferences
+
+ @e1 [ioscontentgroup]
+ @e2 [button] "Camera"
+ @e3 [button] "Privacy & Security"
+ ```
+
+ ## Refs vs selectors
+
+ - Use refs for discovery, debugging, and short local loops.
+ - Use selectors for deterministic scripts, assertions, and replay-friendly actions.
+ - Prefer selector or `@ref` targeting over raw coordinates.
+ - For tap interactions, `press` is canonical and `click` is an equivalent alias.
+
+ Examples:
+
+ ```bash
+ agent-device press @e2
+ agent-device fill @e5 "test"
+ agent-device press 'id="camera_row" || label="Camera" role=button'
+ agent-device is visible 'id="camera_settings_anchor"'
+ ```
+
+ ## Text entry rules
+
+ - Use `fill` to replace text in an editable field.
+ - Use `type` to append text to the current insertion point.
+ - Do not use `fill` or `type` just to make the app reveal information that is not currently visible unless the user asked for that interaction.
+
+ ## Query and sync rules
+
+ - Use `get` to read text, attrs, or state from a known target.
+ - Use `is` for assertions.
+ - Use `wait` when the UI needs time to settle after a mutation.
+ - Use `find "<query>" click --json` when you need search-driven targeting plus matched-target metadata.
+ - If you are forced onto raw coordinates, open [coordinate-system.md](coordinate-system.md) first.
+
+ Example:
+
+ ```bash
+ agent-device find "Increment" click --json
+ ```
+
+ Returned metadata comes from the matched snapshot node and can be used for observability or replay maintenance.
+
+ ## QA from acceptance criteria
+
+ Use this loop when the task starts from acceptance criteria and you need to turn them into concrete checks.
+
+ Preferred mapping:
+
+ - visibility or presence claim: `is visible` or plain `snapshot`
+ - exact text, label, or value claim: `get text`
+ - post-action state change: act, then `wait`, then `is` or `get`
+ - nearby structural UI change: `diff snapshot`
+ - proof artifact for the final result: `screenshot` or `record`
+
+ Anti-hallucination rules:
+
+ - Do not invent app names, device ids, session names, refs, selectors, or package names.
+ - Discover them first with `devices`, `open`, `snapshot -i`, `find`, or `session list`.
+ - If refs drift after navigation, re-snapshot or switch to selectors instead of guessing.
+
+ Avoid this escalation path for visible-text questions:
+
+ - Do not jump from `snapshot -i` to `get text @ref`, then to web search, then to typing into a search box just to force the app to reveal the answer.
+ - Start with `snapshot`. If the text is not visible or exposed, report that directly.
+
+ Canonical QA loop:
+
+ ```bash
+ agent-device open MyApp --platform ios
+ agent-device snapshot -i
+ agent-device press @e3
+ agent-device wait visible 'label="Success"' 3000
+ agent-device is visible 'label="Success"'
+ agent-device screenshot /tmp/qa-proof.png
+ agent-device close
+ ```
+
+ ## Accessibility audit
+
+ Use this pattern when you need to find UI that is visible to a user but missing from the accessibility tree.
+
+ Audit loop:
+
+ 1. Capture a `screenshot` to see what is visually rendered.
+ 2. Capture a `snapshot` or `snapshot -i` to see what the accessibility tree exposes.
+ 3. Compare the two:
+    - visible in screenshot and present in snapshot: exposed to accessibility
+    - visible in screenshot and missing from snapshot: likely accessibility gap
+ 4. If you suspect the node exists in AX but is filtered from interactive output, retry with `snapshot --raw`.
+
+ Example:
+
+ ```bash
+ agent-device screenshot /tmp/accessibility-screen.png
+ agent-device snapshot -i
+ ```
+
+ Use `screenshot` as the visual source of truth and `snapshot` as the accessibility source of truth for this audit.
+
+ ## Batch only when the sequence is already known
+
+ Use `batch` when a short command sequence is already planned and belongs to one logical screen flow.
+
+ ```bash
+ agent-device batch --session sim --platform ios --steps-file /tmp/batch-steps.json --json
+ ```
+
+ - Keep batch size moderate, roughly 5 to 20 steps.
+ - Add `wait` or `is exists` guards after mutating steps.
+ - Do not use `batch` for highly dynamic flows that need replanning after each step.
+
+ Step payload contract:
+
+ ```json
+ [
+   { "command": "open", "positionals": ["Settings"], "flags": { "platform": "ios" } },
+   { "command": "wait", "positionals": ["label=\"Privacy & Security\"", "3000"], "flags": {} },
+   { "command": "click", "positionals": ["label=\"Privacy & Security\""], "flags": {} },
+   { "command": "get", "positionals": ["text", "label=\"Tracking\""], "flags": {} }
+ ]
+ ```
+
+ - `positionals` is optional and defaults to `[]`.
+ - `flags` is optional and defaults to `{}`.
+ - Nested `batch` and `replay` are rejected.
+ - Supported error mode is stop-on-first-error.
+
+ Response handling:
+
+ - Success returns fields such as `total`, `executed`, `totalDurationMs`, and `results[]`.
+ - Failed runs include `details.step`, `details.command`, `details.executed`, and `details.partialResults`.
+ - Replan from the first failing step instead of rerunning the whole flow blindly.
+
+ Common batch error categories:
+
+ - `INVALID_ARGS`: fix the payload shape and retry.
+ - `SESSION_NOT_FOUND`: open or select the correct session, then retry.
+ - `UNSUPPORTED_OPERATION`: switch to a supported command or surface.
+ - `AMBIGUOUS_MATCH`: refine the selector or locator, then retry the failed step.
+ - `COMMAND_FAILED`: add sync guards and retry from the failing step.
+
+ ## Stop conditions
+
+ - If refs drift after transitions, switch to selectors.
+ - If a desktop surface or context menu is involved on macOS, load [macos-desktop.md](macos-desktop.md).
+ - If logs, network, alerts, or setup failures become the blocker, switch to [debugging.md](debugging.md).
+ - If the flow is stable and you need proof or replay maintenance, switch to [verification.md](verification.md).
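The QA mapping in exploration.md handles every post-action state change the same way (act, then `wait`, then `is` or `get`), and the canonical QA loop spells those steps out by hand. A minimal Bash sketch that folds one such check into a function follows. The name `verify_after_press`, its argument order, and the 3000 ms default timeout are assumptions for illustration, not part of agent-device. The subcommands it calls (`press`, `wait visible`, `is visible`, `screenshot`) use the argument shapes from the exploration.md examples.

```bash
# Hypothetical wrapper, not part of agent-device: one "act, then wait,
# then is" check from the QA mapping, plus an optional proof screenshot.
# Usage: verify_after_press <target> <expected-selector> [timeout-ms] [proof.png]
verify_after_press() {
  local target="$1" expected="$2" timeout_ms="${3:-3000}" proof="${4:-}"
  # Act: press an @ref or selector discovered earlier with `snapshot -i`.
  agent-device press "$target" || return 1
  # Sync: let the UI settle before asserting.
  agent-device wait visible "$expected" "$timeout_ms" || return 1
  # Assert: read-only check against the same selector.
  agent-device is visible "$expected" || return 1
  # Proof: capture an artifact only after the assertion has passed.
  if [ -n "$proof" ]; then
    agent-device screenshot "$proof" || return 1
  fi
}
```

For example, `verify_after_press @e3 'label="Success"' 3000 /tmp/qa-proof.png` covers the press, wait, assert, and screenshot steps of the canonical QA loop. The function returns at the first failing command, so the screenshot is taken only once the assertion has passed. Refs such as `@e3` still go stale after navigation, so re-run `snapshot -i` or pass a selector before each call.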
package/skills/agent-device/references/macos-desktop.md +55 -58
@@ -1,89 +1,86 @@
1
- # macOS Desktop Automation
1
+ # macOS Desktop
2
2
 
3
- Use this reference for host Mac apps such as Finder, TextEdit, System Settings, Preview, or browser apps running as normal desktop windows.
3
+ ## When to open this file
4
4
 
5
- ## Mental model
5
+ Open this file only when `--platform macos` is involved or the task needs `frontmost-app`, `desktop`, or `menubar` surfaces.
6
6
 
7
- - `snapshot -i` should describe UI that is visible to a human in the current front window.
8
- - Context menus are not ambient UI. Open them explicitly with `click --button secondary`, then re-snapshot.
9
- - Prefer refs for exploration and selectors for deterministic replay/assertions.
10
- - Avoid raw `x y` coordinates unless refs/selectors are impossible.
7
+ ## Main commands to reach for first
11
8
 
12
- ## Canonical flow
9
+ - `open <app> --platform macos`
10
+ - `open --platform macos --surface frontmost-app|desktop|menubar`
11
+ - `snapshot -i`
12
+ - `get`
13
+ - `is`
14
+ - `click --button secondary`
15
+
16
+ ## Most common mistake to avoid
17
+
18
+ Do not treat every macOS surface the same. Use the normal `app` surface when you want to act inside one app. Use `frontmost-app`, `desktop`, or `menubar` mainly to inspect what is visible before switching back to `app` for most interactions.
19
+
20
+ ## Canonical loop
13
21
 
14
22
  ```bash
15
- agent-device open Finder --platform macos
16
- agent-device snapshot -i
17
- agent-device click @e66 --button secondary --platform macos
18
- agent-device snapshot -i
23
+ agent-device open TextEdit --platform macos
24
+ agent-device snapshot
19
25
  agent-device close
20
26
  ```
21
27
 
22
- ## What to expect from snapshots
28
+ ## Surface rules
29
+
30
+ - `app`: default surface and the normal choice for `click`, `fill`, `press`, `scroll`, `screenshot`, and `record`.
31
+ - `frontmost-app`: inspect the currently focused app without naming it first.
32
+ - `desktop`: inspect visible desktop windows across apps.
33
+ - `menubar`: inspect the active app menu bar and system menu extras.
34
+
35
+ Use inspect-first surfaces to understand desktop-global UI, then switch back to `app` when you need to act in one app.
36
+
37
+ ## Snapshot expectations
23
38
 
24
- - `snapshot -i` prioritizes visible window content over dormant menu infrastructure.
25
- - File rows, sidebar items, toolbar controls, search fields, and visible context menus should appear.
26
- - Finder and other native apps may expose duplicate-looking structures such as row wrapper nodes, `cell` nodes, and child `text` or `text-field` nodes.
27
- - Treat those as distinct AX nodes unless you have a stronger selector anchor.
39
+ - `snapshot -i` should describe UI visible to a human.
40
+ - `desktop` snapshots can include multiple windows from multiple apps.
41
+ - `menubar` snapshots can include both app-menu items and system menu extras.
42
+ - Finder-style rows, sidebar items, toolbar controls, search fields, and opened context menus should appear when visible.
43
+ - Finder and other native apps may expose duplicate-looking row, cell, and child text nodes. Treat them as distinct AX nodes unless you have a stronger selector anchor.
28
44
 
  ## Context menus
 
- Use secondary click when the app exposes actions only through the contextual menu.
+ Context menus do not appear in snapshots until they are opened. Open one explicitly with a secondary click, then re-snapshot.
 
  ```bash
  agent-device click @e66 --button secondary --platform macos
  agent-device snapshot -i
  ```
 
- Expected pattern:
+ Expected loop:
 
  1. Snapshot visible content.
- 2. Secondary-click the target row/item.
+ 2. Secondary-click the target item.
  3. Snapshot again.
- 4. Interact with newly visible `menu-item` nodes.
-
- Do not expect context-menu items to appear before the menu is opened.
-
- ## Finder-specific guidance
-
- - `snapshot -i` should still expose visible folder rows even when nothing is selected.
- - Unselected folder contents should still be visible in `snapshot -i` through list/table rows.
- - A file row may expose multiple nodes with the same label, including a row container, name cell, and child text/text-field.
- - For opening a context menu, prefer the outer visible row/cell ref over a nested text child if both exist.
- - After secondary click, expect actions such as `Rename`, `Quick Look`, `Copy`, `Compress`, and tag-related items in the next snapshot.
-
- ## Raw snapshots
-
- Use `snapshot --raw` only when debugging AX structure or collector issues.
-
- ```bash
- agent-device snapshot --raw --platform macos
- ```
+ 4. Interact with the new `menu-item` nodes.
 
- - Raw output is larger and less token-efficient.
- - It is useful for verifying whether missing UI is absent from the AX tree or only filtered from interactive output.
- - Do not use raw output as the default agent loop when `snapshot -i` already shows the visible window content you need.
+ ## Targeting rules
 
- ## Selector guidance
+ - Prefer selectors or `@ref` values over raw coordinates.
+ - On macOS, window position can vary across runs, so coordinate-only flows are fragile.
+ - For exploration rules shared across platforms, return to [exploration.md](exploration.md).
 
- Good macOS selectors usually anchor on one of:
+ Selector guidance:
 
- - `label="Downloads"`
- - `label="failed-step.json"`
- - `role=button label="Search"`
- - `role=menu-item label="Rename"`
+ - Good selectors usually anchor on stable labels, such as `label="Downloads"` or `role=menu-item label="Rename"`, or on app-owned `id=...` identifiers.
+ - Avoid relying on framework-generated `_NS:*` identifiers as stable selectors.
 
- Prefer exact labels when the desktop UI is stable. Use `id=...` when the AX identifier is clearly app-owned and not a framework-generated `_NS:*` value.
+ Use `snapshot --raw --platform macos` only when debugging AX structure or collector filtering. Do not make raw snapshots the default agent loop.
 
- ## Things not to rely on
+ Things not to rely on:
 
- - Mobile-only helpers like `install`, `reinstall`, `push`, `logs`, `network`, or generic `alert`
- - Long-press as a substitute for right-click
- - Raw coordinate assumptions across runs; macOS windows can move
- - Framework-generated `_NS:*` identifiers as stable selectors
+ - Mobile-only helpers such as `install`, `reinstall`, or `push`.
+ - Full `click` or `fill` support in `desktop` or `menubar` sessions; inspect there, then act from an `app` session.
+ - Raw coordinate assumptions across runs.
 
- ## Troubleshooting
+ Troubleshooting:
 
- - If visible window content is missing from `snapshot -i`, re-snapshot once after the UI settles.
- - If the wrong menu opened or no menu appeared, retry secondary-clicking the row/cell wrapper instead of the nested text node.
- - If the app has multiple windows, ensure the correct one is frontmost before relying on refs.
+ - If visible content is missing from `snapshot -i`, re-snapshot after the UI settles.
+ - If `desktop` is too broad, retry with `frontmost-app`.
+ - If `menubar` is missing the expected menu, make the app frontmost first and retry.
+ - If the wrong menu opened, retry secondary-clicking the row or cell wrapper rather than the nested text node.
+ - If the app has multiple windows, make the correct window frontmost before relying on refs.
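The macOS context-menu loop described above (snapshot, secondary-click the target, snapshot again, act on a `menu-item`) can be captured in a small shell function. This is a hedged sketch, not part of agent-device: `context_menu_action` and the `AGENT_DEVICE` override variable are hypothetical, and passing a selector such as `role=menu-item label="Rename"` directly to `click` is an assumption. An agent could instead read the menu item's `@ref` from the second snapshot.

```bash
#!/bin/sh
# Sketch of the macOS context-menu loop: snapshot, secondary-click the
# target, re-snapshot, then act on a menu item that only exists once the
# menu is open.
# Assumptions (hypothetical, not agent-device features): AGENT_DEVICE lets
# the loop be dry-run against a stub, and `click` accepts a selector string.
set -eu

AGENT_DEVICE="${AGENT_DEVICE:-agent-device}"

# Usage: context_menu_action <target-ref> <menu-item-selector>
context_menu_action() {
  target_ref="$1"     # e.g. @e66, copied from a previous snapshot
  menu_selector="$2"  # e.g. 'role=menu-item label="Rename"'

  # 1. Snapshot visible content so the target ref is current.
  "$AGENT_DEVICE" snapshot -i
  # 2. Secondary-click the outer row or cell rather than a nested text node.
  "$AGENT_DEVICE" click "$target_ref" --button secondary --platform macos
  # 3. Re-snapshot: menu-item nodes only appear after the menu opens.
  "$AGENT_DEVICE" snapshot -i
  # 4. Act on the newly visible menu item.
  "$AGENT_DEVICE" click "$menu_selector" --platform macos
}
```

For example, `context_menu_action @e66 'role=menu-item label="Rename"'` secondary-clicks the `@e66` row from the context-menu example and then picks `Rename`. Keeping the second snapshot inside the function means the agent never targets a menu item from a snapshot taken before the menu opened.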
@@ -1,56 +1,69 @@
- # Remote Tenancy and Lease Admission
+ # Remote Tenancy
 
- Use this reference for remote daemon HTTP flows that require explicit
- tenant/run admission control.
+ ## When to open this file
 
- ## Transport prerequisites
+ Open this file for remote daemon HTTP flows, including `--remote-config` launches. In these flows, an agent running in a Linux sandbox talks to another `agent-device` instance on a remote macOS host to control devices that are not available locally. This file covers daemon URL setup, authentication, lease allocation, and tenant-scoped command admission.
 
- - Start daemon in HTTP mode (`AGENT_DEVICE_DAEMON_SERVER_MODE=http|dual`).
- - Point remote clients at the host with `AGENT_DEVICE_DAEMON_BASE_URL=http(s)://host:port[/base-path]`
- or `--daemon-base-url <url>` so the CLI skips local daemon discovery/startup.
- - Use `AGENT_DEVICE_DAEMON_AUTH_TOKEN` / `--daemon-auth-token` when the client should send the
- shared daemon token automatically.
- - Direct JSON-RPC callers can use a token in params, `Authorization: Bearer <token>`, or
- `x-agent-device-token`.
- - Prefer an auth hook (`AGENT_DEVICE_HTTP_AUTH_HOOK`) for caller validation and
- tenant injection.
+ ## Main commands to reach for first
 
- ## Lease lifecycle (JSON-RPC)
+ - `agent-device open <app> --remote-config <path> --relaunch`
+ - `AGENT_DEVICE_DAEMON_BASE_URL=...`
+ - `AGENT_DEVICE_DAEMON_AUTH_TOKEN=...`
+ - `curl ... agent_device.lease.allocate`
+ - `curl ... agent_device.lease.heartbeat`
+ - `curl ... agent_device.lease.release`
+ - `agent-device --tenant ... --session-isolation tenant --run-id ... --lease-id ...`
 
- Use `POST /rpc` with JSON-RPC 2.0 methods:
+ ## Most common mistake to avoid
 
- - `agent_device.lease.allocate`
- - `agent_device.lease.heartbeat`
- - `agent_device.lease.release`
+ Do not run a tenant-isolated command unless its `--tenant`, `--run-id`, and `--lease-id` values match the allocated lease. Admission checks require all three to line up.
+
+ ## Preferred remote launch path
+
+ Use this when the agent needs the simplest remote control flow: a Linux sandbox agent talks over HTTP to `agent-device` on a remote macOS host and launches the target app through a checked-in `--remote-config` profile.
+
+ ```bash
+ agent-device open com.example.myapp --remote-config ./agent-device.remote.json --relaunch
+ ```
 
- Example allocate:
+ - This is the preferred remote launch path for sandbox or cloud agents.
+ - For Android React Native relaunch flows, install or reinstall the APK first, then relaunch by installed package name.
+ - Do not use `open <apk|aab> --relaunch`; remote runtime hints are applied through the installed app sandbox.
+
+ ## Lease flow example
 
  ```bash
+ export AGENT_DEVICE_DAEMON_BASE_URL=http://mac-host.example:4310
+ export AGENT_DEVICE_DAEMON_AUTH_TOKEN=<token>
+
  curl -sS "${AGENT_DEVICE_DAEMON_BASE_URL}/rpc" \
  -H "content-type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{"jsonrpc":"2.0","id":"alloc-1","method":"agent_device.lease.allocate","params":{"tenantId":"acme","runId":"run-123","ttlMs":60000}}'
+
+ agent-device \
+ --tenant acme \
+ --session-isolation tenant \
+ --run-id run-123 \
+ --lease-id <lease-id> \
+ session list --json
  ```
 
- Example heartbeat:
+ Heartbeat and release example:
 
  ```bash
  curl -sS "${AGENT_DEVICE_DAEMON_BASE_URL}/rpc" \
  -H "content-type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{"jsonrpc":"2.0","id":"hb-1","method":"agent_device.lease.heartbeat","params":{"leaseId":"<lease-id>","ttlMs":60000}}'
- ```
-
- Example release:
 
- ```bash
  curl -sS "${AGENT_DEVICE_DAEMON_BASE_URL}/rpc" \
  -H "content-type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{"jsonrpc":"2.0","id":"rel-1","method":"agent_device.lease.release","params":{"leaseId":"<lease-id>"}}'
  ```
 
- Example session-locked command request:
+ Session-locked RPC command example:
 
  ```bash
  curl -sS "${AGENT_DEVICE_DAEMON_BASE_URL}/rpc" \
@@ -59,11 +72,34 @@ curl -sS "${AGENT_DEVICE_DAEMON_BASE_URL}/rpc" \
  -d '{"jsonrpc":"2.0","id":"cmd-1","method":"agent_device.command","params":{"session":"qa-ios","command":"snapshot","positionals":[],"meta":{"lockPolicy":"reject","lockPlatform":"ios","tenantId":"acme","runId":"run-123","leaseId":"<lease-id>"}}}'
  ```
 
- Direct RPC callers can send the same session lock concept as the CLI and typed client through `meta.lockPolicy` and optional `meta.lockPlatform`.
+ ## Transport prerequisites
+
+ - Start the daemon in HTTP mode with `AGENT_DEVICE_DAEMON_SERVER_MODE=http|dual`.
+ - Point the client at the remote host with `AGENT_DEVICE_DAEMON_BASE_URL=http(s)://host:port[/base-path]`.
+ - Use `AGENT_DEVICE_DAEMON_AUTH_TOKEN` or `--daemon-auth-token` when the client should send the shared daemon token automatically.
+ - Direct JSON-RPC callers can authenticate with a token in the request params, an `Authorization: Bearer <token>` header, or an `x-agent-device-token` header.
+ - Prefer an auth hook (`AGENT_DEVICE_HTTP_AUTH_HOOK`) when the host needs caller validation or tenant injection.
+
+ ## Lease lifecycle
+
+ Use JSON-RPC methods on `POST /rpc`:
+
+ - `agent_device.lease.allocate`
+ - `agent_device.lease.heartbeat`
+ - `agent_device.lease.release`
+
+ Heartbeat the lease only while the run is active, and release it as soon as the tenant-scoped work completes or fails.
+
+ Host-level lease knobs:
+
+ - `AGENT_DEVICE_MAX_SIMULATOR_LEASES`
+ - `AGENT_DEVICE_LEASE_TTL_MS`
+ - `AGENT_DEVICE_LEASE_MIN_TTL_MS`
+ - `AGENT_DEVICE_LEASE_MAX_TTL_MS`
 
  ## Command admission contract
 
- For tenant-isolated command execution, pass all four flags:
+ For tenant-isolated command execution, pass all four CLI flags together:
 
  ```bash
  agent-device \
@@ -74,25 +110,11 @@ agent-device \
  session list --json
  ```
 
- Admission checks require tenant/run/lease scope alignment.
-
- The CLI sends `AGENT_DEVICE_DAEMON_AUTH_TOKEN` in both the JSON-RPC request token field and HTTP
- auth headers so existing daemon auth paths continue to work.
-
- ## Failure semantics
-
- - Missing tenant/run/lease fields in tenant isolation mode: `INVALID_ARGS`
- - Lease not active or wrong scope: `UNAUTHORIZED`
- - Method mismatch: JSON-RPC `-32601` (HTTP 404)
+ The CLI sends `AGENT_DEVICE_DAEMON_AUTH_TOKEN` in both the JSON-RPC request token field and HTTP auth headers so existing daemon auth paths continue to work.
 
- ## Operational guidance
+ ## Failure semantics and trust notes
 
- - Keep TTL short and heartbeat only while a run is active.
- - Release lease immediately on run completion/error paths.
- - For remote debug sessions, inspect logs on the remote host; client-side `--debug` no longer tails
- a local daemon log when `AGENT_DEVICE_DAEMON_BASE_URL` is set.
- - For bounded hosts, configure:
- - `AGENT_DEVICE_MAX_SIMULATOR_LEASES`
- - `AGENT_DEVICE_LEASE_TTL_MS`
- - `AGENT_DEVICE_LEASE_MIN_TTL_MS`
- - `AGENT_DEVICE_LEASE_MAX_TTL_MS`
+ - Missing tenant, run, or lease fields in tenant-isolation mode should fail as `INVALID_ARGS`.
+ - Inactive or scope-mismatched leases should fail as `UNAUTHORIZED`.
+ - Inspect logs on the remote host during remote debugging. Client-side `--debug` does not tail a local daemon log once `AGENT_DEVICE_DAEMON_BASE_URL` is set.
+ - Treat daemon auth tokens and lease identifiers as sensitive operational data.
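The lease lifecycle above (allocate, heartbeat while the run is active, release on completion or error) can be wrapped in a few shell helpers. This is a hedged sketch, not part of agent-device. The function names (`lease_allocate_body`, `rpc`, `heartbeat_loop`, `with_lease`) are hypothetical, and the heartbeat interval of about a third of the TTL is an arbitrary choice. The caller is assumed to extract the lease id from the allocate response, whose shape this reference does not show. The JSON bodies mirror the `curl` examples above, and `rpc` sends the exported `AGENT_DEVICE_DAEMON_AUTH_TOKEN` as the Bearer token.

```bash
#!/bin/sh
# Lease-lifecycle sketch for a remote agent-device daemon.
# Assumptions: helper names are hypothetical; tenant/run/lease values are
# simple identifiers that need no JSON escaping; the caller parses the lease
# id out of the allocate response (its shape is not documented here).
set -eu

lease_allocate_body() {  # <tenantId> <runId> <ttlMs>
  printf '{"jsonrpc":"2.0","id":"alloc-1","method":"agent_device.lease.allocate","params":{"tenantId":"%s","runId":"%s","ttlMs":%s}}' "$1" "$2" "$3"
}

lease_heartbeat_body() {  # <leaseId> <ttlMs>
  printf '{"jsonrpc":"2.0","id":"hb-1","method":"agent_device.lease.heartbeat","params":{"leaseId":"%s","ttlMs":%s}}' "$1" "$2"
}

lease_release_body() {  # <leaseId>
  printf '{"jsonrpc":"2.0","id":"rel-1","method":"agent_device.lease.release","params":{"leaseId":"%s"}}' "$1"
}

# POST one JSON-RPC body to the daemon, reusing the exported shared token.
rpc() {  # <json-body>
  curl -sS "${AGENT_DEVICE_DAEMON_BASE_URL}/rpc" \
    -H "content-type: application/json" \
    -H "Authorization: Bearer ${AGENT_DEVICE_DAEMON_AUTH_TOKEN}" \
    -d "$1"
}

# Send a heartbeat every <intervalSeconds> until this process is killed.
heartbeat_loop() {  # <leaseId> <ttlMs> <intervalSeconds>
  while :; do
    sleep "$3"
    rpc "$(lease_heartbeat_body "$1" "$2")" >/dev/null || true
  done
}

# Run a command (for example, a function holding the whole tenant-scoped run)
# while a background loop heartbeats the lease, then release the lease whether
# the command succeeded or failed. Returns the command's exit status.
with_lease() {  # <leaseId> <ttlMs> <command...>
  lease_id="$1"
  ttl_ms="$2"
  shift 2
  interval=$((ttl_ms / 3000))  # roughly a third of the TTL, in seconds
  [ "$interval" -ge 1 ] || interval=1
  heartbeat_loop "$lease_id" "$ttl_ms" "$interval" >/dev/null 2>&1 &
  hb_pid=$!
  cmd_status=0
  "$@" || cmd_status=$?
  kill "$hb_pid" 2>/dev/null || true
  wait "$hb_pid" 2>/dev/null || true
  rpc "$(lease_release_body "$lease_id")" >/dev/null || true
  return "$cmd_status"
}
```

A run might then look like `with_lease "$LEASE_ID" 60000 run_checks`, where `run_checks` is a hypothetical function that calls `agent-device --tenant acme --session-isolation tenant --run-id run-123 --lease-id "$LEASE_ID" ...` for each step. Releasing inside `with_lease` rather than at the end of `run_checks` also covers the error path. Like the `curl` examples above, passing the token with `-H` exposes it in the local process list.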