agent-device 0.10.0 → 0.10.2

Files changed (76)
  1. package/README.md +4 -607
  2. package/dist/src/331.js +3 -3
  3. package/dist/src/425.js +1 -0
  4. package/dist/src/bin.js +28 -28
  5. package/dist/src/core/dispatch.d.ts +2 -0
  6. package/dist/src/core/session-surface.d.ts +3 -0
  7. package/dist/src/core/settings-contract.d.ts +2 -1
  8. package/dist/src/daemon/android-system-dialog.d.ts +11 -0
  9. package/dist/src/daemon/app-log-ios.d.ts +2 -1
  10. package/dist/src/daemon/app-log-process.d.ts +1 -1
  11. package/dist/src/daemon/app-log.d.ts +1 -1
  12. package/dist/src/daemon/context.d.ts +2 -0
  13. package/dist/src/daemon/handlers/interaction-common.d.ts +30 -1
  14. package/dist/src/daemon/handlers/interaction-read.d.ts +14 -0
  15. package/dist/src/daemon/handlers/interaction-touch.d.ts +45 -0
  16. package/dist/src/daemon/handlers/interaction.d.ts +2 -0
  17. package/dist/src/daemon/handlers/record-trace-android.d.ts +18 -0
  18. package/dist/src/daemon/handlers/record-trace-ios.d.ts +52 -0
  19. package/dist/src/daemon/handlers/record-trace-recording.d.ts +32 -0
  20. package/dist/src/daemon/handlers/record-trace.d.ts +2 -7
  21. package/dist/src/daemon/handlers/snapshot-capture.d.ts +11 -4
  22. package/dist/src/daemon/record-trace-errors.d.ts +6 -0
  23. package/dist/src/daemon/recording-gestures.d.ts +3 -0
  24. package/dist/src/daemon/recording-telemetry.d.ts +20 -0
  25. package/dist/src/daemon/recording-timing.d.ts +24 -0
  26. package/dist/src/daemon/request-router.d.ts +6 -0
  27. package/dist/src/daemon/script-utils.d.ts +1 -0
  28. package/dist/src/daemon/snapshot-processing.d.ts +1 -0
  29. package/dist/src/daemon/touch-reference-frame.d.ts +7 -0
  30. package/dist/src/daemon/types.d.ts +65 -11
  31. package/dist/src/daemon.js +62 -36
  32. package/dist/src/platforms/android/index.d.ts +1 -1
  33. package/dist/src/platforms/android/input-actions.d.ts +5 -0
  34. package/dist/src/platforms/android/settings.d.ts +1 -1
  35. package/dist/src/platforms/ios/apps.d.ts +1 -1
  36. package/dist/src/platforms/ios/macos-helper.d.ts +69 -0
  37. package/dist/src/platforms/ios/runner-client.d.ts +2 -2
  38. package/dist/src/platforms/ios/runner-session.d.ts +5 -0
  39. package/dist/src/platforms/ios/runner-xctestrun.d.ts +3 -1
  40. package/dist/src/recording/overlay.d.ts +10 -0
  41. package/dist/src/utils/command-schema.d.ts +2 -0
  42. package/dist/src/utils/interactors.d.ts +8 -8
  43. package/dist/src/utils/snapshot-lines.d.ts +5 -2
  44. package/dist/src/utils/snapshot.d.ts +8 -1
  45. package/dist/src/utils/text-surface.d.ts +19 -0
  46. package/dist/src/utils/video.d.ts +9 -0
  47. package/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerTests+CommandExecution.swift +196 -51
  48. package/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerTests+Interaction.swift +133 -0
  49. package/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerTests+Lifecycle.swift +1 -1
  50. package/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerTests+Models.swift +33 -1
  51. package/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerTests+ScreenRecorder.swift +4 -6
  52. package/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerTests.swift +1 -0
  53. package/ios-runner/AgentDeviceRunner/RecordingScripts/recording-overlay.swift +571 -0
  54. package/ios-runner/AgentDeviceRunner/RecordingScripts/recording-trim.swift +140 -0
  55. package/macos-helper/Package.swift +18 -0
  56. package/macos-helper/Sources/AgentDeviceMacOSHelper/SnapshotTraversal.swift +543 -0
  57. package/macos-helper/Sources/AgentDeviceMacOSHelper/main.swift +545 -0
  58. package/package.json +4 -1
  59. package/skills/agent-device/SKILL.md +25 -334
  60. package/skills/agent-device/references/bootstrap-install.md +167 -0
  61. package/skills/agent-device/references/coordinate-system.md +24 -4
  62. package/skills/agent-device/references/debugging.md +115 -0
  63. package/skills/agent-device/references/exploration.md +193 -0
  64. package/skills/agent-device/references/macos-desktop.md +55 -57
  65. package/skills/agent-device/references/remote-tenancy.md +56 -47
  66. package/skills/agent-device/references/verification.md +103 -0
  67. package/dist/src/274.js +0 -1
  68. package/dist/src/daemon/handlers/interaction-fill.d.ts +0 -3
  69. package/dist/src/daemon/handlers/interaction-press.d.ts +0 -3
  70. package/skills/agent-device/references/batching.md +0 -79
  71. package/skills/agent-device/references/logs-and-debug.md +0 -113
  72. package/skills/agent-device/references/perf-metrics.md +0 -53
  73. package/skills/agent-device/references/permissions.md +0 -70
  74. package/skills/agent-device/references/session-management.md +0 -101
  75. package/skills/agent-device/references/snapshot-refs.md +0 -102
  76. package/skills/agent-device/references/video-recording.md +0 -41
package/skills/agent-device/references/debugging.md +115 -0
@@ -0,0 +1,115 @@
1
+ # Debugging
2
+
3
+ ## When to open this file
4
+
5
+ Open this file when the task turns into failure triage, logs, network inspection, permission prompts, setup trouble, or unstable session behavior.
6
+
7
+ ## Main commands to reach for first
8
+
9
+ - `logs clear --restart`
10
+ - `network dump`
11
+ - `logs path`
12
+ - `logs doctor`
13
+ - `alert wait`
14
+ - `alert accept` or `alert dismiss`
15
+
16
+ ## Most common mistake to avoid
17
+
18
+ Do not leave logging on for normal flows or dump full log files into context. Keep debug windows short and inspect logs with `grep` or `tail`.
19
+
20
+ ## Canonical loop
21
+
22
+ ```bash
23
+ agent-device open MyApp --platform ios
24
+ agent-device logs clear --restart
25
+ agent-device network dump 25
26
+ agent-device logs path
27
+ agent-device close
28
+ ```
29
+
30
+ ## Log and network flow
31
+
32
+ Logging is off by default. Enable it only when you need a debugging window.
33
+
34
+ - Default app logs live under `~/.agent-device/sessions/<session>/app.log`.
35
+ - `logs clear --restart` is the fastest clean repro loop.
36
+ - `network dump [limit] [summary|headers|body|all]` parses recent HTTP(S) entries from the same session app log.
37
+ - `logs doctor` checks backend and runtime readiness for the current session and device.
38
+ - `logs mark "before tap"` inserts a timestamped marker into the app log.
39
+ - Session app logs can contain runtime data, headers, or payload fragments. Review them before sharing.
40
+ - `logs start` requires an active app session and appends to `app.log`.
41
+ - `logs stop` stops streaming. `close` also stops logging.
42
+ - `logs clear` truncates `app.log` and removes rotated `app.log.N` files, and requires logging to be stopped first.
43
+ - `logs path` returns the log path plus metadata about the active backend and file state.
44
+ - `network log` is an alias for `network dump`.
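
Reviewer note: the log commands listed above can be strung into a single short debugging window. The following is a minimal sketch, assuming an app session is already open. `repro_window` and `AGENT_DEVICE_BIN` are hypothetical names introduced here (the variable exists only so the binary can be swapped for a stub); only subcommands named in this file are used.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of agent-device: bracket one repro step with
# log markers, dump recent network entries, then stop logging again.
# AGENT_DEVICE_BIN is an assumption of this sketch, not a documented variable.
repro_window() {
  local label="$1"; shift
  local ad="${AGENT_DEVICE_BIN:-agent-device}"

  "$ad" logs clear --restart || return 1   # start from a clean app.log
  "$ad" logs mark "before $label"          # timestamped marker before the step

  "$@"                                     # the repro step itself
  local status=$?

  "$ad" logs mark "after $label"           # timestamped marker after the step
  "$ad" network dump 25 summary            # recent HTTP(S) entries only
  "$ad" logs stop                          # keep the debug window short
  return "$status"
}
```

Possible usage: `repro_window "tap" agent-device press @e3`. Stopping the log at the end follows the advice above not to leave logging on for normal flows.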
45
+
46
+ Operational limits:
47
+
48
+ - `app.log` rotates to `app.log.1` after 5 MB by default.
49
+ - `network dump` scans the last 4000 app-log lines, returns up to 200 entries, and truncates header or payload fields at 2048 characters.
50
+ - Retention knobs:
51
+ - `AGENT_DEVICE_APP_LOG_MAX_BYTES`
52
+ - `AGENT_DEVICE_APP_LOG_MAX_FILES`
53
+ - Redaction hook:
54
+ - `AGENT_DEVICE_APP_LOG_REDACT_PATTERNS`
55
+
56
+ Useful shell follow-up after `logs path`:
57
+
58
+ ```bash
59
+ grep -n -E "Error|Exception|Fatal|crash" <path>
60
+ tail -50 <path>
61
+ ```
62
+
63
+ ## Alerts and permissions
64
+
65
+ Use `alert` for iOS simulator permission dialogs instead of tapping coordinates.
66
+
67
+ ```bash
68
+ agent-device alert wait 5000
69
+ agent-device alert accept
70
+ ```
71
+
72
+ - `alert` is only supported on iOS simulators.
73
+ - `alert accept` and `alert dismiss` retry internally for a short window, so you usually do not need manual sleeps.
74
+ - iOS 16+ "Allow Paste" prompts are suppressed under XCUITest. Use `xcrun simctl pbcopy booted` when you need to seed simulator clipboard content directly.
75
+
76
+ ## Setup problems worth recognizing early
77
+
78
+ - iOS snapshots do not require macOS Accessibility permissions.
79
+ - iOS physical-device XCTest setup does require valid signing and provisioning.
80
+ - If physical-device runner setup fails, prefer Xcode Automatic Signing first.
81
+ - Optional overrides are:
82
+ - `AGENT_DEVICE_IOS_TEAM_ID`
83
+ - `AGENT_DEVICE_IOS_SIGNING_IDENTITY`
84
+ - `AGENT_DEVICE_IOS_PROVISIONING_PROFILE`
85
+ - `AGENT_DEVICE_IOS_BUNDLE_ID`
86
+ - If daemon startup is timing out during setup, increase `AGENT_DEVICE_DAEMON_TIMEOUT_MS`.
87
+ - If daemon startup fails with stale metadata hints, clean `~/.agent-device/daemon.json` and `~/.agent-device/daemon.lock`, then retry.
88
+ - Free Apple Developer personal-team accounts may reject generic bundle IDs. Use a unique reverse-DNS value for `AGENT_DEVICE_IOS_BUNDLE_ID` when that happens.
89
+
90
+ ## Common failure patterns
91
+
92
+ - `snapshot` returns 0 nodes: the app may no longer be foregrounded or the UI is not stable yet. Re-open the app or retry when state settles.
93
+ - Logs are empty: confirm you opened an app session before `logs clear --restart`.
94
+ - Android logs look stale after relaunch: retry the repro window after the process rebinds.
95
+ - Permission prompts block the flow: wait for the alert and handle it explicitly.
96
+ - If snapshots keep returning 0 nodes on an iOS simulator, restart Simulator and re-open the app.
97
+ - If a macOS snapshot looks incomplete, compare with `snapshot --raw --platform macos` to separate collector filtering from missing AX content.
98
+
99
+ ## Crash triage fast path
100
+
101
+ Always start from the session app log, then branch by platform.
102
+
103
+ ```bash
104
+ agent-device logs path
105
+ grep -n -E "SIGABRT|SIGSEGV|EXC_|fatal|exception|terminated|killed|jetsam|memorystatus|FATAL EXCEPTION|Abort message" <path>
106
+ ```
107
+
108
+ - iOS: if the log suggests `ReportCrash`, `SIGABRT`, or `EXC_*`, inspect `~/Library/Logs/DiagnosticReports`.
109
+ - Android: if the app log is not enough, use `adb logcat` for `FATAL EXCEPTION`, `Abort message`, or `signal` lines around process death.
110
+ - If no crash signature appears in app logs, stop collecting broad logs and switch to the platform-native crash source.
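
Reviewer note: the fast path above amounts to one grep followed by a platform branch. A minimal sketch, assuming the log path was obtained from `agent-device logs path`; `crash_next_step` is a hypothetical helper name, and the pattern is copied from the block above.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of agent-device: search the session app log
# for crash signatures, then name the platform-native crash source to check.
crash_next_step() {
  local log_path="$1" platform="$2"
  local sig='SIGABRT|SIGSEGV|EXC_|fatal|exception|terminated|killed|jetsam|memorystatus|FATAL EXCEPTION|Abort message'

  if [ ! -r "$log_path" ]; then
    echo "cannot read log: $log_path" >&2
    return 1
  fi

  # Print matching lines with line numbers; otherwise report that none matched.
  if ! grep -n -E "$sig" "$log_path"; then
    echo "no crash signature in app log"
  fi

  case "$platform" in
    ios)     echo "next: inspect ~/Library/Logs/DiagnosticReports" ;;
    android) echo "next: adb logcat around process death" ;;
    *)       echo "unknown platform: $platform" >&2; return 2 ;;
  esac
}
```

Possible usage: `crash_next_step <path> ios`, where `<path>` is the value reported by `logs path`.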
111
+
112
+ ## When to leave this file
113
+
114
+ - Return to [exploration.md](exploration.md) once the app is stable again.
115
+ - Load [verification.md](verification.md) if you need evidence artifacts after reproducing the issue.
package/skills/agent-device/references/exploration.md +193 -0
@@ -0,0 +1,193 @@
1
+ # Exploration
2
+
3
+ ## When to open this file
4
+
5
+ Open this file when the app or screen is already running and you need to discover the UI, choose targets, read state, wait for conditions, or perform normal interactions.
6
+
7
+ ## Main commands to reach for first
8
+
9
+ - `snapshot`
10
+ - `snapshot -i`
11
+ - `press`
12
+ - `fill`
13
+ - `get`
14
+ - `is`
15
+ - `wait`
16
+ - `find`
17
+
18
+ ## Most common mistake to avoid
19
+
20
+ Do not treat `@ref` values as durable after navigation or dynamic updates. Re-snapshot after the UI changes, and switch to selectors when the flow must stay stable.
21
+
22
+ ## Canonical loop
23
+
24
+ ```bash
25
+ agent-device open Settings --platform ios
26
+ agent-device snapshot -i
27
+ agent-device press @e3
28
+ agent-device wait visible 'label="Privacy & Security"' 3000
29
+ agent-device get text 'label="Privacy & Security"'
30
+ agent-device close
31
+ ```
32
+
33
+ ## Snapshot choices
34
+
35
+ - Use plain `snapshot` when you only need to verify whether visible text or structure is on screen.
36
+ - Use `snapshot -i` when you need refs such as `@e3` for interactive exploration.
37
+ - Treat large text-surface lines in `snapshot -i` as discovery output. If a node shows preview/truncation metadata, use `get text @ref` to expand the actual text after you choose the surface.
38
+ - Use `snapshot -i -s "Camera"` or `snapshot -i -s @e3` when you want a smaller, scoped result.
39
+
40
+ Example:
41
+
42
+ ```bash
43
+ agent-device snapshot -i
44
+ ```
45
+
46
+ Sample output:
47
+
48
+ ```text
49
+ Page: com.apple.Preferences
50
+ App: com.apple.Preferences
51
+
52
+ @e1 [ioscontentgroup]
53
+ @e2 [button] "Camera"
54
+ @e3 [button] "Privacy & Security"
55
+ ```
56
+
57
+ ## Refs vs selectors
58
+
59
+ - Use refs for discovery, debugging, and short local loops.
60
+ - Use selectors for deterministic scripts, assertions, and replay-friendly actions.
61
+ - Prefer selector or `@ref` targeting over raw coordinates.
62
+ - For tap interactions, `press` is canonical and `click` is an equivalent alias.
63
+
64
+ Examples:
65
+
66
+ ```bash
67
+ agent-device press @e2
68
+ agent-device fill @e5 "test"
69
+ agent-device press 'id="camera_row" || label="Camera" role=button'
70
+ agent-device is visible 'id="camera_settings_anchor"'
71
+ ```
72
+
73
+ ## Text entry rules
74
+
75
+ - Use `fill` to replace text in an editable field.
76
+ - Use `type` to append text to the current insertion point.
77
+
78
+ ## Query and sync rules
79
+
80
+ - Use `get` to read text, attrs, or state from a known target.
81
+ - Use `is` for assertions.
82
+ - Use `wait` when the UI needs time to settle after a mutation.
83
+ - Use `find "<query>" click --json` when you need search-driven targeting plus matched-target metadata.
84
+ - If you are forced onto raw coordinates, open [coordinate-system.md](coordinate-system.md) first.
85
+
86
+ Example:
87
+
88
+ ```bash
89
+ agent-device find "Increment" click --json
90
+ ```
91
+
92
+ Returned metadata comes from the matched snapshot node and can be used for observability or replay maintenance.
93
+
94
+ ## QA from acceptance criteria
95
+
96
+ Use this loop when the task starts from acceptance criteria and you need to turn them into concrete checks.
97
+
98
+ Preferred mapping:
99
+
100
+ - visibility or presence claim: `is visible` or plain `snapshot`
101
+ - exact text, label, or value claim: `get text`
102
+ - post-action state change: act, then `wait`, then `is` or `get`
103
+ - nearby structural UI change: `diff snapshot`
104
+ - proof artifact for the final result: `screenshot` or `record`
105
+
106
+ Anti-hallucination rules:
107
+
108
+ - Do not invent app names, device ids, session names, refs, selectors, or package names.
109
+ - Discover them first with `devices`, `open`, `snapshot -i`, `find`, or `session list`.
110
+ - If refs drift after navigation, re-snapshot or switch to selectors instead of guessing.
111
+
112
+ Canonical QA loop:
113
+
114
+ ```bash
115
+ agent-device open MyApp --platform ios
116
+ agent-device snapshot -i
117
+ agent-device press @e3
118
+ agent-device wait visible 'label="Success"' 3000
119
+ agent-device is visible 'label="Success"'
120
+ agent-device screenshot /tmp/qa-proof.png
121
+ agent-device close
122
+ ```
123
+
124
+ ## Accessibility audit
125
+
126
+ Use this pattern when you need to find UI that is visible to a user but missing from the accessibility tree.
127
+
128
+ Audit loop:
129
+
130
+ 1. Capture a `screenshot` to see what is visually rendered.
131
+ 2. Capture a `snapshot` or `snapshot -i` to see what the accessibility tree exposes.
132
+ 3. Compare the two:
133
+ - visible in screenshot and present in snapshot: exposed to accessibility
134
+ - visible in screenshot and missing from snapshot: likely accessibility gap
135
+ 4. If you suspect the node exists in AX but is filtered from interactive output, retry with `snapshot --raw`.
136
+
137
+ Example:
138
+
139
+ ```bash
140
+ agent-device screenshot /tmp/accessibility-screen.png
141
+ agent-device snapshot -i
142
+ ```
143
+
144
+ Use `screenshot` as the visual source of truth and `snapshot` as the accessibility source of truth for this audit.
145
+
146
+ ## Batch only when the sequence is already known
147
+
148
+ Use `batch` when a short command sequence is already planned and belongs to one logical screen flow.
149
+
150
+ ```bash
151
+ agent-device batch --session sim --platform ios --steps-file /tmp/batch-steps.json --json
152
+ ```
153
+
154
+ - Keep batch size moderate, roughly 5 to 20 steps.
155
+ - Add `wait` or `is exists` guards after mutating steps.
156
+ - Do not use `batch` for highly dynamic flows that need replanning after each step.
157
+
158
+ Step payload contract:
159
+
160
+ ```json
161
+ [
162
+ { "command": "open", "positionals": ["Settings"], "flags": { "platform": "ios" } },
163
+ { "command": "wait", "positionals": ["label=\"Privacy & Security\"", "3000"], "flags": {} },
164
+ { "command": "click", "positionals": ["label=\"Privacy & Security\""], "flags": {} },
165
+ { "command": "get", "positionals": ["text", "label=\"Tracking\""], "flags": {} }
166
+ ]
167
+ ```
168
+
169
+ - `positionals` is optional and defaults to `[]`.
170
+ - `flags` is optional and defaults to `{}`.
171
+ - Nested `batch` and `replay` are rejected.
172
+ - Supported error mode is stop-on-first-error.
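
Reviewer note: a steps file that follows the contract above can be generated from a script. A minimal sketch, assuming the step shapes shown in the payload example; `write_batch_steps` is a hypothetical helper name. Empty `flags` objects are omitted because the contract marks them optional, and a `wait` guard follows the mutating `click` step.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of agent-device: write a batch steps file.
# Step shapes are copied from the payload example above.
write_batch_steps() {
  local out="$1"
  cat > "$out" <<'JSON'
[
  { "command": "open", "positionals": ["Settings"], "flags": { "platform": "ios" } },
  { "command": "wait", "positionals": ["label=\"Privacy & Security\"", "3000"] },
  { "command": "click", "positionals": ["label=\"Privacy & Security\""] },
  { "command": "wait", "positionals": ["label=\"Tracking\"", "3000"] },
  { "command": "get", "positionals": ["text", "label=\"Tracking\""] }
]
JSON
}
```

Possible usage: `write_batch_steps /tmp/batch-steps.json`, then pass the file to `agent-device batch --session sim --platform ios --steps-file /tmp/batch-steps.json --json` as in the command above.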
173
+
174
+ Response handling:
175
+
176
+ - Success returns fields such as `total`, `executed`, `totalDurationMs`, and `results[]`.
177
+ - Failed runs include `details.step`, `details.command`, `details.executed`, and `details.partialResults`.
178
+ - Replan from the first failing step instead of rerunning the whole flow blindly.
179
+
180
+ Common batch error categories:
181
+
182
+ - `INVALID_ARGS`: fix the payload shape and retry.
183
+ - `SESSION_NOT_FOUND`: open or select the correct session, then retry.
184
+ - `UNSUPPORTED_OPERATION`: switch to a supported command or surface.
185
+ - `AMBIGUOUS_MATCH`: refine the selector or locator, then retry the failed step.
186
+ - `COMMAND_FAILED`: add sync guards and retry from the failing step.
187
+
188
+ ## Stop conditions
189
+
190
+ - If refs drift after transitions, switch to selectors.
191
+ - If a desktop surface or context menu is involved on macOS, load [macos-desktop.md](macos-desktop.md).
192
+ - If logs, network, alerts, or setup failures become the blocker, switch to [debugging.md](debugging.md).
193
+ - If the flow is stable and you need proof or replay maintenance, switch to [verification.md](verification.md).
package/skills/agent-device/references/macos-desktop.md +55 -57
@@ -1,89 +1,87 @@
1
- # macOS Desktop Automation
1
+ # macOS Desktop
2
2
 
3
- Use this reference for host Mac apps such as Finder, TextEdit, System Settings, Preview, or browser apps running as normal desktop windows.
3
+ ## When to open this file
4
4
 
5
- ## Mental model
5
+ Open this file only when `--platform macos` is involved or the task needs `frontmost-app`, `desktop`, or `menubar` surfaces.
6
6
 
7
- - `snapshot -i` should describe UI that is visible to a human in the current front window.
8
- - Context menus are not ambient UI. Open them explicitly with `click --button secondary`, then re-snapshot.
9
- - Prefer refs for exploration and selectors for deterministic replay/assertions.
10
- - Avoid raw `x y` coordinates unless refs/selectors are impossible.
7
+ ## Main commands to reach for first
11
8
 
12
- ## Canonical flow
9
+ - `open <app> --platform macos`
10
+ - `open --platform macos --surface frontmost-app|desktop|menubar`
11
+ - `snapshot -i`
12
+ - `get`
13
+ - `is`
14
+ - `click --button secondary`
15
+
16
+ ## Most common mistake to avoid
17
+
18
+ Do not treat every macOS surface the same. Use the normal `app` surface when you want to act inside one app. Use `frontmost-app`, `desktop`, or `menubar` mainly to inspect what is visible before switching back to `app` for most interactions.
19
+
20
+ ## Canonical loop
13
21
 
14
22
  ```bash
15
- agent-device open Finder --platform macos
16
- agent-device snapshot -i
17
- agent-device click @e66 --button secondary --platform macos
23
+ agent-device open TextEdit --platform macos
18
24
  agent-device snapshot -i
25
+ agent-device fill @e3 "desktop smoke test"
19
26
  agent-device close
20
27
  ```
21
28
 
22
- ## What to expect from snapshots
29
+ ## Surface rules
30
+
31
+ - `app`: default surface and the normal choice for `click`, `fill`, `press`, `scroll`, `screenshot`, and `record`.
32
+ - `frontmost-app`: inspect the currently focused app without naming it first.
33
+ - `desktop`: inspect visible desktop windows across apps.
34
+ - `menubar`: inspect the active app menu bar and system menu extras.
35
+
36
+ Use inspect-first surfaces to understand desktop-global UI, then switch back to `app` when you need to act in one app.
37
+
38
+ ## Snapshot expectations
23
39
 
24
- - `snapshot -i` prioritizes visible window content over dormant menu infrastructure.
25
- - File rows, sidebar items, toolbar controls, search fields, and visible context menus should appear.
26
- - Finder and other native apps may expose duplicate-looking structures such as row wrapper nodes, `cell` nodes, and child `text` or `text-field` nodes.
27
- - Treat those as distinct AX nodes unless you have a stronger selector anchor.
40
+ - `snapshot -i` should describe UI visible to a human.
41
+ - `desktop` snapshots can include multiple windows from multiple apps.
42
+ - `menubar` snapshots can include both app-menu items and system menu extras.
43
+ - Finder-style rows, sidebar items, toolbar controls, search fields, and opened context menus should appear when visible.
44
+ - Finder and other native apps may expose duplicate-looking row, cell, and child text nodes. Treat them as distinct AX nodes unless you have a stronger selector anchor.
28
45
 
29
46
  ## Context menus
30
47
 
31
- Use secondary click when the app exposes actions only through the contextual menu.
48
+ Context menus are not ambient UI. Open them explicitly, then re-snapshot.
32
49
 
33
50
  ```bash
34
51
  agent-device click @e66 --button secondary --platform macos
35
52
  agent-device snapshot -i
36
53
  ```
37
54
 
38
- Expected pattern:
55
+ Expected loop:
39
56
 
40
57
  1. Snapshot visible content.
41
- 2. Secondary-click the target row/item.
58
+ 2. Secondary-click the target item.
42
59
  3. Snapshot again.
43
- 4. Interact with newly visible `menu-item` nodes.
44
-
45
- Do not expect context-menu items to appear before the menu is opened.
46
-
47
- ## Finder-specific guidance
48
-
49
- - `snapshot -i` should still expose visible folder rows even when nothing is selected.
50
- - Unselected folder contents should still be visible in `snapshot -i` through list/table rows.
51
- - A file row may expose multiple nodes with the same label, including a row container, name cell, and child text/text-field.
52
- - For opening a context menu, prefer the outer visible row/cell ref over a nested text child if both exist.
53
- - After secondary click, expect actions such as `Rename`, `Quick Look`, `Copy`, `Compress`, and tag-related items in the next snapshot.
54
-
55
- ## Raw snapshots
56
-
57
- Use `snapshot --raw` only when debugging AX structure or collector issues.
58
-
59
- ```bash
60
- agent-device snapshot --raw --platform macos
61
- ```
60
+ 4. Interact with the new `menu-item` nodes.
62
61
 
63
- - Raw output is larger and less token-efficient.
64
- - It is useful for verifying whether missing UI is absent from the AX tree or only filtered from interactive output.
65
- - Do not use raw output as the default agent loop when `snapshot -i` already shows the visible window content you need.
62
+ ## Targeting rules
66
63
 
67
- ## Selector guidance
64
+ - Prefer selectors or `@ref` values over raw coordinates.
65
+ - On macOS, window position can vary across runs, so coordinate-only flows are fragile.
66
+ - If the task only needs shared exploration rules, return to [exploration.md](exploration.md).
68
67
 
69
- Good macOS selectors usually anchor on one of:
68
+ Selector guidance:
70
69
 
71
- - `label="Downloads"`
72
- - `label="failed-step.json"`
73
- - `role=button label="Search"`
74
- - `role=menu-item label="Rename"`
70
+ - Good selectors usually anchor on stable labels or app-owned identifiers such as `label="Downloads"` or `role=menu-item label="Rename"`.
71
+ - Avoid relying on framework-generated `_NS:*` identifiers as stable selectors.
75
72
 
76
- Prefer exact labels when the desktop UI is stable. Use `id=...` when the AX identifier is clearly app-owned and not a framework-generated `_NS:*` value.
73
+ Use `snapshot --raw --platform macos` only when debugging AX structure or collector filtering. Do not make raw snapshots the default agent loop.
77
74
 
78
- ## Things not to rely on
75
+ Things not to rely on:
79
76
 
80
- - Mobile-only helpers like `install`, `reinstall`, `push`, `logs`, `network`, or generic `alert`
81
- - Long-press as a substitute for right-click
82
- - Raw coordinate assumptions across runs; macOS windows can move
83
- - Framework-generated `_NS:*` identifiers as stable selectors
77
+ - Mobile-only helpers such as `install`, `reinstall`, or `push`.
78
+ - Desktop-global click or fill parity from `desktop` or `menubar` sessions.
79
+ - Raw coordinate assumptions across runs.
84
80
 
85
- ## Troubleshooting
81
+ Troubleshooting:
86
82
 
87
- - If visible window content is missing from `snapshot -i`, re-snapshot once after the UI settles.
88
- - If the wrong menu opened or no menu appeared, retry secondary-clicking the row/cell wrapper instead of the nested text node.
89
- - If the app has multiple windows, ensure the correct one is frontmost before relying on refs.
83
+ - If visible content is missing from `snapshot -i`, re-snapshot after the UI settles.
84
+ - If `desktop` is too broad, retry with `frontmost-app`.
85
+ - If `menubar` is missing the expected menu, make the app frontmost first and retry.
86
+ - If the wrong menu opened, retry secondary-clicking the row or cell wrapper rather than the nested text node.
87
+ - If the app has multiple windows, make the correct window frontmost before relying on refs.
package/skills/agent-device/references/remote-tenancy.md +56 -47
@@ -1,56 +1,56 @@
1
- # Remote Tenancy and Lease Admission
1
+ # Remote Tenancy
2
2
 
3
- Use this reference for remote daemon HTTP flows that require explicit
4
- tenant/run admission control.
3
+ ## When to open this file
5
4
 
6
- ## Transport prerequisites
5
+ Open this file only for remote daemon HTTP flows that require explicit daemon URL setup, authentication, lease allocation, or tenant-scoped command admission.
7
6
 
8
- - Start daemon in HTTP mode (`AGENT_DEVICE_DAEMON_SERVER_MODE=http|dual`).
9
- - Point remote clients at the host with `AGENT_DEVICE_DAEMON_BASE_URL=http(s)://host:port[/base-path]`
10
- or `--daemon-base-url <url>` so the CLI skips local daemon discovery/startup.
11
- - Use `AGENT_DEVICE_DAEMON_AUTH_TOKEN` / `--daemon-auth-token` when the client should send the
12
- shared daemon token automatically.
13
- - Direct JSON-RPC callers can use a token in params, `Authorization: Bearer <token>`, or
14
- `x-agent-device-token`.
15
- - Prefer an auth hook (`AGENT_DEVICE_HTTP_AUTH_HOOK`) for caller validation and
16
- tenant injection.
7
+ ## Main commands to reach for first
17
8
 
18
- ## Lease lifecycle (JSON-RPC)
9
+ - `AGENT_DEVICE_DAEMON_BASE_URL=...`
10
+ - `AGENT_DEVICE_DAEMON_AUTH_TOKEN=...`
11
+ - `curl ... agent_device.lease.allocate`
12
+ - `curl ... agent_device.lease.heartbeat`
13
+ - `curl ... agent_device.lease.release`
14
+ - `agent-device --tenant ... --session-isolation tenant --run-id ... --lease-id ...`
19
15
 
20
- Use `POST /rpc` with JSON-RPC 2.0 methods:
16
+ ## Most common mistake to avoid
21
17
 
22
- - `agent_device.lease.allocate`
23
- - `agent_device.lease.heartbeat`
24
- - `agent_device.lease.release`
18
+ Do not run a tenant-isolated command without matching `tenant`, `run`, and `lease` scope. Admission checks require all three to line up.
25
19
 
26
- Example allocate:
20
+ ## Canonical loop
27
21
 
28
22
  ```bash
23
+ export AGENT_DEVICE_DAEMON_BASE_URL=http://mac-host.example:4310
24
+ export AGENT_DEVICE_DAEMON_AUTH_TOKEN=<token>
25
+
29
26
  curl -sS "${AGENT_DEVICE_DAEMON_BASE_URL}/rpc" \
30
27
  -H "content-type: application/json" \
31
28
  -H "Authorization: Bearer <token>" \
32
29
  -d '{"jsonrpc":"2.0","id":"alloc-1","method":"agent_device.lease.allocate","params":{"tenantId":"acme","runId":"run-123","ttlMs":60000}}'
30
+
31
+ agent-device \
32
+ --tenant acme \
33
+ --session-isolation tenant \
34
+ --run-id run-123 \
35
+ --lease-id <lease-id> \
36
+ session list --json
33
37
  ```
34
38
 
35
- Example heartbeat:
39
+ Heartbeat and release example:
36
40
 
37
41
  ```bash
38
42
  curl -sS "${AGENT_DEVICE_DAEMON_BASE_URL}/rpc" \
39
43
  -H "content-type: application/json" \
40
44
  -H "Authorization: Bearer <token>" \
41
45
  -d '{"jsonrpc":"2.0","id":"hb-1","method":"agent_device.lease.heartbeat","params":{"leaseId":"<lease-id>","ttlMs":60000}}'
42
- ```
43
-
44
- Example release:
45
46
 
46
- ```bash
47
47
  curl -sS "${AGENT_DEVICE_DAEMON_BASE_URL}/rpc" \
48
48
  -H "content-type: application/json" \
49
49
  -H "Authorization: Bearer <token>" \
50
50
  -d '{"jsonrpc":"2.0","id":"rel-1","method":"agent_device.lease.release","params":{"leaseId":"<lease-id>"}}'
51
51
  ```
52
52
 
53
- Example session-locked command request:
53
+ Session-locked RPC command example:
54
54
 
55
55
  ```bash
56
56
  curl -sS "${AGENT_DEVICE_DAEMON_BASE_URL}/rpc" \
@@ -59,11 +59,34 @@ curl -sS "${AGENT_DEVICE_DAEMON_BASE_URL}/rpc" \
59
59
  -d '{"jsonrpc":"2.0","id":"cmd-1","method":"agent_device.command","params":{"session":"qa-ios","command":"snapshot","positionals":[],"meta":{"lockPolicy":"reject","lockPlatform":"ios","tenantId":"acme","runId":"run-123","leaseId":"<lease-id>"}}}'
60
60
  ```
61
61
 
62
- Direct RPC callers can send the same session lock concept as the CLI and typed client through `meta.lockPolicy` and optional `meta.lockPlatform`.
62
+ ## Transport prerequisites
63
+
64
+ - Start the daemon in HTTP mode with `AGENT_DEVICE_DAEMON_SERVER_MODE=http|dual`.
65
+ - Point the client at the remote host with `AGENT_DEVICE_DAEMON_BASE_URL=http(s)://host:port[/base-path]`.
66
+ - Use `AGENT_DEVICE_DAEMON_AUTH_TOKEN` or `--daemon-auth-token` when the client should send the shared daemon token automatically.
67
+ - Direct JSON-RPC callers can authenticate with request params, `Authorization: Bearer <token>`, or `x-agent-device-token`.
68
+ - Prefer an auth hook such as `AGENT_DEVICE_HTTP_AUTH_HOOK` when the host needs caller validation or tenant injection.
69
+
70
+ ## Lease lifecycle
71
+
72
+ Use JSON-RPC methods on `POST /rpc`:
73
+
74
+ - `agent_device.lease.allocate`
75
+ - `agent_device.lease.heartbeat`
76
+ - `agent_device.lease.release`
77
+
78
+ Keep the lease alive for the duration of the run and release it when the tenant-scoped work is complete.
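
Reviewer note: keeping a lease alive for the duration of a run and releasing it afterwards can be sketched as a wrapper around the tenant-scoped command. This is a minimal sketch under stated assumptions: `lease_payload`, `lease_rpc`, and `with_lease` are hypothetical helper names, the 60000 ms TTL is taken from the curl examples, and the lease id is passed in as an argument because the allocate response shape is not shown in this diff.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not part of agent-device.

# Build a JSON-RPC 2.0 request body for one agent_device.lease.* method.
lease_payload() {
  local id="$1" method="$2" params="$3"
  printf '{"jsonrpc":"2.0","id":"%s","method":"agent_device.lease.%s","params":%s}' \
    "$id" "$method" "$params"
}

# POST the body to the daemon /rpc endpoint with bearer auth.
lease_rpc() {
  curl -sS "${AGENT_DEVICE_DAEMON_BASE_URL}/rpc" \
    -H "content-type: application/json" \
    -H "Authorization: Bearer ${AGENT_DEVICE_DAEMON_AUTH_TOKEN}" \
    -d "$(lease_payload "$@")"
}

# Heartbeat in the background while the wrapped command runs, then release
# the lease whether the command succeeded or failed.
with_lease() {
  local lease_id="$1" interval_s="$2"; shift 2
  (
    while sleep "$interval_s"; do
      lease_rpc hb heartbeat "{\"leaseId\":\"$lease_id\",\"ttlMs\":60000}" >/dev/null
    done
  ) &
  local hb_pid=$!

  "$@"
  local status=$?

  kill "$hb_pid" 2>/dev/null
  wait "$hb_pid" 2>/dev/null
  lease_rpc rel release "{\"leaseId\":\"$lease_id\"}" >/dev/null
  return "$status"
}
```

Possible usage: `with_lease "<lease-id>" 20 agent-device --tenant acme --session-isolation tenant --run-id run-123 --lease-id "<lease-id>" session list --json`. The interval should stay well below the TTL so a single missed heartbeat does not expire the lease.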
79
+
80
+ Host-level lease knobs:
81
+
82
+ - `AGENT_DEVICE_MAX_SIMULATOR_LEASES`
83
+ - `AGENT_DEVICE_LEASE_TTL_MS`
84
+ - `AGENT_DEVICE_LEASE_MIN_TTL_MS`
85
+ - `AGENT_DEVICE_LEASE_MAX_TTL_MS`
63
86
 
64
87
  ## Command admission contract
65
88
 
66
- For tenant-isolated command execution, pass all four flags:
89
+ For tenant-isolated command execution, pass all four CLI flags together:
67
90
 
68
91
  ```bash
69
92
  agent-device \
@@ -74,25 +97,11 @@ agent-device \
74
97
  session list --json
75
98
  ```
76
99
 
77
- Admission checks require tenant/run/lease scope alignment.
78
-
79
- The CLI sends `AGENT_DEVICE_DAEMON_AUTH_TOKEN` in both the JSON-RPC request token field and HTTP
80
- auth headers so existing daemon auth paths continue to work.
81
-
82
- ## Failure semantics
83
-
84
- - Missing tenant/run/lease fields in tenant isolation mode: `INVALID_ARGS`
85
- - Lease not active or wrong scope: `UNAUTHORIZED`
86
- - Method mismatch: JSON-RPC `-32601` (HTTP 404)
100
+ The CLI sends `AGENT_DEVICE_DAEMON_AUTH_TOKEN` in both the JSON-RPC request token field and HTTP auth headers so existing daemon auth paths continue to work.
87
101
 
88
- ## Operational guidance
102
+ ## Failure semantics and trust notes
89
103
 
90
- - Keep TTL short and heartbeat only while a run is active.
91
- - Release lease immediately on run completion/error paths.
92
- - For remote debug sessions, inspect logs on the remote host; client-side `--debug` no longer tails
93
- a local daemon log when `AGENT_DEVICE_DAEMON_BASE_URL` is set.
94
- - For bounded hosts, configure:
95
- - `AGENT_DEVICE_MAX_SIMULATOR_LEASES`
96
- - `AGENT_DEVICE_LEASE_TTL_MS`
97
- - `AGENT_DEVICE_LEASE_MIN_TTL_MS`
98
- - `AGENT_DEVICE_LEASE_MAX_TTL_MS`
104
+ - Missing tenant, run, or lease fields in tenant-isolation mode should fail as `INVALID_ARGS`.
105
+ - Inactive or scope-mismatched leases should fail as `UNAUTHORIZED`.
106
+ - Inspect logs on the remote host during remote debugging. Client-side `--debug` does not tail a local daemon log once `AGENT_DEVICE_DAEMON_BASE_URL` is set.
107
+ - Treat daemon auth tokens and lease identifiers as sensitive operational data.