agent-device 0.13.2 → 0.14.0
- package/README.md +68 -63
- package/android-snapshot-helper/README.md +75 -0
- package/android-snapshot-helper/dist/agent-device-android-snapshot-helper-0.14.0.apk +0 -0
- package/android-snapshot-helper/dist/agent-device-android-snapshot-helper-0.14.0.apk.sha256 +1 -0
- package/android-snapshot-helper/dist/agent-device-android-snapshot-helper-0.14.0.manifest.json +17 -0
- package/bin/agent-device.mjs +6 -2
- package/dist/src/113.js +1 -1
- package/dist/src/1974.js +2 -2
- package/dist/src/221.js +4 -0
- package/dist/src/2301.js +1 -1
- package/dist/src/3918.js +29 -29
- package/dist/src/7847.js +1 -1
- package/dist/src/8161.js +3 -0
- package/dist/src/8656.js +1 -1
- package/dist/src/9152.js +1 -1
- package/dist/src/940.js +1 -0
- package/dist/src/9542.js +2 -2
- package/dist/src/9818.js +1 -1
- package/dist/src/989.js +1 -1
- package/dist/src/android-snapshot-helper.d.ts +181 -0
- package/dist/src/android-snapshot-helper.js +1 -0
- package/dist/src/index.d.ts +204 -1942
- package/dist/src/index.js +1 -1
- package/dist/src/internal/bin.js +440 -0
- package/dist/src/internal/companion-tunnel.js +1 -0
- package/dist/src/internal/daemon.js +45 -0
- package/dist/src/internal/update-check-entry.js +1 -0
- package/dist/src/metro.d.ts +5 -3
- package/dist/src/selectors.js +1 -1
- package/package.json +28 -24
- package/skills/agent-device/SKILL.md +20 -62
- package/skills/dogfood/SKILL.md +9 -168
- package/skills/react-devtools/SKILL.md +15 -31
- package/dist/src/4993.js +0 -1
- package/dist/src/5721.js +0 -1
- package/dist/src/7166.js +0 -1
- package/dist/src/8564.js +0 -3
- package/dist/src/9076.js +0 -1
- package/dist/src/backend.d.ts +0 -527
- package/dist/src/backend.js +0 -1
- package/dist/src/bin.js +0 -105
- package/dist/src/commands/index.d.ts +0 -1883
- package/dist/src/commands/index.js +0 -1
- package/dist/src/daemon.js +0 -43
- package/dist/src/metro-companion.js +0 -1
- package/dist/src/observability.d.ts +0 -91
- package/dist/src/observability.js +0 -1
- package/dist/src/testing/conformance.d.ts +0 -753
- package/dist/src/testing/conformance.js +0 -1
- package/dist/src/update-check-entry.js +0 -1
- package/skills/agent-device/references/bootstrap-install.md +0 -244
- package/skills/agent-device/references/coordinate-system.md +0 -28
- package/skills/agent-device/references/debugging.md +0 -138
- package/skills/agent-device/references/exploration.md +0 -362
- package/skills/agent-device/references/macos-desktop.md +0 -88
- package/skills/agent-device/references/remote-tenancy.md +0 -188
- package/skills/agent-device/references/verification.md +0 -134
- package/skills/dogfood/references/issue-taxonomy.md +0 -83
- package/skills/dogfood/templates/dogfood-report-template.md +0 -52
- package/skills/react-devtools/references/commands.md +0 -91
- package/skills/react-devtools/references/profiling.md +0 -74
- /package/dist/src/{bin.d.ts → internal/bin.d.ts} +0 -0
- /package/dist/src/{daemon.d.ts → internal/companion-tunnel.d.ts} +0 -0
- /package/dist/src/{metro-companion.d.ts → internal/daemon.d.ts} +0 -0
- /package/dist/src/{update-check-entry.d.ts → internal/update-check-entry.d.ts} +0 -0
@@ -1,362 +0,0 @@
-# Exploration
-
-## When to open this file
-
-Open this file when the app or screen is already running and you need to discover the UI, choose targets, read state, wait for conditions, or perform normal interactions.
-
-## Read-only first
-
-- If the question is what text, labels, or structure is visible on screen, start with plain `snapshot`.
-- Escalate to `snapshot -i` only when you need refs such as `@e3` for interactive exploration or a requested action.
-- If you intend to `press`, `fill`, or otherwise interact, start with `snapshot -i` and fall back to plain `snapshot` only if interactive refs are unavailable.
-- Prefer `get`, `is`, or `find` before mutating the UI when a read-only command can answer the question.
-- You may take the smallest reversible UI action needed to unblock inspection, such as dismissing a popup, closing an alert, or backing out of an unintended surface.
-- Do not type or fill text just to make hidden information easier to access unless the user asked for that interaction.
-- Do not use external sources to infer missing UI state unless the user explicitly asked.
-- If the answer is not visible or exposed in the UI, report that gap instead of compensating with search, navigation, or text entry.
-
-## Decision shortcut
-
-- User asks what is visible on screen: `snapshot`
-- User asks for exact text from a known target: `get text`
-- User asks you to tap, type, or choose an element: `snapshot -i`, then act
-- User asks for the React Native component tree, props/state/hooks, or render profiling: use `agent-device react-devtools ...` and the `skills/react-devtools` workflow
-- User asks to reload a Metro-backed React Native app after JS changes: `agent-device metro reload`, then wait briefly and re-run `snapshot` or `snapshot -i`
-- React Native dev or debug build shows warning/error UI: capture enough evidence to identify it, dismiss it if it is not the requested behavior, then continue the flow and report it in the summary
-- The on-screen keyboard is blocking the next step: `keyboard dismiss`; on iOS do this only while an app session is active, and use `keyboard status|get` only on Android
-- UI does not expose the answer: say so plainly; do not browse or force the app into a new state unless asked
-
-## Read-only commands
-
-- `snapshot`
-- `get`
-- `is`
-- `find`
-- `keyboard status|get` on Android when keyboard visibility or input type matters
-
-## Interaction commands
-
-- `snapshot -i`
-- `press`
-- `fill`
-- `type`
-- `scroll`
-- `wait`
-- `keyboard dismiss` when the keyboard obscures the next target
-
-## Common mistakes to avoid
-
-**Stale refs.** Do not treat `@ref` values as durable after navigation or dynamic updates. Re-snapshot after the UI changes, and switch to selectors when the flow must stay stable.
-
-**Android AX tree lag.** After submits, route changes, or composer transitions, the accessibility tree can lag behind the visible UI. If `snapshot -i` and `screenshot` disagree:
-
-1. Trust the screenshot as visual truth.
-2. Take one fresh `snapshot -i`. Android retries suspicious trees for a short post-action deadline after navigation-sensitive actions.
-3. If the tree still disagrees with the screenshot, wait briefly, then take one more fresh snapshot. Do not loop snapshots immediately.
-4. For animation-heavy Android runs, use `settings animations off` as an opt-in stabilizer and restore with `settings animations on` after the run.
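
Editorial illustration (not part of the removed file): the first three steps above could be sketched as a small bash function. This is a minimal sketch under stated assumptions: `recheck_android_tree` and the `AGENT_DEVICE_BIN` override are hypothetical names introduced here, the expected label stands in for whatever the screenshot shows, and the 1000 ms pause is an assumed value for "wait briefly". Only the `screenshot`, `snapshot -i`, and `wait` subcommands come from the text.

```bash
#!/usr/bin/env bash

# Hypothetical helper: checks whether a label that is visible in the screenshot
# also shows up in the interactive snapshot, with at most two snapshots.
# AGENT_DEVICE_BIN is an assumed override for testing; it defaults to the CLI.
recheck_android_tree() {
  local expected="$1"
  local shot="${2:-/tmp/ax-lag.png}"
  local bin="${AGENT_DEVICE_BIN:-agent-device}"
  local tree

  # Step 1: capture the screenshot that serves as visual truth.
  "$bin" screenshot "$shot" || return 2

  # Step 2: take exactly one fresh interactive snapshot.
  tree="$("$bin" snapshot -i)" || return 2
  if printf '%s\n' "$tree" | grep -qF -- "$expected"; then
    printf '%s\n' "$tree"
    return 0
  fi

  # Step 3: wait briefly (assumed 1000 ms), then take one more snapshot.
  # There is deliberately no loop here.
  "$bin" wait 1000 || return 2
  tree="$("$bin" snapshot -i)" || return 2
  printf '%s\n' "$tree"
  printf '%s\n' "$tree" | grep -qF -- "$expected"
}
```

A call such as `recheck_android_tree "Sent"` returns 0 once the label appears in either snapshot and non-zero if the tree still disagrees after the second one, at which point the screenshot remains the source of truth.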
-
-**React Native dev overlays.** In dev or debug builds, warning or error overlays can block taps, change focus, or hide the real UI. Check for them near app open and after major transitions.
-
-- Not blocking the task: dismiss and continue.
-- Blocking or recurring: switch to [debugging.md](debugging.md) and collect evidence.
-- Seen at any point: mention in the final summary even if dismissed.
-
-**React Native Metro reload.** When a dev app is already running and connected to Metro, prefer a Metro reload over restarting the native app process:
-
-```bash
-agent-device metro reload
-agent-device wait 1000
-agent-device snapshot -i
-```
-
-Use `--metro-host`, `--metro-port`, or `--bundle-url` only when the active connection does not already carry the right runtime hints. Fall back to `open <app> --relaunch` when the app is not connected to Metro, Metro reload fails, or native startup state needs a clean process.
-
-## Common example loops
-
-These are examples, not required exact sequences. Adapt them to the app, state, and task at hand.
-
-### Interactive exploration loop
-
-```bash
-agent-device open Settings --platform ios
-agent-device snapshot -i
-agent-device press @e3
-agent-device wait visible 'label="Privacy & Security"' 3000
-agent-device get text 'label="Privacy & Security"'
-agent-device close
-```
-
-### Screen verification loop
-
-```bash
-agent-device open MyApp --platform ios
-# perform the necessary actions to reach the state you need to verify
-agent-device snapshot
-# verify whether the expected element or text is present
-agent-device close
-```
-
-## Snapshot choices
-
-- Use plain `snapshot` when you only need to verify whether visible text or structure is on screen.
-- Use `snapshot -i` when you need refs such as `@e3` for interactive exploration or for an intended interaction.
-- On iOS and Android, default snapshot output is visible-first. Off-screen interactive content is surfaced as discovery hints (including inline scroll/list hidden-content hints when known), not shown as directly tappable refs.
-- Treat large text-surface lines in `snapshot -i` as discovery output. If a node shows preview or truncation metadata, use `get text @ref` only after you have already decided that `snapshot -i` is needed for that surface.
-- Use `snapshot -i -s "Camera"` or `snapshot -i -s @e3` when you want a smaller, scoped result.
-- If `snapshot -i -s "<query>"` returns 0 nodes, the scope did not match the current screen. Widen the query or re-check the screen state instead of assuming the command silently fell back to the full tree.
-- If `snapshot -i` returns 0 nodes but the screen is visibly populated, treat `screenshot` as visual truth, wait briefly, then re-run `snapshot -i` once before escalating.
-- If `snapshot -i -d <n>` says the interactive output is empty at that depth, retry without `-d` instead of taking more shallow snapshots.
-
-Example:
-
-```bash
-agent-device snapshot -i
-```
-
-Sample output:
-
-```text
-Page: com.apple.Preferences
-App: com.apple.Preferences
-
-@e1 [ioscontentgroup]
-@e2 [button] "Camera"
-@e3 [button] "Privacy & Security"
-[off-screen below] 2 interactive items: "Location Services", "Battery"
-```
-
-## Refs vs selectors
-
-- Use refs for discovery, debugging, and short local loops.
-- When a target appears only in a visible-first off-screen summary, such as `[off-screen below] ... "Battery"`, use `scroll down` and then `snapshot -i`. For `[off-screen above]`, use `scroll up` and then `snapshot -i`.
-- For more than two repeated scroll checks, create a short shell loop instead of issuing each command by hand. Stop when the label appears or the snapshot stops changing.
-- Visible-first off-screen summaries are intentionally compact. If you need the full off-screen tree instead of a short summary, retry with `snapshot --raw`.
-- Cap long searches in the loop when the list may be unbounded or the target may not exist.
-- Use selectors for deterministic scripts, assertions, and replay-friendly actions.
-- Prefer selector or `@ref` targeting over raw coordinates.
-- For tap interactions, `press` is canonical and `click` is an equivalent alias.
-
-Examples:
-
-```bash
-agent-device press @e2
-agent-device fill @e5 "test"
-agent-device press 'id="camera_row" || label="Camera" role=button'
-agent-device is visible 'id="camera_settings_anchor"'
-```
-
-Example loop:
-
-```bash
-previous=''
-for _ in 1 2 3 4 5 6; do
-  current="$(agent-device snapshot -i)"
-  printf '%s\n' "$current"
-  printf '%s\n' "$current" | grep -q 'Battery' && break
-  [ "$current" = "$previous" ] && break
-  previous="$current"
-  agent-device scroll down 0.5 >/dev/null
-done
-```
-
-## Interaction fallbacks
-
-When `press @ref` fails:
-
-1. If the error says the ref is off-screen, use the off-screen summary direction to run `scroll <direction>`, then take a fresh `snapshot -i`.
-2. Re-snapshot if the UI may have changed.
-3. Retry `press @ref` or a selector-based `press`.
-4. If `screenshot --overlay-refs --json` returned a reliable `overlayRefs[].center`, use `agent-device press <x> <y>`.
-5. Use an external vision-based tap tool only after semantic and coordinate targeting fail.
-
-- Prefer `@ref` over coordinates.
-- Do not guess coordinates from the image when structured `center` is available.
-- `agent-device` does not provide a built-in vision-tap flag.
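
Editorial illustration (not part of the removed file): a minimal sketch of the first three fallback steps, assuming the caller already knows a selector for the same target. `press_with_fallback` and the `AGENT_DEVICE_BIN` override are hypothetical names introduced here. The coordinate and vision-based steps are left out because they depend on output this excerpt does not show in full.

```bash
#!/usr/bin/env bash

# Hypothetical helper: try the ref first, then refresh the tree and retry
# with a selector. The optional third argument is the scroll direction taken
# from an off-screen summary ("down" or "up").
# AGENT_DEVICE_BIN is an assumed override for testing; it defaults to the CLI.
press_with_fallback() {
  local ref="$1" selector="$2" direction="${3:-}"
  local bin="${AGENT_DEVICE_BIN:-agent-device}"

  # First attempt: the ref from the most recent snapshot.
  "$bin" press "$ref" && return 0

  # Step 1: scroll toward the target when the caller knows it is off-screen.
  if [ -n "$direction" ]; then
    "$bin" scroll "$direction" || return 1
  fi

  # Step 2: re-snapshot because the UI may have changed.
  "$bin" snapshot -i >/dev/null || return 1

  # Step 3: retry with the selector, since the old ref may now be stale.
  "$bin" press "$selector"
}
```

For example, `press_with_fallback @e9 'label="Battery"' down` falls back to the selector only after the ref-based press fails.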
-
-## Text entry rules
-
-- Use `fill` to replace text in an editable field.
-- Use `type` to append text to the current insertion point.
-- Use `fill @ref "text"` when you need to target a field directly by ref.
-- Use `press @ref`, then `type "text"` when the field is already focused and you need append semantics.
-- Do not write `type @ref "text"`; `type` only accepts text and will not target that ref for you.
-- If the keyboard blocks the next control after text entry, prefer `keyboard dismiss` instead of backing out of the screen.
-- On iOS, `keyboard dismiss` depends on the active app session to keep the target app foregrounded, so do not rely on selector-only dismiss calls after closing or without `open`.
-- Do not use `fill` or `type` just to make the app reveal information that is not currently visible unless the user asked for that interaction.
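
Editorial illustration (not part of the removed file): the replace-versus-append distinction above can be sketched as two small wrappers. `replace_text`, `append_text`, and the `AGENT_DEVICE_BIN` override are hypothetical names introduced here. The subcommand shapes (`fill @ref "text"`, `press @ref`, `type "text"`) are the ones the rules describe.

```bash
#!/usr/bin/env bash

# Hypothetical wrappers around the two text-entry modes.
# AGENT_DEVICE_BIN is an assumed override for testing; it defaults to the CLI.

# Replace the field's contents: `fill` targets the ref directly.
replace_text() {
  local ref="$1" text="$2"
  "${AGENT_DEVICE_BIN:-agent-device}" fill "$ref" "$text"
}

# Append at the insertion point: focus the field with `press`, then `type`.
# `type` is never given the ref, because it only accepts text.
append_text() {
  local ref="$1" text="$2"
  local bin="${AGENT_DEVICE_BIN:-agent-device}"
  "$bin" press "$ref" && "$bin" type "$text"
}
```

Keeping the two modes in separate functions makes it harder to slip into the `type @ref "text"` form that the rules warn against.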
-
-## React Native dev or debug overlays
-
-Use this loop for React Native dev clients, Metro-backed builds, and local debug sessions where warnings or errors may appear as tooltips, banners, toasts, or modal overlays.
-
-1. After `open`, inspect the visible UI for warning or error surfaces before relying on the next tap.
-2. If a warning or error is visible, capture enough evidence to identify it:
-   - preferred: `screenshot`
-   - optional: `logs mark "warning visible"` or `logs mark "error visible"` if you are already in a debug window
-3. If the overlay is not the thing the user asked you to investigate, dismiss or close it with the smallest reversible action.
-4. Re-check the intended screen before continuing the task.
-5. Report any visible warnings or errors in the final summary, even if the flow succeeded after dismissal.
-
-Use this rule of thumb:
-
-- Warning overlay that does not block the task: dismiss and keep going.
-- Error overlay that does not block the task: dismiss, keep going, and report it.
-- Error overlay that blocks the task or keeps returning: stop treating it as noise and switch to [debugging.md](debugging.md).
-
-## Query and sync rules
-
-- Use `get` to read text, attrs, or state from a known target.
-- Use `is` for assertions.
-- Use `wait` when the UI needs time to settle after a mutation.
-- Use `find "<query>" click --json` when you need search-driven targeting plus matched-target metadata.
-- Use `find "<query>" click --first` or `--last` when ambiguous matches are expected and you want the first or last occurrence without falling back to raw coordinates.
-- If you are forced onto raw coordinates, open [coordinate-system.md](coordinate-system.md) first.
-
-Example:
-
-```bash
-agent-device find "Increment" click --json
-```
-
-Returned metadata comes from the matched snapshot node and can be used for observability or replay maintenance.
-
-## QA from acceptance criteria
-
-Use this loop when the task starts from acceptance criteria and you need to turn them into concrete checks.
-
-Preferred mapping:
-
-- visibility claim for what is on-screen now: `is visible` or plain `snapshot`
-- presence claim regardless of viewport visibility: `is exists`
-- exact text, label, or value claim: `get text`
-- post-action state change: act, then `wait`, then `is` or `get`
-- nearby structural UI change: `diff snapshot`
-- proof artifact for the final result: `screenshot` or `record`
-
-Notes:
-
-- `wait text` is useful for synchronizing on text presence, but it is not the same as `is visible`.
-- After a nearby navigation or submit on Android, prefer `screenshot`, then one fresh `snapshot -i`; `@ref` interactions refresh while the Android freshness window is active.
-
-Anti-hallucination rules:
-
-- Do not invent app names, device ids, session names, refs, selectors, or package names.
-- Discover them first with `devices`, `open`, `snapshot -i`, `find`, or `session list`.
-- If refs drift after navigation, re-snapshot or switch to selectors instead of guessing.
-
-Avoid this escalation path for visible-text questions:
-
-- Do not jump from `snapshot -i` to `get text @ref`, then to web search, then to typing into a search box just to force the app to reveal the answer.
-- Start with `snapshot`. If the text is not visible or exposed, report that directly.
-- After Android submit or navigation-heavy actions when the UI looks wrong: `screenshot` first, then `snapshot -i`.
-
-Canonical QA loop:
-
-```bash
-agent-device open MyApp --platform ios
-agent-device snapshot -i
-agent-device press @e3
-agent-device wait visible 'label="Success"' 3000
-agent-device is visible 'label="Success"'
-agent-device screenshot /tmp/qa-proof.png
-agent-device close
-```
-
-## Accessibility audit
-
-Use this pattern when you need to find UI that is visible to a user but missing from the accessibility tree.
-
-Audit loop:
-
-1. Capture a `screenshot` to see what is visually rendered.
-2. Capture a `snapshot` or `snapshot -i` to see what the accessibility tree exposes.
-3. Compare the two:
-   - visible in screenshot and present in snapshot: exposed to accessibility
-   - visible in screenshot and missing from snapshot: likely accessibility gap
-4. If you suspect the node exists in AX but is filtered from interactive output, retry with `snapshot --raw`.
-
-Example:
-
-```bash
-agent-device screenshot /tmp/accessibility-screen.png
-agent-device snapshot -i
-```
-
-Use `screenshot` as the visual source of truth and `snapshot` as the accessibility source of truth for this audit.
-
-## Batch only when the sequence is already known
-
-Use `batch` when a short command sequence is already planned and belongs to one logical screen flow.
-
-```bash
-agent-device batch --session sim --platform ios --steps-file /tmp/batch-steps.json --json
-```
-
-- Keep batch size moderate, roughly 5 to 20 steps.
-- Add `wait` or `is exists` guards after mutating steps.
-- Do not use `batch` for highly dynamic flows that need replanning after each step.
-
-Example: known chat-send flow
-
-```json
-[
-  { "command": "open", "positionals": ["ChatApp"], "flags": { "platform": "android" } },
-  { "command": "click", "positionals": ["label=\"Travel chat\""], "flags": {} },
-  { "command": "wait", "positionals": ["label=\"Message\"", "3000"], "flags": {} },
-  { "command": "fill", "positionals": ["label=\"Message\"", "Filed the expense"], "flags": {} },
-  { "command": "press", "positionals": ["label=\"Send\""], "flags": {} }
-]
-```
-
-Step payload contract:
-
-```json
-[
-  { "command": "open", "positionals": ["Settings"], "flags": { "platform": "ios" } },
-  { "command": "wait", "positionals": ["label=\"Privacy & Security\"", "3000"], "flags": {} },
-  { "command": "click", "positionals": ["label=\"Privacy & Security\""], "flags": {} },
-  { "command": "get", "positionals": ["text", "label=\"Tracking\""], "flags": {} }
-]
-```
-
-- `positionals` is optional and defaults to `[]`.
-- `flags` is optional and defaults to `{}`.
-- Only `command`, `positionals`, `flags`, and `runtime` are accepted as top-level step keys.
-- Nested `batch` and `replay` are rejected.
-- Supported error mode is stop-on-first-error.
-
-Response handling:
-
-- Success returns fields such as `total`, `executed`, `totalDurationMs`, and `results[]`.
-- Human-mode `batch` runs also print a short per-step success summary.
-- Failed runs include `details.step`, `details.command`, `details.executed`, and `details.partialResults`.
-- Replan from the first failing step instead of rerunning the whole flow blindly.
-
-Canonical batch recipe: open app -> open action menu -> choose option -> verify
-
-```json
-[
-  { "command": "open", "positionals": ["com.example.app"], "flags": { "platform": "android" } },
-  { "command": "wait", "positionals": ["text", "Home", "3000"], "flags": {} },
-  { "command": "press", "positionals": ["label=\"More actions\" role=button"], "flags": {} },
-  { "command": "wait", "positionals": ["text", "Camera scan", "2000"], "flags": {} },
-  { "command": "press", "positionals": ["label=\"Camera scan\""], "flags": {} },
-  { "command": "wait", "positionals": ["text", "Expense created", "15000"], "flags": {} },
-  { "command": "is", "positionals": ["visible", "label=\"Expense created\""], "flags": {} }
-]
-```
-
-Common batch error categories:
-
-- `INVALID_ARGS`: fix the payload shape and retry.
-- `SESSION_NOT_FOUND`: open or select the correct session, then retry.
-- `UNSUPPORTED_OPERATION`: switch to a supported command or surface.
-- `AMBIGUOUS_MATCH`: refine the selector or locator, then retry the failed step.
-- `DEVICE_IN_USE`: the device is held by another session; close or reuse the existing session before retrying.
-- `COMMAND_FAILED`: add sync guards and retry from the failing step.
-
-## Stop conditions
-
-- If refs drift after transitions, switch to selectors.
-- If a desktop surface or context menu is involved on macOS, load [macos-desktop.md](macos-desktop.md).
-- If logs, network, alerts, or setup failures become the blocker, switch to [debugging.md](debugging.md).
-- If the flow is stable and you need proof or replay maintenance, switch to [verification.md](verification.md).
@@ -1,88 +0,0 @@
-# macOS Desktop
-
-## When to open this file
-
-Open this file only when `--platform macos` is involved or the task needs `frontmost-app`, `desktop`, or `menubar` surfaces.
-
-## Main commands to reach for first
-
-- `open <app> --platform macos`
-- `open --platform macos --surface frontmost-app|desktop|menubar`
-- `snapshot -i`
-- `get`
-- `is`
-- `click --button secondary`
-
-## Most common mistake to avoid
-
-Do not treat every macOS surface the same. Use the normal `app` surface when you want to act inside one app. Use `frontmost-app`, `desktop`, or `menubar` mainly to inspect what is visible before switching back to `app` for most interactions.
-
-## Canonical loop
-
-```bash
-agent-device open TextEdit --platform macos
-agent-device snapshot
-agent-device close
-```
-
-## Surface rules
-
-- `app`: default surface and the normal choice for `click`, `fill`, `press`, `scroll`, `screenshot`, and `record`.
-- `frontmost-app`: inspect the currently focused app without naming it first.
-- `desktop`: inspect visible desktop windows across apps.
-- `menubar`: inspect the active app menu bar and system menu extras. Use `open <app> --platform macos --surface menubar` when you need one menu bar app's extras, such as a status-item app.
-- Menu bar apps can expose a sparse or empty default `app` tree. Prefer the `menubar` surface first when the app lives entirely in the top bar.
-
-Use inspect-first surfaces to understand desktop-global UI, then switch back to `app` when you need to act in one app.
-
-## Snapshot expectations
-
-- `snapshot -i` should describe UI visible to a human.
-- `desktop` snapshots can include multiple windows from multiple apps.
-- `menubar` snapshots can include both app-menu items and system menu extras.
-- Finder-style rows, sidebar items, toolbar controls, search fields, and opened context menus should appear when visible.
-- Finder and other native apps may expose duplicate-looking row, cell, and child text nodes. Treat them as distinct AX nodes unless you have a stronger selector anchor.
-
-## Context menus
-
-Context menus are not ambient UI. Open them explicitly, then re-snapshot.
-
-```bash
-agent-device click @e66 --button secondary --platform macos
-agent-device snapshot -i
-```
-
-Expected loop:
-
-1. Snapshot visible content.
-2. Secondary-click the target item.
-3. Snapshot again.
-4. Interact with the new `menu-item` nodes.
-
-## Targeting rules
-
-- Prefer selectors or `@ref` values over raw coordinates.
-- On macOS, window position can vary across runs, so coordinate-only flows are fragile.
-- If the task only needs shared exploration rules, return to [exploration.md](exploration.md).
-
-Selector guidance:
-
-- Good selectors usually anchor on stable labels or app-owned identifiers such as `label="Downloads"` or `role=menu-item label="Rename"`.
-- Avoid relying on framework-generated `_NS:*` identifiers as stable selectors.
-
-Use `snapshot --raw --platform macos` only when debugging AX structure or collector filtering. Do not make raw snapshots the default agent loop.
-
-Things not to rely on:
-
-- Mobile-only helpers such as `install`, `reinstall`, or `push`.
-- Desktop-global click, fill, or gesture parity from `desktop` or `menubar` sessions.
-- Raw coordinate assumptions across runs.
-
-Troubleshooting:
-
-- If visible content is missing from `snapshot -i`, re-snapshot after the UI settles.
-- If `desktop` is too broad, retry with `frontmost-app`.
-- If `menubar` is missing the expected menu, retry with `open <app> --platform macos --surface menubar` for menu bar apps, or make the app frontmost first and retry the generic menubar surface.
-- If the wrong menu opened, retry secondary-clicking the row or cell wrapper rather than the nested text node.
-- If the app has multiple windows, make the correct window frontmost before relying on refs.
-- If overriding the local helper, set `AGENT_DEVICE_MACOS_HELPER_BIN` to an absolute executable path; relative helper paths are rejected.
@@ -1,188 +0,0 @@
|
|
|
1
|
-
# Remote Tenancy
|
|
2
|
-
|
|
3
|
-
## When to open this file
|
|
4
|
-
|
|
5
|
-
Open this file for remote daemon HTTP flows that let an agent running in a Linux sandbox talk to another `agent-device` instance on a remote macOS host in order to control devices that are not available locally. This file covers daemon URL setup, authentication, `connect`, tenant lease scope, and remote Metro companion lifecycle.
|
|
6
|
-
|
|
7
|
-
## Main commands to reach for first
|
|
8
|
-
|
|
9
|
-
- `agent-device connect --remote-config <path>`
|
|
10
|
-
- `agent-device install-from-source <url> --remote-config <path> --platform android`
|
|
11
|
-
- `agent-device install-from-source --github-actions-artifact <org>/<repo>:<artifact> --remote-config <path> --platform android`
|
|
12
|
-
- `agent-device open <package> --remote-config <path> --relaunch`
|
|
13
|
-
- `agent-device metro reload --remote-config <path>`
|
|
14
|
-
- `agent-device snapshot --remote-config <path> -i`
|
|
15
|
-
- `agent-device disconnect --remote-config <path>`
|
|
16
|
-
- `agent-device connection status`
|
|
17
|
-
- `agent-device auth status`
|
|
18
|
-
- `AGENT_DEVICE_DAEMON_AUTH_TOKEN=...` for CI/service-token automation
|
|
19
|
-
|
|
20
|
-
## Most common mistake to avoid
|
|
21
|
-
|
|
22
|
-
Do not mix an arbitrary `--session` plus ad-hoc daemon, tenant, run, or lease flags. That can bypass saved Metro runtime hints. Use one of these patterns instead:
|
|
23
|
-
|
|
24
|
-
- Interactive flow: run `connect --remote-config <path>` once, then normal commands, then `disconnect`.
|
|
25
|
-
- Script flow: pass the same `--remote-config <path>` to every command, including `disconnect`.
|
|
26
|
-
## Choose one flow

### Interactive flow

Use this when the agent will run several commands in one session.

```bash
agent-device connect --remote-config ./remote-config.json

ARTIFACT_URL="<trusted-artifact-url>"
agent-device install-from-source "$ARTIFACT_URL" --platform android
agent-device open com.example.app --relaunch
agent-device metro reload
agent-device snapshot -i
agent-device fill @e3 "test@example.com"
agent-device disconnect
```

After `connect`, normal commands use the active remote connection. If cloud credentials are missing, `connect` starts login automatically in an interactive shell and stores a revocable CLI session that silently mints short-lived `adc_agent_...` command tokens. The cloud side remains responsible for token expiry, tenant/run claim checks, revocation, one-time device approval, and polling rate limits. End with `disconnect` to release the lease and stop the owned Metro companion.

### Self-contained script flow

Use this when each command must be explicit and repeatable. Pass the same `--remote-config` to each step.

```bash
ARTIFACT_URL="<trusted-artifact-url>"

agent-device install-from-source "$ARTIFACT_URL" \
  --remote-config ./remote-config.json \
  --platform android

agent-device open com.example.app \
  --remote-config ./remote-config.json \
  --relaunch

agent-device snapshot \
  --remote-config ./remote-config.json \
  -i

agent-device disconnect \
  --remote-config ./remote-config.json
```

The first command that needs a lease or Metro runtime prepares and persists it. Later commands with the same `--remote-config` reuse that state. End with `disconnect --remote-config <path>` to release the lease and stop the owned Metro companion.

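
If a script stops at its first failing step, the final `disconnect` never runs and the script leaves the lease and the owned Metro companion unreleased. The sketch below shows one way to always attempt cleanup. It is not part of `agent-device`: the helper names `ad` and `run_remote_flow` and the `AGENT_DEVICE_BIN` and `REMOTE_CONFIG` variables are hypothetical, and it assumes the CLI accepts `--remote-config` after the other arguments.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; none of these names ship with agent-device.
AGENT_DEVICE_BIN="${AGENT_DEVICE_BIN:-agent-device}"
REMOTE_CONFIG="${REMOTE_CONFIG:-./remote-config.json}"

# Run one command with the shared --remote-config appended.
# Assumption: the CLI accepts the flag after the other arguments.
ad() {
  "$AGENT_DEVICE_BIN" "$@" --remote-config "$REMOTE_CONFIG"
}

# Install, open, and snapshot; always attempt disconnect afterwards.
# Returns the exit status of the first failing step, or 0 if all succeed.
run_remote_flow() {
  local artifact_url="$1" package="$2" status=0
  ad install-from-source "$artifact_url" --platform android &&
    ad open "$package" --relaunch &&
    ad snapshot -i || status=$?
  # Cleanup is best-effort so it cannot mask the original failure.
  ad disconnect || true
  return "$status"
}
```

A caller would invoke it as `run_remote_flow "$ARTIFACT_URL" com.example.app`. Every step goes through `ad`, so each one gets the same `--remote-config`.
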
## Behavior summary

- `connect` stores local non-secret connection state and defers tenant lease allocation plus Metro preparation until a later command needs them.
- Commands such as `install-from-source`, `open`, `snapshot`, and `apps` allocate or refresh the lease when needed.
- `open` prepares Metro runtime hints when the remote profile has Metro fields and no compatible runtime is already saved.
- `metro reload` reuses saved Metro runtime hints and asks Metro to reload connected React Native apps without restarting the native process.
- `batch` also prepares Metro when any step opens an app and that step does not provide its own runtime.
- `disconnect` closes the session when possible, stops the Metro companion owned by the connection, releases the lease when one was allocated, and removes local connection state.

Remote install examples:

```bash
agent-device install com.example.app ./app.apk
ARTIFACT_URL="<trusted-artifact-url>"
agent-device install-from-source "$ARTIFACT_URL" --platform android
```

- Use `install` or `reinstall` for local paths; remote daemons upload local artifacts automatically.
- Use `install-from-source` only for trusted, operator-approved artifact URLs the remote daemon can reach. Do not fetch arbitrary user-supplied URLs.
- Use `install-from-source --github-actions-artifact <org>/<repo>:<artifact>` when the remote daemon has repository credentials and supports daemon-resolved GitHub Actions artifacts.
- For local-path versus URL artifact rules, follow [bootstrap-install.md](bootstrap-install.md).

Use `agent-device connection status --session adc-android` to inspect the active connection without reading JSON state manually. Status output must not include auth tokens.

## Remote config shape

Example `remote-config.json` shape:

```json
{
  "daemonBaseUrl": "https://bridge.example.com/agent-device",
  "daemonTransport": "http",
  "tenant": "acme",
  "runId": "run-123",
  "sessionIsolation": "tenant",
  "platform": "ios",
  "metroProxyBaseUrl": "https://bridge.example.com"
}
```

Optional overrides stay available for advanced cases:

```json
{
  "session": "adc-ios",
  "leaseBackend": "ios-instance",
  "metroProjectRoot": ".",
  "metroKind": "expo",
  "metroPublicBaseUrl": "http://127.0.0.1:8081"
}
```

- Keep service tokens in env/config managed by the operator boundary. Do not persist auth tokens in connection state. Human login uses `agent-device auth login` or implicit `connect` login and stores only the CLI session credential.
- Omit Metro fields for non-React Native flows.
- Put `tenant`, `runId`, and `sessionIsolation` in the remote profile so agents can run `agent-device connect --remote-config ./remote-config.json` without extra scope flags. Add `platform`, `leaseBackend`, `session`, or Metro overrides only when the default inference is not enough for that flow.
- Explicit command-line flags override connected defaults. Use them intentionally when switching session, platform, target, tenant, run, or lease scope.
- For React Native Metro runs with `metroProxyBaseUrl`, `agent-device >= 0.11.12` can manage the local companion tunnel, but Metro itself still needs to be running locally. `metroProxyBaseUrl` is the bridge origin, not a prebuilt `/api/metro/...` route.
- For cloud stock React Native iOS, use the bridge descriptor's wildcard HTTPS Metro hints directly; do not install or launch the XCTest runner just to make Metro reachable.
- Android keeps using bridge-provided `/api/metro/runtimes/<runtimeId>/...` Metro routes.
- `metroPublicBaseUrl` is only needed for direct/non-bridge bundle hints. Bridged profiles can omit it.
- Use a lease backend that matches the bridge target platform, for example `android-instance`, `ios-instance`, or an explicit `--lease-backend` override.

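
Because `connect` without extra scope flags depends on `tenant`, `runId`, and `sessionIsolation` being present in the profile, a script may want to check for them before connecting. The following is a rough sketch only: `check_remote_config` is a hypothetical helper, and it does a crude text match rather than parsing JSON, so it can be fooled by keys that appear inside nested objects or strings.

```bash
# Hypothetical preflight; a crude text match, not a JSON parser.
# Prints each missing scope key to stderr and returns non-zero if any is absent.
check_remote_config() {
  local file="$1" key missing=0
  for key in tenant runId sessionIsolation; do
    if ! grep -Eq "\"$key\"[[:space:]]*:" "$file"; then
      echo "missing: $key" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_remote_config ./remote-config.json && agent-device connect --remote-config ./remote-config.json` would only attempt the connection when all three keys appear in the file.
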
## Transport prerequisites

- Start the daemon in HTTP mode with `AGENT_DEVICE_DAEMON_SERVER_MODE=http|dual` on the host.
- Point the profile or env at the remote host with `daemonBaseUrl` or `AGENT_DEVICE_DAEMON_BASE_URL=http(s)://host:port[/base-path]`.
- For humans, run `connect --remote-config <path>` and let it refresh or create the CLI session. Use `agent-device auth status` to inspect it and `agent-device auth logout` to remove it.
- For CI/non-interactive shells, set `AGENT_DEVICE_DAEMON_AUTH_TOKEN=adc_live_...` or pass `--daemon-auth-token`. The client does not start device-code polling in CI by default.
- Prefer an auth hook such as `AGENT_DEVICE_HTTP_AUTH_HOOK` when the host needs caller validation or tenant injection.

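
For CI setups that configure the daemon through environment variables rather than the profile, a job can fail fast when either variable is missing instead of waiting for a login that will not start. This is a minimal sketch under that assumption; `check_daemon_env` is a hypothetical guard and only checks the URL scheme and that a token is set, not that either value is valid.

```bash
# Hypothetical CI guard; agent-device itself does not provide this function.
# Returns 0 only when the base URL looks like http(s)://... and a token is set.
check_daemon_env() {
  case "${AGENT_DEVICE_DAEMON_BASE_URL:-}" in
    http://?*|https://?*) ;;
    *)
      echo "AGENT_DEVICE_DAEMON_BASE_URL must be http(s)://host:port[/base-path]" >&2
      return 1
      ;;
  esac
  if [ -z "${AGENT_DEVICE_DAEMON_AUTH_TOKEN:-}" ]; then
    echo "AGENT_DEVICE_DAEMON_AUTH_TOKEN is not set" >&2
    return 1
  fi
}
```

The guard never prints the token, in keeping with the rule that auth tokens are sensitive.
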
## Lease debug fallback

The main agent flow should use `connect` and `connection status`. For daemon-side auth, scope, or lease debugging, inspect host-side daemon logs and operator tooling instead of issuing raw daemon RPC from the agent shell.

## GitHub Actions artifact install

Use this when a compatible remote daemon resolves GitHub Actions artifacts server-side. Do not download CI artifacts locally or add a local `GITHUB_TOKEN` just to install CI output.

Artifact ID shape:

```bash
agent-device install-from-source \
  --github-actions-artifact OWNER/REPO:1234567890 \
  --remote-config ./remote-config.json \
  --platform android
```

Artifact-name shape:

```bash
agent-device install-from-source \
  --github-actions-artifact OWNER/REPO:app-debug \
  --remote-config ./remote-config.json \
  --platform ios
```

Config shape:

```json
{
  "installSource": {
    "type": "github-actions-artifact",
    "repo": "OWNER/REPO",
    "artifact": "app-debug"
  }
}
```

Numeric artifacts are passed as artifact IDs. Non-numeric artifacts are passed as artifact names.

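
The ID-versus-name rule can be illustrated with a small helper. This is a sketch of the rule as stated here, not the daemon's actual parsing, which may differ; `artifact_kind`, `artifact_spec_repo`, and `artifact_spec_artifact` are hypothetical names.

```bash
# Hypothetical helper mirroring the stated rule; the daemon's own parsing may differ.
# Prints "id" for an all-digit artifact, "name" otherwise; fails on empty input.
artifact_kind() {
  case "$1" in
    '') return 1 ;;
    *[!0-9]*) echo name ;;
    *) echo id ;;
  esac
}

# Split an OWNER/REPO:ARTIFACT spec at the first colon.
artifact_spec_repo() { printf '%s\n' "${1%%:*}"; }
artifact_spec_artifact() { printf '%s\n' "${1#*:}"; }
```

Under this sketch, `OWNER/REPO:1234567890` classifies as an ID and `OWNER/REPO:app-debug` as a name.
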
## Failure semantics and trust notes

- Missing tenant, run, or lease fields in tenant-isolation mode should fail as `INVALID_ARGS`.
- Inactive or scope-mismatched leases should fail as `UNAUTHORIZED`.
- Inspect logs on the remote host during remote debugging. Client-side `--debug` does not tail a local daemon log once `AGENT_DEVICE_DAEMON_BASE_URL` is set.
- Do not point `AGENT_DEVICE_DAEMON_BASE_URL` at untrusted hosts. Remote daemon requests can launch apps and execute interaction commands.
- Treat daemon auth tokens and lease identifiers as sensitive operational data.
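
A script that captures command output could map the two failure codes above to a first place to look. This is a hedged sketch: `triage_remote_error` is a hypothetical helper, and it assumes the code string appears verbatim somewhere in the captured output, which this file does not confirm.

```bash
# Hypothetical triage helper. Assumption: the error code appears verbatim
# somewhere in the captured CLI output passed as the first argument.
triage_remote_error() {
  case "$1" in
    *INVALID_ARGS*) echo "check tenant, run, and lease fields for tenant isolation" ;;
    *UNAUTHORIZED*) echo "lease is inactive or does not match the requested scope" ;;
    *) echo "inspect daemon logs on the remote host" ;;
  esac
}
```

The helper only classifies text it is given. It should never be fed or print auth tokens or lease identifiers.
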
|