agent-device 0.1.4 → 0.1.5
- package/package.json +2 -1
- package/skills/agent-device/SKILL.md +156 -0
- package/skills/agent-device/references/coordinate-system.md +8 -0
- package/skills/agent-device/references/permissions.md +20 -0
- package/skills/agent-device/references/session-management.md +22 -0
- package/skills/agent-device/references/snapshot-refs.md +49 -0
- package/skills/agent-device/references/video-recording.md +39 -0
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "agent-device",
|
|
3
|
-
"version": "0.1.4",
|
|
3
|
+
"version": "0.1.5",
|
|
4
4
|
"description": "Unified control plane for physical and virtual devices via an agent-driven CLI.",
|
|
5
5
|
"license": "MIT",
|
|
6
6
|
"author": "Callstack",
|
|
@@ -37,6 +37,7 @@
|
|
|
37
37
|
"!ios-runner/**/.swiftpm",
|
|
38
38
|
"!ios-runner/**/xcuserdata",
|
|
39
39
|
"!ios-runner/**/*.xcuserstate",
|
|
40
|
+
"skills",
|
|
40
41
|
"src",
|
|
41
42
|
"README.md",
|
|
42
43
|
"LICENSE"
|
|
package/skills/agent-device/SKILL.md
@@ -0,0 +1,156 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: agent-device
|
|
3
|
+
description: Automates mobile and simulator interactions for iOS and Android devices. Use when navigating apps, taking snapshots/screenshots, tapping, typing, scrolling, or extracting UI info on mobile devices or simulators.
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# Mobile Automation with agent-device
|
|
7
|
+
|
|
8
|
+
## Quick start
|
|
9
|
+
|
|
10
|
+
```bash
|
|
11
|
+
agent-device open Settings --platform ios
|
|
12
|
+
agent-device snapshot -i
|
|
13
|
+
agent-device snapshot -s @e3
|
|
14
|
+
agent-device click @e3
|
|
15
|
+
agent-device wait text "Camera"
|
|
16
|
+
agent-device alert wait 10000
|
|
17
|
+
agent-device fill @e5 "test"
|
|
18
|
+
agent-device close
|
|
19
|
+
```
|
|
20
|
+
|
|
21
|
+
## Core workflow
|
|
22
|
+
|
|
23
|
+
1. Open app or just boot device: `open [app]`
|
|
24
|
+
2. Snapshot: `snapshot -i` to get compact refs
|
|
25
|
+
3. Interact using refs (`click @eN`, `fill @eN "text"`)
|
|
26
|
+
4. Re-snapshot after navigation or UI changes
|
|
27
|
+
5. Close session when done
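
The five steps above can be sketched as a few small shell helpers. This is only an illustration built from the commands shown in this file: the helper names (`ad`, `start_flow`, `step`, `end_flow`) and the `AGENT_DEVICE` override variable are hypothetical, and `--platform ios` is hard-coded for brevity.

```bash
# Thin wrapper around the CLI. AGENT_DEVICE is a hypothetical override used
# only by this sketch; it is not a documented environment variable.
ad() { "${AGENT_DEVICE:-agent-device}" "$@"; }

# Steps 1-2: open the app, then snapshot interactive elements to get refs.
start_flow() {
  ad open "$1" --platform ios && ad snapshot -i
}

# Steps 3-4: run one interaction (for example `click @e3`), then re-snapshot,
# because refs may be invalid after the UI changes.
step() {
  ad "$@" && ad snapshot -i
}

# Step 5: close the session.
end_flow() {
  ad close
}
```

A session would then read `start_flow Settings`, `step click @e3`, `step fill @e5 "test"`, `end_flow`, with each ref taken from the most recent snapshot output rather than from an earlier one.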
|
|
28
|
+
|
|
29
|
+
## Commands
|
|
30
|
+
|
|
31
|
+
### Navigation
|
|
32
|
+
|
|
33
|
+
```bash
|
|
34
|
+
agent-device open [app] # Boot device/simulator; optionally launch app
|
|
35
|
+
agent-device close [app] # Close app or just end session
|
|
36
|
+
agent-device session list # List active sessions
|
|
37
|
+
```
|
|
38
|
+
|
|
39
|
+
### Snapshot (page analysis)
|
|
40
|
+
|
|
41
|
+
```bash
|
|
42
|
+
agent-device snapshot # Full accessibility tree
|
|
43
|
+
agent-device snapshot -i # Interactive elements only (recommended)
|
|
44
|
+
agent-device snapshot -c # Compact output
|
|
45
|
+
agent-device snapshot -d 3 # Limit depth
|
|
46
|
+
agent-device snapshot -s "Camera" # Scope to label/identifier
|
|
47
|
+
agent-device snapshot --raw # Raw node output
|
|
48
|
+
agent-device snapshot --backend hybrid # Default: best speed vs correctness trade-off (AX fast, XCTest complete)
|
|
49
|
+
agent-device snapshot --backend ax # macOS Accessibility tree (fast, needs permissions)
|
|
50
|
+
agent-device snapshot --backend xctest # XCTest snapshot (slow, no permissions)
|
|
51
|
+
```
|
|
52
|
+
|
|
53
|
+
Hybrid will automatically fill empty containers (e.g. `group`, `tab bar`) by scoping XCTest to the container label.
|
|
54
|
+
It is recommended because AX is fast but can miss UI details, while XCTest is slower but more complete.
|
|
55
|
+
If you want explicit control or AX is unavailable, use `--backend xctest`.
|
|
56
|
+
In practice, if AX returns a `Tab Bar` group with no children, hybrid will run a scoped XCTest snapshot for `Tab Bar` and insert those nodes under the group.
|
|
57
|
+
|
|
58
|
+
### Find (semantic)
|
|
59
|
+
|
|
60
|
+
```bash
|
|
61
|
+
agent-device find "Sign In" click
|
|
62
|
+
agent-device find text "Sign In" click
|
|
63
|
+
agent-device find label "Email" fill "user@example.com"
|
|
64
|
+
agent-device find value "Search" type "query"
|
|
65
|
+
agent-device find role button click
|
|
66
|
+
agent-device find id "com.example:id/login" click
|
|
67
|
+
agent-device find "Settings" wait 10000
|
|
68
|
+
agent-device find "Settings" exists
|
|
69
|
+
```
|
|
70
|
+
|
|
71
|
+
### Settings helpers (simulators)
|
|
72
|
+
|
|
73
|
+
```bash
|
|
74
|
+
agent-device settings wifi on
|
|
75
|
+
agent-device settings wifi off
|
|
76
|
+
agent-device settings airplane on
|
|
77
|
+
agent-device settings airplane off
|
|
78
|
+
agent-device settings location on
|
|
79
|
+
agent-device settings location off
|
|
80
|
+
```
|
|
81
|
+
|
|
82
|
+
Note: on iOS, the wifi and airplane settings toggle status bar indicators only, not the actual network state.
|
|
83
|
+
Airplane off clears status bar overrides.
|
|
84
|
+
|
|
85
|
+
### App state
|
|
86
|
+
|
|
87
|
+
```bash
|
|
88
|
+
agent-device appstate
|
|
89
|
+
agent-device apps --metadata --platform ios
|
|
90
|
+
agent-device apps --metadata --platform android
|
|
91
|
+
```
|
|
92
|
+
|
|
93
|
+
### Interactions (use @refs from snapshot)
|
|
94
|
+
|
|
95
|
+
```bash
|
|
96
|
+
agent-device click @e1
|
|
97
|
+
agent-device focus @e2
|
|
98
|
+
agent-device fill @e2 "text" # Tap then type
|
|
99
|
+
agent-device type "text" # Type into focused field
|
|
100
|
+
agent-device press 300 500 # Tap by coordinates
|
|
101
|
+
agent-device long-press 300 500 800 # Long press (where supported)
|
|
102
|
+
agent-device scroll down 0.5
|
|
103
|
+
agent-device back
|
|
104
|
+
agent-device home
|
|
105
|
+
agent-device app-switcher
|
|
106
|
+
agent-device wait 1000
|
|
107
|
+
agent-device wait text "Settings"
|
|
108
|
+
agent-device alert get
|
|
109
|
+
```
|
|
110
|
+
|
|
111
|
+
### Get information
|
|
112
|
+
|
|
113
|
+
```bash
|
|
114
|
+
agent-device get text @e1
|
|
115
|
+
agent-device get attrs @e1
|
|
116
|
+
agent-device screenshot --out out.png
|
|
117
|
+
```
|
|
118
|
+
|
|
119
|
+
### Trace logs (AX/XCTest)
|
|
120
|
+
|
|
121
|
+
```bash
|
|
122
|
+
agent-device trace start # Start trace capture
|
|
123
|
+
agent-device trace start ./trace.log # Start trace capture to path
|
|
124
|
+
agent-device trace stop # Stop trace capture
|
|
125
|
+
agent-device trace stop ./trace.log # Stop and move trace log
|
|
126
|
+
```
|
|
127
|
+
|
|
128
|
+
### Devices and apps
|
|
129
|
+
|
|
130
|
+
```bash
|
|
131
|
+
agent-device devices
|
|
132
|
+
agent-device apps --platform ios
|
|
133
|
+
agent-device apps --platform android # default: launchable only
|
|
134
|
+
agent-device apps --platform android --all
|
|
135
|
+
agent-device apps --platform android --user-installed
|
|
136
|
+
```
|
|
137
|
+
|
|
138
|
+
## Best practices
|
|
139
|
+
|
|
140
|
+
- Always snapshot right before interactions; refs invalidate on UI changes.
|
|
141
|
+
- Prefer `snapshot -i` to reduce output size.
|
|
142
|
+
- On iOS, hybrid is the default and uses AX first, so Accessibility permission is still required.
|
|
143
|
+
- If AX returns the Simulator window or empty tree, restart Simulator or use `--backend xctest`.
|
|
144
|
+
- Use `--session <name>` for parallel sessions; avoid device contention.
|
|
145
|
+
|
|
146
|
+
## References
|
|
147
|
+
|
|
148
|
+
- [references/snapshot-refs.md](references/snapshot-refs.md)
|
|
149
|
+
- [references/session-management.md](references/session-management.md)
|
|
150
|
+
- [references/permissions.md](references/permissions.md)
|
|
151
|
+
- [references/recording.md](references/recording.md)
|
|
152
|
+
- [references/coordinate-system.md](references/coordinate-system.md)
|
|
153
|
+
|
|
154
|
+
## Missing features roadmap (high level)
|
|
155
|
+
|
|
156
|
+
See [references/missing-features.md](references/missing-features.md) for planned parity with agent-browser.
|
|
package/skills/agent-device/references/permissions.md
@@ -0,0 +1,20 @@
|
|
|
1
|
+
# Permissions and Setup
|
|
2
|
+
|
|
3
|
+
## iOS AX snapshot
|
|
4
|
+
|
|
5
|
+
Hybrid snapshot (default) is recommended for best speed vs correctness; it uses macOS Accessibility APIs and requires permission:
|
|
6
|
+
|
|
7
|
+
System Settings > Privacy & Security > Accessibility
|
|
8
|
+
|
|
9
|
+
If permission is missing, use:
|
|
10
|
+
|
|
11
|
+
```bash
|
|
12
|
+
agent-device snapshot --backend xctest --platform ios
|
|
13
|
+
```
|
|
14
|
+
|
|
15
|
+
Hybrid/AX is fast; XCTest is slower but does not require permissions.
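
The fallback described above could be wrapped in a small helper. This is a minimal sketch that assumes a failed or empty hybrid/AX snapshot shows up as a non-zero exit status or empty output; the source does not say how the CLI signals this. The names `snapshot_with_fallback`, `ad`, and `AGENT_DEVICE` are hypothetical.

```bash
# Thin wrapper around the CLI. AGENT_DEVICE is a hypothetical override used
# only by this sketch; it is not a documented environment variable.
ad() { "${AGENT_DEVICE:-agent-device}" "$@"; }

# snapshot_with_fallback [EXTRA_ARGS...]
# Try the default (hybrid) backend first. If the command fails or prints
# nothing (assumed failure signals), retry once with the XCTest backend,
# which does not need the Accessibility permission.
snapshot_with_fallback() {
  local out
  if out=$(ad snapshot -i "$@") && [ -n "$out" ]; then
    printf '%s\n' "$out"
    return 0
  fi
  ad snapshot -i --backend xctest "$@"
}
```

For example, `snapshot_with_fallback --platform ios` runs at most two snapshots and prints whichever one produced output first.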
|
|
16
|
+
|
|
17
|
+
## Simulator troubleshooting
|
|
18
|
+
|
|
19
|
+
- If AX shows the Simulator chrome instead of app, restart Simulator.
|
|
20
|
+
- If AX returns empty, restart Simulator and re-open app.
|
|
package/skills/agent-device/references/session-management.md
@@ -0,0 +1,22 @@
|
|
|
1
|
+
# Session Management
|
|
2
|
+
|
|
3
|
+
## Named sessions
|
|
4
|
+
|
|
5
|
+
```bash
|
|
6
|
+
agent-device --session auth open Settings --platform ios
|
|
7
|
+
agent-device --session auth snapshot -i --platform ios
|
|
8
|
+
```
|
|
9
|
+
|
|
10
|
+
Sessions isolate device context. A device can only be held by one session at a time.
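
To keep every command of a flow in the same named session, the `--session` flag can be factored into a helper. This is a sketch only: `in_session`, `session_flow`, `ad`, and `AGENT_DEVICE` are hypothetical names, and passing `--session` to `close` is assumed to work the same way it does for `open` and `snapshot` above.

```bash
# Thin wrapper around the CLI. AGENT_DEVICE is a hypothetical override used
# only by this sketch; it is not a documented environment variable.
ad() { "${AGENT_DEVICE:-agent-device}" "$@"; }

# in_session NAME ARGS...: run one command inside the named session.
in_session() {
  local name=$1
  shift
  ad --session "$name" "$@"
}

# session_flow NAME APP: open and snapshot inside the session, then always
# close it so the device is released for other sessions.
session_flow() {
  local name=$1 app=$2 status=0
  in_session "$name" open "$app" --platform ios &&
    in_session "$name" snapshot -i --platform ios || status=$?
  in_session "$name" close
  return "$status"
}
```

Calling `session_flow auth Settings` issues the two commands from the example above and then closes the `auth` session even if one of them fails.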
|
|
11
|
+
|
|
12
|
+
## Best practices
|
|
13
|
+
|
|
14
|
+
- Name sessions semantically.
|
|
15
|
+
- Close sessions when done.
|
|
16
|
+
- Use separate devices for parallel work.
|
|
17
|
+
|
|
18
|
+
## Listing sessions
|
|
19
|
+
|
|
20
|
+
```bash
|
|
21
|
+
agent-device session list
|
|
22
|
+
```
|
|
package/skills/agent-device/references/snapshot-refs.md
@@ -0,0 +1,49 @@
|
|
|
1
|
+
# Snapshot + Refs Workflow (Mobile)
|
|
2
|
+
|
|
3
|
+
## Purpose
|
|
4
|
+
|
|
5
|
+
Refs let agents interact without repeating full UI trees. Snapshot -> refs -> click/fill.
|
|
6
|
+
|
|
7
|
+
## Snapshot
|
|
8
|
+
|
|
9
|
+
```bash
|
|
10
|
+
agent-device snapshot -i --platform ios
|
|
11
|
+
```
|
|
12
|
+
|
|
13
|
+
Output:
|
|
14
|
+
|
|
15
|
+
```
|
|
16
|
+
Page: com.apple.Preferences
|
|
17
|
+
App: com.apple.Preferences
|
|
18
|
+
|
|
19
|
+
@e1 [ioscontentgroup]
|
|
20
|
+
@e2 [button] "Camera"
|
|
21
|
+
@e3 [button] "Privacy & Security"
|
|
22
|
+
```
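
A script can pick a ref out of this output by label instead of hard-coding `@eN`. The helper below is a sketch that assumes every element line has the `@eN [role] "Label"` shape shown in the sample; `ref_for_label` is a hypothetical name, and the real output may contain variations this does not handle.

```bash
# ref_for_label LABEL
# Reads snapshot text on stdin and prints the first ref whose quoted label
# equals LABEL. Exits non-zero when no line matches.
ref_for_label() {
  awk -v want="\"$1\"" '
    $1 ~ /^@e[0-9]+$/ {
      line = $0
      sub(/[ \t\r]+$/, "", line)      # drop trailing whitespace
      i = index(line, "\"")           # start of the quoted label, if any
      if (i > 0 && substr(line, i) == want) {
        print $1
        found = 1
        exit
      }
    }
    END { exit(found ? 0 : 1) }
  '
}
```

With the sample output above on stdin, `ref_for_label "Camera"` prints `@e2`, so a call such as `agent-device snapshot -i --platform ios | ref_for_label "Camera"` could feed the next `click`.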
|
|
23
|
+
|
|
24
|
+
## Using refs
|
|
25
|
+
|
|
26
|
+
```bash
|
|
27
|
+
agent-device click @e2 --platform ios
|
|
28
|
+
agent-device fill @e5 "test" --platform ios
|
|
29
|
+
```
|
|
30
|
+
|
|
31
|
+
## Ref lifecycle
|
|
32
|
+
|
|
33
|
+
Refs become invalid when UI changes (navigation, modal, dynamic list updates).
|
|
34
|
+
Always re-snapshot after any transition.
|
|
35
|
+
|
|
36
|
+
## Scope snapshots
|
|
37
|
+
|
|
38
|
+
Use `-s` to scope to labels/identifiers. This reduces size and speeds up results:
|
|
39
|
+
|
|
40
|
+
```bash
|
|
41
|
+
agent-device snapshot -i -s "Camera" --platform ios
|
|
42
|
+
agent-device snapshot -i -s @e3 --platform ios
|
|
43
|
+
```
|
|
44
|
+
|
|
45
|
+
## Troubleshooting
|
|
46
|
+
|
|
47
|
+
- Ref not found: re-snapshot.
|
|
48
|
+
- AX returns Simulator window: restart Simulator and re-run.
|
|
49
|
+
- AX empty: verify Accessibility permission or use `--backend xctest` (hybrid is recommended because AX is fast but can miss UI details, while XCTest is slower but more complete).
|
|
package/skills/agent-device/references/video-recording.md
@@ -0,0 +1,39 @@
|
|
|
1
|
+
# Video Recording
|
|
2
|
+
|
|
3
|
+
Capture device automation sessions as video for debugging, documentation, or verification.
|
|
4
|
+
|
|
5
|
+
## iOS Simulator
|
|
6
|
+
|
|
7
|
+
Use `agent-device record` commands (wrapper around simctl):
|
|
8
|
+
|
|
9
|
+
```bash
|
|
10
|
+
# Start recording
|
|
11
|
+
agent-device record start ./recordings/ios.mov
|
|
12
|
+
|
|
13
|
+
# Perform actions
|
|
14
|
+
agent-device open App
|
|
15
|
+
agent-device snapshot
|
|
16
|
+
agent-device click @e3
|
|
17
|
+
agent-device close
|
|
18
|
+
|
|
19
|
+
# Stop recording
|
|
20
|
+
agent-device record stop
|
|
21
|
+
```
|
|
22
|
+
|
|
23
|
+
## Android Emulator/Device
|
|
24
|
+
|
|
25
|
+
Use `agent-device record` commands (wrapper around adb):
|
|
26
|
+
|
|
27
|
+
```bash
|
|
28
|
+
# Start recording
|
|
29
|
+
agent-device record start ./recordings/android.mp4
|
|
30
|
+
|
|
31
|
+
# Perform actions
|
|
32
|
+
agent-device open App
|
|
33
|
+
agent-device snapshot
|
|
34
|
+
agent-device click @e3
|
|
35
|
+
agent-device close
|
|
36
|
+
|
|
37
|
+
# Stop recording
|
|
38
|
+
agent-device record stop
|
|
39
|
+
```
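
Both flows above follow the same start, act, stop pattern, so a wrapper can make sure `record stop` runs even when a step fails. This is a minimal sketch: `record_during`, `ad`, and `AGENT_DEVICE` are hypothetical names, and creating the output directory up front is a precaution of this sketch rather than a documented requirement.

```bash
# Thin wrapper around the CLI. AGENT_DEVICE is a hypothetical override used
# only by this sketch; it is not a documented environment variable.
ad() { "${AGENT_DEVICE:-agent-device}" "$@"; }

# record_during OUT_FILE COMMAND [ARGS...]
# Start recording to OUT_FILE, run COMMAND, then always stop the recording.
# Returns COMMAND's exit status so callers can still detect failures.
record_during() {
  local out=$1 status=0
  shift
  mkdir -p "$(dirname "$out")"
  ad record start "$out" || return 1
  "$@" || status=$?
  ad record stop
  return "$status"
}
```

For example, `record_during ./recordings/ios.mov ad click @e3` records a single tap; a shell function that runs several commands can be passed in the same way.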
|