agent-device 0.1.4 → 0.1.5

package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "agent-device",
- "version": "0.1.4",
+ "version": "0.1.5",
  "description": "Unified control plane for physical and virtual devices via an agent-driven CLI.",
  "license": "MIT",
  "author": "Callstack",
@@ -37,6 +37,7 @@
  "!ios-runner/**/.swiftpm",
  "!ios-runner/**/xcuserdata",
  "!ios-runner/**/*.xcuserstate",
+ "skills",
  "src",
  "README.md",
  "LICENSE"
@@ -0,0 +1,156 @@
+ ---
+ name: agent-device
+ description: Automates mobile and simulator interactions for iOS and Android devices. Use when navigating apps, taking snapshots/screenshots, tapping, typing, scrolling, or extracting UI info on mobile devices or simulators.
+ ---
+
+ # Mobile Automation with agent-device
+
+ ## Quick start
+
+ ```bash
+ agent-device open Settings --platform ios
+ agent-device snapshot -i
+ agent-device snapshot -s @e3
+ agent-device click @e3
+ agent-device wait text "Camera"
+ agent-device alert wait 10000
+ agent-device fill @e5 "test"
+ agent-device close
+ ```
+
+ ## Core workflow
+
+ 1. Open app or just boot device: `open [app]`
+ 2. Snapshot: `snapshot -i` to get compact refs
+ 3. Interact using refs (`click @eN`, `fill @eN "text"`)
+ 4. Re-snapshot after navigation or UI changes
+ 5. Close session when done
+
+ ## Commands
+
+ ### Navigation
+
+ ```bash
+ agent-device open [app] # Boot device/simulator; optionally launch app
+ agent-device close [app] # Close app or just end session
+ agent-device session list # List active sessions
+ ```
+
+ ### Snapshot (page analysis)
+
+ ```bash
+ agent-device snapshot # Full accessibility tree
+ agent-device snapshot -i # Interactive elements only (recommended)
+ agent-device snapshot -c # Compact output
+ agent-device snapshot -d 3 # Limit depth
+ agent-device snapshot -s "Camera" # Scope to label/identifier
+ agent-device snapshot --raw # Raw node output
+ agent-device snapshot --backend hybrid # Default: best speed vs correctness trade-off (AX fast, XCTest complete)
+ agent-device snapshot --backend ax # macOS Accessibility tree (fast, needs permissions)
+ agent-device snapshot --backend xctest # XCTest snapshot (slow, no permissions)
+ ```
+
+ Hybrid will automatically fill empty containers (e.g. `group`, `tab bar`) by scoping XCTest to the container label.
+ It is recommended because AX is fast but can miss UI details, while XCTest is slower but more complete.
+ If you want explicit control or AX is unavailable, use `--backend xctest`.
+ In practice, if AX returns a `Tab Bar` group with no children, hybrid will run a scoped XCTest snapshot for `Tab Bar` and insert those nodes under the group.
+
+ ### Find (semantic)
+
+ ```bash
+ agent-device find "Sign In" click
+ agent-device find text "Sign In" click
+ agent-device find label "Email" fill "user@example.com"
+ agent-device find value "Search" type "query"
+ agent-device find role button click
+ agent-device find id "com.example:id/login" click
+ agent-device find "Settings" wait 10000
+ agent-device find "Settings" exists
+ ```
+
+ ### Settings helpers (simulators)
+
+ ```bash
+ agent-device settings wifi on
+ agent-device settings wifi off
+ agent-device settings airplane on
+ agent-device settings airplane off
+ agent-device settings location on
+ agent-device settings location off
+ ```
+
+ Note: on iOS, the wifi and airplane settings only toggle status bar indicators; they do not change the actual network state.
+ Turning airplane off clears the status bar overrides.
+
+ ### App state
+
+ ```bash
+ agent-device appstate
+ agent-device apps --metadata --platform ios
+ agent-device apps --metadata --platform android
+ ```
+
+ ### Interactions (use @refs from snapshot)
+
+ ```bash
+ agent-device click @e1
+ agent-device focus @e2
+ agent-device fill @e2 "text" # Tap then type
+ agent-device type "text" # Type into focused field
+ agent-device press 300 500 # Tap by coordinates
+ agent-device long-press 300 500 800 # Long press (where supported)
+ agent-device scroll down 0.5
+ agent-device back
+ agent-device home
+ agent-device app-switcher
+ agent-device wait 1000
+ agent-device wait text "Settings"
+ agent-device alert get
+ ```
+
+ ### Get information
+
+ ```bash
+ agent-device get text @e1
+ agent-device get attrs @e1
+ agent-device screenshot --out out.png
+ ```
+
+ ### Trace logs (AX/XCTest)
+
+ ```bash
+ agent-device trace start # Start trace capture
+ agent-device trace start ./trace.log # Start trace capture to path
+ agent-device trace stop # Stop trace capture
+ agent-device trace stop ./trace.log # Stop and move trace log
+ ```
+
+ ### Devices and apps
+
+ ```bash
+ agent-device devices
+ agent-device apps --platform ios
+ agent-device apps --platform android # default: launchable only
+ agent-device apps --platform android --all
+ agent-device apps --platform android --user-installed
+ ```
+
+ ## Best practices
+
+ - Always snapshot right before interactions; refs invalidate on UI changes.
+ - Prefer `snapshot -i` to reduce output size.
+ - On iOS, hybrid is the default and uses AX first, so Accessibility permission is still required.
+ - If AX returns the Simulator window or an empty tree, restart Simulator or use `--backend xctest`.
+ - Use `--session <name>` for parallel sessions; avoid device contention.
+
+ ## References
+
+ - [references/snapshot-refs.md](references/snapshot-refs.md)
+ - [references/session-management.md](references/session-management.md)
+ - [references/permissions.md](references/permissions.md)
+ - [references/recording.md](references/recording.md)
+ - [references/coordinate-system.md](references/coordinate-system.md)
+
+ ## Missing features roadmap (high level)
+
+ See [references/missing-features.md](references/missing-features.md) for planned parity with agent-browser.
@@ -0,0 +1,8 @@
+ # Coordinate System
+
+ All coordinate-based actions use device screen coordinates:
+
+ - Origin: top-left of the device screen
+ - Units: device points for iOS, pixels for Android
+
+ Use screenshots to reason about coordinates.
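
As a rough illustration of reasoning from a screenshot: if the screenshot is captured at native pixel resolution, an iOS coordinate in points can be estimated by dividing the pixel coordinate by the display scale factor. The helper name `px_to_pt` and the scale value are assumptions made for this sketch and are not part of agent-device. On Android, where the units are already pixels, no conversion should be needed.

```shell
# Hypothetical helper: convert a screenshot pixel coordinate to iOS points.
# $1: coordinate in screenshot pixels
# $2: display scale factor (an assumption; commonly 2 or 3, check your device)
px_to_pt() {
  echo $(( $1 / $2 ))   # integer division, truncates any remainder
}
```

With an assumed scale of 3, a target seen at pixel (900, 1500) in the screenshot would be tapped with `agent-device press "$(px_to_pt 900 3)" "$(px_to_pt 1500 3)"`.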
@@ -0,0 +1,20 @@
+ # Permissions and Setup
+
+ ## iOS AX snapshot
+
+ Hybrid snapshot (default) is recommended for best speed vs correctness; it uses macOS Accessibility APIs and requires permission:
+
+ System Settings > Privacy & Security > Accessibility
+
+ If permission is missing, use:
+
+ ```bash
+ agent-device snapshot --backend xctest --platform ios
+ ```
+
+ Hybrid/AX is fast; XCTest is slower but does not require permissions.
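
The fallback to XCTest can also be scripted. The sketch below tries the default backend first and reruns with `--backend xctest` when the result looks unusable. It assumes that a usable snapshot contains at least one `@eN` ref and that a failing or ref-less result means AX is unavailable; both are heuristics. The function name `snapshot_with_fallback` and the `AGENT_DEVICE` variable are hypothetical names introduced here.

```shell
# CLI to invoke; overridable so the sketch can be exercised without a device.
AGENT_DEVICE="${AGENT_DEVICE:-agent-device}"

# Hypothetical helper: interactive snapshot with an XCTest fallback.
# Extra arguments (e.g. --platform ios) are passed through to both attempts.
snapshot_with_fallback() {
  swf_out=$("$AGENT_DEVICE" snapshot -i "$@" 2>/dev/null) || swf_out=""
  # Heuristic (assumption): a usable snapshot lists at least one @eN ref.
  if printf '%s\n' "$swf_out" | grep -q '@e[0-9]'; then
    printf '%s\n' "$swf_out"
    return 0
  fi
  # Default backend failed or returned no refs: retry with XCTest.
  "$AGENT_DEVICE" snapshot -i --backend xctest "$@"
}
```

A call such as `snapshot_with_fallback --platform ios` keeps the fast path when Accessibility permission is granted and pays the XCTest cost only when it is not.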
+
+ ## Simulator troubleshooting
+
+ - If AX shows the Simulator chrome instead of the app, restart Simulator.
+ - If AX returns an empty tree, restart Simulator and reopen the app.
@@ -0,0 +1,22 @@
+ # Session Management
+
+ ## Named sessions
+
+ ```bash
+ agent-device --session auth open Settings --platform ios
+ agent-device --session auth snapshot -i --platform ios
+ ```
+
+ Sessions isolate device context. A device can only be held by one session at a time.
+
+ ## Best practices
+
+ - Name sessions semantically.
+ - Close sessions when done.
+ - Use separate devices for parallel work.
+
+ ## Listing sessions
+
+ ```bash
+ agent-device session list
+ ```
@@ -0,0 +1,49 @@
+ # Snapshot + Refs Workflow (Mobile)
+
+ ## Purpose
+
+ Refs let agents interact without repeating full UI trees. Snapshot -> refs -> click/fill.
+
+ ## Snapshot
+
+ ```bash
+ agent-device snapshot -i --platform ios
+ ```
+
+ Output:
+
+ ```
+ Page: com.apple.Preferences
+ App: com.apple.Preferences
+
+ @e1 [ioscontentgroup]
+ @e2 [button] "Camera"
+ @e3 [button] "Privacy & Security"
+ ```
+
+ ## Using refs
+
+ ```bash
+ agent-device click @e2 --platform ios
+ agent-device fill @e5 "test" --platform ios
+ ```
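
When scripting, a ref can be looked up by its label instead of being hard-coded. The sketch below assumes snapshot lines have the shape shown in the sample output (`@eN [role] "Label"`); the real output may differ in ways the sample does not show. The function name `ref_for_label` is hypothetical.

```shell
# Hypothetical helper: read snapshot text on stdin and print the first ref
# whose quoted label equals $1 exactly. Prints nothing if no line matches.
ref_for_label() {
  awk -F'"' -v label="$1" '
    $2 == label {
      # Text before the first quote looks like: @e2 [button]
      split($1, head, " ")
      if (head[1] ~ /^@e[0-9]+$/) { print head[1]; exit }
    }'
}
```

For example, `agent-device snapshot -i --platform ios | ref_for_label "Camera"` would print `@e2` for the sample output, and the result can be passed to `agent-device click`.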
+
+ ## Ref lifecycle
+
+ Refs become invalid when the UI changes (navigation, modal, dynamic list updates).
+ Always re-snapshot after any transition.
+
+ ## Scope snapshots
+
+ Use `-s` to scope to labels/identifiers. This reduces size and speeds up results:
+
+ ```bash
+ agent-device snapshot -i -s "Camera" --platform ios
+ agent-device snapshot -i -s @e3 --platform ios
+ ```
+
+ ## Troubleshooting
+
+ - Ref not found: re-snapshot.
+ - AX returns Simulator window: restart Simulator and re-run.
+ - AX empty: verify Accessibility permission or use `--backend xctest`. Hybrid remains the recommended default because AX is fast but can miss UI details, while XCTest is slower but more complete.
@@ -0,0 +1,39 @@
+ # Video Recording
+
+ Capture device automation sessions as video for debugging, documentation, or verification.
+
+ ## iOS Simulator
+
+ Use `agent-device record` commands (wrapper around simctl):
+
+ ```bash
+ # Start recording
+ agent-device record start ./recordings/ios.mov
+
+ # Perform actions
+ agent-device open App
+ agent-device snapshot
+ agent-device click @e3
+ agent-device close
+
+ # Stop recording
+ agent-device record stop
+ ```
+
+ ## Android Emulator/Device
+
+ Use `agent-device record` commands (wrapper around adb):
+
+ ```bash
+ # Start recording
+ agent-device record start ./recordings/android.mp4
+
+ # Perform actions
+ agent-device open App
+ agent-device snapshot
+ agent-device click @e3
+ agent-device close
+
+ # Stop recording
+ agent-device record stop
+ ```
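
In a script, a step that fails between `record start` and `record stop` could leave the recording running. One possible way to guard against that is a small wrapper that always issues `record stop`. This is a sketch only: `record_while` and the `AGENT_DEVICE` variable are hypothetical names, and it assumes `record stop` is safe to call after a failed step.

```shell
# CLI to invoke; overridable so the sketch can be exercised without a device.
AGENT_DEVICE="${AGENT_DEVICE:-agent-device}"

# Hypothetical helper: record to $1 while running the remaining arguments
# as a command, then stop the recording and return the command's status.
record_while() {
  rw_out=$1
  shift
  "$AGENT_DEVICE" record start "$rw_out" || return 1
  rw_status=0
  "$@" || rw_status=$?
  # Stop the recording even if the command failed.
  "$AGENT_DEVICE" record stop
  return "$rw_status"
}
```

For example, `record_while ./recordings/ios.mov sh -c 'agent-device open App && agent-device snapshot'` records the two steps and stops the recording whether or not they succeed.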