zeno-mobile-runner 0.2.16 → 0.2.17

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (75)
  1. package/CHANGELOG.md +33 -0
  2. package/CONTRIBUTING.md +20 -7
  3. package/FEATURES.md +29 -20
  4. package/README.md +73 -57
  5. package/SECURITY.md +11 -6
  6. package/clients/README.md +8 -7
  7. package/clients/go/README.md +2 -2
  8. package/clients/kotlin/README.md +2 -2
  9. package/clients/kotlin/build.gradle.kts +1 -1
  10. package/clients/python/README.md +2 -1
  11. package/clients/python/pyproject.toml +1 -1
  12. package/clients/rust/Cargo.lock +1 -1
  13. package/clients/rust/Cargo.toml +1 -1
  14. package/clients/rust/README.md +2 -2
  15. package/clients/swift/README.md +2 -2
  16. package/clients/typescript/README.md +2 -1
  17. package/clients/typescript/package.json +1 -1
  18. package/docs/adr/0001-agent-native-runner-boundary.md +1 -1
  19. package/docs/adr/README.md +7 -5
  20. package/docs/agent-discovery.md +15 -15
  21. package/docs/ai-agents.md +30 -20
  22. package/docs/app-integration.md +59 -27
  23. package/docs/benchmarking.md +16 -8
  24. package/docs/benchmarks/README.md +3 -1
  25. package/docs/benchmarks/benchmark-lab-v1.md +1 -1
  26. package/docs/client-installation.md +18 -9
  27. package/docs/clients.md +7 -6
  28. package/docs/config.md +29 -15
  29. package/docs/demo.md +14 -9
  30. package/docs/expo-smoke.md +12 -18
  31. package/docs/frameworks.md +30 -21
  32. package/docs/install.md +63 -13
  33. package/docs/npm.md +45 -27
  34. package/docs/production-readiness.md +32 -17
  35. package/docs/protocol-fixtures/core-session.responses.jsonl +1 -1
  36. package/docs/protocol-versioning.md +5 -3
  37. package/docs/protocol.md +33 -18
  38. package/docs/scenario-authoring.md +15 -8
  39. package/docs/support-matrix.md +38 -0
  40. package/docs/trace-privacy.md +5 -3
  41. package/docs/troubleshooting.md +17 -14
  42. package/npm/app-config.mjs +2 -0
  43. package/npm/commands.mjs +4 -4
  44. package/npm/scaffold.mjs +2 -2
  45. package/package.json +2 -2
  46. package/prebuilds/darwin-arm64/zmr +0 -0
  47. package/prebuilds/darwin-x64/zmr +0 -0
  48. package/prebuilds/linux-arm64/zmr +0 -0
  49. package/prebuilds/linux-x64/zmr +0 -0
  50. package/schemas/README.md +6 -3
  51. package/schemas/import-output.schema.json +1 -1
  52. package/schemas/scenario.schema.json +2 -0
  53. package/schemas/zmr-config.schema.json +2 -1
  54. package/scripts/public-metadata-guard.sh +101 -0
  55. package/shims/android/README.md +4 -3
  56. package/shims/android/protocol.md +3 -2
  57. package/shims/ios/README.md +5 -5
  58. package/shims/ios/protocol.md +2 -1
  59. package/skills/zmr-mobile-testing/SKILL.md +9 -8
  60. package/src/android_emulator.zig +54 -5
  61. package/src/cli_import.zig +15 -2
  62. package/src/cli_output.zig +2 -0
  63. package/src/cli_run.zig +8 -0
  64. package/src/config.zig +3 -0
  65. package/src/errors.zig +3 -0
  66. package/src/ios_devices.zig +100 -0
  67. package/src/main.zig +1 -1
  68. package/src/mcp_protocol.zig +12 -9
  69. package/src/run_options.zig +4 -0
  70. package/src/scaffold.zig +10 -8
  71. package/src/scenario.zig +43 -0
  72. package/src/selector.zig +53 -9
  73. package/src/trace_json.zig +4 -0
  74. package/src/validation.zig +5 -0
  75. package/src/version.zig +1 -1
package/docs/agent-discovery.md CHANGED
@@ -1,15 +1,15 @@
1
1
  # Agent Discovery
2
2
 
3
- ZMR supports agent-led discovery today through its JSON-RPC and MCP interfaces,
4
- trace events, semantic snapshot artifacts, guarded trace exploration, in-band
5
- trace discovery, and offline scenario drafting. An external agent can observe
6
- the app, choose typed actions, inspect trace events, ask ZMR to write a small
7
- repeatable scenario from the trace, and then edit it as it learns a flow.
3
+ ZMR supports agent-led discovery through JSON-RPC, MCP, trace events, semantic
4
+ snapshots, guarded trace exploration, in-band trace discovery, and offline
5
+ scenario drafting. An external agent can observe the app, choose typed actions,
6
+ inspect trace events, ask ZMR to write a repeatable scenario from the trace, and
7
+ then edit that scenario as it learns the flow.
8
8
 
9
9
  `zmr explore` is the built-in review-first exploration command. It is
10
10
  trace-backed, not an unbounded crawler: it does not launch devices, invent
11
- missing actions, discover credentials, or commit files. Keep autonomous
12
- planning in the agent, and keep ZMR as the deterministic mobile control plane.
11
+ missing actions, discover credentials, or commit files. Keep autonomous planning
12
+ in the agent; keep ZMR as the deterministic mobile control plane.
13
13
 
14
14
  ```mermaid
15
15
  flowchart LR
@@ -120,7 +120,7 @@ flowchart LR
120
120
  generated replay steps, and `skippedEventCount` is the number of events left
121
121
  out.
122
122
 
123
- 11. After editing a generated scenario, validate it in-band with JSON-RPC:
123
+ 12. After editing a generated scenario, validate it in-band with JSON-RPC:
124
124
 
125
125
  ```json
126
126
  {"jsonrpc":"2.0","id":8,"method":"scenario.validate","params":{"path":".zmr/discovered/replay-smoke.json"}}
@@ -130,7 +130,7 @@ flowchart LR
130
130
  result matches `zmr validate --json`, including field paths and source
131
131
  locations for invalid files.
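
Annotation: the `scenario.validate` request in this hunk is a plain JSON-RPC 2.0 object, so a host-side agent could assemble it along the following lines. This is a minimal sketch; the helper name `build_validate_request` is hypothetical and not part of any ZMR client, and how the line is framed on the `zmr serve` stdio transport is not shown here.

```python
import json


def build_validate_request(request_id: int, scenario_path: str) -> str:
    """Serialize one JSON-RPC 2.0 `scenario.validate` request as a compact line."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "scenario.validate",
        "params": {"path": scenario_path},
    }
    # Compact separators reproduce the single-line form used in the docs.
    return json.dumps(request, separators=(",", ":"))


line = build_validate_request(8, ".zmr/discovered/replay-smoke.json")
print(line)
```

The printed line is intended to match the request shown in the hunk, with the same `id`, `method`, and `params.path` values.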
132
132
 
133
- 12. Use the lower-level draft primitive when you want separate surface and
133
+ 13. Use the lower-level draft primitive when you want separate surface and
134
134
  replay files. For a conservative surface-smoke scenario:
135
135
 
136
136
  ```bash
@@ -160,16 +160,16 @@ flowchart LR
160
160
  timeout context for successful waits and timeout diagnostics.
161
161
  Unsupported events stay out of the scenario and are reported as warnings.
162
162
 
163
- 13. Edit the draft, discovery, or exploration output into a candidate flow, for example
164
- `.zmr/discovered/login-smoke.json`, by copying only steps that were observed
165
- and understood.
166
- 14. Validate the candidate scenario:
163
+ 14. Edit the draft, discovery, or exploration output into a candidate flow, for
164
+ example `.zmr/discovered/login-smoke.json`, by copying only steps that were
165
+ observed and understood.
166
+ 15. Validate the candidate scenario:
167
167
 
168
168
  ```bash
169
169
  zmr validate --json .zmr/discovered/login-smoke.json
170
170
  ```
171
171
 
172
- 15. Re-run it deterministically:
172
+ 16. Re-run it deterministically:
173
173
 
174
174
  ```bash
175
175
  zmr run .zmr/discovered/login-smoke.json \
@@ -179,7 +179,7 @@ flowchart LR
179
179
  --json
180
180
  ```
181
181
 
182
- 16. Export a redacted bundle before sharing artifacts:
182
+ 17. Export a redacted bundle before sharing artifacts:
183
183
 
184
184
  ```bash
185
185
  zmr export traces/zmr-login-smoke \
package/docs/ai-agents.md CHANGED
@@ -1,8 +1,8 @@
1
1
  # AI Agent Guide
2
2
 
3
- ZMR is built for external agents. The runner provides device state, typed
4
- actions, waits, assertions, trace explanation, and trace export; the agent
5
- decides the next step.
3
+ ZMR gives external agents a mobile control plane. The runner provides device
4
+ state, typed actions, waits, assertions, trace explanation, scenario discovery,
5
+ and redacted export. The agent remains responsible for planning and review.
6
6
 
7
7
  ```mermaid
8
8
  sequenceDiagram
@@ -24,7 +24,8 @@ sequenceDiagram
24
24
 
25
25
  ## Agent Setup Loop
26
26
 
27
- Start inside the app checkout:
27
+ Start inside the app checkout and gather machine-readable setup state before
28
+ touching a device:
28
29
 
29
30
  ```bash
30
31
  zmr inspect --json --dir .
@@ -34,7 +35,7 @@ zmr validate --json .zmr/ios-smoke.json
34
35
  zmr schemas --json
35
36
  ```
36
37
 
37
- Use `zmr doctor --strict --json` in CI or setup flows that should fail on any
38
+ Use `zmr doctor --strict --json` in CI or install flows that should fail on any
38
39
  warning. Prefer JSON output for automation because it includes stable error
39
40
  codes, field paths, and remediation hints.
40
41
 
@@ -44,13 +45,14 @@ platform smoke scenario paths, safe next commands, and explicit claim limits.
44
45
 
45
46
  ## Live JSON-RPC Session
46
47
 
47
- Agents should prefer `zmr serve` for interactive work:
48
+ Use `zmr serve` when an agent needs an interactive session with repeated
49
+ observe-act-assert turns:
48
50
 
49
51
  ```bash
50
52
  zmr serve --transport stdio --config .zmr/config.json --trace-dir traces/zmr-agent
51
53
  ```
52
54
 
53
- Recommended flow:
55
+ Recommended loop:
54
56
 
55
57
  1. Call `runner.capabilities` and check protocol/platform support.
56
58
  2. Call `session.create`.
@@ -69,7 +71,7 @@ Recommended flow:
69
71
  11. Call `trace.export` with `redact: true` before sharing artifacts.
70
72
  12. Call `session.close`.
71
73
 
72
- Do not parse screenshots or terminal text when the same fact is available from
74
+ Do not parse screenshots or terminal prose when the same fact is available from
73
75
  snapshot nodes, action results, CLI JSON, or trace events.
74
76
 
75
77
  If `zmr run --json` returns `status: "partial"`, inspect `partialFailure`.
@@ -109,14 +111,16 @@ The MCP server exposes mobile-specific tools:
109
111
  - `trace_events`, `trace_explain`, `trace_explore`, `trace_discover`, and
110
112
  `trace_export`
111
113
 
112
- Prefer `semantic_snapshot` for action planning. It avoids forcing an agent to
113
- infer intent from platform-specific Android/UI Automator or XCTest class names.
114
+ Prefer `semantic_snapshot` for action planning. It prevents the agent from
115
+ inferring product intent from platform-specific Android/UI Automator or XCTest
116
+ class names.
114
117
 
115
118
  ## Agent-Led Discovery
116
119
 
117
- Agents can use ZMR to discover flows and draft scenarios by looping over
118
- `observe.semanticSnapshot`, one typed action, trace events, and scenario
119
- validation. After a session has produced trace artifacts, call JSON-RPC
120
+ Agents can use ZMR to turn an exploratory session into a reviewable scenario.
121
+ The safe loop is: observe with `semantic_snapshot`, take one typed action,
122
+ inspect trace events, generate a candidate, validate it, then rerun it
123
+ deterministically. After a session has produced trace artifacts, call JSON-RPC
120
124
  `trace.explain` or MCP `trace_explain` for in-band triage, then call JSON-RPC
121
125
  `trace.explore` or MCP `trace_explore` when the generated draft should carry a
122
126
  stated goal and guardrails. Use JSON-RPC `trace.discover` or MCP
@@ -169,7 +173,7 @@ then commit only reviewed scenario JSON.
169
173
 
170
174
  ## Scenario File Workflow
171
175
 
172
- For repeatable tests, generate or edit `.zmr/*.json` scenarios:
176
+ For committed tests, generate or edit `.zmr/*.json` scenarios:
173
177
 
174
178
  ```bash
175
179
  zmr validate --json .zmr/login-smoke.json
@@ -178,21 +182,26 @@ zmr explain --json traces/zmr-login-smoke
178
182
  zmr export traces/zmr-login-smoke --out traces/zmr-login-smoke-redacted.zmrtrace --redact
179
183
  ```
180
184
 
181
- Use stable selectors in this order when available:
185
+ Use stable selectors in this order:
182
186
 
183
187
  - app accessibility identifiers or resource ids
184
188
  - content descriptions or accessibility labels
185
189
  - exact visible text for stable product copy
186
190
  - `textContains` only when the visible text legitimately varies
191
+ - `stableId` only as a fallback from the current `semantic_snapshot`
187
192
  - coordinate actions only as a last resort
188
193
 
189
194
  Use `waitAny` for screens with legitimate branches, and `whenVisible` for
190
195
  optional platform or dev-client screens. Keep credentials and app-private data
191
196
  in the app repository or environment, not in public scenarios.
197
+ Prefer app-owned selectors for committed scenario files. `stableId` is useful
198
+ for immediate live-session actions when a semantic node has no better selector,
199
+ but it is less portable than an accessibility identifier or resource id.
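
Annotation: the selector preference order added in this hunk can be read as a first-match lookup. The sketch below illustrates that reading only. Apart from `textContains` and `stableId`, the key names (`id`, `label`, `text`) are hypothetical placeholders and are not taken from the ZMR scenario schema; coordinate actions are left out because the guide treats them as a last resort.

```python
from typing import Optional

# Most preferred first. Only `textContains` and `stableId` are names from the
# guide; `id`, `label`, and `text` are assumed placeholder keys.
SELECTOR_PREFERENCE = ["id", "label", "text", "textContains", "stableId"]


def pick_selector(node: dict) -> Optional[dict]:
    """Return the most preferred non-empty selector for a node, or None."""
    for key in SELECTOR_PREFERENCE:
        value = node.get(key)
        if value:
            return {key: value}
    return None


# A node with visible text and a stableId prefers the text selector.
print(pick_selector({"text": "Sign in", "stableId": "n42"}))
# A node with nothing better falls back to stableId.
print(pick_selector({"stableId": "n42"}))
```

Keeping the order in one list makes the fallback to `stableId` explicit, which matches the note that it is less portable than an app-owned identifier.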
192
200
 
193
201
  ## Failure Triage
194
202
 
195
- When a run fails, inspect:
203
+ When a run fails, inspect structured evidence before changing app code or
204
+ selectors:
196
205
 
197
206
  - `zmr run --json` terminal summary
198
207
  - `zmr explain --json <trace-dir>`
@@ -202,12 +211,12 @@ When a run fails, inspect:
202
211
  - the trace viewer report from `zmr report`
203
212
 
204
213
  Selector failures include active app context, visible text, disabled/hidden or
205
- offscreen exact candidates, and nearest text matches when available. Treat
206
- those diagnostics as the source of truth before changing a selector.
214
+ offscreen exact candidates, and nearest text matches when available. Treat those
215
+ diagnostics as the source of truth before changing a selector.
207
216
 
208
217
  ## Benchmarking
209
218
 
210
- Use ZMR repeated runs first:
219
+ Use repeated ZMR runs before making reliability claims:
211
220
 
212
221
  ```bash
213
222
  zmr-benchmark --zmr .zmr/android-smoke.json --platform android --device emulator-5554 --app-id com.example.mobiletest --app-build <build-id-or-artifact> --runs 20 --trace-root traces/zmr-android-reliability --results traces/bench-comparison/results.jsonl --replace --min-pass-rate 100 --max-failures 0
@@ -228,7 +237,8 @@ candidate and baseline rows for your team to trust the result.
228
237
 
229
238
  ## Evidence Summaries
230
239
 
231
- Teams that collect repeated app/device pilot rows can evaluate them with:
240
+ Teams that collect repeated app/device pilot rows can evaluate claim readiness
241
+ with:
232
242
 
233
243
  ```bash
234
244
  zmr-release-readiness --json \
package/docs/app-integration.md CHANGED
@@ -1,22 +1,33 @@
1
1
  # App Integration
2
2
 
3
- ZMR is intentionally a separate runner. A mobile app repo does not need to vendor ZMR, but it should expose a small, stable test surface so agents can drive the app deterministically.
3
+ ZMR is intentionally a separate runner. The app does not vendor ZMR, but it
4
+ should expose a small, stable test surface so agents and CI can drive the app
5
+ deterministically.
4
6
 
5
- Most app teams should install ZMR as a dev dependency:
7
+ Install the native `zmr` binary once, then create app-local `.zmr/` state from
8
+ the app repo:
6
9
 
7
10
  ```bash
8
- npm install --save-dev zeno-mobile-runner
9
- npx zmr-wizard --app-id com.example.mobiletest --package-json
11
+ curl -fsSL https://raw.githubusercontent.com/johnmikel/zeno-mobile-runner/main/install.sh | sh
12
+ export PATH="$HOME/.local/bin:$PATH"
13
+ zmr init --app --app-id com.example.mobiletest
10
14
  ```
11
15
 
12
- That keeps scenarios and app scripts in the app repo while the runner remains versioned through npm.
13
- For Expo development builds, add `--expo-dev-client-scheme <scheme>` to scaffold
14
- Android and iOS open-link smoke scenarios that load Metro before selector
15
- assertions run.
16
+ That keeps scenarios and traces in the app repo while the runner stays outside
17
+ the app process. JavaScript teams can use
18
+ `npm install --save-dev zeno-mobile-runner` plus
19
+ `npx zmr-wizard --app-id com.example.mobiletest --package-json` when they want
20
+ generated package scripts and npm helper bins.
21
+
22
+ For Expo development builds, use
23
+ `npx zmr-wizard --expo-dev-client-scheme <scheme> --package-json` when you want
24
+ generated Android and iOS open-link smoke scenarios that load Metro before
25
+ selector assertions run.
16
26
 
17
27
  ## React Native, Expo, And Flutter
18
28
 
19
- ZMR works best when the app exposes stable, user-meaningful selectors:
29
+ ZMR works best when the app exposes stable, user-meaningful selectors and direct
30
+ navigation paths:
20
31
 
21
32
  - React Native apps should use `testID`, `accessibilityLabel`, stable visible
22
33
  text, and deep links for direct navigation.
@@ -30,6 +41,8 @@ See [frameworks.md](frameworks.md) for framework-specific examples.
30
41
 
31
42
  ## What The App Provides
32
43
 
44
+ Think of this as the contract between the app repo and the runner.
45
+
33
46
  For Android:
34
47
 
35
48
  - A debug/test APK.
@@ -40,7 +53,8 @@ For Android:
40
53
  - Optional Android instrumentation shim command for faster hierarchy and
41
54
  selector-grade actions.
42
55
 
43
- Create the app-local Android shim command from the ZMR package or checkout:
56
+ Create the app-local Android shim command from the ZMR package or checkout when
57
+ you want faster hierarchy capture or selector-grade native actions:
44
58
 
45
59
  ```bash
46
60
  npx zmr-install-android-shim \
@@ -62,7 +76,7 @@ The generated
62
76
  `.zmr/android-shim` executable is the value to pass to `--android-shim` or
63
77
  `tools.androidShimPath`.
64
78
 
65
- For iOS:
79
+ For iOS/iPadOS:
66
80
 
67
81
  - A simulator `.app` build.
68
82
  - A stable bundle id, for example `com.example.mobiletest`.
@@ -71,7 +85,9 @@ For iOS:
71
85
  - Optional simulator XCTest/XCUIAutomation shim command for hierarchy and
72
86
  selector-grade actions.
73
87
 
74
- Create the app-local shim command from the ZMR package or checkout:
88
+ Create the app-local XCTest/XCUIAutomation shim command from the ZMR package or
89
+ checkout when selector actions, bounded hierarchy snapshots, or physical-device
90
+ screenshots are required:
75
91
 
76
92
  ```bash
77
93
  npx zmr-install-ios-shim \
@@ -121,10 +137,15 @@ mobile-app/
121
137
  ios-smoke.json
122
138
  ```
123
139
 
124
- Keep app-owned scenarios and ZMR defaults in `.zmr/` when they are app-specific. Keep generic examples in the ZMR repo. ZMR auto-discovers `.zmr/config.json` from the app repo; explicit CLI flags still override config defaults.
140
+ Keep app-owned scenarios and ZMR defaults in `.zmr/` when they are app-specific.
141
+ Keep generic examples in the ZMR repo. ZMR auto-discovers `.zmr/config.json`
142
+ from the app repo; explicit CLI flags still override config defaults.
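
Annotation: the precedence rule stated here (config defaults first, explicit CLI flags on top) can be illustrated with a small overlay function. This is a sketch under stated assumptions, not ZMR's implementation: the function name `resolve_options` is hypothetical, and the flat dotted keys stand in for whatever structure `.zmr/config.json` actually uses.

```python
def resolve_options(config_defaults: dict, cli_flags: dict) -> dict:
    """Overlay explicitly set CLI flags on top of config defaults."""
    resolved = dict(config_defaults)
    for key, value in cli_flags.items():
        # None means the flag was not passed, so the config default stands.
        if value is not None:
            resolved[key] = value
    return resolved


# Assumed flattened view of values that would live in .zmr/config.json.
config_defaults = {"android.avdName": "Small_Phone", "android.ensureDevice": False}
# Assumed parsed CLI state: only the ensure-device flag was passed.
cli_flags = {"android.avdName": None, "android.ensureDevice": True}

resolved = resolve_options(config_defaults, cli_flags)
print(resolved)
```

In this example the AVD name comes from the config file while the ensure-device setting comes from the command line.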
125
143
 
126
144
  ## Android App Pilot Command
127
145
 
146
+ Use the pilot wrapper when you want app-local reliability evidence and standard
147
+ trace/report artifacts:
148
+
128
149
  ```bash
129
150
  /path/to/zeno-mobile-runner/scripts/run-android-pilot.sh \
130
151
  --app-root /path/to/mobile-app \
@@ -146,17 +167,18 @@ Use a saved emulator snapshot for repeatability:
146
167
  ```
147
168
 
148
169
  `--screen-record` writes `screenrecord.mp4` under the pilot trace root. For
149
- direct traced runs, use `zmr run --android-avd Small_Phone
170
+ direct traced runs, use `zmr run --ensure-device --android-avd Small_Phone
150
171
  --create-avd-if-missing --avd-system-image
151
172
  'system-images;android-35;google_apis;arm64-v8a' --avd-device pixel_6
152
173
  --restore-snapshot zmr-clean --wait-emulator --screen-record`, or set the
153
174
  equivalent `android.avdName`, `android.createAvdIfMissing`,
154
175
  `android.avdSystemImage`, `android.avdDeviceProfile`,
155
- `android.restoreSnapshot`, `android.waitReady`, and
176
+ `android.restoreSnapshot`, `android.waitReady`, `android.ensureDevice`, and
156
177
  `artifacts.screenRecording` values in `.zmr/config.json`. Treat recordings like
157
178
  screenshots: keep them local or share only when the app state is safe.
158
179
 
159
- The Android wrapper expects the default APK path under the app root. Override it when needed:
180
+ The Android wrapper expects the default APK path under the app root. Override it
181
+ when needed:
160
182
 
161
183
  ```bash
162
184
  /path/to/zeno-mobile-runner/scripts/run-android-pilot.sh \
@@ -167,7 +189,9 @@ The Android wrapper expects the default APK path under the app root. Override it
167
189
 
168
190
  ## Public Android Demo Command
169
191
 
170
- For a generic public Android app:
192
+ Use the public demo before connecting ZMR to a private app. It proves local
193
+ Android install, launch, selector action, typing, snapshot, and trace capture
194
+ with generic artifacts:
171
195
 
172
196
  ```bash
173
197
  npx zmr-demo-android --out /tmp/zmr-android-demo --device emulator-5554 --avd <avd-name>
@@ -186,12 +210,11 @@ adb install -r /tmp/zmr-android-demo/build/app-debug.apk
186
210
  --trace-dir /tmp/zmr-android-demo/traces/android-demo
187
211
  ```
188
212
 
189
- Use this path to prove local Android install, launch, selector action, typing,
190
- snapshot, and trace capture before wiring ZMR into a private app.
213
+ Use this path before wiring ZMR into a private app.
191
214
 
192
215
  ## iOS Demo Command
193
216
 
194
- For a generic public demo app with the shim already installed:
217
+ Use the public iOS demo before connecting ZMR to a private iOS or iPadOS app:
195
218
 
196
219
  ```bash
197
220
  npx zmr-demo-ios --out /tmp/zmr-ios-demo --device booted
@@ -218,7 +241,8 @@ Then boot a simulator and run:
218
241
  --ios-shim /tmp/zmr-ios-demo/.zmr/ios-shim
219
242
  ```
220
243
 
221
- Build the app for an iOS simulator, boot a simulator, then run:
244
+ For a private app, build the app for an iOS simulator, boot a simulator, then
245
+ run:
222
246
 
223
247
  ```bash
224
248
  /path/to/zeno-mobile-runner/scripts/run-ios-pilot.sh \
@@ -239,17 +263,21 @@ visible labels, hidden/disabled/offscreen candidates, and nearest text matches.
239
263
  When the app is already running, ZMR uses the shim `appState` response as an
240
264
  idempotent launch confirmation if `simctl launch` itself returns an error.
241
265
 
242
- On iOS simulators, `clearState` means best-effort app uninstall by bundle id.
243
- For physical iOS devices, lifecycle commands go through `devicectl` and
244
- selector commands go through the same app-local XCTest shim, subject to signing,
245
- provisioning, Developer Mode, and local Xcode availability. Screenshot
266
+ On iOS and iPadOS simulators, `clearState` means best-effort app uninstall by
267
+ bundle id. Use the same `--platform ios --ios-device-type simulator` path for
268
+ iPhone and iPad simulators, but collect separate iPad evidence when tablet
269
+ layouts, split views, or size classes can change the UI tree.
270
+
271
+ For physical iPhone and iPad devices, lifecycle commands go through `devicectl`
272
+ and selector commands go through the same app-local XCTest shim, subject to
273
+ signing, provisioning, Developer Mode, and local Xcode availability. Screenshot
246
274
  artifacts use the XCTest shim; log artifact capture is simulator-first in this
247
275
  release.
248
276
  Use a simulator-built `iphonesimulator` `.app` for simulator runs. A signed
249
277
  device `.ipa` must be run with `--ios-device-type physical`; the pilot wrapper
250
278
  rejects device IPAs on simulator runs before installing anything.
251
279
  Use `--ios-device-type physical` with a concrete device identifier from
252
- `zmr devices` for physical pilot runs:
280
+ `zmr devices` for physical iPhone or iPad pilot runs:
253
281
 
254
282
  ```bash
255
283
  /path/to/zeno-mobile-runner/scripts/run-ios-pilot.sh \
@@ -268,6 +296,8 @@ Install the simulator `.app` again before launch/open-link steps that need it.
268
296
 
269
297
  ## Direct CLI Use
270
298
 
299
+ Use direct CLI commands when debugging a scenario or wiring custom CI steps.
300
+
271
301
  Android:
272
302
 
273
303
  ```bash
@@ -291,6 +321,7 @@ xcrun simctl install booted /path/to/Sample.app
291
321
  zmr run .zmr/ios-shim-smoke.json \
292
322
  --platform ios \
293
323
  --device booted \
324
+ --ensure-device \
294
325
  --app-id com.example.mobiletest \
295
326
  --ios-shim ./.zmr/ios-shim \
296
327
  --trace-dir traces/ios-smoke
@@ -330,5 +361,6 @@ session.
330
361
  ## Public Artifact Rules
331
362
 
332
363
  - Share `*-redacted.zmrtrace` bundles.
333
- - Do not publish raw Metro logs, simulator logs, or unredacted screenshot bundles from private apps.
364
+ - Do not publish raw Metro logs, simulator logs, or unredacted screenshot
365
+ bundles from private apps.
334
366
  - Run `bash tests/public-safety-test.sh` before publishing this repo.
package/docs/benchmarking.md CHANGED
@@ -1,8 +1,9 @@
1
1
  # Benchmarking
2
2
 
3
- ZMR benchmark output is intentionally simple: each run appends one JSON object
4
- to `results.jsonl`, and `zmr report` turns that directory into local HTML and
5
- optional JUnit XML artifacts.
3
+ Benchmarking in ZMR is evidence-first. Each run appends one JSON object to
4
+ `results.jsonl`, and `zmr report` turns the directory into local HTML plus
5
+ optional JUnit XML artifacts. Use repeated runs for reliability claims and
6
+ matched baseline rows for speed claims.
6
7
 
7
8
  ## Public Evidence
8
9
 
@@ -39,6 +40,9 @@ setup, and Android/iOS ZMR workflow scenarios, but no public timing rows yet.
39
40
 
40
41
  ## Single Tool Benchmark
41
42
 
43
+ Use a single-tool benchmark to prove that ZMR can run a scenario repeatedly on a
44
+ target with the required pass rate and latency threshold:
45
+
42
46
  ```bash
43
47
  scripts/benchmark.sh \
44
48
  --zmr examples/android-app-login-smoke.json \
@@ -71,7 +75,8 @@ zmr report traces/bench-<timestamp> \
71
75
 
72
76
  ## Pilot Wrapper
73
77
 
74
- The configurable Android pilot script can run both sample scenarios repeatedly:
78
+ Use pilot wrappers when the benchmark should look like a real app release gate.
79
+ The configurable Android pilot script can run sample scenarios repeatedly:
75
80
 
76
81
  ```bash
77
82
  ./scripts/run-android-pilot.sh \
@@ -173,15 +178,18 @@ Benchmark reports include:
173
178
  - links to each run's `events.jsonl`
174
179
  - optional JUnit XML with one testcase per benchmark row for CI test reports
175
180
 
176
- Before making public performance claims, run the same scenario repeatedly on a clean emulator image and include the raw `results.jsonl` plus the redacted trace bundle for any failure.
181
+ Before making public performance claims, run the same scenario repeatedly on a
182
+ clean emulator or simulator state and retain the raw `results.jsonl` plus the
183
+ redacted trace bundle for any failure.
177
184
 
178
185
  ![ZMR HTML trace report showing the trace summary and per-event timeline](assets/report-html.png)
179
186
 
180
187
  ## Compare Against A Baseline
181
188
 
182
- Use `zmr-compare-benchmarks` when a private app repo has benchmark rows from
183
- ZMR and another local runner. The public ZMR repo keeps this generic: rows are
184
- grouped by the `tool` field and no external runner is hardcoded.
189
+ Use `zmr-compare-benchmarks` only when a private app repo has benchmark rows
190
+ from ZMR and another local runner for the same app path. The public ZMR repo
191
+ keeps this generic: rows are grouped by the `tool` field and no external runner
192
+ is hardcoded.
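
Annotation: grouping rows by the `tool` field, as described in this hunk, might look like the sketch below. It is an illustration rather than how `zmr-compare-benchmarks` works. Only the `tool` field is named in the docs; the `status` field, its `"passed"` value, and the second tool name are assumptions made for the example.

```python
import json
from collections import defaultdict


def pass_rates(jsonl_text: str) -> dict:
    """Return the pass rate in percent per `tool` from JSONL benchmark rows."""
    totals = defaultdict(lambda: [0, 0])  # tool -> [passed, total]
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        row = json.loads(line)
        counts = totals[row["tool"]]
        counts[1] += 1
        # Assumed row shape: a `status` field equal to "passed" on success.
        if row.get("status") == "passed":
            counts[0] += 1
    return {tool: 100.0 * passed / total for tool, (passed, total) in totals.items()}


sample = "\n".join(
    json.dumps(row)
    for row in [
        {"tool": "zmr", "status": "passed"},
        {"tool": "zmr", "status": "passed"},
        {"tool": "zmr", "status": "failed"},
        {"tool": "zmr", "status": "passed"},
        {"tool": "other-runner", "status": "passed"},
    ]
)

rates = pass_rates(sample)
print(rates)
```

A gate such as a minimum pass rate could then be checked per tool against these values before any comparison claim is made.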
185
193
 
186
194
  Collect ZMR rows into the shared comparison file first:
187
195
 
package/docs/benchmarks/README.md CHANGED
@@ -1,7 +1,8 @@
1
1
  # Benchmark Evidence
2
2
 
3
3
  This directory contains public-safe benchmark evidence collected from
4
- reproducible ZMR demo apps.
4
+ reproducible ZMR demo apps. Treat each file as fixture-specific evidence, not a
5
+ global performance claim.
5
6
 
6
7
  Evidence here is intentionally narrow:
7
8
 
@@ -11,6 +12,7 @@ Evidence here is intentionally narrow:
11
12
  baseline was collected on the same app build, device state, and scenario.
12
13
  - Raw local traces are not committed because generated reports and JUnit files
13
14
  can include absolute local paths. Public rows are sanitized before commit.
15
+ - Product claims should point to the matching evidence pack and its scope.
14
16
 
15
17
  ## Evidence Packs
16
18
 
package/docs/benchmarks/benchmark-lab-v1.md CHANGED
@@ -21,7 +21,7 @@ React Native, Expo, Flutter, native Android, and native iOS. The lab is not a
21
21
  generic benchmark scoreboard. Each fixture must represent an app workflow a
22
22
  developer can inspect, build, run, and adapt.
23
23
 
24
- The near-term wedge is agent-native mobile testing: structured observation,
24
+ The near-term wedge is agent-first mobile testing: structured observation,
25
25
  selector-grade actions, trace-first debugging, and reviewable scenario
26
26
  generation. Benchmarks should prove the local runner path is fast and reliable
27
27
  without overstating what one fixture demonstrates.
package/docs/client-installation.md CHANGED
@@ -1,25 +1,27 @@
1
1
  # Client Installation
2
2
 
3
- ZMR has two layers:
3
+ ZMR has two layers. Install the binary first; add a language client only when
4
+ your agent or harness needs idiomatic host-side calls.
4
5
 
5
6
  1. The `zmr` binary controls devices, runs scenarios, serves JSON-RPC, and writes traces.
6
7
  2. Language clients are optional wrappers around `zmr serve --transport stdio`.
7
8
 
8
- For fastest adoption, install the binary once with npm or Homebrew. Then use a
9
- language client only when you want tests or agents written in that language.
9
+ For fastest adoption, install the native binary once with the curl installer.
10
+ Then use a language client only when tests or agents are written in that
11
+ language.
10
12
 
11
13
  ## Binary First
12
14
 
13
- Install from npm inside the app repo:
15
+ Install the release binary and scaffold app-local ZMR state:
14
16
 
15
17
  ```bash
16
- npm install --save-dev zeno-mobile-runner
17
- npx zmr-wizard --app-id com.example.mobiletest --package-json
18
- npx zmr version
18
+ curl -fsSL https://raw.githubusercontent.com/johnmikel/zeno-mobile-runner/main/install.sh | sh
19
+ export PATH="$HOME/.local/bin:$PATH"
20
+ zmr init --app --app-id com.example.mobiletest
21
+ zmr version
19
22
  ```
20
23
 
21
- Homebrew is the best install path for non-JavaScript teams because it gives any
22
- language the same `zmr` executable:
24
+ Homebrew is a secondary native install path when you use a generated formula:
23
25
 
24
26
  ```bash
25
27
  brew install --build-from-source ./dist/homebrew/zmr.rb
@@ -29,6 +31,13 @@ brew tap johnmikel/zmr
29
31
  brew install zmr
30
32
  ```
31
33
 
34
+ JavaScript teams can instead pin ZMR in the app repo and use npm helper bins:
35
+
36
+ ```bash
37
+ npm install --save-dev zeno-mobile-runner
38
+ npx zmr-wizard --app-id com.example.mobiletest --package-json
39
+ ```
40
+
32
41
  ## TypeScript
33
42
 
34
43
  ```bash
package/docs/clients.md CHANGED
@@ -1,14 +1,15 @@
1
1
  # Client Guide
2
2
 
3
3
  ZMR clients are reference implementations for the JSON-RPC protocol used by
4
- `zmr serve`. They are intentionally small and dependency-light.
4
+ `zmr serve`. They are intentionally small, dependency-light, and host-side.
5
+ They do not replace the `zmr` binary and they do not run inside the app under
6
+ test.
5
7
 
6
8
  TypeScript and Python are the most common starting points for app teams and
7
- agent harnesses. Go, Rust, Swift, and Kotlin clients are reference integrations
8
- for teams that want to embed the protocol from those ecosystems. Go and Rust
9
- include typed trace discovery and scenario validation helpers for host-side
10
- agent loops; Swift and Kotlin include lightweight discovery and validation
11
- helpers for host-side automation.
9
+ agent harnesses. Go, Rust, Swift, and Kotlin are reference integrations for
10
+ teams that want host-side orchestration in those ecosystems. Go and Rust include
11
+ typed trace discovery and scenario validation helpers; Swift and Kotlin include
12
+ lightweight discovery and validation helpers for native mobile teams.
12
13
 
13
14
  | Language | Entry point | Example |
14
15
  | --- | --- | --- |