zeno-mobile-runner 0.1.3 → 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (115)
  1. package/CHANGELOG.md +192 -2
  2. package/FEATURES.md +50 -7
  3. package/README.md +168 -120
  4. package/build.zig.zon +3 -3
  5. package/clients/README.md +60 -3
  6. package/clients/go/README.md +12 -0
  7. package/clients/go/zmr/client.go +142 -0
  8. package/clients/kotlin/README.md +18 -1
  9. package/clients/kotlin/build.gradle.kts +1 -1
  10. package/clients/kotlin/src/main/kotlin/dev/zmr/ZmrClient.kt +76 -1
  11. package/clients/python/README.md +19 -0
  12. package/clients/python/pyproject.toml +1 -1
  13. package/clients/python/zmr_client.py +33 -0
  14. package/clients/rust/Cargo.lock +1 -1
  15. package/clients/rust/Cargo.toml +1 -1
  16. package/clients/rust/README.md +25 -1
  17. package/clients/rust/src/lib.rs +201 -0
  18. package/clients/swift/README.md +18 -0
  19. package/clients/swift/Sources/ZMRClient/ZMRClient.swift +82 -0
  20. package/clients/typescript/README.md +16 -0
  21. package/clients/typescript/index.d.ts +12 -0
  22. package/clients/typescript/index.mjs +16 -0
  23. package/clients/typescript/package.json +1 -1
  24. package/docs/agent-discovery.md +151 -22
  25. package/docs/ai-agents.md +99 -11
  26. package/docs/benchmarking.md +49 -3
  27. package/docs/benchmarks/2026-06-09-android-workflow.md +73 -0
  28. package/docs/benchmarks/2026-06-09-android-workflow.results.jsonl +20 -0
  29. package/docs/benchmarks/2026-06-09-framework-baseline-status.md +32 -0
  30. package/docs/benchmarks/2026-06-09-ios-appium-comparison.md +115 -0
  31. package/docs/benchmarks/2026-06-09-ios-appium-comparison.results.jsonl +40 -0
  32. package/docs/benchmarks/2026-06-09-ios-demo.md +90 -0
  33. package/docs/benchmarks/2026-06-09-ios-demo.results.jsonl +20 -0
  34. package/docs/benchmarks/2026-06-09-ios-maestro-comparison.md +128 -0
  35. package/docs/benchmarks/2026-06-09-ios-maestro-comparison.results.jsonl +40 -0
  36. package/docs/benchmarks/2026-06-09-ios-workflow-comparison.md +143 -0
  37. package/docs/benchmarks/2026-06-09-ios-workflow-comparison.results.jsonl +40 -0
  38. package/docs/benchmarks/2026-06-09-ios-xctest-floor.md +106 -0
  39. package/docs/benchmarks/2026-06-09-ios-xctest-floor.results.jsonl +40 -0
  40. package/docs/benchmarks/README.md +36 -0
  41. package/docs/benchmarks/benchmark-lab-v1.json +155 -0
  42. package/docs/benchmarks/benchmark-lab-v1.md +95 -0
  43. package/docs/clients.md +26 -6
  44. package/docs/demo.md +40 -1
  45. package/docs/expo-smoke.md +8 -8
  46. package/docs/frameworks.md +10 -0
  47. package/docs/install.md +3 -2
  48. package/docs/npm.md +100 -4
  49. package/docs/production-readiness.md +123 -0
  50. package/docs/protocol-fixtures/core-session.responses.jsonl +1 -1
  51. package/docs/protocol.md +215 -16
  52. package/docs/scenario-authoring.md +18 -0
  53. package/docs/trace-privacy.md +9 -0
  54. package/docs/troubleshooting.md +7 -1
  55. package/examples/android-workflow.json +79 -0
  56. package/examples/ios-shim-workflow.json +79 -0
  57. package/examples/react-native-expo-workflow.json +75 -0
  58. package/npm/agents.mjs +16 -0
  59. package/npm/commands.mjs +9 -5
  60. package/package.json +6 -1
  61. package/prebuilds/darwin-arm64/zmr +0 -0
  62. package/prebuilds/darwin-x64/zmr +0 -0
  63. package/prebuilds/linux-arm64/zmr +0 -0
  64. package/prebuilds/linux-x64/zmr +0 -0
  65. package/schemas/README.md +4 -0
  66. package/schemas/discover-output.schema.json +83 -0
  67. package/schemas/draft-output.schema.json +58 -0
  68. package/schemas/explore-output.schema.json +94 -0
  69. package/schemas/inspect-output.schema.json +88 -0
  70. package/schemas/run-output.schema.json +2 -0
  71. package/scripts/benchmark-lab.py +253 -0
  72. package/scripts/create-android-demo-app.sh +324 -29
  73. package/scripts/create-ios-demo-app.sh +174 -7
  74. package/scripts/create-react-native-expo-demo-app.sh +727 -0
  75. package/scripts/demo.sh +3 -0
  76. package/scripts/install-ios-shim.sh +2 -2
  77. package/scripts/release-readiness.py +43 -0
  78. package/scripts/run-android-pilot.sh +35 -9
  79. package/scripts/run-ios-pilot.sh +11 -4
  80. package/shims/ios/ZMRShim.swift +10 -0
  81. package/shims/ios/ZMRShimUITestCase.swift +42 -0
  82. package/shims/ios/protocol.md +1 -0
  83. package/skills/zmr-mobile-testing/SKILL.md +28 -3
  84. package/src/cli_discover.zig +239 -0
  85. package/src/cli_draft.zig +924 -0
  86. package/src/cli_explore.zig +136 -0
  87. package/src/cli_import.zig +31 -15
  88. package/src/cli_inspect.zig +310 -0
  89. package/src/cli_output.zig +26 -2
  90. package/src/cli_run.zig +28 -0
  91. package/src/cli_trace.zig +45 -15
  92. package/src/cli_validate.zig +12 -6
  93. package/src/errors.zig +9 -0
  94. package/src/ios.zig +49 -12
  95. package/src/ios_shim.zig +36 -2
  96. package/src/json_rpc_methods.zig +85 -11
  97. package/src/json_rpc_params.zig +8 -0
  98. package/src/json_rpc_protocol.zig +1 -1
  99. package/src/json_rpc_trace.zig +112 -0
  100. package/src/main.zig +27 -2
  101. package/src/mcp.zig +209 -6
  102. package/src/mcp_protocol.zig +29 -1
  103. package/src/mcp_trace.zig +126 -4
  104. package/src/report.zig +186 -0
  105. package/src/runner.zig +26 -4
  106. package/src/runner_actions.zig +10 -0
  107. package/src/runner_diagnostics.zig +31 -1
  108. package/src/runner_events.zig +70 -7
  109. package/src/runner_native.zig +17 -1
  110. package/src/runner_waits.zig +82 -19
  111. package/src/scaffold.zig +28 -12
  112. package/src/scenario.zig +32 -4
  113. package/src/schema_registry.zig +4 -0
  114. package/src/version.zig +1 -1
  115. package/viewer/app.js +23 -3
package/clients/typescript/index.mjs CHANGED
@@ -146,6 +146,10 @@ export class ZmrClient {
     return this.request("assert.healthy", options);
   }
 
+  validateScenario(path) {
+    return this.request("scenario.validate", { path });
+  }
+
   exportTrace(out, options = {}) {
     return this.request("trace.export", { out, ...options });
   }
@@ -154,6 +158,18 @@ export class ZmrClient {
     return this.request("trace.events", { afterSeq, ...options });
   }
 
+  explainTrace() {
+    return this.request("trace.explain", {});
+  }
+
+  discoverTrace(out, options = {}) {
+    return this.request("trace.discover", { out, ...options });
+  }
+
+  exploreTrace(out, goal, options = {}) {
+    return this.request("trace.explore", { out, goal, ...options });
+  }
+
   async close() {
     if (this.#closed) return;
     this.#closed = true;
package/clients/typescript/package.json CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "@zmr/client",
-  "version": "0.1.3",
+  "version": "0.2.0",
   "type": "module",
   "main": "index.mjs",
   "types": "index.d.ts",
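Reviewer note: the four methods added to `ZmrClient` each forward a method name and a params object to `request()`. The sketch below shows the JSON-RPC payloads they would produce, assuming `request()` wraps its arguments in a plain JSON-RPC 2.0 envelope like the request examples in `docs/agent-discovery.md`. `buildRequest`, `newMethods`, and `envelopeFor` are hypothetical names used only for illustration; they are not part of `@zmr/client`.

```javascript
// Hypothetical helper: builds a JSON-RPC 2.0 envelope (assumed shape,
// based on the request examples in docs/agent-discovery.md).
function buildRequest(id, method, params) {
  return { jsonrpc: "2.0", id, method, params };
}

// Mirrors the (method, params) pairs the four new client methods pass on.
const newMethods = {
  validateScenario: (path) => ["scenario.validate", { path }],
  explainTrace: () => ["trace.explain", {}],
  discoverTrace: (out, options = {}) => ["trace.discover", { out, ...options }],
  exploreTrace: (out, goal, options = {}) => [
    "trace.explore",
    { out, goal, ...options },
  ],
};

// Hypothetical convenience wrapper used only by this sketch.
function envelopeFor(id, name, ...args) {
  const [method, params] = newMethods[name](...args);
  return buildRequest(id, method, params);
}

const req = envelopeFor(
  7,
  "exploreTrace",
  ".zmr/discovered/login-smoke.json",
  "find a stable login smoke",
  { includeActions: true, validate: true, force: true }
);
console.log(JSON.stringify(req));
```

Because `options` is spread last, a caller-supplied `out` or `goal` key inside `options` would override the positional argument, which is worth keeping in mind when passing options through from agent input.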
package/docs/agent-discovery.md CHANGED
@@ -1,18 +1,32 @@
1
1
  # Agent Discovery
2
2
 
3
- ZMR supports agent-led discovery today through its JSON-RPC and MCP interfaces.
4
- An external agent can observe the app, choose typed actions, inspect trace
5
- events, and write a repeatable scenario file as it learns a flow.
6
-
7
- ZMR does not include a built-in autonomous crawler or test writer in this
8
- developer preview. Keep the planning loop in the agent, and keep ZMR as the
9
- deterministic mobile control plane.
3
+ ZMR supports agent-led discovery today through its JSON-RPC and MCP interfaces,
4
+ trace events, semantic snapshot artifacts, guarded trace exploration, in-band
5
+ trace discovery, and offline scenario drafting. An external agent can observe
6
+ the app, choose typed actions, inspect trace events, ask ZMR to write a small
7
+ repeatable scenario from the trace, and then edit it as it learns a flow.
8
+
9
+ `zmr explore` is the built-in review-first exploration command. It is
10
+ trace-backed, not an unbounded crawler: it does not launch devices, invent
11
+ missing actions, discover credentials, or commit files. Keep autonomous
12
+ planning in the agent, and keep ZMR as the deterministic mobile control plane.
13
+
14
+ ```mermaid
15
+ flowchart LR
16
+ SESSION["Live agent session<br/>or zmr run"] --> TRACE["Trace directory"]
17
+ TRACE --> DISCOVER["zmr discover / draft / explore<br/>--from-trace"]
18
+ DISCOVER --> CANDIDATE["Scenario candidate<br/>.zmr/discovered/*.json"]
19
+ CANDIDATE --> REVIEW["Human / agent review"]
20
+ REVIEW --> VALIDATE["zmr validate --json"]
21
+ VALIDATE --> CI["zmr run in CI<br/>report.html · junit.xml"]
22
+ ```
10
23
 
11
24
  ## Recommended Loop
12
25
 
13
26
  1. Validate local setup:
14
27
 
15
28
  ```bash
29
+ zmr inspect --json --dir .
16
30
  zmr doctor --json --config .zmr/config.json
17
31
  zmr validate --json .zmr/ios-smoke.json
18
32
  ```
@@ -34,15 +48,128 @@ deterministic mobile control plane.
34
48
  5. Choose one typed action, such as `ui.tap`, `ui.type`, `app.openLink`, or
35
49
  `wait.until`.
36
50
  6. Observe again and inspect `trace.events`.
37
- 7. Write successful steps into a candidate scenario, for example
38
- `.zmr/discovered/login-smoke.json`.
39
- 8. Validate the candidate scenario:
51
+ 7. If you used `zmr run --json --trace-dir`, read `nextCommands`; traced run
52
+ summaries include HTML/JUnit report output and the matching
53
+ `zmr discover --from-trace` command.
54
+ 8. If you want the CLI run itself to write the candidate, use:
55
+
56
+ ```bash
57
+ zmr run .zmr/login-smoke.json \
58
+ --trace-dir traces/zmr-agent \
59
+ --discover-out .zmr/discovered/replay-smoke.json \
60
+ --json
61
+ ```
62
+
63
+ The run response embeds `discovery`, the same JSON payload returned by
64
+ `zmr discover --json`, including `replay` coverage metadata for converted
65
+ and skipped trace actions.
66
+ 9. Generate a reviewable scenario candidate from the trace. For CLI-driven
67
+ agent loops, prefer `zmr explore` so the goal and guardrails travel with the
68
+ machine-readable result:
69
+
70
+ ```bash
71
+ zmr explore --from-trace traces/zmr-agent \
72
+ --out .zmr/discovered/login-smoke.json \
73
+ --goal "find a stable login smoke" \
74
+ --include-actions \
75
+ --validate \
76
+ --json
77
+ ```
78
+
79
+ The output is covered by `schemas/explore-output.schema.json` and includes
80
+ `autonomous:false`, `reviewRequired:true`, `guardrails`, replay coverage,
81
+ validation, and deterministic next commands.
82
+
83
+ 10. Use live trace exploration when the agent should keep the goal attached to
84
+ the generated draft. JSON-RPC agents can call `trace.explore`:
85
+
86
+ ```json
87
+ {"jsonrpc":"2.0","id":7,"method":"trace.explore","params":{"out":".zmr/discovered/login-smoke.json","goal":"find a stable login smoke","includeActions":true,"validate":true,"force":true}}
88
+ ```
89
+
90
+ MCP agents can call `trace_explore` with `out`, `goal`,
91
+ `includeActions`, `validate`, and `force`. The response includes
92
+ `autonomous:false`, `reviewRequired:true`, and `guardrails`.
93
+
94
+ 11. Use the lower-level trace discovery primitive when the agent already owns
95
+ goal tracking. JSON-RPC agents can
96
+ call `trace.discover`:
97
+
98
+ ```json
99
+ {"jsonrpc":"2.0","id":7,"method":"trace.discover","params":{"out":".zmr/discovered/replay-smoke.json","includeActions":true,"validate":true,"force":true}}
100
+ ```
101
+
102
+ MCP agents can call `trace_discover` with the same `out`,
103
+ `includeActions`, `validate`, and `force` arguments. The offline CLI
104
+ equivalent is:
105
+
106
+ ```bash
107
+ zmr discover --from-trace traces/zmr-agent \
108
+ --out .zmr/discovered/replay-smoke.json \
109
+ --include-actions \
110
+ --validate \
111
+ --json
112
+ ```
113
+
114
+ `zmr discover` writes a scenario from trace evidence and, with
115
+ `--validate`, immediately proves that the generated file is syntactically
116
+ runnable by ZMR. It is still review-first: it does not crawl, invent missing
117
+ actions, discover credentials, or commit the scenario.
118
+ Read the `replay` object before trusting coverage: `eventCount` is the
119
+ trace action event count considered for replay, `stepCount` is the number of
120
+ generated replay steps, and `skippedEventCount` is the number of events left
121
+ out.
122
+
123
+ 11. After editing a generated scenario, validate it in-band with JSON-RPC:
124
+
125
+ ```json
126
+ {"jsonrpc":"2.0","id":8,"method":"scenario.validate","params":{"path":".zmr/discovered/replay-smoke.json"}}
127
+ ```
128
+
129
+ MCP agents can call `scenario_validate` with the same `path` argument. The
130
+ result matches `zmr validate --json`, including field paths and source
131
+ locations for invalid files.
132
+
133
+ 12. Use the lower-level draft primitive when you want separate surface and
134
+ replay files. For a conservative surface-smoke scenario:
135
+
136
+ ```bash
137
+ zmr draft --from-trace traces/zmr-agent \
138
+ --out .zmr/discovered/surface-smoke.json \
139
+ --json
140
+ ```
141
+
142
+ The draft contains `launch`, `snapshot`, and `assertVisible` steps from
143
+ stable visible selectors. It does not tap, type, crawl, or commit anything.
144
+ If the trace contains successful typed actions and you want a replayable
145
+ starting point, include those supported events explicitly:
146
+
147
+ ```bash
148
+ zmr draft --from-trace traces/zmr-agent \
149
+ --out .zmr/discovered/replay-smoke.json \
150
+ --include-actions \
151
+ --json
152
+ ```
153
+
154
+ Replay drafts include only supported events with stable replay data, such as
155
+ launch, deep links, selector taps, selector text entry, back, keyboard hiding,
156
+ coordinate-complete swipes, selector/timeout-preserving waits, and
157
+ direction/timeout-preserving selector scrolls, selector/timeout-preserving
158
+ `assertVisible` and `assertNotVisible`, `assertNoneVisible` selector arrays,
159
+ and timed `assertHealthy` checks. Native selector wait traces also retain
160
+ timeout context for successful waits and timeout diagnostics.
161
+ Unsupported events stay out of the scenario and are reported as warnings.
162
+
163
+ 13. Edit the draft, discovery, or exploration output into a candidate flow, for example
164
+ `.zmr/discovered/login-smoke.json`, by copying only steps that were observed
165
+ and understood.
166
+ 14. Validate the candidate scenario:
40
167
 
41
168
  ```bash
42
169
  zmr validate --json .zmr/discovered/login-smoke.json
43
170
  ```
44
171
 
45
- 9. Re-run it deterministically:
172
+ 15. Re-run it deterministically:
46
173
 
47
174
  ```bash
48
175
  zmr run .zmr/discovered/login-smoke.json \
@@ -52,7 +179,7 @@ deterministic mobile control plane.
52
179
  --json
53
180
  ```
54
181
 
55
- 10. Export a redacted bundle before sharing artifacts:
182
+ 16. Export a redacted bundle before sharing artifacts:
56
183
 
57
184
  ```bash
58
185
  zmr export traces/zmr-login-smoke \
@@ -68,16 +195,18 @@ deterministic mobile control plane.
68
195
  - Prefer accessibility identifiers, resource ids, stable labels, and exact text
69
196
  over coordinates.
70
197
  - Require human review before committing generated tests.
198
+ - Treat `zmr explore` output as a starting point, not as a production-ready
199
+ flow.
200
+ - Treat `zmr discover` output as a starting point, not as a production-ready
201
+ flow.
202
+ - Treat `zmr draft` output as a starting point, not as a production-ready flow.
203
+ - Use `--include-actions` only after reviewing the trace events that produced
204
+ the replay draft.
71
205
  - Redact traces before sharing them outside the local team.
72
206
 
73
- ## Future Shape
74
-
75
- A future command could wrap this loop:
76
-
77
- ```bash
78
- zmr explore --goal "find the login flow" --out .zmr/discovered/login-smoke.json
79
- ```
207
+ ## Current Shape
80
208
 
81
- That command is not shipped today. The safer product direction is to make
82
- scenario discovery explicit, reviewable, and trace-backed before it becomes a
83
- one-command workflow.
209
+ `zmr explore` is the first shipped goal-carrying command in this loop. It still
210
+ requires an existing trace because the current product direction is to keep
211
+ scenario generation explicit, reviewable, and trace-backed before any future
212
+ goal-driven crawler can safely act inside an app.
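Reviewer note: the new docs ask agents to check `autonomous:false`, `reviewRequired:true`, `guardrails`, and the `replay` counts before trusting a generated scenario. A minimal sketch of such a check follows. The field names come from the text above, but their nesting, the type of `guardrails`, and the sample object are assumptions; the authoritative shape is `schemas/explore-output.schema.json`, which is not shown in this diff. The coverage ratio also assumes roughly one generated step per replayed event, which the docs do not state.

```javascript
// Hypothetical review gate for a `zmr explore --json` style result.
// Returns human-readable problems plus a rough replay coverage ratio.
function reviewGate(result) {
  const problems = [];
  if (result.autonomous !== false) problems.push("autonomous is not false");
  if (result.reviewRequired !== true) problems.push("reviewRequired is not true");
  if (result.guardrails === undefined) problems.push("guardrails missing");

  // Assumed location of the replay counters: a top-level `replay` object.
  const { eventCount = 0, stepCount = 0, skippedEventCount = 0 } =
    result.replay ?? {};
  if (skippedEventCount > 0) {
    problems.push(`${skippedEventCount} trace events were skipped`);
  }
  const coverage = eventCount > 0 ? stepCount / eventCount : null;
  return { problems, coverage };
}

// Hypothetical sample; real output may carry more fields.
const sampleResult = {
  autonomous: false,
  reviewRequired: true,
  guardrails: [],
  replay: { eventCount: 10, stepCount: 8, skippedEventCount: 2 },
};
console.log(reviewGate(sampleResult));
```

Treating any skipped event as a problem is deliberately conservative: it forces a look at the warnings before anyone makes a coverage claim.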
package/docs/ai-agents.md CHANGED
@@ -1,13 +1,33 @@
1
1
  # AI Agent Guide
2
2
 
3
3
  ZMR is built for external agents. The runner provides device state, typed
4
- actions, waits, assertions, and trace export; the agent decides the next step.
4
+ actions, waits, assertions, trace explanation, and trace export; the agent
5
+ decides the next step.
6
+
7
+ ```mermaid
8
+ sequenceDiagram
9
+ participant Agent as AI agent
10
+ participant ZMR
11
+ participant Device as Emulator / simulator
12
+ Agent->>ZMR: semantic_snapshot
13
+ ZMR->>Device: capture UI + screenshot
14
+ ZMR-->>Agent: roles, stable selectors, bounds
15
+ Agent->>ZMR: tap / type / swipe / open_link
16
+ ZMR->>Device: execute + settle
17
+ Agent->>ZMR: wait_visible / assert_visible
18
+ ZMR-->>Agent: typed result + trace events
19
+ Agent->>ZMR: trace_discover
20
+ ZMR-->>Agent: reviewable replay scenario
21
+ Agent->>ZMR: trace_export --redact
22
+ ZMR-->>Agent: .zmrtrace evidence bundle
23
+ ```
5
24
 
6
25
  ## Agent Setup Loop
7
26
 
8
27
  Start inside the app checkout:
9
28
 
10
29
  ```bash
30
+ zmr inspect --json --dir .
11
31
  zmr doctor --json --config .zmr/config.json
12
32
  zmr validate --json .zmr/android-smoke.json
13
33
  zmr validate --json .zmr/ios-smoke.json
@@ -18,6 +38,10 @@ Use `zmr doctor --strict --json` in CI or setup flows that should fail on any
18
38
  warning. Prefer JSON output for automation because it includes stable error
19
39
  codes, field paths, and remediation hints.
20
40
 
41
+ Use `zmr inspect --json --dir .` first when an agent enters a repo. It is a
42
+ read-only handoff with config status, generated agent instruction status,
43
+ platform smoke scenario paths, safe next commands, and explicit claim limits.
44
+
21
45
  ## Live JSON-RPC Session
22
46
 
23
47
  Agents should prefer `zmr serve` for interactive work:
@@ -35,8 +59,15 @@ Recommended flow:
35
59
  4. Choose one typed action or assertion.
36
60
  5. Let ZMR settle, then observe again.
37
61
  6. Poll `trace.events` during long runs.
38
- 7. Call `trace.export` with `redact: true` before sharing artifacts.
39
- 8. Call `session.close`.
62
+ 7. Call `trace.explain` when you need the active trace status, failure
63
+ diagnostic, or next commands.
64
+ 8. Call `trace.explore` when you want a review-required scenario candidate for
65
+ a stated goal from the active trace.
66
+ 9. Call `trace.discover` when you want a lower-level reviewable scenario
67
+ candidate from the active trace and the agent already owns goal tracking.
68
+ 10. Call `scenario.validate` after editing generated scenario files.
69
+ 11. Call `trace.export` with `redact: true` before sharing artifacts.
70
+ 12. Call `session.close`.
40
71
 
41
72
  Do not parse screenshots or terminal text when the same fact is available from
42
73
  snapshot nodes, action results, CLI JSON, or trace events.
@@ -47,6 +78,14 @@ For iOS visual captures, `artifactStatus: "captured"` with
47
78
  XCTest hierarchy extraction failed. Use `zmr explain --json <trace-dir>` for
48
79
  the same diagnostic shape after the run.
49
80
 
81
+ For traced CLI runs, `zmr run --json` also returns `nextCommands` with the
82
+ HTML/JUnit report, explain, `zmr discover --from-trace`, and redacted export
83
+ handoffs.
84
+ Agents should prefer those commands over reconstructing trace paths from text.
85
+ When an agent should create the reviewable scenario in the same process, pass
86
+ `--discover-out .zmr/discovered/<name>.json`; the run JSON will include a
87
+ `discovery` object with validation results and `replay` coverage metadata.
88
+
50
89
  ## MCP Session
51
90
 
52
91
  Agents that support the Model Context Protocol can use ZMR directly as a local
@@ -61,9 +100,14 @@ The MCP server exposes mobile-specific tools:
61
100
  - `snapshot`: raw ZMR observation JSON
62
101
  - `semantic_snapshot`: normalized roles, names, selectors, bounds, and
63
102
  recommended actions
64
- - `tap`, `type`, `press_back`, and `open_link`
65
- - `wait_visible`
66
- - `trace_events` and `trace_export`
103
+ - `install_app`, `launch_app`, `stop_app`, and `clear_state`
104
+ - `tap`, `type`, `erase_text`, `hide_keyboard`, `swipe`, `press_back`,
105
+ `open_link`, and `scroll_until_visible`
106
+ - `wait_visible`, `wait_not_visible`, and `wait_any`
107
+ - `assert_visible`, `assert_not_visible`, and `assert_healthy`
108
+ - `scenario_validate`
109
+ - `trace_events`, `trace_explain`, `trace_explore`, `trace_discover`, and
110
+ `trace_export`
67
111
 
68
112
  Prefer `semantic_snapshot` for action planning. It avoids forcing an agent to
69
113
  infer intent from platform-specific Android/UI Automator or XCTest class names.
@@ -72,12 +116,56 @@ infer intent from platform-specific Android/UI Automator or XCTest class names.
72
116
 
73
117
  Agents can use ZMR to discover flows and draft scenarios by looping over
74
118
  `observe.semanticSnapshot`, one typed action, trace events, and scenario
75
- validation. See [Agent Discovery](agent-discovery.md) for the recommended
76
- reviewable loop.
119
+ validation. After a session has produced trace artifacts, call JSON-RPC
120
+ `trace.explain` or MCP `trace_explain` for in-band triage, then call JSON-RPC
121
+ `trace.explore` or MCP `trace_explore` when the generated draft should carry a
122
+ stated goal and guardrails. Use JSON-RPC `trace.discover` or MCP
123
+ `trace_discover` for the lower-level trace-backed draft when the agent already
124
+ owns goal tracking. Use JSON-RPC `scenario.validate` or MCP
125
+ `scenario_validate` after edits. The CLI command is the offline equivalent:
126
+
127
+ ```bash
128
+ zmr discover --from-trace traces/zmr-agent \
129
+ --out .zmr/discovered/replay-smoke.json \
130
+ --include-actions \
131
+ --validate \
132
+ --json
133
+ ```
134
+
135
+ `zmr discover` is review-first. It writes from trace evidence, validates the
136
+ generated scenario when asked, and returns next commands for deterministic
137
+ reruns. It does not crawl, discover credentials, or commit tests. The JSON
138
+ `replay` object lets agents compare trace action events considered for replay,
139
+ generated replay steps, and skipped events before making coverage claims.
140
+
141
+ Use `zmr draft` when you want the lower-level split workflow. It writes
142
+ `launch`, `snapshot`, and conservative `assertVisible` checks by default. For
143
+ traces produced by an agent session with successful typed actions, add
144
+ `--include-actions` to generate a replay draft from supported events before the
145
+ final snapshot assertions:
146
+
147
+ ```bash
148
+ zmr draft --from-trace traces/zmr-agent \
149
+ --out .zmr/discovered/replay-smoke.json \
150
+ --include-actions \
151
+ --json
152
+ zmr validate --json .zmr/discovered/replay-smoke.json
153
+ ```
77
154
 
78
- ZMR does not ship a built-in autonomous crawler or test writer in this developer
79
- preview. Keep autonomous planning outside the runner, then commit only reviewed
80
- scenario JSON.
155
+ Unsupported or underspecified events are skipped with warnings instead of being
156
+ guessed. Supported replay steps preserve selector and timeout data for waits,
157
+ selector and timeout data for `assertVisible` and `assertNotVisible`, selector
158
+ arrays for `assertNoneVisible`, and timeouts for `assertHealthy` when the trace
159
+ records them. See [Agent Discovery](agent-discovery.md) for the
160
+ recommended reviewable loop.
161
+
162
+ CLI agents can use `zmr explore --from-trace <trace-dir> --out <scenario.json>
163
+ --goal <goal> --include-actions --validate --json` when the goal should travel
164
+ with the generated scenario candidate. The result includes `autonomous:false`,
165
+ `reviewRequired:true`, `guardrails`, replay coverage, validation, and next
166
+ commands. ZMR still does not ship an unbounded autonomous crawler or test
167
+ writer in this developer preview. Keep autonomous planning outside the runner,
168
+ then commit only reviewed scenario JSON.
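Reviewer note: the wrap-up order the guide recommends (explain, explore, validate, redacted export, close) can be written down as an ordered plan. This is a sketch only: `sessionWrapUpPlan` and the sample paths are hypothetical, the empty params for `session.close` are an assumption, and the transport framing used by `zmr serve` is not shown in this diff.

```javascript
// Hypothetical helper: returns the ordered (method, params) pairs for the
// end of a live session, following steps 7 to 12 of the recommended flow.
function sessionWrapUpPlan({ scenarioOut, goal, bundleOut }) {
  return [
    ["trace.explain", {}],
    ["trace.explore", { out: scenarioOut, goal, includeActions: true, validate: true }],
    ["scenario.validate", { path: scenarioOut }],
    ["trace.export", { out: bundleOut, redact: true }],
    ["session.close", {}], // assumed to take no params
  ];
}

const plan = sessionWrapUpPlan({
  scenarioOut: ".zmr/discovered/login-smoke.json",
  goal: "find a stable login smoke",
  bundleOut: "traces/zmr-agent.zmrtrace", // hypothetical bundle path
});

// Print each step as a JSON-RPC 2.0 request object for inspection.
plan.forEach(([method, params], index) => {
  console.log(JSON.stringify({ jsonrpc: "2.0", id: index + 1, method, params }));
});
```

Use `trace.discover` in place of `trace.explore` when the agent already owns goal tracking, as the guide describes.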
81
169
 
82
170
  ## Scenario File Workflow
83
171
 
package/docs/benchmarking.md CHANGED
@@ -1,6 +1,41 @@
1
1
  # Benchmarking
2
2
 
3
- ZMR benchmark output is intentionally simple: each run appends one JSON object to `results.jsonl`, and `zmr report` turns that directory into a local HTML report.
3
+ ZMR benchmark output is intentionally simple: each run appends one JSON object
4
+ to `results.jsonl`, and `zmr report` turns that directory into local HTML and
5
+ optional JUnit XML artifacts.
6
+
7
+ ## Public Evidence
8
+
9
+ Public-safe benchmark evidence lives in [docs/benchmarks](benchmarks/README.md).
10
+ The first committed pack is
11
+ [2026-06-09 iOS simulator demo](benchmarks/2026-06-09-ios-demo.md): 20 repeated
12
+ runs of the generated iOS smoke scenario with a 100% pass rate. It is a
13
+ single-tool reliability benchmark, not a competitive speed claim.
14
+
15
+ The first baseline comparison is documented in
16
+ [docs/benchmarks](benchmarks/README.md): 20 ZMR runs and 20 baseline runner
17
+ runs against the same generated iOS demo app.
18
+
19
+ Additional public-safe packs in that directory include a second baseline
20
+ comparison and a native shim floor. The floor is not a product comparison; it
21
+ shows the warmed platform path ZMR can approach after runner and trace overhead
22
+ are reduced.
23
+
24
+ A richer iOS workflow pack is also committed there: 20 ZMR rows and 20 baseline
25
+ runner rows against the same generated app build, covering profile entry,
26
+ catalog item selection, save, review, and final-state assertion.
27
+
28
+ Benchmark Lab v1 is the next public evidence layer. It defines framework
29
+ fixtures, timing modes, runner-adapter labels, and claim rules in a manifest
30
+ that can be validated or rendered with `zmr-benchmark-lab`.
31
+
32
+ The generated Android workflow now has its first 20-run evidence pack in
33
+ [docs/benchmarks](benchmarks/README.md), using the platform UIAutomator path
34
+ without the optional Android instrumentation shim.
35
+
36
+ A generated React Native/Expo fixture is now available for the next evidence
37
+ slice. It includes stable `testID` values, accessibility labels, deep-link
38
+ setup, and Android/iOS ZMR workflow scenarios, but no public timing rows yet.
4
39
 
5
40
  ## Single Tool Benchmark
6
41
 
@@ -29,7 +64,9 @@ or p95 duration misses the configured threshold.
29
64
  Generate a report:
30
65
 
31
66
  ```bash
32
- zmr report traces/bench-<timestamp> --out traces/bench-<timestamp>/report.html
67
+ zmr report traces/bench-<timestamp> \
68
+ --out traces/bench-<timestamp>/report.html \
69
+ --junit traces/bench-<timestamp>/junit.xml
33
70
  ```
34
71
 
35
72
  ## Pilot Wrapper
@@ -62,7 +99,13 @@ Use `--screen-record` when investigating visual flakes:
62
99
  --max-failures 0
63
100
  ```
64
101
 
65
- For `--runs 1`, the script exports normal and redacted `.zmrtrace` bundles. For `--runs > 1`, it writes benchmark directories and HTML reports.
102
+ For `--runs 1`, the script exports normal and redacted `.zmrtrace` bundles.
103
+ For `--runs > 1`, the pilot wrappers and generated app reliability scripts
104
+ write benchmark directories with HTML and JUnit reports.
105
+
106
+ Apps scaffolded by `zmr-wizard` get matching package scripts, so app-local
107
+ reliability gates run as `bun run zmr:android:reliability` and
108
+ `bun run zmr:ios:reliability` (or the npm equivalents).
66
109
 
67
110
  The iOS pilot wrapper supports the same repeated-run gates:
68
111
 
@@ -128,9 +171,12 @@ Benchmark reports include:
128
171
  - terminal trace status
129
172
  - failed step index and error when available
130
173
  - links to each run's `events.jsonl`
174
+ - optional JUnit XML with one testcase per benchmark row for CI test reports
131
175
 
132
176
  Before making public performance claims, run the same scenario repeatedly on a clean emulator image and include the raw `results.jsonl` plus the redacted trace bundle for any failure.
133
177
 
178
+ ![ZMR HTML trace report showing the trace summary and per-event timeline](assets/report-html.png)
179
+
134
180
  ## Compare Against A Baseline
135
181
 
136
182
  Use `zmr-compare-benchmarks` when a private app repo has benchmark rows from
package/docs/benchmarks/2026-06-09-android-workflow.md ADDED
@@ -0,0 +1,73 @@
1
+ # 2026-06-09 Android Emulator Workflow
2
+
3
+ This evidence pack records 20 repeated ZMR runs of the generated public Android
4
+ workflow demo app. The flow launches the app, fills a profile form, scrolls the
5
+ catalog, opens an item detail page, saves the item, reviews the order, and
6
+ asserts the final state.
7
+
8
+ This is a single-tool reliability and timing pack, not a comparison against
9
+ another runner. Treat it as reproducible evidence for this app, host, emulator,
10
+ app build, and workflow shape.
11
+
12
+ ## Result
13
+
14
+ | Tool | Runs | Pass rate | Failures | Mean duration | p95 duration |
15
+ | --- | ---: | ---: | ---: | ---: | ---: |
16
+ | ZMR | 20 | 100.00% | 0 | 44134 ms | 46385 ms |
17
+
18
+ The fastest run was 38627 ms and the slowest run was 49875 ms. The later rows
19
+ clustered around 43 seconds, which points to Android UIAutomator snapshot and
20
+ scroll execution as the next optimization target for this fixture.
21
+
22
+ ## Environment
23
+
24
+ | Field | Value |
25
+ | --- | --- |
26
+ | ZMR runner | `0.1.8` |
27
+ | ZMR protocol | `2026-04-28` |
28
+ | Host OS | macOS 26.6, arm64 |
29
+ | Android emulator | 36.4.10.0 |
30
+ | ADB | 1.0.41, platform-tools 37.0.0 |
31
+ | Android platform | Android 15, API 35, arm64-v8a |
32
+ | Emulator viewport | 720 x 1280, 320 dpi |
33
+ | App id | `com.example.mobiletest` |
34
+ | App build label | `generated-android-workflow-demo-20260609` |
35
+ | Demo app source | Generated by `scripts/create-android-demo-app.sh` |
36
+
37
+ Before collection, the emulator was booted fresh, the app was reinstalled, the
38
+ screen was unlocked, and Android window, transition, and animator duration
39
+ scales were set to `0`.
40
+
41
+ ## Command
42
+
43
+ ```bash
44
+ ZMR_BIN="$PWD/zig-out/bin/zmr" scripts/benchmark.sh \
45
+ --zmr examples/android-workflow.json \
46
+ --platform android \
47
+ --device emulator-5554 \
48
+ --app-id com.example.mobiletest \
49
+ --app-build generated-android-workflow-demo-20260609 \
50
+ --runs 20 \
51
+ --trace-root traces/public-benchmarks/20260609-android-workflow/zmr \
52
+ --results traces/public-benchmarks/20260609-android-workflow/results.jsonl \
53
+ --replace \
54
+ --min-pass-rate 100 \
55
+ --max-failures 0
56
+ ```
57
+
58
+ The ZMR scenario is committed as
59
+ [`examples/android-workflow.json`](../../examples/android-workflow.json).
60
+
61
+ ## Rows
62
+
63
+ The sanitized result rows are committed in
64
+ [2026-06-09-android-workflow.results.jsonl](2026-06-09-android-workflow.results.jsonl).
65
+
66
+ Raw local trace and runner logs are not committed because they can include local
67
+ absolute paths.
68
+
69
+ ## Scope
70
+
71
+ This benchmark uses the platform UIAutomator path without the optional Android
72
+ instrumentation shim. It does not compare cloud execution, React Native, Expo,
73
+ Flutter, Appium, Maestro, Detox, or Android instrumentation-runner baselines.
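Reviewer note: the Result table can be re-derived from the committed `results.jsonl` rows. The sketch below is one way to do that, using the `status` and `durationMs` fields that appear in those rows. The nearest-rank percentile and the gate comparison are assumptions; this diff does not show how `scripts/benchmark.sh` or `zmr report` compute p95 or apply `--min-pass-rate` and `--max-failures`. `summarizeRows` and `passesGate` are hypothetical names.

```javascript
// Summarize benchmark rows shaped like the committed results.jsonl lines.
function summarizeRows(jsonl) {
  const rows = jsonl
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
  if (rows.length === 0) throw new Error("no benchmark rows");

  const failures = rows.filter((row) => row.status !== "ok").length;
  const durations = rows.map((row) => row.durationMs).sort((a, b) => a - b);
  const total = durations.reduce((sum, ms) => sum + ms, 0);
  const rank = Math.ceil(0.95 * durations.length); // nearest-rank, 1-based (assumed method)

  return {
    runs: rows.length,
    failures,
    passRate: ((rows.length - failures) / rows.length) * 100,
    meanMs: Math.round(total / rows.length),
    p95Ms: durations[rank - 1],
    minMs: durations[0],
    maxMs: durations[durations.length - 1],
  };
}

// Hypothetical gate mirroring the --min-pass-rate and --max-failures flags.
function passesGate(summary, { minPassRate, maxFailures }) {
  return summary.passRate >= minPassRate && summary.failures <= maxFailures;
}

// Three rows abbreviated from the committed file (other fields omitted).
const sample = [
  '{"tool":"zmr","run":1,"status":"ok","durationMs":49875}',
  '{"tool":"zmr","run":2,"status":"ok","durationMs":45169}',
  '{"tool":"zmr","run":3,"status":"ok","durationMs":46093}',
].join("\n");

const summary = summarizeRows(sample);
console.log(summary, passesGate(summary, { minPassRate: 100, maxFailures: 0 }));
```

With 20 rows, the nearest-rank rule picks the 19th smallest duration as p95, so a single slow outlier run does not move the reported value.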
package/docs/benchmarks/2026-06-09-android-workflow.results.jsonl ADDED
@@ -0,0 +1,20 @@
1
+ {"tool":"zmr","run":1,"status":"ok","durationMs":49875,"traceDir":"traces/public-benchmarks/20260609-android-workflow/zmr/zmr-1","platform":"android","device":"emulator-5554","appId":"com.example.mobiletest","scenario":"examples/android-workflow.json","appBuild":"generated-android-workflow-demo-20260609","traceStatus":"passed"}
2
+ {"tool":"zmr","run":2,"status":"ok","durationMs":45169,"traceDir":"traces/public-benchmarks/20260609-android-workflow/zmr/zmr-2","platform":"android","device":"emulator-5554","appId":"com.example.mobiletest","scenario":"examples/android-workflow.json","appBuild":"generated-android-workflow-demo-20260609","traceStatus":"passed"}
3
+ {"tool":"zmr","run":3,"status":"ok","durationMs":46093,"traceDir":"traces/public-benchmarks/20260609-android-workflow/zmr/zmr-3","platform":"android","device":"emulator-5554","appId":"com.example.mobiletest","scenario":"examples/android-workflow.json","appBuild":"generated-android-workflow-demo-20260609","traceStatus":"passed"}
4
+ {"tool":"zmr","run":4,"status":"ok","durationMs":44695,"traceDir":"traces/public-benchmarks/20260609-android-workflow/zmr/zmr-4","platform":"android","device":"emulator-5554","appId":"com.example.mobiletest","scenario":"examples/android-workflow.json","appBuild":"generated-android-workflow-demo-20260609","traceStatus":"passed"}
5
+ {"tool":"zmr","run":5,"status":"ok","durationMs":44028,"traceDir":"traces/public-benchmarks/20260609-android-workflow/zmr/zmr-5","platform":"android","device":"emulator-5554","appId":"com.example.mobiletest","scenario":"examples/android-workflow.json","appBuild":"generated-android-workflow-demo-20260609","traceStatus":"passed"}
6
+ {"tool":"zmr","run":6,"status":"ok","durationMs":44821,"traceDir":"traces/public-benchmarks/20260609-android-workflow/zmr/zmr-6","platform":"android","device":"emulator-5554","appId":"com.example.mobiletest","scenario":"examples/android-workflow.json","appBuild":"generated-android-workflow-demo-20260609","traceStatus":"passed"}
7
+ {"tool":"zmr","run":7,"status":"ok","durationMs":46385,"traceDir":"traces/public-benchmarks/20260609-android-workflow/zmr/zmr-7","platform":"android","device":"emulator-5554","appId":"com.example.mobiletest","scenario":"examples/android-workflow.json","appBuild":"generated-android-workflow-demo-20260609","traceStatus":"passed"}
8
+ {"tool":"zmr","run":8,"status":"ok","durationMs":45751,"traceDir":"traces/public-benchmarks/20260609-android-workflow/zmr/zmr-8","platform":"android","device":"emulator-5554","appId":"com.example.mobiletest","scenario":"examples/android-workflow.json","appBuild":"generated-android-workflow-demo-20260609","traceStatus":"passed"}
9
+ {"tool":"zmr","run":9,"status":"ok","durationMs":38627,"traceDir":"traces/public-benchmarks/20260609-android-workflow/zmr/zmr-9","platform":"android","device":"emulator-5554","appId":"com.example.mobiletest","scenario":"examples/android-workflow.json","appBuild":"generated-android-workflow-demo-20260609","traceStatus":"passed"}
10
+ {"tool":"zmr","run":10,"status":"ok","durationMs":42599,"traceDir":"traces/public-benchmarks/20260609-android-workflow/zmr/zmr-10","platform":"android","device":"emulator-5554","appId":"com.example.mobiletest","scenario":"examples/android-workflow.json","appBuild":"generated-android-workflow-demo-20260609","traceStatus":"passed"}
11
+ {"tool":"zmr","run":11,"status":"ok","durationMs":42968,"traceDir":"traces/public-benchmarks/20260609-android-workflow/zmr/zmr-11","platform":"android","device":"emulator-5554","appId":"com.example.mobiletest","scenario":"examples/android-workflow.json","appBuild":"generated-android-workflow-demo-20260609","traceStatus":"passed"}
12
+ {"tool":"zmr","run":12,"status":"ok","durationMs":43299,"traceDir":"traces/public-benchmarks/20260609-android-workflow/zmr/zmr-12","platform":"android","device":"emulator-5554","appId":"com.example.mobiletest","scenario":"examples/android-workflow.json","appBuild":"generated-android-workflow-demo-20260609","traceStatus":"passed"}
13
+ {"tool":"zmr","run":13,"status":"ok","durationMs":43684,"traceDir":"traces/public-benchmarks/20260609-android-workflow/zmr/zmr-13","platform":"android","device":"emulator-5554","appId":"com.example.mobiletest","scenario":"examples/android-workflow.json","appBuild":"generated-android-workflow-demo-20260609","traceStatus":"passed"}
14
+ {"tool":"zmr","run":14,"status":"ok","durationMs":43056,"traceDir":"traces/public-benchmarks/20260609-android-workflow/zmr/zmr-14","platform":"android","device":"emulator-5554","appId":"com.example.mobiletest","scenario":"examples/android-workflow.json","appBuild":"generated-android-workflow-demo-20260609","traceStatus":"passed"}
15
+ {"tool":"zmr","run":15,"status":"ok","durationMs":43418,"traceDir":"traces/public-benchmarks/20260609-android-workflow/zmr/zmr-15","platform":"android","device":"emulator-5554","appId":"com.example.mobiletest","scenario":"examples/android-workflow.json","appBuild":"generated-android-workflow-demo-20260609","traceStatus":"passed"}
16
+ {"tool":"zmr","run":16,"status":"ok","durationMs":43267,"traceDir":"traces/public-benchmarks/20260609-android-workflow/zmr/zmr-16","platform":"android","device":"emulator-5554","appId":"com.example.mobiletest","scenario":"examples/android-workflow.json","appBuild":"generated-android-workflow-demo-20260609","traceStatus":"passed"}
17
+ {"tool":"zmr","run":17,"status":"ok","durationMs":43780,"traceDir":"traces/public-benchmarks/20260609-android-workflow/zmr/zmr-17","platform":"android","device":"emulator-5554","appId":"com.example.mobiletest","scenario":"examples/android-workflow.json","appBuild":"generated-android-workflow-demo-20260609","traceStatus":"passed"}
18
+ {"tool":"zmr","run":18,"status":"ok","durationMs":43371,"traceDir":"traces/public-benchmarks/20260609-android-workflow/zmr/zmr-18","platform":"android","device":"emulator-5554","appId":"com.example.mobiletest","scenario":"examples/android-workflow.json","appBuild":"generated-android-workflow-demo-20260609","traceStatus":"passed"}
19
+ {"tool":"zmr","run":19,"status":"ok","durationMs":43095,"traceDir":"traces/public-benchmarks/20260609-android-workflow/zmr/zmr-19","platform":"android","device":"emulator-5554","appId":"com.example.mobiletest","scenario":"examples/android-workflow.json","appBuild":"generated-android-workflow-demo-20260609","traceStatus":"passed"}
20
+ {"tool":"zmr","run":20,"status":"ok","durationMs":44693,"traceDir":"traces/public-benchmarks/20260609-android-workflow/zmr/zmr-20","platform":"android","device":"emulator-5554","appId":"com.example.mobiletest","scenario":"examples/android-workflow.json","appBuild":"generated-android-workflow-demo-20260609","traceStatus":"passed"}
@@ -0,0 +1,32 @@
1
+ # 2026-06-09 Framework Baseline Status
2
+
3
+ This note tracks the requested baseline coverage beyond the committed iOS demo
4
+ comparisons.
5
+
6
+ ## Completed
7
+
8
+ | Baseline | Status | Evidence |
9
+ | --- | --- | --- |
10
+ | Maestro | Completed | [iOS ZMR vs Maestro comparison](2026-06-09-ios-maestro-comparison.md) |
11
+ | Appium | Completed | [iOS ZMR vs Appium comparison](2026-06-09-ios-appium-comparison.md) |
12
+ | XCTest floor | Completed | [iOS XCTest shim floor](2026-06-09-ios-xctest-floor.md) |
13
+
14
+ ## Not Yet Fair To Publish
15
+
16
+ | Baseline | Why it needs a fixture first | Next evidence pack |
17
+ | --- | --- | --- |
18
+ | Detox | The CLI requires a project-local `detox` install and a React Native app with Detox configuration, native iOS/Android build targets, and a test file. Running it against the generated Swift demo would not be representative. | React Native fixture with the same launch, deep link, assertion, and warm-suite/cold-command modes. |
19
+ | Flutter | The local machine does not have the Flutter CLI installed, and ZMR should not claim Flutter widget-tree-driver coverage. | Flutter fixture using platform-level labels/deep links plus either Flutter `integration_test` or an external runner baseline. |
20
+ | Espresso | No Android emulator is currently attached in this workspace. Espresso should compare against an Android fixture with an instrumentation target rather than an iOS-only demo. | Android generated demo with ZMR, direct Espresso instrumentation, and Appium UIAutomator2 rows. |
21
+
22
+ ## Speed Work Opened By This Pass
23
+
24
+ The XCTest floor showed that ZMR can be made faster. The first fix from this
25
+ pass skips the expensive iOS system-open alert probe for custom URL schemes and
26
+ keeps it for `http://` and `https://` links. On the generated iOS demo smoke
27
+ flow, the shim-backed ZMR mean dropped to `2007 ms` while the direct warmed
28
+ XCTest shim floor measured `1004 ms`.
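Reviewer note: the scheme-based decision described above can be sketched as a small predicate. This is an illustration of the rule as stated in the note (probe for `http://` and `https://`, skip for custom schemes), not the ZMR implementation; `needs_system_open_alert_probe` and the example URLs are hypothetical.

```python
from urllib.parse import urlsplit

# Schemes for which the note says the system-open alert probe is kept.
WEB_SCHEMES = frozenset({"http", "https"})


def needs_system_open_alert_probe(url: str) -> bool:
    """Hypothetical predicate: True only for http:// and https:// links.

    Custom URL schemes (for example an app-specific deep link) return False,
    which corresponds to skipping the expensive probe.
    """
    # urlsplit normalizes the scheme to lowercase.
    return urlsplit(url).scheme in WEB_SCHEMES


probe_web = needs_system_open_alert_probe("https://example.com/orders/1")
probe_custom = needs_system_open_alert_probe("mobiletest://orders/1")
```

Here `probe_web` is `True` and `probe_custom` is `False`, so only the web link would pay the probe cost.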
29
+
30
+ The next speed target is a warm-suite mode where one ZMR process executes many
31
+ iterations in a single device session, avoiding repeated CLI startup and trace
32
+ setup for benchmark loops.