npm - zeno-mobile-runner - Versions diffs - 0.1.3 → 0.2.0 - Mend

zeno-mobile-runner 0.1.3 → 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (115)

package/CHANGELOG.md +192 -2
package/FEATURES.md +50 -7
package/README.md +168 -120
package/build.zig.zon +3 -3
package/clients/README.md +60 -3
package/clients/go/README.md +12 -0
package/clients/go/zmr/client.go +142 -0
package/clients/kotlin/README.md +18 -1
package/clients/kotlin/build.gradle.kts +1 -1
package/clients/kotlin/src/main/kotlin/dev/zmr/ZmrClient.kt +76 -1
package/clients/python/README.md +19 -0
package/clients/python/pyproject.toml +1 -1
package/clients/python/zmr_client.py +33 -0
package/clients/rust/Cargo.lock +1 -1
package/clients/rust/Cargo.toml +1 -1
package/clients/rust/README.md +25 -1
package/clients/rust/src/lib.rs +201 -0
package/clients/swift/README.md +18 -0
package/clients/swift/Sources/ZMRClient/ZMRClient.swift +82 -0
package/clients/typescript/README.md +16 -0
package/clients/typescript/index.d.ts +12 -0
package/clients/typescript/index.mjs +16 -0
package/clients/typescript/package.json +1 -1
package/docs/agent-discovery.md +151 -22
package/docs/ai-agents.md +99 -11
package/docs/benchmarking.md +49 -3
package/docs/benchmarks/2026-06-09-android-workflow.md +73 -0
package/docs/benchmarks/2026-06-09-android-workflow.results.jsonl +20 -0
package/docs/benchmarks/2026-06-09-framework-baseline-status.md +32 -0
package/docs/benchmarks/2026-06-09-ios-appium-comparison.md +115 -0
package/docs/benchmarks/2026-06-09-ios-appium-comparison.results.jsonl +40 -0
package/docs/benchmarks/2026-06-09-ios-demo.md +90 -0
package/docs/benchmarks/2026-06-09-ios-demo.results.jsonl +20 -0
package/docs/benchmarks/2026-06-09-ios-maestro-comparison.md +128 -0
package/docs/benchmarks/2026-06-09-ios-maestro-comparison.results.jsonl +40 -0
package/docs/benchmarks/2026-06-09-ios-workflow-comparison.md +143 -0
package/docs/benchmarks/2026-06-09-ios-workflow-comparison.results.jsonl +40 -0
package/docs/benchmarks/2026-06-09-ios-xctest-floor.md +106 -0
package/docs/benchmarks/2026-06-09-ios-xctest-floor.results.jsonl +40 -0
package/docs/benchmarks/README.md +36 -0
package/docs/benchmarks/benchmark-lab-v1.json +155 -0
package/docs/benchmarks/benchmark-lab-v1.md +95 -0
package/docs/clients.md +26 -6
package/docs/demo.md +40 -1
package/docs/expo-smoke.md +8 -8
package/docs/frameworks.md +10 -0
package/docs/install.md +3 -2
package/docs/npm.md +100 -4
package/docs/production-readiness.md +123 -0
package/docs/protocol-fixtures/core-session.responses.jsonl +1 -1
package/docs/protocol.md +215 -16
package/docs/scenario-authoring.md +18 -0
package/docs/trace-privacy.md +9 -0
package/docs/troubleshooting.md +7 -1
package/examples/android-workflow.json +79 -0
package/examples/ios-shim-workflow.json +79 -0
package/examples/react-native-expo-workflow.json +75 -0
package/npm/agents.mjs +16 -0
package/npm/commands.mjs +9 -5
package/package.json +6 -1
package/prebuilds/darwin-arm64/zmr +0 -0
package/prebuilds/darwin-x64/zmr +0 -0
package/prebuilds/linux-arm64/zmr +0 -0
package/prebuilds/linux-x64/zmr +0 -0
package/schemas/README.md +4 -0
package/schemas/discover-output.schema.json +83 -0
package/schemas/draft-output.schema.json +58 -0
package/schemas/explore-output.schema.json +94 -0
package/schemas/inspect-output.schema.json +88 -0
package/schemas/run-output.schema.json +2 -0
package/scripts/benchmark-lab.py +253 -0
package/scripts/create-android-demo-app.sh +324 -29
package/scripts/create-ios-demo-app.sh +174 -7
package/scripts/create-react-native-expo-demo-app.sh +727 -0
package/scripts/demo.sh +3 -0
package/scripts/install-ios-shim.sh +2 -2
package/scripts/release-readiness.py +43 -0
package/scripts/run-android-pilot.sh +35 -9
package/scripts/run-ios-pilot.sh +11 -4
package/shims/ios/ZMRShim.swift +10 -0
package/shims/ios/ZMRShimUITestCase.swift +42 -0
package/shims/ios/protocol.md +1 -0
package/skills/zmr-mobile-testing/SKILL.md +28 -3
package/src/cli_discover.zig +239 -0
package/src/cli_draft.zig +924 -0
package/src/cli_explore.zig +136 -0
package/src/cli_import.zig +31 -15
package/src/cli_inspect.zig +310 -0
package/src/cli_output.zig +26 -2
package/src/cli_run.zig +28 -0
package/src/cli_trace.zig +45 -15
package/src/cli_validate.zig +12 -6
package/src/errors.zig +9 -0
package/src/ios.zig +49 -12
package/src/ios_shim.zig +36 -2
package/src/json_rpc_methods.zig +85 -11
package/src/json_rpc_params.zig +8 -0
package/src/json_rpc_protocol.zig +1 -1
package/src/json_rpc_trace.zig +112 -0
package/src/main.zig +27 -2
package/src/mcp.zig +209 -6
package/src/mcp_protocol.zig +29 -1
package/src/mcp_trace.zig +126 -4
package/src/report.zig +186 -0
package/src/runner.zig +26 -4
package/src/runner_actions.zig +10 -0
package/src/runner_diagnostics.zig +31 -1
package/src/runner_events.zig +70 -7
package/src/runner_native.zig +17 -1
package/src/runner_waits.zig +82 -19
package/src/scaffold.zig +28 -12
package/src/scenario.zig +32 -4
package/src/schema_registry.zig +4 -0
package/src/version.zig +1 -1
package/viewer/app.js +23 -3

package/clients/typescript/index.mjs CHANGED

@@ -146,6 +146,10 @@ export class ZmrClient {
     return this.request("assert.healthy", options);
   }
+  validateScenario(path) {
+    return this.request("scenario.validate", { path });
+  }
   exportTrace(out, options = {}) {
     return this.request("trace.export", { out, ...options });
   }
@@ -154,6 +158,18 @@ export class ZmrClient {
     return this.request("trace.events", { afterSeq, ...options });
   }
+  explainTrace() {
+    return this.request("trace.explain", {});
+  }
+  discoverTrace(out, options = {}) {
+    return this.request("trace.discover", { out, ...options });
+  }
+  exploreTrace(out, goal, options = {}) {
+    return this.request("trace.explore", { out, goal, ...options });
+  }
   async close() {
     if (this.#closed) return;
     this.#closed = true;
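
The new wrappers in this hunk each forward a method name and a params object to `request`. As a rough illustration of what goes over the wire, the sketch below builds the JSON-RPC 2.0 envelope for three of them. The envelope fields follow the JSON-RPC examples quoted in `docs/agent-discovery.md` in this same diff; `buildRequest`, `ZmrRpcRequest`, and the id counter are hypothetical names for illustration and are not part of `@zmr/client`, whose real id handling is not shown here.

```typescript
// Sketch only: buildRequest and the id counter are hypothetical helpers,
// not the @zmr/client implementation.
interface ZmrRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params: Record<string, unknown>;
}

let nextRequestId = 1;

function buildRequest(
  method: string,
  params: Record<string, unknown> = {},
): ZmrRpcRequest {
  return { jsonrpc: "2.0", id: nextRequestId++, method, params };
}

// Mirrors validateScenario(path): a single `path` parameter.
const validateReq = buildRequest("scenario.validate", {
  path: ".zmr/discovered/replay-smoke.json",
});

// Mirrors explainTrace(): an empty params object.
const explainReq = buildRequest("trace.explain");

// Mirrors exploreTrace(out, goal, options): options are merged after out and goal.
const exploreReq = buildRequest("trace.explore", {
  out: ".zmr/discovered/login-smoke.json",
  goal: "find a stable login smoke",
  includeActions: true,
  validate: true,
});

console.log(JSON.stringify(validateReq));
console.log(JSON.stringify(explainReq));
console.log(JSON.stringify(exploreReq));
```

Keeping `out` and `goal` as positional arguments in the client, as the diff does, means the two fields the explore call cannot work without are never buried in an options bag.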

package/clients/typescript/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@zmr/client",
-  "version": "0.1.3",
+  "version": "0.2.0",
   "type": "module",
   "main": "index.mjs",
   "types": "index.d.ts",

package/docs/agent-discovery.md CHANGED

@@ -1,18 +1,32 @@
 # Agent Discovery
-ZMR supports agent-led discovery today through its JSON-RPC and MCP interfaces.
-An external agent can observe the app, choose typed actions, inspect trace
-events, and write a repeatable scenario file as it learns a flow.
-ZMR does not include a built-in autonomous crawler or test writer in this
-developer preview. Keep the planning loop in the agent, and keep ZMR as the
-deterministic mobile control plane.
+ZMR supports agent-led discovery today through its JSON-RPC and MCP interfaces,
+trace events, semantic snapshot artifacts, guarded trace exploration, in-band
+trace discovery, and offline scenario drafting. An external agent can observe
+the app, choose typed actions, inspect trace events, ask ZMR to write a small
+repeatable scenario from the trace, and then edit it as it learns a flow.
+`zmr explore` is the built-in review-first exploration command. It is
+trace-backed, not an unbounded crawler: it does not launch devices, invent
+missing actions, discover credentials, or commit files. Keep autonomous
+planning in the agent, and keep ZMR as the deterministic mobile control plane.
+```mermaid
+flowchart LR
+    SESSION["Live agent session<br/>or zmr run"] --> TRACE["Trace directory"]
+    TRACE --> DISCOVER["zmr discover / draft / explore<br/>--from-trace"]
+    DISCOVER --> CANDIDATE["Scenario candidate<br/>.zmr/discovered/*.json"]
+    CANDIDATE --> REVIEW["Human / agent review"]
+    REVIEW --> VALIDATE["zmr validate --json"]
+    VALIDATE --> CI["zmr run in CI<br/>report.html · junit.xml"]
+```
 ## Recommended Loop
 1. Validate local setup:
    ```bash
+   zmr inspect --json --dir .
    zmr doctor --json --config .zmr/config.json
    zmr validate --json .zmr/ios-smoke.json
    ```
@@ -34,15 +48,128 @@ deterministic mobile control plane.
 5. Choose one typed action, such as `ui.tap`, `ui.type`, `app.openLink`, or
    `wait.until`.
 6. Observe again and inspect `trace.events`.
-7. Write successful steps into a candidate scenario, for example
-   `.zmr/discovered/login-smoke.json`.
-8. Validate the candidate scenario:
+7. If you used `zmr run --json --trace-dir`, read `nextCommands`; traced run
+   summaries include HTML/JUnit report output and the matching
+   `zmr discover --from-trace` command.
+8. If you want the CLI run itself to write the candidate, use:
+   ```bash
+   zmr run .zmr/login-smoke.json \
+     --trace-dir traces/zmr-agent \
+     --discover-out .zmr/discovered/replay-smoke.json \
+     --json
+   ```
+   The run response embeds `discovery`, the same JSON payload returned by
+   `zmr discover --json`, including `replay` coverage metadata for converted
+   and skipped trace actions.
+9. Generate a reviewable scenario candidate from the trace. For CLI-driven
+   agent loops, prefer `zmr explore` so the goal and guardrails travel with the
+   machine-readable result:
+   ```bash
+   zmr explore --from-trace traces/zmr-agent \
+     --out .zmr/discovered/login-smoke.json \
+     --goal "find a stable login smoke" \
+     --include-actions \
+     --validate \
+     --json
+   ```
+   The output is covered by `schemas/explore-output.schema.json` and includes
+   `autonomous:false`, `reviewRequired:true`, `guardrails`, replay coverage,
+   validation, and deterministic next commands.
+10. Use live trace exploration when the agent should keep the goal attached to
+    the generated draft. JSON-RPC agents can call `trace.explore`:
+   ```json
+   {"jsonrpc":"2.0","id":7,"method":"trace.explore","params":{"out":".zmr/discovered/login-smoke.json","goal":"find a stable login smoke","includeActions":true,"validate":true,"force":true}}
+   ```
+   MCP agents can call `trace_explore` with `out`, `goal`,
+   `includeActions`, `validate`, and `force`. The response includes
+   `autonomous:false`, `reviewRequired:true`, and `guardrails`.
+11. Use the lower-level trace discovery primitive when the agent already owns
+    goal tracking. JSON-RPC agents can
+    call `trace.discover`:
+   ```json
+   {"jsonrpc":"2.0","id":7,"method":"trace.discover","params":{"out":".zmr/discovered/replay-smoke.json","includeActions":true,"validate":true,"force":true}}
+   ```
+   MCP agents can call `trace_discover` with the same `out`,
+   `includeActions`, `validate`, and `force` arguments. The offline CLI
+   equivalent is:
+   ```bash
+   zmr discover --from-trace traces/zmr-agent \
+     --out .zmr/discovered/replay-smoke.json \
+     --include-actions \
+     --validate \
+     --json
+   ```
+   `zmr discover` writes a scenario from trace evidence and, with
+   `--validate`, immediately proves that the generated file is syntactically
+   runnable by ZMR. It is still review-first: it does not crawl, invent missing
+   actions, discover credentials, or commit the scenario.
+   Read the `replay` object before trusting coverage: `eventCount` is the
+   trace action event count considered for replay, `stepCount` is the number of
+   generated replay steps, and `skippedEventCount` is the number of events left
+   out.
+11. After editing a generated scenario, validate it in-band with JSON-RPC:
+   ```json
+   {"jsonrpc":"2.0","id":8,"method":"scenario.validate","params":{"path":".zmr/discovered/replay-smoke.json"}}
+   ```
+   MCP agents can call `scenario_validate` with the same `path` argument. The
+   result matches `zmr validate --json`, including field paths and source
+   locations for invalid files.
+12. Use the lower-level draft primitive when you want separate surface and
+   replay files. For a conservative surface-smoke scenario:
+   ```bash
+   zmr draft --from-trace traces/zmr-agent \
+     --out .zmr/discovered/surface-smoke.json \
+     --json
+   ```
+   The draft contains `launch`, `snapshot`, and `assertVisible` steps from
+   stable visible selectors. It does not tap, type, crawl, or commit anything.
+   If the trace contains successful typed actions and you want a replayable
+   starting point, include those supported events explicitly:
+   ```bash
+   zmr draft --from-trace traces/zmr-agent \
+     --out .zmr/discovered/replay-smoke.json \
+     --include-actions \
+     --json
+   ```
+   Replay drafts include only supported events with stable replay data, such as
+   launch, deep links, selector taps, selector text entry, back, keyboard hiding,
+   coordinate-complete swipes, selector/timeout-preserving waits, and
+   direction/timeout-preserving selector scrolls, selector/timeout-preserving
+   `assertVisible` and `assertNotVisible`, `assertNoneVisible` selector arrays,
+   and timed `assertHealthy` checks. Native selector wait traces also retain
+   timeout context for successful waits and timeout diagnostics.
+   Unsupported events stay out of the scenario and are reported as warnings.
+13. Edit the draft, discovery, or exploration output into a candidate flow, for example
+   `.zmr/discovered/login-smoke.json`, by copying only steps that were observed
+   and understood.
+14. Validate the candidate scenario:
    ```bash
    zmr validate --json .zmr/discovered/login-smoke.json
    ```
-9. Re-run it deterministically:
+15. Re-run it deterministically:
    ```bash
    zmr run .zmr/discovered/login-smoke.json \
@@ -52,7 +179,7 @@ deterministic mobile control plane.
      --json
    ```
-10. Export a redacted bundle before sharing artifacts:
+16. Export a redacted bundle before sharing artifacts:
     ```bash
     zmr export traces/zmr-login-smoke \
@@ -68,16 +195,18 @@ deterministic mobile control plane.
 - Prefer accessibility identifiers, resource ids, stable labels, and exact text
   over coordinates.
 - Require human review before committing generated tests.
+- Treat `zmr explore` output as a starting point, not as a production-ready
+  flow.
+- Treat `zmr discover` output as a starting point, not as a production-ready
+  flow.
+- Treat `zmr draft` output as a starting point, not as a production-ready flow.
+- Use `--include-actions` only after reviewing the trace events that produced
+  the replay draft.
 - Redact traces before sharing them outside the local team.
-## Future Shape
-A future command could wrap this loop:
-```bash
-zmr explore --goal "find the login flow" --out .zmr/discovered/login-smoke.json
-```
+## Current Shape
-That command is not shipped today. The safer product direction is to make
-scenario discovery explicit, reviewable, and trace-backed before it becomes a
-one-command workflow.
+`zmr explore` is the first shipped goal-carrying command in this loop. It still
+requires an existing trace because the current product direction is to keep
+scenario generation explicit, reviewable, and trace-backed before any future
+goal-driven crawler can safely act inside an app.
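
The added docs ask agents to read the `replay` object before trusting coverage. A minimal sketch of that check follows, assuming `replay` has already been extracted from a `zmr discover --json` payload. The three field names come from the text above; `summarizeReplay`, `ReplaySummary`, and the idea of flagging any skipped event as a gap are illustrative assumptions, not ZMR behaviour.

```typescript
// Field names are taken from the documentation in this diff.
interface ReplayCoverage {
  eventCount: number; // trace action events considered for replay
  stepCount: number; // generated replay steps
  skippedEventCount: number; // events left out of the scenario
}

// Hypothetical summary shape for an agent-side check.
interface ReplaySummary {
  skippedRatio: number;
  coverageGap: boolean;
}

function summarizeReplay(replay: ReplayCoverage): ReplaySummary {
  // Avoid dividing by zero when the trace held no replayable action events.
  const skippedRatio =
    replay.eventCount === 0 ? 0 : replay.skippedEventCount / replay.eventCount;
  // Flag a gap when nothing was generated or anything was skipped.
  const coverageGap = replay.stepCount === 0 || replay.skippedEventCount > 0;
  return { skippedRatio, coverageGap };
}

const partialReplay = summarizeReplay({
  eventCount: 10,
  stepCount: 8,
  skippedEventCount: 2,
});
console.log(partialReplay.coverageGap, partialReplay.skippedRatio);
```

A flagged gap is a prompt to read the warnings for the skipped events, not a reason to discard the draft; generated scenarios stay review-required either way.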

package/docs/ai-agents.md CHANGED

@@ -1,13 +1,33 @@
 # AI Agent Guide
 ZMR is built for external agents. The runner provides device state, typed
-actions, waits, assertions, and trace export; the agent decides the next step.
+actions, waits, assertions, trace explanation, and trace export; the agent
+decides the next step.
+```mermaid
+sequenceDiagram
+    participant Agent as AI agent
+    participant ZMR
+    participant Device as Emulator / simulator
+    Agent->>ZMR: semantic_snapshot
+    ZMR->>Device: capture UI + screenshot
+    ZMR-->>Agent: roles, stable selectors, bounds
+    Agent->>ZMR: tap / type / swipe / open_link
+    ZMR->>Device: execute + settle
+    Agent->>ZMR: wait_visible / assert_visible
+    ZMR-->>Agent: typed result + trace events
+    Agent->>ZMR: trace_discover
+    ZMR-->>Agent: reviewable replay scenario
+    Agent->>ZMR: trace_export --redact
+    ZMR-->>Agent: .zmrtrace evidence bundle
+```
 ## Agent Setup Loop
 Start inside the app checkout:
 ```bash
+zmr inspect --json --dir .
 zmr doctor --json --config .zmr/config.json
 zmr validate --json .zmr/android-smoke.json
 zmr validate --json .zmr/ios-smoke.json
@@ -18,6 +38,10 @@ Use `zmr doctor --strict --json` in CI or setup flows that should fail on any
 warning. Prefer JSON output for automation because it includes stable error
 codes, field paths, and remediation hints.
+Use `zmr inspect --json --dir .` first when an agent enters a repo. It is a
+read-only handoff with config status, generated agent instruction status,
+platform smoke scenario paths, safe next commands, and explicit claim limits.
 ## Live JSON-RPC Session
 Agents should prefer `zmr serve` for interactive work:
@@ -35,8 +59,15 @@ Recommended flow:
 4. Choose one typed action or assertion.
 5. Let ZMR settle, then observe again.
 6. Poll `trace.events` during long runs.
-7. Call `trace.export` with `redact: true` before sharing artifacts.
-8. Call `session.close`.
+7. Call `trace.explain` when you need the active trace status, failure
+   diagnostic, or next commands.
+8. Call `trace.explore` when you want a review-required scenario candidate for
+   a stated goal from the active trace.
+9. Call `trace.discover` when you want a lower-level reviewable scenario
+   candidate from the active trace and the agent already owns goal tracking.
+10. Call `scenario.validate` after editing generated scenario files.
+11. Call `trace.export` with `redact: true` before sharing artifacts.
+12. Call `session.close`.
 Do not parse screenshots or terminal text when the same fact is available from
 snapshot nodes, action results, CLI JSON, or trace events.
@@ -47,6 +78,14 @@ For iOS visual captures, `artifactStatus: "captured"` with
 XCTest hierarchy extraction failed. Use `zmr explain --json <trace-dir>` for
 the same diagnostic shape after the run.
+For traced CLI runs, `zmr run --json` also returns `nextCommands` with the
+HTML/JUnit report, explain, `zmr discover --from-trace`, and redacted export
+handoffs.
+Agents should prefer those commands over reconstructing trace paths from text.
+When an agent should create the reviewable scenario in the same process, pass
+`--discover-out .zmr/discovered/<name>.json`; the run JSON will include a
+`discovery` object with validation results and `replay` coverage metadata.
 ## MCP Session
 Agents that support the Model Context Protocol can use ZMR directly as a local
@@ -61,9 +100,14 @@ The MCP server exposes mobile-specific tools:
 - `snapshot`: raw ZMR observation JSON
 - `semantic_snapshot`: normalized roles, names, selectors, bounds, and
   recommended actions
-- `tap`, `type`, `press_back`, and `open_link`
-- `wait_visible`
-- `trace_events` and `trace_export`
+- `install_app`, `launch_app`, `stop_app`, and `clear_state`
+- `tap`, `type`, `erase_text`, `hide_keyboard`, `swipe`, `press_back`,
+  `open_link`, and `scroll_until_visible`
+- `wait_visible`, `wait_not_visible`, and `wait_any`
+- `assert_visible`, `assert_not_visible`, and `assert_healthy`
+- `scenario_validate`
+- `trace_events`, `trace_explain`, `trace_explore`, `trace_discover`, and
+  `trace_export`
 Prefer `semantic_snapshot` for action planning. It avoids forcing an agent to
 infer intent from platform-specific Android/UI Automator or XCTest class names.
@@ -72,12 +116,56 @@ infer intent from platform-specific Android/UI Automator or XCTest class names.
 Agents can use ZMR to discover flows and draft scenarios by looping over
 `observe.semanticSnapshot`, one typed action, trace events, and scenario
-validation. See [Agent Discovery](agent-discovery.md) for the recommended
-reviewable loop.
+validation. After a session has produced trace artifacts, call JSON-RPC
+`trace.explain` or MCP `trace_explain` for in-band triage, then call JSON-RPC
+`trace.explore` or MCP `trace_explore` when the generated draft should carry a
+stated goal and guardrails. Use JSON-RPC `trace.discover` or MCP
+`trace_discover` for the lower-level trace-backed draft when the agent already
+owns goal tracking. Use JSON-RPC `scenario.validate` or MCP
+`scenario_validate` after edits. The CLI command is the offline equivalent:
+```bash
+zmr discover --from-trace traces/zmr-agent \
+  --out .zmr/discovered/replay-smoke.json \
+  --include-actions \
+  --validate \
+  --json
+```
+`zmr discover` is review-first. It writes from trace evidence, validates the
+generated scenario when asked, and returns next commands for deterministic
+reruns. It does not crawl, discover credentials, or commit tests. The JSON
+`replay` object lets agents compare trace action events considered for replay,
+generated replay steps, and skipped events before making coverage claims.
+Use `zmr draft` when you want the lower-level split workflow. It writes
+`launch`, `snapshot`, and conservative `assertVisible` checks by default. For
+traces produced by an agent session with successful typed actions, add
+`--include-actions` to generate a replay draft from supported events before the
+final snapshot assertions:
+```bash
+zmr draft --from-trace traces/zmr-agent \
+  --out .zmr/discovered/replay-smoke.json \
+  --include-actions \
+  --json
+zmr validate --json .zmr/discovered/replay-smoke.json
+```
-ZMR does not ship a built-in autonomous crawler or test writer in this developer
-preview. Keep autonomous planning outside the runner, then commit only reviewed
-scenario JSON.
+Unsupported or underspecified events are skipped with warnings instead of being
+guessed. Supported replay steps preserve selector and timeout data for waits,
+selector and timeout data for `assertVisible` and `assertNotVisible`, selector
+arrays for `assertNoneVisible`, and timeouts for `assertHealthy` when the trace
+records them. See [Agent Discovery](agent-discovery.md) for the
+recommended reviewable loop.
+CLI agents can use `zmr explore --from-trace <trace-dir> --out <scenario.json>
+--goal <goal> --include-actions --validate --json` when the goal should travel
+with the generated scenario candidate. The result includes `autonomous:false`,
+`reviewRequired:true`, `guardrails`, replay coverage, validation, and next
+commands. ZMR still does not ship an unbounded autonomous crawler or test
+writer in this developer preview. Keep autonomous planning outside the runner,
+then commit only reviewed scenario JSON.
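
The guide states that explore results include `autonomous:false`, `reviewRequired:true`, and `guardrails`. An agent could refuse to proceed when those markers are missing, along the lines of the sketch below. It assumes the three fields sit at the top level of the parsed JSON, which this diff does not confirm (the authoritative shape is `schemas/explore-output.schema.json`); `isReviewFirstResult` is a hypothetical helper name and the sample objects are placeholders.

```typescript
// Hypothetical guard: returns true only when the review-first markers
// described in the guide are present on a parsed explore result.
function isReviewFirstResult(result: unknown): boolean {
  if (typeof result !== "object" || result === null) return false;
  const fields = result as Record<string, unknown>;
  return (
    fields["autonomous"] === false &&
    fields["reviewRequired"] === true &&
    fields["guardrails"] !== undefined
  );
}

// Placeholder payloads; the real guardrails value is not shown in this diff.
const reviewFirstSample = JSON.parse(
  '{"autonomous":false,"reviewRequired":true,"guardrails":[]}',
) as unknown;
const unmarkedSample = JSON.parse('{"autonomous":true}') as unknown;

console.log(isReviewFirstResult(reviewFirstSample));
console.log(isReviewFirstResult(unmarkedSample));
```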
 ## Scenario File Workflow

package/docs/benchmarking.md CHANGED

@@ -1,6 +1,41 @@
 # Benchmarking
-ZMR benchmark output is intentionally simple: each run appends one JSON object to `results.jsonl`, and `zmr report` turns that directory into a local HTML report.
+ZMR benchmark output is intentionally simple: each run appends one JSON object
+to `results.jsonl`, and `zmr report` turns that directory into local HTML and
+optional JUnit XML artifacts.
+## Public Evidence
+Public-safe benchmark evidence lives in [docs/benchmarks](benchmarks/README.md).
+The first committed pack is
+[2026-06-09 iOS simulator demo](benchmarks/2026-06-09-ios-demo.md): 20 repeated
+runs of the generated iOS smoke scenario with a 100% pass rate. It is a
+single-tool reliability benchmark, not a competitive speed claim.
+The first baseline comparison is documented in
+[docs/benchmarks](benchmarks/README.md): 20 ZMR runs and 20 baseline runner
+runs against the same generated iOS demo app.
+Additional public-safe packs in that directory include a second baseline
+comparison and a native shim floor. The floor is not a product comparison; it
+shows the warmed platform path ZMR can approach after runner and trace overhead
+are reduced.
+A richer iOS workflow pack is also committed there: 20 ZMR rows and 20 baseline
+runner rows against the same generated app build, covering profile entry,
+catalog item selection, save, review, and final-state assertion.
+Benchmark Lab v1 is the next public evidence layer. It defines framework
+fixtures, timing modes, runner-adapter labels, and claim rules in a manifest
+that can be validated or rendered with `zmr-benchmark-lab`.
+The generated Android workflow now has its first 20-run evidence pack in
+[docs/benchmarks](benchmarks/README.md), using the platform UIAutomator path
+without the optional Android instrumentation shim.
+A generated React Native/Expo fixture is now available for the next evidence
+slice. It includes stable `testID` values, accessibility labels, deep-link
+setup, and Android/iOS ZMR workflow scenarios, but no public timing rows yet.
 ## Single Tool Benchmark
@@ -29,7 +64,9 @@ or p95 duration misses the configured threshold.
 Generate a report:
 ```bash
-zmr report traces/bench-<timestamp> --out traces/bench-<timestamp>/report.html
+zmr report traces/bench-<timestamp> \
+  --out traces/bench-<timestamp>/report.html \
+  --junit traces/bench-<timestamp>/junit.xml
 ```
 ## Pilot Wrapper
@@ -62,7 +99,13 @@ Use `--screen-record` when investigating visual flakes:
   --max-failures 0
 ```
-For `--runs 1`, the script exports normal and redacted `.zmrtrace` bundles. For `--runs > 1`, it writes benchmark directories and HTML reports.
+For `--runs 1`, the script exports normal and redacted `.zmrtrace` bundles.
+For `--runs > 1`, the pilot wrappers and generated app reliability scripts
+write benchmark directories with HTML and JUnit reports.
+Apps scaffolded by `zmr-wizard` get matching package scripts, so app-local
+reliability gates run as `bun run zmr:android:reliability` and
+`bun run zmr:ios:reliability` (or the npm equivalents).
 The iOS pilot wrapper supports the same repeated-run gates:
@@ -128,9 +171,12 @@ Benchmark reports include:
 - terminal trace status
 - failed step index and error when available
 - links to each run's `events.jsonl`
+- optional JUnit XML with one testcase per benchmark row for CI test reports
 Before making public performance claims, run the same scenario repeatedly on a clean emulator image and include the raw `results.jsonl` plus the redacted trace bundle for any failure.
+![ZMR HTML trace report showing the trace summary and per-event timeline](assets/report-html.png)
 ## Compare Against A Baseline
 Use `zmr-compare-benchmarks` when a private app repo has benchmark rows from

package/docs/benchmarks/2026-06-09-android-workflow.md ADDED

@@ -0,0 +1,73 @@
+# 2026-06-09 Android Emulator Workflow
+This evidence pack records 20 repeated ZMR runs of the generated public Android
+workflow demo app. The flow launches the app, fills a profile form, scrolls the
+catalog, opens an item detail page, saves the item, reviews the order, and
+asserts the final state.
+This is a single-tool reliability and timing pack, not a comparison against
+another runner. Treat it as reproducible evidence for this app, host, emulator,
+app build, and workflow shape.
+## Result
+| Tool | Runs | Pass rate | Failures | Mean duration | p95 duration |
+| --- | ---: | ---: | ---: | ---: | ---: |
+| ZMR | 20 | 100.00% | 0 | 44134 ms | 46385 ms |
+The fastest run was 38627 ms and the slowest run was 49875 ms. The later rows
+clustered around 43 seconds, which points to Android UIAutomator snapshot and
+scroll execution as the next optimization target for this fixture.
+## Environment
+| Field | Value |
+| --- | --- |
+| ZMR runner | `0.1.8` |
+| ZMR protocol | `2026-04-28` |
+| Host OS | macOS 26.6, arm64 |
+| Android emulator | 36.4.10.0 |
+| ADB | 1.0.41, platform-tools 37.0.0 |
+| Android platform | Android 15, API 35, arm64-v8a |
+| Emulator viewport | 720 x 1280, 320 dpi |
+| App id | `com.example.mobiletest` |
+| App build label | `generated-android-workflow-demo-20260609` |
+| Demo app source | Generated by `scripts/create-android-demo-app.sh` |
+Before collection, the emulator was booted fresh, the app was reinstalled, the
+screen was unlocked, and Android window, transition, and animator duration
+scales were set to `0`.
+## Command
+```bash
+ZMR_BIN="$PWD/zig-out/bin/zmr" scripts/benchmark.sh \
+  --zmr examples/android-workflow.json \
+  --platform android \
+  --device emulator-5554 \
+  --app-id com.example.mobiletest \
+  --app-build generated-android-workflow-demo-20260609 \
+  --runs 20 \
+  --trace-root traces/public-benchmarks/20260609-android-workflow/zmr \
+  --results traces/public-benchmarks/20260609-android-workflow/results.jsonl \
+  --replace \
+  --min-pass-rate 100 \
+  --max-failures 0
+```
+The ZMR scenario is committed as
+[`examples/android-workflow.json`](../../examples/android-workflow.json).
+## Rows
+The sanitized result rows are committed in
+[2026-06-09-android-workflow.results.jsonl](2026-06-09-android-workflow.results.jsonl).
+Raw local trace and runner logs are not committed because they can include local
+absolute paths.
+## Scope
+This benchmark uses the platform UIAutomator path without the optional Android
+instrumentation shim. It does not compare cloud execution, React Native, Expo,
+Flutter, Appium, Maestro, Detox, or Android instrumentation-runner baselines.
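
The result table in this evidence pack can in principle be re-derived from the committed `results.jsonl` rows. The sketch below shows one way to do that. The `status` and `durationMs` field names come from the rows in this diff; `parseRows`, `summarizeRuns`, and the nearest-rank p95 definition are assumptions, since the diff does not say how ZMR's own tooling computes percentiles. The sample input keeps only two fields from the first three rows.

```typescript
interface BenchmarkRow {
  status: string;
  durationMs: number;
}

interface RunSummary {
  runs: number;
  passRate: number; // percent
  meanMs: number; // rounded to the nearest millisecond
  p95Ms: number; // nearest-rank percentile (an assumption)
}

// Parse one JSON object per non-empty line.
function parseRows(jsonl: string): BenchmarkRow[] {
  return jsonl
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as BenchmarkRow);
}

function summarizeRuns(rows: BenchmarkRow[]): RunSummary {
  if (rows.length === 0) throw new Error("no benchmark rows");
  const durations = rows.map((row) => row.durationMs).sort((a, b) => a - b);
  const passed = rows.filter((row) => row.status === "ok").length;
  const total = durations.reduce((sum, d) => sum + d, 0);
  // Nearest rank: the smallest value with at least 95% of runs at or below it.
  const rank = Math.ceil(0.95 * durations.length);
  return {
    runs: rows.length,
    passRate: (passed / rows.length) * 100,
    meanMs: Math.round(total / durations.length),
    p95Ms: durations[rank - 1] ?? 0,
  };
}

const sampleRows = parseRows(`
{"status":"ok","durationMs":49875}
{"status":"ok","durationMs":45169}
{"status":"ok","durationMs":46093}
`);
const sampleSummary = summarizeRuns(sampleRows);
console.log(sampleSummary);
```

Under the nearest-rank definition, a 20-row pack takes its p95 from the second-slowest run, which is how a pack can report a p95 below its slowest run.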

package/docs/benchmarks/2026-06-09-android-workflow.results.jsonl ADDED

@@ -0,0 +1,20 @@
+{"tool":"zmr","run":1,"status":"ok","durationMs":49875,"traceDir":"traces/public-benchmarks/20260609-android-workflow/zmr/zmr-1","platform":"android","device":"emulator-5554","appId":"com.example.mobiletest","scenario":"examples/android-workflow.json","appBuild":"generated-android-workflow-demo-20260609","traceStatus":"passed"}
+{"tool":"zmr","run":2,"status":"ok","durationMs":45169,"traceDir":"traces/public-benchmarks/20260609-android-workflow/zmr/zmr-2","platform":"android","device":"emulator-5554","appId":"com.example.mobiletest","scenario":"examples/android-workflow.json","appBuild":"generated-android-workflow-demo-20260609","traceStatus":"passed"}
+{"tool":"zmr","run":3,"status":"ok","durationMs":46093,"traceDir":"traces/public-benchmarks/20260609-android-workflow/zmr/zmr-3","platform":"android","device":"emulator-5554","appId":"com.example.mobiletest","scenario":"examples/android-workflow.json","appBuild":"generated-android-workflow-demo-20260609","traceStatus":"passed"}
+{"tool":"zmr","run":4,"status":"ok","durationMs":44695,"traceDir":"traces/public-benchmarks/20260609-android-workflow/zmr/zmr-4","platform":"android","device":"emulator-5554","appId":"com.example.mobiletest","scenario":"examples/android-workflow.json","appBuild":"generated-android-workflow-demo-20260609","traceStatus":"passed"}
+{"tool":"zmr","run":5,"status":"ok","durationMs":44028,"traceDir":"traces/public-benchmarks/20260609-android-workflow/zmr/zmr-5","platform":"android","device":"emulator-5554","appId":"com.example.mobiletest","scenario":"examples/android-workflow.json","appBuild":"generated-android-workflow-demo-20260609","traceStatus":"passed"}
+{"tool":"zmr","run":6,"status":"ok","durationMs":44821,"traceDir":"traces/public-benchmarks/20260609-android-workflow/zmr/zmr-6","platform":"android","device":"emulator-5554","appId":"com.example.mobiletest","scenario":"examples/android-workflow.json","appBuild":"generated-android-workflow-demo-20260609","traceStatus":"passed"}
+{"tool":"zmr","run":7,"status":"ok","durationMs":46385,"traceDir":"traces/public-benchmarks/20260609-android-workflow/zmr/zmr-7","platform":"android","device":"emulator-5554","appId":"com.example.mobiletest","scenario":"examples/android-workflow.json","appBuild":"generated-android-workflow-demo-20260609","traceStatus":"passed"}
+{"tool":"zmr","run":8,"status":"ok","durationMs":45751,"traceDir":"traces/public-benchmarks/20260609-android-workflow/zmr/zmr-8","platform":"android","device":"emulator-5554","appId":"com.example.mobiletest","scenario":"examples/android-workflow.json","appBuild":"generated-android-workflow-demo-20260609","traceStatus":"passed"}
+{"tool":"zmr","run":9,"status":"ok","durationMs":38627,"traceDir":"traces/public-benchmarks/20260609-android-workflow/zmr/zmr-9","platform":"android","device":"emulator-5554","appId":"com.example.mobiletest","scenario":"examples/android-workflow.json","appBuild":"generated-android-workflow-demo-20260609","traceStatus":"passed"}
+{"tool":"zmr","run":10,"status":"ok","durationMs":42599,"traceDir":"traces/public-benchmarks/20260609-android-workflow/zmr/zmr-10","platform":"android","device":"emulator-5554","appId":"com.example.mobiletest","scenario":"examples/android-workflow.json","appBuild":"generated-android-workflow-demo-20260609","traceStatus":"passed"}
+{"tool":"zmr","run":11,"status":"ok","durationMs":42968,"traceDir":"traces/public-benchmarks/20260609-android-workflow/zmr/zmr-11","platform":"android","device":"emulator-5554","appId":"com.example.mobiletest","scenario":"examples/android-workflow.json","appBuild":"generated-android-workflow-demo-20260609","traceStatus":"passed"}
+{"tool":"zmr","run":12,"status":"ok","durationMs":43299,"traceDir":"traces/public-benchmarks/20260609-android-workflow/zmr/zmr-12","platform":"android","device":"emulator-5554","appId":"com.example.mobiletest","scenario":"examples/android-workflow.json","appBuild":"generated-android-workflow-demo-20260609","traceStatus":"passed"}
+{"tool":"zmr","run":13,"status":"ok","durationMs":43684,"traceDir":"traces/public-benchmarks/20260609-android-workflow/zmr/zmr-13","platform":"android","device":"emulator-5554","appId":"com.example.mobiletest","scenario":"examples/android-workflow.json","appBuild":"generated-android-workflow-demo-20260609","traceStatus":"passed"}
+{"tool":"zmr","run":14,"status":"ok","durationMs":43056,"traceDir":"traces/public-benchmarks/20260609-android-workflow/zmr/zmr-14","platform":"android","device":"emulator-5554","appId":"com.example.mobiletest","scenario":"examples/android-workflow.json","appBuild":"generated-android-workflow-demo-20260609","traceStatus":"passed"}
+{"tool":"zmr","run":15,"status":"ok","durationMs":43418,"traceDir":"traces/public-benchmarks/20260609-android-workflow/zmr/zmr-15","platform":"android","device":"emulator-5554","appId":"com.example.mobiletest","scenario":"examples/android-workflow.json","appBuild":"generated-android-workflow-demo-20260609","traceStatus":"passed"}
+{"tool":"zmr","run":16,"status":"ok","durationMs":43267,"traceDir":"traces/public-benchmarks/20260609-android-workflow/zmr/zmr-16","platform":"android","device":"emulator-5554","appId":"com.example.mobiletest","scenario":"examples/android-workflow.json","appBuild":"generated-android-workflow-demo-20260609","traceStatus":"passed"}
+{"tool":"zmr","run":17,"status":"ok","durationMs":43780,"traceDir":"traces/public-benchmarks/20260609-android-workflow/zmr/zmr-17","platform":"android","device":"emulator-5554","appId":"com.example.mobiletest","scenario":"examples/android-workflow.json","appBuild":"generated-android-workflow-demo-20260609","traceStatus":"passed"}
+{"tool":"zmr","run":18,"status":"ok","durationMs":43371,"traceDir":"traces/public-benchmarks/20260609-android-workflow/zmr/zmr-18","platform":"android","device":"emulator-5554","appId":"com.example.mobiletest","scenario":"examples/android-workflow.json","appBuild":"generated-android-workflow-demo-20260609","traceStatus":"passed"}
+{"tool":"zmr","run":19,"status":"ok","durationMs":43095,"traceDir":"traces/public-benchmarks/20260609-android-workflow/zmr/zmr-19","platform":"android","device":"emulator-5554","appId":"com.example.mobiletest","scenario":"examples/android-workflow.json","appBuild":"generated-android-workflow-demo-20260609","traceStatus":"passed"}
+{"tool":"zmr","run":20,"status":"ok","durationMs":44693,"traceDir":"traces/public-benchmarks/20260609-android-workflow/zmr/zmr-20","platform":"android","device":"emulator-5554","appId":"com.example.mobiletest","scenario":"examples/android-workflow.json","appBuild":"generated-android-workflow-demo-20260609","traceStatus":"passed"}
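Reviewer sketch: rows in this shape can be summarized with a few lines of Python. The two sample rows are copied from the results above with fields trimmed for brevity. The `summarize` helper and the `SAMPLE_JSONL` name are illustrative assumptions, not part of the package.

```python
import json
import statistics

# Two rows copied from the results above (fields trimmed for brevity).
SAMPLE_JSONL = """\
{"tool":"zmr","run":3,"status":"ok","durationMs":46093,"traceStatus":"passed"}
{"tool":"zmr","run":4,"status":"ok","durationMs":44695,"traceStatus":"passed"}
"""


def summarize(jsonl_text):
    """Return basic duration statistics for rows whose status is "ok"."""
    rows = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    durations = [r["durationMs"] for r in rows if r["status"] == "ok"]
    if not durations:
        raise ValueError("no successful runs to summarize")
    return {
        "runs": len(rows),
        "ok": len(durations),
        "mean_ms": statistics.mean(durations),
        "median_ms": statistics.median(durations),
        "min_ms": min(durations),
        "max_ms": max(durations),
    }


if __name__ == "__main__":
    print(summarize(SAMPLE_JSONL))
```

Filtering on `status` before averaging keeps failed runs from skewing the mean; whether the published numbers were computed this way is not stated in the diff.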

package/docs/benchmarks/2026-06-09-framework-baseline-status.md ADDED

@@ -0,0 +1,32 @@
+# 2026-06-09 Framework Baseline Status
+
+This note tracks the requested baseline coverage beyond the committed iOS demo
+comparisons.
+
+## Completed
+
+| Baseline | Status | Evidence |
+| --- | --- | --- |
+| Maestro | Completed | [iOS ZMR vs Maestro comparison](2026-06-09-ios-maestro-comparison.md) |
+| Appium | Completed | [iOS ZMR vs Appium comparison](2026-06-09-ios-appium-comparison.md) |
+| XCTest floor | Completed | [iOS XCTest shim floor](2026-06-09-ios-xctest-floor.md) |
+
+## Not Yet Fair To Publish
+
+| Baseline | Why it needs a fixture first | Next evidence pack |
+| --- | --- | --- |
+| Detox | The CLI requires a project-local `detox` install and a React Native app with Detox configuration, native iOS/Android build targets, and a test file. Running it against the generated Swift demo would not be representative. | React Native fixture with the same launch, deep link, assertion, and warm-suite/cold-command modes. |
+| Flutter | The local machine does not have the Flutter CLI installed, and ZMR should not claim Flutter widget-tree-driver coverage. | Flutter fixture using platform-level labels/deep links plus either Flutter `integration_test` or an external runner baseline. |
+| Espresso | No Android emulator is currently attached in this workspace. Espresso should compare against an Android fixture with an instrumentation target rather than an iOS-only demo. | Android generated demo with ZMR, direct Espresso instrumentation, and Appium UIAutomator2 rows. |
+
+## Speed Work Opened By This Pass
+
+The XCTest floor showed that ZMR can be made faster. The first fix from this
+pass skips the expensive iOS system-open alert probe for custom URL schemes and
+keeps it for `http://` and `https://` links. On the generated iOS demo smoke
+flow, the shim-backed ZMR mean dropped to `2007 ms` while the direct warmed
+XCTest shim floor measured `1004 ms`.
+
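Reviewer sketch: the scheme-based decision described above could look roughly like this. The function name `needs_system_open_alert_probe` and the example URLs are hypothetical. The diff excerpt does not show ZMR's actual implementation, so this only illustrates the rule "probe for `http://` and `https://`, skip for custom schemes".

```python
from urllib.parse import urlsplit

# Schemes for which the note says the system-open alert probe is kept.
WEB_SCHEMES = {"http", "https"}


def needs_system_open_alert_probe(url: str) -> bool:
    """Return True only for web links; custom URL schemes skip the probe."""
    scheme = urlsplit(url).scheme.lower()
    return scheme in WEB_SCHEMES


if __name__ == "__main__":
    # Hypothetical deep links, for illustration only.
    for link in ("demoapp://profile/42", "https://example.com/profile/42"):
        print(link, needs_system_open_alert_probe(link))
```

Keeping the check on the parsed scheme rather than on a string prefix avoids treating a custom scheme such as `httpsdemo://` as a web link.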
+The next speed target is a warm-suite mode where one ZMR process executes many
+iterations in a single device session, avoiding repeated CLI startup and trace
+setup for benchmark loops.
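Reviewer sketch: the difference between a per-iteration (cold) loop and the proposed warm-suite loop can be illustrated generically. The `setup` and `run_once` callables are hypothetical stand-ins for CLI startup plus trace setup and for one scenario iteration. Nothing here reflects ZMR's real API.

```python
import time
from typing import Any, Callable, List


def cold_loop(setup: Callable[[], Any], run_once: Callable[[Any], None],
              iterations: int) -> List[float]:
    """Pay the setup cost on every iteration, as separate CLI runs would."""
    durations = []
    for _ in range(iterations):
        start = time.perf_counter()
        session = setup()  # repeated startup and trace setup
        run_once(session)
        durations.append(time.perf_counter() - start)
    return durations


def warm_loop(setup: Callable[[], Any], run_once: Callable[[Any], None],
              iterations: int) -> List[float]:
    """Pay the setup cost once, then time each iteration in the same session."""
    session = setup()  # single shared session
    durations = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_once(session)
        durations.append(time.perf_counter() - start)
    return durations
```

In the warm variant the setup cost falls outside the timed region, so its per-iteration numbers are not directly comparable with cold-command numbers; a benchmark report would need to label the two modes separately.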