agent-scenario-loop 0.1.0
- package/LICENSE +21 -0
- package/README.md +119 -0
- package/app/profile-session.ts +812 -0
- package/core/config-template.json +41 -0
- package/dist/core/agent-summary.d.ts +15 -0
- package/dist/core/agent-summary.js +177 -0
- package/dist/core/artifact-contract.d.ts +151 -0
- package/dist/core/artifact-contract.js +897 -0
- package/dist/core/artifact-layout.d.ts +56 -0
- package/dist/core/artifact-layout.js +61 -0
- package/dist/core/artifact-writer.d.ts +44 -0
- package/dist/core/artifact-writer.js +55 -0
- package/dist/core/comparison.d.ts +133 -0
- package/dist/core/comparison.js +294 -0
- package/dist/core/evidence-interpreter.d.ts +28 -0
- package/dist/core/evidence-interpreter.js +69 -0
- package/dist/core/execution-plan.d.ts +44 -0
- package/dist/core/execution-plan.js +95 -0
- package/dist/core/planner.d.ts +132 -0
- package/dist/core/planner.js +812 -0
- package/dist/core/ports.d.ts +198 -0
- package/dist/core/ports.js +146 -0
- package/dist/core/run-index.d.ts +62 -0
- package/dist/core/run-index.js +143 -0
- package/dist/core/schema-validator.d.ts +86 -0
- package/dist/core/schema-validator.js +407 -0
- package/dist/index.d.ts +11 -0
- package/dist/index.js +27 -0
- package/dist/runner/agent-device-driver.d.ts +126 -0
- package/dist/runner/agent-device-driver.js +168 -0
- package/dist/runner/agent-device.d.ts +295 -0
- package/dist/runner/agent-device.js +1271 -0
- package/dist/runner/android-adb-driver.d.ts +175 -0
- package/dist/runner/android-adb-driver.js +399 -0
- package/dist/runner/android-adb.d.ts +254 -0
- package/dist/runner/android-adb.js +1618 -0
- package/dist/runner/argent-driver.d.ts +183 -0
- package/dist/runner/argent-driver.js +297 -0
- package/dist/runner/argent.d.ts +349 -0
- package/dist/runner/argent.js +1211 -0
- package/dist/runner/check-plan.d.ts +45 -0
- package/dist/runner/check-plan.js +210 -0
- package/dist/runner/cli.d.ts +20 -0
- package/dist/runner/cli.js +23 -0
- package/dist/runner/compare-latest.d.ts +99 -0
- package/dist/runner/compare-latest.js +233 -0
- package/dist/runner/compare.d.ts +58 -0
- package/dist/runner/compare.js +157 -0
- package/dist/runner/demo-loop.d.ts +45 -0
- package/dist/runner/demo-loop.js +170 -0
- package/dist/runner/example-android-live.d.ts +137 -0
- package/dist/runner/example-android-live.js +454 -0
- package/dist/runner/example-ios-live.d.ts +137 -0
- package/dist/runner/example-ios-live.js +471 -0
- package/dist/runner/host-doctor.d.ts +131 -0
- package/dist/runner/host-doctor.js +628 -0
- package/dist/runner/init-project.d.ts +88 -0
- package/dist/runner/init-project.js +263 -0
- package/dist/runner/ios-simctl-driver.d.ts +69 -0
- package/dist/runner/ios-simctl-driver.js +97 -0
- package/dist/runner/ios-simctl.d.ts +254 -0
- package/dist/runner/ios-simctl.js +1415 -0
- package/dist/runner/live-android.d.ts +137 -0
- package/dist/runner/live-android.js +539 -0
- package/dist/runner/live-comparison.d.ts +67 -0
- package/dist/runner/live-comparison.js +147 -0
- package/dist/runner/live-ios.d.ts +137 -0
- package/dist/runner/live-ios.js +460 -0
- package/dist/runner/live-proof-summary.d.ts +263 -0
- package/dist/runner/live-proof-summary.js +465 -0
- package/dist/runner/live-proof.d.ts +467 -0
- package/dist/runner/live-proof.js +920 -0
- package/dist/runner/local-env.d.ts +64 -0
- package/dist/runner/local-env.js +155 -0
- package/dist/runner/profile-android.d.ts +82 -0
- package/dist/runner/profile-android.js +671 -0
- package/dist/runner/profile-ios.d.ts +108 -0
- package/dist/runner/profile-ios.js +532 -0
- package/dist/runner/profile-mobile.d.ts +254 -0
- package/dist/runner/profile-mobile.js +1307 -0
- package/dist/runner/validate-project.d.ts +273 -0
- package/dist/runner/validate-project.js +1501 -0
- package/docs/adapters.md +145 -0
- package/docs/api.md +94 -0
- package/docs/authoring.md +196 -0
- package/docs/concepts.md +136 -0
- package/docs/consumer-rehearsal.md +115 -0
- package/docs/contracts.md +267 -0
- package/docs/live-proofs.md +270 -0
- package/docs/principles.md +46 -0
- package/examples/event-logs/app-startup-baseline.log +4 -0
- package/examples/event-logs/app-startup-current.log +4 -0
- package/examples/minimal-app/README.md +70 -0
- package/examples/mobile-app/README.md +302 -0
- package/examples/mobile-app/app.json +22 -0
- package/examples/mobile-app/asl/package-scripts.json +32 -0
- package/examples/mobile-app/asl.config.json +37 -0
- package/examples/mobile-app/event-logs/android-app-startup.log +4 -0
- package/examples/mobile-app/event-logs/android-open-close-cycle.log +12 -0
- package/examples/mobile-app/event-logs/android-scroll-settle.log +12 -0
- package/examples/mobile-app/event-logs/app-startup.log +4 -0
- package/examples/mobile-app/event-logs/open-close-cycle.log +12 -0
- package/examples/mobile-app/event-logs/scroll-settle.log +12 -0
- package/examples/mobile-app/index.ts +20 -0
- package/examples/mobile-app/metro.config.js +20 -0
- package/examples/mobile-app/package.json +62 -0
- package/examples/mobile-app/patches/expo-modules-jsi@56.0.10.patch +19 -0
- package/examples/mobile-app/plugins/with-ios-build-compat.js +271 -0
- package/examples/mobile-app/pnpm-lock.yaml +4440 -0
- package/examples/mobile-app/runner-manifests/evidence-provider.json +79 -0
- package/examples/mobile-app/runner-manifests/primary-runner.json +19 -0
- package/examples/mobile-app/scenarios/android/app-startup-video.json +73 -0
- package/examples/mobile-app/scenarios/android/app-startup.json +44 -0
- package/examples/mobile-app/scenarios/android/open-close-cycle.json +54 -0
- package/examples/mobile-app/scenarios/android/scroll-settle.json +49 -0
- package/examples/mobile-app/scenarios/ios/app-startup.json +44 -0
- package/examples/mobile-app/scenarios/ios/open-close-cycle.json +54 -0
- package/examples/mobile-app/scenarios/ios/scroll-settle.json +49 -0
- package/examples/mobile-app/scenarios/mobile/app-startup.json +91 -0
- package/examples/mobile-app/scenarios/mobile/open-close-cycle.json +160 -0
- package/examples/mobile-app/scenarios/mobile/scroll-settle.json +148 -0
- package/examples/mobile-app/scripts/asl-capture-accessibility-provider.mjs +112 -0
- package/examples/mobile-app/scripts/asl-capture-profiler-provider.mjs +127 -0
- package/examples/mobile-app/src/devtools/profile-session.ts +7 -0
- package/examples/mobile-app/src/example-screen.tsx +322 -0
- package/examples/mobile-app/tsconfig.json +16 -0
- package/examples/mobile-app/tsconfig.typecheck.json +13 -0
- package/examples/runners/README.md +44 -0
- package/examples/runners/adb-android.json +25 -0
- package/examples/runners/agent-device-android.json +27 -0
- package/examples/runners/agent-device-ios.json +27 -0
- package/examples/runners/argent-android.json +32 -0
- package/examples/runners/argent-ios.json +32 -0
- package/examples/runners/argent-react-profiler-provider.json +15 -0
- package/examples/runners/axe-accessibility-provider.json +24 -0
- package/examples/runners/manual-log-ingest.json +9 -0
- package/examples/runners/rozenite-profiler-provider.json +9 -0
- package/examples/runners/script-accessibility-provider.json +24 -0
- package/examples/runners/script-memory-provider.json +24 -0
- package/examples/runners/script-network-provider.json +24 -0
- package/examples/runners/script-profiler-provider.json +30 -0
- package/examples/runners/xcodebuildmcp-ios.json +29 -0
- package/examples/scenarios/ios/app-startup.json +28 -0
- package/examples/scenarios/ios/open-close-cycle.json +35 -0
- package/examples/scenarios/mobile/app-startup.json +72 -0
- package/examples/scenarios/mobile/media-open-close.json +141 -0
- package/examples/scenarios/mobile/open-close-cycle.json +135 -0
- package/examples/scenarios/mobile/scroll-settle.json +106 -0
- package/package.json +240 -0
- package/schemas/budget-verdict.schema.json +115 -0
- package/schemas/causal-run.schema.json +279 -0
- package/schemas/comparison.schema.json +196 -0
- package/schemas/health.schema.json +108 -0
- package/schemas/live-proof-set.schema.json +195 -0
- package/schemas/live-proof.schema.json +413 -0
- package/schemas/manifest.schema.json +204 -0
- package/schemas/metrics.schema.json +137 -0
- package/schemas/project-validation.schema.json +343 -0
- package/schemas/runner-capabilities.schema.json +217 -0
- package/schemas/scenario.schema.json +400 -0
- package/schemas/verdict.schema.json +88 -0
- package/templates/evidence-provider.json +83 -0
- package/templates/gitignore-snippet +9 -0
- package/templates/integration-readme.md +125 -0
- package/templates/mobile-scenario.json +133 -0
- package/templates/package-scripts.json +32 -0
- package/templates/primary-runner.json +19 -0
- package/templates/project.config.json +37 -0
- package/templates/scripts/asl-capture-accessibility-provider.mjs +112 -0
- package/templates/scripts/asl-capture-profiler-provider.mjs +127 -0

@@ -0,0 +1,115 @@
# Consumer App Rehearsal

Use this checklist before adopting Agent Scenario Loop in an existing React Native app. The goal is to prove the package workflow locally before treating it as part of the app's everyday agent loop.

In this repository, the automated packed-package rehearsal is:

```bash
pnpm consumer:rehearse
```

It creates a temporary package shaped like an existing app, installs the packed tarball, runs `asl-init`, merges generated scripts into `package.json`, replaces scaffold placeholders, runs both platform plan scripts, runs generated fixture profile scripts against deterministic event logs, runs the generated Argent interaction scripts through a deterministic adapter double, validates the project through the installed CLI, and proves stale merged scripts are rejected. Use the manual checklist below when rehearsing inside a real app.

Package gates run child package-manager and CLI commands with a bounded timeout. Set `ASL_PACKAGE_GATE_TIMEOUT_MS` when a slow local registry, proxy, or package cache needs a larger budget:

```bash
ASL_PACKAGE_GATE_TIMEOUT_MS=300000 pnpm consumer:rehearse
```

## 1. Initialize The Scaffold

From the consuming app root:

```bash
asl-init --out . --scenario first-journey
```

Review the generated files before merging anything into existing app scripts:

- `asl.config.json`
- `scenarios/mobile/first-journey.json`
- `runner-manifests/primary-runner.json`
- `runner-manifests/evidence-provider.json`
- `scripts/asl-capture-accessibility-provider.mjs`
- `scripts/asl-capture-profiler-provider.mjs`
- `src/devtools/profile-session.ts`
- `asl/package-scripts.json`
- `asl/gitignore-snippet`

Keep generated artifacts ignored. Commit only durable scenarios, manifests, config, docs, and app helper wiring.

Merge the required generated `asl:*` entries from `asl/package-scripts.json` into the app `package.json`. `asl-validate-project` treats missing or drifted merged scripts as an error because agents need stable local commands, not just scaffold files.
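
The merge-and-drift rule above can be sketched roughly as follows. This is an illustration only: `mergeAslScripts` and `findDriftedScripts` are hypothetical helpers, not package API, the sketch assumes both files reduce to flat name-to-command maps, and the `asl:validate` script name is made up for the example. The real `asl-validate-project` check may compare more than this.

```typescript
// Hypothetical helpers: not part of the agent-scenario-loop API.
type ScriptMap = { [name: string]: string };

// Copy every generated `asl:*` entry into the app scripts, leaving other scripts untouched.
function mergeAslScripts(appScripts: ScriptMap, generated: ScriptMap): ScriptMap {
  const merged: ScriptMap = {};
  Object.keys(appScripts).forEach((name) => {
    merged[name] = appScripts[name];
  });
  Object.keys(generated)
    .filter((name) => name.indexOf("asl:") === 0)
    .forEach((name) => {
      merged[name] = generated[name];
    });
  return merged;
}

// List generated `asl:*` entries that are missing from the app scripts or differ from them.
function findDriftedScripts(appScripts: ScriptMap, generated: ScriptMap): string[] {
  return Object.keys(generated)
    .filter((name) => name.indexOf("asl:") === 0)
    .filter((name) => appScripts[name] !== generated[name])
    .sort();
}

// Illustrative values; the script name `asl:validate` is an assumption.
const generatedScripts: ScriptMap = {
  "asl:validate": "asl-validate-project --root . --platform all --out artifacts/asl/project-validation",
};
const appScripts: ScriptMap = { test: "jest", "asl:validate": "asl-validate-project --root ." };

const driftedBefore = findDriftedScripts(appScripts, generatedScripts);
const mergedScripts = mergeAslScripts(appScripts, generatedScripts);
const driftedAfter = findDriftedScripts(mergedScripts, generatedScripts);
```

Here the stale `asl:validate` entry is reported as drifted before the merge and the report is empty afterwards, while the unrelated `test` script is left alone.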

## 2. Wire App Truth

Mount `useProfileSessionBootstrap()` once near the app root.

Emit truth events around one stable journey:

- journey intent accepted
- first useful visual state
- command target opened or completed
- return or completion state

Register command targets only where they map to real app behavior. Avoid selectors or commands that depend on local data, private accounts, or temporary UI state.

## 3. Validate Before Runtime

Run:

```bash
asl-validate-project --root . --platform all --out artifacts/asl/project-validation
```

Fix errors before runtime proof. Treat warnings and `nextActions` as setup work that should be resolved before the app depends on the scenario for regression decisions.

## 4. Prove One Platform First

Keep deterministic validation and live device proof as separate lanes. `asl-check-plan`, fixture profile runs, `pnpm package:smoke`, and `pnpm consumer:rehearse` should work in ordinary build or agent sandboxes. Live runs that touch adb, simctl, agent-device, Argent, emulators, simulators, or physical devices need host/device access. If a live command cannot reach those host services, classify it as runner environment health before treating it as a scenario regression.

Prefer Android first when iOS tooling is unstable:

```bash
asl-check-plan --scenario scenarios/mobile/first-journey.json --runner runner-manifests/primary-runner.json --platform android --out artifacts/asl/plan/first-journey-android
asl-profile-android --config asl.config.json --scenario scenarios/mobile/first-journey.json --adb-capture --profile-session --clear-logcat --launch --out artifacts/asl/android --run-id first-journey-android-live --comparison-lane first-journey-android-live
```

Use iOS once the app is installed on a booted simulator:

```bash
asl-check-plan --scenario scenarios/mobile/first-journey.json --runner runner-manifests/primary-runner.json --platform ios --out artifacts/asl/plan/first-journey-ios
asl-profile-ios --config asl.config.json --scenario scenarios/mobile/first-journey.json --simctl-capture --profile-session --profile-session-storage --launch --out artifacts/asl/ios --run-id first-journey-ios-live --comparison-lane first-journey-ios-live
```

For Expo dev-client builds, set `ASL_ANDROID_DEV_CLIENT_URL` or `ASL_IOS_DEV_CLIENT_URL` to the app's dev-client URL in ignored local env state. Android opens it before profile-session deep links; iOS opens it before reading stored profile-session evidence. If Android bundle startup is slow, set `ASL_ANDROID_DEV_CLIENT_READY_PATTERN='Running "main"'` so profile-session links wait for app runtime readiness evidence.

When Android deep-link delivery is unreliable in a dev-client shell, use `--android-profile-session-storage` so `asl-profile-android` seeds the app-owned AsyncStorage session through `run-as` before collecting evidence. The runner reads the selected device clock for the session start timestamp, which keeps app-emitted milestone durations meaningful. Keep custom storage key overrides local to the consuming app.

When `--wait-ms` is omitted, profile-session live capture derives the final adb or simctl evidence window from the scenario execution steps and cycle count. Use an explicit `--wait-ms` only for an app-specific override.

## 5. Compare Only Trusted Runs

After two passed runs exist, compare the current run against the newest trusted prior run:

```bash
ASL_COMPARE_ANDROID_CURRENT=artifacts/asl/android/first-journey/first-journey-android-live pnpm asl:compare:android
```

Do not make improvement or regression claims when scenario health failed or the comparison is inconclusive.

Keep each proof mode in its own comparison lane. Fixture, Android live, iOS live, adb-only, simctl-only, and sidecar-backed runs can share a scenario id, but they should not borrow each other's baselines.
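
One way to picture the lane rule is a baseline lookup that only considers earlier passed runs with the same scenario id and the same comparison lane. The sketch below is an assumption-laden illustration: the `RunRecord` shape and `findTrustedBaseline` are hypothetical, and it treats "trusted" as "health and verdict both passed", which may be narrower or wider than what the package's run index actually checks.

```typescript
// Hypothetical run record; the real run index fields may differ.
interface RunRecord {
  runId: string;
  scenarioId: string;
  comparisonLane: string;
  healthPassed: boolean;
  verdictPassed: boolean;
  finishedAtMs: number;
}

// Pick the newest earlier run that passed and shares the scenario id and lane.
function findTrustedBaseline(current: RunRecord, history: RunRecord[]): RunRecord | undefined {
  const candidates = history
    .filter((run) => run.runId !== current.runId)
    .filter((run) => run.scenarioId === current.scenarioId)
    .filter((run) => run.comparisonLane === current.comparisonLane)
    .filter((run) => run.healthPassed && run.verdictPassed)
    .filter((run) => run.finishedAtMs < current.finishedAtMs)
    .sort((a, b) => b.finishedAtMs - a.finishedAtMs);
  return candidates.length > 0 ? candidates[0] : undefined;
}

// Illustrative history: a newer fixture run must not become the Android live baseline.
const androidLane = "first-journey-android-live";
const history: RunRecord[] = [
  { runId: "a1", scenarioId: "first-journey", comparisonLane: androidLane, healthPassed: true, verdictPassed: true, finishedAtMs: 100 },
  { runId: "a2", scenarioId: "first-journey", comparisonLane: androidLane, healthPassed: false, verdictPassed: false, finishedAtMs: 200 },
  { runId: "f1", scenarioId: "first-journey", comparisonLane: "first-journey-fixture", healthPassed: true, verdictPassed: true, finishedAtMs: 300 },
];
const currentRun: RunRecord = {
  runId: "a3", scenarioId: "first-journey", comparisonLane: androidLane,
  healthPassed: true, verdictPassed: true, finishedAtMs: 400,
};
const baseline = findTrustedBaseline(currentRun, history);
```

In this example the lookup skips the failed Android run and the newer fixture run, and settles on the older passed Android live run.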

## 6. Decide Adoption Scope

Before expanding beyond the first journey, confirm:

- the app helper is committed and mounted once
- the scenario is boring and repeatable
- app-owned truth events are stable
- artifacts are ignored locally and not packed or committed
- every durable runtime proof has an explicit comparison lane
- `agent-summary.md` gives enough context for a coding agent to act
- failed setup produces concrete next actions
- at least one platform has a passed live proof

Only then add more scenarios, providers, or runner adapters.

@@ -0,0 +1,267 @@
# Contracts

This package ships the scenario, runner, and artifact contracts that make Agent Scenario Loop useful while live runners are added behind stable interfaces.

The package is intentionally contract-first: adopt the scenario and artifact shape once, then add or swap runner loops without rewriting your scenarios.

## What ships today

- [app/profile-session.ts](../app/profile-session.ts): thin React Native integration for session control, truth events, and signal attachments
- [core/agent-summary.ts](../core/agent-summary.ts): agent-facing summary builder for health, verdict, and comparison state
- [core/artifact-layout.ts](../core/artifact-layout.ts): canonical artifact path contract for one run directory
- [core/artifact-writer.ts](../core/artifact-writer.ts): schema-enforcing writers for stable JSON/text artifacts
- [core/comparison.ts](../core/comparison.ts): comparison artifact builder for trusted before/after run folders
- [core/artifact-contract.ts](../core/artifact-contract.ts): artifact builders for manifest, metrics, causal run, budget verdict, and summary
- [core/evidence-interpreter.ts](../core/evidence-interpreter.ts): evidence interpretation helpers that gate timing claims on scenario health
- [core/execution-plan.ts](../core/execution-plan.ts): scenario-step normalizer that maps portable steps to runner port methods before adapter execution
- [core/planner.ts](../core/planner.ts): compatibility checks between scenario requirements, primary runner capabilities, and evidence providers
- [core/ports.ts](../core/ports.ts): ports-and-adapters method surfaces for runners, drivers, providers, writers, and interpreters
- [core/run-index.ts](../core/run-index.ts): read-only artifact root index for finding trusted prior runs
- [core/schema-validator.ts](../core/schema-validator.ts): dependency-free validation for the JSON Schema subset used by the public contracts
- [runner/profile-android.ts](../runner/profile-android.ts): Android profile runner that can ingest profile-event logs directly, read adb artifact folders, or own a bounded adb capture window before writing the full artifact set
- [runner/ios-simctl.ts](../runner/ios-simctl.ts): iOS simulator capture runner for launch, profile-session storage seeding, profile-session deep links, bounded logs, stored profile-event collection, lifecycle crash detection, host crash-report attachment, and raw simctl evidence
- [runner/profile-ios.ts](../runner/profile-ios.ts): iOS profile runner that can ingest profile-event logs directly, read simctl artifact folders, or own a bounded simctl capture window before writing the full artifact set
- [runner/android-adb.ts](../runner/android-adb.ts): Android adb readiness preflight, optional package launch, ordered driver actions, and bounded logcat capture that write runner health and raw adb evidence
- [runner/android-adb-driver.ts](../runner/android-adb-driver.ts): adb-backed Android driver adapter for tap, scroll, UI tree, screenshot, and log capture plus Android-specific lifecycle helpers
- [runner/agent-device.ts](../runner/agent-device.ts): agent-device capture runner for portable app open, visibility, screenshot, and supported driver actions without bundling agent-device
- [runner/argent.ts](../runner/argent.ts): Argent capture runner for launch, coordinate-backed gestures, screenshot requests, and description-backed visibility proof without bundling Argent
- [runner/argent-driver.ts](../runner/argent-driver.ts): optional Argent adapter for normalized gestures, app launch, screenshot requests, and UI descriptions without making Argent a package dependency
- [runner/ios-simctl-driver.ts](../runner/ios-simctl-driver.ts): simctl-backed iOS driver adapter for screenshot and log capture plus explicit iOS lifecycle helpers
- [runner/live-android.ts](../runner/live-android.ts): generic Android live proof for one portable scenario with adb preflight, profile-session capture, optional agent-device and Argent sidecars, optional comparison, and aggregate proof writing
- [runner/live-ios.ts](../runner/live-ios.ts): generic iOS live proof for one portable scenario with simctl preflight, storage or deep-link profile-session capture, optional agent-device and Argent sidecars, optional comparison, and aggregate proof writing
- [runner/example-android-live.ts](../runner/example-android-live.ts): packaged Android example live proof for adb preflight plus canonical startup, open-close, and scroll-settle scenarios
- [runner/example-ios-live.ts](../runner/example-ios-live.ts): packaged iOS example live proof for simctl preflight plus canonical startup, open-close, and scroll-settle scenarios
- [runner/host-doctor.ts](../runner/host-doctor.ts): aggregate host/device preflight for adb, simctl, agent-device, and Argent command availability before live proof starts
- [runner/live-proof.ts](../runner/live-proof.ts): live-proof artifact reader for validation, status formatting, and optional regression gating
- [runner/validate-project.ts](../runner/validate-project.ts): initialized project validator for app helper presence, package-script snippets, app `package.json` script merge and direct-bin drift, required config fields, scenario manifests, runner manifests, and planner compatibility
- [runner/demo-loop.ts](../runner/demo-loop.ts): fixture loop that proves preflight, profile history, and latest-trusted comparison without a simulator
- [examples/event-logs](../examples/event-logs): deterministic profile-event logs for the fixture loop
- [examples/mobile-app](../examples/mobile-app): neutral Expo dogfood app with scenario manifests and profile-event evidence fixtures
- [examples/scenarios/ios](../examples/scenarios/ios): iOS profile scenario manifests for the current log-ingest runner
- [examples/scenarios/mobile](../examples/scenarios/mobile): canonical portable scenario fixtures
- [examples/runners](../examples/runners): primary runner, evidence-provider, and adapter-target capability fixtures
- [schemas](../schemas): JSON Schemas for current artifacts plus the scenario and runner capability contracts

## Public app contract

On the app side, your app exposes:

- session control: `startProfileSession`, `stopProfileSession`, `applyProfileSessionUrl`
- truth events: `emitProfileEvent`
- signal attachments: `storeProfileSignal`

The app integration is intentionally thin. The application emits truth; runners and providers collect evidence around it.

## Public scenario contract

Portable scenario manifests describe the durable app behavior before choosing a runner:

- `journey`: human-readable intent, actor, start state, and end state
- `platforms`: supported runtime targets
- `requiredCapabilities` and `optionalCapabilities`: runner capability requirements
- `steps[].driverAction`: optional concrete driver operation required by a step, such as `tap`, `scroll`, `assertVisible`, `inspectTree`, `screenshot`, `record`, `readLogs`, or `collectPerfSignals`
- `comparisonLane`: optional default baseline lane for latest-trusted comparisons
- `truthEvents`: app-owned milestone events keyed by stable milestone id
- `milestones`: inspectable milestone list with event names, phases, timeouts, and descriptions
- `expectedEvents`: event names the runner or log ingest should expect to observe
- `cycles`: repeat count, warmup count, and failure policy for repeated journeys
- `budgets`: product thresholds evaluated only after scenario health passes
- `steps`: runner-facing launch, command, wait, gesture, and capture actions
- `selector`: optional app target on a step, such as a test id, accessibility id, label, text, resource id, or xpath
- `artifacts`: required and optional evidence outputs

The scenario contract is intentionally runner-neutral. Runners can map steps to adb, XcodeBuildMCP, agent-device, accessibility tools, profilers, or custom scripts while preserving the same journey, milestones, budgets, and expected events.

Runner capabilities describe ownership, such as launch, session control, command execution, log capture, artifact writing, or profiler support. Driver actions describe the concrete operations an adapter can perform inside a run. A runner may be able to own a scenario lifecycle without supporting every driver action; the planner fails only when a required step declares a `driverAction` that neither the selected runner nor any active provider declares in `driverActions`.
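
The driver-action rule can be sketched as a small check. The shapes below are hypothetical simplifications (real manifests carry more fields), and `findUnsupportedSteps` is not the package's planner; it only illustrates that optional steps never fail the plan and that providers for another platform are ignored.

```typescript
// Hypothetical shapes: the real manifests carry more fields than this sketch.
type Platform = "android" | "ios";

interface ScenarioStep {
  id: string;
  required: boolean;
  driverAction?: string;
}

interface CapabilityManifest {
  id: string;
  platforms: Platform[];
  driverActions: string[];
}

// Return ids of required steps whose driverAction nobody on this platform declares.
function findUnsupportedSteps(
  steps: ScenarioStep[],
  runner: CapabilityManifest,
  providers: CapabilityManifest[],
  platform: Platform,
): string[] {
  // Providers outside the selected platform do not contribute to the match.
  const activeProviders = providers.filter((p) => p.platforms.indexOf(platform) >= 0);
  const declared = runner.driverActions.slice();
  activeProviders.forEach((p) => {
    p.driverActions.forEach((action) => {
      declared.push(action);
    });
  });

  return steps
    .filter((step) => step.required && step.driverAction !== undefined)
    .filter((step) => declared.indexOf(step.driverAction as string) < 0)
    .map((step) => step.id);
}

// Illustrative input: the profiler provider only covers iOS, so Android lacks collectPerfSignals.
const unsupported = findUnsupportedSteps(
  [
    { id: "open", required: true, driverAction: "tap" },
    { id: "perf", required: true, driverAction: "collectPerfSignals" },
    { id: "clip", required: false, driverAction: "record" },
  ],
  { id: "primary", platforms: ["android", "ios"], driverActions: ["tap", "scroll"] },
  [{ id: "ios-profiler", platforms: ["ios"], driverActions: ["collectPerfSignals"] }],
  "android",
);
```

With these inputs only the `perf` step is reported: `tap` is declared by the runner, and the optional `record` step does not block planning.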

`buildScenarioExecutionPlan()` turns the same scenario steps into a deterministic adapter-facing work list. Each normalized step records the scenario step id, original kind, required flag, optional driver action, and the runner port method that owns it: `launch`, `executeStep`, `waitForTruthEvent`, or `captureEvidence`.
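
As a rough illustration of that normalization, the sketch below maps step kinds to port methods. It is not the package's `buildScenarioExecutionPlan()`: the function name `sketchExecutionPlan`, the field names, the kind-to-method mapping, and the "required unless marked otherwise" default are all assumptions made for the example.

```typescript
type StepKind = "launch" | "command" | "wait" | "gesture" | "capture";
type PortMethod = "launch" | "executeStep" | "waitForTruthEvent" | "captureEvidence";

// Hypothetical input and output shapes for the sketch.
interface ScenarioStep {
  id: string;
  kind: StepKind;
  required?: boolean;
  driverAction?: string;
}

interface PlannedStep {
  stepId: string;
  kind: StepKind;
  required: boolean;
  driverAction?: string;
  portMethod: PortMethod;
}

// Assumed mapping from step kind to the runner port method that owns it.
const PORT_METHOD_BY_KIND: { [K in StepKind]: PortMethod } = {
  launch: "launch",
  command: "executeStep",
  gesture: "executeStep",
  wait: "waitForTruthEvent",
  capture: "captureEvidence",
};

// Normalize steps in order, so the same scenario always yields the same work list.
function sketchExecutionPlan(steps: ScenarioStep[]): PlannedStep[] {
  return steps.map((step) => ({
    stepId: step.id,
    kind: step.kind,
    required: step.required !== false, // assumption: required unless explicitly false
    driverAction: step.driverAction,
    portMethod: PORT_METHOD_BY_KIND[step.kind],
  }));
}

const plan = sketchExecutionPlan([
  { id: "launch-app", kind: "launch" },
  { id: "open-item", kind: "gesture", driverAction: "tap" },
  { id: "wait-ready", kind: "wait" },
  { id: "grab-shot", kind: "capture", driverAction: "screenshot", required: false },
]);
const portMethods = plan.map((step) => step.portMethod);
```

A lookup table keeps the mapping total over `StepKind`, so adding a new kind without choosing its port method becomes a compile-time error in this sketch.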

Android adb capture routes normalized steps with `driverAction: "tap"`, `"scroll"`, `"assertVisible"`, `"inspectTree"`, `"screenshot"`, or `"readLogs"` through the adb driver adapter. `adapterOptions.androidAdb` carries action-specific metadata: coordinate fields for tap and scroll, `durationMs` for scroll, `logcatLines` for bounded logs, `waitMs` for capture timing, and `rawFileName` for evidence filename overrides. `assertVisible` requires a portable selector and verifies it against a UIAutomator tree dump, preserving that XML as raw evidence. Log capture keeps `raw/adb-logcat.txt` as the default profile input.

When Android adb `tap` or `scroll` steps provide a portable selector instead of coordinates, the runner captures `uiautomator dump` output, resolves supported selector kinds against node bounds, and derives adb input coordinates before executing the action. Built-in Android selector resolution supports `testId`, `resourceId`, `accessibilityId`, `accessibilityLabel`, and `text`; `xpath` stays available for external runners with native selector engines.
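
The coordinate-derivation step can be illustrated with the `bounds` attribute that UIAutomator dumps attach to each node, written as `[left,top][right,bottom]`. The helpers below are a hypothetical sketch, not the runner's code: they assume tapping the center of the matched node is an acceptable target, which the document does not state.

```typescript
interface Bounds {
  left: number;
  top: number;
  right: number;
  bottom: number;
}

// Parse a UIAutomator bounds attribute such as "[0,210][1080,354]".
function parseBounds(raw: string): Bounds | undefined {
  const match = /^\[(-?\d+),(-?\d+)\]\[(-?\d+),(-?\d+)\]$/.exec(raw.trim());
  if (!match) {
    return undefined;
  }
  return {
    left: Number(match[1]),
    top: Number(match[2]),
    right: Number(match[3]),
    bottom: Number(match[4]),
  };
}

// Assumption: the tap target is the rounded center of the node rectangle.
function centerOf(bounds: Bounds): { x: number; y: number } {
  return {
    x: Math.round((bounds.left + bounds.right) / 2),
    y: Math.round((bounds.top + bounds.bottom) / 2),
  };
}

const parsed = parseBounds("[0,210][1080,354]");
const tapPoint = parsed ? centerOf(parsed) : undefined; // { x: 540, y: 282 }
const rejected = parseBounds("not-a-bounds-string"); // undefined
```

The resulting point is what a selector-backed `tap` would hand to adb input in this sketch; a `scroll` would need two such points.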

iOS simctl capture routes log and screenshot I/O through the simctl driver adapter. `readLogs` preserves bounded simulator logs under `raw/ios-simctl-log.txt`. A scenario step with `driverAction: "screenshot"` or `artifact: "screenshot"` requests a screenshot capture, defaulting to `captures/ios-screenshot.png`; when `--screenshot-type`, `--screenshot-display`, or `--screenshot-mask` is supplied to `asl-ios-simctl`, the command passes those supported `simctl io screenshot` options and records them in capture metadata. The profile manifest records the resulting capture path in `artifacts.captures.screenshots`.

Planner compatibility also validates the adapter metadata that built-in runners require. Android adb `tap` steps need either `adapterOptions.androidAdb.x/y` or a portable selector; Android adb `scroll` steps need either `startX/startY/endX/endY` or a portable selector; iOS simctl command metadata needs non-empty command strings and positive integer waits/repeat counts. Argent `tap` steps need `adapterOptions.argent.x/y`, Argent `scroll` steps need `adapterOptions.argent.startX/startY/endX/endY`, and Argent `assertVisible` steps need a portable selector. These failures become `invalid_adapter_options` health checks before runtime execution starts.
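
A minimal sketch of the Android adb half of that validation follows. Only the `invalid_adapter_options` code and the `adapterOptions.androidAdb` coordinate field names come from the text above; the step and health-check shapes, the selector representation, and the `checkAndroidAdbStep` helper are assumptions for illustration.

```typescript
interface AndroidAdbOptions {
  x?: number;
  y?: number;
  startX?: number;
  startY?: number;
  endX?: number;
  endY?: number;
}

// Hypothetical step shape; the selector is modeled as a loose kind-to-value map.
interface AdbStep {
  id: string;
  driverAction: "tap" | "scroll";
  selector?: { [kind: string]: string };
  adapterOptions?: { androidAdb?: AndroidAdbOptions };
}

interface HealthCheck {
  code: "invalid_adapter_options";
  stepId: string;
  message: string;
}

function allNumbers(values: Array<number | undefined>): boolean {
  return values.every((value) => typeof value === "number" && isFinite(value));
}

// A tap needs x/y, a scroll needs start and end points, unless a selector is present.
function checkAndroidAdbStep(step: AdbStep): HealthCheck[] {
  const hasSelector = step.selector !== undefined && Object.keys(step.selector).length > 0;
  const adb: AndroidAdbOptions = (step.adapterOptions && step.adapterOptions.androidAdb) || {};
  const hasCoordinates =
    step.driverAction === "tap"
      ? allNumbers([adb.x, adb.y])
      : allNumbers([adb.startX, adb.startY, adb.endX, adb.endY]);
  if (hasSelector || hasCoordinates) {
    return [];
  }
  return [
    {
      code: "invalid_adapter_options",
      stepId: step.id,
      message: step.driverAction + " step needs coordinates or a portable selector",
    },
  ];
}

const tapMissingY = checkAndroidAdbStep({
  id: "open-item",
  driverAction: "tap",
  adapterOptions: { androidAdb: { x: 540 } },
});
const scrollWithSelector = checkAndroidAdbStep({
  id: "scroll-feed",
  driverAction: "scroll",
  selector: { testId: "feed-list" },
});
```

Returning a list of checks instead of throwing mirrors the idea that these problems surface as health results before any device command runs.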

Adapter-target fixtures such as `agent-device-android`, `agent-device-ios`, `argent-ios`, `argent-android`, `argent-react-profiler-provider`, and `axe-accessibility-provider` describe where external tools can plug into the same contract. They are schema-checked and planner-tested capability manifests. The bundled `agent-device` capture runner implements the portable interaction subset for iOS and Android; broader agent-device surfaces such as React DevTools, traces, network, and performance still need explicit adapters or provider attachments before they become part of the stable artifact contract. The bundled Argent runner implements launch, coordinate-backed gestures, screenshot requests, and description-backed visibility proof for portable selector match modes while keeping React profiler output in a separate Android evidence-provider lane. Argent command-surface checks prove the configured tools exist; runtime health still owns whether the selected device backend produced screenshot evidence. Required screenshot failures fail health, and optional screenshot failures are preserved as warnings. Active evidence providers can satisfy required evidence artifacts and provider-owned driver actions such as `collectPerfSignals`; providers outside the selected platform do not contribute to the match. When those tools write files independently, profile CLIs can attach the files with `--signal <js|memory|network>:<path>` or `--capture <screenshot|video|uiTree>:<path>` so provider evidence lands in the stable manifest and artifact layout. The `script-accessibility-provider`, `script-profiler-provider`, `script-memory-provider`, and `script-network-provider` examples show provider-command wrappers for project-local tools without making those tools package dependencies.
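
The `--signal <js|memory|network>:<path>` form can be read as "kind, first colon, path". The parser below is a hypothetical sketch of that reading, not the CLI's own argument handling; splitting on the first colon only (so paths may contain further colons) and the example path are assumptions.

```typescript
type SignalKind = "js" | "memory" | "network";

interface SignalAttachment {
  kind: SignalKind;
  path: string;
}

const SIGNAL_KINDS: SignalKind[] = ["js", "memory", "network"];

// Split "<kind>:<path>" on the first colon and validate both halves.
function parseSignalSpec(spec: string): SignalAttachment {
  const separator = spec.indexOf(":");
  if (separator <= 0) {
    throw new Error("Expected <js|memory|network>:<path>, got: " + spec);
  }
  const rawKind = spec.slice(0, separator);
  const path = spec.slice(separator + 1);
  const kindIndex = SIGNAL_KINDS.indexOf(rawKind as SignalKind);
  if (kindIndex < 0) {
    throw new Error("Unknown signal kind: " + rawKind);
  }
  if (path.length === 0) {
    throw new Error("Signal path is empty for kind: " + rawKind);
  }
  return { kind: SIGNAL_KINDS[kindIndex], path };
}

// Illustrative path; real provider output locations are project-specific.
const memorySignal = parseSignalSpec("memory:artifacts/asl/signals/memory.json");
```

The same shape would apply to `--capture <screenshot|video|uiTree>:<path>` with a different list of allowed kinds.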

## Public artifact layout

Every run should produce a stable artifact folder.

Core artifacts:

- `health.json`: whether the scenario execution was valid enough to interpret
- `verdict.json`: budget outcome for product behavior, or `not_evaluated` before evidence is collected
- `comparison.json`: optional before/after result against a trusted baseline
- `live-proof.json`: aggregate proof summary for a multi-scenario live run
- `live-proof-set.json`: aggregate platform-set proof summary across Android and iOS live-proof artifacts
- `agent-summary.md`: agent-readable health gate and next-action summary
- `planner-compatibility.json`: optional preflight detail from runner/provider matching
- `project-validation.json`: project-level validation result for initialized app scaffolds, including helper readiness, config readiness, scenario candidate directories, discovered scenario paths, declared `drivers.supported` readiness, package-supported driver classification, external target driver classification, custom driver declarations, package-script snippet readiness, app `package.json` script merge and direct-bin drift readiness, non-failing setup warnings, and structured next actions

Profile runner artifacts:

- `manifest.json`
- `metrics.json`
- `budget-verdict.json`
- `causal-run.json`
- `summary.md`

`manifest.json`, `metrics.json`, `budget-verdict.json`, and `causal-run.json` are schema-checked before the runner writes them. This keeps profile artifacts stable across fixture logs, adb-captured logs, and future runner adapters.
|
|
113
|
+
|
|
114
|
+
Aggregate live proof commands write `live-proof.json` and `agent-summary.md` under `_live-proof/<run-id>`. The live-proof artifact points to preflight evidence, every scenario run, optional interaction proofs from tools such as agent-device or Argent, optional skipped interaction proof declarations, and optional latest-trusted comparison outputs, giving agents one stable entrypoint after a proof run. Preflight, profile, and interaction pointers include health and verdict status from the linked run artifacts, so agents can see what passed before opening deeper evidence. Interaction proof pointers also include sidecar screenshot capture inventory when the sidecar produced screenshots, plus `warnings` when optional sidecar checks failed without invalidating the required proof. If profile health or verdict fails, requested sidecars are not executed; they are recorded in `skippedInteractionProofs` with a reason and next action so agent feedback stays explicit without mixing runner evidence into an untrusted timing run. The aggregate artifact records `status`, `comparisonStatus`, `comparisonCounts`, optional per-comparison `metricSummary` counts/highlights, and a `nextAction` hint so agents can distinguish failed proof gates, regressions, mixed metric movement, missing baselines, inconclusive comparisons, partial sidecar evidence, and clean summaries without scraping prose.
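
The sidecar rule described above (skip requested sidecars when the profile gate fails, and record why) can be sketched as follows. This is a minimal illustration, not the package's implementation: `planSidecars`, `SidecarPlan`, and every field other than `skippedInteractionProofs` are hypothetical names chosen for the example.

```typescript
// Hypothetical shapes for illustration; only `skippedInteractionProofs`
// is a name taken from the documentation.
type GateStatus = "passed" | "failed";

interface SkippedInteractionProof {
  tool: string;
  reason: string;
  nextAction: string;
}

interface SidecarPlan {
  toRun: string[];
  skippedInteractionProofs: SkippedInteractionProof[];
}

// Sidecars run only when both profile health and verdict passed.
// Otherwise every requested sidecar is recorded as skipped with a reason.
function planSidecars(
  profileHealth: GateStatus,
  profileVerdict: GateStatus,
  requestedTools: string[],
): SidecarPlan {
  if (profileHealth === "passed" && profileVerdict === "passed") {
    return { toRun: [...requestedTools], skippedInteractionProofs: [] };
  }
  const reason =
    profileHealth === "failed" ? "profile health failed" : "profile verdict failed";
  return {
    toRun: [],
    skippedInteractionProofs: requestedTools.map((tool) => ({
      tool,
      reason,
      nextAction: "Fix the profile run, then rerun the live proof.",
    })),
  };
}
```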

Platform-set proof commands write `live-proof-set.json` and `agent-summary.md` under the caller-provided proof-set output directory. The proof-set artifact records required platforms, present platforms, missing platforms, each linked `live-proof.json`, failed proof reasons, regression-gate reasons, and a next action. This gives agents one stable Android-plus-iOS gate after the per-platform live proofs have written their own aggregate evidence.
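
As a rough sketch of how a platform-set gate could derive present and missing platforms, consider the following. The function name, field names, and status values are assumptions for illustration and are not taken from the `live-proof-set.json` schema.

```typescript
type Platform = "android" | "ios";
type ProofStatus = "passed" | "failed";

// Hypothetical summary shape; the real artifact records more detail
// (linked live-proof paths, failure reasons, a next action).
interface ProofSetSummary {
  requiredPlatforms: Platform[];
  presentPlatforms: Platform[];
  missingPlatforms: Platform[];
  status: ProofStatus;
}

// The set passes only when every required platform is present and passed.
function summarizeProofSet(
  required: Platform[],
  proofs: Partial<Record<Platform, ProofStatus>>,
): ProofSetSummary {
  const presentPlatforms = required.filter((p) => proofs[p] !== undefined);
  const missingPlatforms = required.filter((p) => proofs[p] === undefined);
  const allPassed =
    missingPlatforms.length === 0 &&
    presentPlatforms.every((p) => proofs[p] === "passed");
  return {
    requiredPlatforms: [...required],
    presentPlatforms,
    missingPlatforms,
    status: allPassed ? "passed" : "failed",
  };
}
```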

Provider or custom-script evidence attached with `--signal` or `--capture` is copied into stable run folders and inventoried in `manifest.artifacts.evidenceAttachments`. Each inventory entry records the evidence channel, kind, run-relative path, source filename, byte size, and sha256 hash; it does not preserve local absolute source paths.
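
The inventory entry described above might look roughly like this. The field names and the `signals/<kind>` and `captures` destination folders are assumptions inferred from the prose, and `buildAttachment` is a hypothetical helper. The byte size and hash are passed in so the sketch stays free of file-system calls.

```typescript
type Channel = "signal" | "capture";

// Hypothetical entry shape mirroring the fields listed in the prose.
interface EvidenceAttachment {
  channel: Channel;
  kind: string;
  path: string; // run-relative destination
  sourceFilename: string; // basename only, never the absolute source path
  bytes: number;
  sha256: string;
}

// Parses a `<kind>:<path>` flag value and builds an inventory entry that
// keeps only the source filename, not the local absolute path.
function buildAttachment(
  channel: Channel,
  flagValue: string,
  bytes: number,
  sha256: string,
): EvidenceAttachment {
  const separator = flagValue.indexOf(":");
  if (separator <= 0) {
    throw new Error(`expected <kind>:<path>, got "${flagValue}"`);
  }
  const kind = flagValue.slice(0, separator);
  const sourcePath = flagValue.slice(separator + 1);
  const sourceFilename = sourcePath.split(/[\\/]/).pop() ?? sourcePath;
  const folder = channel === "signal" ? `signals/${kind}` : "captures";
  return {
    channel,
    kind,
    path: `${folder}/${sourceFilename}`,
    sourceFilename,
    bytes,
    sha256,
  };
}
```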

Evidence folders:

- `raw/`
- `captures/`
- optional `signals/js`
- optional `signals/memory`
- optional `signals/network`

The artifact contract separates scenario health from product verdict: `health.json` records execution validity, `verdict.json` records budget outcome, `comparison.json` records before/after baseline comparison, and `agent-summary.md` gives agents the health gate before they touch code.

Failed or warning health checks may include scalar `metadata.nextActionCode` and `metadata.nextAction` fields. These are stable, agent-readable recovery hints for runner setup failures such as missing adb, an unbooted simulator, an uninstalled app package, or an unresolved selector. Host-bound availability checks may also include `metadata.failureClass` values such as `host_access`, `timeout`, `missing_binary`, or `command_surface` so agents can distinguish sandbox or daemon access from a broken runner command. The summary builder renders those hints in `agent-summary.md`, but they do not make timing evidence trustworthy unless scenario health passes.

The current profile runner writes health, verdict, agent summary, metrics, causal-run, and budget-verdict artifacts.

Budgets are supported but optional for adoption.

`buildRunIndex()` can scan an artifact root after runs complete. It indexes folders that contain both `health.json` and `verdict.json`, marks a run trusted only when health and verdict both passed, and lets agents find the latest trusted prior run for a scenario without relying on terminal history.
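
The trust rule can be illustrated with a small in-memory sketch. It does not reproduce the real `buildRunIndex()` signature or output: `indexRuns`, `latestTrusted`, and the record fields are hypothetical, and an absent status stands in for a missing `health.json` or `verdict.json` file.

```typescript
// Hypothetical view of one run folder; an undefined status stands in for
// a folder that lacks the corresponding JSON artifact.
interface RunFolder {
  runDir: string;
  scenarioId: string;
  completedAt: number; // epoch milliseconds
  healthStatus?: string;
  verdictStatus?: string;
}

interface IndexedRun extends RunFolder {
  trusted: boolean;
}

// Index only folders that have both artifacts; trust requires both to pass.
function indexRuns(folders: RunFolder[]): IndexedRun[] {
  return folders
    .filter((f) => f.healthStatus !== undefined && f.verdictStatus !== undefined)
    .map((f) => ({
      ...f,
      trusted: f.healthStatus === "passed" && f.verdictStatus === "passed",
    }));
}

// Newest trusted run for a scenario, or undefined when none exists.
function latestTrusted(index: IndexedRun[], scenarioId: string): IndexedRun | undefined {
  return index
    .filter((run) => run.trusted && run.scenarioId === scenarioId)
    .reduce<IndexedRun | undefined>(
      (best, run) => (best === undefined || run.completedAt > best.completedAt ? run : best),
      undefined,
    );
}
```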

## Supported runner surface

The package currently supports:

- scenario/runner compatibility planning through `check-plan`
- fixture profile loops through committed profile-event logs
- Android adb readiness checks
- Android bounded logcat capture
- Android package launch plus bounded logcat capture
- Android adb driver adapter with scenario-routed `tap`, `scroll`, `assertVisible`, `inspectTree`, `screenshot`, and `readLogs`
- Android adb screenrecord capture through scenario-routed `record` driver actions
- Android profile artifact generation from explicit event logs, prior adb artifacts, or an owned `--adb-capture` window
- iOS bounded simulator log capture and stored app truth-event collection through simctl
- iOS simulator app launch plus storage-backed profile-session and command seeding
- iOS profile artifact generation from explicit event logs, prior simctl artifacts, or an owned `--simctl-capture` window
- generic Android and iOS live proof runners for one portable scenario, including preflight, profile capture, optional agent-device and Argent sidecars, optional latest-trusted comparison, and aggregate `live-proof.json`
- agent-device and Argent capture runners that write ASL health, verdict, raw transcripts, and capture artifacts without making those tools package dependencies
- evidence-provider command execution through `--provider <manifest>`, with declared outputs inventoried as stable evidence attachments and nonzero exits written as failed health gates
- trusted baseline/current comparison after scenario health passes, with millisecond timing noise treated as unchanged inside a small mobile-safe tolerance and opposite metric directions surfaced as `mixed`
- latest trusted prior-run comparison from an artifact root

Not yet shipped as supported public features:

- generic consuming-app installation or build orchestration
- broad semantic UI workflow driving beyond the shipped portable driver-action subset
- memory, network, or accessibility evidence capture from built-in drivers
- Computer Use flows
- product-specific scenarios
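
The comparison behavior listed among the supported features (timing noise treated as unchanged, opposite directions reported as `mixed`) can be sketched like this. The tolerance is a caller-supplied parameter because the package's actual value is not stated here, and the sketch assumes lower-is-better timing metrics. `classifyDelta` and `summarizeDirections` are hypothetical names.

```typescript
type Direction = "improved" | "regressed" | "unchanged";

// Classifies one lower-is-better timing metric. Deltas inside the
// tolerance count as noise and are reported as unchanged.
function classifyDelta(baselineMs: number, currentMs: number, toleranceMs: number): Direction {
  const delta = currentMs - baselineMs;
  if (Math.abs(delta) <= toleranceMs) return "unchanged";
  return delta < 0 ? "improved" : "regressed";
}

// Rolls per-metric directions into one outcome; metrics that moved in
// opposite directions surface as "mixed" rather than cancelling out.
function summarizeDirections(directions: Direction[]): Direction | "mixed" {
  const improved = directions.includes("improved");
  const regressed = directions.includes("regressed");
  if (improved && regressed) return "mixed";
  if (regressed) return "regressed";
  if (improved) return "improved";
  return "unchanged";
}
```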

## Preflight planning

Use `check-plan` to validate a scenario, runner manifest, and optional evidence-provider manifests before execution:

```bash
pnpm check-plan -- --scenario examples/scenarios/mobile/app-startup.json --runner examples/runners/xcodebuildmcp-ios.json --platform ios --out artifacts/plan/app-startup
```

This validates the input manifests, writes schema-checked `health.json` and `verdict.json`, writes `agent-summary.md`, and includes the raw planner match in `planner-compatibility.json`.

## Android adb readiness

Use `android:preflight` to verify adb and connected-device readiness before adding live Android scenario execution:

```bash
pnpm android:preflight -- --package com.example.app --out artifacts/android-adb-preflight
```

The command writes:

- `health.json`
- `verdict.json`
- `agent-summary.md`
- `raw/adb-version.txt`
- `raw/adb-devices.txt`
- `raw/android-metadata.json`

If the adb check, the connected-online-device check, or an optional package check fails, health fails and the verdict remains `inconclusive`.

Add `--capture-logcat --logcat-lines <count>` to write `raw/adb-logcat.txt` in the same artifact folder. Add `--react-native-debug-host <host:port>` with `--package <name>` for React Native development builds that need adb reverse plus the app `debug_http_host` preference before launch; the runner writes `raw/adb-react-native-reverse.txt` and `raw/adb-react-native-debug-host.txt`. Add `--clear-logcat --launch --wait-ms <ms>` with `--package <name>` to clear logs, launch the package, wait for a bounded capture window, and then collect logcat evidence. If requested capture-window setup or logcat capture fails, scenario health fails because timing and event evidence would be incomplete.

Use that captured logcat evidence directly with Android profiling:

```bash
pnpm profile:android -- --config core/config-template.json --scenario examples/mobile-app/scenarios/android/app-startup.json --adb-artifacts artifacts/android-adb-preflight --run-id android-run-1
```

Or let Android profiling own the adb capture window before it writes profile artifacts:

```bash
pnpm profile:android -- --config core/config-template.json --scenario examples/mobile-app/scenarios/android/app-startup.json --adb-capture --react-native-debug-host localhost:8097 --clear-logcat --launch --run-id android-run-1
```

## iOS simulator capture

Use `profile:ios --simctl-capture` when the example app or a consuming app is already installed on a booted simulator:

```bash
pnpm profile:ios -- --config core/config-template.json --scenario examples/mobile-app/scenarios/ios/app-startup.json --simctl-capture --profile-session --profile-session-storage --launch --run-id ios-run-1
```

The command writes a separate simctl capture folder under the selected output root, seeds the app-owned profile session into native AsyncStorage before launch, then collects stored app profile events after the capture window. Command scenarios seed the scenario command queue through the same storage contract before launch. When `raw/ios-profile-events.log` exists, the iOS profile runner ingests that stored truth-event log; otherwise it falls back to `raw/ios-simctl-log.txt`.

For profile-session capture on Android or iOS, omitting `--wait-ms` lets ASL derive the final evidence window from scenario execution waits and cycle count. Explicit `--wait-ms` remains authoritative when a consuming app has a known startup or logging delay that the scenario cannot express.
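
The precedence between an explicit wait and a derived window can be sketched as below. The derivation formula (sum of scenario waits multiplied by cycle count) is an assumption made for illustration; the text above only says that waits and cycle count are the inputs, and the package may add margins or other terms. `resolveEvidenceWindowMs` and `ScenarioTiming` are hypothetical names.

```typescript
// Hypothetical summary of the timing-relevant parts of a scenario.
interface ScenarioTiming {
  waitsMs: number[]; // waits declared by scenario steps, in milliseconds
  cycles: number; // how many times the scenario body repeats
}

// An explicit wait always wins. Otherwise derive a window from the
// scenario itself (assumed formula: per-cycle waits times cycle count).
function resolveEvidenceWindowMs(timing: ScenarioTiming, explicitWaitMs?: number): number {
  if (explicitWaitMs !== undefined) return explicitWaitMs;
  const perCycleMs = timing.waitsMs.reduce((sum, wait) => sum + wait, 0);
  return perCycleMs * Math.max(1, timing.cycles);
}
```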

Scenario command targets live in `adapterOptions.iosSimctl.commands`, while the app handles them through `registerProfileCommandTargetHandler`. The iOS proof does not depend on unified logs carrying JavaScript console output; it depends on app-owned stored profile events.

## Historical comparison

Use `compare` to build `comparison.json` from two completed run folders:

```bash
pnpm compare -- --baseline artifacts/runs/app-startup/baseline --current artifacts/runs/app-startup/current --out artifacts/runs/app-startup/current --fail-on-regression
```

The comparison gate is intentionally strict. If either run failed scenario health, or if the scenario ids do not match, the comparison is `inconclusive`. Numeric budget checks are compared only after that health gate passes. `comparison.json` includes `comparisonBasis` with the baseline/current run ids and run directories, giving agents artifact-local provenance instead of forcing them to infer it from folder names.
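
A minimal sketch of that gate follows. It only models the two conditions named above (failed health on either side, mismatched scenario ids); the `RunSummary` shape, the `gateComparison` name, and the reason strings are illustrative assumptions rather than the `comparison.json` schema.

```typescript
// Hypothetical per-run summary read from a completed run folder.
interface RunSummary {
  runId: string;
  scenarioId: string;
  healthStatus: "passed" | "failed";
}

type GateResult =
  | { status: "inconclusive"; reason: string }
  | { status: "comparable" };

// Numeric comparison is allowed only after both runs pass scenario health
// and both runs describe the same scenario.
function gateComparison(baseline: RunSummary, current: RunSummary): GateResult {
  if (baseline.healthStatus !== "passed") {
    return { status: "inconclusive", reason: "baseline failed scenario health" };
  }
  if (current.healthStatus !== "passed") {
    return { status: "inconclusive", reason: "current failed scenario health" };
  }
  if (baseline.scenarioId !== current.scenarioId) {
    return { status: "inconclusive", reason: "scenario ids do not match" };
  }
  return { status: "comparable" };
}
```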

Use `compare:latest` when an artifact root contains run history and the agent should compare the current run against the newest trusted prior run for the same scenario:

```bash
pnpm compare:latest -- --root artifacts/runs --scenario app-startup --current artifacts/runs/app-startup/current --out artifacts/runs/app-startup/current --fail-on-regression
```

The latest-trusted command excludes the exact current run directory from baseline selection. Baseline trust requires passed health and passed verdict. Current runs must pass scenario health before the command will compare timing or budget evidence. If the current manifest declares `comparisonLane`, baseline selection is scoped to trusted prior runs with the same lane; if the current manifest has no lane, selection stays within unlabeled trusted prior runs. Profile manifests also include `scenarioHash`, a stable fingerprint of the normalized scenario contract. When the current run has that hash, latest-trusted selection only compares against trusted prior runs with the same hash; legacy runs without the hash remain comparable only to legacy current runs. This keeps proof modes such as plain live proof and live proof plus agent-device sidecar from comparing against each other, and it keeps migrated scenario definitions from poisoning before/after verdicts. Latest-trusted artifacts set `comparisonBasis.strategy` to `latest_trusted_prior` and record selection counts for inspected, trusted, trusted-prior, lane-comparable, and scenario-contract-comparable candidates.
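
The selection funnel can be sketched as successive filters, each producing one of the recorded counts. The record shape, `selectLatestTrustedPrior`, and the count field names are assumptions for illustration; `comparisonLane` and `scenarioHash` are the manifest fields named above. Comparing optional fields with strict equality makes an unlabeled or legacy current run match only unlabeled or legacy candidates.

```typescript
// Hypothetical candidate record assembled from a run folder.
interface RunRecord {
  runDir: string;
  trusted: boolean; // health and verdict both passed
  completedAt: number; // epoch milliseconds
  comparisonLane?: string;
  scenarioHash?: string;
}

interface Selection {
  baseline: RunRecord | undefined;
  counts: {
    inspected: number;
    trusted: number;
    trustedPrior: number;
    laneComparable: number;
    scenarioContractComparable: number;
  };
}

function selectLatestTrustedPrior(candidates: RunRecord[], current: RunRecord): Selection {
  const trusted = candidates.filter((c) => c.trusted);
  // Exclude the exact current run directory from baseline selection.
  const trustedPrior = trusted.filter((c) => c.runDir !== current.runDir);
  // undefined === undefined keeps unlabeled runs in their own lane.
  const lane = trustedPrior.filter((c) => c.comparisonLane === current.comparisonLane);
  // Same rule for the hash: legacy runs match only legacy current runs.
  const contract = lane.filter((c) => c.scenarioHash === current.scenarioHash);
  const newestFirst = [...contract].sort((a, b) => b.completedAt - a.completedAt);
  return {
    baseline: newestFirst[0],
    counts: {
      inspected: candidates.length,
      trusted: trusted.length,
      trustedPrior: trustedPrior.length,
      laneComparable: lane.length,
      scenarioContractComparable: contract.length,
    },
  };
}
```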

## Fixture loop

Use `demo:loop` to run the current contract without a simulator:

```bash
pnpm demo:loop -- --out artifacts/demo-loop
```

The fixture loop writes:

- `preflight/app-startup/health.json`
- `preflight/app-startup/verdict.json`
- `preflight/app-startup/agent-summary.md`
- `profile-runs/app-startup/demo-baseline/*`
- `profile-runs/app-startup/demo-current/*`
- `profile-runs/app-startup/demo-current/comparison.json`

This is not a replacement for live device proof. It is a stable contract check that keeps the evidence loop reproducible through trusted prior-run selection while iOS or Android runtime setup is unavailable.

## Read next

- [README](../README.md) for the shortest path through the project
- [Concepts](concepts.md) for the broader product framing
- [Adapter Onboarding](adapters.md) for adding runners and evidence providers
- [Consumer App Rehearsal](consumer-rehearsal.md) for adopting the package in an existing app
- [Runner docs](../runner/README.md) for current runner behavior and limits