agent-scenario-loop 0.1.2 → 0.1.4

Files changed (87)
  1. package/README.md +9 -9
  2. package/app/profile-session.ts +352 -12
  3. package/dist/core/agent-summary.d.ts +3 -2
  4. package/dist/core/agent-summary.js +44 -2
  5. package/dist/core/artifact-contract.d.ts +28 -8
  6. package/dist/core/artifact-contract.js +676 -26
  7. package/dist/core/comparison.d.ts +57 -3
  8. package/dist/core/comparison.js +113 -1
  9. package/dist/core/planner.d.ts +32 -1
  10. package/dist/core/planner.js +144 -0
  11. package/dist/core/run-index.d.ts +4 -0
  12. package/dist/core/run-index.js +55 -1
  13. package/dist/core/schema-validator.d.ts +2 -0
  14. package/dist/core/schema-validator.js +2 -0
  15. package/dist/runner/android-adb-driver.d.ts +7 -2
  16. package/dist/runner/android-adb-driver.js +7 -1
  17. package/dist/runner/android-adb.d.ts +40 -5
  18. package/dist/runner/android-adb.js +1046 -664
  19. package/dist/runner/compare-latest.d.ts +8 -4
  20. package/dist/runner/compare-latest.js +24 -5
  21. package/dist/runner/example-android-live.d.ts +10 -1
  22. package/dist/runner/example-android-live.js +55 -0
  23. package/dist/runner/example-ios-live.d.ts +10 -1
  24. package/dist/runner/example-ios-live.js +55 -0
  25. package/dist/runner/ios-simctl.d.ts +6 -0
  26. package/dist/runner/ios-simctl.js +7 -0
  27. package/dist/runner/live-comparison.d.ts +2 -2
  28. package/dist/runner/live-comparison.js +2 -1
  29. package/dist/runner/live-proof-summary.d.ts +5 -4
  30. package/dist/runner/live-proof-summary.js +12 -2
  31. package/dist/runner/live-proof.d.ts +3 -2
  32. package/dist/runner/live-proof.js +9 -2
  33. package/dist/runner/profile-android.d.ts +16 -1
  34. package/dist/runner/profile-android.js +364 -26
  35. package/dist/runner/profile-ios.d.ts +13 -2
  36. package/dist/runner/profile-ios.js +341 -19
  37. package/dist/runner/profile-mobile.d.ts +39 -3
  38. package/dist/runner/profile-mobile.js +1054 -42
  39. package/dist/runner/validate-project.js +3 -0
  40. package/dist/scripts/consumer-rehearsal.d.ts +119 -0
  41. package/dist/scripts/consumer-rehearsal.js +757 -0
  42. package/dist/scripts/downstream-local-package-gate.d.ts +2 -0
  43. package/dist/scripts/downstream-local-package-gate.js +264 -0
  44. package/dist/scripts/package-smoke.d.ts +96 -0
  45. package/dist/scripts/package-smoke.js +2282 -0
  46. package/dist/scripts/release-readiness.d.ts +2 -0
  47. package/dist/scripts/release-readiness.js +520 -0
  48. package/docs/adapters.md +7 -1
  49. package/docs/api.md +2 -2
  50. package/docs/architecture.md +90 -0
  51. package/docs/authoring.md +39 -3
  52. package/docs/concepts.md +3 -24
  53. package/docs/consumer-rehearsal.md +31 -1
  54. package/docs/contracts.md +45 -101
  55. package/docs/external-adapter-protocol.md +219 -0
  56. package/docs/live-proofs.md +86 -3
  57. package/docs/principles.md +9 -15
  58. package/examples/mobile-app/README.md +12 -0
  59. package/examples/mobile-app/runner-manifests/evidence-provider.json +3 -3
  60. package/examples/mobile-app/runner-manifests/primary-runner.json +1 -0
  61. package/examples/mobile-app/scripts/asl-capture-profiler-provider.mjs +25 -0
  62. package/examples/runners/README.md +4 -3
  63. package/examples/runners/adb-android.json +1 -0
  64. package/examples/runners/agent-device-android.json +1 -0
  65. package/examples/runners/agent-device-ios.json +1 -0
  66. package/examples/runners/argent-android.json +1 -0
  67. package/examples/runners/argent-ios.json +1 -0
  68. package/examples/runners/axe-accessibility-provider.json +2 -2
  69. package/examples/runners/script-accessibility-provider.json +2 -2
  70. package/examples/runners/script-memory-provider.json +2 -2
  71. package/examples/runners/script-network-provider.json +2 -2
  72. package/examples/runners/script-profiler-provider.json +2 -2
  73. package/examples/runners/xcodebuildmcp-ios.json +1 -0
  74. package/package.json +12 -3
  75. package/schemas/causal-run.schema.json +85 -2
  76. package/schemas/comparison.schema.json +130 -2
  77. package/schemas/external-adapter-message.schema.json +693 -0
  78. package/schemas/health.schema.json +72 -0
  79. package/schemas/live-proof-set.schema.json +1 -1
  80. package/schemas/live-proof.schema.json +14 -6
  81. package/schemas/manifest.schema.json +515 -4
  82. package/schemas/profiler.schema.json +243 -0
  83. package/schemas/runner-capabilities.schema.json +28 -2
  84. package/schemas/scenario.schema.json +34 -2
  85. package/templates/evidence-provider.json +3 -3
  86. package/templates/primary-runner.json +1 -0
  87. package/templates/scripts/asl-capture-profiler-provider.mjs +20 -0
package/docs/authoring.md CHANGED
@@ -80,12 +80,14 @@ Preferred fields:
80
80
  - `journey`: human-readable intent, actor, start state, and end state
81
81
  - `comparisonLane`: default historical baseline lane for runs of this scenario
82
82
  - `milestones`: named event checkpoints with phases and timeouts
83
- - `cycles`: iteration count and stop policy
83
+ - `cycles`: iteration count, stop policy, and optional setup/body step ids
84
84
  - `budgets`: thresholds to evaluate only after truth-event health passes
85
85
  - `artifacts`: required and optional evidence outputs
86
86
 
87
87
  Use `comparisonLane` when a scenario should always compare within one stable proof mode, such as `feed-open-android-live`. Profile CLIs can also receive `--comparison-lane`; the CLI flag wins when one-off runs need a different lane.
88
88
 
89
+ For repeated scenarios, separate setup from the measured body. Commands that clear state, navigate home, dismiss modals, or establish readiness should not be measured every iteration unless that cleanup is the journey under test. Use `cycles.setupStepIds` for leading setup commands that run once, or `cycles.bodyStepIds` to name the repeated command body. If neither is provided, ASL profile-session runners infer a conservative setup prefix from readiness waits and measured milestone budgets, but explicit ids are clearer for complex flows.
90
+
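The setup/body split described above can be illustrated with a minimal TypeScript sketch. The `Step` and `Cycles` shapes and the `partitionSteps` and `executionOrder` helpers are hypothetical names used only for illustration; they are not ASL's API, and the sketch does not attempt the inference that the profile-session runners perform when no ids are given.

```typescript
// Hypothetical shapes for illustration only; not ASL's actual types.
interface Step {
  id: string;
}

interface Cycles {
  iterations: number;
  setupStepIds?: string[];
  bodyStepIds?: string[];
}

// Split steps into a run-once setup list and a repeated body list.
function partitionSteps(steps: Step[], cycles: Cycles): { setup: Step[]; body: Step[] } {
  if (cycles.setupStepIds) {
    const setupIds = new Set(cycles.setupStepIds);
    return {
      setup: steps.filter((s) => setupIds.has(s.id)),
      body: steps.filter((s) => !setupIds.has(s.id)),
    };
  }
  if (cycles.bodyStepIds) {
    const bodyIds = new Set(cycles.bodyStepIds);
    return {
      setup: steps.filter((s) => !bodyIds.has(s.id)),
      body: steps.filter((s) => bodyIds.has(s.id)),
    };
  }
  // Assumption: with no explicit ids this sketch treats every step as body.
  return { setup: [], body: steps };
}

// Flatten into execution order: setup once, then the body per iteration.
function executionOrder(steps: Step[], cycles: Cycles): string[] {
  const { setup, body } = partitionSteps(steps, cycles);
  const order = setup.map((s) => s.id);
  for (let i = 0; i < cycles.iterations; i++) {
    for (const s of body) order.push(s.id);
  }
  return order;
}
```

With steps `clear`, `open`, `scroll`, two iterations, and `setupStepIds: ["clear"]`, the cleanup step runs once and only `open` and `scroll` repeat, so cleanup time stays out of the measured iterations.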
89
91
  ## Truth Events
90
92
 
91
93
  Treat truth events as app-owned facts, not runner observations. The app should emit them from the code path that actually represents the journey state.
@@ -105,6 +107,32 @@ Weak truth events:
105
107
 
106
108
  Timing is not trusted unless scenario health passes. If a required truth event is missing, the run can still write artifacts, but verdicts and comparisons must remain inconclusive.
107
109
 
110
+ ### Resume Scenarios
111
+
112
+ `--lifecycle-phase resume` and related runner controls assert runner-owned lifecycle setup in `manifest.environment`; they do not create product truth events. If a scenario waits for `app_resumed`, `feed_restored_after_resume`, or another resumed-state milestone, the app must emit that event from the code path that proves resumed product readiness.
113
+
114
+ ## Budget Intervals
115
+
116
+ Milestone budgets measure the interval the scenario names. A budget with only `toMilestone` measures elapsed time from the run or session clock origin to each matching milestone occurrence. That is correct for startup and first-usable-screen budgets, but it is cumulative for repeated interactions.
117
+
118
+ For transition or gesture budgets, provide both ends of the interval:
119
+
120
+ ```json
121
+ {
122
+ "name": "surface transition p95",
123
+ "source": "milestone",
124
+ "metric": "p95",
125
+ "unit": "ms",
126
+ "limit": 300,
127
+ "fromMilestone": "surfaceTransitionRequested",
128
+ "toMilestone": "surfaceSettled"
129
+ }
130
+ ```
131
+
132
+ Use app-owned truth events for both milestones. Do not use a command-delivered event as the start point unless that command delivery is the product fact being measured.
133
+
134
+ When the start event is useful only as a timing anchor, keep it optional and keep scenario health tied to the completion truth. For repeated flows, set `metricEvents.milestone` or the completion-oriented cycle events to the truth that proves the iteration completed, then use the optional intent milestone as `fromMilestone` in the budget.
135
+
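The difference between a cumulative `toMilestone`-only budget and a two-ended interval can be sketched in TypeScript. This is a minimal illustration under stated assumptions: the `TimelineEvent` shape, the `intervalDurations` helper, and the nearest-rank percentile are hypothetical choices, not ASL's implementation, and the pairing rule (each start pairs with the next completion) is assumed.

```typescript
// Hypothetical event shape; timestamps share one clock, in milliseconds.
interface TimelineEvent {
  name: string;
  atMs: number;
}

// Durations for each toMilestone occurrence. Without fromMilestone the
// duration is measured from the clock origin, so it grows with every repeat.
function intervalDurations(
  events: TimelineEvent[],
  toMilestone: string,
  fromMilestone?: string,
  originMs = 0
): number[] {
  const durations: number[] = [];
  let startMs: number | undefined = fromMilestone === undefined ? originMs : undefined;
  const ordered = [...events].sort((a, b) => a.atMs - b.atMs);
  for (const event of ordered) {
    if (fromMilestone !== undefined && event.name === fromMilestone) {
      startMs = event.atMs;
      continue;
    }
    if (event.name === toMilestone && startMs !== undefined) {
      durations.push(event.atMs - startMs);
      // Assumption: each start event pairs with exactly one completion.
      if (fromMilestone !== undefined) startMs = undefined;
    }
  }
  return durations;
}

// Nearest-rank percentile; the percentile method is an assumption.
function percentileNearestRank(values: number[], p: number): number | undefined {
  if (values.length === 0) return undefined;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}
```

For two transitions requested at 1000 ms and 5000 ms and settled at 1200 ms and 5250 ms, the `toMilestone`-only form yields 1200 and 5250, while the two-ended form yields 200 and 250, which is the quantity a 300 ms transition limit is meant to judge.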
108
136
  ## Steps
109
137
 
110
138
  Use steps to describe intent and required adapter actions:
@@ -118,6 +146,8 @@ Use steps to describe intent and required adapter actions:
118
146
 
119
147
  Use `driverAction` only when the scenario truly requires a concrete operation such as `tap`, `scroll`, `assertVisible`, `screenshot`, `readLogs`, or `collectPerfSignals`. The planner fails early when no active runner or provider can satisfy a required driver action.
120
148
 
149
+ For profile-session command transport, platform `waitMs` metadata is queue pacing. ASL preserves it in storage and deep-link command envelopes and waits before releasing the next queued command. App-owned milestones still provide the truth that a command produced the intended product state.
150
+
121
151
  Use `selector` to describe the intended app target without committing the scenario to one driver. Supported selector kinds are `testId`, `accessibilityId`, `accessibilityLabel`, `text`, `resourceId`, and `xpath`.
122
152
 
123
153
  ```json
@@ -159,12 +189,14 @@ asl-profile-android \
159
189
  --capture uiTree:artifacts/provider/ui-tree.json
160
190
  ```
161
191
 
162
- Signals are copied into `signals/js`, `signals/memory`, or `signals/network` and listed in `manifest.json`. Captures are copied into `captures`; screenshots are listed in `artifacts.captures.screenshots`, while video and UI tree captures replace the matching named capture path in the manifest. Every attached file is also listed in `artifacts.evidenceAttachments` with kind, run-relative path, source filename, byte size, and sha256 hash. Attached provider evidence is preserved as proof, but timing verdicts still come from app-owned truth events and budgets.
192
+ Signals are copied into `signals/js`, `signals/memory`, or `signals/network` and listed in `manifest.json`. Captures are copied into `captures`; screenshots are listed in `artifacts.captures.screenshots`, while video and UI tree captures replace the matching named capture path in the manifest. Every attached file is also listed in `artifacts.evidenceAttachments` with kind, run-relative path, source filename, byte size, sha256 hash, completeness status, corruption status, redaction status, and transformation list. Attached provider evidence is preserved as proof, but timing verdicts still come from app-owned truth events and budgets.
163
193
 
164
- Provider manifests can also declare `providerCommands`. Profile runners execute those commands when passed with `--provider <manifest>`, but only when the provider manifest includes the selected platform. A provider with `platforms: ["ios"]` passed to an Android profile writes failed `health.json` with `provider_platform_unsupported` and does not run the command. Commands run without a shell, can use placeholders such as `{providerDir}`, `{runDir}`, `{runId}`, `{scenarioId}`, and `{platform}`, and must declare their output files. Provider-channel outputs are copied or preserved under `raw/providers/<provider-id>/` and inventoried in `artifacts.evidenceAttachments`; signal and capture outputs can still map into the standard `signals/*` or `captures/` folders. Command stdout, stderr, exit code, phase, and argv are preserved under `raw/provider-commands/`. When a provider command exits nonzero, the runner writes failed `health.json`, inconclusive `verdict.json`, and `agent-summary.md` with a next-action hint instead of making timing claims.
194
+ Provider manifests can also declare `providerCommands`. Profile runners execute those commands when passed with `--provider <manifest>`, but only when the provider manifest includes the selected platform. Commands run after the platform evidence source has been collected or supplied, so heavy diagnostics can be attached in a post-loop rehydration run with `--adb-artifacts` or `--simctl-artifacts` instead of perturbing the measured command window. Use `phase: "afterCapture"` for capture-sidecar diagnostics and `phase: "postRun"` for post-profile enrichment; older `capture` phase values remain accepted for existing manifests. A provider with `platforms: ["ios"]` passed to an Android profile writes failed `health.json` with `provider_platform_unsupported` and does not run the command. Commands run without a shell, can use placeholders such as `{providerDir}`, `{runDir}`, `{runId}`, `{scenarioId}`, and `{platform}`, and must declare their output files. Provider-channel outputs are copied or preserved under `raw/providers/<provider-id>/` and inventoried in `artifacts.evidenceAttachments`; signal and capture outputs can still map into the standard `signals/*` or `captures/` folders. An output can set `required: true` when the provider treats that file as required evidence; matching entries in `manifest.artifacts.diagnostics` then remain marked required in addition to scenario-authored `artifacts.required` and `requiredCapabilities`. Command stdout, stderr, exit code, phase, and argv are preserved under `raw/provider-commands/`. When a provider command exits nonzero, the runner writes failed `health.json`, inconclusive `verdict.json`, and `agent-summary.md` with a next-action hint instead of making timing claims.
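The placeholder expansion and platform gate described above can be sketched in TypeScript. This is a hedged illustration, not ASL's implementation: `ProviderContext`, `expandArgv`, and `platformGate` are hypothetical names, and leaving unknown placeholders untouched is an assumption. Only the placeholder names and the `provider_platform_unsupported` code come from the text.

```typescript
type Platform = "android" | "ios";

// Hypothetical context carrying the documented placeholder values.
interface ProviderContext {
  providerDir: string;
  runDir: string;
  runId: string;
  scenarioId: string;
  platform: Platform;
}

// Expand {placeholder} tokens per argv element. Working on an argv array
// rather than a command string matches running the command without a shell.
function expandArgv(argv: string[], ctx: ProviderContext): string[] {
  const values = new Map<string, string>([
    ["providerDir", ctx.providerDir],
    ["runDir", ctx.runDir],
    ["runId", ctx.runId],
    ["scenarioId", ctx.scenarioId],
    ["platform", ctx.platform],
  ]);
  return argv.map((arg) =>
    // Assumption: unknown placeholders are left as written.
    arg.replace(/\{(\w+)\}/g, (match: string, key: string) => values.get(key) ?? match)
  );
}

// Decide whether a provider command may run for the selected platform.
function platformGate(
  providerPlatforms: Platform[],
  selected: Platform
): { run: boolean; healthCode?: string } {
  return providerPlatforms.includes(selected)
    ? { run: true }
    : { run: false, healthCode: "provider_platform_unsupported" };
}
```

A provider declaring only `ios` would be gated out of an Android profile before any argv expansion happens, which mirrors the documented behavior of failing health without running the command.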
165
195
 
166
196
  The `examples/runners/script-*.json` manifests show package-neutral wrappers for accessibility, profiler, memory, and network evidence. They intentionally reference placeholder commands such as `capture-accessibility` or `capture-memory`; replace those with your project-local script, binary, or agent command. The contract that matters is the declared output path and evidence kind, not the specific tool used to create the file.
167
197
 
198
+ For React Native profiling, prefer a provider that emits both the raw profiler export and a structured JSON summary. JSON outputs with `kind: "profiler"` are validated against ASL's profiler evidence schema, so include the provider id, platform, run id, scenario id, tool metadata, completeness status, and at least one content surface such as samples, metrics, events, traces, a profile object, summary, or attachment references. If profiler evidence depends on explicit start/stop commands, model it as lifecycle-owned evidence: declare `captureMode`, `profileKind`, `lifecycle`, `targetBinding`, and `comparability` so agents can distinguish passive existing reports from session captures, inline captures that may perturb budgets, and after-capture or rehydrated diagnostics. CPU summaries derived from a prior profiler session should not be attached as passive evidence unless the provider also preserves the session provenance and raw attachments. If your profiler only produces a native trace or flamegraph, attach it as preserved evidence and avoid making performance claims until a provider translates the relevant facts into structured metrics.
199
+
168
200
  ## Artifacts
169
201
 
170
202
  A completed profile run should leave the standard artifact set:
@@ -196,3 +228,7 @@ Run the release gate before publishing package changes:
196
228
  ```bash
197
229
  pnpm release:check
198
230
  ```
231
+
232
+ ## Read next
233
+
234
+ - [Adapter Onboarding](adapters.md) for runner and provider integration
package/docs/concepts.md CHANGED
@@ -104,33 +104,12 @@ The tooling may change. The runners may change. The agents may change. The scena
104
104
 
105
105
  That is a different philosophy from frameworks that primarily evaluate agents. Agent Scenario Loop is built to evaluate the evolution of software.
106
106
 
107
- ## How it differs from testing frameworks
107
+ ## Boundary
108
108
 
109
- Agent Scenario Loop does not make existing testing frameworks obsolete.
109
+ Agent Scenario Loop is not a replacement for testing frameworks, automation tools, mobile drivers, profilers, or agent evaluation systems. Those tools can still execute or observe work.
110
110
 
111
- Traditional frameworks usually optimize for:
112
-
113
- > Did the application behave correctly?
114
-
115
- Agent Scenario Loop optimizes for:
116
-
117
- > What did we learn from running this scenario?
118
-
119
- Both questions matter. Agent Scenario Loop focuses on the second question by preserving health, verdicts, metrics, logs, traces, comparisons, and other run evidence in a stable artifact shape.
120
-
121
- ## How it differs from agent evaluation
122
-
123
- Agent Scenario Loop is not primarily evaluating agents.
124
-
125
- An agent may execute part of a run. A runner may drive a device. A profiler may collect signals. None of those is the center of the model.
126
-
127
- The scenario is.
128
-
129
- The feed, livestream, upload flow, checkout flow, or conversation thread is the thing being studied over time.
111
+ The canonical boundary list lives in [What It Is Not](../README.md#what-it-is-not).
130
112
 
131
113
  ## Read next
132
114
 
133
115
  - [Principles](principles.md) for the project doctrine
134
- - [Contracts](contracts.md) for the current artifact and package surface
135
- - [Live Proofs](live-proofs.md) for fixture, Android, iOS, and comparison runs
136
- - [Runner docs](../runner/README.md) for the host execution boundary
package/docs/consumer-rehearsal.md CHANGED
@@ -16,6 +16,32 @@ Package gates run child package-manager and CLI commands with a bounded timeout.
16
16
  ASL_PACKAGE_GATE_TIMEOUT_MS=300000 pnpm consumer:rehearse
17
17
  ```
18
18
 
19
+ ## Downstream Local-Package Gate
20
+
21
+ Before publishing a release candidate, validate the packed local package inside at least one real downstream app when that app has already adopted durable ASL scenarios. This catches package, runner, schema, and helper regressions before npm distribution.
22
+
23
+ From this repository, run the opt-in downstream gate with an explicit app root and explicit command arrays:
24
+
25
+ ```bash
26
+ pnpm downstream:local-package -- \
27
+ --app-root /path/to/adopter-app \
28
+ --expected-branch chore/agent-scenario-loop-adoption \
29
+ --command-json '["pnpm","run","asl:validate"]'
30
+ ```
31
+
32
+ The gate packs the current checkout, installs the tarball into the downstream app with `pnpm add`, verifies `node_modules/agent-scenario-loop/package.json` matches the local candidate version, runs the supplied commands, and restores `package.json` plus `pnpm-lock.yaml` unless `--keep-install` is passed. Generated downstream proof artifacts remain the consumer app's local ignored state.
33
+
34
+ For live probes, pass direct package CLI commands as additional JSON arrays so the target scenario and artifact root are explicit:
35
+
36
+ ```bash
37
+ pnpm downstream:local-package -- \
38
+ --app-root /path/to/adopter-app \
39
+ --command-json '["pnpm","run","asl:validate"]' \
40
+ --command-json '["node_modules/.bin/asl-profile-android","--config","asl.config.json","--scenario","scenarios/mobile/first-journey.json","--adb-capture","--profile-session","--android-profile-session-storage","--launch","--out","artifacts/asl/android","--run-id","first-journey-android-local-candidate"]'
41
+ ```
42
+
43
+ Keep adopter-specific app ids, storage keys, dev-client URLs, simulator UDIDs, auth state, accounts, and scenarios in ignored local environment state or in the consuming app. ASL owns the package candidate and evidence contract; the downstream app owns product truth.
44
+
19
45
  ## 1. Initialize The Scaffold
20
46
 
21
47
  From the consuming app root:
@@ -81,7 +107,7 @@ asl-check-plan --scenario scenarios/mobile/first-journey.json --runner runner-ma
81
107
  asl-profile-ios --config asl.config.json --scenario scenarios/mobile/first-journey.json --simctl-capture --profile-session --profile-session-storage --launch --out artifacts/asl/ios --run-id first-journey-ios-live --comparison-lane first-journey-ios-live
82
108
  ```
83
109
 
84
- For Expo dev-client builds, set `ASL_ANDROID_DEV_CLIENT_URL` or `ASL_IOS_DEV_CLIENT_URL` to the app's dev-client URL in ignored local env state. Android opens it before profile-session deep links; iOS opens it before reading stored profile-session evidence. If Android bundle startup is slow, set `ASL_ANDROID_DEV_CLIENT_READY_PATTERN='Running "main"'` so profile-session links wait for app runtime readiness evidence.
110
+ For Expo dev-client builds, set `ASL_ANDROID_DEV_CLIENT_URL` or `ASL_IOS_DEV_CLIENT_URL` to the app's dev-client URL in ignored local env state. Prefer the LAN URL advertised by Metro for physical-device validation. Use `127.0.0.1` only when the selected simulator/emulator resolves that address back to the host Metro process. Android opens the dev-client URL before profile-session control. When Android storage transport is enabled, ASL waits for `Running "main"` by default before writing profile-session storage; override `ASL_ANDROID_DEV_CLIENT_READY_PATTERN` only when the app has a better readiness marker. If startup readiness fails, ASL reports an unhealthy run and skips command delivery instead of writing into a stale native shell. iOS opens the dev-client URL before reading stored profile-session evidence.
85
111
 
86
112
  When Android deep-link delivery is unreliable in a dev-client shell, use `--android-profile-session-storage` so `asl-profile-android` seeds the app-owned AsyncStorage session through `run-as` before collecting evidence. The runner reads the selected device clock for the session start timestamp, which keeps app-emitted milestone durations meaningful. Keep custom storage key overrides local to the consuming app.
87
113
 
@@ -113,3 +139,7 @@ Before expanding beyond the first journey, confirm:
113
139
  - at least one platform has a passed live proof
114
140
 
115
141
  Only then add more scenarios, providers, or runner adapters.
142
+
143
+ ## Read next
144
+
145
+ - [Live Proofs](live-proofs.md) for fixture, Android, iOS, comparison, and release-proof commands
package/docs/contracts.md CHANGED
@@ -4,6 +4,8 @@ This package ships the scenario, runner, and artifact contracts that make Agent
4
4
 
5
5
  The package is intentionally contract-first: adopt the scenario and artifact shape once, then add or swap runner loops without rewriting your scenarios.
6
6
 
7
+ See [Architecture](architecture.md) for the TypeScript-first implementation and language-neutral contract boundary.
8
+
7
9
  ## What ships today
8
10
 
9
11
  - [app/profile-session.ts](../app/profile-session.ts): thin React Native integration for session control, truth events, and signal attachments
@@ -64,15 +66,20 @@ Portable scenario manifests describe the durable app behavior before choosing a
64
66
  - `truthEvents`: app-owned milestone events keyed by stable milestone id
65
67
  - `milestones`: inspectable milestone list with event names, phases, timeouts, and descriptions
66
68
  - `expectedEvents`: event names the runner or log ingest should expect to observe
67
- - `cycles`: repeat count, warmup count, and failure policy for repeated journeys
69
+ - `cycles`: repeat count, warmup count, failure policy, and optional setup/body step ids for repeated journeys
68
70
  - `budgets`: product thresholds evaluated only after scenario health passes
69
71
  - `steps`: runner-facing launch, command, wait, gesture, and capture actions
70
72
  - `selector`: optional app target on a step, such as a test id, accessibility id, label, text, resource id, or xpath
73
+ - `uiContext`: optional UI ownership requirement on a step; UI driver actions default to `app`
71
74
  - `artifacts`: required and optional evidence outputs
72
75
 
73
76
  The scenario contract is intentionally runner-neutral. Runners can map steps to adb, XcodeBuildMCP, agent-device, accessibility tools, profilers, or custom scripts while preserving the same journey, milestones, budgets, and expected events.
74
77
 
75
- Runner capabilities describe ownership, such as launch, session control, command execution, log capture, artifact writing, or profiler support. Driver actions describe the concrete operations an adapter can perform inside a run. A runner may be able to own a scenario lifecycle without supporting every driver action; the planner fails only when a required step declares a `driverAction` that the selected runner or an active provider does not declare in `driverActions`.
78
+ For repeated mobile command scenarios, `cycles.setupStepIds` names leading setup commands that run once before measured cycle work, while `cycles.bodyStepIds` names the first repeated body commands when inference would be ambiguous. Built-in profile-session runners also infer a setup prefix conservatively: leading readiness commands or leading commands before the first measured milestone command run once, and the remaining command body repeats for `cycles.iterations`. Wait gates remain strict; ASL does not synthesize missing app-owned truth events.
79
+
80
+ Runner capabilities describe ownership, such as launch, session control, command execution, log capture, artifact writing, or profiler support. Driver actions describe the concrete operations an adapter can perform inside a run. UI contexts describe which surface the runner or provider can own: `app`, `systemDialog`, `notificationShade`, `externalBrowser`, `webView`, `shareSheet`, `picker`, or `otherApp`. UI and capture driver actions default to `app` when a step omits `uiContext`; a scenario must opt into system or external contexts explicitly. A runner may be able to own a scenario lifecycle without supporting every driver action or UI context; the planner fails when a required step declares a `driverAction` or `uiContext` that the selected runner or an active provider does not declare.
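The planner rule above, including the `app` default, can be sketched in TypeScript. The `PlanStep` and `Capabilities` shapes, the `uiContexts` field name, and `findBlockingGaps` are assumptions made for illustration; the text confirms only the context names, the `driverActions` declaration, and the default. For brevity the sketch applies the default to any step that declares a driver action.

```typescript
type UiContext =
  | "app"
  | "systemDialog"
  | "notificationShade"
  | "externalBrowser"
  | "webView"
  | "shareSheet"
  | "picker"
  | "otherApp";

// Hypothetical shapes; only the context names come from the contract text.
interface PlanStep {
  id: string;
  required: boolean;
  driverAction?: string;
  uiContext?: UiContext;
}

interface Capabilities {
  driverActions: string[];
  uiContexts: UiContext[]; // assumed field name
}

// List required steps whose driver action or UI context nobody declares.
function findBlockingGaps(steps: PlanStep[], declared: Capabilities[]): string[] {
  const actions = new Set<string>();
  const contexts = new Set<UiContext>();
  for (const d of declared) {
    for (const a of d.driverActions) actions.add(a);
    for (const c of d.uiContexts) contexts.add(c);
  }
  const gaps: string[] = [];
  for (const step of steps) {
    if (!step.required || step.driverAction === undefined) continue;
    if (!actions.has(step.driverAction)) {
      gaps.push(`${step.id}: driverAction ${step.driverAction}`);
    }
    // A step that omits uiContext is treated as targeting the app surface.
    const context: UiContext = step.uiContext ?? "app";
    if (!contexts.has(context)) {
      gaps.push(`${step.id}: uiContext ${context}`);
    }
  }
  return gaps;
}
```

A runner that declares `tap` only for `app` would satisfy an ordinary tap step but leave a gap for a required tap that opts into `systemDialog`.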
81
+
82
+ Planner compatibility artifacts and planner-derived `health.json` include a `downgradePolicy` block with `mode: "no-silent-downgrade"`. Required capability, driver-action, UI-context, or artifact gaps are recorded as `unsupported`; optional gaps are recorded as warnings. `allowedSubstitutions` and `substitutions` are explicit arrays, so future semantic downgrades must be visible in artifacts instead of being inferred from a passed plan.
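A minimal TypeScript sketch of how such a block might be assembled follows. Only `mode`, `allowedSubstitutions`, and `substitutions` are named in the text; the `Gap` shape, the `unsupported` and `warnings` arrays, and both helper functions are assumptions for illustration rather than the schema ASL ships.

```typescript
// Hypothetical gap record; the kinds mirror the gap types listed above.
interface Gap {
  kind: "capability" | "driverAction" | "uiContext" | "artifact";
  name: string;
  required: boolean;
}

// Assumed block shape; array element types are placeholders.
interface DowngradePolicy {
  mode: "no-silent-downgrade";
  unsupported: Gap[];
  warnings: Gap[];
  allowedSubstitutions: string[];
  substitutions: string[];
}

// Required gaps block the plan; optional gaps are kept as warnings.
function buildDowngradePolicy(gaps: Gap[]): DowngradePolicy {
  return {
    mode: "no-silent-downgrade",
    unsupported: gaps.filter((g) => g.required),
    warnings: gaps.filter((g) => !g.required),
    // Always present, even when empty, so a downgrade cannot hide.
    allowedSubstitutions: [],
    substitutions: [],
  };
}

// A plan passes only when nothing required is unsupported.
function planPasses(policy: DowngradePolicy): boolean {
  return policy.unsupported.length === 0;
}
```

Writing the substitution arrays even when they are empty lets an agent reading the artifact confirm that no substitution occurred, rather than inferring it from a passed plan.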
76
83
 
77
84
  `buildScenarioExecutionPlan()` turns the same scenario steps into a deterministic adapter-facing work list. Each normalized step records the scenario step id, original kind, required flag, optional driver action, and the runner port method that owns it: `launch`, `executeStep`, `waitForTruthEvent`, or `captureEvidence`.
78
85
 
@@ -80,11 +87,15 @@ Android adb capture routes normalized steps with `driverAction: "tap"`, `"scroll
80
87
 
81
88
  When Android adb `tap` or `scroll` steps provide a portable selector instead of coordinates, the runner captures `uiautomator dump` output, resolves supported selector kinds against node bounds, and derives adb input coordinates before executing the action. Built-in Android selector resolution supports `testId`, `resourceId`, `accessibilityId`, `accessibilityLabel`, and `text`; `xpath` stays available for external runners with native selector engines.
82
89
 
83
- I/O from iOS simctl capture routes through the simctl driver adapter. `readLogs` preserves bounded simulator logs under `raw/ios-simctl-log.txt`. A scenario step with `driverAction: "screenshot"` or `artifact: "screenshot"` requests a screenshot capture, defaulting to `captures/ios-screenshot.png`; when `--screenshot-type`, `--screenshot-display`, or `--screenshot-mask` are supplied to `asl-ios-simctl`, the command passes those supported `simctl io screenshot` options and records them in capture metadata. The profile manifest records the resulting capture path in `artifacts.captures.screenshots`.
90
+ I/O from iOS simctl capture routes through the simctl driver adapter. `readLogs` preserves bounded simulator logs under `raw/ios-simctl-log.txt`. A scenario step with `driverAction: "screenshot"` or `artifact: "screenshot"` requests a screenshot capture, defaulting to `captures/ios-screenshot.png`. The profile manifest records the resulting capture path in `artifacts.captures.screenshots`, and capture metadata records any supported simulator screenshot options the runner used.
91
+
92
+ Manifest artifact paths are evidence claims. Optional diagnostics such as `captures.video`, `captures.uiTree`, `raw.deviceLog`, JS/memory/network signals, accessibility exports, and profiler files appear as paths only when the file was produced or intentionally referenced as a sidecar dependency. Every profile manifest also includes `artifacts.diagnostics`, an inventory of common diagnostic surfaces with `kind`, `status`, `required`, optional `path`, and a `reason`/`nextAction` when evidence was unavailable or not requested.
84
93
 
85
94
  Planner compatibility also validates the adapter metadata that built-in runners require. Android adb `tap` steps need either `adapterOptions.androidAdb.x/y` or a portable selector; Android adb `scroll` steps need either `startX/startY/endX/endY` or a portable selector; iOS simctl command metadata needs non-empty command strings and positive integer waits/repeat counts. Argent `tap` steps need `adapterOptions.argent.x/y`, Argent `scroll` steps need `adapterOptions.argent.startX/startY/endX/endY`, and Argent `assertVisible` steps need a portable selector. These failures become `invalid_adapter_options` health checks before runtime execution starts.
86
95
 
87
- Adapter-target fixtures such as `agent-device-android`, `agent-device-ios`, `argent-ios`, `argent-android`, `argent-react-profiler-provider`, and `axe-accessibility-provider` describe where external tools can plug into the same contract. They are schema-checked and planner-tested capability manifests. The bundled `agent-device` capture runner implements the portable interaction subset for iOS and Android; broader agent-device surfaces such as React DevTools, traces, network, and performance still need explicit adapters or provider attachments before they become part of the stable artifact contract. The bundled Argent runner implements launch, coordinate-backed gestures, screenshot requests, and description-backed visibility proof for portable selector match modes while keeping React profiler output in a separate Android evidence-provider lane. Argent command-surface checks prove the configured tools exist; runtime health still owns whether the selected device backend produced screenshot evidence. Required screenshot failures fail health, and optional screenshot failures are preserved as warnings. Active evidence providers can satisfy required evidence artifacts and provider-owned driver actions such as `collectPerfSignals`; providers outside the selected platform do not contribute to the match. When those tools write files independently, profile CLIs can attach the files with `--signal <js|memory|network>:<path>` or `--capture <screenshot|video|uiTree>:<path>` so provider evidence lands in the stable manifest and artifact layout. The `script-accessibility-provider`, `script-profiler-provider`, `script-memory-provider`, and `script-network-provider` examples show provider-command wrappers for project-local tools without making those tools package dependencies.
96
+ Adapter-target fixtures such as `agent-device-android`, `agent-device-ios`, `argent-ios`, `argent-android`, `argent-react-profiler-provider`, and `axe-accessibility-provider` describe where external tools can plug into the same contract. They are schema-checked and planner-tested capability manifests. The bundled `agent-device` capture runner implements the portable interaction subset for iOS and Android; broader agent-device surfaces such as React DevTools, traces, network, and performance still need explicit adapters or provider attachments before they become part of the stable artifact contract. The bundled Argent runner implements launch, coordinate-backed gestures, screenshot requests, and description-backed visibility proof for portable selector match modes while keeping React profiler output in a separate Android evidence-provider lane. Argent command-surface checks prove the configured tools exist; runtime health still owns whether the selected device backend produced screenshot evidence. Required screenshot failures fail health, and optional screenshot failures are preserved as warnings. Active evidence providers can satisfy required evidence artifacts and provider-owned driver actions such as `collectPerfSignals`; providers outside the selected platform do not contribute to the match. When those tools write files independently, attached provider evidence lands in the stable manifest and artifact layout. The `script-accessibility-provider`, `script-profiler-provider`, `script-memory-provider`, and `script-network-provider` examples show provider-command wrappers for project-local tools without making those tools package dependencies.
97
+
98
+ Profiler evidence is a first-class artifact kind, but ASL does not pretend every profiler tool has the same native format. JSON profiler outputs should satisfy [schemas/profiler.schema.json](../schemas/profiler.schema.json), including provider, platform, run, scenario, tool/completeness metadata, and at least one useful content surface such as samples, metrics, events, traces, a profile object, summary, or referenced attachments. Lifecycle-backed profilers should also declare whether evidence came from passive report ingestion, an explicit session, inline capture, `afterCapture`, `postRun`, or rehydration; whether the target device/app binding was verified; whether capture perturbed timing; and whether the output is comparable or diagnostic-only. Native traces, CPU profiles, flamegraphs, React DevTools exports, and recordings can still be attached as profiler evidence through provider outputs, but agents should treat them as preserved evidence until a provider also emits structured metrics that ASL can compare or summarize.
88
99
 
89
100
  ## Public artifact layout
90
101
 
@@ -111,11 +122,23 @@ Profile runner artifacts:
111
122
 
112
123
  `manifest.json`, `metrics.json`, `budget-verdict.json`, and `causal-run.json` are schema-checked before the runner writes them. This keeps profile artifacts stable across fixture logs, adb-captured logs, and future runner adapters.
113
124
 
125
+ `causal-run.json` preserves app-emitted timeline events through the public causal phase/status vocabulary. If an app emits richer phase or status values, ASL writes schema-valid top-level values and preserves the originals as timeline metadata. Timeline metadata also preserves scalar correlation fields such as `iteration`, `sequence`, `queueId`, `commandId`, `operationId`, `attemptId`, and `clockDomain` when the app emits them. Profile-session command acknowledgements are included as ASL-owned timeline entries with command status, result, source, sequence, queue, wait, and command ID metadata, so agents can inspect runtime ordering without treating command transport as product truth. Repeated runs include `iterationSummary` so agents can distinguish complete, partial, failed, and timeout iteration evidence without scraping raw logs. Scenarios without budget thresholds still produce schema-valid causal artifacts with an empty `budgets` object.
126
+
127
+ `manifest.attempt` records the run attempt identity and terminal semantics independently of prose summaries. It includes an `attemptId`, `attemptNumber`, `maxAttempts`, optional retry lineage, terminal state, failure classification, cleanup outcome, and whether preserved partial artifacts are valid for diagnosis. Retry attempts must identify the prior attempt and retry reason. A failed attempt can therefore keep usable raw evidence without implying that product verdict, timing, or comparison claims are trustworthy.
128
+
129
+ `manifest.provenance.cohort` records product-neutral compatibility inputs for comparing runs. Profile runners populate known fields such as `appId`, `platform`, `runnerName`, `runnerVersion`, `commandTransport`, and active provider IDs; richer callers can add app/build version, build mode, OS version, device class, feature flags, and seed identity. ASL derives `manifest.provenance.cohortHash` from the normalized cohort. Latest-trusted comparison requires the same cohort hash when the current run records one, so old artifacts remain comparable only when the current artifact has not opted into cohort-aware selection.
130
+
131
+ `manifest.attempt.terminalState` uses a terminal vocabulary of `passed`, `failed`, `timeout`, `cancelled`, `aborted`, `inconclusive`, `unsupported`, `skipped`, and `unhealthy`. Attempt construction rejects misleading terminal combinations: passed attempts must end as `passed`, failed attempts must use a failure terminal state, timeout/cancelled/aborted attempts must preserve valid partial artifact paths, and cleanup statuses such as `passed`, `failed`, or `partial` must include a cleanup message. `manifest.environment` records product-neutral lifecycle and environment preconditions and postconditions. Each field is an assertion object with a `value` and `evidence` state. Generated profile artifacts default to `value: "unknown"` and `evidence: "not-asserted"` unless the runner can prove more. The dedicated `lifecyclePhase` assertion supports `cold-launch`, `warm-launch`, `hot-launch`, `resume`, `foreground`, `background`, `force-stop`, `process-death`, `scene-recreation`, `activity-recreation`, `os-reclaim`, `reboot`, and `relaunch`. This preserves what the runner did not prove instead of letting agents infer installed state, app data state, auth state, route, foreground state, permissions, locale, timezone, theme, font scale, orientation, network, animations, cleanup, data, or artifact completeness from surrounding logs.
132
+
133
+ Profile `agent-summary.md` files include an `attempt` section when the run has a manifest attempt block, including terminal state, cleanup state, partial-artifact validity, and retry lineage. Latest-trusted baseline selection treats attempt-aware runs as baseline-trusted only when health and verdict passed, the attempt is a clean first attempt, cleanup did not fail or remain partial, and partial artifacts are not marked valid diagnostic fragments. Older artifacts without `manifest.attempt` remain legacy-trusted when health and verdict passed, but new attempt-aware runs cannot hide retry laundering behind a green final verdict.
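The baseline-trust rule above can be read as a small decision function. The following is a minimal sketch, not the package's implementation: the type names, the field names (`retryOfAttemptId`, `partialArtifactsValidForDiagnosis`), and the `not-required` cleanup status are assumptions made for illustration.

```typescript
// Hypothetical shapes; the real manifest fields may be named differently.
type CleanupStatus = "passed" | "failed" | "partial" | "not-required";

interface AttemptSummary {
  attemptNumber: number;
  retryOfAttemptId?: string; // assumed name for retry lineage
  cleanupStatus: CleanupStatus;
  partialArtifactsValidForDiagnosis: boolean;
}

interface IndexedRun {
  healthPassed: boolean;
  verdictPassed: boolean;
  attempt?: AttemptSummary; // absent on older artifacts
}

type BaselineTrust = "baseline-trusted" | "legacy-trusted" | "untrusted";

function classifyBaselineTrust(run: IndexedRun): BaselineTrust {
  // Health and verdict must both pass before anything else is considered.
  if (!run.healthPassed || !run.verdictPassed) return "untrusted";
  // Artifacts without an attempt block keep the legacy trust rule.
  if (run.attempt === undefined) return "legacy-trusted";
  const attempt = run.attempt;
  const cleanFirstAttempt =
    attempt.attemptNumber === 1 && attempt.retryOfAttemptId === undefined;
  const cleanupOk =
    attempt.cleanupStatus !== "failed" && attempt.cleanupStatus !== "partial";
  // Runs that only preserve diagnostic fragments are never baselines.
  if (cleanFirstAttempt && cleanupOk && !attempt.partialArtifactsValidForDiagnosis) {
    return "baseline-trusted";
  }
  return "untrusted";
}
```

Under these assumptions, a second attempt with a green verdict is classified as `untrusted`, which is the retry-laundering case the paragraph describes.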
134
+
135
+ Profile runners assert only environment facts they own. Every completed profile manifest records ASL-controlled artifact completeness and cleanup postconditions. Live adb/simctl capture paths also assert runner-controlled foreground state, explicit lifecycle preconditions, and foreground postconditions. Use `--lifecycle-phase <phase>` when a runner can prove a non-cold precondition such as `warm-launch` or `resume`; log-ingest and preexisting artifact ingestion keep those fields `unknown/not-asserted`. Lifecycle assertions are not product milestones: a runner proving `lifecyclePhase: "resume"` does not synthesize `app_resumed` or any other app truth event. Resume readiness must still be emitted by the consuming app when a scenario waits for it.
136
+
114
137
  Aggregate live proof commands write `live-proof.json` and `agent-summary.md` under `_live-proof/<run-id>`. The live-proof artifact points to preflight evidence, every scenario run, optional interaction proofs from tools such as agent-device or Argent, optional skipped interaction proof declarations, and optional latest-trusted comparison outputs, giving agents one stable entrypoint after a proof run. Preflight, profile, and interaction pointers include health and verdict status from the linked run artifacts, so agents can see what passed before opening deeper evidence. Interaction proof pointers also include sidecar screenshot capture inventory when the sidecar produced screenshots, plus `warnings` when optional sidecar checks failed without invalidating the required proof. If profile health or verdict fails, requested sidecars are not executed; they are recorded in `skippedInteractionProofs` with a reason and next action so agent feedback stays explicit without mixing runner evidence into an untrusted timing run. The aggregate artifact records `status`, `comparisonStatus`, `comparisonCounts`, optional per-comparison `metricSummary` counts/highlights, and a `nextAction` hint so agents can distinguish failed proof gates, regressions, mixed metric movement, missing baselines, inconclusive comparisons, partial sidecar evidence, and clean summaries without scraping prose.
115
138
 
116
139
  Platform-set proof commands write `live-proof-set.json` and `agent-summary.md` under the caller-provided proof-set output directory. The proof-set artifact records required platforms, present platforms, missing platforms, each linked `live-proof.json`, failed proof reasons, regression-gate reasons, and a next action. This gives agents one stable Android-plus-iOS gate after the per-platform live proofs have written their own aggregate evidence.
117
140
 
118
- Provider or custom-script evidence attached with `--signal` or `--capture` is copied into stable run folders and inventoried in `manifest.artifacts.evidenceAttachments`. Each inventory entry records the evidence channel, kind, run-relative path, source filename, byte size, and sha256 hash; it does not preserve local absolute source paths.
141
+ Provider or custom-script evidence attachments are copied into stable run folders and inventoried in `manifest.artifacts.evidenceAttachments`. Each inventory entry records the evidence channel, kind, run-relative path, source filename, byte size, sha256 hash, completeness status, corruption status, redaction status, and transformations; it does not preserve local absolute source paths.
119
142
 
120
143
  Evidence folders:
121
144
 
@@ -133,6 +156,14 @@ The current profile runner writes health, verdict, agent summary, metrics, causa
133
156
 
134
157
  Budgets are supported but optional for adoption.
135
158
 
159
+ Milestone budget interval semantics are explicit:
160
+
161
+ - `toMilestone` without `fromMilestone` measures elapsed time from the run or session clock origin to the matching milestone occurrence.
162
+ - `fromMilestone` plus `toMilestone` measures the interval between the two app-owned truth events for each iteration.
163
+ - repeated transition, gesture, open, close, scroll, or handoff budgets should use both milestones when the intended number is transition duration rather than cumulative elapsed time.
164
+
165
+ This distinction is visible in `metrics.json`: elapsed milestone-only runs populate `durationsMs` with milestone timestamps, while interval runs populate `durationsMs` with `to - from` values. Timing still remains untrusted unless `health.json` passes.
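To make the two interval modes concrete, here is a minimal sketch of how per-iteration durations could be derived from milestone events. The `MilestoneEvent` and `MilestoneBudget` shapes and the `milestoneDurationsMs` helper are hypothetical; they only illustrate the semantics listed above and are not the runner's code.

```typescript
// Hypothetical event shape: timestampMs is an offset from the run or session clock origin.
interface MilestoneEvent {
  name: string;
  iteration: number;
  timestampMs: number;
}

interface MilestoneBudget {
  fromMilestone?: string;
  toMilestone: string;
}

function milestoneDurationsMs(events: MilestoneEvent[], budget: MilestoneBudget): number[] {
  const durations: number[] = [];
  const iterations = Array.from(new Set(events.map((e) => e.iteration))).sort((a, b) => a - b);
  for (const iteration of iterations) {
    const inIteration = events.filter((e) => e.iteration === iteration);
    const to = inIteration.find((e) => e.name === budget.toMilestone);
    if (to === undefined) continue; // no matching occurrence in this iteration
    if (budget.fromMilestone === undefined) {
      // Milestone-only budget: elapsed time from the clock origin.
      durations.push(to.timestampMs);
      continue;
    }
    const from = inIteration.find((e) => e.name === budget.fromMilestone);
    // Interval budget: `to - from` between the two app-owned events.
    if (from !== undefined) durations.push(to.timestampMs - from.timestampMs);
  }
  return durations;
}
```

With two iterations of a hypothetical `sheet_open_start` / `sheet_open_end` pair, the milestone-only form grows with the run clock, while the interval form stays close to the real transition duration. That is why repeated transition budgets should name both milestones.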
166
+
136
167
  `buildRunIndex()` can scan an artifact root after runs complete. It indexes folders that contain both `health.json` and `verdict.json`, marks a run trusted only when health and verdict both passed, and lets agents find the latest trusted prior run for a scenario without relying on terminal history.
137
168
 
138
169
  ## Supported Runner Surface
@@ -164,104 +195,17 @@ Not yet shipped as supported public features:
164
195
  - Computer Use flows
165
196
  - product-specific scenarios
166
197
 
167
- ## Preflight planning
168
-
169
- Use `check-plan` to validate a scenario, runner manifest, and optional evidence-provider manifests before execution:
170
-
171
- ```bash
172
- pnpm check-plan -- --scenario examples/scenarios/mobile/app-startup.json --runner examples/runners/xcodebuildmcp-ios.json --platform ios --out artifacts/plan/app-startup
173
- ```
174
-
175
- This validates the input manifests, writes schema-checked `health.json` and `verdict.json`, writes `agent-summary.md`, and includes the raw planner match in `planner-compatibility.json`.
176
-
177
- ## Android adb readiness
178
-
179
- Use `android:preflight` to verify adb and connected-device readiness before adding live Android scenario execution:
180
-
181
- ```bash
182
- pnpm android:preflight -- --package com.example.app --out artifacts/android-adb-preflight
183
- ```
184
-
185
- The command writes:
186
-
187
- - `health.json`
188
- - `verdict.json`
189
- - `agent-summary.md`
190
- - `raw/adb-version.txt`
191
- - `raw/adb-devices.txt`
192
- - `raw/android-metadata.json`
193
-
194
- If adb, a connected online device, or an optional package check fails, health fails and the verdict remains `inconclusive`.
195
-
196
- Add `--capture-logcat --logcat-lines <count>` to write `raw/adb-logcat.txt` in the same artifact folder. Add `--react-native-debug-host <host:port>` with `--package <name>` for React Native development builds that need adb reverse plus the app `debug_http_host` preference before launch; the runner writes `raw/adb-react-native-reverse.txt` and `raw/adb-react-native-debug-host.txt`. Add `--clear-logcat --launch --wait-ms <ms>` with `--package <name>` to clear logs, launch the package, wait for a bounded capture window, and then collect logcat evidence. If requested capture-window setup or logcat capture fails, scenario health fails because timing and event evidence would be incomplete.
197
-
198
- Use that captured logcat evidence directly with Android profiling:
199
-
200
- ```bash
201
- pnpm profile:android -- --config core/config-template.json --scenario examples/mobile-app/scenarios/android/app-startup.json --adb-artifacts artifacts/android-adb-preflight --run-id android-run-1
202
- ```
203
-
204
- Or let Android profiling own the adb capture window before it writes profile artifacts:
205
-
206
- ```bash
207
- pnpm profile:android -- --config core/config-template.json --scenario examples/mobile-app/scenarios/android/app-startup.json --adb-capture --react-native-debug-host localhost:8097 --clear-logcat --launch --run-id android-run-1
208
- ```
209
-
210
- ## iOS simulator capture
211
-
212
- Use `profile:ios --simctl-capture` when the example app or a consuming app is already installed on a booted simulator:
213
-
214
- ```bash
215
- pnpm profile:ios -- --config core/config-template.json --scenario examples/mobile-app/scenarios/ios/app-startup.json --simctl-capture --profile-session --profile-session-storage --launch --run-id ios-run-1
216
- ```
217
-
218
- The command writes a separate simctl capture folder under the selected output root, seeds the app-owned profile session into native AsyncStorage before launch, then collects stored app profile events after the capture window. Command scenarios seed the scenario command queue through the same storage contract before launch. When `raw/ios-profile-events.log` exists, the iOS profile runner ingests that stored truth-event log; otherwise it falls back to `raw/ios-simctl-log.txt`.
219
-
220
- For profile-session capture on Android or iOS, omitting `--wait-ms` lets ASL derive the final evidence window from scenario execution waits and cycle count. Explicit `--wait-ms` remains authoritative when a consuming app has a known startup or logging delay that the scenario cannot express.
221
-
222
- Scenario command targets live in `adapterOptions.iosSimctl.commands`, while the app handles them through `registerProfileCommandTargetHandler`. The iOS proof does not depend on unified logs carrying JavaScript console output; it depends on app-owned stored profile events.
223
-
224
- ## Historical comparison
225
-
226
- Use `compare` to build `comparison.json` from two completed run folders:
227
-
228
- ```bash
229
- pnpm compare -- --baseline artifacts/runs/app-startup/baseline --current artifacts/runs/app-startup/current --out artifacts/runs/app-startup/current --fail-on-regression
230
- ```
231
-
232
- The comparison gate is intentionally strict. If either run failed scenario health, or if the scenario ids do not match, the comparison is `inconclusive`. Numeric budget checks are compared only after that health gate passes. `comparison.json` includes `comparisonBasis` with the baseline/current run ids and run directories, giving agents artifact-local provenance instead of forcing them to infer it from folder names.
233
-
234
- Use `compare:latest` when an artifact root contains run history and the agent should compare the current run against the newest trusted prior run for the same scenario:
235
-
236
- ```bash
237
- pnpm compare:latest -- --root artifacts/runs --scenario app-startup --current artifacts/runs/app-startup/current --out artifacts/runs/app-startup/current --fail-on-regression
238
- ```
239
-
240
- The latest-trusted command excludes the exact current run directory from baseline selection. Baseline trust requires passed health and passed verdict. Current runs must pass scenario health before the command will compare timing or budget evidence. If the current manifest declares `comparisonLane`, baseline selection is scoped to trusted prior runs with the same lane; if the current manifest has no lane, selection stays within unlabeled trusted prior runs. Profile manifests also include `scenarioHash`, a stable fingerprint of the normalized scenario contract. When the current run has that hash, latest-trusted selection only compares against trusted prior runs with the same hash; legacy runs without the hash remain comparable only to legacy current runs. This keeps proof modes such as plain live proof and live proof plus agent-device sidecar from comparing against each other, and it keeps migrated scenario definitions from poisoning before/after verdicts. Latest-trusted artifacts set `comparisonBasis.strategy` to `latest_trusted_prior` and record selection counts for inspected, trusted, trusted-prior, lane-comparable, and scenario-contract-comparable candidates.
241
-
242
- ## Fixture loop
243
-
244
- Use `demo:loop` to run the current contract without a simulator:
245
-
246
- ```bash
247
- pnpm demo:loop -- --out artifacts/demo-loop
248
- ```
249
-
250
- The fixture loop writes:
198
+ ## Command guidance
251
199
 
252
- - `preflight/app-startup/health.json`
253
- - `preflight/app-startup/verdict.json`
254
- - `preflight/app-startup/agent-summary.md`
255
- - `profile-runs/app-startup/demo-baseline/*`
256
- - `profile-runs/app-startup/demo-current/*`
257
- - `profile-runs/app-startup/demo-current/comparison.json`
200
+ Contracts defines the schemas, artifact fields, runner surfaces, and trust policy. Runnable walkthroughs live in [Live Proofs](live-proofs.md):
258
201
 
259
- This is not a replacement for live device proof. It is a stable contract check that keeps the evidence loop reproducible through trusted prior-run selection while iOS or Android runtime setup is unavailable.
202
+ - [plan checks](live-proofs.md#plan-check)
203
+ - [Android adb preflight and profile capture](live-proofs.md#platform-preflight-and-profile-capture)
204
+ - [iOS simctl profile capture](live-proofs.md#platform-preflight-and-profile-capture)
205
+ - [fixture loop](live-proofs.md#fixture-loop)
206
+ - [explicit and latest-trusted comparison](live-proofs.md#comparison)
207
+ - [generic Android and iOS live proof](live-proofs.md#generic-mobile-proof)
260
208
 
261
209
  ## Read next
262
210
 
263
- - [README](../README.md) for the shortest path through the project
264
- - [Concepts](concepts.md) for the broader product framing
265
- - [Adapter Onboarding](adapters.md) for adding runners and evidence providers
266
- - [Consumer App Rehearsal](consumer-rehearsal.md) for adopting the package in an existing app
267
- - [Runner docs](../runner/README.md) for current runner behavior and limits
211
+ - [Scenario Authoring](authoring.md) for writing portable scenarios against these contracts
@@ -0,0 +1,219 @@
1
+ # External Adapter Protocol
2
+
3
+ ASL core is TypeScript, but the adapter contract is language-neutral. An external adapter is an out-of-process executable that exchanges newline-delimited JSON messages over stdin and stdout. The executable can be written in any language and must not depend on ASL TypeScript internals.
4
+
5
+ This document defines the minimal protocol surface for conformance fixtures and future adapter hosts. JSON Schema and this normative protocol document are the source of truth for portable behavior; built-in TypeScript runners remain implementations of the same contract, not the contract itself. The protocol message schema is published in `schemas/external-adapter-message.schema.json`.
6
+
7
+ ## Transport
8
+
9
+ - The host starts the adapter as a child process without a shell.
10
+ - Each message is one UTF-8 JSON object followed by `\n`.
11
+ - stdout is reserved for protocol messages. Diagnostics must go to stderr.
12
+ - Requests and responses are correlated by `operationId`.
13
+ - `seq` is a monotonically increasing integer within each sender's stream.
14
+ - Hosts and adapters maintain independent `seq` streams. A receiver must treat missing, repeated, or non-monotonic `seq` values as protocol health failures.
15
+ - Timestamps use RFC 3339 strings. Timing-sensitive waits must declare their `clockDomain`.
16
+ - Adapters must classify work received after the request's `deadline` has passed as a structured deadline failure instead of silently attempting stale work.
17
+ - Paths in artifact references are run-relative unless `uri` is explicitly used.
18
+ - Artifact and raw file references should include `sha256` and `sizeBytes` when the adapter can compute them.
19
+ - Evidence bytes must not be embedded in protocol messages as raw data or base64.
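The framing and sequencing rules above can be sketched in a few lines. This is an illustrative sketch only: `encodeMessage` and `SeqChecker` are hypothetical helpers, and treating "missing" as an absent or non-integer `seq` field (rather than a gap between values) is an assumption.

```typescript
// One UTF-8 JSON object followed by "\n". JSON.stringify escapes newlines
// inside strings, so the only raw newline is the terminator.
function encodeMessage(message: object): string {
  return JSON.stringify(message) + "\n";
}

type SeqProblem = "missing" | "repeated" | "non-monotonic";

// Tracks one sender's stream; hosts and adapters each need their own instance.
class SeqChecker {
  private last: number | undefined;

  // Returns undefined when the value is acceptable, otherwise the
  // protocol health problem to report.
  check(seq: unknown): SeqProblem | undefined {
    if (typeof seq !== "number" || !Number.isInteger(seq)) return "missing";
    if (this.last !== undefined) {
      if (seq === this.last) return "repeated";
      if (seq < this.last) return "non-monotonic";
    }
    this.last = seq;
    return undefined;
  }
}
```

A receiver would call `check` once per decoded line and record any returned problem as a protocol health failure instead of continuing silently.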
20
+
21
+ ## Envelope
22
+
23
+ Every message uses the same envelope:
24
+
25
+ ```json
+ {
+   "protocolVersion": "1.0",
+   "seq": 1,
+   "operationId": "op-001",
+   "kind": "request",
+   "type": "hello",
+   "runId": "run-001",
+   "attemptId": "attempt-001",
+   "deadline": "2026-06-19T12:00:05.000Z",
+   "body": {}
+ }
+ ```
38
+
39
+ Fields:
40
+
41
+ | Field | Required | Meaning |
+ | --- | --- | --- |
+ | `protocolVersion` | yes | Protocol major/minor string. This document defines `1.0`. |
+ | `seq` | yes | Sender-local message sequence. |
+ | `operationId` | yes | Correlates one request with one response. Cancellation targets this value. |
+ | `kind` | yes | `request`, `response`, or `event`. |
+ | `type` | yes | Operation or event name. |
+ | `runId` | request after `hello` | Stable ASL run identifier. |
+ | `attemptId` | request after `hello` | Stable retry/attempt identifier for the run. |
+ | `deadline` | request operations | Absolute deadline for bounded work. |
+ | `body` | yes | Operation-specific payload. |
52
+
53
+ Responses must echo `protocolVersion`, `operationId`, `runId`, and `attemptId` when those fields were present on the request.
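The echo rule can be expressed as a small helper that builds a response from its request. This is a sketch under assumptions: the `Envelope` interface mirrors the field table above, `buildResponse` is a hypothetical name, and reusing the request `type` on the response is an assumption rather than something the table states.

```typescript
// Envelope shape taken from the field table; optional fields depend on the message.
interface Envelope {
  protocolVersion: string;
  seq: number;
  operationId: string;
  kind: "request" | "response" | "event";
  type: string;
  runId?: string;
  attemptId?: string;
  deadline?: string;
  body: Record<string, unknown>;
}

function buildResponse(
  request: Envelope,
  seq: number, // the responder's own sender-local sequence value
  body: Record<string, unknown>,
): Envelope {
  const response: Envelope = {
    protocolVersion: request.protocolVersion, // echoed
    seq,
    operationId: request.operationId, // echoed for correlation
    kind: "response",
    type: request.type, // assumption: responses reuse the operation name
    body,
  };
  // runId and attemptId are echoed only when the request carried them.
  if (request.runId !== undefined) response.runId = request.runId;
  if (request.attemptId !== undefined) response.attemptId = request.attemptId;
  return response;
}
```

Keeping `seq` as a separate argument reflects that hosts and adapters maintain independent sequence streams, so a response never copies the request's `seq`.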
54
+
55
+ ## Hello And Capability Discovery
56
+
57
+ The first host message must be `hello`. The adapter responds with its identity, supported protocol range, platforms, capabilities, driver actions, artifact outputs, and clock domains.
58
+
59
+ Request body:
60
+
61
+ ```json
+ {
+   "host": {
+     "name": "agent-scenario-loop",
+     "version": "0.1.x"
+   },
+   "platform": "android"
+ }
+ ```
70
+
71
+ Response body:
72
+
73
+ ```json
+ {
+   "adapter": {
+     "name": "asl-python-conformance-fixture",
+     "version": "0.1.0"
+   },
+   "acceptedProtocolVersion": "1.0",
+   "platforms": ["android", "ios"],
+   "capabilities": ["prepare", "launch", "command", "truthEvent", "evidence", "cancel", "stop", "finalize"],
+   "driverActions": ["tap", "assertVisible"],
+   "artifactOutputs": ["logs", "screenshot", "truth-events"],
+   "clockDomains": ["host-monotonic", "device-log"]
+ }
+ ```
87
+
88
+ If the adapter cannot support the requested protocol or platform, it must return a structured failure response and then exit cleanly or wait for `finalize`.
89
+
90
+ ## Operations
91
+
92
+ All operation responses use:
93
+
94
+ ```json
+ {
+   "ok": true,
+   "result": {}
+ }
+ ```
100
+
101
+ or:
102
+
103
+ ```json
+ {
+   "ok": false,
+   "failure": {
+     "category": "unsupported",
+     "code": "unsupported_action",
+     "message": "driverAction `pinch` is not supported",
+     "retryable": false,
+     "details": {
+       "driverAction": "pinch"
+     }
+   }
+ }
+ ```
117
+
118
+ `failure.category` is optional for older adapters but recommended for conformance. Use these product-neutral categories:
119
+
120
+ | Category | Use |
+ | --- | --- |
+ | `adapter` | Adapter implementation failure that is not more specific. |
+ | `cancelled` | Operation was cancelled before completion. |
+ | `cleanup` | Stop/finalize/cleanup invariant failed. |
+ | `deadline` | Request deadline expired before or during adapter work. |
+ | `environment` | Host, device, simulator, permission, or tool environment prevented execution. |
+ | `protocol` | Malformed message, invalid sequence, unsupported protocol, or decode failure. |
+ | `runner` | Runner orchestration failed outside app product behavior. |
+ | `unsupported` | Operation, platform, driver action, or evidence kind is unsupported. |
130
+
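As an illustration of the response shapes and categories above, the sketch below shows how an adapter might turn an already-expired request deadline into a structured failure before doing any work. The type names, the `deadlineFailureIfExpired` helper, the `deadline_expired` and `invalid_deadline` codes, and the `retryable` values are all assumptions for this example.

```typescript
type FailureCategory =
  | "adapter" | "cancelled" | "cleanup" | "deadline"
  | "environment" | "protocol" | "runner" | "unsupported";

interface Failure {
  category?: FailureCategory; // optional for older adapters
  code: string;
  message: string;
  retryable: boolean;
  details?: Record<string, unknown>;
}

type OperationResult =
  | { ok: true; result: Record<string, unknown> }
  | { ok: false; failure: Failure };

// Returns a failure body when the deadline is unusable or already past,
// or undefined when the adapter may start the work.
function deadlineFailureIfExpired(deadline: string, now: Date): OperationResult | undefined {
  const deadlineMs = Date.parse(deadline); // RFC 3339 timestamps parse as ISO 8601
  if (isNaN(deadlineMs)) {
    return {
      ok: false,
      failure: { category: "protocol", code: "invalid_deadline", message: "deadline is not a valid timestamp", retryable: false },
    };
  }
  if (now.getTime() > deadlineMs) {
    return {
      ok: false,
      failure: {
        category: "deadline",
        code: "deadline_expired",
        message: "request arrived after its deadline",
        retryable: false,
        details: { deadline },
      },
    };
  }
  return undefined;
}
```

Passing `now` explicitly keeps the check deterministic in tests; a real adapter would also need to decide which clock domain the comparison uses.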
131
+ ### prepare
132
+
133
+ Validates target configuration, creates or verifies run directories, and reports setup metadata. The request body should include `platform`, target identifiers, environment assumptions, and an optional `artifactsRoot`.
134
+
135
+ The response should include normalized target metadata and any adapter-owned artifact directories.
136
+
137
+ ### launch
138
+
139
+ Launches or verifies the app, device, browser, or other target. The request body should include `platform`, `target`, and optional launch arguments.
140
+
141
+ The response should include launch status, a target reference when available, and artifact references for raw command output.
142
+
143
+ ### executeAction
144
+
145
+ Executes one portable driver action. The request body must include `driverAction` and action-specific input. The adapter must reject unknown or unsupported actions with `ok: false`.
146
+
147
+ ### waitCondition
148
+
149
+ Waits for a truth event, UI condition, log marker, or other bounded condition. The request body must include `condition`, `deadline`, and `clockDomain`.
150
+
151
+ The response should include matched truth-event data when available. Timing values are not trustworthy verdict inputs unless scenario health passed.
152
+
153
+ ### captureEvidence
154
+
155
+ Captures logs, screenshots, UI trees, videos, profiler output, or provider signals. The response must return artifact references:
156
+
157
+ ```json
+ {
+   "artifacts": [
+     {
+       "kind": "screenshot",
+       "path": "captures/final-screen.png",
+       "contentType": "image/png",
+       "description": "Final screen after launch",
+       "sha256": "0000000000000000000000000000000000000000000000000000000000000000",
+       "sizeBytes": 0
+     }
+   ]
+ }
+ ```
171
+
172
+ ### cancel
173
+
174
+ Requests cancellation of an in-flight operation. The body must include `targetOperationId` and a human-readable `reason`. Cancellation is best effort: if the original operation was interrupted, the adapter should respond to it with `code: "cancelled"`.
175
+
176
+ ### stop
177
+
178
+ Stops the active app/session/target while preserving evidence produced so far. This is distinct from `finalize`; the adapter may still accept evidence capture or finalization work after stop.
179
+
180
+ If there is no active launched target, `stop` must return a structured cleanup failure instead of pretending cleanup ran. Include `details.cleanupStatus` when the adapter can distinguish `not-required`, `partial`, `failed`, or `passed`.
181
+
182
+ ### finalize
183
+
184
+ Flushes pending protocol output, closes adapter-owned resources, and reports final artifact inventory. After a successful `finalize` response the adapter should exit with code `0`.
185
+
186
+ `finalize` is terminal for one adapter attempt. Repeated finalization must return a structured cleanup or protocol failure and must not rewrite the prior artifact inventory.
187
+
188
+ ## Events
189
+
190
+ Adapters may emit `event` messages between request responses for truth events, progress, and evidence discovery:
191
+
192
+ ```json
+ {
+   "protocolVersion": "1.0",
+   "seq": 4,
+   "kind": "event",
+   "type": "truthEvent",
+   "runId": "run-001",
+   "attemptId": "attempt-001",
+   "body": {
+     "name": "app.ready",
+     "clockDomain": "device-log",
+     "observedAt": "2026-06-19T12:00:02.000Z",
+     "payload": {
+       "screen": "Home"
+     }
+   }
+ }
+ ```
210
+
211
+ Events must not replace the response for an operation. The host should still receive one terminal response for every request except when the process exits unexpectedly.
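A host can keep events and responses apart with a small tracker. The sketch below is hypothetical (`ResponseTracker` and its method names are not part of the package) and only illustrates the rule that events never close a request while each request still expects one terminal response.

```typescript
interface IncomingMessage {
  kind: "response" | "event";
  type: string;
  operationId?: string;
}

class ResponseTracker {
  private readonly pending = new Set<string>();

  // Call when the host writes a request to the adapter's stdin.
  requestSent(operationId: string): void {
    this.pending.add(operationId);
  }

  // Classifies an adapter message. Events never resolve a pending request.
  receive(message: IncomingMessage): "event" | "response" | "unexpected-response" {
    if (message.kind === "event") return "event";
    if (message.operationId !== undefined && this.pending.delete(message.operationId)) {
      return "response";
    }
    return "unexpected-response"; // duplicate or unknown operationId
  }

  // Requests still waiting for a terminal response, e.g. after the process exits.
  unanswered(): string[] {
    return Array.from(this.pending).sort();
  }
}
```

If the adapter process exits while `unanswered()` is non-empty, the host knows which operations never received their terminal response.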
212
+
213
+ ## Conformance Fixture
214
+
215
+ The fixture under `runner/__tests__/fixtures/external-adapter/` is intentionally small and non-JavaScript. It proves that a conforming adapter can be an external process with no ASL TypeScript imports. Golden transcripts in the same directory define expected request/response behavior for the success path, unsupported action failure, expired deadline failure, cleanup/finalization failure, sequence monotonicity, and artifact references without embedded evidence bytes.
216
+
217
+ ## Read next
218
+
219
+ - [Contracts](contracts.md) for the scenario, runner, artifact, health, verdict, comparison, and provenance shapes