agent-scenario-loop 0.1.2 → 0.1.4
- package/README.md +9 -9
- package/app/profile-session.ts +352 -12
- package/dist/core/agent-summary.d.ts +3 -2
- package/dist/core/agent-summary.js +44 -2
- package/dist/core/artifact-contract.d.ts +28 -8
- package/dist/core/artifact-contract.js +676 -26
- package/dist/core/comparison.d.ts +57 -3
- package/dist/core/comparison.js +113 -1
- package/dist/core/planner.d.ts +32 -1
- package/dist/core/planner.js +144 -0
- package/dist/core/run-index.d.ts +4 -0
- package/dist/core/run-index.js +55 -1
- package/dist/core/schema-validator.d.ts +2 -0
- package/dist/core/schema-validator.js +2 -0
- package/dist/runner/android-adb-driver.d.ts +7 -2
- package/dist/runner/android-adb-driver.js +7 -1
- package/dist/runner/android-adb.d.ts +40 -5
- package/dist/runner/android-adb.js +1046 -664
- package/dist/runner/compare-latest.d.ts +8 -4
- package/dist/runner/compare-latest.js +24 -5
- package/dist/runner/example-android-live.d.ts +10 -1
- package/dist/runner/example-android-live.js +55 -0
- package/dist/runner/example-ios-live.d.ts +10 -1
- package/dist/runner/example-ios-live.js +55 -0
- package/dist/runner/ios-simctl.d.ts +6 -0
- package/dist/runner/ios-simctl.js +7 -0
- package/dist/runner/live-comparison.d.ts +2 -2
- package/dist/runner/live-comparison.js +2 -1
- package/dist/runner/live-proof-summary.d.ts +5 -4
- package/dist/runner/live-proof-summary.js +12 -2
- package/dist/runner/live-proof.d.ts +3 -2
- package/dist/runner/live-proof.js +9 -2
- package/dist/runner/profile-android.d.ts +16 -1
- package/dist/runner/profile-android.js +364 -26
- package/dist/runner/profile-ios.d.ts +13 -2
- package/dist/runner/profile-ios.js +341 -19
- package/dist/runner/profile-mobile.d.ts +39 -3
- package/dist/runner/profile-mobile.js +1054 -42
- package/dist/runner/validate-project.js +3 -0
- package/dist/scripts/consumer-rehearsal.d.ts +119 -0
- package/dist/scripts/consumer-rehearsal.js +757 -0
- package/dist/scripts/downstream-local-package-gate.d.ts +2 -0
- package/dist/scripts/downstream-local-package-gate.js +264 -0
- package/dist/scripts/package-smoke.d.ts +96 -0
- package/dist/scripts/package-smoke.js +2282 -0
- package/dist/scripts/release-readiness.d.ts +2 -0
- package/dist/scripts/release-readiness.js +520 -0
- package/docs/adapters.md +7 -1
- package/docs/api.md +2 -2
- package/docs/architecture.md +90 -0
- package/docs/authoring.md +39 -3
- package/docs/concepts.md +3 -24
- package/docs/consumer-rehearsal.md +31 -1
- package/docs/contracts.md +45 -101
- package/docs/external-adapter-protocol.md +219 -0
- package/docs/live-proofs.md +86 -3
- package/docs/principles.md +9 -15
- package/examples/mobile-app/README.md +12 -0
- package/examples/mobile-app/runner-manifests/evidence-provider.json +3 -3
- package/examples/mobile-app/runner-manifests/primary-runner.json +1 -0
- package/examples/mobile-app/scripts/asl-capture-profiler-provider.mjs +25 -0
- package/examples/runners/README.md +4 -3
- package/examples/runners/adb-android.json +1 -0
- package/examples/runners/agent-device-android.json +1 -0
- package/examples/runners/agent-device-ios.json +1 -0
- package/examples/runners/argent-android.json +1 -0
- package/examples/runners/argent-ios.json +1 -0
- package/examples/runners/axe-accessibility-provider.json +2 -2
- package/examples/runners/script-accessibility-provider.json +2 -2
- package/examples/runners/script-memory-provider.json +2 -2
- package/examples/runners/script-network-provider.json +2 -2
- package/examples/runners/script-profiler-provider.json +2 -2
- package/examples/runners/xcodebuildmcp-ios.json +1 -0
- package/package.json +12 -3
- package/schemas/causal-run.schema.json +85 -2
- package/schemas/comparison.schema.json +130 -2
- package/schemas/external-adapter-message.schema.json +693 -0
- package/schemas/health.schema.json +72 -0
- package/schemas/live-proof-set.schema.json +1 -1
- package/schemas/live-proof.schema.json +14 -6
- package/schemas/manifest.schema.json +515 -4
- package/schemas/profiler.schema.json +243 -0
- package/schemas/runner-capabilities.schema.json +28 -2
- package/schemas/scenario.schema.json +34 -2
- package/templates/evidence-provider.json +3 -3
- package/templates/primary-runner.json +1 -0
- package/templates/scripts/asl-capture-profiler-provider.mjs +20 -0
package/docs/authoring.md
CHANGED
|
@@ -80,12 +80,14 @@ Preferred fields:
|
|
|
80
80
|
- `journey`: human-readable intent, actor, start state, and end state
|
|
81
81
|
- `comparisonLane`: default historical baseline lane for runs of this scenario
|
|
82
82
|
- `milestones`: named event checkpoints with phases and timeouts
|
|
83
|
-
- `cycles`: iteration count and
|
|
83
|
+
- `cycles`: iteration count, stop policy, and optional setup/body step ids
|
|
84
84
|
- `budgets`: thresholds to evaluate only after truth-event health passes
|
|
85
85
|
- `artifacts`: required and optional evidence outputs
|
|
86
86
|
|
|
87
87
|
Use `comparisonLane` when a scenario should always compare within one stable proof mode, such as `feed-open-android-live`. Profile CLIs can also receive `--comparison-lane`; the CLI flag wins when one-off runs need a different lane.
|
|
88
88
|
|
|
89
|
+
For repeated scenarios, separate setup from the measured body. Commands that clear state, navigate home, dismiss modals, or establish readiness should not be measured every iteration unless that cleanup is the journey under test. Use `cycles.setupStepIds` for leading setup commands that run once, or `cycles.bodyStepIds` to name the repeated command body. If neither is provided, ASL profile-session runners infer a conservative setup prefix from readiness waits and measured milestone budgets, but explicit ids are clearer for complex flows.
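As a rough illustration of the setup/body split described above, the sketch below partitions a step list using explicit ids. The `Step` and `Cycles` shapes and both helper functions are hypothetical and are not ASL's implementation; only the `setupStepIds` and `bodyStepIds` field names come from the text. The handling of "everything else" is an assumption stated in the comments.

```typescript
// Hypothetical shapes for illustration; only `setupStepIds` and `bodyStepIds`
// are field names taken from the documentation.
interface Step {
  id: string;
}

interface Cycles {
  iterations: number;
  setupStepIds?: string[];
  bodyStepIds?: string[];
}

// Split steps into a run-once setup list and a repeated body list.
// Assumption: when `bodyStepIds` is given, every other step counts as setup;
// otherwise steps named in `setupStepIds` are setup and the rest are body.
function partitionSteps(steps: Step[], cycles: Cycles): { setup: Step[]; body: Step[] } {
  const bodyIds = cycles.bodyStepIds ?? [];
  if (bodyIds.length > 0) {
    return {
      setup: steps.filter((s) => bodyIds.indexOf(s.id) === -1),
      body: steps.filter((s) => bodyIds.indexOf(s.id) !== -1),
    };
  }
  const setupIds = cycles.setupStepIds ?? [];
  return {
    setup: steps.filter((s) => setupIds.indexOf(s.id) !== -1),
    body: steps.filter((s) => setupIds.indexOf(s.id) === -1),
  };
}

// Expand the partition into the ordered step ids a runner would issue:
// setup once, then the body once per iteration.
function expandRun(steps: Step[], cycles: Cycles): string[] {
  const { setup, body } = partitionSteps(steps, cycles);
  const order = setup.map((s) => s.id);
  for (let i = 0; i < cycles.iterations; i++) {
    order.push(...body.map((s) => s.id));
  }
  return order;
}
```

With steps `clear`, `open`, `close` and `setupStepIds: ["clear"]`, two iterations would issue `clear` once and then `open`, `close` twice, so the cleanup cost never lands inside a measured iteration.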
|
|
90
|
+
|
|
89
91
|
## Truth Events
|
|
90
92
|
|
|
91
93
|
Treat truth events as app-owned facts, not runner observations. The app should emit them from the code path that actually represents the journey state.
|
|
@@ -105,6 +107,32 @@ Weak truth events:
|
|
|
105
107
|
|
|
106
108
|
Timing is not trusted unless scenario health passes. If a required truth event is missing, the run can still write artifacts, but verdicts and comparisons must remain inconclusive.
|
|
107
109
|
|
|
110
|
+
### Resume Scenarios
|
|
111
|
+
|
|
112
|
+
`--lifecycle-phase resume` and related runner controls assert runner-owned lifecycle setup in `manifest.environment`; they do not create product truth events. If a scenario waits for `app_resumed`, `feed_restored_after_resume`, or another resumed-state milestone, the app must emit that event from the code path that proves resumed product readiness.
|
|
113
|
+
|
|
114
|
+
## Budget Intervals
|
|
115
|
+
|
|
116
|
+
Milestone budgets measure the interval the scenario names. A budget with only `toMilestone` measures elapsed time from the run or session clock origin to each matching milestone occurrence. That is correct for startup and first-usable-screen budgets, but it is cumulative for repeated interactions.
|
|
117
|
+
|
|
118
|
+
For transition or gesture budgets, provide both ends of the interval:
|
|
119
|
+
|
|
120
|
+
```json
|
|
121
|
+
{
|
|
122
|
+
"name": "surface transition p95",
|
|
123
|
+
"source": "milestone",
|
|
124
|
+
"metric": "p95",
|
|
125
|
+
"unit": "ms",
|
|
126
|
+
"limit": 300,
|
|
127
|
+
"fromMilestone": "surfaceTransitionRequested",
|
|
128
|
+
"toMilestone": "surfaceSettled"
|
|
129
|
+
}
|
|
130
|
+
```
|
|
131
|
+
|
|
132
|
+
Use app-owned truth events for both milestones. Do not use a command-delivered event as the start point unless that command delivery is the product fact being measured.
|
|
133
|
+
|
|
134
|
+
When the start event is useful only as a timing anchor, keep it optional and keep scenario health tied to the completion truth. For repeated flows, set `metricEvents.milestone` or the completion-oriented cycle events to the truth that proves the iteration completed, then use the optional intent milestone as `fromMilestone` in the budget.
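The difference between a cumulative `toMilestone`-only budget and a two-ended interval can be sketched as follows. This is a minimal illustration under stated assumptions, not ASL's evaluator: the `MilestoneEvent` shape, the `atMs` field, the pairing rule, and the nearest-rank percentile are all hypothetical choices.

```typescript
// Hypothetical event shape: a milestone name plus a timestamp in ms measured
// from the run or session clock origin.
interface MilestoneEvent {
  name: string;
  atMs: number;
}

interface Budget {
  toMilestone: string;
  fromMilestone?: string;
}

// Collect one duration per matching `toMilestone` occurrence.
// Without `fromMilestone`, the duration is elapsed time from the clock origin.
// With `fromMilestone`, it is measured from the most recent unmatched start
// (an assumed pairing rule).
function budgetDurations(events: MilestoneEvent[], budget: Budget): number[] {
  const durations: number[] = [];
  let pendingStart: number | undefined;
  for (const event of events) {
    if (budget.fromMilestone !== undefined && event.name === budget.fromMilestone) {
      pendingStart = event.atMs;
    } else if (event.name === budget.toMilestone) {
      if (budget.fromMilestone === undefined) {
        durations.push(event.atMs);
      } else if (pendingStart !== undefined) {
        durations.push(event.atMs - pendingStart);
        pendingStart = undefined;
      }
    }
  }
  return durations;
}

// Nearest-rank percentile; one of several common p95 definitions.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("no samples");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}
```

For two transitions requested at 1000 ms and 3000 ms and settled at 1200 ms and 3250 ms, the two-ended budget yields 200 and 250, while the `toMilestone`-only form yields 1200 and 3250, which grows with every repetition.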
|
|
135
|
+
|
|
108
136
|
## Steps
|
|
109
137
|
|
|
110
138
|
Use steps to describe intent and required adapter actions:
|
|
@@ -118,6 +146,8 @@ Use steps to describe intent and required adapter actions:
|
|
|
118
146
|
|
|
119
147
|
Use `driverAction` only when the scenario truly requires a concrete operation such as `tap`, `scroll`, `assertVisible`, `screenshot`, `readLogs`, or `collectPerfSignals`. The planner fails early when no active runner or provider can satisfy a required driver action.
|
|
120
148
|
|
|
149
|
+
For profile-session command transport, platform `waitMs` metadata is queue pacing. ASL preserves it in storage and deep-link command envelopes and waits before releasing the next queued command. App-owned milestones still provide the truth that a command produced the intended product state.
|
|
150
|
+
|
|
121
151
|
Use `selector` to describe the intended app target without committing the scenario to one driver. Supported selector kinds are `testId`, `accessibilityId`, `accessibilityLabel`, `text`, `resourceId`, and `xpath`.
|
|
122
152
|
|
|
123
153
|
```json
|
|
@@ -159,12 +189,14 @@ asl-profile-android \
|
|
|
159
189
|
--capture uiTree:artifacts/provider/ui-tree.json
|
|
160
190
|
```
|
|
161
191
|
|
|
162
|
-
Signals are copied into `signals/js`, `signals/memory`, or `signals/network` and listed in `manifest.json`. Captures are copied into `captures`; screenshots are listed in `artifacts.captures.screenshots`, while video and UI tree captures replace the matching named capture path in the manifest. Every attached file is also listed in `artifacts.evidenceAttachments` with kind, run-relative path, source filename, byte size, and
|
|
192
|
+
Signals are copied into `signals/js`, `signals/memory`, or `signals/network` and listed in `manifest.json`. Captures are copied into `captures`; screenshots are listed in `artifacts.captures.screenshots`, while video and UI tree captures replace the matching named capture path in the manifest. Every attached file is also listed in `artifacts.evidenceAttachments` with kind, run-relative path, source filename, byte size, sha256 hash, completeness status, corruption status, redaction status, and transformation list. Attached provider evidence is preserved as proof, but timing verdicts still come from app-owned truth events and budgets.
|
|
163
193
|
|
|
164
|
-
Provider manifests can also declare `providerCommands`. Profile runners execute those commands when passed with `--provider <manifest>`, but only when the provider manifest includes the selected platform. A provider with `platforms: ["ios"]` passed to an Android profile writes failed `health.json` with `provider_platform_unsupported` and does not run the command. Commands run without a shell, can use placeholders such as `{providerDir}`, `{runDir}`, `{runId}`, `{scenarioId}`, and `{platform}`, and must declare their output files. Provider-channel outputs are copied or preserved under `raw/providers/<provider-id>/` and inventoried in `artifacts.evidenceAttachments`; signal and capture outputs can still map into the standard `signals/*` or `captures/` folders. Command stdout, stderr, exit code, phase, and argv are preserved under `raw/provider-commands/`. When a provider command exits nonzero, the runner writes failed `health.json`, inconclusive `verdict.json`, and `agent-summary.md` with a next-action hint instead of making timing claims.
|
|
194
|
+
Provider manifests can also declare `providerCommands`. Profile runners execute those commands when passed with `--provider <manifest>`, but only when the provider manifest includes the selected platform. Commands run after the platform evidence source has been collected or supplied, so heavy diagnostics can be attached in a post-loop rehydration run with `--adb-artifacts` or `--simctl-artifacts` instead of perturbing the measured command window. Use `phase: "afterCapture"` for capture-sidecar diagnostics and `phase: "postRun"` for post-profile enrichment; older `capture` phase values remain accepted for existing manifests. A provider with `platforms: ["ios"]` passed to an Android profile writes failed `health.json` with `provider_platform_unsupported` and does not run the command. Commands run without a shell, can use placeholders such as `{providerDir}`, `{runDir}`, `{runId}`, `{scenarioId}`, and `{platform}`, and must declare their output files. Provider-channel outputs are copied or preserved under `raw/providers/<provider-id>/` and inventoried in `artifacts.evidenceAttachments`; signal and capture outputs can still map into the standard `signals/*` or `captures/` folders. An output can set `required: true` when the provider treats that file as required evidence; matching entries in `manifest.artifacts.diagnostics` then remain marked required in addition to scenario-authored `artifacts.required` and `requiredCapabilities`. Command stdout, stderr, exit code, phase, and argv are preserved under `raw/provider-commands/`. When a provider command exits nonzero, the runner writes failed `health.json`, inconclusive `verdict.json`, and `agent-summary.md` with a next-action hint instead of making timing claims.
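To make the placeholder and no-shell behavior concrete, here is a small sketch of argv expansion. The placeholder names come from the text above; the `PlaceholderContext` type, the `expandArgv` helper, and the choice to leave unknown placeholders untouched are hypothetical and may differ from what the runners actually do.

```typescript
// Values a runner could substitute. The placeholder names come from the
// documentation; this context type and the helper are hypothetical.
type PlaceholderContext = {
  providerDir: string;
  runDir: string;
  runId: string;
  scenarioId: string;
  platform: string;
};

// Expand `{name}` placeholders in each argv element independently.
// Keeping argv as an array (never joining it into one string) is what lets a
// command run without a shell: no quoting or word splitting is involved.
function expandArgv(argv: string[], context: PlaceholderContext): string[] {
  const values: Record<string, string> = { ...context };
  return argv.map((arg) =>
    arg.replace(/\{(\w+)\}/g, (match: string, name: string) => {
      const value = values[name];
      // Assumption: unknown placeholders are left untouched rather than failing.
      return value === undefined ? match : value;
    }),
  );
}
```

Because each element is expanded in place, a run directory containing spaces still arrives at the provider command as a single argument.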
|
|
165
195
|
|
|
166
196
|
The `examples/runners/script-*.json` manifests show package-neutral wrappers for accessibility, profiler, memory, and network evidence. They intentionally reference placeholder commands such as `capture-accessibility` or `capture-memory`; replace those with your project-local script, binary, or agent command. The contract that matters is the declared output path and evidence kind, not the specific tool used to create the file.
|
|
167
197
|
|
|
198
|
+
For React Native profiling, prefer a provider that emits both the raw profiler export and a structured JSON summary. JSON outputs with `kind: "profiler"` are validated against ASL's profiler evidence schema, so include the provider id, platform, run id, scenario id, tool metadata, completeness status, and at least one content surface such as samples, metrics, events, traces, a profile object, summary, or attachment references. If profiler evidence depends on explicit start/stop commands, model it as lifecycle-owned evidence: declare `captureMode`, `profileKind`, `lifecycle`, `targetBinding`, and `comparability` so agents can distinguish passive existing reports from session captures, inline captures that may perturb budgets, and after-capture or rehydrated diagnostics. CPU summaries derived from a prior profiler session should not be attached as passive evidence unless the provider also preserves the session provenance and raw attachments. If your profiler only produces a native trace or flamegraph, attach it as preserved evidence and avoid making performance claims until a provider translates the relevant facts into structured metrics.
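A provider can check the "at least one content surface" requirement before writing its summary. The sketch below is illustrative only: the key names are guesses derived from the list in the paragraph above, and the real property names should be taken from `schemas/profiler.schema.json`.

```typescript
// Assumed key names for the content surfaces the text lists; check
// schemas/profiler.schema.json for the real property names.
const CONTENT_SURFACE_KEYS = [
  "samples",
  "metrics",
  "events",
  "traces",
  "profile",
  "summary",
  "attachments",
];

// Treat null, undefined, empty arrays, empty objects, and empty strings as absent.
function isNonEmpty(value: unknown): boolean {
  if (value === null || value === undefined) return false;
  if (Array.isArray(value)) return value.length > 0;
  if (typeof value === "object") return Object.keys(value as object).length > 0;
  if (typeof value === "string") return value.length > 0;
  return true;
}

// True when the evidence object carries at least one populated content surface.
function hasContentSurface(evidence: Record<string, unknown>): boolean {
  return CONTENT_SURFACE_KEYS.some((key) => isNonEmpty(evidence[key]));
}
```

A pre-write check like this lets a provider fail with its own message instead of relying on schema validation to reject a metadata-only summary.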
|
|
199
|
+
|
|
168
200
|
## Artifacts
|
|
169
201
|
|
|
170
202
|
A completed profile run should leave the standard artifact set:
|
|
@@ -196,3 +228,7 @@ Run the release gate before publishing package changes:
|
|
|
196
228
|
```bash
|
|
197
229
|
pnpm release:check
|
|
198
230
|
```
|
|
231
|
+
|
|
232
|
+
## Read next
|
|
233
|
+
|
|
234
|
+
- [Adapter Onboarding](adapters.md) for runner and provider integration
|
package/docs/concepts.md
CHANGED
|
@@ -104,33 +104,12 @@ The tooling may change. The runners may change. The agents may change. The scena
|
|
|
104
104
|
|
|
105
105
|
That is a different philosophy from frameworks that primarily evaluate agents. Agent Scenario Loop is built to evaluate the evolution of software.
|
|
106
106
|
|
|
107
|
-
##
|
|
107
|
+
## Boundary
|
|
108
108
|
|
|
109
|
-
Agent Scenario Loop
|
|
109
|
+
Agent Scenario Loop is not a replacement for testing frameworks, automation tools, mobile drivers, profilers, or agent evaluation systems. Those tools can still execute or observe work.
|
|
110
110
|
|
|
111
|
-
|
|
112
|
-
|
|
113
|
-
> Did the application behave correctly?
|
|
114
|
-
|
|
115
|
-
Agent Scenario Loop optimizes for:
|
|
116
|
-
|
|
117
|
-
> What did we learn from running this scenario?
|
|
118
|
-
|
|
119
|
-
Both questions matter. Agent Scenario Loop focuses on the second question by preserving health, verdicts, metrics, logs, traces, comparisons, and other run evidence in a stable artifact shape.
|
|
120
|
-
|
|
121
|
-
## How it differs from agent evaluation
|
|
122
|
-
|
|
123
|
-
Agent Scenario Loop is not primarily evaluating agents.
|
|
124
|
-
|
|
125
|
-
An agent may execute part of a run. A runner may drive a device. A profiler may collect signals. None of those is the center of the model.
|
|
126
|
-
|
|
127
|
-
The scenario is.
|
|
128
|
-
|
|
129
|
-
The feed, livestream, upload flow, checkout flow, or conversation thread is the thing being studied over time.
|
|
111
|
+
The canonical boundary list lives in [What It Is Not](../README.md#what-it-is-not).
|
|
130
112
|
|
|
131
113
|
## Read next
|
|
132
114
|
|
|
133
115
|
- [Principles](principles.md) for the project doctrine
|
|
134
|
-
- [Contracts](contracts.md) for the current artifact and package surface
|
|
135
|
-
- [Live Proofs](live-proofs.md) for fixture, Android, iOS, and comparison runs
|
|
136
|
-
- [Runner docs](../runner/README.md) for the host execution boundary
|
package/docs/consumer-rehearsal.md
CHANGED
|
@@ -16,6 +16,32 @@ Package gates run child package-manager and CLI commands with a bounded timeout.
|
|
|
16
16
|
ASL_PACKAGE_GATE_TIMEOUT_MS=300000 pnpm consumer:rehearse
|
|
17
17
|
```
|
|
18
18
|
|
|
19
|
+
## Downstream Local-Package Gate
|
|
20
|
+
|
|
21
|
+
Before publishing a release candidate, validate the packed local package inside at least one real downstream app when that app has already adopted durable ASL scenarios. This catches package, runner, schema, and helper regressions before npm distribution.
|
|
22
|
+
|
|
23
|
+
From this repository, run the opt-in downstream gate with an explicit app root and explicit command arrays:
|
|
24
|
+
|
|
25
|
+
```bash
|
|
26
|
+
pnpm downstream:local-package -- \
|
|
27
|
+
--app-root /path/to/adopter-app \
|
|
28
|
+
--expected-branch chore/agent-scenario-loop-adoption \
|
|
29
|
+
--command-json '["pnpm","run","asl:validate"]'
|
|
30
|
+
```
|
|
31
|
+
|
|
32
|
+
The gate packs the current checkout, installs the tarball into the downstream app with `pnpm add`, verifies `node_modules/agent-scenario-loop/package.json` matches the local candidate version, runs the supplied commands, and restores `package.json` plus `pnpm-lock.yaml` unless `--keep-install` is passed. Generated downstream proof artifacts remain the consumer app's local ignored state.
|
|
33
|
+
|
|
34
|
+
For live probes, pass direct package CLI commands as additional JSON arrays so the target scenario and artifact root are explicit:
|
|
35
|
+
|
|
36
|
+
```bash
|
|
37
|
+
pnpm downstream:local-package -- \
|
|
38
|
+
--app-root /path/to/adopter-app \
|
|
39
|
+
--command-json '["pnpm","run","asl:validate"]' \
|
|
40
|
+
--command-json '["node_modules/.bin/asl-profile-android","--config","asl.config.json","--scenario","scenarios/mobile/first-journey.json","--adb-capture","--profile-session","--android-profile-session-storage","--launch","--out","artifacts/asl/android","--run-id","first-journey-android-local-candidate"]'
|
|
41
|
+
```
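Each `--command-json` value in the commands above is a JSON array of strings. A minimal sketch of how such a value could be validated before execution follows; `parseCommandJson` is a hypothetical helper and is not taken from the gate script.

```typescript
// Hypothetical helper: parse one `--command-json` value into an argv array.
// Rejects anything that is not a non-empty array of strings, so a command is
// never assembled from a free-form string that would need shell parsing.
function parseCommandJson(raw: string): string[] {
  const parsed: unknown = JSON.parse(raw);
  if (!Array.isArray(parsed) || parsed.length === 0) {
    throw new Error("command JSON must be a non-empty array");
  }
  const argv: string[] = [];
  for (const part of parsed) {
    if (typeof part !== "string") {
      throw new Error("command JSON entries must all be strings");
    }
    argv.push(part);
  }
  return argv;
}
```

Passing explicit arrays keeps the executable and every argument visible in the gate invocation, which matches the stated goal of making the target scenario and artifact root explicit.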
|
|
42
|
+
|
|
43
|
+
Keep adopter-specific app ids, storage keys, dev-client URLs, simulator UDIDs, auth state, accounts, and scenarios in ignored local environment state or in the consuming app. ASL owns the package candidate and evidence contract; the downstream app owns product truth.
|
|
44
|
+
|
|
19
45
|
## 1. Initialize The Scaffold
|
|
20
46
|
|
|
21
47
|
From the consuming app root:
|
|
@@ -81,7 +107,7 @@ asl-check-plan --scenario scenarios/mobile/first-journey.json --runner runner-ma
|
|
|
81
107
|
asl-profile-ios --config asl.config.json --scenario scenarios/mobile/first-journey.json --simctl-capture --profile-session --profile-session-storage --launch --out artifacts/asl/ios --run-id first-journey-ios-live --comparison-lane first-journey-ios-live
|
|
82
108
|
```
|
|
83
109
|
|
|
84
|
-
For Expo dev-client builds, set `ASL_ANDROID_DEV_CLIENT_URL` or `ASL_IOS_DEV_CLIENT_URL` to the app's dev-client URL in ignored local env state.
|
|
110
|
+
For Expo dev-client builds, set `ASL_ANDROID_DEV_CLIENT_URL` or `ASL_IOS_DEV_CLIENT_URL` to the app's dev-client URL in ignored local env state. Prefer the LAN URL advertised by Metro for physical-device validation. Use `127.0.0.1` only when the selected simulator/emulator resolves that address back to the host Metro process. Android opens the dev-client URL before profile-session control. When Android storage transport is enabled, ASL waits for `Running "main"` by default before writing profile-session storage; override `ASL_ANDROID_DEV_CLIENT_READY_PATTERN` only when the app has a better readiness marker. If startup readiness fails, ASL reports an unhealthy run and skips command delivery instead of writing into a stale native shell. iOS opens the dev-client URL before reading stored profile-session evidence.
|
|
85
111
|
|
|
86
112
|
When Android deep-link delivery is unreliable in a dev-client shell, use `--android-profile-session-storage` so `asl-profile-android` seeds the app-owned AsyncStorage session through `run-as` before collecting evidence. The runner reads the selected device clock for the session start timestamp, which keeps app-emitted milestone durations meaningful. Keep custom storage key overrides local to the consuming app.
|
|
87
113
|
|
|
@@ -113,3 +139,7 @@ Before expanding beyond the first journey, confirm:
|
|
|
113
139
|
- at least one platform has a passed live proof
|
|
114
140
|
|
|
115
141
|
Only then add more scenarios, providers, or runner adapters.
|
|
142
|
+
|
|
143
|
+
## Read next
|
|
144
|
+
|
|
145
|
+
- [Live Proofs](live-proofs.md) for fixture, Android, iOS, comparison, and release-proof commands
|
package/docs/contracts.md
CHANGED
|
@@ -4,6 +4,8 @@ This package ships the scenario, runner, and artifact contracts that make Agent
|
|
|
4
4
|
|
|
5
5
|
The package is intentionally contract-first: adopt the scenario and artifact shape once, then add or swap runner loops without rewriting your scenarios.
|
|
6
6
|
|
|
7
|
+
See [Architecture](architecture.md) for the TypeScript-first implementation and language-neutral contract boundary.
|
|
8
|
+
|
|
7
9
|
## What ships today
|
|
8
10
|
|
|
9
11
|
- [app/profile-session.ts](../app/profile-session.ts): thin React Native integration for session control, truth events, and signal attachments
|
|
@@ -64,15 +66,20 @@ Portable scenario manifests describe the durable app behavior before choosing a
|
|
|
64
66
|
- `truthEvents`: app-owned milestone events keyed by stable milestone id
|
|
65
67
|
- `milestones`: inspectable milestone list with event names, phases, timeouts, and descriptions
|
|
66
68
|
- `expectedEvents`: event names the runner or log ingest should expect to observe
|
|
67
|
-
- `cycles`: repeat count, warmup count,
|
|
69
|
+
- `cycles`: repeat count, warmup count, failure policy, and optional setup/body step ids for repeated journeys
|
|
68
70
|
- `budgets`: product thresholds evaluated only after scenario health passes
|
|
69
71
|
- `steps`: runner-facing launch, command, wait, gesture, and capture actions
|
|
70
72
|
- `selector`: optional app target on a step, such as a test id, accessibility id, label, text, resource id, or xpath
|
|
73
|
+
- `uiContext`: optional UI ownership requirement on a step; UI driver actions default to `app`
|
|
71
74
|
- `artifacts`: required and optional evidence outputs
|
|
72
75
|
|
|
73
76
|
The scenario contract is intentionally runner-neutral. Runners can map steps to adb, XcodeBuildMCP, agent-device, accessibility tools, profilers, or custom scripts while preserving the same journey, milestones, budgets, and expected events.
|
|
74
77
|
|
|
75
|
-
|
|
78
|
+
For repeated mobile command scenarios, `cycles.setupStepIds` names leading setup commands that run once before measured cycle work, while `cycles.bodyStepIds` names the first repeated body commands when inference would be ambiguous. Built-in profile-session runners also infer a setup prefix conservatively: leading readiness commands or leading commands before the first measured milestone command run once, and the remaining command body repeats for `cycles.iterations`. Wait gates remain strict; ASL does not synthesize missing app-owned truth events.
|
|
79
|
+
|
|
80
|
+
Runner capabilities describe ownership, such as launch, session control, command execution, log capture, artifact writing, or profiler support. Driver actions describe the concrete operations an adapter can perform inside a run. UI contexts describe which surface the runner or provider can own: `app`, `systemDialog`, `notificationShade`, `externalBrowser`, `webView`, `shareSheet`, `picker`, or `otherApp`. UI and capture driver actions default to `app` when a step omits `uiContext`; a scenario must opt into system or external contexts explicitly. A runner may be able to own a scenario lifecycle without supporting every driver action or UI context; the planner fails when a required step declares a `driverAction` or `uiContext` that the selected runner or an active provider does not declare.
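The planner rule above can be sketched as a small gap check. The UI context names come from the text; the `PlannedStep` and `DeclaredSupport` shapes and the `findRequiredGaps` helper are hypothetical, and the sketch assumes every driver action it sees is a UI or capture action.

```typescript
// UI context names are taken from the documentation; the step and
// declaration shapes below are hypothetical.
type UiContext =
  | "app"
  | "systemDialog"
  | "notificationShade"
  | "externalBrowser"
  | "webView"
  | "shareSheet"
  | "picker"
  | "otherApp";

interface PlannedStep {
  id: string;
  required: boolean;
  driverAction?: string;
  uiContext?: UiContext;
}

// Union of what the selected runner and active providers declare.
interface DeclaredSupport {
  driverActions: string[];
  uiContexts: UiContext[];
}

// Return one message per gap on a required step.
// Assumption: every driver action here is a UI or capture action, so a
// missing `uiContext` defaults to "app".
function findRequiredGaps(steps: PlannedStep[], support: DeclaredSupport): string[] {
  const gaps: string[] = [];
  for (const step of steps) {
    if (!step.required || step.driverAction === undefined) continue;
    if (support.driverActions.indexOf(step.driverAction) === -1) {
      gaps.push(`${step.id}: driverAction ${step.driverAction} is not declared`);
    }
    const context: UiContext = step.uiContext ?? "app";
    if (support.uiContexts.indexOf(context) === -1) {
      gaps.push(`${step.id}: uiContext ${context} is not declared`);
    }
  }
  return gaps;
}
```

A runner that declares only `app` passes a plain `tap` step but produces a gap for a required `tap` that opts into `systemDialog`, which is the explicit opt-in the paragraph describes.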
|
|
81
|
+
|
|
82
|
+
Planner compatibility artifacts and planner-derived `health.json` include a `downgradePolicy` block with `mode: "no-silent-downgrade"`. Required capability, driver-action, UI-context, or artifact gaps are recorded as `unsupported`; optional gaps are recorded as warnings. `allowedSubstitutions` and `substitutions` are explicit arrays, so future semantic downgrades must be visible in artifacts instead of being inferred from a passed plan.
|
|
76
83
|
|
|
77
84
|
`buildScenarioExecutionPlan()` turns the same scenario steps into a deterministic adapter-facing work list. Each normalized step records the scenario step id, original kind, required flag, optional driver action, and the runner port method that owns it: `launch`, `executeStep`, `waitForTruthEvent`, or `captureEvidence`.
|
|
78
85
|
|
|
@@ -80,11 +87,15 @@ Android adb capture routes normalized steps with `driverAction: "tap"`, `"scroll
|
|
|
80
87
|
|
|
81
88
|
When Android adb `tap` or `scroll` steps provide a portable selector instead of coordinates, the runner captures `uiautomator dump` output, resolves supported selector kinds against node bounds, and derives adb input coordinates before executing the action. Built-in Android selector resolution supports `testId`, `resourceId`, `accessibilityId`, `accessibilityLabel`, and `text`; `xpath` stays available for external runners with native selector engines.
|
|
82
89
|
|
|
83
|
-
I/O from iOS simctl capture routes through the simctl driver adapter. `readLogs` preserves bounded simulator logs under `raw/ios-simctl-log.txt`. A scenario step with `driverAction: "screenshot"` or `artifact: "screenshot"` requests a screenshot capture, defaulting to `captures/ios-screenshot.png
|
|
90
|
+
I/O from iOS simctl capture routes through the simctl driver adapter. `readLogs` preserves bounded simulator logs under `raw/ios-simctl-log.txt`. A scenario step with `driverAction: "screenshot"` or `artifact: "screenshot"` requests a screenshot capture, defaulting to `captures/ios-screenshot.png`. The profile manifest records the resulting capture path in `artifacts.captures.screenshots`, and capture metadata records any supported simulator screenshot options the runner used.
|
|
91
|
+
|
|
92
|
+
Manifest artifact paths are evidence claims. Optional diagnostics such as `captures.video`, `captures.uiTree`, `raw.deviceLog`, JS/memory/network signals, accessibility exports, and profiler files appear as paths only when the file was produced or intentionally referenced as a sidecar dependency. Every profile manifest also includes `artifacts.diagnostics`, an inventory of common diagnostic surfaces with `kind`, `status`, `required`, optional `path`, and a `reason`/`nextAction` when evidence was unavailable or not requested.
|
|
84
93
|
|
|
85
94
|
Planner compatibility also validates the adapter metadata that built-in runners require. Android adb `tap` steps need either `adapterOptions.androidAdb.x/y` or a portable selector; Android adb `scroll` steps need either `startX/startY/endX/endY` or a portable selector; iOS simctl command metadata needs non-empty command strings and positive integer waits/repeat counts. Argent `tap` steps need `adapterOptions.argent.x/y`, Argent `scroll` steps need `adapterOptions.argent.startX/startY/endX/endY`, and Argent `assertVisible` steps need a portable selector. These failures become `invalid_adapter_options` health checks before runtime execution starts.
|
|
86
95
|
|
|
87
|
-
Adapter-target fixtures such as `agent-device-android`, `agent-device-ios`, `argent-ios`, `argent-android`, `argent-react-profiler-provider`, and `axe-accessibility-provider` describe where external tools can plug into the same contract. They are schema-checked and planner-tested capability manifests. The bundled `agent-device` capture runner implements the portable interaction subset for iOS and Android; broader agent-device surfaces such as React DevTools, traces, network, and performance still need explicit adapters or provider attachments before they become part of the stable artifact contract. The bundled Argent runner implements launch, coordinate-backed gestures, screenshot requests, and description-backed visibility proof for portable selector match modes while keeping React profiler output in a separate Android evidence-provider lane. Argent command-surface checks prove the configured tools exist; runtime health still owns whether the selected device backend produced screenshot evidence. Required screenshot failures fail health, and optional screenshot failures are preserved as warnings. Active evidence providers can satisfy required evidence artifacts and provider-owned driver actions such as `collectPerfSignals`; providers outside the selected platform do not contribute to the match. When those tools write files independently,
|
|
96
|
+
Adapter-target fixtures such as `agent-device-android`, `agent-device-ios`, `argent-ios`, `argent-android`, `argent-react-profiler-provider`, and `axe-accessibility-provider` describe where external tools can plug into the same contract. They are schema-checked and planner-tested capability manifests. The bundled `agent-device` capture runner implements the portable interaction subset for iOS and Android; broader agent-device surfaces such as React DevTools, traces, network, and performance still need explicit adapters or provider attachments before they become part of the stable artifact contract. The bundled Argent runner implements launch, coordinate-backed gestures, screenshot requests, and description-backed visibility proof for portable selector match modes while keeping React profiler output in a separate Android evidence-provider lane. Argent command-surface checks prove the configured tools exist; runtime health still owns whether the selected device backend produced screenshot evidence. Required screenshot failures fail health, and optional screenshot failures are preserved as warnings. Active evidence providers can satisfy required evidence artifacts and provider-owned driver actions such as `collectPerfSignals`; providers outside the selected platform do not contribute to the match. When those tools write files independently, attached provider evidence lands in the stable manifest and artifact layout. The `script-accessibility-provider`, `script-profiler-provider`, `script-memory-provider`, and `script-network-provider` examples show provider-command wrappers for project-local tools without making those tools package dependencies.
|
|
97
|
+
|
|
98
|
+
Profiler evidence is a first-class artifact kind, but ASL does not pretend every profiler tool has the same native format. JSON profiler outputs should satisfy [schemas/profiler.schema.json](../schemas/profiler.schema.json), including provider, platform, run, scenario, tool/completeness metadata, and at least one useful content surface such as samples, metrics, events, traces, a profile object, summary, or referenced attachments. Lifecycle-backed profilers should also declare whether evidence came from passive report ingestion, an explicit session, inline capture, `afterCapture`, `postRun`, or rehydration; whether the target device/app binding was verified; whether capture perturbed timing; and whether the output is comparable or diagnostic-only. Native traces, CPU profiles, flamegraphs, React DevTools exports, and recordings can still be attached as profiler evidence through provider outputs, but agents should treat them as preserved evidence until a provider also emits structured metrics that ASL can compare or summarize.
|
|
88
99
|
|
|
89
100
|
## Public artifact layout
|
|
90
101
|
|
|
@@ -111,11 +122,23 @@ Profile runner artifacts:
|
|
|
111
122
|
|
|
112
123
|
`manifest.json`, `metrics.json`, `budget-verdict.json`, and `causal-run.json` are schema-checked before the runner writes them. This keeps profile artifacts stable across fixture logs, adb-captured logs, and future runner adapters.
|
|
113
124
|
|
|
125
|
+
`causal-run.json` preserves app-emitted timeline events through the public causal phase/status vocabulary. If an app emits richer phase or status values, ASL writes schema-valid top-level values and preserves the originals as timeline metadata. Timeline metadata also preserves scalar correlation fields such as `iteration`, `sequence`, `queueId`, `commandId`, `operationId`, `attemptId`, and `clockDomain` when the app emits them. Profile-session command acknowledgements are included as ASL-owned timeline entries with command status, result, source, sequence, queue, wait, and command ID metadata, so agents can inspect runtime ordering without treating command transport as product truth. Repeated runs include `iterationSummary` so agents can distinguish complete, partial, failed, and timeout iteration evidence without scraping raw logs. Scenarios without budget thresholds still produce schema-valid causal artifacts with an empty `budgets` object.
|
|
126
|
+
|
|
127
|
+
`manifest.attempt` records the run attempt identity and terminal semantics independently of prose summaries. It includes an `attemptId`, `attemptNumber`, `maxAttempts`, optional retry lineage, terminal state, failure classification, cleanup outcome, and whether preserved partial artifacts are valid for diagnosis. Retry attempts must identify the prior attempt and retry reason. A failed attempt can therefore keep usable raw evidence without implying that product verdict, timing, or comparison claims are trustworthy.
|
|
128
|
+
|
|
129
|
+
`manifest.provenance.cohort` records product-neutral compatibility inputs for comparing runs. Profile runners populate known fields such as `appId`, `platform`, `runnerName`, `runnerVersion`, `commandTransport`, and active provider IDs; richer callers can add app/build version, build mode, OS version, device class, feature flags, and seed identity. ASL derives `manifest.provenance.cohortHash` from the normalized cohort. Latest-trusted comparison requires the same cohort hash when the current run records one, so old artifacts remain comparable only when the current artifact has not opted into cohort-aware selection.
|
|
130
|
+
|
|
131
|
+
`manifest.attempt.terminalState` uses a terminal vocabulary of `passed`, `failed`, `timeout`, `cancelled`, `aborted`, `inconclusive`, `unsupported`, `skipped`, and `unhealthy`. Attempt construction rejects misleading terminal combinations: passed attempts must end as `passed`, failed attempts must use a failure terminal state, timeout/cancelled/aborted attempts must preserve valid partial artifact paths, and cleanup statuses such as `passed`, `failed`, or `partial` must include a cleanup message. `manifest.environment` records product-neutral lifecycle and environment preconditions and postconditions. Each field is an assertion object with a `value` and `evidence` state. Generated profile artifacts default to `value: "unknown"` and `evidence: "not-asserted"` unless the runner can prove more. The dedicated `lifecyclePhase` assertion supports `cold-launch`, `warm-launch`, `hot-launch`, `resume`, `foreground`, `background`, `force-stop`, `process-death`, `scene-recreation`, `activity-recreation`, `os-reclaim`, `reboot`, and `relaunch`. This preserves what the runner did not prove instead of letting agents infer installed state, app data state, auth state, route, foreground state, permissions, locale, timezone, theme, font scale, orientation, network, animations, cleanup, data, or artifact completeness from surrounding logs.
|
|
132
|
+
|
|
133
|
+
Profile `agent-summary.md` files include an `attempt` section when the run has a manifest attempt block, including terminal state, cleanup state, partial-artifact validity, and retry lineage. Latest-trusted baseline selection treats attempt-aware runs as baseline-trusted only when health and verdict passed, the attempt is a clean first attempt, cleanup did not fail or remain partial, and partial artifacts are not marked valid diagnostic fragments. Older artifacts without `manifest.attempt` remain legacy-trusted when health and verdict passed, but new attempt-aware runs cannot hide retry laundering behind a green final verdict.
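
The baseline trust rules above can be sketched as a small predicate. This is a minimal illustration, not ASL's implementation: `attemptNumber` and `terminalState` come from the text, while `RunRecord`, `retryOf`, `cleanupStatus`, `partialArtifactsValid`, and the `baselineTrust` function are hypothetical names chosen for the example, and reading "clean first attempt" as attempt number 1 with no retry lineage and a `passed` terminal state is an assumption.

```typescript
// Hypothetical shapes for illustration; field names other than
// attemptNumber and terminalState are assumptions, not ASL's schema.
interface AttemptBlock {
  attemptNumber: number;
  terminalState: string;
  retryOf?: string; // hypothetical retry-lineage pointer
  cleanupStatus: "passed" | "failed" | "partial" | "not-required";
  partialArtifactsValid: boolean; // partial artifacts kept as diagnostic fragments
}

interface RunRecord {
  healthPassed: boolean;
  verdictPassed: boolean;
  attempt?: AttemptBlock; // absent on older artifacts
}

type BaselineTrust = "trusted" | "legacy-trusted" | "untrusted";

function baselineTrust(run: RunRecord): BaselineTrust {
  // Health and verdict gate every other rule.
  if (!run.healthPassed || !run.verdictPassed) return "untrusted";
  // Artifacts without an attempt block keep the legacy rule.
  if (run.attempt === undefined) return "legacy-trusted";
  const attempt = run.attempt;
  const cleanFirstAttempt =
    attempt.attemptNumber === 1 &&
    attempt.retryOf === undefined &&
    attempt.terminalState === "passed";
  const cleanupOk =
    attempt.cleanupStatus !== "failed" && attempt.cleanupStatus !== "partial";
  return cleanFirstAttempt && cleanupOk && !attempt.partialArtifactsValid
    ? "trusted"
    : "untrusted";
}

const legacyRun: RunRecord = { healthPassed: true, verdictPassed: true };
const retriedRun: RunRecord = {
  healthPassed: true,
  verdictPassed: true,
  attempt: {
    attemptNumber: 2,
    terminalState: "passed",
    retryOf: "attempt-001",
    cleanupStatus: "passed",
    partialArtifactsValid: false,
  },
};
```

Under these assumptions a green retry is classified as `untrusted` for baseline purposes, which is the "retry laundering" case the paragraph describes.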
|
|
134
|
+
|
|
135
|
+
Profile runners assert only environment facts they own. Every completed profile manifest records ASL-controlled artifact completeness and cleanup postconditions. Live adb/simctl capture paths also assert runner-controlled foreground state, explicit lifecycle preconditions, and foreground postconditions. Use `--lifecycle-phase <phase>` when a runner can prove a non-cold precondition such as `warm-launch` or `resume`; log-ingest and preexisting artifact ingestion keep those fields `unknown/not-asserted`. Lifecycle assertions are not product milestones: a runner proving `lifecyclePhase: "resume"` does not synthesize `app_resumed` or any other app truth event. Resume readiness must still be emitted by the consuming app when a scenario waits for it.
|
|
136
|
+
|
|
114
137
|
Aggregate live proof commands write `live-proof.json` and `agent-summary.md` under `_live-proof/<run-id>`. The live-proof artifact points to preflight evidence, every scenario run, optional interaction proofs from tools such as agent-device or Argent, optional skipped interaction proof declarations, and optional latest-trusted comparison outputs, giving agents one stable entrypoint after a proof run. Preflight, profile, and interaction pointers include health and verdict status from the linked run artifacts, so agents can see what passed before opening deeper evidence. Interaction proof pointers also include sidecar screenshot capture inventory when the sidecar produced screenshots, plus `warnings` when optional sidecar checks failed without invalidating the required proof. If profile health or verdict fails, requested sidecars are not executed; they are recorded in `skippedInteractionProofs` with a reason and next action so agent feedback stays explicit without mixing runner evidence into an untrusted timing run. The aggregate artifact records `status`, `comparisonStatus`, `comparisonCounts`, optional per-comparison `metricSummary` counts/highlights, and a `nextAction` hint so agents can distinguish failed proof gates, regressions, mixed metric movement, missing baselines, inconclusive comparisons, partial sidecar evidence, and clean summaries without scraping prose.
|
|
115
138
|
|
|
116
139
|
Platform-set proof commands write `live-proof-set.json` and `agent-summary.md` under the caller-provided proof-set output directory. The proof-set artifact records required platforms, present platforms, missing platforms, each linked `live-proof.json`, failed proof reasons, regression-gate reasons, and a next action. This gives agents one stable Android-plus-iOS gate after the per-platform live proofs have written their own aggregate evidence.
|
|
117
140
|
|
|
118
|
-
Provider or custom-script evidence
|
|
141
|
+
Provider or custom-script evidence attachments are copied into stable run folders and inventoried in `manifest.artifacts.evidenceAttachments`. Each inventory entry records the evidence channel, kind, run-relative path, source filename, byte size, sha256 hash, completeness status, corruption status, redaction status, and transformations; it does not preserve local absolute source paths.
|
|
119
142
|
|
|
120
143
|
Evidence folders:
|
|
121
144
|
|
|
@@ -133,6 +156,14 @@ The current profile runner writes health, verdict, agent summary, metrics, causa
|
|
|
133
156
|
|
|
134
157
|
Budgets are supported but optional for adoption.
|
|
135
158
|
|
|
159
|
+
Milestone budget interval semantics are explicit:
|
|
160
|
+
|
|
161
|
+
- `toMilestone` without `fromMilestone` measures elapsed time from the run or session clock origin to the matching milestone occurrence.
|
|
162
|
+
- `fromMilestone` plus `toMilestone` measures the interval between the two app-owned truth events for each iteration.
|
|
163
|
+
- repeated transition, gesture, open, close, scroll, or handoff budgets should use both milestones when the intended number is transition duration rather than cumulative elapsed time.
|
|
164
|
+
|
|
165
|
+
This distinction is visible in `metrics.json`: elapsed milestone-only runs populate `durationsMs` with milestone timestamps, while interval runs populate `durationsMs` with `to - from` values. Timing still remains untrusted unless `health.json` passes.
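
A minimal sketch of the two interval modes, assuming a hypothetical `MilestoneEvent` shape whose `timestampMs` is measured from the run or session clock origin. The type names, the `durationsMs` helper, the first-occurrence-per-iteration rule, and the milestone names in the sample data are illustrative assumptions; only `toMilestone`, `fromMilestone`, and the `to - from` rule come from the text.

```typescript
// Hypothetical event shape for illustration; not ASL's internal type.
interface MilestoneEvent {
  name: string;
  iteration: number;
  timestampMs: number; // milliseconds since the run or session clock origin
}

interface MilestoneBudget {
  toMilestone: string;
  fromMilestone?: string;
}

// Returns one duration per iteration that contains the needed milestone(s).
function durationsMs(events: MilestoneEvent[], budget: MilestoneBudget): number[] {
  const byIteration = new Map<number, Map<string, number>>();
  for (const event of events) {
    const seen = byIteration.get(event.iteration) || new Map<string, number>();
    // Assumption: the first occurrence of a milestone in an iteration wins.
    if (!seen.has(event.name)) seen.set(event.name, event.timestampMs);
    byIteration.set(event.iteration, seen);
  }
  const result: number[] = [];
  const iterations = Array.from(byIteration.keys()).sort((a, b) => a - b);
  for (const iteration of iterations) {
    const seen = byIteration.get(iteration)!;
    const to = seen.get(budget.toMilestone);
    if (to === undefined) continue;
    if (budget.fromMilestone === undefined) {
      result.push(to); // elapsed time from the clock origin
      continue;
    }
    const from = seen.get(budget.fromMilestone);
    if (from !== undefined) result.push(to - from); // interval between milestones
  }
  return result;
}

const sampleEvents: MilestoneEvent[] = [
  { name: "sheet_open_start", iteration: 1, timestampMs: 1000 },
  { name: "sheet_open_end", iteration: 1, timestampMs: 1250 },
  { name: "sheet_open_start", iteration: 2, timestampMs: 3000 },
  { name: "sheet_open_end", iteration: 2, timestampMs: 3300 },
];

// Milestone-only budget: cumulative elapsed values, [1250, 3300].
const elapsed = durationsMs(sampleEvents, { toMilestone: "sheet_open_end" });
// Both milestones: per-iteration transition durations, [250, 300].
const interval = durationsMs(sampleEvents, {
  fromMilestone: "sheet_open_start",
  toMilestone: "sheet_open_end",
});
```

The sample shows why repeated transitions want both milestones: the elapsed numbers grow with each iteration, while the interval numbers stay comparable across iterations.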
|
|
166
|
+
|
|
136
167
|
`buildRunIndex()` can scan an artifact root after runs complete. It indexes folders that contain both `health.json` and `verdict.json`, marks a run trusted only when health and verdict both passed, and lets agents find the latest trusted prior run for a scenario without relying on terminal history.
|
|
137
168
|
|
|
138
169
|
## Supported Runner Surface
|
|
@@ -164,104 +195,17 @@ Not yet shipped as supported public features:
|
|
|
164
195
|
- Computer Use flows
|
|
165
196
|
- product-specific scenarios
|
|
166
197
|
|
|
167
|
-
##
|
|
168
|
-
|
|
169
|
-
Use `check-plan` to validate a scenario, runner manifest, and optional evidence-provider manifests before execution:
|
|
170
|
-
|
|
171
|
-
```bash
|
|
172
|
-
pnpm check-plan -- --scenario examples/scenarios/mobile/app-startup.json --runner examples/runners/xcodebuildmcp-ios.json --platform ios --out artifacts/plan/app-startup
|
|
173
|
-
```
|
|
174
|
-
|
|
175
|
-
This validates the input manifests, writes schema-checked `health.json` and `verdict.json`, writes `agent-summary.md`, and includes the raw planner match in `planner-compatibility.json`.
|
|
176
|
-
|
|
177
|
-
## Android adb readiness
|
|
178
|
-
|
|
179
|
-
Use `android:preflight` to verify adb and connected-device readiness before adding live Android scenario execution:
|
|
180
|
-
|
|
181
|
-
```bash
|
|
182
|
-
pnpm android:preflight -- --package com.example.app --out artifacts/android-adb-preflight
|
|
183
|
-
```
|
|
184
|
-
|
|
185
|
-
The command writes:
|
|
186
|
-
|
|
187
|
-
- `health.json`
|
|
188
|
-
- `verdict.json`
|
|
189
|
-
- `agent-summary.md`
|
|
190
|
-
- `raw/adb-version.txt`
|
|
191
|
-
- `raw/adb-devices.txt`
|
|
192
|
-
- `raw/android-metadata.json`
|
|
193
|
-
|
|
194
|
-
If adb, a connected online device, or an optional package check fails, health fails and the verdict remains `inconclusive`.
|
|
195
|
-
|
|
196
|
-
Add `--capture-logcat --logcat-lines <count>` to write `raw/adb-logcat.txt` in the same artifact folder. Add `--react-native-debug-host <host:port>` with `--package <name>` for React Native development builds that need adb reverse plus the app `debug_http_host` preference before launch; the runner writes `raw/adb-react-native-reverse.txt` and `raw/adb-react-native-debug-host.txt`. Add `--clear-logcat --launch --wait-ms <ms>` with `--package <name>` to clear logs, launch the package, wait for a bounded capture window, and then collect logcat evidence. If requested capture-window setup or logcat capture fails, scenario health fails because timing and event evidence would be incomplete.
|
|
197
|
-
|
|
198
|
-
Use that captured logcat evidence directly with Android profiling:
|
|
199
|
-
|
|
200
|
-
```bash
|
|
201
|
-
pnpm profile:android -- --config core/config-template.json --scenario examples/mobile-app/scenarios/android/app-startup.json --adb-artifacts artifacts/android-adb-preflight --run-id android-run-1
|
|
202
|
-
```
|
|
203
|
-
|
|
204
|
-
Or let Android profiling own the adb capture window before it writes profile artifacts:
|
|
205
|
-
|
|
206
|
-
```bash
|
|
207
|
-
pnpm profile:android -- --config core/config-template.json --scenario examples/mobile-app/scenarios/android/app-startup.json --adb-capture --react-native-debug-host localhost:8097 --clear-logcat --launch --run-id android-run-1
|
|
208
|
-
```
|
|
209
|
-
|
|
210
|
-
## iOS simulator capture
|
|
211
|
-
|
|
212
|
-
Use `profile:ios --simctl-capture` when the example app or a consuming app is already installed on a booted simulator:
|
|
213
|
-
|
|
214
|
-
```bash
|
|
215
|
-
pnpm profile:ios -- --config core/config-template.json --scenario examples/mobile-app/scenarios/ios/app-startup.json --simctl-capture --profile-session --profile-session-storage --launch --run-id ios-run-1
|
|
216
|
-
```
|
|
217
|
-
|
|
218
|
-
The command writes a separate simctl capture folder under the selected output root, seeds the app-owned profile session into native AsyncStorage before launch, then collects stored app profile events after the capture window. Command scenarios seed the scenario command queue through the same storage contract before launch. When `raw/ios-profile-events.log` exists, the iOS profile runner ingests that stored truth-event log; otherwise it falls back to `raw/ios-simctl-log.txt`.
|
|
219
|
-
|
|
220
|
-
For profile-session capture on Android or iOS, omitting `--wait-ms` lets ASL derive the final evidence window from scenario execution waits and cycle count. Explicit `--wait-ms` remains authoritative when a consuming app has a known startup or logging delay that the scenario cannot express.
|
|
221
|
-
|
|
222
|
-
Scenario command targets live in `adapterOptions.iosSimctl.commands`, while the app handles them through `registerProfileCommandTargetHandler`. The iOS proof does not depend on unified logs carrying JavaScript console output; it depends on app-owned stored profile events.
|
|
223
|
-
|
|
224
|
-
## Historical comparison
|
|
225
|
-
|
|
226
|
-
Use `compare` to build `comparison.json` from two completed run folders:
|
|
227
|
-
|
|
228
|
-
```bash
|
|
229
|
-
pnpm compare -- --baseline artifacts/runs/app-startup/baseline --current artifacts/runs/app-startup/current --out artifacts/runs/app-startup/current --fail-on-regression
|
|
230
|
-
```
|
|
231
|
-
|
|
232
|
-
The comparison gate is intentionally strict. If either run failed scenario health, or if the scenario ids do not match, the comparison is `inconclusive`. Numeric budget checks are compared only after that health gate passes. `comparison.json` includes `comparisonBasis` with the baseline/current run ids and run directories, giving agents artifact-local provenance instead of forcing them to infer it from folder names.
|
|
233
|
-
|
|
234
|
-
Use `compare:latest` when an artifact root contains run history and the agent should compare the current run against the newest trusted prior run for the same scenario:
|
|
235
|
-
|
|
236
|
-
```bash
|
|
237
|
-
pnpm compare:latest -- --root artifacts/runs --scenario app-startup --current artifacts/runs/app-startup/current --out artifacts/runs/app-startup/current --fail-on-regression
|
|
238
|
-
```
|
|
239
|
-
|
|
240
|
-
The latest-trusted command excludes the exact current run directory from baseline selection. Baseline trust requires passed health and passed verdict. Current runs must pass scenario health before the command will compare timing or budget evidence. If the current manifest declares `comparisonLane`, baseline selection is scoped to trusted prior runs with the same lane; if the current manifest has no lane, selection stays within unlabeled trusted prior runs. Profile manifests also include `scenarioHash`, a stable fingerprint of the normalized scenario contract. When the current run has that hash, latest-trusted selection only compares against trusted prior runs with the same hash; legacy runs without the hash remain comparable only to legacy current runs. This keeps proof modes such as plain live proof and live proof plus agent-device sidecar from comparing against each other, and it keeps migrated scenario definitions from poisoning before/after verdicts. Latest-trusted artifacts set `comparisonBasis.strategy` to `latest_trusted_prior` and record selection counts for inspected, trusted, trusted-prior, lane-comparable, and scenario-contract-comparable candidates.
|
|
241
|
-
|
|
242
|
-
## Fixture loop
|
|
243
|
-
|
|
244
|
-
Use `demo:loop` to run the current contract without a simulator:
|
|
245
|
-
|
|
246
|
-
```bash
|
|
247
|
-
pnpm demo:loop -- --out artifacts/demo-loop
|
|
248
|
-
```
|
|
249
|
-
|
|
250
|
-
The fixture loop writes:
|
|
198
|
+
## Command guidance
|
|
251
199
|
|
|
252
|
-
|
|
253
|
-
- `preflight/app-startup/verdict.json`
|
|
254
|
-
- `preflight/app-startup/agent-summary.md`
|
|
255
|
-
- `profile-runs/app-startup/demo-baseline/*`
|
|
256
|
-
- `profile-runs/app-startup/demo-current/*`
|
|
257
|
-
- `profile-runs/app-startup/demo-current/comparison.json`
|
|
200
|
+
Contracts defines the schemas, artifact fields, runner surfaces, and trust policy. Runnable walkthroughs live in [Live Proofs](live-proofs.md):
|
|
258
201
|
|
|
259
|
-
|
|
202
|
+
- [plan checks](live-proofs.md#plan-check)
|
|
203
|
+
- [Android adb preflight and profile capture](live-proofs.md#platform-preflight-and-profile-capture)
|
|
204
|
+
- [iOS simctl profile capture](live-proofs.md#platform-preflight-and-profile-capture)
|
|
205
|
+
- [fixture loop](live-proofs.md#fixture-loop)
|
|
206
|
+
- [explicit and latest-trusted comparison](live-proofs.md#comparison)
|
|
207
|
+
- [generic Android and iOS live proof](live-proofs.md#generic-mobile-proof)
|
|
260
208
|
|
|
261
209
|
## Read next
|
|
262
210
|
|
|
263
|
-
- [
|
|
264
|
-
- [Concepts](concepts.md) for the broader product framing
|
|
265
|
-
- [Adapter Onboarding](adapters.md) for adding runners and evidence providers
|
|
266
|
-
- [Consumer App Rehearsal](consumer-rehearsal.md) for adopting the package in an existing app
|
|
267
|
-
- [Runner docs](../runner/README.md) for current runner behavior and limits
|
|
211
|
+
- [Scenario Authoring](authoring.md) for writing portable scenarios against these contracts
|
|
@@ -0,0 +1,219 @@
|
|
|
1
|
+
# External Adapter Protocol
|
|
2
|
+
|
|
3
|
+
ASL core is TypeScript, but the adapter contract is language-neutral. An external adapter is an out-of-process executable that exchanges newline-delimited JSON messages over stdin and stdout. The executable can be written in any language and must not depend on ASL TypeScript internals.
|
|
4
|
+
|
|
5
|
+
This document defines the minimal protocol surface for conformance fixtures and future adapter hosts. JSON Schema and this normative protocol document are the source of truth for portable behavior; built-in TypeScript runners remain implementations of the same contract, not the contract itself. The protocol message schema is published in `schemas/external-adapter-message.schema.json`.
|
|
6
|
+
|
|
7
|
+
## Transport
|
|
8
|
+
|
|
9
|
+
- The host starts the adapter as a child process without a shell.
|
|
10
|
+
- Each message is one UTF-8 JSON object followed by `\n`.
|
|
11
|
+
- stdout is reserved for protocol messages. Diagnostics must go to stderr.
|
|
12
|
+
- Requests and responses are correlated by `operationId`.
|
|
13
|
+
- `seq` is a monotonically increasing integer within each sender's stream.
|
|
14
|
+
- Hosts and adapters maintain independent `seq` streams. A receiver must treat missing, repeated, or non-monotonic `seq` values as protocol health failures.
|
|
15
|
+
- Timestamps use RFC 3339 strings. Timing-sensitive waits must declare their `clockDomain`.
|
|
16
|
+
- Adapters must classify work received after its request `deadline` as a structured deadline failure instead of silently attempting stale work.
|
|
17
|
+
- Paths in artifact references are run-relative unless `uri` is explicitly used.
|
|
18
|
+
- Artifact and raw file references should include `sha256` and `sizeBytes` when the adapter can compute them.
|
|
19
|
+
- Evidence bytes must not be embedded in protocol messages as raw data or base64.
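
The framing and sequence rules above can be sketched as follows. This is an illustration under stated assumptions, not the host's implementation: `Envelope`, `encodeMessage`, `splitLines`, and `SeqTracker` are hypothetical names, and treating any gap larger than 1 as a "missing" `seq` is one reading of the text.

```typescript
// Hypothetical helper names; only the framing rules come from the protocol text.
interface Envelope {
  protocolVersion: string;
  seq: number;
  kind: "request" | "response" | "event";
  type: string;
  operationId?: string; // left optional in this sketch
  body: Record<string, unknown>;
}

// One JSON object per line, terminated by "\n". JSON.stringify escapes
// newlines inside strings, so the encoded object never spans lines.
function encodeMessage(message: Envelope): string {
  return JSON.stringify(message) + "\n";
}

// Splits buffered stdout into complete lines and keeps the unterminated tail
// so it can be prepended to the next chunk.
function splitLines(buffer: string): { lines: string[]; rest: string } {
  const parts = buffer.split("\n");
  const rest = parts.pop() || "";
  return { lines: parts.filter((line) => line.length > 0), rest };
}

// Tracks one sender's seq stream. Assumption: seq increments by exactly 1.
class SeqTracker {
  private last: number | undefined;

  // Returns a problem description, or undefined when the value is acceptable.
  check(seq: number): string | undefined {
    if (!Number.isInteger(seq)) return "seq is not an integer";
    if (this.last !== undefined) {
      if (seq === this.last) return "repeated seq";
      if (seq < this.last) return "non-monotonic seq";
      if (seq > this.last + 1) return "missing seq";
    }
    this.last = seq;
    return undefined;
  }
}

const wire =
  encodeMessage({ protocolVersion: "1.0", seq: 1, kind: "request", type: "hello", operationId: "op-001", body: {} }) +
  encodeMessage({ protocolVersion: "1.0", seq: 2, kind: "request", type: "prepare", operationId: "op-002", body: {} }) +
  '{"partial"';

const { lines, rest } = splitLines(wire);
const tracker = new SeqTracker();
const seqProblems = lines.map((line) => tracker.check((JSON.parse(line) as Envelope).seq));
```

Because hosts and adapters keep independent streams, a receiver would hold one tracker per sender and report any non-`undefined` result as a protocol health failure.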
|
|
20
|
+
|
|
21
|
+
## Envelope
|
|
22
|
+
|
|
23
|
+
Every message uses the same envelope:
|
|
24
|
+
|
|
25
|
+
```json
|
|
26
|
+
{
|
|
27
|
+
"protocolVersion": "1.0",
|
|
28
|
+
"seq": 1,
|
|
29
|
+
"operationId": "op-001",
|
|
30
|
+
"kind": "request",
|
|
31
|
+
"type": "hello",
|
|
32
|
+
"runId": "run-001",
|
|
33
|
+
"attemptId": "attempt-001",
|
|
34
|
+
"deadline": "2026-06-19T12:00:05.000Z",
|
|
35
|
+
"body": {}
|
|
36
|
+
}
|
|
37
|
+
```
|
|
38
|
+
|
|
39
|
+
Fields:
|
|
40
|
+
|
|
41
|
+
| Field | Required | Meaning |
|
|
42
|
+
| --- | --- | --- |
|
|
43
|
+
| `protocolVersion` | yes | Protocol major/minor string. This document defines `1.0`. |
|
|
44
|
+
| `seq` | yes | Sender-local message sequence. |
|
|
45
|
+
| `operationId` | yes | Correlates one request with one response. Cancellation targets this value. |
|
|
46
|
+
| `kind` | yes | `request`, `response`, or `event`. |
|
|
47
|
+
| `type` | yes | Operation or event name. |
|
|
48
|
+
| `runId` | request after `hello` | Stable ASL run identifier. |
|
|
49
|
+
| `attemptId` | request after `hello` | Stable retry/attempt identifier for the run. |
|
|
50
|
+
| `deadline` | request operations | Absolute deadline for bounded work. |
|
|
51
|
+
| `body` | yes | Operation-specific payload. |
|
|
52
|
+
|
|
53
|
+
Responses must echo `protocolVersion`, `operationId`, `runId`, and `attemptId` when those fields were present on the request.
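
A minimal sketch of the echo rule, assuming hypothetical `RequestEnvelope` and `ResponseEnvelope` types and a `buildResponse` helper. Reusing the request `type` on the response is an assumption of this example; the text only requires echoing `protocolVersion`, `operationId`, `runId`, and `attemptId`.

```typescript
// Hypothetical envelope types for illustration.
interface RequestEnvelope {
  protocolVersion: string;
  seq: number;
  operationId: string;
  kind: "request";
  type: string;
  runId?: string;
  attemptId?: string;
  deadline?: string;
  body: Record<string, unknown>;
}

interface ResponseEnvelope {
  protocolVersion: string;
  seq: number;
  operationId: string;
  kind: "response";
  type: string;
  runId?: string;
  attemptId?: string;
  body: Record<string, unknown>;
}

// Builds a response that echoes the correlation fields from the request.
// `seq` is the adapter's own sender-local sequence, not the request's seq.
function buildResponse(
  request: RequestEnvelope,
  seq: number,
  body: Record<string, unknown>,
): ResponseEnvelope {
  const response: ResponseEnvelope = {
    protocolVersion: request.protocolVersion,
    seq,
    operationId: request.operationId,
    kind: "response",
    type: request.type, // assumption: the response reuses the request type
    body,
  };
  // Echo optional identifiers only when the request carried them.
  if (request.runId !== undefined) response.runId = request.runId;
  if (request.attemptId !== undefined) response.attemptId = request.attemptId;
  return response;
}

const launchRequest: RequestEnvelope = {
  protocolVersion: "1.0",
  seq: 3,
  operationId: "op-003",
  kind: "request",
  type: "launch",
  runId: "run-001",
  attemptId: "attempt-001",
  deadline: "2026-06-19T12:00:05.000Z",
  body: { platform: "android" },
};
const launchResponse = buildResponse(launchRequest, 2, { ok: true, result: {} });
```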
|
|
54
|
+
|
|
55
|
+
## Hello And Capability Discovery
|
|
56
|
+
|
|
57
|
+
The first host message must be `hello`. The adapter responds with its identity, supported protocol range, platforms, capabilities, driver actions, artifact outputs, and clock domains.
|
|
58
|
+
|
|
59
|
+
Request body:
|
|
60
|
+
|
|
61
|
+
```json
|
|
62
|
+
{
|
|
63
|
+
"host": {
|
|
64
|
+
"name": "agent-scenario-loop",
|
|
65
|
+
"version": "0.1.x"
|
|
66
|
+
},
|
|
67
|
+
"platform": "android"
|
|
68
|
+
}
|
|
69
|
+
```
|
|
70
|
+
|
|
71
|
+
Response body:
|
|
72
|
+
|
|
73
|
+
```json
|
|
74
|
+
{
|
|
75
|
+
"adapter": {
|
|
76
|
+
"name": "asl-python-conformance-fixture",
|
|
77
|
+
"version": "0.1.0"
|
|
78
|
+
},
|
|
79
|
+
"acceptedProtocolVersion": "1.0",
|
|
80
|
+
"platforms": ["android", "ios"],
|
|
81
|
+
"capabilities": ["prepare", "launch", "command", "truthEvent", "evidence", "cancel", "stop", "finalize"],
|
|
82
|
+
"driverActions": ["tap", "assertVisible"],
|
|
83
|
+
"artifactOutputs": ["logs", "screenshot", "truth-events"],
|
|
84
|
+
"clockDomains": ["host-monotonic", "device-log"]
|
|
85
|
+
}
|
|
86
|
+
```
|
|
87
|
+
|
|
88
|
+
If the adapter cannot support the requested protocol or platform, it must return a structured failure response and then exit cleanly or wait for `finalize`.
|
|
89
|
+
|
|
90
|
+
## Operations
|
|
91
|
+
|
|
92
|
+
All operation responses use:
|
|
93
|
+
|
|
94
|
+
```json
|
|
95
|
+
{
|
|
96
|
+
"ok": true,
|
|
97
|
+
"result": {}
|
|
98
|
+
}
|
|
99
|
+
```
|
|
100
|
+
|
|
101
|
+
or:
|
|
102
|
+
|
|
103
|
+
```json
|
|
104
|
+
{
|
|
105
|
+
"ok": false,
|
|
106
|
+
"failure": {
|
|
107
|
+
"category": "unsupported",
|
|
108
|
+
"code": "unsupported_action",
|
|
109
|
+
"message": "driverAction `pinch` is not supported",
|
|
110
|
+
"retryable": false,
|
|
111
|
+
"details": {
|
|
112
|
+
"driverAction": "pinch"
|
|
113
|
+
}
|
|
114
|
+
}
|
|
115
|
+
}
|
|
116
|
+
```
|
|
117
|
+
|
|
118
|
+
`failure.category` is optional for older adapters but recommended for conformance. Use these product-neutral categories:
|
|
119
|
+
|
|
120
|
+
| Category | Use |
|
|
121
|
+
| --- | --- |
|
|
122
|
+
| `adapter` | Adapter implementation failure that is not more specific. |
|
|
123
|
+
| `cancelled` | Operation was cancelled before completion. |
|
|
124
|
+
| `cleanup` | Stop/finalize/cleanup invariant failed. |
|
|
125
|
+
| `deadline` | Request deadline expired before or during adapter work. |
|
|
126
|
+
| `environment` | Host, device, simulator, permission, or tool environment prevented execution. |
|
|
127
|
+
| `protocol` | Malformed message, invalid sequence, unsupported protocol, or decode failure. |
|
|
128
|
+
| `runner` | Runner orchestration failed outside app product behavior. |
|
|
129
|
+
| `unsupported` | Operation, platform, driver action, or evidence kind is unsupported. |
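
The response shapes and categories above could be modeled as a discriminated union. This is a sketch, not the published schema: the `OperationResult` and `Failure` type names, the `checkDeadline` helper, and the `deadline_expired` and `invalid_deadline` codes are hypothetical; `unsupported_action` and the category names come from the text. Marking the failures as non-retryable is also an assumption.

```typescript
type FailureCategory =
  | "adapter"
  | "cancelled"
  | "cleanup"
  | "deadline"
  | "environment"
  | "protocol"
  | "runner"
  | "unsupported";

interface Failure {
  category?: FailureCategory; // optional for older adapters
  code: string;
  message: string;
  retryable: boolean;
  details?: Record<string, unknown>;
}

type OperationResult =
  | { ok: true; result: Record<string, unknown> }
  | { ok: false; failure: Failure };

// Mirrors the unsupported-action example from the protocol text.
function unsupportedAction(driverAction: string): OperationResult {
  return {
    ok: false,
    failure: {
      category: "unsupported",
      code: "unsupported_action",
      message: "driverAction `" + driverAction + "` is not supported",
      retryable: false,
      details: { driverAction },
    },
  };
}

// Returns a structured failure when work arrives after its deadline,
// or undefined when the adapter may proceed.
function checkDeadline(deadline: string, now: Date): OperationResult | undefined {
  const expiresAt = Date.parse(deadline);
  if (Number.isNaN(expiresAt)) {
    return {
      ok: false,
      failure: { category: "protocol", code: "invalid_deadline", message: "deadline is not a valid timestamp", retryable: false },
    };
  }
  if (now.getTime() > expiresAt) {
    return {
      ok: false,
      failure: { category: "deadline", code: "deadline_expired", message: "request deadline expired before work started", retryable: false },
    };
  }
  return undefined;
}

const expired = checkDeadline("2026-06-19T12:00:05.000Z", new Date("2026-06-19T12:00:06.000Z"));
const onTime = checkDeadline("2026-06-19T12:00:05.000Z", new Date("2026-06-19T12:00:04.000Z"));
const pinch = unsupportedAction("pinch");
```

Narrowing on `ok` lets a host read `failure` only on the failing branch, which keeps stale-work and unsupported-action handling explicit.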
|
|
130
|
+
|
|
131
|
+
### prepare
|
|
132
|
+
|
|
133
|
+
Validates target configuration, creates or verifies run directories, and reports setup metadata. The request body should include `platform`, target identifiers, environment assumptions, and an optional `artifactsRoot`.
|
|
134
|
+
|
|
135
|
+
The response should include normalized target metadata and any adapter-owned artifact directories.
|
|
136
|
+
|
|
137
|
+
### launch
|
|
138
|
+
|
|
139
|
+
Launches or verifies the app, device, browser, or other target. The request body should include `platform`, `target`, and optional launch arguments.
|
|
140
|
+
|
|
141
|
+
The response should include launch status, a target reference when available, and artifact references for raw command output.
|
|
142
|
+
|
|
143
|
+
### executeAction
|
|
144
|
+
|
|
145
|
+
Executes one portable driver action. The request body must include `driverAction` and action-specific input. The adapter must reject unknown or unsupported actions with `ok: false`.
|
|
146
|
+
|
|
147
|
+
### waitCondition
|
|
148
|
+
|
|
149
|
+
Waits for a truth event, UI condition, log marker, or other bounded condition. The request body must include `condition`, `deadline`, and `clockDomain`.
|
|
150
|
+
|
|
151
|
+
The response should include matched truth-event data when available. Timing values are not trustworthy verdict inputs unless scenario health passed.
|
|
152
|
+
|
|
153
|
+
### captureEvidence
|
|
154
|
+
|
|
155
|
+
Captures logs, screenshots, UI trees, videos, profiler output, or provider signals. The response must return artifact references:
|
|
156
|
+
|
|
157
|
+
```json
|
|
158
|
+
{
|
|
159
|
+
"artifacts": [
|
|
160
|
+
{
|
|
161
|
+
"kind": "screenshot",
|
|
162
|
+
"path": "captures/final-screen.png",
|
|
163
|
+
"contentType": "image/png",
|
|
164
|
+
"description": "Final screen after launch",
|
|
165
|
+
"sha256": "0000000000000000000000000000000000000000000000000000000000000000",
|
|
166
|
+
"sizeBytes": 0
|
|
167
|
+
}
|
|
168
|
+
]
|
|
169
|
+
}
|
|
170
|
+
```
|
|
171
|
+
|
|
172
|
+
### cancel
|
|
173
|
+
|
|
174
|
+
Requests cancellation of an in-flight `operationId`. The body must include `targetOperationId` and a human-readable `reason`. Adapters should make cancellation best effort and respond to the original operation with `code: "cancelled"` if it was interrupted.
|
|
175
|
+
|
|
176
|
+
### stop
|
|
177
|
+
|
|
178
|
+
Stops the active app/session/target while preserving evidence produced so far. This is distinct from `finalize`; the adapter may still accept evidence capture or finalization work after stop.
|
|
179
|
+
|
|
180
|
+
If there is no active launched target, `stop` must return a structured cleanup failure instead of pretending cleanup ran. Include `details.cleanupStatus` when the adapter can distinguish `not-required`, `partial`, `failed`, or `passed`.
|
|
181
|
+
|
|
182
|
+
### finalize
|
|
183
|
+
|
|
184
|
+
Flushes pending protocol output, closes adapter-owned resources, and reports final artifact inventory. After a successful `finalize` response the adapter should exit with code `0`.
|
|
185
|
+
|
|
186
|
+
`finalize` is terminal for one adapter attempt. Repeated finalization must return a structured cleanup or protocol failure and must not rewrite the prior artifact inventory.
|
|
187
|
+
|
|
188
|
+
## Events
|
|
189
|
+
|
|
190
|
+
Adapters may emit `event` messages between request responses for truth events, progress, and evidence discovery:
|
|
191
|
+
|
|
192
|
+
```json
|
|
193
|
+
{
|
|
194
|
+
"protocolVersion": "1.0",
|
|
195
|
+
"seq": 4,
|
|
196
|
+
"kind": "event",
|
|
197
|
+
"type": "truthEvent",
|
|
198
|
+
"runId": "run-001",
|
|
199
|
+
"attemptId": "attempt-001",
|
|
200
|
+
"body": {
|
|
201
|
+
"name": "app.ready",
|
|
202
|
+
"clockDomain": "device-log",
|
|
203
|
+
"observedAt": "2026-06-19T12:00:02.000Z",
|
|
204
|
+
"payload": {
|
|
205
|
+
"screen": "Home"
|
|
206
|
+
}
|
|
207
|
+
}
|
|
208
|
+
}
|
|
209
|
+
```
|
|
210
|
+
|
|
211
|
+
Events must not replace the response for an operation. The host should still receive one terminal response for every request except when the process exits unexpectedly.
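
One way a host could enforce this rule is to track pending requests and settle them only on `response` messages. The `PendingOperations` class below is a hypothetical sketch, not part of ASL; it assumes responses always carry the `operationId` of the request they settle.

```typescript
// Hypothetical host-side bookkeeping for request/response pairing.
interface IncomingMessage {
  kind: "response" | "event";
  type: string;
  operationId?: string;
}

class PendingOperations {
  private pending = new Set<string>();

  // Call when the host writes a request to the adapter's stdin.
  begin(operationId: string): void {
    this.pending.add(operationId);
  }

  // Returns true only when a response settled a pending request.
  // Events never settle a request, even if they mention an operationId.
  receive(message: IncomingMessage): boolean {
    if (message.kind !== "response" || message.operationId === undefined) return false;
    return this.pending.delete(message.operationId);
  }

  // Requests still waiting for a terminal response, for example after
  // the adapter process exits unexpectedly.
  unresolved(): string[] {
    return Array.from(this.pending).sort();
  }
}

const operations = new PendingOperations();
operations.begin("op-001");
operations.begin("op-002");
const settledByEvent = operations.receive({ kind: "event", type: "truthEvent" });
const settledByResponse = operations.receive({ kind: "response", type: "launch", operationId: "op-001" });
const stillPending = operations.unresolved();
```

If the adapter exits while `unresolved()` is non-empty, the host has concrete operation IDs to report as health failures rather than inferring them from logs.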
|
|
212
|
+
|
|
213
|
+
## Conformance Fixture
|
|
214
|
+
|
|
215
|
+
The fixture under `runner/__tests__/fixtures/external-adapter/` is intentionally small and non-JavaScript. It proves that a conforming adapter can be an external process with no ASL TypeScript imports. Golden transcripts in the same directory define expected request/response behavior for the success path, unsupported action failure, expired deadline failure, cleanup/finalization failure, sequence monotonicity, and artifact references without embedded evidence bytes.
|
|
216
|
+
|
|
217
|
+
## Read next
|
|
218
|
+
|
|
219
|
+
- [Contracts](contracts.md) for the scenario, runner, artifact, health, verdict, comparison, and provenance shapes
|