pi-agent-browser-native 0.2.47 → 0.2.48
- package/CHANGELOG.md +46 -19
- package/README.md +38 -15
- package/docs/ARCHITECTURE.md +10 -10
- package/docs/COMMAND_REFERENCE.md +35 -21
- package/docs/ELECTRON.md +3 -3
- package/docs/RELEASE.md +28 -19
- package/docs/REQUIREMENTS.md +1 -1
- package/docs/SUPPORT_MATRIX.md +34 -106
- package/docs/TOOL_CONTRACT.md +23 -21
- package/extensions/agent-browser/index.ts +13 -4
- package/extensions/agent-browser/lib/config.ts +2 -0
- package/extensions/agent-browser/lib/input-modes/job.ts +138 -62
- package/extensions/agent-browser/lib/input-modes/params.ts +2 -2
- package/extensions/agent-browser/lib/orchestration/browser-run/artifact-paths.ts +44 -0
- package/extensions/agent-browser/lib/orchestration/browser-run/click-dispatch.ts +42 -19
- package/extensions/agent-browser/lib/orchestration/browser-run/diagnostics.ts +6 -4
- package/extensions/agent-browser/lib/orchestration/browser-run/final-result.ts +18 -9
- package/extensions/agent-browser/lib/orchestration/browser-run/prepare/direct-anchor-download.ts +158 -0
- package/extensions/agent-browser/lib/orchestration/browser-run/prepare/network-page-filter.ts +116 -0
- package/extensions/agent-browser/lib/orchestration/browser-run/prepare/scroll-shims.ts +147 -0
- package/extensions/agent-browser/lib/orchestration/browser-run/prepare/snapshot-filter.ts +183 -0
- package/extensions/agent-browser/lib/orchestration/browser-run/prepare/wait-timeouts.ts +58 -0
- package/extensions/agent-browser/lib/orchestration/browser-run/prepare.ts +19 -653
- package/extensions/agent-browser/lib/orchestration/browser-run/process-output.ts +1 -6
- package/extensions/agent-browser/lib/orchestration/browser-run/session-artifacts.ts +8 -0
- package/extensions/agent-browser/lib/orchestration/browser-run/types.ts +1 -0
- package/extensions/agent-browser/lib/pi-tool-rendering.ts +34 -19
- package/extensions/agent-browser/lib/playbook.ts +4 -4
- package/extensions/agent-browser/lib/results/action-recommendations.ts +3 -3
- package/extensions/agent-browser/lib/web-search.ts +11 -4
- package/package.json +4 -4
- package/scripts/agent-browser-capability-baseline.mjs +6 -3
- package/scripts/doctor.mjs +11 -10
- package/scripts/platform-smoke.mjs +1 -1
|
@@ -18,13 +18,24 @@ This project intentionally blocks normal `agent-browser` bash usage in most agen
|
|
|
18
18
|
|
|
19
19
|
<!-- agent-browser-capability-baseline:start upstream-baseline -->
|
|
20
20
|
<!-- Generated from scripts/agent-browser-capability-baseline.mjs. Run `npm run docs -- command-reference write` to update. Do not edit manually. -->
|
|
21
|
-
This reference is baselined to the locally installed `agent-browser 0.27.
|
|
21
|
+
This reference is baselined to the locally installed `agent-browser 0.27.2` command/help surface, audited against vercel-labs/agent-browser@5185339ca3fdab9848e11b8ec676eecfdec3733f. Upstream `agent-browser` remains the source of truth for command semantics; this file is the local fallback for Pi agent sessions where direct binary help is blocked or discouraged.
|
|
22
22
|
|
|
23
23
|
The lightweight drift check is `npm run verify -- command-reference`. Run it whenever the installed upstream `agent-browser` version changes or this reference is edited.
|
|
24
24
|
|
|
25
25
|
Use `npm run benchmark:agent-browser` or `npm run verify -- benchmark` before and after agent-facing workflow abstractions to measure task success, tool calls, model-visible output size, stale-ref behavior, artifact success, failure-category coverage, and elapsed-time estimates.
|
|
26
26
|
<!-- agent-browser-capability-baseline:end upstream-baseline -->
|
|
27
27
|
|
|
28
|
+
### Upstream 0.27.2 changelog support
|
|
29
|
+
|
|
30
|
+
The 0.27.2 rebaseline is a passthrough-first compatibility update, not a compatibility shim for older upstream releases. The wrapper must not hide these upstream fixes:
|
|
31
|
+
|
|
32
|
+
- click reliability: upstream now scrolls off-viewport elements before coordinate resolution, handles JavaScript dialogs promptly, recovers mouse state after dialog-opening clicks, and reports overlay interception before dispatching input
|
|
33
|
+
- frame-scoped CSS selectors and waits, including cross-process iframe click-coordinate translation
|
|
34
|
+
- wait timeout handling: documented 25s default, honored `--timeout` across wait variants, and appropriate client read budgets for long waits; the native wrapper forwards explicit long waits and derives a subprocess watchdog when top-level `timeoutMs` is omitted
|
|
35
|
+
- form commands: `find label` matches `aria-label` / `aria-labelledby`, `select` errors when no option matches, and `type` parses `--clear` / `--delay` instead of typing them as literal text
|
|
36
|
+
- warm CLI command latency and batch daemon respawn/retry improvements
|
|
37
|
+
- GNU Linux release artifacts pinned to glibc 2.28
|
|
38
|
+
|
|
28
39
|
## Core mental model
|
|
29
40
|
|
|
30
41
|
Input mode chooser (one per call): **`args`** for the default open → snapshot -i → click/fill `@refs` flow; **`semanticAction`** for stable role/text/label targets; **`job`** / **`qa`** for multi-step checks; **`electron`** for desktop apps only; **`sourceLookup`** / **`networkSourceLookup`** are **experimental candidates-only** helpers (not authoritative mappings). Do not pass `--json` in `args`—the wrapper injects it. Match link and button text to the latest snapshot (on `https://example.com/` the main link is `Learn more`, not legacy `More information...` copy). See [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#input-mode-chooser) for snapshot variants (`-i` vs `--compact` vs full) and batching three or more getters.
|
|
@@ -64,7 +75,7 @@ Tool parameters (use exactly one of `args`, `semanticAction`, `job`, `qa`, `sour
|
|
|
64
75
|
|
|
65
76
|
- `args`: exact `agent-browser` CLI tokens after the binary name. Omit when using `semanticAction`, `job`, `qa`, `sourceLookup`, `networkSourceLookup`, or `electron` instead (mutually exclusive).
|
|
66
77
|
- `semanticAction`: optional shorthand for common `find` flows, direct selector/ref click/check/fill, and native dropdown `select`; compiles to upstream argv and is rejected together with `args`, `job`, `qa`, `sourceLookup`, `networkSourceLookup`, or `electron` on the same call.
|
|
67
|
-
- `job`: optional constrained short-workflow schema; compiles to existing upstream `batch` args/stdin, defaults to `batch --bail` (`failFast: true`), and reports the compiled plan in `details.compiledJob`.
|
|
78
|
+
- `job`: optional constrained short-workflow schema; compiles to existing upstream `batch` args/stdin, defaults to `batch --bail` (`failFast: true`), and reports the compiled plan in `details.compiledJob`. Keep stateful jobs short around navigation, click, and rerender boundaries on dynamic apps.
|
|
68
79
|
- `qa`: optional lightweight QA preset; compiles to the same fail-fast batch path and reports `details.compiledQaPreset` plus `details.qaPreset` pass/fail evidence.
|
|
69
80
|
- `sourceLookup`: **EXPERIMENTAL — candidates only** for local UI-to-source hints; compiles to the same `batch` path, reports `details.compiledSourceLookup` and `details.sourceLookup`, and never reclassifies a fully successful upstream batch as failed the way `qa` can (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#sourcelookup) and the longer notes below).
|
|
70
81
|
- `networkSourceLookup`: **EXPERIMENTAL — candidates only** for failed request-to-source hints; compiles to generated `batch`, reports `details.compiledNetworkSourceLookup` and `details.networkSourceLookup`, and never assigns blame or edits files.
|
|
@@ -107,7 +118,7 @@ Treat headed success as browser-context success, not proof that a window is visi
|
|
|
107
118
|
|
|
108
119
|
For local fixtures, remember that `localhost` and `127.0.0.1` are resolved from the browser host, which may differ from the shell that started a temporary HTTP server. `net::ERR_EMPTY_RESPONSE` on `http://localhost:<port>` usually means the browser could not reach that server, not that the page itself rendered blank; the wrapper appends a local fixture hint for common loopback navigation failures. Prefer a host-reachable address when your environment provides one; otherwise use `file://` only for static fixtures and note its limits. `file://` does not provide HTTP headers and may change MIME/CORS/storage/debugger behavior. If `eval --stdin` on a `file://` page returns `null` for even simple DOM expressions, first make sure the JavaScript is in the native tool `stdin` field rather than trailing after `--stdin` in `args`; then treat the result as inconclusive and verify with `snapshot -i`, `get text` on current refs, or screenshots until the fixture can run over reachable HTTP.
|
|
109
120
|
|
|
110
|
-
Temporary HTTP servers and their port/process lifecycle stay outside the native tool. Extension maintainers running real-upstream contract tests can reuse `startAgentBrowserContractFixtureServer()` in [`test/helpers/agent-browser-harness.ts`](
|
|
121
|
+
Temporary HTTP servers and their port/process lifecycle stay outside the native tool. Extension maintainers running real-upstream contract tests can reuse `startAgentBrowserContractFixtureServer()` in [`test/helpers/agent-browser-harness.ts`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/test/helpers/agent-browser-harness.ts) instead of ad-hoc `python3 -m http.server` processes.
|
|
111
122
|
|
|
112
123
|
### React, SPA, and Web Vitals flows
|
|
113
124
|
|
|
@@ -129,7 +140,7 @@ Use `vitals [url]` for Core Web Vitals plus React hydration timing when availabl
|
|
|
129
140
|
{ "args": ["pushstate", "/dashboard?tab=settings"] }
|
|
130
141
|
```
|
|
131
142
|
|
|
132
|
-
For first-navigation setup, start on `about:blank`, then stage routes, cookies, or init scripts before navigating. The relevant v0.27.
|
|
143
|
+
For first-navigation setup, start on `about:blank`, then stage routes, cookies, or init scripts before navigating. The relevant v0.27.2 surfaces are `network route <url> [--abort|--body <json>] [--resource-type <csv>]` and `cookies set --curl <file>`:
|
|
133
144
|
|
|
134
145
|
```json
|
|
135
146
|
{ "args": ["open"], "sessionMode": "fresh" }
|
|
@@ -172,7 +183,7 @@ Do not assume Playwright selector dialects such as `text=Close` or `button:has-t
|
|
|
172
183
|
|
|
173
184
|
Treat `@e…` refs as page-scoped. After a successful `snapshot`, the wrapper records the latest refs and page target for that session; mutation-prone ref commands such as non-form `click @e4`, `select @e5 chocolate`, or batch steps with old refs fail with `failureCategory: "stale-ref"` when the page target changed or the ref is absent from the latest same-page snapshot. If a session `snapshot -i` fails with `No active page`, the wrapper invalidates prior refs for that session; later mutation-prone `@e…` calls fail before upstream until a successful fresh `snapshot -i` records refs again. Inside `batch` stdin JSON, the wrapper also walks steps in order before spawn: steps whose first token can navigate or mutate set a latch; a later step whose first token is `snapshot` clears that latch for following rows; guarded steps that still mention `@e…` after an uncleared latch fail with the same `stale-ref` bucket without launching upstream. Same-snapshot form fills and native form-control steps are allowed before a click or submit step, so `fill`, `check`/`uncheck` checkbox or radio refs, checkbox/radio `click`/`tap` refs, `select` combobox refs, then a final submit `click` can run from one snapshot. Split dynamic or autosubmit forms with a fresh snapshot if a control interaction rerenders the targets. Follow the `refresh-interactive-refs` next action (it includes `--session <name>` when needed) and prefer stable `find` or `semanticAction` locators when navigation or rerendering is likely. Contract detail: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`refSnapshot`, `refSnapshotInvalidation`).
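The in-order batch walk described above can be pictured with a small sketch. This is an illustration only, not the wrapper's implementation: the `LATCHING_TOKENS` set and the function name are hypothetical, and the real wrapper applies further rules (for example the same-snapshot form-control allowances) that are not modeled here.

```typescript
type BatchStep = string[];

// Hypothetical set of first tokens that "can navigate or mutate".
// The wrapper's actual list is not reproduced in this reference.
const LATCHING_TOKENS = new Set(["open", "click", "back", "forward", "reload"]);

// Returns the index of the first step that would be rejected as stale-ref,
// or null when the whole batch passes this simplified pre-spawn walk.
function firstStaleRefStep(steps: BatchStep[]): number | null {
  let latched = false;
  for (let i = 0; i < steps.length; i++) {
    const step = steps[i] ?? [];
    const command = step[0] ?? "";
    if (command === "snapshot") {
      // A snapshot refreshes refs, so it clears the latch for following rows.
      latched = false;
      continue;
    }
    const mentionsRef = step.slice(1).some((token) => /^@e\d+$/.test(token));
    if (latched && mentionsRef) return i;
    // The latch is set only after the step itself has been checked.
    if (LATCHING_TOKENS.has(command)) latched = true;
  }
  return null;
}
```

In this sketch, `fill @e1` followed by `click @e2` passes because nothing set the latch before the click, while a second ref-based step after that click is rejected unless a `snapshot` row sits between them.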
|
|
174
185
|
|
|
175
|
-
A successful `click` result means upstream reported a target, not that the app definitely handled the event. For top-level non-Electron clicks, the wrapper installs a bounded DOM-event probe; when upstream reports success but no trusted event reaches the target, it fails the tool and exposes `details.clickDispatch` plus a `Click dispatch diagnostic` line with explicit retry/inspect next actions (no in-page click replay). If the probe evidence shows the target is outside a nested scroll container or viewport, `details.clickDispatch.scrollContainer` and `scroll-target-into-view-after-dispatch-miss` point to `scrollintoview <target>` before retry. When the workflow depends on a mutation, use `details.pageChangeSummary`, a wait, URL/text extraction, or a fresh `snapshot -i` before trusting the state; if nothing changed, retry with a current visible ref or stable selector and report the workflow issue. For static local fixtures or debugging where the user explicitly accepts scripted activation, `eval --stdin` can call `document.querySelector(...).click()` to exercise inline handlers and app code; treat that as an untrusted programmatic event, not as evidence that CDP/user-like clicking works. Respect explicit user stop boundaries yourself: if the user says to stop before a final order, post, purchase, or submit action, gather evidence from that page and do not click the final action or use scripted activation to bypass the stop. The wrapper does not infer broad business intent from prompt text; `details.promptGuard` is reserved for concrete artifact-before-close checks. `press`, `key`, `keydown`, and `keyup` accept exactly one key token; focus or click the target first, then run `press Enter` or another single-key command.
|
|
186
|
+
A successful `click` result means upstream reported a target, not that the app definitely handled the event. For top-level non-Electron direct clicks on selectors, `xpath=` targets, and eligible current `@e…` refs, the wrapper installs a bounded target-specific DOM-event probe when it can; when upstream reports success but no trusted event reaches the resolved target, it fails the tool and exposes `details.clickDispatch` plus a `Click dispatch diagnostic` line with explicit retry/inspect next actions (no in-page click replay). Raw `find … click` locator calls are not probed because the wrapper has no concrete element before upstream resolves the locator, and document-level probes can falsely fail frame-scoped clicks. Direct `@e…` click probes are role-gated to current snapshot refs whose accessible role is `button`, `checkbox`, `menuitem`, `radio`, `switch`, or `tab`; duplicate names use snapshot order. If the probe evidence shows the target is outside a nested scroll container or viewport, `details.clickDispatch.scrollContainer` and `scroll-target-into-view-after-dispatch-miss` point to `scrollintoview <target>` before retry. When the workflow depends on a mutation, use `details.pageChangeSummary`, a wait, URL/text extraction, or a fresh `snapshot -i` before trusting the state; if nothing changed, retry with a current visible ref or stable selector and report the workflow issue. For static local fixtures or debugging where the user explicitly accepts scripted activation, `eval --stdin` can call `document.querySelector(...).click()` to exercise inline handlers and app code; treat that as an untrusted programmatic event, not as evidence that CDP/user-like clicking works. Respect explicit user stop boundaries yourself: if the user says to stop before a final order, post, purchase, or submit action, gather evidence from that page and do not click the final action or use scripted activation to bypass the stop. The wrapper does not infer broad business intent from prompt text; `details.promptGuard` is reserved for concrete artifact-before-close checks. `press`, `key`, `keydown`, and `keyup` accept exactly one key token; focus or click the target first, then run `press Enter` or another single-key command.
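The probe-eligibility rules in the paragraph above can be summarized as a decision function. The sketch below is an assumption-laden illustration: the function name, the `refRoles` map, and its `@e4`-style keys are hypothetical, and it ignores the Electron and batch exclusions as well as the "when it can" cases the wrapper handles.

```typescript
// Roles listed in the reference as eligible for direct `@e…` click probes.
const PROBE_ROLES = new Set(["button", "checkbox", "menuitem", "radio", "switch", "tab"]);

// `refRoles` is a hypothetical lookup from a current snapshot ref to its accessible role.
function shouldProbeClick(args: string[], refRoles: Map<string, string>): boolean {
  // Only direct top-level `click` calls are considered; `find … click` starts with `find`.
  if (args[0] !== "click") return false;
  const target = args[1];
  if (target === undefined) return false;
  if (target.startsWith("@e")) {
    // Refs are role-gated and must exist in the current snapshot.
    const role = refRoles.get(target);
    return role !== undefined && PROBE_ROLES.has(role);
  }
  // CSS selectors and `xpath=` targets resolve to a concrete element up front.
  return true;
}
```

A `link` ref or a ref missing from the latest snapshot returns `false` here, which matches the text's point that only a narrow set of roles is probed.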
|
|
176
187
|
|
|
177
188
|
Successful `snapshot -i` results can also surface `Possible overlay blockers` when their own refs already show dialog/alertdialog context plus close/dismiss controls, so agents can detect likely obstruction before clicking. When a **top-level** `click` succeeds (not a `click` hidden inside a `batch`/`job` tool call—the unified command must be `click`), the upstream payload includes `data.clicked`, no `details.clickDispatch` diagnostic fired for the same result, and the wrapper sees the active tab URL unchanged after the same normalization it uses for ref guards (**`#fragment` ignored**), it may run one extra `snapshot -i` and surface `Possible overlay blockers` plus `details.overlayBlockers` (`candidates`, `summary`, and a `snapshot` map that can refresh `refSnapshot`) when that snapshot shows strong modal context (`dialog` / `alertdialog`) **and** up to three close/dismiss-like controls; page-wide words such as privacy, sign in, or banner alone do not trigger it. The URL check compares the session’s prior pinned tab target to `details.navigationSummary.url` after the click; that summary is gathered with one read-only `eval` when the click JSON omits **both** string `data.url` and `data.title`—if upstream already echoes either field, overlay diagnostics are skipped on this path. The diagnostic is skipped if the wrapper already applied tab-focus correction or about-blank recovery on that result. Appended `inspect-overlay-state` / `try-overlay-blocker-candidate-*` entries in `details.nextActions` include `--session <name>` when the session is named, same as other session-scoped follow-ups. Treat `inspect-overlay-state` as the safe first follow-up; only use a `try-overlay-blocker-candidate-*` next action when the candidate is clearly the control you intend to close.
|
|
178
189
|
|
|
@@ -207,7 +218,7 @@ Use `batch --bail` when later steps should stop after the first failed command.
|
|
|
207
218
|
|
|
208
219
|
For short constrained flows, use top-level `job` instead of hand-writing `batch` stdin. Supported job steps are `open`, `click`, `fill`, `type`, `select`, `wait`, `assertText`, `assertUrl`, `waitForDownload`, `snapshot`, and `screenshot`. `open` can include `loadState: "domcontentloaded" | "load" | "networkidle"` to insert a `wait --load …` row immediately after navigation before the next click/read step. `click` and `fill` accept either a stable `selector` or the same semantic locator fields as top-level `semanticAction` (`locator`, plus `role`/`name` or `value` as appropriate) and compile locator steps to upstream `find` argv. `type` focuses an optional selector, sends text through upstream keyboard typing, can insert `wait` rows via `delayMs` for human-paced input, and can append a final `press` key such as `Enter`; delayed typing is capped at 200 characters per step, and generated per-character rows are compacted in model-visible batch text while remaining available in `details.batchSteps`. `select` requires `selector` plus `value` or `values`, and compiles to upstream `select <selector> <value...>`. By default the wrapper compiles steps to upstream `batch --bail` so a failed setup/fill/assertion step stops later mutating clicks; set `failFast: false` only when you explicitly need continue-after-error diagnostics. The wrapper records `details.compiledJob.steps[]` plus `details.compiledJob.failFast`. There is still no separate first-class catalog of reusable named browser recipes above `job`, the `qa` preset, and raw `batch`; see [`ARCHITECTURE.md`](ARCHITECTURE.md#no-reusable-recipe-layer-yet) for the closed `RQ-0068` decision and revisit bar.
|
|
209
220
|
|
|
210
|
-
**Job navigation is explicit.** A `click` step (or other navigation-prone interaction) does not prove the next page loaded. The wrapper does not auto-insert `assertUrl` or `assertText` after clicks inside `job`; add those steps yourself with the URL pattern or on-page text you expect, especially after forms, checkout, tabs, or submit buttons, before screenshots or later steps.
|
|
221
|
+
**Job navigation is explicit.** A `click` step (or other navigation-prone interaction) does not prove the next page loaded. The wrapper does not auto-insert `assertUrl` or `assertText` after clicks inside `job`; add those steps yourself with the exact URL, a `*` / `**` glob-style URL pattern, or on-page text you expect, especially after forms, checkout, tabs, or submit buttons, before screenshots or later steps. Exact `assertUrl` values without `*` compile to `wait --url` unchanged, including query strings and literal `?`. Glob-style values compile to a `wait --fn` predicate: single `*` matches within one path segment only, while `**` or longer star runs match across `/`; regex metacharacters such as `.`, `?`, `+`, `[`, `]`, and `$` stay literal. Literal `*` exact URLs are not supported by `assertUrl`; use raw `wait --url` only after verifying upstream behavior. Do not put a whole dynamic checkout into one long job: split around login, sorting/cart mutations, checkout navigation, and final evidence capture so refs and app state can be rechecked between phases. Glob-style `assertUrl` values compile this way so `**/shipping` works even when upstream `wait --url` pattern matching is narrower than its help text implies.
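The `assertUrl` matching rules above (single `*` stays within one path segment, `**` or longer runs cross `/`, regex metacharacters stay literal) can be sketched as a glob-to-regex conversion. This is a minimal illustration under stated assumptions, not the package's code: `globToRegExp` and `compileAssertUrl` are hypothetical names, and the `wait --fn` predicate string shown here is an invented shape.

```typescript
// Convert a `*` / `**` glob into an anchored RegExp.
function globToRegExp(glob: string): RegExp {
  let source = "";
  let i = 0;
  while (i < glob.length) {
    if (glob.charAt(i) === "*") {
      let run = 0;
      while (glob.charAt(i) === "*") {
        run++;
        i++;
      }
      // One star stays inside a path segment; two or more cross `/`.
      source += run === 1 ? "[^/]*" : ".*";
    } else {
      // Escape regex metacharacters so `.`, `?`, `+`, `[`, `]`, `$` stay literal.
      source += glob.charAt(i).replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
      i++;
    }
  }
  return new RegExp(`^${source}$`);
}

// Exact values pass through to `wait --url`; glob values become a `wait --fn` predicate.
function compileAssertUrl(value: string): string[] {
  if (!value.includes("*")) return ["wait", "--url", value];
  const pattern = globToRegExp(value);
  // Hypothetical predicate text; the real compiled expression may differ.
  return ["wait", "--fn", `new RegExp(${JSON.stringify(pattern.source)}).test(location.href)`];
}
```

For example, `**/shipping` matches `https://shop.example/checkout/shipping`, while `https://shop.example/*/shipping` does not match a URL with two segments between the host and `shipping`.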
|
|
211
222
|
|
|
212
223
|
```json
|
|
213
224
|
{
|
|
@@ -327,13 +338,13 @@ Top-level `networkSourceLookup` does the same for failed browser requests. When
|
|
|
327
338
|
|
|
328
339
|
```json
|
|
329
340
|
{ "args": ["wait", "--load", "networkidle"] }
|
|
330
|
-
{ "args": ["wait", "--url", "
|
|
341
|
+
{ "args": ["wait", "--url", "https://app.example/dashboard"] }
|
|
331
342
|
{ "args": ["wait", "--download", "/tmp/report.pdf"] }
|
|
332
343
|
```
|
|
333
344
|
|
|
334
345
|
Do not omit the load state value; use `wait --load <state>` with `load`, `domcontentloaded`, or `networkidle`.
|
|
335
346
|
|
|
336
|
-
For desktop-host readiness, prefer condition waits over fixed sleeps. Use this ladder: `wait --text` / `wait --url` / `wait --fn` / `wait --load <state>` / `wait --download` when a real condition exists; after raw `connect`, run `tab list` → `tab t<N>` → condition wait or `snapshot -i`; after wrapper-owned `electron.launch`, use `electron.probe` / `electron.status` for launch health or target mismatch; use `qa.attached` when expected text or selector plus diagnostics can express the check. Fixed waits are a last resort:
|
|
347
|
+
For desktop-host readiness, prefer condition waits over fixed sleeps. Use this ladder: `wait --text` / exact `wait --url` / `wait --fn` / `wait --load <state>` / `wait --download` when a real condition exists; after raw `connect`, run `tab list` → `tab t<N>` → condition wait or `snapshot -i`; after wrapper-owned `electron.launch`, use `electron.probe` / `electron.status` for launch health or target mismatch; use `qa.attached` when expected text or selector plus diagnostics can express the check. Upstream help labels `wait --url` as a pattern matcher, but dogfood found glob forms such as `**/learn` can time out on the current baseline; use an exact URL there, or use `job.assertUrl` for `*` / `**` glob-style matching. Fixed waits are a last resort: use explicit `--timeout` or top-level `timeoutMs` for legitimately slow waits, and treat a successful fixed-wait payload such as `"waited":"timeout"` as elapsed time only, not proof that the desktop host finished. Verify with an observed condition, fresh snapshot, or screenshot before continuing.
|
|
337
348
|
|
|
338
349
|
Use `wait --download [path]` after an earlier action has already started a browser download, such as a dashboard export button that responds asynchronously:
|
|
339
350
|
|
|
@@ -348,7 +359,7 @@ For one-call flows, put the click and wait in `batch`; the wait step keeps the s
|
|
|
348
359
|
{ "args": ["batch"], "stdin": "[[\"click\",\"@export\"],[\"wait\",\"--download\",\"/tmp/report.csv\"]]" }
|
|
349
360
|
```
|
|
350
361
|
|
|
351
|
-
A successful wait-based download renders a readable summary such as `Download completed: /tmp/report.csv` and exposes top-level `details.savedFilePath` plus `details.savedFile` for non-batch calls. With the current upstream `agent-browser 0.27.
|
|
362
|
+
A successful wait-based download renders a readable summary such as `Download completed: /tmp/report.csv` and exposes top-level `details.savedFilePath` plus `details.savedFile` for non-batch calls. With the current upstream `agent-browser 0.27.2`, `wait --download <path>` may report the requested path before this environment can verify that the file was persisted there. Treat `details.savedFilePath` as upstream-reported metadata unless `details.artifacts[].exists` is true. Upstream tracking: [vercel-labs/agent-browser#1300](https://github.com/vercel-labs/agent-browser/issues/1300).
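A caller that needs the file on disk might gate on the verification signal rather than the reported path. The following is a rough sketch: `details.savedFilePath` and `details.artifacts[].exists` come from the text above, while the `path` field on each artifact entry and the helper name are assumptions made for illustration.

```typescript
interface ArtifactEntry {
  path?: string; // assumed field name, not confirmed by this reference
  exists?: boolean;
}

interface DownloadDetails {
  savedFilePath?: string;
  artifacts?: ArtifactEntry[];
}

// Returns the download path only when an artifact entry confirms it exists on disk.
function verifiedDownloadPath(details: DownloadDetails): string | null {
  const reported = details.savedFilePath;
  if (!reported) return null;
  const confirmed = (details.artifacts ?? []).some(
    (artifact) => artifact.exists === true && artifact.path === reported,
  );
  return confirmed ? reported : null;
}
```

When this returns `null`, the reported path is still useful as a hint, but the file should be checked with host file tools before it is treated as saved.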
|
|
352
363
|
|
|
353
364
|
### Download, screenshot, and PDF files
|
|
354
365
|
|
|
@@ -388,7 +399,7 @@ The wrapper keeps a bounded, metadata-only `details.artifactManifest` of recent
|
|
|
388
399
|
|
|
389
400
|
This manifest cap controls what appears in `details.artifactManifest` and in summaries such as `Session artifacts: 42 live, 0 evicted (42/100 recent)`. It does not delete explicit files that upstream saved to paths you chose, such as screenshots, PDFs, downloads, traces, HAR files, or WebM recordings.
|
|
390
401
|
|
|
391
|
-
Browser close commands (`close`, `quit`, or `exit`) are also not file cleanup. If `details.artifactManifest` is present with a non-empty `entries` list, a successful close command appends
|
|
402
|
+
Browser close commands (`close`, `quit`, or `exit`) are also not file cleanup. If `details.artifactManifest` is present with a non-empty `entries` list, a successful close command appends a compact `Artifact lifecycle` note and reports `details.artifactCleanup` with the current retention summary and the same host-owned cleanup `note` as the contract (`extensions/agent-browser/lib/orchestration/browser-run/diagnostics.ts`, `getArtifactCleanupGuidance`). Up to ten distinct user-chosen paths that still exist on disk appear in `explicitArtifactPaths` when matching `explicit-path` manifest rows exist in the recent window; deleted/stale paths are skipped. Otherwise that array is empty and the visible text stays compact while the structured detail still reminds you that close commands do not delete saved files. Delete any paths you care about with host file tools after inspection; the native browser tool intentionally does not remove arbitrary user-chosen filesystem paths.
|
|
392
403
|
|
|
393
404
|
Oversized snapshots and oversized generic outputs are different: when a persisted pi session is available, their wrapper-managed spill files are stored under the private session artifact directory and are governed by the byte budget `PI_AGENT_BROWSER_SESSION_ARTIFACT_MAX_BYTES` (default 32 MiB). Raise that byte budget as well for long QA sessions that need many full raw snapshots or large text spills to survive reload/resume.
|
|
394
405
|
|
|
@@ -613,22 +624,22 @@ Stable tab ids look like `t1`, `t2`, and `t3`. Optional user labels such as `doc
|
|
|
613
624
|
|
|
614
625
|
When a snapshot is too large for inline output, the Pi wrapper renders a compact view before spilling the full raw snapshot to `details.fullOutputPath`. Compact snapshots are main-content-first, but dense pages and desktop host screens can still hide actionable controls in omitted content; scan `Omitted high-value controls` before opening the spill file. That bounded section favors editable/searchbox/textbox/combobox controls, named tab/surface controls, primary action buttons, and high-signal named links such as repository search results, then includes other useful controls such as checkboxes, radios, options, and menuitems that were not already listed under key refs or other refs. When that section appears, `details.data.highValueControlRefIds` repeats the same visible ref ids for programmatic follow-up alongside fields such as `previewMode`, `previewSections`, and counts on `details.data` (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details)).
|
|
615
626
|
|
|
616
|
-
For dense pages, the wrapper also accepts `snapshot -i --search <text>` and `snapshot -i --filter role=<role>` as wrapper-side filters. It runs upstream `snapshot` without those wrapper-only flags, records the full returned ref map in `details.refSnapshot` for stale-ref safety, and renders
|
|
627
|
+
For dense pages, the wrapper also accepts `snapshot -i --search <text>` and `snapshot -i --filter role=<role>` as wrapper-side filters. It runs upstream `snapshot` without those wrapper-only flags, records the full returned ref map in `details.refSnapshot` for stale-ref safety, and renders matching direct refs plus surrounding snapshot context in the model-visible snapshot with `details.snapshotFilter` counts. The visible summary distinguishes direct ref matches from surrounding lines so contextual/nested output does not look like a ref-count mismatch. Add wrapper-side `--viewport` when scroll position, viewport size, document size, and sampled scroll-container offsets matter; it runs one read-only `eval --stdin` probe and reports `details.snapshotViewport`. Add wrapper-side `--diff` to compare the current ref map with the previous wrapper-tracked snapshot for that session and report `details.snapshotDiff` added/removed/changed refs. Use these flags when you need controls like checkout buttons, all comboboxes, above/below-fold context, or a quick before/after ref delta without reading a full spill file.
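The added/removed/changed delta reported for `--diff` can be pictured as a plain comparison of two ref maps. This sketch assumes a simplified ref-map shape (`role` and `name` per ref id) and a hypothetical function name; the actual `details.refSnapshot` and `details.snapshotDiff` layouts are defined by the tool contract, not by this example.

```typescript
// Assumed, simplified shape of one snapshot ref.
type RefEntry = { role: string; name: string };
type RefMap = Record<string, RefEntry>;

interface RefDiff {
  added: string[];
  removed: string[];
  changed: string[];
}

// Compare the previous tracked snapshot with the current one.
function diffRefMaps(previous: RefMap, current: RefMap): RefDiff {
  const diff: RefDiff = { added: [], removed: [], changed: [] };
  for (const [id, entry] of Object.entries(current)) {
    const before = previous[id];
    if (before === undefined) {
      diff.added.push(id);
    } else if (before.role !== entry.role || before.name !== entry.name) {
      diff.changed.push(id);
    }
  }
  for (const id of Object.keys(previous)) {
    if (!(id in current)) diff.removed.push(id);
  }
  return diff;
}
```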
|
|
617
628
|
|
|
618
629
|
### Wait
|
|
619
630
|
|
|
620
631
|
| Mode | Purpose |
|
|
621
632
|
| --- | --- |
|
|
622
633
|
| `wait <selector>` | Wait for an element to appear. |
|
|
623
|
-
| `wait <ms>` | Wait for a fixed number of milliseconds.
|
|
634
|
+
| `wait <ms>` | Wait for a fixed number of milliseconds. The native Pi wrapper now forwards long waits and derives a subprocess watchdog from the explicit wait duration when the caller does not provide top-level `timeoutMs`. |
|
|
624
635
|
| `wait --url <pattern>` | Wait for the URL to match a pattern. |
|
|
625
636
|
| `wait --load <state>` | Wait for load state: `load`, `domcontentloaded`, or `networkidle`. |
|
|
626
637
|
| `wait --fn <expression>` | Wait for a JavaScript expression to become truthy. |
|
|
627
638
|
| `wait --text <text>` | Wait for text to appear on the page; failures may include `inspect-after-text-assertion-failure` with a session-scoped `snapshot -i` payload. |
|
|
628
639
|
| `wait --download [path]` | Wait for a download started by a previous action and optionally save it to `path`; successful wrapper results include upstream-reported `savedFilePath`/`savedFile`, while `details.artifacts[].exists` is the wrapper's on-disk verification signal. |
|
|
629
|
-
| `wait --download [path] --timeout <ms>` | Set download-start timeout in milliseconds.
|
|
640
|
+
| `wait --download [path] --timeout <ms>` | Set download-start timeout in milliseconds. The native Pi wrapper forwards explicit wait timeouts and extends the subprocess watchdog unless the caller supplies top-level `timeoutMs`. |
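The watchdog behavior mentioned in the `wait <ms>` and `--timeout` rows can be sketched as follows. This is only a guess at the shape of the logic: the grace margin, the base watchdog value, and both function names are hypothetical, and the wrapper's real derivation is not shown in this reference.

```typescript
// Hypothetical constants; the wrapper's actual values are not documented here.
const HYPOTHETICAL_GRACE_MS = 5_000;
const HYPOTHETICAL_BASE_WATCHDOG_MS = 30_000;

// Extract an explicit duration from `wait <ms>` or `wait … --timeout <ms>`.
function explicitWaitMs(args: string[]): number | undefined {
  if (args[0] !== "wait") return undefined;
  const flagIndex = args.indexOf("--timeout");
  const raw = flagIndex >= 0 ? args[flagIndex + 1] : args[1];
  if (raw === undefined || !/^\d+$/.test(raw)) return undefined;
  return Number(raw);
}

// A caller-supplied top-level `timeoutMs` wins; otherwise extend past the wait duration.
function watchdogMs(args: string[], timeoutMs?: number): number {
  if (timeoutMs !== undefined) return timeoutMs;
  const waitMs = explicitWaitMs(args);
  if (waitMs === undefined) return HYPOTHETICAL_BASE_WATCHDOG_MS;
  return Math.max(HYPOTHETICAL_BASE_WATCHDOG_MS, waitMs + HYPOTHETICAL_GRACE_MS);
}
```

The design point is that the subprocess budget should never be shorter than the wait the caller explicitly asked for, unless the caller set `timeoutMs` themselves.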
|
|
630
641
|
|
|
631
|
-
Current v0.27.
|
|
642
|
+
Current v0.27.2 source still does not parse `wait <selector> --state hidden` / `wait <selector> --state detached` as distinct wait modes even though upstream help mentions those examples. Use `wait --fn "!document.querySelector('#spinner')"` or another explicit JavaScript predicate for disappearance/detach checks until upstream parser support exists.
|
|
632
643
|
|
|
633
644
|
### Diff, debug, and streaming
|
|
634
645
|
|
|
@@ -701,7 +712,7 @@ When these commands are invoked through the native `agent_browser` tool, structu
|
|
|
701
712
|
- project-local: `.pi/config/pi-agent-browser-native/config.json`
|
|
702
713
|
- explicit override: `PI_AGENT_BROWSER_CONFIG=/path/to/config.json`
|
|
703
714
|
|
|
704
|
-
Get an Exa API key from the [Exa dashboard](https://dashboard.exa.ai/api-keys) or a Brave Search API key from the [Brave Search API dashboard](https://api-dashboard.search.brave.com/). If both keys are available, `agent_browser_web_search` prefers Exa by default because its `/search` endpoint returns token-efficient highlights and agent-oriented search modes; set `webSearch.preferredProvider` to `"brave"` when Brave Search is preferred. You can also disable this package's search tool with `webSearch.enabled: false` when another search tool should win. Config merges global → project → `PI_AGENT_BROWSER_CONFIG` override, so `enabled` is read from the final loaded config: a global disable can be re-enabled by project or override config, while an override file with `enabled: false` is the highest-priority hard disable for that run. Under Pi 0.79+, globally installed or CLI-loaded extensions are developer-trusted code, so this extension reads project-local config under `.pi/config/...` by default and skips that project layer
|
|
715
|
+
Get an Exa API key from the [Exa dashboard](https://dashboard.exa.ai/api-keys) or a Brave Search API key from the [Brave Search API dashboard](https://api-dashboard.search.brave.com/). If both keys are available, `agent_browser_web_search` prefers Exa by default because its `/search` endpoint returns token-efficient highlights and agent-oriented search modes; set `webSearch.preferredProvider` to `"brave"` when Brave Search is preferred. You can also disable this package's search tool with `webSearch.enabled: false` when another search tool should win. Config merges global → project → `PI_AGENT_BROWSER_CONFIG` override, so `enabled` is read from the final loaded config: a global disable can be re-enabled by project or override config, while an override file with `enabled: false` is the highest-priority hard disable for that run. Under Pi 0.79+, globally installed or CLI-loaded extensions are developer-trusted code, so this extension reads project-local config under `.pi/config/...` by default and skips that project layer when Pi reports the project is untrusted or when launched with `--no-approve`.
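
The layering described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the extension's loader: the `Config` shape, `mergeLayers`, and `loadEffectiveConfig` are hypothetical names, and the key-by-key merge of `webSearch` is an assumption; only `webSearch.enabled`, `webSearch.preferredProvider`, the global → project → override order, and the untrusted-project skip come from the text.

```typescript
// Hypothetical config shape; only `webSearch.enabled` and
// `webSearch.preferredProvider` are named in the docs.
interface WebSearchConfig {
  enabled?: boolean;
  preferredProvider?: "exa" | "brave";
}
interface Config {
  webSearch?: WebSearchConfig;
}

// Assumption: later layers win key by key (global -> project -> override).
function mergeLayers(layers: Array<Config | undefined>): Config {
  const merged: Config = {};
  for (const layer of layers) {
    if (!layer) continue;
    merged.webSearch = { ...merged.webSearch, ...layer.webSearch };
  }
  return merged;
}

function loadEffectiveConfig(
  globalCfg: Config | undefined,
  projectCfg: Config | undefined,
  overrideCfg: Config | undefined,
  projectTrusted: boolean,
): Config {
  // The project layer is skipped when the project is untrusted.
  const project = projectTrusted ? projectCfg : undefined;
  return mergeLayers([globalCfg, project, overrideCfg]);
}

// A global disable re-enabled by a trusted project layer.
const effective = loadEffectiveConfig(
  { webSearch: { enabled: false } },
  { webSearch: { enabled: true } },
  undefined,
  true,
);
console.log(effective.webSearch?.enabled);
```

Reading `enabled` only from the final merged object is what makes an override file with `enabled: false` behave as the highest-priority disable in this sketch.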
|
|
705
716
|
|
|
706
717
|
`pi install npm:pi-agent-browser-native` loads the extension, but it does **not** usually put the package helper on your shell `PATH`. The clearest setup is to write the config file directly and keep actual keys in the environment that launches `pi`:
|
|
707
718
|
|
|
@@ -808,7 +819,7 @@ Browser default config is conservative: it adds agent guidance for signed-in/acc
|
|
|
808
819
|
- `--confirm-interactive`: interactive confirmations; auto-denies when stdin is not a TTY. Environment: `AGENT_BROWSER_CONFIRM_INTERACTIVE`.
|
|
809
820
|
- `-p, --provider <name>`: provider such as `ios`, `browserbase`, `kernel`, `browseruse`, `browserless`, or `agentcore`. Environment: `AGENT_BROWSER_PROVIDER`.
|
|
810
821
|
- `--device <name>`: iOS device name. Environment: `AGENT_BROWSER_IOS_DEVICE`.
|
|
811
|
-
- Provider-specific iOS examples from upstream include `agent-browser -p ios device list`, `agent-browser -p ios swipe up`, and `agent-browser -p ios tap @e1`; in pi, pass those tokens through `args` rather than bash. iOS requires external Xcode/Appium setup, and cloud providers (`browserbase`, `kernel`, `browseruse`, `browserless`, `agentcore`) require their upstream accounts, credentials, and provider-specific environment variables. Common forwarded provider variables include `BROWSERBASE_API_KEY`, `BROWSERBASE_PROJECT_ID`, `BROWSERLESS_API_KEY`, `BROWSERLESS_API_URL`, `BROWSERLESS_BROWSER_TYPE`, `BROWSERLESS_STEALTH`, `BROWSERLESS_TTL`, `BROWSER_USE_API_KEY`, `KERNEL_API_KEY`, `KERNEL_HEADLESS`, `KERNEL_STEALTH`, `KERNEL_TIMEOUT_SECONDS`, `KERNEL_PROFILE_NAME`, `AGENTCORE_API_KEY`, `AGENTCORE_REGION`, `AGENTCORE_BROWSER_ID`, `AGENTCORE_PROFILE_ID`, `AGENTCORE_SESSION_TIMEOUT`, plus AWS names used by AgentCore such as `AWS_PROFILE`, `AWS_ACCESS_KEY_ID`, and `
|
|
822
|
+
- Provider-specific iOS examples from upstream include `agent-browser -p ios device list`, `agent-browser -p ios swipe up`, and `agent-browser -p ios tap @e1`; in pi, pass those tokens through `args` rather than bash. iOS requires external Xcode/Appium setup, and cloud providers (`browserbase`, `kernel`, `browseruse`, `browserless`, `agentcore`) require their upstream accounts, credentials, and provider-specific environment variables. Common forwarded provider variables include `BROWSERBASE_API_KEY`, `BROWSERBASE_PROJECT_ID`, `BROWSERLESS_API_KEY`, `BROWSERLESS_API_URL`, `BROWSERLESS_BROWSER_TYPE`, `BROWSERLESS_STEALTH`, `BROWSERLESS_TTL`, `BROWSER_USE_API_KEY`, `KERNEL_API_KEY`, `KERNEL_HEADLESS`, `KERNEL_STEALTH`, `KERNEL_TIMEOUT_SECONDS`, `KERNEL_PROFILE_NAME`, `AGENTCORE_API_KEY`, `AGENTCORE_REGION`, `AGENTCORE_BROWSER_ID`, `AGENTCORE_PROFILE_ID`, `AGENTCORE_SESSION_TIMEOUT`, plus AWS names used by AgentCore such as `AWS_PROFILE`, `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN`, `AWS_REGION`, and `AWS_DEFAULT_REGION`. The wrapper forwards provider flags/env and stays thin; it does not emulate provider setup or cloud browser behavior.
|
|
812
823
|
- `--model <name>`: AI model for `chat`. Environment: `AI_GATEWAY_MODEL`.
|
|
813
824
|
- `-v, --verbose`: show tool commands and raw output.
|
|
814
825
|
- `-q, --quiet`: show only AI text responses.
|
|
@@ -840,7 +851,7 @@ Other useful environment variables include `AGENT_BROWSER_DEFAULT_TIMEOUT`, `AGE
|
|
|
840
851
|
- For sessions with observed tab-drift risk, after a successful command on a known target tab, agent_browser also best-effort restores that intended tab if a restored/background tab steals focus after the command completes. Routine same-session commands skip this post-command tab-list probe.
|
|
841
852
|
- If a known session target unexpectedly reports about:blank, agent_browser best-effort re-selects the prior intended target when it still exists; if recovery fails, it records the observed about:blank target and reports exact recovery guidance instead of treating the prior page as active.
|
|
842
853
|
<!-- agent-browser-playbook:end wrapper-tab-recovery -->
|
|
843
|
-
- Wrapper-spawned commands clamp `AGENT_BROWSER_DEFAULT_TIMEOUT` to 25
|
|
854
|
+
- Wrapper-spawned commands clamp `AGENT_BROWSER_DEFAULT_TIMEOUT` to the upstream documented 25-second default and use a 35-second child-process watchdog (`PI_AGENT_BROWSER_PROCESS_TIMEOUT_MS` overrides the default 35s budget; top-level `timeoutMs` overrides it per browser CLI call). Explicit `wait <ms>` or `wait --timeout <ms>` calls can exceed that default; when top-level `timeoutMs` is omitted, the wrapper derives a subprocess watchdog from the requested wait duration plus a small grace window. Dialog commands are additionally bounded to 5 seconds (`PI_AGENT_BROWSER_DIALOG_PROCESS_TIMEOUT_MS`), and click/tap/find refs or tokens plus `eval --stdin` snippets that look like alert/confirm/prompt/dialog triggers are bounded to 8 seconds (`PI_AGENT_BROWSER_DIALOG_TRIGGER_PROCESS_TIMEOUT_MS`). When any watchdog fires, `details.timeoutPartialProgress` may include a planned step list with per-step status (including `generatedFrom` labels for wrapper-inserted rows such as `open.loadState`) and a `retry-timeout-step` next action only when the first incomplete step is read-only or idempotent, or `inspect-current-page-after-timeout` when the session is still inspectable but the incomplete step may be mutating and should not be blindly retried. It also includes current page title/URL from best-effort session `get url` / `get title` (or a planned URL inferred from the step list when the session cannot answer), an `openedButPostOpenTimedOut` classification only when a live page URL was recovered before a later step hung, and declared artifact paths such as `screenshot`, `pdf`, `download`, or `wait --download` outputs with existence/state checks; the same evidence is appended under `Timeout partial progress` in visible text with URL/path redaction.
|
|
844
855
|
- Oversized snapshots and oversized generic outputs may be compacted in tool content, with the full raw output written to a spill file path shown directly in the tool result. Recent artifact metadata is bounded by `PI_AGENT_BROWSER_SESSION_ARTIFACT_MANIFEST_MAX_ENTRIES` (default 100); persisted spill files are separately bounded by `PI_AGENT_BROWSER_SESSION_ARTIFACT_MAX_BYTES` (default 32 MiB).
|
|
845
856
|
- The wrapper keeps `--help` and `--version` stateless so they do not consume the implicit managed-session slot.
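
The watchdog rule for explicit waits described in the list above might look like the sketch below. It is not the wrapper's implementation: `explicitWaitMs`, `watchdogMs`, the 5-second grace value, and the choice never to drop below the default budget are all assumptions for illustration. Only the 35-second default, the precedence of top-level `timeoutMs`, and "wait duration plus a small grace window" come from the text, and the `PI_AGENT_BROWSER_PROCESS_TIMEOUT_MS` override is not modeled.

```typescript
const DEFAULT_WATCHDOG_MS = 35_000; // documented default child-process watchdog
const ASSUMED_GRACE_MS = 5_000; // assumption: the docs only say "a small grace window"

// Returns the explicit duration from `wait <ms>` or `wait ... --timeout <ms>`, if any.
function explicitWaitMs(args: string[]): number | undefined {
  if (args[0] !== "wait") return undefined;
  const flagIndex = args.indexOf("--timeout");
  const raw = flagIndex >= 0 ? args[flagIndex + 1] : args[1];
  // Only plain digit strings count; selectors such as "#spinner" are ignored.
  return raw !== undefined && /^\d+$/.test(raw) ? Number(raw) : undefined;
}

function watchdogMs(args: string[], timeoutMs?: number): number {
  if (timeoutMs !== undefined) return timeoutMs; // top-level timeoutMs wins
  const waitMs = explicitWaitMs(args);
  if (waitMs === undefined) return DEFAULT_WATCHDOG_MS;
  // Assumption: the derived budget never drops below the default.
  return Math.max(DEFAULT_WATCHDOG_MS, waitMs + ASSUMED_GRACE_MS);
}

console.log(watchdogMs(["wait", "60000"]));
console.log(watchdogMs(["wait", "--download", "/tmp/report.pdf", "--timeout", "120000"]));
```

Keeping the derivation a pure function of `args` and `timeoutMs` makes the precedence easy to test without spawning a browser.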
|
|
846
857
|
|
|
@@ -849,14 +860,14 @@ Other useful environment variables include `AGENT_BROWSER_DEFAULT_TIMEOUT`, `AGE
|
|
|
849
860
|
<!-- agent-browser-capability-baseline:start capability-token-baseline -->
|
|
850
861
|
<!-- Generated from scripts/agent-browser-capability-baseline.mjs. Run `npm run docs -- command-reference write` to update. Do not edit manually. -->
|
|
851
862
|
<details>
|
|
852
|
-
<summary>Generated verifier capability baseline for agent-browser 0.27.
|
|
863
|
+
<summary>Generated verifier capability baseline for agent-browser 0.27.2</summary>
|
|
853
864
|
|
|
854
865
|
This generated block is review data for maintainers. The human-authored reference sections above remain the readable command guide.
|
|
855
866
|
|
|
856
867
|
#### Source evidence
|
|
857
868
|
- repository: `vercel-labs/agent-browser`
|
|
858
|
-
- upstream HEAD: `
|
|
859
|
-
- upstream package version: `0.27.
|
|
869
|
+
- upstream HEAD: `5185339ca3fdab9848e11b8ec676eecfdec3733f`
|
|
870
|
+
- upstream package version: `0.27.2`
|
|
860
871
|
- inspected: `agent-browser --version`
|
|
861
872
|
- inspected: `agent-browser --help`
|
|
862
873
|
- inspected: `selected agent-browser <command> --help output`
|
|
@@ -925,7 +936,7 @@ This generated block is review data for maintainers. The human-authored referenc
|
|
|
925
936
|
- Sessions, state, tabs, frames, dialogs, and windows: 20 human-doc token(s), 16 upstream token(s)
|
|
926
937
|
- Network, storage, artifacts, diagnostics, and performance: 43 human-doc token(s), 53 upstream token(s)
|
|
927
938
|
- Batch, auth, confirmations, setup, dashboard, devices, and AI commands: 24 human-doc token(s), 24 upstream token(s)
|
|
928
|
-
- Global flags, config, providers, policy, and environment:
|
|
939
|
+
- Global flags, config, providers, policy, and environment: 120 human-doc token(s), 90 upstream token(s)
|
|
929
940
|
|
|
930
941
|
#### Human-authored doc tokens required
|
|
931
942
|
##### Built-in skills
|
|
@@ -1230,6 +1241,9 @@ This generated block is review data for maintainers. The human-authored referenc
|
|
|
1230
1241
|
- `AWS_PROFILE`
|
|
1231
1242
|
- `AWS_ACCESS_KEY_ID`
|
|
1232
1243
|
- `AWS_SECRET_ACCESS_KEY`
|
|
1244
|
+
- `AWS_SESSION_TOKEN`
|
|
1245
|
+
- `AWS_REGION`
|
|
1246
|
+
- `AWS_DEFAULT_REGION`
|
|
1233
1247
|
|
|
1234
1248
|
#### Upstream help tokens expected
|
|
1235
1249
|
##### Built-in skills
|
package/docs/ELECTRON.md
CHANGED
|
@@ -2,7 +2,7 @@
|
|
|
2
2
|
|
|
3
3
|
Related docs:
|
|
4
4
|
- [`../README.md`](../README.md)
|
|
5
|
-
- [`../AGENTS.md`](
|
|
5
|
+
- [`../AGENTS.md`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/AGENTS.md) — maintainer verification (`npm run verify`, lifecycle), Pi `tmux` smoke expectations, and upstream rebaselining
|
|
6
6
|
- [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md) — full `electron` and `qa.attached` field contracts
|
|
7
7
|
- [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) — workflow snippets in the broader native command surface
|
|
8
8
|
- [`ARCHITECTURE.md`](ARCHITECTURE.md) — wrapper design and the closed `RQ-0068` recipe-layer decision
|
|
@@ -107,7 +107,7 @@ Use this ladder for desktop-host readiness instead of blind sleep loops:
|
|
|
107
107
|
2. After raw `connect`, inspect targets with `tab list`, select the stable `tab t<N>` app surface, then use a condition wait or `snapshot -i` on that selected surface.
|
|
108
108
|
3. After wrapper-owned `electron.launch`, use `electron.probe` or `electron.status` when launch health, debug-port liveness, or target mismatch matters.
|
|
109
109
|
4. Use `qa.attached` when the readiness check can be expressed as expected text or selector plus diagnostics against the current managed session.
|
|
110
|
-
5. Use fixed waits only as a last resort,
|
|
110
|
+
5. Use fixed waits only as a last resort. For legitimately slow waits, pass an explicit upstream wait timeout and let the wrapper derive the subprocess watchdog, or set top-level `timeoutMs` to at least the wait duration plus a small grace window.
|
|
111
111
|
6. Treat a fixed-wait payload such as `"waited":"timeout"` as elapsed time, not proof that the host finished. Verify with an observed condition, fresh `snapshot -i`, or screenshot before continuing.
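
Step 6 of the ladder above can be expressed as a small guard. This is a sketch only: `WaitPayload` and `isElapsedTimeOnly` are hypothetical names, and the only payload value taken from this document is `"waited":"timeout"`.

```typescript
// Hypothetical payload type: only the `"waited":"timeout"` value comes from the docs.
interface WaitPayload {
  waited?: string;
}

// A fixed wait that reports "timeout" only proves that time elapsed,
// so the caller should still verify with a condition, snapshot, or screenshot.
function isElapsedTimeOnly(payload: WaitPayload): boolean {
  return payload.waited === "timeout";
}

const payload: WaitPayload = JSON.parse('{"waited":"timeout"}');
if (isElapsedTimeOnly(payload)) {
  console.log("Fixed wait elapsed; verify readiness before continuing.");
}
```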
|
|
112
112
|
|
|
113
113
|
This project is not adding a first-class host-idle primitive yet. Revisit that only if repeated desktop smokes show that condition waits, `qa.attached`, `electron.probe`, snapshots, and screenshots cannot cover the workflow.
|
|
@@ -365,7 +365,7 @@ Electron support is gated by the same release evidence as the rest of the wrappe
|
|
|
365
365
|
- `RQ-0096` in [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md) records the contract, runtime, test, and verification coverage.
|
|
366
366
|
- `electron-lifecycle` and `electron-probe` scenarios in `scripts/agent-browser-efficiency-benchmark.mjs` track the token-efficiency claim deterministically (no real browser, no real launches).
|
|
367
367
|
- Fake-upstream coverage for Electron schema/probe/mismatch/post-command-health/fill-verification/broad-text/discovery-sensitivity lives in `test/agent-browser.extension-validation.test.ts`.
|
|
368
|
-
- Real-app validation is a manual `tmux` smoke pass per the maintainer notes in `AGENTS.md`; the 2026-05-21 dogfood result is recorded
|
|
368
|
+
- Real-app validation is a manual `tmux` smoke pass per the maintainer notes in `AGENTS.md`; the 2026-05-21 dogfood result is recorded in the repo-local `docs/plans/electron-extension-2026-05-20.md` plan.
|
|
369
369
|
|
|
370
370
|
Run the local gate the same way as the rest of the project:
|
|
371
371
|
|
package/docs/RELEASE.md
CHANGED
|
@@ -8,9 +8,9 @@ Related docs:
|
|
|
8
8
|
- [`ELECTRON.md`](ELECTRON.md)
|
|
9
9
|
- [`platform-smoke.md`](platform-smoke.md)
|
|
10
10
|
- [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md)
|
|
11
|
-
- Bounded `agent_browser` outcome metadata on `details` (`resultCategory`, `successCategory`, `failureCategory`, optional `nextActions`, optional `pageChangeSummary` with per-step summaries on `batch`): contract in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); maintainer checklists under “Tool result categories” and “Page-change summaries” in [`../AGENTS.md`](
|
|
12
|
-
- Post-success `get text` selector visibility (`RQ-0074`): optional `details.selectorTextVisibility` / `selectorTextVisibilityAll`, visible warnings, and `inspect-visible-text-candidates*` next actions after read-only visibility probes—[`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md), [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details), and [`../AGENTS.md`](
|
|
13
|
-
- Managed-session outcomes (`RQ-0077`): after extension-managed implicit or fresh `--session` injection reaches process execution, `details.managedSessionOutcome` records the transition (`created` / `replaced` / `unchanged` / `closed` on success; `preserved` / `abandoned` when a plan fails before a new session becomes current). Failing `sessionMode: "fresh"` calls also append model-visible `Managed session outcome: …`—[`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details), [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md), [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md), and [`../AGENTS.md`](
|
|
11
|
+
- Bounded `agent_browser` outcome metadata on `details` (`resultCategory`, `successCategory`, `failureCategory`, optional `nextActions`, optional `pageChangeSummary` with per-step summaries on `batch`): contract in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); maintainer checklists under “Tool result categories” and “Page-change summaries” in [`../AGENTS.md`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/AGENTS.md)
|
|
12
|
+
- Post-success `get text` selector visibility (`RQ-0074`): optional `details.selectorTextVisibility` / `selectorTextVisibilityAll`, visible warnings, and `inspect-visible-text-candidates*` next actions after read-only visibility probes—[`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md), [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details), and [`../AGENTS.md`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/AGENTS.md) maintainer checklist
|
|
13
|
+
- Managed-session outcomes (`RQ-0077`): after extension-managed implicit or fresh `--session` injection reaches process execution, `details.managedSessionOutcome` records the transition (`created` / `replaced` / `unchanged` / `closed` on success; `preserved` / `abandoned` when a plan fails before a new session becomes current). Failing `sessionMode: "fresh"` calls also append model-visible `Managed session outcome: …`—[`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details), [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md), [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md), and [`../AGENTS.md`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/AGENTS.md) maintainer checklist
|
|
14
14
|
- Stateful context commands (`cookies`, `storage`, `auth`, `dialog`, `frame`, `state`) and aggregate `batch` results: model-facing `details.data` is summarized or redacted per [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); aggregate `batch` replaces top-level `details.data` with a compact per-step matrix (`success`, argv-redacted `command`, redacted `result` or scrubbed `error`) while full per-step payloads, artifacts, and categories remain on `batchSteps[]`—operational notes in [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#use-stateful-browser-context-commands-safely), assembly in `extensions/agent-browser/lib/results/presentation/batch.ts`
|
|
15
15
|
|
|
16
16
|
## Purpose
|
|
@@ -30,7 +30,15 @@ npm run smoke:platform:doctor
|
|
|
30
30
|
npm run verify -- release
|
|
31
31
|
```
|
|
32
32
|
|
|
33
|
-
`npm run doctor` is a read-only first-run diagnostic for PATH, targeted upstream version, the
|
|
33
|
+
`npm run doctor` is a read-only first-run diagnostic for PATH, targeted upstream version, the minimum Pi runtime floor, and duplicate package/checkout source conflicts. The package keeps Pi core imports as wildcard `peerDependencies` because installed Pi package docs require the host Pi install to provide those packages, while the doctor fails setup when `pi --version` is below the enforced floor. It does not replace upstream `agent-browser doctor` for browser runtime health and does not edit Pi settings.
|
|
34
|
+
|
|
35
|
+
For PR-ready local confidence before release-only lifecycle and platform cost, run:
|
|
36
|
+
|
|
37
|
+
```bash
|
|
38
|
+
npm run verify -- pre-pr
|
|
39
|
+
```
|
|
40
|
+
|
|
41
|
+
`pre-pr` composes the default gate with `npm run verify -- package`: generated docs, TypeScript, the full unit/fake suite, live command-reference sampling, and package-content verification. It intentionally does not run lifecycle, packaged Pi smoke, Crabbox platform smoke, real-upstream, dogfood, or benchmark modes.
|
|
34
42
|
|
|
35
43
|
`npm run verify -- release` runs:
|
|
36
44
|
|
|
@@ -41,7 +49,7 @@ npm run verify -- release
|
|
|
41
49
|
|
|
42
50
|
`npm publish` runs npm’s `prepublishOnly` script from `package.json`, which executes the same `npm run verify -- release` gate and then `npm pack --dry-run`. That concatenated gate is everything in the default `npm run verify` step (generated playbook drift, TypeScript, the unit/fake suite, generated command-reference blocks, and live upstream command-reference sampling against the targeted `agent-browser` on `PATH`), the configured-source lifecycle harness, the packaged Pi smoke in `package-pi`, and the release-blocking Crabbox platform matrix. Using `npm publish --ignore-scripts` skips that contract intentionally.
|
|
43
51
|
|
|
44
|
-
`prepublishOnly` intentionally does **not** run the standalone host-only `npm run verify -- real-upstream`, `npm run verify -- dogfood`, or `npm run verify -- benchmark` modes; those remain separate `npm run verify` modes in [`scripts/project.mjs`](
|
|
52
|
+
`prepublishOnly` intentionally does **not** run the standalone host-only `npm run verify -- real-upstream`, `npm run verify -- dogfood`, or `npm run verify -- benchmark` modes; those remain separate `npm run verify` modes in [`scripts/project.mjs`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/scripts/project.mjs). The platform matrix includes its own fast target-local build/package gate and browser dogfood suite, and is automated through the `release` slice.
|
|
45
53
|
|
|
46
54
|
For a deterministic host-only real-browser wrapper smoke without model choice in the loop, run:
|
|
47
55
|
|
|
@@ -64,9 +72,9 @@ The Crabbox gate is only green when suite assertions and artifact manifests unde
|
|
|
64
72
|
|
|
65
73
|
The deterministic dogfood mode uses the extension harness and the real `agent-browser` on `PATH` against a deterministic local file fixture, then verifies top-level `qa`, `semanticAction`, constrained `job`, screenshot artifact verification, and session close. Use `npm run verify -- dogfood --keep-artifacts` or `--artifact-dir <path>` only while debugging, then delete retained screenshots. This smoke complements, but does not replace, human-readable interactive transcript evidence.
|
|
66
74
|
|
|
67
|
-
Every release also requires interactive `tmux`-driven Pi dogfood with the native `agent_browser` tool against real sites. For extension-focused release smokes, use `pi --approve --no-extensions --no-skills -e .` from the trusted checkout before publish so auto-loaded dogfood/QA skills cannot replace the bounded smoke workflow; omit `--approve` only when the smoke is explicitly testing Pi's Project Trust prompt. Run separate skill-enabled dogfood only when validating skill routing or report-generation behavior. Drive prompts with `tmux send-keys`, exercise at least one simple static site and one real documentation/product site, include the higher-level `qa` or `job`/`batch` surfaces when they changed, close every opened browser session, remove screenshots/temp artifacts, and record the outcome in the release notes or support-matrix evidence. Automated localhost, fake-upstream, and deterministic dogfood gates do not replace this human-readable live-site transcript evidence. When `agent_browser_web_search` or package config changed, add one key-free smoke proving the optional tool is absent without config, one fake/unit-backed smoke in the default suite, and one opt-in live Exa or Brave Search check with a real key while confirming the key does not appear in transcripts, stdout/stderr, config status, PR text, or artifacts. When `electron.*` surfaces, attached-session diagnostics, or `qa.attached` changed, add a local Electron pass: `electron.list` → `electron.launch` (expect isolated profile behavior) → `snapshot -i` or `electron.probe` / `qa.attached` → `electron.cleanup` with the returned `launchId`, verifying status/mismatch guidance if you simulate a dead renderer or stale refs. For dense-dashboard stress coverage, use the [public Grafana stress checklist](#public-grafana-stress-checklist) below; it is a maintainer workflow, not bundled product skill or recipe runtime.
|
|
75
|
+
Every release also requires interactive `tmux`-driven Pi dogfood with the native `agent_browser` tool against real sites. For extension-focused release smokes, use `pi --approve --no-extensions --no-skills -e .` from the trusted checkout before publish so auto-loaded dogfood/QA skills cannot replace the bounded smoke workflow; omit `--approve` only when the smoke is explicitly testing Pi's Project Trust prompt. Run separate skill-enabled dogfood only when validating skill routing or report-generation behavior. Drive prompts with `tmux send-keys`, exercise at least one simple static site and one real documentation/product site, include the higher-level `qa` or `job`/`batch` surfaces when they changed, close every opened browser session, remove screenshots/temp artifacts, and record the outcome in the release notes or support-matrix evidence. Do not paste raw multi-line prompts into a tmux Pi pane: plain newlines submit separate queued user messages. For scripted smoke driving, collapse prompt files to one line before sending (`PROMPT=$(tr '\n' ' ' < /tmp/smoke-prompt.md); tmux send-keys -t "$SESSION":0.0 -l "$PROMPT"; tmux send-keys -t "$SESSION":0.0 Enter`). For manual multi-line editing, use Pi's external editor shortcut (`Ctrl+G`) or configure tmux extended keys so Pi can receive `Shift+Enter` for newlines; see the installed Pi `docs/tmux.md` guidance. Automated localhost, fake-upstream, and deterministic dogfood gates do not replace this human-readable live-site transcript evidence. When `agent_browser_web_search` or package config changed, add one key-free smoke proving the optional tool is absent without config, one fake/unit-backed smoke in the default suite, and one opt-in live Exa or Brave Search check with a real key while confirming the key does not appear in transcripts, stdout/stderr, config status, PR text, or artifacts. When `electron.*` surfaces, attached-session diagnostics, or `qa.attached` changed, add a local Electron pass: `electron.list` → `electron.launch` (expect isolated profile behavior) → `snapshot -i` or `electron.probe` / `qa.attached` → `electron.cleanup` with the returned `launchId`, verifying status/mismatch guidance if you simulate a dead renderer or stale refs. For dense-dashboard stress coverage, use the [public Grafana stress checklist](#public-grafana-stress-checklist) below; it is a maintainer workflow, not bundled product skill or recipe runtime.
|
|
68
76
|
|
|
69
|
-
When reviewing saved session JSONL after a failed smoke or a `qa` preset that reclassified an upstream-successful batch, expect `agent_browser` tool rows to carry `isError: true` whenever `details.resultCategory` is `failure`. For normal prose output, model-visible text should end with a `Pi tool isError: true` category line; for caller-requested `--json` output, the hook preserves parseable JSON and only patches `isError`. The extension applies that patch on the `tool_result` path so Pi’s transcript matches the wrapper contract ([`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details)). Preserve a normal Pi session directory for those checks; avoiding `--no-session` keeps this evidence intact ([`AGENTS.md`](
|
|
77
|
+
When reviewing saved session JSONL after a failed smoke or a `qa` preset that reclassified an upstream-successful batch, expect `agent_browser` tool rows to carry `isError: true` whenever `details.resultCategory` is `failure`. For normal prose output, model-visible text should end with a `Pi tool isError: true` category line; for caller-requested `--json` output, the hook preserves parseable JSON and only patches `isError`. The extension applies that patch on the `tool_result` path so Pi’s transcript matches the wrapper contract ([`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details)). Preserve a normal Pi session directory for those checks; avoiding `--no-session` keeps this evidence intact ([`AGENTS.md`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/AGENTS.md) preferred validation workflow).
|
|
70
78
|
|
|
71
79
|
The configured-source lifecycle regression harness is required before release because it launches an interactive `pi` process under `tmux` with `--approve` and validates `/reload`, full relaunch with the same exact Pi 0.79 `--session-id`, managed-session continuity, persisted artifacts, and Pi failure-patch behavior. Branch-backed `session_tree` rehydration and cleanup ownership are validated by focused extension harness tests:
|
|
72
80
|
|
|
@@ -111,6 +119,8 @@ pi --approve --no-extensions --no-skills -e . --model openai-codex/gpt-5.5:minim
|
|
|
111
119
|
|
|
112
120
|
Repeat with `--model openai-codex/gpt-5.5:medium` when validating instruction-following robustness. Use unique temp paths for each run and delete them afterward. Run separate skill-enabled dogfood sessions only when the thing under test is skill integration, not this bounded release smoke.
|
|
113
121
|
|
|
122
|
+
Submit the prompt as one Pi message. In tmux automation, write it to a temp file with placeholders replaced, collapse newlines to spaces, and send that one line; for manual multiline entry, use Pi's `Ctrl+G` external editor or a tmux setup that preserves `Shift+Enter` newlines. Do not paste the raw block into a tmux pane line-by-line.
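
For scripted drivers written in TypeScript rather than shell, the same collapse-then-send step could be sketched as below. `collapsePromptToOneLine` and `tmuxSendPromptCommands` are hypothetical helpers; the `tmux send-keys -t <target> -l <text>` and `Enter` invocations mirror the shell one-liner documented in this file, and the `:0.0` pane target is taken from that example.

```typescript
// Mirrors `tr '\n' ' '`: every newline becomes one space, nothing is trimmed.
function collapsePromptToOneLine(prompt: string): string {
  return prompt.replace(/\n/g, " ");
}

// Hypothetical helper: returns the two tmux argv arrays, literal text first, then Enter.
function tmuxSendPromptCommands(session: string, prompt: string): string[][] {
  const target = `${session}:0.0`;
  return [
    ["tmux", "send-keys", "-t", target, "-l", collapsePromptToOneLine(prompt)],
    ["tmux", "send-keys", "-t", target, "Enter"],
  ];
}

console.log(tmuxSendPromptCommands("smoke", "Open the page.\nTake a screenshot.\n"));
```

Returning argv arrays instead of a shell string avoids quoting problems when the prompt contains quotes or `$`; the caller decides how to spawn them.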
|
|
123
|
+
|
|
114
124
|
Copy/paste prompt, replacing the two artifact placeholders with exact absolute paths:
|
|
115
125
|
|
|
116
126
|
```text
|
|
@@ -142,13 +152,14 @@ Evaluator expectations after the queued Sauce Demo fixes: the agent should indep
|
|
|
142
152
|
|
|
143
153
|
## Deterministic agent efficiency benchmark
|
|
144
154
|
|
|
145
|
-
[`scripts/agent-browser-efficiency-benchmark.mjs`](
|
|
155
|
+
[`scripts/agent-browser-efficiency-benchmark.mjs`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/scripts/agent-browser-efficiency-benchmark.mjs) is an accounting-only benchmark: it does not shell out to `agent-browser`, launch a browser, or read or write Pi sessions. It models representative `agent_browser` call shapes (including optional `stdin` for `batch` and top-level `job`, `qa`, or experimental `sourceLookup` / `networkSourceLookup` objects that compile to batch) and aggregates success rate, tool-call counts, UTF-8 size of model-visible strings, stale-ref failure and recovery counts, artifact success, distinct failure-category coverage, and summed elapsed-time estimates. When extending scenarios, keep them aligned with the closed `RQ-0068` “no reusable recipe layer” rationale in [`ARCHITECTURE.md`](ARCHITECTURE.md#no-reusable-recipe-layer-yet) (benchmark ids cited there are the canonical inventory for that evidence bar).
|
|
146
156
|
|
|
147
157
|
- **During development:** `npm run benchmark:agent-browser` prints a Markdown report; `npm run benchmark:agent-browser -- --json` saves machine-readable metrics; `npm run benchmark:agent-browser -- --compare path/to/prior.json` fails with exit code `1` on regressions (see the script’s `--help` for exit codes). Optional `--sample-jsonl path/to/session.jsonl` adds a `jsonlSample` section with real UTF-8 byte totals and per-workflow/overall p95 sizes for model-visible `agent_browser` tool-result text without changing deterministic scenario metrics; comparison ignores `jsonlSample` blocks.
|
|
148
|
-
- **Default gate:** `npm run verify` checks generated playbook drift, runs `tsc --noEmit`, runs the full unit/fake suite under `test/**/*.test.ts` with Node test concurrency pinned to `1` (including [`test/agent-browser.efficiency-benchmark.test.ts`](
|
|
149
|
-
- **
|
|
158
|
+
- **Default gate:** `npm run verify` checks generated playbook drift, runs `tsc --noEmit`, runs the full unit/fake suite under `test/**/*.test.ts` with Node test concurrency pinned to `1` (including [`test/agent-browser.efficiency-benchmark.test.ts`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/test/agent-browser.efficiency-benchmark.test.ts) for scenario coverage and comparison behavior), verifies generated command-reference baseline blocks, and samples live upstream command-reference tokens. It does not spawn the standalone benchmark script’s JSON/Markdown run; that is what the opt-in slice below adds.
|
|
159
|
+
- **Pre-PR gate:** `npm run verify -- pre-pr` runs the default gate plus `npm run verify -- package` for larger handoffs that need package-content confidence without lifecycle, platform, real-upstream, dogfood, or benchmark cost.
+
- **Opt-in slice:** `npm run verify -- benchmark` runs the benchmark script once with `--json` and then that same test module alone. It is intentionally **not** part of `npm run verify -- pre-pr` or `npm run verify -- release`, so routine handoff and publish gates stay decoupled from benchmark churn while still allowing a focused check after editing scenarios or `CURRENT_BENCHMARK_VERSION`.
-
Maintainer constraints for evolving scenarios and version bumps are summarized under “Agent browser efficiency benchmark” in [`../AGENTS.md`](
+
Maintainer constraints for evolving scenarios and version bumps are summarized under “Agent browser efficiency benchmark” in [`../AGENTS.md`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/AGENTS.md).
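
The size accounting described for the benchmark above (UTF-8 byte totals and p95 sizes of model-visible strings) can be sketched as follows. This is a minimal illustration, not the script's code: `utf8Bytes`, `percentile`, and `summarize` are hypothetical names, and the nearest-rank percentile method is an assumption, since the source does not say which method the script uses.

```typescript
// Hypothetical helpers; not taken from agent-browser-efficiency-benchmark.mjs.
const encoder = new TextEncoder();

/** UTF-8 byte length of one model-visible string. */
export function utf8Bytes(text: string): number {
  return encoder.encode(text).length;
}

/** Nearest-rank percentile over a list of sizes (assumed method). */
export function percentile(sizes: number[], p: number): number {
  if (sizes.length === 0) return 0;
  const sorted = [...sizes].sort((a, b) => a - b);
  // Integer arithmetic first, so the rank is not skewed by float rounding.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

/** Total and p95 byte sizes for a batch of tool-result strings. */
export function summarize(texts: string[]): { totalBytes: number; p95Bytes: number } {
  const sizes = texts.map(utf8Bytes);
  return {
    totalBytes: sizes.reduce((sum, n) => sum + n, 0),
    p95Bytes: percentile(sizes, 95),
  };
}
```

Counting bytes rather than `string.length` matters because non-ASCII characters occupy more than one byte in UTF-8, so the two measures diverge on real tool-result text.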
## What package verification checks
@@ -174,9 +185,7 @@ The packaged execution smoke intentionally uses a temporary fake `agent-browser`
Current forbidden packed files include:
- `AGENTS.md`
-
- `docs/
-
- `docs/native-integration-design.md`
-
- `docs/v1-tool-contract.md`
+
- archived planning drafts under `docs/archive/`
- `.pi/extensions/agent-browser.ts`
- test and repo-only maintenance files
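
A check against the forbidden entries listed above could look like the sketch below. It is illustrative only: `findForbiddenPackedFiles` and the two rule constants are hypothetical names, the rules cover only the explicitly named paths (not the broader "test and repo-only maintenance files" category), and the repository's real verification script may work differently.

```typescript
// Hypothetical rule list modelled on the forbidden entries named in the list above.
const FORBIDDEN_EXACT = new Set(["AGENTS.md", ".pi/extensions/agent-browser.ts"]);
const FORBIDDEN_PREFIXES = ["docs/archive/"];

/** Returns the packed paths that match a forbidden rule, relative to the package root. */
export function findForbiddenPackedFiles(packedPaths: string[]): string[] {
  return packedPaths
    // npm tarball entries are rooted at "package/"; strip it so rules match repo-relative paths.
    .map((entry) => entry.replace(/^package\//, ""))
    .filter(
      (entry) =>
        FORBIDDEN_EXACT.has(entry) ||
        FORBIDDEN_PREFIXES.some((prefix) => entry.startsWith(prefix)),
    );
}
```

An empty result means none of the named files leaked into the tarball listing that was passed in.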
@@ -212,7 +221,7 @@ Run the automated harness for deterministic configured-source lifecycle regressi
npm run verify -- lifecycle
```
-
The harness creates an isolated `PI_CODING_AGENT_DIR`, writes settings with exactly one temporary configured package source, runs `pi` in `tmux` with `--approve`, default model **`zai/glm-5.1`**, and a deterministic `--session-id`, puts a deterministic fake `agent-browser` first on `PATH`, drives `/reload`, closes Pi, and relaunches with the same exact session id instead of typing `/resume`. It also asserts the JSONL session header id, same-page managed-session continuity, persisted spill reachability, and real Pi `tool_result` failure-patch semantics for a QA reclassification. Per-step tmux waits default to **180000 ms** (three minutes) in [`scripts/verify-lifecycle.mjs`](
+
The harness creates an isolated `PI_CODING_AGENT_DIR`, writes settings with exactly one temporary configured package source, runs `pi` in `tmux` with `--approve`, default model **`zai/glm-5.1`**, and a deterministic `--session-id`, puts a deterministic fake `agent-browser` first on `PATH`, drives `/reload`, closes Pi, and relaunches with the same exact session id instead of typing `/resume`. It also asserts the JSONL session header id, same-page managed-session continuity, persisted spill reachability, and real Pi `tool_result` failure-patch semantics for a QA reclassification. Per-step tmux waits default to **180000 ms** (three minutes) in [`scripts/verify-lifecycle.mjs`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/scripts/verify-lifecycle.mjs) (`DEFAULT_TIMEOUT_MS`); override with `--timeout-ms <ms>` when slower models or cold starts need more headroom. Override the model when needed:
```bash
npm run verify -- lifecycle --model openai-codex/gpt-5.5:minimal
@@ -234,7 +243,7 @@ These show up often in cloud dev boxes and scripted smokes; they are maintainer
| Topic | What to watch for | Mitigation |
| --- | --- | --- |
-
| **Pi CLI vs repo devDependencies** | Global `pi` older than the
+
| **Pi CLI vs repo devDependencies** | Global `pi` older than the minimum Pi runtime floor for the release can change TUI behavior, `/reload`, package installs, and tool routing during lifecycle or checkout smokes. | Run `npm run doctor` and align `pi` with the current audited baseline before release gates (`pi update` or install the matching version). The published peer range stays wildcard per Pi package docs, and the doctor enforces the minimum Pi runtime floor before package validation. |
| **npm lockfile (`packageManager`)** | `package.json` pins **npm@11**. npm 10 may only strip optional `libc` metadata on `@esbuild/*` platform entries in `package-lock.json` (no dependency version change). | Prefer `npx -y npm@11.14.0 install` when refreshing the lockfile; do not commit npm-10-only lockfile churn. |
| **`pi -p` / print mode** | Non-interactive `pi -p` may hang or emit no stdout for long real-browser smokes without a TTY. | Use **tmux**-driven interactive `pi` for release evidence and checkout smokes; reserve `-p` for short, non-browser checks. |
| **Real-browser cleanup** | `real-upstream`, Sauce Demo, and live-site runs can leave defunct Chrome/`agent-browser` children if a session aborts mid-flow. | Close via `agent_browser` / `agent-browser` `close`, kill stray tmux sessions, and remove temp screenshots/HARs under `/tmp` or your chosen artifact dirs. |
@@ -261,11 +270,11 @@ That npm script sets `PI_AGENT_BROWSER_REAL_UPSTREAM=1` for the test process. To
This suite requires the installed `agent-browser --version` to exactly match `scripts/agent-browser-capability-baseline.mjs`. It serves fixture pages from localhost and checks stable `details`/`data` keys via `test/fixtures/agent-browser-real-output-shapes.json`. Coverage groups:
- **Inspection and skills (stateless JSON):** `--version`, `--help`, `snapshot --help`, `skills list`, `skills get … --full`, `skills path …` (no managed `sessionName` / `usedImplicitSession`).
-
- **Managed session core and safe diagnostic matrix:** fresh `open` on the contract fixture, then implicit reuse across `eval --stdin`, `snapshot -i`, interaction commands (`click`, `dblclick`, `fill`, `type`, `focus`, `keyboard` with `type` / `inserttext`, `press`, `hover`, `check`, `uncheck`, `select`, `upload`, `drag`, `mouse`, `scroll`, `scrollintoview`, `wait` on a
+
- **Managed session core and safe diagnostic matrix:** fresh `open` on the contract fixture, then implicit reuse across `eval --stdin`, `snapshot -i`, interaction commands (`click`, `dblclick`, `fill`, `type`, `type --clear --delay`, `focus`, `keyboard` with `type` / `inserttext`, `press`, `hover`, `check`, `uncheck`, `select`, failed `select` no-match, `upload`, `drag`, `mouse`, `scroll`, off-viewport click, `scrollintoview`, `wait` on selectors in the main frame and a selected iframe), extraction (`get` variants, `is` variants, `find label … fill` via native `<label>`, `aria-label`, and `aria-labelledby`, inline `eval`), file outputs (`screenshot`, `pdf`), navigation (`back`, `forward`, `reload`, `tab list`, another `open` to the same fixture), `batch` stdin, `pushstate`, `vitals … --json`, network route/requests/HAR, diff snapshot/screenshot/url, trace/profiler, console/errors/highlight, stream enable/status/disable, and `cookies set --curl`.
- **Failure shape:** `react tree` on a page opened with `--enable react-devtools` but without a React app (expects a clear missing-renderer error with session-bound `details`).
- **Async download:** `open` on the `/download` fixture, anchor-triggered export, then `wait --download <path>` metadata and wrapper artifact reporting for the requested path.
-
The default unit suite also runs `agentBrowserExtension passes through core command coverage fallback matrix` in [`test/agent-browser.extension-validation.test.ts`](
+
The default unit suite also runs `agentBrowserExtension passes through core command coverage fallback matrix` in [`test/agent-browser.extension-validation.test.ts`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/test/agent-browser.extension-validation.test.ts): a fake upstream records argv so `connect 9222`, `download` with a selector and path, `get url`, `snapshot --compact`, and `tab new` / `tab 0` / `tab close` still prove `--json` plus implicit `--session` ordering without a browser. A second fake-upstream matrix in the same file (`agentBrowserExtension passes through non-core network debug diff stream dashboard and chat families`) pins representative `network`, `diff`, `trace` / `profiler` / `record`, `console` / `errors` / `highlight` / `inspect` / `clipboard`, `stream`, `dashboard`, and `chat` JSON shapes plus redacted `details.data` and argv echoes without a browser. A third matrix (`agentBrowserExtension passes through provider and specialized skill workflows`) asserts provider `open` argv shapes still receive `--json` plus implicit `--session` while read-only `skills get …` stays stateless (no managed session fields) and provider credential env vars are forwarded into the fake upstream log. Extend those matrices when adding passthrough coverage that should stay out of the slow real-upstream loop.
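
The kind of argv assertion a fake-upstream matrix relies on can be sketched as below. This is an assumption-laden illustration: `hasJsonAndSession` is a hypothetical helper, it assumes the `--session <name>` two-token form, and it checks only presence, because the source does not spell out the exact flag ordering the real tests pin.

```typescript
/**
 * Hypothetical check over argv recorded by a fake upstream binary.
 * Assumes the session flag is passed as two tokens: "--session" then a name.
 */
export function hasJsonAndSession(argv: string[]): boolean {
  const sessionIndex = argv.indexOf("--session");
  const sessionValue = sessionIndex >= 0 ? argv[sessionIndex + 1] : undefined;
  return (
    argv.includes("--json") &&
    sessionValue !== undefined &&
    !sessionValue.startsWith("-") // a following flag means the session name is missing
  );
}
```

A stateless command such as a read-only `skills get …` call would be expected to fail this check, since the text above says it carries no managed session fields.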
### Real upstream suite mechanics, isolation, and troubleshooting
@@ -280,7 +289,7 @@ The default unit suite also runs `agentBrowserExtension passes through core comm
- **Missing or extra `details` / `data` keys:** Update `test/fixtures/agent-browser-real-output-shapes.json` in the same change as the wrapper or presentation code that shifts those keys.
- **Timeouts:** A 120s bound covers the full matrix; repeated timeouts usually mean a hung browser, blocked loopback, or an environment preventing headful/headless launch—check upstream logs and local security tooling before loosening timeouts.
-
The current upstream `agent-browser 0.27.
+
The current upstream `agent-browser 0.27.2` `wait --download <path>` saveAs persistence limitation is tracked at [vercel-labs/agent-browser#1300](https://github.com/vercel-labs/agent-browser/issues/1300); until it is fixed, release validation must treat `details.savedFilePath` as upstream-reported metadata and use `details.artifacts[].exists` as the filesystem truth (the contract asserts the requested path is absent on disk while upstream still reports success). If the suite fails because JSON/detail keys drifted, update the wrapper behavior or refresh `test/fixtures/agent-browser-real-output-shapes.json` together with the presentation work that consumes those shapes.
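
The rule stated above (trust `details.artifacts[].exists`, not `details.savedFilePath`) can be illustrated with a small sketch. The field names `savedFilePath`, `artifacts`, and `exists` come from the text; the `path` field on each artifact, the interface names, and `downloadPersisted` are assumptions made for the example.

```typescript
// Assumed shapes: only savedFilePath, artifacts, and exists are named in the text above.
interface ArtifactReport {
  path: string; // hypothetical field used to match the requested path
  exists: boolean;
}

interface DownloadDetails {
  savedFilePath?: string; // upstream-reported metadata, not proof of a file on disk
  artifacts?: ArtifactReport[];
}

/** True only when the wrapper's artifact report confirms the requested file exists. */
export function downloadPersisted(details: DownloadDetails, requestedPath: string): boolean {
  const artifact = details.artifacts?.find((entry) => entry.path === requestedPath);
  return artifact?.exists === true;
}
```

Under the limitation described above, a result can carry a `savedFilePath` while the matching artifact reports `exists: false`; this helper returns `false` in that case.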
Example smoke prompt:
package/docs/REQUIREMENTS.md
CHANGED
@@ -81,7 +81,7 @@ Define the product requirements and constraints for `pi-agent-browser-native`.
- Because direct-binary usage is commonly blocked in normal agent sessions, the repo must carry a local command reference for the effective `agent_browser` surface and keep it in sync with upstream changes.
- Repository verification must include a lightweight command-reference drift check against the targeted installed upstream `agent-browser` version.
- Published package contents should include the canonical user-facing docs plus `LICENSE`.
-
- Published package contents should exclude agent-only and superseded docs such as `AGENTS.md
+
- Published package contents should exclude agent-only and superseded docs such as `AGENTS.md` and archived drafts under `docs/archive/`.
### Testing guidance