pi-agent-browser-native 0.2.44 → 0.2.46

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (66)
  1. package/CHANGELOG.md +42 -0
  2. package/README.md +20 -15
  3. package/docs/ARCHITECTURE.md +12 -10
  4. package/docs/COMMAND_REFERENCE.md +49 -27
  5. package/docs/ELECTRON.md +1 -1
  6. package/docs/RELEASE.md +6 -5
  7. package/docs/REQUIREMENTS.md +6 -3
  8. package/docs/SUPPORT_MATRIX.md +17 -13
  9. package/docs/TOOL_CONTRACT.md +87 -46
  10. package/docs/platform-smoke.md +4 -3
  11. package/extensions/agent-browser/index.ts +43 -450
  12. package/extensions/agent-browser/lib/bash-guard.ts +205 -0
  13. package/extensions/agent-browser/lib/electron/cdp.ts +69 -0
  14. package/extensions/agent-browser/lib/electron/cleanup.ts +5 -58
  15. package/extensions/agent-browser/lib/electron/discovery.ts +2 -9
  16. package/extensions/agent-browser/lib/electron/launch.ts +11 -65
  17. package/extensions/agent-browser/lib/electron/text.ts +13 -0
  18. package/extensions/agent-browser/lib/fs-utils.ts +18 -0
  19. package/extensions/agent-browser/lib/input-modes/job.ts +207 -21
  20. package/extensions/agent-browser/lib/input-modes/params.ts +28 -11
  21. package/extensions/agent-browser/lib/input-modes/semantic-action.ts +22 -2
  22. package/extensions/agent-browser/lib/input-modes/types.ts +5 -1
  23. package/extensions/agent-browser/lib/input-modes.ts +1 -0
  24. package/extensions/agent-browser/lib/json-schema.ts +73 -0
  25. package/extensions/agent-browser/lib/orchestration/browser-run/click-dispatch.ts +82 -11
  26. package/extensions/agent-browser/lib/orchestration/browser-run/diagnostics.ts +159 -30
  27. package/extensions/agent-browser/lib/orchestration/browser-run/final-result.ts +53 -2
  28. package/extensions/agent-browser/lib/orchestration/browser-run/index.ts +1 -0
  29. package/extensions/agent-browser/lib/orchestration/browser-run/prepare.ts +751 -32
  30. package/extensions/agent-browser/lib/orchestration/browser-run/process-output.ts +38 -7
  31. package/extensions/agent-browser/lib/orchestration/browser-run/prompt-guards.ts +0 -46
  32. package/extensions/agent-browser/lib/orchestration/browser-run/session-state.ts +10 -1
  33. package/extensions/agent-browser/lib/orchestration/browser-run/types.ts +28 -1
  34. package/extensions/agent-browser/lib/orchestration/electron-host/index.ts +1 -6
  35. package/extensions/agent-browser/lib/orchestration/input-plan.ts +15 -3
  36. package/extensions/agent-browser/lib/orchestration/output-file.ts +86 -0
  37. package/extensions/agent-browser/lib/pi-tool-rendering.ts +252 -0
  38. package/extensions/agent-browser/lib/playbook.ts +26 -26
  39. package/extensions/agent-browser/lib/process.ts +1 -1
  40. package/extensions/agent-browser/lib/prompt-policy.ts +1 -18
  41. package/extensions/agent-browser/lib/results/artifact-manifest.ts +1 -4
  42. package/extensions/agent-browser/lib/results/artifact-state.ts +7 -3
  43. package/extensions/agent-browser/lib/results/contracts.ts +6 -2
  44. package/extensions/agent-browser/lib/results/envelope.ts +11 -2
  45. package/extensions/agent-browser/lib/results/network-routes.ts +7 -4
  46. package/extensions/agent-browser/lib/results/network.ts +7 -1
  47. package/extensions/agent-browser/lib/results/presentation/artifacts.ts +88 -20
  48. package/extensions/agent-browser/lib/results/presentation/batch.ts +84 -12
  49. package/extensions/agent-browser/lib/results/presentation/diagnostics.ts +81 -26
  50. package/extensions/agent-browser/lib/results/presentation/errors.ts +13 -0
  51. package/extensions/agent-browser/lib/results/presentation/registry.ts +60 -0
  52. package/extensions/agent-browser/lib/results/presentation.ts +10 -1
  53. package/extensions/agent-browser/lib/results/snapshot-high-value-controls.ts +16 -5
  54. package/extensions/agent-browser/lib/results/snapshot.ts +2 -0
  55. package/extensions/agent-browser/lib/runtime.ts +10 -1
  56. package/extensions/agent-browser/lib/session-page-state.ts +15 -6
  57. package/extensions/agent-browser/lib/string-enum-schema.ts +20 -0
  58. package/extensions/agent-browser/lib/web-search.ts +31 -13
  59. package/package.json +2 -2
  60. package/platform-smoke.config.mjs +5 -2
  61. package/scripts/platform-smoke/build-ubuntu-image.mjs +25 -0
  62. package/scripts/platform-smoke/crabbox-runner.mjs +5 -1
  63. package/scripts/platform-smoke/doctor.mjs +6 -2
  64. package/scripts/platform-smoke/linux-image/Dockerfile +3 -5
  65. package/scripts/platform-smoke/targets.mjs +2 -1
  66. package/extensions/agent-browser/lib/orchestration/browser-run/browser-action-model.ts +0 -154
@@ -37,6 +37,7 @@ Tool parameters (use exactly one of `args`, `semanticAction`, `job`, `qa`, `sour
 
  ```json
  { "semanticAction": { "action": "click", "locator": "text", "value": "Submit" }, "sessionMode": "auto" }
+ { "semanticAction": { "action": "fill", "selector": "@e1", "text": "prompt text" } }
  { "semanticAction": { "action": "select", "selector": "#flavor", "value": "chocolate" } }
  ```
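The parameter shapes above are alternatives: a call carries exactly one of `args`, `semanticAction`, `job`, `qa`, `sourceLookup`, `networkSourceLookup`, or `electron`. A minimal TypeScript sketch of that rule follows. `INPUT_MODES`, `ModeCheck`, and `pickInputMode` are hypothetical names chosen for illustration and are not the extension's actual exports; fields such as `sessionMode`, `outputPath`, and `timeoutMs` are treated as orthogonal options.

```typescript
// Hypothetical sketch of the "exactly one input mode" rule; not the package's real code.
const INPUT_MODES = [
  "args",
  "semanticAction",
  "job",
  "qa",
  "sourceLookup",
  "networkSourceLookup",
  "electron",
] as const;

type InputMode = (typeof INPUT_MODES)[number];

type ModeCheck =
  | { ok: true; mode: InputMode }
  | { ok: false; error: string };

function pickInputMode(params: Record<string, unknown>): ModeCheck {
  // Collect every mode field that is actually set on this call.
  const present: InputMode[] = INPUT_MODES.filter(
    (mode) => params[mode] !== undefined,
  );
  const [only, ...extra] = present;
  if (only === undefined) {
    return { ok: false, error: `expected one of: ${INPUT_MODES.join(", ")}` };
  }
  if (extra.length > 0) {
    return {
      ok: false,
      error: `mutually exclusive fields used together: ${present.join(", ")}`,
    };
  }
  return { ok: true, mode: only };
}
```

Under these assumptions, `pickInputMode({ args: ["snapshot", "-i"], sessionMode: "auto" })` accepts the call as `args` mode, while a call that sets both `args` and `job` is rejected before anything is spawned.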
 
@@ -62,13 +63,15 @@ Tool parameters (use exactly one of `args`, `semanticAction`, `job`, `qa`, `sour
  ```
 
  - `args`: exact `agent-browser` CLI tokens after the binary name. Omit when using `semanticAction`, `job`, `qa`, `sourceLookup`, `networkSourceLookup`, or `electron` instead (mutually exclusive).
- - `semanticAction`: optional shorthand for common `find` flows and native dropdown `select`; compiles to upstream argv and is rejected together with `args`, `job`, `qa`, `sourceLookup`, `networkSourceLookup`, or `electron` on the same call.
- - `job`: optional constrained short-workflow schema; compiles to existing upstream `batch` args/stdin and reports the compiled plan in `details.compiledJob`.
- - `qa`: optional lightweight QA preset; compiles to the same batch path and reports `details.compiledQaPreset` plus `details.qaPreset` pass/fail evidence.
+ - `semanticAction`: optional shorthand for common `find` flows, direct selector/ref click/check/fill, and native dropdown `select`; compiles to upstream argv and is rejected together with `args`, `job`, `qa`, `sourceLookup`, `networkSourceLookup`, or `electron` on the same call.
+ - `job`: optional constrained short-workflow schema; compiles to existing upstream `batch` args/stdin, defaults to `batch --bail` (`failFast: true`), and reports the compiled plan in `details.compiledJob`.
+ - `qa`: optional lightweight QA preset; compiles to the same fail-fast batch path and reports `details.compiledQaPreset` plus `details.qaPreset` pass/fail evidence.
  - `sourceLookup`: **EXPERIMENTAL — candidates only** for local UI-to-source hints; compiles to the same `batch` path, reports `details.compiledSourceLookup` and `details.sourceLookup`, and never reclassifies a fully successful upstream batch as failed the way `qa` can (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#sourcelookup) and the longer notes below).
  - `networkSourceLookup`: **EXPERIMENTAL — candidates only** for failed request-to-source hints; compiles to generated `batch`, reports `details.compiledNetworkSourceLookup` and `details.networkSourceLookup`, and never assigns blame or edits files.
  - `electron`: optional Electron desktop-app shorthand. `list`, `status`, `cleanup`, and `probe` are wrapper-owned host/session helpers; `launch` starts a wrapper-owned isolated Electron profile and attaches through upstream `connect`.
  - `stdin`: only for `batch`, `eval --stdin`, and `auth save --password-stdin`; other command/stdin combinations are rejected before `agent-browser` is launched. `job`, `qa`, `sourceLookup`, `networkSourceLookup`, and `electron` generate or manage their own input.
+ - `outputPath`: optional wrapper-owned local file sink for successful results. Use it for durable `eval`, `get`, `snapshot`, or diagnostic outputs; `details.outputFile` reports the saved path and byte count. If caller argv includes upstream `--json`, the visible JSON content stays parseable and the save notice is only in `details.outputFile`.
+ - `timeoutMs`: optional per-call wrapper subprocess watchdog override in milliseconds for browser CLI modes. Use it for known-slow opens/captures rather than relying on repeated retries.
  - `sessionMode`:
  - `"auto"` reuses the extension-managed session when possible.
  - `"fresh"` rotates that managed session to a fresh upstream launch so launch-scoped flags (`--auto-connect`, `--cdp`, `--enable`, `--executable-path`, `--init-script`, `--device`, `--profile`, `--provider`, `-p`, `--session-name`, `--state`) apply.
@@ -153,23 +156,25 @@ Examples:
  { "semanticAction": { "action": "click", "locator": "role", "value": "button", "name": "Close" } }
  { "semanticAction": { "action": "click", "locator": "role", "role": "button", "name": "Continue without Signing In" } }
  { "semanticAction": { "action": "fill", "locator": "label", "value": "Email", "text": "user@example.com" } }
+ { "semanticAction": { "action": "fill", "selector": "@e1", "text": "prompt text" } }
+ { "semanticAction": { "action": "click", "selector": "#submit" } }
  { "semanticAction": { "action": "select", "selector": "#flavor", "value": "chocolate" } }
  { "semanticAction": { "action": "click", "locator": "text", "value": "Close", "session": "named-browser" } }
  { "args": ["scrollintoview", "@e12"] }
  { "args": ["snapshot", "-i"] }
  ```
 
- The optional native `semanticAction` object is only a thin schema for common locator-based actions and native dropdown selection; it compiles locator actions to existing upstream `find` commands, compiles `action: "select"` to upstream `select <selector> <value...>`, and reports the compiled argv in `details.compiledSemanticAction` (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction) for the full field rules). For `locator: "role"`, pass either `value: "button"` or `role: "button"`; if both are present they must match. It is a top-level alternative to `args`, `job`, `qa`, `sourceLookup`, `networkSourceLookup`, and `electron`, not a nested shape inside `batch` stdin arrays. Add `session` inside `semanticAction` when the shorthand should target a named upstream browser session; the compiled argv prepends `--session <name>` before `find` or `select`, and fallback candidate actions preserve that prefix. For active sessions, role/name click/check/fill shorthands may resolve through the current `snapshot -i` refs before execution so hidden duplicate matches do not steal the action; fill only resolves when there is one exact editable current ref match. Inspect `details.effectiveArgs` when you need the exact executed argv. `semanticAction` does not expose `uncheck` while upstream `find ... uncheck` is not runtime-supported; use raw `uncheck <selector-or-ref>` after choosing a stable selector or current snapshot ref. `select` shorthand intentionally requires a stable selector or current `@ref` plus `value`/`values`; upstream `find` does not expose a verified `select` action, so role/name/label dropdown resolution stays a snapshot/selector decision instead of hidden wrapper magic. If a raw `find` or semantic action misses with `selector-not-found`, the wrapper may take one fresh snapshot and append `Current snapshot ref fallback` when that snapshot has exact visible role/name matches for the failed target. 
Non-fill matches can include direct `try-current-visible-ref*` next actions. Semantic click misses may also include `Agent-browser candidate fallbacks`; `details.nextActions` first recommends a fresh `snapshot -i` and may include bounded role/name retries such as `button`/`link` for a missed `text` click, each as a `try-*-candidate` entry carrying redacted `find role …` argv.
+ The optional native `semanticAction` object is only a thin schema for common locator-based actions, direct selector/ref click/check/fill, and native dropdown selection; it compiles locator actions to existing upstream `find` commands, direct selector/ref actions to `click` / `check` / `fill`, compiles `action: "select"` to upstream `select <selector> <value...>`, and reports the compiled argv in `details.compiledSemanticAction` (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction) for the full field rules). For `locator: "role"`, pass either `value: "button"` or `role: "button"`; if both are present they must match. It is a top-level alternative to `args`, `job`, `qa`, `sourceLookup`, `networkSourceLookup`, and `electron`, not a nested shape inside `batch` stdin arrays. Add `session` inside `semanticAction` when the shorthand should target a named upstream browser session; the compiled argv prepends `--session <name>` before `find`, direct selector/ref commands, or `select`, and fallback candidate actions preserve that prefix. For active sessions, role/name click/check/fill shorthands may resolve through the current `snapshot -i` refs before execution so hidden duplicate matches do not steal the action; fill only resolves when there is one exact editable current ref match. Inspect `details.effectiveArgs` when you need the exact executed argv. `semanticAction` does not expose `uncheck` while upstream `find ... uncheck` is not runtime-supported; use raw `uncheck <selector-or-ref>` after choosing a stable selector or current snapshot ref. `select` shorthand intentionally requires a stable selector or current `@ref` plus `value`/`values`; upstream `find` does not expose a verified `select` action, so role/name/label dropdown resolution stays a snapshot/selector decision instead of hidden wrapper magic. 
If a raw `find` or semantic action misses with `selector-not-found`, the wrapper may take one fresh snapshot and append `Current snapshot ref fallback` when that snapshot has exact visible role/name matches for the failed target. Non-fill matches can include direct `try-current-visible-ref*` next actions. Semantic click misses may also include `Agent-browser candidate fallbacks`; `details.nextActions` first recommends a fresh `snapshot -i` and may include bounded role/name retries such as `button`/`link` for a missed `text` click, each as a `try-*-candidate` entry carrying redacted `find role …` argv.
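The compile step described above for direct selector/ref actions and `select` can be sketched as follows. This is a simplified illustration under stated assumptions: `DirectAction` and `compileDirectAction` are hypothetical names, the locator-based `find` path and the snapshot-ref resolution are omitted, and the real field rules live in `TOOL_CONTRACT.md`.

```typescript
// Hypothetical sketch: direct selector/ref actions and `select` compiled to argv.
// Locator-based `find` compilation is intentionally left out.
type DirectAction =
  | { action: "click" | "check"; selector: string; session?: string }
  | { action: "fill"; selector: string; text: string; session?: string }
  | {
      action: "select";
      selector: string;
      value?: string;
      values?: string[];
      session?: string;
    };

function compileDirectAction(input: DirectAction): string[] {
  // A named session is prepended before the command token.
  const prefix = input.session ? ["--session", input.session] : [];
  switch (input.action) {
    case "click":
    case "check":
      return [...prefix, input.action, input.selector];
    case "fill":
      return [...prefix, "fill", input.selector, input.text];
    case "select": {
      // `select` needs a selector plus `value` or `values`.
      const values =
        input.values ?? (input.value !== undefined ? [input.value] : []);
      if (values.length === 0) {
        throw new Error("select requires value or values");
      }
      return [...prefix, "select", input.selector, ...values];
    }
  }
}
```

For example, `{ action: "fill", selector: "@e1", text: "prompt text" }` would compile to `fill @e1 "prompt text"` in this sketch, and adding `session: "named-browser"` places `--session named-browser` in front of the command.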
 
- For desktop or host-controlled rich inputs, treat a semantic `fill` miss differently. Active-session role/name fills can execute through one exact current editable `combobox`, `searchbox`, or `textbox` ref before upstream `find` runs. If a later selector miss still finds an exact current editable ref (`searchbox` or `textbox`), `details.richInputRecovery` and visible `Rich input recovery` describe the candidate and append `focus-current-editable-ref*` / `click-current-editable-ref*` next actions. Those actions deliberately do **not** copy the fill text and never press `Enter` or submit. Use the safe ladder instead: refresh refs, choose the current editable `@ref`, focus or click it, then send the intended text with `keyboard inserttext` or `keyboard type` in a separate call. Do not auto-submit unless the user flow explicitly calls for it.
+ For desktop, contenteditable, or host-controlled rich inputs, treat a semantic `fill` miss or mismatch differently. Active-session role/name fills can execute through one exact current editable `combobox`, `searchbox`, or `textbox` ref before upstream `find` runs. If a later selector miss still finds an exact current editable ref (`searchbox` or `textbox`), `details.richInputRecovery` and visible `Rich input recovery` describe the candidate and append `focus-current-editable-ref*` / `click-current-editable-ref*` next actions. Those actions deliberately do **not** copy the fill text and never press `Enter` or submit. Direct `fill @ref <text>` on contenteditable refs may also append/prepend instead of replacing; when the latest snapshot proves the target is contenteditable, the wrapper verifies `get text` after a successful fill and appends `details.fillVerification` plus `inspect-after-fill-verification` / `verify-filled-value` if the visible text does not match. Use the safe ladder instead: refresh refs, choose the current editable `@ref`, focus or click it, then send the intended text with `keyboard inserttext` or `keyboard type` in a separate call. Do not auto-submit unless the user flow explicitly calls for it.
 
  Do not assume Playwright selector dialects such as `text=Close` or `button:has-text('Close')` are supported wrapper syntax. If you need those forms, verify current upstream `agent-browser` behavior first; otherwise use refs, `find`, or known CSS selectors.
 
- Treat `@e…` refs as page-scoped. After a successful `snapshot`, the wrapper records the latest refs and page target for that session; mutation-prone ref commands such as non-form `click @e4`, `select @e5 chocolate`, or batch steps with old refs fail with `failureCategory: "stale-ref"` when the page target changed or the ref is absent from the latest same-page snapshot. If a session `snapshot -i` fails with `No active page`, the wrapper invalidates prior refs for that session; later mutation-prone `@e…` calls fail before upstream until a successful fresh `snapshot -i` records refs again. Inside `batch` stdin JSON, the wrapper also walks steps in order before spawn: steps whose first token can navigate or mutate set a latch; a later step whose first token is `snapshot` clears that latch for following rows; guarded steps that still mention `@e…` after an uncleared latch fail with the same `stale-ref` bucket without launching upstream. Same-snapshot form fills and native form-control steps are allowed before a click or submit step, so `fill`, `check`/`uncheck` checkbox or radio refs, `select` combobox refs, then a final submit `click` can run from one snapshot; checkbox/radio `click`s remain conservative unless followed by a fresh snapshot. Split dynamic or autosubmit forms with a fresh snapshot if a control interaction rerenders the targets. Follow the `refresh-interactive-refs` next action (it includes `--session <name>` when needed) and prefer stable `find` or `semanticAction` locators when navigation or rerendering is likely. Contract detail: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`refSnapshot`, `refSnapshotInvalidation`).
+ Treat `@e…` refs as page-scoped. After a successful `snapshot`, the wrapper records the latest refs and page target for that session; mutation-prone ref commands such as non-form `click @e4`, `select @e5 chocolate`, or batch steps with old refs fail with `failureCategory: "stale-ref"` when the page target changed or the ref is absent from the latest same-page snapshot. If a session `snapshot -i` fails with `No active page`, the wrapper invalidates prior refs for that session; later mutation-prone `@e…` calls fail before upstream until a successful fresh `snapshot -i` records refs again. Inside `batch` stdin JSON, the wrapper also walks steps in order before spawn: steps whose first token can navigate or mutate set a latch; a later step whose first token is `snapshot` clears that latch for following rows; guarded steps that still mention `@e…` after an uncleared latch fail with the same `stale-ref` bucket without launching upstream. Same-snapshot form fills and native form-control steps are allowed before a click or submit step, so `fill`, `check`/`uncheck` checkbox or radio refs, checkbox/radio `click`/`tap` refs, `select` combobox refs, then a final submit `click` can run from one snapshot. Split dynamic or autosubmit forms with a fresh snapshot if a control interaction rerenders the targets. Follow the `refresh-interactive-refs` next action (it includes `--session <name>` when needed) and prefer stable `find` or `semanticAction` locators when navigation or rerendering is likely. Contract detail: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`refSnapshot`, `refSnapshotInvalidation`).
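The pre-spawn walk over `batch` steps can be pictured as a small latch. The sketch below is a simplification under stated assumptions: `NAVIGATE_OR_MUTATE` is an illustrative token set (the wrapper's real list is not shown in this diff), `findStaleRefStep` is a hypothetical name, and the same-snapshot exceptions for checkbox/radio `click`/`tap` refs are not modelled.

```typescript
// Hypothetical sketch of the stale-ref latch over batch steps.
// Assumption: this token set is illustrative, not the wrapper's actual list.
const NAVIGATE_OR_MUTATE = new Set(["open", "click", "press"]);
const REF_PATTERN = /@e\d+/;

// Returns the index of the first step that would be rejected as stale, or -1.
function findStaleRefStep(steps: string[][]): number {
  let latched = false;
  for (let i = 0; i < steps.length; i++) {
    const step = steps[i] ?? [];
    const first = step[0];
    if (first === "snapshot") {
      // A fresh snapshot clears the latch for the rows that follow.
      latched = false;
      continue;
    }
    if (latched && step.some((token) => REF_PATTERN.test(token))) {
      // An @e ref is used after a navigate/mutate step with no snapshot in between.
      return i;
    }
    if (first !== undefined && NAVIGATE_OR_MUTATE.has(first)) {
      latched = true;
    }
  }
  return -1;
}
```

In this model, `fill @e1`, `fill @e2`, then `click @e3` passes from one snapshot, while `open <url>` followed directly by `click @e4` is flagged at the second step unless a `snapshot` row sits between them.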
 
- A successful `click` result means upstream reported a target, not that the app definitely handled the event. For top-level non-Electron clicks, the wrapper installs a bounded DOM-event probe; when upstream reports success but no trusted event reaches the target, it fails the tool and exposes `details.clickDispatch` plus a `Click dispatch diagnostic` line with explicit retry/inspect next actions (no in-page click replay). When the workflow depends on a mutation, use `details.pageChangeSummary`, a wait, URL/text extraction, or a fresh `snapshot -i` before trusting the state; if nothing changed, retry with a current visible ref or stable selector and report the workflow issue. For static local fixtures or debugging where the user explicitly accepts scripted activation, `eval --stdin` can call `document.querySelector(...).click()` to exercise inline handlers and app code; treat that as an untrusted programmatic event, not as evidence that CDP/user-like clicking works. Preserve explicit user stop boundaries: if the user says to stop before a final order, post, purchase, or submit action, gather evidence from that page and do not click the final action or use scripted activation to bypass the stop. The wrapper also blocks likely final order/submit click targets under those prompts and returns `details.promptGuard` with `failureCategory: "policy-blocked"`.
+ A successful `click` result means upstream reported a target, not that the app definitely handled the event. For top-level non-Electron clicks, the wrapper installs a bounded DOM-event probe; when upstream reports success but no trusted event reaches the target, it fails the tool and exposes `details.clickDispatch` plus a `Click dispatch diagnostic` line with explicit retry/inspect next actions (no in-page click replay). If the probe evidence shows the target is outside a nested scroll container or viewport, `details.clickDispatch.scrollContainer` and `scroll-target-into-view-after-dispatch-miss` point to `scrollintoview <target>` before retry. When the workflow depends on a mutation, use `details.pageChangeSummary`, a wait, URL/text extraction, or a fresh `snapshot -i` before trusting the state; if nothing changed, retry with a current visible ref or stable selector and report the workflow issue. For static local fixtures or debugging where the user explicitly accepts scripted activation, `eval --stdin` can call `document.querySelector(...).click()` to exercise inline handlers and app code; treat that as an untrusted programmatic event, not as evidence that CDP/user-like clicking works. Respect explicit user stop boundaries yourself: if the user says to stop before a final order, post, purchase, or submit action, gather evidence from that page and do not click the final action or use scripted activation to bypass the stop. The wrapper does not infer broad business intent from prompt text; `details.promptGuard` is reserved for concrete artifact-before-close checks. `press`, `key`, `keydown`, and `keyup` accept exactly one key token; focus or click the target first, then run `press Enter` or another single-key command.
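The single-key rule for `press`, `key`, `keydown`, and `keyup` is easy to check before a call is sent. A minimal sketch, assuming the command is the first token of `args` (a leading `--session <name>` prefix is not handled here); `checkKeyCommand` is a hypothetical helper, not part of the package.

```typescript
// Hypothetical pre-flight check for the "exactly one key token" rule.
const SINGLE_KEY_COMMANDS = new Set(["press", "key", "keydown", "keyup"]);

// Returns an error message when the rule is violated, otherwise null.
function checkKeyCommand(args: string[]): string | null {
  const [command, ...rest] = args;
  if (command === undefined || !SINGLE_KEY_COMMANDS.has(command)) {
    return null; // Not a key command, so the rule does not apply.
  }
  return rest.length === 1
    ? null
    : `${command} expects exactly one key token, got ${rest.length}`;
}
```

With this check, `["press", "Enter"]` passes, while `["press", "Control", "a"]` is reported as invalid.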
 
- When a **top-level** `click` succeeds (not a `click` hidden inside a `batch`/`job` tool call—the unified command must be `click`), the upstream payload includes `data.clicked`, no `details.clickDispatch` diagnostic fired for the same result, and the wrapper sees the active tab URL unchanged after the same normalization it uses for ref guards (**`#fragment` ignored**), it may run one extra `snapshot -i` and surface `Possible overlay blockers` plus `details.overlayBlockers` (`candidates`, `summary`, and a `snapshot` map that can refresh `refSnapshot`) when that snapshot shows strong modal context (`dialog` / `alertdialog`) **and** up to three close/dismiss-like controls; page-wide words such as privacy, sign in, or banner alone do not trigger it. The URL check compares the session’s prior pinned tab target to `details.navigationSummary.url` after the click; that summary is gathered with one read-only `eval` when the click JSON omits **both** string `data.url` and `data.title`—if upstream already echoes either field, overlay diagnostics are skipped on this path. The diagnostic is skipped if the wrapper already applied tab-focus correction or about-blank recovery on that result. Appended `inspect-overlay-state` / `try-overlay-blocker-candidate-*` entries in `details.nextActions` include `--session <name>` when the session is named, same as other session-scoped follow-ups. Treat `inspect-overlay-state` as the safe first follow-up; only use a `try-overlay-blocker-candidate-*` next action when the candidate is clearly the control you intend to close.
+ Successful `snapshot -i` results can also surface `Possible overlay blockers` when their own refs already show dialog/alertdialog context plus close/dismiss controls, so agents can detect likely obstruction before clicking. When a **top-level** `click` succeeds (not a `click` hidden inside a `batch`/`job` tool call—the unified command must be `click`), the upstream payload includes `data.clicked`, no `details.clickDispatch` diagnostic fired for the same result, and the wrapper sees the active tab URL unchanged after the same normalization it uses for ref guards (**`#fragment` ignored**), it may run one extra `snapshot -i` and surface `Possible overlay blockers` plus `details.overlayBlockers` (`candidates`, `summary`, and a `snapshot` map that can refresh `refSnapshot`) when that snapshot shows strong modal context (`dialog` / `alertdialog`) **and** up to three close/dismiss-like controls; page-wide words such as privacy, sign in, or banner alone do not trigger it. The URL check compares the session’s prior pinned tab target to `details.navigationSummary.url` after the click; that summary is gathered with one read-only `eval` when the click JSON omits **both** string `data.url` and `data.title`—if upstream already echoes either field, overlay diagnostics are skipped on this path. The diagnostic is skipped if the wrapper already applied tab-focus correction or about-blank recovery on that result. Appended `inspect-overlay-state` / `try-overlay-blocker-candidate-*` entries in `details.nextActions` include `--session <name>` when the session is named, same as other session-scoped follow-ups. Treat `inspect-overlay-state` as the safe first follow-up; only use a `try-overlay-blocker-candidate-*` next action when the candidate is clearly the control you intend to close.
 
  ### Extract page data
 
@@ -188,7 +193,7 @@ When you already know several visible refs or selectors, extract them in one `ba
 
  Prefer `get` and scoped `eval --stdin` for read-only extraction. Getter names are grouped under `get`: use `get title`, `get url`, or `get text <selector>`, not shortcut commands such as `title` or `url`. When upstream reports an unknown command, unknown subcommand, or unrecognized command for a single-token shortcut (`attr`, `count`, `html`, `text`, `title`, `url`, or `value`), the wrapper adds a visible grouped-`get` hint; only `title` and `url` also get exact read-only `details.nextActions` (`use-get-title` / `use-get-url`, with `--session` preserved when the failed call named a session). If another `Agent-browser hint:` (selector dialect or stale-ref recovery) was already appended to the same error text, the getter hint is omitted.
 
- Return the intended JavaScript value from `eval --stdin` instead of relying on `console.log`. In the native pi tool, the JavaScript belongs in the top-level `stdin` field; do **not** write it as a third `args` item such as `{ "args": ["eval", "--stdin", "document.title"] }`. The wrapper tolerates that common misplaced form by moving the trailing token to stdin before spawn, but the explicit `stdin` field is the documented form and avoids ambiguity for multiline snippets. For object-shaped extraction, pass a plain expression such as `({ title: document.title, url: location.href })`; if you send a function-shaped snippet, invoke it explicitly, for example `(() => ({ title: document.title }))()`. When upstream serializes a function result to `{}`, the wrapper can append `Eval stdin hint` and `details.evalStdinHint`.
+ Return the intended JavaScript value from `eval --stdin` instead of relying on `console.log`. In the native pi tool, the JavaScript belongs in the top-level `stdin` field; do **not** write it as a third `args` item such as `{ "args": ["eval", "--stdin", "document.title"] }`. The wrapper tolerates that common misplaced form by moving the trailing token to stdin before spawn, but the explicit `stdin` field is the documented form and avoids ambiguity for multiline snippets. For object-shaped extraction, pass a plain expression such as `({ title: document.title, url: location.href })`; if the result should be kept outside the transcript as a durable file, add top-level `outputPath` (for example `{ "args": ["eval", "--stdin"], "stdin": "({ title: document.title })", "outputPath": "logs/page-title.json" }`). If you send a function-shaped snippet, invoke it explicitly, for example `(() => ({ title: document.title }))()`. When upstream serializes a function result to `{}`, the wrapper can append `Eval stdin hint` and `details.evalStdinHint`.
 
  On tabbed or hidden-DOM pages, `get text <selector>` reads the upstream-selected match, which may be hidden even when a later match is visible. For non-`@ref` CSS selectors with multiple matches, including successful `batch` steps, the wrapper may add `Selector text visibility warning`, `details.selectorTextVisibility` (and `details.selectorTextVisibilityAll` for multiple batched warnings), and `inspect-visible-text-candidates` next actions. The warning names the matching `details.nextActions` id so agents know to use a fresher `snapshot -i`, a visible `@ref`, or a more specific selector instead of trusting hidden tab content. If the probe still leaves multiple visible candidates, do not keep reading the broad selector; switch to a current visible `@ref`, add a narrower selector such as a known panel/container id, or use a targeted `eval --stdin` expression that filters for visible elements and returns the intended index/text.
 
@@ -200,7 +205,7 @@ On tabbed or hidden-DOM pages, `get text <selector>` reads the upstream-selected
 
  Use `batch --bail` when later steps should stop after the first failed command.
 
- For short constrained flows, use top-level `job` instead of hand-writing `batch` stdin. Supported job steps are `open`, `click`, `fill`, `select`, `wait`, `assertText`, `assertUrl`, `waitForDownload`, and `screenshot`; `select` requires `selector` plus `value` or `values`, and compiles to upstream `select <selector> <value...>`. The wrapper compiles steps to upstream `batch` and records `details.compiledJob.steps[]`. There is still no separate first-class catalog of reusable named browser recipes above `job`, the `qa` preset, and raw `batch`; see [`ARCHITECTURE.md`](ARCHITECTURE.md#no-reusable-recipe-layer-yet) for the closed `RQ-0068` decision and revisit bar.
+ For short constrained flows, use top-level `job` instead of hand-writing `batch` stdin. Supported job steps are `open`, `click`, `fill`, `type`, `select`, `wait`, `assertText`, `assertUrl`, `waitForDownload`, `snapshot`, and `screenshot`. `open` can include `loadState: "domcontentloaded" | "load" | "networkidle"` to insert a `wait --load …` row immediately after navigation before the next click/read step. `click` and `fill` accept either a stable `selector` or the same semantic locator fields as top-level `semanticAction` (`locator`, plus `role`/`name` or `value` as appropriate) and compile locator steps to upstream `find` argv. `type` focuses an optional selector, sends text through upstream keyboard typing, can insert `wait` rows via `delayMs` for human-paced input, and can append a final `press` key such as `Enter`; delayed typing is capped at 200 characters per step, and generated per-character rows are compacted in model-visible batch text while remaining available in `details.batchSteps`. `select` requires `selector` plus `value` or `values`, and compiles to upstream `select <selector> <value...>`. By default the wrapper compiles steps to upstream `batch --bail` so a failed setup/fill/assertion step stops later mutating clicks; set `failFast: false` only when you explicitly need continue-after-error diagnostics. The wrapper records `details.compiledJob.steps[]` plus `details.compiledJob.failFast`. There is still no separate first-class catalog of reusable named browser recipes above `job`, the `qa` preset, and raw `batch`; see [`ARCHITECTURE.md`](ARCHITECTURE.md#no-reusable-recipe-layer-yet) for the closed `RQ-0068` decision and revisit bar.
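Two of the new `job` behaviours, the `batch --bail` default and the `wait --load …` row inserted after an `open` with `loadState`, can be sketched as below. This is a partial illustration only: `JobStep` covers just `open` and `select`, `compileJob` and `CompiledJob` are hypothetical names, and the real shape of `details.compiledJob` may differ.

```typescript
// Hypothetical sketch of job compilation; covers only `open` and `select` steps.
type LoadState = "domcontentloaded" | "load" | "networkidle";

type JobStep =
  | { action: "open"; url: string; loadState?: LoadState }
  | { action: "select"; selector: string; value?: string; values?: string[] };

interface CompiledJob {
  args: string[]; // argv passed to upstream
  rows: string[][]; // batch steps, one token array per row
  failFast: boolean;
}

function compileJob(job: { steps: JobStep[]; failFast?: boolean }): CompiledJob {
  // Fail-fast is the default; callers opt out with `failFast: false`.
  const failFast = job.failFast ?? true;
  const rows: string[][] = [];
  for (const step of job.steps) {
    if (step.action === "open") {
      rows.push(["open", step.url]);
      if (step.loadState) {
        // Wait for the requested load state right after navigation.
        rows.push(["wait", "--load", step.loadState]);
      }
    } else {
      const values =
        step.values ?? (step.value !== undefined ? [step.value] : []);
      if (values.length === 0) {
        throw new Error("select requires value or values");
      }
      rows.push(["select", step.selector, ...values]);
    }
  }
  return { args: failFast ? ["batch", "--bail"] : ["batch"], rows, failFast };
}
```

Keeping `failFast` in the compiled result mirrors the documented `details.compiledJob.failFast` field, so a caller can confirm which policy was applied.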
204
209
 
205
210
  **Job navigation is explicit.** A `click` step (or other navigation-prone interaction) does not prove the next page loaded. The wrapper does not auto-insert `assertUrl` or `assertText` after clicks inside `job`; add those steps yourself with the URL pattern or on-page text you expect, especially after forms, checkout, tabs, or submit buttons, before screenshots or later steps.
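
Because the wrapper does not add assertions after clicks, a caller can lint a job before sending it. The sketch below is one conservative heuristic under stated assumptions: `JobStep`, `ASSERTION_ACTIONS`, and `findUnverifiedClicks` are hypothetical names, and "verified" is taken to mean that the very next step is `assertUrl` or `assertText`, which is stricter than the guidance above requires.

```typescript
// Minimal step shape for this sketch; real job steps carry more fields.
interface JobStep {
  action: string;
  [field: string]: unknown;
}

// Steps treated here as proof that the expected page or content is present.
const ASSERTION_ACTIONS = ["assertUrl", "assertText"];

// Returns the indexes of `click` steps that are not immediately followed by an assertion step.
function findUnverifiedClicks(steps: JobStep[]): number[] {
  const unverified: number[] = [];
  steps.forEach((step, index) => {
    if (step.action !== "click") return;
    const next = steps[index + 1];
    if (next === undefined || ASSERTION_ACTIONS.indexOf(next.action) === -1) {
      unverified.push(index);
    }
  });
  return unverified;
}

// Example job steps using the semantic locator fields described in this section.
const exampleSteps: JobStep[] = [
  { action: "open", url: "https://example.test/form", loadState: "domcontentloaded" },
  { action: "fill", locator: "role", role: "searchbox", name: "Search", text: "agent browser" },
  { action: "click", locator: "role", role: "button", name: "Search" },
  { action: "snapshot" },
];

// The click at index 2 is followed by `snapshot`, not an assertion, so it is reported.
const unverifiedClicks = findUnverifiedClicks(exampleSteps);
```

Inserting an `assertText` or `assertUrl` step directly after the click clears the report for that step.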
206
211
 
@@ -216,6 +221,20 @@ For short constrained flows, use top-level `job` instead of hand-writing `batch`
216
221
  }
217
222
  ```
218
223
 
+ Human-paced typing flow:
+
+ ```json
+ {
+   "job": {
+     "steps": [
+       { "action": "open", "url": "https://example.test/form" },
+       { "action": "type", "selector": "#prompt", "text": "hello", "delayMs": 20, "press": "Enter" },
+       { "action": "assertText", "text": "Submitted" }
+     ]
+   }
+ }
+ ```
+
219
238
  Navigation-prone flow (open → fill → click → assert destination → screenshot):
220
239
 
221
240
  ```json
@@ -233,21 +252,21 @@ Navigation-prone flow (open → fill → click → assert destination → screen
233
252
  }
234
253
  ```
235
254
 
236
- On app pages that expose a native dropdown, add a `select` step such as `{ "action": "select", "selector": "#flavor", "value": "chocolate" }` before the assertion that depends on it.
255
+ On app pages that expose a native dropdown, add a `select` step such as `{ "action": "select", "selector": "#flavor", "value": "chocolate" }` before the assertion that depends on it. Insert `{ "action": "snapshot" }` between mutation-prone steps when a later job row needs fresh `@refs`. On pages where stable CSS is not known, use semantic job steps such as `{ "action": "fill", "locator": "role", "role": "searchbox", "name": "Search", "text": "agent browser" }` and `{ "action": "click", "locator": "role", "role": "button", "name": "Search" }` instead of guessing selectors.
237
256
 
238
257
  Use raw `args: ["batch"]` with `stdin` when you need arbitrary upstream commands, flags, or batch failure policies outside the constrained schema. Do not pass `stdin` with `job`, `qa`, `sourceLookup`, `networkSourceLookup`, or `electron`; those modes generate or manage their own input.
239
258
 
240
- For quick smoke/QA checks, use top-level `qa`. It clears enabled network/console/page-error buffers before opening the target URL, waits for page readiness, checks expected text/selector, inspects fresh network requests, console messages, and page errors, and can capture an evidence screenshot. The readiness wait defaults to `loadState: "domcontentloaded"`; set `loadState` to `"load"` or `"networkidle"` only when that stricter state is useful and the site is not expected to keep background requests alive. QA network diagnostics classify failed requests by likely impact and list failed rows first in the network preview: actionable document/script/API-style failures fail the preset, while common low-impact browser icon misses such as `favicon.ico` are surfaced as warnings (`qaPreset.warnings`) so they do not fail an otherwise healthy page. Successful QA with no failed checks returns compact model-visible prose (page URL/title when known, checks run, optional screenshot verification) while keeping the full step matrix in `details.qaPreset` and `details.batchSteps`. Failed QA presets report `details.resultCategory: "failure"`, `failureCategory: "qa-failure"`, keep verbose per-step batch output, and real Pi sessions treat the diagnostic as a failed tool result. Prose output also gets a model-visible result-category line including `Pi tool isError: true`; caller-requested `--json` output keeps the JSON string parseable and relies on the patched `isError` plus `details` fields.
259
+ For quick smoke/QA checks, use top-level `qa`. It clears enabled network/console/page-error buffers before opening the target URL, waits for page readiness, checks expected text/selector, then inspects fresh network requests, console messages, and page errors only if preceding assertions pass, and can capture an evidence screenshot. The preset compiles to `batch --bail` so a missing text/selector assertion fails crisply instead of letting slower diagnostics burn the wrapper watchdog. Expected text compiles to bounded visible-text `wait --fn … --timeout 5000` predicates after load so dense pages can pass on visible headings/copy without dumping `body` text; missing text reports a crisp QA failure. The readiness wait defaults to `loadState: "domcontentloaded"`; set `loadState` to `"load"` or `"networkidle"` only when that stricter state is useful and the site is not expected to keep background requests alive. QA network diagnostics classify failed requests by likely impact and list failed rows first in the network preview: actionable document/script/API-style failures fail the preset, while common low-impact browser icon misses such as `favicon.ico` are surfaced as warnings (`qaPreset.warnings`) so they do not fail an otherwise healthy page. Successful QA with no failed checks returns compact model-visible prose (page URL/title when known, checks run, optional screenshot verification) while keeping the full step matrix in `details.qaPreset` and `details.batchSteps`. Failed QA presets report `details.resultCategory: "failure"`, `failureCategory: "qa-failure"`, keep verbose per-step batch output, and real Pi sessions treat the diagnostic as a failed tool result. Prose output also gets a model-visible result-category line including `Pi tool isError: true`; caller-requested `--json` output keeps the JSON string parseable and relies on the patched `isError` plus `details` fields.
241
260
 
242
- The same classification drives plain `network requests` presentation: when any row counts as failed (HTTP status ≥ 400, `failed: true`, or a string `error`), model-facing text starts with a line like `Network failure summary: 0 actionable, 1 benign low-impact (1 total).`, and each preview line can end with an impact tag such as `[benign: low-impact browser icon asset]` or `[actionable: document, script, API, or non-benign request failure]`. When safe request IDs are present, `details.nextActions` adds bounded read-only follow-ups such as `network request <id>`, `networkSourceLookup` for actionable failed rows, `network requests --filter <path>`, and `network har start`; prefer those payloads over rebuilding request-id commands from prose. If the wrapper has seen a prior `network route` in the same session, matching pending fetch/XHR rows or CORS-looking errors add `details.networkRouteDiagnostics` plus executable route-mock follow-ups (`inspect-pending-routed-network-request` and `start-network-har-capture-for-route-mock`) so agents do not mistake a stalled/CORS-blocked mock for a fulfilled mock; same-origin/CORS fixture retry guidance stays in visible prose. Rules live in `classifyNetworkRequestFailure` / `summarizeNetworkFailures` in `extensions/agent-browser/lib/results/network.ts`; QA aggregation is `analyzeQaPresetResults` in `extensions/agent-browser/index.ts`.
261
+ The same classification drives plain `network requests` presentation: when any row counts as failed (HTTP status ≥ 400, `failed: true`, or a string `error`), model-facing text starts with a line like `Network failure summary: 0 actionable, 1 benign low-impact (1 total).`, and each preview line can end with an impact tag such as `[benign: low-impact browser icon asset]` or `[actionable: document, script, API, or non-benign request failure]`. When safe request IDs are present, `details.nextActions` adds bounded read-only follow-ups such as `network request <id>`, `networkSourceLookup` for actionable failed rows, `network requests --filter <path>`, `network requests --clear` before a repro, and `network har start`; prefer those payloads over rebuilding request-id commands from prose. For aggregate buffers, the wrapper accepts `network requests --current-page` / `--current-origin` to render only rows matching the active page origin, or `--current-url` for exact active document URL matching; it strips those wrapper-only flags before upstream spawn and reports counts in `details.networkRequestsPageFilter`. If the wrapper has seen a prior `network route` in the same session, matching failed, pending, or CORS-looking fetch/XHR rows add `details.networkRouteDiagnostics` plus executable route-mock follow-ups (`inspect-routed-network-request` and `start-network-har-capture-for-route-mock`) so agents do not mistake an unfulfilled mock for a fulfilled mock; same-origin/CORS fixture retry guidance stays in visible prose. `network requests` also hides `data:image` screenshot/artifact noise from the compact preview by default while preserving raw rows in `details.data.requests`. Rules live in `classifyNetworkRequestFailure` / `summarizeNetworkFailures` in `extensions/agent-browser/lib/results/network.ts`; QA aggregation is `analyzeQaPresetResults` in `extensions/agent-browser/index.ts`.
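
The failed-row rule and summary line above can be illustrated with a short sketch. It is not the code in `extensions/agent-browser/lib/results/network.ts`: `NetworkRow`, `isFailedRow`, `classifyRow`, and `summarizeFailures` are hypothetical names, the row shape is assumed, and the benign rule covers only the `favicon.ico` case named in this document.

```typescript
// Assumed row shape; only the fields named in the failure rule above are modeled.
interface NetworkRow {
  url: string;
  status?: number;
  failed?: boolean;
  error?: unknown;
}

type Impact = "ok" | "benign" | "actionable";

// A row counts as failed on HTTP status >= 400, `failed: true`, or a string `error`.
function isFailedRow(row: NetworkRow): boolean {
  return (row.status !== undefined && row.status >= 400)
    || row.failed === true
    || typeof row.error === "string";
}

// Illustrative rule only: favicon misses are benign, every other failed row is actionable.
function classifyRow(row: NetworkRow): Impact {
  if (!isFailedRow(row)) return "ok";
  return /\/favicon\.ico(\?|$)/.test(row.url) ? "benign" : "actionable";
}

// Builds a summary line in the format quoted above, or null when nothing failed.
function summarizeFailures(rows: NetworkRow[]): string | null {
  const impacts = rows.map(classifyRow);
  const actionable = impacts.filter((impact) => impact === "actionable").length;
  const benign = impacts.filter((impact) => impact === "benign").length;
  const total = actionable + benign;
  if (total === 0) return null;
  return `Network failure summary: ${actionable} actionable, ${benign} benign low-impact (${total} total).`;
}
```

Keeping the failed-row test separate from the impact rule mirrors the split described above, where the same classification feeds both plain `network requests` output and QA aggregation.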
243
262
 
244
263
  ```json
245
264
  { "qa": { "url": "https://example.com", "expectedText": "Example Domain", "screenshotPath": ".dogfood/qa-example.png" } }
246
265
  ```
247
266
 
248
- Optional `loadState`, `checkNetwork`, `checkConsole`, and `checkErrors` default to `"domcontentloaded"`, `true`, `true`, and `true`; set a check to `false` to skip that diagnostic. Omit `expectedText` and `expectedSelector` when you only need load plus diagnostics.
267
+ Optional `loadState`, `checkNetwork`, `checkConsole`, and `checkErrors` default to `"domcontentloaded"`, `true`, `true`, and `true` for URL-opening QA; set a check to `false` to skip that diagnostic. For `qa.attached`, the diagnostic checks default to `false` because upstream buffers may predate the current check; opt in with `checkNetwork`, `checkConsole`, or `checkErrors` when preserved-buffer failures are desired. Omit `expectedText` and `expectedSelector` when you only need load plus diagnostics.
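
The two sets of defaults can be summarized as a small resolver. This is a sketch under stated assumptions rather than the wrapper's code: `QaInput`, `ResolvedQaChecks`, and `resolveQaChecks` are hypothetical names, and the sketch resolves `loadState` for both modes even though this section only states its default for URL-opening QA.

```typescript
type LoadState = "domcontentloaded" | "load" | "networkidle";

// Assumed subset of the `qa` input; only fields named in this section appear.
interface QaInput {
  url?: string;
  attached?: boolean;
  loadState?: LoadState;
  checkNetwork?: boolean;
  checkConsole?: boolean;
  checkErrors?: boolean;
}

interface ResolvedQaChecks {
  loadState: LoadState;
  checkNetwork: boolean;
  checkConsole: boolean;
  checkErrors: boolean;
}

// URL-opening QA defaults the diagnostic checks to true; `qa.attached` defaults them to false.
function resolveQaChecks(qa: QaInput): ResolvedQaChecks {
  const diagnosticsDefault = qa.attached !== true;
  return {
    loadState: qa.loadState ?? "domcontentloaded",
    checkNetwork: qa.checkNetwork ?? diagnosticsDefault,
    checkConsole: qa.checkConsole ?? diagnosticsDefault,
    checkErrors: qa.checkErrors ?? diagnosticsDefault,
  };
}
```

Explicit values always win over the mode default, so an attached check can still opt into preserved-buffer diagnostics one flag at a time.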
249
268
 
250
- For attached Electron or manually connected CDP sessions, use `qa.attached` after the session exists. It does not open a URL and rejects `sessionMode: "fresh"` because it checks the current managed session. Before running diagnostics, the wrapper requires a readable `http:` or `https:` page URL on the attached session; missing URLs, read failures, and non-http(s) surfaces fail fast with recovery `nextActions` such as `tab list` and `snapshot -i` instead of running the full QA batch.
269
+ For attached Electron or manually connected CDP sessions, use `qa.attached` after the session exists. It does not open a URL and rejects `sessionMode: "fresh"` because it checks the current managed session. Before running diagnostics, the wrapper requires a readable `http:` or `https:` page URL on the attached session; missing URLs, read failures, and non-http(s) surfaces fail fast with recovery `nextActions` such as `tab list` and `snapshot -i` instead of running the full QA batch. Unlike URL-opening QA, `qa.attached` preserves existing upstream network/console/page-error buffers; by default it does not inspect those buffers so stale rows do not false-fail a current-page smoke check. Set `checkNetwork`, `checkConsole`, or `checkErrors` to `true` to opt into preserved-buffer diagnostics; model-visible text and `details.compiledQaPreset.checks.diagnosticsResetAtStart` call out that preserved diagnostics may include earlier events.
251
270
 
252
271
  ```json
253
272
  { "qa": { "attached": true, "expectedText": "Explorer", "screenshotPath": ".dogfood/electron.png" } }
@@ -341,21 +360,21 @@ A successful wait-based download renders a readable summary such as `Download co
341
360
  { "args": ["pdf", "/tmp/page.pdf"] }
342
361
  ```
343
362
 
344
- The upstream screenshot aliases are `screenshot --full` for full-page capture and `screenshot --annotate` for labeled screenshots. When a user gives exact artifact paths for screenshots, recordings, downloads, PDFs, traces, or HAR files, use those paths or explicitly report why the artifact was unavailable; do not silently substitute another path in the final report. When the latest prompt names exact required screenshot paths, `close` / `quit` / `exit` can be blocked with `details.promptGuard.reason: "requested-artifacts-missing-before-close"` until those paths appear as verified explicit artifacts.
363
+ The upstream screenshot aliases are `screenshot --full` for full-page capture and `screenshot --annotate` for labeled screenshots. Annotated screenshots can be noisy on dense pages because labels overlap real content; when labels obscure evidence, capture a scoped element screenshot, take a non-annotated screenshot, or use `snapshot -i` high-value refs as the machine-readable map. When a user gives exact artifact paths for screenshots, recordings, downloads, PDFs, traces, or HAR files, use those paths or explicitly report why the artifact was unavailable; do not silently substitute another path in the final report. When the latest prompt names exact required screenshot paths, `close` / `quit` / `exit` can be blocked with `details.promptGuard.reason: "requested-artifacts-missing-before-close"` until those paths appear as verified explicit artifacts.
345
364
 
346
- Prefer `download <selector> <path>` when the target element itself is the downloadable link/control. Use `click` plus `wait --download [path]` when a previous action starts the download indirectly.
365
+ Prefer `download <selector> <path>` when the target element itself is the downloadable link/control. For simple loopback HTML anchors with `href` and a non-ref selector, the wrapper first reads the resolved anchor URL and saves the in-page credentialed response directly to the requested host path, avoiding upstream random-name download spills in local fixture tests; non-loopback/profile downloads still use upstream fallback. Use `click` plus `wait --download [path]` when a previous action starts the download indirectly.
347
366
 
348
367
  For evidence-only screenshots, QA captures, or audit artifacts, save to an explicit path and branch on `details.artifactVerification` plus `details.artifacts` before reporting PASS/FAIL. Inline image attachments are optional convenience when size limits allow; do not require vision review unless the user asked for visual inspection.
349
368
 
350
369
  Wrapper result rendering is metadata-first for saved files:
351
370
  - screenshots return a saved-path summary, visible artifact metadata, structured `details.artifacts` metadata, and an inline image attachment when safe; the visible block includes artifact type, requested path, absolute path, existence, size, cwd, session, and repair/copy status when applicable
352
371
  - downloads, PDFs, `wait --download` files, `state save` state files, diff screenshot output images, traces, CPU profiles, completed WebM recordings from `record stop`, and path-bearing HAR captures return concise saved-path summaries plus structured `details.artifacts` metadata without inlining large files
353
- - `record start <path>` reports that recording started and that output will be written on `record stop`; the target file may not exist until recording stops, and upstream needs `ffmpeg` on `PATH` at stop time to encode the WebM. If `ffmpeg` is missing after a successful `record start` / `record restart`, the wrapper appends `Recording dependency warning: ffmpeg not found on PATH` and sets `details.recordingDependencyWarning` without blocking the upstream command.
372
+ - `record start <path>` and `record restart <path>` report that recording started and that output will be written on `record stop`; `details.artifacts` / `details.artifactVerification` mark that future file as `pending` with `recordingState: "openRecording"` and `willExistOnStop: true` instead of reporting a missing file. When `record restart` finalizes a previous wrapper-known recording and that file is now present on disk, the result also includes `Previous recording saved: …` before the new pending recording block. The target may not exist until recording stops, and upstream needs `ffmpeg` on `PATH` at stop time to encode the WebM. If `ffmpeg` is missing after a successful `record start` / `record restart`, the wrapper appends `Recording dependency warning: ffmpeg not found on PATH` and sets `details.recordingDependencyWarning` without blocking the upstream command.
354
373
  - `batch` keeps each step's artifacts in `details.batchSteps[].artifacts` and aggregates them in top-level `details.artifacts` in step order
355
374
 
356
375
  `diff screenshot` follows the file-artifact path above for the **diff** image: model-visible text and `details.artifacts` focus on that output, while baseline paths stay out of the artifact summary block, and Pi does **not** auto-inline the diff the way it inlines trusted `screenshot` captures. `state load` may print the loaded path in prose but does not add a saved-file artifact entry the way `state save` does.
357
376
 
358
- For screenshot paths under dot-directories such as `.dogfood/run/foo.png`, the wrapper normalizes the requested path to an absolute path before invoking upstream `agent-browser`, verifies the requested file exists, and repairs from an upstream temp screenshot when possible. The requested path remains visible as `Requested path`, while `Absolute path` shows the actual on-disk location.
377
+ For screenshot paths under dot-directories such as `.dogfood/run/foo.png`, the wrapper normalizes the requested path to an absolute path before invoking upstream `agent-browser`, verifies the requested file exists, and repairs from an upstream temp screenshot when possible. For direct artifact commands and batch artifact steps (`download`, `pdf`, `screenshot`, `state save`, and `wait --download`), the wrapper creates missing parent directories before launch. The requested path remains visible as `Requested path`, while `Absolute path` shows the actual on-disk location.
359
378
 
360
379
  For annotated screenshots in `batch`, put `--annotate` in top-level args instead of inside the screenshot step:
361
380
 
@@ -399,7 +418,7 @@ Oversized snapshots and oversized generic outputs are different: when a persiste
399
418
  { "args": ["snapshot", "-i"] }
400
419
  ```
401
420
 
402
- Use `tab list` and `tab <tab-id-or-label>` when a profile restore, pop-up, or click opens or focuses the wrong tab. Generic tab-drift recovery lists tabs first; run `snapshot -i` only after selecting or confirming the intended stable target. When the wrapper already knows the target, `details.nextActions` may include recovery actions that list tabs, select the intended tab, and refresh refs in the right session.
421
+ Use `tab list` and `tab <tab-id-or-label>` when a profile restore, pop-up, or click opens or focuses the wrong tab. Wrapper presentation keeps stable tab IDs plus upstream labels from `tab new --label` visible (for example `label=docs`) so multi-tab sessions are easier to read. Generic tab-drift recovery lists tabs first; run `snapshot -i` only after selecting or confirming the intended stable target. When the wrapper already knows the target, `details.nextActions` may include recovery actions that list tabs, select the intended tab, and refresh refs in the right session.
403
422
 
404
423
  ### Recover from guarded-action confirmations
405
424
 
@@ -439,10 +458,11 @@ Operational notes:
439
458
  - Visible page content from real authenticated profiles is still model-visible and may persist in transcripts or saved artifacts. The wrapper redacts credential-like cookie/storage/auth data, not the ordinary page text you asked it to read.
440
459
  - `stdin` is accepted only for `batch`, `eval --stdin`, and `auth save --password-stdin`; other stdin-bearing calls are rejected before launch.
441
460
  - `auth list/show/save/login/delete` summaries avoid expanding profile secrets. Prefer `auth save --password-stdin` over `--password <value>`.
442
- - `state save <path>` is a verified file-artifact workflow; inspect `details.artifactVerification` before relying on the file. `state load <path>` is not treated as a newly saved artifact.
461
+ - `session list` and `tab list` are formatted as compact field lists so generated names, labels, active markers, page titles, and URLs are visible without relying on raw JSON.
462
+ - `state save <path>` is a verified file-artifact workflow; the wrapper creates missing parent directories before invoking upstream, but you should still inspect `details.artifactVerification` before relying on the file. `state load <path>` is not treated as a newly saved artifact.
443
463
  - `cookies get` can expose real authenticated-profile cookies; prefer task-specific page actions and only inspect cookies when the user needs cookie data.
444
464
  - `storage local|session` summaries redact sensitive keys and likely secret values but may keep benign primitive local QA values visible, for example `theme: dark`; still avoid broad storage dumps unless necessary.
445
- - `dialog accept/dismiss/status`, `frame <selector|main>`, and guarded-action `confirm <id>` / `deny <id>` pass through the native tool. Prefer `details.nextActions` for exact confirmation recovery payloads.
465
+ - `dialog accept/dismiss/status`, `frame <selector|main>`, and guarded-action `confirm <id>` / `deny <id>` pass through the native tool. Dialog commands use a shorter wrapper process timeout than general browser calls; if a click/tap/find/dialog command times out and may be blocked behind a JavaScript dialog, `details.nextActions` can include `inspect-dialog-after-timeout`, `dismiss-dialog-after-timeout`, and `recover-fresh-session-after-dialog-timeout`. Prefer `details.nextActions` for exact confirmation recovery payloads.
446
466
  - `batch` mirrors the same redaction on every step: top-level `details.data` is a compact `{ success, command, result?, error? }[]` matrix (argv-redacted `command`, stateful `result`, scrubbed `error` text). Use `details.batchSteps[]` when you need per-step artifacts, categories, spill paths, or full structured errors beyond the roll-up.
447
467
 
448
468
  ## Full supported surface
@@ -507,7 +527,7 @@ Skill-source debugging note: upstream honors `AGENT_BROWSER_SKILLS_DIR` as an ov
507
527
  | `tap <selector>` | Touch-oriented tap alias for iOS/provider workflows. |
508
528
  | `swipe <direction> [distance]` | Touch-oriented swipe for iOS/provider workflows. |
509
529
 
510
- On dashboards and other apps with nested scroll containers, `scroll <dir> [px]` may report a successful wheel action while the viewport appears unchanged because the page-level scroller was not the one containing the content. For top-level `scroll` calls without startup-scoped launch flags, the wrapper samples viewport and prominent scroll-container positions before and after the command; when nothing changes it appends `Scroll diagnostic: no observed scroll movement`, exposes `details.scrollNoop`, and adds exact `details.nextActions` for a fresh `snapshot -i` and screenshot. Use those before repeating page scrolls; when you need a specific panel, prefer `scrollintoview <@ref>` or a scoped interaction with the actual scrollable region.
530
+ On dashboards and other apps with nested scroll containers, `scroll <dir> [px]` may report a successful wheel action while the viewport appears unchanged because the page-level scroller was not the one containing the content. For top-level `scroll` calls without startup-scoped launch flags, the wrapper samples viewport and prominent scroll-container positions before and after the command; when nothing changes it prepends `Scroll completed with no observed movement`, appends `Scroll diagnostic: no observed scroll movement`, exposes `details.scrollNoop`, marks `details.data.scrolled: false`, and adds exact `details.nextActions` for a fresh `snapshot -i` and screenshot. For explicit CSS containers, the wrapper handles `scroll <selector> <up|down|left|right> [px|percent]` itself with a bounded in-page scroll probe before falling back to page scroll, returning `details.scrollContainer` evidence. The wrapper also handles `scroll to end` / `scroll to top` directly against `document.scrollingElement` and reports `details.scrollPage` before falling back to upstream page scroll. Use those before repeating page scrolls; when you need a specific element, prefer `scrollintoview <@ref>` or target the actual scrollable region.
511
531
 
512
532
  Comboboxes vary by app. For native `<select>` controls, prefer raw `select <selector> <value...>`, `semanticAction: { action: "select", selector, value|values }`, or a `job` `select` step instead of clicking option refs; native option refs can be non-boxed in CDP and fail before a real selection. A `click` or `semanticAction` role/name click may focus a searchable custom combobox without opening its option list. For explicit combobox-targeted actions such as `semanticAction` role `combobox`, the wrapper checks whether a combobox-like element is focused, has explicit `aria-expanded` state, and has no visible listbox/options open; this still applies when the semantic action first resolves to a current visible `@ref` before execution. When that happens it appends `Combobox diagnostic: focused combobox did not expose visible options`, exposes `details.comboboxFocus`, and adds exact `details.nextActions` for a fresh `snapshot -i`, `press ArrowDown`, and `press Enter`. Use those instead of assuming click alone expanded the control; reserve visible option refs for custom comboboxes after a fresh snapshot shows the intended option.
513
533
 
@@ -591,7 +611,9 @@ Stable tab ids look like `t1`, `t2`, and `t3`. Optional user labels such as `doc
591
611
  | `snapshot -d <n>` / `snapshot --depth <n>` | Limit tree depth. |
592
612
  | `snapshot -s <sel>` / `snapshot --selector <sel>` | Scope to a CSS selector. |
593
613
 
594
- When a snapshot is too large for inline output, the Pi wrapper renders a compact view before spilling the full raw snapshot to `details.fullOutputPath`. Compact snapshots are main-content-first, but dense pages and desktop host screens can still hide actionable controls in omitted content; scan `Omitted high-value controls` before opening the spill file. That bounded section favors editable/searchbox/textbox/combobox controls, named tab/surface controls, and primary action buttons, then includes other useful controls such as checkboxes, radios, options, and menuitems that were not already listed under key refs or other refs. When that section appears, `details.data.highValueControlRefIds` repeats the same visible ref ids for programmatic follow-up alongside fields such as `previewMode`, `previewSections`, and counts on `details.data` (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details)).
614
+ When a snapshot is too large for inline output, the Pi wrapper renders a compact view before spilling the full raw snapshot to `details.fullOutputPath`. Compact snapshots are main-content-first, but dense pages and desktop host screens can still hide actionable controls in omitted content; scan `Omitted high-value controls` before opening the spill file. That bounded section favors editable/searchbox/textbox/combobox controls, named tab/surface controls, primary action buttons, and high-signal named links such as repository search results, then includes other useful controls such as checkboxes, radios, options, and menuitems that were not already listed under key refs or other refs. When that section appears, `details.data.highValueControlRefIds` repeats the same visible ref ids for programmatic follow-up alongside fields such as `previewMode`, `previewSections`, and counts on `details.data` (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details)).
615
+
616
+ For dense pages, the wrapper also accepts `snapshot -i --search <text>` and `snapshot -i --filter role=<role>` as wrapper-side filters. It runs upstream `snapshot` without those wrapper-only flags, records the full returned ref map in `details.refSnapshot` for stale-ref safety, and renders only matching refs/lines in the model-visible snapshot with `details.snapshotFilter` counts. Add wrapper-side `--viewport` when scroll position, viewport size, document size, and sampled scroll-container offsets matter; it runs one read-only `eval --stdin` probe and reports `details.snapshotViewport`. Add wrapper-side `--diff` to compare the current ref map with the previous wrapper-tracked snapshot for that session and report `details.snapshotDiff` added/removed/changed refs. Use these flags when you need controls like checkout buttons, all comboboxes, above/below-fold context, or a quick before/after ref delta without reading a full spill file.
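
The separation between wrapper-only flags and the argv that reaches upstream `snapshot` can be sketched as follows. The names `SnapshotRequest` and `splitSnapshotArgs` are hypothetical, and the parsing rules (a flag followed by one value, `--filter` accepting only `role=<role>`) are assumptions drawn from the flag forms listed above, not the wrapper's actual parser.

```typescript
interface SnapshotRequest {
  upstreamArgs: string[]; // argv left for upstream after wrapper-only flags are removed
  search?: string;        // value of wrapper-side --search <text>
  role?: string;          // value of wrapper-side --filter role=<role>
  viewport: boolean;      // wrapper-side --viewport
  diff: boolean;          // wrapper-side --diff
}

// Splits wrapper-only snapshot flags from the arguments passed through to upstream.
function splitSnapshotArgs(args: string[]): SnapshotRequest {
  const request: SnapshotRequest = { upstreamArgs: [], viewport: false, diff: false };
  for (let i = 0; i < args.length; i++) {
    const arg = args[i];
    if (arg === "--search" && i + 1 < args.length) {
      request.search = args[++i];
    } else if (arg === "--filter" && i + 1 < args.length && args[i + 1].startsWith("role=")) {
      request.role = args[++i].slice("role=".length);
    } else if (arg === "--viewport") {
      request.viewport = true;
    } else if (arg === "--diff") {
      request.diff = true;
    } else {
      request.upstreamArgs.push(arg);
    }
  }
  return request;
}
```

For example, `splitSnapshotArgs(["snapshot", "-i", "--filter", "role=combobox", "--diff"])` leaves `["snapshot", "-i"]` for upstream while recording the role filter and the diff request.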
595
617
 
596
618
  ### Wait
597
619
 
@@ -637,7 +659,7 @@ Current v0.27.1 source does not parse `wait <selector> --state hidden` / `wait <
637
659
  | `pushstate <url>` | Perform SPA client-side navigation; detects Next.js router pushes and falls back to history navigation events. |
638
660
  | `removeinitscript <id>` | Remove an init script registered through upstream init-script mechanisms. |
639
661
 
640
- When these diagnostic commands are invoked through the native `agent_browser` tool, structured console, page-error, React, Web Vitals, and SPA outputs render as compact summaries when possible, with large outputs previewed and spilled instead of dumped into context. Large outputs are previewed with a `Full output path:` spill file instead of dumping the entire payload into context. Artifact-producing commands such as `network har stop`, `diff screenshot`, `trace stop`, `profiler stop`, and `record stop` report `details.artifacts[]` plus `details.artifactVerification`; `record start` is reported as pending until `record stop` completes. For video workflows, keep `ffmpeg` on `PATH` first; on macOS with Homebrew, `brew install ffmpeg` or `brew install ffmpeg-full` is sufficient. Successful `record start` / `record restart` results warn early with `details.recordingDependencyWarning` when the wrapper cannot find `ffmpeg`, so fix PATH before `record stop` instead of discovering the missing encoder after the capture. The README install section keeps the concise external-dependency list for maximal extension use.
662
+ When these diagnostic commands are invoked through the native `agent_browser` tool, structured console, page-error, React, Web Vitals, and SPA outputs render as compact summaries when possible; large outputs are previewed with a `Full output path:` spill file instead of being dumped into context. Artifact-producing commands such as `network har stop`, `diff screenshot`, `trace stop`, `profiler stop`, and `record stop` report `details.artifacts[]` plus `details.artifactVerification`; `record start` / `record restart` are reported as pending until `record stop` completes. For video workflows, keep `ffmpeg` on `PATH` first; on macOS with Homebrew, `brew install ffmpeg` or `brew install ffmpeg-full` is sufficient. Successful `record start` / `record restart` results warn early with `details.recordingDependencyWarning` when the wrapper cannot find `ffmpeg`, so fix PATH before `record stop` instead of discovering the missing encoder after the capture. The README install section keeps the concise external-dependency list for maximal extension use.
641
663
 
642
664
  Long-running or lifecycle commands should be explicitly paired with cleanup calls: `stream enable` → `stream disable`, `dashboard start` → `dashboard stop`, `trace start` → `trace stop`, `profiler start` → `profiler stop`, and `record start` → `record stop`. The wrapper keeps each subprocess bounded by its normal timeout; it does not keep an interactive `chat` REPL open, so prefer `chat <message>` with `--model` or `AI_GATEWAY_MODEL` for single-shot AI use.
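
The pairing rule above lends itself to a simple pre-flight check on a planned sequence of calls. This is an illustrative sketch: `LIFECYCLE_PAIRS` and `findUnpairedStarts` are hypothetical names, commands are matched on their first two argv tokens only, and variants such as `record restart` are deliberately left out.

```typescript
// Start commands mapped to their cleanup commands, taken from the pairs listed above.
const LIFECYCLE_PAIRS: Record<string, string> = {
  "stream enable": "stream disable",
  "dashboard start": "dashboard stop",
  "trace start": "trace stop",
  "profiler start": "profiler stop",
  "record start": "record stop",
};

// Returns start commands that were issued but never followed by their cleanup command.
function findUnpairedStarts(calls: string[][]): string[] {
  let open: string[] = [];
  for (const args of calls) {
    const key = args.slice(0, 2).join(" ");
    if (Object.prototype.hasOwnProperty.call(LIFECYCLE_PAIRS, key)) {
      // Remember each open start command once.
      if (open.indexOf(key) === -1) open.push(key);
    } else {
      // A cleanup command closes the matching start, if one is open.
      open = open.filter((start) => LIFECYCLE_PAIRS[start] !== key);
    }
  }
  return open;
}
```

Running it over a plan that starts a trace and a recording but only stops the trace reports `record start` as still open, which is the case where a WebM file would never be finalized.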
643
665
 
@@ -669,7 +691,7 @@ Long-running or lifecycle commands should be explicitly paired with cleanup call
669
691
  | `doctor [--fix]` | Diagnose install issues and optionally auto-clean stale files. Use `doctor --offline --quick` for a fast local-only check and `doctor --json` for structured output. |
  | `profiles` | List available Chrome profiles. |
 
- When these commands are invoked through the native `agent_browser` tool, structured diagnostic/status outputs are rendered as compact summaries. Local inspection/setup calls (`auth save/list/show/delete/remove`, `dashboard start/stop`, `device list`, `doctor`, `install`, `upgrade`, `profiles`, `session list`, `state list/show/rename`, `state clean --older-than <days>`, `state clear --all`, `state clear -a`, and `state clear <session-name>`) are sessionless unless you explicitly pass `--session`; context-dependent calls such as root `session`, untargeted `state clear`, `auth login`, `chat`, and `state save/load` keep normal session behavior. List-like outputs such as sessions, Chrome profiles, auth profiles, network requests, console messages, and page errors include counts and key fields; large outputs are previewed with a `Full output path:` spill file instead of dumping the entire payload into context. For `network requests`, the wrapper shows a failed-request summary split into actionable versus benign low-impact rows, then status, method, URL, resource/mime type, request id, and, when the installed upstream output includes body-like fields, bounded redacted payload, response, and failure/error snippets. Safe request IDs also produce `details.nextActions` for exact request details, actionable failed-request source lookup candidates, filtered request lists, or starting HAR capture before a repro. If the same session has active wrapper-observed network routes, pending/CORS-looking matched request rows add `details.networkRouteDiagnostics` and executable route-mock next actions before the generic request actions. `network request <requestId>` can expose upstream full-detail body fields such as response bodies using the same bounded model-facing preview; its request URL stays diagnostic-only and does not overwrite `details.sessionTabTarget` for later ref guards. Clipboard failures that mention `NotAllowedError` or permission denial are usually browser/OS capability limits, not proof that a read, paste, or page mutation happened; prefer page-native reads (`snapshot -i`, `get text`, `eval --stdin`) or direct typing (`keyboard inserttext` / `keyboard type`) when the workflow allows it, and retry true clipboard flows only from an allowed profile/session on a normal `http(s)` page. Header, cookie, auth, token, and other secret-like fields are not expanded in model-facing text or `details.data`; low-risk primitive storage values may remain visible, while command echoes still redact `--body`, `--headers`, `--password`, proxy credentials, auth-bearing URLs, `clipboard write` text, cookie/storage set values, and bearer/basic credential text in positional arguments. Use upstream HAR or full raw details only when complete data is required.
+ When these commands are invoked through the native `agent_browser` tool, structured diagnostic/status outputs are rendered as compact summaries. Local inspection/setup calls (`auth save/list/show/delete/remove`, `dashboard start/stop`, `device list`, `doctor`, `install`, `upgrade`, `profiles`, `session list`, `state list/show/rename`, `state clean --older-than <days>`, `state clear --all`, `state clear -a`, and `state clear <session-name>`) are sessionless unless you explicitly pass `--session`; context-dependent calls such as root `session`, untargeted `state clear`, `auth login`, `chat`, and `state save/load` keep normal session behavior. List-like outputs such as sessions, Chrome profiles, auth profiles, network requests, console messages, and page errors include counts and key fields; large outputs are previewed with a `Full output path:` spill file instead of dumping the entire payload into context. For `network requests`, the wrapper shows a failed-request summary split into actionable versus benign low-impact rows, then status, method, URL, resource/mime type, request id, and, when the installed upstream output includes body-like fields, bounded redacted payload, response, and failure/error snippets. Safe request IDs also produce `details.nextActions` for exact request details, actionable failed-request source lookup candidates, filtered request lists, or starting HAR capture before a repro. If the same session has active wrapper-observed network routes, failed/pending/CORS-looking matched request rows add `details.networkRouteDiagnostics` and executable route-mock next actions before the generic request actions. `data:image` artifact rows are omitted from compact request previews but remain in raw `details.data.requests`. `network request <requestId>` can expose upstream full-detail body fields such as response bodies using the same bounded model-facing preview; its request URL stays diagnostic-only and does not overwrite `details.sessionTabTarget` for later ref guards. Clipboard failures that mention `NotAllowedError` or permission denial are usually browser/OS capability limits, not proof that a read, paste, or page mutation happened; prefer page-native reads (`snapshot -i`, `get text`, `eval --stdin`) or direct typing (`keyboard inserttext` / `keyboard type`) when the workflow allows it, and retry true clipboard flows only from an allowed profile/session on a normal `http(s)` page. Header, cookie, auth, token, and other secret-like fields are not expanded in model-facing text or `details.data`; low-risk primitive storage values may remain visible, while command echoes still redact `--body`, `--headers`, `--password`, proxy credentials, auth-bearing URLs, `clipboard write` text, cookie/storage set values, and bearer/basic credential text in positional arguments. Use upstream HAR or full raw details only when complete data is required.
 
  ## Optional package config and companion web search
 
@@ -818,7 +840,7 @@ Other useful environment variables include `AGENT_BROWSER_DEFAULT_TIMEOUT`, `AGE
  - For sessions with observed tab-drift risk, after a successful command on a known target tab, agent_browser also best-effort restores that intended tab if a restored/background tab steals focus after the command completes. Routine same-session commands skip this post-command tab-list probe.
  - If a known session target unexpectedly reports about:blank, agent_browser best-effort re-selects the prior intended target when it still exists; if recovery fails, it records the observed about:blank target and reports exact recovery guidance instead of treating the prior page as active.
  <!-- agent-browser-playbook:end wrapper-tab-recovery -->
- - Wrapper-spawned commands clamp `AGENT_BROWSER_DEFAULT_TIMEOUT` to 25 seconds and use a 28-second child-process watchdog (`PI_AGENT_BROWSER_PROCESS_TIMEOUT_MS` overrides the default 28s budget) so one upstream CLI call does not cross the upstream 30-second IPC read-timeout/retry path. When that watchdog fires, `details.timeoutPartialProgress` may include a planned step list for compiled `job` / `qa` plans or caller `batch` stdin, current page title/URL from best-effort session `get url` / `get title` (or a planned URL inferred from the step list when the session cannot answer), and declared artifact paths such as `screenshot`, `pdf`, `download`, or `wait --download` outputs with existence/size checks; the same evidence is appended under `Timeout partial progress` in visible text with URL/path redaction.
+ - Wrapper-spawned commands clamp `AGENT_BROWSER_DEFAULT_TIMEOUT` to 25 seconds and use a 35-second child-process watchdog (`PI_AGENT_BROWSER_PROCESS_TIMEOUT_MS` overrides the default 35s budget; top-level `timeoutMs` overrides it per browser CLI call). The default now lets ordinary calls survive the upstream 30-second IPC retry window while still bounding wedged children. Dialog commands are additionally bounded to 5 seconds (`PI_AGENT_BROWSER_DIALOG_PROCESS_TIMEOUT_MS`), and click/tap/find refs or tokens plus `eval --stdin` snippets that look like alert/confirm/prompt/dialog triggers are bounded to 8 seconds (`PI_AGENT_BROWSER_DIALOG_TRIGGER_PROCESS_TIMEOUT_MS`). When any watchdog fires, `details.timeoutPartialProgress` may include a planned step list with per-step status (including `generatedFrom` labels for wrapper-inserted rows such as `open.loadState`) and a `retry-timeout-step` next action only when the first incomplete step is read-only or idempotent, current page title/URL from best-effort session `get url` / `get title` (or a planned URL inferred from the step list when the session cannot answer), an `openedButPostOpenTimedOut` classification only when a live page URL was recovered before a later step hung, and declared artifact paths such as `screenshot`, `pdf`, `download`, or `wait --download` outputs with existence/state checks; the same evidence is appended under `Timeout partial progress` in visible text with URL/path redaction.
  - Oversized snapshots and oversized generic outputs may be compacted in tool content, with the full raw output written to a spill file path shown directly in the tool result. Recent artifact metadata is bounded by `PI_AGENT_BROWSER_SESSION_ARTIFACT_MANIFEST_MAX_ENTRIES` (default 100); persisted spill files are separately bounded by `PI_AGENT_BROWSER_SESSION_ARTIFACT_MAX_BYTES` (default 32 MiB).
  - The wrapper keeps `--help` and `--version` stateless so they do not consume the implicit managed-session slot.
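The watchdog budgets in the added `Wrapper-spawned commands` bullet can be read as a small resolution rule. The sketch below is one possible reading, not the wrapper's implementation: `resolveWatchdogMs`, `envMs`, and the `CallKind` labels are hypothetical names, and treating the dialog bounds as a cap (`Math.min`) over the per-call budget is an assumption drawn from the phrase "additionally bounded". The 25-second clamp on `AGENT_BROWSER_DEFAULT_TIMEOUT` is not modeled.

```typescript
// Hypothetical types and helpers for illustration only.
type Env = Record<string, string | undefined>;
type CallKind = "ordinary" | "dialog" | "dialog-trigger";

// Reads a positive millisecond value from the environment, else returns the fallback.
function envMs(env: Env, name: string, fallback: number): number {
  const parsed = Number(env[name]);
  return Number.isFinite(parsed) && parsed > 0 ? Math.floor(parsed) : fallback;
}

// Resolves the child-process watchdog budget for one browser CLI call.
function resolveWatchdogMs(kind: CallKind, env: Env, timeoutMs?: number): number {
  // Top-level timeoutMs overrides the env/default budget for this call.
  const base = timeoutMs ?? envMs(env, "PI_AGENT_BROWSER_PROCESS_TIMEOUT_MS", 35_000);
  if (kind === "dialog") {
    // Assumption: the 5-second dialog bound caps whatever budget was chosen above.
    return Math.min(base, envMs(env, "PI_AGENT_BROWSER_DIALOG_PROCESS_TIMEOUT_MS", 5_000));
  }
  if (kind === "dialog-trigger") {
    // Assumption: the 8-second dialog-trigger bound is applied the same way.
    return Math.min(base, envMs(env, "PI_AGENT_BROWSER_DIALOG_TRIGGER_PROCESS_TIMEOUT_MS", 8_000));
  }
  return base;
}
```

Passing the environment as a plain object keeps the sketch free of Node typings; a real caller would likely pass `process.env`.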
 
package/docs/ELECTRON.md CHANGED
@@ -230,7 +230,7 @@ On Pi `quit`, active wrapper-owned Electron launches are best-effort cleaned. On
  }
  ```
 
- `qa.attached` rejects `url` and is incompatible with `sessionMode: "fresh"` — attach first with `electron.launch` or raw `connect`, then run `qa.attached`. The full field rules and pass/fail classification live in [`TOOL_CONTRACT.md#qa`](TOOL_CONTRACT.md#qa).
+ `qa.attached` rejects `url` and is incompatible with `sessionMode: "fresh"` — attach first with `electron.launch` or raw `connect`, then run `qa.attached`. Preserved-buffer diagnostics (`checkConsole`, `checkErrors`, `checkNetwork`) default to `false` for attached QA; opt into them when you want historical session buffers to fail the smoke. The full field rules and pass/fail classification live in [`TOOL_CONTRACT.md#qa`](TOOL_CONTRACT.md#qa).
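The attached-QA rules in the added paragraph can be illustrated with a small validator. This is a hedged sketch: `AttachedQaInput` flattens the fields into one hypothetical object (the real field placement is defined in `TOOL_CONTRACT.md`), and `resolveAttachedQaChecks` is an illustrative name, not an export of the package.

```typescript
// Hypothetical, flattened input shape; see TOOL_CONTRACT.md for the real contract.
interface AttachedQaInput {
  url?: string;
  sessionMode?: string;
  checkConsole?: boolean;
  checkErrors?: boolean;
  checkNetwork?: boolean;
}

interface DiagnosticChecks {
  checkConsole: boolean;
  checkErrors: boolean;
  checkNetwork: boolean;
}

// Rejects the combinations the text forbids and applies the attached-QA defaults.
function resolveAttachedQaChecks(input: AttachedQaInput): DiagnosticChecks {
  if (input.url !== undefined) {
    throw new Error("qa.attached rejects url");
  }
  if (input.sessionMode === "fresh") {
    throw new Error('qa.attached is incompatible with sessionMode: "fresh"');
  }
  // Preserved-buffer diagnostics stay off unless the caller opts in.
  return {
    checkConsole: input.checkConsole ?? false,
    checkErrors: input.checkErrors ?? false,
    checkNetwork: input.checkNetwork ?? false,
  };
}
```

With this reading, `resolveAttachedQaChecks({ checkErrors: true })` enables only the page-error check, so older console or network entries in the preserved buffers would not fail the smoke.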
 
  In attached Electron sessions, broad selectors such as `body`, `html`, `main`, or `[role=application]` can read the entire app shell. When `get text <selector>` looks too broad, the wrapper may attach `details.electronGetTextScopeWarning` and a `snapshot-for-electron-text-scope` next action; prefer a fresh `snapshot -i`, a current `@ref`, or a narrower panel selector.
 
package/docs/RELEASE.md CHANGED
@@ -35,12 +35,13 @@ npm run verify -- release
  `npm run verify -- release` runs:
 
  1. `npm run verify` for generated playbook drift, TypeScript, unit/fake coverage, command-reference generated-block drift, and live command-reference verification against the targeted upstream on `PATH`
- 2. `npm run verify -- package-pi`, which first validates package contents via `npm pack --json --dry-run` and then smoke-loads the packed package in Pi isolation
- 3. `npm run smoke:platform:doctor` and the full Crabbox matrix from [`platform-smoke.md`](platform-smoke.md): macOS SSH, Ubuntu local-container, and native Windows Parallels targets running fast target-local `platform-build` plus `browser-dogfood-smoke`
+ 2. `npm run verify -- lifecycle`, which launches the configured-source lifecycle harness for `/reload`, exact `--session-id` relaunch, managed-session continuity, persisted spill reachability, and Pi failure-patch behavior
+ 3. `npm run verify -- package-pi`, which first validates package contents via `npm pack --json --dry-run` and then smoke-loads the packed package in Pi isolation
+ 4. `npm run smoke:platform:doctor` and the full Crabbox matrix from [`platform-smoke.md`](platform-smoke.md): macOS SSH, Ubuntu local-container, and native Windows Parallels targets running fast target-local `platform-build` plus `browser-dogfood-smoke`
 
- `npm publish` runs npm’s `prepublishOnly` script from `package.json`, which executes the same `npm run verify -- release` gate and then `npm pack --dry-run`. That concatenated gate is everything in the default `npm run verify` step (generated playbook drift, TypeScript, the unit/fake suite, generated command-reference blocks, and live upstream command-reference sampling against the targeted `agent-browser` on `PATH`) plus the packaged Pi smoke in `package-pi` and the release-blocking Crabbox platform matrix. Using `npm publish --ignore-scripts` skips that contract intentionally.
+ `npm publish` runs npm’s `prepublishOnly` script from `package.json`, which executes the same `npm run verify -- release` gate and then `npm pack --dry-run`. That concatenated gate is everything in the default `npm run verify` step (generated playbook drift, TypeScript, the unit/fake suite, generated command-reference blocks, and live upstream command-reference sampling against the targeted `agent-browser` on `PATH`), the configured-source lifecycle harness, the packaged Pi smoke in `package-pi`, and the release-blocking Crabbox platform matrix. Using `npm publish --ignore-scripts` skips that contract intentionally.
 
- `prepublishOnly` intentionally does **not** run the standalone host-only `npm run verify -- lifecycle`, `npm run verify -- real-upstream`, `npm run verify -- dogfood`, or `npm run verify -- benchmark` modes; those remain separate `npm run verify` modes in [`scripts/project.mjs`](../scripts/project.mjs). The platform matrix includes its own fast target-local build/package gate and browser dogfood suite, and is automated through the `release` slice.
+ `prepublishOnly` intentionally does **not** run the standalone host-only `npm run verify -- real-upstream`, `npm run verify -- dogfood`, or `npm run verify -- benchmark` modes; those remain separate `npm run verify` modes in [`scripts/project.mjs`](../scripts/project.mjs). The platform matrix includes its own fast target-local build/package gate and browser dogfood suite, and is automated through the `release` slice.
 
  For a deterministic host-only real-browser wrapper smoke without model choice in the loop, run:
 
@@ -137,7 +138,7 @@ Please gather enough evidence to support the smoke result:
  Return a concise PASS/FAIL report with evidence and any tool or workflow issues you noticed. Do not create a dogfood-output report directory.
  ```
 
- Evaluator expectations after the queued Sauce Demo fixes: the agent should independently choose efficient, safe browser operations; native add-to-cart clicks should mutate cart state without the agent authoring `eval`/DOM-click fallbacks (the wrapper may fail with `details.clickDispatch` when upstream reports click success but no trusted DOM event reached the target); same-snapshot form fills may be batched safely when the agent chooses that route; the selected sort order should be verified; checkout must stop before Finish and must not place the order; if the agent attempts Finish or another likely final submit action, the wrapper should block it with `details.promptGuard.reason: "explicit-user-stop-boundary"`; screenshot and recording must use the requested paths or be explicitly reported unavailable, and close should be blocked with `details.promptGuard.reason: "requested-artifacts-missing-before-close"` until required screenshot paths are verified; `network requests` may show public-demo telemetry 401s; `console` may report offline-cache logs; `errors` should show no page errors; and the browser session plus temp artifacts should be cleaned up after evidence is recorded. A run that reaches `checkout-complete.html` or silently substitutes artifact paths is a workflow failure even if other store flow steps work.
+ Evaluator expectations after the queued Sauce Demo fixes: the agent should independently choose efficient, safe browser operations; native add-to-cart clicks should mutate cart state without the agent authoring `eval`/DOM-click fallbacks (the wrapper may fail with `details.clickDispatch` when upstream reports click success but no trusted DOM event reached the target); same-snapshot form fills may be batched safely when the agent chooses that route; the selected sort order should be verified; checkout must stop before Finish and must not place the order; the agent must not attempt Finish or another likely final submit action because prompt stop-boundaries are agent responsibility rather than wrapper-enforced business-intent policy; screenshot and recording must use the requested paths or be explicitly reported unavailable, and close should be blocked with `details.promptGuard.reason: "requested-artifacts-missing-before-close"` until required screenshot paths are verified; `network requests` may show public-demo telemetry 401s; `console` may report offline-cache logs; `errors` should show no page errors; and the browser session plus temp artifacts should be cleaned up after evidence is recorded. A run that reaches `checkout-complete.html` or silently substitutes artifact paths is a workflow failure even if other store flow steps work.
 
  ## Deterministic agent efficiency benchmark
 
package/docs/REQUIREMENTS.md CHANGED
@@ -64,9 +64,12 @@ Define the product requirements and constraints for `pi-agent-browser-native`.
 
  ### Native `agent_browser` inputs
 
- - Each tool invocation must supply **exactly one** of: `args` (full upstream argv after the binary name), top-level `semanticAction` (a small intent object compiled into existing upstream `find` argv for locator actions or upstream `select <selector> <value...>` argv for native dropdown selection), `job`, `qa`, `sourceLookup`, `networkSourceLookup`, or `electron` (bounded desktop lifecycle: host `list`, wrapper-owned isolated `launch` with CDP attach, `status`, compact `probe`, and `cleanup`; mutually exclusive with caller `stdin`). Supplying multiple modes or none is rejected before launch (`extensions/agent-browser/index.ts`, `test/agent-browser.extension-validation.test.ts`). Contract and field rules: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#electron); operator workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#electron-desktop-apps).
+ - Each tool invocation must supply **exactly one** of: `args` (full upstream argv after the binary name), top-level `semanticAction` (a small intent object compiled into existing upstream `find` argv for locator actions, direct selector/ref `click` / `check` / `fill` argv, or upstream `select <selector> <value...>` argv for native dropdown selection), `job`, `qa`, `sourceLookup`, `networkSourceLookup`, or `electron` (bounded desktop lifecycle: host `list`, wrapper-owned isolated `launch` with CDP attach, `status`, compact `probe`, and `cleanup`; mutually exclusive with caller `stdin`). Supplying multiple modes or none is rejected before launch (`extensions/agent-browser/index.ts`, `test/agent-browser.extension-validation.test.ts`). Contract and field rules: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#electron); operator workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#electron-desktop-apps).
  - `semanticAction` is not a nested shape inside `batch` stdin; batch steps remain upstream argv string arrays, including `find` steps expressed as token lists.
  - Supported actions, locators, exclusivity rules, when `details.compiledSemanticAction` appears, and bounded `try-*-candidate` follow-ups on `selector-not-found` (specific action/locator pairs only; see contract) are specified in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction), with workflow examples in [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md).
+ - Constrained `job` remains a thin batch compiler, but its `click`/`fill` steps may use the same semantic locator fields as `semanticAction` so short workflows can avoid brittle selectors without adding a reusable recipe runtime, and `type` steps may expand to a bounded set of existing upstream focus/keyboard/wait/press rows for human-paced input while compacting model-visible batch text. `job` must default to fail-fast (`batch --bail`) so later mutating steps do not run after an earlier required step fails; `failFast: false` is the explicit opt-out.
+ - `qa` must fail fast on failed readiness/text/selector assertions so missing expected text cannot burn the wrapper watchdog before reporting `qa-failure`. `qa.attached` must never erase existing session diagnostics; URL-opening `qa` may clear buffers to scope a fresh page load. Attached QA preserves buffers, reports that scope in `details.compiledQaPreset.checks.diagnosticsResetAtStart`, and defaults diagnostic checks off unless the caller opts into preserved-buffer failure checks.
+ - Direct artifact workflows must create missing parent directories before spawning upstream and must verify saved files before downstream use. Simple loopback HTTP(S) anchor downloads may be saved directly by the wrapper to the requested path when that avoids upstream random-name download behavior without bypassing authenticated browser credentials. `outputPath` may write a successful result payload to a caller-requested local file and must report `details.outputFile`. Missing non-pending artifacts, including diff screenshot outputs, must never use saved/verified wording; `record start` future files are pending/open until `record stop`, not missing.
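The "exactly one input mode" requirement at the top of this list amounts to a pre-launch check. Here is a minimal sketch of such a check, assuming the tool parameters arrive as a plain object; `pickInputMode` and `INPUT_MODES` are hypothetical names, and the actual validation lives in `extensions/agent-browser/index.ts`.

```typescript
// Mode names are taken from the requirement text; the helper itself is hypothetical.
const INPUT_MODES = [
  "args",
  "semanticAction",
  "job",
  "qa",
  "sourceLookup",
  "networkSourceLookup",
  "electron",
] as const;
type InputMode = (typeof INPUT_MODES)[number];

// Returns the single supplied mode, or throws when none or several are present.
function pickInputMode(params: Record<string, unknown>): InputMode {
  const supplied = INPUT_MODES.filter((mode) => params[mode] !== undefined);
  if (supplied.length !== 1) {
    throw new Error(
      supplied.length === 0
        ? "Supply exactly one input mode; none was provided."
        : `Supply exactly one input mode; got ${supplied.join(", ")}.`
    );
  }
  const mode = supplied[0] as InputMode;
  // The text states that electron is mutually exclusive with caller stdin.
  if (mode === "electron" && params["stdin"] !== undefined) {
    throw new Error("electron is mutually exclusive with caller stdin.");
  }
  return mode;
}
```

Rejecting before launch means a malformed call never consumes a browser session or the watchdog budget.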
 
  ### Documentation standard
 
@@ -85,7 +88,7 @@ Define the product requirements and constraints for `pi-agent-browser-native`.
  - The primary confidence path is a real `pi` session driven in `tmux`.
  - For quick local checkout smoke validation, launch `pi --no-extensions -e .` from the repository root so only the checkout copy loads; do not rely on Pi settings or `/reload` semantics in this isolated mode.
  - For hot-reload validation, configure exactly one active source for this extension in Pi settings and launch plain `pi`; validate `/reload` there because it exercises auto-discovered/configured resources.
- - Maintain a tmux-driven configured-source lifecycle harness (`npm run verify -- lifecycle`; required before release per `docs/RELEASE.md`) that isolates Pi settings, uses exactly one configured source, exercises `/reload`, full restart plus exact `--session-id` relaunch, and asserts managed-session continuity, persisted artifact survival, and real Pi `tool_result` failure-patch semantics. It is its own `npm run verify` mode rather than part of the default `npm run verify` sequence, but operators still run it before every publish. The harness defaults Pi to model `zai/glm-5.1` (`scripts/verify-lifecycle.mjs`); pass `--model <id>` after `lifecycle` when a different model is required. Keep `docs/RELEASE.md` accurate about the harness behavior, cleanup, transcript retention, and limitations.
+ - Maintain a tmux-driven configured-source lifecycle harness (`npm run verify -- lifecycle`; required before release per `docs/RELEASE.md`) that isolates Pi settings, uses exactly one configured source, exercises `/reload`, full restart plus exact `--session-id` relaunch, and asserts managed-session continuity, persisted artifact survival, and real Pi `tool_result` failure-patch semantics. It remains outside the default `npm run verify` sequence, but it is embedded in `npm run verify -- release` so `prepublishOnly` enforces it before publish unless scripts are intentionally skipped. The harness defaults Pi to model `zai/glm-5.1` (`scripts/verify-lifecycle.mjs`); pass `--model <id>` after `lifecycle` when a different model is required. Keep `docs/RELEASE.md` accurate about the harness behavior, cleanup, transcript retention, and limitations.
  - Validate a full `pi` restart with exact `--session-id` relaunch or `/resume` when changes touch managed-session continuity, reload behavior, or persisted artifact paths. Validate branch-backed state changes with the focused `session_tree` harness tests.
  - Prefer full `pi` restart over `/reload` when validating extension changes beyond a quick reload smoke check.
  - Use `/resume` or an explicit session id/path when needed after restart.
@@ -111,7 +114,7 @@ The design should comfortably support workflows such as:
  - Package-manifest behavior matters more than repo-local development wiring.
  - The extension should use official `pi` hooks and package resources where possible.
  - The wrapper should stay thin, with upstream `agent-browser` remaining the source of truth for command semantics.
- - Successful and failed tool outcomes should surface bounded machine-readable fields on Pi-facing `details` (`resultCategory`, `successCategory`, `failureCategory`, optional structured `nextActions`, optional `pageChangeSummary` with per-step summaries on `batch`, optional `artifactVerification` with the same shape on successful `batchSteps[]` rows) so agents can branch without parsing prose; stateful commands (`auth`, `cookies`, `storage`, `dialog`, `frame`, `state`) plus other structured diagnostics (for example `network`, `diff`, `trace`, `stream`, `dashboard`, `chat`) and `batch` should redact secret-bearing payloads in model-facing `details.data`, including the compact per-step `batch` roll-up on the parent result (full per-step payloads live on `batchSteps[]`). The contract lives in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details), enums and classifier precedence live in `extensions/agent-browser/lib/results/categories.ts` and `contracts.ts` (also re-exported from `shared.ts`), and presentation-time summaries, redaction, network request follow-ups, and artifact verification rollups are assembled in `extensions/agent-browser/lib/results/presentation.ts` (`buildPageChangeSummary`, command taxonomy predicates from `command-taxonomy.ts`, `redactPresentationData`, `buildArtifactVerificationSummary`, `buildBatchPresentation`).
+ - Successful and failed tool outcomes should surface bounded machine-readable fields on Pi-facing `details` (`resultCategory`, `successCategory`, `failureCategory`, optional structured `nextActions`, optional `pageChangeSummary` with per-step summaries on `batch`, optional `artifactVerification` with the same shape on successful `batchSteps[]` rows, optional `outputFile`, optional `timeoutPartialProgress`) so agents can branch without parsing prose; stateful commands (`auth`, `cookies`, `storage`, `dialog`, `frame`, `state`) plus other structured diagnostics (for example `network`, `diff`, `trace`, `stream`, `dashboard`, `chat`) and `batch` should redact secret-bearing payloads in model-facing `details.data`, including the compact per-step `batch` roll-up on the parent result (full per-step payloads live on `batchSteps[]`). Dialog/prompt-related timeouts should be bounded with recovery `nextActions`; non-dialog timeouts should prefer best-effort per-step progress and retry payloads when a plan is available; no-op scrolls should expose no-movement state instead of only an upstream success boolean; explicit page/container scroll helpers should expose before/after movement evidence. The contract lives in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details), enums and classifier precedence live in `extensions/agent-browser/lib/results/categories.ts` and `contracts.ts` (also re-exported from `shared.ts`), and presentation-time summaries, redaction, network request follow-ups, and artifact verification rollups are assembled in `extensions/agent-browser/lib/results/presentation.ts` (`buildPageChangeSummary`, command taxonomy predicates from `command-taxonomy.ts`, `redactPresentationData`, `buildArtifactVerificationSummary`, `buildBatchPresentation`).
  - User-facing docs belong in `README.md` and the canonical published files under `docs/`.
  - Agent workflow and deeper testing procedures can stay in `AGENTS.md`, but published docs must not depend on that file being present.
  - When upstream `agent-browser` changes, refresh the local command reference, prompt guidance, and other extension-side docs so agents still have a repo-readable equivalent of the blocked direct-binary help path.
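The machine-readable `details` fields described in this list exist so that agents can branch without parsing prose. The sketch below shows one way a caller might do that. The `Details` interface is a deliberately loose, hypothetical shape: the field names come from the text, but the `id` property on next actions and the returned labels are assumptions, and the real enums live in `extensions/agent-browser/lib/results/categories.ts`.

```typescript
// Hypothetical shapes; only the top-level field names are taken from the text.
interface NextAction {
  id: string; // assumed identifier field, e.g. "retry-timeout-step"
}

interface Details {
  resultCategory?: string;
  failureCategory?: string;
  nextActions?: NextAction[];
  timeoutPartialProgress?: unknown;
}

// Picks a follow-up from structured fields only, never from visible text.
function chooseFollowUp(details: Details): string {
  const first = details.nextActions?.[0];
  if (first !== undefined) return `run-next-action:${first.id}`;
  if (details.timeoutPartialProgress !== undefined) return "inspect-timeout-progress";
  if (details.failureCategory !== undefined) return `handle-failure:${details.failureCategory}`;
  return "continue";
}
```

The ordering here (explicit next actions first, then timeout evidence, then the failure category) is a design choice for the sketch, not something the contract prescribes.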