pi-agent-browser-native 0.2.32 → 0.2.34
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +36 -0
- package/README.md +61 -20
- package/docs/ARCHITECTURE.md +9 -2
- package/docs/COMMAND_REFERENCE.md +45 -14
- package/docs/ELECTRON.md +23 -4
- package/docs/RELEASE.md +15 -5
- package/docs/REQUIREMENTS.md +1 -1
- package/docs/SUPPORT_MATRIX.md +36 -22
- package/docs/TOOL_CONTRACT.md +90 -31
- package/extensions/agent-browser/index.ts +407 -4373
- package/extensions/agent-browser/lib/input-modes/electron.ts +170 -0
- package/extensions/agent-browser/lib/input-modes/job.ts +265 -0
- package/extensions/agent-browser/lib/input-modes/lookups.ts +447 -0
- package/extensions/agent-browser/lib/input-modes/params.ts +188 -0
- package/extensions/agent-browser/lib/input-modes/semantic-action.ts +107 -0
- package/extensions/agent-browser/lib/input-modes/shared.ts +46 -0
- package/extensions/agent-browser/lib/input-modes/types.ts +221 -0
- package/extensions/agent-browser/lib/input-modes.ts +44 -0
- package/extensions/agent-browser/lib/orchestration/browser-run/diagnostics.ts +762 -0
- package/extensions/agent-browser/lib/orchestration/browser-run/final-result.ts +450 -0
- package/extensions/agent-browser/lib/orchestration/browser-run/index.ts +46 -0
- package/extensions/agent-browser/lib/orchestration/browser-run/prepare.ts +736 -0
- package/extensions/agent-browser/lib/orchestration/browser-run/process-output.ts +413 -0
- package/extensions/agent-browser/lib/orchestration/browser-run/session-state.ts +868 -0
- package/extensions/agent-browser/lib/orchestration/browser-run/types.ts +482 -0
- package/extensions/agent-browser/lib/orchestration/browser-run.ts +1 -0
- package/extensions/agent-browser/lib/orchestration/input-plan.ts +338 -0
- package/extensions/agent-browser/lib/playbook.ts +22 -20
- package/extensions/agent-browser/lib/process.ts +106 -4
- package/extensions/agent-browser/lib/results/action-recommendations.ts +269 -0
- package/extensions/agent-browser/lib/results/artifact-manifest.ts +114 -0
- package/extensions/agent-browser/lib/results/artifact-state.ts +13 -0
- package/extensions/agent-browser/lib/results/categories.ts +106 -0
- package/extensions/agent-browser/lib/results/contracts.ts +220 -0
- package/extensions/agent-browser/lib/results/editable-ref-evidence.ts +72 -0
- package/extensions/agent-browser/lib/results/envelope.ts +2 -1
- package/extensions/agent-browser/lib/results/network.ts +64 -0
- package/extensions/agent-browser/lib/results/next-actions.ts +117 -0
- package/extensions/agent-browser/lib/results/presentation/artifacts.ts +506 -0
- package/extensions/agent-browser/lib/results/presentation/batch.ts +355 -0
- package/extensions/agent-browser/lib/results/presentation/common.ts +53 -0
- package/extensions/agent-browser/lib/results/presentation/content.ts +36 -0
- package/extensions/agent-browser/lib/results/presentation/diagnostics.ts +730 -0
- package/extensions/agent-browser/lib/results/presentation/errors.ts +125 -0
- package/extensions/agent-browser/lib/results/presentation/large-output.ts +182 -0
- package/extensions/agent-browser/lib/results/presentation/navigation.ts +216 -0
- package/extensions/agent-browser/lib/results/presentation/registry.ts +182 -0
- package/extensions/agent-browser/lib/results/presentation/semantic-action.ts +133 -0
- package/extensions/agent-browser/lib/results/presentation/skills.ts +143 -0
- package/extensions/agent-browser/lib/results/presentation.ts +96 -2403
- package/extensions/agent-browser/lib/results/recovery-actions.ts +139 -0
- package/extensions/agent-browser/lib/results/recovery-next-actions.ts +71 -0
- package/extensions/agent-browser/lib/results/selector-recovery.ts +312 -0
- package/extensions/agent-browser/lib/results/shared.ts +17 -789
- package/extensions/agent-browser/lib/results/snapshot-high-value-controls.ts +262 -0
- package/extensions/agent-browser/lib/results/snapshot-refs.ts +100 -0
- package/extensions/agent-browser/lib/results/snapshot-segments.ts +366 -0
- package/extensions/agent-browser/lib/results/snapshot-spill.ts +63 -0
- package/extensions/agent-browser/lib/results/snapshot.ts +37 -489
- package/extensions/agent-browser/lib/results/text.ts +40 -0
- package/extensions/agent-browser/lib/results.ts +16 -5
- package/extensions/agent-browser/lib/session-page-state.ts +486 -0
- package/package.json +2 -1
package/CHANGELOG.md
CHANGED
|
@@ -1,5 +1,41 @@
|
|
|
1
1
|
# Changelog
|
|
2
2
|
|
|
3
|
+
## Unreleased
|
|
4
|
+
|
|
5
|
+
## 0.2.34 - 2026-05-24
|
|
6
|
+
|
|
7
|
+
### Added
|
|
8
|
+
|
|
9
|
+
- Deterministic maintainer dogfood mode: `npm run verify -- dogfood` runs a model-free live-browser smoke through the native wrapper against public `example.com`, covering top-level `qa`, `semanticAction`, `qa.attached`, constrained `job`, screenshot artifact verification, and session close.
|
|
10
|
+
- Opt-in efficiency-benchmark JSONL sampling via `--sample-jsonl`, so maintainers can measure real transcript model-visible byte output without changing deterministic scenario metrics.
|
|
11
|
+
- Architecture note for the Tier A/Tier B prompt-guidance budget, keeping always-on `promptGuidelines` short while preserving detailed browser playbook guidance in docs.
|
|
12
|
+
|
|
13
|
+
### Changed
|
|
14
|
+
|
|
15
|
+
- Always-on `agent_browser` prompt guidance is smaller and focused on Tier A rules: input-mode choice, refs/session/artifacts/nextActions, extraction basics, and explicit stop-before-order/post/purchase/submit boundaries.
|
|
16
|
+
- `semanticAction` success output now better mirrors raw browser-action navigation and page-change summaries, while docs make the input-mode chooser clearer.
|
|
17
|
+
- QA preset pass output is more compact, and `qa.attached` preflight now treats URL-only current-page checks as valid attached-session evidence.
|
|
18
|
+
- Constrained `job` docs now make post-click navigation assertions explicit with `assertUrl` / `assertText` instead of implying hidden automatic navigation checks.
|
|
19
|
+
|
|
20
|
+
### Fixed
|
|
21
|
+
|
|
22
|
+
- Stabilized concurrency-sensitive fake-upstream tests by waiting for the older explicit-session open to reach the fake binary before launching the newer one, and by covering the documented planned-URL fallback separately from strict live current-page recovery assertions.
|
|
23
|
+
|
|
24
|
+
## 0.2.33 - 2026-05-23
|
|
25
|
+
|
|
26
|
+
### Added
|
|
27
|
+
|
|
28
|
+
- `npm run typecheck` as a thin alias for the default gate’s `tsc --noEmit` step (`scripts/project.mjs` `verify typecheck`), for fast iteration without docs drift checks, unit tests, or live upstream command-reference sampling.
|
|
29
|
+
|
|
30
|
+
### Changed
|
|
31
|
+
|
|
32
|
+
- Maintainer docs now map Pi `details` classifiers, `nextActions`, recovery id registries, and network QA helpers to the split `extensions/agent-browser/lib/results/*` modules (plus the `shared.ts` re-export barrel) and call out `extensions/agent-browser/lib/session-page-state.ts` for ordered tab/ref/pinning state—see [`AGENTS.md`](../AGENTS.md), [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md), [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md), [`docs/REQUIREMENTS.md`](docs/REQUIREMENTS.md), and [`docs/SUPPORT_MATRIX.md`](docs/SUPPORT_MATRIX.md).
|
|
33
|
+
- Real Pi transcripts now treat wrapper-classified failures as tool errors: a `tool_result` hook (`buildAgentBrowserToolResultPatch` in `extensions/agent-browser/index.ts`) sets `isError: true` when `details.resultCategory` is `failure` even if `execute` returned successfully (for example `qa` preset reclassification), appends a model-visible `Result category: failure; …; Pi tool isError: true.` line for prose output, and preserves caller-requested `--json` output as parseable JSON—see [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details).
|
|
34
|
+
- Desktop and attach workflows: documented readiness ladders (condition waits, `tab list` → `tab t<N>` → `snapshot -i`, `electron.probe` / `qa.attached`) instead of blind sleeps; clarified that raw `connect` success only means the CDP endpoint accepted the session, and that `close` does not quit manually launched apps or remove explicit artifacts.
|
|
35
|
+
- Page-scoped refs: when `snapshot -i` reports `No active page`, prior session refs are cleared and `details.refSnapshotInvalidation` records `reason: "no-active-page"` until a successful snapshot restores refs; compact snapshots’ `Omitted high-value controls` heuristics favor editables, named tab/surface controls, and primary actions on dense desktop-host UIs.
|
|
36
|
+
- Semantic `fill` recovery on host-controlled rich inputs: `details.richInputRecovery`, visible `Rich input recovery`, and bounded `focus-current-editable-ref*` / `click-current-editable-ref*` next actions (no embedded fill text, no auto-submit); click misses still get bounded role/name `find` candidates where applicable.
|
|
37
|
+
- `get text` selector-visibility warnings now name the matching `details.nextActions` id; `about:blank` drift recovery records the observed blank target when re-selecting the prior tab fails instead of implying the old page stayed active.
|
|
38
|
+
|
|
3
39
|
## 0.2.32 - 2026-05-21
|
|
4
40
|
|
|
5
41
|
### Added
|
package/README.md
CHANGED
|
@@ -56,17 +56,17 @@ The result is optimized for agent work:
|
|
|
56
56
|
|
|
57
57
|
| Pain | Native wrapper capability | Proof surface |
|
|
58
58
|
|---|---|---|
|
|
59
|
-
| Agents build fragile shell commands | Exposes `agent_browser` with exact `args`, an optional `semanticAction` shorthand for common `find` flows and native `select`, constrained `job` / `qa` presets, experimental `sourceLookup` / `networkSourceLookup` that compile short workflows to `batch`, top-level `electron` for desktop lifecycle, plus controlled `stdin` and `sessionMode` | `extensions/agent-browser/index.ts`, [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md) |
|
|
60
|
-
| Page snapshots are too large | Shows compact, main-content-first summaries, surfaces an `Omitted high-value controls` section (plus `details.data.highValueControlRefIds`) when dense pages hide
|
|
59
|
+
| Agents build fragile shell commands | Exposes `agent_browser` with exact `args`, an optional `semanticAction` shorthand for common `find` flows and native `select`, constrained `job` / `qa` presets, experimental `sourceLookup` / `networkSourceLookup` that compile short workflows to `batch`, top-level `electron` for desktop lifecycle, plus controlled `stdin` and `sessionMode` | `extensions/agent-browser/index.ts`, `extensions/agent-browser/lib/input-modes/`, [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md) |
|
|
60
|
+
| Page snapshots are too large | Shows compact, main-content-first summaries, surfaces an `Omitted high-value controls` section (plus `details.data.highValueControlRefIds`) when dense pages or desktop host screens hide editables, named surfaces/tabs, and primary action buttons from the trimmed ref lists, and stores full raw output in spill files when needed | `extensions/agent-browser/lib/results/snapshot.ts`, `test/agent-browser.presentation.test.ts` |
|
|
61
61
|
| Screenshots/downloads get lost in text | Normalizes artifact paths and reports existence, size, cwd, session, and repair status | [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md#download-screenshot-and-pdf-files) |
|
|
62
|
-
| Profile restores and tab drift confuse agents | Tracks managed sessions, re-selects target tabs after observed drift, and pins later commands only for sessions with drift/restored-session risk | generated tab-recovery notes below; `test/agent-browser.resume-state.test.ts` |
|
|
63
|
-
| Auth/profile workflows can leak secrets | Supports `auth save --password-stdin` and redacts sensitive args, URLs, stdout/stderr, details, and parse-failure spills | `test/agent-browser.extension-
|
|
64
|
-
| Stateful cookies/storage/auth output bloats or leaks context | Presentation layer redacts `details.data` for cookies and storage (field-aware values) and recursively scrubs other structured upstream JSON (network, diff, trace/profiler, stream, dashboard, chat, auth, dialog, frame, state, and similar) using sensitive key names plus string heuristics; masks sensitive argv flags and positionals; scrubs secrets from failed batch step errors; and exposes a compact redacted `batch` matrix on top-level `details.data` | `extensions/agent-browser/lib/results/presentation.ts`, `extensions/agent-browser/lib/runtime.ts`, `test/agent-browser.presentation.test.ts` |
|
|
65
|
-
| Stale `@eN` refs fail mysteriously | Records per-session `details.refSnapshot`, rejects mismatched URLs / unknown refs / unsafe `batch` stdin ordering before spawn, adds recovery guidance to rerun `snapshot -i` or use stable `find` locators | `extensions/agent-browser/index.ts`, `test/agent-browser.results.test.ts`, `test/agent-browser.extension-
|
|
66
|
-
| Agents need stable success/failure buckets | Exposes bounded `resultCategory`, `successCategory`, and `failureCategory` on tool `details` for branching without parsing prose | [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), `extensions/agent-browser/lib/results/shared.ts`, `test/agent-browser.results.test.ts` |
|
|
67
|
-
| Models re-snapshot after every click without new URL/title context | Adds optional `details.pageChangeSummary` (and per-batch-step summaries) with `changeType`, compact text, optional `title`/`url`, artifact hints, and `nextActionIds` aligned to `nextActions`; no-navigation clicks can also surface evidence-backed `details.overlayBlockers` candidates | [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), `extensions/agent-browser/lib/results/presentation.ts`, `test/agent-browser.presentation.test.ts` |
|
|
62
|
+
| Profile restores and tab drift confuse agents | Tracks managed sessions, re-selects target tabs after observed drift, and pins later commands only for sessions with drift/restored-session risk | generated tab-recovery notes below; `test/agent-browser.extension-tab-recovery.test.ts` (drift and about:blank recovery), `test/agent-browser.resume-state.test.ts` (persisted session / resume planning) |
|
|
63
|
+
| Auth/profile workflows can leak secrets | Supports `auth save --password-stdin` and redacts sensitive args, URLs, stdout/stderr, details, and parse-failure spills | `test/agent-browser.extension-security-redaction.test.ts` |
|
|
64
|
+
| Stateful cookies/storage/auth output bloats or leaks context | Presentation layer redacts `details.data` for cookies and storage (field-aware values) and recursively scrubs other structured upstream JSON (network, diff, trace/profiler, stream, dashboard, chat, auth, dialog, frame, state, and similar) using sensitive key names plus string heuristics; masks sensitive argv flags and positionals; scrubs secrets from failed batch step errors; and exposes a compact redacted `batch` matrix on top-level `details.data` | `extensions/agent-browser/lib/results/presentation.ts`, `extensions/agent-browser/lib/results/presentation/diagnostics.ts`, `extensions/agent-browser/lib/runtime.ts`, `test/agent-browser.presentation-diagnostics.test.ts` |
|
|
65
|
+
| Stale `@eN` refs fail mysteriously | Records per-session `details.refSnapshot`, rejects mismatched URLs / unknown refs / unsafe `batch` stdin ordering before spawn, adds recovery guidance to rerun `snapshot -i` or use stable `find` locators | `extensions/agent-browser/index.ts`, `extensions/agent-browser/lib/session-page-state.ts`, `test/agent-browser.session-page-state.test.ts`, `test/agent-browser.results.test.ts`, `test/agent-browser.extension-ref-guards.test.ts`, `test/agent-browser.extension-semantic-recovery.test.ts` |
|
|
66
|
+
| Agents need stable success/failure buckets | Exposes bounded `resultCategory`, `successCategory`, and `failureCategory` on tool `details` for branching without parsing prose; a `tool_result` hook also aligns real Pi `isError` semantics, naming `Pi tool isError: true` in prose output while preserving parseable caller-requested `--json` output | [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), `extensions/agent-browser/lib/results/categories.ts`, `extensions/agent-browser/lib/results/shared.ts` (re-export barrel), `extensions/agent-browser/index.ts`, `test/agent-browser.results.test.ts`, `test/agent-browser.extension-validation.test.ts` |
|
|
67
|
+
| Models re-snapshot after every click without new URL/title context | Adds optional `details.pageChangeSummary` (and per-batch-step summaries) with `changeType`, compact text, optional `title`/`url`, artifact hints, and `nextActionIds` aligned to `nextActions`; no-navigation clicks can also surface evidence-backed `details.overlayBlockers` candidates | [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), `extensions/agent-browser/lib/results/presentation/navigation.ts`, `test/agent-browser.presentation.test.ts`, `test/agent-browser.extension-errors-artifacts.test.ts` |
|
|
68
68
|
| Dashboard scroll commands can look successful while nothing moves | Samples viewport and prominent scroll-container positions around top-level `scroll` calls; unchanged positions produce `details.scrollNoop`, visible recovery guidance, and exact `nextActions` for snapshot/screenshot verification | [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md#core-page-and-element-commands), `test/agent-browser.extension-validation.test.ts` |
|
|
69
|
-
| Dropdown/combobox clicks can focus or hit native option box-model errors | Adds first-class `select <selector> <value...>` paths through raw `args`, `semanticAction`, and `job`; for custom combobox clicks, detects focused controls with explicit `aria-expanded` state but no visible options and returns `details.comboboxFocus` plus exact recovery `nextActions` | [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md#core-page-and-element-commands), `test/agent-browser.extension-validation.test.ts` |
|
|
69
|
+
| Dropdown/combobox clicks can focus or hit native option box-model errors | Adds first-class `select <selector> <value...>` paths through raw `args`, `semanticAction`, and `job`; for custom combobox clicks, detects focused controls with explicit `aria-expanded` state but no visible options and returns `details.comboboxFocus` plus exact recovery `nextActions` | [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md#core-page-and-element-commands), `extensions/agent-browser/lib/input-modes/semantic-action.ts`, `test/agent-browser.extension-input-modes.test.ts`, `test/agent-browser.extension-validation.test.ts` |
|
|
70
70
|
| Recording workflows fail late when `ffmpeg` is missing | After successful `record start` / `record restart`, warns when `ffmpeg` is not on `PATH` so agents can install or fix PATH before `record stop` | [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md#diff-debug-and-streaming), `test/agent-browser.extension-validation.test.ts` |
|
|
71
71
|
| Direct binary help may be blocked in agent sessions | Publishes a repo-readable command reference and verifies it against the target upstream version | `npm run verify` |
|
|
72
72
|
| Desktop Electron apps need discovery, CDP attach, and safe teardown | Top-level `electron` runs host `list` / isolated `launch` (temp profile, OS-chosen debug port) / `status` / `probe` / `cleanup`, merges `launchId` plus managed `sessionName`, supports `handoff` `snapshot` / `tabs` / `connect`, and surfaces mismatch and post-command health guidance; wrapper cleanup applies only to launches it created | `extensions/agent-browser/lib/electron/discovery.ts`, `launch.ts`, `cleanup.ts`, [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#electron), [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md#electron-desktop-apps) |
|
|
@@ -150,13 +150,15 @@ It does **not** edit Pi settings and does **not** run upstream `agent-browser do
|
|
|
150
150
|
|
|
151
151
|
You usually prompt the agent in natural language. These JSON snippets show the exact native tool shape the agent should use.
|
|
152
152
|
|
|
153
|
-
Open a page and inspect it:
|
|
153
|
+
Open a page and inspect it (first-call recipe: open → snapshot -i → interact with current `@refs` → snapshot -i after changes). Do not pass `--json` in `args`; the wrapper injects it.
|
|
154
154
|
|
|
155
155
|
```json
|
|
156
156
|
{ "args": ["open", "https://example.com"] }
|
|
157
157
|
{ "args": ["snapshot", "-i"] }
|
|
158
158
|
```
|
|
159
159
|
|
|
160
|
+
On `https://example.com/`, the main link label is **Learn more**—use exact visible text from your snapshot, not guessed copy such as `More information...`.
|
|
161
|
+
|
|
160
162
|
Click a visible ref, then refresh refs after navigation or a DOM update:
|
|
161
163
|
|
|
162
164
|
```json
|
|
@@ -211,23 +213,26 @@ For supported upstream `find` flows and native dropdown selection you can omit h
|
|
|
211
213
|
|
|
212
214
|
Typical pitfalls:
|
|
213
215
|
|
|
214
|
-
- Supply **exactly one** of `args`, `semanticAction`, `job`, `qa`, `sourceLookup`, `networkSourceLookup`, or `electron` per call (not more, not none).
|
|
216
|
+
- Supply **exactly one** of `args`, `semanticAction`, `job`, `qa`, `sourceLookup`, `networkSourceLookup`, or `electron` per call (not more, not none). Prefer `args` for routine browse; `semanticAction` for stable locators; `job`/`qa` for multi-step checks; `electron` for desktop apps; treat `sourceLookup` / `networkSourceLookup` as experimental candidates-only.
|
|
217
|
+
- Do not pass `--json` in `args`; the wrapper injects it automatically.
|
|
215
218
|
- `semanticAction` and `job` are **not** valid inside `batch` stdin; batch steps stay upstream argv string arrays (spell a `find` step as tokens there if you need it in a batch).
|
|
216
219
|
- Commands or locators outside the supported shorthand still require explicit `args`. Common page getters are grouped under `get`: use `get title`, `get url`, or `get text <selector>` rather than shortcut commands such as `title` or `url`; unknown getter shortcuts can return read-only `details.nextActions` like `use-get-title`.
|
|
217
220
|
- For `locator: "role"`, pass either `value: "button"` or `role: "button"`; if both are present they must match.
|
|
218
221
|
- Use `semanticAction.session` to target a named upstream browser session; the wrapper prepends `--session <name>` before the compiled `find` or `select` argv and keeps that prefix on retry/candidate actions. In active sessions, role/name click/check/uncheck shorthands may resolve through the current `snapshot -i` refs before execution so hidden duplicate matches do not steal the action; `details.effectiveArgs` shows the exact executed argv.
|
|
219
222
|
- Do not reuse `@e…` refs across navigation. The wrapper records the latest snapshot refs per session and fails mutation-prone stale/recycled refs before upstream can silently hit a different current-page element; use the session-aware `refresh-interactive-refs` next action.
|
|
220
223
|
- If upstream classifies the failure as `stale-ref` and `details.compiledSemanticAction` is present for a compiled `find` action, `details.nextActions` may list `retry-semantic-action-after-stale-ref` after `refresh-interactive-refs`, carrying the same compiled `find` argv so you can retry the locator-stable target once it is safe to do so. `select` calls that used stale `@refs` only get refresh guidance; use a fresh snapshot or stable selector before retrying (contract in [`docs/TOOL_CONTRACT.md#semanticaction`](docs/TOOL_CONTRACT.md#semanticaction)).
|
|
221
|
-
- If the failure is `selector-not-found`, the wrapper may take one fresh snapshot and add `Current snapshot ref fallback`
|
|
224
|
+
- If the failure is `selector-not-found`, the wrapper may take one fresh snapshot and add `Current snapshot ref fallback` when that snapshot has exact visible role/name matches for the failed `find` / `semanticAction` target. Non-fill targets can include direct `try-current-visible-ref*` next actions, and semantic click misses can still add bounded `Agent-browser candidate fallbacks` such as `button`/`link` role retries for `text` clicks. For semantic `fill` misses on desktop or host-controlled rich inputs, prefer `details.richInputRecovery`: refresh refs, choose the current editable `@ref`, focus or click it, then use `keyboard inserttext` or `keyboard type` with the intended text. Those recovery nextActions do not copy the fill text and do not press `Enter` or submit; only submit when the user flow explicitly calls for it (same contract link).
|
|
222
225
|
- A successful upstream `click` is not proof that the web app handled the event or changed state. When the task depends on a mutation, follow `inspect-after-mutation` / `pageChangeSummary` evidence with a wait, URL/text check, or fresh snapshot before trusting the result; if the target still did not change, retry with a current visible ref or stable selector and report the workflow issue instead of silently continuing. Preserve explicit user stop boundaries: if the user says to stop before order/post/purchase/submit, gather evidence on that page and do not click the final action.
|
|
223
226
|
- If a **top-level** `click` succeeds (unified command `click`, not a `batch` step), upstream reports `data.clicked`, and the tab URL is unchanged under the same normalization as ref preflight (fragment-insensitive), the wrapper may take one extra `snapshot -i` and add `Possible overlay blockers` with `details.overlayBlockers` (`candidates`, `summary`, optional `snapshot` refresh for refs) plus session-aware `inspect-overlay-state` / bounded `try-overlay-blocker-candidate-*` next actions when that snapshot shows strong modal context (`dialog` / `alertdialog`) and close/dismiss-like controls. Page-wide words like privacy, sign in, or banner alone do not trigger this diagnostic. The unchanged-URL check uses `details.navigationSummary`, which is populated with one read-only `eval` summary when the click JSON omits **both** string `data.url` and `data.title`; if upstream already includes either, overlay diagnostics are skipped here. Also skipped when tab correction or about-blank recovery already ran on that result.
|
|
224
|
-
- If `get text <selector>` reads a non-ref CSS selector with multiple matches or a hidden first match while visible matches exist, including successful `batch` steps, the wrapper may add `Selector text visibility warning`, `details.selectorTextVisibility` (plus `selectorTextVisibilityAll` for multiple batched warnings), and `inspect-visible-text-candidates` next actions;
|
|
227
|
+
- If `get text <selector>` reads a non-ref CSS selector with multiple matches or a hidden first match while visible matches exist, including successful `batch` steps, the wrapper may add `Selector text visibility warning`, `details.selectorTextVisibility` (plus `selectorTextVisibilityAll` for multiple batched warnings), and `inspect-visible-text-candidates` next actions; the warning names the matching `details.nextActions` id. Prefer a visible `@ref`, a scoped selector, or a targeted `eval --stdin` over hidden tab content.
|
|
225
228
|
- In attached Electron sessions, broad selectors such as `body`, `html`, `main`, or `[role=application]` may read the whole app shell. The wrapper may add `Broad Electron get text selector warning`, `details.electronGetTextScopeWarning`, and `snapshot-for-electron-text-scope`; prefer `snapshot -i`, a current `@ref`, or a narrower panel selector.
|
|
226
229
|
|
|
227
230
|
### Constrained browser jobs
|
|
228
231
|
|
|
229
232
|
For short repeatable workflows, pass a top-level `job` instead of hand-writing `batch` stdin. The wrapper only supports constrained steps (`open`, `click`, `fill`, `select`, `wait`, `assertText`, `assertUrl`, `waitForDownload`, and `screenshot`), compiles them to existing upstream `batch` commands, and echoes the compiled commands as `details.compiledJob` for auditability. The same compile path backs top-level `qa`, so long `qa` runs surface the same timeout evidence shape. If a long `job`, `qa`, or `batch` hits the wrapper watchdog, `details.timeoutPartialProgress` may recover planned steps, current page title/URL, and declared artifact paths that already exist on disk (see [`docs/TOOL_CONTRACT.md#details`](docs/TOOL_CONTRACT.md#details)). There is no separate catalog of reusable named browser recipes above `job`, `qa`, and raw `batch`; see [`docs/ARCHITECTURE.md#no-reusable-recipe-layer-yet`](docs/ARCHITECTURE.md#no-reusable-recipe-layer-yet) for the closed `RQ-0068` decision and when to revisit it.
|
|
230
233
|
|
|
234
|
+
**Navigation inside `job` is explicit.** A successful `click` does not prove the next page loaded; add `assertUrl` and/or `assertText` after navigation-prone clicks (forms, checkout, tabs, submit buttons) before screenshots or steps that assume the new page.
|
|
235
|
+
|
|
231
236
|
```json
|
|
232
237
|
{
|
|
233
238
|
"job": {
|
|
@@ -240,6 +245,21 @@ For short repeatable workflows, pass a top-level `job` instead of hand-writing `
|
|
|
240
245
|
}
|
|
241
246
|
```
|
|
242
247
|
|
|
248
|
+
```json
|
|
249
|
+
{
|
|
250
|
+
"job": {
|
|
251
|
+
"steps": [
|
|
252
|
+
{ "action": "open", "url": "https://shop.example/checkout" },
|
|
253
|
+
{ "action": "fill", "selector": "#email", "text": "user@example.com" },
|
|
254
|
+
{ "action": "click", "selector": "#continue" },
|
|
255
|
+
{ "action": "assertUrl", "url": "**/shipping" },
|
|
256
|
+
{ "action": "assertText", "text": "Shipping address" },
|
|
257
|
+
{ "action": "screenshot", "path": ".dogfood/shipping.png" }
|
|
258
|
+
]
|
|
259
|
+
}
|
|
260
|
+
}
|
|
261
|
+
```
|
|
262
|
+
|
|
243
263
|
On app pages that expose a native dropdown, add a `select` step such as `{ "action": "select", "selector": "#flavor", "value": "chocolate" }` before the assertion that depends on it.
|
|
244
264
|
|
|
245
265
|
Use raw `args`/`stdin` when you need full upstream `batch` power, custom flags, or commands outside the constrained job schema. Do not pass `stdin` with `job`, `qa`, `sourceLookup`, `networkSourceLookup`, or `electron`; those modes generate or manage their own input.
|
|
@@ -261,17 +281,22 @@ For desktop Electron apps, use top-level `electron` to avoid hand-building the d
|
|
|
261
281
|
|
|
262
282
|
`launch.handoff` still defaults to `"snapshot"`; it retries briefly when the first Electron snapshot has no refs. Use `handoff: "tabs"` as a safer diagnostic starting point when you only need target discovery and do not want interactive refs captured yet, or `handoff: "connect"` when you want attach-only and will run your own `snapshot -i` / tab commands next. For Electron quick inputs that rerender in place, a successful `fill` may include `details.fillVerification` if `get value` still disagrees; re-snapshot and use focus plus keyboard typing before submitting.
|
|
263
283
|
|
|
264
|
-
For an app you launched yourself with remote debugging enabled, use raw upstream attach instead and clean it up yourself:
|
|
284
|
+
For an app you launched yourself with remote debugging enabled, use raw upstream attach instead and clean it up yourself. After attach, inspect targets before assuming the app is ready:
|
|
265
285
|
|
|
266
286
|
```json
|
|
267
287
|
{ "args": ["connect", "9222"], "sessionMode": "fresh" }
|
|
288
|
+
{ "args": ["tab", "list"] }
|
|
289
|
+
{ "args": ["tab", "t2"] }
|
|
290
|
+
{ "args": ["snapshot", "-i"] }
|
|
268
291
|
```
|
|
269
292
|
|
|
270
|
-
|
|
293
|
+
`connect` success means the debug endpoint accepted the session, not that an active page is ready. If a snapshot says `No active page`, the wrapper clears prior refs for that session; choose a stable `t<N>` tab and retry a condition wait or fresh `snapshot -i` before using `@e…` refs. `close` only closes the browser/CDP session; manually launched apps, their profiles, and explicit screenshots/downloads/HARs/traces/recordings remain host-owned.
|
|
294
|
+
|
|
295
|
+
After either path, use `qa: { "attached": true, ... }` for a current-session smoke check without opening a URL. Prefer condition waits (`wait --text`, `wait --url`, `wait --fn`, `wait --load <state>`, `wait --download`), `qa.attached`, `electron.probe` / `electron.status`, `tab list` → `tab t<N>`, fresh snapshots, or screenshots over blind sleeps. Keep fixed waits below the wrapper IPC budget: `wait 30000` is intentionally blocked, and a result like `"waited":"timeout"` only proves elapsed time.
|
|
271
296
|
|
|
272
297
|
### Lightweight QA preset
|
|
273
298
|
|
|
274
|
-
For a quick smoke/QA pass, use top-level `qa`. It compiles to the same batch path as `job`. The URL form clears enabled network/console/page-error buffers before opening the target URL, waits for page readiness, checks optional expected text or selector, inspects fresh network requests, console messages, and page errors, and can capture an evidence screenshot. The attached form (`qa: { "attached": true }`) runs those checks against the current managed session, such as an attached Electron app, and rejects `url`. `loadState` defaults to `"domcontentloaded"`; set it to `"load"` or `"networkidle"` only when the stricter state is useful and the site is not expected to keep background requests alive. `checkNetwork`, `checkConsole`, and `checkErrors` default to true; set one to `false` to skip that diagnostic read. Network failures are classified by likely impact and failed rows are listed first in network previews: actionable document/script/API-style failures still fail QA, while some low-impact browser icon asset misses (for example certain `favicon` or `apple-touch-icon` paths when upstream marks the row failed and resource metadata looks image-like) surface only as warnings instead of failing an otherwise healthy smoke check (`details.qaPreset.warnings`, with human-readable `details.qaPreset.summary` when the preset still passes). Exact predicates live in [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#qa) and `classifyNetworkRequestFailure` in `extensions/agent-browser/lib/results/
|
|
299
|
+
For a quick smoke/QA pass, use top-level `qa`. It compiles to the same batch path as `job`. The URL form clears enabled network/console/page-error buffers before opening the target URL, waits for page readiness, checks optional expected text or selector, inspects fresh network requests, console messages, and page errors, and can capture an evidence screenshot. The attached form (`qa: { "attached": true }`) runs those checks against the current managed session, such as an attached Electron app, and rejects `url`. `loadState` defaults to `"domcontentloaded"`; set it to `"load"` or `"networkidle"` only when the stricter state is useful and the site is not expected to keep background requests alive. `checkNetwork`, `checkConsole`, and `checkErrors` default to true; set one to `false` to skip that diagnostic read. Network failures are classified by likely impact and failed rows are listed first in network previews: actionable document/script/API-style failures still fail QA, while some low-impact browser icon asset misses (for example certain `favicon` or `apple-touch-icon` paths when upstream marks the row failed and resource metadata looks image-like) surface only as warnings instead of failing an otherwise healthy smoke check (`details.qaPreset.warnings`, with human-readable `details.qaPreset.summary` when the preset still passes). Exact predicates live in [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#qa) and `classifyNetworkRequestFailure` in `extensions/agent-browser/lib/results/network.ts` (re-exported from the compatibility barrel).
|
|
275
300
|
|
|
276
301
|
```json
|
|
277
302
|
{
|
|
@@ -310,6 +335,8 @@ For asynchronous exports, click first and then wait for the download:
|
|
|
310
335
|
|
|
311
336
|
When a user gives exact artifact paths for screenshots, recordings, downloads, PDFs, traces, or HAR files, use those paths or explicitly report why the artifact was unavailable; do not silently substitute a different path in the final report. With upstream `agent-browser 0.27.0`, treat `details.savedFilePath` as upstream-reported metadata and confirm `details.artifacts[].exists` before relying on the requested `wait --download <path>` file being present on disk.
|
|
312
337
|
|
|
338
|
+
For evidence-only screenshots or QA captures, branch on `details.artifactVerification` and `details.artifacts` before reporting PASS/FAIL; inline image attachments are optional when size limits allow—do not require vision review unless the user asked for visual inspection.
|
|
339
|
+
|
|
313
340
|
Artifact cleanup is host-owned, not a browser command. `close` shuts down the browser session but does **not** delete explicit screenshots, downloads, PDFs, traces, HAR files, or recordings saved to paths you chose. When the session’s non-empty `details.artifactManifest` is in scope, a successful `close` appends an `Artifact lifecycle` note and sets `details.artifactCleanup` with the same retention summary as `details.artifactRetentionSummary`, a fixed `note` about host-owned cleanup, and `explicitArtifactPaths`: up to ten distinct paths from manifest rows whose `storageScope` is `explicit-path` (this list can be empty if the recent window only holds spills or other non-explicit inventory). Remove any listed paths with normal file tools after inspection.
|
|
314
341
|
|
|
315
342
|
Start a fresh profiled browser after the implicit public-browsing session already exists:
|
|
@@ -376,7 +403,7 @@ Use SPA and Web Vitals helpers as normal command tokens:
|
|
|
376
403
|
|
|
377
404
|
```json
|
|
378
405
|
{ "args": ["pushstate", "/dashboard"] }
|
|
379
|
-
{ "args": ["vitals", "https://example.com"
|
|
406
|
+
{ "args": ["vitals", "https://example.com"] }
|
|
380
407
|
```
|
|
381
408
|
|
|
382
409
|
For setup that must happen before first navigation, open a blank fresh page, stage routes/cookies/scripts, then navigate:
|
|
@@ -398,7 +425,13 @@ The local verification gate is:
|
|
|
398
425
|
npm run verify
|
|
399
426
|
```
|
|
400
427
|
|
|
401
|
-
|
|
428
|
+
For a fast TypeScript-only iteration loop (same `tsc --noEmit` as the default gate, without docs drift checks, unit tests, or live upstream command-reference sampling):
|
|
429
|
+
|
|
430
|
+
```bash
|
|
431
|
+
npm run typecheck
|
|
432
|
+
```
|
|
433
|
+
|
|
434
|
+
The full `npm run verify` gate runs:
|
|
402
435
|
|
|
403
436
|
- generated playbook/documentation drift checks
|
|
404
437
|
- `tsc --noEmit`
|
|
@@ -406,7 +439,7 @@ It runs:
|
|
|
406
439
|
- command-reference baseline checks
|
|
407
440
|
- live command-reference verification against the targeted installed upstream `agent-browser`
|
|
408
441
|
|
|
409
|
-
Step order and which subprocesses run live in [`scripts/project.mjs`](scripts/project.mjs); [`test/project-verify.test.ts`](test/project-verify.test.ts) locks default, `release`, `real-upstream`, `package-pi`, and combined-docs orchestration so a gate cannot disappear accidentally. Run `npm run verify -- --help` for opt-in modes and supported passthrough flags.
|
|
442
|
+
Step order and which subprocesses run live in [`scripts/project.mjs`](scripts/project.mjs); [`test/project-verify.test.ts`](test/project-verify.test.ts) locks default, `release`, `real-upstream`, `dogfood`, `package-pi`, and combined-docs orchestration so a gate cannot disappear accidentally. Run `npm run verify -- --help` for opt-in modes and supported passthrough flags.
|
|
410
443
|
|
|
411
444
|
The deterministic agent-efficiency benchmark’s **standalone JSON/Markdown accounting run** is not part of default `npm run verify` (only `npm run verify -- benchmark` or `npm run benchmark:agent-browser` invokes the script). The full unit suite still exercises `test/agent-browser.efficiency-benchmark.test.ts`. Use the script before and after agent-facing abstractions to prove call-count, output-size, stale-ref, artifact, failure-category coverage, success-rate, and elapsed-time effects before changing the wrapper UX:
|
|
412
445
|
|
|
@@ -427,6 +460,14 @@ npm run verify -- real-upstream
|
|
|
427
460
|
|
|
428
461
|
That mode sets `PI_AGENT_BROWSER_REAL_UPSTREAM=1` and runs `test/agent-browser.real-upstream-contract.test.ts` against the real `agent-browser` on `PATH` (version must match the capability baseline). It covers inspection, skills, a broad core interaction and navigation matrix on localhost fixtures (including `batch` stdin and `pushstate`), plus `vitals`, network route/requests/HAR, diff snapshot/screenshot/url, trace/profiler, console/errors/highlight, stream enable/status/disable, `cookies set --curl`, a `react tree` missing-renderer path, and `wait --download` with the on-disk caveat documented in release notes. The harness uses a throwaway temp `HOME` and dedicated socket/screenshot directories so the run does not touch your normal browser profile paths. Browser-opening or credential-dependent families such as `inspect`, `dashboard`, `chat`, provider clouds, and OS clipboard flows stay in fake-upstream or manual validation unless a safe deterministic fixture is added. For prerequisites, isolation details, and troubleshooting, see [`docs/RELEASE.md`](docs/RELEASE.md#real-upstream-contract-validation).
|
|
429
462
|
|
|
463
|
+
A deterministic live-browser wrapper smoke is available without an LLM choosing tool calls:
|
|
464
|
+
|
|
465
|
+
```bash
|
|
466
|
+
npm run verify -- dogfood
|
|
467
|
+
```
|
|
468
|
+
|
|
469
|
+
That mode drives the native wrapper through top-level `qa`, `semanticAction`, `qa.attached`, constrained `job`, screenshot artifact verification, and session close against public `example.com`. It complements, but does not replace, the interactive Pi/tmux release dogfood in [`docs/RELEASE.md`](docs/RELEASE.md#pre-release-checks).
|
|
470
|
+
|
|
430
471
|
For package release confidence, follow [`docs/RELEASE.md`](docs/RELEASE.md). The release gate is:
|
|
431
472
|
|
|
432
473
|
```bash
|
|
@@ -514,7 +555,7 @@ These calls return plain text and stay stateless: the extension does not inject
|
|
|
514
555
|
- After launch-scoped open/goto/navigate calls that can restore existing tabs (for example --profile, --session-name, or --state), agent_browser best-effort re-selects the tab whose URL matches the returned page when restored tabs steal focus during launch.
|
|
515
556
|
- After the wrapper observes tab-drift risk for a session (for example profile restore correction, overlapping stale opens, or resumed session state), later active-tab commands best-effort pin that tab inside the same upstream invocation. Routine same-session commands are not preflighted with tab list just because a target tab is known.
|
|
516
557
|
- For sessions with observed tab-drift risk, after a successful command on a known target tab, agent_browser also best-effort restores that intended tab if a restored/background tab steals focus after the command completes. Routine same-session commands skip this post-command tab-list probe.
|
|
517
|
-
- If a known session target unexpectedly reports about:blank, agent_browser
|
|
558
|
+
- If a known session target unexpectedly reports about:blank, agent_browser best-effort re-selects the prior intended target when it still exists; if recovery fails, it records the observed about:blank target and reports exact recovery guidance instead of treating the prior page as active.
|
|
518
559
|
<!-- agent-browser-playbook:end wrapper-tab-recovery -->
|
|
519
560
|
|
|
520
561
|
## Project map
|
package/docs/ARCHITECTURE.md
CHANGED
|
@@ -33,6 +33,7 @@ The extension should:
|
|
|
33
33
|
- resolve `agent-browser` from `PATH`
|
|
34
34
|
- invoke it directly, not through a shell
|
|
35
35
|
- inject `--json`
|
|
36
|
+
- complete each upstream invocation when the direct `agent-browser` child exits even if Node delays `"close"`: piped stdio can stay referenced by longer-lived descendant processes, so `runAgentBrowserProcess` watches `exit` and `close` together, leaves stdio intact during a short post-`exit` grace so normal `close` can still win, destroys streams only when the post-`exit` fallback fires, and prefers `close` codes then wrapper timeout (`124`) over signal-shaped `exit` codes (`watchSpawnedChildCompletion` / `resolveSpawnedChildExitCode` in `extensions/agent-browser/lib/process.ts`) so the tool cannot hang after the CLI process has already terminated
|
|
36
37
|
- support optional stdin only for `eval --stdin`, `batch`, `auth save --password-stdin`, and wrapper-generated `batch` stdin from top-level `job`, `qa`, `sourceLookup`, or `networkSourceLookup`, rejecting other command/stdin combinations before launch; top-level `electron` never accepts caller `stdin` (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#electron))
|
|
37
38
|
- accept an optional native `semanticAction` object as a mutually exclusive alternative to `args` on a single tool call (and to `job`, `qa`, `sourceLookup`, `networkSourceLookup`, and `electron` on the same call), compile locator actions into upstream `find` argv and native dropdown selection into upstream `select <selector> <value...>` argv (with optional `semanticAction.session` expanding to a leading `--session <name>` before the compiled command when targeting a named upstream browser instead of the managed default), and echo the compiled shape in `details.compiledSemanticAction` for observability (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction))
|
|
38
39
|
- accept an optional native `job` object (mutually exclusive with `args`, `semanticAction`, `qa`, `sourceLookup`, `networkSourceLookup`, and `electron` on the same call) with a small fixed step vocabulary that compiles only to existing upstream `batch` argv rows, generates the JSON batch stdin string internally, and echoes `details.compiledJob` for observability (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#job))
|
|
@@ -52,6 +53,12 @@ That means:
|
|
|
52
53
|
- no manual user orchestration as the main workflow
|
|
53
54
|
- any future slash commands should be minimal and secondary
|
|
54
55
|
|
|
56
|
+
### Prompt guidance budget
|
|
57
|
+
|
|
58
|
+
Runtime `promptGuidelines` are a Tier A budget, not a full manual. They stay short enough to load on every `agent_browser`-aware turn and carry only high-impact rules: input-mode choice, the open → snapshot → ref loop, launch-scoped session handling, artifact verification, structured `nextActions`, extraction basics, and hard safety boundaries such as “stop before order/post/purchase/submit.”
|
|
59
|
+
|
|
60
|
+
Tier B guidance lives in `SHARED_BROWSER_PLAYBOOK_GUIDELINES`, generated README/command-reference fragments, and targeted docs. When a workflow needs examples, caveats, or long command-family coverage, add it there instead of expanding always-on prompt text. If a Tier B rule prevents a repeated real failure, promote only the smallest durable sentence into Tier A and keep the generated-doc mirrors aligned.
|
|
61
|
+
|
|
55
62
|
### No reusable recipe layer yet
|
|
56
63
|
|
|
57
64
|
Do **not** add reusable browser recipes as a first-class runtime surface yet.
|
|
@@ -120,7 +127,7 @@ Practical policy:
|
|
|
120
127
|
- after profiled `open` / `goto` / `navigate` calls, verify the active tab still matches the returned page URL and best-effort switch back when restored profile tabs steal focus
|
|
121
128
|
- once the wrapper observes tab-drift risk for a session (profile restore correction, overlapping stale opens, or restored session state), later active-tab commands may synthesize a tiny upstream `batch` that re-selects that tab and then runs the requested command in the same upstream invocation; routine same-session commands avoid `tab list` preflights to reduce probes that can perturb upstream click behavior
|
|
122
129
|
- for sessions with observed tab-drift risk, after a successful command on a known tab target, the wrapper may best-effort restore that same target again if restored/background tabs steal focus after the command returns; routine same-session commands skip this post-command `tab list` probe
|
|
123
|
-
- keep a per-session `refSnapshot` aligned with the last successful `snapshot` (including refs merged from a successful `batch` by taking the last successful `snapshot` step in batch result order): restore it from persisted tool `details` when reloading or resuming, drop it on successful `close`, and refuse mutation-prone `@e…` argv before spawn when the active tab URL no longer matches the snapshot URL, when a ref id was never in that snapshot, or when `batch` stdin would reuse `@e…` on a guarded step after an earlier invalidating step without a later `snapshot` step in the same stdin array. Same-snapshot `fill @e…` rows are guarded but do not themselves set that invalidation latch, so ordinary form fills can precede a click/submit row in one batch—see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) for the agent-visible contract and failure text
|
|
130
|
+
- keep a per-session `refSnapshot` aligned with the last successful `snapshot` (including refs merged from a successful `batch` by taking the last successful `snapshot` step in batch result order): restore it from persisted tool `details` when reloading or resuming, drop it on successful `close`, and refuse mutation-prone `@e…` argv before spawn when the active tab URL no longer matches the snapshot URL, when a ref id was never in that snapshot, or when `batch` stdin would reuse `@e…` on a guarded step after an earlier invalidating step without a later `snapshot` step in the same stdin array. Same-snapshot `fill @e…` rows are guarded but do not themselves set that invalidation latch, so ordinary form fills can precede a click/submit row in one batch—see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) for the agent-visible contract and failure text; typed per-session tab/ref/pinning state lives in `extensions/agent-browser/lib/session-page-state.ts` and is updated from `extensions/agent-browser/index.ts` after each tool result
|
|
124
131
|
- after successful `get text` on a non-ref CSS selector, optionally issue one read-only `eval --stdin` probe per qualifying selector when multiple DOM matches or a hidden first match with visible peers could misread tabbed or off-screen content; merge `details.selectorTextVisibility` / `selectorTextVisibilityAll`, visible warning lines, and `inspect-visible-text-candidates*` next actions as documented in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) and `RQ-0074` in [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md)
|
|
125
132
|
- for local Unix launches, set a short private socket directory so extension-generated session names do not fail on the upstream Unix socket-path length limit
|
|
126
133
|
- keep wrapper-spawned upstream CLI calls inside the upstream IPC budget by clamping `AGENT_BROWSER_DEFAULT_TIMEOUT` to 25 seconds and stopping a stuck child process before the upstream 30-second read-timeout retry loop begins
|
|
@@ -168,7 +175,7 @@ This keeps the product centered on native tool usage instead of auxiliary skill
|
|
|
168
175
|
- subprocess execution and JSON parsing through a filtered child environment (`buildAgentBrowserProcessEnv` in `extensions/agent-browser/lib/process.ts`): copies an allowlisted inherited-name set plus every parent `AGENT_BROWSER_*` variable and provider-related prefixes (`AGENTCORE_*`, `AI_GATEWAY_*`, `BROWSERBASE_*`, `BROWSERLESS_*`, `BROWSER_USE_*`, `KERNEL_*`, `XDG_*`) instead of cloning the full parent process environment
|
|
169
176
|
- clear missing-binary errors
|
|
170
177
|
- compact result summaries, including presentation-time redaction: stateful browser-context commands (`auth`, `cookies`, `storage`, `dialog`, `frame`, `state`) use field-aware value redaction and compact formatters, while other structured upstream JSON (for example `network`, `diff`, `trace` / `profiler` / `record`, `console` / `errors` / `highlight` / `inspect` / `clipboard`, `stream`, `dashboard`, and `chat`) is passed through `redactPresentationData` in `extensions/agent-browser/lib/results/presentation.ts` so model-facing `details.data` and batch roll-ups stay compact and do not echo bearer tokens, proxy passwords, or similar fields verbatim; `redactInvocationArgs` in `extensions/agent-browser/lib/runtime.ts` masks trailing values for sensitive global flags such as `--body`, `--headers`, `--password`, and `--proxy`, preserves positional rules for `cookies set` and `storage local|session set`, and nested `batch` steps use the same argv and error-body scrubbing before echoing commands or errors
|
|
171
|
-
- bounded machine-readable outcome metadata on tool `details` (`resultCategory`, `successCategory`, `failureCategory`, optional `nextActions`, optional `pageChangeSummary` with per-step summaries on `batch`, optional `artifactVerification` with the same shape on each successful `batchSteps[]` row) so agents can branch without parsing prose; enums, classifier precedence, and follow-up payloads are
|
|
178
|
+
- bounded machine-readable outcome metadata on tool `details` (`resultCategory`, `successCategory`, `failureCategory`, optional `nextActions`, optional `pageChangeSummary` with per-step summaries on `batch`, optional `artifactVerification` with the same shape on each successful `batchSteps[]` row) so agents can branch without parsing prose; enums, classifier precedence, and generic follow-up payloads are implemented under `extensions/agent-browser/lib/results/` in focused modules (`contracts.ts` for shared types, `categories.ts` for `classifyAgentBrowserSuccessCategory` / `classifyAgentBrowserFailureCategory` / `buildAgentBrowserResultCategoryDetails`, `action-recommendations.ts` for `buildAgentBrowserNextActions`, `next-actions.ts` for the `AgentBrowserNextAction` shape and merge helpers, `recovery-actions.ts` for recovery id registries and `buildRecoveryNextActions`, `network.ts` for `classifyNetworkRequestFailure` / `summarizeNetworkFailures`, and related helpers; `shared.ts` in that directory is a **re-export-only** barrel so historical `./results/shared.js` imports keep working without growing a monolith). Per-session tab target, `refSnapshot` alignment, invalidation, and tab pinning observations flow through `extensions/agent-browser/lib/session-page-state.ts` from `extensions/agent-browser/index.ts`. Compact page-change summaries and artifact verification rollups are built in `extensions/agent-browser/lib/results/presentation.ts` (`buildPageChangeSummary`, `buildArtifactVerificationSummary`), and the human contract lives in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details). Real Pi custom tools otherwise only mark a row failed when `execute` throws, so the extension also registers `pi.on("tool_result", …)` and patches `agent_browser` results whose `details.resultCategory` is `failure` to set `isError: true`. Prose results also receive a short category notice, while caller-requested `--json` results with parseable JSON content keep that text unchanged so JSONL transcripts, UI affordances, and the machine-readable contract stay aligned for wrapper-side reclassifications such as `qa-failure` (`buildAgentBrowserToolResultPatch` in `extensions/agent-browser/index.ts`; transcript semantics in the same contract doc)
|
|
172
179
|
- inline screenshots/images for the plain `screenshot` command; other image-like saves (for example `diff screenshot`) still appear in `details.artifacts` and summaries but are not auto-inlined as Pi image attachments (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details))
|
|
173
180
|
- lightweight session convenience
|
|
174
181
|
- docs, including a repo-readable command reference that mirrors the blocked direct-binary help path closely enough for normal agent work
|
|
@@ -27,6 +27,8 @@ Use `npm run benchmark:agent-browser` or `npm run verify -- benchmark` before an
|
|
|
27
27
|
|
|
28
28
|
## Core mental model
|
|
29
29
|
|
|
30
|
+
Input mode chooser (one per call): **`args`** for the default open → snapshot -i → click/fill `@refs` flow; **`semanticAction`** for stable role/text/label targets; **`job`** / **`qa`** for multi-step checks; **`electron`** for desktop apps only; **`sourceLookup`** / **`networkSourceLookup`** are **experimental candidates-only** helpers (not authoritative mappings). Do not pass `--json` in `args`—the wrapper injects it. Match link and button text to the latest snapshot (on `https://example.com/` the main link is `Learn more`, not legacy `More information...` copy). See [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#input-mode-chooser) for snapshot variants (`-i` vs `--compact` vs full) and batching three or more getters.
|
|
31
|
+
|
|
30
32
|
Tool parameters (use exactly one of `args`, `semanticAction`, `job`, `qa`, `sourceLookup`, `networkSourceLookup`, or `electron`):
|
|
31
33
|
|
|
32
34
|
```json
|
|
@@ -63,8 +65,8 @@ Tool parameters (use exactly one of `args`, `semanticAction`, `job`, `qa`, `sour
|
|
|
63
65
|
- `semanticAction`: optional shorthand for common `find` flows and native dropdown `select`; compiles to upstream argv and is rejected together with `args`, `job`, `qa`, `sourceLookup`, `networkSourceLookup`, or `electron` on the same call.
|
|
64
66
|
- `job`: optional constrained short-workflow schema; compiles to existing upstream `batch` args/stdin and reports the compiled plan in `details.compiledJob`.
|
|
65
67
|
- `qa`: optional lightweight QA preset; compiles to the same batch path and reports `details.compiledQaPreset` plus `details.qaPreset` pass/fail evidence.
|
|
66
|
-
- `sourceLookup`:
|
|
67
|
-
- `networkSourceLookup`:
|
|
68
|
+
- `sourceLookup`: **EXPERIMENTAL — candidates only** for local UI-to-source hints; compiles to the same `batch` path, reports `details.compiledSourceLookup` and `details.sourceLookup`, and never reclassifies a fully successful upstream batch as failed the way `qa` can (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#sourcelookup) and the longer notes below).
|
|
69
|
+
- `networkSourceLookup`: **EXPERIMENTAL — candidates only** for failed request-to-source hints; compiles to generated `batch`, reports `details.compiledNetworkSourceLookup` and `details.networkSourceLookup`, and never assigns blame or edits files.
|
|
68
70
|
- `electron`: optional Electron desktop-app shorthand. `list`, `status`, `cleanup`, and `probe` are wrapper-owned host/session helpers; `launch` starts a wrapper-owned isolated Electron profile and attaches through upstream `connect`.
|
|
69
71
|
- `stdin`: only for `batch`, `eval --stdin`, and `auth save --password-stdin`; other command/stdin combinations are rejected before `agent-browser` is launched. `job`, `qa`, `sourceLookup`, `networkSourceLookup`, and `electron` generate or manage their own input.
|
|
70
72
|
- `sessionMode`:
|
|
@@ -105,7 +107,7 @@ React introspection requires the React DevTools init hook to be installed before
|
|
|
105
107
|
Use `vitals [url]` for Core Web Vitals plus React hydration timing when available, and `pushstate <url>` for client-side SPA navigation without a full reload:
|
|
106
108
|
|
|
107
109
|
```json
|
|
108
|
-
{ "args": ["vitals", "https://example.com"
|
|
110
|
+
{ "args": ["vitals", "https://example.com"] }
|
|
109
111
|
{ "args": ["pushstate", "/dashboard?tab=settings"] }
|
|
110
112
|
```
|
|
111
113
|
|
|
@@ -143,11 +145,13 @@ Examples:
|
|
|
143
145
|
{ "args": ["snapshot", "-i"] }
|
|
144
146
|
```
|
|
145
147
|
|
|
146
|
-
The optional native `semanticAction` object is only a thin schema for common locator-based actions and native dropdown selection; it compiles locator actions to existing upstream `find` commands, compiles `action: "select"` to upstream `select <selector> <value...>`, and reports the compiled argv in `details.compiledSemanticAction` (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction) for the full field rules). For `locator: "role"`, pass either `value: "button"` or `role: "button"`; if both are present they must match. It is a top-level alternative to `args`, `job`, `qa`, `sourceLookup`, `networkSourceLookup`, and `electron`, not a nested shape inside `batch` stdin arrays. Add `session` inside `semanticAction` when the shorthand should target a named upstream browser session; the compiled argv prepends `--session <name>` before `find` or `select`, and fallback candidate actions preserve that prefix. For active sessions, role/name click/check/uncheck shorthands may resolve through the current `snapshot -i` refs before execution so hidden duplicate matches do not steal the action; inspect `details.effectiveArgs` when you need the exact executed argv. `select` shorthand intentionally requires a stable selector or current `@ref` plus `value`/`values`; upstream `find` does not expose a verified `select` action, so role/name/label dropdown resolution stays a snapshot/selector decision instead of hidden wrapper magic. If a raw `find` or semantic action misses with `selector-not-found`, the wrapper may take one fresh snapshot and append `Current snapshot ref fallback`
|
|
148
|
+
The optional native `semanticAction` object is only a thin schema for common locator-based actions and native dropdown selection; it compiles locator actions to existing upstream `find` commands, compiles `action: "select"` to upstream `select <selector> <value...>`, and reports the compiled argv in `details.compiledSemanticAction` (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction) for the full field rules). For `locator: "role"`, pass either `value: "button"` or `role: "button"`; if both are present they must match. It is a top-level alternative to `args`, `job`, `qa`, `sourceLookup`, `networkSourceLookup`, and `electron`, not a nested shape inside `batch` stdin arrays. Add `session` inside `semanticAction` when the shorthand should target a named upstream browser session; the compiled argv prepends `--session <name>` before `find` or `select`, and fallback candidate actions preserve that prefix. For active sessions, role/name click/check/uncheck shorthands may resolve through the current `snapshot -i` refs before execution so hidden duplicate matches do not steal the action; inspect `details.effectiveArgs` when you need the exact executed argv. `select` shorthand intentionally requires a stable selector or current `@ref` plus `value`/`values`; upstream `find` does not expose a verified `select` action, so role/name/label dropdown resolution stays a snapshot/selector decision instead of hidden wrapper magic. If a raw `find` or semantic action misses with `selector-not-found`, the wrapper may take one fresh snapshot and append `Current snapshot ref fallback` when that snapshot has exact visible role/name matches for the failed target. Non-fill matches can include direct `try-current-visible-ref*` next actions. Semantic click misses may also include `Agent-browser candidate fallbacks`; `details.nextActions` first recommends a fresh `snapshot -i` and may include bounded role/name retries such as `button`/`link` for a missed `text` click, each as a `try-*-candidate` entry carrying redacted `find role …` argv.
|
|
149
|
+
|
|
150
|
+
For desktop or host-controlled rich inputs, treat a semantic `fill` miss differently. If the fresh snapshot finds an exact current editable ref (`searchbox` or `textbox`), `details.richInputRecovery` and visible `Rich input recovery` describe the candidate and append `focus-current-editable-ref*` / `click-current-editable-ref*` next actions. Those actions deliberately do **not** copy the fill text and never press `Enter` or submit. Use the safe ladder instead: refresh refs, choose the current editable `@ref`, focus or click it, then send the intended text with `keyboard inserttext` or `keyboard type` in a separate call. Do not auto-submit unless the user flow explicitly calls for it.
|
|
147
151
|
|
|
148
152
|
Do not assume Playwright selector dialects such as `text=Close` or `button:has-text('Close')` are supported wrapper syntax. If you need those forms, verify current upstream `agent-browser` behavior first; otherwise use refs, `find`, or known CSS selectors.
|
|
149
153
|
|
|
150
|
-
Treat `@e…` refs as page-scoped. After a successful `snapshot`, the wrapper records the latest refs and page target for that session; mutation-prone ref commands such as `click @e4`, `select @e5 chocolate`, or batch steps with old refs fail with `failureCategory: "stale-ref"` when the page target changed or the ref is absent from the latest same-page snapshot. Inside `batch` stdin JSON, the wrapper also walks steps in order before spawn: steps whose first token can navigate or mutate set a latch; a later step whose first token is `snapshot` clears that latch for following rows; guarded steps that still mention `@e…` after an uncleared latch fail with the same `stale-ref` bucket without launching upstream. Same-snapshot form fills are allowed before a click or submit step, so a login-style `fill`, `fill`, `click` batch can run from one snapshot; split dynamic or autosubmit forms with a fresh snapshot if a fill itself rerenders the targets. Follow the `refresh-interactive-refs` next action (it includes `--session <name>` when needed) and prefer stable `find` or `semanticAction` locators when navigation or rerendering is likely. Contract detail: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`refSnapshot`).
|
|
154
|
+
Treat `@e…` refs as page-scoped. After a successful `snapshot`, the wrapper records the latest refs and page target for that session; mutation-prone ref commands such as `click @e4`, `select @e5 chocolate`, or batch steps with old refs fail with `failureCategory: "stale-ref"` when the page target changed or the ref is absent from the latest same-page snapshot. If a session `snapshot -i` fails with `No active page`, the wrapper invalidates prior refs for that session; later mutation-prone `@e…` calls fail before upstream until a successful fresh `snapshot -i` records refs again. Inside `batch` stdin JSON, the wrapper also walks steps in order before spawn: steps whose first token can navigate or mutate set a latch; a later step whose first token is `snapshot` clears that latch for following rows; guarded steps that still mention `@e…` after an uncleared latch fail with the same `stale-ref` bucket without launching upstream. Same-snapshot form fills are allowed before a click or submit step, so a login-style `fill`, `fill`, `click` batch can run from one snapshot; split dynamic or autosubmit forms with a fresh snapshot if a fill itself rerenders the targets. Follow the `refresh-interactive-refs` next action (it includes `--session <name>` when needed) and prefer stable `find` or `semanticAction` locators when navigation or rerendering is likely. Contract detail: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`refSnapshot`, `refSnapshotInvalidation`).
|
|
151
155
|
|
|
152
156
|
A successful `click` result means upstream reported a target, not that the app definitely handled the event. When the workflow depends on a mutation, use `details.pageChangeSummary`, a wait, URL/text extraction, or a fresh `snapshot -i` before trusting the state; if nothing changed, retry with a current visible ref or stable selector and report the workflow issue. Preserve explicit user stop boundaries: if the user says to stop before a final order, post, purchase, or submit action, gather evidence from that page and do not click the final action. The wrapper avoids site-specific fallback clicks and keeps the verification burden explicit.
|
|
153
157
|
|
|
@@ -172,7 +176,7 @@ Prefer `get` and scoped `eval --stdin` for read-only extraction. Getter names ar
|
|
|
172
176
|
|
|
173
177
|
Return the intended JavaScript value from `eval --stdin` instead of relying on `console.log`. For object-shaped extraction, pass a plain expression such as `({ title: document.title, url: location.href })`; if you send a function-shaped snippet, invoke it explicitly, for example `(() => ({ title: document.title }))()`. When upstream serializes a function result to `{}`, the wrapper can append `Eval stdin hint` and `details.evalStdinHint`.
|
|
174
178
|
|
|
175
|
-
On tabbed or hidden-DOM pages, `get text <selector>` reads the upstream-selected match, which may be hidden even when a later match is visible. For non-`@ref` CSS selectors with multiple matches, including successful `batch` steps, the wrapper may add `Selector text visibility warning`, `details.selectorTextVisibility` (and `details.selectorTextVisibilityAll` for multiple batched warnings), and `inspect-visible-text-candidates` next actions so agents know to use a fresher `snapshot -i`, a visible `@ref`, or a more specific selector instead of trusting hidden tab content.
|
|
179
|
+
On tabbed or hidden-DOM pages, `get text <selector>` reads the upstream-selected match, which may be hidden even when a later match is visible. For non-`@ref` CSS selectors with multiple matches, including successful `batch` steps, the wrapper may add `Selector text visibility warning`, `details.selectorTextVisibility` (and `details.selectorTextVisibilityAll` for multiple batched warnings), and `inspect-visible-text-candidates` next actions. The warning names the matching `details.nextActions` id so agents know to use a fresher `snapshot -i`, a visible `@ref`, or a more specific selector instead of trusting hidden tab content.
|
|
176
180
|
|
|
177
181
|
### Run a multi-step flow in one browser invocation
|
|
178
182
|
|
|
@@ -184,6 +188,8 @@ Use `batch --bail` when later steps should stop after the first failed command.
|
|
|
184
188
|
|
|
185
189
|
For short constrained flows, use top-level `job` instead of hand-writing `batch` stdin. Supported job steps are `open`, `click`, `fill`, `select`, `wait`, `assertText`, `assertUrl`, `waitForDownload`, and `screenshot`; `select` requires `selector` plus `value` or `values`, and compiles to upstream `select <selector> <value...>`. The wrapper compiles steps to upstream `batch` and records `details.compiledJob.steps[]`. There is still no separate first-class catalog of reusable named browser recipes above `job`, the `qa` preset, and raw `batch`; see [`ARCHITECTURE.md`](ARCHITECTURE.md#no-reusable-recipe-layer-yet) for the closed `RQ-0068` decision and revisit bar.
|
|
186
190
|
|
|
191
|
+
**Job navigation is explicit.** A `click` step (or other navigation-prone interaction) does not prove the next page loaded. The wrapper does not auto-insert `assertUrl` or `assertText` after clicks inside `job`; add those steps yourself with the URL pattern or on-page text you expect, especially after forms, checkout, tabs, or submit buttons, before screenshots or later steps.
|
|
192
|
+
|
|
187
193
|
```json
|
|
188
194
|
{
|
|
189
195
|
"job": {
|
|
@@ -196,13 +202,30 @@ For short constrained flows, use top-level `job` instead of hand-writing `batch`
|
|
|
196
202
|
}
|
|
197
203
|
```
|
|
198
204
|
|
|
205
|
+
Navigation-prone flow (open → fill → click → assert destination → screenshot):
|
|
206
|
+
|
|
207
|
+
```json
|
|
208
|
+
{
|
|
209
|
+
"job": {
|
|
210
|
+
"steps": [
|
|
211
|
+
{ "action": "open", "url": "https://shop.example/checkout" },
|
|
212
|
+
{ "action": "fill", "selector": "#email", "text": "user@example.com" },
|
|
213
|
+
{ "action": "click", "selector": "#continue" },
|
|
214
|
+
{ "action": "assertUrl", "url": "**/shipping" },
|
|
215
|
+
{ "action": "assertText", "text": "Shipping address" },
|
|
216
|
+
{ "action": "screenshot", "path": ".dogfood/shipping.png" }
|
|
217
|
+
]
|
|
218
|
+
}
|
|
219
|
+
}
|
|
220
|
+
```
|
|
221
|
+
|
|
199
222
|
On app pages that expose a native dropdown, add a `select` step such as `{ "action": "select", "selector": "#flavor", "value": "chocolate" }` before the assertion that depends on it.
|
|
200
223
|
|
|
201
224
|
Use raw `args: ["batch"]` with `stdin` when you need arbitrary upstream commands, flags, or batch failure policies outside the constrained schema. Do not pass `stdin` with `job`, `qa`, `sourceLookup`, `networkSourceLookup`, or `electron`; those modes generate or manage their own input.
|
|
202
225
|
|
|
203
|
-
For quick smoke/QA checks, use top-level `qa`. It clears enabled network/console/page-error buffers before opening the target URL, waits for page readiness, checks expected text/selector, inspects fresh network requests, console messages, and page errors, and can capture an evidence screenshot. The readiness wait defaults to `loadState: "domcontentloaded"`; set `loadState` to `"load"` or `"networkidle"` only when that stricter state is useful and the site is not expected to keep background requests alive. QA network diagnostics classify failed requests by likely impact and list failed rows first in the network preview: actionable document/script/API-style failures fail the preset, while common low-impact browser icon misses such as `favicon.ico` are surfaced as warnings (`qaPreset.warnings`) so they do not fail an otherwise healthy page.
|
|
226
|
+
For quick smoke/QA checks, use top-level `qa`. It clears enabled network/console/page-error buffers before opening the target URL, waits for page readiness, checks expected text/selector, inspects fresh network requests, console messages, and page errors, and can capture an evidence screenshot. The readiness wait defaults to `loadState: "domcontentloaded"`; set `loadState` to `"load"` or `"networkidle"` only when that stricter state is useful and the site is not expected to keep background requests alive. QA network diagnostics classify failed requests by likely impact and list failed rows first in the network preview: actionable document/script/API-style failures fail the preset, while common low-impact browser icon misses such as `favicon.ico` are surfaced as warnings (`qaPreset.warnings`) so they do not fail an otherwise healthy page. Successful QA with no failed checks returns compact model-visible prose (page URL/title when known, checks run, optional screenshot verification) while keeping the full step matrix in `details.qaPreset` and `details.batchSteps`. Failed QA presets report `details.resultCategory: "failure"`, `failureCategory: "qa-failure"`, keep verbose per-step batch output, and real Pi sessions treat the diagnostic as a failed tool result. Prose output also gets a model-visible result-category line including `Pi tool isError: true`; caller-requested `--json` output keeps the JSON string parseable and relies on the patched `isError` plus `details` fields.
|
|
204
227
|
|
|
205
|
-
The same classification drives plain `network requests` presentation: when any row counts as failed (HTTP status ≥ 400, `failed: true`, or a string `error`), model-facing text starts with a line like `Network failure summary: 0 actionable, 1 benign low-impact (1 total).`, and each preview line can end with an impact tag such as `[benign: low-impact browser icon asset]` or `[actionable: document, script, API, or non-benign request failure]`. When safe request IDs are present, `details.nextActions` adds bounded read-only follow-ups such as `network request <id>`, `networkSourceLookup` for actionable failed rows, `network requests --filter <path>`, and `network har start`; prefer those payloads over rebuilding request-id commands from prose. Rules live in `classifyNetworkRequestFailure` / `summarizeNetworkFailures` in `extensions/agent-browser/lib/results/
|
|
228
|
+
The same classification drives plain `network requests` presentation: when any row counts as failed (HTTP status ≥ 400, `failed: true`, or a string `error`), model-facing text starts with a line like `Network failure summary: 0 actionable, 1 benign low-impact (1 total).`, and each preview line can end with an impact tag such as `[benign: low-impact browser icon asset]` or `[actionable: document, script, API, or non-benign request failure]`. When safe request IDs are present, `details.nextActions` adds bounded read-only follow-ups such as `network request <id>`, `networkSourceLookup` for actionable failed rows, `network requests --filter <path>`, and `network har start`; prefer those payloads over rebuilding request-id commands from prose. Rules live in `classifyNetworkRequestFailure` / `summarizeNetworkFailures` in `extensions/agent-browser/lib/results/network.ts`; QA aggregation is `analyzeQaPresetResults` in `extensions/agent-browser/index.ts`.
|
|
206
229
|
|
|
207
230
|
```json
|
|
208
231
|
{ "qa": { "url": "https://example.com", "expectedText": "Example Domain", "screenshotPath": ".dogfood/qa-example.png" } }
|
|
@@ -210,7 +233,7 @@ The same classification drives plain `network requests` presentation: when any r
|
|
|
210
233
|
|
|
211
234
|
Optional `loadState`, `checkNetwork`, `checkConsole`, and `checkErrors` default to `"domcontentloaded"`, `true`, `true`, and `true`; set a check to `false` to skip that diagnostic. Omit `expectedText` and `expectedSelector` when you only need load plus diagnostics.
|
|
212
235
|
|
|
213
|
-
For attached Electron or manually connected CDP sessions, use `qa.attached` after the session exists. It does not open a URL and rejects `sessionMode: "fresh"` because it checks the current managed session.
|
|
236
|
+
For attached Electron or manually connected CDP sessions, use `qa.attached` after the session exists. It does not open a URL and rejects `sessionMode: "fresh"` because it checks the current managed session. Before running diagnostics, the wrapper requires a readable `http:` or `https:` page URL on the attached session; missing URLs, read failures, and non-http(s) surfaces fail fast with recovery `nextActions` such as `tab list` and `snapshot -i` instead of running the full QA batch.
|
|
214
237
|
|
|
215
238
|
```json
|
|
216
239
|
{ "qa": { "attached": true, "expectedText": "Explorer", "screenshotPath": ".dogfood/electron.png" } }
|
|
@@ -242,13 +265,17 @@ Typical lifecycle:
|
|
|
242
265
|
|
|
243
266
|
After launch, prefer the exact `details.nextActions` payloads when present: `status-electron-launch` checks liveness, `probe-electron-launch` runs compact diagnostics for a tracked launch, `snapshot-electron-session` refreshes current refs, `list-electron-tabs` inspects targets, and `cleanup-electron-launch` removes the wrapper-owned process/profile when the run is done. If launch times out, inspect `details.electron.failure.diagnostics` for PID, wrapper profile, `DevToolsActivePort`, and timing evidence before retrying. If status/probe detects a session or target mismatch, follow `reattach-electron-launch` or a fresh snapshot action before using old refs. If a click/fill/type looks successful but the Electron PID or debug port dies, the wrapper now fails the result with `details.electronPostCommandHealth` and same-launch status/probe/cleanup next actions instead of leaving the agent on `about:blank`. If cleanup is partial (`failureCategory: "cleanup-failed"`), inspect `details.electron.cleanup.results` and use `retry-electron-cleanup` only for the same `launchId`.
|
|
244
267
|
|
|
245
|
-
Manual path for externally launched apps: if you started the Electron app yourself with a debug port or DevTools URL, skip the wrapper lifecycle and attach directly with upstream `connect`. In this path you own app shutdown and profile cleanup; do not use `electron.cleanup`.
|
|
268
|
+
Manual path for externally launched apps: if you started the Electron app yourself with a debug port or DevTools URL, skip the wrapper lifecycle and attach directly with upstream `connect`. In this path you own app shutdown and profile cleanup; do not use `electron.cleanup`. `close` only closes the browser/CDP session and does not quit the manually launched app or remove explicit artifacts.
|
|
246
269
|
|
|
247
270
|
```json
|
|
248
271
|
{ "args": ["connect", "9222"], "sessionMode": "fresh" }
|
|
272
|
+
{ "args": ["tab", "list"] }
|
|
273
|
+
{ "args": ["tab", "t2"] }
|
|
249
274
|
{ "args": ["snapshot", "-i"] }
|
|
250
275
|
```
|
|
251
276
|
|
|
277
|
+
A successful raw `connect` means the debug endpoint accepted the session, not that the app has an active ready page. Prefer `details.nextActions` when present: `list-connected-session-tabs` runs the session-scoped tab inspection. After that read-only list, select or confirm the stable `t<N>` target and run `snapshot -i` explicitly before trusting refs. If a `snapshot -i` says `No active page`, the wrapper clears any prior refs for that session; follow `list-tabs-after-no-active-page`, select the stable `t<N>` surface, then use a condition wait or retry `snapshot -i` before trusting refs.
|
|
278
|
+
|
|
252
279
|
For current-session smoke checks after either path, use `qa.attached`; for compact state instead of separate title/url/focus/tab/snapshot calls, use `electron.probe`. `electron.probe.timeoutMs` bounds each underlying read subprocess; `electron.probe.launchId` ties the probe to a wrapper launch and can surface session or target mismatch guidance before you trust page refs. For VS Code-style quick inputs, treat a successful `fill` as tentative: the wrapper may append `details.fillVerification` if `get value` still reads empty or different, and Electron `@e…` mutations can append `refresh-electron-refs-after-rerender` because same-URL UI rerenders commonly churn refs.
|
|
253
280
|
|
|
254
281
|
For local app debugging, top-level `sourceLookup` can gather candidate component/file locations for a visible element from selector DOM hints, React DevTools inspection, and a bounded workspace component-name search rooted at the Pi session working directory (`maxWorkspaceFiles` defaults to 2000 and cannot exceed 5000; the scan records at most ten `workspace-search` candidates). With a `selector`, the wrapper runs `is visible` and, unless `includeDomHints` is `false`, `get html` so DOM data attributes and embedded source-like paths can become `dom-attribute` candidates. It reports evidence and confidence in `details.sourceLookup` instead of claiming a guaranteed source file. React hints require a session opened with `--enable react-devtools`. The `details.sourceLookup.status` field reads `unsupported` only when no candidates were collected **and** a `react` batch step failed (inspect errors, missing renderer, and similar); it reads `no-candidates` when the batch succeeded but nothing matched. If selector or workspace hints still yield candidates, `status` remains `candidates-found` even when React inspection failed. Unlike `qa`, the wrapper does not downgrade a **fully successful** upstream batch to `isError` solely because those statuses appear—though failed batch steps still produce normal tool errors. For wrapper-tracked packaged Electron sessions with no candidates, `details.sourceLookup.workspaceRoot` and optional `details.sourceLookup.electronContext` explain that the scan only covered the Pi tool cwd; installed app resources or `app.asar` bundles are outside that scan and are not unpacked. Those results may add `snapshot-electron-session`, `probe-electron-launch`, and `list-electron-tabs` next actions so you can inspect the live packaged app before deciding whether to change the workspace or app bundle.
|
|
@@ -271,7 +298,9 @@ Top-level `networkSourceLookup` does the same for failed browser requests. When
|
|
|
271
298
|
{ "args": ["wait", "--download", "/tmp/report.pdf"] }
|
|
272
299
|
```
|
|
273
300
|
|
|
274
|
-
Do not
|
|
301
|
+
Do not omit the load state value; use `wait --load <state>` with `load`, `domcontentloaded`, or `networkidle`.
|
|
302
|
+
|
|
303
|
+
For desktop-host readiness, prefer condition waits over fixed sleeps. Use this ladder: `wait --text` / `wait --url` / `wait --fn` / `wait --load <state>` / `wait --download` when a real condition exists; after raw `connect`, run `tab list` → `tab t<N>` → condition wait or `snapshot -i`; after wrapper-owned `electron.launch`, use `electron.probe` / `electron.status` for launch health or target mismatch; use `qa.attached` when expected text or selector plus diagnostics can express the check. Fixed waits are a last resort: `wait 30000` is intentionally blocked by the wrapper IPC budget, and a successful fixed-wait payload such as `"waited":"timeout"` means elapsed time only, not proof that the desktop host finished. Verify with an observed condition, fresh snapshot, or screenshot before continuing.
|
|
275
304
|
|
|
276
305
|
Use `wait --download [path]` after an earlier action has already started a browser download, such as a dashboard export button that responds asynchronously:
|
|
277
306
|
|
|
@@ -302,6 +331,8 @@ The upstream screenshot aliases are `screenshot --full` for full-page capture an
|
|
|
302
331
|
|
|
303
332
|
Prefer `download <selector> <path>` when the target element itself is the downloadable link/control. Use `click` plus `wait --download [path]` when a previous action starts the download indirectly.
|
|
304
333
|
|
|
334
|
+
For evidence-only screenshots, QA captures, or audit artifacts, save to an explicit path and branch on `details.artifactVerification` plus `details.artifacts` before reporting PASS/FAIL. Inline image attachments are optional convenience when size limits allow; do not require vision review unless the user asked for visual inspection.
|
|
335
|
+
|
|
305
336
|
Wrapper result rendering is metadata-first for saved files:
|
|
306
337
|
- screenshots return a saved-path summary, visible artifact metadata, structured `details.artifacts` metadata, and an inline image attachment when safe; the visible block includes artifact type, requested path, absolute path, existence, size, cwd, session, and repair/copy status when applicable
|
|
307
338
|
- downloads, PDFs, `wait --download` files, `state save` state files, diff screenshot output images, traces, CPU profiles, completed WebM recordings from `record stop`, and path-bearing HAR captures return concise saved-path summaries plus structured `details.artifacts` metadata without inlining large files
|
|
@@ -345,7 +376,7 @@ Oversized snapshots and oversized generic outputs are different: when a persiste
|
|
|
345
376
|
{ "args": ["snapshot", "-i"] }
|
|
346
377
|
```
|
|
347
378
|
|
|
348
|
-
Use `tab list` and `tab <tab-id-or-label>` when a profile restore, pop-up, or click opens or focuses the wrong tab.
|
|
379
|
+
Use `tab list` and `tab <tab-id-or-label>` when a profile restore, pop-up, or click opens or focuses the wrong tab. Generic tab-drift recovery lists tabs first; run `snapshot -i` only after selecting or confirming the intended stable target. When the wrapper already knows the target, `details.nextActions` may include recovery actions that list tabs, select the intended tab, and refresh refs in the right session.
|
|
349
380
|
|
|
350
381
|
### Recover from guarded-action confirmations
|
|
351
382
|
|
|
@@ -517,7 +548,7 @@ Stable tab ids look like `t1`, `t2`, and `t3`. Optional user labels such as `doc
|
|
|
517
548
|
| `snapshot -d <n>` / `snapshot --depth <n>` | Limit tree depth. |
|
|
518
549
|
| `snapshot -s <sel>` / `snapshot --selector <sel>` | Scope to a CSS selector. |
|
|
519
550
|
|
|
520
|
-
When a snapshot is too large for inline output, the Pi wrapper renders a compact view before spilling the full raw snapshot to `details.fullOutputPath`. Compact snapshots are main-content-first, but dense pages can still hide actionable controls in omitted content;
|
|
551
|
+
When a snapshot is too large for inline output, the Pi wrapper renders a compact view before spilling the full raw snapshot to `details.fullOutputPath`. Compact snapshots are main-content-first, but dense pages and desktop host screens can still hide actionable controls in omitted content; scan `Omitted high-value controls` before opening the spill file. That bounded section favors editable/searchbox/textbox/combobox controls, named tab/surface controls, and primary action buttons, then includes other useful controls such as checkboxes, radios, options, and menuitems that were not already listed under key refs or other refs. When that section appears, `details.data.highValueControlRefIds` repeats the same visible ref ids for programmatic follow-up alongside fields such as `previewMode`, `previewSections`, and counts on `details.data` (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details)).
|
|
521
552
|
|
|
522
553
|
### Wait
|
|
523
554
|
|
|
@@ -671,7 +702,7 @@ Other useful environment variables include `AGENT_BROWSER_DEFAULT_TIMEOUT`, `AGE
|
|
|
671
702
|
- After launch-scoped open/goto/navigate calls that can restore existing tabs (for example --profile, --session-name, or --state), agent_browser best-effort re-selects the tab whose URL matches the returned page when restored tabs steal focus during launch.
|
|
672
703
|
- After the wrapper observes tab-drift risk for a session (for example profile restore correction, overlapping stale opens, or resumed session state), later active-tab commands best-effort pin that tab inside the same upstream invocation. Routine same-session commands are not preflighted with tab list just because a target tab is known.
|
|
673
704
|
- For sessions with observed tab-drift risk, after a successful command on a known target tab, agent_browser also best-effort restores that intended tab if a restored/background tab steals focus after the command completes. Routine same-session commands skip this post-command tab-list probe.
|
|
674
|
-
- If a known session target unexpectedly reports about:blank, agent_browser
|
|
705
|
+
- If a known session target unexpectedly reports about:blank, agent_browser best-effort re-selects the prior intended target when it still exists; if recovery fails, it records the observed about:blank target and reports exact recovery guidance instead of treating the prior page as active.
|
|
675
706
|
<!-- agent-browser-playbook:end wrapper-tab-recovery -->
|
|
676
707
|
- Wrapper-spawned commands clamp `AGENT_BROWSER_DEFAULT_TIMEOUT` to 25 seconds and use a 28-second child-process watchdog (`PI_AGENT_BROWSER_PROCESS_TIMEOUT_MS` overrides the default 28s budget) so one upstream CLI call does not cross the upstream 30-second IPC read-timeout/retry path. When that watchdog fires, `details.timeoutPartialProgress` may include a planned step list for compiled `job` / `qa` plans or caller `batch` stdin, current page title/URL from best-effort session `get url` / `get title` (or a planned URL inferred from the step list when the session cannot answer), and declared artifact paths such as `screenshot`, `pdf`, `download`, or `wait --download` outputs with existence/size checks; the same evidence is appended under `Timeout partial progress` in visible text with URL/path redaction.
|
|
677
708
|
- Oversized snapshots and oversized generic outputs may be compacted in tool content, with the full raw output written to a spill file path shown directly in the tool result. Recent artifact metadata is bounded by `PI_AGENT_BROWSER_SESSION_ARTIFACT_MANIFEST_MAX_ENTRIES` (default 100); persisted spill files are separately bounded by `PI_AGENT_BROWSER_SESSION_ARTIFACT_MAX_BYTES` (default 32 MiB).
|