pi-agent-browser-native 0.2.31 → 0.2.33

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (66) hide show
  1. package/CHANGELOG.md +35 -0
  2. package/README.md +64 -18
  3. package/docs/ARCHITECTURE.md +13 -10
  4. package/docs/COMMAND_REFERENCE.md +71 -16
  5. package/docs/ELECTRON.md +387 -0
  6. package/docs/RELEASE.md +34 -4
  7. package/docs/REQUIREMENTS.md +5 -3
  8. package/docs/SUPPORT_MATRIX.md +36 -21
  9. package/docs/TOOL_CONTRACT.md +198 -40
  10. package/extensions/agent-browser/index.ts +1585 -3486
  11. package/extensions/agent-browser/lib/electron/cleanup.ts +287 -0
  12. package/extensions/agent-browser/lib/electron/discovery.ts +717 -0
  13. package/extensions/agent-browser/lib/electron/launch.ts +553 -0
  14. package/extensions/agent-browser/lib/input-modes/electron.ts +170 -0
  15. package/extensions/agent-browser/lib/input-modes/job.ts +203 -0
  16. package/extensions/agent-browser/lib/input-modes/lookups.ts +447 -0
  17. package/extensions/agent-browser/lib/input-modes/params.ts +188 -0
  18. package/extensions/agent-browser/lib/input-modes/semantic-action.ts +107 -0
  19. package/extensions/agent-browser/lib/input-modes/shared.ts +46 -0
  20. package/extensions/agent-browser/lib/input-modes/types.ts +221 -0
  21. package/extensions/agent-browser/lib/input-modes.ts +41 -0
  22. package/extensions/agent-browser/lib/orchestration/browser-run/diagnostics.ts +696 -0
  23. package/extensions/agent-browser/lib/orchestration/browser-run/final-result.ts +450 -0
  24. package/extensions/agent-browser/lib/orchestration/browser-run/index.ts +46 -0
  25. package/extensions/agent-browser/lib/orchestration/browser-run/prepare.ts +711 -0
  26. package/extensions/agent-browser/lib/orchestration/browser-run/process-output.ts +386 -0
  27. package/extensions/agent-browser/lib/orchestration/browser-run/session-state.ts +868 -0
  28. package/extensions/agent-browser/lib/orchestration/browser-run/types.ts +476 -0
  29. package/extensions/agent-browser/lib/orchestration/browser-run.ts +1 -0
  30. package/extensions/agent-browser/lib/orchestration/input-plan.ts +338 -0
  31. package/extensions/agent-browser/lib/playbook.ts +15 -13
  32. package/extensions/agent-browser/lib/process.ts +106 -4
  33. package/extensions/agent-browser/lib/results/action-recommendations.ts +269 -0
  34. package/extensions/agent-browser/lib/results/artifact-manifest.ts +114 -0
  35. package/extensions/agent-browser/lib/results/artifact-state.ts +13 -0
  36. package/extensions/agent-browser/lib/results/categories.ts +106 -0
  37. package/extensions/agent-browser/lib/results/contracts.ts +220 -0
  38. package/extensions/agent-browser/lib/results/editable-ref-evidence.ts +72 -0
  39. package/extensions/agent-browser/lib/results/envelope.ts +2 -1
  40. package/extensions/agent-browser/lib/results/network.ts +64 -0
  41. package/extensions/agent-browser/lib/results/next-actions.ts +117 -0
  42. package/extensions/agent-browser/lib/results/presentation/artifacts.ts +506 -0
  43. package/extensions/agent-browser/lib/results/presentation/batch.ts +355 -0
  44. package/extensions/agent-browser/lib/results/presentation/common.ts +53 -0
  45. package/extensions/agent-browser/lib/results/presentation/content.ts +36 -0
  46. package/extensions/agent-browser/lib/results/presentation/diagnostics.ts +730 -0
  47. package/extensions/agent-browser/lib/results/presentation/errors.ts +125 -0
  48. package/extensions/agent-browser/lib/results/presentation/large-output.ts +182 -0
  49. package/extensions/agent-browser/lib/results/presentation/navigation.ts +216 -0
  50. package/extensions/agent-browser/lib/results/presentation/registry.ts +154 -0
  51. package/extensions/agent-browser/lib/results/presentation/skills.ts +143 -0
  52. package/extensions/agent-browser/lib/results/presentation.ts +87 -2369
  53. package/extensions/agent-browser/lib/results/recovery-actions.ts +139 -0
  54. package/extensions/agent-browser/lib/results/recovery-next-actions.ts +71 -0
  55. package/extensions/agent-browser/lib/results/selector-recovery.ts +312 -0
  56. package/extensions/agent-browser/lib/results/shared.ts +17 -701
  57. package/extensions/agent-browser/lib/results/snapshot-high-value-controls.ts +262 -0
  58. package/extensions/agent-browser/lib/results/snapshot-refs.ts +100 -0
  59. package/extensions/agent-browser/lib/results/snapshot-segments.ts +366 -0
  60. package/extensions/agent-browser/lib/results/snapshot-spill.ts +63 -0
  61. package/extensions/agent-browser/lib/results/snapshot.ts +37 -489
  62. package/extensions/agent-browser/lib/results/text.ts +40 -0
  63. package/extensions/agent-browser/lib/results.ts +16 -5
  64. package/extensions/agent-browser/lib/session-page-state.ts +486 -0
  65. package/extensions/agent-browser/lib/temp.ts +26 -0
  66. package/package.json +6 -4
package/CHANGELOG.md CHANGED
@@ -1,5 +1,40 @@
1
1
  # Changelog
2
2
 
3
+ ## Unreleased
4
+
5
+ ## 0.2.33 - 2026-05-23
6
+
7
+ ### Added
8
+
9
+ - `npm run typecheck` as a thin alias for the default gate’s `tsc --noEmit` step (`scripts/project.mjs` `verify typecheck`), for fast iteration without docs drift checks, unit tests, or live upstream command-reference sampling.
10
+
11
+ ### Changed
12
+
13
+ - Maintainer docs now map Pi `details` classifiers, `nextActions`, recovery id registries, and network QA helpers to the split `extensions/agent-browser/lib/results/*` modules (plus the `shared.ts` re-export barrel) and call out `extensions/agent-browser/lib/session-page-state.ts` for ordered tab/ref/pinning state—see [`AGENTS.md`](../AGENTS.md), [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md), [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md), [`docs/REQUIREMENTS.md`](docs/REQUIREMENTS.md), and [`docs/SUPPORT_MATRIX.md`](docs/SUPPORT_MATRIX.md).
14
+ - Real Pi transcripts now treat wrapper-classified failures as tool errors: a `tool_result` hook (`buildAgentBrowserToolResultPatch` in `extensions/agent-browser/index.ts`) sets `isError: true` when `details.resultCategory` is `failure` even if `execute` returned successfully (for example `qa` preset reclassification), appends a model-visible `Result category: failure; …; Pi tool isError: true.` line for prose output, and preserves caller-requested `--json` output as parseable JSON—see [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details).
15
+ - Desktop and attach workflows: documented readiness ladders (condition waits, `tab list` → `tab t<N>` → `snapshot -i`, `electron.probe` / `qa.attached`) instead of blind sleeps; clarified that raw `connect` success only means the CDP endpoint accepted the session, and that `close` does not quit manually launched apps or remove explicit artifacts.
16
+ - Page-scoped refs: when `snapshot -i` reports `No active page`, prior session refs are cleared and `details.refSnapshotInvalidation` records `reason: "no-active-page"` until a successful snapshot restores refs; compact snapshots’ `Omitted high-value controls` heuristics favor editables, named tab/surface controls, and primary actions on dense desktop-host UIs.
17
+ - Semantic `fill` recovery on host-controlled rich inputs: `details.richInputRecovery`, visible `Rich input recovery`, and bounded `focus-current-editable-ref*` / `click-current-editable-ref*` next actions (no embedded fill text, no auto-submit); click misses still get bounded role/name `find` candidates where applicable.
18
+ - `get text` selector-visibility warnings now name the matching `details.nextActions` id; `about:blank` drift recovery records the observed blank target when re-selecting the prior tab fails instead of implying the old page stayed active.
19
+
20
+ ## 0.2.32 - 2026-05-21
21
+
22
+ ### Added
23
+ - First-class Electron desktop-app support for `agent_browser`: top-level `electron` now covers bounded app discovery, isolated wrapper-owned launch/attach, status, compact probe, and cleanup without requiring agents to hand-build the CDP launch sequence.
24
+ - Electron launch safety and lifecycle details: wrapper-owned launches use a temporary profile and OS-chosen debug port, record a `launchId`, surface exact status/probe/cleanup next actions, support caller-owned `allow` / `deny` policies, and avoid touching manually launched apps.
25
+ - `qa.attached` for current attached browser/Electron sessions, so agents can run quick smoke checks without opening a URL or replacing the active desktop-app target.
26
+ - A dedicated public Electron guide at [`docs/ELECTRON.md`](docs/ELECTRON.md), linked from the README, command reference, tool contract, architecture, requirements, release, and support-matrix docs and included in the published package.
27
+
28
+ ### Changed
29
+ - `sourceLookup`, broad `get text`, fill verification, tab/session mismatch, and stale-ref guidance now include Electron-aware context and recovery actions for packaged desktop apps.
30
+ - Verification coverage now includes deterministic Electron lifecycle/probe benchmark scenarios, fake-upstream Electron discovery/lifecycle tests, lifecycle restore/shutdown cleanup checks, and real-app dogfood evidence recorded in the Electron plan.
31
+ - The configured-source lifecycle harness (`npm run verify -- lifecycle`, `scripts/verify-lifecycle.mjs`) now defaults to Pi model `zai/glm-5.1` with `--model <id>` override; `npm run verify` lifecycle passthrough rejects `--model` without a value.
32
+ - Updated the local Pi development baseline to `@earendil-works/*` `0.75.4` and refreshed the npm lockfile.
33
+
34
+ ### Fixed
35
+ - Runtime validation now rejects `electron.status` / `electron.cleanup` with `all: false`, keeping runtime behavior aligned with the public schema and contract.
36
+ - Electron + caller `stdin` validation now reports a direct Electron-specific error instead of mixing in generated-batch mode guidance.
37
+
3
38
  ## 0.2.31 - 2026-05-18
4
39
 
5
40
  ### Added
package/README.md CHANGED
@@ -56,19 +56,20 @@ The result is optimized for agent work:
56
56
 
57
57
  | Pain | Native wrapper capability | Proof surface |
58
58
  |---|---|---|
59
- | Agents build fragile shell commands | Exposes `agent_browser` with exact `args`, an optional `semanticAction` shorthand for common `find` flows and native `select`, constrained `job` / `qa` presets and experimental `sourceLookup` / `networkSourceLookup` that compile short workflows to `batch`, plus controlled `stdin` and `sessionMode` | `extensions/agent-browser/index.ts`, [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md) |
60
- | Page snapshots are too large | Shows compact, main-content-first summaries, surfaces an `Omitted high-value controls` section (plus `details.data.highValueControlRefIds`) when dense pages hide inputs and tabs from the trimmed ref lists, and stores full raw output in spill files when needed | `extensions/agent-browser/lib/results/snapshot.ts`, `test/agent-browser.presentation.test.ts` |
59
+ | Agents build fragile shell commands | Exposes `agent_browser` with exact `args`, an optional `semanticAction` shorthand for common `find` flows and native `select`, constrained `job` / `qa` presets, experimental `sourceLookup` / `networkSourceLookup` that compile short workflows to `batch`, top-level `electron` for desktop lifecycle, plus controlled `stdin` and `sessionMode` | `extensions/agent-browser/index.ts`, `extensions/agent-browser/lib/input-modes/`, [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md) |
60
+ | Page snapshots are too large | Shows compact, main-content-first summaries, surfaces an `Omitted high-value controls` section (plus `details.data.highValueControlRefIds`) when dense pages or desktop host screens hide editables, named surfaces/tabs, and primary action buttons from the trimmed ref lists, and stores full raw output in spill files when needed | `extensions/agent-browser/lib/results/snapshot.ts`, `test/agent-browser.presentation.test.ts` |
61
61
  | Screenshots/downloads get lost in text | Normalizes artifact paths and reports existence, size, cwd, session, and repair status | [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md#download-screenshot-and-pdf-files) |
62
- | Profile restores and tab drift confuse agents | Tracks managed sessions, re-selects target tabs after observed drift, and pins later commands only for sessions with drift/restored-session risk | generated tab-recovery notes below; `test/agent-browser.resume-state.test.ts` |
63
- | Auth/profile workflows can leak secrets | Supports `auth save --password-stdin` and redacts sensitive args, URLs, stdout/stderr, details, and parse-failure spills | `test/agent-browser.extension-validation.test.ts` |
64
- | Stateful cookies/storage/auth output bloats or leaks context | Presentation layer redacts `details.data` for cookies and storage (field-aware values) and recursively scrubs other structured upstream JSON (network, diff, trace/profiler, stream, dashboard, chat, auth, dialog, frame, state, and similar) using sensitive key names plus string heuristics; masks sensitive argv flags and positionals; scrubs secrets from failed batch step errors; and exposes a compact redacted `batch` matrix on top-level `details.data` | `extensions/agent-browser/lib/results/presentation.ts`, `extensions/agent-browser/lib/runtime.ts`, `test/agent-browser.presentation.test.ts` |
65
- | Stale `@eN` refs fail mysteriously | Records per-session `details.refSnapshot`, rejects mismatched URLs / unknown refs / unsafe `batch` stdin ordering before spawn, adds recovery guidance to rerun `snapshot -i` or use stable `find` locators | `extensions/agent-browser/index.ts`, `test/agent-browser.results.test.ts`, `test/agent-browser.extension-validation.test.ts` |
66
- | Agents need stable success/failure buckets | Exposes bounded `resultCategory`, `successCategory`, and `failureCategory` on tool `details` for branching without parsing prose | [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), `extensions/agent-browser/lib/results/shared.ts`, `test/agent-browser.results.test.ts` |
67
- | Models re-snapshot after every click without new URL/title context | Adds optional `details.pageChangeSummary` (and per-batch-step summaries) with `changeType`, compact text, optional `title`/`url`, artifact hints, and `nextActionIds` aligned to `nextActions`; no-navigation clicks can also surface evidence-backed `details.overlayBlockers` candidates | [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), `extensions/agent-browser/lib/results/presentation.ts`, `test/agent-browser.presentation.test.ts` |
62
+ | Profile restores and tab drift confuse agents | Tracks managed sessions, re-selects target tabs after observed drift, and pins later commands only for sessions with drift/restored-session risk | generated tab-recovery notes below; `test/agent-browser.extension-tab-recovery.test.ts` (drift and about:blank recovery), `test/agent-browser.resume-state.test.ts` (persisted session / resume planning) |
63
+ | Auth/profile workflows can leak secrets | Supports `auth save --password-stdin` and redacts sensitive args, URLs, stdout/stderr, details, and parse-failure spills | `test/agent-browser.extension-security-redaction.test.ts` |
64
+ | Stateful cookies/storage/auth output bloats or leaks context | Presentation layer redacts `details.data` for cookies and storage (field-aware values) and recursively scrubs other structured upstream JSON (network, diff, trace/profiler, stream, dashboard, chat, auth, dialog, frame, state, and similar) using sensitive key names plus string heuristics; masks sensitive argv flags and positionals; scrubs secrets from failed batch step errors; and exposes a compact redacted `batch` matrix on top-level `details.data` | `extensions/agent-browser/lib/results/presentation.ts`, `extensions/agent-browser/lib/results/presentation/diagnostics.ts`, `extensions/agent-browser/lib/runtime.ts`, `test/agent-browser.presentation-diagnostics.test.ts` |
65
+ | Stale `@eN` refs fail mysteriously | Records per-session `details.refSnapshot`, rejects mismatched URLs / unknown refs / unsafe `batch` stdin ordering before spawn, adds recovery guidance to rerun `snapshot -i` or use stable `find` locators | `extensions/agent-browser/index.ts`, `extensions/agent-browser/lib/session-page-state.ts`, `test/agent-browser.session-page-state.test.ts`, `test/agent-browser.results.test.ts`, `test/agent-browser.extension-ref-guards.test.ts`, `test/agent-browser.extension-semantic-recovery.test.ts` |
66
+ | Agents need stable success/failure buckets | Exposes bounded `resultCategory`, `successCategory`, and `failureCategory` on tool `details` for branching without parsing prose; a `tool_result` hook also aligns real Pi `isError` semantics, naming `Pi tool isError: true` in prose output while preserving parseable caller-requested `--json` output | [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), `extensions/agent-browser/lib/results/categories.ts`, `extensions/agent-browser/lib/results/shared.ts` (re-export barrel), `extensions/agent-browser/index.ts`, `test/agent-browser.results.test.ts`, `test/agent-browser.extension-validation.test.ts` |
67
+ | Models re-snapshot after every click without new URL/title context | Adds optional `details.pageChangeSummary` (and per-batch-step summaries) with `changeType`, compact text, optional `title`/`url`, artifact hints, and `nextActionIds` aligned to `nextActions`; no-navigation clicks can also surface evidence-backed `details.overlayBlockers` candidates | [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), `extensions/agent-browser/lib/results/presentation/navigation.ts`, `test/agent-browser.presentation.test.ts`, `test/agent-browser.extension-errors-artifacts.test.ts` |
68
68
  | Dashboard scroll commands can look successful while nothing moves | Samples viewport and prominent scroll-container positions around top-level `scroll` calls; unchanged positions produce `details.scrollNoop`, visible recovery guidance, and exact `nextActions` for snapshot/screenshot verification | [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md#core-page-and-element-commands), `test/agent-browser.extension-validation.test.ts` |
69
- | Dropdown/combobox clicks can focus or hit native option box-model errors | Adds first-class `select <selector> <value...>` paths through raw `args`, `semanticAction`, and `job`; for custom combobox clicks, detects focused controls with explicit `aria-expanded` state but no visible options and returns `details.comboboxFocus` plus exact recovery `nextActions` | [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md#core-page-and-element-commands), `test/agent-browser.extension-validation.test.ts` |
70
- | Recording workflows fail late when `ffmpeg` is missing | After successful `record start` / `record restart`, warns when `ffmpeg` is not on `PATH` so agents can install or fix PATH before `record stop` | [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md#diagnostics-performance-and-recording), `test/agent-browser.extension-validation.test.ts` |
69
+ | Dropdown/combobox clicks can focus or hit native option box-model errors | Adds first-class `select <selector> <value...>` paths through raw `args`, `semanticAction`, and `job`; for custom combobox clicks, detects focused controls with explicit `aria-expanded` state but no visible options and returns `details.comboboxFocus` plus exact recovery `nextActions` | [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md#core-page-and-element-commands), `extensions/agent-browser/lib/input-modes/semantic-action.ts`, `test/agent-browser.extension-input-modes.test.ts`, `test/agent-browser.extension-validation.test.ts` |
70
+ | Recording workflows fail late when `ffmpeg` is missing | After successful `record start` / `record restart`, warns when `ffmpeg` is not on `PATH` so agents can install or fix PATH before `record stop` | [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md#diff-debug-and-streaming), `test/agent-browser.extension-validation.test.ts` |
71
71
  | Direct binary help may be blocked in agent sessions | Publishes a repo-readable command reference and verifies it against the target upstream version | `npm run verify` |
72
+ | Desktop Electron apps need discovery, CDP attach, and safe teardown | Top-level `electron` runs host `list` / isolated `launch` (temp profile, OS-chosen debug port) / `status` / `probe` / `cleanup`, merges `launchId` plus managed `sessionName`, supports `handoff` `snapshot` / `tabs` / `connect`, and surfaces mismatch and post-command health guidance; wrapper cleanup applies only to launches it created | `extensions/agent-browser/lib/electron/discovery.ts`, `launch.ts`, `cleanup.ts`, [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#electron), [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md#electron-desktop-apps) |
72
73
  | Agents need bundled `skills` text without touching the live session | Treats `skills list`, `skills get …`, and `skills path …` as stateless JSON reads: no implicit managed `--session` under default `sessionMode: "auto"` (same session-ownership goal as plain-text `--help` / `--version`), while provider workflows stay thin passthroughs that require upstream setup and credentials | [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md#built-in-skills), `extensions/agent-browser/lib/runtime.ts` |
73
74
 
74
75
  ## Fastest way to try it
@@ -202,6 +203,7 @@ For supported upstream `find` flows and native dropdown selection you can omit h
202
203
 
203
204
  ```json
204
205
  { "semanticAction": { "action": "click", "locator": "text", "value": "Submit" } }
206
+ { "semanticAction": { "action": "click", "locator": "role", "role": "button", "name": "Continue without Signing In" } }
205
207
  { "semanticAction": { "action": "fill", "locator": "label", "value": "Email", "text": "user@example.com" } }
206
208
  { "semanticAction": { "action": "select", "selector": "#flavor", "value": "chocolate" } }
207
209
  { "semanticAction": { "action": "click", "locator": "text", "value": "Close", "session": "named-browser" } }
@@ -209,16 +211,18 @@ For supported upstream `find` flows and native dropdown selection you can omit h
209
211
 
210
212
  Typical pitfalls:
211
213
 
212
- - Supply **exactly one** of `args`, `semanticAction`, `job`, `qa`, `sourceLookup`, or `networkSourceLookup` per call (not more, not none).
214
+ - Supply **exactly one** of `args`, `semanticAction`, `job`, `qa`, `sourceLookup`, `networkSourceLookup`, or `electron` per call (not more, not none).
213
215
  - `semanticAction` and `job` are **not** valid inside `batch` stdin; batch steps stay upstream argv string arrays (spell a `find` step as tokens there if you need it in a batch).
214
216
  - Commands or locators outside the supported shorthand still require explicit `args`. Common page getters are grouped under `get`: use `get title`, `get url`, or `get text <selector>` rather than shortcut commands such as `title` or `url`; unknown getter shortcuts can return read-only `details.nextActions` like `use-get-title`.
217
+ - For `locator: "role"`, pass either `value: "button"` or `role: "button"`; if both are present they must match.
215
218
  - Use `semanticAction.session` to target a named upstream browser session; the wrapper prepends `--session <name>` before the compiled `find` or `select` argv and keeps that prefix on retry/candidate actions. In active sessions, role/name click/check/uncheck shorthands may resolve through the current `snapshot -i` refs before execution so hidden duplicate matches do not steal the action; `details.effectiveArgs` shows the exact executed argv.
216
219
  - Do not reuse `@e…` refs across navigation. The wrapper records the latest snapshot refs per session and fails mutation-prone stale/recycled refs before upstream can silently hit a different current-page element; use the session-aware `refresh-interactive-refs` next action.
217
220
  - If upstream classifies the failure as `stale-ref` and `details.compiledSemanticAction` is present for a compiled `find` action, `details.nextActions` may list `retry-semantic-action-after-stale-ref` after `refresh-interactive-refs`, carrying the same compiled `find` argv so you can retry the locator-stable target once it is safe to do so. `select` calls that used stale `@refs` only get refresh guidance; use a fresh snapshot or stable selector before retrying (contract in [`docs/TOOL_CONTRACT.md#semanticaction`](docs/TOOL_CONTRACT.md#semanticaction)).
218
- - If the failure is `selector-not-found`, the wrapper may take one fresh snapshot and add `Current snapshot ref fallback` plus `try-current-visible-ref*` next actions when that snapshot has exact visible role/name matches for the failed `find` / `semanticAction` target. It still adds `Agent-browser candidate fallbacks` for bounded semanticAction role/name retries (`fill` + `placeholder`, `click` + `text`, or `fill` + `label`); prefer these payloads or a fresh snapshot over guessing new selectors (same contract link).
221
+ - If the failure is `selector-not-found`, the wrapper may take one fresh snapshot and add `Current snapshot ref fallback` when that snapshot has exact visible role/name matches for the failed `find` / `semanticAction` target. Non-fill targets can include direct `try-current-visible-ref*` next actions, and semantic click misses can still add bounded `Agent-browser candidate fallbacks` such as `button`/`link` role retries for `text` clicks. For semantic `fill` misses on desktop or host-controlled rich inputs, prefer `details.richInputRecovery`: refresh refs, choose the current editable `@ref`, focus or click it, then use `keyboard inserttext` or `keyboard type` with the intended text. Those recovery nextActions do not copy the fill text and do not press `Enter` or submit; only submit when the user flow explicitly calls for it (same contract link).
219
222
  - A successful upstream `click` is not proof that the web app handled the event or changed state. When the task depends on a mutation, follow `inspect-after-mutation` / `pageChangeSummary` evidence with a wait, URL/text check, or fresh snapshot before trusting the result; if the target still did not change, retry with a current visible ref or stable selector and report the workflow issue instead of silently continuing. Preserve explicit user stop boundaries: if the user says to stop before order/post/purchase/submit, gather evidence on that page and do not click the final action.
220
223
  - If a **top-level** `click` succeeds (unified command `click`, not a `batch` step), upstream reports `data.clicked`, and the tab URL is unchanged under the same normalization as ref preflight (fragment-insensitive), the wrapper may take one extra `snapshot -i` and add `Possible overlay blockers` with `details.overlayBlockers` (`candidates`, `summary`, optional `snapshot` refresh for refs) plus session-aware `inspect-overlay-state` / bounded `try-overlay-blocker-candidate-*` next actions when that snapshot shows strong modal context (`dialog` / `alertdialog`) and close/dismiss-like controls. Page-wide words like privacy, sign in, or banner alone do not trigger this diagnostic. The unchanged-URL check uses `details.navigationSummary`, which is populated with one read-only `eval` summary when the click JSON omits **both** string `data.url` and `data.title`; if upstream already includes either, overlay diagnostics are skipped here. Also skipped when tab correction or about-blank recovery already ran on that result.
221
- - If `get text <selector>` reads a non-ref CSS selector with multiple matches or a hidden first match while visible matches exist, including successful `batch` steps, the wrapper may add `Selector text visibility warning`, `details.selectorTextVisibility` (plus `selectorTextVisibilityAll` for multiple batched warnings), and `inspect-visible-text-candidates` next actions; prefer a visible `@ref`, a scoped selector, or a targeted `eval --stdin` over hidden tab content.
224
+ - If `get text <selector>` reads a non-ref CSS selector with multiple matches or a hidden first match while visible matches exist, including successful `batch` steps, the wrapper may add `Selector text visibility warning`, `details.selectorTextVisibility` (plus `selectorTextVisibilityAll` for multiple batched warnings), and `inspect-visible-text-candidates` next actions; the warning names the matching `details.nextActions` id. Prefer a visible `@ref`, a scoped selector, or a targeted `eval --stdin` over hidden tab content.
225
+ - In attached Electron sessions, broad selectors such as `body`, `html`, `main`, or `[role=application]` may read the whole app shell. The wrapper may add `Broad Electron get text selector warning`, `details.electronGetTextScopeWarning`, and `snapshot-for-electron-text-scope`; prefer `snapshot -i`, a current `@ref`, or a narrower panel selector.
222
226
 
223
227
  ### Constrained browser jobs
224
228
 
@@ -238,11 +242,41 @@ For short repeatable workflows, pass a top-level `job` instead of hand-writing `
238
242
 
239
243
  On app pages that expose a native dropdown, add a `select` step such as `{ "action": "select", "selector": "#flavor", "value": "chocolate" }` before the assertion that depends on it.
240
244
 
241
- Use raw `args`/`stdin` when you need full upstream `batch` power, custom flags, or commands outside the constrained job schema. Do not pass `stdin` with `job`, `qa`, `sourceLookup`, or `networkSourceLookup`; those modes generate the batch stdin themselves.
245
+ Use raw `args`/`stdin` when you need full upstream `batch` power, custom flags, or commands outside the constrained job schema. Do not pass `stdin` with `job`, `qa`, `sourceLookup`, `networkSourceLookup`, or `electron`; those modes generate or manage their own input.
246
+
247
+ ### Electron desktop apps
248
+
249
+ The dedicated guide for this section is [`docs/ELECTRON.md`](docs/ELECTRON.md); it covers intended users, the full lifecycle, wrapper-owned vs manually launched apps, action reference, safety/ownership, `qa.attached`, `sourceLookup` context, troubleshooting, and cleanup. Read it first if Electron support is what brought you here.
250
+
251
+ For desktop Electron apps, use top-level `electron` to avoid hand-building the discover → launch with CDP → connect → inspect → cleanup sequence. The wrapper owns only apps it launched, uses an isolated temp profile and OS-chosen debug port, and reports exact cleanup/status next actions. It does **not** reuse the app's normal signed-in profile or attach to an already-running authenticated app, so launching Slack/Obsidian/VS Code this way may show first-run or sign-in UI instead of the user's live local state. When the explicit goal is signed-in local app state and host tools are available, launch the normal app with a debug port first (for example `open -a Slack --args --remote-debugging-port=9222 --remote-allow-origins='*'`), then attach with `{ "args": ["connect", "9222"], "sessionMode": "fresh" }`; if the app is already running without a debug port, ask before relaunching it. `electron.list` may annotate likely private apps (for example notes, chat, mail, developer workspaces, or password/auth tools) as `[likely sensitive: …]`; those are hints only, so use caller-owned `allow` / `deny` policy before launching sensitive apps.
252
+
253
+ ```json
254
+ { "electron": { "action": "list", "query": "code" } }
255
+ { "electron": { "action": "launch", "appName": "Visual Studio Code", "handoff": "snapshot" } }
256
+ { "electron": { "action": "probe", "timeoutMs": 5000 } }
257
+ { "electron": { "action": "cleanup", "launchId": "electron-…" } }
258
+ ```
259
+
260
+ `electron.probe.timeoutMs` bounds each underlying read subprocess when dense desktop apps need a shorter or longer probe budget (omit for the normal tool subprocess default). `electron.cleanup.timeoutMs` caps upstream `close` plus host profile/process teardown and defaults to the implicit session close budget unless overridden. `electron.status.timeoutMs` only tightens managed-session title/url reads used for mismatch checks. Pass `electron.probe.launchId` when you want the probe tied to a wrapper-tracked launch instead of only the current managed session. Launch/status/probe results show both `launchId` (for status/cleanup/probe) and `sessionName` (for browser `snapshot`/`tab` commands); if the managed session drifts to `about:blank` while wrapper status still sees a live renderer, Electron-specific mismatch warnings and `status`/`probe`/`reattach`/`snapshot` next actions replace generic tab guidance. If the app process/debug port dies after a successful-looking mutation, the wrapper reports `details.electronPostCommandHealth` and fails with `tab-drift` instead of quietly continuing on `about:blank`. Launch timeouts expose `details.electron.failure.diagnostics` for PID, profile, DevToolsActivePort, and timing evidence.
261
+
262
+ `launch.handoff` still defaults to `"snapshot"`; it retries briefly when the first Electron snapshot has no refs. Use `handoff: "tabs"` as a safer diagnostic starting point when you only need target discovery and do not want interactive refs captured yet, or `handoff: "connect"` when you want attach-only and will run your own `snapshot -i` / tab commands next. For Electron quick inputs that rerender in place, a successful `fill` may include `details.fillVerification` if `get value` still disagrees; re-snapshot and use focus plus keyboard typing before submitting.
263
+
264
+ For an app you launched yourself with remote debugging enabled, use raw upstream attach instead and clean it up yourself. After attach, inspect targets before assuming the app is ready:
265
+
266
+ ```json
267
+ { "args": ["connect", "9222"], "sessionMode": "fresh" }
268
+ { "args": ["tab", "list"] }
269
+ { "args": ["tab", "t2"] }
270
+ { "args": ["snapshot", "-i"] }
271
+ ```
272
+
273
+ `connect` success means the debug endpoint accepted the session, not that an active page is ready. If a snapshot says `No active page`, the wrapper clears prior refs for that session; choose a stable `t<N>` tab and retry a condition wait or fresh `snapshot -i` before using `@e…` refs. `close` only closes the browser/CDP session; manually launched apps, their profiles, and explicit screenshots/downloads/HARs/traces/recordings remain host-owned.
274
+
275
+ After either path, use `qa: { "attached": true, ... }` for a current-session smoke check without opening a URL. Prefer condition waits (`wait --text`, `wait --url`, `wait --fn`, `wait --load <state>`, `wait --download`), `qa.attached`, `electron.probe` / `electron.status`, `tab list` → `tab t<N>`, fresh snapshots, or screenshots over blind sleeps. Keep fixed waits below the wrapper IPC budget: `wait 30000` is intentionally blocked, and a result like `"waited":"timeout"` only proves elapsed time.
242
276
 
243
277
  ### Lightweight QA preset
244
278
 
245
- For a quick smoke/QA pass, use top-level `qa`. It compiles to the same batch path as `job`, clears enabled network/console/page-error buffers before opening the target URL, waits for page readiness, checks optional expected text or selector, inspects fresh network requests, console messages, and page errors, and can capture an evidence screenshot. `loadState` defaults to `"domcontentloaded"`; set it to `"load"` or `"networkidle"` only when the stricter state is useful and the site is not expected to keep background requests alive. `checkNetwork`, `checkConsole`, and `checkErrors` default to true; set one to `false` to skip that diagnostic read. Network failures are classified by likely impact and failed rows are listed first in network previews: actionable document/script/API-style failures still fail QA, while some low-impact browser icon asset misses (for example certain `favicon` or `apple-touch-icon` paths when upstream marks the row failed and resource metadata looks image-like) surface only as warnings instead of failing an otherwise healthy smoke check (`details.qaPreset.warnings`, with human-readable `details.qaPreset.summary` when the preset still passes). Exact predicates live in [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#qa) and `classifyNetworkRequestFailure` in `extensions/agent-browser/lib/results/shared.ts`.
279
+ For a quick smoke/QA pass, use top-level `qa`. It compiles to the same batch path as `job`. The URL form clears enabled network/console/page-error buffers before opening the target URL, waits for page readiness, checks optional expected text or selector, inspects fresh network requests, console messages, and page errors, and can capture an evidence screenshot. The attached form (`qa: { "attached": true }`) runs those checks against the current managed session, such as an attached Electron app, and rejects `url`. `loadState` defaults to `"domcontentloaded"`; set it to `"load"` or `"networkidle"` only when the stricter state is useful and the site is not expected to keep background requests alive. `checkNetwork`, `checkConsole`, and `checkErrors` default to true; set one to `false` to skip that diagnostic read. Network failures are classified by likely impact and failed rows are listed first in network previews: actionable document/script/API-style failures still fail QA, while some low-impact browser icon asset misses (for example certain `favicon` or `apple-touch-icon` paths when upstream marks the row failed and resource metadata looks image-like) surface only as warnings instead of failing an otherwise healthy smoke check (`details.qaPreset.warnings`, with human-readable `details.qaPreset.summary` when the preset still passes). Exact predicates live in [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#qa) and `classifyNetworkRequestFailure` in `extensions/agent-browser/lib/results/network.ts` (re-exported from the compatibility barrel).
246
280
 
247
281
  ```json
248
282
  {
@@ -264,7 +298,7 @@ For local app debugging, `sourceLookup` can gather candidate component/file loca
264
298
  { "sourceLookup": { "selector": "#save", "reactFiberId": "2", "componentName": "SaveButton" } }
265
299
  ```
266
300
 
267
- This is an experiment, not a guarantee. React hints require a session opened with `--enable react-devtools`, and many builds do not expose useful sourcemap/source metadata; `status: "no-candidates"` is common when nothing matched, and `status: "unsupported"` only when no candidates were found **and** a compiled `react` batch step failed (if DOM or workspace search still produced candidates, you get `candidates-found` instead).
301
+ This is an experiment, not a guarantee. React hints require a session opened with `--enable react-devtools`, and many builds do not expose useful sourcemap/source metadata; `status: "no-candidates"` is common when nothing matched, and `status: "unsupported"` only when no candidates were found **and** a compiled `react` batch step failed (if DOM or workspace search still produced candidates, you get `candidates-found` instead). For wrapper-tracked packaged Electron apps, a no-candidate result includes `details.sourceLookup.workspaceRoot`, optional `details.sourceLookup.electronContext`, limitations explaining that the scan is limited to the Pi cwd and does not unpack app bundles/`app.asar`, plus Electron snapshot/probe/tab next actions when a launch is known.
268
302
 
269
303
  `networkSourceLookup` is the matching failed-request experiment. It runs `network request <id>` when `requestId` is present and/or `network requests --filter …` when `filter` or `url` is present (`url` supplies the filter pattern when `filter` is omitted); add `session` when the generated batch should target an explicit upstream session. It merges failed-request rows from the batch JSON with initiator-style hints and a bounded workspace literal scan (`maxWorkspaceFiles` defaults to 2000, cap 5000), surfaces everything under `details.networkSourceLookup`, and avoids automatic blame or edits. Compact `network requests` results with safe request IDs also add `details.nextActions` for request details, bounded `networkSourceLookup` on actionable failures, path filtering, or HAR capture so agents can branch without guessing request-id syntax. Network diagnostics are read-only for wrapper page state: request URLs in `network request` or generated `networkSourceLookup` batches do not replace the session’s active page target or invalidate page-scoped refs from the app page.
270
304
 
@@ -369,7 +403,13 @@ The local verification gate is:
369
403
  npm run verify
370
404
  ```
371
405
 
372
- It runs:
406
+ For a fast TypeScript-only iteration loop (same `tsc --noEmit` as the default gate, without docs drift checks, unit tests, or live upstream command-reference sampling):
407
+
408
+ ```bash
409
+ npm run typecheck
410
+ ```
411
+
412
+ The full `npm run verify` gate runs:
373
413
 
374
414
  - generated playbook/documentation drift checks
375
415
  - `tsc --noEmit`
@@ -437,6 +477,8 @@ Install upstream `agent-browser`, then install dependencies:
437
477
  npm install
438
478
  ```
439
479
 
480
+ Use the npm version declared in `package.json` `packageManager` when refreshing `package-lock.json` (for example `npx -y npm@11.14.0 install`) so optional-platform lockfile metadata does not drift. Align the global `pi` CLI with this repo’s `pi-coding-agent` devDependency range before lifecycle or interactive browser smokes. See [Environment and automation pitfalls](docs/RELEASE.md#environment-and-automation-pitfalls) in `docs/RELEASE.md`.
481
+
440
482
  Quick isolated checkout smoke test:
441
483
 
442
484
  ```bash
@@ -453,6 +495,8 @@ Configured-source lifecycle validation:
453
495
  npm run verify -- lifecycle
454
496
  ```
455
497
 
498
+ The harness defaults to Pi model `zai/glm-5.1` and **180000 ms** per-step tmux waits; pass `--model <id>` and/or `--timeout-ms <ms>` after `lifecycle` when you need different settings (see [Configured-source lifecycle validation](docs/RELEASE.md#configured-source-lifecycle-validation) in `docs/RELEASE.md`).
499
+
456
500
  Use lifecycle validation when testing `/reload`, full restart, `/resume`, managed-session continuity, or persisted artifact behavior. Maintainers must run the same harness before every publish; see [Pre-release checks](docs/RELEASE.md#pre-release-checks).
457
501
 
458
502
  Installed-package validation after publish:
@@ -481,7 +525,7 @@ These calls return plain text and stay stateless: the extension does not inject
481
525
  - After launch-scoped open/goto/navigate calls that can restore existing tabs (for example --profile, --session-name, or --state), agent_browser best-effort re-selects the tab whose URL matches the returned page when restored tabs steal focus during launch.
482
526
  - After the wrapper observes tab-drift risk for a session (for example profile restore correction, overlapping stale opens, or resumed session state), later active-tab commands best-effort pin that tab inside the same upstream invocation. Routine same-session commands are not preflighted with tab list just because a target tab is known.
483
527
  - For sessions with observed tab-drift risk, after a successful command on a known target tab, agent_browser also best-effort restores that intended tab if a restored/background tab steals focus after the command completes. Routine same-session commands skip this post-command tab-list probe.
484
- - If a known session target unexpectedly reports about:blank, agent_browser preserves the prior intended target, best-effort re-selects it when it still exists, and reports exact recovery guidance when it cannot be re-selected.
528
+ - If a known session target unexpectedly reports about:blank, agent_browser best-effort re-selects the prior intended target when it still exists; if recovery fails, it records the observed about:blank target and reports exact recovery guidance instead of treating the prior page as active.
485
529
  <!-- agent-browser-playbook:end wrapper-tab-recovery -->
486
530
 
487
531
  ## Project map
@@ -496,6 +540,7 @@ These calls return plain text and stay stateless: the extension does not inject
496
540
  | `scripts/check-command-reference-baseline.mjs` | Regenerates or verifies HTML-bounded baseline blocks in `docs/COMMAND_REFERENCE.md` (via `npm run docs -- command-reference …`) |
497
541
  | `docs/COMMAND_REFERENCE.md` | Repo-readable native command reference |
498
542
  | `docs/TOOL_CONTRACT.md` | Tool parameters, result shape, and behavior contract |
543
+ | `docs/ELECTRON.md` | Dedicated public guide for Electron desktop-app support |
499
544
  | `docs/ARCHITECTURE.md` | Design decisions and implementation structure |
500
545
  | `docs/REQUIREMENTS.md` | Product requirements and constraints |
501
546
  | `docs/RELEASE.md` | Release, package, and lifecycle verification workflow |
@@ -507,6 +552,7 @@ These calls return plain text and stay stateless: the extension does not inject
507
552
  - [`AGENTS.md`](AGENTS.md) — maintainer and agent runbooks, including upstream capability baseline rebaselining and Pi smoke testing in `tmux`
508
553
  - [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md) — full native command reference and upstream capability baseline
509
554
  - [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md) — exact tool contract
555
+ - [`docs/ELECTRON.md`](docs/ELECTRON.md) — Electron desktop-app guide
510
556
  - [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md) — how the wrapper is designed
511
557
  - [`docs/REQUIREMENTS.md`](docs/REQUIREMENTS.md) — product constraints and non-goals
512
558
  - [`docs/RELEASE.md`](docs/RELEASE.md) — maintainer release workflow
@@ -5,6 +5,7 @@ Related docs:
5
5
  - [`../AGENTS.md`](../AGENTS.md) (maintainer workflows, including upstream capability baseline)
6
6
  - [`REQUIREMENTS.md`](REQUIREMENTS.md)
7
7
  - [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md)
8
+ - [`ELECTRON.md`](ELECTRON.md)
8
9
 
9
10
  ## Decision
10
11
 
@@ -32,12 +33,14 @@ The extension should:
32
33
  - resolve `agent-browser` from `PATH`
33
34
  - invoke it directly, not through a shell
34
35
  - inject `--json`
35
- - support optional stdin only for `eval --stdin`, `batch`, `auth save --password-stdin`, and wrapper-generated `batch` stdin from top-level `job`, `qa`, `sourceLookup`, or `networkSourceLookup`, rejecting other command/stdin combinations before launch
36
- - accept an optional native `semanticAction` object as a mutually exclusive alternative to `args` on a single tool call, compile locator actions into upstream `find` argv and native dropdown selection into upstream `select <selector> <value...>` argv (with optional `semanticAction.session` expanding to a leading `--session <name>` before the compiled command when targeting a named upstream browser instead of the managed default), and echo the compiled shape in `details.compiledSemanticAction` for observability (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction))
37
- - accept an optional native `job` object (mutually exclusive with `args`, `semanticAction`, `qa`, `sourceLookup`, and `networkSourceLookup` on the same call) with a small fixed step vocabulary that compiles only to existing upstream `batch` argv rows, generates the JSON batch stdin string internally, and echoes `details.compiledJob` for observability (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#job))
38
- - accept an optional native `qa` object (mutually exclusive with `args`, `semanticAction`, `job`, `sourceLookup`, and `networkSourceLookup` on the same call) that compiles to the same `batch` path as `job`, runs a fixed diagnostic smoke sequence, and echoes `details.compiledQaPreset` plus structured `details.qaPreset` pass/fail evidence (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#qa))
39
- - accept an optional native `sourceLookup` object (mutually exclusive with `args`, `semanticAction`, `job`, `qa`, and `networkSourceLookup` on the same call) that compiles to the same `batch` path, gathers evidence-backed local source *candidates* for a selector/fiber/component name, and echoes `details.compiledSourceLookup` plus structured `details.sourceLookup` (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#sourcelookup)); unlike `qa`, it never applies a second pass/fail layer that marks the tool failed when upstream already reported batch success—failed upstream steps still fail the invocation normally, and `details.sourceLookup` may still be present for partial evidence
40
- - accept an optional native `networkSourceLookup` object (mutually exclusive with `args`, `semanticAction`, `job`, `qa`, and `sourceLookup` on the same call) that compiles to the same `batch` path, correlates failed network requests with initiator metadata and bounded workspace URL literals, and echoes `details.compiledNetworkSourceLookup` plus structured `details.networkSourceLookup` (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#networksourcelookup)); like `sourceLookup`, it never flips a successful upstream batch to failed solely because no source candidates were found
36
+ - complete each upstream invocation when the direct `agent-browser` child exits even if Node delays `"close"`: piped stdio can stay referenced by longer-lived descendant processes, so `runAgentBrowserProcess` watches `exit` and `close` together, leaves stdio intact during a short post-`exit` grace so normal `close` can still win, destroys streams only when the post-`exit` fallback fires, and prefers `close` codes then wrapper timeout (`124`) over signal-shaped `exit` codes (`watchSpawnedChildCompletion` / `resolveSpawnedChildExitCode` in `extensions/agent-browser/lib/process.ts`) so the tool cannot hang after the CLI process has already terminated
37
+ - support optional stdin only for `eval --stdin`, `batch`, `auth save --password-stdin`, and wrapper-generated `batch` stdin from top-level `job`, `qa`, `sourceLookup`, or `networkSourceLookup`, rejecting other command/stdin combinations before launch; top-level `electron` never accepts caller `stdin` (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#electron))
38
+ - accept an optional native `semanticAction` object as a mutually exclusive alternative to `args` on a single tool call (and to `job`, `qa`, `sourceLookup`, `networkSourceLookup`, and `electron` on the same call), compile locator actions into upstream `find` argv and native dropdown selection into upstream `select <selector> <value...>` argv (with optional `semanticAction.session` expanding to a leading `--session <name>` before the compiled command when targeting a named upstream browser instead of the managed default), and echo the compiled shape in `details.compiledSemanticAction` for observability (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction))
39
+ - accept an optional native `job` object (mutually exclusive with `args`, `semanticAction`, `qa`, `sourceLookup`, `networkSourceLookup`, and `electron` on the same call) with a small fixed step vocabulary that compiles only to existing upstream `batch` argv rows, generates the JSON batch stdin string internally, and echoes `details.compiledJob` for observability (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#job))
40
+ - accept an optional native `qa` object (mutually exclusive with `args`, `semanticAction`, `job`, `sourceLookup`, `networkSourceLookup`, and `electron` on the same call) that compiles to the same `batch` path as `job`, runs a fixed diagnostic smoke sequence, and echoes `details.compiledQaPreset` plus structured `details.qaPreset` pass/fail evidence (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#qa))
41
+ - accept an optional native `sourceLookup` object (mutually exclusive with `args`, `semanticAction`, `job`, `qa`, `networkSourceLookup`, and `electron` on the same call) that compiles to the same `batch` path, gathers evidence-backed local source *candidates* for a selector/fiber/component name, and echoes `details.compiledSourceLookup` plus structured `details.sourceLookup` (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#sourcelookup)); unlike `qa`, it never applies a second pass/fail layer that marks the tool failed when upstream already reported batch success—failed upstream steps still fail the invocation normally, and `details.sourceLookup` may still be present for partial evidence
42
+ - accept an optional native `networkSourceLookup` object (mutually exclusive with `args`, `semanticAction`, `job`, `qa`, `sourceLookup`, and `electron` on the same call) that compiles to the same `batch` path, correlates failed network requests with initiator metadata and bounded workspace URL literals, and echoes `details.compiledNetworkSourceLookup` plus structured `details.networkSourceLookup` (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#networksourcelookup)); like `sourceLookup`, it never flips a successful upstream batch to failed solely because no source candidates were found
43
+ - accept an optional native `electron` object (mutually exclusive with `args`, `semanticAction`, `job`, `qa`, `sourceLookup`, and `networkSourceLookup` on the same call) for bounded desktop Electron lifecycle: `list` scans the host for install candidates, `launch` creates a wrapper-owned isolated profile plus OS-chosen remote-debugging port, then attaches through upstream `connect` with `sessionMode: "fresh"`, and `status` / `cleanup` / `probe` operate only on wrapper-tracked launches; host-side spawn and CDP discovery live in `extensions/agent-browser/lib/electron/discovery.ts`, `launch.ts`, and `cleanup.ts`, while compilation, transcript restore for `launchId` records, handoff probes, and merged `details.electron*` fields live in `extensions/agent-browser/index.ts` (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#electron))
41
44
  - when a compiled `find` semantic action fails as `stale-ref`, optionally append a `retry-semantic-action-after-stale-ref` entry to `details.nextActions` after the usual `refresh-interactive-refs` snapshot step so agents can re-issue the same compiled `find` argv only when the failure implies the interaction did not run; `select` shorthands with stale `@refs` get refresh guidance only (contract in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction))
42
45
  - when the same compiled path fails as `selector-not-found` for the bounded locator/action pairs documented there, optionally append `try-*-candidate` entries to `details.nextActions` and mirror them in visible text as `Agent-browser candidate fallbacks` so agents can retry role/name `find` variants without hand-rebuilding argv (`select` misses are intentionally excluded)
43
46
 
@@ -55,12 +58,12 @@ That means:
55
58
  Do **not** add reusable browser recipes as a first-class runtime surface yet.
56
59
 
57
60
  Current evidence does not justify another source of truth for workflows:
58
- - the deterministic efficiency benchmark in [`scripts/agent-browser-efficiency-benchmark.mjs`](../scripts/agent-browser-efficiency-benchmark.mjs) models one native `job` scenario (`job-open-assert-screenshot`), one `qa` preset (`qa-open-diagnostics`), one `sourceLookup` (`source-lookup-visible-element`), and one `networkSourceLookup` (`network-source-lookup-failed-request`) rather than repeated named job patterns that agents keep re-specifying
61
+ - the deterministic efficiency benchmark in [`scripts/agent-browser-efficiency-benchmark.mjs`](../scripts/agent-browser-efficiency-benchmark.mjs) models one native `job` scenario (`job-open-assert-screenshot`), one `qa` preset (`qa-open-diagnostics`), one `sourceLookup` (`source-lookup-visible-element`), one `networkSourceLookup` (`network-source-lookup-failed-request`), plus deterministic `electron` lifecycle/probe scenarios (`electron-lifecycle`, `electron-probe`) rather than repeated named job patterns that agents keep re-specifying
59
62
  - repo-local dogfood evidence does not show repeated project-specific job recipes that need versioning or ownership
60
63
  - `qa` already covers the only repeated smoke-test shape with a stable top-level preset
61
64
  - docs and prompt guidance can carry examples without adding recipe state, migration rules, or another schema
62
65
 
63
- Revisit this only when benchmark or dogfood data shows at least two repeated, failure-prone job sequences that cannot be represented clearly by `job`, `qa`, or raw `batch`. If that happens, define ownership, versioning, schema boundaries, generated docs, and tests before adding executable recipes.
66
+ Revisit this only when benchmark or dogfood data shows at least two repeated, failure-prone job sequences that cannot be represented clearly by `job`, `qa`, top-level `electron`, or raw `batch`. If that happens, define ownership, versioning, schema boundaries, generated docs, and tests before adding executable recipes.
64
67
 
65
68
  ### Package layout versus local checkout development
66
69
 
@@ -118,7 +121,7 @@ Practical policy:
118
121
  - after profiled `open` / `goto` / `navigate` calls, verify the active tab still matches the returned page URL and best-effort switch back when restored profile tabs steal focus
119
122
  - once the wrapper observes tab-drift risk for a session (profile restore correction, overlapping stale opens, or restored session state), later active-tab commands may synthesize a tiny upstream `batch` that re-selects that tab and then runs the requested command in the same upstream invocation; routine same-session commands avoid `tab list` preflights to reduce probes that can perturb upstream click behavior
120
123
  - for sessions with observed tab-drift risk, after a successful command on a known tab target, the wrapper may best-effort restore that same target again if restored/background tabs steal focus after the command returns; routine same-session commands skip this post-command `tab list` probe
121
- - keep a per-session `refSnapshot` aligned with the last successful `snapshot` (including refs merged from a successful `batch` by taking the last successful `snapshot` step in batch result order): restore it from persisted tool `details` when reloading or resuming, drop it on successful `close`, and refuse mutation-prone `@e…` argv before spawn when the active tab URL no longer matches the snapshot URL, when a ref id was never in that snapshot, or when `batch` stdin would reuse `@e…` on a guarded step after an earlier invalidating step without a later `snapshot` step in the same stdin array. Same-snapshot `fill @e…` rows are guarded but do not themselves set that invalidation latch, so ordinary form fills can precede a click/submit row in one batch—see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) for the agent-visible contract and failure text
124
+ - keep a per-session `refSnapshot` aligned with the last successful `snapshot` (including refs merged from a successful `batch` by taking the last successful `snapshot` step in batch result order): restore it from persisted tool `details` when reloading or resuming, drop it on successful `close`, and refuse mutation-prone `@e…` argv before spawn when the active tab URL no longer matches the snapshot URL, when a ref id was never in that snapshot, or when `batch` stdin would reuse `@e…` on a guarded step after an earlier invalidating step without a later `snapshot` step in the same stdin array. Same-snapshot `fill @e…` rows are guarded but do not themselves set that invalidation latch, so ordinary form fills can precede a click/submit row in one batch—see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) for the agent-visible contract and failure text; typed per-session tab/ref/pinning state lives in `extensions/agent-browser/lib/session-page-state.ts` and is updated from `extensions/agent-browser/index.ts` after each tool result
122
125
  - after successful `get text` on a non-ref CSS selector, optionally issue one read-only `eval --stdin` probe per qualifying selector when multiple DOM matches or a hidden first match with visible peers could misread tabbed or off-screen content; merge `details.selectorTextVisibility` / `selectorTextVisibilityAll`, visible warning lines, and `inspect-visible-text-candidates*` next actions as documented in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) and `RQ-0074` in [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md)
123
126
  - for local Unix launches, set a short private socket directory so extension-generated session names do not fail on the upstream Unix socket-path length limit
124
127
  - keep wrapper-spawned upstream CLI calls inside the upstream IPC budget by clamping `AGENT_BROWSER_DEFAULT_TIMEOUT` to 25 seconds and stopping a stuck child process before the upstream 30-second read-timeout retry loop begins
@@ -166,7 +169,7 @@ This keeps the product centered on native tool usage instead of auxiliary skill
166
169
  - subprocess execution and JSON parsing through a filtered child environment (`buildAgentBrowserProcessEnv` in `extensions/agent-browser/lib/process.ts`): copies an allowlisted inherited-name set plus every parent `AGENT_BROWSER_*` variable and provider-related prefixes (`AGENTCORE_*`, `AI_GATEWAY_*`, `BROWSERBASE_*`, `BROWSERLESS_*`, `BROWSER_USE_*`, `KERNEL_*`, `XDG_*`) instead of cloning the full parent process environment
167
170
  - clear missing-binary errors
168
171
  - compact result summaries, including presentation-time redaction: stateful browser-context commands (`auth`, `cookies`, `storage`, `dialog`, `frame`, `state`) use field-aware value redaction and compact formatters, while other structured upstream JSON (for example `network`, `diff`, `trace` / `profiler` / `record`, `console` / `errors` / `highlight` / `inspect` / `clipboard`, `stream`, `dashboard`, and `chat`) is passed through `redactPresentationData` in `extensions/agent-browser/lib/results/presentation.ts` so model-facing `details.data` and batch roll-ups stay compact and do not echo bearer tokens, proxy passwords, or similar fields verbatim; `redactInvocationArgs` in `extensions/agent-browser/lib/runtime.ts` masks trailing values for sensitive global flags such as `--body`, `--headers`, `--password`, and `--proxy`, preserves positional rules for `cookies set` and `storage local|session set`, and nested `batch` steps use the same argv and error-body scrubbing before echoing commands or errors
169
- - bounded machine-readable outcome metadata on tool `details` (`resultCategory`, `successCategory`, `failureCategory`, optional `nextActions`, optional `pageChangeSummary` with per-step summaries on `batch`, optional `artifactVerification` with the same shape on each successful `batchSteps[]` row) so agents can branch without parsing prose; enums, classifier precedence, and follow-up payloads are assembled in `extensions/agent-browser/lib/results/shared.ts`, compact page-change summaries and artifact verification rollups are built in `extensions/agent-browser/lib/results/presentation.ts` (`buildPageChangeSummary`, `buildArtifactVerificationSummary`), and the human contract lives in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details)
172
+ - bounded machine-readable outcome metadata on tool `details` (`resultCategory`, `successCategory`, `failureCategory`, optional `nextActions`, optional `pageChangeSummary` with per-step summaries on `batch`, optional `artifactVerification` with the same shape on each successful `batchSteps[]` row) so agents can branch without parsing prose; enums, classifier precedence, and generic follow-up payloads are implemented under `extensions/agent-browser/lib/results/` in focused modules (`contracts.ts` for shared types, `categories.ts` for `classifyAgentBrowserSuccessCategory` / `classifyAgentBrowserFailureCategory` / `buildAgentBrowserResultCategoryDetails`, `action-recommendations.ts` for `buildAgentBrowserNextActions`, `next-actions.ts` for the `AgentBrowserNextAction` shape and merge helpers, `recovery-actions.ts` for recovery id registries and `buildRecoveryNextActions`, `network.ts` for `classifyNetworkRequestFailure` / `summarizeNetworkFailures`, and related helpers; `shared.ts` in that directory is a **re-export-only** barrel so historical `./results/shared.js` imports keep working without growing a monolith). Per-session tab target, `refSnapshot` alignment, invalidation, and tab pinning observations flow through `extensions/agent-browser/lib/session-page-state.ts` from `extensions/agent-browser/index.ts`. Compact page-change summaries and artifact verification rollups are built in `extensions/agent-browser/lib/results/presentation.ts` (`buildPageChangeSummary`, `buildArtifactVerificationSummary`), and the human contract lives in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details). Real Pi custom tools otherwise only mark a row failed when `execute` throws, so the extension also registers `pi.on("tool_result", …)` and patches `agent_browser` results whose `details.resultCategory` is `failure` to set `isError: true`. Prose results also receive a short category notice, while caller-requested `--json` results with parseable JSON content keep that text unchanged so JSONL transcripts, UI affordances, and the machine-readable contract stay aligned for wrapper-side reclassifications such as `qa-failure` (`buildAgentBrowserToolResultPatch` in `extensions/agent-browser/index.ts`; transcript semantics in the same contract doc)
170
173
  - inline screenshots/images for the plain `screenshot` command; other image-like saves (for example `diff screenshot`) still appear in `details.artifacts` and summaries but are not auto-inlined as Pi image attachments (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details))
171
174
  - lightweight session convenience
172
175
  - docs, including a repo-readable command reference that mirrors the blocked direct-binary help path closely enough for normal agent work