pi-agent-browser-native 0.2.34 → 0.2.36

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (38) hide show
  1. package/CHANGELOG.md +44 -0
  2. package/README.md +25 -15
  3. package/docs/ARCHITECTURE.md +19 -13
  4. package/docs/COMMAND_REFERENCE.md +274 -44
  5. package/docs/ELECTRON.md +3 -3
  6. package/docs/RELEASE.md +11 -11
  7. package/docs/REQUIREMENTS.md +5 -5
  8. package/docs/SUPPORT_MATRIX.md +43 -24
  9. package/docs/TOOL_CONTRACT.md +50 -30
  10. package/extensions/agent-browser/index.ts +518 -2402
  11. package/extensions/agent-browser/lib/argv-descriptor.ts +90 -0
  12. package/extensions/agent-browser/lib/argv-grammar.ts +128 -0
  13. package/extensions/agent-browser/lib/command-policy.ts +71 -0
  14. package/extensions/agent-browser/lib/command-taxonomy.ts +336 -0
  15. package/extensions/agent-browser/lib/electron/cleanup.ts +1 -0
  16. package/extensions/agent-browser/lib/executable-path.ts +19 -0
  17. package/extensions/agent-browser/lib/input-modes/params.ts +6 -6
  18. package/extensions/agent-browser/lib/orchestration/batch-stdin.ts +65 -0
  19. package/extensions/agent-browser/lib/orchestration/browser-run/browser-action-model.ts +154 -0
  20. package/extensions/agent-browser/lib/orchestration/browser-run/click-dispatch.ts +149 -0
  21. package/extensions/agent-browser/lib/orchestration/browser-run/diagnostics.ts +56 -30
  22. package/extensions/agent-browser/lib/orchestration/browser-run/final-result.ts +13 -3
  23. package/extensions/agent-browser/lib/orchestration/browser-run/index.ts +33 -27
  24. package/extensions/agent-browser/lib/orchestration/browser-run/prepare.ts +48 -22
  25. package/extensions/agent-browser/lib/orchestration/browser-run/process-output.ts +39 -10
  26. package/extensions/agent-browser/lib/orchestration/browser-run/prompt-guards.ts +93 -0
  27. package/extensions/agent-browser/lib/orchestration/browser-run/session-state.ts +98 -124
  28. package/extensions/agent-browser/lib/orchestration/browser-run/types.ts +40 -1
  29. package/extensions/agent-browser/lib/orchestration/electron-host/index.ts +860 -0
  30. package/extensions/agent-browser/lib/playbook.ts +10 -10
  31. package/extensions/agent-browser/lib/prompt-policy.ts +122 -0
  32. package/extensions/agent-browser/lib/results/action-recommendations.ts +3 -23
  33. package/extensions/agent-browser/lib/results/presentation/navigation.ts +2 -34
  34. package/extensions/agent-browser/lib/runtime.ts +93 -227
  35. package/extensions/agent-browser/lib/session-page-state.ts +31 -14
  36. package/extensions/agent-browser/lib/temp.ts +148 -23
  37. package/package.json +4 -4
  38. package/scripts/agent-browser-capability-baseline.mjs +198 -1
package/CHANGELOG.md CHANGED
@@ -2,6 +2,50 @@
2
2
 
3
3
  ## Unreleased
4
4
 
5
+ ## 0.2.36 - 2026-05-29
6
+
7
+ ### Added
8
+
9
+ - Added `details.evalResultWarning` and visible `Eval result warning` guidance for successful `eval --stdin` calls that return `null` on `file://` pages with non-trivial stdin, so agents treat those DOM checks as inconclusive without failing the tool.
10
+ - Enriched `get text <selector>` visibility diagnostics with bounded `visibleCandidates` entries (`querySelectorAll` index, tag, optional role, redacted text preview) so agents can resolve broad selector ambiguity without trusting hidden or first-match text.
11
+ - Added support-matrix tracking for the 2026-05-29 agent UX/reliability feedback batch (`RQ-0110`–`RQ-0117`) covering headed mode, local loopback, `file://` eval, click verification, selector ambiguity, host-owned fixture servers, fresh-session failures, and headed visibility limits.
12
+
13
+ ### Changed
14
+
15
+ - Made failed `sessionMode: "fresh"` managed-session recovery prose action-oriented: visible output now avoids generated session ids, distinguishes preserved/abandoned launch failures from post-launch `qa` reclassification failures, and points to safe next actions while keeping full transition details in `details.managedSessionOutcome`.
16
+ - Expanded README, command reference, tool contract, and generated playbook guidance for headed demos, browser-host localhost semantics, `file://` fixture limits, post-click verification, and host-owned temporary server cleanup.
17
+
18
+ ### Fixed
19
+
20
+ - Fresh-session failure next actions now verify or snapshot the current managed session for post-launch failures and avoid unconditional `doctor` guidance when the launch itself succeeded but a later diagnostic failed.
21
+
22
+ ## 0.2.35 - 2026-05-28
23
+
24
+ ### Changed
25
+
26
+ - Cut the local Pi development baseline to `@earendil-works/*` `0.76.0` and refreshed the npm lockfile with npm 11.14.0. Pi **≥ 0.76** is recommended for branch/session continuity (`session_tree` rehydration and generation-aware guards).
27
+ - Updated the configured-source lifecycle harness for Pi 0.76 exact session IDs: launches and relaunches now use `--session-id piab-lifecycle-<pid>` instead of driving `/resume`, assert the JSONL session header id, and include real Pi `tool_result` failure-patch evidence for QA reclassification.
28
+ - Rehydrate branch-visible browser state on Pi `session_tree` events as well as `session_start`, while keeping runtime-owned managed-session and Electron cleanup registries separate so branch switches do not orphan resources owned by the current Pi process; `session_tree`, Electron status/probe/cleanup, and wrapper-owned browser commands now share the same serialization boundary where they can touch managed state, while independent caller-owned explicit-session completions are guarded from overwriting newer branch restores.
29
+ - Tightened the public TypeBox schema to reject unsupported top-level fields and unsupported fields inside `semanticAction`, `sourceLookup`, `networkSourceLookup`, and constrained `job` steps.
30
+ - Centralized upstream command capabilities in `command-taxonomy.ts` (navigation, ref guards, batch invalidation, session close, Electron health probes, page-change summaries, pinning exclusions) and sessionless managed-session policy in `command-policy.ts` with shared argv discovery in `argv-descriptor.ts` / `argv-grammar.ts`.
31
+ - Normalized `open` / `goto` / `navigate` navigation handling through the shared taxonomy so page-change summaries and ref invalidation stay consistent across aliases.
32
+ - Refactored the extension entrypoint and browser-run orchestration: Electron host actions moved to `orchestration/electron-host/`, click-dispatch and prompt-guard preflight live under `orchestration/browser-run/`, and duplicate browser-run helper ownership was removed (#57).
33
+ - Expanded the upstream capability baseline and command reference for agent-browser **0.27.0** (additional help sampling, inventory tokens, and maintainer rebaseline metadata).
34
+
35
+ ### Added
36
+
37
+ - Prompt-policy preflight guards: block likely final submit/order clicks (including batch steps and Enter/Return keyboard submits) when the latest user message sets an explicit stop boundary, and block `close`/`quit`/`exit` until requested screenshot/recording paths from the prompt are verified in the artifact manifest.
38
+ - Model-free real Pi pipeline coverage for `buildAgentBrowserToolResultPatch`, proving prose QA failures become `isError: true` in persisted JSONL tool results and caller-requested `--json` failures keep parseable JSON while still patching `isError`.
39
+ - Model-free real Pi pipeline coverage for strict public-schema rejection before upstream spawn.
40
+ - Regression tests for prompt guards, click-dispatch diagnostics, command taxonomy/policy, argv descriptor edge cases, temp-root cleanup, and `session_tree` branch rehydration of page-scoped refs, managed browser sessions, artifact manifests, Electron cleanup/status/probe ownership, explicit cleanup serialization, active-Electron reload profile preservation, targeted cleanup without unrelated branch promotion, reload cleanup of off-branch sessions/Electron launches, durable partial off-branch reload/quit profile preservation, protected temp-root process-exit and stale-prune cleanup, partial Electron cleanup session untracking, explicit close live/restore state retirement, explicit close generated-fresh ordinal reservation, explicit-session command branch-generation guarding, multi-branch managed-session cleanup, and monotonic fresh-session allocation.
41
+
42
+ ### Fixed
43
+
44
+ - Successful explicit `--session <current-wrapper-managed-session> close` and `electron.cleanup` managed-session close steps now clear live managed-session/page state, untrack cleanup ownership, reserve the next generated fresh-session ordinal so repeated closes cannot reuse a just-closed generated name, rotate the next default auto call away from the closed name, and stay honored after reload/resume branch restore.
45
+ - `/reload` now preserves the current branch-visible active Electron launch and its isolated temp `userDataDir` for continuity while cleaning off-branch owned Electron launches, preserves off-branch profile dirs across reload, quit, repeated temp cleanup, process-exit cleanup, and stale temp-root pruning after restart when partial cleanup intentionally skips or fails `user-data-dir` removal, and targeted `electron.cleanup` no longer promotes unrelated off-branch launches into the current branch-visible state.
46
+ - Stabilized env-patched fake-upstream tests by serializing process environment patches and tightening the inherited-stdio subprocess regression to assert quick post-exit fallback behavior without waiting for the process timeout.
47
+ - Hardened maintainer dogfood/smoke safety around release prompts (stop-before-order and required artifact paths) so automated and interactive smokes exercise the new guards without placing orders or closing early.
48
+
5
49
  ## 0.2.34 - 2026-05-24
6
50
 
7
51
  ### Added
package/README.md CHANGED
@@ -59,18 +59,18 @@ The result is optimized for agent work:
59
59
  | Agents build fragile shell commands | Exposes `agent_browser` with exact `args`, an optional `semanticAction` shorthand for common `find` flows and native `select`, constrained `job` / `qa` presets, experimental `sourceLookup` / `networkSourceLookup` that compile short workflows to `batch`, top-level `electron` for desktop lifecycle, plus controlled `stdin` and `sessionMode` | `extensions/agent-browser/index.ts`, `extensions/agent-browser/lib/input-modes/`, [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md) |
60
60
  | Page snapshots are too large | Shows compact, main-content-first summaries, surfaces an `Omitted high-value controls` section (plus `details.data.highValueControlRefIds`) when dense pages or desktop host screens hide editables, named surfaces/tabs, and primary action buttons from the trimmed ref lists, and stores full raw output in spill files when needed | `extensions/agent-browser/lib/results/snapshot.ts`, `test/agent-browser.presentation.test.ts` |
61
61
  | Screenshots/downloads get lost in text | Normalizes artifact paths and reports existence, size, cwd, session, and repair status | [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md#download-screenshot-and-pdf-files) |
62
- | Profile restores and tab drift confuse agents | Tracks managed sessions, re-selects target tabs after observed drift, and pins later commands only for sessions with drift/restored-session risk | generated tab-recovery notes below; `test/agent-browser.extension-tab-recovery.test.ts` (drift and about:blank recovery), `test/agent-browser.resume-state.test.ts` (persisted session / resume planning) |
62
+ | Profile restores and tab drift confuse agents | Tracks managed sessions, re-selects target tabs after observed drift, rehydrates branch-backed session state on Pi session-tree changes, and pins later commands only for sessions with drift/restored-session risk | generated tab-recovery notes below; `test/agent-browser.extension-tab-recovery.test.ts` (drift and about:blank recovery), `test/agent-browser.resume-state.test.ts` (persisted session / resume planning), `test/agent-browser.extension-ref-guards.test.ts` (session_tree rehydration) |
63
63
  | Auth/profile workflows can leak secrets | Supports `auth save --password-stdin` and redacts sensitive args, URLs, stdout/stderr, details, and parse-failure spills | `test/agent-browser.extension-security-redaction.test.ts` |
64
64
  | Stateful cookies/storage/auth output bloats or leaks context | Presentation layer redacts `details.data` for cookies and storage (field-aware values) and recursively scrubs other structured upstream JSON (network, diff, trace/profiler, stream, dashboard, chat, auth, dialog, frame, state, and similar) using sensitive key names plus string heuristics; masks sensitive argv flags and positionals; scrubs secrets from failed batch step errors; and exposes a compact redacted `batch` matrix on top-level `details.data` | `extensions/agent-browser/lib/results/presentation.ts`, `extensions/agent-browser/lib/results/presentation/diagnostics.ts`, `extensions/agent-browser/lib/runtime.ts`, `test/agent-browser.presentation-diagnostics.test.ts` |
65
65
  | Stale `@eN` refs fail mysteriously | Records per-session `details.refSnapshot`, rejects mismatched URLs / unknown refs / unsafe `batch` stdin ordering before spawn, adds recovery guidance to rerun `snapshot -i` or use stable `find` locators | `extensions/agent-browser/index.ts`, `extensions/agent-browser/lib/session-page-state.ts`, `test/agent-browser.session-page-state.test.ts`, `test/agent-browser.results.test.ts`, `test/agent-browser.extension-ref-guards.test.ts`, `test/agent-browser.extension-semantic-recovery.test.ts` |
66
- | Agents need stable success/failure buckets | Exposes bounded `resultCategory`, `successCategory`, and `failureCategory` on tool `details` for branching without parsing prose; a `tool_result` hook also aligns real Pi `isError` semantics, naming `Pi tool isError: true` in prose output while preserving parseable caller-requested `--json` output | [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), `extensions/agent-browser/lib/results/categories.ts`, `extensions/agent-browser/lib/results/shared.ts` (re-export barrel), `extensions/agent-browser/index.ts`, `test/agent-browser.results.test.ts`, `test/agent-browser.extension-validation.test.ts` |
67
- | Models re-snapshot after every click without new URL/title context | Adds optional `details.pageChangeSummary` (and per-batch-step summaries) with `changeType`, compact text, optional `title`/`url`, artifact hints, and `nextActionIds` aligned to `nextActions`; no-navigation clicks can also surface evidence-backed `details.overlayBlockers` candidates | [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), `extensions/agent-browser/lib/results/presentation/navigation.ts`, `test/agent-browser.presentation.test.ts`, `test/agent-browser.extension-errors-artifacts.test.ts` |
66
+ | Agents need stable success/failure buckets | Exposes bounded `resultCategory`, `successCategory`, and `failureCategory` on tool `details` for branching without parsing prose; a `tool_result` hook also aligns real Pi `isError` semantics, naming `Pi tool isError: true` in prose output while preserving parseable caller-requested `--json` output | [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), `extensions/agent-browser/lib/results/categories.ts`, `extensions/agent-browser/lib/results/shared.ts` (re-export barrel), `extensions/agent-browser/index.ts`, `test/agent-browser.results.test.ts`, `test/agent-browser.extension-validation.test.ts`, `test/agent-browser.pi-pipeline.test.ts` |
67
+ | Clicks can report success without the page receiving the event | Top-level non-Electron `click` on exact CSS/XPath selectors installs a bounded DOM-event probe; if upstream reports success but no trusted event reaches the target, the wrapper fails the tool, exposes `details.clickDispatch`, and suggests explicit retry/inspect next actions (no in-page replay; `@e…` refs skip the probe). Other click results still expose `details.pageChangeSummary`, unchanged-URL clicks can surface evidence-backed `details.overlayBlockers` candidates, and explicit user stop boundaries can best-effort block click-like actions plus `press`/`key` Enter/Return submits via `details.promptGuard`. | [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), `extensions/agent-browser/lib/orchestration/browser-run/click-dispatch.ts`, `extensions/agent-browser/lib/orchestration/browser-run/browser-action-model.ts`, `extensions/agent-browser/lib/orchestration/browser-run/prompt-guards.ts`, `extensions/agent-browser/lib/results/presentation/navigation.ts`, `test/agent-browser.presentation.test.ts`, `test/agent-browser.extension-errors-artifacts.test.ts` |
68
68
  | Dashboard scroll commands can look successful while nothing moves | Samples viewport and prominent scroll-container positions around top-level `scroll` calls; unchanged positions produce `details.scrollNoop`, visible recovery guidance, and exact `nextActions` for snapshot/screenshot verification | [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md#core-page-and-element-commands), `test/agent-browser.extension-validation.test.ts` |
69
69
  | Dropdown/combobox clicks can focus or hit native option box-model errors | Adds first-class `select <selector> <value...>` paths through raw `args`, `semanticAction`, and `job`; for custom combobox clicks, detects focused controls with explicit `aria-expanded` state but no visible options and returns `details.comboboxFocus` plus exact recovery `nextActions` | [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md#core-page-and-element-commands), `extensions/agent-browser/lib/input-modes/semantic-action.ts`, `test/agent-browser.extension-input-modes.test.ts`, `test/agent-browser.extension-validation.test.ts` |
70
70
  | Recording workflows fail late when `ffmpeg` is missing | After successful `record start` / `record restart`, warns when `ffmpeg` is not on `PATH` so agents can install or fix PATH before `record stop` | [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md#diff-debug-and-streaming), `test/agent-browser.extension-validation.test.ts` |
71
71
  | Direct binary help may be blocked in agent sessions | Publishes a repo-readable command reference and verifies it against the target upstream version | `npm run verify` |
72
72
  | Desktop Electron apps need discovery, CDP attach, and safe teardown | Top-level `electron` runs host `list` / isolated `launch` (temp profile, OS-chosen debug port) / `status` / `probe` / `cleanup`, merges `launchId` plus managed `sessionName`, supports `handoff` `snapshot` / `tabs` / `connect`, and surfaces mismatch and post-command health guidance; wrapper cleanup applies only to launches it created | `extensions/agent-browser/lib/electron/discovery.ts`, `launch.ts`, `cleanup.ts`, [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#electron), [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md#electron-desktop-apps) |
73
- | Agents need bundled `skills` text without touching the live session | Treats `skills list`, `skills get …`, and `skills path …` as stateless JSON reads: no implicit managed `--session` under default `sessionMode: "auto"` (same session-ownership goal as plain-text `--help` / `--version`), while provider workflows stay thin passthroughs that require upstream setup and credentials | [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md#built-in-skills), `extensions/agent-browser/lib/runtime.ts` |
73
+ | Agents need bundled `skills` text and local setup/status commands without touching the live session | Treats `skills list`, `skills get …`, `skills path …`, local auth profile management (`auth save/list/show/delete/remove`), `profiles`, `dashboard`, `device list`, `doctor`, `install`, `upgrade`, `session list`, and targeted/all saved-state maintenance (`state clear --all`, `state clear -a`, named clear, or `state clean --older-than <days>`) as sessionless reads/actions: no implicit managed `--session` under default `sessionMode: "auto"` (same session-ownership goal as plain-text `--help` / `--version`), while provider and browser-backed workflows stay thin passthroughs that require upstream setup and credentials | [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md#built-in-skills), `extensions/agent-browser/lib/command-policy.ts`, `extensions/agent-browser/lib/runtime.ts` |
74
74
 
75
75
  ## Fastest way to try it
76
76
 
@@ -88,7 +88,7 @@ Optional external tools unlock the full command surface:
88
88
 
89
89
  Keep both binaries on `PATH`. `record start` can begin without a file on disk, but `record stop` needs `ffmpeg` to encode the WebM.
90
90
 
91
- The native tool also gives agents absolute installed-package doc paths in its compact runtime guidance. Agents should read `README.md` for setup/dependencies, `docs/COMMAND_REFERENCE.md` for targeted command workflows, and `docs/TOOL_CONTRACT.md` for result/detail contracts only when deeper guidance is needed.
91
+ The native tool also gives agents absolute installed-package doc paths in its compact runtime guidance. Raw `args` are the 1:1 upstream CLI coverage path for the targeted `agent-browser` release; typed modes such as `semanticAction`, `job`, `qa`, source lookups, and Electron lifecycle helpers are reliability shorthands layered on top. Agents should read `README.md` for setup/dependencies, `docs/COMMAND_REFERENCE.md` for targeted command workflows, and `docs/TOOL_CONTRACT.md` for result/detail contracts only when deeper guidance is needed.
92
92
 
93
93
  Then install this Pi package:
94
94
 
@@ -157,6 +157,13 @@ Open a page and inspect it (first-call recipe: open → snapshot -i → interact
157
157
  { "args": ["snapshot", "-i"] }
158
158
  ```
159
159
 
160
+ Watch a browser window during a demo or QA run by adding upstream's global `--headed` flag on the first launch. Use `sessionMode: "fresh"` if a managed session may already exist, because headed/headless state is launch-scoped. A successful tool call means upstream opened a browser context; it does **not** prove the OS window is visible on the user's display, especially under remote, container, or virtual-display setups.
161
+
162
+ ```json
163
+ { "args": ["--headed", "open", "https://example.com"], "sessionMode": "fresh" }
164
+ { "args": ["screenshot", "/tmp/agent-browser-headed-check.png"] }
165
+ ```
166
+
160
167
  On `https://example.com/`, the main link label is **Learn more**—use exact visible text from your snapshot, not guessed copy such as `More information...`.
161
168
 
162
169
  Click a visible ref, then refresh refs after navigation or a DOM update:
@@ -222,7 +229,7 @@ Typical pitfalls:
222
229
  - Do not reuse `@e…` refs across navigation. The wrapper records the latest snapshot refs per session and fails mutation-prone stale/recycled refs before upstream can silently hit a different current-page element; use the session-aware `refresh-interactive-refs` next action.
223
230
  - If upstream classifies the failure as `stale-ref` and `details.compiledSemanticAction` is present for a compiled `find` action, `details.nextActions` may list `retry-semantic-action-after-stale-ref` after `refresh-interactive-refs`, carrying the same compiled `find` argv so you can retry the locator-stable target once it is safe to do so. `select` calls that used stale `@refs` only get refresh guidance; use a fresh snapshot or stable selector before retrying (contract in [`docs/TOOL_CONTRACT.md#semanticaction`](docs/TOOL_CONTRACT.md#semanticaction)).
224
231
  - If the failure is `selector-not-found`, the wrapper may take one fresh snapshot and add `Current snapshot ref fallback` when that snapshot has exact visible role/name matches for the failed `find` / `semanticAction` target. Non-fill targets can include direct `try-current-visible-ref*` next actions, and semantic click misses can still add bounded `Agent-browser candidate fallbacks` such as `button`/`link` role retries for `text` clicks. For semantic `fill` misses on desktop or host-controlled rich inputs, prefer `details.richInputRecovery`: refresh refs, choose the current editable `@ref`, focus or click it, then use `keyboard inserttext` or `keyboard type` with the intended text. Those recovery nextActions do not copy the fill text and do not press `Enter` or submit; only submit when the user flow explicitly calls for it (same contract link).
225
- - A successful upstream `click` is not proof that the web app handled the event or changed state. When the task depends on a mutation, follow `inspect-after-mutation` / `pageChangeSummary` evidence with a wait, URL/text check, or fresh snapshot before trusting the result; if the target still did not change, retry with a current visible ref or stable selector and report the workflow issue instead of silently continuing. Preserve explicit user stop boundaries: if the user says to stop before order/post/purchase/submit, gather evidence on that page and do not click the final action.
232
+ - A successful upstream `click` is not proof that the web app handled the event or changed state. For top-level non-Electron clicks, the wrapper may fail the tool with `details.clickDispatch` and a `Click dispatch diagnostic` line when upstream reported success but no trusted DOM event reached the target; use the suggested `inspect-click-dispatch-miss` / `retry-click-after-dispatch-miss` next actions instead of assuming the click mutated the page. When the task depends on a mutation, follow `inspect-after-mutation` / `pageChangeSummary` evidence with a wait, URL/text check, or fresh snapshot before trusting the result; if the target still did not change, retry with a current visible ref or stable selector and report the workflow issue instead of silently continuing. Preserve explicit user stop boundaries: if the user says to stop before order/post/purchase/submit, gather evidence on that page and do not click the final action. The wrapper now blocks likely final order/submit clicks under such prompts and reports `details.promptGuard` rather than trusting the model to self-police.
226
233
  - If a **top-level** `click` succeeds (unified command `click`, not a `batch` step), upstream reports `data.clicked`, and the tab URL is unchanged under the same normalization as ref preflight (fragment-insensitive), the wrapper may take one extra `snapshot -i` and add `Possible overlay blockers` with `details.overlayBlockers` (`candidates`, `summary`, optional `snapshot` refresh for refs) plus session-aware `inspect-overlay-state` / bounded `try-overlay-blocker-candidate-*` next actions when that snapshot shows strong modal context (`dialog` / `alertdialog`) and close/dismiss-like controls. Page-wide words like privacy, sign in, or banner alone do not trigger this diagnostic. The unchanged-URL check uses `details.navigationSummary`, which is populated with one read-only `eval` summary when the click JSON omits **both** string `data.url` and `data.title`; if upstream already includes either, overlay diagnostics are skipped here. Also skipped when tab correction or about-blank recovery already ran on that result.
227
234
  - If `get text <selector>` reads a non-ref CSS selector with multiple matches or a hidden first match while visible matches exist, including successful `batch` steps, the wrapper may add `Selector text visibility warning`, `details.selectorTextVisibility` (plus `selectorTextVisibilityAll` for multiple batched warnings), and `inspect-visible-text-candidates` next actions; the warning names the matching `details.nextActions` id. Prefer a visible `@ref`, a scoped selector, or a targeted `eval --stdin` over hidden tab content.
228
235
  - In attached Electron sessions, broad selectors such as `body`, `html`, `main`, or `[role=application]` may read the whole app shell. The wrapper may add `Broad Electron get text selector warning`, `details.electronGetTextScopeWarning`, and `snapshot-for-electron-text-scope`; prefer `snapshot -i`, a current `@ref`, or a narrower panel selector.
@@ -277,7 +284,7 @@ For desktop Electron apps, use top-level `electron` to avoid hand-building the d
277
284
  { "electron": { "action": "cleanup", "launchId": "electron-…" } }
278
285
  ```
279
286
 
280
- `electron.probe.timeoutMs` bounds each underlying read subprocess when dense desktop apps need a shorter or longer probe budget (omit for the normal tool subprocess default). `electron.cleanup.timeoutMs` caps upstream `close` plus host profile/process teardown and defaults to the implicit session close budget unless overridden. `electron.status.timeoutMs` only tightens managed-session title/url reads used for mismatch checks. Pass `electron.probe.launchId` when you want the probe tied to a wrapper-tracked launch instead of only the current managed session. Launch/status/probe results show both `launchId` (for status/cleanup/probe) and `sessionName` (for browser `snapshot`/`tab` commands); if the managed session drifts to `about:blank` while wrapper status still sees a live renderer, Electron-specific mismatch warnings and `status`/`probe`/`reattach`/`snapshot` next actions replace generic tab guidance. If the app process/debug port dies after a successful-looking mutation, the wrapper reports `details.electronPostCommandHealth` and fails with `tab-drift` instead of quietly continuing on `about:blank`. Launch timeouts expose `details.electron.failure.diagnostics` for PID, profile, DevToolsActivePort, and timing evidence.
287
+ `electron.probe.timeoutMs` bounds each underlying read subprocess when dense desktop apps need a shorter or longer probe budget (omit for the normal tool subprocess default). `electron.cleanup.timeoutMs` caps upstream `close` plus host profile/process teardown and defaults to the implicit session close budget unless overridden; if the managed-session close step succeeds but host cleanup is partial, later default browser calls still rotate away from that closed wrapper-managed session. `electron.status.timeoutMs` only tightens managed-session title/url reads used for mismatch checks. Pass `electron.probe.launchId` when you want the probe tied to a wrapper-tracked launch instead of only the current managed session. Launch/status/probe results show both `launchId` (for status/cleanup/probe) and `sessionName` (for browser `snapshot`/`tab` commands); if the managed session drifts to `about:blank` while wrapper status still sees a live renderer, Electron-specific mismatch warnings and `status`/`probe`/`reattach`/`snapshot` next actions replace generic tab guidance. `/reload` preserves the current branch-visible active Electron launch and its isolated temp `userDataDir` for continuity, and cleans off-branch owned Electron launches; if cleanup is partial and skips or fails profile removal, the generic temp sweep preserves that `userDataDir` across reload, quit, later temp cleanup, process exit, and stale temp-root pruning after restart. If the app process/debug port dies after a successful-looking mutation, the wrapper reports `details.electronPostCommandHealth` and fails with `tab-drift` instead of quietly continuing on `about:blank`. Launch timeouts expose `details.electron.failure.diagnostics` for PID, profile, DevToolsActivePort, and timing evidence.
281
288
 
282
289
  `launch.handoff` still defaults to `"snapshot"`; it retries briefly when the first Electron snapshot has no refs. Use `handoff: "tabs"` as a safer diagnostic starting point when you only need target discovery and do not want interactive refs captured yet, or `handoff: "connect"` when you want attach-only and will run your own `snapshot -i` / tab commands next. For Electron quick inputs that rerender in place, a successful `fill` may include `details.fillVerification` if `get value` still disagrees; re-snapshot and use focus plus keyboard typing before submitting.
283
290
 
@@ -290,7 +297,7 @@ For an app you launched yourself with remote debugging enabled, use raw upstream
290
297
  { "args": ["snapshot", "-i"] }
291
298
  ```
292
299
 
293
- `connect` success means the debug endpoint accepted the session, not that an active page is ready. If a snapshot says `No active page`, the wrapper clears prior refs for that session; choose a stable `t<N>` tab and retry a condition wait or fresh `snapshot -i` before using `@e…` refs. `close` only closes the browser/CDP session; manually launched apps, their profiles, and explicit screenshots/downloads/HARs/traces/recordings remain host-owned.
300
+ `connect` success means the debug endpoint accepted the session, not that an active page is ready. If a snapshot says `No active page`, the wrapper clears prior refs for that session; choose a stable `t<N>` tab and retry a condition wait or fresh `snapshot -i` before using `@e…` refs. Close commands (`close`, `quit`, or `exit`) only close the browser/CDP session; manually launched apps, their profiles, and explicit screenshots/downloads/HARs/traces/recordings remain host-owned.
294
301
 
295
302
  After either path, use `qa: { "attached": true, ... }` for a current-session smoke check without opening a URL. Prefer condition waits (`wait --text`, `wait --url`, `wait --fn`, `wait --load <state>`, `wait --download`), `qa.attached`, `electron.probe` / `electron.status`, `tab list` → `tab t<N>`, fresh snapshots, or screenshots over blind sleeps. Keep fixed waits below the wrapper IPC budget: `wait 30000` is intentionally blocked, and a result like `"waited":"timeout"` only proves elapsed time.
296
303
 
@@ -335,9 +342,9 @@ For asynchronous exports, click first and then wait for the download:
335
342
 
336
343
  When a user gives exact artifact paths for screenshots, recordings, downloads, PDFs, traces, or HAR files, use those paths or explicitly report why the artifact was unavailable; do not silently substitute a different path in the final report. With upstream `agent-browser 0.27.0`, treat `details.savedFilePath` as upstream-reported metadata and confirm `details.artifacts[].exists` before relying on the requested `wait --download <path>` file being present on disk.
337
344
 
338
- For evidence-only screenshots or QA captures, branch on `details.artifactVerification` and `details.artifacts` before reporting PASS/FAIL; inline image attachments are optional when size limits allow—do not require vision review unless the user asked for visual inspection.
345
+ For evidence-only screenshots or QA captures, branch on `details.artifactVerification` and `details.artifacts` before reporting PASS/FAIL; inline image attachments are optional when size limits allow—do not require vision review unless the user asked for visual inspection. If the latest prompt names exact required artifact paths, browser close can be blocked with `details.promptGuard` until those artifacts are saved and verified.
339
346
 
340
- Artifact cleanup is host-owned, not a browser command. `close` shuts down the browser session but does **not** delete explicit screenshots, downloads, PDFs, traces, HAR files, or recordings saved to paths you chose. When the session’s non-empty `details.artifactManifest` is in scope, a successful `close` appends an `Artifact lifecycle` note and sets `details.artifactCleanup` with the same retention summary as `details.artifactRetentionSummary`, a fixed `note` about host-owned cleanup, and `explicitArtifactPaths`: up to ten distinct paths from manifest rows whose `storageScope` is `explicit-path` (this list can be empty if the recent window only holds spills or other non-explicit inventory). Remove any listed paths with normal file tools after inspection.
347
+ Artifact cleanup is host-owned, not a browser command. Close commands (`close`, `quit`, or `exit`) shut down the browser session but do **not** delete explicit screenshots, downloads, PDFs, traces, HAR files, or recordings saved to paths you chose. When the session’s non-empty `details.artifactManifest` is in scope, a successful close command appends an `Artifact lifecycle` note and sets `details.artifactCleanup` with the same retention summary as `details.artifactRetentionSummary`, a fixed `note` about host-owned cleanup, and `explicitArtifactPaths`: up to ten distinct paths from manifest rows whose `storageScope` is `explicit-path` (this list can be empty if the recent window only holds spills or other non-explicit inventory). Remove any listed paths with normal file tools after inspection.
341
348
 
342
349
  Start a fresh profiled browser after the implicit public-browsing session already exists:
343
350
 
@@ -345,7 +352,7 @@ Start a fresh profiled browser after the implicit public-browsing session alread
345
352
  { "args": ["--profile", "Default", "open", "https://example.com/account"], "sessionMode": "fresh" }
346
353
  ```
347
354
 
348
- After a successful unnamed fresh launch, later default `sessionMode: "auto"` calls follow that browser automatically. If the fresh launch fails or times out, `details.managedSessionOutcome` records whether the previous managed session was preserved or the attempted fresh session was abandoned before any managed session became current; a `Managed session outcome: …` line is appended only when the failing call used `sessionMode: "fresh"`.
355
+ After a successful unnamed fresh launch, later default `sessionMode: "auto"` calls follow that browser automatically. If the fresh launch fails or times out, `details.managedSessionOutcome` records whether the previous managed session was preserved or the attempted fresh session was abandoned before any managed session became current; a `Managed session outcome: …` line is appended only when the failing call used `sessionMode: "fresh"`. If you explicitly close the current wrapper-managed session with `--session <name> close`, later default auto calls rotate to a new wrapper-generated session instead of reusing that closed name, and repeated closes keep reserving fresh names across resume/branch restore.
349
356
 
350
357
  ## Authenticated/profile workflows
351
358
 
@@ -356,7 +363,7 @@ Use these rules:
356
363
  - Use public/temp profiles for tests and examples.
357
364
  - Use `sessionMode: "fresh"` when switching from public browsing to `--profile`, `--session-name`, `--cdp`, `--state`, `--auto-connect`, `--init-script`, `--enable`, `-p` / `--provider`, or iOS `--device`.
358
365
  - Use `--session` when you want to manage a live upstream session name yourself.
359
- - Do not treat `--session` as persisted auth or tab restore after `close`; use `--profile`, `--session-name`, or `--state` for persistence.
366
+ - Do not treat `--session` as persisted auth or tab restore after `close`, `quit`, or `exit`; use `--profile`, `--session-name`, or `--state` for persistence.
360
367
  - Prefer page actions and storage checks over cookie dumps. `cookies get` can expose real profile cookies.
361
368
  - Prefer `auth save --password-stdin` over putting passwords in `args`; the wrapper only accepts caller `stdin` for `batch`, `eval --stdin`, and `auth save --password-stdin` (top-level `job` and `qa` compile to `batch` and supply their own stdin).
362
369
  - Use `state save <path>` / `state load <path>` for portable test state. `state save` is reported as a file artifact with verification metadata; `state load` may mention a path but is not treated as a newly saved artifact.
@@ -495,7 +502,10 @@ The upstream browser engine remains [`agent-browser`](https://agent-browser.dev/
495
502
  - Published pre-1.0 package.
496
503
  - Targets the current locally installed upstream `agent-browser` version only.
497
504
  - Does not bundle `agent-browser`; users install it separately.
498
- - Does not provide a human browser UI inside Pi; the primary UX is agent-invoked tool calls.
505
+ - Does not provide a human browser UI inside Pi; the primary UX is agent-invoked tool calls. `--headed` asks upstream to show a browser window, but the wrapper cannot yet prove that the window is visible on the user's desktop.
506
+ - Localhost means the browser host's loopback, not necessarily the shell/Pi host. If `http://localhost:<port>` or `http://127.0.0.1:<port>` fails with errors such as `ERR_EMPTY_RESPONSE`, use a host-reachable address when available or a `file://` URL for static fixtures, then verify with `snapshot -i` or an explicit screenshot.
507
+ - `file://` pages are useful as a static fallback, but they can behave differently from HTTP pages for MIME types, CORS, storage, and script/debugger behavior. If `eval --stdin` returns `null` or otherwise cannot verify a `file://` page, treat that as inconclusive and use screenshot/snapshot evidence or move the fixture to reachable HTTP.
508
+ - A successful upstream `click` is not proof that the app handled the event. For state-changing flows, verify with a fresh snapshot, text/URL assertion, screenshot, or `pageChangeSummary` before reporting success.
499
509
  - Real authenticated profile use is powerful but sensitive. Treat profile and cookie access as user-approved, task-specific behavior.
500
510
  - Wrapper tab/session recovery is best effort around observed upstream behavior, not a replacement for explicit profile/session design.
501
511
 
@@ -525,9 +535,9 @@ Configured-source lifecycle validation:
525
535
  npm run verify -- lifecycle
526
536
  ```
527
537
 
528
- The harness defaults to Pi model `zai/glm-5.1` and **180000 ms** per-step tmux waits; pass `--model <id>` and/or `--timeout-ms <ms>` after `lifecycle` when you need different settings (see [Configured-source lifecycle validation](docs/RELEASE.md#configured-source-lifecycle-validation) in `docs/RELEASE.md`).
538
+ The harness defaults to Pi model `zai/glm-5.1` and **180000 ms** per-step tmux waits; pass `--model <id>` and/or `--timeout-ms <ms>` after `lifecycle` when you need different settings (see [Configured-source lifecycle validation](docs/RELEASE.md#configured-source-lifecycle-validation) in `docs/RELEASE.md`). It launches Pi 0.76 with a deterministic `--session-id`, drives `/reload`, closes Pi, relaunches the exact same session, asserts the JSONL header id, and checks managed-session continuity, persisted spill reachability, and real Pi `tool_result` failure-patch behavior.
529
539
 
530
- Use lifecycle validation when testing `/reload`, full restart, `/resume`, managed-session continuity, or persisted artifact behavior. Maintainers must run the same harness before every publish; see [Pre-release checks](docs/RELEASE.md#pre-release-checks).
540
+ Use lifecycle validation when testing `/reload`, exact-session relaunch, `/resume`, managed-session continuity, or persisted artifact behavior. Branch-backed state and `session_tree` cleanup ownership are covered by focused extension harness tests. Maintainers must run the lifecycle harness before every publish; see [Pre-release checks](docs/RELEASE.md#pre-release-checks).
531
541
 
532
542
  Installed-package validation after publish:
533
543
 
@@ -78,14 +78,14 @@ The published package should load from the `pi` manifest in `package.json`.
78
78
  Local checkout validation has two intentional modes:
79
79
 
80
80
  - **Quick isolated mode:** use explicit CLI loading such as `pi --no-extensions -e .` from the repository root. This bypasses Pi settings and extension discovery, avoids duplicate `agent_browser` registrations when another source is installed globally, and is the right mode for checkout smoke tests.
81
- - **Configured-source lifecycle mode:** configure exactly one active checkout or package source in Pi settings and launch plain `pi`. This is the right mode for validating `/reload`, restart, and `/resume` behavior because those lifecycle checks exercise discovered/configured resources. Before shipping, maintainers also run `npm run verify -- lifecycle` (same semantics under automation) plus the live-site checks in [`RELEASE.md`](RELEASE.md#pre-release-checks); `npm publish` enforces `npm run verify -- release` via `prepublishOnly` unless scripts are skipped.
81
+ - **Configured-source lifecycle mode:** configure exactly one active checkout or package source in Pi settings and launch plain `pi`. This is the right mode for validating `/reload` and exact-session relaunch because those lifecycle checks exercise discovered/configured resources. Focused extension harness tests validate branch-backed `session_tree` rehydration and cleanup ownership. Before shipping, maintainers also run `npm run verify -- lifecycle` (same semantics under automation, using Pi 0.76 `--session-id` to reopen the exact JSONL session) plus the live-site checks in [`RELEASE.md`](RELEASE.md#pre-release-checks); `npm publish` enforces `npm run verify -- release` via `prepublishOnly` unless scripts are skipped.
82
82
 
83
83
  The repo should not add a repo-local `.pi/extensions/` autoload shim as the documented checkout path.
84
84
 
85
85
  Why:
86
86
  - avoids duplicate `agent_browser` registrations when the package is also installed globally
87
87
  - keeps the product contract centered on the package manifest instead of repo-local autoload wiring
88
- - keeps reload and resume validation tied to Pi's configured-source lifecycle instead of an isolated quick-test path
88
+ - keeps reload and exact-session relaunch validation tied to Pi's configured-source lifecycle instead of an isolated quick-test path, while `session_tree` state changes stay covered by focused extension harness tests
89
89
  - keeps the published tarball focused on the package manifest, extension code, canonical docs, and license
90
90
 
91
91
  The published package should exclude agent-only and superseded repo materials such as `AGENTS.md`, `docs/v1-tool-contract.md`, `docs/native-integration-design.md`, and other internal planning notes.
@@ -113,26 +113,32 @@ V1 ownership rule:
113
113
  - implicit auto-generated sessions are extension-managed convenience sessions
114
114
  - unnamed `sessionMode: "fresh"` launches rotate that extension-managed session to a new upstream browser
115
115
  - explicit/user-managed sessions are not auto-managed by default
116
- - extension-managed sessions should be reusable during an active `pi` session and across `/reload` / `/resume`, while still being cleaned up predictably
116
+ - extension-managed sessions should be reusable during an active `pi` session and across `/reload`, exact-session relaunch, `/resume`, and Pi branch-tree transitions, while still being cleaned up predictably
117
117
 
118
118
  Practical policy:
119
- - preserve the current extension-managed session across `/reload` and resumable session transitions so persisted sessions can keep following the live browser after `/reload` or `/resume`
119
+ - preserve the current branch-visible extension-managed session across `/reload`, exact-session relaunch, `/resume`, and Pi 0.76 `session_tree` branch transitions so persisted sessions can keep following the live browser after lifecycle changes
120
120
  - close the active extension-managed session when the originating `pi` process quits, while leaving explicit caller-provided sessions alone
121
121
  - set an idle timeout on extension-managed sessions as a backstop for abnormal exits or cleanup failures
122
122
  - clean up process-private temp spill artifacts on shutdown, but keep persisted-session snapshot spill files in a private session-scoped artifact directory with a bounded per-session budget so `details.fullOutputPath` stays usable after reload/resume without unbounded growth
123
- - keep explicit screenshots, downloads, PDFs, traces, HAR captures, and recordings written to caller-chosen paths on disk after a successful upstream `close`; when the bounded `details.artifactManifest` has entries, successful `close` also surfaces `details.artifactCleanup` and an `Artifact lifecycle` note (including up to ten distinct `explicit-path` manifest paths when present) so operators remove files with normal host tools—the native tool does not delete arbitrary user paths (`extensions/agent-browser/index.ts`, `getArtifactCleanupGuidance`); contract in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details), checklist `RQ-0079` in [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md)
124
- - reconstruct the current extension-managed session from persisted tool details on resume/reload so later default calls keep following the active managed browser
123
+ - keep explicit screenshots, downloads, PDFs, traces, HAR captures, and recordings written to caller-chosen paths on disk after a successful upstream close command (`close`, `quit`, or `exit`); when the bounded `details.artifactManifest` has entries, successful close commands also surface `details.artifactCleanup` and an `Artifact lifecycle` note (including up to ten distinct `explicit-path` manifest paths when present) so operators remove files with normal host tools—the native tool does not delete arbitrary user paths (`extensions/agent-browser/lib/orchestration/browser-run/diagnostics.ts`, `getArtifactCleanupGuidance`); contract in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details), checklist `RQ-0079` in [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md)
124
+ - reconstruct the current branch-visible extension-managed session, page-scoped refs, artifact manifest, and Electron launch records from the active transcript branch on `session_start` and `session_tree` so later default calls keep following the active managed browser after resume/reload or branch switching; restore also honors successful explicit `--session <wrapper-owned> close` rows and `electron.cleanup` managed-session steps so closed wrapper-owned sessions are not resurrected
125
+ - keep process-owned cleanup registries for extension-managed sessions and wrapper-launched Electron records separate from the current branch-visible view; `session_tree` restore and wrapper-owned browser commands are serialized with managed-session work, while independent caller-owned explicit-session commands keep their parallel tab-target behavior but use a branch-state generation guard so stale completions cannot overwrite newer branch-visible managed/artifact state after a branch switch; branch switches still must not drop resources the current Pi process owns and must keep fresh-session allocation monotonic
126
+ - when a successful close targets the current extension-managed session, including an explicit `--session <current> close` or an `electron.cleanup` managed-session step, clear page/ref state, mark that session inactive, untrack cleanup ownership, and rotate the next default auto call to a fresh wrapper-generated session name rather than reusing the closed name
127
+ - on non-quit shutdown such as `/reload`, close off-branch owned managed sessions and off-branch owned Electron launches before clearing process-local ownership, but preserve the current branch-visible active managed session and Electron launch plus that launch's isolated `userDataDir` so reload continuity still works from the active transcript branch
128
+ - expose still-owned off-branch Electron launch records to `electron.status { launchId }`, `electron.status { all: true }`, `electron.probe { launchId }`, and `electron.cleanup`, while leaving default `electron.probe` scoped to the current managed session
125
129
  - if an unnamed fresh launch replaces an active extension-managed session, best-effort close the old managed session after the switch succeeds
126
130
  - leave explicit caller-provided `--session` choices alone unless the caller closes them explicitly
127
131
  - after profiled `open` / `goto` / `navigate` calls, verify the active tab still matches the returned page URL and best-effort switch back when restored profile tabs steal focus
128
132
  - once the wrapper observes tab-drift risk for a session (profile restore correction, overlapping stale opens, or restored session state), later active-tab commands may synthesize a tiny upstream `batch` that re-selects that tab and then runs the requested command in the same upstream invocation; routine same-session commands avoid `tab list` preflights to reduce probes that can perturb upstream click behavior
129
133
  - for sessions with observed tab-drift risk, after a successful command on a known tab target, the wrapper may best-effort restore that same target again if restored/background tabs steal focus after the command returns; routine same-session commands skip this post-command `tab list` probe
130
- - keep a per-session `refSnapshot` aligned with the last successful `snapshot` (including refs merged from a successful `batch` by taking the last successful `snapshot` step in batch result order): restore it from persisted tool `details` when reloading or resuming, drop it on successful `close`, and refuse mutation-prone `@e…` argv before spawn when the active tab URL no longer matches the snapshot URL, when a ref id was never in that snapshot, or when `batch` stdin would reuse `@e…` on a guarded step after an earlier invalidating step without a later `snapshot` step in the same stdin array. Same-snapshot `fill @e…` rows are guarded but do not themselves set that invalidation latch, so ordinary form fills can precede a click/submit row in one batch—see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) for the agent-visible contract and failure text; typed per-session tab/ref/pinning state lives in `extensions/agent-browser/lib/session-page-state.ts` and is updated from `extensions/agent-browser/index.ts` after each tool result
134
+ - keep a per-session `refSnapshot` aligned with the last successful `snapshot` (including refs merged from a successful `batch` by taking the last successful `snapshot` step in batch result order): restore it from persisted tool `details` when reloading, resuming, or moving to a different Pi session-tree branch, store bounded ref role/name metadata from the same snapshot for wrapper-side current-ref diagnostics, drop it on successful close commands (`close`, `quit`, or `exit`), and refuse mutation-prone `@e…` argv before spawn when the active tab URL no longer matches the snapshot URL, when a ref id was never in that snapshot, or when `batch` stdin would reuse `@e…` on a guarded step after an earlier invalidating step without a later `snapshot` step in the same stdin array. Same-snapshot `fill @e…` rows are guarded but do not themselves set that invalidation latch, so ordinary form fills can precede a click/submit row in one batch—see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) for the agent-visible contract and failure text; typed per-session tab/ref/pinning state lives in `extensions/agent-browser/lib/session-page-state.ts` and is updated from `extensions/agent-browser/index.ts` after each tool result
135
+ - for top-level non-Electron `click` commands, install a bounded in-page event probe before upstream runs; if upstream reports success but no trusted pointer/mouse/click event reached the target, fail the tool and report `details.clickDispatch` with explicit retry/inspect next actions (the wrapper does not replay clicks in-page). The probe is intentionally skipped for `batch`/`job`/`qa` click steps. For `@e…` targets it uses the stored `refSnapshot.refs` metadata above instead of taking a fresh pre-click snapshot that could recycle upstream refs
136
+ - derive narrow prompt guards from the latest user prompt for safety/evidence failures that should not rely on model self-policing: explicit stop-before-order/submit boundaries block likely final click targets before upstream runs, and exact required screenshot paths block browser close until the artifact manifest verifies those paths. These guards are bounded preflight policy (`details.promptGuard`, `failureCategory: "policy-blocked"`), not a reusable browser recipe layer
131
137
  - after successful `get text` on a non-ref CSS selector, optionally issue one read-only `eval --stdin` probe per qualifying selector when multiple DOM matches or a hidden first match with visible peers could misread tabbed or off-screen content; merge `details.selectorTextVisibility` / `selectorTextVisibilityAll`, visible warning lines, and `inspect-visible-text-candidates*` next actions as documented in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) and `RQ-0074` in [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md)
132
138
  - for local Unix launches, set a short private socket directory so extension-generated session names do not fail on the upstream Unix socket-path length limit
133
139
  - keep wrapper-spawned upstream CLI calls inside the upstream IPC budget by clamping `AGENT_BROWSER_DEFAULT_TIMEOUT` to 25 seconds and stopping a stuck child process before the upstream 30-second read-timeout retry loop begins
134
140
 
135
- This is primarily about ownership clarity and avoiding surprise, not adding a heavy safety wrapper. If the extension invented the session, the extension should own its lifecycle without breaking reload/resume semantics. If the caller explicitly chose the upstream session model, the extension should stay out of the way.
141
+ This is primarily about ownership clarity and avoiding surprise, not adding a heavy safety wrapper. If the extension invented the session, the extension should own its lifecycle without breaking reload, resume, or branch-tree semantics. If the caller explicitly chose the upstream session model, the extension should stay out of the way.
136
142
 
137
143
  ### Launch flags
138
144
 
@@ -147,17 +153,17 @@ If the implicit session is already active and one of those startup-scoped flags
147
153
 
148
154
  That failure should include a structured recovery hint pointing to `sessionMode: "fresh"` as the first-line fix, while still allowing an explicit `--session` when the caller wants to name the new upstream session.
149
155
 
150
- Implementation detail lives in `extensions/agent-browser/lib/runtime.ts` (`findCommandStartIndex`, `VALUE_FLAGS`, `getStartupScopedFlags`, `buildExecutionPlan`):
156
+ Implementation detail lives in `extensions/agent-browser/lib/argv-descriptor.ts` and `extensions/agent-browser/lib/argv-grammar.ts` (command discovery, `VALUE_FLAGS`, `parseArgvDescriptor`) plus `extensions/agent-browser/lib/runtime.ts` (`getStartupScopedFlags`, `buildExecutionPlan`):
151
157
 
152
- - **Command discovery:** Leading argv is scanned with a value-taking allowlist so tokens such as `--timeout` on `wait`, `--resource-type` on `network route`, or `--curl` / `--domain` on `cookies set` consume their values before the upstream command word is identified. When upstream adds new global flags that take values ahead of the command, extend that allowlist; otherwise the wrapper can mis-classify the command or mis-validate value flags. A smaller set of global boolean flags may be followed by an optional `true`/`false` literal; when present, that literal is consumed as the flag value before command discovery continues.
153
- - **`--state` disambiguation:** Persisted browser `--state` before the command participates in launch-scoped validation and tab-correction hints. The same flag spelling after a `wait` command (for example `wait @ref --state hidden`) is a wait predicate, not a launch flag, and is excluded from startup-scoped detection so it does not spuriously require `sessionMode: "fresh"` while an implicit session is active.
158
+ - **Command discovery:** Leading argv is scanned with a value-taking allowlist so known global flags and documented command flags consume their values before the upstream command word is identified. Missing-value prevalidation is intentionally limited to upstream global value flags; command-scoped flags and literal text are left to upstream parsing so values like `fill #field --password` are not rejected by wrapper heuristics before the CLI sees them. When upstream adds new global flags that take values ahead of the command, extend both the command-discovery and prevalidation allowlists; when it adds command-specific flags, extend only command discovery/redaction as needed. A smaller set of global boolean flags may be followed by an optional `true`/`false` literal; when present, that literal is consumed as the flag value before command discovery continues.
159
+ - **`--state` disambiguation:** Persisted browser `--state` before the command participates in launch-scoped validation and tab-correction hints. The same flag spelling after a `wait` command is excluded from startup-scoped detection so upstream help examples such as `wait @ref --state hidden` do not spuriously require `sessionMode: "fresh"` while an implicit session is active. As of upstream `agent-browser 0.27.0`, the parser does not implement those `wait --state` examples as distinct wait modes, so agent-facing docs recommend `wait --fn` predicates for disappearance checks instead.
154
160
  - **`--auto-connect`:** Treated as launch-scoped only when enabled (`--auto-connect` bare or `true`). `--auto-connect false` is ignored for startup-scoped blocking so disabled attach hints do not force a fresh launch.
155
161
 
156
- **Stateless inspection and read-only skills:** Plain-text global help and version probes (`--help`, `-h`, `--version`, `-V`) must never allocate or bind the extension-managed session. The same session-ownership rules apply to read-only upstream `skills list`, `skills get …`, and `skills path …`: those calls still run with `--json` for machine-readable output, but the planner treats them like other stateless inspection work so an agent can load bundled skill text without pinning a browser session or consuming the implicit session slot before a real `open`. Planning and allowlisting live in `extensions/agent-browser/lib/runtime.ts` (`isPlainTextInspectionArgs`, `isStatelessInspectionCommand`, `buildExecutionPlan`).
162
+ **Sessionless inspection and local commands:** Plain-text global help and version probes (`--help`, `-h`, `--version`, `-V`) must never allocate or bind the extension-managed session. The same session-ownership rule applies to read-only upstream `skills list`, `skills get …`, and `skills path …`, local auth profile management (`auth save/list/show/delete/remove`), plus local/setup surfaces such as `profiles`, `dashboard start/stop`, `device list`, `doctor`, `install`, `upgrade`, `session list`, and targeted/all local saved-state maintenance (`state list/show`, `state clear --all`, `state clear -a`, `state clear <session-name>`, `state clean --older-than <days>`, `state rename`). Non-plain-text sessionless commands still run with `--json` for machine-readable output, but the planner does not prepend the implicit managed `--session`, so an agent can inspect local capabilities or start/stop the standalone dashboard without consuming the implicit session slot before a real `open`. Browser-backed, context-dependent, or incomplete commands such as root `session`, untargeted `state clear`, bare `state clean`, `auth login`, `state save`, and `state load` keep normal managed-session injection. Command-shape allowlisting lives in `extensions/agent-browser/lib/command-policy.ts` (`needsManagedSession`), while `extensions/agent-browser/lib/runtime.ts` (`isPlainTextInspectionArgs`, `buildExecutionPlan`) applies that decision to execution planning.
157
163
 
158
164
  A successful unnamed `sessionMode: "fresh"` launch should become the new extension-managed session so later default calls follow that browser instead of silently snapping back to the older managed session.
159
165
 
160
- When a managed implicit or fresh `--session` plan reaches process execution, `details.managedSessionOutcome` summarizes the managed-session transition: on **success**, statuses such as `created`, `replaced`, `unchanged`, or `closed` describe what became current (including successful `close`); on **failure** (launch error, timeout, missing binary, **`qa`** reclassification after a nominally successful batch, failed `close`, and similar), `preserved` vs `abandoned` captures whether a prior managed session stayed current or no managed session ended up active, plus related names and booleans. Failing calls that used `sessionMode: "fresh"` also append a short `Managed session outcome: …` line to model-visible text so the next default `sessionMode: "auto"` hop is obvious; `"auto"` failures may still populate the struct without that extra line. Implementation and field semantics live in `extensions/agent-browser/index.ts` (`buildManagedSessionOutcome`, `formatManagedSessionOutcomeText`); agent contract in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); checklist row `RQ-0077` in [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md).
166
+ When a managed implicit or fresh `--session` plan reaches process execution, `details.managedSessionOutcome` summarizes the managed-session transition: on **success**, statuses such as `created`, `replaced`, `unchanged`, or `closed` describe what became current (including successful close commands: `close`, `quit`, or `exit`); on **failure** (launch error, timeout, missing binary, **`qa`** reclassification after a nominally successful batch, failed close command, and similar), `preserved` vs `abandoned` captures whether a prior managed session stayed current or no managed session ended up active, plus related names and booleans. Failing calls that used `sessionMode: "fresh"` also append a short `Managed session outcome: …` line to model-visible text so the next default `sessionMode: "auto"` hop is obvious; `"auto"` failures may still populate the struct without that extra line. Implementation and field semantics live in `extensions/agent-browser/index.ts` (`buildManagedSessionOutcome`, `formatManagedSessionOutcomeText`); agent contract in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); checklist row `RQ-0077` in [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md).
161
167
 
162
168
  ## Preferring the native tool
163
169