npm - pi-agent-browser-native - Versions diffs - 0.2.12 → 0.2.14 - Mend

pi-agent-browser-native 0.2.12 → 0.2.14

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (22) hide show

package/CHANGELOG.md +21 -0
package/README.md +87 -27
package/docs/ARCHITECTURE.md +9 -3
package/docs/COMMAND_REFERENCE.md +383 -151
package/docs/RELEASE.md +81 -26
package/docs/REQUIREMENTS.md +10 -4
package/docs/TOOL_CONTRACT.md +51 -11
package/extensions/agent-browser/index.ts +847 -344
package/extensions/agent-browser/lib/parsing.ts +20 -0
package/extensions/agent-browser/lib/playbook.ts +79 -0
package/extensions/agent-browser/lib/process.ts +56 -8
package/extensions/agent-browser/lib/results/confirmation.ts +76 -0
package/extensions/agent-browser/lib/results/envelope.ts +42 -5
package/extensions/agent-browser/lib/results/presentation.ts +907 -50
package/extensions/agent-browser/lib/results/shared.ts +166 -15
package/extensions/agent-browser/lib/results/snapshot.ts +69 -7
package/extensions/agent-browser/lib/results.ts +7 -1
package/extensions/agent-browser/lib/runtime.ts +204 -15
package/extensions/agent-browser/lib/temp.ts +131 -23
package/package.json +14 -8
package/scripts/agent-browser-capability-baseline.mjs +104 -0
package/scripts/doctor.mjs +420 -0

package/docs/RELEASE.md CHANGED Viewed

@@ -16,24 +16,45 @@ From the repository root:
 ```bash
 npm install
-npm run verify:release
+npm run doctor
+npm run verify -- release
 ```
-`npm run verify:release` runs:
+`npm run doctor` is a read-only first-run diagnostic for PATH, targeted upstream version, and duplicate package/checkout source conflicts. It does not replace upstream `agent-browser doctor` for browser runtime health and does not edit Pi settings.
-1. `npm run verify` for TypeScript and unit coverage
-2. `npm run verify:package` for package-content validation via `npm pack --json --dry-run`
+`npm run verify -- release` runs:
+1. `npm run verify` for TypeScript, unit coverage, and command-reference drift detection
+2. `npm run verify -- package-pi`, which first validates package contents via `npm pack --json --dry-run` and then smoke-loads the packed package in Pi isolation
+The configured-source lifecycle regression harness is opt-in because it launches an interactive `pi` process under `tmux` and requires a usable local model configuration:
+```bash
+npm run verify -- lifecycle
+```
+Use `npm run verify -- lifecycle --keep-artifacts` when debugging failures.
 ## What package verification checks
-`npm run verify:package` confirms that:
+`npm run verify -- package` confirms that:
 - no repo-local `.pi/extensions/agent-browser.ts` autoload shim is present
 - `LICENSE` exists in the repo and the packed tarball
 - canonical published docs are present
+- the package-level doctor command and capability baseline are present
 - extension source files are present, including the split result-rendering modules required by the published facade
 - agent-only and superseded docs are absent from the tarball
+`npm run verify -- package-pi` runs the same package-content checks and additionally confirms that:
+- the packed package can be loaded through Pi SDK resource loading with the same isolation principle as `pi --no-extensions -e <package-source>`
+- exactly one `agent_browser` tool is registered
+- the registered `agent_browser` source resolves inside the extracted packed package path, not the working checkout
+- the packaged `agent_browser` tool can be executed through Pi's loaded native tool definition with a deterministic fake upstream `agent-browser --version` binary
+The packaged execution smoke intentionally uses a temporary fake `agent-browser` binary and the `--version` inspection path. It proves first invocation of the packaged Pi tool without launching a real browser. Real browser coverage remains part of local checkout validation and post-publish install validation.
 Current forbidden packed files include:
 - `AGENTS.md`
@@ -46,20 +67,47 @@ Current forbidden packed files include:
 For a full packed file listing:
 ```bash
-node scripts/verify-package.mjs --list-files
+npm run verify -- package --list-files
 ```
 ## Local development validation
-Before publishing, also validate the explicit local-checkout path:
+Before publishing, validate both local-checkout modes without mixing their assumptions.
+### Quick isolated checkout smoke test
 1. Install `agent-browser` separately.
-2. Make sure Pi has only one active source for this extension during checkout validation.
-3. Launch `pi --no-extensions -e .` from this repository root.
-4. Confirm the checkout extension loads from `extensions/agent-browser/index.ts`.
-5. Run a smoke prompt that exercises `agent_browser`.
-6. Validate managed-session continuity with both `/reload` and a full restart + `/resume`.
-7. Re-check local extension-side docs (`README.md`, `docs/COMMAND_REFERENCE.md`, and prompt guidance) if the upstream `agent-browser` version/help surface changed.
+2. Launch `pi --no-extensions -e .` from this repository root.
+3. Confirm the checkout extension loads from `extensions/agent-browser/index.ts`.
+4. Run a smoke prompt that exercises `agent_browser`.
+5. Restart the `pi` process after extension edits; Pi settings and `/reload` are not the validation target in this isolated mode.
+### Configured-source lifecycle validation
+Prefer the automated harness for deterministic configured-source lifecycle regression coverage:
+```bash
+npm run verify -- lifecycle
+```
+The harness creates an isolated `PI_CODING_AGENT_DIR`, writes settings with exactly one temporary configured package source, runs plain `pi` in `tmux`, puts a deterministic fake `agent-browser` first on `PATH`, and drives `/reload`, full restart, and `/resume`. It asserts same-page managed-session continuity, persisted `details.fullOutputPath` reachability after resume, and updated extension-code pickup through a temporary sentinel command. On failure it retains transcripts/session artifacts; on success it performs best-effort cleanup. It does not replace occasional real-browser manual smoke testing.
+Manual validation remains useful for release confidence and installed-package checks:
+1. Configure exactly one active source for this extension in Pi settings: this checkout path before publishing, or the installed package after publishing.
+2. Launch plain `pi` so extension discovery is active.
+3. Validate managed-session continuity with `/reload` and a full restart + `/resume`.
+4. Re-check local extension-side docs (`README.md`, `docs/COMMAND_REFERENCE.md`, and prompt guidance) if the upstream `agent-browser` version/help surface changed, then run `npm run verify -- command-reference`.
+### Real upstream contract validation
+The default `npm test` and `npm run verify` paths use fast deterministic tests and fake binaries. When a change touches upstream command planning, result presentation, managed-session behavior, or the canonical capability baseline, also run the opt-in real-upstream contract suite:
+```bash
+npm run verify -- real-upstream
+```
+This suite requires the installed `agent-browser --version` to exactly match `scripts/agent-browser-capability-baseline.mjs`. It serves fixture pages from localhost and validates real runtime output shapes for `--version`, `open`, `eval --stdin`, `snapshot -i`, `batch` stdin, `wait --download` metadata, wrapper artifact existence reporting for the requested wait-download path, and implicit managed-session reuse. The current upstream `wait --download <path>` saveAs persistence limitation is tracked at [vercel-labs/agent-browser#1300](https://github.com/vercel-labs/agent-browser/issues/1300); until it is fixed, release validation must treat `details.savedFilePath` as upstream-reported metadata and use `details.artifacts[].exists` as the filesystem truth. If the suite fails because JSON/detail keys drifted, update the wrapper behavior or refresh `test/fixtures/agent-browser-real-output-shapes.json` together with the presentation work that consumes those shapes.
 Example smoke prompt:
@@ -67,30 +115,33 @@ Example smoke prompt:
 Use the agent_browser tool to open https://react.dev and then take an interactive snapshot.
 ```
-Recommended lifecycle follow-up:
+Recommended configured-source lifecycle follow-up:
 1. Open a page with the implicit managed session and confirm the title.
 2. Run `/reload`, then ask for `snapshot -i` and confirm the same page is still active.
 3. Exit `pi`, relaunch it against the same session file or use `/resume`, then ask for `snapshot -i` again and confirm the same page is still active.
 4. Open a large page that compacts its snapshot output and confirm `details.fullOutputPath` still exists after the restart/resume flow.
 5. Trigger an oversized non-snapshot output (for example a deliberately large `eval --stdin` result) and confirm the tool prints the actual spill file path directly in content instead of only referencing a details key.
-6. Validate at least one file-download flow with `download <selector> <path>`.
+6. Validate at least one direct file-download flow with `download <selector> <path>`.
+7. Validate at least one asynchronous export flow with `click` followed by `wait --download <path>`, confirming the wait result reports `savedFilePath`/`savedFile` and checking `details.artifacts[].exists` before relying on the requested path being present on disk.
 ## Post-publish install validation
-After publishing a release, validate the package-first install path explicitly:
+After publishing a release, validate the package-first path in isolation. `npm run verify -- release` includes the deterministic fake-binary packaged execution gate, but it does not replace a real-browser installed-package smoke:
 ```bash
-pi install npm:pi-agent-browser-native@<version>
-pi -e npm:pi-agent-browser-native@<version>
+npm exec --package pi-agent-browser-native -- pi-agent-browser-doctor
+npm run verify -- release
+pi --no-extensions -e npm:pi-agent-browser-native@<version>
 ```
-For installed-package validation, make sure Pi has only one active source for this extension. The simplest safe paths are either:
+Then run the real-browser smoke prompt:
-- temporarily disable/remove the checkout path and then run plain `pi`, or
-- use an isolated ephemeral run such as `pi --no-extensions -e npm:pi-agent-browser-native@<version>`
+```text
+Use the agent_browser tool to open https://react.dev and then take an interactive snapshot.
+```
-Then confirm `pi` exposes the native `agent_browser` tool, that a basic `open` + `snapshot -i` flow works, and that `/reload` plus restart/`/resume` keep following the same implicit managed browser session.
+Only use plain `pi` for installed-package validation after temporarily disabling or removing the checkout source or any other active source for this extension from Pi settings. Then confirm `pi` exposes the native `agent_browser` tool, that a basic `open` + `snapshot -i` flow works, and that `/reload` plus restart/`/resume` keep following the same implicit managed browser session.
 ## Release notes checklist
@@ -99,7 +150,11 @@ Before publishing:
 - update `CHANGELOG.md`
 - confirm README install guidance still leads with the package-first flow
 - confirm `docs/COMMAND_REFERENCE.md` still matches the effective upstream command/help surface used by the wrapper
-- confirm the explicit local-checkout instructions still work for pre-release validation
-- rerun `npm run verify:release`
-- manually exercise `/reload` and full restart + `/resume` continuity in local checkout validation
-- publish only after the tarball contents match expectations
+- run `npm run verify -- command-reference` if the installed upstream `agent-browser` version or help surface changed
+- run `npm run doctor` and confirm any duplicate-source remediation matches the active package/checkout setup
+- run `npm run verify -- real-upstream` for upstream runtime, result-presentation, or managed-session changes
+- confirm both local-checkout modes still work for pre-release validation: isolated `pi --no-extensions -e .` smoke testing and configured-source lifecycle validation
+- rerun `npm run verify -- release`
+- run `npm run verify -- lifecycle` for opt-in configured-source `/reload` plus restart/`/resume` regression coverage
+- manually exercise real-browser `/reload` and full restart + `/resume` continuity when release risk warrants browser-level confidence beyond the fake upstream harness
+- publish only after the tarball contents and isolated packaged-extension smoke check match expectations

package/docs/REQUIREMENTS.md CHANGED Viewed

@@ -44,11 +44,13 @@ Define the product requirements and constraints for `pi-agent-browser-native`.
 ### Install priority
 - Prioritize the package install path first.
-- User-facing install docs should lead with `pi install npm:pi-agent-browser-native` and `pi -e npm:pi-agent-browser-native` once releases exist.
+- User-facing install docs should lead with `pi install npm:pi-agent-browser-native`; ephemeral package trials and validation must use `pi --no-extensions -e npm:pi-agent-browser-native[@<version>]` so configured checkout or global sources cannot duplicate `agent_browser`.
 - User-facing install docs should also include the GitHub source path `pi install https://github.com/fitchmultz/pi-agent-browser-native`.
+- Provide a read-only package-level doctor command that checks upstream `agent-browser` PATH/version and duplicate Pi package/checkout sources before first use. It must not mutate Pi settings and must remain distinct from upstream `agent-browser doctor`.
 - Keep the current local-checkout path documented as the practical pre-release and development flow.
 - Most users will install this extension globally rather than as a project-local extension.
-- Local checkout development should use explicit CLI loading such as `pi --no-extensions -e .` or `pi --no-extensions -e /absolute/path/to/pi-agent-browser-native`.
+- Local checkout smoke testing should use explicit CLI loading such as `pi --no-extensions -e .` or `pi --no-extensions -e /absolute/path/to/pi-agent-browser-native`; Pi settings are bypassed in this mode and code edits require a process restart for validation.
+- Local checkout hot-reload and resume validation should use configured-source lifecycle mode: exactly one active checkout/package source in Pi settings, launched with plain `pi`, so `/reload` exercises discovered/configured resources.
 - Do **not** rely on repo-local `.pi/extensions/` auto-discovery for this package, because it conflicts with the global installed-package path.
 ### Native-tool preference
@@ -65,14 +67,17 @@ Define the product requirements and constraints for `pi-agent-browser-native`.
 - Documents should read as complete documents, not iterative logs, unless they are explicitly meant to be iterative, such as a changelog.
 - Requirements, expectations, and durable rules from user conversations should be reflected in the appropriate docs.
 - Because direct-binary usage is commonly blocked in normal agent sessions, the repo must carry a local command reference for the effective `agent_browser` surface and keep it in sync with upstream changes.
+- Repository verification must include a lightweight command-reference drift check against the targeted installed upstream `agent-browser` version.
 - Published package contents should include the canonical user-facing docs plus `LICENSE`.
 - Published package contents should exclude agent-only and superseded docs such as `AGENTS.md`, `docs/v1-tool-contract.md`, and `docs/native-integration-design.md`.
 ### Testing guidance
 - The primary confidence path is a real `pi` session driven in `tmux`.
-- For local checkout validation, launch `pi --no-extensions -e .` from the repository root so only the checkout copy loads.
-- Validate both `/reload` and a full `pi` restart with `/resume` when changes touch managed-session continuity, reload behavior, or persisted artifact paths.
+- For quick local checkout smoke validation, launch `pi --no-extensions -e .` from the repository root so only the checkout copy loads; do not rely on Pi settings or `/reload` semantics in this isolated mode.
+- For hot-reload validation, configure exactly one active source for this extension in Pi settings and launch plain `pi`; validate `/reload` there because it exercises auto-discovered/configured resources.
+- Maintain an opt-in tmux-driven configured-source lifecycle harness that isolates Pi settings, uses exactly one configured source, exercises `/reload`, full restart, and `/resume`, and asserts managed-session continuity plus persisted artifact survival. Keep `docs/RELEASE.md` accurate about the harness behavior, cleanup, transcript retention, and limitations.
+- Validate a full `pi` restart with `/resume` when changes touch managed-session continuity, reload behavior, or persisted artifact paths.
 - Prefer full `pi` restart over `/reload` when validating extension changes beyond a quick reload smoke check.
 - Use `/resume` when needed after restart.
 - Keep testing broader than a single smoke site like `example.com`.
@@ -97,6 +102,7 @@ The design should comfortably support workflows such as:
 - User-facing docs belong in `README.md` and the canonical published files under `docs/`.
 - Agent workflow and deeper testing procedures can stay in `AGENTS.md`, but published docs must not depend on that file being present.
 - When upstream `agent-browser` changes, refresh the local command reference, prompt guidance, and other extension-side docs so agents still have a repo-readable equivalent of the blocked direct-binary help path.
+- The canonical agent-facing playbook should live in `extensions/agent-browser/lib/playbook.ts`; README, command-reference, and tool-contract fragments must be generated or checked from that source by `npm run docs -- playbook check` so prompt guidance and docs cannot drift silently.
 - Keep mitigations for legacy-skill coexistence simple; do not add extra moving parts unless observed behavior justifies them.
 - Prefer narrow, evidence-backed compatibility mitigations over broad stealth layers when a specific upstream site starts rejecting the default headless launch fingerprint.
 - Preserve the page that a profiled `open` just navigated to; if restored profile tabs steal focus during launch, the wrapper should best-effort switch back to the returned page URL before handing control back to the agent.

package/docs/TOOL_CONTRACT.md CHANGED Viewed

@@ -25,7 +25,25 @@ It also keeps the main UX where it belongs: the agent invokes the tool directly
 The tool guidance should be written for task discovery first, not wrapper implementation first. That means the description should emphasize browser use cases like web research, reading live docs, clicking, filling, screenshots, extraction, and authenticated/profile-based workflows. Low-level wrapper details like `stdin` and exact CLI args belong in the schema and guidelines, not the lead description.
-The tool also needs an operating playbook, not just a capability list. The model should not have to rediscover basics each session. Guidance should explicitly encode the normal browser workflow (`open` -> `snapshot -i` -> interact -> re-snapshot), the authenticated-content workflow (prefer `--profile Default` on the first browser call and let the implicit session carry continuity; use `--auto-connect` as a fallback when profile reuse is unavailable), and the preferred recovery path when a session opens on the wrong tab, an action changes origin unexpectedly, or an `open` call returns blocked/blank/unexpected results (`tab list` / `tab <tab-id-or-label>` / `snapshot -i` before retrying different URLs or fallback strategies). It should also discourage inventing fixed explicit session names for routine tasks, because those names leak stale browser state across otherwise unrelated `pi` sessions. For read-only browsing tasks, guidance should prefer answering from the current page state first: use the current snapshot, structured ref labels, or `eval --stdin` on the current page before navigating into media viewers, detail routes, or other new pages unless the current view lacks the needed information. For downloads, guidance should explicitly prefer `download <selector> <path>` over `click` when the goal is a file on disk. When using `eval --stdin`, scope checks and actions to the target element or route whenever possible instead of relying on broad page-wide text heuristics. When using `eval --stdin` for extraction, return the intended value instead of relying on `console.log` as the primary result channel. Because the extension blocks normal direct-binary usage in most agent sessions, the repository must also carry a local command reference that stays in sync with the effective tool surface.
+The tool also needs an operating playbook, not just a capability list. The model should not have to rediscover basics each session. The canonical agent-facing playbook lives in `extensions/agent-browser/lib/playbook.ts`; generated Markdown fragments are updated by `npm run docs -- playbook write`, and `npm run docs -- playbook check` fails when checked-in documentation drifts.
+<!-- agent-browser-playbook:start shared-guidelines -->
+<!-- Generated from extensions/agent-browser/lib/playbook.ts. Run `npm run docs -- playbook write` to update. -->
+- Standard workflow: open the page, snapshot -i, interact using current @refs from that snapshot, and re-snapshot after navigation, scrolling, rerendering, or other major DOM changes because refs can become stale.
+- When a visible text or accessible-name target should survive ref churn, prefer find locators such as role, text, label, placeholder, alt, title, or testid with the intended action instead of guessing a CSS selector.
+- Do not assume Playwright selector dialects such as text=Close or button:has-text('Close') are supported wrapper syntax unless current upstream agent-browser behavior has been verified.
+- For authenticated or user-specific content like feeds, inboxes, dashboards, and accounts, prefer --profile Default on the first browser call and let the implicit session carry continuity. Use --auto-connect only if profile-based reuse is unavailable or the task is specifically about attaching to a running debug-enabled browser.
+- Do not invent fixed explicit session names for routine tasks. Use the implicit session unless you truly need multiple isolated browser sessions in the same conversation.
+- When using --profile, --session-name, --cdp, --state, or --auto-connect, put them on the first command for that session. If you intentionally use an explicit --session, keep using that same explicit session for follow-ups.
+- If you already used the implicit session and now need launch-scoped flags like --profile, --session-name, --cdp, --state, or --auto-connect, retry with sessionMode set to fresh or pass an explicit --session for the new launch. After a successful unnamed fresh launch, later auto calls follow that new session.
+- If a session lands on the wrong page or tab, an interaction changes origin unexpectedly, or an open call returns blocked, blank, or otherwise unexpected results, use tab list / tab <tab-id-or-label> / snapshot -i to recover state before retrying different URLs or fallback strategies. Only use wait with an explicit argument like milliseconds, --load <state>, --url <matcher>, --fn <js>, or --text <matcher>.
+- For feed, timeline, or inbox reading tasks, focus on the main timeline/list region and read the first item there rather than unrelated composer or sidebar content.
+- For read-only browsing tasks, prefer extracting the answer from the current snapshot, structured ref labels, or eval --stdin on the current page before navigating away. Only click into media viewers, detail routes, or new pages when the current view does not contain the needed information.
+- For downloads, prefer download <selector> <path> when an element click should save a file. Do not rely on click alone when you need the downloaded file on disk.
+- When using eval --stdin, scope checks and actions to the target element or route whenever possible instead of relying on broad page-wide text heuristics.
+- When using eval --stdin for extraction, return the value you want instead of relying on console.log as the primary result channel.
+- Do not call --help or other exploratory inspection commands unless the user explicitly asks for them or debugging the browser integration is necessary.
+<!-- agent-browser-playbook:end shared-guidelines -->
 ## Parameters
@@ -58,7 +76,8 @@ Examples:
 - type: `string`
 - optional
-- raw stdin for commands like `eval --stdin` and `batch`
+- raw stdin for `eval --stdin` and `batch`
+- rejected before launch for any other command/stdin combination, including commands such as `click`, `snapshot`, or `open`
 Examples:
@@ -92,8 +111,18 @@ The extension should:
 - invoke `agent-browser` directly, not through a shell
 - parse JSON output into tool details
 - handle observed JSON result shapes, including the array returned by `batch --json`
-- allow plain-text fallback for inspection commands like `--help` and `--version`
-- support those inspection commands unconditionally so the tool contract stays local and predictable
+- allow plain-text fallback for native inspection calls
+- support those inspection calls unconditionally so the tool contract stays local and predictable
+<!-- agent-browser-playbook:start inspection -->
+<!-- Generated from extensions/agent-browser/lib/playbook.ts. Run `npm run docs -- playbook write` to update. -->
+Native inspection calls use the `agent_browser` tool shape, not shell-like direct-binary commands:
+- { "args": ["--help"] }
+- { "args": ["--version"] }
+These calls return plain text and stay stateless: the extension does not inject its implicit session and does not let inspection consume the managed-session slot needed for later profile, session, CDP, state, or auto-connect launches.
+<!-- agent-browser-playbook:end inspection -->
 - still describe normal browser workflows in guidance so models do not overuse inspection for routine tasks
 - surface stderr and non-zero exits clearly
 - attach images when the result points to a screenshot-like artifact
@@ -142,26 +171,33 @@ Additional structured fields can appear when relevant:
 - `batchFailure` and `batchSteps` for `batch` rendering, including mixed-success runs
 - `navigationSummary` for navigation-style commands like `click`, `back`, `forward`, and `reload`
 - `imagePath` / `imagePaths` for screenshots and batched image outputs
+- `artifacts` for upstream saved files such as screenshots, PDFs, downloads, `wait --download` files, traces, CPU profiles, completed WebM recordings, path-bearing HAR captures, and future recording output paths reported by `record start`. Each artifact includes the original saved or requested `path`, resolved `absolutePath`, `kind`, optional `mediaType`, optional `extension`, and best-effort disk metadata such as `exists` and `sizeBytes`.
+- `savedFilePath` / `savedFile` for direct `download`, `pdf`, and `wait --download` saved-file workflows; batch results preserve the same fields on the relevant `batchSteps` entry.
+- `batchSteps[].artifacts` for per-step artifacts in `batch` output; top-level `artifacts` aggregates all step artifacts in order
 - `fullOutputPath` / `fullOutputPaths` when large snapshot output or other oversized tool output is compacted and spilled to a private file; persisted sessions keep that path under a private session-scoped artifact directory with a bounded per-session budget so it survives reload/resume without unbounded growth
+- `artifactManifest` for a bounded, metadata-only inventory of recent session artifacts. Entries include path metadata, artifact `kind`, source `command`/`subcommand` when safe, `storageScope` (`persistent-session`, `process-temp`, or `explicit-path`), and `retentionState` (`live`, `ephemeral`, `missing`, or `evicted`). The default recent window is 100 entries and can be configured with `PI_AGENT_BROWSER_SESSION_ARTIFACT_MANIFEST_MAX_ENTRIES`. The manifest must not store command args, output contents, headers, DOM snapshots, or downloaded file contents.
+- `artifactRetentionSummary` with a concise count of live, evicted, ephemeral, and missing artifacts from the current manifest; results append this summary to model-facing text only when retention state affects recovery, such as spill files, ephemeral files, or evictions. Routine explicit saved files keep the summary in details to avoid noisy browsing transcripts.
 - `sessionRecoveryHint` when startup-scoped flags need `sessionMode: "fresh"`
 - `inspection: true` plus `stdout` for successful plain-text inspection commands like `--help` and `--version`
 When the tool echoes `args` or `effectiveArgs` back into Pi, sensitive values such as `--headers`, proxy credentials, and auth-bearing URL parameters should be redacted first.
-For oversized snapshots and other oversized tool outputs, details should switch to a compact metadata object and include `fullOutputPath` pointing at a private spill file with the full upstream payload. The model-facing tool text should print the actual spill-file path when one exists instead of only saying to inspect a details key. Persisted sessions should keep that spill file under a private session-scoped artifact directory so the path remains usable after reload/restart, with the oldest persisted spill files evicted as needed to stay within the per-session budget.
+For oversized snapshots and other oversized tool outputs, details should switch to a compact metadata object and include `fullOutputPath` pointing at a private spill file with the full redacted upstream payload. The model-facing tool text should print the actual spill-file path when one exists instead of only saying to inspect a details key. Persisted sessions should keep that spill file under a private session-scoped artifact directory so the path remains usable after reload/restart. The oldest persisted spill files are evicted as needed to stay within `PI_AGENT_BROWSER_SESSION_ARTIFACT_MAX_BYTES` (default 32 MiB), and those evictions are reported as `artifactManifest.entries[].retentionState: "evicted"` instead of silently disappearing from the session inventory. This persisted-spill byte budget is separate from the recent metadata window controlled by `PI_AGENT_BROWSER_SESSION_ARTIFACT_MANIFEST_MAX_ENTRIES`.
 ## High-value result rendering
 "Rendering" here means how results appear inside `pi`, not embedding a browser UI.
 Worth doing in v1:
-- screenshots → inline image attachment
-- snapshots → origin + ref count + main-content-first compact preview, with the raw snapshot spill path printed directly in content and kept in `details.fullOutputPath` when the inline result would otherwise be too large
+- screenshots → saved-path summary, `details.artifacts` metadata, and inline image attachment when safe
+- file artifacts such as PDFs, downloads, `wait --download` files, traces, CPU profiles, completed WebM recordings, and path-bearing HAR captures → concise saved-path summaries plus metadata in `details.artifacts` and bounded recent metadata in `details.artifactManifest`; `record start` reports recording lifecycle state and the future output path without adding a missing manifest entry; direct saved-file workflows also expose `details.savedFilePath` / `details.savedFile`; large or binary artifacts are not inlined into model context; the recent manifest cap can age out explicit-file metadata but does not remove explicit saved files from disk
+- snapshots → origin + ref count + main-content-first compact preview, with the raw snapshot spill path printed directly in content and kept in `details.fullOutputPath` plus `details.artifactManifest` when the inline result would otherwise be too large
 - oversized generic outputs such as large `eval --stdin` payloads → compact preview plus the actual spill file path instead of dumping the whole payload into model context
 - extraction-style commands like `eval --stdin` and `get title` → scalar-first text with lightweight origin context when available
 - navigation actions like `click`, `back`, `forward`, and `reload` → lightweight post-action title/url summary when available
 - tab lists → compact summary/table
 - stream status → enabled/connected/port summary
+- diagnostic/status families (`session`, `session list`, `profiles`, `doctor`, `auth list`/`show`, `network requests`, `console`, `errors`, and dashboard start/stop/status outputs) → compact readable summaries with counts and stable fields; large log/request/error outputs use previews plus `fullOutputPath` spill files; sensitive nested auth/header/token fields are not expanded in the model-facing text
 ## Missing binary behavior
@@ -179,14 +215,18 @@ If `agent-browser` is not on `PATH`, fail with a message that:
 - preserve the current extension-managed session across normal `pi` shutdown/reload so persisted sessions can keep following the live browser on `/reload` or `/resume`
 - set an idle timeout on extension-managed sessions so abandoned daemons eventually self-clean
 - clean up process-private temp spill artifacts on shutdown, while keeping persisted-session snapshot spill files in a private session-scoped artifact directory so `details.fullOutputPath` survives reload/restart and the oldest spill files are evicted if the per-session artifact budget is exceeded
-- reconstruct the current extension-managed session from persisted tool details on resume/reload so later default calls keep following the active managed browser
+- reconstruct the current extension-managed session and latest `artifactManifest` from persisted tool details on resume/reload so later default calls keep following the active managed browser and can continue reporting artifact retention state
 - when an unnamed `sessionMode: "fresh"` launch succeeds, make it the new extension-managed session so later default calls keep using it
 - if that unnamed fresh launch replaced an already-active managed session, best-effort close the old managed session after the switch succeeds
 - treat explicit caller-provided `--session` choices as user-managed
 - pass explicit `--profile` straight through to upstream `agent-browser`; no profile-cloning or isolation layer is added in v1
-- after profiled `open` / `goto` / `navigate`, if upstream leaves a restored profile tab active instead of the page that was just opened, best-effort switch back to the tab whose URL matches the returned open result before returning control to the agent
-- once the wrapper has a known tab target for a session, later active-tab commands may best-effort pin that tab inside the same upstream invocation so reconnect drift does not send a `click`, `snapshot`, or similar action to a restored/background tab instead
-- after a successful command on a known tab target, the wrapper may best-effort restore that same target again if a restored/background tab steals focus after the command completes
+<!-- agent-browser-playbook:start wrapper-tab-recovery -->
+<!-- Generated from extensions/agent-browser/lib/playbook.ts. Run `npm run docs -- playbook write` to update. -->
+- After launch-scoped open/goto/navigate calls that can restore existing tabs (for example --profile, --session-name, or --state), agent_browser best-effort re-selects the tab whose URL matches the returned page when restored tabs steal focus during launch.
+- After a target tab is known for a session, later active-tab commands best-effort pin that tab inside the same upstream invocation when reconnect drift would otherwise move the command to a restored/background tab.
+- After a successful command on a known target tab, agent_browser also best-effort restores that intended tab if a restored/background tab steals focus after the command completes.
+- If a known session target unexpectedly reports about:blank, agent_browser preserves the prior intended target, best-effort re-selects it when it still exists, and reports exact recovery guidance when it cannot be re-selected.
+<!-- agent-browser-playbook:end wrapper-tab-recovery -->
 - on local Unix launches, set a short private socket directory for wrapper-spawned `agent-browser` processes so extension-generated session names do not fail the upstream Unix socket-path length limit in longer cwd/session-name combinations
 - treat successful plain-text inspection commands like `--help` and `--version` as stateless: do not inject the implicit managed session and do not let those calls claim the managed-session slot
 - if startup-scoped flags like `--profile`, `--session-name`, or `--cdp` are supplied after the implicit session is already active while `sessionMode` is `"auto"`, return a validation error with a structured recovery hint that recommends `sessionMode: "fresh"`