pi-agent-browser-native 0.2.12 → 0.2.14

package/docs/RELEASE.md CHANGED
@@ -16,24 +16,45 @@ From the repository root:
 
  ```bash
  npm install
- npm run verify:release
+ npm run doctor
+ npm run verify -- release
  ```
 
- `npm run verify:release` runs:
+ `npm run doctor` is a read-only first-run diagnostic for PATH, targeted upstream version, and duplicate package/checkout source conflicts. It does not replace upstream `agent-browser doctor` for browser runtime health and does not edit Pi settings.
 
- 1. `npm run verify` for TypeScript and unit coverage
- 2. `npm run verify:package` for package-content validation via `npm pack --json --dry-run`
+ `npm run verify -- release` runs:
+
+ 1. `npm run verify` for TypeScript, unit coverage, and command-reference drift detection
+ 2. `npm run verify -- package-pi`, which first validates package contents via `npm pack --json --dry-run` and then smoke-loads the packed package in Pi isolation
+
+ The configured-source lifecycle regression harness is opt-in because it launches an interactive `pi` process under `tmux` and requires a usable local model configuration:
+
+ ```bash
+ npm run verify -- lifecycle
+ ```
+
+ Use `npm run verify -- lifecycle --keep-artifacts` when debugging failures.
 
  ## What package verification checks
 
- `npm run verify:package` confirms that:
+ `npm run verify -- package` confirms that:
 
  - no repo-local `.pi/extensions/agent-browser.ts` autoload shim is present
  - `LICENSE` exists in the repo and the packed tarball
  - canonical published docs are present
+ - the package-level doctor command and capability baseline are present
  - extension source files are present, including the split result-rendering modules required by the published facade
  - agent-only and superseded docs are absent from the tarball
 
+ `npm run verify -- package-pi` runs the same package-content checks and additionally confirms that:
+
+ - the packed package can be loaded through Pi SDK resource loading with the same isolation principle as `pi --no-extensions -e <package-source>`
+ - exactly one `agent_browser` tool is registered
+ - the registered `agent_browser` source resolves inside the extracted packed package path, not the working checkout
+ - the packaged `agent_browser` tool can be executed through Pi's loaded native tool definition with a deterministic fake upstream `agent-browser --version` binary
+
+ The packaged execution smoke intentionally uses a temporary fake `agent-browser` binary and the `--version` inspection path. It proves first invocation of the packaged Pi tool without launching a real browser. Real browser coverage remains part of local checkout validation and post-publish install validation.
+
  Current forbidden packed files include:
 
  - `AGENTS.md`
@@ -46,20 +67,47 @@ Current forbidden packed files include:
  For a full packed file listing:
 
  ```bash
- node scripts/verify-package.mjs --list-files
+ npm run verify -- package --list-files
  ```
 
  ## Local development validation
 
- Before publishing, also validate the explicit local-checkout path:
+ Before publishing, validate both local-checkout modes without mixing their assumptions.
+
+ ### Quick isolated checkout smoke test
 
  1. Install `agent-browser` separately.
- 2. Make sure Pi has only one active source for this extension during checkout validation.
- 3. Launch `pi --no-extensions -e .` from this repository root.
- 4. Confirm the checkout extension loads from `extensions/agent-browser/index.ts`.
- 5. Run a smoke prompt that exercises `agent_browser`.
- 6. Validate managed-session continuity with both `/reload` and a full restart + `/resume`.
- 7. Re-check local extension-side docs (`README.md`, `docs/COMMAND_REFERENCE.md`, and prompt guidance) if the upstream `agent-browser` version/help surface changed.
+ 2. Launch `pi --no-extensions -e .` from this repository root.
+ 3. Confirm the checkout extension loads from `extensions/agent-browser/index.ts`.
+ 4. Run a smoke prompt that exercises `agent_browser`.
+ 5. Restart the `pi` process after extension edits; Pi settings and `/reload` are not the validation target in this isolated mode.
+
+ ### Configured-source lifecycle validation
+
+ Prefer the automated harness for deterministic configured-source lifecycle regression coverage:
+
+ ```bash
+ npm run verify -- lifecycle
+ ```
+
+ The harness creates an isolated `PI_CODING_AGENT_DIR`, writes settings with exactly one temporary configured package source, runs plain `pi` in `tmux`, puts a deterministic fake `agent-browser` first on `PATH`, and drives `/reload`, full restart, and `/resume`. It asserts same-page managed-session continuity, persisted `details.fullOutputPath` reachability after resume, and updated extension-code pickup through a temporary sentinel command. On failure it retains transcripts/session artifacts; on success it performs best-effort cleanup. It does not replace occasional real-browser manual smoke testing.
+
+ Manual validation remains useful for release confidence and installed-package checks:
+
+ 1. Configure exactly one active source for this extension in Pi settings: this checkout path before publishing, or the installed package after publishing.
+ 2. Launch plain `pi` so extension discovery is active.
+ 3. Validate managed-session continuity with `/reload` and a full restart + `/resume`.
+ 4. Re-check local extension-side docs (`README.md`, `docs/COMMAND_REFERENCE.md`, and prompt guidance) if the upstream `agent-browser` version/help surface changed, then run `npm run verify -- command-reference`.
+
+ ### Real upstream contract validation
+
+ The default `npm test` and `npm run verify` paths use fast deterministic tests and fake binaries. When a change touches upstream command planning, result presentation, managed-session behavior, or the canonical capability baseline, also run the opt-in real-upstream contract suite:
+
+ ```bash
+ npm run verify -- real-upstream
+ ```
+
+ This suite requires the installed `agent-browser --version` to exactly match `scripts/agent-browser-capability-baseline.mjs`. It serves fixture pages from localhost and validates real runtime output shapes for `--version`, `open`, `eval --stdin`, `snapshot -i`, `batch` stdin, `wait --download` metadata, wrapper artifact existence reporting for the requested wait-download path, and implicit managed-session reuse. The current upstream `wait --download <path>` saveAs persistence limitation is tracked at [vercel-labs/agent-browser#1300](https://github.com/vercel-labs/agent-browser/issues/1300); until it is fixed, release validation must treat `details.savedFilePath` as upstream-reported metadata and use `details.artifacts[].exists` as the filesystem truth. If the suite fails because JSON/detail keys drifted, update the wrapper behavior or refresh `test/fixtures/agent-browser-real-output-shapes.json` together with the presentation work that consumes those shapes.
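
As an illustration of the `wait --download` rule described above, the following TypeScript sketch treats `details.savedFilePath` as upstream-reported metadata and consults `details.artifacts[].exists` before trusting that the file is on disk. The `ToolDetails` and `Artifact` shapes are assumptions inferred from the field names in this document, the `"download"` kind value and the paths are placeholders, and `isDownloadOnDisk` is a hypothetical helper rather than part of the package.

```typescript
// Hypothetical shapes inferred from the field names in this document.
interface Artifact {
  path: string;
  absolutePath?: string;
  kind?: string;
  exists?: boolean;
}

interface ToolDetails {
  savedFilePath?: string; // upstream-reported metadata only
  artifacts?: Artifact[];
}

// Returns true only when an artifact matching the requested path is
// reported as existing on disk; savedFilePath is deliberately ignored.
function isDownloadOnDisk(details: ToolDetails, requestedPath: string): boolean {
  const artifacts = details.artifacts ?? [];
  return artifacts.some(
    (a) =>
      (a.path === requestedPath || a.absolutePath === requestedPath) &&
      a.exists === true,
  );
}

// Example: upstream reports a saved path, but the file was not persisted.
const details: ToolDetails = {
  savedFilePath: "/tmp/report.csv",
  artifacts: [{ path: "/tmp/report.csv", kind: "download", exists: false }],
};

console.log(isDownloadOnDisk(details, "/tmp/report.csv")); // false
```

A release check written this way would fail loudly when the requested path is missing, instead of passing because `savedFilePath` happened to be populated.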
 
  Example smoke prompt:
 
@@ -67,30 +115,33 @@ Example smoke prompt:
  Use the agent_browser tool to open https://react.dev and then take an interactive snapshot.
  ```
 
- Recommended lifecycle follow-up:
+ Recommended configured-source lifecycle follow-up:
 
  1. Open a page with the implicit managed session and confirm the title.
  2. Run `/reload`, then ask for `snapshot -i` and confirm the same page is still active.
  3. Exit `pi`, relaunch it against the same session file or use `/resume`, then ask for `snapshot -i` again and confirm the same page is still active.
  4. Open a large page that compacts its snapshot output and confirm `details.fullOutputPath` still exists after the restart/resume flow.
  5. Trigger an oversized non-snapshot output (for example a deliberately large `eval --stdin` result) and confirm the tool prints the actual spill file path directly in content instead of only referencing a details key.
- 6. Validate at least one file-download flow with `download <selector> <path>`.
+ 6. Validate at least one direct file-download flow with `download <selector> <path>`.
+ 7. Validate at least one asynchronous export flow with `click` followed by `wait --download <path>`, confirming the wait result reports `savedFilePath`/`savedFile` and checking `details.artifacts[].exists` before relying on the requested path being present on disk.
 
  ## Post-publish install validation
 
- After publishing a release, validate the package-first install path explicitly:
+ After publishing a release, validate the package-first path in isolation. `npm run verify -- release` includes the deterministic fake-binary packaged execution gate, but it does not replace a real-browser installed-package smoke:
 
  ```bash
- pi install npm:pi-agent-browser-native@<version>
- pi -e npm:pi-agent-browser-native@<version>
+ npm exec --package pi-agent-browser-native -- pi-agent-browser-doctor
+ npm run verify -- release
+ pi --no-extensions -e npm:pi-agent-browser-native@<version>
  ```
 
- For installed-package validation, make sure Pi has only one active source for this extension. The simplest safe paths are either:
+ Then run the real-browser smoke prompt:
 
- - temporarily disable/remove the checkout path and then run plain `pi`, or
- - use an isolated ephemeral run such as `pi --no-extensions -e npm:pi-agent-browser-native@<version>`
+ ```text
+ Use the agent_browser tool to open https://react.dev and then take an interactive snapshot.
+ ```
 
- Then confirm `pi` exposes the native `agent_browser` tool, that a basic `open` + `snapshot -i` flow works, and that `/reload` plus restart/`/resume` keep following the same implicit managed browser session.
+ Only use plain `pi` for installed-package validation after temporarily disabling or removing the checkout source or any other active source for this extension from Pi settings. Then confirm `pi` exposes the native `agent_browser` tool, that a basic `open` + `snapshot -i` flow works, and that `/reload` plus restart/`/resume` keep following the same implicit managed browser session.
 
  ## Release notes checklist
 
@@ -99,7 +150,11 @@ Before publishing:
  - update `CHANGELOG.md`
  - confirm README install guidance still leads with the package-first flow
  - confirm `docs/COMMAND_REFERENCE.md` still matches the effective upstream command/help surface used by the wrapper
- - confirm the explicit local-checkout instructions still work for pre-release validation
- - rerun `npm run verify:release`
- - manually exercise `/reload` and full restart + `/resume` continuity in local checkout validation
- - publish only after the tarball contents match expectations
+ - run `npm run verify -- command-reference` if the installed upstream `agent-browser` version or help surface changed
+ - run `npm run doctor` and confirm any duplicate-source remediation matches the active package/checkout setup
+ - run `npm run verify -- real-upstream` for upstream runtime, result-presentation, or managed-session changes
+ - confirm both local-checkout modes still work for pre-release validation: isolated `pi --no-extensions -e .` smoke testing and configured-source lifecycle validation
+ - rerun `npm run verify -- release`
+ - run `npm run verify -- lifecycle` for opt-in configured-source `/reload` plus restart/`/resume` regression coverage
+ - manually exercise real-browser `/reload` and full restart + `/resume` continuity when release risk warrants browser-level confidence beyond the fake upstream harness
+ - publish only after the tarball contents and isolated packaged-extension smoke check match expectations
@@ -44,11 +44,13 @@ Define the product requirements and constraints for `pi-agent-browser-native`.
  ### Install priority
 
  - Prioritize the package install path first.
- - User-facing install docs should lead with `pi install npm:pi-agent-browser-native` and `pi -e npm:pi-agent-browser-native` once releases exist.
+ - User-facing install docs should lead with `pi install npm:pi-agent-browser-native`; ephemeral package trials and validation must use `pi --no-extensions -e npm:pi-agent-browser-native[@<version>]` so configured checkout or global sources cannot duplicate `agent_browser`.
  - User-facing install docs should also include the GitHub source path `pi install https://github.com/fitchmultz/pi-agent-browser-native`.
+ - Provide a read-only package-level doctor command that checks upstream `agent-browser` PATH/version and duplicate Pi package/checkout sources before first use. It must not mutate Pi settings and must remain distinct from upstream `agent-browser doctor`.
  - Keep the current local-checkout path documented as the practical pre-release and development flow.
  - Most users will install this extension globally rather than as a project-local extension.
- - Local checkout development should use explicit CLI loading such as `pi --no-extensions -e .` or `pi --no-extensions -e /absolute/path/to/pi-agent-browser-native`.
+ - Local checkout smoke testing should use explicit CLI loading such as `pi --no-extensions -e .` or `pi --no-extensions -e /absolute/path/to/pi-agent-browser-native`; Pi settings are bypassed in this mode and code edits require a process restart for validation.
+ - Local checkout hot-reload and resume validation should use configured-source lifecycle mode: exactly one active checkout/package source in Pi settings, launched with plain `pi`, so `/reload` exercises discovered/configured resources.
  - Do **not** rely on repo-local `.pi/extensions/` auto-discovery for this package, because it conflicts with the global installed-package path.
 
  ### Native-tool preference
@@ -65,14 +67,17 @@ Define the product requirements and constraints for `pi-agent-browser-native`.
  - Documents should read as complete documents, not iterative logs, unless they are explicitly meant to be iterative, such as a changelog.
  - Requirements, expectations, and durable rules from user conversations should be reflected in the appropriate docs.
  - Because direct-binary usage is commonly blocked in normal agent sessions, the repo must carry a local command reference for the effective `agent_browser` surface and keep it in sync with upstream changes.
+ - Repository verification must include a lightweight command-reference drift check against the targeted installed upstream `agent-browser` version.
  - Published package contents should include the canonical user-facing docs plus `LICENSE`.
  - Published package contents should exclude agent-only and superseded docs such as `AGENTS.md`, `docs/v1-tool-contract.md`, and `docs/native-integration-design.md`.
 
  ### Testing guidance
 
  - The primary confidence path is a real `pi` session driven in `tmux`.
- - For local checkout validation, launch `pi --no-extensions -e .` from the repository root so only the checkout copy loads.
- - Validate both `/reload` and a full `pi` restart with `/resume` when changes touch managed-session continuity, reload behavior, or persisted artifact paths.
+ - For quick local checkout smoke validation, launch `pi --no-extensions -e .` from the repository root so only the checkout copy loads; do not rely on Pi settings or `/reload` semantics in this isolated mode.
+ - For hot-reload validation, configure exactly one active source for this extension in Pi settings and launch plain `pi`; validate `/reload` there because it exercises auto-discovered/configured resources.
+ - Maintain an opt-in tmux-driven configured-source lifecycle harness that isolates Pi settings, uses exactly one configured source, exercises `/reload`, full restart, and `/resume`, and asserts managed-session continuity plus persisted artifact survival. Keep `docs/RELEASE.md` accurate about the harness behavior, cleanup, transcript retention, and limitations.
+ - Validate a full `pi` restart with `/resume` when changes touch managed-session continuity, reload behavior, or persisted artifact paths.
  - Prefer full `pi` restart over `/reload` when validating extension changes beyond a quick reload smoke check.
  - Use `/resume` when needed after restart.
  - Keep testing broader than a single smoke site like `example.com`.
@@ -97,6 +102,7 @@ The design should comfortably support workflows such as:
  - User-facing docs belong in `README.md` and the canonical published files under `docs/`.
  - Agent workflow and deeper testing procedures can stay in `AGENTS.md`, but published docs must not depend on that file being present.
  - When upstream `agent-browser` changes, refresh the local command reference, prompt guidance, and other extension-side docs so agents still have a repo-readable equivalent of the blocked direct-binary help path.
+ - The canonical agent-facing playbook should live in `extensions/agent-browser/lib/playbook.ts`; README, command-reference, and tool-contract fragments must be generated or checked from that source by `npm run docs -- playbook check` so prompt guidance and docs cannot drift silently.
  - Keep mitigations for legacy-skill coexistence simple; do not add extra moving parts unless observed behavior justifies them.
  - Prefer narrow, evidence-backed compatibility mitigations over broad stealth layers when a specific upstream site starts rejecting the default headless launch fingerprint.
  - Preserve the page that a profiled `open` just navigated to; if restored profile tabs steal focus during launch, the wrapper should best-effort switch back to the returned page URL before handing control back to the agent.
@@ -25,7 +25,25 @@ It also keeps the main UX where it belongs: the agent invokes the tool directly
 
  The tool guidance should be written for task discovery first, not wrapper implementation first. That means the description should emphasize browser use cases like web research, reading live docs, clicking, filling, screenshots, extraction, and authenticated/profile-based workflows. Low-level wrapper details like `stdin` and exact CLI args belong in the schema and guidelines, not the lead description.
 
- The tool also needs an operating playbook, not just a capability list. The model should not have to rediscover basics each session. Guidance should explicitly encode the normal browser workflow (`open` -> `snapshot -i` -> interact -> re-snapshot), the authenticated-content workflow (prefer `--profile Default` on the first browser call and let the implicit session carry continuity; use `--auto-connect` as a fallback when profile reuse is unavailable), and the preferred recovery path when a session opens on the wrong tab, an action changes origin unexpectedly, or an `open` call returns blocked/blank/unexpected results (`tab list` / `tab <tab-id-or-label>` / `snapshot -i` before retrying different URLs or fallback strategies). It should also discourage inventing fixed explicit session names for routine tasks, because those names leak stale browser state across otherwise unrelated `pi` sessions. For read-only browsing tasks, guidance should prefer answering from the current page state first: use the current snapshot, structured ref labels, or `eval --stdin` on the current page before navigating into media viewers, detail routes, or other new pages unless the current view lacks the needed information. For downloads, guidance should explicitly prefer `download <selector> <path>` over `click` when the goal is a file on disk. When using `eval --stdin`, scope checks and actions to the target element or route whenever possible instead of relying on broad page-wide text heuristics. When using `eval --stdin` for extraction, return the intended value instead of relying on `console.log` as the primary result channel. Because the extension blocks normal direct-binary usage in most agent sessions, the repository must also carry a local command reference that stays in sync with the effective tool surface.
+ The tool also needs an operating playbook, not just a capability list. The model should not have to rediscover basics each session. The canonical agent-facing playbook lives in `extensions/agent-browser/lib/playbook.ts`; generated Markdown fragments are updated by `npm run docs -- playbook write`, and `npm run docs -- playbook check` fails when checked-in documentation drifts.
+
+ <!-- agent-browser-playbook:start shared-guidelines -->
+ <!-- Generated from extensions/agent-browser/lib/playbook.ts. Run `npm run docs -- playbook write` to update. -->
+ - Standard workflow: open the page, snapshot -i, interact using current @refs from that snapshot, and re-snapshot after navigation, scrolling, rerendering, or other major DOM changes because refs can become stale.
+ - When a visible text or accessible-name target should survive ref churn, prefer find locators such as role, text, label, placeholder, alt, title, or testid with the intended action instead of guessing a CSS selector.
+ - Do not assume Playwright selector dialects such as text=Close or button:has-text('Close') are supported wrapper syntax unless current upstream agent-browser behavior has been verified.
+ - For authenticated or user-specific content like feeds, inboxes, dashboards, and accounts, prefer --profile Default on the first browser call and let the implicit session carry continuity. Use --auto-connect only if profile-based reuse is unavailable or the task is specifically about attaching to a running debug-enabled browser.
+ - Do not invent fixed explicit session names for routine tasks. Use the implicit session unless you truly need multiple isolated browser sessions in the same conversation.
+ - When using --profile, --session-name, --cdp, --state, or --auto-connect, put them on the first command for that session. If you intentionally use an explicit --session, keep using that same explicit session for follow-ups.
+ - If you already used the implicit session and now need launch-scoped flags like --profile, --session-name, --cdp, --state, or --auto-connect, retry with sessionMode set to fresh or pass an explicit --session for the new launch. After a successful unnamed fresh launch, later auto calls follow that new session.
+ - If a session lands on the wrong page or tab, an interaction changes origin unexpectedly, or an open call returns blocked, blank, or otherwise unexpected results, use tab list / tab <tab-id-or-label> / snapshot -i to recover state before retrying different URLs or fallback strategies. Only use wait with an explicit argument like milliseconds, --load <state>, --url <matcher>, --fn <js>, or --text <matcher>.
+ - For feed, timeline, or inbox reading tasks, focus on the main timeline/list region and read the first item there rather than unrelated composer or sidebar content.
+ - For read-only browsing tasks, prefer extracting the answer from the current snapshot, structured ref labels, or eval --stdin on the current page before navigating away. Only click into media viewers, detail routes, or new pages when the current view does not contain the needed information.
+ - For downloads, prefer download <selector> <path> when an element click should save a file. Do not rely on click alone when you need the downloaded file on disk.
+ - When using eval --stdin, scope checks and actions to the target element or route whenever possible instead of relying on broad page-wide text heuristics.
+ - When using eval --stdin for extraction, return the value you want instead of relying on console.log as the primary result channel.
+ - Do not call --help or other exploratory inspection commands unless the user explicitly asks for them or debugging the browser integration is necessary.
+ <!-- agent-browser-playbook:end shared-guidelines -->
 
  ## Parameters
 
@@ -58,7 +76,8 @@ Examples:
 
  - type: `string`
  - optional
- - raw stdin for commands like `eval --stdin` and `batch`
+ - raw stdin for `eval --stdin` and `batch`
+ - rejected before launch for any other command/stdin combination, including commands such as `click`, `snapshot`, or `open`
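
The stdin rule above can be pictured as a pre-launch check. The TypeScript below is a minimal sketch under stated assumptions: `acceptsStdin` and `validateStdin` are hypothetical names, the error wording is invented for illustration, and the sketch naively treats the first non-flag token as the command, so it would misread a flag value such as the `Default` in `--profile Default`. The real extension's logic may differ.

```typescript
// Hypothetical pre-launch check; only `eval --stdin` and `batch` may carry stdin.
function acceptsStdin(args: string[]): boolean {
  // Naive command detection: the first token that does not start with "-".
  const command = args.find((a) => !a.startsWith("-"));
  if (command === "batch") return true;
  return command === "eval" && args.includes("--stdin");
}

// Returns an error message when the combination should be rejected,
// or undefined when the call may proceed to launch.
function validateStdin(args: string[], stdin?: string): string | undefined {
  if (stdin === undefined) return undefined;
  if (acceptsStdin(args)) return undefined;
  return `stdin is only accepted for eval --stdin and batch, got: ${args.join(" ")}`;
}

console.log(validateStdin(["eval", "--stdin"], "document.title")); // undefined
console.log(validateStdin(["click", "@e1"], "unexpected input"));
```

Rejecting before launch keeps a mistaken `stdin` payload from being silently ignored by a command that never reads it.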
 
  Examples:
 
@@ -92,8 +111,18 @@ The extension should:
  - invoke `agent-browser` directly, not through a shell
  - parse JSON output into tool details
  - handle observed JSON result shapes, including the array returned by `batch --json`
- - allow plain-text fallback for inspection commands like `--help` and `--version`
- - support those inspection commands unconditionally so the tool contract stays local and predictable
+ - allow plain-text fallback for native inspection calls
+ - support those inspection calls unconditionally so the tool contract stays local and predictable
+
+ <!-- agent-browser-playbook:start inspection -->
+ <!-- Generated from extensions/agent-browser/lib/playbook.ts. Run `npm run docs -- playbook write` to update. -->
+ Native inspection calls use the `agent_browser` tool shape, not shell-like direct-binary commands:
+
+ - { "args": ["--help"] }
+ - { "args": ["--version"] }
+
+ These calls return plain text and stay stateless: the extension does not inject its implicit session and does not let inspection consume the managed-session slot needed for later profile, session, CDP, state, or auto-connect launches.
+ <!-- agent-browser-playbook:end inspection -->
  - still describe normal browser workflows in guidance so models do not overuse inspection for routine tasks
  - surface stderr and non-zero exits clearly
  - attach images when the result points to a screenshot-like artifact
@@ -142,26 +171,33 @@ Additional structured fields can appear when relevant:
  - `batchFailure` and `batchSteps` for `batch` rendering, including mixed-success runs
  - `navigationSummary` for navigation-style commands like `click`, `back`, `forward`, and `reload`
  - `imagePath` / `imagePaths` for screenshots and batched image outputs
+ - `artifacts` for upstream saved files such as screenshots, PDFs, downloads, `wait --download` files, traces, CPU profiles, completed WebM recordings, path-bearing HAR captures, and future recording output paths reported by `record start`. Each artifact includes the original saved or requested `path`, resolved `absolutePath`, `kind`, optional `mediaType`, optional `extension`, and best-effort disk metadata such as `exists` and `sizeBytes`.
+ - `savedFilePath` / `savedFile` for direct `download`, `pdf`, and `wait --download` saved-file workflows; batch results preserve the same fields on the relevant `batchSteps` entry.
+ - `batchSteps[].artifacts` for per-step artifacts in `batch` output; top-level `artifacts` aggregates all step artifacts in order
  - `fullOutputPath` / `fullOutputPaths` when large snapshot output or other oversized tool output is compacted and spilled to a private file; persisted sessions keep that path under a private session-scoped artifact directory with a bounded per-session budget so it survives reload/resume without unbounded growth
+ - `artifactManifest` for a bounded, metadata-only inventory of recent session artifacts. Entries include path metadata, artifact `kind`, source `command`/`subcommand` when safe, `storageScope` (`persistent-session`, `process-temp`, or `explicit-path`), and `retentionState` (`live`, `ephemeral`, `missing`, or `evicted`). The default recent window is 100 entries and can be configured with `PI_AGENT_BROWSER_SESSION_ARTIFACT_MANIFEST_MAX_ENTRIES`. The manifest must not store command args, output contents, headers, DOM snapshots, or downloaded file contents.
+ - `artifactRetentionSummary` with a concise count of live, evicted, ephemeral, and missing artifacts from the current manifest; results append this summary to model-facing text only when retention state affects recovery, such as spill files, ephemeral files, or evictions. Routine explicit saved files keep the summary in details to avoid noisy browsing transcripts.
  - `sessionRecoveryHint` when startup-scoped flags need `sessionMode: "fresh"`
  - `inspection: true` plus `stdout` for successful plain-text inspection commands like `--help` and `--version`
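
To make the `artifactManifest` and `artifactRetentionSummary` fields in the list above more concrete, here is a small TypeScript sketch that counts manifest entries by retention state. The `storageScope` and `retentionState` values come from this document; the `ManifestEntry` shape, the `kind` strings, the paths, and the `summarizeRetention` helper are assumptions made for illustration.

```typescript
type RetentionState = "live" | "ephemeral" | "missing" | "evicted";
type StorageScope = "persistent-session" | "process-temp" | "explicit-path";

// Hypothetical, metadata-only entry: no args, output, or file contents.
interface ManifestEntry {
  path: string;
  kind: string; // placeholder values below; real kinds may differ
  storageScope: StorageScope;
  retentionState: RetentionState;
}

// Counts entries per retention state, starting every state at zero.
function summarizeRetention(entries: ManifestEntry[]): Record<RetentionState, number> {
  const summary: Record<RetentionState, number> = {
    live: 0,
    ephemeral: 0,
    missing: 0,
    evicted: 0,
  };
  for (const entry of entries) summary[entry.retentionState] += 1;
  return summary;
}

const entries: ManifestEntry[] = [
  { path: "/tmp/a.png", kind: "screenshot", storageScope: "explicit-path", retentionState: "live" },
  { path: "/tmp/spill-1.txt", kind: "spill", storageScope: "persistent-session", retentionState: "evicted" },
  { path: "/tmp/spill-2.txt", kind: "spill", storageScope: "process-temp", retentionState: "ephemeral" },
];

const summary = summarizeRetention(entries);
console.log(summary); // { live: 1, ephemeral: 1, missing: 0, evicted: 1 }
```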
 
  When the tool echoes `args` or `effectiveArgs` back into Pi, sensitive values such as `--headers`, proxy credentials, and auth-bearing URL parameters should be redacted first.
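
One way to picture the redaction step described above is the TypeScript sketch below. It is a hedged illustration only: `redactArgs` is a hypothetical helper, the `[REDACTED]` marker is an assumption, and the list of auth-bearing query keys (`token`, `access_token`, `api_key`) is an example set, not the extension's actual list. Proxy credentials are approximated here as `user:password@` userinfo inside a URL.

```typescript
// Flags whose following value is replaced wholesale (from this document).
const SENSITIVE_VALUE_FLAGS = new Set(["--headers"]);
// Example auth-bearing query keys; the real list is an assumption.
const SENSITIVE_QUERY = /([?&](?:token|access_token|api_key)=)[^&#\s]+/gi;
// user:password@ userinfo in a URL, e.g. proxy credentials.
const URL_USERINFO = /(\/\/)[^\/@\s:]+:[^\/@\s]+@/g;

function redactArgs(args: string[]): string[] {
  return args.map((arg, i) => {
    const previous = i > 0 ? args[i - 1] : undefined;
    if (previous !== undefined && SENSITIVE_VALUE_FLAGS.has(previous)) {
      return "[REDACTED]";
    }
    return arg
      .replace(SENSITIVE_QUERY, "$1[REDACTED]")
      .replace(URL_USERINFO, "$1[REDACTED]@");
  });
}

console.log(
  redactArgs([
    "open",
    "https://example.com/?token=abc&q=1",
    "--headers",
    '{"Authorization":"Bearer secret"}',
  ]),
);
```

Redacting the echoed copy rather than the arguments actually passed to the binary keeps the upstream call unchanged while keeping secrets out of the transcript.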
 
- For oversized snapshots and other oversized tool outputs, details should switch to a compact metadata object and include `fullOutputPath` pointing at a private spill file with the full upstream payload. The model-facing tool text should print the actual spill-file path when one exists instead of only saying to inspect a details key. Persisted sessions should keep that spill file under a private session-scoped artifact directory so the path remains usable after reload/restart, with the oldest persisted spill files evicted as needed to stay within the per-session budget.
+ For oversized snapshots and other oversized tool outputs, details should switch to a compact metadata object and include `fullOutputPath` pointing at a private spill file with the full redacted upstream payload. The model-facing tool text should print the actual spill-file path when one exists instead of only saying to inspect a details key. Persisted sessions should keep that spill file under a private session-scoped artifact directory so the path remains usable after reload/restart. The oldest persisted spill files are evicted as needed to stay within `PI_AGENT_BROWSER_SESSION_ARTIFACT_MAX_BYTES` (default 32 MiB), and those evictions are reported as `artifactManifest.entries[].retentionState: "evicted"` instead of silently disappearing from the session inventory. This persisted-spill byte budget is separate from the recent metadata window controlled by `PI_AGENT_BROWSER_SESSION_ARTIFACT_MANIFEST_MAX_ENTRIES`.
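
The oldest-first eviction described above can be sketched as follows. This TypeScript is an illustration under assumptions: the `SpillFile` shape, its `createdAtMs` field, and the `evictToBudget` helper are hypothetical, and the sketch only updates metadata where a real implementation would also delete the evicted files from disk. The 32 MiB default is the documented value of `PI_AGENT_BROWSER_SESSION_ARTIFACT_MAX_BYTES`.

```typescript
// Hypothetical per-session spill-file record.
interface SpillFile {
  path: string;
  sizeBytes: number;
  createdAtMs: number;
  retentionState: "live" | "evicted";
}

const MIB = 1024 * 1024;
const DEFAULT_MAX_BYTES = 32 * MIB; // documented default budget

// Marks the oldest live files as evicted until live bytes fit the budget.
// Entries are kept (not dropped) so evictions stay visible in the inventory.
function evictToBudget(files: SpillFile[], maxBytes: number = DEFAULT_MAX_BYTES): SpillFile[] {
  const oldestFirst = [...files].sort((a, b) => a.createdAtMs - b.createdAtMs);
  let liveBytes = oldestFirst
    .filter((f) => f.retentionState === "live")
    .reduce((sum, f) => sum + f.sizeBytes, 0);
  return oldestFirst.map((f) => {
    if (f.retentionState !== "live" || liveBytes <= maxBytes) return f;
    liveBytes -= f.sizeBytes;
    return { ...f, retentionState: "evicted" as const };
  });
}

const files: SpillFile[] = [
  { path: "spill-a.json", sizeBytes: 20 * MIB, createdAtMs: 1, retentionState: "live" },
  { path: "spill-b.json", sizeBytes: 10 * MIB, createdAtMs: 2, retentionState: "live" },
  { path: "spill-c.json", sizeBytes: 8 * MIB, createdAtMs: 3, retentionState: "live" },
];

// 38 MiB live exceeds 32 MiB, so only the oldest file is evicted.
const result = evictToBudget(files);
console.log(result.map((f) => `${f.path}:${f.retentionState}`));
```

Keeping evicted entries in the returned list mirrors the requirement that evictions are reported rather than silently disappearing from the session inventory.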
 
  ## High-value result rendering
 
  "Rendering" here means how results appear inside `pi`, not embedding a browser UI.
 
  Worth doing in v1:
- - screenshots → inline image attachment
- - snapshots origin + ref count + main-content-first compact preview, with the raw snapshot spill path printed directly in content and kept in `details.fullOutputPath` when the inline result would otherwise be too large
+ - screenshots → saved-path summary, `details.artifacts` metadata, and inline image attachment when safe
+ - file artifacts such as PDFs, downloads, `wait --download` files, traces, CPU profiles, completed WebM recordings, and path-bearing HAR captures → concise saved-path summaries plus metadata in `details.artifacts` and bounded recent metadata in `details.artifactManifest`; `record start` reports recording lifecycle state and the future output path without adding a missing manifest entry; direct saved-file workflows also expose `details.savedFilePath` / `details.savedFile`; large or binary artifacts are not inlined into model context; the recent manifest cap can age out explicit-file metadata but does not remove explicit saved files from disk
194
+ - snapshots → origin + ref count + main-content-first compact preview, with the raw snapshot spill path printed directly in content and kept in `details.fullOutputPath` plus `details.artifactManifest` when the inline result would otherwise be too large
160
195
  - oversized generic outputs such as large `eval --stdin` payloads → compact preview plus the actual spill file path instead of dumping the whole payload into model context
161
196
  - extraction-style commands like `eval --stdin` and `get title` → scalar-first text with lightweight origin context when available
162
197
  - navigation actions like `click`, `back`, `forward`, and `reload` → lightweight post-action title/url summary when available
163
198
  - tab lists → compact summary/table
164
199
  - stream status → enabled/connected/port summary
200
+ - diagnostic/status families (`session`, `session list`, `profiles`, `doctor`, `auth list`/`show`, `network requests`, `console`, `errors`, and dashboard start/stop/status outputs) → compact readable summaries with counts and stable fields; large log/request/error outputs use previews plus `fullOutputPath` spill files; sensitive nested auth/header/token fields are not expanded in the model-facing text
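The "compact preview plus the actual spill file path" pattern from the list above might look like the sketch below. It assumes a character-count threshold and a caller-supplied `writeSpill` callback that persists the payload and returns its path; both are hypothetical, as are `RenderedResult` and `totalChars`. Only `fullOutputPath` is a name taken from the doc.

```typescript
// Hypothetical result shape; only `fullOutputPath` is named in the doc.
interface RenderedResult {
  text: string;
  details: { fullOutputPath?: string; totalChars: number };
}

// `writeSpill` is an assumed callback that stores the full payload and returns its path.
function renderOutput(
  payload: string,
  maxInlineChars: number,
  writeSpill: (payload: string) => string,
): RenderedResult {
  // Small payloads are returned inline and never touch the spill callback.
  if (payload.length <= maxInlineChars) {
    return { text: payload, details: { totalChars: payload.length } };
  }

  // Oversized payloads: spill the full text, keep only a bounded preview inline,
  // and print the real path in the model-facing text.
  const fullOutputPath = writeSpill(payload);
  const preview = payload.slice(0, maxInlineChars);
  const text =
    `${preview}\n[truncated: ${payload.length} chars total; ` +
    `full output at ${fullOutputPath}]`;
  return { text, details: { fullOutputPath, totalChars: payload.length } };
}
```

Printing the path in `text` as well as `details` follows the doc's point that the model-facing text should show the actual spill-file path and not only refer to a details key.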
165
201
 
166
202
  ## Missing binary behavior
167
203
 
@@ -179,14 +215,18 @@ If `agent-browser` is not on `PATH`, fail with a message that:
179
215
  - preserve the current extension-managed session across normal `pi` shutdown/reload so persisted sessions can keep following the live browser on `/reload` or `/resume`
180
216
  - set an idle timeout on extension-managed sessions so abandoned daemons eventually self-clean
181
217
  - clean up process-private temp spill artifacts on shutdown, while keeping persisted-session snapshot spill files in a private session-scoped artifact directory so `details.fullOutputPath` survives reload/restart and the oldest spill files are evicted if the per-session artifact budget is exceeded
182
- - reconstruct the current extension-managed session from persisted tool details on resume/reload so later default calls keep following the active managed browser
218
+ - reconstruct the current extension-managed session and latest `artifactManifest` from persisted tool details on resume/reload so later default calls keep following the active managed browser and can continue reporting artifact retention state
183
219
  - when an unnamed `sessionMode: "fresh"` launch succeeds, make it the new extension-managed session so later default calls keep using it
184
220
  - if that unnamed fresh launch replaced an already-active managed session, best-effort close the old managed session after the switch succeeds
185
221
  - treat explicit caller-provided `--session` choices as user-managed
186
222
  - pass explicit `--profile` straight through to upstream `agent-browser`; no profile-cloning or isolation layer is added in v1
187
- - after profiled `open` / `goto` / `navigate`, if upstream leaves a restored profile tab active instead of the page that was just opened, best-effort switch back to the tab whose URL matches the returned open result before returning control to the agent
188
- - once the wrapper has a known tab target for a session, later active-tab commands may best-effort pin that tab inside the same upstream invocation so reconnect drift does not send a `click`, `snapshot`, or similar action to a restored/background tab instead
189
- - after a successful command on a known tab target, the wrapper may best-effort restore that same target again if a restored/background tab steals focus after the command completes
223
+ <!-- agent-browser-playbook:start wrapper-tab-recovery -->
224
+ <!-- Generated from extensions/agent-browser/lib/playbook.ts. Run `npm run docs -- playbook write` to update. -->
225
+ - After launch-scoped open/goto/navigate calls that can restore existing tabs (for example --profile, --session-name, or --state), agent_browser best-effort re-selects the tab whose URL matches the returned page when restored tabs steal focus during launch.
226
+ - After a target tab is known for a session, later active-tab commands best-effort pin that tab inside the same upstream invocation when reconnect drift would otherwise move the command to a restored/background tab.
227
+ - After a successful command on a known target tab, agent_browser also best-effort restores that intended tab if a restored/background tab steals focus after the command completes.
228
+ - If a known session target unexpectedly reports about:blank, agent_browser preserves the prior intended target, best-effort re-selects it when it still exists, and reports exact recovery guidance when it cannot be re-selected.
229
+ <!-- agent-browser-playbook:end wrapper-tab-recovery -->
190
230
  - on local Unix launches, set a short private socket directory for wrapper-spawned `agent-browser` processes so extension-generated session names do not fail the upstream Unix socket-path length limit in longer cwd/session-name combinations
191
231
  - treat successful plain-text inspection commands like `--help` and `--version` as stateless: do not inject the implicit managed session and do not let those calls claim the managed-session slot
192
232
  - if startup-scoped flags like `--profile`, `--session-name`, or `--cdp` are supplied after the implicit session is already active while `sessionMode` is `"auto"`, return a validation error with a structured recovery hint that recommends `sessionMode: "fresh"`