pi-agent-browser-native 0.2.22 → 0.2.24
- package/CHANGELOG.md +16 -0
- package/README.md +207 -191
- package/docs/ARCHITECTURE.md +1 -1
- package/docs/COMMAND_REFERENCE.md +6 -4
- package/docs/TOOL_CONTRACT.md +9 -2
- package/extensions/agent-browser/index.ts +282 -49
- package/extensions/agent-browser/lib/playbook.ts +4 -4
- package/extensions/agent-browser/lib/results/envelope.ts +14 -1
- package/extensions/agent-browser/lib/results/presentation.ts +5 -2
- package/extensions/agent-browser/lib/runtime.ts +53 -9
- package/package.json +1 -1
package/CHANGELOG.md
CHANGED
@@ -2,6 +2,22 @@
 
 ## Unreleased
 
+## 0.2.24 - 2026-05-11
+
+### Added
+- added custom `agent_browser` TUI rendering with colorized call/output text and built-in-style visual truncation for long visible output while preserving model-facing tool content
+
+## 0.2.23 - 2026-05-10
+
+### Fixed
+- added safe `auth save --password-stdin` support for native tool calls and redacted password stdin from model-visible content, tool details, upstream failure output, and preserved parse-failure spill files
+- improved session and launch-flag handling for agent workflows, including disabled `--auto-connect`, optional boolean flag values, dash-starting `--args` values, and stale `@ref` recovery guidance through pinned commands and user batch stdin
+- expanded sensitive argument redaction for password and credential command forms
+
+### Changed
+- rewrote the public README around outcome-first usage, fastest install paths, profile/auth workflow guidance, and release verification proof
+- clarified native-tool command guidance for password stdin, cookie/privacy handling, stable tab ids, and explicit session persistence limits
+
 ## 0.2.22 - 2026-05-07
 
 ### Compatibility
package/README.md
CHANGED
@@ -1,61 +1,86 @@
 # pi-agent-browser-native
 
-
+A Pi extension that lets coding agents drive real browser sessions with a native `agent_browser` tool instead of brittle shell commands.
 
-
+It is for Pi users who want agents to browse sites, inspect pages, click through flows, capture screenshots, use persistent profiles, and handle authenticated web apps without spending context on `agent-browser` CLI ceremony.
 
-
+## What this looks like in Pi
 
-
+You prompt the agent in plain English:
 
-
+```text
+Use the agent_browser tool to open https://react.dev and then take an interactive snapshot.
+```
 
-
+The agent gets a native tool, not a bash workaround:
 
-
+```json
+{ "args": ["open", "https://react.dev"] }
+{ "args": ["snapshot", "-i"] }
+```
 
-
-- **Latest-version only**: no backward-compatibility support or shims for older `agent-browser` versions
-- **Thin wrapper**: stay close to upstream `agent-browser` instead of re-implementing its CLI
-- **Agent-invoked first**: the main UX is the agent calling the tool directly, like `read` or `write`
-- **Global-install first**: package behavior matters more than repo-local development wiring
+The result is optimized for agent work:
 
-
--
--
+- compact page snapshots that lead with useful page content instead of chrome/sidebar noise
+- interactive `@eN` refs for follow-up clicks and form fills
+- screenshots and downloaded files surfaced as Pi artifacts
+- structured details for titles, URLs, saved files, sessions, and errors
+- spill files for oversized raw output instead of dumping pages into context
+- compact, colorized Pi TUI rows that can be expanded without changing what the agent receives
+- recovery hints when a tab, selector, stale `@ref`, or launch mode needs a different next step
+
+## Who this is for
 
-
+- **Pi users** who want browser automation available as a normal tool beside `read`, `write`, and `bash`.
+- **Coding agents** that need low-context browser workflows for docs, QA, research, dashboards, and web apps.
+- **Maintainers** who want a thin integration that tracks the current upstream [`agent-browser`](https://agent-browser.dev/) CLI without bundling or re-implementing it.
 
-
+## The problem
 
--
-- parsed results instead of bash stdout
-- compact model-facing snapshot shaping with full raw spill files for oversized pages
-- main-content-first snapshot previews so the model sees the important page region before unrelated chrome or sidebar noise
-- inline screenshots and artifacts
-- lightweight session convenience inside `pi`
-- a better base for serious browser automation
+`agent-browser` is powerful, but plain CLI use is awkward inside an agent harness:
 
-
+- shell strings are easy for agents to quote wrong
+- large page snapshots can waste model context
+- screenshots and downloads need artifact metadata, not just text paths
+- implicit browser sessions need predictable reuse and cleanup
+- profile/debug launches need a clear way to start fresh after public browsing
+- secrets and auth material must not be echoed into model-visible output
+- stale element refs need actionable recovery guidance, not generic failures
 
--
-- web research
-- driving web UIs for ChatGPT, Grok, Gemini, and Claude
-- authenticated browser sessions and persistent profiles
+`pi-agent-browser-native` keeps upstream `agent-browser` as the browser engine and adds the Pi-native wrapper behavior needed for reliable agent use.
 
-##
+## What it does
 
-
+| Pain | Native wrapper capability | Proof surface |
+|---|---|---|
+| Agents build fragile shell commands | Exposes `agent_browser` with exact `args`, controlled `stdin`, and `sessionMode` fields | `extensions/agent-browser/index.ts`, [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md) |
+| Page snapshots are too large | Shows compact, main-content-first summaries and stores full raw output in spill files when needed | `test/agent-browser.presentation.test.ts` |
+| Screenshots/downloads get lost in text | Normalizes artifact paths and reports existence, size, cwd, session, and repair status | [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md#download-screenshot-and-pdf-files) |
+| Profile restores and tab drift confuse agents | Tracks managed sessions, pins intended tabs, and re-selects target tabs after drift | generated tab-recovery notes below; `test/agent-browser.resume-state.test.ts` |
+| Auth/profile workflows can leak secrets | Supports `auth save --password-stdin` and redacts sensitive args, URLs, stdout/stderr, details, and parse-failure spills | `test/agent-browser.extension-validation.test.ts` |
+| Stale `@eN` refs fail mysteriously | Adds recovery guidance to rerun `snapshot -i` or use stable `find` locators | `test/agent-browser.results.test.ts` |
+| Direct binary help may be blocked in agent sessions | Publishes a repo-readable command reference and verifies it against the target upstream version | `npm run verify` |
 
-
+## Fastest way to try it
 
-Install `agent-browser`
+Install upstream `agent-browser` first and make sure it is on `PATH`:
+
+- https://agent-browser.dev/
+- https://github.com/vercel-labs/agent-browser
+
+Then install this Pi package:
 
 ```bash
 pi install npm:pi-agent-browser-native
 ```
 
-
+Start Pi and ask for a browser action:
+
+```text
+Use the agent_browser tool to open https://example.com and then take an interactive snapshot.
+```
+
+For a one-off trial that does not touch your configured Pi extensions:
 
 ```bash
 pi --no-extensions -e npm:pi-agent-browser-native
@@ -67,127 +92,123 @@ For a specific published version:
 pi --no-extensions -e npm:pi-agent-browser-native@<version>
 ```
 
-
-
-Run the package doctor before first use or when `agent_browser` is missing or duplicated:
-
-```bash
-pi-agent-browser-doctor
-# one-off without installing the package source permanently:
-npm exec --package pi-agent-browser-native -- pi-agent-browser-doctor
-# from a checkout:
-npm run doctor
-```
-
-The doctor is read-only. It checks that upstream `agent-browser` is on `PATH`, that `agent-browser --version` matches the wrapper's capability baseline, and that Pi settings do not point at multiple active `pi-agent-browser-native` sources. It does not run upstream `agent-browser doctor --fix` or edit Pi settings.
-
-If it reports duplicate sources, keep exactly one active source. For normal use, keep `pi install npm:pi-agent-browser-native` and remove checkout paths from Pi settings. For temporary package or checkout trials, use `pi --no-extensions -e npm:pi-agent-browser-native[@<version>]` or `pi --no-extensions -e /path/to/checkout` so configured sources are bypassed.
-
-### GitHub install
-
-For the source install path, prefer the repository URL:
+To install directly from source instead of npm:
 
 ```bash
 pi install https://github.com/fitchmultz/pi-agent-browser-native
 ```
 
-
+For a temporary source trial, keep it isolated from your normal package sources:
 
 ```bash
 pi --no-extensions -e https://github.com/fitchmultz/pi-agent-browser-native
 ```
 
-
-
-### Current practical local-checkout flows
-
-This repository's `package.json` is itself a publishable pi package manifest that points at `extensions/agent-browser/index.ts`. That file is the real extension entrypoint for both the checkout and the published package.
-
-Use two local-checkout modes intentionally:
-
-- **Quick isolated smoke test:** run the checkout explicitly with `-e` and disable extension discovery:
-
-```bash
-pi --no-extensions -e /absolute/path/to/pi-agent-browser-native
-```
-
-This bypasses Pi settings and any configured checkout/global package sources, so it avoids duplicate `agent_browser` registrations. After editing extension code, restart this `pi` process to validate the new source; do not use this mode as proof that configured-source `/reload` works.
+## First-run health check
 
-
+Run the read-only doctor when installing, upgrading, or debugging missing/duplicated tools:
 
-
-
-
+```bash
+pi-agent-browser-doctor
+# one-off without permanent install:
+npm exec --package pi-agent-browser-native -- pi-agent-browser-doctor
+# from this checkout:
+npm run doctor
+```
 
-
-- `"fresh"` switches that managed session to a fresh upstream launch so launch-scoped flags like `--profile`, `--session-name`, `--cdp`, `--state`, `--auto-connect`, `--init-script`, and `--enable` apply and later auto calls follow the new browser
+The doctor checks:
 
-
+- upstream `agent-browser` exists on `PATH`
+- the installed upstream version matches this wrapper's command-reference baseline
+- Pi settings do not point at multiple active `pi-agent-browser-native` sources
 
-
+It does **not** edit Pi settings and does **not** run upstream `agent-browser doctor --fix`.
 
-
-- `stdin` — raw stdin only for `batch` and `eval --stdin` (other command/stdin combinations are rejected before `agent-browser` is launched)
-- `sessionMode`
-- `"auto"` — default, reuse the extension-managed `pi`-scoped session
-- `"fresh"` — switch that managed session to a new profile/debug launch
+## Common agent calls
 
-
+You usually prompt the agent in natural language. These JSON snippets show the exact native tool shape the agent should use.
 
-Open a page
+Open a page and inspect it:
 
 ```json
 { "args": ["open", "https://example.com"] }
 { "args": ["snapshot", "-i"] }
 ```
 
-Click a ref, then
+Click a visible ref, then refresh refs after navigation or a DOM update:
 
 ```json
 { "args": ["click", "@e2"] }
 { "args": ["snapshot", "-i"] }
 ```
 
-Run a multi-step
+Run a multi-step flow in one tool call:
 
 ```json
 { "args": ["batch"], "stdin": "[[\"open\",\"https://example.com\"],[\"snapshot\",\"-i\"]]" }
 ```
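
The escaped `stdin` string in the batch call above is easy to get wrong when written by hand. A minimal TypeScript sketch, assuming a hypothetical `batchCall` helper and `AgentBrowserCall` type (neither ships with this package; only the `{ args, stdin }` shape comes from the README), lets `JSON.stringify` handle the escaping:

```typescript
// Hypothetical types and helper for illustration; not part of pi-agent-browser-native.
type BatchStep = string[];

interface AgentBrowserCall {
  args: string[];
  stdin?: string;
  sessionMode?: "auto" | "fresh";
}

// Serialize the steps with JSON.stringify so quotes and backslashes are escaped for us.
function batchCall(steps: BatchStep[]): AgentBrowserCall {
  if (steps.length === 0) {
    throw new Error("batch needs at least one step");
  }
  return { args: ["batch"], stdin: JSON.stringify(steps) };
}

const call = batchCall([
  ["open", "https://example.com"],
  ["snapshot", "-i"],
]);

// Prints the same tool-call shape as the README example.
console.log(JSON.stringify(call));
```

Building the payload from an array of token arrays keeps each step in the same form as a single-call `args` list, so a step can move between a standalone call and a batch without re-quoting.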
 
-Evaluate page JavaScript
+Evaluate page JavaScript through stdin:
 
 ```json
 { "args": ["eval", "--stdin"], "stdin": "document.title" }
 ```
 
-
+Save an auth profile without putting the password in `args`:
+
+```json
+{ "args": ["auth", "save", "demo", "--password-stdin"], "stdin": "<password>" }
+```
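
As a rough illustration of why the password travels through `stdin` rather than `args`, the sketch below builds the call and produces a copy that is safe to log locally. The helper names `authSaveCall` and `redactForLog` and the `[REDACTED]` placeholder are assumptions made for this example; the wrapper's own redaction logic is not shown in this diff and may differ.

```typescript
// Hypothetical helpers for illustration; not the wrapper's actual redaction code.
interface AgentBrowserCall {
  args: string[];
  stdin?: string;
}

// Keep the secret out of args: only the flag appears there, the value goes to stdin.
function authSaveCall(name: string, password: string): AgentBrowserCall {
  return { args: ["auth", "save", name, "--password-stdin"], stdin: password };
}

// Return a copy with stdin hidden whenever the call carries --password-stdin.
function redactForLog(call: AgentBrowserCall): AgentBrowserCall {
  const carriesPassword = call.args.indexOf("--password-stdin") !== -1;
  if (carriesPassword && call.stdin !== undefined) {
    return { args: call.args.slice(), stdin: "[REDACTED]" };
  }
  return call;
}

const authCall = authSaveCall("demo", "example-password");

// Log only the redacted copy; the original call object still holds the real value.
console.log(JSON.stringify(redactForLog(authCall)));
```

Because the flag is an ordinary token in `args`, a caller-side check like this can decide what to hide without parsing the secret itself.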
+
+Download a file from a known link or control:
 
 ```json
 { "args": ["download", "@e5", "/tmp/report.pdf"] }
 ```
 
-For
+For asynchronous exports, click first and then wait for the download:
 
 ```json
 { "args": ["click", "@export"] }
 { "args": ["wait", "--download", "/tmp/report.csv"] }
 ```
 
-
+With upstream `agent-browser 0.27.0`, treat `details.savedFilePath` as upstream-reported metadata and confirm `details.artifacts[].exists` before relying on the requested `wait --download <path>` file being present on disk.
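
The check described above can be sketched as a small guard. The field names `savedFilePath`, `artifacts`, `exists`, `requestedPath`, and `absolutePath` appear in the README text; the surrounding `ResultDetails` shape and the `confirmedDownloadPath` helper are assumptions for illustration, not the documented tool contract.

```typescript
// Assumed result shape for illustration; see docs/TOOL_CONTRACT.md for the real contract.
interface Artifact {
  requestedPath?: string;
  absolutePath?: string;
  exists?: boolean;
}

interface ResultDetails {
  savedFilePath?: string;
  artifacts?: Artifact[];
}

// Return the requested path only when a matching artifact reports exists === true.
// savedFilePath alone is treated as upstream-reported metadata, not as proof.
function confirmedDownloadPath(details: ResultDetails, requestedPath: string): string | undefined {
  for (const artifact of details.artifacts ?? []) {
    const matches =
      artifact.requestedPath === requestedPath || artifact.absolutePath === requestedPath;
    if (matches && artifact.exists === true) {
      return requestedPath;
    }
  }
  return undefined;
}

const details: ResultDetails = {
  savedFilePath: "/tmp/report.csv",
  artifacts: [{ requestedPath: "/tmp/report.csv", exists: false }],
};

// Logs whether the requested file was confirmed on disk.
console.log(confirmedDownloadPath(details, "/tmp/report.csv") !== undefined);
```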
+
+Start a fresh profiled browser after the implicit public-browsing session already exists:
 
 ```json
-{ "args": ["
+{ "args": ["--profile", "Default", "open", "https://example.com/account"], "sessionMode": "fresh" }
 ```
 
-
+After a successful unnamed fresh launch, later default `sessionMode: "auto"` calls follow that browser automatically.
+
+## Authenticated/profile workflows
+
+The wrapper does not clone profiles or hide what upstream Chrome profile you chose. Passing `--profile` is an explicit upstream `agent-browser` choice.
+
+Use these rules:
+
+- Use public/temp profiles for tests and examples.
+- Use `sessionMode: "fresh"` when switching from public browsing to `--profile`, `--session-name`, `--cdp`, `--state`, `--auto-connect`, `--init-script`, or `--enable`.
+- Use `--session` when you want to manage a live upstream session name yourself.
+- Do not treat `--session` as persisted auth or tab restore after `close`; use `--profile`, `--session-name`, or `--state` for persistence.
+- Prefer page actions and storage checks over cookie dumps. `cookies get` can expose real profile cookies.
+- Prefer `auth save --password-stdin` over putting passwords in `args`.
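
The `sessionMode` rule in the list above can be expressed as a small decision function. This is a sketch under stated assumptions: `chooseSessionMode` is a hypothetical caller-side helper, the caller is assumed to know whether the implicit session is already active, and flags are matched only as exact tokens (a `--flag=value` form would need extra handling).

```typescript
// Launch-scoped flags as listed in the README rule; the helper itself is hypothetical.
const LAUNCH_SCOPED_FLAGS: readonly string[] = [
  "--profile",
  "--session-name",
  "--cdp",
  "--state",
  "--auto-connect",
  "--init-script",
  "--enable",
];

type SessionMode = "auto" | "fresh";

// Pick "fresh" when an implicit session is already active and the call carries a
// launch-scoped flag, unless the caller names its own session with --session.
function chooseSessionMode(args: string[], implicitSessionActive: boolean): SessionMode {
  const hasLaunchFlag = args.some((arg) => LAUNCH_SCOPED_FLAGS.indexOf(arg) !== -1);
  const hasExplicitSession = args.indexOf("--session") !== -1;
  if (implicitSessionActive && hasLaunchFlag && !hasExplicitSession) {
    return "fresh";
  }
  return "auto";
}

// A profile launch after public browsing; logs the chosen mode.
console.log(chooseSessionMode(["--profile", "Default", "open", "https://example.com/account"], true));
```

Treating an explicit `--session` as an opt-out mirrors the README's distinction between extension-managed sessions and sessions the caller manages directly.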
+
+Example explicit session plus profile launch:
 
 ```json
-{
+{
+"args": ["--session", "auth-flow", "--profile", "Default", "open", "https://example.com/account"]
+}
 ```
 
-
+## React, SPA, and first-navigation setup
 
-React and SPA tooling
+React and SPA tooling from upstream `agent-browser` is passed through directly.
+
+Launch React introspection before first navigation:
 
 ```json
 { "args": ["open", "--enable", "react-devtools", "https://example.com"], "sessionMode": "fresh" }
@@ -196,11 +217,16 @@ React and SPA tooling added upstream in `agent-browser` v0.27.0 is passed throug
 { "args": ["react", "renders", "start"] }
 { "args": ["react", "renders", "stop"] }
 { "args": ["react", "suspense", "--only-dynamic"] }
-
+```
+
+Use SPA and Web Vitals helpers as normal command tokens:
+
+```json
 { "args": ["pushstate", "/dashboard"] }
+{ "args": ["vitals", "https://example.com", "--json"] }
 ```
 
-For first
+For setup that must happen before first navigation, open a blank fresh page, stage routes/cookies/scripts, then navigate:
 
 ```json
 { "args": ["open"], "sessionMode": "fresh" }
@@ -209,68 +235,93 @@ For first-navigation setup, launch a fresh blank page before staging routes, coo
 { "args": ["navigate", "https://example.com"] }
 ```
 
-
+## Proof and verification
 
-
-
+The local verification gate is:
+
+```bash
+npm run verify
 ```
 
-
+It runs:
 
-
-
+- generated playbook/documentation drift checks
+- `tsc --noEmit`
+- the test suite
+- command-reference baseline checks
+- live command-reference verification against the targeted installed upstream `agent-browser`
+
+The opt-in real-upstream suite is separate because it drives a real browser installation:
+
+```bash
+npm run verify -- real-upstream
 ```
 
+For package release confidence, follow [`docs/RELEASE.md`](docs/RELEASE.md). The release gate is:
+
+```bash
+npm run doctor
+npm run verify -- release
+```
+
+`npm run verify -- release` includes the default verification gate plus packaged Pi smoke coverage. The package also has a `prepublishOnly` hook that runs default verification and `npm pack --dry-run` during `npm publish`.
+
+## How it works
+
+`pi-agent-browser-native` is intentionally thin:
+
+1. Pi loads `extensions/agent-browser/index.ts` from the package manifest.
+2. The extension registers one native tool named `agent_browser`.
+3. Tool calls are translated into upstream `agent-browser` CLI invocations with controlled args, stdin, environment, timeout, and session planning.
+4. Upstream JSON/plain-text output is parsed into model-friendly content and structured details.
+5. Screenshots, downloads, recordings, traces, profiles, and spill files are normalized as Pi-visible artifacts where possible.
+6. Generated playbook text in docs and tool metadata stays aligned with `extensions/agent-browser/lib/playbook.ts`.
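
Step 3 of the list above can be pictured as a pure planning function that turns a tool call into a CLI invocation. This is a simplified sketch, not the code in `extensions/agent-browser/index.ts`: the `planInvocation` and `acceptsStdin` helpers are hypothetical, the stdin rule is an assumption inferred from the README examples (`batch`, `eval --stdin`, `auth save --password-stdin`), and environment, timeout, and session planning are left out entirely.

```typescript
// Hypothetical planning sketch; the real wrapper also handles env, timeout, and sessions.
interface ToolCall {
  args: string[];
  stdin?: string;
  sessionMode?: "auto" | "fresh";
}

interface SpawnPlan {
  command: string;
  argv: string[];
  stdin?: string;
}

// Assumed rule: stdin is only meaningful for the command forms shown in the README.
// Tokens are matched loosely by presence, which is enough for a sketch.
function acceptsStdin(args: string[]): boolean {
  const has = (token: string): boolean => args.indexOf(token) !== -1;
  return (
    has("batch") ||
    (has("eval") && has("--stdin")) ||
    (has("auth") && has("save") && has("--password-stdin"))
  );
}

// Build an argv-based plan so no shell string is ever assembled or quoted.
function planInvocation(call: ToolCall): SpawnPlan {
  if (call.args.length === 0) {
    throw new Error("args must not be empty");
  }
  if (call.stdin !== undefined && !acceptsStdin(call.args)) {
    throw new Error("stdin is not accepted for this command");
  }
  const plan: SpawnPlan = { command: "agent-browser", argv: call.args.slice() };
  if (call.stdin !== undefined) {
    plan.stdin = call.stdin;
  }
  return plan;
}

const plan = planInvocation({ args: ["eval", "--stdin"], stdin: "document.title" });
console.log(plan.command, plan.argv.join(" "));
```

Passing `argv` as an array rather than a joined string is the design point: it sidesteps the shell-quoting mistakes the README lists under "The problem".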
+
+The upstream browser engine remains [`agent-browser`](https://agent-browser.dev/). This package does not bundle it and does not maintain compatibility shims for old upstream versions.
+
+## Current limits
+
+- Published pre-1.0 package.
+- Targets the current locally installed upstream `agent-browser` version only.
+- Does not bundle `agent-browser`; users install it separately.
+- Does not provide a human browser UI inside Pi; the primary UX is agent-invoked tool calls.
+- Real authenticated profile use is powerful but sensitive. Treat profile and cookie access as user-approved, task-specific behavior.
+- Wrapper tab/session recovery is best effort around observed upstream behavior, not a replacement for explicit profile/session design.
+
 ## Local development
 
-
+Install upstream `agent-browser`, then install dependencies:
 
-
+```bash
+npm install
+```
 
-
-1. Install `agent-browser` separately via the upstream project.
-2. Run `npm install`.
-3. For a quick checkout-only smoke test, launch `pi` from this repository root with discovery disabled:
+Quick isolated checkout smoke test:
 
 ```bash
 pi --no-extensions -e .
 ```
 
-
-5. For hot-reload or resume validation, run `npm run verify -- lifecycle` or configure exactly one active source for this extension in Pi settings, launch plain `pi`, and exercise `/reload` plus restart/`/resume`. Settings matter only in this configured-source mode; they are bypassed by `--no-extensions -e .`. See [`docs/RELEASE.md`](docs/RELEASE.md) for the automated harness behavior, cleanup, and transcript retention details.
+This bypasses Pi settings and configured extensions. After editing extension code, restart that Pi process to test the new checkout.
 
-
+Configured-source lifecycle validation:
 
-```
-
+```bash
+npm run verify -- lifecycle
 ```
 
-
+Use lifecycle validation when testing `/reload`, full restart, `/resume`, managed-session continuity, or persisted artifact behavior.
+
+Installed-package validation after publish:
 
 ```bash
 npm run verify -- package-pi
 pi --no-extensions -e npm:pi-agent-browser-native@<version>
 ```
 
-
-
-
-
-- open a page and snapshot it
-- click a link and confirm the destination title
-- use an explicit `--session` across multiple tool calls
-- use an explicit `--profile` and verify persisted browser storage across restarts
-- open `chat.com` or `chatgpt.com` headlessly with `--profile Default` without forcing `--headed` or `--auto-connect`
-- in configured-source lifecycle mode, verify `/reload` and full restart + `/resume` keep following the same implicit managed browser session
-- run `batch` with JSON via `stdin`
-- run `eval --stdin`
-- take a screenshot with inline attachment support and visible artifact metadata: artifact type, requested path, absolute path, existence, size, cwd, session, and repair/copy status when applicable
-- inspect upstream help/version through native tool calls like `{ "args": ["--help"] }` and `{ "args": ["--version"] }` via the tool's stateless plain-text inspection fallback
-- use `download <selector> <path>` for direct attachment/file-save workflows instead of trying to infer downloads from generic clicks or large eval dumps
-- for `.dogfood/...` or other dot-directory screenshot paths, rely on the wrapper's path normalization/repair contract; the visible result reports the requested path and absolute path rather than only an upstream temp path
-- use `click` plus `wait --download <path>` for asynchronous export flows, confirm `details.savedFilePath`/`details.savedFile` are present on the wait result or batch wait step, and check `details.artifacts[].exists` before relying on requested-path persistence
-- confirm oversized outputs show the actual spill file path directly in tool content, not just a details key name
-- inspect `details.artifactManifest` / `details.artifactRetentionSummary` during artifact-heavy flows to recover recent saved files, spill files, and visible eviction state after reload/resume
+## Generated native-tool playbook notes
+
+These sections are generated from `extensions/agent-browser/lib/playbook.ts`. Run `npm run docs -- playbook write` after changing the canonical playbook source.
 
 <!-- agent-browser-playbook:start inspection -->
 <!-- Generated from extensions/agent-browser/lib/playbook.ts. Run `npm run docs -- playbook write` to update. -->
@@ -282,14 +333,6 @@ Native inspection calls use the `agent_browser` tool shape, not shell-like direc
 These calls return plain text and stay stateless: the extension does not inject its implicit session and does not let inspection consume the managed-session slot needed for later profile, session, CDP, state, or auto-connect launches.
 <!-- agent-browser-playbook:end inspection -->
 
-Current cautions:
-- passing `--profile` is an explicit upstream choice; this extension does not add its own profile-cloning or isolation layer
-- launch-scoped flags like `--profile`, `--session-name`, `--cdp`, `--state`, `--auto-connect`, `--init-script`, and `--enable` are for the first command that launches a session; if the implicit session is already active, retry that call with `sessionMode: "fresh"` or provide an explicit `--session ...` for the new launch
-- implicit `piab-*` sessions are extension-managed convenience sessions; they stay alive across `/reload` and resumable session transitions so later default calls can keep following the active managed browser on `/reload` or `/resume`, close when the originating `pi` process quits, rely on the configured idle timeout only as an abnormal-exit backstop, store persisted-session large snapshot spill files under a private session-scoped artifact directory with a bounded per-session budget so `details.fullOutputPath` and metadata-only `details.artifactManifest` survive reload/resume without unbounded growth, and still clean up process-private temp spill artifacts on shutdown
-- `sessionMode: "fresh"` without an explicit `--session` rotates that extension-managed session to the new browser so later auto calls keep using it
-- for local Unix launches, the wrapper uses a short private socket directory under `/tmp` so extension-generated session names do not trip upstream Unix socket-path limits in longer cwd/session-name combinations
-- wrapper-spawned commands clamp `AGENT_BROWSER_DEFAULT_TIMEOUT` to 25 seconds and use a 28-second process watchdog so a single upstream CLI call does not cross the upstream 30-second IPC read-timeout/retry path; split intentionally long waits into shorter tool calls
-- for direct headless local Chrome launches to `chat.com`, `chatgpt.com`, and `chat.openai.com`, the extension injects a normal Chrome user agent when the caller did not explicitly provide `--user-agent`; this keeps the default headless workflow usable without forcing `--headed` or `--auto-connect`
 <!-- agent-browser-playbook:start wrapper-tab-recovery -->
 <!-- Generated from extensions/agent-browser/lib/playbook.ts. Run `npm run docs -- playbook write` to update. -->
 - After launch-scoped open/goto/navigate calls that can restore existing tabs (for example --profile, --session-name, or --state), agent_browser best-effort re-selects the tab whose URL matches the returned page when restored tabs steal focus during launch.
@@ -297,59 +340,32 @@ Current cautions:
 - After a successful command on a known target tab, agent_browser also best-effort restores that intended tab if a restored/background tab steals focus after the command completes.
 - If a known session target unexpectedly reports about:blank, agent_browser preserves the prior intended target, best-effort re-selects it when it still exists, and reports exact recovery guidance when it cannot be re-selected.
 <!-- agent-browser-playbook:end wrapper-tab-recovery -->
-- oversized snapshots and oversized generic outputs compact inline content and print the actual spill file path directly in the tool result when a spill file exists; recent spills and explicit saved artifacts are also summarized in `details.artifactManifest`, including `evicted` entries when retention budgets remove older persisted files
-- artifact-producing commands render direct readable artifact metadata in visible content and `details.artifacts`: `kind`/`artifactType`, `path`, `requestedPath`, `absolutePath`, `exists`, `sizeBytes`, `status`, `cwd`, `session`, and `tempPath` when the wrapper repaired an upstream temp fallback
-- if the caller explicitly passes `--json`, the visible text content is valid JSON; for `stream status`, the wrapper enriches data with `wsUrl` and `frameFormat`
-- `trace` and `profiler` share upstream tracing machinery; the wrapper blocks starts/stops that conflict with owner state it observed in the current Pi session, but the message says "wrapper believes" because upstream or external CLI calls can desynchronize that local state
-- explicit caller-provided `--session` values are treated as user-managed and are not auto-closed by the extension
-- explicit caller-provided `--user-agent` values win over the ChatGPT/OpenAI compatibility workaround
-- tool progress/details redact sensitive invocation values such as `--headers`, proxy credentials, and auth-bearing URL parameters before echoing them back into Pi
-
-### Switching from public browsing to a fresh profile/debug launch
-
-A common agent workflow is:
-
-1. browse a public page with the default implicit session
-2. then switch to a fresh authenticated/profile/debug launch
-
-Use `sessionMode: "fresh"` for that transition instead of relying on the implicit session:
-
-```json
-{
-"args": ["--profile", "Default", "open", "https://example.com/account"],
-"sessionMode": "fresh"
-}
-```
-
-After that call succeeds, later default `sessionMode: "auto"` calls continue in the new fresh browser.
-
-If you want to name the new upstream session yourself, pass an explicit session instead:
-
-```json
-{
-"args": ["--session", "auth-flow", "--profile", "Default", "open", "https://example.com/account"]
-}
-```
 
-##
+## Project map
 
-
-
-
-
-
+| Path | Purpose |
+|---|---|
+| `extensions/agent-browser/index.ts` | Pi extension entrypoint and native tool wrapper |
+| `extensions/agent-browser/lib/runtime.ts` | Args, session planning, redaction, process, and runtime helpers |
+| `extensions/agent-browser/lib/results/` | Model-facing result rendering and error guidance |
+| `extensions/agent-browser/lib/playbook.ts` | Canonical generated agent/browser guidance |
+| `docs/COMMAND_REFERENCE.md` | Repo-readable native command reference |
+| `docs/TOOL_CONTRACT.md` | Tool parameters, result shape, and behavior contract |
+| `docs/ARCHITECTURE.md` | Design decisions and implementation structure |
+| `docs/REQUIREMENTS.md` | Product requirements and constraints |
+| `docs/RELEASE.md` | Release, package, and lifecycle verification workflow |
+| `test/` | Wrapper, runtime, presentation, lifecycle, and package tests |
 
-##
+## More docs
 
-
+- [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md) — full native command reference and upstream capability baseline
+- [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md) — exact tool contract
+- [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md) — how the wrapper is designed
+- [`docs/REQUIREMENTS.md`](docs/REQUIREMENTS.md) — product constraints and non-goals
+- [`docs/RELEASE.md`](docs/RELEASE.md) — maintainer release workflow
|
345
366
|
|
|
346
|
-
|
|
347
|
-
2. update the affected design docs
|
|
348
|
-
3. update this README if user-facing expectations changed
|
|
367
|
+
## Next action
|
|
349
368
|
|
|
350
|
-
|
|
369
|
+
If you are a user, install the package and ask Pi to open a public page with `agent_browser`.
|
|
351
370
|
|
|
352
|
-
|
|
353
|
-
2. update `docs/COMMAND_REFERENCE.md`
|
|
354
|
-
3. update tool guidance, README, and release docs if behavior or recommended usage changed
|
|
355
|
-
4. verify the blocked direct-binary path still has an equally usable local extension-side documentation path
|
|
371
|
+
If you are evaluating the implementation, read [`extensions/agent-browser/index.ts`](extensions/agent-browser/index.ts), then run `npm run verify`.
|
package/docs/ARCHITECTURE.md
CHANGED
|
@@ -31,7 +31,7 @@ The extension should:
|
|
|
31
31
|
- resolve `agent-browser` from `PATH`
|
|
32
32
|
- invoke it directly, not through a shell
|
|
33
33
|
- inject `--json`
|
|
34
|
-
- support optional stdin only for `eval --stdin` and `
|
|
34
|
+
- support optional stdin only for `eval --stdin`, `batch`, and `auth save --password-stdin`, rejecting other command/stdin combinations before launch
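
The launch rules in the list above could be sketched like this with Node's `child_process.spawn`. It is a minimal illustration, not the extension's actual code: `buildLaunch` and `runAgentBrowser` are hypothetical names, the position of the injected `--json` token is an assumption, and commands the wrapper may exempt from `--json` injection are ignored here.

```typescript
import { spawn } from "node:child_process";

interface LaunchPlan {
  command: string;
  args: string[];
}

// Build the argv for a launch: inject --json unless the caller already passed it.
function buildLaunch(userArgs: string[]): LaunchPlan {
  const args = userArgs.includes("--json") ? [...userArgs] : ["--json", ...userArgs];
  return { command: "agent-browser", args };
}

// Run the binary directly (no shell) and collect stdout.
function runAgentBrowser(
  userArgs: string[],
  stdin?: string,
): Promise<{ code: number | null; stdout: string }> {
  const { command, args } = buildLaunch(userArgs);
  return new Promise((resolve, reject) => {
    // shell: false passes tokens as-is; the binary name is resolved from PATH.
    const child = spawn(command, args, { shell: false });
    let stdout = "";
    child.stdout.on("data", (chunk: Buffer) => {
      stdout += chunk.toString("utf8");
    });
    child.on("error", reject); // e.g. agent-browser is not on PATH
    child.on("close", (code) => resolve({ code, stdout }));
    // Always close stdin so the child never waits for input it will not get.
    child.stdin.end(stdin ?? "");
  });
}
```

Because no shell is involved, tokens such as URLs with `&` or `?` need no quoting or escaping.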
|
|
35
35
|
|
|
36
36
|
### Agent-first UX
|
|
37
37
|
|
|
package/docs/COMMAND_REFERENCE.md
CHANGED
|
@@ -34,7 +34,7 @@ Tool parameters:
|
|
|
34
34
|
```
|
|
35
35
|
|
|
36
36
|
- `args`: exact `agent-browser` CLI tokens after the binary name.
|
|
37
|
-
- `stdin`: only for `batch` and `
|
|
37
|
+
- `stdin`: only for `batch`, `eval --stdin`, and `auth save --password-stdin`; other command/stdin combinations are rejected before `agent-browser` is launched.
|
|
38
38
|
- `sessionMode`:
|
|
39
39
|
- `"auto"` reuses the extension-managed session when possible.
|
|
40
40
|
- `"fresh"` rotates that managed session to a fresh upstream launch so launch-scoped flags like `--profile`, `--session-name`, `--cdp`, `--state`, `--auto-connect`, `--init-script`, or `--enable` apply.
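
The `stdin` rule in the parameter list above can be sketched as a pre-launch check. This is a simplified illustration, not the wrapper's implementation: `validateStdin` is a hypothetical name, and matching commands by token presence is an assumption that ignores edge cases such as a value that happens to equal `batch`.

```typescript
// Hypothetical shape of a native tool call, based on the parameters listed above.
interface AgentBrowserCall {
  args: string[];
  stdin?: string;
  sessionMode?: "auto" | "fresh";
}

// Returns an error message when stdin is supplied for an unsupported command,
// or undefined when the call may proceed to launch.
function validateStdin(call: AgentBrowserCall): string | undefined {
  if (call.stdin === undefined) return undefined;
  const { args } = call;
  const authIndex = args.indexOf("auth");
  const isAuthSave = authIndex !== -1 && args[authIndex + 1] === "save";
  const allowed =
    args.includes("batch") ||
    (args.includes("eval") && args.includes("--stdin")) ||
    (isAuthSave && args.includes("--password-stdin"));
  return allowed
    ? undefined
    : "stdin is only supported for batch, eval --stdin, and auth save --password-stdin";
}

// Usage: the password travels through stdin rather than args.
const saveCall: AgentBrowserCall = {
  args: ["auth", "save", "demo", "--url", "https://example.com/login", "--username", "alice", "--password-stdin"],
  stdin: "example-password",
};
console.log(validateStdin(saveCall) ?? "ok to launch");
```

Rejecting the call before launch means a stray `stdin` body never reaches a command that would ignore it silently.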
|
|
@@ -220,7 +220,7 @@ The tables below intentionally list more than the recommended workflow. Rare com
|
|
|
220
220
|
|
|
221
221
|
### Built-in skills
|
|
222
222
|
|
|
223
|
-
Native-tool note: upstream skills are written for the standalone `agent-browser` CLI and may show bash/heredoc examples. In pi, convert those examples to `agent_browser` calls: pass CLI tokens in `args`, and pass heredoc/stdin bodies through the tool `stdin` field for `batch` or `
|
|
223
|
+
Native-tool note: upstream skills are written for the standalone `agent-browser` CLI and may show bash/heredoc examples. In pi, convert those examples to `agent_browser` calls: pass CLI tokens in `args`, and pass heredoc/stdin bodies through the tool `stdin` field for `batch`, `eval --stdin`, or `auth save --password-stdin`.
|
|
224
224
|
|
|
225
225
|
| Command | Purpose |
|
|
226
226
|
| --- | --- |
|
|
@@ -300,9 +300,11 @@ These calls return plain text and stay stateless: the extension does not inject
|
|
|
300
300
|
| `cookies [get|set|clear]` | Manage cookies. `set` supports `--url`, `--domain`, `--path`, `--httpOnly`, `--secure`, `--sameSite`, `--expires`, and `--curl <file>` for JSON, cURL, or bare Cookie-header bulk imports. |
|
|
301
301
|
| `storage <local|session>` | Manage web storage. |
|
|
302
302
|
|
|
303
|
+
Privacy note: `cookies get` can expose real profile cookies. Do not run it against `--profile Default` or other authenticated profiles unless the user explicitly needs cookie inspection; prefer task-specific page actions and storage checks.
|
|
304
|
+
|
|
303
305
|
### Tabs
|
|
304
306
|
|
|
305
|
-
Stable tab ids look like `t1`, `t2`, and `t3`. Optional user labels such as `docs` or `app` are interchangeable with ids wherever a tab reference is accepted.
|
|
307
|
+
Stable tab ids look like `t1`, `t2`, and `t3`. Optional user labels such as `docs` or `app` are interchangeable with ids wherever a tab reference is accepted. Upstream help may refer to numeric tab positions, but this wrapper guidance uses stable `t<N>` ids because positional integers are not accepted by current upstream `agent-browser`.
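
As a rough illustration of the tab reference rule above, a caller could classify references before building a command. `parseTabRef` and the `TabRef` type are hypothetical names for this sketch, and treating any other non-empty string as a label is an assumption.

```typescript
// Hypothetical classification of a tab reference.
type TabRef = { kind: "id"; value: string } | { kind: "label"; value: string };

function parseTabRef(input: string): TabRef {
  const ref = input.trim();
  if (ref === "") {
    throw new Error("Tab reference must not be empty");
  }
  // Stable ids follow the t<N> pattern: t1, t2, t3, ...
  if (/^t\d+$/.test(ref)) {
    return { kind: "id", value: ref };
  }
  // Bare integers are positional indexes, which the text above says are not accepted.
  if (/^\d+$/.test(ref)) {
    throw new Error(`Positional tab index "${ref}" is not accepted; use a stable id such as t1 or a label`);
  }
  // Assumption: anything else is a user label such as "docs" or "app".
  return { kind: "label", value: ref };
}

console.log(parseTabRef("t2").kind);   // id
console.log(parseTabRef("docs").kind); // label
```

Failing early on a bare integer gives the agent a clear correction instead of an upstream error.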
|
|
306
308
|
|
|
307
309
|
| Command | Purpose |
|
|
308
310
|
| --- | --- |
|
|
@@ -377,7 +379,7 @@ When these diagnostic commands are invoked through the native `agent_browser` to
|
|
|
377
379
|
| Command | Purpose |
|
|
378
380
|
| --- | --- |
|
|
379
381
|
| `batch [--bail] ["cmd" ...]` | Execute multiple commands sequentially from args or stdin. |
|
|
380
|
-
| `auth save <name> [opts]` | Save an auth profile with options such as `--url`, `--username`, `--password`, or `--password-stdin`. |
|
|
382
|
+
| `auth save <name> [opts]` | Save an auth profile with options such as `--url`, `--username`, `--password`, or `--password-stdin`. Prefer `--password-stdin` with the tool `stdin` field; avoid putting passwords in `args`. |
|
|
381
383
|
| `auth login <name>` | Login using saved credentials. |
|
|
382
384
|
| `auth list` | List saved auth profiles. |
|
|
383
385
|
| `auth show <name>` | Show auth profile metadata. |
|