pi-agent-browser-native 0.2.1 → 0.2.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md CHANGED
@@ -1,5 +1,23 @@
1
1
  # Changelog
2
2
 
3
+ ## 0.2.3 - 2026-04-13
4
+
5
+ ### Fixed
6
+ - direct headless local Chrome launches to `chatgpt.com` and `chat.openai.com` now inject a normal Chrome user agent when the caller did not explicitly choose one, keeping authenticated ChatGPT/OpenAI browsing working without forcing `--headed` or `--auto-connect`
7
+ - profiled `open` / `goto` / `navigate` calls now best-effort switch back to the page that was just opened when restored profile tabs steal focus during launch, reducing confusing cross-tab drift in profile-backed sessions
8
+ - command parsing now treats additional value-taking global flags like `--user-agent`, `--args`, `--allowed-domains`, `--action-policy`, and related launch options as launch metadata instead of accidentally parsing their values as subcommands
9
+ - README, requirements, architecture notes, and tool-contract docs now describe the new headless ChatGPT/OpenAI compatibility behavior and the profiled-tab focus recovery path
10
+
11
+ ## 0.2.2 - 2026-04-12
12
+
13
+ ### Fixed
14
+ - plain-text inspection commands like `agent_browser --help` and `--version` now stay stateless: they no longer claim the implicit managed session or leave behind ambiguous `parseError` details on success
15
+ - extension-managed session ownership is now reconstructed from persisted tool details on resume/reload while still preserving cwd-hash isolation across same-named checkouts and worktrees
16
+ - echoed tool updates/details now redact sensitive invocation values and structured secret-bearing fields instead of replaying headers, proxy credentials, cookies, or auth-bearing URL params back into `pi`
17
+ - the subprocess wrapper no longer forwards ambient parent-shell `AGENT_BROWSER_*` state into child runs, reducing surprising hidden configuration leaks from the caller environment
18
+ - browser-specific system-prompt injection is now minimal and only added for clearly browser-oriented turns, while the full playbook stays in tool metadata where it belongs
19
+ - published docs and changelog notes now match the current result/details contract, resume behavior, prompt behavior, and release workflow
20
+
3
21
  ## 0.2.1 - 2026-04-12
4
22
 
5
23
  ### Fixed
@@ -18,7 +36,7 @@
18
36
  - plain-text inspection commands like `agent_browser --help` and `--version` are now always allowed, removing the old prompt-dependent inspection gate and making the inspection contract local and predictable
19
37
  - navigation actions like `click`, `dblclick`, `back`, `forward`, and `reload` now include lightweight post-action title/url summaries when the wrapper can address the active session, reducing guess-and-check follow-up snapshots
20
38
  - compact snapshot rendering is leaner by default: fewer additional sections, fewer refs, smaller role summaries, and the raw spill path now stays in `details.fullOutputPath` instead of dominating the visible snapshot body
21
- - README and injected tool guidance now include a compact agent quick start with the core call shapes for `open` + `snapshot`, `click` + re-snapshot, `batch`, `eval --stdin`, and fresh profiled launches
39
+ - README and tool prompt guidance now include a compact agent quick start with the core call shapes for `open` + `snapshot`, `click` + re-snapshot, `batch`, `eval --stdin`, and fresh profiled launches, while turn-level system-prompt injection stays minimal
22
40
 
23
41
  ### Migration notes
24
42
  - replace any use of `useActiveSession` with `sessionMode`
@@ -29,7 +47,7 @@
29
47
  ### Changed
30
48
  - hardened the implicit browser-session lifecycle so failed first launches no longer mark the convenience session active, startup-scoped flags behave correctly across launches and closes, and the highest-risk entrypoint paths now have direct automated and isolated-`pi` coverage
31
49
  - added explicit temp-root ownership markers, aggregate spill-file disk budgeting, inline image size limits, and graceful fallback behavior when large snapshot or stdout artifacts exceed temp budgets
32
- - consolidated the shared browser operating playbook across the injected system prompt and tool prompt guidance while adding direct extension-hook coverage for prompt injection, bash blocking, and session resets
50
+ - consolidated the shared browser operating playbook into the tool prompt guidance while keeping turn-level system-prompt injection minimal, and added direct extension-hook coverage for prompt injection, bash blocking, and session resets
33
51
  - split the old result-rendering god module into focused envelope, presentation, shared, and snapshot modules, and made snapshot compaction fall back to a resilient outline mode when upstream raw snapshot formatting is unfamiliar
34
52
  - refactored the release-package verification script into smaller testable helpers, preserved the retired autoload-shim guard, and aligned the tarball gate with the split result-rendering module layout
35
53
 
package/README.md CHANGED
@@ -178,19 +178,25 @@ Validated workflow examples:
178
178
  - click a link and confirm the destination title
179
179
  - use an explicit `--session` across multiple tool calls
180
180
  - use an explicit `--profile` and verify persisted browser storage across restarts
181
+ - open `chatgpt.com` headlessly with `--profile Default` without forcing `--headed` or `--auto-connect`
182
+ - verify `/reload` and full restart + `/resume` keep following the same implicit managed browser session
181
183
  - run `batch` with JSON via `stdin`
182
184
  - run `eval --stdin`
183
185
  - take a screenshot with inline attachment support
184
- - inspect `agent_browser --help` and `--version` via the tool's plain-text inspection fallback
186
+ - inspect `agent_browser --help` and `--version` via the tool's stateless plain-text inspection fallback
185
187
 
186
- Inspection commands like `agent_browser --help` and `--version` are always supported. They return plain text and are useful for debugging or capability checks, but they are not required for normal browsing workflows.
188
+ Inspection commands like `agent_browser --help` and `--version` are always supported. They return plain text, are useful for debugging or capability checks, and stay stateless: the extension does not inject its implicit session for them and they do not consume the managed-session slot needed for a later `--profile`, `--session-name`, or `--cdp` launch.
187
189
 
188
190
  Current cautions:
189
191
  - passing `--profile` is an explicit upstream choice; this extension does not add its own profile-cloning or isolation layer
190
192
  - startup-scoped flags like `--profile`, `--session-name`, and `--cdp` are for the first command that launches a session; if the implicit session is already active, retry that call with `sessionMode: "fresh"` or provide an explicit `--session ...` for the new launch
191
- - implicit `piab-*` sessions are extension-managed convenience sessions; they are best-effort closed on `pi` shutdown, get an idle timeout to reduce stale background daemons, and clean up private temp spill artifacts on shutdown
193
+ - implicit `piab-*` sessions are extension-managed convenience sessions; they stay alive across `pi` shutdown/reload so later default calls can keep following the active managed browser on `/reload` or `/resume`, rely on the configured idle timeout to reduce stale background daemons, store persisted-session large snapshot spill files under a private session-scoped artifact directory with a bounded per-session budget so `details.fullOutputPath` survives reload/resume without unbounded growth, and still clean up process-private temp spill artifacts on shutdown
192
194
  - `sessionMode: "fresh"` without an explicit `--session` rotates that extension-managed session to the new browser so later auto calls keep using it
195
+ - for direct headless local Chrome launches to `chatgpt.com` and `chat.openai.com`, the extension injects a normal Chrome user agent when the caller did not explicitly provide `--user-agent`; this keeps the default headless workflow usable without forcing `--headed` or `--auto-connect`
196
+ - after profiled `open` calls, the extension best-effort re-selects the tab that matches the returned page URL when restored profile tabs steal focus during launch
193
197
  - explicit caller-provided `--session` values are treated as user-managed and are not auto-closed by the extension
198
+ - explicit caller-provided `--user-agent` values win over the ChatGPT/OpenAI compatibility workaround
199
+ - tool progress/details redact sensitive invocation values such as `--headers`, proxy credentials, and auth-bearing URL parameters before echoing them back into Pi
194
200
 
195
201
  ### Switching from public browsing to a fresh profile/debug launch
196
202
 
@@ -78,16 +78,18 @@ V1 ownership rule:
78
78
  - implicit auto-generated sessions are extension-managed convenience sessions
79
79
  - unnamed `sessionMode: "fresh"` launches rotate that extension-managed session to a new upstream browser
80
80
  - explicit/user-managed sessions are not auto-managed by default
81
- - extension-managed sessions should be reusable during an active `pi` session, but should still be cleaned up predictably
81
+ - extension-managed sessions should be reusable during an active `pi` session and across `/reload` / `/resume`, while still being cleaned up predictably
82
82
 
83
83
  Practical policy:
84
- - on normal `pi` shutdown, best-effort close the current extension-managed session
85
- - also set an idle timeout on extension-managed sessions so abandoned daemons self-clean after inactivity
86
- - clean up private temp spill artifacts owned by the extension-managed session on shutdown
84
+ - preserve the current extension-managed session across normal `pi` shutdown/reload so persisted sessions can keep following the live browser after `/reload` or `/resume`
85
+ - set an idle timeout on extension-managed sessions so abandoned daemons self-clean after inactivity
86
+ - clean up process-private temp spill artifacts on shutdown, but keep persisted-session snapshot spill files in a private session-scoped artifact directory with a bounded per-session budget so `details.fullOutputPath` stays usable after reload/resume without unbounded growth
87
+ - reconstruct the current extension-managed session from persisted tool details on resume/reload so later default calls keep following the active managed browser
87
88
  - if an unnamed fresh launch replaces an active extension-managed session, best-effort close the old managed session after the switch succeeds
88
89
  - leave explicit caller-provided `--session` choices alone unless the caller closes them explicitly
90
+ - after profiled `open` / `goto` / `navigate` calls, verify the active tab still matches the returned page URL and best-effort switch back when restored profile tabs steal focus
89
91
 
90
- This is primarily about ownership clarity and avoiding surprise, not adding a heavy safety wrapper. If the extension invented the session, the extension should clean it up. If the caller explicitly chose the upstream session model, the extension should stay out of the way.
92
+ This is primarily about ownership clarity and avoiding surprise, not adding a heavy safety wrapper. If the extension invented the session, the extension should own its lifecycle without breaking reload/resume semantics. If the caller explicitly chose the upstream session model, the extension should stay out of the way.
91
93
 
92
94
  ### Launch flags
93
95
 
@@ -96,6 +98,8 @@ The extension should surface that clearly and avoid hidden restart behavior in v
96
98
 
97
99
  That means explicit startup-scoping flags like `--profile`, `--session-name`, and `--cdp` should remain explicit upstream choices instead of being wrapped in extra hidden restart or cloning logic.
98
100
 
101
+ The wrapper may still apply narrow compatibility normalizations when observed behavior justifies them and the result remains thin, local, and opt-out. For example, if a specific site starts rejecting the default local headless Chrome user agent while the same flow works with a normal Chrome UA, the extension may inject a domain-specific fallback UA only when the caller did not already choose `--user-agent`, `--headed`, `--cdp`, `--auto-connect`, or a provider-backed launch.
102
+
99
103
  If the implicit session is already active and one of those startup-scoped flags appears again while `sessionMode` is still `"auto"`, the extension should fail clearly instead of silently sending a command shape that upstream would ignore.
100
104
 
101
105
  That failure should include a structured recovery hint pointing to `sessionMode: "fresh"` as the first-line fix, while still allowing an explicit `--session` when the caller wants to name the new upstream session.
package/docs/RELEASE.md CHANGED
@@ -57,13 +57,21 @@ Before publishing, also validate the explicit local-checkout path:
57
57
  2. Launch `pi --no-extensions -e .` from this repository root.
58
58
  3. Confirm the checkout extension loads from `extensions/agent-browser/index.ts`.
59
59
  4. Run a smoke prompt that exercises `agent_browser`.
60
+ 5. Validate managed-session continuity with both `/reload` and a full restart + `/resume`.
60
61
 
61
- Example prompt:
62
+ Example smoke prompt:
62
63
 
63
64
  ```text
64
65
  Use the agent_browser tool to open https://react.dev and then take an interactive snapshot.
65
66
  ```
66
67
 
68
+ Recommended lifecycle follow-up:
69
+
70
+ 1. Open a page with the implicit managed session and confirm the title.
71
+ 2. Run `/reload`, then ask for `snapshot -i` and confirm the same page is still active.
72
+ 3. Exit `pi`, relaunch it against the same session file or use `/resume`, then ask for `snapshot -i` again and confirm the same page is still active.
73
+ 4. Open a large page that compacts its snapshot output and confirm `details.fullOutputPath` still exists after the restart/resume flow.
74
+
67
75
  ## Post-publish install validation
68
76
 
69
77
  After publishing a release, validate the package-first install path explicitly:
@@ -73,7 +81,7 @@ pi install npm:pi-agent-browser-native@<version>
73
81
  pi -e npm:pi-agent-browser-native@<version>
74
82
  ```
75
83
 
76
- Then confirm `pi` exposes the native `agent_browser` tool and that a basic `open` + `snapshot -i` flow works.
84
+ Then confirm `pi` exposes the native `agent_browser` tool, that a basic `open` + `snapshot -i` flow works, and that `/reload` plus restart/`/resume` keep following the same implicit managed browser session.
77
85
 
78
86
  ## Release notes checklist
79
87
 
@@ -83,4 +91,5 @@ Before publishing:
83
91
  - confirm README install guidance still leads with the package-first flow
84
92
  - confirm the explicit local-checkout instructions still work for pre-release validation
85
93
  - rerun `npm run verify:release`
94
+ - manually exercise `/reload` and full restart + `/resume` continuity in local checkout validation
86
95
  - publish only after the tarball contents match expectations
@@ -71,7 +71,8 @@ Define the product requirements and constraints for `pi-agent-browser-native`.
71
71
 
72
72
  - The primary confidence path is a real `pi` session driven in `tmux`.
73
73
  - For local checkout validation, launch `pi --no-extensions -e .` from the repository root so only the checkout copy loads.
74
- - Prefer full `pi` restart over `/reload` when validating extension changes.
74
+ - Validate both `/reload` and a full `pi` restart with `/resume` when changes touch managed-session continuity, reload behavior, or persisted artifact paths.
75
+ - Prefer full `pi` restart over `/reload` when validating extension changes beyond a quick reload smoke check.
75
76
  - Use `/resume` when needed after restart.
76
77
  - Keep testing broader than a single smoke site like `example.com`.
77
78
  - Maintain a concrete release/package verification workflow in `docs/RELEASE.md` and matching repository scripts.
@@ -84,7 +85,8 @@ The design should comfortably support workflows such as:
84
85
  - web research
85
86
  - using browser UIs for other LLMs such as ChatGPT, Grok, Gemini, and Claude
86
87
  - isolated authenticated browser sessions
87
- - cloned-profile workflows similar to the patterns used in `pi-oracle`
88
+ - headless authenticated ChatGPT/OpenAI browsing without forcing `--headed` or `--auto-connect`
89
+ - upstream profile/debug workflows without adding a local profile-cloning layer in this package
88
90
 
89
91
  ## Implications for the implementation
90
92
 
@@ -94,6 +96,8 @@ The design should comfortably support workflows such as:
94
96
  - User-facing docs belong in `README.md` and the canonical published files under `docs/`.
95
97
  - Agent workflow and deeper testing procedures can stay in `AGENTS.md`, but published docs must not depend on that file being present.
96
98
  - Keep mitigations for legacy-skill coexistence simple; do not add extra moving parts unless observed behavior justifies them.
99
+ - Prefer narrow, evidence-backed compatibility mitigations over broad stealth layers when a specific upstream site starts rejecting the default headless launch fingerprint.
100
+ - Preserve the page that a profiled `open` just navigated to; if restored profile tabs steal focus during launch, the wrapper should best-effort switch back to the returned page URL before handing control back to the agent.
97
101
 
98
102
  ## Open design questions
99
103
 
@@ -121,7 +121,8 @@ Recommended details:
121
121
  ```json
122
122
  {
123
123
  "args": ["snapshot", "-i"],
124
- "effectiveArgs": ["--session", "pi-abc123", "--json", "snapshot", "-i"],
124
+ "effectiveArgs": ["--json", "--session", "pi-abc123", "snapshot", "-i"],
125
+ "command": "snapshot",
125
126
  "sessionMode": "auto",
126
127
  "sessionName": "pi-abc123",
127
128
  "usedImplicitSession": true,
@@ -131,11 +132,22 @@ Recommended details:
131
132
  "e1": { "name": "Example Domain", "role": "heading" }
132
133
  },
133
134
  "snapshot": "- heading \"Example Domain\" [level=1, ref=e1]"
134
- }
135
+ },
136
+ "summary": "Snapshot: 1 refs on https://example.com/"
135
137
  }
136
138
  ```
137
139
 
138
- For oversized snapshots, details should switch to a compact metadata object and include `fullOutputPath` pointing at a private temp JSON spill file with the full upstream snapshot payload.
140
+ Additional structured fields can appear when relevant:
141
+ - `batchFailure` and `batchSteps` for `batch` rendering, including mixed-success runs
142
+ - `navigationSummary` for navigation-style commands like `click`, `back`, `forward`, and `reload`
143
+ - `imagePath` / `imagePaths` for screenshots and batched image outputs
144
+ - `fullOutputPath` / `fullOutputPaths` when large snapshot output is compacted and spilled to a private file; persisted sessions keep that path under a private session-scoped artifact directory with a bounded per-session budget so it survives reload/resume without unbounded growth
145
+ - `sessionRecoveryHint` when startup-scoped flags need `sessionMode: "fresh"`
146
+ - `inspection: true` plus `stdout` for successful plain-text inspection commands like `--help` and `--version`
147
+
148
+ When the tool echoes `args` or `effectiveArgs` back into Pi, sensitive values such as `--headers`, proxy credentials, and auth-bearing URL parameters should be redacted first.
149
+
150
+ For oversized snapshots, details should switch to a compact metadata object and include `fullOutputPath` pointing at a private JSON spill file with the full upstream snapshot payload. Persisted sessions should keep that spill file under a private session-scoped artifact directory so the path remains usable after reload/restart, with the oldest persisted spill files evicted as needed to stay within the per-session budget.
139
151
 
140
152
  ## High-value result rendering
141
153
 
@@ -144,6 +156,7 @@ For oversized snapshots, details should switch to a compact metadata object and
144
156
  Worth doing in v1:
145
157
  - screenshots → inline image attachment
146
158
  - snapshots → origin + ref count + main-content-first compact preview, with the raw snapshot spill path kept in `details.fullOutputPath` when the inline result would otherwise be too large
159
+ - extraction-style commands like `eval --stdin` and `get title` → scalar-first text with lightweight origin context when available
147
160
  - navigation actions like `click`, `back`, `forward`, and `reload` → lightweight post-action title/url summary when available
148
161
  - tab lists → compact summary/table
149
162
  - stream status → enabled/connected/port summary
@@ -161,14 +174,18 @@ If `agent-browser` is not on `PATH`, fail with a message that:
161
174
  - derive the base implicit session name from the official `pi` session id plus a cwd hash so same-named checkouts do not collide
162
175
  - respect explicit upstream `--session` with minimal interference
163
176
  - treat the extension-managed session as convenience state owned by the wrapper
164
- - on normal `pi` shutdown, best-effort close the current extension-managed session
177
+ - preserve the current extension-managed session across normal `pi` shutdown/reload so persisted sessions can keep following the live browser on `/reload` or `/resume`
165
178
  - set an idle timeout on extension-managed sessions so abandoned daemons eventually self-clean
166
- - clean up private temp spill artifacts owned by the extension-managed session on shutdown
179
+ - clean up process-private temp spill artifacts on shutdown, while keeping persisted-session snapshot spill files in a private session-scoped artifact directory so `details.fullOutputPath` survives reload/restart and the oldest spill files are evicted if the per-session artifact budget is exceeded
180
+ - reconstruct the current extension-managed session from persisted tool details on resume/reload so later default calls keep following the active managed browser
167
181
  - when an unnamed `sessionMode: "fresh"` launch succeeds, make it the new extension-managed session so later default calls keep using it
168
182
  - if that unnamed fresh launch replaced an already-active managed session, best-effort close the old managed session after the switch succeeds
169
183
  - treat explicit caller-provided `--session` choices as user-managed
170
184
  - pass explicit `--profile` straight through to upstream `agent-browser`; no profile-cloning or isolation layer is added in v1
185
+ - after profiled `open` / `goto` / `navigate`, if upstream leaves a restored profile tab active instead of the page that was just opened, best-effort switch back to the tab whose URL matches the returned open result before returning control to the agent
186
+ - treat successful plain-text inspection commands like `--help` and `--version` as stateless: do not inject the implicit managed session and do not let those calls claim the managed-session slot
171
187
  - if startup-scoped flags like `--profile`, `--session-name`, or `--cdp` are supplied after the implicit session is already active while `sessionMode` is `"auto"`, return a validation error with a structured recovery hint that recommends `sessionMode: "fresh"`
188
+ - for direct headless local Chrome launches to `chatgpt.com` / `chat.openai.com`, allow a narrow compatibility fallback that injects a normal Chrome `--user-agent` only when the caller did not explicitly provide one and did not choose `--headed`, `--cdp`, `--auto-connect`, or a provider-backed launch
172
189
 
173
190
  ## Non-goals
174
191