pi-agent-browser-native 0.2.0 → 0.2.1

package/CHANGELOG.md CHANGED
@@ -1,5 +1,15 @@
1
1
  # Changelog
2
2
 
3
+ ## 0.2.1 - 2026-04-12
4
+
5
+ ### Fixed
6
+ - the GitHub source trial docs now use `pi --no-extensions -e https://github.com/fitchmultz/pi-agent-browser-native` so published-package users do not hit duplicate `agent_browser` registration conflicts during source-path testing
7
+ - successful unnamed `sessionMode: "fresh"` launches now rotate the extension-managed session to the new browser, and later default `sessionMode: "auto"` calls keep following that fresh session instead of silently snapping back to the older one
8
+ - mixed-success `batch` failures now preserve per-step rendering, include the first failing step in the visible output and structured details, and still mark the overall tool call as an error so agents can recover from partial progress
9
+ - implicit `piab-*` session names now include a stable cwd hash in addition to the `pi` session id so same-named checkouts and worktrees no longer collide onto the same browser session
10
+ - value-taking flags like `--session`, `--profile`, `--session-name`, and `--cdp` now fail locally with direct validation errors when the value is missing or replaced by another flag, instead of producing confusing downstream JSON parse failures
11
+ - the bash guard now catches wrapped `agent-browser` invocations such as `env agent-browser ...`, `npx --yes agent-browser ...`, `pnpm dlx agent-browser ...`, `yarn dlx agent-browser ...`, `bunx agent-browser ...`, and absolute-path execution, reducing accidental bypasses of the native-tool path
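The value-taking flag fix listed above can be illustrated with a minimal sketch. This is not the package's code: `findInvalidValueFlag`, `VALUE_FLAGS`, and the `reason` values are hypothetical names. The diff only confirms that the execution plan reports a `validationError` and an `invalidValueFlag`. The sketch also assumes the space-separated `--flag value` form only.

```typescript
// Hypothetical sketch: not the package's actual validation code.
// Flags the changelog names as taking a value.
const VALUE_FLAGS = new Set(["--session", "--profile", "--session-name", "--cdp"]);

interface InvalidValueFlag {
  flag: string;
  reason: "missing" | "replaced-by-flag";
}

// Returns the first value-taking flag whose value is absent or is itself a flag.
function findInvalidValueFlag(args: readonly string[]): InvalidValueFlag | undefined {
  for (let i = 0; i < args.length; i += 1) {
    const arg = args[i];
    if (!VALUE_FLAGS.has(arg)) continue;
    const next = args[i + 1];
    if (next === undefined) return { flag: arg, reason: "missing" };
    if (next.startsWith("-")) return { flag: arg, reason: "replaced-by-flag" };
    i += 1; // Skip the value that this flag consumed.
  }
  return undefined;
}

console.log(findInvalidValueFlag(["--session", "--profile", "Default", "open"]));
```

Checking locally like this lets the wrapper name the offending flag directly, rather than passing malformed arguments to the upstream CLI and surfacing a JSON parse failure.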
12
+
3
13
  ## 0.2.0 - 2026-04-12
4
14
 
5
15
  ### Changed
package/README.md CHANGED
@@ -69,12 +69,14 @@ For the source install path, prefer the repository URL:
69
69
  pi install https://github.com/fitchmultz/pi-agent-browser-native
70
70
  ```
71
71
 
72
- To try the GitHub source without installing it permanently:
72
+ To try the GitHub source without installing it permanently, isolate that temporary source extension from your normal installed package set:
73
73
 
74
74
  ```bash
75
- pi -e https://github.com/fitchmultz/pi-agent-browser-native
75
+ pi --no-extensions -e https://github.com/fitchmultz/pi-agent-browser-native
76
76
  ```
77
77
 
78
+ This avoids duplicate `agent_browser` registrations when you already have `pi-agent-browser-native` installed globally.
79
+
78
80
  ### Current practical local-checkout flow
79
81
 
80
82
  Until you are using a published package release, prefer an explicit checkout-only run instead of installing the checkout into your normal `pi` package set:
@@ -89,8 +91,8 @@ The native tool exposed to the agent is named `agent_browser`.
89
91
 
90
92
  The primary session control parameter is `sessionMode`:
91
93
 
92
- - `"auto"` (default) reuses the implicit `pi`-scoped session when possible
93
- - `"fresh"` skips that implicit session so startup-scoped flags like `--profile`, `--session-name`, and `--cdp` can launch a fresh upstream session
94
+ - `"auto"` (default) reuses the extension-managed `pi`-scoped session when possible
95
+ - `"fresh"` switches that managed session to a fresh upstream launch so startup-scoped flags like `--profile`, `--session-name`, and `--cdp` apply and later auto calls follow the new browser
94
96
 
95
97
  ## Agent quick start
96
98
 
@@ -99,8 +101,8 @@ The primary session control parameter is `sessionMode`:
99
101
  - `args` — exact CLI args after `agent-browser`
100
102
  - `stdin` — raw stdin only for `batch` and `eval --stdin`
101
103
  - `sessionMode`
102
- - `"auto"` — default, reuse the implicit `pi`-scoped session
103
- - `"fresh"` — skip the implicit session for a new profile/debug launch
104
+ - `"auto"` — default, reuse the extension-managed `pi`-scoped session
105
+ - `"fresh"` — switch that managed session to a new profile/debug launch
104
106
 
105
107
  ### Common call shapes
106
108
 
@@ -136,7 +138,9 @@ Start a fresh profiled launch after you already used the implicit session:
136
138
  { "args": ["--profile", "Default", "open", "https://example.com/account"], "sessionMode": "fresh" }
137
139
  ```
138
140
 
139
- Name a new upstream session explicitly when you want to keep reusing it:
141
+ After a successful unnamed fresh launch, later `sessionMode: "auto"` calls follow that new browser automatically.
142
+
143
+ Name a new upstream session explicitly when you want to keep reusing it yourself:
140
144
 
141
145
  ```json
142
146
  { "args": ["--session", "auth-flow", "open", "https://example.com"] }
@@ -185,7 +189,8 @@ Current cautions:
185
189
  - passing `--profile` is an explicit upstream choice; this extension does not add its own profile-cloning or isolation layer
186
190
  - startup-scoped flags like `--profile`, `--session-name`, and `--cdp` are for the first command that launches a session; if the implicit session is already active, retry that call with `sessionMode: "fresh"` or provide an explicit `--session ...` for the new launch
187
191
  - implicit `piab-*` sessions are extension-managed convenience sessions; they are best-effort closed on `pi` shutdown, get an idle timeout to reduce stale background daemons, and clean up private temp spill artifacts on shutdown
188
- - explicit upstream sessions like `--session`, `--profile`, `--session-name`, and `--cdp` are treated as user-managed and are not auto-closed by the extension
192
+ - `sessionMode: "fresh"` without an explicit `--session` rotates that extension-managed session to the new browser so later auto calls keep using it
193
+ - explicit caller-provided `--session` values are treated as user-managed and are not auto-closed by the extension
189
194
 
190
195
  ### Switching from public browsing to a fresh profile/debug launch
191
196
 
@@ -203,6 +208,8 @@ Use `sessionMode: "fresh"` for that transition instead of relying on the implici
203
208
  }
204
209
  ```
205
210
 
211
+ After that call succeeds, later default `sessionMode: "auto"` calls continue in the new fresh browser.
212
+
206
213
  If you want to name the new upstream session yourself, pass an explicit session instead:
207
214
 
208
215
  ```json
@@ -59,7 +59,7 @@ The published package should exclude agent-only and superseded repo materials su
59
59
 
60
60
  ### Default
61
61
 
62
- If the caller does not provide `--session`, the extension should default to `sessionMode: "auto"` and use an implicit session name derived from the current `pi` session id.
62
+ If the caller does not provide `--session`, the extension should default to `sessionMode: "auto"` and use an implicit session name derived from the current `pi` session id plus a hash of the absolute cwd.
63
63
 
64
64
  Why:
65
65
  - works out of the box
@@ -70,20 +70,22 @@ Why:
70
70
 
71
71
  If the caller provides `--session`, `--profile`, `--cdp`, or similar upstream flags, the extension should respect them with minimal interference.
72
72
 
73
- The tool should also expose a first-class `sessionMode: "fresh"` escape hatch so agents can intentionally skip the implicit session and launch a fresh upstream session without inventing a fixed explicit session name.
73
+ The tool should also expose a first-class `sessionMode: "fresh"` escape hatch so agents can intentionally rotate the extension-managed session to a fresh upstream launch without inventing a fixed explicit session name.
74
74
 
75
75
  ### Ownership
76
76
 
77
77
  V1 ownership rule:
78
78
  - implicit auto-generated sessions are extension-managed convenience sessions
79
+ - unnamed `sessionMode: "fresh"` launches rotate that extension-managed session to a new upstream browser
79
80
  - explicit/user-managed sessions are not auto-managed by default
80
- - implicit sessions should be reusable during an active `pi` session, but should still be cleaned up predictably
81
+ - extension-managed sessions should be reusable during an active `pi` session, but should still be cleaned up predictably
81
82
 
82
83
  Practical policy:
83
- - on normal `pi` shutdown, best-effort close the implicit session
84
- - also set an idle timeout on implicit sessions so abandoned daemons self-clean after inactivity
85
- - clean up private temp spill artifacts owned by the implicit session on shutdown
86
- - leave explicit upstream sessions like `--session`, `--profile`, `--session-name`, and `--cdp` alone unless the caller closes them explicitly
84
+ - on normal `pi` shutdown, best-effort close the current extension-managed session
85
+ - also set an idle timeout on extension-managed sessions so abandoned daemons self-clean after inactivity
86
+ - clean up private temp spill artifacts owned by the extension-managed session on shutdown
87
+ - if an unnamed fresh launch replaces an active extension-managed session, best-effort close the old managed session after the switch succeeds
88
+ - leave explicit caller-provided `--session` choices alone unless the caller closes them explicitly
87
89
 
88
90
  This is primarily about ownership clarity and avoiding surprise, not adding a heavy safety wrapper. If the extension invented the session, the extension should clean it up. If the caller explicitly chose the upstream session model, the extension should stay out of the way.
89
91
 
@@ -98,6 +100,8 @@ If the implicit session is already active and one of those startup-scoped flags
98
100
 
99
101
  That failure should include a structured recovery hint pointing to `sessionMode: "fresh"` as the first-line fix, while still allowing an explicit `--session` when the caller wants to name the new upstream session.
100
102
 
103
+ A successful unnamed `sessionMode: "fresh"` launch should become the new extension-managed session so later default calls follow that browser instead of silently snapping back to the older managed session.
104
+
101
105
  ## Preferring the native tool
102
106
 
103
107
  Keep the handling simple:
@@ -77,8 +77,8 @@ Examples:
77
77
 
78
78
  Behavior:
79
79
  - if `args` already include `--session`, upstream session choice wins
80
- - `"auto"` prepends the implicit active session when appropriate
81
- - `"fresh"` skips the implicit session so startup-scoped flags like `--profile`, `--session-name`, or `--cdp` can launch a fresh upstream session
80
+ - `"auto"` prepends the current extension-managed active session when appropriate
81
+ - `"fresh"` rotates that managed session to a fresh upstream launch so startup-scoped flags like `--profile`, `--session-name`, or `--cdp` apply and later default calls follow the new browser
82
82
 
83
83
  Recommended use:
84
84
  - use `"auto"` for the common browse/snapshot/click flow inside one `pi` session
@@ -157,14 +157,16 @@ If `agent-browser` is not on `PATH`, fail with a message that:
157
157
 
158
158
  ## Session behavior
159
159
 
160
- - maintain one implicit active session per `pi` session for the common path
161
- - derive that implicit session from the official `pi` session id
160
+ - maintain one extension-managed active session per `pi` session for the common path
161
+ - derive the base implicit session name from the official `pi` session id plus a cwd hash so same-named checkouts do not collide
162
162
  - respect explicit upstream `--session` with minimal interference
163
- - treat the implicit session as extension-managed convenience state
164
- - on normal `pi` shutdown, best-effort close the implicit session
165
- - set an idle timeout on implicit sessions so abandoned daemons eventually self-clean
166
- - clean up private temp spill artifacts owned by the implicit session on shutdown
167
- - treat explicit upstream session choices like `--session`, `--profile`, `--session-name`, and `--cdp` as user-managed
163
+ - treat the extension-managed session as convenience state owned by the wrapper
164
+ - on normal `pi` shutdown, best-effort close the current extension-managed session
165
+ - set an idle timeout on extension-managed sessions so abandoned daemons eventually self-clean
166
+ - clean up private temp spill artifacts owned by the extension-managed session on shutdown
167
+ - when an unnamed `sessionMode: "fresh"` launch succeeds, make it the new extension-managed session so later default calls keep using it
168
+ - if that unnamed fresh launch replaced an already-active managed session, best-effort close the old managed session after the switch succeeds
169
+ - treat explicit caller-provided `--session` choices as user-managed
168
170
  - pass explicit `--profile` straight through to upstream `agent-browser`; no profile-cloning or isolation layer is added in v1
169
171
  - if startup-scoped flags like `--profile`, `--session-name`, or `--cdp` are supplied after the implicit session is already active while `sessionMode` is `"auto"`, return a validation error with a structured recovery hint that recommends `sessionMode: "fresh"`
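The naming rule in the list above (session id plus a cwd hash) can be sketched roughly as follows. The function name `sketchImplicitSessionName`, the SHA-256 choice, the 8-character truncation, and the exact name layout are all assumptions for illustration. The diff confirms only the `piab-` prefix and that a stable hash of the absolute cwd is included.

```typescript
import { createHash } from "node:crypto";
import { resolve } from "node:path";

// Hypothetical sketch: the real createImplicitSessionName may format names differently.
function sketchImplicitSessionName(sessionId: string, cwd: string): string {
  // Hash the absolute cwd so two checkouts that share a directory name still differ.
  const cwdHash = createHash("sha256").update(resolve(cwd)).digest("hex").slice(0, 8);
  return `piab-${sessionId}-${cwdHash}`;
}

// Same pi session id, two worktrees whose last path segment is identical.
console.log(sketchImplicitSessionName("s1", "/work/a/app"));
console.log(sketchImplicitSessionName("s1", "/work/b/app"));
```

Because the hash depends only on the resolved path, repeated calls from the same directory keep producing the same name, which is what lets `sessionMode: "auto"` find the existing session again.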
170
172
 
@@ -1,6 +1,6 @@
1
1
  /**
2
2
  * Purpose: Register the native agent_browser tool for pi so agents can invoke agent-browser without going through bash.
3
- * Responsibilities: Define the tool schema, inject thin wrapper behavior around the upstream CLI, manage implicit session convenience, and return pi-friendly content/details.
3
+ * Responsibilities: Define the tool schema, inject thin wrapper behavior around the upstream CLI, manage extension-owned browser session convenience, and return pi-friendly content/details.
4
4
  * Scope: Native tool registration and orchestration only; the wrapper intentionally stays close to the upstream agent-browser CLI.
5
5
  * Usage: Loaded by pi through the package manifest in this package, or explicitly via `pi --no-extensions -e .` during local checkout development.
6
6
  * Invariants/Assumptions: agent-browser is installed separately on PATH, the wrapper targets the current locally installed upstream version only, and no backward-compatibility shims are provided.
@@ -17,12 +17,13 @@ import {
17
17
  buildExecutionPlan,
18
18
  buildPromptPolicy,
19
19
  createEphemeralSessionSeed,
20
+ createFreshSessionName,
20
21
  createImplicitSessionName,
21
22
  getImplicitSessionCloseTimeoutMs,
22
23
  getImplicitSessionIdleTimeoutMs,
23
24
  getLatestUserPrompt,
24
25
  hasUsableBraveApiKey,
25
- resolveImplicitSessionActiveState,
26
+ resolveManagedSessionState,
26
27
  validateToolArgs,
27
28
  } from "./lib/runtime.js";
28
29
  import { cleanupSecureTempArtifacts } from "./lib/temp.js";
@@ -38,7 +39,7 @@ const AGENT_BROWSER_PARAMS = Type.Object({
38
39
  sessionMode: Type.Optional(
39
40
  Type.Union([Type.Literal("auto"), Type.Literal("fresh")], {
40
41
  description:
41
- "Session handling mode. `auto` reuses the implicit pi-scoped session when possible. `fresh` skips the implicit session so startup-scoped flags like --profile, --session-name, or --cdp can launch a fresh upstream session.",
42
+ "Session handling mode. `auto` reuses the extension-managed pi-scoped session when possible. `fresh` switches that managed session to a fresh upstream launch so startup-scoped flags like --profile, --session-name, or --cdp apply and later auto calls follow the new browser.",
42
43
  default: DEFAULT_SESSION_MODE,
43
44
  }),
44
45
  ),
@@ -46,7 +47,7 @@ const AGENT_BROWSER_PARAMS = Type.Object({
46
47
  const PROJECT_RULE_PROMPT =
47
48
  "Project rule: when browser automation is needed, prefer the native `agent_browser` tool. Do not run direct `agent-browser` bash commands unless the user explicitly asks for a bash-oriented workflow or browser-integration debugging.";
48
49
  const QUICK_START_GUIDELINES = [
49
- "Quick start mental model: args are the exact agent-browser CLI args after the binary; stdin is only for batch and eval --stdin; sessionMode=fresh starts a fresh upstream launch when you need new --profile, --session-name, or --cdp state.",
50
+ "Quick start mental model: args are the exact agent-browser CLI args after the binary; stdin is only for batch and eval --stdin; sessionMode=fresh switches the extension-managed session to a fresh upstream launch when you need new --profile, --session-name, or --cdp state.",
50
51
  "Common first calls: { args: [\"open\", \"https://example.com\"] } then { args: [\"snapshot\", \"-i\"] }; after navigation, use { args: [\"click\", \"@e2\"] } then { args: [\"snapshot\", \"-i\"] }.",
51
52
  "Common advanced calls: { args: [\"batch\"], stdin: \"[[\\\"open\\\",\\\"https://example.com\\\"],[\\\"snapshot\\\",\\\"-i\\\"]]\" }, { args: [\"eval\", \"--stdin\"], stdin: \"document.title\" }, and { args: [\"--profile\", \"Default\", \"open\", \"https://example.com/account\"], sessionMode: \"fresh\" }.",
52
53
  ] as const;
@@ -57,7 +58,7 @@ const SHARED_BROWSER_PLAYBOOK_GUIDELINES = [
57
58
  "For authenticated or user-specific content like feeds, inboxes, dashboards, and accounts, prefer --profile Default on the first browser call and let the implicit session carry continuity. Use --auto-connect only if profile-based reuse is unavailable or the task is specifically about attaching to a running debug-enabled browser.",
58
59
  "Do not invent fixed explicit session names for routine tasks. Use the implicit session unless you truly need multiple isolated browser sessions in the same conversation.",
59
60
  "When using --profile, --session-name, or --cdp, put them on the first command for that session. If you intentionally use an explicit --session, keep using that same explicit session for follow-ups.",
60
- "If you already used the implicit session and now need startup-scoped flags like --profile, --session-name, or --cdp, retry with sessionMode set to fresh or pass an explicit --session for the new launch.",
61
+ "If you already used the implicit session and now need startup-scoped flags like --profile, --session-name, or --cdp, retry with sessionMode set to fresh or pass an explicit --session for the new launch. After a successful unnamed fresh launch, later auto calls follow that new session.",
61
62
  "If a session lands on the wrong page or tab, an interaction changes origin unexpectedly, or an open call returns blocked, blank, or otherwise unexpected results, use tab list / tab <n> / snapshot -i to recover state before retrying different URLs or fallback strategies. Only use wait with an explicit argument like milliseconds, --load, --url, --fn, or --text.",
62
63
  "For feed, timeline, or inbox reading tasks, focus on the main timeline/list region and read the first item there rather than unrelated composer or sidebar content.",
63
64
  "For read-only browsing tasks, prefer extracting the answer from the current snapshot, structured ref labels, or eval --stdin on the current page before navigating away. Only click into media viewers, detail routes, or new pages when the current view does not contain the needed information.",
@@ -71,8 +72,8 @@ const TOOL_PROMPT_GUIDELINES_SUFFIX = [
71
72
  "Do not fall back to osascript, AppleScript, or generic browser-driving bash commands when this tool can do the job.",
72
73
  "Pass exact agent-browser CLI arguments in args, excluding the binary name.",
73
74
  "Use stdin for commands like eval --stdin and batch instead of shell heredocs.",
74
- "Let the implicit session handle the common path unless you explicitly need a fresh launch for upstream flags like --profile, --session-name, or --cdp.",
75
- "Use sessionMode=fresh when switching from an existing implicit session to a new profile/debug launch without inventing a fixed explicit session name.",
75
+ "Let the extension-managed session handle the common path unless you explicitly need a fresh launch for upstream flags like --profile, --session-name, or --cdp.",
76
+ "Use sessionMode=fresh when switching from an existing implicit session to a new profile/debug launch without inventing a fixed explicit session name; later auto calls will follow that new session.",
76
77
  ] as const;
77
78
 
78
79
  function buildMissingBinaryMessage(): string {
@@ -90,12 +91,19 @@ function buildInvocationPreview(effectiveArgs: string[]): string {
90
91
  return preview.length > 120 ? `${preview.slice(0, 117)}...` : preview;
91
92
  }
92
93
 
94
+ const AGENT_BROWSER_BASH_PREFIX = String.raw`(?:env(?:\s+[A-Za-z_][A-Za-z0-9_]*=[^\s;&|]+)*\s+)?(?:(?:npx|bunx)(?:\s+-[^\s;&|]+|\s+--[^\s;&|]+(?:=[^\s;&|]+)?)*\s+|(?:pnpm|yarn)\s+dlx(?:\s+-[^\s;&|]+|\s+--[^\s;&|]+(?:=[^\s;&|]+)?)*\s+)?`;
95
+ const AGENT_BROWSER_BASH_EXECUTABLE = String.raw`(?:[.~]|\.\.?|\/)?(?:[^\s;&|]+\/)?agent-browser`;
96
+ const DIRECT_AGENT_BROWSER_BASH_PATTERN = new RegExp(
97
+ String.raw`(^|[\s;&|])${AGENT_BROWSER_BASH_PREFIX}${AGENT_BROWSER_BASH_EXECUTABLE}(?=\s|$)`,
98
+ );
99
+ const HARMLESS_AGENT_BROWSER_INSPECTION_PATTERN = /(command\s+-v|which|type\s+-P)\s+agent-browser\b/;
100
+
93
101
  function looksLikeDirectAgentBrowserBash(command: string): boolean {
94
- return /(^|[\s;&|])(npx\s+)?agent-browser(\s|$)/.test(command);
102
+ return DIRECT_AGENT_BROWSER_BASH_PATTERN.test(command);
95
103
  }
96
104
 
97
105
  function isHarmlessAgentBrowserInspectionCommand(command: string): boolean {
98
- return /(command\s+-v|which)\s+agent-browser\b/.test(command) || /(^|\s)agent-browser\s+--(help|version)\b/.test(command);
106
+ return HARMLESS_AGENT_BROWSER_INSPECTION_PATTERN.test(command);
99
107
  }
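The guard patterns added in this hunk can be exercised in isolation. The three constants and two predicates below are copied from the diff. `shouldBlockBashCommand` is a hypothetical helper, since the diff does not show how the extension combines the two checks.

```typescript
// Patterns copied from the hunk above.
const AGENT_BROWSER_BASH_PREFIX = String.raw`(?:env(?:\s+[A-Za-z_][A-Za-z0-9_]*=[^\s;&|]+)*\s+)?(?:(?:npx|bunx)(?:\s+-[^\s;&|]+|\s+--[^\s;&|]+(?:=[^\s;&|]+)?)*\s+|(?:pnpm|yarn)\s+dlx(?:\s+-[^\s;&|]+|\s+--[^\s;&|]+(?:=[^\s;&|]+)?)*\s+)?`;
const AGENT_BROWSER_BASH_EXECUTABLE = String.raw`(?:[.~]|\.\.?|\/)?(?:[^\s;&|]+\/)?agent-browser`;
const DIRECT_AGENT_BROWSER_BASH_PATTERN = new RegExp(
  String.raw`(^|[\s;&|])${AGENT_BROWSER_BASH_PREFIX}${AGENT_BROWSER_BASH_EXECUTABLE}(?=\s|$)`,
);
const HARMLESS_AGENT_BROWSER_INSPECTION_PATTERN = /(command\s+-v|which|type\s+-P)\s+agent-browser\b/;

function looksLikeDirectAgentBrowserBash(command: string): boolean {
  return DIRECT_AGENT_BROWSER_BASH_PATTERN.test(command);
}

function isHarmlessAgentBrowserInspectionCommand(command: string): boolean {
  return HARMLESS_AGENT_BROWSER_INSPECTION_PATTERN.test(command);
}

// Hypothetical composition: block direct invocations unless they only locate the binary.
function shouldBlockBashCommand(command: string): boolean {
  return looksLikeDirectAgentBrowserBash(command) && !isHarmlessAgentBrowserInspectionCommand(command);
}

const sampleCommands = [
  "npx --yes agent-browser open https://example.com",
  "pnpm dlx agent-browser snapshot -i",
  "/usr/local/bin/agent-browser snapshot -i",
  "which agent-browser",
  "ls agent-browser-docs",
];
for (const sample of sampleCommands) {
  console.log(`${shouldBlockBashCommand(sample) ? "block" : "allow"}  ${sample}`);
}
```

The trailing lookahead `(?=\s|$)` is what keeps names such as `agent-browser-docs` from matching. As written in this hunk, the harmless-inspection pattern no longer matches `agent-browser --help` or `agent-browser --version`. Whether those are exempted elsewhere is not visible in the diff.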
100
108
 
101
109
  function isPlainTextInspectionArgs(args: string[]): boolean {
@@ -208,6 +216,22 @@ function buildToolPromptGuidelines(hasBraveApiKey: boolean): string[] {
208
216
  ];
209
217
  }
210
218
 
219
+ async function closeManagedSession(options: { cwd: string; sessionName: string; timeoutMs: number }): Promise<void> {
220
+ const controller = new AbortController();
221
+ const timer = setTimeout(() => controller.abort(), options.timeoutMs);
222
+ try {
223
+ await runAgentBrowserProcess({
224
+ args: ["--session", options.sessionName, "close"],
225
+ cwd: options.cwd,
226
+ signal: controller.signal,
227
+ });
228
+ } catch {
229
+ // Best-effort cleanup only.
230
+ } finally {
231
+ clearTimeout(timer);
232
+ }
233
+ }
234
+
211
235
  export default function agentBrowserExtension(pi: ExtensionAPI) {
212
236
  const ephemeralSessionSeed = createEphemeralSessionSeed();
213
237
  const hasBraveApiKey = hasUsableBraveApiKey();
@@ -215,32 +239,28 @@ export default function agentBrowserExtension(pi: ExtensionAPI) {
215
239
  const toolPromptGuidelines = buildToolPromptGuidelines(hasBraveApiKey);
216
240
  const implicitSessionIdleTimeoutMs = getImplicitSessionIdleTimeoutMs();
217
241
  const implicitSessionCloseTimeoutMs = getImplicitSessionCloseTimeoutMs();
218
- let implicitSessionActive = false;
219
- let implicitSessionName = createImplicitSessionName(undefined, process.cwd(), ephemeralSessionSeed);
220
- let implicitSessionCwd = process.cwd();
242
+ let managedSessionActive = false;
243
+ let managedSessionBaseName = createImplicitSessionName(undefined, process.cwd(), ephemeralSessionSeed);
244
+ let managedSessionName = managedSessionBaseName;
245
+ let managedSessionCwd = process.cwd();
246
+ let freshSessionOrdinal = 0;
221
247
 
222
248
  pi.on("session_start", async (_event, ctx) => {
223
- implicitSessionActive = false;
224
- implicitSessionName = createImplicitSessionName(ctx.sessionManager.getSessionId(), ctx.cwd, ephemeralSessionSeed);
225
- implicitSessionCwd = ctx.cwd;
249
+ managedSessionActive = false;
250
+ managedSessionBaseName = createImplicitSessionName(ctx.sessionManager.getSessionId(), ctx.cwd, ephemeralSessionSeed);
251
+ managedSessionName = managedSessionBaseName;
252
+ managedSessionCwd = ctx.cwd;
253
+ freshSessionOrdinal = 0;
226
254
  });
227
255
 
228
256
  pi.on("session_shutdown", async () => {
229
- implicitSessionActive = false;
230
- const controller = new AbortController();
231
- const timer = setTimeout(() => controller.abort(), implicitSessionCloseTimeoutMs);
232
- try {
233
- await runAgentBrowserProcess({
234
- args: ["--session", implicitSessionName, "close"],
235
- cwd: implicitSessionCwd,
236
- signal: controller.signal,
237
- });
238
- } catch {
239
- // Best-effort cleanup only.
240
- } finally {
241
- clearTimeout(timer);
242
- await cleanupSecureTempArtifacts();
243
- }
257
+ managedSessionActive = false;
258
+ await closeManagedSession({
259
+ cwd: managedSessionCwd,
260
+ sessionName: managedSessionName,
261
+ timeoutMs: implicitSessionCloseTimeoutMs,
262
+ });
263
+ await cleanupSecureTempArtifacts();
244
264
  });
245
265
 
246
266
  pi.on("before_agent_start", async (event) => {
@@ -284,17 +304,23 @@ export default function agentBrowserExtension(pi: ExtensionAPI) {
284
304
  }
285
305
 
286
306
  const sessionMode = params.sessionMode ?? DEFAULT_SESSION_MODE;
307
+ const freshSessionName = createFreshSessionName(managedSessionBaseName, ephemeralSessionSeed, freshSessionOrdinal + 1);
287
308
  const executionPlan = buildExecutionPlan(params.args, {
288
- implicitSessionActive,
289
- implicitSessionName,
309
+ freshSessionName,
310
+ managedSessionActive,
311
+ managedSessionName,
290
312
  sessionMode,
291
313
  });
314
+ if (executionPlan.managedSessionName === freshSessionName) {
315
+ freshSessionOrdinal += 1;
316
+ }
292
317
 
293
318
  if (executionPlan.validationError) {
294
319
  return {
295
320
  content: [{ type: "text", text: executionPlan.validationError }],
296
321
  details: {
297
322
  args: params.args,
323
+ invalidValueFlag: executionPlan.invalidValueFlag,
298
324
  sessionMode,
299
325
  sessionRecoveryHint: executionPlan.recoveryHint,
300
326
  startupScopedFlags: executionPlan.startupScopedFlags,
@@ -317,9 +343,7 @@ export default function agentBrowserExtension(pi: ExtensionAPI) {
317
343
  const processResult = await runAgentBrowserProcess({
318
344
  args: executionPlan.effectiveArgs,
319
345
  cwd: ctx.cwd,
320
- env: executionPlan.usedImplicitSession
321
- ? { AGENT_BROWSER_IDLE_TIMEOUT_MS: implicitSessionIdleTimeoutMs }
322
- : undefined,
346
+ env: executionPlan.managedSessionName ? { AGENT_BROWSER_IDLE_TIMEOUT_MS: implicitSessionIdleTimeoutMs } : undefined,
323
347
  signal,
324
348
  stdin: params.stdin,
325
349
  });
@@ -365,12 +389,27 @@ export default function agentBrowserExtension(pi: ExtensionAPI) {
365
389
  }
366
390
  }
367
391
 
368
- implicitSessionActive = resolveImplicitSessionActiveState({
392
+ const priorManagedSessionCwd = managedSessionCwd;
393
+ const managedSessionState = resolveManagedSessionState({
369
394
  command: executionPlan.commandInfo.command,
370
- priorActive: implicitSessionActive,
395
+ managedSessionName: executionPlan.managedSessionName,
396
+ priorActive: managedSessionActive,
397
+ priorSessionName: managedSessionName,
371
398
  succeeded,
372
- usedImplicitSession: executionPlan.usedImplicitSession,
373
399
  });
400
+ const replacedManagedSessionName = managedSessionState.replacedSessionName;
401
+ managedSessionActive = managedSessionState.active;
402
+ managedSessionName = managedSessionState.sessionName;
403
+ if (executionPlan.managedSessionName && succeeded) {
404
+ managedSessionCwd = ctx.cwd;
405
+ }
406
+ if (replacedManagedSessionName) {
407
+ await closeManagedSession({
408
+ cwd: priorManagedSessionCwd,
409
+ sessionName: replacedManagedSessionName,
410
+ timeoutMs: implicitSessionCloseTimeoutMs,
411
+ });
412
+ }
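The rotation logic this hunk relies on lives in `resolveManagedSessionState`, whose body is not part of the diff. The following is a rough sketch of one way it could behave, based only on the call site above and the README wording. The input and output field names mirror the call site. The function name `sketchResolveManagedSessionState`, the handling of `close`, and the sample session names are assumptions.

```typescript
// Hypothetical sketch: the real resolveManagedSessionState is not shown in this diff.
interface ManagedSessionInput {
  command: string | undefined;
  managedSessionName: string | undefined; // Set only when the call ran under an extension-managed name.
  priorActive: boolean;
  priorSessionName: string;
  succeeded: boolean;
}

interface ManagedSessionState {
  active: boolean;
  sessionName: string;
  replacedSessionName?: string;
}

function sketchResolveManagedSessionState(input: ManagedSessionInput): ManagedSessionState {
  const { command, managedSessionName, priorActive, priorSessionName, succeeded } = input;
  // Explicit --session calls and failed calls leave the managed state untouched.
  if (managedSessionName === undefined || !succeeded) {
    return { active: priorActive, sessionName: priorSessionName };
  }
  // Assumption: closing the managed session marks it inactive.
  if (command === "close") {
    return { active: false, sessionName: managedSessionName };
  }
  // A successful launch under a new managed name rotates the session.
  const rotated = managedSessionName !== priorSessionName;
  return {
    active: true,
    sessionName: managedSessionName,
    replacedSessionName: rotated && priorActive ? priorSessionName : undefined,
  };
}

console.log(
  sketchResolveManagedSessionState({
    command: "open",
    managedSessionName: "piab-example-fresh-1",
    priorActive: true,
    priorSessionName: "piab-example",
    succeeded: true,
  }),
);
```

Returning `replacedSessionName` separately is what allows the caller to best-effort close the old browser only after the switch has succeeded, as the hunk above does with `closeManagedSession`.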
374
413
 
375
414
  const errorText = getAgentBrowserErrorText({
376
415
  aborted: processResult.aborted,
@@ -399,6 +438,7 @@ export default function agentBrowserExtension(pi: ExtensionAPI) {
399
438
  content: presentation.content,
400
439
  details: {
401
440
  args: params.args,
441
+ batchFailure: presentation.batchFailure,
402
442
  batchSteps: presentation.batchSteps,
403
443
  command: executionPlan.commandInfo.command,
404
444
  subcommand: executionPlan.commandInfo.subcommand,
@@ -10,6 +10,10 @@ import { readFile } from "node:fs/promises";
10
10
 
11
11
  import { type AgentBrowserBatchResult, type AgentBrowserEnvelope, isRecord, stringifyUnknown } from "./shared.js";
12
12
 
13
+ function hasStructuredBatchStepFailure(data: unknown): data is AgentBrowserBatchResult[] {
14
+ return Array.isArray(data) && data.some((item) => isRecord(item) && item.success === false);
15
+ }
16
+
13
17
  async function readEnvelopeSource(options: { stdout: string; stdoutPath?: string }): Promise<string> {
14
18
  if (!options.stdoutPath) {
15
19
  return options.stdout;
@@ -93,6 +97,9 @@ export function getAgentBrowserErrorText(options: {
93
97
  if (spawnError) return spawnError.message;
94
98
  if (parseError) return parseError;
95
99
  if (envelope?.success === false) {
100
+ if (hasStructuredBatchStepFailure(envelope.data) && envelope.error === undefined) {
101
+ return undefined;
102
+ }
96
103
  return extractEnvelopeErrorText(envelope.error) ?? (stderr.trim() || `agent-browser reported failure${exitCode !== 0 ? ` (exit code ${exitCode})` : "."}`);
97
104
  }
98
105
  if (exitCode !== 0) {
@@ -14,6 +14,7 @@ import { buildSnapshotPresentation, formatRawSnapshotText, formatSnapshotSummary
14
14
  import {
15
15
  type AgentBrowserBatchResult,
16
16
  type AgentBrowserEnvelope,
17
+ type BatchFailurePresentationDetails,
17
18
  type BatchStepPresentationDetails,
18
19
  type ToolPresentation,
19
20
  isRecord,
@@ -188,6 +189,20 @@ function formatBatchStepError(error: unknown): string {
188
189
  return errorText.length > 0 ? `Error: ${errorText}` : "Error: batch step failed.";
189
190
  }
190
191
 
192
+ function getBatchFailureDetails(steps: Array<{ details: BatchStepPresentationDetails }>): BatchFailurePresentationDetails | undefined {
193
+ const failedSteps = steps.filter((step) => step.details.success === false);
194
+ if (failedSteps.length === 0) {
195
+ return undefined;
196
+ }
197
+ const successCount = steps.length - failedSteps.length;
198
+ return {
199
+ failedStep: failedSteps[0].details,
200
+ failureCount: failedSteps.length,
201
+ successCount,
202
+ totalCount: steps.length,
203
+ };
204
+ }
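To see what `getBatchFailureDetails` yields for a mixed-success batch, here is a self-contained sketch with sample data. `StepDetails` is a trimmed stand-in for `BatchStepPresentationDetails` that keeps only fields visible in this diff, and `summarizeBatchFailure`, `formatBatchSummaryLine`, and the sample steps are hypothetical. The summary wording follows the `formatSummary` change later in the diff.

```typescript
// Trimmed stand-in types: only the fields that appear in this diff (assumption).
interface StepDetails {
  commandText: string;
  index: number;
  success: boolean;
  text: string;
}

interface FailureDetails {
  failedStep: StepDetails;
  failureCount: number;
  successCount: number;
  totalCount: number;
}

// Same shape of logic as getBatchFailureDetails, operating on bare step details.
function summarizeBatchFailure(steps: StepDetails[]): FailureDetails | undefined {
  const failedSteps = steps.filter((step) => step.success === false);
  if (failedSteps.length === 0) {
    return undefined;
  }
  return {
    failedStep: failedSteps[0],
    failureCount: failedSteps.length,
    successCount: steps.length - failedSteps.length,
    totalCount: steps.length,
  };
}

// Mirrors the summary wording used for batch results in this release.
function formatBatchSummaryLine(successCount: number, totalCount: number): string {
  return successCount === totalCount
    ? `Batch: ${successCount}/${totalCount} succeeded`
    : `Batch failed: ${successCount}/${totalCount} succeeded`;
}

const sampleSteps: StepDetails[] = [
  { commandText: "open https://example.com", index: 0, success: true, text: "" },
  { commandText: "click @e2", index: 1, success: false, text: "Error: batch step failed." },
  { commandText: "snapshot -i", index: 2, success: true, text: "" },
];
const failure = summarizeBatchFailure(sampleSteps);
if (failure) {
  console.log(formatBatchSummaryLine(failure.successCount, failure.totalCount));
  console.log(`First failing step index: ${failure.failedStep.index}`);
}
```

Keeping the first failing step alongside the counts gives an agent enough structure to resume from the point of failure without re-parsing the rendered text.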
205
+
191
206
  async function buildBatchStepPresentation(options: {
192
207
  cwd: string;
193
208
  index: number;
@@ -261,6 +276,7 @@ async function buildBatchPresentation(options: {
261
276
  steps.push(await buildBatchStepPresentation({ cwd, index, item }));
262
277
  }
263
278
 
279
+ const batchFailure = getBatchFailureDetails(steps);
264
280
  const images = steps.flatMap((step) => getPresentationImages(step.presentation));
265
281
  const fullOutputPaths = steps.flatMap((step) => getPresentationPaths({
266
282
  primaryPath: step.presentation.fullOutputPath,
@@ -270,13 +286,14 @@ async function buildBatchPresentation(options: {
270
286
  primaryPath: step.presentation.imagePath,
271
287
  secondaryPaths: step.presentation.imagePaths,
272
288
  }));
273
- const text =
289
+ const stepText =
274
290
  steps.length === 0
275
291
  ? "(no batch steps)"
276
292
  : steps
277
293
  .map(({ details, presentation }) => {
278
294
  const inlineImageCount = getPresentationImages(presentation).length;
279
- const lines = [`Step ${details.index + 1} ${details.commandText}`];
295
+ const status = details.success ? "succeeded" : "failed";
296
+ const lines = [`Step ${details.index + 1} — ${details.commandText} (${status})`];
280
297
  if (details.text.length > 0) {
281
298
  lines.push(details.text);
282
299
  }
@@ -286,8 +303,20 @@ async function buildBatchPresentation(options: {
              return lines.join("\n");
            })
            .join("\n\n");
+   const failureHeader =
+     batchFailure === undefined
+       ? undefined
+       : [
+           summary,
+           `First failing step: ${batchFailure.failedStep.index + 1} — ${batchFailure.failedStep.commandText}`,
+           batchFailure.failureCount > 1
+             ? `${batchFailure.failureCount} steps failed. See the per-step results below.`
+             : "See the per-step results below.",
+         ].join("\n");
+   const text = failureHeader ? `${failureHeader}\n\n${stepText}` : stepText;
  
    return {
+     batchFailure,
      batchSteps: steps.map((step) => step.details),
      content: [{ type: "text", text }, ...images],
      data,
@@ -302,7 +331,7 @@ async function buildBatchPresentation(options: {
  function formatSummary(commandInfo: CommandInfo, data: unknown): string {
    if (Array.isArray(data) && commandInfo.command === "batch") {
      const successCount = data.filter((item) => isRecord(item) && item.success !== false).length;
-     return `Batch: ${successCount}/${data.length} succeeded`;
+     return successCount === data.length ? `Batch: ${successCount}/${data.length} succeeded` : `Batch failed: ${successCount}/${data.length} succeeded`;
    }
    if (isRecord(data)) {
      const navigationSummary = getNavigationSummary(data);
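The hunks above only show the tail of `getBatchFailureDetails`, so the following is a minimal sketch of how the mixed-success summary could be assembled, not the package's actual implementation. The `StepDetails` shape is a trimmed, hypothetical stand-in for `BatchStepPresentationDetails`, the `sketch*` names are invented for illustration, and the header uses a plain hyphen as its separator.

```typescript
// Hypothetical, trimmed-down step shape; the real details object carries more fields.
interface StepDetails {
  commandText: string;
  index: number;
  success: boolean;
}

interface BatchFailure {
  failedStep: StepDetails;
  failureCount: number;
  successCount: number;
  totalCount: number;
}

// Assumed behaviour: undefined when every step succeeded, otherwise the
// first failing step plus success/failure counts.
function sketchBatchFailure(steps: StepDetails[]): BatchFailure | undefined {
  const failedSteps = steps.filter((step) => !step.success);
  const [firstFailure] = failedSteps;
  if (firstFailure === undefined) {
    return undefined;
  }
  return {
    failedStep: firstFailure,
    failureCount: failedSteps.length,
    successCount: steps.length - failedSteps.length,
    totalCount: steps.length,
  };
}

// Follows the three-line header layout from buildBatchPresentation:
// summary, first failing step, pointer to the per-step results.
function sketchFailureHeader(summary: string, failure: BatchFailure): string {
  return [
    summary,
    `First failing step: ${failure.failedStep.index + 1} - ${failure.failedStep.commandText}`,
    failure.failureCount > 1
      ? `${failure.failureCount} steps failed. See the per-step results below.`
      : "See the per-step results below.",
  ].join("\n");
}
```

Returning `undefined` for an all-success batch lets the caller keep the plain per-step text and only prepend the header when something failed.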
@@ -33,7 +33,15 @@ export interface BatchStepPresentationDetails {
    text: string;
  }
  
+ export interface BatchFailurePresentationDetails {
+   failedStep: BatchStepPresentationDetails;
+   failureCount: number;
+   successCount: number;
+   totalCount: number;
+ }
+
  export interface ToolPresentation {
+   batchFailure?: BatchFailurePresentationDetails;
    batchSteps?: BatchStepPresentationDetails[];
    content: Array<{ text: string; type: "text" } | { data: string; mimeType: string; type: "image" }>;
    data?: unknown;
@@ -1,9 +1,9 @@
  /**
   * Purpose: Build safe, deterministic agent-browser invocations for the pi-agent-browser extension.
-  * Responsibilities: Validate raw tool arguments, derive implicit session names from the pi session identity, resolve implicit-session timeout/state helpers, detect explicit session usage, and build the effective CLI argument list passed to the upstream agent-browser binary.
+  * Responsibilities: Validate raw tool arguments, derive extension-managed session names from the pi session identity, resolve managed-session timeout/state helpers, detect explicit session usage, and build the effective CLI argument list passed to the upstream agent-browser binary.
   * Scope: Pure runtime-planning helpers only; no subprocess execution or filesystem access lives here.
   * Usage: Imported by the extension entrypoint and unit tests before spawning the upstream CLI.
-  * Invariants/Assumptions: The wrapper stays thin, preserves upstream command vocabulary, and only injects `--json` plus an implicit `--session` when appropriate.
+  * Invariants/Assumptions: The wrapper stays thin, preserves upstream command vocabulary, and only injects `--json` plus an extension-managed `--session` when appropriate.
   */
  
  import { createHash, randomUUID } from "node:crypto";
@@ -49,6 +49,8 @@ const GLOBAL_FLAGS_WITH_VALUES = new Set([
  ]);
  const SHELL_OPERATOR_TOKENS = new Set(["&&", "||", "|", ";", ">", ">>", "<"]);
  const MAX_PROJECT_SLUG_LENGTH = 24;
+ const SESSION_NAME_CWD_HASH_LENGTH = 8;
+ const SESSION_NAME_SESSION_ID_LENGTH = 12;
  
  export interface CommandInfo {
    command?: string;
@@ -64,9 +66,18 @@ export interface SessionRecoveryHint {
    recommendedSessionMode: "fresh";
  }
  
+ export interface InvalidValueFlagDetails {
+   flag: string;
+   index: number;
+   reason: "missing-value" | "unexpected-flag";
+   receivedToken?: string;
+ }
+
  export interface ExecutionPlan {
    commandInfo: CommandInfo;
    effectiveArgs: string[];
+   invalidValueFlag?: InvalidValueFlagDetails;
+   managedSessionName?: string;
    recoveryHint?: SessionRecoveryHint;
    sessionName?: string;
    startupScopedFlags: string[];
@@ -74,6 +85,12 @@ export interface ExecutionPlan {
    validationError?: string;
  }
  
+ export interface ManagedSessionState {
+   active: boolean;
+   replacedSessionName?: string;
+   sessionName: string;
+ }
+
  export interface PromptPolicy {
    allowLegacyAgentBrowserBash: boolean;
  }
@@ -105,25 +122,38 @@ export function getImplicitSessionCloseTimeoutMs(env: NodeJS.ProcessEnv = proces
    return parseTimeoutMs(env[IMPLICIT_SESSION_CLOSE_TIMEOUT_ENV], 0) ?? DEFAULT_IMPLICIT_SESSION_CLOSE_TIMEOUT_MS;
  }
  
- export function resolveImplicitSessionActiveState(options: {
+ export function resolveManagedSessionState(options: {
    command?: string;
+   managedSessionName?: string;
    priorActive: boolean;
+   priorSessionName: string;
    succeeded: boolean;
-   usedImplicitSession: boolean;
- }): boolean {
-   const { command, priorActive, succeeded, usedImplicitSession } = options;
-   if (!usedImplicitSession) return priorActive;
-   if (command === "close") {
-     return succeeded ? false : priorActive;
+ }): ManagedSessionState {
+   const { command, managedSessionName, priorActive, priorSessionName, succeeded } = options;
+   if (!managedSessionName) {
+     return { active: priorActive, sessionName: priorSessionName };
    }
-   if (!command) return priorActive;
-   return priorActive || succeeded;
+   if (command === "close" && managedSessionName === priorSessionName) {
+     return { active: succeeded ? false : priorActive, sessionName: priorSessionName };
+   }
+   if (!succeeded) {
+     return { active: priorActive, sessionName: priorSessionName };
+   }
+   return {
+     active: true,
+     replacedSessionName: priorActive && priorSessionName !== managedSessionName ? priorSessionName : undefined,
+     sessionName: managedSessionName,
+   };
  }
  
  export function createEphemeralSessionSeed(): string {
    return randomUUID();
  }
  
+ function createCwdHash(cwd: string): string {
+   return createHash("sha256").update(`cwd:${cwd}`).digest("hex").slice(0, SESSION_NAME_CWD_HASH_LENGTH);
+ }
+
  export function createImplicitSessionName(
    sessionId: string | undefined,
    cwd: string,
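The rotation rule in `resolveManagedSessionState` can be exercised in isolation. In the sketch below the function body is copied from the added lines of this hunk, while the session names (`piab-demo`, `piab-demo-fresh-1`) and the `open` command are made-up values chosen only to show a successful fresh launch replacing the previously managed session.

```typescript
interface ManagedSessionState {
  active: boolean;
  replacedSessionName?: string;
  sessionName: string;
}

// Body copied from the new version of the function in the diff.
function resolveManagedSessionState(options: {
  command?: string;
  managedSessionName?: string;
  priorActive: boolean;
  priorSessionName: string;
  succeeded: boolean;
}): ManagedSessionState {
  const { command, managedSessionName, priorActive, priorSessionName, succeeded } = options;
  if (!managedSessionName) {
    // The call did not use an extension-managed session: nothing changes.
    return { active: priorActive, sessionName: priorSessionName };
  }
  if (command === "close" && managedSessionName === priorSessionName) {
    // Closing the current managed session deactivates it only on success.
    return { active: succeeded ? false : priorActive, sessionName: priorSessionName };
  }
  if (!succeeded) {
    // A failed call never rotates the managed session.
    return { active: priorActive, sessionName: priorSessionName };
  }
  return {
    active: true,
    replacedSessionName: priorActive && priorSessionName !== managedSessionName ? priorSessionName : undefined,
    sessionName: managedSessionName,
  };
}

// Illustrative values only: a successful fresh launch while "piab-demo" is active.
const rotated = resolveManagedSessionState({
  command: "open",
  managedSessionName: "piab-demo-fresh-1",
  priorActive: true,
  priorSessionName: "piab-demo",
  succeeded: true,
});
console.log(rotated.sessionName, rotated.replacedSessionName);
```

The `replacedSessionName` field is only populated when an active session was actually displaced, which presumably lets the caller decide whether the older browser needs cleanup.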
@@ -135,13 +165,25 @@ export function createImplicitSessionName(
      .replace(/[^a-z0-9]+/g, "-")
      .replace(/^-+|-+$/g, "")
      .slice(0, MAX_PROJECT_SLUG_LENGTH) || "project";
-   const stableSessionId = sessionId?.replace(/-/g, "").slice(0, 12);
+   const cwdHash = createCwdHash(cwd);
+   const stableSessionId = sessionId?.replace(/-/g, "").slice(0, SESSION_NAME_SESSION_ID_LENGTH);
    if (stableSessionId && stableSessionId.length > 0) {
-     return `piab-${slug}-${stableSessionId}`;
+     return `piab-${slug}-${stableSessionId}-${cwdHash}`;
    }
  
-   const digest = createHash("sha256").update(`ephemeral:${cwd}:${ephemeralSeed}`).digest("hex").slice(0, 12);
-   return `piab-${slug}-${digest}`;
+   const digest = createHash("sha256")
+     .update(`ephemeral:${cwd}:${ephemeralSeed}`)
+     .digest("hex")
+     .slice(0, SESSION_NAME_SESSION_ID_LENGTH);
+   return `piab-${slug}-${digest}-${cwdHash}`;
+ }
+
+ export function createFreshSessionName(baseSessionName: string, ephemeralSeed: string, ordinal: number): string {
+   const suffix = createHash("sha256")
+     .update(`fresh:${baseSessionName}:${ephemeralSeed}:${ordinal}`)
+     .digest("hex")
+     .slice(0, 10);
+   return `${baseSessionName}-fresh-${suffix}`;
  }
  
  export function validateToolArgs(args: string[]): string | undefined {
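To illustrate why the cwd hash matters, here is a small sketch of the naming scheme using the same `sha256` prefixes as the hunk above. It is a simplification: the slug is passed in directly because its full derivation is not visible in this hunk, and the `sketch*` function names, paths, and session id are hypothetical.

```typescript
import { createHash } from "node:crypto";

const CWD_HASH_LENGTH = 8; // matches SESSION_NAME_CWD_HASH_LENGTH in the diff
const SESSION_ID_LENGTH = 12; // matches SESSION_NAME_SESSION_ID_LENGTH in the diff

function sketchCwdHash(cwd: string): string {
  return createHash("sha256").update(`cwd:${cwd}`).digest("hex").slice(0, CWD_HASH_LENGTH);
}

// Simplified: takes a ready-made slug and a defined session id.
function sketchImplicitSessionName(slug: string, sessionId: string, cwd: string): string {
  const stableSessionId = sessionId.replace(/-/g, "").slice(0, SESSION_ID_LENGTH);
  return `piab-${slug}-${stableSessionId}-${sketchCwdHash(cwd)}`;
}

// Same construction as createFreshSessionName in the diff.
function sketchFreshSessionName(baseSessionName: string, ephemeralSeed: string, ordinal: number): string {
  const suffix = createHash("sha256")
    .update(`fresh:${baseSessionName}:${ephemeralSeed}:${ordinal}`)
    .digest("hex")
    .slice(0, 10);
  return `${baseSessionName}-fresh-${suffix}`;
}

// Two checkouts with the same directory name and the same pi session id.
const exampleSessionId = "01234567-89ab-cdef-0123-456789abcdef";
const mainCheckout = sketchImplicitSessionName("app", exampleSessionId, "/home/dev/main/app");
const worktreeCheckout = sketchImplicitSessionName("app", exampleSessionId, "/home/dev/worktrees/app");
console.log(mainCheckout);
console.log(worktreeCheckout);
console.log(sketchFreshSessionName(mainCheckout, "example-seed", 1));
```

Before this change both checkouts would have produced the same `piab-app-0123456789ab` name; with the cwd hash appended, the names differ only in their final segment.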
@@ -157,6 +199,54 @@ export function validateToolArgs(args: string[]): string | undefined {
    return undefined;
  }
  
+ function getInvalidValueFlagDetails(args: string[]): InvalidValueFlagDetails | undefined {
+   for (const [index, token] of args.entries()) {
+     if (!token.startsWith("-")) {
+       continue;
+     }
+     const normalizedToken = token.split("=", 1)[0] ?? token;
+     if (!GLOBAL_FLAGS_WITH_VALUES.has(normalizedToken)) {
+       continue;
+     }
+     if (token.includes("=")) {
+       const value = token.slice(token.indexOf("=") + 1).trim();
+       if (value.length === 0) {
+         return {
+           flag: normalizedToken,
+           index,
+           reason: "missing-value",
+         };
+       }
+       continue;
+     }
+     const receivedToken = args[index + 1];
+     if (receivedToken === undefined) {
+       return {
+         flag: normalizedToken,
+         index,
+         reason: "missing-value",
+       };
+     }
+     if (receivedToken.startsWith("-")) {
+       return {
+         flag: normalizedToken,
+         index,
+         reason: "unexpected-flag",
+         receivedToken,
+       };
+     }
+     continue;
+   }
+   return undefined;
+ }
+
+ function formatInvalidValueFlagError(details: InvalidValueFlagDetails): string {
+   if (details.reason === "unexpected-flag" && details.receivedToken) {
+     return `Flag \`${details.flag}\` requires a value, but received \`${details.receivedToken}\` instead. Pass a non-flag value immediately after \`${details.flag}\`.`;
+   }
+   return `Flag \`${details.flag}\` requires a value immediately after it. Pass a non-flag token like \`${details.flag} demo\`.`;
+ }
+
  function hasFlagToken(args: string[], flag: string): boolean {
    return args.some((token) => token === flag || token.startsWith(`${flag}=`));
  }
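The value-flag check can be tried on a few argument lists. The sketch below is a condensed restatement of `getInvalidValueFlagDetails` with an index loop. `FLAGS_WITH_VALUES` is an assumed subset (the four flags named in the changelog) because the full `GLOBAL_FLAGS_WITH_VALUES` set is not visible in this hunk, and `findInvalidValueFlag` is a hypothetical name.

```typescript
// Assumption: only the flags the changelog names; the real set may be larger.
const FLAGS_WITH_VALUES = new Set(["--session", "--profile", "--session-name", "--cdp"]);

interface FlagProblem {
  flag: string;
  index: number;
  reason: "missing-value" | "unexpected-flag";
  receivedToken?: string;
}

function findInvalidValueFlag(args: string[]): FlagProblem | undefined {
  for (let index = 0; index < args.length; index += 1) {
    const token = args[index] ?? "";
    if (!token.startsWith("-")) {
      continue;
    }
    const equalsAt = token.indexOf("=");
    const flag = equalsAt === -1 ? token : token.slice(0, equalsAt);
    if (!FLAGS_WITH_VALUES.has(flag)) {
      continue;
    }
    if (equalsAt !== -1) {
      // `--flag=value` form: the value must be non-empty after trimming.
      if (token.slice(equalsAt + 1).trim().length === 0) {
        return { flag, index, reason: "missing-value" };
      }
      continue;
    }
    // `--flag value` form: the next token must exist and must not be a flag.
    const next = args[index + 1];
    if (next === undefined) {
      return { flag, index, reason: "missing-value" };
    }
    if (next.startsWith("-")) {
      return { flag, index, reason: "unexpected-flag", receivedToken: next };
    }
  }
  return undefined;
}

console.log(findInvalidValueFlag(["--session", "demo", "open", "https://example.com"]));
console.log(findInvalidValueFlag(["--session", "--json", "open", "https://example.com"]));
console.log(findInvalidValueFlag(["open", "https://example.com", "--profile"]));
```

Note that any value beginning with `-` is rejected as a flag, so under this rule a value-taking flag cannot receive a value that itself starts with a dash.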
@@ -213,35 +303,60 @@ export function getLatestUserPrompt(branch: unknown[]): string {
  
  export function buildExecutionPlan(
    args: string[],
-   options: { implicitSessionActive: boolean; implicitSessionName: string; sessionMode: SessionMode },
+   options: {
+     freshSessionName: string;
+     managedSessionActive: boolean;
+     managedSessionName: string;
+     sessionMode: SessionMode;
+   },
  ): ExecutionPlan {
+   const effectiveArgs = args.includes("--json") ? [] : ["--json"];
+   const invalidValueFlag = getInvalidValueFlagDetails(args);
+   if (invalidValueFlag) {
+     return {
+       commandInfo: {},
+       effectiveArgs,
+       invalidValueFlag,
+       startupScopedFlags: [],
+       usedImplicitSession: false,
+       validationError: formatInvalidValueFlagError(invalidValueFlag),
+     };
+   }
+
    const commandInfo = parseCommandInfo(args);
    const explicitSessionName = extractExplicitSessionName(args);
    const startupScopedFlags = getStartupScopedFlags(args);
-   const effectiveArgs = args.includes("--json") ? [] : ["--json"];
+   const shouldCreateFreshManagedSession =
+     !explicitSessionName && options.sessionMode === "fresh" && commandInfo.command !== undefined && commandInfo.command !== "close";
+   let managedSessionName: string | undefined;
    let recoveryHint: SessionRecoveryHint | undefined;
    let sessionName = explicitSessionName;
    let usedImplicitSession = false;
    let validationError: string | undefined;
  
    if (!explicitSessionName && options.sessionMode === "auto") {
-     if (options.implicitSessionActive && startupScopedFlags.length > 0) {
+     if (options.managedSessionActive && startupScopedFlags.length > 0) {
        recoveryHint = {
          exampleArgs: args,
          exampleParams: { args, sessionMode: "fresh" },
          reason:
-           "Startup-scoped flags like --profile, --session-name, and --cdp need a fresh upstream launch once the implicit session is already active.",
+           "Startup-scoped flags like --profile, --session-name, and --cdp need a fresh upstream launch once the extension-managed session is already active.",
          recommendedSessionMode: "fresh",
        };
        validationError = [
-         `The current implicit agent-browser session is already running, so startup-scoped flags ${startupScopedFlags.join(", ")} would be ignored by upstream agent-browser.`,
+         `The current extension-managed agent-browser session is already running, so startup-scoped flags ${startupScopedFlags.join(", ")} would be ignored by upstream agent-browser.`,
          "Retry this call with `sessionMode: \"fresh\"` to force a fresh upstream launch, or pass an explicit `--session ...` if you want to name the new session yourself.",
        ].join(" ");
      } else {
-       effectiveArgs.push("--session", options.implicitSessionName);
-       sessionName = options.implicitSessionName;
+       effectiveArgs.push("--session", options.managedSessionName);
+       managedSessionName = options.managedSessionName;
+       sessionName = options.managedSessionName;
        usedImplicitSession = true;
      }
+   } else if (shouldCreateFreshManagedSession) {
+     effectiveArgs.push("--session", options.freshSessionName);
+     managedSessionName = options.freshSessionName;
+     sessionName = options.freshSessionName;
    }
  
    effectiveArgs.push(...args);
@@ -249,6 +364,7 @@ export function buildExecutionPlan(
    return {
      commandInfo,
      effectiveArgs,
+     managedSessionName,
      recoveryHint,
      sessionName,
      startupScopedFlags,
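The session-selection branches in `buildExecutionPlan` reduce to a small decision table. The sketch below is a hypothetical simplification (`pickSession` is not part of the package): it leaves out the startup-scoped flag check and the `--json` injection, and it assumes `SessionMode` is the union `"auto" | "fresh"`, which this diff does not show.

```typescript
// Assumption: the real SessionMode type is not visible in this diff.
type SessionMode = "auto" | "fresh";

interface SessionChoice {
  managed: boolean; // true when the extension, not the caller, chose the name
  sessionName?: string;
}

function pickSession(options: {
  command?: string;
  explicitSessionName?: string;
  freshSessionName: string;
  managedSessionName: string;
  sessionMode: SessionMode;
}): SessionChoice {
  const { command, explicitSessionName, freshSessionName, managedSessionName, sessionMode } = options;
  if (explicitSessionName) {
    // An explicit `--session` always wins and is never extension-managed.
    return { managed: false, sessionName: explicitSessionName };
  }
  if (sessionMode === "auto") {
    // Default calls keep following the current managed session.
    return { managed: true, sessionName: managedSessionName };
  }
  if (command !== undefined && command !== "close") {
    // Unnamed fresh launches get a new managed session name.
    return { managed: true, sessionName: freshSessionName };
  }
  // Fresh mode with no command, or with `close`, injects no session.
  return { managed: false };
}

console.log(pickSession({ command: "open", freshSessionName: "piab-demo-fresh-1", managedSessionName: "piab-demo", sessionMode: "fresh" }));
```

Combined with the rotation in `resolveManagedSessionState`, a successful fresh launch chosen here would become the `managedSessionName` that later `auto` calls receive.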
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "pi-agent-browser-native",
-   "version": "0.2.0",
+   "version": "0.2.1",
    "description": "pi extension that exposes agent-browser as a native tool for browser automation",
    "type": "module",
    "author": "Mitch Fultz (https://github.com/fitchmultz)",