pi-agent-browser-native 0.2.0 → 0.2.1

Files changed (10)

package/CHANGELOG.md +10 -0
package/README.md +15 -8
package/docs/ARCHITECTURE.md +11 -7
package/docs/TOOL_CONTRACT.md +11 -9
package/extensions/agent-browser/index.ts +78 -38
package/extensions/agent-browser/lib/results/envelope.ts +7 -0
package/extensions/agent-browser/lib/results/presentation.ts +32 -3
package/extensions/agent-browser/lib/results/shared.ts +8 -0
package/extensions/agent-browser/lib/runtime.ts +138 -22
package/package.json +1 -1

package/CHANGELOG.md

@@ -1,5 +1,15 @@
 # Changelog
+## 0.2.1 - 2026-04-12
+### Fixed
+- the GitHub source trial docs now use `pi --no-extensions -e https://github.com/fitchmultz/pi-agent-browser-native` so published-package users do not hit duplicate `agent_browser` registration conflicts during source-path testing
+- successful unnamed `sessionMode: "fresh"` launches now rotate the extension-managed session to the new browser, and later default `sessionMode: "auto"` calls keep following that fresh session instead of silently snapping back to the older one
+- mixed-success `batch` failures now preserve per-step rendering, include the first failing step in the visible output and structured details, and still mark the overall tool call as an error so agents can recover from partial progress
+- implicit `piab-*` session names now include a stable cwd hash in addition to the `pi` session id so same-named checkouts and worktrees no longer collide onto the same browser session
+- value-taking flags like `--session`, `--profile`, `--session-name`, and `--cdp` now fail locally with direct validation errors when the value is missing or replaced by another flag, instead of producing confusing downstream JSON parse failures
+- the bash guard now catches wrapped `agent-browser` invocations such as `env agent-browser ...`, `npx --yes agent-browser ...`, `pnpm dlx agent-browser ...`, `yarn dlx agent-browser ...`, `bunx agent-browser ...`, and absolute-path execution, reducing accidental bypasses of the native-tool path
 ## 0.2.0 - 2026-04-12
 ### Changed

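The local validation of value-taking flags described in the 0.2.1 notes can be pictured with a minimal sketch. It assumes a hypothetical helper named `findInvalidValueFlag` and a flag set limited to the four flags named in the changelog entry. The result shape mirrors the `InvalidValueFlagDetails` interface added in `runtime.ts`, but the package's actual validation logic is not shown in this diff and may differ (for example, `--flag=value` forms are ignored here).

```typescript
// Hypothetical sketch; not the package's actual implementation.
type InvalidValueFlag = {
	flag: string;
	index: number;
	reason: "missing-value" | "unexpected-flag";
	receivedToken?: string;
};

// Assumed subset of value-taking flags, taken from the changelog entry.
const VALUE_FLAGS = new Set(["--session", "--profile", "--session-name", "--cdp"]);

function findInvalidValueFlag(args: string[]): InvalidValueFlag | undefined {
	for (let index = 0; index < args.length; index += 1) {
		const flag = args[index];
		if (flag === undefined || !VALUE_FLAGS.has(flag)) continue;
		const next = args[index + 1];
		// The flag is the last token, so no value was supplied.
		if (next === undefined) return { flag, index, reason: "missing-value" };
		// Another flag sits where the value should be.
		if (next.startsWith("-")) return { flag, index, reason: "unexpected-flag", receivedToken: next };
		index += 1; // Skip the value that this flag consumed.
	}
	return undefined;
}
```

Failing at this stage lets the wrapper report which flag is malformed before the upstream CLI is spawned at all.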
package/README.md

@@ -69,12 +69,14 @@ For the source install path, prefer the repository URL:
 pi install https://github.com/fitchmultz/pi-agent-browser-native
 ```
-To try the GitHub source without installing it permanently:
+To try the GitHub source without installing it permanently, isolate that temporary source extension from your normal installed package set:
 ```bash
-pi -e https://github.com/fitchmultz/pi-agent-browser-native
+pi --no-extensions -e https://github.com/fitchmultz/pi-agent-browser-native
 ```
+This avoids duplicate `agent_browser` registrations when you already have `pi-agent-browser-native` installed globally.
 ### Current practical local-checkout flow
 Until you are using a published package release, prefer an explicit checkout-only run instead of installing the checkout into your normal `pi` package set:
@@ -89,8 +91,8 @@ The native tool exposed to the agent is named `agent_browser`.
 The primary session control parameter is `sessionMode`:
-- `"auto"` (default) reuses the implicit `pi`-scoped session when possible
-- `"fresh"` skips that implicit session so startup-scoped flags like `--profile`, `--session-name`, and `--cdp` can launch a fresh upstream session
+- `"auto"` (default) reuses the extension-managed `pi`-scoped session when possible
+- `"fresh"` switches that managed session to a fresh upstream launch so startup-scoped flags like `--profile`, `--session-name`, and `--cdp` apply and later auto calls follow the new browser
 ## Agent quick start
@@ -99,8 +101,8 @@ The primary session control parameter is `sessionMode`:
 - `args` — exact CLI args after `agent-browser`
 - `stdin` — raw stdin only for `batch` and `eval --stdin`
 - `sessionMode`
-  - `"auto"` — default, reuse the implicit `pi`-scoped session
-  - `"fresh"` — skip the implicit session for a new profile/debug launch
+  - `"auto"` — default, reuse the extension-managed `pi`-scoped session
+  - `"fresh"` — switch that managed session to a new profile/debug launch
 ### Common call shapes
@@ -136,7 +138,9 @@ Start a fresh profiled launch after you already used the implicit session:
 { "args": ["--profile", "Default", "open", "https://example.com/account"], "sessionMode": "fresh" }
 ```
-Name a new upstream session explicitly when you want to keep reusing it:
+After a successful unnamed fresh launch, later `sessionMode: "auto"` calls follow that new browser automatically.
+Name a new upstream session explicitly when you want to keep reusing it yourself:
 ```json
 { "args": ["--session", "auth-flow", "open", "https://example.com"] }
@@ -185,7 +189,8 @@ Current cautions:
 - passing `--profile` is an explicit upstream choice; this extension does not add its own profile-cloning or isolation layer
 - startup-scoped flags like `--profile`, `--session-name`, and `--cdp` are for the first command that launches a session; if the implicit session is already active, retry that call with `sessionMode: "fresh"` or provide an explicit `--session ...` for the new launch
 - implicit `piab-*` sessions are extension-managed convenience sessions; they are best-effort closed on `pi` shutdown, get an idle timeout to reduce stale background daemons, and clean up private temp spill artifacts on shutdown
-- explicit upstream sessions like `--session`, `--profile`, `--session-name`, and `--cdp` are treated as user-managed and are not auto-closed by the extension
+- `sessionMode: "fresh"` without an explicit `--session` rotates that extension-managed session to the new browser so later auto calls keep using it
+- explicit caller-provided `--session` values are treated as user-managed and are not auto-closed by the extension
 ### Switching from public browsing to a fresh profile/debug launch
@@ -203,6 +208,8 @@ Use `sessionMode: "fresh"` for that transition instead of relying on the implici
 }
 ```
+After that call succeeds, later default `sessionMode: "auto"` calls continue in the new fresh browser.
 If you want to name the new upstream session yourself, pass an explicit session instead:
 ```json

package/docs/ARCHITECTURE.md

@@ -59,7 +59,7 @@ The published package should exclude agent-only and superseded repo materials su
 ### Default
-If the caller does not provide `--session`, the extension should default to `sessionMode: "auto"` and use an implicit session name derived from the current `pi` session id.
+If the caller does not provide `--session`, the extension should default to `sessionMode: "auto"` and use an implicit session name derived from the current `pi` session id plus a hash of the absolute cwd.
 Why:
 - works out of the box
@@ -70,20 +70,22 @@ Why:
 If the caller provides `--session`, `--profile`, `--cdp`, or similar upstream flags, the extension should respect them with minimal interference.
-The tool should also expose a first-class `sessionMode: "fresh"` escape hatch so agents can intentionally skip the implicit session and launch a fresh upstream session without inventing a fixed explicit session name.
+The tool should also expose a first-class `sessionMode: "fresh"` escape hatch so agents can intentionally rotate the extension-managed session to a fresh upstream launch without inventing a fixed explicit session name.
 ### Ownership
 V1 ownership rule:
 - implicit auto-generated sessions are extension-managed convenience sessions
+- unnamed `sessionMode: "fresh"` launches rotate that extension-managed session to a new upstream browser
 - explicit/user-managed sessions are not auto-managed by default
-- implicit sessions should be reusable during an active `pi` session, but should still be cleaned up predictably
+- extension-managed sessions should be reusable during an active `pi` session, but should still be cleaned up predictably
 Practical policy:
-- on normal `pi` shutdown, best-effort close the implicit session
-- also set an idle timeout on implicit sessions so abandoned daemons self-clean after inactivity
-- clean up private temp spill artifacts owned by the implicit session on shutdown
-- leave explicit upstream sessions like `--session`, `--profile`, `--session-name`, and `--cdp` alone unless the caller closes them explicitly
+- on normal `pi` shutdown, best-effort close the current extension-managed session
+- also set an idle timeout on extension-managed sessions so abandoned daemons self-clean after inactivity
+- clean up private temp spill artifacts owned by the extension-managed session on shutdown
+- if an unnamed fresh launch replaces an active extension-managed session, best-effort close the old managed session after the switch succeeds
+- leave explicit caller-provided `--session` choices alone unless the caller closes them explicitly
 This is primarily about ownership clarity and avoiding surprise, not adding a heavy safety wrapper. If the extension invented the session, the extension should clean it up. If the caller explicitly chose the upstream session model, the extension should stay out of the way.
@@ -98,6 +100,8 @@ If the implicit session is already active and one of those startup-scoped flags
 That failure should include a structured recovery hint pointing to `sessionMode: "fresh"` as the first-line fix, while still allowing an explicit `--session` when the caller wants to name the new upstream session.
+A successful unnamed `sessionMode: "fresh"` launch should become the new extension-managed session so later default calls follow that browser instead of silently snapping back to the older managed session.
 ## Preferring the native tool
 Keep the handling simple:

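The naming rule above (session id plus a hash of the absolute cwd) can be sketched as follows. The truncation lengths match the `SESSION_NAME_CWD_HASH_LENGTH` and `SESSION_NAME_SESSION_ID_LENGTH` constants added in `runtime.ts`. The function name `sketchImplicitSessionName`, the SHA-256 choice, and the exact `piab-<id>-<hash>` layout are assumptions for illustration, since the real `createImplicitSessionName` body is not part of this diff.

```typescript
import { createHash } from "node:crypto";
import { resolve } from "node:path";

const CWD_HASH_LENGTH = 8; // Matches SESSION_NAME_CWD_HASH_LENGTH in the diff.
const SESSION_ID_LENGTH = 12; // Matches SESSION_NAME_SESSION_ID_LENGTH in the diff.

// Hypothetical helper: the hash algorithm and name layout are assumptions.
function sketchImplicitSessionName(sessionId: string, cwd: string): string {
	// Hash the absolute path so two checkouts with the same folder name differ.
	const cwdHash = createHash("sha256").update(resolve(cwd)).digest("hex").slice(0, CWD_HASH_LENGTH);
	// Keep only alphanumerics from the session id and truncate it.
	const idPart = sessionId.replace(/[^A-Za-z0-9]/g, "").slice(0, SESSION_ID_LENGTH);
	return `piab-${idPart}-${cwdHash}`;
}
```

Because the hash depends only on the resolved path, the same `pi` session in the same directory always maps to the same name, while a worktree in a different directory gets a different one.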
package/docs/TOOL_CONTRACT.md

@@ -77,8 +77,8 @@ Examples:
 Behavior:
 - if `args` already include `--session`, upstream session choice wins
-- `"auto"` prepends the implicit active session when appropriate
-- `"fresh"` skips the implicit session so startup-scoped flags like `--profile`, `--session-name`, or `--cdp` can launch a fresh upstream session
+- `"auto"` prepends the current extension-managed active session when appropriate
+- `"fresh"` rotates that managed session to a fresh upstream launch so startup-scoped flags like `--profile`, `--session-name`, or `--cdp` apply and later default calls follow the new browser
 Recommended use:
 - use `"auto"` for the common browse/snapshot/click flow inside one `pi` session
@@ -157,14 +157,16 @@ If `agent-browser` is not on `PATH`, fail with a message that:
 ## Session behavior
-- maintain one implicit active session per `pi` session for the common path
-- derive that implicit session from the official `pi` session id
+- maintain one extension-managed active session per `pi` session for the common path
+- derive the base implicit session name from the official `pi` session id plus a cwd hash so same-named checkouts do not collide
 - respect explicit upstream `--session` with minimal interference
-- treat the implicit session as extension-managed convenience state
-- on normal `pi` shutdown, best-effort close the implicit session
-- set an idle timeout on implicit sessions so abandoned daemons eventually self-clean
-- clean up private temp spill artifacts owned by the implicit session on shutdown
-- treat explicit upstream session choices like `--session`, `--profile`, `--session-name`, and `--cdp` as user-managed
+- treat the extension-managed session as convenience state owned by the wrapper
+- on normal `pi` shutdown, best-effort close the current extension-managed session
+- set an idle timeout on extension-managed sessions so abandoned daemons eventually self-clean
+- clean up private temp spill artifacts owned by the extension-managed session on shutdown
+- when an unnamed `sessionMode: "fresh"` launch succeeds, make it the new extension-managed session so later default calls keep using it
+- if that unnamed fresh launch replaced an already-active managed session, best-effort close the old managed session after the switch succeeds
+- treat explicit caller-provided `--session` choices as user-managed
 - pass explicit `--profile` straight through to upstream `agent-browser`; no profile-cloning or isolation layer is added in v1
 - if startup-scoped flags like `--profile`, `--session-name`, or `--cdp` are supplied after the implicit session is already active while `sessionMode` is `"auto"`, return a validation error with a structured recovery hint that recommends `sessionMode: "fresh"`

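The rotation rules in the session behavior list can be sketched as a small pure function. The input and output field names follow the `resolveManagedSessionState` call site in `index.ts`, but the body below is an assumption: the real implementation lives in `runtime.ts` and is not visible in this diff. In particular, treating a successful `close` as deactivating the managed session is a guess.

```typescript
// Hypothetical sketch of the managed-session state transition.
interface ManagedSessionInput {
	command?: string;
	managedSessionName?: string; // Set when the call ran against a managed session.
	priorActive: boolean;
	priorSessionName: string;
	succeeded: boolean;
}

interface ManagedSessionState {
	active: boolean;
	sessionName: string;
	replacedSessionName?: string;
}

function sketchResolveManagedSessionState(input: ManagedSessionInput): ManagedSessionState {
	const { command, managedSessionName, priorActive, priorSessionName, succeeded } = input;
	// Failed calls and calls on user-managed sessions leave the state untouched.
	if (!managedSessionName || !succeeded) {
		return { active: priorActive, sessionName: priorSessionName };
	}
	// Assumption: a successful close ends the managed session.
	if (command === "close") {
		return { active: false, sessionName: managedSessionName };
	}
	const next: ManagedSessionState = { active: true, sessionName: managedSessionName };
	// A fresh launch that displaced an active session reports the old name for cleanup.
	if (priorActive && managedSessionName !== priorSessionName) {
		next.replacedSessionName = priorSessionName;
	}
	return next;
}
```

Returning the replaced name instead of closing it inside the function keeps the helper free of subprocess work, so the caller can do the best-effort close after the switch succeeds.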
package/extensions/agent-browser/index.ts

@@ -1,6 +1,6 @@
 /**
  * Purpose: Register the native agent_browser tool for pi so agents can invoke agent-browser without going through bash.
- * Responsibilities: Define the tool schema, inject thin wrapper behavior around the upstream CLI, manage implicit session convenience, and return pi-friendly content/details.
+ * Responsibilities: Define the tool schema, inject thin wrapper behavior around the upstream CLI, manage extension-owned browser session convenience, and return pi-friendly content/details.
  * Scope: Native tool registration and orchestration only; the wrapper intentionally stays close to the upstream agent-browser CLI.
  * Usage: Loaded by pi through the package manifest in this package, or explicitly via `pi --no-extensions -e .` during local checkout development.
  * Invariants/Assumptions: agent-browser is installed separately on PATH, the wrapper targets the current locally installed upstream version only, and no backward-compatibility shims are provided.
@@ -17,12 +17,13 @@ import {
 	buildExecutionPlan,
 	buildPromptPolicy,
 	createEphemeralSessionSeed,
+	createFreshSessionName,
 	createImplicitSessionName,
 	getImplicitSessionCloseTimeoutMs,
 	getImplicitSessionIdleTimeoutMs,
 	getLatestUserPrompt,
 	hasUsableBraveApiKey,
-	resolveImplicitSessionActiveState,
+	resolveManagedSessionState,
 	validateToolArgs,
 } from "./lib/runtime.js";
 import { cleanupSecureTempArtifacts } from "./lib/temp.js";
@@ -38,7 +39,7 @@ const AGENT_BROWSER_PARAMS = Type.Object({
 	sessionMode: Type.Optional(
 		Type.Union([Type.Literal("auto"), Type.Literal("fresh")], {
 			description:
-				"Session handling mode. `auto` reuses the implicit pi-scoped session when possible. `fresh` skips the implicit session so startup-scoped flags like --profile, --session-name, or --cdp can launch a fresh upstream session.",
+				"Session handling mode. `auto` reuses the extension-managed pi-scoped session when possible. `fresh` switches that managed session to a fresh upstream launch so startup-scoped flags like --profile, --session-name, or --cdp apply and later auto calls follow the new browser.",
 			default: DEFAULT_SESSION_MODE,
 		}),
 	),
@@ -46,7 +47,7 @@ const AGENT_BROWSER_PARAMS = Type.Object({
 const PROJECT_RULE_PROMPT =
 	"Project rule: when browser automation is needed, prefer the native `agent_browser` tool. Do not run direct `agent-browser` bash commands unless the user explicitly asks for a bash-oriented workflow or browser-integration debugging.";
 const QUICK_START_GUIDELINES = [
-	"Quick start mental model: args are the exact agent-browser CLI args after the binary; stdin is only for batch and eval --stdin; sessionMode=fresh starts a fresh upstream launch when you need new --profile, --session-name, or --cdp state.",
+	"Quick start mental model: args are the exact agent-browser CLI args after the binary; stdin is only for batch and eval --stdin; sessionMode=fresh switches the extension-managed session to a fresh upstream launch when you need new --profile, --session-name, or --cdp state.",
 	"Common first calls: { args: [\"open\", \"https://example.com\"] } then { args: [\"snapshot\", \"-i\"] }; after navigation, use { args: [\"click\", \"@e2\"] } then { args: [\"snapshot\", \"-i\"] }.",
 	"Common advanced calls: { args: [\"batch\"], stdin: \"[[\\\"open\\\",\\\"https://example.com\\\"],[\\\"snapshot\\\",\\\"-i\\\"]]\" }, { args: [\"eval\", \"--stdin\"], stdin: \"document.title\" }, and { args: [\"--profile\", \"Default\", \"open\", \"https://example.com/account\"], sessionMode: \"fresh\" }.",
 ] as const;
@@ -57,7 +58,7 @@ const SHARED_BROWSER_PLAYBOOK_GUIDELINES = [
 	"For authenticated or user-specific content like feeds, inboxes, dashboards, and accounts, prefer --profile Default on the first browser call and let the implicit session carry continuity. Use --auto-connect only if profile-based reuse is unavailable or the task is specifically about attaching to a running debug-enabled browser.",
 	"Do not invent fixed explicit session names for routine tasks. Use the implicit session unless you truly need multiple isolated browser sessions in the same conversation.",
 	"When using --profile, --session-name, or --cdp, put them on the first command for that session. If you intentionally use an explicit --session, keep using that same explicit session for follow-ups.",
-	"If you already used the implicit session and now need startup-scoped flags like --profile, --session-name, or --cdp, retry with sessionMode set to fresh or pass an explicit --session for the new launch.",
+	"If you already used the implicit session and now need startup-scoped flags like --profile, --session-name, or --cdp, retry with sessionMode set to fresh or pass an explicit --session for the new launch. After a successful unnamed fresh launch, later auto calls follow that new session.",
 	"If a session lands on the wrong page or tab, an interaction changes origin unexpectedly, or an open call returns blocked, blank, or otherwise unexpected results, use tab list / tab <n> / snapshot -i to recover state before retrying different URLs or fallback strategies. Only use wait with an explicit argument like milliseconds, --load, --url, --fn, or --text.",
 	"For feed, timeline, or inbox reading tasks, focus on the main timeline/list region and read the first item there rather than unrelated composer or sidebar content.",
 	"For read-only browsing tasks, prefer extracting the answer from the current snapshot, structured ref labels, or eval --stdin on the current page before navigating away. Only click into media viewers, detail routes, or new pages when the current view does not contain the needed information.",
@@ -71,8 +72,8 @@ const TOOL_PROMPT_GUIDELINES_SUFFIX = [
 	"Do not fall back to osascript, AppleScript, or generic browser-driving bash commands when this tool can do the job.",
 	"Pass exact agent-browser CLI arguments in args, excluding the binary name.",
 	"Use stdin for commands like eval --stdin and batch instead of shell heredocs.",
-	"Let the implicit session handle the common path unless you explicitly need a fresh launch for upstream flags like --profile, --session-name, or --cdp.",
-	"Use sessionMode=fresh when switching from an existing implicit session to a new profile/debug launch without inventing a fixed explicit session name.",
+	"Let the extension-managed session handle the common path unless you explicitly need a fresh launch for upstream flags like --profile, --session-name, or --cdp.",
+	"Use sessionMode=fresh when switching from an existing implicit session to a new profile/debug launch without inventing a fixed explicit session name; later auto calls will follow that new session.",
 ] as const;
 function buildMissingBinaryMessage(): string {
@@ -90,12 +91,19 @@ function buildInvocationPreview(effectiveArgs: string[]): string {
 	return preview.length > 120 ? `${preview.slice(0, 117)}...` : preview;
 }
+const AGENT_BROWSER_BASH_PREFIX = String.raw`(?:env(?:\s+[A-Za-z_][A-Za-z0-9_]*=[^\s;&|]+)*\s+)?(?:(?:npx|bunx)(?:\s+-[^\s;&|]+|\s+--[^\s;&|]+(?:=[^\s;&|]+)?)*\s+|(?:pnpm|yarn)\s+dlx(?:\s+-[^\s;&|]+|\s+--[^\s;&|]+(?:=[^\s;&|]+)?)*\s+)?`;
+const AGENT_BROWSER_BASH_EXECUTABLE = String.raw`(?:[.~]|\.\.?|\/)?(?:[^\s;&|]+\/)?agent-browser`;
+const DIRECT_AGENT_BROWSER_BASH_PATTERN = new RegExp(
+	String.raw`(^|[\s;&|])${AGENT_BROWSER_BASH_PREFIX}${AGENT_BROWSER_BASH_EXECUTABLE}(?=\s|$)`,
+);
+const HARMLESS_AGENT_BROWSER_INSPECTION_PATTERN = /(command\s+-v|which|type\s+-P)\s+agent-browser\b/;
 function looksLikeDirectAgentBrowserBash(command: string): boolean {
-	return /(^|[\s;&|])(npx\s+)?agent-browser(\s|$)/.test(command);
+	return DIRECT_AGENT_BROWSER_BASH_PATTERN.test(command);
 }
 function isHarmlessAgentBrowserInspectionCommand(command: string): boolean {
-	return /(command\s+-v|which)\s+agent-browser\b/.test(command) || /(^|\s)agent-browser\s+--(help|version)\b/.test(command);
+	return HARMLESS_AGENT_BROWSER_INSPECTION_PATTERN.test(command);
 }
 function isPlainTextInspectionArgs(args: string[]): boolean {
@@ -208,6 +216,22 @@ function buildToolPromptGuidelines(hasBraveApiKey: boolean): string[] {
 	];
 }
+async function closeManagedSession(options: { cwd: string; sessionName: string; timeoutMs: number }): Promise<void> {
+	const controller = new AbortController();
+	const timer = setTimeout(() => controller.abort(), options.timeoutMs);
+	try {
+		await runAgentBrowserProcess({
+			args: ["--session", options.sessionName, "close"],
+			cwd: options.cwd,
+			signal: controller.signal,
+		});
+	} catch {
+		// Best-effort cleanup only.
+	} finally {
+		clearTimeout(timer);
+	}
+}
 export default function agentBrowserExtension(pi: ExtensionAPI) {
 	const ephemeralSessionSeed = createEphemeralSessionSeed();
 	const hasBraveApiKey = hasUsableBraveApiKey();
@@ -215,32 +239,28 @@ export default function agentBrowserExtension(pi: ExtensionAPI) {
 	const toolPromptGuidelines = buildToolPromptGuidelines(hasBraveApiKey);
 	const implicitSessionIdleTimeoutMs = getImplicitSessionIdleTimeoutMs();
 	const implicitSessionCloseTimeoutMs = getImplicitSessionCloseTimeoutMs();
-	let implicitSessionActive = false;
-	let implicitSessionName = createImplicitSessionName(undefined, process.cwd(), ephemeralSessionSeed);
-	let implicitSessionCwd = process.cwd();
+	let managedSessionActive = false;
+	let managedSessionBaseName = createImplicitSessionName(undefined, process.cwd(), ephemeralSessionSeed);
+	let managedSessionName = managedSessionBaseName;
+	let managedSessionCwd = process.cwd();
+	let freshSessionOrdinal = 0;
 	pi.on("session_start", async (_event, ctx) => {
-		implicitSessionActive = false;
-		implicitSessionName = createImplicitSessionName(ctx.sessionManager.getSessionId(), ctx.cwd, ephemeralSessionSeed);
-		implicitSessionCwd = ctx.cwd;
+		managedSessionActive = false;
+		managedSessionBaseName = createImplicitSessionName(ctx.sessionManager.getSessionId(), ctx.cwd, ephemeralSessionSeed);
+		managedSessionName = managedSessionBaseName;
+		managedSessionCwd = ctx.cwd;
+		freshSessionOrdinal = 0;
 	});
 	pi.on("session_shutdown", async () => {
-		implicitSessionActive = false;
-		const controller = new AbortController();
-		const timer = setTimeout(() => controller.abort(), implicitSessionCloseTimeoutMs);
-		try {
-			await runAgentBrowserProcess({
-				args: ["--session", implicitSessionName, "close"],
-				cwd: implicitSessionCwd,
-				signal: controller.signal,
-			});
-		} catch {
-			// Best-effort cleanup only.
-		} finally {
-			clearTimeout(timer);
-			await cleanupSecureTempArtifacts();
-		}
+		managedSessionActive = false;
+		await closeManagedSession({
+			cwd: managedSessionCwd,
+			sessionName: managedSessionName,
+			timeoutMs: implicitSessionCloseTimeoutMs,
+		});
+		await cleanupSecureTempArtifacts();
 	});
 	pi.on("before_agent_start", async (event) => {
@@ -284,17 +304,23 @@ export default function agentBrowserExtension(pi: ExtensionAPI) {
 			}
 			const sessionMode = params.sessionMode ?? DEFAULT_SESSION_MODE;
+			const freshSessionName = createFreshSessionName(managedSessionBaseName, ephemeralSessionSeed, freshSessionOrdinal + 1);
 			const executionPlan = buildExecutionPlan(params.args, {
-				implicitSessionActive,
-				implicitSessionName,
+				freshSessionName,
+				managedSessionActive,
+				managedSessionName,
 				sessionMode,
 			});
+			if (executionPlan.managedSessionName === freshSessionName) {
+				freshSessionOrdinal += 1;
+			}
 			if (executionPlan.validationError) {
 				return {
 					content: [{ type: "text", text: executionPlan.validationError }],
 					details: {
 						args: params.args,
+						invalidValueFlag: executionPlan.invalidValueFlag,
 						sessionMode,
 						sessionRecoveryHint: executionPlan.recoveryHint,
 						startupScopedFlags: executionPlan.startupScopedFlags,
@@ -317,9 +343,7 @@ export default function agentBrowserExtension(pi: ExtensionAPI) {
 			const processResult = await runAgentBrowserProcess({
 				args: executionPlan.effectiveArgs,
 				cwd: ctx.cwd,
-				env: executionPlan.usedImplicitSession
-					? { AGENT_BROWSER_IDLE_TIMEOUT_MS: implicitSessionIdleTimeoutMs }
-					: undefined,
+				env: executionPlan.managedSessionName ? { AGENT_BROWSER_IDLE_TIMEOUT_MS: implicitSessionIdleTimeoutMs } : undefined,
 				signal,
 				stdin: params.stdin,
 			});
@@ -365,12 +389,27 @@ export default function agentBrowserExtension(pi: ExtensionAPI) {
 					}
 				}
-				implicitSessionActive = resolveImplicitSessionActiveState({
+				const priorManagedSessionCwd = managedSessionCwd;
+				const managedSessionState = resolveManagedSessionState({
 					command: executionPlan.commandInfo.command,
-					priorActive: implicitSessionActive,
+					managedSessionName: executionPlan.managedSessionName,
+					priorActive: managedSessionActive,
+					priorSessionName: managedSessionName,
 					succeeded,
-					usedImplicitSession: executionPlan.usedImplicitSession,
 				});
+				const replacedManagedSessionName = managedSessionState.replacedSessionName;
+				managedSessionActive = managedSessionState.active;
+				managedSessionName = managedSessionState.sessionName;
+				if (executionPlan.managedSessionName && succeeded) {
+					managedSessionCwd = ctx.cwd;
+				}
+				if (replacedManagedSessionName) {
+					await closeManagedSession({
+						cwd: priorManagedSessionCwd,
+						sessionName: replacedManagedSessionName,
+						timeoutMs: implicitSessionCloseTimeoutMs,
+					});
+				}
 				const errorText = getAgentBrowserErrorText({
 					aborted: processResult.aborted,
@@ -399,6 +438,7 @@ export default function agentBrowserExtension(pi: ExtensionAPI) {
 					content: presentation.content,
 					details: {
 						args: params.args,
+						batchFailure: presentation.batchFailure,
 						batchSteps: presentation.batchSteps,
 						command: executionPlan.commandInfo.command,
 						subcommand: executionPlan.commandInfo.subcommand,

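To see what the widened bash guard matches, the two patterns from this file can be exercised in isolation. The regular expressions below are copied from the diff. The `shouldBlock` helper is hypothetical: how the extension combines the two checks is not shown in these hunks, so treat the combination as an assumption.

```typescript
// Patterns copied from the 0.2.1 version of index.ts.
const AGENT_BROWSER_BASH_PREFIX = String.raw`(?:env(?:\s+[A-Za-z_][A-Za-z0-9_]*=[^\s;&|]+)*\s+)?(?:(?:npx|bunx)(?:\s+-[^\s;&|]+|\s+--[^\s;&|]+(?:=[^\s;&|]+)?)*\s+|(?:pnpm|yarn)\s+dlx(?:\s+-[^\s;&|]+|\s+--[^\s;&|]+(?:=[^\s;&|]+)?)*\s+)?`;
const AGENT_BROWSER_BASH_EXECUTABLE = String.raw`(?:[.~]|\.\.?|\/)?(?:[^\s;&|]+\/)?agent-browser`;
const DIRECT_AGENT_BROWSER_BASH_PATTERN = new RegExp(
	String.raw`(^|[\s;&|])${AGENT_BROWSER_BASH_PREFIX}${AGENT_BROWSER_BASH_EXECUTABLE}(?=\s|$)`,
);
const HARMLESS_AGENT_BROWSER_INSPECTION_PATTERN = /(command\s+-v|which|type\s+-P)\s+agent-browser\b/;

// Hypothetical combination: flag direct invocations unless they only inspect the binary.
function shouldBlock(command: string): boolean {
	return DIRECT_AGENT_BROWSER_BASH_PATTERN.test(command) && !HARMLESS_AGENT_BROWSER_INSPECTION_PATTERN.test(command);
}

const samples = [
	"npx --yes agent-browser open https://example.com",
	"/usr/local/bin/agent-browser snapshot -i",
	"which agent-browser",
	"cat agent-browser.md",
];
for (const sample of samples) {
	console.log(`${shouldBlock(sample) ? "block" : "allow"}: ${sample}`);
}
```

The trailing lookahead `(?=\s|$)` is what keeps file names such as `agent-browser.md` from being treated as an invocation.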
package/extensions/agent-browser/lib/results/envelope.ts

@@ -10,6 +10,10 @@ import { readFile } from "node:fs/promises";
 import { type AgentBrowserBatchResult, type AgentBrowserEnvelope, isRecord, stringifyUnknown } from "./shared.js";
+function hasStructuredBatchStepFailure(data: unknown): data is AgentBrowserBatchResult[] {
+	return Array.isArray(data) && data.some((item) => isRecord(item) && item.success === false);
+}
 async function readEnvelopeSource(options: { stdout: string; stdoutPath?: string }): Promise<string> {
 	if (!options.stdoutPath) {
 		return options.stdout;
@@ -93,6 +97,9 @@ export function getAgentBrowserErrorText(options: {
 	if (spawnError) return spawnError.message;
 	if (parseError) return parseError;
 	if (envelope?.success === false) {
+		if (hasStructuredBatchStepFailure(envelope.data) && envelope.error === undefined) {
+			return undefined;
+		}
 		return extractEnvelopeErrorText(envelope.error) ?? (stderr.trim() || `agent-browser reported failure${exitCode !== 0 ? ` (exit code ${exitCode})` : "."}`);
 	}
 	if (exitCode !== 0) {

package/extensions/agent-browser/lib/results/presentation.ts

@@ -14,6 +14,7 @@ import { buildSnapshotPresentation, formatRawSnapshotText, formatSnapshotSummary
 import {
 	type AgentBrowserBatchResult,
 	type AgentBrowserEnvelope,
+	type BatchFailurePresentationDetails,
 	type BatchStepPresentationDetails,
 	type ToolPresentation,
 	isRecord,
@@ -188,6 +189,20 @@ function formatBatchStepError(error: unknown): string {
 	return errorText.length > 0 ? `Error: ${errorText}` : "Error: batch step failed.";
 }
+function getBatchFailureDetails(steps: Array<{ details: BatchStepPresentationDetails }>): BatchFailurePresentationDetails | undefined {
+	const failedSteps = steps.filter((step) => step.details.success === false);
+	if (failedSteps.length === 0) {
+		return undefined;
+	}
+	const successCount = steps.length - failedSteps.length;
+	return {
+		failedStep: failedSteps[0].details,
+		failureCount: failedSteps.length,
+		successCount,
+		totalCount: steps.length,
+	};
+}
 async function buildBatchStepPresentation(options: {
 	cwd: string;
 	index: number;
@@ -261,6 +276,7 @@ async function buildBatchPresentation(options: {
 		steps.push(await buildBatchStepPresentation({ cwd, index, item }));
 	}
+	const batchFailure = getBatchFailureDetails(steps);
 	const images = steps.flatMap((step) => getPresentationImages(step.presentation));
 	const fullOutputPaths = steps.flatMap((step) => getPresentationPaths({
 		primaryPath: step.presentation.fullOutputPath,
@@ -270,13 +286,14 @@ async function buildBatchPresentation(options: {
 		primaryPath: step.presentation.imagePath,
 		secondaryPaths: step.presentation.imagePaths,
 	}));
-	const text =
+	const stepText =
 		steps.length === 0
 			? "(no batch steps)"
 			: steps
 				.map(({ details, presentation }) => {
 					const inlineImageCount = getPresentationImages(presentation).length;
-					const lines = [`Step ${details.index + 1} — ${details.commandText}`];
+					const status = details.success ? "succeeded" : "failed";
+					const lines = [`Step ${details.index + 1} — ${details.commandText} (${status})`];
 					if (details.text.length > 0) {
 						lines.push(details.text);
 					}
@@ -286,8 +303,20 @@ async function buildBatchPresentation(options: {
 					return lines.join("\n");
 				})
 				.join("\n\n");
+	const failureHeader =
+		batchFailure === undefined
+			? undefined
+			: [
+					summary,
+					`First failing step: ${batchFailure.failedStep.index + 1} — ${batchFailure.failedStep.commandText}`,
+					batchFailure.failureCount > 1
+						? `${batchFailure.failureCount} steps failed. See the per-step results below.`
+						: "See the per-step results below.",
+				].join("\n");
+	const text = failureHeader ? `${failureHeader}\n\n${stepText}` : stepText;
 	return {
+		batchFailure,
 		batchSteps: steps.map((step) => step.details),
 		content: [{ type: "text", text }, ...images],
 		data,
@@ -302,7 +331,7 @@ async function buildBatchPresentation(options: {
 function formatSummary(commandInfo: CommandInfo, data: unknown): string {
 	if (Array.isArray(data) && commandInfo.command === "batch") {
 		const successCount = data.filter((item) => isRecord(item) && item.success !== false).length;
-		return `Batch: ${successCount}/${data.length} succeeded`;
+		return successCount === data.length ? `Batch: ${successCount}/${data.length} succeeded` : `Batch failed: ${successCount}/${data.length} succeeded`;
 	}
 	if (isRecord(data)) {
 		const navigationSummary = getNavigationSummary(data);

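The mixed-success batch handling in this file can be tried out with a self-contained sketch. It uses a simplified, assumed `StepDetails` type in place of `BatchStepPresentationDetails` (whose full definition is not shown in this diff) and follows the same counting as `getBatchFailureDetails` and the updated `formatSummary`. The names `summarizeBatchFailure` and `formatBatchSummary` are illustrative only.

```typescript
// Simplified stand-in for BatchStepPresentationDetails; field set is an assumption.
interface StepDetails {
	index: number;
	commandText: string;
	success: boolean;
	text: string;
}

interface BatchFailure {
	failedStep: StepDetails;
	failureCount: number;
	successCount: number;
	totalCount: number;
}

// Returns undefined when every step succeeded, otherwise details about the first failure.
function summarizeBatchFailure(steps: StepDetails[]): BatchFailure | undefined {
	const failedSteps = steps.filter((step) => !step.success);
	const firstFailure = failedSteps[0];
	if (firstFailure === undefined) return undefined;
	return {
		failedStep: firstFailure,
		failureCount: failedSteps.length,
		successCount: steps.length - failedSteps.length,
		totalCount: steps.length,
	};
}

// Mirrors the summary wording introduced in 0.2.1.
function formatBatchSummary(steps: StepDetails[]): string {
	const successCount = steps.filter((step) => step.success).length;
	return successCount === steps.length
		? `Batch: ${successCount}/${steps.length} succeeded`
		: `Batch failed: ${successCount}/${steps.length} succeeded`;
}
```

Keeping the first failing step alongside the counts gives an agent enough structured detail to retry from that step rather than rerunning the whole batch.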
package/extensions/agent-browser/lib/results/shared.ts

@@ -33,7 +33,15 @@ export interface BatchStepPresentationDetails {
 	text: string;
 }
+export interface BatchFailurePresentationDetails {
+	failedStep: BatchStepPresentationDetails;
+	failureCount: number;
+	successCount: number;
+	totalCount: number;
+}
 export interface ToolPresentation {
+	batchFailure?: BatchFailurePresentationDetails;
 	batchSteps?: BatchStepPresentationDetails[];
 	content: Array<{ text: string; type: "text" } | { data: string; mimeType: string; type: "image" }>;
 	data?: unknown;

package/extensions/agent-browser/lib/runtime.ts

@@ -1,9 +1,9 @@
 /**
  * Purpose: Build safe, deterministic agent-browser invocations for the pi-agent-browser extension.
- * Responsibilities: Validate raw tool arguments, derive implicit session names from the pi session identity, resolve implicit-session timeout/state helpers, detect explicit session usage, and build the effective CLI argument list passed to the upstream agent-browser binary.
+ * Responsibilities: Validate raw tool arguments, derive extension-managed session names from the pi session identity, resolve managed-session timeout/state helpers, detect explicit session usage, and build the effective CLI argument list passed to the upstream agent-browser binary.
  * Scope: Pure runtime-planning helpers only; no subprocess execution or filesystem access lives here.
  * Usage: Imported by the extension entrypoint and unit tests before spawning the upstream CLI.
- * Invariants/Assumptions: The wrapper stays thin, preserves upstream command vocabulary, and only injects `--json` plus an implicit `--session` when appropriate.
+ * Invariants/Assumptions: The wrapper stays thin, preserves upstream command vocabulary, and only injects `--json` plus an extension-managed `--session` when appropriate.
  */
 import { createHash, randomUUID } from "node:crypto";
@@ -49,6 +49,8 @@ const GLOBAL_FLAGS_WITH_VALUES = new Set([
 ]);
 const SHELL_OPERATOR_TOKENS = new Set(["&&", "||", "|", ";", ">", ">>", "<"]);
 const MAX_PROJECT_SLUG_LENGTH = 24;
+const SESSION_NAME_CWD_HASH_LENGTH = 8;
+const SESSION_NAME_SESSION_ID_LENGTH = 12;
 export interface CommandInfo {
 	command?: string;
@@ -64,9 +66,18 @@ export interface SessionRecoveryHint {
 	recommendedSessionMode: "fresh";
 }
+export interface InvalidValueFlagDetails {
+	flag: string;
+	index: number;
+	reason: "missing-value" | "unexpected-flag";
+	receivedToken?: string;
+}
 export interface ExecutionPlan {
 	commandInfo: CommandInfo;
 	effectiveArgs: string[];
+	invalidValueFlag?: InvalidValueFlagDetails;
+	managedSessionName?: string;
 	recoveryHint?: SessionRecoveryHint;
 	sessionName?: string;
 	startupScopedFlags: string[];
@@ -74,6 +85,12 @@ export interface ExecutionPlan {
 	validationError?: string;
 }
+export interface ManagedSessionState {
+	active: boolean;
+	replacedSessionName?: string;
+	sessionName: string;
+}
 export interface PromptPolicy {
 	allowLegacyAgentBrowserBash: boolean;
 }
@@ -105,25 +122,38 @@ export function getImplicitSessionCloseTimeoutMs(env: NodeJS.ProcessEnv = proces
 	return parseTimeoutMs(env[IMPLICIT_SESSION_CLOSE_TIMEOUT_ENV], 0) ?? DEFAULT_IMPLICIT_SESSION_CLOSE_TIMEOUT_MS;
 }
-export function resolveImplicitSessionActiveState(options: {
+export function resolveManagedSessionState(options: {
 	command?: string;
+	managedSessionName?: string;
 	priorActive: boolean;
+	priorSessionName: string;
 	succeeded: boolean;
-	usedImplicitSession: boolean;
-}): boolean {
-	const { command, priorActive, succeeded, usedImplicitSession } = options;
-	if (!usedImplicitSession) return priorActive;
-	if (command === "close") {
-		return succeeded ? false : priorActive;
+}): ManagedSessionState {
+	const { command, managedSessionName, priorActive, priorSessionName, succeeded } = options;
+	if (!managedSessionName) {
+		return { active: priorActive, sessionName: priorSessionName };
 	}
-	if (!command) return priorActive;
-	return priorActive || succeeded;
+	if (command === "close" && managedSessionName === priorSessionName) {
+		return { active: succeeded ? false : priorActive, sessionName: priorSessionName };
+	}
+	if (!succeeded) {
+		return { active: priorActive, sessionName: priorSessionName };
+	}
+	return {
+		active: true,
+		replacedSessionName: priorActive && priorSessionName !== managedSessionName ? priorSessionName : undefined,
+		sessionName: managedSessionName,
+	};
 }
 export function createEphemeralSessionSeed(): string {
 	return randomUUID();
 }
+function createCwdHash(cwd: string): string {
+	return createHash("sha256").update(`cwd:${cwd}`).digest("hex").slice(0, SESSION_NAME_CWD_HASH_LENGTH);
+}
 export function createImplicitSessionName(
 	sessionId: string | undefined,
 	cwd: string,
@@ -135,13 +165,25 @@ export function createImplicitSessionName(
 			.replace(/[^a-z0-9]+/g, "-")
 			.replace(/^-+|-+$/g, "")
 			.slice(0, MAX_PROJECT_SLUG_LENGTH) || "project";
-	const stableSessionId = sessionId?.replace(/-/g, "").slice(0, 12);
+	const cwdHash = createCwdHash(cwd);
+	const stableSessionId = sessionId?.replace(/-/g, "").slice(0, SESSION_NAME_SESSION_ID_LENGTH);
 	if (stableSessionId && stableSessionId.length > 0) {
-		return `piab-${slug}-${stableSessionId}`;
+		return `piab-${slug}-${stableSessionId}-${cwdHash}`;
 	}
-	const digest = createHash("sha256").update(`ephemeral:${cwd}:${ephemeralSeed}`).digest("hex").slice(0, 12);
-	return `piab-${slug}-${digest}`;
+	const digest = createHash("sha256")
+		.update(`ephemeral:${cwd}:${ephemeralSeed}`)
+		.digest("hex")
+		.slice(0, SESSION_NAME_SESSION_ID_LENGTH);
+	return `piab-${slug}-${digest}-${cwdHash}`;
+}
+export function createFreshSessionName(baseSessionName: string, ephemeralSeed: string, ordinal: number): string {
+	const suffix = createHash("sha256")
+		.update(`fresh:${baseSessionName}:${ephemeralSeed}:${ordinal}`)
+		.digest("hex")
+		.slice(0, 10);
+	return `${baseSessionName}-fresh-${suffix}`;
 }
 export function validateToolArgs(args: string[]): string | undefined {
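
Note on the session-name hunk above: the new `-${cwdHash}` suffix is what keeps same-named checkouts apart. The sketch below reproduces the naming scheme under stated assumptions. The slug derivation from the last path segment is a guess, since the diff only shows the tail of that pipeline, and `sketchSessionName`, `sketchFreshSessionName`, and the example paths are hypothetical.

```typescript
import { createHash } from "node:crypto";

const CWD_HASH_LENGTH = 8; // same value as SESSION_NAME_CWD_HASH_LENGTH in the diff
const SESSION_ID_LENGTH = 12; // same value as SESSION_NAME_SESSION_ID_LENGTH in the diff
const MAX_SLUG_LENGTH = 24; // same value as MAX_PROJECT_SLUG_LENGTH in the diff

// Same hashing recipe as createCwdHash in the diff.
function cwdHash(cwd: string): string {
	return createHash("sha256").update(`cwd:${cwd}`).digest("hex").slice(0, CWD_HASH_LENGTH);
}

// Assumption: the slug comes from the last path segment of the cwd.
function projectSlug(cwd: string): string {
	const base = cwd.split(/[\\/]/).filter(Boolean).pop() ?? "";
	return (
		base
			.toLowerCase()
			.replace(/[^a-z0-9]+/g, "-")
			.replace(/^-+|-+$/g, "")
			.slice(0, MAX_SLUG_LENGTH) || "project"
	);
}

// Stable-session-id branch of createImplicitSessionName, simplified.
function sketchSessionName(sessionId: string, cwd: string): string {
	const stableId = sessionId.replace(/-/g, "").slice(0, SESSION_ID_LENGTH);
	return `piab-${projectSlug(cwd)}-${stableId}-${cwdHash(cwd)}`;
}

// Same recipe as createFreshSessionName in the diff.
function sketchFreshSessionName(base: string, seed: string, ordinal: number): string {
	const suffix = createHash("sha256").update(`fresh:${base}:${seed}:${ordinal}`).digest("hex").slice(0, 10);
	return `${base}-fresh-${suffix}`;
}

// Two checkouts named "app" under different parents share the slug and the
// session id prefix, so only the cwd hash can distinguish them.
const sessionId = "01234567-89ab-cdef-0123-456789abcdef";
const nameA = sketchSessionName(sessionId, "/home/a/work/app");
const nameB = sketchSessionName(sessionId, "/home/b/work/app");
console.log(nameA, nameB);
```

Because the fresh-session suffix hashes the base name, the seed, and an ordinal together, repeated `sessionMode: "fresh"` launches from the same checkout would get distinct names as long as the ordinal advances.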
@@ -157,6 +199,54 @@ export function validateToolArgs(args: string[]): string | undefined {
 	return undefined;
 }
+function getInvalidValueFlagDetails(args: string[]): InvalidValueFlagDetails | undefined {
+	for (const [index, token] of args.entries()) {
+		if (!token.startsWith("-")) {
+			continue;
+		}
+		const normalizedToken = token.split("=", 1)[0] ?? token;
+		if (!GLOBAL_FLAGS_WITH_VALUES.has(normalizedToken)) {
+			continue;
+		}
+		if (token.includes("=")) {
+			const value = token.slice(token.indexOf("=") + 1).trim();
+			if (value.length === 0) {
+				return {
+					flag: normalizedToken,
+					index,
+					reason: "missing-value",
+				};
+			}
+			continue;
+		}
+		const receivedToken = args[index + 1];
+		if (receivedToken === undefined) {
+			return {
+				flag: normalizedToken,
+				index,
+				reason: "missing-value",
+			};
+		}
+		if (receivedToken.startsWith("-")) {
+			return {
+				flag: normalizedToken,
+				index,
+				reason: "unexpected-flag",
+				receivedToken,
+			};
+		}
+		continue;
+	}
+	return undefined;
+}
+function formatInvalidValueFlagError(details: InvalidValueFlagDetails): string {
+	if (details.reason === "unexpected-flag" && details.receivedToken) {
+		return `Flag \`${details.flag}\` requires a value, but received \`${details.receivedToken}\` instead. Pass a non-flag value immediately after \`${details.flag}\`.`;
+	}
+	return `Flag \`${details.flag}\` requires a value immediately after it. Pass a non-flag token like \`${details.flag} demo\`.`;
+}
 function hasFlagToken(args: string[], flag: string): boolean {
 	return args.some((token) => token === flag || token.startsWith(`${flag}=`));
 }
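
Note on the value-flag validation hunk above: the check can be exercised in isolation. This is a condensed sketch that follows the logic of `getInvalidValueFlagDetails`, assuming a four-entry flag set taken from the changelog. The real `GLOBAL_FLAGS_WITH_VALUES` set is not fully visible in the diff and may contain more flags, and `findInvalidValueFlag` is a hypothetical name.

```typescript
interface InvalidValueFlag {
	flag: string;
	index: number;
	reason: "missing-value" | "unexpected-flag";
	receivedToken?: string;
}

// Assumed subset of the value-taking flags named in the changelog.
const FLAGS_WITH_VALUES = new Set(["--session", "--profile", "--session-name", "--cdp"]);

function findInvalidValueFlag(args: string[]): InvalidValueFlag | undefined {
	for (let index = 0; index < args.length; index += 1) {
		const token = args[index] ?? "";
		if (!token.startsWith("-")) continue;
		// Normalize `--flag=value` to `--flag` before the set lookup.
		const flag = token.split("=", 1)[0] ?? token;
		if (!FLAGS_WITH_VALUES.has(flag)) continue;
		if (token.includes("=")) {
			// Inline form: the value is whatever follows the first `=`.
			const value = token.slice(token.indexOf("=") + 1).trim();
			if (value.length === 0) return { flag, index, reason: "missing-value" };
			continue;
		}
		// Separate form: the value must be the next token and must not look like a flag.
		const next = args[index + 1];
		if (next === undefined) return { flag, index, reason: "missing-value" };
		if (next.startsWith("-")) return { flag, index, reason: "unexpected-flag", receivedToken: next };
	}
	return undefined;
}

// `--session` swallowed by another flag, the case described in the changelog.
console.log(findInvalidValueFlag(["--session", "--profile", "work"]));
```

One consequence of this design is that any value beginning with `-` is rejected in the separate-token form, so such a value would have to be passed inline as `--flag=value`.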
@@ -213,35 +303,60 @@ export function getLatestUserPrompt(branch: unknown[]): string {
 export function buildExecutionPlan(
 	args: string[],
-	options: { implicitSessionActive: boolean; implicitSessionName: string; sessionMode: SessionMode },
+	options: {
+		freshSessionName: string;
+		managedSessionActive: boolean;
+		managedSessionName: string;
+		sessionMode: SessionMode;
+	},
 ): ExecutionPlan {
+	const effectiveArgs = args.includes("--json") ? [] : ["--json"];
+	const invalidValueFlag = getInvalidValueFlagDetails(args);
+	if (invalidValueFlag) {
+		return {
+			commandInfo: {},
+			effectiveArgs,
+			invalidValueFlag,
+			startupScopedFlags: [],
+			usedImplicitSession: false,
+			validationError: formatInvalidValueFlagError(invalidValueFlag),
+		};
+	}
 	const commandInfo = parseCommandInfo(args);
 	const explicitSessionName = extractExplicitSessionName(args);
 	const startupScopedFlags = getStartupScopedFlags(args);
-	const effectiveArgs = args.includes("--json") ? [] : ["--json"];
+	const shouldCreateFreshManagedSession =
+		!explicitSessionName && options.sessionMode === "fresh" && commandInfo.command !== undefined && commandInfo.command !== "close";
+	let managedSessionName: string | undefined;
 	let recoveryHint: SessionRecoveryHint | undefined;
 	let sessionName = explicitSessionName;
 	let usedImplicitSession = false;
 	let validationError: string | undefined;
 	if (!explicitSessionName && options.sessionMode === "auto") {
-		if (options.implicitSessionActive && startupScopedFlags.length > 0) {
+		if (options.managedSessionActive && startupScopedFlags.length > 0) {
 			recoveryHint = {
 				exampleArgs: args,
 				exampleParams: { args, sessionMode: "fresh" },
 				reason:
-					"Startup-scoped flags like --profile, --session-name, and --cdp need a fresh upstream launch once the implicit session is already active.",
+					"Startup-scoped flags like --profile, --session-name, and --cdp need a fresh upstream launch once the extension-managed session is already active.",
 				recommendedSessionMode: "fresh",
 			};
 			validationError = [
-				`The current implicit agent-browser session is already running, so startup-scoped flags ${startupScopedFlags.join(", ")} would be ignored by upstream agent-browser.`,
+				`The current extension-managed agent-browser session is already running, so startup-scoped flags ${startupScopedFlags.join(", ")} would be ignored by upstream agent-browser.`,
 				"Retry this call with `sessionMode: \"fresh\"` to force a fresh upstream launch, or pass an explicit `--session ...` if you want to name the new session yourself.",
 			].join(" ");
 		} else {
-			effectiveArgs.push("--session", options.implicitSessionName);
-			sessionName = options.implicitSessionName;
+			effectiveArgs.push("--session", options.managedSessionName);
+			managedSessionName = options.managedSessionName;
+			sessionName = options.managedSessionName;
 			usedImplicitSession = true;
 		}
+	} else if (shouldCreateFreshManagedSession) {
+		effectiveArgs.push("--session", options.freshSessionName);
+		managedSessionName = options.freshSessionName;
+		sessionName = options.freshSessionName;
 	}
 	effectiveArgs.push(...args);
@@ -249,6 +364,7 @@ export function buildExecutionPlan(
 	return {
 		commandInfo,
 		effectiveArgs,
+		managedSessionName,
 		recoveryHint,
 		sessionName,
 		startupScopedFlags,
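
Note on the managed-session changes above: the session rotation described in the changelog follows from `resolveManagedSessionState`. The block below copies that function from the diff so it can run standalone and walks it through a few transitions. The session names `piab-app` and `piab-app-fresh` are hypothetical placeholders, not names the extension would actually generate.

```typescript
interface ManagedSessionState {
	active: boolean;
	replacedSessionName?: string;
	sessionName: string;
}

// Copied from the diff so the transitions below can be run in isolation.
function resolveManagedSessionState(options: {
	command?: string;
	managedSessionName?: string;
	priorActive: boolean;
	priorSessionName: string;
	succeeded: boolean;
}): ManagedSessionState {
	const { command, managedSessionName, priorActive, priorSessionName, succeeded } = options;
	// Explicit `--session` calls never touch the managed session.
	if (!managedSessionName) {
		return { active: priorActive, sessionName: priorSessionName };
	}
	// Closing the current managed session deactivates it only on success.
	if (command === "close" && managedSessionName === priorSessionName) {
		return { active: succeeded ? false : priorActive, sessionName: priorSessionName };
	}
	// A failed call, including a failed fresh launch, leaves the prior state intact.
	if (!succeeded) {
		return { active: priorActive, sessionName: priorSessionName };
	}
	// A successful call adopts the session it ran in and reports any session it replaced.
	return {
		active: true,
		replacedSessionName: priorActive && priorSessionName !== managedSessionName ? priorSessionName : undefined,
		sessionName: managedSessionName,
	};
}

const prior = { priorActive: true, priorSessionName: "piab-app" };

// Successful fresh launch: the managed session rotates to the new name.
const rotated = resolveManagedSessionState({ ...prior, command: "open", managedSessionName: "piab-app-fresh", succeeded: true });

// Failed fresh launch: the older session stays current.
const kept = resolveManagedSessionState({ ...prior, command: "open", managedSessionName: "piab-app-fresh", succeeded: false });

// Successful close of the current session: the name is kept but marked inactive.
const closed = resolveManagedSessionState({ ...prior, command: "close", managedSessionName: "piab-app", succeeded: true });

console.log(rotated, kept, closed);
```

The `replacedSessionName` field is set only when an active session was displaced, which presumably gives the caller what it needs to clean up the older browser. The cleanup itself is not part of this hunk.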

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "pi-agent-browser-native",
-  "version": "0.2.0",
+  "version": "0.2.1",
   "description": "pi extension that exposes agent-browser as a native tool for browser automation",
   "type": "module",
   "author": "Mitch Fultz (https://github.com/fitchmultz)",