pi-agent-browser-native 0.2.51 → 0.2.53

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (21)

package/CHANGELOG.md +35 -1
package/README.md +6 -17
package/dist/extensions/agent-browser/lib/command-policy.js +11 -0
package/dist/extensions/agent-browser/lib/input-modes/job.js +31 -15
package/dist/extensions/agent-browser/lib/input-modes/params.js +19 -40
package/dist/extensions/agent-browser/lib/playbook.js +3 -2
package/dist/extensions/agent-browser/lib/results/presentation/batch.js +3 -2
package/dist/extensions/agent-browser/lib/results/presentation/diagnostics.js +27 -9
package/dist/extensions/agent-browser/lib/results/presentation/large-output.js +26 -1
package/dist/extensions/agent-browser/lib/results/presentation.js +9 -7
package/dist/extensions/agent-browser/lib/web-search.js +1 -1
package/docs/ARCHITECTURE.md +1 -1
package/docs/COMMAND_REFERENCE.md +54 -14
package/docs/RELEASE.md +4 -8
package/docs/REQUIREMENTS.md +1 -1
package/docs/SUPPORT_MATRIX.md +14 -13
package/docs/TOOL_CONTRACT.md +6 -5
package/package.json +5 -5
package/scripts/agent-browser-capability-baseline.mjs +25 -3
package/scripts/platform-smoke/browser-dogfood-windows.ps1 +8 -2
package/scripts/platform-smoke.mjs +1 -1

package/CHANGELOG.md CHANGED

@@ -2,6 +2,40 @@
 ## Unreleased
+## 0.2.53 - 2026-06-18
+### Changed
+- Rebaselined upstream capability metadata, command reference, support matrix, platform-smoke image tag, and real-upstream output-shape metadata for `agent-browser` `0.28.0` / vercel-labs/agent-browser@6323df571ffd17d14e60ec19fcb56cc1caf498ab.
+- Documented upstream `mcp`, `plugin add/list/show/run`, plugin-backed `auth login --credential-provider`, and `AGENT_BROWSER_PLUGINS` surfaces while keeping the wrapper thin and compatibility-shim-free.
+- Marked `mcp` and known `plugin` commands as sessionless wrapper calls so local/infra commands do not get an implicit managed browser session.
+- Collapsed duplicated release/platform-smoke prose across README, release docs, and agent guidance in favor of `docs/platform-smoke.md` as the detailed source of truth.
+- Simplified duplicate internal schema/job compiler plumbing without changing the public tool schema or generated argv behavior.
+### Fixed
+- Retried the Windows platform dogfood smoke once after transient first browser-open failures, matching the existing Windows browser prewarm tolerance while preserving real dogfood failures.
+### Validation
+- Ran `npm run verify -- release` against `agent-browser` `0.28.0`; the gate passed default verification, command-reference checks, build, lifecycle verification, packaged Pi smoke, and macOS/Ubuntu/Windows-native platform smoke after refreshing the Ubuntu image and Windows `crabbox-ready` snapshot.
+- Ran `npm run verify -- real-upstream`, `npm run verify -- dogfood`, `npm run docs`, `npm run verify -- command-reference`, and `git diff --check`.
+## 0.2.52 - 2026-06-15
+### Changed
+- Rebaselined the upstream capability metadata, command reference, support matrix, platform-smoke image tag, and real-upstream output-shape metadata for `agent-browser` `0.27.3` / vercel-labs/agent-browser@2c7991c9eccca1c9db6eee1a26a713414778de5a. This is an install-only upstream update from the prior baseline; no wrapper feature, shim, or inventory-token change was added.
+- Updated the local Pi development baseline to `@earendil-works/*` `0.79.4`, refreshed `.pi-fleet-tested-version`, and refreshed `package-lock.json` with npm 11 while keeping the intentional doctor floor at Pi `0.79.0`.
+### Fixed
+- Updated the lifecycle release harness prompt-readiness check to accept Pi 0.79.4 footer units such as `1.0M`, avoiding false readiness timeouts after successful startup.
+### Validation
+- Ran `npm run verify -- release` against `agent-browser` `0.27.3` and Pi `0.79.4`; the gate passed default verification, command-reference checks, build, lifecycle verification, packaged Pi smoke, and macOS/Ubuntu/Windows-native platform smoke.
 ## 0.2.51 - 2026-06-11
 ### Fixed
@@ -328,7 +362,7 @@
 ### Changed
 - `sourceLookup`, broad `get text`, fill verification, tab/session mismatch, and stale-ref guidance now include Electron-aware context and recovery actions for packaged desktop apps.
 - Verification coverage now includes deterministic Electron lifecycle/probe benchmark scenarios, fake-upstream Electron discovery/lifecycle tests, lifecycle restore/shutdown cleanup checks, and real-app dogfood evidence recorded in the Electron plan.
-- The configured-source lifecycle harness (`npm run verify -- lifecycle`, `scripts/verify-lifecycle.mjs`) now defaults to Pi model `zai/glm-5.1` with `--model <id>` override; `npm run verify` lifecycle passthrough rejects `--model` without a value.
+- The configured-source lifecycle harness (`npm run verify -- lifecycle`, `scripts/verify-lifecycle.mjs`) now defaults to Pi model `zai/glm-5.2` with `--model <id>` override; `npm run verify` lifecycle passthrough rejects `--model` without a value.
 - Updated the local Pi development baseline to `@earendil-works/*` `0.75.4` and refreshed the npm lockfile.
 ### Fixed

package/README.md CHANGED

@@ -183,7 +183,7 @@ npm exec --yes --package pi-agent-browser-native@latest -- pi-agent-browser-conf
 npm exec --yes --package pi-agent-browser-native@latest -- pi-agent-browser-config show
 ```
-The optional `agent_browser_web_search` companion tool is available when a usable Exa or Brave credential source is configured or resolvable from startup config or trusted session config. It is not an `agent_browser` input mode and does not launch a browser; agents may use it whenever current/live external web information helps, then use `agent_browser` when they need page interaction, screenshots, authenticated/profile content, or DOM inspection. If both keys are available, the default provider is Exa because its `/search` endpoint returns agent-friendly highlights and search modes; set `webSearch.preferredProvider` to `"brave"` when you prefer Brave Search.
+The optional `agent_browser_web_search` companion tool is available when a usable Exa or Brave credential source is configured or resolvable from startup config or trusted session config. It is not an `agent_browser` input mode and does not launch a browser; agents may use it whenever current/live external web information helps, then use `agent_browser` when they need page interaction, screenshots, authenticated/profile content, or DOM inspection. Prefer it over automating public search-engine forms such as Google in headless browser jobs: those flows may be redirected to anti-bot or CAPTCHA pages, and this wrapper does not provide or recommend CAPTCHA bypass. If both keys are available, the default provider is Exa because its `/search` endpoint returns agent-friendly highlights and search modes; set `webSearch.preferredProvider` to `"brave"` when you prefer Brave Search.
 Get an Exa API key from the [Exa dashboard](https://dashboard.exa.ai/api-keys) or a Brave Search API key from the [Brave Search API dashboard](https://api-dashboard.search.brave.com/). Most users can simply export `EXA_API_KEY` or `BRAVE_API_KEY` in the environment that launches `pi`; config is only needed when you want Pi-scoped secret references, a preferred provider, or to disable this built-in search tool.
@@ -412,7 +412,7 @@ After either path, use `qa: { "attached": true, ... }` for a current-session smo
 ### Lightweight QA preset
-For a quick smoke/QA pass, use top-level `qa`. It compiles to the same batch path as `job` and uses `batch --bail` so failed readiness/text/selector assertions stop before slower diagnostics can burn the wrapper watchdog. The URL form clears enabled network/console/page-error buffers before opening the target URL, waits for page readiness, checks optional expected text or selector, inspects fresh network requests, console messages, and page errors when preceding assertions pass, and can capture an evidence screenshot. Expected text is checked with bounded visible-text `wait --fn … --timeout 5000` predicates after the requested load state so dense pages can pass on visible headings/copy and missing text becomes crisp QA evidence. The attached form (`qa: { "attached": true }`) runs checks against the current managed session, such as an attached Electron app, rejects `url`, and deliberately preserves existing diagnostics instead of clearing evidence; its diagnostic reads default off so stale buffers do not fail a current-page smoke unless `checkNetwork`, `checkConsole`, or `checkErrors` is explicitly `true`. `loadState` defaults to `"domcontentloaded"`; set it to `"load"` or `"networkidle"` only when the stricter state is useful and the site is not expected to keep background requests alive. For URL-opening QA, `checkNetwork`, `checkConsole`, and `checkErrors` default to true; set one to `false` to skip that diagnostic read. Network failures are classified by likely impact and failed rows are listed first in network previews: actionable document/script/API-style failures still fail QA, while some low-impact browser icon asset misses (for example certain `favicon` or `apple-touch-icon` paths when upstream marks the row failed and resource metadata looks image-like) surface only as warnings instead of failing an otherwise healthy smoke check (`details.qaPreset.warnings`, with human-readable `details.qaPreset.summary` when the preset still passes). Exact predicates live in [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#qa) and `classifyNetworkRequestFailure` in `extensions/agent-browser/lib/results/network.ts` (re-exported from the compatibility barrel).
+For a quick smoke/QA pass, use top-level `qa`. It compiles to the same batch path as `job` and uses `batch --bail` so failed readiness/text/selector assertions stop before slower diagnostics can burn the wrapper watchdog. The URL form clears enabled network/console/page-error buffers before opening the target URL, waits for page readiness, checks optional expected text or selector, inspects fresh network requests, console messages, and page errors when preceding assertions pass, and can capture an evidence screenshot. Successful reset rows are labeled as reset-scoped output and ignored by QA failure analysis so stale pre-target errors do not fail an otherwise healthy target page; real post-open diagnostic rows still fail or warn according to the normal QA rules. Expected text is checked with bounded visible-text `wait --fn … --timeout 5000` predicates after the requested load state so dense pages can pass on visible headings/copy and missing text becomes crisp QA evidence. The attached form (`qa: { "attached": true }`) runs checks against the current managed session, such as an attached Electron app, rejects `url`, and deliberately preserves existing diagnostics instead of clearing evidence; its diagnostic reads default off so stale buffers do not fail a current-page smoke unless `checkNetwork`, `checkConsole`, or `checkErrors` is explicitly `true`. `loadState` defaults to `"domcontentloaded"`; set it to `"load"` or `"networkidle"` only when the stricter state is useful and the site is not expected to keep background requests alive. For URL-opening QA, `checkNetwork`, `checkConsole`, and `checkErrors` default to true; set one to `false` to skip that diagnostic read. Network failures are classified by likely impact and failed rows are listed first in network previews: actionable document/script/API-style failures still fail QA, while some low-impact browser icon asset misses (for example certain `favicon` or `apple-touch-icon` paths when upstream marks the row failed and resource metadata looks image-like) surface only as warnings instead of failing an otherwise healthy smoke check (`details.qaPreset.warnings`, with human-readable `details.qaPreset.summary` when the preset still passes). Exact predicates live in [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#qa) and `classifyNetworkRequestFailure` in `extensions/agent-browser/lib/results/network.ts` (re-exported from the compatibility barrel).
 ```json
 {
@@ -449,7 +449,7 @@ For asynchronous exports, click first and then wait for the download:
 { "args": ["wait", "--download", "/tmp/report.csv"] }
 ```
-When a user gives exact artifact paths for screenshots, recordings, downloads, PDFs, traces, or HAR files, use those paths or explicitly report why the artifact was unavailable; do not silently substitute a different path in the final report. The wrapper creates missing parent directories for direct artifact paths such as `state save`, screenshots, PDFs, downloads, and `wait --download`. For simple loopback `download <selector> <path>` anchor links with HTTP(S) `href`, it can save the in-page response directly to the requested path before falling back to upstream click/download behavior; non-loopback/profile downloads stay upstream-owned. With upstream `agent-browser 0.27.2`, treat `details.savedFilePath` as upstream-reported metadata and confirm `details.artifacts[].exists` / `details.artifactVerification.verified` before relying on the requested `wait --download <path>` file being present on disk; non-file download payloads such as `data:` URLs are not verified local artifacts.
+When a user gives exact artifact paths for screenshots, recordings, downloads, PDFs, traces, or HAR files, use those paths or explicitly report why the artifact was unavailable; do not silently substitute a different path in the final report. The wrapper creates missing parent directories for direct artifact paths such as `state save`, screenshots, PDFs, downloads, and `wait --download`. For simple loopback `download <selector> <path>` anchor links with HTTP(S) `href`, it can save the in-page response directly to the requested path before falling back to upstream click/download behavior; non-loopback/profile downloads stay upstream-owned. With current upstream `agent-browser`, treat `details.savedFilePath` as upstream-reported metadata and confirm `details.artifacts[].exists` / `details.artifactVerification.verified` before relying on the requested `wait --download <path>` file being present on disk; non-file download payloads such as `data:` URLs are not verified local artifacts.
 For evidence-only screenshots or QA captures, branch on `details.artifactVerification` and `details.artifacts` before reporting PASS/FAIL; inline image attachments are optional when size limits allow—do not require vision review unless the user asked for visual inspection. If the latest prompt names exact required artifact paths, browser close can be blocked with `details.promptGuard` until those artifacts are saved and verified.
@@ -613,18 +613,7 @@ npm run verify -- dogfood
 That mode drives the native wrapper through top-level `qa`, `semanticAction`, constrained `job`, screenshot artifact verification, and session close against a deterministic local fixture. It complements, but does not replace, the interactive Pi/tmux release dogfood in [`docs/RELEASE.md`](docs/RELEASE.md#pre-release-checks).
-Cross-platform release coverage uses Crabbox to run macOS, Ubuntu Linux, and native Windows target suites:
-```bash
-npm run check:platform-smoke
-npm run smoke:platform:ubuntu-image
-npm run smoke:platform:doctor
-npm run smoke:platform:all
-```
-The required matrix is documented in [`docs/platform-smoke.md`](docs/platform-smoke.md). It runs `platform-build` (fast target-local verify, pack, clean packed Pi install with `--approve`, `pi list --approve`) and `browser-dogfood-smoke` (real `agent-browser`/browser wrapper smoke) on every target. Inspect `.artifacts/platform-smoke/` and check `crabbox list --provider local-container` plus `crabbox list --provider parallels` after release runs so cleanup proof is not chat-only.
-For package release confidence, follow [`docs/RELEASE.md`](docs/RELEASE.md). The release gate is:
+Cross-platform release coverage uses Crabbox to run macOS, Ubuntu Linux, and native Windows target suites; see [`docs/platform-smoke.md`](docs/platform-smoke.md) for the required matrix, standalone coverage (`npm run smoke:platform:all` and per-target `smoke:platform:macos` / `:ubuntu` / `:windows-native`), and artifact/lease inspection. The release gate is:
 ```bash
 npm run doctor
@@ -634,7 +623,7 @@ npm run smoke:platform:doctor
 npm run verify -- release
 ```
-`npm run verify -- release` includes the default verification gate, packaged Pi smoke coverage, and the release-blocking Crabbox platform matrix. The package also has a `prepublishOnly` hook that runs the same release gate and `npm pack --dry-run` during `npm publish`.
+`npm run verify -- release` includes the default verification gate, packaged Pi smoke coverage, and the release-blocking Crabbox platform matrix (the same matrix `npm run smoke:platform:all` runs standalone). For the full maintainer release flow, follow [`docs/RELEASE.md`](docs/RELEASE.md). The package also has a `prepublishOnly` hook that runs the same release gate and `npm pack --dry-run` during `npm publish`.
 ## How it works
@@ -687,7 +676,7 @@ Configured-source lifecycle validation:
 npm run verify -- lifecycle
 ```
-The harness defaults to Pi model `zai/glm-5.1` and **180000 ms** per-step tmux waits; pass `--model <id>` and/or `--timeout-ms <ms>` after `lifecycle` when you need different settings (see [Configured-source lifecycle validation](docs/RELEASE.md#configured-source-lifecycle-validation) in `docs/RELEASE.md`). It launches Pi 0.79 with `--approve` and a deterministic `--session-id`, drives `/reload`, closes Pi, relaunches the exact same session, asserts the JSONL header id, and checks managed-session continuity, compiled-entrypoint pickup after process restart, persisted spill reachability, and real Pi `tool_result` failure-patch behavior.
+The harness defaults to Pi model `zai/glm-5.2` and **180000 ms** per-step tmux waits; pass `--model <id>` and/or `--timeout-ms <ms>` after `lifecycle` when you need different settings (see [Configured-source lifecycle validation](docs/RELEASE.md#configured-source-lifecycle-validation) in `docs/RELEASE.md`). It launches Pi 0.79 with `--approve` and a deterministic `--session-id`, drives `/reload`, closes Pi, relaunches the exact same session, asserts the JSONL header id, and checks managed-session continuity, compiled-entrypoint pickup after process restart, persisted spill reachability, and real Pi `tool_result` failure-patch behavior.
 Use lifecycle validation when testing `/reload`, exact-session relaunch, `/resume`, managed-session continuity, or persisted artifact behavior. Branch-backed state and `session_tree` cleanup ownership are covered by focused extension harness tests. Maintainers must run the lifecycle harness before every publish; see [Pre-release checks](docs/RELEASE.md#pre-release-checks).
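
Reviewer note: the QA preset defaults described in the README hunk above (URL form reads diagnostics unless a check is `false`; attached form reads them only when a check is explicitly `true`; `loadState` falls back to `"domcontentloaded"`; attached rejects `url`) can be sketched roughly as follows. This is a minimal illustration, not the wrapper's compiler: `resolveQaDiagnosticDefaults` and the error wording are hypothetical, and only the field names quoted in the README are taken from the source.

```javascript
// Hypothetical helper: models only the defaults the README prose describes.
function resolveQaDiagnosticDefaults(qa) {
    const attached = qa.attached === true;
    if (attached && qa.url !== undefined) {
        // The README says the attached form rejects `url`; this message is assumed.
        return { error: "qa.attached cannot be combined with url" };
    }
    // URL form: a check runs unless it is explicitly false.
    // Attached form: a check runs only when it is explicitly true.
    const enabled = (value) => (attached ? value === true : value !== false);
    return {
        attached,
        loadState: qa.loadState ?? "domcontentloaded",
        checkNetwork: enabled(qa.checkNetwork),
        checkConsole: enabled(qa.checkConsole),
        checkErrors: enabled(qa.checkErrors),
    };
}

console.log(resolveQaDiagnosticDefaults({ url: "https://example.com" }));
console.log(resolveQaDiagnosticDefaults({ attached: true, checkErrors: true }));
```

The asymmetry is the point of the sketch: a fresh URL run starts from cleared buffers, so reading diagnostics by default is safe, while an attached run may see rows from before the check started.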

package/dist/extensions/agent-browser/lib/command-policy.js CHANGED

@@ -5,6 +5,7 @@
  */
 import { hasOnlyBooleanFlags, hasOnlyOptionFlags, isNonFlagToken, stripSessionlessShapeGlobalFlags } from "./argv-grammar.js";
 const SESSIONLESS_AUTH_SUBCOMMANDS = new Set(["save", "list", "show", "delete", "remove"]);
+const PLUGIN_SESSIONLESS_SUBCOMMANDS = new Set(["list", "show", "add", "run"]);
 const EMPTY_BOOLEAN_FLAGS = new Set();
 const JSON_BOOLEAN_FLAGS = new Set(["--json"]);
 const AUTH_SAVE_BOOLEAN_FLAGS = new Set(["--json", "--password-stdin"]);
@@ -57,6 +58,12 @@ function isSessionlessStateCommand(commandTokens) {
         return false;
     return secondArg === undefined || (secondArg === "--all" && rest.length === 0);
 }
+function isSessionlessPluginCommand(commandTokens) {
+    const [, subcommand] = commandTokens;
+    if (subcommand === undefined)
+        return true;
+    return PLUGIN_SESSIONLESS_SUBCOMMANDS.has(subcommand);
+}
 function isSessionlessCommand(commandTokens) {
     const normalizedTokens = stripSessionlessShapeGlobalFlags(commandTokens);
     const [command, subcommand] = normalizedTokens;
@@ -64,6 +71,10 @@ function isSessionlessCommand(commandTokens) {
         return ["list", "get", "path"].includes(subcommand ?? "");
     if (command === "auth")
         return isSessionlessAuthCommand(normalizedTokens);
+    if (command === "plugin")
+        return isSessionlessPluginCommand(normalizedTokens);
+    if (command === "mcp")
+        return true;
     if (command === "dashboard")
         return isSessionlessDashboardCommand(normalizedTokens);
     if (command === "device")
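
Reviewer note: the effect of the two new branches can be checked in isolation with a small sketch. It copies `PLUGIN_SESSIONLESS_SUBCOMMANDS` and `isSessionlessPluginCommand` from the hunk above; `isSessionlessPluginOrMcp` is a hypothetical stand-in that covers only the `plugin` and `mcp` branches and skips the `stripSessionlessShapeGlobalFlags` normalization that the real `isSessionlessCommand` applies first. The `plugin remove` token is an invented example of an unlisted subcommand.

```javascript
const PLUGIN_SESSIONLESS_SUBCOMMANDS = new Set(["list", "show", "add", "run"]);

function isSessionlessPluginCommand(commandTokens) {
    const [, subcommand] = commandTokens;
    // A bare `plugin` call with no subcommand is treated as sessionless.
    if (subcommand === undefined)
        return true;
    return PLUGIN_SESSIONLESS_SUBCOMMANDS.has(subcommand);
}

// Hypothetical helper: models only the two branches added in this diff.
function isSessionlessPluginOrMcp(commandTokens) {
    const [command] = commandTokens;
    if (command === "plugin")
        return isSessionlessPluginCommand(commandTokens);
    if (command === "mcp")
        return true;
    return false;
}

for (const tokens of [["plugin"], ["plugin", "list"], ["plugin", "remove"], ["mcp"]]) {
    console.log(tokens.join(" "), "->", isSessionlessPluginOrMcp(tokens));
}
```

An unlisted `plugin` subcommand falls through to `false`, which matches the changelog's wording that only known `plugin` commands are marked sessionless.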

package/dist/extensions/agent-browser/lib/input-modes/job.js CHANGED

@@ -190,18 +190,21 @@ function compilePathArtifactJobStep(step, action) {
         return { error: result.error };
     return { args: action === "waitForDownload" ? ["wait", "--download", result.value] : ["screenshot", result.value] };
 }
-const JOB_STEP_DESCRIPTORS = {
-    assertText: { allowedFields: JOB_STEP_ALLOWED_FIELDS.assertText, compile: compileAssertTextJobStep },
-    assertUrl: { allowedFields: JOB_STEP_ALLOWED_FIELDS.assertUrl, compile: compileAssertUrlJobStep },
-    click: { allowedFields: JOB_STEP_ALLOWED_FIELDS.click, compile: compileClickJobStep },
-    fill: { allowedFields: JOB_STEP_ALLOWED_FIELDS.fill, compile: compileFillJobStep },
-    open: { allowedFields: JOB_STEP_ALLOWED_FIELDS.open, compile: compileOpenJobStep },
-    screenshot: { allowedFields: JOB_STEP_ALLOWED_FIELDS.screenshot, compile: (step) => compilePathArtifactJobStep(step, "screenshot") },
-    select: { allowedFields: JOB_STEP_ALLOWED_FIELDS.select, compile: compileSelectJobStep },
-    snapshot: { allowedFields: JOB_STEP_ALLOWED_FIELDS.snapshot, compile: () => ({ args: ["snapshot", "-i"] }) },
-    type: { allowedFields: JOB_STEP_ALLOWED_FIELDS.type, compile: compileTypeJobStep },
-    wait: { allowedFields: JOB_STEP_ALLOWED_FIELDS.wait, compile: compileWaitJobStep },
-    waitForDownload: { allowedFields: JOB_STEP_ALLOWED_FIELDS.waitForDownload, compile: (step) => compilePathArtifactJobStep(step, "waitForDownload") },
+// ponytail: allowedFields for each action live in JOB_STEP_ALLOWED_FIELDS (same key
+// alignment enforced by Record<AgentBrowserJobStepAction, …>), so the compiler map no
+// longer mirrors that set per entry; the call site looks it up by action.
+const JOB_STEP_COMPILERS = {
+    assertText: compileAssertTextJobStep,
+    assertUrl: compileAssertUrlJobStep,
+    click: compileClickJobStep,
+    fill: compileFillJobStep,
+    open: compileOpenJobStep,
+    screenshot: (step) => compilePathArtifactJobStep(step, "screenshot"),
+    select: compileSelectJobStep,
+    snapshot: () => ({ args: ["snapshot", "-i"] }),
+    type: compileTypeJobStep,
+    wait: compileWaitJobStep,
+    waitForDownload: (step) => compilePathArtifactJobStep(step, "waitForDownload"),
 };
 export function compileAgentBrowserJob(input) {
     if (!isRecord(input)) {
@@ -226,11 +229,11 @@ export function compileAgentBrowserJob(input) {
             return { error: `job.steps[${index}].action must be one of: ${AGENT_BROWSER_JOB_STEP_ACTIONS.join(", ")}.` };
         }
         const jobAction = action;
-        const descriptor = JOB_STEP_DESCRIPTORS[jobAction];
-        const unsupportedFieldError = getUnsupportedJobStepFieldError(rawStep, jobAction, descriptor.allowedFields);
+        const compile = JOB_STEP_COMPILERS[jobAction];
+        const unsupportedFieldError = getUnsupportedJobStepFieldError(rawStep, jobAction, JOB_STEP_ALLOWED_FIELDS[jobAction]);
         if (unsupportedFieldError)
             return { error: `job.steps[${index}]: ${unsupportedFieldError}` };
-        const compiledStep = descriptor.compile(rawStep, index);
+        const compiledStep = compile(rawStep, index);
         if (compiledStep.error)
             return { error: compiledStep.error.startsWith(`job.steps[${index}]`) ? compiledStep.error : `job.steps[${index}]: ${compiledStep.error}` };
         steps.push({ action: jobAction, args: compiledStep.args, generatedFrom: compiledStep.generatedFrom }, ...(compiledStep.extraSteps ?? []));
@@ -289,6 +292,9 @@ export function buildQaCompactPassText(options) {
     if (pageParts.length > 0)
         lines.push(`Page: ${pageParts.join(" — ")}`);
     lines.push(`Checks run: ${describeQaChecksRun(options.checks)} (${options.batchStepCount} batch step${options.batchStepCount === 1 ? "" : "s"})`);
+    if (options.checks.diagnosticsResetAtStart && (options.checks.checkNetwork || options.checks.checkConsole || options.checks.checkErrors)) {
+        lines.push("Diagnostic reset: URL QA cleared enabled network/console/page-error buffers before opening the target; reset rows in details.batchSteps are not counted as current-page failures.");
+    }
     if (options.checks.attached && !options.checks.diagnosticsResetAtStart && (options.checks.checkNetwork || options.checks.checkConsole || options.checks.checkErrors)) {
         lines.push("Attached diagnostics: existing upstream session console/network/error buffers were preserved; rows may include events from before qa.attached started.");
     }
@@ -369,6 +375,13 @@ function extractQaTextAssertionResultText(item) {
     }
     return undefined;
 }
+function isDiagnosticResetCommand(item) {
+    const command = item.command;
+    if (!Array.isArray(command) || !command.every((token) => typeof token === "string"))
+        return false;
+    const [name, subcommand] = command;
+    return command.includes("--clear") && (name === "console" || name === "errors" || (name === "network" && subcommand === "requests"));
+}
 export function analyzeQaPresetTimeout(compiled) {
     if (compiled.checks.expectedText.length === 0)
         return undefined;
@@ -392,6 +405,9 @@ export function analyzeQaPresetResults(data, compiled) {
         }
         const result = isRecord(item.result) ? item.result : undefined;
         const commandName = getCommandNameFromBatchItem(item);
+        if (compiled?.checks.diagnosticsResetAtStart && isDiagnosticResetCommand(item)) {
+            continue;
+        }
         if (commandName === "errors" && Array.isArray(result?.errors) && result.errors.length > 0) {
             failedChecks.push(`${result.errors.length} page error(s)`);
         }
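
Reviewer note: the reset-row filter added to `analyzeQaPresetResults` can be exercised on its own. The sketch below copies `isDiagnosticResetCommand` from the hunk above; `rowsCountedByQa` is a hypothetical helper that stands in for the `continue` inside the analysis loop, and the batch item shapes (a `command` token array plus an optional `result.errors` list with a `message` field) are assumptions for illustration.

```javascript
function isDiagnosticResetCommand(item) {
    const command = item.command;
    if (!Array.isArray(command) || !command.every((token) => typeof token === "string"))
        return false;
    const [name, subcommand] = command;
    return command.includes("--clear") && (name === "console" || name === "errors" || (name === "network" && subcommand === "requests"));
}

// Hypothetical helper: returns the rows that failure analysis would still inspect.
function rowsCountedByQa(batchItems, diagnosticsResetAtStart) {
    return batchItems.filter((item) => !(diagnosticsResetAtStart && isDiagnosticResetCommand(item)));
}

// Assumed batch shape: two reset rows, then the target open and a fresh read.
const batch = [
    { command: ["errors", "--clear"], result: { errors: [{ message: "stale pre-target error" }] } },
    { command: ["network", "requests", "--clear"] },
    { command: ["open", "https://example.com"] },
    { command: ["errors"], result: { errors: [] } },
];

console.log(rowsCountedByQa(batch, true).map((item) => item.command.join(" ")));
```

With `diagnosticsResetAtStart` set, the stale error reported by the `errors --clear` row never reaches the page-error check, while the later `errors` read is still analyzed.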

package/dist/extensions/agent-browser/lib/input-modes/params.js CHANGED

@@ -7,6 +7,21 @@ import { JsonSchema } from "../json-schema.js";
 import { StringEnum as localStringEnum } from "../string-enum-schema.js";
 import { ELECTRON_DISCOVERY_DEFAULT_MAX_RESULTS, ELECTRON_DISCOVERY_MAX_RESULTS, } from "../electron/discovery.js";
 import { AGENT_BROWSER_ELECTRON_HANDOFFS, AGENT_BROWSER_ELECTRON_TARGET_TYPES, AGENT_BROWSER_JOB_STEP_ACTIONS, AGENT_BROWSER_JOB_TYPE_DELAYED_TEXT_MAX_CHARACTERS, AGENT_BROWSER_QA_LOAD_STATES, AGENT_BROWSER_SEMANTIC_ACTIONS, AGENT_BROWSER_SEMANTIC_LOCATORS, DEFAULT_SESSION_MODE, SOURCE_LOOKUP_MAX_WORKSPACE_FILES, } from "./types.js";
+// ponytail: the four electron.launch variants differ only in their single target field
+// (appPath/appName/bundleId/executablePath); the action literal and the shared optional
+// launch fields are identical, so a helper keeps the duplicate schema blocks in sync.
+function electronLaunchVariant(Type, StringEnum, targetField) {
+    return Type.Object({
+        action: StringEnum(["launch"], { description: "Launch an Electron app with an isolated wrapper-owned profile." }),
+        ...targetField,
+        appArgs: Type.Optional(Type.Array(Type.String({ description: "Argument passed to the Electron application.", minLength: 1 }), { description: "Optional Electron app argv. Wrapper-owned lifecycle/debug flags are rejected." })),
+        handoff: Type.Optional(StringEnum(AGENT_BROWSER_ELECTRON_HANDOFFS, { description: "Post-launch handoff depth. Defaults to snapshot." })),
+        targetType: Type.Optional(StringEnum(AGENT_BROWSER_ELECTRON_TARGET_TYPES, { description: "Preferred CDP target type. Defaults to page." })),
+        timeoutMs: Type.Optional(Type.Integer({ description: "Bounded launch timeout in milliseconds.", minimum: 1 })),
+        allow: Type.Optional(Type.Array(Type.String({ description: "App identifier allowed by the caller for electron.launch.", minLength: 1 }), { description: "Optional caller-owned allow list for electron.launch policy checks." })),
+        deny: Type.Optional(Type.Array(Type.String({ description: "App identifier denied by the caller for electron.launch.", minLength: 1 }), { description: "Optional caller-owned deny list for electron.launch policy checks; deny wins over allow." })),
+    }, { additionalProperties: false });
+}
 export function createAgentBrowserParamsSchema(Type = JsonSchema, StringEnum = localStringEnum) {
     return Type.Object({
         args: Type.Optional(Type.Array(Type.String({ description: "Exact agent-browser CLI arguments, excluding the binary name. Do not pass --json; the wrapper injects it. First-call recipe: open → snapshot -i → click/fill @eN → snapshot -i." }), {
@@ -71,46 +86,10 @@ export function createAgentBrowserParamsSchema(Type = JsonSchema, StringEnum = l
                 query: Type.Optional(Type.String({ description: "Optional case-insensitive substring filter for electron.list across app name, bundle id, desktop id, and paths.", minLength: 1 })),
                 maxResults: Type.Optional(Type.Integer({ description: `Maximum electron.list apps to return. Defaults to ${ELECTRON_DISCOVERY_DEFAULT_MAX_RESULTS}; values above ${ELECTRON_DISCOVERY_MAX_RESULTS} are clamped.`, minimum: 1 })),
             }, { additionalProperties: false }),
-            Type.Object({
-                action: StringEnum(["launch"], { description: "Launch an Electron app with an isolated wrapper-owned profile." }),
-                appPath: Type.String({ description: "Electron launch target: macOS .app bundle path. Exactly one launch target is required for electron.launch.", minLength: 1 }),
-                appArgs: Type.Optional(Type.Array(Type.String({ description: "Argument passed to the Electron application.", minLength: 1 }), { description: "Optional Electron app argv. Wrapper-owned lifecycle/debug flags are rejected." })),
-                handoff: Type.Optional(StringEnum(AGENT_BROWSER_ELECTRON_HANDOFFS, { description: "Post-launch handoff depth. Defaults to snapshot." })),
-                targetType: Type.Optional(StringEnum(AGENT_BROWSER_ELECTRON_TARGET_TYPES, { description: "Preferred CDP target type. Defaults to page." })),
-                timeoutMs: Type.Optional(Type.Integer({ description: "Bounded launch timeout in milliseconds.", minimum: 1 })),
-                allow: Type.Optional(Type.Array(Type.String({ description: "App identifier allowed by the caller for electron.launch.", minLength: 1 }), { description: "Optional caller-owned allow list for electron.launch policy checks." })),
-                deny: Type.Optional(Type.Array(Type.String({ description: "App identifier denied by the caller for electron.launch.", minLength: 1 }), { description: "Optional caller-owned deny list for electron.launch policy checks; deny wins over allow." })),
-            }, { additionalProperties: false }),
-            Type.Object({
-                action: StringEnum(["launch"], { description: "Launch an Electron app with an isolated wrapper-owned profile." }),
-                appName: Type.String({ description: "Electron launch target: app display name discovered by electron.list. Exactly one launch target is required for electron.launch.", minLength: 1 }),
-                appArgs: Type.Optional(Type.Array(Type.String({ description: "Argument passed to the Electron application.", minLength: 1 }), { description: "Optional Electron app argv. Wrapper-owned lifecycle/debug flags are rejected." })),
-                handoff: Type.Optional(StringEnum(AGENT_BROWSER_ELECTRON_HANDOFFS, { description: "Post-launch handoff depth. Defaults to snapshot." })),
-                targetType: Type.Optional(StringEnum(AGENT_BROWSER_ELECTRON_TARGET_TYPES, { description: "Preferred CDP target type. Defaults to page." })),
-                timeoutMs: Type.Optional(Type.Integer({ description: "Bounded launch timeout in milliseconds.", minimum: 1 })),
-                allow: Type.Optional(Type.Array(Type.String({ description: "App identifier allowed by the caller for electron.launch.", minLength: 1 }), { description: "Optional caller-owned allow list for electron.launch policy checks." })),
-                deny: Type.Optional(Type.Array(Type.String({ description: "App identifier denied by the caller for electron.launch.", minLength: 1 }), { description: "Optional caller-owned deny list for electron.launch policy checks; deny wins over allow." })),
-            }, { additionalProperties: false }),
-            Type.Object({
-                action: StringEnum(["launch"], { description: "Launch an Electron app with an isolated wrapper-owned profile." }),
-                bundleId: Type.String({ description: "Electron launch target: macOS bundle identifier discovered by electron.list. Exactly one launch target is required for electron.launch.", minLength: 1 }),
-                appArgs: Type.Optional(Type.Array(Type.String({ description: "Argument passed to the Electron application.", minLength: 1 }), { description: "Optional Electron app argv. Wrapper-owned lifecycle/debug flags are rejected." })),
-                handoff: Type.Optional(StringEnum(AGENT_BROWSER_ELECTRON_HANDOFFS, { description: "Post-launch handoff depth. Defaults to snapshot." })),
-                targetType: Type.Optional(StringEnum(AGENT_BROWSER_ELECTRON_TARGET_TYPES, { description: "Preferred CDP target type. Defaults to page." })),
-                timeoutMs: Type.Optional(Type.Integer({ description: "Bounded launch timeout in milliseconds.", minimum: 1 })),
-                allow: Type.Optional(Type.Array(Type.String({ description: "App identifier allowed by the caller for electron.launch.", minLength: 1 }), { description: "Optional caller-owned allow list for electron.launch policy checks." })),
-                deny: Type.Optional(Type.Array(Type.String({ description: "App identifier denied by the caller for electron.launch.", minLength: 1 }), { description: "Optional caller-owned deny list for electron.launch policy checks; deny wins over allow." })),
-            }, { additionalProperties: false }),
-            Type.Object({
-                action: StringEnum(["launch"], { description: "Launch an Electron app with an isolated wrapper-owned profile." }),
-                executablePath: Type.String({ description: "Electron launch target: executable path. Discovery is not required when this is provided. Exactly one launch target is required for electron.launch.", minLength: 1 }),
-                appArgs: Type.Optional(Type.Array(Type.String({ description: "Argument passed to the Electron application.", minLength: 1 }), { description: "Optional Electron app argv. Wrapper-owned lifecycle/debug flags are rejected." })),
-                handoff: Type.Optional(StringEnum(AGENT_BROWSER_ELECTRON_HANDOFFS, { description: "Post-launch handoff depth. Defaults to snapshot." })),
-                targetType: Type.Optional(StringEnum(AGENT_BROWSER_ELECTRON_TARGET_TYPES, { description: "Preferred CDP target type. Defaults to page." })),
-                timeoutMs: Type.Optional(Type.Integer({ description: "Bounded launch timeout in milliseconds.", minimum: 1 })),
-                allow: Type.Optional(Type.Array(Type.String({ description: "App identifier allowed by the caller for electron.launch.", minLength: 1 }), { description: "Optional caller-owned allow list for electron.launch policy checks." })),
-                deny: Type.Optional(Type.Array(Type.String({ description: "App identifier denied by the caller for electron.launch.", minLength: 1 }), { description: "Optional caller-owned deny list for electron.launch policy checks; deny wins over allow." })),
-            }, { additionalProperties: false }),
+            electronLaunchVariant(Type, StringEnum, { appPath: Type.String({ description: "Electron launch target: macOS .app bundle path. Exactly one launch target is required for electron.launch.", minLength: 1 }) }),
+            electronLaunchVariant(Type, StringEnum, { appName: Type.String({ description: "Electron launch target: app display name discovered by electron.list. Exactly one launch target is required for electron.launch.", minLength: 1 }) }),
+            electronLaunchVariant(Type, StringEnum, { bundleId: Type.String({ description: "Electron launch target: macOS bundle identifier discovered by electron.list. Exactly one launch target is required for electron.launch.", minLength: 1 }) }),
+            electronLaunchVariant(Type, StringEnum, { executablePath: Type.String({ description: "Electron launch target: executable path. Discovery is not required when this is provided. Exactly one launch target is required for electron.launch.", minLength: 1 }) }),
             Type.Object({
                 action: StringEnum(["status", "cleanup"], { description: "Inspect or cleanup one wrapper-tracked Electron launch by launchId." }),
                 launchId: Type.String({ description: "Wrapper launch id for electron.status and electron.cleanup.", minLength: 1 }),

package/dist/extensions/agent-browser/lib/playbook.js CHANGED

@@ -24,10 +24,11 @@ export const QUICK_START_GUIDELINES = [
     "For artifact-producing commands, read the visible artifact block and details.artifactVerification before using files: check requested path, absolute path, existence, size bytes, artifact kind, optional mediaType, status, optional limitation, and verified/missing/pending/unverified counts. details.artifacts contains per-file metadata; record start rows are pending/openRecording until record stop writes the target. The wrapper creates parent directories for direct artifact paths and can save simple loopback HTTP(S) anchor downloads directly to the requested path before upstream download fallback. Browser close does not delete explicit saved files; if close reports details.artifactCleanup, use host file tools to remove paths listed in explicitArtifactPaths (when non-empty) after inspection. If close fails with details.promptGuard.reason=requested-artifacts-missing-before-close, save the exact required artifact path before closing. For annotated screenshots inside batch, put --annotate in top-level args (for example { args: [\"--annotate\", \"batch\"], stdin: \"[[\\\"screenshot\\\",\\\"/tmp/page.png\\\"]]\" }) rather than inside the screenshot step; if annotation labels crowd a dense page, use a scoped or non-annotated screenshot plus snapshot refs instead.",
     "When details.nextActions is present, prefer those exact native agent_browser follow-up payloads over prose guidance; they may include args, stdin, sessionMode, networkSourceLookup, safety notes, or artifactPath for saved files.",
 ];
-export const WEB_SEARCH_PROMPT_GUIDELINE = "Use agent_browser_web_search for quick live search/URL discovery; it chooses Exa or Brave, preferring Exa unless configured otherwise. Use agent_browser for interaction/DOM/screenshots/auth. Do not run parallel searches: one good query, inspect results, then one follow-up max; on HTTP 429 stop and report provider limits.";
+export const WEB_SEARCH_PROMPT_GUIDELINE = "Use agent_browser_web_search for quick live search/URL discovery; prefer it over browser-automating public search-engine forms, which can hit anti-bot/CAPTCHA-gated pages. Use agent_browser for interaction/DOM/screenshots/auth after you have a target URL. One query, inspect, one follow-up max; on HTTP 429 stop/report limits.";
 export const SHARED_BROWSER_PLAYBOOK_GUIDELINES = [
     "Standard workflow: open the page, snapshot -i, interact using current @refs from that snapshot, and re-snapshot after navigation, scrolling, rerendering, or other major DOM changes because refs are page-scoped; the wrapper fails mutation-prone stale/recycled refs before upstream can silently target a different current-page element. On dense pages, use wrapper-side snapshot -i --search <text> or snapshot -i --filter role=<role> to render matching refs while preserving the full ref map in details.refSnapshot, add snapshot --viewport when scroll position or above/below-fold context matters, and add snapshot --diff when a quick before/after ref-map delta would prevent reading a full spill file.",
     "For ordinary forms from one snapshot, batch multiple fill @refs before the submit/click step to avoid serial tool calls; if a fill may autosubmit, navigate, or rerender later fields, split the flow and refresh refs first.",
+    "Do not use browser automation to drive public search-engine forms such as Google for discovery; headless jobs that type a query and press Enter can be redirected to anti-bot or CAPTCHA pages. Use agent_browser_web_search when configured, ask for/search from a direct target URL, or navigate to known result URLs. Do not attempt CAPTCHA bypass.",
     "Snapshot choice: prefer snapshot -i for routine clicks/fills (interactive @refs, main-content-first). Use snapshot --compact when you need a denser same-page tree without full spill; use full snapshot (no -i) only when you need the complete accessibility tree. Re-snapshot after navigation or major DOM changes. When snapshot -i compacts because the tree is oversized, scan visible output for Omitted high-value controls and optional details.data.highValueControlRefIds before opening the spill file: those list bounded searchboxes, textboxes, comboboxes, buttons, tabs, checkboxes, radios, options, and menuitems that did not fit the key/other ref previews.",
     "When a visible text or accessible-name target should survive ref churn, prefer find locators such as role, text, label, placeholder, alt, title, or testid with the intended action instead of guessing a CSS selector.",
     "For desktop or host-controlled rich inputs, if semanticAction fill misses, refresh refs and prefer a current editable @ref from details.richInputRecovery or the latest snapshot; focus or click that ref, then use keyboard inserttext or keyboard type with the intended text. Do not auto-submit with Enter or a submit button unless the user flow explicitly calls for it.",
@@ -44,7 +45,7 @@ export const SHARED_BROWSER_PLAYBOOK_GUIDELINES = [
     "For Electron desktop apps, prefer top-level electron for wrapper-owned discovery, isolated launch, status, compact probe, and cleanup: list first, treat likely-sensitive annotations as hints rather than enforcement, launch with the default snapshot handoff unless handoff: \"tabs\" is the safer diagnostic starting point, use electron.probe or snapshot -i/qa.attached for current-session state, and always cleanup the returned launchId when done. electron.launch uses an isolated temporary profile; it does not reuse the app's normal signed-in profile or attach to an already-running authenticated app. For signed-in local app state, host-launch the normal app with --remote-debugging-port when appropriate, then use raw args connect <port|url>; after connect, inspect tab list, select the stable tab id such as tab t2, then run a condition wait or snapshot -i before using refs. close commands (`close`, `quit`, or `exit`) only close the browser/CDP session; leave manually launched app shutdown, profile cleanup, and explicit artifacts to the host owner.",
     "For provider or specialized app workflows, load version-matched upstream guidance with skills get agentcore|electron|slack|dogfood|vercel-sandbox through the native tool; add --full when you need references/templates, and use skills get --all only for broad skill audits. Provider launches such as -p ios, --provider browserbase/kernel/browseruse/browserless/agentcore, and iOS --device are upstream-owned setup paths; use sessionMode fresh when switching providers and expect external credentials or local Appium/Xcode setup to be required.",
     "For dialogs and frames, use dialog status/accept/dismiss and frame <selector|main> through native args; dialog commands and eval snippets that look like alert/confirm/prompt/dialog triggers are shorter-bounded than normal browser calls, and timed-out dialog-like interactions may add inspect-dialog-after-timeout, dismiss-dialog-after-timeout, or recover-fresh-session-after-dialog-timeout nextActions. When --confirm-actions produces a pending confirmation, use details.nextActions or exact confirm <id> / deny <id> calls instead of inventing ids.",
-    "If a session lands on the wrong page or tab, an interaction changes origin unexpectedly, or an open call returns blocked, blank, or otherwise unexpected results, use tab list / tab <tab-id-or-label> / snapshot -i to recover state before retrying different URLs or fallback strategies. For headed demos, put --headed on the first launch with sessionMode=fresh and verify with screenshot/tab/get-url evidence because tool success cannot prove the OS window is visible to the user. For desktop readiness, prefer real conditions first: wait --text, wait --url, wait --fn, wait --load <state>, wait --download, or qa.attached; for disappearance checks in agent-browser 0.27.2, use wait --fn predicates instead of stale upstream-help examples like wait <selector> --state hidden. Use electron.probe/status for wrapper-owned launch health or target mismatch. Fixed waits are a last resort: use explicit --timeout or top-level timeoutMs for legitimately slow waits, and treat a successful payload like \"waited\":\"timeout\" as elapsed time only—verify completion with an observed condition, fresh snapshot, or screenshot.",
+    "If a session lands on the wrong page or tab, an interaction changes origin unexpectedly, or an open call returns blocked, blank, or otherwise unexpected results, use tab list / tab <tab-id-or-label> / snapshot -i to recover state before retrying different URLs or fallback strategies. For headed demos, put --headed on the first launch with sessionMode=fresh and verify with screenshot/tab/get-url evidence because tool success cannot prove the OS window is visible to the user. For desktop readiness, prefer real conditions first: wait --text, wait --url, wait --fn, wait --load <state>, wait --download, or qa.attached; for disappearance checks in agent-browser 0.27.3, use wait --fn predicates instead of stale upstream-help examples like wait <selector> --state hidden. Use electron.probe/status for wrapper-owned launch health or target mismatch. Fixed waits are a last resort: use explicit --timeout or top-level timeoutMs for legitimately slow waits, and treat a successful payload like \"waited\":\"timeout\" as elapsed time only—verify completion with an observed condition, fresh snapshot, or screenshot.",
     "For feed, timeline, or inbox reading tasks, focus on the main timeline/list region and read the first item there rather than unrelated composer or sidebar content.",
     "For read-only browsing tasks, prefer extracting the answer from the current snapshot, structured ref labels, or eval --stdin on the current page before navigating away. Only click into media viewers, detail routes, or new pages when the current view does not contain the needed information.",
     "For downloads, prefer download <selector> <path> when an element click should save a file; simple loopback anchor downloads are saved to the requested path when the wrapper can resolve an HTTP(S) href. Do not rely on click alone when you need the downloaded file on disk.",

package/dist/extensions/agent-browser/lib/results/presentation/batch.js CHANGED

@@ -229,13 +229,14 @@ async function buildBatchStepPresentation(options) {
         };
     }
     const commandInfo = parseCommandInfo(command ?? []);
+    const commandInfoWithTokens = command ? { ...commandInfo, commandTokens: command } : commandInfo;
     const networkRouteDiagnostics = commandInfo.command === "network" && commandInfo.subcommand === "requests"
         ? buildNetworkRouteDiagnostics(item.result, networkRoutes)
         : undefined;
     const presentation = await buildNestedToolPresentation({
         artifactManifest,
         artifactRequest,
-        commandInfo,
+        commandInfo: commandInfoWithTokens,
         cwd,
         args: command,
         envelope: { data: item.result, success: true },
@@ -264,7 +265,7 @@ async function buildBatchStepPresentation(options) {
     });
     const pageChangeSummary = buildPageChangeSummary({
         artifacts: presentation.artifacts,
-        commandInfo,
+        commandInfo: commandInfoWithTokens,
         data: presentation.data,
         nextActions,
         savedFilePath: presentation.savedFilePath,

package/dist/extensions/agent-browser/lib/results/presentation/diagnostics.js CHANGED

@@ -107,6 +107,9 @@ export function enrichStreamStatusData(commandInfo, data) {
         wsUrl: getStreamWebSocketUrl(data.port),
     };
 }
+function isClearDiagnosticCommand(commandInfo) {
+    return commandInfo.subcommand === "--clear" || commandInfo.commandTokens?.includes("--clear") === true;
+}
 export function formatDiagnosticSummary(commandInfo, data) {
     if (commandInfo.command === "session") {
         const sessions = getArrayField(data, "sessions");
@@ -181,7 +184,7 @@ export function formatDiagnosticSummary(commandInfo, data) {
         if (commandInfo.subcommand === "requests") {
             const requests = getArrayField(data, "requests");
             if (requests)
-                return `Network requests: ${requests.length}`;
+                return isClearDiagnosticCommand(commandInfo) ? `Network requests reset: ${requests.length} cleared` : `Network requests: ${requests.length}`;
         }
         if (commandInfo.subcommand === "route") {
             const routed = getStringField(data, "routed") ?? getStringField(data, "url") ?? getStringField(data, "pattern");
@@ -228,12 +231,12 @@ export function formatDiagnosticSummary(commandInfo, data) {
     if (commandInfo.command === "console") {
         const messages = getArrayField(data, "messages");
         if (messages)
-            return `Console messages: ${messages.length}`;
+            return isClearDiagnosticCommand(commandInfo) ? `Console reset: ${messages.length} cleared` : `Console messages: ${messages.length}`;
     }
     if (commandInfo.command === "errors") {
         const errors = getArrayField(data, "errors");
         if (errors)
-            return `Page errors: ${errors.length}`;
+            return isClearDiagnosticCommand(commandInfo) ? `Page errors reset: ${errors.length} cleared` : `Page errors: ${errors.length}`;
     }
     if (commandInfo.command === "dashboard") {
         if (typeof data.port === "number")
@@ -344,10 +347,15 @@ function formatNetworkRequestLine(item, index) {
     appendNetworkPreview(lines, "Error", getPreviewCandidate(item, NETWORK_PREVIEW_FIELD_CANDIDATES.error), NETWORK_ERROR_PREVIEW_MAX_CHARS);
     return lines;
 }
-function formatNetworkRequestsText(data) {
+function formatNetworkRequestsText(data, commandInfo) {
     const requests = getArrayField(data, "requests");
     if (!requests)
         return undefined;
+    if (isClearDiagnosticCommand(commandInfo)) {
+        return requests.length === 0
+            ? "Network request buffer cleared; no prior request rows were returned. This reset output is not evidence of current-page network activity."
+            : `Network request buffer cleared; upstream returned ${requests.length} cleared/stale row${requests.length === 1 ? "" : "s"}. Treat these as reset output, not current-page request failures.`;
+    }
     if (requests.length === 0)
         return "No network requests captured. Scope: upstream session aggregate unless the upstream command output says it was cleared or filtered for this page.";
     const shown = ["Scope: upstream session aggregate unless the upstream command output says it was cleared or filtered for this page; do not attribute old requests to the current page without URL/time evidence."];
@@ -584,10 +592,15 @@ export function buildStreamNextActions(commandInfo, data, sessionName) {
         },
     ];
 }
-function formatConsoleText(data) {
+function formatConsoleText(data, commandInfo) {
     const messages = getArrayField(data, "messages");
     if (!messages)
         return undefined;
+    if (isClearDiagnosticCommand(commandInfo)) {
+        return messages.length === 0
+            ? "Console buffer cleared; no prior message rows were returned. This reset output is not evidence of current-page console activity."
+            : `Console buffer cleared; upstream returned ${messages.length} cleared/stale message row${messages.length === 1 ? "" : "s"}. Treat these as reset output, not current-page console errors.`;
+    }
     if (messages.length === 0)
         return "No console messages. Scope: upstream session aggregate unless the upstream command output says it was cleared or filtered for this page.";
     const shown = ["Scope: upstream session aggregate unless the upstream command output says it was cleared or filtered for this page; do not attribute old messages to the current page without URL/time evidence."];
@@ -604,10 +617,15 @@ function formatConsoleText(data) {
     }
     return shown.join("\n");
 }
-function formatErrorsText(data) {
+function formatErrorsText(data, commandInfo) {
     const errors = getArrayField(data, "errors");
     if (!errors)
         return undefined;
+    if (isClearDiagnosticCommand(commandInfo)) {
+        return errors.length === 0
+            ? "Page error buffer cleared; no prior error rows were returned. This reset output is not evidence of current-page errors."
+            : `Page error buffer cleared; upstream returned ${errors.length} cleared/stale error row${errors.length === 1 ? "" : "s"}. Treat these as reset output, not current-page errors.`;
+    }
     if (errors.length === 0)
         return "No page errors.";
     const shown = errors.slice(0, DIAGNOSTIC_LOG_PREVIEW_LIMIT).map((item, index) => {
@@ -927,7 +945,7 @@ export function formatDiagnosticText(commandInfo, data) {
     if (commandInfo.command === "state")
         return formatStateText(data);
     if (commandInfo.command === "network" && commandInfo.subcommand === "requests")
-        return formatNetworkRequestsText(data);
+        return formatNetworkRequestsText(data, commandInfo);
     if (commandInfo.command === "network" && commandInfo.subcommand === "request")
         return formatNetworkRequestText(data);
     if (commandInfo.command === "diff")
@@ -945,9 +963,9 @@ export function formatDiagnosticText(commandInfo, data) {
     if (commandInfo.command === "chat")
         return formatChatText(data);
     if (commandInfo.command === "console")
-        return formatConsoleText(data);
+        return formatConsoleText(data, commandInfo);
     if (commandInfo.command === "errors")
-        return formatErrorsText(data);
+        return formatErrorsText(data, commandInfo);
     if (commandInfo.command === "dashboard")
         return formatDashboardText(data);
     if (commandInfo.command === "doctor")
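
Note on the diagnostics.js hunks above: the new `isClearDiagnosticCommand` helper checks both `subcommand` and the optional `commandTokens` array. For `network requests --clear` the subcommand is `requests`, so detection appears to depend on `commandTokens` being populated by the caller. The following is a minimal sketch of that behavior, not the package's code: `summarizeConsole` is a hypothetical, simplified stand-in for the console branch of `formatDiagnosticSummary`, and the `commandInfo` shapes are assumed from the diff.

```javascript
// Copied in spirit from the diff: true when --clear is the subcommand
// or appears anywhere in the optional commandTokens array.
function isClearDiagnosticCommand(commandInfo) {
    return commandInfo.subcommand === "--clear" || commandInfo.commandTokens?.includes("--clear") === true;
}

// Hypothetical stand-in for the console branch of formatDiagnosticSummary.
function summarizeConsole(commandInfo, messages) {
    return isClearDiagnosticCommand(commandInfo)
        ? `Console reset: ${messages.length} cleared`
        : `Console messages: ${messages.length}`;
}

// Assumed commandInfo shapes, for illustration only.
console.log(summarizeConsole({ command: "console", subcommand: "--clear" }, []));
console.log(summarizeConsole({ command: "console", commandTokens: ["console", "--clear"] }, [{}, {}]));
console.log(summarizeConsole({ command: "console" }, [{}]));
```

Without `commandTokens`, a command whose subcommand slot is already taken (such as `requests`) would fall through to the ordinary count summary, which is presumably why batch.js and presentation.js now attach the tokens.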

package/dist/extensions/agent-browser/lib/results/presentation/large-output.js CHANGED

@@ -13,6 +13,8 @@ const LARGE_OUTPUT_INLINE_MAX_CHARS = 8_000;
 const LARGE_OUTPUT_INLINE_MAX_LINES = 120;
 const LARGE_OUTPUT_PREVIEW_MAX_CHARS = 2_500;
 const LARGE_OUTPUT_PREVIEW_MAX_LINES = 40;
+const LARGE_OUTPUT_PREVIEW_MAX_LINE_CHARS = 240;
+const LARGE_OUTPUT_FAILURE_COMMAND_MAX_CHARS = 240;
 const LARGE_OUTPUT_FILE_PREFIX = "pi-agent-browser-output";
 function shouldCompactLargeOutput(text) {
     return text.length > LARGE_OUTPUT_INLINE_MAX_CHARS || countLines(text) > LARGE_OUTPUT_INLINE_MAX_LINES;
@@ -26,7 +28,7 @@ function buildLargeOutputPreview(text) {
             break;
         }
         const remainingChars = LARGE_OUTPUT_PREVIEW_MAX_CHARS - previewChars;
-        const previewLine = truncateText(line, Math.max(40, remainingChars));
+        const previewLine = truncateText(line, Math.min(Math.max(40, remainingChars), LARGE_OUTPUT_PREVIEW_MAX_LINE_CHARS));
         previewLines.push(previewLine);
         previewChars += previewLine.length + 1;
     }
@@ -35,6 +37,27 @@ function buildLargeOutputPreview(text) {
         previewText: previewLines.join("\n"),
     };
 }
+function buildLargeOutputFailureContext(presentation) {
+    const failure = presentation.batchFailure;
+    if (!failure)
+        return [];
+    const failedStep = failure.failedStep;
+    const commandText = truncateText(failedStep.commandText, LARGE_OUTPUT_FAILURE_COMMAND_MAX_CHARS);
+    const lines = [
+        "Failure context:",
+        `- First failing step: ${failedStep.index + 1} — ${commandText}`,
+        `- Batch result: ${failure.successCount}/${failure.totalCount} succeeded${failure.failureCount > 1 ? `; ${failure.failureCount} failed` : ""}`,
+    ];
+    if (failedStep.failureCategory)
+        lines.push(`- Failure category: ${failedStep.failureCategory}`);
+    const failureText = (failedStep.text || failedStep.summary).replace(/\s+/g, " ").trim();
+    if (failureText)
+        lines.push(`- Failure detail: ${truncateText(failureText, 700)}`);
+    const stepPaths = [failedStep.fullOutputPath, ...(failedStep.fullOutputPaths ?? [])].filter((path, index, paths) => typeof path === "string" && path.length > 0 && paths.indexOf(path) === index);
+    if (stepPaths.length > 0)
+        lines.push(`- Failed-step spill path${stepPaths.length === 1 ? "" : "s"}: ${stepPaths.join(", ")}`);
+    return lines;
+}
 async function writeLargeOutputSpillFile(options) {
     const payload = typeof options.data === "string"
         ? redactModelFacingText(options.data)
@@ -91,8 +114,10 @@ export async function compactLargePresentationOutput(options) {
     }
     const { omittedLineCount, previewText } = buildLargeOutputPreview(text);
     const commandLabel = options.commandInfo.command ?? "agent-browser";
+    const failureContext = buildLargeOutputFailureContext(options.presentation);
     const lines = [
         `Large ${commandLabel} output compacted.`,
+        ...(failureContext.length > 0 ? ["", ...failureContext] : []),
         "",
         "Preview:",
         previewText,
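
Note on the large-output.js hunks above: the preview loop now clamps each line's truncation budget to the range 40 to 240 characters, where previously only the lower bound applied. A minimal sketch of that arithmetic follows. `previewLineBudget` is a hypothetical name introduced here for illustration, and the constant is copied from the diff.

```javascript
// Constant value taken from the diff above.
const LARGE_OUTPUT_PREVIEW_MAX_LINE_CHARS = 240;

// Hypothetical helper isolating the new clamp: at least 40 characters,
// at most LARGE_OUTPUT_PREVIEW_MAX_LINE_CHARS, otherwise the remaining budget.
function previewLineBudget(remainingChars) {
    return Math.min(Math.max(40, remainingChars), LARGE_OUTPUT_PREVIEW_MAX_LINE_CHARS);
}

console.log(previewLineBudget(2500)); // 240: a fresh 2,500-char budget is capped per line
console.log(previewLineBudget(100));  // 100: within range, passed through
console.log(previewLineBudget(10));   // 40: the floor still applies near the end
```

The effect is that one very long first line can no longer consume most of the 2,500-character preview on its own.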

package/dist/extensions/agent-browser/lib/results/presentation.js CHANGED

@@ -4,6 +4,7 @@
  * Scope: Presentation shaping only; upstream stdout parsing and snapshot compaction internals live in separate modules.
  */
 import { isRecord } from "../parsing.js";
+import { extractCommandTokens } from "../runtime.js";
 import { buildAgentBrowserNextActions } from "./action-recommendations.js";
 import { buildAgentBrowserResultCategoryDetails } from "./categories.js";
 import { detectConfirmationRequired } from "./confirmation.js";
@@ -37,16 +38,17 @@ function shouldAddAnnotatedScreenshotGuidance(commandInfo, args) {
 }
 export async function buildToolPresentation(options) {
     const { args, artifactManifest, artifactRequest, commandInfo, compiledSemanticAction, cwd, envelope, errorText, networkRouteDiagnostics, networkRoutes, persistentArtifactStore, sessionName, } = options;
-    const presentationCommandInfo = resolvePresentationCommandInfo(commandInfo, compiledSemanticAction);
+    const commandInfoWithTokens = commandInfo.commandTokens || !args ? commandInfo : { ...commandInfo, commandTokens: extractCommandTokens(args) };
+    const presentationCommandInfo = resolvePresentationCommandInfo(commandInfoWithTokens, compiledSemanticAction);
     if (errorText) {
         return buildErrorPresentation({ args, commandInfo, errorText, sessionName });
     }
-    const data = enrichStreamStatusData(commandInfo, envelope?.data);
-    const presentationData = redactPresentationData(commandInfo, data);
+    const data = enrichStreamStatusData(commandInfoWithTokens, envelope?.data);
+    const presentationData = redactPresentationData(commandInfoWithTokens, data);
     const artifacts = await extractFileArtifacts({ artifactManifest, artifactRequest, commandInfo: presentationCommandInfo, cwd, data, sessionName });
     const artifactVerification = buildArtifactVerificationSummary(artifacts);
     const artifactSummary = formatArtifactSummary(artifacts);
-    const summary = artifactSummary ?? formatPresentationSummary(commandInfo, data, compiledSemanticAction);
+    const summary = artifactSummary ?? formatPresentationSummary(commandInfoWithTokens, data, compiledSemanticAction);
     const artifactText = artifacts.length > 0 ? formatArtifactMetadataLines(artifacts).join("\n") : undefined;
     let presentation;
     if (commandInfo.command === "batch" && isAgentBrowserBatchResultArray(data)) {
@@ -69,7 +71,7 @@ export async function buildToolPresentation(options) {
         presentation = {
             artifactVerification,
             artifacts: artifacts.length > 0 ? artifacts : undefined,
-            content: [{ type: "text", text: artifactText ?? formatPresentationContentText(commandInfo, data, compiledSemanticAction) }],
+            content: [{ type: "text", text: artifactText ?? formatPresentationContentText(commandInfoWithTokens, data, compiledSemanticAction) }],
             data: presentationData,
             summary,
         };
@@ -160,10 +162,10 @@ export async function buildToolPresentation(options) {
         savedFilePath: presentationWithManifest.savedFilePath,
         successCategory: presentationWithManifest.successCategory,
     });
-    const networkNextActions = commandInfo.command === "network" && commandInfo.subcommand === "requests" && presentationWithManifest.resultCategory === "success"
+    const networkNextActions = commandInfoWithTokens.command === "network" && commandInfoWithTokens.subcommand === "requests" && presentationWithManifest.resultCategory === "success"
         ? buildNetworkRequestsNextActions(data, sessionName, presentationWithManifest.networkRouteDiagnostics)
         : undefined;
-    const streamNextActions = presentationWithManifest.resultCategory === "success" ? buildStreamNextActions(commandInfo, data, sessionName) : undefined;
+    const streamNextActions = presentationWithManifest.resultCategory === "success" ? buildStreamNextActions(commandInfoWithTokens, data, sessionName) : undefined;
     presentationWithManifest.nextActions = mergeNextActions(presentationWithManifest.nextActions, genericNextActions, networkNextActions, streamNextActions);
     presentationWithManifest.pageChangeSummary = presentationWithManifest.pageChangeSummary ?? buildPageChangeSummary({
         artifacts: presentationWithManifest.artifacts,