pi-agent-browser-native 0.2.47 → 0.2.48

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (34) hide show
  1. package/CHANGELOG.md +46 -19
  2. package/README.md +38 -15
  3. package/docs/ARCHITECTURE.md +10 -10
  4. package/docs/COMMAND_REFERENCE.md +35 -21
  5. package/docs/ELECTRON.md +3 -3
  6. package/docs/RELEASE.md +28 -19
  7. package/docs/REQUIREMENTS.md +1 -1
  8. package/docs/SUPPORT_MATRIX.md +34 -106
  9. package/docs/TOOL_CONTRACT.md +23 -21
  10. package/extensions/agent-browser/index.ts +13 -4
  11. package/extensions/agent-browser/lib/config.ts +2 -0
  12. package/extensions/agent-browser/lib/input-modes/job.ts +138 -62
  13. package/extensions/agent-browser/lib/input-modes/params.ts +2 -2
  14. package/extensions/agent-browser/lib/orchestration/browser-run/artifact-paths.ts +44 -0
  15. package/extensions/agent-browser/lib/orchestration/browser-run/click-dispatch.ts +42 -19
  16. package/extensions/agent-browser/lib/orchestration/browser-run/diagnostics.ts +6 -4
  17. package/extensions/agent-browser/lib/orchestration/browser-run/final-result.ts +18 -9
  18. package/extensions/agent-browser/lib/orchestration/browser-run/prepare/direct-anchor-download.ts +158 -0
  19. package/extensions/agent-browser/lib/orchestration/browser-run/prepare/network-page-filter.ts +116 -0
  20. package/extensions/agent-browser/lib/orchestration/browser-run/prepare/scroll-shims.ts +147 -0
  21. package/extensions/agent-browser/lib/orchestration/browser-run/prepare/snapshot-filter.ts +183 -0
  22. package/extensions/agent-browser/lib/orchestration/browser-run/prepare/wait-timeouts.ts +58 -0
  23. package/extensions/agent-browser/lib/orchestration/browser-run/prepare.ts +19 -653
  24. package/extensions/agent-browser/lib/orchestration/browser-run/process-output.ts +1 -6
  25. package/extensions/agent-browser/lib/orchestration/browser-run/session-artifacts.ts +8 -0
  26. package/extensions/agent-browser/lib/orchestration/browser-run/types.ts +1 -0
  27. package/extensions/agent-browser/lib/pi-tool-rendering.ts +34 -19
  28. package/extensions/agent-browser/lib/playbook.ts +4 -4
  29. package/extensions/agent-browser/lib/results/action-recommendations.ts +3 -3
  30. package/extensions/agent-browser/lib/web-search.ts +11 -4
  31. package/package.json +4 -4
  32. package/scripts/agent-browser-capability-baseline.mjs +6 -3
  33. package/scripts/doctor.mjs +11 -10
  34. package/scripts/platform-smoke.mjs +1 -1
package/CHANGELOG.md CHANGED
@@ -2,7 +2,34 @@
2
2
 
3
3
  ## Unreleased
4
4
 
5
- No changes yet.
5
+ No unreleased changes yet.
6
+
7
+ ## 0.2.48 - 2026-06-11
8
+
9
+ ### Changed
10
+
11
+ - Rebaselined the upstream capability metadata, command reference, support matrix, and real-upstream contract metadata for `agent-browser` `0.27.2` after reviewing the upstream changelog.
12
+ - Forward explicit long `wait <ms>` / `wait --timeout <ms>` calls now that upstream `agent-browser` `0.27.2` fixes wait timeout and client read-budget handling; the wrapper derives a longer subprocess watchdog when the caller does not provide top-level `timeoutMs`.
13
+ - Split `browser-run/prepare.ts` concerns into focused direct-anchor download, network page-filter, wait-timeout, scroll-shim, and snapshot-filter preparation modules while preserving the existing coordinator behavior.
14
+ - Refactored constrained `job` step compilation around per-action descriptors so unsupported-field validation and compilation stay paired.
15
+ - Added a documentation source-of-truth map and moved implementation-deep support-matrix closure notes into maintainer notes so the active release checklist stays navigable.
16
+ - Moved superseded implementation-plan and v1 contract drafts into `docs/archive/` with clear non-canonical archive guidance while keeping them out of the published package.
17
+ - Added `npm run verify -- pre-pr` as a named local confidence gate that runs the default verification stack plus package-content checks without release-only lifecycle, platform, live dogfood, or benchmark cost.
18
+
19
+ ### Fixed
20
+
21
+ - Removed stale docs that said `wait 30000` was intentionally blocked by the wrapper.
22
+ - Rejected unsupported fields for every constrained `job` step action instead of silently ignoring irrelevant fields.
23
+ - Broke the browser-run orchestration import cycle and added a static acyclic import-boundary regression test.
24
+ - Added package verification for local Markdown links in packed docs and converted repo-only documentation references to external GitHub links or plain text so npm docs remain navigable.
25
+ - Clarified support-matrix evidence so current `agent-browser 0.27.2` gates are separated from historical pending-refresh release gates.
26
+ - Aligned the documented/tested AWS provider environment allowlist with the runtime-forwarded AgentCore AWS variables.
27
+ - Kept env/global web-search registration available when project-local config is not approved, while trusted project disables or config errors still suppress unsafe search execution.
28
+ - Stopped running click-dispatch probes for unresolved `find … click` locators to avoid false failures on frame-scoped upstream clicks.
29
+
30
+ ### Validation
31
+
32
+ - Ran focused wrapper/process tests, `npm run typecheck`, `npm run docs`, `npm run verify -- command-reference`, `npm test -- --test-concurrency=1`, `npm run verify -- real-upstream`, `npm run doctor`, `npm run verify -- package`, and `npm run verify -- dogfood` against `agent-browser` `0.27.2`; final release validation evidence is recorded in `docs/SUPPORT_MATRIX.md`.
6
33
 
7
34
  ## 0.2.47 - 2026-06-08
8
35
 
@@ -247,7 +274,7 @@ No changes yet.
247
274
 
248
275
  ### Changed
249
276
 
250
- - Maintainer docs now map Pi `details` classifiers, `nextActions`, recovery id registries, and network QA helpers to the split `extensions/agent-browser/lib/results/*` modules (plus the `shared.ts` re-export barrel) and call out `extensions/agent-browser/lib/session-page-state.ts` for ordered tab/ref/pinning state—see [`AGENTS.md`](../AGENTS.md), [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md), [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md), [`docs/REQUIREMENTS.md`](docs/REQUIREMENTS.md), and [`docs/SUPPORT_MATRIX.md`](docs/SUPPORT_MATRIX.md).
277
+ - Maintainer docs now map Pi `details` classifiers, `nextActions`, recovery id registries, and network QA helpers to the split `extensions/agent-browser/lib/results/*` modules (plus the `shared.ts` re-export barrel) and call out `extensions/agent-browser/lib/session-page-state.ts` for ordered tab/ref/pinning state—see [`AGENTS.md`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/AGENTS.md), [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md), [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md), [`docs/REQUIREMENTS.md`](docs/REQUIREMENTS.md), and [`docs/SUPPORT_MATRIX.md`](docs/SUPPORT_MATRIX.md).
251
278
  - Real Pi transcripts now treat wrapper-classified failures as tool errors: a `tool_result` hook (`buildAgentBrowserToolResultPatch` in `extensions/agent-browser/index.ts`) sets `isError: true` when `details.resultCategory` is `failure` even if `execute` returned successfully (for example `qa` preset reclassification), appends a model-visible `Result category: failure; …; Pi tool isError: true.` line for prose output, and preserves caller-requested `--json` output as parseable JSON—see [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details).
252
279
  - Desktop and attach workflows: documented readiness ladders (condition waits, `tab list` → `tab t<N>` → `snapshot -i`, `electron.probe` / `qa.attached`) instead of blind sleeps; clarified that raw `connect` success only means the CDP endpoint accepted the session, and that `close` does not quit manually launched apps or remove explicit artifacts.
253
280
  - Page-scoped refs: when `snapshot -i` reports `No active page`, prior session refs are cleared and `details.refSnapshotInvalidation` records `reason: "no-active-page"` until a successful snapshot restores refs; compact snapshots’ `Omitted high-value controls` heuristics favor editables, named tab/surface controls, and primary actions on dense desktop-host UIs.
@@ -337,15 +364,15 @@ No changes yet.
337
364
  - managed-session outcome diagnostics for failed `sessionMode: "fresh"` calls (`RQ-0077`): failed or timed-out fresh launches now report `details.managedSessionOutcome` and, when the plan used `sessionMode: "fresh"` with a failing outcome, append visible `Managed session outcome: …` text so agents know whether the prior managed session was preserved or no managed session became current.
338
365
  - timeout partial-progress evidence for long `job` / `qa` / `batch` calls (`RQ-0076`): wrapper watchdog timeouts now add best-effort `details.timeoutPartialProgress` with planned steps, current page title/URL, and declared artifact path checks, and append visible `Timeout partial progress` recovery text.
339
366
  - QA/network failed-request impact classification (`RQ-0075`): `extensions/agent-browser/lib/results/shared.ts` now separates actionable network failures from benign low-impact browser icon misses such as missing `favicon.ico`, `qa` preserves benign misses as `qaPreset.warnings` instead of failing otherwise healthy smoke checks, and `network requests` presentation includes actionable/benign summary lines plus per-row impact tags. Contract and operator notes live in [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md), [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md), [`README.md`](README.md), and [`docs/SUPPORT_MATRIX.md`](docs/SUPPORT_MATRIX.md); regression coverage is in `test/agent-browser.extension-validation.test.ts` and `test/agent-browser.presentation.test.ts`.
340
- - post-success `get text <selector>` visibility diagnostics (`RQ-0074`): after a successful `get text` on a non-`@ref` CSS selector, `extensions/agent-browser/index.ts` may run an extra read-only `eval --stdin` probe (`buildVisibleTextProbeScript`), merge `details.selectorTextVisibility` (and `selectorTextVisibilityAll` when several batched selectors qualify), prepend `Selector text visibility warning` lines to visible text, and append `inspect-visible-text-candidates` next actions carrying the same probe script in `stdin`; skipped for `@e…` refs and for selectors whose string would leak secrets after redaction or match sensitive attribute-literal heuristics. Contract in [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), operator notes in [`README.md`](README.md), [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md), [`docs/SUPPORT_MATRIX.md`](docs/SUPPORT_MATRIX.md), maintainer checklist in [`AGENTS.md`](AGENTS.md); regression `agentBrowserExtension warns when get text may read hidden selector matches` in [`test/agent-browser.extension-validation.test.ts`](test/agent-browser.extension-validation.test.ts)
367
+ - post-success `get text <selector>` visibility diagnostics (`RQ-0074`): after a successful `get text` on a non-`@ref` CSS selector, `extensions/agent-browser/index.ts` may run an extra read-only `eval --stdin` probe (`buildVisibleTextProbeScript`), merge `details.selectorTextVisibility` (and `selectorTextVisibilityAll` when several batched selectors qualify), prepend `Selector text visibility warning` lines to visible text, and append `inspect-visible-text-candidates` next actions carrying the same probe script in `stdin`; skipped for `@e…` refs and for selectors whose string would leak secrets after redaction or match sensitive attribute-literal heuristics. Contract in [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), operator notes in [`README.md`](README.md), [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md), [`docs/SUPPORT_MATRIX.md`](docs/SUPPORT_MATRIX.md), maintainer checklist in [`AGENTS.md`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/AGENTS.md); regression `agentBrowserExtension warns when get text may read hidden selector matches` in [`test/agent-browser.extension-validation.test.ts`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/test/agent-browser.extension-validation.test.ts)
341
368
  - optional `semanticAction.session` on native `agent_browser`: compiles to a leading `--session <name>` pair before upstream `find` argv so the locator shorthand targets a named upstream browser session instead of the extension-managed default; `buildExecutionPlan` skips implicit `--session` injection when argv already starts with `--session`; successful unified results echo `details.sessionName`; `retry-semantic-action-after-stale-ref` copies the compiled argv including that prefix, and bounded `try-*-candidate` next actions preserve the same session prefix from `getCompiledSemanticActionSessionPrefix` in `extensions/agent-browser/index.ts`. Contract in [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#semanticaction) and [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#sessionmode); operator notes in [`README.md`](README.md), [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md), [`docs/SUPPORT_MATRIX.md`](docs/SUPPORT_MATRIX.md), [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md#direct-subprocess-execution); playbook line in `extensions/agent-browser/lib/playbook.ts`; regression coverage in `test/agent-browser.extension-validation.test.ts`
342
369
  - bounded `selector-not-found` recovery for top-level `semanticAction`: when the wrapper still has `details.compiledSemanticAction`, `extensions/agent-browser/index.ts` may append `try-*-candidate` entries to `details.nextActions` and an `Agent-browser candidate fallbacks` block in visible text for specific `fill`/`click` locator pairs (`placeholder`, `text`, `label` only; not `select`); contract in [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#semanticaction) and [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), operator notes in [`README.md`](README.md), [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md), [`docs/SUPPORT_MATRIX.md`](docs/SUPPORT_MATRIX.md), [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md#direct-subprocess-execution), playbook line in `extensions/agent-browser/lib/playbook.ts`, regression coverage in `test/agent-browser.extension-validation.test.ts`
343
370
  - compact oversized `snapshot` output now documents the `Omitted high-value controls` prose block and matching `details.data.highValueControlRefIds` (plus related compact snapshot metadata) in [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md#snapshot), [`README.md`](README.md), and generated playbook guidance from `extensions/agent-browser/lib/playbook.ts`, with implementation in `extensions/agent-browser/lib/results/snapshot.ts` and regression coverage in `test/agent-browser.presentation.test.ts`
344
371
 
345
372
  ### Changed
346
- - documentation for `RQ-0077` managed-session outcomes now matches `buildManagedSessionOutcome` / `formatManagedSessionOutcomeText`: when the visible `Managed session outcome: …` line is emitted (including ordering after other diagnostic tails per `rawAppendedDiagnosticText` in `extensions/agent-browser/index.ts`, and the missing-binary-only case), how `details.managedSessionOutcome` behaves after **`qa`** reclassification, and that `"auto"` failures can populate `details` without the extra prose line; updates in [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md), [`README.md`](README.md), [`docs/SUPPORT_MATRIX.md`](docs/SUPPORT_MATRIX.md), [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md#session-model), [`AGENTS.md`](AGENTS.md), and this changelog.
347
- - documentation for `RQ-0076` timeout partial progress now matches `collectTimeoutPartialProgress` / `formatTimeoutPartialProgressText` in code: compiled `qa` shares the `job` step-list path; otherwise planned steps come from JSON-array `batch` stdin (caller-provided or wrapper-generated for `sourceLookup` / `networkSourceLookup`), only when each element is a string[] argv row; optional planned-URL fallback and `PI_AGENT_BROWSER_PROCESS_TIMEOUT_MS` watchdog tuning are called out in [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md), [`docs/SUPPORT_MATRIX.md`](docs/SUPPORT_MATRIX.md), [`README.md`](README.md), and [`AGENTS.md`](AGENTS.md); [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details) now documents `timeoutPartialProgress.summary`, the batch stdin parse rule, the six-step cap on visible `Planned steps` lines, and the `0/0` declared-path count when only page context is recovered.
348
- - batch stdin page-scoped ref preflight clears the ref-invalidating latch when a later `snapshot` step appears in the same JSON plan (`getBatchRefInvalidationMessage` in `extensions/agent-browser/index.ts`), matching documented `snapshot -i` spacing inside `batch`; contract expanded under [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details) (`refSnapshot`); regression `agentBrowserExtension allows batch stdin ref steps after snapshot following an invalidating step` in [`test/agent-browser.extension-validation.test.ts`](test/agent-browser.extension-validation.test.ts)
373
+ - documentation for `RQ-0077` managed-session outcomes now matches `buildManagedSessionOutcome` / `formatManagedSessionOutcomeText`: when the visible `Managed session outcome: …` line is emitted (including ordering after other diagnostic tails per `rawAppendedDiagnosticText` in `extensions/agent-browser/index.ts`, and the missing-binary-only case), how `details.managedSessionOutcome` behaves after **`qa`** reclassification, and that `"auto"` failures can populate `details` without the extra prose line; updates in [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md), [`README.md`](README.md), [`docs/SUPPORT_MATRIX.md`](docs/SUPPORT_MATRIX.md), [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md#session-model), [`AGENTS.md`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/AGENTS.md), and this changelog.
374
+ - documentation for `RQ-0076` timeout partial progress now matches `collectTimeoutPartialProgress` / `formatTimeoutPartialProgressText` in code: compiled `qa` shares the `job` step-list path; otherwise planned steps come from JSON-array `batch` stdin (caller-provided or wrapper-generated for `sourceLookup` / `networkSourceLookup`), only when each element is a string[] argv row; optional planned-URL fallback and `PI_AGENT_BROWSER_PROCESS_TIMEOUT_MS` watchdog tuning are called out in [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md), [`docs/SUPPORT_MATRIX.md`](docs/SUPPORT_MATRIX.md), [`README.md`](README.md), and [`AGENTS.md`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/AGENTS.md); [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details) now documents `timeoutPartialProgress.summary`, the batch stdin parse rule, the six-step cap on visible `Planned steps` lines, and the `0/0` declared-path count when only page context is recovered.
375
+ - batch stdin page-scoped ref preflight clears the ref-invalidating latch when a later `snapshot` step appears in the same JSON plan (`getBatchRefInvalidationMessage` in `extensions/agent-browser/index.ts`), matching documented `snapshot -i` spacing inside `batch`; contract expanded under [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details) (`refSnapshot`); regression `agentBrowserExtension allows batch stdin ref steps after snapshot following an invalidating step` in [`test/agent-browser.extension-validation.test.ts`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/test/agent-browser.extension-validation.test.ts)
349
376
 
350
377
  ### Fixed
351
378
  - explicit `--json` calls keep machine-readable visible content parseable even when the wrapper attaches extra diagnostic structs such as `details.evalStdinHint`; diagnostic guidance remains available on `details` without being appended after the JSON payload.
@@ -355,28 +382,28 @@ No changes yet.
355
382
  ## 0.2.25 - 2026-05-14
356
383
 
357
384
  ### Added
358
- - [`docs/SUPPORT_MATRIX.md`](docs/SUPPORT_MATRIX.md) as the durable upstream support and release-readiness matrix keyed to `CAPABILITY_BASELINE.inventorySections` in `scripts/agent-browser-capability-baseline.mjs`, including maintainer refresh steps, verification gate evidence, and per-inventory documentation/runtime/test pointers; cross-linked from [`README.md`](README.md), [`AGENTS.md`](AGENTS.md), [`docs/RELEASE.md`](docs/RELEASE.md), [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md), and the published tarball `files` list in `package.json`
385
+ - [`docs/SUPPORT_MATRIX.md`](docs/SUPPORT_MATRIX.md) as the durable upstream support and release-readiness matrix keyed to `CAPABILITY_BASELINE.inventorySections` in `scripts/agent-browser-capability-baseline.mjs`, including maintainer refresh steps, verification gate evidence, and per-inventory documentation/runtime/test pointers; cross-linked from [`README.md`](README.md), [`AGENTS.md`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/AGENTS.md), [`docs/RELEASE.md`](docs/RELEASE.md), [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md), and the published tarball `files` list in `package.json`
359
386
  - machine-readable `details.nextActions` id `retry-semantic-action-after-stale-ref` when a top-level `semanticAction` call fails with `failureCategory: "stale-ref"` and the wrapper still has the compiled upstream `find` argv: it is appended after `refresh-interactive-refs` so agents can retry the same locator-stable target without hand-rebuilding argv, while direct stale `@e…` flows keep snapshot-only recovery; merged in `extensions/agent-browser/index.ts`, documented in [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#semanticaction) and [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), agent playbook string in `extensions/agent-browser/lib/playbook.ts`, regression coverage in `test/agent-browser.extension-validation.test.ts`
360
387
  - optional top-level `semanticAction` on native `agent_browser` as a mutually exclusive alternative to `args`, compiling common locator intents into upstream `find` argv and echoing `{ action, locator, args }` (redacted like other argv) in `details.compiledSemanticAction` when the unified or early-validation `details` object includes that field; contract in [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#semanticaction), compilation in `extensions/agent-browser/index.ts` (`compileAgentBrowserSemanticAction`), regression coverage in `test/agent-browser.extension-validation.test.ts`
361
388
  - bounded machine-readable outcome fields on native `agent_browser` tool `details`: `resultCategory` (`success` | `failure`) with `successCategory` or `failureCategory` for stable agent branching without parsing prose; contract in [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), types and classifiers in `extensions/agent-browser/lib/results/shared.ts`, regression coverage in `test/agent-browser.results.test.ts` and related extension tests
362
389
  - optional `details.pageChangeSummary` (and per-step `batchSteps[].pageChangeSummary` on `batch`) with `changeType`, human-readable `summary`, optional `title`/`url`, artifact hints, and `nextActionIds` aligned to `details.nextActions`; assembly in `extensions/agent-browser/lib/results/presentation.ts` (`buildPageChangeSummary`, `PAGE_CHANGE_SUMMARY_COMMANDS`); contract and examples in [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), regression coverage in `test/agent-browser.presentation.test.ts` and `test/agent-browser.extension-validation.test.ts`
363
- - optional experimental top-level `sourceLookup` on native `agent_browser` (mutually exclusive with `args`, `semanticAction`, `job`, and `qa`) that compiles to upstream `batch` steps (`is visible`, `get html`, `react inspect`, and `react tree` when the corresponding fields are set), performs a bounded workspace component scan under the Pi session cwd when `componentName` is present, and merges structured `details.sourceLookup` (`status`, `candidates`, `limitations`, `summary`) plus `details.compiledSourceLookup` for observability; `details.sourceLookup.status` distinguishes `candidates-found`, `no-candidates`, and `unsupported` (the last only when no candidates were collected and a `react` batch step failed). Operator and agent contracts in [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#sourcelookup), [`README.md`](README.md), [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md), [`docs/SUPPORT_MATRIX.md`](docs/SUPPORT_MATRIX.md), [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md), [`docs/RELEASE.md`](docs/RELEASE.md), and [`AGENTS.md`](AGENTS.md); compilation and post-batch analysis in `extensions/agent-browser/index.ts` (`compileAgentBrowserSourceLookup`, `analyzeSourceLookupResults`); regression coverage in `test/agent-browser.extension-validation.test.ts` and a representative scenario in `scripts/agent-browser-efficiency-benchmark.mjs`
390
+ - optional experimental top-level `sourceLookup` on native `agent_browser` (mutually exclusive with `args`, `semanticAction`, `job`, and `qa`) that compiles to upstream `batch` steps (`is visible`, `get html`, `react inspect`, and `react tree` when the corresponding fields are set), performs a bounded workspace component scan under the Pi session cwd when `componentName` is present, and merges structured `details.sourceLookup` (`status`, `candidates`, `limitations`, `summary`) plus `details.compiledSourceLookup` for observability; `details.sourceLookup.status` distinguishes `candidates-found`, `no-candidates`, and `unsupported` (the last only when no candidates were collected and a `react` batch step failed). Operator and agent contracts in [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#sourcelookup), [`README.md`](README.md), [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md), [`docs/SUPPORT_MATRIX.md`](docs/SUPPORT_MATRIX.md), [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md), [`docs/RELEASE.md`](docs/RELEASE.md), and [`AGENTS.md`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/AGENTS.md); compilation and post-batch analysis in `extensions/agent-browser/index.ts` (`compileAgentBrowserSourceLookup`, `analyzeSourceLookupResults`); regression coverage in `test/agent-browser.extension-validation.test.ts` and a representative scenario in `scripts/agent-browser-efficiency-benchmark.mjs`
364
391
 
365
392
  ### Changed
366
- - documented closed `RQ-0068` (no first-class reusable named browser recipe runtime above constrained `job`, the `qa` preset, experimental `sourceLookup` / `networkSourceLookup`, and raw `batch`): evidence bar tied to deterministic efficiency-benchmark scenario ids in [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md#no-reusable-recipe-layer-yet), operator and maintainer cross-links in [`README.md`](README.md), [`AGENTS.md`](AGENTS.md), [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md), [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md), [`docs/SUPPORT_MATRIX.md`](docs/SUPPORT_MATRIX.md), and agent playbook guidance in `extensions/agent-browser/lib/playbook.ts`
367
- - presentation layer treats `cookies`, `storage`, `auth`, `dialog`, `frame`, and `state` as stateful: successful `details.data` and per-step `batch` results pass through field-aware or full-tree redaction, argv echo uses `redactInvocationArgs` for cookie/storage set values, failed batch steps strip the same literals from structured errors, and aggregate `batch` tool calls expose a compact redacted `details.data` roll-up—documented in [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md#use-stateful-browser-context-commands-safely), [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md), [`docs/RELEASE.md`](docs/RELEASE.md), [`README.md`](README.md), and [`AGENTS.md`](AGENTS.md), with regression coverage in `test/agent-browser.presentation.test.ts` and `test/agent-browser.extension-validation.test.ts`
368
- - documented real-upstream suite mechanics (single 120s contract test, output-shape JSON, temp `HOME` / socket / screenshot isolation, React DevTools branch) plus triage notes in [`docs/RELEASE.md`](docs/RELEASE.md#real-upstream-suite-mechanics-isolation-and-troubleshooting); cross-links from [`README.md`](README.md) and [`AGENTS.md`](AGENTS.md)
369
- - expanded the opt-in `npm run verify -- real-upstream` contract (`PI_AGENT_BROWSER_REAL_UPSTREAM=1`) across `test/agent-browser.real-upstream-contract.test.ts`, `test/fixtures/agent-browser-real-output-shapes.json`, and `test/helpers/agent-browser-harness.ts` (broader core command matrix, `batch` stdin, `pushstate`, `vitals … --json`, `network route … --abort --resource-type`, `cookies set --curl`, missing-renderer `react tree`, and `wait --download` metadata versus on-disk presence); added a separate fast fake-upstream argv matrix in `test/agent-browser.extension-validation.test.ts` for additional passthrough commands (`connect`, `download`, `get url`, `snapshot --compact`, `tab` lifecycle); maintainer inventory and caveat notes in [`docs/RELEASE.md`](docs/RELEASE.md#real-upstream-contract-validation), high-level summaries in [`README.md`](README.md) and [`AGENTS.md`](AGENTS.md), and `scripts/project.mjs` verify help
370
- - read-only `skills list`, `skills get …`, and `skills path …` now share the same implicit-session behavior as plain-text `--help` / `--version` probes: `buildExecutionPlan` still prepends `--json`, but under default `sessionMode: "auto"` it does not inject the extension-managed implicit `--session`, so bundled skill text can be loaded without pinning or rotating the active browser session; allowlisting lives in `extensions/agent-browser/lib/runtime.ts` (`isStatelessInspectionCommand`), with regression coverage in `test/agent-browser.runtime.test.ts` and `test/agent-browser.extension-validation.test.ts`, operator-facing notes in [`README.md`](README.md), [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md#built-in-skills), [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md), [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md), and [`AGENTS.md`](AGENTS.md)
371
- - `-p`, `--provider`, and `--device` are now modeled as launch-scoped flags in `LAUNCH_SCOPED_FLAG_DEFINITIONS` (`extensions/agent-browser/lib/runtime.ts`), so implicit `sessionMode: "auto"` reuse fails fast with the same `sessionRecoveryHint` / `sessionMode: "fresh"` guidance as profile, CDP, and state launches when those selectors would otherwise be ignored on an active managed session; contract and operator docs updated in [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md), [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md), [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md), [`README.md`](README.md), and [`AGENTS.md`](AGENTS.md), with argv matrices in `test/agent-browser.extension-validation.test.ts` and planning assertions in `test/agent-browser.runtime.test.ts`
372
- - `test/agent-browser.process.test.ts` now asserts representative provider and iOS credential env vars (`AGENT_BROWSER_IOS_DEVICE`, `AGENT_BROWSER_IOS_UDID`, `AGENTCORE_API_KEY`, `BROWSERBASE_PROJECT_ID`) reach the upstream child alongside existing `AGENT_BROWSER_*` and provider-prefix forwarding documented in [`AGENTS.md`](AGENTS.md) and [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md#output-provider-policy-and-ai-flags)
393
+ - documented closed `RQ-0068` (no first-class reusable named browser recipe runtime above constrained `job`, the `qa` preset, experimental `sourceLookup` / `networkSourceLookup`, and raw `batch`): evidence bar tied to deterministic efficiency-benchmark scenario ids in [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md#no-reusable-recipe-layer-yet), operator and maintainer cross-links in [`README.md`](README.md), [`AGENTS.md`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/AGENTS.md), [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md), [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md), [`docs/SUPPORT_MATRIX.md`](docs/SUPPORT_MATRIX.md), and agent playbook guidance in `extensions/agent-browser/lib/playbook.ts`
394
+ - presentation layer treats `cookies`, `storage`, `auth`, `dialog`, `frame`, and `state` as stateful: successful `details.data` and per-step `batch` results pass through field-aware or full-tree redaction, argv echo uses `redactInvocationArgs` for cookie/storage set values, failed batch steps strip the same literals from structured errors, and aggregate `batch` tool calls expose a compact redacted `details.data` roll-up—documented in [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md#use-stateful-browser-context-commands-safely), [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md), [`docs/RELEASE.md`](docs/RELEASE.md), [`README.md`](README.md), and [`AGENTS.md`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/AGENTS.md), with regression coverage in `test/agent-browser.presentation.test.ts` and `test/agent-browser.extension-validation.test.ts`
395
+ - documented real-upstream suite mechanics (single 120s contract test, output-shape JSON, temp `HOME` / socket / screenshot isolation, React DevTools branch) plus triage notes in [`docs/RELEASE.md`](docs/RELEASE.md#real-upstream-suite-mechanics-isolation-and-troubleshooting); cross-links from [`README.md`](README.md) and [`AGENTS.md`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/AGENTS.md)
396
+ - expanded the opt-in `npm run verify -- real-upstream` contract (`PI_AGENT_BROWSER_REAL_UPSTREAM=1`) across `test/agent-browser.real-upstream-contract.test.ts`, `test/fixtures/agent-browser-real-output-shapes.json`, and `test/helpers/agent-browser-harness.ts` (broader core command matrix, `batch` stdin, `pushstate`, `vitals … --json`, `network route … --abort --resource-type`, `cookies set --curl`, missing-renderer `react tree`, and `wait --download` metadata versus on-disk presence); added a separate fast fake-upstream argv matrix in `test/agent-browser.extension-validation.test.ts` for additional passthrough commands (`connect`, `download`, `get url`, `snapshot --compact`, `tab` lifecycle); maintainer inventory and caveat notes in [`docs/RELEASE.md`](docs/RELEASE.md#real-upstream-contract-validation), high-level summaries in [`README.md`](README.md) and [`AGENTS.md`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/AGENTS.md), and `scripts/project.mjs` verify help
397
+ - read-only `skills list`, `skills get …`, and `skills path …` now share the same implicit-session behavior as plain-text `--help` / `--version` probes: `buildExecutionPlan` still prepends `--json`, but under default `sessionMode: "auto"` it does not inject the extension-managed implicit `--session`, so bundled skill text can be loaded without pinning or rotating the active browser session; allowlisting lives in `extensions/agent-browser/lib/runtime.ts` (`isStatelessInspectionCommand`), with regression coverage in `test/agent-browser.runtime.test.ts` and `test/agent-browser.extension-validation.test.ts`, operator-facing notes in [`README.md`](README.md), [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md#built-in-skills), [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md), [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md), and [`AGENTS.md`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/AGENTS.md)
398
+ - `-p`, `--provider`, and `--device` are now modeled as launch-scoped flags in `LAUNCH_SCOPED_FLAG_DEFINITIONS` (`extensions/agent-browser/lib/runtime.ts`), so implicit `sessionMode: "auto"` reuse fails fast with the same `sessionRecoveryHint` / `sessionMode: "fresh"` guidance as profile, CDP, and state launches when those selectors would otherwise be ignored on an active managed session; contract and operator docs updated in [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md), [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md), [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md), [`README.md`](README.md), and [`AGENTS.md`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/AGENTS.md), with argv matrices in `test/agent-browser.extension-validation.test.ts` and planning assertions in `test/agent-browser.runtime.test.ts`
399
+ - `test/agent-browser.process.test.ts` now asserts representative provider and iOS credential env vars (`AGENT_BROWSER_IOS_DEVICE`, `AGENT_BROWSER_IOS_UDID`, `AGENTCORE_API_KEY`, `BROWSERBASE_PROJECT_ID`) reach the upstream child alongside existing `AGENT_BROWSER_*` and provider-prefix forwarding documented in [`AGENTS.md`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/AGENTS.md) and [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md#output-provider-policy-and-ai-flags)
373
400
  - added a concrete `details.nextActions` JSON example in [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details) for the `refresh-interactive-refs` + `retry-semantic-action-after-stale-ref` chain on semantic `stale-ref` failures, aligned with `extensions/agent-browser/index.ts` and `extensions/agent-browser/lib/results/shared.ts`
374
401
  - documented how `npm run docs` differs from the default `npm run verify` gate, and linked checkout maintainers to `AGENTS.md` for capability baseline rebaselining and operational testing notes alongside the shipped `docs/` set
375
402
  - linked [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md) to the stable `agent_browser` result-category contract in [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details) and the TypeScript source in `extensions/agent-browser/lib/results/shared.ts`
376
- - `package.json` `prepublishOnly` now runs `npm run verify -- release` before `npm pack --dry-run`, so publishes enforce packaged Pi smoke and the same live upstream command-reference sampling as [`docs/RELEASE.md`](docs/RELEASE.md#pre-release-checks); orchestration is the `release` mode in [`scripts/project.mjs`](scripts/project.mjs), with operator-facing notes in [`README.md`](README.md)
377
- - release guidance now requires `tmux`-driven live-site Pi dogfood with the native `agent_browser` tool before every release, with cleanup and evidence recording expectations in [`docs/RELEASE.md`](docs/RELEASE.md#pre-release-checks) and [`AGENTS.md`](AGENTS.md)
378
- - aligned maintainer wording so configured-source lifecycle (`npm run verify -- lifecycle`) is documented as a pre-publish requirement across [`AGENTS.md`](AGENTS.md), [`README.md`](README.md), [`docs/RELEASE.md`](docs/RELEASE.md), and [`docs/REQUIREMENTS.md`](docs/REQUIREMENTS.md), while noting it remains a separate `verify` mode from the default gate in [`scripts/project.mjs`](scripts/project.mjs)
379
- - release-readiness cross-links: `package.json` `prepublishOnly` called out next to the verification facade in [`AGENTS.md`](AGENTS.md); configured-source lifecycle plus publish-time `release` gate summarized under local validation modes in [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md); configured-source harness subsection in [`docs/RELEASE.md`](docs/RELEASE.md) explicitly ties to [Pre-release checks](docs/RELEASE.md#pre-release-checks)
403
+ - `package.json` `prepublishOnly` now runs `npm run verify -- release` before `npm pack --dry-run`, so publishes enforce packaged Pi smoke and the same live upstream command-reference sampling as [`docs/RELEASE.md`](docs/RELEASE.md#pre-release-checks); orchestration is the `release` mode in [`scripts/project.mjs`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/scripts/project.mjs), with operator-facing notes in [`README.md`](README.md)
404
+ - release guidance now requires `tmux`-driven live-site Pi dogfood with the native `agent_browser` tool before every release, with cleanup and evidence recording expectations in [`docs/RELEASE.md`](docs/RELEASE.md#pre-release-checks) and [`AGENTS.md`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/AGENTS.md)
405
+ - aligned maintainer wording so configured-source lifecycle (`npm run verify -- lifecycle`) is documented as a pre-publish requirement across [`AGENTS.md`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/AGENTS.md), [`README.md`](README.md), [`docs/RELEASE.md`](docs/RELEASE.md), and [`docs/REQUIREMENTS.md`](docs/REQUIREMENTS.md), while noting it remains a separate `verify` mode from the default gate in [`scripts/project.mjs`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/scripts/project.mjs)
406
+ - release-readiness cross-links: `package.json` `prepublishOnly` called out next to the verification facade in [`AGENTS.md`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/AGENTS.md); configured-source lifecycle plus publish-time `release` gate summarized under local validation modes in [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md); configured-source harness subsection in [`docs/RELEASE.md`](docs/RELEASE.md) explicitly ties to [Pre-release checks](docs/RELEASE.md#pre-release-checks)
380
407
 
381
408
  ## 0.2.24 - 2026-05-11
382
409
 
package/README.md CHANGED
@@ -4,6 +4,21 @@ A Pi extension that lets coding agents drive real browser sessions with a native
4
4
 
5
5
  It is for Pi users who want agents to browse sites, inspect pages, click through flows, capture screenshots, use persistent profiles, and handle authenticated web apps without spending context on `agent-browser` CLI ceremony.
6
6
 
7
+ ## Source-of-truth map
8
+
9
+ Start here for install and common usage. For deeper work, use the active docs by purpose:
10
+
11
+ | Need | Read |
12
+ | --- | --- |
13
+ | Command workflows and upstream CLI coverage | [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md) |
14
+ | Native tool input/output contract and `details` fields | [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md) |
15
+ | Runtime design and package config policy | [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md) |
16
+ | Electron desktop lifecycle | [`docs/ELECTRON.md`](docs/ELECTRON.md) |
17
+ | Release gates and targeted upstream support | [`docs/SUPPORT_MATRIX.md`](docs/SUPPORT_MATRIX.md) |
18
+ | Maintainer release process | [`docs/RELEASE.md`](docs/RELEASE.md) |
19
+
20
+ The complete documentation ownership map lives in the repository source at [`docs/SOURCE_OF_TRUTH.md`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/docs/SOURCE_OF_TRUTH.md).
21
+
7
22
  ## What this looks like in Pi
8
23
 
9
24
  You prompt the agent in plain English:
@@ -63,8 +78,8 @@ The result is optimized for agent work:
63
78
  | Auth/profile workflows can leak secrets | Supports `auth save --password-stdin` and redacts sensitive args, URLs, stdout/stderr, details, and parse-failure spills | `test/agent-browser.extension-security-redaction.test.ts` |
64
79
  | Stateful cookies/storage/auth output bloats or leaks context | Presentation layer redacts `details.data` for cookies and credential-like storage values while keeping low-risk local QA values such as `theme: dark` readable; recursively scrubs other structured upstream JSON (network, diff, trace/profiler, stream, dashboard, chat, auth, dialog, frame, state, and similar) using sensitive key names plus string heuristics; masks sensitive argv flags and positionals; scrubs secrets from failed batch step errors; and exposes a compact redacted `batch` matrix on top-level `details.data` | `extensions/agent-browser/lib/results/presentation.ts`, `extensions/agent-browser/lib/results/presentation/diagnostics.ts`, `extensions/agent-browser/lib/runtime.ts`, `test/agent-browser.presentation-diagnostics.test.ts` |
65
80
  | Stale `@eN` refs fail mysteriously | Records per-session `details.refSnapshot`, rejects mismatched URLs / unknown refs / unsafe `batch` stdin ordering before spawn, adds recovery guidance to rerun `snapshot -i` or use stable `find` locators | `extensions/agent-browser/index.ts`, `extensions/agent-browser/lib/session-page-state.ts`, `test/agent-browser.session-page-state.test.ts`, `test/agent-browser.results.test.ts`, `test/agent-browser.extension-ref-guards.test.ts`, `test/agent-browser.extension-semantic-recovery.test.ts` |
66
- | Agents need stable success/failure buckets | Exposes bounded `resultCategory`, `successCategory`, and `failureCategory` on tool `details` for branching without parsing prose; a `tool_result` hook also aligns real Pi `isError` semantics, naming `Pi tool isError: true` in prose output while preserving parseable caller-requested `--json` output | [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), `extensions/agent-browser/lib/results/categories.ts`, `extensions/agent-browser/lib/results/shared.ts` (re-export barrel), `extensions/agent-browser/index.ts`, `test/agent-browser.results.test.ts`, `test/agent-browser.extension-validation.test.ts`, `test/agent-browser.pi-pipeline.test.ts` |
67
- | Clicks can report success without the page receiving the event | Top-level non-Electron `click` on exact CSS/XPath selectors, and on `@e…` refs when the latest snapshot has role/name metadata the wrapper can resolve to a unique visible element, installs a bounded DOM-event probe; if upstream reports success but no trusted event reaches the target, the wrapper fails the tool, exposes `details.clickDispatch`, and suggests explicit retry/inspect next actions (no in-page replay), including a nested-scroll `scrollintoview` action when the probe sees the target outside a scroll container or viewport. Other click results still expose `details.pageChangeSummary`, and unchanged-URL clicks can surface evidence-backed `details.overlayBlockers` candidates. | [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), `extensions/agent-browser/lib/orchestration/browser-run/click-dispatch.ts`, `extensions/agent-browser/lib/results/presentation/navigation.ts`, `test/agent-browser.presentation.test.ts`, `test/agent-browser.extension-click-dispatch.test.ts` |
81
+ | Agents need stable success/failure buckets | Exposes bounded `resultCategory`, `successCategory`, and `failureCategory` on tool `details` for branching without parsing prose; a `tool_result` hook also aligns real Pi `isError` semantics, naming `Pi tool isError: true` in prose output while preserving parseable caller-requested `--json` output | [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), `extensions/agent-browser/lib/results/categories.ts`, `extensions/agent-browser/lib/results/shared.ts` (re-export barrel), `extensions/agent-browser/index.ts`, `extensions/agent-browser/lib/pi-tool-rendering.ts`, `test/agent-browser.results.test.ts`, `test/agent-browser.extension-validation.test.ts`, `test/agent-browser.pi-pipeline.test.ts` |
82
+ | Clicks can report success without the page receiving the event | Top-level non-Electron direct `click` calls on CSS selectors / `xpath=` targets or role-gated current `@e…` refs (`button`, `checkbox`, `menuitem`, `radio`, `switch`, `tab`) install a bounded target-specific DOM-event probe; eligible `@e…` refs use the latest snapshot role/name metadata, and duplicate-name refs use snapshot-order `duplicateIndex` rather than requiring a unique name. If upstream reports success but no trusted event reaches the resolved target, the wrapper fails the tool, exposes `details.clickDispatch`, and suggests explicit retry/inspect next actions (no in-page replay), including a nested-scroll `scrollintoview` action when the probe sees the target outside a scroll container or viewport. Unresolved locator clicks such as raw `find … click` are left upstream-owned to avoid false failures for frame-scoped targets. Other click results still expose `details.pageChangeSummary`, and unchanged-URL clicks can surface evidence-backed `details.overlayBlockers` candidates. | [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), `extensions/agent-browser/lib/orchestration/browser-run/click-dispatch.ts`, `extensions/agent-browser/lib/results/presentation/navigation.ts`, `test/agent-browser.presentation.test.ts`, `test/agent-browser.extension-click-dispatch.test.ts` |
68
83
  | Dashboard scroll commands can look successful while nothing moves | Samples viewport and prominent scroll-container positions around top-level `scroll` calls; unchanged positions produce `details.scrollNoop`, visible recovery guidance, and exact `nextActions` for snapshot/screenshot verification | [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md#core-page-and-element-commands), `test/agent-browser.extension-validation.test.ts` |
69
84
  | Dropdown/combobox clicks can focus or hit native option box-model errors | Adds first-class `select <selector> <value...>` paths through raw `args`, `semanticAction`, and `job`; for custom combobox clicks, detects focused controls with explicit `aria-expanded` state but no visible options and returns `details.comboboxFocus` plus exact recovery `nextActions` | [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md#core-page-and-element-commands), `extensions/agent-browser/lib/input-modes/semantic-action.ts`, `test/agent-browser.extension-input-modes.test.ts`, `test/agent-browser.extension-validation.test.ts` |
70
85
  | Recording workflows fail late when `ffmpeg` is missing | After successful `record start` / `record restart`, warns when `ffmpeg` is not on `PATH` so agents can install or fix PATH before `record stop` | [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md#diff-debug-and-streaming), `test/agent-browser.extension-validation.test.ts` |
@@ -74,7 +89,7 @@ The result is optimized for agent work:
74
89
 
75
90
  ## Fastest way to try it
76
91
 
77
- Use Pi 0.79.0 or newer when possible. This package does not hard-pin Pi 0.79.0 as a runtime requirement, but the current release is audited and validated against that extension/package baseline, including Project Trust.
92
+ Use Pi 0.79.0 or newer. This package keeps Pi core imports as wildcard `peerDependencies` because Pi package docs require the host Pi install to provide those packages, and `pi-agent-browser-doctor` fails setup when `pi --version` is below the enforced runtime floor. The current release is audited and validated against the Pi 0.79 extension/package baseline, including Project Trust.
78
93
 
79
94
  Install upstream `agent-browser` first and make sure it is on `PATH`:
80
95
 
@@ -146,7 +161,7 @@ The doctor checks:
146
161
 
147
162
  - upstream `agent-browser` exists on `PATH`
148
163
  - the installed upstream version matches this wrapper's command-reference baseline
149
- - `pi --version` meets the recommended Pi floor for this release, as a warning rather than a hard failure
164
+ - `pi --version` meets the minimum Pi runtime floor for this release; older Pi versions are setup failures
150
165
  - Pi settings do not point at multiple active `pi-agent-browser-native` sources
151
166
 
152
167
  It does **not** edit Pi settings and does **not** run upstream `agent-browser doctor --fix`.
@@ -268,7 +283,7 @@ Run a multi-step flow in one tool call:
268
283
  { "args": ["batch"], "stdin": "[[\"open\",\"https://example.com\"],[\"snapshot\",\"-i\"]]" }
269
284
  ```
270
285
 
271
- If the same `batch` stdin later uses `@e…` on interaction commands after a step that can navigate or mutate the page (`open`, non-form `click`, `reload`, and similar), insert a `snapshot` step whose first argv token is `snapshot` (for example `["snapshot","-i"]`) between those phases. Multiple same-snapshot `fill @e…` steps and native form-control steps (`check`/`uncheck` on checkbox or radio refs and `select` on combobox refs) may be batched before a click/submit step; checkbox/radio `click`s remain conservative unless followed by a fresh snapshot. Dynamic or autosubmit forms should still use stable locators or split with a fresh snapshot. The wrapper rejects unsafe ordering with `failureCategory: "stale-ref"` before upstream runs; full rules are under `refSnapshot` in [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details).
286
+ If the same `batch` stdin later uses `@e…` on interaction commands after a step that can navigate or mutate the page (`open`, non-form `click`, `reload`, and similar), insert a `snapshot` step whose first argv token is `snapshot` (for example `["snapshot","-i"]`) between those phases. Multiple same-snapshot `fill @e…` steps and native form-control steps (`check`/`uncheck` on checkbox or radio refs, checkbox/radio `click`/`tap` refs, and `select` on combobox refs) may be batched before a final click/submit step. Dynamic or autosubmit forms should still use stable locators or split with a fresh snapshot. The wrapper rejects unsafe ordering with `failureCategory: "stale-ref"` before upstream runs; full rules are under `refSnapshot` in [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details).
272
287
 
273
288
  Evaluate page JavaScript through stdin. Put the script in the top-level `stdin` field, not as an extra `args` token after `--stdin`. Return the value you want as an expression; `eval --stdin` may warn with `details.evalStdinHint` when a function-shaped snippet serializes to `{}` instead of being invoked:
274
289
 
@@ -323,16 +338,16 @@ Typical pitfalls:
323
338
  - Do not reuse `@e…` refs across navigation. The wrapper records the latest snapshot refs per session and fails mutation-prone stale/recycled refs before upstream can silently hit a different current-page element; use the session-aware `refresh-interactive-refs` next action.
324
339
  - If upstream classifies the failure as `stale-ref` and `details.compiledSemanticAction` is present for a compiled `find` action, `details.nextActions` may list `retry-semantic-action-after-stale-ref` after `refresh-interactive-refs`, carrying the same compiled `find` argv so you can retry the locator-stable target once it is safe to do so. `select` calls that used stale `@refs` only get refresh guidance; use a fresh snapshot or stable selector before retrying (contract in [`docs/TOOL_CONTRACT.md#semanticaction`](docs/TOOL_CONTRACT.md#semanticaction)).
325
340
  - If the failure is `selector-not-found`, the wrapper may take one fresh snapshot and add `Current snapshot ref fallback` when that snapshot has exact visible role/name matches for the failed `find` / `semanticAction` target. Non-fill targets can include direct `try-current-visible-ref*` next actions, and semantic click misses can still add bounded `Agent-browser candidate fallbacks` such as `button`/`link` role retries for `text` clicks. `semanticAction` does not expose `uncheck` while upstream `find ... uncheck` is not runtime-supported; use raw `args: ["uncheck", <selector-or-ref>]` after a stable selector or fresh snapshot ref. For semantic `fill` misses on desktop or host-controlled rich inputs, prefer `details.richInputRecovery`: refresh refs, choose the current editable `@ref`, focus or click it, then use `keyboard inserttext` or `keyboard type` with the intended text. Direct contenteditable fills are verified with `get text` when snapshot metadata proves the target is contenteditable; if replacement did not happen, `details.fillVerification` warns before any submit step. Those recovery nextActions do not copy the fill text and do not press `Enter` or submit; only submit when the user flow explicitly calls for it (same contract link).
326
- - A successful upstream `click` is not proof that the web app handled the event or changed state. For top-level non-Electron clicks, the wrapper may fail the tool with `details.clickDispatch` and a `Click dispatch diagnostic` line when upstream reported success but no trusted DOM event reached the target; use the suggested `inspect-click-dispatch-miss` / `retry-click-after-dispatch-miss` next actions instead of assuming the click mutated the page; when `details.clickDispatch.scrollContainer` is present, use `scroll-target-into-view-after-dispatch-miss` first. When the task depends on a mutation, follow `inspect-after-mutation` / `pageChangeSummary` evidence with a wait, URL/text check, or fresh snapshot before trusting the result; if the target still did not change, retry with a current visible ref or stable selector and report the workflow issue instead of silently continuing. For static local fixtures where the user only needs to exercise app code, an explicit `eval --stdin` programmatic click such as `document.querySelector("#demo").click()` can be a diagnostic workaround, but treat it as an untrusted scripted activation rather than proof a real user click works, and never use it to bypass user instructions. Respect explicit user stop boundaries yourself: if the user says to stop before order/post/purchase/submit, gather evidence on that page and do not click the final action. The wrapper does not parse broad prompt text into business-intent action blocks; `details.promptGuard` is reserved for concrete artifact-before-close checks.
341
+ - A successful upstream `click` is not proof that the web app handled the event or changed state. For top-level non-Electron direct clicks on selectors, `xpath=` targets, and eligible current `@e…` refs, the wrapper may fail the tool with `details.clickDispatch` and a `Click dispatch diagnostic` line when upstream reported success but no trusted DOM event reached the resolved target. Raw `find … click` locator calls are not probed because the wrapper has no concrete element before upstream resolves the locator, and document-level probes can falsely fail frame-scoped clicks. `@e…` ref click probes are limited to current snapshot refs with accessible role `button`, `checkbox`, `menuitem`, `radio`, `switch`, or `tab`, using duplicate-name snapshot order when needed. Use the suggested `inspect-click-dispatch-miss` / `retry-click-after-dispatch-miss` next actions instead of assuming the click mutated the page; when `details.clickDispatch.scrollContainer` is present, use `scroll-target-into-view-after-dispatch-miss` first. When the task depends on a mutation, follow `inspect-after-mutation` / `pageChangeSummary` evidence with a wait, URL/text check, or fresh snapshot before trusting the result; if the target still did not change, retry with a current visible ref or stable selector and report the workflow issue instead of silently continuing. For static local fixtures where the user only needs to exercise app code, an explicit `eval --stdin` programmatic click such as `document.querySelector("#demo").click()` can be a diagnostic workaround, but treat it as an untrusted scripted activation rather than proof a real user click works, and never use it to bypass user instructions. Respect explicit user stop boundaries yourself: if the user says to stop before order/post/purchase/submit, gather evidence on that page and do not click the final action. The wrapper does not parse broad prompt text into business-intent action blocks; `details.promptGuard` is reserved for concrete artifact-before-close checks.
327
342
  - A successful `snapshot -i` can surface `Possible overlay blockers` immediately when refs already contain strong dialog/alertdialog evidence plus close/dismiss controls. If a **top-level** `click` succeeds (unified command `click`, not a `batch` step), upstream reports `data.clicked`, and the tab URL is unchanged under the same normalization as ref preflight (fragment-insensitive), the wrapper may take one extra `snapshot -i` and add `Possible overlay blockers` with `details.overlayBlockers` (`candidates`, `summary`, optional `snapshot` refresh for refs) plus session-aware `inspect-overlay-state` / bounded `try-overlay-blocker-candidate-*` next actions when that snapshot shows strong modal context (`dialog` / `alertdialog`) and close/dismiss-like controls. Page-wide words like privacy, sign in, or banner alone do not trigger this diagnostic. The unchanged-URL check uses `details.navigationSummary`, which is populated with one read-only `eval` summary when the click JSON omits **both** string `data.url` and `data.title`; if upstream already includes either, overlay diagnostics are skipped here. Also skipped when tab correction or about-blank recovery already ran on that result.
328
343
  - If `get text <selector>` reads a non-ref CSS selector with multiple matches or a hidden first match while visible matches exist, including successful `batch` steps, the wrapper may add `Selector text visibility warning`, `details.selectorTextVisibility` (plus `selectorTextVisibilityAll` for multiple batched warnings), and `inspect-visible-text-candidates` next actions; the warning names the matching `details.nextActions` id. Prefer a visible `@ref`, a scoped selector, or a targeted `eval --stdin` over hidden tab content.
329
344
  - In wrapper-tracked attached Electron sessions, broad selectors such as `body`, `html`, `main`, or `[role=application]` may read the whole app shell. The wrapper may add `Broad Electron get text selector warning`, `details.electronGetTextScopeWarning`, and `snapshot-for-electron-text-scope`; ordinary browser pages, including `file://` fixtures, do not qualify without Electron launch provenance. Prefer `snapshot -i`, a current `@ref`, or a narrower panel selector.
330
345
 
331
346
  ### Constrained browser jobs
332
347
 
333
- For short repeatable workflows, pass a top-level `job` instead of hand-writing `batch` stdin. The wrapper only supports constrained steps (`open`, `click`, `fill`, `type`, `select`, `wait`, `assertText`, `assertUrl`, `waitForDownload`, `snapshot`, and `screenshot`), compiles them to existing upstream `batch` commands, and echoes the compiled commands as `details.compiledJob` for auditability. `open` steps can include `loadState` (`domcontentloaded`, `load`, or `networkidle`) to insert a readiness wait before the next step. `click` and `fill` steps can use either CSS `selector` or semantic locator fields (`locator`, `role`/`value`, optional `name`) so a job can express flows like role/name search without brittle selectors. `type` can use `selector`, `text`, optional `delayMs` for per-character pacing, and optional `press` for a final key such as `Enter`; paced type compiles to existing `focus`, `keyboard type`, `wait`, and `press` batch rows, is capped at 200 characters per delayed step, and compacts model-visible batch text while full rows remain in `details.batchSteps`. The same compile path backs top-level `qa`, so long `qa` runs surface the same timeout evidence shape. If a long `job`, `qa`, or `batch` hits the wrapper watchdog, `details.timeoutPartialProgress` may recover per-step status (`completed`, `failed`, `pending`, or `unknown`), current page title/URL, declared artifact paths that already exist on disk, and a `retry-timeout-step` next action for the first incomplete read-only or idempotent step (see [`docs/TOOL_CONTRACT.md#details`](docs/TOOL_CONTRACT.md#details)). There is no separate catalog of reusable named browser recipes above `job`, `qa`, and raw `batch`; see [`docs/ARCHITECTURE.md#no-reusable-recipe-layer-yet`](docs/ARCHITECTURE.md#no-reusable-recipe-layer-yet) for the closed `RQ-0068` decision and when to revisit it.
348
+ For short repeatable workflows, pass a top-level `job` instead of hand-writing `batch` stdin. Keep dynamic app jobs short around navigation, click, and rerender boundaries; avoid packing a whole checkout into one job. The wrapper only supports constrained steps (`open`, `click`, `fill`, `type`, `select`, `wait`, `assertText`, `assertUrl`, `waitForDownload`, `snapshot`, and `screenshot`), compiles them to existing upstream `batch` commands, and echoes the compiled commands as `details.compiledJob` for auditability. `open` steps can include `loadState` (`domcontentloaded`, `load`, or `networkidle`) to insert a readiness wait before the next step. `click` and `fill` steps can use either CSS `selector` or semantic locator fields (`locator`, `role`/`value`, optional `name`) so a job can express flows like role/name search without brittle selectors. `type` can use `selector`, `text`, optional `delayMs` for per-character pacing, and optional `press` for a final key such as `Enter`; paced type compiles to existing `focus`, `keyboard type`, `wait`, and `press` batch rows, is capped at 200 characters per delayed step, and compacts model-visible batch text while full rows remain in `details.batchSteps`. The same compile path backs top-level `qa`, so long `qa` runs surface the same timeout evidence shape. If a long `job`, `qa`, or `batch` hits the wrapper watchdog, `details.timeoutPartialProgress` may recover per-step status (`completed`, `failed`, `pending`, or `unknown`), current page title/URL, declared artifact paths that already exist on disk, and either a `retry-timeout-step` next action for the first incomplete read-only or idempotent step or `inspect-current-page-after-timeout` when the first incomplete step may be mutating and needs state inspection before a shorter follow-up flow (see [`docs/TOOL_CONTRACT.md#details`](docs/TOOL_CONTRACT.md#details)). There is no separate catalog of reusable named browser recipes above `job`, `qa`, and raw `batch`; see [`docs/ARCHITECTURE.md#no-reusable-recipe-layer-yet`](docs/ARCHITECTURE.md#no-reusable-recipe-layer-yet) for the closed `RQ-0068` decision and when to revisit it.
334
349
 
335
- **Navigation inside `job` is explicit.** A successful `click` does not prove the next page loaded; add `assertUrl` and/or `assertText` after navigation-prone clicks (forms, checkout, tabs, submit buttons) before screenshots or steps that assume the new page.
350
+ **Navigation inside `job` is explicit.** A successful `click` does not prove the next page loaded; add `assertUrl` and/or `assertText` after navigation-prone clicks (forms, checkout, tabs, submit buttons) before screenshots or steps that assume the new page. `assertUrl` accepts exact URLs and `*` / `**` glob-style patterns; glob patterns compile to a `wait --fn` URL predicate so examples like `**/shipping` do not depend on upstream `wait --url` matcher quirks. In those patterns, `*` stays within one path segment and `**` can cross `/`; exact URLs with no `*` stay literal.
336
351
 
337
352
  ```json
338
353
  {
@@ -393,7 +408,7 @@ For an app you launched yourself with remote debugging enabled, use raw upstream
393
408
 
394
409
  `connect` success means the debug endpoint accepted the session, not that an active page is ready. If a snapshot says `No active page`, the wrapper clears prior refs for that session; choose a stable `t<N>` tab and retry a condition wait or fresh `snapshot -i` before using `@e…` refs. Close commands (`close`, `quit`, or `exit`) only close the browser/CDP session; manually launched apps, their profiles, and explicit screenshots/downloads/HARs/traces/recordings remain host-owned.
395
410
 
396
- After either path, use `qa: { "attached": true, ... }` for a current-session smoke check without opening a URL. Attached QA preserves existing network/console/page-error buffers instead of clearing them, so it can catch errors raised before the check started; visible output and `details.compiledQaPreset.checks.diagnosticsResetAtStart` identify that scope. Prefer condition waits (`wait --text`, `wait --url`, `wait --fn`, `wait --load <state>`, `wait --download`), `qa.attached`, `electron.probe` / `electron.status`, `tab list` → `tab t<N>`, fresh snapshots, or screenshots over blind sleeps. Keep fixed waits below the wrapper IPC budget: `wait 30000` is intentionally blocked, and a result like `"waited":"timeout"` only proves elapsed time.
411
+ After either path, use `qa: { "attached": true, ... }` for a current-session smoke check without opening a URL. Attached QA preserves existing network/console/page-error buffers instead of clearing them, so it can catch errors raised before the check started; visible output and `details.compiledQaPreset.checks.diagnosticsResetAtStart` identify that scope. Prefer condition waits (`wait --text`, `wait --url`, `wait --fn`, `wait --load <state>`, `wait --download`), `qa.attached`, `electron.probe` / `electron.status`, `tab list` → `tab t<N>`, fresh snapshots, or screenshots over blind sleeps. Fixed waits are a last resort: use explicit `--timeout` or top-level `timeoutMs` for legitimately slow waits, and treat a result like `"waited":"timeout"` as elapsed time only.
397
412
 
398
413
  ### Lightweight QA preset
399
414
 
@@ -434,11 +449,11 @@ For asynchronous exports, click first and then wait for the download:
434
449
  { "args": ["wait", "--download", "/tmp/report.csv"] }
435
450
  ```
436
451
 
437
- When a user gives exact artifact paths for screenshots, recordings, downloads, PDFs, traces, or HAR files, use those paths or explicitly report why the artifact was unavailable; do not silently substitute a different path in the final report. The wrapper creates missing parent directories for direct artifact paths such as `state save`, screenshots, PDFs, downloads, and `wait --download`. For simple loopback `download <selector> <path>` anchor links with HTTP(S) `href`, it can save the in-page response directly to the requested path before falling back to upstream click/download behavior; non-loopback/profile downloads stay upstream-owned. With upstream `agent-browser 0.27.1`, treat `details.savedFilePath` as upstream-reported metadata and confirm `details.artifacts[].exists` / `details.artifactVerification.verified` before relying on the requested `wait --download <path>` file being present on disk; non-file download payloads such as `data:` URLs are not verified local artifacts.
452
+ When a user gives exact artifact paths for screenshots, recordings, downloads, PDFs, traces, or HAR files, use those paths or explicitly report why the artifact was unavailable; do not silently substitute a different path in the final report. The wrapper creates missing parent directories for direct artifact paths such as `state save`, screenshots, PDFs, downloads, and `wait --download`. For simple loopback `download <selector> <path>` anchor links with HTTP(S) `href`, it can save the in-page response directly to the requested path before falling back to upstream click/download behavior; non-loopback/profile downloads stay upstream-owned. With upstream `agent-browser 0.27.2`, treat `details.savedFilePath` as upstream-reported metadata and confirm `details.artifacts[].exists` / `details.artifactVerification.verified` before relying on the requested `wait --download <path>` file being present on disk; non-file download payloads such as `data:` URLs are not verified local artifacts.
438
453
 
439
454
  For evidence-only screenshots or QA captures, branch on `details.artifactVerification` and `details.artifacts` before reporting PASS/FAIL; inline image attachments are optional when size limits allow—do not require vision review unless the user asked for visual inspection. If the latest prompt names exact required artifact paths, browser close can be blocked with `details.promptGuard` until those artifacts are saved and verified.
440
455
 
441
- Artifact cleanup is host-owned, not a browser command. Close commands (`close`, `quit`, or `exit`) shut down the browser session but do **not** delete explicit screenshots, downloads, PDFs, traces, HAR files, or recordings saved to paths you chose. When the session’s non-empty `details.artifactManifest` is in scope, a successful close command appends an `Artifact lifecycle` note and sets `details.artifactCleanup` with the same retention summary as `details.artifactRetentionSummary`, a fixed `note` about host-owned cleanup, and `explicitArtifactPaths`: up to ten distinct paths from manifest rows whose `storageScope` is `explicit-path` (this list can be empty if the recent window only holds spills or other non-explicit inventory). Remove any listed paths with normal file tools after inspection.
456
+ Artifact cleanup is host-owned, not a browser command. Close commands (`close`, `quit`, or `exit`) shut down the browser session but do **not** delete explicit screenshots, downloads, PDFs, traces, HAR files, or recordings saved to paths you chose. When the session’s non-empty `details.artifactManifest` is in scope, a successful close command appends a compact `Artifact lifecycle` note and sets `details.artifactCleanup` with the same retention summary as `details.artifactRetentionSummary`, a fixed `note` about host-owned cleanup, and `explicitArtifactPaths`: up to ten distinct paths from manifest rows whose `storageScope` is `explicit-path` (this list can be empty if the recent window only holds spills or other non-explicit inventory). Remove any listed paths with normal file tools after inspection.
442
457
 
443
458
  Start a fresh profiled browser after the implicit public-browsing session already exists:
444
459
 
@@ -551,9 +566,17 @@ The full `npm run verify` gate runs:
551
566
  - command-reference baseline checks
552
567
  - live command-reference verification against the targeted installed upstream `agent-browser`
553
568
 
554
- Step order and which subprocesses run live in [`scripts/project.mjs`](scripts/project.mjs); [`test/project-verify.test.ts`](test/project-verify.test.ts) locks default, `release`, `real-upstream`, `dogfood`, `platform-target`, `platform-smoke`, `package-pi`, and combined-docs orchestration so a gate cannot disappear accidentally. Run `npm run verify -- --help` for opt-in modes and supported passthrough flags.
569
+ Step order and which subprocesses run live in [`scripts/project.mjs`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/scripts/project.mjs); [`test/project-verify.test.ts`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/test/project-verify.test.ts) locks default, `pre-pr`, `release`, `real-upstream`, `dogfood`, `platform-target`, `platform-smoke`, `package-pi`, and combined-docs orchestration so a gate cannot disappear accidentally. Run `npm run verify -- --help` for opt-in modes and supported passthrough flags.
570
+
571
+ For larger local handoffs or PR-ready confidence before expensive release/lifecycle/platform gates, run:
572
+
573
+ ```bash
574
+ npm run verify -- pre-pr
575
+ ```
576
+
577
+ That mode composes the full default gate with `npm run verify -- package`, so package contents and forbidden archived/repo-only files are checked without launching Pi lifecycle, Crabbox, or live dogfood flows.
555
578
 
556
- The deterministic agent-efficiency benchmark’s **standalone JSON/Markdown accounting run** is not part of default `npm run verify` (only `npm run verify -- benchmark` or `npm run benchmark:agent-browser` invokes the script). The full unit suite still exercises `test/agent-browser.efficiency-benchmark.test.ts`. Use the script before and after agent-facing abstractions to prove call-count, output-size, stale-ref, artifact, failure-category coverage, success-rate, and elapsed-time effects before changing the wrapper UX:
579
+ The deterministic agent-efficiency benchmark’s **standalone JSON/Markdown accounting run** is not part of default or pre-PR `npm run verify` (only `npm run verify -- benchmark` or `npm run benchmark:agent-browser` invokes the script). The full unit suite still exercises `test/agent-browser.efficiency-benchmark.test.ts`. Use the script before and after agent-facing abstractions to prove call-count, output-size, stale-ref, artifact, failure-category coverage, success-rate, and elapsed-time effects before changing the wrapper UX:
557
580
 
558
581
  ```bash
559
582
  npm run benchmark:agent-browser
@@ -570,7 +593,7 @@ The opt-in real-upstream suite is separate because it drives a real browser inst
570
593
  npm run verify -- real-upstream
571
594
  ```
572
595
 
573
- That mode sets `PI_AGENT_BROWSER_REAL_UPSTREAM=1` and runs `test/agent-browser.real-upstream-contract.test.ts` against the real `agent-browser` on `PATH` (version must match the capability baseline). It covers inspection, skills, a broad core interaction and navigation matrix on localhost fixtures (including `batch` stdin and `pushstate`), plus `vitals`, network route/requests/HAR, diff snapshot/screenshot/url, trace/profiler, console/errors/highlight, stream enable/status/disable, `cookies set --curl`, a `react tree` missing-renderer path, and `wait --download` with the on-disk caveat documented in release notes. The harness uses a throwaway temp `HOME` and dedicated socket/screenshot directories so the run does not touch your normal browser profile paths. Browser-opening or credential-dependent families such as `inspect`, `dashboard`, `chat`, provider clouds, and OS clipboard flows stay in fake-upstream or manual validation unless a safe deterministic fixture is added. For prerequisites, isolation details, and troubleshooting, see [`docs/RELEASE.md`](docs/RELEASE.md#real-upstream-contract-validation).
596
+ That mode sets `PI_AGENT_BROWSER_REAL_UPSTREAM=1` and runs `test/agent-browser.real-upstream-contract.test.ts` against the real `agent-browser` on `PATH` (version must match the capability baseline). It covers inspection, skills, a broad core interaction and navigation matrix on localhost fixtures (including off-viewport click, frame-scoped selector/wait/click behavior, form command fixes, `batch` stdin, and `pushstate`), plus `vitals`, network route/requests/HAR, diff snapshot/screenshot/url, trace/profiler, console/errors/highlight, stream enable/status/disable, `cookies set --curl`, a `react tree` missing-renderer path, and `wait --download` with the on-disk caveat documented in release notes. The harness uses a throwaway temp `HOME` and dedicated socket/screenshot directories so the run does not touch your normal browser profile paths. Browser-opening or credential-dependent families such as `inspect`, `dashboard`, `chat`, provider clouds, and OS clipboard flows stay in fake-upstream or manual validation unless a safe deterministic fixture is added. For prerequisites, isolation details, and troubleshooting, see [`docs/RELEASE.md`](docs/RELEASE.md#real-upstream-contract-validation).
574
597
 
575
598
  A deterministic host-only live-browser wrapper smoke is available without an LLM choosing tool calls:
576
599
 
@@ -709,7 +732,7 @@ These calls return plain text and stay stateless: the extension does not inject
709
732
 
710
733
  ## More docs
711
734
 
712
- - [`AGENTS.md`](AGENTS.md) — maintainer and agent runbooks, including upstream capability baseline rebaselining and Pi smoke testing in `tmux`
735
+ - [`AGENTS.md`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/AGENTS.md) — maintainer and agent runbooks, including upstream capability baseline rebaselining and Pi smoke testing in `tmux`
713
736
  - [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md) — full native command reference and upstream capability baseline
714
737
  - [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md) — exact tool contract
715
738
  - [`docs/ELECTRON.md`](docs/ELECTRON.md) — Electron desktop-app guide
@@ -2,7 +2,7 @@
2
2
 
3
3
  Related docs:
4
4
  - [`../README.md`](../README.md)
5
- - [`../AGENTS.md`](../AGENTS.md) (maintainer workflows, including upstream capability baseline)
5
+ - [`../AGENTS.md`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/AGENTS.md) (maintainer workflows, including upstream capability baseline)
6
6
  - [`REQUIREMENTS.md`](REQUIREMENTS.md)
7
7
  - [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md)
8
8
  - [`ELECTRON.md`](ELECTRON.md)
@@ -68,7 +68,7 @@ Pi docs use `settings.json` for package/resource loading and filtering, not arbi
68
68
  - project-local: `.pi/config/pi-agent-browser-native/config.json`
69
69
  - explicit override: `PI_AGENT_BROWSER_CONFIG=/path/to/config.json`
70
70
 
71
- Config layers merge in that order: global, project, override. The shared policy module (`extensions/agent-browser/lib/config-policy.js`) owns provider descriptors, environment variable names, config keys, project-local credential safety, developer-trusted project layer inclusion, layer validation/merge, redacted status projection, and credential summaries for both runtime config loading and the package config helper. Under Pi 0.79+, globally installed or CLI-loaded extensions are developer-trusted code, so this extension reads `.pi/config/pi-agent-browser-native/config.json` by default and skips that project layer only when Pi is launched with `--no-approve`. Global config and explicit `PI_AGENT_BROWSER_CONFIG` overrides remain available either way. The config reader accepts v1 fields for `webSearch.enabled`, `webSearch.preferredProvider`, `webSearch.exaApiKey`, `webSearch.braveApiKey`, and conservative browser defaults such as `browser.defaultProfile` and `browser.executablePath`. Web-search key fields follow Pi model/provider-style value resolution for trusted global/override config: literal values, `$ENV_VAR` / `${ENV_VAR}` interpolation, escapes (`$$`, `$!`), and leading `!command` resolved at request time. Project-local plaintext, interpolation-literal, malformed, and command-backed web-search keys are rejected because project config can be copied, committed, or supplied by a repository; project config may use only the matching provider env ref (`$EXA_API_KEY` / `${EXA_API_KEY}` for Exa, `$BRAVE_API_KEY` / `${BRAVE_API_KEY}` for Brave), so a repository cannot redirect search credentials to arbitrary host env vars. `EXA_API_KEY` and `BRAVE_API_KEY` remain environment fallbacks when no config credential source exists for that provider. Browser default values keep their source scope; profile/executable prompt guidance is emitted only from trusted global or explicit override config, not from project-local config that could steer host profile or executable choices.
71
+ Config layers merge in that order: global, project, override. The shared policy module (`extensions/agent-browser/lib/config-policy.js`) owns provider descriptors, environment variable names, config keys, project-local credential safety, developer-trusted project layer inclusion, layer validation/merge, redacted status projection, and credential summaries for both runtime config loading and the package config helper. Under Pi 0.79+, globally installed or CLI-loaded extensions are developer-trusted code, so this extension reads `.pi/config/pi-agent-browser-native/config.json` by default and skips that project layer when Pi reports the project is untrusted or when launched with `--no-approve`. Global config and explicit `PI_AGENT_BROWSER_CONFIG` overrides remain available either way. The config reader accepts v1 fields for `webSearch.enabled`, `webSearch.preferredProvider`, `webSearch.exaApiKey`, `webSearch.braveApiKey`, and conservative browser defaults such as `browser.defaultProfile` and `browser.executablePath`. Web-search key fields follow Pi model/provider-style value resolution for trusted global/override config: literal values, `$ENV_VAR` / `${ENV_VAR}` interpolation, escapes (`$$`, `$!`), and leading `!command` resolved at request time. Project-local plaintext, interpolation-literal, malformed, and command-backed web-search keys are rejected because project config can be copied, committed, or supplied by a repository; project config may use only the matching provider env ref (`$EXA_API_KEY` / `${EXA_API_KEY}` for Exa, `$BRAVE_API_KEY` / `${BRAVE_API_KEY}` for Brave), so a repository cannot redirect search credentials to arbitrary host env vars. `EXA_API_KEY` and `BRAVE_API_KEY` remain environment fallbacks when no config credential source exists for that provider. Browser default values keep their source scope; profile/executable prompt guidance is emitted only from trusted global or explicit override config, not from project-local config that could steer host profile or executable choices.
72
72
 
73
73
  `agent_browser_web_search` registration is conditional. `webSearch.enabled: false` disables registration even when environment keys are present, but it is evaluated after the available config layers merge. A global disable is the normal user default and can still be overridden by project config or `PI_AGENT_BROWSER_CONFIG`; a project disable applies to one repo; an explicit `PI_AGENT_BROWSER_CONFIG` file with `webSearch.enabled: false` is the highest-priority hard disable for that run. Literal and env-backed sources must resolve at startup; command-backed sources are considered configured without running the command until tool execution, so secret managers do not slow startup or prompt unexpectedly. The tool resolves the selected key lazily, chooses Exa or Brave from available credentials (preferring Exa by default unless `webSearch.preferredProvider` says otherwise), then follows one provider-agnostic execution path through provider adapters for request building, HTTP JSON fetch, response normalization, and provider-specific detail fields. It calls Exa `/search` with highlights or Brave Search and returns compact result details without exposing keys.
74
74
 
@@ -85,7 +85,7 @@ Tier B guidance lives in `SHARED_BROWSER_PLAYBOOK_GUIDELINES`, generated README/
85
85
  Do **not** add reusable browser recipes as a first-class runtime surface yet.
86
86
 
87
87
  Current evidence does not justify another source of truth for workflows:
88
- - the deterministic efficiency benchmark in [`scripts/agent-browser-efficiency-benchmark.mjs`](../scripts/agent-browser-efficiency-benchmark.mjs) models one native `job` scenario (`job-open-assert-screenshot`), one `qa` preset (`qa-open-diagnostics`), one `sourceLookup` (`source-lookup-visible-element`), one `networkSourceLookup` (`network-source-lookup-failed-request`), plus deterministic `electron` lifecycle/probe scenarios (`electron-lifecycle`, `electron-probe`) rather than repeated named job patterns that agents keep re-specifying
88
+ - the deterministic efficiency benchmark in [`scripts/agent-browser-efficiency-benchmark.mjs`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/scripts/agent-browser-efficiency-benchmark.mjs) models one native `job` scenario (`job-open-assert-screenshot`), one `qa` preset (`qa-open-diagnostics`), one `sourceLookup` (`source-lookup-visible-element`), one `networkSourceLookup` (`network-source-lookup-failed-request`), plus deterministic `electron` lifecycle/probe scenarios (`electron-lifecycle`, `electron-probe`) rather than repeated named job patterns that agents keep re-specifying
89
89
  - repo-local dogfood evidence does not show repeated project-specific job recipes that need versioning or ownership
90
90
  - `qa` already covers the only repeated smoke-test shape with a stable top-level preset
91
91
  - docs and prompt guidance can carry examples without adding recipe state, migration rules, or another schema
@@ -109,7 +109,7 @@ Why:
109
109
  - keeps reload and exact-session relaunch validation tied to Pi's configured-source lifecycle instead of an isolated quick-test path, while `session_tree` state changes stay covered by focused extension harness tests
110
110
  - keeps the published tarball focused on the package manifest, extension code, canonical docs, and license
111
111
 
112
- The published package should exclude agent-only and superseded repo materials such as `AGENTS.md`, `docs/v1-tool-contract.md`, `docs/native-integration-design.md`, and other internal planning notes.
112
+ The published package should exclude agent-only and superseded repo materials such as `AGENTS.md`, archived drafts under `docs/archive/`, and other internal planning notes.
113
113
 
114
114
  ## Session model
115
115
 
@@ -141,7 +141,7 @@ Practical policy:
141
141
  - close the active extension-managed session when the originating `pi` process quits, while leaving explicit caller-provided sessions alone
142
142
  - set an idle timeout on extension-managed sessions as a backstop for abnormal exits or cleanup failures
143
143
  - clean up process-private temp spill artifacts on shutdown, but keep persisted-session snapshot spill files in a private session-scoped artifact directory with a bounded per-session budget so `details.fullOutputPath` stays usable after reload/resume without unbounded growth
144
- - keep explicit screenshots, downloads, PDFs, traces, HAR captures, and recordings written to caller-chosen paths on disk after a successful upstream close command (`close`, `quit`, or `exit`); before artifact-producing commands run, create missing parent directories for requested host paths, and for simple loopback HTML anchor downloads with resolvable HTTP(S) hrefs the wrapper may save directly to the requested path before upstream fallback. When the bounded `details.artifactManifest` has entries, successful close commands also surface `details.artifactCleanup` and an `Artifact lifecycle` note (including up to ten distinct `explicit-path` manifest paths when present) so operators remove files with normal host tools—the native tool does not delete arbitrary user paths (`extensions/agent-browser/lib/orchestration/browser-run/diagnostics.ts`, `getArtifactCleanupGuidance`); contract in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details), checklist `RQ-0079` in [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md)
144
+ - keep explicit screenshots, downloads, PDFs, traces, HAR captures, and recordings written to caller-chosen paths on disk after a successful upstream close command (`close`, `quit`, or `exit`); before artifact-producing commands run, create missing parent directories for requested host paths, and for simple loopback HTML anchor downloads with resolvable HTTP(S) hrefs the wrapper may save directly to the requested path before upstream fallback. When the bounded `details.artifactManifest` has entries, successful close commands also surface `details.artifactCleanup` and a compact `Artifact lifecycle` note pointing to structured explicit paths so operators remove files with normal host tools—the native tool does not delete arbitrary user paths (`extensions/agent-browser/lib/orchestration/browser-run/diagnostics.ts`, `getArtifactCleanupGuidance`); contract in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details), checklist `RQ-0079` in [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md)
145
145
  - reconstruct the current branch-visible extension-managed session, page-scoped refs, artifact manifest, and Electron launch records from the active transcript branch on `session_start` and `session_tree` so later default calls keep following the active managed browser after resume/reload or branch switching; restore also honors successful explicit `--session <wrapper-owned> close` rows and `electron.cleanup` managed-session steps so closed wrapper-owned sessions are not resurrected
146
146
  - keep process-owned cleanup registries for extension-managed sessions and wrapper-launched Electron records separate from the current branch-visible view; `session_tree` restore and wrapper-owned browser commands are serialized with managed-session work, while independent caller-owned explicit-session commands keep their parallel tab-target behavior but use a branch-state generation guard so stale completions cannot overwrite newer branch-visible managed/artifact state after a branch switch; branch switches still must not drop resources the current Pi process owns and must keep fresh-session allocation monotonic
147
147
  - when a successful close targets the current extension-managed session, including an explicit `--session <current> close` or an `electron.cleanup` managed-session step, clear page/ref state, mark that session inactive, untrack cleanup ownership, and rotate the next default auto call to a fresh wrapper-generated session name rather than reusing the closed name
@@ -153,11 +153,11 @@ Practical policy:
153
153
  - once the wrapper observes tab-drift risk for a session (profile restore correction, overlapping stale opens, or restored session state), later active-tab commands may synthesize a tiny upstream `batch` that re-selects that tab and then runs the requested command in the same upstream invocation; routine same-session commands avoid `tab list` preflights to reduce probes that can perturb upstream click behavior
154
154
  - for sessions with observed tab-drift risk, after a successful command on a known tab target, the wrapper may best-effort restore that same target again if restored/background tabs steal focus after the command returns; routine same-session commands skip this post-command `tab list` probe
155
155
  - keep a per-session `refSnapshot` aligned with the last successful `snapshot` (including refs merged from a successful `batch` by taking the last successful `snapshot` step in batch result order): restore it from persisted tool `details` when reloading, resuming, or moving to a different Pi session-tree branch, store bounded ref role/name metadata from the same snapshot for wrapper-side current-ref diagnostics, drop it on successful close commands (`close`, `quit`, or `exit`), and refuse mutation-prone `@e…` argv before spawn when the active tab URL no longer matches the snapshot URL, when a ref id was never in that snapshot, or when `batch` stdin would reuse `@e…` on a guarded step after an earlier invalidating step without a later `snapshot` step in the same stdin array. Same-snapshot `fill @e…` rows are guarded but do not themselves set that invalidation latch, so ordinary form fills can precede a click/submit row in one batch—see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) for the agent-visible contract and failure text; typed per-session tab/ref/pinning state lives in `extensions/agent-browser/lib/session-page-state.ts` and is updated from `extensions/agent-browser/index.ts` after each tool result
156
- - for top-level non-Electron `click` commands, install a bounded in-page event probe before upstream runs; if upstream reports success but no trusted pointer/mouse/click event reached the target, fail the tool and report `details.clickDispatch` with explicit retry/inspect next actions (the wrapper does not replay clicks in-page). The probe is intentionally skipped for `batch`/`job`/`qa` click steps. For `@e…` targets it uses the stored `refSnapshot.refs` metadata above instead of taking a fresh pre-click snapshot that could recycle upstream refs
156
+ - for top-level non-Electron direct `click` commands, install a bounded in-page target-specific event probe before upstream runs; if upstream reports success but no trusted pointer/mouse/click event reached the resolved target, fail the tool and report `details.clickDispatch` with explicit retry/inspect next actions (the wrapper does not replay clicks in-page). The probe is intentionally skipped for unresolved `find … click` locators and for `batch`/`job`/`qa` click steps. For `@e…` targets it only covers current refs whose latest stored `refSnapshot.refs` role is `button`, `checkbox`, `menuitem`, `radio`, `switch`, or `tab`; it uses that role/name metadata, including snapshot-order `duplicateIndex` for duplicate-name refs, instead of taking a fresh pre-click snapshot that could recycle upstream refs
157
157
  - derive narrow prompt guards only for concrete evidence invariants: exact required screenshot paths block browser close until the artifact manifest verifies those paths. The wrapper intentionally does not infer broad business/user intent from prompt text such as order/payment/post boundaries; agents must follow those instructions themselves. The artifact guard is bounded preflight policy (`details.promptGuard`, `failureCategory: "policy-blocked"`), not a reusable browser recipe layer
158
158
  - after successful `get text` on a non-ref CSS selector, optionally issue one read-only `eval --stdin` probe per qualifying selector when multiple DOM matches or a hidden first match with visible peers could misread tabbed or off-screen content; merge `details.selectorTextVisibility` / `selectorTextVisibilityAll`, visible warning lines, and `inspect-visible-text-candidates*` next actions as documented in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) and `RQ-0074` in [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md)
159
159
  - for local Unix launches, set a short private socket directory so extension-generated session names do not fail on the upstream Unix socket-path length limit
160
- - keep wrapper-spawned upstream CLI calls inside the upstream IPC budget by clamping `AGENT_BROWSER_DEFAULT_TIMEOUT` to 25 seconds and stopping a stuck child process before the upstream 30-second read-timeout retry loop begins; dialog commands, likely dialog-trigger clicks/taps/finds, and `eval --stdin` snippets that look like alert/confirm/prompt/dialog triggers use shorter wrapper subprocess budgets so blocking JavaScript prompts surface recovery actions before the full default watchdog
160
+ - keep wrapper-spawned upstream CLI calls bounded by clamping `AGENT_BROWSER_DEFAULT_TIMEOUT` to the upstream documented 25-second default while deriving a longer subprocess watchdog for explicit long `wait <ms>` / `wait --timeout <ms>` calls; dialog commands, likely dialog-trigger clicks/taps/finds, and `eval --stdin` snippets that look like alert/confirm/prompt/dialog triggers use shorter wrapper subprocess budgets so blocking JavaScript prompts surface recovery actions before the full default watchdog
161
161
 
162
162
  This is primarily about ownership clarity and avoiding surprise, not adding a heavy safety wrapper. If the extension invented the session, the extension should own its lifecycle without breaking reload, resume, or branch-tree semantics. If the caller explicitly chose the upstream session model, the extension should stay out of the way.
163
163
 
@@ -177,7 +177,7 @@ That failure should include a structured recovery hint pointing to `sessionMode:
177
177
  Implementation detail lives in `extensions/agent-browser/lib/launch-scoped-flags.ts` (canonical flag metadata shared with playbook/docs assertions), `extensions/agent-browser/lib/argv-descriptor.ts` and `extensions/agent-browser/lib/argv-grammar.ts` (command discovery, `VALUE_FLAGS`, `parseArgvDescriptor`) plus `extensions/agent-browser/lib/runtime.ts` (`getStartupScopedFlags`, `buildExecutionPlan`):
178
178
 
179
179
  - **Command discovery:** Leading argv is scanned with a value-taking allowlist so known global flags and documented command flags consume their values before the upstream command word is identified. Missing-value prevalidation is intentionally limited to upstream global value flags; command-scoped flags and literal text are left to upstream parsing so values like `fill #field --password` are not rejected by wrapper heuristics before the CLI sees them. When upstream adds new global flags that take values ahead of the command, extend both the command-discovery and prevalidation allowlists; when it adds command-specific flags, extend only command discovery/redaction as needed. A smaller set of global boolean flags may be followed by an optional `true`/`false` literal; when present, that literal is consumed as the flag value before command discovery continues.
180
- - **`--state` disambiguation:** Persisted browser `--state` before the command participates in launch-scoped validation and tab-correction hints. The same flag spelling after a `wait` command is excluded from startup-scoped detection so upstream help examples such as `wait @ref --state hidden` do not spuriously require `sessionMode: "fresh"` while an implicit session is active. As of upstream `agent-browser 0.27.1`, the parser does not implement those `wait --state` examples as distinct wait modes, so agent-facing docs recommend `wait --fn` predicates for disappearance checks instead.
180
+ - **`--state` disambiguation:** Persisted browser `--state` before the command participates in launch-scoped validation and tab-correction hints. The same flag spelling after a `wait` command is excluded from startup-scoped detection so upstream help examples such as `wait @ref --state hidden` do not spuriously require `sessionMode: "fresh"` while an implicit session is active. As of upstream `agent-browser 0.27.2`, the parser still does not implement those `wait --state` examples as distinct wait modes, so agent-facing docs recommend `wait --fn` predicates for disappearance checks instead.
181
181
  - **`--auto-connect`:** Treated as launch-scoped only when enabled (`--auto-connect` bare or `true`). `--auto-connect false` is ignored for startup-scoped blocking so disabled attach hints do not force a fresh launch.
182
182
 
183
183
  **Sessionless inspection and local commands:** Plain-text global help and version probes (`--help`, `-h`, `--version`, `-V`) must never allocate or bind the extension-managed session. The same session-ownership rule applies to read-only upstream `skills list`, `skills get …`, and `skills path …`, local auth profile management (`auth save/list/show/delete/remove`), plus local/setup surfaces such as `profiles`, `dashboard start/stop`, `device list`, `doctor`, `install`, `upgrade`, `session list`, and targeted/all local saved-state maintenance (`state list/show`, `state clear --all`, `state clear -a`, `state clear <session-name>`, `state clean --older-than <days>`, `state rename`). Non-plain-text sessionless commands still run with `--json` for machine-readable output, but the planner does not prepend the implicit managed `--session`, so an agent can inspect local capabilities or start/stop the standalone dashboard without consuming the implicit session slot before a real `open`. Browser-backed, context-dependent, or incomplete commands such as root `session`, untargeted `state clear`, bare `state clean`, `auth login`, `state save`, and `state load` keep normal managed-session injection. Command-shape allowlisting lives in `extensions/agent-browser/lib/command-policy.ts` (`needsManagedSession`), while `extensions/agent-browser/lib/runtime.ts` (`isPlainTextInspectionArgs`, `buildExecutionPlan`) applies that decision to execution planning.
@@ -206,7 +206,7 @@ This keeps the product centered on native tool usage instead of auxiliary skill
206
206
  - inline screenshots/images for the plain `screenshot` command; other image-like saves (for example `diff screenshot`) still appear in `details.artifacts` and summaries but are not auto-inlined as Pi image attachments (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details))
207
207
  - lightweight session convenience
208
208
  - docs, including a repo-readable command reference that mirrors the blocked direct-binary help path closely enough for normal agent work
209
- - a deterministic **agent efficiency benchmark** (`scripts/agent-browser-efficiency-benchmark.mjs`) used to quantify representative agent-facing workflows without invoking upstream; maintainer commands and constraints are in [`AGENTS.md`](../AGENTS.md) under “Agent browser efficiency benchmark”
209
+ - a deterministic **agent efficiency benchmark** (`scripts/agent-browser-efficiency-benchmark.mjs`) used to quantify representative agent-facing workflows without invoking upstream; maintainer commands and constraints are in [`AGENTS.md`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/AGENTS.md) under “Agent browser efficiency benchmark”
210
210
 
211
211
  ### Upstream `agent-browser` owns
212
212
 
@@ -226,7 +226,7 @@ The extension does not ship `agent-browser`, but it does ship maintainer-owned d
226
226
 
227
227
  3. **Live help verification** is `scripts/verify-command-reference.mjs`, invoked via `npm run verify -- command-reference` (and included in the default `npm run verify` gate). It runs the baseline’s help commands against `agent-browser` on `PATH` and fails when the installed upstream surface does not match the declared target version or expected tokens.
228
228
 
229
- This mirrors the playbook contract pattern described in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md): canonical TypeScript source and Markdown fragments stay paired through `npm run docs` / `npm run verify`, with deeper step-by-step notes in [`AGENTS.md`](../AGENTS.md), release checklist items in [`RELEASE.md`](RELEASE.md), and the baseline inventory-to-gates matrix in [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md).
229
+ This mirrors the playbook contract pattern described in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md): canonical TypeScript source and Markdown fragments stay paired through `npm run docs` / `npm run verify`, with deeper step-by-step notes in [`AGENTS.md`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/AGENTS.md), release checklist items in [`RELEASE.md`](RELEASE.md), and the baseline inventory-to-gates matrix in [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md).
230
230
 
231
231
  ## Not the right design
232
232