pi-agent-browser-native 0.2.42 → 0.2.44

Files changed (39)
  1. package/CHANGELOG.md +32 -0
  2. package/README.md +14 -9
  3. package/docs/COMMAND_REFERENCE.md +9 -10
  4. package/docs/RELEASE.md +10 -4
  5. package/docs/SUPPORT_MATRIX.md +7 -6
  6. package/docs/TOOL_CONTRACT.md +27 -24
  7. package/docs/platform-smoke.md +13 -8
  8. package/extensions/agent-browser/index.ts +71 -2
  9. package/extensions/agent-browser/lib/input-modes/params.ts +1 -1
  10. package/extensions/agent-browser/lib/input-modes/types.ts +1 -1
  11. package/extensions/agent-browser/lib/navigation-policy.ts +95 -0
  12. package/extensions/agent-browser/lib/orchestration/browser-run/diagnostics.ts +2 -7
  13. package/extensions/agent-browser/lib/orchestration/browser-run/final-result.ts +1 -0
  14. package/extensions/agent-browser/lib/orchestration/browser-run/prepare.ts +2 -2
  15. package/extensions/agent-browser/lib/orchestration/browser-run/process-output.ts +103 -12
  16. package/extensions/agent-browser/lib/orchestration/browser-run/session-state.ts +20 -3
  17. package/extensions/agent-browser/lib/orchestration/browser-run/types.ts +6 -1
  18. package/extensions/agent-browser/lib/playbook.ts +4 -4
  19. package/extensions/agent-browser/lib/results/action-recommendations.ts +15 -0
  20. package/extensions/agent-browser/lib/results/contracts.ts +17 -0
  21. package/extensions/agent-browser/lib/results/network-routes.ts +80 -0
  22. package/extensions/agent-browser/lib/results/network.ts +10 -2
  23. package/extensions/agent-browser/lib/results/presentation/artifacts.ts +14 -0
  24. package/extensions/agent-browser/lib/results/presentation/batch.ts +36 -13
  25. package/extensions/agent-browser/lib/results/presentation/diagnostics.ts +154 -16
  26. package/extensions/agent-browser/lib/results/presentation/errors.ts +62 -2
  27. package/extensions/agent-browser/lib/results/presentation/semantic-action.ts +2 -4
  28. package/extensions/agent-browser/lib/results/presentation.ts +31 -1
  29. package/extensions/agent-browser/lib/results/selector-recovery.ts +11 -3
  30. package/extensions/agent-browser/lib/results/shared.ts +1 -0
  31. package/extensions/agent-browser/lib/results.ts +3 -0
  32. package/extensions/agent-browser/lib/runtime.ts +6 -0
  33. package/package.json +4 -4
  34. package/platform-smoke.config.mjs +10 -1
  35. package/scripts/doctor.mjs +70 -1
  36. package/scripts/platform-smoke/crabbox-runner.mjs +57 -29
  37. package/scripts/platform-smoke/doctor.mjs +22 -9
  38. package/scripts/platform-smoke/targets.mjs +58 -21
  39. package/scripts/platform-smoke.mjs +1 -0
package/CHANGELOG.md CHANGED
@@ -4,6 +4,38 @@
 
  No changes yet.
 
+ ## 0.2.44 - 2026-06-04
+
+ ### Changed
+
+ - Updated the local Pi development baseline to `@earendil-works/*` `0.78.1` after reviewing the installed Pi 0.78.1 changelog, docs, examples, and extension source. The audit found no runtime migration needed for `ctx.mode` or command-only `ctx.getSystemPromptOptions()`, and kept the public peer dependency ranges non-pinning.
+ - Extended the read-only package doctor with a warning-only `pi --version` check so release validation can catch a Pi CLI older than the audited 0.78.1 floor without making Pi 0.78.1 a hard runtime requirement.
+
+ ### Validation
+
+ - Ran checkout-based interactive `tmux` Pi dogfood with `pi --no-extensions --no-skills -e .` on Pi 0.78.1: `agent_browser` opened and snapshotted `https://example.com`, ran a QA preset against `https://react.dev` expecting `React`, saved and verified a screenshot, reported no console/network/page errors, closed the browser session, and cleaned the temp artifact directory.
+
+ ## 0.2.43 - 2026-06-04
+
+ ### Added
+
+ - Added fail-closed artifact verification for explicit saved paths with an `artifact-missing` failure category when upstream reports success but the requested artifact is absent.
+ - Added wrapper-side allowed-domain tracking for browser sessions so post-navigation escapes from configured allowed domains fail loudly.
+ - Added route-aware network diagnostics for pending or CORS-likely route mocks, including structured follow-up actions for request inspection and HAR capture plus prose guidance for same-origin/CORS-correct fixture retries.
+
+ ### Changed
+
+ - Made same-snapshot form batching less conservative: `check`/`uncheck` on checkbox or radio refs and `select` on combobox refs remain guarded against stale refs but no longer force a fresh snapshot before later same-batch form work; direct `click @ref` remains conservative.
+ - Removed unsupported `semanticAction.uncheck` from the public shorthand contract while keeping raw upstream `uncheck <selector-or-ref>` pass-through available.
+ - Improved `semanticAction.fill` recovery by resolving exact current editable role/name matches, including comboboxes, through a current visible ref before falling back to upstream locator behavior.
+ - Made benign local QA storage values visible for explicitly safe primitive keys while continuing to redact secret-, identity-, session-, email-, token-, and URL-shaped values.
+ - Treated the exact upstream “streaming already enabled” response as an idempotent success with cleanup/status next actions, without masking broader stream failures.
+
+ ### Fixed
+
+ - Stopped emitting Electron app-shell broad-selector warnings on ordinary browser pages such as `file://` fixtures; broad Electron `get text` warnings now require wrapper-tracked Electron launch provenance.
+ - Clarified clipboard permission denials without leaking denied clipboard write payloads across prose, JSON, batch, parse-failure, and detail surfaces.
+
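The fail-closed artifact check listed under 0.2.43 can be pictured with a minimal TypeScript sketch. The `ArtifactCheck` type, the `verifySavedArtifact` function, and the injected `fileExists` callback are hypothetical names used only for illustration; the `artifact-missing` category is the one named in the changelog, and the wrapper's actual implementation may differ.

```typescript
// Hypothetical result shape; not the package's real types.
type ArtifactCheck =
  | { ok: true; path: string }
  | { ok: false; failureCategory: "artifact-missing"; path: string };

// Returns undefined when upstream already failed, because that case is
// assumed to be classified elsewhere. Otherwise the reported success is
// only trusted if the requested file is actually present.
function verifySavedArtifact(
  upstreamSucceeded: boolean,
  requestedPath: string,
  fileExists: (path: string) => boolean,
): ArtifactCheck | undefined {
  if (!upstreamSucceeded) return undefined;
  return fileExists(requestedPath)
    ? { ok: true, path: requestedPath }
    : { ok: false, failureCategory: "artifact-missing", path: requestedPath };
}
```

In a Node.js setting the callback could be `existsSync` from `node:fs`; injecting it keeps the sketch free of filesystem side effects.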
  ## 0.2.42 - 2026-06-03
 
  ### Fixed
package/README.md CHANGED
@@ -61,7 +61,7 @@ The result is optimized for agent work:
  | Screenshots/downloads get lost in text | Normalizes artifact paths and reports existence, size, cwd, session, and repair status | [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md#download-screenshot-and-pdf-files) |
  | Profile restores and tab drift confuse agents | Tracks managed sessions, re-selects target tabs after observed drift, rehydrates branch-backed session state on Pi session-tree changes, and pins later commands only for sessions with drift/restored-session risk | generated tab-recovery notes below; `test/agent-browser.extension-tab-recovery.test.ts` (drift and about:blank recovery), `test/agent-browser.resume-state.test.ts` (persisted session / resume planning), `test/agent-browser.extension-ref-guards.test.ts` (session_tree rehydration) |
  | Auth/profile workflows can leak secrets | Supports `auth save --password-stdin` and redacts sensitive args, URLs, stdout/stderr, details, and parse-failure spills | `test/agent-browser.extension-security-redaction.test.ts` |
- | Stateful cookies/storage/auth output bloats or leaks context | Presentation layer redacts `details.data` for cookies and storage (field-aware values) and recursively scrubs other structured upstream JSON (network, diff, trace/profiler, stream, dashboard, chat, auth, dialog, frame, state, and similar) using sensitive key names plus string heuristics; masks sensitive argv flags and positionals; scrubs secrets from failed batch step errors; and exposes a compact redacted `batch` matrix on top-level `details.data` | `extensions/agent-browser/lib/results/presentation.ts`, `extensions/agent-browser/lib/results/presentation/diagnostics.ts`, `extensions/agent-browser/lib/runtime.ts`, `test/agent-browser.presentation-diagnostics.test.ts` |
+ | Stateful cookies/storage/auth output bloats or leaks context | Presentation layer redacts `details.data` for cookies and credential-like storage values while keeping low-risk local QA values such as `theme: dark` readable; recursively scrubs other structured upstream JSON (network, diff, trace/profiler, stream, dashboard, chat, auth, dialog, frame, state, and similar) using sensitive key names plus string heuristics; masks sensitive argv flags and positionals; scrubs secrets from failed batch step errors; and exposes a compact redacted `batch` matrix on top-level `details.data` | `extensions/agent-browser/lib/results/presentation.ts`, `extensions/agent-browser/lib/results/presentation/diagnostics.ts`, `extensions/agent-browser/lib/runtime.ts`, `test/agent-browser.presentation-diagnostics.test.ts` |
  | Stale `@eN` refs fail mysteriously | Records per-session `details.refSnapshot`, rejects mismatched URLs / unknown refs / unsafe `batch` stdin ordering before spawn, adds recovery guidance to rerun `snapshot -i` or use stable `find` locators | `extensions/agent-browser/index.ts`, `extensions/agent-browser/lib/session-page-state.ts`, `test/agent-browser.session-page-state.test.ts`, `test/agent-browser.results.test.ts`, `test/agent-browser.extension-ref-guards.test.ts`, `test/agent-browser.extension-semantic-recovery.test.ts` |
  | Agents need stable success/failure buckets | Exposes bounded `resultCategory`, `successCategory`, and `failureCategory` on tool `details` for branching without parsing prose; a `tool_result` hook also aligns real Pi `isError` semantics, naming `Pi tool isError: true` in prose output while preserving parseable caller-requested `--json` output | [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), `extensions/agent-browser/lib/results/categories.ts`, `extensions/agent-browser/lib/results/shared.ts` (re-export barrel), `extensions/agent-browser/index.ts`, `test/agent-browser.results.test.ts`, `test/agent-browser.extension-validation.test.ts`, `test/agent-browser.pi-pipeline.test.ts` |
  | Clicks can report success without the page receiving the event | Top-level non-Electron `click` on exact CSS/XPath selectors, and on `@e…` refs when the latest snapshot has role/name metadata the wrapper can resolve to a unique visible element, installs a bounded DOM-event probe; if upstream reports success but no trusted event reaches the target, the wrapper fails the tool, exposes `details.clickDispatch`, and suggests explicit retry/inspect next actions (no in-page replay). Other click results still expose `details.pageChangeSummary`, unchanged-URL clicks can surface evidence-backed `details.overlayBlockers` candidates, and explicit user stop boundaries can best-effort block click-like actions plus `press`/`key` Enter/Return submits via `details.promptGuard`. | [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), `extensions/agent-browser/lib/orchestration/browser-run/click-dispatch.ts`, `extensions/agent-browser/lib/orchestration/browser-run/browser-action-model.ts`, `extensions/agent-browser/lib/orchestration/browser-run/prompt-guards.ts`, `extensions/agent-browser/lib/results/presentation/navigation.ts`, `test/agent-browser.presentation.test.ts`, `test/agent-browser.extension-click-dispatch.test.ts` |
@@ -74,6 +74,8 @@ The result is optimized for agent work:
 
  ## Fastest way to try it
 
+ Use Pi 0.78.1 or newer when possible. This package does not hard-pin Pi 0.78.1 as a runtime requirement, but the current release is audited and validated against that extension/package baseline.
+
  Install upstream `agent-browser` first and make sure it is on `PATH`:
 
  - https://agent-browser.dev/
@@ -142,6 +144,7 @@ The doctor checks:
 
  - upstream `agent-browser` exists on `PATH`
  - the installed upstream version matches this wrapper's command-reference baseline
+ - `pi --version` meets the recommended Pi floor for this release, as a warning rather than a hard failure
  - Pi settings do not point at multiple active `pi-agent-browser-native` sources
 
  It does **not** edit Pi settings and does **not** run upstream `agent-browser doctor --fix`.
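The warning-only Pi floor check described in the list above could look roughly like the following TypeScript sketch. All names here (`Version`, `DoctorFinding`, `parseVersion`, `checkPiVersion`) are hypothetical, the assumption that `pi --version` prints a `major.minor.patch` string somewhere in its output is not confirmed by this diff, and treating unparseable output as a warning is likewise an assumption.

```typescript
interface Version { major: number; minor: number; patch: number }
interface DoctorFinding { level: "ok" | "warn"; message: string }

// Floor taken from the README text in this release: Pi 0.78.1.
const RECOMMENDED_PI_FLOOR: Version = { major: 0, minor: 78, patch: 1 };

// Extracts the first major.minor.patch triple from arbitrary CLI output.
function parseVersion(text: string): Version | undefined {
  const match = /(\d+)\.(\d+)\.(\d+)/.exec(text);
  if (!match) return undefined;
  return { major: Number(match[1]), minor: Number(match[2]), patch: Number(match[3]) };
}

function isOlder(a: Version, b: Version): boolean {
  if (a.major !== b.major) return a.major < b.major;
  if (a.minor !== b.minor) return a.minor < b.minor;
  return a.patch < b.patch;
}

// Never returns a hard failure: an old or unreadable version only warns.
function checkPiVersion(versionOutput: string): DoctorFinding {
  const found = parseVersion(versionOutput);
  if (!found) {
    return { level: "warn", message: "Could not read a version from `pi --version`." };
  }
  const label = `${found.major}.${found.minor}.${found.patch}`;
  return isOlder(found, RECOMMENDED_PI_FLOOR)
    ? { level: "warn", message: `Pi ${label} is older than the recommended 0.78.1 floor.` }
    : { level: "ok", message: `Pi ${label} meets the recommended floor.` };
}
```

Keeping the result at `warn` rather than introducing an error level mirrors the stated intent that Pi 0.78.1 is a recommendation and not a hard runtime requirement.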
@@ -263,7 +266,7 @@ Run a multi-step flow in one tool call:
  { "args": ["batch"], "stdin": "[[\"open\",\"https://example.com\"],[\"snapshot\",\"-i\"]]" }
  ```
 
- If the same `batch` stdin later uses `@e…` on interaction commands after a step that can navigate or mutate the page (`open`, `click`, `reload`, and similar), insert a `snapshot` step whose first argv token is `snapshot` (for example `["snapshot","-i"]`) between those phases. Multiple same-snapshot `fill @e…` steps may be batched before a click/submit step; dynamic or autosubmit forms should still use stable locators or split with a fresh snapshot. The wrapper rejects unsafe ordering with `failureCategory: "stale-ref"` before upstream runs; full rules are under `refSnapshot` in [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details).
+ If the same `batch` stdin later uses `@e…` on interaction commands after a step that can navigate or mutate the page (`open`, non-form `click`, `reload`, and similar), insert a `snapshot` step whose first argv token is `snapshot` (for example `["snapshot","-i"]`) between those phases. Multiple same-snapshot `fill @e…` steps and native form-control steps (`check`/`uncheck` on checkbox or radio refs and `select` on combobox refs) may be batched before a click/submit step; checkbox/radio `click`s remain conservative unless followed by a fresh snapshot. Dynamic or autosubmit forms should still use stable locators or split with a fresh snapshot. The wrapper rejects unsafe ordering with `failureCategory: "stale-ref"` before upstream runs; full rules are under `refSnapshot` in [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details).
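The ordering rule in the paragraph above can be sketched in TypeScript as a simplified pre-spawn check. The `findStaleRefStep` function and the `INVALIDATING` set are hypothetical, the sketch treats every `click` as invalidating, and it assumes refs are fresh when the batch starts; the authoritative rules are the ones under `refSnapshot` in `docs/TOOL_CONTRACT.md`.

```typescript
type BatchStep = string[];

// Assumed subset of commands that can navigate or mutate the page.
const INVALIDATING = new Set(["open", "click", "reload"]);

// Returns the index of the first step that uses an @e ref after an
// invalidating step without an intervening snapshot, or -1 if none does.
function findStaleRefStep(steps: BatchStep[]): number {
  let refsMayBeStale = false;
  return steps.findIndex((step) => {
    const [command, ...rest] = step;
    if (command === undefined) return false;
    if (command === "snapshot") {
      refsMayBeStale = false; // a fresh snapshot makes refs usable again
      return false;
    }
    if (refsMayBeStale && rest.some((token) => token.startsWith("@e"))) return true;
    if (INVALIDATING.has(command)) refsMayBeStale = true;
    return false;
  });
}
```

Under these assumptions, several `fill @e…`, `check`, `uncheck`, or `select` steps pass when they precede the click, while any `@e…` use after the click is flagged until a `snapshot` step appears.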
 
  Evaluate page JavaScript through stdin. Put the script in the top-level `stdin` field, not as an extra `args` token after `--stdin`. Return the value you want as an expression; `eval --stdin` may warn with `details.evalStdinHint` when a function-shaped snippet serializes to `{}` instead of being invoked:
 
@@ -309,14 +312,14 @@ Typical pitfalls:
  - `semanticAction` and `job` are **not** valid inside `batch` stdin; batch steps stay upstream argv string arrays (spell a `find` step as tokens there if you need it in a batch).
  - Commands or locators outside the supported shorthand still require explicit `args`. Common page getters are grouped under `get`: use `get title`, `get url`, or `get text <selector>` rather than shortcut commands such as `title` or `url`; unknown getter shortcuts can return read-only `details.nextActions` like `use-get-title`.
  - For `locator: "role"`, pass either `value: "button"` or `role: "button"`; if both are present they must match.
- - Use `semanticAction.session` to target a named upstream browser session; the wrapper prepends `--session <name>` before the compiled `find` or `select` argv and keeps that prefix on retry/candidate actions. In active sessions, role/name click/check/uncheck shorthands may resolve through the current `snapshot -i` refs before execution so hidden duplicate matches do not steal the action; `details.effectiveArgs` shows the exact executed argv.
+ - Use `semanticAction.session` to target a named upstream browser session; the wrapper prepends `--session <name>` before the compiled `find` or `select` argv and keeps that prefix on retry/candidate actions. In active sessions, role/name click/check/fill shorthands may resolve through the current `snapshot -i` refs before execution so hidden duplicate matches do not steal the action; fill only resolves when the current snapshot has one exact editable ref match. `details.effectiveArgs` shows the exact executed argv.
  - Do not reuse `@e…` refs across navigation. The wrapper records the latest snapshot refs per session and fails mutation-prone stale/recycled refs before upstream can silently hit a different current-page element; use the session-aware `refresh-interactive-refs` next action.
  - If upstream classifies the failure as `stale-ref` and `details.compiledSemanticAction` is present for a compiled `find` action, `details.nextActions` may list `retry-semantic-action-after-stale-ref` after `refresh-interactive-refs`, carrying the same compiled `find` argv so you can retry the locator-stable target once it is safe to do so. `select` calls that used stale `@refs` only get refresh guidance; use a fresh snapshot or stable selector before retrying (contract in [`docs/TOOL_CONTRACT.md#semanticaction`](docs/TOOL_CONTRACT.md#semanticaction)).
- - If the failure is `selector-not-found`, the wrapper may take one fresh snapshot and add `Current snapshot ref fallback` when that snapshot has exact visible role/name matches for the failed `find` / `semanticAction` target. Non-fill targets can include direct `try-current-visible-ref*` next actions, and semantic click misses can still add bounded `Agent-browser candidate fallbacks` such as `button`/`link` role retries for `text` clicks. For semantic `fill` misses on desktop or host-controlled rich inputs, prefer `details.richInputRecovery`: refresh refs, choose the current editable `@ref`, focus or click it, then use `keyboard inserttext` or `keyboard type` with the intended text. Those recovery nextActions do not copy the fill text and do not press `Enter` or submit; only submit when the user flow explicitly calls for it (same contract link).
+ - If the failure is `selector-not-found`, the wrapper may take one fresh snapshot and add `Current snapshot ref fallback` when that snapshot has exact visible role/name matches for the failed `find` / `semanticAction` target. Non-fill targets can include direct `try-current-visible-ref*` next actions, and semantic click misses can still add bounded `Agent-browser candidate fallbacks` such as `button`/`link` role retries for `text` clicks. `semanticAction` does not expose `uncheck` while upstream `find ... uncheck` is not runtime-supported; use raw `args: ["uncheck", <selector-or-ref>]` after a stable selector or fresh snapshot ref. For semantic `fill` misses on desktop or host-controlled rich inputs, prefer `details.richInputRecovery`: refresh refs, choose the current editable `@ref`, focus or click it, then use `keyboard inserttext` or `keyboard type` with the intended text. Those recovery nextActions do not copy the fill text and do not press `Enter` or submit; only submit when the user flow explicitly calls for it (same contract link).
  - A successful upstream `click` is not proof that the web app handled the event or changed state. For top-level non-Electron clicks, the wrapper may fail the tool with `details.clickDispatch` and a `Click dispatch diagnostic` line when upstream reported success but no trusted DOM event reached the target; use the suggested `inspect-click-dispatch-miss` / `retry-click-after-dispatch-miss` next actions instead of assuming the click mutated the page. When the task depends on a mutation, follow `inspect-after-mutation` / `pageChangeSummary` evidence with a wait, URL/text check, or fresh snapshot before trusting the result; if the target still did not change, retry with a current visible ref or stable selector and report the workflow issue instead of silently continuing. For static local fixtures where the user only needs to exercise app code, an explicit `eval --stdin` programmatic click such as `document.querySelector("#demo").click()` can be a diagnostic workaround, but treat it as an untrusted scripted activation rather than proof a real user click works, and never use it to bypass explicit stop-before-submit/order/purchase boundaries. Preserve explicit user stop boundaries: if the user says to stop before order/post/purchase/submit, gather evidence on that page and do not click the final action. The wrapper now blocks likely final order/submit clicks under such prompts and reports `details.promptGuard` rather than trusting the model to self-police.
  - If a **top-level** `click` succeeds (unified command `click`, not a `batch` step), upstream reports `data.clicked`, and the tab URL is unchanged under the same normalization as ref preflight (fragment-insensitive), the wrapper may take one extra `snapshot -i` and add `Possible overlay blockers` with `details.overlayBlockers` (`candidates`, `summary`, optional `snapshot` refresh for refs) plus session-aware `inspect-overlay-state` / bounded `try-overlay-blocker-candidate-*` next actions when that snapshot shows strong modal context (`dialog` / `alertdialog`) and close/dismiss-like controls. Page-wide words like privacy, sign in, or banner alone do not trigger this diagnostic. The unchanged-URL check uses `details.navigationSummary`, which is populated with one read-only `eval` summary when the click JSON omits **both** string `data.url` and `data.title`; if upstream already includes either, overlay diagnostics are skipped here. Also skipped when tab correction or about-blank recovery already ran on that result.
  - If `get text <selector>` reads a non-ref CSS selector with multiple matches or a hidden first match while visible matches exist, including successful `batch` steps, the wrapper may add `Selector text visibility warning`, `details.selectorTextVisibility` (plus `selectorTextVisibilityAll` for multiple batched warnings), and `inspect-visible-text-candidates` next actions; the warning names the matching `details.nextActions` id. Prefer a visible `@ref`, a scoped selector, or a targeted `eval --stdin` over hidden tab content.
- - In attached Electron sessions, broad selectors such as `body`, `html`, `main`, or `[role=application]` may read the whole app shell. The wrapper may add `Broad Electron get text selector warning`, `details.electronGetTextScopeWarning`, and `snapshot-for-electron-text-scope`; prefer `snapshot -i`, a current `@ref`, or a narrower panel selector.
+ - In wrapper-tracked attached Electron sessions, broad selectors such as `body`, `html`, `main`, or `[role=application]` may read the whole app shell. The wrapper may add `Broad Electron get text selector warning`, `details.electronGetTextScopeWarning`, and `snapshot-for-electron-text-scope`; ordinary browser pages, including `file://` fixtures, do not qualify without Electron launch provenance. Prefer `snapshot -i`, a current `@ref`, or a narrower panel selector.
 
  ### Constrained browser jobs
 
@@ -411,7 +414,7 @@ For local app debugging, `sourceLookup` can gather candidate component/file loca
 
  This is an experiment, not a guarantee. React hints require a session opened with `--enable react-devtools`, and many builds do not expose useful sourcemap/source metadata; `status: "no-candidates"` is common when nothing matched, and `status: "unsupported"` only when no candidates were found **and** a compiled `react` batch step failed (if DOM or workspace search still produced candidates, you get `candidates-found` instead). For wrapper-tracked packaged Electron apps, a no-candidate result includes `details.sourceLookup.workspaceRoot`, optional `details.sourceLookup.electronContext`, limitations explaining that the scan is limited to the Pi cwd and does not unpack app bundles/`app.asar`, plus Electron snapshot/probe/tab next actions when a launch is known.
 
- `networkSourceLookup` is the matching failed-request experiment. It runs `network request <id>` when `requestId` is present and/or `network requests --filter …` when `filter` or `url` is present (`url` supplies the filter pattern when `filter` is omitted); add `session` when the generated batch should target an explicit upstream session. It merges failed-request rows from the batch JSON with initiator-style hints and a bounded workspace literal scan (`maxWorkspaceFiles` defaults to 2000, cap 5000), surfaces everything under `details.networkSourceLookup`, and avoids automatic blame or edits. Compact `network requests` results with safe request IDs also add `details.nextActions` for request details, bounded `networkSourceLookup` on actionable failures, path filtering, or HAR capture so agents can branch without guessing request-id syntax. Network diagnostics are read-only for wrapper page state: request URLs in `network request` or generated `networkSourceLookup` batches do not replace the session’s active page target or invalidate page-scoped refs from the app page.
+ `networkSourceLookup` is the matching failed-request experiment. It runs `network request <id>` when `requestId` is present and/or `network requests --filter …` when `filter` or `url` is present (`url` supplies the filter pattern when `filter` is omitted); add `session` when the generated batch should target an explicit upstream session. It merges failed-request rows from the batch JSON with initiator-style hints and a bounded workspace literal scan (`maxWorkspaceFiles` defaults to 2000, cap 5000), surfaces everything under `details.networkSourceLookup`, and avoids automatic blame or edits. Compact `network requests` results with safe request IDs also add `details.nextActions` for request details, bounded `networkSourceLookup` on actionable failures, path filtering, or HAR capture so agents can branch without guessing request-id syntax. When the wrapper has seen `network route` in the same session, pending fetch/XHR rows or CORS-looking errors that match the route surface `details.networkRouteDiagnostics` plus executable follow-ups to inspect the request or start HAR capture; same-origin/CORS-correct fixture retry guidance stays in prose. Network diagnostics are read-only for wrapper page state: request URLs in `network request` or generated `networkSourceLookup` batches do not replace the session’s active page target or invalidate page-scoped refs from the app page.
 
  ```json
  { "networkSourceLookup": { "requestId": "req-1", "url": "/api/fail" } }
@@ -461,8 +464,8 @@ Use these rules:
  - Do not treat `--session` as persisted auth or tab restore after `close`, `quit`, or `exit`; use `--profile`, `--session-name`, or `--state` for persistence.
  - Prefer page actions and storage checks over cookie dumps. `cookies get` can expose real profile cookies.
  - Prefer `auth save --password-stdin` over putting passwords in `args`; the wrapper only accepts caller `stdin` for `batch`, `eval --stdin`, and `auth save --password-stdin` (top-level `job` and `qa` compile to `batch` and supply their own stdin).
- - Use `state save <path>` / `state load <path>` for portable test state. `state save` is reported as a file artifact with verification metadata; `state load` may mention a path but is not treated as a newly saved artifact.
- - Treat `cookies get`, `storage local|session`, and `auth show` output as sensitive. The native presentation summarizes and redacts credential-like values, but avoid requesting these dumps unless the task needs them.
+ - Use `state save <path>` / `state load <path>` for portable test state. `state save` is reported as a file artifact with verification metadata; if an upstream-successful artifact command reports a non-pending file path that the wrapper cannot find on disk, the tool fails with `failureCategory: "artifact-missing"` instead of treating the path as durable. `state load` may mention a path but is not treated as a newly saved artifact.
+ - Treat `cookies get`, `storage local|session`, and `auth show` output as sensitive. The native presentation summarizes and redacts credential-like values while allowing benign primitive storage values to aid local QA, but avoid requesting broad dumps unless the task needs them.
  - Use `dialog status`, `dialog accept [text]`, `dialog dismiss`, and `frame <selector|main>` through native `args`; use exact `confirm <id>` / `deny <id>` next actions for guarded-action confirmations.
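The storage rule in the list above (benign primitive values stay readable, credential-like values are redacted) can be illustrated with a small TypeScript sketch. The `SAFE_KEYS` allowlist, the value heuristics, and the `[REDACTED]` placeholder are assumptions made for illustration; only `theme: dark` is taken from this diff as an example of a low-risk value, and the wrapper's real key list and heuristics are not shown here.

```typescript
// Hypothetical allowlist of keys considered safe to show.
const SAFE_KEYS = new Set(["theme", "locale"]);

// Rough, assumed heuristics for email-, URL-, and token-shaped values.
function looksSensitive(value: string): boolean {
  return value.includes("@") || value.includes("://") || value.length > 32;
}

// A value is shown only when its key is allowlisted AND the value itself
// does not look sensitive; everything else is redacted by default.
function presentStorageValue(key: string, value: string): string {
  return SAFE_KEYS.has(key) && !looksSensitive(value) ? value : "[REDACTED]";
}
```

The default-deny shape matters more than the specific patterns: an unknown key or a suspicious value falls through to redaction rather than to display.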
 
  Safe stateful examples:
@@ -575,10 +578,11 @@ Cross-platform release coverage uses Crabbox to run macOS, Ubuntu Linux, and nat
  ```bash
  npm run check:platform-smoke
  npm run smoke:platform:ubuntu-image
+ npm run smoke:platform:doctor
  npm run smoke:platform:all
  ```
 
- The required matrix is documented in [`docs/platform-smoke.md`](docs/platform-smoke.md). It runs `platform-build` (fast target-local verify, pack, clean packed Pi install, `pi list`) and `browser-dogfood-smoke` (real `agent-browser`/browser wrapper smoke) on every target.
+ The required matrix is documented in [`docs/platform-smoke.md`](docs/platform-smoke.md). It runs `platform-build` (fast target-local verify, pack, clean packed Pi install, `pi list`) and `browser-dogfood-smoke` (real `agent-browser`/browser wrapper smoke) on every target. Inspect `.artifacts/platform-smoke/` and check `crabbox list --provider local-container` plus `crabbox list --provider parallels` after release runs so cleanup proof is not chat-only.
 
  For package release confidence, follow [`docs/RELEASE.md`](docs/RELEASE.md). The release gate is:
 
@@ -586,6 +590,7 @@ For package release confidence, follow [`docs/RELEASE.md`](docs/RELEASE.md). The
  npm run doctor
  npm run check:platform-smoke
  npm run smoke:platform:ubuntu-image
+ npm run smoke:platform:doctor
  npm run verify -- release
  ```
 
package/docs/COMMAND_REFERENCE.md CHANGED
@@ -155,18 +155,17 @@ Examples:
  { "semanticAction": { "action": "fill", "locator": "label", "value": "Email", "text": "user@example.com" } }
  { "semanticAction": { "action": "select", "selector": "#flavor", "value": "chocolate" } }
  { "semanticAction": { "action": "click", "locator": "text", "value": "Close", "session": "named-browser" } }
- { "semanticAction": { "action": "uncheck", "locator": "label", "value": "Remember me" } }
  { "args": ["scrollintoview", "@e12"] }
  { "args": ["snapshot", "-i"] }
  ```
 
- The optional native `semanticAction` object is only a thin schema for common locator-based actions and native dropdown selection; it compiles locator actions to existing upstream `find` commands, compiles `action: "select"` to upstream `select <selector> <value...>`, and reports the compiled argv in `details.compiledSemanticAction` (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction) for the full field rules). For `locator: "role"`, pass either `value: "button"` or `role: "button"`; if both are present they must match. It is a top-level alternative to `args`, `job`, `qa`, `sourceLookup`, `networkSourceLookup`, and `electron`, not a nested shape inside `batch` stdin arrays. Add `session` inside `semanticAction` when the shorthand should target a named upstream browser session; the compiled argv prepends `--session <name>` before `find` or `select`, and fallback candidate actions preserve that prefix. For active sessions, role/name click/check/uncheck shorthands may resolve through the current `snapshot -i` refs before execution so hidden duplicate matches do not steal the action; inspect `details.effectiveArgs` when you need the exact executed argv. `select` shorthand intentionally requires a stable selector or current `@ref` plus `value`/`values`; upstream `find` does not expose a verified `select` action, so role/name/label dropdown resolution stays a snapshot/selector decision instead of hidden wrapper magic. If a raw `find` or semantic action misses with `selector-not-found`, the wrapper may take one fresh snapshot and append `Current snapshot ref fallback` when that snapshot has exact visible role/name matches for the failed target. Non-fill matches can include direct `try-current-visible-ref*` next actions. Semantic click misses may also include `Agent-browser candidate fallbacks`; `details.nextActions` first recommends a fresh `snapshot -i` and may include bounded role/name retries such as `button`/`link` for a missed `text` click, each as a `try-*-candidate` entry carrying redacted `find role …` argv.
162
+ The optional native `semanticAction` object is only a thin schema for common locator-based actions and native dropdown selection; it compiles locator actions to existing upstream `find` commands, compiles `action: "select"` to upstream `select <selector> <value...>`, and reports the compiled argv in `details.compiledSemanticAction` (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction) for the full field rules). For `locator: "role"`, pass either `value: "button"` or `role: "button"`; if both are present they must match. It is a top-level alternative to `args`, `job`, `qa`, `sourceLookup`, `networkSourceLookup`, and `electron`, not a nested shape inside `batch` stdin arrays. Add `session` inside `semanticAction` when the shorthand should target a named upstream browser session; the compiled argv prepends `--session <name>` before `find` or `select`, and fallback candidate actions preserve that prefix. For active sessions, role/name click/check/fill shorthands may resolve through the current `snapshot -i` refs before execution so hidden duplicate matches do not steal the action; fill only resolves when there is one exact editable current ref match. Inspect `details.effectiveArgs` when you need the exact executed argv. `semanticAction` does not expose `uncheck` while upstream `find ... uncheck` is not runtime-supported; use raw `uncheck <selector-or-ref>` after choosing a stable selector or current snapshot ref. `select` shorthand intentionally requires a stable selector or current `@ref` plus `value`/`values`; upstream `find` does not expose a verified `select` action, so role/name/label dropdown resolution stays a snapshot/selector decision instead of hidden wrapper magic. If a raw `find` or semantic action misses with `selector-not-found`, the wrapper may take one fresh snapshot and append `Current snapshot ref fallback` when that snapshot has exact visible role/name matches for the failed target. Non-fill matches can include direct `try-current-visible-ref*` next actions. Semantic click misses may also include `Agent-browser candidate fallbacks`; `details.nextActions` first recommends a fresh `snapshot -i` and may include bounded role/name retries such as `button`/`link` for a missed `text` click, each as a `try-*-candidate` entry carrying redacted `find role …` argv.
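The `select` shorthand described above can be pictured with a small TypeScript sketch. The `SelectAction` type, its `selector` field name, and the `compileSelect` helper are hypothetical names chosen for illustration, not the wrapper's real implementation; only the `select <selector> <value...>` argv shape and the `--session <name>` prefix are taken from the text.

```typescript
// Hypothetical input shape: the `selector` field name is an assumption for this sketch.
interface SelectAction {
  action: "select";
  selector: string; // a stable selector or a current @ref
  value?: string;
  values?: string[];
  session?: string;
}

// Compile to `[--session <name>] select <selector> <value...>` as the text describes.
function compileSelect(a: SelectAction): string[] {
  const values = a.values ?? (a.value !== undefined ? [a.value] : []);
  if (values.length === 0) {
    throw new Error("select shorthand needs value or values");
  }
  const prefix = a.session ? ["--session", a.session] : [];
  return [...prefix, "select", a.selector, ...values];
}

console.log(compileSelect({ action: "select", selector: "@e5", value: "chocolate", session: "qa" }));
```

In the real tool, the compiled argv is reported in `details.compiledSemanticAction`, so that field remains the source of truth rather than any local reconstruction like this one.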
164
163
 
165
- For desktop or host-controlled rich inputs, treat a semantic `fill` miss differently. If the fresh snapshot finds an exact current editable ref (`searchbox` or `textbox`), `details.richInputRecovery` and visible `Rich input recovery` describe the candidate and append `focus-current-editable-ref*` / `click-current-editable-ref*` next actions. Those actions deliberately do **not** copy the fill text and never press `Enter` or submit. Use the safe ladder instead: refresh refs, choose the current editable `@ref`, focus or click it, then send the intended text with `keyboard inserttext` or `keyboard type` in a separate call. Do not auto-submit unless the user flow explicitly calls for it.
164
+ For desktop or host-controlled rich inputs, treat a semantic `fill` miss differently. Active-session role/name fills can execute through one exact current editable `combobox`, `searchbox`, or `textbox` ref before upstream `find` runs. If a later selector miss still finds an exact current editable ref (`searchbox` or `textbox`), `details.richInputRecovery` and visible `Rich input recovery` describe the candidate and append `focus-current-editable-ref*` / `click-current-editable-ref*` next actions. Those actions deliberately do **not** copy the fill text and never press `Enter` or submit. Use the safe ladder instead: refresh refs, choose the current editable `@ref`, focus or click it, then send the intended text with `keyboard inserttext` or `keyboard type` in a separate call. Do not auto-submit unless the user flow explicitly calls for it.
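The safe ladder above can be sketched as a sequence of separate tool calls. This is a minimal illustration, assuming a `ToolCall` shape with a top-level `args` array and refs of the form `@e<number>`; the `typeIntoCurrentRef` helper is hypothetical. The ref must come from a fresh `snapshot -i` taken first, and the sketch never adds an `Enter` press or submit step.

```typescript
// Assumed shape of one native tool call for this sketch.
interface ToolCall {
  args: string[];
}

// Step 1 of the ladder: refresh refs before choosing an editable @ref.
const refreshRefs: ToolCall = { args: ["snapshot", "-i"] };

// Steps 2 and 3: click the chosen current editable ref, then send text in a separate call.
function typeIntoCurrentRef(ref: string, text: string): ToolCall[] {
  if (!/^@e\d+$/.test(ref)) {
    throw new Error(`expected a current @e ref, got ${ref}`);
  }
  return [
    { args: ["click", ref] },
    { args: ["keyboard", "inserttext", text] },
  ];
}

console.log([refreshRefs, ...typeIntoCurrentRef("@e7", "quarterly report")]);
```

Keeping the text in its own call mirrors the recovery actions, which deliberately do not copy the fill text into the focus or click step.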
166
165
 
167
166
  Do not assume Playwright selector dialects such as `text=Close` or `button:has-text('Close')` are supported wrapper syntax. If you need those forms, verify current upstream `agent-browser` behavior first; otherwise use refs, `find`, or known CSS selectors.
168
167
 
169
- Treat `@e…` refs as page-scoped. After a successful `snapshot`, the wrapper records the latest refs and page target for that session; mutation-prone ref commands such as `click @e4`, `select @e5 chocolate`, or batch steps with old refs fail with `failureCategory: "stale-ref"` when the page target changed or the ref is absent from the latest same-page snapshot. If a session `snapshot -i` fails with `No active page`, the wrapper invalidates prior refs for that session; later mutation-prone `@e…` calls fail before upstream until a successful fresh `snapshot -i` records refs again. Inside `batch` stdin JSON, the wrapper also walks steps in order before spawn: steps whose first token can navigate or mutate set a latch; a later step whose first token is `snapshot` clears that latch for following rows; guarded steps that still mention `@e…` after an uncleared latch fail with the same `stale-ref` bucket without launching upstream. Same-snapshot form fills are allowed before a click or submit step, so a login-style `fill`, `fill`, `click` batch can run from one snapshot; split dynamic or autosubmit forms with a fresh snapshot if a fill itself rerenders the targets. Follow the `refresh-interactive-refs` next action (it includes `--session <name>` when needed) and prefer stable `find` or `semanticAction` locators when navigation or rerendering is likely. Contract detail: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`refSnapshot`, `refSnapshotInvalidation`).
168
+ Treat `@e…` refs as page-scoped. After a successful `snapshot`, the wrapper records the latest refs and page target for that session; mutation-prone ref commands such as non-form `click @e4`, `select @e5 chocolate`, or batch steps with old refs fail with `failureCategory: "stale-ref"` when the page target changed or the ref is absent from the latest same-page snapshot. If a session `snapshot -i` fails with `No active page`, the wrapper invalidates prior refs for that session; later mutation-prone `@e…` calls fail before upstream until a successful fresh `snapshot -i` records refs again. Inside `batch` stdin JSON, the wrapper also walks steps in order before spawn: steps whose first token can navigate or mutate set a latch; a later step whose first token is `snapshot` clears that latch for following rows; guarded steps that still mention `@e…` after an uncleared latch fail with the same `stale-ref` bucket without launching upstream. Same-snapshot form fills and native form-control steps are allowed before a click or submit step, so `fill`, `check`/`uncheck` checkbox or radio refs, `select` combobox refs, then a final submit `click` can run from one snapshot; checkbox/radio `click`s remain conservative unless followed by a fresh snapshot. Split dynamic or autosubmit forms with a fresh snapshot if a control interaction rerenders the targets. Follow the `refresh-interactive-refs` next action (it includes `--session <name>` when needed) and prefer stable `find` or `semanticAction` locators when navigation or rerendering is likely. Contract detail: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`refSnapshot`, `refSnapshotInvalidation`).
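The batch pre-spawn walk described above can be approximated as follows. This is a simplified sketch, not the wrapper's code: the `LATCHING_TOKENS` set is an assumed, incomplete list of navigate-or-mutate first tokens, and form-control tokens (`fill`, `check`, `uncheck`, `select`) are left out of it to mirror the same-snapshot form allowance. The per-role exceptions for checkbox and radio clicks are not modeled.

```typescript
// Assumption: a small illustrative subset of first tokens that navigate or mutate.
const LATCHING_TOKENS = new Set(["open", "click", "press"]);

// Returns the index of the first step that would fail as stale-ref, or -1 if none.
function firstStaleRefStep(steps: string[][]): number {
  let latched = false;
  return steps.findIndex((step) => {
    const first = step[0] ?? "";
    if (first === "snapshot") {
      latched = false; // a fresh snapshot clears the latch for following rows
      return false;
    }
    const mentionsRef = step.slice(1).some((token) => token.startsWith("@e"));
    if (latched && mentionsRef) return true; // old ref used after an uncleared latch
    if (LATCHING_TOKENS.has(first)) latched = true;
    return false;
  });
}

// A login-style batch from one snapshot passes the walk.
console.log(firstStaleRefStep([["fill", "@e1", "user"], ["fill", "@e2", "secret"], ["click", "@e3"]]));
```

Under these assumptions, inserting a `snapshot` step between two ref clicks clears the latch, which matches the guidance to follow `refresh-interactive-refs` when rerendering is likely.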
170
169
 
171
170
  A successful `click` result means upstream reported a target, not that the app definitely handled the event. For top-level non-Electron clicks, the wrapper installs a bounded DOM-event probe; when upstream reports success but no trusted event reaches the target, it fails the tool and exposes `details.clickDispatch` plus a `Click dispatch diagnostic` line with explicit retry/inspect next actions (no in-page click replay). When the workflow depends on a mutation, use `details.pageChangeSummary`, a wait, URL/text extraction, or a fresh `snapshot -i` before trusting the state; if nothing changed, retry with a current visible ref or stable selector and report the workflow issue. For static local fixtures or debugging where the user explicitly accepts scripted activation, `eval --stdin` can call `document.querySelector(...).click()` to exercise inline handlers and app code; treat that as an untrusted programmatic event, not as evidence that CDP/user-like clicking works. Preserve explicit user stop boundaries: if the user says to stop before a final order, post, purchase, or submit action, gather evidence from that page and do not click the final action or use scripted activation to bypass the stop. The wrapper also blocks likely final order/submit click targets under those prompts and returns `details.promptGuard` with `failureCategory: "policy-blocked"`.
172
171
 
@@ -240,7 +239,7 @@ Use raw `args: ["batch"]` with `stdin` when you need arbitrary upstream commands
240
239
 
241
240
  For quick smoke/QA checks, use top-level `qa`. It clears enabled network/console/page-error buffers before opening the target URL, waits for page readiness, checks expected text/selector, inspects fresh network requests, console messages, and page errors, and can capture an evidence screenshot. The readiness wait defaults to `loadState: "domcontentloaded"`; set `loadState` to `"load"` or `"networkidle"` only when that stricter state is useful and the site is not expected to keep background requests alive. QA network diagnostics classify failed requests by likely impact and list failed rows first in the network preview: actionable document/script/API-style failures fail the preset, while common low-impact browser icon misses such as `favicon.ico` are surfaced as warnings (`qaPreset.warnings`) so they do not fail an otherwise healthy page. Successful QA with no failed checks returns compact model-visible prose (page URL/title when known, checks run, optional screenshot verification) while keeping the full step matrix in `details.qaPreset` and `details.batchSteps`. Failed QA presets report `details.resultCategory: "failure"`, `failureCategory: "qa-failure"`, keep verbose per-step batch output, and real Pi sessions treat the diagnostic as a failed tool result. Prose output also gets a model-visible result-category line including `Pi tool isError: true`; caller-requested `--json` output keeps the JSON string parseable and relies on the patched `isError` plus `details` fields.
242
241
 
243
- The same classification drives plain `network requests` presentation: when any row counts as failed (HTTP status ≥ 400, `failed: true`, or a string `error`), model-facing text starts with a line like `Network failure summary: 0 actionable, 1 benign low-impact (1 total).`, and each preview line can end with an impact tag such as `[benign: low-impact browser icon asset]` or `[actionable: document, script, API, or non-benign request failure]`. When safe request IDs are present, `details.nextActions` adds bounded read-only follow-ups such as `network request <id>`, `networkSourceLookup` for actionable failed rows, `network requests --filter <path>`, and `network har start`; prefer those payloads over rebuilding request-id commands from prose. Rules live in `classifyNetworkRequestFailure` / `summarizeNetworkFailures` in `extensions/agent-browser/lib/results/network.ts`; QA aggregation is `analyzeQaPresetResults` in `extensions/agent-browser/index.ts`.
242
+ The same classification drives plain `network requests` presentation: when any row counts as failed (HTTP status ≥ 400, `failed: true`, or a string `error`), model-facing text starts with a line like `Network failure summary: 0 actionable, 1 benign low-impact (1 total).`, and each preview line can end with an impact tag such as `[benign: low-impact browser icon asset]` or `[actionable: document, script, API, or non-benign request failure]`. When safe request IDs are present, `details.nextActions` adds bounded read-only follow-ups such as `network request <id>`, `networkSourceLookup` for actionable failed rows, `network requests --filter <path>`, and `network har start`; prefer those payloads over rebuilding request-id commands from prose. If the wrapper has seen a prior `network route` in the same session, matching pending fetch/XHR rows or CORS-looking errors add `details.networkRouteDiagnostics` plus executable route-mock follow-ups (`inspect-pending-routed-network-request` and `start-network-har-capture-for-route-mock`) so agents do not mistake a stalled/CORS-blocked mock for a fulfilled mock; same-origin/CORS fixture retry guidance stays in visible prose. Rules live in `classifyNetworkRequestFailure` / `summarizeNetworkFailures` in `extensions/agent-browser/lib/results/network.ts`; QA aggregation is `analyzeQaPresetResults` in `extensions/agent-browser/index.ts`.
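To make the failed-row rule and the summary line concrete, here is a hedged TypeScript sketch. The `NetworkRow` shape and the helper names are assumptions, and the benign check only recognizes `favicon.ico`; the real rules live in `classifyNetworkRequestFailure` / `summarizeNetworkFailures` and may differ.

```typescript
// Assumed minimal row shape for this sketch.
interface NetworkRow {
  url: string;
  status?: number;
  failed?: boolean;
  error?: unknown;
}

// A row counts as failed when status >= 400, failed is true, or error is a string.
function isFailedRow(row: NetworkRow): boolean {
  return (row.status !== undefined && row.status >= 400) || row.failed === true || typeof row.error === "string";
}

// Assumption: only favicon.ico misses are treated as benign low-impact here.
function isBenignRow(row: NetworkRow): boolean {
  return /\/favicon\.ico(\?|$)/.test(row.url);
}

// Build the summary line, or undefined when nothing failed.
function summarizeFailures(rows: NetworkRow[]): string | undefined {
  const failed = rows.filter(isFailedRow);
  if (failed.length === 0) return undefined;
  const benign = failed.filter(isBenignRow).length;
  const actionable = failed.length - benign;
  return `Network failure summary: ${actionable} actionable, ${benign} benign low-impact (${failed.length} total).`;
}

console.log(summarizeFailures([
  { url: "https://example.com/", status: 200 },
  { url: "https://example.com/favicon.ico", status: 404 },
]));
```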
244
243
 
245
244
  ```json
246
245
  { "qa": { "url": "https://example.com", "expectedText": "Example Domain", "screenshotPath": ".dogfood/qa-example.png" } }
@@ -442,7 +441,7 @@ Operational notes:
442
441
  - `auth list/show/save/login/delete` summaries avoid expanding profile secrets. Prefer `auth save --password-stdin` over `--password <value>`.
443
442
  - `state save <path>` is a verified file-artifact workflow; inspect `details.artifactVerification` before relying on the file. `state load <path>` is not treated as a newly saved artifact.
444
443
  - `cookies get` can expose real authenticated-profile cookies; prefer task-specific page actions and only inspect cookies when the user needs cookie data.
445
- - `storage local|session` summaries redact sensitive keys and values; still avoid broad storage dumps unless necessary.
444
+ - `storage local|session` summaries redact sensitive keys and likely secret values but may keep benign primitive local QA values visible, for example `theme: dark`; still avoid broad storage dumps unless necessary.
446
445
  - `dialog accept/dismiss/status`, `frame <selector|main>`, and guarded-action `confirm <id>` / `deny <id>` pass through the native tool. Prefer `details.nextActions` for exact confirmation recovery payloads.
447
446
  - `batch` mirrors the same redaction on every step: top-level `details.data` is a compact `{ success, command, result?, error? }[]` matrix (argv-redacted `command`, stateful `result`, scrubbed `error` text). Use `details.batchSteps[]` when you need per-step artifacts, categories, spill paths, or full structured errors beyond the roll-up.
448
447
 
@@ -625,8 +624,8 @@ Current v0.27.1 source does not parse `wait <selector> --state hidden` / `wait <
625
624
  | `errors [--clear]` | View or clear page errors. |
626
625
  | `highlight <sel>` | Highlight an element. |
627
626
  | `inspect` | Open Chrome DevTools for the active page. |
628
- | `clipboard <op> [text]` | Read/write clipboard: `clipboard read`, `clipboard write <text>`, `clipboard copy`, and `clipboard paste`. |
629
- | `stream enable [--port <n>]` | Start runtime WebSocket streaming for this session. |
627
+ | `clipboard <op> [text]` | Read/write clipboard: `clipboard read`, `clipboard write <text>`, `clipboard copy`, and `clipboard paste`. Clipboard access is environment-dependent; `NotAllowedError` / permission-denied failures are common in headless, managed-profile, remote, or `file://` sessions. |
628
+ | `stream enable [--port <n>]` | Start runtime WebSocket streaming for this session. If upstream reports that streaming is already enabled, the wrapper treats it as an idempotent success and adds status/disable follow-ups. |
630
629
  | `stream disable` | Stop runtime WebSocket streaming. |
631
630
  | `stream status` | Show streaming status and active port. |
632
631
  | `react tree` | Print the full React component tree. Requires the page to have been launched with `--enable react-devtools`. |
@@ -670,7 +669,7 @@ Long-running or lifecycle commands should be explicitly paired with cleanup call
670
669
  | `doctor [--fix]` | Diagnose install issues and optionally auto-clean stale files. Use `doctor --offline --quick` for a fast local-only check and `doctor --json` for structured output. |
671
670
  | `profiles` | List available Chrome profiles. |
672
671
 
673
- When these commands are invoked through the native `agent_browser` tool, structured diagnostic/status outputs are rendered as compact summaries. Local inspection/setup calls (`auth save/list/show/delete/remove`, `dashboard start/stop`, `device list`, `doctor`, `install`, `upgrade`, `profiles`, `session list`, `state list/show/rename`, `state clean --older-than <days>`, `state clear --all`, `state clear -a`, and `state clear <session-name>`) are sessionless unless you explicitly pass `--session`; context-dependent calls such as root `session`, untargeted `state clear`, `auth login`, `chat`, and `state save/load` keep normal session behavior. List-like outputs such as sessions, Chrome profiles, auth profiles, network requests, console messages, and page errors include counts and key fields; large outputs are previewed with a `Full output path:` spill file instead of dumping the entire payload into context. For `network requests`, the wrapper shows a failed-request summary split into actionable versus benign low-impact rows, then status, method, URL, resource/mime type, request id, and, when the installed upstream output includes body-like fields, bounded redacted payload, response, and failure/error snippets. Safe request IDs also produce `details.nextActions` for exact request details, actionable failed-request source lookup candidates, filtered request lists, or starting HAR capture before a repro. `network request <requestId>` can expose upstream full-detail body fields such as response bodies using the same bounded model-facing preview; its request URL stays diagnostic-only and does not overwrite `details.sessionTabTarget` for later ref guards. Header, cookie, auth, token, and other secret-like fields are not expanded in model-facing text or `details.data`; command echoes also redact `--body`, `--headers`, `--password`, proxy credentials, auth-bearing URLs, cookie/storage values, and bearer/basic credential text in positional arguments. Use upstream HAR or full raw details only when complete data is required.
672
+ When these commands are invoked through the native `agent_browser` tool, structured diagnostic/status outputs are rendered as compact summaries. Local inspection/setup calls (`auth save/list/show/delete/remove`, `dashboard start/stop`, `device list`, `doctor`, `install`, `upgrade`, `profiles`, `session list`, `state list/show/rename`, `state clean --older-than <days>`, `state clear --all`, `state clear -a`, and `state clear <session-name>`) are sessionless unless you explicitly pass `--session`; context-dependent calls such as root `session`, untargeted `state clear`, `auth login`, `chat`, and `state save/load` keep normal session behavior. List-like outputs such as sessions, Chrome profiles, auth profiles, network requests, console messages, and page errors include counts and key fields; large outputs are previewed with a `Full output path:` spill file instead of dumping the entire payload into context. For `network requests`, the wrapper shows a failed-request summary split into actionable versus benign low-impact rows, then status, method, URL, resource/mime type, request id, and, when the installed upstream output includes body-like fields, bounded redacted payload, response, and failure/error snippets. Safe request IDs also produce `details.nextActions` for exact request details, actionable failed-request source lookup candidates, filtered request lists, or starting HAR capture before a repro. If the same session has active wrapper-observed network routes, pending/CORS-looking matched request rows add `details.networkRouteDiagnostics` and executable route-mock next actions before the generic request actions. `network request <requestId>` can expose upstream full-detail body fields such as response bodies using the same bounded model-facing preview; its request URL stays diagnostic-only and does not overwrite `details.sessionTabTarget` for later ref guards. Clipboard failures that mention `NotAllowedError` or permission denial are usually browser/OS capability limits, not proof that a read, paste, or page mutation happened; prefer page-native reads (`snapshot -i`, `get text`, `eval --stdin`) or direct typing (`keyboard inserttext` / `keyboard type`) when the workflow allows it, and retry true clipboard flows only from an allowed profile/session on a normal `http(s)` page. Header, cookie, auth, token, and other secret-like fields are not expanded in model-facing text or `details.data`; low-risk primitive storage values may remain visible, while command echoes still redact `--body`, `--headers`, `--password`, proxy credentials, auth-bearing URLs, `clipboard write` text, cookie/storage set values, and bearer/basic credential text in positional arguments. Use upstream HAR or full raw details only when complete data is required.
674
673
 
675
674
  ## Optional package config and companion web search
676
675
 
@@ -781,7 +780,7 @@ Browser default config is conservative: it adds agent guidance for signed-in/acc
781
780
  - `--screenshot-format <fmt>`: `png` or `jpeg`. Environment: `AGENT_BROWSER_SCREENSHOT_FORMAT`.
782
781
  - `--content-boundaries`: wrap page output in boundary markers. Environment: `AGENT_BROWSER_CONTENT_BOUNDARIES`.
783
782
  - `--max-output <chars>`: truncate page output to N characters. Environment: `AGENT_BROWSER_MAX_OUTPUT`.
784
- - `--allowed-domains <list>`: restrict navigation domains. Environment: `AGENT_BROWSER_ALLOWED_DOMAINS`.
783
+ - `--allowed-domains <list>`: restrict navigation domains. Environment: `AGENT_BROWSER_ALLOWED_DOMAINS`. The wrapper also remembers argv-supplied allowed domains for the managed session and fails a successful-looking browser command with `failureCategory: "policy-blocked"` when the final observed `http(s)` URL host is outside that allowlist, including click/navigation escapes after the initial page load.
785
784
  - `--action-policy <path>`: action policy JSON file. Environment: `AGENT_BROWSER_ACTION_POLICY`.
786
785
  - `--confirm-actions <list>`: action categories requiring confirmation. Environment: `AGENT_BROWSER_CONFIRM_ACTIONS`.
787
786
  - `--confirm-interactive`: interactive confirmations; auto-denies when stdin is not a TTY. Environment: `AGENT_BROWSER_CONFIRM_INTERACTIVE`.
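The `--allowed-domains` post-check in the list above can be illustrated with a short TypeScript sketch. It assumes the matching rule noted elsewhere in this diff (exact host plus subdomain suffix, non-`http(s)` URLs skipped); the `isFinalUrlAllowed` name is hypothetical, and treating an unparseable URL as "nothing to check" is an assumption of this sketch rather than documented wrapper behavior.

```typescript
// Returns true when the final observed URL is inside the allowlist or is not an http(s) URL.
function isFinalUrlAllowed(finalUrl: string, allowedDomains: string[]): boolean {
  let parsed: URL;
  try {
    parsed = new URL(finalUrl);
  } catch {
    return true; // assumption: unparseable URLs are skipped in this sketch
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    return true; // non-http(s) URLs are skipped
  }
  const host = parsed.hostname.toLowerCase();
  return allowedDomains.some((entry) => {
    const domain = entry.trim().toLowerCase();
    // Exact host match, or a subdomain ending in ".<domain>".
    return domain !== "" && (host === domain || host.endsWith(`.${domain}`));
  });
}

// A click that leaves example.com for www.iana.org would be flagged.
console.log(isFinalUrlAllowed("https://www.iana.org/help/example-domains", ["example.com"]));
```

The leading dot in the suffix comparison is what keeps a host such as `notexample.com` from matching an `example.com` entry.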
package/docs/RELEASE.md CHANGED
@@ -26,10 +26,11 @@ npm install
26
26
  npm run doctor
27
27
  npm run check:platform-smoke
28
28
  npm run smoke:platform:ubuntu-image
29
+ npm run smoke:platform:doctor
29
30
  npm run verify -- release
30
31
  ```
31
32
 
32
- `npm run doctor` is a read-only first-run diagnostic for PATH, targeted upstream version, and duplicate package/checkout source conflicts. It does not replace upstream `agent-browser doctor` for browser runtime health and does not edit Pi settings.
33
+ `npm run doctor` is a read-only first-run diagnostic for PATH, targeted upstream version, the recommended Pi release floor, and duplicate package/checkout source conflicts. The Pi version check is a warning, not a hard runtime requirement. It does not replace upstream `agent-browser doctor` for browser runtime health and does not edit Pi settings.
33
34
 
34
35
  `npm run verify -- release` runs:
35
36
 
@@ -50,12 +51,17 @@ npm run verify -- dogfood
50
51
  For direct Crabbox diagnostics outside the full release compose, run:
51
52
 
52
53
  ```bash
53
- npm run smoke:platform:doctor
54
+ npm run check:platform-smoke
54
55
  npm run smoke:platform:ubuntu-image
56
+ npm run smoke:platform:doctor
55
57
  npm run smoke:platform:all
58
+ crabbox list --provider local-container
59
+ crabbox list --provider parallels
56
60
  ```
57
61
 
58
- This mode uses the extension harness and the real `agent-browser` on `PATH` against a deterministic local file fixture, then verifies top-level `qa`, `semanticAction`, constrained `job`, screenshot artifact verification, and session close. Use `npm run verify -- dogfood --keep-artifacts` or `--artifact-dir <path>` only while debugging, then delete retained screenshots. This smoke complements, but does not replace, human-readable interactive transcript evidence.
62
+ The Crabbox gate is only green when suite assertions and artifact manifests under `.artifacts/platform-smoke/` are green and no unexpected lease/clone remains.
63
+
64
+ The deterministic dogfood mode uses the extension harness and the real `agent-browser` on `PATH` against a deterministic local file fixture, then verifies top-level `qa`, `semanticAction`, constrained `job`, screenshot artifact verification, and session close. Use `npm run verify -- dogfood --keep-artifacts` or `--artifact-dir <path>` only while debugging, then delete retained screenshots. This smoke complements, but does not replace, human-readable interactive transcript evidence.
59
65
 
60
66
  Every release also requires interactive `tmux`-driven Pi dogfood with the native `agent_browser` tool against real sites. For extension-focused release smokes, use `pi --no-extensions --no-skills -e .` from the checkout before publish so auto-loaded dogfood/QA skills cannot replace the bounded smoke workflow; run separate skill-enabled dogfood only when validating skill routing or report-generation behavior. Drive prompts with `tmux send-keys`, exercise at least one simple static site and one real documentation/product site, include the higher-level `qa` or `job`/`batch` surfaces when they changed, close every opened browser session, remove screenshots/temp artifacts, and record the outcome in the release notes or support-matrix evidence. Automated localhost, fake-upstream, and deterministic dogfood gates do not replace this human-readable live-site transcript evidence. When `agent_browser_web_search` or package config changed, add one key-free smoke proving the optional tool is absent without config, one fake/unit-backed smoke in the default suite, and one opt-in live Exa or Brave Search check with a real key while confirming the key does not appear in transcripts, stdout/stderr, config status, PR text, or artifacts. When `electron.*` surfaces, attached-session diagnostics, or `qa.attached` changed, add a local Electron pass: `electron.list` → `electron.launch` (expect isolated profile behavior) → `snapshot -i` or `electron.probe` / `qa.attached` → `electron.cleanup` with the returned `launchId`, verifying status/mismatch guidance if you simulate a dead renderer or stale refs. For dense-dashboard stress coverage, use the [public Grafana stress checklist](#public-grafana-stress-checklist) below; it is a maintainer workflow, not bundled product skill or recipe runtime.
61
67
 
@@ -227,7 +233,7 @@ These show up often in cloud dev boxes and scripted smokes; they are maintainer
227
233
 
228
234
  | Topic | What to watch for | Mitigation |
229
235
  | --- | --- | --- |
230
- | **Pi CLI vs repo devDependencies** | Global `pi` older than the `@earendil-works/pi-coding-agent` range in `package.json` can change TUI behavior, `/reload`, and tool routing during lifecycle or checkout smokes. | Align `pi` with the repo’s pinned coding-agent release before release gates (`pi update` or install the matching version). |
236
+ | **Pi CLI vs repo devDependencies** | Global `pi` older than the recommended Pi floor for the release can change TUI behavior, `/reload`, package installs, and tool routing during lifecycle or checkout smokes. | Run `npm run doctor` and align `pi` with the current audited baseline before release gates (`pi update` or install the matching version). The published peer range stays non-pinning; the local release gate should use the audited Pi version. |
231
237
  | **npm lockfile (`packageManager`)** | `package.json` pins **npm@11**. npm 10 may only strip optional `libc` metadata on `@esbuild/*` platform entries in `package-lock.json` (no dependency version change). | Prefer `npx -y npm@11.14.0 install` when refreshing the lockfile; do not commit npm-10-only lockfile churn. |
232
238
  | **`pi -p` / print mode** | Non-interactive `pi -p` may hang or emit no stdout for long real-browser smokes without a TTY. | Use **tmux**-driven interactive `pi` for release evidence and checkout smokes; reserve `-p` for short, non-browser checks. |
233
239
  | **Real-browser cleanup** | `real-upstream`, Sauce Demo, and live-site runs can leave defunct Chrome/`agent-browser` children if a session aborts mid-flow. | Close via `agent_browser` / `agent-browser` `close`, kill stray tmux sessions, and remove temp screenshots/HARs under `/tmp` or your chosen artifact dirs. |
@@ -49,6 +49,7 @@ These rows track this feedback batch. Some rows are docs-only or environment-own
49
49
  | RQ-0116 | Fresh-session failure prose is opaque and exposes internal generated session ids without clear recovery. | Wrapper | **wrapper-owned shipped** (action-oriented visible recovery + `nextActions`; `attemptedSessionName` remains in `details`). Struct + visible line already exist (`RQ-0077`). | `buildManagedSessionOutcome` still keeps full generated-session transition details in `details.managedSessionOutcome`, while visible failure prose now summarizes preserved/abandoned/replaced outcomes without repeating generated ids. Focused fake coverage covers preserved, missing-binary, abandoned, and QA-reclassification paths. | No further wrapper action planned for this batch unless reviewer finds recovery actions unsafe or insufficient. | `extensions/agent-browser/lib/orchestration/browser-run/session-state.ts`, `final-result.ts`, `docs/TOOL_CONTRACT.md`, `test/agent-browser.extension-errors-artifacts.test.ts`, `test/agent-browser.extension-input-modes.test.ts`. |
50
50
  | RQ-0117 | There is no machine-readable confirmation that headed mode is visible to the user. | Wrapper gap + environment (display) | **documented unsupported** for this batch; true OS visibility is **out-of-scope/host-owned** until upstream exposes a portable signal. Pairs with RQ-0110. | Same root cause as RQ-0110: no portable upstream/wrapper field observed. Headed launch success is not visibility proof, and adding a constant `details.headedVisibility: "unsupported"` would add noise without a decision signal. | No runtime field in this batch. Keep the explicit contract limitation and independent screenshot/tab/get-url evidence guidance. | README, `docs/TOOL_CONTRACT.md`, `docs/COMMAND_REFERENCE.md`, generated playbook guidance. |
51
51
  | RQ-0118 | Second-round report says every `eval --stdin` expression on `file://` returned `null`, including `1 + 1` and `document.title`. | Wrapper UX + caller-shape recovery | **wrapper-owned shipped in follow-up branch**: direct upstream and native-tool checks on maintainer macOS show `eval --stdin` works on `file://` when the script is supplied through the top-level native tool `stdin` field. The reported all-null behavior is reproduced by the malformed native-tool shape `args: ["eval", "--stdin", "document.title"]` with no top-level `stdin`, which upstream treats as empty stdin and returns `null`. | Direct probes: `agent-browser --session ... open file:///tmp/page.html` then `printf '1+1' | agent-browser --session ... eval --stdin` returns `2`; native tool `{ args: ["eval", "--stdin"], stdin: "document.title" }` returns the fixture title; native tool `{ args: ["eval", "--stdin", "1+1"] }` reproduced `result: null` before normalization. | Normalize the common malformed native-tool call by moving trailing args after `--stdin` into process stdin before launch; keep docs/playbook explicit that top-level `stdin` is canonical. | `extensions/agent-browser/lib/orchestration/input-plan.ts`, `test/agent-browser.extension-errors-artifacts.test.ts`, README, `docs/COMMAND_REFERENCE.md`, `docs/TOOL_CONTRACT.md`, generated playbook guidance. |
52
+ | RQ-0123 | Stress testing found artifact and navigation safety contracts could report success after failure evidence: missing explicit artifact files and `--allowed-domains` click escapes. | Wrapper result contract + navigation policy | **wrapper-owned in this branch**: non-pending resolved file artifacts with `exists:false` fail closed with `failureCategory: "artifact-missing"`; argv-supplied `--allowed-domains` is remembered for the managed session and successful-looking browser commands whose final observed `http(s)` URL escapes the allowlist fail with `failureCategory: "policy-blocked"`. | Reproduced in issue #68/#69 audits: `wait --download` and `diff screenshot --output` reported success with missing files; `example.com` opened under `--allowed-domains example.com` but clicking `Learn more` reached `www.iana.org`, while direct outside-domain open was blocked by upstream. | Preserve verified and pending artifact success; preserve valid in-domain navigation and direct upstream blocks; keep allowlist matching exact host plus subdomain suffix and skip non-`http(s)` URLs. | `extensions/agent-browser/lib/results/presentation/artifacts.ts`, `presentation.ts`, `batch.ts`, `contracts.ts`, `action-recommendations.ts`, `extensions/agent-browser/lib/navigation-policy.ts`, `process-output.ts`, `docs/TOOL_CONTRACT.md`, `docs/COMMAND_REFERENCE.md`, focused artifact/navigation tests. |
  | RQ-0119 | Second-round localhost failures still show `ERR_EMPTY_RESPONSE` even when shell `curl` succeeds. | Environment + wrapper diagnostics | **diagnostic-mitigated**: direct maintainer repro shows localhost HTTP succeeds with a normal same-host Python server, so the wrapper still cannot prove or bridge an environment-specific browser-host namespace/proxy mismatch. Add error presentation guidance specifically for loopback navigation failures so agents do not misread `ERR_EMPTY_RESPONSE` as blank page content. | Direct probe: `python3 -m http.server --bind 127.0.0.1 8766` + `agent-browser open http://127.0.0.1:8766/page.html` succeeds; previous first-batch evidence still shows accept-then-close servers can produce `ERR_EMPTY_RESPONSE`. | Append a local fixture hint on loopback `open`/navigation failures with `net::ERR_EMPTY_RESPONSE`, `ERR_CONNECTION_REFUSED`, `ERR_ADDRESS_UNREACHABLE`, `ERR_TIMED_OUT`, or `ERR_CONNECTION_RESET`; do not add server lifecycle management in the native browser tool. | `extensions/agent-browser/lib/results/presentation/errors.ts`, `test/agent-browser.presentation-skills-recovery.test.ts`, `docs/COMMAND_REFERENCE.md`, `docs/TOOL_CONTRACT.md`. |
  | RQ-0120 | Third-round report says ref/semantic clicks can report success while inline `onclick="…"` handlers do not run, though programmatic `.click()` does. | Wrapper diagnostics + upstream/browser hit testing | **diagnostic-mitigated in follow-up branch**: simple direct upstream probes show inline `onclick` handlers fire for selector and `@ref` clicks on file pages, so the reported case is likely a hit-target/overlay/ref-resolution miss rather than inline attributes generally. Extend the click-dispatch probe to `@e…` refs using the latest snapshot role/name metadata so ref or semanticAction→ref clicks that never deliver a trusted event to the intended element fail with `details.clickDispatch` instead of silently reporting success. Fourth-round external testing confirmed this diagnostic now catches the failure. | Direct probe: minimal `<button onclick="showGraph('rps')">` fixture updates DOM via selector click, `@e1` click, and programmatic `.click()`. Existing wrapper probe covered CSS/XPath only; semantic visible-ref resolution and raw `@e…` clicks skipped dispatch diagnostics. Follow-up tester confirmed programmatic `.click()` remains a useful static-fixture workaround when CDP/user-like click dispatch fails. | Probe standalone `click @e…` when the latest snapshot maps that ref to a unique visible role/name DOM candidate; keep no in-page replay policy. Document programmatic `eval --stdin` `.click()` as an explicit debugging/static-fixture workaround only, not proof of real user click behavior and not a way around stop boundaries. | `extensions/agent-browser/lib/orchestration/browser-run/click-dispatch.ts`, `types.ts`, `prepare.ts`, `test/agent-browser.extension-click-dispatch.test.ts`, README, `docs/COMMAND_REFERENCE.md`, `docs/TOOL_CONTRACT.md`. |
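
The RQ-0123 row describes allowlist matching as "exact host plus subdomain suffix" with non-`http(s)` URLs skipped. A minimal sketch of that rule follows, assuming hypothetical names (`hostMatchesAllowlist`, `checkFinalUrl`, `PolicyVerdict`); it is not the code in `navigation-policy.ts`, only an illustration of the matching behavior described above.

```typescript
// Hypothetical illustration of the RQ-0123 allowlist rule; names are not from the package.
type PolicyVerdict = "allowed" | "skipped" | "policy-blocked";

// Exact host match, or the host ends with ".<domain>" (subdomain suffix).
function hostMatchesAllowlist(host: string, allowedDomains: readonly string[]): boolean {
  const normalizedHost = host.toLowerCase();
  return allowedDomains.some((domain) => {
    const normalizedDomain = domain.trim().toLowerCase();
    if (normalizedDomain === "") return false;
    return (
      normalizedHost === normalizedDomain ||
      normalizedHost.endsWith(`.${normalizedDomain}`)
    );
  });
}

// Classify the final observed URL of a successful-looking command.
function checkFinalUrl(finalUrl: string, allowedDomains: readonly string[]): PolicyVerdict {
  let parsed: URL;
  try {
    parsed = new URL(finalUrl);
  } catch {
    return "skipped"; // unparseable URLs are not judged by this sketch
  }
  // Only http(s) URLs are subject to the allowlist; file:, about:, etc. are skipped.
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") return "skipped";
  return hostMatchesAllowlist(parsed.hostname, allowedDomains) ? "allowed" : "policy-blocked";
}
```

The suffix check includes the leading dot, so a look-alike host such as `notexample.com` does not match `example.com`.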
 
@@ -63,7 +64,7 @@ Re-run the gates below before each release; this table records what the closure
  | Packaged Pi smoke | `npm run verify -- package-pi` validates package contents, loads the packaged `agent_browser` tool without requiring optional Brave config, and executes fake-upstream `--version`. | Pass on 2026-06-03 as part of `npm run verify -- release` (`npm run verify -- package-pi` slice). |
  | Deterministic dogfood smoke | `npm run verify -- dogfood` (`scripts/verify-agent-browser-dogfood.ts`) drives the native wrapper against a local file fixture through top-level `qa`, `semanticAction`, constrained `job`, screenshot artifact verification, and session close with the real `agent-browser` on `PATH`. | Pass on 2026-06-03 (`npm run verify -- dogfood`, `agent-browser 0.27.1`; artifacts cleaned by the harness). |
  | Efficiency benchmark | `npm run verify -- benchmark` runs deterministic browser workflow accounting plus focused benchmark tests, including JSONL sampling fixtures and job/qa/sourceLookup/networkSourceLookup/Electron scenario coverage. | Pass on 2026-05-29 (`npm run verify -- benchmark`). |
- | Crabbox platform smoke | `npm run check:platform-smoke` syntax-checks the harness and cheap invariants. `npm run smoke:platform:all` runs doctor first, then fast target-local `platform-build` (`npm run verify -- platform-target`, pack, clean Pi install) plus `browser-dogfood-smoke` on Crabbox `macos`, `ubuntu`, and `windows-native`; see [`platform-smoke.md`](platform-smoke.md). | Pass on 2026-06-03 (`npm run check:platform-smoke`, `npm run smoke:platform:ubuntu-image`, and `npm run verify -- release`, whose platform slice ran the macOS/Ubuntu/native-Windows Crabbox matrix; artifacts cleaned after evidence capture). |
+ | Crabbox platform smoke | `npm run check:platform-smoke` syntax-checks the harness and cheap invariants. `npm run smoke:platform:ubuntu-image` builds the project-owned Linux image, `npm run smoke:platform:doctor` checks Crabbox 0.26.0+ and local target readiness, and `npm run smoke:platform:all` runs doctor first, then fast target-local `platform-build` (`npm run verify -- platform-target`, pack, clean Pi install) plus `browser-dogfood-smoke` on Crabbox `macos`, `ubuntu`, and `windows-native`; see [`platform-smoke.md`](platform-smoke.md). Target artifacts include Crabbox/provider/work-root metadata, and release review also checks provider-specific `crabbox list` commands for leftover leases/clones. | Pass on 2026-06-03 (`npm run check:platform-smoke`, `npm run smoke:platform:ubuntu-image`, and `npm run verify -- release`, whose platform slice ran the macOS/Ubuntu/native-Windows Crabbox matrix; artifacts cleaned after evidence capture). |
  | `verify -- release` / `prepublishOnly` | `npm run verify -- release` chains the default gate with packaged Pi smoke and the release-blocking Crabbox platform matrix (`verifySteps` `release` in [`scripts/project.mjs`](../scripts/project.mjs)). `package.json` `prepublishOnly` runs that compose before `npm pack --dry-run` during `npm publish`. It intentionally omits standalone lifecycle, real-upstream, host-only dogfood, and benchmark modes—see [`RELEASE.md`](RELEASE.md#pre-release-checks). | Pass on 2026-06-03 (`npm run verify -- release`, including macOS/Ubuntu/native-Windows Crabbox matrix). |
  | Configured-source lifecycle | `npm run verify -- lifecycle` (`scripts/verify-lifecycle.mjs`) drives `/reload`, closes and relaunches Pi with the same exact `--session-id`, checks the JSONL session header id, session continuity, slash-command sentinel tokens (`v1` then `v2` after rewriting the packaged extension to simulate pickup), persisted spill reachability, and real Pi `tool_result` failure-patch semantics for a QA reclassification with a fake upstream on `PATH`. Default Pi model is `zai/glm-5.1`; default per-step wait is **180000 ms** (`DEFAULT_TIMEOUT_MS`); override model with `--model <id>` and waits with `--timeout-ms <ms>`. Passthrough flags in [`scripts/project.mjs`](../scripts/project.mjs): `--keep-artifacts`, `--model`, `--verbose`, and `--timeout-ms` plus a value (for example `npm run verify -- lifecycle --model openai-codex/gpt-5.5:minimal --keep-artifacts --verbose --timeout-ms 600000`). | Pass on 2026-06-03 (`npm run verify -- lifecycle`). Treat any future unexplained red lifecycle gate as a release blocker. |
  | Quick isolated Pi smoke | `pi --no-extensions --no-skills -e . --tools agent_browser` from repo root; native `agent_browser` only. | Last interactive tmux checkout smoke pass on 2026-05-29 (`agent-browser 0.27.0` at the time). The 2026-06-03 Crabbox matrix now covers clean packed Pi install plus deterministic wrapper dogfood on all required platforms for `agent-browser 0.27.1`; run a new manual tmux smoke before publish when human-readable transcript evidence is required. Broader historical coverage also includes version/help/skills, open/snapshot/click, eval stdin, batch stdin, screenshot, explicit session, `sessionMode: "fresh"`, network requests, console/errors, diff snapshot, stream status/disable, dashboard start/stop, and chat credential-failure pass-through during RQ-0055. |
@@ -75,7 +76,7 @@ Re-run the gates below before each release; this table records what the closure
  | Built-in skills | 13 canonical tokens from baseline section `skills`; see [`scripts/agent-browser-capability-baseline.mjs`](../scripts/agent-browser-capability-baseline.mjs) and generated [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#built-in-skills). | [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#built-in-skills), generated baseline block, README proof section, release docs. | `needsManagedSession` keeps read-only skills inspection sessionless while preserving thin upstream passthrough. | Runtime and extension-validation skills/provider matrix; real-upstream inspection/skills group. | Supported. |
  | Core page, element, navigation, and extraction commands | 74 canonical tokens from baseline section `core-commands`; see [`scripts/agent-browser-capability-baseline.mjs`](../scripts/agent-browser-capability-baseline.mjs) and generated [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#core-page-and-element-commands). | [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#core-page-and-element-commands), [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md), README quick start. | Thin passthrough with wrapper-owned JSON/session planning, ref guidance, artifact verification, page-change summaries, click-dispatch diagnostics, no-op scroll/focus diagnostics, shorthand compilers, and redaction. | Real-upstream core matrix plus fake core matrix for passthrough, ordering, diagnostics, and compiler validation. | Supported. Upstream semantics remain upstream-owned. |
  | Sessions, state, tabs, frames, dialogs, and windows | 20 canonical tokens from baseline section `state-tabs-frames-dialogs`; see [`scripts/agent-browser-capability-baseline.mjs`](../scripts/agent-browser-capability-baseline.mjs) and generated [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#session-state-frames-dialogs-windows-and-inspection-commands). | [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#session-state-frames-dialogs-windows-and-inspection-commands), stateful workflow notes, [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details). | Stateful summaries/redaction, state artifact handling, sessionless local command planning, managed-session restore, tab target pinning, and close alias cleanup. | Extension-validation stateful matrix, runtime session/resume tests, presentation redaction tests, lifecycle harness. | Supported. External profile/auth state remains operator-owned. |
- | Network, storage, artifacts, diagnostics, and performance | 42 canonical tokens from baseline section `network-storage-artifacts-diagnostics`; see [`scripts/agent-browser-capability-baseline.mjs`](../scripts/agent-browser-capability-baseline.mjs) and generated [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#page-state-finding-mouse-settings-network-and-storage). | [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#page-state-finding-mouse-settings-network-and-storage), diagnostic sections, [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details). | Thin passthrough plus compact diagnostics, artifact metadata, missing-ffmpeg warnings, sensitive-data redaction, timeout bounds, and cleanup-pair guidance. | Fake non-core matrix and safe real-upstream coverage for network/HAR, diff, trace/profiler, console/errors/highlight, stream, vitals, and React missing-renderer. | Supported. Environment-sensitive operations need suitable local/browser state. |
+ | Network, storage, artifacts, diagnostics, and performance | 42 canonical tokens from baseline section `network-storage-artifacts-diagnostics`; see [`scripts/agent-browser-capability-baseline.mjs`](../scripts/agent-browser-capability-baseline.mjs) and generated [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#page-state-finding-mouse-settings-network-and-storage). | [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#page-state-finding-mouse-settings-network-and-storage), diagnostic sections, [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details). | Thin passthrough plus compact diagnostics, route-mock warnings, useful-but-redacted storage output, stream idempotency normalization, artifact metadata, missing-ffmpeg warnings, sensitive-data redaction, timeout bounds, and cleanup-pair guidance. | Fake non-core matrix and safe real-upstream coverage for network/HAR, diff, trace/profiler, console/errors/highlight, stream, vitals, and React missing-renderer. | Supported. Environment-sensitive operations need suitable local/browser state. |
  | Batch, auth, confirmations, setup, dashboard, devices, and AI commands | 24 canonical tokens from baseline section `batch-auth-setup-ai`; see [`scripts/agent-browser-capability-baseline.mjs`](../scripts/agent-browser-capability-baseline.mjs) and generated [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#batch-auth-confirmations-sessions-chat-dashboard-devices-and-setup). | [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#batch-auth-confirmations-sessions-chat-dashboard-devices-and-setup), README security notes, release docs. | Native-tool batch stdin, generated `job`/`qa`/lookup batch plans, auth/confirmation redaction, sessionless local auth/setup/dashboard/doctor planning, timeout/cleanup guidance. | Unit/fake batch/auth/confirmation/dashboard/chat/doctor tests; extension-validation for structured input modes; efficiency benchmark scenarios. | Supported. Interactive side-effecting setup/auth/chat remains upstream-owned. |
  | Global flags, config, providers, policy, and environment | 117 canonical tokens from baseline section `options-and-env`; see [`scripts/agent-browser-capability-baseline.mjs`](../scripts/agent-browser-capability-baseline.mjs) and generated [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#important-global-flags-config-and-environment). | [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#important-global-flags-config-and-environment), README provider/setup notes, [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#sessionmode), architecture/runtime docs. | Runtime handles command discovery, value-flag prevalidation, launch-scoped flags, redacted echoes, fresh-session recovery hints, explicit sessions, provider/device launch-scoping, curated env forwarding, subprocess completion, and package-owned Pi-scoped config for optional companion features. | Runtime tests for flags/planning/redaction/session behavior; process tests for env and stdio-linger completion; config/web-search/CLI tests; fake provider/specialized-skill matrix; package doctor. | Supported. Provider clouds, iOS/Appium, proxies, profiles, and credentials require external setup. |
 
@@ -93,7 +94,7 @@ Native `job`, `qa`, experimental `sourceLookup`, experimental `networkSourceLook
 
  `RQ-0093` keeps network diagnostics read-only for wrapper page/ref state: standalone `network request …` results and generated `networkSourceLookup` batch rows may contain API/request URLs, but those URLs are not promoted to `details.sessionTabTarget` and do not stale the latest app-page `details.refSnapshot`. The prior session target is preserved until a real page/navigation/snapshot result updates it. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#networksourcelookup) and [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); fake coverage: `agentBrowserExtension keeps network request diagnostics from replacing the active page target` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
 
- `RQ-0095` adds bounded machine follow-ups for compact `network requests` output: `extensions/agent-browser/lib/results/presentation/diagnostics.ts` selects at most one safe request ID (actionable failed row first, then API/fetch-like row, benign failed row, or first safe ID) and appends `details.nextActions` for exact `network request <id>`, optional `networkSourceLookup` on actionable failed rows, path filtering with `network requests --filter <path>`, and `network har start` before a repro. Request-detail/filter/HAR argv preserve the current `--session` prefix when known, source lookup nextActions carry `networkSourceLookup.session` when known, and URL queries plus sensitive-looking IDs/paths are omitted from action params. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) network diagnostics note and README source-lookup section; fake coverage: `buildToolPresentation formats redacted network payload, response, and error previews` and `buildToolPresentation returns bounded network request next actions for benign and successful API rows` in [`test/agent-browser.presentation.test.ts`](../test/agent-browser.presentation.test.ts).
+ `RQ-0095` adds bounded machine follow-ups for compact `network requests` output: `extensions/agent-browser/lib/results/presentation/diagnostics.ts` selects at most one safe request ID (actionable failed row first, then API/fetch-like row, benign failed row, or first safe ID) and appends `details.nextActions` for exact `network request <id>`, optional `networkSourceLookup` on actionable failed rows, path filtering with `network requests --filter <path>`, and `network har start` before a repro. Request-detail/filter/HAR argv preserve the current `--session` prefix when known, source lookup nextActions carry `networkSourceLookup.session` when known, and URL queries plus sensitive-looking IDs/paths are omitted from action params. Route-mock diagnostics (#73) now track successful `network route` / `network unroute` patterns per session and, on later `network requests`, surface `details.networkRouteDiagnostics` plus executable `inspect-pending-routed-network-request` and `start-network-har-capture-for-route-mock` follow-ups when a matching fetch/XHR row is pending or CORS/preflight-looking; same-origin/CORS fixture guidance stays in prose rather than a non-runnable next action. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) network diagnostics note and README source-lookup section; fake coverage: `buildToolPresentation formats redacted network payload, response, and error previews`, `buildToolPresentation returns bounded network request next actions for benign and successful API rows`, `buildToolPresentation adds routed pending network diagnostics`, and `agentBrowserExtension reports pending routed network mocks`.
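
The RQ-0095 selection order (actionable failed row, then API/fetch-like row, then benign failed row, then first safe ID) can be sketched as a tiered search. The `RequestRow` shape and `pickFollowUpRequestId` name below are assumptions for illustration and do not mirror the real types in `diagnostics.ts`.

```typescript
// Hypothetical row shape; the real presentation code may classify rows differently.
interface RequestRow {
  id: string;
  safeId: boolean;     // ID is not sensitive-looking and may appear in a next action
  failed: boolean;
  actionable: boolean; // failure worth a follow-up (as opposed to a benign failure)
  apiLike: boolean;    // API/fetch-like request
}

// Select at most one request ID, trying each priority tier in order.
function pickFollowUpRequestId(rows: readonly RequestRow[]): string | undefined {
  const safeRows = rows.filter((row) => row.safeId);
  const tiers: Array<(row: RequestRow) => boolean> = [
    (row) => row.failed && row.actionable, // 1. actionable failed row
    (row) => row.apiLike,                  // 2. API/fetch-like row
    (row) => row.failed,                   // 3. benign failed row
    () => true,                            // 4. first safe ID
  ];
  for (const matches of tiers) {
    const hit = safeRows.find(matches);
    if (hit) return hit.id;
  }
  return undefined; // no safe ID available, so no request-detail follow-up
}
```

Filtering to safe rows before tiering keeps sensitive-looking IDs out of every tier, which matches the note that such IDs are omitted from action params.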
 
  `RQ-0092` adds first-class native select support to the wrapper shorthand surfaces without adding a recipe layer: `semanticAction.action = "select"` requires `selector` plus `value` or `values` and compiles to upstream `select <selector> <value...>`; constrained `job` supports the same `select` step inside generated `batch` stdin. Role/name/label dropdown selection is deliberately not hidden behind `find … select` because upstream `find` has no verified select action; agents should use a stable selector or a current `@ref` for native selects and reserve visible option refs for custom comboboxes after a fresh snapshot. Stale-ref retries remain limited to compiled `find` semantic actions, so `select @e…` failures return refresh guidance rather than blind retry. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction) and [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#job); fake coverage: semanticAction/job select compile in [`test/agent-browser.extension-input-modes.test.ts`](../test/agent-browser.extension-input-modes.test.ts) and stale-ref assertions in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts); real-upstream coverage: raw, semanticAction, and job select against the localhost native `<select>` fixture in [`test/agent-browser.real-upstream-contract.test.ts`](../test/agent-browser.real-upstream-contract.test.ts).
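
As a rough sketch of the RQ-0092 compile step, the function below turns a `select` shorthand into upstream `select <selector> <value...>` argv. The `SelectAction` interface and `compileSelect` name are hypothetical, and preferring `values` when both fields are present is an assumption of this sketch rather than documented behavior.

```typescript
// Hypothetical input shape for the select shorthand described above.
interface SelectAction {
  action: "select";
  selector?: string;
  value?: string;
  values?: string[];
}

// Compile to upstream argv: ["select", <selector>, ...values].
function compileSelect(input: SelectAction): string[] {
  const selector = input.selector?.trim();
  if (!selector) {
    throw new Error("semanticAction select requires selector");
  }
  // Assumption: `values` wins when both are given; the real wrapper may reject that shape.
  const values = input.values ?? (input.value !== undefined ? [input.value] : []);
  if (values.length === 0) {
    throw new Error("semanticAction select requires value or values");
  }
  return ["select", selector, ...values];
}
```

For example, `compileSelect({ action: "select", selector: "#country", values: ["DE", "FR"] })` yields `["select", "#country", "DE", "FR"]`.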
 
@@ -113,11 +114,11 @@ Native `job`, `qa`, experimental `sourceLookup`, experimental `networkSourceLook
 
  `RQ-0100` makes desktop tab/surface drift recovery machine-readable without adding routine tab-list probes for normal clicks. When existing wrapper state already identifies a target tab, about:blank and tab-drift paths append `list-tabs-for-about-blank-recovery` or `list-tabs-for-tab-drift-recovery`, then `select-intended-tab-after-drift` and `snapshot-after-tab-recovery` when the stable `t<N>` id is known. The implementation reuses `priorSessionTabTarget`, `aboutBlankSessionMismatch`, `sessionTabCorrection`, `openResultTabCorrection`, and existing tab-correction outputs; it does not probe tabs for ordinary clicks beyond the RQ-0086-gated drift paths. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#tabs) and [`ELECTRON.md`](ELECTRON.md#troubleshooting); fake coverage: about:blank recovery and explicit-about:blank negatives in [`test/agent-browser.extension-tab-recovery.test.ts`](../test/agent-browser.extension-tab-recovery.test.ts), early tab-drift failure assertions in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts), and central next-action helper coverage in [`test/agent-browser.results.test.ts`](../test/agent-browser.results.test.ts).
 
- `RQ-0099` makes semantic fill misses on host-controlled rich inputs recoverable without changing upstream `find` semantics or adding a recipe runtime. When `selector-not-found` recovery already collected an exact current editable `searchbox` / `textbox` ref, `extensions/agent-browser/lib/results/selector-recovery.ts` defines `details.richInputRecovery`, visible `Rich input recovery`, and bounded `focus-current-editable-ref*` / `click-current-editable-ref*` next actions; `extensions/agent-browser/index.ts` only probes the current session snapshot and merges the result. Those next actions never copy the fill text and never press `Enter` or submit; agents should refresh refs, choose the current editable `@ref`, focus/click it, then use `keyboard inserttext` or `keyboard type` with the intended text only after the right input is focused. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); human workflow: README locator shorthand, [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#selector-strategy), and generated playbook text from `extensions/agent-browser/lib/playbook.ts`; fake coverage: `agentBrowserExtension returns rich input recovery when semanticAction fill misses current editable refs` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
+ `RQ-0099` makes semantic fill misses on host-controlled rich inputs recoverable without changing upstream `find` semantics or adding a recipe runtime. Active-session role/name `semanticAction.fill` first gets a guarded pre-execution current-ref pass: one fresh `snapshot -i`, one exact editable `combobox` / `searchbox` / `textbox` match, then direct `fill @ref <text>` while preserving the original semantic target in `details.compiledSemanticAction`. When a later `selector-not-found` recovery already collected an exact current editable `searchbox` / `textbox` ref, `extensions/agent-browser/lib/results/selector-recovery.ts` defines `details.richInputRecovery`, visible `Rich input recovery`, and bounded `focus-current-editable-ref*` / `click-current-editable-ref*` next actions; `extensions/agent-browser/index.ts` only probes the current session snapshot and merges the result. Those next actions never copy the fill text and never press `Enter` or submit; agents should refresh refs, choose the current editable `@ref`, focus/click it, then use `keyboard inserttext` or `keyboard type` with the intended text only after the right input is focused. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); human workflow: README locator shorthand, [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#selector-strategy), and generated playbook text from `extensions/agent-browser/lib/playbook.ts`; fake coverage: `agentBrowserExtension resolves semantic role fills through one exact current editable ref` in [`test/agent-browser.extension-input-modes.test.ts`](../test/agent-browser.extension-input-modes.test.ts) and `agentBrowserExtension returns rich input recovery when semanticAction fill misses current editable refs` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
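
The guarded pre-execution pass in RQ-0099 (one fresh snapshot, one exact editable match, then direct `fill @ref <text>`) might look roughly like the sketch below. `SnapshotRef`, `resolveFillArgs`, and the rule that the match must share the target's role and name are assumptions for illustration; the wrapper's actual matching logic is not shown in this diff.

```typescript
// Hypothetical view of one ref from a fresh `snapshot -i`; `ref` already includes the "@" prefix.
interface SnapshotRef {
  ref: string;
  role: string;
  name: string;
}

const EDITABLE_ROLES = new Set(["combobox", "searchbox", "textbox"]);

// Return direct fill argv only when exactly one editable ref matches; otherwise
// return undefined so the caller can fall back to the normal semantic path.
function resolveFillArgs(
  refs: readonly SnapshotRef[],
  target: { role: string; name: string },
  text: string,
): string[] | undefined {
  if (!EDITABLE_ROLES.has(target.role)) return undefined;
  const [only, ...rest] = refs.filter(
    (candidate) => candidate.role === target.role && candidate.name === target.name,
  );
  if (!only || rest.length > 0) return undefined; // zero or ambiguous matches
  return ["fill", only.ref, text];
}
```

Returning `undefined` on ambiguity keeps the pass conservative: duplicate matches never pick a ref by position.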
 
  `RQ-0101` improves compact snapshot usefulness for dense desktop host screens without adding a new mode or dumping all refs inline. `extensions/agent-browser/lib/results/snapshot.ts` still emits the existing visible `Omitted high-value controls` section and `details.data.highValueControlRefIds`, while `snapshot-high-value-controls.ts` selects omitted controls with bounded diversity so editable/searchbox/textbox/combobox controls, named tab/surface controls, and primary action buttons remain discoverable even when many utility buttons and dense host rows compete for the trimmed ref budget. Human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#snapshot-refs-and-current-page-state), [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details), and README; fake coverage: `buildToolPresentation keeps dense desktop host high-value controls discoverable in compact snapshots` in [`test/agent-browser.snapshot-presentation.test.ts`](../test/agent-browser.snapshot-presentation.test.ts).
 
- `RQ-0070` adds bounded locator fallbacks when a compiled top-level `semanticAction` fails with `failureCategory: "selector-not-found"`: `extensions/agent-browser/index.ts` appends `try-*-candidate` entries to `details.nextActions` (and an `Agent-browser candidate fallbacks` block in visible text) only for `click`+`text` (`try-button-name-candidate` and `try-link-name-candidate`). Other locator/action pairs omit this block; fill recovery now goes through the RQ-0099 current-editable-ref ladder so candidate nextActions do not repeat fill text. `semanticAction` `select` uses explicit `selector` plus `value`/`values` and compiles to upstream `select`, not to unverified `find … select`. Active-session role/name click/check/uncheck shorthands also get a pre-execution visible-ref resolution pass via one fresh `snapshot -i`, so hidden duplicate upstream `find` matches do not steal the action; the original target remains in `details.compiledSemanticAction` and the executed ref appears in `details.effectiveArgs`. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction); fake coverage: semantic selector-miss assertions in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts) plus current-ref assertions and `agentBrowserExtension resolves semantic role clicks through current visible snapshot refs when available` in [`test/agent-browser.extension-input-modes.test.ts`](../test/agent-browser.extension-input-modes.test.ts).
+ `RQ-0070` adds bounded locator fallbacks when a compiled top-level `semanticAction` fails with `failureCategory: "selector-not-found"`: `extensions/agent-browser/index.ts` appends `try-*-candidate` entries to `details.nextActions` (and an `Agent-browser candidate fallbacks` block in visible text) only for `click`+`text` (`try-button-name-candidate` and `try-link-name-candidate`). Other locator/action pairs omit this block; fill recovery now goes through the RQ-0099 current-editable-ref ladder so candidate nextActions do not repeat fill text. `semanticAction` `select` uses explicit `selector` plus `value`/`values` and compiles to upstream `select`, not to unverified `find … select`; `semanticAction.uncheck` is intentionally not exposed while upstream `find … uncheck` is not runtime-supported, and raw `uncheck <selector-or-ref>` remains available. Active-session role/name click/check/fill shorthands also get a pre-execution visible-ref resolution pass via one fresh `snapshot -i`, so hidden duplicate upstream `find` matches do not steal the action; fill requires one exact editable current ref. The original target remains in `details.compiledSemanticAction` and the executed ref appears in `details.effectiveArgs`. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction); fake coverage: semantic selector-miss assertions in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts) plus current-ref assertions and `agentBrowserExtension resolves semantic role clicks through current visible snapshot refs when available` / `agentBrowserExtension resolves semantic role fills through one exact current editable ref` in [`test/agent-browser.extension-input-modes.test.ts`](../test/agent-browser.extension-input-modes.test.ts).
 
  `RQ-0071` makes that shorthand session-aware: optional `semanticAction.session` compiles to `--session <name>` before `find` or `select`, so `buildExecutionPlan` treats the call like any argv that already names an upstream session (no extra implicit `--session`); `details.sessionName` reflects the name on success; stale-ref retries for compiled `find` actions copy compiled argv with that prefix, and `try-*` candidates preserve the same `--session` prefix via `getCompiledSemanticActionSessionPrefix`. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction); fake coverage: `semanticAction` session compile/assertions in [`test/agent-browser.extension-input-modes.test.ts`](../test/agent-browser.extension-input-modes.test.ts).
 
@@ -125,7 +126,7 @@ Native `job`, `qa`, experimental `sourceLookup`, experimental `networkSourceLook
 
  `RQ-0072` guards page-scoped `@e…` refs against silent recycling: successful `snapshot` (or the last `snapshot` step inside a successful `batch`) records `details.refSnapshot` with ref ids and the snapshot page URL; `extensions/agent-browser/lib/session-page-state.ts` replays per-session snapshots and `refSnapshotInvalidation` markers from the active transcript branch on `session_start` and Pi 0.78 `session_tree` branch changes, clears them on successful close commands (`close`, `quit`, or `exit`), invalidates prior refs when a session `snapshot` fails with `No active page`, rejects mutation-prone ref argv before spawn when the tab URL diverges, a ref id is missing from the latest snapshot, or the session refs are invalidated, blocks `batch` stdin that uses `@e…` on a guarded command after an earlier step that can navigate or mutate until a `snapshot` step appears later in the same stdin array (pre-spawn latch reset only), and prefixes `refresh-interactive-refs` with `--session` when the call names a session (including upstream-classified `stale-ref` outcomes). The entrypoint also serializes `session_tree` restore and wrapper-owned browser commands with managed-session work, guards independent caller-owned explicit-session completions with a branch-state generation check, keeps process-owned cleanup registries for managed sessions and wrapper-launched Electron records separate from the branch-visible view, treats explicit wrapper-owned close rows and Electron cleanup managed-session steps as restore-visible close events, closes off-branch owned managed sessions and Electron launches on non-quit reload shutdown, preserves current branch-visible active managed/Electron sessions and active Electron temp profiles for reload continuity, and preserves fresh-session allocation monotonicity across branch restores. 
Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`refSnapshot`, `refSnapshotInvalidation`, `stale-ref`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) snapshot/ref notes and README pitfalls; fake coverage: `agentBrowserExtension recommends tab recovery after No active page snapshot failures` and `agentBrowserExtension invalidates refs after No active page snapshot failures inside batch` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts), plus `agentBrowserExtension blocks page-scoped ref reuse…`, `…rehydrates page-scoped refs from the current tree branch`, `…rehydrates managed browser session state from the current tree branch`, `…rehydrates artifact manifest state from the current tree branch`, `…keeps Electron cleanup ownership after session_tree switches away from the launch branch`, `…blocks stale refs after page-changing steps inside a batch`, `…allows same-snapshot form fills before a batch click`, `…allows batch stdin ref steps after snapshot following an invalidating step`, `…records snapshot refs returned inside a successful batch`, and `…rejects refs absent from the latest same-page snapshot` in [`test/agent-browser.extension-ref-guards.test.ts`](../test/agent-browser.extension-ref-guards.test.ts); managed-session reload cleanup, explicit close untracking/state rotation/restore, generated fresh-name reservation after repeated explicit closes, explicit-session command versus `session_tree` generation-guard coverage, explicit close versus in-flight implicit command serialization, and fresh-ordinal coverage lives in [`test/agent-browser.resume-state.test.ts`](../test/agent-browser.resume-state.test.ts).
 
- `RQ-0087` keeps the RQ-0072 guard but removes `fill` from the batch invalidation latch: `fill @e…` rows remain guarded against stale/missing refs, yet multiple same-snapshot form fills can run before the first click/submit/navigation step in one upstream `batch`. A later guarded ref after `click`, `open`, `reload`, or other invalidating rows still fails before spawn unless the batch includes a fresh `snapshot` step first. This improves login/checkout efficiency without permitting likely post-navigation ref reuse. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`Batch stdin ordering`); human workflow: README and [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) ref notes; fake coverage: `agentBrowserExtension allows same-snapshot form fills before a batch click` in [`test/agent-browser.extension-ref-guards.test.ts`](../test/agent-browser.extension-ref-guards.test.ts).
+ `RQ-0087` keeps the RQ-0072 guard but removes safe same-snapshot form work from the batch invalidation latch: `fill @e…` rows and role-checked native form-control rows (`check`/`uncheck` on checkbox or radio refs and `select` on combobox refs) remain guarded against stale/missing refs, yet can run before the first click/submit/navigation step in one upstream `batch`. A later guarded ref after `open`, `reload`, direct `click`, or other invalidating rows still fails before spawn unless the batch includes a fresh `snapshot` step first; checkbox/radio clicks stay conservative because snapshot role metadata alone does not prove native semantics. This improves login/checkout/static-form efficiency without permitting likely post-navigation ref reuse. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`Batch stdin ordering`); human workflow: README and [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) ref notes; fake coverage: `agentBrowserExtension allows same-snapshot form fills before a batch click` and `agentBrowserExtension allows same-snapshot form control batches before a hard invalidating click` in [`test/agent-browser.extension-ref-guards.test.ts`](../test/agent-browser.extension-ref-guards.test.ts).
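
The batch ordering rule in the updated RQ-0087 text can be pictured as a single latch walked over the stdin rows. This is a minimal sketch under stated assumptions, not the extension's code: `firstBlockedStep`, `isSafeFormStep`, and `INVALIDATING_COMMANDS` are hypothetical names, the invalidating set is a partial illustration (the source names `open`, `reload`, and direct `click` plus unspecified other rows), refs are assumed to look like `@e1`, and roles are assumed to come from a ref-to-role map taken from the latest snapshot. A `check`/`uncheck`/`select` row that fails its role check is treated as invalidating here, which is an assumption.

```typescript
type BatchStep = string[]; // one argv row, e.g. ["fill", "@e1", "alice"]
type RefRoles = Record<string, string>; // hypothetical ref id -> snapshot role map

// Illustrative, incomplete set of rows that set the latch.
const INVALIDATING_COMMANDS = new Set(["click", "open", "reload", "check", "uncheck", "select"]);

function isRef(arg: string): boolean {
  return arg.startsWith("@e");
}

// Same-snapshot form work that is allowed to run without setting the latch.
function isSafeFormStep(step: BatchStep, roles: RefRoles): boolean {
  const command = step[0] ?? "";
  const target = step[1];
  if (target === undefined || !isRef(target)) return false;
  const role = roles[target];
  if (command === "fill") return true;
  if (command === "check" || command === "uncheck") {
    return role === "checkbox" || role === "radio";
  }
  if (command === "select") return role === "combobox";
  return false;
}

// Returns the index of the first row that would be rejected before spawn, or -1.
function firstBlockedStep(steps: BatchStep[], roles: RefRoles): number {
  let latched = false;
  for (const [index, step] of steps.entries()) {
    const command = step[0] ?? "";
    if (command === "snapshot") {
      latched = false; // a fresh snapshot resets the latch
      continue;
    }
    if (latched && step.some(isRef)) return index; // ref reuse after invalidation
    if (isSafeFormStep(step, roles)) continue; // fill / role-checked controls
    if (INVALIDATING_COMMANDS.has(command)) latched = true;
  }
  return -1;
}
```

Under these assumptions, `fill`, `check`, and `select` rows against one snapshot pass together, a ref row after `click` is blocked, and the same row passes again once a `snapshot` row sits between them.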
 
  `RQ-0073` surfaces likely overlay blockers after no-navigation clicks without inventing blind targets: for **top-level** `click` results (unified command `click`, not `batch`-wrapped steps) whose upstream JSON includes `data.clicked`, whose prior pinned tab URL and post-click URL (from `details.navigationSummary`, gathered by one read-only `eval` summary when the click payload omits **both** string `data.url` and `data.title`) stay equal after the same fragment-insensitive normalization used for ref preflight, and where the same unified result did **not** already apply session tab correction or about-blank mismatch recovery and did not already fire `details.clickDispatch`, `extensions/agent-browser/index.ts` takes one fresh session-scoped `snapshot -i`, scans `refs` for strong modal context (`dialog` / `alertdialog`) plus up to three close/dismiss-pattern `button`/`link`/`menuitem` controls, and only then emits `details.overlayBlockers` (`candidates`, `summary`, and a `snapshot` map that can advance `refSnapshot`), visible `Possible overlay blockers`, and `inspect-overlay-state` / `try-overlay-blocker-candidate-*` next actions (with `--session` prefix when the session is named) appended after presentation follow-ups such as `inspect-after-mutation`. Page-wide privacy/sign-in/banner text without a dialog role is deliberately ignored to avoid warnings after ordinary same-page clicks. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`overlayBlockers`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) no-navigation click note and README pitfalls; fake coverage: `agentBrowserExtension surfaces likely overlay blockers after a no-op click` and `agentBrowserExtension does not report overlay blockers from unrelated page chrome after a successful same-page click` in [`test/agent-browser.extension-errors-artifacts.test.ts`](../test/agent-browser.extension-errors-artifacts.test.ts).
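
The RQ-0073 scan step might look roughly like the sketch below: candidates are reported only when the snapshot contains a `dialog` or `alertdialog` role, and at most three close/dismiss-style controls are kept. The `RefEntry` shape, the `findOverlayCandidates` name, and the `CLOSE_PATTERN` regex are assumptions for illustration. The source does not spell out the shape of `refs` or the exact matching pattern.

```typescript
// Hypothetical shape of one entry in a snapshot's `refs` map.
interface RefEntry {
  role: string;
  name: string;
}

const MODAL_ROLES = new Set(["dialog", "alertdialog"]);
const CONTROL_ROLES = new Set(["button", "link", "menuitem"]);
const CLOSE_PATTERN = /close|dismiss/i; // assumed stand-in for the real pattern
const MAX_CANDIDATES = 3;

// Returns ref ids of likely overlay-dismissal controls, or [] without modal context.
function findOverlayCandidates(refs: Record<string, RefEntry>): string[] {
  const entries = Object.entries(refs);
  const hasModal = entries.some(([, entry]) => MODAL_ROLES.has(entry.role));
  if (!hasModal) return []; // banner or sign-in text alone is ignored
  return entries
    .filter(([, entry]) => CONTROL_ROLES.has(entry.role) && CLOSE_PATTERN.test(entry.name))
    .slice(0, MAX_CANDIDATES)
    .map(([refId]) => refId);
}
```

Requiring a modal role first is what keeps ordinary page chrome, such as a footer "Close menu" link on a page with no dialog, from producing a warning.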