pi-agent-browser-native 0.2.38 → 0.2.40
- package/CHANGELOG.md +27 -0
- package/README.md +60 -5
- package/docs/ARCHITECTURE.md +27 -8
- package/docs/COMMAND_REFERENCE.md +57 -12
- package/docs/RELEASE.md +22 -9
- package/docs/SUPPORT_MATRIX.md +15 -11
- package/docs/TOOL_CONTRACT.md +46 -2
- package/docs/platform-smoke.md +176 -0
- package/extensions/agent-browser/index.ts +13 -3
- package/extensions/agent-browser/lib/config.ts +446 -0
- package/extensions/agent-browser/lib/playbook.ts +17 -3
- package/extensions/agent-browser/lib/process.ts +72 -13
- package/extensions/agent-browser/lib/web-search.ts +352 -0
- package/package.json +15 -1
- package/platform-smoke.config.mjs +18 -0
- package/scripts/agent-browser-capability-baseline.mjs +9 -6
- package/scripts/config.mjs +297 -0
- package/scripts/platform-smoke/artifacts.mjs +94 -0
- package/scripts/platform-smoke/browser-dogfood-windows.ps1 +110 -0
- package/scripts/platform-smoke/crabbox-runner.mjs +149 -0
- package/scripts/platform-smoke/doctor.mjs +307 -0
- package/scripts/platform-smoke/linux-image/Dockerfile +23 -0
- package/scripts/platform-smoke/platform-build-windows.ps1 +103 -0
- package/scripts/platform-smoke/targets.mjs +471 -0
- package/scripts/platform-smoke.mjs +161 -0
package/CHANGELOG.md
CHANGED
|
@@ -2,6 +2,33 @@
|
|
|
2
2
|
|
|
3
3
|
## Unreleased
|
|
4
4
|
|
|
5
|
+
## 0.2.40 - 2026-06-02
|
|
6
|
+
|
|
7
|
+
### Added
|
|
8
|
+
|
|
9
|
+
- Added Pi-scoped `pi-agent-browser-native` package config at `~/.pi/config/pi-agent-browser-native/config.json`, `.pi/config/pi-agent-browser-native/config.json`, and the `PI_AGENT_BROWSER_CONFIG` override, including the `pi-agent-browser-config` helper for redacted setup/status and conservative browser profile hints.
|
|
10
|
+
- Added the optional Brave-backed `agent_browser_web_search` companion tool, registered only when a usable Brave credential source is configured or resolvable, with compact normalized results for current/live web information.
|
|
11
|
+
|
|
12
|
+
### Changed
|
|
13
|
+
|
|
14
|
+
- Documented optional web-search setup, config precedence, credential safety, and browser profile guidance across the README, command reference, tool contract, architecture notes, and support matrix.
|
|
15
|
+
|
|
16
|
+
### Fixed
|
|
17
|
+
|
|
18
|
+
- Hardened Brave credential handling so project-local config only accepts exact inert `$ENV_VAR` or `${ENV_VAR}` references, rejects plaintext/malformed/interpolation-literal/command-backed secrets, and keeps raw or entity-encoded API keys out of tool output and errors.
|
|
19
|
+
- Cleaned Brave result text by decoding common HTML entities while stripping decoded HTML tags safely and preserving placeholder text such as `<version>`.
|
|
20
|
+
|
|
21
|
+
## 0.2.39 - 2026-06-02
|
|
22
|
+
|
|
23
|
+
### Added
|
|
24
|
+
|
|
25
|
+
- Added a Crabbox-backed platform smoke gate for release validation across macOS, prepared Ubuntu Linux, and native Windows, including packed-package installation and deterministic browser dogfood suites.
|
|
26
|
+
|
|
27
|
+
### Changed
|
|
28
|
+
|
|
29
|
+
- Updated the upstream capability baseline, command reference, platform smoke images, and live-contract metadata for `agent-browser` `0.27.1`.
|
|
30
|
+
- Reduced per-target platform smoke cost by using a focused `verify -- platform-target` gate inside Crabbox targets instead of rerunning the full default verification suite on every OS.
|
|
31
|
+
|
|
5
32
|
## 0.2.38 - 2026-05-29
|
|
6
33
|
|
|
7
34
|
### Changed
|
package/README.md
CHANGED
|
@@ -146,6 +146,48 @@ The doctor checks:
|
|
|
146
146
|
|
|
147
147
|
It does **not** edit Pi settings and does **not** run upstream `agent-browser doctor --fix`.
|
|
148
148
|
|
|
149
|
+
## Optional package config and web search
|
|
150
|
+
|
|
151
|
+
`pi-agent-browser-native` also reads package-owned config under Pi-scoped paths:
|
|
152
|
+
|
|
153
|
+
- global user config: `~/.pi/config/pi-agent-browser-native/config.json`
|
|
154
|
+
- project config: `.pi/config/pi-agent-browser-native/config.json`
|
|
155
|
+
- explicit override: `PI_AGENT_BROWSER_CONFIG=/path/to/config.json`
|
|
156
|
+
|
|
157
|
+
Use the setup helper to inspect or write config:
|
|
158
|
+
|
|
159
|
+
```bash
|
|
160
|
+
pi-agent-browser-config paths
|
|
161
|
+
pi-agent-browser-config show
|
|
162
|
+
```
|
|
163
|
+
|
|
164
|
+
The optional `agent_browser_web_search` companion tool is Brave-only today and is registered only when a usable Brave Search credential source is configured or resolvable. It is not an `agent_browser` input mode and does not launch a browser; agents may use it whenever current/live external web information helps, then use `agent_browser` when they need page interaction, screenshots, authenticated/profile content, or DOM inspection.
|
|
165
|
+
|
|
166
|
+
Get a Brave Search API key from the [Brave Search API dashboard](https://api-dashboard.search.brave.com/). Brave currently advertises free monthly credits for Search API usage, which is usually ample for light personal agent/dogfood use; confirm current pricing and limits on Brave's dashboard before relying on it for heavier workflows.
|
|
167
|
+
|
|
168
|
+
Supported setup examples:
|
|
169
|
+
|
|
170
|
+
```bash
|
|
171
|
+
# Store a plaintext key in global Pi-scoped user config; output stays redacted.
|
|
172
|
+
printf '%s' "$BRAVE_API_KEY" | pi-agent-browser-config web-search set-key --stdin
|
|
173
|
+
|
|
174
|
+
# Store an env-var reference, useful for project config.
|
|
175
|
+
pi-agent-browser-config web-search set-env BRAVE_API_KEY --project
|
|
176
|
+
|
|
177
|
+
# Store a global secret-manager command source.
|
|
178
|
+
pi-agent-browser-config web-search set-command "op read 'op://Private/Brave Search/API Key'" --global
|
|
179
|
+
```
|
|
180
|
+
|
|
181
|
+
Project-local plaintext, interpolation-literal, malformed, and command-backed Brave keys are refused; use an exact environment reference such as `$BRAVE_API_KEY` or `${BRAVE_API_KEY}` for `.pi/config/pi-agent-browser-native/config.json`. The tool content, details, status output, and docs examples must not expose the resolved key.
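The project-local rule above can be pictured with a small TypeScript sketch. It is only an illustration under stated assumptions: the helper names `parseExactEnvReference` and `isAllowedProjectBraveKey` and the identifier pattern are hypothetical, and the package's real validation may differ.

```typescript
// Hypothetical helpers, not the package's actual code.
// Accept only an exact `$NAME` or `${NAME}` reference with nothing before or after it.
const EXACT_ENV_REFERENCE =
  /^\$(?:([A-Za-z_][A-Za-z0-9_]*)|\{([A-Za-z_][A-Za-z0-9_]*)\})$/;

// Returns the referenced variable name, or undefined when the value is not an exact reference.
function parseExactEnvReference(value: string): string | undefined {
  const match = EXACT_ENV_REFERENCE.exec(value);
  return match ? (match[1] ?? match[2]) : undefined;
}

// Plaintext keys, mixed literals such as "prefix-$KEY", malformed references,
// and "!command" sources all fail this check for project-local config.
function isAllowedProjectBraveKey(value: string): boolean {
  return parseExactEnvReference(value) !== undefined;
}
```

Anchoring the pattern at both ends is what separates an inert reference from an interpolation literal: any extra character around the reference makes the value something other than a pure pointer to an environment variable.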
|
|
182
|
+
|
|
183
|
+
The same config file can record conservative browser defaults such as a profile hint:
|
|
184
|
+
|
|
185
|
+
```bash
|
|
186
|
+
pi-agent-browser-config browser profile set Default --policy authenticated-only
|
|
187
|
+
```
|
|
188
|
+
|
|
189
|
+
This adds agent guidance for signed-in/account-specific tasks; current releases do not auto-inject `--profile` for every launch.
|
|
190
|
+
|
|
149
191
|
## Common agent calls
|
|
150
192
|
|
|
151
193
|
You usually prompt the agent in natural language. These JSON snippets show the exact native tool shape the agent should use.
|
|
@@ -340,7 +382,7 @@ For asynchronous exports, click first and then wait for the download:
|
|
|
340
382
|
{ "args": ["wait", "--download", "/tmp/report.csv"] }
|
|
341
383
|
```
|
|
342
384
|
|
|
343
|
-
When a user gives exact artifact paths for screenshots, recordings, downloads, PDFs, traces, or HAR files, use those paths or explicitly report why the artifact was unavailable; do not silently substitute a different path in the final report. With upstream `agent-browser 0.27.
|
|
385
|
+
When a user gives exact artifact paths for screenshots, recordings, downloads, PDFs, traces, or HAR files, use those paths or explicitly report why the artifact was unavailable; do not silently substitute a different path in the final report. With upstream `agent-browser 0.27.1`, treat `details.savedFilePath` as upstream-reported metadata and confirm `details.artifacts[].exists` before relying on the requested `wait --download <path>` file being present on disk.
|
|
344
386
|
|
|
345
387
|
For evidence-only screenshots or QA captures, branch on `details.artifactVerification` and `details.artifacts` before reporting PASS/FAIL; inline image attachments are optional when size limits allow—do not require vision review unless the user asked for visual inspection. If the latest prompt names exact required artifact paths, browser close can be blocked with `details.promptGuard` until those artifacts are saved and verified.
|
|
346
388
|
|
|
@@ -446,7 +488,7 @@ The full `npm run verify` gate runs:
|
|
|
446
488
|
- command-reference baseline checks
|
|
447
489
|
- live command-reference verification against the targeted installed upstream `agent-browser`
|
|
448
490
|
|
|
449
|
-
Step order and which subprocesses run live in [`scripts/project.mjs`](scripts/project.mjs); [`test/project-verify.test.ts`](test/project-verify.test.ts) locks default, `release`, `real-upstream`, `dogfood`, `package-pi`, and combined-docs orchestration so a gate cannot disappear accidentally. Run `npm run verify -- --help` for opt-in modes and supported passthrough flags.
|
|
491
|
+
Step order and which subprocesses run live in [`scripts/project.mjs`](scripts/project.mjs); [`test/project-verify.test.ts`](test/project-verify.test.ts) locks default, `release`, `real-upstream`, `dogfood`, `platform-target`, `platform-smoke`, `package-pi`, and combined-docs orchestration so a gate cannot disappear accidentally. Run `npm run verify -- --help` for opt-in modes and supported passthrough flags.
|
|
450
492
|
|
|
451
493
|
The deterministic agent-efficiency benchmark’s **standalone JSON/Markdown accounting run** is not part of default `npm run verify` (only `npm run verify -- benchmark` or `npm run benchmark:agent-browser` invokes the script). The full unit suite still exercises `test/agent-browser.efficiency-benchmark.test.ts`. Use the script before and after agent-facing abstractions to prove call-count, output-size, stale-ref, artifact, failure-category coverage, success-rate, and elapsed-time effects before changing the wrapper UX:
|
|
452
494
|
|
|
@@ -467,22 +509,34 @@ npm run verify -- real-upstream
|
|
|
467
509
|
|
|
468
510
|
That mode sets `PI_AGENT_BROWSER_REAL_UPSTREAM=1` and runs `test/agent-browser.real-upstream-contract.test.ts` against the real `agent-browser` on `PATH` (version must match the capability baseline). It covers inspection, skills, a broad core interaction and navigation matrix on localhost fixtures (including `batch` stdin and `pushstate`), plus `vitals`, network route/requests/HAR, diff snapshot/screenshot/url, trace/profiler, console/errors/highlight, stream enable/status/disable, `cookies set --curl`, a `react tree` missing-renderer path, and `wait --download` with the on-disk caveat documented in release notes. The harness uses a throwaway temp `HOME` and dedicated socket/screenshot directories so the run does not touch your normal browser profile paths. Browser-opening or credential-dependent families such as `inspect`, `dashboard`, `chat`, provider clouds, and OS clipboard flows stay in fake-upstream or manual validation unless a safe deterministic fixture is added. For prerequisites, isolation details, and troubleshooting, see [`docs/RELEASE.md`](docs/RELEASE.md#real-upstream-contract-validation).
|
|
469
511
|
|
|
470
|
-
A deterministic live-browser wrapper smoke is available without an LLM choosing tool calls:
|
|
512
|
+
A deterministic host-only live-browser wrapper smoke is available without an LLM choosing tool calls:
|
|
471
513
|
|
|
472
514
|
```bash
|
|
473
515
|
npm run verify -- dogfood
|
|
474
516
|
```
|
|
475
517
|
|
|
476
|
-
That mode drives the native wrapper through top-level `qa`, `semanticAction`,
|
|
518
|
+
That mode drives the native wrapper through top-level `qa`, `semanticAction`, constrained `job`, screenshot artifact verification, and session close against a deterministic local fixture. It complements, but does not replace, the interactive Pi/tmux release dogfood in [`docs/RELEASE.md`](docs/RELEASE.md#pre-release-checks).
|
|
519
|
+
|
|
520
|
+
Cross-platform release coverage uses Crabbox to run macOS, Ubuntu Linux, and native Windows target suites:
|
|
521
|
+
|
|
522
|
+
```bash
|
|
523
|
+
npm run check:platform-smoke
|
|
524
|
+
npm run smoke:platform:ubuntu-image
|
|
525
|
+
npm run smoke:platform:all
|
|
526
|
+
```
|
|
527
|
+
|
|
528
|
+
The required matrix is documented in [`docs/platform-smoke.md`](docs/platform-smoke.md). It runs `platform-build` (fast target-local verify, pack, clean packed Pi install, `pi list`) and `browser-dogfood-smoke` (real `agent-browser`/browser wrapper smoke) on every target.
|
|
477
529
|
|
|
478
530
|
For package release confidence, follow [`docs/RELEASE.md`](docs/RELEASE.md). The release gate is:
|
|
479
531
|
|
|
480
532
|
```bash
|
|
481
533
|
npm run doctor
|
|
534
|
+
npm run check:platform-smoke
|
|
535
|
+
npm run smoke:platform:ubuntu-image
|
|
482
536
|
npm run verify -- release
|
|
483
537
|
```
|
|
484
538
|
|
|
485
|
-
`npm run verify -- release` includes the default verification gate
|
|
539
|
+
`npm run verify -- release` includes the default verification gate, packaged Pi smoke coverage, and the release-blocking Crabbox platform matrix. The package also has a `prepublishOnly` hook that runs the same release gate and `npm pack --dry-run` during `npm publish`.
|
|
486
540
|
|
|
487
541
|
## How it works
|
|
488
542
|
|
|
@@ -584,6 +638,7 @@ These calls return plain text and stay stateless: the extension does not inject
|
|
|
584
638
|
| `docs/ARCHITECTURE.md` | Design decisions and implementation structure |
|
|
585
639
|
| `docs/REQUIREMENTS.md` | Product requirements and constraints |
|
|
586
640
|
| `docs/RELEASE.md` | Release, package, and lifecycle verification workflow |
|
|
641
|
+
| `docs/platform-smoke.md` | Crabbox macOS, Ubuntu, and native Windows release gate |
|
|
587
642
|
| `docs/SUPPORT_MATRIX.md` | Current upstream support audit and release-readiness matrix |
|
|
588
643
|
| `test/` | Wrapper, runtime, presentation, lifecycle, and package tests |
|
|
589
644
|
|
package/docs/ARCHITECTURE.md
CHANGED
|
@@ -15,23 +15,28 @@ The package install path is the primary product path. Local checkout development
|
|
|
15
15
|
|
|
16
16
|
## Chosen shape
|
|
17
17
|
|
|
18
|
-
### One primary tool
|
|
18
|
+
### One primary tool plus optional companion search
|
|
19
19
|
|
|
20
|
-
V1
|
|
20
|
+
V1 exposes one native browser tool:
|
|
21
21
|
|
|
22
22
|
- `agent_browser`
|
|
23
23
|
|
|
24
|
+
It may also expose one optional companion tool:
|
|
25
|
+
|
|
26
|
+
- `agent_browser_web_search`, registered only when a Brave Search credential source is configured or resolvable
|
|
27
|
+
|
|
24
28
|
Why:
|
|
25
|
-
-
|
|
26
|
-
-
|
|
27
|
-
-
|
|
28
|
-
-
|
|
29
|
+
- keeps browser automation centered on `agent_browser`
|
|
30
|
+
- avoids colliding with generic `web_search`
|
|
31
|
+
- keeps live search separate from browser state, screenshots, refs, and session lifecycle
|
|
32
|
+
- preserves full upstream power without adding a generic provider abstraction
|
|
33
|
+
- keeps optional search invisible when it cannot run
|
|
29
34
|
|
|
30
35
|
### Direct subprocess execution
|
|
31
36
|
|
|
32
37
|
The extension should:
|
|
33
38
|
- resolve `agent-browser` from `PATH`
|
|
34
|
-
- invoke it directly,
|
|
39
|
+
- invoke it directly on POSIX; on Windows, route through PowerShell with single-quoted argv so npm launchers and the native `.exe` receive the same command tail that a user would type, and terminate the full PowerShell/agent-browser process tree with `taskkill /T /F` on timeout or abort before falling back to the direct child signal
|
|
35
40
|
- inject `--json`
|
|
36
41
|
- complete each upstream invocation when the direct `agent-browser` child exits even if Node delays `"close"`: piped stdio can stay referenced by longer-lived descendant processes, so `runAgentBrowserProcess` watches `exit` and `close` together, leaves stdio intact during a short post-`exit` grace so normal `close` can still win, destroys streams only when the post-`exit` fallback fires, and prefers `close` codes then wrapper timeout (`124`) over signal-shaped `exit` codes (`watchSpawnedChildCompletion` / `resolveSpawnedChildExitCode` in `extensions/agent-browser/lib/process.ts`) so the tool cannot hang after the CLI process has already terminated
|
|
37
42
|
- support optional stdin only for `eval --stdin`, `batch`, `auth save --password-stdin`, and wrapper-generated `batch` stdin from top-level `job`, `qa`, `sourceLookup`, or `networkSourceLookup`, rejecting other command/stdin combinations before launch; top-level `electron` never accepts caller `stdin` (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#electron))
|
|
@@ -53,6 +58,20 @@ That means:
|
|
|
53
58
|
- no manual user orchestration as the main workflow
|
|
54
59
|
- any future slash commands should be minimal and secondary
|
|
55
60
|
|
|
61
|
+
### Package-owned config
|
|
62
|
+
|
|
63
|
+
Pi docs use `settings.json` for package/resource loading and filtering, not arbitrary extension secrets. For user-tunable package behavior, this package owns Pi-scoped config files instead:
|
|
64
|
+
|
|
65
|
+
- global: `~/.pi/config/pi-agent-browser-native/config.json`
|
|
66
|
+
- project-local: `.pi/config/pi-agent-browser-native/config.json`
|
|
67
|
+
- explicit override: `PI_AGENT_BROWSER_CONFIG=/path/to/config.json`
|
|
68
|
+
|
|
69
|
+
Config layers merge in that order: global, project, override. The config reader accepts v1 fields for `webSearch.braveApiKey` and conservative browser defaults such as `browser.defaultProfile`. `webSearch.braveApiKey` follows Pi model/provider-style value resolution for trusted global/override config: literal values, `$ENV_VAR` / `${ENV_VAR}` interpolation, escapes (`$$`, `$!`), and leading `!command` resolved at request time. Project-local plaintext, interpolation-literal, malformed, and command-backed Brave keys are rejected because project config can be copied, committed, or supplied by a repository; project config should use inert exact `$ENV_VAR` or `${ENV_VAR}` sources only.
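To make the layer order concrete, here is a minimal TypeScript sketch of a global, project, override merge. The field names follow the documented example config, but `mergeConfigLayers` and its per-section merge semantics are assumptions for illustration, not the reader in `lib/config.ts`.

```typescript
// Illustrative shape only; mirrors the documented v1 fields.
interface PackageConfig {
  version?: number;
  webSearch?: { braveApiKey?: string };
  browser?: { defaultProfile?: { name?: string; policy?: string } };
}

// Assumed semantics: later layers win per field, and nested sections are
// merged field by field rather than replaced wholesale.
function mergeConfigLayers(...layers: Array<PackageConfig | undefined>): PackageConfig {
  const merged: PackageConfig = {};
  for (const layer of layers) {
    if (!layer) continue; // a missing config file contributes nothing
    if (layer.version !== undefined) merged.version = layer.version;
    if (layer.webSearch) merged.webSearch = { ...merged.webSearch, ...layer.webSearch };
    if (layer.browser?.defaultProfile) {
      merged.browser = {
        defaultProfile: { ...merged.browser?.defaultProfile, ...layer.browser.defaultProfile },
      };
    }
  }
  return merged;
}

// Documented order: global, then project, then the PI_AGENT_BROWSER_CONFIG override.
const effective = mergeConfigLayers(
  { version: 1, webSearch: { braveApiKey: "$GLOBAL_BRAVE_KEY" } },
  { browser: { defaultProfile: { name: "Default", policy: "authenticated-only" } } },
  { webSearch: { braveApiKey: "$OVERRIDE_BRAVE_KEY" } },
);
```

In this sketch the override layer replaces the Brave key source while the project layer's profile hint survives, which is the behavior the stated ordering suggests.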
|
|
70
|
+
|
|
71
|
+
`agent_browser_web_search` registration is conditional. Literal and env-backed sources must resolve at startup; command-backed sources are considered configured without running the command until tool execution, so secret managers do not slow startup or prompt unexpectedly. The tool resolves the key lazily, calls Brave Search, and returns compact result details without exposing the key.
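The startup decision described above might look roughly like the following TypeScript sketch. The `KeySource` union and `shouldRegisterWebSearch` are hypothetical names chosen for illustration; the environment is passed in as a plain record so the example stays self-contained.

```typescript
// Hypothetical classification of a trusted (global or override) key source.
type KeySource =
  | { kind: "literal"; value: string }
  | { kind: "env"; name: string }
  | { kind: "command"; command: string };

// Decide at startup whether the optional tool should be registered.
// Command sources are never executed here; they count as configured and are
// only resolved later, when the tool actually runs.
function shouldRegisterWebSearch(
  source: KeySource | undefined,
  env: Record<string, string | undefined>,
): boolean {
  if (!source) return false;
  switch (source.kind) {
    case "literal":
      return source.value.trim().length > 0;
    case "env":
      return (env[source.name] ?? "").trim().length > 0;
    case "command":
      return source.command.trim().length > 0;
  }
}
```

Keeping command execution out of this check is the design point: a secret manager prompt or a slow lookup cannot delay startup, at the cost of discovering a failing command only on first use.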
|
|
72
|
+
|
|
73
|
+
Browser default profile config is intentionally conservative. It can add prompt guidance for signed-in/account-specific tasks, but current releases do not auto-inject `--profile` into every launch. Automatic launch-default mutation would affect privacy and browser state, so it needs a separate explicit design and test pass.
|
|
74
|
+
|
|
56
75
|
### Prompt guidance budget
|
|
57
76
|
|
|
58
77
|
Runtime `promptGuidelines` are a Tier A budget, not a full manual. They stay short enough to load on every `agent_browser`-aware turn and carry only high-impact rules: input-mode choice, the open → snapshot → ref loop, launch-scoped session handling, artifact verification, structured `nextActions`, extraction basics, and hard safety boundaries such as “stop before order/post/purchase/submit.”
|
|
@@ -156,7 +175,7 @@ That failure should include a structured recovery hint pointing to `sessionMode:
|
|
|
156
175
|
Implementation detail lives in `extensions/agent-browser/lib/argv-descriptor.ts` and `extensions/agent-browser/lib/argv-grammar.ts` (command discovery, `VALUE_FLAGS`, `parseArgvDescriptor`) plus `extensions/agent-browser/lib/runtime.ts` (`getStartupScopedFlags`, `buildExecutionPlan`):
|
|
157
176
|
|
|
158
177
|
- **Command discovery:** Leading argv is scanned with a value-taking allowlist so known global flags and documented command flags consume their values before the upstream command word is identified. Missing-value prevalidation is intentionally limited to upstream global value flags; command-scoped flags and literal text are left to upstream parsing so values like `fill #field --password` are not rejected by wrapper heuristics before the CLI sees them. When upstream adds new global flags that take values ahead of the command, extend both the command-discovery and prevalidation allowlists; when it adds command-specific flags, extend only command discovery/redaction as needed. A smaller set of global boolean flags may be followed by an optional `true`/`false` literal; when present, that literal is consumed as the flag value before command discovery continues.
|
|
159
|
-
- **`--state` disambiguation:** Persisted browser `--state` before the command participates in launch-scoped validation and tab-correction hints. The same flag spelling after a `wait` command is excluded from startup-scoped detection so upstream help examples such as `wait @ref --state hidden` do not spuriously require `sessionMode: "fresh"` while an implicit session is active. As of upstream `agent-browser 0.27.
|
|
178
|
+
- **`--state` disambiguation:** Persisted browser `--state` before the command participates in launch-scoped validation and tab-correction hints. The same flag spelling after a `wait` command is excluded from startup-scoped detection so upstream help examples such as `wait @ref --state hidden` do not spuriously require `sessionMode: "fresh"` while an implicit session is active. As of upstream `agent-browser 0.27.1`, the parser does not implement those `wait --state` examples as distinct wait modes, so agent-facing docs recommend `wait --fn` predicates for disappearance checks instead.
|
|
160
179
|
- **`--auto-connect`:** Treated as launch-scoped only when enabled (`--auto-connect` bare or `true`). `--auto-connect false` is ignored for startup-scoped blocking so disabled attach hints do not force a fresh launch.
|
|
161
180
|
|
|
162
181
|
**Sessionless inspection and local commands:** Plain-text global help and version probes (`--help`, `-h`, `--version`, `-V`) must never allocate or bind the extension-managed session. The same session-ownership rule applies to read-only upstream `skills list`, `skills get …`, and `skills path …`, local auth profile management (`auth save/list/show/delete/remove`), plus local/setup surfaces such as `profiles`, `dashboard start/stop`, `device list`, `doctor`, `install`, `upgrade`, `session list`, and targeted/all local saved-state maintenance (`state list/show`, `state clear --all`, `state clear -a`, `state clear <session-name>`, `state clean --older-than <days>`, `state rename`). Non-plain-text sessionless commands still run with `--json` for machine-readable output, but the planner does not prepend the implicit managed `--session`, so an agent can inspect local capabilities or start/stop the standalone dashboard without consuming the implicit session slot before a real `open`. Browser-backed, context-dependent, or incomplete commands such as root `session`, untargeted `state clear`, bare `state clean`, `auth login`, `state save`, and `state load` keep normal managed-session injection. Command-shape allowlisting lives in `extensions/agent-browser/lib/command-policy.ts` (`needsManagedSession`), while `extensions/agent-browser/lib/runtime.ts` (`isPlainTextInspectionArgs`, `buildExecutionPlan`) applies that decision to execution planning.
|
package/docs/COMMAND_REFERENCE.md
CHANGED
|
@@ -18,7 +18,7 @@ This project intentionally blocks normal `agent-browser` bash usage in most agen
|
|
|
18
18
|
|
|
19
19
|
<!-- agent-browser-capability-baseline:start upstream-baseline -->
|
|
20
20
|
<!-- Generated from scripts/agent-browser-capability-baseline.mjs. Run `npm run docs -- command-reference write` to update. Do not edit manually. -->
|
|
21
|
-
This reference is baselined to the locally installed `agent-browser 0.27.
|
|
21
|
+
This reference is baselined to the locally installed `agent-browser 0.27.1` command/help surface, audited against vercel-labs/agent-browser@90050f2913159875e2c3719e424746396ccb3cbf. Upstream `agent-browser` remains the source of truth for command semantics; this file is the local fallback for Pi agent sessions where direct binary help is blocked or discouraged.
|
|
22
22
|
|
|
23
23
|
The lightweight drift check is `npm run verify -- command-reference`. Run it whenever the installed upstream `agent-browser` version changes or this reference is edited.
|
|
24
24
|
|
|
@@ -126,7 +126,7 @@ Use `vitals [url]` for Core Web Vitals plus React hydration timing when availabl
|
|
|
126
126
|
{ "args": ["pushstate", "/dashboard?tab=settings"] }
|
|
127
127
|
```
|
|
128
128
|
|
|
129
|
-
For first-navigation setup, start on `about:blank`, then stage routes, cookies, or init scripts before navigating. The relevant v0.27.
|
|
129
|
+
For first-navigation setup, start on `about:blank`, then stage routes, cookies, or init scripts before navigating. The relevant v0.27.1 surfaces are `network route <url> [--abort|--body <json>] [--resource-type <csv>]` and `cookies set --curl <file>`:
|
|
130
130
|
|
|
131
131
|
```json
|
|
132
132
|
{ "args": ["open"], "sessionMode": "fresh" }
|
|
@@ -330,7 +330,7 @@ For one-call flows, put the click and wait in `batch`; the wait step keeps the s
|
|
|
330
330
|
{ "args": ["batch"], "stdin": "[[\"click\",\"@export\"],[\"wait\",\"--download\",\"/tmp/report.csv\"]]" }
|
|
331
331
|
```
|
|
332
332
|
|
|
333
|
-
A successful wait-based download renders a readable summary such as `Download completed: /tmp/report.csv` and exposes top-level `details.savedFilePath` plus `details.savedFile` for non-batch calls. With the current upstream `agent-browser 0.27.
|
|
333
|
+
A successful wait-based download renders a readable summary such as `Download completed: /tmp/report.csv` and exposes top-level `details.savedFilePath` plus `details.savedFile` for non-batch calls. With the current upstream `agent-browser 0.27.1`, `wait --download <path>` may report the requested path before this environment can verify that the file was persisted there. Treat `details.savedFilePath` as upstream-reported metadata unless `details.artifacts[].exists` is true. Upstream tracking: [vercel-labs/agent-browser#1300](https://github.com/vercel-labs/agent-browser/issues/1300).
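A caller-side check for this caveat could be sketched in TypeScript as follows. Only `savedFilePath`, `artifacts`, and `exists` come from the text above; the `path` field on each artifact entry and the helper name `verifiedDownloadPath` are assumptions made for the example.

```typescript
// Assumed subset of the wrapper result `details`.
interface ArtifactEntry {
  path?: string; // assumed field name for the artifact's location
  exists?: boolean; // the wrapper's on-disk verification signal
}

interface DownloadDetails {
  savedFilePath?: string; // upstream-reported metadata, not proof of persistence
  artifacts?: ArtifactEntry[];
}

// Returns the requested path only when an artifact entry confirms it exists on disk.
function verifiedDownloadPath(
  details: DownloadDetails,
  requestedPath: string,
): string | undefined {
  const verified = (details.artifacts ?? []).some(
    (artifact) => artifact.path === requestedPath && artifact.exists === true,
  );
  return verified ? requestedPath : undefined;
}
```

A report built on this helper would treat an `undefined` result as "artifact unavailable" even when `savedFilePath` names the requested file.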
|
|
334
334
|
|
|
335
335
|
### Download, screenshot, and PDF files
|
|
336
336
|
|
|
@@ -598,7 +598,7 @@ When a snapshot is too large for inline output, the Pi wrapper renders a compact
|
|
|
598
598
|
| `wait --download [path]` | Wait for a download started by a previous action and optionally save it to `path`; successful wrapper results include upstream-reported `savedFilePath`/`savedFile`, while `details.artifacts[].exists` is the wrapper's on-disk verification signal. |
|
|
599
599
|
| `wait --download [path] --timeout <ms>` | Set download-start timeout in milliseconds. In the native Pi wrapper, use `25000` ms or less per call to stay under the upstream CLI IPC budget. |
|
|
600
600
|
|
|
601
|
-
Current v0.27.
|
|
601
|
+
Current v0.27.1 source does not parse `wait <selector> --state hidden` / `wait <selector> --state detached` as distinct wait modes even though upstream help mentions those examples. Use `wait --fn "!document.querySelector('#spinner')"` or another explicit JavaScript predicate for disappearance/detach checks until upstream parser support exists.
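As a small illustration of the recommended workaround, this TypeScript sketch builds the native tool payload for a disappearance check. The helper name `waitUntilGone` is hypothetical; the `args` shape and the `wait --fn` predicate follow the examples in this reference.

```typescript
// Hypothetical helper that builds an `agent_browser` call for "wait until the
// element is gone", using an explicit JavaScript predicate instead of `--state`.
function waitUntilGone(selector: string): { args: string[] } {
  // JSON.stringify quotes and escapes the selector so it is a valid JS string literal.
  const predicate = `!document.querySelector(${JSON.stringify(selector)})`;
  return { args: ["wait", "--fn", predicate] };
}

const spinnerGone = waitUntilGone("#spinner");
```

Quoting the selector through `JSON.stringify` avoids hand-escaping quotes inside the predicate when a selector contains attribute matchers.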
|
|
602
602
|
|
|
603
603
|
### Diff, debug, and streaming
|
|
604
604
|
|
|
@@ -607,7 +607,7 @@ Current v0.27.0 source does not parse `wait <selector> --state hidden` / `wait <
|
|
|
607
607
|
| `diff snapshot` | Compare current versus last snapshot. Use `diff snapshot --baseline <file> --selector <sel> --compact --depth <n>` when you need a saved baseline, scoped subtree, compact output, or depth bound. |
|
|
608
608
|
| `diff screenshot --baseline` | Compare current screenshot versus a baseline image. Use `diff screenshot --baseline <file> --output <file> --threshold <0-1> --selector <sel> --full` when you need a saved diff image, threshold tuning, element scope, or full-page capture. |
|
|
609
609
|
| `diff url <u1> <u2>` | Compare two pages. Use `diff url <u1> <u2> --screenshot --wait-until <strategy> --selector <sel> --compact --depth <n>` when you need screenshot comparison, navigation wait control, or scoped/compact snapshot comparison. |
|
|
610
|
-
| `trace start
|
|
610
|
+
| `trace start`, `trace stop [path]` | Record a Chrome DevTools trace. |
|
|
611
611
|
| `profiler start|stop [path]` | Record a Chrome DevTools profile. |
|
|
612
612
|
| `record start <path> [url]` | Start WebM video recording; output is written on `record stop`. Requires `ffmpeg` on `PATH` for the final encode. |
|
|
613
613
|
| `record stop` | Stop and save video. If this fails with `ffmpeg not found`, install `ffmpeg` / `ffmpeg-full` and rerun the recording. |
|
|
@@ -663,6 +663,48 @@ Long-running or lifecycle commands should be explicitly paired with cleanup call
|
|
|
663
663
|
|
|
664
664
|
When these commands are invoked through the native `agent_browser` tool, structured diagnostic/status outputs are rendered as compact summaries. Local inspection/setup calls (`auth save/list/show/delete/remove`, `dashboard start/stop`, `device list`, `doctor`, `install`, `upgrade`, `profiles`, `session list`, `state list/show/rename`, `state clean --older-than <days>`, `state clear --all`, `state clear -a`, and `state clear <session-name>`) are sessionless unless you explicitly pass `--session`; context-dependent calls such as root `session`, untargeted `state clear`, `auth login`, `chat`, and `state save/load` keep normal session behavior. List-like outputs such as sessions, Chrome profiles, auth profiles, network requests, console messages, and page errors include counts and key fields; large outputs are previewed with a `Full output path:` spill file instead of dumping the entire payload into context. For `network requests`, the wrapper shows a failed-request summary split into actionable versus benign low-impact rows, then status, method, URL, resource/mime type, request id, and, when the installed upstream output includes body-like fields, bounded redacted payload, response, and failure/error snippets. Safe request IDs also produce `details.nextActions` for exact request details, actionable failed-request source lookup candidates, filtered request lists, or starting HAR capture before a repro. `network request <requestId>` can expose upstream full-detail body fields such as response bodies using the same bounded model-facing preview; its request URL stays diagnostic-only and does not overwrite `details.sessionTabTarget` for later ref guards. Header, cookie, auth, token, and other secret-like fields are not expanded in model-facing text or `details.data`; command echoes also redact `--body`, `--headers`, `--password`, proxy credentials, auth-bearing URLs, cookie/storage values, and bearer/basic credential text in positional arguments. Use upstream HAR or full raw details only when complete data is required.
|
|
665
665
|
|
|
666
|
+
## Optional package config and companion web search
|
|
667
|
+
|
|
668
|
+
`pi-agent-browser-native` has package-owned config under Pi-scoped paths. This is separate from upstream `agent-browser` config and from Pi package settings:
|
|
669
|
+
|
|
670
|
+
- global: `~/.pi/config/pi-agent-browser-native/config.json`
|
|
671
|
+
- project-local: `.pi/config/pi-agent-browser-native/config.json`
|
|
672
|
+
- explicit override: `PI_AGENT_BROWSER_CONFIG=/path/to/config.json`
|
|
673
|
+
|
|
674
|
+
Get a Brave Search API key from the [Brave Search API dashboard](https://api-dashboard.search.brave.com/). Brave currently advertises free monthly credits for Search API usage, which is usually ample for light personal agent/dogfood use; confirm current pricing and limits on Brave's dashboard before relying on it for heavier workflows.
|
|
675
|
+
|
|
676
|
+
Inspect and write config with the package helper:
|
|
677
|
+
|
|
678
|
+
```bash
|
|
679
|
+
pi-agent-browser-config paths
|
|
680
|
+
pi-agent-browser-config show
|
|
681
|
+
pi-agent-browser-config web-search set-env BRAVE_API_KEY --project
|
|
682
|
+
pi-agent-browser-config web-search set-command "op read 'op://Private/Brave Search/API Key'" --global
|
|
683
|
+
printf '%s' "$BRAVE_API_KEY" | pi-agent-browser-config web-search set-key --stdin
|
|
684
|
+
pi-agent-browser-config browser profile set Default --policy authenticated-only
|
|
685
|
+
```
|
|
686
|
+
|
|
687
|
+
The optional `agent_browser_web_search` tool is registered only when a Brave Search credential source is configured or resolvable. It is a separate custom tool, not an `agent_browser` input mode, and does not launch a browser. Use it when current/live external web information would help; use `agent_browser` for browser interaction, screenshots, authenticated/profile pages, and DOM inspection. Project-local plaintext, interpolation-literal, malformed, and command-backed Brave keys are refused; use exact `$ENV_VAR` or `${ENV_VAR}` sources there.
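
The project-local rule above can be sketched as a small classifier. This is a minimal illustration under stated assumptions, not the package's implementation: the names `isExactEnvReference` and `checkProjectLocalKey` are hypothetical, and the accepted variable-name pattern (letters, digits, and underscores, not starting with a digit) is an assumption about what counts as an exact reference.

```typescript
// Hypothetical helper, not part of pi-agent-browser-native's API.
// Matches only when the whole value is a single `$NAME` or `${NAME}` reference.
// The variable-name character set is an assumption for this sketch.
const EXACT_ENV_REFERENCE =
  /^\$(?:[A-Za-z_][A-Za-z0-9_]*|\{[A-Za-z_][A-Za-z0-9_]*\})$/;

function isExactEnvReference(value: string): boolean {
  return EXACT_ENV_REFERENCE.test(value);
}

// Assumed classification for a project-local `webSearch.braveApiKey` value:
// anything that is not an exact env reference (plaintext, a reference embedded
// in other text, or a command-backed value) is refused.
function checkProjectLocalKey(value: string): "allowed" | "refused" {
  return isExactEnvReference(value) ? "allowed" : "refused";
}
```

Under these assumptions, `"$BRAVE_API_KEY"` and `"${BRAVE_API_KEY}"` are allowed, while a plaintext key or a value such as `"prefix-$BRAVE_API_KEY"` is refused because the reference is not the entire value.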
|
|
688
|
+
|
|
689
|
+
Example config:
|
|
690
|
+
|
|
691
|
+
```json
|
|
692
|
+
{
|
|
693
|
+
"version": 1,
|
|
694
|
+
"webSearch": {
|
|
695
|
+
"braveApiKey": "$BRAVE_API_KEY"
|
|
696
|
+
},
|
|
697
|
+
"browser": {
|
|
698
|
+
"defaultProfile": {
|
|
699
|
+
"name": "Default",
|
|
700
|
+
"policy": "authenticated-only"
|
|
701
|
+
}
|
|
702
|
+
}
|
|
703
|
+
}
|
|
704
|
+
```
|
|
705
|
+
|
|
706
|
+
Browser default profile config is conservative: it adds agent guidance for signed-in/account-specific tasks; current releases do not auto-inject `--profile` for every launch.
|
|
707
|
+
|
|
666
708
|
## Important global flags, config, and environment
|
|
667
709
|
|
|
668
710
|
### Authentication and session flags
|
|
@@ -750,14 +792,14 @@ Other useful environment variables include `AGENT_BROWSER_DEFAULT_TIMEOUT`, `AGE
|
|
|
750
792
|
<!-- agent-browser-capability-baseline:start capability-token-baseline -->
|
|
751
793
|
<!-- Generated from scripts/agent-browser-capability-baseline.mjs. Run `npm run docs -- command-reference write` to update. Do not edit manually. -->
|
|
752
794
|
<details>
|
|
753
|
-
<summary>Generated verifier capability baseline for agent-browser 0.27.
|
|
795
|
+
<summary>Generated verifier capability baseline for agent-browser 0.27.1</summary>
|
|
754
796
|
|
|
755
797
|
This generated block is review data for maintainers. The human-authored reference sections above remain the readable command guide.
|
|
756
798
|
|
|
757
799
|
#### Source evidence
|
|
758
800
|
- repository: `vercel-labs/agent-browser`
|
|
759
|
-
- upstream HEAD: `
|
|
760
|
-
- upstream package version: `0.27.
|
|
801
|
+
- upstream HEAD: `90050f2913159875e2c3719e424746396ccb3cbf`
|
|
802
|
+
- upstream package version: `0.27.1`
|
|
761
803
|
- inspected: `agent-browser --version`
|
|
762
804
|
- inspected: `agent-browser --help`
|
|
763
805
|
- inspected: `selected agent-browser <command> --help output`
|
|
@@ -824,7 +866,7 @@ This generated block is review data for maintainers. The human-authored referenc
|
|
|
824
866
|
- Built-in skills: 13 human-doc token(s), 13 upstream token(s)
|
|
825
867
|
- Core page, element, navigation, and extraction commands: 74 human-doc token(s), 74 upstream token(s)
|
|
826
868
|
- Sessions, state, tabs, frames, dialogs, and windows: 20 human-doc token(s), 16 upstream token(s)
|
|
827
|
-
- Network, storage, artifacts, diagnostics, and performance:
|
|
869
|
+
- Network, storage, artifacts, diagnostics, and performance: 43 human-doc token(s), 53 upstream token(s)
|
|
828
870
|
- Batch, auth, confirmations, setup, dashboard, devices, and AI commands: 24 human-doc token(s), 24 upstream token(s)
|
|
829
871
|
- Global flags, config, providers, policy, and environment: 117 human-doc token(s), 90 upstream token(s)
|
|
830
872
|
|
|
@@ -960,7 +1002,8 @@ This generated block is review data for maintainers. The human-authored referenc
|
|
|
960
1002
|
- `diff screenshot --baseline <file> --output <file> --threshold <0-1> --selector <sel> --full`
|
|
961
1003
|
- `diff url <u1> <u2>`
|
|
962
1004
|
- `diff url <u1> <u2> --screenshot --wait-until <strategy> --selector <sel> --compact --depth <n>`
|
|
963
|
-
- `trace start
|
|
1005
|
+
- `trace start`
|
|
1006
|
+
- `trace stop [path]`
|
|
964
1007
|
- `profiler start|stop [path]`
|
|
965
1008
|
- `record start <path> [url]`
|
|
966
1009
|
- `record restart <path> [url]`
|
|
@@ -1252,7 +1295,8 @@ This generated block is review data for maintainers. The human-authored referenc
|
|
|
1252
1295
|
- root help: `storage <local|session>`
|
|
1253
1296
|
- root help: `diff snapshot`
|
|
1254
1297
|
- root help: `diff screenshot --baseline`
|
|
1255
|
-
- root help: `trace start
|
|
1298
|
+
- root help: `trace start`
|
|
1299
|
+
- root help: `trace stop [path]`
|
|
1256
1300
|
- root help: `profiler start|stop [path]`
|
|
1257
1301
|
- root help: `record start <path> [url]`
|
|
1258
1302
|
- root help: `record stop`
|
|
@@ -1288,7 +1332,8 @@ This generated block is review data for maintainers. The human-authored referenc
|
|
|
1288
1332
|
- diff help: `--threshold <0-1>`
|
|
1289
1333
|
- diff help: `--wait-until <strategy>`
|
|
1290
1334
|
- diff help: `diff screenshot --baseline <f>`
|
|
1291
|
-
- trace help: `trace
|
|
1335
|
+
- trace help: `trace start`
|
|
1336
|
+
- trace help: `trace stop [path]`
|
|
1292
1337
|
- profiler help: `--categories <list>`
|
|
1293
1338
|
- record help: `record restart <path.webm> [url]`
|
|
1294
1339
|
- console help: `--clear`
|
package/docs/RELEASE.md
CHANGED
|
@@ -6,6 +6,7 @@ Related docs:
|
|
|
6
6
|
- [`ARCHITECTURE.md`](ARCHITECTURE.md)
|
|
7
7
|
- [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md)
|
|
8
8
|
- [`ELECTRON.md`](ELECTRON.md)
|
|
9
|
+
- [`platform-smoke.md`](platform-smoke.md)
|
|
9
10
|
- [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md)
|
|
10
11
|
- Bounded `agent_browser` outcome metadata on `details` (`resultCategory`, `successCategory`, `failureCategory`, optional `nextActions`, optional `pageChangeSummary` with per-step summaries on `batch`): contract in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); maintainer checklists under “Tool result categories” and “Page-change summaries” in [`../AGENTS.md`](../AGENTS.md)
|
|
11
12
|
- Post-success `get text` selector visibility (`RQ-0074`): optional `details.selectorTextVisibility` / `selectorTextVisibilityAll`, visible warnings, and `inspect-visible-text-candidates*` next actions after read-only visibility probes—[`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md), [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details), and [`../AGENTS.md`](../AGENTS.md) maintainer checklist
|
|
@@ -23,6 +24,8 @@ From the repository root:
|
|
|
23
24
|
```bash
|
|
24
25
|
npm install
|
|
25
26
|
npm run doctor
|
|
27
|
+
npm run check:platform-smoke
|
|
28
|
+
npm run smoke:platform:ubuntu-image
|
|
26
29
|
npm run verify -- release
|
|
27
30
|
```
|
|
28
31
|
|
|
@@ -32,20 +35,29 @@ npm run verify -- release
|
|
|
32
35
|
|
|
33
36
|
1. `npm run verify` for generated playbook drift, TypeScript, unit/fake coverage, command-reference generated-block drift, and live command-reference verification against the targeted upstream on `PATH`
|
|
34
37
|
2. `npm run verify -- package-pi`, which first validates package contents via `npm pack --json --dry-run` and then smoke-loads the packed package in Pi isolation
|
|
38
|
+
3. `npm run smoke:platform:doctor` and the full Crabbox matrix from [`platform-smoke.md`](platform-smoke.md): macOS SSH, Ubuntu local-container, and native Windows Parallels targets running fast target-local `platform-build` plus `browser-dogfood-smoke`
|
|
35
39
|
|
|
36
|
-
`npm publish` runs npm’s `prepublishOnly` script from `package.json`, which executes the same `npm run verify -- release` gate and then `npm pack --dry-run`. That concatenated gate is everything in the default `npm run verify` step (generated playbook drift, TypeScript, the unit/fake suite, generated command-reference blocks, and live upstream command-reference sampling against the targeted `agent-browser` on `PATH`) plus the packaged Pi smoke in `package-pi
|
|
40
|
+
`npm publish` runs npm’s `prepublishOnly` script from `package.json`, which executes the same `npm run verify -- release` gate and then `npm pack --dry-run`. That combined gate covers everything in the default `npm run verify` step (generated playbook drift, TypeScript, the unit/fake suite, generated command-reference blocks, and live upstream command-reference sampling against the targeted `agent-browser` on `PATH`) plus the packaged Pi smoke in `package-pi` and the release-blocking Crabbox platform matrix. Using `npm publish --ignore-scripts` skips that contract intentionally.
|
|
37
41
|
|
|
38
|
-
`prepublishOnly` intentionally does **not** run `npm run verify -- lifecycle`, `npm run verify -- real-upstream`, `npm run verify -- dogfood`, or `npm run verify -- benchmark
|
|
42
|
+
`prepublishOnly` intentionally does **not** run the standalone host-only `npm run verify -- lifecycle`, `npm run verify -- real-upstream`, `npm run verify -- dogfood`, or `npm run verify -- benchmark` modes; those remain separate `npm run verify` modes in [`scripts/project.mjs`](../scripts/project.mjs). The platform matrix includes its own fast target-local build/package gate and browser dogfood suite, and is automated through the `release` slice.
|
|
39
43
|
|
|
40
|
-
For a deterministic real-browser wrapper smoke without model choice in the loop, run:
|
|
44
|
+
For a deterministic host-only real-browser wrapper smoke without model choice in the loop, run:
|
|
41
45
|
|
|
42
46
|
```bash
|
|
43
47
|
npm run verify -- dogfood
|
|
44
48
|
```
|
|
45
49
|
|
|
46
|
-
|
|
50
|
+
For direct Crabbox diagnostics outside the full release compose, run:
|
|
47
51
|
|
|
48
|
-
|
|
52
|
+
```bash
|
|
53
|
+
npm run smoke:platform:doctor
|
|
54
|
+
npm run smoke:platform:ubuntu-image
|
|
55
|
+
npm run smoke:platform:all
|
|
56
|
+
```
|
|
57
|
+
|
|
58
|
+
The `npm run verify -- dogfood` mode uses the extension harness and the real `agent-browser` on `PATH` against a deterministic local file fixture, then verifies top-level `qa`, `semanticAction`, constrained `job`, screenshot artifact verification, and session close. Use `npm run verify -- dogfood --keep-artifacts` or `--artifact-dir <path>` only while debugging, then delete retained screenshots. This smoke complements, but does not replace, human-readable interactive transcript evidence.
|
|
59
|
+
|
|
60
|
+
Every release also requires interactive `tmux`-driven Pi dogfood with the native `agent_browser` tool against real sites. For extension-focused release smokes, use `pi --no-extensions --no-skills -e .` from the checkout before publishing so auto-loaded dogfood/QA skills cannot replace the bounded smoke workflow; run separate skill-enabled dogfood only when validating skill routing or report-generation behavior. Drive prompts with `tmux send-keys`, exercise at least one simple static site and one real documentation/product site, include the higher-level `qa` or `job`/`batch` surfaces when they changed, close every opened browser session, remove screenshots/temp artifacts, and record the outcome in the release notes or support-matrix evidence. Automated localhost, fake-upstream, and deterministic dogfood gates do not replace this human-readable live-site transcript evidence. When `agent_browser_web_search` or package config changed, add one key-free smoke proving the optional tool is absent without config, one fake/unit-backed smoke in the default suite, and one opt-in live Brave Search check with a real key while confirming the key does not appear in transcripts, stdout/stderr, config status, PR text, or artifacts. When `electron.*` surfaces, attached-session diagnostics, or `qa.attached` changed, add a local Electron pass: `electron.list` → `electron.launch` (expect isolated profile behavior) → `snapshot -i` or `electron.probe` / `qa.attached` → `electron.cleanup` with the returned `launchId`, verifying status/mismatch guidance if you simulate a dead renderer or stale refs. For dense-dashboard stress coverage, use the [public Grafana stress checklist](#public-grafana-stress-checklist) below; it is a maintainer workflow, not a bundled product skill or recipe runtime.
|
|
49
61
|
|
|
50
62
|
When reviewing saved session JSONL after a failed smoke or a `qa` preset that reclassified an upstream-successful batch, expect `agent_browser` tool rows to carry `isError: true` whenever `details.resultCategory` is `failure`. For normal prose output, model-visible text should end with a `Pi tool isError: true` category line; for caller-requested `--json` output, the hook preserves parseable JSON and only patches `isError`. The extension applies that patch on the `tool_result` path so Pi’s transcript matches the wrapper contract ([`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details)). Preserve a normal Pi session directory for those checks; avoiding `--no-session` keeps this evidence intact ([`AGENTS.md`](../AGENTS.md) preferred validation workflow).
|
|
51
63
|
|
|
@@ -145,7 +157,8 @@ Maintainer constraints for evolving scenarios and version bumps are summarized u
|
|
|
145
157
|
`npm run verify -- package-pi` runs the same package-content checks and additionally confirms that:
|
|
146
158
|
|
|
147
159
|
- the packed package can be loaded through Pi SDK resource loading with the same isolation principle as `pi --no-extensions -e <package-source>`
|
|
148
|
-
-
|
|
160
|
+
- `agent_browser` is registered without requiring optional Brave config
|
|
161
|
+
- any optional companion tools remain governed by their own configuration gates
|
|
149
162
|
- the registered `agent_browser` source resolves inside the extracted packed package path, not the working checkout
|
|
150
163
|
- the packaged `agent_browser` tool can be executed through Pi's loaded native tool definition with a deterministic fake upstream `agent-browser --version` binary
|
|
151
164
|
|
|
@@ -260,7 +273,7 @@ The default unit suite also runs `agentBrowserExtension passes through core comm
|
|
|
260
273
|
- **Missing or extra `details` / `data` keys:** Update `test/fixtures/agent-browser-real-output-shapes.json` in the same change as the wrapper or presentation code that shifts those keys.
|
|
261
274
|
- **Timeouts:** A 120s bound covers the full matrix; repeated timeouts usually mean a hung browser, blocked loopback, or an environment preventing headful/headless launch—check upstream logs and local security tooling before loosening timeouts.
|
|
262
275
|
|
|
263
|
-
The current upstream `agent-browser 0.27.
|
|
276
|
+
The current upstream `agent-browser 0.27.1` `wait --download <path>` saveAs persistence limitation is tracked at [vercel-labs/agent-browser#1300](https://github.com/vercel-labs/agent-browser/issues/1300); until it is fixed, release validation must treat `details.savedFilePath` as upstream-reported metadata and use `details.artifacts[].exists` as the filesystem truth (the contract asserts the requested path is absent on disk while upstream still reports success). If the suite fails because JSON/detail keys drifted, update the wrapper behavior or refresh `test/fixtures/agent-browser-real-output-shapes.json` together with the presentation work that consumes those shapes.
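
The distinction between upstream-reported metadata and filesystem truth can be sketched as a small check. This is an illustration only: the `DownloadDetails` and `ArtifactEntry` shapes are hypothetical, modeling just `savedFilePath` and `artifacts[].exists` from the text above, and the `path` field used to match an artifact to the requested file is an assumption.

```typescript
// Hypothetical subset of the `details` payload. Only `savedFilePath` and
// `artifacts[].exists` come from the text; `path` is assumed for matching.
interface ArtifactEntry {
  path: string;
  exists: boolean;
}

interface DownloadDetails {
  savedFilePath?: string;
  artifacts?: ArtifactEntry[];
}

// Treat `savedFilePath` as a claim made by upstream, and `artifacts[].exists`
// as the answer to whether the file is actually on disk.
function downloadPersisted(details: DownloadDetails): boolean {
  if (!details.savedFilePath) return false;
  const entry = details.artifacts?.find(
    (artifact) => artifact.path === details.savedFilePath,
  );
  return entry?.exists === true;
}
```

With this shape, a result that reports `savedFilePath` but whose matching artifact has `exists: false` is treated as not persisted, which mirrors the limitation described above.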
|
|
264
277
|
|
|
265
278
|
Example smoke prompt:
|
|
266
279
|
|
|
@@ -280,7 +293,7 @@ Recommended configured-source lifecycle follow-up:
|
|
|
280
293
|
|
|
281
294
|
## Post-publish install validation
|
|
282
295
|
|
|
283
|
-
After publishing a release, validate the package-first path in isolation. `npm run verify -- release` includes the deterministic fake-binary packaged execution gate, but it does not replace a real-browser installed-package smoke:
|
|
296
|
+
After publishing a release, validate the package-first path in isolation. `npm run verify -- release` includes the deterministic fake-binary packaged execution gate and the pre-publish Crabbox platform matrix, but it does not replace a real-browser installed-package smoke against the published npm package:
|
|
284
297
|
|
|
285
298
|
```bash
|
|
286
299
|
npm exec --package pi-agent-browser-native -- pi-agent-browser-doctor
|
|
@@ -309,7 +322,7 @@ Before publishing:
|
|
|
309
322
|
- run `npm run verify -- real-upstream` for upstream runtime, result-presentation, or managed-session changes
|
|
310
323
|
- confirm both local-checkout modes still work for pre-release validation: isolated `pi --no-extensions -e .` smoke testing for general checkout loading (add `--no-skills` for extension-focused bounded smokes) and configured-source lifecycle validation
|
|
311
324
|
- complete interactive `tmux` live-site extension smoke with `pi --no-extensions --no-skills -e .` and the native `agent_browser` tool (at least one simple static site and one real documentation/product site; include `qa` or `job`/`batch` when those surfaces changed; use the [public Grafana stress checklist](#public-grafana-stress-checklist) when dashboard/diagnostic/artifact behavior changed; close sessions and remove screenshots/temp artifacts; record evidence). Run separate skill-enabled dogfood only when validating skill routing/report-generation behavior—see [Pre-release checks](#pre-release-checks); automated gates are not a substitute
|
|
312
|
-
- rerun `npm run verify -- release`
|
|
325
|
+
- rerun `npm run verify -- release` and confirm the embedded Crabbox `platform-build` plus `browser-dogfood-smoke` matrix passed on `macos`, `ubuntu`, and `windows-native` with artifacts under `.artifacts/platform-smoke/`
|
|
313
326
|
- run `npm run verify -- lifecycle` for configured-source `/reload`, exact `--session-id` relaunch, managed-session continuity, persisted-spill, and Pi failure-patch regression coverage (required before publish; see [Pre-release checks](#pre-release-checks))
|
|
314
327
|
- confirm [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md) still maps every current baseline inventory section to docs, runtime handling, tests, and validation status
|
|
315
328
|
- manually exercise real-browser `/reload` and full restart plus exact `--session-id` relaunch or `/resume` continuity when release risk warrants browser-level confidence beyond the fake upstream harness
|
package/docs/SUPPORT_MATRIX.md
CHANGED
|
@@ -7,6 +7,7 @@ Related docs:
|
|
|
7
7
|
- [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md)
|
|
8
8
|
- [`ELECTRON.md`](ELECTRON.md)
|
|
9
9
|
- [`RELEASE.md`](RELEASE.md)
|
|
10
|
+
- [`platform-smoke.md`](platform-smoke.md)
|
|
10
11
|
- [`REQUIREMENTS.md`](REQUIREMENTS.md)
|
|
11
12
|
|
|
12
13
|
## Purpose
|
|
@@ -25,11 +26,11 @@ When upstream ships a new `agent-browser` or the inventory changes:
|
|
|
25
26
|
|
|
26
27
|
## Audit result
|
|
27
28
|
|
|
28
|
-
- Target upstream: `agent-browser 0.27.
|
|
29
|
+
- Target upstream: `agent-browser 0.27.1` (must match `CAPABILITY_BASELINE.targetVersion` in [`scripts/agent-browser-capability-baseline.mjs`](../scripts/agent-browser-capability-baseline.mjs)).
|
|
29
30
|
- Source of truth: `CAPABILITY_BASELINE.inventorySections` in the same file (stable `id` keys: `skills`, `core-commands`, `state-tabs-frames-dialogs`, `network-storage-artifacts-diagnostics`, `batch-auth-setup-ai`, `options-and-env`).
|
|
30
31
|
- Status: supported for the current wrapper contract after the 2026-05-26 all-command audit.
|
|
31
|
-
- High-priority support gaps: 2026-05-26 audit found sessionless local commands and command-scoped value flags needed sharper wrapper handling; runtime/tests/docs now cover those paths. Remaining upstream-owned caveat: `agent-browser 0.27.
|
|
32
|
-
- Post-`v0.2.29` review state: commits `eb55320` through `86abbfb` add browser guidance/smoke coverage plus `RQ-0086` click-probe reduction, `RQ-0087` same-snapshot form fill batching, `RQ-0088` current-ref fallback on locator misses, `RQ-0089` direct-upstream click mutation investigation, and `RQ-0090` stop-boundary/artifact-path guidance. Verification gates below were rerun on 2026-05-18 after those tasks landed. Constrained `job` (`RQ-0064`), the lightweight `qa` preset (`RQ-0065`), the experimental `sourceLookup` helper (`RQ-0066`),
|
|
32
|
+
- High-priority support gaps: 2026-05-26 audit found sessionless local commands and command-scoped value flags needed sharper wrapper handling; runtime/tests/docs now cover those paths. Remaining upstream-owned caveat: `agent-browser 0.27.1` help mentions `wait <selector> --state hidden`, but source parsing does not implement that distinct wait mode, so wrapper docs steer agents to `wait --fn` predicates.
|
|
33
|
+
- Post-`v0.2.29` review state: commits `eb55320` through `86abbfb` add browser guidance/smoke coverage plus `RQ-0086` click-probe reduction, `RQ-0087` same-snapshot form fill batching, `RQ-0088` current-ref fallback on locator misses, `RQ-0089` direct-upstream click mutation investigation, and `RQ-0090` stop-boundary/artifact-path guidance. Verification gates below were rerun on 2026-05-18 after those tasks landed. Constrained `job` (`RQ-0064`), the lightweight `qa` preset (`RQ-0065`), the experimental `sourceLookup` helper (`RQ-0066`), the experimental `networkSourceLookup` helper (`RQ-0067`), and optional Brave-backed `agent_browser_web_search` with Pi-scoped package config (`RQ-0121`) are implemented; see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#job), [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#qa), [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#sourcelookup), [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#networksourcelookup), and [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#optional-companion-web-search). Reusable browser recipes (`RQ-0068`) are intentionally not adopted as a runtime surface; see [`ARCHITECTURE.md`](ARCHITECTURE.md#no-reusable-recipe-layer-yet).
|
|
33
34
|
|
|
34
35
|
## Open UX/reliability follow-ups from 2026-05-29 agent feedback
|
|
35
36
|
|
|
@@ -57,14 +58,15 @@ Re-run the gates below before each release; this table records what the closure
|
|
|
57
58
|
|
|
58
59
|
| Gate | Evidence | Status |
|
|
59
60
|
| --- | --- | --- |
|
|
60
|
-
| Default local gate | `npm run verify` checks generated playbook drift, `tsc --noEmit`, unit/fake tests, generated command-reference blocks, and live command-reference sampling. | Pass on 2026-
|
|
61
|
-
| Real upstream contract | `npm run verify -- real-upstream` runs the localhost fixture matrix against the real installed `agent-browser` matching the baseline. | Pass on 2026-
|
|
62
|
-
| Packaged Pi smoke | `npm run verify -- package-pi` validates package contents, loads
|
|
63
|
-
| Deterministic dogfood smoke | `npm run verify -- dogfood` (`scripts/verify-agent-browser-dogfood.ts`) drives the native wrapper against
|
|
61
|
+
| Default local gate | `npm run verify` checks generated playbook drift, `tsc --noEmit`, unit/fake tests, generated command-reference blocks, and live command-reference sampling. | Pass on 2026-06-02 as part of `npm run verify -- release` (`agent-browser 0.27.1` on `PATH`). |
|
|
62
|
+
| Real upstream contract | `npm run verify -- real-upstream` runs the localhost fixture matrix against the real installed `agent-browser` matching the baseline. | Pass on 2026-06-02 (`npm run verify -- real-upstream`, `agent-browser 0.27.1` on `PATH`; updated Web Vitals shape assertions for upstream 0.27.1 structured output). |
|
|
63
|
+
| Packaged Pi smoke | `npm run verify -- package-pi` validates package contents, loads the packaged `agent_browser` tool without requiring optional Brave config, and executes fake-upstream `--version`. | Pass on 2026-05-29 (`npm run verify -- package-pi`). |
|
|
64
|
+
| Deterministic dogfood smoke | `npm run verify -- dogfood` (`scripts/verify-agent-browser-dogfood.ts`) drives the native wrapper against a local file fixture through top-level `qa`, `semanticAction`, constrained `job`, screenshot artifact verification, and session close with the real `agent-browser` on `PATH`. | Pass on 2026-06-02 (`npm run verify -- dogfood`, `agent-browser 0.27.1`; artifacts cleaned by the harness). |
|
|
64
65
|
| Efficiency benchmark | `npm run verify -- benchmark` runs deterministic browser workflow accounting plus focused benchmark tests, including JSONL sampling fixtures and job/qa/sourceLookup/networkSourceLookup/Electron scenario coverage. | Pass on 2026-05-29 (`npm run verify -- benchmark`). |
|
|
65
|
-
|
|
|
66
|
+
| Crabbox platform smoke | `npm run check:platform-smoke` syntax-checks the harness and cheap invariants. `npm run smoke:platform:all` runs doctor first, then fast target-local `platform-build` (`npm run verify -- platform-target`, pack, clean Pi install) plus `browser-dogfood-smoke` on Crabbox `macos`, `ubuntu`, and `windows-native`; see [`platform-smoke.md`](platform-smoke.md). | Pass on 2026-06-02 (`npm run check:platform-smoke`, `npm run smoke:platform:ubuntu-image`, and `npm run smoke:platform:all`; artifacts cleaned after evidence capture). |
|
|
67
|
+
| `verify -- release` / `prepublishOnly` | `npm run verify -- release` chains the default gate with packaged Pi smoke and the release-blocking Crabbox platform matrix (`verifySteps` `release` in [`scripts/project.mjs`](../scripts/project.mjs)). `package.json` `prepublishOnly` runs that compose before `npm pack --dry-run` during `npm publish`. It intentionally omits standalone lifecycle, real-upstream, host-only dogfood, and benchmark modes—see [`RELEASE.md`](RELEASE.md#pre-release-checks). | Pass on 2026-06-02 (`npm run verify -- release`, including macOS/Ubuntu/native-Windows Crabbox matrix). |
|
|
66
68
|
| Configured-source lifecycle | `npm run verify -- lifecycle` (`scripts/verify-lifecycle.mjs`) drives `/reload`, closes and relaunches Pi with the same exact `--session-id`, checks the JSONL session header id, session continuity, slash-command sentinel tokens (`v1` then `v2` after rewriting the packaged extension to simulate pickup), persisted spill reachability, and real Pi `tool_result` failure-patch semantics for a QA reclassification with a fake upstream on `PATH`. Default Pi model is `zai/glm-5.1`; default per-step wait is **180000 ms** (`DEFAULT_TIMEOUT_MS`); override model with `--model <id>` and waits with `--timeout-ms <ms>`. Passthrough flags in [`scripts/project.mjs`](../scripts/project.mjs): `--keep-artifacts`, `--model`, `--verbose`, and `--timeout-ms` plus a value (for example `npm run verify -- lifecycle --model openai-codex/gpt-5.5:minimal --keep-artifacts --verbose --timeout-ms 600000`). | Pass on 2026-05-29 (`npm run verify -- lifecycle`). Treat any future unexplained red lifecycle gate as a release blocker. |
|
|
67
|
-
| Quick isolated Pi smoke | `pi --no-extensions --no-skills -e . --tools agent_browser` from repo root; native `agent_browser` only. |
|
|
69
|
+
| Quick isolated Pi smoke | `pi --no-extensions --no-skills -e . --tools agent_browser` from repo root; native `agent_browser` only. | Last interactive tmux checkout smoke pass on 2026-05-29 (`agent-browser 0.27.0` at the time). The 2026-06-02 Crabbox matrix now covers clean packed Pi install plus deterministic wrapper dogfood on all required platforms for `agent-browser 0.27.1`; run a new manual tmux smoke before publish when human-readable transcript evidence is required. Broader historical coverage also includes version/help/skills, open/snapshot/click, eval stdin, batch stdin, screenshot, explicit session, `sessionMode: "fresh"`, network requests, console/errors, diff snapshot, stream status/disable, dashboard start/stop, and chat credential-failure pass-through during RQ-0055. |
|
|
68
70
|
|
|
69
71
|
## Baseline checklist by inventory section
|
|
70
72
|
|
|
@@ -75,11 +77,13 @@ Re-run the gates below before each release; this table records what the closure
|
|
|
75
77
|
| Sessions, state, tabs, frames, dialogs, and windows | 20 canonical tokens from baseline section `state-tabs-frames-dialogs`; see [`scripts/agent-browser-capability-baseline.mjs`](../scripts/agent-browser-capability-baseline.mjs) and generated [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#session-state-frames-dialogs-windows-and-inspection-commands). | [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#session-state-frames-dialogs-windows-and-inspection-commands), stateful workflow notes, [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details). | Stateful summaries/redaction, state artifact handling, sessionless local command planning, managed-session restore, tab target pinning, and close alias cleanup. | Extension-validation stateful matrix, runtime session/resume tests, presentation redaction tests, lifecycle harness. | Supported. External profile/auth state remains operator-owned. |
|
|
76
78
|
| Network, storage, artifacts, diagnostics, and performance | 42 canonical tokens from baseline section `network-storage-artifacts-diagnostics`; see [`scripts/agent-browser-capability-baseline.mjs`](../scripts/agent-browser-capability-baseline.mjs) and generated [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#page-state-finding-mouse-settings-network-and-storage). | [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#page-state-finding-mouse-settings-network-and-storage), diagnostic sections, [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details). | Thin passthrough plus compact diagnostics, artifact metadata, missing-ffmpeg warnings, sensitive-data redaction, timeout bounds, and cleanup-pair guidance. | Fake non-core matrix and safe real-upstream coverage for network/HAR, diff, trace/profiler, console/errors/highlight, stream, vitals, and React missing-renderer. | Supported. Environment-sensitive operations need suitable local/browser state. |
|
|
77
79
|
| Batch, auth, confirmations, setup, dashboard, devices, and AI commands | 24 canonical tokens from baseline section `batch-auth-setup-ai`; see [`scripts/agent-browser-capability-baseline.mjs`](../scripts/agent-browser-capability-baseline.mjs) and generated [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#batch-auth-confirmations-sessions-chat-dashboard-devices-and-setup). | [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#batch-auth-confirmations-sessions-chat-dashboard-devices-and-setup), README security notes, release docs. | Native-tool batch stdin, generated `job`/`qa`/lookup batch plans, auth/confirmation redaction, sessionless local auth/setup/dashboard/doctor planning, timeout/cleanup guidance. | Unit/fake batch/auth/confirmation/dashboard/chat/doctor tests; extension-validation for structured input modes; efficiency benchmark scenarios. | Supported. Interactive side-effecting setup/auth/chat remains upstream-owned. |
|
|
78
|
-
| Global flags, config, providers, policy, and environment | 117 canonical tokens from baseline section `options-and-env`; see [`scripts/agent-browser-capability-baseline.mjs`](../scripts/agent-browser-capability-baseline.mjs) and generated [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#important-global-flags-config-and-environment). | [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#important-global-flags-config-and-environment), README provider/setup notes, [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#sessionmode), architecture/runtime docs. | Runtime handles command discovery, value-flag prevalidation, launch-scoped flags, redacted echoes, fresh-session recovery hints, explicit sessions, provider/device launch-scoping, curated env forwarding, and
|
|
80
|
+
| Global flags, config, providers, policy, and environment | 117 canonical tokens from baseline section `options-and-env`; see [`scripts/agent-browser-capability-baseline.mjs`](../scripts/agent-browser-capability-baseline.mjs) and generated [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#important-global-flags-config-and-environment). | [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#important-global-flags-config-and-environment), README provider/setup notes, [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#sessionmode), architecture/runtime docs. | Runtime handles command discovery, value-flag prevalidation, launch-scoped flags, redacted echoes, fresh-session recovery hints, explicit sessions, provider/device launch-scoping, curated env forwarding, subprocess completion, and package-owned Pi-scoped config for optional companion features. | Runtime tests for flags/planning/redaction/session behavior; process tests for env and stdio-linger completion; config/web-search/CLI tests; fake provider/specialized-skill matrix; package doctor. | Supported. Provider clouds, iOS/Appium, proxies, profiles, and credentials require external setup. |
|
|
79
81
|
|
|
80
82
|
## Follow-up decision after closure
|
|
81
83
|
|
|
82
|
-
Native `job`, `qa`, experimental `sourceLookup`, experimental `networkSourceLookup`,
|
|
84
|
+
Native `job`, `qa`, experimental `sourceLookup`, experimental `networkSourceLookup`, first-class Electron lifecycle/probe support, and optional Brave-backed companion web search are shipped.
|
|
85
|
+
|
|
86
|
+
`RQ-0121` adds Pi-scoped package config plus optional Brave web search without turning search into an `agent_browser` input mode. Config lives at `~/.pi/config/pi-agent-browser-native/config.json`, `.pi/config/pi-agent-browser-native/config.json`, or an explicit `PI_AGENT_BROWSER_CONFIG` override, with global → project → override merge order and `BRAVE_API_KEY` as a fallback only when no config credential source exists. `webSearch.braveApiKey` supports Pi model/provider-style literal, `$ENV_VAR` / `${ENV_VAR}`, escape, and `!command` values in trusted global/override config; project-local config rejects plaintext, interpolation-literal, malformed, and command-backed keys and allows inert exact env references only. `agent_browser_web_search` registers only when a usable credential source is available, resolves command secrets lazily at execution, calls Brave Search, returns compact normalized result details, and never exposes the key. Browser default profile config records conservative prompt guidance only; it does not auto-inject launch args. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#optional-companion-web-search); human workflow: README optional package config and [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#optional-package-config-and-companion-web-search); implementation: `extensions/agent-browser/lib/config.ts`, `extensions/agent-browser/lib/web-search.ts`, `scripts/config.mjs`, and conditional registration in `extensions/agent-browser/index.ts`; fake coverage: `test/agent-browser.config.test.ts`, `test/agent-browser.web-search.test.ts`, and `test/agent-browser.config-cli.test.ts`.
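The layering described for `RQ-0121` can be sketched roughly as follows. This is an illustrative sketch only, not the code in `extensions/agent-browser/lib/config.ts`: the names `PackageConfig`, `KeySource`, `classifyKey`, and `resolveKeySource` are hypothetical, escape handling is omitted, and it assumes that a rejected project-local key surfaces as an error rather than being silently ignored.

```typescript
// Hypothetical shapes for illustration; the real schema may differ.
type Layer = "global" | "project" | "override";

interface PackageConfig {
  webSearch?: { braveApiKey?: string };
}

type KeySource =
  | { origin: Layer; kind: "literal"; value: string }
  | { origin: Layer | "env-fallback"; kind: "env"; name: string }
  | { origin: Layer; kind: "command"; command: string };

// Exact `$ENV_VAR` or `${ENV_VAR}` reference with nothing before or after it.
const ENV_REF = /^\$(?:\{([A-Za-z_][A-Za-z0-9_]*)\}|([A-Za-z_][A-Za-z0-9_]*))$/;

function classifyKey(origin: Layer, raw: string): KeySource {
  const match = ENV_REF.exec(raw);
  if (match) {
    return { origin, kind: "env", name: match[1] ?? match[2] ?? "" };
  }
  // Assumption: project-local config accepts exact env references only,
  // so plaintext, interpolated, and command-backed values are rejected here.
  if (origin === "project") {
    throw new Error("project-local braveApiKey must be an exact env reference");
  }
  if (raw.startsWith("!")) {
    // The command is recorded, not run; execution is deferred to tool call time.
    return { origin, kind: "command", command: raw.slice(1) };
  }
  return { origin, kind: "literal", value: raw };
}

function resolveKeySource(
  layers: Partial<Record<Layer, PackageConfig>>,
  env: Record<string, string | undefined>,
): KeySource | undefined {
  let source: KeySource | undefined;
  // Later layers win: global, then project, then explicit override.
  for (const origin of ["global", "project", "override"] as const) {
    const raw = layers[origin]?.webSearch?.braveApiKey;
    if (raw !== undefined) source = classifyKey(origin, raw);
  }
  if (source) return source;
  // BRAVE_API_KEY is consulted only when no config layer names a credential source.
  return env.BRAVE_API_KEY
    ? { origin: "env-fallback", kind: "env", name: "BRAVE_API_KEY" }
    : undefined;
}
```

Under these assumptions, a caller would register the companion search tool only when `resolveKeySource` returns a source, and would resolve a `command` source at execution time rather than at load time.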
|
|
83
87
|
|
|
84
88
|
`RQ-0066` shipped as the bounded evidence model in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#sourcelookup): it compiles to upstream `batch` steps (`is visible`, `get html`, `react inspect`, `react tree` as applicable), merges `details.sourceLookup` into the tool `details` alongside batch presentation, and never reclassifies an upstream-successful batch to failed solely because no candidates were found (unlike `qa` diagnostic reclassification). Wrapper-tracked packaged Electron no-candidate results now add bounded `workspaceRoot` / `electronContext` when available, limitations that the scan only covers the Pi cwd and does not unpack installed app resources or `app.asar`, and live Electron `snapshot` / `probe` / `tab list` next actions. Fake coverage: `agentBrowserExtension explains packaged Electron sourceLookup no-candidate boundaries` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
|
|
85
89
|
|