pi-agent-browser-native 0.2.46 → 0.2.48
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +64 -20
- package/README.md +45 -20
- package/docs/ARCHITECTURE.md +14 -14
- package/docs/COMMAND_REFERENCE.md +37 -23
- package/docs/ELECTRON.md +3 -3
- package/docs/RELEASE.md +33 -24
- package/docs/REQUIREMENTS.md +4 -4
- package/docs/SUPPORT_MATRIX.md +34 -106
- package/docs/TOOL_CONTRACT.md +24 -22
- package/docs/platform-smoke.md +2 -2
- package/extensions/agent-browser/index.ts +20 -2
- package/extensions/agent-browser/lib/config-policy.js +16 -5
- package/extensions/agent-browser/lib/config.ts +17 -4
- package/extensions/agent-browser/lib/input-modes/job.ts +138 -62
- package/extensions/agent-browser/lib/input-modes/params.ts +2 -2
- package/extensions/agent-browser/lib/orchestration/browser-run/artifact-paths.ts +44 -0
- package/extensions/agent-browser/lib/orchestration/browser-run/click-dispatch.ts +42 -19
- package/extensions/agent-browser/lib/orchestration/browser-run/diagnostics.ts +6 -4
- package/extensions/agent-browser/lib/orchestration/browser-run/final-result.ts +18 -9
- package/extensions/agent-browser/lib/orchestration/browser-run/prepare/direct-anchor-download.ts +158 -0
- package/extensions/agent-browser/lib/orchestration/browser-run/prepare/network-page-filter.ts +116 -0
- package/extensions/agent-browser/lib/orchestration/browser-run/prepare/scroll-shims.ts +147 -0
- package/extensions/agent-browser/lib/orchestration/browser-run/prepare/snapshot-filter.ts +183 -0
- package/extensions/agent-browser/lib/orchestration/browser-run/prepare/wait-timeouts.ts +58 -0
- package/extensions/agent-browser/lib/orchestration/browser-run/prepare.ts +19 -653
- package/extensions/agent-browser/lib/orchestration/browser-run/process-output.ts +1 -6
- package/extensions/agent-browser/lib/orchestration/browser-run/session-artifacts.ts +8 -0
- package/extensions/agent-browser/lib/orchestration/browser-run/types.ts +1 -0
- package/extensions/agent-browser/lib/pi-tool-rendering.ts +34 -19
- package/extensions/agent-browser/lib/playbook.ts +4 -4
- package/extensions/agent-browser/lib/results/action-recommendations.ts +3 -3
- package/extensions/agent-browser/lib/web-search.ts +11 -4
- package/package.json +4 -4
- package/scripts/agent-browser-capability-baseline.mjs +6 -3
- package/scripts/doctor.mjs +12 -11
- package/scripts/platform-smoke/platform-build-windows.ps1 +2 -2
- package/scripts/platform-smoke/targets.mjs +7 -3
- package/scripts/platform-smoke.mjs +2 -2
package/docs/RELEASE.md
CHANGED
|
@@ -8,9 +8,9 @@ Related docs:
|
|
|
8
8
|
- [`ELECTRON.md`](ELECTRON.md)
|
|
9
9
|
- [`platform-smoke.md`](platform-smoke.md)
|
|
10
10
|
- [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md)
|
|
11
|
-
- Bounded `agent_browser` outcome metadata on `details` (`resultCategory`, `successCategory`, `failureCategory`, optional `nextActions`, optional `pageChangeSummary` with per-step summaries on `batch`): contract in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); maintainer checklists under “Tool result categories” and “Page-change summaries” in [`../AGENTS.md`](
|
|
12
|
-
- Post-success `get text` selector visibility (`RQ-0074`): optional `details.selectorTextVisibility` / `selectorTextVisibilityAll`, visible warnings, and `inspect-visible-text-candidates*` next actions after read-only visibility probes—[`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md), [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details), and [`../AGENTS.md`](
|
|
13
|
-
- Managed-session outcomes (`RQ-0077`): after extension-managed implicit or fresh `--session` injection reaches process execution, `details.managedSessionOutcome` records the transition (`created` / `replaced` / `unchanged` / `closed` on success; `preserved` / `abandoned` when a plan fails before a new session becomes current). Failing `sessionMode: "fresh"` calls also append model-visible `Managed session outcome: …`—[`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details), [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md), [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md), and [`../AGENTS.md`](
|
|
11
|
+
- Bounded `agent_browser` outcome metadata on `details` (`resultCategory`, `successCategory`, `failureCategory`, optional `nextActions`, optional `pageChangeSummary` with per-step summaries on `batch`): contract in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); maintainer checklists under “Tool result categories” and “Page-change summaries” in [`../AGENTS.md`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/AGENTS.md)
|
|
12
|
+
- Post-success `get text` selector visibility (`RQ-0074`): optional `details.selectorTextVisibility` / `selectorTextVisibilityAll`, visible warnings, and `inspect-visible-text-candidates*` next actions after read-only visibility probes—[`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md), [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details), and [`../AGENTS.md`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/AGENTS.md) maintainer checklist
|
|
13
|
+
- Managed-session outcomes (`RQ-0077`): after extension-managed implicit or fresh `--session` injection reaches process execution, `details.managedSessionOutcome` records the transition (`created` / `replaced` / `unchanged` / `closed` on success; `preserved` / `abandoned` when a plan fails before a new session becomes current). Failing `sessionMode: "fresh"` calls also append model-visible `Managed session outcome: …`—[`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details), [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md), [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md), and [`../AGENTS.md`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/AGENTS.md) maintainer checklist
|
|
14
14
|
- Stateful context commands (`cookies`, `storage`, `auth`, `dialog`, `frame`, `state`) and aggregate `batch` results: model-facing `details.data` is summarized or redacted per [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); aggregate `batch` replaces top-level `details.data` with a compact per-step matrix (`success`, argv-redacted `command`, redacted `result` or scrubbed `error`) while full per-step payloads, artifacts, and categories remain on `batchSteps[]`—operational notes in [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#use-stateful-browser-context-commands-safely), assembly in `extensions/agent-browser/lib/results/presentation/batch.ts`
|
|
15
15
|
|
|
16
16
|
## Purpose
|
|
@@ -30,7 +30,15 @@ npm run smoke:platform:doctor
|
|
|
30
30
|
npm run verify -- release
|
|
31
31
|
```
|
|
32
32
|
|
|
33
|
-
`npm run doctor` is a read-only first-run diagnostic for PATH, targeted upstream version, the
|
|
33
|
+
`npm run doctor` is a read-only first-run diagnostic for PATH, targeted upstream version, the minimum Pi runtime floor, and duplicate package/checkout source conflicts. The package keeps Pi core imports as wildcard `peerDependencies` because installed Pi package docs require the host Pi install to provide those packages, while the doctor fails setup when `pi --version` is below the enforced floor. It does not replace upstream `agent-browser doctor` for browser runtime health and does not edit Pi settings.
|
|
34
|
+
|
|
35
|
+
For PR-ready local confidence before release-only lifecycle and platform cost, run:
|
|
36
|
+
|
|
37
|
+
```bash
|
|
38
|
+
npm run verify -- pre-pr
|
|
39
|
+
```
|
|
40
|
+
|
|
41
|
+
`pre-pr` composes the default gate with `npm run verify -- package`: generated docs, TypeScript, the full unit/fake suite, live command-reference sampling, and package-content verification. It intentionally does not run lifecycle, packaged Pi smoke, Crabbox platform smoke, real-upstream, dogfood, or benchmark modes.
|
|
34
42
|
|
|
35
43
|
`npm run verify -- release` runs:
|
|
36
44
|
|
|
@@ -41,7 +49,7 @@ npm run verify -- release
|
|
|
41
49
|
|
|
42
50
|
`npm publish` runs npm’s `prepublishOnly` script from `package.json`, which executes the same `npm run verify -- release` gate and then `npm pack --dry-run`. That concatenated gate is everything in the default `npm run verify` step (generated playbook drift, TypeScript, the unit/fake suite, generated command-reference blocks, and live upstream command-reference sampling against the targeted `agent-browser` on `PATH`), the configured-source lifecycle harness, the packaged Pi smoke in `package-pi`, and the release-blocking Crabbox platform matrix. Using `npm publish --ignore-scripts` skips that contract intentionally.
|
|
43
51
|
|
|
44
|
-
`prepublishOnly` intentionally does **not** run the standalone host-only `npm run verify -- real-upstream`, `npm run verify -- dogfood`, or `npm run verify -- benchmark` modes; those remain separate `npm run verify` modes in [`scripts/project.mjs`](
|
|
52
|
+
`prepublishOnly` intentionally does **not** run the standalone host-only `npm run verify -- real-upstream`, `npm run verify -- dogfood`, or `npm run verify -- benchmark` modes; those remain separate `npm run verify` modes in [`scripts/project.mjs`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/scripts/project.mjs). The platform matrix includes its own fast target-local build/package gate and browser dogfood suite, and is automated through the `release` slice.
|
|
45
53
|
|
|
46
54
|
For a deterministic host-only real-browser wrapper smoke without model choice in the loop, run:
|
|
47
55
|
|
|
@@ -64,11 +72,11 @@ The Crabbox gate is only green when suite assertions and artifact manifests unde
|
|
|
64
72
|
|
|
65
73
|
The deterministic dogfood mode uses the extension harness and the real `agent-browser` on `PATH` against a deterministic local file fixture, then verifies top-level `qa`, `semanticAction`, constrained `job`, screenshot artifact verification, and session close. Use `npm run verify -- dogfood --keep-artifacts` or `--artifact-dir <path>` only while debugging, then delete retained screenshots. This smoke complements, but does not replace, human-readable interactive transcript evidence.
|
|
66
74
|
|
|
67
|
-
Every release also requires interactive `tmux`-driven Pi dogfood with the native `agent_browser` tool against real sites. For extension-focused release smokes, use `pi --no-extensions --no-skills -e .` from the checkout before publish so auto-loaded dogfood/QA skills cannot replace the bounded smoke workflow;
|
|
75
|
+
Every release also requires interactive `tmux`-driven Pi dogfood with the native `agent_browser` tool against real sites. For extension-focused release smokes, use `pi --approve --no-extensions --no-skills -e .` from the trusted checkout before publish so auto-loaded dogfood/QA skills cannot replace the bounded smoke workflow; omit `--approve` only when the smoke is explicitly testing Pi's Project Trust prompt. Run separate skill-enabled dogfood only when validating skill routing or report-generation behavior. Drive prompts with `tmux send-keys`, exercise at least one simple static site and one real documentation/product site, include the higher-level `qa` or `job`/`batch` surfaces when they changed, close every opened browser session, remove screenshots/temp artifacts, and record the outcome in the release notes or support-matrix evidence. Do not paste raw multi-line prompts into a tmux Pi pane: plain newlines submit separate queued user messages. For scripted smoke driving, collapse prompt files to one line before sending (`PROMPT=$(tr '\n' ' ' < /tmp/smoke-prompt.md); tmux send-keys -t "$SESSION":0.0 -l "$PROMPT"; tmux send-keys -t "$SESSION":0.0 Enter`). For manual multi-line editing, use Pi's external editor shortcut (`Ctrl+G`) or configure tmux extended keys so Pi can receive `Shift+Enter` for newlines; see the installed Pi `docs/tmux.md` guidance. Automated localhost, fake-upstream, and deterministic dogfood gates do not replace this human-readable live-site transcript evidence. When `agent_browser_web_search` or package config changed, add one key-free smoke proving the optional tool is absent without config, one fake/unit-backed smoke in the default suite, and one opt-in live Exa or Brave Search check with a real key while confirming the key does not appear in transcripts, stdout/stderr, config status, PR text, or artifacts. When `electron.*` surfaces, attached-session diagnostics, or `qa.attached` changed, add a local Electron pass: `electron.list` → `electron.launch` (expect isolated profile behavior) → `snapshot -i` or `electron.probe` / `qa.attached` → `electron.cleanup` with the returned `launchId`, verifying status/mismatch guidance if you simulate a dead renderer or stale refs. For dense-dashboard stress coverage, use the [public Grafana stress checklist](#public-grafana-stress-checklist) below; it is a maintainer workflow, not bundled product skill or recipe runtime.
|
|
68
76
|
|
|
69
|
-
When reviewing saved session JSONL after a failed smoke or a `qa` preset that reclassified an upstream-successful batch, expect `agent_browser` tool rows to carry `isError: true` whenever `details.resultCategory` is `failure`. For normal prose output, model-visible text should end with a `Pi tool isError: true` category line; for caller-requested `--json` output, the hook preserves parseable JSON and only patches `isError`. The extension applies that patch on the `tool_result` path so Pi’s transcript matches the wrapper contract ([`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details)). Preserve a normal Pi session directory for those checks; avoiding `--no-session` keeps this evidence intact ([`AGENTS.md`](
|
|
77
|
+
When reviewing saved session JSONL after a failed smoke or a `qa` preset that reclassified an upstream-successful batch, expect `agent_browser` tool rows to carry `isError: true` whenever `details.resultCategory` is `failure`. For normal prose output, model-visible text should end with a `Pi tool isError: true` category line; for caller-requested `--json` output, the hook preserves parseable JSON and only patches `isError`. The extension applies that patch on the `tool_result` path so Pi’s transcript matches the wrapper contract ([`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details)). Preserve a normal Pi session directory for those checks; avoiding `--no-session` keeps this evidence intact ([`AGENTS.md`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/AGENTS.md) preferred validation workflow).
|
|
70
78
|
|
|
71
|
-
The configured-source lifecycle regression harness is required before release because it launches an interactive `pi` process under `tmux` and validates `/reload`, full relaunch with the same exact Pi 0.
|
|
79
|
+
The configured-source lifecycle regression harness is required before release because it launches an interactive `pi` process under `tmux` with `--approve` and validates `/reload`, full relaunch with the same exact Pi 0.79 `--session-id`, managed-session continuity, persisted artifacts, and Pi failure-patch behavior. Branch-backed `session_tree` rehydration and cleanup ownership are validated by focused extension harness tests:
|
|
72
80
|
|
|
73
81
|
```bash
|
|
74
82
|
npm run verify -- lifecycle
|
|
@@ -106,11 +114,13 @@ Use this validation prompt after changing click enrichment, tab pinning, ref pre
|
|
|
106
114
|
Run it in an isolated checkout session with skills disabled so the run validates the extension browser workflow instead of external dogfood/QA skill routing. It is fine to restrict active tools at launch so the checkout extension is the only browser surface, but keep those launch details out of the user prompt:
|
|
107
115
|
|
|
108
116
|
```bash
|
|
109
|
-
pi --no-extensions --no-skills -e . --model openai-codex/gpt-5.5:minimal --tools agent_browser --session-dir "$SESSION_DIR"
|
|
117
|
+
pi --approve --no-extensions --no-skills -e . --model openai-codex/gpt-5.5:minimal --tools agent_browser --session-dir "$SESSION_DIR"
|
|
110
118
|
```
|
|
111
119
|
|
|
112
120
|
Repeat with `--model openai-codex/gpt-5.5:medium` when validating instruction-following robustness. Use unique temp paths for each run and delete them afterward. Run separate skill-enabled dogfood sessions only when the thing under test is skill integration, not this bounded release smoke.
|
|
113
121
|
|
|
122
|
+
Submit the prompt as one Pi message. In tmux automation, write it to a temp file with placeholders replaced, collapse newlines to spaces, and send that one line; for manual multiline entry, use Pi's `Ctrl+G` external editor or a tmux setup that preserves `Shift+Enter` newlines. Do not paste the raw block into a tmux pane line-by-line.
|
|
123
|
+
|
|
114
124
|
Copy/paste prompt, replacing the two artifact placeholders with exact absolute paths:
|
|
115
125
|
|
|
116
126
|
```text
|
|
@@ -142,13 +152,14 @@ Evaluator expectations after the queued Sauce Demo fixes: the agent should indep
|
|
|
142
152
|
|
|
143
153
|
## Deterministic agent efficiency benchmark
|
|
144
154
|
|
|
145
|
-
[`scripts/agent-browser-efficiency-benchmark.mjs`](
|
|
155
|
+
[`scripts/agent-browser-efficiency-benchmark.mjs`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/scripts/agent-browser-efficiency-benchmark.mjs) is an accounting-only benchmark: it does not shell out to `agent-browser`, launch a browser, or read or write Pi sessions. It models representative `agent_browser` call shapes (including optional `stdin` for `batch` and top-level `job`, `qa`, or experimental `sourceLookup` / `networkSourceLookup` objects that compile to batch) and aggregates success rate, tool-call counts, UTF-8 size of model-visible strings, stale-ref failure and recovery counts, artifact success, distinct failure-category coverage, and summed elapsed-time estimates. When extending scenarios, keep them aligned with the closed `RQ-0068` “no reusable recipe layer” rationale in [`ARCHITECTURE.md`](ARCHITECTURE.md#no-reusable-recipe-layer-yet) (benchmark ids cited there are the canonical inventory for that evidence bar).
|
|
146
156
|
|
|
147
157
|
- **During development:** `npm run benchmark:agent-browser` prints a Markdown report; `npm run benchmark:agent-browser -- --json` saves machine-readable metrics; `npm run benchmark:agent-browser -- --compare path/to/prior.json` fails with exit code `1` on regressions (see the script’s `--help` for exit codes). Optional `--sample-jsonl path/to/session.jsonl` adds a `jsonlSample` section with real UTF-8 byte totals and per-workflow/overall p95 sizes for model-visible `agent_browser` tool-result text without changing deterministic scenario metrics; comparison ignores `jsonlSample` blocks.
|
|
148
|
-
- **Default gate:** `npm run verify` checks generated playbook drift, runs `tsc --noEmit`, runs the full unit/fake suite under `test/**/*.test.ts` with Node test concurrency pinned to `1` (including [`test/agent-browser.efficiency-benchmark.test.ts`](
|
|
149
|
-
- **
|
|
158
|
+
- **Default gate:** `npm run verify` checks generated playbook drift, runs `tsc --noEmit`, runs the full unit/fake suite under `test/**/*.test.ts` with Node test concurrency pinned to `1` (including [`test/agent-browser.efficiency-benchmark.test.ts`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/test/agent-browser.efficiency-benchmark.test.ts) for scenario coverage and comparison behavior), verifies generated command-reference baseline blocks, and samples live upstream command-reference tokens. It does not spawn the standalone benchmark script’s JSON/Markdown run; that is what the opt-in slice below adds.
|
|
159
|
+
- **Pre-PR gate:** `npm run verify -- pre-pr` runs the default gate plus `npm run verify -- package` for larger handoffs that need package-content confidence without lifecycle, platform, real-upstream, dogfood, or benchmark cost.
|
|
160
|
+
- **Opt-in slice:** `npm run verify -- benchmark` runs the benchmark script once with `--json` and then that same test module alone. It is intentionally **not** part of `npm run verify -- pre-pr` or `npm run verify -- release`, so routine handoff and publish gates stay decoupled from benchmark churn while still allowing a focused check after editing scenarios or `CURRENT_BENCHMARK_VERSION`.
|
|
150
161
|
|
|
151
|
-
Maintainer constraints for evolving scenarios and version bumps are summarized under “Agent browser efficiency benchmark” in [`../AGENTS.md`](
|
|
162
|
+
Maintainer constraints for evolving scenarios and version bumps are summarized under “Agent browser efficiency benchmark” in [`../AGENTS.md`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/AGENTS.md).
|
|
152
163
|
|
|
153
164
|
## What package verification checks
|
|
154
165
|
|
|
@@ -174,9 +185,7 @@ The packaged execution smoke intentionally uses a temporary fake `agent-browser`
|
|
|
174
185
|
Current forbidden packed files include:
|
|
175
186
|
|
|
176
187
|
- `AGENTS.md`
|
|
177
|
-
- `docs/
|
|
178
|
-
- `docs/native-integration-design.md`
|
|
179
|
-
- `docs/v1-tool-contract.md`
|
|
188
|
+
- archived planning drafts under `docs/archive/`
|
|
180
189
|
- `.pi/extensions/agent-browser.ts`
|
|
181
190
|
- test and repo-only maintenance files
|
|
182
191
|
|
|
@@ -193,7 +202,7 @@ Before publishing, validate both local-checkout modes without mixing their assum
|
|
|
193
202
|
### Quick isolated checkout smoke test
|
|
194
203
|
|
|
195
204
|
1. Install `agent-browser` separately.
|
|
196
|
-
2. Launch `pi --no-extensions -e .` from this repository root.
|
|
205
|
+
2. Launch `pi --approve --no-extensions -e .` from this trusted repository root. Omit `--approve` only when testing Pi's Project Trust prompt.
|
|
197
206
|
3. Confirm the checkout extension loads from `extensions/agent-browser/index.ts`.
|
|
198
207
|
4. Run a smoke prompt that exercises `agent_browser`.
|
|
199
208
|
5. Restart the `pi` process after extension edits; Pi settings and `/reload` are not the validation target in this isolated mode.
|
|
@@ -212,7 +221,7 @@ Run the automated harness for deterministic configured-source lifecycle regressi
|
|
|
212
221
|
npm run verify -- lifecycle
|
|
213
222
|
```
|
|
214
223
|
|
|
215
|
-
The harness creates an isolated `PI_CODING_AGENT_DIR`, writes settings with exactly one temporary configured package source, runs `pi` in `tmux` with default model **`zai/glm-5.1
|
|
224
|
+
The harness creates an isolated `PI_CODING_AGENT_DIR`, writes settings with exactly one temporary configured package source, runs `pi` in `tmux` with `--approve`, default model **`zai/glm-5.1`**, and a deterministic `--session-id`, puts a deterministic fake `agent-browser` first on `PATH`, drives `/reload`, closes Pi, and relaunches with the same exact session id instead of typing `/resume`. It also asserts the JSONL session header id, same-page managed-session continuity, persisted spill reachability, and real Pi `tool_result` failure-patch semantics for a QA reclassification. Per-step tmux waits default to **180000 ms** (three minutes) in [`scripts/verify-lifecycle.mjs`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/scripts/verify-lifecycle.mjs) (`DEFAULT_TIMEOUT_MS`); override with `--timeout-ms <ms>` when slower models or cold starts need more headroom. Override the model when needed:
|
|
216
225
|
|
|
217
226
|
```bash
|
|
218
227
|
npm run verify -- lifecycle --model openai-codex/gpt-5.5:minimal
|
|
@@ -234,7 +243,7 @@ These show up often in cloud dev boxes and scripted smokes; they are maintainer
|
|
|
234
243
|
|
|
235
244
|
| Topic | What to watch for | Mitigation |
|
|
236
245
|
| --- | --- | --- |
|
|
237
|
-
| **Pi CLI vs repo devDependencies** | Global `pi` older than the
|
|
246
|
+
| **Pi CLI vs repo devDependencies** | Global `pi` older than the minimum Pi runtime floor for the release can change TUI behavior, `/reload`, package installs, and tool routing during lifecycle or checkout smokes. | Run `npm run doctor` and align `pi` with the current audited baseline before release gates (`pi update` or install the matching version). The published peer range stays wildcard per Pi package docs, and the doctor enforces the minimum Pi runtime floor before package validation. |
|
|
238
247
|
| **npm lockfile (`packageManager`)** | `package.json` pins **npm@11**. npm 10 may only strip optional `libc` metadata on `@esbuild/*` platform entries in `package-lock.json` (no dependency version change). | Prefer `npx -y npm@11.14.0 install` when refreshing the lockfile; do not commit npm-10-only lockfile churn. |
|
|
239
248
|
| **`pi -p` / print mode** | Non-interactive `pi -p` may hang or emit no stdout for long real-browser smokes without a TTY. | Use **tmux**-driven interactive `pi` for release evidence and checkout smokes; reserve `-p` for short, non-browser checks. |
|
|
240
249
|
| **Real-browser cleanup** | `real-upstream`, Sauce Demo, and live-site runs can leave defunct Chrome/`agent-browser` children if a session aborts mid-flow. | Close via `agent_browser` / `agent-browser` `close`, kill stray tmux sessions, and remove temp screenshots/HARs under `/tmp` or your chosen artifact dirs. |
|
|
@@ -261,11 +270,11 @@ That npm script sets `PI_AGENT_BROWSER_REAL_UPSTREAM=1` for the test process. To
|
|
|
261
270
|
This suite requires the installed `agent-browser --version` to exactly match `scripts/agent-browser-capability-baseline.mjs`. It serves fixture pages from localhost and checks stable `details`/`data` keys via `test/fixtures/agent-browser-real-output-shapes.json`. Coverage groups:
|
|
262
271
|
|
|
263
272
|
- **Inspection and skills (stateless JSON):** `--version`, `--help`, `snapshot --help`, `skills list`, `skills get … --full`, `skills path …` (no managed `sessionName` / `usedImplicitSession`).
|
|
264
|
-
- **Managed session core and safe diagnostic matrix:** fresh `open` on the contract fixture, then implicit reuse across `eval --stdin`, `snapshot -i`, interaction commands (`click`, `dblclick`, `fill`, `type`, `focus`, `keyboard` with `type` / `inserttext`, `press`, `hover`, `check`, `uncheck`, `select`, `upload`, `drag`, `mouse`, `scroll`, `scrollintoview`, `wait` on a
|
|
273
|
+
- **Managed session core and safe diagnostic matrix:** fresh `open` on the contract fixture, then implicit reuse across `eval --stdin`, `snapshot -i`, interaction commands (`click`, `dblclick`, `fill`, `type`, `type --clear --delay`, `focus`, `keyboard` with `type` / `inserttext`, `press`, `hover`, `check`, `uncheck`, `select`, failed `select` no-match, `upload`, `drag`, `mouse`, `scroll`, off-viewport click, `scrollintoview`, `wait` on selectors in the main frame and a selected iframe), extraction (`get` variants, `is` variants, `find label … fill` via native `<label>`, `aria-label`, and `aria-labelledby`, inline `eval`), file outputs (`screenshot`, `pdf`), navigation (`back`, `forward`, `reload`, `tab list`, another `open` to the same fixture), `batch` stdin, `pushstate`, `vitals … --json`, network route/requests/HAR, diff snapshot/screenshot/url, trace/profiler, console/errors/highlight, stream enable/status/disable, and `cookies set --curl`.
|
|
265
274
|
- **Failure shape:** `react tree` on a page opened with `--enable react-devtools` but without a React app (expects a clear missing-renderer error with session-bound `details`).
|
|
266
275
|
- **Async download:** `open` on the `/download` fixture, anchor-triggered export, then `wait --download <path>` metadata and wrapper artifact reporting for the requested path.
|
|
267
276
|
|
|
268
|
-
The default unit suite also runs `agentBrowserExtension passes through core command coverage fallback matrix` in [`test/agent-browser.extension-validation.test.ts`](
|
|
277
|
+
The default unit suite also runs `agentBrowserExtension passes through core command coverage fallback matrix` in [`test/agent-browser.extension-validation.test.ts`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/test/agent-browser.extension-validation.test.ts): a fake upstream records argv so `connect 9222`, `download` with a selector and path, `get url`, `snapshot --compact`, and `tab new` / `tab 0` / `tab close` still prove `--json` plus implicit `--session` ordering without a browser. A second fake-upstream matrix in the same file (`agentBrowserExtension passes through non-core network debug diff stream dashboard and chat families`) pins representative `network`, `diff`, `trace` / `profiler` / `record`, `console` / `errors` / `highlight` / `inspect` / `clipboard`, `stream`, `dashboard`, and `chat` JSON shapes plus redacted `details.data` and argv echoes without a browser. A third matrix (`agentBrowserExtension passes through provider and specialized skill workflows`) asserts provider `open` argv shapes still receive `--json` plus implicit `--session` while read-only `skills get …` stays stateless (no managed session fields) and provider credential env vars are forwarded into the fake upstream log. Extend those matrices when adding passthrough coverage that should stay out of the slow real-upstream loop.
|
|
269
278
|
|
|
270
279
|
### Real upstream suite mechanics, isolation, and troubleshooting
|
|
271
280
|
|
|
@@ -280,7 +289,7 @@ The default unit suite also runs `agentBrowserExtension passes through core comm
|
|
|
280
289
|
- **Missing or extra `details` / `data` keys:** Update `test/fixtures/agent-browser-real-output-shapes.json` in the same change as the wrapper or presentation code that shifts those keys.
|
|
281
290
|
- **Timeouts:** A 120s bound covers the full matrix; repeated timeouts usually mean a hung browser, blocked loopback, or an environment preventing headful/headless launch—check upstream logs and local security tooling before loosening timeouts.
|
|
282
291
|
|
|
283
|
-
The current upstream `agent-browser 0.27.
|
|
292
|
+
The current upstream `agent-browser 0.27.2` `wait --download <path>` saveAs persistence limitation is tracked at [vercel-labs/agent-browser#1300](https://github.com/vercel-labs/agent-browser/issues/1300); until it is fixed, release validation must treat `details.savedFilePath` as upstream-reported metadata and use `details.artifacts[].exists` as the filesystem truth (the contract asserts the requested path is absent on disk while upstream still reports success). If the suite fails because JSON/detail keys drifted, update the wrapper behavior or refresh `test/fixtures/agent-browser-real-output-shapes.json` together with the presentation work that consumes those shapes.
|
|
284
293
|
|
|
285
294
|
Example smoke prompt:
|
|
286
295
|
|
|
@@ -327,8 +336,8 @@ Before publishing:
|
|
|
327
336
|
- run `npm run verify -- command-reference` if the installed upstream `agent-browser` version or help surface changed
|
|
328
337
|
- run `npm run doctor` and confirm any duplicate-source remediation matches the active package/checkout setup
|
|
329
338
|
- run `npm run verify -- real-upstream` for upstream runtime, result-presentation, or managed-session changes
|
|
330
|
-
- confirm both local-checkout modes still work for pre-release validation: isolated `pi --no-extensions -e .` smoke testing for general checkout loading (add `--no-skills` for extension-focused bounded smokes) and configured-source lifecycle validation
|
|
331
|
-
- complete interactive `tmux` live-site extension smoke with `pi --no-extensions --no-skills -e .` and the native `agent_browser` tool (at least one simple static site and one real documentation/product site; include `qa` or `job`/`batch` when those surfaces changed; use the [public Grafana stress checklist](#public-grafana-stress-checklist) when dashboard/diagnostic/artifact behavior changed; close sessions and remove screenshots/temp artifacts; record evidence). Run separate skill-enabled dogfood only when validating skill routing/report-generation behavior—see [Pre-release checks](#pre-release-checks); automated gates are not a substitute
|
|
339
|
+
- confirm both local-checkout modes still work for pre-release validation: isolated `pi --approve --no-extensions -e .` smoke testing for general trusted checkout loading (add `--no-skills` for extension-focused bounded smokes; omit `--approve` only to test the trust prompt) and configured-source lifecycle validation
|
|
340
|
+
- complete interactive `tmux` live-site extension smoke with `pi --approve --no-extensions --no-skills -e .` and the native `agent_browser` tool (at least one simple static site and one real documentation/product site; include `qa` or `job`/`batch` when those surfaces changed; use the [public Grafana stress checklist](#public-grafana-stress-checklist) when dashboard/diagnostic/artifact behavior changed; close sessions and remove screenshots/temp artifacts; record evidence). Run separate skill-enabled dogfood only when validating skill routing/report-generation behavior—see [Pre-release checks](#pre-release-checks); automated gates are not a substitute
|
|
332
341
|
- rerun `npm run verify -- release` and confirm the embedded Crabbox `platform-build` plus `browser-dogfood-smoke` matrix passed on `macos`, `ubuntu`, and `windows-native` with artifacts under `.artifacts/platform-smoke/`
|
|
333
342
|
- run `npm run verify -- lifecycle` for configured-source `/reload`, exact `--session-id` relaunch, managed-session continuity, persisted-spill, and Pi failure-patch regression coverage (required before publish; see [Pre-release checks](#pre-release-checks))
|
|
334
343
|
- confirm [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md) still maps every current baseline inventory section to docs, runtime handling, tests, and validation status
|
package/docs/REQUIREMENTS.md
CHANGED
|
@@ -47,12 +47,12 @@ Define the product requirements and constraints for `pi-agent-browser-native`.
|
|
|
47
47
|
### Install priority
|
|
48
48
|
|
|
49
49
|
- Prioritize the package install path first.
|
|
50
|
-
- User-facing install docs should lead with `pi install npm:pi-agent-browser-native`; ephemeral package trials and validation
|
|
50
|
+
- User-facing install docs should lead with `pi install npm:pi-agent-browser-native`; ephemeral package trials and validation should use `pi --no-extensions -e npm:pi-agent-browser-native[@<version>]` so configured checkout or global sources cannot duplicate `agent_browser`, adding `--approve` in Pi 0.79+ automation when the current project is intentionally trusted.
|
|
51
51
|
- User-facing install docs should also include the GitHub source path `pi install https://github.com/fitchmultz/pi-agent-browser-native`.
|
|
52
52
|
- Provide a read-only package-level doctor command that checks upstream `agent-browser` PATH/version and duplicate Pi package/checkout sources before first use. It must not mutate Pi settings and must remain distinct from upstream `agent-browser doctor`.
|
|
53
53
|
- Keep the current local-checkout path documented as the practical pre-release and development flow.
|
|
54
54
|
- Most users will install this extension globally rather than as a project-local extension.
|
|
55
|
-
- Local checkout smoke testing should use explicit CLI loading such as `pi --no-extensions -e .` or `pi --no-extensions -e /absolute/path/to/pi-agent-browser-native`; Pi settings are bypassed in this mode and code edits require a process restart for validation.
|
|
55
|
+
- Local trusted-checkout smoke testing should use explicit CLI loading such as `pi --approve --no-extensions -e .` or `pi --approve --no-extensions -e /absolute/path/to/pi-agent-browser-native`; Pi settings are bypassed in this mode and code edits require a process restart for validation. Omit `--approve` only when the test is meant to cover Pi's Project Trust prompt.
|
|
56
56
|
- Local checkout hot-reload and exact-session relaunch validation should use configured-source lifecycle mode: exactly one active checkout/package source in Pi settings, launched with plain `pi` (or the lifecycle harness' exact `--session-id` relaunch path), so `/reload` and relaunch events exercise discovered/configured resources. Focused extension harness tests validate Pi `session_tree` branch rehydration and cleanup ownership.
|
|
57
57
|
- Do **not** rely on repo-local `.pi/extensions/` auto-discovery for this package, because it conflicts with the global installed-package path.
|
|
58
58
|
|
|
@@ -81,12 +81,12 @@ Define the product requirements and constraints for `pi-agent-browser-native`.
|
|
|
81
81
|
- Because direct-binary usage is commonly blocked in normal agent sessions, the repo must carry a local command reference for the effective `agent_browser` surface and keep it in sync with upstream changes.
|
|
82
82
|
- Repository verification must include a lightweight command-reference drift check against the targeted installed upstream `agent-browser` version.
|
|
83
83
|
- Published package contents should include the canonical user-facing docs plus `LICENSE`.
|
|
84
|
-
- Published package contents should exclude agent-only and superseded docs such as `AGENTS.md
|
|
84
|
+
- Published package contents should exclude agent-only and superseded docs such as `AGENTS.md` and archived drafts under `docs/archive/`.
|
|
85
85
|
|
|
86
86
|
### Testing guidance
|
|
87
87
|
|
|
88
88
|
- The primary confidence path is a real `pi` session driven in `tmux`.
|
|
89
|
-
- For quick local checkout smoke validation, launch `pi --no-extensions -e .` from the repository root so only the checkout copy loads; do not rely on Pi settings or `/reload` semantics in this isolated mode.
|
|
89
|
+
- For quick local checkout smoke validation, launch `pi --approve --no-extensions -e .` from the repository root so only the checkout copy loads; do not rely on Pi settings or `/reload` semantics in this isolated mode.
|
|
90
90
|
- For hot-reload validation, configure exactly one active source for this extension in Pi settings and launch plain `pi`; validate `/reload` there because it exercises auto-discovered/configured resources.
|
|
91
91
|
- Maintain a tmux-driven configured-source lifecycle harness (`npm run verify -- lifecycle`; required before release per `docs/RELEASE.md`) that isolates Pi settings, uses exactly one configured source, exercises `/reload`, full restart plus exact `--session-id` relaunch, and asserts managed-session continuity, persisted artifact survival, and real Pi `tool_result` failure-patch semantics. It remains outside the default `npm run verify` sequence, but it is embedded in `npm run verify -- release` so `prepublishOnly` enforces it before publish unless scripts are intentionally skipped. The harness defaults Pi to model `zai/glm-5.1` (`scripts/verify-lifecycle.mjs`); pass `--model <id>` after `lifecycle` when a different model is required. Keep `docs/RELEASE.md` accurate about the harness behavior, cleanup, transcript retention, and limitations.
|
|
92
92
|
- Validate a full `pi` restart with exact `--session-id` relaunch or `/resume` when changes touch managed-session continuity, reload behavior, or persisted artifact paths. Validate branch-backed state changes with the focused `session_tree` harness tests.
|