pi-agent-browser-native 0.2.30 → 0.2.32
- package/CHANGELOG.md +31 -0
- package/README.md +51 -13
- package/docs/ARCHITECTURE.md +12 -10
- package/docs/COMMAND_REFERENCE.md +66 -15
- package/docs/ELECTRON.md +368 -0
- package/docs/RELEASE.md +40 -12
- package/docs/REQUIREMENTS.md +7 -4
- package/docs/SUPPORT_MATRIX.md +21 -10
- package/docs/TOOL_CONTRACT.md +200 -37
- package/extensions/agent-browser/index.ts +2305 -127
- package/extensions/agent-browser/lib/electron/cleanup.ts +287 -0
- package/extensions/agent-browser/lib/electron/discovery.ts +717 -0
- package/extensions/agent-browser/lib/electron/launch.ts +553 -0
- package/extensions/agent-browser/lib/playbook.ts +14 -13
- package/extensions/agent-browser/lib/results/presentation.ts +191 -9
- package/extensions/agent-browser/lib/results/shared.ts +95 -1
- package/extensions/agent-browser/lib/temp.ts +26 -0
- package/package.json +5 -4
package/docs/ELECTRON.md
ADDED
|
@@ -0,0 +1,368 @@
|
|
|
1
|
+
# Electron desktop apps
|
|
2
|
+
|
|
3
|
+
Related docs:
|
|
4
|
+
- [`../README.md`](../README.md)
|
|
5
|
+
- [`../AGENTS.md`](../AGENTS.md) — maintainer verification (`npm run verify`, lifecycle), Pi `tmux` smoke expectations, and upstream rebaselining
|
|
6
|
+
- [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md) — full `electron` and `qa.attached` field contracts
|
|
7
|
+
- [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) — workflow snippets in the broader native command surface
|
|
8
|
+
- [`ARCHITECTURE.md`](ARCHITECTURE.md) — wrapper design and the closed `RQ-0068` recipe-layer decision
|
|
9
|
+
- [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md) — `RQ-0096` Electron support row and verification gates
|
|
10
|
+
|
|
11
|
+
## Purpose
|
|
12
|
+
|
|
13
|
+
This guide is the entry point for using `pi-agent-browser-native` against desktop **Electron** applications. The wrapper exposes a top-level `electron` shorthand that owns the awkward discover → launch → attach → probe → cleanup sequence, so agents do not have to hand-build `--remote-debugging-port` argv, poll `DevToolsActivePort`, or kill processes and delete profile directories by hand. After attach, the rest of the native `agent_browser` surface (`snapshot`, `find`, `click`, `fill`, `get`, `eval --stdin`, `batch`, `qa.attached`, and similar) works the same way it does against a web page.
|
|
14
|
+
|
|
15
|
+
This document is structured for users, not implementers. Field-level rules live in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#electron); this guide focuses on **when** and **how** to use them, and on the safety and ownership boundary the wrapper enforces.
|
|
16
|
+
|
|
17
|
+
## Who this is for
|
|
18
|
+
|
|
19
|
+
- **Pi users** who want an agent to operate a local Electron app the same way it operates a web page.
|
|
20
|
+
- **Coding agents** that need a low-context lifecycle for desktop apps such as VS Code, Cursor, Obsidian, Slack, or any app built on Electron, without re-implementing the CDP attach dance every session.
|
|
21
|
+
- **Maintainers and reviewers** validating the wrapper's Electron behavior before release; verification evidence lives under `RQ-0096` in [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md).
|
|
22
|
+
|
|
23
|
+
It is **not** an upstream `agent-browser` reference and it does **not** replace the canonical [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#electron) for exact field semantics, validation rules, or failure categories.
|
|
24
|
+
|
|
25
|
+
## Mental model
|
|
26
|
+
|
|
27
|
+
```
|
|
28
|
+
electron.list → discover Electron apps (host-only; no upstream spawn)
|
|
29
|
+
electron.launch → launch a wrapper-owned isolated app, attach via CDP, hand off (snapshot|tabs|connect)
|
|
30
|
+
electron.status → liveness, debug-port, and target inspection (read-only)
|
|
31
|
+
electron.probe → compact one-call state read (title/url/focus/tabs/snapshot)
|
|
32
|
+
electron.cleanup → close managed session, stop the tracked process, remove the temp profile
|
|
33
|
+
qa.attached → smoke check against the currently attached session (no URL)
|
|
34
|
+
```
|
|
35
|
+
|
|
36
|
+
Two ownership modes coexist:
|
|
37
|
+
|
|
38
|
+
1. **Wrapper-owned launches** — `electron.launch` starts a brand-new app process with an **isolated temporary user-data-dir** and an **OS-chosen debug port**. The wrapper records a `launchId` for every such launch and `electron.cleanup` only operates on those `launchId`s.
|
|
39
|
+
2. **Manually launched apps** — you start the Electron app yourself (for example with `open -a Slack --args --remote-debugging-port=9222 --remote-allow-origins='*'`), then attach with `{ "args": ["connect", "9222"], "sessionMode": "fresh" }`. The wrapper does not own that process; **you** are responsible for shutting it down and cleaning its profile.
|
|
40
|
+
|
|
41
|
+
Choosing between the two is a real decision, not a stylistic one. See [Wrapper-owned vs manually launched](#wrapper-owned-vs-manually-launched).
|
|
42
|
+
|
|
43
|
+
## Quick start
|
|
44
|
+
|
|
45
|
+
Discover the app, launch with the default snapshot handoff, work with current refs, then clean up:
|
|
46
|
+
|
|
47
|
+
```json
|
|
48
|
+
{ "electron": { "action": "list", "query": "code" } }
|
|
49
|
+
{ "electron": { "action": "launch", "appName": "Visual Studio Code", "handoff": "snapshot" } }
|
|
50
|
+
{ "args": ["snapshot", "-i"] }
|
|
51
|
+
{ "electron": { "action": "probe", "timeoutMs": 5000 } }
|
|
52
|
+
{ "electron": { "action": "cleanup", "launchId": "electron-…" } }
|
|
53
|
+
```
|
|
54
|
+
|
|
55
|
+
The launch result carries both a `launchId` (used by `status`/`probe`/`cleanup`) and an attached `sessionName` (used by browser-style `snapshot`/`tab`/`click`/`find` calls). Read both from `details.electron.launch` and `details.electron.identifiers`. With default implicit session reuse, the quick-start `args: ["snapshot", "-i"]` line uses that attached session without an extra `--session` argument; pass `--session` explicitly when you target a named upstream session instead.
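The launch, work, cleanup sequence above can be wrapped so that cleanup runs even when the work in the middle fails. The following is a minimal sketch under stated assumptions, not the wrapper's own code: `callTool` is a hypothetical stand-in for however your host invokes the `agent_browser` tool, and reading the id from `details.electron.launch.launchId` is an assumed payload path that should be checked against `TOOL_CONTRACT.md`.

```typescript
// Hypothetical transport: sends one agent_browser request and resolves with its result.
type ToolResult = { details?: { electron?: { launch?: { launchId?: string } } } };
type ToolCall = (request: Record<string, unknown>) => Promise<ToolResult>;

// Launches a wrapper-owned app, runs `work`, and always attempts cleanup afterwards.
async function withElectronApp<T>(
  callTool: ToolCall,
  appName: string,
  work: (launchId: string) => Promise<T>,
): Promise<T> {
  const launched = await callTool({
    electron: { action: "launch", appName, handoff: "snapshot" },
  });
  // Assumed location of the launch id; verify against the tool contract.
  const launchId = launched.details?.electron?.launch?.launchId;
  if (!launchId) {
    throw new Error("launch result did not include a launchId");
  }
  try {
    return await work(launchId);
  } finally {
    // Cleanup only ever targets the launchId created by this call.
    await callTool({ electron: { action: "cleanup", launchId } });
  }
}
```

Because cleanup sits in `finally`, a thrown error inside `work` still triggers one `cleanup` request for the same `launchId`. This pattern only fits wrapper-owned launches; manually launched apps are never passed to `electron.cleanup`.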
|
|
56
|
+
|
|
57
|
+
For a quick "is the app actually showing what we expect?" smoke check after attach:
|
|
58
|
+
|
|
59
|
+
```json
|
|
60
|
+
{ "qa": { "attached": true, "expectedText": "Explorer", "screenshotPath": ".dogfood/electron.png" } }
|
|
61
|
+
```
|
|
62
|
+
|
|
63
|
+
`qa.attached` runs against the **current managed session** without opening a URL, so it works for any attached app — wrapper-owned or manually launched.
|
|
64
|
+
|
|
65
|
+
## Wrapper-owned vs manually launched
|
|
66
|
+
|
|
67
|
+
Pick the mode that matches the **state you need**.
|
|
68
|
+
|
|
69
|
+
| | `electron.launch` (wrapper-owned) | `args: ["connect", …]` (manual host launch) |
|
|
70
|
+
|---|---|---|
|
|
71
|
+
| Profile | Isolated temporary `userDataDir` | The app's normal profile (your real signed-in state) |
|
|
72
|
+
| Debug port | OS-chosen via `--remote-debugging-port=0` and `DevToolsActivePort` | Caller-supplied port (for example `9222`) |
|
|
73
|
+
| Signed-in state | **No** — first-run or empty profile | **Yes** — whatever is in the launched profile |
|
|
74
|
+
| Already-running app | Cannot attach to it | Required (or relaunch yourself with a debug port) |
|
|
75
|
+
| Lifecycle ownership | Wrapper owns shutdown and profile cleanup | **You** own shutdown and profile cleanup |
|
|
76
|
+
| When to use | Anything you can do against a fresh app: tooling, UX flows, scripted local QA, exploring panels, packaged debugging | Tasks that explicitly need the user's signed-in Slack/Obsidian/VS Code state |
|
|
77
|
+
| How to clean up | `electron.cleanup` with the returned `launchId` | Close the app yourself; do **not** call `electron.cleanup` |
|
|
78
|
+
|
|
79
|
+
### Manual host-launch pattern
|
|
80
|
+
|
|
81
|
+
When the explicit goal is the user's signed-in local app state and the app is not already running:
|
|
82
|
+
|
|
83
|
+
```bash
|
|
84
|
+
# macOS example
|
|
85
|
+
open -a Slack --args --remote-debugging-port=9222 --remote-allow-origins='*'
|
|
86
|
+
```
|
|
87
|
+
|
|
88
|
+
Then attach and clean up yourself:
|
|
89
|
+
|
|
90
|
+
```json
|
|
91
|
+
{ "args": ["connect", "9222"], "sessionMode": "fresh" }
|
|
92
|
+
{ "args": ["snapshot", "-i"] }
|
|
93
|
+
{ "qa": { "attached": true, "expectedText": "Channels" } }
|
|
94
|
+
```
|
|
95
|
+
|
|
96
|
+
If the app is already running without a debug port, ask before relaunching it: relaunching may lose unsaved state, and starting a second copy alongside it will not help, because apps that enforce a single instance silently drop the second invocation's `--remote-debugging-port` flag.
|
|
97
|
+
|
|
98
|
+
## Action reference
|
|
99
|
+
|
|
100
|
+
The exact field schemas, validation rules, and `details.*` payload shapes live in [`TOOL_CONTRACT.md#electron`](TOOL_CONTRACT.md#electron). This section is a usage-oriented overview.
|
|
101
|
+
|
|
102
|
+
### `electron.list` — discover apps
|
|
103
|
+
|
|
104
|
+
Host-only scan; does not spawn upstream `agent-browser`. In v1, discovery supports macOS (`/Applications/*.app`, `~/Applications/*.app`) and Linux (`.desktop` launchers under standard XDG, Flatpak, and Snap locations). On Windows (and any other non-macOS, non-Linux host), `list` returns `details.electron.platform: "unsupported"` with an empty `apps` array. On those hosts, pass `executablePath` (or a host `appPath` that resolves to a verifiable Electron binary) to `launch` instead; `inspectElectronExecutablePath` in `extensions/agent-browser/lib/electron/discovery.ts` still gates Windows executables before spawn.
|
|
105
|
+
|
|
106
|
+
```json
|
|
107
|
+
{ "electron": { "action": "list", "query": "code", "maxResults": 25 } }
|
|
108
|
+
```
|
|
109
|
+
|
|
110
|
+
Returns app metadata under `details.electron.apps`: `name`, optional `bundleId`/`desktopId`, `appPath`, `executablePath`, `platform`, and optional non-blocking `sensitivity` annotations. Apps flagged as likely sensitive (categories such as `notes`, `chat`, `mail`, `developer-workspace`, or `passwords-auth`) are printed with `[likely sensitive: …]`. These are **advisory hints**, not enforcement; see [Safety and ownership](#safety-and-ownership) for the policy boundary.
|
|
111
|
+
|
|
112
|
+
### `electron.launch` — launch and attach
|
|
113
|
+
|
|
114
|
+
Pass **exactly one** target: `appPath`, `appName`, `bundleId`, or `executablePath`. The wrapper resolves the target, verifies Electron framework evidence, applies optional caller-owned `allow` / `deny` policy, creates an isolated temp `userDataDir`, launches with `--remote-debugging-port=0` plus safe defaults, reads `DevToolsActivePort`, then attaches through upstream `connect` as a fresh managed session.
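The "exactly one target" rule can be checked on the caller side before sending a request. The sketch below is illustrative and is not the wrapper's validator: `LaunchTarget`, `providedTargets`, and `checkSingleTarget` are hypothetical names, and treating an empty or whitespace-only string as "not set" is an assumption.

```typescript
// Hypothetical caller-side type; field names mirror the documented launch targets.
type LaunchTarget = {
  appPath?: string;
  appName?: string;
  bundleId?: string;
  executablePath?: string;
};

const TARGET_KEYS = ["appPath", "appName", "bundleId", "executablePath"] as const;

// Returns the names of the target fields that hold a non-empty string.
function providedTargets(target: LaunchTarget): string[] {
  return TARGET_KEYS.filter((key) => {
    const value = target[key];
    return typeof value === "string" && value.trim().length > 0;
  });
}

// Returns an error message, or undefined when exactly one target is set.
function checkSingleTarget(target: LaunchTarget): string | undefined {
  const provided = providedTargets(target);
  if (provided.length === 1) return undefined;
  if (provided.length === 0) {
    return "expected one of: " + TARGET_KEYS.join(", ");
  }
  return "expected exactly one target, got: " + provided.join(", ");
}
```

A request that fails this check would be expected to come back from the wrapper as a `validation-error`, so catching it early saves a tool call.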
|
|
115
|
+
|
|
116
|
+
```json
|
|
117
|
+
{
|
|
118
|
+
"electron": {
|
|
119
|
+
"action": "launch",
|
|
120
|
+
"appName": "Visual Studio Code",
|
|
121
|
+
"handoff": "snapshot",
|
|
122
|
+
"targetType": "page",
|
|
123
|
+
"timeoutMs": 30000,
|
|
124
|
+
"appArgs": ["--disable-telemetry"]
|
|
125
|
+
}
|
|
126
|
+
}
|
|
127
|
+
```
|
|
128
|
+
|
|
129
|
+
Handoff selection (`handoff` field):
|
|
130
|
+
|
|
131
|
+
| Value | Behavior | When to use |
|
|
132
|
+
|---|---|---|
|
|
133
|
+
| `"snapshot"` (default) | Attach, list targets, capture `snapshot -i` in one call | You need interactive refs immediately for clicks/fills |
|
|
134
|
+
| `"tabs"` | Attach and list targets only | Safer diagnostic start when you only need target discovery |
|
|
135
|
+
| `"connect"` | Attach and stop | You will run your own follow-up commands |
|
|
136
|
+
|
|
137
|
+
`targetType` defaults to `"page"`; use `"webview"` or `"any"` for apps whose useful UI is exposed as a webview target.
|
|
138
|
+
|
|
139
|
+
Optional `timeoutMs` on `electron.launch` bounds host-side CDP readiness (waiting for `DevToolsActivePort` and attach). When omitted, the default is **15 seconds** with a hard maximum of **120 seconds**, matching `ELECTRON_LAUNCH_DEFAULT_TIMEOUT_MS` and `ELECTRON_LAUNCH_MAX_TIMEOUT_MS` in `extensions/agent-browser/lib/electron/launch.ts`.
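As a rough illustration of how the default and the hard maximum interact, the sketch below computes an effective launch timeout. It is not the `normalizeTimeoutMs` implementation: the function name is hypothetical, and both the clamping of over-limit values (rather than rejecting them) and the fallback for non-positive values are assumptions.

```typescript
// Values quoted in this guide; the real constants live in launch.ts.
const DEFAULT_LAUNCH_TIMEOUT_MS = 15000;
const MAX_LAUNCH_TIMEOUT_MS = 120000;

// Assumption: over-limit values are clamped rather than rejected, and
// missing, non-finite, or non-positive values fall back to the default.
function effectiveLaunchTimeoutMs(requested?: number): number {
  if (requested === undefined || !isFinite(requested) || requested <= 0) {
    return DEFAULT_LAUNCH_TIMEOUT_MS;
  }
  return Math.min(Math.floor(requested), MAX_LAUNCH_TIMEOUT_MS);
}
```

Under these assumptions, the `timeoutMs: 30000` from the example above stays at 30 seconds, while a request for ten minutes would be reduced to the 120-second ceiling.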
|
|
140
|
+
|
|
141
|
+
Wrapper-owned launches **always** use an isolated temp profile and an OS-chosen port. `--user-data-dir`, `--remote-debugging-port`, `--remote-debugging-address`, `--remote-debugging-pipe`, and bare `--` in `appArgs` are rejected. There is no caller-supplied port and no way to make `electron.launch` reuse the app's normal signed-in profile or attach to an already-running app — by design. Use the manual path described above when those are the actual requirements.
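A caller can pre-screen `appArgs` for the reserved flags listed above before calling `electron.launch`. This is a minimal sketch, not the wrapper's sanitizer: `findReservedAppArgs` is a hypothetical helper, and matching a flag both bare (`--flag`) and with a value (`--flag=value`) is an assumption about how the flags may appear.

```typescript
// Flags this guide lists as rejected in `appArgs` for wrapper-owned launches.
const RESERVED_FLAGS = [
  "--user-data-dir",
  "--remote-debugging-port",
  "--remote-debugging-address",
  "--remote-debugging-pipe",
];

// Returns the entries of `appArgs` that a wrapper-owned launch is expected to reject.
function findReservedAppArgs(appArgs: string[]): string[] {
  return appArgs.filter((arg) => {
    if (arg === "--") return true; // bare argument separator
    const eq = arg.indexOf("=");
    const name = eq === -1 ? arg : arg.slice(0, eq);
    return RESERVED_FLAGS.indexOf(name) !== -1;
  });
}
```

An empty result means none of the documented reserved flags are present; the wrapper may still apply further sanitizing that this sketch does not model.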
|
|
142
|
+
|
|
143
|
+
### `electron.status` — liveness and targets
|
|
144
|
+
|
|
145
|
+
Read-only inspection of one or more tracked launches. Without `launchId` or `all`, it selects the single active wrapper launch when unambiguous.
|
|
146
|
+
|
|
147
|
+
```json
|
|
148
|
+
{ "electron": { "action": "status" } }
|
|
149
|
+
{ "electron": { "action": "status", "launchId": "electron-…" } }
|
|
150
|
+
{ "electron": { "action": "status", "all": true } }
|
|
151
|
+
```
|
|
152
|
+
|
|
153
|
+
Reports `cleanupState`, debug-port and PID liveness, and bounded CDP target metadata under `details.electron.statuses`. Mismatch fields surface when the current managed session or tab no longer matches a live wrapper launch target — typically the cue to follow `reattach-electron-launch` before trusting old refs.
|
|
154
|
+
|
|
155
|
+
### `electron.probe` — compact state read
|
|
156
|
+
|
|
157
|
+
`probe` collapses what would otherwise be separate `get title` / `get url` / focused-element `eval` / `tab list` / `snapshot -i` calls into one bounded result. Use it instead of chaining those reads when you just need a quick "where are we?" check.
|
|
158
|
+
|
|
159
|
+
```json
|
|
160
|
+
{ "electron": { "action": "probe" } }
|
|
161
|
+
{ "electron": { "action": "probe", "launchId": "electron-…", "timeoutMs": 5000 } }
|
|
162
|
+
```
|
|
163
|
+
|
|
164
|
+
Output appears under `details.electron.probe`: `title`, `url`, `focusedElement`, `activeTab`, `tabs`, compact `snapshot` metadata (`refCount`, `refIds`, optional text preview and omission counts), and `errors`. When `launchId` is given, the probe is tied to that tracked launch and will surface mismatch guidance if the wrapper sees a session or target drift; visible output also includes debug-port/pid liveness so a stale `about:blank` against a dead launch is unmistakable.
|
|
165
|
+
|
|
166
|
+
`timeoutMs` bounds each underlying read subprocess. Use it for dense desktop apps when the default budget is too short, or to fail fast when you suspect the app process is wedged.
|
|
167
|
+
|
|
168
|
+
### `electron.cleanup` — wrapper-owned only
|
|
169
|
+
|
|
170
|
+
Closes the tracked managed session, stops only the wrapper-tracked process, verifies that the debug port no longer serves `/json/version`, and removes the wrapper-created `userDataDir`. A partial cleanup failure fails the tool result with `failureCategory: "cleanup-failed"`, and the `retry-electron-cleanup` next action references the same `launchId` so retries are bounded.
|
|
171
|
+
|
|
172
|
+
```json
|
|
173
|
+
{ "electron": { "action": "cleanup", "launchId": "electron-…" } }
|
|
174
|
+
{ "electron": { "action": "cleanup", "all": true } }
|
|
175
|
+
```
|
|
176
|
+
|
|
177
|
+
`electron.cleanup` **never** targets:
|
|
178
|
+
|
|
179
|
+
- manually launched apps
|
|
180
|
+
- externally supplied debug ports
|
|
181
|
+
- arbitrary Electron processes the wrapper did not start
|
|
182
|
+
|
|
183
|
+
For manual launches, close the app yourself and clean its profile/temp files with normal host tools.
|
|
184
|
+
|
|
185
|
+
On Pi session shutdown, active wrapper-owned Electron launches are best-effort cleaned. Stale restored records (PID gone, port dead) are **reported** instead of guessed at or killed.
|
|
186
|
+
|
|
187
|
+
### `timeoutMs` by action (quick reference)
|
|
188
|
+
|
|
189
|
+
`electron.list` does not take `timeoutMs` (host scan only). For every other action, `timeoutMs` applies to **different surfaces**; treat values as per-call budgets, not one global knob. Authoritative rules and env overrides live under **Validation and defaults** in [`TOOL_CONTRACT.md#electron`](TOOL_CONTRACT.md#electron).
|
|
190
|
+
|
|
191
|
+
| Action | What `timeoutMs` covers when set | Typical default when omitted |
|
|
192
|
+
| --- | --- | --- |
|
|
193
|
+
| `launch` | Host-side wait for `DevToolsActivePort` and CDP readiness | **15 s**, hard-capped at **120 s** (`normalizeTimeoutMs` in `extensions/agent-browser/lib/electron/launch.ts`) |
|
|
194
|
+
| `status` | Optional managed-session `get title` / `get url` reads used for mismatch diagnostics | Normal tool subprocess budget from `runAgentBrowserProcess` / `AGENT_BROWSER_DEFAULT_TIMEOUT`; localhost CDP HTTP probes keep a short fixed budget (`ELECTRON_STATUS_FETCH_TIMEOUT_MS` in `extensions/agent-browser/lib/electron/cleanup.ts`) |
|
|
195
|
+
| `cleanup` | One combined budget for managed-session `close`, tracked process exit, debug-port verification, and temp profile removal | `PI_AGENT_BROWSER_IMPLICIT_SESSION_CLOSE_TIMEOUT_MS` when set, else **5000 ms** (`getImplicitSessionCloseTimeoutMs` in `extensions/agent-browser/lib/runtime.ts`, passed through `cleanupTrackedElectronLaunches` in `extensions/agent-browser/index.ts`) |
|
|
196
|
+
| `probe` | **Each** upstream read in the probe chain (`get title`, `get url`, focused `eval --stdin`, `tab list`, `snapshot -i`) | Same default as other tool calls (typically **28 s** per subprocess unless `AGENT_BROWSER_DEFAULT_TIMEOUT` / `PI_AGENT_BROWSER_PROCESS_TIMEOUT_MS` overrides `runAgentBrowserProcess` in `extensions/agent-browser/lib/process.ts`) |
|
|
197
|
+
|
|
198
|
+
## `qa.attached` — current-session smoke check
|
|
199
|
+
|
|
200
|
+
`qa` has two forms: the URL form (`qa: { url, … }`) and the attached form (`qa: { attached: true, … }`). The attached form is the right tool for Electron smoke checks after either launch path because it does not open a URL and runs all checks against the current managed session.
|
|
201
|
+
|
|
202
|
+
```json
|
|
203
|
+
{
|
|
204
|
+
"qa": {
|
|
205
|
+
"attached": true,
|
|
206
|
+
"expectedText": "Explorer",
|
|
207
|
+
"expectedSelector": "@e1",
|
|
208
|
+
"checkConsole": true,
|
|
209
|
+
"checkErrors": true,
|
|
210
|
+
"screenshotPath": ".dogfood/electron.png"
|
|
211
|
+
}
|
|
212
|
+
}
|
|
213
|
+
```
|
|
214
|
+
|
|
215
|
+
`qa.attached` rejects `url` and is incompatible with `sessionMode: "fresh"` — attach first with `electron.launch` or raw `connect`, then run `qa.attached`. The full field rules and pass/fail classification live in [`TOOL_CONTRACT.md#qa`](TOOL_CONTRACT.md#qa).
|
|
216
|
+
|
|
217
|
+
In attached Electron sessions, broad selectors such as `body`, `html`, `main`, or `[role=application]` can read the entire app shell. When `get text <selector>` looks too broad, the wrapper may attach `details.electronGetTextScopeWarning` and a `snapshot-for-electron-text-scope` next action; prefer a fresh `snapshot -i`, a current `@ref`, or a narrower panel selector.
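To avoid the broad-selector case before issuing `get text`, a caller can screen selectors against the examples named above. The sketch is illustrative only: `isBroadSelector` is a hypothetical helper, the list contains just the four selectors this guide mentions, and the wrapper's own detection may differ.

```typescript
// Selectors this guide names as likely to read the entire app shell.
const BROAD_SELECTORS = ["body", "html", "main", "[role=application]"];

// Returns true when the selector is one of the known broad selectors.
// Quotes are stripped so [role="application"] and [role=application] compare equal.
function isBroadSelector(selector: string): boolean {
  const normalized = selector.trim().toLowerCase().replace(/["']/g, "");
  return BROAD_SELECTORS.indexOf(normalized) !== -1;
}
```

When this returns `true`, taking a fresh `snapshot -i` and reading through a current `@ref` is usually the better move.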
|
|
218
|
+
|
|
219
|
+
## `sourceLookup` against packaged Electron apps
|
|
220
|
+
|
|
221
|
+
`sourceLookup` is an experiment for hinting at the source file/component behind a visible element. It is **opt-in** and **evidence-based**: it reports confidence and evidence rather than claiming a guaranteed mapping. The same experimental helper works against packaged Electron apps, but with two important boundaries:
|
|
222
|
+
|
|
223
|
+
1. **Scope of the workspace scan.** `sourceLookup` walks the Pi session **cwd** (default `maxWorkspaceFiles: 2000`, hard cap 5000). It does **not** unpack `app.asar` or installed app resources. For packaged apps where the source lives inside `Contents/Resources/app.asar`, the workspace-search lane will commonly return no candidates.
|
|
224
|
+
2. **React DevTools requirement.** `react inspect <id>` requires the session to have been launched with `--enable react-devtools` before first navigation. For Electron, the wrapper's `electron.launch` path does **not** inject `--enable react-devtools` into the Electron process; that flag belongs to upstream `agent-browser` Chromium launches. If the Electron app does not already expose a React DevTools backend, expect `react inspect` to fail; DOM-attribute and workspace-search candidates may still surface.
|
|
225
|
+
|
|
226
|
+
For wrapper-tracked packaged Electron sessions where `status` is `no-candidates`, the wrapper attaches `workspaceRoot`, an optional `electronContext` (`launchId?`, `appName?`, `appPath?`, `executablePath?`, `sessionName?`, `url?`), and limitations explaining the bundle/asar boundary. It also appends `snapshot-electron-session`, `probe-electron-launch`, and `list-electron-tabs` next actions so you can inspect the live app and decide whether to widen the workspace or pull source out-of-band before re-running the lookup.
|
|
227
|
+
|
|
228
|
+
```json
|
|
229
|
+
{ "sourceLookup": { "selector": "#save", "reactFiberId": "2", "componentName": "SaveButton" } }
|
|
230
|
+
```
|
|
231
|
+
|
|
232
|
+
Treat `sourceLookup` output as a starting point for navigation, not a substitute for reading code. Full contract: [`TOOL_CONTRACT.md#sourcelookup`](TOOL_CONTRACT.md#sourcelookup).
|
|
233
|
+
|
|
234
|
+
## Safety and ownership
|
|
235
|
+
|
|
236
|
+
Remote debugging exposes app content (DOM, network, JavaScript) to the attached browser tool. The wrapper ships **isolation defaults**; it does **not** classify any app as too-risky-to-launch.
|
|
237
|
+
|
|
238
|
+
### What the wrapper always does
|
|
239
|
+
|
|
240
|
+
- Launches with `--user-data-dir=<wrapper-created-temp>` and `--remote-debugging-port=0`.
|
|
241
|
+
- Reads the OS-chosen port from `DevToolsActivePort`.
|
|
242
|
+
- Adds `--disable-extensions`, `--no-first-run`, and `--no-default-browser-check` alongside sanitized caller `appArgs`.
|
|
243
|
+
- Rejects `appArgs` that try to override lifecycle/debug flags.
|
|
244
|
+
- Refuses to launch non-Electron targets (correctness gate, not a security gate).
|
|
245
|
+
- Treats `electron.cleanup` as wrapper-owned only; never touches manually launched apps.
|
|
246
|
+
|
|
247
|
+
### What the **caller** owns
|
|
248
|
+
|
|
249
|
+
- The decision to launch or attach to a sensitive app in the first place.
|
|
250
|
+
- Optional `allow` / `deny` policy lists when you want guardrails.
|
|
251
|
+
- Profile and process cleanup for manually launched apps.
|
|
252
|
+
- Host-file cleanup for any explicit screenshots, downloads, HARs, traces, or recordings saved to caller-chosen paths. `electron.cleanup` does not touch these.
|
|
253
|
+
|
|
254
|
+
### Caller-owned policy: `allow` / `deny`
|
|
255
|
+
|
|
256
|
+
Both lists match `appName`, `bundleId`, `desktopId`, `appPath`, or `executablePath` by substring.
|
|
257
|
+
|
|
258
|
+
```json
|
|
259
|
+
{
|
|
260
|
+
"electron": {
|
|
261
|
+
"action": "launch",
|
|
262
|
+
"appName": "Slack",
|
|
263
|
+
"allow": ["Slack"],
|
|
264
|
+
"deny": ["1Password", "Bitwarden"]
|
|
265
|
+
}
|
|
266
|
+
}
|
|
267
|
+
```
|
|
268
|
+
|
|
269
|
+
Rules:
|
|
270
|
+
|
|
271
|
+
- If `allow` is set, the target must match at least one entry.
|
|
272
|
+
- If `deny` is set, a matching target is rejected.
|
|
273
|
+
- `deny` wins on conflict.
|
|
274
|
+
- With neither set, launch is permitted.
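The four rules above can be expressed as a small decision function. This is a sketch under stated assumptions rather than the wrapper's policy code: `PolicyTarget`, `PolicyDecision`, `firstMatch`, and `evaluatePolicy` are hypothetical names, matching is assumed to be case-sensitive, and an empty `allow` list is assumed to behave like an unset one.

```typescript
// Identifiers the guide says both lists are matched against.
type PolicyTarget = {
  appName?: string;
  bundleId?: string;
  desktopId?: string;
  appPath?: string;
  executablePath?: string;
};

type PolicyDecision =
  | { allowed: true }
  | { allowed: false; list: "allow" | "deny"; entry?: string };

// Returns the first list entry that is a substring of any target identifier.
function firstMatch(values: string[], entries: string[]): string | undefined {
  for (const entry of entries) {
    if (entry.length === 0) continue; // an empty entry would match everything
    for (const value of values) {
      if (value.indexOf(entry) !== -1) return entry;
    }
  }
  return undefined;
}

function evaluatePolicy(target: PolicyTarget, allow?: string[], deny?: string[]): PolicyDecision {
  const values = [target.appName, target.bundleId, target.desktopId, target.appPath, target.executablePath]
    .filter((value): value is string => typeof value === "string" && value.length > 0);

  // deny is checked first, so it wins when both lists match.
  const denied = firstMatch(values, deny ?? []);
  if (denied !== undefined) return { allowed: false, list: "deny", entry: denied };

  // Assumption: an empty allow list is treated the same as an unset one.
  if (allow !== undefined && allow.length > 0 && firstMatch(values, allow) === undefined) {
    return { allowed: false, list: "allow" };
  }
  return { allowed: true };
}
```

Checking `deny` before `allow` is what makes "deny wins on conflict" hold without a separate conflict branch.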
|
|
275
|
+
|
|
276
|
+
Policy rejections fail with `failureCategory: "policy-blocked"`, and `details.electron.failure.policy` names the matched list and entry.
|
|
277
|
+
|
|
278
|
+
### Likely-sensitive annotations
|
|
279
|
+
|
|
280
|
+
`electron.list` may annotate common private-data apps (`notes`, `chat`, `mail`, `developer-workspace`, `passwords-auth`) with `sensitivity.level: "likely-sensitive"` and a visible `[likely sensitive: …]` marker. These are **advisory hints only**. They do not block `launch` and they do not replace caller `allow` / `deny`.
|
|
281
|
+
|
|
282
|
+
## Failure categories and recovery
|
|
283
|
+
|
|
284
|
+
`details.failureCategory` values you should expect from Electron flows, with the recovery move:
|
|
285
|
+
|
|
286
|
+
| Category | When | Recovery |
|
|
287
|
+
|---|---|---|
|
|
288
|
+
| `validation-error` | Bad input (missing target, conflicting fields, non-Electron target) | Fix the request; the message names the problem |
|
|
289
|
+
| `policy-blocked` | Caller `allow` / `deny` rejected the launch | Adjust the policy or pick a different target |
|
|
290
|
+
| `timeout` | `DevToolsActivePort` never appeared in time | Inspect `details.electron.failure.diagnostics` (PID, profile path, port file state, elapsed/timeout); retry with a higher `timeoutMs` if the app legitimately needs more time |
|
|
291
|
+
| `upstream-error` | Launch/attach/spawn/CDP failure that does not fit a more specific bucket | Inspect `details.electron.failure.diagnostics`; the app may be missing dependencies or hitting a CDP race |
|
|
292
|
+
| `tab-drift` | A successful-looking command was followed by a dead process / debug port / unrecoverable `about:blank` | Use the appended `status-electron-launch` / `probe-electron-launch` next actions, then decide whether to relaunch |
|
|
293
|
+
| `cleanup-failed` | Cleanup only partially succeeded | Inspect `details.electron.cleanup.results[].steps` for remaining process/port/profile state; `retry-electron-cleanup` references the same `launchId` |
|
|
294
|
+
| `stale-ref` | `@e…` ref reused after a navigation/rerender | Take a fresh `snapshot -i` (or follow `refresh-electron-refs-after-rerender` when the wrapper appends it) |
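The table above can be turned into a small lookup so an agent reacts to each category consistently. This is a hedged sketch: `Recovery`, `RECOVERY_BY_CATEGORY`, and `recoveryFor` are hypothetical names, the hints paraphrase the table, and the next-action ids are the ones this guide mentions. The wrapper decides which next actions it actually appends to a given result.

```typescript
// Failure categories listed in the table above.
type ElectronFailureCategory =
  | "validation-error"
  | "policy-blocked"
  | "timeout"
  | "upstream-error"
  | "tab-drift"
  | "cleanup-failed"
  | "stale-ref";

type Recovery = { hint: string; nextActions: string[] };

const RECOVERY_BY_CATEGORY: Record<ElectronFailureCategory, Recovery> = {
  "validation-error": { hint: "fix the request; the message names the problem", nextActions: [] },
  "policy-blocked": { hint: "adjust allow/deny or pick a different target", nextActions: [] },
  "timeout": { hint: "inspect failure diagnostics; raise timeoutMs if the app needs more time", nextActions: [] },
  "upstream-error": { hint: "inspect failure diagnostics", nextActions: [] },
  "tab-drift": {
    hint: "check liveness, then decide whether to relaunch",
    nextActions: ["status-electron-launch", "probe-electron-launch"],
  },
  "cleanup-failed": {
    hint: "inspect cleanup steps, then retry for the same launchId",
    nextActions: ["retry-electron-cleanup"],
  },
  "stale-ref": {
    hint: "take a fresh snapshot -i before reusing refs",
    nextActions: ["refresh-electron-refs-after-rerender"],
  },
};

// Returns the recovery entry for a known category, or undefined for anything else.
function recoveryFor(category: string): Recovery | undefined {
  return Object.prototype.hasOwnProperty.call(RECOVERY_BY_CATEGORY, category)
    ? RECOVERY_BY_CATEGORY[category as ElectronFailureCategory]
    : undefined;
}
```

Returning `undefined` for unknown categories keeps the caller from guessing a recovery for failures this guide does not cover.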
|
|
295
|
+
|
|
296
|
+
Single-instance Electron behavior is a common cause of `timeout` and `upstream-error`. Many Electron apps enforce a single running instance and silently drop a second invocation's `--remote-debugging-port` flag. If the app is already running without a debug port, quit it first or use the manual host-launch path against the existing instance instead.
|
|
297
|
+
|
|
298
|
+
## Troubleshooting
|
|
299
|
+
|
|
300
|
+
### Launch hangs and then times out
|
|
301
|
+
- The app is enforcing single-instance; quit the running copy first, then retry.
|
|
302
|
+
- The app may have moved its Electron framework directory; pass `executablePath` explicitly.
|
|
303
|
+
- `timeoutMs` is too short for a heavy app; raise it. `launch.timeoutMs` defaults to 15 seconds and is capped at 120 seconds.
|
|
304
|
+
- Read `details.electron.failure.diagnostics`: presence/absence of `DevToolsActivePort`, port number, PID liveness, and elapsed time usually identify the issue.
|
|
305
|
+
|
|
306
|
+
### `electron.list` returns nothing
|
|
307
|
+
- On Linux, the binary may be a custom rebrand without `chrome_*.pak` siblings, an AppImage without a `.desktop` entry, or a statically linked fork. Pass `executablePath` directly.
|
|
308
|
+
- On macOS, apps installed outside `/Applications` and `~/Applications` are not scanned in v1. Pass `appPath` or `executablePath` explicitly.
|
|
309
|
+
- Windows hosts report `platform: "unsupported"` from `electron.list`; always pass `executablePath` (or a resolvable `appPath`) for `launch`.
|
|
310
|
+
|
|
311
|
+
### Attach succeeds but `snapshot -i` returns no refs
|
|
312
|
+
- Some Electron apps take a beat to render. The default `handoff: "snapshot"` already retries briefly; if it still reports no refs, run `snapshot -i` once more before treating the UI as blank.
|
|
313
|
+
- For apps whose UI lives in a webview, switch `targetType` to `"webview"` or `"any"` so the wrapper attaches to the right CDP target.
|
|
314
|
+
|
|
315
|
+
### "I clicked, but nothing happened"
|
|
316
|
+
- A successful upstream `click` means the action was dispatched, not that the app handled it. Re-snapshot, check `details.pageChangeSummary`, or use `qa.attached` to verify.
|
|
317
|
+
- Electron apps frequently rerender in place (no URL change). The wrapper may attach `refresh-electron-refs-after-rerender` to remind you to re-snapshot before reusing `@e…` refs.
|
|
318
|
+
|
|
319
|
+
### `fill` looks fine but the field is empty
|
|
320
|
+
- Custom quick-input controls (VS Code's quick-pick, command palette, etc.) often need focus + keyboard typing rather than a direct `fill`. The wrapper attaches `details.fillVerification` when `get value` disagrees with the requested text; follow `inspect-after-fill-verification` and switch to focus + `keyboard type` before submitting.
|
|
321
|
+
|
|
322
|
+
### `get text` returns the whole app
|
|
323
|
+
- Broad selectors (`body`, `html`, `main`, `[role=application]`) read the entire shell. Use a current `@ref` or a narrower panel selector. The wrapper attaches `details.electronGetTextScopeWarning` and a `snapshot-for-electron-text-scope` next action when it detects this pattern.
|
|
324
|
+
|
|
325
|
+
### `sourceLookup` says `no-candidates` for a packaged app
|
|
326
|
+
- Expected when the app's source lives inside `app.asar`. The wrapper does not unpack bundles. Use `electron.probe` / `snapshot-electron-session` / `list-electron-tabs` next actions to inspect the live UI, or pull source separately into the Pi session cwd before re-running the lookup.
|
|
327
|
+
|
|
328
|
+
### Mismatch between `status` and the active session
|
|
329
|
+
- `electron.status` may report a live wrapper launch while the managed session has drifted to `about:blank`. Follow `reattach-electron-launch`, then refresh refs with `snapshot-electron-session` before continuing.
|
|
330
|
+
|
|
331
|
+
## Cleanup checklist
|
|
332
|
+
|
|
333
|
+
Before ending the task:
|
|
334
|
+
|
|
335
|
+
- Call `electron.cleanup` with each wrapper-owned `launchId` you started, or once with `all: true`. The result reports per-step state for `managed-session`, `process`, `debug-port`, and `user-data-dir`.
|
|
336
|
+
- Confirm `details.electron.cleanup.summary` does not list remaining resources.
|
|
337
|
+
- For **manually launched** apps, close the app yourself and clean any profile or temp files you created. `electron.cleanup` will not (and should not) touch them.
|
|
338
|
+
- Remove any explicit screenshots, recordings, downloads, PDFs, traces, or HAR files you saved to caller-chosen paths. Artifact cleanup is host-owned; the wrapper only reports them under `details.artifacts` and `details.artifactCleanup`.
|
|
339
|
+
|
|
340
|
+
If `cleanup` returns `failureCategory: "cleanup-failed"`, inspect `details.electron.cleanup.results[].steps` and use `retry-electron-cleanup` for the same `launchId`. Do not invent new cleanup commands for processes the wrapper did not start.
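When deciding whether a retry is needed, it can help to reduce the per-step cleanup report to "what is still left, per launch". The sketch below assumes a simplified step shape: the `name` and `ok` fields and the `remainingByLaunch` helper are hypothetical, and only the four step names come from this guide. Consult `TOOL_CONTRACT.md` for the real layout of `details.electron.cleanup.results[].steps`.

```typescript
// Step names reported by cleanup, per this guide.
type CleanupStepName = "managed-session" | "process" | "debug-port" | "user-data-dir";

// Hypothetical step shape: `name` and `ok` are illustrative, not the contract's field names.
type CleanupStep = { name: CleanupStepName; ok: boolean };
type CleanupResult = { launchId: string; steps: CleanupStep[] };

// Maps each launchId to the steps that did not complete, skipping fully clean launches.
function remainingByLaunch(results: CleanupResult[]): Record<string, CleanupStepName[]> {
  const remaining: Record<string, CleanupStepName[]> = {};
  for (const result of results) {
    const failed = result.steps.filter((step) => !step.ok).map((step) => step.name);
    if (failed.length > 0) {
      remaining[result.launchId] = failed;
    }
  }
  return remaining;
}
```

An empty object means nothing is left to retry; otherwise each key is a `launchId` to pass back through `retry-electron-cleanup`.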
|
|
341
|
+
|
|
342
|
+
## Verification and benchmarks
|
|
343
|
+
|
|
344
|
+
Electron support is gated by the same release evidence as the rest of the wrapper:
|
|
345
|
+
|
|
346
|
+
- `RQ-0096` in [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md) records the contract, runtime, test, and verification coverage.
|
|
347
|
+
- `electron-lifecycle` and `electron-probe` scenarios in `scripts/agent-browser-efficiency-benchmark.mjs` track the token-efficiency claim deterministically (no real browser, no real launches).
|
|
348
|
+
- Fake-upstream coverage for Electron schema/probe/mismatch/post-command-health/fill-verification/broad-text/discovery-sensitivity lives in `test/agent-browser.extension-validation.test.ts`.
|
|
349
|
+
- Real-app validation is a manual `tmux` smoke pass per the maintainer notes in `AGENTS.md`; the 2026-05-21 dogfood result is recorded at the end of [`docs/plans/electron-extension-2026-05-20.md`](plans/electron-extension-2026-05-20.md).
|
|
350
|
+
|
|
351
|
+
Run the local gate the same way as the rest of the project:
|
|
352
|
+
|
|
353
|
+
```bash
|
|
354
|
+
npm run verify
|
|
355
|
+
```
|
|
356
|
+
|
|
357
|
+
The token-efficiency claim has its own opt-in run:
|
|
358
|
+
|
|
359
|
+
```bash
|
|
360
|
+
npm run benchmark:agent-browser
|
|
361
|
+
```
|
|
362
|
+
|
|
363
|
+
## Where to go next
|
|
364
|
+
|
|
365
|
+
- For exact field semantics, schemas, and `details.*` payloads: [`TOOL_CONTRACT.md#electron`](TOOL_CONTRACT.md#electron) and [`TOOL_CONTRACT.md#qa`](TOOL_CONTRACT.md#qa).
|
|
366
|
+
- For workflow examples woven into the broader command surface: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#electron-desktop-apps).
|
|
367
|
+
- For the closed `RQ-0068` recipe-layer decision that explains why Electron support is a typed shorthand and not a generic recipe runtime: [`ARCHITECTURE.md`](ARCHITECTURE.md#no-reusable-recipe-layer-yet).
|
|
368
|
+
- For the full release-readiness audit and the `RQ-0096` evidence row: [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md).
|
package/docs/RELEASE.md
CHANGED
|
@@ -5,6 +5,7 @@ Related docs:
|
|
|
5
5
|
- [`REQUIREMENTS.md`](REQUIREMENTS.md)
|
|
6
6
|
- [`ARCHITECTURE.md`](ARCHITECTURE.md)
|
|
7
7
|
- [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md)
|
|
8
|
+
- [`ELECTRON.md`](ELECTRON.md)
|
|
8
9
|
- [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md)
|
|
9
10
|
- Bounded `agent_browser` outcome metadata on `details` (`resultCategory`, `successCategory`, `failureCategory`, optional `nextActions`, optional `pageChangeSummary` with per-step summaries on `batch`): contract in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); maintainer checklists under “Tool result categories” and “Page-change summaries” in [`../AGENTS.md`](../AGENTS.md)
|
|
10
11
|
- Post-success `get text` selector visibility (`RQ-0074`): optional `details.selectorTextVisibility` / `selectorTextVisibilityAll`, visible warnings, and `inspect-visible-text-candidates*` next actions after read-only visibility probes—[`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md), [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details), and [`../AGENTS.md`](../AGENTS.md) maintainer checklist
|
|
@@ -36,7 +37,7 @@ npm run verify -- release
|
|
|
36
37
|
|
|
37
38
|
`prepublishOnly` intentionally does **not** run `npm run verify -- lifecycle`, `npm run verify -- real-upstream`, or `npm run verify -- benchmark`; those are separate `npm run verify` modes in [`scripts/project.mjs`](../scripts/project.mjs). Treat the bullets below as the full pre-publish contract even though only the `release` slice is automated at publish time.
|
|
38
39
|
|
|
39
|
-
Every release also requires interactive `tmux`-driven Pi dogfood with the native `agent_browser` tool against real sites.
|
|
40
|
+
Every release also requires interactive `tmux`-driven Pi dogfood with the native `agent_browser` tool against real sites. For extension-focused release smokes, use `pi --no-extensions --no-skills -e .` from the checkout before publish so auto-loaded dogfood/QA skills cannot replace the bounded smoke workflow; run separate skill-enabled dogfood only when validating skill routing or report-generation behavior. Drive prompts with `tmux send-keys`, exercise at least one simple static site and one real documentation/product site, include the higher-level `qa` or `job`/`batch` surfaces when they changed, close every opened browser session, remove screenshots/temp artifacts, and record the outcome in the release notes or support-matrix evidence. Automated localhost and fake-upstream gates do not replace this human-readable live-site transcript evidence. When `electron.*` surfaces, attached-session diagnostics, or `qa.attached` changed, add a local Electron pass: `electron.list` → `electron.launch` (expect isolated profile behavior) → `snapshot -i` or `electron.probe` / `qa.attached` → `electron.cleanup` with the returned `launchId`, verifying status/mismatch guidance if you simulate a dead renderer or stale refs. For dense-dashboard stress coverage, use the [public Grafana stress checklist](#public-grafana-stress-checklist) below; it is a maintainer workflow, not a bundled product skill or recipe runtime.
|
|
40
41
|
|
|
41
42
|
The configured-source lifecycle regression harness is required before release because it launches an interactive `pi` process under `tmux` and validates `/reload` plus restart/`/resume` behavior:
|
|
42
43
|
|
|
@@ -73,18 +74,18 @@ Record release evidence as a short note with: date, package/checkout source, tar
|
|
|
73
74
|
|
|
74
75
|
Use this validation prompt after changing click enrichment, tab pinning, ref preflight, form-fill batching, artifact handling, recording, or prompt guidance. It is intentionally more stateful than `example.com` and uses a natural user-style request so the transcript shows what the agent chooses on its own. Do **not** mention `agent_browser`, snapshots, refs, `batch`, `eval`, or upstream command names in the prompt; those are evaluator expectations, not user instructions.
|
|
75
76
|
|
|
76
|
-
Run it in an isolated checkout session. It is fine to restrict active tools at launch so the checkout extension is the only browser surface, but keep
|
|
77
|
+
Run it in an isolated checkout session with skills disabled so the run validates the extension browser workflow instead of external dogfood/QA skill routing. It is fine to restrict active tools at launch so the checkout extension is the only browser surface, but keep those launch details out of the user prompt:
|
|
77
78
|
|
|
78
79
|
```bash
|
|
79
|
-
pi --no-extensions -e . --model openai-codex/gpt-5.5:minimal --tools agent_browser --session-dir "$SESSION_DIR"
|
|
80
|
+
pi --no-extensions --no-skills -e . --model openai-codex/gpt-5.5:minimal --tools agent_browser --session-dir "$SESSION_DIR"
|
|
80
81
|
```
|
|
81
82
|
|
|
82
|
-
Repeat with `--model openai-codex/gpt-5.5:medium` when validating instruction-following robustness. Use unique temp paths for each run and delete them afterward.
|
|
83
|
+
Repeat with `--model openai-codex/gpt-5.5:medium` when validating instruction-following robustness. Use unique temp paths for each run and delete them afterward. Run separate skill-enabled dogfood sessions only when the thing under test is skill integration, not this bounded release smoke.
|
|
83
84
|
|
|
84
85
|
Copy/paste prompt, replacing the two artifact placeholders with exact absolute paths:
|
|
85
86
|
|
|
86
87
|
```text
|
|
87
|
-
Please
|
|
88
|
+
Please run a bounded release smoke check on the public Sauce Demo store. This is not an exploratory bug hunt or dogfood report.
|
|
88
89
|
|
|
89
90
|
Site: https://www.saucedemo.com/
|
|
90
91
|
Demo credentials: standard_user / secret_sauce
|
|
@@ -99,13 +100,13 @@ Scenario:
|
|
|
99
100
|
- Start checkout with a fake name and postal code.
|
|
100
101
|
- Stop on the checkout overview page; do not place the order.
|
|
101
102
|
|
|
102
|
-
Please gather enough evidence to support the
|
|
103
|
+
Please gather enough evidence to support the smoke result:
|
|
103
104
|
- Save a screenshot here: <ABSOLUTE_SCREENSHOT_PATH>.png
|
|
104
105
|
- Save a short screen recording here if recording is available: <ABSOLUTE_RECORDING_PATH>.webm
|
|
105
106
|
- Include the final page title/URL, the selected sort order, cart contents, item total/tax/total, and any browser-side network, console, or page-error issues you see.
|
|
106
107
|
- Clean up by closing the browser when finished.
|
|
107
108
|
|
|
108
|
-
Return a concise PASS/FAIL report with evidence and any tool or workflow issues you noticed.
|
|
109
|
+
Return a concise PASS/FAIL report with evidence and any tool or workflow issues you noticed. Do not create a dogfood-output report directory.
|
|
109
110
|
```
|
|
110
111
|
|
|
111
112
|
Evaluator expectations after the queued Sauce Demo fixes: the agent should independently choose efficient, safe browser operations; native add-to-cart clicks should mutate cart state without JavaScript fallback; same-snapshot form fills may be batched safely when the agent chooses that route; the selected sort order should be verified; checkout must stop before Finish and must not place the order; screenshot and recording must use the requested paths or be explicitly reported unavailable; `network requests` may show public-demo telemetry 401s; `console` may report offline-cache logs; `errors` should show no page errors; and the browser session plus temp artifacts should be cleaned up after evidence is recorded. A run that clicks Finish despite the stop instruction or silently substitutes artifact paths is a workflow failure even if the store flow itself works.
|
|
@@ -167,7 +168,7 @@ Before publishing, validate both local-checkout modes without mixing their assum
|
|
|
167
168
|
4. Run a smoke prompt that exercises `agent_browser`.
|
|
168
169
|
5. Restart the `pi` process after extension edits; Pi settings and `/reload` are not the validation target in this isolated mode.
|
|
169
170
|
|
|
170
|
-
For expanded-surface validation, the smoke prompt should cover native tool invocation rather than shelling out to `agent-browser`: `--version`, `--help`, `skills list`, `skills get core --full`, `open` with `sessionMode: "fresh"`, `snapshot -i`, `click`, top-level `semanticAction` (locator shorthand compiled to upstream `find`, optionally with `semanticAction.session` when you need the same named upstream session as a prior explicit `--session` call), `eval --stdin`, `batch` via stdin, top-level `job`, `qa`, or experimental `sourceLookup` / `networkSourceLookup` (compiled batch smoke), `screenshot <path>`, explicit `--session … open` plus `--session … close`, `network requests`, `console` / `errors`, `diff snapshot`, `stream status` plus `stream disable`, `dashboard start` plus `dashboard stop`, and `chat <message>` (credential failure is acceptable evidence of wrapper pass-through when `AI_GATEWAY_API_KEY` is intentionally unset). Clean up any opened browser session with `close`, remove temporary files, and kill the tmux session before ending validation.
|
|
171
|
+
For expanded-surface validation, the smoke prompt should cover native tool invocation rather than shelling out to `agent-browser`: `--version`, `--help`, `skills list`, `skills get core --full`, `open` with `sessionMode: "fresh"`, `snapshot -i`, `click`, top-level `semanticAction` (locator shorthand compiled to upstream `find` and native dropdown selection compiled to upstream `select`, optionally with `semanticAction.session` when you need the same named upstream session as a prior explicit `--session` call), `eval --stdin`, `batch` via stdin, top-level `job`, `qa`, or experimental `sourceLookup` / `networkSourceLookup` (compiled batch smoke), `screenshot <path>`, explicit `--session … open` plus `--session … close`, `network requests`, `console` / `errors`, `diff snapshot`, `stream status` plus `stream disable`, `dashboard start` plus `dashboard stop`, and `chat <message>` (credential failure is acceptable evidence of wrapper pass-through when `AI_GATEWAY_API_KEY` is intentionally unset). Clean up any opened browser session with `close`, remove temporary files, and kill the tmux session before ending validation.
|
|
171
172
|
|
|
172
173
|
This checklist assumes a real `agent-browser` on `PATH`. It complements, but does not overlap, `npm run verify -- lifecycle`: that harness swaps in a fake upstream binary and focuses on `/reload`, full restart, `/resume`, managed-session continuity, and spill-path persistence (`scripts/verify-lifecycle.mjs`), not the full command matrix above.
|
|
173
174
|
|
|
@@ -181,14 +182,41 @@ Run the automated harness for deterministic configured-source lifecycle regressi
|
|
|
181
182
|
npm run verify -- lifecycle
|
|
182
183
|
```
|
|
183
184
|
|
|
184
|
-
The harness creates an isolated `PI_CODING_AGENT_DIR`, writes settings with exactly one temporary configured package source, runs plain `pi` in `tmux
|
|
185
|
+
The harness creates an isolated `PI_CODING_AGENT_DIR`, writes settings with exactly one temporary configured package source, runs plain `pi` in `tmux` with default model **`zai/glm-5.1`**, puts a deterministic fake `agent-browser` first on `PATH`, and drives `/reload`, full restart, and `/resume`. Per-step tmux waits default to **180000 ms** (three minutes) in [`scripts/verify-lifecycle.mjs`](../scripts/verify-lifecycle.mjs) (`DEFAULT_TIMEOUT_MS`); override with `--timeout-ms <ms>` when slower models or cold starts need more headroom. Override the model when needed:
|
|
186
|
+
|
|
187
|
+
```bash
|
|
188
|
+
npm run verify -- lifecycle --model openai-codex/gpt-5.5:minimal
|
|
189
|
+
```
|
|
190
|
+
|
|
191
|
+
Combine flags in one invocation when both apply (order after `lifecycle` is flexible as long as each value-taking flag is immediately followed by its value):
|
|
192
|
+
|
|
193
|
+
```bash
|
|
194
|
+
npm run verify -- lifecycle --model openai-codex/gpt-5.5:minimal --timeout-ms 600000
|
|
195
|
+
```
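
The pairing rule for value-taking flags can be sketched as below. This is a minimal illustration, not the code in `scripts/project.mjs` or `scripts/verify-lifecycle.mjs`: the function name `parsePassthrough` is hypothetical, and treating a following `--…` token as a missing value is an assumption of this sketch.

```typescript
// Value-taking flags named in this document; the parser is a hypothetical sketch.
const VALUE_FLAGS = new Set(["--model", "--timeout-ms"]);

// Collects flag/value pairs in any order and fails fast when a value is missing.
function parsePassthrough(argv: string[]): Map<string, string> {
  const parsed = new Map<string, string>();
  let i = 0;
  while (i < argv.length) {
    const flag = argv[i] ?? "";
    if (!VALUE_FLAGS.has(flag)) {
      i += 1; // tokens this sketch does not model are skipped
      continue;
    }
    const value = argv[i + 1];
    // Assumption: a following "--…" token counts as a missing value.
    if (value === undefined || value.startsWith("--")) {
      throw new Error(`usage: ${flag} requires a value`);
    }
    parsed.set(flag, value);
    i += 2; // consume the flag and its value together
  }
  return parsed;
}
```

Because each flag consumes exactly the next token, `--timeout-ms 600000 --model <id>` and `--model <id> --timeout-ms 600000` parse to the same result.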
|
|
196
|
+
|
|
197
|
+
It asserts same-page managed-session continuity, persisted `details.fullOutputPath` reachability after resume, and updated extension-code pickup through a temporary sentinel command. On failure it retains transcripts/session artifacts; on success it performs best-effort cleanup. It does not replace occasional real-browser manual smoke testing.
|
|
198
|
+
|
|
199
|
+
**Lifecycle triage:** a timeout on sentinel `v2` after `/reload` often means Pi rejected reload while the TUI still showed `Working…` (`Wait for the current response to finish before reloading`), even when the session JSONL already has a final assistant message. Re-run with `--keep-artifacts --verbose`, inspect the retained pane capture, and confirm the configured model follows tool prompts reliably. Slower models may need a higher `--timeout-ms` than the **180000 ms** default.
|
|
200
|
+
|
|
201
|
+
### Environment and automation pitfalls
|
|
202
|
+
|
|
203
|
+
These show up often in cloud dev boxes and scripted smokes; they are maintainer notes, not product defects.
|
|
204
|
+
|
|
205
|
+
| Topic | What to watch for | Mitigation |
|
|
206
|
+
| --- | --- | --- |
|
|
207
|
+
| **Pi CLI vs repo devDependencies** | Global `pi` older than the `@earendil-works/pi-coding-agent` range in `package.json` can change TUI behavior, `/reload`, and tool routing during lifecycle or checkout smokes. | Align `pi` with the repo’s pinned coding-agent release before release gates (`pi update` or install the matching version). |
|
|
208
|
+
| **npm lockfile (`packageManager`)** | `package.json` pins **npm@11**. Running npm 10 can rewrite `package-lock.json` solely to strip optional `libc` metadata from `@esbuild/*` platform entries, with no dependency version change. | Prefer `npx -y npm@11.14.0 install` when refreshing the lockfile; do not commit npm-10-only lockfile churn. |
|
|
209
|
+
| **`pi -p` / print mode** | Non-interactive `pi -p` may hang or emit no stdout for long real-browser smokes without a TTY. | Use **tmux**-driven interactive `pi` for release evidence and checkout smokes; reserve `-p` for short, non-browser checks. |
|
|
210
|
+
| **Real-browser cleanup** | `real-upstream`, Sauce Demo, and live-site runs can leave defunct Chrome/`agent-browser` children if a session aborts mid-flow. | Close via `agent_browser` / `agent-browser` `close`, kill stray tmux sessions, and remove temp screenshots/HARs under `/tmp` or your chosen artifact dirs. |
|
|
211
|
+
| **Automated prompt driving** | Grepping tmux pane text for words that also appear in the **user** prompt (`PASS`, `FAIL`, `checkout overview`, `Smoke result:`) can false-complete before the agent finishes. | Wait for pane idle (no `Working…`), `agent_browser close` / `Artifact lifecycle`, or JSONL tool results—not instruction phrases copied from the prompt. |
|
|
212
|
+
| **Lifecycle verify flags** | `npm run verify -- lifecycle --model` or `--timeout-ms` without the next argv token fails fast with a usage error—the `project.mjs` facade validates passthrough the same way as `scripts/verify-lifecycle.mjs`. | Always pair flags with values (`--model openai-codex/gpt-5.5:minimal`, `--timeout-ms 600000`) or omit `--model` / `--timeout-ms` to keep the harness defaults (`zai/glm-5.1`, **180000 ms** per-step waits). |
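
The "Automated prompt driving" row can be made concrete with a small completion check. This is a sketch under stated assumptions: the helper name `paneLooksFinished` is hypothetical, and requiring both an idle pane and a completion marker is a conservative choice of this example, since the table lists those signals as alternatives.

```typescript
// Hypothetical helper: decides whether captured tmux pane text indicates the
// agent has finished, without matching phrases copied from the user prompt.
function paneLooksFinished(paneText: string): boolean {
  // The pane is busy while the TUI still shows the working indicator.
  const busy = paneText.includes("Working…");
  // Completion markers come from tool activity, not from the prompt text.
  const hasCompletionMarker =
    paneText.includes("agent_browser close") ||
    paneText.includes("Artifact lifecycle");
  return !busy && hasCompletionMarker;
}
```

Words such as `PASS` or `Smoke result:` are deliberately ignored here because they also appear in the prompt itself.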
|
|
185
213
|
|
|
186
214
|
Manual validation remains useful for release confidence and installed-package checks:
|
|
187
215
|
|
|
188
216
|
1. Configure exactly one active source for this extension in Pi settings: this checkout path before publishing, or the installed package after publishing.
|
|
189
217
|
2. Launch plain `pi` so extension discovery is active.
|
|
190
218
|
3. Validate managed-session continuity with `/reload` and a full restart + `/resume`.
|
|
191
|
-
4. Re-check local extension-side docs (`README.md`, `docs/COMMAND_REFERENCE.md`, `docs/TOOL_CONTRACT.md`, including the [`semanticAction`](TOOL_CONTRACT.md#semanticaction) rules when that shorthand or upstream `find` behavior changes) and regenerated prompt fragments from `extensions/agent-browser/lib/playbook.ts` via `npm run docs -- playbook check` or `npm run docs`. When the upstream `agent-browser` version or help surface changed, run `npm run verify -- command-reference`.
|
|
219
|
+
4. Re-check local extension-side docs (`README.md`, `docs/COMMAND_REFERENCE.md`, `docs/TOOL_CONTRACT.md`, including the [`semanticAction`](TOOL_CONTRACT.md#semanticaction) rules when that shorthand or upstream `find` / `select` behavior changes) and regenerated prompt fragments from `extensions/agent-browser/lib/playbook.ts` via `npm run docs -- playbook check` or `npm run docs`. When the upstream `agent-browser` version or help surface changed, run `npm run verify -- command-reference`.
|
|
192
220
|
|
|
193
221
|
### Real upstream contract validation
|
|
194
222
|
|
|
@@ -269,8 +297,8 @@ Before publishing:
|
|
|
269
297
|
- run `npm run verify -- command-reference` if the installed upstream `agent-browser` version or help surface changed
|
|
270
298
|
- run `npm run doctor` and confirm any duplicate-source remediation matches the active package/checkout setup
|
|
271
299
|
- run `npm run verify -- real-upstream` for upstream runtime, result-presentation, or managed-session changes
|
|
272
|
-
- confirm both local-checkout modes still work for pre-release validation: isolated `pi --no-extensions -e .` smoke testing and configured-source lifecycle validation
|
|
273
|
-
- complete interactive `tmux` live-site
|
|
300
|
+
- confirm both local-checkout modes still work for pre-release validation: isolated `pi --no-extensions -e .` smoke testing for general checkout loading (add `--no-skills` for extension-focused bounded smokes) and configured-source lifecycle validation
|
|
301
|
+
- complete interactive `tmux` live-site extension smoke with `pi --no-extensions --no-skills -e .` and the native `agent_browser` tool (at least one simple static site and one real documentation/product site; include `qa` or `job`/`batch` when those surfaces changed; use the [public Grafana stress checklist](#public-grafana-stress-checklist) when dashboard/diagnostic/artifact behavior changed; close sessions and remove screenshots/temp artifacts; record evidence). Run separate skill-enabled dogfood only when validating skill routing/report-generation behavior—see [Pre-release checks](#pre-release-checks); automated gates are not a substitute
|
|
274
302
|
- rerun `npm run verify -- release`
|
|
275
303
|
- run `npm run verify -- lifecycle` for configured-source `/reload` plus restart/`/resume` regression coverage (required before publish; see [Pre-release checks](#pre-release-checks))
|
|
276
304
|
- confirm [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md) still maps every current baseline inventory section to docs, runtime handling, tests, and validation status
|
package/docs/REQUIREMENTS.md
CHANGED
|
@@ -4,6 +4,7 @@ Related docs:
|
|
|
4
4
|
- [`../README.md`](../README.md)
|
|
5
5
|
- [`ARCHITECTURE.md`](ARCHITECTURE.md)
|
|
6
6
|
- [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md)
|
|
7
|
+
- [`ELECTRON.md`](ELECTRON.md)
|
|
7
8
|
- [`RELEASE.md`](RELEASE.md)
|
|
8
9
|
- [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md)
|
|
9
10
|
|
|
@@ -63,7 +64,7 @@ Define the product requirements and constraints for `pi-agent-browser-native`.
|
|
|
63
64
|
|
|
64
65
|
### Native `agent_browser` inputs
|
|
65
66
|
|
|
66
|
-
- Each tool invocation must supply **exactly one** of: `args` (full upstream argv after the binary name)
|
|
67
|
+
- Each tool invocation must supply **exactly one** of: `args` (full upstream argv after the binary name), top-level `semanticAction` (a small intent object compiled into existing upstream `find` argv for locator actions or upstream `select <selector> <value...>` argv for native dropdown selection), `job`, `qa`, `sourceLookup`, `networkSourceLookup`, or `electron` (bounded desktop lifecycle: host `list`, wrapper-owned isolated `launch` with CDP attach, `status`, compact `probe`, and `cleanup`; mutually exclusive with caller `stdin`). Supplying multiple modes or none is rejected before launch (`extensions/agent-browser/index.ts`, `test/agent-browser.extension-validation.test.ts`). Contract and field rules: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#electron); operator workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#electron-desktop-apps).
|
|
67
68
|
- `semanticAction` is not a nested shape inside `batch` stdin; batch steps remain upstream argv string arrays, including `find` steps expressed as token lists.
|
|
68
69
|
- Supported actions, locators, exclusivity rules, when `details.compiledSemanticAction` appears, and bounded `try-*-candidate` follow-ups on `selector-not-found` (specific action/locator pairs only; see contract) are specified in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction), with workflow examples in [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md).
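
The "exactly one mode" rule above can be sketched as a pre-launch check. The mode names and the `electron`/`stdin` exclusivity come from the requirement text; the function `selectMode` and its error messages are hypothetical and do not reproduce the validation in `extensions/agent-browser/index.ts`.

```typescript
// Input modes listed in the requirement; exactly one must be supplied.
const MODES = [
  "args",
  "semanticAction",
  "job",
  "qa",
  "sourceLookup",
  "networkSourceLookup",
  "electron",
] as const;

type Mode = (typeof MODES)[number];

// Hypothetical validator: returns the single supplied mode or throws before launch.
function selectMode(input: Record<string, unknown>): Mode {
  const present = MODES.filter((mode) => input[mode] !== undefined);
  const [mode] = present;
  if (mode === undefined || present.length !== 1) {
    throw new Error(`expected exactly one input mode, got ${present.length}`);
  }
  // Per the requirement, electron is mutually exclusive with caller stdin.
  if (mode === "electron" && input["stdin"] !== undefined) {
    throw new Error("electron cannot be combined with stdin");
  }
  return mode;
}
```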
|
|
69
70
|
|
|
@@ -84,11 +85,12 @@ Define the product requirements and constraints for `pi-agent-browser-native`.
|
|
|
84
85
|
- The primary confidence path is a real `pi` session driven in `tmux`.
|
|
85
86
|
- For quick local checkout smoke validation, launch `pi --no-extensions -e .` from the repository root so only the checkout copy loads; do not rely on Pi settings or `/reload` semantics in this isolated mode.
|
|
86
87
|
- For hot-reload validation, configure exactly one active source for this extension in Pi settings and launch plain `pi`; validate `/reload` there because it exercises auto-discovered/configured resources.
|
|
87
|
-
- Maintain a tmux-driven configured-source lifecycle harness (`npm run verify -- lifecycle`; required before release per `docs/RELEASE.md`) that isolates Pi settings, uses exactly one configured source, exercises `/reload`, full restart, and `/resume`, and asserts managed-session continuity plus persisted artifact survival. It is its own `npm run verify` mode rather than part of the default `npm run verify` sequence, but operators still run it before every publish. Keep `docs/RELEASE.md` accurate about the harness behavior, cleanup, transcript retention, and limitations.
|
|
88
|
+
- Maintain a tmux-driven configured-source lifecycle harness (`npm run verify -- lifecycle`; required before release per `docs/RELEASE.md`) that isolates Pi settings, uses exactly one configured source, exercises `/reload`, full restart, and `/resume`, and asserts managed-session continuity plus persisted artifact survival. It is its own `npm run verify` mode rather than part of the default `npm run verify` sequence, but operators still run it before every publish. The harness defaults Pi to model `zai/glm-5.1` (`scripts/verify-lifecycle.mjs`); pass `--model <id>` after `lifecycle` when a different model is required. Keep `docs/RELEASE.md` accurate about the harness behavior, cleanup, transcript retention, and limitations.
|
|
88
89
|
- Validate a full `pi` restart with `/resume` when changes touch managed-session continuity, reload behavior, or persisted artifact paths.
|
|
89
90
|
- Prefer full `pi` restart over `/reload` when validating extension changes beyond a quick reload smoke check.
|
|
90
91
|
- Use `/resume` when needed after restart.
|
|
91
92
|
- Keep testing broader than a single smoke site like `example.com`.
|
|
93
|
+
- Bounded release smokes that validate this extension should disable auto-loaded skills with `--no-skills`; run skill-enabled dogfood separately only when validating external skill routing or report-generation behavior.
|
|
92
94
|
- Maintain a concrete release/package verification workflow in `docs/RELEASE.md` and matching repository scripts.
|
|
93
95
|
|
|
94
96
|
## Representative use cases
|
|
@@ -102,13 +104,14 @@ The design should comfortably support workflows such as:
|
|
|
102
104
|
- headless authenticated `chat.com` / ChatGPT / OpenAI browsing without forcing `--headed` or `--auto-connect`
|
|
103
105
|
- upstream profile/debug workflows without adding a local profile-cloning layer in this package
|
|
104
106
|
- provider-backed or iOS device launches where upstream owns credentials, env, and setup; the wrapper forwards argv and a curated provider-related environment without emulating those backends
|
|
107
|
+
- desktop Electron targets using top-level `electron` for discover → isolated launch → attach → probe/cleanup, or raw `args: ["connect", …]` when the operator launches the real app with a debug port for signed-in state (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#electron) and [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#electron-desktop-apps))
|
|
105
108
|
|
|
106
109
|
## Implications for the implementation
|
|
107
110
|
|
|
108
111
|
- Package-manifest behavior matters more than repo-local development wiring.
|
|
109
112
|
- The extension should use official `pi` hooks and package resources where possible.
|
|
110
113
|
- The wrapper should stay thin, with upstream `agent-browser` remaining the source of truth for command semantics.
|
|
111
|
-
- Successful and failed tool outcomes should surface bounded machine-readable fields on Pi-facing `details` (`resultCategory`, `successCategory`, `failureCategory`, optional structured `nextActions`, optional `pageChangeSummary` with per-step summaries on `batch`, optional `artifactVerification` with the same shape on successful `batchSteps[]` rows) so agents can branch without parsing prose; stateful commands (`auth`, `cookies`, `storage`, `dialog`, `frame`, `state`) plus other structured diagnostics (for example `network`, `diff`, `trace`, `stream`, `dashboard`, `chat`) and `batch` should redact secret-bearing payloads in model-facing `details.data`, including the compact per-step `batch` roll-up on the parent result (full per-step payloads live on `batchSteps[]`). The contract lives in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details), enums and classifier precedence live in `extensions/agent-browser/lib/results/shared.ts`, and presentation-time summaries, redaction, and artifact verification rollups are assembled in `extensions/agent-browser/lib/results/presentation.ts` (`buildPageChangeSummary`, `PAGE_CHANGE_SUMMARY_COMMANDS`, `redactPresentationData`, `buildArtifactVerificationSummary`, `buildBatchPresentation`).
|
|
114
|
+
- Successful and failed tool outcomes should surface bounded machine-readable fields on Pi-facing `details` (`resultCategory`, `successCategory`, `failureCategory`, optional structured `nextActions`, optional `pageChangeSummary` with per-step summaries on `batch`, optional `artifactVerification` with the same shape on successful `batchSteps[]` rows) so agents can branch without parsing prose; stateful commands (`auth`, `cookies`, `storage`, `dialog`, `frame`, `state`) plus other structured diagnostics (for example `network`, `diff`, `trace`, `stream`, `dashboard`, `chat`) and `batch` should redact secret-bearing payloads in model-facing `details.data`, including the compact per-step `batch` roll-up on the parent result (full per-step payloads live on `batchSteps[]`). The contract lives in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details), enums and classifier precedence live in `extensions/agent-browser/lib/results/shared.ts`, and presentation-time summaries, redaction, network request follow-ups, and artifact verification rollups are assembled in `extensions/agent-browser/lib/results/presentation.ts` (`buildPageChangeSummary`, `PAGE_CHANGE_SUMMARY_COMMANDS`, `redactPresentationData`, `buildArtifactVerificationSummary`, `buildBatchPresentation`).
|
|
112
115
|
- User-facing docs belong in `README.md` and the canonical published files under `docs/`.
|
|
113
116
|
- Agent workflow and deeper testing procedures can stay in `AGENTS.md`, but published docs must not depend on that file being present.
|
|
114
117
|
- When upstream `agent-browser` changes, refresh the local command reference, prompt guidance, and other extension-side docs so agents still have a repo-readable equivalent of the blocked direct-binary help path.
|
|
@@ -121,7 +124,7 @@ The design should comfortably support workflows such as:
|
|
|
121
124
|
- On local Unix launches, extension-generated session names should not fail just because the upstream default socket path is too long; the wrapper should choose a shorter socket directory when needed.
|
|
122
125
|
- Provider selection flags (`-p`, `--provider`) and provider device flags (`--device`) are launch-scoped like profile, CDP, and persisted state: if an extension-managed implicit session is already active, the planner must fail fast with the same recovery guidance as other startup-scoped flags instead of silently forwarding argv upstream would ignore; contract in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#sessionmode) and session model in [`ARCHITECTURE.md`](ARCHITECTURE.md).
|
|
123
126
|
- Read-only upstream `skills list`, `skills get …`, and `skills path …` must stay free of implicit managed `--session` under default `sessionMode: "auto"` (still with `--json`), matching plain-text `--help` / `--version` inspection semantics so bundled skill text does not pin or rotate the active browser session; new `skills` subcommands pick up that behavior only after allowlisting in `extensions/agent-browser/lib/runtime.ts` with regression coverage.
|
|
124
|
-
- Optional `semanticAction.session` on native `agent_browser` must compile to a leading `--session <name>` pair before upstream `find` argv so the
|
|
127
|
+
- Optional `semanticAction.session` on native `agent_browser` must compile to a leading `--session <name>` pair before upstream `find` or `select` argv so the shorthand can target a named upstream browser without hand-built `args`, while `buildExecutionPlan` still skips double-injecting the extension-managed implicit session whenever planned argv already starts with `--session`; stale-ref retries for compiled `find` actions and bounded `try-*` candidate `nextActions` must preserve that same prefix. Contract in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction) / [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#sessionmode); implementation in `extensions/agent-browser/index.ts` and `extensions/agent-browser/lib/runtime.ts`.
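
The session-prefix requirement above can be sketched in two small steps. Both function names are hypothetical stand-ins (this is not the body of `buildExecutionPlan`), and the `select` argv and session names are only illustrative values.

```typescript
// Step 1 (hypothetical): compile an optional semanticAction.session into a
// leading "--session <name>" pair ahead of the upstream find/select argv.
function withSessionPrefix(argv: readonly string[], session?: string): string[] {
  return session ? ["--session", session, ...argv] : [...argv];
}

// Step 2 (hypothetical): add the extension-managed implicit session only when
// the planned argv does not already start with "--session".
function addManagedSession(argv: readonly string[], managed: string): string[] {
  return argv[0] === "--session" ? [...argv] : ["--session", managed, ...argv];
}
```

Composing the two keeps an explicit session intact: a named `select` call passes through step 2 unchanged, while an unnamed one picks up the managed session.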
|
|
125
128
|
|
|
126
129
|
## Open design questions
|
|
127
130
|
|