pi-agent-browser-native 0.2.31 → 0.2.32

@@ -0,0 +1,368 @@
1
+ # Electron desktop apps
2
+
3
+ Related docs:
4
+ - [`../README.md`](../README.md)
5
+ - [`../AGENTS.md`](../AGENTS.md) — maintainer verification (`npm run verify`, lifecycle), Pi `tmux` smoke expectations, and upstream rebaselining
6
+ - [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md) — full `electron` and `qa.attached` field contracts
7
+ - [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) — workflow snippets in the broader native command surface
8
+ - [`ARCHITECTURE.md`](ARCHITECTURE.md) — wrapper design and the closed `RQ-0068` recipe-layer decision
9
+ - [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md) — `RQ-0096` Electron support row and verification gates
10
+
11
+ ## Purpose
12
+
13
+ This guide is the entry point for using `pi-agent-browser-native` against desktop **Electron** applications. The wrapper exposes a top-level `electron` shorthand that owns the awkward discover → launch → attach → probe → cleanup sequence, so agents do not have to hand-build `--remote-debugging-port` argv, poll `DevToolsActivePort`, or kill app processes and delete temporary profile directories themselves. After attach, the rest of the native `agent_browser` surface (`snapshot`, `find`, `click`, `fill`, `get`, `eval --stdin`, `batch`, `qa.attached`, and similar) works the same way it does against a web page.
14
+
15
+ This document is structured for users, not implementers. Field-level rules live in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#electron); this guide focuses on **when** and **how** to use them, and on the safety and ownership boundary the wrapper enforces.
16
+
17
+ ## Who this is for
18
+
19
+ - **Pi users** who want an agent to operate a local Electron app the same way it operates a web page.
20
+ - **Coding agents** that need a low-context lifecycle for desktop apps such as VS Code, Cursor, Obsidian, Slack, or any app built on Electron, without re-implementing the CDP attach dance every session.
21
+ - **Maintainers and reviewers** validating the wrapper's Electron behavior before release; verification evidence lives under `RQ-0096` in [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md).
22
+
23
+ It is **not** an upstream `agent-browser` reference and it does **not** replace the canonical [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#electron) for exact field semantics, validation rules, or failure categories.
24
+
25
+ ## Mental model
26
+
27
+ ```
28
+ electron.list → discover Electron apps (host-only; no upstream spawn)
29
+ electron.launch → launch a wrapper-owned isolated app, attach via CDP, hand off (snapshot|tabs|connect)
30
+ electron.status → liveness, debug-port, and target inspection (read-only)
31
+ electron.probe → compact one-call state read (title/url/focus/tabs/snapshot)
32
+ electron.cleanup → close managed session, stop the tracked process, remove the temp profile
33
+ qa.attached → smoke check against the currently attached session (no URL)
34
+ ```
35
+
36
+ Two ownership modes coexist:
37
+
38
+ 1. **Wrapper-owned launches** — `electron.launch` starts a brand-new app process with an **isolated temporary user-data-dir** and an **OS-chosen debug port**. The wrapper records a `launchId` for every such launch and `electron.cleanup` only operates on those `launchId`s.
39
+ 2. **Manually launched apps** — you start the Electron app yourself (for example with `open -a Slack --args --remote-debugging-port=9222 --remote-allow-origins='*'`), then attach with `{ "args": ["connect", "9222"], "sessionMode": "fresh" }`. The wrapper does not own that process; **you** are responsible for shutting it down and cleaning its profile.
40
+
41
+ Choosing between the two is a real decision, not a stylistic one. See [Wrapper-owned vs manually launched](#wrapper-owned-vs-manually-launched).
42
+
43
+ ## Quick start
44
+
45
+ Discover the app, launch with the default snapshot handoff, work with current refs, then clean up:
46
+
47
+ ```json
48
+ { "electron": { "action": "list", "query": "code" } }
49
+ { "electron": { "action": "launch", "appName": "Visual Studio Code", "handoff": "snapshot" } }
50
+ { "args": ["snapshot", "-i"] }
51
+ { "electron": { "action": "probe", "timeoutMs": 5000 } }
52
+ { "electron": { "action": "cleanup", "launchId": "electron-…" } }
53
+ ```
54
+
55
+ The launch result carries both a `launchId` (used by `status`/`probe`/`cleanup`) and an attached `sessionName` (used by browser-style `snapshot`/`tab`/`click`/`find` calls). Read both from `details.electron.launch` and `details.electron.identifiers`. With default implicit session reuse, the quick-start `args: ["snapshot", "-i"]` line uses that attached session without an extra `--session` argument; pass `--session` explicitly when you target a named upstream session instead.
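A minimal TypeScript sketch of reading both handles from a launch result. Only the two parent paths (`details.electron.launch` and `details.electron.identifiers`) come from this guide; the `ElectronLaunchDetails` shape inside them is an assumption, and [`TOOL_CONTRACT.md#electron`](TOOL_CONTRACT.md#electron) is authoritative for the real layout. `readLaunchHandles` and `followUps` are hypothetical helpers, not wrapper APIs.

```typescript
// Assumed shape: the nesting of launchId / sessionName inside
// details.electron.launch and details.electron.identifiers is illustrative.
interface HandleFields {
  launchId?: string;
  sessionName?: string;
}

interface ElectronLaunchDetails {
  electron?: {
    launch?: HandleFields;
    identifiers?: HandleFields;
  };
}

interface LaunchHandles {
  launchId: string; // used by electron.status / probe / cleanup
  sessionName: string; // used by browser-style snapshot / tab / click / find
}

// Hypothetical helper: prefer `identifiers`, fall back to `launch`,
// and fail loudly instead of guessing when either handle is missing.
function readLaunchHandles(details: ElectronLaunchDetails): LaunchHandles {
  const ids = details.electron?.identifiers;
  const launch = details.electron?.launch;
  const launchId = ids?.launchId ?? launch?.launchId;
  const sessionName = ids?.sessionName ?? launch?.sessionName;
  if (!launchId || !sessionName) {
    throw new Error("launch result is missing launchId or sessionName");
  }
  return { launchId, sessionName };
}

// Follow-up requests keyed by launchId; the snapshot relies on implicit
// session reuse, so it carries no --session argument.
function followUps(handles: LaunchHandles): object[] {
  return [
    { args: ["snapshot", "-i"] },
    { electron: { action: "probe", launchId: handles.launchId, timeoutMs: 5000 } },
    { electron: { action: "cleanup", launchId: handles.launchId } },
  ];
}
```

Keeping the two handles in one typed object makes it harder to pass a `sessionName` where a `launchId` is expected.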
56
+
57
+ For a quick "is the app actually showing what we expect?" smoke check after attach:
58
+
59
+ ```json
60
+ { "qa": { "attached": true, "expectedText": "Explorer", "screenshotPath": ".dogfood/electron.png" } }
61
+ ```
62
+
63
+ `qa.attached` runs against the **current managed session** without opening a URL, so it works for any attached app — wrapper-owned or manually launched.
64
+
65
+ ## Wrapper-owned vs manually launched
66
+
67
+ Pick the mode that matches the **state you need**.
68
+
69
+ | | `electron.launch` (wrapper-owned) | `args: ["connect", …]` (manual host launch) |
70
+ |---|---|---|
71
+ | Profile | Isolated temporary `userDataDir` | The app's normal profile (your real signed-in state) |
72
+ | Debug port | OS-chosen via `--remote-debugging-port=0` and `DevToolsActivePort` | Caller-supplied port (for example `9222`) |
73
+ | Signed-in state | **No** — first-run or empty profile | **Yes** — whatever is in the launched profile |
74
+ | Already-running app | Cannot attach to it | Attachable only if it was started with a debug port; otherwise quit it and relaunch it yourself with one |
75
+ | Lifecycle ownership | Wrapper owns shutdown and profile cleanup | **You** own shutdown and profile cleanup |
76
+ | When to use | Anything you can do against a fresh app: tooling, UX flows, scripted local QA, exploring panels, packaged debugging | Tasks that explicitly need the user's signed-in Slack/Obsidian/VS Code state |
77
+ | How to clean up | `electron.cleanup` with the returned `launchId` | Close the app yourself; do **not** call `electron.cleanup` |
78
+
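The table reduces to one question: does the task need the user's real signed-in profile? A hypothetical TypeScript sketch of that choice follows. `planAttach` and its input fields are illustrative names, not part of the wrapper; the helper only builds the request objects shown elsewhere in this guide and refuses the manual path until you supply the port you launched the app with.

```typescript
// Hypothetical inputs; none of these names come from the wrapper.
interface TaskNeeds {
  needsSignedInState: boolean; // e.g. the user's real Slack/Obsidian/VS Code profile
  appName: string;
  debugPort?: number; // manual path only: the port you launched the app with
}

type AttachPlan =
  | { mode: "wrapper-owned"; request: { electron: { action: "launch"; appName: string; handoff: "snapshot" } } }
  | { mode: "manual"; request: { args: string[]; sessionMode: "fresh" } };

function planAttach(needs: TaskNeeds): AttachPlan {
  if (!needs.needsSignedInState) {
    // Fresh isolated profile; clean up later with electron.cleanup + launchId.
    return {
      mode: "wrapper-owned",
      request: { electron: { action: "launch", appName: needs.appName, handoff: "snapshot" } },
    };
  }
  if (needs.debugPort === undefined) {
    // Relaunching a running app can lose state, so this sketch stops and asks.
    throw new Error("signed-in state needs a manual launch with --remote-debugging-port; ask the user first");
  }
  // Manual host launch: you own shutdown and profile cleanup; never call electron.cleanup.
  return { mode: "manual", request: { args: ["connect", String(needs.debugPort)], sessionMode: "fresh" } };
}
```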
79
+ ### Manual host-launch pattern
80
+
81
+ When the explicit goal is the user's signed-in local app state and the app is not already running:
82
+
83
+ ```bash
84
+ # macOS example
85
+ open -a Slack --args --remote-debugging-port=9222 --remote-allow-origins='*'
86
+ ```
87
+
88
+ Then attach and clean up yourself:
89
+
90
+ ```json
91
+ { "args": ["connect", "9222"], "sessionMode": "fresh" }
92
+ { "args": ["snapshot", "-i"] }
93
+ { "qa": { "attached": true, "expectedText": "Channels" } }
94
+ ```
95
+
96
+ If the app is already running without a debug port, ask before relaunching it. Quitting it may lose unsaved state, and because many Electron apps enforce a single running instance, a second invocation's `--remote-debugging-port` flag is silently dropped while the original instance keeps running.
97
+
98
+ ## Action reference
99
+
100
+ The exact field schemas, validation rules, and `details.*` payload shapes live in [`TOOL_CONTRACT.md#electron`](TOOL_CONTRACT.md#electron). This section is a usage-oriented overview.
101
+
102
+ ### `electron.list` — discover apps
103
+
104
+ Host-only scan; it does not spawn upstream `agent-browser`. In v1, macOS (`/Applications/*.app`, `~/Applications/*.app`) and Linux (`.desktop` launchers under standard XDG, Flatpak, and Snap locations) are supported. On Windows and any other non-macOS/non-Linux host, `list` returns `details.electron.platform: "unsupported"` with an empty `apps` array; pass `executablePath` (or a host `appPath` that resolves to a verifiable Electron binary) to `launch` instead. Before spawning, `inspectElectronExecutablePath` in `extensions/agent-browser/lib/electron/discovery.ts` still verifies Windows executables.
105
+
106
+ ```json
107
+ { "electron": { "action": "list", "query": "code", "maxResults": 25 } }
108
+ ```
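How the platform rules above shape a `launch` request can be sketched in TypeScript. `launchRequestFor` is a hypothetical helper; the platform strings follow Node's `process.platform` values (`"darwin"`, `"linux"`, `"win32"`), and the scan locations are copied from the paragraph above rather than read from the wrapper. For brevity it only covers the `executablePath` fallback, not a resolvable `appPath`.

```typescript
// Scan locations as described in this guide (v1); illustrative only.
const LIST_SCANS = new Map<string, string[]>([
  ["darwin", ["/Applications/*.app", "~/Applications/*.app"]],
  ["linux", [".desktop launchers under XDG, Flatpak, and Snap locations"]],
]);

type LaunchRequest =
  | { electron: { action: "launch"; appName: string } }
  | { electron: { action: "launch"; executablePath: string } };

// Hypothetical helper: rely on name resolution where electron.list works,
// otherwise require an explicit executablePath.
function launchRequestFor(platform: string, appName: string, executablePath?: string): LaunchRequest {
  if (LIST_SCANS.has(platform)) {
    return { electron: { action: "launch", appName } };
  }
  // electron.list reports platform: "unsupported" and an empty apps array here.
  if (!executablePath) {
    throw new Error(`electron.list is unsupported on ${platform}; pass executablePath`);
  }
  return { electron: { action: "launch", executablePath } };
}
```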
109
+
110
+ Returns app metadata under `details.electron.apps`: `name`, optional `bundleId`/`desktopId`, `appPath`, `executablePath`, `platform`, and optional non-blocking `sensitivity` annotations. Apps flagged as likely sensitive (categories such as `notes`, `chat`, `mail`, `developer-workspace`, or `passwords-auth`) are printed with `[likely sensitive: …]`. These are **advisory hints**, not enforcement; see [Safety and ownership](#safety-and-ownership) for the policy boundary.
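Because the annotations are advisory, any filtering is the caller's choice. A minimal TypeScript sketch, assuming the entry fields listed above; only `sensitivity.level: "likely-sensitive"` is confirmed by this guide, so the rest of the `sensitivity` object is left out. It drops flagged apps from a candidate list and turns their names into a caller-owned `deny` list for `electron.launch`.

```typescript
// Assumed shape of one details.electron.apps entry; see TOOL_CONTRACT.md#electron.
interface ElectronAppEntry {
  name: string;
  appPath: string;
  executablePath: string;
  platform: string;
  bundleId?: string;
  desktopId?: string;
  sensitivity?: { level: string };
}

const isLikelySensitive = (app: ElectronAppEntry): boolean =>
  app.sensitivity?.level === "likely-sensitive";

// Candidates the agent may consider launching without asking first.
function launchCandidates(apps: ElectronAppEntry[]): ElectronAppEntry[] {
  return apps.filter((app) => !isLikelySensitive(app));
}

// Caller-owned guardrail: feed flagged names into the launch `deny` list.
function denyListFrom(apps: ElectronAppEntry[]): string[] {
  return apps.filter(isLikelySensitive).map((app) => app.name);
}
```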
111
+
112
+ ### `electron.launch` — launch and attach
113
+
114
+ Pass **exactly one** target: `appPath`, `appName`, `bundleId`, or `executablePath`. The wrapper resolves the target, verifies Electron framework evidence, applies optional caller-owned `allow` / `deny` policy, creates an isolated temp `userDataDir`, launches with `--remote-debugging-port=0` plus safe defaults, reads `DevToolsActivePort`, then attaches through upstream `connect` as a fresh managed session.
115
+
116
+ ```json
117
+ {
118
+ "electron": {
119
+ "action": "launch",
120
+ "appName": "Visual Studio Code",
121
+ "handoff": "snapshot",
122
+ "targetType": "page",
123
+ "timeoutMs": 30000,
124
+ "appArgs": ["--disable-telemetry"]
125
+ }
126
+ }
127
+ ```
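The "exactly one target" rule is easy to check before sending the request. This hypothetical TypeScript pre-check uses the four field names from the paragraph above; the wrapper still performs its own validation and reports violations as `validation-error`.

```typescript
const TARGET_FIELDS = ["appPath", "appName", "bundleId", "executablePath"] as const;
type TargetField = (typeof TARGET_FIELDS)[number];
type LaunchTarget = Partial<Record<TargetField, string>>;

// Hypothetical pre-check: returns the single target field that is set,
// or throws when zero or several are present.
function singleTargetField(target: LaunchTarget): TargetField {
  const present = TARGET_FIELDS.filter((field) => {
    const value = target[field];
    return value !== undefined && value !== "";
  });
  if (present.length !== 1) {
    throw new Error(
      `electron.launch needs exactly one of ${TARGET_FIELDS.join(", ")}; got ${present.length}`,
    );
  }
  return present[0] as TargetField;
}
```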
128
+
129
+ Handoff selection (`handoff` field):
130
+
131
+ | Value | Behavior | When to use |
132
+ |---|---|---|
133
+ | `"snapshot"` (default) | Attach, list targets, capture `snapshot -i` in one call | You need interactive refs immediately for clicks/fills |
134
+ | `"tabs"` | Attach and list targets only | Safer diagnostic start when you only need target discovery |
135
+ | `"connect"` | Attach and stop | You will run your own follow-up commands |
136
+
137
+ `targetType` defaults to `"page"`; use `"webview"` or `"any"` for apps whose useful UI is exposed as a webview target.
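The handoff table and the `targetType` default can be summarized as a small typed model. This TypeScript sketch is illustrative: `withLaunchDefaults` and `handoffSteps` are hypothetical names, and the step labels paraphrase the table rather than naming upstream commands exactly.

```typescript
type Handoff = "snapshot" | "tabs" | "connect";
type TargetType = "page" | "webview" | "any";

interface LaunchOptions {
  handoff?: Handoff;
  targetType?: TargetType;
}

// Defaults documented above: handoff "snapshot", targetType "page".
function withLaunchDefaults(opts: LaunchOptions): Required<LaunchOptions> {
  return {
    handoff: opts.handoff ?? "snapshot",
    targetType: opts.targetType ?? "page",
  };
}

// What each handoff does after the CDP attach, per the handoff table.
function handoffSteps(handoff: Handoff): string[] {
  switch (handoff) {
    case "snapshot":
      return ["attach", "list targets", "snapshot -i"];
    case "tabs":
      return ["attach", "list targets"];
    case "connect":
      return ["attach"];
  }
}
```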
138
+
139
+ Optional `timeoutMs` on `electron.launch` bounds host-side CDP readiness (waiting for `DevToolsActivePort` and attach). When omitted, the default is **15 seconds** with a hard maximum of **120 seconds**, matching `ELECTRON_LAUNCH_DEFAULT_TIMEOUT_MS` and `ELECTRON_LAUNCH_MAX_TIMEOUT_MS` in `extensions/agent-browser/lib/electron/launch.ts`.
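The default and cap reduce to simple arithmetic. This TypeScript sketch mirrors the documented 15-second default and 120-second maximum. Whether the real `normalizeTimeoutMs` clamps or rejects out-of-range values, and how it treats zero or negative input, is an assumption here, so check `launch.ts` before relying on it.

```typescript
// Documented values; the real constants are ELECTRON_LAUNCH_DEFAULT_TIMEOUT_MS
// and ELECTRON_LAUNCH_MAX_TIMEOUT_MS in lib/electron/launch.ts.
const DEFAULT_LAUNCH_TIMEOUT_MS = 15000;
const MAX_LAUNCH_TIMEOUT_MS = 120000;

// Sketch only: this version clamps large values and rejects non-positive ones.
function effectiveLaunchTimeoutMs(timeoutMs?: number): number {
  if (timeoutMs === undefined) return DEFAULT_LAUNCH_TIMEOUT_MS;
  if (!Number.isFinite(timeoutMs) || timeoutMs <= 0) {
    throw new RangeError(`timeoutMs must be a positive number, got ${timeoutMs}`);
  }
  return Math.min(Math.floor(timeoutMs), MAX_LAUNCH_TIMEOUT_MS);
}
```

Under this sketch, the `timeoutMs: 30000` in the launch example passes through unchanged, while `600000` would be held to `120000`.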
140
+
141
+ Wrapper-owned launches **always** use an isolated temp profile and an OS-chosen port. `--user-data-dir`, `--remote-debugging-port`, `--remote-debugging-address`, `--remote-debugging-pipe`, and bare `--` in `appArgs` are rejected. There is no caller-supplied port and no way to make `electron.launch` reuse the app's normal signed-in profile or attach to an already-running app — by design. Use the manual path described above when those are the actual requirements.
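A client-side mirror of that rejection list, as a hypothetical TypeScript helper. It flags the four reserved flags (bare or in `--flag=value` form) and a bare `--`; the wrapper's own check is authoritative and may be stricter.

```typescript
// Flags the wrapper reserves for its own isolated launch (from the text above).
const RESERVED_FLAGS = [
  "--user-data-dir",
  "--remote-debugging-port",
  "--remote-debugging-address",
  "--remote-debugging-pipe",
];

// Hypothetical helper: returns the appArgs entries the wrapper would reject.
function rejectedAppArgs(appArgs: string[]): string[] {
  return appArgs.filter((arg) => {
    if (arg === "--") return true; // bare separator
    return RESERVED_FLAGS.some((flag) => arg === flag || arg.startsWith(`${flag}=`));
  });
}
```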
142
+
143
+ ### `electron.status` — liveness and targets
144
+
145
+ Read-only inspection of one or more tracked launches. Without `launchId` or `all`, it selects the single active wrapper launch when unambiguous.
146
+
147
+ ```json
148
+ { "electron": { "action": "status" } }
149
+ { "electron": { "action": "status", "launchId": "electron-…" } }
150
+ { "electron": { "action": "status", "all": true } }
151
+ ```
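The implicit selection rule can be sketched as below. This is hypothetical TypeScript: the `TrackedLaunch` shape is invented for illustration, and this guide does not specify what `electron.status` does when several launches are active or when `launchId` and `all` are both set; the sketch simply throws in the first case and prefers `launchId` in the second.

```typescript
// Illustrative record; the wrapper's real tracked-launch shape is not shown here.
interface TrackedLaunch {
  launchId: string;
  active: boolean;
}

interface StatusQuery {
  launchId?: string;
  all?: boolean;
}

function selectLaunches(launches: TrackedLaunch[], query: StatusQuery): TrackedLaunch[] {
  if (query.launchId !== undefined) {
    const match = launches.find((launch) => launch.launchId === query.launchId);
    if (!match) throw new Error(`unknown launchId: ${query.launchId}`);
    return [match];
  }
  if (query.all) return launches;
  // No selector: pick the single active launch only when that is unambiguous.
  const active = launches.filter((launch) => launch.active);
  if (active.length !== 1) {
    throw new Error(`${active.length} active launches; pass launchId or all: true`);
  }
  return active;
}
```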
152
+
153
+ Reports `cleanupState`, debug-port and PID liveness, and bounded CDP target metadata under `details.electron.statuses`. Mismatch fields surface when the current managed session or tab no longer matches a live wrapper launch target — typically the cue to follow `reattach-electron-launch` before trusting old refs.
154
+
155
+ ### `electron.probe` — compact state read
156
+
157
+ `probe` collapses what would otherwise be separate `get title` / `get url` / focused-element `eval` / `tab list` / `snapshot -i` calls into one bounded result. Use it instead of chaining those reads when you just need a quick "where are we?" check.
158
+
159
+ ```json
160
+ { "electron": { "action": "probe" } }
161
+ { "electron": { "action": "probe", "launchId": "electron-…", "timeoutMs": 5000 } }
162
+ ```
163
+
164
+ Output appears under `details.electron.probe`: `title`, `url`, `focusedElement`, `activeTab`, `tabs`, compact `snapshot` metadata (`refCount`, `refIds`, optional text preview and omission counts), and `errors`. When `launchId` is given, the probe is tied to that tracked launch and will surface mismatch guidance if the wrapper sees a session or target drift; visible output also includes debug-port/pid liveness so a stale `about:blank` against a dead launch is unmistakable.
165
+
166
+ `timeoutMs` bounds each underlying read subprocess. Use it for dense desktop apps when the default budget is too short, or to fail fast when you suspect the app process is wedged.
167
+
168
+ ### `electron.cleanup` — wrapper-owned only
169
+
170
+ Closes the tracked managed session, stops only the wrapper-tracked process, verifies that the debug port no longer serves `/json/version`, and removes the wrapper-created `userDataDir`. If cleanup only partially succeeds, the tool result fails with `failureCategory: "cleanup-failed"`, and the `retry-electron-cleanup` next action references the same `launchId`, so a retry targets only that launch.
171
+
172
+ ```json
173
+ { "electron": { "action": "cleanup", "launchId": "electron-…" } }
174
+ { "electron": { "action": "cleanup", "all": true } }
175
+ ```
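The bounded-retry contract can be sketched in TypeScript. `runCleanup` stands in for however your agent issues an `electron.cleanup` call for one `launchId`; it is synchronous only to keep the sketch short, and the limit of 3 attempts is an arbitrary illustrative choice, not a wrapper setting.

```typescript
interface CleanupResult {
  ok: boolean;
  failureCategory?: string; // "cleanup-failed" on partial failure
}

// Stand-in for issuing { electron: { action: "cleanup", launchId } }.
type RunCleanup = (launchId: string) => CleanupResult;

// Retry only partial cleanup failures, always with the same launchId,
// and stop after a fixed number of attempts.
function cleanupWithBoundedRetry(runCleanup: RunCleanup, launchId: string, maxAttempts = 3): CleanupResult {
  let last: CleanupResult = { ok: false };
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    last = runCleanup(launchId);
    if (last.ok || last.failureCategory !== "cleanup-failed") break;
  }
  return last;
}
```

Any other failure category stops the loop at once, since retrying cleanup would not fix it.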
176
+
177
+ `electron.cleanup` **never** targets:
178
+
179
+ - manually launched apps
180
+ - externally supplied debug ports
181
+ - arbitrary Electron processes the wrapper did not start
182
+
183
+ For manual launches, close the app yourself and clean its profile/temp files with normal host tools.
184
+
185
+ On Pi session shutdown, active wrapper-owned Electron launches are best-effort cleaned. Stale restored records (PID gone, port dead) are **reported** instead of guessed at or killed.
186
+
187
+ ### `timeoutMs` by action (quick reference)
188
+
189
+ `electron.list` does not take `timeoutMs` (host scan only). For every other action, `timeoutMs` applies to **different surfaces**; treat values as per-call budgets, not one global knob. Authoritative rules and env overrides live under **Validation and defaults** in [`TOOL_CONTRACT.md#electron`](TOOL_CONTRACT.md#electron).
190
+
191
+ | Action | What `timeoutMs` covers when set | Typical default when omitted |
192
+ | --- | --- | --- |
193
+ | `launch` | Host-side wait for `DevToolsActivePort` and CDP readiness | **15 s**, hard-capped at **120 s** (`normalizeTimeoutMs` in `extensions/agent-browser/lib/electron/launch.ts`) |
194
+ | `status` | Optional managed-session `get title` / `get url` reads used for mismatch diagnostics | Normal tool subprocess budget from `runAgentBrowserProcess` / `AGENT_BROWSER_DEFAULT_TIMEOUT`; localhost CDP HTTP probes keep a short fixed budget (`ELECTRON_STATUS_FETCH_TIMEOUT_MS` in `extensions/agent-browser/lib/electron/cleanup.ts`) |
195
+ | `cleanup` | One combined budget for managed-session `close`, tracked process exit, debug-port verification, and temp profile removal | `PI_AGENT_BROWSER_IMPLICIT_SESSION_CLOSE_TIMEOUT_MS` when set, else **5000 ms** (`getImplicitSessionCloseTimeoutMs` in `extensions/agent-browser/lib/runtime.ts`, passed through `cleanupTrackedElectronLaunches` in `extensions/agent-browser/index.ts`) |
196
+ | `probe` | **Each** upstream read in the probe chain (`get title`, `get url`, focused `eval --stdin`, `tab list`, `snapshot -i`) | Same default as other tool calls (typically **28 s** per subprocess unless `AGENT_BROWSER_DEFAULT_TIMEOUT` / `PI_AGENT_BROWSER_PROCESS_TIMEOUT_MS` overrides `runAgentBrowserProcess` in `extensions/agent-browser/lib/process.ts`) |
197
+
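Because `probe` applies `timeoutMs` to each read while `cleanup` uses one combined budget, the worst-case wall time differs a lot between them. A small TypeScript sketch, assuming the five probe reads run one after another (this guide does not say whether they are sequential) and that the cleanup env override is a plain positive number of milliseconds:

```typescript
// The probe chain listed in the table above.
const PROBE_READS = ["get title", "get url", "eval --stdin (focused element)", "tab list", "snapshot -i"];

// Upper bound if every read runs sequentially and hits its own timeout.
function probeWorstCaseMs(perReadTimeoutMs: number): number {
  return PROBE_READS.length * perReadTimeoutMs;
}

// Combined cleanup budget: env override when it parses as a positive number, else 5000 ms.
// The parsing rules are an assumption; see getImplicitSessionCloseTimeoutMs for the real ones.
function cleanupBudgetMs(env: Record<string, string | undefined>): number {
  const raw = env["PI_AGENT_BROWSER_IMPLICIT_SESSION_CLOSE_TIMEOUT_MS"];
  const parsed = raw === undefined ? Number.NaN : Number(raw);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : 5000;
}
```

In this model, `probe` with `timeoutMs: 5000` can take up to 25 seconds, while `cleanup` with the same value is bounded by roughly 5 seconds in total. In Node you would pass `process.env` as `env`.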
198
+ ## `qa.attached` — current-session smoke check
199
+
200
+ `qa` has two forms: the URL form (`qa: { url, … }`) and the attached form (`qa: { attached: true, … }`). The attached form is the right tool for Electron smoke checks after either launch path because it does not open a URL and runs all checks against the current managed session.
201
+
202
+ ```json
203
+ {
204
+ "qa": {
205
+ "attached": true,
206
+ "expectedText": "Explorer",
207
+ "expectedSelector": "@e1",
208
+ "checkConsole": true,
209
+ "checkErrors": true,
210
+ "screenshotPath": ".dogfood/electron.png"
211
+ }
212
+ }
213
+ ```
214
+
215
+ `qa.attached` rejects `url` and is incompatible with `sessionMode: "fresh"` — attach first with `electron.launch` or raw `connect`, then run `qa.attached`. The full field rules and pass/fail classification live in [`TOOL_CONTRACT.md#qa`](TOOL_CONTRACT.md#qa).
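Those two constraints can be checked before calling the tool. The TypeScript below is a hypothetical pre-flight helper. The request shape is assumed from the examples in this guide (with `sessionMode` beside `qa`, as it sits beside `args` in the `connect` example), and it covers only the two rules stated here, not the full `qa` contract.

```typescript
// Assumed request shape, based on the examples in this guide.
interface QaRequest {
  qa: {
    attached?: boolean;
    url?: string;
    expectedText?: string;
    expectedSelector?: string;
    screenshotPath?: string;
  };
  sessionMode?: string;
}

// Returns human-readable problems; an empty array means these two rules pass.
function qaAttachedProblems(request: QaRequest): string[] {
  const problems: string[] = [];
  if (request.qa.attached !== true) return problems; // URL form: out of scope here
  if (request.qa.url !== undefined) {
    problems.push("qa.attached rejects url; attach first, then run qa.attached");
  }
  if (request.sessionMode === "fresh") {
    problems.push('qa.attached is incompatible with sessionMode: "fresh"');
  }
  return problems;
}
```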
216
+
217
+ In attached Electron sessions, broad selectors such as `body`, `html`, `main`, or `[role=application]` can read the entire app shell. When `get text <selector>` looks too broad, the wrapper may attach `details.electronGetTextScopeWarning` and a `snapshot-for-electron-text-scope` next action; prefer a fresh `snapshot -i`, a current `@ref`, or a narrower panel selector.
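A rough client-side version of that check, as hypothetical TypeScript. It only recognizes the four selectors named above after trimming, lowercasing, and stripping quotes; the wrapper's actual detection heuristic may differ.

```typescript
// Selectors this guide calls out as reading the whole Electron app shell.
const BROAD_SHELL_SELECTORS = new Set(["body", "html", "main", "[role=application]"]);

function isBroadShellSelector(selector: string): boolean {
  // Normalize `[role="application"]` and `[role='application']` to one form.
  const normalized = selector.trim().toLowerCase().replace(/["']/g, "");
  return BROAD_SHELL_SELECTORS.has(normalized);
}

// Prefer a current @ref (or a narrow panel selector) for `get text`.
function preferNarrowTextTarget(selector: string, currentRef?: string): string {
  return isBroadShellSelector(selector) && currentRef ? currentRef : selector;
}
```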
218
+
219
+ ## `sourceLookup` against packaged Electron apps
220
+
221
+ `sourceLookup` is an experiment for hinting at the source file/component behind a visible element. It is **opt-in** and **evidence-based**: it reports confidence and evidence rather than claiming a guaranteed mapping. The same experimental helper works against packaged Electron apps, but with two important boundaries:
222
+
223
+ 1. **Scope of the workspace scan.** `sourceLookup` walks the Pi session **cwd** (default `maxWorkspaceFiles: 2000`, hard cap 5000). It does **not** unpack `app.asar` or installed app resources. For packaged apps where the source lives inside `Contents/Resources/app.asar`, the workspace-search lane will commonly return no candidates.
224
+ 2. **React DevTools requirement.** `react inspect <id>` requires the session to have been launched with `--enable react-devtools` before first navigation. For Electron, the wrapper's `electron.launch` path does **not** inject `--enable react-devtools` into the Electron process; that flag belongs to upstream `agent-browser` Chromium launches. If the Electron app does not already expose a React DevTools backend, expect `react inspect` to fail; DOM-attribute and workspace-search candidates may still surface.
225
+
226
+ For wrapper-tracked packaged Electron sessions where `status` is `no-candidates`, the wrapper attaches `workspaceRoot`, an optional `electronContext` (`launchId?`, `appName?`, `appPath?`, `executablePath?`, `sessionName?`, `url?`), and limitations explaining the bundle/asar boundary. It also appends `snapshot-electron-session`, `probe-electron-launch`, and `list-electron-tabs` next actions so you can inspect the live app and decide whether to widen the workspace or pull source out-of-band before re-running the lookup.
227
+
228
+ ```json
229
+ { "sourceLookup": { "selector": "#save", "reactFiberId": "2", "componentName": "SaveButton" } }
230
+ ```
231
+
232
+ Treat `sourceLookup` output as a starting point for navigation, not a substitute for reading code. Full contract: [`TOOL_CONTRACT.md#sourcelookup`](TOOL_CONTRACT.md#sourcelookup).
233
+
234
+ ## Safety and ownership
235
+
236
+ Remote debugging exposes app content (DOM, network, JavaScript) to the attached browser tool. The wrapper ships **isolation defaults**; it does **not** classify any app as too-risky-to-launch.
237
+
238
+ ### What the wrapper always does
239
+
240
+ - Launches with `--user-data-dir=<wrapper-created-temp>` and `--remote-debugging-port=0`.
241
+ - Reads the OS-chosen port from `DevToolsActivePort`.
242
+ - Adds `--disable-extensions`, `--no-first-run`, and `--no-default-browser-check` alongside sanitized caller `appArgs`.
243
+ - Rejects `appArgs` that try to override lifecycle/debug flags.
244
+ - Refuses to launch non-Electron targets (correctness gate, not a security gate).
245
+ - Treats `electron.cleanup` as wrapper-owned only; never touches manually launched apps.
246
+
247
+ ### What the **caller** owns
248
+
249
+ - The decision to launch or attach to a sensitive app in the first place.
250
+ - Optional `allow` / `deny` policy lists when you want guardrails.
251
+ - Profile and process cleanup for manually launched apps.
252
+ - Host-file cleanup for any explicit screenshots, downloads, HARs, traces, or recordings saved to caller-chosen paths. `electron.cleanup` does not touch these.
253
+
254
+ ### Caller-owned policy: `allow` / `deny`
255
+
256
+ Both lists match `appName`, `bundleId`, `desktopId`, `appPath`, or `executablePath` by substring.
257
+
258
+ ```json
259
+ {
260
+ "electron": {
261
+ "action": "launch",
262
+ "appName": "Slack",
263
+ "allow": ["Slack"],
264
+ "deny": ["1Password", "Bitwarden"]
265
+ }
266
+ }
267
+ ```
268
+
269
+ Rules:
270
+
271
+ - If `allow` is set, the target must match at least one entry.
272
+ - If `deny` is set, a matching target is rejected.
273
+ - `deny` wins on conflict.
274
+ - With neither set, launch is permitted.
275
+
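The four rules can be expressed as one function. This TypeScript sketch is an illustration, not the wrapper's code: it assumes case-sensitive substring matching (the guide says only "by substring") and treats an `allow` list as set whenever it is provided, even if empty.

```typescript
interface PolicyTarget {
  appName?: string;
  bundleId?: string;
  desktopId?: string;
  appPath?: string;
  executablePath?: string;
}

type PolicyDecision =
  | { allowed: true }
  | { allowed: false; list: "allow" | "deny"; entry?: string };

function evaluatePolicy(target: PolicyTarget, allow?: string[], deny?: string[]): PolicyDecision {
  const fields = [target.appName, target.bundleId, target.desktopId, target.appPath, target.executablePath]
    .filter((value): value is string => typeof value === "string");
  const firstMatch = (entries: string[]): string | undefined =>
    entries.find((entry) => fields.some((field) => field.includes(entry)));

  // deny wins on conflict, so check it first.
  const denied = deny ? firstMatch(deny) : undefined;
  if (denied !== undefined) return { allowed: false, list: "deny", entry: denied };

  // With allow set, at least one entry must match.
  if (allow !== undefined && firstMatch(allow) === undefined) return { allowed: false, list: "allow" };

  return { allowed: true };
}
```

Under this sketch, the Slack example above is allowed, and a `1Password` target would be blocked with `list: "deny"` even if it also appeared in `allow`.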
276
+ Policy mismatches fail with `failureCategory: "policy-blocked"`, and `details.electron.failure.policy` names the matched list and entry.
277
+
278
+ ### Likely-sensitive annotations
279
+
280
+ `electron.list` may annotate common private-data apps (`notes`, `chat`, `mail`, `developer-workspace`, `passwords-auth`) with `sensitivity.level: "likely-sensitive"` and a visible `[likely sensitive: …]` marker. These are **advisory hints only**. They do not block `launch` and they do not replace caller `allow` / `deny`.
281
+
282
+ ## Failure categories and recovery
283
+
284
+ `details.failureCategory` values you should expect from Electron flows, with the recovery move:
285
+
286
+ | Category | When | Recovery |
287
+ |---|---|---|
288
+ | `validation-error` | Bad input (missing target, conflicting fields, non-Electron target) | Fix the request; the message names the problem |
289
+ | `policy-blocked` | Caller `allow` / `deny` rejected the launch | Adjust the policy or pick a different target |
290
+ | `timeout` | `DevToolsActivePort` never appeared in time | Inspect `details.electron.failure.diagnostics` (PID, profile path, port file state, elapsed/timeout); retry with a higher `timeoutMs` if the app legitimately needs more time |
291
+ | `upstream-error` | Launch/attach/spawn/CDP failure that does not fit a more specific bucket | Inspect `details.electron.failure.diagnostics`; the app may be missing dependencies or hitting a CDP race |
292
+ | `tab-drift` | A successful-looking command was followed by a dead process / debug port / unrecoverable `about:blank` | Use the appended `status-electron-launch` / `probe-electron-launch` next actions, then decide whether to relaunch |
293
+ | `cleanup-failed` | Cleanup only partially succeeded | Inspect `details.electron.cleanup.results[].steps` for remaining process/port/profile state; `retry-electron-cleanup` references the same `launchId` |
294
+ | `stale-ref` | `@e…` ref reused after a navigation/rerender | Take a fresh `snapshot -i` (or follow `refresh-electron-refs-after-rerender` when the wrapper appends it) |
295
+
296
+ Single-instance Electron behavior is a common cause of `timeout` and `upstream-error`. Many Electron apps enforce a single running instance and silently drop a second invocation's `--remote-debugging-port` flag. If the app is already running without a debug port, quit it first or use the manual host-launch path against the existing instance instead.
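The recovery table can double as a lookup when scripting recovery. This hypothetical TypeScript helper maps each documented `failureCategory` to the recovery step from the table; the strings are paraphrases, and the next-action names are the ones this guide mentions rather than a complete list.

```typescript
type ElectronFailureCategory =
  | "validation-error"
  | "policy-blocked"
  | "timeout"
  | "upstream-error"
  | "tab-drift"
  | "cleanup-failed"
  | "stale-ref";

function recoveryHint(category: ElectronFailureCategory, launchId?: string): string {
  switch (category) {
    case "validation-error":
      return "fix the request; the message names the problem";
    case "policy-blocked":
      return "adjust allow/deny or pick a different target";
    case "timeout":
      return "read details.electron.failure.diagnostics; retry with a higher timeoutMs (max 120000 ms)";
    case "upstream-error":
      return "read details.electron.failure.diagnostics; check for missing dependencies or a CDP race";
    case "tab-drift":
      return "follow status-electron-launch / probe-electron-launch, then decide whether to relaunch";
    case "cleanup-failed":
      return `follow retry-electron-cleanup for ${launchId ?? "the same launchId"}`;
    case "stale-ref":
      return "take a fresh snapshot -i before reusing @e refs";
  }
}
```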
297
+
298
+ ## Troubleshooting
299
+
300
+ ### Launch hangs and then times out
301
+ - The app is enforcing single-instance; quit the running copy first, then retry.
302
+ - The app may have moved its Electron framework directory; pass `executablePath` explicitly.
303
+ - `timeoutMs` is too short for a heavy app; raise it (`launch.timeoutMs` defaults to 15 seconds and accepts up to 120 seconds).
304
+ - Read `details.electron.failure.diagnostics`: presence/absence of `DevToolsActivePort`, port number, PID liveness, and elapsed time usually identify the issue.
305
+
306
+ ### `electron.list` returns nothing
307
+ - On Linux, the binary may be a custom rebrand without `chrome_*.pak` siblings, an AppImage without a `.desktop` entry, or a statically linked fork. Pass `executablePath` directly.
308
+ - On macOS, apps installed outside `/Applications` and `~/Applications` are not scanned in v1. Pass `appPath` or `executablePath` explicitly.
309
+ - Windows hosts report `platform: "unsupported"` from `electron.list`; always pass `executablePath` (or a resolvable `appPath`) for `launch`.
310
+
311
+ ### Attach succeeds but `snapshot -i` returns no refs
312
+ - Some Electron apps take a beat to render. The default `handoff: "snapshot"` already retries briefly; if it still reports no refs, run `snapshot -i` once more before treating the UI as blank.
313
+ - For apps whose UI lives in a webview, switch `targetType` to `"webview"` or `"any"` so the wrapper attaches to the right CDP target.
314
+
315
+ ### "I clicked, but nothing happened"
316
+ - A successful upstream `click` means the action was dispatched, not that the app handled it. Re-snapshot, check `details.pageChangeSummary`, or use `qa.attached` to verify.
317
+ - Electron apps frequently rerender in place (no URL change). The wrapper may attach `refresh-electron-refs-after-rerender` to remind you to re-snapshot before reusing `@e…` refs.
318
+
319
+ ### `fill` looks fine but the field is empty
320
+ - Custom quick-input controls (VS Code's quick-pick, command palette, etc.) often need focus + keyboard typing rather than a direct `fill`. The wrapper attaches `details.fillVerification` when `get value` disagrees with the requested text; follow `inspect-after-fill-verification` and switch to focus + `keyboard type` before submitting.
321
+
322
+ ### `get text` returns the whole app
323
+ - Broad selectors (`body`, `html`, `main`, `[role=application]`) read the entire shell. Use a current `@ref` or a narrower panel selector. The wrapper attaches `details.electronGetTextScopeWarning` and a `snapshot-for-electron-text-scope` next action when it detects this pattern.
324
+
325
+ ### `sourceLookup` says `no-candidates` for a packaged app
326
+ - Expected when the app's source lives inside `app.asar`; the wrapper does not unpack bundles. Follow the `snapshot-electron-session`, `probe-electron-launch`, or `list-electron-tabs` next actions (or call `electron.probe` directly) to inspect the live UI, or pull the source separately into the Pi session cwd before re-running the lookup.
327
+
328
+ ### Mismatch between `status` and the active session
329
+ - `electron.status` may report a live wrapper launch while the managed session has drifted to `about:blank`. Follow `reattach-electron-launch`, then refresh refs with `snapshot-electron-session` before continuing.
330
+
331
+ ## Cleanup checklist
332
+
333
+ Before ending the task:
334
+
335
+ - Call `electron.cleanup` with each wrapper-owned `launchId` you started, or once with `all: true`. The result reports per-step state for `managed-session`, `process`, `debug-port`, and `user-data-dir`.
336
+ - Confirm `details.electron.cleanup.summary` does not list remaining resources.
337
+ - For **manually launched** apps, close the app yourself and clean any profile or temp files you created. `electron.cleanup` will not (and should not) touch them.
338
+ - Remove any explicit screenshots, recordings, downloads, PDFs, traces, or HAR files you saved to caller-chosen paths. Artifact cleanup is host-owned; the wrapper only reports them under `details.artifacts` and `details.artifactCleanup`.
339
+
340
+ If `cleanup` returns `failureCategory: "cleanup-failed"`, inspect `details.electron.cleanup.results[].steps` and use `retry-electron-cleanup` for the same `launchId`. Do not invent new cleanup commands for processes the wrapper did not start.
341
+
342
+ ## Verification and benchmarks
343
+
344
+ Electron support is gated by the same release evidence as the rest of the wrapper:
345
+
346
+ - `RQ-0096` in [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md) records the contract, runtime, test, and verification coverage.
347
+ - `electron-lifecycle` and `electron-probe` scenarios in `scripts/agent-browser-efficiency-benchmark.mjs` track the token-efficiency claim deterministically (no real browser, no real launches).
348
+ - Fake-upstream coverage for Electron schema/probe/mismatch/post-command-health/fill-verification/broad-text/discovery-sensitivity lives in `test/agent-browser.extension-validation.test.ts`.
349
+ - Real-app validation is a manual `tmux` smoke pass per the maintainer notes in `AGENTS.md`; the 2026-05-21 dogfood result is recorded at the end of [`docs/plans/electron-extension-2026-05-20.md`](plans/electron-extension-2026-05-20.md).
350
+
351
+ Run the local gate the same way as the rest of the project:
352
+
353
+ ```bash
354
+ npm run verify
355
+ ```
356
+
357
+ The token-efficiency claim has its own opt-in run:
358
+
359
+ ```bash
360
+ npm run benchmark:agent-browser
361
+ ```
362
+
363
+ ## Where to go next
364
+
365
+ - For exact field semantics, schemas, and `details.*` payloads: [`TOOL_CONTRACT.md#electron`](TOOL_CONTRACT.md#electron) and [`TOOL_CONTRACT.md#qa`](TOOL_CONTRACT.md#qa).
366
+ - For workflow examples woven into the broader command surface: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#electron-desktop-apps).
367
+ - For the closed `RQ-0068` recipe-layer decision that bounds why Electron support is a typed shorthand and not a generic recipe runtime: [`ARCHITECTURE.md`](ARCHITECTURE.md#no-reusable-recipe-layer-yet).
368
+ - For the full release-readiness audit and the `RQ-0096` evidence row: [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md).
package/docs/RELEASE.md CHANGED
@@ -5,6 +5,7 @@ Related docs:
5
5
  - [`REQUIREMENTS.md`](REQUIREMENTS.md)
6
6
  - [`ARCHITECTURE.md`](ARCHITECTURE.md)
7
7
  - [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md)
8
+ - [`ELECTRON.md`](ELECTRON.md)
8
9
  - [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md)
9
10
  - Bounded `agent_browser` outcome metadata on `details` (`resultCategory`, `successCategory`, `failureCategory`, optional `nextActions`, optional `pageChangeSummary` with per-step summaries on `batch`): contract in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); maintainer checklists under “Tool result categories” and “Page-change summaries” in [`../AGENTS.md`](../AGENTS.md)
10
11
  - Post-success `get text` selector visibility (`RQ-0074`): optional `details.selectorTextVisibility` / `selectorTextVisibilityAll`, visible warnings, and `inspect-visible-text-candidates*` next actions after read-only visibility probes—[`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md), [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details), and [`../AGENTS.md`](../AGENTS.md) maintainer checklist
@@ -36,7 +37,7 @@ npm run verify -- release
36
37
 
37
38
  `prepublishOnly` intentionally does **not** run `npm run verify -- lifecycle`, `npm run verify -- real-upstream`, or `npm run verify -- benchmark`; those are separate `npm run verify` modes in [`scripts/project.mjs`](../scripts/project.mjs). Treat the bullets below as the full pre-publish contract even though only the `release` slice is automated at publish time.
38
39
 
39
- Every release also requires interactive `tmux`-driven Pi dogfood with the native `agent_browser` tool against real sites. For extension-focused release smokes, use `pi --no-extensions --no-skills -e .` from the checkout before publish so auto-loaded dogfood/QA skills cannot replace the bounded smoke workflow; run separate skill-enabled dogfood only when validating skill routing or report-generation behavior. Drive prompts with `tmux send-keys`, exercise at least one simple static site and one real documentation/product site, include the higher-level `qa` or `job`/`batch` surfaces when they changed, close every opened browser session, remove screenshots/temp artifacts, and record the outcome in the release notes or support-matrix evidence. Automated localhost and fake-upstream gates do not replace this human-readable live-site transcript evidence. For dense-dashboard stress coverage, use the [public Grafana stress checklist](#public-grafana-stress-checklist) below; it is a maintainer workflow, not bundled product skill or recipe runtime.
+ Every release also requires interactive `tmux`-driven Pi dogfood with the native `agent_browser` tool against real sites. For extension-focused release smokes, use `pi --no-extensions --no-skills -e .` from the checkout before publish so auto-loaded dogfood/QA skills cannot replace the bounded smoke workflow; run separate skill-enabled dogfood only when validating skill routing or report-generation behavior. Drive prompts with `tmux send-keys`, exercise at least one simple static site and one real documentation/product site, include the higher-level `qa` or `job`/`batch` surfaces when they changed, close every opened browser session, remove screenshots/temp artifacts, and record the outcome in the release notes or support-matrix evidence. Automated localhost and fake-upstream gates do not replace this human-readable live-site transcript evidence. When `electron.*` surfaces, attached-session diagnostics, or `qa.attached` changed, add a local Electron pass: `electron.list` → `electron.launch` (expect isolated profile behavior) → `snapshot -i` or `electron.probe` / `qa.attached` → `electron.cleanup` with the returned `launchId`, verifying status/mismatch guidance if you simulate a dead renderer or stale refs. For dense-dashboard stress coverage, use the [public Grafana stress checklist](#public-grafana-stress-checklist) below; it is a maintainer workflow, not bundled product skill or recipe runtime.
 
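A scripted version of this tmux smoke can wait on pane state instead of grepping for prompt words, in line with the "Automated prompt driving" pitfall listed in this guide's environment notes. The helpers below are a hypothetical sketch, not part of `pi-agent-browser-native`: the function names, the 180-second default, the 5-second poll interval, and the use of `agent_browser close` or `Artifact lifecycle` as completion markers are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for tmux-driven Pi smokes (not shipped with this package).

# pane_looks_done TEXT
# Succeeds when the captured pane no longer shows Pi's "Working…" indicator
# AND contains an assumed completion marker. Words such as PASS/FAIL are
# deliberately ignored because they may also appear in the user prompt.
pane_looks_done() {
  local pane_text="$1"
  if grep -qF 'Working…' <<<"$pane_text"; then
    return 1  # the agent is still streaming a response
  fi
  grep -qF -e 'agent_browser close' -e 'Artifact lifecycle' <<<"$pane_text"
}

# wait_for_pane_done SESSION [TIMEOUT_S]
# Polls the visible pane every 5 s until pane_looks_done succeeds or time runs out.
wait_for_pane_done() {
  local session="$1" timeout_s="${2:-180}" waited=0
  until pane_looks_done "$(tmux capture-pane -p -t "$session")"; do
    if (( waited >= timeout_s )); then
      echo "timed out after ${timeout_s}s waiting for $session" >&2
      return 1
    fi
    sleep 5
    (( waited += 5 ))
  done
}
```

Usage, assuming a session named `piab-smoke`: start it with `tmux new-session -d -s piab-smoke 'pi --no-extensions --no-skills -e .'`, send the prompt with `tmux send-keys -t piab-smoke '<prompt>' Enter`, then call `wait_for_pane_done piab-smoke 600`. `tmux capture-pane -p` reads only the visible screen, so JSONL tool results remain the more reliable signal.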
  The configured-source lifecycle regression harness is required before release because it launches an interactive `pi` process under `tmux` and validates `/reload` plus restart/`/resume` behavior:
 
@@ -181,7 +182,34 @@ Run the automated harness for deterministic configured-source lifecycle regressi
  npm run verify -- lifecycle
  ```
 
- The harness creates an isolated `PI_CODING_AGENT_DIR`, writes settings with exactly one temporary configured package source, runs plain `pi` in `tmux`, puts a deterministic fake `agent-browser` first on `PATH`, and drives `/reload`, full restart, and `/resume`. It asserts same-page managed-session continuity, persisted `details.fullOutputPath` reachability after resume, and updated extension-code pickup through a temporary sentinel command. On failure it retains transcripts/session artifacts; on success it performs best-effort cleanup. It does not replace occasional real-browser manual smoke testing.
+ The harness creates an isolated `PI_CODING_AGENT_DIR`, writes settings with exactly one temporary configured package source, runs plain `pi` in `tmux` with default model **`zai/glm-5.1`**, puts a deterministic fake `agent-browser` first on `PATH`, and drives `/reload`, full restart, and `/resume`. Per-step tmux waits default to **180000 ms** (three minutes) in [`scripts/verify-lifecycle.mjs`](../scripts/verify-lifecycle.mjs) (`DEFAULT_TIMEOUT_MS`); override with `--timeout-ms <ms>` when slower models or cold starts need more headroom. Override the model when needed:
+
+ ```bash
+ npm run verify -- lifecycle --model openai-codex/gpt-5.5:minimal
+ ```
+
+ Combine flags in one invocation when both apply (order after `lifecycle` is flexible as long as each value-taking flag is immediately followed by its value):
+
+ ```bash
+ npm run verify -- lifecycle --model openai-codex/gpt-5.5:minimal --timeout-ms 600000
+ ```
+
+ It asserts same-page managed-session continuity, persisted `details.fullOutputPath` reachability after resume, and updated extension-code pickup through a temporary sentinel command. On failure it retains transcripts/session artifacts; on success it performs best-effort cleanup. It does not replace occasional real-browser manual smoke testing.
+
+ **Lifecycle triage:** a timeout on sentinel `v2` after `/reload` often means Pi rejected reload while the TUI still showed `Working…` (`Wait for the current response to finish before reloading`), even when the session JSONL already has a final assistant message. Re-run with `--keep-artifacts --verbose`, inspect the retained pane capture, and confirm the configured model follows tool prompts reliably. Slower models may need a higher `--timeout-ms` than the **180000 ms** default.
+
+ ### Environment and automation pitfalls
+
+ These show up often in cloud dev boxes and scripted smokes; they are maintainer notes, not product defects.
+
+ | Topic | What to watch for | Mitigation |
+ | --- | --- | --- |
+ | **Pi CLI vs repo devDependencies** | Global `pi` older than the `@earendil-works/pi-coding-agent` range in `package.json` can change TUI behavior, `/reload`, and tool routing during lifecycle or checkout smokes. | Align `pi` with the repo’s pinned coding-agent release before release gates (`pi update` or install the matching version). |
+ | **npm lockfile (`packageManager`)** | `package.json` pins **npm@11**. Running npm 10 can strip the optional `libc` metadata from `@esbuild/*` platform entries in `package-lock.json` without changing any dependency version, producing lockfile-only churn. | Prefer `npx -y npm@11.14.0 install` when refreshing the lockfile; do not commit npm-10-only lockfile churn. |
+ | **`pi -p` / print mode** | Non-interactive `pi -p` may hang or emit no stdout for long real-browser smokes without a TTY. | Use **tmux**-driven interactive `pi` for release evidence and checkout smokes; reserve `-p` for short, non-browser checks. |
+ | **Real-browser cleanup** | `real-upstream`, Sauce Demo, and live-site runs can leave defunct Chrome/`agent-browser` children if a session aborts mid-flow. | Close via `agent_browser` / `agent-browser` `close`, kill stray tmux sessions, and remove temp screenshots/HARs under `/tmp` or your chosen artifact dirs. |
+ | **Automated prompt driving** | Grepping tmux pane text for words that also appear in the **user** prompt (`PASS`, `FAIL`, `checkout overview`, `Smoke result:`) can false-complete before the agent finishes. | Wait for pane idle (no `Working…`), `agent_browser close` / `Artifact lifecycle`, or JSONL tool results—not instruction phrases copied from the prompt. |
+ | **Lifecycle verify flags** | `npm run verify -- lifecycle --model` or `--timeout-ms` with no value after the flag fails fast with a usage error; the `project.mjs` facade validates passthrough the same way as `scripts/verify-lifecycle.mjs`. | Always pair flags with values (`--model openai-codex/gpt-5.5:minimal`, `--timeout-ms 600000`) or omit `--model` / `--timeout-ms` to keep the harness defaults (`zai/glm-5.1`, **180000 ms** per-step waits). |
 
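The flag rules for `npm run verify -- lifecycle` described above can be restated as a small check. The function below is a hypothetical bash illustration, not the `validatePassthrough` implementation in `scripts/project.mjs`; rejecting unknown flags and treating a following token that starts with `--` as a missing value are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical restatement of the documented lifecycle passthrough rules.
# Returns 0 when the flags are well formed, 2 (with a message) otherwise.
check_lifecycle_passthrough() {
  while (( $# > 0 )); do
    case "$1" in
      --keep-artifacts|--verbose)
        shift ;;  # boolean flags take no value
      --model)
        # Assumption: a following "--flag" token does not count as a model id.
        if (( $# < 2 )) || [[ "$2" == --* ]]; then
          echo "usage error: --model needs a model id" >&2
          return 2
        fi
        shift 2 ;;
      --timeout-ms)
        # The value must be a positive integer number of milliseconds.
        if (( $# < 2 )) || ! [[ "$2" =~ ^[1-9][0-9]*$ ]]; then
          echo "usage error: --timeout-ms needs a positive integer" >&2
          return 2
        fi
        shift 2 ;;
      *)
        echo "usage error: unknown flag $1" >&2
        return 2 ;;
    esac
  done
}
```

For example, `check_lifecycle_passthrough --model openai-codex/gpt-5.5:minimal --timeout-ms 600000` succeeds, while a bare `check_lifecycle_passthrough --timeout-ms` reports a usage error.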
  Manual validation remains useful for release confidence and installed-package checks:
 
@@ -4,6 +4,7 @@ Related docs:
  - [`../README.md`](../README.md)
  - [`ARCHITECTURE.md`](ARCHITECTURE.md)
  - [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md)
+ - [`ELECTRON.md`](ELECTRON.md)
  - [`RELEASE.md`](RELEASE.md)
  - [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md)
 
@@ -63,7 +64,7 @@ Define the product requirements and constraints for `pi-agent-browser-native`.
 
  ### Native `agent_browser` inputs
 
- - Each tool invocation must supply **exactly one** of: `args` (full upstream argv after the binary name), top-level `semanticAction` (a small intent object compiled into existing upstream `find` argv for locator actions or upstream `select <selector> <value...>` argv for native dropdown selection), `job`, `qa`, `sourceLookup`, or `networkSourceLookup`. Supplying multiple modes or none is rejected before launch (`extensions/agent-browser/index.ts`, `test/agent-browser.extension-validation.test.ts`).
+ - Each tool invocation must supply **exactly one** of: `args` (full upstream argv after the binary name), top-level `semanticAction` (a small intent object compiled into existing upstream `find` argv for locator actions or upstream `select <selector> <value...>` argv for native dropdown selection), `job`, `qa`, `sourceLookup`, `networkSourceLookup`, or `electron` (bounded desktop lifecycle: host `list`, wrapper-owned isolated `launch` with CDP attach, `status`, compact `probe`, and `cleanup`; mutually exclusive with caller `stdin`). Supplying multiple modes or none is rejected before launch (`extensions/agent-browser/index.ts`, `test/agent-browser.extension-validation.test.ts`). Contract and field rules: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#electron); operator workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#electron-desktop-apps).
  - `semanticAction` is not a nested shape inside `batch` stdin; batch steps remain upstream argv string arrays, including `find` steps expressed as token lists.
  - Supported actions, locators, exclusivity rules, when `details.compiledSemanticAction` appears, and bounded `try-*-candidate` follow-ups on `selector-not-found` (specific action/locator pairs only; see contract) are specified in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction), with workflow examples in [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md).
 
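The "exactly one mode" rule, plus the `electron`/`stdin` exclusion, can be illustrated as a check over the top-level field names present in a tool call. This bash function is a hypothetical sketch; the real validation is in `extensions/agent-browser/index.ts`, and treating every non-mode field (such as `stdin` or `sessionMode`) as neutral for the count is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: pass the names of the top-level fields present in a call.
# Succeeds only when exactly one invocation mode is present and `electron`
# is not combined with caller `stdin`.
validate_invocation_fields() {
  local modes=0 has_electron=0 has_stdin=0 field
  for field in "$@"; do
    case "$field" in
      args|semanticAction|job|qa|sourceLookup|networkSourceLookup)
        (( modes += 1 )) ;;
      electron)
        (( modes += 1 ))
        has_electron=1 ;;
      stdin)
        has_stdin=1 ;;
      # Any other field is ignored for the mode count.
    esac
  done
  if (( modes != 1 )); then
    echo "rejected: expected exactly one mode, got $modes" >&2
    return 1
  fi
  if (( has_electron && has_stdin )); then
    echo "rejected: electron cannot be combined with stdin" >&2
    return 1
  fi
}
```

Under these assumptions `validate_invocation_fields args stdin` passes (argv plus stdin, as with `eval --stdin`), while `validate_invocation_fields electron stdin` and `validate_invocation_fields args job` are rejected.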
@@ -84,7 +85,7 @@ Define the product requirements and constraints for `pi-agent-browser-native`.
  - The primary confidence path is a real `pi` session driven in `tmux`.
  - For quick local checkout smoke validation, launch `pi --no-extensions -e .` from the repository root so only the checkout copy loads; do not rely on Pi settings or `/reload` semantics in this isolated mode.
  - For hot-reload validation, configure exactly one active source for this extension in Pi settings and launch plain `pi`; validate `/reload` there because it exercises auto-discovered/configured resources.
- - Maintain a tmux-driven configured-source lifecycle harness (`npm run verify -- lifecycle`; required before release per `docs/RELEASE.md`) that isolates Pi settings, uses exactly one configured source, exercises `/reload`, full restart, and `/resume`, and asserts managed-session continuity plus persisted artifact survival. It is its own `npm run verify` mode rather than part of the default `npm run verify` sequence, but operators still run it before every publish. Keep `docs/RELEASE.md` accurate about the harness behavior, cleanup, transcript retention, and limitations.
+ - Maintain a tmux-driven configured-source lifecycle harness (`npm run verify -- lifecycle`; required before release per `docs/RELEASE.md`) that isolates Pi settings, uses exactly one configured source, exercises `/reload`, full restart, and `/resume`, and asserts managed-session continuity plus persisted artifact survival. It is its own `npm run verify` mode rather than part of the default `npm run verify` sequence, but operators still run it before every publish. The harness defaults Pi to model `zai/glm-5.1` (`scripts/verify-lifecycle.mjs`); pass `--model <id>` after `lifecycle` when a different model is required. Keep `docs/RELEASE.md` accurate about the harness behavior, cleanup, transcript retention, and limitations.
  - Validate a full `pi` restart with `/resume` when changes touch managed-session continuity, reload behavior, or persisted artifact paths.
  - Prefer full `pi` restart over `/reload` when validating extension changes beyond a quick reload smoke check.
  - Use `/resume` when needed after restart.
@@ -103,6 +104,7 @@ The design should comfortably support workflows such as:
  - headless authenticated `chat.com` / ChatGPT / OpenAI browsing without forcing `--headed` or `--auto-connect`
  - upstream profile/debug workflows without adding a local profile-cloning layer in this package
  - provider-backed or iOS device launches where upstream owns credentials, env, and setup; the wrapper forwards argv and a curated provider-related environment without emulating those backends
+ - desktop Electron targets using top-level `electron` for discover → isolated launch → attach → probe/cleanup, or raw `args: ["connect", …]` when the operator launches the real app with a debug port for signed-in state (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#electron) and [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#electron-desktop-apps))
 
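On the host-owned path (the operator starts the real app with a debug port, then the agent attaches with raw `connect`), someone still has to find the port. Electron honors Chromium's `--remote-debugging-port` switch, and Chromium records the active port in a `DevToolsActivePort` file (port on the first line, browser target path on the second), which matters when the port is `0` and chosen at random. The hypothetical helper below assumes that file appears in the profile directory you pass in (where a given app keeps its profile is app-specific) and that the endpoint listens on `127.0.0.1`; the function name and 30-second default are illustrative.

```bash
#!/usr/bin/env bash
# Hypothetical helper: wait for DevToolsActivePort in PROFILE_DIR, then print
# the browser WebSocket URL. Not part of pi-agent-browser-native.
wait_for_devtools_ws() {
  local profile_dir="$1" timeout_s="${2:-30}" file port path waited=0
  file="$profile_dir/DevToolsActivePort"
  # Poll once per second until the file exists and is non-empty.
  while [[ ! -s "$file" ]]; do
    if (( waited >= timeout_s )); then
      echo "timeout: no $file after ${timeout_s}s" >&2
      return 1
    fi
    sleep 1
    (( waited += 1 ))
  done
  # Line 1 is the TCP port; line 2 is the browser target path.
  { read -r port; read -r path; } <"$file"
  if ! [[ "$port" =~ ^[0-9]+$ && "$path" == /devtools/* ]]; then
    echo "malformed $file" >&2
    return 1
  fi
  echo "ws://127.0.0.1:${port}${path}"
}
```

The port, or the printed URL, is what an agent would supply as `<port-or-url>` in `args: ["connect", "<port-or-url>"]`; check the upstream `connect` documentation for which URL forms it accepts.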
  ## Implications for the implementation
 
@@ -5,6 +5,7 @@ Related docs:
  - [`../AGENTS.md`](../AGENTS.md) (rebaselining and verification stack)
  - [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md)
  - [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md)
+ - [`ELECTRON.md`](ELECTRON.md)
  - [`RELEASE.md`](RELEASE.md)
  - [`REQUIREMENTS.md`](REQUIREMENTS.md)
 
@@ -28,7 +29,7 @@ When upstream ships a new `agent-browser` or the inventory changes:
  - Source of truth: `CAPABILITY_BASELINE.inventorySections` in the same file (stable `id` keys: `skills`, `core-commands`, `state-tabs-frames-dialogs`, `network-storage-artifacts-diagnostics`, `batch-auth-setup-ai`, `options-and-env`).
  - Status: supported for the current wrapper contract.
  - High-priority support gaps: none identified in the baseline audit.
- - Post-`v0.2.29` review state: commits `eb55320` through `86abbfb` add browser guidance/smoke coverage plus `RQ-0086` click-probe reduction, `RQ-0087` same-snapshot form fill batching, `RQ-0088` current-ref fallback on locator misses, `RQ-0089` direct-upstream click mutation investigation, and `RQ-0090` stop-boundary/artifact-path guidance. CueLoop validation was idle/valid on 2026-05-18 after those tasks were marked done. Constrained `job` (`RQ-0064`), the lightweight `qa` preset (`RQ-0065`), the experimental `sourceLookup` helper (`RQ-0066`), and the experimental `networkSourceLookup` helper (`RQ-0067`) are implemented; see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#job), [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#qa), [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#sourcelookup), and [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#networksourcelookup). Reusable browser recipes (`RQ-0068`) are intentionally not adopted as a runtime surface; see [`ARCHITECTURE.md`](ARCHITECTURE.md#no-reusable-recipe-layer-yet).
+ - Post-`v0.2.29` review state: commits `eb55320` through `86abbfb` add browser guidance/smoke coverage plus `RQ-0086` click-probe reduction, `RQ-0087` same-snapshot form fill batching, `RQ-0088` current-ref fallback on locator misses, `RQ-0089` direct-upstream click mutation investigation, and `RQ-0090` stop-boundary/artifact-path guidance. Verification gates below were rerun on 2026-05-18 after those tasks landed. Constrained `job` (`RQ-0064`), the lightweight `qa` preset (`RQ-0065`), the experimental `sourceLookup` helper (`RQ-0066`), and the experimental `networkSourceLookup` helper (`RQ-0067`) are implemented; see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#job), [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#qa), [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#sourcelookup), and [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#networksourcelookup). Reusable browser recipes (`RQ-0068`) are intentionally not adopted as a runtime surface; see [`ARCHITECTURE.md`](ARCHITECTURE.md#no-reusable-recipe-layer-yet).
 
  ## Verification evidence
 
@@ -40,7 +41,7 @@ Re-run the gates below before each release; this table records what the closure
  | Real upstream contract | `npm run verify -- real-upstream` runs the localhost fixture matrix against the real installed `agent-browser` matching the baseline. | Pass on 2026-05-18 (`npm run verify -- real-upstream`). |
  | Packaged Pi smoke | `npm run verify -- package-pi` validates package contents, loads exactly one packaged `agent_browser` tool, and executes fake-upstream `--version`. | Pass on 2026-05-18 (`npm run verify -- package-pi`). |
  | `verify -- release` / `prepublishOnly` | `npm run verify -- release` chains the default gate with packaged Pi smoke (`verifySteps` `release` in [`scripts/project.mjs`](../scripts/project.mjs)). `package.json` `prepublishOnly` runs that compose before `npm pack --dry-run` during `npm publish`. It intentionally omits lifecycle, real-upstream, and benchmark modes—see [`RELEASE.md`](RELEASE.md#pre-release-checks). | Pass on 2026-05-18 (`npm run verify -- release`). `prepublishOnly` still needs a fresh run during actual publish. |
- | Configured-source lifecycle | `npm run verify -- lifecycle` (`scripts/verify-lifecycle.mjs`) drives `/reload`, restart, `/resume`, session continuity, slash-command sentinel tokens (`v1` then `v2` after rewriting the packaged extension to simulate pickup), and persisted spill reachability with a fake upstream on `PATH`. Passthrough flags are defined in `validatePassthrough` in [`scripts/project.mjs`](../scripts/project.mjs): `--keep-artifacts`, `--verbose`, and `--timeout-ms` plus a separate positive integer value (for example `npm run verify -- lifecycle --keep-artifacts --verbose --timeout-ms 600000`). | Pass on 2026-05-18 (`npm run verify -- lifecycle`). Treat any future unexplained red lifecycle gate as a release blocker. |
+ | Configured-source lifecycle | `npm run verify -- lifecycle` (`scripts/verify-lifecycle.mjs`) drives `/reload`, restart, `/resume`, session continuity, slash-command sentinel tokens (`v1` then `v2` after rewriting the packaged extension to simulate pickup), and persisted spill reachability with a fake upstream on `PATH`. Default Pi model is `zai/glm-5.1`; default per-step wait is **180000 ms** (`DEFAULT_TIMEOUT_MS`); override model with `--model <id>` and waits with `--timeout-ms <ms>`. Passthrough flags in [`scripts/project.mjs`](../scripts/project.mjs): `--keep-artifacts` and `--verbose`, plus `--model` and `--timeout-ms`, each immediately followed by its value (for example `npm run verify -- lifecycle --model openai-codex/gpt-5.5:minimal --keep-artifacts --verbose --timeout-ms 600000`). | Pass on 2026-05-18 (`npm run verify -- lifecycle`). Treat any future unexplained red lifecycle gate as a release blocker. |
  | Quick isolated Pi smoke | `pi --no-extensions -e .` from repo root; native `agent_browser` only. | Pass on 2026-05-18 for a fresh interactive tmux smoke: the agent opened `https://example.com`, waited for `Example Domain`, saved `/tmp/piab-isolated-smoke.png` with verified `image/png` artifact metadata, closed the browser session, and reported PASS. Broader historical coverage also includes version/help/skills, open/snapshot/click, eval stdin, batch stdin, screenshot, explicit session, `sessionMode: "fresh"`, network requests, console/errors, diff snapshot, stream status/disable, dashboard start/stop, and chat credential-failure pass-through during RQ-0055. |
 
  ## Baseline checklist by inventory section
@@ -56,9 +57,9 @@ Re-run the gates below before each release; this table records what the closure
 
  ## Follow-up decision after closure
 
- Native `job`, `qa`, experimental `sourceLookup`, and experimental `networkSourceLookup` are shipped.
+ Native `job`, `qa`, experimental `sourceLookup`, experimental `networkSourceLookup`, and first-class Electron lifecycle/probe support are shipped.
 
- `RQ-0066` shipped as the bounded evidence model in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#sourcelookup): it compiles to upstream `batch` steps (`is visible`, `get html`, `react inspect`, `react tree` as applicable), merges `details.sourceLookup` into the tool `details` alongside batch presentation, and never reclassifies an upstream-successful batch to failed solely because no candidates were found (unlike `qa` diagnostic reclassification).
+ `RQ-0066` shipped as the bounded evidence model in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#sourcelookup): it compiles to upstream `batch` steps (`is visible`, `get html`, `react inspect`, `react tree` as applicable), merges `details.sourceLookup` into the tool `details` alongside batch presentation, and never reclassifies an upstream-successful batch to failed solely because no candidates were found (unlike `qa` diagnostic reclassification). Wrapper-tracked packaged Electron no-candidate results now add bounded `workspaceRoot` / `electronContext` when available, limitations stating that the scan covers only the Pi cwd and does not unpack installed app resources or `app.asar`, and live Electron `snapshot` / `probe` / `tab list` next actions. Fake coverage: `agentBrowserExtension explains packaged Electron sourceLookup no-candidate boundaries` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
 
  `RQ-0067` shipped as the failed-request correlation experiment in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#networksourcelookup): it compiles to upstream `batch` steps (`network request …` and/or `network requests --filter …`), merges `details.networkSourceLookup` after scanning batch JSON for failed requests and optional workspace URL literals, redacts query strings and credentials in model-visible surfaces, and never reclassifies an upstream-successful batch to failed solely because no candidates were found.
 
@@ -70,7 +71,9 @@ Native `job`, `qa`, experimental `sourceLookup`, and experimental `networkSource
 
  `RQ-0091` keeps advanced release smoke tests focused on extension behavior instead of external skill routing: the Sauce Demo smoke in [`RELEASE.md`](RELEASE.md#public-sauce-demo-checkout-smoke-prompt) now launches with `--no-skills`, restricts tools to `agent_browser`, and uses bounded release-smoke wording rather than dogfood/exploratory QA language. Runtime guidance remains the concise stop-boundary and exact-artifact-path contract from `extensions/agent-browser/lib/playbook.ts`; no site-specific automation or recipe layer was added. Evidence from the failed high/low local-shop runs showed skill/report drift (`dogfood-output` substitution) and reasoning complexity, not a wrapper command defect, so skill-enabled dogfood remains a separate validation mode. Human workflow: [`RELEASE.md`](RELEASE.md#public-sauce-demo-checkout-smoke-prompt), [`AGENTS.md`](../AGENTS.md#preferred-testing-workflow), and [`REQUIREMENTS.md`](REQUIREMENTS.md#testing-guidance).
 
- `RQ-0068` closed with a no-adopt decision for reusable browser recipes. Current benchmark and repo-local dogfood evidence do not show repeated named job shapes that justify executable recipe state; examples stay in docs and prompt guidance, while the `qa` preset remains the only stable repeated smoke-test shortcut. Revisit recipes only with concrete repeated workflow evidence and a defined owner/versioning/test plan.
+ `RQ-0096` ships first-class Electron desktop-app support without adding a generic recipe runtime: top-level `electron` covers wrapper-owned `list`, isolated `launch` with snapshot/tabs/connect handoff, `status`, `cleanup`, and compact current-session or launch-scoped `probe`; `qa.attached` extends the existing QA preset for attached Electron/CDP sessions without introducing `electron.qa`. `launch.handoff` still defaults to `"snapshot"`, while `handoff: "tabs"` is documented as the safer diagnostic starting point when refs/content capture is not needed yet. Host install discovery (`discoverElectronApps`) is macOS/Linux-only today: on Windows, `electron.list` reports `platform: "unsupported"` with an empty catalog and name/bundle targets cannot resolve from scans, so use `executablePath` (or a host path to the Electron binary) for Windows launch targeting. Discovery adds non-blocking likely-sensitive app annotations plus visible isolated-profile/auth-state warnings; launch output and `details.electron.profileIsolation` state that wrapper launches do not reuse existing signed-in app profiles or attach to already-running authenticated apps, and point agents to the host debug-port launch plus raw `connect` path when signed-in local app state is the goal; launch timeout failures include PID/profile/DevToolsActivePort/timing diagnostics; status/probe add launch/session identifiers, liveness, mismatch/reattach next actions, and dead-launch context for `about:blank`; post-mutation Electron death is upgraded to `tab-drift` with `details.electronPostCommandHealth`; Electron fills can add `details.fillVerification`; Electron `@e…` mutations can add same-URL ref freshness guidance; broad Electron `get text` selectors add scope warnings; cleanup ownership is bounded to wrapper-created launch records and temp profiles; externally launched debug ports stay on the manual `args: ["connect", "<port-or-url>"]` path and remain host-owned. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#electron) plus [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#qa) for `qa.attached`; human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#electron-desktop-apps) and README common calls; implementation: `extensions/agent-browser/index.ts` and `extensions/agent-browser/lib/electron/`; deterministic efficiency evidence: `electron-lifecycle` and `electron-probe` in `scripts/agent-browser-efficiency-benchmark.mjs`; fake coverage includes Electron schema/probe/mismatch/post-command-health/fill-verification/broad-text/discovery-sensitivity and packaged-sourceLookup cases in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts). For Electron specifically, the `RQ-0068` revisit evidence is [`docs/plans/electron-extension-2026-05-20.md`](plans/electron-extension-2026-05-20.md), which documents repeated failure-prone discover/launch/attach/cleanup and multi-call state-probe sequences, plus bounded owner/versioning/test/docs artifacts.
+
+ `RQ-0068` remains closed with a no-adopt decision for a reusable named browser recipe runtime. The Electron evidence above justified a narrow typed shorthand and compact probe, not an open-ended recipe layer; future reusable recipes still require concrete repeated workflow evidence and a defined owner/versioning/test plan.
 
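The `RQ-0096` cleanup ownership boundary (only wrapper-created launch records and temp profiles are cleaned up; externally launched debug ports remain host-owned) can be pictured as a guard that runs before any profile deletion. This is a hypothetical bash sketch: the argument layout, the temp-root prefix check, and the function name are assumptions and do not describe the wrapper's actual record store.

```bash
#!/usr/bin/env bash
# Hypothetical ownership guard, not the wrapper's implementation.
# may_cleanup_profile LAUNCH_ID PROFILE_DIR TEMP_ROOT RECORDED_ID...
# Succeeds only for a recorded (wrapper-created) launch whose profile
# lives under TEMP_ROOT; everything else is treated as host-owned.
may_cleanup_profile() {
  local launch_id="$1" profile_dir="$2" temp_root="$3" id
  shift 3
  # Require an absolute path without ".." so the prefix check cannot be escaped.
  [[ "$profile_dir" == /* && "$profile_dir" != *..* ]] || return 1
  [[ "$profile_dir" == "${temp_root%/}/"* ]] || return 1
  for id in "$@"; do
    [[ "$id" == "$launch_id" ]] && return 0
  done
  return 1  # unknown launch id: leave it alone
}
```

Keeping the decision to "recorded id AND wrapper temp prefix" means a signed-in profile under the user's home directory fails the guard even if a caller passes a matching `launchId`.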
  `RQ-0070` adds bounded locator fallbacks when a compiled top-level `semanticAction` fails with `failureCategory: "selector-not-found"`: `extensions/agent-browser/index.ts` appends `try-*-candidate` entries to `details.nextActions` (and an `Agent-browser candidate fallbacks` block in visible text) only for `fill`+`placeholder`, `click`+`text`, or `fill`+`label`. Other locator/action pairs omit this block; `semanticAction` `select` now uses explicit `selector` plus `value`/`values` and compiles to upstream `select`, not to unverified `find … select`. Active-session role/name click/check/uncheck shorthands also get a pre-execution visible-ref resolution pass via one fresh `snapshot -i`, so hidden duplicate upstream `find` matches do not steal the action; the original target remains in `details.compiledSemanticAction` and the executed ref appears in `details.effectiveArgs`. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction); fake coverage: `agentBrowserExtension returns semantic locator candidates when semanticAction misses` and `agentBrowserExtension resolves semantic role clicks through current visible snapshot refs when available` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
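The `RQ-0070` fallback scope amounts to a three-entry lookup that applies only after `selector-not-found`. A hypothetical bash predicate for that rule follows; the name and argument order are illustrative, and the actual logic lives in `extensions/agent-browser/index.ts`.

```bash
#!/usr/bin/env bash
# Hypothetical predicate: does a failed compiled semanticAction get
# try-*-candidate next actions? Args: ACTION LOCATOR FAILURE_CATEGORY
has_candidate_fallback() {
  local action="$1" locator="$2" failure="$3"
  [[ "$failure" == "selector-not-found" ]] || return 1
  case "$action+$locator" in
    fill+placeholder|click+text|fill+label) return 0 ;;
    *) return 1 ;;  # every other action/locator pair omits the block
  esac
}
```

So `has_candidate_fallback click text selector-not-found` succeeds, while a `click` with a `role` locator gets no `try-*-candidate` block.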