npm - pi-agent-browser-native - Versions diffs - 0.2.31 → 0.2.33 - Mend

pi-agent-browser-native 0.2.31 → 0.2.33

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (66)

package/docs/ELECTRON.md ADDED

@@ -0,0 +1,387 @@
+# Electron desktop apps
+Related docs:
+- [`../README.md`](../README.md)
+- [`../AGENTS.md`](../AGENTS.md) — maintainer verification (`npm run verify`, lifecycle), Pi `tmux` smoke expectations, and upstream rebaselining
+- [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md) — full `electron` and `qa.attached` field contracts
+- [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) — workflow snippets in the broader native command surface
+- [`ARCHITECTURE.md`](ARCHITECTURE.md) — wrapper design and the closed `RQ-0068` recipe-layer decision
+- [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md) — `RQ-0096` Electron support row and verification gates
+## Purpose
+This guide is the entry point for using `pi-agent-browser-native` against desktop **Electron** applications. The wrapper exposes a top-level `electron` shorthand that owns the awkward discover → launch → attach → probe → cleanup sequence. Agents therefore do not have to hand-build `--remote-debugging-port` argv, poll `DevToolsActivePort`, or kill processes and delete profile directories themselves. After attach, the rest of the native `agent_browser` surface (`snapshot`, `find`, `click`, `fill`, `get`, `eval --stdin`, `batch`, `qa.attached`, and similar) works the same way it does against a web page.
+This document is structured for users, not implementers. Field-level rules live in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#electron); this guide focuses on **when** and **how** to use them, and on the safety and ownership boundary the wrapper enforces.
+## Who this is for
+- **Pi users** who want an agent to operate a local Electron app the same way it operates a web page.
+- **Coding agents** that need a low-context lifecycle for desktop apps such as VS Code, Cursor, Obsidian, Slack, or any app built on Electron, without re-implementing the CDP attach dance every session.
+- **Maintainers and reviewers** validating the wrapper's Electron behavior before release; verification evidence lives under `RQ-0096` in [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md).
+It is **not** an upstream `agent-browser` reference and it does **not** replace the canonical [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#electron) for exact field semantics, validation rules, or failure categories.
+## Mental model
+```
+electron.list       → discover Electron apps (host-only; no upstream spawn)
+electron.launch     → launch a wrapper-owned isolated app, attach via CDP, hand off (snapshot|tabs|connect)
+electron.status     → liveness, debug-port, and target inspection (read-only)
+electron.probe      → compact one-call state read (title/url/focus/tabs/snapshot)
+electron.cleanup    → close managed session, stop the tracked process, remove the temp profile
+qa.attached         → smoke check against the currently attached session (no URL)
+```
+Two ownership modes coexist:
+1. **Wrapper-owned launches** — `electron.launch` starts a brand-new app process with an **isolated temporary user-data-dir** and an **OS-chosen debug port**. The wrapper records a `launchId` for every such launch and `electron.cleanup` only operates on those `launchId`s.
+2. **Manually launched apps** — you start the Electron app yourself (for example with `open -a Slack --args --remote-debugging-port=9222 --remote-allow-origins='*'`), then attach with `{ "args": ["connect", "9222"], "sessionMode": "fresh" }`. The wrapper does not own that process; **you** are responsible for shutting it down and cleaning its profile.
+Choosing between the two is a real decision, not a stylistic one. See [Wrapper-owned vs manually launched](#wrapper-owned-vs-manually-launched).
+## Quick start
+Discover the app, launch with the default snapshot handoff, work with current refs, then clean up:
+```json
+{ "electron": { "action": "list", "query": "code" } }
+{ "electron": { "action": "launch", "appName": "Visual Studio Code", "handoff": "snapshot" } }
+{ "args": ["snapshot", "-i"] }
+{ "electron": { "action": "probe", "timeoutMs": 5000 } }
+{ "electron": { "action": "cleanup", "launchId": "electron-…" } }
+```
+The launch result carries both a `launchId` (used by `status`/`probe`/`cleanup`) and an attached `sessionName` (used by browser-style `snapshot`/`tab`/`click`/`find` calls). Read both from `details.electron.launch` and `details.electron.identifiers`. With default implicit session reuse, the quick-start `args: ["snapshot", "-i"]` line uses that attached session without an extra `--session` argument; pass `--session` explicitly when you target a named upstream session instead.
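The following is a minimal TypeScript sketch of reading both identifiers from a launch result. It assumes each identifier is a plain string under one of the two `details.electron.*` objects named above. The exact nesting is defined in `TOOL_CONTRACT.md`, and `readLaunchIdentifiers` and `cleanupCallFor` are hypothetical helpers, not wrapper exports.

```typescript
// Hypothetical reader for an agent_browser result's `details` object.
// Assumption: launchId / sessionName are strings under details.electron.launch
// or details.electron.identifiers; TOOL_CONTRACT.md has the real shape.
type UnknownRecord = Record<string, unknown>;

interface LaunchIdentifiers {
  launchId?: string; // for electron.status / probe / cleanup
  sessionName?: string; // for browser-style snapshot / tab / click / find
}

function asRecord(value: unknown): UnknownRecord | undefined {
  return typeof value === "object" && value !== null ? (value as UnknownRecord) : undefined;
}

function firstString(...candidates: unknown[]): string | undefined {
  return candidates.find((c): c is string => typeof c === "string" && c.length > 0);
}

function readLaunchIdentifiers(details: unknown): LaunchIdentifiers {
  const electron = asRecord(asRecord(details)?.electron);
  const launch = asRecord(electron?.launch);
  const identifiers = asRecord(electron?.identifiers);
  return {
    launchId: firstString(launch?.launchId, identifiers?.launchId),
    sessionName: firstString(launch?.sessionName, identifiers?.sessionName),
  };
}

// Build the end-of-task cleanup call; fail loudly rather than guess an id.
function cleanupCallFor(ids: LaunchIdentifiers) {
  if (!ids.launchId) throw new Error("launch result carried no launchId; check electron.status");
  return { electron: { action: "cleanup" as const, launchId: ids.launchId } };
}
```

Keeping the identifiers separate mirrors the guide: `launchId` drives the `electron.*` lifecycle actions, while `sessionName` only matters when you pass `--session` explicitly.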
+For a quick "is the app actually showing what we expect?" smoke check after attach:
+```json
+{ "qa": { "attached": true, "expectedText": "Explorer", "screenshotPath": ".dogfood/electron.png" } }
+```
+`qa.attached` runs against the **current managed session** without opening a URL, so it works for any attached app — wrapper-owned or manually launched.
+## Wrapper-owned vs manually launched
+Pick the mode that matches the **state you need**.
+| | `electron.launch` (wrapper-owned) | `args: ["connect", …]` (manual host launch) |
+|---|---|---|
+| Profile | Isolated temporary `userDataDir` | The app's normal profile (your real signed-in state) |
+| Debug port | OS-chosen via `--remote-debugging-port=0` and `DevToolsActivePort` | Caller-supplied port (for example `9222`) |
+| Signed-in state | **No** — first-run or empty profile | **Yes** — whatever is in the launched profile |
+| Already-running app | Cannot attach to it | Required (or relaunch yourself with a debug port) |
+| Lifecycle ownership | Wrapper owns shutdown and profile cleanup | **You** own shutdown and profile cleanup |
+| When to use | Anything you can do against a fresh app: tooling, UX flows, scripted local QA, exploring panels, packaged debugging | Tasks that explicitly need the user's signed-in Slack/Obsidian/VS Code state |
+| How to clean up | `electron.cleanup` with the returned `launchId` | Close the app yourself; do **not** call `electron.cleanup` |
+### Manual host-launch pattern
+When the explicit goal is the user's signed-in local app state and the app is not already running:
+```bash
+# macOS example
+open -a Slack --args --remote-debugging-port=9222 --remote-allow-origins='*'
+```
+Then attach and choose a ready target before using refs:
+```json
+{ "args": ["connect", "9222"], "sessionMode": "fresh" }
+{ "args": ["tab", "list"] }
+{ "args": ["tab", "t2"] }
+{ "args": ["snapshot", "-i"] }
+{ "qa": { "attached": true, "expectedText": "Channels" } }
+```
+A successful `connect` means the CDP endpoint accepted the session; it does **not** prove the app has an active rendered page yet. Prefer `details.nextActions` when present: `list-connected-session-tabs` inspects the attached session targets. After that read-only list, select or confirm a stable `t<N>` target and run `snapshot -i` explicitly before trusting refs. If the first `snapshot -i` says `No active page`, follow `list-tabs-after-no-active-page`. If it returns no useful refs without that error, manually run `tab list`, select a stable `t<N>` id for the app surface, then retry a condition wait or `snapshot -i` on that selected target.
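+If the app is already running without a debug port, ask before relaunching it. Relaunching means quitting the running copy first, which may lose unsaved state. Starting a second copy alongside it usually does not work, because many Electron apps enforce a single instance and silently drop the second invocation's `--remote-debugging-port` flag.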
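The target-selection step after a manual `connect` can be sketched as follows. This assumes you have already mapped `tab list` output into the hypothetical `TabRow` shape, because the real upstream output format may differ. `chooseAppTab` skips blank placeholders and non-stable ids, and returns the argv for the follow-up `tab` call.

```typescript
// Hypothetical, pre-parsed row from `tab list`; map the real output into this shape.
interface TabRow {
  id: string; // stable handle such as "t2"
  type: string; // CDP target type, e.g. "page" or "webview"
  url: string;
}

// Pick the first stable t<N> target of the wanted type that is not a blank
// placeholder. Returns argv for `{ "args": [...] }`, or undefined if none fits.
function chooseAppTab(
  tabs: TabRow[],
  targetType: "page" | "webview" | "any" = "page",
): string[] | undefined {
  const stableId = /^t\d+$/;
  const candidate = tabs.find(
    (tab) =>
      stableId.test(tab.id) &&
      (targetType === "any" || tab.type === targetType) &&
      tab.url !== "" &&
      tab.url !== "about:blank",
  );
  return candidate ? ["tab", candidate.id] : undefined;
}
```

After the selected `tab` call succeeds, still run `snapshot -i` on that target before trusting any `@e…` refs. If the function returns `undefined`, re-list targets rather than guessing an id.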
+## Readiness and waits
+Use this ladder for desktop-host readiness instead of blind sleep loops:
+1. Prefer a real condition when one exists: `wait --text`, `wait --url`, `wait --fn`, `wait --load <state>`, or `wait --download`.
+2. After raw `connect`, inspect targets with `tab list`, select the stable `tab t<N>` app surface, then use a condition wait or `snapshot -i` on that selected surface.
+3. After wrapper-owned `electron.launch`, use `electron.probe` or `electron.status` when launch health, debug-port liveness, or target mismatch matters.
+4. Use `qa.attached` when the readiness check can be expressed as expected text or selector plus diagnostics against the current managed session.
+5. Use fixed waits only as a last resort, and keep each fixed wait below the wrapper IPC budget. `wait 30000` is intentionally blocked; use `25000` ms or less per call when a fixed wait is unavoidable.
+6. Treat a fixed-wait payload such as `"waited":"timeout"` as elapsed time, not proof that the host finished. Verify with an observed condition, fresh `snapshot -i`, or screenshot before continuing.
+This project is not adding a first-class host-idle primitive yet. Revisit that only if repeated desktop smokes show that condition waits, `qa.attached`, `electron.probe`, snapshots, and screenshots cannot cover the workflow.
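The ladder can be read as a small decision function. The sketch below is a hypothetical TypeScript helper that assumes `wait --text` takes the expected text as the next argv element. It returns the next call for steps 1 to 3 and clamps a last-resort fixed wait to the 25000 ms per-call limit.

```typescript
type ReadinessCall =
  | { args: string[] }
  | { electron: { action: "probe"; launchId?: string } };

interface ReadinessContext {
  attachedVia: "electron.launch" | "connect";
  expectedText?: string; // visible text that proves readiness, if known
  launchId?: string; // only for wrapper-owned launches
}

const MAX_FIXED_WAIT_MS = 25_000; // `wait 30000` is blocked by the wrapper

function nextReadinessCall(ctx: ReadinessContext): ReadinessCall {
  // Step 1: a real condition beats everything else.
  if (ctx.expectedText) return { args: ["wait", "--text", ctx.expectedText] };
  // Step 2: after raw connect, inspect targets before trusting refs.
  if (ctx.attachedVia === "connect") return { args: ["tab", "list"] };
  // Step 3: after electron.launch, one probe covers health and drift.
  return { electron: { action: "probe", launchId: ctx.launchId } };
}

// Step 5: last resort only. Clamp to the per-call cap; verify afterwards,
// since an elapsed wait is not proof the app is ready.
function fixedWaitCall(ms: number): ReadinessCall {
  const clamped = Math.max(0, Math.min(Math.round(ms), MAX_FIXED_WAIT_MS));
  return { args: ["wait", String(clamped)] };
}
```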
+## Action reference
+The exact field schemas, validation rules, and `details.*` payload shapes live in [`TOOL_CONTRACT.md#electron`](TOOL_CONTRACT.md#electron). This section is a usage-oriented overview.
+### `electron.list` — discover apps
+Host-only scan; it does not spawn upstream `agent-browser`. In v1, macOS (`/Applications/*.app`, `~/Applications/*.app`) and Linux (`.desktop` launchers under standard XDG, Flatpak, and Snap locations) are supported. On Windows and any other non-macOS/non-Linux host, `list` returns `details.electron.platform: "unsupported"` with an empty `apps` array. On those hosts, pass `executablePath` (or a host `appPath` that resolves to a verifiable Electron binary) to `launch` instead. `inspectElectronExecutablePath` in `extensions/agent-browser/lib/electron/discovery.ts` still gates Windows executables before spawn.
+```json
+{ "electron": { "action": "list", "query": "code", "maxResults": 25 } }
+```
+Returns app metadata under `details.electron.apps`: `name`, optional `bundleId`/`desktopId`, `appPath`, `executablePath`, `platform`, and optional non-blocking `sensitivity` annotations. Apps flagged as likely sensitive (categories such as `notes`, `chat`, `mail`, `developer-workspace`, or `passwords-auth`) are printed with `[likely sensitive: …]`. These are **advisory hints**, not enforcement; see [Safety and ownership](#safety-and-ownership) for the policy boundary.
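On hosts where discovery is unavailable, a caller has to switch from `appName` to an explicit binary. The sketch below is a hypothetical TypeScript helper keyed on Node's `process.platform` strings (`"darwin"`, `"linux"`, `"win32"`, …). It is not wrapper code. Preferring an explicit `executablePath` whenever one is given is a design choice of the sketch.

```typescript
// electron.list scans only macOS and Linux in v1 (Node platform ids shown).
function canDiscoverElectronApps(nodePlatform: string): boolean {
  return nodePlatform === "darwin" || nodePlatform === "linux";
}

type ElectronLaunchTarget = { appName: string } | { executablePath: string };

function chooseLaunchTarget(
  nodePlatform: string,
  appName: string,
  executablePath?: string,
): ElectronLaunchTarget {
  // An explicit binary works on every host and skips discovery entirely.
  if (executablePath) return { executablePath };
  if (canDiscoverElectronApps(nodePlatform)) return { appName };
  throw new Error(
    `electron.list returns no apps on "${nodePlatform}"; pass executablePath to electron.launch`,
  );
}
```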
+### `electron.launch` — launch and attach
+Pass **exactly one** target: `appPath`, `appName`, `bundleId`, or `executablePath`. The wrapper resolves the target, verifies Electron framework evidence, applies optional caller-owned `allow` / `deny` policy, creates an isolated temp `userDataDir`, launches with `--remote-debugging-port=0` plus safe defaults, reads `DevToolsActivePort`, then attaches through upstream `connect` as a fresh managed session.
+```json
+{
+  "electron": {
+    "action": "launch",
+    "appName": "Visual Studio Code",
+    "handoff": "snapshot",
+    "targetType": "page",
+    "timeoutMs": 30000,
+    "appArgs": ["--disable-telemetry"]
+  }
+}
+```
+Handoff selection (`handoff` field):
+| Value | Behavior | When to use |
+|---|---|---|
+| `"snapshot"` (default) | Attach, list targets, capture `snapshot -i` in one call | You need interactive refs immediately for clicks/fills |
+| `"tabs"` | Attach and list targets only | Safer diagnostic start when you only need target discovery |
+| `"connect"` | Attach and stop | You will run your own follow-up commands |
+`targetType` defaults to `"page"`; use `"webview"` or `"any"` for apps whose useful UI is exposed as a webview target.
+Optional `timeoutMs` on `electron.launch` bounds host-side CDP readiness (waiting for `DevToolsActivePort` and attach). When omitted, the default is **15 seconds** with a hard maximum of **120 seconds**, matching `ELECTRON_LAUNCH_DEFAULT_TIMEOUT_MS` and `ELECTRON_LAUNCH_MAX_TIMEOUT_MS` in `extensions/agent-browser/lib/electron/launch.ts`.
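The default/cap arithmetic can be sketched as follows. The numbers come from this guide, but the behavior around them is assumed. This hypothetical helper clamps values above the cap and rejects non-positive values. The real `normalizeTimeoutMs` in `launch.ts` may instead reject out-of-range input as a `validation-error`.

```typescript
const LAUNCH_DEFAULT_TIMEOUT_MS = 15_000; // documented default
const LAUNCH_MAX_TIMEOUT_MS = 120_000; // documented hard maximum

// Hypothetical: clamp above the cap, reject nonsense, default when omitted.
function effectiveLaunchTimeoutMs(requested?: number): number {
  if (requested === undefined) return LAUNCH_DEFAULT_TIMEOUT_MS;
  if (!Number.isFinite(requested) || requested <= 0) {
    throw new RangeError(`timeoutMs must be a positive number, got ${requested}`);
  }
  return Math.min(Math.round(requested), LAUNCH_MAX_TIMEOUT_MS);
}
```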
+Wrapper-owned launches **always** use an isolated temp profile and an OS-chosen port. `--user-data-dir`, `--remote-debugging-port`, `--remote-debugging-address`, `--remote-debugging-pipe`, and bare `--` in `appArgs` are rejected. There is no caller-supplied port and no way to make `electron.launch` reuse the app's normal signed-in profile or attach to an already-running app — by design. Use the manual path described above when those are the actual requirements.
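The following is a minimal sketch of the `appArgs` screen described above, useful for catching a rejected request before sending it. The flag list is taken from this paragraph. Matching both `--flag=value` and a bare `--flag` is an assumption about how the wrapper inspects argv.

```typescript
// Lifecycle/debug flags the guide says electron.launch rejects in appArgs.
const RESERVED_LAUNCH_FLAGS = [
  "--user-data-dir",
  "--remote-debugging-port",
  "--remote-debugging-address",
  "--remote-debugging-pipe",
];

// Returns the offending entries (empty array means the appArgs look acceptable).
function rejectedAppArgs(appArgs: string[]): string[] {
  return appArgs.filter(
    (arg) =>
      arg === "--" || // bare end-of-options separator
      RESERVED_LAUNCH_FLAGS.some((flag) => arg === flag || arg.startsWith(`${flag}=`)),
  );
}
```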
+### `electron.status` — liveness and targets
+Read-only inspection of one or more tracked launches. Without `launchId` or `all`, it selects the single active wrapper launch when unambiguous.
+```json
+{ "electron": { "action": "status" } }
+{ "electron": { "action": "status", "launchId": "electron-…" } }
+{ "electron": { "action": "status", "all": true } }
+```
+Reports `cleanupState`, debug-port and PID liveness, and bounded CDP target metadata under `details.electron.statuses`. Mismatch fields surface when the current managed session or tab no longer matches a live wrapper launch target — typically the cue to follow `reattach-electron-launch` before trusting old refs.
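The selection rule ("the single active wrapper launch when unambiguous") can be sketched as follows. `TrackedLaunch` and its `active` flag are hypothetical. The sketch also assumes `all: true` returns every tracked record, including stale ones, and that an unknown `launchId` is an error.

```typescript
interface TrackedLaunch {
  launchId: string;
  active: boolean; // hypothetical: launched and not yet cleaned up
}

interface StatusSelector {
  launchId?: string;
  all?: boolean;
}

function selectLaunchesForStatus(launches: TrackedLaunch[], selector: StatusSelector): TrackedLaunch[] {
  if (selector.all) return launches;
  if (selector.launchId) {
    const match = launches.find((l) => l.launchId === selector.launchId);
    if (!match) throw new Error(`unknown launchId ${selector.launchId}`);
    return [match];
  }
  // No selector: only proceed when exactly one launch is active.
  const active = launches.filter((l) => l.active);
  if (active.length !== 1) {
    throw new Error(`${active.length} active launches; pass launchId or all: true`);
  }
  return active;
}
```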
+### `electron.probe` — compact state read
+`probe` collapses what would otherwise be separate `get title` / `get url` / focused-element `eval` / `tab list` / `snapshot -i` calls into one bounded result. Use it instead of chaining those reads when you just need a quick "where are we?" check.
+```json
+{ "electron": { "action": "probe" } }
+{ "electron": { "action": "probe", "launchId": "electron-…", "timeoutMs": 5000 } }
+```
+Output appears under `details.electron.probe`: `title`, `url`, `focusedElement`, `activeTab`, `tabs`, compact `snapshot` metadata (`refCount`, `refIds`, optional text preview and omission counts), and `errors`. When `launchId` is given, the probe is tied to that tracked launch and will surface mismatch guidance if the wrapper sees a session or target drift; visible output also includes debug-port/pid liveness so a stale `about:blank` against a dead launch is unmistakable.
+`timeoutMs` bounds each underlying read subprocess. Use it for dense desktop apps when the default budget is too short, or to fail fast when you suspect the app process is wedged.
+### `electron.cleanup` — wrapper-owned only
+Closes the tracked managed session, stops only the wrapper-tracked process, verifies that the debug port no longer serves `/json/version`, and removes the wrapper-created `userDataDir`. If cleanup only partially succeeds, the tool result fails with `failureCategory: "cleanup-failed"`, and the `retry-electron-cleanup` next action references the same `launchId` so retries stay bounded.
+```json
+{ "electron": { "action": "cleanup", "launchId": "electron-…" } }
+{ "electron": { "action": "cleanup", "all": true } }
+```
+`electron.cleanup` **never** targets:
+- manually launched apps
+- externally supplied debug ports
+- arbitrary Electron processes the wrapper did not start
+- explicit screenshots, downloads, PDFs, traces, HAR files, or recordings saved to caller-chosen paths
+For manual launches, `close` only closes the browser/CDP session. Close the app yourself and clean its profile/temp files with normal host tools.
+On Pi session shutdown, active wrapper-owned Electron launches are best-effort cleaned. Stale restored records (PID gone, port dead) are **reported** instead of guessed at or killed.
+### `timeoutMs` by action (quick reference)
+`electron.list` does not take `timeoutMs` (host scan only). For every other action, `timeoutMs` applies to **different surfaces**; treat values as per-call budgets, not one global knob. Authoritative rules and env overrides live under **Validation and defaults** in [`TOOL_CONTRACT.md#electron`](TOOL_CONTRACT.md#electron).
+| Action | What `timeoutMs` covers when set | Typical default when omitted |
+| --- | --- | --- |
+| `launch` | Host-side wait for `DevToolsActivePort` and CDP readiness | **15 s**, hard-capped at **120 s** (`normalizeTimeoutMs` in `extensions/agent-browser/lib/electron/launch.ts`) |
+| `status` | Optional managed-session `get title` / `get url` reads used for mismatch diagnostics | Normal tool subprocess budget from `runAgentBrowserProcess` / `AGENT_BROWSER_DEFAULT_TIMEOUT`; localhost CDP HTTP probes keep a short fixed budget (`ELECTRON_STATUS_FETCH_TIMEOUT_MS` in `extensions/agent-browser/lib/electron/cleanup.ts`) |
+| `cleanup` | One combined budget for managed-session `close`, tracked process exit, debug-port verification, and temp profile removal | `PI_AGENT_BROWSER_IMPLICIT_SESSION_CLOSE_TIMEOUT_MS` when set, else **5000 ms** (`getImplicitSessionCloseTimeoutMs` in `extensions/agent-browser/lib/runtime.ts`, passed through `cleanupTrackedElectronLaunches` in `extensions/agent-browser/index.ts`) |
+| `probe` | **Each** upstream read in the probe chain (`get title`, `get url`, focused `eval --stdin`, `tab list`, `snapshot -i`) | Same default as other tool calls (typically **28 s** per subprocess unless `AGENT_BROWSER_DEFAULT_TIMEOUT` / `PI_AGENT_BROWSER_PROCESS_TIMEOUT_MS` overrides `runAgentBrowserProcess` in `extensions/agent-browser/lib/process.ts`) |
+## `qa.attached` — current-session smoke check
+`qa` has two forms: the URL form (`qa: { url, … }`) and the attached form (`qa: { attached: true, … }`). The attached form is the right tool for Electron smoke checks after either launch path because it does not open a URL and runs all checks against the current managed session.
+```json
+{
+  "qa": {
+    "attached": true,
+    "expectedText": "Explorer",
+    "expectedSelector": "@e1",
+    "checkConsole": true,
+    "checkErrors": true,
+    "screenshotPath": ".dogfood/electron.png"
+  }
+}
+```
+`qa.attached` rejects `url` and is incompatible with `sessionMode: "fresh"` — attach first with `electron.launch` or raw `connect`, then run `qa.attached`. The full field rules and pass/fail classification live in [`TOOL_CONTRACT.md#qa`](TOOL_CONTRACT.md#qa).
+In attached Electron sessions, broad selectors such as `body`, `html`, `main`, or `[role=application]` can read the entire app shell. When `get text <selector>` looks too broad, the wrapper may attach `details.electronGetTextScopeWarning` and a `snapshot-for-electron-text-scope` next action; prefer a fresh `snapshot -i`, a current `@ref`, or a narrower panel selector.
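The following is a minimal sketch of a client-side check for the broad selectors listed above, so an agent can take a snapshot and use a `@ref` before calling `get text`. The selector set comes from this paragraph. The normalization (trim, lowercase, drop whitespace and quotes) is an assumption, and the wrapper's own heuristic may be broader.

```typescript
// Broad shell selectors named in the guide, in normalized form.
const BROAD_ELECTRON_SELECTORS = new Set(["body", "html", "main", "[role=application]"]);

function isBroadElectronTextSelector(selector: string): boolean {
  const normalized = selector
    .trim()
    .toLowerCase()
    .replace(/\s+/g, "") // "[role = application]" -> "[role=application]"
    .replace(/["']/g, ""); // '[role="application"]' -> "[role=application]"
  return BROAD_ELECTRON_SELECTORS.has(normalized);
}
```

When the check fires, run `snapshot -i` first and call `get text` with a current `@e…` ref or a narrower panel selector instead.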
+## `sourceLookup` against packaged Electron apps
+`sourceLookup` is an experiment for hinting at the source file/component behind a visible element. It is **opt-in** and **evidence-based**: it reports confidence and evidence rather than claiming a guaranteed mapping. The same experimental helper works against packaged Electron apps, but with two important boundaries:
+1. **Scope of the workspace scan.** `sourceLookup` walks the Pi session **cwd** (default `maxWorkspaceFiles: 2000`, hard cap 5000). It does **not** unpack `app.asar` or installed app resources. For packaged apps where the source lives inside `Contents/Resources/app.asar`, the workspace-search lane will commonly return no candidates.
+2. **React DevTools requirement.** `react inspect <id>` requires the session to have been launched with `--enable react-devtools` before first navigation. For Electron, the wrapper's `electron.launch` path does **not** inject `--enable react-devtools` into the Electron process; that flag belongs to upstream `agent-browser` Chromium launches. If the Electron app does not already expose a React DevTools backend, expect `react inspect` to fail; DOM-attribute and workspace-search candidates may still surface.
+For wrapper-tracked packaged Electron sessions where `status` is `no-candidates`, the wrapper attaches `workspaceRoot` plus optional `electronContext` (`launchId?`, `appName?`, `appPath?`, `executablePath?`, `sessionName?`, `url?`) and limitations explaining the bundle/asar boundary, plus `snapshot-electron-session`, `probe-electron-launch`, and `list-electron-tabs` next actions so you can inspect the live app and decide whether to widen the workspace or pull source out-of-band before re-running the lookup.
+```json
+{ "sourceLookup": { "selector": "#save", "reactFiberId": "2", "componentName": "SaveButton" } }
+```
+Treat `sourceLookup` output as a starting point for navigation, not a substitute for reading code. Full contract: [`TOOL_CONTRACT.md#sourcelookup`](TOOL_CONTRACT.md#sourcelookup).
+## Safety and ownership
+Remote debugging exposes app content (DOM, network, JavaScript) to the attached browser tool. The wrapper ships **isolation defaults**; it does **not** classify any app as too-risky-to-launch.
+### What the wrapper always does
+- Launches with `--user-data-dir=<wrapper-created-temp>` and `--remote-debugging-port=0`.
+- Reads the OS-chosen port from `DevToolsActivePort`.
+- Adds `--disable-extensions`, `--no-first-run`, and `--no-default-browser-check` alongside sanitized caller `appArgs`.
+- Rejects `appArgs` that try to override lifecycle/debug flags.
+- Refuses to launch non-Electron targets (correctness gate, not a security gate).
+- Treats `electron.cleanup` as wrapper-owned only; never touches manually launched apps.
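The first three bullets can be sketched in TypeScript. Chromium writes a `DevToolsActivePort` file into the user-data-dir once the debugger is listening: the first line holds the chosen port and the second the browser target path. Both helpers below are hypothetical. The argv ordering is an assumption, and a real caller would poll the file until parsing succeeds or the launch timeout expires.

```typescript
// Isolation defaults listed above; the ordering is an assumption of this sketch.
function buildIsolatedLaunchArgs(userDataDir: string, sanitizedAppArgs: string[]): string[] {
  return [
    `--user-data-dir=${userDataDir}`,
    "--remote-debugging-port=0", // ask the OS for a free port
    "--disable-extensions",
    "--no-first-run",
    "--no-default-browser-check",
    ...sanitizedAppArgs,
  ];
}

// Parse DevToolsActivePort contents: line 1 = port, line 2 = browser target path.
function parseDevToolsActivePort(contents: string): { port: number; browserPath?: string } | undefined {
  const [portLine = "", pathLine = ""] = contents.split(/\r?\n/);
  const portText = portLine.trim();
  if (!/^\d+$/.test(portText)) return undefined; // missing or half-written file
  const port = Number(portText);
  if (port < 1 || port > 65535) return undefined; // outside the usable TCP port range
  const browserPath = pathLine.trim();
  return browserPath ? { port, browserPath } : { port };
}
```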
+### What the **caller** owns
+- The decision to launch or attach to a sensitive app in the first place.
+- Optional `allow` / `deny` policy lists when you want guardrails.
+- Profile and process cleanup for manually launched apps.
+- Host-file cleanup for any explicit screenshots, downloads, HARs, traces, or recordings saved to caller-chosen paths. `electron.cleanup` does not touch these.
+### Caller-owned policy: `allow` / `deny`
+Both lists match `appName`, `bundleId`, `desktopId`, `appPath`, or `executablePath` by substring.
+```json
+{
+  "electron": {
+    "action": "launch",
+    "appName": "Slack",
+    "allow": ["Slack"],
+    "deny": ["1Password", "Bitwarden"]
+  }
+}
+```
+Rules:
+- If `allow` is set, the target must match at least one entry.
+- If `deny` is set, a matching target is rejected.
+- `deny` wins on conflict.
+- With neither set, launch is permitted.
+Policy mismatches fail with `failureCategory: "policy-blocked"` and `details.electron.failure.policy` names the matched list and entry.
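The four rules above can be expressed as a small pure function. The following is a sketch under stated assumptions: matching is case-sensitive `String.prototype.includes`, and an empty list counts as "not set". The wrapper's exact matching may differ, and `evaluateLaunchPolicy` is hypothetical.

```typescript
interface PolicyTarget {
  appName?: string;
  bundleId?: string;
  desktopId?: string;
  appPath?: string;
  executablePath?: string;
}

type PolicyDecision =
  | { allowed: true }
  | { allowed: false; list: "allow" | "deny"; entry?: string };

// First list entry that is a substring of any identifying field, if any.
function firstPolicyMatch(target: PolicyTarget, entries: string[]): string | undefined {
  const fields = [target.appName, target.bundleId, target.desktopId, target.appPath, target.executablePath]
    .filter((v): v is string => typeof v === "string");
  return entries.find((entry) => fields.some((field) => field.includes(entry)));
}

function evaluateLaunchPolicy(target: PolicyTarget, allow?: string[], deny?: string[]): PolicyDecision {
  // deny wins on conflict, so it is checked first.
  if (deny && deny.length > 0) {
    const hit = firstPolicyMatch(target, deny);
    if (hit !== undefined) return { allowed: false, list: "deny", entry: hit };
  }
  if (allow && allow.length > 0 && firstPolicyMatch(target, allow) === undefined) {
    return { allowed: false, list: "allow" };
  }
  return { allowed: true }; // neither list blocked the target
}
```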
+### Likely-sensitive annotations
+`electron.list` may annotate common private-data apps (`notes`, `chat`, `mail`, `developer-workspace`, `passwords-auth`) with `sensitivity.level: "likely-sensitive"` and a visible `[likely sensitive: …]` marker. These are **advisory hints only**. They do not block `launch` and they do not replace caller `allow` / `deny`.
+## Failure categories and recovery
+`details.failureCategory` values you should expect from Electron flows, with the recovery move:
+| Category | When | Recovery |
+|---|---|---|
+| `validation-error` | Bad input (missing target, conflicting fields, non-Electron target) | Fix the request; the message names the problem |
+| `policy-blocked` | Caller `allow` / `deny` rejected the launch | Adjust the policy or pick a different target |
+| `timeout` | `DevToolsActivePort` never appeared in time | Inspect `details.electron.failure.diagnostics` (PID, profile path, port file state, elapsed/timeout); retry with a higher `timeoutMs` if the app legitimately needs more time |
+| `upstream-error` | Launch/attach/spawn/CDP failure that does not fit a more specific bucket | Inspect `details.electron.failure.diagnostics`; the app may be missing dependencies or hitting a CDP race |
+| `tab-drift` | A successful-looking command was followed by a dead process / debug port / unrecoverable `about:blank` | Use the appended `status-electron-launch` / `probe-electron-launch` next actions, then decide whether to relaunch |
+| `cleanup-failed` | Cleanup only partially succeeded | Inspect `details.electron.cleanup.results[].steps` for remaining process/port/profile state; `retry-electron-cleanup` references the same `launchId` |
+| `stale-ref` | `@e…` ref reused after a navigation/rerender | Take a fresh `snapshot -i` (or follow `refresh-electron-refs-after-rerender` when the wrapper appends it) |
+Single-instance Electron behavior is a common cause of `timeout` and `upstream-error`. Many Electron apps enforce a single running instance and silently drop a second invocation's `--remote-debugging-port` flag. If the app is already running without a debug port, quit it first or use the manual host-launch path against the existing instance instead.
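The recovery table can be condensed into a lookup an agent loop might consult after a failed call. The following is a sketch that uses only the category names and `nextActions` ids quoted in this guide. Always prefer the concrete `details.nextActions` the wrapper returns over these generic hints.

```typescript
type ElectronFailureCategory =
  | "validation-error"
  | "policy-blocked"
  | "timeout"
  | "upstream-error"
  | "tab-drift"
  | "cleanup-failed"
  | "stale-ref";

const ELECTRON_RECOVERY: Record<ElectronFailureCategory, string> = {
  "validation-error": "Fix the request; the error message names the problem.",
  "policy-blocked": "Adjust allow/deny or pick another target (see details.electron.failure.policy).",
  timeout: "Quit any running single-instance copy, read details.electron.failure.diagnostics, retry with a higher timeoutMs.",
  "upstream-error": "Read details.electron.failure.diagnostics for missing dependencies or a CDP race.",
  "tab-drift": "Run status-electron-launch / probe-electron-launch, then decide whether to relaunch.",
  "cleanup-failed": "Inspect details.electron.cleanup.results[].steps; follow retry-electron-cleanup with the same launchId.",
  "stale-ref": "Take a fresh snapshot -i (or follow refresh-electron-refs-after-rerender).",
};

function recoveryHint(category: string | undefined): string {
  // hasOwnProperty avoids matching inherited keys such as "toString".
  if (category !== undefined && Object.prototype.hasOwnProperty.call(ELECTRON_RECOVERY, category)) {
    return ELECTRON_RECOVERY[category as ElectronFailureCategory];
  }
  return "unrecognized category: read details.nextActions and the visible error text";
}
```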
+## Troubleshooting
+### Launch hangs and then times out
+- The app is enforcing single-instance; quit the running copy first, then retry.
+- The app may have moved its Electron framework directory; pass `executablePath` explicitly.
+- `timeoutMs` is too short for a heavy app; raise it (`launch.timeoutMs` defaults to 15 s and is capped at 120 s).
+- Read `details.electron.failure.diagnostics`: presence/absence of `DevToolsActivePort`, port number, PID liveness, and elapsed time usually identify the issue.
+### `electron.list` returns nothing
+- On Linux, the binary may be a custom rebrand without `chrome_*.pak` siblings, an AppImage without a `.desktop` entry, or a statically linked fork. Pass `executablePath` directly.
+- On macOS, apps installed outside `/Applications` and `~/Applications` are not scanned in v1. Pass `appPath` or `executablePath` explicitly.
+- Windows hosts report `platform: "unsupported"` from `electron.list`; always pass `executablePath` (or a resolvable `appPath`) for `launch`.
+### Attach succeeds but `snapshot -i` returns no refs
+- Some Electron apps take a beat to render. The default `handoff: "snapshot"` already retries briefly; if it still reports no refs, run `tab list`, select the intended stable `t<N>` app tab, then run `snapshot -i` again.
+- For raw `connect`, do the same target check before assuming the signed-in app is ready; the attach can succeed before an active page is available.
+- For apps whose UI lives in a webview, switch `targetType` to `"webview"` or `"any"` so the wrapper attaches to the right CDP target.
+### "I clicked, but nothing happened"
+- A successful upstream `click` means the action was dispatched, not that the app handled it. Re-snapshot, check `details.pageChangeSummary`, or use `qa.attached` to verify.
+- Electron apps frequently rerender in place (no URL change). The wrapper may attach `refresh-electron-refs-after-rerender` to remind you to re-snapshot before reusing `@e…` refs.
+### `fill` looks fine but the field is empty
+- Custom quick-input controls (VS Code's quick-pick, command palette, etc.) often need focus + keyboard typing rather than a direct `fill`. The wrapper attaches `details.fillVerification` when `get value` disagrees with the requested text; follow `inspect-after-fill-verification` and switch to focus + `keyboard type` before submitting.
+### `get text` returns the whole app
+- Broad selectors (`body`, `html`, `main`, `[role=application]`) read the entire shell. Use a current `@ref` or a narrower panel selector. The wrapper attaches `details.electronGetTextScopeWarning` and a `snapshot-for-electron-text-scope` next action when it detects this pattern.
+### `sourceLookup` says `no-candidates` for a packaged app
+- Expected when the app's source lives inside `app.asar`. The wrapper does not unpack bundles. Use `electron.probe` / `snapshot-electron-session` / `list-electron-tabs` next actions to inspect the live UI, or pull source separately into the Pi session cwd before re-running the lookup.
+### Mismatch between `status` and the active session
+- `electron.status` may report a live wrapper launch while the managed session has drifted to `about:blank`. Follow `reattach-electron-launch`, then refresh refs before reusing old `@e…` handles. For non-wrapper tab drift where `details.nextActions` names `select-intended-tab-after-drift`, use that stable `t<N>` action plus `snapshot-after-tab-recovery` before continuing.
+## Cleanup checklist
+Before ending the task:
+- Call `electron.cleanup` for every wrapper-owned `launchId` you started, or call it once with `all: true`. The result reports per-step state for `managed-session`, `process`, `debug-port`, and `user-data-dir`.
+- Confirm `details.electron.cleanup.summary` does not list remaining resources.
+- For **manually launched** apps, close the app yourself and clean any profile or temp files you created. `electron.cleanup` will not (and should not) touch them.
+- Remove any explicit screenshots, recordings, downloads, PDFs, traces, or HAR files you saved to caller-chosen paths. Artifact cleanup is host-owned; the wrapper only reports them under `details.artifacts` and `details.artifactCleanup`.
+If `cleanup` returns `failureCategory: "cleanup-failed"`, inspect `details.electron.cleanup.results[].steps` and use `retry-electron-cleanup` for the same `launchId`. Do not invent new cleanup commands for processes the wrapper did not start.
+## Verification and benchmarks
+Electron support is gated by the same release evidence as the rest of the wrapper:
+- `RQ-0096` in [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md) records the contract, runtime, test, and verification coverage.
+- `electron-lifecycle` and `electron-probe` scenarios in `scripts/agent-browser-efficiency-benchmark.mjs` track the token-efficiency claim deterministically (no real browser, no real launches).
+- Fake-upstream coverage for Electron schema/probe/mismatch/post-command-health/fill-verification/broad-text/discovery-sensitivity lives in `test/agent-browser.extension-validation.test.ts`.
+- Real-app validation is a manual `tmux` smoke pass per the maintainer notes in `AGENTS.md`; the 2026-05-21 dogfood result is recorded at the end of [`docs/plans/electron-extension-2026-05-20.md`](plans/electron-extension-2026-05-20.md).
+Run the local gate the same way as the rest of the project:
+```bash
+npm run verify
+```
+The token-efficiency claim has its own opt-in run:
+```bash
+npm run benchmark:agent-browser
+```
+## Where to go next
+- For exact field semantics, schemas, and `details.*` payloads: [`TOOL_CONTRACT.md#electron`](TOOL_CONTRACT.md#electron) and [`TOOL_CONTRACT.md#qa`](TOOL_CONTRACT.md#qa).
+- For workflow examples woven into the broader command surface: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#electron-desktop-apps).
+- For the closed `RQ-0068` recipe-layer decision that bounds why Electron support is a typed shorthand and not a generic recipe runtime: [`ARCHITECTURE.md`](ARCHITECTURE.md#no-reusable-recipe-layer-yet).
+- For the full release-readiness audit and the `RQ-0096` evidence row: [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md).

package/docs/RELEASE.md CHANGED

@@ -5,11 +5,12 @@ Related docs:
 - [`REQUIREMENTS.md`](REQUIREMENTS.md)
 - [`ARCHITECTURE.md`](ARCHITECTURE.md)
 - [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md)
+- [`ELECTRON.md`](ELECTRON.md)
 - [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md)
 - Bounded `agent_browser` outcome metadata on `details` (`resultCategory`, `successCategory`, `failureCategory`, optional `nextActions`, optional `pageChangeSummary` with per-step summaries on `batch`): contract in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); maintainer checklists under “Tool result categories” and “Page-change summaries” in [`../AGENTS.md`](../AGENTS.md)
 - Post-success `get text` selector visibility (`RQ-0074`): optional `details.selectorTextVisibility` / `selectorTextVisibilityAll`, visible warnings, and `inspect-visible-text-candidates*` next actions after read-only visibility probes—[`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md), [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details), and [`../AGENTS.md`](../AGENTS.md) maintainer checklist
 - Managed-session outcomes (`RQ-0077`): after extension-managed implicit or fresh `--session` injection reaches process execution, `details.managedSessionOutcome` records the transition (`created` / `replaced` / `unchanged` / `closed` on success; `preserved` / `abandoned` when a plan fails before a new session becomes current). Failing `sessionMode: "fresh"` calls also append model-visible `Managed session outcome: …`—[`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details), [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md), [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md), and [`../AGENTS.md`](../AGENTS.md) maintainer checklist
-- Stateful context commands (`cookies`, `storage`, `auth`, `dialog`, `frame`, `state`) and aggregate `batch` results: model-facing `details.data` is summarized or redacted per [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); aggregate `batch` replaces top-level `details.data` with a compact per-step matrix (`success`, argv-redacted `command`, redacted `result` or scrubbed `error`) while full per-step payloads, artifacts, and categories remain on `batchSteps[]`—operational notes in [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#use-stateful-browser-context-commands-safely), assembly in `extensions/agent-browser/lib/results/presentation.ts`
+- Stateful context commands (`cookies`, `storage`, `auth`, `dialog`, `frame`, `state`) and aggregate `batch` results: model-facing `details.data` is summarized or redacted per [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); aggregate `batch` replaces top-level `details.data` with a compact per-step matrix (`success`, argv-redacted `command`, redacted `result` or scrubbed `error`) while full per-step payloads, artifacts, and categories remain on `batchSteps[]`—operational notes in [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#use-stateful-browser-context-commands-safely), assembly in `extensions/agent-browser/lib/results/presentation/batch.ts`
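The compact per-step `batch` matrix described above can be sketched as follows. This is a hypothetical TypeScript illustration, not the code in `presentation/batch.ts`. The `BatchStep`/`MatrixRow` shapes, the list of secret-bearing flags, and the key pattern used for redaction are all assumptions chosen to show the idea: argv is redacted, successful steps keep a redacted `result`, and failed steps keep only a scrubbed `error`.

```typescript
// Hypothetical shapes for illustration; the real types live in the extension.
interface BatchStep {
  command: string[];
  success: boolean;
  result?: unknown;
  error?: string;
}

interface MatrixRow {
  success: boolean;
  command: string[];
  result?: unknown;
  error?: string;
}

// Assumed value-taking flags whose values must never reach the model.
const SECRET_FLAGS = new Set(["--password", "--token", "--headers"]);
// Assumed object keys treated as secret-bearing.
const SECRET_KEY = /token|password|secret|cookie|authorization/i;

// Replace the token that follows a secret flag with a placeholder.
function redactArgv(argv: string[]): string[] {
  return argv.map((token, i) =>
    i > 0 && SECRET_FLAGS.has(argv[i - 1]) ? "[REDACTED]" : token,
  );
}

// Recursively replace values stored under secret-looking keys.
function redactValue(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(redactValue);
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value)) {
      out[key] = SECRET_KEY.test(key) ? "[REDACTED]" : redactValue(inner);
    }
    return out;
  }
  return value;
}

// Hide key=value secrets that leak into error text.
function scrubError(message: string): string {
  return message.replace(/\b(token|password|secret)=\S+/gi, "$1=[REDACTED]");
}

// One compact row per step; full payloads would stay on batchSteps[].
function buildBatchMatrix(steps: BatchStep[]): MatrixRow[] {
  return steps.map((step) =>
    step.success
      ? { success: true, command: redactArgv(step.command), result: redactValue(step.result) }
      : { success: false, command: redactArgv(step.command), error: scrubError(step.error ?? "") },
  );
}
```

Keeping the matrix and the full `batchSteps[]` payloads separate means the model-facing summary can be aggressive about redaction without losing data that the caller may still need.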
 ## Purpose
@@ -36,7 +37,9 @@ npm run verify -- release
 `prepublishOnly` intentionally does **not** run `npm run verify -- lifecycle`, `npm run verify -- real-upstream`, or `npm run verify -- benchmark`; those are separate `npm run verify` modes in [`scripts/project.mjs`](../scripts/project.mjs). Treat the bullets below as the full pre-publish contract even though only the `release` slice is automated at publish time.
-Every release also requires interactive `tmux`-driven Pi dogfood with the native `agent_browser` tool against real sites. For extension-focused release smokes, use `pi --no-extensions --no-skills -e .` from the checkout before publish so auto-loaded dogfood/QA skills cannot replace the bounded smoke workflow; run separate skill-enabled dogfood only when validating skill routing or report-generation behavior. Drive prompts with `tmux send-keys`, exercise at least one simple static site and one real documentation/product site, include the higher-level `qa` or `job`/`batch` surfaces when they changed, close every opened browser session, remove screenshots/temp artifacts, and record the outcome in the release notes or support-matrix evidence. Automated localhost and fake-upstream gates do not replace this human-readable live-site transcript evidence. For dense-dashboard stress coverage, use the [public Grafana stress checklist](#public-grafana-stress-checklist) below; it is a maintainer workflow, not bundled product skill or recipe runtime.
+Every release also requires interactive `tmux`-driven Pi dogfood with the native `agent_browser` tool against real sites. For extension-focused release smokes, use `pi --no-extensions --no-skills -e .` from the checkout before publish so auto-loaded dogfood/QA skills cannot replace the bounded smoke workflow; run separate skill-enabled dogfood only when validating skill routing or report-generation behavior. Drive prompts with `tmux send-keys`, exercise at least one simple static site and one real documentation/product site, include the higher-level `qa` or `job`/`batch` surfaces when they changed, close every opened browser session, remove screenshots/temp artifacts, and record the outcome in the release notes or support-matrix evidence. Automated localhost and fake-upstream gates do not replace this human-readable live-site transcript evidence. When `electron.*` surfaces, attached-session diagnostics, or `qa.attached` changed, add a local Electron pass: `electron.list` → `electron.launch` (expect isolated profile behavior) → `snapshot -i` or `electron.probe` / `qa.attached` → `electron.cleanup` with the returned `launchId`, verifying status/mismatch guidance if you simulate a dead renderer or stale refs. For dense-dashboard stress coverage, use the [public Grafana stress checklist](#public-grafana-stress-checklist) below; it is a maintainer workflow, not bundled product skill or recipe runtime.
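The property that matters most in the Electron pass is that `cleanup` always runs with the `launchId` returned by `launch`, even when a probe fails. A minimal sketch under stated assumptions: the request/response shapes, the `action` discriminator, and the `appPath` field are hypothetical and are not the `electron` contract in `TOOL_CONTRACT.md`; `call` stands in for whatever sends one `agent_browser` request.

```typescript
// Hypothetical request/response shapes; see TOOL_CONTRACT.md#electron for the real fields.
type ElectronAction = "list" | "launch" | "status" | "probe" | "cleanup";

interface ElectronRequest {
  electron: { action: ElectronAction; launchId?: string; appPath?: string };
}

interface ElectronResponse {
  ok: boolean;
  launchId?: string;
}

type CallTool = (request: ElectronRequest) => Promise<ElectronResponse>;

// list -> launch -> probe, with cleanup guaranteed for the launch we started.
async function runElectronSmoke(call: CallTool, appPath: string): Promise<string[]> {
  const log: string[] = [];
  await call({ electron: { action: "list" } });
  log.push("list");

  const launched = await call({ electron: { action: "launch", appPath } });
  log.push("launch");
  const launchId = launched.launchId;
  if (!launchId) throw new Error("launch did not return a launchId");

  try {
    const probe = await call({ electron: { action: "probe", launchId } });
    log.push(probe.ok ? "probe:ok" : "probe:failed");
  } finally {
    // Runs whether the probe resolved or threw.
    await call({ electron: { action: "cleanup", launchId } });
    log.push("cleanup");
  }
  return log;
}
```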
+When reviewing saved session JSONL after a failed smoke or a `qa` preset that reclassified an upstream-successful batch, expect `agent_browser` tool rows to carry `isError: true` whenever `details.resultCategory` is `failure`. For normal prose output, model-visible text should end with a `Pi tool isError: true` category line; for caller-requested `--json` output, the hook preserves parseable JSON and only patches `isError`. The extension applies that patch on the `tool_result` path so Pi’s transcript matches the wrapper contract ([`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details)). Preserve a normal Pi session directory for those checks; avoiding `--no-session` keeps this evidence intact ([`AGENTS.md`](../AGENTS.md) preferred validation workflow).
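The `isError` rule above can be expressed as a small pure function. This is a sketch assuming a simplified result object (`content`, `isError`, `details.resultCategory`) and a caller-supplied `jsonOutput` flag; neither is the extension's actual `tool_result` hook signature.

```typescript
// Hypothetical simplified shape of a Pi tool result.
interface ToolResult {
  content: string;
  isError: boolean;
  details: { resultCategory: string };
}

const ERROR_LINE = "Pi tool isError: true";

function patchToolResult(result: ToolResult, jsonOutput: boolean): ToolResult {
  // Only failure-category results are patched.
  if (result.details.resultCategory !== "failure") return result;
  if (jsonOutput) {
    // Caller asked for --json: keep the payload parseable and only flip the flag.
    return { ...result, isError: true };
  }
  // Prose output: end with the category line, without appending it twice.
  const content = result.content.endsWith(ERROR_LINE)
    ? result.content
    : `${result.content.trimEnd()}\n${ERROR_LINE}`;
  return { ...result, isError: true, content };
}
```

Making the patch idempotent matters because a result may pass through the hook more than once during replay or resume.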
 The configured-source lifecycle regression harness is required before release because it launches an interactive `pi` process under `tmux` and validates `/reload` plus restart/`/resume` behavior:
@@ -181,7 +184,34 @@ Run the automated harness for deterministic configured-source lifecycle regressi
 npm run verify -- lifecycle
 ```
-The harness creates an isolated `PI_CODING_AGENT_DIR`, writes settings with exactly one temporary configured package source, runs plain `pi` in `tmux`, puts a deterministic fake `agent-browser` first on `PATH`, and drives `/reload`, full restart, and `/resume`. It asserts same-page managed-session continuity, persisted `details.fullOutputPath` reachability after resume, and updated extension-code pickup through a temporary sentinel command. On failure it retains transcripts/session artifacts; on success it performs best-effort cleanup. It does not replace occasional real-browser manual smoke testing.
+The harness creates an isolated `PI_CODING_AGENT_DIR`, writes settings with exactly one temporary configured package source, runs plain `pi` in `tmux` with default model **`zai/glm-5.1`**, puts a deterministic fake `agent-browser` first on `PATH`, and drives `/reload`, full restart, and `/resume`. Per-step tmux waits default to **180000 ms** (three minutes) in [`scripts/verify-lifecycle.mjs`](../scripts/verify-lifecycle.mjs) (`DEFAULT_TIMEOUT_MS`); override with `--timeout-ms <ms>` when slower models or cold starts need more headroom. Override the model when needed:
+```bash
+npm run verify -- lifecycle --model openai-codex/gpt-5.5:minimal
+```
+Combine flags in one invocation when both apply (order after `lifecycle` is flexible as long as each value-taking flag is immediately followed by its value):
+```bash
+npm run verify -- lifecycle --model openai-codex/gpt-5.5:minimal --timeout-ms 600000
+```
+It asserts same-page managed-session continuity, persisted `details.fullOutputPath` reachability after resume, and updated extension-code pickup through a temporary sentinel command. On failure it retains transcripts/session artifacts; on success it performs best-effort cleanup. It does not replace occasional real-browser manual smoke testing.
+**Lifecycle triage:** a timeout on sentinel `v2` after `/reload` often means Pi rejected reload while the TUI still showed `Working…` (`Wait for the current response to finish before reloading`), even when the session JSONL already has a final assistant message. Re-run with `--keep-artifacts --verbose`, inspect the retained pane capture, and confirm the configured model follows tool prompts reliably. Slower models may need a higher `--timeout-ms` than the **180000 ms** default.
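The flag rules above (defaults of `zai/glm-5.1` and 180000 ms, free ordering, each value-taking flag immediately followed by its value) can be sketched as a tiny parser. This is a hypothetical re-implementation for illustration only, not the code in `scripts/verify-lifecycle.mjs`; in particular, rejecting unknown flags and non-positive timeouts are assumptions.

```typescript
// Hypothetical re-implementation; the real parser lives in scripts/verify-lifecycle.mjs.
const DEFAULT_MODEL = "zai/glm-5.1";
const DEFAULT_TIMEOUT_MS = 180_000;

interface LifecycleOptions {
  model: string;
  timeoutMs: number;
  keepArtifacts: boolean;
  verbose: boolean;
}

function parseLifecycleArgs(argv: string[]): LifecycleOptions {
  const options: LifecycleOptions = {
    model: DEFAULT_MODEL,
    timeoutMs: DEFAULT_TIMEOUT_MS,
    keepArtifacts: false,
    verbose: false,
  };
  for (let i = 0; i < argv.length; i++) {
    const flag = argv[i];
    if (flag === "--keep-artifacts") options.keepArtifacts = true;
    else if (flag === "--verbose") options.verbose = true;
    else if (flag === "--model" || flag === "--timeout-ms") {
      const value = argv[i + 1];
      // Fail fast when the flag is last or directly followed by another flag.
      if (value === undefined || value.startsWith("--")) {
        throw new Error(`usage: ${flag} requires a value`);
      }
      i++; // consume the value token
      if (flag === "--model") {
        options.model = value;
      } else {
        const ms = Number(value);
        if (!Number.isInteger(ms) || ms <= 0) throw new Error(`usage: invalid --timeout-ms ${value}`);
        options.timeoutMs = ms;
      }
    } else {
      throw new Error(`usage: unknown flag ${flag}`);
    }
  }
  return options;
}
```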
+### Environment and automation pitfalls
+These show up often in cloud dev boxes and scripted smokes; they are maintainer notes, not product defects.
+| Topic | What to watch for | Mitigation |
+| --- | --- | --- |
+| **Pi CLI vs repo devDependencies** | Global `pi` older than the `@earendil-works/pi-coding-agent` range in `package.json` can change TUI behavior, `/reload`, and tool routing during lifecycle or checkout smokes. | Align `pi` with the repo’s pinned coding-agent release before release gates (`pi update` or install the matching version). |
+| **npm lockfile (`packageManager`)** | `package.json` pins **npm@11**. npm 10 may strip the optional `libc` metadata from `@esbuild/*` platform entries in `package-lock.json` without changing any dependency versions. | Prefer `npx -y npm@11.14.0 install` when refreshing the lockfile; do not commit npm-10-only lockfile churn. |
+| **`pi -p` / print mode** | Non-interactive `pi -p` may hang or emit no stdout for long real-browser smokes without a TTY. | Use **tmux**-driven interactive `pi` for release evidence and checkout smokes; reserve `-p` for short, non-browser checks. |
+| **Real-browser cleanup** | `real-upstream`, Sauce Demo, and live-site runs can leave defunct Chrome/`agent-browser` children if a session aborts mid-flow. | Close via `agent_browser` / `agent-browser` `close`, kill stray tmux sessions, and remove temp screenshots/HARs under `/tmp` or your chosen artifact dirs. |
+| **Automated prompt driving** | Grepping tmux pane text for words that also appear in the **user** prompt (`PASS`, `FAIL`, `checkout overview`, `Smoke result:`) can false-complete before the agent finishes. | Wait for pane idle (no `Working…`), `agent_browser close` / `Artifact lifecycle`, or JSONL tool results—not instruction phrases copied from the prompt. |
+| **Lifecycle verify flags** | `npm run verify -- lifecycle --model` or `--timeout-ms` with no value after the flag fails fast with a usage error; the `project.mjs` facade validates passthrough arguments the same way as `scripts/verify-lifecycle.mjs`. | Always pair flags with values (`--model openai-codex/gpt-5.5:minimal`, `--timeout-ms 600000`) or omit `--model` / `--timeout-ms` to keep the harness defaults (`zai/glm-5.1`, **180000 ms** per-step waits). |
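The "automated prompt driving" pitfall comes down to searching only the output printed after the echoed prompt. A hypothetical helper, assuming the pane capture contains the prompt verbatim (for example, captured with wrapped lines joined) and that the `Working…` indicator disappears from the capture once the turn ends; the completion markers are the ones named in the table, and the function name is illustrative.

```typescript
// Hypothetical helper for tmux-driven smokes; markers come from the table above.
const BUSY_MARKER = "Working…";
const DONE_MARKERS = ["Artifact lifecycle", "agent_browser close"];

function paneLooksComplete(pane: string, prompt: string): boolean {
  // Inspect only text after the last echo of the prompt, so words copied from
  // the instructions (PASS, FAIL, "Smoke result:") cannot trigger completion.
  const echoAt = pane.lastIndexOf(prompt);
  const tail = echoAt === -1 ? pane : pane.slice(echoAt + prompt.length);
  if (tail.includes(BUSY_MARKER)) return false;
  return DONE_MARKERS.some((marker) => tail.includes(marker));
}
```

Pair a check like this with a polling interval and the harness timeout rather than treating a single capture as final.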
 Manual validation remains useful for release confidence and installed-package checks:
@@ -192,7 +222,7 @@ Manual validation remains useful for release confidence and installed-package ch
 ### Real upstream contract validation
-The default `npm test` and `npm run verify` paths use fast deterministic tests and fake binaries. When a change touches upstream command planning, result presentation, managed-session behavior, or the canonical capability baseline, also run the opt-in real-upstream contract suite:
+The default `npm test` and `npm run verify` paths use fast deterministic tests and fake binaries. For a focused single-file rerun, use `npx tsx --test test/<file>.test.ts`; `npm test -- test/<file>.test.ts` still runs the package script's full glob. When a change touches upstream command planning, result presentation, managed-session behavior, or the canonical capability baseline, also run the opt-in real-upstream contract suite:
 ```bash
 npm run verify -- real-upstream

package/docs/REQUIREMENTS.md CHANGED

@@ -4,6 +4,7 @@ Related docs:
 - [`../README.md`](../README.md)
 - [`ARCHITECTURE.md`](ARCHITECTURE.md)
 - [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md)
+- [`ELECTRON.md`](ELECTRON.md)
 - [`RELEASE.md`](RELEASE.md)
 - [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md)
@@ -63,7 +64,7 @@ Define the product requirements and constraints for `pi-agent-browser-native`.
 ### Native `agent_browser` inputs
-- Each tool invocation must supply **exactly one** of: `args` (full upstream argv after the binary name), top-level `semanticAction` (a small intent object compiled into existing upstream `find` argv for locator actions or upstream `select <selector> <value...>` argv for native dropdown selection), `job`, `qa`, `sourceLookup`, or `networkSourceLookup`. Supplying multiple modes or none is rejected before launch (`extensions/agent-browser/index.ts`, `test/agent-browser.extension-validation.test.ts`).
+- Each tool invocation must supply **exactly one** of: `args` (full upstream argv after the binary name), top-level `semanticAction` (a small intent object compiled into existing upstream `find` argv for locator actions or upstream `select <selector> <value...>` argv for native dropdown selection), `job`, `qa`, `sourceLookup`, `networkSourceLookup`, or `electron` (bounded desktop lifecycle: host `list`, wrapper-owned isolated `launch` with CDP attach, `status`, compact `probe`, and `cleanup`; mutually exclusive with caller `stdin`). Supplying multiple modes or none is rejected before launch (`extensions/agent-browser/index.ts`, `test/agent-browser.extension-validation.test.ts`). Contract and field rules: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#electron); operator workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#electron-desktop-apps).
 - `semanticAction` is not a nested shape inside `batch` stdin; batch steps remain upstream argv string arrays, including `find` steps expressed as token lists.
 - Supported actions, locators, exclusivity rules, when `details.compiledSemanticAction` appears, and bounded `try-*-candidate` follow-ups on `selector-not-found` (specific action/locator pairs only; see contract) are specified in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction), with workflow examples in [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md).
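The exactly-one-mode rule, including the `electron`/`stdin` exclusion, can be checked before anything launches. A minimal sketch, assuming a simplified input object whose property names mirror the modes listed above; the value types are placeholders, and this is not the validator in `extensions/agent-browser/index.ts`.

```typescript
// Hypothetical simplified input; property names follow the documented modes.
interface AgentBrowserInput {
  args?: string[];
  semanticAction?: object;
  job?: object;
  qa?: object;
  sourceLookup?: object;
  networkSourceLookup?: object;
  electron?: object;
  stdin?: string;
}

const MODES = [
  "args", "semanticAction", "job", "qa", "sourceLookup", "networkSourceLookup", "electron",
] as const;
type Mode = (typeof MODES)[number];

function validateInputMode(input: AgentBrowserInput): Mode {
  const present = MODES.filter((mode) => input[mode] !== undefined);
  if (present.length !== 1) {
    const got = present.length === 0 ? "none" : present.join(", ");
    throw new Error(`expected exactly one input mode, got ${got}`);
  }
  const mode = present[0];
  // electron owns its own lifecycle and is mutually exclusive with caller stdin.
  if (mode === "electron" && input.stdin !== undefined) {
    throw new Error("electron cannot be combined with stdin");
  }
  return mode;
}
```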
@@ -84,7 +85,7 @@ Define the product requirements and constraints for `pi-agent-browser-native`.
 - The primary confidence path is a real `pi` session driven in `tmux`.
 - For quick local checkout smoke validation, launch `pi --no-extensions -e .` from the repository root so only the checkout copy loads; do not rely on Pi settings or `/reload` semantics in this isolated mode.
 - For hot-reload validation, configure exactly one active source for this extension in Pi settings and launch plain `pi`; validate `/reload` there because it exercises auto-discovered/configured resources.
-- Maintain a tmux-driven configured-source lifecycle harness (`npm run verify -- lifecycle`; required before release per `docs/RELEASE.md`) that isolates Pi settings, uses exactly one configured source, exercises `/reload`, full restart, and `/resume`, and asserts managed-session continuity plus persisted artifact survival. It is its own `npm run verify` mode rather than part of the default `npm run verify` sequence, but operators still run it before every publish. Keep `docs/RELEASE.md` accurate about the harness behavior, cleanup, transcript retention, and limitations.
+- Maintain a tmux-driven configured-source lifecycle harness (`npm run verify -- lifecycle`; required before release per `docs/RELEASE.md`) that isolates Pi settings, uses exactly one configured source, exercises `/reload`, full restart, and `/resume`, and asserts managed-session continuity plus persisted artifact survival. It is its own `npm run verify` mode rather than part of the default `npm run verify` sequence, but operators still run it before every publish. The harness defaults Pi to model `zai/glm-5.1` (`scripts/verify-lifecycle.mjs`); pass `--model <id>` after `lifecycle` when a different model is required. Keep `docs/RELEASE.md` accurate about the harness behavior, cleanup, transcript retention, and limitations.
 - Validate a full `pi` restart with `/resume` when changes touch managed-session continuity, reload behavior, or persisted artifact paths.
 - Prefer full `pi` restart over `/reload` when validating extension changes beyond a quick reload smoke check.
 - Use `/resume` when needed after restart.
@@ -103,13 +104,14 @@ The design should comfortably support workflows such as:
 - headless authenticated `chat.com` / ChatGPT / OpenAI browsing without forcing `--headed` or `--auto-connect`
 - upstream profile/debug workflows without adding a local profile-cloning layer in this package
 - provider-backed or iOS device launches where upstream owns credentials, env, and setup; the wrapper forwards argv and a curated provider-related environment without emulating those backends
+- desktop Electron targets using top-level `electron` for discover → isolated launch → attach → probe/cleanup, or raw `args: ["connect", …]` when the operator launches the real app with a debug port for signed-in state (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#electron) and [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#electron-desktop-apps))
 ## Implications for the implementation
 - Package-manifest behavior matters more than repo-local development wiring.
 - The extension should use official `pi` hooks and package resources where possible.
 - The wrapper should stay thin, with upstream `agent-browser` remaining the source of truth for command semantics.
-- Successful and failed tool outcomes should surface bounded machine-readable fields on Pi-facing `details` (`resultCategory`, `successCategory`, `failureCategory`, optional structured `nextActions`, optional `pageChangeSummary` with per-step summaries on `batch`, optional `artifactVerification` with the same shape on successful `batchSteps[]` rows) so agents can branch without parsing prose; stateful commands (`auth`, `cookies`, `storage`, `dialog`, `frame`, `state`) plus other structured diagnostics (for example `network`, `diff`, `trace`, `stream`, `dashboard`, `chat`) and `batch` should redact secret-bearing payloads in model-facing `details.data`, including the compact per-step `batch` roll-up on the parent result (full per-step payloads live on `batchSteps[]`). The contract lives in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details), enums and classifier precedence live in `extensions/agent-browser/lib/results/shared.ts`, and presentation-time summaries, redaction, network request follow-ups, and artifact verification rollups are assembled in `extensions/agent-browser/lib/results/presentation.ts` (`buildPageChangeSummary`, `PAGE_CHANGE_SUMMARY_COMMANDS`, `redactPresentationData`, `buildArtifactVerificationSummary`, `buildBatchPresentation`).
+- Successful and failed tool outcomes should surface bounded machine-readable fields on Pi-facing `details` (`resultCategory`, `successCategory`, `failureCategory`, optional structured `nextActions`, optional `pageChangeSummary` with per-step summaries on `batch`, optional `artifactVerification` with the same shape on successful `batchSteps[]` rows) so agents can branch without parsing prose; stateful commands (`auth`, `cookies`, `storage`, `dialog`, `frame`, `state`) plus other structured diagnostics (for example `network`, `diff`, `trace`, `stream`, `dashboard`, `chat`) and `batch` should redact secret-bearing payloads in model-facing `details.data`, including the compact per-step `batch` roll-up on the parent result (full per-step payloads live on `batchSteps[]`). The contract lives in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details), enums and classifier precedence live in `extensions/agent-browser/lib/results/categories.ts` and `contracts.ts` (also re-exported from `shared.ts`), and presentation-time summaries, redaction, network request follow-ups, and artifact verification rollups are assembled in `extensions/agent-browser/lib/results/presentation.ts` (`buildPageChangeSummary`, `PAGE_CHANGE_SUMMARY_COMMANDS`, `redactPresentationData`, `buildArtifactVerificationSummary`, `buildBatchPresentation`).
 - User-facing docs belong in `README.md` and the canonical published files under `docs/`.
 - Agent workflow and deeper testing procedures can stay in `AGENTS.md`, but published docs must not depend on that file being present.
 - When upstream `agent-browser` changes, refresh the local command reference, prompt guidance, and other extension-side docs so agents still have a repo-readable equivalent of the blocked direct-binary help path.