pursr 0.8.1 → 0.10.0

package/README.md CHANGED
@@ -34,7 +34,8 @@ Most teams need **five separate tools** to do visual QA: a screenshot CLI, a reg
34
34
 
35
35
  - **A unified CLI** (`pursr`) for every capture, diff, sweep, and audit.
36
36
  - **An agent-grade MCP stdio server** (`pursr-mcp`) built on the official Model Context Protocol SDK, with persistent tabs, direct image responses, rendered-state inspection, actions, diagnostics, screenshots, sweeps, and resources.
37
- - **A library API** with 23 subpath modules, so you can embed the browser and QA primitives in your own tooling.
37
+ - **Visual Operator** sessions with a rendered cursor, target labels, click markers, visible Chrome windows, and authenticated Chrome attachment over CDP.
38
+ - **A library API** with 25 subpath modules, so you can embed the browser and QA primitives in your own tooling.
38
39
  - **A plugin system** for custom viewports, sweep ops, and capture hooks.
39
40
  - **PDF reports + AI diff summaries** built in - render a sweep to a styled PDF or ask a vision LLM to describe the regression in plain language.
40
41
  - **Zero browser bundled** - drives your system Chrome via Playwright. No 200 MB Chromium download.
@@ -79,7 +80,8 @@ pursr sweep ./plan.json # see plans/ for an example
79
80
  | Multi-viewport capture | 10+ presets (mobile, tablet, desktop, ultrawide) | `--preset mobile-375` |
80
81
  | Layered states | entity / terrain / hud / ui isolation | `--layer entity` |
81
82
  | Animation freeze | pause CSS/JS animations for stable frames | `--no-animation` |
82
- | Cursor overlay | pointer / grab / grabbing / crosshair | `--cursor crosshair` |
83
+ | Cursor overlay | pointer / grab / grabbing / crosshair | `--cursor crosshair` |
84
+ | Visual Operator | rendered cursor, target labels, click markers, WebM recording, headed and CDP sessions | `operator` CLI + MCP session tools |
83
85
  | Grid overlay | spacing guides, custom color + tile size | `--grid --grid-tile 64` |
84
86
  | Camera control | zoom + pan via mouse wheel/drag | `--zoom 1.5 --panX 200` |
85
87
  | Frame timeline | N captures at intervalMs for animations | `pursr frames <url> 8 200` |
@@ -159,7 +161,8 @@ pursr validate ./plan.json
159
161
  | `probe` | Health check (HTTP status, page title) |
160
162
  | `shot` / `full` | Viewport / full-page screenshot |
161
163
  | `eval` | Execute JS in the page, return result |
162
- | `click` / `type` / `wait` / `seq` | Interaction primitives |
164
+ | `click` / `type` / `wait` / `seq` | Interaction primitives |
165
+ | `operator` | Run a visible action plan with cursor feedback, screenshot, trace, diagnostics, and optional WebM video |
163
166
  | `diff` | Pixel-level diff vs a reference PNG |
164
167
  | `viewports` | List all registered viewport presets |
165
168
  | `shoot` | Rich capture (overlays, freeze, camera, plugins) |
@@ -188,10 +191,10 @@ npx pursr-mcp --verbose
188
191
 
189
192
  | Tool | Description |
190
193
  | --- | --- |
191
- | `pursr_session_open` | Open a persistent browser tab for iterative agent work |
194
+ | `pursr_session_open` | Open a headless, visible, or CDP browser session with optional Visual Operator |
192
195
  | `pursr_sessions` | List active browser sessions |
193
196
  | `pursr_snapshot` | Visible rendered nodes, geometry, semantics, and computed styles |
194
- | `pursr_act` | Click, hover, fill, type, scroll, navigate, reload, and more |
197
+ | `pursr_act` | Interact plus move cursor, annotate targets, and clear visual feedback |
195
198
  | `pursr_screenshot` | Return the current PNG directly to the vision model |
196
199
  | `pursr_inspect` | Inspect exact geometry, computed styles, and stacking ancestors |
197
200
  | `pursr_diagnostics` | Read console, page errors, failed requests, and HTTP failures |
@@ -229,6 +232,93 @@ Example action arguments:
229
232
  ]
230
233
  }
231
234
  ```
235
+
236
+ ### Visual Operator
237
+
238
+ Set `visual: true` to render the agent cursor and interaction feedback into screenshots. `mode: "visible"` enables it automatically and opens a Chrome window that a developer can watch.
239
+
240
+ #### CLI: scripted tutorials and repeatable recordings
241
+
242
+ Use the CLI when the steps are already known. It needs no MCP host and produces a final screenshot, JSON trace, diagnostics, and an optional WebM recording.
243
+
244
+ ```bash
245
+ pursr operator http://localhost:3000 @plans/operator-tutorial.json \
246
+ --visible \
247
+ --start-delay 3000 \
248
+ --slow-mo 100 \
249
+ --video ./recordings \
250
+ --out ./recordings/final.png
251
+ ```
252
+
253
+ The action plan is a JSON array. The same action objects work through `pursr_act` in MCP:
254
+
255
+ ```json
256
+ [
257
+ { "type": "annotate", "selector": "role=button|Build", "label": "Open build menu" },
258
+ { "type": "click", "selector": "role=button|Build", "durationMs": 350, "settleMs": 500 },
259
+ { "type": "click", "x": 640, "y": 420, "durationMs": 250 },
260
+ { "type": "drag", "fromX": 520, "fromY": 400, "toX": 760, "toY": 520, "steps": 30 },
261
+ { "type": "keyDown", "key": "Shift" },
262
+ { "type": "keyUp", "key": "Shift" },
263
+ { "type": "press", "key": "Escape" },
264
+ { "type": "sleep", "ms": 800 },
265
+ { "type": "clearAnnotations", "keepCursor": true }
266
+ ]
267
+ ```
268
+
269
+ Chrome records the browser viewport as silent WebM video. Add narration or system audio in your editor, and convert to MP4 when needed:
270
+
271
+ ```bash
272
+ ffmpeg -i recording.webm -c:v libx264 -pix_fmt yuv420p tutorial.mp4
273
+ ```
274
+
275
+ #### MCP: adaptive agent operation
276
+
277
+ Use MCP when the agent must inspect the current page, decide the next action, verify visual results, or pause for human approval. MCP is not required for CLI recording. Both interfaces use the same session and Visual Operator engine.
278
+
279
+ ```json
280
+ {
281
+ "url": "http://localhost:3000",
282
+ "sessionId": "visual-review",
283
+ "mode": "visible",
284
+ "operatorColor": "#ff2ea6",
285
+ "slowMo": 80
286
+ }
287
+ ```
288
+
289
+ Add `recordVideoDir` to record an MCP session in headless or visible mode. The final video path is returned by `pursr_session_close`. CDP sessions preserve an existing browser profile but cannot record video because Chrome owns that context.
290
+
291
+ Visual actions use the regular `pursr_act` tool:
292
+
293
+ ```json
294
+ {
295
+ "sessionId": "visual-review",
296
+ "actions": [
297
+ { "type": "move", "x": 640, "y": 360, "durationMs": 300 },
298
+ { "type": "annotate", "selector": "role=button|Publish", "label": "Primary CTA" },
299
+ { "type": "click", "selector": "role=button|Publish" },
300
+ { "type": "clearAnnotations", "keepCursor": true }
301
+ ]
302
+ }
303
+ ```
304
+
305
+ To use an existing authenticated Chrome profile, start Chrome with a dedicated remote-debugging profile and attach using CDP. Do not expose the debugging port beyond localhost.
306
+
307
+ ```bash
308
+ chrome --remote-debugging-port=9222 --user-data-dir=/tmp/pursr-chrome
309
+ ```
310
+
311
+ ```json
312
+ {
313
+ "url": "https://app.example.com",
314
+ "sessionId": "signed-in-review",
315
+ "mode": "cdp",
316
+ "cdpUrl": "http://127.0.0.1:9222",
317
+ "visual": true
318
+ }
319
+ ```
320
+
321
+ Pursr opens a new tab in Chrome's default context, preserving that profile's cookies and login state. Closing the Pursr session disconnects without terminating the owner browser.
232
322
 
233
323
  ### Exposed Resources
234
324
 
@@ -364,7 +454,8 @@ import {
364
454
  saveBaseline, diffKey,
365
455
  startHarCapture, stopHarCapture, writeHar,
366
456
  loadAuthState,
367
- PursrMCPServer, loadMcpConfig,
457
+ PursrMCPServer, loadMcpConfig, BrowserSessionManager,
458
+ installVisualOperator, moveVisualCursor, highlightVisualTarget,
368
459
  validateSweepPlan,
369
460
  listResources, readResource,
370
461
  listViewports, resolveViewport, VIEWPORTS,
@@ -389,7 +480,9 @@ import { validateSweepPlan } from "pursr/sweep-schema";
389
480
  import { startHarCapture, stopHarCapture } from "pursr/har";
390
481
  import { saveAuthState, loadAuthState } from "pursr/auth";
391
482
  import { listResources, readResource } from "pursr/mcp-resources";
392
- import { PursrMCPServer } from "pursr/mcp";
483
+ import { PursrMCPServer } from "pursr/mcp";
484
+ import { BrowserSessionManager } from "pursr/session";
485
+ import { moveVisualCursor, highlightVisualTarget } from "pursr/visual-operator";
393
486
  ```
394
487
 
395
488
  ## Plugins
@@ -417,7 +510,9 @@ Plugins are auto-loaded from `plugins/` (built-in) or via `--plugin <path>`.
417
510
  ```
418
511
  src/
419
512
  index.js - public library entry
420
- mcp.js - MCP stdio server (JSON-RPC 2.0)
513
+ mcp.js - official MCP SDK stdio server
514
+ session.js - persistent headless, visible, and CDP sessions
515
+ visual-operator.js - rendered cursor and interaction feedback
421
516
  shoot.js - runShoot (overlays + camera + frame-stable)
422
517
  sweep.js - runSweep (validated, parallel pool)
423
518
  diff.js - pixelmatch wrapper
package/bin/pursr.mjs CHANGED
@@ -2,7 +2,8 @@
2
2
  // pursr CLI. Thin wrapper around src/* that mirrors the npm bin.
3
3
 
4
4
  import { VERSION } from "../src/index.js";
5
- import { runClick, runType, runWait, runSeq } from "../src/interact.js";
5
+ import { runClick, runType, runWait, runSeq } from "../src/interact.js";
6
+ import { runOperator } from "../src/operator.js";
6
7
  import { runEval } from "../src/eval.js";
7
8
  import { runProbe } from "../src/probe.js";
8
9
  import { runShot } from "../src/shot.js";
@@ -24,7 +25,8 @@ import { loadPlugins, listPlugins, getFlagHelp } from "../src/plugin.js";
24
25
 
25
26
  const USAGE = `usage:
26
27
  v1: pursr {probe|shot|full|eval|click|type|wait|diff|seq} <url> [...]
27
- v2: pursr {viewports|shoot|layer|frames|hover|sweep} <...>
28
+ v2: pursr {viewports|shoot|layer|frames|hover|sweep} <...>
29
+ operator: pursr operator <url> <actions.json|@file> [--visible] [--start-delay 3000] [--video <dir>] [--out <final.png>]
28
30
  flags: --preset <name> --width N --height N --dpr N
29
31
  --zoom 1.5 --panX 200 --panY -100
30
32
  --cursor pointer|grab|grabbing|crosshair|none
@@ -90,7 +92,41 @@ await loadPlugins(pluginPaths);
90
92
  : await runDiff(url, ref, out, threshold, flags);
91
93
  console.log(JSON.stringify(r, null, 2)); break;
92
94
  }
93
- case "seq": { if (!url) die("missing url"); const actions = readArg(b); if (!actions) die("seq: missing <actions.json> (or @file)"); const out = c || makeOut("seq.png"); const r = await runSeq(url, actions, out); console.log(JSON.stringify(r, null, 2)); break; }
95
+ case "seq": { if (!url) die("missing url"); const actions = readArg(b); if (!actions) die("seq: missing <actions.json> (or @file)"); const out = c || makeOut("seq.png"); const r = await runSeq(url, actions, out); console.log(JSON.stringify(r, null, 2)); break; }
96
+ case "operator": {
97
+ if (!url) die("operator: missing <url>");
98
+ const actions = readArg(b); if (!actions) die("operator: missing <actions.json> (or @file)");
99
+ const flags = parseFlags(argv.slice(5));
100
+ const out = flags.out || makeOut("operator.png");
101
+ const videoValue = flags.video ?? flags["record-video"];
102
+ const recordVideoDir = videoValue
103
+ ? (videoValue === true ? dirname(out) : String(videoValue))
104
+ : null;
105
+ const r = await runOperator({
106
+ url,
107
+ actions,
108
+ out,
109
+ outputDir: dirname(out),
110
+ sessionId: flags.session || undefined,
111
+ flags: {
112
+ mode: flags.mode || (flags.cdp ? "cdp" : flags.visible ? "visible" : "headless"),
113
+ visual: !flags["no-visual"],
114
+ cdpUrl: flags.cdp || flags["cdp-url"],
115
+ slowMo: asNum(flags["slow-mo"] ?? flags.slowMo, 0),
116
+ startDelayMs: asNum(flags["start-delay"] ?? flags.startDelayMs, 0),
117
+ operatorColor: flags["operator-color"] || flags.operatorColor,
118
+ recordVideoDir,
119
+ width: flags.width,
120
+ height: flags.height,
121
+ dpr: flags.dpr,
122
+ preset: flags.preset,
123
+ full: !!flags.full,
124
+ },
125
+ });
126
+ console.log(JSON.stringify(r, null, 2));
127
+ if (!r.ok) process.exitCode = 1;
128
+ break;
129
+ }
94
130
  case "viewports": { console.log(JSON.stringify(listViewports(), null, 2)); break; }
95
131
  case "shoot": {
96
132
  if (!url) die("missing url");
@@ -342,4 +378,4 @@ await loadPlugins(pluginPaths);
342
378
  console.error(JSON.stringify({ error: e.message, stack: e.stack?.split("\n").slice(0, 3).join("\n") }, null, 2));
343
379
  process.exit(1);
344
380
  }
345
- })();
381
+ })();
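The `operator` case added above derives the session mode, the visual toggle, and the CDP endpoint from parsed flags. The sketch below restates only those three expressions as a pure function so the precedence is easy to check. `resolveOperatorMode` is a hypothetical name, not a pursr export, and it assumes `parseFlags` returns an object keyed by flag name without the leading dashes.

```javascript
// Sketch only: mirrors the mode/visual/cdpUrl expressions in the `operator` case.
// `resolveOperatorMode` is a hypothetical helper, not part of pursr.
function resolveOperatorMode(flags = {}) {
  // An explicit --mode wins; otherwise --cdp, then --visible, then headless.
  const mode = flags.mode || (flags.cdp ? "cdp" : flags.visible ? "visible" : "headless");
  // Visual feedback stays on unless --no-visual is passed.
  const visual = !flags["no-visual"];
  // The endpoint may come from either --cdp or --cdp-url.
  const cdpUrl = flags.cdp || flags["cdp-url"];
  return { mode, visual, cdpUrl };
}

console.log(resolveOperatorMode({ visible: true }));
console.log(resolveOperatorMode({ "cdp-url": "http://127.0.0.1:9222" }));
```

Under that assumption, passing only `--cdp-url` sets `cdpUrl` but leaves `mode` as `"headless"`, because the mode expression checks `flags.cdp` alone. If that reading is right, CDP attachment from the CLI appears to need `--cdp <url>` or an explicit `--mode cdp`.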
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "pursr",
3
- "version": "0.8.1",
3
+ "version": "0.10.0",
4
4
  "private": false,
5
5
  "description": "pursr — Visual QA, audit, and MCP for the browser. One CLI + one MCP server for screenshots, sweeps, baselines, diffs, axe-core a11y audits, HAR capture, and auth state — with parallel sweep workers, auto-healing selectors, and a plugin system. Zero browser bundled: drives your system Chrome via Playwright.",
6
6
  "homepage": "https://github.com/0xheycat/pursr",
@@ -39,7 +39,9 @@
39
39
  "./snap": "./src/snap.js",
40
40
  "./report": "./src/report.js",
41
41
  "./ai-diff": "./src/ai-diff.js",
42
- "./session": "./src/session.js"
42
+ "./session": "./src/session.js",
43
+ "./operator": "./src/operator.js",
44
+ "./visual-operator": "./src/visual-operator.js"
43
45
  },
44
46
  "files": [
45
47
  "bin",
@@ -0,0 +1,16 @@
1
+ [
2
+ {
3
+ "type": "annotate",
4
+ "selector": "body",
5
+ "label": "Visual Operator session",
6
+ "durationMs": 300
7
+ },
8
+ {
9
+ "type": "sleep",
10
+ "ms": 700
11
+ },
12
+ {
13
+ "type": "clearAnnotations",
14
+ "keepCursor": true
15
+ }
16
+ ]
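The new plan file above is a JSON array of action objects. A minimal pre-flight check for such a plan, under stated assumptions, could look like the sketch below. It repeats the two constraints visible elsewhere in this diff: the non-empty-array rule in `normalizeActions` (src/operator.js) and the 50-action cap in `act` (src/session.js). `checkPlan` and its return shape are hypothetical. The per-step `type`/`op` check is an extra illustration, not something the package is shown to enforce up front.

```javascript
// Sketch only: `checkPlan` is a hypothetical pre-flight helper, not part of pursr.
const MAX_ACTIONS = 50; // same value as the constant in src/session.js

function checkPlan(input) {
  // Accept either a JSON string or an already-parsed array.
  const actions = typeof input === "string" ? JSON.parse(input) : input;
  if (!Array.isArray(actions) || actions.length === 0) {
    throw new Error("operator actions must be a non-empty JSON array");
  }
  if (actions.length > MAX_ACTIONS) {
    throw new Error(`actions cannot exceed ${MAX_ACTIONS}`);
  }
  // Steps are dispatched on `type` (or `op`); collect indexes that have neither.
  const untyped = actions
    .map((action, index) => ({ action, index }))
    .filter(({ action }) => !action || !(action.type || action.op))
    .map(({ index }) => index);
  return { count: actions.length, untyped };
}

const plan = [
  { type: "annotate", selector: "body", label: "Visual Operator session", durationMs: 300 },
  { type: "sleep", ms: 700 },
  { type: "clearAnnotations", keepCursor: true },
];
console.log(checkPlan(plan));
```

Running a check like this before `pursr operator` would surface a malformed plan without launching Chrome.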
package/src/index.js CHANGED
@@ -25,7 +25,7 @@ import { runClick, runType, runWait, runSeq } from "./interact.js";
25
25
  import { listViewports, resolveViewport, VIEWPORTS } from "./viewport.js";
26
26
  import { applyCamera, waitForStableFrame } from "./overlays.js";
27
27
  import { loadPlugins, registerPlugin, listPlugins, getSweepOp, getViewportPreset, listViewportPresets, getFlagHelp } from "./plugin.js";
28
- import { launch, newPage } from "./runway.js";
28
+ import { connectOverCDP, launch, newPage } from "./runway.js";
29
29
  import { parseFlags, asNum, asBool, nowIso, shortHash, escapeHtml, renderSweepHtml, renderEveryViewportHtml, findStepPng, readArg, makeOut } from "./util.js";
30
30
  import { resolveLocator, parseTextSelector } from "./selector.js";
31
31
  import { captureDomSnapshot, captureDomSnapshotSidecar } from "./dom-snapshot.js";
@@ -45,6 +45,11 @@ import { runCheck } from "./check.js";
45
45
  import { renderSweepPdf } from "./report.js";
46
46
  import { aiDiffSummary, aiDiffSidecar } from "./ai-diff.js";
47
47
  import { BrowserSessionManager } from "./session.js";
48
+ import { runOperator } from "./operator.js";
49
+ import {
50
+ installVisualOperator, moveVisualCursor, highlightVisualTarget,
51
+ markVisualClick, clearVisualAnnotations, visualPointForLocator,
52
+ } from "./visual-operator.js";
48
53
 
49
54
 
50
55
  // Derive VERSION from package.json to prevent drift
@@ -64,7 +69,7 @@ export {
64
69
  // plugin system
65
70
  loadPlugins, registerPlugin, listPlugins, getSweepOp, getViewportPreset, listViewportPresets, getFlagHelp,
66
71
  // low-level helpers (for plugin authors)
67
- launch, newPage,
72
+ launch, connectOverCDP, newPage,
68
73
  parseFlags, asNum, asBool, nowIso, shortHash, escapeHtml, renderSweepHtml, renderEveryViewportHtml, findStepPng, readArg, makeOut,
69
74
  resolveLocator, parseTextSelector,
70
75
  // v3: selector healing, CI output, MCP server
@@ -88,6 +93,9 @@ export {
88
93
  renderSweepPdf,
89
94
  aiDiffSummary, aiDiffSidecar,
90
95
  BrowserSessionManager,
96
+ runOperator,
97
+ installVisualOperator, moveVisualCursor, highlightVisualTarget,
98
+ markVisualClick, clearVisualAnnotations, visualPointForLocator,
91
99
  VERSION,
92
100
  };
93
101
 
@@ -98,7 +106,7 @@ export default {
98
106
  listViewports, resolveViewport, VIEWPORTS,
99
107
  applyCamera, waitForStableFrame,
100
108
  loadPlugins, registerPlugin, listPlugins, getSweepOp, getViewportPreset, listViewportPresets, getFlagHelp,
101
- launch, newPage,
109
+ launch, connectOverCDP, newPage,
102
110
  parseFlags, asNum, asBool, nowIso, shortHash, escapeHtml, renderSweepHtml, renderEveryViewportHtml, findStepPng, readArg, makeOut,
103
111
  resolveLocator, parseTextSelector,
104
112
  resolveHealedSelector, healStepAction,
@@ -110,5 +118,8 @@ export default {
110
118
  // v6: PDF report, AI diff summary
111
119
  runDiffWithAi, renderSweepPdf, aiDiffSummary, aiDiffSidecar,
112
120
  BrowserSessionManager,
121
+ runOperator,
122
+ installVisualOperator, moveVisualCursor, highlightVisualTarget,
123
+ markVisualClick, clearVisualAnnotations, visualPointForLocator,
113
124
  VERSION,
114
125
  };
package/src/mcp.js CHANGED
@@ -147,7 +147,7 @@ class PursrMCPServer {
147
147
  return [
148
148
  {
149
149
  name: "pursr_session_open",
150
- description: "Open a persistent browser tab for iterative agent work. State, hover, scroll, dialogs, and navigation persist until closed.",
150
+ description: "Open a persistent browser tab in headless, visible, or CDP mode. Visual sessions render cursor movement and interaction feedback into screenshots.",
151
151
  inputSchema: {
152
152
  type: "object",
153
153
  properties: {
@@ -155,6 +155,14 @@ class PursrMCPServer {
155
155
  sessionId: { type: "string", description: "Stable session name; generated when omitted" },
156
156
  preset: { type: "string", description: "Viewport preset" },
157
157
  width: { type: "number" }, height: { type: "number" }, dpr: { type: "number" },
158
+ mode: { type: "string", enum: ["headless", "visible", "cdp"], description: "Browser mode (default headless)" },
159
+ visible: { type: "boolean", description: "Alias for mode=visible" },
160
+ visual: { type: "boolean", description: "Enable rendered cursor and interaction overlays" },
161
+ cdpUrl: { type: "string", description: "Chrome DevTools endpoint for mode=cdp, e.g. http://127.0.0.1:9222" },
162
+ slowMo: { type: "number", description: "Delay Playwright operations in milliseconds" },
163
+ operatorColor: { type: "string", description: "Visual Operator accent color" },
164
+ recordVideoDir: { type: "string", description: "Directory for a WebM recording (headless or visible mode only)" },
165
+ timeoutMs: { type: "number", description: "Navigation/CDP connection timeout" },
158
166
  storageState: { description: "Playwright storageState object or file path" },
159
167
  },
160
168
  required: ["url"],
@@ -180,7 +188,7 @@ class PursrMCPServer {
180
188
  },
181
189
  {
182
190
  name: "pursr_act",
183
- description: "Perform ordered actions in a persistent session. Supported types: click, hover, fill, type, check, select, press, scroll, wait, sleep, navigate, reload, eval.",
191
+ description: "Perform ordered actions in a persistent session. Supports selector or coordinate click/doubleClick, drag, hover, fill, type, check, select, press, keyDown, keyUp, scroll, wait, sleep, navigate, reload, eval, move, annotate, and clearAnnotations.",
184
192
  inputSchema: {
185
193
  type: "object",
186
194
  properties: {
@@ -405,7 +413,12 @@ class PursrMCPServer {
405
413
 
406
414
  async _sessionOpen(args) {
407
415
  if (!args.url) throw new McpError(-32602, "Missing required: url");
408
- const flags = { preset: args.preset, width: args.width, height: args.height, dpr: args.dpr };
416
+ const flags = {
417
+ preset: args.preset, width: args.width, height: args.height, dpr: args.dpr,
418
+ mode: args.mode, visible: args.visible, visual: args.visual, cdpUrl: args.cdpUrl,
419
+ slowMo: args.slowMo, operatorColor: args.operatorColor, timeoutMs: args.timeoutMs,
420
+ recordVideoDir: args.recordVideoDir,
421
+ };
409
422
  const result = await this.sessions.open({ sessionId: args.sessionId, url: args.url, flags, storageState: args.storageState });
410
423
  return this._text(result);
411
424
  }
@@ -0,0 +1,61 @@
1
+ // One-shot Visual Operator workflow for CLI and library consumers.
2
+
3
+ import { dirname, join, resolve } from "node:path";
4
+ import { mkdirSync } from "node:fs";
5
+ import { BrowserSessionManager } from "./session.js";
6
+
7
+ function normalizeActions(actions) {
8
+ if (typeof actions === "string") actions = JSON.parse(actions);
9
+ if (!Array.isArray(actions) || !actions.length) throw new Error("operator actions must be a non-empty JSON array");
10
+ return actions;
11
+ }
12
+
13
+ export async function runOperator({
14
+ url,
15
+ actions,
16
+ out,
17
+ outputDir = process.cwd(),
18
+ sessionId = `operator-${Date.now().toString(36)}`,
19
+ flags = {},
20
+ } = {}) {
21
+ if (!url) throw new Error("operator url is required");
22
+ const steps = normalizeActions(actions);
23
+ const screenshotOut = resolve(out || join(outputDir, `${sessionId}.png`));
24
+ mkdirSync(dirname(screenshotOut), { recursive: true });
25
+
26
+ const manager = new BrowserSessionManager({ outputDir });
27
+ let opened = null;
28
+ let acted = null;
29
+ let shot = null;
30
+ let diagnostics = null;
31
+ let closed = null;
32
+ try {
33
+ opened = await manager.open({
34
+ sessionId,
35
+ url,
36
+ storageState: flags.storageState,
37
+ flags: { ...flags, visual: flags.visual !== false },
38
+ });
39
+ if (Number(flags.startDelayMs) > 0) {
40
+ await manager.get(sessionId).page.waitForTimeout(Number(flags.startDelayMs));
41
+ }
42
+ acted = await manager.act(sessionId, steps);
43
+ shot = await manager.screenshot(sessionId, { out: screenshotOut, full: !!flags.full });
44
+ diagnostics = manager.diagnostics(sessionId);
45
+ } finally {
46
+ closed = await manager.close(sessionId).catch(() => ({ sessionId, closed: false, video: null }));
47
+ }
48
+
49
+ return {
50
+ ok: !acted?.failed,
51
+ sessionId,
52
+ mode: opened?.mode,
53
+ visual: opened?.visual,
54
+ url: acted?.url || opened?.url || url,
55
+ title: acted?.title || opened?.title || null,
56
+ trace: acted?.trace || [],
57
+ screenshot: shot?.out || null,
58
+ video: closed?.video || null,
59
+ diagnostics,
60
+ };
61
+ }
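`runOperator` wraps open, act, and screenshot in `try`/`finally`, so the session is closed even when a step throws. The sketch below is a trimmed restatement of that flow with the manager passed in as a parameter, so the cleanup behaviour can be exercised without a browser. `runWithManager` and `makeStubManager` are hypothetical names for illustration only. The real code constructs a `BrowserSessionManager` internally and also takes a screenshot and reads diagnostics.

```javascript
// Sketch only: a trimmed, hypothetical restatement of the runOperator flow.
async function runWithManager(manager, { sessionId, url, steps }) {
  let opened = null;
  let acted = null;
  let closed = null;
  try {
    opened = await manager.open({ sessionId, url });
    acted = await manager.act(sessionId, steps);
  } finally {
    // Close is attempted even if open/act threw; a failing close is swallowed.
    closed = await manager.close(sessionId).catch(() => ({ sessionId, closed: false, video: null }));
  }
  return {
    ok: !acted?.failed,
    url: acted?.url || opened?.url || url,
    trace: acted?.trace || [],
    video: closed?.video || null,
  };
}

// Stub manager (an assumption for illustration) that records the call order.
function makeStubManager({ failAct = false } = {}) {
  const calls = [];
  return {
    calls,
    async open({ url }) { calls.push("open"); return { url }; },
    async act() {
      calls.push("act");
      if (failAct) throw new Error("act failed");
      return { failed: false, trace: [{ index: 0, type: "sleep" }] };
    },
    async close() { calls.push("close"); return { closed: true, video: null }; },
  };
}

const demoManager = makeStubManager({ failAct: true });
runWithManager(demoManager, { sessionId: "demo", url: "http://localhost:3000", steps: [{ type: "sleep", ms: 1 }] })
  .catch((error) => console.log("rejected:", error.message, "calls:", demoManager.calls.join(",")));
```

In this shape, a thrown error from `act` still propagates to the caller after the close attempt. The `ok: false` result is only reachable when `act` resolves with `failed` set.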
package/src/runway.js CHANGED
@@ -43,24 +43,40 @@ function findChrome() {
43
43
 
44
44
  const BROWSER_ARGS = Object.freeze(["--no-sandbox", "--disable-gpu", "--disable-dev-shm-usage"]);
45
45
 
46
- export async function launch() {
47
- const chromium = await getChromium();
48
- const exec = findChrome();
49
- if (!exec) throw new Error("system Chrome not found in standard paths");
50
- return await chromium.launch({ headless: true, executablePath: exec, args: BROWSER_ARGS });
51
- }
52
-
53
- export async function newPage(browser, viewport, opts = {}) {
54
- const ctx = await browser.newContext({
55
- viewport: { width: viewport.width, height: viewport.height },
56
- deviceScaleFactor: viewport.dpr || 1,
57
- reducedMotion: "no-preference",
58
- colorScheme: "light",
59
- hasTouch: !!(viewport.name && viewport.name.startsWith("mobile")),
60
- isMobile: !!(viewport.name && viewport.name.startsWith("mobile")),
61
- storageState: opts.storageState || undefined,
62
- });
63
- const page = await ctx.newPage();
64
- page._pursrContext = ctx;
65
- return page;
66
- }
46
+ export async function launch(options = {}) {
47
+ const chromium = await getChromium();
48
+ const exec = findChrome();
49
+ if (!exec) throw new Error("system Chrome not found in standard paths");
50
+ return await chromium.launch({
51
+ headless: options.headless !== false,
52
+ executablePath: options.executablePath || exec,
53
+ slowMo: Math.max(0, Number(options.slowMo) || 0),
54
+ args: [...BROWSER_ARGS, ...(Array.isArray(options.args) ? options.args : [])],
55
+ });
56
+ }
57
+
58
+ export async function connectOverCDP(endpointURL, options = {}) {
59
+ if (!endpointURL || typeof endpointURL !== "string") throw new Error("cdpUrl is required for CDP mode");
60
+ const chromium = await getChromium();
61
+ return await chromium.connectOverCDP(endpointURL, { timeout: options.timeoutMs || 30_000 });
62
+ }
63
+
64
+ export async function newPage(browser, viewport, opts = {}) {
65
+ const ctx = opts.context || await browser.newContext({
66
+ viewport: { width: viewport.width, height: viewport.height },
67
+ deviceScaleFactor: viewport.dpr || 1,
68
+ reducedMotion: "no-preference",
69
+ colorScheme: "light",
70
+ hasTouch: !!(viewport.name && viewport.name.startsWith("mobile")),
71
+ isMobile: !!(viewport.name && viewport.name.startsWith("mobile")),
72
+ storageState: opts.storageState || undefined,
73
+ recordVideo: opts.recordVideoDir ? {
74
+ dir: opts.recordVideoDir,
75
+ size: { width: viewport.width, height: viewport.height },
76
+ } : undefined,
77
+ });
78
+ const page = await ctx.newPage();
79
+ if (opts.context) await page.setViewportSize({ width: viewport.width, height: viewport.height }).catch(() => {});
80
+ page._pursrContext = ctx;
81
+ return page;
82
+ }
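The revised `launch(options)` normalizes its inputs before calling Playwright. The sketch below repeats only that normalization as a pure function, so the defaults can be inspected without starting Chrome. `normalizeLaunchOptions` and the `defaultExecutable` placeholder are hypothetical. In the real code the fallback path comes from `findChrome()`.

```javascript
// Sketch only: repeats the option handling inside launch() without launching a browser.
const BROWSER_ARGS = Object.freeze(["--no-sandbox", "--disable-gpu", "--disable-dev-shm-usage"]);

function normalizeLaunchOptions(options = {}, defaultExecutable = "/path/to/chrome") {
  return {
    // Headless unless the caller passes exactly `headless: false`.
    headless: options.headless !== false,
    // A caller-supplied path overrides the detected system Chrome.
    executablePath: options.executablePath || defaultExecutable,
    // Missing, non-numeric, or negative slowMo collapses to 0.
    slowMo: Math.max(0, Number(options.slowMo) || 0),
    // Extra args are appended after the fixed base args; non-arrays are ignored.
    args: [...BROWSER_ARGS, ...(Array.isArray(options.args) ? options.args : [])],
  };
}

console.log(normalizeLaunchOptions({ headless: false, slowMo: "80", args: ["--window-size=1280,800"] }));
```

Because the check is `options.headless !== false`, values such as `undefined` or `0` still produce a headless launch. Only a literal `false` opens a visible window, which matches how `session.js` passes `headless: mode !== "visible"`.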
package/src/session.js CHANGED
@@ -2,10 +2,18 @@
2
2
 
3
3
  import { mkdirSync, readFileSync } from "node:fs";
4
4
  import { dirname, join } from "node:path";
5
- import { launch, newPage } from "./runway.js";
5
+ import { connectOverCDP, launch, newPage } from "./runway.js";
6
6
  import { resolveViewport } from "./viewport.js";
7
7
  import { gotoOrThrow, settle, CLICK_TIMEOUT_MS, WAIT_DEFAULT_TIMEOUT_MS } from "./overlays.js";
8
8
  import { resolveLocator } from "./selector.js";
9
+ import {
10
+ clearVisualAnnotations,
11
+ highlightVisualTarget,
12
+ installVisualOperator,
13
+ markVisualClick,
14
+ moveVisualCursor,
15
+ visualPointForLocator,
16
+ } from "./visual-operator.js";
9
17
 
10
18
  const MAX_DIAGNOSTICS = 250;
11
19
  const MAX_ACTIONS = 50;
@@ -37,8 +45,9 @@ function attachDiagnostics(page, diagnostics) {
37
45
  }
38
46
 
39
47
  export class BrowserSessionManager {
40
- constructor({ launchBrowser = launch, outputDir = process.cwd() } = {}) {
48
+ constructor({ launchBrowser = launch, connectBrowser = connectOverCDP, outputDir = process.cwd() } = {}) {
41
49
  this.launchBrowser = launchBrowser;
50
+ this.connectBrowser = connectBrowser;
42
51
  this.outputDir = outputDir;
43
52
  this.sessions = new Map();
44
53
  }
@@ -52,24 +61,41 @@ export class BrowserSessionManager {
52
61
  }
53
62
 
54
63
  list() {
55
- return [...this.sessions.values()].map(({ id, page, viewport, createdAt }) => ({ sessionId: id, url: page.url(), viewport, createdAt }));
64
+ return [...this.sessions.values()].map(({ id, page, viewport, mode, visual, createdAt }) => ({ sessionId: id, url: page.url(), viewport, mode, visual, createdAt }));
56
65
  }
57
66
 
58
67
  async open({ sessionId, url, flags = {}, storageState } = {}) {
59
68
  if (!url) throw new Error("url is required");
60
69
  const id = cleanId(sessionId);
61
70
  if (this.sessions.has(id)) await this.close(id);
62
- const browser = await this.launchBrowser();
71
+ const mode = flags.mode || (flags.cdpUrl ? "cdp" : flags.visible ? "visible" : "headless");
72
+ if (!new Set(["headless", "visible", "cdp"]).has(mode)) throw new Error("mode must be headless, visible, or cdp");
73
+ const visual = flags.visual === true || mode === "visible";
74
+ const recordVideoDir = flags.recordVideoDir || null;
75
+ if (recordVideoDir && mode === "cdp") throw new Error("video recording is not available in CDP mode; use visible or headless mode");
76
+ if (recordVideoDir) mkdirSync(recordVideoDir, { recursive: true });
77
+ const operatorOptions = { color: flags.operatorColor || "#ff2ea6" };
78
+ const browser = mode === "cdp"
79
+ ? await this.connectBrowser(flags.cdpUrl, { timeoutMs: flags.timeoutMs })
80
+ : await this.launchBrowser({ headless: mode !== "visible", slowMo: flags.slowMo });
63
81
  try {
64
82
  const viewport = resolveViewport(flags);
65
- const page = await newPage(browser, viewport, { storageState });
83
+ const context = mode === "cdp" ? browser.contexts()[0] : null;
84
+ if (mode === "cdp" && !context) throw new Error("CDP browser has no default context");
85
+ const page = await newPage(browser, viewport, { storageState, context, recordVideoDir });
66
86
  const diagnostics = { console: [], errors: [], requests: [], responses: [] };
67
87
  attachDiagnostics(page, diagnostics);
88
+ if (visual) page.on("domcontentloaded", () => installVisualOperator(page, operatorOptions).catch(() => {}));
68
89
  const nav = await gotoOrThrow(page, url, { timeoutMs: flags.timeoutMs });
69
90
  await settle(page);
70
- const session = { id, browser, page, context: page._pursrContext, viewport, diagnostics, createdAt: new Date().toISOString() };
91
+ if (visual) await installVisualOperator(page, operatorOptions);
92
+ const session = {
93
+ id, browser, page, context: page._pursrContext, viewport, mode, visual,
94
+ operatorOptions, diagnostics, video: page.video?.() || null,
95
+ createdAt: new Date().toISOString(),
96
+ };
71
97
  this.sessions.set(id, session);
72
- return { sessionId: id, url: page.url(), title: await page.title(), viewport, status: nav.status, createdAt: session.createdAt };
98
+ return { sessionId: id, url: page.url(), title: await page.title(), viewport, mode, visual, status: nav.status, createdAt: session.createdAt };
73
99
  } catch (error) {
74
100
  try { await browser.close(); } catch {}
75
101
  throw error;
@@ -131,29 +157,86 @@ export class BrowserSessionManager {
131
157
  async act(sessionId, actions = []) {
132
158
  if (!Array.isArray(actions) || !actions.length) throw new Error("actions must be a non-empty array");
133
159
  if (actions.length > MAX_ACTIONS) throw new Error(`actions cannot exceed ${MAX_ACTIONS}`);
134
- const { page } = this.get(sessionId);
160
+ const session = this.get(sessionId);
161
+ const { page, visual, operatorOptions } = session;
135
162
  const trace = [];
136
163
  for (let i = 0; i < actions.length; i++) {
137
164
  const action = actions[i] || {};
138
165
  const op = action.type || action.op;
139
166
  const step = { index: i, type: op };
140
167
  try {
141
- if (["click", "hover", "fill", "type", "check", "select"].includes(op)) {
168
+ if (["click", "doubleClick", "hover", "fill", "type", "check", "select"].includes(op) && action.selector) {
142
169
  const locator = await resolveLocator(page, action.selector);
143
170
  await locator.first().waitFor({ state: "visible", timeout: action.timeoutMs || CLICK_TIMEOUT_MS });
171
+ let point = null;
172
+ if (visual) {
173
+ point = await visualPointForLocator(locator.first());
174
+ await moveVisualCursor(page, point.x, point.y, { ...operatorOptions, durationMs: action.durationMs });
175
+ await highlightVisualTarget(page, point.rect, { ...operatorOptions, color: action.color, label: action.label || `${op}: ${action.selector}` });
176
+ step.cursor = { x: Math.round(point.x), y: Math.round(point.y) };
177
+ }
144
178
  if (op === "click") await locator.first().click();
179
+ else if (op === "doubleClick") await locator.first().dblclick();
145
180
  else if (op === "hover") await locator.first().hover();
146
181
  else if (op === "fill") await locator.first().fill(String(action.text ?? action.value ?? ""));
147
182
  else if (op === "type") await locator.first().pressSequentially(String(action.text ?? ""), { delay: action.delayMs || 10 });
148
183
  else if (op === "check") await locator.first().setChecked(action.checked !== false);
149
184
  else await locator.first().selectOption(action.value);
185
+ if (visual && ["click", "doubleClick"].includes(op) && point) await markVisualClick(page, point.x, point.y, { ...operatorOptions, color: action.color });
150
186
  step.selector = action.selector;
187
+ } else if (["click", "doubleClick"].includes(op) && Number.isFinite(Number(action.x)) && Number.isFinite(Number(action.y))) {
188
+ const x = Number(action.x), y = Number(action.y);
189
+ if (visual) await moveVisualCursor(page, x, y, { ...operatorOptions, durationMs: action.durationMs });
190
+ await page.mouse[op === "doubleClick" ? "dblclick" : "click"](x, y, { button: action.button || "left" });
191
+ if (visual) await markVisualClick(page, x, y, { ...operatorOptions, color: action.color });
192
+ step.cursor = { x: Math.round(x), y: Math.round(y) };
193
+ } else if (op === "drag") {
194
+ const start = action.fromSelector
195
+ ? await visualPointForLocator((await resolveLocator(page, action.fromSelector)).first())
196
+ : { x: Number(action.fromX), y: Number(action.fromY) };
197
+ const end = action.toSelector
198
+ ? await visualPointForLocator((await resolveLocator(page, action.toSelector)).first())
199
+ : { x: Number(action.toX), y: Number(action.toY) };
200
+ if (![start.x, start.y, end.x, end.y].every(Number.isFinite)) throw new Error("drag requires from/to coordinates or selectors");
201
+ if (visual) await moveVisualCursor(page, start.x, start.y, { ...operatorOptions, durationMs: action.durationMs });
202
+ await page.mouse.move(start.x, start.y);
203
+ await page.mouse.down({ button: action.button || "left" });
204
+ const steps = Math.max(1, Math.min(100, Number(action.steps) || 20));
205
+ await page.mouse.move(end.x, end.y, { steps });
206
+ await page.mouse.up({ button: action.button || "left" });
207
+ if (visual) {
208
+ await moveVisualCursor(page, end.x, end.y, { ...operatorOptions, durationMs: 0 });
209
+ await markVisualClick(page, end.x, end.y, { ...operatorOptions, color: action.color });
210
+ }
211
+ step.cursor = { x: Math.round(end.x), y: Math.round(end.y) };
151
212
  } else if (op === "press") await page.keyboard.press(String(action.key));
213
+ else if (op === "keyDown") await page.keyboard.down(String(action.key));
214
+ else if (op === "keyUp") await page.keyboard.up(String(action.key));
152
215
  else if (op === "scroll") await page.mouse.wheel(Number(action.deltaX) || 0, Number(action.deltaY) || 0);
153
216
  else if (op === "wait") await (await resolveLocator(page, action.selector)).first().waitFor({ state: action.state || "visible", timeout: action.timeoutMs || WAIT_DEFAULT_TIMEOUT_MS });
154
217
  else if (op === "sleep") await page.waitForTimeout(Math.max(0, Number(action.ms) || 0));
155
- else if (op === "navigate") await gotoOrThrow(page, action.url, { timeoutMs: action.timeoutMs });
156
- else if (op === "reload") await page.reload({ waitUntil: "domcontentloaded" });
218
+ else if (op === "navigate") {
219
+ await gotoOrThrow(page, action.url, { timeoutMs: action.timeoutMs });
220
+ if (visual) await installVisualOperator(page, operatorOptions);
221
+ } else if (op === "reload") {
222
+ await page.reload({ waitUntil: "domcontentloaded" });
223
+ if (visual) await installVisualOperator(page, operatorOptions);
224
+ } else if (op === "move") {
225
+ if (!visual) throw new Error("move requires a visual session");
226
+ step.cursor = await moveVisualCursor(page, action.x, action.y, { ...operatorOptions, durationMs: action.durationMs });
227
+ } else if (op === "annotate") {
228
+ if (!visual) throw new Error("annotate requires a visual session");
229
+ const locator = await resolveLocator(page, action.selector);
230
+ await locator.first().waitFor({ state: "visible", timeout: action.timeoutMs || CLICK_TIMEOUT_MS });
231
+ const point = await visualPointForLocator(locator.first());
232
+ await moveVisualCursor(page, point.x, point.y, { ...operatorOptions, durationMs: action.durationMs });
233
+ await highlightVisualTarget(page, point.rect, { ...operatorOptions, color: action.color, label: action.label || action.selector });
234
+ step.selector = action.selector;
235
+ step.cursor = { x: Math.round(point.x), y: Math.round(point.y) };
236
+ } else if (op === "clearAnnotations") {
237
+ if (!visual) throw new Error("clearAnnotations requires a visual session");
238
+ await clearVisualAnnotations(page, { keepCursor: action.keepCursor !== false });
239
+ }
157
240
  else if (op === "eval") step.result = await page.evaluate(String(action.js || ""));
158
241
  else throw new Error(`unknown action type: ${op}`);
159
242
  if (action.settleMs) await page.waitForTimeout(Number(action.settleMs));
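The order of the branches in this hunk decides how each action is routed. As a reading aid, the sketch below mirrors only the routing conditions and makes no browser calls. `classifyAction` and `dragSteps` are hypothetical helper names that do not exist in pursr; the conditions themselves are copied from the hunk above.

```javascript
// Hypothetical helpers that mirror the routing conditions in the hunk above.
// They do not touch a browser; they only report which branch would run.
const SELECTOR_OPS = ["click", "doubleClick", "hover", "fill", "type", "check", "select"];
const OTHER_OPS = [
  "press", "keyDown", "keyUp", "scroll", "wait", "sleep",
  "navigate", "reload", "move", "annotate", "clearAnnotations", "eval",
];

function classifyAction(action = {}) {
  const op = action.type || action.op;
  // Selector branch: the new code also requires a truthy `selector`.
  if (SELECTOR_OPS.includes(op) && action.selector) return "selector";
  // Coordinate branch: only click/doubleClick, and only with finite x and y.
  if (
    ["click", "doubleClick"].includes(op) &&
    Number.isFinite(Number(action.x)) &&
    Number.isFinite(Number(action.y))
  ) {
    return "coordinate";
  }
  if (op === "drag") return "drag";
  return OTHER_OPS.includes(op) ? op : "unknown";
}

// Same clamp as the drag branch: default 20, bounded to the range 1..100.
function dragSteps(steps) {
  return Math.max(1, Math.min(100, Number(steps) || 20));
}

console.log(classifyAction({ type: "click", selector: "#save", x: 1, y: 2 }));
console.log(classifyAction({ type: "doubleClick", x: 40, y: 80 }));
console.log(classifyAction({ type: "hover" }));
```

One consequence of the added `&& action.selector` guard: a `hover`, `fill`, `type`, `check`, or `select` action without a selector no longer reaches `resolveLocator` and instead falls through to the `unknown action type` error. When an action carries both a selector and coordinates, the selector branch wins.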
@@ -194,8 +277,16 @@ export class BrowserSessionManager {
194
277
  const session = this.sessions.get(id);
195
278
  if (!session) return { sessionId: id, closed: false };
196
279
  this.sessions.delete(id);
280
+ let video = null;
281
+ try {
282
+ if (session.mode === "cdp") await session.page.close();
283
+ else await session.context.close();
284
+ } catch {}
197
285
  try { await session.browser.close(); } catch {}
198
- return { sessionId: id, closed: true };
286
+ if (session.video) {
287
+ try { video = await session.video.path(); } catch {}
288
+ }
289
+ return { sessionId: id, closed: true, video };
199
290
  }
200
291
 
201
292
  async closeAll() {
@@ -0,0 +1,124 @@
1
+ // Visible cursor and interaction feedback for agent-driven browser sessions.
2
+
3
+ const DEFAULT_COLOR = "#ff2ea6";
4
+
5
+ function safeColor(value) {
6
+ const color = String(value || DEFAULT_COLOR).trim();
7
+ if (/^#[0-9a-f]{3,8}$/i.test(color)) return color;
8
+ if (/^(rgb|hsl)a?\([\d\s.,%+-]+\)$/i.test(color)) return color;
9
+ if (/^[a-z]{1,24}$/i.test(color)) return color;
10
+ return DEFAULT_COLOR;
11
+ }
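`safeColor` matters because the resulting string is later interpolated into `innerHTML` (the SVG `fill` attribute) and set as a CSS custom property. The standalone copy below can be run outside the package to see which inputs pass the allowlist; the sample values are illustrative and not taken from the package.

```javascript
// Standalone copy of the allowlist from the diff, for experimentation.
const DEFAULT_COLOR = "#ff2ea6";

function safeColor(value) {
  const color = String(value || DEFAULT_COLOR).trim();
  if (/^#[0-9a-f]{3,8}$/i.test(color)) return color;              // hex notation
  if (/^(rgb|hsl)a?\([\d\s.,%+-]+\)$/i.test(color)) return color; // rgb()/hsl() with numeric arguments
  if (/^[a-z]{1,24}$/i.test(color)) return color;                 // bare word such as "tomato"
  return DEFAULT_COLOR;
}

// Illustrative inputs: three accepted forms, then three that fall back.
const samples = ["#0af", "rgba(255, 46, 166, 0.5)", "tomato", 'red" onload="alert(1)', "var(--x)", ""];
for (const sample of samples) {
  console.log(JSON.stringify(sample), "->", safeColor(sample));
}
```

The patterns check shape rather than CSS validity: a five-digit hex string or a made-up word such as `notacolor` passes unchanged. Anything containing quotes, angle brackets, or function names other than `rgb`/`hsl` is replaced by the default.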
12
+
13
+ export async function installVisualOperator(page, options = {}) {
14
+ const color = safeColor(options.color);
15
+ await page.evaluate(({ color }) => {
16
+ if (document.getElementById("__pursr_operator_style__")) return;
17
+ const style = document.createElement("style");
18
+ style.id = "__pursr_operator_style__";
19
+ style.textContent = `
20
+ #__pursr_cursor__ { position: fixed; left: 0; top: 0; width: 28px; height: 34px;
21
+ pointer-events: none; z-index: 2147483647; transform: translate(24px, 24px);
22
+ filter: drop-shadow(0 2px 2px rgba(0,0,0,.55)); transition: none; }
23
+ #__pursr_cursor__ svg { display: block; width: 100%; height: 100%; }
24
+ .__pursr_target__ { position: fixed; pointer-events: none; z-index: 2147483645;
25
+ border: 3px solid var(--pursr-color); border-radius: 7px;
26
+ box-shadow: 0 0 0 2px rgba(255,255,255,.92), 0 0 18px var(--pursr-color); }
27
+ .__pursr_label__ { position: absolute; left: -3px; bottom: calc(100% + 7px);
28
+ padding: 3px 7px; border-radius: 4px; background: var(--pursr-color); color: white;
29
+ font: 700 12px/1.3 ui-monospace, SFMono-Regular, Consolas, monospace;
30
+ white-space: nowrap; text-shadow: 0 1px 1px rgba(0,0,0,.35); }
31
+ .__pursr_click__ { position: fixed; width: 28px; height: 28px; margin: -14px 0 0 -14px;
32
+ pointer-events: none; z-index: 2147483646; border: 4px solid var(--pursr-color);
33
+ border-radius: 50%; box-shadow: 0 0 0 3px rgba(255,255,255,.9), 0 0 20px var(--pursr-color); }
34
+ `;
35
+ document.documentElement.appendChild(style);
36
+ const cursor = document.createElement("div");
37
+ cursor.id = "__pursr_cursor__";
38
+ cursor.dataset.x = "24";
39
+ cursor.dataset.y = "24";
40
+ cursor.style.setProperty("--pursr-color", color);
41
+ cursor.innerHTML = `<svg viewBox="0 0 28 34" aria-hidden="true"><path d="M3 2.5V27l6.8-6.2 4.7 10.2 5.2-2.5-4.7-9.8 9.4-.2z" fill="${color}" stroke="#fff" stroke-width="2.4" stroke-linejoin="round"/><path d="M3 2.5V27l6.8-6.2 4.7 10.2 5.2-2.5-4.7-9.8 9.4-.2z" fill="none" stroke="#16131a" stroke-width="1" stroke-linejoin="round"/></svg>`;
42
+ document.documentElement.appendChild(cursor);
43
+ }, { color });
44
+ }
45
+
46
+ export async function moveVisualCursor(page, x, y, options = {}) {
47
+ await installVisualOperator(page, options);
48
+ const durationMs = Math.max(0, Math.min(3000, Number(options.durationMs) || 220));
49
+ const point = { x: Math.round(Number(x) || 0), y: Math.round(Number(y) || 0) };
50
+ await page.evaluate(async ({ point, durationMs }) => {
51
+ const cursor = document.getElementById("__pursr_cursor__");
52
+ if (!cursor) return;
53
+ const startX = Number(cursor.dataset.x) || 0;
54
+ const startY = Number(cursor.dataset.y) || 0;
55
+ const started = performance.now();
56
+ await new Promise((resolve) => {
57
+ const frame = (now) => {
58
+ const progress = durationMs ? Math.min(1, (now - started) / durationMs) : 1;
59
+ const eased = 1 - Math.pow(1 - progress, 3);
60
+ const nextX = startX + (point.x - startX) * eased;
61
+ const nextY = startY + (point.y - startY) * eased;
62
+ cursor.style.transform = `translate(${nextX}px, ${nextY}px)`;
63
+ if (progress < 1) requestAnimationFrame(frame);
64
+ else resolve();
65
+ };
66
+ requestAnimationFrame(frame);
67
+ });
68
+ cursor.dataset.x = String(point.x);
69
+ cursor.dataset.y = String(point.y);
70
+ }, { point, durationMs });
71
+ await page.mouse.move(point.x, point.y, { steps: Math.max(1, Math.min(20, Math.ceil(durationMs / 20))) });
72
+ return point;
73
+ }
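The arithmetic in `moveVisualCursor` can be isolated into pure functions, which makes the duration clamp and the ease-out cubic curve easier to inspect. The helper names below (`clampDuration`, `easeOutCubic`, `cursorAt`, `mouseSteps`) are hypothetical. Note one deliberate difference: the diff uses `Number(options.durationMs) || 220`, which treats a literal `0` as unset, so the `durationMs: 0` passed by the drag branch resolves to 220. The sketch keeps an explicit `0` as an instant move, as a variation rather than the package's behavior.

```javascript
// Hypothetical helpers isolating the arithmetic in moveVisualCursor.

// Clamp to 0..3000 ms. Unlike `Number(value) || 220`, an explicit 0 is kept.
function clampDuration(value, fallback = 220) {
  const n = value == null ? NaN : Number(value);
  return Math.max(0, Math.min(3000, Number.isFinite(n) ? n : fallback));
}

// Ease-out cubic, the same curve as `1 - Math.pow(1 - progress, 3)` in the diff.
function easeOutCubic(progress) {
  return 1 - Math.pow(1 - progress, 3);
}

// Cursor position at a given progress (0..1) between two points.
function cursorAt(start, end, progress) {
  const eased = easeOutCubic(progress);
  return {
    x: start.x + (end.x - start.x) * eased,
    y: start.y + (end.y - start.y) * eased,
  };
}

// Step count handed to page.mouse.move: one step per 20 ms, bounded to 1..20.
function mouseSteps(durationMs) {
  return Math.max(1, Math.min(20, Math.ceil(durationMs / 20)));
}

const halfway = cursorAt({ x: 24, y: 24 }, { x: 124, y: 224 }, 0.5);
console.log(halfway, mouseSteps(clampDuration(undefined)));
```

At half the elapsed time the cursor has already covered 87.5% of the distance, which is what gives the overlay its fast-start, slow-finish motion.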
74
+
75
+ export async function highlightVisualTarget(page, rect, options = {}) {
76
+ await installVisualOperator(page, options);
77
+ const color = safeColor(options.color);
78
+ const label = String(options.label || "target").slice(0, 80);
79
+ await page.evaluate(({ rect, color, label }) => {
80
+ document.querySelectorAll(".__pursr_target__").forEach((node) => node.remove());
81
+ const target = document.createElement("div");
82
+ target.className = "__pursr_target__";
83
+ target.style.setProperty("--pursr-color", color);
84
+ target.style.left = `${Math.round(rect.x)}px`;
85
+ target.style.top = `${Math.round(rect.y)}px`;
86
+ target.style.width = `${Math.max(0, Math.round(rect.width))}px`;
87
+ target.style.height = `${Math.max(0, Math.round(rect.height))}px`;
88
+ const tag = document.createElement("span");
89
+ tag.className = "__pursr_label__";
90
+ tag.textContent = label;
91
+ target.appendChild(tag);
92
+ document.documentElement.appendChild(target);
93
+ }, { rect, color, label });
94
+ }
95
+
96
+ export async function markVisualClick(page, x, y, options = {}) {
97
+ await installVisualOperator(page, options);
98
+ const color = safeColor(options.color);
99
+ await page.evaluate(({ x, y, color }) => {
100
+ document.querySelectorAll(".__pursr_click__").forEach((node) => node.remove());
101
+ const marker = document.createElement("div");
102
+ marker.className = "__pursr_click__";
103
+ marker.style.setProperty("--pursr-color", color);
104
+ marker.style.left = `${Math.round(x)}px`;
105
+ marker.style.top = `${Math.round(y)}px`;
106
+ document.documentElement.appendChild(marker);
107
+ }, { x, y, color });
108
+ }
109
+
110
+ export async function clearVisualAnnotations(page, { keepCursor = true } = {}) {
111
+ await page.evaluate(({ keepCursor }) => {
112
+ document.querySelectorAll(".__pursr_target__, .__pursr_click__").forEach((node) => node.remove());
113
+ if (!keepCursor) {
114
+ document.getElementById("__pursr_cursor__")?.remove();
115
+ document.getElementById("__pursr_operator_style__")?.remove();
116
+ }
117
+ }, { keepCursor });
118
+ }
119
+
120
+ export async function visualPointForLocator(locator) {
121
+ const rect = await locator.boundingBox();
122
+ if (!rect) throw new Error("target has no visible bounding box");
123
+ return { rect, x: rect.x + rect.width / 2, y: rect.y + rect.height / 2 };
124
+ }
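`visualPointForLocator` targets the center of the element's bounding box, and the action runner then rounds that point before recording it as `step.cursor`. Playwright's `locator.boundingBox()` resolves to `null` when the element is not visible, which is the case the thrown error covers. A minimal sketch of the same computation, using a hypothetical `pointForRect` that takes the rectangle directly instead of a locator:

```javascript
// Hypothetical stand-in for visualPointForLocator that takes the rect directly,
// so the geometry can be checked without a browser.
function pointForRect(rect) {
  if (!rect) throw new Error("target has no visible bounding box");
  return { rect, x: rect.x + rect.width / 2, y: rect.y + rect.height / 2 };
}

// An odd width gives a fractional center; the runner rounds it for the trace.
const point = pointForRect({ x: 10, y: 20, width: 101, height: 40 });
const cursor = { x: Math.round(point.x), y: Math.round(point.y) };
console.log(point.x, point.y, cursor);
```

The unrounded point is what gets passed to `moveVisualCursor`, which rounds again internally, so the recorded `step.cursor` and the rendered cursor position agree.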