barebrowse 0.6.1 → 0.7.0

package/CHANGELOG.md CHANGED
@@ -1,5 +1,38 @@
  # Changelog

+ ## 0.7.0
+
+ MCP resilience: timeouts, auto-retry, LLM-friendly scroll, and click fallback for hidden elements.
+
+ ### Timeouts (`mcp-server.js`)
+ - All MCP tool calls now have a hard timeout: 30s for session tools, 60s for `browse` and `assess`
+ - Returns a structured error (`Tool "X" timed out after Ns`) instead of hanging silently
+ - Previously: a hung browser or slow page caused `[Tool result missing due to internal error]`, which was opaque and unrecoverable
+
+ ### Auto-retry (`mcp-server.js`)
+ - `withRetry()` wrapper on all session tools (goto, snapshot, click, type, press, scroll, back, forward, drag, upload, pdf)
+ - On transient CDP failure (WebSocket closed, target/session closed), resets the session and retries once automatically
+ - Non-CDP errors (validation, unknown tool) are not retried
+
+ ### LLM-friendly scroll (`mcp-server.js`, `src/bareagent.js`)
+ - Scroll tool now accepts `direction: "up"/"down"` in addition to numeric `deltaY`
+ - LLMs naturally say `scroll(direction: "down")`; this now works instead of crashing with `deltaX/deltaY expected for mouseWheel event`
+ - `"down"` → `deltaY: 900`, `"up"` → `deltaY: -900`. Numeric `deltaY` still works and takes precedence.
+ - Clear validation error if neither `direction` nor `deltaY` is provided
+
+ ### Click JS fallback (`src/interact.js`)
+ - Click now falls back to JS `element.click()` when `DOM.scrollIntoViewIfNeeded` fails with "Node does not have a layout object"
+ - This error occurs on elements that exist in the ARIA tree but have no visual layout (display:none, zero-size, collapsed sections, detached nodes)
+ - Resolves the node via `DOM.requestNode` → `DOM.resolveNode` → `Runtime.callFunctionOn`
+ - Other click errors still throw normally
+
+ ### Docs
+ - Updated barebrowse.context.md, README.md, prd.md with resilience features
+ - MCP server version string updated to 0.7.0
+
+ ### Tests
+ - 71/71 passing, no test changes needed
+
  ## 0.6.1

  Headed fallback is now a per-navigation escape hatch, not a permanent mode switch. Graceful degradation when headed is unavailable.
package/README.md CHANGED
@@ -87,7 +87,7 @@ Or manually add to your config (`claude_desktop_config.json`, `.cursor/mcp.json`
  }
  ```

- 12 tools: `browse`, `goto`, `snapshot`, `click`, `type`, `press`, `scroll`, `back`, `forward`, `drag`, `upload`, `pdf`. Plus `assess` (privacy scan) if [wearehere](https://github.com/hamr0/wearehere) is installed. Session runs in hybrid mode with automatic cookie injection.
+ 12 tools: `browse`, `goto`, `snapshot`, `click`, `type`, `press`, `scroll`, `back`, `forward`, `drag`, `upload`, `pdf`. Plus `assess` (privacy scan) if [wearehere](https://github.com/hamr0/wearehere) is installed. Session runs in hybrid mode with automatic cookie injection. All tools have timeouts (30s/60s) and auto-retry on transient failures.

  ### 3. Library -- for agentic automation

@@ -129,10 +129,10 @@ Everything the agent can do through barebrowse:
  | **Navigate** | Load a URL, wait for page load, auto-dismiss consent |
  | **Back / Forward** | Browser history navigation |
  | **Snapshot** | Pruned ARIA tree with `[ref=N]` markers. Two modes: `act` (buttons, links, inputs) and `read` (full text). 40-90% token reduction. |
- | **Click** | Scroll into view + mouse click at element center |
+ | **Click** | Scroll into view + mouse click at element center, JS fallback for hidden elements |
  | **Type** | Focus + insert text, with option to clear existing content first |
  | **Press** | Special keys: Enter, Tab, Escape, Backspace, Delete, arrows, Space |
- | **Scroll** | Mouse wheel up or down |
+ | **Scroll** | Mouse wheel up or down (accepts direction or pixels) |
  | **Hover** | Move mouse to element center (triggers tooltips, hover states) |
  | **Select** | Set dropdown value (native select or custom dropdown) |
  | **Drag** | Drag one element to another (Kanban boards, sliders) |
@@ -1,7 +1,7 @@
  # barebrowse -- Integration Guide

  > For AI assistants and developers wiring barebrowse into a project.
- > v0.6.1 | Node.js >= 22 | 0 required deps | MIT
+ > v0.7.0 | Node.js >= 22 | 0 required deps | MIT

  ## What this is

@@ -59,7 +59,7 @@ const snapshot = await browse('https://example.com', {
  | `click(ref)` | ref: string | void | Scroll into view + mouse press+release at center |
  | `type(ref, text, opts?)` | ref: string, text: string, opts: { clear?, keyEvents? } | void | Focus + insert text. `clear: true` replaces existing. |
  | `press(key)` | key: string | void | Special key: Enter, Tab, Escape, Backspace, Delete, arrows, Home, End, PageUp, PageDown, Space |
- | `scroll(deltaY)` | deltaY: number | void | Mouse wheel. Positive = down, negative = up. |
+ | `scroll(deltaY)` | deltaY: number | void | Mouse wheel. Positive = down, negative = up. MCP/bareagent also accept `direction: "up"/"down"`. |
  | `hover(ref)` | ref: string | void | Move mouse to element center |
  | `select(ref, value)` | ref: string, value: string | void | Set `<select>` value or click custom dropdown option |
  | `drag(fromRef, toRef)` | fromRef: string, toRef: string | void | Drag from one element to another |
@@ -149,7 +149,7 @@ barebrowse can inject cookies from the user's real browser sessions, bypassing l
  | Media autoplay blocked | `--autoplay-policy=no-user-gesture-required` | Both |
  | Login walls | Cookie extraction from Firefox/Chromium + CDP injection | Both |
  | Pre-filled form inputs | `type({ clear: true })` selects all + deletes first | Both |
- | Off-screen elements | `DOM.scrollIntoViewIfNeeded` before every click | Both |
+ | Off-screen elements | `DOM.scrollIntoViewIfNeeded` before every click, JS `.click()` fallback for no-layout elements | Both |
  | Form submission | `press('Enter')` triggers onsubmit | Both |
  | SPA navigation | `waitForNavigation()` uses loadEventFired + frameNavigated | Both |
  | Bot detection | ARIA node count (<50 = bot-blocked) + text heuristics. `botBlocked` flag set after every `goto()`. Hybrid fallback switches to headed. Snapshot shows `[BOT CHALLENGE DETECTED]` warning. | Hybrid |
@@ -241,7 +241,7 @@ Action tools return `'ok'` -- the agent calls `snapshot` explicitly to observe.

  Session runs in hybrid mode (headless with automatic headed fallback on bot detection). `goto` injects cookies from the user's browser before navigation for authenticated access.

- Session tools share a singleton page, lazy-created on first use. Assess tries headless first; if bot-blocked (score ≤5 with all zeros), retries with a separate headed session. Tabs dismissed for consent and closed after every scan. Max 3 concurrent. Browser OOM/crash auto-recovers (session resets, server stays alive).
+ Session tools share a singleton page, lazy-created on first use. All session tools auto-retry on transient CDP failures (browser crash, WebSocket close): the session resets and the call is retried once automatically. 30s timeout on all tools (60s for `browse`/`assess`). Scroll accepts `direction: "up"/"down"` in addition to numeric `deltaY`. Click falls back to JS `.click()` when elements have no layout. Assess tries headless first; if bot-blocked, retries headed. Browser OOM/crash auto-recovers (session resets, server stays alive).

  ## Architecture

package/mcp-server.js CHANGED
@@ -25,6 +25,18 @@ function isCdpDead(err) {
    return m.includes('WebSocket') || m.includes('Target closed') || m.includes('Session closed') || m.includes('CDP');
  }

+ /** Retry-once wrapper for transient CDP failures. Resets session and retries. */
+ async function withRetry(fn) {
+   try {
+     return await fn();
+   } catch (err) {
+     if (!isCdpDead(err)) throw err;
+     // CDP died: reset session and retry once
+     _page = null;
+     return await fn();
+   }
+ }
+
  const MAX_CHARS_DEFAULT = 30000;
  const OUTPUT_DIR = join(process.cwd(), '.barebrowse');

@@ -139,13 +151,13 @@ const TOOLS = [
    },
    {
      name: 'scroll',
-     description: 'Scroll the page. Positive deltaY scrolls down, negative scrolls up. Returns ok.',
+     description: 'Scroll the page up or down. Pass direction ("up"/"down") or a numeric deltaY. Returns ok.',
      inputSchema: {
        type: 'object',
        properties: {
-         deltaY: { type: 'number', description: 'Pixels to scroll (positive=down, negative=up)' },
+         direction: { type: 'string', enum: ['up', 'down'], description: 'Scroll direction "up" or "down" (scrolls ~3 screen-heights)' },
+         deltaY: { type: 'number', description: 'Pixels to scroll (positive=down, negative=up). Overrides direction if both given.' },
        },
-       required: ['deltaY'],
      },
    },
    {
@@ -222,13 +234,13 @@ async function handleToolCall(name, args) {
        }
        return text;
      }
-     case 'goto': {
+     case 'goto': return withRetry(async () => {
        const page = await getPage();
        try { await page.injectCookies(args.url); } catch {}
        await page.goto(args.url);
        return 'ok';
-     }
-     case 'snapshot': {
+     });
+     case 'snapshot': return withRetry(async () => {
        const page = await getPage();
        const text = await page.snapshot();
        const limit = args.maxChars ?? MAX_CHARS_DEFAULT;
@@ -237,51 +249,58 @@ async function handleToolCall(name, args) {
          return `Snapshot (${text.length} chars) saved to ${file}`;
        }
        return text;
-     }
-     case 'click': {
+     });
+     case 'click': return withRetry(async () => {
        const page = await getPage();
        await page.click(args.ref);
        return 'ok';
-     }
-     case 'type': {
+     });
+     case 'type': return withRetry(async () => {
        const page = await getPage();
        await page.type(args.ref, args.text, { clear: args.clear });
        return 'ok';
-     }
-     case 'press': {
+     });
+     case 'press': return withRetry(async () => {
        const page = await getPage();
        await page.press(args.key);
        return 'ok';
-     }
-     case 'scroll': {
+     });
+     case 'scroll': return withRetry(async () => {
        const page = await getPage();
-       await page.scroll(args.deltaY);
+       let dy = args.deltaY;
+       if (dy == null && args.direction) {
+         dy = args.direction === 'up' ? -900 : 900;
+       }
+       if (dy == null || typeof dy !== 'number') {
+         throw new Error('scroll requires "direction" ("up"/"down") or numeric "deltaY"');
+       }
+       await page.scroll(dy);
        return 'ok';
-     }
-     case 'back': {
+     });
+     case 'back': return withRetry(async () => {
        const page = await getPage();
        await page.goBack();
        return 'ok';
-     }
-     case 'forward': {
+     });
+     case 'forward': return withRetry(async () => {
        const page = await getPage();
        await page.goForward();
        return 'ok';
-     }
-     case 'drag': {
+     });
+     case 'drag': return withRetry(async () => {
        const page = await getPage();
        await page.drag(args.fromRef, args.toRef);
        return 'ok';
-     }
-     case 'upload': {
+     });
+     case 'upload': return withRetry(async () => {
        const page = await getPage();
        await page.upload(args.ref, args.files);
        return 'ok';
-     }
-     case 'pdf': {
+     });
+     case 'pdf': return withRetry(async () => {
        const page = await getPage();
        return await page.pdf({ landscape: args.landscape });
-     }
+     });
      case 'assess': {
        if (!assessFn) throw new Error('wearehere is not installed. Run: npm install wearehere');
        const releaseSlot = await acquireAssessSlot();
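The retry-once wrapping applied to the session tools in this file can be tried in isolation. The following is a minimal sketch, not the package's code: `isTransient`, `retryOnce`, `resetSession`, and the `flaky` call are hypothetical stand-ins for `isCdpDead`, `withRetry`, the `_page = null` reset, and a real tool handler.

```javascript
// Hypothetical stand-in for isCdpDead(): treats a few message substrings as transient.
function isTransient(err) {
  const m = String(err && err.message);
  return m.includes('WebSocket') || m.includes('Target closed') || m.includes('Session closed');
}

// Retry-once wrapper: run fn, and on a transient error reset state and run fn one more time.
async function retryOnce(fn, resetSession) {
  try {
    return await fn();
  } catch (err) {
    if (!isTransient(err)) throw err; // validation errors and the like are not retried
    resetSession();
    return await fn(); // a second failure propagates to the caller
  }
}

// Demo: the first call fails with a transient error, the second succeeds.
let calls = 0;
let resets = 0;
const flaky = async () => {
  calls += 1;
  if (calls === 1) throw new Error('WebSocket is not open');
  return 'ok';
};
const demo = retryOnce(flaky, () => { resets += 1; });
demo.then((result) => console.log(result, calls, resets));
```

One design point worth keeping in mind: a retried action such as `type` may run twice if the first attempt partly succeeded before the connection dropped, so whether a blind retry is safe depends on the tool.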
@@ -350,7 +369,7 @@ async function handleMessage(msg) {
      return jsonrpcResponse(id, {
        protocolVersion: '2024-11-05',
        capabilities: { tools: {} },
-       serverInfo: { name: 'barebrowse', version: '0.6.0' },
+       serverInfo: { name: 'barebrowse', version: '0.7.0' },
      });
    }

@@ -364,12 +383,19 @@ async function handleMessage(msg) {

    if (method === 'tools/call') {
      const { name, arguments: args } = params;
+     const TOOL_TIMEOUT = name === 'browse' || name === 'assess' ? 60000 : 30000;
      try {
-       const result = await handleToolCall(name, args || {});
+       let timer;
+       const result = await Promise.race([
+         handleToolCall(name, args || {}),
+         new Promise((_, rej) => { timer = setTimeout(() => rej(new Error(`Tool "${name}" timed out after ${TOOL_TIMEOUT / 1000}s`)), TOOL_TIMEOUT); }),
+       ]);
+       clearTimeout(timer);
        return jsonrpcResponse(id, {
          content: [{ type: 'text', text: typeof result === 'string' ? result : JSON.stringify(result) }],
        });
      } catch (err) {
+       if (isCdpDead(err)) _page = null;
        return jsonrpcResponse(id, {
          content: [{ type: 'text', text: `Error: ${err.message}` }],
          isError: true,
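The timeout race in the `tools/call` handler can be sketched as a small reusable helper. This is an illustrative sketch under assumptions, not the package's implementation: `withTimeout` and `sleep` are hypothetical names, and the durations are shortened for demonstration. Unlike the hunk above, which clears the timer only on the success path, the sketch clears it in `finally` so the timer is also cancelled when the tool call itself rejects.

```javascript
// Hypothetical helper: reject if `promise` does not settle within `ms` milliseconds.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`Tool "${label}" timed out after ${ms / 1000}s`)),
      ms,
    );
  });
  // finally() runs on success, failure, and timeout, so the timer never outlives the call.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Resolve with `value` after `ms` milliseconds (stands in for a tool call).
const sleep = (ms, value) => new Promise((resolve) => setTimeout(() => resolve(value), ms));

// A fast call resolves normally; a slow one is rejected by the timeout.
const fast = withTimeout(sleep(5, 'ok'), 50, 'click');
const slow = withTimeout(sleep(200, 'late'), 20, 'goto').catch((err) => err.message);

fast.then((value) => console.log(value));
slow.then((message) => console.log(message));
```

Note that `Promise.race` only stops waiting for the slow call; it does not cancel the underlying work, which keeps running in the background until it settles on its own.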
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "barebrowse",
-   "version": "0.6.1",
+   "version": "0.7.0",
    "description": "Authenticated web browsing for autonomous agents via CDP. URL in, pruned ARIA snapshot out.",
    "type": "module",
    "main": "src/index.js",
package/src/bareagent.js CHANGED
@@ -116,15 +116,20 @@ export function createBrowseTools(opts = {}) {
      },
      {
        name: 'scroll',
-       description: 'Scroll the page. Returns the updated snapshot.',
+       description: 'Scroll the page up or down. Pass direction ("up"/"down") or a numeric deltaY. Returns the updated snapshot.',
        parameters: {
          type: 'object',
          properties: {
-           deltaY: { type: 'number', description: 'Pixels to scroll (positive=down, negative=up)' },
+           direction: { type: 'string', enum: ['up', 'down'], description: 'Scroll direction "up" or "down" (scrolls ~3 screen-heights)' },
+           deltaY: { type: 'number', description: 'Pixels to scroll (positive=down, negative=up). Overrides direction if both given.' },
          },
-         required: ['deltaY'],
        },
-       execute: async ({ deltaY }) => actionAndSnapshot((page) => page.scroll(deltaY)),
+       execute: async ({ direction, deltaY }) => {
+         let dy = deltaY;
+         if (dy == null && direction) dy = direction === 'up' ? -900 : 900;
+         if (dy == null || typeof dy !== 'number') throw new Error('scroll requires "direction" or numeric "deltaY"');
+         return actionAndSnapshot((page) => page.scroll(dy));
+       },
      },
      {
        name: 'select',
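The argument handling added to the scroll tool is easy to check as a pure function. The sketch below copies the logic from the hunk into a hypothetical helper named `resolveDeltaY`; the helper name and the standalone form are assumptions for illustration, not part of the package.

```javascript
// Hypothetical helper mirroring the scroll argument handling shown in the diff.
function resolveDeltaY({ direction, deltaY } = {}) {
  let dy = deltaY;
  // A numeric deltaY wins; direction is only consulted when deltaY is absent.
  if (dy == null && direction) dy = direction === 'up' ? -900 : 900;
  if (dy == null || typeof dy !== 'number') {
    throw new Error('scroll requires "direction" or numeric "deltaY"');
  }
  return dy;
}

console.log(resolveDeltaY({ direction: 'down' }));            // 900
console.log(resolveDeltaY({ direction: 'up' }));              // -900
console.log(resolveDeltaY({ direction: 'up', deltaY: 250 })); // 250
```

Two consequences of this logic: any truthy `direction` other than `"up"` maps to 900, so it is the schema's `enum` that actually restricts the value to `"up"`/`"down"`, and a `deltaY` passed as a string such as `"300"` is rejected rather than coerced.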
package/src/interact.js CHANGED
@@ -46,13 +46,27 @@ async function getCenter(session, backendNodeId) {
   * @param {number} backendNodeId - Backend DOM node ID
   */
  export async function click(session, backendNodeId) {
-   const { x, y } = await getCenter(session, backendNodeId);
-   await session.send('Input.dispatchMouseEvent', {
-     type: 'mousePressed', x, y, button: 'left', clickCount: 1,
-   });
-   await session.send('Input.dispatchMouseEvent', {
-     type: 'mouseReleased', x, y, button: 'left', clickCount: 1,
-   });
+   try {
+     const { x, y } = await getCenter(session, backendNodeId);
+     await session.send('Input.dispatchMouseEvent', {
+       type: 'mousePressed', x, y, button: 'left', clickCount: 1,
+     });
+     await session.send('Input.dispatchMouseEvent', {
+       type: 'mouseReleased', x, y, button: 'left', clickCount: 1,
+     });
+   } catch (err) {
+     // Element has no layout (display:none, zero-size, detached): fall back to JS click
+     if (err.message && err.message.includes('layout object')) {
+       const { nodeId } = await session.send('DOM.requestNode', { backendNodeId });
+       const { object } = await session.send('DOM.resolveNode', { nodeId });
+       await session.send('Runtime.callFunctionOn', {
+         objectId: object.objectId,
+         functionDeclaration: 'function() { this.click(); }',
+       });
+     } else {
+       throw err;
+     }
+   }
  }

  /**
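The click fallback can be exercised without a browser by faking the CDP session. The sketch below is an assumption-laden illustration, not the package's code: `makeFakeSession` and `clickWithFallback` are hypothetical names, and the fake only imitates the "layout object" failure. Going by the published CDP reference, `DOM.resolveNode` accepts a `backendNodeId` directly while `DOM.requestNode` is documented as taking an `objectId`, so the sketch resolves the node in one step instead of following the `DOM.requestNode` call in the hunk above.

```javascript
// Hypothetical fake of a CDP session: records calls and fails the scroll step
// the way Chrome reports an element that has no layout.
function makeFakeSession() {
  const calls = [];
  return {
    calls,
    async send(method, params) {
      calls.push(method);
      if (method === 'DOM.scrollIntoViewIfNeeded') {
        throw new Error('Node does not have a layout object');
      }
      if (method === 'DOM.resolveNode') return { object: { objectId: 'obj-1' } };
      return {};
    },
  };
}

// Try the mouse path first; fall back to element.click() only on the layout error.
async function clickWithFallback(session, backendNodeId) {
  try {
    await session.send('DOM.scrollIntoViewIfNeeded', { backendNodeId });
    // A real implementation would compute the element center and dispatch mouse events here.
  } catch (err) {
    if (!(err.message && err.message.includes('layout object'))) throw err;
    // Assumption: resolve the remote object straight from the backend node id.
    const { object } = await session.send('DOM.resolveNode', { backendNodeId });
    await session.send('Runtime.callFunctionOn', {
      objectId: object.objectId,
      functionDeclaration: 'function() { this.click(); }',
    });
  }
}

const session = makeFakeSession();
const done = clickWithFallback(session, 42).then(() => session.calls);
done.then((calls) => console.log(calls.join(' -> ')));
```

A JS `click()` dispatches a synthetic click event and skips the pointer movement and hit-testing of a real mouse click, so pages that listen for `mousedown` or hover state may react differently on the fallback path.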