browserclaw 0.3.2 → 0.3.4

package/README.md CHANGED
@@ -41,30 +41,35 @@ The snapshot + ref pattern means:

  The AI browser automation space is moving fast. Here's how browserclaw compares to the major alternatives.

- | | [browserclaw](https://github.com/idan-rubin/browserclaw) | [browser-use](https://github.com/browser-use/browser-use) | [Stagehand](https://github.com/browserbase/stagehand) | [Skyvern](https://github.com/Skyvern-AI/skyvern) | [Playwright MCP](https://github.com/microsoft/playwright-mcp) |
- |:---|:---:|:---:|:---:|:---:|:---:|
- | Ref → exact element, no guessing | :white_check_mark: | :heavy_minus_sign: | :x: | :x: | :white_check_mark: |
- | No vision model in the loop | :white_check_mark: | :heavy_minus_sign: | :white_check_mark: | :x: | :white_check_mark: |
- | Survives redesigns (semantic, not pixel) | :white_check_mark: | :heavy_minus_sign: | :white_check_mark: | :x: | :white_check_mark: |
- | Fill 10 form fields in one call | :white_check_mark: | :x: | :x: | :x: | :x: |
- | Interact with cross-origin iframes | :white_check_mark: | :white_check_mark: | :x: | :x: | :x: |
- | Playwright engine (auto-wait, locators) | :white_check_mark: | :x: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
- | Embeddable in your own agent loop | :white_check_mark: | :x: | :heavy_minus_sign: | :x: | :x: |
+ | | [browserclaw](https://github.com/idan-rubin/browserclaw) | [browser-use](https://github.com/browser-use/browser-use) | [Stagehand](https://github.com/browserbase/stagehand) | [Playwright MCP](https://github.com/microsoft/playwright-mcp) |
+ |:---|:---:|:---:|:---:|:---:|
+ | Ref → exact element, no guessing | :white_check_mark: | :heavy_minus_sign: | :x: | :white_check_mark: |
+ | No vision model in the loop | :white_check_mark: | :heavy_minus_sign: | :white_check_mark: | :white_check_mark: |
+ | Survives redesigns (semantic, not pixel) | :white_check_mark: | :heavy_minus_sign: | :white_check_mark: | :white_check_mark: |
+ | Fill 10 form fields in one call | :white_check_mark: | :x: | :x: | :x: |
+ | Interact with cross-origin iframes | :white_check_mark: | :white_check_mark: | :x: | :x: |
+ | Playwright engine (auto-wait, locators) | :white_check_mark: | :x: | :white_check_mark: | :white_check_mark: |
+ | Embeddable in your own JS/TS agent loop | :white_check_mark: | :x: | :heavy_minus_sign: | :x: |

  :white_check_mark: = Yes  :heavy_minus_sign: = Partial  :x: = No

  **browserclaw is the only tool that checks every box.** It combines the precision of accessibility snapshots with Playwright's battle-tested engine, batch operations, cross-origin iframe access, and zero framework lock-in — in a single embeddable library.

+ ### The key distinction: browser tool vs. AI agent
+
+ Most tools in this space are **AI agents that happen to control a browser**. They own the intelligence layer: they take a task, call an LLM, decide what actions to take, and execute them. That's a complete agent.
+
+ browserclaw is different. It's a **browser tool** — just the eyes and hands. It takes a snapshot and returns refs. It executes actions on refs. The LLM, the reasoning, the task planning — that all lives in your code, in your agent, wherever you want it. browserclaw doesn't have opinions about any of that.
+
+ This distinction matters if you're building an agent platform, a product with its own AI layer, or anything where you need to control the intelligence loop. You can't compose an agent-first tool into a system that already has an agent. You end up with two brains fighting over who's in charge.
+
  ### How each tool works under the hood

- - **browserclaw** — Accessibility snapshot with numbered refs → Playwright locator (`aria-ref` in default mode, `getByRole()` in role mode). One ref, one element. No vision model, no LLM in the targeting loop.
- - **browser-use** — DOM element indexing via raw CDP + optional screenshots. [Dropped Playwright](https://browser-use.com/posts/playwright-to-cdp) to go "closer to the metal" fast, but now reinvents auto-wait, retry logic, and cross-browser support from scratch.
+ - **browserclaw** — Accessibility snapshot with numbered refs → Playwright locator (`aria-ref` in default mode, `getByRole()` in role mode). One ref, one element. No vision model, no LLM in the targeting loop. You bring the brain.
+ - **browser-use** — A complete AI agent: takes a task, calls an LLM, decides actions, executes them. The LLM loop is inside the library. Great for standalone automation scripts; incompatible with platforms that already own the agent loop. Python-only.
  - **Stagehand** — Accessibility tree + natural language primitives (`page.act("click login")`). Convenient, but the LLM re-interprets which element to target on every single call — non-deterministic by design.
- - **Skyvern** — Vision-first. Screenshots sent to a Vision LLM that guesses coordinates. Multi-agent architecture (Planner/Actor/Validator) adds self-correction, but at significant cost and latency.
  - **Playwright MCP** — Same snapshot philosophy as browserclaw, but locked to the MCP protocol. Great for chat-based agents, but not embeddable as a library — you can't compose it into your own agent loop or call it from application code.

- **Also in the space:** [LaVague](https://github.com/lavague-ai/LaVague) (generates Selenium code via RAG on HTML), [AgentQL](https://github.com/tinyfish-io/agentql) (semantic query language for the DOM), [Vercel agent-browser](https://github.com/vercel-labs/agent-browser) (element refs like `@e1` — a similar ref-based approach).
-
  ### Why this matters for repeated complex UI tasks

  When you're running the same multi-step workflow hundreds of times — filling forms, navigating dashboards, processing queues — the differences compound:
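
Reviewer sketch: the "browser tool vs. AI agent" split added to the README in this hunk can be illustrated with a toy loop in which the caller owns the decision step. Everything below is hypothetical, including `createFakeBrowserTool`, `decide`, `runStep`, and the snapshot text format. None of it is browserclaw's API; it only mirrors the snapshot, ref, action shape the README describes.

```javascript
// Hypothetical in-memory stand-in for a browser tool (NOT browserclaw's API).
// It exposes only "eyes" (snapshot) and "hands" (click) and makes no decisions.
function createFakeBrowserTool() {
  const log = [];
  return {
    // Returns a made-up text snapshot whose lines carry refs.
    snapshot() {
      return { text: '- button "Submit" [ref=e1]\n- link "Cancel" [ref=e2]' };
    },
    // Records the action instead of driving a real browser.
    click(ref) {
      log.push(`click ${ref}`);
    },
    log,
  };
}

// The "brain" lives in caller code. A real agent would call an LLM here;
// this stub picks the ref on the first line that mentions the goal.
function decide(snapshotText, goal) {
  const line = snapshotText.split("\n").find((l) => l.includes(goal));
  const match = line ? /\[ref=(e\d+)\]/.exec(line) : null;
  return match ? { action: "click", ref: match[1] } : null;
}

// One iteration of a caller-owned loop: observe, decide, act.
function runStep(tool, goal) {
  const snap = tool.snapshot();
  const step = decide(snap.text, goal);
  if (step) tool.click(step.ref);
  return step;
}

const tool = createFakeBrowserTool();
console.log(runStep(tool, "Submit")); // { action: 'click', ref: 'e1' }
console.log(tool.log); // [ 'click e1' ]
```

Swapping `decide` for an LLM call changes nothing in the tool object, which is the composition property the README text argues for.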
package/dist/index.cjs CHANGED
@@ -751,12 +751,13 @@ async function getPageForTargetId(opts) {
  const found = await findPageByTargetId(browser, opts.targetId, opts.cdpUrl);
  if (!found) {
  if (pages.length === 1) return first;
- throw new Error(`Tab not found (targetId: ${opts.targetId}). Use browser.tabs() to list open tabs.`);
+ throw new Error(`Tab not found (targetId: ${opts.targetId}). Call browser.tabs() to list open tabs.`);
  }
  return found;
  }
  function refLocator(page, ref) {
  const normalized = ref.startsWith("@") ? ref.slice(1) : ref.startsWith("ref=") ? ref.slice(4) : ref;
+ if (!normalized.trim()) throw new Error("ref is required");
  if (/^e\d+$/.test(normalized)) {
  const state = pageStates.get(page);
  if (state?.roleRefsMode === "aria") {
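
Reviewer sketch: the new guard in `refLocator` can be exercised in isolation. The helper name `normalizeRef` is hypothetical (the package keeps this logic inline). The prefix stripping and the error message are copied from the hunk above.

```javascript
// Hypothetical standalone version of the normalization inside refLocator().
function normalizeRef(ref) {
  // Strip an optional "@" or "ref=" prefix, as the diff's ternary does.
  const normalized = ref.startsWith("@")
    ? ref.slice(1)
    : ref.startsWith("ref=")
      ? ref.slice(4)
      : ref;
  // Guard added in 0.3.4: empty or whitespace-only refs fail fast.
  if (!normalized.trim()) throw new Error("ref is required");
  return normalized;
}

console.log(normalizeRef("@e12"));    // e12
console.log(normalizeRef("ref=e12")); // e12
console.log(normalizeRef("e12"));     // e12

try {
  normalizeRef("@"); // only a prefix, nothing left after stripping
} catch (err) {
  console.log(err.message); // ref is required
}
```

Going by the hunk alone, inputs such as `"@"`, `"ref="`, or `"   "` would in 0.3.2 have fallen through to whatever follows the `/^e\d+$/` check, while 0.3.4 rejects them before any locator is built.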
@@ -1105,7 +1106,7 @@ async function snapshotRole(opts) {
  const frameSelector = opts.frameSelector?.trim() || "";
  const selector = opts.selector?.trim() || "";
  const locator = frameSelector ? selector ? page.frameLocator(frameSelector).locator(selector) : page.frameLocator(frameSelector).locator(":root") : selector ? page.locator(selector) : page.locator(":root");
- const ariaSnapshot = await locator.ariaSnapshot({ timeout: 1e4 });
+ const ariaSnapshot = await locator.ariaSnapshot({ timeout: normalizeTimeoutMs(opts.timeoutMs, 5e3) });
  const built = buildRoleSnapshotFromAriaSnapshot(String(ariaSnapshot ?? ""), opts.options);
  storeRoleRefsForTarget({
  page,
@@ -1834,7 +1835,7 @@ async function setGeolocationViaPlaywright(opts) {
  return;
  }
  if (opts.latitude === void 0 || opts.longitude === void 0) {
- throw new Error("latitude and longitude are required when not clearing geolocation.");
+ throw new Error("latitude and longitude are required (or set clear=true)");
  }
  await context.grantPermissions(["geolocation"], opts.origin ? { origin: opts.origin } : void 0);
  await context.setGeolocation({
@@ -1981,8 +1982,9 @@ async function responseBodyViaPlaywright(opts) {
  const response = await page.waitForResponse(opts.url, { timeout });
  let body = await response.text();
  let truncated = false;
- if (opts.maxChars && body.length > opts.maxChars) {
- body = body.slice(0, opts.maxChars);
+ const maxChars = typeof opts.maxChars === "number" && Number.isFinite(opts.maxChars) ? Math.max(1, Math.min(5e6, Math.floor(opts.maxChars))) : void 0;
+ if (maxChars !== void 0 && body.length > maxChars) {
+ body = body.slice(0, maxChars);
  truncated = true;
  }
  const headers = {};
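
Reviewer sketch: the new `maxChars` handling in `responseBodyViaPlaywright` clamps the option to an integer between 1 and 5,000,000 and ignores anything that is not a finite number. The helper names `clampMaxChars` and `truncateBody` are hypothetical (the package computes this inline). The expression itself is taken from the hunk above.

```javascript
// Hypothetical helper wrapping the clamping expression from the 0.3.4 hunk.
function clampMaxChars(value) {
  // Non-numbers, NaN, and +/-Infinity mean "no limit".
  if (typeof value !== "number" || !Number.isFinite(value)) return undefined;
  // Round down, then clamp into [1, 5e6].
  return Math.max(1, Math.min(5e6, Math.floor(value)));
}

// Hypothetical helper mirroring the truncation branch that follows it.
function truncateBody(body, maxCharsOpt) {
  const maxChars = clampMaxChars(maxCharsOpt);
  if (maxChars !== undefined && body.length > maxChars) {
    return { body: body.slice(0, maxChars), truncated: true };
  }
  return { body, truncated: false };
}

console.log(clampMaxChars(10.7));     // 10
console.log(clampMaxChars(0));        // 1
console.log(clampMaxChars(-5));       // 1
console.log(clampMaxChars(1e9));      // 5000000
console.log(clampMaxChars(Infinity)); // undefined
console.log(clampMaxChars("100"));    // undefined
console.log(truncateBody("hello world", 5)); // { body: 'hello', truncated: true }
```

One behavioral difference is visible from the diff alone. Under the old truthiness check, `maxChars: 0` skipped truncation entirely; under the new expression it clamps to 1, so the body is cut to a single character. Negative values used to reach `slice(0, negative)` and now also clamp to 1.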
@@ -2153,6 +2155,7 @@ var CrawlPage = class {
  selector: opts.selector,
  frameSelector: opts.frameSelector,
  refsMode: opts.refsMode,
+ timeoutMs: opts.timeoutMs,
  options: {
  interactive: opts.interactive,
  compact: opts.compact,