browserclaw 0.3.2 → 0.3.4
- package/README.md +19 -14
- package/dist/index.cjs +8 -5
- package/dist/index.cjs.map +1 -1
- package/dist/index.d.cts +2 -0
- package/dist/index.d.ts +2 -0
- package/dist/index.js +8 -5
- package/dist/index.js.map +1 -1
- package/package.json +1 -1
package/README.md
CHANGED
@@ -41,30 +41,35 @@ The snapshot + ref pattern means:
 
 The AI browser automation space is moving fast. Here's how browserclaw compares to the major alternatives.
 
-| | [browserclaw](https://github.com/idan-rubin/browserclaw) | [browser-use](https://github.com/browser-use/browser-use) | [Stagehand](https://github.com/browserbase/stagehand) | [
-
-| Ref → exact element, no guessing | :white_check_mark: | :heavy_minus_sign: | :x: | :
-| No vision model in the loop | :white_check_mark: | :heavy_minus_sign: | :white_check_mark: | :
-| Survives redesigns (semantic, not pixel) | :white_check_mark: | :heavy_minus_sign: | :white_check_mark: | :
-| Fill 10 form fields in one call | :white_check_mark: | :x: | :x: | :x: |
-| Interact with cross-origin iframes | :white_check_mark: | :white_check_mark: | :x: | :x: |
-| Playwright engine (auto-wait, locators) | :white_check_mark: | :x: | :white_check_mark: | :white_check_mark: |
-| Embeddable in your own agent loop | :white_check_mark: | :x: | :heavy_minus_sign: | :x: |
+| | [browserclaw](https://github.com/idan-rubin/browserclaw) | [browser-use](https://github.com/browser-use/browser-use) | [Stagehand](https://github.com/browserbase/stagehand) | [Playwright MCP](https://github.com/microsoft/playwright-mcp) |
+|:---|:---:|:---:|:---:|:---:|
+| Ref → exact element, no guessing | :white_check_mark: | :heavy_minus_sign: | :x: | :white_check_mark: |
+| No vision model in the loop | :white_check_mark: | :heavy_minus_sign: | :white_check_mark: | :white_check_mark: |
+| Survives redesigns (semantic, not pixel) | :white_check_mark: | :heavy_minus_sign: | :white_check_mark: | :white_check_mark: |
+| Fill 10 form fields in one call | :white_check_mark: | :x: | :x: | :x: |
+| Interact with cross-origin iframes | :white_check_mark: | :white_check_mark: | :x: | :x: |
+| Playwright engine (auto-wait, locators) | :white_check_mark: | :x: | :white_check_mark: | :white_check_mark: |
+| Embeddable in your own JS/TS agent loop | :white_check_mark: | :x: | :heavy_minus_sign: | :x: |
 
 :white_check_mark: = Yes  :heavy_minus_sign: = Partial  :x: = No
 
 **browserclaw is the only tool that checks every box.** It combines the precision of accessibility snapshots with Playwright's battle-tested engine, batch operations, cross-origin iframe access, and zero framework lock-in — in a single embeddable library.
 
+### The key distinction: browser tool vs. AI agent
+
+Most tools in this space are **AI agents that happen to control a browser**. They own the intelligence layer: they take a task, call an LLM, decide what actions to take, and execute them. That's a complete agent.
+
+browserclaw is different. It's a **browser tool** — just the eyes and hands. It takes a snapshot and returns refs. It executes actions on refs. The LLM, the reasoning, the task planning — that all lives in your code, in your agent, wherever you want it. browserclaw doesn't have opinions about any of that.
+
+This distinction matters if you're building an agent platform, a product with its own AI layer, or anything where you need to control the intelligence loop. You can't compose an agent-first tool into a system that already has an agent. You end up with two brains fighting over who's in charge.
+
 ### How each tool works under the hood
 
-- **browserclaw** — Accessibility snapshot with numbered refs → Playwright locator (`aria-ref` in default mode, `getByRole()` in role mode). One ref, one element. No vision model, no LLM in the targeting loop.
-- **browser-use** —
+- **browserclaw** — Accessibility snapshot with numbered refs → Playwright locator (`aria-ref` in default mode, `getByRole()` in role mode). One ref, one element. No vision model, no LLM in the targeting loop. You bring the brain.
+- **browser-use** — A complete AI agent: takes a task, calls an LLM, decides actions, executes them. The LLM loop is inside the library. Great for standalone automation scripts; incompatible with platforms that already own the agent loop. Python-only.
 - **Stagehand** — Accessibility tree + natural language primitives (`page.act("click login")`). Convenient, but the LLM re-interprets which element to target on every single call — non-deterministic by design.
-- **Skyvern** — Vision-first. Screenshots sent to a Vision LLM that guesses coordinates. Multi-agent architecture (Planner/Actor/Validator) adds self-correction, but at significant cost and latency.
 - **Playwright MCP** — Same snapshot philosophy as browserclaw, but locked to the MCP protocol. Great for chat-based agents, but not embeddable as a library — you can't compose it into your own agent loop or call it from application code.
 
-**Also in the space:** [LaVague](https://github.com/lavague-ai/LaVague) (generates Selenium code via RAG on HTML), [AgentQL](https://github.com/tinyfish-io/agentql) (semantic query language for the DOM), [Vercel agent-browser](https://github.com/vercel-labs/agent-browser) (element refs like `@e1` — a similar ref-based approach).
-
 ### Why this matters for repeated complex UI tasks
 
 When you're running the same multi-step workflow hundreds of times — filling forms, navigating dashboards, processing queues — the differences compound:
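The "browser tool vs. AI agent" section added to the README in the hunk above describes a split where the tool only observes and acts, and the caller owns the decision loop. A minimal sketch of that shape follows. Every name in it (`runAgentLoop`, `createFakeTool`, `scriptedBrain`, and the `snapshot()` / `act()` methods and their return shapes) is hypothetical and chosen for illustration; none of it is browserclaw's actual API, and the in-memory tool stands in for a real browser so the sketch runs on its own.

```javascript
// The caller's loop: observe, decide, act. The "brain" is just a function
// the caller passes in; the tool never decides anything itself.
async function runAgentLoop(tool, decide, maxSteps = 10) {
  const history = [];
  for (let step = 0; step < maxSteps; step++) {
    const snapshot = await tool.snapshot();        // eyes: what is on the page
    const action = await decide(snapshot, history); // brain: lives in caller code
    if (action.type === "done") break;
    await tool.act(action);                        // hands: act on a ref
    history.push(action);
  }
  return history;
}

// Hypothetical in-memory stand-in for a browser tool (no real browser involved).
function createFakeTool() {
  const performed = [];
  return {
    performed,
    async snapshot() {
      return { refs: { e1: { role: "button", name: "Submit" } } };
    },
    async act(action) {
      performed.push(`${action.type}:${action.ref}`);
    },
  };
}

// A scripted "brain": click e1 once, then stop. A real agent would call an LLM here.
async function scriptedBrain(snapshot, history) {
  if (history.length === 0 && snapshot.refs.e1) return { type: "click", ref: "e1" };
  return { type: "done" };
}
```

Swapping `scriptedBrain` for an LLM-backed function changes nothing in the loop or the tool, which is the composability point the README text makes.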
package/dist/index.cjs
CHANGED
@@ -751,12 +751,13 @@ async function getPageForTargetId(opts) {
   const found = await findPageByTargetId(browser, opts.targetId, opts.cdpUrl);
   if (!found) {
     if (pages.length === 1) return first;
-    throw new Error(`Tab not found (targetId: ${opts.targetId}).
+    throw new Error(`Tab not found (targetId: ${opts.targetId}). Call browser.tabs() to list open tabs.`);
   }
   return found;
 }
 function refLocator(page, ref) {
   const normalized = ref.startsWith("@") ? ref.slice(1) : ref.startsWith("ref=") ? ref.slice(4) : ref;
+  if (!normalized.trim()) throw new Error("ref is required");
   if (/^e\d+$/.test(normalized)) {
     const state = pageStates.get(page);
     if (state?.roleRefsMode === "aria") {
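The ref handling in this hunk can be read in isolation. The sketch below lifts the prefix stripping and the newly added empty-ref guard out of `refLocator` into a standalone helper. The name `normalizeRef` is hypothetical and is not something the package exports; the two statements inside it are copied from the hunk.

```javascript
// Hypothetical helper: same normalization and guard as in refLocator above.
function normalizeRef(ref) {
  // Accept "@e12", "ref=e12", or a bare "e12".
  const normalized = ref.startsWith("@")
    ? ref.slice(1)
    : ref.startsWith("ref=")
      ? ref.slice(4)
      : ref;
  // New in this version: reject refs that are empty after stripping the prefix.
  if (!normalized.trim()) throw new Error("ref is required");
  return normalized;
}
```

With the guard in place, inputs such as `"@"`, `"ref="`, or a whitespace-only string fail immediately with a clear message instead of reaching locator construction.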
@@ -1105,7 +1106,7 @@ async function snapshotRole(opts) {
   const frameSelector = opts.frameSelector?.trim() || "";
   const selector = opts.selector?.trim() || "";
   const locator = frameSelector ? selector ? page.frameLocator(frameSelector).locator(selector) : page.frameLocator(frameSelector).locator(":root") : selector ? page.locator(selector) : page.locator(":root");
-  const ariaSnapshot = await locator.ariaSnapshot({ timeout:
+  const ariaSnapshot = await locator.ariaSnapshot({ timeout: normalizeTimeoutMs(opts.timeoutMs, 5e3) });
   const built = buildRoleSnapshotFromAriaSnapshot(String(ariaSnapshot ?? ""), opts.options);
   storeRoleRefsForTarget({
     page,
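This hunk routes a caller-supplied `opts.timeoutMs` into `locator.ariaSnapshot`, with `5e3` passed as the second argument to `normalizeTimeoutMs`. The body of `normalizeTimeoutMs` is not part of this diff, so the sketch below is only a guess at the simplest behavior consistent with the call site: use the caller's value when it is a usable number, otherwise the fallback. `timeoutOrFallback` is a hypothetical stand-in, and the real helper may clamp or round differently.

```javascript
// Hypothetical stand-in for the call-site pattern normalizeTimeoutMs(value, fallback).
// Assumption: non-numbers, non-finite values, and non-positive values use the fallback.
function timeoutOrFallback(timeoutMs, fallbackMs) {
  const usable =
    typeof timeoutMs === "number" && Number.isFinite(timeoutMs) && timeoutMs > 0;
  return usable ? Math.floor(timeoutMs) : fallbackMs;
}
```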
@@ -1834,7 +1835,7 @@ async function setGeolocationViaPlaywright(opts) {
     return;
   }
   if (opts.latitude === void 0 || opts.longitude === void 0) {
-    throw new Error("latitude and longitude are required
+    throw new Error("latitude and longitude are required (or set clear=true)");
   }
   await context.grantPermissions(["geolocation"], opts.origin ? { origin: opts.origin } : void 0);
   await context.setGeolocation({
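The reworded error message points callers at a `clear` option that bypasses the coordinate check. A sketch of that validation order follows, assuming the early `return` visible at the top of the hunk belongs to a `clear` branch (the diff does not show the condition). `checkGeolocationOpts` and its return shape are hypothetical; only the undefined check and the error text come from the hunk.

```javascript
// Hypothetical validator mirroring the order implied by the hunk:
// a clear request short-circuits, otherwise both coordinates are required.
function checkGeolocationOpts(opts) {
  if (opts.clear) return { action: "clear" }; // assumption: what the early return handles
  if (opts.latitude === undefined || opts.longitude === undefined) {
    throw new Error("latitude and longitude are required (or set clear=true)");
  }
  return { action: "set", latitude: opts.latitude, longitude: opts.longitude };
}
```

Comparing against `undefined` rather than testing truthiness matters here: `0` is a valid latitude and longitude and must not be rejected.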
@@ -1981,8 +1982,9 @@ async function responseBodyViaPlaywright(opts) {
   const response = await page.waitForResponse(opts.url, { timeout });
   let body = await response.text();
   let truncated = false;
-
-
+  const maxChars = typeof opts.maxChars === "number" && Number.isFinite(opts.maxChars) ? Math.max(1, Math.min(5e6, Math.floor(opts.maxChars))) : void 0;
+  if (maxChars !== void 0 && body.length > maxChars) {
+    body = body.slice(0, maxChars);
     truncated = true;
   }
   const headers = {};
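The added lines clamp `opts.maxChars` to the range 1 to 5,000,000 (flooring fractions) and treat anything that is not a finite number as "no limit". The sketch below isolates that logic so the edge cases are easy to check. The function names `clampMaxChars` and `truncateBody` are hypothetical; the clamping expression and the slice condition are taken from the hunk.

```javascript
// Hypothetical helper wrapping the clamping expression from the hunk.
function clampMaxChars(maxChars) {
  return typeof maxChars === "number" && Number.isFinite(maxChars)
    ? Math.max(1, Math.min(5e6, Math.floor(maxChars)))
    : undefined; // no limit requested
}

// Hypothetical helper: truncate only when a limit exists and the body exceeds it.
function truncateBody(body, maxChars) {
  const limit = clampMaxChars(maxChars);
  if (limit !== undefined && body.length > limit) {
    return { body: body.slice(0, limit), truncated: true };
  }
  return { body, truncated: false };
}
```

One consequence of the lower bound is that `0` and negative values do not disable truncation; they clamp to a one-character limit.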
@@ -2153,6 +2155,7 @@ var CrawlPage = class {
   selector: opts.selector,
   frameSelector: opts.frameSelector,
   refsMode: opts.refsMode,
+  timeoutMs: opts.timeoutMs,
   options: {
     interactive: opts.interactive,
     compact: opts.compact,