barebrowse 0.4.6 → 0.4.7
- package/CHANGELOG.md +12 -0
- package/README.md +23 -21
- package/baremobile.md +105 -0
- package/docs/01-product/prd.md +24 -0
- package/package.json +1 -1
- package/src/index.js +2 -2
package/CHANGELOG.md
CHANGED
@@ -1,5 +1,17 @@
 # Changelog
 
+## 0.4.7
+
+Snapshot URL prefix format changed from `# <url>` to `url: <url>`.
+
+- Fix: MCP clients (Claude Code) stripped `#`-prefixed lines as comments, making the URL invisible to agents
+- Snapshot first line is now `url: <current-page-url>` (was `# <current-page-url>`)
+- Stats line no longer prefixed with `#`
+
+## 0.4.6
+
+- README wording fix
+
 ## 0.4.5
 
 - README: "What this is" rewritten — concise, no implementation details exposed
package/README.md
CHANGED
@@ -105,27 +105,7 @@ For code examples, API reference, and wiring instructions, see **[barebrowse.con
 
 ## What it handles automatically
 
-
-
-| Obstacle | How it's handled | Mode |
-|----------|-----------------|------|
-| **Cookie consent walls** | ARIA tree scan + jsClick accept button, 29 languages | Both |
-| **Consent in dialog role** | Detect `dialog`/`alertdialog` with consent hints, click accept inside | Both |
-| **Consent outside dialog** (BBC SourcePoint) | Fallback global button scan when dialog has no accept button | Both |
-| **Consent behind iframe overlay** | JS click via DOM.resolveNode bypasses z-index/overlay issues | Both |
-| **Permission prompts** (location, camera, mic) | Launch flags + CDP Browser.setPermission auto-deny | Both |
-| **Media autoplay blocked** | Autoplay policy flag on launch | Both |
-| **Login walls** | Cookie extraction from all browsers (Firefox + Chromium merged), injected via CDP | Both |
-| **Pre-filled form inputs** | Select-all + delete before typing | Both |
-| **Off-screen elements** | Scrolled into view before every click | Both |
-| **Form submission** | Enter key triggers onsubmit | Both |
-| **Tab between fields** | Tab key moves focus correctly | Both |
-| **SPA navigation** (YouTube, GitHub) | SPA-aware wait: frameNavigated + loadEventFired | Both |
-| **Bot detection** (Google, Reddit) | Stealth patches (headless) + automatic headed fallback with real cookies | Hybrid |
-| **navigator.webdriver leak** | Patched before page scripts run: webdriver, plugins, languages, chrome object | Headless |
-| **JS dialogs** (alert/confirm/prompt) | Auto-dismiss via CDP, logged for inspection | Both |
-| **Profile locking** | Unique temp dir per headless instance | Headless |
-| **ARIA noise** | 9-step pruning pipeline (ported from mcprune): wrapper collapse, noise removal, landmark promotion | Both |
+Cookie consent walls (29 languages), login walls (cookie extraction from your browsers), bot detection (stealth patches + automatic headed fallback), permission prompts, SPA navigation, JS dialogs, off-screen elements, pre-filled inputs, ARIA noise, and profile locking. The agent doesn't think about any of it.
 
 ## What the agent sees
 
@@ -202,6 +182,28 @@ URL -> find/launch browser (chromium.js)
 - Any Chromium-based browser installed (Chrome, Chromium, Brave, Edge, Vivaldi)
 - Linux tested (Fedora/KDE). macOS/Windows cookie paths exist but untested.
 
+## The bare ecosystem
+
+Three vanilla JS modules. Zero dependencies. Same API patterns.
+
+| | [**bareagent**](https://npmjs.com/package/bare-agent) | [**barebrowse**](https://npmjs.com/package/barebrowse) | [**baremobile**](https://npmjs.com/package/baremobile) |
+|---|---|---|---|
+| **Does** | Gives agents a think→act loop | Gives agents a real browser | Gives agents an Android device |
+| **How** | Goal in → coordinated actions out | URL in → pruned snapshot out | Screen in → pruned snapshot out |
+| **Replaces** | LangChain, CrewAI, AutoGen | Playwright, Selenium, Puppeteer | Appium, Espresso, UIAutomator2 |
+| **Interfaces** | Library · CLI · subprocess | Library · CLI · MCP | Library · CLI · MCP |
+| **Solo or together** | Orchestrates both as tools | Works standalone | Works standalone |
+
+**What you can build:**
+
+- **Headless automation** — scrape sites, fill forms, extract data, monitor pages on a schedule
+- **QA & testing** — automated test suites for web and Android apps without heavyweight frameworks
+- **Personal AI assistants** — chatbots that browse the web or control your phone on your behalf
+- **Remote device control** — manage Android devices over WiFi, including on-device via Termux
+- **Agentic workflows** — multi-step tasks where an AI plans, browses, and acts across web and mobile
+
+**Why this exists:** Most automation stacks ship 200MB of opinions before you write a line of code. These don't. Install, import, go.
+
 ## License
 
 MIT
package/baremobile.md
ADDED
@@ -0,0 +1,105 @@
+# baremobile — Research & Feasibility
+
+## Platform Feasibility
+
+| Platform | Accessibility Tree | Input Injection | Auth/Cookie Reuse | Practical? |
+|---|---|---|---|---|
+| **Android (non-root)** | uiautomator dump — good XML tree | ADB tap/swipe/input — solid | No. SharedPrefs locked, Keystore hardware-bound | Yes — tree + input work, auth is the gap |
+| **Android (rooted)** | Same | Same | Partial — SharedPrefs yes, Keystore still no | Yes — best mobile option |
+| **iOS (simulator)** | XCUITest — excellent tree | simctl + WDA | No. Absolute sandboxing | Dev/QA only |
+| **iOS (physical)** | Same, but needs Mac+Xcode+signing | Same | No | Impractical for end users |
+| **Windows** | UI Automation (UIA) — best desktop tree | SendInput — works well | App-specific, no universal trick | Yes — strongest desktop option |
+| **macOS** | AXUIElement — good tree | CGEvent APIs, cliclick | Keychain Access (gated by prompts) | Medium — permission hell |
+| **Linux (X11)** | AT-SPI2 — inconsistent coverage | xdotool — trivial | App-specific | Medium — tree quality varies |
+| **Linux (Wayland)** | AT-SPI2 — same | ydotool (needs root) | Same | Hard — input injection blocked by design |
+
+## Market Demand
+
+| Platform | Who Wants It | Use Cases | Demand Level |
+|---|---|---|---|
+| **Android** | Devs, QA teams, end users | Mobile-only apps, testing, data entry, social media automation | **High** — DroidRun got 900 signups in 72h, 2.1M EUR raised |
+| **iOS** | QA teams, enterprises | App testing, accessibility audits | **Medium** — gated access kills consumer use |
+| **Windows** | Enterprises | Legacy app automation (no API, GUI only) | **High** — Microsoft building UFO (8k stars) |
+| **macOS** | Developers, power users | App automation, workflows | **Low-Medium** — most Mac apps have APIs or CLI |
+| **Linux desktop** | Almost nobody | Niche automation | **Low** — devs use CLI, not GUI agents |
+
+## Existing Open-Source Competition
+
+| Project | Platform | Stars | Approach | Maturity |
+|---|---|---|---|---|
+| DroidRun | Android | 3.8k | A11y tree + ADB | Funded (2.1M EUR), active |
+| DroidClaw | Android | New | A11y tree + vision fallback | Weeks old |
+| agent-device (Callstack) | Android + iOS | Early | A11y tree, TypeScript | Active |
+| UFO (Microsoft) | Windows | 8k | UIA + vision | Mature |
+| Agent-S (Simular AI) | Cross-platform | 8.5k | Hybrid | Mature, 72% OSWorld |
+| Cua | macOS/Linux/Win | 11.8k | Sandbox VMs | YC-backed |
+
+## Auth Problem (Mobile vs Web)
+
+| | Web (barebrowse) | Android Native | iOS Native |
+|---|---|---|---|
+| Where tokens live | SQLite cookie DB, readable | SharedPrefs (locked) or Keystore (hardware) | Keychain (sandboxed) |
+| Can agent read them? | Yes — decrypt with OS keyring | No (non-root), Partial (root) | No |
+| Workaround | N/A — it works | Agent logs in via UI, or keeps app session alive | Same |
+| WebView content? | N/A | CDP attach possible (debug builds only) | No |
+
+## Strategic Comparison
+
+| Factor | Android | Windows | iOS | macOS | Linux Desktop |
+|---|---|---|---|---|---|
+| Tree quality | Good | Excellent | Excellent | Good | Inconsistent |
+| Input control | Easy (ADB) | Easy | Gated | Permission-heavy | X11 easy, Wayland hard |
+| Auth reuse | Bad | App-specific | Impossible | Gated | App-specific |
+| Real demand | **High** | **High** | Medium (QA only) | Low-Medium | Low |
+| Competition | Active but early | Microsoft owns it | Apple controls it | Niche | Nobody cares |
+| Fits barebrowse DNA? | **Yes** | Partial | No | No | No |
+
+## Android Technical Details
+
+### Accessibility Tree via ADB
+```bash
+adb shell uiautomator dump /dev/tty # dump XML accessibility tree
+```
+Returns XML with: bounds, text, class, content-desc, resource-id, clickable, scrollable, focused, enabled, checked, selected. Structurally similar to ARIA — roles, names, states, coordinates.
+
+### Input via ADB
+```bash
+adb shell input tap 500 300 # tap at coordinates
+adb shell input text "hello" # type text
+adb shell input keyevent 66 # Enter key (KEYCODE_ENTER)
+adb shell input swipe 300 500 300 100 # swipe gesture
+adb shell input keyevent 4 # Back button
+```
+
+### Key Limitations
+- **WebViews:** uiautomator tree is empty/shallow for WebView content. Flutter apps can crash uiautomator with StackOverflowError.
+- **Auth:** Cannot read app tokens on non-rooted devices. Agent must log in through UI or keep sessions alive.
+- **Latency:** uiautomator dump takes 1-3 seconds. Screenshot approach is faster per frame but less structured.
+
+### WebView Gap — Potential Differentiator
+Android WebViews expose a CDP debug port when the app is built with `WebView.setWebContentsDebuggingEnabled(true)`. If accessible, barebrowse's CDP + ARIA expertise can fill the gap that all other Android agent tools struggle with — structured content inside WebViews instead of falling back to screenshots.
+
+Discovery: `adb forward tcp:9222 localabstract:webview_devtools_remote_<pid>`
+
+## Windows Technical Details
+
+### UI Automation (UIA)
+Windows' accessibility API. Exposes a tree of AutomationElements with: ControlType, Name, AutomationId, BoundingRectangle, IsEnabled, patterns (Invoke, Value, Toggle, Selection, Scroll, etc.).
+
+Best desktop accessibility tree. Covers Win32, WPF, WinForms, UWP, and most Electron apps.
+
+### Input
+SendInput API for keyboard/mouse. Or higher-level: `pyautogui`, `robotjs`, `nut.js`.
+
+### Competition
+Microsoft's own UFO project (8k stars) dominates this space. They have first-party UIA access and deep investment. Competing here means competing with Microsoft on their own platform's APIs.
+
+## References
+- [DroidRun](https://github.com/droidrun/droidrun)
+- [DroidClaw](https://github.com/unitedbyai/droidclaw)
+- [agent-device](https://github.com/callstackincubator/agent-device)
+- [UFO (Microsoft)](https://github.com/microsoft/UFO)
+- [Agent-S](https://github.com/simular-ai/Agent-S)
+- [Cua](https://github.com/trycua/cua)
+- [Android uiautomator](https://developer.android.com/training/testing/other-components/ui-automator)
+- [Windows UI Automation](https://learn.microsoft.com/en-us/windows/win32/winauto/entry-uiauto-win32)
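The Android notes in baremobile.md pair a `uiautomator dump` tree (which carries a `bounds` attribute per node) with `adb shell input tap <x> <y>`. The step that links the two is turning a node's bounds into a tap point. A minimal sketch of that step, assuming the usual `[x1,y1][x2,y2]` bounds notation; `boundsToTap` and `tapArgs` are hypothetical names for illustration and do not come from the package:

```javascript
// Hypothetical helper (not part of barebrowse or baremobile): converts a
// uiautomator `bounds` attribute such as "[0,100][200,300]" into the
// center point of that rectangle.
function boundsToTap(bounds) {
  const m = /^\[(-?\d+),(-?\d+)\]\[(-?\d+),(-?\d+)\]$/.exec(bounds);
  if (!m) throw new Error(`Unrecognized bounds: ${bounds}`);
  const [x1, y1, x2, y2] = m.slice(1).map(Number);
  return { x: Math.round((x1 + x2) / 2), y: Math.round((y1 + y2) / 2) };
}

// Builds the argument list for `adb shell input tap <x> <y>`.
// Returned as an array so it can be passed to a process spawner
// without shell quoting; actually running adb is left to the caller.
function tapArgs(bounds) {
  const { x, y } = boundsToTap(bounds);
  return ['shell', 'input', 'tap', String(x), String(y)];
}
```

Tapping the center rather than a corner is a design choice in this sketch: it keeps the tap inside the element even when neighbouring nodes share an edge.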
package/docs/01-product/prd.md
CHANGED
@@ -131,6 +131,30 @@ After connection, every CDP command is the same. Three modes = ~20 extra lines i
 
 **What stays in mcprune:** The Playwright MCP proxy architecture. mcprune can continue to exist as a Playwright-based MCP server for users who want that path. But for barebrowse consumers, pruning is built in.
 
+### Obstacle Course — What barebrowse handles automatically
+
+The agent doesn't have to think about any of this:
+
+| Obstacle | How it's handled | Mode |
+|----------|-----------------|------|
+| **Cookie consent walls** | ARIA tree scan + jsClick accept button, 29 languages | Both |
+| **Consent in dialog role** | Detect `dialog`/`alertdialog` with consent hints, click accept inside | Both |
+| **Consent outside dialog** (BBC SourcePoint) | Fallback global button scan when dialog has no accept button | Both |
+| **Consent behind iframe overlay** | JS click via DOM.resolveNode bypasses z-index/overlay issues | Both |
+| **Permission prompts** (location, camera, mic) | Launch flags + CDP Browser.setPermission auto-deny | Both |
+| **Media autoplay blocked** | Autoplay policy flag on launch | Both |
+| **Login walls** | Cookie extraction from all browsers (Firefox + Chromium merged), injected via CDP | Both |
+| **Pre-filled form inputs** | Select-all + delete before typing | Both |
+| **Off-screen elements** | Scrolled into view before every click | Both |
+| **Form submission** | Enter key triggers onsubmit | Both |
+| **Tab between fields** | Tab key moves focus correctly | Both |
+| **SPA navigation** (YouTube, GitHub) | SPA-aware wait: frameNavigated + loadEventFired | Both |
+| **Bot detection** (Google, Reddit) | Stealth patches (headless) + automatic headed fallback with real cookies | Hybrid |
+| **navigator.webdriver leak** | Patched before page scripts run: webdriver, plugins, languages, chrome object | Headless |
+| **JS dialogs** (alert/confirm/prompt) | Auto-dismiss via CDP, logged for inspection | Both |
+| **Profile locking** | Unique temp dir per headless instance | Headless |
+| **ARIA noise** | 9-step pruning pipeline (ported from mcprune): wrapper collapse, noise removal, landmark promotion | Both |
+
 ---
 
 ## API Design
package/package.json
CHANGED
package/src/index.js
CHANGED
@@ -223,10 +223,10 @@ export async function connect(opts = {}) {
 const raw = formatTree(result.tree);
 const { currentIndex, entries } = await page.session.send('Page.getNavigationHistory');
 const pageUrl = entries[currentIndex]?.url || '';
-if (pruneOpts === false) return
+if (pruneOpts === false) return `url: ${pageUrl}\n` + raw;
 const pruned = pruneTree(result.tree, { mode: pruneOpts?.mode || 'act' });
 const out = formatTree(pruned);
-const stats =
+const stats = `url: ${pageUrl}\n${raw.length.toLocaleString()} chars → ${out.length.toLocaleString()} chars (${Math.round((1 - out.length / raw.length) * 100)}% pruned)`;
 return stats + '\n' + out;
 },
 
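For code that consumes the snapshot string, the `src/index.js` change means the first line now reads `url: <url>`, followed in pruned mode by a stats line and then the tree. A minimal sketch of splitting that header off, assuming a hypothetical helper named `parseSnapshot` (not part of barebrowse) and assuming callers may still see the older `# <url>` first line that the changelog describes:

```javascript
// Hypothetical helper, not part of barebrowse: splits a snapshot string
// into its URL header, optional stats line, and the remaining tree text.
function parseSnapshot(snapshot) {
  const [first, ...others] = snapshot.split('\n');

  // 0.4.7 emits `url: <url>`; per the changelog, earlier versions
  // emitted `# <url>` as the first line.
  const match = /^(?:url: |# )(.*)$/.exec(first);
  const url = match ? match[1] : null;
  let rest = match ? others : [first, ...others];

  // In pruned mode the next line is the stats line, which ends with
  // "... chars → ... chars (N% pruned)" in the 0.4.7 template.
  let stats = null;
  if (rest.length > 0 && / chars → .* chars \(.*% pruned\)$/.test(rest[0])) {
    stats = rest[0];
    rest = rest.slice(1);
  }

  return { url, stats, tree: rest.join('\n') };
}
```

When `pruneOpts === false` there is no stats line, so `stats` comes back as `null` and everything after the URL line is treated as the tree.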