barebrowse 0.4.5 → 0.4.7

package/CHANGELOG.md CHANGED
@@ -1,5 +1,17 @@
  # Changelog
 
+ ## 0.4.7
+
+ Snapshot URL prefix format changed from `# <url>` to `url: <url>`.
+
+ - Fix: MCP clients (Claude Code) stripped `#`-prefixed lines as comments, making the URL invisible to agents
+ - Snapshot first line is now `url: <current-page-url>` (was `# <current-page-url>`)
+ - Stats line no longer prefixed with `#`
+
+ ## 0.4.6
+
+ - README wording fix
+
  ## 0.4.5
 
  - README: "What this is" rewritten — concise, no implementation details exposed
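
Consumers that read the URL back out of a snapshot may want to tolerate both prefix formats while upgrading. The sketch below is illustrative only: `parseSnapshotUrl` is a hypothetical helper, not part of the barebrowse API, and it assumes the URL line is always the first line of the snapshot, as the changelog entry describes.

```javascript
// Hypothetical helper (not part of barebrowse): extracts the page URL from the
// first line of a snapshot string, accepting both the old and new prefix.
function parseSnapshotUrl(snapshot) {
  const firstLine = snapshot.split('\n', 1)[0];
  // 0.4.7 and later: "url: <current-page-url>"
  if (firstLine.startsWith('url: ')) return firstLine.slice('url: '.length);
  // Before 0.4.7: "# <current-page-url>"
  if (firstLine.startsWith('# ')) return firstLine.slice('# '.length);
  // No recognizable URL line
  return null;
}
```

Returning `null` rather than throwing lets a caller fall back to whatever URL it already tracks when the header is missing.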
package/README.md CHANGED
@@ -18,7 +18,7 @@
 
  barebrowse gives your AI agent a real browser. Navigate, read, interact, move on.
 
- It runs on whatever browser you already have -- your sessions, your cookies. Pages come back stripped to what matters -- 40-90% fewer tokens than raw output.
+ It uses the browser you already have -- your sessions, your cookies. Pages come back stripped to what matters -- 40-90% fewer tokens than raw output.
 
  No Playwright. Zero dependencies. No bundled browser. No 200MB download.
 
@@ -105,27 +105,7 @@ For code examples, API reference, and wiring instructions, see **[barebrowse.con
 
  ## What it handles automatically
 
- This is the obstacle course your agent doesn't have to think about:
-
- | Obstacle | How it's handled | Mode |
- |----------|-----------------|------|
- | **Cookie consent walls** | ARIA tree scan + jsClick accept button, 29 languages | Both |
- | **Consent in dialog role** | Detect `dialog`/`alertdialog` with consent hints, click accept inside | Both |
- | **Consent outside dialog** (BBC SourcePoint) | Fallback global button scan when dialog has no accept button | Both |
- | **Consent behind iframe overlay** | JS click via DOM.resolveNode bypasses z-index/overlay issues | Both |
- | **Permission prompts** (location, camera, mic) | Launch flags + CDP Browser.setPermission auto-deny | Both |
- | **Media autoplay blocked** | Autoplay policy flag on launch | Both |
- | **Login walls** | Cookie extraction from all browsers (Firefox + Chromium merged), injected via CDP | Both |
- | **Pre-filled form inputs** | Select-all + delete before typing | Both |
- | **Off-screen elements** | Scrolled into view before every click | Both |
- | **Form submission** | Enter key triggers onsubmit | Both |
- | **Tab between fields** | Tab key moves focus correctly | Both |
- | **SPA navigation** (YouTube, GitHub) | SPA-aware wait: frameNavigated + loadEventFired | Both |
- | **Bot detection** (Google, Reddit) | Stealth patches (headless) + automatic headed fallback with real cookies | Hybrid |
- | **navigator.webdriver leak** | Patched before page scripts run: webdriver, plugins, languages, chrome object | Headless |
- | **JS dialogs** (alert/confirm/prompt) | Auto-dismiss via CDP, logged for inspection | Both |
- | **Profile locking** | Unique temp dir per headless instance | Headless |
- | **ARIA noise** | 9-step pruning pipeline (ported from mcprune): wrapper collapse, noise removal, landmark promotion | Both |
+ Cookie consent walls (29 languages), login walls (cookie extraction from your browsers), bot detection (stealth patches + automatic headed fallback), permission prompts, SPA navigation, JS dialogs, off-screen elements, pre-filled inputs, ARIA noise, and profile locking. The agent doesn't think about any of it.
 
  ## What the agent sees
 
@@ -202,6 +182,28 @@ URL -> find/launch browser (chromium.js)
  - Any Chromium-based browser installed (Chrome, Chromium, Brave, Edge, Vivaldi)
  - Linux tested (Fedora/KDE). macOS/Windows cookie paths exist but untested.
 
+ ## The bare ecosystem
+
+ Three vanilla JS modules. Zero dependencies. Same API patterns.
+
+ | | [**bareagent**](https://npmjs.com/package/bare-agent) | [**barebrowse**](https://npmjs.com/package/barebrowse) | [**baremobile**](https://npmjs.com/package/baremobile) |
+ |---|---|---|---|
+ | **Does** | Gives agents a think→act loop | Gives agents a real browser | Gives agents an Android device |
+ | **How** | Goal in → coordinated actions out | URL in → pruned snapshot out | Screen in → pruned snapshot out |
+ | **Replaces** | LangChain, CrewAI, AutoGen | Playwright, Selenium, Puppeteer | Appium, Espresso, UIAutomator2 |
+ | **Interfaces** | Library · CLI · subprocess | Library · CLI · MCP | Library · CLI · MCP |
+ | **Solo or together** | Orchestrates both as tools | Works standalone | Works standalone |
+
+ **What you can build:**
+
+ - **Headless automation** — scrape sites, fill forms, extract data, monitor pages on a schedule
+ - **QA & testing** — automated test suites for web and Android apps without heavyweight frameworks
+ - **Personal AI assistants** — chatbots that browse the web or control your phone on your behalf
+ - **Remote device control** — manage Android devices over WiFi, including on-device via Termux
+ - **Agentic workflows** — multi-step tasks where an AI plans, browses, and acts across web and mobile
+
+ **Why this exists:** Most automation stacks ship 200MB of opinions before you write a line of code. These don't. Install, import, go.
+
  ## License
 
  MIT
package/baremobile.md ADDED
@@ -0,0 +1,105 @@
+ # baremobile — Research & Feasibility
+
+ ## Platform Feasibility
+
+ | Platform | Accessibility Tree | Input Injection | Auth/Cookie Reuse | Practical? |
+ |---|---|---|---|---|
+ | **Android (non-root)** | uiautomator dump — good XML tree | ADB tap/swipe/input — solid | No. SharedPrefs locked, Keystore hardware-bound | Yes — tree + input work, auth is the gap |
+ | **Android (rooted)** | Same | Same | Partial — SharedPrefs yes, Keystore still no | Yes — best mobile option |
+ | **iOS (simulator)** | XCUITest — excellent tree | simctl + WDA | No. Absolute sandboxing | Dev/QA only |
+ | **iOS (physical)** | Same, but needs Mac+Xcode+signing | Same | No | Impractical for end users |
+ | **Windows** | UI Automation (UIA) — best desktop tree | SendInput — works well | App-specific, no universal trick | Yes — strongest desktop option |
+ | **macOS** | AXUIElement — good tree | CGEvent APIs, cliclick | Keychain Access (gated by prompts) | Medium — permission hell |
+ | **Linux (X11)** | AT-SPI2 — inconsistent coverage | xdotool — trivial | App-specific | Medium — tree quality varies |
+ | **Linux (Wayland)** | AT-SPI2 — same | ydotool (needs root) | Same | Hard — input injection blocked by design |
+
+ ## Market Demand
+
+ | Platform | Who Wants It | Use Cases | Demand Level |
+ |---|---|---|---|
+ | **Android** | Devs, QA teams, end users | Mobile-only apps, testing, data entry, social media automation | **High** — DroidRun got 900 signups in 72h, 2.1M EUR raised |
+ | **iOS** | QA teams, enterprises | App testing, accessibility audits | **Medium** — gated access kills consumer use |
+ | **Windows** | Enterprises | Legacy app automation (no API, GUI only) | **High** — Microsoft building UFO (8k stars) |
+ | **macOS** | Developers, power users | App automation, workflows | **Low-Medium** — most Mac apps have APIs or CLI |
+ | **Linux desktop** | Almost nobody | Niche automation | **Low** — devs use CLI, not GUI agents |
+
+ ## Existing Open-Source Competition
+
+ | Project | Platform | Stars | Approach | Maturity |
+ |---|---|---|---|---|
+ | DroidRun | Android | 3.8k | A11y tree + ADB | Funded (2.1M EUR), active |
+ | DroidClaw | Android | New | A11y tree + vision fallback | Weeks old |
+ | agent-device (Callstack) | Android + iOS | Early | A11y tree, TypeScript | Active |
+ | UFO (Microsoft) | Windows | 8k | UIA + vision | Mature |
+ | Agent-S (Simular AI) | Cross-platform | 8.5k | Hybrid | Mature, 72% OSWorld |
+ | Cua | macOS/Linux/Win | 11.8k | Sandbox VMs | YC-backed |
+
+ ## Auth Problem (Mobile vs Web)
+
+ | | Web (barebrowse) | Android Native | iOS Native |
+ |---|---|---|---|
+ | Where tokens live | SQLite cookie DB, readable | SharedPrefs (locked) or Keystore (hardware) | Keychain (sandboxed) |
+ | Can agent read them? | Yes — decrypt with OS keyring | No (non-root), Partial (root) | No |
+ | Workaround | N/A — it works | Agent logs in via UI, or keeps app session alive | Same |
+ | WebView content? | N/A | CDP attach possible (debug builds only) | No |
+
+ ## Strategic Comparison
+
+ | Factor | Android | Windows | iOS | macOS | Linux Desktop |
+ |---|---|---|---|---|---|
+ | Tree quality | Good | Excellent | Excellent | Good | Inconsistent |
+ | Input control | Easy (ADB) | Easy | Gated | Permission-heavy | X11 easy, Wayland hard |
+ | Auth reuse | Bad | App-specific | Impossible | Gated | App-specific |
+ | Real demand | **High** | **High** | Medium (QA only) | Low-Medium | Low |
+ | Competition | Active but early | Microsoft owns it | Apple controls it | Niche | Nobody cares |
+ | Fits barebrowse DNA? | **Yes** | Partial | No | No | No |
+
+ ## Android Technical Details
+
+ ### Accessibility Tree via ADB
+ ```bash
+ adb shell uiautomator dump /dev/tty # dump XML accessibility tree
+ ```
+ Returns XML with: bounds, text, class, content-desc, resource-id, clickable, scrollable, focused, enabled, checked, selected. Structurally similar to ARIA — roles, names, states, coordinates.
+
+ ### Input via ADB
+ ```bash
+ adb shell input tap 500 300 # tap at coordinates
+ adb shell input text "hello" # type text
+ adb shell input keyevent 66 # Enter key (KEYCODE_ENTER)
+ adb shell input swipe 300 500 300 100 # swipe gesture
+ adb shell input keyevent 4 # Back button
+ ```
+
+ ### Key Limitations
+ - **WebViews:** uiautomator tree is empty/shallow for WebView content. Flutter apps can crash uiautomator with StackOverflowError.
+ - **Auth:** Cannot read app tokens on non-rooted devices. Agent must log in through UI or keep sessions alive.
+ - **Latency:** uiautomator dump takes 1-3 seconds. Screenshot approach is faster per frame but less structured.
+
+ ### WebView Gap — Potential Differentiator
+ Android WebViews expose a CDP debug port when the app is built with `WebView.setWebContentsDebuggingEnabled(true)`. If accessible, barebrowse's CDP + ARIA expertise can fill the gap that all other Android agent tools struggle with — structured content inside WebViews instead of falling back to screenshots.
+
+ Discovery: `adb forward tcp:9222 localabstract:webview_devtools_remote_<pid>`
+
+ ## Windows Technical Details
+
+ ### UI Automation (UIA)
+ Windows' accessibility API. Exposes a tree of AutomationElements with: ControlType, Name, AutomationId, BoundingRectangle, IsEnabled, patterns (Invoke, Value, Toggle, Selection, Scroll, etc.).
+
+ Best desktop accessibility tree. Covers Win32, WPF, WinForms, UWP, and most Electron apps.
+
+ ### Input
+ SendInput API for keyboard/mouse. Or higher-level: `pyautogui`, `robotjs`, `nut.js`.
+
+ ### Competition
+ Microsoft's own UFO project (8k stars) dominates this space. They have first-party UIA access and deep investment. Competing here means competing with Microsoft on their own platform's APIs.
+
+ ## References
+ - [DroidRun](https://github.com/droidrun/droidrun)
+ - [DroidClaw](https://github.com/unitedbyai/droidclaw)
+ - [agent-device](https://github.com/callstackincubator/agent-device)
+ - [UFO (Microsoft)](https://github.com/microsoft/UFO)
+ - [Agent-S](https://github.com/simular-ai/Agent-S)
+ - [Cua](https://github.com/trycua/cua)
+ - [Android uiautomator](https://developer.android.com/training/testing/other-components/ui-automator)
+ - [Windows UI Automation](https://learn.microsoft.com/en-us/windows/win32/winauto/entry-uiauto-win32)
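
The Android section of baremobile.md pairs a `bounds` attribute from the uiautomator dump with coordinate-based `adb shell input tap`. One way the two could be connected is sketched below. This is not code from the package: `boundsToTapCommand` is a hypothetical helper, and it assumes the `bounds` value has the `[left,top][right,bottom]` form that uiautomator dumps normally use.

```javascript
// Hypothetical helper (not in the package): converts a uiautomator `bounds`
// attribute into an `adb shell input tap` command aimed at the element center.
function boundsToTapCommand(bounds) {
  // Assumed format: "[left,top][right,bottom]", e.g. "[0,100][200,300]"
  const match = /^\[(-?\d+),(-?\d+)\]\[(-?\d+),(-?\d+)\]$/.exec(bounds);
  if (!match) throw new Error(`Unrecognized bounds: ${bounds}`);
  const [left, top, right, bottom] = match.slice(1).map(Number);
  const centerX = Math.round((left + right) / 2);
  const centerY = Math.round((top + bottom) / 2);
  return `adb shell input tap ${centerX} ${centerY}`;
}
```

Tapping the center rather than a corner keeps the tap inside the element even when its edges are clipped by neighbouring views.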
@@ -131,6 +131,30 @@ After connection, every CDP command is the same. Three modes = ~20 extra lines i
 
  **What stays in mcprune:** The Playwright MCP proxy architecture. mcprune can continue to exist as a Playwright-based MCP server for users who want that path. But for barebrowse consumers, pruning is built in.
 
+ ### Obstacle Course — What barebrowse handles automatically
+
+ The agent doesn't have to think about any of this:
+
+ | Obstacle | How it's handled | Mode |
+ |----------|-----------------|------|
+ | **Cookie consent walls** | ARIA tree scan + jsClick accept button, 29 languages | Both |
+ | **Consent in dialog role** | Detect `dialog`/`alertdialog` with consent hints, click accept inside | Both |
+ | **Consent outside dialog** (BBC SourcePoint) | Fallback global button scan when dialog has no accept button | Both |
+ | **Consent behind iframe overlay** | JS click via DOM.resolveNode bypasses z-index/overlay issues | Both |
+ | **Permission prompts** (location, camera, mic) | Launch flags + CDP Browser.setPermission auto-deny | Both |
+ | **Media autoplay blocked** | Autoplay policy flag on launch | Both |
+ | **Login walls** | Cookie extraction from all browsers (Firefox + Chromium merged), injected via CDP | Both |
+ | **Pre-filled form inputs** | Select-all + delete before typing | Both |
+ | **Off-screen elements** | Scrolled into view before every click | Both |
+ | **Form submission** | Enter key triggers onsubmit | Both |
+ | **Tab between fields** | Tab key moves focus correctly | Both |
+ | **SPA navigation** (YouTube, GitHub) | SPA-aware wait: frameNavigated + loadEventFired | Both |
+ | **Bot detection** (Google, Reddit) | Stealth patches (headless) + automatic headed fallback with real cookies | Hybrid |
+ | **navigator.webdriver leak** | Patched before page scripts run: webdriver, plugins, languages, chrome object | Headless |
+ | **JS dialogs** (alert/confirm/prompt) | Auto-dismiss via CDP, logged for inspection | Both |
+ | **Profile locking** | Unique temp dir per headless instance | Headless |
+ | **ARIA noise** | 9-step pruning pipeline (ported from mcprune): wrapper collapse, noise removal, landmark promotion | Both |
+
  ---
 
  ## API Design
package/mcp-server.js CHANGED
@@ -249,7 +249,7 @@ async function handleMessage(msg) {
    return jsonrpcResponse(id, {
      protocolVersion: '2024-11-05',
      capabilities: { tools: {} },
-     serverInfo: { name: 'barebrowse', version: '0.4.5' },
+     serverInfo: { name: 'barebrowse', version: '0.4.6' },
    });
  }
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "barebrowse",
-   "version": "0.4.5",
+   "version": "0.4.7",
    "description": "Authenticated web browsing for autonomous agents via CDP. URL in, pruned ARIA snapshot out.",
    "type": "module",
    "main": "src/index.js",
package/src/index.js CHANGED
@@ -223,10 +223,10 @@ export async function connect(opts = {}) {
    const raw = formatTree(result.tree);
    const { currentIndex, entries } = await page.session.send('Page.getNavigationHistory');
    const pageUrl = entries[currentIndex]?.url || '';
-   if (pruneOpts === false) return `# ${pageUrl}\n` + raw;
+   if (pruneOpts === false) return `url: ${pageUrl}\n` + raw;
    const pruned = pruneTree(result.tree, { mode: pruneOpts?.mode || 'act' });
    const out = formatTree(pruned);
-   const stats = `# ${pageUrl}\n# ${raw.length.toLocaleString()} chars → ${out.length.toLocaleString()} chars (${Math.round((1 - out.length / raw.length) * 100)}% pruned)`;
+   const stats = `url: ${pageUrl}\n${raw.length.toLocaleString()} chars → ${out.length.toLocaleString()} chars (${Math.round((1 - out.length / raw.length) * 100)}% pruned)`;
    return stats + '\n' + out;
  },
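
The header arithmetic in this hunk can be read in isolation as follows. The function below is a standalone sketch, not code from the package: the name `buildSnapshotHeader` and its parameters are hypothetical, and the zero-length guard is an addition that the diff itself does not contain.

```javascript
// Hypothetical standalone version of the 0.4.7 header format shown in the hunk.
function buildSnapshotHeader(pageUrl, rawLength, prunedLength) {
  const urlLine = `url: ${pageUrl}`;
  // Guard (an addition, not in the diff): avoids "NaN% pruned" for an empty raw tree.
  const prunedPct = rawLength > 0
    ? Math.round((1 - prunedLength / rawLength) * 100)
    : 0;
  // Same stats wording as the diff; note that neither line starts with "#".
  const statsLine = `${rawLength.toLocaleString()} chars → ${prunedLength.toLocaleString()} chars (${prunedPct}% pruned)`;
  return `${urlLine}\n${statsLine}`;
}
```

Because `toLocaleString()` follows the runtime locale, the thousands separator in the stats line may differ between machines, so anything parsing that line should probably not rely on a fixed digit grouping.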