barebrowse 0.4.8 → 0.5.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/.mcp.json DELETED
@@ -1,8 +0,0 @@
- {
-   "mcpServers": {
-     "barebrowse": {
-       "command": "npx",
-       "args": ["barebrowse", "mcp"]
-     }
-   }
- }
package/CLAUDE.md DELETED
@@ -1,24 +0,0 @@
- ## Dev Rules
-
- **POC first.** Always validate logic with a ~15min proof-of-concept before building. Cover happy path + common edges. POC works → design properly → build with tests. Never ship the POC.
-
- **Build incrementally.** Break work into small independent modules. One piece at a time, each must work on its own before integrating.
-
- **Dependency hierarchy — follow strictly:** vanilla language → standard library → external (only when stdlib can't do it in <100 lines). External deps must be maintained, lightweight, and widely adopted. Exception: always use vetted libraries for security-critical code (crypto, auth, sanitization).
-
- **Lightweight over complex.** Fewer moving parts, fewer deps, less config. Simple > clever. Readable > elegant.
-
- **Open-source only.** No vendor lock-in. Every line of code must have a purpose — no speculative code, no premature abstractions.
-
- ## Project Specifics
-
- - **What:** Vanilla JS library — CDP-direct browsing for autonomous agents. URL in, pruned ARIA snapshot out.
- - **Language:** Vanilla JavaScript, ES modules, no build step
- - **Runtime:** Node.js >= 22 (built-in WebSocket, sqlite)
- - **Protocol:** CDP (Chrome DevTools Protocol) direct — no Playwright
- - **Browser:** Any installed Chromium-based browser (chromium, chrome, brave, edge)
- - **Modules:** 11 files in `src/`, ~2,400 lines, zero required deps
- - **Tests:** 54 passing — run with `node --test test/unit/*.test.js test/integration/*.test.js`
- - **Docs:** `docs/README.md` (navigation guide to all documentation)
-
- For full development and testing standards, see `.claude/memory/AGENT_RULES.md`.
package/baremobile.md DELETED
@@ -1,105 +0,0 @@
- # baremobile — Research & Feasibility
-
- ## Platform Feasibility
-
- | Platform | Accessibility Tree | Input Injection | Auth/Cookie Reuse | Practical? |
- |---|---|---|---|---|
- | **Android (non-root)** | uiautomator dump — good XML tree | ADB tap/swipe/input — solid | No. SharedPrefs locked, Keystore hardware-bound | Yes — tree + input work, auth is the gap |
- | **Android (rooted)** | Same | Same | Partial — SharedPrefs yes, Keystore still no | Yes — best mobile option |
- | **iOS (simulator)** | XCUITest — excellent tree | simctl + WDA | No. Absolute sandboxing | Dev/QA only |
- | **iOS (physical)** | Same, but needs Mac+Xcode+signing | Same | No | Impractical for end users |
- | **Windows** | UI Automation (UIA) — best desktop tree | SendInput — works well | App-specific, no universal trick | Yes — strongest desktop option |
- | **macOS** | AXUIElement — good tree | CGEvent APIs, cliclick | Keychain Access (gated by prompts) | Medium — permission hell |
- | **Linux (X11)** | AT-SPI2 — inconsistent coverage | xdotool — trivial | App-specific | Medium — tree quality varies |
- | **Linux (Wayland)** | AT-SPI2 — same | ydotool (needs root) | Same | Hard — input injection blocked by design |
-
- ## Market Demand
-
- | Platform | Who Wants It | Use Cases | Demand Level |
- |---|---|---|---|
- | **Android** | Devs, QA teams, end users | Mobile-only apps, testing, data entry, social media automation | **High** — DroidRun got 900 signups in 72h, 2.1M EUR raised |
- | **iOS** | QA teams, enterprises | App testing, accessibility audits | **Medium** — gated access kills consumer use |
- | **Windows** | Enterprises | Legacy app automation (no API, GUI only) | **High** — Microsoft building UFO (8k stars) |
- | **macOS** | Developers, power users | App automation, workflows | **Low-Medium** — most Mac apps have APIs or CLI |
- | **Linux desktop** | Almost nobody | Niche automation | **Low** — devs use CLI, not GUI agents |
-
- ## Existing Open-Source Competition
-
- | Project | Platform | Stars | Approach | Maturity |
- |---|---|---|---|---|
- | DroidRun | Android | 3.8k | A11y tree + ADB | Funded (2.1M EUR), active |
- | DroidClaw | Android | New | A11y tree + vision fallback | Weeks old |
- | agent-device (Callstack) | Android + iOS | Early | A11y tree, TypeScript | Active |
- | UFO (Microsoft) | Windows | 8k | UIA + vision | Mature |
- | Agent-S (Simular AI) | Cross-platform | 8.5k | Hybrid | Mature, 72% OSWorld |
- | Cua | macOS/Linux/Win | 11.8k | Sandbox VMs | YC-backed |
-
- ## Auth Problem (Mobile vs Web)
-
- | | Web (barebrowse) | Android Native | iOS Native |
- |---|---|---|---|
- | Where tokens live | SQLite cookie DB, readable | SharedPrefs (locked) or Keystore (hardware) | Keychain (sandboxed) |
- | Can agent read them? | Yes — decrypt with OS keyring | No (non-root), Partial (root) | No |
- | Workaround | N/A — it works | Agent logs in via UI, or keeps app session alive | Same |
- | WebView content? | N/A | CDP attach possible (debug builds only) | No |
-
- ## Strategic Comparison
-
- | Factor | Android | Windows | iOS | macOS | Linux Desktop |
- |---|---|---|---|---|---|
- | Tree quality | Good | Excellent | Excellent | Good | Inconsistent |
- | Input control | Easy (ADB) | Easy | Gated | Permission-heavy | X11 easy, Wayland hard |
- | Auth reuse | Bad | App-specific | Impossible | Gated | App-specific |
- | Real demand | **High** | **High** | Medium (QA only) | Low-Medium | Low |
- | Competition | Active but early | Microsoft owns it | Apple controls it | Niche | Nobody cares |
- | Fits barebrowse DNA? | **Yes** | Partial | No | No | No |
-
- ## Android Technical Details
-
- ### Accessibility Tree via ADB
- ```bash
- adb shell uiautomator dump /dev/tty # dump XML accessibility tree
- ```
- Returns XML with: bounds, text, class, content-desc, resource-id, clickable, scrollable, focused, enabled, checked, selected. Structurally similar to ARIA — roles, names, states, coordinates.
-
- ### Input via ADB
- ```bash
- adb shell input tap 500 300 # tap at coordinates
- adb shell input text "hello" # type text
- adb shell input keyevent 66 # Enter key (KEYCODE_ENTER)
- adb shell input swipe 300 500 300 100 # swipe gesture
- adb shell input keyevent 4 # Back button
- ```
-
- ### Key Limitations
- - **WebViews:** uiautomator tree is empty/shallow for WebView content. Flutter apps can crash uiautomator with StackOverflowError.
- - **Auth:** Cannot read app tokens on non-rooted devices. Agent must log in through UI or keep sessions alive.
- - **Latency:** uiautomator dump takes 1-3 seconds. Screenshot approach is faster per frame but less structured.
-
- ### WebView Gap — Potential Differentiator
- Android WebViews expose a CDP debug port when the app is built with `WebView.setWebContentsDebuggingEnabled(true)`. If accessible, barebrowse's CDP + ARIA expertise can fill the gap that all other Android agent tools struggle with — structured content inside WebViews instead of falling back to screenshots.
-
- Discovery: `adb forward tcp:9222 localabstract:webview_devtools_remote_<pid>`
-
- ## Windows Technical Details
-
- ### UI Automation (UIA)
- Windows' accessibility API. Exposes a tree of AutomationElements with: ControlType, Name, AutomationId, BoundingRectangle, IsEnabled, patterns (Invoke, Value, Toggle, Selection, Scroll, etc.).
-
- Best desktop accessibility tree. Covers Win32, WPF, WinForms, UWP, and most Electron apps.
-
- ### Input
- SendInput API for keyboard/mouse. Or higher-level: `pyautogui`, `robotjs`, `nut.js`.
-
- ### Competition
- Microsoft's own UFO project (8k stars) dominates this space. They have first-party UIA access and deep investment. Competing here means competing with Microsoft on their own platform's APIs.
-
- ## References
- - [DroidRun](https://github.com/droidrun/droidrun)
- - [DroidClaw](https://github.com/unitedbyai/droidclaw)
- - [agent-device](https://github.com/callstackincubator/agent-device)
- - [UFO (Microsoft)](https://github.com/microsoft/UFO)
- - [Agent-S](https://github.com/simular-ai/Agent-S)
- - [Cua](https://github.com/trycua/cua)
- - [Android uiautomator](https://developer.android.com/training/testing/other-components/ui-automator)
- - [Windows UI Automation](https://learn.microsoft.com/en-us/windows/win32/winauto/entry-uiauto-win32)
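Editor's sketch (not part of either package version): the deleted baremobile.md pairs a `uiautomator dump` tree with `adb shell input tap`. One way the two could connect is shown below, assuming each XML node carries a `bounds` attribute of the form `[left,top][right,bottom]` and that the tap target is the centre of that rectangle. `parseBounds` and `tapCommand` are hypothetical helper names, not functions from barebrowse or baremobile.

```javascript
// Hypothetical helpers for illustration only.
// Assumes the uiautomator bounds format "[left,top][right,bottom]".
function parseBounds(bounds) {
  const match = /^\[(-?\d+),(-?\d+)\]\[(-?\d+),(-?\d+)\]$/.exec(bounds);
  if (!match) throw new Error(`Unrecognized bounds: ${bounds}`);
  const [left, top, right, bottom] = match.slice(1).map(Number);
  return { left, top, right, bottom };
}

// Build the adb command that taps the centre of a node's bounds.
function tapCommand(bounds) {
  const { left, top, right, bottom } = parseBounds(bounds);
  const x = Math.round((left + right) / 2);
  const y = Math.round((top + bottom) / 2);
  return `adb shell input tap ${x} ${y}`;
}

console.log(tapCommand('[400,250][600,350]')); // adb shell input tap 500 300
```

The command string is only built here, never executed; running it would need a connected device and a process spawn, both of which are outside this sketch.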
@@ -1,38 +0,0 @@
- # barebrowse -- Assumptions & Constraints
-
- ## Hard constraints
-
- | Constraint | Detail |
- |-----------|--------|
- | **Chromium-only** | CDP protocol. Covers Chrome, Chromium, Edge, Brave, Vivaldi, Arc, Opera (~80% desktop share). Firefox later via WebDriver BiDi. |
- | **Node >= 22** | Built-in WebSocket (`globalThis.WebSocket`), built-in SQLite (`node:sqlite`). No polyfills. |
- | **Linux first** | Tested on Fedora/KDE/Wayland. macOS/Windows cookie extraction paths exist in auth.js but are untested. |
- | **Zero required deps** | Everything uses Node stdlib. Vanilla JS, ES modules, no build step. |
- | **Not a server** | Library that agents import. MCP wrapper included, HTTP wrapper is DIY. |
-
- ## Assumptions
-
- - **User has Chromium installed.** At least one of: chromium-browser, google-chrome, brave-browser, microsoft-edge. `chromium.js` searches common paths.
- - **Cookie extraction needs unlocked profile.** Chromium cookies are AES-encrypted with a keyring key (KWallet on KDE, GNOME Keyring on GNOME). Firefox cookies are plaintext SQLite and always accessible.
- - **Headed mode requires manual browser launch.** User must start their browser with `--remote-debugging-port=9222`. barebrowse connects to it -- does not launch it.
- - **Hybrid fallback needs a running headed browser.** If headless is bot-blocked, hybrid kills headless and connects to headed on port 9222. That browser must already be running.
- - **Cookies expire.** Cookie injection works for existing sessions, not new logins. For sites requiring fresh auth, headed mode with user interaction is the fallback.
- - **One page per connect().** Each `connect()` call creates one page. For multiple tabs, call `connect()` multiple times.
-
- ## Known limitations
-
- | Limitation | Impact | Workaround |
- |-----------|--------|------------|
- | No Firefox/WebKit support | ~20% of desktop users can't use native browser | Use Chromium as the automation target, Firefox as cookie source |
- | No file upload | Can't interact with file inputs | Not yet implemented (`Input.setFiles` via CDP) |
- | No drag and drop | Can't use drag-based UIs | Not yet implemented |
- | No cross-origin iframes | Content inside iframes invisible to ARIA tree | Frame tree traversal via CDP (medium effort) |
- | No CAPTCHAs | Cannot solve challenge pages | Headed mode lets user solve manually |
- | Canvas/WebGL opaque | No ARIA representation | Needs screenshot + vision model |
- | macOS/Windows untested | Cookie paths exist but may not work | Linux-only for now |
-
- ## Risks
-
- - **CDP is not a stable API.** Chrome team can change it across versions. Mitigation: we use well-established domains (Accessibility, Input, Page, Network, DOM) that rarely break.
- - **Cookie consent patterns evolve.** New consent frameworks may not be detected by `consent.js`. Mitigation: best-effort, opt-out with `{ consent: false }`.
- - **Stealth patches are an arms race.** Bot detection evolves. Mitigation: headed mode with real browser profile is the ultimate fallback.
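Editor's sketch (not part of either package version): the hybrid-fallback assumption in the file above comes down to one decision. Look at the text of the headless snapshot and, if it resembles a bot-challenge page, switch to the headed browser on port 9222. The two phrases below are the ones quoted in the blueprint later in this diff; `looksLikeChallenge` and `chooseMode` are hypothetical names, and the real heuristic in barebrowse may well differ.

```javascript
// Hypothetical sketch; not barebrowse's actual implementation.
// Phrases taken from the blueprint's description of hybrid mode.
const CHALLENGE_PHRASES = ['just a moment', 'checking your browser'];

// True when the snapshot text contains a known challenge phrase.
function looksLikeChallenge(snapshotText) {
  const text = snapshotText.toLowerCase();
  return CHALLENGE_PHRASES.some((phrase) => text.includes(phrase));
}

// Decide which mode should serve the page after a headless attempt.
function chooseMode(snapshotText) {
  return looksLikeChallenge(snapshotText) ? 'headed' : 'headless';
}

console.log(chooseMode('heading "Just a moment..."')); // headed
console.log(chooseMode('heading "Example Domain"'));   // headless
```

A substring match like this is deliberately crude: it can misfire on an article that happens to contain one of the phrases, which is presumably why the source calls the detection a heuristic.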
@@ -1,402 +0,0 @@
- # barebrowse -- Blueprint
-
- Vanilla JS library. CDP-direct. URL in, pruned ARIA snapshot out.
- No Playwright, no bundled browser, no build step.
-
- ---
-
- ## What It Does
-
- Gives autonomous agents authenticated access to the web through the user's own Chromium browser.
-
- ```js
- import { browse, connect } from 'barebrowse';
-
- // One-shot: read a page
- const snapshot = await browse('https://any-page.com');
-
- // Session: navigate, interact, observe
- const page = await connect();
- await page.goto('https://any-page.com');
- console.log(await page.snapshot());
- await page.click('8'); // ref from snapshot
- await page.type('3', 'hello');
- await page.scroll(500);
- await page.close();
- ```
-
- ---
-
- ## Capabilities
-
- Every action returns a **pruned ARIA snapshot** -- the agent's view of the page after each move. The snapshot is a YAML-like tree with `[ref=N]` markers on interactive elements. The agent reads the snapshot, picks a ref, acts, then reads the next snapshot. This is the observe-think-act loop.
-
- ### Actions
-
- | Action | Method | What It Does | Status |
- |--------|--------|-------------|--------|
- | Navigate | `page.goto(url)` | Load a URL, wait for page load, dismiss consent | Done |
- | Snapshot | `page.snapshot()` | Pruned ARIA tree (47-95% token reduction) | Done |
- | Click | `page.click(ref)` | Scroll into view, mouse press+release at element center | Done |
- | Type | `page.type(ref, text)` | Focus element, insert text (fast batch mode) | Done |
- | Type (clear) | `page.type(ref, text, { clear: true })` | Select-all + delete, then type (replaces pre-filled content) | Done |
- | Type (key events) | `page.type(ref, text, { keyEvents: true })` | Char-by-char keyDown/keyUp (triggers JS handlers) | Done |
- | Press | `page.press(key)` | Special key: Enter, Tab, Escape, Backspace, Delete, arrows, Home/End, PageUp/Down, Space | Done |
- | Scroll | `page.scroll(deltaY)` | Mouse wheel event (positive=down, negative=up) | Done |
- | Hover | `page.hover(ref)` | Move mouse to element center (triggers hover styles/tooltips) | Done |
- | Select | `page.select(ref, value)` | Set `<select>` value or click custom dropdown option | Done |
- | Screenshot | `page.screenshot(opts)` | `Page.captureScreenshot`, returns base64 string | Done |
- | Wait for nav | `page.waitForNavigation()` | Promise.race of loadEventFired + frameNavigated (SPA-aware) | Done |
- | Wait for idle | `page.waitForNetworkIdle(opts)` | Resolve when no pending requests for N ms (default 500) | Done |
- | Wait for content | `page.waitFor({ text, selector })` | Poll for text or CSS selector to appear on page | Done |
- | Back / Forward | `page.goBack()` / `page.goForward()` | Browser history navigation via `Page.getNavigationHistory` | Done |
- | Drag | `page.drag(fromRef, toRef)` | Mouse down on source, move to target, release | Done |
- | Upload | `page.upload(ref, files)` | Set files on file input via `DOM.setFileInputFiles` | Done |
- | PDF | `page.pdf(opts)` | Export page as PDF via `Page.printToPDF` | Done |
- | Tabs | `page.tabs()` / `page.switchTab(index)` | List and switch between browser tabs | Done |
- | Dialog handling | Auto | JS alert/confirm/prompt auto-dismissed, logged to `page.dialogLog` | Done |
- | Save state | `page.saveState(filePath)` | Export cookies + localStorage to JSON for later `--storage-state` | Done |
- | Inject cookies | `page.injectCookies(url, opts)` | Extract cookies from Firefox/Chromium, inject via CDP | Done |
- | Raw CDP | `page.cdp.send(method, params)` | Escape hatch for any CDP command | Done |
- | Close | `page.close()` | Close page target, disconnect CDP, kill browser (if headless) | Done |
-
- ### Obstacle course -- what barebrowse handles automatically
-
- | Obstacle | How It's Handled | Mode |
- |----------|-----------------|------|
- | **Cookie consent walls** | ARIA tree scan, jsClick accept button. 29 languages | Both |
- | **Consent in dialog role** | Detect `dialog`/`alertdialog` with consent hints, click accept inside | Both |
- | **Consent outside dialog** (BBC SourcePoint) | Fallback global button scan when dialog has no accept button | Both |
- | **Consent behind iframe overlay** | JS `.click()` via `DOM.resolveNode` bypasses z-index/overlay issues | Both |
- | **Permission prompts** (location, notifications, camera, mic) | Launch flags + CDP `Browser.setPermission` auto-deny | Both |
- | **Media autoplay blocked** | `--autoplay-policy=no-user-gesture-required` | Both |
- | **Login walls** | All-browser cookie merge (Firefox + Chromium), CDP injection (user's real sessions) | Both |
- | **Pre-filled form inputs** | `type({ clear: true })` selects all + deletes before typing | Both |
- | **Off-screen elements** | `DOM.scrollIntoViewIfNeeded` before every click | Both |
- | **Form submission** | `press('Enter')` with proper `text: '\r'` triggers onsubmit | Both |
- | **Tab between fields** | `press('Tab')` with `text: '\t'` moves focus | Both |
- | **SPA navigation** (YouTube, GitHub) | `waitForNavigation()` uses frameNavigated + loadEventFired race | Both |
- | **Bot detection** (Google, Reddit) | Stealth patches (headless) + headed mode with real cookies | Both |
- | **`navigator.webdriver`** | Stealth patches: webdriver, plugins, languages, chrome object | Headless |
- | **JS dialogs** (alert/confirm/prompt) | Auto-dismiss via `Page.handleJavaScriptDialog`, logged to `dialogLog` | Both |
- | **Profile locking** | Unique temp dir per headless instance (`/tmp/barebrowse-<pid>-<ts>`) | Headless |
- | **ARIA noise** | 9-step pruning: wrapper collapse, noise removal, landmark promotion | Both |
-
- ### Not yet handled
-
- | Obstacle | What's Needed | Difficulty |
- |----------|--------------|------------|
- | Infinite scroll | Scroll + wait for new content strategy | Medium |
- | CAPTCHAs | Cannot solve -- headed mode lets user solve manually | N/A |
- | Cross-origin iframes | Frame tree traversal via CDP | Medium |
- | Canvas/WebGL | Opaque to ARIA -- needs screenshot + vision model | Hard |
-
- ### Tested sites (16+ sites, 8 countries, all consent dismissed)
-
- | Site | Consent | Cookies | Interactions | Notes |
- |------|---------|---------|-------------|-------|
- | google.com | NL dialog dismissed | Firefox injection | Search (combobox + Enter) | Bot-blocks headless |
- | youtube.com | Bypassed via cookies | Firefox injection | Search + video playback | Full e2e demo, SPA nav |
- | bbc.com | SourcePoint dismissed | -- | -- | Button outside dialog |
- | wikipedia.org | -- | -- | Link click + navigation | Clean, no consent |
- | github.com | -- | -- | SPA navigation | Needs settle time |
- | duckduckgo.com | -- | -- | Search + results | Headless-friendly |
- | news.ycombinator.com | -- | -- | Story link click | Clean, simple DOM |
- | amazon.de | Banner dismissed | -- | -- | |
- | theguardian.com | CMP dismissed | -- | -- | |
- | spiegel.de | CMP dismissed | -- | -- | German |
- | lemonde.fr | CMP dismissed | -- | -- | French |
- | elpais.com | CMP dismissed | -- | -- | Spanish |
- | corriere.it | CMP dismissed | -- | -- | Italian |
- | nos.nl | CMP dismissed | -- | -- | Dutch |
- | bild.de | CMP dismissed | -- | -- | German |
- | nu.nl | CMP dismissed | -- | -- | Dutch |
- | booking.com | Banner dismissed | -- | -- | |
- | nytimes.com | -- | -- | -- | No consent wall |
- | stackoverflow.com | Footer link only | -- | -- | Not blocking |
- | cnn.com | -- | -- | -- | No consent wall |
- | reddit.com | -- | -- | Fallback to old.reddit | Bot-blocks headless |
-
- ---
-
- ## Architecture
-
- ### Full pipeline: browse(url) or connect() -> goto(url)
-
- ```
- 1. LAUNCH          chromium.js finds installed browser
-                    Headless: spawn fresh Chromium with permission flags
-                    Headed: connect to running browser on CDP port
-                    Hybrid: try headless, detect challenge page, fallback to headed
-
- 2. CDP CONNECTION  cdp.js opens WebSocket to browser
-                    Creates page target, attaches flattened session
-                    Enables Page, Network, DOM domains
-
- 3. STEALTH         stealth.js (headless only)
-                    Page.addScriptToEvaluateOnNewDocument before any page scripts
-                    Patches: navigator.webdriver, plugins, languages, chrome object
-
- 4. PERMISSIONS     Browser.setPermission denies all prompts
-                    geo, notifications, camera, mic, midi, sensors, idle
-
- 5. AUTH            auth.js extracts cookies from user's browser
-                    Firefox: SQLite cookies.sqlite (plaintext)
-                    Chromium: SQLite Cookies + AES decrypt via keyring
-                    Injects via Network.setCookie before navigation
-
- 6. NAVIGATE        Page.navigate(url), wait for Page.loadEventFired
-                    500ms settle for dynamic content
-
- 7. CONSENT         consent.js scans ARIA tree post-load
-                    Finds dialog/alertdialog with consent hints
-                    Falls back to global button scan (BBC SourcePoint pattern)
-                    jsClick via DOM.resolveNode (bypasses iframe overlays)
-
- 8. SNAPSHOT        Accessibility.getFullAXTree -> nested tree (aria.js)
-                    prune.js: 9-step pipeline (47-95% token reduction)
-                    Output: URL + pruning stats + YAML-like text with [ref=N] markers
-
- 9. INTERACT        interact.js dispatches real CDP Input events
-                    click: scrollIntoView -> getBoxModel -> mousePressed/Released
-                    type: DOM.focus -> insertText or keyDown/keyUp per char
-                    press: special keys (Enter, Tab, Escape, arrows, etc.)
-                    scroll: mouseWheel events
-                    hover: mouseMoved at element center
-                    select: set <select> value or click custom dropdown option
-
- 10. OBSERVE AGAIN  Back to step 8. Refs are ephemeral -- fresh snapshot needed.
- ```
-
- ### Module table
-
- Thirteen modules, zero required dependencies.
-
- | Module | Lines | Purpose |
- |---|---|---|
- | `src/index.js` | 434 | Public API: `browse()`, `connect()`, screenshot, network idle, hybrid |
- | `src/cdp.js` | 148 | WebSocket CDP client, flattened sessions |
- | `src/chromium.js` | 148 | Find/launch Chromium browsers, permission-suppressing flags |
- | `src/aria.js` | 69 | Format ARIA tree as YAML-like text |
- | `src/auth.js` | 279 | Cookie extraction (Chromium AES + keyring, Firefox), CDP injection |
- | `src/prune.js` | 472 | ARIA pruning pipeline (9-step, ported from mcprune) |
- | `src/interact.js` | 208 | Click, type, press, scroll, hover, select |
- | `src/consent.js` | ~280 | Auto-dismiss cookie consent dialogs, 29 languages |
- | `src/stealth.js` | 51 | Navigator patches for headless anti-detection |
- | `src/bareagent.js` | 161 | Tool adapter for bareagent Loop |
- | `src/daemon.js` | ~230 | Background HTTP server holding connect() session for CLI mode |
- | `src/session-client.js` | ~60 | HTTP client to daemon (sendCommand, readSession, isAlive) |
- | `mcp-server.js` | 216 | MCP server (JSON-RPC 2.0 over stdio) |
-
- ---
-
- ## What's Built
-
- ### Headless mode -- done
- Spawn a fresh Chromium, navigate, snapshot, close. Default mode.
- - Cookie extraction from user's Firefox or Chromium profile
- - Cookie injection via `Network.setCookie` before navigation
- - ARIA tree extraction via `Accessibility.getFullAXTree`
- - 9-step pruning: landmarks, noise removal, wrapper collapsing, context filtering
- - 47-95% token reduction depending on page complexity
- - Permission prompts auto-suppressed (notifications, geolocation, camera, mic)
- - Stealth patches: `navigator.webdriver`, plugins, languages, chrome object
-
- ### Headed mode -- done
- Connect to an already-running browser on a CDP debug port.
- - Same ARIA + prune pipeline
- - Manual cookie injection via `page.injectCookies(url, { browser })` (e.g. inject Firefox cookies into headed Chromium)
- - Permission prompts suppressed via CDP `Browser.setPermission`
- - User must launch browser with `--remote-debugging-port=9222`
-
- ### Hybrid mode -- done
- Try headless first. If bot-blocked (Cloudflare, etc.), fall back to headed automatically.
- - Detection: heuristic on ARIA tree for challenge phrases ("Just a moment", "Checking your browser")
- - Fallback: kill headless, connect to user's running browser on port 9222, re-navigate
- - One flag: `mode: 'hybrid'`
-
- ### Interactions -- done, real-world tested
- On `connect()` sessions: `click(ref)`, `type(ref, text, opts)`, `press(key)`, `scroll(deltaY)`, `hover(ref)`, `select(ref, value)`, `screenshot()`, `waitForNavigation()`, `waitForNetworkIdle()`, `injectCookies(url, opts)`.
- - Refs come from ARIA snapshot (`[ref=N]` markers)
- - Click: `DOM.scrollIntoViewIfNeeded` -> `DOM.getBoxModel` -> center -> `Input.dispatchMouseEvent`
- - Type: `DOM.focus` + `Input.insertText` (fast) or `Input.dispatchKeyEvent` (triggers handlers)
- - Type with `{ clear: true }`: select-all (Ctrl+A) + delete before typing
- - Press: special keys (Enter, Tab, Escape, Backspace, arrows) with proper key/code/keyCode
- - Scroll: `Input.dispatchMouseEvent` mouseWheel
- - Hover: `DOM.scrollIntoViewIfNeeded` -> `Input.dispatchMouseEvent` mouseMoved
- - Select: native `<select>` (set value + change event) or custom dropdown (click + find option)
- - Screenshot: `Page.captureScreenshot` -> base64 string (png/jpeg/webp)
- - WaitForNavigation: `Promise.race` of `Page.loadEventFired` + `Page.frameNavigated` (SPA-aware)
- - WaitForNetworkIdle: track pending requests, resolve when 0 for N ms
-
- **Real-world tested against:** Google, Wikipedia, GitHub (SPA), Hacker News, DuckDuckGo, YouTube (search + video playback), example.com
-
- ### Cookie consent auto-dismiss -- done
- Automatically detects and dismisses cookie consent dialogs after page load.
- - Scans ARIA tree for `dialog`/`alertdialog` with consent-related content
- - Falls back to global button scan for sites that don't use dialog roles (e.g. BBC SourcePoint)
- - Uses JS `.click()` via `DOM.resolveNode` + `Runtime.callFunctionOn` to bypass iframe overlays
- - 29 languages: EN, NL, DE, FR, ES, IT, PT, RU, UK, PL, CS, TR, RO, HU, EL, SV, DA, NO, FI, AR, FA, ZH, JA, KO, VI, TH, HI, ID/MS
- - Opt-out via `{ consent: false }`
- - Works in both headless and headed modes
-
- **Tested against 16+ sites across 8 countries, 0 consent dialogs remaining.**
-
- ### Permission suppression -- done
- Chrome permission prompts (location, notifications, camera, mic, etc.) are suppressed automatically.
- - Headless: launch flags (`--disable-notifications`, `--autoplay-policy=no-user-gesture-required`, `--use-fake-device-for-media-stream`, `--use-fake-ui-for-media-stream`, `--disable-features=MediaRouter`)
- - Both modes: CDP `Browser.setPermission` denies geolocation, notifications, midi, audioCapture, videoCapture, sensors, idleDetection, etc.
- - No user prompt ever appears -- agents browse without interruption
-
- ### Cross-browser cookie injection -- done
- Auto mode merges cookies from all detected browsers (Chromium + Firefox, last-write-wins by name+domain). No need to use Chromium as daily browser.
- - `browse()`: auto-injects merged cookies before navigation (opt-out with `{ cookies: false }`)
- - `connect()`: manual injection via `page.injectCookies(url, { browser: 'firefox' })`
- - MCP `goto`: auto-injects cookies before every navigation
- - Proven: YouTube login session transferred from Firefox -> headed Chromium -> video playback
-
- ### Stealth patches -- done
- Anti-detection for headless mode via `Page.addScriptToEvaluateOnNewDocument` (runs before page scripts).
- - `navigator.webdriver` -> undefined
- - `navigator.plugins` -> fake 3 plugins
- - `navigator.languages` -> `['en-US', 'en']`
- - `window.chrome` -> fake object
- - `Permissions.prototype.query` -> notifications return 'prompt'
- - Applied automatically in headless mode
-
- ### Tests -- 64 passing
- - 16 unit tests (pruning logic)
- - 7 unit tests (cookie extraction -- 2 skip when Chromium profile locked)
- - 5 unit tests (CDP client + browser launch)
- - 11 integration tests (end-to-end browse pipeline)
- - 10 integration tests (CLI session lifecycle: open/snapshot/goto/click/eval/console/network/close)
- - 15 integration tests (real-world interactions: data: URL fixture + live sites)
-
- ---
-
- ## Integrations
-
- ### bareagent -- tool adapter
-
- `createBrowseTools(opts)` returns bareagent-compatible tools for the Loop:
-
- ```js
- import { Loop } from 'bare-agent';
- import { Anthropic } from 'bare-agent/providers';
- import { createBrowseTools } from 'barebrowse/src/bareagent.js';
-
- const { tools, close } = createBrowseTools();
- const loop = new Loop({ provider: new Anthropic({ apiKey }) });
- const result = await loop.run(messages, tools);
- await close();
- ```
-
- 13 tools: browse, goto, snapshot, click, type, press, scroll, select, back, forward, drag, upload, screenshot.
- Action tools auto-return snapshot (300ms settle delay). The LLM always sees the result.
-
- ### MCP server
-
- Raw JSON-RPC 2.0 over stdio. Zero SDK dependencies. `npm install barebrowse` then:
-
- ```json
- {
-   "mcpServers": {
-     "barebrowse": {
-       "command": "npx",
-       "args": ["barebrowse", "mcp"]
-     }
-   }
- }
- ```
-
- 12 tools: browse (one-shot), goto, snapshot, click, type, press, scroll, back, forward, drag, upload, pdf.
- Action tools return `'ok'` -- agent calls `snapshot` explicitly (MCP tool calls are cheap to chain).
- `browse` and `snapshot` accept `maxChars` (default 30000) — large snapshots are saved to `.barebrowse/` and a file path is returned.
- Session runs in hybrid mode (headless + automatic headed fallback on bot detection). `goto` injects cookies from the user's browser before navigation.
- Session tools share a singleton page, lazy-created on first use.
-
- ### CLI session -- for coding agents + human devs
-
- Shell commands that output to disk. Coding agents (Claude Code, Copilot, Cursor) read output files with their file tools -- no tokens wasted in tool responses.
-
- ```bash
- barebrowse open https://example.com # Start daemon + navigate
- barebrowse snapshot # → .barebrowse/page-*.yml
- barebrowse click 8 # Click element
- barebrowse console-logs # → .barebrowse/console-*.json
- barebrowse close # Kill daemon + browser
- ```
-
- Architecture: `open` spawns a detached child process running an HTTP server on a random localhost port. Session state stored in `.barebrowse/session.json`. Subsequent commands POST to the daemon. `close` sends shutdown, daemon calls `page.close()` + `process.exit(0)`.
-
- Full commands: open, close, status, goto, back, forward, snapshot, screenshot, pdf, click, type, fill, press, scroll, hover, select, drag, upload, tabs, tab, eval, wait-idle, wait-for, console-logs, network-log, dialog-log, save-state.
-
- Self-sufficiency features (console/network capture, eval) let agents debug without guessing -- they see JS errors and failed requests directly.
-
- SKILL.md (`commands/barebrowse/SKILL.md`) teaches Claude Code the CLI commands. Install with `barebrowse install --skill`.
-
- ---
339
-
- ## Ecosystem
-
- ```
- bareagent = the brain (orchestration, LLM loop, memory, retries)
- barebrowse = the eyes + hands (browse, read, interact with the web)
- ```
-
- **barebrowse is a library.** bareagent imports it as a capability. barebrowse doesn't know about bareagent. bareagent doesn't know about CDP. Clean boundary. Each ships and tests independently.
-
- ---
-
- ## Constraints
-
- - **Chromium-only.** CDP protocol. Covers Chrome, Chromium, Edge, Brave, Vivaldi, Arc, Opera (~80% desktop share). Firefox later via WebDriver BiDi.
- - **Linux first.** Tested on Fedora/KDE. macOS/Windows cookie extraction paths exist in auth.js but untested.
- - **Node >= 22.** Built-in WebSocket, built-in SQLite.
- - **Not a server.** Library that agents import. Wrap as MCP (included) or HTTP if needed.
- - **Not cross-platform tested.** Tested on Linux only. Published to npm as `barebrowse`.
-
- ---
-
- ## File Map
-
- ```
- barebrowse/
- ├── src/
- │   ├── index.js # Public API: browse(), connect(), screenshot, network idle, hybrid
- │   ├── cdp.js # WebSocket CDP client
- │   ├── chromium.js # Find/launch Chromium, permission flags
- │   ├── aria.js # ARIA tree formatting
- │   ├── auth.js # Cookie extraction + injection
- │   ├── prune.js # ARIA pruning (9-step pipeline)
- │   ├── interact.js # Click, type, press, scroll, hover, select
- │   ├── consent.js # Auto-dismiss cookie consent dialogs
- │   ├── stealth.js # Navigator patches for headless anti-detection
- │   ├── bareagent.js # Tool adapter for bareagent Loop
- │   ├── daemon.js # Background HTTP server for CLI session
- │   └── session-client.js # HTTP client to daemon
- ├── test/
- │   ├── unit/ # prune, auth, cdp tests
- │   └── integration/ # browse, interact, cli tests
- ├── examples/
- │   ├── headed-demo.js # Interactive demo: Wikipedia → DuckDuckGo
- │   └── yt-demo.js # YouTube demo: Firefox cookies → search → play video
- ├── docs/
- │   ├── README.md # Documentation navigation guide
- │   ├── 00-context/ # vision, assumptions, system-state (this file)
- │   ├── 01-product/ # prd.md
- │   ├── 03-logs/ # decisions, implementation, bugs, validation, insights
- │   ├── 04-process/ # dev-workflow, definition-of-done, testing (64 tests)
- │   └── archive/ # poc-plan.md
- ├── mcp-server.js # MCP server (JSON-RPC 2.0 over stdio)
- ├── cli.js # CLI entry: session commands, MCP, browse, install
- ├── .mcp.json # MCP server config for Claude Desktop / Cursor
- ├── barebrowse.context.md # LLM-consumable integration guide
- ├── commands/
- │   ├── barebrowse.md # CLI command reference (any agent)
- │   └── barebrowse/
- │       └── SKILL.md # CLI command reference (Claude Code skill)
- ├── package.json
- ├── README.md
- └── CLAUDE.md
- ```
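The deleted system-state file above describes the MCP server as JSON-RPC 2.0 over stdio, with action tools returning `'ok'` and the agent requesting a `snapshot` explicitly. As a rough illustration only, the sketch below frames such a sequence as MCP `tools/call` requests, one JSON message per line. The tool names and `maxChars` come from that file; the argument names `url` and `ref` are assumptions made for this example and may not match the real tool schemas.

```javascript
// Sketch only: builds MCP `tools/call` requests as newline-delimited JSON-RPC 2.0.
// Tool names and `maxChars` are taken from the deleted docs; the `url` and `ref`
// argument names are hypothetical.
let nextId = 1;

function toolCall(name, args = {}) {
  const request = {
    jsonrpc: '2.0',
    id: nextId++,
    method: 'tools/call',
    params: { name, arguments: args },
  };
  // The stdio transport carries one JSON message per line.
  return JSON.stringify(request) + '\n';
}

// Navigate, act, then ask for a snapshot explicitly, since action tools
// only answer with 'ok'.
const lines = [
  toolCall('goto', { url: 'https://example.com' }),
  toolCall('click', { ref: '8' }),
  toolCall('snapshot', { maxChars: 30000 }),
];

for (const line of lines) process.stdout.write(line);
```

A client would write these lines to the server's stdin and match responses to requests by `id`.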
@@ -1,52 +0,0 @@
- # barebrowse -- Vision
-
- ## What it is
-
- A standalone vanilla JavaScript library that gives autonomous agents authenticated access to the web through the user's own Chromium browser. One package, one import, three modes.
-
- ```js
- import { browse } from 'barebrowse';
- const snapshot = await browse('https://any-page.com');
- ```
-
- barebrowse handles: finding the browser, connecting via CDP, injecting cookies, navigating, extracting the ARIA accessibility tree, and pruning it down to what an agent actually needs. The output is a clean, token-efficient snapshot of any web page -- authenticated as the real user.
-
- ## What it is NOT
-
- - **Not a framework.** No plugin system, no config files, no lifecycle hooks.
- - **Not Playwright.** No bundled browser, no cross-engine abstraction, no 200MB download.
- - **Not an agent.** No LLM, no planning, no orchestration -- that's bareagent's job.
- - **Not a scraper.** It browses as the user, not as a bot harvesting data.
-
- ## The core insight
-
- The user already has a browser. It's already logged in. It already passes Cloudflare. Instead of fighting the web with headless stealth tricks, **use what's already there**.
-
- CDP (Chrome DevTools Protocol) lets us connect to any Chromium-based browser -- the same one the user browses with daily. We get their cookies, their sessions, their anti-detection posture, for free.
-
- ## The problem it solves
-
- Every AI agent that needs to read or interact with the web hits the same walls:
-
- 1. **Cloudflare / bot detection** -- headless browsers get blocked
- 2. **Authentication** -- sites require login, OAuth, session cookies
- 3. **Token bloat** -- raw DOM is 100K+ tokens; agents need ~5K
- 4. **Two consumers, same need** -- research agents (read pages) and personal assistants (click/type) both need an authenticated browser, but existing tools force you to choose one path
-
- ## The bare- ecosystem
-
- ```
- bareagent = the brain (orchestration, LLM loop, memory, retries)
- barebrowse = the eyes + hands (browse, read, interact with the web)
- ```
-
- barebrowse is a library. bareagent imports it as a capability. barebrowse doesn't know about bareagent. bareagent doesn't know about CDP. Clean boundary. Each ships and tests independently.
-
- ## Success criteria
-
- 1. `browse(url)` returns a pruned ARIA snapshot of any page, authenticated as the user
- 2. Zero heavy dependencies -- no Playwright, no Puppeteer, no bundled browser
- 3. Works with any installed Chromium-based browser
- 4. Headless for research, headed for interaction, hybrid for autonomous agents
- 5. Plugs into bareagent as plain tool functions
- 6. An agent using barebrowse + bareagent can autonomously research the web and act on pages
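The deleted vision file above lists the steps behind `browse(url)`: connect via CDP, navigate, then extract the ARIA accessibility tree. As a hedged illustration of what "CDP direct" means at the wire level, the sketch below frames three standard CDP commands as JSON messages with increasing ids. `createCommandFactory` is a hypothetical helper written for this example, and the sequence is not necessarily the one barebrowse sends.

```javascript
// Sketch only: frames Chrome DevTools Protocol commands as JSON messages.
// `createCommandFactory` is a hypothetical helper, not part of barebrowse.
function createCommandFactory() {
  let id = 0;
  // Every CDP command carries a unique id so its response can be matched later.
  return (method, params = {}) => JSON.stringify({ id: ++id, method, params });
}

const command = createCommandFactory();

// In a real session these would be sent over a WebSocket to a page target,
// waiting for the page to finish loading before asking for the tree.
const messages = [
  command('Page.enable'),
  command('Page.navigate', { url: 'https://example.com' }),
  command('Accessibility.getFullAXTree'),
];

for (const message of messages) console.log(message);
```

The response to `Accessibility.getFullAXTree` is the raw accessibility tree, which is the input the pruning step then reduces to a token-efficient snapshot.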