barebrowse 0.4.8 → 0.5.2
- package/CHANGELOG.md +37 -0
- package/README.md +4 -3
- package/barebrowse.context.md +34 -6
- package/mcp-server.js +38 -1
- package/package.json +4 -1
- package/src/bareagent.js +80 -0
- package/.barebrowse/page-2026-02-23T15-39-32-011Z.yml +0 -1219
- package/.barebrowse/page-2026-02-23T15-40-19-874Z.yml +0 -663
- package/.mcp.json +0 -8
- package/CLAUDE.md +0 -24
- package/baremobile.md +0 -105
- package/docs/00-context/assumptions.md +0 -38
- package/docs/00-context/system-state.md +0 -402
- package/docs/00-context/vision.md +0 -52
- package/docs/01-product/prd.md +0 -308
- package/docs/03-logs/bug-log.md +0 -16
- package/docs/03-logs/decisions-log.md +0 -32
- package/docs/03-logs/implementation-log.md +0 -54
- package/docs/03-logs/insights.md +0 -35
- package/docs/03-logs/validation-log.md +0 -269
- package/docs/04-process/definition-of-done.md +0 -31
- package/docs/04-process/dev-workflow.md +0 -68
- package/docs/04-process/testing.md +0 -242
- package/docs/README.md +0 -56
- package/docs/archive/poc-plan.md +0 -230
- package/docs/skill-template.md +0 -106
package/.mcp.json
DELETED
package/CLAUDE.md
DELETED
@@ -1,24 +0,0 @@
-## Dev Rules
-
-**POC first.** Always validate logic with a ~15min proof-of-concept before building. Cover happy path + common edges. POC works → design properly → build with tests. Never ship the POC.
-
-**Build incrementally.** Break work into small independent modules. One piece at a time, each must work on its own before integrating.
-
-**Dependency hierarchy — follow strictly:** vanilla language → standard library → external (only when stdlib can't do it in <100 lines). External deps must be maintained, lightweight, and widely adopted. Exception: always use vetted libraries for security-critical code (crypto, auth, sanitization).
-
-**Lightweight over complex.** Fewer moving parts, fewer deps, less config. Simple > clever. Readable > elegant.
-
-**Open-source only.** No vendor lock-in. Every line of code must have a purpose — no speculative code, no premature abstractions.
-
-## Project Specifics
-
-- **What:** Vanilla JS library — CDP-direct browsing for autonomous agents. URL in, pruned ARIA snapshot out.
-- **Language:** Vanilla JavaScript, ES modules, no build step
-- **Runtime:** Node.js >= 22 (built-in WebSocket, sqlite)
-- **Protocol:** CDP (Chrome DevTools Protocol) direct — no Playwright
-- **Browser:** Any installed Chromium-based browser (chromium, chrome, brave, edge)
-- **Modules:** 11 files in `src/`, ~2,400 lines, zero required deps
-- **Tests:** 54 passing — run with `node --test test/unit/*.test.js test/integration/*.test.js`
-- **Docs:** `docs/README.md` (navigation guide to all documentation)
-
-For full development and testing standards, see `.claude/memory/AGENT_RULES.md`.
package/baremobile.md
DELETED
@@ -1,105 +0,0 @@
-# baremobile — Research & Feasibility
-
-## Platform Feasibility
-
-| Platform | Accessibility Tree | Input Injection | Auth/Cookie Reuse | Practical? |
-|---|---|---|---|---|
-| **Android (non-root)** | uiautomator dump — good XML tree | ADB tap/swipe/input — solid | No. SharedPrefs locked, Keystore hardware-bound | Yes — tree + input work, auth is the gap |
-| **Android (rooted)** | Same | Same | Partial — SharedPrefs yes, Keystore still no | Yes — best mobile option |
-| **iOS (simulator)** | XCUITest — excellent tree | simctl + WDA | No. Absolute sandboxing | Dev/QA only |
-| **iOS (physical)** | Same, but needs Mac+Xcode+signing | Same | No | Impractical for end users |
-| **Windows** | UI Automation (UIA) — best desktop tree | SendInput — works well | App-specific, no universal trick | Yes — strongest desktop option |
-| **macOS** | AXUIElement — good tree | CGEvent APIs, cliclick | Keychain Access (gated by prompts) | Medium — permission hell |
-| **Linux (X11)** | AT-SPI2 — inconsistent coverage | xdotool — trivial | App-specific | Medium — tree quality varies |
-| **Linux (Wayland)** | AT-SPI2 — same | ydotool (needs root) | Same | Hard — input injection blocked by design |
-
-## Market Demand
-
-| Platform | Who Wants It | Use Cases | Demand Level |
-|---|---|---|---|
-| **Android** | Devs, QA teams, end users | Mobile-only apps, testing, data entry, social media automation | **High** — DroidRun got 900 signups in 72h, 2.1M EUR raised |
-| **iOS** | QA teams, enterprises | App testing, accessibility audits | **Medium** — gated access kills consumer use |
-| **Windows** | Enterprises | Legacy app automation (no API, GUI only) | **High** — Microsoft building UFO (8k stars) |
-| **macOS** | Developers, power users | App automation, workflows | **Low-Medium** — most Mac apps have APIs or CLI |
-| **Linux desktop** | Almost nobody | Niche automation | **Low** — devs use CLI, not GUI agents |
-
-## Existing Open-Source Competition
-
-| Project | Platform | Stars | Approach | Maturity |
-|---|---|---|---|---|
-| DroidRun | Android | 3.8k | A11y tree + ADB | Funded (2.1M EUR), active |
-| DroidClaw | Android | New | A11y tree + vision fallback | Weeks old |
-| agent-device (Callstack) | Android + iOS | Early | A11y tree, TypeScript | Active |
-| UFO (Microsoft) | Windows | 8k | UIA + vision | Mature |
-| Agent-S (Simular AI) | Cross-platform | 8.5k | Hybrid | Mature, 72% OSWorld |
-| Cua | macOS/Linux/Win | 11.8k | Sandbox VMs | YC-backed |
-
-## Auth Problem (Mobile vs Web)
-
-| | Web (barebrowse) | Android Native | iOS Native |
-|---|---|---|---|
-| Where tokens live | SQLite cookie DB, readable | SharedPrefs (locked) or Keystore (hardware) | Keychain (sandboxed) |
-| Can agent read them? | Yes — decrypt with OS keyring | No (non-root), Partial (root) | No |
-| Workaround | N/A — it works | Agent logs in via UI, or keeps app session alive | Same |
-| WebView content? | N/A | CDP attach possible (debug builds only) | No |
-
-## Strategic Comparison
-
-| Factor | Android | Windows | iOS | macOS | Linux Desktop |
-|---|---|---|---|---|---|
-| Tree quality | Good | Excellent | Excellent | Good | Inconsistent |
-| Input control | Easy (ADB) | Easy | Gated | Permission-heavy | X11 easy, Wayland hard |
-| Auth reuse | Bad | App-specific | Impossible | Gated | App-specific |
-| Real demand | **High** | **High** | Medium (QA only) | Low-Medium | Low |
-| Competition | Active but early | Microsoft owns it | Apple controls it | Niche | Nobody cares |
-| Fits barebrowse DNA? | **Yes** | Partial | No | No | No |
-
-## Android Technical Details
-
-### Accessibility Tree via ADB
-```bash
-adb shell uiautomator dump /dev/tty # dump XML accessibility tree
-```
-Returns XML with: bounds, text, class, content-desc, resource-id, clickable, scrollable, focused, enabled, checked, selected. Structurally similar to ARIA — roles, names, states, coordinates.
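
Because each node in the dump carries a `bounds` attribute, a tap target can be derived from the tree alone. The following is a minimal sketch, assuming the usual `[left,top][right,bottom]` form of the bounds string. The helper name `centerOfBounds` is hypothetical and not part of barebrowse or baremobile.

```javascript
// Hypothetical helper: turn a uiautomator bounds string into tap coordinates.
// Assumes the "[left,top][right,bottom]" form used in uiautomator XML dumps.
function centerOfBounds(bounds) {
  const match = /^\[(-?\d+),(-?\d+)\]\[(-?\d+),(-?\d+)\]$/.exec(bounds);
  if (!match) {
    throw new Error(`Unrecognized bounds string: ${bounds}`);
  }
  const [left, top, right, bottom] = match.slice(1).map(Number);
  return {
    x: Math.round((left + right) / 2),
    y: Math.round((top + bottom) / 2),
  };
}

// Build the matching ADB tap command for one node.
const { x, y } = centerOfBounds('[400,250][600,350]');
console.log(`adb shell input tap ${x} ${y}`);
```

Rejecting malformed strings early keeps a bad parse from turning into a tap at the wrong coordinates.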
-
-### Input via ADB
-```bash
-adb shell input tap 500 300 # tap at coordinates
-adb shell input text "hello" # type text
-adb shell input keyevent 66 # Enter key (KEYCODE_ENTER)
-adb shell input swipe 300 500 300 100 # swipe gesture
-adb shell input keyevent 4 # Back button
-```
-
-### Key Limitations
-- **WebViews:** uiautomator tree is empty/shallow for WebView content. Flutter apps can crash uiautomator with StackOverflowError.
-- **Auth:** Cannot read app tokens on non-rooted devices. Agent must log in through UI or keep sessions alive.
-- **Latency:** uiautomator dump takes 1-3 seconds. Screenshot approach is faster per frame but less structured.
-
-### WebView Gap — Potential Differentiator
-Android WebViews expose a CDP debug port when the app is built with `WebView.setWebContentsDebuggingEnabled(true)`. If accessible, barebrowse's CDP + ARIA expertise can fill the gap that all other Android agent tools struggle with — structured content inside WebViews instead of falling back to screenshots.
-
-Discovery: `adb forward tcp:9222 localabstract:webview_devtools_remote_<pid>`
-
-## Windows Technical Details
-
-### UI Automation (UIA)
-Windows' accessibility API. Exposes a tree of AutomationElements with: ControlType, Name, AutomationId, BoundingRectangle, IsEnabled, patterns (Invoke, Value, Toggle, Selection, Scroll, etc.).
-
-Best desktop accessibility tree. Covers Win32, WPF, WinForms, UWP, and most Electron apps.
-
-### Input
-SendInput API for keyboard/mouse. Or higher-level: `pyautogui`, `robotjs`, `nut.js`.
-
-### Competition
-Microsoft's own UFO project (8k stars) dominates this space. They have first-party UIA access and deep investment. Competing here means competing with Microsoft on their own platform's APIs.
-
-## References
-- [DroidRun](https://github.com/droidrun/droidrun)
-- [DroidClaw](https://github.com/unitedbyai/droidclaw)
-- [agent-device](https://github.com/callstackincubator/agent-device)
-- [UFO (Microsoft)](https://github.com/microsoft/UFO)
-- [Agent-S](https://github.com/simular-ai/Agent-S)
-- [Cua](https://github.com/trycua/cua)
-- [Android uiautomator](https://developer.android.com/training/testing/other-components/ui-automator)
-- [Windows UI Automation](https://learn.microsoft.com/en-us/windows/win32/winauto/entry-uiauto-win32)
package/docs/00-context/assumptions.md
DELETED
@@ -1,38 +0,0 @@
-# barebrowse -- Assumptions & Constraints
-
-## Hard constraints
-
-| Constraint | Detail |
-|-----------|--------|
-| **Chromium-only** | CDP protocol. Covers Chrome, Chromium, Edge, Brave, Vivaldi, Arc, Opera (~80% desktop share). Firefox later via WebDriver BiDi. |
-| **Node >= 22** | Built-in WebSocket (`globalThis.WebSocket`), built-in SQLite (`node:sqlite`). No polyfills. |
-| **Linux first** | Tested on Fedora/KDE/Wayland. macOS/Windows cookie extraction paths exist in auth.js but are untested. |
-| **Zero required deps** | Everything uses Node stdlib. Vanilla JS, ES modules, no build step. |
-| **Not a server** | Library that agents import. MCP wrapper included, HTTP wrapper is DIY. |
-
-## Assumptions
-
-- **User has Chromium installed.** At least one of: chromium-browser, google-chrome, brave-browser, microsoft-edge. `chromium.js` searches common paths.
-- **Cookie extraction needs unlocked profile.** Chromium cookies are AES-encrypted with a keyring key (KWallet on KDE, GNOME Keyring on GNOME). Firefox cookies are plaintext SQLite and always accessible.
-- **Headed mode requires manual browser launch.** User must start their browser with `--remote-debugging-port=9222`. barebrowse connects to it -- does not launch it.
-- **Hybrid fallback needs a running headed browser.** If headless is bot-blocked, hybrid kills headless and connects to headed on port 9222. That browser must already be running.
-- **Cookies expire.** Cookie injection works for existing sessions, not new logins. For sites requiring fresh auth, headed mode with user interaction is the fallback.
-- **One page per connect().** Each `connect()` call creates one page. For multiple tabs, call `connect()` multiple times.
-
-## Known limitations
-
-| Limitation | Impact | Workaround |
-|-----------|--------|------------|
-| No Firefox/WebKit support | ~20% of desktop users can't use native browser | Use Chromium as the automation target, Firefox as cookie source |
-| No file upload | Can't interact with file inputs | Not yet implemented (`Input.setFiles` via CDP) |
-| No drag and drop | Can't use drag-based UIs | Not yet implemented |
-| No cross-origin iframes | Content inside iframes invisible to ARIA tree | Frame tree traversal via CDP (medium effort) |
-| No CAPTCHAs | Cannot solve challenge pages | Headed mode lets user solve manually |
-| Canvas/WebGL opaque | No ARIA representation | Needs screenshot + vision model |
-| macOS/Windows untested | Cookie paths exist but may not work | Linux-only for now |
-
-## Risks
-
-- **CDP is not a stable API.** Chrome team can change it across versions. Mitigation: we use well-established domains (Accessibility, Input, Page, Network, DOM) that rarely break.
-- **Cookie consent patterns evolve.** New consent frameworks may not be detected by `consent.js`. Mitigation: best-effort, opt-out with `{ consent: false }`.
-- **Stealth patches are an arms race.** Bot detection evolves. Mitigation: headed mode with real browser profile is the ultimate fallback.
package/docs/00-context/system-state.md
DELETED
@@ -1,402 +0,0 @@
-# barebrowse -- Blueprint
-
-Vanilla JS library. CDP-direct. URL in, pruned ARIA snapshot out.
-No Playwright, no bundled browser, no build step.
-
----
-
-## What It Does
-
-Gives autonomous agents authenticated access to the web through the user's own Chromium browser.
-
-```js
-import { browse, connect } from 'barebrowse';
-
-// One-shot: read a page
-const snapshot = await browse('https://any-page.com');
-
-// Session: navigate, interact, observe
-const page = await connect();
-await page.goto('https://any-page.com');
-console.log(await page.snapshot());
-await page.click('8'); // ref from snapshot
-await page.type('3', 'hello');
-await page.scroll(500);
-await page.close();
-```
-
----
-
-## Capabilities
-
-Every action returns a **pruned ARIA snapshot** -- the agent's view of the page after each move. The snapshot is a YAML-like tree with `[ref=N]` markers on interactive elements. The agent reads the snapshot, picks a ref, acts, then reads the next snapshot. This is the observe-think-act loop.
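
The observe step of that loop can be sketched as pulling the `[ref=N]` markers out of the snapshot text. This is a minimal sketch: the sample snapshot lines and the `extractRefs` helper are illustrative assumptions, and only the `[ref=N]` marker format comes from the description above.

```javascript
// Hypothetical helper: list the refs an agent could act on.
// Only the "[ref=N]" marker format is taken from the text; the sample
// snapshot below is made up for illustration.
function extractRefs(snapshotText) {
  const refs = [];
  for (const line of snapshotText.split('\n')) {
    const match = /\[ref=(\d+)\]/.exec(line);
    if (match) {
      // Refs stay strings, matching calls such as page.click('8').
      refs.push({ ref: match[1], line: line.trim() });
    }
  }
  return refs;
}

const sampleSnapshot = [
  '- main',
  '  - textbox "Search" [ref=3]',
  '  - button "Submit" [ref=8]',
].join('\n');

for (const { ref, line } of extractRefs(sampleSnapshot)) {
  console.log(ref, '->', line);
}
```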
-
-### Actions
-
-| Action | Method | What It Does | Status |
-|--------|--------|-------------|--------|
-| Navigate | `page.goto(url)` | Load a URL, wait for page load, dismiss consent | Done |
-| Snapshot | `page.snapshot()` | Pruned ARIA tree (47-95% token reduction) | Done |
-| Click | `page.click(ref)` | Scroll into view, mouse press+release at element center | Done |
-| Type | `page.type(ref, text)` | Focus element, insert text (fast batch mode) | Done |
-| Type (clear) | `page.type(ref, text, { clear: true })` | Select-all + delete, then type (replaces pre-filled content) | Done |
-| Type (key events) | `page.type(ref, text, { keyEvents: true })` | Char-by-char keyDown/keyUp (triggers JS handlers) | Done |
-| Press | `page.press(key)` | Special key: Enter, Tab, Escape, Backspace, Delete, arrows, Home/End, PageUp/Down, Space | Done |
-| Scroll | `page.scroll(deltaY)` | Mouse wheel event (positive=down, negative=up) | Done |
-| Hover | `page.hover(ref)` | Move mouse to element center (triggers hover styles/tooltips) | Done |
-| Select | `page.select(ref, value)` | Set `<select>` value or click custom dropdown option | Done |
-| Screenshot | `page.screenshot(opts)` | `Page.captureScreenshot`, returns base64 string | Done |
-| Wait for nav | `page.waitForNavigation()` | Promise.race of loadEventFired + frameNavigated (SPA-aware) | Done |
-| Wait for idle | `page.waitForNetworkIdle(opts)` | Resolve when no pending requests for N ms (default 500) | Done |
-| Wait for content | `page.waitFor({ text, selector })` | Poll for text or CSS selector to appear on page | Done |
-| Back / Forward | `page.goBack()` / `page.goForward()` | Browser history navigation via `Page.getNavigationHistory` | Done |
-| Drag | `page.drag(fromRef, toRef)` | Mouse down on source, move to target, release | Done |
-| Upload | `page.upload(ref, files)` | Set files on file input via `DOM.setFileInputFiles` | Done |
-| PDF | `page.pdf(opts)` | Export page as PDF via `Page.printToPDF` | Done |
-| Tabs | `page.tabs()` / `page.switchTab(index)` | List and switch between browser tabs | Done |
-| Dialog handling | Auto | JS alert/confirm/prompt auto-dismissed, logged to `page.dialogLog` | Done |
-| Save state | `page.saveState(filePath)` | Export cookies + localStorage to JSON for later `--storage-state` | Done |
-| Inject cookies | `page.injectCookies(url, opts)` | Extract cookies from Firefox/Chromium, inject via CDP | Done |
-| Raw CDP | `page.cdp.send(method, params)` | Escape hatch for any CDP command | Done |
-| Close | `page.close()` | Close page target, disconnect CDP, kill browser (if headless) | Done |
-
-### Obstacle course -- what barebrowse handles automatically
-
-| Obstacle | How It's Handled | Mode |
-|----------|-----------------|------|
-| **Cookie consent walls** | ARIA tree scan, jsClick accept button. 29 languages | Both |
-| **Consent in dialog role** | Detect `dialog`/`alertdialog` with consent hints, click accept inside | Both |
-| **Consent outside dialog** (BBC SourcePoint) | Fallback global button scan when dialog has no accept button | Both |
-| **Consent behind iframe overlay** | JS `.click()` via `DOM.resolveNode` bypasses z-index/overlay issues | Both |
-| **Permission prompts** (location, notifications, camera, mic) | Launch flags + CDP `Browser.setPermission` auto-deny | Both |
-| **Media autoplay blocked** | `--autoplay-policy=no-user-gesture-required` | Both |
-| **Login walls** | All-browser cookie merge (Firefox + Chromium), CDP injection (user's real sessions) | Both |
-| **Pre-filled form inputs** | `type({ clear: true })` selects all + deletes before typing | Both |
-| **Off-screen elements** | `DOM.scrollIntoViewIfNeeded` before every click | Both |
-| **Form submission** | `press('Enter')` with proper `text: '\r'` triggers onsubmit | Both |
-| **Tab between fields** | `press('Tab')` with `text: '\t'` moves focus | Both |
-| **SPA navigation** (YouTube, GitHub) | `waitForNavigation()` uses frameNavigated + loadEventFired race | Both |
-| **Bot detection** (Google, Reddit) | Stealth patches (headless) + headed mode with real cookies | Both |
-| **`navigator.webdriver`** | Stealth patches: webdriver, plugins, languages, chrome object | Headless |
-| **JS dialogs** (alert/confirm/prompt) | Auto-dismiss via `Page.handleJavaScriptDialog`, logged to `dialogLog` | Both |
-| **Profile locking** | Unique temp dir per headless instance (`/tmp/barebrowse-<pid>-<ts>`) | Headless |
-| **ARIA noise** | 9-step pruning: wrapper collapse, noise removal, landmark promotion | Both |
-
-### Not yet handled
-
-| Obstacle | What's Needed | Difficulty |
-|----------|--------------|------------|
-| Infinite scroll | Scroll + wait for new content strategy | Medium |
-| CAPTCHAs | Cannot solve -- headed mode lets user solve manually | N/A |
-| Cross-origin iframes | Frame tree traversal via CDP | Medium |
-| Canvas/WebGL | Opaque to ARIA -- needs screenshot + vision model | Hard |
-
-### Tested sites (16+ sites, 8 countries, all consent dismissed)
-
-| Site | Consent | Cookies | Interactions | Notes |
-|------|---------|---------|-------------|-------|
-| google.com | NL dialog dismissed | Firefox injection | Search (combobox + Enter) | Bot-blocks headless |
-| youtube.com | Bypassed via cookies | Firefox injection | Search + video playback | Full e2e demo, SPA nav |
-| bbc.com | SourcePoint dismissed | -- | -- | Button outside dialog |
-| wikipedia.org | -- | -- | Link click + navigation | Clean, no consent |
-| github.com | -- | -- | SPA navigation | Needs settle time |
-| duckduckgo.com | -- | -- | Search + results | Headless-friendly |
-| news.ycombinator.com | -- | -- | Story link click | Clean, simple DOM |
-| amazon.de | Banner dismissed | -- | -- | |
-| theguardian.com | CMP dismissed | -- | -- | |
-| spiegel.de | CMP dismissed | -- | -- | German |
-| lemonde.fr | CMP dismissed | -- | -- | French |
-| elpais.com | CMP dismissed | -- | -- | Spanish |
-| corriere.it | CMP dismissed | -- | -- | Italian |
-| nos.nl | CMP dismissed | -- | -- | Dutch |
-| bild.de | CMP dismissed | -- | -- | German |
-| nu.nl | CMP dismissed | -- | -- | Dutch |
-| booking.com | Banner dismissed | -- | -- | |
-| nytimes.com | -- | -- | -- | No consent wall |
-| stackoverflow.com | Footer link only | -- | -- | Not blocking |
-| cnn.com | -- | -- | -- | No consent wall |
-| reddit.com | -- | -- | Fallback to old.reddit | Bot-blocks headless |
-
----
-
-## Architecture
-
-### Full pipeline: browse(url) or connect() -> goto(url)
-
-```
-1. LAUNCH chromium.js finds installed browser
-    Headless: spawn fresh Chromium with permission flags
-    Headed: connect to running browser on CDP port
-    Hybrid: try headless, detect challenge page, fallback to headed
-
-2. CDP CONNECTION cdp.js opens WebSocket to browser
-    Creates page target, attaches flattened session
-    Enables Page, Network, DOM domains
-
-3. STEALTH stealth.js (headless only)
-    Page.addScriptToEvaluateOnNewDocument before any page scripts
-    Patches: navigator.webdriver, plugins, languages, chrome object
-
-4. PERMISSIONS Browser.setPermission denies all prompts
-    geo, notifications, camera, mic, midi, sensors, idle
-
-5. AUTH auth.js extracts cookies from user's browser
-    Firefox: SQLite cookies.sqlite (plaintext)
-    Chromium: SQLite Cookies + AES decrypt via keyring
-    Injects via Network.setCookie before navigation
-
-6. NAVIGATE Page.navigate(url), wait for Page.loadEventFired
-    500ms settle for dynamic content
-
-7. CONSENT consent.js scans ARIA tree post-load
-    Finds dialog/alertdialog with consent hints
-    Falls back to global button scan (BBC SourcePoint pattern)
-    jsClick via DOM.resolveNode (bypasses iframe overlays)
-
-8. SNAPSHOT Accessibility.getFullAXTree -> nested tree (aria.js)
-    prune.js: 9-step pipeline (47-95% token reduction)
-    Output: URL + pruning stats + YAML-like text with [ref=N] markers
-
-9. INTERACT interact.js dispatches real CDP Input events
-    click: scrollIntoView -> getBoxModel -> mousePressed/Released
-    type: DOM.focus -> insertText or keyDown/keyUp per char
-    press: special keys (Enter, Tab, Escape, arrows, etc.)
-    scroll: mouseWheel events
-    hover: mouseMoved at element center
-    select: set <select> value or click custom dropdown option
-
-10. OBSERVE AGAIN Back to step 8. Refs are ephemeral -- fresh snapshot needed.
-```
-
-### Module table
-
-Thirteen modules, zero required dependencies.
-
-| Module | Lines | Purpose |
-|---|---|---|
-| `src/index.js` | 434 | Public API: `browse()`, `connect()`, screenshot, network idle, hybrid |
-| `src/cdp.js` | 148 | WebSocket CDP client, flattened sessions |
-| `src/chromium.js` | 148 | Find/launch Chromium browsers, permission-suppressing flags |
-| `src/aria.js` | 69 | Format ARIA tree as YAML-like text |
-| `src/auth.js` | 279 | Cookie extraction (Chromium AES + keyring, Firefox), CDP injection |
-| `src/prune.js` | 472 | ARIA pruning pipeline (9-step, ported from mcprune) |
-| `src/interact.js` | 208 | Click, type, press, scroll, hover, select |
-| `src/consent.js` | ~280 | Auto-dismiss cookie consent dialogs, 29 languages |
-| `src/stealth.js` | 51 | Navigator patches for headless anti-detection |
-| `src/bareagent.js` | 161 | Tool adapter for bareagent Loop |
-| `src/daemon.js` | ~230 | Background HTTP server holding connect() session for CLI mode |
-| `src/session-client.js` | ~60 | HTTP client to daemon (sendCommand, readSession, isAlive) |
-| `mcp-server.js` | 216 | MCP server (JSON-RPC 2.0 over stdio) |
-
----
-
-## What's Built
-
-### Headless mode -- done
-Spawn a fresh Chromium, navigate, snapshot, close. Default mode.
-- Cookie extraction from user's Firefox or Chromium profile
-- Cookie injection via `Network.setCookie` before navigation
-- ARIA tree extraction via `Accessibility.getFullAXTree`
-- 9-step pruning: landmarks, noise removal, wrapper collapsing, context filtering
-- 47-95% token reduction depending on page complexity
-- Permission prompts auto-suppressed (notifications, geolocation, camera, mic)
-- Stealth patches: `navigator.webdriver`, plugins, languages, chrome object
-
-### Headed mode -- done
-Connect to an already-running browser on a CDP debug port.
-- Same ARIA + prune pipeline
-- Manual cookie injection via `page.injectCookies(url, { browser })` (e.g. inject Firefox cookies into headed Chromium)
-- Permission prompts suppressed via CDP `Browser.setPermission`
-- User must launch browser with `--remote-debugging-port=9222`
-
-### Hybrid mode -- done
-Try headless first. If bot-blocked (Cloudflare, etc.), fall back to headed automatically.
-- Detection: heuristic on ARIA tree for challenge phrases ("Just a moment", "Checking your browser")
-- Fallback: kill headless, connect to user's running browser on port 9222, re-navigate
-- One flag: `mode: 'hybrid'`
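
The detection bullet above can be sketched as a plain phrase scan over the snapshot text. This is a minimal sketch under stated assumptions: `looksLikeChallenge` is a hypothetical name, the case-insensitive match is a choice made here, and the two phrases are the only ones quoted in the text, so the real heuristic may check more.

```javascript
// Phrases quoted in the text; a real list would likely be longer.
const CHALLENGE_PHRASES = ['Just a moment', 'Checking your browser'];

// Hypothetical helper: true when snapshot text looks like a bot-challenge page.
function looksLikeChallenge(snapshotText) {
  const haystack = snapshotText.toLowerCase();
  return CHALLENGE_PHRASES.some((phrase) =>
    haystack.includes(phrase.toLowerCase())
  );
}

console.log(looksLikeChallenge('- heading "Just a moment..."'));
console.log(looksLikeChallenge('- heading "Example Domain"'));
```

A caller in hybrid mode would run a check like this on the first headless snapshot and switch to the headed browser only when it returns `true`.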
-
-### Interactions -- done, real-world tested
-On `connect()` sessions: `click(ref)`, `type(ref, text, opts)`, `press(key)`, `scroll(deltaY)`, `hover(ref)`, `select(ref, value)`, `screenshot()`, `waitForNavigation()`, `waitForNetworkIdle()`, `injectCookies(url, opts)`.
-- Refs come from ARIA snapshot (`[ref=N]` markers)
-- Click: `DOM.scrollIntoViewIfNeeded` -> `DOM.getBoxModel` -> center -> `Input.dispatchMouseEvent`
-- Type: `DOM.focus` + `Input.insertText` (fast) or `Input.dispatchKeyEvent` (triggers handlers)
-- Type with `{ clear: true }`: select-all (Ctrl+A) + delete before typing
-- Press: special keys (Enter, Tab, Escape, Backspace, arrows) with proper key/code/keyCode
-- Scroll: `Input.dispatchMouseEvent` mouseWheel
-- Hover: `DOM.scrollIntoViewIfNeeded` -> `Input.dispatchMouseEvent` mouseMoved
-- Select: native `<select>` (set value + change event) or custom dropdown (click + find option)
-- Screenshot: `Page.captureScreenshot` -> base64 string (png/jpeg/webp)
-- WaitForNavigation: `Promise.race` of `Page.loadEventFired` + `Page.frameNavigated` (SPA-aware)
-- WaitForNetworkIdle: track pending requests, resolve when 0 for N ms
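
The "center" step in the click bullet above can be sketched as averaging the four corners of a box-model quad. CDP's `DOM.getBoxModel` reports each box as a quad of eight numbers. The helper name `quadCenter` and the sample quad are illustrative assumptions, not code from `interact.js`.

```javascript
// Hypothetical helper: center of a CDP quad (eight numbers: x1,y1 ... x4,y4),
// such as the `content` quad in a DOM.getBoxModel result.
function quadCenter(quad) {
  if (quad.length !== 8) {
    throw new Error('Expected a quad of 8 numbers');
  }
  let sumX = 0;
  let sumY = 0;
  for (let i = 0; i < 8; i += 2) {
    sumX += quad[i];
    sumY += quad[i + 1];
  }
  return { x: sumX / 4, y: sumY / 4 };
}

// A 100x40 box whose top-left corner sits at (10, 20).
const center = quadCenter([10, 20, 110, 20, 110, 60, 10, 60]);
console.log(center);
```

The resulting point is what a mouse press and release would be dispatched at once the element has been scrolled into view.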
-
-**Real-world tested against:** Google, Wikipedia, GitHub (SPA), Hacker News, DuckDuckGo, YouTube (search + video playback), example.com
-
-### Cookie consent auto-dismiss -- done
-Automatically detects and dismisses cookie consent dialogs after page load.
-- Scans ARIA tree for `dialog`/`alertdialog` with consent-related content
-- Falls back to global button scan for sites that don't use dialog roles (e.g. BBC SourcePoint)
-- Uses JS `.click()` via `DOM.resolveNode` + `Runtime.callFunctionOn` to bypass iframe overlays
-- 29 languages: EN, NL, DE, FR, ES, IT, PT, RU, UK, PL, CS, TR, RO, HU, EL, SV, DA, NO, FI, AR, FA, ZH, JA, KO, VI, TH, HI, ID/MS
-- Opt-out via `{ consent: false }`
-- Works in both headless and headed modes
-
-**Tested against 16+ sites across 8 countries, 0 consent dialogs remaining.**
-
-### Permission suppression -- done
-Chrome permission prompts (location, notifications, camera, mic, etc.) are suppressed automatically.
-- Headless: launch flags (`--disable-notifications`, `--autoplay-policy=no-user-gesture-required`, `--use-fake-device-for-media-stream`, `--use-fake-ui-for-media-stream`, `--disable-features=MediaRouter`)
-- Both modes: CDP `Browser.setPermission` denies geolocation, notifications, midi, audioCapture, videoCapture, sensors, idleDetection, etc.
-- No user prompt ever appears -- agents browse without interruption
-
-### Cross-browser cookie injection -- done
-Auto mode merges cookies from all detected browsers (Chromium + Firefox, last-write-wins by name+domain). No need to use Chromium as daily browser.
-- `browse()`: auto-injects merged cookies before navigation (opt-out with `{ cookies: false }`)
-- `connect()`: manual injection via `page.injectCookies(url, { browser: 'firefox' })`
-- MCP `goto`: auto-injects cookies before every navigation
-- Proven: YouTube login session transferred from Firefox -> headed Chromium -> video playback
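
The last-write-wins merge by name+domain described above can be sketched with a `Map` keyed on both fields. This is a minimal sketch: `mergeCookies`, the cookie object shape, and the order in which browsers are passed are assumptions made for illustration. The text does not say which browser is written last.

```javascript
// Hypothetical helper: merge cookie lists from several browsers.
// Later lists win when name and domain collide (last-write-wins).
function mergeCookies(...cookieLists) {
  const merged = new Map();
  for (const list of cookieLists) {
    for (const cookie of list) {
      // Key on both fields so equal names on different domains stay separate.
      const key = JSON.stringify([cookie.name, cookie.domain]);
      merged.set(key, cookie);
    }
  }
  return [...merged.values()];
}

const chromiumCookies = [{ name: 'sid', domain: '.example.com', value: 'old' }];
const firefoxCookies = [
  { name: 'sid', domain: '.example.com', value: 'new' },
  { name: 'theme', domain: '.example.com', value: 'dark' },
];
console.log(mergeCookies(chromiumCookies, firefoxCookies));
```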
|
|
257
|
-
|
|
258
|
-
### Stealth patches -- done
|
|
259
|
-
Anti-detection for headless mode via `Page.addScriptToEvaluateOnNewDocument` (runs before page scripts).
|
|
260
|
-
- `navigator.webdriver` -> undefined
|
|
261
|
-
- `navigator.plugins` -> fake 3 plugins
|
|
262
|
-
- `navigator.languages` -> `['en-US', 'en']`
|
|
263
|
-
- `window.chrome` -> fake object
|
|
264
|
-
- `Permissions.prototype.query` -> notifications return 'prompt'
|
|
265
|
-
- Applied automatically in headless mode
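
As a rough sketch of how two of the overrides listed above could be written, assuming a function that receives a navigator-like object. The name `applyStealthPatches` is hypothetical and this is not the content of `stealth.js`; in a real page the function source would be passed as the `source` string of `Page.addScriptToEvaluateOnNewDocument` and run against `window.navigator`.

```js
// Hypothetical patch function: redefine properties as getters so that
// page scripts reading them see the patched values.
function applyStealthPatches(nav) {
  Object.defineProperty(nav, 'webdriver', {
    get: () => undefined,
    configurable: true,
  });
  Object.defineProperty(nav, 'languages', {
    get: () => ['en-US', 'en'],
    configurable: true,
  });
  return nav;
}

// Plain object standing in for window.navigator (assumption for the demo).
const fakeNavigator = { webdriver: true, languages: ['xx'] };
applyStealthPatches(fakeNavigator);
```

Using getters rather than plain assignment mirrors how these properties are exposed in a browser, where direct assignment to `navigator.webdriver` has no effect.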
|
|
266
|
-
|
|
267
|
-
### Tests -- 64 passing
|
|
268
|
-
- 16 unit tests (pruning logic)
|
|
269
|
-
- 7 unit tests (cookie extraction -- 2 skip when Chromium profile locked)
|
|
270
|
-
- 5 unit tests (CDP client + browser launch)
|
|
271
|
-
- 11 integration tests (end-to-end browse pipeline)
|
|
272
|
-
- 10 integration tests (CLI session lifecycle: open/snapshot/goto/click/eval/console/network/close)
|
|
273
|
-
- 15 integration tests (real-world interactions: data: URL fixture + live sites)
|
|
274
|
-
|
|
275
|
-
---
|
|
276
|
-
|
|
277
|
-
## Integrations
|
|
278
|
-
|
|
279
|
-
### bareagent -- tool adapter
|
|
280
|
-
|
|
281
|
-
`createBrowseTools(opts)` returns bareagent-compatible tools for the Loop:
|
|
282
|
-
|
|
283
|
-
```js
|
|
284
|
-
import { Loop } from 'bare-agent';
|
|
285
|
-
import { Anthropic } from 'bare-agent/providers';
|
|
286
|
-
import { createBrowseTools } from 'barebrowse/src/bareagent.js';
|
|
287
|
-
|
|
288
|
-
const { tools, close } = createBrowseTools();
|
|
289
|
-
const loop = new Loop({ provider: new Anthropic({ apiKey }) });
|
|
290
|
-
const result = await loop.run(messages, tools);
|
|
291
|
-
await close();
|
|
292
|
-
```
|
|
293
|
-
|
|
294
|
-
13 tools: browse, goto, snapshot, click, type, press, scroll, select, back, forward, drag, upload, screenshot.
|
|
295
|
-
Action tools auto-return snapshot (300ms settle delay). The LLM always sees the result.
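
The "act, settle, snapshot" behaviour can be pictured with a small wrapper. This is a sketch under assumptions: `withSnapshot`, `fakePage`, and the `click`/`snapshot` method names are illustrative stand-ins, not the adapter's actual internals.

```js
// Resolve after the given number of milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical wrapper: run an action, wait for the page to settle,
// then return a fresh snapshot so the LLM sees the outcome.
function withSnapshot(page, action, settleMs = 300) {
  return async (...args) => {
    await action(...args);
    await sleep(settleMs);
    return page.snapshot();
  };
}

// Stand-in page object; the method names are assumptions for the demo.
const fakePage = {
  clicked: [],
  async click(ref) { this.clicked.push(ref); },
  async snapshot() { return `clicked: ${this.clicked.join(',')}`; },
};

// A short settle delay keeps the demo fast; the text above cites 300ms.
const click = withSnapshot(fakePage, (ref) => fakePage.click(ref), 10);
```

Calling `await click('8')` performs the click and resolves to the snapshot taken after the delay, so the caller never needs a separate snapshot call.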
|
|
296
|
-
|
|
297
|
-
### MCP server
|
|
298
|
-
|
|
299
|
-
Raw JSON-RPC 2.0 over stdio. Zero SDK dependencies. `npm install barebrowse` then:
|
|
300
|
-
|
|
301
|
-
```json
|
|
302
|
-
{
|
|
303
|
-
"mcpServers": {
|
|
304
|
-
"barebrowse": {
|
|
305
|
-
"command": "npx",
|
|
306
|
-
"args": ["barebrowse", "mcp"]
|
|
307
|
-
}
|
|
308
|
-
}
|
|
309
|
-
}
|
|
310
|
-
```
|
|
311
|
-
|
|
312
|
-
12 tools: browse (one-shot), goto, snapshot, click, type, press, scroll, back, forward, drag, upload, pdf.
|
|
313
|
-
Action tools return `'ok'` -- agent calls `snapshot` explicitly (MCP tool calls are cheap to chain).
|
|
314
|
-
`browse` and `snapshot` accept `maxChars` (default 30000) -- large snapshots are saved to `.barebrowse/` and a file path is returned.
|
|
315
|
-
Session runs in hybrid mode (headless + automatic headed fallback on bot detection). `goto` injects cookies from the user's browser before navigation.
|
|
316
|
-
Session tools share a singleton page, lazy-created on first use.
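
For readers unfamiliar with the wire format, a tool call over stdio is a single line of JSON-RPC 2.0. The sketch below builds such a request and unpacks a response. The helper names are hypothetical, and the `url` argument for `goto` and the sample response body are assumptions rather than output captured from `mcp-server.js`.

```js
// Build a JSON-RPC 2.0 request for an MCP tool call, framed as one line.
function toolCallRequest(id, name, args = {}) {
  const message = {
    jsonrpc: '2.0',
    id,
    method: 'tools/call',
    params: { name, arguments: args },
  };
  return JSON.stringify(message) + '\n';
}

// Parse one response line; return the result or throw on a JSON-RPC error.
function parseResponse(line) {
  const message = JSON.parse(line);
  if (message.error) {
    throw new Error(`${message.error.code}: ${message.error.message}`);
  }
  return message.result;
}

// The `url` argument name is an assumption for illustration.
const request = toolCallRequest(1, 'goto', { url: 'https://example.com' });

// Sample response line (invented), shaped like an MCP text result.
const result = parseResponse(
  '{"jsonrpc":"2.0","id":1,"result":{"content":[{"type":"text","text":"ok"}]}}'
);
```

A client would write `request` to the server's stdin and feed each line read from stdout to `parseResponse`, matching responses to requests by `id`.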
|
|
317
|
-
|
|
318
|
-
### CLI session -- for coding agents + human devs
|
|
319
|
-
|
|
320
|
-
Shell commands that output to disk. Coding agents (Claude Code, Copilot, Cursor) read output files with their file tools -- no tokens wasted in tool responses.
|
|
321
|
-
|
|
322
|
-
```bash
|
|
323
|
-
barebrowse open https://example.com # Start daemon + navigate
|
|
324
|
-
barebrowse snapshot # → .barebrowse/page-*.yml
|
|
325
|
-
barebrowse click 8 # Click element
|
|
326
|
-
barebrowse console-logs # → .barebrowse/console-*.json
|
|
327
|
-
barebrowse close # Kill daemon + browser
|
|
328
|
-
```
|
|
329
|
-
|
|
330
|
-
Architecture: `open` spawns a detached child process running an HTTP server on a random localhost port. Session state stored in `.barebrowse/session.json`. Subsequent commands POST to the daemon. `close` sends shutdown, daemon calls `page.close()` + `process.exit(0)`.
|
|
331
|
-
|
|
332
|
-
Full commands: open, close, status, goto, back, forward, snapshot, screenshot, pdf, click, type, fill, press, scroll, hover, select, drag, upload, tabs, tab, eval, wait-idle, wait-for, console-logs, network-log, dialog-log, save-state.
|
|
333
|
-
|
|
334
|
-
Self-sufficiency features (console/network capture, eval) let agents debug without guessing -- they see JS errors and failed requests directly.
|
|
335
|
-
|
|
336
|
-
SKILL.md (`commands/barebrowse/SKILL.md`) teaches Claude Code the CLI commands. Install with `barebrowse install --skill`.
|
|
337
|
-
|
|
338
|
-
---
|
|
339
|
-
|
|
340
|
-
## Ecosystem
|
|
341
|
-
|
|
342
|
-
```
|
|
343
|
-
bareagent = the brain (orchestration, LLM loop, memory, retries)
|
|
344
|
-
barebrowse = the eyes + hands (browse, read, interact with the web)
|
|
345
|
-
```
|
|
346
|
-
|
|
347
|
-
**barebrowse is a library.** bareagent imports it as a capability. barebrowse doesn't know about bareagent. bareagent doesn't know about CDP. Clean boundary. Each ships and tests independently.
|
|
348
|
-
|
|
349
|
-
---
|
|
350
|
-
|
|
351
|
-
## Constraints
|
|
352
|
-
|
|
353
|
-
- **Chromium-only.** CDP protocol. Covers Chrome, Chromium, Edge, Brave, Vivaldi, Arc, Opera (~80% desktop share). Firefox later via WebDriver BiDi.
|
|
354
|
-
- **Linux first.** Tested on Fedora/KDE. macOS/Windows cookie extraction paths exist in auth.js but are untested.
|
|
355
|
-
- **Node >= 22.** Built-in WebSocket, built-in SQLite.
|
|
356
|
-
- **Not a server.** Library that agents import. Wrap as MCP (included) or HTTP if needed.
|
|
357
|
-
- **Not cross-platform tested.** Tested on Linux only. Published to npm as `barebrowse`.
|
|
358
|
-
|
|
359
|
-
---
|
|
360
|
-
|
|
361
|
-
## File Map
|
|
362
|
-
|
|
363
|
-
```
|
|
364
|
-
barebrowse/
|
|
365
|
-
├── src/
|
|
366
|
-
│ ├── index.js # Public API: browse(), connect(), screenshot, network idle, hybrid
|
|
367
|
-
│ ├── cdp.js # WebSocket CDP client
|
|
368
|
-
│ ├── chromium.js # Find/launch Chromium, permission flags
|
|
369
|
-
│ ├── aria.js # ARIA tree formatting
|
|
370
|
-
│ ├── auth.js # Cookie extraction + injection
|
|
371
|
-
│ ├── prune.js # ARIA pruning (9-step pipeline)
|
|
372
|
-
│ ├── interact.js # Click, type, press, scroll, hover, select
|
|
373
|
-
│ ├── consent.js # Auto-dismiss cookie consent dialogs
|
|
374
|
-
│ ├── stealth.js # Navigator patches for headless anti-detection
|
|
375
|
-
│ ├── bareagent.js # Tool adapter for bareagent Loop
|
|
376
|
-
│ ├── daemon.js # Background HTTP server for CLI session
|
|
377
|
-
│ └── session-client.js # HTTP client to daemon
|
|
378
|
-
├── test/
|
|
379
|
-
│ ├── unit/ # prune, auth, cdp tests
|
|
380
|
-
│ └── integration/ # browse, interact, cli tests
|
|
381
|
-
├── examples/
|
|
382
|
-
│ ├── headed-demo.js # Interactive demo: Wikipedia → DuckDuckGo
|
|
383
|
-
│ └── yt-demo.js # YouTube demo: Firefox cookies → search → play video
|
|
384
|
-
├── docs/
|
|
385
|
-
│ ├── README.md # Documentation navigation guide
|
|
386
|
-
│ ├── 00-context/ # vision, assumptions, system-state (this file)
|
|
387
|
-
│ ├── 01-product/ # prd.md
|
|
388
|
-
│ ├── 03-logs/ # decisions, implementation, bugs, validation, insights
|
|
389
|
-
│ ├── 04-process/ # dev-workflow, definition-of-done, testing (64 tests)
|
|
390
|
-
│ └── archive/ # poc-plan.md
|
|
391
|
-
├── mcp-server.js # MCP server (JSON-RPC 2.0 over stdio)
|
|
392
|
-
├── cli.js # CLI entry: session commands, MCP, browse, install
|
|
393
|
-
├── .mcp.json # MCP server config for Claude Desktop / Cursor
|
|
394
|
-
├── barebrowse.context.md # LLM-consumable integration guide
|
|
395
|
-
├── commands/
|
|
396
|
-
│ ├── barebrowse.md # CLI command reference (any agent)
|
|
397
|
-
│ └── barebrowse/
|
|
398
|
-
│ └── SKILL.md # CLI command reference (Claude Code skill)
|
|
399
|
-
├── package.json
|
|
400
|
-
├── README.md
|
|
401
|
-
└── CLAUDE.md
|
|
402
|
-
```
|
|
@@ -1,52 +0,0 @@
|
|
|
1
|
-
# barebrowse -- Vision
|
|
2
|
-
|
|
3
|
-
## What it is
|
|
4
|
-
|
|
5
|
-
A standalone vanilla JavaScript library that gives autonomous agents authenticated access to the web through the user's own Chromium browser. One package, one import, three modes.
|
|
6
|
-
|
|
7
|
-
```js
|
|
8
|
-
import { browse } from 'barebrowse';
|
|
9
|
-
const snapshot = await browse('https://any-page.com');
|
|
10
|
-
```
|
|
11
|
-
|
|
12
|
-
barebrowse handles: finding the browser, connecting via CDP, injecting cookies, navigating, extracting the ARIA accessibility tree, and pruning it down to what an agent actually needs. The output is a clean, token-efficient snapshot of any web page -- authenticated as the real user.
|
|
13
|
-
|
|
14
|
-
## What it is NOT
|
|
15
|
-
|
|
16
|
-
- **Not a framework.** No plugin system, no config files, no lifecycle hooks.
|
|
17
|
-
- **Not Playwright.** No bundled browser, no cross-engine abstraction, no 200MB download.
|
|
18
|
-
- **Not an agent.** No LLM, no planning, no orchestration -- that's bareagent's job.
|
|
19
|
-
- **Not a scraper.** It browses as the user, not as a bot harvesting data.
|
|
20
|
-
|
|
21
|
-
## The core insight
|
|
22
|
-
|
|
23
|
-
The user already has a browser. It's already logged in. It already passes Cloudflare. Instead of fighting the web with headless stealth tricks, **use what's already there**.
|
|
24
|
-
|
|
25
|
-
CDP (Chrome DevTools Protocol) lets us connect to any Chromium-based browser -- the same one the user browses with daily. We get their cookies, their sessions, their anti-detection posture, for free.
|
|
26
|
-
|
|
27
|
-
## The problem it solves
|
|
28
|
-
|
|
29
|
-
Every AI agent that needs to read or interact with the web hits the same walls:
|
|
30
|
-
|
|
31
|
-
1. **Cloudflare / bot detection** -- headless browsers get blocked
|
|
32
|
-
2. **Authentication** -- sites require login, OAuth, session cookies
|
|
33
|
-
3. **Token bloat** -- raw DOM is 100K+ tokens; agents need ~5K
|
|
34
|
-
4. **Two consumers, same need** -- research agents (read pages) and personal assistants (click/type) both need an authenticated browser, but existing tools force you to choose one path
|
|
35
|
-
|
|
36
|
-
## The bare- ecosystem
|
|
37
|
-
|
|
38
|
-
```
|
|
39
|
-
bareagent = the brain (orchestration, LLM loop, memory, retries)
|
|
40
|
-
barebrowse = the eyes + hands (browse, read, interact with the web)
|
|
41
|
-
```
|
|
42
|
-
|
|
43
|
-
barebrowse is a library. bareagent imports it as a capability. barebrowse doesn't know about bareagent. bareagent doesn't know about CDP. Clean boundary. Each ships and tests independently.
|
|
44
|
-
|
|
45
|
-
## Success criteria
|
|
46
|
-
|
|
47
|
-
1. `browse(url)` returns a pruned ARIA snapshot of any page, authenticated as the user
|
|
48
|
-
2. Zero heavy dependencies -- no Playwright, no Puppeteer, no bundled browser
|
|
49
|
-
3. Works with any installed Chromium-based browser
|
|
50
|
-
4. Headless for research, headed for interaction, hybrid for autonomous agents
|
|
51
|
-
5. Plugs into bareagent as plain tool functions
|
|
52
|
-
6. An agent using barebrowse + bareagent can autonomously research the web and act on pages
|