barebrowse 0.2.1 → 0.3.0

@@ -0,0 +1,38 @@
1
+ # barebrowse -- Assumptions & Constraints
2
+
3
+ ## Hard constraints
4
+
5
+ | Constraint | Detail |
6
+ |-----------|--------|
7
+ | **Chromium-only** | CDP protocol. Covers Chrome, Chromium, Edge, Brave, Vivaldi, Arc, Opera (~80% desktop share). Firefox later via WebDriver BiDi. |
8
+ | **Node >= 22** | Built-in WebSocket (`globalThis.WebSocket`), built-in SQLite (`node:sqlite`). No polyfills. |
9
+ | **Linux first** | Tested on Fedora/KDE/Wayland. macOS/Windows cookie extraction paths exist in auth.js but are untested. |
10
+ | **Zero required deps** | Everything uses Node stdlib. Vanilla JS, ES modules, no build step. |
11
+ | **Not a server** | Library that agents import. MCP wrapper included, HTTP wrapper is DIY. |
12
+
13
+ ## Assumptions
14
+
15
+ - **User has Chromium installed.** At least one of: chromium-browser, google-chrome, brave-browser, microsoft-edge. `chromium.js` searches common paths.
16
+ - **Cookie extraction needs unlocked profile.** Chromium cookies are AES-encrypted with a keyring key (KWallet on KDE, GNOME Keyring on GNOME). Firefox cookies are plaintext SQLite and always accessible.
17
+ - **Headed mode requires manual browser launch.** User must start their browser with `--remote-debugging-port=9222`. barebrowse connects to it -- does not launch it.
18
+ - **Hybrid fallback needs a running headed browser.** If headless is bot-blocked, hybrid kills headless and connects to headed on port 9222. That browser must already be running.
19
+ - **Cookies expire.** Cookie injection works for existing sessions, not new logins. For sites requiring fresh auth, headed mode with user interaction is the fallback.
20
+ - **One page per connect().** Each `connect()` call creates one page. For multiple tabs, call `connect()` multiple times.
21
+
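The headed-mode and hybrid assumptions above both depend on a browser already listening on the debug port. A minimal sketch of how a caller might check that before connecting is shown here. It is an illustration, not barebrowse's own code: `findDebuggerUrl` is a hypothetical helper, and it assumes the `/json/version` HTTP endpoint that Chromium exposes when started with `--remote-debugging-port`, plus Node's built-in `fetch`.

```javascript
// Hypothetical helper (not part of barebrowse): returns the browser-level
// WebSocket debugger URL, or null when nothing answers on the port.
async function findDebuggerUrl(port = 9222, host = '127.0.0.1') {
  try {
    // Chromium serves browser metadata as JSON on this HTTP endpoint
    // when launched with --remote-debugging-port=<port>.
    const res = await fetch(`http://${host}:${port}/json/version`, {
      signal: AbortSignal.timeout(2000), // do not hang on a dead port
    });
    if (!res.ok) return null;
    const info = await res.json();
    return info.webSocketDebuggerUrl ?? null;
  } catch {
    // Connection refused or timed out: no browser is listening.
    return null;
  }
}
```

A `null` result would be the cue to ask the user to start their browser with `--remote-debugging-port=9222` rather than attempting to connect.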
22
+ ## Known limitations
23
+
24
+ | Limitation | Impact | Workaround |
25
+ |-----------|--------|------------|
26
+ | No Firefox/WebKit support | ~20% of desktop users can't use their native browser | Use Chromium as the automation target, Firefox as cookie source |
27
+ | No file upload | Can't interact with file inputs | Not yet implemented (CDP offers `DOM.setFileInputFiles` for this) |
28
+ | No drag and drop | Can't use drag-based UIs | Not yet implemented |
29
+ | No cross-origin iframes | Content inside cross-origin iframes is invisible to the ARIA tree | None yet; needs frame tree traversal via CDP (medium effort) |
30
+ | No CAPTCHAs | Cannot solve challenge pages | Headed mode lets user solve manually |
31
+ | Canvas/WebGL opaque | No ARIA representation | Needs screenshot + vision model |
32
+ | macOS/Windows untested | Cookie paths exist but may not work | Linux-only for now |
33
+
34
+ ## Risks
35
+
36
+ - **CDP is not a stable API.** The Chrome team can change it between versions. Mitigation: we use well-established domains (Accessibility, Input, Page, Network, DOM) that rarely break.
37
+ - **Cookie consent patterns evolve.** New consent frameworks may not be detected by `consent.js`. Mitigation: best-effort, opt-out with `{ consent: false }`.
38
+ - **Stealth patches are an arms race.** Bot detection evolves. Mitigation: headed mode with real browser profile is the ultimate fallback.
@@ -163,7 +163,7 @@ Every action returns a **pruned ARIA snapshot** -- the agent's view of the page
163
163
 
164
164
  ### Module table
165
165
 
166
- Eleven modules, 2,396 lines, zero required dependencies.
166
+ Thirteen modules, zero required dependencies.
167
167
 
168
168
  | Module | Lines | Purpose |
169
169
  |---|---|---|
@@ -177,6 +177,8 @@ Eleven modules, 2,396 lines, zero required dependencies.
177
177
  | `src/consent.js` | 210 | Auto-dismiss cookie consent dialogs, 7 languages |
178
178
  | `src/stealth.js` | 51 | Navigator patches for headless anti-detection |
179
179
  | `src/bareagent.js` | 161 | Tool adapter for bareagent Loop |
180
+ | `src/daemon.js` | ~230 | Background HTTP server holding connect() session for CLI mode |
181
+ | `src/session-client.js` | ~60 | HTTP client to daemon (sendCommand, readSession, isAlive) |
180
182
  | `mcp-server.js` | 216 | MCP server (JSON-RPC 2.0 over stdio) |
181
183
 
182
184
  ---
@@ -254,11 +256,12 @@ Anti-detection for headless mode via `Page.addScriptToEvaluateOnNewDocument` (ru
254
256
  - `Permissions.prototype.query` -> notifications return 'prompt'
255
257
  - Applied automatically in headless mode
256
258
 
257
- ### Tests -- 47+ passing
259
+ ### Tests -- 64 passing
258
260
  - 16 unit tests (pruning logic)
259
261
  - 7 unit tests (cookie extraction -- 2 skip when Chromium profile locked)
260
262
  - 5 unit tests (CDP client + browser launch)
261
263
  - 11 integration tests (end-to-end browse pipeline)
264
+ - 10 integration tests (CLI session lifecycle: open/snapshot/goto/click/eval/console/network/close)
262
265
  - 15 integration tests (real-world interactions: data: URL fixture + live sites)
263
266
 
264
267
  ---
@@ -302,6 +305,26 @@ Raw JSON-RPC 2.0 over stdio. Zero SDK dependencies. `npm install barebrowse` the
302
305
  Action tools return `'ok'` -- agent calls `snapshot` explicitly (MCP tool calls are cheap to chain).
303
306
  Session tools share a singleton page, lazy-created on first use.
304
307
 
308
+ ### CLI session -- for coding agents + human devs
309
+
310
+ Shell commands that output to disk. Coding agents (Claude Code, Copilot, Cursor) read output files with their file tools -- no tokens wasted in tool responses.
311
+
312
+ ```bash
313
+ barebrowse open https://example.com # Start daemon + navigate
314
+ barebrowse snapshot # → .barebrowse/page-*.yml
315
+ barebrowse click 8 # Click element
316
+ barebrowse console-logs # → .barebrowse/console-*.json
317
+ barebrowse close # Kill daemon + browser
318
+ ```
319
+
320
+ Architecture: `open` spawns a detached child process that runs an HTTP server on a random localhost port. Session state is stored in `.barebrowse/session.json`. Subsequent commands POST to the daemon. `close` sends a shutdown command; the daemon then calls `page.close()` and `process.exit(0)`.
321
+
322
+ Full commands: open, close, status, goto, snapshot, screenshot, click, type, fill, press, scroll, hover, select, eval, wait-idle, console-logs, network-log.
323
+
324
+ Self-sufficiency features (console/network capture, eval) let agents debug without guessing -- they see JS errors and failed requests directly.
325
+
326
+ SKILL.md (`.claude/skills/barebrowse/SKILL.md`) teaches Claude Code the CLI commands. Install with `barebrowse install --skill`.
327
+
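The daemon architecture described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the actual `daemon.js` or `session-client.js`: the function names, the `{ command, args }` request body, and the `{ ok, result }` reply shape are all hypothetical, and the sketch runs the server in-process instead of spawning a detached child.

```javascript
// Illustrative sketch only; names and JSON shapes are assumptions.
async function startDaemon(handlers, sessionFile) {
  // Dynamic imports keep the sketch loadable as either ESM or CommonJS.
  const { createServer } = await import('node:http');
  const { writeFile } = await import('node:fs/promises');

  const server = createServer(async (req, res) => {
    req.setEncoding('utf8');
    let body = '';
    for await (const chunk of req) body += chunk;
    let reply;
    try {
      const { command, args = [] } = JSON.parse(body || '{}');
      const handler = handlers[command];
      reply = handler
        ? { ok: true, result: await handler(...args) }
        : { ok: false, error: `unknown command: ${command}` };
    } catch (err) {
      reply = { ok: false, error: String(err.message ?? err) };
    }
    res.writeHead(200, { 'content-type': 'application/json' });
    res.end(JSON.stringify(reply));
  });

  // Port 0 asks the OS for any free port; bind to loopback only.
  await new Promise((resolve) => server.listen(0, '127.0.0.1', resolve));
  const { port } = server.address();
  // Later CLI invocations read this file to find the daemon.
  await writeFile(sessionFile, JSON.stringify({ port, pid: process.pid }));
  return { server, port };
}

// Client side: read the session file, POST one command, return the reply.
async function sendCommand(sessionFile, command, args = []) {
  const { readFile } = await import('node:fs/promises');
  const { port } = JSON.parse(await readFile(sessionFile, 'utf8'));
  const res = await fetch(`http://127.0.0.1:${port}/`, {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify({ command, args }),
  });
  return res.json();
}
```

Keeping the port in a file is what lets each short-lived shell command find the long-lived session without any shared memory.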
305
328
  ---
306
329
 
307
330
  ## Ecosystem
@@ -339,10 +362,12 @@ barebrowse/
339
362
  │ ├── interact.js # Click, type, press, scroll, hover, select
340
363
  │ ├── consent.js # Auto-dismiss cookie consent dialogs
341
364
  │ ├── stealth.js # Navigator patches for headless anti-detection
342
- │ └── bareagent.js # Tool adapter for bareagent Loop
365
+ │ ├── bareagent.js # Tool adapter for bareagent Loop
366
+ │ ├── daemon.js # Background HTTP server for CLI session
367
+ │ └── session-client.js # HTTP client to daemon
343
368
  ├── test/
344
369
  │ ├── unit/ # prune, auth, cdp tests
345
- │ └── integration/ # browse + interact tests (real sites)
370
+ │ └── integration/ # browse, interact, cli tests
346
371
  ├── examples/
347
372
  │ ├── headed-demo.js # Interactive demo: Wikipedia → DuckDuckGo
348
373
  │ └── yt-demo.js # YouTube demo: Firefox cookies → search → play video
@@ -352,7 +377,7 @@ barebrowse/
352
377
  │ ├── blueprint.md # This file
353
378
  │ └── testing.md # Test guide: pyramid, all 54 tests, CI strategy
354
379
  ├── mcp-server.js # MCP server (JSON-RPC 2.0 over stdio)
355
- ├── cli.js # CLI entry: `npx barebrowse mcp` or `npx barebrowse browse <url>`
380
+ ├── cli.js # CLI entry: session commands, MCP, browse, install
356
381
  ├── .mcp.json # MCP server config for Claude Desktop / Cursor
357
382
  ├── barebrowse.context.md # LLM-consumable integration guide
358
383
  ├── package.json
@@ -0,0 +1,52 @@
1
+ # barebrowse -- Vision
2
+
3
+ ## What it is
4
+
5
+ A standalone vanilla JavaScript library that gives autonomous agents authenticated access to the web through the user's own Chromium browser. One package, one import, three modes.
6
+
7
+ ```js
8
+ import { browse } from 'barebrowse';
9
+ const snapshot = await browse('https://any-page.com');
10
+ ```
11
+
12
+ barebrowse handles: finding the browser, connecting via CDP, injecting cookies, navigating, extracting the ARIA accessibility tree, and pruning it down to what an agent actually needs. The output is a clean, token-efficient snapshot of any web page -- authenticated as the real user.
13
+
14
+ ## What it is NOT
15
+
16
+ - **Not a framework.** No plugin system, no config files, no lifecycle hooks.
17
+ - **Not Playwright.** No bundled browser, no cross-engine abstraction, no 200MB download.
18
+ - **Not an agent.** No LLM, no planning, no orchestration -- that's bareagent's job.
19
+ - **Not a scraper.** It browses as the user, not as a bot harvesting data.
20
+
21
+ ## The core insight
22
+
23
+ The user already has a browser. It's already logged in. It already passes Cloudflare. Instead of fighting the web with headless stealth tricks, **use what's already there**.
24
+
25
+ CDP (Chrome DevTools Protocol) lets us connect to any Chromium-based browser -- the same one the user browses with daily. We get their cookies, their sessions, their anti-detection posture, for free.
26
+
27
+ ## The problem it solves
28
+
29
+ Every AI agent that needs to read or interact with the web hits the same walls:
30
+
31
+ 1. **Cloudflare / bot detection** -- headless browsers get blocked
32
+ 2. **Authentication** -- sites require login, OAuth, session cookies
33
+ 3. **Token bloat** -- raw DOM is 100K+ tokens; agents need ~5K
34
+ 4. **Two consumers, same need** -- research agents (read pages) and personal assistants (click/type) both need an authenticated browser, but existing tools force you to choose one path
35
+
36
+ ## The bare- ecosystem
37
+
38
+ ```
39
+ bareagent = the brain (orchestration, LLM loop, memory, retries)
40
+ barebrowse = the eyes + hands (browse, read, interact with the web)
41
+ ```
42
+
43
+ barebrowse is a library. bareagent imports it as a capability. barebrowse doesn't know about bareagent. bareagent doesn't know about CDP. Clean boundary. Each ships and tests independently.
44
+
45
+ ## Success criteria
46
+
47
+ 1. `browse(url)` returns a pruned ARIA snapshot of any page, authenticated as the user
48
+ 2. Zero heavy dependencies -- no Playwright, no Puppeteer, no bundled browser
49
+ 3. Works with any installed Chromium-based browser
50
+ 4. Headless for research, headed for interaction, hybrid for autonomous agents
51
+ 5. Plugs into bareagent as plain tool functions
52
+ 6. An agent using barebrowse + bareagent can autonomously research the web and act on pages
@@ -0,0 +1,284 @@
1
+ # barebrowse — Product Requirements Document
2
+
3
+ **Version:** 1.0
4
+ **Date:** 2026-02-22
5
+ **Status:** POC
6
+
7
+ ---
8
+
9
+ ## What barebrowse is
10
+
11
+ A standalone vanilla JavaScript library that gives autonomous agents authenticated access to the web through the user's own Chromium browser. One package, one import, three modes.
12
+
13
+ ```js
14
+ import { browse } from 'barebrowse';
15
+ const snapshot = await browse('https://any-page.com');
16
+ ```
17
+
18
+ barebrowse handles: finding the browser, connecting via CDP, injecting cookies, navigating, extracting the ARIA accessibility tree, and pruning it down to what an agent actually needs. The output is a clean, token-efficient snapshot of any web page — authenticated as the real user.
19
+
20
+ ## What barebrowse is NOT
21
+
22
+ - **Not a framework.** No plugin system, no config files, no lifecycle hooks.
23
+ - **Not an MCP server.** But trivially wrappable as one (~30 lines).
24
+ - **Not Playwright.** No bundled browser, no cross-engine abstraction, no 200MB download.
25
+ - **Not an agent.** No LLM, no planning, no orchestration — that's bareagent's job.
26
+ - **Not a scraper.** It browses as the user, not as a bot harvesting data.
27
+
28
+ ---
29
+
30
+ ## The Problem
31
+
32
+ Every AI agent that needs to read or interact with the web hits the same walls:
33
+
34
+ 1. **Cloudflare / bot detection** — headless browsers get blocked
35
+ 2. **Authentication** — sites require login, OAuth, session cookies
36
+ 3. **Token bloat** — raw DOM is 100K+ tokens; agents need ~5K
37
+ 4. **Two consumers, same need** — research agents (read pages) and personal assistants (click/type) both need an authenticated browser, but existing tools force you to choose one path
38
+
39
+ Existing solutions (Playwright MCP, sweetlink, open-operator, browser-use) are either too heavy, too opinionated, or solve only half the problem.
40
+
41
+ ## The Insight
42
+
43
+ The user already has a browser. It's already logged in. It already passes Cloudflare. Instead of fighting the web with headless stealth tricks, **use what's already there**.
44
+
45
+ CDP (Chrome DevTools Protocol) lets us connect to any Chromium-based browser — the same one the user browses with daily. We get their cookies, their sessions, their anti-detection posture, for free.
46
+
47
+ ---
48
+
49
+ ## Core Architecture
50
+
51
+ ### CDP-Direct (Why No Playwright)
52
+
53
+ **Decision:** Use CDP over WebSocket directly. No Playwright dependency.
54
+
55
+ **Why:**
56
+ - Playwright downloads a bundled Chromium (~200MB). barebrowse uses the browser already installed on the user's machine.
57
+ - Playwright abstracts CDP, but we need CDP directly for all three modes (headless, headed, hybrid) against the user's real browser.
58
+ - Every Playwright API call maps 1:1 to a CDP method. The abstraction adds weight without adding capability for our use case.
59
+ - CDP gives us everything: `Accessibility.getFullAXTree`, `Page.navigate`, `Runtime.evaluate`, `Input.dispatch*Event`, `Network.setCookie`, `Page.captureScreenshot`.
60
+ - The CDP WebSocket client is ~100 lines of vanilla JS. Playwright is ~50,000.
61
+
62
+ **What we lose:** Cross-engine support (Firefox, WebKit). CDP only works with Chromium-family browsers (Chrome, Chromium, Edge, Brave, Vivaldi, Arc, Opera). This covers ~80% of desktop browsers. Firefox support could come later via WebDriver BiDi.
63
+
64
+ **What we gain:** Zero heavy deps, uses the user's real browser, same code path for headless/headed/hybrid, drastically simpler codebase.
65
+
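To make the "~100 lines" claim concrete, here is a minimal sketch of the core of a CDP client: matching replies to commands by `id` and routing id-less messages as events. The class name and its shape are assumptions for illustration, not the contents of `src/cdp.js`. It accepts any socket with `send()` and `addEventListener('message', ...)`, which is the interface of Node's built-in `WebSocket`.

```javascript
// Minimal sketch of CDP request/response correlation (illustrative only).
class CdpSession {
  #nextId = 1;
  #pending = new Map(); // id -> { resolve, reject }
  #listeners = new Map(); // event name -> Set of callbacks

  constructor(socket) {
    this.socket = socket;
    socket.addEventListener('message', (ev) => this.#onMessage(ev.data));
  }

  // Send a command such as Page.navigate and await its result.
  send(method, params = {}) {
    const id = this.#nextId++;
    return new Promise((resolve, reject) => {
      this.#pending.set(id, { resolve, reject });
      this.socket.send(JSON.stringify({ id, method, params }));
    });
  }

  // Subscribe to a CDP event such as Page.loadEventFired.
  on(event, callback) {
    if (!this.#listeners.has(event)) this.#listeners.set(event, new Set());
    this.#listeners.get(event).add(callback);
  }

  #onMessage(raw) {
    const msg = JSON.parse(raw);
    if (msg.id !== undefined) {
      // Replies carry the id of the command they answer.
      const entry = this.#pending.get(msg.id);
      if (!entry) return;
      this.#pending.delete(msg.id);
      if (msg.error) entry.reject(new Error(msg.error.message));
      else entry.resolve(msg.result);
    } else {
      // Events have a method and params but no id.
      for (const cb of this.#listeners.get(msg.method) ?? []) cb(msg.params);
    }
  }
}
```

A real client also needs connection setup, session targeting, and timeouts, which is where the remaining lines would go.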
66
+ ### ARIA-First (Why Not DOM)
67
+
68
+ **Decision:** Use `Accessibility.getFullAXTree` (ARIA/accessibility tree) as the primary page representation, not DOM.
69
+
70
+ **Why:**
71
+ - The accessibility tree is the semantic structure of the page — roles, names, states, interactive elements. It's what screen readers see. It's also what agents need.
72
+ - DOM is bloated: wrapper divs, styling, tracking pixels, ad scripts. An agent doesn't need any of that.
73
+ - mcprune already proved this: ARIA snapshots pruned by role achieve 75-95% token reduction on typical pages while preserving all actionable information.
74
+ - CDP's `Accessibility.getFullAXTree` returns the tree directly. No parsing HTML, no building a DOM tree, no traversing nodes.
75
+ - ARIA refs map directly to CDP interaction targets — the agent reads a button in the tree and can click it via the same CDP connection.
76
+
77
+ **The pipeline:** CDP connect → authenticate → navigate → ARIA tree → prune → agent gets clean snapshot.
78
+
79
+ ### Three Modes (Why All Three)
80
+
81
+ **Decision:** Headless, headed, and hybrid — not as separate packages or optional features, but as a single flag on the same API.
82
+
83
+ **Why they're not bloat:** The CDP conversation is identical regardless of mode. The only difference is how you get a browser process with a debug port. It's one code path with a different entry point:
84
+
85
+ ```
86
+ headless: spawn chromium --headless=new --remote-debugging-port=N
87
+ headed: connect to user's already-running browser on debug port
88
+ hybrid: try headless → detect failure → fall back to headed
89
+ ```
90
+
91
+ After connection, every CDP command is the same. Three modes = ~20 extra lines in `chromium.js`, not three implementations.
92
+
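The hybrid entry point can be pictured as a thin wrapper around the other two. The sketch below covers the decision logic only. The three injected callbacks (`launchHeadless`, `attachHeaded`, `looksBlocked`) and the idea that `goto()` resolves to a snapshot are assumptions made for illustration. They are not taken from `chromium.js`.

```javascript
// Sketch of the hybrid decision only. The three callbacks are hypothetical
// stand-ins for real launch, attach, and bot-block detection logic.
async function connectHybrid({ launchHeadless, attachHeaded, looksBlocked }, url) {
  const headless = await launchHeadless();
  const snapshot = await headless.goto(url);
  if (!looksBlocked(snapshot)) {
    return { mode: 'headless', page: headless, snapshot };
  }
  // Blocked: dispose of the headless browser, then reuse the user's
  // already-running browser on its debug port.
  await headless.close();
  const headed = await attachHeaded();
  return { mode: 'headed', page: headed, snapshot: await headed.goto(url) };
}
```

Because both branches return the same shape, the caller does not need to know which mode ended up serving the page.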
93
+ **When to use each:**
94
+
95
+ | Mode | Use case | Example |
96
+ |---|---|---|
97
+ | `headless` | Agent research, background tasks, CI | "Read this article and summarize it" |
98
+ | `headed` | Personal assistant, interactive tasks, auth flows | "Book me a flight on this page" |
99
+ | `hybrid` | Default for autonomous agents | Try headless; if CF-blocked, fall back to headed |
100
+
101
+ **Headless is the default.** Most agent tasks are "go read this page." Headed is the escape hatch for when headless fails or the task requires user-visible interaction.
102
+
103
+ ### Cookie Authentication
104
+
105
+ **Decision:** Extract cookies from the user's browser profile and inject via CDP `Network.setCookie`.
106
+
107
+ **Why:**
108
+ - The user's browser has active sessions for every site they use. We reuse those sessions instead of building new auth flows.
109
+ - sweet-cookie (npm package) already extracts cookies from Chrome/Firefox/Safari SQLite databases with OS keychain decryption. We use it or vendor the relevant parts.
110
+ - For headed mode, cookies are already present in the browser — no extraction needed.
111
+ - For headless mode, we extract from the user's profile and inject into the headless instance.
112
+
113
+ **Limitation:** Cookies expire. This works for existing sessions, not new logins. For sites requiring fresh auth, headed mode with user interaction is the fallback.
114
+
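As a rough illustration of the injection step, the function below maps one extracted cookie to the parameter object that CDP's `Network.setCookie` accepts (`name`, `value`, `domain`, `path`, `secure`, `httpOnly`, `expires`). The input field names (`host`, `isSecure`, `isHttpOnly`, `expiresAt`) are assumptions about what an extractor might produce, not barebrowse's actual schema.

```javascript
// Maps one extracted cookie row to parameters for CDP Network.setCookie.
// Input field names are hypothetical; output keys follow the CDP method.
function toSetCookieParams(row, nowSeconds = Date.now() / 1000) {
  // Skip cookies that have already expired; injecting them is pointless.
  if (row.expiresAt !== undefined && row.expiresAt <= nowSeconds) return null;
  const params = {
    name: row.name,
    value: row.value,
    domain: row.host,
    path: row.path ?? '/',
    secure: Boolean(row.isSecure),
    httpOnly: Boolean(row.isHttpOnly),
  };
  // Session cookies have no expiry; CDP expects seconds since the epoch.
  if (row.expiresAt !== undefined) params.expires = row.expiresAt;
  return params;
}
```

Each non-null result would then be sent as the `params` of one `Network.setCookie` command before navigating.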
115
+ ### Pruning (Absorbed from mcprune)
116
+
117
+ **Decision:** Port mcprune's role-based ARIA tree pruning into barebrowse as a built-in step, not an optional module.
118
+
119
+ **Why:**
120
+ - Pruning is not optional for agent consumption. A raw ARIA tree is still too large for most LLM context windows. Pruning is part of the pipeline, not an afterthought.
121
+ - mcprune's pruning logic is a pure function: takes an ARIA tree, returns a smaller ARIA tree. No browser dependency, no Playwright coupling. It's ~300 lines of role-based tree surgery.
122
+ - By absorbing it, barebrowse becomes a complete "URL in, agent-ready snapshot out" solution. No second package needed.
123
+
124
+ **What we port from mcprune:**
125
+ - Role taxonomy (landmarks, interactive, structural, noise)
126
+ - Landmark extraction (main, nav, banner, etc.)
127
+ - Noise removal (ads, tracking, legal boilerplate)
128
+ - Interactive element preservation (buttons, links, inputs)
129
+ - Wrapper collapsing (nested generics, empty groups)
130
+ - Context-aware filtering (search relevance, dedup)
131
+
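The "pure function" shape of pruning is easy to show on a toy tree. The sketch below implements just two of the listed steps, noise removal and wrapper collapsing, over plain `{ role, name, children }` objects. The role sets are illustrative assumptions, not mcprune's actual taxonomy, and the node shape is a simplification of what `Accessibility.getFullAXTree` returns.

```javascript
// Toy role-based pruning over { role, name, children } nodes.
// The role sets below are illustrative assumptions only.
const NOISE_ROLES = new Set(['contentinfo', 'separator']);
const WRAPPER_ROLES = new Set(['generic', 'group', 'none']);

// Pure function: returns an array of pruned nodes (possibly empty) for one node.
function prune(node) {
  if (NOISE_ROLES.has(node.role)) return []; // noise removal
  const children = (node.children ?? []).flatMap(prune);
  if (WRAPPER_ROLES.has(node.role) && !node.name) {
    // Wrapper collapsing: hoist the children and drop the wrapper itself.
    // An unnamed wrapper with no surviving children disappears entirely.
    return children;
  }
  return [{ role: node.role, name: node.name, children }];
}
```

Returning an array from each call is what lets a single `flatMap` handle deletion, hoisting, and keeping a node with the same code path.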
132
+ **What stays in mcprune:** The Playwright MCP proxy architecture. mcprune can continue to exist as a Playwright-based MCP server for users who want that path. But for barebrowse consumers, pruning is built in.
133
+
134
+ ---
135
+
136
+ ## API Design
137
+
138
+ ### Public API
139
+
140
+ ```js
+ import { browse, connect } from 'barebrowse';
+
+ // One-shot: URL in, pruned ARIA snapshot out
+ const tree = await browse('https://example.com');
+
+ // With options
+ const treeWithOptions = await browse('https://example.com', {
+   mode: 'hybrid', // 'headless' (default) | 'headed' | 'hybrid'
+   cookies: true, // inject user's cookies (default: true)
+   prune: true, // apply ARIA pruning (default: true)
+   browser: 'chrome', // which browser profile for cookies
+   timeout: 30000, // navigation timeout ms
+ });
+
+ // Long-lived session for interaction
+ const page = await connect({ mode: 'headed' });
+ await page.goto('https://amazon.com/cart');
+ await page.click('[data-action="checkout"]');
+ await page.type('#gift-message', 'Happy birthday!');
+ const snapshot = await page.snapshot(); // ARIA + prune
+ await page.close();
+ ```
163
+
164
+ ### Design Principles
165
+
166
+ 1. **One package, one import.** No picking pieces. `browse()` does everything. Power users get `connect()` for long-lived sessions.
167
+ 2. **Batteries included.** Cookies, ARIA, pruning — all happen inside by default. Disable with flags if you want raw access.
168
+ 3. **Escape hatches.** `connect()` returns an object with the raw CDP connection accessible. If you need something we don't wrap, you can send CDP commands directly.
169
+ 4. **Progressive complexity.** `browse(url)` for 90% of use cases. Options object for the rest. `connect()` for interactive sessions.
170
+
171
+ ---
172
+
173
+ ## The bare- Ecosystem
174
+
175
+ ```
176
+ bareagent = the brain (orchestration, planning, memory, retries, tool loop)
177
+ barebrowse = the eyes + hands (browse, read, interact with the web)
178
+ ```
179
+
180
+ **Integration with bareagent:**
181
+
182
+ ```js
+ import { Loop } from 'bare-agent';
+ import { browse } from 'barebrowse';
+
+ const tools = [
+   { name: 'browse', execute: ({ url }) => browse(url) },
+ ];
+
+ // `provider` is an LLM provider instance configured elsewhere
+ const loop = new Loop({ provider });
+ await loop.run([{ role: 'user', content: 'Find the cheapest flight to Tokyo' }], tools);
+ ```
193
+
194
+ bareagent handles the think/act/observe loop. barebrowse handles "see the web and act on it." Neither is opinionated about the other. Tools are plain functions.
195
+
196
+ **Integration with multis:**
197
+
198
+ multis (personal assistant) uses barebrowse in headed mode for interactive tasks. The multis proxy is already running, providing a desktop session. barebrowse connects to the user's Chrome and drives it on behalf of the assistant.
199
+
200
+ **MCP server wrapper (future):**
201
+
202
+ barebrowse is not an MCP server, but wrapping it as one is ~30 lines. This would replace Playwright MCP + mcprune proxy with a single, lighter MCP server.
203
+
204
+ ---
205
+
206
+ ## Decisions Log — Why We Chose Each
207
+
208
+ This section exists so we don't re-debate settled decisions.
209
+
210
+ | Decision | Choice | Why | Alternative considered | Why not |
211
+ |---|---|---|---|---|
212
+ | Browser protocol | CDP direct | Uses user's browser, ~100 lines, all 3 modes | Playwright | 200MB download, bundles its own Chromium, abstracts what we need raw |
213
+ | Page representation | ARIA tree | Semantic, token-efficient, what agents need | DOM/HTML | Bloated, noisy, needs heavy parsing |
214
+ | Pruning | Built-in | Agents always need pruned output | Optional/separate | Two deps for one job, pruning isn't optional |
215
+ | Cookie auth | Own auth.js + CDP inject | User's existing sessions (Firefox or Chromium), cross-browser injection into headless Chromium | OAuth/credential storage | Complex, security liability, reinventing what the browser already solved |
216
+ | Three modes | One flag | Same CDP code, ~20 lines difference | Separate packages | Same code, artificial separation |
217
+ | Chromium only | CDP constraint | ~80% browser share, user's real browser | Cross-browser (Playwright) | Requires Playwright, loses "use your own browser" benefit |
218
+ | Anti-detection | Runtime.evaluate patches | Minimal stealth for headless mode | Full stealth framework | Over-engineering; headless + real cookies handles 90% |
219
+ | Daemon/server | None | CDP is direct, no intermediary needed | sweetlink daemon pattern | Unnecessary complexity for local agent→browser |
220
+ | Framework | None (vanilla JS) | Matches bare- philosophy, zero deps | Express/Fastify wrapper | Not a server, not needed |
221
+ | Language | Vanilla JavaScript | Node.js ecosystem, same as bareagent, CDP libs available | TypeScript | Added build step, not needed for POC; can add types later |
222
+ | Naming | chromium.js | Covers all Chromium-family browsers, not just Chrome | chrome.js | Too specific; Brave/Edge/Arc are also targets |
223
+ | mcprune integration | Absorb pruning logic | One package does it all, mcprune pruning is a pure function | Keep separate | Agents shouldn't need two packages to browse |
224
+ | openclaw lesson | Single bridge protocol | One CDP connection vs many API integrations | Direct multi-API | openclaw proved this fails — bloat, maintenance, fragility |
225
+
226
+ ---
227
+
228
+ ## Future Features (Post-POC)
229
+
230
+ ### Near-term
231
+ - **Screenshot capture** — `Page.captureScreenshot` via CDP. Useful for visual verification and multimodal agents.
232
+ - **Network interception** — `Network.requestWillBeSent` / `Network.responseReceived` for monitoring page loads. Detect redirects, blocked resources, API calls.
233
+ - **Wait strategies** — `waitForNavigation()` done (Page.loadEventFired). Still needed: network idle, element presence polling.
234
+ - **Tab management** — Multiple pages in one browser session. CDP `Target.createTarget` / `Target.attachToTarget`.
235
+ - **MCP server wrapper** — Expose browse/click/type as MCP tools. Replaces Playwright MCP + mcprune combo.
236
+
237
+ ### Medium-term
238
+ - **Firefox support** — Via WebDriver BiDi protocol (cross-browser standard, still maturing). Second protocol adapter alongside CDP.
239
+ - **Cookie sync** — In hybrid mode, extract fresh cookies from headed session and cache for future headless use. Self-refreshing auth.
240
+ - **Selector discovery** — Port sweetlink's `discoverSelectors` — crawl ARIA tree, score interactive elements, return ranked action targets.
241
+ - **Form understanding** — Detect forms in ARIA tree, map fields to semantic purposes, enable agents to fill forms intelligently.
242
+ - **Proxy/Tor support** — Route headless browser through proxy for geo-restricted content.
243
+
244
+ ### Long-term
245
+ - **Profile management** — Multiple browser profiles for different identities/accounts.
246
+ - **Session recording/replay** — Record browsing sessions as CDP commands, replay for testing.
247
+ - **Visual grounding** — Combine ARIA tree with screenshot regions for multimodal agents.
248
+ - **Agent memory integration** — Remember visited pages, cache snapshots, track which sites need headed mode.
249
+
250
+ ---
251
+
252
+ ## Repos Studied — What We Borrowed and Why
253
+
254
+ | Repo | What we took | What we skipped |
255
+ |---|---|---|
256
+ | **steipete/sweet-cookie** | Cookie extraction from browser profiles, OS keychain decryption | Nothing — clean, focused library |
257
+ | **steipete/sweetlink** | CDP dual-channel concept, selector discovery scoring, click/command patterns | Daemon architecture, WebSocket bridge, in-page runtime injection, HMAC auth |
258
+ | **steipete/canvas** | Stealth/anti-detection config patterns | Go implementation (we're JS) |
259
+ | **nichochar/open-operator** | AI agent web automation patterns | Full framework, too opinionated |
260
+ | **AntlerClaw/playwright-mcp** | How to expose browser as MCP tools | Playwright dependency |
261
+ | **AntlerClaw/mcp-browser-use** | MCP-native browser patterns | Heavy deps |
262
+ | **AitchKay/chromancer** | Accessibility tree extraction approach | Different stack |
263
+ | **mcprune (own)** | ARIA pruning logic — role taxonomy, landmark extraction, noise removal, wrapper collapsing | Playwright dependency, MCP proxy architecture |
264
+ | **openclaw (own)** | Lesson learned: multi-API direct integration = bloat. Use a single bridge protocol | Everything — the architecture was the cautionary tale |
265
+
266
+ ### The openclaw lesson
267
+
268
+ openclaw tried to integrate 10+ messaging APIs directly — each with its own auth, format, quirks. It became a maintenance nightmare. multis solved the same problem by using Beeper/Matrix as a single bridge.
269
+
270
+ barebrowse applies the same lesson: instead of integrating Playwright + Puppeteer + WebDriver + stealth plugins + cookie libraries + proxy managers, we use **one protocol (CDP) to one browser (the user's)**. Everything else is unnecessary.
271
+
272
+ ---
273
+
274
+ ## Success Criteria
275
+
276
+ barebrowse succeeds when:
277
+
278
+ 1. `browse(url)` returns a pruned ARIA snapshot of any page, authenticated as the user
279
+ 2. Zero heavy dependencies — no Playwright, no Puppeteer, no bundled browser
280
+ 3. Works with any installed Chromium-based browser
281
+ 4. Headless for research, headed for interaction, hybrid for autonomous agents
282
+ 5. Plugs into bareagent as plain tool functions
283
+ 6. Total source under 1,000 lines for core functionality
284
+ 7. An agent using barebrowse + bareagent can autonomously research the web and act on pages
@@ -0,0 +1,16 @@
1
+ # Bug Log
2
+
3
+ Track bugs: symptom, root cause, fix, regression test.
4
+
5
+ ---
6
+
7
+ *No bugs logged yet. When one is found, add an entry:*
8
+
9
+ ```
10
+ ## [date] Short description
11
+
12
+ **Symptom:** What the user/test observed
13
+ **Root cause:** Why it happened
14
+ **Fix:** What was changed (file:line)
15
+ **Regression test:** Which test prevents recurrence
16
+ ```
@@ -0,0 +1,32 @@
1
+ # Decisions Log
2
+
3
+ Settled decisions. Don't re-debate these -- see rationale column.
4
+
5
+ ## Founding decisions (v0.1.0)
6
+
7
+ | # | Decision | Choice | Why | Alternative | Why not |
8
+ |---|----------|--------|-----|-------------|---------|
9
+ | 1 | Browser protocol | CDP direct | Uses user's browser, ~100 lines, all 3 modes | Playwright | 200MB download, bundles its own Chromium, abstracts what we need raw |
10
+ | 2 | Page representation | ARIA tree | Semantic, token-efficient, what agents need | DOM/HTML | Bloated, noisy, needs heavy parsing |
11
+ | 3 | Pruning | Built-in | Agents always need pruned output | Optional/separate | Two deps for one job, pruning isn't optional |
12
+ | 4 | Cookie auth | Own auth.js + CDP inject | User's existing sessions (Firefox or Chromium), cross-browser injection | OAuth/credential storage | Complex, security liability, reinventing what the browser already solved |
13
+ | 5 | Three modes | One flag | Same CDP code, ~20 lines difference | Separate packages | Same code, artificial separation |
14
+ | 6 | Chromium only | CDP constraint | ~80% browser share, user's real browser | Cross-browser (Playwright) | Requires Playwright, loses "use your own browser" benefit |
15
+ | 7 | Framework | None (vanilla JS) | Matches bare- philosophy, zero deps | Express/Fastify wrapper | Not a server, not needed |
16
+ | 8 | Language | Vanilla JavaScript | Node.js ecosystem, same as bareagent, CDP libs available | TypeScript | Added build step, not needed; can add types later |
17
+ | 9 | mcprune integration | Absorb pruning logic | One package does it all, mcprune pruning is a pure function | Keep separate | Agents shouldn't need two packages to browse |
18
+ | 10 | Daemon/server | None | CDP is direct, no intermediary needed | sweetlink daemon pattern | Unnecessary complexity for local agent-to-browser |
19
+
20
+ ## v0.2.0 decisions
21
+
22
+ | # | Decision | Choice | Why | Alternative | Why not |
23
+ |---|----------|--------|-----|-------------|---------|
24
+ | 11 | Anti-detection | Runtime.evaluate patches | Minimal stealth for headless mode | Full stealth framework | Over-engineering; headless + real cookies handles 90% |
25
+ | 12 | sweet-cookie | Wrote own auth.js | sweet-cookie not on npm (different package). Our version is simpler, tailored, vanilla JS | Use sweet-cookie | Not available as npm package |
26
+ | 13 | MCP server | Raw JSON-RPC, no SDK | Zero deps, ~200 lines. SDK adds weight without capability for stdio | @modelcontextprotocol/sdk | Unnecessary dependency for simple JSON-RPC |
27
+ | 14 | bareagent adapter | Action tools auto-return snapshot | LLM always sees result without extra tool call. 300ms settle for DOM updates | Return 'ok' like MCP | Different tradeoff -- bareagent tool calls are expensive (LLM round-trip) |
28
+ | 15 | MCP action tools | Return 'ok', agent calls snapshot | MCP tool calls are cheap to chain. Avoids double-token output | Auto-return snapshot | Would bloat every action response |
29
+
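Decision 13 (raw JSON-RPC, no SDK) can be illustrated with a minimal sketch of a newline-delimited JSON-RPC 2.0 dispatcher. This is not the code in `mcp-server.js`; `handleLine`, the `tools` table, and the `ping` method are hypothetical names chosen for illustration. Only the message shape and the error codes (-32700, -32601) come from the JSON-RPC 2.0 specification.

```javascript
// Hypothetical sketch, not barebrowse's actual mcp-server.js.
// Takes one line of input (one JSON-RPC 2.0 message) and returns the reply
// as a string, or null when no reply is due (notifications carry no id).
function handleLine(line, tools) {
  let msg;
  try {
    msg = JSON.parse(line);
  } catch {
    // -32700 is the JSON-RPC 2.0 "Parse error" code.
    return JSON.stringify({
      jsonrpc: '2.0',
      id: null,
      error: { code: -32700, message: 'Parse error' },
    });
  }

  const isNotification = msg.id === undefined;
  const handler = tools[msg.method];

  if (!handler) {
    if (isNotification) return null;
    // -32601 is the JSON-RPC 2.0 "Method not found" code.
    return JSON.stringify({
      jsonrpc: '2.0',
      id: msg.id,
      error: { code: -32601, message: 'Method not found' },
    });
  }

  const result = handler(msg.params ?? {});
  return isNotification
    ? null
    : JSON.stringify({ jsonrpc: '2.0', id: msg.id, result });
}

// Hypothetical dispatch table: method name -> synchronous handler.
const tools = { ping: () => 'ok' };
```

In a stdio server, each line read from stdin would be passed to `handleLine` and any non-null reply written to stdout followed by a newline. That loop plus a dispatch table is roughly all an SDK would otherwise provide for this transport.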
30
+ ---
31
+
32
+ *Add new decisions below this line. Include date, context, and rationale.*
@@ -0,0 +1,54 @@
1
+ # Implementation Log
2
+
3
+ Chronological record of what changed and why. For detailed changelogs, see `/CHANGELOG.md`.
4
+
5
+ ---
6
+
7
+ ## v0.2.1 (2026-02-22)
8
+
9
+ - README rewritten: no code blocks, obstacle table, two usage paths (MCP vs framework)
10
+ - MCP auto-installer: `npx barebrowse install` detects Claude Desktop, Cursor, Claude Code
11
+ - MCP config uses `npx` instead of local file paths
12
+
13
+ ## v0.2.0 (2026-02-22)
14
+
15
+ Major release: agent integration layer.
16
+
17
+ **New modules:**
18
+ - `mcp-server.js` -- JSON-RPC 2.0 over stdio, 7 tools, singleton session
19
+ - `src/bareagent.js` -- tool adapter for bareagent Loop, 9 tools, auto-snapshot
20
+ - `src/stealth.js` -- navigator patches for headless anti-detection
21
+ - `cli.js` -- `npx barebrowse mcp|install|browse`
22
+
23
+ **New features:**
24
+ - Hybrid mode (try headless, fallback to headed on bot detection)
25
+ - `page.hover(ref)`, `page.select(ref, value)`, `page.screenshot(opts)`
26
+ - `page.waitForNetworkIdle(opts)` -- resolve when no pending requests
27
+ - SPA-aware `waitForNavigation()`
28
+
29
+ **Docs:**
30
+ - `barebrowse.context.md` -- LLM integration guide
31
+ - `docs/testing.md` -- test pyramid, all 54 tests
32
+ - `docs/blueprint.md` -- full pipeline, module table
33
+
34
+ **Tests:** 54 passing (was 47)
35
+
36
+ ## v0.1.0 (2026-02-22)
37
+
38
+ Initial release. CDP-direct browsing with ARIA snapshots.
39
+
40
+ **Core modules (8):**
41
+ - `src/index.js` -- `browse()`, `connect()` API
42
+ - `src/cdp.js` -- WebSocket CDP client
43
+ - `src/chromium.js` -- browser discovery and launch
44
+ - `src/aria.js` -- ARIA tree formatting
45
+ - `src/auth.js` -- cookie extraction (Firefox SQLite, Chromium AES + keyring)
46
+ - `src/prune.js` -- 9-step pruning pipeline (ported from mcprune)
47
+ - `src/interact.js` -- click, type, press, scroll
48
+ - `src/consent.js` -- cookie consent auto-dismiss (7 languages, 16+ sites)
49
+
50
+ **Tests:** 47 passing across 5 files
51
+
52
+ ---
53
+
54
+ *Add new entries at the top. Include version, date, and what changed.*
@@ -0,0 +1,35 @@
1
+ # Insights
2
+
3
+ Lessons learned, patterns discovered, things to remember.
4
+
5
+ ---
6
+
7
+ ## The openclaw lesson
8
+
9
+ openclaw tried to integrate 10+ messaging APIs directly -- each with its own auth, format, quirks. It became a maintenance nightmare. multis solved the same problem by using Beeper/Matrix as a single bridge.
10
+
11
+ barebrowse applies the same lesson: instead of integrating Playwright + Puppeteer + WebDriver + stealth plugins + cookie libraries + proxy managers, we use **one protocol (CDP) to one browser (the user's)**. Everything else is unnecessary.
12
+
13
+ **Takeaway:** When possible, find a single bridge protocol instead of N direct integrations.
14
+
15
+ ## Repos studied -- what we took and what we skipped
16
+
17
+ | Repo | What we took | What we skipped | Why |
18
+ |------|-------------|-----------------|-----|
19
+ | **steipete/sweet-cookie** | Cookie extraction concept (SQLite + keyring) | Nothing | Not on npm. Wrote our own auth.js -- simpler, tailored, vanilla JS |
20
+ | **steipete/sweetlink** | CDP-direct concept | Daemon, WebSocket bridge, in-page runtime, HMAC auth | CDP direct is 100 lines vs ~2,000 |
21
+ | **steipete/canvas** | Stealth/anti-detection patterns | Go implementation | Noted for stealth.js |
22
+ | **mcprune (own)** | Full pruning pipeline port | Playwright dependency, MCP proxy | prune.js is 472 lines, adapted from Playwright YAML to CDP tree |
23
+ | **openclaw (own)** | Cautionary tale | Everything | Multi-API direct integration = bloat |
24
+
25
+ ## Key technical insights
26
+
27
+ - **ARIA tree > DOM** for agent consumption. Semantic, compact, interactive elements are first-class. Token reduction of 47-95% is real.
28
+ - **Cookie consent is solvable** with ARIA tree scanning + a button text corpus in 7 languages. Dialog role detection + global fallback covers >95% of sites.
29
+ - **Headed mode is the ultimate fallback.** When stealth fails, when cookies expire, when CAPTCHAs appear -- connecting to the user's real browser session handles it.
30
+ - **CDP flattened sessions** are the way to go. One WebSocket, multiple targets. The `sessionId` field on each message routes commands to the right tab.
31
+ - **`Page.addScriptToEvaluateOnNewDocument`** runs before any page scripts -- perfect for stealth patches without race conditions.
32
+
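The last two insights can be sketched as CDP command frames. This is an illustrative sketch, not code from `cdp.js` or `stealth.js`: the `cdpFrame` helper, the `TARGET_ID` and `SESSION_ID` placeholders, and the `navigator.webdriver` patch are assumptions. The method names and the `flatten`, `source`, and `sessionId` fields are standard Chrome DevTools Protocol.

```javascript
// Hypothetical helper: builds the JSON text of one CDP command.
let nextId = 0;
function cdpFrame(method, params = {}, sessionId) {
  const frame = { id: ++nextId, method, params };
  // In flat mode, a top-level sessionId routes the command to an attached target.
  if (sessionId) frame.sessionId = sessionId;
  return JSON.stringify(frame);
}

// Attach to a tab in flat mode so every target shares the one WebSocket.
// 'TARGET_ID' is a placeholder for a real target id.
const attach = cdpFrame('Target.attachToTarget', {
  targetId: 'TARGET_ID',
  flatten: true,
});

// Register a script that runs in each new document before page scripts.
// 'SESSION_ID' is a placeholder for the session id returned by the attach call.
// The patch itself is only an example of a navigator override.
const patch = cdpFrame(
  'Page.addScriptToEvaluateOnNewDocument',
  { source: "Object.defineProperty(navigator, 'webdriver', { get: () => undefined });" },
  'SESSION_ID'
);
```

Each string would be sent over the browser-level WebSocket as is. Responses and events for the tab come back on the same socket carrying the same `sessionId`, so a single message handler can demultiplex them.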
33
+ ---
34
+
35
+ *Add new insights as they emerge. These should be durable lessons, not session notes.*