barebrowse 0.2.2 → 0.3.1
- package/.claude/skills/barebrowse/SKILL.md +107 -0
- package/CHANGELOG.md +57 -0
- package/CLAUDE.md +4 -2
- package/README.md +52 -5
- package/barebrowse.context.md +27 -8
- package/cli.js +289 -48
- package/docs/00-context/assumptions.md +38 -0
- package/docs/{blueprint.md → 00-context/system-state.md} +30 -5
- package/docs/00-context/vision.md +52 -0
- package/docs/01-product/prd.md +284 -0
- package/docs/03-logs/bug-log.md +16 -0
- package/docs/03-logs/decisions-log.md +32 -0
- package/docs/03-logs/implementation-log.md +54 -0
- package/docs/03-logs/insights.md +35 -0
- package/docs/03-logs/validation-log.md +123 -0
- package/docs/04-process/definition-of-done.md +31 -0
- package/docs/04-process/dev-workflow.md +68 -0
- package/docs/{testing.md → 04-process/testing.md} +21 -2
- package/docs/README.md +55 -0
- package/docs/archive/poc-plan.md +230 -0
- package/package.json +1 -1
- package/src/aria.js +1 -1
- package/src/daemon.js +321 -0
- package/src/session-client.js +70 -0
@@ -0,0 +1,284 @@
# barebrowse -- Product Requirements Document

**Version:** 1.0
**Date:** 2026-02-22
**Status:** POC

---

## What barebrowse is

A standalone vanilla JavaScript library that gives autonomous agents authenticated access to the web through the user's own Chromium browser. One package, one import, three modes.

```js
import { browse } from 'barebrowse';
const snapshot = await browse('https://any-page.com');
```

barebrowse handles: finding the browser, connecting via CDP, injecting cookies, navigating, extracting the ARIA accessibility tree, and pruning it down to what an agent actually needs. The output is a clean, token-efficient snapshot of any web page -- authenticated as the real user.

## What barebrowse is NOT

- **Not a framework.** No plugin system, no config files, no lifecycle hooks.
- **Not an MCP server.** But trivially wrappable as one (~30 lines).
- **Not Playwright.** No bundled browser, no cross-engine abstraction, no 200MB download.
- **Not an agent.** No LLM, no planning, no orchestration -- that's bareagent's job.
- **Not a scraper.** It browses as the user, not as a bot harvesting data.

---

## The Problem

Every AI agent that needs to read or interact with the web hits the same walls:

1. **Cloudflare / bot detection** -- headless browsers get blocked
2. **Authentication** -- sites require login, OAuth, session cookies
3. **Token bloat** -- raw DOM is 100K+ tokens; agents need ~5K
4. **Two consumers, same need** -- research agents (read pages) and personal assistants (click/type) both need an authenticated browser, but existing tools force you to choose one path

Existing solutions (Playwright MCP, sweetlink, open-operator, browser-use) are either too heavy, too opinionated, or solve only half the problem.

## The Insight

The user already has a browser. It's already logged in. It already passes Cloudflare. Instead of fighting the web with headless stealth tricks, **use what's already there**.

CDP (Chrome DevTools Protocol) lets us connect to any Chromium-based browser -- the same one the user browses with daily. We get their cookies, their sessions, their anti-detection posture, for free.

---

## Core Architecture

### CDP-Direct (Why No Playwright)

**Decision:** Use CDP over WebSocket directly. No Playwright dependency.

**Why:**
- Playwright downloads a bundled Chromium (~200MB). barebrowse uses the browser already installed on the user's machine.
- Playwright abstracts CDP, but we need CDP directly for all three modes (headless, headed, hybrid) against the user's real browser.
- Every Playwright API call maps 1:1 to a CDP method. The abstraction adds weight without adding capability for our use case.
- CDP gives us everything: `Accessibility.getFullAXTree`, `Page.navigate`, `Runtime.evaluate`, `Input.dispatch*Event`, `Network.setCookie`, `Page.captureScreenshot`.
- The CDP WebSocket client is ~100 lines of vanilla JS. Playwright is ~50,000.
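
To make the "~100 lines" claim concrete, here is a minimal sketch of the request/response bookkeeping such a client needs. It is not barebrowse's `src/cdp.js`; `createCdpClient` and the shape of `socket` (anything with `send(string)` and an `onmessage` property, such as a WebSocket opened on the browser's debugger URL) are assumptions made for illustration.

```javascript
// Sketch of a CDP client core: match replies to requests by id and
// dispatch protocol events by method name. `socket` is an assumed
// WebSocket-like object; nothing here is taken from barebrowse itself.
function createCdpClient(socket) {
  let nextId = 1;
  const pending = new Map();   // request id -> { resolve, reject }
  const listeners = new Map(); // event name -> Set of handlers

  socket.onmessage = (event) => {
    const msg = JSON.parse(event.data);
    if (msg.id !== undefined && pending.has(msg.id)) {
      // A reply to one of our commands.
      const { resolve, reject } = pending.get(msg.id);
      pending.delete(msg.id);
      if (msg.error) reject(new Error(msg.error.message));
      else resolve(msg.result);
    } else if (msg.method) {
      // A protocol event such as Page.loadEventFired.
      for (const handler of listeners.get(msg.method) ?? []) {
        handler(msg.params, msg.sessionId);
      }
    }
  };

  return {
    // Send one command; the optional sessionId targets an attached tab.
    send(method, params = {}, sessionId) {
      const id = nextId++;
      const message = { id, method, params };
      if (sessionId) message.sessionId = sessionId;
      return new Promise((resolve, reject) => {
        pending.set(id, { resolve, reject });
        socket.send(JSON.stringify(message));
      });
    },
    on(event, handler) {
      if (!listeners.has(event)) listeners.set(event, new Set());
      listeners.get(event).add(handler);
    },
  };
}
```

With a real socket, a call would look like `await client.send('Page.navigate', { url })`. Connection setup, timeouts, and close handling are left out of the sketch.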

**What we lose:** Cross-engine support (Firefox, WebKit). CDP only works with Chromium-family browsers (Chrome, Chromium, Edge, Brave, Vivaldi, Arc, Opera). This covers ~80% of desktop browsers. Firefox support could come later via WebDriver BiDi.

**What we gain:** Zero heavy deps, uses the user's real browser, same code path for headless/headed/hybrid, drastically simpler codebase.

### ARIA-First (Why Not DOM)

**Decision:** Use `Accessibility.getFullAXTree` (ARIA/accessibility tree) as the primary page representation, not DOM.

**Why:**
- The accessibility tree is the semantic structure of the page -- roles, names, states, interactive elements. It's what screen readers see. It's also what agents need.
- DOM is bloated: wrapper divs, styling, tracking pixels, ad scripts. An agent doesn't need any of that.
- mcprune already proved this: ARIA snapshots pruned by role achieve 75-95% token reduction on typical pages while preserving all actionable information.
- CDP's `Accessibility.getFullAXTree` returns the tree directly. No parsing HTML, no building a DOM tree, no traversing nodes.
- ARIA refs map directly to CDP interaction targets -- the agent reads a button in the tree and can click it via the same CDP connection.

**The pipeline:** CDP connect → authenticate → navigate → ARIA tree → prune → agent gets clean snapshot.
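
As an illustration of the "ARIA tree" step, the sketch below turns the flat node list that `Accessibility.getFullAXTree` returns into an indented outline. The fields used (`nodeId`, `parentId`, `childIds`, `ignored`, `role.value`, `name.value`) are part of the CDP `AXNode` type; the function name `formatAxTree` and the output layout are assumptions, not barebrowse's actual `src/aria.js` format.

```javascript
// Sketch: render a flat CDP AXNode list as an indented role/name outline.
// Ignored nodes are skipped, and their children are hoisted to the
// depth the ignored node would have had.
function formatAxTree(nodes) {
  const byId = new Map(nodes.map((n) => [n.nodeId, n]));
  const root = nodes.find((n) => n.parentId === undefined) ?? nodes[0];
  const lines = [];

  const walk = (node, depth) => {
    if (!node) return;
    let childDepth = depth;
    if (!node.ignored) {
      const role = node.role?.value ?? 'unknown';
      const name = node.name?.value ? ` "${node.name.value}"` : '';
      lines.push(`${'  '.repeat(depth)}- ${role}${name}`);
      childDepth = depth + 1;
    }
    for (const id of node.childIds ?? []) walk(byId.get(id), childDepth);
  };

  walk(root, 0);
  return lines.join('\n');
}
```

The output of a function like this is what the pruning step would then shrink.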

### Three Modes (Why All Three)

**Decision:** Headless, headed, and hybrid -- not as separate packages or optional features, but as a single flag on the same API.

**Why they're not bloat:** The CDP conversation is identical regardless of mode. The only difference is how you get a browser process with a debug port. It's one code path with a different entry point:

```
headless: spawn chromium --headless=new --remote-debugging-port=N
headed: connect to user's already-running browser on debug port
hybrid: try headless → detect failure → fall back to headed
```

After connection, every CDP command is the same. Three modes = ~20 extra lines in `chromium.js`, not three implementations.
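
The hybrid entry point can be sketched as a small wrapper around the other two. Everything named here (`browseHybrid`, `runHeadless`, `runHeaded`, `looksBlocked`, and the text patterns) is hypothetical; the sketch only shows the "try headless, detect failure, fall back to headed" control flow described above.

```javascript
// Sketch of hybrid mode: the two runners and the block detector are
// injected, so the fallback logic stays independent of how a browser
// is actually launched or attached.
async function browseHybrid(url, { runHeadless, runHeaded, looksBlocked }) {
  try {
    const snapshot = await runHeadless(url);
    if (!looksBlocked(snapshot)) return { mode: 'headless', snapshot };
  } catch {
    // A launch or navigation failure is treated like a block.
  }
  return { mode: 'headed', snapshot: await runHeaded(url) };
}

// Illustrative detector only: these challenge-page phrases are assumptions.
function looksBlockedByText(snapshot) {
  return /just a moment|verify you are human|access denied/i.test(snapshot);
}
```

Injecting the runners keeps the sketch testable without a browser; a real implementation would decide for itself what counts as a failed headless attempt.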

**When to use each:**

| Mode | Use case | Example |
|---|---|---|
| `headless` | Agent research, background tasks, CI | "Read this article and summarize it" |
| `headed` | Personal assistant, interactive tasks, auth flows | "Book me a flight on this page" |
| `hybrid` | Default for autonomous agents | Try headless; if CF-blocked, fall back to headed |

**Headless is the default.** Most agent tasks are "go read this page." Headed is the escape hatch for when headless fails or the task requires user-visible interaction.

### Cookie Authentication

**Decision:** Extract cookies from the user's browser profile and inject via CDP `Network.setCookie`.

**Why:**
- The user's browser has active sessions for every site they use. We reuse those sessions instead of building new auth flows.
- sweet-cookie (npm package) already extracts cookies from Chrome/Firefox/Safari SQLite databases with OS keychain decryption. We use it or vendor the relevant parts.
- For headed mode, cookies are already present in the browser -- no extraction needed.
- For headless mode, we extract from the user's profile and inject into the headless instance.

**Limitation:** Cookies expire. This works for existing sessions, not new logins. For sites requiring fresh auth, headed mode with user interaction is the fallback.
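
A rough sketch of the injection side, under stated assumptions: the input cookie shape (`name`, `value`, `domain`, `path`, `secure`, `httpOnly`, `sameSite`, `expires` in seconds since the epoch) and both helper names are hypothetical, while the output fields are parameters of CDP's `Network.setCookie`. Already expired cookies are dropped up front, which is the limitation noted above.

```javascript
// Sketch: map an extracted cookie record to Network.setCookie params.
// Returns null for cookies that have already expired.
function toSetCookieParams(cookie, nowSeconds = Date.now() / 1000) {
  const hasExpiry = cookie.expires !== undefined && cookie.expires > 0;
  if (hasExpiry && cookie.expires <= nowSeconds) return null;

  const params = {
    name: cookie.name,
    value: cookie.value,
    domain: cookie.domain,
    path: cookie.path ?? '/',
    secure: Boolean(cookie.secure),
    httpOnly: Boolean(cookie.httpOnly),
  };
  if (hasExpiry) params.expires = cookie.expires; // session cookies carry no expiry
  if (cookie.sameSite) params.sameSite = cookie.sameSite;
  return params;
}

// Build the list of CDP commands to send before navigating.
function buildCookieCommands(cookies, nowSeconds) {
  return cookies
    .map((cookie) => toSetCookieParams(cookie, nowSeconds))
    .filter(Boolean)
    .map((params) => ({ method: 'Network.setCookie', params }));
}
```

Each resulting command would be sent over the CDP connection to the headless instance before `Page.navigate`.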

### Pruning (Absorbed from mcprune)

**Decision:** Port mcprune's role-based ARIA tree pruning into barebrowse as a built-in step, not an optional module.

**Why:**
- Pruning is not optional for agent consumption. A raw ARIA tree is still too large for most LLM context windows. Pruning is part of the pipeline, not an afterthought.
- mcprune's pruning logic is a pure function: takes an ARIA tree, returns a smaller ARIA tree. No browser dependency, no Playwright coupling. It's ~300 lines of role-based tree surgery.
- By absorbing it, barebrowse becomes a complete "URL in, agent-ready snapshot out" solution. No second package needed.

**What we port from mcprune:**
- Role taxonomy (landmarks, interactive, structural, noise)
- Landmark extraction (main, nav, banner, etc.)
- Noise removal (ads, tracking, legal boilerplate)
- Interactive element preservation (buttons, links, inputs)
- Wrapper collapsing (nested generics, empty groups)
- Context-aware filtering (search relevance, dedup)
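
A toy version of role-based pruning shows why it can be a pure function. This is not mcprune's or barebrowse's pipeline: the node shape (`role`, `name`, `children`) and the three role sets are assumptions chosen to demonstrate noise removal, interactive preservation, and wrapper collapsing on a small tree.

```javascript
// Illustrative role sets; the real taxonomy is larger and differs.
const NOISE_ROLES = new Set(['contentinfo', 'complementary', 'presentation', 'none']);
const WRAPPER_ROLES = new Set(['generic', 'group']);
const INTERACTIVE_ROLES = new Set(['button', 'link', 'textbox', 'combobox', 'checkbox']);

// Sketch: returns an array so a node can vanish (zero results) or be
// replaced by its children (wrapper collapsing). Input is never mutated.
function prune(node) {
  if (NOISE_ROLES.has(node.role)) return [];
  const children = (node.children ?? []).flatMap(prune);
  if (INTERACTIVE_ROLES.has(node.role)) return [{ ...node, children }];
  if (WRAPPER_ROLES.has(node.role) && !node.name) return children; // collapse wrapper
  if (!node.name && children.length === 0) return [];              // drop empty leaf
  return [{ ...node, children }];
}
```

Because `prune` takes a tree and returns new nodes, it can be unit-tested with plain object fixtures and no browser.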

**What stays in mcprune:** The Playwright MCP proxy architecture. mcprune can continue to exist as a Playwright-based MCP server for users who want that path. But for barebrowse consumers, pruning is built in.

---

## API Design

### Public API

```js
import { browse, connect } from 'barebrowse';

// One-shot: URL in, pruned ARIA snapshot out
const tree = await browse('https://example.com');

// With options
const treeWithOptions = await browse('https://example.com', {
  mode: 'hybrid',      // 'headless' (default) | 'headed' | 'hybrid'
  cookies: true,       // inject user's cookies (default: true)
  prune: true,         // apply ARIA pruning (default: true)
  browser: 'chrome',   // which browser profile for cookies
  timeout: 30000,      // navigation timeout ms
});

// Long-lived session for interaction
const page = await connect({ mode: 'headed' });
await page.goto('https://amazon.com/cart');
await page.click('[data-action="checkout"]');
await page.type('#gift-message', 'Happy birthday!');
const snapshot = await page.snapshot(); // ARIA + prune
await page.close();
```

### Design Principles

1. **One package, one import.** No picking pieces. `browse()` does everything. Power users get `connect()` for long-lived sessions.
2. **Batteries included.** Cookies, ARIA, pruning -- all happen inside by default. Disable with flags if you want raw access.
3. **Escape hatches.** `connect()` returns an object with the raw CDP connection accessible. If you need something we don't wrap, you can send CDP commands directly.
4. **Progressive complexity.** `browse(url)` for 90% of use cases. Options object for the rest. `connect()` for interactive sessions.

---

## The bare- Ecosystem

```
bareagent = the brain (orchestration, planning, memory, retries, tool loop)
barebrowse = the eyes + hands (browse, read, interact with the web)
```

**Integration with bareagent:**

```js
import { Loop } from 'bare-agent';
import { browse } from 'barebrowse';

const tools = [
  { name: 'browse', execute: ({ url }) => browse(url) },
];

const loop = new Loop({ provider });
await loop.run([{ role: 'user', content: 'Find the cheapest flight to Tokyo' }], tools);
```

bareagent handles the think/act/observe loop. barebrowse handles "see the web and act on it." Neither is opinionated about the other. Tools are plain functions.

**Integration with multis:**

multis (personal assistant) uses barebrowse in headed mode for interactive tasks. The multis proxy is already running, providing a desktop session. barebrowse connects to the user's Chrome and drives it on behalf of the assistant.

**MCP server wrapper (future):**

barebrowse is not an MCP server, but wrapping it as one is ~30 lines. This would replace Playwright MCP + mcprune proxy with a single, lighter MCP server.
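
To give a feel for how small such a wrapper could be, here is a sketch of the request handler at its core. `tools/list` and `tools/call` are MCP method names and the reply shapes follow JSON-RPC 2.0; `handleRpc`, the `TOOLS` entry, and the injected `browse` argument are assumptions. The sketch omits the `initialize` handshake and the stdio framing a real server needs, and it is not barebrowse's `mcp-server.js`.

```javascript
// Hypothetical tool descriptor for a single "browse" tool.
const TOOLS = [{
  name: 'browse',
  description: 'Return a pruned ARIA snapshot of a URL',
  inputSchema: { type: 'object', properties: { url: { type: 'string' } }, required: ['url'] },
}];

// Sketch: map one JSON-RPC request to one JSON-RPC response.
// `browse` is passed in so the handler does not depend on a browser.
async function handleRpc(request, browse) {
  const reply = (result) => ({ jsonrpc: '2.0', id: request.id, result });
  const fail = (code, message) => ({ jsonrpc: '2.0', id: request.id, error: { code, message } });

  if (request.method === 'tools/list') return reply({ tools: TOOLS });

  if (request.method === 'tools/call') {
    const { name, arguments: args = {} } = request.params ?? {};
    if (name !== 'browse') return fail(-32602, `Unknown tool: ${name}`);
    const text = await browse(args.url);
    return reply({ content: [{ type: 'text', text }] });
  }

  return fail(-32601, `Method not found: ${request.method}`);
}
```

A stdio loop would read one JSON message per line, pass it to `handleRpc`, and write the returned object back as a line of JSON.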

---

## Decisions Log -- Why We Chose Each

This section exists so we don't re-debate settled decisions.

| Decision | Choice | Why | Alternative considered | Why not |
|---|---|---|---|---|
| Browser protocol | CDP direct | Uses user's browser, ~100 lines, all 3 modes | Playwright | 200MB download, bundles its own Chromium, abstracts what we need raw |
| Page representation | ARIA tree | Semantic, token-efficient, what agents need | DOM/HTML | Bloated, noisy, needs heavy parsing |
| Pruning | Built-in | Agents always need pruned output | Optional/separate | Two deps for one job, pruning isn't optional |
| Cookie auth | Own auth.js + CDP inject | User's existing sessions (Firefox or Chromium), cross-browser injection into headless Chromium | OAuth/credential storage | Complex, security liability, reinventing what the browser already solved |
| Three modes | One flag | Same CDP code, ~20 lines difference | Separate packages | Same code, artificial separation |
| Chromium only | CDP constraint | ~80% browser share, user's real browser | Cross-browser (Playwright) | Requires Playwright, loses "use your own browser" benefit |
| Anti-detection | Runtime.evaluate patches | Minimal stealth for headless mode | Full stealth framework | Over-engineering; headless + real cookies handles 90% |
| Daemon/server | None | CDP is direct, no intermediary needed | sweetlink daemon pattern | Unnecessary complexity for local agent→browser |
| Framework | None (vanilla JS) | Matches bare- philosophy, zero deps | Express/Fastify wrapper | Not a server, not needed |
| Language | Vanilla JavaScript | Node.js ecosystem, same as bareagent, CDP libs available | TypeScript | Added build step, not needed for POC; can add types later |
| Naming | chromium.js | Covers all Chromium-family browsers, not just Chrome | chrome.js | Too specific; Brave/Edge/Arc are also targets |
| mcprune integration | Absorb pruning logic | One package does it all, mcprune pruning is a pure function | Keep separate | Agents shouldn't need two packages to browse |
| openclaw lesson | Single bridge protocol | One CDP connection vs many API integrations | Direct multi-API | openclaw proved this fails -- bloat, maintenance, fragility |

---

## Future Features (Post-POC)

### Near-term
- **Screenshot capture** -- `Page.captureScreenshot` via CDP. Useful for visual verification and multimodal agents.
- **Network interception** -- `Network.requestWillBeSent` / `Network.responseReceived` for monitoring page loads. Detect redirects, blocked resources, API calls.
- **Wait strategies** -- `waitForNavigation()` done (Page.loadEventFired). Still needed: network idle, element presence polling.
- **Tab management** -- Multiple pages in one browser session. CDP `Target.createTarget` / `Target.attachToTarget`.
- **MCP server wrapper** -- Expose browse/click/type as MCP tools. Replaces Playwright MCP + mcprune combo.

### Medium-term
- **Firefox support** -- Via WebDriver BiDi protocol (cross-browser standard, still maturing). Second protocol adapter alongside CDP.
- **Cookie sync** -- In hybrid mode, extract fresh cookies from headed session and cache for future headless use. Self-refreshing auth.
- **Selector discovery** -- Port sweetlink's `discoverSelectors` -- crawl ARIA tree, score interactive elements, return ranked action targets.
- **Form understanding** -- Detect forms in ARIA tree, map fields to semantic purposes, enable agents to fill forms intelligently.
- **Proxy/Tor support** -- Route headless browser through proxy for geo-restricted content.

### Long-term
- **Profile management** -- Multiple browser profiles for different identities/accounts.
- **Session recording/replay** -- Record browsing sessions as CDP commands, replay for testing.
- **Visual grounding** -- Combine ARIA tree with screenshot regions for multimodal agents.
- **Agent memory integration** -- Remember visited pages, cache snapshots, track which sites need headed mode.

---

## Repos Studied -- What We Borrowed and Why

| Repo | What we took | What we skipped |
|---|---|---|
| **steipete/sweet-cookie** | Cookie extraction from browser profiles, OS keychain decryption | Nothing -- clean, focused library |
| **steipete/sweetlink** | CDP dual-channel concept, selector discovery scoring, click/command patterns | Daemon architecture, WebSocket bridge, in-page runtime injection, HMAC auth |
| **steipete/canvas** | Stealth/anti-detection config patterns | Go implementation (we're JS) |
| **nichochar/open-operator** | AI agent web automation patterns | Full framework, too opinionated |
| **AntlerClaw/playwright-mcp** | How to expose browser as MCP tools | Playwright dependency |
| **AntlerClaw/mcp-browser-use** | MCP-native browser patterns | Heavy deps |
| **AitchKay/chromancer** | Accessibility tree extraction approach | Different stack |
| **mcprune (own)** | ARIA pruning logic -- role taxonomy, landmark extraction, noise removal, wrapper collapsing | Playwright dependency, MCP proxy architecture |
| **openclaw (own)** | Lesson learned: multi-API direct integration = bloat. Use a single bridge protocol | Everything -- the architecture was the cautionary tale |

### The openclaw lesson

openclaw tried to integrate 10+ messaging APIs directly -- each with its own auth, format, quirks. It became a maintenance nightmare. multis solved the same problem by using Beeper/Matrix as a single bridge.

barebrowse applies the same lesson: instead of integrating Playwright + Puppeteer + WebDriver + stealth plugins + cookie libraries + proxy managers, we use **one protocol (CDP) to one browser (the user's)**. Everything else is unnecessary.

---

## Success Criteria

barebrowse succeeds when:

1. `browse(url)` returns a pruned ARIA snapshot of any page, authenticated as the user
2. Zero heavy dependencies -- no Playwright, no Puppeteer, no bundled browser
3. Works with any installed Chromium-based browser
4. Headless for research, headed for interaction, hybrid for autonomous agents
5. Plugs into bareagent as plain tool functions
6. Total source under 1,000 lines for core functionality
7. An agent using barebrowse + bareagent can autonomously research the web and act on pages
@@ -0,0 +1,16 @@
# Bug Log

Track bugs: symptom, root cause, fix, regression test.

---

*No bugs logged yet. When one is found, add an entry:*

```
## [date] Short description

**Symptom:** What the user/test observed
**Root cause:** Why it happened
**Fix:** What was changed (file:line)
**Regression test:** Which test prevents recurrence
```
@@ -0,0 +1,32 @@
# Decisions Log

Settled decisions. Don't re-debate these -- see rationale column.

## Founding decisions (v0.1.0)

| # | Decision | Choice | Why | Alternative | Why not |
|---|----------|--------|-----|-------------|---------|
| 1 | Browser protocol | CDP direct | Uses user's browser, ~100 lines, all 3 modes | Playwright | 200MB download, bundles its own Chromium, abstracts what we need raw |
| 2 | Page representation | ARIA tree | Semantic, token-efficient, what agents need | DOM/HTML | Bloated, noisy, needs heavy parsing |
| 3 | Pruning | Built-in | Agents always need pruned output | Optional/separate | Two deps for one job, pruning isn't optional |
| 4 | Cookie auth | Own auth.js + CDP inject | User's existing sessions (Firefox or Chromium), cross-browser injection | OAuth/credential storage | Complex, security liability, reinventing what the browser already solved |
| 5 | Three modes | One flag | Same CDP code, ~20 lines difference | Separate packages | Same code, artificial separation |
| 6 | Chromium only | CDP constraint | ~80% browser share, user's real browser | Cross-browser (Playwright) | Requires Playwright, loses "use your own browser" benefit |
| 7 | Framework | None (vanilla JS) | Matches bare- philosophy, zero deps | Express/Fastify wrapper | Not a server, not needed |
| 8 | Language | Vanilla JavaScript | Node.js ecosystem, same as bareagent, CDP libs available | TypeScript | Added build step, not needed; can add types later |
| 9 | mcprune integration | Absorb pruning logic | One package does it all, mcprune pruning is a pure function | Keep separate | Agents shouldn't need two packages to browse |
| 10 | Daemon/server | None | CDP is direct, no intermediary needed | sweetlink daemon pattern | Unnecessary complexity for local agent-to-browser |

## v0.2.0 decisions

| # | Decision | Choice | Why | Alternative | Why not |
|---|----------|--------|-----|-------------|---------|
| 11 | Anti-detection | Runtime.evaluate patches | Minimal stealth for headless mode | Full stealth framework | Over-engineering; headless + real cookies handles 90% |
| 12 | sweet-cookie | Wrote own auth.js | sweet-cookie not on npm (different package). Our version is simpler, tailored, vanilla JS | Use sweet-cookie | Not available as npm package |
| 13 | MCP server | Raw JSON-RPC, no SDK | Zero deps, ~200 lines. SDK adds weight without capability for stdio | @modelcontextprotocol/sdk | Unnecessary dependency for simple JSON-RPC |
| 14 | bareagent adapter | Action tools auto-return snapshot | LLM always sees result without extra tool call. 300ms settle for DOM updates | Return 'ok' like MCP | Different tradeoff -- bareagent tool calls are expensive (LLM round-trip) |
| 15 | MCP action tools | Return 'ok', agent calls snapshot | MCP tool calls are cheap to chain. Avoids double-token output | Auto-return snapshot | Would bloat every action response |
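
Decisions 14 and 15 describe two ways of wrapping the same action. The sketch below shows both side by side; `withAutoSnapshot`, `withOkResult`, and the `page.snapshot()` call on an injected `page` object are illustrative assumptions, with the 300 ms settle delay taken from decision 14.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// bareagent-style tool (decision 14): run the action, let the DOM
// settle, then return a fresh snapshot so the LLM sees the result
// without a second tool call.
function withAutoSnapshot(page, action, settleMs = 300) {
  return async (...args) => {
    await action(...args);
    await sleep(settleMs);
    return page.snapshot();
  };
}

// MCP-style tool (decision 15): run the action and return 'ok'; the
// agent asks for a snapshot separately when it wants one.
function withOkResult(action) {
  return async (...args) => {
    await action(...args);
    return 'ok';
  };
}
```

The trade-off is the one the table states: the first form saves an LLM round-trip per action, while the second keeps every action response small.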

---

*Add new decisions below this line. Include date, context, and rationale.*
@@ -0,0 +1,54 @@
# Implementation Log

Chronological record of what changed and why. For detailed changelogs, see `/CHANGELOG.md`.

---

## v0.2.1 (2026-02-22)

- README rewritten: no code blocks, obstacle table, two usage paths (MCP vs framework)
- MCP auto-installer: `npx barebrowse install` detects Claude Desktop, Cursor, Claude Code
- MCP config uses `npx` instead of local file paths

## v0.2.0 (2026-02-22)

Major release: agent integration layer.

**New modules:**
- `mcp-server.js` -- JSON-RPC 2.0 over stdio, 7 tools, singleton session
- `src/bareagent.js` -- tool adapter for bareagent Loop, 9 tools, auto-snapshot
- `src/stealth.js` -- navigator patches for headless anti-detection
- `cli.js` -- `npx barebrowse mcp|install|browse`

**New features:**
- Hybrid mode (try headless, fallback to headed on bot detection)
- `page.hover(ref)`, `page.select(ref, value)`, `page.screenshot(opts)`
- `page.waitForNetworkIdle(opts)` -- resolve when no pending requests
- SPA-aware `waitForNavigation()`

**Docs:**
- `barebrowse.context.md` -- LLM integration guide
- `docs/testing.md` -- test pyramid, all 54 tests
- `docs/blueprint.md` -- full pipeline, module table

**Tests:** 54 passing (was 47)

## v0.1.0 (2026-02-22)

Initial release. CDP-direct browsing with ARIA snapshots.

**Core modules (8):**
- `src/index.js` -- `browse()`, `connect()` API
- `src/cdp.js` -- WebSocket CDP client
- `src/chromium.js` -- browser discovery and launch
- `src/aria.js` -- ARIA tree formatting
- `src/auth.js` -- cookie extraction (Firefox SQLite, Chromium AES + keyring)
- `src/prune.js` -- 9-step pruning pipeline (ported from mcprune)
- `src/interact.js` -- click, type, press, scroll
- `src/consent.js` -- cookie consent auto-dismiss (7 languages, 16+ sites)

**Tests:** 47 passing across 5 files

---

*Add new entries at the top. Include version, date, and what changed.*
@@ -0,0 +1,35 @@
# Insights

Lessons learned, patterns discovered, things to remember.

---

## The openclaw lesson

openclaw tried to integrate 10+ messaging APIs directly -- each with its own auth, format, quirks. It became a maintenance nightmare. multis solved the same problem by using Beeper/Matrix as a single bridge.

barebrowse applies the same lesson: instead of integrating Playwright + Puppeteer + WebDriver + stealth plugins + cookie libraries + proxy managers, we use **one protocol (CDP) to one browser (the user's)**. Everything else is unnecessary.

**Takeaway:** When possible, find a single bridge protocol instead of N direct integrations.

## Repos studied -- what we took and what we skipped

| Repo | What we took | What we skipped | Why |
|------|-------------|-----------------|-----|
| **steipete/sweet-cookie** | Cookie extraction concept (SQLite + keyring) | Nothing | Not on npm. Wrote our own auth.js -- simpler, tailored, vanilla JS |
| **steipete/sweetlink** | CDP-direct concept | Daemon, WebSocket bridge, in-page runtime, HMAC auth | CDP direct is 100 lines vs ~2,000 |
| **steipete/canvas** | Stealth/anti-detection patterns | Go implementation | Noted for stealth.js |
| **mcprune (own)** | Full pruning pipeline port | Playwright dependency, MCP proxy | prune.js is 472 lines, adapted from Playwright YAML to CDP tree |
| **openclaw (own)** | Cautionary tale | Everything | Multi-API direct integration = bloat |

## Key technical insights

- **ARIA tree > DOM** for agent consumption. Semantic, compact, interactive elements are first-class. Token reduction of 47-95% is real.
- **Cookie consent is solvable** with ARIA tree scanning + a button text corpus in 7 languages. Dialog role detection + global fallback covers >95% of sites.
- **Headed mode is the ultimate fallback.** When stealth fails, when cookies expire, when CAPTCHAs appear -- connecting to the user's real browser session handles it.
- **CDP flattened sessions** are the way to go. One WebSocket, multiple targets. The `sessionId` field on each message routes commands to the right tab.
- **`Page.addScriptToEvaluateOnNewDocument`** runs before any page scripts -- perfect for stealth patches without race conditions.
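
The consent insight above (dialog role detection plus a global fallback) can be sketched as a two-pass search. The node shape (`role`, `name`, `children`), the function names, and the six phrases are assumptions for illustration; they are not the corpus shipped in `src/consent.js`.

```javascript
// Illustrative accept-button phrases in a few languages (assumed corpus).
const ACCEPT_PATTERNS = [
  /accept all/i, /alle akzeptieren/i, /tout accepter/i,
  /aceptar todo/i, /accetta tutto/i, /alles accepteren/i,
];

// Depth-first walk over a { role, name, children } tree.
function* walk(node) {
  yield node;
  for (const child of node.children ?? []) yield* walk(child);
}

function isAcceptButton(node) {
  return node.role === 'button' && ACCEPT_PATTERNS.some((re) => re.test(node.name ?? ''));
}

// Pass 1: look inside dialog-like nodes. Pass 2: fall back to the whole
// tree, for banners whose button sits outside any dialog.
function findConsentButton(root) {
  for (const node of walk(root)) {
    if (node.role === 'dialog' || node.role === 'alertdialog') {
      for (const inner of walk(node)) {
        if (isAcceptButton(inner)) return inner;
      }
    }
  }
  for (const node of walk(root)) {
    if (isAcceptButton(node)) return node;
  }
  return null;
}
```

Checking dialogs first makes it less likely that an unrelated page button with similar text is clicked when a real consent dialog is present.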

---

*Add new insights as they emerge. These should be durable lessons, not session notes.*
@@ -0,0 +1,123 @@
|
|
|
1
|
+
# Validation Log
|
|
2
|
+
|
|
3
|
+
What's been tested against the real world. Updated when new sites or features are validated.
|
|
4
|
+
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
## Test suite (64 tests, 6 files)
|
|
8
|
+
|
|
9
|
+
| File | Tests | Type | What it covers |
|
|
10
|
+
|------|-------|------|----------------|
|
|
11
|
+
| `test/unit/prune.test.js` | 16 | Unit | 9-step pruning pipeline in isolation |
|
|
12
|
+
| `test/unit/auth.test.js` | 7 | Unit | Cookie extraction from Firefox/Chromium |
|
|
13
|
+
| `test/unit/cdp.test.js` | 5 | Unit | Browser discovery, launch, CDP client, sessions |
|
|
14
|
+
| `test/integration/browse.test.js` | 11 | Integration | Full `browse()` and `connect()` pipeline |
|
|
15
|
+
| `test/integration/cli.test.js` | 10 | Integration | CLI session lifecycle: open/snapshot/goto/click/eval/console/network/close |
|
|
16
|
+
| `test/integration/interact.test.js` | 15 | E2E | Real interactions on data: fixtures + live sites |
|
|
17
|
+
|
|
18
|
+
Run all: `node --test test/unit/*.test.js test/integration/*.test.js`
|
|
19
|
+
|
|
## Site validation matrix

Tested across 16+ sites, 8 countries, 7 languages.

| Site | Consent | Cookies | Interactions | Notes |
|------|---------|---------|-------------|-------|
| google.com | NL dialog dismissed | Firefox injection | Search (combobox + Enter) | Bot-blocks headless |
| youtube.com | Bypassed via cookies | Firefox injection | Search + video playback | Full e2e demo, SPA nav |
| bbc.com | SourcePoint dismissed | -- | -- | Button outside dialog |
| wikipedia.org | -- | -- | Link click + navigation | Clean, no consent |
| github.com | -- | -- | SPA navigation | Needs settle time |
| duckduckgo.com | -- | -- | Search + results | Headless-friendly |
| news.ycombinator.com | -- | -- | Story link click | Clean, simple DOM |
| amazon.de | Banner dismissed | -- | -- | |
| theguardian.com | CMP dismissed | -- | -- | |
| spiegel.de | CMP dismissed | -- | -- | German |
| lemonde.fr | CMP dismissed | -- | -- | French |
| elpais.com | CMP dismissed | -- | -- | Spanish |
| corriere.it | CMP dismissed | -- | -- | Italian |
| nos.nl | CMP dismissed | -- | -- | Dutch |
| bild.de | CMP dismissed | -- | -- | German |
| nu.nl | CMP dismissed | -- | -- | Dutch |
| booking.com | Banner dismissed | -- | -- | |
| nytimes.com | -- | -- | -- | No consent wall |
| stackoverflow.com | Footer link only | -- | -- | Not blocking |
| cnn.com | -- | -- | -- | No consent wall |
| reddit.com | -- | -- | Fallback to old.reddit | Bot-blocks headless |

## Token reduction measurements

| Page | Raw ARIA | Pruned | Reduction |
|------|----------|--------|-----------|
| example.com | 377 chars | 45 chars | 88% |
| Hacker News | 51,726 chars | 27,197 chars | 47% |
| Wikipedia (article) | 109,479 chars | 40,566 chars | 63% |
| DuckDuckGo | 42,254 chars | 5,407 chars | 87% |

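The Reduction column follows directly from the two character counts. A minimal sketch of that arithmetic, assuming the percentage is rounded to the nearest integer; the helper name `reductionPercent` is illustrative and not part of barebrowse:

```javascript
// Percentage of characters removed by pruning, rounded to the nearest integer.
// Hypothetical helper for illustration only.
function reductionPercent(rawChars, prunedChars) {
  return Math.round((1 - prunedChars / rawChars) * 100);
}

// Character counts copied from the measurements table.
const measurements = [
  ['example.com', 377, 45],
  ['Hacker News', 51726, 27197],
  ['Wikipedia (article)', 109479, 40566],
  ['DuckDuckGo', 42254, 5407],
];

for (const [page, raw, pruned] of measurements) {
  console.log(`${page}: ${reductionPercent(raw, pruned)}%`);
}
```

Under that rounding assumption, the four rows reproduce the 88%, 47%, 63% and 87% figures reported in the table.
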
---

## CLI manual validation (v0.3.0)

Full end-to-end validation of every CLI command against real websites.

### Session lifecycle

| Command | Result |
|---------|--------|
| `barebrowse open https://example.com` | Session started, pid+port printed, session.json created |
| `barebrowse status` | Shows running pid, port, start time |
| `barebrowse close` | "Session closed", session.json removed, daemon exited |
| `status` after close | "No session found", exit code 1 |
| `click 5` with no session | "No active session. Run `barebrowse open` first.", exit 1 |
| double `open` | "Session already running. Use `barebrowse close` first.", exit 1 |

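The lifecycle rows above imply a check along the lines of "is there a session file, and is its daemon still alive". The following is only a sketch of how such a check could look, not the actual `cli.js` logic: the function names, the JSON shape (`pid`, `port`) and the handling of a stale file are all assumptions made for illustration.

```javascript
// Returns true if a process with this pid exists. Signal 0 performs the
// existence/permission check without delivering a signal.
function isProcessAlive(pid) {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM means the process exists but belongs to another user.
    return err.code === 'EPERM';
  }
}

// sessionJson: contents of session.json as a string, or null if the file is absent.
// Hypothetical helper; the real session file format may differ.
function checkSession(sessionJson) {
  if (sessionJson === null) {
    return { exitCode: 1, message: 'No session found' };
  }
  const { pid, port } = JSON.parse(sessionJson);
  if (!isProcessAlive(pid)) {
    // Assumption: a stale file is reported the same way as a missing one.
    return { exitCode: 1, message: 'No session found' };
  }
  return { exitCode: 0, message: `running pid=${pid} port=${port}` };
}

console.log(checkSession(null));
console.log(checkSession(JSON.stringify({ pid: process.pid, port: 9222 })));
```

The port `9222` is a placeholder value. The current process's own pid is used in the second call only so that the sketch runs without a daemon.
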
### Navigation + snapshots (example.com, HN)

| Command | Result |
|---------|--------|
| `snapshot` (example.com) | `.barebrowse/page-*.yml` created, clean formatting |
| `snapshot --mode=read` | Read mode includes paragraphs, each node on own line |
| `goto https://news.ycombinator.com` | "ok" |
| `snapshot` (HN) | Clean ARIA tree with refs, proper newline separation |
| `screenshot` | Valid 780x493 PNG file |

### Interactions (DuckDuckGo search)

| Command | Result |
|---------|--------|
| `type 12 barebrowse npm` | "ok", multi-word text correctly joined |
| `press Enter` | "ok", search submitted |
| `wait-idle` | "ok", waited for network settle |
| `eval "document.title"` | `"barebrowse npm at DuckDuckGo"` |
| `snapshot` | Search results page, clean formatting with refs |
| `fill 2583 hello world` | "ok", cleared search box + typed new text |
| `hover 2402` | "ok" |
| `scroll 300` | "ok" |

### Debugging commands

| Command | Result |
|---------|--------|
| `eval "1 + 1"` | `2` |
| `eval "document.location.href"` | `"https://news.ycombinator.com/news"` |
| `eval "console.log('test'); console.error('err')"` | `ok` (undefined return) |
| `console-logs` | `.json (2 entries)`: log + error captured with types and timestamps |
| `network-log` | `.json (15 entries)`: all requests with URL, method, status |
| `network-log --failed` | `.json (1 entries)`: filtered to failed/4xx+ only |

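The `--failed` row describes a filter down to "failed/4xx+" requests. A minimal sketch of such a filter, assuming each log entry is an object with `url`, `method` and `status` fields plus a boolean `failed` flag for requests that never received a response; the entry shape and the function name are illustrative, not taken from the daemon source:

```javascript
// Keep entries that failed outright or returned a 4xx/5xx status.
// Entry shape is an assumption made for this sketch.
function failedOnly(entries) {
  return entries.filter((entry) => entry.failed === true || entry.status >= 400);
}

const sampleLog = [
  { url: 'https://example.com/', method: 'GET', status: 200 },
  { url: 'https://example.com/missing.png', method: 'GET', status: 404 },
  { url: 'https://example.com/blocked.js', method: 'GET', failed: true },
];

console.log(failedOnly(sampleLog).map((entry) => entry.url));
```

An entry without a `status` (a request that never completed) is only kept when its `failed` flag is set, because `undefined >= 400` evaluates to false.
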
### Legacy + install commands

| Command | Result |
|---------|--------|
| `browse https://example.com` | One-shot snapshot to stdout |
| `install` | "No MCP clients detected" + Claude Code hint |
| `install --skill` | SKILL.md copied to `~/.config/claude/skills/barebrowse/` |
| (no args) | Clean help output with all commands |

### Bug found and fixed during validation

**`src/aria.js` line 23**: ignored nodes joined children with `''` instead of `'\n'`, causing sibling subtrees to concatenate on one line (e.g. `[ref=15]- _promote`). Fixed to `.filter(Boolean).join('\n')`. All 64 tests pass with the fix.

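The effect of that one-line fix can be reproduced in isolation. This sketch uses made-up rendered child strings rather than the real `src/aria.js` data structures; only the `join('')` versus `.filter(Boolean).join('\n')` contrast comes from the bug description:

```javascript
// Hypothetical rendered subtrees of an ignored node; the empty string stands
// for a child that rendered to nothing.
const renderedChildren = ['- link "past" [ref=15]', '', '- link "comments" [ref=16]'];

// Before the fix: siblings run together on a single line.
const buggy = renderedChildren.join('');

// After the fix: empty results are dropped and each sibling gets its own line.
const fixed = renderedChildren.filter(Boolean).join('\n');

console.log(buggy);
console.log(fixed);
```

`filter(Boolean)` matters as well as the separator: without it, the empty child would leave a blank line in the snapshot.
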
---

*Add new validation entries when testing against new sites or features.*
@@ -0,0 +1,31 @@
# Definition of Done

A feature or change is "done" when ALL of these are true.

## Code

- [ ] Works end-to-end (not just the happy path)
- [ ] No heavy dependencies added (vanilla -> stdlib -> external hierarchy respected)
- [ ] Under reasonable line count -- no bloat
- [ ] Clean process management -- no orphan browser processes
- [ ] No security vulnerabilities introduced (command injection, XSS, etc.)

## Tests

- [ ] Existing tests still pass: `node --test test/unit/*.test.js test/integration/*.test.js`
- [ ] New behavior has test coverage (integration preferred over unit)
- [ ] Bug fixes include a regression test that fails before the fix

## Documentation

- [ ] `docs/00-context/system-state.md` updated if architecture changed
- [ ] `docs/03-logs/decisions-log.md` updated if a design decision was made
- [ ] `barebrowse.context.md` updated if public API changed
- [ ] `CHANGELOG.md` updated with what changed

## Not required (avoid over-engineering)

- 100% code coverage
- TypeScript types
- Cross-platform testing (Linux first, others later)
- Performance benchmarks (unless performance is the feature)
@@ -0,0 +1,68 @@
# Development Workflow

## Dev rules

**POC first.** Always validate logic with a ~15min proof-of-concept before building. Cover happy path + common edges. POC works -> design properly -> build with tests. Never ship the POC.

**Build incrementally.** Break work into small independent modules. One piece at a time, each must work on its own before integrating.

**Dependency hierarchy -- follow strictly:**
1. Vanilla language -- write it yourself if <50 lines and not security-critical
2. Standard library -- `node:test`, `node:fs`, `node:crypto`, `node:sqlite`
3. External -- only when stdlib can't do it in <100 lines. Must be maintained, lightweight, widely adopted

**Exception:** Always use vetted libraries for security-critical code (crypto, auth, sanitization).

**Lightweight over complex.** Fewer moving parts, fewer deps, less config. Simple > clever. Readable > elegant.

**Open-source only.** No vendor lock-in. Every line of code must have a purpose -- no speculative code, no premature abstractions.

## Language and runtime

- Vanilla JavaScript, ES modules, no build step
- Node.js >= 22 (built-in WebSocket, built-in SQLite)
- No TypeScript -- can add types later if needed

## Running tests

```bash
# All 64 tests
node --test test/unit/*.test.js test/integration/*.test.js

# Unit only (fast, no network)
node --test test/unit/prune.test.js
node --test test/unit/auth.test.js
node --test test/unit/cdp.test.js

# Integration (needs Chromium + network)
node --test test/integration/browse.test.js
node --test test/integration/interact.test.js

# Quick smoke test (--input-type=module makes the inline import and top-level await valid on every Node 22 release)
node --input-type=module -e "import { browse } from './src/index.js'; console.log(await browse('https://example.com'))"
```

## Testing standards

- **Test behavior, not implementation.** Call the public API, assert on observable output.
- **Integration tests are the sweet spot.** Real components working together.
- **No test framework deps.** `node:test` and `node:assert/strict` only.
- **Always `page.close()` in a `finally` block** to avoid leaked browser processes.
- **Use `data:` URL fixtures** for deterministic tests (no network dependency).
- **Real-site tests** go in `interact.test.js`, grouped by site.

See `docs/04-process/testing.md` for the full test guide.

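Two of the standards above, `data:` URL fixtures and closing the page in a `finally` block, can be sketched without a browser. The helpers `fixtureUrl`, `withPage` and `openFakePage` are illustrative names that do not exist in barebrowse; in a real test the opener would be whatever returns a barebrowse page, and the only thing assumed about that page here is an async `close()` method:

```javascript
// Build a deterministic data: URL fixture from an HTML string (no network needed).
function fixtureUrl(html) {
  return 'data:text/html;charset=utf-8,' + encodeURIComponent(html);
}

// Run `run(page)` and always close the page, even if `run` throws.
async function withPage(openPage, run) {
  const page = await openPage();
  try {
    return await run(page);
  } finally {
    await page.close();
  }
}

// Stand-in for a real browser page so this sketch runs without Chromium.
function openFakePage() {
  const page = {
    closed: false,
    async close() {
      page.closed = true;
    },
  };
  return page;
}

const url = fixtureUrl('<button>Submit</button>');
console.log(url);

withPage(openFakePage, async (page) => {
  console.log('page open, closed =', page.closed);
});
```

Wrapping the `try`/`finally` in one helper keeps each test body focused on assertions, while a failed assertion still cannot leave a browser process behind.
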
## Git workflow

- Main branch: `main`
- Commit messages: conventional (`fix:`, `feat:`, `chore:`, `docs:`, `release:`)
- No force pushes to main

## Environment

- OS: Fedora Linux, KDE Plasma, Wayland
- Node: 22.22.0
- Browser: `/usr/bin/chromium-browser`
- Default browser: Firefox (cookies extracted from `~/.mozilla/firefox/*.default-release/cookies.sqlite`)
- KWallet has Chromium Safe Storage key