barebrowse 0.2.0 → 0.2.1

package/docs/prd.md DELETED
@@ -1,284 +0,0 @@
1
- # barebrowse — Product Requirements Document
2
-
3
- **Version:** 1.0
4
- **Date:** 2026-02-22
5
- **Status:** POC
6
-
7
- ---
8
-
9
- ## What barebrowse is
10
-
11
- A standalone vanilla JavaScript library that gives autonomous agents authenticated access to the web through the user's own Chromium browser. One package, one import, three modes.
12
-
13
- ```js
14
- import { browse } from 'barebrowse';
15
- const snapshot = await browse('https://any-page.com');
16
- ```
17
-
18
- barebrowse handles: finding the browser, connecting via CDP, injecting cookies, navigating, extracting the ARIA accessibility tree, and pruning it down to what an agent actually needs. The output is a clean, token-efficient snapshot of any web page — authenticated as the real user.
19
-
20
- ## What barebrowse is NOT
21
-
22
- - **Not a framework.** No plugin system, no config files, no lifecycle hooks.
23
- - **Not an MCP server.** But trivially wrappable as one (~30 lines).
24
- - **Not Playwright.** No bundled browser, no cross-engine abstraction, no 200MB download.
25
- - **Not an agent.** No LLM, no planning, no orchestration — that's bareagent's job.
26
- - **Not a scraper.** It browses as the user, not as a bot harvesting data.
27
-
28
- ---
29
-
30
- ## The Problem
31
-
32
- Every AI agent that needs to read or interact with the web hits the same walls:
33
-
34
- 1. **Cloudflare / bot detection** — headless browsers get blocked
35
- 2. **Authentication** — sites require login, OAuth, session cookies
36
- 3. **Token bloat** — raw DOM is 100K+ tokens; agents need ~5K
37
- 4. **Two consumers, same need** — research agents (read pages) and personal assistants (click/type) both need an authenticated browser, but existing tools force you to choose one path
38
-
39
- Existing solutions (Playwright MCP, sweetlink, open-operator, browser-use) are either too heavy, too opinionated, or solve only half the problem.
40
-
41
- ## The Insight
42
-
43
- The user already has a browser. It's already logged in. It already passes Cloudflare. Instead of fighting the web with headless stealth tricks, **use what's already there**.
44
-
45
- CDP (Chrome DevTools Protocol) lets us connect to any Chromium-based browser — the same one the user browses with daily. We get their cookies, their sessions, their anti-detection posture, for free.
46
-
47
- ---
48
-
49
- ## Core Architecture
50
-
51
- ### CDP-Direct (Why No Playwright)
52
-
53
- **Decision:** Use CDP over WebSocket directly. No Playwright dependency.
54
-
55
- **Why:**
56
- - Playwright downloads a bundled Chromium (~200MB). barebrowse uses the browser already installed on the user's machine.
57
- - Playwright abstracts CDP, but we need CDP directly for all three modes (headless, headed, hybrid) against the user's real browser.
58
- - When driving Chromium, Playwright ultimately issues CDP commands itself. The abstraction adds weight without adding capability for our use case.
59
- - CDP gives us everything: `Accessibility.getFullAXTree`, `Page.navigate`, `Runtime.evaluate`, `Input.dispatch*Event`, `Network.setCookie`, `Page.captureScreenshot`.
60
- - The CDP WebSocket client is ~100 lines of vanilla JS. Playwright is ~50,000.
61
-
62
- **What we lose:** Cross-engine support (Firefox, WebKit). CDP only works with Chromium-family browsers (Chrome, Chromium, Edge, Brave, Vivaldi, Arc, Opera). This covers ~80% of desktop browsers. Firefox support could come later via WebDriver BiDi.
63
-
64
- **What we gain:** Zero heavy deps, uses the user's real browser, same code path for headless/headed/hybrid, drastically simpler codebase.
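
The small size claimed for the CDP client is plausible because the protocol framing is simple: commands are JSON objects carrying an `id`, replies echo that `id`, and events arrive without one. The following is a minimal sketch of that correlation logic, not barebrowse's actual client. `createCdpSession`, `pendingCount`, and the `sendRaw` transport callback (in practice, a WebSocket's `send`) are hypothetical names chosen for illustration.

```javascript
// Sketch of CDP request/response correlation.
// `sendRaw` is a hypothetical transport callback that takes a JSON string.
function createCdpSession(sendRaw) {
  let nextId = 1;
  const pending = new Map();   // command id -> { resolve, reject }
  const listeners = new Map(); // event name -> array of handlers

  return {
    // Send a command such as 'Page.navigate' and resolve with its result.
    send(method, params = {}) {
      const id = nextId++;
      return new Promise((resolve, reject) => {
        pending.set(id, { resolve, reject });
        sendRaw(JSON.stringify({ id, method, params }));
      });
    },

    // Subscribe to an event such as 'Page.loadEventFired'.
    on(event, handler) {
      if (!listeners.has(event)) listeners.set(event, []);
      listeners.get(event).push(handler);
    },

    // Feed every incoming message from the transport into this function.
    handleMessage(raw) {
      const msg = JSON.parse(raw);
      if (msg.id !== undefined) {
        // A reply: settle the promise created by the matching send().
        const entry = pending.get(msg.id);
        if (!entry) return;
        pending.delete(msg.id);
        if (msg.error) entry.reject(new Error(msg.error.message));
        else entry.resolve(msg.result);
      } else if (msg.method) {
        // An event: notify every subscriber for that event name.
        for (const handler of listeners.get(msg.method) || []) handler(msg.params);
      }
    },

    // Number of commands still waiting for a reply (useful for debugging).
    pendingCount: () => pending.size,
  };
}
```

With a real WebSocket, `sendRaw` would be `(raw) => ws.send(raw)` and each incoming message would be passed to `handleMessage`.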
65
-
66
- ### ARIA-First (Why Not DOM)
67
-
68
- **Decision:** Use `Accessibility.getFullAXTree` (ARIA/accessibility tree) as the primary page representation, not DOM.
69
-
70
- **Why:**
71
- - The accessibility tree is the semantic structure of the page — roles, names, states, interactive elements. It's what screen readers see. It's also what agents need.
72
- - DOM is bloated: wrapper divs, styling, tracking pixels, ad scripts. An agent doesn't need any of that.
73
- - mcprune already proved this: ARIA snapshots pruned by role achieve 75-95% token reduction on typical pages while preserving all actionable information.
74
- - CDP's `Accessibility.getFullAXTree` returns the accessibility nodes directly, as a flat list linked by parent and child IDs. No parsing HTML, no building a DOM tree, no traversing DOM nodes.
75
- - ARIA refs map directly to CDP interaction targets — the agent reads a button in the tree and can click it via the same CDP connection.
76
-
77
- **The pipeline:** CDP connect → authenticate → navigate → ARIA tree → prune → agent gets clean snapshot.
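
The "ARIA tree" step of the pipeline can be illustrated with a short sketch. `Accessibility.getFullAXTree` responds with a flat `nodes` array in which each node lists its `childIds`, so a nested tree has to be assembled before pruning. The input field names below (`nodeId`, `childIds`, `ignored`, `role.value`, `name.value`, `backendDOMNodeId`) follow the CDP Accessibility domain. The function name `buildAxTree` and the output shape are assumptions for illustration, not barebrowse's implementation.

```javascript
// Sketch: assemble the flat AXNode list into a nested tree.
// The output shape { role, name, backendDOMNodeId, children } is hypothetical.
function buildAxTree(nodes) {
  const byId = new Map(nodes.map((n) => [n.nodeId, n]));
  const childIdSet = new Set(nodes.flatMap((n) => n.childIds || []));

  function convert(node) {
    const children = (node.childIds || [])
      .map((id) => byId.get(id))
      .filter(Boolean)
      .flatMap((child) => convert(child));

    // Ignored nodes carry no semantics, so hoist their children instead.
    if (node.ignored) return children;

    return [{
      role: node.role?.value ?? 'unknown',
      name: node.name?.value ?? '',
      backendDOMNodeId: node.backendDOMNodeId,
      children,
    }];
  }

  // Roots are the nodes that no other node lists as a child.
  return nodes
    .filter((n) => !childIdSet.has(n.nodeId))
    .flatMap((root) => convert(root));
}
```

Keeping `backendDOMNodeId` on each output node is what lets a later click target the same element over the same CDP connection.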
78
-
79
- ### Three Modes (Why All Three)
80
-
81
- **Decision:** Headless, headed, and hybrid — not as separate packages or optional features, but as a single flag on the same API.
82
-
83
- **Why they're not bloat:** The CDP conversation is identical regardless of mode. The only difference is how you get a browser process with a debug port. It's one code path with a different entry point:
84
-
85
- ```
86
- headless: spawn chromium --headless=new --remote-debugging-port=N
87
- headed: connect to user's already-running browser on debug port
88
- hybrid: try headless → detect failure → fall back to headed
89
- ```
90
-
91
- After connection, every CDP command is the same. Three modes = ~20 extra lines in `chromium.js`, not three implementations.
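
To make the "one code path, different entry point" argument concrete, here is a minimal sketch of how a mode flag could resolve to an ordered list of connection attempts. `resolveLaunchPlan` and its return shape are hypothetical and are not taken from `chromium.js`. The two Chromium flags are the ones named in the block above, and port 9222 is only an assumed default.

```javascript
// Sketch: each mode differs only in how a debug port is obtained.
// `resolveLaunchPlan` and the plan objects are illustrative assumptions.
function resolveLaunchPlan(mode = 'headless', port = 9222) {
  // Spawn a fresh headless browser with a debug port.
  const headless = {
    action: 'spawn',
    args: ['--headless=new', `--remote-debugging-port=${port}`],
  };
  // Attach to a browser the user already started with a debug port.
  const headed = { action: 'attach', port };

  switch (mode) {
    case 'headless': return [headless];
    case 'headed': return [headed];
    // Hybrid is an ordered list: try the first entry, fall back to the next.
    case 'hybrid': return [headless, headed];
    default: throw new Error(`Unknown mode: ${mode}`);
  }
}
```

Everything after the plan is resolved (navigation, snapshots, input) would then run against whichever attempt succeeded, which is why the modes add little code.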
92
-
93
- **When to use each:**
94
-
95
- | Mode | Use case | Example |
96
- |---|---|---|
97
- | `headless` | Agent research, background tasks, CI | "Read this article and summarize it" |
98
- | `headed` | Personal assistant, interactive tasks, auth flows | "Book me a flight on this page" |
99
- | `hybrid` | Default for autonomous agents | Try headless; if CF-blocked, fall back to headed |
100
-
101
- **Headless is the default.** Most agent tasks are "go read this page." Headed is the escape hatch for when headless fails or the task requires user-visible interaction.
102
-
103
- ### Cookie Authentication
104
-
105
- **Decision:** Extract cookies from the user's browser profile and inject via CDP `Network.setCookie`.
106
-
107
- **Why:**
108
- - The user's browser has active sessions for every site they use. We reuse those sessions instead of building new auth flows.
109
- - sweet-cookie (npm package) already extracts cookies from Chrome/Firefox/Safari SQLite databases with OS keychain decryption. We use it or vendor the relevant parts.
110
- - For headed mode, cookies are already present in the browser — no extraction needed.
111
- - For headless mode, we extract from the user's profile and inject into the headless instance.
112
-
113
- **Limitation:** Cookies expire. This works for existing sessions, not new logins. For sites requiring fresh auth, headed mode with user interaction is the fallback.
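
The injection step and the expiry limitation can be sketched together. The code below maps an extracted cookie record to parameters accepted by CDP's `Network.setCookie` and filters out cookies that have already expired. The input record shape (including the `expiresAt` Date field) and both function names are assumptions for illustration. The output keys (`name`, `value`, `domain`, `path`, `secure`, `httpOnly`, `sameSite`, `expires`) are real `Network.setCookie` parameters.

```javascript
// Sketch: convert an extracted cookie record into Network.setCookie params.
// The input shape is hypothetical; a real extractor may use other field names.
function toSetCookieParams(cookie) {
  const params = {
    name: cookie.name,
    value: cookie.value,
    domain: cookie.domain,
    path: cookie.path || '/',
    secure: Boolean(cookie.secure),
    httpOnly: Boolean(cookie.httpOnly),
  };
  if (cookie.sameSite) params.sameSite = cookie.sameSite; // 'Strict' | 'Lax' | 'None'
  // CDP expects expiry in seconds since the Unix epoch; omit it for session cookies.
  if (cookie.expiresAt instanceof Date) {
    params.expires = Math.floor(cookie.expiresAt.getTime() / 1000);
  }
  return params;
}

// Keep session cookies and cookies whose expiry is still in the future.
function liveCookies(cookies, now = new Date()) {
  return cookies.filter((c) => !(c.expiresAt instanceof Date) || c.expiresAt > now);
}
```

Each resulting object could be passed as the params of a `Network.setCookie` command before navigation.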
114
-
115
- ### Pruning (Absorbed from mcprune)
116
-
117
- **Decision:** Port mcprune's role-based ARIA tree pruning into barebrowse as a built-in step, not an optional module.
118
-
119
- **Why:**
120
- - Pruning is not optional for agent consumption. A raw ARIA tree is still too large for most LLM context windows. Pruning is part of the pipeline, not an afterthought.
121
- - mcprune's pruning logic is a pure function: takes an ARIA tree, returns a smaller ARIA tree. No browser dependency, no Playwright coupling. It's ~300 lines of role-based tree surgery.
122
- - By absorbing it, barebrowse becomes a complete "URL in, agent-ready snapshot out" solution. No second package needed.
123
-
124
- **What we port from mcprune:**
125
- - Role taxonomy (landmarks, interactive, structural, noise)
126
- - Landmark extraction (main, nav, banner, etc.)
127
- - Noise removal (ads, tracking, legal boilerplate)
128
- - Interactive element preservation (buttons, links, inputs)
129
- - Wrapper collapsing (nested generics, empty groups)
130
- - Context-aware filtering (search relevance, dedup)
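
As a rough illustration of what "role-based tree surgery" can look like, the sketch below applies three of the listed ideas (noise removal, interactive element preservation, wrapper collapsing) to a nested tree of `{ role, name, children }` nodes. The role sets are illustrative guesses and the function is not mcprune's actual logic. Context-aware filtering and landmark extraction are left out.

```javascript
// Sketch of role-based pruning. The role sets are assumptions for illustration.
const NOISE_ROLES = new Set(['separator', 'contentinfo']);
const WRAPPER_ROLES = new Set(['generic', 'group', 'none', 'presentation']);
const INTERACTIVE_ROLES = new Set([
  'button', 'link', 'textbox', 'searchbox', 'combobox', 'checkbox',
]);

// Returns an array: empty when the node is dropped, the node's pruned
// children when it is collapsed, or a single pruned copy of the node.
function prune(node) {
  // Noise removal: drop the node together with its whole subtree.
  if (NOISE_ROLES.has(node.role)) return [];

  const children = (node.children || []).flatMap((child) => prune(child));

  // Interactive elements are always kept, even when unnamed.
  if (INTERACTIVE_ROLES.has(node.role)) return [{ ...node, children }];

  // Wrapper collapsing: unnamed wrappers are replaced by their children.
  if (WRAPPER_ROLES.has(node.role) && !node.name) return children;

  // Unnamed leaves that survived this far carry no information.
  if (!node.name && children.length === 0) return [];

  return [{ ...node, children }];
}
```

Because `prune` is a pure function from tree to tree, it can be tested without a browser, which matches the rationale given for absorbing the logic.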
131
-
132
- **What stays in mcprune:** The Playwright MCP proxy architecture. mcprune can continue to exist as a Playwright-based MCP server for users who want that path. But for barebrowse consumers, pruning is built in.
133
-
134
- ---
135
-
136
- ## API Design
137
-
138
- ### Public API
139
-
140
- ```js
141
- import { browse, connect } from 'barebrowse';
142
-
143
- // One-shot: URL in, pruned ARIA snapshot out
144
- const tree = await browse('https://example.com');
145
-
146
- // With options
147
- const configuredTree = await browse('https://example.com', {
148
-   mode: 'hybrid', // 'headless' (default) | 'headed' | 'hybrid'
149
-   cookies: true, // inject user's cookies (default: true)
150
-   prune: true, // apply ARIA pruning (default: true)
151
-   browser: 'chrome', // which browser profile for cookies
152
-   timeout: 30000, // navigation timeout ms
153
- });
154
-
155
- // Long-lived session for interaction
156
- const page = await connect({ mode: 'headed' });
157
- await page.goto('https://amazon.com/cart');
158
- await page.click('[data-action="checkout"]');
159
- await page.type('#gift-message', 'Happy birthday!');
160
- const cartTree = await page.snapshot(); // ARIA + prune
161
- await page.close();
162
- ```
163
-
164
- ### Design Principles
165
-
166
- 1. **One package, one import.** No picking pieces. `browse()` does everything. Power users get `connect()` for long-lived sessions.
167
- 2. **Batteries included.** Cookies, ARIA, pruning — all happen inside by default. Disable with flags if you want raw access.
168
- 3. **Escape hatches.** `connect()` returns an object with the raw CDP connection accessible. If you need something we don't wrap, you can send CDP commands directly.
169
- 4. **Progressive complexity.** `browse(url)` for 90% of use cases. Options object for the rest. `connect()` for interactive sessions.
170
-
171
- ---
172
-
173
- ## The bare- Ecosystem
174
-
175
- ```
176
- bareagent = the brain (orchestration, planning, memory, retries, tool loop)
177
- barebrowse = the eyes + hands (browse, read, interact with the web)
178
- ```
179
-
180
- **Integration with bareagent:**
181
-
182
- ```js
183
- import { Loop } from 'bare-agent';
184
- import { browse } from 'barebrowse';
185
-
186
- const tools = [
187
- { name: 'browse', execute: ({ url }) => browse(url) },
188
- ];
189
-
190
- const loop = new Loop({ provider });
191
- await loop.run([{ role: 'user', content: 'Find the cheapest flight to Tokyo' }], tools);
192
- ```
193
-
194
- bareagent handles the think/act/observe loop. barebrowse handles "see the web and act on it." Neither is opinionated about the other. Tools are plain functions.
195
-
196
- **Integration with multis:**
197
-
198
- multis (personal assistant) uses barebrowse in headed mode for interactive tasks. The multis proxy is already running, providing a desktop session. barebrowse connects to the user's Chrome and drives it on behalf of the assistant.
199
-
200
- **MCP server wrapper (future):**
201
-
202
- barebrowse is not an MCP server, but wrapping it as one is ~30 lines. This would replace Playwright MCP + mcprune proxy with a single, lighter MCP server.
203
-
204
- ---
205
-
206
- ## Decisions Log — Why We Chose Each
207
-
208
- This section exists so we don't re-debate settled decisions.
209
-
210
- | Decision | Choice | Why | Alternative considered | Why not |
211
- |---|---|---|---|---|
212
- | Browser protocol | CDP direct | Uses user's browser, ~100 lines, all 3 modes | Playwright | 200MB download, bundles its own Chromium, abstracts what we need raw |
213
- | Page representation | ARIA tree | Semantic, token-efficient, what agents need | DOM/HTML | Bloated, noisy, needs heavy parsing |
214
- | Pruning | Built-in | Agents always need pruned output | Optional/separate | Two deps for one job, pruning isn't optional |
215
- | Cookie auth | Own auth.js + CDP inject | User's existing sessions (Firefox or Chromium), cross-browser injection into headless Chromium | OAuth/credential storage | Complex, security liability, reinventing what the browser already solved |
216
- | Three modes | One flag | Same CDP code, ~20 lines difference | Separate packages | Same code, artificial separation |
217
- | Chromium only | CDP constraint | ~80% browser share, user's real browser | Cross-browser (Playwright) | Requires Playwright, loses "use your own browser" benefit |
218
- | Anti-detection | Runtime.evaluate patches | Minimal stealth for headless mode | Full stealth framework | Over-engineering; headless + real cookies handles 90% |
219
- | Daemon/server | None | CDP is direct, no intermediary needed | sweetlink daemon pattern | Unnecessary complexity for local agent→browser |
220
- | Framework | None (vanilla JS) | Matches bare- philosophy, zero deps | Express/Fastify wrapper | Not a server, not needed |
221
- | Language | Vanilla JavaScript | Node.js ecosystem, same as bareagent, CDP libs available | TypeScript | Added build step, not needed for POC; can add types later |
222
- | Naming | chromium.js | Covers all Chromium-family browsers, not just Chrome | chrome.js | Too specific; Brave/Edge/Arc are also targets |
223
- | mcprune integration | Absorb pruning logic | One package does it all, mcprune pruning is a pure function | Keep separate | Agents shouldn't need two packages to browse |
224
- | openclaw lesson | Single bridge protocol | One CDP connection vs many API integrations | Direct multi-API | openclaw proved this fails — bloat, maintenance, fragility |
225
-
226
- ---
227
-
228
- ## Future Features (Post-POC)
229
-
230
- ### Near-term
231
- - **Screenshot capture** — `Page.captureScreenshot` via CDP. Useful for visual verification and multimodal agents.
232
- - **Network interception** — `Network.requestWillBeSent` / `Network.responseReceived` for monitoring page loads. Detect redirects, blocked resources, API calls.
233
- - **Wait strategies** — `waitForNavigation()` is already implemented (it waits for `Page.loadEventFired`). Still needed: network-idle detection and element-presence polling.
234
- - **Tab management** — Multiple pages in one browser session. CDP `Target.createTarget` / `Target.attachToTarget`.
235
- - **MCP server wrapper** — Expose browse/click/type as MCP tools. Replaces Playwright MCP + mcprune combo.
236
-
237
- ### Medium-term
238
- - **Firefox support** — Via WebDriver BiDi protocol (cross-browser standard, still maturing). Second protocol adapter alongside CDP.
239
- - **Cookie sync** — In hybrid mode, extract fresh cookies from headed session and cache for future headless use. Self-refreshing auth.
240
- - **Selector discovery** — Port sweetlink's `discoverSelectors` — crawl ARIA tree, score interactive elements, return ranked action targets.
241
- - **Form understanding** — Detect forms in ARIA tree, map fields to semantic purposes, enable agents to fill forms intelligently.
242
- - **Proxy/Tor support** — Route headless browser through proxy for geo-restricted content.
243
-
244
- ### Long-term
245
- - **Profile management** — Multiple browser profiles for different identities/accounts.
246
- - **Session recording/replay** — Record browsing sessions as CDP commands, replay for testing.
247
- - **Visual grounding** — Combine ARIA tree with screenshot regions for multimodal agents.
248
- - **Agent memory integration** — Remember visited pages, cache snapshots, track which sites need headed mode.
249
-
250
- ---
251
-
252
- ## Repos Studied — What We Borrowed and Why
253
-
254
- | Repo | What we took | What we skipped |
255
- |---|---|---|
256
- | **steipete/sweet-cookie** | Cookie extraction from browser profiles, OS keychain decryption | Nothing — clean, focused library |
257
- | **steipete/sweetlink** | CDP dual-channel concept, selector discovery scoring, click/command patterns | Daemon architecture, WebSocket bridge, in-page runtime injection, HMAC auth |
258
- | **steipete/canvas** | Stealth/anti-detection config patterns | Go implementation (we're JS) |
259
- | **nichochar/open-operator** | AI agent web automation patterns | Full framework, too opinionated |
260
- | **AntlerClaw/playwright-mcp** | How to expose browser as MCP tools | Playwright dependency |
261
- | **AntlerClaw/mcp-browser-use** | MCP-native browser patterns | Heavy deps |
262
- | **AitchKay/chromancer** | Accessibility tree extraction approach | Different stack |
263
- | **mcprune (own)** | ARIA pruning logic — role taxonomy, landmark extraction, noise removal, wrapper collapsing | Playwright dependency, MCP proxy architecture |
264
- | **openclaw (own)** | Lesson learned: multi-API direct integration = bloat. Use a single bridge protocol | Everything — the architecture was the cautionary tale |
265
-
266
- ### The openclaw lesson
267
-
268
- openclaw tried to integrate 10+ messaging APIs directly — each with its own auth, format, quirks. It became a maintenance nightmare. multis solved the same problem by using Beeper/Matrix as a single bridge.
269
-
270
- barebrowse applies the same lesson: instead of integrating Playwright + Puppeteer + WebDriver + stealth plugins + cookie libraries + proxy managers, we use **one protocol (CDP) to one browser (the user's)**. Everything else is unnecessary.
271
-
272
- ---
273
-
274
- ## Success Criteria
275
-
276
- barebrowse succeeds when:
277
-
278
- 1. `browse(url)` returns a pruned ARIA snapshot of any page, authenticated as the user
279
- 2. Zero heavy dependencies — no Playwright, no Puppeteer, no bundled browser
280
- 3. Works with any installed Chromium-based browser
281
- 4. Headless for research, headed for interaction, hybrid for autonomous agents
282
- 5. Plugs into bareagent as plain tool functions
283
- 6. Total source under 1,000 lines for core functionality
284
- 7. An agent using barebrowse + bareagent can autonomously research the web and act on pages
@@ -1,157 +0,0 @@
1
- /**
2
- * barebrowse headed-mode demo
3
- *
4
- * This script demonstrates interactive browsing with a VISIBLE browser window.
5
- * You watch the browser while barebrowse navigates, clicks, and types.
6
- *
7
- * SETUP — run this command first in a separate terminal:
8
- *
9
- * chromium-browser --remote-debugging-port=9222
10
- *
11
- * Then run this script:
12
- *
13
- * node examples/headed-demo.js
14
- *
15
- * The script connects to the already-running browser via CDP on port 9222.
16
- * You will see each action happen in real time.
17
- */
18
-
19
- import { connect } from '../src/index.js';
20
-
21
- // --- Helpers ---
22
-
23
- /** Small delay so you can watch the browser between steps. */
24
- function wait(ms = 1500) {
25
- return new Promise((r) => setTimeout(r, ms));
26
- }
27
-
28
- /** Find a ref by matching role and name in the snapshot text. */
29
- function findRoleRef(snapshot, role, name) {
30
- const escaped = name.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
31
- const re = new RegExp(`${role} "${escaped}".*?\\[ref=([^\\]]+)\\]`);
32
- const m = snapshot.match(re);
33
- return m ? m[1] : null;
34
- }
35
-
36
- /** Find a ref by partial name match (case-insensitive). */
37
- function findRoleRefPartial(snapshot, role, nameFragment) {
38
- const escaped = nameFragment.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
39
- const re = new RegExp(`${role} "[^"]*${escaped}[^"]*".*?\\[ref=([^\\]]+)\\]`, 'i');
40
- const m = snapshot.match(re);
41
- return m ? m[1] : null;
42
- }
43
-
44
- /** Print a snapshot truncated to a character limit. */
45
- function printSnapshot(snapshot, limit = 500) {
46
- const truncated = snapshot.length > limit
47
- ? snapshot.slice(0, limit) + `\n... (${snapshot.length - limit} more chars)`
48
- : snapshot;
49
- console.log(truncated);
50
- }
51
-
52
- // --- Demo ---
53
-
54
- async function main() {
55
- console.log('=== barebrowse headed-mode demo ===\n');
56
- console.log('Connecting to Chromium on port 9222...');
57
- console.log('(Make sure you ran: chromium-browser --remote-debugging-port=9222)\n');
58
-
59
- const page = await connect({ mode: 'headed', port: 9222 });
60
-
61
- try {
62
- // Step 1: Navigate to Wikipedia
63
- console.log('[Step 1] Navigating to Wikipedia "JavaScript" article...');
64
- await page.goto('https://en.wikipedia.org/wiki/JavaScript');
65
- await wait();
66
-
67
- // Step 2: Take a snapshot
68
- console.log('[Step 2] Taking ARIA snapshot of the page...\n');
69
- let snap = await page.snapshot();
70
- printSnapshot(snap);
71
- console.log();
72
-
73
- // Step 3: Find and click a link
74
- console.log('[Step 3] Looking for a link to click...');
75
- // Try to find the "ECMAScript" link — a common one in the JS article
76
- let linkRef = findRoleRefPartial(snap, 'link', 'ECMAScript');
77
- if (!linkRef) {
78
- // Fallback: try a link whose name contains "programming"
79
- linkRef = findRoleRefPartial(snap, 'link', 'programming');
80
- }
81
- if (linkRef) {
82
- console.log(` Found link ref=${linkRef}, clicking it...`);
83
- const navPromise = page.waitForNavigation();
84
- await page.click(linkRef);
85
-
86
- // Step 4: Wait for navigation
87
- console.log('[Step 4] Waiting for navigation to complete...');
88
- await navPromise;
89
- await wait();
90
- } else {
91
- console.log(' No matching link found, skipping click step.');
92
- }
93
-
94
- // Step 5: New snapshot after navigation
95
- console.log('[Step 5] Taking snapshot of the new page...\n');
96
- snap = await page.snapshot();
97
- printSnapshot(snap);
98
- console.log();
99
-
100
- // Step 6: Navigate to DuckDuckGo
101
- console.log('[Step 6] Navigating to DuckDuckGo...');
102
- await page.goto('https://duckduckgo.com');
103
- await wait();
104
-
105
- // Step 7: Find search box and type a query
106
- console.log('[Step 7] Taking snapshot to find the search box...');
107
- snap = await page.snapshot();
108
- let searchRef = findRoleRefPartial(snap, 'textbox', 'search')
109
- || findRoleRefPartial(snap, 'searchbox', 'search')
110
- || findRoleRefPartial(snap, 'combobox', 'search');
111
-
112
- if (searchRef) {
113
- console.log(` Found search box ref=${searchRef}, typing query...`);
114
- await page.click(searchRef);
115
- await wait(500);
116
- await page.type(searchRef, 'barebrowse CDP browser automation');
117
- await wait();
118
- } else {
119
- console.log(' Could not find search box. Snapshot preview:');
120
- printSnapshot(snap, 300);
121
- console.log(' Skipping search steps.');
122
- return;
123
- }
124
-
125
- // Step 8: Press Enter
126
- console.log('[Step 8] Pressing Enter to search...');
127
- const resultsNav = page.waitForNavigation();
128
- await page.press('Enter');
129
-
130
- // Step 9: Wait for results
131
- console.log('[Step 9] Waiting for search results...');
132
- await resultsNav;
133
- await wait(2000);
134
-
135
- // Step 10: Snapshot the results
136
- console.log('[Step 10] Taking snapshot of search results...\n');
137
- snap = await page.snapshot();
138
- printSnapshot(snap, 800);
139
- console.log();
140
-
141
- console.log('=== Demo complete ===');
142
- } finally {
143
- console.log('Closing session...');
144
- await page.close();
145
- }
146
- }
147
-
148
- main().catch((err) => {
149
- if (err.message?.includes('ECONNREFUSED') || err.message?.includes('connect')) {
150
- console.error('\nError: Could not connect to Chromium on port 9222.');
151
- console.error('Make sure you have Chromium running with remote debugging:');
152
- console.error('\n chromium-browser --remote-debugging-port=9222\n');
153
- } else {
154
- console.error('\nError:', err.message || err);
155
- }
156
- process.exit(1);
157
- });
@@ -1,137 +0,0 @@
1
- /**
2
- * YouTube headed-mode demo — search and play "Family Portrait Pink"
3
- *
4
- * SETUP — run this command first in a separate terminal:
5
- *
6
- * chromium-browser --remote-debugging-port=9222 \
7
- * --disable-notifications \
8
- * --autoplay-policy=no-user-gesture-required \
9
- * --use-fake-device-for-media-stream \
10
- * --use-fake-ui-for-media-stream \
11
- * --disable-features=MediaRouter
12
- *
13
- * Then run this script:
14
- *
15
- * node examples/yt-demo.js
16
- *
17
- * Uses Firefox cookies to bypass YouTube consent wall.
18
- */
19
-
20
- import { connect } from '../src/index.js';
21
-
22
- function wait(ms = 2000) {
23
- return new Promise((r) => setTimeout(r, ms));
24
- }
25
-
26
- function findRef(snapshot, role, nameFragment) {
27
- const escaped = nameFragment.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
28
- const re = new RegExp(`${role} "[^"]*${escaped}[^"]*".*?\\[ref=([^\\]]+)\\]`, 'i');
29
- const m = snapshot.match(re);
30
- return m ? m[1] : null;
31
- }
32
-
33
- function printSnap(snapshot, limit = 600) {
34
- const t = snapshot.length > limit
35
- ? snapshot.slice(0, limit) + `\n... (${snapshot.length - limit} more chars)`
36
- : snapshot;
37
- console.log(t);
38
- }
39
-
40
- async function main() {
41
- console.log('=== YouTube Demo — Family Portrait by Pink ===\n');
42
- console.log('Connecting to Chromium on port 9222...\n');
43
-
44
- const page = await connect({ mode: 'headed', port: 9222 });
45
-
46
- try {
47
- // Step 1: Inject Firefox cookies for youtube.com (bypasses consent wall)
48
- console.log('[1] Injecting Firefox cookies for youtube.com...');
49
- await page.injectCookies('https://www.youtube.com', { browser: 'firefox' });
50
- await wait(500);
51
-
52
- // Step 2: Navigate to YouTube
53
- console.log('[2] Navigating to YouTube...');
54
- await page.goto('https://www.youtube.com');
55
- await wait(2000);
56
-
57
- // Step 3: Find the search box
58
- console.log('[3] Taking snapshot to find search box...');
59
- let snap = await page.snapshot();
60
-
61
- let searchRef = findRef(snap, 'combobox', 'Search')
62
- || findRef(snap, 'textbox', 'Search')
63
- || findRef(snap, 'searchbox', 'Search');
64
-
65
- if (!searchRef) {
66
- console.log(' Could not find search box. Snapshot:');
67
- printSnap(snap, 1000);
68
- return;
69
- }
70
-
71
- // Step 4: Type the search query
72
- console.log(`[4] Found search box ref=${searchRef}, typing query...`);
73
- await page.click(searchRef);
74
- await wait(500);
75
- await page.type(searchRef, 'Family Portrait Pink', { clear: true });
76
- await wait(1000);
77
-
78
- // Step 5: Press Enter to search
79
- // YouTube is an SPA — loadEventFired won't fire, so just wait for results to render
80
- console.log('[5] Pressing Enter to search...');
81
- await page.press('Enter');
82
- await wait(3000);
83
-
84
- // Step 6: Find the video in results
85
- console.log('[6] Looking for Family Portrait in results...');
86
- snap = await page.snapshot();
87
-
88
- // findRef matches case-insensitively, so one lookup covers any capitalization
- let videoRef = findRef(snap, 'link', 'Family Portrait');
90
-
91
- if (!videoRef) {
92
- console.log(' Could not find video link. Trying broader match...');
93
- printSnap(snap, 1500);
94
- // Try any link with "Pink" in it
95
- videoRef = findRef(snap, 'link', 'Pink');
96
- }
97
-
98
- if (!videoRef) {
99
- console.log(' No matching video found.');
100
- return;
101
- }
102
-
103
- // Step 7: Click the video (SPA nav — no loadEventFired)
104
- console.log(`[7] Found video ref=${videoRef}, clicking to play...`);
105
- await page.click(videoRef);
106
- await wait(4000);
107
-
108
- // Step 8: Snapshot the video page
109
- console.log('[8] Video page snapshot:\n');
110
- snap = await page.snapshot();
111
- printSnap(snap, 800);
112
-
113
- console.log('\n=== Video should be playing! ===');
114
- console.log('Press Ctrl+C to exit when done watching.\n');
115
-
116
- // Keep alive so user can watch
117
- await new Promise(() => {});
118
- } finally {
119
- await page.close();
120
- }
121
- }
122
-
123
- main().catch((err) => {
124
- if (err.message?.includes('ECONNREFUSED')) {
125
- console.error('\nError: Could not connect to Chromium on port 9222.');
126
- console.error('Start Chromium first:\n');
127
- console.error(' chromium-browser --remote-debugging-port=9222 \\');
128
- console.error(' --disable-notifications \\');
129
- console.error(' --autoplay-policy=no-user-gesture-required \\');
130
- console.error(' --use-fake-device-for-media-stream \\');
131
- console.error(' --use-fake-ui-for-media-stream \\');
132
- console.error(' --disable-features=MediaRouter\n');
133
- } else {
134
- console.error('\nError:', err.message || err);
135
- }
136
- process.exit(1);
137
- });