@olib-ai/owl-browser-sdk 1.0.0

package/README.md ADDED
@@ -0,0 +1,582 @@
1
+ # Owl Browser SDK
2
+
3
+ AI-first browser automation SDK for Node.js/TypeScript. Built on top of the Owl Browser - a custom Chromium-based browser with built-in AI intelligence.
4
+
5
+ ## Features
6
+
7
+ - 🧠 **Natural Language Selectors**: Use "search button" instead of complex CSS selectors
8
+ - 🤖 **On-Device LLM**: Built-in Qwen3-1.7B for intelligent page understanding
9
+ - 🌍 **Context Awareness**: Auto-detect location, time, and weather for smart automation
10
+ - ⚡ **Lightning Fast**: Native C++ browser with <1s cold start
11
+ - 🛡️ **Maximum Stealth**: No WebDriver detection, built-in ad blocking
12
+ - 📸 **Full HD Screenshots**: 1920x1080 high-quality captures
13
+ - 🎥 **Video Recording**: Record browser sessions as video
14
+ - 🏠 **Custom Homepage**: Beautiful branded homepage with real-time demographics
15
+ - 🌐 **Dual Mode**: Connect to local browser binary OR remote HTTP server
16
+
17
+ ## Installation
18
+
19
+ ```bash
20
+ npm install owl-browser-sdk
21
+ ```
22
+
23
+ **Prerequisites:**
24
+ - Node.js 18+
25
+ - **Local mode**: The Owl Browser binary must be built (see main README)
26
+ - **HTTP mode**: A running Owl Browser HTTP server
27
+
28
+ ## Connection Modes
29
+
30
+ The SDK supports two connection modes:
31
+
32
+ ### Local Mode (Default)
33
+
34
+ Uses the local browser binary via IPC (stdin/stdout). Best for development and single-machine deployments.
35
+
36
+ ```typescript
37
+ import { Browser } from 'owl-browser-sdk';
38
+
39
+ // Local mode (default)
40
+ const browser = new Browser();
41
+ await browser.launch();
42
+ ```
43
+
44
+ ### HTTP Mode
45
+
46
+ Connects to a remote browser server via REST API. Best for Docker deployments, microservices, and distributed systems.
47
+
48
+ ```typescript
49
+ import { Browser } from 'owl-browser-sdk';
50
+
51
+ // HTTP mode - connect to remote server
52
+ const browser = new Browser({
53
+ mode: 'http',
54
+ http: {
55
+ baseUrl: 'http://localhost:8080',
56
+ token: 'your-secret-token',
57
+ timeout: 30000 // optional, default 30s
58
+ }
59
+ });
60
+ await browser.launch();
61
+ ```
62
+
63
+ **Setting up the HTTP server:**
64
+
65
+ ```bash
66
+ # Build the HTTP server
67
+ cd build && cmake .. -DCMAKE_BUILD_TYPE=Release && make owl_http_server
68
+
69
+ # Run the server
70
+ OWL_HTTP_TOKEN=your-secret-token \
71
+ OWL_BROWSER_PATH=/path/to/owl_browser \
72
+ ./build/Release/owl_http_server
73
+ ```
74
+
75
+ See [HTTP Server Documentation](../http-server/README.md) for more details.
76
+
77
+ ## Quick Start
78
+
79
+ ```typescript
80
+ import { Browser } from 'owl-browser-sdk';
81
+
82
+ // Create and launch browser
83
+ const browser = new Browser();
84
+ await browser.launch();
85
+
86
+ // Create a new page
87
+ const page = await browser.newPage();
88
+
89
+ // Navigate to a URL
90
+ await page.goto('https://example.com');
91
+
92
+ // Use natural language selectors!
93
+ await page.click('search button');
94
+ await page.type('email input', 'user@example.com');
95
+ await page.pressKey('Enter');
96
+
97
+ // Take a screenshot
98
+ const screenshot = await page.screenshot();
99
+
100
+ // Close browser
101
+ await browser.close();
102
+ ```
103
+
104
+ ## API Reference
105
+
106
+ ### Browser
107
+
108
+ ```typescript
109
+ const browser = new Browser(config?: BrowserConfig);
110
+ ```
111
+
112
+ **Methods:**
113
+ - `launch()` - Start the browser process
114
+ - `newPage()` - Create a new browser context (page/tab)
115
+ - `pages()` - Get all active pages
116
+ - `getLLMStatus()` - Check on-device LLM status
117
+ - `listTemplates()` - List available extraction templates
118
+ - `getDemographics()` - Get complete demographics (location, time, weather)
119
+ - `getLocation()` - Get geographic location based on IP
120
+ - `getDateTime()` - Get current date and time
121
+ - `getWeather()` - Get current weather for user's location
122
+ - `getHomepage()` - Get custom browser homepage HTML
123
+ - `close()` - Shutdown browser and cleanup
124
+ - `isRunning()` - Check if browser is running
125
+
126
+ ### BrowserContext (Page)
127
+
128
+ All page interactions happen through a BrowserContext instance.
129
+
130
+ #### Navigation
131
+
132
+ ```typescript
133
+ await page.goto('https://example.com');
134
+ await page.reload();
135
+ await page.goBack();
136
+ await page.goForward();
137
+ ```
138
+
139
+ #### Interactions
140
+
141
+ ```typescript
142
+ // Natural language selectors work!
143
+ await page.click('search button');
144
+ await page.type('email input', 'user@example.com');
145
+ await page.pressKey('Enter');
146
+
147
+ // Or use CSS selectors
148
+ await page.click('#submit-btn');
149
+ await page.type('input[name="email"]', 'user@example.com');
150
+
151
+ // Or use position coordinates (x,y)
152
+ await page.click('500x300'); // Click at x=500, y=300
153
+ await page.type('640x400', 'text here'); // Type at position
154
+
155
+ // Highlight elements for debugging
156
+ await page.highlight('login button');
157
+ ```
158
+
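The position selectors above are plain `XxY` strings, so they are easy to build from numbers. The sketch below shows one way to do that. The helper names `positionSelector` and `centerOf` are hypothetical and not part of the SDK, and the restriction to non-negative integers is an assumption rather than documented behaviour.

```typescript
// Hypothetical helper (not part of the SDK): formats viewport coordinates
// as the "XxY" position selector string used in the examples above.
function positionSelector(x: number, y: number): string {
  // Assumption: coordinates are non-negative whole pixels.
  if (!Number.isInteger(x) || !Number.isInteger(y) || x < 0 || y < 0) {
    throw new RangeError(`Expected non-negative integer coordinates, got (${x}, ${y})`);
  }
  return `${x}x${y}`;
}

// Hypothetical helper: position selector for the centre of a rectangle,
// for example a region measured from a screenshot.
function centerOf(rect: { x: number; y: number; width: number; height: number }): string {
  return positionSelector(
    Math.round(rect.x + rect.width / 2),
    Math.round(rect.y + rect.height / 2)
  );
}
```

The result can then be passed wherever a selector is accepted, for example `await page.click(positionSelector(500, 300))`.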
159
+ #### Content Extraction
160
+
161
+ ```typescript
162
+ // Extract text
163
+ const text = await page.extractText('body');
164
+
165
+ // Get HTML (with cleaning)
166
+ const html = await page.getHTML('basic'); // 'minimal' | 'basic' | 'aggressive'
167
+
168
+ // Get Markdown
169
+ const markdown = await page.getMarkdown({
170
+ includeLinks: true,
171
+ includeImages: true,
172
+ maxLength: 10000
173
+ });
174
+
175
+ // Extract structured JSON
176
+ const data = await page.extractJSON('google_search');
177
+ const siteType = await page.detectWebsiteType();
178
+ ```
179
+
180
+ #### AI Features
181
+
182
+ ```typescript
183
+ // Ask questions about the page
184
+ const answer = await page.queryPage('What is the main topic of this page?');
185
+
186
+ // Execute natural language commands
187
+ await page.executeNLA('go to google.com and search for banana');
188
+ ```
189
+
190
+ #### Screenshots & Video
191
+
192
+ ```typescript
193
+ // Screenshot
194
+ const screenshot = await page.screenshot();
195
+ fs.writeFileSync('screenshot.png', screenshot);
196
+
197
+ // Video recording
198
+ await page.startVideoRecording({ fps: 30 });
199
+ // ... perform actions ...
200
+ const videoPath = await page.stopVideoRecording();
201
+ console.log('Video saved to:', videoPath);
202
+ ```
203
+
204
+ #### Scrolling
205
+
206
+ ```typescript
207
+ await page.scrollBy(0, 500);
208
+ await page.scrollTo(0, 0);
209
+ await page.scrollToTop();
210
+ await page.scrollToBottom();
211
+ await page.scrollToElement('footer');
212
+ ```
213
+
214
+ #### Waiting
215
+
216
+ ```typescript
217
+ // Wait for element
218
+ await page.waitForSelector('submit button', { timeout: 5000 });
219
+
220
+ // Wait for time
221
+ await page.wait(1000); // 1 second
222
+ ```
223
+
224
+ #### Test Execution
225
+
226
+ ```typescript
227
+ // Load test from Developer Playground JSON export
228
+ const test = JSON.parse(fs.readFileSync('test.json', 'utf-8'));
229
+ const result = await page.runTest(test, {
230
+ verbose: true,
231
+ continueOnError: false,
232
+ screenshotOnError: true
233
+ });
234
+
235
+ console.log(`Success: ${result.successfulSteps}/${result.totalSteps}`);
236
+ console.log(`Time: ${result.executionTime}ms`);
237
+
238
+ // Or define test inline
239
+ const inlineTest = {
240
+ name: "Login Test",
241
+ steps: [
242
+ { type: "navigate", url: "https://example.com/login" },
243
+ { type: "type", selector: "#email", text: "user@example.com" },
244
+ { type: "click", selector: "500x300" }, // Position-based click
245
+ { type: "screenshot", filename: "result.png" }
246
+ ]
247
+ };
248
+ const inlineResult = await page.runTest(inlineTest);
249
+ ```
250
+
251
+ #### Page Information
252
+
253
+ ```typescript
254
+ const url = await page.getCurrentURL();
255
+ const title = await page.getTitle();
256
+ const info = await page.getPageInfo(); // { url, title, canGoBack, canGoForward }
257
+ ```
258
+
259
+ #### Viewport
260
+
261
+ ```typescript
262
+ await page.setViewport({ width: 1920, height: 1080 });
263
+ const viewport = await page.getViewport();
264
+ ```
265
+
266
+ ## Configuration
267
+
268
+ ### Local Mode Configuration
269
+
270
+ ```typescript
271
+ const browser = new Browser({
272
+ // Connection mode (default: 'local')
273
+ mode: 'local',
274
+
275
+ // Path to browser binary (auto-detected if not provided)
276
+ browserPath: '/path/to/owl_browser',
277
+
278
+ // Enable headless mode (default: true)
279
+ headless: true,
280
+
281
+ // Enable verbose logging (default: false)
282
+ verbose: true,
283
+
284
+ // Initialization timeout in ms (default: 30000)
285
+ initTimeout: 30000
286
+ });
287
+ ```
288
+
289
+ ### HTTP Mode Configuration
290
+
291
+ ```typescript
292
+ const browser = new Browser({
293
+ // Use HTTP mode to connect to remote server
294
+ mode: 'http',
295
+
296
+ // HTTP server connection settings (required for HTTP mode)
297
+ http: {
298
+ // Base URL of the HTTP server
299
+ baseUrl: 'http://localhost:8080',
300
+
301
+ // Bearer token for authentication (must match server's OWL_HTTP_TOKEN)
302
+ token: 'your-secret-token',
303
+
304
+ // Request timeout in ms (default: 30000)
305
+ timeout: 30000
306
+ },
307
+
308
+ // Enable verbose logging (default: false)
309
+ verbose: true
310
+ });
311
+ ```
312
+
313
+ ### Configuration from Environment Variables
314
+
315
+ ```typescript
316
+ // HTTP mode with environment variables
317
+ const browser = new Browser({
318
+ mode: 'http',
319
+ http: {
320
+ baseUrl: process.env.OWL_HTTP_URL || 'http://localhost:8080',
321
+ token: process.env.OWL_HTTP_TOKEN || ''
322
+ }
323
+ });
324
+ ```
325
+
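The snippet above falls back to an empty token when `OWL_HTTP_TOKEN` is unset, which would probably only surface later as an authentication failure. The following is a minimal sketch of a stricter alternative that fails early. `httpConfigFromEnv` and the `HttpModeConfig` interface are hypothetical names introduced for illustration; only the `mode`, `http.baseUrl`, and `http.token` fields and the two environment variable names come from this README.

```typescript
// Hypothetical shape mirroring the HTTP-mode options shown above.
interface HttpModeConfig {
  mode: 'http';
  http: { baseUrl: string; token: string };
}

// Hypothetical helper: builds the HTTP-mode config from an environment map
// and throws immediately if the token is missing or the URL looks wrong.
function httpConfigFromEnv(env: Record<string, string | undefined>): HttpModeConfig {
  const token = env.OWL_HTTP_TOKEN;
  if (!token) {
    throw new Error('OWL_HTTP_TOKEN is not set');
  }
  const baseUrl = env.OWL_HTTP_URL ?? 'http://localhost:8080';
  if (!/^https?:\/\//.test(baseUrl)) {
    throw new Error(`OWL_HTTP_URL must start with http:// or https://, got "${baseUrl}"`);
  }
  return { mode: 'http', http: { baseUrl, token } };
}
```

In an application this would be used as `new Browser(httpConfigFromEnv(process.env))`.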
326
+ ## Examples
327
+
328
+ ### Demographics & Context Awareness
329
+
330
+ ```typescript
331
+ import { Browser } from 'owl-browser-sdk';
+ import fs from 'fs';
332
+
333
+ const browser = new Browser();
334
+ await browser.launch();
335
+
336
+ // Get location based on IP address
337
+ const location = await browser.getLocation();
338
+ console.log(`Location: ${location.city}, ${location.country}`);
339
+ console.log(`Coordinates: ${location.latitude}, ${location.longitude}`);
340
+
341
+ // Get current weather
342
+ const weather = await browser.getWeather();
343
+ console.log(`Weather: ${weather.temperature_c}°C, ${weather.condition}`);
344
+
345
+ // Get date and time
346
+ const datetime = await browser.getDateTime();
347
+ console.log(`Date: ${datetime.date} (${datetime.day_of_week})`);
348
+ console.log(`Time: ${datetime.time} ${datetime.timezone}`);
349
+
350
+ // Get everything at once
351
+ const demographics = await browser.getDemographics();
352
+ console.log('Complete context:', demographics);
353
+
354
+ // Get custom homepage
355
+ const homepage = await browser.getHomepage();
356
+ fs.writeFileSync('homepage.html', homepage);
357
+
358
+ await browser.close();
359
+ ```
360
+
361
+ ### Simple Google Search
362
+
363
+ ```typescript
364
+ import { Browser } from 'owl-browser-sdk';
365
+ import fs from 'fs';
366
+
367
+ async function googleSearch() {
368
+ const browser = new Browser();
369
+ await browser.launch();
370
+
371
+ const page = await browser.newPage();
372
+ await page.goto('https://www.google.com');
373
+
374
+ // Natural language selectors!
375
+ await page.type('search box', 'Owl Browser');
376
+ await page.pressKey('Enter');
377
+
378
+ await page.wait(2000);
379
+ const screenshot = await page.screenshot();
380
+ fs.writeFileSync('search-results.png', screenshot);
381
+
382
+ await browser.close();
383
+ }
384
+
385
+ googleSearch();
386
+ ```
387
+
388
+ ### Web Scraping with AI
389
+
390
+ ```typescript
391
+ import { Browser } from 'owl-browser-sdk';
392
+
393
+ async function scrapeWithAI() {
394
+ const browser = new Browser();
395
+ await browser.launch();
396
+
397
+ const page = await browser.newPage();
398
+ await page.goto('https://news.ycombinator.com');
399
+
400
+ // Ask AI about the page
401
+ const answer = await page.queryPage('What are the top 3 story titles?');
402
+ console.log(answer);
403
+
404
+ // Extract as Markdown
405
+ const markdown = await page.getMarkdown();
406
+ console.log(markdown);
407
+
408
+ await browser.close();
409
+ }
410
+
411
+ scrapeWithAI();
412
+ ```
413
+
414
+ ### Video Recording
415
+
416
+ ```typescript
417
+ import { Browser } from 'owl-browser-sdk';
418
+
419
+ async function recordSession() {
420
+ const browser = new Browser();
421
+ await browser.launch();
422
+
423
+ const page = await browser.newPage();
424
+
425
+ // Start recording
426
+ await page.startVideoRecording({ fps: 30 });
427
+
428
+ await page.goto('https://example.com');
429
+ await page.click('some button');
430
+ await page.scrollToBottom();
431
+
432
+ // Stop and get video path
433
+ const videoPath = await page.stopVideoRecording();
434
+ console.log('Video saved to:', videoPath);
435
+
436
+ await browser.close();
437
+ }
438
+
439
+ recordSession();
440
+ ```
441
+
442
+ ## TypeScript Support
443
+
444
+ This SDK is written in TypeScript and includes full type definitions.
445
+
446
+ ```typescript
447
+ import { Browser, BrowserContext, Viewport } from 'owl-browser-sdk';
448
+
449
+ const browser: Browser = new Browser();
+ await browser.launch();
450
+ const page: BrowserContext = await browser.newPage();
451
+ const viewport: Viewport = await page.getViewport();
452
+ ```
453
+
454
+ ## Why Olib SDK vs Puppeteer/Playwright?
455
+
456
+ | Feature | Olib SDK | Puppeteer/Playwright |
457
+ |---------|----------|----------------------|
458
+ | Natural Language Selectors | ✅ Built-in | ❌ Not supported |
459
+ | On-Device LLM | ✅ Qwen3-1.7B | ❌ Not supported |
460
+ | Cold Start | <1s | 2-5s |
461
+ | WebDriver Detection | ✅ None | ⚠️ Detectable |
462
+ | Built-in Ad Blocking | ✅ 72 domains | ❌ Manual setup |
463
+ | Maintenance Overhead | ✅ Low (AI handles changes) | ⚠️ High (selectors break) |
464
+
465
+ ## Advanced Features
466
+
467
+ ### Semantic Element Matching
468
+
469
+ The browser uses an advanced semantic matcher that understands natural language:
470
+
471
+ ```typescript
472
+ // All of these work!
473
+ await page.click('search button');
474
+ await page.click('search btn');
475
+ await page.click('search');
476
+ await page.type('email input', 'test@example.com');
477
+ await page.type('email field', 'test@example.com');
478
+ await page.type('email box', 'test@example.com');
479
+ ```
480
+
481
+ The matcher uses:
482
+ - Keyword extraction and normalization
483
+ - Role inference (search_input, submit_button, etc.)
484
+ - Multi-source scoring (aria-label, placeholder, text, title)
485
+ - Fuzzy matching with confidence thresholds
486
+
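To make the ideas in the list above concrete, here is a toy illustration of keyword normalization and multi-source scoring with a confidence threshold. It is not the browser's actual matcher: the synonym table, weights, threshold, and every name in the block (`ElementInfo`, `keywords`, `score`, `bestMatch`) are assumptions chosen for the example, and it leaves out role inference and fuzzy string matching entirely.

```typescript
// Toy model of an element's text sources (assumed shape, for illustration).
interface ElementInfo {
  id: string;
  ariaLabel?: string;
  placeholder?: string;
  text?: string;
  title?: string;
}

// Assumed synonym table and per-source weights; the real values are unknown.
const SYNONYMS = new Map<string, string>([['btn', 'button'], ['field', 'input'], ['box', 'input']]);
const WEIGHTS = { ariaLabel: 1.0, placeholder: 0.8, text: 0.6, title: 0.4 };

// Keyword extraction and normalization: lowercase, split, map synonyms.
function keywords(phrase: string): string[] {
  return phrase
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter(Boolean)
    .map((word) => SYNONYMS.get(word) ?? word);
}

// Multi-source scoring: fraction of query keywords found in each source,
// scaled by that source's weight; the best source wins.
function score(query: string, el: ElementInfo): number {
  const wanted = keywords(query);
  if (wanted.length === 0) return 0;
  let best = 0;
  for (const source of Object.keys(WEIGHTS) as Array<keyof typeof WEIGHTS>) {
    const value = el[source];
    if (!value) continue;
    const have = new Set(keywords(value));
    const hits = wanted.filter((word) => have.has(word)).length;
    best = Math.max(best, (WEIGHTS[source] * hits) / wanted.length);
  }
  return best;
}

// Confidence threshold: return the top-scoring element, or undefined
// when nothing scores at least `threshold`.
function bestMatch(query: string, elements: ElementInfo[], threshold = 0.3): ElementInfo | undefined {
  let winner: ElementInfo | undefined;
  let top = 0;
  for (const el of elements) {
    const s = score(query, el);
    if (s >= threshold && s > top) {
      winner = el;
      top = s;
    }
  }
  return winner;
}
```

In this toy model, `'search btn'` and `'search button'` normalize to the same keywords, which is one plausible reason the variations shown earlier can resolve to the same element.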
487
+ ### On-Device LLM
488
+
489
+ Check LLM status:
490
+
491
+ ```typescript
492
+ const status = await browser.getLLMStatus();
493
+ // Returns: 'ready' | 'loading' | 'unavailable'
494
+ ```
495
+
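Since the status can be `'loading'`, code that depends on the LLM may want to wait until it reports `'ready'`. The sketch below polls for that, assuming `getLLMStatus()` resolves to exactly one of the three strings listed above. `waitForLLMReady`, its timeout and interval parameters, and the `LLMStatusSource` interface are hypothetical and not part of the SDK.

```typescript
type LLMStatus = 'ready' | 'loading' | 'unavailable';

// Structural type: anything with a getLLMStatus() method, such as a Browser.
interface LLMStatusSource {
  getLLMStatus(): Promise<LLMStatus>;
}

// Hypothetical helper: polls until the LLM is ready, failing fast when it is
// unavailable and giving up once timeoutMs has elapsed.
async function waitForLLMReady(
  browser: LLMStatusSource,
  timeoutMs = 30000,
  intervalMs = 500
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const status = await browser.getLLMStatus();
    if (status === 'ready') return;
    if (status === 'unavailable') {
      throw new Error('On-device LLM is unavailable');
    }
    if (Date.now() >= deadline) {
      throw new Error(`LLM still loading after ${timeoutMs}ms`);
    }
    // Sleep before the next poll.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

A typical call would be `await waitForLLMReady(browser)` right after `browser.launch()` and before the first `queryPage()`.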
496
+ Query pages with natural language:
497
+
498
+ ```typescript
499
+ const answer = await page.queryPage('What products are shown on this page?');
500
+ const summary = await page.queryPage('Summarize this article in 2 sentences');
501
+ ```
502
+
503
+ ## Error Handling
504
+
505
+ ```typescript
506
+ import { Browser } from 'owl-browser-sdk';
507
+
508
+ async function withErrorHandling() {
509
+ const browser = new Browser();
510
+
511
+ try {
512
+ await browser.launch();
513
+ const page = await browser.newPage();
514
+ await page.goto('https://example.com');
515
+
516
+ // Your automation code...
517
+
518
+ } catch (error) {
519
+ console.error('Automation failed:', error);
520
+ } finally {
521
+ await browser.close();
522
+ }
523
+ }
524
+ ```
525
+
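The same try/finally pattern can be packaged as a reusable wrapper. The sketch below is one possible shape. `withBrowser` and `BrowserLike` are hypothetical names; only `launch()`, `close()`, and `isRunning()` come from the method list in this README. Whether `isRunning()` is synchronous or returns a promise is not stated, so the sketch awaits it, which works in both cases.

```typescript
// Structural type covering the three Browser methods the wrapper needs.
interface BrowserLike {
  launch(): Promise<unknown>;
  close(): Promise<unknown>;
  isRunning(): boolean | Promise<boolean>;
}

// Hypothetical helper: launches the browser, runs the task, and always
// cleans up. close() is skipped when the browser never started, so a
// failed launch is not masked by a second error from close().
async function withBrowser<B extends BrowserLike, T>(
  browser: B,
  task: (browser: B) => Promise<T>
): Promise<T> {
  try {
    await browser.launch();
    return await task(browser);
  } finally {
    if (await browser.isRunning()) {
      await browser.close();
    }
  }
}
```

Errors thrown by the task still propagate to the caller, so `withBrowser(new Browser(), async (b) => { ... })` can be combined with an outer `try`/`catch` for logging.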
526
+ ## Performance Tips
527
+
528
+ 1. **Reuse browser instances** - Creating a new browser is expensive
529
+ 2. **Use multiple pages** - Instead of multiple browsers, create multiple pages
530
+ 3. **Wait strategically** - Use `waitForSelector()` instead of fixed `wait()`
531
+ 4. **Close contexts** - Call `page.close()` when done with a page
532
+
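Tips 1, 2, and 4 can be combined in one pattern: share a single browser, open one page per task, and close each page in a `finally` block. The sketch below assumes that several pages can be used concurrently, which this README does not state explicitly. `titlesFor` and the two interfaces are hypothetical names; `newPage()`, `goto()`, `getTitle()`, and `page.close()` are the methods mentioned in this README.

```typescript
// Structural types for the page and browser methods used below.
interface PageLike {
  goto(url: string): Promise<unknown>;
  getTitle(): Promise<string>;
  close(): Promise<unknown>;
}
interface PageFactory {
  newPage(): Promise<PageLike>;
}

// Hypothetical helper: fetches the title of each URL using one page per URL
// on a single shared browser, closing every page even when a step fails.
async function titlesFor(browser: PageFactory, urls: string[]): Promise<string[]> {
  return Promise.all(
    urls.map(async (url) => {
      const page = await browser.newPage();
      try {
        await page.goto(url);
        return await page.getTitle();
      } finally {
        await page.close();
      }
    })
  );
}
```

If concurrent pages turn out to be a problem, the same body works sequentially by replacing `Promise.all` with a `for...of` loop.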
533
+ ## Troubleshooting
534
+
535
+ ### Browser binary not found
536
+
537
+ Make sure you've built the browser first:
538
+
539
+ ```bash
540
+ cd /path/to/owl-browser
541
+ npm run build
542
+ ```
543
+
544
+ Or provide the path explicitly:
545
+
546
+ ```typescript
547
+ const browser = new Browser({
548
+ browserPath: '/custom/path/to/owl_browser'
549
+ });
550
+ ```
551
+
552
+ ### Timeout errors
553
+
554
+ Increase the timeout:
555
+
556
+ ```typescript
557
+ const browser = new Browser({
558
+ initTimeout: 60000 // 60 seconds
559
+ });
560
+ ```
561
+
562
+ ### Element not found
563
+
564
+ Try different selector variations:
565
+
566
+ ```typescript
567
+ // Alternatives to try one at a time until one matches
568
+ await page.click('search button');
569
+ await page.click('search');
570
+ await page.click('[aria-label="Search"]');
571
+ await page.click('#search-btn');
572
+ ```
573
+
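Trying variations by hand can be automated with a small fallback helper. The sketch below assumes that `page.click()` rejects when no element matches, which this README does not confirm. `clickFirstMatch` and `Clickable` are hypothetical names and not part of the SDK.

```typescript
// Structural type: anything with a click(selector) method, such as a page.
interface Clickable {
  click(selector: string): Promise<unknown>;
}

// Hypothetical helper: tries each selector in order and returns the first
// one that clicks successfully. Assumes click() rejects on "not found".
async function clickFirstMatch(page: Clickable, selectors: string[]): Promise<string> {
  const failures: string[] = [];
  for (const selector of selectors) {
    try {
      await page.click(selector);
      return selector;
    } catch (error) {
      const reason = error instanceof Error ? error.message : String(error);
      failures.push(`${selector}: ${reason}`);
    }
  }
  throw new Error(`No selector matched:\n${failures.join('\n')}`);
}
```

For example, `await clickFirstMatch(page, ['search button', '[aria-label="Search"]', '#search-btn'])` stops at the first selector that works and reports every failure if none do.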
574
+ ## License
575
+
576
+ MIT
577
+
578
+ ## Credits
579
+
580
+ - Built with Chromium Embedded Framework (CEF)
581
+ - LLM powered by llama.cpp + Qwen3-1.7B
582
+ - Designed for AI automation and developer productivity