@intuned/browser-dev 0.1.8-dev.0 → 0.1.10-dev.0

Files changed (40)
  1. package/README.md +85 -143
  2. package/dist/ai/export.d.ts +291 -143
  3. package/dist/ai/extractStructuredData.js +21 -27
  4. package/dist/ai/extractStructuredDataUsingAi.js +24 -1
  5. package/dist/ai/index.d.ts +291 -143
  6. package/dist/ai/tests/testCreateMatchesMapping.spec.js +216 -0
  7. package/dist/ai/tests/testExtractStructuredData.spec.js +348 -2
  8. package/dist/ai/tests/testExtractStructuredDataDomMatchingIframes.spec.js +459 -0
  9. package/dist/ai/tests/testExtractStructuredDataUnit.spec.js +375 -0
  10. package/dist/ai/tests/testMatching.spec.js +342 -0
  11. package/dist/ai/tests/testValidateMatchesMapping.spec.js +265 -0
  12. package/dist/common/Logger/index.js +2 -2
  13. package/dist/common/extendedTest.js +38 -30
  14. package/dist/common/frame_utils/frameTree.js +116 -0
  15. package/dist/common/frame_utils/getContentWithNestedIframes.js +13 -0
  16. package/dist/common/frame_utils/index.js +95 -0
  17. package/dist/common/frame_utils/stitchIframe.js +105 -0
  18. package/dist/{helpers → common}/frame_utils/tests/testFindAllIframes.spec.js +24 -15
  19. package/dist/common/frame_utils/tests/testGetContentWithNestedIframes.spec.js +241 -0
  20. package/dist/common/frame_utils/utils.js +91 -0
  21. package/dist/common/getSimplifiedHtml.js +20 -20
  22. package/dist/common/matching/matching.js +91 -16
  23. package/dist/common/tests/matching.test.js +225 -0
  24. package/dist/common/tests/testGetSimplifiedHtml.spec.js +324 -0
  25. package/dist/helpers/export.d.ts +702 -575
  26. package/dist/helpers/extractMarkdown.js +16 -7
  27. package/dist/helpers/index.d.ts +702 -575
  28. package/dist/helpers/tests/testExtractMarkdown.spec.js +29 -0
  29. package/dist/helpers/waitForDomSettled.js +4 -4
  30. package/dist/helpers/withNetworkSettledWait.js +2 -7
  31. package/dist/optimized-extractors/export.d.ts +17 -18
  32. package/dist/optimized-extractors/index.d.ts +17 -18
  33. package/dist/types/intuned-runtime.d.ts +6 -32
  34. package/how-to-generate-docs.md +40 -28
  35. package/package.json +2 -2
  36. package/dist/helpers/frame_utils/constants.js +0 -8
  37. package/dist/helpers/frame_utils/findAllIframes.js +0 -82
  38. package/dist/helpers/frame_utils/index.js +0 -44
  39. /package/dist/{helpers → common}/frame_utils/checkFrameAllowsAsyncScripts.js +0 -0
  40. /package/dist/{helpers → common}/frame_utils/getContainerFrame.js +0 -0
package/README.md CHANGED
@@ -1,159 +1,101 @@
- # Intuned Browser SDK (TypeScript)
+ ---
+ title: "TypeScript SDK"
+ sidebarTitle: "@intuned/browser"
+ icon: cube
+ ---
 
- Intuned's TypeScript/JavaScript SDK for browser automation and web data extraction, designed to work seamlessly with the Intuned platform.
+ Browser automation helpers for TypeScript/JavaScript, built on [Playwright](https://playwright.dev/). This package provides utilities for common automation tasks—AI-powered data extraction, navigation with retries, pagination handling, and more.
 
  ## Installation
 
- ### Using Yarn (Recommended)
-
- ```bash
- yarn add @intuned/browser
- ```
-
- ### Using npm
-
  ```bash
  npm install @intuned/browser
  ```
 
- ## Features
-
- The Intuned Browser SDK provides a comprehensive set of tools for browser automation and data extraction:
-
- ### 🤖 AI-Powered Extraction
-
- - **Structured Data Extraction** - Extract structured data from web pages using AI with `extractStructuredData()`
- - **Smart Page Loading Detection** - Determine when pages have fully loaded with `isPageLoaded()`
- - **Schema Validation** - Validate extracted data against JSON schemas
-
- ### 🌐 Web Automation Helpers
-
- - **Navigation** - Advanced URL navigation with `goToUrl()`
- - **Content Loading** - Scroll to load dynamic content with `scrollToLoadContent()`
- - **Network Monitoring** - Wait for network activity with `withNetworkSettledWait()`
- - **DOM Monitoring** - Wait for DOM changes with `waitForDomSettled()`
- - **Click Automation** - Click elements until exhausted with `clickUntilExhausted()`
-
- ### 📄 Content Processing
-
- - **HTML Sanitization** - Clean and sanitize HTML with `sanitizeHtml()`
- - **Markdown Extraction** - Convert HTML to markdown with `extractMarkdown()`
- - **URL Resolution** - Resolve relative URLs with `resolveUrl()`
- - **Date Processing** - Parse and process dates with `processDate()`
+ <Note>
+ When using [Intuned](https://intuned.io), this package is pre-installed in every TypeScript project.
+ </Note>
 
- ### 📁 File Operations
-
- - **File Downloads** - Download files with `downloadFile()`
- - **S3 Integration** - Upload and save files to S3 with `uploadFileToS3()` and `saveFileToS3()`
-
- ### ✅ Data Validation
-
- - **Schema Validation** - Validate data structures with `validateDataUsingSchema()`
- - **Empty Value Filtering** - Filter empty values with `filterEmptyValues()`
-
- ### ⚡ Optimized Extractors
-
- - **High-Performance Extractors** - Pre-built optimized extractors for common use cases
- - Available via `@intuned/browser/optimized-extractors`
-
- ## Quick Start
+ ## Quick example
 
  ```typescript
- import {
-   extractMarkdown,
-   sanitizeHtml,
-   goToUrl,
-   withNetworkSettledWait,
- } from "@intuned/browser";
-
- // Example: Extract and process web content
- async function extractContent(page: Page) {
-   // Navigate to URL
-   await goToUrl(page, "https://example.com");
-
-   // Wait for network to settle
-   await withNetworkSettledWait(page, async () => {
-     // Your actions here
+ import { Page, BrowserContext } from "playwright";
+ import { extractStructuredData, isPageLoaded } from "@intuned/browser/ai";
+ import { goToUrl } from "@intuned/browser";
+
+ interface Params {}
+
+ export default async function automation(
+   params: Params,
+   page: Page,
+   context: BrowserContext
+ ) {
+   await goToUrl(page, "https://books.toscrape.com");
+
+   const loaded = await isPageLoaded({ source: page });
+   if (!loaded) {
+     throw new Error("Page is not loaded, cannot extract data");
+   }
+
+   // Extract all book listings from the page
+   const books = await extractStructuredData({
+     source: page,
+     dataSchema: {
+       type: "object",
+       properties: {
+         products: {
+           type: "array",
+           items: {
+             type: "object",
+             properties: {
+               title: { type: "string" },
+               price: { type: "string" },
+             },
+           },
+         },
+       },
+     },
+     prompt: "Extract all book listings with their titles and prices",
+     strategy: "HTML",
+     model: "claude-haiku-4-5-20251001",
    });
 
-   // Get and sanitize HTML
-   const html = await page.content();
-   const cleanHtml = sanitizeHtml(html);
-
-   // Extract markdown
-   const markdown = extractMarkdown(cleanHtml);
-
-   return markdown;
+   return books;
  }
  ```
 
- ## AI-Powered Data Extraction
-
- ```typescript
- import { extractStructuredData } from "@intuned/browser/ai";
- import type { JsonSchema } from "@intuned/browser/ai";
-
- // Define your data schema
- const schema: JsonSchema = {
-   type: "object",
-   properties: {
-     title: { type: "string" },
-     price: { type: "number" },
-     description: { type: "string" },
-   },
-   required: ["title", "price"],
- };
-
- // Extract structured data using AI
- async function extractProductData(page: Page) {
-   const result = await extractStructuredData({
-     page,
-     schema,
-     prompt: "Extract product information from this page",
-   });
-   return result;
- }
- ```
-
- ## Module Exports
-
- The SDK provides multiple import paths for different features:
-
- ```typescript
- // Main helpers
- import { goToUrl, sanitizeHtml /* ... */ } from "@intuned/browser";
-
- // AI functions
- import { extractStructuredData, isPageLoaded } from "@intuned/browser/ai";
-
- // Optimized extractors
- import /* extractors */ "@intuned/browser/optimized-extractors";
- ```
-
- ## Documentation
-
- For detailed documentation on all functions and types, see the [documentation](https://docs.intunedhq.com/docs-old/getting-started/introduction).
-
- ## Building from Source
-
- ```bash
- # Install dependencies
- yarn install
-
- # Build the project
- yarn build
-
- # Run tests
- yarn test
-
- # Run tests with UI
- yarn test:dev
- ```
-
- ## Support
-
- For support, questions, or contributions, please contact the Intuned team at engineering@intunedhq.com.
-
- ## About Intuned
-
- Intuned provides powerful tools for browser automation, web scraping, and data extraction. Visit [intunedhq.com](https://intunedhq.com) to learn more.
+ ## AI module
+
+ AI-powered utilities for data extraction and page analysis. These functions use AI and incur costs.
+
+ | Function | Description |
+ | --- | --- |
+ | [`extractStructuredData`](./ai/functions/extractStructuredData) | Extract structured data from pages using AI with JSON Schema or Zod validation |
+ | [`isPageLoaded`](./ai/functions/isPageLoaded) | Detect when a page has finished loading |
+
+ <Tip>AI functions support caching and matching to reduce costs.</Tip>
+
+ ## Helpers module
+
+ | Function | Description |
+ | --- | --- |
+ | [`goToUrl`](./helpers/functions/goToUrl) | Navigate with automatic retries and error handling |
+ | [`withNetworkSettledWait`](./helpers/functions/withNetworkSettledWait) | Wait for network requests to complete |
+ | [`waitForDomSettled`](./helpers/functions/waitForDomSettled) | Wait for DOM mutations to finish |
+ | [`scrollToLoadContent`](./helpers/functions/scrollToLoadContent) | Load infinite-scroll content |
+ | [`clickUntilExhausted`](./helpers/functions/clickUntilExhausted) | Click "Load More" buttons until all content loads |
+ | [`extractMarkdown`](./helpers/functions/extractMarkdown) | Convert pages to markdown |
+ | [`downloadFile`](./helpers/functions/downloadFile) | Download files with different triggers |
+ | [`saveFileToS3`](./helpers/functions/saveFileToS3) | Download and upload files to S3 |
+ | [`uploadFileToS3`](./helpers/functions/uploadFileToS3) | Upload files with custom S3 configurations |
+ | [`filterEmptyValues`](./helpers/functions/filterEmptyValues) | Remove empty values from data |
+ | [`validateDataUsingSchema`](./helpers/functions/validateDataUsingSchema) | Validate data against schemas |
+ | [`processDate`](./helpers/functions/processDate) | Parse and normalize dates |
+ | [`sanitizeHtml`](./helpers/functions/sanitizeHtml) | Clean and sanitize HTML |
+ | [`resolveUrl`](./helpers/functions/resolveUrl) | Resolve relative URLs to absolute paths |
+
+ ## Requirements
+
+ - Node.js 18+
+ - Playwright (`npm install playwright && npx playwright install`)
+ - For AI functions: API key for your AI provider (set via environment variable or function parameter)
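
Note on the README change: the removed example called `extractStructuredData` with `{ page, schema, prompt }`, while the added example passes `{ source, dataSchema, prompt, strategy, model }`. The sketch below shows one possible way to adapt old-style call sites, assuming the two README examples reflect the actual option shapes (the diff of the type definitions is not shown here, so this is unconfirmed). The names `LegacyExtractOptions`, `ExtractOptions`, and `toNewExtractOptions` are hypothetical and not part of the package.

```typescript
// Hypothetical shapes inferred only from the two README examples in this diff;
// the package's real type definitions may differ.
type JsonSchemaLike = Record<string, unknown>;

interface LegacyExtractOptions<P> {
  page: P;
  schema: JsonSchemaLike;
  prompt?: string;
}

interface ExtractOptions<P> {
  source: P;
  dataSchema: JsonSchemaLike;
  prompt?: string;
  strategy?: string;
  model?: string;
}

// Rename the old option keys to the new ones; values are passed through unchanged.
function toNewExtractOptions<P>(
  legacy: LegacyExtractOptions<P>,
  extra: { strategy?: string; model?: string } = {}
): ExtractOptions<P> {
  return {
    source: legacy.page,
    dataSchema: legacy.schema,
    // Only carry the prompt over when the old call site actually set one.
    ...(legacy.prompt !== undefined ? { prompt: legacy.prompt } : {}),
    ...extra,
  };
}
```

The result would then be passed to `extractStructuredData` in place of the old options object, with `strategy` and `model` supplied explicitly, for example the `"HTML"` and `"claude-haiku-4-5-20251001"` values used in the new README example.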