npm - @intuned/browser-dev - Versions diffs - 0.1.8-dev.0 → 0.1.10-dev.0 - Mend

@intuned/browser-dev 0.1.8-dev.0 → 0.1.10-dev.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (40)

package/README.md +85 -143
package/dist/ai/export.d.ts +291 -143
package/dist/ai/extractStructuredData.js +21 -27
package/dist/ai/extractStructuredDataUsingAi.js +24 -1
package/dist/ai/index.d.ts +291 -143
package/dist/ai/tests/testCreateMatchesMapping.spec.js +216 -0
package/dist/ai/tests/testExtractStructuredData.spec.js +348 -2
package/dist/ai/tests/testExtractStructuredDataDomMatchingIframes.spec.js +459 -0
package/dist/ai/tests/testExtractStructuredDataUnit.spec.js +375 -0
package/dist/ai/tests/testMatching.spec.js +342 -0
package/dist/ai/tests/testValidateMatchesMapping.spec.js +265 -0
package/dist/common/Logger/index.js +2 -2
package/dist/common/extendedTest.js +38 -30
package/dist/common/frame_utils/frameTree.js +116 -0
package/dist/common/frame_utils/getContentWithNestedIframes.js +13 -0
package/dist/common/frame_utils/index.js +95 -0
package/dist/common/frame_utils/stitchIframe.js +105 -0
package/dist/{helpers → common}/frame_utils/tests/testFindAllIframes.spec.js +24 -15
package/dist/common/frame_utils/tests/testGetContentWithNestedIframes.spec.js +241 -0
package/dist/common/frame_utils/utils.js +91 -0
package/dist/common/getSimplifiedHtml.js +20 -20
package/dist/common/matching/matching.js +91 -16
package/dist/common/tests/matching.test.js +225 -0
package/dist/common/tests/testGetSimplifiedHtml.spec.js +324 -0
package/dist/helpers/export.d.ts +702 -575
package/dist/helpers/extractMarkdown.js +16 -7
package/dist/helpers/index.d.ts +702 -575
package/dist/helpers/tests/testExtractMarkdown.spec.js +29 -0
package/dist/helpers/waitForDomSettled.js +4 -4
package/dist/helpers/withNetworkSettledWait.js +2 -7
package/dist/optimized-extractors/export.d.ts +17 -18
package/dist/optimized-extractors/index.d.ts +17 -18
package/dist/types/intuned-runtime.d.ts +6 -32
package/how-to-generate-docs.md +40 -28
package/package.json +2 -2
package/dist/helpers/frame_utils/constants.js +0 -8
package/dist/helpers/frame_utils/findAllIframes.js +0 -82
package/dist/helpers/frame_utils/index.js +0 -44
package/dist/{helpers → common}/frame_utils/checkFrameAllowsAsyncScripts.js +0 -0
package/dist/{helpers → common}/frame_utils/getContainerFrame.js +0 -0

package/README.md CHANGED

@@ -1,159 +1,101 @@
-# Intuned Browser SDK (TypeScript)
+---
+title: "TypeScript SDK"
+sidebarTitle: "@intuned/browser"
+icon: cube
+---
-Intuned's TypeScript/JavaScript SDK for browser automation and web data extraction, designed to work seamlessly with the Intuned platform.
+Browser automation helpers for TypeScript/JavaScript, built on [Playwright](https://playwright.dev/). This package provides utilities for common automation tasks—AI-powered data extraction, navigation with retries, pagination handling, and more.
 ## Installation
-### Using Yarn (Recommended)
-```bash
-yarn add @intuned/browser
-```
-### Using npm
 ```bash
 npm install @intuned/browser
 ```
-## Features
-The Intuned Browser SDK provides a comprehensive set of tools for browser automation and data extraction:
-### 🤖 AI-Powered Extraction
-- **Structured Data Extraction** - Extract structured data from web pages using AI with `extractStructuredData()`
-- **Smart Page Loading Detection** - Determine when pages have fully loaded with `isPageLoaded()`
-- **Schema Validation** - Validate extracted data against JSON schemas
-### 🌐 Web Automation Helpers
-- **Navigation** - Advanced URL navigation with `goToUrl()`
-- **Content Loading** - Scroll to load dynamic content with `scrollToLoadContent()`
-- **Network Monitoring** - Wait for network activity with `withNetworkSettledWait()`
-- **DOM Monitoring** - Wait for DOM changes with `waitForDomSettled()`
-- **Click Automation** - Click elements until exhausted with `clickUntilExhausted()`
-### 📄 Content Processing
-- **HTML Sanitization** - Clean and sanitize HTML with `sanitizeHtml()`
-- **Markdown Extraction** - Convert HTML to markdown with `extractMarkdown()`
-- **URL Resolution** - Resolve relative URLs with `resolveUrl()`
-- **Date Processing** - Parse and process dates with `processDate()`
+<Note>
+When using [Intuned](https://intuned.io), this package is pre-installed in every TypeScript project.
+</Note>
-### 📁 File Operations
-- **File Downloads** - Download files with `downloadFile()`
-- **S3 Integration** - Upload and save files to S3 with `uploadFileToS3()` and `saveFileToS3()`
-### ✅ Data Validation
-- **Schema Validation** - Validate data structures with `validateDataUsingSchema()`
-- **Empty Value Filtering** - Filter empty values with `filterEmptyValues()`
-### ⚡ Optimized Extractors
-- **High-Performance Extractors** - Pre-built optimized extractors for common use cases
-- Available via `@intuned/browser/optimized-extractors`
-## Quick Start
+## Quick example
 ```typescript
-import {
-  extractMarkdown,
-  sanitizeHtml,
-  goToUrl,
-  withNetworkSettledWait,
-} from "@intuned/browser";
-// Example: Extract and process web content
-async function extractContent(page: Page) {
-  // Navigate to URL
-  await goToUrl(page, "https://example.com");
-  // Wait for network to settle
-  await withNetworkSettledWait(page, async () => {
-    // Your actions here
+import { Page, BrowserContext } from "playwright";
+import { extractStructuredData, isPageLoaded } from "@intuned/browser/ai";
+import { goToUrl } from "@intuned/browser";
+interface Params {}
+export default async function automation(
+  params: Params,
+  page: Page,
+  context: BrowserContext
+) {
+  await goToUrl(page, "https://books.toscrape.com");
+  const loaded = await isPageLoaded({ source: page });
+  if (!loaded) {
+    throw new Error("Page is not loaded, cannot extract data");
+  }
+  // Extract all book listings from the page
+  const books = await extractStructuredData({
+    source: page,
+    dataSchema: {
+      type: "object",
+      properties: {
+        products: {
+          type: "array",
+          items: {
+            type: "object",
+            properties: {
+              title: { type: "string" },
+              price: { type: "string" },
+            },
+          },
+        },
+      },
+    },
+    prompt: "Extract all book listings with their titles and prices",
+    strategy: "HTML",
+    model: "claude-haiku-4-5-20251001",
   });
-  // Get and sanitize HTML
-  const html = await page.content();
-  const cleanHtml = sanitizeHtml(html);
-  // Extract markdown
-  const markdown = extractMarkdown(cleanHtml);
-  return markdown;
+  return books;
 }
 ```
-## AI-Powered Data Extraction
-```typescript
-import { extractStructuredData } from "@intuned/browser/ai";
-import type { JsonSchema } from "@intuned/browser/ai";
-// Define your data schema
-const schema: JsonSchema = {
-  type: "object",
-  properties: {
-    title: { type: "string" },
-    price: { type: "number" },
-    description: { type: "string" },
-  },
-  required: ["title", "price"],
-};
-// Extract structured data using AI
-async function extractProductData(page: Page) {
-  const result = await extractStructuredData({
-    page,
-    schema,
-    prompt: "Extract product information from this page",
-  });
-  return result;
-}
-```
-## Module Exports
-The SDK provides multiple import paths for different features:
-```typescript
-// Main helpers
-import { goToUrl, sanitizeHtml /* ... */ } from "@intuned/browser";
-// AI functions
-import { extractStructuredData, isPageLoaded } from "@intuned/browser/ai";
-// Optimized extractors
-import /* extractors */ "@intuned/browser/optimized-extractors";
-```
-## Documentation
-For detailed documentation on all functions and types, see the [documentation](https://docs.intunedhq.com/docs-old/getting-started/introduction).
-## Building from Source
-```bash
-# Install dependencies
-yarn install
-# Build the project
-yarn build
-# Run tests
-yarn test
-# Run tests with UI
-yarn test:dev
-```
-## Support
-For support, questions, or contributions, please contact the Intuned team at engineering@intunedhq.com.
-## About Intuned
-Intuned provides powerful tools for browser automation, web scraping, and data extraction. Visit [intunedhq.com](https://intunedhq.com) to learn more.
+## AI module
+AI-powered utilities for data extraction and page analysis. These functions use AI and incur costs.
+| Function | Description |
+| --- | --- |
+| [`extractStructuredData`](./ai/functions/extractStructuredData) | Extract structured data from pages using AI with JSON Schema or Zod validation |
+| [`isPageLoaded`](./ai/functions/isPageLoaded) | Detect when a page has finished loading |
+<Tip>AI functions support caching and matching to reduce costs.</Tip>
+## Helpers module
+| Function | Description |
+| --- | --- |
+| [`goToUrl`](./helpers/functions/goToUrl) | Navigate with automatic retries and error handling |
+| [`withNetworkSettledWait`](./helpers/functions/withNetworkSettledWait) | Wait for network requests to complete |
+| [`waitForDomSettled`](./helpers/functions/waitForDomSettled) | Wait for DOM mutations to finish |
+| [`scrollToLoadContent`](./helpers/functions/scrollToLoadContent) | Load infinite-scroll content |
+| [`clickUntilExhausted`](./helpers/functions/clickUntilExhausted) | Click "Load More" buttons until all content loads |
+| [`extractMarkdown`](./helpers/functions/extractMarkdown) | Convert pages to markdown |
+| [`downloadFile`](./helpers/functions/downloadFile) | Download files with different triggers |
+| [`saveFileToS3`](./helpers/functions/saveFileToS3) | Download and upload files to S3 |
+| [`uploadFileToS3`](./helpers/functions/uploadFileToS3) | Upload files with custom S3 configurations |
+| [`filterEmptyValues`](./helpers/functions/filterEmptyValues) | Remove empty values from data |
+| [`validateDataUsingSchema`](./helpers/functions/validateDataUsingSchema) | Validate data against schemas |
+| [`processDate`](./helpers/functions/processDate) | Parse and normalize dates |
+| [`sanitizeHtml`](./helpers/functions/sanitizeHtml) | Clean and sanitize HTML |
+| [`resolveUrl`](./helpers/functions/resolveUrl) | Resolve relative URLs to absolute paths |
+## Requirements
+- Node.js 18+
+- Playwright (`npm install playwright && npx playwright install`)
+- For AI functions: API key for your AI provider (set via environment variable or function parameter)
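
The `dataSchema` in the new README's quick example describes an object with a `products` array whose items carry `title` and `price` strings. Below is a minimal sketch of a matching TypeScript type and runtime guard. The names `Book`, `BookListing`, and `isBookListing` are hypothetical and not part of `@intuned/browser`. The diff does not show the return type of `extractStructuredData`, so the guard accepts `unknown` rather than assuming one.

```typescript
// Hypothetical types mirroring the dataSchema from the quick example.
// The schema lists no "required" fields, so title and price are optional.
interface Book {
  title?: string;
  price?: string;
}

interface BookListing {
  products: Book[];
}

// Runtime guard for a value that is expected to follow the schema.
// Design choice (an assumption): a missing `products` array is treated
// as invalid, even though the schema itself does not mark it required.
function isBookListing(value: unknown): value is BookListing {
  if (typeof value !== "object" || value === null) return false;

  const products = (value as { products?: unknown }).products;
  if (!Array.isArray(products)) return false;

  return products.every((item) => {
    if (typeof item !== "object" || item === null) return false;
    const { title, price } = item as { title?: unknown; price?: unknown };
    // Fields may be absent; when present they must be strings.
    return (
      (title === undefined || typeof title === "string") &&
      (price === undefined || typeof price === "string")
    );
  });
}
```

A guard like this could be applied to the value returned from the automation before it is stored or passed on, so that a malformed extraction fails early instead of propagating.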