@purepageio/fetch-engines 0.2.12 → 0.4.1

package/README.md CHANGED
@@ -11,11 +11,10 @@ Fetching web content can be complex. You need to handle static HTML, dynamic Jav
11
11
 
12
12
  - **Unified API:** Get content from simple or complex sites using the same `fetchHTML(url, options?)` method.
13
13
  - **Flexible Strategies:** Choose the right tool for the job:
14
- - `FetchEngine`: Lightweight and fast for static HTML, using the standard `fetch` API.
15
- - `PlaywrightEngine`: Powerful browser automation for JavaScript-heavy sites, handling rendering and interactions.
16
- - `HybridEngine`: The best of both worlds – tries `FetchEngine` first for speed, automatically falls back to `PlaywrightEngine` for reliability on complex pages.
14
+ - `FetchEngine`: Lightweight and fast for static HTML, using the standard `fetch` API. Ideal for speed and efficiency with content that doesn't require JavaScript rendering. Supports custom headers.
15
+ - `HybridEngine`: The best of both worlds – tries `FetchEngine` first for speed, automatically falls back to a powerful browser engine (internally, `PlaywrightEngine`) for reliability on complex, JavaScript-heavy pages. Supports custom headers.
17
16
  - **Robust & Resilient:** Built-in caching, configurable retries, and standardized error handling make your fetching logic more dependable.
18
- - **Simplified Automation:** `PlaywrightEngine` manages browser instances and contexts automatically through efficient pooling and includes integrated stealth measures to bypass common anti-bot systems.
17
+ - **Simplified Automation:** When `HybridEngine` uses its browser capabilities (via the internal `PlaywrightEngine`), it manages browser instances and contexts automatically through efficient pooling and includes integrated stealth measures to bypass common anti-bot systems.
19
18
  - **Content Transformation:** Optionally convert fetched HTML directly to clean Markdown content.
20
19
  - **TypeScript Ready:** Fully typed for a better development experience.
21
20
 
@@ -32,18 +31,20 @@ This package provides a high-level abstraction, letting you focus on using the w
32
31
  - [API Reference](#api-reference)
33
32
  - [Stealth / Anti-Detection (`PlaywrightEngine`)](#stealth--anti-detection-playwrightengine)
34
33
  - [Error Handling](#error-handling)
34
+ - [Logging](#logging)
35
35
  - [Contributing](#contributing)
36
36
  - [License](#license)
37
37
 
38
38
  ## Features
39
39
 
40
- - **Multiple Fetching Strategies:** Choose between `FetchEngine` (lightweight `fetch`), `PlaywrightEngine` (robust JS rendering via Playwright), or `HybridEngine` (smart fallback).
41
- - **Unified API:** Simple `fetchHTML(url, options?)` interface across all engines.
40
+ - **Multiple Fetching Strategies:** Choose between `FetchEngine` (lightweight `fetch`) or `HybridEngine` (smart fallback to a full browser engine).
41
+ - **Unified API:** Simple `fetchHTML(url, options?)` interface across both primary engines.
42
+ - **Custom Headers:** Easily provide custom HTTP headers for requests in both `FetchEngine` and `HybridEngine`.
42
43
  - **Configurable Retries:** Automatic retries on failure with customizable attempts and delays.
43
44
  - **Built-in Caching:** In-memory caching with configurable TTL to reduce redundant fetches.
44
- - **Playwright Stealth:** Automatic integration of `playwright-extra` and stealth plugins to bypass common bot detection.
45
- - **Managed Browser Pooling:** Efficient resource management for `PlaywrightEngine` with configurable browser/context limits and lifecycles.
46
- - **Smart Fallbacks:** `HybridEngine` uses `FetchEngine` first, falling back to `PlaywrightEngine` only when needed. `PlaywrightEngine` can optionally use a fast HTTP fetch before launching a full browser.
45
+ - **Playwright Stealth:** When `HybridEngine` utilizes its browser capabilities, it automatically integrates `playwright-extra` and stealth plugins to bypass common bot detection.
46
+ - **Managed Browser Pooling:** Efficient resource management for `HybridEngine`'s browser mode with configurable browser/context limits and lifecycles.
47
+ - **Smart Fallbacks:** `HybridEngine` uses `FetchEngine` first, falling back to its internal browser engine only when needed. The internal browser engine can also optionally use a fast HTTP fetch before launching a full browser.
47
48
  - **Content Conversion:** Optionally convert fetched HTML directly to Markdown.
48
49
  - **Standardized Errors:** Custom `FetchError` classes provide context on failures.
49
50
  - **TypeScript Ready:** Fully typed codebase for enhanced developer experience.
@@ -58,7 +59,7 @@ npm install @purepageio/fetch-engines
58
59
  yarn add @purepageio/fetch-engines
59
60
  ```
60
61
 
61
- If you plan to use the `PlaywrightEngine` or `HybridEngine`, you also need to install Playwright's browser binaries:
62
+ If you plan to use the `HybridEngine` (which internally uses Playwright for advanced fetching), you also need to install Playwright's browser binaries:
62
63
 
63
64
  ```bash
64
65
  pnpm exec playwright install
@@ -68,9 +69,9 @@ npx playwright install
68
69
 
69
70
  ## Engines
70
71
 
71
- - **`FetchEngine`**: Uses the standard `fetch` API. Suitable for simple HTML pages or APIs returning HTML. Lightweight and fast.
72
- - **`PlaywrightEngine`**: Uses Playwright to control a managed pool of headless browsers (Chromium by default via `playwright-extra`). Handles JavaScript rendering, complex interactions, and provides automatic stealth/anti-bot detection measures. More resource-intensive but necessary for dynamic websites.
73
- - **`HybridEngine`**: A smart combination. It first attempts to fetch content using the lightweight `FetchEngine`. If that fails for _any_ reason (e.g., network error, non-HTML content, HTTP error like 403), it automatically falls back to using the `PlaywrightEngine`. This provides the speed of `FetchEngine` for simple sites while retaining the power of `PlaywrightEngine` for complex ones.
72
+ - **`FetchEngine`**: Uses the standard `fetch` API. Suitable for simple HTML pages or APIs returning HTML. Lightweight and fast. This is your go-to for speed and efficiency when JavaScript rendering is not required.
73
+ - **`HybridEngine`**: A smart combination. It first attempts to fetch content using the lightweight `FetchEngine`. If that fails for _any_ reason (e.g., network error, non-HTML content, HTTP error like 403), or if `spaMode` is enabled and an SPA shell is detected, it automatically falls back to using an internal, powerful browser engine (based on Playwright). This provides the speed of `FetchEngine` for simple sites while retaining the power of a full browser for complex, dynamic websites. This is recommended for most general-purpose fetching tasks.
74
+ - **`PlaywrightEngine` (Internal Component)**: While not recommended for direct use by most users, `PlaywrightEngine` is the component `HybridEngine` uses internally for its browser-based fetching. It manages Playwright browser instances, contexts, and stealth features. Users needing direct, low-level control over Playwright might consider it, but `HybridEngine` offers a more robust and flexible approach for most scenarios.
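The fallback rule described for `HybridEngine` can be sketched as a pure decision function. This is a minimal illustration under stated assumptions, not the package's implementation: the `FastFetchOutcome` type, the `looksLikeSpaShell` heuristic and its 50-character threshold, and the `chooseEngine` name are all hypothetical.

```typescript
// Hypothetical model of what the lightweight fetch attempt produced.
type FastFetchOutcome =
  | { ok: true; html: string }
  | { ok: false; reason: string };

// Assumed heuristic: a page is treated as an SPA shell when its <body> has
// almost no visible text once <script> and <noscript> blocks are removed.
// The library's real detection logic may differ.
function looksLikeSpaShell(html: string): boolean {
  const bodyMatch = /<body[^>]*>([\s\S]*?)<\/body>/i.exec(html);
  const body = bodyMatch?.[1] ?? html;
  const visibleText = body
    .replace(/<script[\s\S]*?<\/script>/gi, "")
    .replace(/<noscript[\s\S]*?<\/noscript>/gi, "")
    .replace(/<[^>]+>/g, "")
    .trim();
  return visibleText.length < 50;
}

// Decide which engine's result should be used for a request.
function chooseEngine(outcome: FastFetchOutcome, spaMode: boolean): "fetch" | "browser" {
  // Any failure of the fast path (network error, non-HTML content, HTTP 403, ...)
  // triggers the browser fallback.
  if (!outcome.ok) return "browser";
  // A successful fetch is only discarded when spaMode is on and the page looks empty.
  if (spaMode && looksLikeSpaShell(outcome.html)) return "browser";
  return "fetch";
}
```

Keeping the decision separate from the actual network calls makes the rule easy to reason about: a successful fetch is only ever overridden when `spaMode` is enabled.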
74
75
 
75
76
  ## Basic Usage
76
77
 
@@ -101,68 +102,54 @@ async function main() {
101
102
  main();
102
103
  ```
103
104
 
104
- ### PlaywrightEngine
105
-
106
- ```typescript
107
- import { PlaywrightEngine } from "@purepageio/fetch-engines";
108
-
109
- // Engine configured to fetch HTML by default
110
- const engine = new PlaywrightEngine({ markdown: false });
111
-
112
- async function main() {
113
- try {
114
- const url = "https://quotes.toscrape.com/";
115
-
116
- // Example: Fetching as Markdown using per-request override
117
- console.log(`Fetching ${url} as Markdown...`);
118
- const mdResult = await engine.fetchHTML(url, { markdown: true });
119
- console.log(`Fetched ${mdResult.url} (ContentType: ${mdResult.contentType}) - Title: ${mdResult.title}`);
120
- console.log(`Content (Markdown):\n${mdResult.content.substring(0, 300)}...`);
121
-
122
- // You could also fetch as HTML by default:
123
- // const htmlResult = await engine.fetchHTML(url);
124
- // console.log(`\nFetched ${htmlResult.url} (ContentType: ${htmlResult.contentType}) - Title: ${htmlResult.title}`);
125
- } catch (error) {
126
- console.error("Playwright fetch failed:", error);
127
- } finally {
128
- await engine.cleanup();
129
- }
130
- }
131
- main();
132
- ```
133
-
134
105
  ### HybridEngine
135
106
 
136
107
  ```typescript
137
108
  import { HybridEngine } from "@purepageio/fetch-engines";
138
109
 
139
- // Engine configured to fetch HTML by default for both internal engines
140
- const engine = new HybridEngine({ markdown: false });
110
+ // Engine configured to fetch HTML by default for its internal engines
111
+ // and provide some custom headers for all requests made by HybridEngine.
112
+ const engine = new HybridEngine({
113
+ markdown: false,
114
+ headers: { "X-Global-Custom-Header": "HybridGlobalValue" },
115
+ // Other PlaywrightEngine specific configs can be set here for the fallback mechanism
116
+ // e.g., playwrightLaunchOptions: { args: ["--disable-gpu"] }
117
+ });
141
118
 
142
119
  async function main() {
143
120
  try {
144
- const url1 = "https://example.com"; // Simple site
145
- const url2 = "https://quotes.toscrape.com/"; // Complex site
146
-
147
- // --- Scenario 1: FetchEngine Succeeds ---
148
- console.log(`\nFetching simple site (${url1}) requesting Markdown...`);
149
- // FetchEngine uses its constructor config (markdown: false), ignoring the per-request option.
150
- const result1 = await engine.fetchHTML(url1, { markdown: true });
121
+ const urlSimple = "https://example.com"; // Simple site, likely handled by FetchEngine
122
+ const urlComplex = "https://quotes.toscrape.com/"; // JS-heavy site, likely requiring Playwright fallback
123
+
124
+ // --- Scenario 1: FetchEngine part of HybridEngine handles it ---
125
+ console.log(`\nFetching simple site (${urlSimple}) with per-request headers...`);
126
+ const result1 = await engine.fetchHTML(urlSimple, {
127
+ headers: { "X-Request-Specific": "SimpleRequestValue" },
128
+ });
129
+ // FetchEngine (via HybridEngine) will use:
130
+ // 1. Its base default headers (User-Agent etc.)
131
+ // 2. Overridden/augmented by HybridEngine's constructor headers ("X-Global-Custom-Header")
132
+ // 3. Overridden/augmented by per-request headers ("X-Request-Specific")
151
133
  console.log(`Fetched ${result1.url} (ContentType: ${result1.contentType}) - Title: ${result1.title}`);
152
- console.log(`Content is ${result1.contentType} because FetchEngine succeeded and used its own config.`);
153
- console.log(`${result1.content.substring(0, 300)}...`);
154
-
155
- // --- Scenario 2: FetchEngine Fails, Playwright Fallback Occurs ---
156
- console.log(`\nFetching complex site (${url2}) requesting Markdown...`);
157
- // Assume FetchEngine fails for url2. PlaywrightEngine will be used and *will* receive the markdown: true override.
158
- const result2 = await engine.fetchHTML(url2, { markdown: true });
134
+ console.log(`Content (HTML): ${result1.content.substring(0, 100)}...`);
135
+
136
+ // --- Scenario 2: Playwright part of HybridEngine handles it ---
137
+ console.log(`\nFetching complex site (${urlComplex}) requesting Markdown and with per-request headers...`);
138
+ const result2 = await engine.fetchHTML(urlComplex, {
139
+ markdown: true,
140
+ headers: { "X-Request-Specific": "ComplexRequestValue", "X-Another": "ComplexAnother" },
141
+ });
142
+ // PlaywrightEngine (via HybridEngine) will use:
143
+ // 1. Its base default headers (User-Agent etc. if doing HTTP fallback, or for page.setExtraHTTPHeaders)
144
+ // 2. Overridden/augmented by HybridEngine's constructor headers ("X-Global-Custom-Header")
145
+ // 3. Overridden/augmented by per-request headers ("X-Request-Specific", "X-Another")
146
+ // The markdown: true option will be respected by the Playwright part.
159
147
  console.log(`Fetched ${result2.url} (ContentType: ${result2.contentType}) - Title: ${result2.title}`);
160
- console.log(`Content is ${result2.contentType} because Playwright fallback used the per-request option.`);
161
- console.log(`${result2.content.substring(0, 300)}...`);
148
+ console.log(`Content (Markdown):\n${result2.content.substring(0, 300)}...`);
162
149
  } catch (error) {
163
150
  console.error("Hybrid fetch failed:", error);
164
151
  } finally {
165
- await engine.cleanup();
152
+ await engine.cleanup(); // Important for HybridEngine
166
153
  }
167
154
  }
168
155
  main();
@@ -176,58 +163,83 @@ Engines accept an optional configuration object in their constructor to customis
176
163
 
177
164
  The `FetchEngine` accepts a `FetchEngineOptions` object with the following properties:
178
165
 
179
- | Option | Type | Default | Description |
180
- | ---------- | --------- | ------- | ------------------------------------------------------------------------------------------------------ |
181
- | `markdown` | `boolean` | `false` | If `true`, converts fetched HTML to Markdown. `contentType` in the result will be set to `'markdown'`. |
166
+ | Option | Type | Default | Description |
167
+ | ---------- | ------------------------ | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
168
+ | `markdown` | `boolean` | `false` | If `true`, converts fetched HTML to Markdown. `contentType` in the result will be set to `'markdown'`. |
169
+ | `headers` | `Record<string, string>` | `{}` | Custom HTTP headers to be sent with the request. These are merged with and can override the engine's default headers. Headers from `fetchHTML` options take higher precedence. |
182
170
 
183
171
  ```typescript
184
- // Example: Always convert to Markdown
185
- const mdFetchEngine = new FetchEngine({ markdown: true });
172
+ // Example: FetchEngine with custom headers and Markdown conversion
173
+ const customFetchEngine = new FetchEngine({
174
+ markdown: true,
175
+ headers: {
176
+ "User-Agent": "MyCustomFetchAgent/1.0",
177
+ "X-Api-Key": "your-api-key",
178
+ },
179
+ });
186
180
  ```
187
181
 
188
- ### PlaywrightEngine
189
-
190
- The `PlaywrightEngine` accepts a `PlaywrightEngineConfig` object with the following properties:
191
-
192
- **General Options:**
193
-
194
- | Option | Type | Default | Description |
195
- | ----------------------- | --------- | -------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
196
- | `markdown` | `boolean` | `false` | If `true`, converts content (from Playwright or fallback) to Markdown. `contentType` will be `'markdown'`. Can be overridden per-request. |
197
- | `useHttpFallback` | `boolean` | `true` | If `true`, attempts a fast HTTP fetch before using Playwright. |
198
- | `useHeadedModeFallback` | `boolean` | `false` | If `true`, automatically retries specific failed domains in headed (visible) mode. |
199
- | `defaultFastMode` | `boolean` | `true` | If `true`, initially blocks non-essential resources and skips human simulation. Can be overridden per-request. |
200
- | `simulateHumanBehavior` | `boolean` | `true` | If `true` (and not `fastMode`), attempts basic human-like interactions. |
201
- | `concurrentPages` | `number` | `3` | Max number of pages to process concurrently within the engine queue. |
202
- | `maxRetries` | `number` | `3` | Max retry attempts for a failed fetch (excluding initial try). |
203
- | `retryDelay` | `number` | `5000` | Delay (ms) between retries. |
204
- | `cacheTTL` | `number` | `900000` | Cache Time-To-Live (ms). `0` disables caching. (15 mins default) |
205
- | `spaMode` | `boolean` | `false` | If `true`, enables Single Page Application mode. This typically bypasses `useHttpFallback`, forces `fastMode` to effectively `false`, uses more patient load conditions (e.g., network idle), and may apply `spaRenderDelayMs`. Recommended for JavaScript-heavy sites. |
206
- | `spaRenderDelayMs` | `number` | `0` | Explicit delay (ms) after page load events in `spaMode` to allow for client-side rendering. Only applies if `spaMode` is `true`. |
207
-
208
- **Browser Pool Options (Passed to internal `PlaywrightBrowserPool`):**
209
-
210
- | Option | Type | Default | Description |
211
- | -------------------------- | -------------------------- | ----------- | ------------------------------------------------------------------------- |
212
- | `maxBrowsers` | `number` | `2` | Max concurrent browser instances managed by the pool. |
213
- | `maxPagesPerContext` | `number` | `6` | Max pages per browser context before recycling. |
214
- | `maxBrowserAge` | `number` | `1200000` | Max age (ms) a browser instance lives before recycling. (20 mins default) |
215
- | `healthCheckInterval` | `number` | `60000` | How often (ms) the pool checks browser health. (1 min default) |
216
- | `useHeadedMode` | `boolean` | `false` | Forces the _entire pool_ to launch browsers in headed (visible) mode. |
217
- | `poolBlockedDomains` | `string[]` | `[]` | List of domain glob patterns to block requests to. |
218
- | `poolBlockedResourceTypes` | `string[]` | `[]` | List of Playwright resource types (e.g., 'image', 'font') to block. |
219
- | `proxy` | `{ server: string, ... }?` | `undefined` | Proxy configuration object (see `PlaywrightEngineConfig` type). |
220
-
221
- ### HybridEngine
222
-
223
- The `HybridEngine` constructor accepts a single optional argument which uses the **`PlaywrightEngineConfig`** structure (see the `PlaywrightEngine` tables above). These options configure the underlying engines where applicable:
224
-
225
- - Options like `maxRetries`, `cacheTTL`, `proxy`, `maxBrowsers`, `spaMode`, `spaRenderDelayMs`, etc., are primarily passed to the internal `PlaywrightEngine` or used by `HybridEngine` to decide its strategy.
226
- - The `markdown` setting in the constructor (`boolean`, default: `false`) applies to **both** internal engines by default.
227
- - The `spaMode` setting in the constructor (`boolean`, default: `false`) configures the default SPA behavior for the `HybridEngine`. If `spaMode` is true, the `HybridEngine` will attempt to detect if the `FetchEngine` result is an SPA shell (e.g., empty root div, noscript tag). If so, it will automatically fallback to `PlaywrightEngine` (with `spaMode` active) even if `FetchEngine` returned a 200 status.
228
- - If you provide `markdown: true` or `spaMode: true` in the `options` object when calling `fetchHTML`, this override is handled as follows:
229
- - For `markdown`: Only applies if a fallback to `PlaywrightEngine` is necessary or if `FetchEngine` succeeded but an SPA shell was detected in `spaMode` (forcing Playwright). The `FetchEngine` part (if its result is used) will always use the `markdown` setting provided in the `HybridEngine` constructor.
230
- - For `spaMode`: This directly controls the `HybridEngine`'s SPA shell detection and informs the `PlaywrightEngine` if a fallback occurs.
182
+ #### Header Precedence for `FetchEngine`:
183
+
184
+ 1. Headers passed in `fetchHTML(url, { headers: { ... } })` (highest precedence).
185
+ 2. Headers passed in the `FetchEngine` constructor `new FetchEngine({ headers: { ... } })`.
186
+ 3. Default headers of the `FetchEngine` (e.g., its default `User-Agent`) (lowest precedence).
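The precedence order above amounts to merging three layers from lowest to highest priority. The sketch below is illustrative only: `HeaderMap`, `mergeHeaders`, and the placeholder default `User-Agent` value are assumptions, and lower-casing the header names is a choice made for this example (HTTP header names are case-insensitive), not documented library behaviour.

```typescript
type HeaderMap = Record<string, string>;

// Merge header layers in order; later layers override earlier ones.
// Names are lower-cased so "User-Agent" and "user-agent" count as the same header.
function mergeHeaders(...layers: HeaderMap[]): HeaderMap {
  const merged: HeaderMap = {};
  for (const layer of layers) {
    for (const [name, value] of Object.entries(layer)) {
      merged[name.toLowerCase()] = value;
    }
  }
  return merged;
}

// Placeholder standing in for the engine's built-in defaults (lowest precedence).
const engineDefaults: HeaderMap = { "User-Agent": "DefaultAgent/0.0" };
// Headers given to the constructor.
const constructorHeaders: HeaderMap = {
  "User-Agent": "MyCustomFetchAgent/1.0",
  "X-Api-Key": "your-api-key",
};
// Headers given to a single fetchHTML call (highest precedence).
const requestHeaders: HeaderMap = { "X-Api-Key": "per-request-key" };

const effective = mergeHeaders(engineDefaults, constructorHeaders, requestHeaders);
// user-agent comes from the constructor, x-api-key from the per-request options.
console.log(effective);
```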
187
+
188
+ ### `PlaywrightEngineConfig` (Used by `HybridEngine`)
189
+
190
+ The `HybridEngine` constructor accepts a `PlaywrightEngineConfig` object. These settings configure the underlying `FetchEngine` and `PlaywrightEngine` (for fallback scenarios) and the hybrid strategy itself. When using `HybridEngine`, you are essentially configuring how it will behave and how its internal Playwright capabilities will operate if needed.
191
+
192
+ **Key Options for `HybridEngine` (from `PlaywrightEngineConfig`):**
193
+
194
+ | Option | Type | Default | Description |
195
+ | ------------------------- | ------------------------ | ----------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
196
+ | `headers` | `Record<string, string>` | `{}` | Custom HTTP headers. For `HybridEngine`, these serve as default headers for both its internal `FetchEngine` (constructor) and `PlaywrightEngine` (constructor). They can be overridden by headers in `HybridEngine.fetchHTML()` options. |
197
+ | `markdown` | `boolean` | `false` | Default Markdown conversion. For `HybridEngine`: sets default for internal `FetchEngine` (constructor) and internal `PlaywrightEngine`. Can be overridden per-request for the `PlaywrightEngine` part. |
198
+ | `useHttpFallback` | `boolean` | `true` | (For Playwright part) If `true`, attempts a fast HTTP fetch before using Playwright. Ineffective if `spaMode` is `true`. |
199
+ | `useHeadedModeFallback` | `boolean` | `false` | (For Playwright part) If `true`, automatically retries specific failed Playwright attempts in headed (visible) mode. |
200
+ | `defaultFastMode` | `boolean` | `true` | If `true`, initially blocks non-essential resources and skips human simulation. Can be overridden per-request. Effectively `false` if `spaMode` is `true`. |
201
+ | `simulateHumanBehavior` | `boolean` | `true` | If `true` (and not `fastMode` or `spaMode`), attempts basic human-like interactions. |
202
+ | `concurrentPages` | `number` | `3` | Max number of pages to process concurrently within the engine queue. |
203
+ | `maxRetries` | `number` | `3` | Max retry attempts for a failed fetch (excluding initial try). |
204
+ | `retryDelay` | `number` | `5000` | Delay (ms) between retries. |
205
+ | `cacheTTL` | `number` | `900000` | Cache Time-To-Live (ms). `0` disables caching. (15 mins default) |
206
+ | `spaMode` | `boolean` | `false` | If `true`, enables Single Page Application mode. This typically bypasses `useHttpFallback`, effectively sets `fastMode` to `false`, uses more patient load conditions (e.g., network idle), and may apply `spaRenderDelayMs`. Recommended for JavaScript-heavy sites. |
207
+ | `spaRenderDelayMs` | `number` | `0` | Explicit delay (ms) after page load events in `spaMode` to allow for client-side rendering. Only applies if `spaMode` is `true`. |
208
+ | `playwrightLaunchOptions` | `LaunchOptions` | `undefined` | (For Playwright part) Optional Playwright launch options (from `playwright` package, e.g., `{ args: ['--some-flag'] }`) passed when a browser instance is created. Merged with internal defaults. |
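To make the `cacheTTL` semantics in the table concrete (a TTL in milliseconds, with `0` disabling caching), here is a small in-memory TTL cache. It is a sketch under assumptions, not the package's cache: the `TtlCache` class and its injectable `now` parameter are hypothetical and exist only to illustrate the behaviour.

```typescript
// Minimal in-memory cache whose entries expire after ttlMs milliseconds.
class TtlCache<V> {
  private entries = new Map<string, { value: V; storedAt: number }>();
  private ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Store a value; a TTL of 0 means caching is disabled, so nothing is stored.
  set(key: string, value: V, now: number = Date.now()): void {
    if (this.ttlMs === 0) return;
    this.entries.set(key, { value, storedAt: now });
  }

  // Return the cached value, or undefined if it is missing or has expired.
  get(key: string, now: number = Date.now()): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (now - entry.storedAt >= this.ttlMs) {
      this.entries.delete(key); // drop stale entries lazily on read
      return undefined;
    }
    return entry.value;
  }
}

// 900000 ms matches the documented 15-minute default.
const pageCache = new TtlCache<string>(900000);
pageCache.set("https://example.com", "<html>...</html>");
```

Passing `now` explicitly keeps expiry deterministic in tests; in normal use the default `Date.now()` applies.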
209
+
210
+ **Browser Pool Options (For `HybridEngine`'s internal `PlaywrightEngine`):**
211
+
212
+ | Option | Type | Default | Description |
213
+ | -------------------------- | -------------------------- | ----------- | ------------------------------------------------------------------------------------------- |
214
+ | `maxBrowsers` | `number` | `2` | Max concurrent browser instances managed by the pool. |
215
+ | `maxPagesPerContext` | `number` | `6` | Max pages per browser context before recycling. |
216
+ | `maxBrowserAge` | `number` | `1200000` | Max age (ms) a browser instance lives before recycling. (20 mins default) |
217
+ | `healthCheckInterval` | `number` | `60000` | How often (ms) the pool checks browser health. (1 min default) |
218
+ | `useHeadedMode` | `boolean` | `false` | Forces the _entire pool_ (for Playwright part) to launch browsers in headed (visible) mode. |
219
+ | `poolBlockedDomains` | `string[]` | `[]` | List of domain glob patterns to block requests to (for Playwright part). |
220
+ | `poolBlockedResourceTypes` | `string[]` | `[]` | List of Playwright resource types (e.g., 'image', 'font') to block (for Playwright part). |
221
+ | `proxy` | `{ server: string, ... }?` | `undefined` | Proxy configuration object (see `PlaywrightEngineConfig` type) (for Playwright part). |
222
+
223
+ ### `HybridEngine` - Configuration Summary & Header Precedence
224
+
225
+ When you configure `HybridEngine` using `PlaywrightEngineConfig`:
226
+
227
+ - **`headers`**: Constructor headers are passed to the internal `FetchEngine`'s constructor and the internal `PlaywrightEngine`'s constructor.
228
+ - **`markdown`**: Sets the default for both internal engines.
229
+ - **`spaMode`**: Sets the default for `HybridEngine`'s SPA shell detection and for the internal `PlaywrightEngine`.
230
+ - Other options primarily configure the internal `PlaywrightEngine` or general retry/caching logic.
231
+
232
+ **Per-request `options` in `HybridEngine.fetchHTML(url, options)`:**
233
+
234
+ - **`headers?: Record<string, string>`**:
235
+ - These headers override any headers set in the `HybridEngine` constructor.
236
+ - If `FetchEngine` is used: These headers are passed to `FetchEngine.fetchHTML(url, { headers: ... })`. `FetchEngine` then merges them with its constructor headers and base defaults.
237
+ - If `PlaywrightEngine` (fallback) is used: These headers are merged with `HybridEngine` constructor headers (options take precedence) and the result is passed to `PlaywrightEngine.fetchHTML()`. `PlaywrightEngine` then applies its own logic (e.g., for `page.setExtraHTTPHeaders` or its HTTP fallback).
238
+ - **`markdown?: boolean`**:
239
+ - If `FetchEngine` is used: This per-request option is **ignored**. `FetchEngine` uses its own constructor `markdown` setting.
240
+ - If `PlaywrightEngine` (fallback) is used: This overrides `PlaywrightEngine`'s default and determines its output format.
241
+ - **`spaMode?: boolean`**: Overrides `HybridEngine`'s default SPA mode and is passed to `PlaywrightEngine` if used.
242
+ - **`fastMode?: boolean`**: Passed to `PlaywrightEngine` if used; no effect on `FetchEngine`.
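The `markdown` rule above is easy to get wrong, so here is a sketch of how the effective output format could be resolved. The names `EngineUsed`, `MarkdownSettings`, and `resolveContentType` are hypothetical, and the `"html"` return value is an assumption (the README only confirms `'markdown'` as a `contentType` value).

```typescript
type EngineUsed = "fetch" | "playwright";

interface MarkdownSettings {
  constructorMarkdown: boolean; // value passed to new HybridEngine({ markdown })
  requestMarkdown?: boolean; // value passed to fetchHTML(url, { markdown })
}

// Resolve which format the result content would be in.
function resolveContentType(used: EngineUsed, settings: MarkdownSettings): "html" | "markdown" {
  const markdown =
    used === "fetch"
      ? settings.constructorMarkdown // the per-request option is ignored on the fetch path
      : (settings.requestMarkdown ?? settings.constructorMarkdown); // per-request wins on fallback
  return markdown ? "markdown" : "html";
}

// Same request options, different result depending on which engine handled the page.
console.log(resolveContentType("fetch", { constructorMarkdown: false, requestMarkdown: true }));
console.log(resolveContentType("playwright", { constructorMarkdown: false, requestMarkdown: true }));
```

If a consistent format matters regardless of which path handles the request, setting `markdown` in the constructor is the safer choice under this rule.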
231
243
 
232
244
  ```typescript
233
245
  // Example: HybridEngine with SPA mode enabled by default
@@ -264,24 +276,26 @@ All `fetchHTML()` methods return a Promise that resolves to an `HTMLFetchResult`
264
276
 
265
277
  - `url` (`string`): URL to fetch.
266
278
  - `options?` (`FetchOptions`): Optional per-request overrides.
267
- - `markdown?: boolean`: (Playwright/Hybrid only) Request Markdown conversion. For Hybrid, only applies on fallback to Playwright.
268
- - `fastMode?: boolean`: (Playwright/Hybrid only) Override fast mode.
269
- - `spaMode?: boolean`: (Playwright/Hybrid only) Override SPA mode behavior for this request.
279
+ - `headers?: Record<string, string>`: Custom headers for this specific request.
280
+ - `markdown?: boolean`: (For `HybridEngine`'s Playwright part) Request Markdown conversion.
281
+ - `fastMode?: boolean`: (For `HybridEngine`'s Playwright part) Override fast mode.
282
+ - `spaMode?: boolean`: (For `HybridEngine`) Override SPA mode behavior for this request.
270
283
  - **Returns:** `Promise<HTMLFetchResult>`
271
284
 
272
285
  Fetches content, returning HTML or Markdown based on configuration/options in `result.content` with `result.contentType` indicating the format.
273
286
 
274
- ### `engine.cleanup()` (PlaywrightEngine & HybridEngine)
287
+ ### `engine.cleanup()` (`HybridEngine`; a no-op on `FetchEngine`)
275
288
 
276
289
  - **Returns:** `Promise<void>`
277
290
 
278
- Gracefully shuts down all browser instances managed by the `PlaywrightEngine`'s browser pool (used by both `PlaywrightEngine` and `HybridEngine`). **It is crucial to call `await engine.cleanup()` when you are finished using these engines** to release system resources.
291
+ For `HybridEngine`, this gracefully shuts down all browser instances managed by its internal `PlaywrightEngine`. **It is crucial to call `await engine.cleanup()` when you are finished using `HybridEngine`** to release system resources.
292
+ `FetchEngine` has a `cleanup` method for API consistency, but it's a no-op as `FetchEngine` doesn't manage persistent resources.
279
293
 
280
- ## Stealth / Anti-Detection (`PlaywrightEngine`)
294
+ ## Stealth / Anti-Detection (via `HybridEngine`)
281
295
 
282
- The `PlaywrightEngine` automatically integrates `playwright-extra` and its powerful stealth plugin ([`puppeteer-extra-plugin-stealth`](https://github.com/berstend/puppeteer-extra/tree/master/packages/puppeteer-extra-plugin-stealth)). This plugin applies various techniques to make the headless browser controlled by Playwright appear more like a regular human-operated browser, helping to bypass many common bot detection systems.
296
+ When `HybridEngine` uses its internal browser capabilities (via `PlaywrightEngine`), it automatically integrates `playwright-extra` and its powerful stealth plugin ([`puppeteer-extra-plugin-stealth`](https://github.com/berstend/puppeteer-extra/tree/master/packages/puppeteer-extra-plugin-stealth)). This plugin applies various techniques to make the headless browser controlled by Playwright appear more like a regular human-operated browser, helping to bypass many common bot detection systems.
283
297
 
284
- There are **no manual configuration options** for stealth; it is enabled by default when using `PlaywrightEngine`. The previous options (`useStealthMode`, `randomizeFingerprint`, `evasionLevel`) have been removed.
298
+ There are **no manual configuration options** for stealth; it is enabled by default when `HybridEngine` uses its browser functionality.
285
299
 
286
300
  While effective, be aware that no stealth technique is foolproof, and sophisticated websites may still detect automated browsing.
287
301
 
@@ -295,67 +309,73 @@ Errors during fetching are typically thrown as instances of `FetchError` (or its
295
309
  - `originalError` (`Error | undefined`): The underlying error that caused this fetch error (e.g., a Playwright error object).
296
310
  - `statusCode` (`number | undefined`): The HTTP status code, if relevant (especially for `FetchEngineHttpError`).
297
311
 
298
- Common error scenarios include:
299
-
300
- - Network issues (DNS resolution failure, connection refused).
301
- - HTTP errors (4xx client errors, 5xx server errors) -> `FetchEngineHttpError` from `FetchEngine` or potentially wrapped `FetchError` from `PlaywrightEngine`.
302
- - Non-HTML content type received -> `FetchError` with code `ERR_NON_HTML_CONTENT` from `FetchEngine`.
303
- - Playwright navigation timeouts -> `FetchError` wrapping Playwright error, often with code `ERR_NAVIGATION_TIMEOUT`.
304
- - Proxy connection errors.
305
- - Page crashes within Playwright.
306
- - Errors thrown by the browser pool (e.g., failure to launch browser).
312
+ Common `FetchError` codes and scenarios:
313
+
314
+ - **`ERR_HTTP_ERROR`**: Thrown by `FetchEngine` for HTTP status codes >= 400. `error.statusCode` will be set.
315
+ - **`ERR_NON_HTML_CONTENT`**: Thrown by `FetchEngine` if the content type is not HTML and `markdown` conversion is not requested.
316
+ - **`ERR_PLAYWRIGHT_OPERATION`**: A general error from `HybridEngine`'s browser mode indicating a failure during a Playwright operation (e.g., page acquisition, navigation, interaction). The `originalError` property will often contain the specific Playwright error.
317
+ - **`ERR_NAVIGATION`**: Often seen as part of `ERR_PLAYWRIGHT_OPERATION`'s message or in `originalError` when a Playwright navigation (in `HybridEngine`'s browser mode) fails (e.g., timeout, SSL error).
318
+ - **`ERR_MARKDOWN_CONVERSION_NON_HTML`**: Thrown by `HybridEngine` (when its Playwright part is active) if `markdown: true` is requested for a non-HTML content type (e.g., XML, JSON).
319
+ - **`ERR_UNSUPPORTED_RAW_CONTENT_TYPE`**: Thrown by `HybridEngine` (when its Playwright part is active and `markdown: false`) when the requested resource has a content type it doesn't support for direct fetching (e.g., images, applications).
320
+ - **`ERR_CACHE_ERROR`**: Indicates an issue with cache read/write operations.
321
+ - **`ERR_PROXY_CONFIG_ERROR`**: Problem with proxy configuration (for `HybridEngine`'s browser mode).
322
+ - **`ERR_BROWSER_POOL_EXHAUSTED`**: Thrown when `HybridEngine`'s browser pool cannot provide a page.
323
+ - **Other Scenarios (often wrapped by `ERR_PLAYWRIGHT_OPERATION` or a generic `FetchError` when `HybridEngine` uses its browser mode):**
324
+ - Network issues (DNS resolution, connection refused).
325
+ - Proxy connection failures.
326
+ - Page crashes or context/browser disconnections within Playwright.
327
+ - Failures during browser launch or management by the pool.
307
328
 
308
329
  The `HTMLFetchResult` object may also contain an `error` property when the final fetch attempt failed after all retries but an earlier attempt produced some intermediate (potentially unusable) result data. It's generally best to rely on the thrown error for failure handling.
309
330
 
310
331
  **Example:**
311
332
 
312
333
  ```typescript
313
- import { FetchEngine, FetchError } from "@purepageio/fetch-engines";
334
+ import { HybridEngine, FetchError } from "@purepageio/fetch-engines";
314
335
 
315
- const engine = new FetchEngine();
336
+ // Example using HybridEngine to illustrate error handling
337
+ const engine = new HybridEngine({ useHttpFallback: false, maxRetries: 1 }); // useHttpFallback for Playwright part
316
338
 
317
339
  async function fetchWithHandling(url: string) {
318
340
  try {
319
- const result = await engine.fetchHTML(url);
320
- // Note: result.error is less common, primary errors are thrown.
341
+ const result = await engine.fetchHTML(url, { headers: { "X-My-Header": "TestValue" } });
321
342
  if (result.error) {
322
- console.error(`Fetch for ${url} reported error after retries: ${result.error.message}`);
323
- } else {
324
- console.log(`Success for ${url}! Content type: ${result.contentType}`);
325
- // Use result.content
343
+ console.warn(`Fetch for ${url} included non-critical error after retries: ${result.error.message}`);
326
344
  }
345
+ console.log(`Success for ${url}! Title: ${result.title}, Content type: ${result.contentType}`);
346
+ // Use result.content
327
347
  } catch (error) {
328
- console.error(`Fetch failed entirely for ${url}:`);
348
+ console.error(`Fetch failed for ${url}:`);
329
349
  if (error instanceof FetchError) {
330
- // Handle specific FetchError codes
331
- switch (error.code) {
332
- case "ERR_HTTP_ERROR":
333
- console.error(` HTTP Error: Status ${error.statusCode} - ${error.message}`);
334
- break;
335
- case "ERR_NON_HTML_CONTENT":
336
- console.error(` Wrong Content Type: ${error.message}`);
337
- break;
338
- // Add other specific codes as needed
339
- default:
340
- console.error(` FetchError (${error.code || "UNKNOWN"}): ${error.message}`);
341
- break;
350
+ console.error(` Error Code: ${error.code || "N/A"}`);
351
+ console.error(` Message: ${error.message}`);
352
+ if (error.statusCode) {
353
+ console.error(` Status Code: ${error.statusCode}`);
342
354
  }
343
355
  if (error.originalError) {
344
- console.error(` Original Error: ${error.originalError.message}`);
356
+ console.error(` Original Error: ${error.originalError.name} - ${error.originalError.message}`);
357
+ }
358
+ // Example of specific handling:
359
+ if (error.code === "ERR_PLAYWRIGHT_OPERATION") {
360
+ console.error(
361
+ " Hint: This was a Playwright operation failure (HybridEngine's browser mode). Check Playwright logs or originalError."
362
+ );
345
363
  }
346
364
  } else if (error instanceof Error) {
347
- // Handle generic JavaScript errors
348
365
  console.error(` Generic Error: ${error.message}`);
349
366
  } else {
350
- // Handle unexpected throw types
351
- console.error(` Unknown error occurred.`);
367
+ console.error(` Unknown error occurred: ${String(error)}`);
352
368
  }
353
369
  }
354
370
  }
355
371
 
356
- fetchWithHandling("https://example.com");
357
- fetchWithHandling("https://httpbin.org/status/404"); // Example causing HTTP error
358
- fetchWithHandling("https://httpbin.org/image/png"); // Example causing non-HTML error
372
+ async function runExamples() {
373
+ await fetchWithHandling("https://nonexistentdomain.example.com"); // Likely DNS or navigation error via FetchEngine or Playwright
374
+ await fetchWithHandling("https://example.com/non_html_resource.json"); // Placeholder URL: substitute a real JSON endpoint to exercise the non-HTML path (FetchEngine may handle it, or Playwright for complex cases)
375
+ await engine.cleanup(); // Important for HybridEngine
376
+ }
377
+
378
+ runExamples();
359
379
  ```
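
The error codes listed in this README hunk can also drive a simple handling policy. The sketch below is only an illustration: `FetchErrorLike` is a structural stand-in for the package's `FetchError` (so the snippet runs without installing anything), and `adviseOn` plus its `Advice` categories are hypothetical names with a grouping chosen for this example, not behavior documented by the package.

```typescript
// Structural stand-in for the package's FetchError (assumption: only the
// `code`, `statusCode` and `message` fields described in the README are used).
interface FetchErrorLike {
  code?: string;
  statusCode?: number;
  message: string;
}

// Hypothetical categories for this sketch.
type Advice = "retry" | "fix-request" | "check-environment" | "inspect";

// Hypothetical policy: maps a documented error code to a suggested next step.
function adviseOn(error: FetchErrorLike): Advice {
  switch (error.code) {
    case "ERR_HTTP_ERROR":
      // Assumption: 5xx responses are treated as transient, 4xx as caller errors.
      return error.statusCode !== undefined && error.statusCode >= 500 ? "retry" : "fix-request";
    case "ERR_NON_HTML_CONTENT":
    case "ERR_MARKDOWN_CONVERSION_NON_HTML":
    case "ERR_UNSUPPORTED_RAW_CONTENT_TYPE":
      // The URL points at content the engine will not return as requested.
      return "fix-request";
    case "ERR_CACHE_ERROR":
    case "ERR_PROXY_CONFIG_ERROR":
      return "check-environment";
    case "ERR_BROWSER_POOL_EXHAUSTED":
      // Assumption: a page may become available again later.
      return "retry";
    default:
      // Includes ERR_PLAYWRIGHT_OPERATION, where originalError carries the detail.
      return "inspect";
  }
}

console.log(adviseOn({ code: "ERR_HTTP_ERROR", statusCode: 503, message: "HTTP error! status: 503" }));
console.log(adviseOn({ code: "ERR_NON_HTML_CONTENT", message: "non-HTML content" }));
```

In real use, the `catch` branch of `fetchWithHandling` could pass the caught `FetchError` to such a function after the `instanceof FetchError` check.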
360
380
 
361
381
  ## Logging
@@ -1 +1 @@
1
- {"version":3,"file":"FetchEngine.d.ts","sourceRoot":"","sources":["../src/FetchEngine.ts"],"names":[],"mappings":"AAAA,OAAO,KAAK,EAAE,eAAe,EAAE,cAAc,EAAE,kBAAkB,EAAE,MAAM,YAAY,CAAC;AACtF,OAAO,KAAK,EAAE,OAAO,EAAE,MAAM,cAAc,CAAC;AAG5C,OAAO,EAAE,UAAU,EAAE,MAAM,aAAa,CAAC;AAEzC;;GAEG;AACH,qBAAa,oBAAqB,SAAQ,UAAU;aAGhC,UAAU,EAAE,MAAM;gBADlC,OAAO,EAAE,MAAM,EACC,UAAU,EAAE,MAAM;CAKrC;AAED;;;;;GAKG;AACH,qBAAa,WAAY,YAAW,OAAO;IACzC,OAAO,CAAC,QAAQ,CAAC,OAAO,CAA+B;IAEvD,OAAO,CAAC,MAAM,CAAC,QAAQ,CAAC,eAAe,CAErC;IAEF;;;OAGG;gBACS,OAAO,GAAE,kBAAuB;IAI5C;;;;;;;OAOG;IACG,SAAS,CAAC,GAAG,EAAE,MAAM,EAAE,OAAO,CAAC,EAAE,kBAAkB,GAAG,OAAO,CAAC,eAAe,CAAC;IAiEpF;;;;OAIG;IACG,OAAO,IAAI,OAAO,CAAC,IAAI,CAAC;IAI9B;;;;OAIG;IACH,UAAU,IAAI,cAAc,EAAE;CAG/B"}
1
+ {"version":3,"file":"FetchEngine.d.ts","sourceRoot":"","sources":["../src/FetchEngine.ts"],"names":[],"mappings":"AAAA,OAAO,KAAK,EAAE,eAAe,EAAE,cAAc,EAAE,kBAAkB,EAAE,MAAM,YAAY,CAAC;AACtF,OAAO,KAAK,EAAE,OAAO,EAAE,MAAM,cAAc,CAAC;AAG5C,OAAO,EAAE,UAAU,EAAE,MAAM,aAAa,CAAC;AAEzC;;GAEG;AACH,qBAAa,oBAAqB,SAAQ,UAAU;aAGhC,UAAU,EAAE,MAAM;gBADlC,OAAO,EAAE,MAAM,EACC,UAAU,EAAE,MAAM;CAKrC;AAED;;;;;GAKG;AACH,qBAAa,WAAY,YAAW,OAAO;IACzC,OAAO,CAAC,QAAQ,CAAC,OAAO,CAA+B;IAEvD,OAAO,CAAC,MAAM,CAAC,QAAQ,CAAC,eAAe,CAGrC;IAEF;;;OAGG;gBACS,OAAO,GAAE,kBAAuB;IAI5C;;;;;;;OAOG;IACG,SAAS,CAAC,GAAG,EAAE,MAAM,EAAE,OAAO,CAAC,EAAE,kBAAkB,GAAG,OAAO,CAAC,eAAe,CAAC;IA+EpF;;;;OAIG;IACG,OAAO,IAAI,OAAO,CAAC,IAAI,CAAC;IAI9B;;;;OAIG;IACH,UAAU,IAAI,cAAc,EAAE;CAG/B"}
@@ -21,6 +21,7 @@ export class FetchEngine {
21
21
  options;
22
22
  static DEFAULT_OPTIONS = {
23
23
  markdown: false,
24
+ headers: {},
24
25
  };
25
26
  /**
26
27
  * Creates an instance of FetchEngine.
@@ -41,14 +42,24 @@ export class FetchEngine {
41
42
  const effectiveOptions = { ...this.options, ...options }; // Combine constructor and call options
42
43
  let response;
43
44
  try {
45
+ const baseHeaders = {
46
+ "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36",
47
+ Accept: "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8",
48
+ "Accept-Language": "en-US,en;q=0.9",
49
+ };
50
+ // this.options.headers are headers passed to the constructor
51
+ const constructorHeaders = this.options.headers || {};
52
+ // options.headers are headers passed directly to the fetchHTML method
53
+ // options is the second argument to fetchHTML: async fetchHTML(url: string, options?: FetchEngineOptions)
54
+ const callSpecificHeaders = options?.headers || {};
55
+ const finalHeaders = {
56
+ ...baseHeaders,
57
+ ...constructorHeaders,
58
+ ...callSpecificHeaders, // Ensures callSpecificHeaders override constructorHeaders, which override baseHeaders
59
+ };
44
60
  response = await fetch(url, {
45
61
  redirect: "follow",
46
- headers: {
47
- // Standard browser-like headers
48
- "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36",
49
- Accept: "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8",
50
- "Accept-Language": "en-US,en;q=0.9",
51
- },
62
+ headers: finalHeaders,
52
63
  });
53
64
  if (!response.ok) {
54
65
  throw new FetchEngineHttpError(`HTTP error! status: ${response.status}`, response.status);
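
The header precedence introduced in this hunk (built-in defaults, then constructor headers, then per-call headers) can be reproduced in isolation. The following is a minimal sketch, not code from the package: `mergeHeaders` and `HeaderMap` are hypothetical names, and the header values are made up for the demonstration.

```typescript
type HeaderMap = Record<string, string>;

// Hypothetical helper: later layers win, mirroring the spread order
// { ...baseHeaders, ...constructorHeaders, ...callSpecificHeaders } in the hunk.
function mergeHeaders(
  base: HeaderMap,
  fromConstructor: HeaderMap = {},
  fromCall: HeaderMap = {}
): HeaderMap {
  return { ...base, ...fromConstructor, ...fromCall };
}

const merged = mergeHeaders(
  { "User-Agent": "default-agent", "Accept-Language": "en-US,en;q=0.9" }, // built-in defaults
  { "User-Agent": "constructor-agent", "X-Team": "docs" }, // passed to the constructor
  { "User-Agent": "call-agent" } // passed to a single fetchHTML call
);

// The per-call User-Agent overrides both earlier layers; untouched keys survive.
console.log(merged);
```

One design consequence worth noting: object spread compares keys case-sensitively, so `"user-agent"` and `"User-Agent"` would be kept as two separate entries in the merged object rather than one overriding the other.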
@@ -1 +1 @@
1
- {"version":3,"file":"FetchEngine.js","sourceRoot":"","sources":["../src/FetchEngine.ts"],"names":[],"mappings":"AAGA,OAAO,EAAE,iBAAiB,EAAE,MAAM,+BAA+B,CAAC,CAAC,uBAAuB;AAC1F,OAAO,EAAE,UAAU,EAAE,MAAM,aAAa,CAAC,CAAC,yBAAyB;AAEnE;;GAEG;AACH,MAAM,OAAO,oBAAqB,SAAQ,UAAU;IAGhC;IAFlB,YACE,OAAe,EACC,UAAkB;QAElC,KAAK,CAAC,OAAO,EAAE,gBAAgB,EAAE,SAAS,EAAE,UAAU,CAAC,CAAC;QAFxC,eAAU,GAAV,UAAU,CAAQ;QAGlC,IAAI,CAAC,IAAI,GAAG,sBAAsB,CAAC;IACrC,CAAC;CACF;AAED;;;;;GAKG;AACH,MAAM,OAAO,WAAW;IACL,OAAO,CAA+B;IAE/C,MAAM,CAAU,eAAe,GAAiC;QACtE,QAAQ,EAAE,KAAK;KAChB,CAAC;IAEF;;;OAGG;IACH,YAAY,UAA8B,EAAE;QAC1C,IAAI,CAAC,OAAO,GAAG,EAAE,GAAG,WAAW,CAAC,eAAe,EAAE,GAAG,OAAO,EAAE,CAAC;IAChE,CAAC;IAED;;;;;;;OAOG;IACH,KAAK,CAAC,SAAS,CAAC,GAAW,EAAE,OAA4B;QACvD,MAAM,gBAAgB,GAAG,EAAE,GAAG,IAAI,CAAC,OAAO,EAAE,GAAG,OAAO,EAAE,CAAC,CAAC,uCAAuC;QACjG,IAAI,QAAkB,CAAC;QACvB,IAAI,CAAC;YACH,QAAQ,GAAG,MAAM,KAAK,CAAC,GAAG,EAAE;gBAC1B,QAAQ,EAAE,QAAQ;gBAClB,OAAO,EAAE;oBACP,gCAAgC;oBAChC,YAAY,EACV,iHAAiH;oBACnH,MAAM,EAAE,kGAAkG;oBAC1G,iBAAiB,EAAE,gBAAgB;iBACpC;aACF,CAAC,CAAC;YAEH,IAAI,CAAC,QAAQ,CAAC,EAAE,EAAE,CAAC;gBACjB,MAAM,IAAI,oBAAoB,CAAC,uBAAuB,QAAQ,CAAC,MAAM,EAAE,EAAE,QAAQ,CAAC,MAAM,CAAC,CAAC;YAC5F,CAAC;YAED,MAAM,iBAAiB,GAAG,QAAQ,CAAC,OAAO,CAAC,GAAG,CAAC,cAAc,CAAC,CAAC;YAC/D,IAAI,CAAC,iBAAiB,IAAI,CAAC,iBAAiB,CAAC,QAAQ,CAAC,WAAW,CAAC,EAAE,CAAC;gBACnE,MAAM,IAAI,UAAU,CAAC,+BAA+B,EAAE,sBAAsB,CAAC,CAAC;YAChF,CAAC;YAED,MAAM,IAAI,GAAG,MAAM,QAAQ,CAAC,IAAI,EAAE,CAAC;YACnC,MAAM,UAAU,GAAG,IAAI,CAAC,KAAK,CAAC,+BAA+B,CAAC,CAAC;YAC/D,MAAM,KAAK,GAAG,UAAU,CAAC,CAAC,CAAC,UAAU,CAAC,CAAC,CAAC,CAAC,IAAI,EAAE,CAAC,CAAC,CAAC,IAAI,CAAC;YAEvD,IAAI,YAAY,GAAG,IAAI,CAAC;YACxB,IAAI,gBAAgB,GAAwB,MAAM,CAAC;YAEnD,IAAI,gBAAgB,CAAC,QAAQ,EAAE,CAAC;gBAC9B,IAAI,CAAC;oBACH,MAAM,SAAS,GAAG,IAAI,iBAAiB,EAAE,CAAC;oBAC1C,YAAY,GAAG,SAAS,CAAC,OAAO,CAAC,IAAI,CAAC,CAAC;oBACvC,gBAAgB,GAAG,UAAU,CAAC;gBAChC,CAAC;gBAAC,OAAO,eAAoB,EAAE,CAAC;oBAC9B,OAAO,CAAC,KAAK,CAAC,kCAAkC,GAAG,iBAAiB,EAAE,eAAe,CAAC,CAAC;oBACvF,gDAAgD;gBAClD,CAAC;YACH,CAAC;YAED,OAAO;gBACL,OAAO
,EAAE,YAAY;gBACrB,WAAW,EAAE,gBAAgB;gBAC7B,KAAK,EAAE,KAAK;gBACZ,GAAG,EAAE,QAAQ,CAAC,GAAG,EAAE,oCAAoC;gBACvD,WAAW,EAAE,KAAK;gBAClB,UAAU,EAAE,QAAQ,CAAC,MAAM;gBAC3B,KAAK,EAAE,SAAS;aACjB,CAAC;QACJ,CAAC;QAAC,OAAO,KAAU,EAAE,CAAC;YACpB,0CAA0C;YAC1C,IACE,KAAK,YAAY,oBAAoB;gBACrC,CAAC,KAAK,YAAY,UAAU,IAAI,KAAK,CAAC,IAAI,KAAK,sBAAsB,CAAC,EACtE,CAAC;gBACD,MAAM,KAAK,CAAC;YACd,CAAC;YACD,+BAA+B;YAC/B,MAAM,OAAO,GAAG,KAAK,YAAY,KAAK,CAAC,CAAC,CAAC,KAAK,CAAC,OAAO,CAAC,CAAC,CAAC,qBAAqB,CAAC;YAC/E,MAAM,IAAI,UAAU,CAAC,iBAAiB,OAAO,EAAE,EAAE,kBAAkB,EAAE,KAAK,YAAY,KAAK,CAAC,CAAC,CAAC,KAAK,CAAC,CAAC,CAAC,SAAS,CAAC,CAAC;QACnH,CAAC;IACH,CAAC;IAED;;;;OAIG;IACH,KAAK,CAAC,OAAO;QACX,OAAO,OAAO,CAAC,OAAO,EAAE,CAAC;IAC3B,CAAC;IAED;;;;OAIG;IACH,UAAU;QACR,OAAO,EAAE,CAAC;IACZ,CAAC"}
1
+ {"version":3,"file":"FetchEngine.js","sourceRoot":"","sources":["../src/FetchEngine.ts"],"names":[],"mappings":"AAGA,OAAO,EAAE,iBAAiB,EAAE,MAAM,+BAA+B,CAAC,CAAC,uBAAuB;AAC1F,OAAO,EAAE,UAAU,EAAE,MAAM,aAAa,CAAC,CAAC,yBAAyB;AAEnE;;GAEG;AACH,MAAM,OAAO,oBAAqB,SAAQ,UAAU;IAGhC;IAFlB,YACE,OAAe,EACC,UAAkB;QAElC,KAAK,CAAC,OAAO,EAAE,gBAAgB,EAAE,SAAS,EAAE,UAAU,CAAC,CAAC;QAFxC,eAAU,GAAV,UAAU,CAAQ;QAGlC,IAAI,CAAC,IAAI,GAAG,sBAAsB,CAAC;IACrC,CAAC;CACF;AAED;;;;;GAKG;AACH,MAAM,OAAO,WAAW;IACL,OAAO,CAA+B;IAE/C,MAAM,CAAU,eAAe,GAAiC;QACtE,QAAQ,EAAE,KAAK;QACf,OAAO,EAAE,EAAE;KACZ,CAAC;IAEF;;;OAGG;IACH,YAAY,UAA8B,EAAE;QAC1C,IAAI,CAAC,OAAO,GAAG,EAAE,GAAG,WAAW,CAAC,eAAe,EAAE,GAAG,OAAO,EAAE,CAAC;IAChE,CAAC;IAED;;;;;;;OAOG;IACH,KAAK,CAAC,SAAS,CAAC,GAAW,EAAE,OAA4B;QACvD,MAAM,gBAAgB,GAAG,EAAE,GAAG,IAAI,CAAC,OAAO,EAAE,GAAG,OAAO,EAAE,CAAC,CAAC,uCAAuC;QACjG,IAAI,QAAkB,CAAC;QACvB,IAAI,CAAC;YACH,MAAM,WAAW,GAAG;gBAClB,YAAY,EACV,iHAAiH;gBACnH,MAAM,EAAE,kGAAkG;gBAC1G,iBAAiB,EAAE,gBAAgB;aACpC,CAAC;YAEF,6DAA6D;YAC7D,MAAM,kBAAkB,GAAG,IAAI,CAAC,OAAO,CAAC,OAAO,IAAI,EAAE,CAAC;YAEtD,sEAAsE;YACtE,0GAA0G;YAC1G,MAAM,mBAAmB,GAAG,OAAO,EAAE,OAAO,IAAI,EAAE,CAAC;YAEnD,MAAM,YAAY,GAAG;gBACnB,GAAG,WAAW;gBACd,GAAG,kBAAkB;gBACrB,GAAG,mBAAmB,EAAE,sFAAsF;aAC/G,CAAC;YAEF,QAAQ,GAAG,MAAM,KAAK,CAAC,GAAG,EAAE;gBAC1B,QAAQ,EAAE,QAAQ;gBAClB,OAAO,EAAE,YAAY;aACtB,CAAC,CAAC;YAEH,IAAI,CAAC,QAAQ,CAAC,EAAE,EAAE,CAAC;gBACjB,MAAM,IAAI,oBAAoB,CAAC,uBAAuB,QAAQ,CAAC,MAAM,EAAE,EAAE,QAAQ,CAAC,MAAM,CAAC,CAAC;YAC5F,CAAC;YAED,MAAM,iBAAiB,GAAG,QAAQ,CAAC,OAAO,CAAC,GAAG,CAAC,cAAc,CAAC,CAAC;YAC/D,IAAI,CAAC,iBAAiB,IAAI,CAAC,iBAAiB,CAAC,QAAQ,CAAC,WAAW,CAAC,EAAE,CAAC;gBACnE,MAAM,IAAI,UAAU,CAAC,+BAA+B,EAAE,sBAAsB,CAAC,CAAC;YAChF,CAAC;YAED,MAAM,IAAI,GAAG,MAAM,QAAQ,CAAC,IAAI,EAAE,CAAC;YACnC,MAAM,UAAU,GAAG,IAAI,CAAC,KAAK,CAAC,+BAA+B,CAAC,CAAC;YAC/D,MAAM,KAAK,GAAG,UAAU,CAAC,CAAC,CAAC,UAAU,CAAC,CAAC,CAAC,CAAC,IAAI,EAAE,CAAC,CAAC,CAAC,IAAI,CAAC;YAEvD,IAAI,YAAY,GAAG,IAAI,CAAC;YACxB,IAAI,gBAAgB,GAAwB,MAAM,CAAC;YAEnD,IAAI,gBAAgB,CAAC,QAAQ,EAAE,CAAC;gBAC9B,IAAI,CA
AC;oBACH,MAAM,SAAS,GAAG,IAAI,iBAAiB,EAAE,CAAC;oBAC1C,YAAY,GAAG,SAAS,CAAC,OAAO,CAAC,IAAI,CAAC,CAAC;oBACvC,gBAAgB,GAAG,UAAU,CAAC;gBAChC,CAAC;gBAAC,OAAO,eAAoB,EAAE,CAAC;oBAC9B,OAAO,CAAC,KAAK,CAAC,kCAAkC,GAAG,iBAAiB,EAAE,eAAe,CAAC,CAAC;oBACvF,gDAAgD;gBAClD,CAAC;YACH,CAAC;YAED,OAAO;gBACL,OAAO,EAAE,YAAY;gBACrB,WAAW,EAAE,gBAAgB;gBAC7B,KAAK,EAAE,KAAK;gBACZ,GAAG,EAAE,QAAQ,CAAC,GAAG,EAAE,oCAAoC;gBACvD,WAAW,EAAE,KAAK;gBAClB,UAAU,EAAE,QAAQ,CAAC,MAAM;gBAC3B,KAAK,EAAE,SAAS;aACjB,CAAC;QACJ,CAAC;QAAC,OAAO,KAAU,EAAE,CAAC;YACpB,0CAA0C;YAC1C,IACE,KAAK,YAAY,oBAAoB;gBACrC,CAAC,KAAK,YAAY,UAAU,IAAI,KAAK,CAAC,IAAI,KAAK,sBAAsB,CAAC,EACtE,CAAC;gBACD,MAAM,KAAK,CAAC;YACd,CAAC;YACD,+BAA+B;YAC/B,MAAM,OAAO,GAAG,KAAK,YAAY,KAAK,CAAC,CAAC,CAAC,KAAK,CAAC,OAAO,CAAC,CAAC,CAAC,qBAAqB,CAAC;YAC/E,MAAM,IAAI,UAAU,CAAC,iBAAiB,OAAO,EAAE,EAAE,kBAAkB,EAAE,KAAK,YAAY,KAAK,CAAC,CAAC,CAAC,KAAK,CAAC,CAAC,CAAC,SAAS,CAAC,CAAC;QACnH,CAAC;IACH,CAAC;IAED;;;;OAIG;IACH,KAAK,CAAC,OAAO;QACX,OAAO,OAAO,CAAC,OAAO,EAAE,CAAC;IAC3B,CAAC;IAED;;;;OAIG;IACH,UAAU;QACR,OAAO,EAAE,CAAC;IACZ,CAAC"}
@@ -7,6 +7,7 @@ export declare class HybridEngine implements IEngine {
7
7
  private readonly fetchEngine;
8
8
  private readonly playwrightEngine;
9
9
  private readonly config;
10
+ private readonly playwrightOnlyPatterns;
10
11
  constructor(config?: PlaywrightEngineConfig);
11
12
  private _isSpaShell;
12
13
  fetchHTML(url: string, options?: FetchOptions): Promise<HTMLFetchResult>;
@@ -1 +1 @@
1
- {"version":3,"file":"HybridEngine.d.ts","sourceRoot":"","sources":["../src/HybridEngine.ts"],"names":[],"mappings":"AAEA,OAAO,KAAK,EAAE,OAAO,EAAE,MAAM,cAAc,CAAC;AAC5C,OAAO,KAAK,EAAE,eAAe,EAAE,sBAAsB,EAAE,YAAY,EAAE,cAAc,EAAE,MAAM,YAAY,CAAC;AAExG;;GAEG;AACH,qBAAa,YAAa,YAAW,OAAO;IAC1C,OAAO,CAAC,QAAQ,CAAC,WAAW,CAAc;IAC1C,OAAO,CAAC,QAAQ,CAAC,gBAAgB,CAAmB;IACpD,OAAO,CAAC,QAAQ,CAAC,MAAM,CAAyB;gBAEpC,MAAM,GAAE,sBAA2B;IAS/C,OAAO,CAAC,WAAW;IAkBb,SAAS,CAAC,GAAG,EAAE,MAAM,EAAE,OAAO,GAAE,YAAiB,GAAG,OAAO,CAAC,eAAe,CAAC;IAoDlF;;OAEG;IACH,UAAU,IAAI,cAAc,EAAE;IAI9B;;OAEG;IACG,OAAO,IAAI,OAAO,CAAC,IAAI,CAAC;CAM/B"}
1
+ {"version":3,"file":"HybridEngine.d.ts","sourceRoot":"","sources":["../src/HybridEngine.ts"],"names":[],"mappings":"AAEA,OAAO,KAAK,EAAE,OAAO,EAAE,MAAM,cAAc,CAAC;AAC5C,OAAO,KAAK,EAAE,eAAe,EAAE,sBAAsB,EAAE,YAAY,EAAE,cAAc,EAAE,MAAM,YAAY,CAAC;AAExG;;GAEG;AACH,qBAAa,YAAa,YAAW,OAAO;IAC1C,OAAO,CAAC,QAAQ,CAAC,WAAW,CAAc;IAC1C,OAAO,CAAC,QAAQ,CAAC,gBAAgB,CAAmB;IACpD,OAAO,CAAC,QAAQ,CAAC,MAAM,CAAyB;IAChD,OAAO,CAAC,QAAQ,CAAC,sBAAsB,CAAsB;gBAEjD,MAAM,GAAE,sBAA2B;IAU/C,OAAO,CAAC,WAAW;IAkBb,SAAS,CAAC,GAAG,EAAE,MAAM,EAAE,OAAO,GAAE,YAAiB,GAAG,OAAO,CAAC,eAAe,CAAC;IAiFlF;;OAEG;IACH,UAAU,IAAI,cAAc,EAAE;IAI9B;;OAEG;IACG,OAAO,IAAI,OAAO,CAAC,IAAI,CAAC;CAM/B"}