@purepageio/fetch-engines 0.9.1 → 0.10.0

package/README.md CHANGED
@@ -2,13 +2,18 @@
2
2
 
3
3
  [![npm version](https://img.shields.io/npm/v/@purepageio/fetch-engines.svg)](https://www.npmjs.com/package/@purepageio/fetch-engines)
4
4
  [![CI](https://github.com/purepage/fetch-engines/actions/workflows/publish.yml/badge.svg)](https://github.com/purepage/fetch-engines/actions/workflows/publish.yml)
5
+ [![Live Browser Evals](https://github.com/purepage/fetch-engines/actions/workflows/live-browser-evals.yml/badge.svg)](https://github.com/purepage/fetch-engines/actions/workflows/live-browser-evals.yml)
5
6
  [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
6
7
 
7
- Fetch web pages as clean Markdown or structured data. HTTP-first with automatic Playwright fallback, built for RAG pipelines and content extraction.
8
+ Reliable public-web extraction for Node.js.
9
+
10
+ HTTP-first for speed. Browser-backed when needed. Clean Markdown, soft-block handling, and structured extraction for RAG and AI pipelines.
8
11
 
9
12
  ## Table of contents
10
13
 
11
14
  - [Why fetch-engines?](#why-fetch-engines)
15
+ - [Why trust fetch-engines](#why-trust-fetch-engines)
16
+ - [Library vs hosted crawler](#library-vs-hosted-crawler)
12
17
  - [Installation](#installation)
13
18
  - [Quick start](#quick-start)
14
19
  - [Usage patterns](#usage-patterns)
@@ -26,11 +31,33 @@ Fetch web pages as clean Markdown or structured data. HTTP-first with automatic
26
31
  ## Why fetch-engines?
27
32
 
28
33
  - **One API for multiple strategies** – Call `fetchHTML` for rendered pages or `fetchContent` for raw responses. The library handles HTTP shortcuts and Playwright fallbacks automatically.
29
- - **RAG-ready Markdown** – Convert any page to clean Markdown with boilerplate, nav, and SVG noise stripped out. Powered by a Rust-native converter.
30
- - **Built-in retries, caching, and a managed browser pool** – Production defaults you can tune per request.
31
- - **URL to structured data in one call** – Define a Zod schema, get typed results back via any OpenAI-compatible API. The page is fetched as Markdown first to minimise tokens.
34
+ - **Automatic app-shell & soft-block detection** – Shell-like HTTP responses and bot-gate pages (Cloudflare challenges, CAPTCHAs, "verify you're human") are upgraded to Playwright rendering by default, so client-rendered pages and soft blocks work without per-domain rules.
35
+ - **RAG-ready Markdown** – Convert public content pages to clean Markdown with boilerplate, nav, and SVG noise stripped out. Powered by a Rust-native converter.
36
+ - **HTTP-first, browser-backed when needed** – Fast pages stay cheap via plain HTTP, while harder pages automatically benefit from Playwright fallback.
37
+ - **Structured extraction built in** – Define a Zod schema and go from URL to typed data via any OpenAI-compatible API. The page is fetched as Markdown first to minimise tokens.
32
38
  - **Playwright is optional** – `FetchEngine` works without browser dependencies. Playwright is only loaded when you use `HybridEngine` or `PlaywrightEngine`.
33
39
 
40
+ ## Why trust fetch-engines
41
+
42
+ - **19 live URLs across 7 archetypes** (docs, government, knowledge, marketing, commerce, static, access-guarded) validated on every release and nightly via browser-enabled CI
43
+ - **85 unit tests + dedicated live browser eval workflow** — not just "it compiles," but "it extracts real content from real pages"
44
+ - Handles app shells, Cloudflare challenges, CAPTCHAs, and utility-class-heavy doc sites (Tailwind, Vite) without per-domain rules
45
+ - Produces clean Markdown with absolute URLs — boilerplate removal typically reduces raw HTML to 10–30% of its original size before it reaches your LLM
46
+ - Structured extraction with Zod schemas and any OpenAI-compatible provider, in the same pipeline as page fetching
47
+
48
+ ## Library vs hosted crawler
49
+
50
+ | | fetch-engines | Hosted crawlers |
51
+ | ------------------- | --------------------------------------- | ------------------- |
52
+ | **Runs where** | Your Node.js process | Third-party API |
53
+ | **Data stays** | In your infrastructure | Leaves your network |
54
+ | **Cost model** | Free + your compute | Per-page pricing |
55
+ | **Customisation** | Full source access, tune heuristics | Configuration flags |
56
+ | **Browser control** | Your Playwright instance, your proxy | Opaque |
57
+ | **Transparency** | Open tests, open evals, open heuristics | Black box |
58
+
59
+ Choose `fetch-engines` when you want full control over extraction, data residency, and cost. Choose a hosted crawler when you need managed infrastructure and don't want to run browsers yourself.
60
+
34
61
  ## Installation
35
62
 
36
63
  ```bash
@@ -84,7 +111,9 @@ console.log(page.contentType); // "markdown"
84
111
  await engine.cleanup();
85
112
  ```
86
113
 
87
- `FetchEngine` also supports `markdown: true` for static pages that don't need JavaScript rendering.
114
+ `FetchEngine` also supports `markdown: true` for static pages that don't need JavaScript rendering. `HybridEngine` now decides whether to render before converting to Markdown, so shell detection still works when callers request Markdown output.
115
+ Relative links and image URLs in Markdown output are normalized to absolute URLs using the final fetched page URL. The converter strips generic UI chrome (nav/footer/button controls and dense link clusters) using domain-agnostic heuristics, while preserving content on pages without semantic `<main>`/`<article>` containers (e.g., Tailwind CSS docs).
116
+ The extraction path is tuned for publicly accessible content. Paywalled or member-only pages may still return intentionally partial content unless you supply authenticated access yourself.
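The link normalization described in the added README text can be illustrated with a minimal sketch, assuming it behaves like standard WHATWG `URL` resolution against the final (post-redirect) page URL. The helper name `absolutizeMarkdownLinks` and its regex are illustrative only and are not part of the package; the real converter operates on HTML and may handle more cases.

```javascript
// Hypothetical helper (not part of fetch-engines): rewrite relative Markdown
// link and image targets to absolute URLs using the final fetched page URL.
function absolutizeMarkdownLinks(markdown, baseUrl) {
  // Matches [text](target) and ![alt](target); the target stops at whitespace or ")".
  return markdown.replace(/(!?\[[^\]]*\]\()([^)\s]+)/g, (match, prefix, target) => {
    // Leave in-page anchors untouched.
    if (target.startsWith("#")) return match;
    try {
      // new URL(relative, base) resolves like a browser; absolute targets pass through.
      return prefix + new URL(target, baseUrl).href;
    } catch {
      // Unparseable target: keep the original text.
      return match;
    }
  });
}

const finalUrl = "https://example.com/docs/guide/intro";
const input = "See [setup](../setup) and ![logo](/img/logo.png) or [top](#top).";
const output = absolutizeMarkdownLinks(input, finalUrl);
console.log(output);
```

Resolving against the final URL rather than the requested one matters after redirects, because relative paths are only meaningful relative to the page that was actually served.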
88
117
 
89
118
  ### Structured extraction
90
119
 
@@ -131,7 +160,7 @@ When you supply a custom `baseURL`, the engine automatically switches to the Ver
131
160
  All engines accept familiar `fetch` options such as custom headers. Additional Hybrid/Playwright options you are likely to tweak:
132
161
 
133
162
  - `markdown` – return Markdown instead of HTML.
134
- - `spaMode` & `spaRenderDelayMs` allow single-page apps to render before extraction.
163
+ - Automatic shell detection is enabled by default. `spaMode` & `spaRenderDelayMs` still force a more patient render path when you know a page is highly dynamic.
135
164
  - `cacheTTL`, `maxRetries`, and browser pool sizes – control resilience and throughput.
136
165
 
137
166
  Check the inline TypeScript docs or the [`/examples`](./examples) directory for end-to-end flows.
@@ -140,30 +169,30 @@ Check the inline TypeScript docs or the [`/examples`](./examples) directory for
140
169
 
141
170
  Every option from `PlaywrightEngineConfig` (consumed by `HybridEngine`) with defaults:
142
171
 
143
- | Option | Default | Purpose |
144
- | -------------------------- | ----------- | ------------------------------------------------------------------------------------------------ |
145
- | `headers` | `{}` | Extra headers merged into every request. |
146
- | `concurrentPages` | `3` | Maximum Playwright pages processed at once. |
147
- | `maxRetries` | `3` | Additional retry attempts after the first failure. |
148
- | `retryDelay` | `5000` | Milliseconds to wait between retries. |
149
- | `cacheTTL` | `900000` | Cache lifetime in ms (`0` disables caching). |
150
- | `useHttpFallback` | `true` | Try a fast HTTP GET before spinning up Playwright. |
151
- | `useHeadedModeFallback` | `false` | Automatically retry a domain in headed mode after repeated failures. |
152
- | `defaultFastMode` | `true` | Block non-critical assets and skip human simulation unless overridden. |
153
- | `simulateHumanBehavior` | `true` | When not in fast mode, add delays and scrolling to avoid bot detection. |
154
- | `maxBrowsers` | `2` | Highest number of Playwright browser instances kept in the pool. |
155
- | `maxPagesPerContext` | `6` | Pages opened per browser context before recycling it. |
156
- | `maxBrowserAge` | `1200000` | Milliseconds before a browser instance is torn down (20 minutes). |
157
- | `healthCheckInterval` | `60000` | Pool health check frequency in ms. |
158
- | `poolBlockedDomains` | `[]` | Domains blocked across every Playwright request (inherit pool defaults if empty). |
159
- | `poolBlockedResourceTypes` | `[]` | Resource types (e.g. `"image"`) blocked globally. |
160
- | `proxy` | `undefined` | Per-browser proxy `{ server, username?, password? }`. |
161
- | `useHeadedMode` | `false` | Force every browser to launch with a visible window. |
162
- | `markdown` | `false` | Return Markdown instead of raw HTML. Converts via a Rust-native engine with boilerplate removal. |
163
- | `spaMode` | `false` | Enable SPA heuristics and allow additional waits for client rendering. |
164
- | `spaRenderDelayMs` | `0` | Extra delay after load when `spaMode` is `true`. |
165
- | `playwrightOnlyPatterns` | `[]` | URLs matching any string/regex go straight to Playwright, skipping HTTP fetches. |
166
- | `playwrightLaunchOptions` | `undefined` | Options passed to `browserType.launch` (see Playwright docs). |
172
+ | Option | Default | Purpose |
173
+ | -------------------------- | ----------- | -------------------------------------------------------------------------------------------------- |
174
+ | `headers` | `{}` | Extra headers merged into every request. |
175
+ | `concurrentPages` | `3` | Maximum Playwright pages processed at once. |
176
+ | `maxRetries` | `3` | Additional retry attempts after the first failure. |
177
+ | `retryDelay` | `5000` | Milliseconds to wait between retries. |
178
+ | `cacheTTL` | `900000` | Cache lifetime in ms (`0` disables caching). |
179
+ | `useHttpFallback` | `true` | Try a fast HTTP GET before spinning up Playwright. |
180
+ | `useHeadedModeFallback` | `false` | Automatically retry a domain in headed mode after repeated failures. |
181
+ | `defaultFastMode` | `true` | Block non-critical assets and skip human simulation unless overridden. |
182
+ | `simulateHumanBehavior` | `true` | When not in fast mode, add delays and scrolling to avoid bot detection. |
183
+ | `maxBrowsers` | `2` | Highest number of Playwright browser instances kept in the pool. |
184
+ | `maxPagesPerContext` | `6` | Pages opened per browser context before recycling it. |
185
+ | `maxBrowserAge` | `1200000` | Milliseconds before a browser instance is torn down (20 minutes). |
186
+ | `healthCheckInterval` | `60000` | Pool health check frequency in ms. |
187
+ | `poolBlockedDomains` | `[]` | Domains blocked across every Playwright request (inherit pool defaults if empty). |
188
+ | `poolBlockedResourceTypes` | `[]` | Resource types (e.g. `"image"`) blocked globally. |
189
+ | `proxy` | `undefined` | Per-browser proxy `{ server, username?, password? }`. |
190
+ | `useHeadedMode` | `false` | Force every browser to launch with a visible window. |
191
+ | `markdown` | `false` | Return Markdown instead of raw HTML. Converts via a Rust-native engine with boilerplate removal. |
192
+ | `spaMode` | `false` | Force the more patient render path. Many shell-like pages are auto-detected even when this is off. |
193
+ | `spaRenderDelayMs` | `0` | Minimum extra wait budget when `spaMode` is `true`. |
194
+ | `playwrightOnlyPatterns` | `[]` | URLs matching any string/regex go straight to Playwright, skipping HTTP shell detection. |
195
+ | `playwrightLaunchOptions` | `undefined` | Options passed to `browserType.launch` (see Playwright docs). |
167
196
 
168
197
  Per-request overrides: `fetchHTML` accepts `fastMode`, `markdown`, `spaMode`, and `headers`, while `fetchContent` supports `fastMode` and `headers`.
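One way to picture how per-request overrides relate to engine-level configuration is the sketch below. It assumes a shallow merge in which request values win and headers are merged key by key; the package's actual merge logic is not shown in this diff. `resolveFetchHtmlOptions` and `FETCH_HTML_OVERRIDES` are hypothetical names, and the defaults are a subset copied from the options table.

```javascript
// Subset of the documented defaults from the options table.
const engineDefaults = { markdown: false, spaMode: false, spaRenderDelayMs: 0, headers: {} };

// Per the README, fetchHTML accepts these per-request overrides.
const FETCH_HTML_OVERRIDES = ["fastMode", "markdown", "spaMode", "headers"];

// Hypothetical helper (an assumption, not the package's code):
// request values win over engine config; headers merge per key.
function resolveFetchHtmlOptions(engineConfig, requestOptions = {}) {
  const resolved = { ...engineDefaults, ...engineConfig };
  for (const key of FETCH_HTML_OVERRIDES) {
    if (requestOptions[key] === undefined) continue;
    resolved[key] =
      key === "headers"
        ? { ...resolved.headers, ...requestOptions.headers }
        : requestOptions[key];
  }
  return resolved;
}

const resolved = resolveFetchHtmlOptions(
  { markdown: true, headers: { "Accept-Language": "en" } },
  // cacheTTL is not listed as a per-request option, so this sketch ignores it.
  { spaMode: true, headers: { "X-Trace": "abc" }, cacheTTL: 0 }
);
console.log(resolved);
```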
169
198
 
@@ -176,6 +205,9 @@ Failures raise a typed `FetchError` exposing `code`, `statusCode`, and the under
176
205
  - Explore the [`examples`](./examples) directory for scripts you can run end-to-end.
177
206
  - Ready-to-use TypeScript types ship with the package.
178
207
  - `pnpm test` runs the automated suite when you are ready to contribute.
208
+ - `pnpm eval:auto-render` runs a live Hybrid-vs-HTTP quality matrix across docs, government, knowledge, marketing, commerce, and access-guarded pages, using a stable gated core plus observe-only sentinels for harder domains.
209
+ - `pnpm test:live:auto-render` runs the same hypothesis as a Vitest live test (`LIVE_NETWORK=1`).
210
+ - GitHub Actions includes a dedicated browser-enabled live eval workflow that runs on `main` changes, nightly on a schedule, and on manual dispatch. It uploads the JSON report as a build artifact.
179
211
 
180
212
  ## Contributing
181
213
 
@@ -31,7 +31,6 @@ export declare class FetchEngine implements IEngine {
31
31
  * @throws {Error} If the content type is not HTML or for other network errors.
32
32
  */
33
33
  fetchHTML(url: string, options?: FetchEngineOptions): Promise<HTMLFetchResult>;
34
- private _injectSourceUnderH1;
35
34
  /**
36
35
  * Fetches raw content from the specified URL (mimics standard fetch API).
37
36
  *
@@ -1 +1 @@
1
- {"version":3,"file":"FetchEngine.d.ts","sourceRoot":"","sources":["../src/FetchEngine.ts"],"names":[],"mappings":"AAAA,OAAO,KAAK,EACV,eAAe,EACf,kBAAkB,EAClB,mBAAmB,EACnB,cAAc,EACd,kBAAkB,EACnB,MAAM,YAAY,CAAC;AACpB,OAAO,KAAK,EAAE,OAAO,EAAE,MAAM,cAAc,CAAC;AAG5C,OAAO,EAAE,UAAU,EAAE,MAAM,aAAa,CAAC;AAEzC;;GAEG;AACH,qBAAa,oBAAqB,SAAQ,UAAU;aAGhC,UAAU,EAAE,MAAM;gBADlC,OAAO,EAAE,MAAM,EACC,UAAU,EAAE,MAAM;CAKrC;AAED;;;;;GAKG;AACH,qBAAa,WAAY,YAAW,OAAO;IACzC,OAAO,CAAC,QAAQ,CAAC,OAAO,CAA+B;IAEvD,OAAO,CAAC,MAAM,CAAC,QAAQ,CAAC,eAAe,CAGrC;IAEF;;;OAGG;gBACS,OAAO,GAAE,kBAAuB;IAI5C;;;;;;;OAOG;IACG,SAAS,CAAC,GAAG,EAAE,MAAM,EAAE,OAAO,CAAC,EAAE,kBAAkB,GAAG,OAAO,CAAC,eAAe,CAAC;IAkFpF,OAAO,CAAC,oBAAoB;IAS5B;;;;;;;;OAQG;IACG,YAAY,CAAC,GAAG,EAAE,MAAM,EAAE,OAAO,CAAC,EAAE,mBAAmB,GAAG,OAAO,CAAC,kBAAkB,CAAC;IA8E3F;;;;OAIG;IACG,OAAO,IAAI,OAAO,CAAC,IAAI,CAAC;IAI9B;;;;OAIG;IACH,UAAU,IAAI,cAAc,EAAE;CAG/B"}
1
+ {"version":3,"file":"FetchEngine.d.ts","sourceRoot":"","sources":["../src/FetchEngine.ts"],"names":[],"mappings":"AAAA,OAAO,KAAK,EACV,eAAe,EACf,kBAAkB,EAClB,mBAAmB,EACnB,cAAc,EACd,kBAAkB,EACnB,MAAM,YAAY,CAAC;AACpB,OAAO,KAAK,EAAE,OAAO,EAAE,MAAM,cAAc,CAAC;AAG5C,OAAO,EAAE,UAAU,EAAE,MAAM,aAAa,CAAC;AAEzC;;GAEG;AACH,qBAAa,oBAAqB,SAAQ,UAAU;aAGhC,UAAU,EAAE,MAAM;gBADlC,OAAO,EAAE,MAAM,EACC,UAAU,EAAE,MAAM;CAKrC;AAED;;;;;GAKG;AACH,qBAAa,WAAY,YAAW,OAAO;IACzC,OAAO,CAAC,QAAQ,CAAC,OAAO,CAA+B;IAEvD,OAAO,CAAC,MAAM,CAAC,QAAQ,CAAC,eAAe,CAGrC;IAEF;;;OAGG;gBACS,OAAO,GAAE,kBAAuB;IAI5C;;;;;;;OAOG;IACG,SAAS,CAAC,GAAG,EAAE,MAAM,EAAE,OAAO,CAAC,EAAE,kBAAkB,GAAG,OAAO,CAAC,eAAe,CAAC;IAgFpF;;;;;;;;OAQG;IACG,YAAY,CAAC,GAAG,EAAE,MAAM,EAAE,OAAO,CAAC,EAAE,mBAAmB,GAAG,OAAO,CAAC,kBAAkB,CAAC;IA8E3F;;;;OAIG;IACG,OAAO,IAAI,OAAO,CAAC,IAAI,CAAC;IAI9B;;;;OAIG;IACH,UAAU,IAAI,cAAc,EAAE;CAG/B"}
@@ -1,4 +1,4 @@
1
- import { MarkdownConverter } from "./utils/markdown-converter.js"; // Import the converter
1
+ import { MarkdownConverter, injectSourceUrl } from "./utils/markdown-converter.js";
2
2
  import { FetchError } from "./errors.js"; // Only import FetchError
3
3
  /**
4
4
  * Custom error class for HTTP errors from FetchEngine.
@@ -76,9 +76,8 @@ export class FetchEngine {
76
76
  if (effectiveOptions.markdown) {
77
77
  try {
78
78
  const converter = new MarkdownConverter();
79
- finalContent = converter.convert(html);
80
- // Inject source URL directly under the first H1 for traceability
81
- finalContent = this._injectSourceUnderH1(finalContent, response.url || url);
79
+ finalContent = converter.convert(html, { baseUrl: response.url || url });
80
+ finalContent = injectSourceUrl(finalContent, response.url || url);
82
81
  finalContentType = "markdown";
83
82
  }
84
83
  catch (conversionError) {
@@ -107,17 +106,6 @@ export class FetchEngine {
107
106
  throw new FetchError(`Fetch failed: ${message}`, "ERR_FETCH_FAILED", error instanceof Error ? error : undefined);
108
107
  }
109
108
  }
110
- // Insert a "Source: <url>" line immediately below the first H1.
111
- _injectSourceUnderH1(markdown, sourceUrl) {
112
- if (!markdown || !sourceUrl)
113
- return markdown;
114
- // Avoid duplicate insertion if already present near the top
115
- const head = markdown.split("\n").slice(0, 50).join("\n");
116
- if (/^Source:\s+/m.test(head))
117
- return markdown;
118
- const safeUrl = sourceUrl.trim();
119
- return markdown.replace(/^(\s*#\s.*)$/m, `$1\n\nSource: ${safeUrl}`);
120
- }
121
109
  /**
122
110
  * Fetches raw content from the specified URL (mimics standard fetch API).
123
111
  *
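The removed private method suggests what the shared `injectSourceUrl` helper is expected to do. Below is a standalone sketch based on the deleted `_injectSourceUnderH1` body. The name `injectSourceUrlSketch` is hypothetical, and the actual `injectSourceUrl` exported from `./utils/markdown-converter.js` is not shown in this diff, so it may behave differently.

```javascript
// Hypothetical stand-in mirroring the removed _injectSourceUnderH1 logic.
function injectSourceUrlSketch(markdown, sourceUrl) {
  if (!markdown || !sourceUrl) return markdown;
  // Skip if a "Source:" line already exists within the first 50 lines.
  const head = markdown.split("\n").slice(0, 50).join("\n");
  if (/^Source:\s+/m.test(head)) return markdown;
  const safeUrl = sourceUrl.trim();
  // Insert below the first H1. A replacer function is used (unlike the removed
  // code) so "$" sequences in the URL are not read as replacement patterns.
  return markdown.replace(/^(\s*#\s.*)$/m, (h1) => `${h1}\n\nSource: ${safeUrl}`);
}

const md = "# Title\n\nBody text.";
const withSource = injectSourceUrlSketch(md, " https://example.com/page ");
console.log(withSource);
```

Calling the helper a second time on its own output leaves the text unchanged, which is the duplicate-insertion guard the removed comment describes. Markdown without an H1 is also returned as-is.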
@@ -1 +1 @@
1
- {"version":3,"file":"FetchEngine.js","sourceRoot":"","sources":["../src/FetchEngine.ts"],"names":[],"mappings":"AASA,OAAO,EAAE,iBAAiB,EAAE,MAAM,+BAA+B,CAAC,CAAC,uBAAuB;AAC1F,OAAO,EAAE,UAAU,EAAE,MAAM,aAAa,CAAC,CAAC,yBAAyB;AAEnE;;GAEG;AACH,MAAM,OAAO,oBAAqB,SAAQ,UAAU;IAGhC;IAFlB,YACE,OAAe,EACC,UAAkB;QAElC,KAAK,CAAC,OAAO,EAAE,gBAAgB,EAAE,SAAS,EAAE,UAAU,CAAC,CAAC;QAFxC,eAAU,GAAV,UAAU,CAAQ;QAGlC,IAAI,CAAC,IAAI,GAAG,sBAAsB,CAAC;IACrC,CAAC;CACF;AAED;;;;;GAKG;AACH,MAAM,OAAO,WAAW;IACL,OAAO,CAA+B;IAE/C,MAAM,CAAU,eAAe,GAAiC;QACtE,QAAQ,EAAE,KAAK;QACf,OAAO,EAAE,EAAE;KACZ,CAAC;IAEF;;;OAGG;IACH,YAAY,UAA8B,EAAE;QAC1C,IAAI,CAAC,OAAO,GAAG,EAAE,GAAG,WAAW,CAAC,eAAe,EAAE,GAAG,OAAO,EAAE,CAAC;IAChE,CAAC;IAED;;;;;;;OAOG;IACH,KAAK,CAAC,SAAS,CAAC,GAAW,EAAE,OAA4B;QACvD,MAAM,gBAAgB,GAAG,EAAE,GAAG,IAAI,CAAC,OAAO,EAAE,GAAG,OAAO,EAAE,CAAC,CAAC,uCAAuC;QACjG,IAAI,QAAkB,CAAC;QACvB,IAAI,CAAC;YACH,MAAM,WAAW,GAAG;gBAClB,YAAY,EACV,iHAAiH;gBACnH,MAAM,EAAE,kGAAkG;gBAC1G,iBAAiB,EAAE,gBAAgB;aACpC,CAAC;YAEF,6DAA6D;YAC7D,MAAM,kBAAkB,GAAG,IAAI,CAAC,OAAO,CAAC,OAAO,IAAI,EAAE,CAAC;YAEtD,sEAAsE;YACtE,0GAA0G;YAC1G,MAAM,mBAAmB,GAAG,OAAO,EAAE,OAAO,IAAI,EAAE,CAAC;YAEnD,MAAM,YAAY,GAAG;gBACnB,GAAG,WAAW;gBACd,GAAG,kBAAkB;gBACrB,GAAG,mBAAmB,EAAE,sFAAsF;aAC/G,CAAC;YAEF,QAAQ,GAAG,MAAM,KAAK,CAAC,GAAG,EAAE;gBAC1B,QAAQ,EAAE,QAAQ;gBAClB,OAAO,EAAE,YAAY;aACtB,CAAC,CAAC;YAEH,IAAI,CAAC,QAAQ,CAAC,EAAE,EAAE,CAAC;gBACjB,MAAM,IAAI,oBAAoB,CAAC,uBAAuB,QAAQ,CAAC,MAAM,EAAE,EAAE,QAAQ,CAAC,MAAM,CAAC,CAAC;YAC5F,CAAC;YAED,MAAM,iBAAiB,GAAG,QAAQ,CAAC,OAAO,CAAC,GAAG,CAAC,cAAc,CAAC,CAAC;YAC/D,IAAI,CAAC,iBAAiB,IAAI,CAAC,iBAAiB,CAAC,QAAQ,CAAC,WAAW,CAAC,EAAE,CAAC;gBACnE,MAAM,IAAI,UAAU,CAAC,+BAA+B,EAAE,sBAAsB,CAAC,CAAC;YAChF,CAAC;YAED,MAAM,IAAI,GAAG,MAAM,QAAQ,CAAC,IAAI,EAAE,CAAC;YACnC,MAAM,UAAU,GAAG,IAAI,CAAC,KAAK,CAAC,+BAA+B,CAAC,CAAC;YAC/D,MAAM,KAAK,GAAG,UAAU,CAAC,CAAC,CAAC,UAAU,CAAC,CAAC,CAAC,CAAC,IAAI,EAAE,CAAC,CAAC,CAAC,IAAI,CAAC;YAEvD,IAAI,YAAY,GAAG,IAAI,CAAC;YACxB,IAAI,gBAAgB,GAAwB,MAAM,CAAC;YAEnD,IAAI,gBAAgB,CAAC,QAAQ,EAAE,CAAC;gBAC9B,IAAI,CA
AC;oBACH,MAAM,SAAS,GAAG,IAAI,iBAAiB,EAAE,CAAC;oBAC1C,YAAY,GAAG,SAAS,CAAC,OAAO,CAAC,IAAI,CAAC,CAAC;oBACvC,iEAAiE;oBACjE,YAAY,GAAG,IAAI,CAAC,oBAAoB,CAAC,YAAY,EAAE,QAAQ,CAAC,GAAG,IAAI,GAAG,CAAC,CAAC;oBAC5E,gBAAgB,GAAG,UAAU,CAAC;gBAChC,CAAC;gBAAC,OAAO,eAAwB,EAAE,CAAC;oBAClC,OAAO,CAAC,KAAK,CAAC,kCAAkC,GAAG,iBAAiB,EAAE,eAAe,CAAC,CAAC;oBACvF,gDAAgD;gBAClD,CAAC;YACH,CAAC;YAED,OAAO;gBACL,OAAO,EAAE,YAAY;gBACrB,WAAW,EAAE,gBAAgB;gBAC7B,KAAK,EAAE,KAAK;gBACZ,GAAG,EAAE,QAAQ,CAAC,GAAG,EAAE,oCAAoC;gBACvD,WAAW,EAAE,KAAK;gBAClB,UAAU,EAAE,QAAQ,CAAC,MAAM;gBAC3B,KAAK,EAAE,SAAS;aACjB,CAAC;QACJ,CAAC;QAAC,OAAO,KAAc,EAAE,CAAC;YACxB,0CAA0C;YAC1C,IACE,KAAK,YAAY,oBAAoB;gBACrC,CAAC,KAAK,YAAY,UAAU,IAAI,KAAK,CAAC,IAAI,KAAK,sBAAsB,CAAC,EACtE,CAAC;gBACD,MAAM,KAAK,CAAC;YACd,CAAC;YACD,+BAA+B;YAC/B,MAAM,OAAO,GAAG,KAAK,YAAY,KAAK,CAAC,CAAC,CAAC,KAAK,CAAC,OAAO,CAAC,CAAC,CAAC,qBAAqB,CAAC;YAC/E,MAAM,IAAI,UAAU,CAAC,iBAAiB,OAAO,EAAE,EAAE,kBAAkB,EAAE,KAAK,YAAY,KAAK,CAAC,CAAC,CAAC,KAAK,CAAC,CAAC,CAAC,SAAS,CAAC,CAAC;QACnH,CAAC;IACH,CAAC;IAED,gEAAgE;IACxD,oBAAoB,CAAC,QAAgB,EAAE,SAAiB;QAC9D,IAAI,CAAC,QAAQ,IAAI,CAAC,SAAS;YAAE,OAAO,QAAQ,CAAC;QAC7C,4DAA4D;QAC5D,MAAM,IAAI,GAAG,QAAQ,CAAC,KAAK,CAAC,IAAI,CAAC,CAAC,KAAK,CAAC,CAAC,EAAE,EAAE,CAAC,CAAC,IAAI,CAAC,IAAI,CAAC,CAAC;QAC1D,IAAI,cAAc,CAAC,IAAI,CAAC,IAAI,CAAC;YAAE,OAAO,QAAQ,CAAC;QAC/C,MAAM,OAAO,GAAG,SAAS,CAAC,IAAI,EAAE,CAAC;QACjC,OAAO,QAAQ,CAAC,OAAO,CAAC,eAAe,EAAE,iBAAiB,OAAO,EAAE,CAAC,CAAC;IACvE,CAAC;IAED;;;;;;;;OAQG;IACH,KAAK,CAAC,YAAY,CAAC,GAAW,EAAE,OAA6B;QAC3D,IAAI,QAAkB,CAAC;QACvB,IAAI,CAAC;YACH,MAAM,WAAW,GAAG;gBAClB,YAAY,EACV,iHAAiH;gBACnH,MAAM,EAAE,KAAK,EAAE,mDAAmD;aACnE,CAAC;YAEF,sDAAsD;YACtD,MAAM,kBAAkB,GAAG,IAAI,CAAC,OAAO,CAAC,OAAO,IAAI,EAAE,CAAC;YACtD,MAAM,mBAAmB,GAAG,OAAO,EAAE,OAAO,IAAI,EAAE,CAAC;YAEnD,MAAM,YAAY,GAAG;gBACnB,GAAG,WAAW;gBACd,GAAG,kBAAkB;gBACrB,GAAG,mBAAmB;aACvB,CAAC;YAEF,QAAQ,GAAG,MAAM,KAAK,CAAC,GAAG,EAAE;gBAC1B,QAAQ,EAAE,QAAQ;gBAClB,OAAO,EAAE,YAAY;aACtB,CAAC,CAAC;YAEH,IAAI,CAAC,QAAQ,CAAC,EAAE,EAAE,CAAC;gBACjB,MAAM,IAAI,oBAAoB,CAAC,uBAAuB,Q
AAQ,CAAC,MAAM,EAAE,EAAE,QAAQ,CAAC,MAAM,CAAC,CAAC;YAC5F,CAAC;YAED,MAAM,iBAAiB,GAAG,QAAQ,CAAC,OAAO,CAAC,GAAG,CAAC,cAAc,CAAC,IAAI,0BAA0B,CAAC;YAE7F,+CAA+C;YAC/C,MAAM,WAAW,GACf,iBAAiB,CAAC,UAAU,CAAC,OAAO,CAAC;gBACrC,iBAAiB,CAAC,QAAQ,CAAC,MAAM,CAAC;gBAClC,iBAAiB,CAAC,QAAQ,CAAC,KAAK,CAAC;gBACjC,iBAAiB,CAAC,QAAQ,CAAC,YAAY,CAAC;gBACxC,iBAAiB,CAAC,QAAQ,CAAC,MAAM,CAAC;gBAClC,iBAAiB,CAAC,QAAQ,CAAC,KAAK,CAAC,CAAC;YAEpC,IAAI,OAAwB,CAAC;YAC7B,IAAI,WAAW,EAAE,CAAC;gBAChB,OAAO,GAAG,MAAM,QAAQ,CAAC,IAAI,EAAE,CAAC;YAClC,CAAC;iBAAM,CAAC;gBACN,MAAM,WAAW,GAAG,MAAM,QAAQ,CAAC,WAAW,EAAE,CAAC;gBACjD,OAAO,GAAG,MAAM,CAAC,IAAI,CAAC,WAAW,CAAC,CAAC;YACrC,CAAC;YAED,wCAAwC;YACxC,IAAI,KAAK,GAAkB,IAAI,CAAC;YAChC,IAAI,OAAO,OAAO,KAAK,QAAQ,IAAI,iBAAiB,CAAC,QAAQ,CAAC,MAAM,CAAC,EAAE,CAAC;gBACtE,MAAM,UAAU,GAAG,OAAO,CAAC,KAAK,CAAC,+BAA+B,CAAC,CAAC;gBAClE,KAAK,GAAG,UAAU,CAAC,CAAC,CAAC,UAAU,CAAC,CAAC,CAAC,CAAC,IAAI,EAAE,CAAC,CAAC,CAAC,IAAI,CAAC;YACnD,CAAC;YAED,OAAO;gBACL,OAAO;gBACP,WAAW,EAAE,iBAAiB;gBAC9B,KAAK;gBACL,GAAG,EAAE,QAAQ,CAAC,GAAG,EAAE,oCAAoC;gBACvD,WAAW,EAAE,KAAK;gBAClB,UAAU,EAAE,QAAQ,CAAC,MAAM;gBAC3B,KAAK,EAAE,SAAS;aACjB,CAAC;QACJ,CAAC;QAAC,OAAO,KAAc,EAAE,CAAC;YACxB,0CAA0C;YAC1C,IAAI,KAAK,YAAY,oBAAoB,EAAE,CAAC;gBAC1C,MAAM,KAAK,CAAC;YACd,CAAC;YACD,+BAA+B;YAC/B,MAAM,OAAO,GAAG,KAAK,YAAY,KAAK,CAAC,CAAC,CAAC,KAAK,CAAC,OAAO,CAAC,CAAC,CAAC,qBAAqB,CAAC;YAC/E,MAAM,IAAI,UAAU,CAClB,yBAAyB,OAAO,EAAE,EAClC,kBAAkB,EAClB,KAAK,YAAY,KAAK,CAAC,CAAC,CAAC,KAAK,CAAC,CAAC,CAAC,SAAS,CAC3C,CAAC;QACJ,CAAC;IACH,CAAC;IAED;;;;OAIG;IACH,KAAK,CAAC,OAAO;QACX,OAAO,OAAO,CAAC,OAAO,EAAE,CAAC;IAC3B,CAAC;IAED;;;;OAIG;IACH,UAAU;QACR,OAAO,EAAE,CAAC;IACZ,CAAC"}
1
+ {"version":3,"file":"FetchEngine.js","sourceRoot":"","sources":["../src/FetchEngine.ts"],"names":[],"mappings":"AASA,OAAO,EAAE,iBAAiB,EAAE,eAAe,EAAE,MAAM,+BAA+B,CAAC;AACnF,OAAO,EAAE,UAAU,EAAE,MAAM,aAAa,CAAC,CAAC,yBAAyB;AAEnE;;GAEG;AACH,MAAM,OAAO,oBAAqB,SAAQ,UAAU;IAGhC;IAFlB,YACE,OAAe,EACC,UAAkB;QAElC,KAAK,CAAC,OAAO,EAAE,gBAAgB,EAAE,SAAS,EAAE,UAAU,CAAC,CAAC;QAFxC,eAAU,GAAV,UAAU,CAAQ;QAGlC,IAAI,CAAC,IAAI,GAAG,sBAAsB,CAAC;IACrC,CAAC;CACF;AAED;;;;;GAKG;AACH,MAAM,OAAO,WAAW;IACL,OAAO,CAA+B;IAE/C,MAAM,CAAU,eAAe,GAAiC;QACtE,QAAQ,EAAE,KAAK;QACf,OAAO,EAAE,EAAE;KACZ,CAAC;IAEF;;;OAGG;IACH,YAAY,UAA8B,EAAE;QAC1C,IAAI,CAAC,OAAO,GAAG,EAAE,GAAG,WAAW,CAAC,eAAe,EAAE,GAAG,OAAO,EAAE,CAAC;IAChE,CAAC;IAED;;;;;;;OAOG;IACH,KAAK,CAAC,SAAS,CAAC,GAAW,EAAE,OAA4B;QACvD,MAAM,gBAAgB,GAAG,EAAE,GAAG,IAAI,CAAC,OAAO,EAAE,GAAG,OAAO,EAAE,CAAC,CAAC,uCAAuC;QACjG,IAAI,QAAkB,CAAC;QACvB,IAAI,CAAC;YACH,MAAM,WAAW,GAAG;gBAClB,YAAY,EACV,iHAAiH;gBACnH,MAAM,EAAE,kGAAkG;gBAC1G,iBAAiB,EAAE,gBAAgB;aACpC,CAAC;YAEF,6DAA6D;YAC7D,MAAM,kBAAkB,GAAG,IAAI,CAAC,OAAO,CAAC,OAAO,IAAI,EAAE,CAAC;YAEtD,sEAAsE;YACtE,0GAA0G;YAC1G,MAAM,mBAAmB,GAAG,OAAO,EAAE,OAAO,IAAI,EAAE,CAAC;YAEnD,MAAM,YAAY,GAAG;gBACnB,GAAG,WAAW;gBACd,GAAG,kBAAkB;gBACrB,GAAG,mBAAmB,EAAE,sFAAsF;aAC/G,CAAC;YAEF,QAAQ,GAAG,MAAM,KAAK,CAAC,GAAG,EAAE;gBAC1B,QAAQ,EAAE,QAAQ;gBAClB,OAAO,EAAE,YAAY;aACtB,CAAC,CAAC;YAEH,IAAI,CAAC,QAAQ,CAAC,EAAE,EAAE,CAAC;gBACjB,MAAM,IAAI,oBAAoB,CAAC,uBAAuB,QAAQ,CAAC,MAAM,EAAE,EAAE,QAAQ,CAAC,MAAM,CAAC,CAAC;YAC5F,CAAC;YAED,MAAM,iBAAiB,GAAG,QAAQ,CAAC,OAAO,CAAC,GAAG,CAAC,cAAc,CAAC,CAAC;YAC/D,IAAI,CAAC,iBAAiB,IAAI,CAAC,iBAAiB,CAAC,QAAQ,CAAC,WAAW,CAAC,EAAE,CAAC;gBACnE,MAAM,IAAI,UAAU,CAAC,+BAA+B,EAAE,sBAAsB,CAAC,CAAC;YAChF,CAAC;YAED,MAAM,IAAI,GAAG,MAAM,QAAQ,CAAC,IAAI,EAAE,CAAC;YACnC,MAAM,UAAU,GAAG,IAAI,CAAC,KAAK,CAAC,+BAA+B,CAAC,CAAC;YAC/D,MAAM,KAAK,GAAG,UAAU,CAAC,CAAC,CAAC,UAAU,CAAC,CAAC,CAAC,CAAC,IAAI,EAAE,CAAC,CAAC,CAAC,IAAI,CAAC;YAEvD,IAAI,YAAY,GAAG,IAAI,CAAC;YACxB,IAAI,gBAAgB,GAAwB,MAAM,CAAC;YAEnD,IAAI,gBAAgB,CAAC,QAAQ,EAAE,CAAC;gBAC9B,IAAI,CAAC
;oBACH,MAAM,SAAS,GAAG,IAAI,iBAAiB,EAAE,CAAC;oBAC1C,YAAY,GAAG,SAAS,CAAC,OAAO,CAAC,IAAI,EAAE,EAAE,OAAO,EAAE,QAAQ,CAAC,GAAG,IAAI,GAAG,EAAE,CAAC,CAAC;oBACzE,YAAY,GAAG,eAAe,CAAC,YAAY,EAAE,QAAQ,CAAC,GAAG,IAAI,GAAG,CAAC,CAAC;oBAClE,gBAAgB,GAAG,UAAU,CAAC;gBAChC,CAAC;gBAAC,OAAO,eAAwB,EAAE,CAAC;oBAClC,OAAO,CAAC,KAAK,CAAC,kCAAkC,GAAG,iBAAiB,EAAE,eAAe,CAAC,CAAC;oBACvF,gDAAgD;gBAClD,CAAC;YACH,CAAC;YAED,OAAO;gBACL,OAAO,EAAE,YAAY;gBACrB,WAAW,EAAE,gBAAgB;gBAC7B,KAAK,EAAE,KAAK;gBACZ,GAAG,EAAE,QAAQ,CAAC,GAAG,EAAE,oCAAoC;gBACvD,WAAW,EAAE,KAAK;gBAClB,UAAU,EAAE,QAAQ,CAAC,MAAM;gBAC3B,KAAK,EAAE,SAAS;aACjB,CAAC;QACJ,CAAC;QAAC,OAAO,KAAc,EAAE,CAAC;YACxB,0CAA0C;YAC1C,IACE,KAAK,YAAY,oBAAoB;gBACrC,CAAC,KAAK,YAAY,UAAU,IAAI,KAAK,CAAC,IAAI,KAAK,sBAAsB,CAAC,EACtE,CAAC;gBACD,MAAM,KAAK,CAAC;YACd,CAAC;YACD,+BAA+B;YAC/B,MAAM,OAAO,GAAG,KAAK,YAAY,KAAK,CAAC,CAAC,CAAC,KAAK,CAAC,OAAO,CAAC,CAAC,CAAC,qBAAqB,CAAC;YAC/E,MAAM,IAAI,UAAU,CAAC,iBAAiB,OAAO,EAAE,EAAE,kBAAkB,EAAE,KAAK,YAAY,KAAK,CAAC,CAAC,CAAC,KAAK,CAAC,CAAC,CAAC,SAAS,CAAC,CAAC;QACnH,CAAC;IACH,CAAC;IAED;;;;;;;;OAQG;IACH,KAAK,CAAC,YAAY,CAAC,GAAW,EAAE,OAA6B;QAC3D,IAAI,QAAkB,CAAC;QACvB,IAAI,CAAC;YACH,MAAM,WAAW,GAAG;gBAClB,YAAY,EACV,iHAAiH;gBACnH,MAAM,EAAE,KAAK,EAAE,mDAAmD;aACnE,CAAC;YAEF,sDAAsD;YACtD,MAAM,kBAAkB,GAAG,IAAI,CAAC,OAAO,CAAC,OAAO,IAAI,EAAE,CAAC;YACtD,MAAM,mBAAmB,GAAG,OAAO,EAAE,OAAO,IAAI,EAAE,CAAC;YAEnD,MAAM,YAAY,GAAG;gBACnB,GAAG,WAAW;gBACd,GAAG,kBAAkB;gBACrB,GAAG,mBAAmB;aACvB,CAAC;YAEF,QAAQ,GAAG,MAAM,KAAK,CAAC,GAAG,EAAE;gBAC1B,QAAQ,EAAE,QAAQ;gBAClB,OAAO,EAAE,YAAY;aACtB,CAAC,CAAC;YAEH,IAAI,CAAC,QAAQ,CAAC,EAAE,EAAE,CAAC;gBACjB,MAAM,IAAI,oBAAoB,CAAC,uBAAuB,QAAQ,CAAC,MAAM,EAAE,EAAE,QAAQ,CAAC,MAAM,CAAC,CAAC;YAC5F,CAAC;YAED,MAAM,iBAAiB,GAAG,QAAQ,CAAC,OAAO,CAAC,GAAG,CAAC,cAAc,CAAC,IAAI,0BAA0B,CAAC;YAE7F,+CAA+C;YAC/C,MAAM,WAAW,GACf,iBAAiB,CAAC,UAAU,CAAC,OAAO,CAAC;gBACrC,iBAAiB,CAAC,QAAQ,CAAC,MAAM,CAAC;gBAClC,iBAAiB,CAAC,QAAQ,CAAC,KAAK,CAAC;gBACjC,iBAAiB,CAAC,QAAQ,CAAC,YAAY,CAAC;gBACxC,iBAAiB,CAAC,QAAQ,CAAC,MAAM,CAAC;gBAClC,iBAAiB,CAAC,QAAQ,CAAC,KAAK,C
AAC,CAAC;YAEpC,IAAI,OAAwB,CAAC;YAC7B,IAAI,WAAW,EAAE,CAAC;gBAChB,OAAO,GAAG,MAAM,QAAQ,CAAC,IAAI,EAAE,CAAC;YAClC,CAAC;iBAAM,CAAC;gBACN,MAAM,WAAW,GAAG,MAAM,QAAQ,CAAC,WAAW,EAAE,CAAC;gBACjD,OAAO,GAAG,MAAM,CAAC,IAAI,CAAC,WAAW,CAAC,CAAC;YACrC,CAAC;YAED,wCAAwC;YACxC,IAAI,KAAK,GAAkB,IAAI,CAAC;YAChC,IAAI,OAAO,OAAO,KAAK,QAAQ,IAAI,iBAAiB,CAAC,QAAQ,CAAC,MAAM,CAAC,EAAE,CAAC;gBACtE,MAAM,UAAU,GAAG,OAAO,CAAC,KAAK,CAAC,+BAA+B,CAAC,CAAC;gBAClE,KAAK,GAAG,UAAU,CAAC,CAAC,CAAC,UAAU,CAAC,CAAC,CAAC,CAAC,IAAI,EAAE,CAAC,CAAC,CAAC,IAAI,CAAC;YACnD,CAAC;YAED,OAAO;gBACL,OAAO;gBACP,WAAW,EAAE,iBAAiB;gBAC9B,KAAK;gBACL,GAAG,EAAE,QAAQ,CAAC,GAAG,EAAE,oCAAoC;gBACvD,WAAW,EAAE,KAAK;gBAClB,UAAU,EAAE,QAAQ,CAAC,MAAM;gBAC3B,KAAK,EAAE,SAAS;aACjB,CAAC;QACJ,CAAC;QAAC,OAAO,KAAc,EAAE,CAAC;YACxB,0CAA0C;YAC1C,IAAI,KAAK,YAAY,oBAAoB,EAAE,CAAC;gBAC1C,MAAM,KAAK,CAAC;YACd,CAAC;YACD,+BAA+B;YAC/B,MAAM,OAAO,GAAG,KAAK,YAAY,KAAK,CAAC,CAAC,CAAC,KAAK,CAAC,OAAO,CAAC,CAAC,CAAC,qBAAqB,CAAC;YAC/E,MAAM,IAAI,UAAU,CAClB,yBAAyB,OAAO,EAAE,EAClC,kBAAkB,EAClB,KAAK,YAAY,KAAK,CAAC,CAAC,CAAC,KAAK,CAAC,CAAC,CAAC,SAAS,CAC3C,CAAC;QACJ,CAAC;IACH,CAAC;IAED;;;;OAIG;IACH,KAAK,CAAC,OAAO;QACX,OAAO,OAAO,CAAC,OAAO,EAAE,CAAC;IAC3B,CAAC;IAED;;;;OAIG;IACH,UAAU;QACR,OAAO,EAAE,CAAC;IACZ,CAAC"}
@@ -9,7 +9,8 @@ export declare class HybridEngine implements IEngine {
9
9
  private readonly config;
10
10
  private readonly playwrightOnlyPatterns;
11
11
  constructor(config?: PlaywrightEngineConfig);
12
- private _isSpaShell;
12
+ private _convertHtmlToMarkdown;
13
+ private _shouldAutoRender;
13
14
  fetchHTML(url: string, options?: FetchOptions): Promise<HTMLFetchResult>;
14
15
  /**
15
16
  * Fetches raw content from the specified URL using the hybrid approach.
@@ -1 +1 @@
1
- {"version":3,"file":"HybridEngine.d.ts","sourceRoot":"","sources":["../src/HybridEngine.ts"],"names":[],"mappings":"AAEA,OAAO,KAAK,EAAE,OAAO,EAAE,MAAM,cAAc,CAAC;AAC5C,OAAO,KAAK,EACV,eAAe,EACf,kBAAkB,EAClB,mBAAmB,EACnB,sBAAsB,EACtB,YAAY,EACZ,cAAc,EACf,MAAM,YAAY,CAAC;AAEpB;;GAEG;AACH,qBAAa,YAAa,YAAW,OAAO;IAC1C,OAAO,CAAC,QAAQ,CAAC,WAAW,CAAc;IAC1C,OAAO,CAAC,QAAQ,CAAC,gBAAgB,CAAmB;IACpD,OAAO,CAAC,QAAQ,CAAC,MAAM,CAAyB;IAChD,OAAO,CAAC,QAAQ,CAAC,sBAAsB,CAAsB;gBAEjD,MAAM,GAAE,sBAA2B;IAU/C,OAAO,CAAC,WAAW;IAkBb,SAAS,CAAC,GAAG,EAAE,MAAM,EAAE,OAAO,GAAE,YAAiB,GAAG,OAAO,CAAC,eAAe,CAAC;IAsFlF;;;;;;;;;OASG;IACG,YAAY,CAAC,GAAG,EAAE,MAAM,EAAE,OAAO,GAAE,mBAAwB,GAAG,OAAO,CAAC,kBAAkB,CAAC;IA0C/F;;OAEG;IACH,UAAU,IAAI,cAAc,EAAE;IAI9B;;OAEG;IACG,OAAO,IAAI,OAAO,CAAC,IAAI,CAAC;CAM/B"}
1
+ {"version":3,"file":"HybridEngine.d.ts","sourceRoot":"","sources":["../src/HybridEngine.ts"],"names":[],"mappings":"AAEA,OAAO,KAAK,EAAE,OAAO,EAAE,MAAM,cAAc,CAAC;AAQ5C,OAAO,KAAK,EACV,eAAe,EACf,kBAAkB,EAClB,mBAAmB,EACnB,sBAAsB,EACtB,YAAY,EACZ,cAAc,EACf,MAAM,YAAY,CAAC;AAEpB;;GAEG;AACH,qBAAa,YAAa,YAAW,OAAO;IAC1C,OAAO,CAAC,QAAQ,CAAC,WAAW,CAAc;IAC1C,OAAO,CAAC,QAAQ,CAAC,gBAAgB,CAAmB;IACpD,OAAO,CAAC,QAAQ,CAAC,MAAM,CAAyB;IAChD,OAAO,CAAC,QAAQ,CAAC,sBAAsB,CAAsB;gBAEjD,MAAM,GAAE,sBAA2B;IAS/C,OAAO,CAAC,sBAAsB;IAkB9B,OAAO,CAAC,iBAAiB;IAUnB,SAAS,CAAC,GAAG,EAAE,MAAM,EAAE,OAAO,GAAE,YAAiB,GAAG,OAAO,CAAC,eAAe,CAAC;IAsGlF;;;;;;;;;OASG;IACG,YAAY,CAAC,GAAG,EAAE,MAAM,EAAE,OAAO,GAAE,mBAAwB,GAAG,OAAO,CAAC,kBAAkB,CAAC;IA0C/F;;OAEG;IACH,UAAU,IAAI,cAAc,EAAE;IAI9B;;OAEG;IACG,OAAO,IAAI,OAAO,CAAC,IAAI,CAAC;CAM/B"}
@@ -1,5 +1,7 @@
1
1
  import { FetchEngine, FetchEngineHttpError } from "./FetchEngine.js";
2
2
  import { PlaywrightEngine } from "./PlaywrightEngine.js";
3
+ import { MarkdownConverter, injectSourceUrl } from "./utils/markdown-converter.js";
4
+ import { assessHtmlRenderNeed, assessSerializedContent, isRenderedContentMeaningfullyBetter, isSoftBlockPage, } from "./utils/render-detection.js";
3
5
  /**
4
6
  * HybridEngine - Tries FetchEngine first, falls back to PlaywrightEngine on failure.
5
7
  */
@@ -10,30 +12,35 @@ export class HybridEngine {
  playwrightOnlyPatterns;
  constructor(config = {}) {
  // Pass relevant config parts to each engine
- // FetchEngine only takes markdown option from the shared config
- // spaMode from config is primarily for PlaywrightEngine, but HybridEngine uses it for decision making.
- this.fetchEngine = new FetchEngine({ markdown: config.markdown, headers: config.headers });
+ // HybridEngine fetches raw HTML first so it can decide whether rendering is necessary.
+ this.fetchEngine = new FetchEngine({ markdown: false, headers: config.headers });
  this.playwrightEngine = new PlaywrightEngine(config);
  this.config = config; // Store for merging later
  this.playwrightOnlyPatterns = config.playwrightOnlyPatterns || [];
  }
- _isSpaShell(htmlContent) {
- if (!htmlContent || htmlContent.length < 150) {
- // Very short content might be a shell or error
- // Heuristic: if it's very short AND contains noscript, good chance it's a shell.
- if (htmlContent.includes("<noscript>"))
- return true;
+ _convertHtmlToMarkdown(htmlResult) {
+ try {
+ const converter = new MarkdownConverter();
+ const content = injectSourceUrl(converter.convert(htmlResult.content, { baseUrl: htmlResult.url }), htmlResult.url);
+ return {
+ ...htmlResult,
+ content,
+ contentType: "markdown",
+ };
  }
- // Check for <noscript> tag
- if (htmlContent.includes("<noscript>"))
- return true;
- // Check for common empty root divs
- if (/<div id=(?:"|')?(root|app)(?:"|')?[^>]*>\s*<\/div>/i.test(htmlContent))
+ catch (conversionError) {
+ console.error(`HybridEngine: Markdown conversion failed for ${htmlResult.url}:`, conversionError);
+ return htmlResult;
+ }
+ }
+ _shouldAutoRender(fetchResult, forceSpaMode) {
+ if (forceSpaMode) {
  return true;
- // Check for empty title tag or no title tag at all
- if (/<title>\s*<\/title>/i.test(htmlContent) || !/<title[^>]*>/i.test(htmlContent))
+ }
+ if (isSoftBlockPage(fetchResult.content)) {
  return true;
- return false;
+ }
+ return assessHtmlRenderNeed(fetchResult.content).renderLikelyNeeded;
  }
  async fetchHTML(url, options = {}) {
  // Determine effective SPA mode and markdown options
@@ -70,22 +77,36 @@ export class HybridEngine {
  }
  }
  try {
- // Prepare options for FetchEngine call
- const fetchEngineCallSpecificOptions = {
- markdown: effectiveMarkdown, // Pass the resolved markdown setting
- headers: options.headers, // Pass only the request-specific headers. FetchEngine will merge these with its own constructor headers.
+ const fetchResult = await this.fetchEngine.fetchHTML(url, {
+ markdown: false,
+ headers: options.headers,
+ });
+ const httpPreferredResult = effectiveMarkdown ? this._convertHtmlToMarkdown(fetchResult) : fetchResult;
+ if (!this._shouldAutoRender(fetchResult, effectiveSpaMode)) {
+ return httpPreferredResult;
+ }
+ console.warn(`HybridEngine: HTTP fetch for ${url} looks incomplete. Attempting Playwright render.`);
+ // Skip HTTP fallback (we already know it's a shell) and use SPA rendering path for patient waits.
+ const autoRenderOptions = {
+ ...playwrightOptions,
+ useHttpFallback: false,
+ spaMode: true,
  };
- const fetchResult = await this.fetchEngine.fetchHTML(url, fetchEngineCallSpecificOptions);
- // If FetchEngine succeeded AND spaMode is active, check if it's just a shell
- if (effectiveSpaMode && fetchResult && fetchResult.content) {
- if (this._isSpaShell(fetchResult.content)) {
- console.warn(`HybridEngine: FetchEngine returned likely SPA shell for ${url} in spaMode. Forcing PlaywrightEngine.`);
- // Fallback to PlaywrightEngine, passing the determined effective options
- return this.playwrightEngine.fetchHTML(url, playwrightOptions);
+ try {
+ const playwrightResult = await this.playwrightEngine.fetchHTML(url, autoRenderOptions);
+ const staticAssessment = assessSerializedContent(httpPreferredResult.content, httpPreferredResult.contentType);
+ const renderedAssessment = assessSerializedContent(playwrightResult.content, playwrightResult.contentType);
+ if (!isRenderedContentMeaningfullyBetter(staticAssessment, renderedAssessment)) {
+ console.warn(`HybridEngine: Playwright render for ${url} was not meaningfully better. Keeping HTTP result.`);
+ return httpPreferredResult;
  }
+ return playwrightResult;
+ }
+ catch (playwrightError) {
+ const pwMessage = playwrightError instanceof Error ? playwrightError.message : String(playwrightError);
+ console.warn(`HybridEngine: Playwright render failed for ${url}: ${pwMessage}. Returning HTTP result.`);
+ return httpPreferredResult;
  }
- // If not spaMode, or if spaMode but content is not a shell, return FetchEngine's result
- return fetchResult;
  }
  catch (fetchError) {
  // If FetchEngine returned a 404, do not attempt Playwright fallback
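The hunks above replace the old `_isSpaShell` heuristic with an ordered check (forced SPA mode, then soft-block detection, then a render-need assessment) and keep the HTTP result whenever the Playwright render fails or is not meaningfully better. A minimal sketch of that control flow follows. The helper bodies are hypothetical stand-ins, since the real `isSoftBlockPage`, `assessHtmlRenderNeed`, and `isRenderedContentMeaningfullyBetter` live in `./utils/render-detection.js` and are not shown in this diff. `visibleTextLength`, `shouldAutoRender`, and `resolveResult` are illustrative names, and the calls are synchronous for brevity, whereas the engine itself is async.

```javascript
// Hypothetical stand-in: the real soft-block detection is not shown in this diff.
function isSoftBlockPage(html) {
  return /access denied|verify you are human/i.test(html);
}

// Hypothetical stand-in: reuses the empty-root regex from the removed _isSpaShell.
function assessHtmlRenderNeed(html) {
  const emptyRoot = /<div id=(?:"|')?(root|app)(?:"|')?[^>]*>\s*<\/div>/i.test(html);
  return { renderLikelyNeeded: emptyRoot };
}

// Illustrative helper: rough count of visible characters once tags are stripped.
function visibleTextLength(html) {
  return html.replace(/<[^>]*>/g, " ").replace(/\s+/g, " ").trim().length;
}

// Hypothetical stand-in: "better" here means at least 1.5x more visible text.
function isRenderedContentMeaningfullyBetter(staticHtml, renderedHtml) {
  return visibleTextLength(renderedHtml) > visibleTextLength(staticHtml) * 1.5;
}

// Same order of checks as _shouldAutoRender in the diff.
function shouldAutoRender(fetchResult, forceSpaMode) {
  if (forceSpaMode) {
    return true;
  }
  if (isSoftBlockPage(fetchResult.content)) {
    return true;
  }
  return assessHtmlRenderNeed(fetchResult.content).renderLikelyNeeded;
}

// Prefer the HTTP result unless rendering is warranted AND clearly better.
// `render` is a caller-supplied function standing in for the Playwright fetch.
function resolveResult(httpResult, forceSpaMode, render) {
  if (!shouldAutoRender(httpResult, forceSpaMode)) {
    return httpResult;
  }
  try {
    const rendered = render();
    return isRenderedContentMeaningfullyBetter(httpResult.content, rendered.content)
      ? rendered
      : httpResult;
  } catch (renderError) {
    // A failed render is not fatal: fall back to what HTTP already returned.
    return httpResult;
  }
}
```

In this shape a browser failure can only cost time, never the result: the HTTP response is computed first and is returned on every path that does not yield a clearly better render.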
@@ -1 +1 @@
- {"version":3,"file":"HybridEngine.js","sourceRoot":"","sources":["../src/HybridEngine.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,WAAW,EAAE,oBAAoB,EAAE,MAAM,kBAAkB,CAAC;AACrE,OAAO,EAAE,gBAAgB,EAAE,MAAM,uBAAuB,CAAC;AAWzD;;GAEG;AACH,MAAM,OAAO,YAAY;IACN,WAAW,CAAc;IACzB,gBAAgB,CAAmB;IACnC,MAAM,CAAyB,CAAC,sDAAsD;IACtF,sBAAsB,CAAsB;IAE7D,YAAY,SAAiC,EAAE;QAC7C,4CAA4C;QAC5C,gEAAgE;QAChE,uGAAuG;QACvG,IAAI,CAAC,WAAW,GAAG,IAAI,WAAW,CAAC,EAAE,QAAQ,EAAE,MAAM,CAAC,QAAQ,EAAE,OAAO,EAAE,MAAM,CAAC,OAAO,EAAE,CAAC,CAAC;QAC3F,IAAI,CAAC,gBAAgB,GAAG,IAAI,gBAAgB,CAAC,MAAM,CAAC,CAAC;QACrD,IAAI,CAAC,MAAM,GAAG,MAAM,CAAC,CAAC,0BAA0B;QAChD,IAAI,CAAC,sBAAsB,GAAG,MAAM,CAAC,sBAAsB,IAAI,EAAE,CAAC;IACpE,CAAC;IAEO,WAAW,CAAC,WAAmB;QACrC,IAAI,CAAC,WAAW,IAAI,WAAW,CAAC,MAAM,GAAG,GAAG,EAAE,CAAC;YAC7C,+CAA+C;YAC/C,iFAAiF;YACjF,IAAI,WAAW,CAAC,QAAQ,CAAC,YAAY,CAAC;gBAAE,OAAO,IAAI,CAAC;QACtD,CAAC;QACD,2BAA2B;QAC3B,IAAI,WAAW,CAAC,QAAQ,CAAC,YAAY,CAAC;YAAE,OAAO,IAAI,CAAC;QAEpD,mCAAmC;QACnC,IAAI,qDAAqD,CAAC,IAAI,CAAC,WAAW,CAAC;YAAE,OAAO,IAAI,CAAC;QAEzF,mDAAmD;QACnD,IAAI,sBAAsB,CAAC,IAAI,CAAC,WAAW,CAAC,IAAI,CAAC,eAAe,CAAC,IAAI,CAAC,WAAW,CAAC;YAAE,OAAO,IAAI,CAAC;QAEhG,OAAO,KAAK,CAAC;IACf,CAAC;IAED,KAAK,CAAC,SAAS,CAAC,GAAW,EAAE,UAAwB,EAAE;QACrD,oDAAoD;QACpD,gHAAgH;QAChH,MAAM,gBAAgB,GACpB,OAAO,CAAC,OAAO,KAAK,SAAS,CAAC,CAAC,CAAC,OAAO,CAAC,OAAO,CAAC,CAAC,CAAC,IAAI,CAAC,MAAM,CAAC,OAAO,KAAK,SAAS,CAAC,CAAC,CAAC,IAAI,CAAC,MAAM,CAAC,OAAO,CAAC,CAAC,CAAC,KAAK,CAAC;QACpH,MAAM,iBAAiB,GACrB,OAAO,CAAC,QAAQ,KAAK,SAAS;YAC5B,CAAC,CAAC,OAAO,CAAC,QAAQ;YAClB,CAAC,CAAC,IAAI,CAAC,MAAM,CAAC,QAAQ,KAAK,SAAS;gBAClC,CAAC,CAAC,IAAI,CAAC,MAAM,CAAC,QAAQ;gBACtB,CAAC,CAAC,KAAK,CAAC;QAEd,yFAAyF;QACzF,mEAAmE;QACnE,MAAM,kBAAkB,GAAG,IAAI,CAAC,MAAM,CAAC,OAAO,IAAI,EAAE,CAAC;QACrD,MAAM,sBAAsB,GAAG,OAAO,CAAC,OAAO,IAAI,EAAE,CAAC,CAAC,mEAAmE;QAEzH,8DAA8D;QAC9D,MAAM,0BAA0B,GAAG,EAAE,GAAG,kBAAkB,EAAE,GAAG,sBAAsB,EAAE,CAAC;QAExF,kEAAkE;QAClE,MAAM,iBAAiB,GAInB;YACF,GAAG,IAAI,CAAC,MAAM,EAAE,gEAAgE;YAChF,GAAG,OAAO,EAAE,yDAAyD;YACrE,OAAO,EAAE,0BAA0B,EAAE,sCAAsC;YA
C3E,QAAQ,EAAE,iBAAiB;YAC3B,OAAO,EAAE,gBAAgB;SAC1B,CAAC;QAEF,qCAAqC;QACrC,KAAK,MAAM,OAAO,IAAI,IAAI,CAAC,sBAAsB,EAAE,CAAC;YAClD,IAAI,OAAO,OAAO,KAAK,QAAQ,IAAI,GAAG,CAAC,QAAQ,CAAC,OAAO,CAAC,EAAE,CAAC;gBACzD,OAAO,CAAC,IAAI,CAAC,qBAAqB,GAAG,4BAA4B,OAAO,qCAAqC,CAAC,CAAC;gBAC/G,OAAO,IAAI,CAAC,gBAAgB,CAAC,SAAS,CAAC,GAAG,EAAE,iBAAiB,CAAC,CAAC;YACjE,CAAC;iBAAM,IAAI,OAAO,YAAY,MAAM,IAAI,OAAO,CAAC,IAAI,CAAC,GAAG,CAAC,EAAE,CAAC;gBAC1D,OAAO,CAAC,IAAI,CACV,qBAAqB,GAAG,2BAA2B,OAAO,CAAC,QAAQ,EAAE,qCAAqC,CAC3G,CAAC;gBACF,OAAO,IAAI,CAAC,gBAAgB,CAAC,SAAS,CAAC,GAAG,EAAE,iBAAiB,CAAC,CAAC;YACjE,CAAC;QACH,CAAC;QAED,IAAI,CAAC;YACH,uCAAuC;YACvC,MAAM,8BAA8B,GAAiB;gBACnD,QAAQ,EAAE,iBAAiB,EAAE,qCAAqC;gBAClE,OAAO,EAAE,OAAO,CAAC,OAAO,EAAE,yGAAyG;aACpI,CAAC;YACF,MAAM,WAAW,GAAG,MAAM,IAAI,CAAC,WAAW,CAAC,SAAS,CAAC,GAAG,EAAE,8BAA8B,CAAC,CAAC;YAE1F,6EAA6E;YAC7E,IAAI,gBAAgB,IAAI,WAAW,IAAI,WAAW,CAAC,OAAO,EAAE,CAAC;gBAC3D,IAAI,IAAI,CAAC,WAAW,CAAC,WAAW,CAAC,OAAO,CAAC,EAAE,CAAC;oBAC1C,OAAO,CAAC,IAAI,CACV,2DAA2D,GAAG,wCAAwC,CACvG,CAAC;oBACF,yEAAyE;oBACzE,OAAO,IAAI,CAAC,gBAAgB,CAAC,SAAS,CAAC,GAAG,EAAE,iBAAiB,CAAC,CAAC;gBACjE,CAAC;YACH,CAAC;YACD,wFAAwF;YACxF,OAAO,WAAW,CAAC;QACrB,CAAC;QAAC,OAAO,UAAmB,EAAE,CAAC;YAC7B,oEAAoE;YACpE,IAAI,UAAU,YAAY,oBAAoB,IAAI,UAAU,CAAC,UAAU,KAAK,GAAG,EAAE,CAAC;gBAChF,OAAO,CAAC,IAAI,CAAC,8CAA8C,GAAG,qBAAqB,CAAC,CAAC;gBACrF,MAAM,UAAU,CAAC;YACnB,CAAC;YACD,MAAM,OAAO,GAAG,UAAU,YAAY,KAAK,CAAC,CAAC,CAAC,UAAU,CAAC,OAAO,CAAC,CAAC,CAAC,MAAM,CAAC,UAAU,CAAC,CAAC;YACtF,OAAO,CAAC,IAAI,CAAC,wCAAwC,GAAG,KAAK,OAAO,qCAAqC,CAAC,CAAC;YAC3G,IAAI,CAAC;gBACH,yEAAyE;gBACzE,MAAM,gBAAgB,GAAG,MAAM,IAAI,CAAC,gBAAgB,CAAC,SAAS,CAAC,GAAG,EAAE,iBAAiB,CAAC,CAAC;gBACvF,OAAO,gBAAgB,CAAC;YAC1B,CAAC;YAAC,OAAO,eAAwB,EAAE,CAAC;gBAClC,MAAM,SAAS,GAAG,eAAe,YAAY,KAAK,CAAC,CAAC,CAAC,eAAe,CAAC,OAAO,CAAC,CAAC,CAAC,MAAM,CAAC,eAAe,CAAC,CAAC;gBACvG,OAAO,CAAC,KAAK,CAAC,2DAA2D,GAAG,KAAK,SAAS,EAAE,CAAC,CAAC;gBAC9F,MAAM,eAAe,CAAC,CAAC,8DAA8D;YACvF,CAAC;QACH,CAAC;IACH,CAAC;IAED;;;;;;;;;OASG;IACH,KAAK,CAAC,YAAY,CAAC,GAAW,EAAE,UAA+B,EAAE;
QAC/D,qCAAqC;QACrC,KAAK,MAAM,OAAO,IAAI,IAAI,CAAC,sBAAsB,EAAE,CAAC;YAClD,IAAI,OAAO,OAAO,KAAK,QAAQ,IAAI,GAAG,CAAC,QAAQ,CAAC,OAAO,CAAC,EAAE,CAAC;gBACzD,OAAO,CAAC,IAAI,CACV,qBAAqB,GAAG,4BAA4B,OAAO,uDAAuD,CACnH,CAAC;gBACF,OAAO,IAAI,CAAC,gBAAgB,CAAC,YAAY,CAAC,GAAG,EAAE,OAAO,CAAC,CAAC;YAC1D,CAAC;iBAAM,IAAI,OAAO,YAAY,MAAM,IAAI,OAAO,CAAC,IAAI,CAAC,GAAG,CAAC,EAAE,CAAC;gBAC1D,OAAO,CAAC,IAAI,CACV,qBAAqB,GAAG,2BAA2B,OAAO,CAAC,QAAQ,EAAE,uDAAuD,CAC7H,CAAC;gBACF,OAAO,IAAI,CAAC,gBAAgB,CAAC,YAAY,CAAC,GAAG,EAAE,OAAO,CAAC,CAAC;YAC1D,CAAC;QACH,CAAC;QAED,IAAI,CAAC;YACH,wBAAwB;YACxB,MAAM,WAAW,GAAG,MAAM,IAAI,CAAC,WAAW,CAAC,YAAY,CAAC,GAAG,EAAE,OAAO,CAAC,CAAC;YACtE,OAAO,WAAW,CAAC;QACrB,CAAC;QAAC,OAAO,UAAmB,EAAE,CAAC;YAC7B,oEAAoE;YACpE,IAAI,UAAU,YAAY,oBAAoB,IAAI,UAAU,CAAC,UAAU,KAAK,GAAG,EAAE,CAAC;gBAChF,OAAO,CAAC,IAAI,CAAC,4DAA4D,GAAG,qBAAqB,CAAC,CAAC;gBACnG,MAAM,UAAU,CAAC;YACnB,CAAC;YACD,MAAM,OAAO,GAAG,UAAU,YAAY,KAAK,CAAC,CAAC,CAAC,UAAU,CAAC,OAAO,CAAC,CAAC,CAAC,MAAM,CAAC,UAAU,CAAC,CAAC;YACtF,OAAO,CAAC,IAAI,CACV,sDAAsD,GAAG,KAAK,OAAO,qCAAqC,CAC3G,CAAC;YACF,IAAI,CAAC;gBACH,+BAA+B;gBAC/B,MAAM,gBAAgB,GAAG,MAAM,IAAI,CAAC,gBAAgB,CAAC,YAAY,CAAC,GAAG,EAAE,OAAO,CAAC,CAAC;gBAChF,OAAO,gBAAgB,CAAC;YAC1B,CAAC;YAAC,OAAO,eAAwB,EAAE,CAAC;gBAClC,MAAM,SAAS,GAAG,eAAe,YAAY,KAAK,CAAC,CAAC,CAAC,eAAe,CAAC,OAAO,CAAC,CAAC,CAAC,MAAM,CAAC,eAAe,CAAC,CAAC;gBACvG,OAAO,CAAC,KAAK,CAAC,yEAAyE,GAAG,KAAK,SAAS,EAAE,CAAC,CAAC;gBAC5G,MAAM,eAAe,CAAC,CAAC,8DAA8D;YACvF,CAAC;QACH,CAAC;IACH,CAAC;IAED;;OAEG;IACH,UAAU;QACR,OAAO,IAAI,CAAC,gBAAgB,CAAC,UAAU,EAAE,CAAC;IAC5C,CAAC;IAED;;OAEG;IACH,KAAK,CAAC,OAAO;QACX,MAAM,OAAO,CAAC,UAAU,CAAC;YACvB,IAAI,CAAC,WAAW,CAAC,OAAO,EAAE,EAAE,yCAAyC;YACrE,IAAI,CAAC,gBAAgB,CAAC,OAAO,EAAE;SAChC,CAAC,CAAC;IACL,CAAC;CACF"}
+ {"version":3,"file":"HybridEngine.js","sourceRoot":"","sources":["../src/HybridEngine.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,WAAW,EAAE,oBAAoB,EAAE,MAAM,kBAAkB,CAAC;AACrE,OAAO,EAAE,gBAAgB,EAAE,MAAM,uBAAuB,CAAC;AAEzD,OAAO,EAAE,iBAAiB,EAAE,eAAe,EAAE,MAAM,+BAA+B,CAAC;AACnF,OAAO,EACL,oBAAoB,EACpB,uBAAuB,EACvB,mCAAmC,EACnC,eAAe,GAChB,MAAM,6BAA6B,CAAC;AAUrC;;GAEG;AACH,MAAM,OAAO,YAAY;IACN,WAAW,CAAc;IACzB,gBAAgB,CAAmB;IACnC,MAAM,CAAyB,CAAC,sDAAsD;IACtF,sBAAsB,CAAsB;IAE7D,YAAY,SAAiC,EAAE;QAC7C,4CAA4C;QAC5C,uFAAuF;QACvF,IAAI,CAAC,WAAW,GAAG,IAAI,WAAW,CAAC,EAAE,QAAQ,EAAE,KAAK,EAAE,OAAO,EAAE,MAAM,CAAC,OAAO,EAAE,CAAC,CAAC;QACjF,IAAI,CAAC,gBAAgB,GAAG,IAAI,gBAAgB,CAAC,MAAM,CAAC,CAAC;QACrD,IAAI,CAAC,MAAM,GAAG,MAAM,CAAC,CAAC,0BAA0B;QAChD,IAAI,CAAC,sBAAsB,GAAG,MAAM,CAAC,sBAAsB,IAAI,EAAE,CAAC;IACpE,CAAC;IAEO,sBAAsB,CAAC,UAA2B;QACxD,IAAI,CAAC;YACH,MAAM,SAAS,GAAG,IAAI,iBAAiB,EAAE,CAAC;YAC1C,MAAM,OAAO,GAAG,eAAe,CAC7B,SAAS,CAAC,OAAO,CAAC,UAAU,CAAC,OAAO,EAAE,EAAE,OAAO,EAAE,UAAU,CAAC,GAAG,EAAE,CAAC,EAClE,UAAU,CAAC,GAAG,CACf,CAAC;YACF,OAAO;gBACL,GAAG,UAAU;gBACb,OAAO;gBACP,WAAW,EAAE,UAAU;aACxB,CAAC;QACJ,CAAC;QAAC,OAAO,eAAwB,EAAE,CAAC;YAClC,OAAO,CAAC,KAAK,CAAC,gDAAgD,UAAU,CAAC,GAAG,GAAG,EAAE,eAAe,CAAC,CAAC;YAClG,OAAO,UAAU,CAAC;QACpB,CAAC;IACH,CAAC;IAEO,iBAAiB,CAAC,WAA4B,EAAE,YAAqB;QAC3E,IAAI,YAAY,EAAE,CAAC;YACjB,OAAO,IAAI,CAAC;QACd,CAAC;QACD,IAAI,eAAe,CAAC,WAAW,CAAC,OAAO,CAAC,EAAE,CAAC;YACzC,OAAO,IAAI,CAAC;QACd,CAAC;QACD,OAAO,oBAAoB,CAAC,WAAW,CAAC,OAAO,CAAC,CAAC,kBAAkB,CAAC;IACtE,CAAC;IAED,KAAK,CAAC,SAAS,CAAC,GAAW,EAAE,UAAwB,EAAE;QACrD,oDAAoD;QACpD,gHAAgH;QAChH,MAAM,gBAAgB,GACpB,OAAO,CAAC,OAAO,KAAK,SAAS,CAAC,CAAC,CAAC,OAAO,CAAC,OAAO,CAAC,CAAC,CAAC,IAAI,CAAC,MAAM,CAAC,OAAO,KAAK,SAAS,CAAC,CAAC,CAAC,IAAI,CAAC,MAAM,CAAC,OAAO,CAAC,CAAC,CAAC,KAAK,CAAC;QACpH,MAAM,iBAAiB,GACrB,OAAO,CAAC,QAAQ,KAAK,SAAS;YAC5B,CAAC,CAAC,OAAO,CAAC,QAAQ;YAClB,CAAC,CAAC,IAAI,CAAC,MAAM,CAAC,QAAQ,KAAK,SAAS;gBAClC,CAAC,CAAC,IAAI,CAAC,MAAM,CAAC,QAAQ;gBACtB,CAAC,CAAC,KAAK,CAAC;QAEd,yFAAyF;QACzF,mEAAmE;QACnE,MAAM,kBAAkB,GAAG,
IAAI,CAAC,MAAM,CAAC,OAAO,IAAI,EAAE,CAAC;QACrD,MAAM,sBAAsB,GAAG,OAAO,CAAC,OAAO,IAAI,EAAE,CAAC,CAAC,mEAAmE;QAEzH,8DAA8D;QAC9D,MAAM,0BAA0B,GAAG,EAAE,GAAG,kBAAkB,EAAE,GAAG,sBAAsB,EAAE,CAAC;QAExF,kEAAkE;QAClE,MAAM,iBAAiB,GAInB;YACF,GAAG,IAAI,CAAC,MAAM,EAAE,gEAAgE;YAChF,GAAG,OAAO,EAAE,yDAAyD;YACrE,OAAO,EAAE,0BAA0B,EAAE,sCAAsC;YAC3E,QAAQ,EAAE,iBAAiB;YAC3B,OAAO,EAAE,gBAAgB;SAC1B,CAAC;QAEF,qCAAqC;QACrC,KAAK,MAAM,OAAO,IAAI,IAAI,CAAC,sBAAsB,EAAE,CAAC;YAClD,IAAI,OAAO,OAAO,KAAK,QAAQ,IAAI,GAAG,CAAC,QAAQ,CAAC,OAAO,CAAC,EAAE,CAAC;gBACzD,OAAO,CAAC,IAAI,CAAC,qBAAqB,GAAG,4BAA4B,OAAO,qCAAqC,CAAC,CAAC;gBAC/G,OAAO,IAAI,CAAC,gBAAgB,CAAC,SAAS,CAAC,GAAG,EAAE,iBAAiB,CAAC,CAAC;YACjE,CAAC;iBAAM,IAAI,OAAO,YAAY,MAAM,IAAI,OAAO,CAAC,IAAI,CAAC,GAAG,CAAC,EAAE,CAAC;gBAC1D,OAAO,CAAC,IAAI,CACV,qBAAqB,GAAG,2BAA2B,OAAO,CAAC,QAAQ,EAAE,qCAAqC,CAC3G,CAAC;gBACF,OAAO,IAAI,CAAC,gBAAgB,CAAC,SAAS,CAAC,GAAG,EAAE,iBAAiB,CAAC,CAAC;YACjE,CAAC;QACH,CAAC;QAED,IAAI,CAAC;YACH,MAAM,WAAW,GAAG,MAAM,IAAI,CAAC,WAAW,CAAC,SAAS,CAAC,GAAG,EAAE;gBACxD,QAAQ,EAAE,KAAK;gBACf,OAAO,EAAE,OAAO,CAAC,OAAO;aACzB,CAAC,CAAC;YACH,MAAM,mBAAmB,GAAG,iBAAiB,CAAC,CAAC,CAAC,IAAI,CAAC,sBAAsB,CAAC,WAAW,CAAC,CAAC,CAAC,CAAC,WAAW,CAAC;YAEvG,IAAI,CAAC,IAAI,CAAC,iBAAiB,CAAC,WAAW,EAAE,gBAAgB,CAAC,EAAE,CAAC;gBAC3D,OAAO,mBAAmB,CAAC;YAC7B,CAAC;YAED,OAAO,CAAC,IAAI,CAAC,gCAAgC,GAAG,kDAAkD,CAAC,CAAC;YAEpG,kGAAkG;YAClG,MAAM,iBAAiB,GAAG;gBACxB,GAAG,iBAAiB;gBACpB,eAAe,EAAE,KAAK;gBACtB,OAAO,EAAE,IAAI;aACd,CAAC;YAEF,IAAI,CAAC;gBACH,MAAM,gBAAgB,GAAG,MAAM,IAAI,CAAC,gBAAgB,CAAC,SAAS,CAAC,GAAG,EAAE,iBAAiB,CAAC,CAAC;gBACvF,MAAM,gBAAgB,GAAG,uBAAuB,CAAC,mBAAmB,CAAC,OAAO,EAAE,mBAAmB,CAAC,WAAW,CAAC,CAAC;gBAC/G,MAAM,kBAAkB,GAAG,uBAAuB,CAAC,gBAAgB,CAAC,OAAO,EAAE,gBAAgB,CAAC,WAAW,CAAC,CAAC;gBAE3G,IAAI,CAAC,mCAAmC,CAAC,gBAAgB,EAAE,kBAAkB,CAAC,EAAE,CAAC;oBAC/E,OAAO,CAAC,IAAI,CAAC,uCAAuC,GAAG,oDAAoD,CAAC,CAAC;oBAC7G,OAAO,mBAAmB,CAAC;gBAC7B,CAAC;gBAED,OAAO,gBAAgB,CAAC;YAC1B,CAAC;YAAC,OAAO,eAAwB,EAAE,CAAC;gBAClC,MAAM,SAAS,GAAG,eAAe,YAAY,KAAK,CAAC,CAAC,CAAC,eAAe,CAAC,OAAO,CAAC,CAAC,CA
AC,MAAM,CAAC,eAAe,CAAC,CAAC;gBACvG,OAAO,CAAC,IAAI,CAAC,8CAA8C,GAAG,KAAK,SAAS,0BAA0B,CAAC,CAAC;gBACxG,OAAO,mBAAmB,CAAC;YAC7B,CAAC;QACH,CAAC;QAAC,OAAO,UAAmB,EAAE,CAAC;YAC7B,oEAAoE;YACpE,IAAI,UAAU,YAAY,oBAAoB,IAAI,UAAU,CAAC,UAAU,KAAK,GAAG,EAAE,CAAC;gBAChF,OAAO,CAAC,IAAI,CAAC,8CAA8C,GAAG,qBAAqB,CAAC,CAAC;gBACrF,MAAM,UAAU,CAAC;YACnB,CAAC;YACD,MAAM,OAAO,GAAG,UAAU,YAAY,KAAK,CAAC,CAAC,CAAC,UAAU,CAAC,OAAO,CAAC,CAAC,CAAC,MAAM,CAAC,UAAU,CAAC,CAAC;YACtF,OAAO,CAAC,IAAI,CAAC,wCAAwC,GAAG,KAAK,OAAO,qCAAqC,CAAC,CAAC;YAC3G,IAAI,CAAC;gBACH,yEAAyE;gBACzE,MAAM,gBAAgB,GAAG,MAAM,IAAI,CAAC,gBAAgB,CAAC,SAAS,CAAC,GAAG,EAAE,iBAAiB,CAAC,CAAC;gBACvF,OAAO,gBAAgB,CAAC;YAC1B,CAAC;YAAC,OAAO,eAAwB,EAAE,CAAC;gBAClC,MAAM,SAAS,GAAG,eAAe,YAAY,KAAK,CAAC,CAAC,CAAC,eAAe,CAAC,OAAO,CAAC,CAAC,CAAC,MAAM,CAAC,eAAe,CAAC,CAAC;gBACvG,OAAO,CAAC,KAAK,CAAC,2DAA2D,GAAG,KAAK,SAAS,EAAE,CAAC,CAAC;gBAC9F,MAAM,eAAe,CAAC,CAAC,8DAA8D;YACvF,CAAC;QACH,CAAC;IACH,CAAC;IAED;;;;;;;;;OASG;IACH,KAAK,CAAC,YAAY,CAAC,GAAW,EAAE,UAA+B,EAAE;QAC/D,qCAAqC;QACrC,KAAK,MAAM,OAAO,IAAI,IAAI,CAAC,sBAAsB,EAAE,CAAC;YAClD,IAAI,OAAO,OAAO,KAAK,QAAQ,IAAI,GAAG,CAAC,QAAQ,CAAC,OAAO,CAAC,EAAE,CAAC;gBACzD,OAAO,CAAC,IAAI,CACV,qBAAqB,GAAG,4BAA4B,OAAO,uDAAuD,CACnH,CAAC;gBACF,OAAO,IAAI,CAAC,gBAAgB,CAAC,YAAY,CAAC,GAAG,EAAE,OAAO,CAAC,CAAC;YAC1D,CAAC;iBAAM,IAAI,OAAO,YAAY,MAAM,IAAI,OAAO,CAAC,IAAI,CAAC,GAAG,CAAC,EAAE,CAAC;gBAC1D,OAAO,CAAC,IAAI,CACV,qBAAqB,GAAG,2BAA2B,OAAO,CAAC,QAAQ,EAAE,uDAAuD,CAC7H,CAAC;gBACF,OAAO,IAAI,CAAC,gBAAgB,CAAC,YAAY,CAAC,GAAG,EAAE,OAAO,CAAC,CAAC;YAC1D,CAAC;QACH,CAAC;QAED,IAAI,CAAC;YACH,wBAAwB;YACxB,MAAM,WAAW,GAAG,MAAM,IAAI,CAAC,WAAW,CAAC,YAAY,CAAC,GAAG,EAAE,OAAO,CAAC,CAAC;YACtE,OAAO,WAAW,CAAC;QACrB,CAAC;QAAC,OAAO,UAAmB,EAAE,CAAC;YAC7B,oEAAoE;YACpE,IAAI,UAAU,YAAY,oBAAoB,IAAI,UAAU,CAAC,UAAU,KAAK,GAAG,EAAE,CAAC;gBAChF,OAAO,CAAC,IAAI,CAAC,4DAA4D,GAAG,qBAAqB,CAAC,CAAC;gBACnG,MAAM,UAAU,CAAC;YACnB,CAAC;YACD,MAAM,OAAO,GAAG,UAAU,YAAY,KAAK,CAAC,CAAC,CAAC,UAAU,CAAC,OAAO,CAAC,CAAC,CAAC,MAAM,CAAC,UAAU,CAAC,CAAC;YACtF,OAAO,CAAC,IAAI,CACV,sDAAsD,GAAG,KAAK
,OAAO,qCAAqC,CAC3G,CAAC;YACF,IAAI,CAAC;gBACH,+BAA+B;gBAC/B,MAAM,gBAAgB,GAAG,MAAM,IAAI,CAAC,gBAAgB,CAAC,YAAY,CAAC,GAAG,EAAE,OAAO,CAAC,CAAC;gBAChF,OAAO,gBAAgB,CAAC;YAC1B,CAAC;YAAC,OAAO,eAAwB,EAAE,CAAC;gBAClC,MAAM,SAAS,GAAG,eAAe,YAAY,KAAK,CAAC,CAAC,CAAC,eAAe,CAAC,OAAO,CAAC,CAAC,CAAC,MAAM,CAAC,eAAe,CAAC,CAAC;gBACvG,OAAO,CAAC,KAAK,CAAC,yEAAyE,GAAG,KAAK,SAAS,EAAE,CAAC,CAAC;gBAC5G,MAAM,eAAe,CAAC,CAAC,8DAA8D;YACvF,CAAC;QACH,CAAC;IACH,CAAC;IAED;;OAEG;IACH,UAAU;QACR,OAAO,IAAI,CAAC,gBAAgB,CAAC,UAAU,EAAE,CAAC;IAC5C,CAAC;IAED;;OAEG;IACH,KAAK,CAAC,OAAO;QACX,MAAM,OAAO,CAAC,UAAU,CAAC;YACvB,IAAI,CAAC,WAAW,CAAC,OAAO,EAAE,EAAE,yCAAyC;YACrE,IAAI,CAAC,gBAAgB,CAAC,OAAO,EAAE;SAChC,CAAC,CAAC;IACL,CAAC;CACF"}
@@ -8,6 +8,9 @@ import type { IEngine } from "./IEngine.js";
  * Features include caching, retries, HTTP fallback, and configurable browser pooling.
  */
  export declare class PlaywrightEngine implements IEngine {
+ private static readonly AUTO_RENDER_POLL_MS;
+ private static readonly AUTO_RENDER_QUIET_WINDOW_MS;
+ private static readonly AUTO_RENDER_MAX_WAIT_MS;
  private browserPool;
  private readonly queue;
  private readonly cache;
@@ -41,6 +44,9 @@ export declare class PlaywrightEngine implements IEngine {
  * Simulate human-like interactions on the page.
  */
  private simulateHumanBehavior;
+ private captureRenderedDomSnapshot;
+ private shouldAutoWaitForRenderedDom;
+ private waitForRenderedDomIfNeeded;
  /**
  * Adds a result to the in-memory cache.
  */
@@ -102,7 +108,6 @@ export declare class PlaywrightEngine implements IEngine {
  */
  private fetchWithPlaywright;
  private applyBlockingRules;
- private _injectSourceUnderH1;
  /**
  * Cleans up resources used by the engine, primarily closing browser instances in the pool.
  *
@@ -1 +1 @@
- {"version":3,"file":"PlaywrightEngine.d.ts","sourceRoot":"","sources":["../src/PlaywrightEngine.ts"],"names":[],"mappings":"AAAA,OAAO,KAAK,EACV,eAAe,EACf,kBAAkB,EAClB,mBAAmB,EACnB,cAAc,EACd,sBAAsB,EACtB,YAAY,EACb,MAAM,YAAY,CAAC;AACpB,OAAO,KAAK,EAAE,OAAO,EAAE,MAAM,cAAc,CAAC;AAyC5C;;;;;;GAMG;AACH,qBAAa,gBAAiB,YAAW,OAAO;IAC9C,OAAO,CAAC,WAAW,CAAsC;IACzD,OAAO,CAAC,QAAQ,CAAC,KAAK,CAAS;IAC/B,OAAO,CAAC,QAAQ,CAAC,KAAK,CAAsC;IAC5D,OAAO,CAAC,QAAQ,CAAC,MAAM,CAAiC;IAGxD,OAAO,CAAC,uBAAuB,CAAkB;IACjD,OAAO,CAAC,iBAAiB,CAAkB;IAC3C,OAAO,CAAC,mBAAmB,CAA0B;IAGrD,OAAO,CAAC,MAAM,CAAC,QAAQ,CAAC,cAAc,CAuBpC;IAEF;;;;;OAKG;gBACS,MAAM,GAAE,sBAA2B;IAM/C;;OAEG;YACW,qBAAqB;IAwCnC;;;OAGG;YACW,yBAAyB;IAyEvC,OAAO,CAAC,UAAU;IAalB;;OAEG;YACW,WAAW;IAazB;;OAEG;YACW,qBAAqB;IAwCnC;;OAEG;IACH,OAAO,CAAC,UAAU;IAUlB;;;;;;;;;OASG;IACG,SAAS,CACb,GAAG,EAAE,MAAM,EACX,OAAO,GAAE,YAAY,GAAG;QAAE,QAAQ,CAAC,EAAE,OAAO,CAAC;QAAC,OAAO,CAAC,EAAE,OAAO,CAAA;KAAO,GACrE,OAAO,CAAC,eAAe,CAAC;IAoB3B;;;;;;;OAOG;IACH,OAAO,CAAC,iBAAiB;IAqDzB;;;;;;;OAOG;YACW,oBAAoB;IAmClC;;;;;;;;OAQG;YACW,6BAA6B;IAmC3C;;;;;;;OAOG;YACW,eAAe;IAkH7B;;;OAGG;YACW,mBAAmB;YA8KnB,kBAAkB;IA0ChC,OAAO,CAAC,oBAAoB;IAQ5B;;;;;OAKG;IACG,OAAO,IAAI,OAAO,CAAC,IAAI,CAAC;IAoB9B;;;OAGG;IACH,UAAU,IAAI,cAAc,EAAE;IAQ9B,OAAO,CAAC,mBAAmB;IAU3B;;;;;;;;OAQG;IACG,YAAY,CAAC,GAAG,EAAE,MAAM,EAAE,OAAO,GAAE,mBAAwB,GAAG,OAAO,CAAC,kBAAkB,CAAC;IA0C/F;;OAEG;IACH,OAAO,CAAC,iBAAiB;IAkBzB;;OAEG;IACH,OAAO,CAAC,iBAAiB;IAoBzB;;OAEG;YACW,sBAAsB;IAyDpC;;OAEG;YACW,2BAA2B;IA4DzC;;OAEG;YACW,0BAA0B;CAmFzC"}
+ {"version":3,"file":"PlaywrightEngine.d.ts","sourceRoot":"","sources":["../src/PlaywrightEngine.ts"],"names":[],"mappings":"AAAA,OAAO,KAAK,EACV,eAAe,EACf,kBAAkB,EAClB,mBAAmB,EACnB,cAAc,EACd,sBAAsB,EACtB,YAAY,EACb,MAAM,YAAY,CAAC;AACpB,OAAO,KAAK,EAAE,OAAO,EAAE,MAAM,cAAc,CAAC;AA2D5C;;;;;;GAMG;AACH,qBAAa,gBAAiB,YAAW,OAAO;IAC9C,OAAO,CAAC,MAAM,CAAC,QAAQ,CAAC,mBAAmB,CAAO;IAClD,OAAO,CAAC,MAAM,CAAC,QAAQ,CAAC,2BAA2B,CAAO;IAC1D,OAAO,CAAC,MAAM,CAAC,QAAQ,CAAC,uBAAuB,CAAQ;IAEvD,OAAO,CAAC,WAAW,CAAsC;IACzD,OAAO,CAAC,QAAQ,CAAC,KAAK,CAAS;IAC/B,OAAO,CAAC,QAAQ,CAAC,KAAK,CAAsC;IAC5D,OAAO,CAAC,QAAQ,CAAC,MAAM,CAAiC;IAGxD,OAAO,CAAC,uBAAuB,CAAkB;IACjD,OAAO,CAAC,iBAAiB,CAAkB;IAC3C,OAAO,CAAC,mBAAmB,CAA0B;IAGrD,OAAO,CAAC,MAAM,CAAC,QAAQ,CAAC,cAAc,CAuBpC;IAEF;;;;;OAKG;gBACS,MAAM,GAAE,sBAA2B;IAM/C;;OAEG;YACW,qBAAqB;IAwCnC;;;OAGG;YACW,yBAAyB;IA0EvC,OAAO,CAAC,UAAU;IAalB;;OAEG;YACW,WAAW;IAazB;;OAEG;YACW,qBAAqB;YA2CrB,0BAA0B;IAqDxC,OAAO,CAAC,4BAA4B;YAUtB,0BAA0B;IA8FxC;;OAEG;IACH,OAAO,CAAC,UAAU;IAUlB;;;;;;;;;OASG;IACG,SAAS,CACb,GAAG,EAAE,MAAM,EACX,OAAO,GAAE,YAAY,GAAG;QAAE,QAAQ,CAAC,EAAE,OAAO,CAAC;QAAC,OAAO,CAAC,EAAE,OAAO,CAAA;KAAO,GACrE,OAAO,CAAC,eAAe,CAAC;IAoB3B;;;;;;;OAOG;IACH,OAAO,CAAC,iBAAiB;IAqDzB;;;;;;;OAOG;YACW,oBAAoB;IAmClC;;;;;;;;OAQG;YACW,6BAA6B;IAmC3C;;;;;;;OAOG;YACW,eAAe;IAkH7B;;;OAGG;YACW,mBAAmB;YA6KnB,kBAAkB;IAyChC;;;;;OAKG;IACG,OAAO,IAAI,OAAO,CAAC,IAAI,CAAC;IAoB9B;;;OAGG;IACH,UAAU,IAAI,cAAc,EAAE;IAQ9B,OAAO,CAAC,mBAAmB;IAU3B;;;;;;;;OAQG;IACG,YAAY,CAAC,GAAG,EAAE,MAAM,EAAE,OAAO,GAAE,mBAAwB,GAAG,OAAO,CAAC,kBAAkB,CAAC;IA0C/F;;OAEG;IACH,OAAO,CAAC,iBAAiB;IAkBzB;;OAEG;IACH,OAAO,CAAC,iBAAiB;IAoBzB;;OAEG;YACW,sBAAsB;IAyDpC;;OAEG;YACW,2BAA2B;IA4DzC;;OAEG;YACW,0BAA0B;CA2FzC"}