@purepageio/fetch-engines 0.9.0 → 0.10.0

package/README.md CHANGED
@@ -2,13 +2,18 @@
 
  [![npm version](https://img.shields.io/npm/v/@purepageio/fetch-engines.svg)](https://www.npmjs.com/package/@purepageio/fetch-engines)
  [![CI](https://github.com/purepage/fetch-engines/actions/workflows/publish.yml/badge.svg)](https://github.com/purepage/fetch-engines/actions/workflows/publish.yml)
+ [![Live Browser Evals](https://github.com/purepage/fetch-engines/actions/workflows/live-browser-evals.yml/badge.svg)](https://github.com/purepage/fetch-engines/actions/workflows/live-browser-evals.yml)
  [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
 
- Fetch websites with confidence. `@purepageio/fetch-engines` gives teams an HTTP-first workflow that automatically promotes tricky pages to a managed Playwright browser and can even hand structured results back through OpenAI.
+ Reliable public-web extraction for Node.js.
+
+ HTTP-first for speed. Browser-backed when needed. Clean Markdown, soft-block handling, and structured extraction for RAG and AI pipelines.
 
  ## Table of contents
 
  - [Why fetch-engines?](#why-fetch-engines)
+ - [Why trust fetch-engines](#why-trust-fetch-engines)
+ - [Library vs hosted crawler](#library-vs-hosted-crawler)
  - [Installation](#installation)
  - [Quick start](#quick-start)
  - [Usage patterns](#usage-patterns)
@@ -26,9 +31,32 @@ Fetch websites with confidence. `@purepageio/fetch-engines` gives teams an HTTP-
  ## Why fetch-engines?
 
  - **One API for multiple strategies** – Call `fetchHTML` for rendered pages or `fetchContent` for raw responses. The library handles HTTP shortcuts and Playwright fallbacks automatically.
- - **Production-minded defaults** – Retries, caching, and consistent telemetry are ready out of the box.
- - **Drop-in AI enrichment** – Provide a Zod schema and let OpenAI (or any OpenAI-compatible API) convert full pages into structured data.
- - **Typed and tested** – Built in TypeScript with examples that mirror real-world scraping pipelines.
+ - **Automatic app-shell & soft-block detection** – Shell-like HTTP responses and bot-gate pages (Cloudflare challenges, CAPTCHAs, "verify you're human") are upgraded to Playwright rendering by default, so client-rendered pages and soft blocks work without per-domain rules.
+ - **RAG-ready Markdown** – Convert public content pages to clean Markdown with boilerplate, nav, and SVG noise stripped out. Powered by a Rust-native converter.
+ - **HTTP-first, browser-backed when needed** – Fast pages stay cheap via plain HTTP, while harder pages automatically benefit from Playwright fallback.
+ - **Structured extraction built in** – Define a Zod schema and go from URL to typed data via any OpenAI-compatible API. The page is fetched as Markdown first to minimise tokens.
+ - **Playwright is optional** – `FetchEngine` works without browser dependencies. Playwright is only loaded when you use `HybridEngine` or `PlaywrightEngine`.
+
+ ## Why trust fetch-engines
+
+ - **19 live URLs across 7 archetypes** (docs, government, knowledge, marketing, commerce, static, access-guarded) validated on every release and nightly via browser-enabled CI
+ - **85 unit tests + dedicated live browser eval workflow** — not just "it compiles," but "it extracts real content from real pages"
+ - Handles app shells, Cloudflare challenges, CAPTCHAs, and utility-class-heavy doc sites (Tailwind, Vite) without per-domain rules
+ - Produces clean Markdown with absolute URLs — boilerplate removal typically reduces raw HTML to 10–30% of its original size before it reaches your LLM
+ - Structured extraction with Zod schemas and any OpenAI-compatible provider, in the same pipeline as page fetching
+
+ ## Library vs hosted crawler
+
+ | | fetch-engines | Hosted crawlers |
+ | ------------------- | --------------------------------------- | ------------------- |
+ | **Runs where** | Your Node.js process | Third-party API |
+ | **Data stays** | In your infrastructure | Leaves your network |
+ | **Cost model** | Free + your compute | Per-page pricing |
+ | **Customisation** | Full source access, tune heuristics | Configuration flags |
+ | **Browser control** | Your Playwright instance, your proxy | Opaque |
+ | **Transparency** | Open tests, open evals, open heuristics | Black box |
+
+ Choose `fetch-engines` when you want full control over extraction, data residency, and cost. Choose a hosted crawler when you need managed infrastructure and don't want to run browsers yourself.
 
  ## Installation
 
@@ -83,7 +111,9 @@ console.log(page.contentType); // "markdown"
  await engine.cleanup();
  ```
 
- `FetchEngine` also supports `markdown: true` for static pages that don't need JavaScript rendering.
+ `FetchEngine` also supports `markdown: true` for static pages that don't need JavaScript rendering. `HybridEngine` now decides whether to render before converting to Markdown, so shell detection still works when callers request Markdown output.
+ Relative links and image URLs in Markdown output are normalized to absolute URLs using the final fetched page URL. The converter strips generic UI chrome (nav/footer/button controls and dense link clusters) using domain-agnostic heuristics, while preserving content on pages without semantic `<main>`/`<article>` containers (e.g., Tailwind CSS docs).
+ The extraction path is tuned for publicly accessible content. Paywalled or member-only pages may still return intentionally partial content unless you supply authenticated access yourself.
 
  ### Structured extraction
 
@@ -103,6 +133,8 @@ const result = await fetchStructuredContent("https://example.com/article", schem
  console.log(result.data.summary);
  ```
 
+ Under the hood, structured extraction fetches the page as Markdown first (same boilerplate removal as Markdown mode), then sends the cleaned content to the AI model — keeping token usage low and extraction quality high.
+
  Set `OPENAI_API_KEY` (or `OPENROUTER_API_KEY`) before running structured helpers, or use `apiConfig` to connect to OpenAI-compatible APIs like OpenRouter. The engine automatically adds the `Authorization` header when you provide an API key:
 
  ```typescript
@@ -128,7 +160,7 @@ When you supply a custom `baseURL`, the engine automatically switches to the Ver
  All engines accept familiar `fetch` options such as custom headers. Additional Hybrid/Playwright options you are likely to tweak:
 
  - `markdown` – return Markdown instead of HTML.
- - `spaMode` & `spaRenderDelayMs` allow single-page apps to render before extraction.
+ - Automatic shell detection is enabled by default. `spaMode` & `spaRenderDelayMs` still force a more patient render path when you know a page is highly dynamic.
  - `cacheTTL`, `maxRetries`, and browser pool sizes – control resilience and throughput.
 
  Check the inline TypeScript docs or the [`/examples`](./examples) directory for end-to-end flows.
@@ -137,30 +169,30 @@ Check the inline TypeScript docs or the [`/examples`](./examples) directory for
 
  Every option from `PlaywrightEngineConfig` (consumed by `HybridEngine`) with defaults:
 
- | Option | Default | Purpose |
- | -------------------------- | ----------- | ------------------------------------------------------------------------------------------------ |
- | `headers` | `{}` | Extra headers merged into every request. |
- | `concurrentPages` | `3` | Maximum Playwright pages processed at once. |
- | `maxRetries` | `3` | Additional retry attempts after the first failure. |
- | `retryDelay` | `5000` | Milliseconds to wait between retries. |
- | `cacheTTL` | `900000` | Cache lifetime in ms (`0` disables caching). |
- | `useHttpFallback` | `true` | Try a fast HTTP GET before spinning up Playwright. |
- | `useHeadedModeFallback` | `false` | Automatically retry a domain in headed mode after repeated failures. |
- | `defaultFastMode` | `true` | Block non-critical assets and skip human simulation unless overridden. |
- | `simulateHumanBehavior` | `true` | When not in fast mode, add delays and scrolling to avoid bot detection. |
- | `maxBrowsers` | `2` | Highest number of Playwright browser instances kept in the pool. |
- | `maxPagesPerContext` | `6` | Pages opened per browser context before recycling it. |
- | `maxBrowserAge` | `1200000` | Milliseconds before a browser instance is torn down (20 minutes). |
- | `healthCheckInterval` | `60000` | Pool health check frequency in ms. |
- | `poolBlockedDomains` | `[]` | Domains blocked across every Playwright request (inherit pool defaults if empty). |
- | `poolBlockedResourceTypes` | `[]` | Resource types (e.g. `"image"`) blocked globally. |
- | `proxy` | `undefined` | Per-browser proxy `{ server, username?, password? }`. |
- | `useHeadedMode` | `false` | Force every browser to launch with a visible window. |
- | `markdown` | `false` | Return Markdown instead of raw HTML. Converts via a Rust-native engine with boilerplate removal. |
- | `spaMode` | `false` | Enable SPA heuristics and allow additional waits for client rendering. |
- | `spaRenderDelayMs` | `0` | Extra delay after load when `spaMode` is `true`. |
- | `playwrightOnlyPatterns` | `[]` | URLs matching any string/regex go straight to Playwright, skipping HTTP fetches. |
- | `playwrightLaunchOptions` | `undefined` | Options passed to `browserType.launch` (see Playwright docs). |
+ | Option | Default | Purpose |
+ | -------------------------- | ----------- | -------------------------------------------------------------------------------------------------- |
+ | `headers` | `{}` | Extra headers merged into every request. |
+ | `concurrentPages` | `3` | Maximum Playwright pages processed at once. |
+ | `maxRetries` | `3` | Additional retry attempts after the first failure. |
+ | `retryDelay` | `5000` | Milliseconds to wait between retries. |
+ | `cacheTTL` | `900000` | Cache lifetime in ms (`0` disables caching). |
+ | `useHttpFallback` | `true` | Try a fast HTTP GET before spinning up Playwright. |
+ | `useHeadedModeFallback` | `false` | Automatically retry a domain in headed mode after repeated failures. |
+ | `defaultFastMode` | `true` | Block non-critical assets and skip human simulation unless overridden. |
+ | `simulateHumanBehavior` | `true` | When not in fast mode, add delays and scrolling to avoid bot detection. |
+ | `maxBrowsers` | `2` | Highest number of Playwright browser instances kept in the pool. |
+ | `maxPagesPerContext` | `6` | Pages opened per browser context before recycling it. |
+ | `maxBrowserAge` | `1200000` | Milliseconds before a browser instance is torn down (20 minutes). |
+ | `healthCheckInterval` | `60000` | Pool health check frequency in ms. |
+ | `poolBlockedDomains` | `[]` | Domains blocked across every Playwright request (inherit pool defaults if empty). |
+ | `poolBlockedResourceTypes` | `[]` | Resource types (e.g. `"image"`) blocked globally. |
+ | `proxy` | `undefined` | Per-browser proxy `{ server, username?, password? }`. |
+ | `useHeadedMode` | `false` | Force every browser to launch with a visible window. |
+ | `markdown` | `false` | Return Markdown instead of raw HTML. Converts via a Rust-native engine with boilerplate removal. |
+ | `spaMode` | `false` | Force the more patient render path. Many shell-like pages are auto-detected even when this is off. |
+ | `spaRenderDelayMs` | `0` | Minimum extra wait budget when `spaMode` is `true`. |
+ | `playwrightOnlyPatterns` | `[]` | URLs matching any string/regex go straight to Playwright, skipping HTTP shell detection. |
+ | `playwrightLaunchOptions` | `undefined` | Options passed to `browserType.launch` (see Playwright docs). |
 
  Per-request overrides: `fetchHTML` accepts `fastMode`, `markdown`, `spaMode`, and `headers`, while `fetchContent` supports `fastMode` and `headers`.
 
@@ -173,6 +205,9 @@ Failures raise a typed `FetchError` exposing `code`, `statusCode`, and the under
  - Explore the [`examples`](./examples) directory for scripts you can run end-to-end.
  - Ready-to-use TypeScript types ship with the package.
  - `pnpm test` runs the automated suite when you are ready to contribute.
+ - `pnpm eval:auto-render` runs a live Hybrid-vs-HTTP quality matrix across docs, government, knowledge, marketing, commerce, and access-guarded pages, using a stable gated core plus observe-only sentinels for harder domains.
+ - `pnpm test:live:auto-render` runs the same hypothesis as a Vitest live test (`LIVE_NETWORK=1`).
+ - GitHub Actions includes a dedicated browser-enabled live eval workflow that runs on `main` changes, nightly on a schedule, and on manual dispatch. It uploads the JSON report as a build artifact.
 
  ## Contributing
 
@@ -31,7 +31,6 @@ export declare class FetchEngine implements IEngine {
  * @throws {Error} If the content type is not HTML or for other network errors.
  */
  fetchHTML(url: string, options?: FetchEngineOptions): Promise<HTMLFetchResult>;
- private _injectSourceUnderH1;
  /**
  * Fetches raw content from the specified URL (mimics standard fetch API).
  *
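The hunk above drops the private `_injectSourceUnderH1` declaration; its JavaScript body is removed in the `FetchEngine.js` diff further down, and callers now use `injectSourceUrl` from `./utils/markdown-converter.js`. That new helper's body is not part of this diff, so the following is only a sketch that mirrors the removed method: put a `Source: <url>` line under the first H1, and skip insertion if a `Source:` line already appears in the first 50 lines. The function name is illustrative. One deliberate difference from the removed code: it uses a replacer function instead of a `$1` replacement string, so a `$` inside the URL is never treated as a substitution pattern.

```typescript
// Illustrative re-implementation of the removed FetchEngine._injectSourceUnderH1.
// The library's new injectSourceUrl helper may behave differently.
function injectSourceUnderH1(markdown: string, sourceUrl: string): string {
  if (!markdown || !sourceUrl) return markdown;

  // Avoid a duplicate "Source:" line if one already exists near the top.
  const head = markdown.split("\n").slice(0, 50).join("\n");
  if (/^Source:\s+/m.test(head)) return markdown;

  const safeUrl = sourceUrl.trim();
  // No g flag, so only the first "# Heading" line matches; "## Sub" lines do not.
  // The replacer function keeps "$" characters in the URL literal.
  return markdown.replace(/^(\s*#\s.*)$/m, (heading: string) => `${heading}\n\nSource: ${safeUrl}`);
}

const page = "# Release notes\n\nVersion 0.10.0 adds auto-render.";
console.log(injectSourceUnderH1(page, "https://example.com/notes"));
```

Because the regex also matches `# ` lines inside fenced code, a more careful implementation would skip fenced blocks; the removed helper did not.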
@@ -1 +1 @@
- {"version":3,"file":"FetchEngine.d.ts","sourceRoot":"","sources":["../src/FetchEngine.ts"],"names":[],"mappings":"AAAA,OAAO,KAAK,EACV,eAAe,EACf,kBAAkB,EAClB,mBAAmB,EACnB,cAAc,EACd,kBAAkB,EACnB,MAAM,YAAY,CAAC;AACpB,OAAO,KAAK,EAAE,OAAO,EAAE,MAAM,cAAc,CAAC;AAG5C,OAAO,EAAE,UAAU,EAAE,MAAM,aAAa,CAAC;AAEzC;;GAEG;AACH,qBAAa,oBAAqB,SAAQ,UAAU;aAGhC,UAAU,EAAE,MAAM;gBADlC,OAAO,EAAE,MAAM,EACC,UAAU,EAAE,MAAM;CAKrC;AAED;;;;;GAKG;AACH,qBAAa,WAAY,YAAW,OAAO;IACzC,OAAO,CAAC,QAAQ,CAAC,OAAO,CAA+B;IAEvD,OAAO,CAAC,MAAM,CAAC,QAAQ,CAAC,eAAe,CAGrC;IAEF;;;OAGG;gBACS,OAAO,GAAE,kBAAuB;IAI5C;;;;;;;OAOG;IACG,SAAS,CAAC,GAAG,EAAE,MAAM,EAAE,OAAO,CAAC,EAAE,kBAAkB,GAAG,OAAO,CAAC,eAAe,CAAC;IAkFpF,OAAO,CAAC,oBAAoB;IAS5B;;;;;;;;OAQG;IACG,YAAY,CAAC,GAAG,EAAE,MAAM,EAAE,OAAO,CAAC,EAAE,mBAAmB,GAAG,OAAO,CAAC,kBAAkB,CAAC;IA8E3F;;;;OAIG;IACG,OAAO,IAAI,OAAO,CAAC,IAAI,CAAC;IAI9B;;;;OAIG;IACH,UAAU,IAAI,cAAc,EAAE;CAG/B"}
+ {"version":3,"file":"FetchEngine.d.ts","sourceRoot":"","sources":["../src/FetchEngine.ts"],"names":[],"mappings":"AAAA,OAAO,KAAK,EACV,eAAe,EACf,kBAAkB,EAClB,mBAAmB,EACnB,cAAc,EACd,kBAAkB,EACnB,MAAM,YAAY,CAAC;AACpB,OAAO,KAAK,EAAE,OAAO,EAAE,MAAM,cAAc,CAAC;AAG5C,OAAO,EAAE,UAAU,EAAE,MAAM,aAAa,CAAC;AAEzC;;GAEG;AACH,qBAAa,oBAAqB,SAAQ,UAAU;aAGhC,UAAU,EAAE,MAAM;gBADlC,OAAO,EAAE,MAAM,EACC,UAAU,EAAE,MAAM;CAKrC;AAED;;;;;GAKG;AACH,qBAAa,WAAY,YAAW,OAAO;IACzC,OAAO,CAAC,QAAQ,CAAC,OAAO,CAA+B;IAEvD,OAAO,CAAC,MAAM,CAAC,QAAQ,CAAC,eAAe,CAGrC;IAEF;;;OAGG;gBACS,OAAO,GAAE,kBAAuB;IAI5C;;;;;;;OAOG;IACG,SAAS,CAAC,GAAG,EAAE,MAAM,EAAE,OAAO,CAAC,EAAE,kBAAkB,GAAG,OAAO,CAAC,eAAe,CAAC;IAgFpF;;;;;;;;OAQG;IACG,YAAY,CAAC,GAAG,EAAE,MAAM,EAAE,OAAO,CAAC,EAAE,mBAAmB,GAAG,OAAO,CAAC,kBAAkB,CAAC;IA8E3F;;;;OAIG;IACG,OAAO,IAAI,OAAO,CAAC,IAAI,CAAC;IAI9B;;;;OAIG;IACH,UAAU,IAAI,cAAc,EAAE;CAG/B"}
@@ -1,4 +1,4 @@
- import { MarkdownConverter } from "./utils/markdown-converter.js"; // Import the converter
+ import { MarkdownConverter, injectSourceUrl } from "./utils/markdown-converter.js";
  import { FetchError } from "./errors.js"; // Only import FetchError
  /**
  * Custom error class for HTTP errors from FetchEngine.
@@ -76,9 +76,8 @@ export class FetchEngine {
  if (effectiveOptions.markdown) {
  try {
  const converter = new MarkdownConverter();
- finalContent = converter.convert(html);
- // Inject source URL directly under the first H1 for traceability
- finalContent = this._injectSourceUnderH1(finalContent, response.url || url);
+ finalContent = converter.convert(html, { baseUrl: response.url || url });
+ finalContent = injectSourceUrl(finalContent, response.url || url);
  finalContentType = "markdown";
  }
  catch (conversionError) {
@@ -107,17 +106,6 @@ export class FetchEngine {
  throw new FetchError(`Fetch failed: ${message}`, "ERR_FETCH_FAILED", error instanceof Error ? error : undefined);
  }
  }
- // Insert a "Source: <url>" line immediately below the first H1.
- _injectSourceUnderH1(markdown, sourceUrl) {
- if (!markdown || !sourceUrl)
- return markdown;
- // Avoid duplicate insertion if already present near the top
- const head = markdown.split("\n").slice(0, 50).join("\n");
- if (/^Source:\s+/m.test(head))
- return markdown;
- const safeUrl = sourceUrl.trim();
- return markdown.replace(/^(\s*#\s.*)$/m, `$1\n\nSource: ${safeUrl}`);
- }
  /**
  * Fetches raw content from the specified URL (mimics standard fetch API).
  *
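In the `FetchEngine.js` hunks above, `converter.convert` now receives `{ baseUrl: response.url || url }`, which lines up with the README note that relative links and image URLs are resolved against the final fetched page URL. The converter is Rust-native and its internals are not in this diff, so this is only a TypeScript sketch of the idea, applied to already-converted Markdown with a simple regex. `absolutizeMarkdownLinks` is a hypothetical name, and the pattern deliberately ignores reference-style links and parentheses inside URLs.

```typescript
// Hypothetical helper (not the library's converter): rewrite relative link and
// image targets in Markdown to absolute URLs resolved against a base URL.
// Matches [text](target) and ![alt](target "optional title").
const MARKDOWN_LINK = /(!?\[[^\]]*\]\()([^)\s]+)((?:\s+"[^"]*")?\))/g;

function absolutizeMarkdownLinks(markdown: string, baseUrl: string): string {
  return markdown.replace(MARKDOWN_LINK, (match: string, open: string, target: string, close: string) => {
    // In-page anchors stay relative.
    if (target.startsWith("#")) return match;
    try {
      // new URL() resolves relative targets; absolute targets are kept (normalized by the parser).
      return `${open}${new URL(target, baseUrl).href}${close}`;
    } catch {
      // Targets (or base URLs) that cannot be parsed are left as they were.
      return match;
    }
  });
}

console.log(absolutizeMarkdownLinks("Read the [setup guide](../setup) first.", "https://example.com/docs/intro"));
```

Resolving against `response.url` rather than the requested URL matters after redirects: if `http://example.com/docs` redirects to `https://www.example.com/docs/`, the relative link `intro` resolves to `https://www.example.com/docs/intro`, whereas resolving against the requested URL would give `http://example.com/intro`.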
@@ -1 +1 @@
- {"version":3,"file":"FetchEngine.js","sourceRoot":"","sources":["../src/FetchEngine.ts"],"names":[],"mappings":"AASA,OAAO,EAAE,iBAAiB,EAAE,MAAM,+BAA+B,CAAC,CAAC,uBAAuB;AAC1F,OAAO,EAAE,UAAU,EAAE,MAAM,aAAa,CAAC,CAAC,yBAAyB;AAEnE;;GAEG;AACH,MAAM,OAAO,oBAAqB,SAAQ,UAAU;IAGhC;IAFlB,YACE,OAAe,EACC,UAAkB;QAElC,KAAK,CAAC,OAAO,EAAE,gBAAgB,EAAE,SAAS,EAAE,UAAU,CAAC,CAAC;QAFxC,eAAU,GAAV,UAAU,CAAQ;QAGlC,IAAI,CAAC,IAAI,GAAG,sBAAsB,CAAC;IACrC,CAAC;CACF;AAED;;;;;GAKG;AACH,MAAM,OAAO,WAAW;IACL,OAAO,CAA+B;IAE/C,MAAM,CAAU,eAAe,GAAiC;QACtE,QAAQ,EAAE,KAAK;QACf,OAAO,EAAE,EAAE;KACZ,CAAC;IAEF;;;OAGG;IACH,YAAY,UAA8B,EAAE;QAC1C,IAAI,CAAC,OAAO,GAAG,EAAE,GAAG,WAAW,CAAC,eAAe,EAAE,GAAG,OAAO,EAAE,CAAC;IAChE,CAAC;IAED;;;;;;;OAOG;IACH,KAAK,CAAC,SAAS,CAAC,GAAW,EAAE,OAA4B;QACvD,MAAM,gBAAgB,GAAG,EAAE,GAAG,IAAI,CAAC,OAAO,EAAE,GAAG,OAAO,EAAE,CAAC,CAAC,uCAAuC;QACjG,IAAI,QAAkB,CAAC;QACvB,IAAI,CAAC;YACH,MAAM,WAAW,GAAG;gBAClB,YAAY,EACV,iHAAiH;gBACnH,MAAM,EAAE,kGAAkG;gBAC1G,iBAAiB,EAAE,gBAAgB;aACpC,CAAC;YAEF,6DAA6D;YAC7D,MAAM,kBAAkB,GAAG,IAAI,CAAC,OAAO,CAAC,OAAO,IAAI,EAAE,CAAC;YAEtD,sEAAsE;YACtE,0GAA0G;YAC1G,MAAM,mBAAmB,GAAG,OAAO,EAAE,OAAO,IAAI,EAAE,CAAC;YAEnD,MAAM,YAAY,GAAG;gBACnB,GAAG,WAAW;gBACd,GAAG,kBAAkB;gBACrB,GAAG,mBAAmB,EAAE,sFAAsF;aAC/G,CAAC;YAEF,QAAQ,GAAG,MAAM,KAAK,CAAC,GAAG,EAAE;gBAC1B,QAAQ,EAAE,QAAQ;gBAClB,OAAO,EAAE,YAAY;aACtB,CAAC,CAAC;YAEH,IAAI,CAAC,QAAQ,CAAC,EAAE,EAAE,CAAC;gBACjB,MAAM,IAAI,oBAAoB,CAAC,uBAAuB,QAAQ,CAAC,MAAM,EAAE,EAAE,QAAQ,CAAC,MAAM,CAAC,CAAC;YAC5F,CAAC;YAED,MAAM,iBAAiB,GAAG,QAAQ,CAAC,OAAO,CAAC,GAAG,CAAC,cAAc,CAAC,CAAC;YAC/D,IAAI,CAAC,iBAAiB,IAAI,CAAC,iBAAiB,CAAC,QAAQ,CAAC,WAAW,CAAC,EAAE,CAAC;gBACnE,MAAM,IAAI,UAAU,CAAC,+BAA+B,EAAE,sBAAsB,CAAC,CAAC;YAChF,CAAC;YAED,MAAM,IAAI,GAAG,MAAM,QAAQ,CAAC,IAAI,EAAE,CAAC;YACnC,MAAM,UAAU,GAAG,IAAI,CAAC,KAAK,CAAC,+BAA+B,CAAC,CAAC;YAC/D,MAAM,KAAK,GAAG,UAAU,CAAC,CAAC,CAAC,UAAU,CAAC,CAAC,CAAC,CAAC,IAAI,EAAE,CAAC,CAAC,CAAC,IAAI,CAAC;YAEvD,IAAI,YAAY,GAAG,IAAI,CAAC;YACxB,IAAI,gBAAgB,GAAwB,MAAM,CAAC;YAEnD,IAAI,gBAAgB,CAAC,QAAQ,EAAE,CAAC;gBAC9B,IAAI,CA
AC;oBACH,MAAM,SAAS,GAAG,IAAI,iBAAiB,EAAE,CAAC;oBAC1C,YAAY,GAAG,SAAS,CAAC,OAAO,CAAC,IAAI,CAAC,CAAC;oBACvC,iEAAiE;oBACjE,YAAY,GAAG,IAAI,CAAC,oBAAoB,CAAC,YAAY,EAAE,QAAQ,CAAC,GAAG,IAAI,GAAG,CAAC,CAAC;oBAC5E,gBAAgB,GAAG,UAAU,CAAC;gBAChC,CAAC;gBAAC,OAAO,eAAwB,EAAE,CAAC;oBAClC,OAAO,CAAC,KAAK,CAAC,kCAAkC,GAAG,iBAAiB,EAAE,eAAe,CAAC,CAAC;oBACvF,gDAAgD;gBAClD,CAAC;YACH,CAAC;YAED,OAAO;gBACL,OAAO,EAAE,YAAY;gBACrB,WAAW,EAAE,gBAAgB;gBAC7B,KAAK,EAAE,KAAK;gBACZ,GAAG,EAAE,QAAQ,CAAC,GAAG,EAAE,oCAAoC;gBACvD,WAAW,EAAE,KAAK;gBAClB,UAAU,EAAE,QAAQ,CAAC,MAAM;gBAC3B,KAAK,EAAE,SAAS;aACjB,CAAC;QACJ,CAAC;QAAC,OAAO,KAAc,EAAE,CAAC;YACxB,0CAA0C;YAC1C,IACE,KAAK,YAAY,oBAAoB;gBACrC,CAAC,KAAK,YAAY,UAAU,IAAI,KAAK,CAAC,IAAI,KAAK,sBAAsB,CAAC,EACtE,CAAC;gBACD,MAAM,KAAK,CAAC;YACd,CAAC;YACD,+BAA+B;YAC/B,MAAM,OAAO,GAAG,KAAK,YAAY,KAAK,CAAC,CAAC,CAAC,KAAK,CAAC,OAAO,CAAC,CAAC,CAAC,qBAAqB,CAAC;YAC/E,MAAM,IAAI,UAAU,CAAC,iBAAiB,OAAO,EAAE,EAAE,kBAAkB,EAAE,KAAK,YAAY,KAAK,CAAC,CAAC,CAAC,KAAK,CAAC,CAAC,CAAC,SAAS,CAAC,CAAC;QACnH,CAAC;IACH,CAAC;IAED,gEAAgE;IACxD,oBAAoB,CAAC,QAAgB,EAAE,SAAiB;QAC9D,IAAI,CAAC,QAAQ,IAAI,CAAC,SAAS;YAAE,OAAO,QAAQ,CAAC;QAC7C,4DAA4D;QAC5D,MAAM,IAAI,GAAG,QAAQ,CAAC,KAAK,CAAC,IAAI,CAAC,CAAC,KAAK,CAAC,CAAC,EAAE,EAAE,CAAC,CAAC,IAAI,CAAC,IAAI,CAAC,CAAC;QAC1D,IAAI,cAAc,CAAC,IAAI,CAAC,IAAI,CAAC;YAAE,OAAO,QAAQ,CAAC;QAC/C,MAAM,OAAO,GAAG,SAAS,CAAC,IAAI,EAAE,CAAC;QACjC,OAAO,QAAQ,CAAC,OAAO,CAAC,eAAe,EAAE,iBAAiB,OAAO,EAAE,CAAC,CAAC;IACvE,CAAC;IAED;;;;;;;;OAQG;IACH,KAAK,CAAC,YAAY,CAAC,GAAW,EAAE,OAA6B;QAC3D,IAAI,QAAkB,CAAC;QACvB,IAAI,CAAC;YACH,MAAM,WAAW,GAAG;gBAClB,YAAY,EACV,iHAAiH;gBACnH,MAAM,EAAE,KAAK,EAAE,mDAAmD;aACnE,CAAC;YAEF,sDAAsD;YACtD,MAAM,kBAAkB,GAAG,IAAI,CAAC,OAAO,CAAC,OAAO,IAAI,EAAE,CAAC;YACtD,MAAM,mBAAmB,GAAG,OAAO,EAAE,OAAO,IAAI,EAAE,CAAC;YAEnD,MAAM,YAAY,GAAG;gBACnB,GAAG,WAAW;gBACd,GAAG,kBAAkB;gBACrB,GAAG,mBAAmB;aACvB,CAAC;YAEF,QAAQ,GAAG,MAAM,KAAK,CAAC,GAAG,EAAE;gBAC1B,QAAQ,EAAE,QAAQ;gBAClB,OAAO,EAAE,YAAY;aACtB,CAAC,CAAC;YAEH,IAAI,CAAC,QAAQ,CAAC,EAAE,EAAE,CAAC;gBACjB,MAAM,IAAI,oBAAoB,CAAC,uBAAuB,Q
AAQ,CAAC,MAAM,EAAE,EAAE,QAAQ,CAAC,MAAM,CAAC,CAAC;YAC5F,CAAC;YAED,MAAM,iBAAiB,GAAG,QAAQ,CAAC,OAAO,CAAC,GAAG,CAAC,cAAc,CAAC,IAAI,0BAA0B,CAAC;YAE7F,+CAA+C;YAC/C,MAAM,WAAW,GACf,iBAAiB,CAAC,UAAU,CAAC,OAAO,CAAC;gBACrC,iBAAiB,CAAC,QAAQ,CAAC,MAAM,CAAC;gBAClC,iBAAiB,CAAC,QAAQ,CAAC,KAAK,CAAC;gBACjC,iBAAiB,CAAC,QAAQ,CAAC,YAAY,CAAC;gBACxC,iBAAiB,CAAC,QAAQ,CAAC,MAAM,CAAC;gBAClC,iBAAiB,CAAC,QAAQ,CAAC,KAAK,CAAC,CAAC;YAEpC,IAAI,OAAwB,CAAC;YAC7B,IAAI,WAAW,EAAE,CAAC;gBAChB,OAAO,GAAG,MAAM,QAAQ,CAAC,IAAI,EAAE,CAAC;YAClC,CAAC;iBAAM,CAAC;gBACN,MAAM,WAAW,GAAG,MAAM,QAAQ,CAAC,WAAW,EAAE,CAAC;gBACjD,OAAO,GAAG,MAAM,CAAC,IAAI,CAAC,WAAW,CAAC,CAAC;YACrC,CAAC;YAED,wCAAwC;YACxC,IAAI,KAAK,GAAkB,IAAI,CAAC;YAChC,IAAI,OAAO,OAAO,KAAK,QAAQ,IAAI,iBAAiB,CAAC,QAAQ,CAAC,MAAM,CAAC,EAAE,CAAC;gBACtE,MAAM,UAAU,GAAG,OAAO,CAAC,KAAK,CAAC,+BAA+B,CAAC,CAAC;gBAClE,KAAK,GAAG,UAAU,CAAC,CAAC,CAAC,UAAU,CAAC,CAAC,CAAC,CAAC,IAAI,EAAE,CAAC,CAAC,CAAC,IAAI,CAAC;YACnD,CAAC;YAED,OAAO;gBACL,OAAO;gBACP,WAAW,EAAE,iBAAiB;gBAC9B,KAAK;gBACL,GAAG,EAAE,QAAQ,CAAC,GAAG,EAAE,oCAAoC;gBACvD,WAAW,EAAE,KAAK;gBAClB,UAAU,EAAE,QAAQ,CAAC,MAAM;gBAC3B,KAAK,EAAE,SAAS;aACjB,CAAC;QACJ,CAAC;QAAC,OAAO,KAAc,EAAE,CAAC;YACxB,0CAA0C;YAC1C,IAAI,KAAK,YAAY,oBAAoB,EAAE,CAAC;gBAC1C,MAAM,KAAK,CAAC;YACd,CAAC;YACD,+BAA+B;YAC/B,MAAM,OAAO,GAAG,KAAK,YAAY,KAAK,CAAC,CAAC,CAAC,KAAK,CAAC,OAAO,CAAC,CAAC,CAAC,qBAAqB,CAAC;YAC/E,MAAM,IAAI,UAAU,CAClB,yBAAyB,OAAO,EAAE,EAClC,kBAAkB,EAClB,KAAK,YAAY,KAAK,CAAC,CAAC,CAAC,KAAK,CAAC,CAAC,CAAC,SAAS,CAC3C,CAAC;QACJ,CAAC;IACH,CAAC;IAED;;;;OAIG;IACH,KAAK,CAAC,OAAO;QACX,OAAO,OAAO,CAAC,OAAO,EAAE,CAAC;IAC3B,CAAC;IAED;;;;OAIG;IACH,UAAU;QACR,OAAO,EAAE,CAAC;IACZ,CAAC"}
+ {"version":3,"file":"FetchEngine.js","sourceRoot":"","sources":["../src/FetchEngine.ts"],"names":[],"mappings":"AASA,OAAO,EAAE,iBAAiB,EAAE,eAAe,EAAE,MAAM,+BAA+B,CAAC;AACnF,OAAO,EAAE,UAAU,EAAE,MAAM,aAAa,CAAC,CAAC,yBAAyB;AAEnE;;GAEG;AACH,MAAM,OAAO,oBAAqB,SAAQ,UAAU;IAGhC;IAFlB,YACE,OAAe,EACC,UAAkB;QAElC,KAAK,CAAC,OAAO,EAAE,gBAAgB,EAAE,SAAS,EAAE,UAAU,CAAC,CAAC;QAFxC,eAAU,GAAV,UAAU,CAAQ;QAGlC,IAAI,CAAC,IAAI,GAAG,sBAAsB,CAAC;IACrC,CAAC;CACF;AAED;;;;;GAKG;AACH,MAAM,OAAO,WAAW;IACL,OAAO,CAA+B;IAE/C,MAAM,CAAU,eAAe,GAAiC;QACtE,QAAQ,EAAE,KAAK;QACf,OAAO,EAAE,EAAE;KACZ,CAAC;IAEF;;;OAGG;IACH,YAAY,UAA8B,EAAE;QAC1C,IAAI,CAAC,OAAO,GAAG,EAAE,GAAG,WAAW,CAAC,eAAe,EAAE,GAAG,OAAO,EAAE,CAAC;IAChE,CAAC;IAED;;;;;;;OAOG;IACH,KAAK,CAAC,SAAS,CAAC,GAAW,EAAE,OAA4B;QACvD,MAAM,gBAAgB,GAAG,EAAE,GAAG,IAAI,CAAC,OAAO,EAAE,GAAG,OAAO,EAAE,CAAC,CAAC,uCAAuC;QACjG,IAAI,QAAkB,CAAC;QACvB,IAAI,CAAC;YACH,MAAM,WAAW,GAAG;gBAClB,YAAY,EACV,iHAAiH;gBACnH,MAAM,EAAE,kGAAkG;gBAC1G,iBAAiB,EAAE,gBAAgB;aACpC,CAAC;YAEF,6DAA6D;YAC7D,MAAM,kBAAkB,GAAG,IAAI,CAAC,OAAO,CAAC,OAAO,IAAI,EAAE,CAAC;YAEtD,sEAAsE;YACtE,0GAA0G;YAC1G,MAAM,mBAAmB,GAAG,OAAO,EAAE,OAAO,IAAI,EAAE,CAAC;YAEnD,MAAM,YAAY,GAAG;gBACnB,GAAG,WAAW;gBACd,GAAG,kBAAkB;gBACrB,GAAG,mBAAmB,EAAE,sFAAsF;aAC/G,CAAC;YAEF,QAAQ,GAAG,MAAM,KAAK,CAAC,GAAG,EAAE;gBAC1B,QAAQ,EAAE,QAAQ;gBAClB,OAAO,EAAE,YAAY;aACtB,CAAC,CAAC;YAEH,IAAI,CAAC,QAAQ,CAAC,EAAE,EAAE,CAAC;gBACjB,MAAM,IAAI,oBAAoB,CAAC,uBAAuB,QAAQ,CAAC,MAAM,EAAE,EAAE,QAAQ,CAAC,MAAM,CAAC,CAAC;YAC5F,CAAC;YAED,MAAM,iBAAiB,GAAG,QAAQ,CAAC,OAAO,CAAC,GAAG,CAAC,cAAc,CAAC,CAAC;YAC/D,IAAI,CAAC,iBAAiB,IAAI,CAAC,iBAAiB,CAAC,QAAQ,CAAC,WAAW,CAAC,EAAE,CAAC;gBACnE,MAAM,IAAI,UAAU,CAAC,+BAA+B,EAAE,sBAAsB,CAAC,CAAC;YAChF,CAAC;YAED,MAAM,IAAI,GAAG,MAAM,QAAQ,CAAC,IAAI,EAAE,CAAC;YACnC,MAAM,UAAU,GAAG,IAAI,CAAC,KAAK,CAAC,+BAA+B,CAAC,CAAC;YAC/D,MAAM,KAAK,GAAG,UAAU,CAAC,CAAC,CAAC,UAAU,CAAC,CAAC,CAAC,CAAC,IAAI,EAAE,CAAC,CAAC,CAAC,IAAI,CAAC;YAEvD,IAAI,YAAY,GAAG,IAAI,CAAC;YACxB,IAAI,gBAAgB,GAAwB,MAAM,CAAC;YAEnD,IAAI,gBAAgB,CAAC,QAAQ,EAAE,CAAC;gBAC9B,IAAI,CAAC
;oBACH,MAAM,SAAS,GAAG,IAAI,iBAAiB,EAAE,CAAC;oBAC1C,YAAY,GAAG,SAAS,CAAC,OAAO,CAAC,IAAI,EAAE,EAAE,OAAO,EAAE,QAAQ,CAAC,GAAG,IAAI,GAAG,EAAE,CAAC,CAAC;oBACzE,YAAY,GAAG,eAAe,CAAC,YAAY,EAAE,QAAQ,CAAC,GAAG,IAAI,GAAG,CAAC,CAAC;oBAClE,gBAAgB,GAAG,UAAU,CAAC;gBAChC,CAAC;gBAAC,OAAO,eAAwB,EAAE,CAAC;oBAClC,OAAO,CAAC,KAAK,CAAC,kCAAkC,GAAG,iBAAiB,EAAE,eAAe,CAAC,CAAC;oBACvF,gDAAgD;gBAClD,CAAC;YACH,CAAC;YAED,OAAO;gBACL,OAAO,EAAE,YAAY;gBACrB,WAAW,EAAE,gBAAgB;gBAC7B,KAAK,EAAE,KAAK;gBACZ,GAAG,EAAE,QAAQ,CAAC,GAAG,EAAE,oCAAoC;gBACvD,WAAW,EAAE,KAAK;gBAClB,UAAU,EAAE,QAAQ,CAAC,MAAM;gBAC3B,KAAK,EAAE,SAAS;aACjB,CAAC;QACJ,CAAC;QAAC,OAAO,KAAc,EAAE,CAAC;YACxB,0CAA0C;YAC1C,IACE,KAAK,YAAY,oBAAoB;gBACrC,CAAC,KAAK,YAAY,UAAU,IAAI,KAAK,CAAC,IAAI,KAAK,sBAAsB,CAAC,EACtE,CAAC;gBACD,MAAM,KAAK,CAAC;YACd,CAAC;YACD,+BAA+B;YAC/B,MAAM,OAAO,GAAG,KAAK,YAAY,KAAK,CAAC,CAAC,CAAC,KAAK,CAAC,OAAO,CAAC,CAAC,CAAC,qBAAqB,CAAC;YAC/E,MAAM,IAAI,UAAU,CAAC,iBAAiB,OAAO,EAAE,EAAE,kBAAkB,EAAE,KAAK,YAAY,KAAK,CAAC,CAAC,CAAC,KAAK,CAAC,CAAC,CAAC,SAAS,CAAC,CAAC;QACnH,CAAC;IACH,CAAC;IAED;;;;;;;;OAQG;IACH,KAAK,CAAC,YAAY,CAAC,GAAW,EAAE,OAA6B;QAC3D,IAAI,QAAkB,CAAC;QACvB,IAAI,CAAC;YACH,MAAM,WAAW,GAAG;gBAClB,YAAY,EACV,iHAAiH;gBACnH,MAAM,EAAE,KAAK,EAAE,mDAAmD;aACnE,CAAC;YAEF,sDAAsD;YACtD,MAAM,kBAAkB,GAAG,IAAI,CAAC,OAAO,CAAC,OAAO,IAAI,EAAE,CAAC;YACtD,MAAM,mBAAmB,GAAG,OAAO,EAAE,OAAO,IAAI,EAAE,CAAC;YAEnD,MAAM,YAAY,GAAG;gBACnB,GAAG,WAAW;gBACd,GAAG,kBAAkB;gBACrB,GAAG,mBAAmB;aACvB,CAAC;YAEF,QAAQ,GAAG,MAAM,KAAK,CAAC,GAAG,EAAE;gBAC1B,QAAQ,EAAE,QAAQ;gBAClB,OAAO,EAAE,YAAY;aACtB,CAAC,CAAC;YAEH,IAAI,CAAC,QAAQ,CAAC,EAAE,EAAE,CAAC;gBACjB,MAAM,IAAI,oBAAoB,CAAC,uBAAuB,QAAQ,CAAC,MAAM,EAAE,EAAE,QAAQ,CAAC,MAAM,CAAC,CAAC;YAC5F,CAAC;YAED,MAAM,iBAAiB,GAAG,QAAQ,CAAC,OAAO,CAAC,GAAG,CAAC,cAAc,CAAC,IAAI,0BAA0B,CAAC;YAE7F,+CAA+C;YAC/C,MAAM,WAAW,GACf,iBAAiB,CAAC,UAAU,CAAC,OAAO,CAAC;gBACrC,iBAAiB,CAAC,QAAQ,CAAC,MAAM,CAAC;gBAClC,iBAAiB,CAAC,QAAQ,CAAC,KAAK,CAAC;gBACjC,iBAAiB,CAAC,QAAQ,CAAC,YAAY,CAAC;gBACxC,iBAAiB,CAAC,QAAQ,CAAC,MAAM,CAAC;gBAClC,iBAAiB,CAAC,QAAQ,CAAC,KAAK,C
AAC,CAAC;YAEpC,IAAI,OAAwB,CAAC;YAC7B,IAAI,WAAW,EAAE,CAAC;gBAChB,OAAO,GAAG,MAAM,QAAQ,CAAC,IAAI,EAAE,CAAC;YAClC,CAAC;iBAAM,CAAC;gBACN,MAAM,WAAW,GAAG,MAAM,QAAQ,CAAC,WAAW,EAAE,CAAC;gBACjD,OAAO,GAAG,MAAM,CAAC,IAAI,CAAC,WAAW,CAAC,CAAC;YACrC,CAAC;YAED,wCAAwC;YACxC,IAAI,KAAK,GAAkB,IAAI,CAAC;YAChC,IAAI,OAAO,OAAO,KAAK,QAAQ,IAAI,iBAAiB,CAAC,QAAQ,CAAC,MAAM,CAAC,EAAE,CAAC;gBACtE,MAAM,UAAU,GAAG,OAAO,CAAC,KAAK,CAAC,+BAA+B,CAAC,CAAC;gBAClE,KAAK,GAAG,UAAU,CAAC,CAAC,CAAC,UAAU,CAAC,CAAC,CAAC,CAAC,IAAI,EAAE,CAAC,CAAC,CAAC,IAAI,CAAC;YACnD,CAAC;YAED,OAAO;gBACL,OAAO;gBACP,WAAW,EAAE,iBAAiB;gBAC9B,KAAK;gBACL,GAAG,EAAE,QAAQ,CAAC,GAAG,EAAE,oCAAoC;gBACvD,WAAW,EAAE,KAAK;gBAClB,UAAU,EAAE,QAAQ,CAAC,MAAM;gBAC3B,KAAK,EAAE,SAAS;aACjB,CAAC;QACJ,CAAC;QAAC,OAAO,KAAc,EAAE,CAAC;YACxB,0CAA0C;YAC1C,IAAI,KAAK,YAAY,oBAAoB,EAAE,CAAC;gBAC1C,MAAM,KAAK,CAAC;YACd,CAAC;YACD,+BAA+B;YAC/B,MAAM,OAAO,GAAG,KAAK,YAAY,KAAK,CAAC,CAAC,CAAC,KAAK,CAAC,OAAO,CAAC,CAAC,CAAC,qBAAqB,CAAC;YAC/E,MAAM,IAAI,UAAU,CAClB,yBAAyB,OAAO,EAAE,EAClC,kBAAkB,EAClB,KAAK,YAAY,KAAK,CAAC,CAAC,CAAC,KAAK,CAAC,CAAC,CAAC,SAAS,CAC3C,CAAC;QACJ,CAAC;IACH,CAAC;IAED;;;;OAIG;IACH,KAAK,CAAC,OAAO;QACX,OAAO,OAAO,CAAC,OAAO,EAAE,CAAC;IAC3B,CAAC;IAED;;;;OAIG;IACH,UAAU;QACR,OAAO,EAAE,CAAC;IACZ,CAAC"}
@@ -9,7 +9,8 @@ export declare class HybridEngine implements IEngine {
  private readonly config;
  private readonly playwrightOnlyPatterns;
  constructor(config?: PlaywrightEngineConfig);
- private _isSpaShell;
+ private _convertHtmlToMarkdown;
+ private _shouldAutoRender;
  fetchHTML(url: string, options?: FetchOptions): Promise<HTMLFetchResult>;
  /**
  * Fetches raw content from the specified URL using the hybrid approach.
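These declarations replace the private `_isSpaShell` check with `_shouldAutoRender`, which (per the `HybridEngine.js` diff below) returns `true` when `spaMode` forces rendering, when `isSoftBlockPage` flags the HTML, or when `assessHtmlRenderNeed(...).renderLikelyNeeded` is set. The bodies of those `render-detection.js` helpers are not shown, so this sketch reuses the checks from the removed `_isSpaShell` (a `<noscript>` tag, an empty `root`/`app` div, an empty or missing `<title>`) plus a small, hypothetical list of soft-block markers. The marker strings and all function names here are assumptions, not the library's detection rules.

```typescript
// Hypothetical soft-block markers; the library's isSoftBlockPage list is not in this diff.
const SOFT_BLOCK_MARKERS = [
  "verify you are human",
  "verify you're human",
  "checking your browser",
  "/cdn-cgi/challenge-platform", // path used by Cloudflare challenge scripts
  "g-recaptcha",
  "h-captcha",
];

// Checks carried over from the removed HybridEngine._isSpaShell.
function looksLikeAppShell(html: string): boolean {
  if (!html) return true;
  if (html.includes("<noscript>")) return true;
  if (/<div id=(?:"|')?(root|app)(?:"|')?[^>]*>\s*<\/div>/i.test(html)) return true;
  if (/<title>\s*<\/title>/i.test(html) || !/<title[^>]*>/i.test(html)) return true;
  return false;
}

function looksLikeSoftBlock(html: string): boolean {
  const lower = html.toLowerCase();
  return SOFT_BLOCK_MARKERS.some((marker) => lower.includes(marker));
}

// Same decision order as _shouldAutoRender: forced SPA mode, then soft block, then shell check.
function shouldAutoRender(html: string, forceSpaMode: boolean): boolean {
  if (forceSpaMode) return true;
  if (looksLikeSoftBlock(html)) return true;
  return looksLikeAppShell(html);
}
```

The bare `<noscript>` rule flags many ordinary server-rendered pages that carry a tracking `<noscript>` tag, which may be one reason the release moved this decision into a dedicated `assessHtmlRenderNeed` helper; the sketch keeps it only to mirror the old code.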
@@ -1 +1 @@
- {"version":3,"file":"HybridEngine.d.ts","sourceRoot":"","sources":["../src/HybridEngine.ts"],"names":[],"mappings":"AAEA,OAAO,KAAK,EAAE,OAAO,EAAE,MAAM,cAAc,CAAC;AAC5C,OAAO,KAAK,EACV,eAAe,EACf,kBAAkB,EAClB,mBAAmB,EACnB,sBAAsB,EACtB,YAAY,EACZ,cAAc,EACf,MAAM,YAAY,CAAC;AAEpB;;GAEG;AACH,qBAAa,YAAa,YAAW,OAAO;IAC1C,OAAO,CAAC,QAAQ,CAAC,WAAW,CAAc;IAC1C,OAAO,CAAC,QAAQ,CAAC,gBAAgB,CAAmB;IACpD,OAAO,CAAC,QAAQ,CAAC,MAAM,CAAyB;IAChD,OAAO,CAAC,QAAQ,CAAC,sBAAsB,CAAsB;gBAEjD,MAAM,GAAE,sBAA2B;IAU/C,OAAO,CAAC,WAAW;IAkBb,SAAS,CAAC,GAAG,EAAE,MAAM,EAAE,OAAO,GAAE,YAAiB,GAAG,OAAO,CAAC,eAAe,CAAC;IAsFlF;;;;;;;;;OASG;IACG,YAAY,CAAC,GAAG,EAAE,MAAM,EAAE,OAAO,GAAE,mBAAwB,GAAG,OAAO,CAAC,kBAAkB,CAAC;IA0C/F;;OAEG;IACH,UAAU,IAAI,cAAc,EAAE;IAI9B;;OAEG;IACG,OAAO,IAAI,OAAO,CAAC,IAAI,CAAC;CAM/B"}
+ {"version":3,"file":"HybridEngine.d.ts","sourceRoot":"","sources":["../src/HybridEngine.ts"],"names":[],"mappings":"AAEA,OAAO,KAAK,EAAE,OAAO,EAAE,MAAM,cAAc,CAAC;AAQ5C,OAAO,KAAK,EACV,eAAe,EACf,kBAAkB,EAClB,mBAAmB,EACnB,sBAAsB,EACtB,YAAY,EACZ,cAAc,EACf,MAAM,YAAY,CAAC;AAEpB;;GAEG;AACH,qBAAa,YAAa,YAAW,OAAO;IAC1C,OAAO,CAAC,QAAQ,CAAC,WAAW,CAAc;IAC1C,OAAO,CAAC,QAAQ,CAAC,gBAAgB,CAAmB;IACpD,OAAO,CAAC,QAAQ,CAAC,MAAM,CAAyB;IAChD,OAAO,CAAC,QAAQ,CAAC,sBAAsB,CAAsB;gBAEjD,MAAM,GAAE,sBAA2B;IAS/C,OAAO,CAAC,sBAAsB;IAkB9B,OAAO,CAAC,iBAAiB;IAUnB,SAAS,CAAC,GAAG,EAAE,MAAM,EAAE,OAAO,GAAE,YAAiB,GAAG,OAAO,CAAC,eAAe,CAAC;IAsGlF;;;;;;;;;OASG;IACG,YAAY,CAAC,GAAG,EAAE,MAAM,EAAE,OAAO,GAAE,mBAAwB,GAAG,OAAO,CAAC,kBAAkB,CAAC;IA0C/F;;OAEG;IACH,UAAU,IAAI,cAAc,EAAE;IAI9B;;OAEG;IACG,OAAO,IAAI,OAAO,CAAC,IAAI,CAAC;CAM/B"}
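The `HybridEngine.js` diff that follows changes the order of operations in `fetchHTML`: the HTTP fetch always requests raw HTML (`markdown: false`), Markdown conversion is applied to that HTTP result only when requested, and the render decision is made on the raw HTML. The sketch below models that flow with dependencies passed in as plain functions so it runs without Playwright; `hybridFetch`, `HybridDeps` and `FetchResultLike` are hypothetical names. The browser branch is an assumption: it simply returns the rendered page, whereas the import of `isRenderedContentMeaningfullyBetter` suggests the real engine also compares rendered output with the HTTP result.

```typescript
interface FetchResultLike {
  content: string;
  contentType: "html" | "markdown";
  url: string;
}

// Hypothetical dependency bundle standing in for FetchEngine, PlaywrightEngine and the detection helpers.
interface HybridDeps {
  httpFetch: (url: string) => Promise<FetchResultLike>; // plain HTTP GET returning raw HTML
  browserFetch: (url: string) => Promise<FetchResultLike>; // browser-rendered HTML
  needsRender: (html: string) => boolean; // shell / soft-block decision
  toMarkdown: (result: FetchResultLike) => FetchResultLike;
}

async function hybridFetch(url: string, wantMarkdown: boolean, deps: HybridDeps): Promise<FetchResultLike> {
  // 1. Always fetch raw HTML so detection inspects markup, not converted Markdown.
  const raw = await deps.httpFetch(url);
  // 2. Prepare the cheap HTTP answer in the requested output format.
  const httpResult = wantMarkdown ? deps.toMarkdown(raw) : raw;
  // 3. Return early when the raw HTML looks complete.
  if (!deps.needsRender(raw.content)) return httpResult;
  // 4. Assumption: otherwise render in a browser and return that result instead.
  const rendered = await deps.browserFetch(url);
  return wantMarkdown ? deps.toMarkdown(rendered) : rendered;
}
```

Deciding on raw HTML is the substance of the change: shell markers such as an empty `<div id="root">` or a missing `<title>` do not survive Markdown conversion, which is why the old constructor's `markdown: config.markdown` is replaced with `markdown: false` in the hunk below.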
@@ -1,5 +1,7 @@
  import { FetchEngine, FetchEngineHttpError } from "./FetchEngine.js";
  import { PlaywrightEngine } from "./PlaywrightEngine.js";
+ import { MarkdownConverter, injectSourceUrl } from "./utils/markdown-converter.js";
+ import { assessHtmlRenderNeed, assessSerializedContent, isRenderedContentMeaningfullyBetter, isSoftBlockPage, } from "./utils/render-detection.js";
  /**
  * HybridEngine - Tries FetchEngine first, falls back to PlaywrightEngine on failure.
  */
@@ -10,30 +12,35 @@ export class HybridEngine {
  playwrightOnlyPatterns;
  constructor(config = {}) {
  // Pass relevant config parts to each engine
- // FetchEngine only takes markdown option from the shared config
- // spaMode from config is primarily for PlaywrightEngine, but HybridEngine uses it for decision making.
- this.fetchEngine = new FetchEngine({ markdown: config.markdown, headers: config.headers });
+ // HybridEngine fetches raw HTML first so it can decide whether rendering is necessary.
+ this.fetchEngine = new FetchEngine({ markdown: false, headers: config.headers });
  this.playwrightEngine = new PlaywrightEngine(config);
  this.config = config; // Store for merging later
  this.playwrightOnlyPatterns = config.playwrightOnlyPatterns || [];
  }
- _isSpaShell(htmlContent) {
- if (!htmlContent || htmlContent.length < 150) {
- // Very short content might be a shell or error
- // Heuristic: if it's very short AND contains noscript, good chance it's a shell.
- if (htmlContent.includes("<noscript>"))
- return true;
+ _convertHtmlToMarkdown(htmlResult) {
+ try {
+ const converter = new MarkdownConverter();
+ const content = injectSourceUrl(converter.convert(htmlResult.content, { baseUrl: htmlResult.url }), htmlResult.url);
+ return {
+ ...htmlResult,
+ content,
+ contentType: "markdown",
+ };
  }
- // Check for <noscript> tag
- if (htmlContent.includes("<noscript>"))
- return true;
- // Check for common empty root divs
- if (/<div id=(?:"|')?(root|app)(?:"|')?[^>]*>\s*<\/div>/i.test(htmlContent))
+ catch (conversionError) {
+ console.error(`HybridEngine: Markdown conversion failed for ${htmlResult.url}:`, conversionError);
+ return htmlResult;
+ }
+ }
+ _shouldAutoRender(fetchResult, forceSpaMode) {
+ if (forceSpaMode) {
  return true;
- // Check for empty title tag or no title tag at all
- if (/<title>\s*<\/title>/i.test(htmlContent) || !/<title[^>]*>/i.test(htmlContent))
+ }
+ if (isSoftBlockPage(fetchResult.content)) {
  return true;
- return false;
+ }
+ return assessHtmlRenderNeed(fetchResult.content).renderLikelyNeeded;
  }
  async fetchHTML(url, options = {}) {
  // Determine effective SPA mode and markdown options
@@ -70,22 +77,36 @@ export class HybridEngine {
  }
  }
  try {
- // Prepare options for FetchEngine call
- const fetchEngineCallSpecificOptions = {
- markdown: effectiveMarkdown, // Pass the resolved markdown setting
- headers: options.headers, // Pass only the request-specific headers. FetchEngine will merge these with its own constructor headers.
+ const fetchResult = await this.fetchEngine.fetchHTML(url, {
+ markdown: false,
+ headers: options.headers,
+ });
+ const httpPreferredResult = effectiveMarkdown ? this._convertHtmlToMarkdown(fetchResult) : fetchResult;
+ if (!this._shouldAutoRender(fetchResult, effectiveSpaMode)) {
+ return httpPreferredResult;
+ }
+ console.warn(`HybridEngine: HTTP fetch for ${url} looks incomplete. Attempting Playwright render.`);
+ // Skip HTTP fallback (we already know it's a shell) and use SPA rendering path for patient waits.
+ const autoRenderOptions = {
+ ...playwrightOptions,
+ useHttpFallback: false,
+ spaMode: true,
  };
- const fetchResult = await this.fetchEngine.fetchHTML(url, fetchEngineCallSpecificOptions);
- // If FetchEngine succeeded AND spaMode is active, check if it's just a shell
- if (effectiveSpaMode && fetchResult && fetchResult.content) {
- if (this._isSpaShell(fetchResult.content)) {
- console.warn(`HybridEngine: FetchEngine returned likely SPA shell for ${url} in spaMode. Forcing PlaywrightEngine.`);
- // Fallback to PlaywrightEngine, passing the determined effective options
- return this.playwrightEngine.fetchHTML(url, playwrightOptions);
+ try {
+ const playwrightResult = await this.playwrightEngine.fetchHTML(url, autoRenderOptions);
+ const staticAssessment = assessSerializedContent(httpPreferredResult.content, httpPreferredResult.contentType);
+ const renderedAssessment = assessSerializedContent(playwrightResult.content, playwrightResult.contentType);
+ if (!isRenderedContentMeaningfullyBetter(staticAssessment, renderedAssessment)) {
+ console.warn(`HybridEngine: Playwright render for ${url} was not meaningfully better. Keeping HTTP result.`);
+ return httpPreferredResult;
  }
+ return playwrightResult;
+ }
+ catch (playwrightError) {
+ const pwMessage = playwrightError instanceof Error ? playwrightError.message : String(playwrightError);
+ console.warn(`HybridEngine: Playwright render failed for ${url}: ${pwMessage}. Returning HTTP result.`);
+ return httpPreferredResult;
  }
- // If not spaMode, or if spaMode but content is not a shell, return FetchEngine's result
- return fetchResult;
  }
  catch (fetchError) {
  // If FetchEngine returned a 404, do not attempt Playwright fallback
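In 0.10.0, `HybridEngine.fetchHTML` always fetches raw HTML over HTTP first. It then uses `_shouldAutoRender` to decide whether a Playwright render is needed: forced SPA mode, a soft-block page, or `assessHtmlRenderNeed(...).renderLikelyNeeded` all trigger one. The rendered result is kept only when `isRenderedContentMeaningfullyBetter` says so. If the render fails, the HTTP result is returned instead.

The bodies of the `render-detection.js` helpers are not part of this diff. The sketch below therefore substitutes hypothetical heuristics for them: `isSoftBlockPageSketch`, `renderLikelyNeededSketch` and `isMeaningfullyBetter`, with made-up thresholds. It also skips the Markdown conversion step. Treat it as an illustration of the control flow, not the library's implementation.

```typescript
// Hypothetical sketch of the 0.10.0 HybridEngine decision flow.
// The heuristics and thresholds below are assumptions, not the library's code.
interface FetchResult {
  url: string;
  content: string;
  contentType: "html" | "markdown";
}

interface Assessment {
  textLength: number;
}

// Strip scripts, styles and tags to approximate the visible text.
function visibleText(html: string): string {
  return html
    .replace(/<(script|style)[^>]*>[\s\S]*?<\/\1>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Assumed soft-block heuristic: common interstitial phrases.
function isSoftBlockPageSketch(html: string): boolean {
  return /verify you are human|enable javascript|access denied/i.test(html);
}

// Assumed render-need heuristic: an empty SPA root div or very little visible text.
function renderLikelyNeededSketch(html: string): boolean {
  const emptyRoot = /<div id=["']?(root|app)["']?[^>]*>\s*<\/div>/i.test(html);
  return emptyRoot || visibleText(html).length < 200;
}

function assess(result: FetchResult): Assessment {
  const text = result.contentType === "html" ? visibleText(result.content) : result.content.trim();
  return { textLength: text.length };
}

// Assumed rule: the render must add at least 50% more text and at least 200 more characters.
function isMeaningfullyBetter(staticA: Assessment, renderedA: Assessment): boolean {
  return (
    renderedA.textLength >= staticA.textLength * 1.5 &&
    renderedA.textLength - staticA.textLength >= 200
  );
}

async function fetchHybrid(
  url: string,
  httpFetch: (u: string) => Promise<FetchResult>,
  browserRender: (u: string) => Promise<FetchResult>,
  forceSpaMode = false,
): Promise<FetchResult> {
  const httpResult = await httpFetch(url);
  const shouldRender =
    forceSpaMode || isSoftBlockPageSketch(httpResult.content) || renderLikelyNeededSketch(httpResult.content);
  if (!shouldRender) return httpResult;
  try {
    const rendered = await browserRender(url);
    // Keep the cheaper HTTP result unless the render clearly adds content.
    return isMeaningfullyBetter(assess(httpResult), assess(rendered)) ? rendered : httpResult;
  } catch {
    // A failed render falls back to the HTTP result instead of throwing.
    return httpResult;
  }
}
```

The "keep HTTP unless rendering is clearly better" rule guards against a common problem. Without it, a browser render that hits the same block page, or adds only chrome, would replace an HTTP result that was already usable.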
@@ -1 +1 @@
- {"version":3,"file":"HybridEngine.js","sourceRoot":"","sources":["../src/HybridEngine.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,WAAW,EAAE,oBAAoB,EAAE,MAAM,kBAAkB,CAAC;AACrE,OAAO,EAAE,gBAAgB,EAAE,MAAM,uBAAuB,CAAC;AAWzD;;GAEG;AACH,MAAM,OAAO,YAAY;IACN,WAAW,CAAc;IACzB,gBAAgB,CAAmB;IACnC,MAAM,CAAyB,CAAC,sDAAsD;IACtF,sBAAsB,CAAsB;IAE7D,YAAY,SAAiC,EAAE;QAC7C,4CAA4C;QAC5C,gEAAgE;QAChE,uGAAuG;QACvG,IAAI,CAAC,WAAW,GAAG,IAAI,WAAW,CAAC,EAAE,QAAQ,EAAE,MAAM,CAAC,QAAQ,EAAE,OAAO,EAAE,MAAM,CAAC,OAAO,EAAE,CAAC,CAAC;QAC3F,IAAI,CAAC,gBAAgB,GAAG,IAAI,gBAAgB,CAAC,MAAM,CAAC,CAAC;QACrD,IAAI,CAAC,MAAM,GAAG,MAAM,CAAC,CAAC,0BAA0B;QAChD,IAAI,CAAC,sBAAsB,GAAG,MAAM,CAAC,sBAAsB,IAAI,EAAE,CAAC;IACpE,CAAC;IAEO,WAAW,CAAC,WAAmB;QACrC,IAAI,CAAC,WAAW,IAAI,WAAW,CAAC,MAAM,GAAG,GAAG,EAAE,CAAC;YAC7C,+CAA+C;YAC/C,iFAAiF;YACjF,IAAI,WAAW,CAAC,QAAQ,CAAC,YAAY,CAAC;gBAAE,OAAO,IAAI,CAAC;QACtD,CAAC;QACD,2BAA2B;QAC3B,IAAI,WAAW,CAAC,QAAQ,CAAC,YAAY,CAAC;YAAE,OAAO,IAAI,CAAC;QAEpD,mCAAmC;QACnC,IAAI,qDAAqD,CAAC,IAAI,CAAC,WAAW,CAAC;YAAE,OAAO,IAAI,CAAC;QAEzF,mDAAmD;QACnD,IAAI,sBAAsB,CAAC,IAAI,CAAC,WAAW,CAAC,IAAI,CAAC,eAAe,CAAC,IAAI,CAAC,WAAW,CAAC;YAAE,OAAO,IAAI,CAAC;QAEhG,OAAO,KAAK,CAAC;IACf,CAAC;IAED,KAAK,CAAC,SAAS,CAAC,GAAW,EAAE,UAAwB,EAAE;QACrD,oDAAoD;QACpD,gHAAgH;QAChH,MAAM,gBAAgB,GACpB,OAAO,CAAC,OAAO,KAAK,SAAS,CAAC,CAAC,CAAC,OAAO,CAAC,OAAO,CAAC,CAAC,CAAC,IAAI,CAAC,MAAM,CAAC,OAAO,KAAK,SAAS,CAAC,CAAC,CAAC,IAAI,CAAC,MAAM,CAAC,OAAO,CAAC,CAAC,CAAC,KAAK,CAAC;QACpH,MAAM,iBAAiB,GACrB,OAAO,CAAC,QAAQ,KAAK,SAAS;YAC5B,CAAC,CAAC,OAAO,CAAC,QAAQ;YAClB,CAAC,CAAC,IAAI,CAAC,MAAM,CAAC,QAAQ,KAAK,SAAS;gBAClC,CAAC,CAAC,IAAI,CAAC,MAAM,CAAC,QAAQ;gBACtB,CAAC,CAAC,KAAK,CAAC;QAEd,yFAAyF;QACzF,mEAAmE;QACnE,MAAM,kBAAkB,GAAG,IAAI,CAAC,MAAM,CAAC,OAAO,IAAI,EAAE,CAAC;QACrD,MAAM,sBAAsB,GAAG,OAAO,CAAC,OAAO,IAAI,EAAE,CAAC,CAAC,mEAAmE;QAEzH,8DAA8D;QAC9D,MAAM,0BAA0B,GAAG,EAAE,GAAG,kBAAkB,EAAE,GAAG,sBAAsB,EAAE,CAAC;QAExF,kEAAkE;QAClE,MAAM,iBAAiB,GAInB;YACF,GAAG,IAAI,CAAC,MAAM,EAAE,gEAAgE;YAChF,GAAG,OAAO,EAAE,yDAAyD;YACrE,OAAO,EAAE,0BAA0B,EAAE,sCAAsC;YA
C3E,QAAQ,EAAE,iBAAiB;YAC3B,OAAO,EAAE,gBAAgB;SAC1B,CAAC;QAEF,qCAAqC;QACrC,KAAK,MAAM,OAAO,IAAI,IAAI,CAAC,sBAAsB,EAAE,CAAC;YAClD,IAAI,OAAO,OAAO,KAAK,QAAQ,IAAI,GAAG,CAAC,QAAQ,CAAC,OAAO,CAAC,EAAE,CAAC;gBACzD,OAAO,CAAC,IAAI,CAAC,qBAAqB,GAAG,4BAA4B,OAAO,qCAAqC,CAAC,CAAC;gBAC/G,OAAO,IAAI,CAAC,gBAAgB,CAAC,SAAS,CAAC,GAAG,EAAE,iBAAiB,CAAC,CAAC;YACjE,CAAC;iBAAM,IAAI,OAAO,YAAY,MAAM,IAAI,OAAO,CAAC,IAAI,CAAC,GAAG,CAAC,EAAE,CAAC;gBAC1D,OAAO,CAAC,IAAI,CACV,qBAAqB,GAAG,2BAA2B,OAAO,CAAC,QAAQ,EAAE,qCAAqC,CAC3G,CAAC;gBACF,OAAO,IAAI,CAAC,gBAAgB,CAAC,SAAS,CAAC,GAAG,EAAE,iBAAiB,CAAC,CAAC;YACjE,CAAC;QACH,CAAC;QAED,IAAI,CAAC;YACH,uCAAuC;YACvC,MAAM,8BAA8B,GAAiB;gBACnD,QAAQ,EAAE,iBAAiB,EAAE,qCAAqC;gBAClE,OAAO,EAAE,OAAO,CAAC,OAAO,EAAE,yGAAyG;aACpI,CAAC;YACF,MAAM,WAAW,GAAG,MAAM,IAAI,CAAC,WAAW,CAAC,SAAS,CAAC,GAAG,EAAE,8BAA8B,CAAC,CAAC;YAE1F,6EAA6E;YAC7E,IAAI,gBAAgB,IAAI,WAAW,IAAI,WAAW,CAAC,OAAO,EAAE,CAAC;gBAC3D,IAAI,IAAI,CAAC,WAAW,CAAC,WAAW,CAAC,OAAO,CAAC,EAAE,CAAC;oBAC1C,OAAO,CAAC,IAAI,CACV,2DAA2D,GAAG,wCAAwC,CACvG,CAAC;oBACF,yEAAyE;oBACzE,OAAO,IAAI,CAAC,gBAAgB,CAAC,SAAS,CAAC,GAAG,EAAE,iBAAiB,CAAC,CAAC;gBACjE,CAAC;YACH,CAAC;YACD,wFAAwF;YACxF,OAAO,WAAW,CAAC;QACrB,CAAC;QAAC,OAAO,UAAmB,EAAE,CAAC;YAC7B,oEAAoE;YACpE,IAAI,UAAU,YAAY,oBAAoB,IAAI,UAAU,CAAC,UAAU,KAAK,GAAG,EAAE,CAAC;gBAChF,OAAO,CAAC,IAAI,CAAC,8CAA8C,GAAG,qBAAqB,CAAC,CAAC;gBACrF,MAAM,UAAU,CAAC;YACnB,CAAC;YACD,MAAM,OAAO,GAAG,UAAU,YAAY,KAAK,CAAC,CAAC,CAAC,UAAU,CAAC,OAAO,CAAC,CAAC,CAAC,MAAM,CAAC,UAAU,CAAC,CAAC;YACtF,OAAO,CAAC,IAAI,CAAC,wCAAwC,GAAG,KAAK,OAAO,qCAAqC,CAAC,CAAC;YAC3G,IAAI,CAAC;gBACH,yEAAyE;gBACzE,MAAM,gBAAgB,GAAG,MAAM,IAAI,CAAC,gBAAgB,CAAC,SAAS,CAAC,GAAG,EAAE,iBAAiB,CAAC,CAAC;gBACvF,OAAO,gBAAgB,CAAC;YAC1B,CAAC;YAAC,OAAO,eAAwB,EAAE,CAAC;gBAClC,MAAM,SAAS,GAAG,eAAe,YAAY,KAAK,CAAC,CAAC,CAAC,eAAe,CAAC,OAAO,CAAC,CAAC,CAAC,MAAM,CAAC,eAAe,CAAC,CAAC;gBACvG,OAAO,CAAC,KAAK,CAAC,2DAA2D,GAAG,KAAK,SAAS,EAAE,CAAC,CAAC;gBAC9F,MAAM,eAAe,CAAC,CAAC,8DAA8D;YACvF,CAAC;QACH,CAAC;IACH,CAAC;IAED;;;;;;;;;OASG;IACH,KAAK,CAAC,YAAY,CAAC,GAAW,EAAE,UAA+B,EAAE;
QAC/D,qCAAqC;QACrC,KAAK,MAAM,OAAO,IAAI,IAAI,CAAC,sBAAsB,EAAE,CAAC;YAClD,IAAI,OAAO,OAAO,KAAK,QAAQ,IAAI,GAAG,CAAC,QAAQ,CAAC,OAAO,CAAC,EAAE,CAAC;gBACzD,OAAO,CAAC,IAAI,CACV,qBAAqB,GAAG,4BAA4B,OAAO,uDAAuD,CACnH,CAAC;gBACF,OAAO,IAAI,CAAC,gBAAgB,CAAC,YAAY,CAAC,GAAG,EAAE,OAAO,CAAC,CAAC;YAC1D,CAAC;iBAAM,IAAI,OAAO,YAAY,MAAM,IAAI,OAAO,CAAC,IAAI,CAAC,GAAG,CAAC,EAAE,CAAC;gBAC1D,OAAO,CAAC,IAAI,CACV,qBAAqB,GAAG,2BAA2B,OAAO,CAAC,QAAQ,EAAE,uDAAuD,CAC7H,CAAC;gBACF,OAAO,IAAI,CAAC,gBAAgB,CAAC,YAAY,CAAC,GAAG,EAAE,OAAO,CAAC,CAAC;YAC1D,CAAC;QACH,CAAC;QAED,IAAI,CAAC;YACH,wBAAwB;YACxB,MAAM,WAAW,GAAG,MAAM,IAAI,CAAC,WAAW,CAAC,YAAY,CAAC,GAAG,EAAE,OAAO,CAAC,CAAC;YACtE,OAAO,WAAW,CAAC;QACrB,CAAC;QAAC,OAAO,UAAmB,EAAE,CAAC;YAC7B,oEAAoE;YACpE,IAAI,UAAU,YAAY,oBAAoB,IAAI,UAAU,CAAC,UAAU,KAAK,GAAG,EAAE,CAAC;gBAChF,OAAO,CAAC,IAAI,CAAC,4DAA4D,GAAG,qBAAqB,CAAC,CAAC;gBACnG,MAAM,UAAU,CAAC;YACnB,CAAC;YACD,MAAM,OAAO,GAAG,UAAU,YAAY,KAAK,CAAC,CAAC,CAAC,UAAU,CAAC,OAAO,CAAC,CAAC,CAAC,MAAM,CAAC,UAAU,CAAC,CAAC;YACtF,OAAO,CAAC,IAAI,CACV,sDAAsD,GAAG,KAAK,OAAO,qCAAqC,CAC3G,CAAC;YACF,IAAI,CAAC;gBACH,+BAA+B;gBAC/B,MAAM,gBAAgB,GAAG,MAAM,IAAI,CAAC,gBAAgB,CAAC,YAAY,CAAC,GAAG,EAAE,OAAO,CAAC,CAAC;gBAChF,OAAO,gBAAgB,CAAC;YAC1B,CAAC;YAAC,OAAO,eAAwB,EAAE,CAAC;gBAClC,MAAM,SAAS,GAAG,eAAe,YAAY,KAAK,CAAC,CAAC,CAAC,eAAe,CAAC,OAAO,CAAC,CAAC,CAAC,MAAM,CAAC,eAAe,CAAC,CAAC;gBACvG,OAAO,CAAC,KAAK,CAAC,yEAAyE,GAAG,KAAK,SAAS,EAAE,CAAC,CAAC;gBAC5G,MAAM,eAAe,CAAC,CAAC,8DAA8D;YACvF,CAAC;QACH,CAAC;IACH,CAAC;IAED;;OAEG;IACH,UAAU;QACR,OAAO,IAAI,CAAC,gBAAgB,CAAC,UAAU,EAAE,CAAC;IAC5C,CAAC;IAED;;OAEG;IACH,KAAK,CAAC,OAAO;QACX,MAAM,OAAO,CAAC,UAAU,CAAC;YACvB,IAAI,CAAC,WAAW,CAAC,OAAO,EAAE,EAAE,yCAAyC;YACrE,IAAI,CAAC,gBAAgB,CAAC,OAAO,EAAE;SAChC,CAAC,CAAC;IACL,CAAC;CACF"}
+ {"version":3,"file":"HybridEngine.js","sourceRoot":"","sources":["../src/HybridEngine.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,WAAW,EAAE,oBAAoB,EAAE,MAAM,kBAAkB,CAAC;AACrE,OAAO,EAAE,gBAAgB,EAAE,MAAM,uBAAuB,CAAC;AAEzD,OAAO,EAAE,iBAAiB,EAAE,eAAe,EAAE,MAAM,+BAA+B,CAAC;AACnF,OAAO,EACL,oBAAoB,EACpB,uBAAuB,EACvB,mCAAmC,EACnC,eAAe,GAChB,MAAM,6BAA6B,CAAC;AAUrC;;GAEG;AACH,MAAM,OAAO,YAAY;IACN,WAAW,CAAc;IACzB,gBAAgB,CAAmB;IACnC,MAAM,CAAyB,CAAC,sDAAsD;IACtF,sBAAsB,CAAsB;IAE7D,YAAY,SAAiC,EAAE;QAC7C,4CAA4C;QAC5C,uFAAuF;QACvF,IAAI,CAAC,WAAW,GAAG,IAAI,WAAW,CAAC,EAAE,QAAQ,EAAE,KAAK,EAAE,OAAO,EAAE,MAAM,CAAC,OAAO,EAAE,CAAC,CAAC;QACjF,IAAI,CAAC,gBAAgB,GAAG,IAAI,gBAAgB,CAAC,MAAM,CAAC,CAAC;QACrD,IAAI,CAAC,MAAM,GAAG,MAAM,CAAC,CAAC,0BAA0B;QAChD,IAAI,CAAC,sBAAsB,GAAG,MAAM,CAAC,sBAAsB,IAAI,EAAE,CAAC;IACpE,CAAC;IAEO,sBAAsB,CAAC,UAA2B;QACxD,IAAI,CAAC;YACH,MAAM,SAAS,GAAG,IAAI,iBAAiB,EAAE,CAAC;YAC1C,MAAM,OAAO,GAAG,eAAe,CAC7B,SAAS,CAAC,OAAO,CAAC,UAAU,CAAC,OAAO,EAAE,EAAE,OAAO,EAAE,UAAU,CAAC,GAAG,EAAE,CAAC,EAClE,UAAU,CAAC,GAAG,CACf,CAAC;YACF,OAAO;gBACL,GAAG,UAAU;gBACb,OAAO;gBACP,WAAW,EAAE,UAAU;aACxB,CAAC;QACJ,CAAC;QAAC,OAAO,eAAwB,EAAE,CAAC;YAClC,OAAO,CAAC,KAAK,CAAC,gDAAgD,UAAU,CAAC,GAAG,GAAG,EAAE,eAAe,CAAC,CAAC;YAClG,OAAO,UAAU,CAAC;QACpB,CAAC;IACH,CAAC;IAEO,iBAAiB,CAAC,WAA4B,EAAE,YAAqB;QAC3E,IAAI,YAAY,EAAE,CAAC;YACjB,OAAO,IAAI,CAAC;QACd,CAAC;QACD,IAAI,eAAe,CAAC,WAAW,CAAC,OAAO,CAAC,EAAE,CAAC;YACzC,OAAO,IAAI,CAAC;QACd,CAAC;QACD,OAAO,oBAAoB,CAAC,WAAW,CAAC,OAAO,CAAC,CAAC,kBAAkB,CAAC;IACtE,CAAC;IAED,KAAK,CAAC,SAAS,CAAC,GAAW,EAAE,UAAwB,EAAE;QACrD,oDAAoD;QACpD,gHAAgH;QAChH,MAAM,gBAAgB,GACpB,OAAO,CAAC,OAAO,KAAK,SAAS,CAAC,CAAC,CAAC,OAAO,CAAC,OAAO,CAAC,CAAC,CAAC,IAAI,CAAC,MAAM,CAAC,OAAO,KAAK,SAAS,CAAC,CAAC,CAAC,IAAI,CAAC,MAAM,CAAC,OAAO,CAAC,CAAC,CAAC,KAAK,CAAC;QACpH,MAAM,iBAAiB,GACrB,OAAO,CAAC,QAAQ,KAAK,SAAS;YAC5B,CAAC,CAAC,OAAO,CAAC,QAAQ;YAClB,CAAC,CAAC,IAAI,CAAC,MAAM,CAAC,QAAQ,KAAK,SAAS;gBAClC,CAAC,CAAC,IAAI,CAAC,MAAM,CAAC,QAAQ;gBACtB,CAAC,CAAC,KAAK,CAAC;QAEd,yFAAyF;QACzF,mEAAmE;QACnE,MAAM,kBAAkB,GAAG,
IAAI,CAAC,MAAM,CAAC,OAAO,IAAI,EAAE,CAAC;QACrD,MAAM,sBAAsB,GAAG,OAAO,CAAC,OAAO,IAAI,EAAE,CAAC,CAAC,mEAAmE;QAEzH,8DAA8D;QAC9D,MAAM,0BAA0B,GAAG,EAAE,GAAG,kBAAkB,EAAE,GAAG,sBAAsB,EAAE,CAAC;QAExF,kEAAkE;QAClE,MAAM,iBAAiB,GAInB;YACF,GAAG,IAAI,CAAC,MAAM,EAAE,gEAAgE;YAChF,GAAG,OAAO,EAAE,yDAAyD;YACrE,OAAO,EAAE,0BAA0B,EAAE,sCAAsC;YAC3E,QAAQ,EAAE,iBAAiB;YAC3B,OAAO,EAAE,gBAAgB;SAC1B,CAAC;QAEF,qCAAqC;QACrC,KAAK,MAAM,OAAO,IAAI,IAAI,CAAC,sBAAsB,EAAE,CAAC;YAClD,IAAI,OAAO,OAAO,KAAK,QAAQ,IAAI,GAAG,CAAC,QAAQ,CAAC,OAAO,CAAC,EAAE,CAAC;gBACzD,OAAO,CAAC,IAAI,CAAC,qBAAqB,GAAG,4BAA4B,OAAO,qCAAqC,CAAC,CAAC;gBAC/G,OAAO,IAAI,CAAC,gBAAgB,CAAC,SAAS,CAAC,GAAG,EAAE,iBAAiB,CAAC,CAAC;YACjE,CAAC;iBAAM,IAAI,OAAO,YAAY,MAAM,IAAI,OAAO,CAAC,IAAI,CAAC,GAAG,CAAC,EAAE,CAAC;gBAC1D,OAAO,CAAC,IAAI,CACV,qBAAqB,GAAG,2BAA2B,OAAO,CAAC,QAAQ,EAAE,qCAAqC,CAC3G,CAAC;gBACF,OAAO,IAAI,CAAC,gBAAgB,CAAC,SAAS,CAAC,GAAG,EAAE,iBAAiB,CAAC,CAAC;YACjE,CAAC;QACH,CAAC;QAED,IAAI,CAAC;YACH,MAAM,WAAW,GAAG,MAAM,IAAI,CAAC,WAAW,CAAC,SAAS,CAAC,GAAG,EAAE;gBACxD,QAAQ,EAAE,KAAK;gBACf,OAAO,EAAE,OAAO,CAAC,OAAO;aACzB,CAAC,CAAC;YACH,MAAM,mBAAmB,GAAG,iBAAiB,CAAC,CAAC,CAAC,IAAI,CAAC,sBAAsB,CAAC,WAAW,CAAC,CAAC,CAAC,CAAC,WAAW,CAAC;YAEvG,IAAI,CAAC,IAAI,CAAC,iBAAiB,CAAC,WAAW,EAAE,gBAAgB,CAAC,EAAE,CAAC;gBAC3D,OAAO,mBAAmB,CAAC;YAC7B,CAAC;YAED,OAAO,CAAC,IAAI,CAAC,gCAAgC,GAAG,kDAAkD,CAAC,CAAC;YAEpG,kGAAkG;YAClG,MAAM,iBAAiB,GAAG;gBACxB,GAAG,iBAAiB;gBACpB,eAAe,EAAE,KAAK;gBACtB,OAAO,EAAE,IAAI;aACd,CAAC;YAEF,IAAI,CAAC;gBACH,MAAM,gBAAgB,GAAG,MAAM,IAAI,CAAC,gBAAgB,CAAC,SAAS,CAAC,GAAG,EAAE,iBAAiB,CAAC,CAAC;gBACvF,MAAM,gBAAgB,GAAG,uBAAuB,CAAC,mBAAmB,CAAC,OAAO,EAAE,mBAAmB,CAAC,WAAW,CAAC,CAAC;gBAC/G,MAAM,kBAAkB,GAAG,uBAAuB,CAAC,gBAAgB,CAAC,OAAO,EAAE,gBAAgB,CAAC,WAAW,CAAC,CAAC;gBAE3G,IAAI,CAAC,mCAAmC,CAAC,gBAAgB,EAAE,kBAAkB,CAAC,EAAE,CAAC;oBAC/E,OAAO,CAAC,IAAI,CAAC,uCAAuC,GAAG,oDAAoD,CAAC,CAAC;oBAC7G,OAAO,mBAAmB,CAAC;gBAC7B,CAAC;gBAED,OAAO,gBAAgB,CAAC;YAC1B,CAAC;YAAC,OAAO,eAAwB,EAAE,CAAC;gBAClC,MAAM,SAAS,GAAG,eAAe,YAAY,KAAK,CAAC,CAAC,CAAC,eAAe,CAAC,OAAO,CAAC,CAAC,CA
AC,MAAM,CAAC,eAAe,CAAC,CAAC;gBACvG,OAAO,CAAC,IAAI,CAAC,8CAA8C,GAAG,KAAK,SAAS,0BAA0B,CAAC,CAAC;gBACxG,OAAO,mBAAmB,CAAC;YAC7B,CAAC;QACH,CAAC;QAAC,OAAO,UAAmB,EAAE,CAAC;YAC7B,oEAAoE;YACpE,IAAI,UAAU,YAAY,oBAAoB,IAAI,UAAU,CAAC,UAAU,KAAK,GAAG,EAAE,CAAC;gBAChF,OAAO,CAAC,IAAI,CAAC,8CAA8C,GAAG,qBAAqB,CAAC,CAAC;gBACrF,MAAM,UAAU,CAAC;YACnB,CAAC;YACD,MAAM,OAAO,GAAG,UAAU,YAAY,KAAK,CAAC,CAAC,CAAC,UAAU,CAAC,OAAO,CAAC,CAAC,CAAC,MAAM,CAAC,UAAU,CAAC,CAAC;YACtF,OAAO,CAAC,IAAI,CAAC,wCAAwC,GAAG,KAAK,OAAO,qCAAqC,CAAC,CAAC;YAC3G,IAAI,CAAC;gBACH,yEAAyE;gBACzE,MAAM,gBAAgB,GAAG,MAAM,IAAI,CAAC,gBAAgB,CAAC,SAAS,CAAC,GAAG,EAAE,iBAAiB,CAAC,CAAC;gBACvF,OAAO,gBAAgB,CAAC;YAC1B,CAAC;YAAC,OAAO,eAAwB,EAAE,CAAC;gBAClC,MAAM,SAAS,GAAG,eAAe,YAAY,KAAK,CAAC,CAAC,CAAC,eAAe,CAAC,OAAO,CAAC,CAAC,CAAC,MAAM,CAAC,eAAe,CAAC,CAAC;gBACvG,OAAO,CAAC,KAAK,CAAC,2DAA2D,GAAG,KAAK,SAAS,EAAE,CAAC,CAAC;gBAC9F,MAAM,eAAe,CAAC,CAAC,8DAA8D;YACvF,CAAC;QACH,CAAC;IACH,CAAC;IAED;;;;;;;;;OASG;IACH,KAAK,CAAC,YAAY,CAAC,GAAW,EAAE,UAA+B,EAAE;QAC/D,qCAAqC;QACrC,KAAK,MAAM,OAAO,IAAI,IAAI,CAAC,sBAAsB,EAAE,CAAC;YAClD,IAAI,OAAO,OAAO,KAAK,QAAQ,IAAI,GAAG,CAAC,QAAQ,CAAC,OAAO,CAAC,EAAE,CAAC;gBACzD,OAAO,CAAC,IAAI,CACV,qBAAqB,GAAG,4BAA4B,OAAO,uDAAuD,CACnH,CAAC;gBACF,OAAO,IAAI,CAAC,gBAAgB,CAAC,YAAY,CAAC,GAAG,EAAE,OAAO,CAAC,CAAC;YAC1D,CAAC;iBAAM,IAAI,OAAO,YAAY,MAAM,IAAI,OAAO,CAAC,IAAI,CAAC,GAAG,CAAC,EAAE,CAAC;gBAC1D,OAAO,CAAC,IAAI,CACV,qBAAqB,GAAG,2BAA2B,OAAO,CAAC,QAAQ,EAAE,uDAAuD,CAC7H,CAAC;gBACF,OAAO,IAAI,CAAC,gBAAgB,CAAC,YAAY,CAAC,GAAG,EAAE,OAAO,CAAC,CAAC;YAC1D,CAAC;QACH,CAAC;QAED,IAAI,CAAC;YACH,wBAAwB;YACxB,MAAM,WAAW,GAAG,MAAM,IAAI,CAAC,WAAW,CAAC,YAAY,CAAC,GAAG,EAAE,OAAO,CAAC,CAAC;YACtE,OAAO,WAAW,CAAC;QACrB,CAAC;QAAC,OAAO,UAAmB,EAAE,CAAC;YAC7B,oEAAoE;YACpE,IAAI,UAAU,YAAY,oBAAoB,IAAI,UAAU,CAAC,UAAU,KAAK,GAAG,EAAE,CAAC;gBAChF,OAAO,CAAC,IAAI,CAAC,4DAA4D,GAAG,qBAAqB,CAAC,CAAC;gBACnG,MAAM,UAAU,CAAC;YACnB,CAAC;YACD,MAAM,OAAO,GAAG,UAAU,YAAY,KAAK,CAAC,CAAC,CAAC,UAAU,CAAC,OAAO,CAAC,CAAC,CAAC,MAAM,CAAC,UAAU,CAAC,CAAC;YACtF,OAAO,CAAC,IAAI,CACV,sDAAsD,GAAG,KAAK
,OAAO,qCAAqC,CAC3G,CAAC;YACF,IAAI,CAAC;gBACH,+BAA+B;gBAC/B,MAAM,gBAAgB,GAAG,MAAM,IAAI,CAAC,gBAAgB,CAAC,YAAY,CAAC,GAAG,EAAE,OAAO,CAAC,CAAC;gBAChF,OAAO,gBAAgB,CAAC;YAC1B,CAAC;YAAC,OAAO,eAAwB,EAAE,CAAC;gBAClC,MAAM,SAAS,GAAG,eAAe,YAAY,KAAK,CAAC,CAAC,CAAC,eAAe,CAAC,OAAO,CAAC,CAAC,CAAC,MAAM,CAAC,eAAe,CAAC,CAAC;gBACvG,OAAO,CAAC,KAAK,CAAC,yEAAyE,GAAG,KAAK,SAAS,EAAE,CAAC,CAAC;gBAC5G,MAAM,eAAe,CAAC,CAAC,8DAA8D;YACvF,CAAC;QACH,CAAC;IACH,CAAC;IAED;;OAEG;IACH,UAAU;QACR,OAAO,IAAI,CAAC,gBAAgB,CAAC,UAAU,EAAE,CAAC;IAC5C,CAAC;IAED;;OAEG;IACH,KAAK,CAAC,OAAO;QACX,MAAM,OAAO,CAAC,UAAU,CAAC;YACvB,IAAI,CAAC,WAAW,CAAC,OAAO,EAAE,EAAE,yCAAyC;YACrE,IAAI,CAAC,gBAAgB,CAAC,OAAO,EAAE;SAChC,CAAC,CAAC;IACL,CAAC;CACF"}
@@ -8,6 +8,9 @@ import type { IEngine } from "./IEngine.js";
  * Features include caching, retries, HTTP fallback, and configurable browser pooling.
  */
  export declare class PlaywrightEngine implements IEngine {
+ private static readonly AUTO_RENDER_POLL_MS;
+ private static readonly AUTO_RENDER_QUIET_WINDOW_MS;
+ private static readonly AUTO_RENDER_MAX_WAIT_MS;
  private browserPool;
  private readonly queue;
  private readonly cache;
@@ -41,6 +44,9 @@ export declare class PlaywrightEngine implements IEngine {
  * Simulate human-like interactions on the page.
  */
  private simulateHumanBehavior;
+ private captureRenderedDomSnapshot;
+ private shouldAutoWaitForRenderedDom;
+ private waitForRenderedDomIfNeeded;
  /**
  * Adds a result to the in-memory cache.
  */
@@ -102,7 +108,6 @@ export declare class PlaywrightEngine implements IEngine {
  */
  private fetchWithPlaywright;
  private applyBlockingRules;
- private _injectSourceUnderH1;
  /**
  * Cleans up resources used by the engine, primarily closing browser instances in the pool.
  *
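The declaration file adds three static constants to `PlaywrightEngine`: `AUTO_RENDER_POLL_MS`, `AUTO_RENDER_QUIET_WINDOW_MS` and `AUTO_RENDER_MAX_WAIT_MS`. It also adds the private helpers `captureRenderedDomSnapshot`, `shouldAutoWaitForRenderedDom` and `waitForRenderedDomIfNeeded`. Neither the constant values nor the method bodies appear in this diff.

The names suggest a "poll the rendered DOM until it stops changing" wait. Below is a minimal sketch of that pattern under assumed values. The snapshot source is passed in as a callback, for example `() => page.content()` with a Playwright `Page`, so the function itself does not depend on Playwright. The function name `waitForStableSnapshot` is hypothetical.

```typescript
// Hypothetical quiet-window wait. The constant values are assumptions;
// the d.ts only declares the names AUTO_RENDER_POLL_MS, AUTO_RENDER_QUIET_WINDOW_MS, AUTO_RENDER_MAX_WAIT_MS.
const POLL_MS = 100;
const QUIET_WINDOW_MS = 300;
const MAX_WAIT_MS = 2_000;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

/**
 * Polls a DOM snapshot until it has stayed unchanged for quietWindowMs,
 * or until maxWaitMs has elapsed. Returns the last snapshot and whether it settled.
 */
async function waitForStableSnapshot(
  captureSnapshot: () => Promise<string>,
  opts = { pollMs: POLL_MS, quietWindowMs: QUIET_WINDOW_MS, maxWaitMs: MAX_WAIT_MS },
): Promise<{ snapshot: string; settled: boolean }> {
  const start = Date.now();
  let last = await captureSnapshot();
  let lastChangeAt = Date.now();
  while (Date.now() - start < opts.maxWaitMs) {
    await sleep(opts.pollMs);
    const current = await captureSnapshot();
    if (current !== last) {
      // The DOM is still changing: restart the quiet window.
      last = current;
      lastChangeAt = Date.now();
    } else if (Date.now() - lastChangeAt >= opts.quietWindowMs) {
      // Unchanged for the whole quiet window: treat the render as settled.
      return { snapshot: last, settled: true };
    }
  }
  // Hard cap reached: return whatever was rendered last.
  return { snapshot: last, settled: false };
}
```

The hard cap matters because pages with tickers, carousels or live counters may never stop mutating. Without it, a quiet-window wait on such a page would block forever.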
@@ -1 +1 @@
- {"version":3,"file":"PlaywrightEngine.d.ts","sourceRoot":"","sources":["../src/PlaywrightEngine.ts"],"names":[],"mappings":"AAAA,OAAO,KAAK,EACV,eAAe,EACf,kBAAkB,EAClB,mBAAmB,EACnB,cAAc,EACd,sBAAsB,EACtB,YAAY,EACb,MAAM,YAAY,CAAC;AACpB,OAAO,KAAK,EAAE,OAAO,EAAE,MAAM,cAAc,CAAC;AAyC5C;;;;;;GAMG;AACH,qBAAa,gBAAiB,YAAW,OAAO;IAC9C,OAAO,CAAC,WAAW,CAAsC;IACzD,OAAO,CAAC,QAAQ,CAAC,KAAK,CAAS;IAC/B,OAAO,CAAC,QAAQ,CAAC,KAAK,CAAsC;IAC5D,OAAO,CAAC,QAAQ,CAAC,MAAM,CAAiC;IAGxD,OAAO,CAAC,uBAAuB,CAAkB;IACjD,OAAO,CAAC,iBAAiB,CAAkB;IAC3C,OAAO,CAAC,mBAAmB,CAA0B;IAGrD,OAAO,CAAC,MAAM,CAAC,QAAQ,CAAC,cAAc,CAuBpC;IAEF;;;;;OAKG;gBACS,MAAM,GAAE,sBAA2B;IAM/C;;OAEG;YACW,qBAAqB;IAwCnC;;;OAGG;YACW,yBAAyB;IAyEvC,OAAO,CAAC,UAAU;IAalB;;OAEG;YACW,WAAW;IAazB;;OAEG;YACW,qBAAqB;IAwCnC;;OAEG;IACH,OAAO,CAAC,UAAU;IAUlB;;;;;;;;;OASG;IACG,SAAS,CACb,GAAG,EAAE,MAAM,EACX,OAAO,GAAE,YAAY,GAAG;QAAE,QAAQ,CAAC,EAAE,OAAO,CAAC;QAAC,OAAO,CAAC,EAAE,OAAO,CAAA;KAAO,GACrE,OAAO,CAAC,eAAe,CAAC;IAoB3B;;;;;;;OAOG;IACH,OAAO,CAAC,iBAAiB;IAqDzB;;;;;;;OAOG;YACW,oBAAoB;IAmClC;;;;;;;;OAQG;YACW,6BAA6B;IAmC3C;;;;;;;OAOG;YACW,eAAe;IAkH7B;;;OAGG;YACW,mBAAmB;YA8KnB,kBAAkB;IA0ChC,OAAO,CAAC,oBAAoB;IAQ5B;;;;;OAKG;IACG,OAAO,IAAI,OAAO,CAAC,IAAI,CAAC;IAoB9B;;;OAGG;IACH,UAAU,IAAI,cAAc,EAAE;IAQ9B,OAAO,CAAC,mBAAmB;IAU3B;;;;;;;;OAQG;IACG,YAAY,CAAC,GAAG,EAAE,MAAM,EAAE,OAAO,GAAE,mBAAwB,GAAG,OAAO,CAAC,kBAAkB,CAAC;IA0C/F;;OAEG;IACH,OAAO,CAAC,iBAAiB;IAkBzB;;OAEG;IACH,OAAO,CAAC,iBAAiB;IAoBzB;;OAEG;YACW,sBAAsB;IAyDpC;;OAEG;YACW,2BAA2B;IA4DzC;;OAEG;YACW,0BAA0B;CAmFzC"}
+ {"version":3,"file":"PlaywrightEngine.d.ts","sourceRoot":"","sources":["../src/PlaywrightEngine.ts"],"names":[],"mappings":"AAAA,OAAO,KAAK,EACV,eAAe,EACf,kBAAkB,EAClB,mBAAmB,EACnB,cAAc,EACd,sBAAsB,EACtB,YAAY,EACb,MAAM,YAAY,CAAC;AACpB,OAAO,KAAK,EAAE,OAAO,EAAE,MAAM,cAAc,CAAC;AA2D5C;;;;;;GAMG;AACH,qBAAa,gBAAiB,YAAW,OAAO;IAC9C,OAAO,CAAC,MAAM,CAAC,QAAQ,CAAC,mBAAmB,CAAO;IAClD,OAAO,CAAC,MAAM,CAAC,QAAQ,CAAC,2BAA2B,CAAO;IAC1D,OAAO,CAAC,MAAM,CAAC,QAAQ,CAAC,uBAAuB,CAAQ;IAEvD,OAAO,CAAC,WAAW,CAAsC;IACzD,OAAO,CAAC,QAAQ,CAAC,KAAK,CAAS;IAC/B,OAAO,CAAC,QAAQ,CAAC,KAAK,CAAsC;IAC5D,OAAO,CAAC,QAAQ,CAAC,MAAM,CAAiC;IAGxD,OAAO,CAAC,uBAAuB,CAAkB;IACjD,OAAO,CAAC,iBAAiB,CAAkB;IAC3C,OAAO,CAAC,mBAAmB,CAA0B;IAGrD,OAAO,CAAC,MAAM,CAAC,QAAQ,CAAC,cAAc,CAuBpC;IAEF;;;;;OAKG;gBACS,MAAM,GAAE,sBAA2B;IAM/C;;OAEG;YACW,qBAAqB;IAwCnC;;;OAGG;YACW,yBAAyB;IA0EvC,OAAO,CAAC,UAAU;IAalB;;OAEG;YACW,WAAW;IAazB;;OAEG;YACW,qBAAqB;YA2CrB,0BAA0B;IAqDxC,OAAO,CAAC,4BAA4B;YAUtB,0BAA0B;IA8FxC;;OAEG;IACH,OAAO,CAAC,UAAU;IAUlB;;;;;;;;;OASG;IACG,SAAS,CACb,GAAG,EAAE,MAAM,EACX,OAAO,GAAE,YAAY,GAAG;QAAE,QAAQ,CAAC,EAAE,OAAO,CAAC;QAAC,OAAO,CAAC,EAAE,OAAO,CAAA;KAAO,GACrE,OAAO,CAAC,eAAe,CAAC;IAoB3B;;;;;;;OAOG;IACH,OAAO,CAAC,iBAAiB;IAqDzB;;;;;;;OAOG;YACW,oBAAoB;IAmClC;;;;;;;;OAQG;YACW,6BAA6B;IAmC3C;;;;;;;OAOG;YACW,eAAe;IAkH7B;;;OAGG;YACW,mBAAmB;YA6KnB,kBAAkB;IAyChC;;;;;OAKG;IACG,OAAO,IAAI,OAAO,CAAC,IAAI,CAAC;IAoB9B;;;OAGG;IACH,UAAU,IAAI,cAAc,EAAE;IAQ9B,OAAO,CAAC,mBAAmB;IAU3B;;;;;;;;OAQG;IACG,YAAY,CAAC,GAAG,EAAE,MAAM,EAAE,OAAO,GAAE,mBAAwB,GAAG,OAAO,CAAC,kBAAkB,CAAC;IA0C/F;;OAEG;IACH,OAAO,CAAC,iBAAiB;IAkBzB;;OAEG;IACH,OAAO,CAAC,iBAAiB;IAoBzB;;OAEG;YACW,sBAAsB;IAyDpC;;OAEG;YACW,2BAA2B;IA4DzC;;OAEG;YACW,0BAA0B;CA2FzC"}