@purepageio/fetch-engines 0.9.0 → 0.9.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +7 -4
- package/package.json +2 -2
package/README.md
CHANGED
|
@@ -4,7 +4,7 @@
|
|
|
4
4
|
[](https://github.com/purepage/fetch-engines/actions/workflows/publish.yml)
|
|
5
5
|
[](https://opensource.org/licenses/MIT)
|
|
6
6
|
|
|
7
|
-
Fetch
|
|
7
|
+
Fetch web pages as clean Markdown or structured data. HTTP-first with automatic Playwright fallback, built for RAG pipelines and content extraction.
|
|
8
8
|
|
|
9
9
|
## Table of contents
|
|
10
10
|
|
|
@@ -26,9 +26,10 @@ Fetch websites with confidence. `@purepageio/fetch-engines` gives teams an HTTP-
|
|
|
26
26
|
## Why fetch-engines?
|
|
27
27
|
|
|
28
28
|
- **One API for multiple strategies** – Call `fetchHTML` for rendered pages or `fetchContent` for raw responses. The library handles HTTP shortcuts and Playwright fallbacks automatically.
|
|
29
|
-
- **
|
|
30
|
-
- **
|
|
31
|
-
- **
|
|
29
|
+
- **RAG-ready Markdown** – Convert any page to clean Markdown with boilerplate, nav, and SVG noise stripped out. Powered by a Rust-native converter.
|
|
30
|
+
- **Built-in retries, caching, and a managed browser pool** – Production defaults you can tune per request.
|
|
31
|
+
- **URL to structured data in one call** – Define a Zod schema, get typed results back via any OpenAI-compatible API. The page is fetched as Markdown first to minimise tokens.
|
|
32
|
+
- **Playwright is optional** – `FetchEngine` works without browser dependencies. Playwright is only loaded when you use `HybridEngine` or `PlaywrightEngine`.
|
|
32
33
|
|
|
33
34
|
## Installation
|
|
34
35
|
|
|
@@ -103,6 +104,8 @@ const result = await fetchStructuredContent("https://example.com/article", schem
|
|
|
103
104
|
console.log(result.data.summary);
|
|
104
105
|
```
|
|
105
106
|
|
|
107
|
+
Under the hood, structured extraction fetches the page as Markdown first (same boilerplate removal as Markdown mode), then sends the cleaned content to the AI model — keeping token usage low and extraction quality high.
|
|
108
|
+
|
|
106
109
|
Set `OPENAI_API_KEY` (or `OPENROUTER_API_KEY`) before running structured helpers, or use `apiConfig` to connect to OpenAI-compatible APIs like OpenRouter. The engine automatically adds the `Authorization` header when you provide an API key:
|
|
107
110
|
|
|
108
111
|
```typescript
|
package/package.json
CHANGED
|
@@ -1,8 +1,8 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "@purepageio/fetch-engines",
|
|
3
|
-
"version": "0.9.
|
|
3
|
+
"version": "0.9.1",
|
|
4
4
|
"type": "module",
|
|
5
|
-
"description": "
|
|
5
|
+
"description": "Fetch web pages as clean Markdown or structured data. HTTP-first with automatic Playwright fallback, built for RAG pipelines and content extraction.",
|
|
6
6
|
"main": "dist/index.js",
|
|
7
7
|
"types": "dist/index.d.ts",
|
|
8
8
|
"files": [
|