@j0hanz/superfetch 1.0.3 → 1.0.6
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +615 -590
- package/dist/config/index.d.ts +5 -0
- package/dist/config/index.d.ts.map +1 -1
- package/dist/config/index.js +5 -0
- package/dist/config/index.js.map +1 -1
- package/dist/config/types.d.ts +5 -0
- package/dist/config/types.d.ts.map +1 -1
- package/dist/errors/app-error.d.ts +4 -0
- package/dist/errors/app-error.d.ts.map +1 -1
- package/dist/errors/app-error.js +7 -0
- package/dist/errors/app-error.js.map +1 -1
- package/dist/index.js +94 -17
- package/dist/index.js.map +1 -1
- package/dist/middleware/error-handler.d.ts.map +1 -1
- package/dist/middleware/error-handler.js +4 -2
- package/dist/middleware/error-handler.js.map +1 -1
- package/dist/middleware/rate-limiter.d.ts.map +1 -1
- package/dist/middleware/rate-limiter.js +46 -13
- package/dist/middleware/rate-limiter.js.map +1 -1
- package/dist/prompts/index.d.ts.map +1 -1
- package/dist/prompts/index.js +2 -7
- package/dist/prompts/index.js.map +1 -1
- package/dist/resources/cached-content.d.ts +4 -0
- package/dist/resources/cached-content.d.ts.map +1 -0
- package/dist/resources/cached-content.js +68 -0
- package/dist/resources/cached-content.js.map +1 -0
- package/dist/resources/index.d.ts.map +1 -1
- package/dist/resources/index.js +39 -1
- package/dist/resources/index.js.map +1 -1
- package/dist/server.d.ts.map +1 -1
- package/dist/server.js +10 -0
- package/dist/server.js.map +1 -1
- package/dist/services/cache.d.ts +11 -0
- package/dist/services/cache.d.ts.map +1 -1
- package/dist/services/cache.js +72 -8
- package/dist/services/cache.js.map +1 -1
- package/dist/services/card-extractor.d.ts +0 -4
- package/dist/services/card-extractor.d.ts.map +1 -1
- package/dist/services/card-extractor.js +17 -5
- package/dist/services/card-extractor.js.map +1 -1
- package/dist/services/extractor.d.ts +7 -1
- package/dist/services/extractor.d.ts.map +1 -1
- package/dist/services/extractor.js +16 -9
- package/dist/services/extractor.js.map +1 -1
- package/dist/services/fetcher.d.ts +10 -1
- package/dist/services/fetcher.d.ts.map +1 -1
- package/dist/services/fetcher.js +162 -36
- package/dist/services/fetcher.js.map +1 -1
- package/dist/services/parser.d.ts.map +1 -1
- package/dist/services/parser.js +41 -29
- package/dist/services/parser.js.map +1 -1
- package/dist/tools/handlers/fetch-links.tool.d.ts +5 -10
- package/dist/tools/handlers/fetch-links.tool.d.ts.map +1 -1
- package/dist/tools/handlers/fetch-links.tool.js +4 -0
- package/dist/tools/handlers/fetch-links.tool.js.map +1 -1
- package/dist/tools/handlers/fetch-markdown.tool.d.ts +5 -12
- package/dist/tools/handlers/fetch-markdown.tool.d.ts.map +1 -1
- package/dist/tools/handlers/fetch-markdown.tool.js +1 -2
- package/dist/tools/handlers/fetch-markdown.tool.js.map +1 -1
- package/dist/tools/handlers/fetch-url.tool.d.ts +4 -12
- package/dist/tools/handlers/fetch-url.tool.d.ts.map +1 -1
- package/dist/tools/handlers/fetch-url.tool.js.map +1 -1
- package/dist/tools/handlers/fetch-urls.tool.d.ts +8 -1
- package/dist/tools/handlers/fetch-urls.tool.d.ts.map +1 -1
- package/dist/tools/handlers/fetch-urls.tool.js +67 -16
- package/dist/tools/handlers/fetch-urls.tool.js.map +1 -1
- package/dist/tools/utils/common.js +1 -1
- package/dist/tools/utils/common.js.map +1 -1
- package/dist/tools/utils/fetch-pipeline.d.ts.map +1 -1
- package/dist/tools/utils/fetch-pipeline.js +90 -20
- package/dist/tools/utils/fetch-pipeline.js.map +1 -1
- package/dist/transformers/markdown.transformer.d.ts.map +1 -1
- package/dist/transformers/markdown.transformer.js +8 -28
- package/dist/transformers/markdown.transformer.js.map +1 -1
- package/dist/utils/concurrency.d.ts +5 -1
- package/dist/utils/concurrency.d.ts.map +1 -1
- package/dist/utils/concurrency.js +15 -2
- package/dist/utils/concurrency.js.map +1 -1
- package/dist/utils/content-cleaner.d.ts.map +1 -1
- package/dist/utils/content-cleaner.js +124 -108
- package/dist/utils/content-cleaner.js.map +1 -1
- package/dist/utils/language-detector.d.ts +1 -1
- package/dist/utils/language-detector.d.ts.map +1 -1
- package/dist/utils/sanitizer.js +1 -1
- package/dist/utils/sanitizer.js.map +1 -1
- package/dist/utils/tool-error-handler.d.ts.map +1 -1
- package/dist/utils/tool-error-handler.js +36 -6
- package/dist/utils/tool-error-handler.js.map +1 -1
- package/dist/utils/url-validator.d.ts +10 -0
- package/dist/utils/url-validator.d.ts.map +1 -1
- package/dist/utils/url-validator.js +43 -5
- package/dist/utils/url-validator.js.map +1 -1
- package/package.json +83 -80
package/README.md
# 🚀 superFetch

[npm](https://www.npmjs.com/package/@j0hanz/superfetch) [Node.js](https://nodejs.org/) [TypeScript](https://www.typescriptlang.org/)

## One-Click Install

[Install in VS Code](https://insiders.vscode.dev/redirect/mcp/install?name=superfetch&inputs=%5B%5D&config=%7B%22command%22%3A%22npx%22%2C%22args%22%3A%5B%22-y%22%2C%22%40j0hanz%2Fsuperfetch%40latest%22%2C%22--stdio%22%5D%7D) [Install in VS Code Insiders](https://insiders.vscode.dev/redirect/mcp/install?name=superfetch&inputs=%5B%5D&config=%7B%22command%22%3A%22npx%22%2C%22args%22%3A%5B%22-y%22%2C%22%40j0hanz%2Fsuperfetch%40latest%22%2C%22--stdio%22%5D%7D&quality=insiders)

[Install in Cursor](https://cursor.com/install-mcp?name=superfetch&config=eyJjb21tYW5kIjoibnB4IiwiYXJncyI6WyIteSIsIkBqMGhhbnovc3VwZXJmZXRjaEBsYXRlc3QiLCItLXN0ZGlvIl19)

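The one-click links above embed the same `npx` launch command used in the Quick Start configurations. The sketch below rebuilds them and is useful if you fork the package under another name. The URL layout is inferred from the badge links themselves, and `buildVsCodeInstallLink` / `buildCursorInstallLink` are hypothetical helpers, not part of superFetch. VS Code receives the config as percent-encoded JSON, while Cursor receives it as Base64-encoded JSON.

```typescript
// Hypothetical helpers; the URL shapes are inferred from the install badges.
interface StdioServerConfig {
  command: string;
  args: string[];
}

const superFetchConfig: StdioServerConfig = {
  command: "npx",
  args: ["-y", "@j0hanz/superfetch@latest", "--stdio"],
};

// VS Code: JSON config, percent-encoded into the query string.
function buildVsCodeInstallLink(name: string, config: StdioServerConfig, insiders = false): string {
  const query =
    `name=${encodeURIComponent(name)}` +
    `&inputs=${encodeURIComponent("[]")}` +
    `&config=${encodeURIComponent(JSON.stringify(config))}`;
  const link = `https://insiders.vscode.dev/redirect/mcp/install?${query}`;
  return insiders ? `${link}&quality=insiders` : link;
}

// Cursor: JSON config, Base64-encoded. btoa is safe here because the JSON is ASCII-only.
// The Base64 text is also percent-encoded in case it contains "+", "/" or "=".
function buildCursorInstallLink(name: string, config: StdioServerConfig): string {
  const encoded = encodeURIComponent(btoa(JSON.stringify(config)));
  return `https://cursor.com/install-mcp?name=${encodeURIComponent(name)}&config=${encoded}`;
}

console.log(buildVsCodeInstallLink("superfetch", superFetchConfig));
console.log(buildCursorInstallLink("superfetch", superFetchConfig));
```
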
A [Model Context Protocol](https://modelcontextprotocol.io/) (MCP) server that fetches web pages, extracts their main content with Mozilla Readability, and converts it into AI-friendly formats: JSONL semantic blocks or Markdown.

**Version:** 1.0.5

[Quick Start](#quick-start) · [How to Choose a Tool](#-how-to-choose-a-tool) · [Tools](#available-tools) · [Configuration](#configuration) · [Contributing](#contributing)

> 📦 **Published to the [MCP Registry](https://registry.modelcontextprotocol.io/)**: search for `io.github.j0hanz/superfetch`.

---

> [!CAUTION]
> This server can access URLs on behalf of AI assistants. Built-in SSRF protection blocks private IP ranges and cloud metadata endpoints, but exercise caution when deploying in sensitive environments.

## ✨ Features

| Feature | Description |
| ------------------------- | ------------------------------------------------------------- |
| 🧠 **Smart Extraction** | Mozilla Readability removes ads, navigation, and boilerplate |
| 📄 **Multiple Formats** | JSONL semantic blocks or clean Markdown with YAML frontmatter |
| 🔗 **Link Discovery** | Extract and classify internal/external links |
| ⚡ **Built-in Caching** | Configurable TTL and max entries |
| 🛡️ **Security First** | SSRF protection, URL validation, header sanitization |
| 🔄 **Resilient Fetching** | Exponential backoff with jitter |
| 📊 **Monitoring** | Stats resource for cache performance and health |

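"Exponential backoff with jitter" means the wait between retries grows with each attempt and is randomized so that many clients do not retry in lockstep. superFetch's exact formula is not documented here. The following is therefore a generic sketch of the technique, using the "full jitter" variant, with hypothetical base and cap values. `backoffDelay` and `withRetries` are illustrative names.

```typescript
// Sketch of exponential backoff with "full jitter". The base and cap values are
// illustrative assumptions, not superFetch's actual settings.
function backoffDelay(
  attempt: number, // 0 before the first retry, 1 before the second, ...
  baseDelayMs = 500,
  maxDelayMs = 10_000,
  random: () => number = Math.random,
): number {
  // The ceiling doubles with each attempt but never exceeds maxDelayMs.
  const ceiling = Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
  // A uniformly random delay in [0, ceiling) keeps many clients from retrying in lockstep.
  return Math.floor(random() * ceiling);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Assumes `retries` counts retries after the first attempt (so up to retries + 1 calls).
async function withRetries<T>(operation: () => Promise<T>, retries = 3): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < retries) await sleep(backoffDelay(attempt));
    }
  }
  throw lastError;
}
```
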
---

## 🎯 How to Choose a Tool

Use this guide to select the right tool for your web content extraction needs:

### Decision Tree

```text
Need web content for AI?
├─ Single URL?
│  ├─ Need structured semantic blocks → fetch-url (JSONL)
│  ├─ Need readable markdown → fetch-markdown
│  └─ Need links only → fetch-links
└─ Multiple URLs?
   └─ Use fetch-urls (batch processing)
```

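The same decision tree can be expressed as a small function, for example when an agent or script picks the tool programmatically. `chooseTool` and the `Need` type are illustrative names only; superFetch does not export them. The 10-URL ceiling comes from the `fetch-urls` parameter table.

```typescript
// Illustrative only: encodes the decision tree above. Not part of superFetch's API.
type Need = "structured" | "markdown" | "links";
type ToolName = "fetch-url" | "fetch-markdown" | "fetch-links" | "fetch-urls";

function chooseTool(urlCount: number, need: Need): ToolName {
  if (urlCount < 1) throw new RangeError("At least one URL is required");
  if (urlCount > 1) {
    // fetch-urls accepts 1-10 URLs per call; larger sets must be split into batches.
    if (urlCount > 10) throw new RangeError("fetch-urls accepts at most 10 URLs per call");
    return "fetch-urls";
  }
  switch (need) {
    case "structured":
      return "fetch-url"; // JSONL semantic blocks
    case "markdown":
      return "fetch-markdown";
    case "links":
      return "fetch-links";
    default: {
      // Exhaustiveness check: adding a new Need without a case is a compile error.
      const unhandled: never = need;
      throw new Error(`Unhandled need: ${String(unhandled)}`);
    }
  }
}

console.log(chooseTool(1, "markdown")); // fetch-markdown
console.log(chooseTool(4, "structured")); // fetch-urls
```

The batch tool returns JSONL or Markdown per URL. Collecting links from several pages therefore still takes one `fetch-links` call per page.
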
### Quick Reference Table

| Tool | Best For | Output Format | Use When |
| ---------------- | -------------------------------- | ----------------------- | ------------------------------------------- |
| `fetch-url` | Single page → structured content | JSONL semantic blocks | AI analysis, RAG pipelines, content parsing |
| `fetch-markdown` | Single page → readable format | Clean Markdown + TOC | Documentation, human-readable output |
| `fetch-links` | Link discovery & classification | URL array with types | Sitemap building, finding related pages |
| `fetch-urls` | Batch processing multiple pages | Multiple JSONL/Markdown | Comparing pages, bulk extraction |

### Common Use Cases

| Task | Recommended Tool | Why |
| ------------------------ | ---------------------------------------- | ---------------------------------------------------- |
| Parse a blog post for AI | `fetch-url` | Returns semantic blocks (headings, paragraphs, code) |
| Generate documentation | `fetch-markdown` | Clean markdown with optional TOC |
| Build a sitemap | `fetch-links` | Extracts and classifies all links |
| Compare multiple docs | `fetch-urls` | Parallel fetching with concurrency control |
| Extract article for RAG | `fetch-url` + `extractMainContent: true` | Removes ads/nav, keeps main content |

---

## Quick Start

Add superFetch to your MCP client configuration. No separate installation is required because `npx` downloads and runs the latest published version on demand.

### Claude Desktop

Add to your `claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "superFetch": {
      "command": "npx",
      "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"]
    }
  }
}
```

### VS Code

Add to `.vscode/mcp.json` in your workspace:

```json
{
  "servers": {
    "superFetch": {
      "command": "npx",
      "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"]
    }
  }
}
```

### With Environment Variables

Set any of the [environment variables](#environment-variables) in an `env` block. Values must be strings. This example uses the VS Code format:

```json
{
  "servers": {
    "superFetch": {
      "command": "npx",
      "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"],
      "env": {
        "CACHE_TTL": "7200",
        "LOG_LEVEL": "debug"
      }
    }
  }
}
```

### Cursor

1. Open Cursor Settings
2. Go to **Features > MCP Servers**
3. Click **"+ Add new global MCP server"**
4. Add this configuration:

```json
{
  "mcpServers": {
    "superFetch": {
      "command": "npx",
      "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"]
    }
  }
}
```

> **Tip:** On Windows, if the server fails to start, launch `npx` through `cmd`: set `"command": "cmd"` and `"args": ["/c", "npx", "-y", "@j0hanz/superfetch@latest", "--stdio"]`.

<details>
<summary><strong>Cline (VS Code Extension)</strong></summary>

Open the Cline MCP settings file:

**macOS:**

```bash
code ~/Library/Application\ Support/Code/User/globalStorage/saoudrizwan.claude-dev/settings/cline_mcp_settings.json
```

**Windows (Command Prompt):**

```bat
code %APPDATA%\Code\User\globalStorage\saoudrizwan.claude-dev\settings\cline_mcp_settings.json
```

Add the configuration:

```json
{
  "mcpServers": {
    "superFetch": {
      "command": "npx",
      "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"],
      "disabled": false,
      "autoApprove": []
    }
  }
}
```

</details>

<details>
<summary><strong>Windsurf</strong></summary>

Add to `~/.codeium/windsurf/mcp_config.json`:

```json
{
  "mcpServers": {
    "superFetch": {
      "command": "npx",
      "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"]
    }
  }
}
```

</details>

<details>
<summary><strong>Claude Desktop (Config File Locations)</strong></summary>

**macOS:**

```bash
# Open the config file in TextEdit
open -e "$HOME/Library/Application Support/Claude/claude_desktop_config.json"

# Or with VS Code
code "$HOME/Library/Application Support/Claude/claude_desktop_config.json"
```

**Windows (Command Prompt):**

```bat
code %APPDATA%\Claude\claude_desktop_config.json
```

</details>

---

## Installation (Alternative)

### Global Installation

```bash
npm install -g @j0hanz/superfetch

# Run in stdio mode
superfetch --stdio

# Run as an HTTP server (default mode)
superfetch
```

### From Source

```bash
git clone https://github.com/j0hanz/super-fetch-mcp-server.git
cd super-fetch-mcp-server
npm install
npm run build
```

### Running the Server

<details>
<summary><strong>HTTP Mode</strong> (default)</summary>

```bash
# Development with hot reload
npm run dev

# Production
npm start
```

By default, the server listens on `http://127.0.0.1:3000`:

- Health check: `GET /health`
- MCP endpoint: `POST /mcp`

</details>

<details>
<summary><strong>stdio Mode</strong> (direct MCP integration)</summary>

```bash
node dist/index.js --stdio
```

</details>

---

## Available Tools

### `fetch-url`

Fetches a webpage and converts it to AI-readable JSONL format with semantic content blocks.

| Parameter | Type | Default | Description |
| -------------------- | ------- | ---------- | -------------------------------------------- |
| `url` | string | _required_ | URL to fetch |
| `extractMainContent` | boolean | `true` | Use Readability to extract main content |
| `includeMetadata` | boolean | `true` | Include page metadata (title, description) |
| `maxContentLength` | number | – | Maximum content length in characters |
| `customHeaders` | object | – | Custom HTTP headers for the request |
| `timeout` | number | `30000` | Request timeout in milliseconds (1000-60000) |
| `retries` | number | `3` | Number of retry attempts (1-10) |

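Callers that build tool arguments programmatically can mirror the table in a TypeScript shape and catch obvious mistakes before making a call. `FetchUrlArgs` and `validateFetchUrlArgs` below are hypothetical, since the server performs its own validation. The numeric ranges come from the table, and the http(s)-only rule is an assumption.

```typescript
// Hypothetical client-side mirror of the fetch-url parameters. The server performs
// its own validation; this only catches obvious mistakes before a call is made.
interface FetchUrlArgs {
  url: string;
  extractMainContent?: boolean; // default true
  includeMetadata?: boolean; // default true
  maxContentLength?: number; // characters
  customHeaders?: Record<string, string>;
  timeout?: number; // milliseconds, 1000-60000, default 30000
  retries?: number; // 1-10, default 3
}

function validateFetchUrlArgs(args: FetchUrlArgs): string[] {
  const problems: string[] = [];
  try {
    const { protocol } = new URL(args.url);
    // Assumption: only http(s) URLs are fetchable.
    if (protocol !== "http:" && protocol !== "https:") problems.push(`unsupported protocol ${protocol}`);
  } catch {
    problems.push(`not an absolute URL: ${args.url}`);
  }
  // True when a value is present but outside [min, max] (NaN counts as outside).
  const outside = (value: number | undefined, min: number, max: number) =>
    value !== undefined && !(value >= min && value <= max);
  if (outside(args.timeout, 1000, 60000)) problems.push("timeout must be between 1000 and 60000 ms");
  if (outside(args.retries, 1, 10)) problems.push("retries must be between 1 and 10");
  if (args.maxContentLength !== undefined && !(args.maxContentLength > 0)) {
    problems.push("maxContentLength must be a positive number");
  }
  return problems;
}

const request: FetchUrlArgs = { url: "https://example.com/article", timeout: 15000, retries: 2 };
console.log(validateFetchUrlArgs(request)); // []
```
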
**Example Response:**

```json
{
  "url": "https://example.com/article",
  "title": "Example Article",
  "fetchedAt": "2025-12-11T10:30:00.000Z",
  "contentBlocks": [
    {
      "type": "metadata",
      "title": "Example Article",
      "description": "A sample article"
    },
    { "type": "heading", "level": 1, "text": "Introduction" },
    {
      "type": "paragraph",
      "text": "This is the main content of the article..."
    },
    {
      "type": "code",
      "language": "javascript",
      "content": "console.log('Hello');"
    }
  ],
  "cached": false
}
```

### `fetch-links`

Extracts hyperlinks from a webpage and classifies each one as `internal`, `external`, or `image`. Supports regex filtering, optional image links, and a cap on the number of links returned.

| Parameter | Type | Default | Description |
| ----------------- | ------- | ---------- | -------------------------------------------- |
| `url` | string | _required_ | URL to extract links from |
| `includeExternal` | boolean | `true` | Include external links |
| `includeInternal` | boolean | `true` | Include internal links |
| `includeImages` | boolean | `false` | Include image links (img src attributes) |
| `maxLinks` | number | – | Maximum number of links to return (1-1000) |
| `filterPattern` | string | – | Regex pattern to filter links (matches href) |
| `customHeaders` | object | – | Custom HTTP headers for the request |
| `timeout` | number | `30000` | Request timeout in milliseconds (1000-60000) |
| `retries` | number | `3` | Number of retry attempts (1-10) |

**Example Response:**

```json
{
  "url": "https://example.com/",
  "linkCount": 15,
  "links": [
    {
      "href": "https://example.com/about",
      "text": "About Us",
      "type": "internal"
    },
    {
      "href": "https://github.com/example",
      "text": "GitHub",
      "type": "external"
    },
    { "href": "https://example.com/logo.png", "text": "", "type": "image" }
  ],
  "cached": false,
  "truncated": false
}
```

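The sketch below shows how a response like the one above could be derived from the raw `href`s on a page. It rests on several assumptions. "Internal" is taken to mean the same hostname as the page. `filterPattern` is tested against the resolved absolute URL. Duplicates and non-http(s) links are dropped. superFetch's actual rules are not documented here, and `classifyLinks` is a hypothetical helper.

```typescript
// Hypothetical sketch; superFetch's real classification rules are not documented here.
type LinkType = "internal" | "external" | "image";

interface RawLink {
  href: string; // as written in the page; may be relative
  text: string;
  isImage?: boolean; // true when taken from an <img src>
}

interface ClassifiedLink {
  href: string;
  text: string;
  type: LinkType;
}

interface LinkOptions {
  includeInternal?: boolean;
  includeExternal?: boolean;
  includeImages?: boolean;
  maxLinks?: number;
  filterPattern?: string;
}

function classifyLinks(pageUrl: string, raw: RawLink[], options: LinkOptions = {}) {
  const { includeInternal = true, includeExternal = true, includeImages = false, maxLinks, filterPattern } = options;
  const page = new URL(pageUrl);
  const filter = filterPattern ? new RegExp(filterPattern) : undefined;
  const seen = new Set<string>();
  const links: ClassifiedLink[] = [];

  for (const link of raw) {
    let resolved: URL;
    try {
      resolved = new URL(link.href, page); // resolve relative hrefs against the page URL
    } catch {
      continue; // skip malformed hrefs
    }
    if (resolved.protocol !== "http:" && resolved.protocol !== "https:") continue; // mailto:, javascript:, ...
    const href = resolved.href;
    if (seen.has(href) || (filter && !filter.test(href))) continue; // assumption: duplicates are dropped
    // Assumption: "internal" means the same hostname as the page.
    const type: LinkType = link.isImage ? "image" : resolved.hostname === page.hostname ? "internal" : "external";
    const wanted = type === "internal" ? includeInternal : type === "external" ? includeExternal : includeImages;
    if (!wanted) continue;
    seen.add(href);
    links.push({ href, text: link.text.trim(), type });
  }

  const truncated = maxLinks !== undefined && links.length > maxLinks;
  const kept = truncated ? links.slice(0, maxLinks) : links;
  return { url: page.href, linkCount: kept.length, links: kept, truncated };
}
```

Resolving every `href` with `new URL(href, pageUrl)` before comparing hosts avoids misclassifying relative links such as `/about` or `logo.png`.
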
### `fetch-markdown`

Fetches a webpage and converts it to clean Markdown with an optional table of contents.

| Parameter | Type | Default | Description |
| -------------------- | ------- | ---------- | -------------------------------------------- |
| `url` | string | _required_ | URL to fetch |
| `extractMainContent` | boolean | `true` | Extract main content only |
| `includeMetadata` | boolean | `true` | Include YAML frontmatter |
| `maxContentLength` | number | – | Maximum content length in characters |
| `generateToc` | boolean | `false` | Generate table of contents from headings |
| `customHeaders` | object | – | Custom HTTP headers for the request |
| `timeout` | number | `30000` | Request timeout in milliseconds (1000-60000) |
| `retries` | number | `3` | Number of retry attempts (1-10) |

**Example Response:**

````json
{
  "url": "https://example.com/docs",
  "title": "Documentation",
  "fetchedAt": "2025-12-11T10:30:00.000Z",
  "markdown": "---\ntitle: Documentation\nsource: \"https://example.com/docs\"\n---\n\n# Getting Started\n\nWelcome to our documentation...\n\n## Installation\n\n```bash\nnpm install example\n```",
  "toc": [
    { "level": 1, "text": "Getting Started", "slug": "getting-started" },
    { "level": 2, "text": "Installation", "slug": "installation" }
  ],
  "cached": false,
  "truncated": false
}
````

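Each `toc` entry pairs a heading with a URL slug. The minimal sketch below reproduces the example above from its `markdown` string. It skips YAML front matter and fenced code, so that shell comments such as `# install` are not mistaken for headings. It also assumes simplified GitHub-style slugs: lowercase, ASCII letters and digits only, and hyphens for spaces. `generateToc` is a hypothetical helper, and superFetch's own slug rules may differ.

```typescript
interface TocEntry {
  level: number;
  text: string;
  slug: string;
}

// Simplified GitHub-style slug: lowercase, keep ASCII letters/digits/spaces/hyphens,
// then turn each space into a hyphen. (GitHub also keeps non-ASCII letters.)
function slugify(text: string): string {
  return text.trim().toLowerCase().replace(/[^a-z0-9 -]/g, "").replace(/ /g, "-");
}

function generateToc(markdown: string): TocEntry[] {
  const toc: TocEntry[] = [];
  const seen = new Map<string, number>();
  const lines = markdown.split("\n");
  let i = 0;
  // Skip YAML front matter: a block delimited by "---" lines at the very top.
  if (lines[0]?.trim() === "---") {
    const end = lines.indexOf("---", 1);
    if (end !== -1) i = end + 1;
  }
  let inFence = false;
  for (; i < lines.length; i++) {
    const line = lines[i];
    // Toggle on backtick or tilde fences so comments inside code are not read as headings.
    if (/^ {0,3}(`{3,}|~{3,})/.test(line)) {
      inFence = !inFence;
      continue;
    }
    if (inFence) continue;
    // ATX heading: 1-6 "#", then text, then an optional closing "#" sequence.
    const match = /^ {0,3}(#{1,6})\s+(.+?)(?:\s+#+)?\s*$/.exec(line);
    if (!match) continue;
    const text = match[2];
    const base = slugify(text);
    const count = seen.get(base) ?? 0;
    seen.set(base, count + 1);
    // Repeated headings get -1, -2, ... suffixes, as GitHub does.
    toc.push({ level: match[1].length, text, slug: count === 0 ? base : `${base}-${count}` });
  }
  return toc;
}
```
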
### `fetch-urls` (Batch)

Fetches multiple URLs in parallel with concurrency control. Ideal for comparing content or processing multiple pages efficiently.

| Parameter | Type | Default | Description |
| -------------------- | -------- | ---------- | -------------------------------------------- |
| `urls` | string[] | _required_ | Array of URLs to fetch (1-10 URLs) |
| `extractMainContent` | boolean | `true` | Use Readability to extract main content |
| `includeMetadata` | boolean | `true` | Include page metadata |
| `maxContentLength` | number | – | Maximum content length per URL in characters |
| `format` | string | `'jsonl'` | Output format: `'jsonl'` or `'markdown'` |
| `concurrency` | number | `3` | Maximum concurrent requests (1-5) |
| `continueOnError` | boolean | `true` | Continue processing if some URLs fail |
| `customHeaders` | object | – | Custom HTTP headers for all requests |
| `timeout` | number | `30000` | Request timeout in milliseconds (1000-60000) |
| `retries` | number | `3` | Number of retry attempts (1-10) |

**Example Output:**

```json
{
  "results": [
    {
      "url": "https://example.com",
      "success": true,
      "title": "Example",
      "content": "...",
      "cached": false
    },
    {
      "url": "https://example.org",
      "success": true,
      "title": "Example Org",
      "content": "...",
      "cached": false
    }
  ],
  "summary": {
    "total": 2,
    "successful": 2,
    "failed": 0,
    "cached": 0,
    "totalContentBlocks": 15
  },
  "fetchedAt": "2024-12-11T10:30:00.000Z"
}
```

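The `concurrency` and `continueOnError` options describe a bounded worker pool: at most N fetches run at once, and results come back in input order. The sketch below shows that pattern. `fetchOne` is a caller-supplied stand-in for superFetch's single-URL pipeline (hypothetical). What happens when `continueOnError` is `false` is an assumption: here, no new URLs are started. The summary only computes the `total`, `successful`, and `failed` counts from the example above.

```typescript
// Sketch of a bounded worker pool; fetchOne is a caller-supplied stand-in for
// superFetch's single-URL pipeline (hypothetical).
interface BatchResult<T> {
  url: string;
  success: boolean;
  value?: T;
  error?: string;
}

async function fetchAll<T>(
  urls: string[],
  fetchOne: (url: string) => Promise<T>,
  concurrency = 3,
  continueOnError = true,
): Promise<BatchResult<T>[]> {
  const results: BatchResult<T>[] = [];
  let next = 0;
  let stopped = false;

  // Each worker claims the next unprocessed index until none remain.
  async function worker(): Promise<void> {
    while (!stopped && next < urls.length) {
      const index = next++;
      const url = urls[index];
      try {
        results[index] = { url, success: true, value: await fetchOne(url) };
      } catch (error) {
        results[index] = { url, success: false, error: error instanceof Error ? error.message : String(error) };
        // Assumed behaviour when continueOnError is false: stop claiming new URLs.
        if (!continueOnError) stopped = true;
      }
    }
  }

  const workers = Math.max(1, Math.min(concurrency, urls.length));
  await Promise.all(Array.from({ length: workers }, () => worker()));
  // Results stay in input order; URLs never started after a stop leave holes, which filter skips.
  return results.filter((r) => r !== undefined);
}

function summarize<T>(results: BatchResult<T>[]) {
  const successful = results.filter((r) => r.success).length;
  return { total: results.length, successful, failed: results.length - successful };
}
```

Because JavaScript runs the workers on one thread, the shared `next` counter needs no locking: `next++` executes without interruption between `await` points.
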
### Resources

| URI | Description |
| --------------------- | --------------------------------------------------- |
| `superfetch://stats` | Server statistics and cache metrics |
| `superfetch://health` | Real-time server health and dependency status |
| Dynamic resources | Cached content available via resource subscriptions |

### Prompts

- **`analyze-web-content`**: Analyze fetched content with an optional focus area
- **`summarize-page`**: Fetch and summarize a webpage concisely
- **`extract-data`**: Extract structured data from a webpage

---

## Configuration

### Alternative MCP Client Setups

<details>
<summary><strong>VS Code (HTTP mode)</strong>: requires a separately running server</summary>

First, start the HTTP server:

```bash
npx -y @j0hanz/superfetch@latest
```

Then add to `.vscode/mcp.json`:

```json
{
  "servers": {
    "superFetch": {
      "type": "http",
      "url": "http://127.0.0.1:3000/mcp"
    }
  }
}
```

</details>

<details>
<summary><strong>Claude Desktop (local path)</strong>: for development</summary>

```json
{
  "mcpServers": {
    "superFetch": {
      "command": "node",
      "args": ["/path/to/super-fetch-mcp-server/dist/index.js", "--stdio"]
    }
  }
}
```

</details>

### Environment Variables

| Variable | Default | Description |
| ---------------------- | -------------------- | ----------------------------------- |
| `PORT` | `3000` | HTTP server port |
| `HOST` | `127.0.0.1` | HTTP server host |
| `FETCH_TIMEOUT` | `30000` | Request timeout (ms) |
| `MAX_REDIRECTS` | `5` | Maximum HTTP redirects |
| `USER_AGENT` | `superFetch-MCP/1.0` | HTTP User-Agent |
| `MAX_CONTENT_LENGTH` | `10485760` | Max response size in bytes (10 MiB) |
| `CACHE_ENABLED` | `true` | Enable response caching |
| `CACHE_TTL` | `3600` | Cache TTL (seconds) |
| `CACHE_MAX_KEYS` | `100` | Maximum cache entries |
| `LOG_LEVEL` | `info` | Logging level |
| `ENABLE_LOGGING` | `true` | Enable/disable logging |
| `EXTRACT_MAIN_CONTENT` | `true` | Extract main content by default |
| `INCLUDE_METADATA` | `true` | Include metadata by default |
| `MAX_BLOCK_LENGTH` | `5000` | Maximum block length |
| `MIN_PARAGRAPH_LENGTH` | `10` | Minimum paragraph length |

---

## Content Block Types

JSONL output includes semantic content blocks:

| Type | Description |
| ----------- | ----------------------------------------------- |
| `metadata` | Page title, description, author, URL, timestamp |
| `heading` | Headings (h1-h6) with level indicator |
| `paragraph` | Text paragraphs |
| `list` | Ordered/unordered lists |
| `code` | Code blocks with language |
| `table` | Tables with headers and rows |
| `image` | Images with src and alt text |

---

## Security

### SSRF Protection

Blocked destinations:

- Localhost and loopback addresses
- Private IP ranges (`10.x.x.x`, `172.16-31.x.x`, `192.168.x.x`)
- Cloud metadata endpoints (AWS, GCP, Azure)
- IPv6 link-local and unique local addresses

### Header Sanitization

Blocked headers: `host`, `authorization`, `cookie`, `x-forwarded-for`, `x-real-ip`, `proxy-authorization`

### Rate Limiting

Default: **100 requests/minute** per IP (configurable)

### HTTP Mode Endpoints

When running without `--stdio`, the following endpoints are available:

| Endpoint | Method | Description |
| --------- | ------ | --------------------------------------- |
| `/health` | GET | Health check with uptime and version |
| `/mcp` | POST | MCP request handling (requires session) |
| `/mcp` | GET | SSE stream for notifications |
| `/mcp` | DELETE | Close session |

Sessions are identified by the `mcp-session-id` header and have a 30-minute TTL.

---

## Development

### Scripts

| Command | Description |
| -------------------- | ---------------------------------- |
| `npm run dev` | Development server with hot reload |
| `npm run build` | Compile TypeScript |
| `npm start` | Production server |
| `npm run lint` | Run ESLint |
| `npm run type-check` | TypeScript type checking |
| `npm run format` | Format with Prettier |
| `npm run release` | Create a new release |
| `npm run knip` | Find unused exports/dependencies |
| `npm run knip:fix` | Auto-fix unused code |

### Tech Stack

| Category | Technology |
| ------------------ | --------------------------------- |
| Runtime | Node.js ≥20.0.0 |
| Language | TypeScript 5.9 |
| MCP SDK | @modelcontextprotocol/sdk ^1.24.3 |
| Content Extraction | @mozilla/readability ^0.6.0 |
| HTML Parsing | Cheerio ^1.1.2, JSDOM ^27.3.0 |
| Markdown | Turndown ^7.2.2 |
| HTTP | Express ^5.2.1, Axios ^1.13.2 |
| Caching | node-cache ^5.1.2 |
| Validation | Zod ^3.25.76 |
| Logging | Winston ^3.19.0 |

---

## Contributing

1. Fork the repository
2. Create a feature branch: `git checkout -b feature/amazing-feature`
3. Ensure linting passes: `npm run lint`
4. Commit changes: `git commit -m 'Add amazing feature'`
5. Push: `git push origin feature/amazing-feature`
6. Open a Pull Request

For examples of other MCP servers, see [github.com/modelcontextprotocol/servers](https://github.com/modelcontextprotocol/servers).