@j0hanz/superfetch 1.0.3 → 1.0.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (93)
  1. package/README.md +615 -590
  2. package/dist/config/index.d.ts +5 -0
  3. package/dist/config/index.d.ts.map +1 -1
  4. package/dist/config/index.js +5 -0
  5. package/dist/config/index.js.map +1 -1
  6. package/dist/config/types.d.ts +5 -0
  7. package/dist/config/types.d.ts.map +1 -1
  8. package/dist/errors/app-error.d.ts +4 -0
  9. package/dist/errors/app-error.d.ts.map +1 -1
  10. package/dist/errors/app-error.js +7 -0
  11. package/dist/errors/app-error.js.map +1 -1
  12. package/dist/index.js +94 -17
  13. package/dist/index.js.map +1 -1
  14. package/dist/middleware/error-handler.d.ts.map +1 -1
  15. package/dist/middleware/error-handler.js +4 -2
  16. package/dist/middleware/error-handler.js.map +1 -1
  17. package/dist/middleware/rate-limiter.d.ts.map +1 -1
  18. package/dist/middleware/rate-limiter.js +46 -13
  19. package/dist/middleware/rate-limiter.js.map +1 -1
  20. package/dist/prompts/index.d.ts.map +1 -1
  21. package/dist/prompts/index.js +2 -7
  22. package/dist/prompts/index.js.map +1 -1
  23. package/dist/resources/cached-content.d.ts +4 -0
  24. package/dist/resources/cached-content.d.ts.map +1 -0
  25. package/dist/resources/cached-content.js +68 -0
  26. package/dist/resources/cached-content.js.map +1 -0
  27. package/dist/resources/index.d.ts.map +1 -1
  28. package/dist/resources/index.js +39 -1
  29. package/dist/resources/index.js.map +1 -1
  30. package/dist/server.d.ts.map +1 -1
  31. package/dist/server.js +10 -0
  32. package/dist/server.js.map +1 -1
  33. package/dist/services/cache.d.ts +11 -0
  34. package/dist/services/cache.d.ts.map +1 -1
  35. package/dist/services/cache.js +72 -8
  36. package/dist/services/cache.js.map +1 -1
  37. package/dist/services/card-extractor.d.ts +0 -4
  38. package/dist/services/card-extractor.d.ts.map +1 -1
  39. package/dist/services/card-extractor.js +17 -5
  40. package/dist/services/card-extractor.js.map +1 -1
  41. package/dist/services/extractor.d.ts +7 -1
  42. package/dist/services/extractor.d.ts.map +1 -1
  43. package/dist/services/extractor.js +16 -9
  44. package/dist/services/extractor.js.map +1 -1
  45. package/dist/services/fetcher.d.ts +10 -1
  46. package/dist/services/fetcher.d.ts.map +1 -1
  47. package/dist/services/fetcher.js +162 -36
  48. package/dist/services/fetcher.js.map +1 -1
  49. package/dist/services/parser.d.ts.map +1 -1
  50. package/dist/services/parser.js +41 -29
  51. package/dist/services/parser.js.map +1 -1
  52. package/dist/tools/handlers/fetch-links.tool.d.ts +5 -10
  53. package/dist/tools/handlers/fetch-links.tool.d.ts.map +1 -1
  54. package/dist/tools/handlers/fetch-links.tool.js +4 -0
  55. package/dist/tools/handlers/fetch-links.tool.js.map +1 -1
  56. package/dist/tools/handlers/fetch-markdown.tool.d.ts +5 -12
  57. package/dist/tools/handlers/fetch-markdown.tool.d.ts.map +1 -1
  58. package/dist/tools/handlers/fetch-markdown.tool.js +1 -2
  59. package/dist/tools/handlers/fetch-markdown.tool.js.map +1 -1
  60. package/dist/tools/handlers/fetch-url.tool.d.ts +4 -12
  61. package/dist/tools/handlers/fetch-url.tool.d.ts.map +1 -1
  62. package/dist/tools/handlers/fetch-url.tool.js.map +1 -1
  63. package/dist/tools/handlers/fetch-urls.tool.d.ts +8 -1
  64. package/dist/tools/handlers/fetch-urls.tool.d.ts.map +1 -1
  65. package/dist/tools/handlers/fetch-urls.tool.js +67 -16
  66. package/dist/tools/handlers/fetch-urls.tool.js.map +1 -1
  67. package/dist/tools/utils/common.js +1 -1
  68. package/dist/tools/utils/common.js.map +1 -1
  69. package/dist/tools/utils/fetch-pipeline.d.ts.map +1 -1
  70. package/dist/tools/utils/fetch-pipeline.js +90 -20
  71. package/dist/tools/utils/fetch-pipeline.js.map +1 -1
  72. package/dist/transformers/markdown.transformer.d.ts.map +1 -1
  73. package/dist/transformers/markdown.transformer.js +8 -28
  74. package/dist/transformers/markdown.transformer.js.map +1 -1
  75. package/dist/utils/concurrency.d.ts +5 -1
  76. package/dist/utils/concurrency.d.ts.map +1 -1
  77. package/dist/utils/concurrency.js +15 -2
  78. package/dist/utils/concurrency.js.map +1 -1
  79. package/dist/utils/content-cleaner.d.ts.map +1 -1
  80. package/dist/utils/content-cleaner.js +124 -108
  81. package/dist/utils/content-cleaner.js.map +1 -1
  82. package/dist/utils/language-detector.d.ts +1 -1
  83. package/dist/utils/language-detector.d.ts.map +1 -1
  84. package/dist/utils/sanitizer.js +1 -1
  85. package/dist/utils/sanitizer.js.map +1 -1
  86. package/dist/utils/tool-error-handler.d.ts.map +1 -1
  87. package/dist/utils/tool-error-handler.js +36 -6
  88. package/dist/utils/tool-error-handler.js.map +1 -1
  89. package/dist/utils/url-validator.d.ts +10 -0
  90. package/dist/utils/url-validator.d.ts.map +1 -1
  91. package/dist/utils/url-validator.js +43 -5
  92. package/dist/utils/url-validator.js.map +1 -1
  93. package/package.json +83 -80
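The README's Security section, shown in the diff below, lists the destinations superFetch blocks for SSRF protection: loopback, RFC 1918 private ranges, cloud metadata endpoints, and IPv6 link-local and unique local addresses. `url-validator.js` is among the changed files, but its source is not part of this diff. The following is a hypothetical TypeScript sketch of such a host check, not the package's implementation. It assumes the hostname has already been extracted with a WHATWG `URL` parser, which normalizes forms like `0x7f.1` to `127.0.0.1`. The names `isBlockedHost`, `parseIPv4`, `isBlockedIPv4`, `isBlockedIPv6`, and `BLOCKED_NAMES` are illustrative. The choice to block `0.0.0.0/8`, `169.254.0.0/16`, IPv4-mapped IPv6, and `metadata.google.internal` as a stand-in for "cloud metadata endpoints" is also an assumption.

```typescript
// Hypothetical sketch, NOT superFetch's url-validator.js: checks a hostname
// (already extracted by a WHATWG URL parser) against SSRF-sensitive ranges.

// Hostnames blocked by name (assumption: GCP's metadata alias is included).
const BLOCKED_NAMES = new Set(["localhost", "metadata.google.internal"]);

/** Parse a dotted-quad IPv4 literal into four octets, or return null. */
function parseIPv4(host: string): number[] | null {
  const parts = host.split(".");
  if (parts.length !== 4) return null;
  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  // NaN fails both comparisons, so any non-numeric part rejects the whole host.
  return octets.every((n) => n >= 0 && n <= 255) ? octets : null;
}

/** True for "this network", loopback, RFC 1918 private, and link-local IPv4. */
function isBlockedIPv4([a, b]: number[]): boolean {
  return (
    a === 0 || // 0.0.0.0/8, which many systems route to the local host
    a === 127 || // loopback 127.0.0.0/8
    a === 10 || // private 10.0.0.0/8
    (a === 172 && b >= 16 && b <= 31) || // private 172.16.0.0/12
    (a === 192 && b === 168) || // private 192.168.0.0/16
    (a === 169 && b === 254) // link-local; includes the 169.254.169.254 metadata IP
  );
}

/** True for IPv6 loopback/unspecified, link-local (fe80::/10), unique local (fc00::/7). */
function isBlockedIPv6(host: string): boolean {
  if (host === "::1" || host === "::") return true;
  if (host.startsWith("::ffff:")) return true; // IPv4-mapped: blocked conservatively
  // Mask the first 16-bit group to test the two prefixes.
  const first = parseInt(host.split(":")[0] || "0", 16);
  return (first & 0xffc0) === 0xfe80 || (first & 0xfe00) === 0xfc00;
}

/** Returns true if requests to this hostname should be refused. */
function isBlockedHost(hostname: string): boolean {
  // Normalize: drop IPv6 brackets and a trailing dot, compare case-insensitively.
  const host = hostname.replace(/^\[|\]$/g, "").replace(/\.$/, "").toLowerCase();
  if (BLOCKED_NAMES.has(host) || host.endsWith(".localhost")) return true;
  const v4 = parseIPv4(host);
  if (v4) return isBlockedIPv4(v4);
  if (host.includes(":")) return isBlockedIPv6(host);
  return false; // Other DNS names are not resolved in this sketch
}

console.log(isBlockedHost("10.0.0.5"), isBlockedHost("[fe80::1]"), isBlockedHost("example.com"));
// prints: true true false
```

A literal-host check like this covers only half of SSRF protection. A hostname that resolves to a private address passes it, so a complete validator would also check the resolved IPs, and every redirect target (the README caps these with `MAX_REDIRECTS`), before connecting.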
package/README.md CHANGED
@@ -1,590 +1,615 @@
1
- # 🚀 superFetch
2
-
3
- [![npm version](https://img.shields.io/npm/v/@j0hanz/superfetch.svg)](https://www.npmjs.com/package/@j0hanz/superfetch)[![Node.js](https://img.shields.io/badge/Node.js-≥18.0.0-339933?logo=nodedotjs&logoColor=white)](https://nodejs.org/)[![TypeScript](https://img.shields.io/badge/TypeScript-5.9-3178C6?logo=typescript&logoColor=white)](https://www.typescriptlang.org/)[![MCP SDK](https://img.shields.io/badge/MCP_SDK-1.0.4-8B5CF6)](https://modelcontextprotocol.io/)
4
-
5
- ## One-Click Install
6
-
7
- [![Install with NPX in VS Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://insiders.vscode.dev/redirect/mcp/install?name=superfetch&inputs=%5B%5D&config=%7B%22command%22%3A%22npx%22%2C%22args%22%3A%5B%22-y%22%2C%22%40j0hanz%2Fsuperfetch%40latest%22%2C%22--stdio%22%5D%7D)[![Install with NPX in VS Code Insiders](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://insiders.vscode.dev/redirect/mcp/install?name=superfetch&inputs=%5B%5D&config=%7B%22command%22%3A%22npx%22%2C%22args%22%3A%5B%22-y%22%2C%22%40j0hanz%2Fsuperfetch%40latest%22%2C%22--stdio%22%5D%7D&quality=insiders)
8
-
9
- [![Install in Cursor](https://cursor.com/deeplink/mcp-install-dark.svg)](https://cursor.com/install-mcp?name=superfetch&config=eyJjb21tYW5kIjoibnB4IiwiYXJncyI6WyIteSIsIkBqMGhhbnovc3VwZXJmZXRjaEBsYXRlc3QiLCItLXN0ZGlvIl19)
10
-
11
- A [Model Context Protocol](https://modelcontextprotocol.io/) (MCP) server that fetches, extracts, and transforms web content into AI-optimized formats using Mozilla Readability.
12
-
13
- [Quick Start](#quick-start) · [How to Choose a Tool](#-how-to-choose-a-tool) · [Tools](#available-tools) · [Configuration](#configuration) · [Contributing](#contributing)
14
-
15
- > 📦 **Published to [MCP Registry](https://registry.modelcontextprotocol.io/)** — Search for `io.github.j0hanz/superfetch`
16
-
17
- ---
18
-
19
- > [!CAUTION]
20
- > This server can access URLs on behalf of AI assistants. Built-in SSRF protection blocks private IP ranges and cloud metadata endpoints, but exercise caution when deploying in sensitive environments.
21
-
22
- ## ✨ Features
23
-
24
- | Feature | Description |
25
- | ------------------------- | ------------------------------------------------------------- |
26
- | 🧠 **Smart Extraction** | Mozilla Readability removes ads, navigation, and boilerplate |
27
- | 📄 **Multiple Formats** | JSONL semantic blocks or clean Markdown with YAML frontmatter |
28
- | 🔗 **Link Discovery** | Extract and classify internal/external links |
29
- | **Built-in Caching** | Configurable TTL and max entries |
30
- | 🛡️ **Security First** | SSRF protection, URL validation, header sanitization |
31
- | 🔄 **Resilient Fetching** | Exponential backoff with jitter |
32
- | 📊 **Monitoring** | Stats resource for cache performance and health |
33
-
34
- ---
35
-
36
- ## 🎯 How to Choose a Tool
37
-
38
- Use this guide to select the right tool for your web content extraction needs:
39
-
40
- ### Decision Tree
41
-
42
- ```text
43
- Need web content for AI?
44
- ├─ Single URL?
45
- │ ├─ Need structured semantic blocks → fetch-url (JSONL)
46
- │ ├─ Need readable markdown → fetch-markdown
47
- │ └─ Need links only fetch-links
48
- └─ Multiple URLs?
49
- └─ Use fetch-urls (batch processing)
50
- ```
51
-
52
- ### Quick Reference Table
53
-
54
- | Tool | Best For | Output Format | Use When |
55
- | ---------------- | -------------------------------- | ----------------------- | ------------------------------------------- |
56
- | `fetch-url` | Single page → structured content | JSONL semantic blocks | AI analysis, RAG pipelines, content parsing |
57
- | `fetch-markdown` | Single page → readable format | Clean Markdown + TOC | Documentation, human-readable output |
58
- | `fetch-links` | Link discovery & classification | URL array with types | Sitemap building, finding related pages |
59
- | `fetch-urls` | Batch processing multiple pages | Multiple JSONL/Markdown | Comparing pages, bulk extraction |
60
-
61
- ### Common Use Cases
62
-
63
- | Task | Recommended Tool | Why |
64
- | ------------------------ | ---------------------------------------- | ---------------------------------------------------- |
65
- | Parse a blog post for AI | `fetch-url` | Returns semantic blocks (headings, paragraphs, code) |
66
- | Generate documentation | `fetch-markdown` | Clean markdown with optional TOC |
67
- | Build a sitemap | `fetch-links` | Extracts and classifies all links |
68
- | Compare multiple docs | `fetch-urls` | Parallel fetching with concurrency control |
69
- | Extract article for RAG | `fetch-url` + `extractMainContent: true` | Removes ads/nav, keeps main content |
70
-
71
- ---
72
-
73
- ## Quick Start
74
-
75
- Add superFetch to your MCP client configuration — no installation required!
76
-
77
- ### Claude Desktop
78
-
79
- Add to your `claude_desktop_config.json`:
80
-
81
- ```json
82
- {
83
- "mcpServers": {
84
- "superFetch": {
85
- "command": "npx",
86
- "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"]
87
- }
88
- }
89
- }
90
- ```
91
-
92
- ### VS Code
93
-
94
- Add to `.vscode/mcp.json` in your workspace:
95
-
96
- ```json
97
- {
98
- "servers": {
99
- "superFetch": {
100
- "command": "npx",
101
- "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"]
102
- }
103
- }
104
- }
105
- ```
106
-
107
- ### With Environment Variables
108
-
109
- ```json
110
- {
111
- "servers": {
112
- "superFetch": {
113
- "command": "npx",
114
- "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"],
115
- "env": {
116
- "CACHE_TTL": "7200",
117
- "LOG_LEVEL": "debug"
118
- }
119
- }
120
- }
121
- }
122
- ```
123
-
124
- ### Cursor
125
-
126
- 1. Open Cursor Settings
127
- 2. Go to **Features > MCP Servers**
128
- 3. Click **"+ Add new global MCP server"**
129
- 4. Add this configuration:
130
-
131
- ```json
132
- {
133
- "mcpServers": {
134
- "superFetch": {
135
- "command": "npx",
136
- "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"]
137
- }
138
- }
139
- }
140
- ```
141
-
142
- > **Tip:** On Windows, if you encounter issues, try: `cmd /c "npx -y @j0hanz/superfetch@latest --stdio"`
143
-
144
- <details>
145
- <summary><strong>Cline (VS Code Extension)</strong></summary>
146
-
147
- Open the Cline MCP settings file:
148
-
149
- **macOS:**
150
-
151
- ```bash
152
- code ~/Library/Application\ Support/Code/User/globalStorage/saoudrizwan.claude-dev/settings/cline_mcp_settings.json
153
- ```
154
-
155
- **Windows:**
156
-
157
- ```bash
158
- code %APPDATA%\Code\User\globalStorage\saoudrizwan.claude-dev\settings\cline_mcp_settings.json
159
- ```
160
-
161
- Add the configuration:
162
-
163
- ```json
164
- {
165
- "mcpServers": {
166
- "superFetch": {
167
- "command": "npx",
168
- "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"],
169
- "disabled": false,
170
- "autoApprove": []
171
- }
172
- }
173
- }
174
- ```
175
-
176
- </details>
177
-
178
- <details>
179
- <summary><strong>Windsurf</strong></summary>
180
-
181
- Add to `./codeium/windsurf/model_config.json`:
182
-
183
- ```json
184
- {
185
- "mcpServers": {
186
- "superFetch": {
187
- "command": "npx",
188
- "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"]
189
- }
190
- }
191
- }
192
- ```
193
-
194
- </details>
195
-
196
- <details>
197
- <summary><strong>Claude Desktop (Config File Locations)</strong></summary>
198
-
199
- **macOS:**
200
-
201
- ```bash
202
- # Open config file
203
- open -e "$HOME/Library/Application Support/Claude/claude_desktop_config.json"
204
-
205
- # Or with VS Code
206
- code "$HOME/Library/Application Support/Claude/claude_desktop_config.json"
207
- ```
208
-
209
- **Windows:**
210
-
211
- ```bash
212
- code %APPDATA%\Claude\claude_desktop_config.json
213
- ```
214
-
215
- </details>
216
-
217
- ---
218
-
219
- ## Installation (Alternative)
220
-
221
- ### Global Installation
222
-
223
- ```bash
224
- npm install -g @j0hanz/superfetch
225
-
226
- # Run in stdio mode
227
- superfetch --stdio
228
-
229
- # Run HTTP server
230
- superfetch
231
- ```
232
-
233
- ### From Source
234
-
235
- ```bash
236
- git clone https://github.com/j0hanz/super-fetch-mcp-server.git
237
- cd super-fetch-mcp-server
238
- npm install
239
- npm run build
240
- ```
241
-
242
- ### Running the Server
243
-
244
- <details>
245
- <summary><strong>HTTP Mode</strong> (default)</summary>
246
-
247
- ```bash
248
- # Development with hot reload
249
- npm run dev
250
-
251
- # Production
252
- npm start
253
- ```
254
-
255
- Server runs at `http://127.0.0.1:3000`:
256
-
257
- - Health check: `GET /health`
258
- - MCP endpoint: `POST /mcp`
259
-
260
- </details>
261
-
262
- <details>
263
- <summary><strong>stdio Mode</strong> (direct MCP integration)</summary>
264
-
265
- ```bash
266
- node dist/index.js --stdio
267
- ```
268
-
269
- </details>
270
-
271
- ---
272
-
273
- ## Available Tools
274
-
275
- ### `fetch-url`
276
-
277
- Fetches a webpage and converts it to AI-readable JSONL format with semantic content blocks.
278
-
279
- | Parameter | Type | Default | Description |
280
- | -------------------- | ------- | ---------- | -------------------------------------------- |
281
- | `url` | string | _required_ | URL to fetch |
282
- | `extractMainContent` | boolean | `true` | Use Readability to extract main content |
283
- | `includeMetadata` | boolean | `true` | Include page metadata (title, description) |
284
- | `maxContentLength` | number | | Maximum content length in characters |
285
- | `customHeaders` | object | | Custom HTTP headers for the request |
286
- | `timeout` | number | `30000` | Request timeout in milliseconds (1000-60000) |
287
- | `retries` | number | `3` | Number of retry attempts (1-10) |
288
-
289
- **Example Response:**
290
-
291
- ```json
292
- {
293
- "url": "https://example.com/article",
294
- "title": "Example Article",
295
- "fetchedAt": "2025-12-11T10:30:00.000Z",
296
- "contentBlocks": [
297
- {
298
- "type": "metadata",
299
- "title": "Example Article",
300
- "description": "A sample article"
301
- },
302
- { "type": "heading", "level": 1, "text": "Introduction" },
303
- {
304
- "type": "paragraph",
305
- "text": "This is the main content of the article..."
306
- },
307
- {
308
- "type": "code",
309
- "language": "javascript",
310
- "content": "console.log('Hello');"
311
- }
312
- ],
313
- "cached": false
314
- }
315
- ```
316
-
317
- ### `fetch-links`
318
-
319
- Extracts hyperlinks from a webpage with classification. Supports filtering, image links, and link limits.
320
-
321
- | Parameter | Type | Default | Description |
322
- | ----------------- | ------- | ---------- | -------------------------------------------- |
323
- | `url` | string | _required_ | URL to extract links from |
324
- | `includeExternal` | boolean | `true` | Include external links |
325
- | `includeInternal` | boolean | `true` | Include internal links |
326
- | `includeImages` | boolean | `false` | Include image links (img src attributes) |
327
- | `maxLinks` | number | | Maximum number of links to return (1-1000) |
328
- | `filterPattern` | string | | Regex pattern to filter links (matches href) |
329
- | `customHeaders` | object | | Custom HTTP headers for the request |
330
- | `timeout` | number | `30000` | Request timeout in milliseconds (1000-60000) |
331
- | `retries` | number | `3` | Number of retry attempts (1-10) |
332
-
333
- **Example Response:**
334
-
335
- ```json
336
- {
337
- "url": "https://example.com/",
338
- "linkCount": 15,
339
- "links": [
340
- {
341
- "href": "https://example.com/about",
342
- "text": "About Us",
343
- "type": "internal"
344
- },
345
- {
346
- "href": "https://github.com/example",
347
- "text": "GitHub",
348
- "type": "external"
349
- },
350
- { "href": "https://example.com/logo.png", "text": "", "type": "image" }
351
- ],
352
- "cached": false,
353
- "truncated": false
354
- }
355
- ```
356
-
357
- ### `fetch-markdown`
358
-
359
- Fetches a webpage and converts it to clean Markdown with optional table of contents.
360
-
361
- | Parameter | Type | Default | Description |
362
- | -------------------- | ------- | ---------- | -------------------------------------------- |
363
- | `url` | string | _required_ | URL to fetch |
364
- | `extractMainContent` | boolean | `true` | Extract main content only |
365
- | `includeMetadata` | boolean | `true` | Include YAML frontmatter |
366
- | `maxContentLength` | number | | Maximum content length in characters |
367
- | `generateToc` | boolean | `false` | Generate table of contents from headings |
368
- | `customHeaders` | object | | Custom HTTP headers for the request |
369
- | `timeout` | number | `30000` | Request timeout in milliseconds (1000-60000) |
370
- | `retries` | number | `3` | Number of retry attempts (1-10) |
371
-
372
- **Example Response:**
373
-
374
- ````json
375
- {
376
- "url": "https://example.com/docs",
377
- "title": "Documentation",
378
- "fetchedAt": "2025-12-11T10:30:00.000Z",
379
- "markdown": "---\ntitle: Documentation\nsource: \"https://example.com/docs\"\n---\n\n# Getting Started\n\nWelcome to our documentation...\n\n## Installation\n\n```bash\nnpm install example\n```",
380
- "toc": [
381
- { "level": 1, "text": "Getting Started", "slug": "getting-started" },
382
- { "level": 2, "text": "Installation", "slug": "installation" }
383
- ],
384
- "cached": false,
385
- "truncated": false
386
- }
387
- ````
388
-
389
- ### `fetch-urls` (Batch)
390
-
391
- Fetches multiple URLs in parallel with concurrency control. Ideal for comparing content or processing multiple pages efficiently.
392
-
393
- | Parameter | Type | Default | Description |
394
- | -------------------- | -------- | ---------- | -------------------------------------------- |
395
- | `urls` | string[] | _required_ | Array of URLs to fetch (1-10 URLs) |
396
- | `extractMainContent` | boolean | `true` | Use Readability to extract main content |
397
- | `includeMetadata` | boolean | `true` | Include page metadata |
398
- | `maxContentLength` | number | | Maximum content length per URL in characters |
399
- | `format` | string | `'jsonl'` | Output format: `'jsonl'` or `'markdown'` |
400
- | `concurrency` | number | `3` | Maximum concurrent requests (1-5) |
401
- | `continueOnError` | boolean | `true` | Continue processing if some URLs fail |
402
- | `customHeaders` | object | – | Custom HTTP headers for all requests |
403
- | `timeout` | number | `30000` | Request timeout in milliseconds (1000-60000) |
404
- | `retries` | number | `3` | Number of retry attempts (1-10) |
405
-
406
- **Example Output:**
407
-
408
- ```json
409
- {
410
- "results": [
411
- {
412
- "url": "https://example.com",
413
- "success": true,
414
- "title": "Example",
415
- "content": "...",
416
- "cached": false
417
- },
418
- {
419
- "url": "https://example.org",
420
- "success": true,
421
- "title": "Example Org",
422
- "content": "...",
423
- "cached": false
424
- }
425
- ],
426
- "summary": {
427
- "total": 2,
428
- "successful": 2,
429
- "failed": 0,
430
- "cached": 0,
431
- "totalContentBlocks": 15
432
- },
433
- "fetchedAt": "2024-12-11T10:30:00.000Z"
434
- }
435
- ```
436
-
437
- ### Resources
438
-
439
- | URI | Description |
440
- | -------------------- | ----------------------------------- |
441
- | `superfetch://stats` | Server statistics and cache metrics |
442
-
443
- ### Prompts
444
-
445
- - **`analyze-web-content`** Analyze fetched content with optional focus area
446
- - **`summarize-page`** Fetch and summarize a webpage concisely
447
- - **`extract-data`** Extract structured data from a webpage
448
-
449
- ---
450
-
451
- ## Configuration
452
-
453
- ### Alternative MCP Client Setups
454
-
455
- <details>
456
- <summary><strong>VS Code (HTTP mode)</strong> — requires running server separately</summary>
457
-
458
- First, start the HTTP server:
459
-
460
- ```bash
461
- npx -y @j0hanz/superfetch@latest
462
- ```
463
-
464
- Then add to `.vscode/mcp.json`:
465
-
466
- ```json
467
- {
468
- "servers": {
469
- "superFetch": {
470
- "type": "http",
471
- "url": "http://127.0.0.1:3000/mcp"
472
- }
473
- }
474
- }
475
- ```
476
-
477
- </details>
478
-
479
- <details>
480
- <summary><strong>Claude Desktop (local path)</strong> — for development</summary>
481
-
482
- ```json
483
- {
484
- "mcpServers": {
485
- "superFetch": {
486
- "command": "node",
487
- "args": ["/path/to/super-fetch-mcp-server/dist/index.js", "--stdio"]
488
- }
489
- }
490
- }
491
- ```
492
-
493
- </details>
494
-
495
- ### Environment Variables
496
-
497
- | Variable | Default | Description |
498
- | -------------------- | -------------------- | ------------------------- |
499
- | `PORT` | `3000` | HTTP server port |
500
- | `HOST` | `127.0.0.1` | HTTP server host |
501
- | `FETCH_TIMEOUT` | `30000` | Request timeout (ms) |
502
- | `MAX_REDIRECTS` | `5` | Maximum HTTP redirects |
503
- | `USER_AGENT` | `superFetch-MCP/1.0` | HTTP User-Agent |
504
- | `MAX_CONTENT_LENGTH` | `10485760` | Max response size (bytes) |
505
- | `CACHE_ENABLED` | `true` | Enable response caching |
506
- | `CACHE_TTL` | `3600` | Cache TTL (seconds) |
507
- | `CACHE_MAX_KEYS` | `100` | Maximum cache entries |
508
- | `LOG_LEVEL` | `info` | Logging level |
509
- | `ENABLE_LOGGING` | `true` | Enable/disable logging |
510
-
511
- ---
512
-
513
- ## Content Block Types
514
-
515
- JSONL output includes semantic content blocks:
516
-
517
- | Type | Description |
518
- | ----------- | ----------------------------------------------- |
519
- | `metadata` | Page title, description, author, URL, timestamp |
520
- | `heading` | Headings (h1-h6) with level indicator |
521
- | `paragraph` | Text paragraphs |
522
- | `list` | Ordered/unordered lists |
523
- | `code` | Code blocks with language |
524
- | `table` | Tables with headers and rows |
525
- | `image` | Images with src and alt text |
526
-
527
- ---
528
-
529
- ## Security
530
-
531
- ### SSRF Protection
532
-
533
- Blocked destinations:
534
-
535
- - Localhost and loopback addresses
536
- - Private IP ranges (`10.x.x.x`, `172.16-31.x.x`, `192.168.x.x`)
537
- - Cloud metadata endpoints (AWS, GCP, Azure)
538
- - IPv6 link-local and unique local addresses
539
-
540
- ### Header Sanitization
541
-
542
- Blocked headers: `host`, `authorization`, `cookie`, `x-forwarded-for`, `x-real-ip`, `proxy-authorization`
543
-
544
- ### Rate Limiting
545
-
546
- Default: **100 requests/minute** per IP (configurable)
547
-
548
- ---
549
-
550
- ## Development
551
-
552
- ### Scripts
553
-
554
- | Command | Description |
555
- | -------------------- | ---------------------------------- |
556
- | `npm run dev` | Development server with hot reload |
557
- | `npm run build` | Compile TypeScript |
558
- | `npm start` | Production server |
559
- | `npm run lint` | Run ESLint |
560
- | `npm run type-check` | TypeScript type checking |
561
- | `npm run format` | Format with Prettier |
562
- | `npm test` | Run tests |
563
-
564
- ### Tech Stack
565
-
566
- | Category | Technology |
567
- | ------------------ | -------------------------------- |
568
- | Runtime | Node.js ≥18 |
569
- | Language | TypeScript 5.9 |
570
- | MCP SDK | @modelcontextprotocol/sdk ^1.0.4 |
571
- | Content Extraction | @mozilla/readability |
572
- | HTML Parsing | Cheerio, JSDOM |
573
- | Markdown | Turndown |
574
- | HTTP | Express, Axios |
575
- | Caching | node-cache |
576
- | Validation | Zod |
577
- | Logging | Winston |
578
-
579
- ---
580
-
581
- ## Contributing
582
-
583
- 1. Fork the repository
584
- 2. Create a feature branch: `git checkout -b feature/amazing-feature`
585
- 3. Ensure linting passes: `npm run lint`
586
- 4. Commit changes: `git commit -m 'Add amazing feature'`
587
- 5. Push: `git push origin feature/amazing-feature`
588
- 6. Open a Pull Request
589
-
590
- For examples of other MCP servers, see: [github.com/modelcontextprotocol/servers](https://github.com/modelcontextprotocol/servers)
1
+ # 🚀 superFetch MCP Server
2
+
3
+ <img src="docs/logo.png" alt="SuperFetch MCP Logo" width="200">
4
+
5
+ [![npm version](https://img.shields.io/npm/v/@j0hanz/superfetch.svg)](https://www.npmjs.com/package/@j0hanz/superfetch) [![Node.js](https://img.shields.io/badge/Node.js-≥20.0.0-339933?logo=nodedotjs&logoColor=white)](https://nodejs.org/) [![TypeScript](https://img.shields.io/badge/TypeScript-5.9-3178C6?logo=typescript&logoColor=white)](https://www.typescriptlang.org/)
6
+
7
+ ## One-Click Install
8
+
9
+ [![Install with NPX in VS Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://insiders.vscode.dev/redirect/mcp/install?name=superfetch&inputs=%5B%5D&config=%7B%22command%22%3A%22npx%22%2C%22args%22%3A%5B%22-y%22%2C%22%40j0hanz%2Fsuperfetch%40latest%22%2C%22--stdio%22%5D%7D) [![Install with NPX in VS Code Insiders](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://insiders.vscode.dev/redirect/mcp/install?name=superfetch&inputs=%5B%5D&config=%7B%22command%22%3A%22npx%22%2C%22args%22%3A%5B%22-y%22%2C%22%40j0hanz%2Fsuperfetch%40latest%22%2C%22--stdio%22%5D%7D&quality=insiders)
10
+
11
+ [![Install in Cursor](https://cursor.com/deeplink/mcp-install-dark.svg)](https://cursor.com/install-mcp?name=superfetch&config=eyJjb21tYW5kIjoibnB4IiwiYXJncyI6WyIteSIsIkBqMGhhbnovc3VwZXJmZXRjaEBsYXRlc3QiLCItLXN0ZGlvIl19)
12
+
13
+ A [Model Context Protocol](https://modelcontextprotocol.io/) (MCP) server that fetches, extracts, and transforms web content into AI-optimized formats using Mozilla Readability.
14
+
15
+ **Version:** 1.0.5
16
+
17
+ [Quick Start](#quick-start) · [How to Choose a Tool](#-how-to-choose-a-tool) · [Tools](#available-tools) · [Configuration](#configuration) · [Contributing](#contributing)
18
+
19
+ > 📦 **Published to [MCP Registry](https://registry.modelcontextprotocol.io/)** — Search for `io.github.j0hanz/superfetch`
20
+
21
+ ---
22
+
23
+ > [!CAUTION]
24
+ > This server can access URLs on behalf of AI assistants. Built-in SSRF protection blocks private IP ranges and cloud metadata endpoints, but exercise caution when deploying in sensitive environments.
25
+
26
+ ## Features
27
+
28
+ | Feature | Description |
29
+ | ------------------------- | ------------------------------------------------------------- |
30
+ | 🧠 **Smart Extraction** | Mozilla Readability removes ads, navigation, and boilerplate |
31
+ | 📄 **Multiple Formats** | JSONL semantic blocks or clean Markdown with YAML frontmatter |
32
+ | 🔗 **Link Discovery** | Extract and classify internal/external links |
33
+ | ⚡ **Built-in Caching** | Configurable TTL and max entries |
34
+ | 🛡️ **Security First** | SSRF protection, URL validation, header sanitization |
35
+ | 🔄 **Resilient Fetching** | Exponential backoff with jitter |
36
+ | 📊 **Monitoring** | Stats resource for cache performance and health |
37
+
38
+ ---
39
+
40
+ ## 🎯 How to Choose a Tool
41
+
42
+ Use this guide to select the right tool for your web content extraction needs:
43
+
44
+ ### Decision Tree
45
+
46
+ ```text
47
+ Need web content for AI?
48
+ ├─ Single URL?
49
+ │ ├─ Need structured semantic blocks → fetch-url (JSONL)
50
+ │ ├─ Need readable markdown → fetch-markdown
51
+ │ └─ Need links only → fetch-links
52
+ └─ Multiple URLs?
53
+ └─ Use fetch-urls (batch processing)
54
+ ```
55
+
56
+ ### Quick Reference Table
57
+
58
+ | Tool | Best For | Output Format | Use When |
59
+ | ---------------- | -------------------------------- | ----------------------- | ------------------------------------------- |
60
+ | `fetch-url` | Single page → structured content | JSONL semantic blocks | AI analysis, RAG pipelines, content parsing |
61
+ | `fetch-markdown` | Single page → readable format | Clean Markdown + TOC | Documentation, human-readable output |
62
+ | `fetch-links` | Link discovery & classification | URL array with types | Sitemap building, finding related pages |
63
+ | `fetch-urls` | Batch processing multiple pages | Multiple JSONL/Markdown | Comparing pages, bulk extraction |
64
+
65
+ ### Common Use Cases
66
+
67
+ | Task | Recommended Tool | Why |
68
+ | ------------------------ | ---------------------------------------- | ---------------------------------------------------- |
69
+ | Parse a blog post for AI | `fetch-url` | Returns semantic blocks (headings, paragraphs, code) |
70
+ | Generate documentation | `fetch-markdown` | Clean markdown with optional TOC |
71
+ | Build a sitemap | `fetch-links` | Extracts and classifies all links |
72
+ | Compare multiple docs | `fetch-urls` | Parallel fetching with concurrency control |
73
+ | Extract article for RAG | `fetch-url` + `extractMainContent: true` | Removes ads/nav, keeps main content |
74
+
75
+ ---
76
+
77
+ ## Quick Start
78
+
79
+ Add superFetch to your MCP client configuration — no installation required!
80
+
81
+ ### Claude Desktop
82
+
83
+ Add to your `claude_desktop_config.json`:
84
+
85
+ ```json
86
+ {
87
+ "mcpServers": {
88
+ "superFetch": {
89
+ "command": "npx",
90
+ "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"]
91
+ }
92
+ }
93
+ }
94
+ ```
95
+
96
+ ### VS Code
97
+
98
+ Add to `.vscode/mcp.json` in your workspace:
99
+
100
+ ```json
101
+ {
102
+ "servers": {
103
+ "superFetch": {
104
+ "command": "npx",
105
+ "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"]
106
+ }
107
+ }
108
+ }
109
+ ```
110
+
111
+ ### With Environment Variables
112
+
113
+ ```json
114
+ {
115
+ "servers": {
116
+ "superFetch": {
117
+ "command": "npx",
118
+ "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"],
119
+ "env": {
120
+ "CACHE_TTL": "7200",
121
+ "LOG_LEVEL": "debug"
122
+ }
123
+ }
124
+ }
125
+ }
126
+ ```
+
+ ### Cursor
+
+ 1. Open Cursor Settings
+ 2. Go to **Features > MCP Servers**
+ 3. Click **"+ Add new global MCP server"**
+ 4. Add this configuration:
+
+ ```json
+ {
+   "mcpServers": {
+     "superFetch": {
+       "command": "npx",
+       "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"]
+     }
+   }
+ }
+ ```
+
+ > **Tip:** On Windows, if the server fails to start through `npx`, launch it via `cmd` instead: set `"command": "cmd"` and `"args": ["/c", "npx", "-y", "@j0hanz/superfetch@latest", "--stdio"]`.
+
+ <details>
+ <summary><strong>Cline (VS Code Extension)</strong></summary>
+
+ Open the Cline MCP settings file:
+
+ **macOS:**
+
+ ```bash
+ code ~/Library/Application\ Support/Code/User/globalStorage/saoudrizwan.claude-dev/settings/cline_mcp_settings.json
+ ```
+
+ **Windows:**
+
+ ```bat
+ code %APPDATA%\Code\User\globalStorage\saoudrizwan.claude-dev\settings\cline_mcp_settings.json
+ ```
+
+ Add the configuration:
+
+ ```json
+ {
+   "mcpServers": {
+     "superFetch": {
+       "command": "npx",
+       "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"],
+       "disabled": false,
+       "autoApprove": []
+     }
+   }
+ }
+ ```
+
+ </details>
+
+ <details>
+ <summary><strong>Windsurf</strong></summary>
+
+ Add to `~/.codeium/windsurf/mcp_config.json`:
+
+ ```json
+ {
+   "mcpServers": {
+     "superFetch": {
+       "command": "npx",
+       "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"]
+     }
+   }
+ }
+ ```
+
+ </details>
+
+ <details>
+ <summary><strong>Claude Desktop (Config File Locations)</strong></summary>
+
+ **macOS:**
+
+ ```bash
+ # Open config file
+ open -e "$HOME/Library/Application Support/Claude/claude_desktop_config.json"
+
+ # Or with VS Code
+ code "$HOME/Library/Application Support/Claude/claude_desktop_config.json"
+ ```
+
+ **Windows:**
+
+ ```bat
+ code %APPDATA%\Claude\claude_desktop_config.json
+ ```
+
+ </details>
+
+ ---
+
+ ## Installation (Alternative)
+
+ ### Global Installation
+
+ ```bash
+ npm install -g @j0hanz/superfetch
+
+ # Run in stdio mode
+ superfetch --stdio
+
+ # Run HTTP server
+ superfetch
+ ```
+
+ ### From Source
+
+ ```bash
+ git clone https://github.com/j0hanz/super-fetch-mcp-server.git
+ cd super-fetch-mcp-server
+ npm install
+ npm run build
+ ```
+
+ ### Running the Server
+
+ <details>
+ <summary><strong>HTTP Mode</strong> (default)</summary>
+
+ ```bash
+ # Development with hot reload
+ npm run dev
+
+ # Production
+ npm start
+ ```
+
+ Server runs at `http://127.0.0.1:3000`:
+
+ - Health check: `GET /health`
+ - MCP endpoint: `POST /mcp`
+
+ </details>
+
+ <details>
+ <summary><strong>stdio Mode</strong> (direct MCP integration)</summary>
+
+ ```bash
+ node dist/index.js --stdio
+ ```
+
+ </details>
+
+ ---
+
+ ## Available Tools
+
+ ### `fetch-url`
+
+ Fetches a webpage and converts it to AI-readable JSONL format with semantic content blocks.
+
+ | Parameter | Type | Default | Description |
+ | -------------------- | ------- | ---------- | -------------------------------------------- |
+ | `url` | string | _required_ | URL to fetch |
+ | `extractMainContent` | boolean | `true` | Use Readability to extract main content |
+ | `includeMetadata` | boolean | `true` | Include page metadata (title, description) |
+ | `maxContentLength` | number | – | Maximum content length in characters |
+ | `customHeaders` | object | – | Custom HTTP headers for the request |
+ | `timeout` | number | `30000` | Request timeout in milliseconds (1000-60000) |
+ | `retries` | number | `3` | Number of retry attempts (1-10) |
+
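The `timeout` and `retries` parameters apply to every tool. The sketch below is only a mental model, not superFetch's code. It assumes that `retries` counts total attempts, that each attempt gets its own deadline via `AbortSignal.timeout`, and that the wait between attempts grows exponentially. The helper name `withRetries` and the `baseDelayMs` option are hypothetical.

```typescript
// Hypothetical helper: per-attempt timeout plus retries with exponential backoff.
// Assumption: `retries` is the total number of attempts (documented range 1-10).
async function withRetries<T>(
  attempt: (signal: AbortSignal) => Promise<T>,
  opts: { timeout: number; retries: number; baseDelayMs?: number },
): Promise<T> {
  const baseDelayMs = opts.baseDelayMs ?? 500;
  let lastError: unknown;
  for (let i = 0; i < opts.retries; i++) {
    try {
      // Each attempt gets a fresh deadline (AbortSignal.timeout needs Node >= 17.3).
      return await attempt(AbortSignal.timeout(opts.timeout));
    } catch (err) {
      lastError = err;
      if (i < opts.retries - 1) {
        // Backoff: base, 2 * base, 4 * base, ...
        const delay = baseDelayMs * 2 ** i;
        await new Promise<void>((resolve) => setTimeout(() => resolve(), delay));
      }
    }
  }
  // All attempts failed: surface the last error.
  throw lastError;
}
```

For example, `withRetries((signal) => fetch(url, { signal }).then((r) => r.text()), { timeout: 30000, retries: 3 })` would make up to three attempts under this model, each limited to 30 s.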
+ **Example Response:**
+
+ ```json
+ {
+   "url": "https://example.com/article",
+   "title": "Example Article",
+   "fetchedAt": "2025-12-11T10:30:00.000Z",
+   "contentBlocks": [
+     {
+       "type": "metadata",
+       "title": "Example Article",
+       "description": "A sample article"
+     },
+     { "type": "heading", "level": 1, "text": "Introduction" },
+     {
+       "type": "paragraph",
+       "text": "This is the main content of the article..."
+     },
+     {
+       "type": "code",
+       "language": "javascript",
+       "content": "console.log('Hello');"
+     }
+   ],
+   "cached": false
+ }
+ ```
+
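If you consume `fetch-url` output programmatically, the block fields in the example above are enough to rebuild readable text. The following is a minimal sketch that relies only on the `heading`, `paragraph`, and `code` fields from that example. The fields of the other block types are not documented here, so the sketch skips those blocks. `FetchUrlResponse` and `blocksToText` are hypothetical names.

```typescript
interface FetchUrlResponse {
  url: string;
  title?: string;
  fetchedAt: string;
  contentBlocks: Array<{ type: string; [key: string]: unknown }>;
  cached: boolean;
}

const FENCE = "`".repeat(3); // built at runtime to avoid a literal fence in this snippet

// Render blocks as lightweight Markdown text for a prompt or an embedding.
function blocksToText(res: FetchUrlResponse): string {
  const parts: string[] = [];
  for (const block of res.contentBlocks) {
    switch (block.type) {
      case "heading":
        parts.push(`${"#".repeat(Number(block.level ?? 1))} ${String(block.text ?? "")}`);
        break;
      case "paragraph":
        parts.push(String(block.text ?? ""));
        break;
      case "code":
        parts.push(`${FENCE}${String(block.language ?? "")}\n${String(block.content ?? "")}\n${FENCE}`);
        break;
      default:
        // metadata duplicates res.title; list/table/image fields are not documented here
        break;
    }
  }
  return parts.join("\n\n");
}
```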
+ ### `fetch-links`
+
+ Extracts hyperlinks from a webpage and classifies each one as `internal`, `external`, or `image`. Supports regex filtering, optional image links, and a cap on the number of links returned.
+
+ | Parameter | Type | Default | Description |
+ | ----------------- | ------- | ---------- | -------------------------------------------- |
+ | `url` | string | _required_ | URL to extract links from |
+ | `includeExternal` | boolean | `true` | Include external links |
+ | `includeInternal` | boolean | `true` | Include internal links |
+ | `includeImages` | boolean | `false` | Include image links (`<img>` `src` attributes) |
+ | `maxLinks` | number | – | Maximum number of links to return (1-1000) |
+ | `filterPattern` | string | – | Regex pattern to filter links (matches href) |
+ | `customHeaders` | object | – | Custom HTTP headers for the request |
+ | `timeout` | number | `30000` | Request timeout in milliseconds (1000-60000) |
+ | `retries` | number | `3` | Number of retry attempts (1-10) |
+
+ **Example Response:**
+
+ ```json
+ {
+   "url": "https://example.com/",
+   "linkCount": 3,
+   "links": [
+     {
+       "href": "https://example.com/about",
+       "text": "About Us",
+       "type": "internal"
+     },
+     {
+       "href": "https://github.com/example",
+       "text": "GitHub",
+       "type": "external"
+     },
+     { "href": "https://example.com/logo.png", "text": "", "type": "image" }
+   ],
+   "cached": false,
+   "truncated": false
+ }
+ ```
+
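A small re-implementation can illustrate the option semantics in the table. This is a hypothetical sketch, not superFetch's extraction code. It assumes that a link is `internal` when its resolved URL shares the page's origin and that `filterPattern` is tested against the resolved absolute `href`. As extras not stated above, it also drops non-HTTP links and duplicates. Parsing the HTML into `anchors` and `images` is out of scope.

```typescript
type LinkType = "internal" | "external" | "image";
interface Link {
  href: string;
  text: string;
  type: LinkType;
}

// Hypothetical classifier mirroring the documented options; superFetch's rules may differ.
function classifyLinks(
  pageUrl: string,
  anchors: Array<{ href: string; text: string }>,
  images: string[],
  opts: {
    includeInternal?: boolean;
    includeExternal?: boolean;
    includeImages?: boolean;
    maxLinks?: number;
    filterPattern?: string;
  } = {},
): { links: Link[]; truncated: boolean } {
  const { includeInternal = true, includeExternal = true, includeImages = false, maxLinks, filterPattern } = opts;
  const base = new URL(pageUrl);
  const filter = filterPattern ? new RegExp(filterPattern) : undefined;
  const links: Link[] = [];
  const seen = new Set<string>();

  const add = (rawHref: string, text: string, forcedType?: LinkType) => {
    let resolved: URL;
    try {
      resolved = new URL(rawHref, base); // resolve relative hrefs against the page URL
    } catch {
      return; // skip malformed hrefs
    }
    if (resolved.protocol !== "http:" && resolved.protocol !== "https:") return; // mailto:, javascript:, ...
    const href = resolved.href;
    const type: LinkType = forcedType ?? (resolved.origin === base.origin ? "internal" : "external");
    if (type === "internal" && !includeInternal) return;
    if (type === "external" && !includeExternal) return;
    if (filter && !filter.test(href)) return;
    if (seen.has(href)) return; // de-duplicate (an assumption)
    seen.add(href);
    links.push({ href, text: text.trim(), type });
  };

  for (const a of anchors) add(a.href, a.text);
  if (includeImages) for (const src of images) add(src, "", "image");

  const truncated = maxLinks !== undefined && links.length > maxLinks;
  return { links: truncated ? links.slice(0, maxLinks) : links, truncated };
}
```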
+ ### `fetch-markdown`
+
+ Fetches a webpage and converts it to clean Markdown with an optional table of contents.
+
+ | Parameter | Type | Default | Description |
+ | -------------------- | ------- | ---------- | -------------------------------------------- |
+ | `url` | string | _required_ | URL to fetch |
+ | `extractMainContent` | boolean | `true` | Extract main content only |
+ | `includeMetadata` | boolean | `true` | Include YAML frontmatter |
+ | `maxContentLength` | number | – | Maximum content length in characters |
+ | `generateToc` | boolean | `false` | Generate table of contents from headings |
+ | `customHeaders` | object | – | Custom HTTP headers for the request |
+ | `timeout` | number | `30000` | Request timeout in milliseconds (1000-60000) |
+ | `retries` | number | `3` | Number of retry attempts (1-10) |
+
+ **Example Response:**
+
+ ````json
+ {
+   "url": "https://example.com/docs",
+   "title": "Documentation",
+   "fetchedAt": "2025-12-11T10:30:00.000Z",
+   "markdown": "---\ntitle: Documentation\nsource: \"https://example.com/docs\"\n---\n\n# Getting Started\n\nWelcome to our documentation...\n\n## Installation\n\n```bash\nnpm install example\n```",
+   "toc": [
+     { "level": 1, "text": "Getting Started", "slug": "getting-started" },
+     { "level": 2, "text": "Installation", "slug": "installation" }
+   ],
+   "cached": false,
+   "truncated": false
+ }
+ ````
+
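With `includeMetadata` and `generateToc` enabled, the `markdown` string starts with YAML frontmatter and the `toc` array mirrors its headings. The following sketch reproduces the example's `toc` from its `markdown`. The slug rule only approximates GitHub-style slugs and may not match superFetch's exactly. `splitFrontmatter`, `buildToc`, and `slugify` are hypothetical helpers.

```typescript
interface TocEntry {
  level: number;
  text: string;
  slug: string;
}

// A line opening or closing a code fence (three or more backticks or tildes).
const FENCE_RE = /^(`{3,}|~{3,})/;
// ATX heading with an optional closing run of #s, e.g. "## Installation ##".
const HEADING_RE = /^(#{1,6})\s+(.+?)(?:\s+#+)?\s*$/;

// Approximation of a GitHub-style slug (an assumption).
function slugify(text: string): string {
  return text
    .toLowerCase()
    .trim()
    .replace(/[^\w\s-]/g, "") // drop punctuation
    .replace(/\s+/g, "-"); // collapse whitespace into hyphens
}

// Split an optional leading YAML frontmatter block from the Markdown body.
function splitFrontmatter(markdown: string): { frontmatter: string | null; body: string } {
  const match = /^---\n([\s\S]*?)\n---(?:\n|$)/.exec(markdown);
  if (!match) return { frontmatter: null, body: markdown };
  return { frontmatter: match[1], body: markdown.slice(match[0].length) };
}

// Build a table of contents from ATX headings, ignoring lines inside code fences.
function buildToc(markdown: string): TocEntry[] {
  const toc: TocEntry[] = [];
  let inFence = false;
  for (const line of splitFrontmatter(markdown).body.split("\n")) {
    if (FENCE_RE.test(line)) {
      inFence = !inFence;
      continue;
    }
    const m = inFence ? null : HEADING_RE.exec(line);
    if (m) toc.push({ level: m[1].length, text: m[2], slug: slugify(m[2]) });
  }
  return toc;
}
```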
+ ### `fetch-urls` (Batch)
+
+ Fetches multiple URLs in parallel with concurrency control. Ideal for comparing content or processing multiple pages efficiently.
+
+ | Parameter | Type | Default | Description |
+ | -------------------- | -------- | ---------- | -------------------------------------------- |
+ | `urls` | string[] | _required_ | Array of URLs to fetch (1-10 URLs) |
+ | `extractMainContent` | boolean | `true` | Use Readability to extract main content |
+ | `includeMetadata` | boolean | `true` | Include page metadata |
+ | `maxContentLength` | number | – | Maximum content length per URL in characters |
+ | `format` | string | `'jsonl'` | Output format: `'jsonl'` or `'markdown'` |
+ | `concurrency` | number | `3` | Maximum concurrent requests (1-5) |
+ | `continueOnError` | boolean | `true` | Continue processing if some URLs fail |
+ | `customHeaders` | object | – | Custom HTTP headers for all requests |
+ | `timeout` | number | `30000` | Request timeout in milliseconds (1000-60000) |
+ | `retries` | number | `3` | Number of retry attempts (1-10) |
+
+ **Example Response:**
+
+ ```json
+ {
+   "results": [
+     {
+       "url": "https://example.com",
+       "success": true,
+       "title": "Example",
+       "content": "...",
+       "cached": false
+     },
+     {
+       "url": "https://example.org",
+       "success": true,
+       "title": "Example Org",
+       "content": "...",
+       "cached": false
+     }
+   ],
+   "summary": {
+     "total": 2,
+     "successful": 2,
+     "failed": 0,
+     "cached": 0,
+     "totalContentBlocks": 15
+   },
+   "fetchedAt": "2025-12-11T10:30:00.000Z"
+ }
+ ```
+
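The `concurrency` and `continueOnError` options describe a bounded worker pool. The following is a minimal sketch. `fetchOne` is a hypothetical stand-in for a single-URL fetch, injected so the pool logic can be tested in isolation. The sketch assumes that with `continueOnError: false` no new URLs are started after the first failure; superFetch may instead reject the whole batch.

```typescript
interface BatchResult<T> {
  url: string;
  success: boolean;
  value?: T;
  error?: string;
}

// Run fetchOne over urls with at most `concurrency` calls in flight.
async function fetchAll<T>(
  urls: string[],
  fetchOne: (url: string) => Promise<T>,
  opts: { concurrency?: number; continueOnError?: boolean } = {},
): Promise<BatchResult<T>[]> {
  const concurrency = Math.min(Math.max(opts.concurrency ?? 3, 1), 5); // documented range 1-5
  const continueOnError = opts.continueOnError ?? true;
  const results: BatchResult<T>[] = new Array(urls.length);
  let next = 0;
  let stopped = false;

  // Each worker claims the next unstarted index until none remain.
  const worker = async (): Promise<void> => {
    while (!stopped && next < urls.length) {
      const i = next++;
      try {
        results[i] = { url: urls[i], success: true, value: await fetchOne(urls[i]) };
      } catch (err) {
        results[i] = { url: urls[i], success: false, error: err instanceof Error ? err.message : String(err) };
        if (!continueOnError) stopped = true; // assumption: stop starting new URLs
      }
    }
  };

  await Promise.all(Array.from({ length: Math.min(concurrency, urls.length) }, () => worker()));
  return results.filter(Boolean); // drop slots that were never started after a stop
}

// Counts matching part of the example's `summary` (cache fields omitted).
function summarize<T>(results: BatchResult<T>[]) {
  const successful = results.filter((r) => r.success).length;
  return { total: results.length, successful, failed: results.length - successful };
}
```

Results keep the input order regardless of which request finishes first, because each worker writes to the index it claimed.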
+ ### Resources
+
+ | URI | Description |
+ | --------------------- | --------------------------------------------------- |
+ | `superfetch://stats` | Server statistics and cache metrics |
+ | `superfetch://health` | Real-time server health and dependency status |
+ | Dynamic resources | Cached content available via resource subscriptions |
+
+ ### Prompts
+
+ - **`analyze-web-content`**: Analyze fetched content with an optional focus area
+ - **`summarize-page`**: Fetch and summarize a webpage concisely
+ - **`extract-data`**: Extract structured data from a webpage
+
+ ---
+
+ ## Configuration
+
+ ### Alternative MCP Client Setups
+
+ <details>
+ <summary><strong>VS Code (HTTP mode)</strong>: requires running the server separately</summary>
+
+ First, start the HTTP server:
+
+ ```bash
+ npx -y @j0hanz/superfetch@latest
+ ```
+
+ Then add to `.vscode/mcp.json`:
+
+ ```json
+ {
+   "servers": {
+     "superFetch": {
+       "type": "http",
+       "url": "http://127.0.0.1:3000/mcp"
+     }
+   }
+ }
+ ```
+
+ </details>
+
+ <details>
+ <summary><strong>Claude Desktop (local path)</strong>: for development</summary>
+
+ ```json
+ {
+   "mcpServers": {
+     "superFetch": {
+       "command": "node",
+       "args": ["/path/to/super-fetch-mcp-server/dist/index.js", "--stdio"]
+     }
+   }
+ }
+ ```
+
+ </details>
+
+ ### Environment Variables
+
+ | Variable | Default | Description |
+ | ---------------------- | -------------------- | ------------------------------- |
+ | `PORT` | `3000` | HTTP server port |
+ | `HOST` | `127.0.0.1` | HTTP server host |
+ | `FETCH_TIMEOUT` | `30000` | Request timeout (ms) |
+ | `MAX_REDIRECTS` | `5` | Maximum HTTP redirects |
+ | `USER_AGENT` | `superFetch-MCP/1.0` | HTTP User-Agent |
+ | `MAX_CONTENT_LENGTH` | `10485760` | Max response size (bytes) |
+ | `CACHE_ENABLED` | `true` | Enable response caching |
+ | `CACHE_TTL` | `3600` | Cache TTL (seconds) |
+ | `CACHE_MAX_KEYS` | `100` | Maximum cache entries |
+ | `LOG_LEVEL` | `info` | Logging level |
+ | `ENABLE_LOGGING` | `true` | Enable/disable logging |
+ | `EXTRACT_MAIN_CONTENT` | `true` | Extract main content by default |
+ | `INCLUDE_METADATA` | `true` | Include metadata by default |
+ | `MAX_BLOCK_LENGTH` | `5000` | Maximum block length |
+ | `MIN_PARAGRAPH_LENGTH` | `10` | Minimum paragraph length |
+
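Here is a sketch of how these variables could be read into a typed config using the defaults from the table. `loadConfig` is hypothetical, and superFetch's own config module may validate values differently. In particular, two behaviours are assumptions: treating `false` and `0` as false, and falling back to the default for non-numeric values.

```typescript
type Env = Record<string, string | undefined>;

// Integer variable; falls back to the default when unset or not a number.
function intVar(env: Env, name: string, fallback: number): number {
  const n = Number.parseInt(env[name] ?? "", 10);
  return Number.isNaN(n) ? fallback : n;
}

// Boolean variable; "false" and "0" (any case) disable, anything else enables.
function boolVar(env: Env, name: string, fallback: boolean): boolean {
  const raw = env[name]?.trim().toLowerCase();
  if (!raw) return fallback;
  return raw !== "false" && raw !== "0";
}

// Defaults copied from the Environment Variables table.
function loadConfig(env: Env) {
  return {
    port: intVar(env, "PORT", 3000),
    host: env.HOST || "127.0.0.1",
    fetchTimeoutMs: intVar(env, "FETCH_TIMEOUT", 30000),
    maxRedirects: intVar(env, "MAX_REDIRECTS", 5),
    userAgent: env.USER_AGENT || "superFetch-MCP/1.0",
    maxContentLengthBytes: intVar(env, "MAX_CONTENT_LENGTH", 10485760),
    cache: {
      enabled: boolVar(env, "CACHE_ENABLED", true),
      ttlSeconds: intVar(env, "CACHE_TTL", 3600),
      maxKeys: intVar(env, "CACHE_MAX_KEYS", 100),
    },
    logLevel: env.LOG_LEVEL || "info",
    loggingEnabled: boolVar(env, "ENABLE_LOGGING", true),
    extractMainContent: boolVar(env, "EXTRACT_MAIN_CONTENT", true),
    includeMetadata: boolVar(env, "INCLUDE_METADATA", true),
    maxBlockLength: intVar(env, "MAX_BLOCK_LENGTH", 5000),
    minParagraphLength: intVar(env, "MIN_PARAGRAPH_LENGTH", 10),
  };
}
```

In a Node process this would be called as `loadConfig(process.env)`. Keeping `env` as a parameter makes the defaults easy to test.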
+ ---
+
+ ## Content Block Types
+
+ JSONL output includes semantic content blocks:
+
+ | Type | Description |
+ | ----------- | ----------------------------------------------- |
+ | `metadata` | Page title, description, author, URL, timestamp |
+ | `heading` | Headings (h1-h6) with level indicator |
+ | `paragraph` | Text paragraphs |
+ | `list` | Ordered/unordered lists |
+ | `code` | Code blocks with language |
+ | `table` | Tables with headers and rows |
+ | `image` | Images with src and alt text |
+
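RAG pipelines often need the blocks chunked by section. This hypothetical `sectionize` sketch groups blocks under the nearest heading and records the heading path, for example `A / C` for an h3 under an h1. It relies only on `type`, and on `level` and `text` for headings. Every other block field is passed through untouched. As a design choice of the sketch, `metadata` blocks are dropped.

```typescript
interface ContentBlock {
  type: string;
  [key: string]: unknown;
}
interface Section {
  path: string[]; // heading texts from outermost to innermost
  blocks: ContentBlock[];
}

// Group blocks under the most recent heading, tracking the h1..h6 hierarchy.
function sectionize(blocks: ContentBlock[]): Section[] {
  const sections: Section[] = [];
  const path: string[] = []; // path[i] holds the current heading text at level i + 1
  let current: Section = { path: [], blocks: [] };

  for (const block of blocks) {
    if (block.type === "heading") {
      if (current.blocks.length > 0) sections.push(current); // empty sections are skipped
      const level = Math.min(Math.max(Number(block.level) || 1, 1), 6);
      path.length = level - 1; // forget same-level and deeper headings
      path[level - 1] = String(block.text ?? "");
      // filter() skips holes left by skipped levels (e.g. h1 followed directly by h3)
      current = { path: path.filter(() => true), blocks: [] };
    } else if (block.type !== "metadata") {
      current.blocks.push(block);
    }
  }
  if (current.blocks.length > 0) sections.push(current);
  return sections;
}
```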
+ ---
+
+ ## Security
+
+ ### SSRF Protection
+
+ Blocked destinations:
+
+ - Localhost and loopback addresses
+ - Private IP ranges (`10.x.x.x`, `172.16-31.x.x`, `192.168.x.x`)
+ - Cloud metadata endpoints (AWS, GCP, Azure)
+ - IPv6 link-local and unique local addresses
+
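The sketch below shows the address checks listed above, using Node's `net.isIP`. It is not superFetch's implementation and only inspects literal IPs and `localhost`. A complete guard must also:

- resolve hostnames and check every resolved address, again on each redirect;
- block metadata hostnames such as `metadata.google.internal` by name.

```typescript
import { isIP } from "node:net";

// IPv4 ranges from the list above, plus 0.0.0.0/8.
function isBlockedIPv4(ip: string): boolean {
  const [a, b] = ip.split(".").map(Number);
  return (
    a === 0 || // 0.0.0.0/8 ("this" network)
    a === 127 || // loopback
    a === 10 || // 10.0.0.0/8
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168) || // 192.168.0.0/16
    (a === 169 && b === 254) // link-local, incl. 169.254.169.254 cloud metadata
  );
}

function isBlockedIPv6(ip: string): boolean {
  const addr = ip.toLowerCase();
  if (addr === "::1" || addr === "::") return true; // loopback / unspecified
  // IPv4-mapped forms: ::ffff:127.0.0.1 and the hex form ::ffff:7f00:1
  const dotted = /^::ffff:(\d+\.\d+\.\d+\.\d+)$/.exec(addr);
  if (dotted) return isBlockedIPv4(dotted[1]);
  const hex = /^::ffff:([0-9a-f]{1,4}):([0-9a-f]{1,4})$/.exec(addr);
  if (hex) {
    const hi = parseInt(hex[1], 16);
    const lo = parseInt(hex[2], 16);
    return isBlockedIPv4(`${hi >> 8}.${hi & 255}.${lo >> 8}.${lo & 255}`);
  }
  const first = parseInt(addr.split(":")[0] || "0", 16);
  return (first & 0xfe00) === 0xfc00 || (first & 0xffc0) === 0xfe80; // fc00::/7, fe80::/10
}

// Takes `new URL(...).hostname`, which keeps the brackets around IPv6 literals.
function isBlockedHost(hostname: string): boolean {
  const host = hostname.toLowerCase().replace(/^\[|\]$/g, "");
  if (host === "localhost" || host.endsWith(".localhost")) return true;
  switch (isIP(host)) {
    case 4:
      return isBlockedIPv4(host);
    case 6:
      return isBlockedIPv6(host);
    default:
      return false; // DNS names still need resolving and re-checking
  }
}
```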
+ ### Header Sanitization
+
+ Blocked headers: `host`, `authorization`, `cookie`, `x-forwarded-for`, `x-real-ip`, `proxy-authorization`
+
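HTTP header names are case-insensitive, so a sanitizer has to normalize names before comparing them. This hypothetical `sanitizeHeaders` sketch drops the blocked names listed above. This README does not say whether superFetch silently drops or rejects such headers. The CR/LF check is an extra assumption that guards against header injection.

```typescript
// Blocked names from this README, stored lowercase for case-insensitive matching.
const BLOCKED_HEADERS = new Set([
  "host",
  "authorization",
  "cookie",
  "x-forwarded-for",
  "x-real-ip",
  "proxy-authorization",
]);

// Return a copy of customHeaders without blocked names or CR/LF-bearing values.
function sanitizeHeaders(customHeaders: Record<string, string>): Record<string, string> {
  const safe: Record<string, string> = {};
  for (const [name, value] of Object.entries(customHeaders)) {
    if (BLOCKED_HEADERS.has(name.trim().toLowerCase())) continue;
    if (/[\r\n]/.test(value)) continue; // assumption: reject header-injection attempts
    safe[name] = value;
  }
  return safe;
}
```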
+ ### Rate Limiting
+
+ Default: **100 requests/minute** per IP (configurable)
+
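This README does not say which algorithm enforces the limit. As an illustration only, here is a fixed-window counter per IP with the same default of 100 requests per 60 s. `FixedWindowLimiter` is a hypothetical name, and the clock is injectable for testing.

```typescript
// Fixed-window counter per client IP. Only the 100 requests/minute default
// comes from this README; the algorithm itself is an assumption.
class FixedWindowLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(limit = 100, windowMs = 60_000, now: () => number = () => Date.now()) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now;
  }

  // true = serve the request; false = reject it (typically with HTTP 429).
  allow(ip: string): boolean {
    const t = this.now();
    const w = this.windows.get(ip);
    if (!w || t - w.start >= this.windowMs) {
      // Start a fresh window. A long-running server should also evict stale entries.
      this.windows.set(ip, { start: t, count: 1 });
      return true;
    }
    if (w.count >= this.limit) return false;
    w.count++;
    return true;
  }
}
```

A fixed window is the simplest option but lets a client send up to twice the limit across a window boundary. A sliding window or token bucket smooths that out.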
+ ### HTTP Mode Endpoints
+
+ When running without `--stdio`, the following endpoints are available:
+
+ | Endpoint | Method | Description |
+ | --------- | ------ | --------------------------------------- |
+ | `/health` | GET | Health check with uptime and version |
+ | `/mcp` | POST | MCP request handling (requires session) |
+ | `/mcp` | GET | SSE stream for notifications |
+ | `/mcp` | DELETE | Close session |
+
+ Sessions are identified by the `mcp-session-id` header and expire after a 30-minute TTL.
+
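The sketch below walks through the session flow against the HTTP endpoint using Node's built-in `fetch` (Node 18+). The method names (`initialize`, `notifications/initialized`, `tools/call`) and the `Accept` header come from the MCP Streamable HTTP specification. The client name, version, and protocol version string are placeholder assumptions. The server may answer with JSON or with an SSE stream, so the sketch returns the raw body instead of parsing it. Resources such as `superfetch://stats` can be read the same way, using the `resources/read` method with `{ uri }` params.

```typescript
const MCP_URL = "http://127.0.0.1:3000/mcp"; // default HOST and PORT from this README

// JSON-RPC 2.0 envelope; notifications (id === null) carry no id.
function jsonRpc(id: number | null, method: string, params: object) {
  return id === null ? { jsonrpc: "2.0", method, params } : { jsonrpc: "2.0", id, method, params };
}

// Streamable HTTP clients must accept both JSON and SSE responses.
function mcpHeaders(sessionId?: string): Record<string, string> {
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
    Accept: "application/json, text/event-stream",
  };
  if (sessionId) headers["mcp-session-id"] = sessionId;
  return headers;
}

async function post(body: object, sessionId?: string): Promise<{ sessionId?: string; text: string }> {
  const res = await fetch(MCP_URL, { method: "POST", headers: mcpHeaders(sessionId), body: JSON.stringify(body) });
  // The server assigns the session id in its response to `initialize`.
  return { sessionId: res.headers.get("mcp-session-id") ?? sessionId, text: await res.text() };
}

// initialize -> notifications/initialized -> tools/call, reusing one session.
async function fetchUrlOverHttp(url: string): Promise<string> {
  const init = await post(
    jsonRpc(1, "initialize", {
      protocolVersion: "2025-03-26", // assumption: any version the server supports
      capabilities: {},
      clientInfo: { name: "example-client", version: "0.0.0" },
    }),
  );
  await post(jsonRpc(null, "notifications/initialized", {}), init.sessionId);
  const call = await post(jsonRpc(2, "tools/call", { name: "fetch-url", arguments: { url } }), init.sessionId);
  return call.text; // JSON or SSE text, depending on the server's choice
}
```

With the server running in HTTP mode, `fetchUrlOverHttp("https://example.com").then(console.log)` would print the raw tool response.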
+ ---
+
+ ## Development
+
+ ### Scripts
+
+ | Command | Description |
+ | -------------------- | ---------------------------------- |
+ | `npm run dev` | Development server with hot reload |
+ | `npm run build` | Compile TypeScript |
+ | `npm start` | Production server |
+ | `npm run lint` | Run ESLint |
+ | `npm run type-check` | TypeScript type checking |
+ | `npm run format` | Format with Prettier |
+ | `npm run release` | Create new release |
+ | `npm run knip` | Find unused exports/dependencies |
+ | `npm run knip:fix` | Auto-fix unused code |
+
+ ### Tech Stack
+
+ | Category | Technology |
+ | ------------------ | --------------------------------- |
+ | Runtime | Node.js ≥20.0.0 |
+ | Language | TypeScript 5.9 |
+ | MCP SDK | @modelcontextprotocol/sdk ^1.24.3 |
+ | Content Extraction | @mozilla/readability ^0.6.0 |
+ | HTML Parsing | Cheerio ^1.1.2, JSDOM ^27.3.0 |
+ | Markdown | Turndown ^7.2.2 |
+ | HTTP | Express ^5.2.1, Axios ^1.13.2 |
+ | Caching | node-cache ^5.1.2 |
+ | Validation | Zod ^3.25.76 |
+ | Logging | Winston ^3.19.0 |
+
+ ---
+
+ ## Contributing
+
+ 1. Fork the repository
+ 2. Create a feature branch: `git checkout -b feature/amazing-feature`
+ 3. Ensure linting passes: `npm run lint`
+ 4. Commit changes: `git commit -m 'Add amazing feature'`
+ 5. Push: `git push origin feature/amazing-feature`
+ 6. Open a Pull Request
+
+ For examples of other MCP servers, see [github.com/modelcontextprotocol/servers](https://github.com/modelcontextprotocol/servers).