@j0hanz/superfetch 2.2.0 → 2.2.2

Files changed (68)
  1. package/README.md +363 -614
  2. package/dist/cache.d.ts +2 -2
  3. package/dist/cache.d.ts.map +1 -1
  4. package/dist/cache.js +49 -227
  5. package/dist/cache.js.map +1 -1
  6. package/dist/config.d.ts +6 -0
  7. package/dist/config.d.ts.map +1 -1
  8. package/dist/config.js +20 -27
  9. package/dist/config.js.map +1 -1
  10. package/dist/dom-noise-removal.d.ts +6 -0
  11. package/dist/dom-noise-removal.d.ts.map +1 -0
  12. package/dist/dom-noise-removal.js +482 -0
  13. package/dist/dom-noise-removal.js.map +1 -0
  14. package/dist/errors.d.ts.map +1 -1
  15. package/dist/errors.js +8 -5
  16. package/dist/errors.js.map +1 -1
  17. package/dist/fetch.d.ts.map +1 -1
  18. package/dist/fetch.js +26 -32
  19. package/dist/fetch.js.map +1 -1
  20. package/dist/http-native.d.ts +6 -0
  21. package/dist/http-native.d.ts.map +1 -0
  22. package/dist/http-native.js +645 -0
  23. package/dist/http-native.js.map +1 -0
  24. package/dist/http-utils.d.ts +61 -0
  25. package/dist/http-utils.d.ts.map +1 -0
  26. package/dist/http-utils.js +252 -0
  27. package/dist/http-utils.js.map +1 -0
  28. package/dist/index.js +1 -1
  29. package/dist/index.js.map +1 -1
  30. package/dist/instructions.md +41 -39
  31. package/dist/json.d.ts +2 -0
  32. package/dist/json.d.ts.map +1 -0
  33. package/dist/json.js +30 -0
  34. package/dist/json.js.map +1 -0
  35. package/dist/language-detection.d.ts +13 -0
  36. package/dist/language-detection.d.ts.map +1 -0
  37. package/dist/language-detection.js +283 -0
  38. package/dist/language-detection.js.map +1 -0
  39. package/dist/markdown-cleanup.d.ts +19 -0
  40. package/dist/markdown-cleanup.d.ts.map +1 -0
  41. package/dist/markdown-cleanup.js +283 -0
  42. package/dist/markdown-cleanup.js.map +1 -0
  43. package/dist/observability.d.ts +1 -0
  44. package/dist/observability.d.ts.map +1 -1
  45. package/dist/observability.js +10 -0
  46. package/dist/observability.js.map +1 -1
  47. package/dist/tools.d.ts.map +1 -1
  48. package/dist/tools.js +23 -8
  49. package/dist/tools.js.map +1 -1
  50. package/dist/transform-types.d.ts +81 -0
  51. package/dist/transform-types.d.ts.map +1 -0
  52. package/dist/transform-types.js +6 -0
  53. package/dist/transform-types.js.map +1 -0
  54. package/dist/transform.d.ts +8 -52
  55. package/dist/transform.d.ts.map +1 -1
  56. package/dist/transform.js +419 -825
  57. package/dist/transform.js.map +1 -1
  58. package/dist/type-guards.d.ts +1 -1
  59. package/dist/type-guards.d.ts.map +1 -1
  60. package/dist/type-guards.js +1 -1
  61. package/dist/type-guards.js.map +1 -1
  62. package/dist/workers/transform-worker.js +23 -24
  63. package/dist/workers/transform-worker.js.map +1 -1
  64. package/package.json +85 -86
  65. package/dist/http.d.ts +0 -90
  66. package/dist/http.d.ts.map +0 -1
  67. package/dist/http.js +0 -1576
  68. package/dist/http.js.map +0 -1
package/README.md CHANGED
@@ -1,614 +1,363 @@
1
- # superFetch MCP Server
2
-
3
- <!-- markdownlint-disable MD033 -->
4
-
5
- <img src="docs/logo.png" alt="SuperFetch MCP Logo" width="200">
6
-
7
- [![npm version](https://img.shields.io/npm/v/@j0hanz/superfetch.svg)](https://www.npmjs.com/package/@j0hanz/superfetch) [![Node.js](https://img.shields.io/badge/Node.js-%3E=20.18.1-339933?logo=nodedotjs&logoColor=white)](https://nodejs.org/) [![TypeScript](https://img.shields.io/badge/TypeScript-5.9-3178C6?logo=typescript&logoColor=white)](https://www.typescriptlang.org/)
8
-
9
- ## One-Click Install
10
-
11
- [![Install with NPX in VS Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://insiders.vscode.dev/redirect/mcp/install?name=superfetch&inputs=%5B%5D&config=%7B%22command%22%3A%22npx%22%2C%22args%22%3A%5B%22-y%22%2C%22%40j0hanz%2Fsuperfetch%40latest%22%2C%22--stdio%22%5D%7D) [![Install with NPX in VS Code Insiders](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://insiders.vscode.dev/redirect/mcp/install?name=superfetch&inputs=%5B%5D&config=%7B%22command%22%3A%22npx%22%2C%22args%22%3A%5B%22-y%22%2C%22%40j0hanz%2Fsuperfetch%40latest%22%2C%22--stdio%22%5D%7D&quality=insiders)
12
-
13
- [![Install in Cursor](https://cursor.com/deeplink/mcp-install-dark.svg)](https://cursor.com/install-mcp?name=superfetch&config=eyJjb21tYW5kIjoibnB4IiwiYXJncyI6WyIteSIsIkBqMGhhbnovc3VwZXJmZXRjaEBsYXRlc3QiLCItLXN0ZGlvIl19)
14
-
15
- A [Model Context Protocol](https://modelcontextprotocol.io/) (MCP) server that fetches web pages, extracts readable content with Mozilla Readability, and returns AI-friendly Markdown.
16
-
17
- Built for AI workflows that need _clean text_, _stable metadata_, and _safe-by-default fetching_.
18
-
19
- **Great for:** _LLM summarization_, _context retrieval_, _knowledge base ingestion_, and _AI agents_.
20
-
21
- _|_ [Quick Start](#quick-start) _|_ [Tool](#available-tools) _|_ [Resources](#resources) _|_ [Configuration](#configuration) _|_ [Security](#security) _|_ [Development](#development) _|_
22
-
23
- ---
24
-
25
- > [!CAUTION]
26
- > This server can access URLs on behalf of AI assistants. Built-in SSRF protection blocks private IP ranges and cloud metadata endpoints, but exercise caution when deploying in sensitive environments.
27
-
28
- ## Features
29
-
30
- - **Cleaner outputs for LLMs**: Readability extraction with quality gates (content ratio + heading retention ≥ 70%)
31
- - **Markdown that’s easy to consume**: metadata footer for HTML + configurable source injection for raw Markdown (markdown or frontmatter)
32
- - **Handles “raw content” sources**: preserves markdown/text; rewrites GitHub/GitLab/Bitbucket/Gist URLs to raw
33
- - **Works for both local and hosted setups**:
34
- - **Stdio mode**: best for MCP clients (VS Code / Claude Desktop / Cursor)
35
- - **HTTP mode**: best for self-hosting (auth, sessions, rate limiting, Host/Origin validation)
36
- - **Fast and resilient**: redirect validation, timeouts, and response size limits
37
- - **Security-first defaults**: URL validation + SSRF/DNS/IP blocklists (blocks private ranges and cloud metadata endpoints)
38
-
39
- **You get, in one tool call:**
40
-
41
- - **Clean, readable Markdown** from any public URL (docs, articles, blogs, wikis)
42
-
43
- If you’re comparing “just call `fetch()`” vs superFetch: superFetch focuses on extracting the main content in a readable format for LLMs and humans. When the requested URL is fetched, it returns clean, structured Markdown that can also be saved as a resource for later use.
44
-
45
- ## What it is (and isn’t)
46
-
47
- - **It is** a content extraction tool: focuses on extracting readable content, not screenshots or full-page data.
48
- - **It is** an MCP server: integrates with any MCP-compatible client (Claude Desktop, VS Code, Cursor, Cline, Windsurf, Codex, etc).
49
- - **It isn’t** a general web scraper: it extracts main content, not all page elements.
50
- - **It isn’t** a browser: it doesn’t execute JavaScript or render pages.
51
- - **It’s opinionated on safety**: blocks private/internal URLs and cloud metadata endpoints by default.
52
-
53
- ---
54
-
55
- ## Quick Start
56
-
57
- Recommended: use **stdio mode** with your MCP client (no HTTP server).
58
-
59
- ### Try it in 60 seconds
60
-
61
- 1. Add the MCP server config (below)
62
- 2. Restart your MCP client
63
- 3. Call the `fetch-url` tool with any public URL
64
-
65
- ### What the tool returns
66
-
67
- You’ll get `structuredContent` with `url`, `resolvedUrl`, optional `title`, and `markdown` (plus a `superfetch://cache/...` resource link when cache is enabled and content is large).
68
-
69
- ### Claude Desktop
70
-
71
- Add to your `claude_desktop_config.json`:
72
-
73
- ```json
74
- {
75
- "mcpServers": {
76
- "superFetch": {
77
- "command": "npx",
78
- "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"]
79
- }
80
- }
81
- }
82
- ```
83
-
84
- ### VS Code
85
-
86
- Add to `.vscode/mcp.json` in your workspace:
87
-
88
- ```json
89
- {
90
- "servers": {
91
- "superFetch": {
92
- "command": "npx",
93
- "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"]
94
- }
95
- }
96
- }
97
- ```
98
-
99
- ### With Custom Configuration
100
-
101
- Add environment variables in your MCP client config under `env`.
102
- See [Configuration](#configuration) or `CONFIGURATION.md` for all available options and presets.
103
-
104
- ### Example output (trimmed)
105
-
106
- ```json
107
- {
108
- "url": "https://example.com/docs",
109
- "inputUrl": "https://example.com/docs",
110
- "resolvedUrl": "https://example.com/docs",
111
- "title": "Documentation",
112
- "markdown": "# Getting Started\n\n...\n\n---\n\n _Documentation_ | [_Original Source_](https://example.com/docs) | _12-01-2026_"
113
- }
114
- ```
115
-
116
- > **Tip (Windows):** If you encounter issues, try: `cmd /c "npx -y @j0hanz/superfetch@latest --stdio"`
117
-
118
- <details>
119
- <summary><strong>Other clients (Cursor, Cline, Windsurf, Codex)</strong></summary>
120
-
121
- ### Cursor
122
-
123
- 1. Open Cursor Settings
124
- 2. Go to **Features > MCP Servers**
125
- 3. Click **"+ Add new global MCP server"**
126
- 4. Add this configuration:
127
-
128
- ```json
129
- {
130
- "mcpServers": {
131
- "superFetch": {
132
- "command": "npx",
133
- "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"]
134
- }
135
- }
136
- }
137
- ```
138
-
139
- <details>
140
- <summary><strong>Codex IDE</strong></summary>
141
-
142
- Add to your `~/.codex/config.toml` file:
143
-
144
- **Basic Configuration:**
145
-
146
- ```toml
147
- [mcp_servers.superfetch]
148
- command = "npx"
149
- args = ["-y", "@j0hanz/superfetch@latest", "--stdio"]
150
- ```
151
-
152
- **With Environment Variables:** See `CONFIGURATION.md` for examples.
153
-
154
- > **Access config file:** Click the gear icon -> "Codex Settings > Open config.toml"
155
- >
156
- > **Documentation:** [Codex MCP Guide](https://codex.com/docs/mcp)
157
-
158
- </details>
159
-
160
- <details>
161
- <summary><strong>Cline (VS Code Extension)</strong></summary>
162
-
163
- Open the Cline MCP settings file:
164
-
165
- **macOS:**
166
-
167
- ```bash
168
- code ~/Library/Application\ Support/Code/User/globalStorage/saoudrizwan.claude-dev/settings/cline_mcp_settings.json
169
- ```
170
-
171
- **Windows:**
172
-
173
- ```bash
174
- code %APPDATA%\Code\User\globalStorage\saoudrizwan.claude-dev\settings\cline_mcp_settings.json
175
- ```
176
-
177
- Add the configuration:
178
-
179
- ```json
180
- {
181
- "mcpServers": {
182
- "superFetch": {
183
- "command": "npx",
184
- "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"],
185
- "disabled": false,
186
- "autoApprove": []
187
- }
188
- }
189
- }
190
- ```
191
-
192
- </details>
193
-
194
- <details>
195
- <summary><strong>Windsurf</strong></summary>
196
-
197
- Add to `./codeium/windsurf/model_config.json`:
198
-
199
- ```json
200
- {
201
- "mcpServers": {
202
- "superFetch": {
203
- "command": "npx",
204
- "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"]
205
- }
206
- }
207
- }
208
- ```
209
-
210
- </details>
211
-
212
- <details>
213
- <summary><strong>Claude Desktop (Config File Locations)</strong></summary>
214
-
215
- **macOS:**
216
-
217
- ```bash
218
- # Open config file
219
- open -e "$HOME/Library/Application Support/Claude/claude_desktop_config.json"
220
-
221
- # Or with VS Code
222
- code "$HOME/Library/Application Support/Claude/claude_desktop_config.json"
223
- ```
224
-
225
- **Windows:**
226
-
227
- ```bash
228
- code %APPDATA%\Claude\claude_desktop_config.json
229
- ```
230
-
231
- </details>
232
-
233
- </details>
234
-
235
- ---
236
-
237
- ## Use cases
238
-
239
- ### 1) Turn a docs page into “LLM-ready” Markdown
240
-
241
- - Call `fetch-url` with the docs URL
242
- - Feed the returned `markdown` into your summarizer / chunker
243
- - Use the metadata footer fields (especially **Original Source**) for citations
244
-
245
- ### 2) Fetch a GitHub/GitLab/Bitbucket file as raw markdown
246
-
247
- - Pass the normal “web UI” URL to `fetch-url`
248
- - superFetch will rewrite it to the raw content URL when possible
249
- - This avoids navigation UI and reduces boilerplate
250
-
251
- ### 3) Large pages: keep responses stable with cache resources
252
-
253
- - When content is large, the tool can include a `superfetch://cache/...` resource link
254
- - In MCP clients that support resources, you can read the full content via the resource URI
255
- - In HTTP mode, you can also download cached content via `/mcp/downloads/:namespace/:hash` when cache is enabled
256
-
257
- ### 4) Safe-by-default web access for agents
258
-
259
- - superFetch blocks private IP ranges and common cloud metadata endpoints
260
- - If your agent needs internal access, this is intentionally not supported by default (see Security)
261
-
262
- ---
263
-
264
- ## Installation (Alternative)
265
-
266
- ### Global Installation
267
-
268
- ```bash
269
- npm install -g @j0hanz/superfetch
270
-
271
- # Run in stdio mode
272
- superfetch --stdio
273
-
274
- # Run HTTP server (requires auth token)
275
- superfetch
276
- ```
277
-
278
- ### From Source
279
-
280
- ```bash
281
- git clone https://github.com/j0hanz/super-fetch-mcp-server.git
282
- cd super-fetch-mcp-server
283
- npm install
284
- npm run build
285
- ```
286
-
287
- ### Running the Server
288
-
289
- <details>
290
- <summary><strong>stdio Mode</strong> (direct MCP integration)</summary>
291
-
292
- ```bash
293
- node dist/index.js --stdio
294
- ```
295
-
296
- </details>
297
-
298
- <details>
299
- <summary><strong>HTTP Mode</strong> (default)</summary>
300
-
301
- HTTP mode requires authentication. By default it binds to `127.0.0.1`. Non-loopback `HOST` values require `ALLOW_REMOTE=true`. To listen on all interfaces, set `HOST=0.0.0.0` or `HOST=::`, set `ALLOW_REMOTE=true`, and configure OAuth (remote bindings require OAuth).
302
-
303
- ```bash
304
- API_KEY=supersecret npx -y @j0hanz/superfetch@latest
305
- # Server runs at http://127.0.0.1:3000
306
- ```
307
-
308
- **Windows (PowerShell):**
309
-
310
- ```powershell
311
- $env:API_KEY = "supersecret"
312
- npx -y @j0hanz/superfetch@latest
313
- ```
314
-
315
- For multiple static tokens, set `ACCESS_TOKENS` (comma/space separated).
316
-
317
- Auth is required for `/mcp` and `/mcp/downloads` via `Authorization: Bearer <token>` (static mode also accepts `X-API-Key`).
318
-
319
- Endpoints:
320
-
321
- - `GET /health` (no auth; returns status, name, version, uptime)
322
- - `POST /mcp` (auth required)
323
- - `GET /mcp` (auth required; SSE stream; requires `Accept: text/event-stream`)
324
- - `DELETE /mcp` (auth required)
325
- - `GET /mcp/downloads/:namespace/:hash` (auth required)
326
-
327
- Sessions are managed via the `mcp-session-id` header (see [HTTP Mode Details](#http-mode-details)).
328
-
329
- </details>
330
-
331
- ---
332
-
333
- ## Available Tools
334
-
335
- ### Tool Response Notes
336
-
337
- The tool returns `structuredContent` with `url`, `inputUrl`, `resolvedUrl`, optional `title`, and `markdown` when inline content is available. `resolvedUrl` may differ from `inputUrl` when the URL is rewritten to raw content (GitHub/GitLab/Bitbucket/Gist). On errors, `error` is included instead of content.
338
-
339
- The response includes:
340
-
341
- - a `text` block containing JSON of `structuredContent`
342
- - a `resource` block embedding markdown when inline content is available (stdio always embeds full markdown; HTTP embeds inline markdown when it fits or when truncated)
343
- - when content exceeds the inline limit and cache is enabled, a `resource_link` block pointing to `superfetch://cache/...` (stdio mode still embeds full markdown; HTTP mode omits embedded markdown)
344
- - error responses set `isError: true` and return `structuredContent` with `error` and `url`
345
-
346
- ---
347
-
348
- ### `fetch-url`
349
-
350
- Fetches a webpage and converts it to clean Markdown format with a metadata footer for HTML (raw markdown is preserved with source injection).
351
-
352
- | Parameter | Type | Default | Description |
353
- | --------- | ------ | -------- | ------------ |
354
- | `url` | string | required | URL to fetch |
355
-
356
- **Example `structuredContent`:**
357
-
358
- ```json
359
- {
360
- "url": "https://example.com/docs",
361
- "inputUrl": "https://example.com/docs",
362
- "resolvedUrl": "https://example.com/docs",
363
- "title": "Documentation",
364
- "markdown": "---\ntitle: Documentation\n---\n\n# Getting Started\n\nWelcome..."
365
- }
366
- ```
367
-
368
- **Error response:**
369
-
370
- ```json
371
- {
372
- "url": "https://example.com/broken",
373
- "error": "Failed to fetch: 404 Not Found"
374
- }
375
- ```
376
-
377
- ---
378
-
379
- ### Large Content Handling
380
-
381
- - Inline markdown is capped at 20,000 characters (`maxInlineContentChars`).
382
- - **Stdio mode:** full markdown is embedded as a `resource` block; if cache is enabled and content exceeds the inline limit, a `resource_link` is still included.
383
- - **HTTP mode:** if content exceeds the inline limit and cache is enabled, the response includes a `resource_link` to `superfetch://cache/...` and omits embedded markdown. If cache is disabled, the inline markdown is truncated with `...[truncated]`.
384
- - Upstream fetch size is capped at 10 MB of HTML; larger responses fail.
385
-
386
- ---
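The HTTP-mode rules in the removed "Large Content Handling" text above can be sketched as a small decision function. This is an illustrative sketch only, not code from the package: `planHttpInlineContent`, `InlineResult`, and the `cacheUri` parameter are hypothetical names, and whether the `...[truncated]` marker counts toward the 20,000-character limit is an assumption.

```typescript
// Hypothetical helper; names are illustrative and not part of superFetch's API.
const MAX_INLINE_CONTENT_CHARS = 20_000; // inline limit quoted in the README
const TRUNCATION_MARKER = "...[truncated]"; // marker quoted in the README

type InlineResult =
  | { kind: "inline"; markdown: string }
  | { kind: "resource_link"; uri: string }
  | { kind: "truncated"; markdown: string };

function planHttpInlineContent(
  markdown: string,
  cacheUri: string | undefined, // undefined models "cache disabled"
  limit: number = MAX_INLINE_CONTENT_CHARS,
): InlineResult {
  // Content that fits is returned inline as-is.
  if (markdown.length <= limit) {
    return { kind: "inline", markdown };
  }
  // Too large with cache enabled: point at the cached resource instead.
  if (cacheUri !== undefined) {
    return { kind: "resource_link", uri: cacheUri };
  }
  // Too large with cache disabled: truncate and append the marker.
  // Assumption: the marker is counted inside the limit.
  const keep = Math.max(0, limit - TRUNCATION_MARKER.length);
  return {
    kind: "truncated",
    markdown: markdown.slice(0, keep) + TRUNCATION_MARKER,
  };
}
```

Stdio mode is not modeled here, because the README states that it always embeds the full markdown.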
387
-
388
- ## Resources
389
-
390
- | URI | Description |
391
- | ------------------------------------------ | ---------------------------------------------- |
392
- | `superfetch://cache/{namespace}/{urlHash}` | Cached content entry (`namespace`: `markdown`) |
393
-
394
- Resource listings enumerate cached entries, and subscriptions notify clients when cache entries update.
395
-
396
- ---
397
-
398
- ## Download Endpoint (HTTP Mode)
399
-
400
- When running in HTTP mode, cached content can be downloaded directly. Downloads are available only when cache is enabled.
401
-
402
- ### Endpoint
403
-
404
- ```text
405
- GET /mcp/downloads/:namespace/:hash
406
- ```
407
-
408
- - `namespace`: `markdown`
409
- - Auth required (`Authorization: Bearer <token>`; in static token mode, `X-API-Key` is accepted)
410
-
411
- ### Response Headers
412
-
413
- | Header | Value |
414
- | --------------------- | ------------------------------- |
415
- | `Content-Type` | `text/markdown; charset=utf-8` |
416
- | `Content-Disposition` | `attachment; filename="<name>"` |
417
- | `Cache-Control` | `private, max-age=<CACHE_TTL>` |
418
-
419
- ### Example Usage
420
-
421
- ```bash
422
- curl -H "Authorization: Bearer $TOKEN" \
423
- http://localhost:3000/mcp/downloads/markdown/abc123.def456 \
424
- -o article.md
425
- ```
426
-
427
- ### Error Responses
428
-
429
- | Status | Code | Description |
430
- | ------ | --------------------- | -------------------------------- |
431
- | 400 | `BAD_REQUEST` | Invalid namespace or hash format |
432
- | 404 | `NOT_FOUND` | Content not found or expired |
433
- | 503 | `SERVICE_UNAVAILABLE` | Download service disabled |
434
-
435
- ---
436
-
437
- ## Configuration
438
-
439
- Set environment variables in your MCP client `env` or in the shell before starting the server.
440
-
441
- ### Core Server Settings
442
-
443
- | Variable | Default | Description |
444
- | --------------------------- | -------------------- | ---------------------------------------------------------- |
445
- | `HOST` | `127.0.0.1` | HTTP bind address |
446
- | `PORT` | `3000` | HTTP server port (1024-65535) |
447
- | `USER_AGENT` | `superFetch-MCP/2.0` | User-Agent header for outgoing requests |
448
- | `CACHE_ENABLED` | `true` | Enable response caching |
449
- | `CACHE_TTL` | `3600` | Cache TTL in seconds (60-86400) |
450
- | `LOG_LEVEL` | `info` | Logging level (`debug` enables verbose logs) |
451
- | `ALLOW_REMOTE` | `false` | Allow non-loopback binds (OAuth required) |
452
- | `ALLOWED_HOSTS` | (empty) | Additional allowed Host/Origin values |
453
- | `TRANSFORM_TIMEOUT_MS` | `30000` | Worker transform timeout in ms (5000-120000) |
454
- | `TOOL_TIMEOUT_MS` | `50000` | Overall tool timeout in ms (1000-300000) |
455
- | `TRANSFORM_METADATA_FORMAT` | `markdown` | Raw markdown metadata format (`markdown` or `frontmatter`) |
456
-
457
- For HTTP server tuning (`SERVER_HEADERS_TIMEOUT_MS`, `SERVER_REQUEST_TIMEOUT_MS`, `SERVER_KEEP_ALIVE_TIMEOUT_MS`, `SERVER_SHUTDOWN_CLOSE_IDLE`, `SERVER_SHUTDOWN_CLOSE_ALL`), see `CONFIGURATION.md`.
458
-
459
- ### Auth (HTTP Mode)
460
-
461
- | Variable | Default | Description |
462
- | --------------- | ------- | ---------------------------------------------------------------------------------------------------------------------------------------- |
463
- | `AUTH_MODE` | auto | `static` or `oauth`. Auto-selects OAuth if OAUTH_ISSUER_URL, OAUTH_AUTHORIZATION_URL, OAUTH_TOKEN_URL, or OAUTH_INTROSPECTION_URL is set |
464
- | `ACCESS_TOKENS` | (empty) | Comma/space-separated static bearer tokens |
465
- | `API_KEY` | (empty) | Adds a static bearer token and enables `X-API-Key` header |
466
-
467
- Static mode requires at least one token (`ACCESS_TOKENS` or `API_KEY`).
468
-
469
- ### OAuth (HTTP Mode)
470
-
471
- Required when `AUTH_MODE=oauth` (or auto-selected by presence of OAuth URLs):
472
-
473
- | Variable | Default | Description |
474
- | ------------------------- | ------- | ---------------------- |
475
- | `OAUTH_ISSUER_URL` | - | OAuth issuer |
476
- | `OAUTH_AUTHORIZATION_URL` | - | Authorization endpoint |
477
- | `OAUTH_TOKEN_URL` | - | Token endpoint |
478
- | `OAUTH_INTROSPECTION_URL` | - | Introspection endpoint |
479
-
480
- Optional:
481
-
482
- | Variable | Default | Description |
483
- | -------------------------------- | -------------------------- | --------------------------------------- |
484
- | `OAUTH_REVOCATION_URL` | - | Revocation endpoint |
485
- | `OAUTH_REGISTRATION_URL` | - | Dynamic client registration endpoint |
486
- | `OAUTH_RESOURCE_URL` | `http://<host>:<port>/mcp` | Protected resource URL |
487
- | `OAUTH_REQUIRED_SCOPES` | (empty) | Required scopes (comma/space separated) |
488
- | `OAUTH_CLIENT_ID` | - | Client ID for introspection |
489
- | `OAUTH_CLIENT_SECRET` | - | Client secret for introspection |
490
- | `OAUTH_INTROSPECTION_TIMEOUT_MS` | `5000` | Introspection timeout (1000-30000) |
491
-
492
- ### Fixed Limits (Not Configurable via env)
493
-
494
- - Fetch timeout: 15 seconds
495
- - Max redirects: 5
496
- - Max HTML response size: 10 MB
497
- - Inline markdown limit: 20,000 characters
498
- - Cache max entries: 100
499
- - Session TTL: 30 minutes
500
- - Session init timeout: 10 seconds
501
- - Max sessions: 200
502
- - Rate limit: 100 req/min per IP (60s window)
503
-
504
- See `CONFIGURATION.md` for preset examples and quick-start snippets.
505
-
506
- ---
507
-
508
- ## HTTP Mode Details
509
-
510
- HTTP mode uses the MCP Streamable HTTP transport. The workflow is:
511
-
512
- 1. `POST /mcp` with an `initialize` request and **no** `mcp-session-id` header.
513
- 2. The server returns `mcp-session-id` in the response headers.
514
- 3. Use that header for subsequent `POST /mcp`, `GET /mcp`, and `DELETE /mcp` requests.
515
-
516
- If the `mcp-protocol-version` header is missing, the server rejects the request. Only `mcp-protocol-version: 2025-11-25` is supported.
517
-
518
- `GET /mcp` and `DELETE /mcp` require `mcp-session-id`. `POST /mcp` without an `initialize` request will return 400.
519
-
520
- Additional HTTP transport notes:
521
-
522
- - `POST /mcp` should advertise `Accept: application/json, text/event-stream` (the server normalizes missing or `*/*` Accept headers).
523
- - `GET /mcp` requires `Accept: text/event-stream` (otherwise 406).
524
- - JSON-RPC batch requests are not supported (400).
525
-
526
- If the server reaches its session cap (200), it evicts the oldest session when possible; otherwise it returns a 503.
527
-
528
- Host and Origin headers are always validated. Allowed values include loopback hosts, the configured `HOST` (if not a wildcard), and any entries in `ALLOWED_HOSTS`. When binding to `0.0.0.0` or `::`, set `ALLOWED_HOSTS` to the hostnames clients will send.
529
-
530
- ---
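A client following the session workflow in the removed "HTTP Mode Details" text above needs a consistent set of request headers. The sketch below builds them for `POST /mcp`. `buildMcpPostHeaders` is a hypothetical helper name, and the `Content-Type` value is an assumption based on the requests being JSON-RPC; the other header names and values are taken from the README text.

```typescript
// Protocol version the README lists as the only supported one.
const MCP_PROTOCOL_VERSION = "2025-11-25";

// Hypothetical helper: builds headers for POST /mcp.
// Omit sessionId for the initial `initialize` request; pass the value of the
// `mcp-session-id` response header on every later request.
function buildMcpPostHeaders(
  token: string,
  sessionId?: string,
): Record<string, string> {
  const headers: Record<string, string> = {
    Authorization: `Bearer ${token}`,
    "Content-Type": "application/json", // assumption: JSON-RPC body
    Accept: "application/json, text/event-stream",
    "mcp-protocol-version": MCP_PROTOCOL_VERSION,
  };
  if (sessionId !== undefined) {
    headers["mcp-session-id"] = sessionId;
  }
  return headers;
}
```

For `GET /mcp`, the README requires `Accept: text/event-stream` instead, so a real client would likely override that header for the SSE stream.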
531
-
532
- ## Security
533
-
534
- ### SSRF Protection
535
-
536
- Blocked destinations include:
537
-
538
- - Loopback and unspecified addresses (`127.0.0.0/8`, `::1`, `0.0.0.0`, `::`)
539
- - Private/ULA ranges (`10.0.0.0/8`, `172.16.0.0/12`, `192.168.0.0/16`, `fc00::/7`)
540
- - Link-local and shared address space (`169.254.0.0/16`, `100.64.0.0/10`, `fe80::/10`)
541
- - Multicast/reserved ranges (`224.0.0.0/4`, `240.0.0.0/4`, `ff00::/8`)
542
- - IPv6 transition ranges (`64:ff9b::/96`, `64:ff9b:1::/48`, `2001::/32`, `2002::/16`)
543
- - Cloud metadata endpoints (AWS/GCP/Azure/Alibaba) like `169.254.169.254`, `metadata.google.internal`, `metadata.azure.com`, `100.100.100.200`, `instance-data`
544
- - Internal suffixes such as `.local` and `.internal`
545
-
546
- DNS resolution is performed and blocked if any resolved IP matches a blocked range.
547
-
548
- ### URL Validation
549
-
550
- - Only `http` and `https` URLs
551
- - No embedded credentials in URLs
552
- - Max URL length: 2048 characters
553
- - Hostnames ending in `.local` or `.internal` are rejected
554
-
555
- ### Host/Origin Validation (HTTP Mode)
556
-
557
- - Host header must match loopback, configured `HOST` (if not a wildcard), or `ALLOWED_HOSTS`
558
- - Origin header (when present) is validated against the same allow-list
559
-
560
- ### Rate Limiting
561
-
562
- Rate limiting applies to `/mcp` and `/mcp/downloads` (100 req/min per IP, 60s window). OPTIONS requests are not rate-limited.
563
-
564
- ---
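The URL validation and IPv4 part of the SSRF rules in the removed "Security" text above can be approximated with a few pure functions. This is a minimal sketch under stated assumptions, not the package's implementation: the function names are hypothetical, only the IPv4 ranges from the README list are covered, and the IPv6 ranges and the DNS resolution check are left out.

```typescript
// IPv4 ranges copied from the README's blocked-destination list.
// IPv6 ranges and DNS-based checks are intentionally omitted in this sketch.
const BLOCKED_IPV4_CIDRS: string[] = [
  "0.0.0.0/32", "127.0.0.0/8", "10.0.0.0/8", "172.16.0.0/12",
  "192.168.0.0/16", "169.254.0.0/16", "100.64.0.0/10",
  "224.0.0.0/4", "240.0.0.0/4",
];

// Parse dotted-quad IPv4 into an unsigned integer, or null if not IPv4.
function ipv4ToInt(ip: string): number | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    value = value * 256 + octet;
  }
  return value;
}

// True when the address falls inside any blocked CIDR range.
function isBlockedIpv4(ip: string): boolean {
  const addr = ipv4ToInt(ip);
  if (addr === null) return false;
  return BLOCKED_IPV4_CIDRS.some((cidr) => {
    const [base = "", bits = "32"] = cidr.split("/");
    const baseInt = ipv4ToInt(base);
    if (baseInt === null) return false;
    const blockSize = 2 ** (32 - Number(bits));
    // Two addresses share a CIDR block when their block indexes match.
    return Math.floor(addr / blockSize) === Math.floor(baseInt / blockSize);
  });
}

// Hypothetical validator: returns an error message, or null when allowed.
function validateUrl(raw: string): string | null {
  if (raw.length > 2048) return "URL exceeds 2048 characters";
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return "invalid URL";
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    return "only http and https are allowed";
  }
  if (url.username !== "" || url.password !== "") {
    return "embedded credentials are not allowed";
  }
  const host = url.hostname.toLowerCase();
  if (host.endsWith(".local") || host.endsWith(".internal")) {
    return "internal hostname suffix";
  }
  if (isBlockedIpv4(host)) return "blocked IP range";
  return null;
}
```

A literal-address check like this is not sufficient on its own. The README notes that DNS resolution is also performed, and a request is blocked if any resolved IP matches a blocked range.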
565
-
566
- ## Development
567
-
568
- ### Scripts
569
-
570
- | Command | Description |
571
- | ----------------------- | ------------------------------------ |
572
- | `npm run dev` | Development server with hot reload |
573
- | `npm run build` | Compile TypeScript |
574
- | `npm start` | Production server |
575
- | `npm run lint` | Run ESLint |
576
- | `npm run lint:fix` | Auto-fix lint issues |
577
- | `npm run type-check` | TypeScript type checking |
578
- | `npm run format` | Format with Prettier |
579
- | `npm test` | Run Node test runner (builds dist) |
580
- | `npm run test:coverage` | Run tests with experimental coverage |
581
- | `npm run knip` | Find unused exports/dependencies |
582
- | `npm run knip:fix` | Auto-fix unused code |
583
- | `npm run inspector` | Launch MCP Inspector |
584
-
585
- > **Note:** Tests run via `node --test` with `--experimental-transform-types` to execute `.ts` test files. Node will emit an experimental warning.
586
-
587
- ### Tech Stack
588
-
589
- | Category | Technology |
590
- | ------------------ | --------------------------------- |
591
- | Runtime | Node.js >=20.18.1 |
592
- | Language | TypeScript 5.9 |
593
- | MCP SDK | @modelcontextprotocol/sdk ^1.25.2 |
594
- | Content Extraction | @mozilla/readability ^0.6.0 |
595
- | HTML Parsing | linkedom ^0.18.12 |
596
- | Markdown | node-html-markdown ^2.0.0 |
597
- | HTTP | Express ^5.2.1, undici ^7.18.2 |
598
- | Validation | Zod ^4.3.5 |
599
-
600
- ---
601
-
602
- ## Contributing
603
-
604
- 1. Fork the repository
605
- 2. Create a feature branch: `git checkout -b feature/amazing-feature`
606
- 3. Ensure linting passes: `npm run lint`
607
- 4. Run tests: `npm test`
608
- 5. Commit changes: `git commit -m 'Add amazing feature'`
609
- 6. Push: `git push origin feature/amazing-feature`
610
- 7. Open a Pull Request
611
-
612
- For examples of other MCP servers, see: [github.com/modelcontextprotocol/servers](https://github.com/modelcontextprotocol/servers)
613
-
614
- <!-- markdownlint-enable MD033 -->
1
+ <!-- markdownlint-disable MD033 -->
2
+
3
+ # superFetch MCP Server
4
+
5
+ Intelligent web content fetcher MCP server that converts HTML to clean, AI-readable Markdown.
6
+
7
+ [![npm version](https://img.shields.io/npm/v/@j0hanz/superfetch.svg)](https://www.npmjs.com/package/@j0hanz/superfetch) [![license](https://img.shields.io/npm/l/@j0hanz/superfetch.svg)](https://www.npmjs.com/package/@j0hanz/superfetch) [![Node.js](https://img.shields.io/badge/Node.js-%3E=20.18.1-339933?logo=nodedotjs&logoColor=white)](https://nodejs.org/) [![TypeScript](https://img.shields.io/badge/TypeScript-5.9-3178C6?logo=typescript&logoColor=white)](https://www.typescriptlang.org/) [![MCP SDK](https://img.shields.io/badge/MCP%20SDK-1.25.x-6f42c1)](https://github.com/modelcontextprotocol/sdk)
8
+
9
+ <img src="docs/logo.png" alt="SuperFetch MCP Logo" width="300">
10
+
11
+ ## One-Click Install
12
+
13
+ [![Install with NPX in VS Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://insiders.vscode.dev/redirect/mcp/install?name=superfetch&inputs=%5B%5D&config=%7B%22command%22%3A%22npx%22%2C%22args%22%3A%5B%22-y%22%2C%22%40j0hanz%2Fsuperfetch%40latest%22%2C%22--stdio%22%5D%7D)
14
+ [![Install with NPX in VS Code Insiders](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://insiders.vscode.dev/redirect/mcp/install?name=superfetch&inputs=%5B%5D&config=%7B%22command%22%3A%22npx%22%2C%22args%22%3A%5B%22-y%22%2C%22%40j0hanz%2Fsuperfetch%40latest%22%2C%22--stdio%22%5D%7D&quality=insiders)
15
+
16
+ [![Install in Cursor](https://cursor.com/deeplink/mcp-install-dark.svg)](https://cursor.com/install-mcp?name=superfetch&config=eyJjb21tYW5kIjoibnB4IiwiYXJncyI6WyIteSIsIkBqMGhhbnovc3VwZXJmZXRjaEBsYXRlc3QiLCItLXN0ZGlvIl19)
17
+
18
+ ## Overview
19
+
20
+ | Feature | Details |
21
+ | -------------------- | -------------------------------------------------------------------------- |
22
+ | HTML → Markdown | Mozilla Readability + node-html-markdown pipeline with metadata injection. |
23
+ | Raw content handling | Rewrites supported GitHub/GitLab/Bitbucket/Gist URLs to raw content. |
24
+ | Caching + resources | LRU cache with resource listing and update notifications. |
25
+ | Transport | Stdio (local clients) and Streamable HTTP (self-hosted). |
26
+ | Safety | SSRF/IP blocklists, Host/Origin validation, auth for HTTP mode. |
27
+
28
+ ### When to use
29
+
30
+ - You need clean, AI-friendly Markdown from public http(s) URLs.
31
+ - You want a single MCP tool that handles fetching, extraction, and caching.
32
+ - You need self-hosted HTTP with auth and session management.
33
+
34
+ ## Quick Start
35
+
36
+ Recommended for MCP clients: stdio mode.
37
+
38
+ ```bash
39
+ npx -y @j0hanz/superfetch@latest --stdio
40
+ ```
41
+
42
+ Example MCP client configuration:
43
+
44
+ ```json
45
+ {
46
+ "mcpServers": {
47
+ "superFetch": {
48
+ "command": "npx",
49
+ "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"]
50
+ }
51
+ }
52
+ }
53
+ ```
54
+
55
+ ## Installation
56
+
57
+ ### NPX (recommended)
58
+
59
+ ```bash
60
+ npx -y @j0hanz/superfetch@latest --stdio
61
+ ```
62
+
63
+ ### Global install
64
+
65
+ ```bash
66
+ npm install -g @j0hanz/superfetch
67
+ superfetch --stdio
68
+ ```
69
+
70
+ ### From source
71
+
72
+ ```bash
73
+ git clone https://github.com/j0hanz/super-fetch-mcp-server.git
74
+ cd super-fetch-mcp-server
75
+ npm install
76
+ npm run build
77
+ node dist/index.js --stdio
78
+ ```
79
+
80
+ ## Configuration
81
+
82
+ ### CLI arguments
83
+
84
+ | Argument | Type | Default | Description |
85
+ | --------- | ------- | ------- | ----------------------------------- |
86
+ | `--stdio` | boolean | false | Run in stdio mode (no HTTP server). |
87
+
88
+ ### Environment variables
89
+
90
+ #### Core server settings
91
+
92
+ | Variable | Default | Description |
93
+ | ---------------------------------- | -------------------- | -------------------------------------------------------------- |
94
+ | `HOST` | `127.0.0.1` | HTTP bind address. |
95
+ | `PORT` | `3000` | HTTP server port (1024-65535, `0` for ephemeral). |
96
+ | `USER_AGENT` | `superFetch-MCP/2.0` | User-Agent header for outgoing requests. |
97
+ | `CACHE_ENABLED` | `true` | Enable response caching. |
98
+ | `CACHE_TTL` | `3600` | Cache TTL in seconds (60-86400). |
99
+ | `LOG_LEVEL` | `info` | Logging level (`debug` enables verbose logs). |
100
+ | `ALLOW_REMOTE` | `false` | Allow non-loopback binds (OAuth required). |
101
+ | `ALLOWED_HOSTS` | (empty) | Additional allowed Host/Origin values (comma/space separated). |
102
+ | `TRANSFORM_TIMEOUT_MS` | `30000` | Worker transform timeout in ms (5000-120000). |
103
+ | `TOOL_TIMEOUT_MS` | `50000` | Overall tool timeout in ms (1000-300000). |
104
+ | `TRANSFORM_METADATA_FORMAT` | `markdown` | Metadata format: `markdown` or `frontmatter`. |
105
+ | `SUPERFETCH_EXTRA_NOISE_TOKENS` | (empty) | Extra noise tokens for DOM noise removal. |
106
+ | `SUPERFETCH_EXTRA_NOISE_SELECTORS` | (empty) | Extra CSS selectors for DOM noise removal. |
107
+
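The ranges and defaults in the table above can be read as "parse an integer, check the bounds, otherwise use the default". The sketch below shows that reading for three of the variables. `readIntEnv` and `loadCoreSettings` are hypothetical names, and falling back to the default on an out-of-range value is an assumption; the real server might clamp or reject instead.

```typescript
type EnvVars = Record<string, string | undefined>;

// Hypothetical helper: returns the default when the variable is missing,
// not an integer, or outside [min, max]. The fallback policy is an assumption.
function readIntEnv(
  env: EnvVars,
  name: string,
  fallback: number,
  min: number,
  max: number,
): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value < min || value > max) return fallback;
  return value;
}

// Defaults and ranges are copied from the table above.
function loadCoreSettings(env: EnvVars) {
  return {
    cacheTtlSeconds: readIntEnv(env, "CACHE_TTL", 3600, 60, 86400),
    transformTimeoutMs: readIntEnv(env, "TRANSFORM_TIMEOUT_MS", 30000, 5000, 120000),
    toolTimeoutMs: readIntEnv(env, "TOOL_TIMEOUT_MS", 50000, 1000, 300000),
  };
}
```

In a Node.js process this would typically be called as `loadCoreSettings(process.env)`.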
108
+ #### HTTP server tuning (optional)
109
+
110
+ | Variable | Default | Description |
111
+ | ------------------------------ | ------- | --------------------------------------------- |
112
+ | `SERVER_HEADERS_TIMEOUT_MS` | (unset) | Sets `server.headersTimeout` (1000-600000). |
113
+ | `SERVER_REQUEST_TIMEOUT_MS` | (unset) | Sets `server.requestTimeout` (1000-600000). |
114
+ | `SERVER_KEEP_ALIVE_TIMEOUT_MS` | (unset) | Sets `server.keepAliveTimeout` (1000-600000). |
115
+ | `SERVER_SHUTDOWN_CLOSE_IDLE` | `false` | Close idle connections on shutdown. |
116
+ | `SERVER_SHUTDOWN_CLOSE_ALL` | `false` | Close all connections on shutdown. |
117
+
118
+ #### Auth (HTTP mode)
119
+
120
+ | Variable | Default | Description |
121
+ | --------------- | ------- | ---------------------------------------------------- |
122
+ | `AUTH_MODE` | auto | `static` or `oauth` (auto-detected from OAuth URLs). |
123
+ | `ACCESS_TOKENS` | (empty) | Comma/space-separated static bearer tokens. |
124
+ | `API_KEY` | (empty) | Adds a static bearer token and enables `X-API-Key`. |
125
+
126
+ Static mode requires at least one token (`ACCESS_TOKENS` or `API_KEY`).
127
+
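The auth rules above (mode auto-detection, comma/space-separated tokens, `API_KEY` contributing a bearer token, and the "at least one token" requirement) might be combined roughly as follows. `resolveAuth` is a hypothetical function, and treating any single OAuth URL variable as enough to select `oauth` is an assumption; the server's actual detection rule is not shown in this diff.

```typescript
type AuthEnv = Record<string, string | undefined>;
type AuthMode = "static" | "oauth";

const OAUTH_URL_VARS = [
  "OAUTH_ISSUER_URL",
  "OAUTH_AUTHORIZATION_URL",
  "OAUTH_TOKEN_URL",
  "OAUTH_INTROSPECTION_URL",
];

// Splits a comma/space-separated list and drops empty entries.
function splitTokens(value: string | undefined): string[] {
  return (value ?? "").split(/[\s,]+/).filter((t) => t.length > 0);
}

// Hypothetical resolver, not superFetch's implementation.
function resolveAuth(env: AuthEnv): { mode: AuthMode; tokens: string[] } {
  const explicit = env["AUTH_MODE"];
  // Assumption: when AUTH_MODE is unset, any OAuth URL selects oauth.
  const hasOAuthUrl = OAUTH_URL_VARS.some((name) => (env[name] ?? "") !== "");
  const mode: AuthMode =
    explicit === "static" || explicit === "oauth"
      ? explicit
      : hasOAuthUrl
        ? "oauth"
        : "static";

  const tokens = splitTokens(env["ACCESS_TOKENS"]);
  const apiKey = env["API_KEY"];
  if (apiKey) tokens.push(apiKey); // API_KEY also acts as a static bearer token

  if (mode === "static" && tokens.length === 0) {
    throw new Error("Static auth requires ACCESS_TOKENS or API_KEY");
  }
  return { mode, tokens };
}
```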
128
+ #### OAuth (HTTP mode)
129
+
130
+ Required when `AUTH_MODE=oauth` (or auto-selected by OAuth URLs):
131
+
132
+ | Variable | Default | Description |
133
+ | ------------------------- | ------- | ----------------------- |
134
+ | `OAUTH_ISSUER_URL` | - | OAuth issuer. |
135
+ | `OAUTH_AUTHORIZATION_URL` | - | Authorization endpoint. |
136
+ | `OAUTH_TOKEN_URL` | - | Token endpoint. |
137
+ | `OAUTH_INTROSPECTION_URL` | - | Introspection endpoint. |
138
+
139
+ Optional:
140
+
141
+ | Variable | Default | Description |
142
+ | -------------------------------- | -------------------------- | ---------------------------------------- |
143
+ | `OAUTH_REVOCATION_URL` | - | Revocation endpoint. |
144
+ | `OAUTH_REGISTRATION_URL` | - | Dynamic client registration endpoint. |
145
+ | `OAUTH_RESOURCE_URL` | `http://<host>:<port>/mcp` | Protected resource URL. |
146
+ | `OAUTH_REQUIRED_SCOPES` | (empty) | Required scopes (comma/space separated). |
147
+ | `OAUTH_CLIENT_ID` | - | Client ID for introspection. |
148
+ | `OAUTH_CLIENT_SECRET` | - | Client secret for introspection. |
149
+ | `OAUTH_INTROSPECTION_TIMEOUT_MS` | `5000` | Introspection timeout (1000-30000). |
150
+
151
+ ### HTTP mode endpoints
152
+
153
+ | Method | Path | Auth | Notes |
154
+ | ------ | --------------------------------- | ---- | -------------------------------------------------- |
155
+ | GET | `/health` | No | Health check. |
156
+ | POST | `/mcp` | Yes | Streamable HTTP JSON-RPC requests. |
157
+ | GET | `/mcp` | Yes | SSE stream (requires `Accept: text/event-stream`). |
158
+ | DELETE | `/mcp` | Yes | Close the session. |
159
+ | GET | `/mcp/downloads/:namespace/:hash` | Yes | Download cached markdown. |
160
+
161
+ Sessions are managed via the `mcp-session-id` header. A `POST /mcp` `initialize` request creates a session and returns the session id.
162
+
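As a rough illustration of the session flow, the sketch below prepares the `initialize` request and shows how a later request would carry the session id. `buildInitializeRequest` and `withSession` are hypothetical helpers, the `clientInfo` values are placeholders, and the exact set of headers the server accepts is an assumption based on the endpoint and security notes in this README and the MCP Streamable HTTP transport.

```typescript
interface PreparedRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: builds (but does not send) the initialize request.
function buildInitializeRequest(baseUrl: string, token: string): PreparedRequest {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/mcp`,
    method: "POST",
    headers: {
      "Authorization": `Bearer ${token}`,
      "Content-Type": "application/json",
      "Accept": "application/json, text/event-stream",
      "MCP-Protocol-Version": "2025-11-25",
    },
    body: JSON.stringify({
      jsonrpc: "2.0",
      id: 1,
      method: "initialize",
      params: {
        protocolVersion: "2025-11-25",
        capabilities: {},
        // Placeholder client identity for illustration only.
        clientInfo: { name: "example-client", version: "0.0.0" },
      },
    }),
  };
}

// Adds the session id to the headers of a follow-up request.
function withSession(
  headers: Record<string, string>,
  sessionId: string,
): Record<string, string> {
  return { ...headers, "mcp-session-id": sessionId };
}
```

A client would send the prepared request with `fetch(req.url, req)`, read the session id from the response (in the Streamable HTTP transport it is carried in the `mcp-session-id` response header), and pass it through `withSession` for subsequent `POST`, `GET`, and `DELETE` calls to `/mcp`.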
163
+ ## API Reference
164
+
165
+ ### Tools
166
+
167
+ #### `fetch-url`
168
+
169
+ Fetches a webpage and converts it to clean Markdown.
170
+
171
+ ##### Parameters
172
+
173
+ | Name | Type | Required | Default | Description |
174
+ | ----- | ------ | -------- | ------- | ------------------------------------ |
175
+ | `url` | string | Yes | - | Public http(s) URL, max length 2048. |
176
+
177
+ ##### Returns
178
+
179
+ `structuredContent` fields:
180
+
181
+ - `url` (string): fetched URL
182
+ - `inputUrl` (string, optional): original input URL
183
+ - `resolvedUrl` (string, optional): normalized or raw-content URL
184
+ - `title` (string, optional): page title
185
+ - `markdown` (string, optional): markdown content (inline when available)
186
+ - `error` (string, optional): error message on failure
187
+
188
+ ##### Example success
189
+
190
+ ```json
+ {
+   "url": "https://example.com/docs",
+   "inputUrl": "https://example.com/docs",
+   "resolvedUrl": "https://example.com/docs",
+   "title": "Example Docs",
+   "markdown": "# Getting Started\n\n..."
+ }
+ ```
199
+
200
+ ##### Example error
201
+
202
+ ```json
+ {
+   "url": "https://example.com/404",
+   "error": "Failed to fetch URL: 404 Not Found"
+   }
+ ```
208
+
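For client code, the `structuredContent` fields listed above could be modelled with a small type. The interface name `FetchUrlResult` and the `summarizeResult` helper are hypothetical; only the field names and optionality come from the list above.

```typescript
// Shape of `structuredContent` as described in the Returns list.
interface FetchUrlResult {
  url: string;
  inputUrl?: string;
  resolvedUrl?: string;
  title?: string;
  markdown?: string;
  error?: string;
}

// Hypothetical helper: turns a result into a one-line summary.
function summarizeResult(result: FetchUrlResult): string {
  if (result.error !== undefined) return `failed: ${result.error}`;
  const label = result.title ?? result.url;
  if (result.markdown !== undefined) {
    return `ok: ${label} (${result.markdown.length} chars inline)`;
  }
  // No inline markdown: the content may be available through a resource link.
  return `ok: ${label} (no inline markdown)`;
}
```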
209
+ ##### Large content handling
210
+
211
+ - Inline markdown is capped at 20,000 characters.
212
+ - When content exceeds the inline limit and cache is enabled, responses include a `resource_link` to `superfetch://cache/markdown/{urlHash}`.
213
+ - If cache is disabled, inline content is truncated with `...[truncated]`.
214
+
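The three rules above amount to a small decision function, sketched below. `decideInline` and the `InlineDecision` type are hypothetical names, and counting the `...[truncated]` marker toward the 20,000-character cap is an assumption; the server may append the marker after the cap instead.

```typescript
const INLINE_LIMIT = 20000;
const TRUNCATION_MARKER = "...[truncated]";

type InlineDecision =
  | { kind: "inline"; markdown: string }
  | { kind: "resource_link"; uri: string }
  | { kind: "truncated"; markdown: string };

// Hypothetical sketch of the large-content rules, not superFetch's code.
function decideInline(
  markdown: string,
  urlHash: string,
  cacheEnabled: boolean,
): InlineDecision {
  // Small enough: return the content inline unchanged.
  if (markdown.length <= INLINE_LIMIT) return { kind: "inline", markdown };
  // Too large with cache enabled: point at the cached resource.
  if (cacheEnabled) {
    return { kind: "resource_link", uri: `superfetch://cache/markdown/${urlHash}` };
  }
  // Too large without cache: cut the content and append the marker.
  // Assumption: the marker is included in the 20,000-character budget.
  const kept = markdown.slice(0, INLINE_LIMIT - TRUNCATION_MARKER.length);
  return { kind: "truncated", markdown: kept + TRUNCATION_MARKER };
}
```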
215
+ ### Resources
216
+
217
+ | URI pattern | Description | MIME type |
218
+ | --------------------------------------- | ------------------------------ | --------------- |
219
+ | `superfetch://cache/markdown/{urlHash}` | Cached markdown content entry. | `text/markdown` |
220
+ | `internal://instructions` | Server usage instructions. | `text/markdown` |
221
+
222
+ ### Prompts
223
+
224
+ No prompts are registered in this server.
225
+
226
+ ## Client Configuration Examples
227
+
228
+ <details>
229
+ <summary><strong>VS Code</strong></summary>
230
+
231
+ Add to `.vscode/mcp.json`:
232
+
233
+ ```json
+ {
+   "servers": {
+     "superFetch": {
+       "command": "npx",
+       "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"]
+     }
+   }
+ }
+ ```
243
+
244
+ </details>
245
+
246
+ <details>
247
+ <summary><strong>Claude Desktop</strong></summary>
248
+
249
+ Add to `claude_desktop_config.json`:
250
+
251
+ ```json
+ {
+   "mcpServers": {
+     "superFetch": {
+       "command": "npx",
+       "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"]
+     }
+   }
+ }
+ ```
261
+
262
+ </details>
263
+
264
+ <details>
265
+ <summary><strong>Cursor</strong></summary>
266
+
267
+ ```json
+ {
+   "mcpServers": {
+     "superFetch": {
+       "command": "npx",
+       "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"]
+     }
+   }
+ }
+ ```
277
+
278
+ </details>
279
+
280
+ <details>
281
+ <summary><strong>Windsurf</strong></summary>
282
+
283
+ ```json
+ {
+   "mcpServers": {
+     "superFetch": {
+       "command": "npx",
+       "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"]
+     }
+   }
+ }
+ ```
293
+
294
+ </details>
295
+
296
+ ## Security
297
+
298
+ - Stdio logs are written to stderr (stdout is reserved for MCP traffic).
299
+ - HTTP mode validates Host and Origin headers against allowed hosts.
300
+ - HTTP mode requires `MCP-Protocol-Version: 2025-11-25`.
301
+ - Auth is required for HTTP mode (static tokens or OAuth).
302
+ - SSRF protections block private IP ranges and common metadata endpoints.
303
+ - Rate limiting: 100 requests/minute per IP (60s window) for HTTP routes.
304
+
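The rate-limiting bullet above (100 requests per IP in a 60-second window) matches a fixed-window counter, sketched below. The class name `FixedWindowLimiter` is hypothetical and the fixed-window strategy itself is an assumption; the server could equally use a sliding window or token bucket.

```typescript
// Hypothetical fixed-window limiter, not superFetch's implementation.
class FixedWindowLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit = 100, windowMs = 60000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed, false if the IP is over the limit.
  allow(ip: string, now: number = Date.now()): boolean {
    const current = this.windows.get(ip);
    // No window yet, or the previous window has expired: start a new one.
    if (current === undefined || now - current.start >= this.windowMs) {
      this.windows.set(ip, { start: now, count: 1 });
      return true;
    }
    if (current.count >= this.limit) return false;
    current.count += 1;
    return true;
  }
}
```

Passing `now` explicitly keeps the limiter deterministic in tests; a long-running server would also need to evict stale entries from the map, which this sketch omits.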
305
+ ## Development
306
+
307
+ ### Prerequisites
308
+
309
+ - Node.js >= 20.18.1
310
+ - npm
311
+
312
+ ### Scripts
313
+
314
+ | Script | Command | Purpose |
315
+ | ---------------------- | ----------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------- |
316
+ | clean | `node scripts/clean.mjs` | Remove build artifacts. |
317
+ | validate:instructions | `node scripts/validate-instructions.mjs` | Validate embedded instructions. |
318
+ | build | `npm run clean && tsc -p tsconfig.json && npm run validate:instructions && npm run copy:assets && node scripts/make-executable.mjs` | Build the server. |
319
+ | copy:assets | `node scripts/copy-assets.mjs` | Copy static assets. |
320
+ | prepare | `npm run build` | Prepare package for publishing. |
321
+ | dev | `tsc --watch --preserveWatchOutput` | TypeScript watch mode. |
322
+ | dev:run | `node --watch dist/index.js` | Run compiled server in watch mode. |
323
+ | start | `node dist/index.js` | Start HTTP server (default). |
324
+ | format | `prettier --write .` | Format codebase. |
325
+ | type-check | `tsc --noEmit` | Type checking. |
326
+ | type-check:diagnostics | `tsc --noEmit --extendedDiagnostics` | Type check diagnostics. |
327
+ | type-check:trace | `tsc --noEmit --generateTrace .ts-trace` | Generate TS trace. |
328
+ | lint | `eslint .` | Lint. |
329
+ | lint:fix | `eslint . --fix` | Lint and fix. |
330
+ | test | `npm run build --silent && node --test --experimental-transform-types` | Run tests (builds first). |
331
+ | test:coverage | `npm run build --silent && node --test --experimental-transform-types --experimental-test-coverage` | Test with coverage. |
332
+ | knip | `knip` | Dead code analysis. |
333
+ | knip:fix | `knip --fix` | Fix knip issues. |
334
+ | inspector | `npx @modelcontextprotocol/inspector` | MCP Inspector. |
335
+ | prepublishOnly | `npm run lint && npm run type-check && npm run build` | Prepublish checks. |
336
+
337
+ ### Project structure
338
+
339
+ ```text
+ superFetch
+ ├── docs
+ │   └── logo.png
+ ├── src
+ │   ├── workers
+ │   ├── cache.ts
+ │   ├── config.ts
+ │   ├── fetch.ts
+ │   ├── http-native.ts
+ │   ├── http-utils.ts
+ │   ├── index.ts
+ │   ├── instructions.md
+ │   ├── mcp.ts
+ │   ├── tools.ts
+ │   ├── transform.ts
+ │   └── ...
+ ├── tests
+ │   └── *.test.ts
+ ├── CONFIGURATION.md
+ ├── package.json
+ └── tsconfig.json
+ ```
362
+
363
+ <!-- markdownlint-enable MD033 -->