@j0hanz/superfetch 2.2.1 → 2.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (75)
  1. package/README.md +243 -494
  2. package/dist/cache.d.ts +2 -3
  3. package/dist/cache.js +51 -241
  4. package/dist/config.d.ts +6 -1
  5. package/dist/config.js +29 -34
  6. package/dist/crypto.d.ts +0 -1
  7. package/dist/crypto.js +0 -1
  8. package/dist/dom-noise-removal.d.ts +5 -0
  9. package/dist/dom-noise-removal.js +485 -0
  10. package/dist/errors.d.ts +0 -1
  11. package/dist/errors.js +8 -6
  12. package/dist/fetch.d.ts +0 -1
  13. package/dist/fetch.js +71 -61
  14. package/dist/host-normalization.d.ts +1 -0
  15. package/dist/host-normalization.js +47 -0
  16. package/dist/http-native.d.ts +5 -0
  17. package/dist/http-native.js +693 -0
  18. package/dist/index.d.ts +0 -1
  19. package/dist/index.js +1 -2
  20. package/dist/instructions.md +22 -20
  21. package/dist/json.d.ts +1 -0
  22. package/dist/json.js +29 -0
  23. package/dist/language-detection.d.ts +12 -0
  24. package/dist/language-detection.js +291 -0
  25. package/dist/markdown-cleanup.d.ts +18 -0
  26. package/dist/markdown-cleanup.js +283 -0
  27. package/dist/mcp-validator.d.ts +14 -0
  28. package/dist/mcp-validator.js +22 -0
  29. package/dist/mcp.d.ts +0 -1
  30. package/dist/mcp.js +0 -1
  31. package/dist/observability.d.ts +1 -1
  32. package/dist/observability.js +15 -3
  33. package/dist/server-tuning.d.ts +9 -0
  34. package/dist/server-tuning.js +30 -0
  35. package/dist/session.d.ts +36 -0
  36. package/dist/session.js +159 -0
  37. package/dist/tools.d.ts +0 -1
  38. package/dist/tools.js +23 -33
  39. package/dist/transform-types.d.ts +80 -0
  40. package/dist/transform-types.js +5 -0
  41. package/dist/transform.d.ts +7 -53
  42. package/dist/transform.js +434 -856
  43. package/dist/type-guards.d.ts +1 -2
  44. package/dist/type-guards.js +1 -2
  45. package/dist/workers/transform-worker.d.ts +0 -1
  46. package/dist/workers/transform-worker.js +52 -43
  47. package/package.json +11 -12
  48. package/dist/cache.d.ts.map +0 -1
  49. package/dist/cache.js.map +0 -1
  50. package/dist/config.d.ts.map +0 -1
  51. package/dist/config.js.map +0 -1
  52. package/dist/crypto.d.ts.map +0 -1
  53. package/dist/crypto.js.map +0 -1
  54. package/dist/errors.d.ts.map +0 -1
  55. package/dist/errors.js.map +0 -1
  56. package/dist/fetch.d.ts.map +0 -1
  57. package/dist/fetch.js.map +0 -1
  58. package/dist/http.d.ts +0 -90
  59. package/dist/http.d.ts.map +0 -1
  60. package/dist/http.js +0 -1576
  61. package/dist/http.js.map +0 -1
  62. package/dist/index.d.ts.map +0 -1
  63. package/dist/index.js.map +0 -1
  64. package/dist/mcp.d.ts.map +0 -1
  65. package/dist/mcp.js.map +0 -1
  66. package/dist/observability.d.ts.map +0 -1
  67. package/dist/observability.js.map +0 -1
  68. package/dist/tools.d.ts.map +0 -1
  69. package/dist/tools.js.map +0 -1
  70. package/dist/transform.d.ts.map +0 -1
  71. package/dist/transform.js.map +0 -1
  72. package/dist/type-guards.d.ts.map +0 -1
  73. package/dist/type-guards.js.map +0 -1
  74. package/dist/workers/transform-worker.d.ts.map +0 -1
  75. package/dist/workers/transform-worker.js.map +0 -1
package/README.md CHANGED
@@ -1,200 +1,45 @@
1
+ <!-- markdownlint-disable MD033 -->
2
+
1
3
  # superFetch MCP Server
2
4
 
3
- <!-- markdownlint-disable MD033 -->
5
+ Intelligent web content fetcher MCP server that converts HTML to clean, AI-readable Markdown.
4
6
 
5
- <img src="docs/logo.png" alt="SuperFetch MCP Logo" width="200">
7
+ [![npm version](https://img.shields.io/npm/v/@j0hanz/superfetch.svg)](https://www.npmjs.com/package/@j0hanz/superfetch) [![license](https://img.shields.io/npm/l/@j0hanz/superfetch.svg)](https://www.npmjs.com/package/@j0hanz/superfetch) [![Node.js](https://img.shields.io/badge/Node.js-%3E=20.18.1-339933?logo=nodedotjs&logoColor=white)](https://nodejs.org/) [![TypeScript](https://img.shields.io/badge/TypeScript-5.9-3178C6?logo=typescript&logoColor=white)](https://www.typescriptlang.org/) [![MCP SDK](https://img.shields.io/badge/MCP%20SDK-1.25.x-6f42c1)](https://github.com/modelcontextprotocol/sdk)
6
8
 
7
- [![npm version](https://img.shields.io/npm/v/@j0hanz/superfetch.svg)](https://www.npmjs.com/package/@j0hanz/superfetch) [![Node.js](https://img.shields.io/badge/Node.js-%3E=20.18.1-339933?logo=nodedotjs&logoColor=white)](https://nodejs.org/) [![TypeScript](https://img.shields.io/badge/TypeScript-5.9-3178C6?logo=typescript&logoColor=white)](https://www.typescriptlang.org/)
9
+ <img src="docs/logo.png" alt="SuperFetch MCP Logo" width="300">
8
10
 
9
11
  ## One-Click Install
10
12
 
11
- [![Install with NPX in VS Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://insiders.vscode.dev/redirect/mcp/install?name=superfetch&inputs=%5B%5D&config=%7B%22command%22%3A%22npx%22%2C%22args%22%3A%5B%22-y%22%2C%22%40j0hanz%2Fsuperfetch%40latest%22%2C%22--stdio%22%5D%7D) [![Install with NPX in VS Code Insiders](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://insiders.vscode.dev/redirect/mcp/install?name=superfetch&inputs=%5B%5D&config=%7B%22command%22%3A%22npx%22%2C%22args%22%3A%5B%22-y%22%2C%22%40j0hanz%2Fsuperfetch%40latest%22%2C%22--stdio%22%5D%7D&quality=insiders)
13
+ [![Install with NPX in VS Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://insiders.vscode.dev/redirect/mcp/install?name=superfetch&inputs=%5B%5D&config=%7B%22command%22%3A%22npx%22%2C%22args%22%3A%5B%22-y%22%2C%22%40j0hanz%2Fsuperfetch%40latest%22%2C%22--stdio%22%5D%7D)
14
+ [![Install with NPX in VS Code Insiders](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://insiders.vscode.dev/redirect/mcp/install?name=superfetch&inputs=%5B%5D&config=%7B%22command%22%3A%22npx%22%2C%22args%22%3A%5B%22-y%22%2C%22%40j0hanz%2Fsuperfetch%40latest%22%2C%22--stdio%22%5D%7D&quality=insiders)
12
15
 
13
16
  [![Install in Cursor](https://cursor.com/deeplink/mcp-install-dark.svg)](https://cursor.com/install-mcp?name=superfetch&config=eyJjb21tYW5kIjoibnB4IiwiYXJncyI6WyIteSIsIkBqMGhhbnovc3VwZXJmZXRjaEBsYXRlc3QiLCItLXN0ZGlvIl19)
14
17
 
15
- A [Model Context Protocol](https://modelcontextprotocol.io/) (MCP) server that fetches web pages, extracts readable content with Mozilla Readability, and returns AI-friendly Markdown.
16
-
17
- Built for AI workflows that need _clean text_, _stable metadata_, and _safe-by-default fetching_.
18
-
19
- **Great for:** _LLM summarization_, _context retrieval_, _knowledge base ingestion_, and _AI agents_.
20
-
21
- _|_ [Quick Start](#quick-start) _|_ [Tool](#available-tools) _|_ [Resources](#resources) _|_ [Configuration](#configuration) _|_ [Security](#security) _|_ [Development](#development) _|_
22
-
23
- ---
24
-
25
- > [!CAUTION]
26
- > This server can access URLs on behalf of AI assistants. Built-in SSRF protection blocks private IP ranges and cloud metadata endpoints, but exercise caution when deploying in sensitive environments.
18
+ ## Overview
27
19
 
28
- ## Features
20
+ | Feature | Details |
21
+ | -------------------- | -------------------------------------------------------------------------- |
22
+ | HTML → Markdown | Mozilla Readability + node-html-markdown pipeline with metadata injection. |
23
+ | Raw content handling | Rewrites supported GitHub/GitLab/Bitbucket/Gist URLs to raw content. |
24
+ | Caching + resources | LRU cache with resource listing and update notifications. |
25
+ | Transport | Stdio (local clients) and Streamable HTTP (self-hosted). |
26
+ | Safety | SSRF/IP blocklists, Host/Origin validation, auth for HTTP mode. |
29
27
 
30
- - **Cleaner outputs for LLMs**: Readability extraction with quality gates (content ratio + heading retention ≥ 70%)
31
- - **Markdown that’s easy to consume**: metadata footer for HTML + configurable source injection for raw Markdown (markdown or frontmatter)
32
- - **Handles “raw content” sources**: preserves markdown/text; rewrites GitHub/GitLab/Bitbucket/Gist URLs to raw
33
- - **Works for both local and hosted setups**:
34
- - **Stdio mode**: best for MCP clients (VS Code / Claude Desktop / Cursor)
35
- - **HTTP mode**: best for self-hosting (auth, sessions, rate limiting, Host/Origin validation)
36
- - **Fast and resilient**: redirect validation, timeouts, and response size limits
37
- - **Security-first defaults**: URL validation + SSRF/DNS/IP blocklists (blocks private ranges and cloud metadata endpoints)
28
+ ### When to use
38
29
 
39
- **You get, in one tool call:**
40
-
41
- - **Clean, readable Markdown** from any public URL (docs, articles, blogs, wikis)
42
-
43
- If you’re comparing “just call `fetch()`” vs superFetch: superFetch focuses on extracting the main content in a readble format for LLMs and even humans, when requested url is fetched it returns clean structured markdown that can also be saved as a resource for later use.
44
-
45
- ## What it is (and isn’t)
46
-
47
- - **It is** a content extraction tool: focuses on extracting readable content, not screenshots or full-page data.
48
- - **It is** an MCP server: integrates with any MCP-compatible client (Claude Desktop, VS Code, Cursor, Cline, Windsurf, Codex, etc).
49
- - **It isn’t** a general web scraper: it extracts main content, not all page elements.
50
- - **It isn’t** a browser: it doesn’t execute JavaScript or render pages.
51
- - **It’s opinionated on safety**: blocks private/internal URLs and cloud metadata endpoints by default.
52
-
53
- ---
30
+ - You need clean, AI-friendly Markdown from public http(s) URLs.
31
+ - You want a single MCP tool that handles fetching, extraction, and caching.
32
+ - You need self-hosted HTTP with auth and session management.
54
33
 
55
34
  ## Quick Start
56
35
 
57
- Recommended: use **stdio mode** with your MCP client (no HTTP server).
58
-
59
- ### Try it in 60 seconds
60
-
61
- 1. Add the MCP server config (below)
62
- 2. Restart your MCP client
63
- 3. Call the `fetch-url` tool with any public URL
64
-
65
- ### What the tool returns
66
-
67
- You’ll get `structuredContent` with `url`, `resolvedUrl`, optional `title`, and `markdown` (plus a `superfetch://cache/...` resource link when cache is enabled and content is large).
68
-
69
- ### Claude Desktop
70
-
71
- Add to your `claude_desktop_config.json`:
72
-
73
- ```json
74
- {
75
- "mcpServers": {
76
- "superFetch": {
77
- "command": "npx",
78
- "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"]
79
- }
80
- }
81
- }
82
- ```
83
-
84
- ### VS Code
85
-
86
- Add to `.vscode/mcp.json` in your workspace:
87
-
88
- ```json
89
- {
90
- "servers": {
91
- "superFetch": {
92
- "command": "npx",
93
- "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"]
94
- }
95
- }
96
- }
97
- ```
98
-
99
- ### With Custom Configuration
100
-
101
- Add environment variables in your MCP client config under `env`.
102
- See [Configuration](#configuration) or `CONFIGURATION.md` for all available options and presets.
103
-
104
- ### Example output (trimmed)
105
-
106
- ```json
107
- {
108
- "url": "https://example.com/docs",
109
- "inputUrl": "https://example.com/docs",
110
- "resolvedUrl": "https://example.com/docs",
111
- "title": "Documentation",
112
- "markdown": "# Getting Started\n\n...\n\n---\n\n _Documentation_ | [_Original Source_](https://example.com/docs) | _12-01-2026_"
113
- }
114
- ```
115
-
116
- > **Tip (Windows):** If you encounter issues, try: `cmd /c "npx -y @j0hanz/superfetch@latest --stdio"`
117
-
118
- <details>
119
- <summary><strong>Other clients (Cursor, Cline, Windsurf, Codex)</strong></summary>
120
-
121
- ### Cursor
122
-
123
- 1. Open Cursor Settings
124
- 2. Go to **Features > MCP Servers**
125
- 3. Click **"+ Add new global MCP server"**
126
- 4. Add this configuration:
127
-
128
- ```json
129
- {
130
- "mcpServers": {
131
- "superFetch": {
132
- "command": "npx",
133
- "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"]
134
- }
135
- }
136
- }
137
- ```
138
-
139
- <details>
140
- <summary><strong>Codex IDE</strong></summary>
141
-
142
- Add to your `~/.codex/config.toml` file:
143
-
144
- **Basic Configuration:**
145
-
146
- ```toml
147
- [mcp_servers.superfetch]
148
- command = "npx"
149
- args = ["-y", "@j0hanz/superfetch@latest", "--stdio"]
150
- ```
151
-
152
- **With Environment Variables:** See `CONFIGURATION.md` for examples.
153
-
154
- > **Access config file:** Click the gear icon -> "Codex Settings > Open config.toml"
155
- >
156
- > **Documentation:** [Codex MCP Guide](https://codex.com/docs/mcp)
157
-
158
- </details>
159
-
160
- <details>
161
- <summary><strong>Cline (VS Code Extension)</strong></summary>
162
-
163
- Open the Cline MCP settings file:
164
-
165
- **macOS:**
36
+ Recommended for MCP clients: stdio mode.
166
37
 
167
38
  ```bash
168
- code ~/Library/Application\ Support/Code/User/globalStorage/saoudrizwan.claude-dev/settings/cline_mcp_settings.json
39
+ npx -y @j0hanz/superfetch@latest --stdio
169
40
  ```
170
41
 
171
- **Windows:**
172
-
173
- ```bash
174
- code %APPDATA%\Code\User\globalStorage\saoudrizwan.claude-dev\settings\cline_mcp_settings.json
175
- ```
176
-
177
- Add the configuration:
178
-
179
- ```json
180
- {
181
- "mcpServers": {
182
- "superFetch": {
183
- "command": "npx",
184
- "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"],
185
- "disabled": false,
186
- "autoApprove": []
187
- }
188
- }
189
- }
190
- ```
191
-
192
- </details>
193
-
194
- <details>
195
- <summary><strong>Windsurf</strong></summary>
196
-
197
- Add to `./codeium/windsurf/model_config.json`:
42
+ Example MCP client configuration:
198
43
 
199
44
  ```json
200
45
  {
@@ -207,408 +52,312 @@ Add to `./codeium/windsurf/model_config.json`:
207
52
  }
208
53
  ```
209
54
 
210
- </details>
55
+ ## Installation
211
56
 
212
- <details>
213
- <summary><strong>Claude Desktop (Config File Locations)</strong></summary>
214
-
215
- **macOS:**
216
-
217
- ```bash
218
- # Open config file
219
- open -e "$HOME/Library/Application Support/Claude/claude_desktop_config.json"
220
-
221
- # Or with VS Code
222
- code "$HOME/Library/Application Support/Claude/claude_desktop_config.json"
223
- ```
224
-
225
- **Windows:**
57
+ ### NPX (recommended)
226
58
 
227
59
  ```bash
228
- code %APPDATA%\Claude\claude_desktop_config.json
60
+ npx -y @j0hanz/superfetch@latest --stdio
229
61
  ```
230
62
 
231
- </details>
232
-
233
- </details>
234
-
235
- ---
236
-
237
- ## Use cases
238
-
239
- ### 1) Turn a docs page into “LLM-ready” Markdown
240
-
241
- - Call `fetch-url` with the docs URL
242
- - Feed the returned `markdown` into your summarizer / chunker
243
- - Use the metadata footer fields (especially **Original Source**) for citations
244
-
245
- ### 2) Fetch a GitHub/GitLab/Bitbucket file as raw markdown
246
-
247
- - Pass the normal “web UI” URL to `fetch-url`
248
- - superFetch will rewrite it to the raw content URL when possible
249
- - This avoids navigation UI and reduces boilerplate
250
-
251
- ### 3) Large pages: keep responses stable with cache resources
252
-
253
- - When content is large, the tool can include a `superfetch://cache/...` resource link
254
- - In MCP clients that support resources, you can read the full content via the resource URI
255
- - In HTTP mode, you can also download cached content via `/mcp/downloads/:namespace/:hash` when cache is enabled
256
-
257
- ### 4) Safe-by-default web access for agents
258
-
259
- - superFetch blocks private IP ranges and common cloud metadata endpoints
260
- - If your agent needs internal access, this is intentionally not supported by default (see Security)
261
-
262
- ---
263
-
264
- ## Installation (Alternative)
265
-
266
- ### Global Installation
63
+ ### Global install
267
64
 
268
65
  ```bash
269
66
  npm install -g @j0hanz/superfetch
270
-
271
- # Run in stdio mode
272
67
  superfetch --stdio
273
-
274
- # Run HTTP server (requires auth token)
275
- superfetch
276
68
  ```
277
69
 
278
- ### From Source
70
+ ### From source
279
71
 
280
72
  ```bash
281
73
  git clone https://github.com/j0hanz/super-fetch-mcp-server.git
282
74
  cd super-fetch-mcp-server
283
75
  npm install
284
76
  npm run build
285
- ```
286
-
287
- ### Running the Server
288
-
289
- <details>
290
- <summary><strong>stdio Mode</strong> (direct MCP integration)</summary>
291
-
292
- ```bash
293
77
  node dist/index.js --stdio
294
78
  ```
295
79
 
296
- </details>
297
-
298
- <details>
299
- <summary><strong>HTTP Mode</strong> (default)</summary>
80
+ ## Configuration
300
81
 
301
- HTTP mode requires authentication. By default it binds to `127.0.0.1`. Non-loopback `HOST` values require `ALLOW_REMOTE=true`. To listen on all interfaces, set `HOST=0.0.0.0` or `HOST=::`, set `ALLOW_REMOTE=true`, and configure OAuth (remote bindings require OAuth).
82
+ ### CLI arguments
83
+
84
+ | Argument | Type | Default | Description |
85
+ | --------- | ------- | ------- | ----------------------------------- |
86
+ | `--stdio` | boolean | false | Run in stdio mode (no HTTP server). |
87
+
88
+ ### Environment variables
89
+
90
+ #### Core server settings
91
+
92
+ | Variable | Default | Description |
93
+ | ---------------------------------- | -------------------- | -------------------------------------------------------------- |
94
+ | `HOST` | `127.0.0.1` | HTTP bind address. |
95
+ | `PORT` | `3000` | HTTP server port (1024-65535, `0` for ephemeral). |
96
+ | `USER_AGENT` | `superFetch-MCP/2.0` | User-Agent header for outgoing requests. |
97
+ | `CACHE_ENABLED` | `true` | Enable response caching. |
98
+ | `CACHE_TTL` | `3600` | Cache TTL in seconds (60-86400). |
99
+ | `LOG_LEVEL` | `info` | Logging level (`debug` enables verbose logs). |
100
+ | `ALLOW_REMOTE` | `false` | Allow non-loopback binds (OAuth required). |
101
+ | `ALLOWED_HOSTS` | (empty) | Additional allowed Host/Origin values (comma/space separated). |
102
+ | `TRANSFORM_TIMEOUT_MS` | `30000` | Worker transform timeout in ms (5000-120000). |
103
+ | `TOOL_TIMEOUT_MS` | `50000` | Overall tool timeout in ms (1000-300000). |
104
+ | `TRANSFORM_METADATA_FORMAT` | `markdown` | Metadata format: `markdown` or `frontmatter`. |
105
+ | `SUPERFETCH_EXTRA_NOISE_TOKENS` | (empty) | Extra noise tokens for DOM noise removal. |
106
+ | `SUPERFETCH_EXTRA_NOISE_SELECTORS` | (empty) | Extra CSS selectors for DOM noise removal. |
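Reviewer note on the table above: the documented ranges (for example `PORT` 1024-65535 with `0` allowed, `CACHE_TTL` 60-86400) suggest range-checked integer parsing. The sketch below shows one way such parsing could work. The names `parseRangedInt` and `readCoreSettings` are hypothetical, and falling back to the default on invalid input is an assumption; the diff does not show how the package treats out-of-range values.

```typescript
// Hypothetical helper: parse an integer environment variable and enforce a range.
// Assumption: invalid or out-of-range input falls back to the default.
function parseRangedInt(
  raw: string | undefined,
  fallback: number,
  min: number,
  max: number,
  extraAllowed: readonly number[] = [],
): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw.trim());
  if (!Number.isInteger(value)) return fallback;
  // Values such as PORT=0 (ephemeral port) sit outside the range but are allowed.
  if (extraAllowed.includes(value)) return value;
  return value >= min && value <= max ? value : fallback;
}

// Defaults and ranges are copied from the README table in this diff.
function readCoreSettings(env: Record<string, string | undefined>) {
  return {
    port: parseRangedInt(env.PORT, 3000, 1024, 65535, [0]),
    cacheTtlSeconds: parseRangedInt(env.CACHE_TTL, 3600, 60, 86400),
    transformTimeoutMs: parseRangedInt(env.TRANSFORM_TIMEOUT_MS, 30000, 5000, 120000),
    toolTimeoutMs: parseRangedInt(env.TOOL_TIMEOUT_MS, 50000, 1000, 300000),
  };
}
```

Calling `readCoreSettings(process.env)` at startup would yield one validated settings object. A stricter variant could throw on invalid input instead of falling back.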
107
+
108
+ #### HTTP server tuning (optional)
109
+
110
+ | Variable | Default | Description |
111
+ | ------------------------------ | ------- | --------------------------------------------- |
112
+ | `SERVER_HEADERS_TIMEOUT_MS` | (unset) | Sets `server.headersTimeout` (1000-600000). |
113
+ | `SERVER_REQUEST_TIMEOUT_MS` | (unset) | Sets `server.requestTimeout` (1000-600000). |
114
+ | `SERVER_KEEP_ALIVE_TIMEOUT_MS` | (unset) | Sets `server.keepAliveTimeout` (1000-600000). |
115
+ | `SERVER_SHUTDOWN_CLOSE_IDLE` | `false` | Close idle connections on shutdown. |
116
+ | `SERVER_SHUTDOWN_CLOSE_ALL` | `false` | Close all connections on shutdown. |
117
+
118
+ #### Auth (HTTP mode)
119
+
120
+ | Variable | Default | Description |
121
+ | --------------- | ------- | ---------------------------------------------------- |
122
+ | `AUTH_MODE` | auto | `static` or `oauth` (auto-detected from OAuth URLs). |
123
+ | `ACCESS_TOKENS` | (empty) | Comma/space-separated static bearer tokens. |
124
+ | `API_KEY` | (empty) | Adds a static bearer token and enables `X-API-Key`. |
302
125
 
303
- ```bash
304
- API_KEY=supersecret npx -y @j0hanz/superfetch@latest
305
- # Server runs at http://127.0.0.1:3000
306
- ```
126
+ Static mode requires at least one token (`ACCESS_TOKENS` or `API_KEY`).
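Reviewer note on the line above: the static-token rule can be illustrated with a small sketch. `collectStaticTokens` and `assertStaticAuthConfigured` are hypothetical names, and the de-duplication step and error message are assumptions rather than the package's confirmed behavior. Only the comma/space separation and the "at least one token" requirement come from the README.

```typescript
// Hypothetical helper: gather static bearer tokens from ACCESS_TOKENS and API_KEY.
function collectStaticTokens(env: Record<string, string | undefined>): string[] {
  // ACCESS_TOKENS is documented as comma/space separated.
  const tokens = (env.ACCESS_TOKENS ?? "")
    .split(/[\s,]+/)
    .filter((token) => token.length > 0);
  // API_KEY is documented as adding one more static bearer token.
  const apiKey = env.API_KEY?.trim();
  if (apiKey) tokens.push(apiKey);
  // Assumption: duplicates are dropped, keeping first-seen order.
  return Array.from(new Set(tokens));
}

// Enforce the documented rule that static mode needs at least one token.
function assertStaticAuthConfigured(env: Record<string, string | undefined>): string[] {
  const tokens = collectStaticTokens(env);
  if (tokens.length === 0) {
    throw new Error("Static auth requires ACCESS_TOKENS or API_KEY");
  }
  return tokens;
}
```

With `ACCESS_TOKENS="a, b  c"` and `API_KEY="k"`, this sketch yields four accepted tokens. With neither variable set it throws before the server would start.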
307
127
 
308
- **Windows (PowerShell):**
128
+ #### OAuth (HTTP mode)
309
129
 
310
- ```powershell
311
- $env:API_KEY = "supersecret"
312
- npx -y @j0hanz/superfetch@latest
313
- ```
130
+ Required when `AUTH_MODE=oauth` (or auto-selected by OAuth URLs):
314
131
 
315
- For multiple static tokens, set `ACCESS_TOKENS` (comma/space separated).
132
+ | Variable | Default | Description |
133
+ | ------------------------- | ------- | ----------------------- |
134
+ | `OAUTH_ISSUER_URL` | - | OAuth issuer. |
135
+ | `OAUTH_AUTHORIZATION_URL` | - | Authorization endpoint. |
136
+ | `OAUTH_TOKEN_URL` | - | Token endpoint. |
137
+ | `OAUTH_INTROSPECTION_URL` | - | Introspection endpoint. |
316
138
 
317
- Auth is required for `/mcp` and `/mcp/downloads` via `Authorization: Bearer <token>` (static mode also accepts `X-API-Key`).
139
+ Optional:
318
140
 
319
- Endpoints:
141
+ | Variable | Default | Description |
142
+ | -------------------------------- | -------------------------- | ---------------------------------------- |
143
+ | `OAUTH_REVOCATION_URL` | - | Revocation endpoint. |
144
+ | `OAUTH_REGISTRATION_URL` | - | Dynamic client registration endpoint. |
145
+ | `OAUTH_RESOURCE_URL` | `http://<host>:<port>/mcp` | Protected resource URL. |
146
+ | `OAUTH_REQUIRED_SCOPES` | (empty) | Required scopes (comma/space separated). |
147
+ | `OAUTH_CLIENT_ID` | - | Client ID for introspection. |
148
+ | `OAUTH_CLIENT_SECRET` | - | Client secret for introspection. |
149
+ | `OAUTH_INTROSPECTION_TIMEOUT_MS` | `5000` | Introspection timeout (1000-30000). |
320
150
 
321
- - `GET /health` (no auth; returns status, name, version, uptime)
322
- - `POST /mcp` (auth required)
323
- - `GET /mcp` (auth required; SSE stream; requires `Accept: text/event-stream`)
324
- - `DELETE /mcp` (auth required)
325
- - `GET /mcp/downloads/:namespace/:hash` (auth required)
151
+ ### HTTP mode endpoints
326
152
 
327
- Sessions are managed via the `mcp-session-id` header (see [HTTP Mode Details](#http-mode-details)).
153
+ | Method | Path | Auth | Notes |
154
+ | ------ | --------------------------------- | ---- | -------------------------------------------------- |
155
+ | GET | `/health` | No | Health check. |
156
+ | POST | `/mcp` | Yes | Streamable HTTP JSON-RPC requests. |
157
+ | GET | `/mcp` | Yes | SSE stream (requires `Accept: text/event-stream`). |
158
+ | DELETE | `/mcp` | Yes | Close the session. |
159
+ | GET | `/mcp/downloads/:namespace/:hash` | Yes | Download cached markdown. |
328
160
 
329
- </details>
161
+ Sessions are managed via the `mcp-session-id` header. A `POST /mcp` `initialize` request creates a session and returns the session id.
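Reviewer note on the line above: the session handshake can be sketched from the client side. The helper names, the client name, and the `Accept` value are assumptions. The sketch also assumes the session id comes back in the `mcp-session-id` response header, as the 2.2.1 README stated. The protocol version is left as a caller-supplied parameter because the new README does not name one.

```typescript
interface InitializeRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: build the first POST /mcp request.
// No mcp-session-id header is sent yet, because the session does not exist.
function buildInitializeRequest(
  baseUrl: string,
  token: string,
  protocolVersion: string,
): InitializeRequest {
  return {
    url: baseUrl.replace(/\/+$/, "") + "/mcp",
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
      // Assumption: the client accepts both JSON and SSE responses.
      Accept: "application/json, text/event-stream",
    },
    body: JSON.stringify({
      jsonrpc: "2.0",
      id: 1,
      method: "initialize",
      params: {
        protocolVersion,
        capabilities: {},
        clientInfo: { name: "example-client", version: "0.0.0" },
      },
    }),
  };
}

// Attach the session id returned by the server to every later request.
function withSession(
  headers: Record<string, string>,
  sessionId: string,
): Record<string, string> {
  return { ...headers, "mcp-session-id": sessionId };
}
```

A client would send the built request with `fetch(req.url, req)`, read the session id from the response, and pass later headers through `withSession` for subsequent `POST`, `GET`, and `DELETE` calls to `/mcp`.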
330
162
 
331
- ---
163
+ ## API Reference
332
164
 
333
- ## Available Tools
165
+ ### Tools
334
166
 
335
- ### Tool Response Notes
167
+ #### `fetch-url`
336
168
 
337
- The tool returns `structuredContent` with `url`, `inputUrl`, `resolvedUrl`, optional `title`, and `markdown` when inline content is available. `resolvedUrl` may differ from `inputUrl` when the URL is rewritten to raw content (GitHub/GitLab/Bitbucket/Gist). On errors, `error` is included instead of content.
169
+ Fetches a webpage and converts it to clean Markdown.
338
170
 
339
- The response includes:
171
+ ##### Parameters
340
172
 
341
- - a `text` block containing JSON of `structuredContent`
342
- - a `resource` block embedding markdown when inline content is available (stdio always embeds full markdown; HTTP embeds inline markdown when it fits or when truncated)
343
- - when content exceeds the inline limit and cache is enabled, a `resource_link` block pointing to `superfetch://cache/...` (stdio mode still embeds full markdown; HTTP mode omits embedded markdown)
344
- - error responses set `isError: true` and return `structuredContent` with `error` and `url`
173
+ | Name | Type | Required | Default | Description |
174
+ | ----- | ------ | -------- | ------- | ------------------------------------ |
175
+ | `url` | string | Yes | - | Public http(s) URL, max length 2048. |
345
176
 
346
- ---
177
+ ##### Returns
347
178
 
348
- ### `fetch-url`
179
+ `structuredContent` fields:
349
180
 
350
- Fetches a webpage and converts it to clean Markdown format with a metadata footer for HTML (raw markdown is preserved with source injection).
181
+ - `url` (string): fetched URL
182
+ - `inputUrl` (string, optional): original input URL
183
+ - `resolvedUrl` (string, optional): normalized or raw-content URL
184
+ - `title` (string, optional): page title
185
+ - `markdown` (string, optional): markdown content (inline when available)
186
+ - `error` (string, optional): error message on failure
351
187
 
352
- | Parameter | Type | Default | Description |
353
- | --------- | ------ | -------- | ------------ |
354
- | `url` | string | required | URL to fetch |
355
-
356
- **Example `structuredContent`:**
188
+ ##### Example success
357
189
 
358
190
  ```json
359
191
  {
360
192
  "url": "https://example.com/docs",
361
193
  "inputUrl": "https://example.com/docs",
362
194
  "resolvedUrl": "https://example.com/docs",
363
- "title": "Documentation",
364
- "markdown": "---\ntitle: Documentation\n---\n\n# Getting Started\n\nWelcome..."
195
+ "title": "Example Docs",
196
+ "markdown": "# Getting Started\n\n..."
365
197
  }
366
198
  ```
367
199
 
368
- **Error response:**
200
+ ##### Example error
369
201
 
370
202
  ```json
371
203
  {
372
- "url": "https://example.com/broken",
373
- "error": "Failed to fetch: 404 Not Found"
204
+ "url": "https://example.com/404",
205
+ "error": "Failed to fetch URL: 404 Not Found"
374
206
  }
375
207
  ```
376
208
 
377
- ---
378
-
379
- ### Large Content Handling
380
-
381
- - Inline markdown is capped at 20,000 characters (`maxInlineContentChars`).
382
- - **Stdio mode:** full markdown is embedded as a `resource` block; if cache is enabled and content exceeds the inline limit, a `resource_link` is still included.
383
- - **HTTP mode:** if content exceeds the inline limit and cache is enabled, the response includes a `resource_link` to `superfetch://cache/...` and omits embedded markdown. If cache is disabled, the inline markdown is truncated with `...[truncated]`.
384
- - Upstream fetch size is capped at 10 MB of HTML; larger responses fail.
209
+ ##### Large content handling
385
210
 
386
- ---
211
+ - Inline markdown is capped at 20,000 characters.
212
+ - When content exceeds the inline limit and cache is enabled, responses include a `resource_link` to `superfetch://cache/markdown/{urlHash}`.
213
+ - If cache is disabled, inline content is truncated with `...[truncated]`.
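Reviewer note on the three bullets above: the decision they describe can be sketched as a single function. `selectInlineContent` and the `InlineResult` type are hypothetical. Counting the marker toward the 20,000-character cap is an assumption, and so is returning only a link in the cached case; the new README does not say whether markdown is also embedded there.

```typescript
const MAX_INLINE_CHARS = 20000;
const TRUNCATION_MARKER = "...[truncated]";

type InlineResult =
  | { kind: "inline"; markdown: string }
  | { kind: "resource_link"; uri: string }
  | { kind: "truncated"; markdown: string };

// Hypothetical sketch of the large-content decision described above.
function selectInlineContent(
  markdown: string,
  cacheEnabled: boolean,
  urlHash: string,
): InlineResult {
  // Small documents are returned inline unchanged.
  if (markdown.length <= MAX_INLINE_CHARS) {
    return { kind: "inline", markdown };
  }
  // Large documents with cache enabled are referenced by resource URI.
  if (cacheEnabled) {
    return { kind: "resource_link", uri: `superfetch://cache/markdown/${urlHash}` };
  }
  // Assumption: the marker counts toward the inline cap.
  const keep = MAX_INLINE_CHARS - TRUNCATION_MARKER.length;
  return { kind: "truncated", markdown: markdown.slice(0, keep) + TRUNCATION_MARKER };
}
```

Modeling the three outcomes as a tagged union makes callers handle the link and truncation cases explicitly instead of checking string suffixes.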
387
214
 
388
- ## Resources
215
+ ### Resources
389
216
 
390
- | URI | Description |
391
- | ------------------------------------------ | ---------------------------------------------- |
392
- | `superfetch://cache/{namespace}/{urlHash}` | Cached content entry (`namespace`: `markdown`) |
217
+ | URI pattern | Description | MIME type |
218
+ | --------------------------------------- | ------------------------------ | --------------- |
219
+ | `superfetch://cache/markdown/{urlHash}` | Cached markdown content entry. | `text/markdown` |
220
+ | `internal://instructions` | Server usage instructions. | `text/markdown` |
393
221
 
394
- Resource listings enumerate cached entries, and subscriptions notify clients when cache entries update.
222
+ ### Prompts
395
223
 
396
- ---
224
+ No prompts are registered in this server.
397
225
 
398
- ## Download Endpoint (HTTP Mode)
226
+ ## Client Configuration Examples
399
227
 
400
- When running in HTTP mode, cached content can be downloaded directly. Downloads are available only when cache is enabled.
228
+ <details>
229
+ <summary><strong>VS Code</strong></summary>
401
230
 
402
- ### Endpoint
231
+ Add to .vscode/mcp.json:
403
232
 
404
- ```text
405
- GET /mcp/downloads/:namespace/:hash
233
+ ```json
234
+ {
235
+ "servers": {
236
+ "superFetch": {
237
+ "command": "npx",
238
+ "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"]
239
+ }
240
+ }
241
+ }
406
242
  ```
407
243
 
408
- - `namespace`: `markdown`
409
- - Auth required (`Authorization: Bearer <token>`; in static token mode, `X-API-Key` is accepted)
410
-
411
- ### Response Headers
244
+ </details>
412
245
 
413
- | Header | Value |
414
- | --------------------- | ------------------------------- |
415
- | `Content-Type` | `text/markdown; charset=utf-8` |
416
- | `Content-Disposition` | `attachment; filename="<name>"` |
417
- | `Cache-Control` | `private, max-age=<CACHE_TTL>` |
246
+ <details>
247
+ <summary><strong>Claude Desktop</strong></summary>
418
248
 
419
- ### Example Usage
249
+ Add to claude_desktop_config.json:
420
250
 
421
- ```bash
422
- curl -H "Authorization: Bearer $TOKEN" \
423
- http://localhost:3000/mcp/downloads/markdown/abc123.def456 \
424
- -o article.md
251
+ ```json
252
+ {
253
+ "mcpServers": {
254
+ "superFetch": {
255
+ "command": "npx",
256
+ "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"]
257
+ }
258
+ }
259
+ }
425
260
  ```
426
261
 
427
- ### Error Responses
428
-
429
- | Status | Code | Description |
430
- | ------ | --------------------- | -------------------------------- |
431
- | 400 | `BAD_REQUEST` | Invalid namespace or hash format |
432
- | 404 | `NOT_FOUND` | Content not found or expired |
433
- | 503 | `SERVICE_UNAVAILABLE` | Download service disabled |
434
-
435
- ---
436
-
437
- ## Configuration
438
-
439
- Set environment variables in your MCP client `env` or in the shell before starting the server.
440
-
441
- ### Core Server Settings
442
-
443
- | Variable | Default | Description |
444
- | --------------------------- | -------------------- | ---------------------------------------------------------- |
445
- | `HOST` | `127.0.0.1` | HTTP bind address |
446
- | `PORT` | `3000` | HTTP server port (1024-65535) |
447
- | `USER_AGENT` | `superFetch-MCP/2.0` | User-Agent header for outgoing requests |
448
- | `CACHE_ENABLED` | `true` | Enable response caching |
449
- | `CACHE_TTL` | `3600` | Cache TTL in seconds (60-86400) |
450
- | `LOG_LEVEL` | `info` | Logging level (`debug` enables verbose logs) |
451
- | `ALLOW_REMOTE` | `false` | Allow non-loopback binds (OAuth required) |
452
- | `ALLOWED_HOSTS` | (empty) | Additional allowed Host/Origin values |
453
- | `TRANSFORM_TIMEOUT_MS` | `30000` | Worker transform timeout in ms (5000-120000) |
454
- | `TOOL_TIMEOUT_MS` | `50000` | Overall tool timeout in ms (1000-300000) |
455
- | `TRANSFORM_METADATA_FORMAT` | `markdown` | Raw markdown metadata format (`markdown` or `frontmatter`) |
456
-
457
- For HTTP server tuning (`SERVER_HEADERS_TIMEOUT_MS`, `SERVER_REQUEST_TIMEOUT_MS`, `SERVER_KEEP_ALIVE_TIMEOUT_MS`, `SERVER_SHUTDOWN_CLOSE_IDLE`, `SERVER_SHUTDOWN_CLOSE_ALL`), see `CONFIGURATION.md`.
458
-
459
- ### Auth (HTTP Mode)
460
-
461
- | Variable | Default | Description |
462
- | --------------- | ------- | ---------------------------------------------------------------------------------------------------------------------------------------- |
463
- | `AUTH_MODE` | auto | `static` or `oauth`. Auto-selects OAuth if OAUTH_ISSUER_URL, OAUTH_AUTHORIZATION_URL, OAUTH_TOKEN_URL, or OAUTH_INTROSPECTION_URL is set |
464
- | `ACCESS_TOKENS` | (empty) | Comma/space-separated static bearer tokens |
465
- | `API_KEY` | (empty) | Adds a static bearer token and enables `X-API-Key` header |
466
-
467
- Static mode requires at least one token (`ACCESS_TOKENS` or `API_KEY`).
-
- ### OAuth (HTTP Mode)
-
- Required when `AUTH_MODE=oauth` (or auto-selected by presence of OAuth URLs):
-
- | Variable | Default | Description |
- | ------------------------- | ------- | ---------------------- |
- | `OAUTH_ISSUER_URL` | - | OAuth issuer |
- | `OAUTH_AUTHORIZATION_URL` | - | Authorization endpoint |
- | `OAUTH_TOKEN_URL` | - | Token endpoint |
- | `OAUTH_INTROSPECTION_URL` | - | Introspection endpoint |
-
- Optional:
-
- | Variable | Default | Description |
- | -------------------------------- | -------------------------- | --------------------------------------- |
- | `OAUTH_REVOCATION_URL` | - | Revocation endpoint |
- | `OAUTH_REGISTRATION_URL` | - | Dynamic client registration endpoint |
- | `OAUTH_RESOURCE_URL` | `http://<host>:<port>/mcp` | Protected resource URL |
- | `OAUTH_REQUIRED_SCOPES` | (empty) | Required scopes (comma/space separated) |
- | `OAUTH_CLIENT_ID` | - | Client ID for introspection |
- | `OAUTH_CLIENT_SECRET` | - | Client secret for introspection |
- | `OAUTH_INTROSPECTION_TIMEOUT_MS` | `5000` | Introspection timeout (1000-30000) |
-
- ### Fixed Limits (Not Configurable via env)
-
- - Fetch timeout: 15 seconds
- - Max redirects: 5
- - Max HTML response size: 10 MB
- - Inline markdown limit: 20,000 characters
- - Cache max entries: 100
- - Session TTL: 30 minutes
- - Session init timeout: 10 seconds
- - Max sessions: 200
- - Rate limit: 100 req/min per IP (60s window)
-
- See `CONFIGURATION.md` for preset examples and quick-start snippets.
-
- ---
-
- ## HTTP Mode Details
-
- HTTP mode uses the MCP Streamable HTTP transport. The workflow is:
-
- 1. `POST /mcp` with an `initialize` request and **no** `mcp-session-id` header.
- 2. The server returns `mcp-session-id` in the response headers.
- 3. Use that header for subsequent `POST /mcp`, `GET /mcp`, and `DELETE /mcp` requests.
-
- If the `mcp-protocol-version` header is missing, the server rejects the request. Only `mcp-protocol-version: 2025-11-25` is supported.
+ </details>
 
- `GET /mcp` and `DELETE /mcp` require `mcp-session-id`. `POST /mcp` without an `initialize` request will return 400.
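
The session workflow in the removed text can be illustrated from the client side. The sketch below only builds request headers. The header names, the protocol version, and the `Accept` values come from the removed README text, while `buildMcpHeaders` and its options are hypothetical names introduced for illustration, and sending the static token as `Authorization: Bearer` is an assumption based on the "static bearer tokens" wording.

```typescript
// Hypothetical client-side helper; not part of the package.
type McpMethod = "POST" | "GET" | "DELETE";

interface McpHeaderOptions {
  sessionId?: string; // value of the mcp-session-id response header
  bearerToken?: string; // static token, assumed to be sent as a Bearer credential
}

const MCP_PROTOCOL_VERSION = "2025-11-25";

function buildMcpHeaders(
  method: McpMethod,
  options: McpHeaderOptions = {},
): Record<string, string> {
  const headers: Record<string, string> = {
    "mcp-protocol-version": MCP_PROTOCOL_VERSION,
  };
  if (method === "POST") {
    headers["content-type"] = "application/json";
    headers["accept"] = "application/json, text/event-stream";
  } else if (method === "GET") {
    headers["accept"] = "text/event-stream";
  }
  if (options.sessionId !== undefined) {
    headers["mcp-session-id"] = options.sessionId;
  } else if (method !== "POST") {
    // Only the initial POST (the initialize request) may omit the session id.
    throw new Error(`${method} /mcp requires an mcp-session-id header`);
  }
  if (options.bearerToken !== undefined) {
    headers["authorization"] = `Bearer ${options.bearerToken}`;
  }
  return headers;
}
```

A client would call `buildMcpHeaders("POST")` for the `initialize` request, read `mcp-session-id` from the response, and pass it as `sessionId` on every later call.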
+ <details>
+ <summary><strong>Cursor</strong></summary>
 
- Additional HTTP transport notes:
+ ```json
+ {
+   "mcpServers": {
+     "superFetch": {
+       "command": "npx",
+       "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"]
+     }
+   }
+ }
+ ```
 
- - `POST /mcp` should advertise `Accept: application/json, text/event-stream` (the server normalizes missing or `*/*` Accept headers).
- - `GET /mcp` requires `Accept: text/event-stream` (otherwise 406).
- - JSON-RPC batch requests are not supported (400).
+ </details>
 
- If the server reaches its session cap (200), it evicts the oldest session when possible; otherwise it returns a 503.
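
The session-cap behavior described in the removed line above can be sketched as a small store. The README does not say what makes eviction impossible, so this sketch assumes a hypothetical `evictable` flag on each session. `SessionStore`, `admit`, and the numeric return values are illustrative names only.

```typescript
// Hypothetical session store; the evictable flag is an assumption.
interface Session {
  id: string;
  createdAt: number; // epoch milliseconds
  evictable: boolean;
}

class SessionStore {
  private readonly sessions = new Map<string, Session>();
  private readonly maxSessions: number;

  constructor(maxSessions = 200) {
    this.maxSessions = maxSessions;
  }

  // Returns 200 when the session is stored, 503 when the cap cannot be relieved.
  admit(session: Session): 200 | 503 {
    if (this.sessions.size >= this.maxSessions && !this.evictOldest()) {
      return 503;
    }
    this.sessions.set(session.id, session);
    return 200;
  }

  has(id: string): boolean {
    return this.sessions.has(id);
  }

  get size(): number {
    return this.sessions.size;
  }

  // Removes the evictable session with the smallest createdAt, if any.
  private evictOldest(): boolean {
    let oldest: Session | undefined;
    for (const candidate of this.sessions.values()) {
      if (!candidate.evictable) continue;
      if (oldest === undefined || candidate.createdAt < oldest.createdAt) {
        oldest = candidate;
      }
    }
    if (oldest === undefined) return false;
    this.sessions.delete(oldest.id);
    return true;
  }
}
```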
+ <details>
+ <summary><strong>Windsurf</strong></summary>
 
- Host and Origin headers are always validated. Allowed values include loopback hosts, the configured `HOST` (if not a wildcard), and any entries in `ALLOWED_HOSTS`. When binding to `0.0.0.0` or `::`, set `ALLOWED_HOSTS` to the hostnames clients will send.
+ ```json
+ {
+   "mcpServers": {
+     "superFetch": {
+       "command": "npx",
+       "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"]
+     }
+   }
+ }
+ ```
 
- ---
+ </details>
 
  ## Security
 
- ### SSRF Protection
-
- Blocked destinations include:
-
- - Loopback and unspecified addresses (`127.0.0.0/8`, `::1`, `0.0.0.0`, `::`)
- - Private/ULA ranges (`10.0.0.0/8`, `172.16.0.0/12`, `192.168.0.0/16`, `fc00::/7`)
- - Link-local and shared address space (`169.254.0.0/16`, `100.64.0.0/10`, `fe80::/10`)
- - Multicast/reserved ranges (`224.0.0.0/4`, `240.0.0.0/4`, `ff00::/8`)
- - IPv6 transition ranges (`64:ff9b::/96`, `64:ff9b:1::/48`, `2001::/32`, `2002::/16`)
- - Cloud metadata endpoints (AWS/GCP/Azure/Alibaba) like `169.254.169.254`, `metadata.google.internal`, `metadata.azure.com`, `100.100.100.200`, `instance-data`
- - Internal suffixes such as `.local` and `.internal`
-
- DNS resolution is performed and blocked if any resolved IP matches a blocked range.
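
A range check of the kind described above can be sketched for IPv4 only. The CIDR list is copied from the removed bullets (with `0.0.0.0` written as a `/32`). The function names are hypothetical, IPv6 ranges and hostname-based rules are left out, and treating unparsable input as blocked is an assumption rather than documented behavior.

```typescript
// IPv4-only illustration; not the package's implementation.
const BLOCKED_IPV4_CIDRS = [
  "127.0.0.0/8",
  "0.0.0.0/32",
  "10.0.0.0/8",
  "172.16.0.0/12",
  "192.168.0.0/16",
  "169.254.0.0/16",
  "100.64.0.0/10",
  "224.0.0.0/4",
  "240.0.0.0/4",
];

// Converts dotted-quad text to an unsigned 32-bit number, or null if malformed.
function ipv4ToInt(ip: string): number | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    value = value * 256 + octet;
  }
  return value;
}

// Two addresses share a CIDR block when they fall in the same aligned bucket.
function inCidr(ip: number, cidr: string): boolean {
  const [base = "", bitsText = ""] = cidr.split("/");
  const baseInt = ipv4ToInt(base);
  if (baseInt === null) return false;
  const blockSize = 2 ** (32 - Number(bitsText));
  return Math.floor(ip / blockSize) === Math.floor(baseInt / blockSize);
}

function isBlockedIpv4(ip: string): boolean {
  const value = ipv4ToInt(ip);
  if (value === null) return true; // fail closed on malformed input (assumption)
  return BLOCKED_IPV4_CIDRS.some((cidr) => inCidr(value, cidr));
}

// A hostname is rejected if any of its resolved addresses is blocked.
function shouldBlockResolved(addresses: string[]): boolean {
  return addresses.some(isBlockedIpv4);
}
```

Checking every resolved address, rather than only the first, matters because a hostname can resolve to a mix of public and private addresses.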
-
- ### URL Validation
-
- - Only `http` and `https` URLs
- - No embedded credentials in URLs
- - Max URL length: 2048 characters
- - Hostnames ending in `.local` or `.internal` are rejected
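
The four URL rules listed above map directly onto the standard `URL` class. The sketch below is one possible reading of them: `validateFetchUrl` is a hypothetical name, and ignoring a trailing dot on the hostname before the suffix check is an added assumption.

```typescript
// Hypothetical validator built on the WHATWG URL class.
const MAX_URL_LENGTH = 2048;
const BLOCKED_SUFFIXES = [".local", ".internal"];

function validateFetchUrl(raw: string): URL {
  if (raw.length > MAX_URL_LENGTH) {
    throw new Error("URL exceeds 2048 characters");
  }
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    throw new Error("URL could not be parsed");
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new Error("Only http and https URLs are allowed");
  }
  if (url.username !== "" || url.password !== "") {
    throw new Error("Embedded credentials are not allowed");
  }
  // Lower-case and drop a trailing dot so "printer.local." is also caught.
  const host = url.hostname.toLowerCase().replace(/\.$/, "");
  if (BLOCKED_SUFFIXES.some((suffix) => host.endsWith(suffix))) {
    throw new Error("Internal hostnames are not allowed");
  }
  return url;
}
```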
-
- ### Host/Origin Validation (HTTP Mode)
+ - Stdio logs are written to stderr (stdout is reserved for MCP traffic).
+ - HTTP mode validates Host and Origin headers against allowed hosts.
+ - HTTP mode requires `MCP-Protocol-Version: 2025-11-25`.
+ - Auth is required for HTTP mode (static tokens or OAuth).
+ - SSRF protections block private IP ranges and common metadata endpoints.
+ - Rate limiting: 100 requests/minute per IP (60s window) for HTTP routes.
 
- - Host header must match loopback, configured `HOST` (if not a wildcard), or `ALLOWED_HOSTS`
- - Origin header (when present) is validated against the same allow-list
-
- ### Rate Limiting
-
- Rate limiting applies to `/mcp` and `/mcp/downloads` (100 req/min per IP, 60s window). OPTIONS requests are not rate-limited.
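
The stated limit (100 requests per IP in a 60-second window, with OPTIONS exempt) could be enforced in several ways. The README does not say whether the window is fixed or sliding, so the sketch below assumes a simple fixed-window counter. `FixedWindowRateLimiter` and `allow` are hypothetical names.

```typescript
// Fixed-window counter; the windowing strategy is an assumption.
interface WindowEntry {
  windowStart: number; // epoch milliseconds when the current window began
  count: number;
}

class FixedWindowRateLimiter {
  private readonly hits = new Map<string, WindowEntry>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit = 100, windowMs = 60_000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true when the request may proceed.
  allow(ip: string, method: string, now: number = Date.now()): boolean {
    if (method.toUpperCase() === "OPTIONS") return true; // not rate-limited
    const entry = this.hits.get(ip);
    if (entry === undefined || now - entry.windowStart >= this.windowMs) {
      this.hits.set(ip, { windowStart: now, count: 1 });
      return true;
    }
    entry.count += 1;
    return entry.count <= this.limit;
  }
}
```

Taking `now` as a parameter keeps the limiter deterministic under test. A real deployment would also need to prune stale entries, which this sketch omits.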
+ ## Development
 
- ---
+ ### Prerequisites
 
- ## Development
+ - Node.js >= 20.18.1
+ - npm
 
  ### Scripts
 
- | Command | Description |
- | ----------------------- | ------------------------------------ |
- | `npm run dev` | Development server with hot reload |
- | `npm run build` | Compile TypeScript |
- | `npm start` | Production server |
- | `npm run lint` | Run ESLint |
- | `npm run lint:fix` | Auto-fix lint issues |
- | `npm run type-check` | TypeScript type checking |
- | `npm run format` | Format with Prettier |
- | `npm test` | Run Node test runner (builds dist) |
- | `npm run test:coverage` | Run tests with experimental coverage |
- | `npm run knip` | Find unused exports/dependencies |
- | `npm run knip:fix` | Auto-fix unused code |
- | `npm run inspector` | Launch MCP Inspector |
-
- > **Note:** Tests run via `node --test` with `--experimental-transform-types` to execute `.ts` test files. Node will emit an experimental warning.
-
- ### Tech Stack
-
- | Category | Technology |
- | ------------------ | --------------------------------- |
- | Runtime | Node.js >=20.18.1 |
- | Language | TypeScript 5.9 |
- | MCP SDK | @modelcontextprotocol/sdk ^1.25.2 |
- | Content Extraction | @mozilla/readability ^0.6.0 |
- | HTML Parsing | linkedom ^0.18.12 |
- | Markdown | node-html-markdown ^2.0.0 |
- | HTTP | Express ^5.2.1, undici ^7.18.2 |
- | Validation | Zod ^4.3.5 |
-
- ---
-
- ## Contributing
-
- 1. Fork the repository
- 2. Create a feature branch: `git checkout -b feature/amazing-feature`
- 3. Ensure linting passes: `npm run lint`
- 4. Run tests: `npm test`
- 5. Commit changes: `git commit -m 'Add amazing feature'`
- 6. Push: `git push origin feature/amazing-feature`
- 7. Open a Pull Request
-
- For examples of other MCP servers, see: [github.com/modelcontextprotocol/servers](https://github.com/modelcontextprotocol/servers)
+ | Script | Command | Purpose |
+ | ---------------------- | ----------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------- |
+ | clean | `node scripts/clean.mjs` | Remove build artifacts. |
+ | validate:instructions | `node scripts/validate-instructions.mjs` | Validate embedded instructions. |
+ | build | `npm run clean && tsc -p tsconfig.json && npm run validate:instructions && npm run copy:assets && node scripts/make-executable.mjs` | Build the server. |
+ | copy:assets | `node scripts/copy-assets.mjs` | Copy static assets. |
+ | prepare | `npm run build` | Prepare package for publishing. |
+ | dev | `tsc --watch --preserveWatchOutput` | TypeScript watch mode. |
+ | dev:run | `node --watch dist/index.js` | Run compiled server in watch mode. |
+ | start | `node dist/index.js` | Start HTTP server (default). |
+ | format | `prettier --write .` | Format codebase. |
+ | type-check | `tsc --noEmit` | Type checking. |
+ | type-check:diagnostics | `tsc --noEmit --extendedDiagnostics` | Type check diagnostics. |
+ | type-check:trace | `tsc --noEmit --generateTrace .ts-trace` | Generate TS trace. |
+ | lint | `eslint .` | Lint. |
+ | lint:fix | `eslint . --fix` | Lint and fix. |
+ | test | `npm run build --silent && node --test --experimental-transform-types` | Run tests (builds first). |
+ | test:coverage | `npm run build --silent && node --test --experimental-transform-types --experimental-test-coverage` | Test with coverage. |
+ | knip | `knip` | Dead code analysis. |
+ | knip:fix | `knip --fix` | Fix knip issues. |
+ | inspector | `npx @modelcontextprotocol/inspector` | MCP Inspector. |
+ | prepublishOnly | `npm run lint && npm run type-check && npm run build` | Prepublish checks. |
+
+ ### Project structure
+
+ ```text
+ superFetch
+ ├── docs
+ │   └── logo.png
+ ├── src
+ │   ├── workers
+ │   ├── cache.ts
+ │   ├── config.ts
+ │   ├── fetch.ts
+ │   ├── http-native.ts
+ │   ├── http-utils.ts
+ │   ├── index.ts
+ │   ├── instructions.md
+ │   ├── mcp.ts
+ │   ├── tools.ts
+ │   ├── transform.ts
+ │   └── ...
+ ├── tests
+ │   └── *.test.ts
+ ├── CONFIGURATION.md
+ ├── package.json
+ └── tsconfig.json
+ ```
 
  <!-- markdownlint-enable MD033 -->