@j0hanz/superfetch 2.2.2 → 2.3.0
- package/README.md +363 -363
- package/dist/cache.d.ts +0 -1
- package/dist/cache.js +13 -25
- package/dist/config.d.ts +0 -1
- package/dist/config.js +9 -7
- package/dist/crypto.d.ts +0 -1
- package/dist/crypto.js +0 -1
- package/dist/dom-noise-removal.d.ts +0 -1
- package/dist/dom-noise-removal.js +35 -32
- package/dist/errors.d.ts +0 -1
- package/dist/errors.js +0 -1
- package/dist/fetch.d.ts +0 -1
- package/dist/fetch.js +45 -29
- package/dist/host-normalization.d.ts +1 -0
- package/dist/host-normalization.js +47 -0
- package/dist/http-native.d.ts +0 -1
- package/dist/http-native.js +73 -25
- package/dist/index.d.ts +0 -1
- package/dist/index.js +0 -1
- package/dist/instructions.md +41 -41
- package/dist/json.d.ts +0 -1
- package/dist/json.js +0 -1
- package/dist/language-detection.d.ts +0 -1
- package/dist/language-detection.js +10 -2
- package/dist/markdown-cleanup.d.ts +0 -1
- package/dist/markdown-cleanup.js +10 -10
- package/dist/mcp-validator.d.ts +14 -0
- package/dist/mcp-validator.js +22 -0
- package/dist/mcp.d.ts +0 -1
- package/dist/mcp.js +0 -1
- package/dist/observability.d.ts +0 -1
- package/dist/observability.js +5 -3
- package/dist/server-tuning.d.ts +9 -0
- package/dist/server-tuning.js +30 -0
- package/dist/{http-utils.d.ts → session.d.ts} +0 -25
- package/dist/{http-utils.js → session.js} +11 -104
- package/dist/tools.d.ts +0 -1
- package/dist/tools.js +19 -29
- package/dist/transform-types.d.ts +0 -1
- package/dist/transform-types.js +0 -1
- package/dist/transform.d.ts +0 -1
- package/dist/transform.js +85 -79
- package/dist/type-guards.d.ts +0 -1
- package/dist/type-guards.js +0 -1
- package/dist/workers/transform-worker.d.ts +0 -1
- package/dist/workers/transform-worker.js +29 -19
- package/package.json +85 -85
- package/dist/cache.d.ts.map +0 -1
- package/dist/cache.js.map +0 -1
- package/dist/config.d.ts.map +0 -1
- package/dist/config.js.map +0 -1
- package/dist/crypto.d.ts.map +0 -1
- package/dist/crypto.js.map +0 -1
- package/dist/dom-noise-removal.d.ts.map +0 -1
- package/dist/dom-noise-removal.js.map +0 -1
- package/dist/errors.d.ts.map +0 -1
- package/dist/errors.js.map +0 -1
- package/dist/fetch.d.ts.map +0 -1
- package/dist/fetch.js.map +0 -1
- package/dist/http-native.d.ts.map +0 -1
- package/dist/http-native.js.map +0 -1
- package/dist/http-utils.d.ts.map +0 -1
- package/dist/http-utils.js.map +0 -1
- package/dist/index.d.ts.map +0 -1
- package/dist/index.js.map +0 -1
- package/dist/json.d.ts.map +0 -1
- package/dist/json.js.map +0 -1
- package/dist/language-detection.d.ts.map +0 -1
- package/dist/language-detection.js.map +0 -1
- package/dist/markdown-cleanup.d.ts.map +0 -1
- package/dist/markdown-cleanup.js.map +0 -1
- package/dist/mcp.d.ts.map +0 -1
- package/dist/mcp.js.map +0 -1
- package/dist/observability.d.ts.map +0 -1
- package/dist/observability.js.map +0 -1
- package/dist/tools.d.ts.map +0 -1
- package/dist/tools.js.map +0 -1
- package/dist/transform-types.d.ts.map +0 -1
- package/dist/transform-types.js.map +0 -1
- package/dist/transform.d.ts.map +0 -1
- package/dist/transform.js.map +0 -1
- package/dist/type-guards.d.ts.map +0 -1
- package/dist/type-guards.js.map +0 -1
- package/dist/workers/transform-worker.d.ts.map +0 -1
- package/dist/workers/transform-worker.js.map +0 -1
package/README.md
CHANGED
|
@@ -1,363 +1,363 @@
|
|
|
1
|
-
<!-- markdownlint-disable MD033 -->
|
|
2
|
-
|
|
3
|
-
# superFetch MCP Server
|
|
4
|
-
|
|
5
|
-
Intelligent web content fetcher MCP server that converts HTML to clean, AI-readable Markdown.
|
|
6
|
-
|
|
7
|
-
[](https://www.npmjs.com/package/@j0hanz/superfetch) [](https://www.npmjs.com/package/@j0hanz/superfetch) [](https://nodejs.org/) [](https://www.typescriptlang.org/) [](https://github.com/modelcontextprotocol/sdk)
|
|
8
|
-
|
|
9
|
-
<img src="docs/logo.png" alt="SuperFetch MCP Logo" width="300">
|
|
10
|
-
|
|
11
|
-
## One-Click Install
|
|
12
|
-
|
|
13
|
-
[](https://insiders.vscode.dev/redirect/mcp/install?name=superfetch&inputs=%5B%5D&config=%7B%22command%22%3A%22npx%22%2C%22args%22%3A%5B%22-y%22%2C%22%40j0hanz%2Fsuperfetch%40latest%22%2C%22--stdio%22%5D%7D)
|
|
14
|
-
[](https://insiders.vscode.dev/redirect/mcp/install?name=superfetch&inputs=%5B%5D&config=%7B%22command%22%3A%22npx%22%2C%22args%22%3A%5B%22-y%22%2C%22%40j0hanz%2Fsuperfetch%40latest%22%2C%22--stdio%22%5D%7D&quality=insiders)
|
|
15
|
-
|
|
16
|
-
[](https://cursor.com/install-mcp?name=superfetch&config=eyJjb21tYW5kIjoibnB4IiwiYXJncyI6WyIteSIsIkBqMGhhbnovc3VwZXJmZXRjaEBsYXRlc3QiLCItLXN0ZGlvIl19)
|
|
17
|
-
|
|
18
|
-
## Overview
|
|
19
|
-
|
|
20
|
-
| Feature | Details |
|
|
21
|
-
| -------------------- | -------------------------------------------------------------------------- |
|
|
22
|
-
| HTML → Markdown | Mozilla Readability + node-html-markdown pipeline with metadata injection. |
|
|
23
|
-
| Raw content handling | Rewrites supported GitHub/GitLab/Bitbucket/Gist URLs to raw content. |
|
|
24
|
-
| Caching + resources | LRU cache with resource listing and update notifications. |
|
|
25
|
-
| Transport | Stdio (local clients) and Streamable HTTP (self-hosted). |
|
|
26
|
-
| Safety | SSRF/IP blocklists, Host/Origin validation, auth for HTTP mode. |
|
|
27
|
-
|
|
28
|
-
### When to use
|
|
29
|
-
|
|
30
|
-
- You need clean, AI-friendly Markdown from public http(s) URLs.
|
|
31
|
-
- You want a single MCP tool that handles fetching, extraction, and caching.
|
|
32
|
-
- You need self-hosted HTTP with auth and session management.
|
|
33
|
-
|
|
34
|
-
## Quick Start
|
|
35
|
-
|
|
36
|
-
Recommended for MCP clients: stdio mode.
|
|
37
|
-
|
|
38
|
-
```bash
|
|
39
|
-
npx -y @j0hanz/superfetch@latest --stdio
|
|
40
|
-
```
|
|
41
|
-
|
|
42
|
-
Example MCP client configuration:
|
|
43
|
-
|
|
44
|
-
```json
|
|
45
|
-
{
|
|
46
|
-
"mcpServers": {
|
|
47
|
-
"superFetch": {
|
|
48
|
-
"command": "npx",
|
|
49
|
-
"args": ["-y", "@j0hanz/superfetch@latest", "--stdio"]
|
|
50
|
-
}
|
|
51
|
-
}
|
|
52
|
-
}
|
|
53
|
-
```
|
|
54
|
-
|
|
55
|
-
## Installation
|
|
56
|
-
|
|
57
|
-
### NPX (recommended)
|
|
58
|
-
|
|
59
|
-
```bash
|
|
60
|
-
npx -y @j0hanz/superfetch@latest --stdio
|
|
61
|
-
```
|
|
62
|
-
|
|
63
|
-
### Global install
|
|
64
|
-
|
|
65
|
-
```bash
|
|
66
|
-
npm install -g @j0hanz/superfetch
|
|
67
|
-
superfetch --stdio
|
|
68
|
-
```
|
|
69
|
-
|
|
70
|
-
### From source
|
|
71
|
-
|
|
72
|
-
```bash
|
|
73
|
-
git clone https://github.com/j0hanz/super-fetch-mcp-server.git
|
|
74
|
-
cd super-fetch-mcp-server
|
|
75
|
-
npm install
|
|
76
|
-
npm run build
|
|
77
|
-
node dist/index.js --stdio
|
|
78
|
-
```
|
|
79
|
-
|
|
80
|
-
## Configuration
|
|
81
|
-
|
|
82
|
-
### CLI arguments
|
|
83
|
-
|
|
84
|
-
| Argument | Type | Default | Description |
|
|
85
|
-
| --------- | ------- | ------- | ----------------------------------- |
|
|
86
|
-
| `--stdio` | boolean | false | Run in stdio mode (no HTTP server). |
|
|
87
|
-
|
|
88
|
-
### Environment variables
|
|
89
|
-
|
|
90
|
-
#### Core server settings
|
|
91
|
-
|
|
92
|
-
| Variable | Default | Description |
|
|
93
|
-
| ---------------------------------- | -------------------- | -------------------------------------------------------------- |
|
|
94
|
-
| `HOST` | `127.0.0.1` | HTTP bind address. |
|
|
95
|
-
| `PORT` | `3000` | HTTP server port (1024-65535, `0` for ephemeral). |
|
|
96
|
-
| `USER_AGENT` | `superFetch-MCP/2.0` | User-Agent header for outgoing requests. |
|
|
97
|
-
| `CACHE_ENABLED` | `true` | Enable response caching. |
|
|
98
|
-
| `CACHE_TTL` | `3600` | Cache TTL in seconds (60-86400). |
|
|
99
|
-
| `LOG_LEVEL` | `info` | Logging level (`debug` enables verbose logs). |
|
|
100
|
-
| `ALLOW_REMOTE` | `false` | Allow non-loopback binds (OAuth required). |
|
|
101
|
-
| `ALLOWED_HOSTS` | (empty) | Additional allowed Host/Origin values (comma/space separated). |
|
|
102
|
-
| `TRANSFORM_TIMEOUT_MS` | `30000` | Worker transform timeout in ms (5000-120000). |
|
|
103
|
-
| `TOOL_TIMEOUT_MS` | `50000` | Overall tool timeout in ms (1000-300000). |
|
|
104
|
-
| `TRANSFORM_METADATA_FORMAT` | `markdown` | Metadata format: `markdown` or `frontmatter`. |
|
|
105
|
-
| `SUPERFETCH_EXTRA_NOISE_TOKENS` | (empty) | Extra noise tokens for DOM noise removal. |
|
|
106
|
-
| `SUPERFETCH_EXTRA_NOISE_SELECTORS` | (empty) | Extra CSS selectors for DOM noise removal. |
|
|
107
|
-
|
|
108
|
-
#### HTTP server tuning (optional)
|
|
109
|
-
|
|
110
|
-
| Variable | Default | Description |
|
|
111
|
-
| ------------------------------ | ------- | --------------------------------------------- |
|
|
112
|
-
| `SERVER_HEADERS_TIMEOUT_MS` | (unset) | Sets `server.headersTimeout` (1000-600000). |
|
|
113
|
-
| `SERVER_REQUEST_TIMEOUT_MS` | (unset) | Sets `server.requestTimeout` (1000-600000). |
|
|
114
|
-
| `SERVER_KEEP_ALIVE_TIMEOUT_MS` | (unset) | Sets `server.keepAliveTimeout` (1000-600000). |
|
|
115
|
-
| `SERVER_SHUTDOWN_CLOSE_IDLE` | `false` | Close idle connections on shutdown. |
|
|
116
|
-
| `SERVER_SHUTDOWN_CLOSE_ALL` | `false` | Close all connections on shutdown. |
|
|
117
|
-
|
|
118
|
-
#### Auth (HTTP mode)
|
|
119
|
-
|
|
120
|
-
| Variable | Default | Description |
|
|
121
|
-
| --------------- | ------- | ---------------------------------------------------- |
|
|
122
|
-
| `AUTH_MODE` | auto | `static` or `oauth` (auto-detected from OAuth URLs). |
|
|
123
|
-
| `ACCESS_TOKENS` | (empty) | Comma/space-separated static bearer tokens. |
|
|
124
|
-
| `API_KEY` | (empty) | Adds a static bearer token and enables `X-API-Key`. |
|
|
125
|
-
|
|
126
|
-
Static mode requires at least one token (`ACCESS_TOKENS` or `API_KEY`).
|
|
127
|
-
|
|
128
|
-
#### OAuth (HTTP mode)
|
|
129
|
-
|
|
130
|
-
Required when `AUTH_MODE=oauth` (or auto-selected by OAuth URLs):
|
|
131
|
-
|
|
132
|
-
| Variable | Default | Description |
|
|
133
|
-
| ------------------------- | ------- | ----------------------- |
|
|
134
|
-
| `OAUTH_ISSUER_URL` | - | OAuth issuer. |
|
|
135
|
-
| `OAUTH_AUTHORIZATION_URL` | - | Authorization endpoint. |
|
|
136
|
-
| `OAUTH_TOKEN_URL` | - | Token endpoint. |
|
|
137
|
-
| `OAUTH_INTROSPECTION_URL` | - | Introspection endpoint. |
|
|
138
|
-
|
|
139
|
-
Optional:
|
|
140
|
-
|
|
141
|
-
| Variable | Default | Description |
|
|
142
|
-
| -------------------------------- | -------------------------- | ---------------------------------------- |
|
|
143
|
-
| `OAUTH_REVOCATION_URL` | - | Revocation endpoint. |
|
|
144
|
-
| `OAUTH_REGISTRATION_URL` | - | Dynamic client registration endpoint. |
|
|
145
|
-
| `OAUTH_RESOURCE_URL` | `http://<host>:<port>/mcp` | Protected resource URL. |
|
|
146
|
-
| `OAUTH_REQUIRED_SCOPES` | (empty) | Required scopes (comma/space separated). |
|
|
147
|
-
| `OAUTH_CLIENT_ID` | - | Client ID for introspection. |
|
|
148
|
-
| `OAUTH_CLIENT_SECRET` | - | Client secret for introspection. |
|
|
149
|
-
| `OAUTH_INTROSPECTION_TIMEOUT_MS` | `5000` | Introspection timeout (1000-30000). |
|
|
150
|
-
|
|
151
|
-
### HTTP mode endpoints
|
|
152
|
-
|
|
153
|
-
| Method | Path | Auth | Notes |
|
|
154
|
-
| ------ | --------------------------------- | ---- | -------------------------------------------------- |
|
|
155
|
-
| GET | `/health` | No | Health check. |
|
|
156
|
-
| POST | `/mcp` | Yes | Streamable HTTP JSON-RPC requests. |
|
|
157
|
-
| GET | `/mcp` | Yes | SSE stream (requires `Accept: text/event-stream`). |
|
|
158
|
-
| DELETE | `/mcp` | Yes | Close the session. |
|
|
159
|
-
| GET | `/mcp/downloads/:namespace/:hash` | Yes | Download cached markdown. |
|
|
160
|
-
|
|
161
|
-
Sessions are managed via the `mcp-session-id` header. A `POST /mcp` `initialize` request creates a session and returns the session id.
|
|
162
|
-
|
|
163
|
-
## API Reference
|
|
164
|
-
|
|
165
|
-
### Tools
|
|
166
|
-
|
|
167
|
-
#### `fetch-url`
|
|
168
|
-
|
|
169
|
-
Fetches a webpage and converts it to clean Markdown.
|
|
170
|
-
|
|
171
|
-
##### Parameters
|
|
172
|
-
|
|
173
|
-
| Name | Type | Required | Default | Description |
|
|
174
|
-
| ----- | ------ | -------- | ------- | ------------------------------------ |
|
|
175
|
-
| `url` | string | Yes | - | Public http(s) URL, max length 2048. |
|
|
176
|
-
|
|
177
|
-
##### Returns
|
|
178
|
-
|
|
179
|
-
`structuredContent` fields:
|
|
180
|
-
|
|
181
|
-
- `url` (string): fetched URL
|
|
182
|
-
- `inputUrl` (string, optional): original input URL
|
|
183
|
-
- `resolvedUrl` (string, optional): normalized or raw-content URL
|
|
184
|
-
- `title` (string, optional): page title
|
|
185
|
-
- `markdown` (string, optional): markdown content (inline when available)
|
|
186
|
-
- `error` (string, optional): error message on failure
|
|
187
|
-
|
|
188
|
-
##### Example success
|
|
189
|
-
|
|
190
|
-
```json
|
|
191
|
-
{
|
|
192
|
-
"url": "https://example.com/docs",
|
|
193
|
-
"inputUrl": "https://example.com/docs",
|
|
194
|
-
"resolvedUrl": "https://example.com/docs",
|
|
195
|
-
"title": "Example Docs",
|
|
196
|
-
"markdown": "# Getting Started\n\n..."
|
|
197
|
-
}
|
|
198
|
-
```
|
|
199
|
-
|
|
200
|
-
##### Example error
|
|
201
|
-
|
|
202
|
-
```json
|
|
203
|
-
{
|
|
204
|
-
"url": "https://example.com/404",
|
|
205
|
-
"error": "Failed to fetch URL: 404 Not Found"
|
|
206
|
-
}
|
|
207
|
-
```
|
|
208
|
-
|
|
209
|
-
##### Large content handling
|
|
210
|
-
|
|
211
|
-
- Inline markdown is capped at 20,000 characters.
|
|
212
|
-
- When content exceeds the inline limit and cache is enabled, responses include a `resource_link` to `superfetch://cache/markdown/{urlHash}`.
|
|
213
|
-
- If cache is disabled, inline content is truncated with `...[truncated]`.
|
|
214
|
-
|
|
215
|
-
### Resources
|
|
216
|
-
|
|
217
|
-
| URI pattern | Description | MIME type |
|
|
218
|
-
| --------------------------------------- | ------------------------------ | --------------- |
|
|
219
|
-
| `superfetch://cache/markdown/{urlHash}` | Cached markdown content entry. | `text/markdown` |
|
|
220
|
-
| `internal://instructions` | Server usage instructions. | `text/markdown` |
|
|
221
|
-
|
|
222
|
-
### Prompts
|
|
223
|
-
|
|
224
|
-
No prompts are registered in this server.
|
|
225
|
-
|
|
226
|
-
## Client Configuration Examples
|
|
227
|
-
|
|
228
|
-
<details>
|
|
229
|
-
<summary><strong>VS Code</strong></summary>
|
|
230
|
-
|
|
231
|
-
Add to .vscode/mcp.json:
|
|
232
|
-
|
|
233
|
-
```json
|
|
234
|
-
{
|
|
235
|
-
"servers": {
|
|
236
|
-
"superFetch": {
|
|
237
|
-
"command": "npx",
|
|
238
|
-
"args": ["-y", "@j0hanz/superfetch@latest", "--stdio"]
|
|
239
|
-
}
|
|
240
|
-
}
|
|
241
|
-
}
|
|
242
|
-
```
|
|
243
|
-
|
|
244
|
-
</details>
|
|
245
|
-
|
|
246
|
-
<details>
|
|
247
|
-
<summary><strong>Claude Desktop</strong></summary>
|
|
248
|
-
|
|
249
|
-
Add to claude_desktop_config.json:
|
|
250
|
-
|
|
251
|
-
```json
|
|
252
|
-
{
|
|
253
|
-
"mcpServers": {
|
|
254
|
-
"superFetch": {
|
|
255
|
-
"command": "npx",
|
|
256
|
-
"args": ["-y", "@j0hanz/superfetch@latest", "--stdio"]
|
|
257
|
-
}
|
|
258
|
-
}
|
|
259
|
-
}
|
|
260
|
-
```
|
|
261
|
-
|
|
262
|
-
</details>
|
|
263
|
-
|
|
264
|
-
<details>
|
|
265
|
-
<summary><strong>Cursor</strong></summary>
|
|
266
|
-
|
|
267
|
-
```json
|
|
268
|
-
{
|
|
269
|
-
"mcpServers": {
|
|
270
|
-
"superFetch": {
|
|
271
|
-
"command": "npx",
|
|
272
|
-
"args": ["-y", "@j0hanz/superfetch@latest", "--stdio"]
|
|
273
|
-
}
|
|
274
|
-
}
|
|
275
|
-
}
|
|
276
|
-
```
|
|
277
|
-
|
|
278
|
-
</details>
|
|
279
|
-
|
|
280
|
-
<details>
|
|
281
|
-
<summary><strong>Windsurf</strong></summary>
|
|
282
|
-
|
|
283
|
-
```json
|
|
284
|
-
{
|
|
285
|
-
"mcpServers": {
|
|
286
|
-
"superFetch": {
|
|
287
|
-
"command": "npx",
|
|
288
|
-
"args": ["-y", "@j0hanz/superfetch@latest", "--stdio"]
|
|
289
|
-
}
|
|
290
|
-
}
|
|
291
|
-
}
|
|
292
|
-
```
|
|
293
|
-
|
|
294
|
-
</details>
|
|
295
|
-
|
|
296
|
-
## Security
|
|
297
|
-
|
|
298
|
-
- Stdio logs are written to stderr (stdout is reserved for MCP traffic).
|
|
299
|
-
- HTTP mode validates Host and Origin headers against allowed hosts.
|
|
300
|
-
- HTTP mode requires `MCP-Protocol-Version: 2025-11-25`.
|
|
301
|
-
- Auth is required for HTTP mode (static tokens or OAuth).
|
|
302
|
-
- SSRF protections block private IP ranges and common metadata endpoints.
|
|
303
|
-
- Rate limiting: 100 requests/minute per IP (60s window) for HTTP routes.
|
|
304
|
-
|
|
305
|
-
## Development
|
|
306
|
-
|
|
307
|
-
### Prerequisites
|
|
308
|
-
|
|
309
|
-
- Node.js >= 20.18.1
|
|
310
|
-
- npm
|
|
311
|
-
|
|
312
|
-
### Scripts
|
|
313
|
-
|
|
314
|
-
| Script | Command | Purpose |
|
|
315
|
-
| ---------------------- | ----------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------- |
|
|
316
|
-
| clean | `node scripts/clean.mjs` | Remove build artifacts. |
|
|
317
|
-
| validate:instructions | `node scripts/validate-instructions.mjs` | Validate embedded instructions. |
|
|
318
|
-
| build | `npm run clean && tsc -p tsconfig.json && npm run validate:instructions && npm run copy:assets && node scripts/make-executable.mjs` | Build the server. |
|
|
319
|
-
| copy:assets | `node scripts/copy-assets.mjs` | Copy static assets. |
|
|
320
|
-
| prepare | `npm run build` | Prepare package for publishing. |
|
|
321
|
-
| dev | `tsc --watch --preserveWatchOutput` | TypeScript watch mode. |
|
|
322
|
-
| dev:run | `node --watch dist/index.js` | Run compiled server in watch mode. |
|
|
323
|
-
| start | `node dist/index.js` | Start HTTP server (default). |
|
|
324
|
-
| format | `prettier --write .` | Format codebase. |
|
|
325
|
-
| type-check | `tsc --noEmit` | Type checking. |
|
|
326
|
-
| type-check:diagnostics | `tsc --noEmit --extendedDiagnostics` | Type check diagnostics. |
|
|
327
|
-
| type-check:trace | `tsc --noEmit --generateTrace .ts-trace` | Generate TS trace. |
|
|
328
|
-
| lint | `eslint .` | Lint. |
|
|
329
|
-
| lint:fix | `eslint . --fix` | Lint and fix. |
|
|
330
|
-
| test | `npm run build --silent && node --test --experimental-transform-types` | Run tests (builds first). |
|
|
331
|
-
| test:coverage | `npm run build --silent && node --test --experimental-transform-types --experimental-test-coverage` | Test with coverage. |
|
|
332
|
-
| knip | `knip` | Dead code analysis. |
|
|
333
|
-
| knip:fix | `knip --fix` | Fix knip issues. |
|
|
334
|
-
| inspector | `npx @modelcontextprotocol/inspector` | MCP Inspector. |
|
|
335
|
-
| prepublishOnly | `npm run lint && npm run type-check && npm run build` | Prepublish checks. |
|
|
336
|
-
|
|
337
|
-
### Project structure
|
|
338
|
-
|
|
339
|
-
```text
|
|
340
|
-
superFetch
|
|
341
|
-
├── docs
|
|
342
|
-
│ └── logo.png
|
|
343
|
-
├── src
|
|
344
|
-
│ ├── workers
|
|
345
|
-
│ ├── cache.ts
|
|
346
|
-
│ ├── config.ts
|
|
347
|
-
│ ├── fetch.ts
|
|
348
|
-
│ ├── http-native.ts
|
|
349
|
-
│ ├── http-utils.ts
|
|
350
|
-
│ ├── index.ts
|
|
351
|
-
│ ├── instructions.md
|
|
352
|
-
│ ├── mcp.ts
|
|
353
|
-
│ ├── tools.ts
|
|
354
|
-
│ ├── transform.ts
|
|
355
|
-
│ └── ...
|
|
356
|
-
├── tests
|
|
357
|
-
│ └── *.test.ts
|
|
358
|
-
├── CONFIGURATION.md
|
|
359
|
-
├── package.json
|
|
360
|
-
└── tsconfig.json
|
|
361
|
-
```
|
|
362
|
-
|
|
363
|
-
<!-- markdownlint-enable MD033 -->
|
|
1
|
+
<!-- markdownlint-disable MD033 -->
|
|
2
|
+
|
|
3
|
+
# superFetch MCP Server
|
|
4
|
+
|
|
5
|
+
Intelligent web content fetcher MCP server that converts HTML to clean, AI-readable Markdown.
|
|
6
|
+
|
|
7
|
+
[](https://www.npmjs.com/package/@j0hanz/superfetch) [](https://www.npmjs.com/package/@j0hanz/superfetch) [](https://nodejs.org/) [](https://www.typescriptlang.org/) [](https://github.com/modelcontextprotocol/sdk)
|
|
8
|
+
|
|
9
|
+
<img src="docs/logo.png" alt="SuperFetch MCP Logo" width="300">
|
|
10
|
+
|
|
11
|
+
## One-Click Install
|
|
12
|
+
|
|
13
|
+
[](https://insiders.vscode.dev/redirect/mcp/install?name=superfetch&inputs=%5B%5D&config=%7B%22command%22%3A%22npx%22%2C%22args%22%3A%5B%22-y%22%2C%22%40j0hanz%2Fsuperfetch%40latest%22%2C%22--stdio%22%5D%7D)
|
|
14
|
+
[](https://insiders.vscode.dev/redirect/mcp/install?name=superfetch&inputs=%5B%5D&config=%7B%22command%22%3A%22npx%22%2C%22args%22%3A%5B%22-y%22%2C%22%40j0hanz%2Fsuperfetch%40latest%22%2C%22--stdio%22%5D%7D&quality=insiders)
|
|
15
|
+
|
|
16
|
+
[](https://cursor.com/install-mcp?name=superfetch&config=eyJjb21tYW5kIjoibnB4IiwiYXJncyI6WyIteSIsIkBqMGhhbnovc3VwZXJmZXRjaEBsYXRlc3QiLCItLXN0ZGlvIl19)
|
|
17
|
+
|
|
18
|
+
## Overview
|
|
19
|
+
|
|
20
|
+
| Feature | Details |
|
|
21
|
+
| -------------------- | -------------------------------------------------------------------------- |
|
|
22
|
+
| HTML → Markdown | Mozilla Readability + node-html-markdown pipeline with metadata injection. |
|
|
23
|
+
| Raw content handling | Rewrites supported GitHub/GitLab/Bitbucket/Gist URLs to raw content. |
|
|
24
|
+
| Caching + resources | LRU cache with resource listing and update notifications. |
|
|
25
|
+
| Transport | Stdio (local clients) and Streamable HTTP (self-hosted). |
|
|
26
|
+
| Safety | SSRF/IP blocklists, Host/Origin validation, auth for HTTP mode. |
|
|
27
|
+
|
|
28
|
+
### When to use
|
|
29
|
+
|
|
30
|
+
- You need clean, AI-friendly Markdown from public http(s) URLs.
|
|
31
|
+
- You want a single MCP tool that handles fetching, extraction, and caching.
|
|
32
|
+
- You need self-hosted HTTP with auth and session management.
|
|
33
|
+
|
|
34
|
+
## Quick Start
|
|
35
|
+
|
|
36
|
+
Recommended for MCP clients: stdio mode.
|
|
37
|
+
|
|
38
|
+
```bash
|
|
39
|
+
npx -y @j0hanz/superfetch@latest --stdio
|
|
40
|
+
```
|
|
41
|
+
|
|
42
|
+
Example MCP client configuration:
|
|
43
|
+
|
|
44
|
+
```json
|
|
45
|
+
{
|
|
46
|
+
"mcpServers": {
|
|
47
|
+
"superFetch": {
|
|
48
|
+
"command": "npx",
|
|
49
|
+
"args": ["-y", "@j0hanz/superfetch@latest", "--stdio"]
|
|
50
|
+
}
|
|
51
|
+
}
|
|
52
|
+
}
|
|
53
|
+
```
|
|
54
|
+
|
|
55
|
+
## Installation
|
|
56
|
+
|
|
57
|
+
### NPX (recommended)
|
|
58
|
+
|
|
59
|
+
```bash
|
|
60
|
+
npx -y @j0hanz/superfetch@latest --stdio
|
|
61
|
+
```
|
|
62
|
+
|
|
63
|
+
### Global install
|
|
64
|
+
|
|
65
|
+
```bash
|
|
66
|
+
npm install -g @j0hanz/superfetch
|
|
67
|
+
superfetch --stdio
|
|
68
|
+
```
|
|
69
|
+
|
|
70
|
+
### From source
|
|
71
|
+
|
|
72
|
+
```bash
|
|
73
|
+
git clone https://github.com/j0hanz/super-fetch-mcp-server.git
|
|
74
|
+
cd super-fetch-mcp-server
|
|
75
|
+
npm install
|
|
76
|
+
npm run build
|
|
77
|
+
node dist/index.js --stdio
|
|
78
|
+
```
|
|
79
|
+
|
|
80
|
+
## Configuration
|
|
81
|
+
|
|
82
|
+
### CLI arguments
|
|
83
|
+
|
|
84
|
+
| Argument | Type | Default | Description |
|
|
85
|
+
| --------- | ------- | ------- | ----------------------------------- |
|
|
86
|
+
| `--stdio` | boolean | false | Run in stdio mode (no HTTP server). |
|
|
87
|
+
|
|
88
|
+
### Environment variables
|
|
89
|
+
|
|
90
|
+
#### Core server settings
|
|
91
|
+
|
|
92
|
+
| Variable | Default | Description |
|
|
93
|
+
| ---------------------------------- | -------------------- | -------------------------------------------------------------- |
|
|
94
|
+
| `HOST` | `127.0.0.1` | HTTP bind address. |
|
|
95
|
+
| `PORT` | `3000` | HTTP server port (1024-65535, `0` for ephemeral). |
|
|
96
|
+
| `USER_AGENT` | `superFetch-MCP/2.0` | User-Agent header for outgoing requests. |
|
|
97
|
+
| `CACHE_ENABLED` | `true` | Enable response caching. |
|
|
98
|
+
| `CACHE_TTL` | `3600` | Cache TTL in seconds (60-86400). |
|
|
99
|
+
| `LOG_LEVEL` | `info` | Logging level (`debug` enables verbose logs). |
|
|
100
|
+
| `ALLOW_REMOTE` | `false` | Allow non-loopback binds (OAuth required). |
|
|
101
|
+
| `ALLOWED_HOSTS` | (empty) | Additional allowed Host/Origin values (comma/space separated). |
|
|
102
|
+
| `TRANSFORM_TIMEOUT_MS` | `30000` | Worker transform timeout in ms (5000-120000). |
|
|
103
|
+
| `TOOL_TIMEOUT_MS` | `50000` | Overall tool timeout in ms (1000-300000). |
|
|
104
|
+
| `TRANSFORM_METADATA_FORMAT` | `markdown` | Metadata format: `markdown` or `frontmatter`. |
|
|
105
|
+
| `SUPERFETCH_EXTRA_NOISE_TOKENS` | (empty) | Extra noise tokens for DOM noise removal. |
|
|
106
|
+
| `SUPERFETCH_EXTRA_NOISE_SELECTORS` | (empty) | Extra CSS selectors for DOM noise removal. |
|
|
107
|
+
|
|
108
|
+
#### HTTP server tuning (optional)
|
|
109
|
+
|
|
110
|
+
| Variable | Default | Description |
|
|
111
|
+
| ------------------------------ | ------- | --------------------------------------------- |
|
|
112
|
+
| `SERVER_HEADERS_TIMEOUT_MS` | (unset) | Sets `server.headersTimeout` (1000-600000). |
|
|
113
|
+
| `SERVER_REQUEST_TIMEOUT_MS` | (unset) | Sets `server.requestTimeout` (1000-600000). |
|
|
114
|
+
| `SERVER_KEEP_ALIVE_TIMEOUT_MS` | (unset) | Sets `server.keepAliveTimeout` (1000-600000). |
|
|
115
|
+
| `SERVER_SHUTDOWN_CLOSE_IDLE` | `false` | Close idle connections on shutdown. |
|
|
116
|
+
| `SERVER_SHUTDOWN_CLOSE_ALL` | `false` | Close all connections on shutdown. |
|
|
117
|
+
|
|
118
|
+
#### Auth (HTTP mode)
|
|
119
|
+
|
|
120
|
+
| Variable | Default | Description |
|
|
121
|
+
| --------------- | ------- | ---------------------------------------------------- |
|
|
122
|
+
| `AUTH_MODE` | auto | `static` or `oauth` (auto-detected from OAuth URLs). |
|
|
123
|
+
| `ACCESS_TOKENS` | (empty) | Comma/space-separated static bearer tokens. |
|
|
124
|
+
| `API_KEY` | (empty) | Adds a static bearer token and enables `X-API-Key`. |
|
|
125
|
+
|
|
126
|
+
Static mode requires at least one token (`ACCESS_TOKENS` or `API_KEY`).
|
|
127
|
+
|
|
128
|
+
#### OAuth (HTTP mode)
|
|
129
|
+
|
|
130
|
+
Required when `AUTH_MODE=oauth` (or auto-selected by OAuth URLs):
|
|
131
|
+
|
|
132
|
+
| Variable | Default | Description |
|
|
133
|
+
| ------------------------- | ------- | ----------------------- |
|
|
134
|
+
| `OAUTH_ISSUER_URL` | - | OAuth issuer. |
|
|
135
|
+
| `OAUTH_AUTHORIZATION_URL` | - | Authorization endpoint. |
|
|
136
|
+
| `OAUTH_TOKEN_URL` | - | Token endpoint. |
|
|
137
|
+
| `OAUTH_INTROSPECTION_URL` | - | Introspection endpoint. |
|
|
138
|
+
|
|
139
|
+
Optional:
|
|
140
|
+
|
|
141
|
+
| Variable | Default | Description |
|
|
142
|
+
| -------------------------------- | -------------------------- | ---------------------------------------- |
|
|
143
|
+
| `OAUTH_REVOCATION_URL` | - | Revocation endpoint. |
|
|
144
|
+
| `OAUTH_REGISTRATION_URL` | - | Dynamic client registration endpoint. |
|
|
145
|
+
| `OAUTH_RESOURCE_URL` | `http://<host>:<port>/mcp` | Protected resource URL. |
|
|
146
|
+
| `OAUTH_REQUIRED_SCOPES` | (empty) | Required scopes (comma/space separated). |
|
|
147
|
+
| `OAUTH_CLIENT_ID` | - | Client ID for introspection. |
|
|
148
|
+
| `OAUTH_CLIENT_SECRET` | - | Client secret for introspection. |
|
|
149
|
+
| `OAUTH_INTROSPECTION_TIMEOUT_MS` | `5000` | Introspection timeout (1000-30000). |
|
|
150
|
+
|
|
151
|
+
### HTTP mode endpoints
|
|
152
|
+
|
|
153
|
+
| Method | Path | Auth | Notes |
|
|
154
|
+
| ------ | --------------------------------- | ---- | -------------------------------------------------- |
|
|
155
|
+
| GET | `/health` | No | Health check. |
|
|
156
|
+
| POST | `/mcp` | Yes | Streamable HTTP JSON-RPC requests. |
|
|
157
|
+
| GET | `/mcp` | Yes | SSE stream (requires `Accept: text/event-stream`). |
|
|
158
|
+
| DELETE | `/mcp` | Yes | Close the session. |
|
|
159
|
+
| GET | `/mcp/downloads/:namespace/:hash` | Yes | Download cached markdown. |
|
|
160
|
+
|
|
161
|
+
Sessions are managed via the `mcp-session-id` header. A `POST /mcp` `initialize` request creates a session and returns the session id.
|
|
162
|
+
|
|
163
|
+
## API Reference
|
|
164
|
+
|
|
165
|
+
### Tools
|
|
166
|
+
|
|
167
|
+
#### `fetch-url`
|
|
168
|
+
|
|
169
|
+
Fetches a webpage and converts it to clean Markdown.
|
|
170
|
+
|
|
171
|
+
##### Parameters
|
|
172
|
+
|
|
173
|
+
| Name | Type | Required | Default | Description |
|
|
174
|
+
| ----- | ------ | -------- | ------- | ------------------------------------ |
|
|
175
|
+
| `url` | string | Yes | - | Public http(s) URL, max length 2048. |
|
|
176
|
+
|
|
177
|
+
##### Returns
|
|
178
|
+
|
|
179
|
+
`structuredContent` fields:
|
|
180
|
+
|
|
181
|
+
- `url` (string): fetched URL
|
|
182
|
+
- `inputUrl` (string, optional): original input URL
|
|
183
|
+
- `resolvedUrl` (string, optional): normalized or raw-content URL
|
|
184
|
+
- `title` (string, optional): page title
|
|
185
|
+
- `markdown` (string, optional): markdown content (inline when available)
|
|
186
|
+
- `error` (string, optional): error message on failure
|
|
187
|
+
|
|
188
|
+
##### Example success
|
|
189
|
+
|
|
190
|
+
```json
|
|
191
|
+
{
|
|
192
|
+
"url": "https://example.com/docs",
|
|
193
|
+
"inputUrl": "https://example.com/docs",
|
|
194
|
+
"resolvedUrl": "https://example.com/docs",
|
|
195
|
+
"title": "Example Docs",
|
|
196
|
+
"markdown": "# Getting Started\n\n..."
|
|
197
|
+
}
|
|
198
|
+
```
|
|
199
|
+
|
|
200
|
+
##### Example error
|
|
201
|
+
|
|
202
|
+
```json
|
|
203
|
+
{
|
|
204
|
+
"url": "https://example.com/404",
|
|
205
|
+
"error": "Failed to fetch URL: 404 Not Found"
|
|
206
|
+
}
|
|
207
|
+
```
|
|
208
|
+
|
|
209
|
+
##### Large content handling
|
|
210
|
+
|
|
211
|
+
- Inline markdown is capped at 20,000 characters.
|
|
212
|
+
- When content exceeds the inline limit and cache is enabled, responses include a `resource_link` to `superfetch://cache/markdown/{urlHash}`.
|
|
213
|
+
- If cache is disabled, inline content is truncated with `...[truncated]`.
|
|
214
|
+
|
|
215
|
+
### Resources
|
|
216
|
+
|
|
217
|
+
| URI pattern | Description | MIME type |
|
|
218
|
+
| --------------------------------------- | ------------------------------ | --------------- |
|
|
219
|
+
| `superfetch://cache/markdown/{urlHash}` | Cached markdown content entry. | `text/markdown` |
|
|
220
|
+
| `internal://instructions` | Server usage instructions. | `text/markdown` |
|
|
221
|
+
|
|
222
|
+
### Prompts
|
|
223
|
+
|
|
224
|
+
No prompts are registered in this server.
|
|
225
|
+
|
|
226
|
+
## Client Configuration Examples
|
|
227
|
+
|
|
228
|
+
<details>
|
|
229
|
+
<summary><strong>VS Code</strong></summary>
|
|
230
|
+
|
|
231
|
+
Add to .vscode/mcp.json:
|
|
232
|
+
|
|
233
|
+
```json
|
|
234
|
+
{
|
|
235
|
+
"servers": {
|
|
236
|
+
"superFetch": {
|
|
237
|
+
"command": "npx",
|
|
238
|
+
"args": ["-y", "@j0hanz/superfetch@latest", "--stdio"]
|
|
239
|
+
}
|
|
240
|
+
}
|
|
241
|
+
}
|
|
242
|
+
```
|
|
243
|
+
|
|
244
|
+
</details>
|
|
245
|
+
|
|
246
|
+
<details>
|
|
247
|
+
<summary><strong>Claude Desktop</strong></summary>
|
|
248
|
+
|
|
249
|
+
Add to claude_desktop_config.json:
|
|
250
|
+
|
|
251
|
+
```json
|
|
252
|
+
{
|
|
253
|
+
"mcpServers": {
|
|
254
|
+
"superFetch": {
|
|
255
|
+
"command": "npx",
|
|
256
|
+
"args": ["-y", "@j0hanz/superfetch@latest", "--stdio"]
|
|
257
|
+
}
|
|
258
|
+
}
|
|
259
|
+
}
|
|
260
|
+
```
|
|
261
|
+
|
|
262
|
+
</details>
|
|
263
|
+
|
|
264
|
+
<details>
|
|
265
|
+
<summary><strong>Cursor</strong></summary>
|
|
266
|
+
|
|
267
|
+
```json
|
|
268
|
+
{
|
|
269
|
+
"mcpServers": {
|
|
270
|
+
"superFetch": {
|
|
271
|
+
"command": "npx",
|
|
272
|
+
"args": ["-y", "@j0hanz/superfetch@latest", "--stdio"]
|
|
273
|
+
}
|
|
274
|
+
}
|
|
275
|
+
}
|
|
276
|
+
```
|
|
277
|
+
|
|
278
|
+
</details>
|
|
279
|
+
|
|
280
|
+
<details>
|
|
281
|
+
<summary><strong>Windsurf</strong></summary>
|
|
282
|
+
|
|
283
|
+
```json
|
|
284
|
+
{
|
|
285
|
+
"mcpServers": {
|
|
286
|
+
"superFetch": {
|
|
287
|
+
"command": "npx",
|
|
288
|
+
"args": ["-y", "@j0hanz/superfetch@latest", "--stdio"]
|
|
289
|
+
}
|
|
290
|
+
}
|
|
291
|
+
}
|
|
292
|
+
```
|
|
293
|
+
|
|
294
|
+
</details>
|
|
295
|
+
|
|
296
|
+
## Security
|
|
297
|
+
|
|
298
|
+
- Stdio logs are written to stderr (stdout is reserved for MCP traffic).
|
|
299
|
+
- HTTP mode validates Host and Origin headers against allowed hosts.
|
|
300
|
+
- HTTP mode requires `MCP-Protocol-Version: 2025-11-25`.
|
|
301
|
+
- Auth is required for HTTP mode (static tokens or OAuth).
|
|
302
|
+
- SSRF protections block private IP ranges and common metadata endpoints.
|
|
303
|
+
- Rate limiting: 100 requests/minute per IP (60s window) for HTTP routes.
|
|
304
|
+
|
|
305
|
+
## Development
|
|
306
|
+
|
|
307
|
+
### Prerequisites
|
|
308
|
+
|
|
309
|
+
- Node.js >= 20.18.1
|
|
310
|
+
- npm
|
|
311
|
+
|
|
312
|
+
### Scripts
|
|
313
|
+
|
|
314
|
+
| Script | Command | Purpose |
|
|
315
|
+
| ---------------------- | ----------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------- |
|
|
316
|
+
| clean | `node scripts/clean.mjs` | Remove build artifacts. |
|
|
317
|
+
| validate:instructions | `node scripts/validate-instructions.mjs` | Validate embedded instructions. |
|
|
318
|
+
| build | `npm run clean && tsc -p tsconfig.json && npm run validate:instructions && npm run copy:assets && node scripts/make-executable.mjs` | Build the server. |
|
|
319
|
+
| copy:assets | `node scripts/copy-assets.mjs` | Copy static assets. |
|
|
320
|
+
| prepare | `npm run build` | Prepare package for publishing. |
|
|
321
|
+
| dev | `tsc --watch --preserveWatchOutput` | TypeScript watch mode. |
|
|
322
|
+
| dev:run | `node --watch dist/index.js` | Run compiled server in watch mode. |
|
|
323
|
+
| start | `node dist/index.js` | Start HTTP server (default). |
|
|
324
|
+
| format | `prettier --write .` | Format codebase. |
|
|
325
|
+
| type-check | `tsc --noEmit` | Type checking. |
|
|
326
|
+
| type-check:diagnostics | `tsc --noEmit --extendedDiagnostics` | Type check diagnostics. |
|
|
327
|
+
| type-check:trace | `tsc --noEmit --generateTrace .ts-trace` | Generate TS trace. |
|
|
328
|
+
| lint | `eslint .` | Lint. |
|
|
329
|
+
| lint:fix | `eslint . --fix` | Lint and fix. |
|
|
330
|
+
| test | `npm run build --silent && node --test --experimental-transform-types` | Run tests (builds first). |
|
|
331
|
+
| test:coverage | `npm run build --silent && node --test --experimental-transform-types --experimental-test-coverage` | Test with coverage. |
|
|
332
|
+
| knip | `knip` | Dead code analysis. |
|
|
333
|
+
| knip:fix | `knip --fix` | Fix knip issues. |
|
|
334
|
+
| inspector | `npx @modelcontextprotocol/inspector` | MCP Inspector. |
|
|
335
|
+
| prepublishOnly | `npm run lint && npm run type-check && npm run build` | Prepublish checks. |
|
|
336
|
+
|
|
337
|
+
### Project structure
|
|
338
|
+
|
|
339
|
+
```text
|
|
340
|
+
superFetch
|
|
341
|
+
├── docs
|
|
342
|
+
│ └── logo.png
|
|
343
|
+
├── src
|
|
344
|
+
│ ├── workers
|
|
345
|
+
│ ├── cache.ts
|
|
346
|
+
│ ├── config.ts
|
|
347
|
+
│ ├── fetch.ts
|
|
348
|
+
│ ├── http-native.ts
|
|
349
|
+
│ ├── http-utils.ts
|
|
350
|
+
│ ├── index.ts
|
|
351
|
+
│ ├── instructions.md
|
|
352
|
+
│ ├── mcp.ts
|
|
353
|
+
│ ├── tools.ts
|
|
354
|
+
│ ├── transform.ts
|
|
355
|
+
│ └── ...
|
|
356
|
+
├── tests
|
|
357
|
+
│ └── *.test.ts
|
|
358
|
+
├── CONFIGURATION.md
|
|
359
|
+
├── package.json
|
|
360
|
+
└── tsconfig.json
|
|
361
|
+
```
|
|
362
|
+
|
|
363
|
+
<!-- markdownlint-enable MD033 -->
|
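Note on the README's "Large content handling" section: the three rules it lists (inline cap of 20,000 characters, a `resource_link` to `superfetch://cache/markdown/{urlHash}` when the cache is enabled, and a `...[truncated]` suffix when it is not) can be sketched as a small decision function. This is a minimal illustration and not the package's code. The names `planInlineContent` and `InlinePlan` are hypothetical, and two behaviors are assumptions that the README does not state: that the truncation marker counts toward the cap, and that the link replaces the inline body rather than accompanying it.

```typescript
// Hypothetical sketch of the README's "Large content handling" rules.
// None of these identifiers are taken from the package itself.

const INLINE_LIMIT = 20_000; // inline cap stated in the README
const TRUNCATION_MARKER = "...[truncated]"; // marker stated in the README

type InlinePlan =
  | { kind: "inline"; markdown: string }
  | { kind: "resource_link"; uri: string }
  | { kind: "truncated"; markdown: string };

function planInlineContent(
  markdown: string,
  urlHash: string,
  cacheEnabled: boolean,
): InlinePlan {
  // Content within the cap is returned inline, unchanged.
  if (markdown.length <= INLINE_LIMIT) {
    return { kind: "inline", markdown };
  }

  // Over the cap with the cache on: point at the cached resource.
  // Assumption: the link replaces the inline body.
  if (cacheEnabled) {
    return {
      kind: "resource_link",
      uri: `superfetch://cache/markdown/${urlHash}`,
    };
  }

  // Over the cap with the cache off: cut and append the marker.
  // Assumption: the marker counts toward the 20,000-character cap.
  const keep = INLINE_LIMIT - TRUNCATION_MARKER.length;
  return {
    kind: "truncated",
    markdown: markdown.slice(0, keep) + TRUNCATION_MARKER,
  };
}
```

A caller would branch on `kind`: use `markdown` directly for `inline` and `truncated`, and read the `uri` resource for `resource_link`.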