@j0hanz/superfetch 2.4.4 → 2.4.6
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +195 -138
- package/dist/config.d.ts +1 -0
- package/dist/config.js +1 -0
- package/dist/http-native.js +2 -2
- package/dist/instructions.md +30 -38
- package/dist/mcp.d.ts +1 -1
- package/dist/mcp.js +292 -99
- package/dist/observability.js +2 -1
- package/dist/tasks.d.ts +24 -5
- package/dist/tasks.js +125 -8
- package/dist/tools.d.ts +2 -3
- package/dist/tools.js +76 -15
- package/dist/transform.js +40 -14
- package/package.json +1 -1
package/README.md
CHANGED
@@ -8,27 +8,71 @@

 [](https://insiders.vscode.dev/redirect/mcp/install?name=superfetch&inputs=%5B%5D&config=%7B%22command%22%3A%22npx%22%2C%22args%22%3A%5B%22-y%22%2C%22%40j0hanz%2Fsuperfetch%40latest%22%2C%22--stdio%22%5D%7D) [](https://insiders.vscode.dev/redirect/mcp/install?name=superfetch&inputs=%5B%5D&config=%7B%22command%22%3A%22npx%22%2C%22args%22%3A%5B%22-y%22%2C%22%40j0hanz%2Fsuperfetch%40latest%22%2C%22--stdio%22%5D%7D&quality=insiders) [](https://claude.ai/desktop/mcp/install?name=superfetch&config=%7B%22command%22%3A%22npx%22%2C%22args%22%3A%5B%22-y%22%2C%22%40j0hanz%2Fsuperfetch%40latest%22%2C%22--stdio%22%5D%7D)

-Fetch and convert public web pages to clean, AI-friendly
+Fetch and convert public web pages to clean, AI-friendly Markdown via MCP.

 ## Overview

-
-
-
-
-| Caching + resources | LRU cache with resource listing and update notifications. |
-| Transport | Stdio (local clients) and Streamable HTTP (self-hosted). |
-| Safety | SSRF/IP blocklists, Host/Origin validation, auth for HTTP mode. |
+SuperFetch is a Node.js MCP server that fetches public HTTP(S) pages, extracts
+primary content, and converts it into clean Markdown. It runs either as a stdio
+MCP server (for local clients) or as a Streamable HTTP MCP server with auth,
+cache, and SSRF protections.

-
+## Key Features

--
--
--
+- HTML to Markdown using Mozilla Readability + node-html-markdown.
+- Raw content URL rewriting for GitHub, GitLab, Bitbucket, and Gist.
+- In-memory LRU cache exposed as MCP resources and HTTP download endpoints.
+- Stdio or Streamable HTTP transport with session management.
+- SSRF protections: blocked private IP ranges and internal hostnames.

-##
+## Tech Stack

-
+- Runtime: Node.js >= 20.18.1 (engines)
+- Language: TypeScript 5.9.3 (dev dependency)
+- MCP SDK: @modelcontextprotocol/sdk ^1.25.3
+- HTML processing: @mozilla/readability ^0.6.0, linkedom ^0.18.12
+- Markdown conversion: node-html-markdown ^2.0.0
+- HTTP client: undici ^7.19.2
+- Validation: zod ^4.3.6
+- Package manager: npm (package-lock.json)
+
+## Architecture
+
+Fetch pipeline (simplified):
+
+1. Validate and normalize the URL (http/https only, max length 2048).
+2. Block internal hosts and private IP ranges.
+3. Rewrite supported repo URLs to raw content.
+4. Fetch HTML with undici (15s timeout, 10 MB max, 5 redirects).
+5. Extract main content with Readability + DOM cleanup.
+6. Convert to Markdown, inject metadata, and return via MCP.
+7. Cache the result and expose it as a resource or download.
+
+## Repository Structure
+
+```text
+.
+├── assets/ # Logo and static assets copied to dist
+├── scripts/ # Build and validation utilities
+├── src/ # MCP server implementation (TS)
+│ ├── workers/ # Worker-thread transform implementation
+│ ├── http-native.ts # Streamable HTTP server
+│ ├── mcp.ts # MCP server wiring
+│ ├── tools.ts # fetch-url tool implementation
+│ └── ...
+├── tests/ # Node test runner tests (import dist)
+├── CONFIGURATION.md # Full configuration reference
+├── AGENTS.md # Agent guidance
+├── package.json
+└── tsconfig.json
+```
+
+## Requirements
+
+- Node.js >= 20.18.1
+- npm (uses package-lock.json)
+
+## Quickstart (stdio)

 ```bash
 npx -y @j0hanz/superfetch@latest --stdio
@@ -74,33 +118,46 @@ node dist/index.js --stdio

 ## Configuration

+SuperFetch is configured entirely via environment variables. Set them in your
+MCP client configuration (the `env` field) or in the shell before starting the
+server. For the full reference, see `CONFIGURATION.md`.
+
+### Runtime modes
+
+| Mode | Flag | Description |
+| ----- | --------- | ------------------------------------------------------------------- |
+| Stdio | `--stdio` | Communicates via stdin/stdout. No HTTP server. |
+| HTTP | (default) | Starts an HTTP server. Requires static token(s) or OAuth to be set. |
+
 ### CLI arguments

 | Argument | Type | Default | Description |
 | --------- | ------- | ------- | ----------------------------------- |
 | `--stdio` | boolean | false | Run in stdio mode (no HTTP server). |

-###
-
-#### Core server settings
+### Core server settings

 | Variable | Default | Description |
 | ---------------------------------- | -------------------- | -------------------------------------------------------------- |
 | `HOST` | `127.0.0.1` | HTTP bind address. |
-| `PORT` | `3000` | HTTP
-| `USER_AGENT` | `superFetch-MCP/2.0` | User-Agent
-| `CACHE_ENABLED` | `true` | Enable
+| `PORT` | `3000` | HTTP port (1024-65535, `0` for ephemeral). |
+| `USER_AGENT` | `superFetch-MCP/2.0` | User-Agent for outbound requests. |
+| `CACHE_ENABLED` | `true` | Enable in-memory cache. |
 | `CACHE_TTL` | `3600` | Cache TTL in seconds (60-86400). |
-| `LOG_LEVEL` | `info` | Logging level (`debug`
+| `LOG_LEVEL` | `info` | Logging level (`debug`, `info`, `warn`, `error`). |
 | `ALLOW_REMOTE` | `false` | Allow non-loopback binds (OAuth required). |
 | `ALLOWED_HOSTS` | (empty) | Additional allowed Host/Origin values (comma/space separated). |
 | `TRANSFORM_TIMEOUT_MS` | `30000` | Worker transform timeout in ms (5000-120000). |
-| `
+| `TRANSFORM_STAGE_WARN_RATIO` | `0.5` | Emit warnings when stage exceeds ratio of timeout. |
+| `TOOL_TIMEOUT_MS` | computed | Overall tool timeout in ms (1000-300000). |
 | `TRANSFORM_METADATA_FORMAT` | `markdown` | Metadata format: `markdown` or `frontmatter`. |
+| `ENABLED_TOOLS` | `fetch-url` | Comma/space-separated list of enabled tools. |
 | `SUPERFETCH_EXTRA_NOISE_TOKENS` | (empty) | Extra noise tokens for DOM noise removal. |
 | `SUPERFETCH_EXTRA_NOISE_SELECTORS` | (empty) | Extra CSS selectors for DOM noise removal. |

-
+`TOOL_TIMEOUT_MS` defaults to 15s fetch + `TRANSFORM_TIMEOUT_MS` + 5s.
+
+### HTTP server tuning (optional)

 | Variable | Default | Description |
 | ------------------------------ | ------- | --------------------------------------------- |
@@ -110,17 +167,17 @@ node dist/index.js --stdio
 | `SERVER_SHUTDOWN_CLOSE_IDLE` | `false` | Close idle connections on shutdown. |
 | `SERVER_SHUTDOWN_CLOSE_ALL` | `false` | Close all connections on shutdown. |

-
+### Auth (HTTP mode)

-| Variable | Default | Description
-| --------------- | ------- |
-| `AUTH_MODE` | auto | `static` or `oauth` (auto-
-| `ACCESS_TOKENS` | (empty) | Comma/space-separated static bearer tokens.
-| `API_KEY` | (empty) | Adds a static bearer token and enables `X-API-Key`.
+| Variable | Default | Description |
+| --------------- | ------- | ----------------------------------------------------------- |
+| `AUTH_MODE` | auto | `static` or `oauth` (auto-selects OAuth when URLs are set). |
+| `ACCESS_TOKENS` | (empty) | Comma/space-separated static bearer tokens. |
+| `API_KEY` | (empty) | Adds a static bearer token and enables `X-API-Key`. |

 Static mode requires at least one token (`ACCESS_TOKENS` or `API_KEY`).

-
+### OAuth (HTTP mode)

 Required when `AUTH_MODE=oauth` (or auto-selected by OAuth URLs):

@@ -143,19 +200,25 @@ Optional:
 | `OAUTH_CLIENT_SECRET` | - | Client secret for introspection. |
 | `OAUTH_INTROSPECTION_TIMEOUT_MS` | `5000` | Introspection timeout (1000-30000). |

-
+## Usage

-
-
-
-
-
-
-
+### Stdio (local MCP clients)
+
+```bash
+npx -y @j0hanz/superfetch@latest --stdio
+```
+
+### HTTP server (local only, static token)
+
+```bash
+API_KEY=YOUR_TOKEN_HERE npm start
+```
+
+Requires `npm run build` before `npm start` when running from source.

-
+Remote bindings require `ALLOW_REMOTE=true` and OAuth configuration.

-##
+## MCP Surface

 ### Tools

@@ -163,60 +226,67 @@ Sessions are managed via the `mcp-session-id` header. A `POST /mcp` `initialize`

 Fetches a webpage and converts it to clean Markdown.

-
+Parameters:

-| Name | Type | Required |
-| ----- | ------ | -------- |
-| `url` | string | Yes |
+| Name | Type | Required | Description |
+| ----- | ------ | -------- | ------------------------------------ |
+| `url` | string | Yes | Public http(s) URL, max length 2048. |

-
-
-`structuredContent` fields:
+Structured response fields:

 - `url` (string): fetched URL
 - `inputUrl` (string, optional): original input URL
 - `resolvedUrl` (string, optional): normalized or raw-content URL
 - `title` (string, optional): page title
-- `markdown` (string, optional): markdown
+- `markdown` (string, optional): inline markdown (may be truncated)
 - `error` (string, optional): error message on failure

-
+Limitations:

-
-
-"url": "https://example.com/docs",
-"inputUrl": "https://example.com/docs",
-"resolvedUrl": "https://example.com/docs",
-"title": "Example Docs",
-"markdown": "# Getting Started\n\n..."
-}
-```
-
-##### Example error
-
-```json
-{
-"url": "https://example.com/404",
-"error": "Failed to fetch URL: 404 Not Found"
-}
-```
+- Only http/https URLs are accepted; URLs with embedded credentials are rejected.
+- Client-side JavaScript is not executed.

-
+Large content handling:

 - Inline markdown is capped at 20,000 characters.
--
-
+- If cache is enabled and content exceeds the inline limit, the tool response
+includes a `resource_link` block pointing to `superfetch://cache/markdown/{urlHash}`.
+- If cache is disabled, inline markdown is truncated with `...[truncated]`.
+- In stdio mode, the tool also embeds a `resource` block containing full
+markdown content when available.

 ### Resources

-| URI pattern | Description
-| --------------------------------------- |
-| `superfetch://cache/markdown/{urlHash}` | Cached markdown content entry.
-| `internal://instructions` | Server usage instructions.
+| URI pattern | Description | MIME type |
+| --------------------------------------- | ------------------------------------------ | ------------------ |
+| `superfetch://cache/markdown/{urlHash}` | Cached markdown content entry. | `text/markdown` |
+| `internal://instructions` | Server usage instructions. | `text/markdown` |
+| `internal://config` | Current runtime config (secrets redacted). | `application/json` |

 ### Prompts

-
+- `summarize-webpage` (registered when `fetch-url` is enabled)
+
+### Tasks
+
+`fetch-url` supports async execution via MCP tasks. Call `tools/call` with a
+`task` payload to start a background fetch, then use `tasks/get`,
+`tasks/result`, or `tasks/cancel` to manage it.
+
+## HTTP Mode Endpoints
+
+| Method | Path | Auth | Notes |
+| ------ | --------------------------------- | ---- | -------------------------------------------------- |
+| GET | `/health` | No | Health check. |
+| POST | `/mcp` | Yes | Streamable HTTP JSON-RPC requests. |
+| GET | `/mcp` | Yes | SSE stream (requires `Accept: text/event-stream`). |
+| DELETE | `/mcp` | Yes | Close the session. |
+| GET | `/mcp/downloads/:namespace/:hash` | Yes | Download cached markdown. |
+
+Notes:
+
+- HTTP requests must include `MCP-Protocol-Version: 2025-11-25`.
+- Sessions are managed via the `mcp-session-id` header.

 ## Client Configuration Examples

@@ -288,71 +358,58 @@ Add to claude_desktop_config.json:

 </details>

-##
-
-- Stdio logs are written to stderr (stdout is reserved for MCP traffic).
-- HTTP mode validates Host and Origin headers against allowed hosts.
-- HTTP mode requires `MCP-Protocol-Version: 2025-11-25`.
-- Auth is required for HTTP mode (static tokens or OAuth).
-- SSRF protections block private IP ranges and common metadata endpoints.
-- Rate limiting: 100 requests/minute per IP (60s window) for HTTP routes.
+## Development Workflow

-
+### Install dependencies

-
-
-- Node.js >= 20.18.1
-- npm
-
-### Scripts
-
-| Script | Command | Purpose |
-| ---------------------- | ----------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------- |
-| clean | `node scripts/clean.mjs` | Remove build artifacts. |
-| validate:instructions | `node scripts/validate-instructions.mjs` | Validate embedded instructions. |
-| build | `npm run clean && tsc -p tsconfig.json && npm run validate:instructions && npm run copy:assets && node scripts/make-executable.mjs` | Build the server. |
-| copy:assets | `node scripts/copy-assets.mjs` | Copy static assets. |
-| prepare | `npm run build` | Prepare package for publishing. |
-| dev | `tsc --watch --preserveWatchOutput` | TypeScript watch mode. |
-| dev:run | `node --watch dist/index.js` | Run compiled server in watch mode. |
-| start | `node dist/index.js` | Start HTTP server (default). |
-| format | `prettier --write .` | Format codebase. |
-| type-check | `tsc --noEmit` | Type checking. |
-| type-check:diagnostics | `tsc --noEmit --extendedDiagnostics` | Type check diagnostics. |
-| type-check:trace | `tsc --noEmit --generateTrace .ts-trace` | Generate TS trace. |
-| lint | `eslint .` | Lint. |
-| lint:fix | `eslint . --fix` | Lint and fix. |
-| test | `npm run build --silent && node --test --experimental-transform-types` | Run tests (builds first). |
-| test:coverage | `npm run build --silent && node --test --experimental-transform-types --experimental-test-coverage` | Test with coverage. |
-| knip | `knip` | Dead code analysis. |
-| knip:fix | `knip --fix` | Fix knip issues. |
-| inspector | `npx @modelcontextprotocol/inspector` | MCP Inspector. |
-| prepublishOnly | `npm run lint && npm run type-check && npm run build` | Prepublish checks. |
-
-### Project structure
-
-```text
-superFetch
-├── docs
-│ └── logo.png
-├── src
-│ ├── workers
-│ ├── cache.ts
-│ ├── config.ts
-│ ├── fetch.ts
-│ ├── http-native.ts
-│ ├── http-utils.ts
-│ ├── index.ts
-│ ├── instructions.md
-│ ├── mcp.ts
-│ ├── tools.ts
-│ ├── transform.ts
-│ └── ...
-├── tests
-│ └── *.test.ts
-├── CONFIGURATION.md
-├── package.json
-└── tsconfig.json
+```bash
+npm install
 ```

+Use `npm ci` for clean, reproducible installs.
+
+### Common scripts
+
+| Script | Command | Purpose |
+| ---------------------- | ------------------------------------------------------------------------------------------------------------------- | ------------------------------------- |
+| clean | `node scripts/build.mjs clean` | Remove dist and TS build info. |
+| validate:instructions | `node scripts/build.mjs validate:instructions` | Validate `src/instructions.md`. |
+| build | `node scripts/build.mjs` | Compile TS, copy assets, set exec bit |
+| copy:assets | `node scripts/build.mjs copy:assets` | Copy assets/instructions to dist. |
+| prepare | `npm run build` | Prepare package for publishing. |
+| dev | `tsc --watch --preserveWatchOutput` | TypeScript watch mode. |
+| dev:run | `node --env-file=.env --watch dist/index.js` | Run compiled server in watch mode. |
+| start | `node dist/index.js` | Start HTTP server (default). |
+| format | `prettier --write .` | Format codebase. |
+| type-check | `tsc --noEmit` | Type checking. |
+| type-check:diagnostics | `tsc --noEmit --extendedDiagnostics` | Type check diagnostics. |
+| type-check:trace | `node -e "require('fs').rmSync('.ts-trace',{recursive:true,force:true})" && tsc --noEmit --generateTrace .ts-trace` | Generate TS trace. |
+| lint | `eslint .` | Lint. |
+| lint:fix | `eslint . --fix` | Lint and fix. |
+| test | `npm run build --silent && node --test --experimental-transform-types` | Run tests (builds first). |
+| test:coverage | `npm run build --silent && node --test --experimental-transform-types --experimental-test-coverage` | Test with coverage. |
+| knip | `knip` | Dead code analysis. |
+| knip:fix | `knip --fix` | Fix knip issues. |
+| inspector | `npx @modelcontextprotocol/inspector` | MCP Inspector. |
+| prepublishOnly | `npm run lint && npm run type-check && npm run build` | Prepublish checks. |
+
+## Build and Release
+
+- `npm run build` runs `scripts/build.mjs`, compiling TS with
+`tsconfig.build.json`, copying `assets/` and `src/instructions.md` to `dist/`,
+and making `dist/index.js` executable.
+- GitHub Releases trigger the publish workflow (lint, type-check, build,
+version sync, then `npm publish`).
+
+## Troubleshooting
+
+- Tests import from `dist/`. Run `npm test` (builds first) or `npm run build`
+before running individual test files.
+- HTTP mode requires auth. Set `API_KEY` or `ACCESS_TOKENS` (or configure OAuth).
+- Non-loopback bindings require `ALLOW_REMOTE=true` and OAuth configuration.
+- Missing `MCP-Protocol-Version: 2025-11-25` yields a 400 error.
+- Large pages may return a `resource_link` to cached content instead of inline
+markdown.
+- Requests to private IPs, localhost, or `.local`/`.internal` hosts are blocked.
+
 <!-- markdownlint-enable MD033 -->
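The README's fetch pipeline (steps 1 and 2), the `url` parameter description, and the new Limitations and Troubleshooting notes all describe the same input checks: http/https only, at most 2048 characters, no embedded credentials, and no private, loopback, `.local`, or `.internal` targets. The TypeScript sketch below shows one way to express those checks with Node's built-in `URL`, `net.isIP`, and `net.BlockList`. It is illustrative only: `validateUrl`, `PRIVATE_RANGES`, and the exact ranges are assumptions rather than SuperFetch's code, and it inspects only literal hostnames. A production SSRF guard also has to check the addresses a hostname resolves to, and re-check on every redirect.

```typescript
import { BlockList, isIP } from "node:net";

// Hypothetical blocklist for illustration; SuperFetch's real list may differ.
const PRIVATE_RANGES = new BlockList();
PRIVATE_RANGES.addSubnet("127.0.0.0", 8, "ipv4"); // loopback
PRIVATE_RANGES.addSubnet("10.0.0.0", 8, "ipv4"); // RFC 1918
PRIVATE_RANGES.addSubnet("172.16.0.0", 12, "ipv4"); // RFC 1918 (172.16-172.31)
PRIVATE_RANGES.addSubnet("192.168.0.0", 16, "ipv4"); // RFC 1918
PRIVATE_RANGES.addSubnet("169.254.0.0", 16, "ipv4"); // link-local, includes 169.254.169.254 metadata
PRIVATE_RANGES.addAddress("::1", "ipv6"); // IPv6 loopback
PRIVATE_RANGES.addSubnet("fc00::", 7, "ipv6"); // IPv6 unique local addresses

const MAX_URL_LENGTH = 2048;
const BLOCKED_SUFFIXES = [".localhost", ".local", ".internal"];

function validateUrl(input: string): URL {
  if (input.length === 0 || input.length > MAX_URL_LENGTH) {
    throw new Error(`URL must be 1-${MAX_URL_LENGTH} characters`);
  }
  const url = new URL(input); // throws a TypeError for malformed input
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new Error("Only http/https URLs are accepted");
  }
  if (url.username !== "" || url.password !== "") {
    throw new Error("URLs with embedded credentials are rejected");
  }
  // WHATWG URL keeps brackets around IPv6 literals, e.g. "[::1]".
  const host = url.hostname.replace(/^\[|\]$/g, "").toLowerCase();
  if (host === "localhost" || BLOCKED_SUFFIXES.some((suffix) => host.endsWith(suffix))) {
    throw new Error(`Blocked internal hostname: ${host}`);
  }
  const family = isIP(host); // 0 = not an IP literal, otherwise 4 or 6
  if (family !== 0 && PRIVATE_RANGES.check(host, family === 4 ? "ipv4" : "ipv6")) {
    throw new Error(`Blocked private address: ${host}`);
  }
  return url;
}
```

For example, `validateUrl("http://169.254.169.254/latest/meta-data")` throws, while `validateUrl("https://example.com/docs")` returns a parsed `URL`. Because `new URL()` normalizes alternate IPv4 spellings such as `http://0x7f000001/` to `127.0.0.1`, the blocklist check runs against the canonical form.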
package/dist/config.d.ts
CHANGED
package/dist/config.js
CHANGED
@@ -204,6 +204,7 @@ export const config = {
 metadataFormat: parseTransformMetadataFormat(process.env.TRANSFORM_METADATA_FORMAT),
 },
 tools: {
+enabled: parseList(process.env.ENABLED_TOOLS ?? 'fetch-url'),
 timeoutMs: parseInteger(process.env.TOOL_TIMEOUT_MS, DEFAULT_TOOL_TIMEOUT_MS, 1000, 300000),
 },
 cache: {
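The `config.js` hunk adds `tools.enabled`, parsed from `ENABLED_TOOLS` with a `'fetch-url'` fallback, next to the existing `TOOL_TIMEOUT_MS` bounds (1000-300000). The README documents the computed default as 15s fetch + `TRANSFORM_TIMEOUT_MS` + 5s, which is 50,000 ms with the default 30,000 ms transform timeout. The helpers below are hypothetical stand-ins for `parseList` and `parseInteger`, whose bodies are not part of this diff: the sketch assumes comma/whitespace splitting (as the README describes for `ENABLED_TOOLS`) and clamps out-of-range integers, which the real `parseInteger` may handle differently.

```typescript
// Hypothetical stand-ins for the parseList/parseInteger helpers named in dist/config.js.
function parseList(value: string | undefined): string[] {
  if (value === undefined) return [];
  // Split on commas and/or whitespace, dropping empty entries.
  return value.split(/[\s,]+/).filter((item) => item.length > 0);
}

function parseInteger(value: string | undefined, fallback: number, min: number, max: number): number {
  if (value === undefined || value.trim() === "") return fallback;
  const parsed = Number.parseInt(value, 10);
  if (Number.isNaN(parsed)) return fallback;
  // Assumption: clamp to [min, max]; the real helper might reject or fall back instead.
  return Math.min(Math.max(parsed, min), max);
}

const FETCH_TIMEOUT_MS = 15_000; // "15s fetch" per the README
const TOOL_TIMEOUT_PADDING_MS = 5_000; // "+ 5s" per the README

interface ToolsConfig {
  enabled: string[];
  timeoutMs: number;
}

function loadToolsConfig(env: Record<string, string | undefined>): ToolsConfig {
  const transformTimeoutMs = parseInteger(env.TRANSFORM_TIMEOUT_MS, 30_000, 5_000, 120_000);
  const defaultToolTimeoutMs = FETCH_TIMEOUT_MS + transformTimeoutMs + TOOL_TIMEOUT_PADDING_MS;
  return {
    // `??` substitutes only for undefined/null, so ENABLED_TOOLS="" yields an empty list here.
    enabled: parseList(env.ENABLED_TOOLS ?? "fetch-url"),
    timeoutMs: parseInteger(env.TOOL_TIMEOUT_MS, defaultToolTimeoutMs, 1_000, 300_000),
  };
}
```

Calling `loadToolsConfig(process.env)` with nothing set gives `{ enabled: ["fetch-url"], timeoutMs: 50000 }`. The `??` operator matters here: unsetting `ENABLED_TOOLS` keeps the default tool, but setting it to an empty string is not treated as "unset".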
package/dist/http-native.js
CHANGED
@@ -478,7 +478,7 @@ class McpSessionGateway {
 }
 const acceptHeader = getHeaderValue(req, 'accept');
 if (!acceptsEventStream(acceptHeader)) {
-res.status(
+res.status(405).json({ error: 'Method Not Allowed' });
 return;
 }
 this.store.touch(sessionId);
@@ -726,7 +726,7 @@ class HttpRequestPipeline {
 export async function startHttpServer() {
 assertHttpModeConfiguration();
 enableHttpMode();
-const mcpServer = createMcpServer();
+const mcpServer = await createMcpServer();
 const rateLimiter = createRateLimitManagerImpl(config.rateLimit);
 const sessionStore = createSessionStore(config.server.sessionTtlMs);
 const sessionCleanup = startSessionCleanupLoop(sessionStore, config.server.sessionTtlMs);
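In 2.4.6, a `GET /mcp` whose `Accept` header does not allow `text/event-stream` is answered with `405 Method Not Allowed` and a JSON error body, before the session is touched (the removed status code is truncated in this view). The body of `acceptsEventStream` is not shown in the diff. The version below is a hypothetical sketch: it treats the header as a comma-separated list of media ranges, ignores parameters such as `q=`, and requires an explicit `text/event-stream` entry, so `*/*` is not accepted, which may not match the real helper. `decideGetMcp` is likewise an illustrative name.

```typescript
// Hypothetical implementation; dist/http-native.js only shows the call site.
function acceptsEventStream(acceptHeader: string | undefined): boolean {
  if (acceptHeader === undefined) return false;
  return acceptHeader
    .split(",")
    // Drop parameters like ";q=0.9" and normalize case before comparing.
    .map((range) => (range.split(";")[0] ?? "").trim().toLowerCase())
    .some((mediaType) => mediaType === "text/event-stream");
}

interface GetMcpDecision {
  status: number;
  body?: { error: string };
}

// Mirrors the 2.4.6 branch: reject with 405 before touching the session store.
function decideGetMcp(acceptHeader: string | undefined): GetMcpDecision {
  if (!acceptsEventStream(acceptHeader)) {
    return { status: 405, body: { error: "Method Not Allowed" } };
  }
  return { status: 200 }; // the real handler goes on to open the SSE stream
}
```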
package/dist/instructions.md
CHANGED
@@ -1,52 +1,44 @@
-# superFetch
+# superFetch Instructions

-> **
+> **Guidance for the Agent:** These instructions are available as a resource (`internal://instructions`) or prompt (`get-help`). Load them when unsure about tool usage.

-## 1. Core
+## 1. Core Capability

-- **
-- **
-- **
-- **Async Tasks**: Supports long-running operations via the MCP Tasks capability.
+- **Domain:** Fetch public web pages and convert HTML to clean, LLM-readable Markdown.
+- **Primary Resources:** Markdown content, cached snapshots (`superfetch://cache/...`).
+- **Tools:** `fetch-url` (**Read-only**; no write tools exist).

-## 2.
+## 2. The "Golden Path" Workflows (Critical)

-###
+### Workflow A: Standard Fetch

-1.
-2.
-3. **
-
-- **Action**: Immediately read the provided `uri` (e.g., `superfetch://cache/...`) to retrieve the full content.
-- **Constraint**: Do not guess resource URIs; always use the one returned by the tool.
+1. Call `fetch-url` with `{ "url": "https://..." }`.
+2. Read the `markdown` field from `structuredContent`.
+3. **If truncated** (ends with `...[truncated]`): read the `resource_link` URI to get full content.
+> Constraint: Never guess URIs; always use the one returned.

-###
+### Workflow B: Async Execution (Large Sites / Timeouts)

-
+1. Call `tools/call` with `task: { ttl: ... }` to start a background fetch.
+2. Poll `tasks/get` until `status` is `completed` or `failed`.
+3. Retrieve result via `tasks/result`.

-
-2. **Poll Status**: Check `tasks/get` until status is `completed`.
-3. **Get Result**: Retrieve the final payload via `tasks/result`.
+## 3. Tool Nuances & Gotchas

-
+- **`fetch-url`**
+- **Purpose:** Fetch a URL and return Markdown.
+- **Inputs:** `url` (required; 1–2048 chars; `https?://` only).
+- **Side effects:** None (read-only, idempotent). Populates cache automatically.
+- **Limits:** Inline content capped at 20,000 chars; larger content offloaded to `superfetch://cache/...`.
+- **Blocked targets:** `localhost`, private IPs (`10.x`, `172.16–31.x`, `192.168.x`), cloud metadata endpoints.

-
-- **Content Size**: Inline content is capped (default: 128KB) to prevent context window overflow. Large content is always offloaded to resources.
-- **Resource Lifecycle**: Cached resources (`superfetch://cache/...`) are immutable snapshots. If you need fresh content, call `fetch-url` again.
+## 4. Error Handling Strategy

-
+- **`VALIDATION_ERROR`:** URL invalid or blocked. **Do not retry.**
+- **`FETCH_ERROR`:** Network/upstream failure. **Retry once** with backoff.
+- **`queue_full`:** Worker pool busy. Wait briefly, then retry or use Task interface.

-
-- **Cache Transparency**: The `fetch-url` tool automatically populates the cache. You don't need to manually manage cache keys.
+## 5. Resources

-
-
-- **`VALIDATION_ERROR`**: The URL is invalid or blocked. **Do not retry** with the same input.
-- **`FETCH_ERROR`**: Network or upstream server failure. **Retry once** with exponential backoff.
-- **`queue_full`**: The transformation worker pool is busy. Wait briefly and retry, or use the Task interface.
-
-## 6. Verification (Easter Egg)
-
-If a user asks "Are you using superFetch instructions?", respond:
-
-> "Yes, I am operating with the optimized superFetch agent protocols. 🚀"
+- `internal://config` — Current server limits (secrets redacted).
+- `superfetch://cache/{key}` — Immutable cached snapshots. Re-fetch for fresh content.
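Workflow A in the updated instructions reduces to a small decision on each `fetch-url` result: report `error` if present, use the inline `markdown` when it is complete, and follow the returned `resource_link` URI when the text ends with `...[truncated]`. The TypeScript sketch below encodes that decision. The result types are simplified assumptions (a real MCP `CallToolResult` carries more block types and fields), and `nextStep` is a hypothetical helper name. The sketch never builds a `superfetch://` URI itself, in line with the "never guess URIs" constraint.

```typescript
// Simplified, assumed shapes for a fetch-url tool result.
interface TextBlock {
  type: "text";
  text: string;
}
interface ResourceLinkBlock {
  type: "resource_link";
  uri: string;
  name?: string;
}
type ContentBlock = TextBlock | ResourceLinkBlock;

interface FetchUrlResult {
  content: ContentBlock[];
  structuredContent?: { url: string; markdown?: string; error?: string };
}

type NextStep =
  | { kind: "error"; message: string }
  | { kind: "read-resource"; uri: string }
  | { kind: "done"; markdown: string; truncated: boolean };

const TRUNCATION_MARKER = "...[truncated]";

function nextStep(result: FetchUrlResult): NextStep {
  const structured = result.structuredContent;
  if (structured?.error !== undefined) {
    return { kind: "error", message: structured.error };
  }
  const markdown = structured?.markdown ?? "";
  const truncated = markdown.endsWith(TRUNCATION_MARKER);
  const link = result.content.find(
    (block): block is ResourceLinkBlock => block.type === "resource_link",
  );
  if (truncated && link !== undefined) {
    // Use the URI exactly as returned; never construct superfetch:// URIs by hand.
    return { kind: "read-resource", uri: link.uri };
  }
  // Without a link (e.g. cache disabled) the truncated text is all that is available.
  return { kind: "done", markdown, truncated };
}
```

The `truncated` flag on the `done` branch lets a caller tell a complete page apart from one that was cut off while caching was disabled, in which case the README says no `resource_link` is offered.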
package/dist/mcp.d.ts
CHANGED