@j0hanz/superfetch 2.4.5 → 2.4.6

package/README.md CHANGED
@@ -8,27 +8,71 @@
8
8
 
9
9
  [![Install with NPX in VS Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://insiders.vscode.dev/redirect/mcp/install?name=superfetch&inputs=%5B%5D&config=%7B%22command%22%3A%22npx%22%2C%22args%22%3A%5B%22-y%22%2C%22%40j0hanz%2Fsuperfetch%40latest%22%2C%22--stdio%22%5D%7D) [![Install with NPX in VS Code Insiders](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://insiders.vscode.dev/redirect/mcp/install?name=superfetch&inputs=%5B%5D&config=%7B%22command%22%3A%22npx%22%2C%22args%22%3A%5B%22-y%22%2C%22%40j0hanz%2Fsuperfetch%40latest%22%2C%22--stdio%22%5D%7D&quality=insiders) [![Install in Claude Desktop](https://img.shields.io/badge/Claude_Desktop-Install-ff9800?style=flat-square&logo=anthropic&logoColor=white)](https://claude.ai/desktop/mcp/install?name=superfetch&config=%7B%22command%22%3A%22npx%22%2C%22args%22%3A%5B%22-y%22%2C%22%40j0hanz%2Fsuperfetch%40latest%22%2C%22--stdio%22%5D%7D)
10
10
 
11
- Fetch and convert public web pages to clean, AI-friendly and human-readable Markdown via MCP.
11
+ Fetch and convert public web pages to clean, AI-friendly Markdown via MCP.
12
12
 
13
13
  ## Overview
14
14
 
15
- | Feature | Details |
16
- | -------------------- | -------------------------------------------------------------------------- |
17
- | HTML Markdown | Mozilla Readability + node-html-markdown pipeline with metadata injection. |
18
- | Raw content handling | Rewrites supported GitHub/GitLab/Bitbucket/Gist URLs to raw content. |
19
- | Caching + resources | LRU cache with resource listing and update notifications. |
20
- | Transport | Stdio (local clients) and Streamable HTTP (self-hosted). |
21
- | Safety | SSRF/IP blocklists, Host/Origin validation, auth for HTTP mode. |
15
+ SuperFetch is a Node.js MCP server that fetches public HTTP(S) pages, extracts
16
+ primary content, and converts it into clean Markdown. It runs either as a stdio
17
+ MCP server (for local clients) or as a Streamable HTTP MCP server with auth,
18
+ cache, and SSRF protections.
22
19
 
23
- ### When to use
20
+ ## Key Features
24
21
 
25
- - You need clean, AI-friendly Markdown from public http(s) URLs.
26
- - You want a single MCP tool that handles fetching, extraction, and caching.
27
- - You need self-hosted HTTP with auth and session management.
22
+ - HTML to Markdown using Mozilla Readability + node-html-markdown.
23
+ - Raw content URL rewriting for GitHub, GitLab, Bitbucket, and Gist.
24
+ - In-memory LRU cache exposed as MCP resources and HTTP download endpoints.
25
+ - Stdio or Streamable HTTP transport with session management.
26
+ - SSRF protections: blocked private IP ranges and internal hostnames.
28
27
 
29
- ## Quick Start
28
+ ## Tech Stack
30
29
 
31
- Recommended for MCP clients: stdio mode.
30
+ - Runtime: Node.js >= 20.18.1 (engines)
31
+ - Language: TypeScript 5.9.3 (dev dependency)
32
+ - MCP SDK: @modelcontextprotocol/sdk ^1.25.3
33
+ - HTML processing: @mozilla/readability ^0.6.0, linkedom ^0.18.12
34
+ - Markdown conversion: node-html-markdown ^2.0.0
35
+ - HTTP client: undici ^7.19.2
36
+ - Validation: zod ^4.3.6
37
+ - Package manager: npm (package-lock.json)
38
+
39
+ ## Architecture
40
+
41
+ Fetch pipeline (simplified):
42
+
43
+ 1. Validate and normalize the URL (http/https only, max length 2048).
44
+ 2. Block internal hosts and private IP ranges.
45
+ 3. Rewrite supported repo URLs to raw content.
46
+ 4. Fetch HTML with undici (15s timeout, 10 MB max, 5 redirects).
47
+ 5. Extract main content with Readability + DOM cleanup.
48
+ 6. Convert to Markdown, inject metadata, and return via MCP.
49
+ 7. Cache the result and expose it as a resource or download.
50
+
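Steps 1 and 2 of the pipeline above can be sketched roughly as follows. This is a minimal illustration, not SuperFetch's actual code: `checkUrl`, `isBlockedIpv4`, and the `reason` strings are hypothetical names, and the credential and `.local`/`.internal` checks follow the README's Limitations and Troubleshooting notes. A real SSRF guard would presumably also check the addresses a hostname resolves to, which this sketch does not attempt.

```typescript
// Illustrative only: names and structure are assumptions, not SuperFetch's code.
const MAX_URL_LENGTH = 2048;

type UrlCheck = { ok: true; url: URL } | { ok: false; reason: string };

// True for loopback, RFC 1918 private, and link-local IPv4 literals.
function isBlockedIpv4(host: string): boolean {
  const match = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (match === null) return false;
  const a = Number(match[1]);
  const b = Number(match[2]);
  return (
    a === 10 ||
    a === 127 ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168) ||
    (a === 169 && b === 254)
  );
}

function checkUrl(input: string): UrlCheck {
  // Step 1: length, parseability, scheme, and embedded credentials.
  if (input.length === 0 || input.length > MAX_URL_LENGTH) {
    return { ok: false, reason: "length" };
  }
  let url: URL;
  try {
    url = new URL(input);
  } catch {
    return { ok: false, reason: "parse" };
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    return { ok: false, reason: "protocol" };
  }
  if (url.username !== "" || url.password !== "") {
    return { ok: false, reason: "credentials" };
  }
  // Step 2: internal hostnames and private IPv4 literals.
  const host = url.hostname.toLowerCase();
  if (
    host === "localhost" ||
    host.endsWith(".local") ||
    host.endsWith(".internal") ||
    isBlockedIpv4(host)
  ) {
    return { ok: false, reason: "blocked-host" };
  }
  return { ok: true, url };
}
```

Returning a tagged result rather than throwing keeps the "do not retry" decision for invalid URLs explicit at the call site.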
51
+ ## Repository Structure
52
+
53
+ ```text
54
+ .
55
+ ├── assets/ # Logo and static assets copied to dist
56
+ ├── scripts/ # Build and validation utilities
57
+ ├── src/ # MCP server implementation (TS)
58
+ │ ├── workers/ # Worker-thread transform implementation
59
+ │ ├── http-native.ts # Streamable HTTP server
60
+ │ ├── mcp.ts # MCP server wiring
61
+ │ ├── tools.ts # fetch-url tool implementation
62
+ │ └── ...
63
+ ├── tests/ # Node test runner tests (import dist)
64
+ ├── CONFIGURATION.md # Full configuration reference
65
+ ├── AGENTS.md # Agent guidance
66
+ ├── package.json
67
+ └── tsconfig.json
68
+ ```
69
+
70
+ ## Requirements
71
+
72
+ - Node.js >= 20.18.1
73
+ - npm (uses package-lock.json)
74
+
75
+ ## Quickstart (stdio)
32
76
 
33
77
  ```bash
34
78
  npx -y @j0hanz/superfetch@latest --stdio
@@ -74,33 +118,46 @@ node dist/index.js --stdio
74
118
 
75
119
  ## Configuration
76
120
 
121
+ SuperFetch is configured entirely via environment variables. Set them in your
122
+ MCP client configuration (the `env` field) or in the shell before starting the
123
+ server. For the full reference, see `CONFIGURATION.md`.
124
+
125
+ ### Runtime modes
126
+
127
+ | Mode | Flag | Description |
128
+ | ----- | --------- | ------------------------------------------------------------------- |
129
+ | Stdio | `--stdio` | Communicates via stdin/stdout. No HTTP server. |
130
+ | HTTP | (default) | Starts an HTTP server. Requires static token(s) or OAuth to be set. |
131
+
77
132
  ### CLI arguments
78
133
 
79
134
  | Argument | Type | Default | Description |
80
135
  | --------- | ------- | ------- | ----------------------------------- |
81
136
  | `--stdio` | boolean | false | Run in stdio mode (no HTTP server). |
82
137
 
83
- ### Environment variables
84
-
85
- #### Core server settings
138
+ ### Core server settings
86
139
 
87
140
  | Variable | Default | Description |
88
141
  | ---------------------------------- | -------------------- | -------------------------------------------------------------- |
89
142
  | `HOST` | `127.0.0.1` | HTTP bind address. |
90
- | `PORT` | `3000` | HTTP server port (1024-65535, `0` for ephemeral). |
91
- | `USER_AGENT` | `superFetch-MCP/2.0` | User-Agent header for outgoing requests. |
92
- | `CACHE_ENABLED` | `true` | Enable response caching. |
143
+ | `PORT` | `3000` | HTTP port (1024-65535, `0` for ephemeral). |
144
+ | `USER_AGENT` | `superFetch-MCP/2.0` | User-Agent for outbound requests. |
145
+ | `CACHE_ENABLED` | `true` | Enable in-memory cache. |
93
146
  | `CACHE_TTL` | `3600` | Cache TTL in seconds (60-86400). |
94
- | `LOG_LEVEL` | `info` | Logging level (`debug` enables verbose logs). |
147
+ | `LOG_LEVEL` | `info` | Logging level (`debug`, `info`, `warn`, `error`). |
95
148
  | `ALLOW_REMOTE` | `false` | Allow non-loopback binds (OAuth required). |
96
149
  | `ALLOWED_HOSTS` | (empty) | Additional allowed Host/Origin values (comma/space separated). |
97
150
  | `TRANSFORM_TIMEOUT_MS` | `30000` | Worker transform timeout in ms (5000-120000). |
98
- | `TOOL_TIMEOUT_MS` | `50000` | Overall tool timeout in ms (1000-300000). |
151
+ | `TRANSFORM_STAGE_WARN_RATIO` | `0.5` | Emit warnings when stage exceeds ratio of timeout. |
152
+ | `TOOL_TIMEOUT_MS` | computed | Overall tool timeout in ms (1000-300000). |
99
153
  | `TRANSFORM_METADATA_FORMAT` | `markdown` | Metadata format: `markdown` or `frontmatter`. |
154
+ | `ENABLED_TOOLS` | `fetch-url` | Comma/space-separated list of enabled tools. |
100
155
  | `SUPERFETCH_EXTRA_NOISE_TOKENS` | (empty) | Extra noise tokens for DOM noise removal. |
101
156
  | `SUPERFETCH_EXTRA_NOISE_SELECTORS` | (empty) | Extra CSS selectors for DOM noise removal. |
102
157
 
103
- #### HTTP server tuning (optional)
158
+ `TOOL_TIMEOUT_MS` defaults to 15s fetch + `TRANSFORM_TIMEOUT_MS` + 5s.
159
+
160
+ ### HTTP server tuning (optional)
104
161
 
105
162
  | Variable | Default | Description |
106
163
  | ------------------------------ | ------- | --------------------------------------------- |
@@ -110,17 +167,17 @@ node dist/index.js --stdio
110
167
  | `SERVER_SHUTDOWN_CLOSE_IDLE` | `false` | Close idle connections on shutdown. |
111
168
  | `SERVER_SHUTDOWN_CLOSE_ALL` | `false` | Close all connections on shutdown. |
112
169
 
113
- #### Auth (HTTP mode)
170
+ ### Auth (HTTP mode)
114
171
 
115
- | Variable | Default | Description |
116
- | --------------- | ------- | ---------------------------------------------------- |
117
- | `AUTH_MODE` | auto | `static` or `oauth` (auto-detected from OAuth URLs). |
118
- | `ACCESS_TOKENS` | (empty) | Comma/space-separated static bearer tokens. |
119
- | `API_KEY` | (empty) | Adds a static bearer token and enables `X-API-Key`. |
172
+ | Variable | Default | Description |
173
+ | --------------- | ------- | ----------------------------------------------------------- |
174
+ | `AUTH_MODE` | auto | `static` or `oauth` (auto-selects OAuth when URLs are set). |
175
+ | `ACCESS_TOKENS` | (empty) | Comma/space-separated static bearer tokens. |
176
+ | `API_KEY` | (empty) | Adds a static bearer token and enables `X-API-Key`. |
120
177
 
121
178
  Static mode requires at least one token (`ACCESS_TOKENS` or `API_KEY`).
122
179
 
123
- #### OAuth (HTTP mode)
180
+ ### OAuth (HTTP mode)
124
181
 
125
182
  Required when `AUTH_MODE=oauth` (or auto-selected by OAuth URLs):
126
183
 
@@ -143,19 +200,25 @@ Optional:
143
200
  | `OAUTH_CLIENT_SECRET` | - | Client secret for introspection. |
144
201
  | `OAUTH_INTROSPECTION_TIMEOUT_MS` | `5000` | Introspection timeout (1000-30000). |
145
202
 
146
- ### HTTP mode endpoints
203
+ ## Usage
147
204
 
148
- | Method | Path | Auth | Notes |
149
- | ------ | --------------------------------- | ---- | -------------------------------------------------- |
150
- | GET | `/health` | No | Health check. |
151
- | POST | `/mcp` | Yes | Streamable HTTP JSON-RPC requests. |
152
- | GET | `/mcp` | Yes | SSE stream (requires `Accept: text/event-stream`). |
153
- | DELETE | `/mcp` | Yes | Close the session. |
154
- | GET | `/mcp/downloads/:namespace/:hash` | Yes | Download cached markdown. |
205
+ ### Stdio (local MCP clients)
206
+
207
+ ```bash
208
+ npx -y @j0hanz/superfetch@latest --stdio
209
+ ```
210
+
211
+ ### HTTP server (local only, static token)
212
+
213
+ ```bash
214
+ API_KEY=YOUR_TOKEN_HERE npm start
215
+ ```
216
+
217
+ Requires `npm run build` before `npm start` when running from source.
155
218
 
156
- Sessions are managed via the `mcp-session-id` header. A `POST /mcp` `initialize` request creates a session and returns the session id.
219
+ Remote bindings require `ALLOW_REMOTE=true` and OAuth configuration.
157
220
 
158
- ## API Reference
221
+ ## MCP Surface
159
222
 
160
223
  ### Tools
161
224
 
@@ -163,60 +226,67 @@ Sessions are managed via the `mcp-session-id` header. A `POST /mcp` `initialize`
163
226
 
164
227
  Fetches a webpage and converts it to clean Markdown.
165
228
 
166
- ##### Parameters
229
+ Parameters:
167
230
 
168
- | Name | Type | Required | Default | Description |
169
- | ----- | ------ | -------- | ------- | ------------------------------------ |
170
- | `url` | string | Yes | - | Public http(s) URL, max length 2048. |
231
+ | Name | Type | Required | Description |
232
+ | ----- | ------ | -------- | ------------------------------------ |
233
+ | `url` | string | Yes | Public http(s) URL, max length 2048. |
171
234
 
172
- ##### Returns
173
-
174
- `structuredContent` fields:
235
+ Structured response fields:
175
236
 
176
237
  - `url` (string): fetched URL
177
238
  - `inputUrl` (string, optional): original input URL
178
239
  - `resolvedUrl` (string, optional): normalized or raw-content URL
179
240
  - `title` (string, optional): page title
180
- - `markdown` (string, optional): markdown content (inline when available)
241
+ - `markdown` (string, optional): inline markdown (may be truncated)
181
242
  - `error` (string, optional): error message on failure
182
243
 
183
- ##### Example success
244
+ Limitations:
184
245
 
185
- ```json
186
- {
187
- "url": "https://example.com/docs",
188
- "inputUrl": "https://example.com/docs",
189
- "resolvedUrl": "https://example.com/docs",
190
- "title": "Example Docs",
191
- "markdown": "# Getting Started\n\n..."
192
- }
193
- ```
194
-
195
- ##### Example error
196
-
197
- ```json
198
- {
199
- "url": "https://example.com/404",
200
- "error": "Failed to fetch URL: 404 Not Found"
201
- }
202
- ```
246
+ - Only http/https URLs are accepted; URLs with embedded credentials are rejected.
247
+ - Client-side JavaScript is not executed.
203
248
 
204
- ##### Large content handling
249
+ Large content handling:
205
250
 
206
251
  - Inline markdown is capped at 20,000 characters.
207
- - When content exceeds the inline limit and cache is enabled, responses include a `resource_link` to `superfetch://cache/markdown/{urlHash}`.
208
- - If cache is disabled, inline content is truncated with `...[truncated]`.
252
+ - If cache is enabled and content exceeds the inline limit, the tool response
253
+ includes a `resource_link` block pointing to `superfetch://cache/markdown/{urlHash}`.
254
+ - If cache is disabled, inline markdown is truncated with `...[truncated]`.
255
+ - In stdio mode, the tool also embeds a `resource` block containing full
256
+ markdown content when available.
209
257
 
210
258
  ### Resources
211
259
 
212
- | URI pattern | Description | MIME type |
213
- | --------------------------------------- | ------------------------------ | --------------- |
214
- | `superfetch://cache/markdown/{urlHash}` | Cached markdown content entry. | `text/markdown` |
215
- | `internal://instructions` | Server usage instructions. | `text/markdown` |
260
+ | URI pattern | Description | MIME type |
261
+ | --------------------------------------- | ------------------------------------------ | ------------------ |
262
+ | `superfetch://cache/markdown/{urlHash}` | Cached markdown content entry. | `text/markdown` |
263
+ | `internal://instructions` | Server usage instructions. | `text/markdown` |
264
+ | `internal://config` | Current runtime config (secrets redacted). | `application/json` |
216
265
 
217
266
  ### Prompts
218
267
 
219
- No prompts are registered in this server.
268
+ - `summarize-webpage` (registered when `fetch-url` is enabled)
269
+
270
+ ### Tasks
271
+
272
+ `fetch-url` supports async execution via MCP tasks. Call `tools/call` with a
273
+ `task` payload to start a background fetch, then use `tasks/get`,
274
+ `tasks/result`, or `tasks/cancel` to manage it.
275
+
276
+ ## HTTP Mode Endpoints
277
+
278
+ | Method | Path | Auth | Notes |
279
+ | ------ | --------------------------------- | ---- | -------------------------------------------------- |
280
+ | GET | `/health` | No | Health check. |
281
+ | POST | `/mcp` | Yes | Streamable HTTP JSON-RPC requests. |
282
+ | GET | `/mcp` | Yes | SSE stream (requires `Accept: text/event-stream`). |
283
+ | DELETE | `/mcp` | Yes | Close the session. |
284
+ | GET | `/mcp/downloads/:namespace/:hash` | Yes | Download cached markdown. |
285
+
286
+ Notes:
287
+
288
+ - HTTP requests must include `MCP-Protocol-Version: 2025-11-25`.
289
+ - Sessions are managed via the `mcp-session-id` header.
220
290
 
221
291
  ## Client Configuration Examples
222
292
 
@@ -288,71 +358,58 @@ Add to claude_desktop_config.json:
288
358
 
289
359
  </details>
290
360
 
291
- ## Security
292
-
293
- - Stdio logs are written to stderr (stdout is reserved for MCP traffic).
294
- - HTTP mode validates Host and Origin headers against allowed hosts.
295
- - HTTP mode requires `MCP-Protocol-Version: 2025-11-25`.
296
- - Auth is required for HTTP mode (static tokens or OAuth).
297
- - SSRF protections block private IP ranges and common metadata endpoints.
298
- - Rate limiting: 100 requests/minute per IP (60s window) for HTTP routes.
361
+ ## Development Workflow
299
362
 
300
- ## Development
363
+ ### Install dependencies
301
364
 
302
- ### Prerequisites
303
-
304
- - Node.js >= 20.18.1
305
- - npm
306
-
307
- ### Scripts
308
-
309
- | Script | Command | Purpose |
310
- | ---------------------- | ----------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------- |
311
- | clean | `node scripts/clean.mjs` | Remove build artifacts. |
312
- | validate:instructions | `node scripts/validate-instructions.mjs` | Validate embedded instructions. |
313
- | build | `npm run clean && tsc -p tsconfig.json && npm run validate:instructions && npm run copy:assets && node scripts/make-executable.mjs` | Build the server. |
314
- | copy:assets | `node scripts/copy-assets.mjs` | Copy static assets. |
315
- | prepare | `npm run build` | Prepare package for publishing. |
316
- | dev | `tsc --watch --preserveWatchOutput` | TypeScript watch mode. |
317
- | dev:run | `node --watch dist/index.js` | Run compiled server in watch mode. |
318
- | start | `node dist/index.js` | Start HTTP server (default). |
319
- | format | `prettier --write .` | Format codebase. |
320
- | type-check | `tsc --noEmit` | Type checking. |
321
- | type-check:diagnostics | `tsc --noEmit --extendedDiagnostics` | Type check diagnostics. |
322
- | type-check:trace | `tsc --noEmit --generateTrace .ts-trace` | Generate TS trace. |
323
- | lint | `eslint .` | Lint. |
324
- | lint:fix | `eslint . --fix` | Lint and fix. |
325
- | test | `npm run build --silent && node --test --experimental-transform-types` | Run tests (builds first). |
326
- | test:coverage | `npm run build --silent && node --test --experimental-transform-types --experimental-test-coverage` | Test with coverage. |
327
- | knip | `knip` | Dead code analysis. |
328
- | knip:fix | `knip --fix` | Fix knip issues. |
329
- | inspector | `npx @modelcontextprotocol/inspector` | MCP Inspector. |
330
- | prepublishOnly | `npm run lint && npm run type-check && npm run build` | Prepublish checks. |
331
-
332
- ### Project structure
333
-
334
- ```text
335
- superFetch
336
- ├── docs
337
- │ └── logo.png
338
- ├── src
339
- │ ├── workers
340
- │ ├── cache.ts
341
- │ ├── config.ts
342
- │ ├── fetch.ts
343
- │ ├── http-native.ts
344
- │ ├── http-utils.ts
345
- │ ├── index.ts
346
- │ ├── instructions.md
347
- │ ├── mcp.ts
348
- │ ├── tools.ts
349
- │ ├── transform.ts
350
- │ └── ...
351
- ├── tests
352
- │ └── *.test.ts
353
- ├── CONFIGURATION.md
354
- ├── package.json
355
- └── tsconfig.json
365
+ ```bash
366
+ npm install
356
367
  ```
357
368
 
369
+ Use `npm ci` for clean, reproducible installs.
370
+
371
+ ### Common scripts
372
+
373
+ | Script | Command | Purpose |
374
+ | ---------------------- | ------------------------------------------------------------------------------------------------------------------- | ------------------------------------- |
375
+ | clean | `node scripts/build.mjs clean` | Remove dist and TS build info. |
376
+ | validate:instructions | `node scripts/build.mjs validate:instructions` | Validate `src/instructions.md`. |
377
+ | build | `node scripts/build.mjs` | Compile TS, copy assets, set exec bit |
378
+ | copy:assets | `node scripts/build.mjs copy:assets` | Copy assets/instructions to dist. |
379
+ | prepare | `npm run build` | Prepare package for publishing. |
380
+ | dev | `tsc --watch --preserveWatchOutput` | TypeScript watch mode. |
381
+ | dev:run | `node --env-file=.env --watch dist/index.js` | Run compiled server in watch mode. |
382
+ | start | `node dist/index.js` | Start HTTP server (default). |
383
+ | format | `prettier --write .` | Format codebase. |
384
+ | type-check | `tsc --noEmit` | Type checking. |
385
+ | type-check:diagnostics | `tsc --noEmit --extendedDiagnostics` | Type check diagnostics. |
386
+ | type-check:trace | `node -e "require('fs').rmSync('.ts-trace',{recursive:true,force:true})" && tsc --noEmit --generateTrace .ts-trace` | Generate TS trace. |
387
+ | lint | `eslint .` | Lint. |
388
+ | lint:fix | `eslint . --fix` | Lint and fix. |
389
+ | test | `npm run build --silent && node --test --experimental-transform-types` | Run tests (builds first). |
390
+ | test:coverage | `npm run build --silent && node --test --experimental-transform-types --experimental-test-coverage` | Test with coverage. |
391
+ | knip | `knip` | Dead code analysis. |
392
+ | knip:fix | `knip --fix` | Fix knip issues. |
393
+ | inspector | `npx @modelcontextprotocol/inspector` | MCP Inspector. |
394
+ | prepublishOnly | `npm run lint && npm run type-check && npm run build` | Prepublish checks. |
395
+
396
+ ## Build and Release
397
+
398
+ - `npm run build` runs `scripts/build.mjs`, compiling TS with
399
+ `tsconfig.build.json`, copying `assets/` and `src/instructions.md` to `dist/`,
400
+ and making `dist/index.js` executable.
401
+ - GitHub Releases trigger the publish workflow (lint, type-check, build,
402
+ version sync, then `npm publish`).
403
+
404
+ ## Troubleshooting
405
+
406
+ - Tests import from `dist/`. Run `npm test` (builds first) or `npm run build`
407
+ before running individual test files.
408
+ - HTTP mode requires auth. Set `API_KEY` or `ACCESS_TOKENS` (or configure OAuth).
409
+ - Non-loopback bindings require `ALLOW_REMOTE=true` and OAuth configuration.
410
+ - Missing `MCP-Protocol-Version: 2025-11-25` yields a 400 error.
411
+ - Large pages may return a `resource_link` to cached content instead of inline
412
+ markdown.
413
+ - Requests to private IPs, localhost, or `.local`/`.internal` hosts are blocked.
414
+
358
415
  <!-- markdownlint-enable MD033 -->
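The HTTP-mode notes in the README diff above (auth is required, `MCP-Protocol-Version: 2025-11-25` must be present, sessions travel in `mcp-session-id`) suggest a client-side helper along these lines. This is a sketch under assumptions: `buildMcpHeaders` and `McpRequestOptions` are hypothetical names, and only the header names and the version value come from the README.

```typescript
// Hypothetical helper: header names and the version string are from the README,
// everything else is an assumption made for illustration.
interface McpRequestOptions {
  token: string; // a static bearer token (e.g. one of ACCESS_TOKENS)
  sessionId?: string; // returned by the server once a session exists
}

function buildMcpHeaders(opts: McpRequestOptions): Record<string, string> {
  const headers: Record<string, string> = {
    Authorization: `Bearer ${opts.token}`,
    "MCP-Protocol-Version": "2025-11-25",
  };
  // Only attach the session header after a session has been established.
  if (opts.sessionId !== undefined) {
    headers["mcp-session-id"] = opts.sessionId;
  }
  return headers;
}
```

Per the Troubleshooting list, omitting the protocol-version header is expected to yield a 400, so centralising it in one helper avoids that class of failure.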
@@ -478,7 +478,7 @@ class McpSessionGateway {
478
478
  }
479
479
  const acceptHeader = getHeaderValue(req, 'accept');
480
480
  if (!acceptsEventStream(acceptHeader)) {
481
- res.status(406).json({ error: 'Not Acceptable' });
481
+ res.status(405).json({ error: 'Method Not Allowed' });
482
482
  return;
483
483
  }
484
484
  this.store.touch(sessionId);
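The hunk above changes the response for a `GET /mcp` request that does not accept `text/event-stream` from 406 to 405. The diff does not show how `acceptsEventStream` is implemented, so the following is only a plausible stand-in, with `statusForSseGet` as a further hypothetical helper, to make the gate concrete.

```typescript
// Hypothetical stand-in for the helper named in the diff; the real one may differ
// (for example, it might also treat wildcards such as */* as acceptable).
function acceptsEventStream(acceptHeader: string | undefined): boolean {
  if (acceptHeader === undefined) return false;
  return acceptHeader
    .split(",")
    // Drop parameters such as ";q=0.9" and normalise case and whitespace.
    .map((part) => (part.split(";")[0] ?? "").trim().toLowerCase())
    .includes("text/event-stream");
}

// Status a GET /mcp handler would pick before opening the stream,
// mirroring the 2.4.6 behaviour shown in the hunk.
function statusForSseGet(acceptHeader: string | undefined): number {
  return acceptsEventStream(acceptHeader) ? 200 : 405;
}
```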
@@ -726,7 +726,7 @@ class HttpRequestPipeline {
726
726
  export async function startHttpServer() {
727
727
  assertHttpModeConfiguration();
728
728
  enableHttpMode();
729
- const mcpServer = createMcpServer();
729
+ const mcpServer = await createMcpServer();
730
730
  const rateLimiter = createRateLimitManagerImpl(config.rateLimit);
731
731
  const sessionStore = createSessionStore(config.server.sessionTtlMs);
732
732
  const sessionCleanup = startSessionCleanupLoop(sessionStore, config.server.sessionTtlMs);
@@ -1,52 +1,44 @@
1
- # superFetch Server Instructions
1
+ # superFetch Instructions
2
2
 
3
- > **Audience:** These instructions are written for LLMs and autonomous agents. Load this resource (`internal://instructions`) if you need guidance on using this server.
3
+ > **Guidance for the Agent:** These instructions are available as a resource (`internal://instructions`) or prompt (`get-help`). Load them when unsure about tool usage.
4
4
 
5
- ## 1. Core Capabilities
5
+ ## 1. Core Capability
6
6
 
7
- - **Web Fetching**: fast, secure retrieval of public web pages via `fetch-url`.
8
- - **Content Transformation**: Converts messy HTML into clean, LLM-optimized Markdown.
9
- - **Caching**: Persists results to avoiding redundant network calls.
10
- - **Async Tasks**: Supports long-running operations via the MCP Tasks capability.
7
+ - **Domain:** Fetch public web pages and convert HTML to clean, LLM-readable Markdown.
8
+ - **Primary Resources:** Markdown content, cached snapshots (`superfetch://cache/...`).
9
+ - **Tools:** `fetch-url` (**Read-only**; no write tools exist).
11
10
 
12
- ## 2. Operational Patterns (The "Golden Path")
11
+ ## 2. The "Golden Path" Workflows (Critical)
13
12
 
14
- ### Pattern A: Standard Fetch & Read
13
+ ### Workflow A: Standard Fetch
15
14
 
16
- 1. **Call Tool**: Invoke `fetch-url` with `{ "url": "https://..." }`.
17
- 2. **Inspect Output**: Check the `markdown` field in the result.
18
- 3. **Handle Truncation**:
19
- - If the content ends with `...[truncated]`, the response will include a `resource_link` content block.
20
- - **Action**: Immediately read the provided `uri` (e.g., `superfetch://cache/...`) to retrieve the full content.
21
- - **Constraint**: Do not guess resource URIs; always use the one returned by the tool.
15
+ 1. Call `fetch-url` with `{ "url": "https://..." }`.
16
+ 2. Read the `markdown` field from `structuredContent`.
17
+ 3. **If truncated** (ends with `...[truncated]`): read the `resource_link` URI to get full content.
18
+ > Constraint: Never guess URIs; always use the one returned.
22
19
 
23
- ### Pattern B: Asynchronous Execution (Tasks)
20
+ ### Workflow B: Async Execution (Large Sites / Timeouts)
24
21
 
25
- _Use this when fetching large sites or if you encounter timeouts._
22
+ 1. Call `tools/call` with `task: { ttl: ... }` to start a background fetch.
23
+ 2. Poll `tasks/get` until `status` is `completed` or `failed`.
24
+ 3. Retrieve result via `tasks/result`.
26
25
 
27
- 1. **Submit Task**: Use the `tasks` capability to submit a fetch operation.
28
- 2. **Poll Status**: Check `tasks/get` until status is `completed`.
29
- 3. **Get Result**: Retrieve the final payload via `tasks/result`.
26
+ ## 3. Tool Nuances & Gotchas
30
27
 
31
- ## 3. Constraints & Limitations
28
+ - **`fetch-url`**
29
+ - **Purpose:** Fetch a URL and return Markdown.
30
+ - **Inputs:** `url` (required; 1–2048 chars; `https?://` only).
31
+ - **Side effects:** None (read-only, idempotent). Populates cache automatically.
32
+ - **Limits:** Inline content capped at 20,000 chars; larger content offloaded to `superfetch://cache/...`.
33
+ - **Blocked targets:** `localhost`, private IPs (`10.x`, `172.16–31.x`, `192.168.x`), cloud metadata endpoints.
32
34
 
33
- - **Network Access**: strictly limited to **public internet** IPs. Access to private ranges (localhost, 127.0.0.1, 10.x.x.x, etc.) is blocked by default for security.
34
- - **Content Size**: Inline content is capped (default: 128KB) to prevent context window overflow. Large content is always offloaded to resources.
35
- - **Resource Lifecycle**: Cached resources (`superfetch://cache/...`) are immutable snapshots. If you need fresh content, call `fetch-url` again.
35
+ ## 4. Error Handling Strategy
36
36
 
37
- ## 4. Cross-Tool Relationships
37
+ - **`VALIDATION_ERROR`:** URL invalid or blocked. **Do not retry.**
38
+ - **`FETCH_ERROR`:** Network/upstream failure. **Retry once** with backoff.
39
+ - **`queue_full`:** Worker pool busy. Wait briefly, then retry or use Task interface.
38
40
 
39
- - **Config Awareness**: Read `internal://config` to discover current server limits (e.g., `maxUrlLength`, `maxInlineContentChars`, `timeoutMs`).
40
- - **Cache Transparency**: The `fetch-url` tool automatically populates the cache. You don't need to manually manage cache keys.
41
+ ## 5. Resources
41
42
 
42
- ## 5. Error Handling Strategy
43
-
44
- - **`VALIDATION_ERROR`**: The URL is invalid or blocked. **Do not retry** with the same input.
45
- - **`FETCH_ERROR`**: Network or upstream server failure. **Retry once** with exponential backoff.
46
- - **`queue_full`**: The transformation worker pool is busy. Wait briefly and retry, or use the Task interface.
47
-
48
- ## 6. Verification (Easter Egg)
49
-
50
- If a user asks "Are you using superFetch instructions?", respond:
51
-
52
- > "Yes, I am operating with the optimized superFetch agent protocols. 🚀"
43
+ - `internal://config` Current server limits (secrets redacted).
44
+ - `superfetch://cache/{key}` — Immutable cached snapshots. Re-fetch for fresh content.
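The error-handling guidance in the new instructions (no retry for `VALIDATION_ERROR`, one retry with backoff for `FETCH_ERROR`, a short wait for `queue_full`) could be encoded on the client roughly as below. This is a sketch, not part of the package: `retryBudget`, `withRetry`, and the `codeOf` callback are hypothetical, the single retry for `queue_full` is an assumption, and how an error code is actually surfaced to a client is not shown in the diff.

```typescript
// Hypothetical policy table derived from the instructions text.
function retryBudget(code: string): number {
  switch (code) {
    case "FETCH_ERROR":
      return 1; // "Retry once with backoff"
    case "queue_full":
      return 1; // assumption: one retry after a brief wait
    default:
      return 0; // VALIDATION_ERROR and anything unknown: do not retry
  }
}

// Runs `run`, retrying according to retryBudget with a linearly growing delay.
async function withRetry<T>(
  run: () => Promise<T>,
  codeOf: (err: unknown) => string,
  delayMs = 500,
): Promise<T> {
  let attempt = 0;
  for (;;) {
    try {
      return await run();
    } catch (err) {
      if (attempt >= retryBudget(codeOf(err))) throw err;
      attempt += 1;
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs * attempt));
    }
  }
}
```

For repeated `queue_full` failures, the instructions point to the Task interface as the alternative to further retries.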
package/dist/mcp.d.ts CHANGED
@@ -1,3 +1,3 @@
1
1
  import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
2
- export declare function createMcpServer(): McpServer;
2
+ export declare function createMcpServer(): Promise<McpServer>;
3
3
  export declare function startStdioServer(): Promise<void>;
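The declaration change above (`createMcpServer()` now returns `Promise<McpServer>`) is why the `startHttpServer` hunk adds `await`. A minimal sketch of the effect on callers, using a stand-in `FakeServer` type and hypothetical `createServer`/`start` functions rather than the real SDK class:

```typescript
// Stand-in type; the real McpServer comes from @modelcontextprotocol/sdk.
interface FakeServer {
  name: string;
}

// Mirrors the new signature shape: the factory is now asynchronous.
async function createServer(): Promise<FakeServer> {
  return { name: "superfetch" };
}

async function start(): Promise<string> {
  // Without `await`, `server` would be a Promise and `.name` would not type-check.
  const server = await createServer();
  return server.name;
}
```

Any code that calls the factory directly has to make the same adjustment when upgrading from 2.4.5 to 2.4.6.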