@j0hanz/fetch-url-mcp 1.6.0 → 1.7.0

Files changed (108)
  1. package/README.md +507 -403
  2. package/dist/cli.d.ts +1 -0
  3. package/dist/cli.d.ts.map +1 -0
  4. package/dist/http/auth.d.ts +12 -0
  5. package/dist/http/auth.d.ts.map +1 -0
  6. package/dist/http/auth.js +85 -6
  7. package/dist/http/health.d.ts +1 -0
  8. package/dist/http/health.d.ts.map +1 -0
  9. package/dist/http/helpers.d.ts +1 -0
  10. package/dist/http/helpers.d.ts.map +1 -0
  11. package/dist/http/native.d.ts +1 -0
  12. package/dist/http/native.d.ts.map +1 -0
  13. package/dist/http/native.js +80 -63
  14. package/dist/http/rate-limit.d.ts +1 -0
  15. package/dist/http/rate-limit.d.ts.map +1 -0
  16. package/dist/index.d.ts +1 -0
  17. package/dist/index.d.ts.map +1 -0
  18. package/dist/lib/content.d.ts +3 -0
  19. package/dist/lib/content.d.ts.map +1 -0
  20. package/dist/lib/content.js +16 -11
  21. package/dist/lib/core.d.ts +6 -8
  22. package/dist/lib/core.d.ts.map +1 -0
  23. package/dist/lib/core.js +111 -97
  24. package/dist/lib/fetch-pipeline.d.ts +1 -1
  25. package/dist/lib/fetch-pipeline.d.ts.map +1 -0
  26. package/dist/lib/fetch-pipeline.js +54 -44
  27. package/dist/lib/http.d.ts +1 -0
  28. package/dist/lib/http.d.ts.map +1 -0
  29. package/dist/lib/mcp-tools.d.ts +45 -7
  30. package/dist/lib/mcp-tools.d.ts.map +1 -0
  31. package/dist/lib/mcp-tools.js +37 -6
  32. package/dist/lib/net-utils.d.ts +1 -0
  33. package/dist/lib/net-utils.d.ts.map +1 -0
  34. package/dist/lib/progress.d.ts +1 -0
  35. package/dist/lib/progress.d.ts.map +1 -0
  36. package/dist/lib/task-handlers.d.ts +9 -1
  37. package/dist/lib/task-handlers.d.ts.map +1 -0
  38. package/dist/lib/task-handlers.js +30 -38
  39. package/dist/lib/types.d.ts +4 -0
  40. package/dist/lib/types.d.ts.map +1 -0
  41. package/dist/lib/types.js +12 -1
  42. package/dist/lib/url.d.ts +3 -0
  43. package/dist/lib/url.d.ts.map +1 -0
  44. package/dist/lib/url.js +78 -151
  45. package/dist/lib/utils.d.ts +2 -2
  46. package/dist/lib/utils.d.ts.map +1 -0
  47. package/dist/lib/utils.js +60 -94
  48. package/dist/lib/zod.d.ts +3 -0
  49. package/dist/lib/zod.d.ts.map +1 -0
  50. package/dist/lib/zod.js +33 -0
  51. package/dist/prompts/index.d.ts +2 -1
  52. package/dist/prompts/index.d.ts.map +1 -0
  53. package/dist/prompts/index.js +2 -13
  54. package/dist/resources/index.d.ts +2 -1
  55. package/dist/resources/index.d.ts.map +1 -0
  56. package/dist/resources/index.js +5 -19
  57. package/dist/resources/instructions.d.ts +1 -0
  58. package/dist/resources/instructions.d.ts.map +1 -0
  59. package/dist/resources/instructions.js +2 -0
  60. package/dist/schemas/cache.d.ts +18 -0
  61. package/dist/schemas/cache.d.ts.map +1 -0
  62. package/dist/schemas/cache.js +19 -0
  63. package/dist/schemas/inputs.d.ts +1 -0
  64. package/dist/schemas/inputs.d.ts.map +1 -0
  65. package/dist/schemas/outputs.d.ts +6 -5
  66. package/dist/schemas/outputs.d.ts.map +1 -0
  67. package/dist/schemas/outputs.js +5 -9
  68. package/dist/server.d.ts +1 -0
  69. package/dist/server.d.ts.map +1 -0
  70. package/dist/server.js +9 -7
  71. package/dist/tasks/execution.d.ts +1 -0
  72. package/dist/tasks/execution.d.ts.map +1 -0
  73. package/dist/tasks/execution.js +3 -21
  74. package/dist/tasks/manager.d.ts +2 -6
  75. package/dist/tasks/manager.d.ts.map +1 -0
  76. package/dist/tasks/manager.js +2 -4
  77. package/dist/tasks/owner.d.ts +1 -0
  78. package/dist/tasks/owner.d.ts.map +1 -0
  79. package/dist/tasks/tool-registry.d.ts +2 -0
  80. package/dist/tasks/tool-registry.d.ts.map +1 -0
  81. package/dist/tasks/tool-registry.js +3 -0
  82. package/dist/tools/fetch-url.d.ts +4 -6
  83. package/dist/tools/fetch-url.d.ts.map +1 -0
  84. package/dist/tools/fetch-url.js +61 -59
  85. package/dist/tools/index.d.ts +1 -0
  86. package/dist/tools/index.d.ts.map +1 -0
  87. package/dist/transform/html-translators.d.ts +1 -0
  88. package/dist/transform/html-translators.d.ts.map +1 -0
  89. package/dist/transform/html-translators.js +5 -2
  90. package/dist/transform/metadata.d.ts +1 -0
  91. package/dist/transform/metadata.d.ts.map +1 -0
  92. package/dist/transform/metadata.js +1 -0
  93. package/dist/transform/{workers/shared.d.ts → shared.d.ts} +2 -1
  94. package/dist/transform/shared.d.ts.map +1 -0
  95. package/dist/transform/{workers/shared.js → shared.js} +1 -1
  96. package/dist/transform/transform.d.ts +1 -0
  97. package/dist/transform/transform.d.ts.map +1 -0
  98. package/dist/transform/transform.js +21 -14
  99. package/dist/transform/types.d.ts +1 -4
  100. package/dist/transform/types.d.ts.map +1 -0
  101. package/dist/transform/worker-pool.d.ts +3 -18
  102. package/dist/transform/worker-pool.d.ts.map +1 -0
  103. package/dist/transform/worker-pool.js +51 -167
  104. package/package.json +9 -6
  105. package/dist/transform/workers/transform-child.d.ts +0 -1
  106. package/dist/transform/workers/transform-child.js +0 -15
  107. package/dist/transform/workers/transform-worker.d.ts +0 -1
  108. package/dist/transform/workers/transform-worker.js +0 -13
package/README.md CHANGED
@@ -1,610 +1,714 @@
1
1
  # Fetch URL MCP Server
2
2
 
3
- [![npm version](https://img.shields.io/npm/v/%40j0hanz%2Ffetch-url-mcp)](https://www.npmjs.com/package/@j0hanz/fetch-url-mcp) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) [![Node.js](https://img.shields.io/badge/node-%3E%3D24-3c873a)](https://nodejs.org) [![TypeScript](https://img.shields.io/badge/TypeScript-5.9-3178c6?logo=typescript&logoColor=white)](https://www.typescriptlang.org) [![MCP SDK](https://img.shields.io/badge/MCP%20SDK-1.26-7c3aed)](https://modelcontextprotocol.io)
3
+ [![npm version](https://img.shields.io/npm/v/%40j0hanz%2Ffetch-url-mcp?style=flat-square&logo=npm)](https://www.npmjs.com/package/%40j0hanz%2Ffetch-url-mcp) [![License](https://img.shields.io/badge/license-MIT-blue?style=flat-square)](#contributing-and-license)
4
4
 
5
- [![Install in VS Code](https://img.shields.io/badge/VS_Code-Install-0078d7?logo=visual-studio-code&logoColor=white)](https://insiders.vscode.dev/redirect?url=vscode%3Amcp%2Finstall%3F%7B%22name%22%3A%22fetch-url-mcp%22%2C%22command%22%3A%22npx%22%2C%22args%22%3A%5B%22-y%22%2C%22%40j0hanz%2Ffetch-url-mcp%40latest%22%2C%22--stdio%22%5D%7D) [![Install in VS Code Insiders](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?logo=visual-studio-code&logoColor=white)](https://insiders.vscode.dev/redirect?url=vscode-insiders%3Amcp%2Finstall%3F%7B%22name%22%3A%22fetch-url-mcp%22%2C%22command%22%3A%22npx%22%2C%22args%22%3A%5B%22-y%22%2C%22%40j0hanz%2Ffetch-url-mcp%40latest%22%2C%22--stdio%22%5D%7D) [![Install in Cursor](https://img.shields.io/badge/Cursor-Install-f97316?logo=cursor&logoColor=white)](https://cursor.com/install-mcp?name=fetch-url-mcp&config=eyJjb21tYW5kIjoibnB4IiwiYXJncyI6WyIteSIsIkBqMGhhbnovZmV0Y2gtdXJsLW1jcEBsYXRlc3QiLCItLXN0ZGlvIl19)
5
+ [![Install in VS Code](https://img.shields.io/badge/VS_Code-Install_Server-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://insiders.vscode.dev/redirect/mcp/install?name=fetch-url-mcp&config=%7B%22command%22%3A%22npx%22%2C%22args%22%3A%5B%22-y%22%2C%22%40j0hanz%2Ffetch-url-mcp%40latest%22%5D%7D) [![Install in VS Code Insiders](https://img.shields.io/badge/VS_Code_Insiders-Install_Server-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://insiders.vscode.dev/redirect/mcp/install?name=fetch-url-mcp&config=%7B%22command%22%3A%22npx%22%2C%22args%22%3A%5B%22-y%22%2C%22%40j0hanz%2Ffetch-url-mcp%40latest%22%5D%7D&quality=insiders) [![Install in Visual Studio](https://img.shields.io/badge/Visual_Studio-Install_Server-C16FDE?logo=visualstudio&logoColor=white)](https://vs-open.link/mcp-install?%7B%22fetch-url-mcp%22%3A%7B%22command%22%3A%22npx%22%2C%22args%22%3A%5B%22-y%22%2C%22%40j0hanz%2Ffetch-url-mcp%40latest%22%5D%7D%7D)
6
6
 
7
- Fetch public web pages and convert them into clean, AI-readable Markdown.
7
+ [![Add to LM Studio](https://files.lmstudio.ai/deeplink/mcp-install-light.svg)](https://lmstudio.ai/install-mcp?name=fetch-url-mcp&config=eyJjb21tYW5kIjoibnB4IiwiYXJncyI6WyIteSIsIkBqMGhhbnovZmV0Y2gtdXJsLW1jcEBsYXRlc3QiXX0%3D) [![Install in Cursor](https://cursor.com/deeplink/mcp-install-dark.svg)](https://cursor.com/en/install-mcp?name=fetch-url-mcp&config=eyJjb21tYW5kIjoibnB4IiwiYXJncyI6WyIteSIsIkBqMGhhbnovZmV0Y2gtdXJsLW1jcEBsYXRlc3QiXX0%3D) [![Install in Goose](https://block.github.io/goose/img/extension-install-dark.svg)](https://block.github.io/goose/extension?cmd=npx&arg=-y&arg=%40j0hanz%2Ffetch-url-mcp%40latest&id=%40j0hanz%2Ffetch-url-mcp&name=fetch-url-mcp&description=fetch-url-mcp%20MCP%20server)
8
8
 
9
- ## Overview
9
+ A web content fetcher MCP server that converts HTML to clean, AI and human readable markdown.
10
10
 
11
- Fetch URL is a [Model Context Protocol](https://modelcontextprotocol.io) (MCP) server that fetches public web pages, extracts meaningful content using Mozilla's Readability algorithm, and converts the result into clean Markdown optimized for LLM context windows. It handles noise removal, caching, SSRF protection, async task execution, and supports both **stdio** and **Streamable HTTP** transports.
11
+ ## Overview
12
12
 
13
- > [!NOTE]
14
- > Content extraction quality varies depending on the HTML structure and complexity of the source page. Fetch URL works best with standard article and documentation layouts. Pages relying on client-side JavaScript rendering may yield incomplete results.
13
+ The Fetch URL MCP Server provides a standardized interface for fetching public web content and transforming it into Markdown enriched with structured metadata. It validates URLs, applies noise removal heuristics, and caches results for reuse. The server supports both inline and task-based execution modes, making it suitable for a wide range of client applications and LLM interactions.
15
14
 
16
15
  ## Key Features
17
16
 
18
- - **HTML to Markdown** Content extraction via Mozilla Readability + node-html-markdown
19
- - **Noise removal** Strips navigation, ads, cookie banners, and other non-content elements
20
- - **In-memory LRU cache** Faster repeat fetches with configurable TTL (24 h default)
21
- - **Raw URL rewriting** Auto-converts GitHub, GitLab, Bitbucket, and Gist URLs to raw content endpoints
17
+ - `fetch-url` validates public HTTP(S) URLs, fetches the page, and returns cleaned Markdown plus structured metadata.
18
+ - The tool advertises optional task support and emits progress updates while fetching and transforming larger pages.
19
+ - GitHub, GitLab, Bitbucket, and Gist page URLs are rewritten to raw-content endpoints when possible before fetch.
20
+ - `internal://instructions` and `internal://cache/{namespace}/{hash}` expose built-in guidance and cached Markdown as MCP resources.
21
+ - HTTP mode adds host/origin validation, auth, rate limiting, health checks, OAuth protected-resource metadata, and cached-download URLs.
22
22
 
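Reviewer note: the raw-content rewriting listed in the Key Features above can be illustrated with a minimal sketch. It covers only the GitHub `blob` case and assumes the path shape `/{owner}/{repo}/blob/{ref}/{path}`. The helper name `toRawGitHubUrl` is hypothetical and is not taken from the package, and the package's actual rules for GitLab, Bitbucket, and Gist are not shown in this diff.

```typescript
// Hypothetical helper for illustration; not the package's implementation.
// Rewrites a GitHub "blob" page URL to its raw.githubusercontent.com form.
// Any URL that does not match the assumed shape is returned unchanged.
function toRawGitHubUrl(input: string): string {
  const url = new URL(input);
  if (url.hostname !== "github.com") return input;

  // Assumed path shape: /{owner}/{repo}/blob/{ref}/{path...}
  const [owner, repo, kind, ref, ...rest] = url.pathname
    .split("/")
    .filter(Boolean);
  if (!owner || !repo || kind !== "blob" || !ref || rest.length === 0) {
    return input;
  }

  return `https://raw.githubusercontent.com/${owner}/${repo}/${ref}/${rest.join("/")}`;
}

console.log(toRawGitHubUrl("https://github.com/o/r/blob/main/README.md"));
```

The sketch treats the first segment after `blob` as the ref, so branch names containing a slash would be split incorrectly. It also drops any query string or fragment from rewritten URLs. A real implementation would need to handle both.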
23
- ## Tech Stack
23
+ ## Requirements
24
24
 
25
- | Component | Technology |
26
- | ------------------- | ----------------------------------- |
27
- | Runtime | Node.js >= 24 |
28
- | Language | TypeScript 5.9 |
29
- | MCP SDK | `@modelcontextprotocol/sdk` ^1.26.0 |
30
- | Content Extraction | `@mozilla/readability` ^0.6.0 |
31
- | DOM Parsing | `linkedom` ^0.18.12 |
32
- | Markdown Conversion | `node-html-markdown` ^2.0.0 |
33
- | Schema Validation | `zod` ^4.3.6 |
34
- | Package Manager | npm |
25
+ - Node.js >=24 (from `package.json`)
26
+ - Docker is optional if you want to run the published container image.
35
27
 
36
- ## Architecture
28
+ ## Quick Start
37
29
 
38
- ```text
39
- URL → Validate → DNS Preflight → HTTP Fetch → Decompress
40
- → Truncate HTML → Readability Extract → Noise Removal
41
- → Markdown Convert → Cleanup Pipeline → Cache → Response
30
+ Use this standard MCP client configuration:
31
+
32
+ ```json
33
+ {
34
+ "mcpServers": {
35
+ "fetch-url-mcp": {
36
+ "command": "npx",
37
+ "args": ["-y", "@j0hanz/fetch-url-mcp@latest"]
38
+ }
39
+ }
40
+ }
42
41
  ```
43
42
 
44
- 1. **URL Validation** — Normalize, block private hosts, transform raw-content URLs (GitHub, GitLab, Bitbucket)
45
- 2. **Fetch** — HTTP request with redirect following, DNS preflight SSRF checks, and size limits (10 MB)
46
- 3. **Transform** — Offloaded to worker threads: parse HTML with `linkedom`, extract with Readability, remove DOM noise, convert to Markdown
47
- 4. **Cleanup** — Multi-pass Markdown normalization (heading promotion, spacing, skip-link removal)
48
- 5. **Cache + Respond** — Store result in LRU cache, apply inline content limits, return structured content
43
+ ## Client Configuration
49
44
 
50
- ## Repository Structure
45
+ <details>
46
+ <summary><b>Install in VS Code</b></summary>
51
47
 
52
- ```text
53
- fetch-url-mcp/
54
- ├── assets/ # Server icon (logo.svg)
55
- ├── examples/ # Client examples
56
- ├── scripts/ # Build & test orchestration
57
- ├── src/
58
- │ ├── workers/ # Worker-thread child for HTML transforms
59
- │ ├── index.ts # CLI entrypoint, transport wiring, shutdown
60
- │ ├── server.ts # McpServer lifecycle and registration
61
- │ ├── tools.ts # fetch-url tool definition and pipeline
62
- │ ├── fetch.ts # URL normalization, SSRF, HTTP fetch
63
- │ ├── transform.ts # HTML-to-Markdown pipeline, worker pool
64
- │ ├── config.ts # Env-driven configuration
65
- │ ├── resources.ts # MCP resource/template registration
66
- │ ├── prompts.ts # MCP prompt registration (get-help)
67
- │ ├── mcp.ts # Task execution management
68
- │ ├── http-native.ts # Streamable HTTP server, auth, sessions
69
- │ └── instructions.md # Server instructions embedded at runtime
70
- ├── tests/ # Unit/integration tests (Node.js test runner)
71
- ├── package.json
72
- ├── tsconfig.json
73
- └── AGENTS.md
74
- ```
48
+ [![Install in VS Code](https://img.shields.io/badge/VS_Code-Install_Server-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://insiders.vscode.dev/redirect/mcp/install?name=fetch-url-mcp&config=%7B%22command%22%3A%22npx%22%2C%22args%22%3A%5B%22-y%22%2C%22%40j0hanz%2Ffetch-url-mcp%40latest%22%5D%7D)
75
49
 
76
- ## Requirements
50
+ Add to `.vscode/mcp.json`:
77
51
 
78
- - **Node.js** >= 24
52
+ ```json
53
+ {
54
+ "servers": {
55
+ "fetch-url-mcp": {
56
+ "type": "stdio",
57
+ "command": "npx",
58
+ "args": ["-y", "@j0hanz/fetch-url-mcp@latest"]
59
+ }
60
+ }
61
+ }
62
+ ```
79
63
 
80
- ## Quickstart
64
+ Or install via CLI:
81
65
 
82
- ```bash
83
- npx -y @j0hanz/fetch-url-mcp@latest --stdio
66
+ ```sh
67
+ code --add-mcp '{"name":"fetch-url-mcp","command":"npx","args":["-y","@j0hanz/fetch-url-mcp@latest"]}'
84
68
  ```
85
69
 
86
- Add to your MCP client configuration:
70
+ For more info, see [VS Code MCP docs](https://code.visualstudio.com/docs/copilot/chat/mcp-servers).
71
+
72
+ </details>
73
+
74
+ <details>
75
+ <summary><b>Install in VS Code Insiders</b></summary>
76
+
77
+ [![Install in VS Code Insiders](https://img.shields.io/badge/VS_Code_Insiders-Install_Server-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://insiders.vscode.dev/redirect/mcp/install?name=fetch-url-mcp&config=%7B%22command%22%3A%22npx%22%2C%22args%22%3A%5B%22-y%22%2C%22%40j0hanz%2Ffetch-url-mcp%40latest%22%5D%7D&quality=insiders)
78
+
79
+ Add to `.vscode/mcp.json`:
87
80
 
88
81
  ```json
89
82
  {
90
- "mcpServers": {
83
+ "servers": {
91
84
  "fetch-url-mcp": {
85
+ "type": "stdio",
92
86
  "command": "npx",
93
- "args": ["-y", "@j0hanz/fetch-url-mcp@latest", "--stdio"]
87
+ "args": ["-y", "@j0hanz/fetch-url-mcp@latest"]
94
88
  }
95
89
  }
96
90
  }
97
91
  ```
98
92
 
99
- ## Client Example (CLI)
100
-
101
- Build the server and examples, then run the client:
93
+ Or install via CLI:
102
94
 
103
- ```bash
104
- npm run build
105
- node dist/examples/mcp-fetch-url-client.js https://example.com
95
+ ```sh
96
+ code-insiders --add-mcp '{"name":"fetch-url-mcp","command":"npx","args":["-y","@j0hanz/fetch-url-mcp@latest"]}'
106
97
  ```
107
98
 
108
- Optional flags:
99
+ For more info, see [VS Code Insiders MCP docs](https://code.visualstudio.com/docs/copilot/chat/mcp-servers).
109
100
 
110
- - `--full` reads the cached markdown resource to avoid inline truncation.
111
- - `--task` enables task-based execution with streamed status updates.
112
- - `--task-ttl <ms>` sets task TTL; `--task-poll <ms>` sets poll interval.
113
- - `--http http://localhost:3000/mcp` connects to the Streamable HTTP server.
114
- - Progress updates (when emitted) are printed to stderr.
101
+ </details>
115
102
 
116
- ## Installation
103
+ <details>
104
+ <summary><b>Install in Cursor</b></summary>
117
105
 
118
- ### NPX (Recommended)
106
+ [![Install in Cursor](https://cursor.com/deeplink/mcp-install-dark.svg)](https://cursor.com/en/install-mcp?name=fetch-url-mcp&config=eyJjb21tYW5kIjoibnB4IiwiYXJncyI6WyIteSIsIkBqMGhhbnovZmV0Y2gtdXJsLW1jcEBsYXRlc3QiXX0%3D)
119
107
 
120
- No installation required — runs directly:
108
+ Add to `~/.cursor/mcp.json`:
121
109
 
122
- ```bash
123
- npx -y @j0hanz/fetch-url-mcp@latest --stdio
110
+ ```json
111
+ {
112
+ "mcpServers": {
113
+ "fetch-url-mcp": {
114
+ "command": "npx",
115
+ "args": ["-y", "@j0hanz/fetch-url-mcp@latest"]
116
+ }
117
+ }
118
+ }
124
119
  ```
125
120
 
126
- ### Global Install
121
+ For more info, see [Cursor MCP docs](https://docs.cursor.com/context/model-context-protocol).
127
122
 
128
- ```bash
129
- npm install -g @j0hanz/fetch-url-mcp
130
- fetch-url-mcp --stdio
131
- ```
123
+ </details>
132
124
 
133
- ### From Source
125
+ <details>
126
+ <summary><b>Install in Visual Studio</b></summary>
134
127
 
135
- ```bash
136
- git clone https://github.com/j0hanz/fetch-url-mcp.git
137
- cd fetch-url-mcp
138
- npm install
139
- npm run build
140
- node dist/index.js --stdio
141
- ```
128
+ [![Install in Visual Studio](https://img.shields.io/badge/Visual_Studio-Install_Server-C16FDE?logo=visualstudio&logoColor=white)](https://vs-open.link/mcp-install?%7B%22fetch-url-mcp%22%3A%7B%22command%22%3A%22npx%22%2C%22args%22%3A%5B%22-y%22%2C%22%40j0hanz%2Ffetch-url-mcp%40latest%22%5D%7D%7D)
142
129
 
143
- ### Docker
130
+ For solution-scoped setup, add this to `.mcp.json` at the solution root:
144
131
 
145
- ```bash
146
- docker compose up --build
132
+ ```json
133
+ {
134
+ "servers": {
135
+ "fetch-url-mcp": {
136
+ "type": "stdio",
137
+ "command": "npx",
138
+ "args": ["-y", "@j0hanz/fetch-url-mcp@latest"]
139
+ }
140
+ }
141
+ }
147
142
  ```
148
143
 
149
- ## Configuration
144
+ For more info, see [Visual Studio MCP docs](https://learn.microsoft.com/en-us/visualstudio/ide/mcp-servers).
150
145
 
151
- ### Runtime Modes
152
-
153
- | Flag | Description |
154
- | ----------------- | ---------------------------------------------------------- |
155
- | `--stdio`, `-s` | Run in stdio mode (for desktop MCP clients; default) |
156
- | `--http` | Run in HTTP mode (Streamable HTTP on port 3000 by default) |
157
- | `--help`, `-h` | Show usage help |
158
- | `--version`, `-v` | Print server version |
159
-
160
- When no transport flag is passed, the server starts in **stdio mode**.
161
-
162
- ### Environment Variables
163
-
164
- #### Core Settings
165
-
166
- | Variable | Default | Description |
167
- | ------------------------------------ | ------------------------- | ------------------------------------------------------------------------------------------------------- |
168
- | `HOST` | `127.0.0.1` | HTTP server bind address |
169
- | `PORT` | `3000` | HTTP server port (1024–65535) |
170
- | `LOG_LEVEL` | `info` | Log level: `debug`, `info`, `warn`, `error` |
171
- | `FETCH_TIMEOUT_MS` | `15000` | HTTP fetch timeout in ms (1000–60000) |
172
- | `CACHE_ENABLED` | `true` | Enable/disable in-memory content cache |
173
- | `USER_AGENT` | `fetch-url-mcp/{version}` | Custom User-Agent header |
174
- | `ALLOW_REMOTE` | `false` | Allow remote connections in HTTP mode |
175
- | `ALLOWED_HOSTS` | _(empty)_ | Comma-separated host/origin allowlist for HTTP mode |
176
- | `MCP_STRICT_PROTOCOL_VERSION_HEADER` | `true` | Require `MCP-Protocol-Version` on HTTP session initialize (`false` allows legacy headerless initialize) |
177
-
178
- #### Task Management
179
-
180
- | Variable | Default | Description |
181
- | ---------------------------- | ------- | -------------------------------------------------------- |
182
- | `TASKS_MAX_TOTAL` | `5000` | Maximum retained task records across all owners |
183
- | `TASKS_MAX_PER_OWNER` | `1000` | Maximum retained task records per session/client |
184
- | `TASKS_STATUS_NOTIFICATIONS` | `false` | Emit experimental `notifications/tasks/status` extension |
185
-
186
- #### Authentication (HTTP Mode)
187
-
188
- | Variable | Default | Description |
189
- | ------------------------- | --------- | --------------------------------------- |
190
- | `ACCESS_TOKENS` | _(empty)_ | Comma-separated static bearer tokens |
191
- | `API_KEY` | _(empty)_ | Single API key (added to static tokens) |
192
- | `OAUTH_ISSUER_URL` | _(empty)_ | OAuth issuer URL (enables OAuth mode) |
193
- | `OAUTH_AUTHORIZATION_URL` | _(empty)_ | OAuth authorization endpoint |
194
- | `OAUTH_TOKEN_URL` | _(empty)_ | OAuth token endpoint |
195
- | `OAUTH_INTROSPECTION_URL` | _(empty)_ | OAuth token introspection endpoint |
196
- | `OAUTH_REVOCATION_URL` | _(empty)_ | OAuth token revocation endpoint |
197
- | `OAUTH_REGISTRATION_URL` | _(empty)_ | OAuth dynamic client registration |
198
- | `OAUTH_REQUIRED_SCOPES` | _(empty)_ | Required OAuth scopes |
199
- | `OAUTH_CLIENT_ID` | _(empty)_ | OAuth client ID |
200
- | `OAUTH_CLIENT_SECRET` | _(empty)_ | OAuth client secret |
201
-
202
- #### Transform & Workers
203
-
204
- | Variable | Default | Description |
205
- | ------------------------------------------ | --------- | ----------------------------------------- |
206
- | `TRANSFORM_WORKER_MODE` | `threads` | Worker mode: `threads` or `process` |
207
- | `TRANSFORM_WORKER_MAX_OLD_GENERATION_MB` | _(unset)_ | V8 old generation heap limit per worker |
208
- | `TRANSFORM_WORKER_MAX_YOUNG_GENERATION_MB` | _(unset)_ | V8 young generation heap limit per worker |
209
- | `TRANSFORM_WORKER_CODE_RANGE_MB` | _(unset)_ | V8 code range limit per worker |
210
- | `TRANSFORM_WORKER_STACK_MB` | _(unset)_ | Stack size limit per worker |
211
-
212
- #### Content Tuning
213
-
214
- | Variable | Default | Description |
215
- | ------------------------------------- | ----------------- | ------------------------------------------------ |
216
- | `MAX_INLINE_CONTENT_CHARS` | `0` | Global inline markdown limit (`0` = unlimited) |
217
- | `FETCH_URL_MCP_EXTRA_NOISE_TOKENS` | _(empty)_ | Additional CSS class/id tokens for noise removal |
218
- | `FETCH_URL_MCP_EXTRA_NOISE_SELECTORS` | _(empty)_ | Additional CSS selectors for noise removal |
219
- | `MARKDOWN_HEADING_KEYWORDS` | _(built-in list)_ | Keywords triggering heading promotion |
220
- | `FETCH_URL_MCP_LOCALE` | _(system)_ | Locale for content processing |
221
-
222
- #### Server Tuning
223
-
224
- | Variable | Default | Description |
225
- | ---------------------------------- | --------------- | ---------------------------------------- |
226
- | `SERVER_MAX_CONNECTIONS` | `0` (unlimited) | Maximum concurrent HTTP connections |
227
- | `SERVER_BLOCK_PRIVATE_CONNECTIONS` | `false` | Block connections from private IP ranges |
228
-
229
- ### Hardcoded Defaults
230
-
231
- | Setting | Value |
232
- | ------------------------ | ------------------------------- |
233
- | Max HTML size | 10 MB |
234
- | Max inline content chars | 0 (unlimited, configurable) |
235
- | Fetch timeout | 15 s |
236
- | Transform timeout | 30 s |
237
- | Tool timeout | Fetch + Transform + 5 s padding |
238
- | Max redirects | 5 |
239
- | Cache TTL | 86400 s (24 h) |
240
- | Cache max keys | 100 |
241
- | Rate limit | 100 requests / 60 s |
242
- | Max sessions | 200 |
243
- | Session TTL | 30 min |
244
- | Max URL length | 2048 chars |
245
- | Worker pool max scale | 4 |
246
-
247
- ## Usage
248
-
249
- ### Stdio Mode
250
-
251
- ```bash
252
- fetch-url-mcp --stdio
146
+ </details>
147
+
148
+ <details>
149
+ <summary><b>Install in Goose</b></summary>
150
+
151
+ [![Install in Goose](https://block.github.io/goose/img/extension-install-dark.svg)](https://block.github.io/goose/extension?cmd=npx&arg=-y&arg=%40j0hanz%2Ffetch-url-mcp%40latest&id=%40j0hanz%2Ffetch-url-mcp&name=fetch-url-mcp&description=A%20web%20content%20fetcher%20MCP%20server%20that%20converts%20HTML%20to%20clean%2C%20AI%20and%20human%20readable%20markdown.)
152
+
153
+ Add to `~/.config/goose/config.yaml` on macOS/Linux or `%APPDATA%\Block\goose\config\config.yaml` on Windows:
154
+
155
+ ```yaml
156
+ extensions:
157
+ fetch-url-mcp:
158
+ name: fetch-url-mcp
159
+ cmd: npx
160
+ args: ['-y', '@j0hanz/fetch-url-mcp@latest']
161
+ enabled: true
162
+ type: stdio
163
+ timeout: 300
253
164
  ```
254
165
 
255
- The server communicates via JSON-RPC over stdin/stdout. All MCP clients that support stdio transport can connect directly.
166
+ For more info, see [Goose extension docs](https://block.github.io/goose/docs/getting-started/using-extensions/).
167
+
168
+ </details>
169
+
170
+ <details>
171
+ <summary><b>Install in LM Studio</b></summary>
172
+
173
+ [![Add to LM Studio](https://files.lmstudio.ai/deeplink/mcp-install-light.svg)](https://lmstudio.ai/install-mcp?name=fetch-url-mcp&config=eyJjb21tYW5kIjoibnB4IiwiYXJncyI6WyIteSIsIkBqMGhhbnovZmV0Y2gtdXJsLW1jcEBsYXRlc3QiXX0%3D)
256
174
 
257
- ### HTTP Mode
175
+ Add to `~/.lmstudio/mcp.json` on macOS/Linux or `%USERPROFILE%/.lmstudio/mcp.json` on Windows:
258
176
 
259
- ```bash
260
- fetch-url-mcp
261
- # or
262
- PORT=8080 HOST=0.0.0.0 ALLOW_REMOTE=true fetch-url-mcp
177
+ ```json
178
+ {
179
+ "mcpServers": {
180
+ "fetch-url-mcp": {
181
+ "command": "npx",
182
+ "args": ["-y", "@j0hanz/fetch-url-mcp@latest"]
183
+ }
184
+ }
185
+ }
263
186
  ```
264
187
 
265
- The server starts a Streamable HTTP endpoint at `/mcp`. Authenticate with bearer tokens via the `ACCESS_TOKENS` or `API_KEY` environment variables.
188
+ For more info, see [LM Studio MCP docs](https://lmstudio.ai/docs/basics/mcp).
266
189
 
267
- For `POST /mcp`, clients should send:
190
+ </details>
268
191
 
269
- - `Accept: application/json, text/event-stream`
270
- - `MCP-Protocol-Version: 2025-11-25` (or `2025-03-26` for legacy clients)
192
+ <details>
193
+ <summary><b>Install in Claude Desktop</b></summary>
271
194
 
272
- ## MCP Surface
195
+ Add to `claude_desktop_config.json`:
273
196
 
274
- ### Tools
197
+ ```json
198
+ {
199
+ "mcpServers": {
200
+ "fetch-url-mcp": {
201
+ "command": "npx",
202
+ "args": ["-y", "@j0hanz/fetch-url-mcp@latest"]
203
+ }
204
+ }
205
+ }
206
+ ```
275
207
 
276
- #### `fetch-url`
208
+ For more info, see [Claude Desktop MCP docs](https://modelcontextprotocol.io/quickstart/user).
277
209
 
278
- Fetches a webpage and converts it to clean Markdown format optimized for LLM context.
210
+ </details>
279
211
 
280
- **Useful for:**
212
+ <details>
213
+ <summary><b>Install in Claude Code</b></summary>
281
214
 
282
- - Reading documentation, blog posts, or articles
283
- - Extracting main content while removing navigation and ads
284
- - Caching content to speed up repeated queries
215
+ Use the CLI:
285
216
 
286
- **Limitations:**
217
+ ```sh
218
+ claude mcp add fetch-url-mcp -- npx -y @j0hanz/fetch-url-mcp@latest
219
+ ```
287
220
 
288
- - Does not execute complex client-side JavaScript interactions
289
- - Inline output may be truncated when `MAX_INLINE_CONTENT_CHARS` is set
221
+ For project-scoped config, Claude Code writes `.mcp.json` with:
290
222
 
291
- ##### Parameters
223
+ ```json
224
+ {
225
+ "mcpServers": {
226
+ "fetch-url-mcp": {
227
+ "command": "npx",
228
+ "args": ["-y", "@j0hanz/fetch-url-mcp@latest"],
229
+ "env": {}
230
+ }
231
+ }
232
+ }
233
+ ```
292
234
 
293
- | Parameter | Type | Required | Default | Description |
294
- | ------------------ | -------------- | -------- | ------- | -------------------------------------------------------------------------- |
295
- | `url` | `string` (URL) | Yes | — | The URL of the webpage to fetch (http/https, max 2048 chars) |
296
- | `skipNoiseRemoval` | `boolean` | No | `false` | Preserve navigation, footers, and other elements normally filtered |
297
- | `forceRefresh` | `boolean` | No | `false` | Bypass cache and fetch fresh content |
298
- | `maxInlineChars` | `number` | No | `0` | Per-call inline markdown limit (`0` = unlimited; global cap still applies) |
235
+ For more info, see [Claude Code MCP docs](https://docs.anthropic.com/en/docs/claude-code/mcp).
299
236
 
300
- ##### Returns
237
+ </details>
238
+
239
+ <details>
240
+ <summary><b>Install in Windsurf</b></summary>
241
+
242
+ Add to `~/.codeium/windsurf/mcp_config.json`:
301
243
 
302
244
  ```json
303
245
  {
304
- "url": "https://example.com",
305
- "inputUrl": "https://example.com",
306
- "resolvedUrl": "https://example.com",
307
- "finalUrl": "https://example.com",
308
- "title": "Example Domain",
309
- "metadata": {
310
- "title": "Example Domain",
311
- "description": "...",
312
- "author": "...",
313
- "image": "...",
314
- "favicon": "...",
315
- "publishedAt": "...",
316
- "modifiedAt": "..."
317
- },
318
- "markdown": "# Example Domain\n\nThis domain is for use in illustrative examples...",
319
- "fromCache": false,
320
- "fetchedAt": "2026-02-11T12:00:00.000Z",
321
- "contentSize": 1234,
322
- "truncated": false
246
+ "mcpServers": {
247
+ "fetch-url-mcp": {
248
+ "command": "npx",
249
+ "args": ["-y", "@j0hanz/fetch-url-mcp@latest"]
250
+ }
251
+ }
323
252
  }
324
253
  ```
325
254
 
326
- | Field | Type | Description |
327
- | ------------- | ---------- | ---------------------------------------------------------------------------------------- |
328
- | `url` | `string` | The canonical URL (pre-raw-transform) |
329
- | `inputUrl` | `string?` | The original URL provided by the caller |
330
- | `resolvedUrl` | `string?` | The normalized/transformed URL that was fetched |
331
- | `finalUrl` | `string?` | Final response URL after redirects |
332
- | `title` | `string?` | Extracted page title |
333
- | `metadata` | `object?` | Extracted metadata (title, description, author, image, favicon, publishedAt, modifiedAt) |
334
- | `markdown` | `string?` | Extracted content in Markdown format |
335
- | `fromCache` | `boolean?` | Whether the response was served from cache |
336
- | `fetchedAt` | `string?` | ISO timestamp for fetch/cache retrieval |
337
- | `contentSize` | `number?` | Full markdown size before inline truncation |
338
- | `truncated` | `boolean?` | Whether inline markdown was truncated |
339
- | `error` | `string?` | Error message if the request failed |
340
- | `statusCode` | `number?` | HTTP status code for failed requests |
341
- | `details` | `object?` | Additional error details |
342
-
343
- ##### Annotations
255
+ For more info, see [Windsurf MCP docs](https://docs.windsurf.com/windsurf/cascade/mcp).
344
256
 
345
- | Annotation | Value |
346
- | ----------------- | ------- |
347
- | `readOnlyHint` | `true` |
348
- | `destructiveHint` | `false` |
349
- | `idempotentHint` | `true` |
350
- | `openWorldHint` | `true` |
257
+ </details>
351
258
 
352
- ##### Async Task Execution
259
+ <details>
260
+ <summary><b>Install in Amp</b></summary>
353
261
 
354
- The `fetch-url` tool supports optional async task execution (`execution.taskSupport: "optional"`). Include a `task` field in the tool call to run the fetch in the background:
262
+ Add to `~/.config/amp/settings.json` on macOS/Linux, `%USERPROFILE%\.config\amp\settings.json` on Windows, or `.amp/settings.json` for workspace-scoped config:
355
263
 
356
264
  ```json
357
265
  {
358
- "method": "tools/call",
359
- "params": {
360
- "name": "fetch-url",
361
- "arguments": { "url": "https://example.com" },
362
- "task": { "ttl": 30000 }
266
+ "amp.mcpServers": {
267
+ "fetch-url-mcp": {
268
+ "command": "npx",
269
+ "args": ["-y", "@j0hanz/fetch-url-mcp@latest"]
270
+ }
363
271
  }
364
272
  }
365
273
  ```
366
274
 
367
- Then poll `tasks/get` until the task status is `completed` or `failed`, and retrieve the result via `tasks/result`.
275
+ Or install via CLI:
368
276
 
369
- ### Prompts
277
+ ```sh
278
+ amp mcp add fetch-url-mcp -- npx -y @j0hanz/fetch-url-mcp@latest
279
+ ```
370
280
 
371
- | Name | Description |
372
- | ---------- | --------------------------------- |
373
- | `get-help` | Returns server usage instructions |
281
+ For more info, see [Amp docs](https://ampcode.com/manual).
374
282
 
375
- ### Resources
283
+ </details>
376
284
 
377
- | URI Pattern | MIME Type | Description |
378
- | ------------------------------------- | --------------- | ---------------------------------------------------- |
379
- | `internal://instructions` | `text/markdown` | Server instructions and usage guidance |
380
- | `internal://cache/{namespace}/{hash}` | `text/markdown` | Cached markdown entries from prior `fetch-url` calls |
285
+ <details>
286
+ <summary><b>Install in Cline</b></summary>
381
287
 
382
- ### Tasks
288
+ Open the MCP Servers panel, choose `Configure MCP Servers`, and add this to `cline_mcp_settings.json`:
383
289
 
384
- The server declares full MCP task support:
290
+ ```json
291
+ {
292
+ "mcpServers": {
293
+ "fetch-url-mcp": {
294
+ "command": "npx",
295
+ "args": ["-y", "@j0hanz/fetch-url-mcp@latest"]
296
+ }
297
+ }
298
+ }
299
+ ```
385
300
 
386
- | Endpoint | Description |
387
- | -------------- | ------------------------------------ |
388
- | `tasks/list` | List tasks (scoped to session/owner) |
389
- | `tasks/get` | Get task status by ID |
390
- | `tasks/result` | Retrieve completed task result |
391
- | `tasks/cancel` | Cancel an in-flight task |
301
+ For more info, see [Cline MCP docs](https://docs.cline.bot/mcp/configuring-mcp-servers).
392
302
 
393
- ## HTTP Mode Endpoints
303
+ </details>
394
304
 
395
- | Method | Path | Auth | Description |
396
- | -------- | ----------------------------------- | ----- | ---------------------------------------- |
397
- | `GET` | `/health` | No | Health check (minimal payload) |
398
- | `GET` | `/health?verbose=true` | Yes\* | Detailed diagnostics and runtime metrics |
399
- | `POST` | `/mcp` | Yes | MCP JSON-RPC (Streamable HTTP) |
400
- | `GET` | `/mcp` | Yes | SSE stream for server-initiated messages |
401
- | `DELETE` | `/mcp` | Yes | Terminate MCP session |
402
- | `GET` | `/mcp/downloads/{namespace}/{hash}` | Yes | Download cached content |
305
+ <details>
306
+ <summary><b>Install in Codex CLI</b></summary>
403
307
 
404
- \* `verbose=true` can be read without auth only for local-only deployments (`ALLOW_REMOTE=false`).
308
+ Use the CLI:
405
309
 
406
- ### Session Behavior
310
+ ```sh
311
+ codex mcp add fetch-url-mcp -- npx -y @j0hanz/fetch-url-mcp@latest
312
+ ```
407
313
 
408
- - Sessions are created on the first `POST /mcp` request with an `initialize` message
409
- - Session ID is returned in the `mcp-session-id` response header
410
- - Sessions expire after 30 minutes of inactivity (max 200 concurrent)
314
+ Or add this to `~/.codex/config.toml` or project-scoped `.codex/config.toml`:
411
315
 
412
- ### Authentication
316
+ ```toml
317
+ [mcp_servers.fetch-url-mcp]
318
+ command = "npx"
319
+ args = ["-y", "@j0hanz/fetch-url-mcp@latest"]
320
+ ```
413
321
 
414
- - **Static tokens**: Set `ACCESS_TOKENS` or `API_KEY` environment variables; pass as `Authorization: Bearer <token>`
415
- - **OAuth**: Configure `OAUTH_*` environment variables to enable OAuth 2.0 token introspection
322
+ For more info, see [Codex MCP docs](https://developers.openai.com/codex/mcp/).
416
323
 
417
- ## Client Configuration Examples
324
+ </details>
418
325
 
419
326
  <details>
420
- <summary>VS Code / VS Code Insiders</summary>
327
+ <summary><b>Install in GitHub Copilot</b></summary>
421
328
 
422
- Add to your VS Code settings (`.vscode/mcp.json` or User Settings):
329
+ Add to `.vscode/mcp.json`:
423
330
 
424
331
  ```json
425
332
  {
426
333
  "servers": {
427
334
  "fetch-url-mcp": {
335
+ "type": "stdio",
428
336
  "command": "npx",
429
- "args": ["-y", "@j0hanz/fetch-url-mcp@latest", "--stdio"]
337
+ "args": ["-y", "@j0hanz/fetch-url-mcp@latest"]
430
338
  }
431
339
  }
432
340
  }
433
341
  ```
434
342
 
343
+ For more info, see [GitHub Copilot MCP docs](https://code.visualstudio.com/docs/copilot/chat/mcp-servers).
344
+
435
345
  </details>
436
346
 
437
347
  <details>
438
- <summary>Claude Desktop</summary>
348
+ <summary><b>Install in Warp</b></summary>
439
349
 
440
- Add to `claude_desktop_config.json`:
350
+ Open `Personal > MCP Servers` in Warp, choose `+ Add`, and either add a CLI server with:
351
+
352
+ - `command`: `npx`
353
+ - `args`: `["-y", "@j0hanz/fetch-url-mcp@latest"]`
354
+
355
+ Or paste this JSON snippet when using Warp's multi-server import flow:
441
356
 
442
357
  ```json
443
358
  {
444
359
  "mcpServers": {
445
360
  "fetch-url-mcp": {
446
361
  "command": "npx",
447
- "args": ["-y", "@j0hanz/fetch-url-mcp@latest", "--stdio"]
362
+ "args": ["-y", "@j0hanz/fetch-url-mcp@latest"]
448
363
  }
449
364
  }
450
365
  }
451
366
  ```
452
367
 
368
+ For more info, see [Warp MCP docs](https://docs.warp.dev/features/warp-ai/mcp).
369
+
453
370
  </details>
454
371
 
455
372
  <details>
456
- <summary>Cursor</summary>
373
+ <summary><b>Install in Kiro</b></summary>
374
+
375
+ Use Kiro's MCP Servers panel or the `Add to Kiro` install flow. Kiro stores workspace-scoped MCP config in `.kiro/settings/mcp.json` and user-scoped config in `~/.kiro/settings/mcp.json`.
376
+
377
+ For this server, use:
457
378
 
458
- [![Install in Cursor](https://img.shields.io/badge/Cursor-Install-f97316?logo=cursor&logoColor=white)](https://cursor.com/install-mcp?name=fetch-url-mcp&config=eyJjb21tYW5kIjoibnB4IiwiYXJncyI6WyIteSIsIkBqMGhhbnovZmV0Y2gtdXJsLW1jcEBsYXRlc3QiLCItLXN0ZGlvIl19)
379
+ - `command`: `npx`
380
+ - `args`: `["-y", "@j0hanz/fetch-url-mcp@latest"]`
459
381
 
460
- Or manually add to Cursor MCP settings:
382
+ For more info, see [Kiro MCP docs](https://kiro.dev/blog/unlock-your-development-productivity-with-kiro-and-mcp/).
383
+
384
+ </details>
385
+
386
+ <details>
387
+ <summary><b>Install in Gemini CLI</b></summary>
388
+
389
+ Add to `~/.gemini/settings.json`:
461
390
 
462
391
  ```json
463
392
  {
464
393
  "mcpServers": {
465
394
  "fetch-url-mcp": {
466
395
  "command": "npx",
467
- "args": ["-y", "@j0hanz/fetch-url-mcp@latest", "--stdio"]
396
+ "args": ["-y", "@j0hanz/fetch-url-mcp@latest"]
468
397
  }
469
398
  }
470
399
  }
471
400
  ```
472
401
 
402
+ For more info, see [Gemini CLI MCP docs](https://google-gemini.github.io/gemini-cli/docs/tools/mcp-server.html).
403
+
473
404
  </details>
474
405
 
475
406
  <details>
476
- <summary>Windsurf</summary>
407
+ <summary><b>Install in Zed</b></summary>
477
408
 
478
- Add to your Windsurf MCP configuration:
409
+ Add to `~/.config/zed/settings.json`:
479
410
 
480
411
  ```json
481
412
  {
482
- "mcpServers": {
413
+ "context_servers": {
483
414
  "fetch-url-mcp": {
484
415
  "command": "npx",
485
- "args": ["-y", "@j0hanz/fetch-url-mcp@latest", "--stdio"]
416
+ "args": ["-y", "@j0hanz/fetch-url-mcp@latest"],
417
+ "env": {}
486
418
  }
487
419
  }
488
420
  }
489
421
  ```
490
422
 
423
+ For more info, see [Zed MCP docs](https://zed.dev/docs/ai/mcp).
424
+
491
425
  </details>
492
426
 
493
427
  <details>
494
- <summary>Docker</summary>
428
+ <summary><b>Install in Augment</b></summary>
495
429
 
496
- Use the published image from GitHub Container Registry:
430
+ Use the Augment Settings panel and either add the server manually or choose `Import from JSON`:
497
431
 
498
432
  ```json
499
433
  {
500
434
  "mcpServers": {
501
435
  "fetch-url-mcp": {
502
- "command": "docker",
503
- "args": [
504
- "run",
505
- "-i",
506
- "--rm",
507
- "ghcr.io/j0hanz/fetch-url-mcp:latest",
508
- "--stdio"
509
- ]
436
+ "command": "npx",
437
+ "args": ["-y", "@j0hanz/fetch-url-mcp@latest"]
510
438
  }
511
439
  }
512
440
  }
513
441
  ```
514
442
 
515
- Or build and run locally:
516
-
517
- ```bash
518
- docker build -t fetch-url-mcp .
519
- docker run -i --rm fetch-url-mcp --stdio
520
- ```
443
+ For more info, see [Augment MCP docs](https://docs.augmentcode.com/setup-augment/mcp).
521
444
 
522
445
  </details>
523
446
 
524
- ## Security
447
+ <details>
448
+ <summary><b>Install in Roo Code</b></summary>
525
449
 
526
- ### SSRF Protection
450
+ Use Roo Code's MCP Servers UI or marketplace flow.
527
451
 
528
- Fetch URL blocks requests to private and internal network addresses:
452
+ For this server, use:
529
453
 
530
- - **Blocked hosts**: `localhost`, `127.0.0.0/8`, `10.0.0.0/8`, `172.16.0.0/12`, `192.168.0.0/16`, `169.254.0.0/16`, `100.64.0.0/10`
531
- - **Blocked IPv6**: `::1`, `fc00::/7`, `fe80::/10`, IPv4-mapped private addresses
532
- - **Cloud metadata**: `169.254.169.254` (AWS), `metadata.google.internal`, `metadata.azure.com`, `100.100.100.200` (Azure IMDS)
454
+ - `command`: `npx`
455
+ - `args`: `["-y", "@j0hanz/fetch-url-mcp@latest"]`
533
456
 
534
- DNS preflight checks run on every redirect hop to prevent DNS rebinding attacks.
457
+ For more info, see [Roo Code docs](https://docs.roocode.com/).
535
458
 
536
- ### Stdio Transport Safety
459
+ </details>
537
460
 
538
- The server never writes non-protocol data to stdout. All logs and diagnostics go to stderr.
461
+ <details>
462
+ <summary><b>Install in Kilo Code</b></summary>
539
463
 
540
- ### Rate Limiting
464
+ Use Kilo Code's MCP Servers UI or marketplace flow.
541
465
 
542
- HTTP mode enforces a rate limit of 100 requests per 60-second window per client.
466
+ For this server, use:
543
467
 
544
- ### Content Safety
468
+ - `command`: `npx`
469
+ - `args`: `["-y", "@j0hanz/fetch-url-mcp@latest"]`
545
470
 
546
- - HTML downloads are capped at 10 MB
547
- - Worker threads run in isolation with configurable resource limits
548
- - Auth tokens are stored in-memory only and compared using timing-safe equality
471
+ For more info, see [Kilo Code docs](https://kilocode.ai/docs).
549
472
 
550
- ## Development Workflow
473
+ </details>
551
474
 
552
- ### Install Dependencies
475
+ ## Use Cases
553
476
 
554
- ```bash
555
- npm install
556
- ```
477
+ - Fetch documentation pages, blog posts, or reference material into Markdown before sending them to an LLM.
478
+ - Retrieve repository-hosted content from GitHub, GitLab, Bitbucket, or Gists and let the server rewrite page URLs to raw endpoints when possible.
479
+ - Reuse cached Markdown through `internal://cache/{namespace}/{hash}` or bypass the cache with `forceRefresh` for time-sensitive pages.
480
+ - Use task mode for large pages or slower sites when the inline response would otherwise be truncated or delayed.
557
481
 
558
- ### Scripts
559
-
560
- | Script | Command | Description |
561
- | --------------- | ----------------------- | -------------------------------------------- |
562
- | `dev` | `npm run dev` | TypeScript watch mode |
563
- | `dev:run` | `npm run dev:run` | Run compiled output with watch + `.env` |
564
- | `build` | `npm run build` | Clean, compile, copy assets, make executable |
565
- | `start` | `npm start` | Run compiled server |
566
- | `test` | `npm test` | Run test suite (Node.js native test runner) |
567
- | `test:coverage` | `npm run test:coverage` | Run tests with coverage |
568
- | `lint` | `npm run lint` | ESLint |
569
- | `lint:fix` | `npm run lint:fix` | ESLint with auto-fix |
570
- | `format` | `npm run format` | Prettier |
571
- | `type-check` | `npm run type-check` | TypeScript type checking |
572
- | `inspector` | `npm run inspector` | Build and launch MCP Inspector |
482
+ ## Architecture
573
483
 
574
- ## Build and Release
484
+ ```text
485
+ [MCP Client]
486
+ ├─ stdio -> `src/index.ts` -> `startStdioServer()` -> `createMcpServer()`
487
+ └─ HTTP (`--http`) -> `src/index.ts` -> `startHttpServer()` -> HTTP dispatcher
488
+ ├─ `GET /health`
489
+ ├─ `GET /.well-known/oauth-protected-resource`
490
+ ├─ `GET /.well-known/oauth-protected-resource/mcp`
491
+ ├─ `GET /mcp/downloads/{namespace}/{hash}`
492
+ └─ `POST|GET|DELETE /mcp`
493
+
494
+ `createMcpServer()`
495
+ ├─ registers tool: `fetch-url`
496
+ ├─ registers prompt: `get-help`
497
+ ├─ registers resources:
498
+ │ - `internal://instructions`
499
+ │ - `internal://cache/{namespace}/{hash}`
500
+ ├─ enables capabilities: completions, logging, resources, prompts, tasks
501
+ └─ installs task handlers, log-level handling, and shutdown cleanup
502
+
503
+ `fetch-url` execution
504
+ ├─ validate input with `fetchUrlInputSchema`
505
+ ├─ normalize URL and block local/private targets unless allowed
506
+ ├─ rewrite supported code-host URLs to raw endpoints when possible
507
+ ├─ fetch and cache content via the shared pipeline
508
+ ├─ transform HTML into Markdown in the transform worker path
509
+ └─ validate `structuredContent` with `fetchUrlOutputSchema`
510
+ ```
575
511
 
576
- ```bash
577
- npm run build # Clean → Compile → Copy Assets → chmod
578
- npm run prepublishOnly # Lint → Type-Check → Build
579
- npm publish # Publish to npm
512
+ ### Request Lifecycle
513
+
514
+ ```text
515
+ [Client] -- initialize {protocolVersion, capabilities} --> [Server]
516
+ [Server] -- {protocolVersion, capabilities, serverInfo} --> [Client]
517
+ [Client] -- notifications/initialized --> [Server]
518
+ [Client] -- tools/call {name, arguments} --> [Server]
519
+ [Server] -- {content: [{type, text}], structuredContent?, isError?} --> [Client]
580
520
  ```
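
The handshake in the lifecycle diagram can be sketched as plain JSON-RPC 2.0 messages. This is a minimal illustration, not code from the package: the protocol version string, the `clientInfo` values, and the builder function names are all placeholders chosen for the example.

```typescript
// Minimal JSON-RPC 2.0 message builders for the lifecycle shown above.
// PROTOCOL_VERSION and clientInfo are placeholders, not values from the package.
type JsonRpcRequest = {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
};

// Notifications carry no id, so the server sends no response to them.
type JsonRpcNotification = Omit<JsonRpcRequest, "id">;

const PROTOCOL_VERSION = "2025-06-18"; // assumption: use what your client negotiates

function initializeRequest(id: number): JsonRpcRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "initialize",
    params: {
      protocolVersion: PROTOCOL_VERSION,
      capabilities: {},
      clientInfo: { name: "example-client", version: "0.0.0" }, // hypothetical
    },
  };
}

function initializedNotification(): JsonRpcNotification {
  return { jsonrpc: "2.0", method: "notifications/initialized" };
}

function fetchUrlCall(id: number, url: string): JsonRpcRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "fetch-url", arguments: { url } },
  };
}

// Messages in the order a client would send them.
const handshake = [
  initializeRequest(1),
  initializedNotification(),
  fetchUrlCall(2, "https://example.com"),
];
console.log(handshake.map((m) => m.method).join(" -> "));
```

In stdio mode each message would be written to the server's stdin as one line of JSON; in HTTP mode each would be the body of a `POST /mcp` request.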
581
521
 
582
- CI/CD is handled via a GitHub Actions workflow (`release.yml`) that runs lint, type-check, test, build, and publishes to npm with version bumping.
522
+ ## MCP Surface
583
523
 
584
- ## Troubleshooting
524
+ ### Tools
585
525
 
586
- ### MCP Inspector
526
+ #### `fetch-url`
527
+
528
+ Fetch public webpages and convert HTML into AI-readable Markdown. The tool is read-only, does not execute page JavaScript, can bypass the cache with `forceRefresh`, and supports optional task mode for larger or slower fetches.
529
+
530
+ | Parameter | Type | Required | Description |
531
+ | ------------------ | --------- | -------- | ------------------------------------------------------------------------------------------- |
532
+ | `url` | `string` | yes | Target URL. Max 2048 chars. |
533
+ | `skipNoiseRemoval` | `boolean` | no | Preserve navigation/footers (disable noise filtering). |
534
+ | `forceRefresh` | `boolean` | no | Bypass cache and fetch fresh content. |
535
+ | `maxInlineChars` | `integer` | no | Inline Markdown limit in characters (0-10485760; 0 = unlimited). The lower of this value and the global limit applies. |
587
536
 
588
- Use the built-in inspector to test the server interactively:
537
+ The response is returned as MCP text content and, when validation succeeds, as `structuredContent` containing `url`, `resolvedUrl`, `finalUrl`, `title`, `metadata`, `markdown`, `fromCache`, `fetchedAt`, `contentSize`, and `truncated`.
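
A client can pre-check arguments against the limits in the parameter table before calling the tool. The sketch below is only that: it is not the server's `fetchUrlInputSchema`, the function names are hypothetical, and the handling of `0` in `effectiveInlineLimit` is one assumed reading of "lower of this or the global limit applies".

```typescript
// Client-side pre-check of `fetch-url` arguments, mirroring the table limits.
// This is an illustration, not the server's own validation schema.
interface FetchUrlArgs {
  url: string;
  skipNoiseRemoval?: boolean;
  forceRefresh?: boolean;
  maxInlineChars?: number;
}

const MAX_URL_LENGTH = 2048;
const MAX_INLINE_CHARS = 10485760;

// Returns a list of human-readable problems; an empty list means the
// arguments pass these basic checks.
function checkFetchUrlArgs(args: FetchUrlArgs): string[] {
  const problems: string[] = [];
  if (args.url.length === 0 || args.url.length > MAX_URL_LENGTH) {
    problems.push(`url must be 1-${MAX_URL_LENGTH} characters`);
  }
  const limit = args.maxInlineChars;
  if (
    limit !== undefined &&
    (!Number.isInteger(limit) || limit < 0 || limit > MAX_INLINE_CHARS)
  ) {
    problems.push(`maxInlineChars must be an integer in 0-${MAX_INLINE_CHARS}`);
  }
  return problems;
}

// Assumed semantics: 0 means "no limit", so a zero on either side defers to
// the other value; otherwise the smaller limit wins.
function effectiveInlineLimit(requested: number, globalLimit: number): number {
  if (requested === 0) return globalLimit;
  if (globalLimit === 0) return requested;
  return Math.min(requested, globalLimit);
}
```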
589
538
 
590
- ```bash
591
- npm run inspector
539
+ ```text
540
+ 1. [Client] -- tools/call {name: "fetch-url", arguments} --> [Server]
541
+ 2. [Server] -- dispatch("fetch-url") --> [src/tools/fetch-url.ts]
542
+ 3. [Handler] -- validate(fetchUrlInputSchema) --> normalize / fetch / transform
543
+ 4. [Handler] -- validate(fetchUrlOutputSchema) --> assemble content + structuredContent
544
+ 5. [Server] -- result or tool error --> [Client]
592
545
  ```
593
546
 
594
- ### Common Issues
547
+ ### Resources
548
+
549
+ | Resource | URI | MIME Type | Description |
550
+ | ---------------------------- | ------------------------------------- | --------------- | ------------------------------------------------------------- |
551
+ | `fetch-url-mcp-instructions` | `internal://instructions` | `text/markdown` | Guidance for using the Fetch URL MCP server. |
552
+ | `fetch-url-mcp-cache-entry` | `internal://cache/{namespace}/{hash}` | `text/markdown` | Read cached markdown generated by previous `fetch-url` calls. |
553
+
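
The cache resource URI template can be taken apart with a small parser. This is a hedged sketch: `parseCacheUri` and `downloadPath` are hypothetical helpers, and mapping a cache entry onto the `/mcp/downloads/{namespace}/{hash}` route is an assumption based only on the two templates sharing the same placeholders.

```typescript
// Parses `internal://cache/{namespace}/{hash}` into its two segments.
// Hypothetical helper; the package may structure this differently.
const CACHE_URI = /^internal:\/\/cache\/([^/]+)\/([^/]+)$/;

function parseCacheUri(
  uri: string,
): { namespace: string; hash: string } | null {
  const match = CACHE_URI.exec(uri);
  if (match === null) return null;
  const [, namespace, hash] = match;
  if (namespace === undefined || hash === undefined) return null;
  return { namespace, hash };
}

// Assumption: the HTTP download route reuses the same namespace and hash.
function downloadPath(namespace: string, hash: string): string {
  return `/mcp/downloads/${encodeURIComponent(namespace)}/${encodeURIComponent(hash)}`;
}
```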
554
+ ### Prompts
555
+
556
+ | Prompt | Arguments | Description |
557
+ | ---------- | --------- | -------------------------------------------------------------------------------------------- |
558
+ | `get-help` | none | Return Fetch URL server instructions: workflows, cache usage, task mode, and error handling. |
559
+
560
+ ## MCP Capabilities
561
+
562
+ | Capability | Status | Notes |
563
+ | ------------------------------- | --------- | ------------------------------------------------------------------------------------------------------------------------- |
564
+ | completions | confirmed | Advertised in `createServerCapabilities()` and used by the cache resource template for `namespace` and `hash` completion. |
565
+ | logging | confirmed | Advertised in `createServerCapabilities()` and handled through `SetLevelRequestSchema`. |
566
+ | resources subscribe/listChanged | confirmed | Advertised in `createServerCapabilities()` and implemented for cache resource subscriptions and list changes. |
567
+ | prompts | confirmed | `get-help` is registered during server startup. |
568
+ | tasks | confirmed | Advertised in `createServerCapabilities()` and backed by registered task handlers plus optional tool task support. |
569
+ | progress notifications | confirmed | Tool execution reports `notifications/progress` updates during fetch and transform stages. |
570
+
571
+ ### Tool Annotations
572
+
573
+ | Annotation | Value |
574
+ | ----------------- | ------- |
575
+ | `readOnlyHint` | `true` |
576
+ | `destructiveHint` | `false` |
577
+ | `idempotentHint` | `true` |
578
+ | `openWorldHint` | `true` |
579
+
580
+ ### Structured Output
581
+
582
+ - `fetch-url` publishes an explicit `outputSchema` and returns `structuredContent` when the assembled response passes validation.
583
+
584
+ ## Configuration
585
+
586
+ | Variable | Default | Applies To | Notes |
587
+ | ------------------------------------------ | ------------------------- | ----------------- | --------------------------------------------------------------------- |
588
+ | `HOST` | `127.0.0.1` | HTTP mode | Bind address. Non-loopback bindings also require `ALLOW_REMOTE=true`. |
589
+ | `PORT` | `3000` | HTTP mode | Listening port for `--http`. |
590
+ | `ALLOW_REMOTE` | `false` | HTTP mode | Must be enabled to bind to a non-loopback interface. |
591
+ | `ACCESS_TOKENS` | unset | HTTP mode | Comma- or space-separated static bearer tokens. |
592
+ | `API_KEY` | unset | HTTP mode | Alternate static token source for header auth. |
593
+ | `OAUTH_ISSUER_URL` | unset | HTTP mode | Enables OAuth mode when combined with the other OAuth URLs. |
594
+ | `OAUTH_AUTHORIZATION_URL` | unset | HTTP mode | Optional explicit authorization endpoint. |
595
+ | `OAUTH_TOKEN_URL` | unset | HTTP mode | Optional explicit token endpoint. |
596
+ | `OAUTH_REVOCATION_URL` | unset | HTTP mode | Optional OAuth revocation endpoint. |
597
+ | `OAUTH_REGISTRATION_URL` | unset | HTTP mode | Optional OAuth dynamic client registration endpoint. |
598
+ | `OAUTH_INTROSPECTION_URL` | unset | HTTP mode | Required for OAuth token introspection. |
599
+ | `OAUTH_REQUIRED_SCOPES` | empty | HTTP mode | Required scopes enforced after auth. |
600
+ | `OAUTH_CLIENT_ID` | unset | HTTP mode | Optional introspection client ID. |
601
+ | `OAUTH_CLIENT_SECRET` | unset | HTTP mode | Optional introspection client secret. |
602
+ | `SERVER_TLS_KEY_FILE` | unset | HTTP mode | Enable HTTPS when set together with `SERVER_TLS_CERT_FILE`. |
603
+ | `SERVER_TLS_CERT_FILE` | unset | HTTP mode | TLS certificate path. |
604
+ | `SERVER_TLS_CA_FILE` | unset | HTTP mode | Optional custom CA bundle. |
605
+ | `SERVER_MAX_CONNECTIONS` | `0` | HTTP mode | Optional connection cap. |
606
+ | `SERVER_HEADERS_TIMEOUT_MS` | unset | HTTP mode | Optional Node server tuning. |
607
+ | `SERVER_REQUEST_TIMEOUT_MS` | unset | HTTP mode | Optional Node server tuning. |
608
+ | `SERVER_KEEP_ALIVE_TIMEOUT_MS` | unset | HTTP mode | Optional keep-alive tuning. |
609
+ | `SERVER_KEEP_ALIVE_TIMEOUT_BUFFER_MS` | unset | HTTP mode | Optional keep-alive tuning buffer. |
610
+ | `SERVER_MAX_HEADERS_COUNT` | unset | HTTP mode | Optional header count limit. |
611
+ | `SERVER_BLOCK_PRIVATE_CONNECTIONS` | `false` | HTTP mode | Enables inbound private-network protections. |
612
+ | `MCP_STRICT_PROTOCOL_VERSION_HEADER` | `true` | HTTP mode | Requires `MCP-Protocol-Version` on session init. |
613
+ | `ALLOWED_HOSTS` | empty | HTTP mode | Additional allowed `Host` and `Origin` values. |
614
+ | `ALLOW_LOCAL_FETCH` | `false` | Fetching | Allows loopback and private-network fetch targets. |
615
+ | `FETCH_TIMEOUT_MS` | `15000` | Fetching | Network fetch timeout in milliseconds. |
616
+ | `USER_AGENT` | `fetch-url-mcp/<version>` | Fetching | Override the outbound user agent string. |
617
+ | `MAX_INLINE_CONTENT_CHARS` | `0` | Tool output | `0` means no explicit inline truncation limit. |
618
+ | `CACHE_ENABLED` | `true` | Caching | Enables in-memory fetch result caching. |
619
+ | `TASKS_MAX_TOTAL` | `5000` | Tasks | Total task capacity. |
620
+ | `TASKS_MAX_PER_OWNER` | `1000` | Tasks | Per-owner task cap, clamped to the total cap. |
621
+ | `TASKS_STATUS_NOTIFICATIONS` | `false` | Tasks | Enables status notifications for tasks. |
622
+ | `TASKS_REQUIRE_INTERCEPTION` | `true` | Tasks | Requires task interception for task-capable tool execution. |
623
+ | `TRANSFORM_CANCEL_ACK_TIMEOUT_MS` | `200` | Transform workers | Cancellation acknowledgement timeout. |
624
+ | `TRANSFORM_WORKER_MODE` | `threads` | Transform workers | Worker execution mode. |
625
+ | `TRANSFORM_WORKER_MAX_OLD_GENERATION_MB` | unset | Transform workers | Optional worker memory limit. |
626
+ | `TRANSFORM_WORKER_MAX_YOUNG_GENERATION_MB` | unset | Transform workers | Optional worker memory limit. |
627
+ | `TRANSFORM_WORKER_CODE_RANGE_MB` | unset | Transform workers | Optional worker memory limit. |
628
+ | `TRANSFORM_WORKER_STACK_MB` | unset | Transform workers | Optional worker stack size. |
629
+ | `FETCH_URL_MCP_EXTRA_NOISE_TOKENS` | empty | Content cleanup | Extra noise-removal tokens. |
630
+ | `FETCH_URL_MCP_EXTRA_NOISE_SELECTORS` | empty | Content cleanup | Extra DOM selectors for noise removal. |
631
+ | `FETCH_URL_MCP_LOCALE` | system default | Content cleanup | Locale override for extraction heuristics. |
632
+ | `MARKDOWN_HEADING_KEYWORDS` | built-in list | Markdown cleanup | Override heading keywords used by cleanup. |
633
+ | `LOG_LEVEL` | `info` | Logging | `debug`, `info`, `warn`, or `error`. |
634
+ | `LOG_FORMAT` | `text` | Logging | Set to `json` for structured logs. |
635
+
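
Two notes in the table describe parsing rules: `ACCESS_TOKENS` is comma- or space-separated, and `TASKS_MAX_PER_OWNER` is clamped to the total cap. A minimal sketch of both follows. The helper names and the fallback behaviour for invalid numbers are assumptions inferred from the table, not taken from the package source.

```typescript
// Sketch of reading a few of the variables above from an environment map.
// Parsing rules are inferred from the table notes, not from the package source.
type Env = Record<string, string | undefined>;

function parseAccessTokens(env: Env): string[] {
  // "Comma- or space-separated": split on commas and whitespace, drop empties.
  return (env["ACCESS_TOKENS"] ?? "")
    .split(/[\s,]+/)
    .filter((token) => token.length > 0);
}

function parsePositiveInt(raw: string | undefined, fallback: number): number {
  const value = Number.parseInt(raw ?? "", 10);
  // Assumption: unset or invalid values fall back to the documented default.
  return Number.isInteger(value) && value > 0 ? value : fallback;
}

function taskLimits(env: Env): { total: number; perOwner: number } {
  const total = parsePositiveInt(env["TASKS_MAX_TOTAL"], 5000);
  const perOwner = parsePositiveInt(env["TASKS_MAX_PER_OWNER"], 1000);
  // "Per-owner task cap, clamped to the total cap."
  return { total, perOwner: Math.min(perOwner, total) };
}
```

In Node.js, `process.env` can be passed directly as the `env` argument.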
636
+ ## HTTP Endpoints
637
+
638
+ | Method | Path | Auth | Purpose |
639
+ | -------- | ------------------------------------------- | ------------------------------------------ | ------------------------------------------------------- |
640
+ | `GET` | `/health` | no, unless `?verbose=1` on a remote server | Basic health response, with optional diagnostics. |
641
+ | `GET` | `/.well-known/oauth-protected-resource` | no | OAuth protected-resource metadata. |
642
+ | `GET` | `/.well-known/oauth-protected-resource/mcp` | no | OAuth protected-resource metadata for the MCP endpoint. |
643
+ | `POST` | `/mcp` | yes | Session initialization and JSON-RPC requests. |
644
+ | `GET` | `/mcp` | yes | Session-bound server-to-client stream handling. |
645
+ | `DELETE` | `/mcp` | yes | Session shutdown. |
646
+ | `GET` | `/mcp/downloads/{namespace}/{hash}` | yes | Download route used by HTTP-mode cached fetch results. |
647
+
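
For the authenticated `POST /mcp` route, a client needs a bearer token and the `MCP-Protocol-Version` header. The sketch below only builds the request options and sends nothing. The `mcpPostInit` name, the version string, and the session header spelling are assumptions; the `Accept` value follows the MCP Streamable HTTP transport convention.

```typescript
// Builds fetch() options for POST /mcp in HTTP mode.
// The version string is a placeholder; the token comes from ACCESS_TOKENS or API_KEY.
interface PostInit {
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function mcpPostInit(
  token: string,
  message: unknown,
  sessionId?: string,
): PostInit {
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
    Accept: "application/json, text/event-stream",
    Authorization: `Bearer ${token}`,
    "MCP-Protocol-Version": "2025-06-18", // assumption: match your negotiated version
  };
  // Assumption: after initialization the session ID is echoed back on each request.
  if (sessionId !== undefined) headers["Mcp-Session-Id"] = sessionId;
  return { method: "POST", headers, body: JSON.stringify(message) };
}
```

With the default `HOST` and `PORT` from the Configuration table, the result could be passed as `fetch("http://127.0.0.1:3000/mcp", mcpPostInit(token, message))`.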
648
+ ## Security
649
+
650
+ | Control | Status | Notes |
651
+ | -------------------------- | ----------- | ---------------------------------------------------------------------------------------------------------------------------------------- |
652
+ | Host and origin validation | implemented | HTTP requests are rejected unless `Host` and `Origin` match the allowlist built from loopback, the configured host, and `ALLOWED_HOSTS`. |
653
+ | Authentication | implemented | HTTP mode supports static bearer tokens locally or OAuth token introspection; remote bindings require OAuth. |
654
+ | Protocol version checks | implemented | HTTP sessions validate `MCP-Protocol-Version` and pin it to the negotiated session version. |
655
+ | Rate limiting | implemented | Requests pass through the HTTP rate limiter before route dispatch. |
656
+ | Outbound SSRF protections | implemented | Local/private IPs, metadata endpoints, and `.local`/`.internal` hosts are blocked unless `ALLOW_LOCAL_FETCH=true`. |
657
+ | TLS | optional | HTTPS is enabled when both TLS key and certificate files are configured. |
658
+ | Stdio logging safety | implemented | Server logs are written to stderr, not stdout, so stdio MCP traffic stays clean. |
659
+
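
The outbound SSRF row can be illustrated with a simplified host check. This is explicitly not the package's implementation: it covers only literal IPv4 addresses, `localhost`, and the two hostname suffixes named in the table, and the function names are hypothetical. A real guard would also need DNS resolution checks, IPv6 handling, and validation on every redirect hop.

```typescript
// Illustrative outbound-target check, NOT the package's implementation.
const BLOCKED_SUFFIXES = [".local", ".internal"];

function isPrivateIPv4(host: string): boolean {
  const match = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (match === null) return false;
  const a = Number(match[1]);
  const b = Number(match[2]);
  return (
    a === 10 || // 10.0.0.0/8
    a === 127 || // loopback
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168) || // 192.168.0.0/16
    (a === 169 && b === 254) || // link-local, includes 169.254.169.254
    (a === 100 && b >= 64 && b <= 127) // 100.64.0.0/10 shared address space
  );
}

// `allowLocalFetch` mirrors the ALLOW_LOCAL_FETCH=true escape hatch.
function isBlockedTarget(rawUrl: string, allowLocalFetch = false): boolean {
  if (allowLocalFetch) return false;
  const host = new URL(rawUrl).hostname.toLowerCase();
  if (host === "localhost") return true;
  if (BLOCKED_SUFFIXES.some((suffix) => host.endsWith(suffix))) return true;
  return isPrivateIPv4(host);
}
```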
660
+ ## Development
661
+
662
+ | Script | Command |
663
+ | ------------------------ | ------------------------------------------------------------------------------------------------------------------- |
664
+ | `clean` | `node scripts/tasks.mjs clean` |
665
+ | `build` | `node scripts/tasks.mjs build` |
666
+ | `copy:assets` | `node scripts/tasks.mjs copy:assets` |
667
+ | `prepare` | `npm run build` |
668
+ | `dev` | `tsc --watch --preserveWatchOutput` |
669
+ | `dev:run` | `node --env-file=.env --watch dist/index.js` |
670
+ | `start` | `node dist/index.js` |
671
+ | `format` | `prettier --write .` |
672
+ | `type-check` | `node scripts/tasks.mjs type-check` |
673
+ | `type-check:src` | `node node_modules/typescript/bin/tsc -p tsconfig.json --noEmit` |
674
+ | `type-check:tests` | `node node_modules/typescript/bin/tsc -p tsconfig.test.json --noEmit` |
675
+ | `type-check:diagnostics` | `tsc --noEmit --extendedDiagnostics` |
676
+ | `type-check:trace` | `node -e "require('fs').rmSync('.ts-trace',{recursive:true,force:true})" && tsc --noEmit --generateTrace .ts-trace` |
677
+ | `lint` | `eslint .` |
678
+ | `lint:tests` | `eslint src/__tests__` |
679
+ | `lint:fix` | `eslint . --fix` |
680
+ | `test` | `node scripts/tasks.mjs test` |
681
+ | `test:fast` | `node --test --import tsx/esm src/__tests__/**/*.test.ts node-tests/**/*.test.ts` |
682
+ | `test:coverage` | `node scripts/tasks.mjs test --coverage` |
683
+ | `knip` | `knip` |
684
+ | `knip:fix` | `knip --fix` |
685
+ | `inspector` | `npm run build && npx -y @modelcontextprotocol/inspector node dist/index.js --stdio` |
686
+ | `prepublishOnly` | `npm run lint && npm run type-check && npm run build` |
687
+
688
+ ## Build and Release
689
+
690
+ - The repository includes release automation under `.github/workflows/`.
691
+ - `Dockerfile` and `docker-compose.yml` are available for container-based packaging and local runs.
692
+ - `npm run prepublishOnly` runs the release gate: lint, type-check, and build.
693
+
694
+ ## Troubleshooting
595
695
 
596
- | Issue | Solution |
597
- | ------------------------- | ------------------------------------------------------------------------------------- |
598
- | `VALIDATION_ERROR` on URL | URL is blocked (private IP/localhost) or malformed. Do not retry. |
599
- | `queue_full` error | Worker pool busy. Wait briefly, then retry or use async task mode. |
600
- | Garbled output | Binary content (images, PDFs) cannot be converted. Ensure the URL serves HTML. |
601
- | No output in stdio mode | If you intended HTTP mode, pass `--http`. Stdio is the default transport. |
602
- | Auth errors in HTTP mode | Set `ACCESS_TOKENS` or `API_KEY` env var and pass as `Authorization: Bearer <token>`. |
696
+ - For stdio mode, avoid writing logs to stdout; keep logs on stderr.
697
+ - For HTTP mode, verify the MCP protocol headers (such as `MCP-Protocol-Version`) and confirm that requests are routed to `/mcp`.
698
+ - Update client snippets when client MCP configuration formats change.
603
699
 
604
- ### Stdout / Stderr Guidance
700
+ ## Credits
605
701
 
606
- In stdio mode, **stdout** is reserved exclusively for MCP JSON-RPC messages. Logs and diagnostics are written to **stderr**. Never pipe stdout to a log file when using stdio transport.
702
+ | Dependency | Registry |
703
+ | ------------------------------------------------------------------------------------ | -------- |
704
+ | [@modelcontextprotocol/sdk](https://www.npmjs.com/package/@modelcontextprotocol/sdk) | npm |
705
+ | [@mozilla/readability](https://www.npmjs.com/package/@mozilla/readability) | npm |
706
+ | [linkedom](https://www.npmjs.com/package/linkedom) | npm |
707
+ | [node-html-markdown](https://www.npmjs.com/package/node-html-markdown) | npm |
708
+ | [undici](https://www.npmjs.com/package/undici) | npm |
709
+ | [zod](https://www.npmjs.com/package/zod) | npm |
607
710
 
608
- ## License
711
+ ## Contributing and License
609
712
 
610
- [MIT](https://opensource.org/licenses/MIT)
713
+ - License: MIT
714
+ - Contributions are welcome via pull requests.