mcp-web-reader 2.0.2 → 2.2.0
- package/README.md +49 -30
- package/dist/index.js +495 -267
- package/package.json +30 -23
package/README.md CHANGED
@@ -1,39 +1,46 @@
 # MCP Web Reader
 
-A powerful MCP (Model Context Protocol) server that enables Claude and other LLMs to read and parse web content.
+A powerful MCP (Model Context Protocol) server that enables Claude and other LLMs to read and parse web content. Bypasses access restrictions for WeChat articles, paywalled sites, and Cloudflare-protected pages.
+
+[简体中文](./README_CN.md)
 
 ## Features
 
-- 🚀 **Multi-engine
-- 🔄 **
-- 🌐 **Bypass restrictions**:
+- 🚀 **Multi-engine**: Jina Reader API, local parser, and Playwright browser
+- 🔄 **Smart fallback**: Auto-switches Jina → Local → Playwright browser
+- 🌐 **Bypass restrictions**: Cloudflare, CAPTCHAs, access controls
 - 📦 **Batch processing**: Fetch multiple URLs simultaneously
--
--
+- 📝 **Markdown output**: Automatic conversion to clean Markdown
+- 🔌 **Transport compatibility**: stdio + Streamable HTTP (optional legacy SSE compatibility mode)
 
 ## Installation
 
-### Quick Install (Recommended)
-
 ```bash
 npm install -g mcp-web-reader
 ```
 
-
+> **Note**: Chromium browser (~100-200MB) will be automatically downloaded. This is required for:
+> - WeChat articles (need browser rendering)
+> - Cloudflare-protected sites
+> - JavaScript-heavy sites
+> - CAPTCHA/access restrictions
+
+Download may take 1-5 minutes depending on network speed.
+
+### From Source
 
 ```bash
 git clone https://github.com/Gracker/mcp-web-reader.git
 cd mcp-web-reader
 npm install
 npm run build
-npx playwright install chromium
 ```
 
 ## Configuration
 
 ### Claude Desktop
 
-Add to your
+Add to your config file:
 
 **Windows**: `%APPDATA%\Claude\claude_desktop_config.json`
 **macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`
@@ -48,36 +55,49 @@ Add to your Claude Desktop config file:
 }
 ```
 
-### Claude Code
-
-For Claude Code users, add the MCP server using the command line:
+### Claude Code
 
 ```bash
 claude mcp add web-reader -- mcp-web-reader
+claude mcp list
 ```
 
-
+### Streamable HTTP (Remote Deployment)
+
+Start server in Streamable HTTP mode:
+
 ```bash
-
+MCP_TRANSPORT=http MCP_HTTP_HOST=0.0.0.0 MCP_HTTP_PORT=3000 npm run start:http
 ```
 
-
+Optional environment variables:
 
-
+- `MCP_HTTP_PATH` (default: `/mcp`)
+- `MCP_ENABLE_LEGACY_SSE=true` to expose deprecated `/sse` + `/messages` endpoints
 
-
+Codex MCP config (HTTP):
+
+```toml
+[mcp_servers.web-reader]
+type = "http"
+url = "https://your-domain.com/mcp"
+bearer_token_env_var = "WEB_READER_TOKEN"
+```
 
+## Usage
+
+In Claude:
 - "Fetch content from https://example.com"
-- "Get content using browser for https://mp.weixin.qq.com/..."
+- "Get content using browser for https://mp.weixin.qq.com/..."
 - "Fetch multiple URLs: [url1, url2, url3]"
 
 ## Supported Sites
 
--
--
--
--
--
+- WeChat articles (mp.weixin.qq.com)
+- Paywalled sites (NYT, Time Magazine, etc.)
+- Cloudflare-protected sites
+- JavaScript-heavy sites
+- CAPTCHA-protected sites
 
 ## Tools
 
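Note on the Streamable HTTP settings in this hunk: the environment variables it introduces could be gathered into one settings object roughly as follows. This is a minimal sketch, not the package's code. `readTransportConfig` is a hypothetical name, and every default except the documented `/mcp` path is an assumption.

```javascript
// Hypothetical helper (not from the package): reads the transport settings
// named in the README from an environment-like object.
function readTransportConfig(env) {
  // Assumption: anything other than "http" means the stdio transport.
  const transport = env.MCP_TRANSPORT === "http" ? "http" : "stdio";

  // Assumption: port 3000 when MCP_HTTP_PORT is unset (the README example uses 3000).
  const port = Number.parseInt(env.MCP_HTTP_PORT ?? "3000", 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid MCP_HTTP_PORT: ${env.MCP_HTTP_PORT}`);
  }

  return {
    transport,
    host: env.MCP_HTTP_HOST ?? "127.0.0.1", // assumed default, not documented
    port,
    path: env.MCP_HTTP_PATH ?? "/mcp", // default documented in the README
    legacySse: env.MCP_ENABLE_LEGACY_SSE === "true", // opt-in only
  };
}
```

Passing `process.env` would mirror the command shown in the README; passing a plain object keeps the helper easy to test.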
@@ -89,12 +109,12 @@ After configuration, use natural language commands:
 
 ## Architecture
 
-Intelligent fallback
+Intelligent fallback:
 ```
 URL Request → Jina Reader → Local Parser → Playwright Browser
 ```
 
-Auto-detects restrictions and switches to browser
+Auto-detects restrictions and switches to browser for:
 - HTTP status codes: 403, 429, 503, 520-524
 - Keywords: Cloudflare, CAPTCHA, Access Denied
 - Content patterns: Security checks, human verification
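Note on the three detection signals listed in this hunk (status codes, keywords, content patterns): they can be combined in a single predicate, sketched below. The status codes and keywords come from the README text. `looksRestricted` and `statusFromMessage` are hypothetical names, and parsing a status out of an error message is an assumption based on messages of the form `HTTP error! status: 403` seen elsewhere in this diff.

```javascript
// Status codes and keywords as listed in the README; the helpers are illustrative.
const BLOCKED_STATUS = new Set([403, 429, 503, 520, 521, 522, 523, 524]);
const BLOCKED_KEYWORDS = [
  "cloudflare",
  "captcha",
  "access denied",
  "security check",
  "human verification",
];

// Hypothetical helper: extracts a three-digit status from text such as
// "HTTP error! status: 403"; returns undefined when none is present.
function statusFromMessage(message) {
  const match = /status:\s*(\d{3})\b/i.exec(message);
  return match ? Number(match[1]) : undefined;
}

// Returns true when any signal suggests the request was blocked.
function looksRestricted({ message = "", status, body = "" }) {
  const code = status ?? statusFromMessage(message);
  if (code !== undefined && BLOCKED_STATUS.has(code)) {
    return true;
  }
  const haystack = `${message}\n${body}`.toLowerCase();
  return BLOCKED_KEYWORDS.some((keyword) => haystack.includes(keyword));
}
```

Keeping the status lookup separate from the keyword scan means a caller that only has an error object can still benefit from the status-code list.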
@@ -102,13 +122,12 @@ Auto-detects restrictions and switches to browser mode for:
 ## Development
 
 ```bash
-npm run dev # Development
+npm run dev # Development with auto-rebuild
 npm run build # Build production version
 npm start # Test run
-
+npm run start:http # Run Streamable HTTP server
 ```
 
 ## License
 
 MIT License
-
package/dist/index.js CHANGED
@@ -1,32 +1,27 @@
 import { Server } from "@modelcontextprotocol/sdk/server/index.js";
 import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
-import {
+import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";
+import { SSEServerTransport } from "@modelcontextprotocol/sdk/server/sse.js";
+import { createMcpExpressApp } from "@modelcontextprotocol/sdk/server/express.js";
+import { CallToolRequestSchema, ErrorCode, ListToolsRequestSchema, McpError, isInitializeRequest, } from "@modelcontextprotocol/sdk/types.js";
 import fetch from "node-fetch";
 import { JSDOM } from "jsdom";
 import TurndownService from "turndown";
 import { chromium } from "playwright";
-
-
-name: "web-reader",
-version: "2.0.0",
-}, {
-capabilities: {
-tools: {},
-},
-});
-// 初始化Turndown服务(将HTML转换为Markdown)
+import { randomUUID } from "node:crypto";
+// Initialize Turndown service (convert HTML to Markdown)
 const turndownService = new TurndownService({
 headingStyle: "atx",
 codeBlockStyle: "fenced",
 });
-//
+// Configure Turndown rules
 turndownService.addRule("skipScripts", {
 filter: ["script", "style", "noscript"],
 replacement: () => "",
 });
-//
+// Browser instance management
 let browser = null;
-//
+// Get or create browser instance
 async function getBrowser() {
 if (!browser) {
 browser = await chromium.launch({
@@ -34,7 +29,7 @@ async function getBrowser() {
 args: [
 '--no-sandbox',
 '--disable-dev-shm-usage',
-'--disable-blink-features=AutomationControlled', //
+'--disable-blink-features=AutomationControlled', // Disable automation detection
 '--disable-infobars',
 '--window-size=1920,1080',
 '--start-maximized',
@@ -43,14 +38,14 @@ async function getBrowser() {
 }
 return browser;
 }
-//
+// Clean up browser instance
 async function closeBrowser() {
 if (browser) {
 await browser.close();
 browser = null;
 }
 }
-// URL
+// URL validation function
 function isValidUrl(urlString) {
 try {
 const url = new URL(urlString);
@@ -60,18 +55,18 @@ function isValidUrl(urlString) {
 return false;
 }
 }
-//
+// Check if it's a WeChat article link
 function isWeixinUrl(url) {
 return url.includes('mp.weixin.qq.com') || url.includes('weixin.qq.com');
 }
-//
+// Check if browser mode is needed
 function shouldUseBrowser(error, statusCode, content) {
 const errorMessage = error.message.toLowerCase();
-//
+// Based on HTTP status codes
 if (statusCode && [403, 429, 503, 520, 521, 522, 523, 524].includes(statusCode)) {
 return true;
 }
-//
+// Based on error messages
 const browserTriggers = [
 'cloudflare',
 'access denied',
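Note on the URL checks touched by this hunk: a standalone sketch of the two checks follows. The lines of `isValidUrl` between the two hunks are not shown in the diff, so the protocol test here is an assumption based on the tool's error message ("please provide http or https protocol URL"). `isHttpUrl` and `isWeixinHost` are hypothetical names. The host-based variant is offered as an alternative to the substring test, since `includes` would also match a WeChat domain appearing in a query string.

```javascript
// Accepts only absolute http(s) URLs; anything unparsable is rejected.
function isHttpUrl(urlString) {
  try {
    const { protocol } = new URL(urlString);
    return protocol === "http:" || protocol === "https:";
  } catch {
    return false;
  }
}

// Hypothetical stricter WeChat check: compares the parsed hostname
// instead of searching the whole URL string.
function isWeixinHost(urlString) {
  try {
    const { hostname } = new URL(urlString);
    return hostname === "weixin.qq.com" || hostname.endsWith(".weixin.qq.com");
  } catch {
    return false;
  }
}
```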
@@ -83,13 +78,13 @@ function shouldUseBrowser(error, statusCode, content) {
 'blocked',
 'protection',
 'verification required',
-'
-'
+'environment anomaly',
+'verify'
 ];
 if (browserTriggers.some(trigger => errorMessage.includes(trigger))) {
 return true;
 }
-//
+// Based on response content
 if (content) {
 const contentLower = content.toLowerCase();
 const contentTriggers = [
@@ -99,10 +94,10 @@ function shouldUseBrowser(error, statusCode, content) {
 'security check',
 'human verification',
 'captcha',
-//
-'
-'
-'
+// WeChat-specific verification keywords
+'environment anomaly',
+'verify',
+'complete verification to continue',
 'verify'
 ];
 if (contentTriggers.some(trigger => contentLower.includes(trigger))) {
@@ -111,12 +106,12 @@ function shouldUseBrowser(error, statusCode, content) {
 }
 return false;
 }
-//
+// Fetch content using Jina Reader
 async function fetchWithJinaReader(url) {
 try {
 // Jina Reader API URL
 const jinaUrl = `https://r.jina.ai/${url}`;
-//
+// Create timeout controller
 const controller = new AbortController();
 const timeoutId = setTimeout(() => controller.abort(), 30000);
 const response = await fetch(jinaUrl, {
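Note on the timeout controller in this hunk: the `AbortController` plus `setTimeout` pairing can be factored into a reusable wrapper, sketched below. `withTimeout` is a hypothetical name. The `clearTimeout` in `finally` is an assumption about good practice; the corresponding lines of the package are not visible in this hunk.

```javascript
// Runs `task(signal)` and aborts the signal after `ms` milliseconds.
// The task is expected to honour the signal (as fetch does via its `signal` option).
async function withTimeout(task, ms) {
  const controller = new AbortController();
  const timeoutId = setTimeout(() => controller.abort(), ms);
  try {
    return await task(controller.signal);
  } finally {
    // Always release the timer so it cannot fire after the task settles.
    clearTimeout(timeoutId);
  }
}
```

A call such as `withTimeout((signal) => fetch(url, { signal }), 30000)` would correspond to the 30-second limit used in the diff.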
@@ -131,9 +126,9 @@ async function fetchWithJinaReader(url) {
 throw new Error(`Jina Reader API error! status: ${response.status}`);
 }
 const markdown = await response.text();
-//
+// Extract title from Markdown (usually the first # heading)
 const titleMatch = markdown.match(/^#\s+(.+)$/m);
-const title = titleMatch ? titleMatch[1] : "
+const title = titleMatch ? titleMatch[1] : "No title";
 return {
 title,
 content: markdown,
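Note on the title extraction in this hunk: the regular expression picks the first level-one ATX heading in the Markdown. A small standalone version, using the same pattern and the same fallback string, might look like this. `extractTitle` is a hypothetical name, and the added `trim()` is an assumption, not something the diff shows.

```javascript
// Returns the text of the first "# Heading" line, or the fallback when none exists.
function extractTitle(markdown, fallback = "No title") {
  // The `m` flag lets ^ and $ match at line boundaries, so headings
  // anywhere in the document are found, not only on the first line.
  const match = markdown.match(/^#\s+(.+)$/m);
  return match ? match[1].trim() : fallback;
}
```

Because the pattern requires whitespace directly after a single `#`, lines starting with `##` are skipped.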
@@ -148,21 +143,21 @@ async function fetchWithJinaReader(url) {
 catch (error) {
 if (error instanceof Error) {
 if (error.name === 'AbortError') {
-throw new Error(`Jina Reader
+throw new Error(`Jina Reader request timeout (30s)`);
 }
-throw new Error(`Jina Reader
+throw new Error(`Jina Reader fetch failed: ${error.message}`);
 }
-throw new Error(`Jina Reader
+throw new Error(`Jina Reader fetch failed: ${String(error)}`);
 }
 }
-//
+// Fetch web content using Playwright
 async function fetchWithPlaywright(url) {
 let page = null;
 const isWeixin = isWeixinUrl(url);
 try {
 const browserInstance = await getBrowser();
 page = await browserInstance.newPage();
-//
+// Set real User-Agent (simulate Chrome on Mac)
 const userAgent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36';
 await page.setExtraHTTPHeaders({
 'User-Agent': userAgent,
@@ -174,7 +169,7 @@ async function fetchWithPlaywright(url) {
 ...(isWeixin ? { 'Referer': 'https://mp.weixin.qq.com/' } : {}),
 });
 await page.setViewportSize({ width: 1920, height: 1080 });
-//
+// WeChat articles need to load styles for correct rendering, filter for other sites
 if (!isWeixin) {
 await page.route('**/*', (route) => {
 const resourceType = route.request().resourceType();
@@ -186,30 +181,30 @@ async function fetchWithPlaywright(url) {
 }
 });
 }
-//
+// Navigate to page with longer timeout
 await page.goto(url, {
 timeout: 45000,
-waitUntil: 'networkidle' //
+waitUntil: 'networkidle' // Wait for network idle to ensure JS execution
 });
-//
+// WeChat articles need longer wait time
 const waitTime = isWeixin ? 5000 : 2000;
 await page.waitForTimeout(waitTime);
-//
-const title = await page.title() || "
-//
+// Get page title
+const title = await page.title() || "No title";
+// Remove unwanted elements
 await page.evaluate(() => {
 const elementsToRemove = document.querySelectorAll('script, style, nav, header, footer, aside, .advertisement, .ads, .sidebar, .comments, .social-share');
 elementsToRemove.forEach(el => el.remove());
 });
-//
+// Get main content (WeChat articles have specific DOM structure)
 const htmlContent = await page.evaluate(() => {
-//
+// WeChat article specific selectors
 const weixinContent = document.querySelector('#js_content') ||
 document.querySelector('.rich_media_content');
 if (weixinContent) {
 return weixinContent.innerHTML;
 }
-//
+// Common selectors
 const mainContent = document.querySelector('main') ||
 document.querySelector('article') ||
 document.querySelector('[role="main"]') ||
@@ -220,9 +215,9 @@ async function fetchWithPlaywright(url) {
 document.body;
 return mainContent ? mainContent.innerHTML : document.body.innerHTML;
 });
-//
+// Convert to Markdown
 const markdown = turndownService.turndown(htmlContent);
-//
+// Clean content
 const cleanedContent = markdown
 .replace(/\n{3,}/g, "\n\n")
 .replace(/^\s+$/gm, "")
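Note on the cleanup chain that ends this hunk: the two `replace` calls collapse runs of blank lines and empty out whitespace-only lines. A self-contained sketch of that step is below. `tidyMarkdown` is a hypothetical name, and the final `trim()` is an assumption, because the rest of the chain falls outside the hunk.

```javascript
// Normalises whitespace in converted Markdown.
function tidyMarkdown(markdown) {
  return markdown
    .replace(/\n{3,}/g, "\n\n") // collapse three or more newlines into one blank line
    .replace(/^\s+$/gm, "") // strip lines that contain only whitespace
    .trim(); // assumed final step, not visible in the diff
}
```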
@@ -240,9 +235,9 @@ async function fetchWithPlaywright(url) {
 }
 catch (error) {
 if (error instanceof Error) {
-throw new Error(`Playwright
+throw new Error(`Playwright fetch failed: ${error.message}`);
 }
-throw new Error(`Playwright
+throw new Error(`Playwright fetch failed: ${String(error)}`);
 }
 finally {
 if (page) {
@@ -250,13 +245,13 @@ async function fetchWithPlaywright(url) {
 }
 }
 }
-//
+// Local web content extraction function
 async function fetchWithLocalParser(url) {
 try {
-//
+// Create timeout controller
 const controller = new AbortController();
 const timeoutId = setTimeout(() => controller.abort(), 30000);
-//
+// Send HTTP request
 const response = await fetch(url, {
 headers: {
 "User-Agent": "Mozilla/5.0 (compatible; MCP-URLFetcher/2.0)",
@@ -267,17 +262,17 @@ async function fetchWithLocalParser(url) {
 if (!response.ok) {
 throw new Error(`HTTP error! status: ${response.status}`);
 }
-//
+// Get HTML content
 const html = await response.text();
-//
+// Parse HTML with JSDOM
 const dom = new JSDOM(html);
 const document = dom.window.document;
-//
-const title = document.querySelector("title")?.textContent || "
-//
+// Get title
+const title = document.querySelector("title")?.textContent || "No title";
+// Remove unwanted elements
 const elementsToRemove = document.querySelectorAll("script, style, nav, header, footer, aside, .advertisement, .ads, .sidebar, .comments");
 elementsToRemove.forEach(el => el.remove());
-//
+// Get main content area
 const mainContent = document.querySelector("main") ||
 document.querySelector("article") ||
 document.querySelector('[role="main"]') ||
@@ -286,9 +281,9 @@ async function fetchWithLocalParser(url) {
 document.querySelector(".post") ||
 document.querySelector(".entry-content") ||
 document.body;
-//
+// Convert to Markdown
 const markdown = turndownService.turndown(mainContent.innerHTML);
-//
+// Clean extra whitespace
 const cleanedContent = markdown
 .replace(/\n{3,}/g, "\n\n")
 .replace(/^\s+$/gm, "")
@@ -307,62 +302,62 @@ async function fetchWithLocalParser(url) {
 catch (error) {
 if (error instanceof Error) {
 if (error.name === 'AbortError') {
-throw new Error(
+throw new Error(`Local parser request timeout (30s)`);
 }
-throw new Error(
+throw new Error(`Local parser failed: ${error.message}`);
 }
-throw new Error(
+throw new Error(`Local parser failed: ${String(error)}`);
 }
 }
-//
-//
+// Smart web content fetching (three-tier fallback: Jina → Local → Playwright)
+// For known sites requiring browser (like WeChat), use browser mode directly
 async function fetchWebContent(url, preferJina = true) {
-//
+// WeChat articles use browser mode directly as other methods cannot bypass verification
 if (isWeixinUrl(url)) {
-console.error("
+console.error("Detected WeChat article, using Playwright browser mode");
 return await fetchWithPlaywright(url);
 }
 if (preferJina) {
-//
+// Tier 1: Try Jina Reader
 try {
 return await fetchWithJinaReader(url);
 }
 catch (jinaError) {
-console.error("Jina Reader
-//
+console.error("Jina Reader failed, trying local parser:", jinaError instanceof Error ? jinaError.message : String(jinaError));
+// Tier 2: Try local parser
 try {
 return await fetchWithLocalParser(url);
 }
 catch (localError) {
-console.error("
-//
+console.error("Local parser failed, checking if browser mode needed:", localError instanceof Error ? localError.message : String(localError));
+// Check if browser mode is needed
 const jinaErr = jinaError instanceof Error ? jinaError : new Error(String(jinaError));
 const localErr = localError instanceof Error ? localError : new Error(String(localError));
 if (shouldUseBrowser(jinaErr) || shouldUseBrowser(localErr)) {
-console.error("
+console.error("Detected access restrictions, using Playwright browser mode");
 try {
-//
+// Tier 3: Use Playwright browser
 return await fetchWithPlaywright(url);
 }
 catch (browserError) {
-throw new Error(
+throw new Error(`All methods failed. Jina: ${jinaErr.message}, Local: ${localErr.message}, Browser: ${browserError instanceof Error ? browserError.message : String(browserError)}`);
 }
 }
 else {
-throw new Error(`Jina
+throw new Error(`Jina and local parser both failed. Jina: ${jinaErr.message}, Local: ${localErr.message}`);
 }
 }
 }
 }
 else {
-//
+// If not prioritizing Jina, start with local parser
 try {
 return await fetchWithLocalParser(url);
 }
 catch (localError) {
 const localErr = localError instanceof Error ? localError : new Error(String(localError));
 if (shouldUseBrowser(localErr)) {
-console.error("
+console.error("Local parser failed, detected access restrictions, using Playwright browser mode");
 return await fetchWithPlaywright(url);
 }
 else {
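Note on the three-tier fallback in this hunk: the nested `try`/`catch` blocks follow a pattern that can also be written as a loop over an ordered list of fetchers. The sketch below is an alternative formulation, not the package's implementation. `fetchWithFallback`, the `tiers` array shape, and the optional `when` guard are all hypothetical, chosen so the control flow can be exercised with stub fetchers.

```javascript
// Tries each tier in order and returns the first successful result.
// A tier is { name, fetch, when? }; `when(failures)` may veto a tier
// based on the failures collected so far (e.g. only use the browser
// when earlier errors look like access restrictions).
async function fetchWithFallback(url, tiers) {
  const failures = [];
  for (const tier of tiers) {
    if (tier.when && !tier.when(failures)) {
      continue; // guard declined: skip this tier without recording a failure
    }
    try {
      return await tier.fetch(url);
    } catch (error) {
      failures.push({
        name: tier.name,
        message: error instanceof Error ? error.message : String(error),
      });
    }
  }
  const summary = failures.map((f) => `${f.name}: ${f.message}`).join(", ");
  throw new Error(`All methods failed. ${summary}`);
}
```

Compared with nesting, the loop keeps the per-tier error messages in one place, which makes the combined error string easier to build when a tier is added or removed.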
@@ -371,228 +366,461 @@ async function fetchWebContent(url, preferJina = true) {
|
|
|
371
366
|
}
|
|
372
367
|
}
|
|
373
368
|
}
|
|
374
|
-
|
|
375
|
-
|
|
376
|
-
|
|
377
|
-
|
|
378
|
-
|
|
379
|
-
|
|
380
|
-
|
|
381
|
-
|
|
382
|
-
|
|
383
|
-
|
|
384
|
-
|
|
385
|
-
|
|
386
|
-
|
|
387
|
-
|
|
388
|
-
|
|
389
|
-
|
|
390
|
-
|
|
391
|
-
|
|
369
|
+
const streamableSessions = new Map();
|
|
370
|
+
const legacySseSessions = new Map();
|
|
371
|
+
function createServerInstance() {
|
|
372
|
+
const server = new Server({
|
|
373
|
+
name: "web-reader",
|
|
374
|
+
version: "2.1.0",
|
|
375
|
+
}, {
|
|
376
|
+
capabilities: {
|
|
377
|
+
tools: {},
|
|
378
|
+
},
|
|
379
|
+
});
|
|
380
|
+
registerServerHandlers(server);
|
|
381
|
+
return server;
|
|
382
|
+
}
|
|
383
|
+
function registerServerHandlers(server) {
|
|
384
|
+
// Handle tool list requests
|
|
385
|
+
server.setRequestHandler(ListToolsRequestSchema, async () => {
|
|
386
|
+
return {
|
|
387
|
+
tools: [
|
|
388
|
+
{
|
|
389
|
+
name: "fetch_url",
|
|
390
|
+
description: "Fetch web content from specified URL and convert to Markdown format. Uses Jina Reader by default, automatically falls back to local parser on failure",
|
|
391
|
+
inputSchema: {
|
|
392
|
+
type: "object",
|
|
393
|
+
properties: {
|
|
394
|
+
url: {
|
|
395
|
+
type: "string",
|
|
396
|
+
description: "Webpage URL to fetch (must be http or https protocol)",
|
|
397
|
+
},
|
|
398
|
+
preferJina: {
|
|
399
|
+
type: "boolean",
|
|
400
|
+
description: "Whether to prioritize Jina Reader (default: true)",
|
|
401
|
+
default: true,
|
|
402
|
+
},
|
|
392
403
|
},
|
|
404
|
+
required: ["url"],
|
|
393
405
|
},
|
|
394
|
-
required: ["url"],
|
|
395
406
|
},
|
|
396
|
-
|
|
397
|
-
|
|
398
|
-
|
|
399
|
-
|
|
400
|
-
|
|
401
|
-
|
|
402
|
-
|
|
403
|
-
|
|
404
|
-
|
|
405
|
-
|
|
406
|
-
|
|
407
|
+
{
|
|
408
|
+
name: "fetch_multiple_urls",
|
|
409
|
+
description: "Batch fetch web content from multiple URLs",
|
|
410
|
+
inputSchema: {
|
|
411
|
+
type: "object",
|
|
412
|
+
properties: {
|
|
413
|
+
urls: {
|
|
414
|
+
type: "array",
|
|
415
|
+
items: {
|
|
416
|
+
type: "string",
|
|
417
|
+
},
|
|
418
|
+
description: "List of webpage URLs to fetch",
|
|
419
|
+
maxItems: 10, // Limit to 10 URLs
|
|
420
|
+
},
|
|
421
|
+
preferJina: {
|
|
422
|
+
type: "boolean",
|
|
423
|
+
description: "Whether to prioritize Jina Reader (default: true)",
|
|
424
|
+
default: true,
|
|
407
425
|
},
|
|
408
|
-
description: "要获取内容的网页URL列表",
|
|
409
|
-
maxItems: 10, // 限制最多10个URL
|
|
410
|
-
},
|
|
411
|
-
preferJina: {
|
|
412
|
-
type: "boolean",
|
|
413
|
-
description: "是否优先使用Jina Reader(默认为true)",
|
|
414
|
-
default: true,
|
|
415
426
|
},
|
|
427
|
+
required: ["urls"],
|
|
416
428
|
},
|
|
417
|
-
required: ["urls"],
|
|
418
429
|
},
|
|
419
|
-
|
|
420
|
-
|
|
421
|
-
|
|
422
|
-
|
|
423
|
-
|
|
424
|
-
|
|
425
|
-
|
|
426
|
-
|
|
427
|
-
|
|
428
|
-
|
|
430
|
+
{
|
|
431
|
+
name: "fetch_url_with_jina",
|
|
432
|
+
description: "Force fetch using Jina Reader (suitable for complex webpages)",
|
|
433
|
+
inputSchema: {
|
|
434
|
+
type: "object",
|
|
435
|
+
properties: {
|
|
436
|
+
url: {
|
|
437
|
+
type: "string",
|
|
438
|
+
description: "Webpage URL to fetch",
|
|
439
|
+
},
|
|
429
440
|
},
|
|
441
|
+
required: ["url"],
|
|
430
442
|
},
|
|
431
|
-
required: ["url"],
|
|
432
443
|
},
|
|
433
|
-
|
|
434
|
-
|
|
435
|
-
|
|
436
|
-
|
|
437
|
-
|
|
438
|
-
|
|
439
|
-
|
|
440
|
-
|
|
441
|
-
|
|
442
|
-
|
|
444
|
+
{
|
|
445
|
+
name: "fetch_url_local",
|
|
446
|
+
description: "Force fetch using local parser (suitable for simple webpages or when Jina is unavailable)",
|
|
447
|
+
inputSchema: {
|
|
448
|
+
type: "object",
|
|
449
|
+
properties: {
|
|
450
|
+
url: {
|
|
451
|
+
type: "string",
|
|
452
|
+
description: "Webpage URL to fetch",
|
|
453
|
+
},
|
|
443
454
|
},
|
|
455
|
+
required: ["url"],
|
|
444
456
|
},
|
|
445
|
-
required: ["url"],
|
|
446
457
|
},
|
|
447
|
-
|
|
448
|
-
|
|
449
|
-
|
|
450
|
-
|
|
451
|
-
|
|
452
|
-
|
|
453
|
-
|
|
454
|
-
|
|
455
|
-
|
|
456
|
-
|
|
458
|
+
{
|
|
459
|
+
name: "fetch_url_with_browser",
|
|
460
|
+
description: "Force fetch using Playwright browser (suitable for websites with access restrictions, such as Cloudflare protection, CAPTCHA, etc.)",
|
|
461
|
+
inputSchema: {
|
|
462
|
+
type: "object",
|
|
463
|
+
properties: {
|
|
464
|
+
url: {
|
|
465
|
+
type: "string",
|
|
466
|
+
description: "Webpage URL to fetch",
|
|
467
|
+
},
|
|
457
468
|
},
|
|
469
|
+
required: ["url"],
|
|
458
470
|
},
|
|
459
|
-
required: ["url"],
|
|
460
471
|
},
|
|
461
|
-
|
|
462
|
-
|
|
463
|
-
};
|
|
464
|
-
|
|
465
|
-
|
|
466
|
-
|
|
467
|
-
|
|
468
|
-
|
|
469
|
-
|
|
470
|
-
|
|
471
|
-
|
|
472
|
-
|
|
473
|
-
|
|
474
|
-
|
|
475
|
-
|
|
476
|
-
|
|
477
|
-
|
|
478
|
-
|
|
479
|
-
|
|
480
|
-
|
|
481
|
-
|
|
482
|
-
|
|
483
|
-
|
|
484
|
-
};
|
|
485
|
-
}
|
|
486
|
-
else if (name === "fetch_url_with_jina") {
|
|
487
|
-
const { url } = args;
|
|
488
|
-
if (!isValidUrl(url)) {
|
|
489
|
-
throw new McpError(ErrorCode.InvalidParams, "无效的URL格式");
|
|
472
|
+
],
|
|
473
|
+
};
|
|
474
|
+
});
|
|
475
|
+
// Handle tool call requests
|
|
476
|
+
server.setRequestHandler(CallToolRequestSchema, async (request) => {
|
|
477
|
+
const { name, arguments: args } = request.params;
|
|
478
|
+
try {
|
|
479
|
+
if (name === "fetch_url") {
|
|
480
|
+
const { url, preferJina = true } = args;
|
|
481
|
+
// Validate URL
|
|
482
|
+
if (!isValidUrl(url)) {
|
|
483
|
+
throw new McpError(ErrorCode.InvalidParams, "Invalid URL format, please provide http or https protocol URL");
|
|
484
|
+
}
|
|
485
|
+
// Fetch web content
|
|
486
|
+
const result = await fetchWebContent(url, preferJina);
|
|
487
|
+
return {
|
|
488
|
+
content: [
|
|
489
|
+
{
|
|
490
|
+
type: "text",
|
|
491
|
+
text: `# ${result.title}\n\n**URL**: ${result.metadata.url}\n**Fetched At**: ${result.metadata.fetchedAt}\n**Content Length**: ${result.metadata.contentLength} characters\n**Method**: ${result.metadata.method}\n\n---\n\n${result.content}`,
|
|
492
|
+
},
|
|
493
|
+
],
|
|
494
|
+
};
|
|
490
495
|
}
|
|
491
|
-
|
|
492
|
-
|
|
493
|
-
|
|
494
|
-
|
|
495
|
-
|
|
496
|
-
|
|
497
|
-
|
|
498
|
-
|
|
499
|
-
|
|
500
|
-
|
|
501
|
-
|
|
502
|
-
|
|
503
|
-
|
|
504
|
-
|
|
496
|
+
else if (name === "fetch_url_with_jina") {
|
|
497
|
+
const { url } = args;
|
|
498
|
+
if (!isValidUrl(url)) {
|
|
499
|
+
throw new McpError(ErrorCode.InvalidParams, "Invalid URL format");
|
|
500
|
+
}
|
|
501
|
+
const result = await fetchWithJinaReader(url);
|
|
502
|
+
return {
|
|
503
|
+
content: [
|
|
504
|
+
{
|
|
505
|
+
type: "text",
|
|
506
|
+
text: `# ${result.title}\n\n**URL**: ${result.metadata.url}\n**Fetched At**: ${result.metadata.fetchedAt}\n**Content Length**: ${result.metadata.contentLength} characters\n**Method**: Jina Reader\n\n---\n\n${result.content}`,
|
|
507
|
+
},
|
|
508
|
+
],
|
|
509
|
+
};
|
|
505
510
|
}
|
|
506
|
-
|
|
507
|
-
|
|
508
|
-
|
|
509
|
-
|
|
510
|
-
|
|
511
|
-
|
|
512
|
-
|
|
513
|
-
|
|
514
|
-
|
|
515
|
-
|
|
516
|
-
|
|
517
|
-
|
|
518
|
-
|
|
519
|
-
|
|
520
|
-
if (invalidUrls.length > 0) {
|
|
521
|
-
throw new McpError(ErrorCode.InvalidParams, `以下URL格式无效: ${invalidUrls.join(", ")}`);
|
|
511
|
+
else if (name === "fetch_url_local") {
|
|
512
|
+
const { url } = args;
|
|
513
|
+
if (!isValidUrl(url)) {
|
|
514
|
+
throw new McpError(ErrorCode.InvalidParams, "Invalid URL format");
|
|
515
|
+
}
|
|
516
|
+
const result = await fetchWithLocalParser(url);
|
|
517
|
+
return {
|
|
518
|
+
content: [
|
|
519
|
+
{
|
|
520
|
+
type: "text",
|
|
521
|
+
text: `# ${result.title}\n\n**URL**: ${result.metadata.url}\n**Fetched At**: ${result.metadata.fetchedAt}\n**Content Length**: ${result.metadata.contentLength} characters\n**Method**: Local Parser\n\n---\n\n${result.content}`,
|
|
522
|
+
},
|
|
523
|
+
],
|
|
524
|
+
};
|
|
522
525
|
}
|
|
523
|
-
|
|
524
|
-
|
|
-
-
-
-
-                combinedContent += `## ${index + 1}. ${url}\n\n`;
-                if (result.status === "fulfilled") {
-                    const { title, content, metadata } = result.value;
-                    combinedContent += `**标题**: ${title}\n`;
-                    combinedContent += `**获取时间**: ${metadata.fetchedAt}\n`;
-                    combinedContent += `**内容长度**: ${metadata.contentLength} 字符\n`;
-                    combinedContent += `**解析方法**: ${metadata.method}\n\n`;
-                    combinedContent += `### 内容\n\n${content}\n\n`;
+            else if (name === "fetch_multiple_urls") {
+                const { urls, preferJina = true } = args;
+                // Validate all URLs
+                const invalidUrls = urls.filter(url => !isValidUrl(url));
+                if (invalidUrls.length > 0) {
+                    throw new McpError(ErrorCode.InvalidParams, `The following URLs have invalid format: ${invalidUrls.join(", ")}`);
                 }
-
-
+                // Fetch all URLs concurrently
+                const results = await Promise.allSettled(urls.map(url => fetchWebContent(url, preferJina)));
+                // Combine results
+                let combinedContent = "# Batch URL Content Fetch Results\n\n";
+                results.forEach((result, index) => {
+                    const url = urls[index];
+                    combinedContent += `## ${index + 1}. ${url}\n\n`;
+                    if (result.status === "fulfilled") {
+                        const { title, content, metadata } = result.value;
+                        combinedContent += `**Title**: ${title}\n`;
+                        combinedContent += `**Fetched At**: ${metadata.fetchedAt}\n`;
+                        combinedContent += `**Content Length**: ${metadata.contentLength} characters\n`;
+                        combinedContent += `**Method**: ${metadata.method}\n\n`;
+                        combinedContent += `### Content\n\n${content}\n\n`;
+                    }
+                    else {
+                        combinedContent += `**Error**: ${result.reason}\n\n`;
+                    }
+                    combinedContent += "---\n\n";
+                });
+                return {
+                    content: [
+                        {
+                            type: "text",
+                            text: combinedContent,
+                        },
+                    ],
+                };
+            }
+            else if (name === "fetch_url_with_browser") {
+                const { url } = args;
+                if (!isValidUrl(url)) {
+                    throw new McpError(ErrorCode.InvalidParams, "Invalid URL format");
                 }
-
-
-
-
-
-
-
-
-
-            }
+                const result = await fetchWithPlaywright(url);
+                return {
+                    content: [
+                        {
+                            type: "text",
+                            text: `# ${result.title}\n\n**URL**: ${result.metadata.url}\n**Fetched At**: ${result.metadata.fetchedAt}\n**Content Length**: ${result.metadata.contentLength} characters\n**Method**: Playwright Browser\n\n---\n\n${result.content}`,
+                        },
+                    ],
+                };
+            }
+            else {
+                throw new McpError(ErrorCode.MethodNotFound, `Unknown tool: ${name}`);
+            }
         }
-
-
-
-                throw new McpError(ErrorCode.InvalidParams, "无效的URL格式");
+        catch (error) {
+            if (error instanceof McpError) {
+                throw error;
             }
-
-
-
-
-
-
-
-
-
+            throw new McpError(ErrorCode.InternalError, `Tool execution failed: ${error instanceof Error ? error.message : String(error)}`);
+        }
+    });
+}
+function sendJsonRpcError(res, statusCode, message) {
+    res.status(statusCode).json({
+        jsonrpc: "2.0",
+        error: {
+            code: -32000,
+            message,
+        },
+        id: null,
+    });
+}
+function getSessionIdFromHeaders(headers) {
+    const value = headers["mcp-session-id"];
+    if (!value) {
+        return undefined;
+    }
+    return Array.isArray(value) ? value[0] : value;
+}
+function resolveTransportMode() {
+    const cliTransportArg = process.argv.find((arg) => arg.startsWith("--transport="));
+    const cliTransport = cliTransportArg ? cliTransportArg.split("=", 2)[1] : undefined;
+    const legacyHttpFlag = process.argv.includes("--http");
+    const mode = (cliTransport ?? process.env.MCP_TRANSPORT ?? (legacyHttpFlag ? "http" : "stdio"))
+        .toLowerCase();
+    if (mode === "stdio" || mode === "http") {
+        return mode;
+    }
+    throw new Error(`Unsupported transport mode: ${mode}. Use 'stdio' or 'http'.`);
+}
+function resolveLegacySseFlag() {
+    const envValue = (process.env.MCP_ENABLE_LEGACY_SSE ?? "").toLowerCase();
+    return envValue === "1" || envValue === "true" || process.argv.includes("--legacy-sse");
+}
+async function closeAllSessions() {
+    for (const [sessionId, session] of streamableSessions.entries()) {
+        try {
+            await session.server.close();
         }
-
-
+        catch (error) {
+            console.error(`Failed to close streamable server for session ${sessionId}:`, error);
         }
     }
-
-
-
+    streamableSessions.clear();
+    for (const [sessionId, session] of legacySseSessions.entries()) {
+        try {
+            await session.server.close();
+        }
+        catch (error) {
+            console.error(`Failed to close SSE server for session ${sessionId}:`, error);
         }
-        throw new McpError(ErrorCode.InternalError, `工具执行失败: ${error instanceof Error ? error.message : String(error)}`);
     }
-
-
-async function
+    legacySseSessions.clear();
+}
+async function startStdioServer() {
+    const server = createServerInstance();
     const transport = new StdioServerTransport();
     await server.connect(transport);
-    console.error("MCP Web Reader
+    console.error("MCP Web Reader started in stdio mode");
+}
+async function startHttpServer() {
+    const host = process.env.MCP_HTTP_HOST ?? "127.0.0.1";
+    const port = Number.parseInt(process.env.MCP_HTTP_PORT ?? "3000", 10);
+    const mcpPath = process.env.MCP_HTTP_PATH ?? "/mcp";
+    const enableLegacySse = resolveLegacySseFlag();
+    if (!Number.isInteger(port) || port <= 0 || port > 65535) {
+        throw new Error(`Invalid MCP_HTTP_PORT: ${process.env.MCP_HTTP_PORT}`);
+    }
+    const app = createMcpExpressApp({ host });
+    app.post(mcpPath, async (req, res) => {
+        const sessionId = getSessionIdFromHeaders(req.headers);
+        try {
+            if (sessionId) {
+                const existingSession = streamableSessions.get(sessionId);
+                if (!existingSession) {
+                    sendJsonRpcError(res, 404, "Session not found");
+                    return;
+                }
+                await existingSession.transport.handleRequest(req, res, req.body);
+                return;
+            }
+            if (!isInitializeRequest(req.body)) {
+                sendJsonRpcError(res, 400, "Missing session ID; initialize request required");
+                return;
+            }
+            let transport;
+            const sessionServer = createServerInstance();
+            transport = new StreamableHTTPServerTransport({
+                sessionIdGenerator: () => randomUUID(),
+                onsessioninitialized: (initializedSessionId) => {
+                    streamableSessions.set(initializedSessionId, { transport, server: sessionServer });
+                    console.error(`Streamable HTTP session initialized: ${initializedSessionId}`);
+                },
+            });
+            transport.onclose = () => {
+                const closedSessionId = transport.sessionId;
+                if (closedSessionId && streamableSessions.delete(closedSessionId)) {
+                    console.error(`Streamable HTTP session closed: ${closedSessionId}`);
+                }
+            };
+            await sessionServer.connect(transport);
+            await transport.handleRequest(req, res, req.body);
+        }
+        catch (error) {
+            console.error("Error handling streamable HTTP POST request:", error);
+            if (!res.headersSent) {
+                sendJsonRpcError(res, 500, "Internal server error");
+            }
+        }
+    });
+    app.get(mcpPath, async (req, res) => {
+        const sessionId = getSessionIdFromHeaders(req.headers);
+        if (!sessionId) {
+            sendJsonRpcError(res, 400, "Missing mcp-session-id header");
+            return;
+        }
+        const session = streamableSessions.get(sessionId);
+        if (!session) {
+            sendJsonRpcError(res, 404, "Session not found");
+            return;
+        }
+        try {
+            await session.transport.handleRequest(req, res);
+        }
+        catch (error) {
+            console.error("Error handling streamable HTTP GET request:", error);
+            if (!res.headersSent) {
+                sendJsonRpcError(res, 500, "Internal server error");
+            }
+        }
+    });
+    app.delete(mcpPath, async (req, res) => {
+        const sessionId = getSessionIdFromHeaders(req.headers);
+        if (!sessionId) {
+            sendJsonRpcError(res, 400, "Missing mcp-session-id header");
+            return;
+        }
+        const session = streamableSessions.get(sessionId);
+        if (!session) {
+            sendJsonRpcError(res, 404, "Session not found");
+            return;
+        }
+        try {
+            await session.transport.handleRequest(req, res);
+        }
+        catch (error) {
+            console.error("Error handling streamable HTTP DELETE request:", error);
+            if (!res.headersSent) {
+                sendJsonRpcError(res, 500, "Internal server error");
+            }
+        }
+    });
+    if (enableLegacySse) {
+        app.get("/sse", async (_req, res) => {
+            const transport = new SSEServerTransport("/messages", res);
+            const server = createServerInstance();
+            legacySseSessions.set(transport.sessionId, { transport, server });
+            res.on("close", () => {
+                const removed = legacySseSessions.delete(transport.sessionId);
+                if (removed) {
+                    void server.close().catch((error) => {
+                        console.error("Failed to close legacy SSE session server:", error);
+                    });
+                }
+            });
+            await server.connect(transport);
+        });
+        app.post("/messages", async (req, res) => {
+            const querySessionId = req.query.sessionId;
+            const sessionId = typeof querySessionId === "string"
+                ? querySessionId
+                : Array.isArray(querySessionId) && typeof querySessionId[0] === "string"
+                    ? querySessionId[0]
+                    : undefined;
+            if (!sessionId) {
+                sendJsonRpcError(res, 400, "Missing sessionId query parameter");
+                return;
+            }
+            const session = legacySseSessions.get(sessionId);
+            if (!session) {
+                sendJsonRpcError(res, 404, "Session not found");
+                return;
+            }
+            try {
+                await session.transport.handlePostMessage(req, res, req.body);
+            }
+            catch (error) {
+                console.error("Error handling legacy SSE POST request:", error);
+                if (!res.headersSent) {
+                    sendJsonRpcError(res, 500, "Internal server error");
+                }
+            }
+        });
+    }
+    app.get("/healthz", (_req, res) => {
+        res.json({
+            status: "ok",
+            transport: "streamable-http",
+            sessions: streamableSessions.size,
+            legacySseEnabled: enableLegacySse,
+        });
+    });
+    app.listen(port, host, () => {
+        console.error(`MCP Web Reader started in HTTP mode on http://${host}:${port}${mcpPath}`);
+        if (enableLegacySse) {
+            console.error("Legacy SSE compatibility enabled on /sse and /messages");
+        }
+    });
 }
-
-
-
+let isShuttingDown = false;
+async function shutdown(signal) {
+    if (isShuttingDown) {
+        return;
+    }
+    isShuttingDown = true;
+    console.error(`Received ${signal}, shutting down MCP Web Reader...`);
+    await closeAllSessions();
     await closeBrowser();
     process.exit(0);
+}
+// Start server
+async function main() {
+    const transportMode = resolveTransportMode();
+    if (transportMode === "http") {
+        await startHttpServer();
+        return;
+    }
+    await startStdioServer();
+}
+// Graceful shutdown handling
+process.on("SIGINT", () => {
+    void shutdown("SIGINT");
 });
-process.on(
-
-    await closeBrowser();
-    process.exit(0);
+process.on("SIGTERM", () => {
+    void shutdown("SIGTERM");
 });
 main().catch((error) => {
-    console.error("
+    console.error("Server startup failed:", error);
     process.exit(1);
 });
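The `fetch_multiple_urls` branch in the `dist/index.js` diff above runs every fetch through `Promise.allSettled` and then folds fulfilled and rejected results into a single Markdown report. A minimal sketch of that folding step, assuming the settled results are already in hand; `formatBatchResults` and the sample data are hypothetical names used only for illustration and are not part of the package:

```javascript
// Hypothetical helper (not in the package): turns the array returned by
// Promise.allSettled into one Markdown report, one section per URL.
function formatBatchResults(urls, settledResults) {
    let combined = "# Batch URL Content Fetch Results\n\n";
    settledResults.forEach((result, index) => {
        combined += `## ${index + 1}. ${urls[index]}\n\n`;
        if (result.status === "fulfilled") {
            const { title, content, metadata } = result.value;
            combined += `**Title**: ${title}\n`;
            combined += `**Method**: ${metadata.method}\n\n`;
            combined += `### Content\n\n${content}\n\n`;
        } else {
            // A rejected entry carries its error in `reason`; it is reported
            // inline so one failing URL does not abort the whole batch.
            combined += `**Error**: ${result.reason}\n\n`;
        }
        combined += "---\n\n";
    });
    return combined;
}

// Made-up sample data shaped like Promise.allSettled output.
const sampleUrls = ["https://example.com/a", "https://example.com/b"];
const sampleResults = [
    { status: "fulfilled", value: { title: "Page A", content: "Hello", metadata: { method: "stub" } } },
    { status: "rejected", reason: new Error("timeout") },
];
const report = formatBatchResults(sampleUrls, sampleResults);
console.log(report);
```

In the handler itself the results come from `Promise.allSettled(urls.map(...))`, which settles every promise instead of rejecting on the first failure, so a single unreachable URL appears as an `**Error**` entry rather than failing the whole tool call.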
package/package.json
CHANGED
@@ -1,45 +1,52 @@
 {
   "name": "mcp-web-reader",
-  "version": "2.0
+  "version": "2.2.0",
   "description": "MCP server for reading web content with Jina Reader and local parser support",
   "main": "dist/index.js",
   "bin": {
-
+    "mcp-web-reader": "./dist/index.js"
   },
   "type": "module",
   "scripts": {
-
-
-
-
+    "build": "tsc",
+    "start": "node dist/index.js",
+    "start:http": "node dist/index.js --transport=http",
+    "dev": "tsc --watch",
+    "claude-code": "node dist/index.js",
+    "postinstall": "npx playwright install chromium"
   },
   "repository": {
-
-
+    "type": "git",
+    "url": "git+https://github.com/Gracker/mcp-web-reader.git"
   },
   "bugs": {
-
+    "url": "https://github.com/Gracker/mcp-web-reader/issues"
   },
   "homepage": "https://github.com/Gracker/mcp-web-reader#readme",
-  "keywords": [
+  "keywords": [
+    "mcp",
+    "claude",
+    "web-scraping",
+    "jina-reader"
+  ],
   "author": "Gracker",
   "license": "MIT",
   "files": [
-
-
-
+    "dist",
+    "README.md",
+    "LICENSE"
   ],
   "dependencies": {
-
-
-
-
-
+    "@modelcontextprotocol/sdk": "^1.26.0",
+    "jsdom": "^24.0.0",
+    "node-fetch": "^3.3.2",
+    "playwright": "^1.40.0",
+    "turndown": "^7.1.3"
   },
   "devDependencies": {
-
-
-
-
+    "@types/jsdom": "^21.1.6",
+    "@types/node": "^20.0.0",
+    "@types/turndown": "^5.0.4",
+    "typescript": "^5.3.3"
   }
-
+}
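The `start:http` script above passes `--transport=http` to `dist/index.js`. Going by `resolveTransportMode` in the `dist/index.js` diff, that CLI flag takes precedence over the `MCP_TRANSPORT` environment variable, which in turn takes precedence over the legacy `--http` flag, with `stdio` as the default. The sketch below restates that precedence as a pure function. Passing `argv` and `env` as parameters, instead of reading `process.argv` and `process.env`, is an adaptation made here so the logic can be exercised in isolation; it is not how the package itself is written.

```javascript
// Sketch of the precedence used by resolveTransportMode in the diff.
// Assumption: argv and env are injected here purely for illustration.
function resolveTransportMode(argv, env) {
    const cliArg = argv.find((arg) => arg.startsWith("--transport="));
    const cliTransport = cliArg ? cliArg.split("=", 2)[1] : undefined;
    const legacyHttpFlag = argv.includes("--http");
    // Order: --transport=<mode>, then MCP_TRANSPORT, then --http, then "stdio".
    const mode = (cliTransport ?? env.MCP_TRANSPORT ?? (legacyHttpFlag ? "http" : "stdio"))
        .toLowerCase();
    if (mode === "stdio" || mode === "http") {
        return mode;
    }
    throw new Error(`Unsupported transport mode: ${mode}. Use 'stdio' or 'http'.`);
}

// What `npm run start:http` effectively resolves to.
console.log(resolveTransportMode(["node", "dist/index.js", "--transport=http"], {}));
// No flag and no environment variable falls back to the default.
console.log(resolveTransportMode(["node", "dist/index.js"], {}));
// The environment variable outranks the legacy --http flag.
console.log(resolveTransportMode(["node", "dist/index.js", "--http"], { MCP_TRANSPORT: "stdio" }));
```

One consequence of this ordering is that a stale `MCP_TRANSPORT=stdio` in the environment silently overrides `--http`, while `--transport=http` always wins; the `start:http` script uses the latter form.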