mcp-web-reader 2.0.2 → 2.2.0

Files changed (3)
  1. package/README.md +49 -30
  2. package/dist/index.js +495 -267
  3. package/package.json +30 -23
package/README.md CHANGED
@@ -1,39 +1,46 @@
1
1
  # MCP Web Reader
2
2
 
3
- A powerful MCP (Model Context Protocol) server that enables Claude and other LLMs to read and parse web content. Supports bypassing access restrictions to easily fetch protected content like WeChat articles and paywalled sites.
3
+ A powerful MCP (Model Context Protocol) server that enables Claude and other LLMs to read and parse web content. Bypasses access restrictions for WeChat articles, paywalled sites, and Cloudflare-protected pages.
4
+
5
+ [简体中文](./README_CN.md)
4
6
 
5
7
  ## Features
6
8
 
7
- - 🚀 **Multi-engine support**: Jina Reader API, local parser, and Playwright browser
8
- - 🔄 **Intelligent fallback**: Auto-switches from Jina → Local → Playwright browser
9
- - 🌐 **Bypass restrictions**: Handles Cloudflare, CAPTCHAs, and access controls
9
+ - 🚀 **Multi-engine**: Jina Reader API, local parser, and Playwright browser
10
+ - 🔄 **Smart fallback**: Auto-switches Jina → Local → Playwright browser
11
+ - 🌐 **Bypass restrictions**: Cloudflare, CAPTCHAs, access controls
10
12
  - 📦 **Batch processing**: Fetch multiple URLs simultaneously
11
- - 🎯 **Flexible control**: Force specific parsing methods when needed
12
- - 📝 **Markdown output**: Automatic conversion to clean Markdown format
13
+ - 📝 **Markdown output**: Automatic conversion to clean Markdown
14
+ - 🔌 **Transport compatibility**: stdio + Streamable HTTP (optional legacy SSE compatibility mode)
13
15
 
14
16
  ## Installation
15
17
 
16
- ### Quick Install (Recommended)
17
-
18
18
  ```bash
19
19
  npm install -g mcp-web-reader
20
20
  ```
21
21
 
22
- ### Install from Source
22
+ > **Note**: Chromium browser (~100-200MB) will be automatically downloaded. This is required for:
23
+ > - WeChat articles (need browser rendering)
24
+ > - Cloudflare-protected sites
25
+ > - JavaScript-heavy sites
26
+ > - CAPTCHA/access restrictions
27
+
28
+ Download may take 1-5 minutes depending on network speed.
29
+
30
+ ### From Source
23
31
 
24
32
  ```bash
25
33
  git clone https://github.com/Gracker/mcp-web-reader.git
26
34
  cd mcp-web-reader
27
35
  npm install
28
36
  npm run build
29
- npx playwright install chromium
30
37
  ```
31
38
 
32
39
  ## Configuration
33
40
 
34
41
  ### Claude Desktop
35
42
 
36
- Add to your Claude Desktop config file:
43
+ Add to your config file:
37
44
 
38
45
  **Windows**: `%APPDATA%\Claude\claude_desktop_config.json`
39
46
  **macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`
@@ -48,36 +55,49 @@ Add to your Claude Desktop config file:
48
55
  }
49
56
  ```
50
57
 
51
- ### Claude Code (Terminal)
52
-
53
- For Claude Code users, add the MCP server using the command line:
58
+ ### Claude Code
54
59
 
55
60
  ```bash
56
61
  claude mcp add web-reader -- mcp-web-reader
62
+ claude mcp list
57
63
  ```
58
64
 
59
- To verify the server is configured:
65
+ ### Streamable HTTP (Remote Deployment)
66
+
67
+ Start server in Streamable HTTP mode:
68
+
60
69
  ```bash
61
- claude mcp list
70
+ MCP_TRANSPORT=http MCP_HTTP_HOST=0.0.0.0 MCP_HTTP_PORT=3000 npm run start:http
62
71
  ```
63
72
 
64
- ## Usage
73
+ Optional environment variables:
65
74
 
66
- ### In Claude
75
+ - `MCP_HTTP_PATH` (default: `/mcp`)
76
+ - `MCP_ENABLE_LEGACY_SSE=true` to expose deprecated `/sse` + `/messages` endpoints
67
77
 
68
- After configuration, use natural language commands:
78
+ Codex MCP config (HTTP):
79
+
80
+ ```toml
81
+ [mcp_servers.web-reader]
82
+ type = "http"
83
+ url = "https://your-domain.com/mcp"
84
+ bearer_token_env_var = "WEB_READER_TOKEN"
85
+ ```
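The environment variables listed above could be resolved into one settings object along the following lines. This is a minimal sketch, not the package's actual startup code: the name `resolveHttpConfig` and the fallback values for host and port (`127.0.0.1`, `3000`) are assumptions. Only the variable names and the `/mcp` path default come from the README.

```javascript
// Hypothetical helper: turn MCP_* environment variables into a settings object.
// Variable names and the "/mcp" default come from the README; the host and
// port fallbacks are assumptions made for this sketch.
function resolveHttpConfig(env) {
  const port = Number.parseInt(env.MCP_HTTP_PORT ?? "3000", 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid MCP_HTTP_PORT: ${env.MCP_HTTP_PORT}`);
  }
  return {
    // Anything other than "http" is treated as the stdio transport here.
    transport: env.MCP_TRANSPORT === "http" ? "http" : "stdio",
    host: env.MCP_HTTP_HOST ?? "127.0.0.1",
    port,
    path: env.MCP_HTTP_PATH ?? "/mcp",
    // Legacy SSE endpoints are opt-in: only the exact string "true" enables them.
    legacySse: env.MCP_ENABLE_LEGACY_SSE === "true",
  };
}

const config = resolveHttpConfig({
  MCP_TRANSPORT: "http",
  MCP_HTTP_HOST: "0.0.0.0",
  MCP_HTTP_PORT: "3000",
});
console.log(config);
```

Passing the environment in as an argument, rather than reading `process.env` inside the function, keeps the helper easy to exercise with different settings.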
69
86
 
87
+ ## Usage
88
+
89
+ In Claude:
70
90
  - "Fetch content from https://example.com"
71
- - "Get content using browser for https://mp.weixin.qq.com/..." (for restricted sites)
91
+ - "Get content using browser for https://mp.weixin.qq.com/..."
72
92
  - "Fetch multiple URLs: [url1, url2, url3]"
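The batch request in the last bullet maps onto the `fetch_multiple_urls` tool, whose schema caps `urls` at 10 entries and defaults `preferJina` to `true`. A client could check its arguments before calling the tool. The sketch below is one hypothetical way to do that: `checkBatchArgs` and `isHttpUrl` are illustrative names, not part of the package, and the http/https check is inferred from the tool's description and error messages.

```javascript
// Hypothetical pre-flight check for fetch_multiple_urls arguments.
// The 10-URL cap mirrors maxItems in the tool schema; the function itself
// is illustrative and not part of mcp-web-reader.
const MAX_URLS = 10;

function isHttpUrl(value) {
  try {
    const { protocol } = new URL(value);
    return protocol === "http:" || protocol === "https:";
  } catch {
    return false; // new URL() throws on unparseable input
  }
}

function checkBatchArgs({ urls, preferJina = true }) {
  if (!Array.isArray(urls) || urls.length === 0) {
    return { ok: false, reason: "urls must be a non-empty array" };
  }
  if (urls.length > MAX_URLS) {
    return { ok: false, reason: `at most ${MAX_URLS} URLs per call` };
  }
  const invalid = urls.filter((u) => !isHttpUrl(u));
  if (invalid.length > 0) {
    return { ok: false, reason: `invalid URLs: ${invalid.join(", ")}` };
  }
  return { ok: true, args: { urls, preferJina } };
}

console.log(checkBatchArgs({ urls: ["https://example.com", "ftp://example.com"] }));
```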
73
93
 
74
94
  ## Supported Sites
75
95
 
76
- - **WeChat articles** - Automatic access bypass
77
- - **Paywalled sites** - NYT, Time Magazine, etc.
78
- - **Cloudflare protected sites**
79
- - **JavaScript-heavy sites**
80
- - **CAPTCHA protected sites**
96
+ - WeChat articles (mp.weixin.qq.com)
97
+ - Paywalled sites (NYT, Time Magazine, etc.)
98
+ - Cloudflare-protected sites
99
+ - JavaScript-heavy sites
100
+ - CAPTCHA-protected sites
81
101
 
82
102
  ## Tools
83
103
 
@@ -89,12 +109,12 @@ After configuration, use natural language commands:
89
109
 
90
110
  ## Architecture
91
111
 
92
- Intelligent fallback strategy:
112
+ Intelligent fallback:
93
113
  ```
94
114
  URL Request → Jina Reader → Local Parser → Playwright Browser
95
115
  ```
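The chain in the diagram can be sketched as a loop over fetchers that stops at the first success. This is a simplified illustration under stated assumptions: `fetchWithFallback` and the stub strategies are hypothetical, and unlike the package, which escalates to the browser only when it detects an access restriction, this sketch moves to the next tier on any failure.

```javascript
// Illustrative tiered fallback: try each strategy in order and collect errors.
// The strategy objects are stand-ins for the Jina, local, and Playwright
// fetchers; unlike the package, this sketch escalates unconditionally.
async function fetchWithFallback(url, strategies) {
  const errors = [];
  for (const { name, run } of strategies) {
    try {
      const result = await run(url);
      return { ...result, method: name };
    } catch (err) {
      errors.push(`${name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All methods failed. ${errors.join(", ")}`);
}

// Stub fetchers so the sketch runs without network access.
const strategies = [
  { name: "jina", run: async () => { throw new Error("status 403"); } },
  { name: "local", run: async (url) => ({ title: "Example", content: `# Example\n\nfrom ${url}` }) },
  { name: "browser", run: async () => ({ title: "unused", content: "" }) },
];

fetchWithFallback("https://example.com", strategies)
  .then((page) => console.log(page.method, page.title));
```

Collecting each tier's error message before moving on means the final failure can report what went wrong at every step.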
96
116
 
97
- Auto-detects restrictions and switches to browser mode for:
117
+ Auto-detects restrictions and switches to browser for:
98
118
  - HTTP status codes: 403, 429, 503, 520-524
99
119
  - Keywords: Cloudflare, CAPTCHA, Access Denied
100
120
  - Content patterns: Security checks, human verification
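The signals in this list can be combined into a single predicate. The sketch below is illustrative only: the status codes and phrases are taken from the list above, while the name `looksRestricted` and its structure are assumptions rather than the package's own implementation.

```javascript
// Illustrative detector built from the signals listed in the README.
// Status codes and phrases come from that list; the function name and
// structure are assumptions, not the package's own implementation.
const BLOCKING_STATUS = new Set([403, 429, 503, 520, 521, 522, 523, 524]);
const BLOCKING_PHRASES = [
  "cloudflare",
  "captcha",
  "access denied",
  "security check",
  "human verification",
];

function looksRestricted(statusCode, text = "") {
  if (BLOCKING_STATUS.has(statusCode)) return true;
  // Case-insensitive substring match against the response body or error text.
  const haystack = text.toLowerCase();
  return BLOCKING_PHRASES.some((phrase) => haystack.includes(phrase));
}

console.log(looksRestricted(503));                  // true
console.log(looksRestricted(200, "Access Denied")); // true
console.log(looksRestricted(200, "Hello, world"));  // false
```

Plain substring matching is cheap but can produce false positives, for example on an ordinary article that happens to discuss CAPTCHAs.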
@@ -102,13 +122,12 @@ Auto-detects restrictions and switches to browser mode for:
102
122
  ## Development
103
123
 
104
124
  ```bash
105
- npm run dev # Development mode with auto-rebuild
125
+ npm run dev # Development with auto-rebuild
106
126
  npm run build # Build production version
107
127
  npm start # Test run
108
- npx playwright install chromium # Install browser (required)
128
+ npm run start:http # Run Streamable HTTP server
109
129
  ```
110
130
 
111
131
  ## License
112
132
 
113
133
  MIT License
114
-
package/dist/index.js CHANGED
@@ -1,32 +1,27 @@
1
1
  import { Server } from "@modelcontextprotocol/sdk/server/index.js";
2
2
  import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
3
- import { CallToolRequestSchema, ErrorCode, ListToolsRequestSchema, McpError, } from "@modelcontextprotocol/sdk/types.js";
3
+ import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";
4
+ import { SSEServerTransport } from "@modelcontextprotocol/sdk/server/sse.js";
5
+ import { createMcpExpressApp } from "@modelcontextprotocol/sdk/server/express.js";
6
+ import { CallToolRequestSchema, ErrorCode, ListToolsRequestSchema, McpError, isInitializeRequest, } from "@modelcontextprotocol/sdk/types.js";
4
7
  import fetch from "node-fetch";
5
8
  import { JSDOM } from "jsdom";
6
9
  import TurndownService from "turndown";
7
10
  import { chromium } from "playwright";
8
- // 创建服务器实例
9
- const server = new Server({
10
- name: "web-reader",
11
- version: "2.0.0",
12
- }, {
13
- capabilities: {
14
- tools: {},
15
- },
16
- });
17
- // 初始化Turndown服务(将HTML转换为Markdown)
11
+ import { randomUUID } from "node:crypto";
12
+ // Initialize Turndown service (convert HTML to Markdown)
18
13
  const turndownService = new TurndownService({
19
14
  headingStyle: "atx",
20
15
  codeBlockStyle: "fenced",
21
16
  });
22
- // 配置Turndown规则
17
+ // Configure Turndown rules
23
18
  turndownService.addRule("skipScripts", {
24
19
  filter: ["script", "style", "noscript"],
25
20
  replacement: () => "",
26
21
  });
27
- // 浏览器实例管理
22
+ // Browser instance management
28
23
  let browser = null;
29
- // 获取或创建浏览器实例
24
+ // Get or create browser instance
30
25
  async function getBrowser() {
31
26
  if (!browser) {
32
27
  browser = await chromium.launch({
@@ -34,7 +29,7 @@ async function getBrowser() {
34
29
  args: [
35
30
  '--no-sandbox',
36
31
  '--disable-dev-shm-usage',
37
- '--disable-blink-features=AutomationControlled', // 禁用自动化检测
32
+ '--disable-blink-features=AutomationControlled', // Disable automation detection
38
33
  '--disable-infobars',
39
34
  '--window-size=1920,1080',
40
35
  '--start-maximized',
@@ -43,14 +38,14 @@ async function getBrowser() {
43
38
  }
44
39
  return browser;
45
40
  }
46
- // 清理浏览器实例
41
+ // Clean up browser instance
47
42
  async function closeBrowser() {
48
43
  if (browser) {
49
44
  await browser.close();
50
45
  browser = null;
51
46
  }
52
47
  }
53
- // URL验证函数
48
+ // URL validation function
54
49
  function isValidUrl(urlString) {
55
50
  try {
56
51
  const url = new URL(urlString);
@@ -60,18 +55,18 @@ function isValidUrl(urlString) {
60
55
  return false;
61
56
  }
62
57
  }
63
- // 检测是否是微信文章链接
58
+ // Check if it's a WeChat article link
64
59
  function isWeixinUrl(url) {
65
60
  return url.includes('mp.weixin.qq.com') || url.includes('weixin.qq.com');
66
61
  }
67
- // 检测是否需要使用浏览器模式
62
+ // Check if browser mode is needed
68
63
  function shouldUseBrowser(error, statusCode, content) {
69
64
  const errorMessage = error.message.toLowerCase();
70
- // 基于HTTP状态码判断
65
+ // Based on HTTP status codes
71
66
  if (statusCode && [403, 429, 503, 520, 521, 522, 523, 524].includes(statusCode)) {
72
67
  return true;
73
68
  }
74
- // 基于错误消息判断
69
+ // Based on error messages
75
70
  const browserTriggers = [
76
71
  'cloudflare',
77
72
  'access denied',
@@ -83,13 +78,13 @@ function shouldUseBrowser(error, statusCode, content) {
83
78
  'blocked',
84
79
  'protection',
85
80
  'verification required',
86
- '环境异常',
87
- '验证'
81
+ 'environment anomaly',
82
+ 'verify'
88
83
  ];
89
84
  if (browserTriggers.some(trigger => errorMessage.includes(trigger))) {
90
85
  return true;
91
86
  }
92
- // 基于响应内容判断
87
+ // Based on response content
93
88
  if (content) {
94
89
  const contentLower = content.toLowerCase();
95
90
  const contentTriggers = [
@@ -99,10 +94,10 @@ function shouldUseBrowser(error, statusCode, content) {
99
94
  'security check',
100
95
  'human verification',
101
96
  'captcha',
102
- // 微信特有的验证关键词
103
- '环境异常',
104
- '去验证',
105
- '完成验证后即可继续访问',
97
+ // WeChat-specific verification keywords
98
+ 'environment anomaly',
99
+ 'verify',
100
+ 'complete verification to continue',
106
101
  'verify'
107
102
  ];
108
103
  if (contentTriggers.some(trigger => contentLower.includes(trigger))) {
@@ -111,12 +106,12 @@ function shouldUseBrowser(error, statusCode, content) {
111
106
  }
112
107
  return false;
113
108
  }
114
- // 使用Jina Reader获取内容
109
+ // Fetch content using Jina Reader
115
110
  async function fetchWithJinaReader(url) {
116
111
  try {
117
112
  // Jina Reader API URL
118
113
  const jinaUrl = `https://r.jina.ai/${url}`;
119
- // 创建超时控制器
114
+ // Create timeout controller
120
115
  const controller = new AbortController();
121
116
  const timeoutId = setTimeout(() => controller.abort(), 30000);
122
117
  const response = await fetch(jinaUrl, {
@@ -131,9 +126,9 @@ async function fetchWithJinaReader(url) {
131
126
  throw new Error(`Jina Reader API error! status: ${response.status}`);
132
127
  }
133
128
  const markdown = await response.text();
134
- // Markdown中提取标题(通常是第一个#标题)
129
+ // Extract title from Markdown (usually the first # heading)
135
130
  const titleMatch = markdown.match(/^#\s+(.+)$/m);
136
- const title = titleMatch ? titleMatch[1] : "无标题";
131
+ const title = titleMatch ? titleMatch[1] : "No title";
137
132
  return {
138
133
  title,
139
134
  content: markdown,
@@ -148,21 +143,21 @@ async function fetchWithJinaReader(url) {
148
143
  catch (error) {
149
144
  if (error instanceof Error) {
150
145
  if (error.name === 'AbortError') {
151
- throw new Error(`Jina Reader请求超时(30秒)`);
146
+ throw new Error(`Jina Reader request timeout (30s)`);
152
147
  }
153
- throw new Error(`Jina Reader获取失败: ${error.message}`);
148
+ throw new Error(`Jina Reader fetch failed: ${error.message}`);
154
149
  }
155
- throw new Error(`Jina Reader获取失败: ${String(error)}`);
150
+ throw new Error(`Jina Reader fetch failed: ${String(error)}`);
156
151
  }
157
152
  }
158
- // 使用Playwright获取网页内容
153
+ // Fetch web content using Playwright
159
154
  async function fetchWithPlaywright(url) {
160
155
  let page = null;
161
156
  const isWeixin = isWeixinUrl(url);
162
157
  try {
163
158
  const browserInstance = await getBrowser();
164
159
  page = await browserInstance.newPage();
165
- // 设置真实的 User-Agent(模拟 Chrome on Mac
160
+ // Set real User-Agent (simulate Chrome on Mac)
166
161
  const userAgent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36';
167
162
  await page.setExtraHTTPHeaders({
168
163
  'User-Agent': userAgent,
@@ -174,7 +169,7 @@ async function fetchWithPlaywright(url) {
174
169
  ...(isWeixin ? { 'Referer': 'https://mp.weixin.qq.com/' } : {}),
175
170
  });
176
171
  await page.setViewportSize({ width: 1920, height: 1080 });
177
- // 微信文章需要加载样式以正确渲染,其他网站可以过滤
172
+ // WeChat articles need to load styles for correct rendering, filter for other sites
178
173
  if (!isWeixin) {
179
174
  await page.route('**/*', (route) => {
180
175
  const resourceType = route.request().resourceType();
@@ -186,30 +181,30 @@ async function fetchWithPlaywright(url) {
186
181
  }
187
182
  });
188
183
  }
189
- // 导航到页面,设置更长的超时时间
184
+ // Navigate to page with longer timeout
190
185
  await page.goto(url, {
191
186
  timeout: 45000,
192
- waitUntil: 'networkidle' // 等待网络空闲,确保 JS 完全执行
187
+ waitUntil: 'networkidle' // Wait for network idle to ensure JS execution
193
188
  });
194
- // 微信文章需要更长的等待时间
189
+ // WeChat articles need longer wait time
195
190
  const waitTime = isWeixin ? 5000 : 2000;
196
191
  await page.waitForTimeout(waitTime);
197
- // 获取页面标题
198
- const title = await page.title() || "无标题";
199
- // 移除不需要的元素
192
+ // Get page title
193
+ const title = await page.title() || "No title";
194
+ // Remove unwanted elements
200
195
  await page.evaluate(() => {
201
196
  const elementsToRemove = document.querySelectorAll('script, style, nav, header, footer, aside, .advertisement, .ads, .sidebar, .comments, .social-share');
202
197
  elementsToRemove.forEach(el => el.remove());
203
198
  });
204
- // 获取主要内容(微信文章有特定的 DOM 结构)
199
+ // Get main content (WeChat articles have specific DOM structure)
205
200
  const htmlContent = await page.evaluate(() => {
206
- // 微信文章特定选择器
201
+ // WeChat article specific selectors
207
202
  const weixinContent = document.querySelector('#js_content') ||
208
203
  document.querySelector('.rich_media_content');
209
204
  if (weixinContent) {
210
205
  return weixinContent.innerHTML;
211
206
  }
212
- // 通用选择器
207
+ // Common selectors
213
208
  const mainContent = document.querySelector('main') ||
214
209
  document.querySelector('article') ||
215
210
  document.querySelector('[role="main"]') ||
@@ -220,9 +215,9 @@ async function fetchWithPlaywright(url) {
220
215
  document.body;
221
216
  return mainContent ? mainContent.innerHTML : document.body.innerHTML;
222
217
  });
223
- // 转换为Markdown
218
+ // Convert to Markdown
224
219
  const markdown = turndownService.turndown(htmlContent);
225
- // 清理内容
220
+ // Clean content
226
221
  const cleanedContent = markdown
227
222
  .replace(/\n{3,}/g, "\n\n")
228
223
  .replace(/^\s+$/gm, "")
@@ -240,9 +235,9 @@ async function fetchWithPlaywright(url) {
240
235
  }
241
236
  catch (error) {
242
237
  if (error instanceof Error) {
243
- throw new Error(`Playwright获取失败: ${error.message}`);
238
+ throw new Error(`Playwright fetch failed: ${error.message}`);
244
239
  }
245
- throw new Error(`Playwright获取失败: ${String(error)}`);
240
+ throw new Error(`Playwright fetch failed: ${String(error)}`);
246
241
  }
247
242
  finally {
248
243
  if (page) {
@@ -250,13 +245,13 @@ async function fetchWithPlaywright(url) {
250
245
  }
251
246
  }
252
247
  }
253
- // 本地提取网页内容的函数
248
+ // Local web content extraction function
254
249
  async function fetchWithLocalParser(url) {
255
250
  try {
256
- // 创建超时控制器
251
+ // Create timeout controller
257
252
  const controller = new AbortController();
258
253
  const timeoutId = setTimeout(() => controller.abort(), 30000);
259
- // 发送HTTP请求
254
+ // Send HTTP request
260
255
  const response = await fetch(url, {
261
256
  headers: {
262
257
  "User-Agent": "Mozilla/5.0 (compatible; MCP-URLFetcher/2.0)",
@@ -267,17 +262,17 @@ async function fetchWithLocalParser(url) {
267
262
  if (!response.ok) {
268
263
  throw new Error(`HTTP error! status: ${response.status}`);
269
264
  }
270
- // 获取HTML内容
265
+ // Get HTML content
271
266
  const html = await response.text();
272
- // 使用JSDOM解析HTML
267
+ // Parse HTML with JSDOM
273
268
  const dom = new JSDOM(html);
274
269
  const document = dom.window.document;
275
- // 获取标题
276
- const title = document.querySelector("title")?.textContent || "无标题";
277
- // 移除不需要的元素
270
+ // Get title
271
+ const title = document.querySelector("title")?.textContent || "No title";
272
+ // Remove unwanted elements
278
273
  const elementsToRemove = document.querySelectorAll("script, style, nav, header, footer, aside, .advertisement, .ads, .sidebar, .comments");
279
274
  elementsToRemove.forEach(el => el.remove());
280
- // 获取主要内容区域
275
+ // Get main content area
281
276
  const mainContent = document.querySelector("main") ||
282
277
  document.querySelector("article") ||
283
278
  document.querySelector('[role="main"]') ||
@@ -286,9 +281,9 @@ async function fetchWithLocalParser(url) {
286
281
  document.querySelector(".post") ||
287
282
  document.querySelector(".entry-content") ||
288
283
  document.body;
289
- // 转换为Markdown
284
+ // Convert to Markdown
290
285
  const markdown = turndownService.turndown(mainContent.innerHTML);
291
- // 清理多余的空行和空格
286
+ // Clean extra whitespace
292
287
  const cleanedContent = markdown
293
288
  .replace(/\n{3,}/g, "\n\n")
294
289
  .replace(/^\s+$/gm, "")
@@ -307,62 +302,62 @@ async function fetchWithLocalParser(url) {
307
302
  catch (error) {
308
303
  if (error instanceof Error) {
309
304
  if (error.name === 'AbortError') {
310
- throw new Error(`本地解析请求超时(30秒)`);
305
+ throw new Error(`Local parser request timeout (30s)`);
311
306
  }
312
- throw new Error(`本地解析失败: ${error.message}`);
307
+ throw new Error(`Local parser failed: ${error.message}`);
313
308
  }
314
- throw new Error(`本地解析失败: ${String(error)}`);
309
+ throw new Error(`Local parser failed: ${String(error)}`);
315
310
  }
316
311
  }
317
- // 智能获取网页内容(三层降级策略:Jina → 本地 → Playwright
318
- // 对于微信等已知需要浏览器的网站,直接使用浏览器模式
312
+ // Smart web content fetching (three-tier fallback: Jina → Local → Playwright)
313
+ // For known sites requiring browser (like WeChat), use browser mode directly
319
314
  async function fetchWebContent(url, preferJina = true) {
320
- // 微信文章直接使用浏览器模式,因为其他方式无法绕过验证
315
+ // WeChat articles use browser mode directly as other methods cannot bypass verification
321
316
  if (isWeixinUrl(url)) {
322
- console.error("检测到微信文章,直接使用Playwright浏览器模式");
317
+ console.error("Detected WeChat article, using Playwright browser mode");
323
318
  return await fetchWithPlaywright(url);
324
319
  }
325
320
  if (preferJina) {
326
- // 第一层:尝试Jina Reader
321
+ // Tier 1: Try Jina Reader
327
322
  try {
328
323
  return await fetchWithJinaReader(url);
329
324
  }
330
325
  catch (jinaError) {
331
- console.error("Jina Reader失败,尝试本地解析:", jinaError instanceof Error ? jinaError.message : String(jinaError));
332
- // 第二层:尝试本地解析
326
+ console.error("Jina Reader failed, trying local parser:", jinaError instanceof Error ? jinaError.message : String(jinaError));
327
+ // Tier 2: Try local parser
333
328
  try {
334
329
  return await fetchWithLocalParser(url);
335
330
  }
336
331
  catch (localError) {
337
- console.error("本地解析失败,检查是否需要浏览器模式:", localError instanceof Error ? localError.message : String(localError));
338
- // 判断是否需要使用浏览器模式
332
+ console.error("Local parser failed, checking if browser mode needed:", localError instanceof Error ? localError.message : String(localError));
333
+ // Check if browser mode is needed
339
334
  const jinaErr = jinaError instanceof Error ? jinaError : new Error(String(jinaError));
340
335
  const localErr = localError instanceof Error ? localError : new Error(String(localError));
341
336
  if (shouldUseBrowser(jinaErr) || shouldUseBrowser(localErr)) {
342
- console.error("检测到访问限制,使用Playwright浏览器模式");
337
+ console.error("Detected access restrictions, using Playwright browser mode");
343
338
  try {
344
- // 第三层:使用Playwright浏览器
339
+ // Tier 3: Use Playwright browser
345
340
  return await fetchWithPlaywright(url);
346
341
  }
347
342
  catch (browserError) {
348
- throw new Error(`所有方法都失败了。Jina: ${jinaErr.message}, 本地: ${localErr.message}, 浏览器: ${browserError instanceof Error ? browserError.message : String(browserError)}`);
343
+ throw new Error(`All methods failed. Jina: ${jinaErr.message}, Local: ${localErr.message}, Browser: ${browserError instanceof Error ? browserError.message : String(browserError)}`);
349
344
  }
350
345
  }
351
346
  else {
352
- throw new Error(`Jina和本地解析都失败了。Jina: ${jinaErr.message}, 本地: ${localErr.message}`);
347
+ throw new Error(`Jina and local parser both failed. Jina: ${jinaErr.message}, Local: ${localErr.message}`);
353
348
  }
354
349
  }
355
350
  }
356
351
  }
357
352
  else {
358
- // 如果不优先使用Jina,直接从本地解析开始
353
+ // If not prioritizing Jina, start with local parser
359
354
  try {
360
355
  return await fetchWithLocalParser(url);
361
356
  }
362
357
  catch (localError) {
363
358
  const localErr = localError instanceof Error ? localError : new Error(String(localError));
364
359
  if (shouldUseBrowser(localErr)) {
365
- console.error("本地解析失败,检测到访问限制,使用Playwright浏览器模式");
360
+ console.error("Local parser failed, detected access restrictions, using Playwright browser mode");
366
361
  return await fetchWithPlaywright(url);
367
362
  }
368
363
  else {
@@ -371,228 +366,461 @@ async function fetchWebContent(url, preferJina = true) {
371
366
  }
372
367
  }
373
368
  }
374
- // 处理工具列表请求
375
- server.setRequestHandler(ListToolsRequestSchema, async () => {
376
- return {
377
- tools: [
378
- {
379
- name: "fetch_url",
380
- description: "获取指定URL的网页内容,并转换为Markdown格式。默认使用Jina Reader,失败时自动切换到本地解析",
381
- inputSchema: {
382
- type: "object",
383
- properties: {
384
- url: {
385
- type: "string",
386
- description: "要获取内容的网页URL(必须是http或https协议)",
387
- },
388
- preferJina: {
389
- type: "boolean",
390
- description: "是否优先使用Jina Reader(默认为true)",
391
- default: true,
369
+ const streamableSessions = new Map();
370
+ const legacySseSessions = new Map();
371
+ function createServerInstance() {
372
+ const server = new Server({
373
+ name: "web-reader",
374
+ version: "2.1.0",
375
+ }, {
376
+ capabilities: {
377
+ tools: {},
378
+ },
379
+ });
380
+ registerServerHandlers(server);
381
+ return server;
382
+ }
383
+ function registerServerHandlers(server) {
384
+ // Handle tool list requests
385
+ server.setRequestHandler(ListToolsRequestSchema, async () => {
386
+ return {
387
+ tools: [
388
+ {
389
+ name: "fetch_url",
390
+ description: "Fetch web content from specified URL and convert to Markdown format. Uses Jina Reader by default, automatically falls back to local parser on failure",
391
+ inputSchema: {
392
+ type: "object",
393
+ properties: {
394
+ url: {
395
+ type: "string",
396
+ description: "Webpage URL to fetch (must be http or https protocol)",
397
+ },
398
+ preferJina: {
399
+ type: "boolean",
400
+ description: "Whether to prioritize Jina Reader (default: true)",
401
+ default: true,
402
+ },
392
403
  },
404
+ required: ["url"],
393
405
  },
394
- required: ["url"],
395
406
  },
396
- },
397
- {
398
- name: "fetch_multiple_urls",
399
- description: "批量获取多个URL的网页内容",
400
- inputSchema: {
401
- type: "object",
402
- properties: {
403
- urls: {
404
- type: "array",
405
- items: {
406
- type: "string",
407
+ {
408
+ name: "fetch_multiple_urls",
409
+ description: "Batch fetch web content from multiple URLs",
410
+ inputSchema: {
411
+ type: "object",
412
+ properties: {
413
+ urls: {
414
+ type: "array",
415
+ items: {
416
+ type: "string",
417
+ },
418
+ description: "List of webpage URLs to fetch",
419
+ maxItems: 10, // Limit to 10 URLs
420
+ },
421
+ preferJina: {
422
+ type: "boolean",
423
+ description: "Whether to prioritize Jina Reader (default: true)",
424
+ default: true,
407
425
  },
408
- description: "要获取内容的网页URL列表",
409
- maxItems: 10, // 限制最多10个URL
410
- },
411
- preferJina: {
412
- type: "boolean",
413
- description: "是否优先使用Jina Reader(默认为true)",
414
- default: true,
415
426
  },
427
+ required: ["urls"],
416
428
  },
417
- required: ["urls"],
418
429
  },
419
- },
420
- {
421
- name: "fetch_url_with_jina",
422
- description: "强制使用Jina Reader获取网页内容(适用于复杂网页)",
423
- inputSchema: {
424
- type: "object",
425
- properties: {
426
- url: {
427
- type: "string",
428
- description: "要获取内容的网页URL",
430
+ {
431
+ name: "fetch_url_with_jina",
432
+ description: "Force fetch using Jina Reader (suitable for complex webpages)",
433
+ inputSchema: {
434
+ type: "object",
435
+ properties: {
436
+ url: {
437
+ type: "string",
438
+ description: "Webpage URL to fetch",
439
+ },
429
440
  },
441
+ required: ["url"],
430
442
  },
431
- required: ["url"],
432
443
  },
433
- },
434
- {
435
- name: "fetch_url_local",
436
- description: "强制使用本地解析器获取网页内容(适用于简单网页或Jina不可用时)",
437
- inputSchema: {
438
- type: "object",
439
- properties: {
440
- url: {
441
- type: "string",
442
- description: "要获取内容的网页URL",
444
+ {
445
+ name: "fetch_url_local",
446
+ description: "Force fetch using local parser (suitable for simple webpages or when Jina is unavailable)",
447
+ inputSchema: {
448
+ type: "object",
449
+ properties: {
450
+ url: {
451
+ type: "string",
452
+ description: "Webpage URL to fetch",
453
+ },
443
454
  },
455
+ required: ["url"],
444
456
  },
445
- required: ["url"],
446
457
  },
447
- },
448
- {
449
- name: "fetch_url_with_browser",
450
- description: "强制使用Playwright浏览器获取网页内容(适用于有访问限制的网站,如Cloudflare保护、验证码等)",
451
- inputSchema: {
452
- type: "object",
453
- properties: {
454
- url: {
455
- type: "string",
456
- description: "要获取内容的网页URL",
458
+ {
459
+ name: "fetch_url_with_browser",
460
+ description: "Force fetch using Playwright browser (suitable for websites with access restrictions, such as Cloudflare protection, CAPTCHA, etc.)",
461
+ inputSchema: {
462
+ type: "object",
463
+ properties: {
464
+ url: {
465
+ type: "string",
466
+ description: "Webpage URL to fetch",
467
+ },
457
468
  },
469
+ required: ["url"],
458
470
  },
459
- required: ["url"],
460
471
  },
461
- },
462
- ],
463
- };
464
- });
465
- // 处理工具调用请求
466
- server.setRequestHandler(CallToolRequestSchema, async (request) => {
467
- const { name, arguments: args } = request.params;
468
- try {
469
- if (name === "fetch_url") {
470
- const { url, preferJina = true } = args;
471
- // 验证URL
472
- if (!isValidUrl(url)) {
473
- throw new McpError(ErrorCode.InvalidParams, "无效的URL格式,请提供http或https协议的URL");
474
- }
475
- // 获取网页内容
476
- const result = await fetchWebContent(url, preferJina);
477
- return {
478
- content: [
479
- {
480
- type: "text",
481
- text: `# ${result.title}\n\n**URL**: ${result.metadata.url}\n**获取时间**: ${result.metadata.fetchedAt}\n**内容长度**: ${result.metadata.contentLength} 字符\n**解析方法**: ${result.metadata.method}\n\n---\n\n${result.content}`,
482
- },
483
- ],
484
- };
485
- }
486
- else if (name === "fetch_url_with_jina") {
487
- const { url } = args;
488
- if (!isValidUrl(url)) {
489
- throw new McpError(ErrorCode.InvalidParams, "无效的URL格式");
472
+ ],
473
+ };
474
+ });
475
+ // Handle tool call requests
476
+ server.setRequestHandler(CallToolRequestSchema, async (request) => {
477
+ const { name, arguments: args } = request.params;
478
+ try {
479
+ if (name === "fetch_url") {
480
+ const { url, preferJina = true } = args;
481
+ // Validate URL
482
+ if (!isValidUrl(url)) {
483
+ throw new McpError(ErrorCode.InvalidParams, "Invalid URL format, please provide http or https protocol URL");
484
+ }
485
+ // Fetch web content
486
+ const result = await fetchWebContent(url, preferJina);
487
+ return {
488
+ content: [
489
+ {
490
+ type: "text",
491
+ text: `# ${result.title}\n\n**URL**: ${result.metadata.url}\n**Fetched At**: ${result.metadata.fetchedAt}\n**Content Length**: ${result.metadata.contentLength} characters\n**Method**: ${result.metadata.method}\n\n---\n\n${result.content}`,
492
+ },
493
+ ],
494
+ };
490
495
  }
491
- const result = await fetchWithJinaReader(url);
492
- return {
493
- content: [
494
- {
495
- type: "text",
496
- text: `# ${result.title}\n\n**URL**: ${result.metadata.url}\n**获取时间**: ${result.metadata.fetchedAt}\n**内容长度**: ${result.metadata.contentLength} 字符\n**解析方法**: Jina Reader\n\n---\n\n${result.content}`,
497
- },
498
- ],
499
- };
500
- }
501
- else if (name === "fetch_url_local") {
502
- const { url } = args;
503
- if (!isValidUrl(url)) {
504
- throw new McpError(ErrorCode.InvalidParams, "无效的URL格式");
496
+ else if (name === "fetch_url_with_jina") {
497
+ const { url } = args;
498
+ if (!isValidUrl(url)) {
499
+ throw new McpError(ErrorCode.InvalidParams, "Invalid URL format");
500
+ }
501
+ const result = await fetchWithJinaReader(url);
502
+ return {
503
+ content: [
504
+ {
505
+ type: "text",
506
+ text: `# ${result.title}\n\n**URL**: ${result.metadata.url}\n**Fetched At**: ${result.metadata.fetchedAt}\n**Content Length**: ${result.metadata.contentLength} characters\n**Method**: Jina Reader\n\n---\n\n${result.content}`,
507
+ },
508
+ ],
509
+ };
505
510
  }
506
- const result = await fetchWithLocalParser(url);
507
- return {
508
- content: [
509
- {
510
- type: "text",
511
- text: `# ${result.title}\n\n**URL**: ${result.metadata.url}\n**获取时间**: ${result.metadata.fetchedAt}\n**内容长度**: ${result.metadata.contentLength} 字符\n**解析方法**: 本地解析器\n\n---\n\n${result.content}`,
512
- },
513
- ],
514
- };
515
- }
516
- else if (name === "fetch_multiple_urls") {
517
- const { urls, preferJina = true } = args;
518
- // 验证所有URL
519
- const invalidUrls = urls.filter(url => !isValidUrl(url));
520
- if (invalidUrls.length > 0) {
521
- throw new McpError(ErrorCode.InvalidParams, `以下URL格式无效: ${invalidUrls.join(", ")}`);
511
+ else if (name === "fetch_url_local") {
512
+ const { url } = args;
513
+ if (!isValidUrl(url)) {
514
+ throw new McpError(ErrorCode.InvalidParams, "Invalid URL format");
515
+ }
516
+ const result = await fetchWithLocalParser(url);
517
+ return {
518
+ content: [
519
+ {
520
+ type: "text",
521
+ text: `# ${result.title}\n\n**URL**: ${result.metadata.url}\n**Fetched At**: ${result.metadata.fetchedAt}\n**Content Length**: ${result.metadata.contentLength} characters\n**Method**: Local Parser\n\n---\n\n${result.content}`,
522
+ },
523
+ ],
524
+ };
522
525
  }
523
- // 并发获取所有URL内容
524
- const results = await Promise.allSettled(urls.map(url => fetchWebContent(url, preferJina)));
525
- // 整理结果
526
- let combinedContent = "# 批量URL内容获取结果\n\n";
527
- results.forEach((result, index) => {
528
- const url = urls[index];
529
- combinedContent += `## ${index + 1}. ${url}\n\n`;
530
- if (result.status === "fulfilled") {
531
- const { title, content, metadata } = result.value;
532
- combinedContent += `**标题**: ${title}\n`;
533
- combinedContent += `**获取时间**: ${metadata.fetchedAt}\n`;
534
- combinedContent += `**内容长度**: ${metadata.contentLength} 字符\n`;
535
- combinedContent += `**解析方法**: ${metadata.method}\n\n`;
536
- combinedContent += `### 内容\n\n${content}\n\n`;
526
+ else if (name === "fetch_multiple_urls") {
527
+ const { urls, preferJina = true } = args;
528
+ // Validate all URLs
529
+ const invalidUrls = urls.filter(url => !isValidUrl(url));
530
+ if (invalidUrls.length > 0) {
531
+ throw new McpError(ErrorCode.InvalidParams, `The following URLs have invalid format: ${invalidUrls.join(", ")}`);
537
532
  }
538
- else {
539
- combinedContent += `**错误**: ${result.reason}\n\n`;
533
+ // Fetch all URLs concurrently
534
+ const results = await Promise.allSettled(urls.map(url => fetchWebContent(url, preferJina)));
535
+ // Combine results
536
+ let combinedContent = "# Batch URL Content Fetch Results\n\n";
537
+ results.forEach((result, index) => {
538
+ const url = urls[index];
539
+ combinedContent += `## ${index + 1}. ${url}\n\n`;
540
+ if (result.status === "fulfilled") {
541
+ const { title, content, metadata } = result.value;
542
+ combinedContent += `**Title**: ${title}\n`;
543
+ combinedContent += `**Fetched At**: ${metadata.fetchedAt}\n`;
544
+ combinedContent += `**Content Length**: ${metadata.contentLength} characters\n`;
545
+ combinedContent += `**Method**: ${metadata.method}\n\n`;
546
+ combinedContent += `### Content\n\n${content}\n\n`;
547
+ }
548
+ else {
549
+ combinedContent += `**Error**: ${result.reason}\n\n`;
550
+ }
551
+ combinedContent += "---\n\n";
552
+ });
553
+ return {
554
+ content: [
555
+ {
556
+ type: "text",
557
+ text: combinedContent,
558
+ },
559
+ ],
560
+ };
561
+ }
562
+ else if (name === "fetch_url_with_browser") {
563
+ const { url } = args;
564
+ if (!isValidUrl(url)) {
565
+ throw new McpError(ErrorCode.InvalidParams, "Invalid URL format");
540
566
  }
541
- combinedContent += "---\n\n";
542
- });
543
- return {
544
- content: [
545
- {
546
- type: "text",
547
- text: combinedContent,
548
- },
549
- ],
550
- };
567
+ const result = await fetchWithPlaywright(url);
568
+ return {
569
+ content: [
570
+ {
571
+ type: "text",
572
+ text: `# ${result.title}\n\n**URL**: ${result.metadata.url}\n**Fetched At**: ${result.metadata.fetchedAt}\n**Content Length**: ${result.metadata.contentLength} characters\n**Method**: Playwright Browser\n\n---\n\n${result.content}`,
573
+ },
574
+ ],
575
+ };
576
+ }
577
+ else {
578
+ throw new McpError(ErrorCode.MethodNotFound, `Unknown tool: ${name}`);
579
+ }
551
580
  }
552
- else if (name === "fetch_url_with_browser") {
553
- const { url } = args;
554
- if (!isValidUrl(url)) {
555
- throw new McpError(ErrorCode.InvalidParams, "无效的URL格式");
581
+ catch (error) {
582
+ if (error instanceof McpError) {
583
+ throw error;
556
584
  }
557
- const result = await fetchWithPlaywright(url);
558
- return {
559
- content: [
560
- {
561
- type: "text",
562
- text: `# ${result.title}\n\n**URL**: ${result.metadata.url}\n**获取时间**: ${result.metadata.fetchedAt}\n**内容长度**: ${result.metadata.contentLength} 字符\n**解析方法**: Playwright浏览器\n\n---\n\n${result.content}`,
563
- },
564
- ],
565
- };
585
+ throw new McpError(ErrorCode.InternalError, `Tool execution failed: ${error instanceof Error ? error.message : String(error)}`);
586
+ }
587
+ });
588
+ }
589
+ function sendJsonRpcError(res, statusCode, message) {
590
+ res.status(statusCode).json({
591
+ jsonrpc: "2.0",
592
+ error: {
593
+ code: -32000,
594
+ message,
595
+ },
596
+ id: null,
597
+ });
598
+ }
599
+ function getSessionIdFromHeaders(headers) {
600
+ const value = headers["mcp-session-id"];
601
+ if (!value) {
602
+ return undefined;
603
+ }
604
+ return Array.isArray(value) ? value[0] : value;
605
+ }
606
+ function resolveTransportMode() {
607
+ const cliTransportArg = process.argv.find((arg) => arg.startsWith("--transport="));
608
+ const cliTransport = cliTransportArg ? cliTransportArg.split("=", 2)[1] : undefined;
609
+ const legacyHttpFlag = process.argv.includes("--http");
610
+ const mode = (cliTransport ?? process.env.MCP_TRANSPORT ?? (legacyHttpFlag ? "http" : "stdio"))
611
+ .toLowerCase();
612
+ if (mode === "stdio" || mode === "http") {
613
+ return mode;
614
+ }
615
+ throw new Error(`Unsupported transport mode: ${mode}. Use 'stdio' or 'http'.`);
616
+ }
617
+ function resolveLegacySseFlag() {
618
+ const envValue = (process.env.MCP_ENABLE_LEGACY_SSE ?? "").toLowerCase();
619
+ return envValue === "1" || envValue === "true" || process.argv.includes("--legacy-sse");
620
+ }
621
+ async function closeAllSessions() {
622
+ for (const [sessionId, session] of streamableSessions.entries()) {
623
+ try {
624
+ await session.server.close();
566
625
  }
567
- else {
568
- throw new McpError(ErrorCode.MethodNotFound, `未知的工具: ${name}`);
626
+ catch (error) {
627
+ console.error(`Failed to close streamable server for session ${sessionId}:`, error);
569
628
  }
570
629
  }
571
- catch (error) {
572
- if (error instanceof McpError) {
573
- throw error;
630
+ streamableSessions.clear();
631
+ for (const [sessionId, session] of legacySseSessions.entries()) {
632
+ try {
633
+ await session.server.close();
634
+ }
635
+ catch (error) {
636
+ console.error(`Failed to close SSE server for session ${sessionId}:`, error);
574
637
  }
575
- throw new McpError(ErrorCode.InternalError, `工具执行失败: ${error instanceof Error ? error.message : String(error)}`);
576
638
  }
577
- });
578
- // 启动服务器
579
- async function main() {
639
+ legacySseSessions.clear();
640
+ }
641
+ async function startStdioServer() {
642
+ const server = createServerInstance();
580
643
  const transport = new StdioServerTransport();
581
644
  await server.connect(transport);
582
- console.error("MCP Web Reader v2.0 已启动(支持Jina Reader + Playwright)");
645
+ console.error("MCP Web Reader started in stdio mode");
646
+ }
647
+ async function startHttpServer() {
648
+ const host = process.env.MCP_HTTP_HOST ?? "127.0.0.1";
649
+ const port = Number.parseInt(process.env.MCP_HTTP_PORT ?? "3000", 10);
650
+ const mcpPath = process.env.MCP_HTTP_PATH ?? "/mcp";
651
+ const enableLegacySse = resolveLegacySseFlag();
652
+ if (!Number.isInteger(port) || port <= 0 || port > 65535) {
653
+ throw new Error(`Invalid MCP_HTTP_PORT: ${process.env.MCP_HTTP_PORT}`);
654
+ }
655
+ const app = createMcpExpressApp({ host });
656
+ app.post(mcpPath, async (req, res) => {
657
+ const sessionId = getSessionIdFromHeaders(req.headers);
658
+ try {
659
+ if (sessionId) {
660
+ const existingSession = streamableSessions.get(sessionId);
661
+ if (!existingSession) {
662
+ sendJsonRpcError(res, 404, "Session not found");
663
+ return;
664
+ }
665
+ await existingSession.transport.handleRequest(req, res, req.body);
666
+ return;
667
+ }
668
+ if (!isInitializeRequest(req.body)) {
669
+ sendJsonRpcError(res, 400, "Missing session ID; initialize request required");
670
+ return;
671
+ }
672
+ let transport;
673
+ const sessionServer = createServerInstance();
674
+ transport = new StreamableHTTPServerTransport({
675
+ sessionIdGenerator: () => randomUUID(),
676
+ onsessioninitialized: (initializedSessionId) => {
677
+ streamableSessions.set(initializedSessionId, { transport, server: sessionServer });
678
+ console.error(`Streamable HTTP session initialized: ${initializedSessionId}`);
679
+ },
680
+ });
681
+ transport.onclose = () => {
682
+ const closedSessionId = transport.sessionId;
683
+ if (closedSessionId && streamableSessions.delete(closedSessionId)) {
684
+ console.error(`Streamable HTTP session closed: ${closedSessionId}`);
685
+ }
686
+ };
687
+ await sessionServer.connect(transport);
688
+ await transport.handleRequest(req, res, req.body);
689
+ }
690
+ catch (error) {
691
+ console.error("Error handling streamable HTTP POST request:", error);
692
+ if (!res.headersSent) {
693
+ sendJsonRpcError(res, 500, "Internal server error");
694
+ }
695
+ }
696
+ });
697
+ app.get(mcpPath, async (req, res) => {
698
+ const sessionId = getSessionIdFromHeaders(req.headers);
699
+ if (!sessionId) {
700
+ sendJsonRpcError(res, 400, "Missing mcp-session-id header");
701
+ return;
702
+ }
703
+ const session = streamableSessions.get(sessionId);
704
+ if (!session) {
705
+ sendJsonRpcError(res, 404, "Session not found");
706
+ return;
707
+ }
708
+ try {
709
+ await session.transport.handleRequest(req, res);
710
+ }
711
+ catch (error) {
712
+ console.error("Error handling streamable HTTP GET request:", error);
713
+ if (!res.headersSent) {
714
+ sendJsonRpcError(res, 500, "Internal server error");
715
+ }
716
+ }
717
+ });
718
+ app.delete(mcpPath, async (req, res) => {
719
+ const sessionId = getSessionIdFromHeaders(req.headers);
720
+ if (!sessionId) {
721
+ sendJsonRpcError(res, 400, "Missing mcp-session-id header");
722
+ return;
723
+ }
724
+ const session = streamableSessions.get(sessionId);
725
+ if (!session) {
726
+ sendJsonRpcError(res, 404, "Session not found");
727
+ return;
728
+ }
729
+ try {
730
+ await session.transport.handleRequest(req, res);
731
+ }
732
+ catch (error) {
733
+ console.error("Error handling streamable HTTP DELETE request:", error);
734
+ if (!res.headersSent) {
735
+ sendJsonRpcError(res, 500, "Internal server error");
736
+ }
737
+ }
738
+ });
739
+ if (enableLegacySse) {
740
+ app.get("/sse", async (_req, res) => {
741
+ const transport = new SSEServerTransport("/messages", res);
742
+ const server = createServerInstance();
743
+ legacySseSessions.set(transport.sessionId, { transport, server });
744
+ res.on("close", () => {
745
+ const removed = legacySseSessions.delete(transport.sessionId);
746
+ if (removed) {
747
+ void server.close().catch((error) => {
748
+ console.error("Failed to close legacy SSE session server:", error);
749
+ });
750
+ }
751
+ });
752
+ await server.connect(transport);
753
+ });
754
+ app.post("/messages", async (req, res) => {
755
+ const querySessionId = req.query.sessionId;
756
+ const sessionId = typeof querySessionId === "string"
757
+ ? querySessionId
758
+ : Array.isArray(querySessionId) && typeof querySessionId[0] === "string"
759
+ ? querySessionId[0]
760
+ : undefined;
761
+ if (!sessionId) {
762
+ sendJsonRpcError(res, 400, "Missing sessionId query parameter");
763
+ return;
764
+ }
765
+ const session = legacySseSessions.get(sessionId);
766
+ if (!session) {
767
+ sendJsonRpcError(res, 404, "Session not found");
768
+ return;
769
+ }
770
+ try {
771
+ await session.transport.handlePostMessage(req, res, req.body);
772
+ }
773
+ catch (error) {
774
+ console.error("Error handling legacy SSE POST request:", error);
775
+ if (!res.headersSent) {
776
+ sendJsonRpcError(res, 500, "Internal server error");
777
+ }
778
+ }
779
+ });
780
+ }
781
+ app.get("/healthz", (_req, res) => {
782
+ res.json({
783
+ status: "ok",
784
+ transport: "streamable-http",
785
+ sessions: streamableSessions.size,
786
+ legacySseEnabled: enableLegacySse,
787
+ });
788
+ });
789
+ app.listen(port, host, () => {
790
+ console.error(`MCP Web Reader started in HTTP mode on http://${host}:${port}${mcpPath}`);
791
+ if (enableLegacySse) {
792
+ console.error("Legacy SSE compatibility enabled on /sse and /messages");
793
+ }
794
+ });
583
795
  }
584
- // 优雅关闭处理
585
- process.on('SIGINT', async () => {
586
- console.error("接收到SIGINT信号,正在关闭浏览器...");
796
+ let isShuttingDown = false;
797
+ async function shutdown(signal) {
798
+ if (isShuttingDown) {
799
+ return;
800
+ }
801
+ isShuttingDown = true;
802
+ console.error(`Received ${signal}, shutting down MCP Web Reader...`);
803
+ await closeAllSessions();
587
804
  await closeBrowser();
588
805
  process.exit(0);
806
+ }
807
+ // Start server
808
+ async function main() {
809
+ const transportMode = resolveTransportMode();
810
+ if (transportMode === "http") {
811
+ await startHttpServer();
812
+ return;
813
+ }
814
+ await startStdioServer();
815
+ }
816
+ // Graceful shutdown handling
817
+ process.on("SIGINT", () => {
818
+ void shutdown("SIGINT");
589
819
  });
590
- process.on('SIGTERM', async () => {
591
- console.error("接收到SIGTERM信号,正在关闭浏览器...");
592
- await closeBrowser();
593
- process.exit(0);
820
+ process.on("SIGTERM", () => {
821
+ void shutdown("SIGTERM");
594
822
  });
595
823
  main().catch((error) => {
596
- console.error("服务器启动失败:", error);
824
+ console.error("Server startup failed:", error);
597
825
  process.exit(1);
598
826
  });
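
The session routing added to the `POST` handler in this diff can be read as a small decision table. The sketch below restates it without Express or the MCP SDK, so the branches are easy to follow. `getSessionIdFromHeaders` and the error body shape are copied from the diff; `buildJsonRpcError`, `routePost`, and the returned `action` objects are hypothetical names introduced only for illustration, not part of the package.

```javascript
// Sketch only: pure helpers mirroring the routing logic in the diff above.
// buildJsonRpcError and routePost are hypothetical names, not package APIs.

// Same logic as the diff: a repeated header may arrive as an array.
function getSessionIdFromHeaders(headers) {
  const value = headers["mcp-session-id"];
  if (!value) {
    return undefined;
  }
  return Array.isArray(value) ? value[0] : value;
}

// The JSON body that sendJsonRpcError writes in the diff.
function buildJsonRpcError(message) {
  return { jsonrpc: "2.0", error: { code: -32000, message }, id: null };
}

// Decide what the POST handler would do for a given request.
// `sessions` stands in for the streamableSessions map; `isInitialize`
// stands in for the result of isInitializeRequest(req.body).
function routePost(headers, sessions, isInitialize) {
  const sessionId = getSessionIdFromHeaders(headers);
  if (sessionId) {
    // Known session: forward to its transport. Unknown session: 404.
    return sessions.has(sessionId)
      ? { action: "forward", sessionId }
      : { action: "error", status: 404, body: buildJsonRpcError("Session not found") };
  }
  if (!isInitialize) {
    // No session header and not an initialize request: 400.
    return {
      action: "error",
      status: 400,
      body: buildJsonRpcError("Missing session ID; initialize request required"),
    };
  }
  // No session header plus an initialize request: create a new session.
  return { action: "create-session" };
}
```

Under these assumptions, a request with no `mcp-session-id` header only succeeds when it is an initialize request; every other request must carry the ID of a session that is still in the map.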
package/package.json CHANGED
@@ -1,45 +1,52 @@
1
1
  {
2
2
  "name": "mcp-web-reader",
3
- "version": "2.0.2",
3
+ "version": "2.2.0",
4
4
  "description": "MCP server for reading web content with Jina Reader and local parser support",
5
5
  "main": "dist/index.js",
6
6
  "bin": {
7
- "mcp-web-reader": "./dist/index.js"
7
+ "mcp-web-reader": "./dist/index.js"
8
8
  },
9
9
  "type": "module",
10
10
  "scripts": {
11
- "build": "tsc",
12
- "start": "node dist/index.js",
13
- "dev": "tsc --watch",
14
- "claude-code": "node dist/index.js"
11
+ "build": "tsc",
12
+ "start": "node dist/index.js",
13
+ "start:http": "node dist/index.js --transport=http",
14
+ "dev": "tsc --watch",
15
+ "claude-code": "node dist/index.js",
16
+ "postinstall": "npx playwright install chromium"
15
17
  },
16
18
  "repository": {
17
- "type": "git",
18
- "url": "git+https://github.com/Gracker/mcp-web-reader.git"
19
+ "type": "git",
20
+ "url": "git+https://github.com/Gracker/mcp-web-reader.git"
19
21
  },
20
22
  "bugs": {
21
- "url": "https://github.com/Gracker/mcp-web-reader/issues"
23
+ "url": "https://github.com/Gracker/mcp-web-reader/issues"
22
24
  },
23
25
  "homepage": "https://github.com/Gracker/mcp-web-reader#readme",
24
- "keywords": ["mcp", "claude", "web-scraping", "jina-reader"],
26
+ "keywords": [
27
+ "mcp",
28
+ "claude",
29
+ "web-scraping",
30
+ "jina-reader"
31
+ ],
25
32
  "author": "Gracker",
26
33
  "license": "MIT",
27
34
  "files": [
28
- "dist",
29
- "README.md",
30
- "LICENSE"
35
+ "dist",
36
+ "README.md",
37
+ "LICENSE"
31
38
  ],
32
39
  "dependencies": {
33
- "@modelcontextprotocol/sdk": "^0.5.0",
34
- "node-fetch": "^3.3.2",
35
- "jsdom": "^24.0.0",
36
- "turndown": "^7.1.3",
37
- "playwright": "^1.40.0"
40
+ "@modelcontextprotocol/sdk": "^1.26.0",
41
+ "jsdom": "^24.0.0",
42
+ "node-fetch": "^3.3.2",
43
+ "playwright": "^1.40.0",
44
+ "turndown": "^7.1.3"
38
45
  },
39
46
  "devDependencies": {
40
- "@types/node": "^20.0.0",
41
- "@types/jsdom": "^21.1.6",
42
- "@types/turndown": "^5.0.4",
43
- "typescript": "^5.3.3"
47
+ "@types/jsdom": "^21.1.6",
48
+ "@types/node": "^20.0.0",
49
+ "@types/turndown": "^5.0.4",
50
+ "typescript": "^5.3.3"
44
51
  }
45
- }
52
+ }
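
The new `start:http` script passes `--transport=http`, which `resolveTransportMode` in `dist/index.js` reads ahead of the `MCP_TRANSPORT` environment variable and the older `--http` flag. The sketch below restates that precedence as a pure function so it can be exercised without starting a server. `resolveTransportModeFrom` and its `argv`/`env` parameters are hypothetical names; the body otherwise follows the code shown in the diff.

```javascript
// Sketch only: a pure variant of resolveTransportMode from the diff.
// resolveTransportModeFrom, argv, and env are hypothetical names.
function resolveTransportModeFrom(argv, env) {
  // 1. Explicit CLI argument, e.g. --transport=http
  const cliTransportArg = argv.find((arg) => arg.startsWith("--transport="));
  const cliTransport = cliTransportArg ? cliTransportArg.split("=", 2)[1] : undefined;
  // 2. Environment variable, 3. legacy --http flag, 4. default "stdio"
  const legacyHttpFlag = argv.includes("--http");
  const mode = (cliTransport ?? env.MCP_TRANSPORT ?? (legacyHttpFlag ? "http" : "stdio"))
    .toLowerCase();
  if (mode === "stdio" || mode === "http") {
    return mode;
  }
  throw new Error(`Unsupported transport mode: ${mode}. Use 'stdio' or 'http'.`);
}

// Example calls (argv here excludes the node binary and script path).
console.log(resolveTransportModeFrom(["--transport=http"], {}));
console.log(resolveTransportModeFrom(["--http"], { MCP_TRANSPORT: "stdio" }));
```

One consequence of the `??` chain, as written in the diff: when `MCP_TRANSPORT` is set, the legacy `--http` flag is ignored, so the second call above resolves to `stdio` rather than `http`.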