npm - mcp-web-reader - Versions diffs - 2.0.2 → 2.2.0 - Mend

mcp-web-reader 2.0.2 → 2.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (3) hide show

package/README.md CHANGED Viewed

@@ -1,39 +1,46 @@
 # MCP Web Reader
-A powerful MCP (Model Context Protocol) server that enables Claude and other LLMs to read and parse web content. Supports bypassing access restrictions to easily fetch protected content like WeChat articles and paywalled sites.
+A powerful MCP (Model Context Protocol) server that enables Claude and other LLMs to read and parse web content. Bypasses access restrictions for WeChat articles, paywalled sites, and Cloudflare-protected pages.
+[简体中文](./README_CN.md)
 ## Features
-- 🚀 **Multi-engine support**: Jina Reader API, local parser, and Playwright browser
-- 🔄 **Intelligent fallback**: Auto-switches from Jina → Local → Playwright browser
-- 🌐 **Bypass restrictions**: Handles Cloudflare, CAPTCHAs, and access controls
+- 🚀 **Multi-engine**: Jina Reader API, local parser, and Playwright browser
+- 🔄 **Smart fallback**: Auto-switches Jina → Local → Playwright browser
+- 🌐 **Bypass restrictions**: Cloudflare, CAPTCHAs, access controls
 - 📦 **Batch processing**: Fetch multiple URLs simultaneously
-- 🎯 **Flexible control**: Force specific parsing methods when needed
-- 📝 **Markdown output**: Automatic conversion to clean Markdown format
+- 📝 **Markdown output**: Automatic conversion to clean Markdown
+- 🔌 **Transport compatibility**: stdio + Streamable HTTP (optional legacy SSE compatibility mode)
 ## Installation
-### Quick Install (Recommended)
 ```bash
 npm install -g mcp-web-reader
 ```
-### Install from Source
+> **Note**: Chromium browser (~100-200MB) will be automatically downloaded. This is required for:
+> - WeChat articles (need browser rendering)
+> - Cloudflare-protected sites
+> - JavaScript-heavy sites
+> - CAPTCHA/access restrictions
+Download may take 1-5 minutes depending on network speed.
+### From Source
 ```bash
 git clone https://github.com/Gracker/mcp-web-reader.git
 cd mcp-web-reader
 npm install
 npm run build
-npx playwright install chromium
 ```
 ## Configuration
 ### Claude Desktop
-Add to your Claude Desktop config file:
+Add to your config file:
 **Windows**: `%APPDATA%\Claude\claude_desktop_config.json`
 **macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`
@@ -48,36 +55,49 @@ Add to your Claude Desktop config file:
 }
 ```
-### Claude Code (Terminal)
-For Claude Code users, add the MCP server using the command line:
+### Claude Code
 ```bash
 claude mcp add web-reader -- mcp-web-reader
+claude mcp list
 ```
-To verify the server is configured:
+### Streamable HTTP (Remote Deployment)
+Start server in Streamable HTTP mode:
 ```bash
-claude mcp list
+MCP_TRANSPORT=http MCP_HTTP_HOST=0.0.0.0 MCP_HTTP_PORT=3000 npm run start:http
 ```
-## Usage
+Optional environment variables:
-### In Claude
+- `MCP_HTTP_PATH` (default: `/mcp`)
+- `MCP_ENABLE_LEGACY_SSE=true` to expose deprecated `/sse` + `/messages` endpoints
-After configuration, use natural language commands:
+Codex MCP config (HTTP):
+```toml
+[mcp_servers.web-reader]
+type = "http"
+url = "https://your-domain.com/mcp"
+bearer_token_env_var = "WEB_READER_TOKEN"
+```
+## Usage
+In Claude:
 - "Fetch content from https://example.com"
-- "Get content using browser for https://mp.weixin.qq.com/..." (for restricted sites)
+- "Get content using browser for https://mp.weixin.qq.com/..."
 - "Fetch multiple URLs: [url1, url2, url3]"
 ## Supported Sites
-- **WeChat articles** - Automatic access bypass
-- **Paywalled sites** - NYT, Time Magazine, etc.
-- **Cloudflare protected sites**
-- **JavaScript-heavy sites**
-- **CAPTCHA protected sites**
+- WeChat articles (mp.weixin.qq.com)
+- Paywalled sites (NYT, Time Magazine, etc.)
+- Cloudflare-protected sites
+- JavaScript-heavy sites
+- CAPTCHA-protected sites
 ## Tools
@@ -89,12 +109,12 @@ After configuration, use natural language commands:
 ## Architecture
-Intelligent fallback strategy:
+Intelligent fallback:
 ```
 URL Request → Jina Reader → Local Parser → Playwright Browser
 ```
-Auto-detects restrictions and switches to browser mode for:
+Auto-detects restrictions and switches to browser for:
 - HTTP status codes: 403, 429, 503, 520-524
 - Keywords: Cloudflare, CAPTCHA, Access Denied
 - Content patterns: Security checks, human verification
@@ -102,13 +122,12 @@ Auto-detects restrictions and switches to browser mode for:
 ## Development
 ```bash
-npm run dev    # Development mode with auto-rebuild
+npm run dev    # Development with auto-rebuild
 npm run build  # Build production version
 npm start      # Test run
-npx playwright install chromium  # Install browser (required)
+npm run start:http  # Run Streamable HTTP server
 ```
 ## License
 MIT License

package/dist/index.js CHANGED Viewed

@@ -1,32 +1,27 @@
 import { Server } from "@modelcontextprotocol/sdk/server/index.js";
 import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
-import { CallToolRequestSchema, ErrorCode, ListToolsRequestSchema, McpError, } from "@modelcontextprotocol/sdk/types.js";
+import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";
+import { SSEServerTransport } from "@modelcontextprotocol/sdk/server/sse.js";
+import { createMcpExpressApp } from "@modelcontextprotocol/sdk/server/express.js";
+import { CallToolRequestSchema, ErrorCode, ListToolsRequestSchema, McpError, isInitializeRequest, } from "@modelcontextprotocol/sdk/types.js";
 import fetch from "node-fetch";
 import { JSDOM } from "jsdom";
 import TurndownService from "turndown";
 import { chromium } from "playwright";
-// 创建服务器实例
-const server = new Server({
-    name: "web-reader",
-    version: "2.0.0",
-}, {
-    capabilities: {
-        tools: {},
-    },
-});
-// 初始化Turndown服务（将HTML转换为Markdown）
+import { randomUUID } from "node:crypto";
+// Initialize Turndown service (convert HTML to Markdown)
 const turndownService = new TurndownService({
     headingStyle: "atx",
     codeBlockStyle: "fenced",
 });
-// 配置Turndown规则
+// Configure Turndown rules
 turndownService.addRule("skipScripts", {
     filter: ["script", "style", "noscript"],
     replacement: () => "",
 });
-// 浏览器实例管理
+// Browser instance management
 let browser = null;
-// 获取或创建浏览器实例
+// Get or create browser instance
 async function getBrowser() {
     if (!browser) {
         browser = await chromium.launch({
@@ -34,7 +29,7 @@ async function getBrowser() {
             args: [
                 '--no-sandbox',
                 '--disable-dev-shm-usage',
-                '--disable-blink-features=AutomationControlled', // 禁用自动化检测
+                '--disable-blink-features=AutomationControlled', // Disable automation detection
                 '--disable-infobars',
                 '--window-size=1920,1080',
                 '--start-maximized',
@@ -43,14 +38,14 @@ async function getBrowser() {
     }
     return browser;
 }
-// 清理浏览器实例
+// Clean up browser instance
 async function closeBrowser() {
     if (browser) {
         await browser.close();
         browser = null;
     }
 }
-// URL验证函数
+// URL validation function
 function isValidUrl(urlString) {
     try {
         const url = new URL(urlString);
@@ -60,18 +55,18 @@ function isValidUrl(urlString) {
         return false;
     }
 }
-// 检测是否是微信文章链接
+// Check if it's a WeChat article link
 function isWeixinUrl(url) {
     return url.includes('mp.weixin.qq.com') || url.includes('weixin.qq.com');
 }
-// 检测是否需要使用浏览器模式
+// Check if browser mode is needed
 function shouldUseBrowser(error, statusCode, content) {
     const errorMessage = error.message.toLowerCase();
-    // 基于HTTP状态码判断
+    // Based on HTTP status codes
     if (statusCode && [403, 429, 503, 520, 521, 522, 523, 524].includes(statusCode)) {
         return true;
     }
-    // 基于错误消息判断
+    // Based on error messages
     const browserTriggers = [
         'cloudflare',
         'access denied',
@@ -83,13 +78,13 @@ function shouldUseBrowser(error, statusCode, content) {
         'blocked',
         'protection',
         'verification required',
-        '环境异常',
-        '验证'
+        'environment anomaly',
+        'verify'
     ];
     if (browserTriggers.some(trigger => errorMessage.includes(trigger))) {
         return true;
     }
-    // 基于响应内容判断
+    // Based on response content
     if (content) {
         const contentLower = content.toLowerCase();
         const contentTriggers = [
@@ -99,10 +94,10 @@ function shouldUseBrowser(error, statusCode, content) {
             'security check',
             'human verification',
             'captcha',
-            // 微信特有的验证关键词
-            '环境异常',
-            '去验证',
-            '完成验证后即可继续访问',
+            // WeChat-specific verification keywords
+            'environment anomaly',
+            'verify',
+            'complete verification to continue',
             'verify'
         ];
         if (contentTriggers.some(trigger => contentLower.includes(trigger))) {
@@ -111,12 +106,12 @@ function shouldUseBrowser(error, statusCode, content) {
     }
     return false;
 }
-// 使用Jina Reader获取内容
+// Fetch content using Jina Reader
 async function fetchWithJinaReader(url) {
     try {
         // Jina Reader API URL
         const jinaUrl = `https://r.jina.ai/${url}`;
-        // 创建超时控制器
+        // Create timeout controller
         const controller = new AbortController();
         const timeoutId = setTimeout(() => controller.abort(), 30000);
         const response = await fetch(jinaUrl, {
@@ -131,9 +126,9 @@ async function fetchWithJinaReader(url) {
             throw new Error(`Jina Reader API error! status: ${response.status}`);
         }
         const markdown = await response.text();
-        // 从Markdown中提取标题（通常是第一个#标题）
+        // Extract title from Markdown (usually the first # heading)
         const titleMatch = markdown.match(/^#\s+(.+)$/m);
-        const title = titleMatch ? titleMatch[1] : "无标题";
+        const title = titleMatch ? titleMatch[1] : "No title";
         return {
             title,
             content: markdown,
@@ -148,21 +143,21 @@ async function fetchWithJinaReader(url) {
     catch (error) {
         if (error instanceof Error) {
             if (error.name === 'AbortError') {
-                throw new Error(`Jina Reader请求超时（30秒）`);
+                throw new Error(`Jina Reader request timeout (30s)`);
             }
-            throw new Error(`Jina Reader获取失败: ${error.message}`);
+            throw new Error(`Jina Reader fetch failed: ${error.message}`);
         }
-        throw new Error(`Jina Reader获取失败: ${String(error)}`);
+        throw new Error(`Jina Reader fetch failed: ${String(error)}`);
     }
 }
-// 使用Playwright获取网页内容
+// Fetch web content using Playwright
 async function fetchWithPlaywright(url) {
     let page = null;
     const isWeixin = isWeixinUrl(url);
     try {
         const browserInstance = await getBrowser();
         page = await browserInstance.newPage();
-        // 设置真实的 User-Agent（模拟 Chrome on Mac）
+        // Set real User-Agent (simulate Chrome on Mac)
         const userAgent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36';
         await page.setExtraHTTPHeaders({
             'User-Agent': userAgent,
@@ -174,7 +169,7 @@ async function fetchWithPlaywright(url) {
             ...(isWeixin ? { 'Referer': 'https://mp.weixin.qq.com/' } : {}),
         });
         await page.setViewportSize({ width: 1920, height: 1080 });
-        // 微信文章需要加载样式以正确渲染，其他网站可以过滤
+        // WeChat articles need to load styles for correct rendering, filter for other sites
         if (!isWeixin) {
             await page.route('**/*', (route) => {
                 const resourceType = route.request().resourceType();
@@ -186,30 +181,30 @@ async function fetchWithPlaywright(url) {
                 }
             });
         }
-        // 导航到页面，设置更长的超时时间
+        // Navigate to page with longer timeout
         await page.goto(url, {
             timeout: 45000,
-            waitUntil: 'networkidle' // 等待网络空闲，确保 JS 完全执行
+            waitUntil: 'networkidle' // Wait for network idle to ensure JS execution
         });
-        // 微信文章需要更长的等待时间
+        // WeChat articles need longer wait time
         const waitTime = isWeixin ? 5000 : 2000;
         await page.waitForTimeout(waitTime);
-        // 获取页面标题
-        const title = await page.title() || "无标题";
-        // 移除不需要的元素
+        // Get page title
+        const title = await page.title() || "No title";
+        // Remove unwanted elements
         await page.evaluate(() => {
             const elementsToRemove = document.querySelectorAll('script, style, nav, header, footer, aside, .advertisement, .ads, .sidebar, .comments, .social-share');
             elementsToRemove.forEach(el => el.remove());
         });
-        // 获取主要内容（微信文章有特定的 DOM 结构）
+        // Get main content (WeChat articles have specific DOM structure)
         const htmlContent = await page.evaluate(() => {
-            // 微信文章特定选择器
+            // WeChat article specific selectors
             const weixinContent = document.querySelector('#js_content') ||
                 document.querySelector('.rich_media_content');
             if (weixinContent) {
                 return weixinContent.innerHTML;
             }
-            // 通用选择器
+            // Common selectors
             const mainContent = document.querySelector('main') ||
                 document.querySelector('article') ||
                 document.querySelector('[role="main"]') ||
@@ -220,9 +215,9 @@ async function fetchWithPlaywright(url) {
                 document.body;
             return mainContent ? mainContent.innerHTML : document.body.innerHTML;
         });
-        // 转换为Markdown
+        // Convert to Markdown
         const markdown = turndownService.turndown(htmlContent);
-        // 清理内容
+        // Clean content
         const cleanedContent = markdown
             .replace(/\n{3,}/g, "\n\n")
             .replace(/^\s+$/gm, "")
@@ -240,9 +235,9 @@ async function fetchWithPlaywright(url) {
     }
     catch (error) {
         if (error instanceof Error) {
-            throw new Error(`Playwright获取失败: ${error.message}`);
+            throw new Error(`Playwright fetch failed: ${error.message}`);
         }
-        throw new Error(`Playwright获取失败: ${String(error)}`);
+        throw new Error(`Playwright fetch failed: ${String(error)}`);
     }
     finally {
         if (page) {
@@ -250,13 +245,13 @@ async function fetchWithPlaywright(url) {
         }
     }
 }
-// 本地提取网页内容的函数
+// Local web content extraction function
 async function fetchWithLocalParser(url) {
     try {
-        // 创建超时控制器
+        // Create timeout controller
         const controller = new AbortController();
         const timeoutId = setTimeout(() => controller.abort(), 30000);
-        // 发送HTTP请求
+        // Send HTTP request
         const response = await fetch(url, {
             headers: {
                 "User-Agent": "Mozilla/5.0 (compatible; MCP-URLFetcher/2.0)",
@@ -267,17 +262,17 @@ async function fetchWithLocalParser(url) {
         if (!response.ok) {
             throw new Error(`HTTP error! status: ${response.status}`);
         }
-        // 获取HTML内容
+        // Get HTML content
         const html = await response.text();
-        // 使用JSDOM解析HTML
+        // Parse HTML with JSDOM
         const dom = new JSDOM(html);
         const document = dom.window.document;
-        // 获取标题
-        const title = document.querySelector("title")?.textContent || "无标题";
-        // 移除不需要的元素
+        // Get title
+        const title = document.querySelector("title")?.textContent || "No title";
+        // Remove unwanted elements
         const elementsToRemove = document.querySelectorAll("script, style, nav, header, footer, aside, .advertisement, .ads, .sidebar, .comments");
         elementsToRemove.forEach(el => el.remove());
-        // 获取主要内容区域
+        // Get main content area
         const mainContent = document.querySelector("main") ||
             document.querySelector("article") ||
             document.querySelector('[role="main"]') ||
@@ -286,9 +281,9 @@ async function fetchWithLocalParser(url) {
             document.querySelector(".post") ||
             document.querySelector(".entry-content") ||
             document.body;
-        // 转换为Markdown
+        // Convert to Markdown
         const markdown = turndownService.turndown(mainContent.innerHTML);
-        // 清理多余的空行和空格
+        // Clean extra whitespace
         const cleanedContent = markdown
             .replace(/\n{3,}/g, "\n\n")
             .replace(/^\s+$/gm, "")
@@ -307,62 +302,62 @@ async function fetchWithLocalParser(url) {
     catch (error) {
         if (error instanceof Error) {
             if (error.name === 'AbortError') {
-                throw new Error(`本地解析请求超时（30秒）`);
+                throw new Error(`Local parser request timeout (30s)`);
             }
-            throw new Error(`本地解析失败: ${error.message}`);
+            throw new Error(`Local parser failed: ${error.message}`);
         }
-        throw new Error(`本地解析失败: ${String(error)}`);
+        throw new Error(`Local parser failed: ${String(error)}`);
     }
 }
-// 智能获取网页内容（三层降级策略：Jina → 本地 → Playwright）
-// 对于微信等已知需要浏览器的网站，直接使用浏览器模式
+// Smart web content fetching (three-tier fallback: Jina → Local → Playwright)
+// For known sites requiring browser (like WeChat), use browser mode directly
 async function fetchWebContent(url, preferJina = true) {
-    // 微信文章直接使用浏览器模式，因为其他方式无法绕过验证
+    // WeChat articles use browser mode directly as other methods cannot bypass verification
     if (isWeixinUrl(url)) {
-        console.error("检测到微信文章，直接使用Playwright浏览器模式");
+        console.error("Detected WeChat article, using Playwright browser mode");
         return await fetchWithPlaywright(url);
     }
     if (preferJina) {
-        // 第一层：尝试Jina Reader
+        // Tier 1: Try Jina Reader
         try {
             return await fetchWithJinaReader(url);
         }
         catch (jinaError) {
-            console.error("Jina Reader失败，尝试本地解析:", jinaError instanceof Error ? jinaError.message : String(jinaError));
-            // 第二层：尝试本地解析
+            console.error("Jina Reader failed, trying local parser:", jinaError instanceof Error ? jinaError.message : String(jinaError));
+            // Tier 2: Try local parser
             try {
                 return await fetchWithLocalParser(url);
             }
             catch (localError) {
-                console.error("本地解析失败，检查是否需要浏览器模式:", localError instanceof Error ? localError.message : String(localError));
-                // 判断是否需要使用浏览器模式
+                console.error("Local parser failed, checking if browser mode needed:", localError instanceof Error ? localError.message : String(localError));
+                // Check if browser mode is needed
                 const jinaErr = jinaError instanceof Error ? jinaError : new Error(String(jinaError));
                 const localErr = localError instanceof Error ? localError : new Error(String(localError));
                 if (shouldUseBrowser(jinaErr) || shouldUseBrowser(localErr)) {
-                    console.error("检测到访问限制，使用Playwright浏览器模式");
+                    console.error("Detected access restrictions, using Playwright browser mode");
                     try {
-                        // 第三层：使用Playwright浏览器
+                        // Tier 3: Use Playwright browser
                         return await fetchWithPlaywright(url);
                     }
                     catch (browserError) {
-                        throw new Error(`所有方法都失败了。Jina: ${jinaErr.message}, 本地: ${localErr.message}, 浏览器: ${browserError instanceof Error ? browserError.message : String(browserError)}`);
+                        throw new Error(`All methods failed. Jina: ${jinaErr.message}, Local: ${localErr.message}, Browser: ${browserError instanceof Error ? browserError.message : String(browserError)}`);
                     }
                 }
                 else {
-                    throw new Error(`Jina和本地解析都失败了。Jina: ${jinaErr.message}, 本地: ${localErr.message}`);
+                    throw new Error(`Jina and local parser both failed. Jina: ${jinaErr.message}, Local: ${localErr.message}`);
                 }
             }
         }
     }
     else {
-        // 如果不优先使用Jina，直接从本地解析开始
+        // If not prioritizing Jina, start with local parser
         try {
             return await fetchWithLocalParser(url);
         }
         catch (localError) {
             const localErr = localError instanceof Error ? localError : new Error(String(localError));
             if (shouldUseBrowser(localErr)) {
-                console.error("本地解析失败，检测到访问限制，使用Playwright浏览器模式");
+                console.error("Local parser failed, detected access restrictions, using Playwright browser mode");
                 return await fetchWithPlaywright(url);
             }
             else {
@@ -371,228 +366,461 @@ async function fetchWebContent(url, preferJina = true) {
         }
     }
 }
-// 处理工具列表请求
-server.setRequestHandler(ListToolsRequestSchema, async () => {
-    return {
-        tools: [
-            {
-                name: "fetch_url",
-                description: "获取指定URL的网页内容，并转换为Markdown格式。默认使用Jina Reader，失败时自动切换到本地解析",
-                inputSchema: {
-                    type: "object",
-                    properties: {
-                        url: {
-                            type: "string",
-                            description: "要获取内容的网页URL（必须是http或https协议）",
-                        },
-                        preferJina: {
-                            type: "boolean",
-                            description: "是否优先使用Jina Reader（默认为true）",
-                            default: true,
+const streamableSessions = new Map();
+const legacySseSessions = new Map();
+function createServerInstance() {
+    const server = new Server({
+        name: "web-reader",
+        version: "2.1.0",
+    }, {
+        capabilities: {
+            tools: {},
+        },
+    });
+    registerServerHandlers(server);
+    return server;
+}
+function registerServerHandlers(server) {
+    // Handle tool list requests
+    server.setRequestHandler(ListToolsRequestSchema, async () => {
+        return {
+            tools: [
+                {
+                    name: "fetch_url",
+                    description: "Fetch web content from specified URL and convert to Markdown format. Uses Jina Reader by default, automatically falls back to local parser on failure",
+                    inputSchema: {
+                        type: "object",
+                        properties: {
+                            url: {
+                                type: "string",
+                                description: "Webpage URL to fetch (must be http or https protocol)",
+                            },
+                            preferJina: {
+                                type: "boolean",
+                                description: "Whether to prioritize Jina Reader (default: true)",
+                                default: true,
+                            },
                         },
+                        required: ["url"],
                     },
-                    required: ["url"],
                 },
-            },
-            {
-                name: "fetch_multiple_urls",
-                description: "批量获取多个URL的网页内容",
-                inputSchema: {
-                    type: "object",
-                    properties: {
-                        urls: {
-                            type: "array",
-                            items: {
-                                type: "string",
+                {
+                    name: "fetch_multiple_urls",
+                    description: "Batch fetch web content from multiple URLs",
+                    inputSchema: {
+                        type: "object",
+                        properties: {
+                            urls: {
+                                type: "array",
+                                items: {
+                                    type: "string",
+                                },
+                                description: "List of webpage URLs to fetch",
+                                maxItems: 10, // Limit to 10 URLs
+                            },
+                            preferJina: {
+                                type: "boolean",
+                                description: "Whether to prioritize Jina Reader (default: true)",
+                                default: true,
                             },
-                            description: "要获取内容的网页URL列表",
-                            maxItems: 10, // 限制最多10个URL
-                        },
-                        preferJina: {
-                            type: "boolean",
-                            description: "是否优先使用Jina Reader（默认为true）",
-                            default: true,
                         },
+                        required: ["urls"],
                     },
-                    required: ["urls"],
                 },
-            },
-            {
-                name: "fetch_url_with_jina",
-                description: "强制使用Jina Reader获取网页内容（适用于复杂网页）",
-                inputSchema: {
-                    type: "object",
-                    properties: {
-                        url: {
-                            type: "string",
-                            description: "要获取内容的网页URL",
+                {
+                    name: "fetch_url_with_jina",
+                    description: "Force fetch using Jina Reader (suitable for complex webpages)",
+                    inputSchema: {
+                        type: "object",
+                        properties: {
+                            url: {
+                                type: "string",
+                                description: "Webpage URL to fetch",
+                            },
                         },
+                        required: ["url"],
                     },
-                    required: ["url"],
                 },
-            },
-            {
-                name: "fetch_url_local",
-                description: "强制使用本地解析器获取网页内容（适用于简单网页或Jina不可用时）",
-                inputSchema: {
-                    type: "object",
-                    properties: {
-                        url: {
-                            type: "string",
-                            description: "要获取内容的网页URL",
+                {
+                    name: "fetch_url_local",
+                    description: "Force fetch using local parser (suitable for simple webpages or when Jina is unavailable)",
+                    inputSchema: {
+                        type: "object",
+                        properties: {
+                            url: {
+                                type: "string",
+                                description: "Webpage URL to fetch",
+                            },
                         },
+                        required: ["url"],
                     },
-                    required: ["url"],
                 },
-            },
-            {
-                name: "fetch_url_with_browser",
-                description: "强制使用Playwright浏览器获取网页内容（适用于有访问限制的网站，如Cloudflare保护、验证码等）",
-                inputSchema: {
-                    type: "object",
-                    properties: {
-                        url: {
-                            type: "string",
-                            description: "要获取内容的网页URL",
+                {
+                    name: "fetch_url_with_browser",
+                    description: "Force fetch using Playwright browser (suitable for websites with access restrictions, such as Cloudflare protection, CAPTCHA, etc.)",
+                    inputSchema: {
+                        type: "object",
+                        properties: {
+                            url: {
+                                type: "string",
+                                description: "Webpage URL to fetch",
+                            },
                         },
+                        required: ["url"],
                     },
-                    required: ["url"],
                 },
-            },
-        ],
-    };
-});
-// 处理工具调用请求
-server.setRequestHandler(CallToolRequestSchema, async (request) => {
-    const { name, arguments: args } = request.params;
-    try {
-        if (name === "fetch_url") {
-            const { url, preferJina = true } = args;
-            // 验证URL
-            if (!isValidUrl(url)) {
-                throw new McpError(ErrorCode.InvalidParams, "无效的URL格式，请提供http或https协议的URL");
-            }
-            // 获取网页内容
-            const result = await fetchWebContent(url, preferJina);
-            return {
-                content: [
-                    {
-                        type: "text",
-                        text: `# ${result.title}\n\n**URL**: ${result.metadata.url}\n**获取时间**: ${result.metadata.fetchedAt}\n**内容长度**: ${result.metadata.contentLength} 字符\n**解析方法**: ${result.metadata.method}\n\n---\n\n${result.content}`,
-                    },
-                ],
-            };
-        }
-        else if (name === "fetch_url_with_jina") {
-            const { url } = args;
-            if (!isValidUrl(url)) {
-                throw new McpError(ErrorCode.InvalidParams, "无效的URL格式");
+            ],
+        };
+    });
+    // Handle tool call requests
+    server.setRequestHandler(CallToolRequestSchema, async (request) => {
+        const { name, arguments: args } = request.params;
+        try {
+            if (name === "fetch_url") {
+                const { url, preferJina = true } = args;
+                // Validate URL
+                if (!isValidUrl(url)) {
+                    throw new McpError(ErrorCode.InvalidParams, "Invalid URL format, please provide http or https protocol URL");
+                }
+                // Fetch web content
+                const result = await fetchWebContent(url, preferJina);
+                return {
+                    content: [
+                        {
+                            type: "text",
+                            text: `# ${result.title}\n\n**URL**: ${result.metadata.url}\n**Fetched At**: ${result.metadata.fetchedAt}\n**Content Length**: ${result.metadata.contentLength} characters\n**Method**: ${result.metadata.method}\n\n---\n\n${result.content}`,
+                        },
+                    ],
+                };
             }
-            const result = await fetchWithJinaReader(url);
-            return {
-                content: [
-                    {
-                        type: "text",
-                        text: `# ${result.title}\n\n**URL**: ${result.metadata.url}\n**获取时间**: ${result.metadata.fetchedAt}\n**内容长度**: ${result.metadata.contentLength} 字符\n**解析方法**: Jina Reader\n\n---\n\n${result.content}`,
-                    },
-                ],
-            };
-        }
-        else if (name === "fetch_url_local") {
-            const { url } = args;
-            if (!isValidUrl(url)) {
-                throw new McpError(ErrorCode.InvalidParams, "无效的URL格式");
+            else if (name === "fetch_url_with_jina") {
+                const { url } = args;
+                if (!isValidUrl(url)) {
+                    throw new McpError(ErrorCode.InvalidParams, "Invalid URL format");
+                }
+                const result = await fetchWithJinaReader(url);
+                return {
+                    content: [
+                        {
+                            type: "text",
+                            text: `# ${result.title}\n\n**URL**: ${result.metadata.url}\n**Fetched At**: ${result.metadata.fetchedAt}\n**Content Length**: ${result.metadata.contentLength} characters\n**Method**: Jina Reader\n\n---\n\n${result.content}`,
+                        },
+                    ],
+                };
             }
-            const result = await fetchWithLocalParser(url);
-            return {
-                content: [
-                    {
-                        type: "text",
-                        text: `# ${result.title}\n\n**URL**: ${result.metadata.url}\n**获取时间**: ${result.metadata.fetchedAt}\n**内容长度**: ${result.metadata.contentLength} 字符\n**解析方法**: 本地解析器\n\n---\n\n${result.content}`,
-                    },
-                ],
-            };
-        }
-        else if (name === "fetch_multiple_urls") {
-            const { urls, preferJina = true } = args;
-            // 验证所有URL
-            const invalidUrls = urls.filter(url => !isValidUrl(url));
-            if (invalidUrls.length > 0) {
-                throw new McpError(ErrorCode.InvalidParams, `以下URL格式无效: ${invalidUrls.join(", ")}`);
+            else if (name === "fetch_url_local") {
+                const { url } = args;
+                if (!isValidUrl(url)) {
+                    throw new McpError(ErrorCode.InvalidParams, "Invalid URL format");
+                }
+                const result = await fetchWithLocalParser(url);
+                return {
+                    content: [
+                        {
+                            type: "text",
+                            text: `# ${result.title}\n\n**URL**: ${result.metadata.url}\n**Fetched At**: ${result.metadata.fetchedAt}\n**Content Length**: ${result.metadata.contentLength} characters\n**Method**: Local Parser\n\n---\n\n${result.content}`,
+                        },
+                    ],
+                };
             }
-            // 并发获取所有URL内容
-            const results = await Promise.allSettled(urls.map(url => fetchWebContent(url, preferJina)));
-            // 整理结果
-            let combinedContent = "# 批量URL内容获取结果\n\n";
-            results.forEach((result, index) => {
-                const url = urls[index];
-                combinedContent += `## ${index + 1}. ${url}\n\n`;
-                if (result.status === "fulfilled") {
-                    const { title, content, metadata } = result.value;
-                    combinedContent += `**标题**: ${title}\n`;
-                    combinedContent += `**获取时间**: ${metadata.fetchedAt}\n`;
-                    combinedContent += `**内容长度**: ${metadata.contentLength} 字符\n`;
-                    combinedContent += `**解析方法**: ${metadata.method}\n\n`;
-                    combinedContent += `### 内容\n\n${content}\n\n`;
+            else if (name === "fetch_multiple_urls") {
+                const { urls, preferJina = true } = args;
+                // Validate all URLs
+                const invalidUrls = urls.filter(url => !isValidUrl(url));
+                if (invalidUrls.length > 0) {
+                    throw new McpError(ErrorCode.InvalidParams, `The following URLs have invalid format: ${invalidUrls.join(", ")}`);
                 }
-                else {
-                    combinedContent += `**错误**: ${result.reason}\n\n`;
+                // Fetch all URLs concurrently
+                const results = await Promise.allSettled(urls.map(url => fetchWebContent(url, preferJina)));
+                // Combine results
+                let combinedContent = "# Batch URL Content Fetch Results\n\n";
+                results.forEach((result, index) => {
+                    const url = urls[index];
+                    combinedContent += `## ${index + 1}. ${url}\n\n`;
+                    if (result.status === "fulfilled") {
+                        const { title, content, metadata } = result.value;
+                        combinedContent += `**Title**: ${title}\n`;
+                        combinedContent += `**Fetched At**: ${metadata.fetchedAt}\n`;
+                        combinedContent += `**Content Length**: ${metadata.contentLength} characters\n`;
+                        combinedContent += `**Method**: ${metadata.method}\n\n`;
+                        combinedContent += `### Content\n\n${content}\n\n`;
+                    }
+                    else {
+                        combinedContent += `**Error**: ${result.reason}\n\n`;
+                    }
+                    combinedContent += "---\n\n";
+                });
+                return {
+                    content: [
+                        {
+                            type: "text",
+                            text: combinedContent,
+                        },
+                    ],
+                };
+            }
+            else if (name === "fetch_url_with_browser") {
+                const { url } = args;
+                if (!isValidUrl(url)) {
+                    throw new McpError(ErrorCode.InvalidParams, "Invalid URL format");
                 }
-                combinedContent += "---\n\n";
-            });
-            return {
-                content: [
-                    {
-                        type: "text",
-                        text: combinedContent,
-                    },
-                ],
-            };
+                const result = await fetchWithPlaywright(url);
+                return {
+                    content: [
+                        {
+                            type: "text",
+                            text: `# ${result.title}\n\n**URL**: ${result.metadata.url}\n**Fetched At**: ${result.metadata.fetchedAt}\n**Content Length**: ${result.metadata.contentLength} characters\n**Method**: Playwright Browser\n\n---\n\n${result.content}`,
+                        },
+                    ],
+                };
+            }
+            else {
+                throw new McpError(ErrorCode.MethodNotFound, `Unknown tool: ${name}`);
+            }
         }
-        else if (name === "fetch_url_with_browser") {
-            const { url } = args;
-            if (!isValidUrl(url)) {
-                throw new McpError(ErrorCode.InvalidParams, "无效的URL格式");
+        catch (error) {
+            if (error instanceof McpError) {
+                throw error;
             }
-            const result = await fetchWithPlaywright(url);
-            return {
-                content: [
-                    {
-                        type: "text",
-                        text: `# ${result.title}\n\n**URL**: ${result.metadata.url}\n**获取时间**: ${result.metadata.fetchedAt}\n**内容长度**: ${result.metadata.contentLength} 字符\n**解析方法**: Playwright浏览器\n\n---\n\n${result.content}`,
-                    },
-                ],
-            };
+            throw new McpError(ErrorCode.InternalError, `Tool execution failed: ${error instanceof Error ? error.message : String(error)}`);
+        }
+    });
+}
+function sendJsonRpcError(res, statusCode, message) {
+    res.status(statusCode).json({
+        jsonrpc: "2.0",
+        error: {
+            code: -32000,
+            message,
+        },
+        id: null,
+    });
+}
+function getSessionIdFromHeaders(headers) {
+    const value = headers["mcp-session-id"];
+    if (!value) {
+        return undefined;
+    }
+    return Array.isArray(value) ? value[0] : value;
+}
+function resolveTransportMode() {
+    const cliTransportArg = process.argv.find((arg) => arg.startsWith("--transport="));
+    const cliTransport = cliTransportArg ? cliTransportArg.split("=", 2)[1] : undefined;
+    const legacyHttpFlag = process.argv.includes("--http");
+    const mode = (cliTransport ?? process.env.MCP_TRANSPORT ?? (legacyHttpFlag ? "http" : "stdio"))
+        .toLowerCase();
+    if (mode === "stdio" || mode === "http") {
+        return mode;
+    }
+    throw new Error(`Unsupported transport mode: ${mode}. Use 'stdio' or 'http'.`);
+}
+function resolveLegacySseFlag() {
+    const envValue = (process.env.MCP_ENABLE_LEGACY_SSE ?? "").toLowerCase();
+    return envValue === "1" || envValue === "true" || process.argv.includes("--legacy-sse");
+}
+async function closeAllSessions() {
+    for (const [sessionId, session] of streamableSessions.entries()) {
+        try {
+            await session.server.close();
         }
-        else {
-            throw new McpError(ErrorCode.MethodNotFound, `未知的工具: ${name}`);
+        catch (error) {
+            console.error(`Failed to close streamable server for session ${sessionId}:`, error);
         }
     }
-    catch (error) {
-        if (error instanceof McpError) {
-            throw error;
+    streamableSessions.clear();
+    for (const [sessionId, session] of legacySseSessions.entries()) {
+        try {
+            await session.server.close();
+        }
+        catch (error) {
+            console.error(`Failed to close SSE server for session ${sessionId}:`, error);
         }
-        throw new McpError(ErrorCode.InternalError, `工具执行失败: ${error instanceof Error ? error.message : String(error)}`);
     }
-});
-// 启动服务器
-async function main() {
+    legacySseSessions.clear();
+}
+async function startStdioServer() {
+    const server = createServerInstance();
     const transport = new StdioServerTransport();
     await server.connect(transport);
-    console.error("MCP Web Reader v2.0 已启动（支持Jina Reader + Playwright）");
+    console.error("MCP Web Reader started in stdio mode");
+}
+async function startHttpServer() {
+    const host = process.env.MCP_HTTP_HOST ?? "127.0.0.1";
+    const port = Number.parseInt(process.env.MCP_HTTP_PORT ?? "3000", 10);
+    const mcpPath = process.env.MCP_HTTP_PATH ?? "/mcp";
+    const enableLegacySse = resolveLegacySseFlag();
+    if (!Number.isInteger(port) || port <= 0 || port > 65535) {
+        throw new Error(`Invalid MCP_HTTP_PORT: ${process.env.MCP_HTTP_PORT}`);
+    }
+    const app = createMcpExpressApp({ host });
+    app.post(mcpPath, async (req, res) => {
+        const sessionId = getSessionIdFromHeaders(req.headers);
+        try {
+            if (sessionId) {
+                const existingSession = streamableSessions.get(sessionId);
+                if (!existingSession) {
+                    sendJsonRpcError(res, 404, "Session not found");
+                    return;
+                }
+                await existingSession.transport.handleRequest(req, res, req.body);
+                return;
+            }
+            if (!isInitializeRequest(req.body)) {
+                sendJsonRpcError(res, 400, "Missing session ID; initialize request required");
+                return;
+            }
+            let transport;
+            const sessionServer = createServerInstance();
+            transport = new StreamableHTTPServerTransport({
+                sessionIdGenerator: () => randomUUID(),
+                onsessioninitialized: (initializedSessionId) => {
+                    streamableSessions.set(initializedSessionId, { transport, server: sessionServer });
+                    console.error(`Streamable HTTP session initialized: ${initializedSessionId}`);
+                },
+            });
+            transport.onclose = () => {
+                const closedSessionId = transport.sessionId;
+                if (closedSessionId && streamableSessions.delete(closedSessionId)) {
+                    console.error(`Streamable HTTP session closed: ${closedSessionId}`);
+                }
+            };
+            await sessionServer.connect(transport);
+            await transport.handleRequest(req, res, req.body);
+        }
+        catch (error) {
+            console.error("Error handling streamable HTTP POST request:", error);
+            if (!res.headersSent) {
+                sendJsonRpcError(res, 500, "Internal server error");
+            }
+        }
+    });
+    app.get(mcpPath, async (req, res) => {
+        const sessionId = getSessionIdFromHeaders(req.headers);
+        if (!sessionId) {
+            sendJsonRpcError(res, 400, "Missing mcp-session-id header");
+            return;
+        }
+        const session = streamableSessions.get(sessionId);
+        if (!session) {
+            sendJsonRpcError(res, 404, "Session not found");
+            return;
+        }
+        try {
+            await session.transport.handleRequest(req, res);
+        }
+        catch (error) {
+            console.error("Error handling streamable HTTP GET request:", error);
+            if (!res.headersSent) {
+                sendJsonRpcError(res, 500, "Internal server error");
+            }
+        }
+    });
+    app.delete(mcpPath, async (req, res) => {
+        const sessionId = getSessionIdFromHeaders(req.headers);
+        if (!sessionId) {
+            sendJsonRpcError(res, 400, "Missing mcp-session-id header");
+            return;
+        }
+        const session = streamableSessions.get(sessionId);
+        if (!session) {
+            sendJsonRpcError(res, 404, "Session not found");
+            return;
+        }
+        try {
+            await session.transport.handleRequest(req, res);
+        }
+        catch (error) {
+            console.error("Error handling streamable HTTP DELETE request:", error);
+            if (!res.headersSent) {
+                sendJsonRpcError(res, 500, "Internal server error");
+            }
+        }
+    });
+    if (enableLegacySse) {
+        app.get("/sse", async (_req, res) => {
+            const transport = new SSEServerTransport("/messages", res);
+            const server = createServerInstance();
+            legacySseSessions.set(transport.sessionId, { transport, server });
+            res.on("close", () => {
+                const removed = legacySseSessions.delete(transport.sessionId);
+                if (removed) {
+                    void server.close().catch((error) => {
+                        console.error("Failed to close legacy SSE session server:", error);
+                    });
+                }
+            });
+            await server.connect(transport);
+        });
+        app.post("/messages", async (req, res) => {
+            const querySessionId = req.query.sessionId;
+            const sessionId = typeof querySessionId === "string"
+                ? querySessionId
+                : Array.isArray(querySessionId) && typeof querySessionId[0] === "string"
+                    ? querySessionId[0]
+                    : undefined;
+            if (!sessionId) {
+                sendJsonRpcError(res, 400, "Missing sessionId query parameter");
+                return;
+            }
+            const session = legacySseSessions.get(sessionId);
+            if (!session) {
+                sendJsonRpcError(res, 404, "Session not found");
+                return;
+            }
+            try {
+                await session.transport.handlePostMessage(req, res, req.body);
+            }
+            catch (error) {
+                console.error("Error handling legacy SSE POST request:", error);
+                if (!res.headersSent) {
+                    sendJsonRpcError(res, 500, "Internal server error");
+                }
+            }
+        });
+    }
+    app.get("/healthz", (_req, res) => {
+        res.json({
+            status: "ok",
+            transport: "streamable-http",
+            sessions: streamableSessions.size,
+            legacySseEnabled: enableLegacySse,
+        });
+    });
+    app.listen(port, host, () => {
+        console.error(`MCP Web Reader started in HTTP mode on http://${host}:${port}${mcpPath}`);
+        if (enableLegacySse) {
+            console.error("Legacy SSE compatibility enabled on /sse and /messages");
+        }
+    });
 }
-// 优雅关闭处理
-process.on('SIGINT', async () => {
-    console.error("接收到SIGINT信号，正在关闭浏览器...");
+let isShuttingDown = false;
+async function shutdown(signal) {
+    if (isShuttingDown) {
+        return;
+    }
+    isShuttingDown = true;
+    console.error(`Received ${signal}, shutting down MCP Web Reader...`);
+    await closeAllSessions();
     await closeBrowser();
     process.exit(0);
+}
+// Start server
+async function main() {
+    const transportMode = resolveTransportMode();
+    if (transportMode === "http") {
+        await startHttpServer();
+        return;
+    }
+    await startStdioServer();
+}
+// Graceful shutdown handling
+process.on("SIGINT", () => {
+    void shutdown("SIGINT");
 });
-process.on('SIGTERM', async () => {
-    console.error("接收到SIGTERM信号，正在关闭浏览器...");
-    await closeBrowser();
-    process.exit(0);
+process.on("SIGTERM", () => {
+    void shutdown("SIGTERM");
 });
 main().catch((error) => {
-    console.error("服务器启动失败:", error);
+    console.error("Server startup failed:", error);
     process.exit(1);
 });

package/package.json CHANGED Viewed

@@ -1,45 +1,52 @@
 {
     "name": "mcp-web-reader",
-    "version": "2.0.2",
+    "version": "2.2.0",
     "description": "MCP server for reading web content with Jina Reader and local parser support",
     "main": "dist/index.js",
     "bin": {
-      "mcp-web-reader": "./dist/index.js"
+        "mcp-web-reader": "./dist/index.js"
     },
     "type": "module",
     "scripts": {
-      "build": "tsc",
-      "start": "node dist/index.js",
-      "dev": "tsc --watch",
-      "claude-code": "node dist/index.js"
+        "build": "tsc",
+        "start": "node dist/index.js",
+        "start:http": "node dist/index.js --transport=http",
+        "dev": "tsc --watch",
+        "claude-code": "node dist/index.js",
+        "postinstall": "npx playwright install chromium"
     },
     "repository": {
-      "type": "git",
-      "url": "git+https://github.com/Gracker/mcp-web-reader.git"
+        "type": "git",
+        "url": "git+https://github.com/Gracker/mcp-web-reader.git"
     },
     "bugs": {
-      "url": "https://github.com/Gracker/mcp-web-reader/issues"
+        "url": "https://github.com/Gracker/mcp-web-reader/issues"
     },
     "homepage": "https://github.com/Gracker/mcp-web-reader#readme",
-    "keywords": ["mcp", "claude", "web-scraping", "jina-reader"],
+    "keywords": [
+        "mcp",
+        "claude",
+        "web-scraping",
+        "jina-reader"
+    ],
     "author": "Gracker",
     "license": "MIT",
     "files": [
-      "dist",
-      "README.md",
-      "LICENSE"
+        "dist",
+        "README.md",
+        "LICENSE"
     ],
     "dependencies": {
-      "@modelcontextprotocol/sdk": "^0.5.0",
-      "node-fetch": "^3.3.2",
-      "jsdom": "^24.0.0",
-      "turndown": "^7.1.3",
-      "playwright": "^1.40.0"
+        "@modelcontextprotocol/sdk": "^1.26.0",
+        "jsdom": "^24.0.0",
+        "node-fetch": "^3.3.2",
+        "playwright": "^1.40.0",
+        "turndown": "^7.1.3"
     },
     "devDependencies": {
-      "@types/node": "^20.0.0",
-      "@types/jsdom": "^21.1.6",
-      "@types/turndown": "^5.0.4",
-      "typescript": "^5.3.3"
+        "@types/jsdom": "^21.1.6",
+        "@types/node": "^20.0.0",
+        "@types/turndown": "^5.0.4",
+        "typescript": "^5.3.3"
     }
-  }
+}