mcp-web-reader 2.0.0 → 2.0.2
- package/README.md +56 -197
- package/package.json +1 -1
package/README.md
CHANGED
@@ -1,255 +1,114 @@
 # MCP Web Reader
 
-
+A powerful MCP (Model Context Protocol) server that enables Claude and other LLMs to read and parse web content. Supports bypassing access restrictions to easily fetch protected content like WeChat articles and paywalled sites.
 
-##
+## Features
 
-- 🚀
-- 🔄
-- 🌐
-- 📦
-- 🎯
-- 📝 **Markdown
+- 🚀 **Multi-engine support**: Jina Reader API, local parser, and Playwright browser
+- 🔄 **Intelligent fallback**: Auto-switches from Jina → Local → Playwright browser
+- 🌐 **Bypass restrictions**: Handles Cloudflare, CAPTCHAs, and access controls
+- 📦 **Batch processing**: Fetch multiple URLs simultaneously
+- 🎯 **Flexible control**: Force specific parsing methods when needed
+- 📝 **Markdown output**: Automatic conversion to clean Markdown format
 
-##
+## Installation
 
-###
-
-```bash
-# Clone the repository
-git clone https://github.com/zacfire/mcp-web-reader.git
-cd mcp-web-reader
-
-# Install dependencies
-npm install
-
-# Build the project
-npm run build
-
-# Install the Playwright browser (required)
-npx playwright install chromium
-```
-
-### Method 2: Install with npm (recommended)
-
-Once published, you can simply install via npm:
+### Quick Install (Recommended)
 
 ```bash
 npm install -g mcp-web-reader
 ```
 
-
-If this is the first release, run the provided publish script:
+### Install from Source
 
 ```bash
-
-
-
-
-
+git clone https://github.com/Gracker/mcp-web-reader.git
+cd mcp-web-reader
+npm install
+npm run build
+npx playwright install chromium
 ```
 
-##
+## Configuration
 
-###
+### Claude Desktop
 
-
+Add to your Claude Desktop config file:
 
 **Windows**: `%APPDATA%\Claude\claude_desktop_config.json`
-
 **macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`
 
 ```json
 {
   "mcpServers": {
     "web-reader": {
-      "command": "
-      "args": ["/absolute/path/to/mcp-web-reader/dist/index.js"]
+      "command": "mcp-web-reader"
     }
   }
 }
 ```
 
-
-
-### Detailed configuration guide
-
-📖 **Complete usage guide**: see [USAGE_GUIDE.md](./USAGE_GUIDE.md), which covers:
-- **Command-line usage (recommended)** - for users who work from the CLI
-- Claude Desktop configuration
-- Claude Code (Cursor) configuration
-- Configuration for other MCP clients
-- Usage examples and troubleshooting
+### Claude Code (Terminal)
 
-
-- Testing with MCP Inspector
-- Creating a CLI wrapper
-- Integrating into custom scripts
-- Command-line tool usage examples
-
-## Usage
-
-### Command-line usage (recommended)
-
-If you mainly use Claude from the command line, you can use the provided CLI tool:
+For Claude Code users, add the MCP server using the command line:
 
 ```bash
-
-node cli.js fetch https://example.com
-
-# Force Jina Reader
-node cli.js jina https://example.com
-
-# Force local parsing
-node cli.js local https://example.com
-
-# Force browser mode (for restricted sites such as WeChat articles)
-node cli.js browser https://mp.weixin.qq.com/...
+claude mcp add web-reader -- mcp-web-reader
 ```
 
-
-
+To verify the server is configured:
 ```bash
-
+claude mcp list
 ```
 
-
-
-### Using in Claude
-
-After configuration, you can use the following commands in Claude:
-
-1. **Smart fetch (recommended)**
-* "Please fetch the content of [https://example.com](https://example.com)"
-* Automatic three-tier fallback: Jina Reader → local parsing → Playwright browser
-
-2. **Batch fetch**
-* "Please fetch these pages: [url1, url2, url3]"
-* Every URL gets the smart fallback strategy
+## Usage
 
-
-* "Fetch [https://example.com](https://example.com) using Jina Reader"
+### In Claude
 
-
-* "Fetch [https://example.com](https://example.com) using the local parser"
+After configuration, use natural language commands:
 
-
-
-
+- "Fetch content from https://example.com"
+- "Get content using browser for https://mp.weixin.qq.com/..." (for restricted sites)
+- "Fetch multiple URLs: [url1, url2, url3]"
 
-##
+## Supported Sites
 
-
-
-
-
-
+- **WeChat articles** - Automatic access bypass
+- **Paywalled sites** - NYT, Time Magazine, etc.
+- **Cloudflare protected sites**
+- **JavaScript-heavy sites**
+- **CAPTCHA protected sites**
 
-##
+## Tools
 
--
--
--
--
--
+- `fetch_url` - Smart fetching with automatic fallback
+- `fetch_url_with_jina` - Force Jina Reader
+- `fetch_url_local` - Force local parsing
+- `fetch_url_with_browser` - Force browser mode (for restricted sites)
+- `fetch_multiple_urls` - Batch URL fetching
 
-##
+## Architecture
 
-
+Intelligent fallback strategy:
 ```
-
-↓
-1. Jina Reader API (fastest, high success rate)
-↓ on failure
-2. Local parser (Node.js + JSDOM)
-↓ access restriction detected
-3. Playwright browser (real browser, bypasses restrictions)
+URL Request → Jina Reader → Local Parser → Playwright Browser
 ```
 
-
-
--
--
-- Content keywords: Security Check, Human Verification
+Auto-detects restrictions and switches to browser mode for:
+- HTTP status codes: 403, 429, 503, 520-524
+- Keywords: Cloudflare, CAPTCHA, Access Denied
+- Content patterns: Security checks, human verification
 
-##
+## Development
 
 ```bash
-#
-npm run
-
-#
-npm run build
-
-# Test run
-npm start
-
-# Install browser binaries (required on first use)
-npx playwright install chromium
-```
-
-## Performance optimization
-
-- ⚡ **Browser instance reuse** - avoids repeated startup overhead
-- 🚫 **Resource filtering** - blocks unnecessary loads such as images and stylesheets
-- 🎯 **Smart selection** - prefers fast methods and uses the browser only when necessary
-- 💾 **Graceful shutdown** - cleans up browser resources properly
-
-## Verifying the installation
-
-### Testing the MCP server
-
-1. **Test with MCP Inspector**:
-```bash
-npx @modelcontextprotocol/inspector node dist/index.js
-```
-
-2. **Test the tools**:
-Enter the following JSON in the Inspector to test each tool:
-```json
-{"method": "tools/call", "params": {"name": "fetch_url", "arguments": {"url": "https://example.com"}}}
+npm run dev # Development mode with auto-rebuild
+npm run build # Build production version
+npm start # Test run
+npx playwright install chromium # Install browser (required)
 ```
 
-
-
-After configuration, restart Claude Desktop, then type in a conversation:
-- "Please fetch the content of https://httpbin.org/json"
-
-If content is returned successfully, the installation worked.
-
-## Troubleshooting
-
-### Common issues
-
-1. **"Module not found" error**
-- Make sure you have run `npm install`
-- Make sure you have run `npm run build`
-
-2. **Claude Desktop cannot connect to the MCP server**
-- Check that the config file path is correct
-- Check that the `dist/index.js` path is correct
-- Restart Claude Desktop
-
-3. **Playwright browser errors**
-- Make sure you have run `npx playwright install chromium`
-- Check whether the system supports a graphical interface (some server environments may need extra configuration)
-
-4. **WeChat articles cannot be fetched**
-- WeChat articles require Playwright browser mode
-- Use the `fetch_url_with_browser` tool to force browser mode
-
-### Debug mode
-
-Enable verbose logging:
-```bash
-DEBUG=* node dist/index.js
-```
-
-## Contributing
-
-Pull requests are welcome!
-
-## License
+## License
 
 MIT License
 
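
Note on the Architecture section added in 2.0.2: the README describes a Jina Reader → Local Parser → Playwright Browser fallback, with restrictions detected from HTTP status codes (403, 429, 503, 520-524) and from keywords or content patterns. The sketch below is one way such a chain could look. It is not the package's implementation. Every identifier (`Engine`, `FetchResult`, `isRestricted`, `fetchWithFallback`) is hypothetical, and it assumes a restricted response simply advances to the next engine in the list, whereas the README says the real server switches to browser mode.

```typescript
// Illustrative sketch only: every identifier here is hypothetical and
// not taken from the mcp-web-reader source.

interface FetchResult {
  status: number; // HTTP status code reported by the engine
  body: string;   // raw text or Markdown returned by the engine
}

interface Engine {
  name: string;
  fetch: (url: string) => Promise<FetchResult>;
}

// Status codes the README lists as restriction signals: 403, 429, 503, 520-524.
const RESTRICTED_STATUSES = new Set([403, 429, 503, 520, 521, 522, 523, 524]);

// Keywords and content patterns the README lists, matched case-insensitively.
const RESTRICTED_MARKERS = [
  "cloudflare",
  "captcha",
  "access denied",
  "security check",
  "human verification",
];

function isRestricted(result: FetchResult): boolean {
  if (RESTRICTED_STATUSES.has(result.status)) return true;
  const text = result.body.toLowerCase();
  return RESTRICTED_MARKERS.some((marker) => text.includes(marker));
}

// Try each engine in order and return the first unrestricted result.
async function fetchWithFallback(
  url: string,
  engines: Engine[],
): Promise<{ engine: string; body: string }> {
  let lastError: unknown = new Error("no engines configured");
  for (const engine of engines) {
    try {
      const result = await engine.fetch(url);
      if (!isRestricted(result)) return { engine: engine.name, body: result.body };
      lastError = new Error(`${engine.name}: restricted (status ${result.status})`);
    } catch (err) {
      lastError = err; // network or parser failure: fall through to the next engine
    }
  }
  throw lastError;
}
```

Passing the engines as an ordered list keeps the fallback order in one place, so forcing a single method (as the `fetch_url_with_jina`, `fetch_url_local`, and `fetch_url_with_browser` tools do) would amount to calling it with a one-element list. The substring check is deliberately naive: a page that merely mentions "Cloudflare" would be flagged, so a real detector would likely need tighter patterns.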