scrape-do-mcp 0.1.2 → 0.1.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README-ZH.md ADDED
@@ -0,0 +1,146 @@
+ # scrape-do-mcp
+
+ Scrape.do web scraping and Google Search MCP server - with anti-bot bypass
+
+ ## Features
+
+ - **scrape_url**: Scrape any webpage and return its content as Markdown. Automatically bypasses Cloudflare, WAFs, CAPTCHAs, and anti-bot protection. Supports JavaScript-rendered pages.
+ - **google_search**: Search Google and return structured SERP results as JSON. Includes organic results, knowledge graph, local businesses, news, People Also Ask, and more.
+
+ ## Installation
+
+ ### Quick Install (Recommended)
+
+ Run this command in your terminal:
+
+ ```bash
+ claude mcp add-json scrape-do --scope user '{
+   "type": "stdio",
+   "command": "npx",
+   "args": ["-y", "scrape-do-mcp"],
+   "env": {
+     "SCRAPE_DO_TOKEN": "YOUR_TOKEN"
+   }
+ }'
+ ```
+
+ Replace `YOUR_TOKEN` with the API token you get at https://app.scrape.do.
+
+ ### Claude Desktop
+
+ Add to your `~/.claude.json`:
+
+ ```json
+ {
+   "mcpServers": {
+     "scrape-do": {
+       "command": "npx",
+       "args": ["-y", "scrape-do-mcp"],
+       "env": {
+         "SCRAPE_DO_TOKEN": "YOUR_TOKEN"
+       }
+     }
+   }
+ }
+ ```
+
+ Get a free API token at: https://app.scrape.do
+
+ ## Usage
+
+ ### scrape_url
+
+ Scrape any webpage and get its content as Markdown.
+
+ ```typescript
+ // Parameters
+ {
+   url: string,                // URL to scrape
+   render_js?: boolean,        // render JavaScript (default false)
+   super_proxy?: boolean,      // use residential proxies (costs 10 credits, default false)
+   output?: "markdown" | "raw" // output format (default markdown)
+ }
+ ```
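The compiled implementation of `scrape_url` is not included in this diff, so the following is an illustration only: assuming the tool forwards these options to Scrape.do's `token`, `url`, `render`, `super`, and `output` query parameters (names taken from Scrape.do's public API, not from this package), the request URL it builds can be sketched as:

```typescript
// Sketch of the request-URL construction scrape_url presumably performs.
// The parameter names are an assumption based on Scrape.do's public API;
// the package's actual code is compiled out of view in this diff.
const SCRAPE_API_BASE = "https://api.scrape.do";

function buildScrapeUrl(
  token: string,
  target: string,
  opts: { render_js?: boolean; super_proxy?: boolean; output?: "markdown" | "raw" } = {}
): string {
  const params = new URLSearchParams({ token, url: target });
  if (opts.render_js) params.set("render", "true");   // JS rendering costs extra headless-browser work
  if (opts.super_proxy) params.set("super", "true");  // residential proxies, 10 credits per request
  if ((opts.output ?? "markdown") === "markdown") params.set("output", "markdown");
  return `${SCRAPE_API_BASE}/?${params.toString()}`;
}

// buildScrapeUrl("TOKEN", "https://example.com")
//   → "https://api.scrape.do/?token=TOKEN&url=https%3A%2F%2Fexample.com&output=markdown"
```

Note that the target URL must be percent-encoded inside the query string; `URLSearchParams` handles that automatically.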
+
+ ### google_search
+
+ Search Google and get structured results.
+
+ ```typescript
+ // Parameters
+ {
+   query: string,     // search query
+   country?: string,  // country code (default "us")
+   language?: string, // interface language (default "en")
+   page?: number,     // page number (default 1)
+   time_period?: "" | "last_hour" | "last_day" | "last_week" | "last_month" | "last_year",
+   device?: "desktop" | "mobile" // device type (default desktop)
+ }
+ ```
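How `google_search` forwards these filters to Scrape.do's SERP endpoint is compiled out of view in this diff; the hypothetical helper below simply shows the conventional Google query-string equivalents (`gl`, `hl`, `start`, `tbs=qdr:*`) that such filters map onto:

```typescript
// Illustrative mapping of google_search arguments onto Google's own SERP
// query parameters. This is not the package's code; it shows what the
// country/language/page/time_period filters correspond to.
type TimePeriod = "" | "last_hour" | "last_day" | "last_week" | "last_month" | "last_year";

const TBS: Record<Exclude<TimePeriod, "">, string> = {
  last_hour: "qdr:h",
  last_day: "qdr:d",
  last_week: "qdr:w",
  last_month: "qdr:m",
  last_year: "qdr:y",
};

function googleParams(
  query: string,
  { country = "us", language = "en", page = 1, time_period = "" as TimePeriod } = {}
): URLSearchParams {
  const params = new URLSearchParams({ q: query, gl: country, hl: language });
  if (page > 1) params.set("start", String((page - 1) * 10)); // Google paginates in steps of 10
  if (time_period) params.set("tbs", TBS[time_period]);       // time-filter syntax, e.g. qdr:w
  return params;
}
```

For example, the "AI news in Chinese, from China, last week" prompt below corresponds to `gl=cn`, `hl=zh-CN`, `tbs=qdr:w`.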
+
+ ## Usage Examples
+
+ ### Scrape a Website
+ ```
+ Please scrape https://github.com and give me the main content as markdown.
+ ```
+
+ ### Search Google
+ ```
+ Search for "best Python web frameworks 2026" and return the top 5 results.
+ ```
+
+ ### Search with Filters
+ ```
+ Search for "AI news" in Chinese, restricted to China, from the last week.
+ ```
+
+ ### JavaScript Rendering
+ ```
+ Scrape this React single-page application: https://example-spa.com
+ Use render_js=true to get the fully rendered content.
+ ```
+
+ ### Get Raw HTML
+ ```
+ Scrape https://example.com and return raw HTML instead of markdown.
+ ```
+
+ ## Comparison with Alternatives
+
+ | Feature | scrape-do-mcp | Firecrawl | Browserbase |
+ |---------|---------------|-----------|-------------|
+ | Google Search | ✅ | ❌ | ❌ |
+ | Free Credits | 1,000 | 500 | None |
+ | Pricing | Pay per use | $19+/mo | $15+/mo |
+ | MCP Native | ✅ | ✅ | ❌ |
+ | Setup Required | None | API key | API key + browser |
+
+ ### Why scrape-do-mcp?
+
+ - **Zero setup**: get a token and start using it immediately
+ - **All-in-one**: web scraping and Google search in a single MCP
+ - **Anti-bot bypass**: automatically handles Cloudflare, WAFs, CAPTCHAs
+ - **Cost-effective**: pay as you go, with a free tier available
+
+ ## Credit Usage
+
+ | Tool | Credit Cost |
+ |------|-------------|
+ | scrape_url (standard) | 1 credit/request |
+ | scrape_url (super_proxy) | 10 credits/request |
+ | google_search | 1 credit/request |
+
+ Sign up and get **1,000 free credits**: https://app.scrape.do
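The credit schedule in the table above is easy to capture in a small helper when budgeting multi-step jobs; this is a reader's sketch, not code shipped in the package:

```typescript
// Credit cost per call, per the table above: 1 for a standard scrape or a
// Google search, 10 when super_proxy (residential proxies) is enabled.
function creditCost(tool: "scrape_url" | "google_search", superProxy = false): number {
  return tool === "scrape_url" && superProxy ? 10 : 1;
}

// Example budget: 5 standard scrapes + 2 super-proxy scrapes + 3 searches
const total =
  5 * creditCost("scrape_url") +
  2 * creditCost("scrape_url", true) +
  3 * creditCost("google_search");
// total === 28 credits
```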
+
+ ## Development
+
+ ```bash
+ npm install
+ npm run build
+ npm run dev  # run in development mode
+ ```
+
+ ## License
+
+ MIT
package/README.md CHANGED
@@ -9,7 +9,24 @@ MCP Server for Scrape.do - Web Scraping & Google Search with anti-bot bypass
 
  ## Installation
 
- ### Claude Code / Claude Desktop
+ ### Quick Install (Recommended)
+
+ Run this command in your terminal:
+
+ ```bash
+ claude mcp add-json scrape-do --scope user '{
+   "type": "stdio",
+   "command": "npx",
+   "args": ["-y", "scrape-do-mcp"],
+   "env": {
+     "SCRAPE_DO_TOKEN": "YOUR_TOKEN_HERE"
+   }
+ }'
+ ```
+
+ Replace `YOUR_TOKEN_HERE` with your Scrape.do API token from https://app.scrape.do
+
+ ### Claude Desktop
 
  Add to your `~/.claude.json`:
 
@@ -29,29 +46,12 @@ Add to your `~/.claude.json`:
 
  Get your free API token at: https://app.scrape.do
 
- ### Smithery.ai
-
- Published on [Smithery.ai](https://smithery.ai) - Search for "scrape-do" to install.
-
- ### HTTP Server Mode
-
- The server supports both STDIO and HTTP modes:
-
- - **STDIO mode** (default): For local Claude Code / Claude Desktop usage
- - **HTTP mode**: For Smithery hosting or custom HTTP deployment
-
- ```bash
- # HTTP mode
- TRANSPORT=http PORT=3000 SCRAPE_DO_TOKEN=your_token npm start
-
- # Health check
- curl http://localhost:3000/health
- ```
-
  ## Usage
 
  ### scrape_url
 
+ Scrape any webpage and get content as Markdown.
+
  ```typescript
  // Parameters
  {
@@ -64,6 +64,8 @@ curl http://localhost:3000/health
 
  ### google_search
 
+ Search Google and get structured results.
+
  ```typescript
  // Parameters
  {
@@ -76,6 +78,53 @@ curl http://localhost:3000/health
  }
  ```
 
+ ## Example Prompts
+
+ Here are some prompts you can use to invoke the tools:
+
+ ### Scrape a Website
+ ```
+ Please scrape https://github.com and give me the main content as markdown.
+ ```
+
+ ### Search Google
+ ```
+ Search Google for "best Python web frameworks 2026" and return the top 5 results.
+ ```
+
+ ### Search with Filters
+ ```
+ Search for "AI news" in Chinese, from China, last week.
+ ```
+
+ ### JavaScript Rendering
+ ```
+ Scrape this React Single Page Application: https://example-spa.com
+ Use render_js=true to get the fully rendered content.
+ ```
+
+ ### Get Raw HTML
+ ```
+ Scrape https://example.com and return raw HTML instead of markdown.
+ ```
+
+ ## Comparison with Alternatives
+
+ | Feature | scrape-do-mcp | Firecrawl | Browserbase |
+ |---------|--------------|-----------|-------------|
+ | Google Search | ✅ | ❌ | ❌ |
+ | Free Credits | 1,000 | 500 | None |
+ | Pricing | Pay per use | $19+/mo | $15+/mo |
+ | MCP Native | ✅ | ✅ | ❌ |
+ | Setup Required | None | API key | API key + browser |
+
+ ### Why scrape-do-mcp?
+
+ - **Zero setup**: Just get a token and use immediately
+ - **All-in-one**: Both web scraping AND Google search in one MCP
+ - **Anti-bot bypass**: Automatically handles Cloudflare, WAFs, CAPTCHAs
+ - **Cost-effective**: Pay only for what you use, free tier available
+
  ## Credit Usage
 
  | Tool | Credit Cost |
package/dist/index.js CHANGED
@@ -6,16 +6,13 @@ var __importDefault = (this && this.__importDefault) || function (mod) {
  Object.defineProperty(exports, "__esModule", { value: true });
  const mcp_js_1 = require("@modelcontextprotocol/sdk/server/mcp.js");
  const stdio_js_1 = require("@modelcontextprotocol/sdk/server/stdio.js");
- const streamableHttp_js_1 = require("@modelcontextprotocol/sdk/server/streamableHttp.js");
  const zod_1 = require("zod");
  const axios_1 = __importDefault(require("axios"));
- const http_1 = __importDefault(require("http"));
  const SCRAPE_DO_TOKEN = process.env.SCRAPE_DO_TOKEN || "";
  const SCRAPE_API_BASE = "https://api.scrape.do";
- const HTTP_PORT = process.env.PORT || process.env.HTTP_PORT || 3000;
  const server = new mcp_js_1.McpServer({
      name: "scrape-do-mcp",
-     version: "0.1.1",
+     version: "0.1.3",
  });
  // ─── Tool 1: scrape_url ──────────────────────────────────────────────────────
  server.tool("scrape_url", "Scrape any webpage and return its content as Markdown. Automatically bypasses Cloudflare, WAFs, CAPTCHAs, and anti-bot protection. Supports JavaScript-rendered pages.", {
@@ -97,47 +94,7 @@ server.tool("google_search", "Search Google and return structured SERP results a
  });
  // ─── Start Server ────────────────────────────────────────────────────────────
  async function main() {
-     const transportMode = process.env.TRANSPORT || "stdio";
-     if (transportMode === "http" || transportMode === "streamable-http") {
-         console.error(`Starting Streamable HTTP server on port ${HTTP_PORT}...`);
-         const transport = new streamableHttp_js_1.StreamableHTTPServerTransport({
-             sessionIdGenerator: () => Math.random().toString(36).substring(2, 15),
-         });
-         await server.connect(transport);
-         const serverInstance = http_1.default.createServer();
-         serverInstance.on("request", async (req, res) => {
-             // Handle CORS
-             res.setHeader("Access-Control-Allow-Origin", "*");
-             res.setHeader("Access-Control-Allow-Methods", "GET, POST, OPTIONS");
-             res.setHeader("Access-Control-Allow-Headers", "Content-Type");
-             if (req.method === "OPTIONS") {
-                 res.writeHead(204);
-                 res.end();
-                 return;
-             }
-             // Health check
-             if (req.url === "/health") {
-                 res.writeHead(200, { "Content-Type": "application/json" });
-                 res.end(JSON.stringify({ status: "ok", name: "scrape-do-mcp", version: "0.1.1" }));
-                 return;
-             }
-             // MCP endpoint
-             if (req.url === "/" || req.url?.startsWith("/mcp")) {
-                 await transport.handleRequest(req, res);
-                 return;
-             }
-             res.writeHead(404);
-             res.end("Not found");
-         });
-         serverInstance.listen(parseInt(String(HTTP_PORT), 10), () => {
-             console.error(`MCP server running on http://localhost:${HTTP_PORT}`);
-         });
-     }
-     else {
-         // Default to stdio mode
-         console.error("Starting STDIO server...");
-         const transport = new stdio_js_1.StdioServerTransport();
-         await server.connect(transport);
-     }
+     const transport = new stdio_js_1.StdioServerTransport();
+     await server.connect(transport);
  }
  main().catch(console.error);
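For readers of the compiled output above: the `__importDefault` shim in the first hunk header is tsc's standard CommonJS interop helper for default imports such as `import axios from "axios"`. A minimal reproduction of its behavior:

```typescript
// What tsc's __importDefault helper does: pass true ES modules through
// unchanged, and wrap plain CommonJS exports so `.default` always exists.
function importDefault(mod: any): { default: any } {
  return mod && mod.__esModule ? mod : { default: mod };
}

// dist/index.js then uses it as:
//   const axios_1 = __importDefault(require("axios"));
// so axios_1.default is the axios instance in both module systems.
```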
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "scrape-do-mcp",
-   "version": "0.1.2",
+   "version": "0.1.4",
    "description": "MCP Server for Scrape.do - Web Scraping & Google Search with anti-bot bypass",
    "main": "dist/index.js",
    "bin": {
@@ -25,7 +25,8 @@
    },
    "files": [
      "dist",
-     "README.md"
+     "README.md",
+     "README-ZH.md"
    ],
    "engines": {
      "node": ">=18.0.0"