scrape-do-mcp 0.2.0 → 0.4.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/LICENSE +21 -0
- package/README-ZH.md +64 -171
- package/README.md +63 -172
- package/dist/index.js +935 -165
- package/package.json +6 -2
package/LICENSE
ADDED
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 Abel
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
package/README-ZH.md
CHANGED
@@ -2,25 +2,44 @@

 [English Docs](./README.md) | 中文文档

-Scrape.do
-
-
-
-
-
-
-
-
-
-
-
-
+这是一个把 Scrape.do 官方文档中主要 API 能力封装成 MCP 工具的包:主抓取 API、Google Search API、Amazon Scraper API、Async API,以及 Proxy Mode 配置辅助工具。
+
+官方文档:https://scrape.do/documentation/
+
+## 覆盖范围
+
+- `scrape_url`:主 Scrape.do 抓取 API,支持 JS 渲染、地理定位、会话保持、截图、ReturnJSON、浏览器交互、Cookie、Header 转发。
+- `google_search`:结构化 Google 搜索 API,支持 `google_domain`、`location`、`uule`、`lr`、`cr`、`safe`、`nfpr`、`filter`、分页、原始 HTML。
+- `amazon_product`:Amazon PDP 接口。
+- `amazon_offer_listing`:Amazon 卖家报价接口。
+- `amazon_search`:Amazon 搜索 / 类目结果接口。
+- `amazon_raw_html`:Amazon 原始 HTML 接口。
+- `async_create_job`、`async_get_job`、`async_get_task`、`async_list_jobs`、`async_cancel_job`、`async_get_account`:Async API,并同时兼容 MCP 风格字段和官方字段名。
+- `proxy_mode_config`:生成更贴近官方文档的 Proxy Mode 连接信息、默认参数串和证书信息。
+
+## 兼容性说明
+
+- `scrape_url` 同时支持 MCP 友好的别名和官方参数名:
+  - `render_js` 或 `render`
+  - `super_proxy` 或 `super`
+  - `screenshot` 或 `screenShot`
+- `google_search` 同时支持:
+  - `query` 或 `q`
+  - `country` 或 `gl`
+  - `language` 或 `hl`
+  - `domain` 或 `google_domain`
+  - `includeHtml` 或 `include_html`
+- `async_create_job` 同时接受 `targets`、`render`、`webhookUrl` 这类别名,以及官方字段 `Targets`、`Render`、`WebhookURL`。
+- `async_get_job`、`async_get_task`、`async_cancel_job` 同时接受 `jobId` / `taskId` 和官方 `jobID` / `taskID`。
+- `async_list_jobs` 同时支持 `pageSize` 和官方 `page_size`。
+- `scrape_url` 里的 Header 转发请使用 `headers` + `header_mode`(`custom` / `extra` / `forward`)。
+- 截图结果会保留官方 JSON 响应,同时附加 MCP 图片内容,尽量兼顾官方格式和 MCP 可视化体验。
+- `scrape_url` 现在默认使用 `output="raw"`,更贴近官方 API。
+- `scrape_url` 会在 `structuredContent` 里附带响应元数据,便于在 MCP 中查看 `pureCookies`、`transparentResponse` 和二进制响应信息。

 ## 安装

-###
-
-在终端中运行以下命令:
+### 快速安装

 ```bash
 claude mcp add-json scrape-do --scope user '{
@@ -28,13 +47,11 @@ claude mcp add-json scrape-do --scope user '{
   "command": "npx",
   "args": ["-y", "scrape-do-mcp"],
   "env": {
-    "SCRAPE_DO_TOKEN": "
+    "SCRAPE_DO_TOKEN": "YOUR_TOKEN_HERE"
   }
 }'
 ```

-将 `你的Token` 替换为你在 https://app.scrape.do 获取的 API Token。
-
 ### Claude Desktop

 添加到 `~/.claude.json`:
@@ -46,181 +63,57 @@ claude mcp add-json scrape-do --scope user '{
       "command": "npx",
       "args": ["-y", "scrape-do-mcp"],
       "env": {
-        "SCRAPE_DO_TOKEN": "
+        "SCRAPE_DO_TOKEN": "YOUR_TOKEN_HERE"
       }
     }
   }
 }
 ```

-
-
-## 使用方法
-
-### scrape_url
-
-抓取任意网页并获取 Markdown 内容。
-
-```typescript
-// 完整参数
-{
-  // 必需
-  url: string, // 要抓取的网址
-
-  // 代理和渲染
-  render_js?: boolean, // 渲染 JavaScript(默认 false)
-  super_proxy?: boolean, // 使用住宅/移动代理(消耗 10 积分)
-  geoCode?: string, // 国家代码(如 'us', 'cn', 'gb')
-  regionalGeoCode?: string, // 区域(如 'asia', 'europe')
-  device?: "desktop" | "mobile" | "tablet", // 设备类型
-  sessionId?: number, // 保持相同 IP 的会话
-
-  // 超时和重试
-  timeout?: number, // 最大超时时间(毫秒,默认 60000)
-  retryTimeout?: number, // 重试超时(毫秒)
-  disableRetry?: boolean, // 禁用自动重试
-
-  // 输出格式
-  output?: "markdown" | "raw", // 输出格式(默认 markdown)
-  returnJSON?: boolean, // 以 JSON 形式返回网络请求
-  transparentResponse?: boolean, // 返回原始响应
-
-  // 截图
-  screenshot?: boolean, // 截图(PNG)
-  fullScreenShot?: boolean, // 全页截图
-  particularScreenShot?: string, // 元素截图(CSS 选择器)
-
-  // 浏览器控制
-  waitSelector?: string, // 等待元素(CSS 选择器)
-  customWait?: number, // 加载后等待时间(毫秒)
-  waitUntil?: "domcontentloaded" | "load" | "networkidle" | "networkidle0" | "networkidle2",
-  width?: number, // 视口宽度(默认 1920)
-  height?: number, // 视口高度(默认 1080)
-  blockResources?: boolean, // 阻止 CSS/图片/字体(默认 true)
-
-  // 请求头和 Cookie
-  customHeaders?: boolean, // 处理所有请求头
-  extraHeaders?: boolean, // 添加额外请求头
-  forwardHeaders?: boolean, // 转发你的请求头
-  setCookies?: string, // 设置 Cookie(格式:'name=value; name2=value2')
-  pureCookies?: boolean, // 返回原始 Cookie
-
-  // 其他
-  disableRedirection?: boolean, // 禁用重定向
-  callback?: string // Webhook URL 异步接收结果
-}
-```
-
-### google_search
-
-搜索 Google 并获取结构化结果。
-
-```typescript
-// 完整参数
-{
-  // 必需
-  query: string, // 搜索关键词
-
-  // 搜索选项
-  country?: string, // 国家代码(默认 'us')
-  language?: string, // 界面语言(默认 'en')
-  domain?: string, // Google 域名(如 'com', 'co.uk')
-  page?: number, // 页码(默认 1)
-  num?: number, // 每页结果数(默认 10)
-  time_period?: "" | "last_hour" | "last_day" | "last_week" | "last_month" | "last_year",
-  device?: "desktop" | "mobile", // 设备类型
-
-  // 高级
-  includeHtml?: boolean // 在响应中包含原始 HTML
-}
-```
-
-## 使用示例
-
-### 抓取网页
-```
-请抓取 https://github.com 并给我主要内容(Markdown 格式)。
-```
-
-### Google 搜索
-```
-搜索 "2026 年最佳 Python Web 框架",返回前 5 个结果。
-```
+Token 获取地址:https://app.scrape.do

-
-```
-用中文搜索 "AI 新闻",限定为中国,过去一周的内容。
-```
+## 可用工具

-
-
-
-
-
+| 工具 | 用途 |
+|------|------|
+| `scrape_url` | 主 Scrape.do 抓取 API |
+| `google_search` | 结构化 Google 搜索结果 |
+| `amazon_product` | Amazon PDP 结构化数据 |
+| `amazon_offer_listing` | Amazon 全量卖家报价 |
+| `amazon_search` | Amazon 搜索 / 类目结果 |
+| `amazon_raw_html` | Amazon 原始 HTML |
+| `async_create_job` | 创建 Async API 任务 |
+| `async_get_job` | 查询 Async job 详情 |
+| `async_get_task` | 查询 Async task 详情 |
+| `async_list_jobs` | 列出 Async jobs |
+| `async_cancel_job` | 取消 Async job |
+| `async_get_account` | 查询 Async 账户 / 并发信息 |
+| `proxy_mode_config` | 生成 Proxy Mode 配置 |

-
-```
-抓取 https://example.com 并返回原始 HTML 而不是 markdown。
-```
+## 示例提示词

-
-
-用日本(geoCode: jp)的 IP 抓取 https://www.amazon.com/product/12345
+```text
+抓取 https://example.com,开启 render=true,并等待 #app 出现。
 ```

-
-
-用移动设备抓取 https://example.com 来查看移动版页面。
+```text
+搜索 "open source MCP servers",并设置 google_domain=google.co.uk 与 lr=lang_en。
 ```

-
-
-截取 https://example.com 的屏幕截图并返回图片。
-```
-
-### 等待元素加载
-```
-抓取 https://example.com 但先等待 id 为 "content" 的元素加载完成。
+```text
+获取 Amazon ASIN B0C7BKZ883 在美国 zipcode=10001 下的 PDP 数据。
 ```

-
-
-使用会话 ID 12345 抓取 https://example.com 的多个页面,以保持相同的 IP。
+```text
+帮我为这 20 个 URL 创建一个异步抓取任务,并返回 job ID。
 ```

-## 与其他工具对比
-
-| 功能 | scrape-do-mcp | Firecrawl | Browserbase |
-|------|--------------|-----------|-------------|
-| Google 搜索 | ✅ | ❌ | ❌ |
-| 免费积分 | 1,000 | 500 | 无 |
-| 价格 | 按量付费 | $19+/月 | $15+/月 |
-| MCP 原生 | ✅ | ✅ | ❌ |
-| 配置难度 | 无需配置 | 需要 API key | 需要 API key + 浏览器 |
-
-### 为什么选择 scrape-do-mcp?
-
-- **零配置**:获取 Token 后即可立即使用
-- **一体化**:网页抓取和 Google 搜索集于一个 MCP
-- **反爬虫绕过**:自动处理 Cloudflare、WAF、CAPTCHA
-- **成本效益**:按需付费,免费额度可用
-
-## 积分消耗
-
-| 工具 | 积分消耗 |
-|------|---------|
-| scrape_url(普通) | 1 积分/次 |
-| scrape_url(super_proxy) | 10 积分/次 |
-| google_search | 1 积分/次 |
-
-**免费:每月 1,000 积分** - 无需信用卡:https://app.scrape.do
-
 ## 开发

 ```bash
 npm install
 npm run build
-npm run dev
+npm run dev
 ```

 ## 许可证
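The alias handling both READMEs describe for `scrape_url` (`render_js` vs `render`, `super_proxy` vs `super`, `screenshot` vs `screenShot`) can be made concrete with a short sketch. This is a hypothetical helper, not code from the package: the `buildScrapeUrl` name is invented for illustration, and it assumes Scrape.do's query-string endpoint at `https://api.scrape.do/` as described in the official documentation.

```typescript
// Hypothetical helper (not from scrape-do-mcp): rewrites the MCP-friendly
// aliases the README lists into official Scrape.do parameter names, then
// builds a query-string request URL.
const PARAM_ALIASES: Record<string, string> = {
  render_js: "render",      // render_js   -> render
  super_proxy: "super",     // super_proxy -> super
  screenshot: "screenShot", // screenshot  -> screenShot
};

function buildScrapeUrl(
  token: string,
  args: Record<string, string | number | boolean>,
): string {
  const params = new URLSearchParams({ token });
  for (const [key, value] of Object.entries(args)) {
    // Official names pass through unchanged; aliases are rewritten.
    params.set(PARAM_ALIASES[key] ?? key, String(value));
  }
  return `https://api.scrape.do/?${params.toString()}`;
}

console.log(
  buildScrapeUrl("YOUR_TOKEN_HERE", {
    url: "https://example.com",
    render_js: true,
    waitSelector: "#app",
  }),
);
```

The server in `package/dist/index.js` presumably performs an equivalent mapping before calling the API; the point of the sketch is only that either spelling reaches Scrape.do under its official name.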
package/README.md
CHANGED
@@ -2,25 +2,44 @@

 [中文文档](./README-ZH.md) | English

-MCP
-
-
-
-
-
-
-
-
-
-
-
-
+An MCP server that wraps Scrape.do's documented APIs in one package: the main scraping API, Google Search API, Amazon Scraper API, Async API, and a Proxy Mode configuration helper.
+
+Official docs: https://scrape.do/documentation/
+
+## Coverage
+
+- `scrape_url`: Main Scrape.do API with JS rendering, geo-targeting, session persistence, screenshots, ReturnJSON, browser interactions, cookies, and header forwarding.
+- `google_search`: Structured Google SERP API with `google_domain`, `location`, `uule`, `lr`, `cr`, `safe`, `nfpr`, `filter`, pagination, and optional raw HTML.
+- `amazon_product`: Amazon PDP endpoint.
+- `amazon_offer_listing`: Amazon offer listing endpoint.
+- `amazon_search`: Amazon search/category endpoint.
+- `amazon_raw_html`: Raw HTML Amazon endpoint with geo-targeting.
+- `async_create_job`, `async_get_job`, `async_get_task`, `async_list_jobs`, `async_cancel_job`, `async_get_account`: Async API coverage with both MCP-friendly aliases and official field names.
+- `proxy_mode_config`: Builds official Proxy Mode connection details, default parameter strings, and CA certificate references.
+
+## Compatibility Notes
+
+- `scrape_url` supports both MCP-friendly aliases and official parameter names:
+  - `render_js` or `render`
+  - `super_proxy` or `super`
+  - `screenshot` or `screenShot`
+- `google_search` supports:
+  - `query` or `q`
+  - `country` or `gl`
+  - `language` or `hl`
+  - `domain` or `google_domain`
+  - `includeHtml` or `include_html`
+- `async_create_job` accepts both alias fields like `targets`, `render`, `webhookUrl` and official Async API fields like `Targets`, `Render`, `WebhookURL`.
+- `async_get_job`, `async_get_task`, and `async_cancel_job` accept both `jobId`/`taskId` and official `jobID`/`taskID`.
+- `async_list_jobs` accepts both `pageSize` and official `page_size`.
+- For header forwarding in `scrape_url`, pass `headers` plus `header_mode` (`custom`, `extra`, or `forward`).
+- Screenshot responses preserve the official Scrape.do JSON body and also attach MCP image content when screenshots are present.
+- `scrape_url` now defaults to `output="raw"` to match the official API more closely.
+- `scrape_url` includes response metadata in `structuredContent`, which helps surface `pureCookies`, `transparentResponse`, and binary responses inside MCP.

 ## Installation

-### Quick Install
-
-Run this command in your terminal:
+### Quick Install

 ```bash
 claude mcp add-json scrape-do --scope user '{
@@ -33,11 +52,9 @@ claude mcp add-json scrape-do --scope user '{
 }'
 ```

-Replace `YOUR_TOKEN_HERE` with your Scrape.do API token from https://app.scrape.do
-
 ### Claude Desktop

-Add to your `~/.claude.json`:
+Add this to `~/.claude.json`:

 ```json
 {
@@ -46,183 +63,57 @@ Add to your `~/.claude.json`:
       "command": "npx",
       "args": ["-y", "scrape-do-mcp"],
       "env": {
-        "SCRAPE_DO_TOKEN": "
+        "SCRAPE_DO_TOKEN": "YOUR_TOKEN_HERE"
       }
     }
   }
 }
 ```

-Get your
+Get your token at https://app.scrape.do

-##
-
-### scrape_url
-
-Scrape any webpage and get content as Markdown.
-
-```typescript
-// All Parameters
-{
-  // Required
-  url: string, // Target URL to scrape
-
-  // Proxy & Rendering
-  render_js?: boolean, // Render JavaScript (default: false)
-  super_proxy?: boolean, // Use residential/mobile proxies (costs 10 credits)
-  geoCode?: string, // Country code (e.g., 'us', 'cn', 'gb')
-  regionalGeoCode?: string, // Region (e.g., 'asia', 'europe')
-  device?: "desktop" | "mobile" | "tablet", // Device type
-  sessionId?: number, // Keep same IP for session
-
-  // Timeout & Retry
-  timeout?: number, // Max timeout in ms (default: 60000)
-  retryTimeout?: number, // Retry timeout in ms
-  disableRetry?: boolean, // Disable auto retry
-
-  // Output Format
-  output?: "markdown" | "raw", // Output format (default: markdown)
-  returnJSON?: boolean, // Return network requests as JSON
-  transparentResponse?: boolean, // Return pure response
-
-  // Screenshot
-  screenshot?: boolean, // Take screenshot (PNG)
-  fullScreenShot?: boolean, // Full page screenshot
-  particularScreenShot?: string, // Screenshot of element (CSS selector)
-
-  // Browser Control
-  waitSelector?: string, // Wait for element (CSS selector)
-  customWait?: number, // Wait time after load (ms)
-  waitUntil?: "domcontentloaded" | "load" | "networkidle" | "networkidle0" | "networkidle2",
-  width?: number, // Viewport width (default: 1920)
-  height?: number, // Viewport height (default: 1080)
-  blockResources?: boolean, // Block CSS/images/fonts (default: true)
-
-  // Headers & Cookies
-  customHeaders?: boolean, // Handle all headers
-  extraHeaders?: boolean, // Add extra headers
-  forwardHeaders?: boolean, // Forward your headers
-  setCookies?: string, // Set cookies ('name=value; name2=value2')
-  pureCookies?: boolean, // Return original cookies
-
-  // Other
-  disableRedirection?: boolean, // Disable redirect
-  callback?: string // Webhook URL for async results
-}
-```
-
-### google_search
-
-Search Google and get structured results.
+## Available Tools

-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-  // Advanced
-  includeHtml?: boolean // Include raw HTML in response
-}
-```
+| Tool | Purpose |
+|------|---------|
+| `scrape_url` | Main Scrape.do scraping API wrapper |
+| `google_search` | Structured Google search results |
+| `amazon_product` | Amazon PDP structured data |
+| `amazon_offer_listing` | Amazon seller offers |
+| `amazon_search` | Amazon keyword/category results |
+| `amazon_raw_html` | Raw Amazon HTML with geo-targeting |
+| `async_create_job` | Create Async API jobs |
+| `async_get_job` | Fetch Async job details |
+| `async_get_task` | Fetch Async task details |
+| `async_list_jobs` | List Async jobs |
+| `async_cancel_job` | Cancel Async jobs |
+| `async_get_account` | Fetch Async account/concurrency info |
+| `proxy_mode_config` | Generate Proxy Mode configuration |

 ## Example Prompts

-
-
-### Scrape a Website
-```
-Please scrape https://github.com and give me the main content as markdown.
+```text
+Scrape https://example.com with render=true and wait for #app.
 ```

-
-
-Search Google for "best Python web frameworks 2026" and return the top 5 results.
+```text
+Search Google for "open source MCP servers" with google_domain=google.co.uk and lr=lang_en.
 ```

-
-
-Search for "AI news" in Chinese, from China, last week.
+```text
+Get the Amazon PDP for ASIN B0C7BKZ883 in the US with zipcode 10001.
 ```

-
-
-Scrape this React Single Page Application: https://example-spa.com
-Use render_js=true to get the fully rendered content.
+```text
+Create an async job for these 20 URLs and give me the job ID.
 ```

-### Get Raw HTML
-```
-Scrape https://example.com and return raw HTML instead of markdown.
-```
-
-### Geo-targeting
-```
-Scrape https://www.amazon.com/product/12345 as if I'm in Japan (geoCode: jp)
-```
-
-### Mobile Device
-```
-Scrape https://example.com using a mobile device to see the mobile version.
-```
-
-### Take Screenshot
-```
-Take a screenshot of https://example.com and return the image.
-```
-
-### Wait for Element
-```
-Scrape https://example.com but wait for the element with id "content" to load first.
-```
-
-### Session Persistence
-```
-Scrape multiple pages of https://example.com using sessionId 12345 to maintain the same IP.
-```
-
-## Comparison with Alternatives
-
-| Feature | scrape-do-mcp | Firecrawl | Browserbase |
-|---------|--------------|-----------|-------------|
-| Google Search | ✅ | ❌ | ❌ |
-| Free Credits | 1,000 | 500 | None |
-| Pricing | Pay per use | $19+/mo | $15+/mo |
-| MCP Native | ✅ | ✅ | ❌ |
-| Setup Required | None | API key | API key + browser |
-
-### Why scrape-do-mcp?
-
-- **Zero setup**: Just get a token and use immediately
-- **All-in-one**: Both web scraping AND Google search in one MCP
-- **Anti-bot bypass**: Automatically handles Cloudflare, WAFs, CAPTCHAs
-- **Cost-effective**: Pay only for what you use, free tier available
-
-## Credit Usage
-
-| Tool | Credit Cost |
-|------|-------------|
-| scrape_url (regular) | 1 credit/request |
-| scrape_url (super_proxy) | 10 credits/request |
-| google_search | 1 credit/request |
-
-**Free: 1,000 credits/month** - No credit card required: https://app.scrape.do
-
 ## Development

 ```bash
 npm install
 npm run build
-npm run dev
+npm run dev
 ```

 ## License
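The dual field-name handling the README describes for the Async tools (alias `targets`/`render`/`webhookUrl`/`pageSize` vs official `Targets`/`Render`/`WebhookURL`/`page_size`, and `jobId`/`taskId` vs `jobID`/`taskID`) can be sketched as a small normalizer. This is a hypothetical illustration only: the `toOfficialFields` helper and its precedence rule are assumptions, not the package's actual implementation.

```typescript
// Hypothetical normalizer (not from scrape-do-mcp): rewrites the alias
// fields the README lists into the official Async API spellings.
const FIELD_MAP: Record<string, string> = {
  targets: "Targets",
  render: "Render",
  webhookUrl: "WebhookURL",
  jobId: "jobID",
  taskId: "taskID",
  pageSize: "page_size",
};

function toOfficialFields(
  input: Record<string, unknown>,
): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(input)) {
    const official = FIELD_MAP[key] ?? key;
    // If both spellings were supplied, let the official one win.
    if (!(official in out) || official === key) {
      out[official] = value;
    }
  }
  return out;
}

console.log(
  toOfficialFields({
    targets: ["https://example.com"],
    render: true,
    webhookUrl: "https://hooks.example.invalid/cb",
  }),
);
```

A normalizer like this keeps the MCP tool schemas friendly to model-generated arguments while still emitting the exact field names the official Async API expects.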