scrape-do-mcp 0.2.0 → 0.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 Abel
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
package/README-ZH.md CHANGED
@@ -2,25 +2,44 @@
 
  [English Docs](./README.md) | 中文文档
 
- Scrape.do 网页抓取和 Google 搜索 MCP 服务器 - 支持反机器人保护
-
- ## 功能特点
-
- - **scrape_url**: 抓取任意网页并返回 Markdown 格式内容。自动绕过 Cloudflare、WAF、CAPTCHA 和反爬虫保护。支持 JavaScript 渲染、截图、地理定位(150+ 国家)、设备模拟、会话保持、自定义请求头/Cookie、超时控制等。
- - **google_search**: 搜索 Google 并返回结构化的 SERP 结果 JSON。包含自然搜索结果、知识图谱、本地商家、新闻、相关问题等。支持地理定位和设备筛选。
-
- ## 可用工具
-
- | 工具 | 描述 |
- |------|------|
- | `scrape_url` | 全功能网页抓取,反机器人绕过。支持:JavaScript 渲染、截图(PNG)、地理定位(150+ 国家)、设备模拟(桌面/手机/平板)、会话保持、自定义请求头/Cookie、超时控制等。 |
- | `google_search` | Google SERP 结构化抓取,返回 JSON。支持:自然搜索结果、知识图谱、本地商家、新闻、People Also Ask、视频结果等,支持地理定位、设备筛选、时间筛选。 |
+ 这是一个把 Scrape.do 官方文档中主要 API 能力封装成 MCP 工具的包:主抓取 API、Google Search API、Amazon Scraper API、Async API,以及 Proxy Mode 配置辅助工具。
+
+ 官方文档:https://scrape.do/documentation/
+
+ ## 覆盖范围
+
+ - `scrape_url`:主 Scrape.do 抓取 API,支持 JS 渲染、地理定位、会话保持、截图、ReturnJSON、浏览器交互、Cookie、Header 转发。
+ - `google_search`:结构化 Google 搜索 API,支持 `google_domain`、`location`、`uule`、`lr`、`cr`、`safe`、`nfpr`、`filter`、分页、原始 HTML。
+ - `amazon_product`:Amazon PDP 接口。
+ - `amazon_offer_listing`:Amazon 卖家报价接口。
+ - `amazon_search`:Amazon 搜索 / 类目结果接口。
+ - `amazon_raw_html`:Amazon 原始 HTML 接口。
+ - `async_create_job`、`async_get_job`、`async_get_task`、`async_list_jobs`、`async_cancel_job`、`async_get_account`:Async API,并同时兼容 MCP 风格字段和官方字段名。
+ - `proxy_mode_config`:生成更贴近官方文档的 Proxy Mode 连接信息、默认参数串和证书信息。
+
+ ## 兼容性说明
+
+ - `scrape_url` 同时支持 MCP 友好的别名和官方参数名:
+ - `render_js` 或 `render`
+ - `super_proxy` 或 `super`
+ - `screenshot` 或 `screenShot`
+ - `google_search` 同时支持:
+ - `query` 或 `q`
+ - `country` 或 `gl`
+ - `language` 或 `hl`
+ - `domain` 或 `google_domain`
+ - `includeHtml` 或 `include_html`
+ - `async_create_job` 同时接受 `targets`、`render`、`webhookUrl` 这类别名,以及官方字段 `Targets`、`Render`、`WebhookURL`。
+ - `async_get_job`、`async_get_task`、`async_cancel_job` 同时接受 `jobId` / `taskId` 和官方 `jobID` / `taskID`。
+ - `async_list_jobs` 同时支持 `pageSize` 和官方 `page_size`。
+ - `scrape_url` 里的 Header 转发请使用 `headers` + `header_mode`(`custom` / `extra` / `forward`)。
+ - 截图结果会保留官方 JSON 响应,同时附加 MCP 图片内容,尽量兼顾官方格式和 MCP 可视化体验。
+ - `scrape_url` 现在默认使用 `output="raw"`,更贴近官方 API。
+ - `scrape_url` 会在 `structuredContent` 里附带响应元数据,便于在 MCP 中查看 `pureCookies`、`transparentResponse` 和二进制响应信息。
 
  ## 安装
 
- ### 快速安装(推荐)
-
- 在终端中运行以下命令:
+ ### 快速安装
 
  ```bash
  claude mcp add-json scrape-do --scope user '{
@@ -28,13 +47,11 @@ claude mcp add-json scrape-do --scope user '{
  "command": "npx",
  "args": ["-y", "scrape-do-mcp"],
  "env": {
- "SCRAPE_DO_TOKEN": "你的Token"
+ "SCRAPE_DO_TOKEN": "YOUR_TOKEN_HERE"
  }
  }'
  ```
 
- 将 `你的Token` 替换为你在 https://app.scrape.do 获取的 API Token。
-
  ### Claude Desktop
 
  添加到 `~/.claude.json`:
@@ -46,181 +63,57 @@ claude mcp add-json scrape-do --scope user '{
  "command": "npx",
  "args": ["-y", "scrape-do-mcp"],
  "env": {
- "SCRAPE_DO_TOKEN": "你的Token"
+ "SCRAPE_DO_TOKEN": "YOUR_TOKEN_HERE"
  }
  }
  }
  }
  ```
 
- 获取免费 API Token:https://app.scrape.do
-
- ## 使用方法
-
- ### scrape_url
-
- 抓取任意网页并获取 Markdown 内容。
-
- ```typescript
- // 完整参数
- {
- // 必需
- url: string, // 要抓取的网址
-
- // 代理和渲染
- render_js?: boolean, // 渲染 JavaScript(默认 false)
- super_proxy?: boolean, // 使用住宅/移动代理(消耗 10 积分)
- geoCode?: string, // 国家代码(如 'us', 'cn', 'gb')
- regionalGeoCode?: string, // 区域(如 'asia', 'europe')
- device?: "desktop" | "mobile" | "tablet", // 设备类型
- sessionId?: number, // 保持相同 IP 的会话
-
- // 超时和重试
- timeout?: number, // 最大超时时间(毫秒,默认 60000)
- retryTimeout?: number, // 重试超时(毫秒)
- disableRetry?: boolean, // 禁用自动重试
-
- // 输出格式
- output?: "markdown" | "raw", // 输出格式(默认 markdown)
- returnJSON?: boolean, // 以 JSON 形式返回网络请求
- transparentResponse?: boolean, // 返回原始响应
-
- // 截图
- screenshot?: boolean, // 截图(PNG)
- fullScreenShot?: boolean, // 全页截图
- particularScreenShot?: string, // 元素截图(CSS 选择器)
-
- // 浏览器控制
- waitSelector?: string, // 等待元素(CSS 选择器)
- customWait?: number, // 加载后等待时间(毫秒)
- waitUntil?: "domcontentloaded" | "load" | "networkidle" | "networkidle0" | "networkidle2",
- width?: number, // 视口宽度(默认 1920)
- height?: number, // 视口高度(默认 1080)
- blockResources?: boolean, // 阻止 CSS/图片/字体(默认 true)
-
- // 请求头和 Cookie
- customHeaders?: boolean, // 处理所有请求头
- extraHeaders?: boolean, // 添加额外请求头
- forwardHeaders?: boolean, // 转发你的请求头
- setCookies?: string, // 设置 Cookie(格式:'name=value; name2=value2')
- pureCookies?: boolean, // 返回原始 Cookie
-
- // 其他
- disableRedirection?: boolean, // 禁用重定向
- callback?: string // Webhook URL 异步接收结果
- }
- ```
-
- ### google_search
-
- 搜索 Google 并获取结构化结果。
-
- ```typescript
- // 完整参数
- {
- // 必需
- query: string, // 搜索关键词
-
- // 搜索选项
- country?: string, // 国家代码(默认 'us')
- language?: string, // 界面语言(默认 'en')
- domain?: string, // Google 域名(如 'com', 'co.uk')
- page?: number, // 页码(默认 1)
- num?: number, // 每页结果数(默认 10)
- time_period?: "" | "last_hour" | "last_day" | "last_week" | "last_month" | "last_year",
- device?: "desktop" | "mobile", // 设备类型
-
- // 高级
- includeHtml?: boolean // 在响应中包含原始 HTML
- }
- ```
-
- ## 使用示例
-
- ### 抓取网页
- ```
- 请抓取 https://github.com 并给我主要内容(Markdown 格式)。
- ```
-
- ### Google 搜索
- ```
- 搜索 "2026 年最佳 Python Web 框架",返回前 5 个结果。
- ```
+ Token 获取地址:https://app.scrape.do
 
- ### 带筛选条件的搜索
- ```
- 用中文搜索 "AI 新闻",限定为中国,过去一周的内容。
- ```
+ ## 可用工具
 
- ### JavaScript 渲染
- ```
- 抓取这个 React 单页应用:https://example-spa.com
- 使用 render_js=true 获取完整渲染内容。
- ```
+ | 工具 | 用途 |
+ |------|------|
+ | `scrape_url` | 主 Scrape.do 抓取 API |
+ | `google_search` | 结构化 Google 搜索结果 |
+ | `amazon_product` | Amazon PDP 结构化数据 |
+ | `amazon_offer_listing` | Amazon 全量卖家报价 |
+ | `amazon_search` | Amazon 搜索 / 类目结果 |
+ | `amazon_raw_html` | Amazon 原始 HTML |
+ | `async_create_job` | 创建 Async API 任务 |
+ | `async_get_job` | 查询 Async job 详情 |
+ | `async_get_task` | 查询 Async task 详情 |
+ | `async_list_jobs` | 列出 Async jobs |
+ | `async_cancel_job` | 取消 Async job |
+ | `async_get_account` | 查询 Async 账户 / 并发信息 |
+ | `proxy_mode_config` | 生成 Proxy Mode 配置 |
 
- ### 获取原始 HTML
- ```
- 抓取 https://example.com 并返回原始 HTML 而不是 markdown。
- ```
+ ## 示例提示词
 
- ### 地理定位抓取
- ```
- 用日本(geoCode: jp)的 IP 抓取 https://www.amazon.com/product/12345
+ ```text
+ 抓取 https://example.com,开启 render=true,并等待 #app 出现。
  ```
 
- ### 移动设备模拟
- ```
- 用移动设备抓取 https://example.com 来查看移动版页面。
+ ```text
+ 搜索 "open source MCP servers",并设置 google_domain=google.co.uk 与 lr=lang_en。
  ```
 
- ### 截图
- ```
- 截取 https://example.com 的屏幕截图并返回图片。
- ```
-
- ### 等待元素加载
- ```
- 抓取 https://example.com 但先等待 id 为 "content" 的元素加载完成。
+ ```text
+ 获取 Amazon ASIN B0C7BKZ883 在美国 zipcode=10001 下的 PDP 数据。
  ```
 
- ### 会话保持
- ```
- 使用会话 ID 12345 抓取 https://example.com 的多个页面,以保持相同的 IP。
+ ```text
+ 帮我为这 20 个 URL 创建一个异步抓取任务,并返回 job ID。
  ```
 
- ## 与其他工具对比
-
- | 功能 | scrape-do-mcp | Firecrawl | Browserbase |
- |------|--------------|-----------|-------------|
- | Google 搜索 | ✅ | ❌ | ❌ |
- | 免费积分 | 1,000 | 500 | 无 |
- | 价格 | 按量付费 | $19+/月 | $15+/月 |
- | MCP 原生 | ✅ | ✅ | ❌ |
- | 配置难度 | 无需配置 | 需要 API key | 需要 API key + 浏览器 |
-
- ### 为什么选择 scrape-do-mcp?
-
- - **零配置**:获取 Token 后即可立即使用
- - **一体化**:网页抓取和 Google 搜索集于一个 MCP
- - **反爬虫绕过**:自动处理 Cloudflare、WAF、CAPTCHA
- - **成本效益**:按需付费,免费额度可用
-
- ## 积分消耗
-
- | 工具 | 积分消耗 |
- |------|---------|
- | scrape_url(普通) | 1 积分/次 |
- | scrape_url(super_proxy) | 10 积分/次 |
- | google_search | 1 积分/次 |
-
- **免费:每月 1,000 积分** - 无需信用卡:https://app.scrape.do
-
 
  ## 开发
 
  ```bash
  npm install
  npm run build
- npm run dev # 开发模式运行
+ npm run dev
  ```
 
  ## 许可证
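
> Editorial note: the `scrape_url` tool above wraps Scrape.do's HTTP API. As a rough sketch of what a call boils down to — the endpoint shape (`https://api.scrape.do/` with `token` and a URL-encoded target in the `url` query parameter) follows the public documentation, while the helper name `buildScrapeRequest` is purely illustrative and not part of this package:

```typescript
// Illustrative sketch: turning a scrape_url call into a Scrape.do request URL.
// The target URL must be query-encoded; URLSearchParams handles that.
function buildScrapeRequest(token: string, targetUrl: string, render = false): string {
  const qs = new URLSearchParams({ token, url: targetUrl });
  // Mirrors the render_js / render alias from the compatibility notes above.
  if (render) qs.set("render", "true");
  return `https://api.scrape.do/?${qs.toString()}`;
}
```
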
package/README.md CHANGED
@@ -2,25 +2,44 @@
 
  [中文文档](./README-ZH.md) | English
 
- MCP Server for Scrape.do - Web Scraping & Google Search with anti-bot bypass
-
- ## Features
-
- - **scrape_url**: Scrape any webpage and return content as Markdown. Automatically bypasses Cloudflare, WAFs, CAPTCHAs, and anti-bot protection. Supports JavaScript rendering, screenshots, geo-targeting (150+ countries), device emulation, session persistence, and more.
- - **google_search**: Search Google and return structured SERP results as JSON. Returns organic results, knowledge graph, local businesses, news stories, and more. Supports geo-targeting and device filtering.
-
- ## Available Tools
-
- | Tool | Description |
- |------|-------------|
- | `scrape_url` | Full-featured web scraping with anti-bot bypass. Supports: JavaScript rendering, screenshots (PNG), geo-targeting (150+ countries), device emulation (desktop/mobile/tablet), session persistence, custom headers/cookies, timeout control, and more. |
- | `google_search` | Google SERP scraping returning structured JSON. Supports: organic results, knowledge graph, local businesses, news, People Also Ask, video results, geo-targeting, device filtering, and time-based filtering. |
+ An MCP server that wraps Scrape.do's documented APIs in one package: the main scraping API, Google Search API, Amazon Scraper API, Async API, and a Proxy Mode configuration helper.
+
+ Official docs: https://scrape.do/documentation/
+
+ ## Coverage
+
+ - `scrape_url`: Main Scrape.do API with JS rendering, geo-targeting, session persistence, screenshots, ReturnJSON, browser interactions, cookies, and header forwarding.
+ - `google_search`: Structured Google SERP API with `google_domain`, `location`, `uule`, `lr`, `cr`, `safe`, `nfpr`, `filter`, pagination, and optional raw HTML.
+ - `amazon_product`: Amazon PDP endpoint.
+ - `amazon_offer_listing`: Amazon offer listing endpoint.
+ - `amazon_search`: Amazon search/category endpoint.
+ - `amazon_raw_html`: Raw HTML Amazon endpoint with geo-targeting.
+ - `async_create_job`, `async_get_job`, `async_get_task`, `async_list_jobs`, `async_cancel_job`, `async_get_account`: Async API coverage with both MCP-friendly aliases and official field names.
+ - `proxy_mode_config`: Builds official Proxy Mode connection details, default parameter strings, and CA certificate references.
+
+ ## Compatibility Notes
+
+ - `scrape_url` supports both MCP-friendly aliases and official parameter names:
+ - `render_js` or `render`
+ - `super_proxy` or `super`
+ - `screenshot` or `screenShot`
+ - `google_search` supports:
+ - `query` or `q`
+ - `country` or `gl`
+ - `language` or `hl`
+ - `domain` or `google_domain`
+ - `includeHtml` or `include_html`
+ - `async_create_job` accepts both alias fields like `targets`, `render`, `webhookUrl` and official Async API fields like `Targets`, `Render`, `WebhookURL`.
+ - `async_get_job`, `async_get_task`, and `async_cancel_job` accept both `jobId`/`taskId` and official `jobID`/`taskID`.
+ - `async_list_jobs` accepts both `pageSize` and official `page_size`.
+ - For header forwarding in `scrape_url`, pass `headers` plus `header_mode` (`custom`, `extra`, or `forward`).
+ - Screenshot responses preserve the official Scrape.do JSON body and also attach MCP image content when screenshots are present.
+ - `scrape_url` now defaults to `output="raw"` to match the official API more closely.
+ - `scrape_url` includes response metadata in `structuredContent`, which helps surface `pureCookies`, `transparentResponse`, and binary responses inside MCP.
 
  ## Installation
 
- ### Quick Install (Recommended)
-
- Run this command in your terminal:
+ ### Quick Install
 
  ```bash
  claude mcp add-json scrape-do --scope user '{
@@ -33,11 +52,9 @@ claude mcp add-json scrape-do --scope user '{
  }'
  ```
 
- Replace `YOUR_TOKEN_HERE` with your Scrape.do API token from https://app.scrape.do
-
  ### Claude Desktop
 
- Add to your `~/.claude.json`:
+ Add this to `~/.claude.json`:
 
  ```json
  {
@@ -46,183 +63,57 @@ Add to your `~/.claude.json`:
  "command": "npx",
  "args": ["-y", "scrape-do-mcp"],
  "env": {
- "SCRAPE_DO_TOKEN": "your_token_here"
+ "SCRAPE_DO_TOKEN": "YOUR_TOKEN_HERE"
  }
  }
  }
  }
  ```
 
- Get your free API token at: https://app.scrape.do
+ Get your token at https://app.scrape.do
 
- ## Usage
-
- ### scrape_url
-
- Scrape any webpage and get content as Markdown.
-
- ```typescript
- // All Parameters
- {
- // Required
- url: string, // Target URL to scrape
-
- // Proxy & Rendering
- render_js?: boolean, // Render JavaScript (default: false)
- super_proxy?: boolean, // Use residential/mobile proxies (costs 10 credits)
- geoCode?: string, // Country code (e.g., 'us', 'cn', 'gb')
- regionalGeoCode?: string, // Region (e.g., 'asia', 'europe')
- device?: "desktop" | "mobile" | "tablet", // Device type
- sessionId?: number, // Keep same IP for session
-
- // Timeout & Retry
- timeout?: number, // Max timeout in ms (default: 60000)
- retryTimeout?: number, // Retry timeout in ms
- disableRetry?: boolean, // Disable auto retry
-
- // Output Format
- output?: "markdown" | "raw", // Output format (default: markdown)
- returnJSON?: boolean, // Return network requests as JSON
- transparentResponse?: boolean, // Return pure response
-
- // Screenshot
- screenshot?: boolean, // Take screenshot (PNG)
- fullScreenShot?: boolean, // Full page screenshot
- particularScreenShot?: string, // Screenshot of element (CSS selector)
-
- // Browser Control
- waitSelector?: string, // Wait for element (CSS selector)
- customWait?: number, // Wait time after load (ms)
- waitUntil?: "domcontentloaded" | "load" | "networkidle" | "networkidle0" | "networkidle2",
- width?: number, // Viewport width (default: 1920)
- height?: number, // Viewport height (default: 1080)
- blockResources?: boolean, // Block CSS/images/fonts (default: true)
-
- // Headers & Cookies
- customHeaders?: boolean, // Handle all headers
- extraHeaders?: boolean, // Add extra headers
- forwardHeaders?: boolean, // Forward your headers
- setCookies?: string, // Set cookies ('name=value; name2=value2')
- pureCookies?: boolean, // Return original cookies
-
- // Other
- disableRedirection?: boolean, // Disable redirect
- callback?: string // Webhook URL for async results
- }
- ```
-
- ### google_search
-
- Search Google and get structured results.
+ ## Available Tools
 
- ```typescript
- // All Parameters
- {
- // Required
- query: string, // Search query
-
- // Search Options
- country?: string, // Country code (default: 'us')
- language?: string, // Interface language (default: 'en')
- domain?: string, // Google domain (e.g., 'com', 'co.uk')
- page?: number, // Page number (default: 1)
- num?: number, // Results per page (default: 10)
- time_period?: "" | "last_hour" | "last_day" | "last_week" | "last_month" | "last_year",
- device?: "desktop" | "mobile", // Device type
-
- // Advanced
- includeHtml?: boolean // Include raw HTML in response
- }
- ```
+ | Tool | Purpose |
+ |------|---------|
+ | `scrape_url` | Main Scrape.do scraping API wrapper |
+ | `google_search` | Structured Google search results |
+ | `amazon_product` | Amazon PDP structured data |
+ | `amazon_offer_listing` | Amazon seller offers |
+ | `amazon_search` | Amazon keyword/category results |
+ | `amazon_raw_html` | Raw Amazon HTML with geo-targeting |
+ | `async_create_job` | Create Async API jobs |
+ | `async_get_job` | Fetch Async job details |
+ | `async_get_task` | Fetch Async task details |
+ | `async_list_jobs` | List Async jobs |
+ | `async_cancel_job` | Cancel Async jobs |
+ | `async_get_account` | Fetch Async account/concurrency info |
+ | `proxy_mode_config` | Generate Proxy Mode configuration |
 
  ## Example Prompts
 
- Here are some prompts you can use to invoke the tools:
-
- ### Scrape a Website
- ```
- Please scrape https://github.com and give me the main content as markdown.
+ ```text
+ Scrape https://example.com with render=true and wait for #app.
  ```
 
- ### Search Google
- ```
- Search Google for "best Python web frameworks 2026" and return the top 5 results.
+ ```text
+ Search Google for "open source MCP servers" with google_domain=google.co.uk and lr=lang_en.
  ```
 
- ### Search with Filters
- ```
- Search for "AI news" in Chinese, from China, last week.
+ ```text
+ Get the Amazon PDP for ASIN B0C7BKZ883 in the US with zipcode 10001.
  ```
 
- ### JavaScript Rendering
- ```
- Scrape this React Single Page Application: https://example-spa.com
- Use render_js=true to get the fully rendered content.
+ ```text
+ Create an async job for these 20 URLs and give me the job ID.
  ```
 
- ### Get Raw HTML
- ```
- Scrape https://example.com and return raw HTML instead of markdown.
- ```
-
- ### Geo-targeting
- ```
- Scrape https://www.amazon.com/product/12345 as if I'm in Japan (geoCode: jp)
- ```
-
- ### Mobile Device
- ```
- Scrape https://example.com using a mobile device to see the mobile version.
- ```
-
- ### Take Screenshot
- ```
- Take a screenshot of https://example.com and return the image.
- ```
-
- ### Wait for Element
- ```
- Scrape https://example.com but wait for the element with id "content" to load first.
- ```
-
- ### Session Persistence
- ```
- Scrape multiple pages of https://example.com using sessionId 12345 to maintain the same IP.
- ```
-
- ## Comparison with Alternatives
-
- | Feature | scrape-do-mcp | Firecrawl | Browserbase |
- |---------|--------------|-----------|-------------|
- | Google Search | ✅ | ❌ | ❌ |
- | Free Credits | 1,000 | 500 | None |
- | Pricing | Pay per use | $19+/mo | $15+/mo |
- | MCP Native | ✅ | ✅ | ❌ |
- | Setup Required | None | API key | API key + browser |
-
- ### Why scrape-do-mcp?
-
- - **Zero setup**: Just get a token and use immediately
- - **All-in-one**: Both web scraping AND Google search in one MCP
- - **Anti-bot bypass**: Automatically handles Cloudflare, WAFs, CAPTCHAs
- - **Cost-effective**: Pay only for what you use, free tier available
-
- ## Credit Usage
-
- | Tool | Credit Cost |
- |------|-------------|
- | scrape_url (regular) | 1 credit/request |
- | scrape_url (super_proxy) | 10 credits/request |
- | google_search | 1 credit/request |
-
- **Free: 1,000 credits/month** - No credit card required: https://app.scrape.do
-
 
  ## Development
 
  ```bash
  npm install
  npm run build
- npm run dev # Run in development mode
+ npm run dev
  ```
 
  ## License
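
> Editorial note: the Compatibility Notes in the new README say each tool accepts both MCP-friendly aliases and official Scrape.do parameter names (e.g. `render_js`/`render`, `super_proxy`/`super`, `screenshot`/`screenShot`). A minimal sketch of how such alias normalization can work — the helper name and alias table below are illustrative, not the package's actual implementation:

```typescript
// Hypothetical sketch: map MCP-friendly alias keys onto the official
// Scrape.do parameter names before building the request query string.
type Params = Record<string, string | number | boolean | undefined>;

const SCRAPE_ALIASES: Record<string, string> = {
  render_js: "render",      // alias for the official `render` flag
  super_proxy: "super",     // alias for the official `super` flag
  screenshot: "screenShot", // alias for the official `screenShot` flag
};

function normalizeScrapeParams(input: Params): Params {
  const out: Params = {};
  for (const [key, value] of Object.entries(input)) {
    if (value === undefined) continue; // drop unset options
    out[SCRAPE_ALIASES[key] ?? key] = value; // rewrite aliases, pass others through
  }
  return out;
}
```

The same table-driven approach extends naturally to the `google_search` aliases (`query`/`q`, `country`/`gl`, `language`/`hl`) and the Async API's `jobId`/`jobID` pairs.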