scrape-do-mcp 0.1.6 → 0.3.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/LICENSE +21 -0
- package/README-ZH.md +62 -100
- package/README.md +59 -99
- package/dist/index.js +773 -62
- package/package.json +6 -2
package/LICENSE
ADDED
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 Abel
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
package/README-ZH.md
CHANGED
@@ -2,25 +2,40 @@
 
 [English Docs](./README.md) | 中文文档
 
-Scrape.do
-
-
-
-
-
-
-
-
-
-
-
-
+这是一个把 Scrape.do 官方文档中主要 API 能力封装成 MCP 工具的包:主抓取 API、Google Search API、Amazon Scraper API、Async API,以及 Proxy Mode 配置辅助工具。
+
+官方文档:https://scrape.do/documentation/
+
+## 覆盖范围
+
+- `scrape_url`:主 Scrape.do 抓取 API,支持 JS 渲染、地理定位、会话保持、截图、ReturnJSON、浏览器交互、Cookie、Header 转发。
+- `google_search`:结构化 Google 搜索 API,支持 `google_domain`、`location`、`uule`、`lr`、`cr`、`safe`、`nfpr`、`filter`、分页、原始 HTML。
+- `amazon_product`:Amazon PDP 接口。
+- `amazon_offer_listing`:Amazon 卖家报价接口。
+- `amazon_search`:Amazon 搜索 / 类目结果接口。
+- `amazon_raw_html`:Amazon 原始 HTML 接口。
+- `async_create_job`、`async_get_job`、`async_get_task`、`async_list_jobs`、`async_cancel_job`、`async_get_account`:Async API。
+- `proxy_mode_config`:生成 Proxy Mode 的连接信息和参数字符串,不会在工具输出里泄露你的 token。
+
+## 兼容性说明
+
+- `scrape_url` 同时支持 MCP 友好的别名和官方参数名:
+  - `render_js` 或 `render`
+  - `super_proxy` 或 `super`
+  - `screenshot` 或 `screenShot`
+- `google_search` 同时支持:
+  - `query` 或 `q`
+  - `country` 或 `gl`
+  - `language` 或 `hl`
+  - `domain` 或 `google_domain`
+  - `includeHtml` 或 `include_html`
+- `scrape_url` 里的 Header 转发请使用 `headers` + `header_mode`(`custom` / `extra` / `forward`)。
+- 截图结果会以 MCP 图片内容返回,而不是单纯的 base64 文本。
+- `scrape_url` 在未启用 ReturnJSON 时默认使用 `output="markdown"`,更适合 LLM 读取;如果你想更贴近原始 HTTP API 的行为,请手动设置 `output="raw"`。
 
 ## 安装
 
-###
-
-在终端中运行以下命令:
+### 快速安装
 
 ```bash
 claude mcp add-json scrape-do --scope user '{
@@ -28,13 +43,11 @@ claude mcp add-json scrape-do --scope user '{
   "command": "npx",
   "args": ["-y", "scrape-do-mcp"],
   "env": {
-    "SCRAPE_DO_TOKEN": "你的Token"
+    "SCRAPE_DO_TOKEN": "YOUR_TOKEN_HERE"
   }
 }'
 ```
 
-将 `你的Token` 替换为你在 https://app.scrape.do 获取的 API Token。
-
 ### Claude Desktop
 
 添加到 `~/.claude.json`:
@@ -46,108 +59,57 @@ claude mcp add-json scrape-do --scope user '{
       "command": "npx",
       "args": ["-y", "scrape-do-mcp"],
       "env": {
-        "SCRAPE_DO_TOKEN": "你的Token"
+        "SCRAPE_DO_TOKEN": "YOUR_TOKEN_HERE"
       }
     }
   }
 }
 ```
 
-
-
-## 使用方法
-
-### scrape_url
+Token 获取地址:https://app.scrape.do
 
-
-
-```typescript
-// 参数
-{
-  url: string,                 // 要抓取的网址
-  render_js?: boolean,         // 渲染 JavaScript(默认 false)
-  super_proxy?: boolean,       // 使用住宅代理(消耗 10 积分,默认 false)
-  output?: "markdown" | "raw"  // 输出格式(默认 markdown)
-}
-```
-
-### google_search
-
-搜索 Google 并获取结构化结果。
-
-```typescript
-// 参数
-{
-  query: string,        // 搜索关键词
-  country?: string,     // 国家代码(默认 "us")
-  language?: string,    // 界面语言(默认 "en")
-  page?: number,        // 页码(默认 1)
-  time_period?: "" | "last_hour" | "last_day" | "last_week" | "last_month" | "last_year",
-  device?: "desktop" | "mobile" // 设备类型(默认 desktop)
-}
-```
-
-## 使用示例
-
-### 抓取网页
-```
-请抓取 https://github.com 并给我主要内容(Markdown 格式)。
-```
+## 可用工具
 
-
-
-
+| 工具 | 用途 |
+|------|------|
+| `scrape_url` | 主 Scrape.do 抓取 API |
+| `google_search` | 结构化 Google 搜索结果 |
+| `amazon_product` | Amazon PDP 结构化数据 |
+| `amazon_offer_listing` | Amazon 全量卖家报价 |
+| `amazon_search` | Amazon 搜索 / 类目结果 |
+| `amazon_raw_html` | Amazon 原始 HTML |
+| `async_create_job` | 创建 Async API 任务 |
+| `async_get_job` | 查询 Async job 详情 |
+| `async_get_task` | 查询 Async task 详情 |
+| `async_list_jobs` | 列出 Async jobs |
+| `async_cancel_job` | 取消 Async job |
+| `async_get_account` | 查询 Async 账户 / 并发信息 |
+| `proxy_mode_config` | 生成 Proxy Mode 配置 |
+
+## 示例提示词
+
+```text
+抓取 https://example.com,开启 render=true,并等待 #app 出现。
 ```
 
-
-
-用中文搜索 "AI 新闻",限定为中国,过去一周的内容。
+```text
+搜索 "open source MCP servers",并设置 google_domain=google.co.uk 与 lr=lang_en。
 ```
 
-
-
-抓取这个 React 单页应用:https://example-spa.com
-使用 render_js=true 获取完整渲染内容。
+```text
+获取 Amazon ASIN B0C7BKZ883 在美国 zipcode=10001 下的 PDP 数据。
 ```
 
-
-
-抓取 https://example.com 并返回原始 HTML 而不是 markdown。
+```text
+帮我为这 20 个 URL 创建一个异步抓取任务,并返回 job ID。
 ```
 
-## 与其他工具对比
-
-| 功能 | scrape-do-mcp | Firecrawl | Browserbase |
-|------|--------------|-----------|-------------|
-| Google 搜索 | ✅ | ❌ | ❌ |
-| 免费积分 | 1,000 | 500 | 无 |
-| 价格 | 按量付费 | $19+/月 | $15+/月 |
-| MCP 原生 | ✅ | ✅ | ❌ |
-| 配置难度 | 无需配置 | 需要 API key | 需要 API key + 浏览器 |
-
-### 为什么选择 scrape-do-mcp?
-
-- **零配置**:获取 Token 后即可立即使用
-- **一体化**:网页抓取和 Google 搜索集于一个 MCP
-- **反爬虫绕过**:自动处理 Cloudflare、WAF、CAPTCHA
-- **成本效益**:按需付费,免费额度可用
-
-## 积分消耗
-
-| 工具 | 积分消耗 |
-|------|---------|
-| scrape_url(普通) | 1 积分/次 |
-| scrape_url(super_proxy) | 10 积分/次 |
-| google_search | 1 积分/次 |
-
-注册即送 **1,000 积分**:https://app.scrape.do
-
 ## 开发
 
 ```bash
 npm install
 npm run build
-npm run dev
+npm run dev
 ```
 
 ## 许可证
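The `proxy_mode_config` tool mentioned in both READMEs returns Proxy Mode connection details plus a parameter string. A minimal sketch of how such a string can be assembled with `URLSearchParams`, modeled on the `buildProxyParameterString` helper visible in the `dist/index.js` diff below (the helper here is illustrative, not an export of the package):

```javascript
// Illustrative re-implementation of the package's private
// buildProxyParameterString helper: serializes Proxy Mode parameters
// into a query-style string, defaulting to render=false when none are given.
function buildProxyParameterString(params) {
    if (!params) {
        // No parameters supplied: the package emits render=false by default.
        return "render=false";
    }
    const searchParams = new URLSearchParams();
    for (const [key, value] of Object.entries(params)) {
        searchParams.set(key, String(value));
    }
    return searchParams.toString();
}

console.log(buildProxyParameterString());                                // render=false
console.log(buildProxyParameterString({ render: true, geoCode: "us" })); // render=true&geoCode=us
```

Because the token is injected separately by the proxy connection itself, the parameter string can be shown in tool output without leaking credentials, which matches the README's "不会在工具输出里泄露你的 token" note.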
package/README.md
CHANGED
@@ -2,25 +2,40 @@
 
 [中文文档](./README-ZH.md) | English
 
-MCP
-
-
-
-
-
-
-
-
-
-
-
-
+An MCP server that wraps Scrape.do's documented APIs in one package: the main scraping API, Google Search API, Amazon Scraper API, Async API, and a Proxy Mode configuration helper.
+
+Official docs: https://scrape.do/documentation/
+
+## Coverage
+
+- `scrape_url`: Main Scrape.do API with JS rendering, geo-targeting, session persistence, screenshots, ReturnJSON, browser interactions, cookies, and header forwarding.
+- `google_search`: Structured Google SERP API with `google_domain`, `location`, `uule`, `lr`, `cr`, `safe`, `nfpr`, `filter`, pagination, and optional raw HTML.
+- `amazon_product`: Amazon PDP endpoint.
+- `amazon_offer_listing`: Amazon offer listing endpoint.
+- `amazon_search`: Amazon search/category endpoint.
+- `amazon_raw_html`: Raw HTML Amazon endpoint with geo-targeting.
+- `async_create_job`, `async_get_job`, `async_get_task`, `async_list_jobs`, `async_cancel_job`, `async_get_account`: Async API coverage.
+- `proxy_mode_config`: Builds Proxy Mode connection details and parameter strings without exposing your token in tool output.
+
+## Compatibility Notes
+
+- `scrape_url` supports both MCP-friendly aliases and official parameter names:
+  - `render_js` or `render`
+  - `super_proxy` or `super`
+  - `screenshot` or `screenShot`
+- `google_search` supports:
+  - `query` or `q`
+  - `country` or `gl`
+  - `language` or `hl`
+  - `domain` or `google_domain`
+  - `includeHtml` or `include_html`
+- For header forwarding in `scrape_url`, pass `headers` plus `header_mode` (`custom`, `extra`, or `forward`).
+- Screenshot responses are returned as MCP image content instead of plain base64 text.
+- `scrape_url` defaults to `output="markdown"` when ReturnJSON is not used so the tool stays LLM-friendly. Set `output="raw"` if you want the raw API-style output.
 
 ## Installation
 
-### Quick Install
-
-Run this command in your terminal:
+### Quick Install
 
 ```bash
 claude mcp add-json scrape-do --scope user '{
@@ -33,11 +48,9 @@ claude mcp add-json scrape-do --scope user '{
 }'
 ```
 
-Replace `YOUR_TOKEN_HERE` with your Scrape.do API token from https://app.scrape.do
-
 ### Claude Desktop
 
-Add to your `~/.claude.json`:
+Add this to `~/.claude.json`:
 
 ```json
 {
@@ -46,110 +59,57 @@ Add to your `~/.claude.json`:
       "command": "npx",
       "args": ["-y", "scrape-do-mcp"],
       "env": {
-        "SCRAPE_DO_TOKEN": "YOUR_TOKEN_HERE"
+        "SCRAPE_DO_TOKEN": "YOUR_TOKEN_HERE"
       }
     }
   }
 }
 ```
 
-Get your
-
-## Usage
-
-### scrape_url
+Get your token at https://app.scrape.do
 
-
-
-```typescript
-// Parameters
-{
-  url: string,                 // Target URL to scrape
-  render_js?: boolean,         // Render JavaScript (default: false)
-  super_proxy?: boolean,       // Use residential proxies (costs 10 credits, default: false)
-  output?: "markdown" | "raw"  // Output format (default: markdown)
-}
-```
-
-### google_search
-
-Search Google and get structured results.
+## Available Tools
 
-
-
-
-
-
-
-
-
-
-
-
+| Tool | Purpose |
+|------|---------|
+| `scrape_url` | Main Scrape.do scraping API wrapper |
+| `google_search` | Structured Google search results |
+| `amazon_product` | Amazon PDP structured data |
+| `amazon_offer_listing` | Amazon seller offers |
+| `amazon_search` | Amazon keyword/category results |
+| `amazon_raw_html` | Raw Amazon HTML with geo-targeting |
+| `async_create_job` | Create Async API jobs |
+| `async_get_job` | Fetch Async job details |
+| `async_get_task` | Fetch Async task details |
+| `async_list_jobs` | List Async jobs |
+| `async_cancel_job` | Cancel Async jobs |
+| `async_get_account` | Fetch Async account/concurrency info |
+| `proxy_mode_config` | Generate Proxy Mode configuration |
 
 ## Example Prompts
 
-
-
-### Scrape a Website
-```
-Please scrape https://github.com and give me the main content as markdown.
+```text
+Scrape https://example.com with render=true and wait for #app.
 ```
 
-
-
-Search Google for "best Python web frameworks 2026" and return the top 5 results.
+```text
+Search Google for "open source MCP servers" with google_domain=google.co.uk and lr=lang_en.
 ```
 
-
-
-Search for "AI news" in Chinese, from China, last week.
-```
-
-### JavaScript Rendering
-```
-Scrape this React Single Page Application: https://example-spa.com
-Use render_js=true to get the fully rendered content.
+```text
+Get the Amazon PDP for ASIN B0C7BKZ883 in the US with zipcode 10001.
 ```
 
-
+```text
+Create an async job for these 20 URLs and give me the job ID.
 ```
-Scrape https://example.com and return raw HTML instead of markdown.
-```
-
-## Comparison with Alternatives
-
-| Feature | scrape-do-mcp | Firecrawl | Browserbase |
-|---------|--------------|-----------|-------------|
-| Google Search | ✅ | ❌ | ❌ |
-| Free Credits | 1,000 | 500 | None |
-| Pricing | Pay per use | $19+/mo | $15+/mo |
-| MCP Native | ✅ | ✅ | ❌ |
-| Setup Required | None | API key | API key + browser |
-
-### Why scrape-do-mcp?
-
-- **Zero setup**: Just get a token and use immediately
-- **All-in-one**: Both web scraping AND Google search in one MCP
-- **Anti-bot bypass**: Automatically handles Cloudflare, WAFs, CAPTCHAs
-- **Cost-effective**: Pay only for what you use, free tier available
-
-## Credit Usage
-
-| Tool | Credit Cost |
-|------|-------------|
-| scrape_url (regular) | 1 credit/request |
-| scrape_url (super_proxy) | 10 credits/request |
-| google_search | 1 credit/request |
-
-Free registration includes **1,000 credits**: https://app.scrape.do
 
 ## Development
 
```bash
 npm install
 npm run build
-npm run dev
+npm run dev
 ```
 
 ## License
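The alias pairs listed under the README's Compatibility Notes amount to a small normalization step before a request is built. A minimal sketch of how `google_search` aliases could collapse onto the official parameter names, with the documented defaults (`normalizeGoogleSearchInput` is a hypothetical helper for illustration, not something the package exports):

```javascript
// Hypothetical alias normalization for google_search inputs.
// The alias pairs (query/q, country/gl, language/hl, domain/google_domain)
// and the "us"/"en" defaults come from the README above.
function normalizeGoogleSearchInput(input) {
    return {
        q: input.query ?? input.q,
        gl: input.country ?? input.gl ?? "us",
        hl: input.language ?? input.hl ?? "en",
        google_domain: input.domain ?? input.google_domain,
    };
}

const normalized = normalizeGoogleSearchInput({
    query: "open source MCP servers",
    domain: "google.co.uk",
});
console.log(normalized.q);             // open source MCP servers
console.log(normalized.gl);            // us
console.log(normalized.google_domain); // google.co.uk
```

Accepting both spellings and resolving them in one place is what lets the MCP tools stay friendly to LLM callers while still mapping 1:1 onto Scrape.do's documented query parameters.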
package/dist/index.js
CHANGED
@@ -4,97 +4,808 @@ var __importDefault = (this && this.__importDefault) || function (mod) {
     return (mod && mod.__esModule) ? mod : { "default": mod };
 };
 Object.defineProperty(exports, "__esModule", { value: true });
+const axios_1 = __importDefault(require("axios"));
 const mcp_js_1 = require("@modelcontextprotocol/sdk/server/mcp.js");
 const stdio_js_1 = require("@modelcontextprotocol/sdk/server/stdio.js");
 const zod_1 = require("zod");
-const
+const SERVER_VERSION = "0.3.0";
 const SCRAPE_DO_TOKEN = process.env.SCRAPE_DO_TOKEN || "";
 const SCRAPE_API_BASE = "https://api.scrape.do";
+const ASYNC_API_BASE = "https://q.scrape.do";
+const headerValueSchema = zod_1.z.union([zod_1.z.string(), zod_1.z.number(), zod_1.z.boolean()]);
+const headerRecordSchema = zod_1.z.record(zod_1.z.string(), headerValueSchema);
+const browserActionSchema = zod_1.z.record(zod_1.z.string(), zod_1.z.union([zod_1.z.string(), zod_1.z.number(), zod_1.z.boolean()]));
+const headerModeSchema = zod_1.z.enum(["custom", "extra", "forward"]);
+const scrapeWaitUntilSchema = zod_1.z.enum(["domcontentloaded", "load", "networkidle", "networkidle0", "networkidle2"]);
+const asyncWaitUntilSchema = zod_1.z.enum(["domcontentloaded", "networkidle0", "networkidle2"]);
+const googleTimePeriodSchema = zod_1.z.enum(["last_hour", "last_day", "last_week", "last_month", "last_year"]);
+const asyncMethodSchema = zod_1.z.enum(["GET", "POST", "PUT", "PATCH", "HEAD", "DELETE"]);
 const server = new mcp_js_1.McpServer({
     name: "scrape-do-mcp",
-    version:
+    version: SERVER_VERSION,
 });
-
-
-
-
-
-
-
-if (
+function isRecord(value) {
+    return typeof value === "object" && value !== null && !Array.isArray(value);
+}
+function compactObject(value) {
+    return Object.fromEntries(Object.entries(value).filter(([, entry]) => entry !== undefined));
+}
+function stringifyUnknown(value) {
+    if (typeof value === "string") {
+        return value;
+    }
+    if (value instanceof ArrayBuffer) {
+        return Buffer.from(value).toString("utf8");
+    }
+    if (Buffer.isBuffer(value)) {
+        return value.toString("utf8");
+    }
+    if (value === undefined || value === null) {
+        return "";
+    }
+    try {
+        return JSON.stringify(value, null, 2);
+    }
+    catch {
+        return String(value);
+    }
+}
+function tryParseJson(value) {
+    try {
+        return JSON.parse(value);
+    }
+    catch {
+        return undefined;
+    }
+}
+function createErrorResult(message) {
+    return {
+        content: [{ type: "text", text: message }],
+        isError: true,
+    };
+}
+function createTextResult(text, structuredContent) {
+    return {
+        content: [{ type: "text", text }],
+        ...(structuredContent ? { structuredContent } : {}),
+    };
+}
+function createJsonResult(value) {
+    if (isRecord(value)) {
         return {
-            content: [{ type: "text", text:
-
+            content: [{ type: "text", text: JSON.stringify(value, null, 2) }],
+            structuredContent: value,
         };
     }
+    return createTextResult(JSON.stringify(value, null, 2));
+}
+function createImageResult(images, note) {
+    const content = [];
+    if (note) {
+        content.push({ type: "text", text: note });
+    }
+    for (const image of images) {
+        content.push({
+            type: "image",
+            data: image.data,
+            mimeType: image.mimeType,
+        });
+    }
+    return { content };
+}
+function getErrorMessage(error) {
+    if (axios_1.default.isAxiosError(error)) {
+        const responseData = error.response?.data;
+        if (responseData !== undefined) {
+            return stringifyUnknown(responseData);
+        }
+        return error.message;
+    }
+    if (error instanceof Error) {
+        return error.message;
+    }
+    return String(error);
+}
+async function requestText(config) {
+    const response = await axios_1.default.request({
+        ...config,
+        responseType: "text",
+        transformResponse: [(value) => value],
+    });
+    return {
+        text: stringifyUnknown(response.data),
+        headers: response.headers,
+    };
+}
+function normalizeHeaderRecord(value) {
+    if (!value) {
+        return undefined;
+    }
+    return Object.fromEntries(Object.entries(value).map(([key, entry]) => [key, String(entry)]));
+}
+function resolveHeaderMode(input) {
+    const modes = new Set();
+    if (input.customHeaders) {
+        modes.add("custom");
+    }
+    if (input.extraHeaders) {
+        modes.add("extra");
+    }
+    if (input.forwardHeaders) {
+        modes.add("forward");
+    }
+    const explicitMode = input.header_mode ?? input.headerMode;
+    if (explicitMode) {
+        modes.add(explicitMode);
+    }
+    if (modes.size > 1) {
+        throw new Error("Choose only one header mode: custom, extra, or forward.");
+    }
+    if (modes.size === 1) {
+        return [...modes][0];
+    }
+    if (input.headers) {
+        return "custom";
+    }
+    return undefined;
+}
+function buildForwardedHeaders(headers, mode) {
+    const normalizedHeaders = normalizeHeaderRecord(headers);
+    if (!normalizedHeaders) {
+        return undefined;
+    }
+    if (mode !== "extra") {
+        return normalizedHeaders;
+    }
+    return Object.fromEntries(Object.entries(normalizedHeaders).map(([key, value]) => [key.toLowerCase().startsWith("sd-") ? key : `sd-${key}`, value]));
+}
+function inferMimeTypeFromBase64(value) {
+    if (value.startsWith("iVBORw0KGgo")) {
+        return "image/png";
+    }
+    if (value.startsWith("/9j/")) {
+        return "image/jpeg";
+    }
+    if (value.startsWith("R0lGOD")) {
+        return "image/gif";
+    }
+    if (value.startsWith("UklGR")) {
+        return "image/webp";
+    }
+    if (value.startsWith("Qk0")) {
+        return "image/bmp";
+    }
+    return undefined;
+}
+function maybeImageMatch(value) {
+    const trimmedValue = value.trim();
+    const dataUriMatch = trimmedValue.match(/^data:(image\/[a-zA-Z0-9.+-]+);base64,([A-Za-z0-9+/=\s]+)$/);
+    if (dataUriMatch) {
+        return {
+            mimeType: dataUriMatch[1],
+            data: dataUriMatch[2].replace(/\s+/g, ""),
+        };
+    }
+    const normalizedValue = trimmedValue.replace(/\s+/g, "");
+    const mimeType = inferMimeTypeFromBase64(normalizedValue);
+    if (!mimeType || normalizedValue.length < 100) {
+        return undefined;
+    }
+    return {
+        mimeType,
+        data: normalizedValue,
+    };
+}
+function collectImageMatches(value, results = [], seen = new Set()) {
+    if (typeof value === "string") {
+        const match = maybeImageMatch(value);
+        if (match && !seen.has(match.data)) {
+            seen.add(match.data);
+            results.push(match);
+        }
+        return results;
+    }
+    if (Array.isArray(value)) {
+        for (const item of value) {
+            collectImageMatches(item, results, seen);
+        }
+        return results;
+    }
+    if (!isRecord(value)) {
+        return results;
+    }
+    const prioritizedKeys = ["screenShot", "screenshot", "fullScreenShot", "particularScreenShot", "image", "images"];
+    for (const key of prioritizedKeys) {
+        if (key in value) {
+            collectImageMatches(value[key], results, seen);
+        }
+    }
+    for (const [key, entry] of Object.entries(value)) {
+        if (!prioritizedKeys.includes(key)) {
+            collectImageMatches(entry, results, seen);
+        }
+    }
+    return results;
+}
+function buildProxyParameterString(params) {
+    if (!params) {
+        return "render=false";
+    }
+    const searchParams = new URLSearchParams();
+    for (const [key, value] of Object.entries(params)) {
+        searchParams.set(key, String(value));
+    }
+    return searchParams.toString();
+}
+function ensureToken() {
+    if (!SCRAPE_DO_TOKEN) {
+        throw new Error("SCRAPE_DO_TOKEN is not set. Get your token at https://app.scrape.do");
+    }
+}
+server.tool("scrape_url", "Scrape a webpage with the official Scrape.do API. Supports Markdown/raw output, JS rendering, screenshots, browser interactions, geo-targeting, header forwarding, session persistence, and ReturnJSON features.", {
+    url: zod_1.z.string().url().describe("The target URL to scrape"),
+    render_js: zod_1.z.boolean().optional().describe("Alias for render. Render JavaScript for SPA or dynamic pages."),
+    render: zod_1.z.boolean().optional().describe("Official Scrape.do render parameter."),
+    super_proxy: zod_1.z.boolean().optional().describe("Alias for super. Use residential/mobile proxies."),
+    super: zod_1.z.boolean().optional().describe("Official Scrape.do super parameter."),
+    geoCode: zod_1.z.string().optional().describe("Country code for geo-targeting."),
+    regionalGeoCode: zod_1.z.string().optional().describe("Regional geo-targeting code."),
+    device: zod_1.z.enum(["desktop", "mobile", "tablet"]).optional().default("desktop").describe("Device type to emulate."),
+    sessionId: zod_1.z.union([zod_1.z.number().int(), zod_1.z.string()]).optional().describe("Sticky session ID."),
+    timeout: zod_1.z.number().int().positive().optional().default(60000).describe("Maximum timeout in milliseconds."),
+    retryTimeout: zod_1.z.number().int().positive().optional().describe("Retry timeout in milliseconds."),
+    disableRetry: zod_1.z.boolean().optional().default(false).describe("Disable automatic retries."),
+    output: zod_1.z.enum(["markdown", "raw"]).optional().describe("Output format. MCP defaults to markdown unless ReturnJSON is used."),
+    returnJSON: zod_1.z.boolean().optional().default(false).describe("Return JSON with network requests/content."),
+    transparentResponse: zod_1.z.boolean().optional().default(false).describe("Return the target response without Scrape.do post-processing."),
+    screenshot: zod_1.z.boolean().optional().describe("Alias for screenShot. Capture a viewport screenshot."),
+    screenShot: zod_1.z.boolean().optional().describe("Official Scrape.do screenshot parameter."),
+    fullScreenShot: zod_1.z.boolean().optional().default(false).describe("Capture a full-page screenshot."),
+    particularScreenShot: zod_1.z.string().optional().describe("Capture a screenshot of a specific CSS selector."),
+    playWithBrowser: zod_1.z.array(browserActionSchema).optional().describe("Browser interaction script for Scrape.do."),
+    waitSelector: zod_1.z.string().optional().describe("CSS selector to wait for."),
+    customWait: zod_1.z.number().int().min(0).optional().describe("Additional wait time after load in milliseconds."),
+    waitUntil: scrapeWaitUntilSchema.optional().default("domcontentloaded").describe("Browser load event to wait for."),
+    width: zod_1.z.number().int().positive().optional().default(1920).describe("Viewport width."),
+    height: zod_1.z.number().int().positive().optional().default(1080).describe("Viewport height."),
+    blockResources: zod_1.z.boolean().optional().default(true).describe("Block CSS, images, and fonts."),
+    showFrames: zod_1.z.boolean().optional().default(false).describe("Include iframe content in ReturnJSON responses."),
+    showWebsocketRequests: zod_1.z.boolean().optional().default(false).describe("Include websocket requests in ReturnJSON responses."),
+    headers: headerRecordSchema.optional().describe("Header values to forward to Scrape.do for custom/extra/forward modes."),
+    header_mode: headerModeSchema.optional().describe("Header forwarding mode: custom, extra, or forward."),
+    headerMode: headerModeSchema.optional().describe("CamelCase alias for header_mode."),
+    customHeaders: zod_1.z.boolean().optional().describe("Enable official customHeaders mode."),
+    extraHeaders: zod_1.z.boolean().optional().describe("Enable official extraHeaders mode."),
+    forwardHeaders: zod_1.z.boolean().optional().describe("Enable official forwardHeaders mode."),
+    setCookies: zod_1.z.string().optional().describe("Cookies to send to the target page."),
+    pureCookies: zod_1.z.boolean().optional().default(false).describe("Return original Set-Cookie headers."),
+    disableRedirection: zod_1.z.boolean().optional().default(false).describe("Disable redirect following."),
+    callback: zod_1.z.string().url().optional().describe("Webhook callback URL."),
+}, async (params) => {
+    try {
+        ensureToken();
+        const screenshotRequested = (params.screenshot ?? params.screenShot ?? false) || params.fullScreenShot || Boolean(params.particularScreenShot);
+        const interactionRequested = Boolean(params.playWithBrowser?.length);
+        const screenshotModeCount = [params.screenshot ?? params.screenShot ?? false, params.fullScreenShot, Boolean(params.particularScreenShot)].filter(Boolean).length;
+        if (screenshotModeCount > 1) {
+            return createErrorResult("Use only one screenshot mode at a time: screenShot, fullScreenShot, or particularScreenShot.");
+        }
+        if (params.particularScreenShot && interactionRequested) {
+            return createErrorResult("particularScreenShot cannot be used together with playWithBrowser.");
+        }
+        const headerMode = resolveHeaderMode(params);
+        const effectiveRender = (params.render_js ?? params.render ?? false) || params.returnJSON || params.showFrames || params.showWebsocketRequests || screenshotRequested || interactionRequested;
+        const effectiveReturnJSON = params.returnJSON || params.showFrames || params.showWebsocketRequests || screenshotRequested || interactionRequested;
+        const effectiveBlockResources = screenshotRequested || interactionRequested ? false : params.blockResources;
+        const effectiveOutput = effectiveReturnJSON ? params.output : params.output ?? "markdown";
+        const requestParams = compactObject({
+            token: SCRAPE_DO_TOKEN,
+            url: params.url,
+            render: effectiveRender || undefined,
+            super: params.super_proxy ?? params.super,
+            geoCode: params.geoCode,
+            regionalGeoCode: params.regionalGeoCode,
+            device: params.device !== "desktop" ? params.device : undefined,
+            sessionId: params.sessionId,
+            timeout: params.timeout !== 60000 ? params.timeout : undefined,
+            retryTimeout: params.retryTimeout,
+            disableRetry: params.disableRetry || undefined,
+            output: effectiveOutput,
+            returnJSON: effectiveReturnJSON || undefined,
+            transparentResponse: params.transparentResponse || undefined,
+            screenShot: (params.screenshot ?? params.screenShot ?? false) || undefined,
+            fullScreenShot: params.fullScreenShot || undefined,
+            particularScreenShot: params.particularScreenShot,
+            playWithBrowser: params.playWithBrowser?.length ? JSON.stringify(params.playWithBrowser) : undefined,
+            waitSelector: params.waitSelector,
+            customWait: params.customWait,
+            waitUntil: params.waitUntil !== "domcontentloaded" ? params.waitUntil : undefined,
+            width: params.width !== 1920 ? params.width : undefined,
+            height: params.height !== 1080 ? params.height : undefined,
+            blockResources: effectiveBlockResources === false ? false : undefined,
+            showFrames: params.showFrames || undefined,
+            showWebsocketRequests: params.showWebsocketRequests || undefined,
+            customHeaders: headerMode === "custom" || params.customHeaders ? true : undefined,
+            extraHeaders: headerMode === "extra" || params.extraHeaders ? true : undefined,
+            forwardHeaders: headerMode === "forward" || params.forwardHeaders ? true : undefined,
+            setCookies: params.setCookies,
+            pureCookies: params.pureCookies || undefined,
+            disableRedirection: params.disableRedirection || undefined,
+            callback: params.callback,
+        });
+
const headers = buildForwardedHeaders(params.headers, headerMode);
|
|
337
|
+
const { text } = await requestText({
|
|
338
|
+
method: "GET",
|
|
339
|
+
url: SCRAPE_API_BASE,
|
|
340
|
+
params: requestParams,
|
|
341
|
+
headers,
|
|
342
|
+
timeout: Math.min(params.timeout ?? 60000, 120000),
|
|
343
|
+
});
|
|
344
|
+
const parsed = tryParseJson(text);
|
|
345
|
+
const images = screenshotRequested || interactionRequested ? collectImageMatches(parsed ?? text) : [];
|
|
346
|
+
if (images.length > 0) {
|
|
347
|
+
const note = images.length === 1 ? "Captured screenshot from Scrape.do." : `Captured ${images.length} screenshots from Scrape.do.`;
|
|
348
|
+
return createImageResult(images, note);
|
|
349
|
+
}
|
|
350
|
+
if (parsed !== undefined) {
|
|
351
|
+
return createJsonResult(parsed);
|
|
352
|
+
}
|
|
353
|
+
return createTextResult(text);
|
|
354
|
+
}
|
|
355
|
+
catch (error) {
|
|
356
|
+
return createErrorResult(`Error: ${getErrorMessage(error)}`);
|
|
357
|
+
}
|
|
358
|
+
});
|
|
359
|
+
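The `scrape` handler funnels every option through `compactObject` before the request is built. The helper's definition is not part of this diff; the sketch below is a hypothetical reimplementation of the behavior the call sites assume (entries whose value is `undefined` are dropped, so unset optional parameters never reach the query string):

```javascript
// Hypothetical reimplementation of the compactObject helper used above:
// keep only entries whose value is defined, so optional parameters that
// were never set do not end up in the outgoing query string.
function compactObject(obj) {
    return Object.fromEntries(Object.entries(obj).filter(([, value]) => value !== undefined));
}

const requestParams = compactObject({
    token: "abc",
    url: "https://example.com",
    render: undefined, // dropped
});
```

This is why the handler writes patterns like `params.disableRetry || undefined`: mapping falsy defaults to `undefined` keeps them out of the final URL entirely.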
+server.tool("google_search", "Search Google with Scrape.do's structured SERP API. Supports localization, google_domain, UULE/location targeting, filters, pagination, and optional raw HTML.", {
+    query: zod_1.z.string().optional().describe("Alias for q. Search query."),
+    q: zod_1.z.string().optional().describe("Official Google Search query parameter."),
+    country: zod_1.z.string().optional().default("us").describe("Alias for gl. Country code."),
+    gl: zod_1.z.string().optional().describe("Official Google geo-location parameter."),
+    language: zod_1.z.string().optional().default("en").describe("Alias for hl. Interface language."),
+    hl: zod_1.z.string().optional().describe("Official Google interface language parameter."),
+    domain: zod_1.z.string().optional().describe("Deprecated alias for google_domain."),
+    google_domain: zod_1.z.string().optional().describe("Official Google domain parameter."),
+    page: zod_1.z.number().int().positive().optional().default(1).describe("1-based page number."),
+    start: zod_1.z.number().int().min(0).optional().describe("Official Google result offset. Overrides page."),
+    num: zod_1.z.number().int().positive().optional().describe("Number of results per page."),
+    time_period: googleTimePeriodSchema.optional().describe("Time-based search filter."),
+    device: zod_1.z.enum(["desktop", "mobile"]).optional().default("desktop").describe("SERP layout device."),
+    includeHtml: zod_1.z.boolean().optional().describe("Alias for include_html."),
+    include_html: zod_1.z.boolean().optional().describe("Include raw Google HTML in the response."),
+    location: zod_1.z.string().optional().describe("Canonical Google location string."),
+    uule: zod_1.z.string().optional().describe("UULE-encoded location string."),
+    lr: zod_1.z.string().optional().describe("Strict language filter such as lang_en."),
+    cr: zod_1.z.string().optional().describe("Strict country filter such as countryUS."),
+    safe: zod_1.z.string().optional().describe("SafeSearch mode. Use active to filter adult content."),
+    nfpr: zod_1.z.boolean().optional().describe("Disable spelling correction."),
+    filter: zod_1.z.union([zod_1.z.string(), zod_1.z.number()]).optional().describe("Result filtering control. Use 0 to disable similar/omitted result filtering."),
+}, async (params) => {
+    try {
+        ensureToken();
+        const query = params.query ?? params.q;
+        if (!query) {
+            return createErrorResult("Error: query or q is required.");
+        }
+        const start = params.start ?? Math.max((params.page - 1) * (params.num ?? 10), 0);
+        const requestParams = compactObject({
+            token: SCRAPE_DO_TOKEN,
+            q: query,
+            gl: params.gl ?? params.country,
+            hl: params.hl ?? params.language,
+            google_domain: params.google_domain ?? params.domain,
+            start,
+            num: params.num,
+            time_period: params.time_period,
+            device: params.device,
+            include_html: params.include_html ?? params.includeHtml ? true : undefined,
+            location: params.location,
+            uule: params.uule,
+            lr: params.lr,
+            cr: params.cr,
+            safe: params.safe,
+            nfpr: params.nfpr,
+            filter: params.filter,
+        });
+        const { text } = await requestText({
+            method: "GET",
+            url: `${SCRAPE_API_BASE}/plugin/google/search`,
+            params: requestParams,
+            timeout: 60000,
+        });
+        const parsed = tryParseJson(text);
+        if (parsed !== undefined) {
+            return createJsonResult(parsed);
+        }
+        return createTextResult(text);
+    }
+    catch (error) {
+        return createErrorResult(`Error: ${getErrorMessage(error)}`);
+    }
+});
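The `google_search` handler derives Google's zero-based `start` offset from the 1-based `page` number whenever the caller does not pass an explicit `start`. A worked example of that mapping, using the same expression as the handler:

```javascript
// Page 3 with the default of 10 results per page maps to result offset 20,
// mirroring: params.start ?? Math.max((params.page - 1) * (params.num ?? 10), 0)
const page = 3;
const num = undefined; // caller did not set num, so the default of 10 applies
const start = Math.max((page - 1) * (num ?? 10), 0);
```

The `Math.max(..., 0)` guard keeps the offset non-negative even if a page value of 0 were to slip past validation.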
+server.tool("amazon_product", "Get structured Amazon product detail data with the official Scrape.do Amazon PDP API.", {
+    asin: zod_1.z.string().min(1).describe("Amazon ASIN."),
+    geocode: zod_1.z.string().min(1).describe("Amazon marketplace country code."),
+    zipcode: zod_1.z.string().min(1).describe("ZIP/postal code for geo-targeting."),
+    super_proxy: zod_1.z.boolean().optional().describe("Alias for super."),
+    super: zod_1.z.boolean().optional().describe("Official Amazon super proxy flag."),
+    language: zod_1.z.string().optional().describe("ISO 639-1 language code."),
+    includeHtml: zod_1.z.boolean().optional().describe("Alias for include_html."),
+    include_html: zod_1.z.boolean().optional().describe("Include raw HTML in the JSON response."),
+}, async (params) => {
+    try {
+        ensureToken();
+        const requestParams = compactObject({
+            token: SCRAPE_DO_TOKEN,
+            asin: params.asin,
+            geocode: params.geocode,
+            zipcode: params.zipcode,
+            super: params.super_proxy ?? params.super,
+            language: params.language,
+            include_html: params.include_html ?? params.includeHtml ? true : undefined,
+        });
+        const { text } = await requestText({
+            method: "GET",
+            url: `${SCRAPE_API_BASE}/plugin/amazon/pdp`,
+            params: requestParams,
+            timeout: 60000,
+        });
+        const parsed = tryParseJson(text);
+        if (parsed !== undefined) {
+            return createJsonResult(parsed);
+        }
+        return createTextResult(text);
+    }
+    catch (error) {
+        return createErrorResult(`Error: ${getErrorMessage(error)}`);
+    }
+});
+server.tool("amazon_offer_listing", "Get all seller offers for an Amazon product with structured pricing, fulfillment, and Buy Box data.", {
+    asin: zod_1.z.string().min(1).describe("Amazon ASIN."),
+    geocode: zod_1.z.string().min(1).describe("Amazon marketplace country code."),
+    zipcode: zod_1.z.string().min(1).describe("ZIP/postal code for geo-targeting."),
+    super_proxy: zod_1.z.boolean().optional().describe("Alias for super."),
+    super: zod_1.z.boolean().optional().describe("Official Amazon super proxy flag."),
+    includeHtml: zod_1.z.boolean().optional().describe("Alias for include_html."),
+    include_html: zod_1.z.boolean().optional().describe("Include raw HTML in the JSON response."),
+}, async (params) => {
+    try {
+        ensureToken();
+        const requestParams = compactObject({
+            token: SCRAPE_DO_TOKEN,
+            asin: params.asin,
+            geocode: params.geocode,
+            zipcode: params.zipcode,
+            super: params.super_proxy ?? params.super,
+            include_html: params.include_html ?? params.includeHtml ? true : undefined,
+        });
+        const { text } = await requestText({
+            method: "GET",
+            url: `${SCRAPE_API_BASE}/plugin/amazon/offer-listing`,
+            params: requestParams,
+            timeout: 60000,
+        });
+        const parsed = tryParseJson(text);
+        if (parsed !== undefined) {
+            return createJsonResult(parsed);
+        }
+        return createTextResult(text);
+    }
+    catch (error) {
+        return createErrorResult(`Error: ${getErrorMessage(error)}`);
+    }
+});
+server.tool("amazon_search", "Search Amazon or scrape Amazon category-style result pages with structured product listings.", {
+    keyword: zod_1.z.string().min(1).describe("Amazon keyword query."),
+    geocode: zod_1.z.string().min(1).describe("Amazon marketplace country code."),
+    zipcode: zod_1.z.string().min(1).describe("ZIP/postal code for geo-targeting."),
+    page: zod_1.z.number().int().positive().optional().default(1).describe("Page number."),
+    super_proxy: zod_1.z.boolean().optional().describe("Alias for super."),
+    super: zod_1.z.boolean().optional().describe("Official Amazon super proxy flag."),
+    language: zod_1.z.string().optional().describe("ISO 639-1 language code."),
+    includeHtml: zod_1.z.boolean().optional().describe("Alias for include_html."),
+    include_html: zod_1.z.boolean().optional().describe("Include raw HTML in the JSON response."),
+}, async (params) => {
+    try {
+        ensureToken();
+        const requestParams = compactObject({
+            token: SCRAPE_DO_TOKEN,
+            keyword: params.keyword,
+            geocode: params.geocode,
+            zipcode: params.zipcode,
+            page: params.page !== 1 ? params.page : undefined,
+            super: params.super_proxy ?? params.super,
+            language: params.language,
+            include_html: params.include_html ?? params.includeHtml ? true : undefined,
+        });
+        const { text } = await requestText({
+            method: "GET",
+            url: `${SCRAPE_API_BASE}/plugin/amazon/search`,
+            params: requestParams,
+            timeout: 60000,
+        });
+        const parsed = tryParseJson(text);
+        if (parsed !== undefined) {
+            return createJsonResult(parsed);
+        }
+        return createTextResult(text);
+    }
+    catch (error) {
+        return createErrorResult(`Error: ${getErrorMessage(error)}`);
+    }
+});
+server.tool("amazon_raw_html", "Get raw HTML from any Amazon URL with ZIP-code geo-targeting.", {
+    url: zod_1.z.string().url().describe("Full Amazon URL to scrape."),
+    geocode: zod_1.z.string().min(1).describe("Amazon marketplace country code."),
+    zipcode: zod_1.z.string().min(1).describe("ZIP/postal code for geo-targeting."),
+    super_proxy: zod_1.z.boolean().optional().describe("Alias for super."),
+    super: zod_1.z.boolean().optional().describe("Official Amazon super proxy flag."),
+    language: zod_1.z.string().optional().describe("ISO 639-1 language code."),
+    timeout: zod_1.z.number().int().positive().optional().describe("Request timeout in milliseconds."),
+}, async (params) => {
+    try {
+        ensureToken();
+        const requestParams = compactObject({
+            token: SCRAPE_DO_TOKEN,
+            url: params.url,
+            geocode: params.geocode,
+            zipcode: params.zipcode,
+            output: "html",
+            super: params.super_proxy ?? params.super,
+            language: params.language,
+            timeout: params.timeout,
+        });
+        const { text } = await requestText({
+            method: "GET",
+            url: `${SCRAPE_API_BASE}/plugin/amazon/`,
+            params: requestParams,
+            timeout: params.timeout ?? 60000,
+        });
+        return createTextResult(text);
+    }
+    catch (error) {
+        return createErrorResult(`Error: ${getErrorMessage(error)}`);
+    }
+});
+server.tool("async_create_job", "Create a Scrape.do Async API job for batch/background scraping.", {
+    targets: zod_1.z.array(zod_1.z.string().url()).min(1).describe("URLs to scrape."),
+    method: asyncMethodSchema.optional().default("GET").describe("HTTP method for the job."),
+    body: zod_1.z.string().optional().describe("Request body for POST/PUT/PATCH jobs."),
+    geoCode: zod_1.z.string().optional().describe("Country code."),
+    regionalGeoCode: zod_1.z.string().optional().describe("Regional code."),
+    super_proxy: zod_1.z.boolean().optional().describe("Use residential/mobile proxies."),
+    headers: headerRecordSchema.optional().describe("Headers to send with the upstream request."),
+    forwardHeaders: zod_1.z.boolean().optional().describe("Use only provided headers instead of merging with Scrape.do headers."),
+    sessionId: zod_1.z.union([zod_1.z.number().int(), zod_1.z.string()]).optional().describe("Sticky session ID."),
+    device: zod_1.z.enum(["desktop", "mobile", "tablet"]).optional().describe("Device type."),
+    setCookies: zod_1.z.string().optional().describe("Cookies to include."),
+    timeout: zod_1.z.number().int().positive().optional().describe("Request timeout in milliseconds."),
+    retryTimeout: zod_1.z.number().int().positive().optional().describe("Retry timeout in milliseconds."),
+    disableRetry: zod_1.z.boolean().optional().describe("Disable automatic retries."),
+    transparentResponse: zod_1.z.boolean().optional().describe("Return raw target response."),
+    disableRedirection: zod_1.z.boolean().optional().describe("Disable redirects."),
+    output: zod_1.z.enum(["raw", "markdown"]).optional().describe("Output format."),
+    render: zod_1.z
+        .object({
+        blockResources: zod_1.z.boolean().optional(),
+        waitUntil: asyncWaitUntilSchema.optional(),
+        customWait: zod_1.z.number().int().min(0).max(35000).optional(),
+        waitSelector: zod_1.z.string().optional(),
+        playWithBrowser: zod_1.z.array(browserActionSchema).optional(),
+        returnJSON: zod_1.z.boolean().optional(),
+        showWebsocketRequests: zod_1.z.boolean().optional(),
+        showFrames: zod_1.z.boolean().optional(),
+        screenshot: zod_1.z.boolean().optional(),
+        fullScreenshot: zod_1.z.boolean().optional(),
+        particularScreenshot: zod_1.z.string().optional(),
+    })
+        .optional()
+        .describe("Headless browser configuration."),
+    webhookUrl: zod_1.z.string().url().optional().describe("Webhook URL to receive results."),
+    webhookHeaders: headerRecordSchema.optional().describe("Extra headers for the webhook request."),
+}, async (params) => {
+    try {
+        ensureToken();
+        const render = params.render
+            ? compactObject({
+                BlockResources: params.render.blockResources,
+                WaitUntil: params.render.waitUntil,
+                CustomWait: params.render.customWait,
+                WaitSelector: params.render.waitSelector,
+                PlayWithBrowser: params.render.playWithBrowser,
+                ReturnJSON: params.render.returnJSON,
+                ShowWebsocketRequests: params.render.showWebsocketRequests,
+                ShowFrames: params.render.showFrames,
+                Screenshot: params.render.screenshot,
+                FullScreenshot: params.render.fullScreenshot,
+                ParticularScreenshot: params.render.particularScreenshot,
+            })
+            : undefined;
+        const body = compactObject({
+            Targets: params.targets,
+            Method: params.method,
+            Body: params.body,
+            GeoCode: params.geoCode,
+            RegionalGeoCode: params.regionalGeoCode,
+            Super: params.super_proxy,
+            Headers: normalizeHeaderRecord(params.headers),
+            ForwardHeaders: params.forwardHeaders,
+            SessionID: params.sessionId !== undefined ? String(params.sessionId) : undefined,
+            Device: params.device,
+            SetCookies: params.setCookies,
+            Timeout: params.timeout,
+            RetryTimeout: params.retryTimeout,
+            DisableRetry: params.disableRetry,
+            TransparentResponse: params.transparentResponse,
+            DisableRedirection: params.disableRedirection,
+            Output: params.output,
+            Render: render && Object.keys(render).length > 0 ? render : undefined,
+            WebhookURL: params.webhookUrl,
+            WebhookHeaders: normalizeHeaderRecord(params.webhookHeaders),
+        });
+        const { text } = await requestText({
+            method: "POST",
+            url: `${ASYNC_API_BASE}/api/v1/jobs`,
+            headers: {
+                "Content-Type": "application/json",
+                "X-Token": SCRAPE_DO_TOKEN,
+            },
+            data: body,
+            timeout: 60000,
+        });
+        const parsed = tryParseJson(text);
+        if (parsed !== undefined) {
+            return createJsonResult(parsed);
+        }
+        return createTextResult(text);
+    }
+    catch (error) {
+        return createErrorResult(`Error: ${getErrorMessage(error)}`);
+    }
+});
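The `async_create_job` handler maps camelCase tool parameters onto the PascalCase body fields of the Async API (`Targets`, `Method`, `Render`, and so on) and POSTs them with the token in an `X-Token` header rather than in the URL. A hedged sketch of the minimal request it would assemble; the token and target URL below are placeholders, not values from this package:

```javascript
// Minimal async job payload as assembled by the handler above. Only the
// fields the caller actually set survive, and the token travels in a header.
const jobBody = {
    Targets: ["https://example.com"],
    Method: "GET",
};
const request = {
    method: "POST",
    headers: {
        "Content-Type": "application/json",
        "X-Token": "YOUR_SCRAPE_DO_TOKEN", // placeholder token
    },
    body: JSON.stringify(jobBody),
};
```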
+server.tool("async_get_job", "Get Scrape.do Async API job details by job ID.", {
+    jobId: zod_1.z.string().min(1).describe("Job ID returned by async_create_job."),
+}, async ({ jobId }) => {
     try {
-
+        ensureToken();
+        const { text } = await requestText({
+            method: "GET",
+            url: `${ASYNC_API_BASE}/api/v1/jobs/${encodeURIComponent(jobId)}`,
+            headers: {
+                "X-Token": SCRAPE_DO_TOKEN,
+            },
+            timeout: 60000,
+        });
+        const parsed = tryParseJson(text);
+        if (parsed !== undefined) {
+            return createJsonResult(parsed);
+        }
+        return createTextResult(text);
+    }
+    catch (error) {
+        return createErrorResult(`Error: ${getErrorMessage(error)}`);
+    }
+});
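The job and task lookup handlers wrap IDs in `encodeURIComponent` before splicing them into the request path. That matters whenever an ID contains reserved URL characters; the ID below is a made-up example:

```javascript
// A job ID containing reserved characters must not be allowed to alter the
// URL path structure, so it is percent-encoded before interpolation.
const jobId = "job/123+abc"; // hypothetical ID with reserved characters
const path = `/api/v1/jobs/${encodeURIComponent(jobId)}`;
```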
+server.tool("async_get_task", "Get Scrape.do Async API task details by job ID and task ID.", {
+    jobId: zod_1.z.string().min(1).describe("Job ID."),
+    taskId: zod_1.z.string().min(1).describe("Task ID."),
+}, async ({ jobId, taskId }) => {
+    try {
+        ensureToken();
+        const { text } = await requestText({
+            method: "GET",
+            url: `${ASYNC_API_BASE}/api/v1/jobs/${encodeURIComponent(jobId)}/${encodeURIComponent(taskId)}`,
+            headers: {
+                "X-Token": SCRAPE_DO_TOKEN,
+            },
+            timeout: 60000,
+        });
+        const parsed = tryParseJson(text);
+        if (parsed !== undefined) {
+            return createJsonResult(parsed);
+        }
+        return createTextResult(text);
+    }
+    catch (error) {
+        return createErrorResult(`Error: ${getErrorMessage(error)}`);
+    }
+});
+server.tool("async_list_jobs", "List Scrape.do Async API jobs with pagination.", {
+    page: zod_1.z.number().int().positive().optional().default(1).describe("Page number."),
+    pageSize: zod_1.z.number().int().positive().max(100).optional().default(10).describe("Items per page."),
+}, async ({ page, pageSize }) => {
+    try {
+        ensureToken();
+        const { text } = await requestText({
+            method: "GET",
+            url: `${ASYNC_API_BASE}/api/v1/jobs`,
             params: {
-
-
-
-
-
+                page,
+                page_size: pageSize,
+            },
+            headers: {
+                "X-Token": SCRAPE_DO_TOKEN,
             },
             timeout: 60000,
         });
-
-
-
+        const parsed = tryParseJson(text);
+        if (parsed !== undefined) {
+            return createJsonResult(parsed);
+        }
+        return createTextResult(text);
     }
     catch (error) {
-
-        return {
-            content: [{ type: "text", text: `Error: ${msg}` }],
-            isError: true,
-        };
+        return createErrorResult(`Error: ${getErrorMessage(error)}`);
     }
 });
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+server.tool("async_cancel_job", "Cancel a Scrape.do Async API job.", {
+    jobId: zod_1.z.string().min(1).describe("Job ID to cancel."),
+}, async ({ jobId }) => {
+    try {
+        ensureToken();
+        const { text } = await requestText({
+            method: "DELETE",
+            url: `${ASYNC_API_BASE}/api/v1/jobs/${encodeURIComponent(jobId)}`,
+            headers: {
+                "X-Token": SCRAPE_DO_TOKEN,
+            },
+            timeout: 60000,
+        });
+        const parsed = tryParseJson(text);
+        if (parsed !== undefined) {
+            return createJsonResult(parsed);
+        }
+        return createTextResult(text);
     }
+    catch (error) {
+        return createErrorResult(`Error: ${getErrorMessage(error)}`);
+    }
+});
+server.tool("async_get_account", "Get Scrape.do Async API account/concurrency information.", {}, async () => {
     try {
-
-
-
-
-
-
-
-        };
-        if (time_period)
-            params.time_period = time_period;
-        const response = await axios_1.default.get(`${SCRAPE_API_BASE}/plugin/google/search`, {
-            params,
+        ensureToken();
+        const { text } = await requestText({
+            method: "GET",
+            url: `${ASYNC_API_BASE}/api/v1/me`,
+            headers: {
+                "X-Token": SCRAPE_DO_TOKEN,
+            },
             timeout: 60000,
         });
-
-
-
+        const parsed = tryParseJson(text);
+        if (parsed !== undefined) {
+            return createJsonResult(parsed);
+        }
+        return createTextResult(text);
     }
     catch (error) {
-
-
-
-
-
+        return createErrorResult(`Error: ${getErrorMessage(error)}`);
+    }
+});
+server.tool("proxy_mode_config", "Generate Scrape.do Proxy Mode configuration and parameter strings without exposing your configured token.", {
+    params: headerRecordSchema.optional().describe("Proxy mode query parameters to place into the password segment."),
+}, async ({ params }) => {
+    try {
+        ensureToken();
+        const parameterString = buildProxyParameterString(params);
+        return createJsonResult({
+            protocol: "http or https",
+            host: "proxy.scrape.do",
+            port: 8080,
+            username: "<SCRAPE_DO_TOKEN>",
+            password: parameterString,
+            proxy_url_template: `http://<SCRAPE_DO_TOKEN>:${parameterString}@proxy.scrape.do:8080`,
+            ca_certificate_url: "https://scrape.do/scrapedo_ca.crt",
+        });
+    }
+    catch (error) {
+        return createErrorResult(`Error: ${getErrorMessage(error)}`);
     }
 });
-// ─── Start Server ────────────────────────────────────────────────────────────
 async function main() {
     const transport = new stdio_js_1.StdioServerTransport();
     await server.connect(transport);
 }
-main().catch(
+main().catch((error) => {
+    console.error(error);
+    process.exitCode = 1;
+});
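The `proxy_mode_config` tool returns the pieces of a Scrape.do Proxy Mode URL without embedding the configured token. Assembling those pieces into a usable proxy URL looks like this; both the token and the parameter string below are placeholder values, not output from this package:

```javascript
// Proxy Mode credentials: the token becomes the proxy username and the
// scraping parameters become the password segment, as described above.
const token = "YOUR_SCRAPE_DO_TOKEN"; // placeholder, never a real token
const parameterString = "render=true&geoCode=us"; // example parameters
const proxyUrl = `http://${token}:${parameterString}@proxy.scrape.do:8080`;
```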
package/package.json
CHANGED
@@ -1,7 +1,7 @@
 {
     "name": "scrape-do-mcp",
-    "version": "0.
-    "description": "MCP Server for Scrape.do -
+    "version": "0.3.0",
+    "description": "MCP Server for Scrape.do - scraping, Google Search, Amazon, Async API, and Proxy Mode helpers",
     "main": "dist/index.js",
     "bin": {
         "scrape-do-mcp": "dist/index.js"
@@ -16,6 +16,9 @@
         "scraping",
         "web-scraper",
         "google-search",
+        "amazon-scraper",
+        "async-api",
+        "serp",
         "firecrawl-alternative"
     ],
     "license": "MIT",
@@ -25,6 +28,7 @@
     },
     "files": [
         "dist",
+        "LICENSE",
         "README.md",
         "README-ZH.md"
     ],